diff --git a/.github/ISSUE_TEMPLATE/10_question.md b/.github/ISSUE_TEMPLATE/10_question.md index 6e23fbdc605..a112b9599d5 100644 --- a/.github/ISSUE_TEMPLATE/10_question.md +++ b/.github/ISSUE_TEMPLATE/10_question.md @@ -7,6 +7,6 @@ assignees: '' --- -Make sure to check documentation https://clickhouse.yandex/docs/en/ first. If the question is concise and probably has a short answer, asking it in Telegram chat https://telegram.me/clickhouse_en is probably the fastest way to find the answer. For more complicated questions, consider asking them on StackOverflow with "clickhouse" tag https://stackoverflow.com/questions/tagged/clickhouse +> Make sure to check documentation https://clickhouse.yandex/docs/en/ first. If the question is concise and probably has a short answer, asking it in Telegram chat https://telegram.me/clickhouse_en is probably the fastest way to find the answer. For more complicated questions, consider asking them on StackOverflow with "clickhouse" tag https://stackoverflow.com/questions/tagged/clickhouse -If you still prefer GitHub issues, remove all this text and ask your question here. +> If you still prefer GitHub issues, remove all this text and ask your question here. diff --git a/.github/ISSUE_TEMPLATE/20_feature-request.md b/.github/ISSUE_TEMPLATE/20_feature-request.md index 99a762a2019..f59dbc2c40f 100644 --- a/.github/ISSUE_TEMPLATE/20_feature-request.md +++ b/.github/ISSUE_TEMPLATE/20_feature-request.md @@ -7,16 +7,20 @@ assignees: '' --- -(you don't have to strictly follow this form) +> (you don't have to strictly follow this form) **Use case** -A clear and concise description of what is the intended usage scenario is. + +> A clear and concise description of what is the intended usage scenario is. **Describe the solution you'd like** -A clear and concise description of what you want to happen. + +> A clear and concise description of what you want to happen. **Describe alternatives you've considered** -A clear and concise description of any alternative solutions or features you've considered. + +> A clear and concise description of any alternative solutions or features you've considered. **Additional context** -Add any other context or screenshots about the feature request here. + +> Add any other context or screenshots about the feature request here. diff --git a/.github/ISSUE_TEMPLATE/40_bug-report.md b/.github/ISSUE_TEMPLATE/40_bug-report.md index 5c8611d47e6..d62ec578f8d 100644 --- a/.github/ISSUE_TEMPLATE/40_bug-report.md +++ b/.github/ISSUE_TEMPLATE/40_bug-report.md @@ -7,11 +7,11 @@ assignees: '' --- -You have to provide the following information whenever possible. +> You have to provide the following information whenever possible. **Describe the bug** -A clear and concise description of what works not as it is supposed to. +> A clear and concise description of what works not as it is supposed to. **Does it reproduce on recent release?** @@ -19,7 +19,7 @@ A clear and concise description of what works not as it is supposed to. **Enable crash reporting** -If possible, change "enabled" to true in "send_crash_reports" section in `config.xml`: +> If possible, change "enabled" to true in "send_crash_reports" section in `config.xml`: ``` @@ -39,12 +39,12 @@ If possible, change "enabled" to true in "send_crash_reports" section in `config **Expected behavior** -A clear and concise description of what you expected to happen. +> A clear and concise description of what you expected to happen. 
**Error message and/or stacktrace** -If applicable, add screenshots to help explain your problem. +> If applicable, add screenshots to help explain your problem. **Additional context** -Add any other context about the problem here. +> Add any other context about the problem here. diff --git a/.github/ISSUE_TEMPLATE/50_build-issue.md b/.github/ISSUE_TEMPLATE/50_build-issue.md index 73c97cc3cfb..a358575cd7c 100644 --- a/.github/ISSUE_TEMPLATE/50_build-issue.md +++ b/.github/ISSUE_TEMPLATE/50_build-issue.md @@ -7,10 +7,11 @@ assignees: '' --- -Make sure that `git diff` result is empty and you've just pulled fresh master. Try cleaning up cmake cache. Just in case, official build instructions are published here: https://clickhouse.yandex/docs/en/development/build/ +> Make sure that `git diff` result is empty and you've just pulled fresh master. Try cleaning up cmake cache. Just in case, official build instructions are published here: https://clickhouse.yandex/docs/en/development/build/ **Operating system** -OS kind or distribution, specific version/release, non-standard kernel if any. If you are trying to build inside virtual machine, please mention it too. + +> OS kind or distribution, specific version/release, non-standard kernel if any. If you are trying to build inside virtual machine, please mention it too. **Cmake version** diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index db923369296..d3fac8670e8 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -2,28 +2,26 @@ I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla Changelog category (leave one): - New Feature -- Bug Fix - Improvement +- Bug Fix - Performance Improvement - Backward Incompatible Change - Build/Testing/Packaging Improvement - Documentation (changelog entry is not required) -- Other - Not for changelog (changelog entry is not required) Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md): - ... Detailed description / Documentation draft: - ... -By adding documentation, you'll allow users to try your new feature immediately, not when someone else will have time to document it later. Documentation is necessary for all features that affect user experience in any way. You can add brief documentation draft above, or add documentation right into your patch as Markdown files in [docs](https://github.com/ClickHouse/ClickHouse/tree/master/docs) folder. -If you are doing this for the first time, it's recommended to read the lightweight [Contributing to ClickHouse Documentation](https://github.com/ClickHouse/ClickHouse/tree/master/docs/README.md) guide first. +> By adding documentation, you'll allow users to try your new feature immediately, not when someone else will have time to document it later. Documentation is necessary for all features that affect user experience in any way. You can add brief documentation draft above, or add documentation right into your patch as Markdown files in [docs](https://github.com/ClickHouse/ClickHouse/tree/master/docs) folder. + +> If you are doing this for the first time, it's recommended to read the lightweight [Contributing to ClickHouse Documentation](https://github.com/ClickHouse/ClickHouse/tree/master/docs/README.md) guide first. 
-Information about CI checks: https://clickhouse.tech/docs/en/development/continuous-integration/ +> Information about CI checks: https://clickhouse.tech/docs/en/development/continuous-integration/ diff --git a/.gitmodules b/.gitmodules index 4df7798e1e7..43c878427ec 100644 --- a/.gitmodules +++ b/.gitmodules @@ -225,6 +225,15 @@ [submodule "contrib/yaml-cpp"] path = contrib/yaml-cpp url = https://github.com/ClickHouse-Extras/yaml-cpp.git +[submodule "contrib/libstemmer_c"] + path = contrib/libstemmer_c + url = https://github.com/ClickHouse-Extras/libstemmer_c.git +[submodule "contrib/wordnet-blast"] + path = contrib/wordnet-blast + url = https://github.com/ClickHouse-Extras/wordnet-blast.git +[submodule "contrib/lemmagen-c"] + path = contrib/lemmagen-c + url = https://github.com/ClickHouse-Extras/lemmagen-c.git [submodule "contrib/libpqxx"] path = contrib/libpqxx url = https://github.com/ClickHouse-Extras/libpqxx.git diff --git a/CMakeLists.txt b/CMakeLists.txt index 875a6d1ab61..24022c256ec 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -542,6 +542,7 @@ include (cmake/find/libpqxx.cmake) include (cmake/find/nuraft.cmake) include (cmake/find/yaml-cpp.cmake) include (cmake/find/s2geometry.cmake) +include (cmake/find/nlp.cmake) if(NOT USE_INTERNAL_PARQUET_LIBRARY) set (ENABLE_ORC OFF CACHE INTERNAL "") diff --git a/base/common/ReplxxLineReader.cpp b/base/common/ReplxxLineReader.cpp index 9c65b1dfe4c..c79013f1850 100644 --- a/base/common/ReplxxLineReader.cpp +++ b/base/common/ReplxxLineReader.cpp @@ -69,7 +69,7 @@ void convertHistoryFile(const std::string & path, replxx::Replxx & rx) } std::string line; - if (!getline(in, line).good()) + if (getline(in, line).bad()) { rx.print("Cannot read from %s (for conversion): %s\n", path.c_str(), errnoToString(errno).c_str()); @@ -78,7 +78,7 @@ void convertHistoryFile(const std::string & path, replxx::Replxx & rx) /// This is the marker of the date, no need to convert. static char const REPLXX_TIMESTAMP_PATTERN[] = "### dddd-dd-dd dd:dd:dd.ddd"; - if (line.starts_with("### ") && line.size() == strlen(REPLXX_TIMESTAMP_PATTERN)) + if (line.empty() || (line.starts_with("### ") && line.size() == strlen(REPLXX_TIMESTAMP_PATTERN))) { return; } diff --git a/base/common/removeDuplicates.h b/base/common/removeDuplicates.h new file mode 100644 index 00000000000..a0142b1a948 --- /dev/null +++ b/base/common/removeDuplicates.h @@ -0,0 +1,24 @@ +#pragma once +#include <vector> + +/// Removes duplicates from a container without changing the order of its elements. +/// Keeps the last occurrence of each element. +/// Should NOT be used for containers with a lot of elements because it has O(N^2) complexity.
+template <typename T> +void removeDuplicatesKeepLast(std::vector<T> & vec) +{ + auto begin = vec.begin(); + auto end = vec.end(); + auto new_begin = end; + for (auto current = end; current != begin;) + { + --current; + if (std::find(new_begin, end, *current) == end) + { + --new_begin; + if (new_begin != current) + *new_begin = *current; + } + } + vec.erase(begin, new_begin); +} diff --git a/base/daemon/BaseDaemon.cpp b/base/daemon/BaseDaemon.cpp index ed5c81e89fa..745e020c8bb 100644 --- a/base/daemon/BaseDaemon.cpp +++ b/base/daemon/BaseDaemon.cpp @@ -259,10 +259,25 @@ private: Poco::Logger * log; BaseDaemon & daemon; - void onTerminate(const std::string & message, UInt32 thread_num) const + void onTerminate(std::string_view message, UInt32 thread_num) const { + size_t pos = message.find('\n'); + LOG_FATAL(log, "(version {}{}, {}) (from thread {}) {}", - VERSION_STRING, VERSION_OFFICIAL, daemon.build_id_info, thread_num, message); + VERSION_STRING, VERSION_OFFICIAL, daemon.build_id_info, thread_num, message.substr(0, pos)); + + /// Print trace from std::terminate exception line-by-line to make it easy for grep. + while (pos != std::string_view::npos) + { + ++pos; + size_t next_pos = message.find('\n', pos); + size_t size = next_pos; + if (next_pos != std::string_view::npos) + size = next_pos - pos; + + LOG_FATAL(log, "{}", message.substr(pos, size)); + pos = next_pos; + } } void onFault( diff --git a/benchmark/clickhouse/benchmark-new.sh b/benchmark/clickhouse/benchmark-new.sh index 876ff87f58d..0c4cad6e5e3 100755 --- a/benchmark/clickhouse/benchmark-new.sh +++ b/benchmark/clickhouse/benchmark-new.sh @@ -4,13 +4,24 @@ QUERIES_FILE="queries.sql" TABLE=$1 TRIES=3 +if [ -x ./clickhouse ] +then + CLICKHOUSE_CLIENT="./clickhouse client" +elif command -v clickhouse-client >/dev/null 2>&1 +then + CLICKHOUSE_CLIENT="clickhouse-client" +else + echo "clickhouse-client is not found" + exit 1 +fi + cat "$QUERIES_FILE" | sed "s/{table}/${TABLE}/g" | while read query; do sync echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null echo -n "[" for i in $(seq 1 $TRIES); do - RES=$(clickhouse-client --time --format=Null --query="$query" 2>&1) + RES=$(${CLICKHOUSE_CLIENT} --time --format=Null --max_memory_usage=100G --query="$query" 2>&1) [[ "$?" == "0" ]] && echo -n "${RES}" || echo -n "null" [[ "$i" != $TRIES ]] && echo -n ", " done diff --git a/benchmark/hardware.sh b/benchmark/hardware.sh index e7773ab1d1e..e4a94ef1854 100755 --- a/benchmark/hardware.sh +++ b/benchmark/hardware.sh @@ -11,8 +11,8 @@ DATASET="${TABLE}_v1.tar.xz" QUERIES_FILE="queries.sql" TRIES=3 -AMD64_BIN_URL="https://clickhouse-builds.s3.yandex.net/0/e29c4c3cc47ab2a6c4516486c1b77d57e7d42643/clickhouse_build_check/gcc-10_relwithdebuginfo_none_bundled_unsplitted_disable_False_binary/clickhouse" -AARCH64_BIN_URL="https://clickhouse-builds.s3.yandex.net/0/e29c4c3cc47ab2a6c4516486c1b77d57e7d42643/clickhouse_special_build_check/clang-10-aarch64_relwithdebuginfo_none_bundled_unsplitted_disable_False_binary/clickhouse" +AMD64_BIN_URL="https://builds.clickhouse.tech/master/amd64/clickhouse" +AARCH64_BIN_URL="https://builds.clickhouse.tech/master/aarch64/clickhouse" # Note: on older Ubuntu versions, 'axel' does not support IPv6. If you are using IPv6-only servers on very old Ubuntu, just don't install 'axel'.
@@ -89,7 +89,7 @@ cat "$QUERIES_FILE" | sed "s/{table}/${TABLE}/g" | while read query; do echo -n "[" for i in $(seq 1 $TRIES); do - RES=$(./clickhouse client --max_memory_usage 100000000000 --time --format=Null --query="$query" 2>&1 ||:) + RES=$(./clickhouse client --max_memory_usage 100G --time --format=Null --query="$query" 2>&1 ||:) [[ "$?" == "0" ]] && echo -n "${RES}" || echo -n "null" [[ "$i" != $TRIES ]] && echo -n ", " done diff --git a/cmake/find/nlp.cmake b/cmake/find/nlp.cmake new file mode 100644 index 00000000000..f1204a85dea --- /dev/null +++ b/cmake/find/nlp.cmake @@ -0,0 +1,32 @@ +option(ENABLE_NLP "Enable NLP functions support" ${ENABLE_LIBRARIES}) + +if (NOT ENABLE_NLP) + + message (STATUS "NLP functions disabled") + return() +endif() + +if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/libstemmer_c/Makefile") + message (WARNING "submodule contrib/libstemmer_c is missing. to fix try run: \n git submodule update --init --recursive") + message (${RECONFIGURE_MESSAGE_LEVEL} "Can't find internal libstemmer_c library, NLP functions will be disabled") + set (USE_NLP 0) + return() +endif () + +if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/wordnet-blast/CMakeLists.txt") + message (WARNING "submodule contrib/wordnet-blast is missing. to fix try run: \n git submodule update --init --recursive") + message (${RECONFIGURE_MESSAGE_LEVEL} "Can't find internal wordnet-blast library, NLP functions will be disabled") + set (USE_NLP 0) + return() +endif () + +if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/lemmagen-c/README.md") + message (WARNING "submodule contrib/lemmagen-c is missing. to fix try run: \n git submodule update --init --recursive") + message (${RECONFIGURE_MESSAGE_LEVEL} "Can't find internal lemmagen-c library, NLP functions will be disabled") + set (USE_NLP 0) + return() +endif () + +set (USE_NLP 1) + +message (STATUS "Using Libraries for NLP functions: contrib/wordnet-blast, contrib/libstemmer_c, contrib/lemmagen-c") diff --git a/contrib/AMQP-CPP b/contrib/AMQP-CPP index 03781aaff0f..1a6c51f4ac5 160000 --- a/contrib/AMQP-CPP +++ b/contrib/AMQP-CPP @@ -1 +1 @@ -Subproject commit 03781aaff0f10ef41f902b8cf865fe0067180c10 +Subproject commit 1a6c51f4ac51ac56610fa95081bd2f349911375a diff --git a/contrib/CMakeLists.txt b/contrib/CMakeLists.txt index 2b6629d0817..82cddb0ace0 100644 --- a/contrib/CMakeLists.txt +++ b/contrib/CMakeLists.txt @@ -328,6 +328,12 @@ endif() add_subdirectory(fast_float) +if (USE_NLP) + add_subdirectory(libstemmer-c-cmake) + add_subdirectory(wordnet-blast-cmake) + add_subdirectory(lemmagen-c-cmake) +endif() + if (USE_SQLITE) add_subdirectory(sqlite-cmake) endif() diff --git a/contrib/NuRaft b/contrib/NuRaft index 976874b7aa7..0ce94900930 160000 --- a/contrib/NuRaft +++ b/contrib/NuRaft @@ -1 +1 @@ -Subproject commit 976874b7aa7f422bf4ea595bb7d1166c617b1c26 +Subproject commit 0ce9490093021c63564cca159571a8b27772ad48 diff --git a/contrib/amqpcpp-cmake/CMakeLists.txt b/contrib/amqpcpp-cmake/CMakeLists.txt index 4e8342af125..5637db4cf41 100644 --- a/contrib/amqpcpp-cmake/CMakeLists.txt +++ b/contrib/amqpcpp-cmake/CMakeLists.txt @@ -10,11 +10,12 @@ set (SRCS "${LIBRARY_DIR}/src/deferredconsumer.cpp" "${LIBRARY_DIR}/src/deferredextreceiver.cpp" "${LIBRARY_DIR}/src/deferredget.cpp" - "${LIBRARY_DIR}/src/deferredpublisher.cpp" + "${LIBRARY_DIR}/src/deferredrecall.cpp" "${LIBRARY_DIR}/src/deferredreceiver.cpp" "${LIBRARY_DIR}/src/field.cpp" "${LIBRARY_DIR}/src/flags.cpp" "${LIBRARY_DIR}/src/linux_tcp/openssl.cpp" + 
"${LIBRARY_DIR}/src/linux_tcp/sslerrorprinter.cpp" "${LIBRARY_DIR}/src/linux_tcp/tcpconnection.cpp" "${LIBRARY_DIR}/src/inbuffer.cpp" "${LIBRARY_DIR}/src/receivedframe.cpp" diff --git a/contrib/arrow b/contrib/arrow index debf751a129..078e21bad34 160000 --- a/contrib/arrow +++ b/contrib/arrow @@ -1 +1 @@ -Subproject commit debf751a129bdda9ff4d1e895e08957ff77000a1 +Subproject commit 078e21bad344747b7656ef2d7a4f7410a0a303eb diff --git a/contrib/arrow-cmake/CMakeLists.txt b/contrib/arrow-cmake/CMakeLists.txt index 2237be9913a..2c72055a3e7 100644 --- a/contrib/arrow-cmake/CMakeLists.txt +++ b/contrib/arrow-cmake/CMakeLists.txt @@ -194,9 +194,18 @@ set(ARROW_SRCS "${LIBRARY_DIR}/compute/cast.cc" "${LIBRARY_DIR}/compute/exec.cc" "${LIBRARY_DIR}/compute/function.cc" + "${LIBRARY_DIR}/compute/function_internal.cc" "${LIBRARY_DIR}/compute/kernel.cc" "${LIBRARY_DIR}/compute/registry.cc" + "${LIBRARY_DIR}/compute/exec/exec_plan.cc" + "${LIBRARY_DIR}/compute/exec/expression.cc" + "${LIBRARY_DIR}/compute/exec/key_compare.cc" + "${LIBRARY_DIR}/compute/exec/key_encode.cc" + "${LIBRARY_DIR}/compute/exec/key_hash.cc" + "${LIBRARY_DIR}/compute/exec/key_map.cc" + "${LIBRARY_DIR}/compute/exec/util.cc" + "${LIBRARY_DIR}/compute/kernels/aggregate_basic.cc" "${LIBRARY_DIR}/compute/kernels/aggregate_mode.cc" "${LIBRARY_DIR}/compute/kernels/aggregate_quantile.cc" @@ -207,6 +216,7 @@ set(ARROW_SRCS "${LIBRARY_DIR}/compute/kernels/scalar_arithmetic.cc" "${LIBRARY_DIR}/compute/kernels/scalar_boolean.cc" "${LIBRARY_DIR}/compute/kernels/scalar_cast_boolean.cc" + "${LIBRARY_DIR}/compute/kernels/scalar_cast_dictionary.cc" "${LIBRARY_DIR}/compute/kernels/scalar_cast_internal.cc" "${LIBRARY_DIR}/compute/kernels/scalar_cast_nested.cc" "${LIBRARY_DIR}/compute/kernels/scalar_cast_numeric.cc" @@ -214,15 +224,18 @@ set(ARROW_SRCS "${LIBRARY_DIR}/compute/kernels/scalar_cast_temporal.cc" "${LIBRARY_DIR}/compute/kernels/scalar_compare.cc" "${LIBRARY_DIR}/compute/kernels/scalar_fill_null.cc" + "${LIBRARY_DIR}/compute/kernels/scalar_if_else.cc" "${LIBRARY_DIR}/compute/kernels/scalar_nested.cc" "${LIBRARY_DIR}/compute/kernels/scalar_set_lookup.cc" "${LIBRARY_DIR}/compute/kernels/scalar_string.cc" + "${LIBRARY_DIR}/compute/kernels/scalar_temporal.cc" "${LIBRARY_DIR}/compute/kernels/scalar_validity.cc" + "${LIBRARY_DIR}/compute/kernels/util_internal.cc" "${LIBRARY_DIR}/compute/kernels/vector_hash.cc" "${LIBRARY_DIR}/compute/kernels/vector_nested.cc" + "${LIBRARY_DIR}/compute/kernels/vector_replace.cc" "${LIBRARY_DIR}/compute/kernels/vector_selection.cc" "${LIBRARY_DIR}/compute/kernels/vector_sort.cc" - "${LIBRARY_DIR}/compute/kernels/util_internal.cc" "${LIBRARY_DIR}/csv/chunker.cc" "${LIBRARY_DIR}/csv/column_builder.cc" @@ -231,6 +244,7 @@ set(ARROW_SRCS "${LIBRARY_DIR}/csv/options.cc" "${LIBRARY_DIR}/csv/parser.cc" "${LIBRARY_DIR}/csv/reader.cc" + "${LIBRARY_DIR}/csv/writer.cc" "${LIBRARY_DIR}/ipc/dictionary.cc" "${LIBRARY_DIR}/ipc/feather.cc" @@ -247,6 +261,7 @@ set(ARROW_SRCS "${LIBRARY_DIR}/io/interfaces.cc" "${LIBRARY_DIR}/io/memory.cc" "${LIBRARY_DIR}/io/slow.cc" + "${LIBRARY_DIR}/io/stdio.cc" "${LIBRARY_DIR}/io/transform.cc" "${LIBRARY_DIR}/tensor/coo_converter.cc" @@ -257,9 +272,9 @@ set(ARROW_SRCS "${LIBRARY_DIR}/util/bit_block_counter.cc" "${LIBRARY_DIR}/util/bit_run_reader.cc" "${LIBRARY_DIR}/util/bit_util.cc" - "${LIBRARY_DIR}/util/bitmap.cc" "${LIBRARY_DIR}/util/bitmap_builders.cc" "${LIBRARY_DIR}/util/bitmap_ops.cc" + "${LIBRARY_DIR}/util/bitmap.cc" "${LIBRARY_DIR}/util/bpacking.cc" "${LIBRARY_DIR}/util/cancel.cc" 
"${LIBRARY_DIR}/util/compression.cc" diff --git a/contrib/boost b/contrib/boost index 1ccbb5a522a..9cf09dbfd55 160000 --- a/contrib/boost +++ b/contrib/boost @@ -1 +1 @@ -Subproject commit 1ccbb5a522a571ce83b606dbc2e1011c42ecccfb +Subproject commit 9cf09dbfd55a5c6202dedbdf40781a51b02c2675 diff --git a/contrib/boost-cmake/CMakeLists.txt b/contrib/boost-cmake/CMakeLists.txt index 9f6c5b1255d..675931d319f 100644 --- a/contrib/boost-cmake/CMakeLists.txt +++ b/contrib/boost-cmake/CMakeLists.txt @@ -13,11 +13,12 @@ if (NOT USE_INTERNAL_BOOST_LIBRARY) regex context coroutine + graph ) if(Boost_INCLUDE_DIR AND Boost_FILESYSTEM_LIBRARY AND Boost_FILESYSTEM_LIBRARY AND Boost_PROGRAM_OPTIONS_LIBRARY AND Boost_REGEX_LIBRARY AND Boost_SYSTEM_LIBRARY AND Boost_CONTEXT_LIBRARY AND - Boost_COROUTINE_LIBRARY) + Boost_COROUTINE_LIBRARY AND Boost_GRAPH_LIBRARY) set(EXTERNAL_BOOST_FOUND 1) @@ -32,6 +33,7 @@ if (NOT USE_INTERNAL_BOOST_LIBRARY) add_library (_boost_system INTERFACE) add_library (_boost_context INTERFACE) add_library (_boost_coroutine INTERFACE) + add_library (_boost_graph INTERFACE) target_link_libraries (_boost_filesystem INTERFACE ${Boost_FILESYSTEM_LIBRARY}) target_link_libraries (_boost_iostreams INTERFACE ${Boost_IOSTREAMS_LIBRARY}) @@ -40,6 +42,7 @@ if (NOT USE_INTERNAL_BOOST_LIBRARY) target_link_libraries (_boost_system INTERFACE ${Boost_SYSTEM_LIBRARY}) target_link_libraries (_boost_context INTERFACE ${Boost_CONTEXT_LIBRARY}) target_link_libraries (_boost_coroutine INTERFACE ${Boost_COROUTINE_LIBRARY}) + target_link_libraries (_boost_graph INTERFACE ${Boost_GRAPH_LIBRARY}) add_library (boost::filesystem ALIAS _boost_filesystem) add_library (boost::iostreams ALIAS _boost_iostreams) @@ -48,6 +51,7 @@ if (NOT USE_INTERNAL_BOOST_LIBRARY) add_library (boost::system ALIAS _boost_system) add_library (boost::context ALIAS _boost_context) add_library (boost::coroutine ALIAS _boost_coroutine) + add_library (boost::graph ALIAS _boost_graph) else() set(EXTERNAL_BOOST_FOUND 0) message (${RECONFIGURE_MESSAGE_LEVEL} "Can't find system boost") @@ -221,4 +225,17 @@ if (NOT EXTERNAL_BOOST_FOUND) add_library (boost::coroutine ALIAS _boost_coroutine) target_include_directories (_boost_coroutine PRIVATE ${LIBRARY_DIR}) target_link_libraries(_boost_coroutine PRIVATE _boost_context) + + # graph + + set (SRCS_GRAPH + "${LIBRARY_DIR}/libs/graph/src/graphml.cpp" + "${LIBRARY_DIR}/libs/graph/src/read_graphviz_new.cpp" + ) + + add_library (_boost_graph ${SRCS_GRAPH}) + add_library (boost::graph ALIAS _boost_graph) + target_include_directories (_boost_graph PRIVATE ${LIBRARY_DIR}) + target_link_libraries(_boost_graph PRIVATE _boost_regex) + endif () diff --git a/contrib/lemmagen-c b/contrib/lemmagen-c new file mode 160000 index 00000000000..59537bdcf57 --- /dev/null +++ b/contrib/lemmagen-c @@ -0,0 +1 @@ +Subproject commit 59537bdcf57bbed17913292cb4502d15657231f1 diff --git a/contrib/lemmagen-c-cmake/CMakeLists.txt b/contrib/lemmagen-c-cmake/CMakeLists.txt new file mode 100644 index 00000000000..b5b92b774e1 --- /dev/null +++ b/contrib/lemmagen-c-cmake/CMakeLists.txt @@ -0,0 +1,9 @@ +set(LIBRARY_DIR "${ClickHouse_SOURCE_DIR}/contrib/lemmagen-c") +set(LEMMAGEN_INCLUDE_DIR "${LIBRARY_DIR}/include") + +set(SRCS + "${LIBRARY_DIR}/src/RdrLemmatizer.cpp" +) + +add_library(lemmagen STATIC ${SRCS}) +target_include_directories(lemmagen PUBLIC "${LEMMAGEN_INCLUDE_DIR}") diff --git a/contrib/libstemmer-c-cmake/CMakeLists.txt b/contrib/libstemmer-c-cmake/CMakeLists.txt new file mode 100644 index 00000000000..2d38e5f3612 --- 
/dev/null +++ b/contrib/libstemmer-c-cmake/CMakeLists.txt @@ -0,0 +1,31 @@ +set(LIBRARY_DIR "${ClickHouse_SOURCE_DIR}/contrib/libstemmer_c") +set(STEMMER_INCLUDE_DIR "${LIBRARY_DIR}/include") + +FILE ( READ "${LIBRARY_DIR}/mkinc.mak" _CONTENT ) +# replace '\ ' into one big line +STRING ( REGEX REPLACE "\\\\\n " " ${LIBRARY_DIR}/" _CONTENT "${_CONTENT}" ) +# escape ';' (if any) +STRING ( REGEX REPLACE ";" "\\\\;" _CONTENT "${_CONTENT}" ) +# now replace lf into ';' (it makes list from the line) +STRING ( REGEX REPLACE "\n" ";" _CONTENT "${_CONTENT}" ) +FOREACH ( LINE ${_CONTENT} ) + # skip comments (beginning with #) + IF ( NOT "${LINE}" MATCHES "^#.*" ) + # parse 'name=value1 value2..." - extract the 'name' part + STRING ( REGEX REPLACE "=.*$" "" _NAME "${LINE}" ) + # extract the list of values part + STRING ( REGEX REPLACE "^.*=" "" _LIST "${LINE}" ) + # replace (multi)spaces into ';' (it makes list from the line) + STRING ( REGEX REPLACE " +" ";" _LIST "${_LIST}" ) + # finally get our two variables + IF ( "${_NAME}" MATCHES "snowball_sources" ) + SET ( _SOURCES "${_LIST}" ) + ELSEIF ( "${_NAME}" MATCHES "snowball_headers" ) + SET ( _HEADERS "${_LIST}" ) + ENDIF () + endif () +endforeach () + +# all the sources parsed. Now just add the lib +add_library ( stemmer STATIC ${_SOURCES} ${_HEADERS} ) +target_include_directories (stemmer PUBLIC "${STEMMER_INCLUDE_DIR}") diff --git a/contrib/libstemmer_c b/contrib/libstemmer_c new file mode 160000 index 00000000000..c753054304d --- /dev/null +++ b/contrib/libstemmer_c @@ -0,0 +1 @@ +Subproject commit c753054304d87daf460057c1a649c482aa094835 diff --git a/contrib/nuraft-cmake/CMakeLists.txt b/contrib/nuraft-cmake/CMakeLists.txt index 725e86195e1..d9e0aa6efc7 100644 --- a/contrib/nuraft-cmake/CMakeLists.txt +++ b/contrib/nuraft-cmake/CMakeLists.txt @@ -22,6 +22,7 @@ set(SRCS "${LIBRARY_DIR}/src/launcher.cxx" "${LIBRARY_DIR}/src/srv_config.cxx" "${LIBRARY_DIR}/src/snapshot_sync_req.cxx" + "${LIBRARY_DIR}/src/snapshot_sync_ctx.cxx" "${LIBRARY_DIR}/src/handle_timeout.cxx" "${LIBRARY_DIR}/src/handle_append_entries.cxx" "${LIBRARY_DIR}/src/cluster_config.cxx" diff --git a/contrib/poco b/contrib/poco index 59945069080..7351c4691b5 160000 --- a/contrib/poco +++ b/contrib/poco @@ -1 +1 @@ -Subproject commit 5994506908028612869fee627d68d8212dfe7c1e +Subproject commit 7351c4691b5d401f59e3959adfc5b4fa263b32da diff --git a/contrib/protobuf b/contrib/protobuf index 73b12814204..75601841d17 160000 --- a/contrib/protobuf +++ b/contrib/protobuf @@ -1 +1 @@ -Subproject commit 73b12814204ad9068ba352914d0dc244648b48ee +Subproject commit 75601841d172c73ae6bf4ce8121f42b875cdbabd diff --git a/contrib/rocksdb b/contrib/rocksdb index 07c77549a20..b6480c69bf3 160000 --- a/contrib/rocksdb +++ b/contrib/rocksdb @@ -1 +1 @@ -Subproject commit 07c77549a20b63ff6981b400085eba36bb5c80c4 +Subproject commit b6480c69bf3ab6e298e0d019a07fd4f69029b26a diff --git a/contrib/rocksdb-cmake/CMakeLists.txt b/contrib/rocksdb-cmake/CMakeLists.txt index bccc9ed5294..e7ff1f548e3 100644 --- a/contrib/rocksdb-cmake/CMakeLists.txt +++ b/contrib/rocksdb-cmake/CMakeLists.txt @@ -70,11 +70,6 @@ else() endif() endif() -set(BUILD_VERSION_CC rocksdb_build_version.cc) -add_library(rocksdb_build_version OBJECT ${BUILD_VERSION_CC}) - -target_include_directories(rocksdb_build_version PRIVATE "${ROCKSDB_SOURCE_DIR}/util") - include(CheckCCompilerFlag) if(CMAKE_SYSTEM_PROCESSOR MATCHES "^(powerpc|ppc)64") CHECK_C_COMPILER_FLAG("-mcpu=power9" HAS_POWER9) @@ -243,272 +238,293 @@ find_package(Threads REQUIRED) # 
Main library source code set(SOURCES - "${ROCKSDB_SOURCE_DIR}/cache/cache.cc" - "${ROCKSDB_SOURCE_DIR}/cache/clock_cache.cc" - "${ROCKSDB_SOURCE_DIR}/cache/lru_cache.cc" - "${ROCKSDB_SOURCE_DIR}/cache/sharded_cache.cc" - "${ROCKSDB_SOURCE_DIR}/db/arena_wrapped_db_iter.cc" - "${ROCKSDB_SOURCE_DIR}/db/blob/blob_file_addition.cc" - "${ROCKSDB_SOURCE_DIR}/db/blob/blob_file_builder.cc" - "${ROCKSDB_SOURCE_DIR}/db/blob/blob_file_cache.cc" - "${ROCKSDB_SOURCE_DIR}/db/blob/blob_file_garbage.cc" - "${ROCKSDB_SOURCE_DIR}/db/blob/blob_file_meta.cc" - "${ROCKSDB_SOURCE_DIR}/db/blob/blob_file_reader.cc" - "${ROCKSDB_SOURCE_DIR}/db/blob/blob_log_format.cc" - "${ROCKSDB_SOURCE_DIR}/db/blob/blob_log_sequential_reader.cc" - "${ROCKSDB_SOURCE_DIR}/db/blob/blob_log_writer.cc" - "${ROCKSDB_SOURCE_DIR}/db/builder.cc" - "${ROCKSDB_SOURCE_DIR}/db/c.cc" - "${ROCKSDB_SOURCE_DIR}/db/column_family.cc" - "${ROCKSDB_SOURCE_DIR}/db/compacted_db_impl.cc" - "${ROCKSDB_SOURCE_DIR}/db/compaction/compaction.cc" - "${ROCKSDB_SOURCE_DIR}/db/compaction/compaction_iterator.cc" - "${ROCKSDB_SOURCE_DIR}/db/compaction/compaction_picker.cc" - "${ROCKSDB_SOURCE_DIR}/db/compaction/compaction_job.cc" - "${ROCKSDB_SOURCE_DIR}/db/compaction/compaction_picker_fifo.cc" - "${ROCKSDB_SOURCE_DIR}/db/compaction/compaction_picker_level.cc" - "${ROCKSDB_SOURCE_DIR}/db/compaction/compaction_picker_universal.cc" - "${ROCKSDB_SOURCE_DIR}/db/compaction/sst_partitioner.cc" - "${ROCKSDB_SOURCE_DIR}/db/convenience.cc" - "${ROCKSDB_SOURCE_DIR}/db/db_filesnapshot.cc" - "${ROCKSDB_SOURCE_DIR}/db/db_impl/db_impl.cc" - "${ROCKSDB_SOURCE_DIR}/db/db_impl/db_impl_write.cc" - "${ROCKSDB_SOURCE_DIR}/db/db_impl/db_impl_compaction_flush.cc" - "${ROCKSDB_SOURCE_DIR}/db/db_impl/db_impl_files.cc" - "${ROCKSDB_SOURCE_DIR}/db/db_impl/db_impl_open.cc" - "${ROCKSDB_SOURCE_DIR}/db/db_impl/db_impl_debug.cc" - "${ROCKSDB_SOURCE_DIR}/db/db_impl/db_impl_experimental.cc" - "${ROCKSDB_SOURCE_DIR}/db/db_impl/db_impl_readonly.cc" - "${ROCKSDB_SOURCE_DIR}/db/db_impl/db_impl_secondary.cc" - "${ROCKSDB_SOURCE_DIR}/db/db_info_dumper.cc" - "${ROCKSDB_SOURCE_DIR}/db/db_iter.cc" - "${ROCKSDB_SOURCE_DIR}/db/dbformat.cc" - "${ROCKSDB_SOURCE_DIR}/db/error_handler.cc" - "${ROCKSDB_SOURCE_DIR}/db/event_helpers.cc" - "${ROCKSDB_SOURCE_DIR}/db/experimental.cc" - "${ROCKSDB_SOURCE_DIR}/db/external_sst_file_ingestion_job.cc" - "${ROCKSDB_SOURCE_DIR}/db/file_indexer.cc" - "${ROCKSDB_SOURCE_DIR}/db/flush_job.cc" - "${ROCKSDB_SOURCE_DIR}/db/flush_scheduler.cc" - "${ROCKSDB_SOURCE_DIR}/db/forward_iterator.cc" - "${ROCKSDB_SOURCE_DIR}/db/import_column_family_job.cc" - "${ROCKSDB_SOURCE_DIR}/db/internal_stats.cc" - "${ROCKSDB_SOURCE_DIR}/db/logs_with_prep_tracker.cc" - "${ROCKSDB_SOURCE_DIR}/db/log_reader.cc" - "${ROCKSDB_SOURCE_DIR}/db/log_writer.cc" - "${ROCKSDB_SOURCE_DIR}/db/malloc_stats.cc" - "${ROCKSDB_SOURCE_DIR}/db/memtable.cc" - "${ROCKSDB_SOURCE_DIR}/db/memtable_list.cc" - "${ROCKSDB_SOURCE_DIR}/db/merge_helper.cc" - "${ROCKSDB_SOURCE_DIR}/db/merge_operator.cc" - "${ROCKSDB_SOURCE_DIR}/db/output_validator.cc" - "${ROCKSDB_SOURCE_DIR}/db/periodic_work_scheduler.cc" - "${ROCKSDB_SOURCE_DIR}/db/range_del_aggregator.cc" - "${ROCKSDB_SOURCE_DIR}/db/range_tombstone_fragmenter.cc" - "${ROCKSDB_SOURCE_DIR}/db/repair.cc" - "${ROCKSDB_SOURCE_DIR}/db/snapshot_impl.cc" - "${ROCKSDB_SOURCE_DIR}/db/table_cache.cc" - "${ROCKSDB_SOURCE_DIR}/db/table_properties_collector.cc" - "${ROCKSDB_SOURCE_DIR}/db/transaction_log_impl.cc" - "${ROCKSDB_SOURCE_DIR}/db/trim_history_scheduler.cc" - 
"${ROCKSDB_SOURCE_DIR}/db/version_builder.cc" - "${ROCKSDB_SOURCE_DIR}/db/version_edit.cc" - "${ROCKSDB_SOURCE_DIR}/db/version_edit_handler.cc" - "${ROCKSDB_SOURCE_DIR}/db/version_set.cc" - "${ROCKSDB_SOURCE_DIR}/db/wal_edit.cc" - "${ROCKSDB_SOURCE_DIR}/db/wal_manager.cc" - "${ROCKSDB_SOURCE_DIR}/db/write_batch.cc" - "${ROCKSDB_SOURCE_DIR}/db/write_batch_base.cc" - "${ROCKSDB_SOURCE_DIR}/db/write_controller.cc" - "${ROCKSDB_SOURCE_DIR}/db/write_thread.cc" - "${ROCKSDB_SOURCE_DIR}/env/env.cc" - "${ROCKSDB_SOURCE_DIR}/env/env_chroot.cc" - "${ROCKSDB_SOURCE_DIR}/env/env_encryption.cc" - "${ROCKSDB_SOURCE_DIR}/env/env_hdfs.cc" - "${ROCKSDB_SOURCE_DIR}/env/file_system.cc" - "${ROCKSDB_SOURCE_DIR}/env/file_system_tracer.cc" - "${ROCKSDB_SOURCE_DIR}/env/mock_env.cc" - "${ROCKSDB_SOURCE_DIR}/file/delete_scheduler.cc" - "${ROCKSDB_SOURCE_DIR}/file/file_prefetch_buffer.cc" - "${ROCKSDB_SOURCE_DIR}/file/file_util.cc" - "${ROCKSDB_SOURCE_DIR}/file/filename.cc" - "${ROCKSDB_SOURCE_DIR}/file/random_access_file_reader.cc" - "${ROCKSDB_SOURCE_DIR}/file/read_write_util.cc" - "${ROCKSDB_SOURCE_DIR}/file/readahead_raf.cc" - "${ROCKSDB_SOURCE_DIR}/file/sequence_file_reader.cc" - "${ROCKSDB_SOURCE_DIR}/file/sst_file_manager_impl.cc" - "${ROCKSDB_SOURCE_DIR}/file/writable_file_writer.cc" - "${ROCKSDB_SOURCE_DIR}/logging/auto_roll_logger.cc" - "${ROCKSDB_SOURCE_DIR}/logging/event_logger.cc" - "${ROCKSDB_SOURCE_DIR}/logging/log_buffer.cc" - "${ROCKSDB_SOURCE_DIR}/memory/arena.cc" - "${ROCKSDB_SOURCE_DIR}/memory/concurrent_arena.cc" - "${ROCKSDB_SOURCE_DIR}/memory/jemalloc_nodump_allocator.cc" - "${ROCKSDB_SOURCE_DIR}/memory/memkind_kmem_allocator.cc" - "${ROCKSDB_SOURCE_DIR}/memtable/alloc_tracker.cc" - "${ROCKSDB_SOURCE_DIR}/memtable/hash_linklist_rep.cc" - "${ROCKSDB_SOURCE_DIR}/memtable/hash_skiplist_rep.cc" - "${ROCKSDB_SOURCE_DIR}/memtable/skiplistrep.cc" - "${ROCKSDB_SOURCE_DIR}/memtable/vectorrep.cc" - "${ROCKSDB_SOURCE_DIR}/memtable/write_buffer_manager.cc" - "${ROCKSDB_SOURCE_DIR}/monitoring/histogram.cc" - "${ROCKSDB_SOURCE_DIR}/monitoring/histogram_windowing.cc" - "${ROCKSDB_SOURCE_DIR}/monitoring/in_memory_stats_history.cc" - "${ROCKSDB_SOURCE_DIR}/monitoring/instrumented_mutex.cc" - "${ROCKSDB_SOURCE_DIR}/monitoring/iostats_context.cc" - "${ROCKSDB_SOURCE_DIR}/monitoring/perf_context.cc" - "${ROCKSDB_SOURCE_DIR}/monitoring/perf_level.cc" - "${ROCKSDB_SOURCE_DIR}/monitoring/persistent_stats_history.cc" - "${ROCKSDB_SOURCE_DIR}/monitoring/statistics.cc" - "${ROCKSDB_SOURCE_DIR}/monitoring/thread_status_impl.cc" - "${ROCKSDB_SOURCE_DIR}/monitoring/thread_status_updater.cc" - "${ROCKSDB_SOURCE_DIR}/monitoring/thread_status_util.cc" - "${ROCKSDB_SOURCE_DIR}/monitoring/thread_status_util_debug.cc" - "${ROCKSDB_SOURCE_DIR}/options/cf_options.cc" - "${ROCKSDB_SOURCE_DIR}/options/configurable.cc" - "${ROCKSDB_SOURCE_DIR}/options/customizable.cc" - "${ROCKSDB_SOURCE_DIR}/options/db_options.cc" - "${ROCKSDB_SOURCE_DIR}/options/options.cc" - "${ROCKSDB_SOURCE_DIR}/options/options_helper.cc" - "${ROCKSDB_SOURCE_DIR}/options/options_parser.cc" - "${ROCKSDB_SOURCE_DIR}/port/stack_trace.cc" - "${ROCKSDB_SOURCE_DIR}/table/adaptive/adaptive_table_factory.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/binary_search_index_reader.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/block.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/block_based_filter_block.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/block_based_table_builder.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/block_based_table_factory.cc" - 
"${ROCKSDB_SOURCE_DIR}/table/block_based/block_based_table_iterator.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/block_based_table_reader.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/block_builder.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/block_prefetcher.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/block_prefix_index.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/data_block_hash_index.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/data_block_footer.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/filter_block_reader_common.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/filter_policy.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/flush_block_policy.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/full_filter_block.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/hash_index_reader.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/index_builder.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/index_reader_common.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/parsed_full_filter_block.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/partitioned_filter_block.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/partitioned_index_iterator.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/partitioned_index_reader.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/reader_common.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_based/uncompression_dict_reader.cc" - "${ROCKSDB_SOURCE_DIR}/table/block_fetcher.cc" - "${ROCKSDB_SOURCE_DIR}/table/cuckoo/cuckoo_table_builder.cc" - "${ROCKSDB_SOURCE_DIR}/table/cuckoo/cuckoo_table_factory.cc" - "${ROCKSDB_SOURCE_DIR}/table/cuckoo/cuckoo_table_reader.cc" - "${ROCKSDB_SOURCE_DIR}/table/format.cc" - "${ROCKSDB_SOURCE_DIR}/table/get_context.cc" - "${ROCKSDB_SOURCE_DIR}/table/iterator.cc" - "${ROCKSDB_SOURCE_DIR}/table/merging_iterator.cc" - "${ROCKSDB_SOURCE_DIR}/table/meta_blocks.cc" - "${ROCKSDB_SOURCE_DIR}/table/persistent_cache_helper.cc" - "${ROCKSDB_SOURCE_DIR}/table/plain/plain_table_bloom.cc" - "${ROCKSDB_SOURCE_DIR}/table/plain/plain_table_builder.cc" - "${ROCKSDB_SOURCE_DIR}/table/plain/plain_table_factory.cc" - "${ROCKSDB_SOURCE_DIR}/table/plain/plain_table_index.cc" - "${ROCKSDB_SOURCE_DIR}/table/plain/plain_table_key_coding.cc" - "${ROCKSDB_SOURCE_DIR}/table/plain/plain_table_reader.cc" - "${ROCKSDB_SOURCE_DIR}/table/sst_file_dumper.cc" - "${ROCKSDB_SOURCE_DIR}/table/sst_file_reader.cc" - "${ROCKSDB_SOURCE_DIR}/table/sst_file_writer.cc" - "${ROCKSDB_SOURCE_DIR}/table/table_factory.cc" - "${ROCKSDB_SOURCE_DIR}/table/table_properties.cc" - "${ROCKSDB_SOURCE_DIR}/table/two_level_iterator.cc" - "${ROCKSDB_SOURCE_DIR}/test_util/sync_point.cc" - "${ROCKSDB_SOURCE_DIR}/test_util/sync_point_impl.cc" - "${ROCKSDB_SOURCE_DIR}/test_util/testutil.cc" - "${ROCKSDB_SOURCE_DIR}/test_util/transaction_test_util.cc" - "${ROCKSDB_SOURCE_DIR}/tools/block_cache_analyzer/block_cache_trace_analyzer.cc" - "${ROCKSDB_SOURCE_DIR}/tools/dump/db_dump_tool.cc" - "${ROCKSDB_SOURCE_DIR}/tools/io_tracer_parser_tool.cc" - "${ROCKSDB_SOURCE_DIR}/tools/ldb_cmd.cc" - "${ROCKSDB_SOURCE_DIR}/tools/ldb_tool.cc" - "${ROCKSDB_SOURCE_DIR}/tools/sst_dump_tool.cc" - "${ROCKSDB_SOURCE_DIR}/tools/trace_analyzer_tool.cc" - "${ROCKSDB_SOURCE_DIR}/trace_replay/trace_replay.cc" - "${ROCKSDB_SOURCE_DIR}/trace_replay/block_cache_tracer.cc" - "${ROCKSDB_SOURCE_DIR}/trace_replay/io_tracer.cc" - "${ROCKSDB_SOURCE_DIR}/util/coding.cc" - "${ROCKSDB_SOURCE_DIR}/util/compaction_job_stats_impl.cc" - "${ROCKSDB_SOURCE_DIR}/util/comparator.cc" - "${ROCKSDB_SOURCE_DIR}/util/compression_context_cache.cc" - 
"${ROCKSDB_SOURCE_DIR}/util/concurrent_task_limiter_impl.cc" - "${ROCKSDB_SOURCE_DIR}/util/crc32c.cc" - "${ROCKSDB_SOURCE_DIR}/util/dynamic_bloom.cc" - "${ROCKSDB_SOURCE_DIR}/util/hash.cc" - "${ROCKSDB_SOURCE_DIR}/util/murmurhash.cc" - "${ROCKSDB_SOURCE_DIR}/util/random.cc" - "${ROCKSDB_SOURCE_DIR}/util/rate_limiter.cc" - "${ROCKSDB_SOURCE_DIR}/util/slice.cc" - "${ROCKSDB_SOURCE_DIR}/util/file_checksum_helper.cc" - "${ROCKSDB_SOURCE_DIR}/util/status.cc" - "${ROCKSDB_SOURCE_DIR}/util/string_util.cc" - "${ROCKSDB_SOURCE_DIR}/util/thread_local.cc" - "${ROCKSDB_SOURCE_DIR}/util/threadpool_imp.cc" - "${ROCKSDB_SOURCE_DIR}/util/xxhash.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/backupable/backupable_db.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/blob_db/blob_compaction_filter.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/blob_db/blob_db.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/blob_db/blob_db_impl.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/blob_db/blob_db_impl_filesnapshot.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/blob_db/blob_dump_tool.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/blob_db/blob_file.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/cassandra/cassandra_compaction_filter.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/cassandra/format.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/cassandra/merge_operator.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/checkpoint/checkpoint_impl.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/compaction_filters/remove_emptyvalue_compactionfilter.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/debug.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/env_mirror.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/env_timed.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/fault_injection_env.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/fault_injection_fs.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/leveldb_options/leveldb_options.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/memory/memory_util.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/merge_operators/bytesxor.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/merge_operators/max.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/merge_operators/put.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/merge_operators/sortlist.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/merge_operators/string_append/stringappend.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/merge_operators/string_append/stringappend2.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/merge_operators/uint64add.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/object_registry.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/option_change_migration/option_change_migration.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/options/options_util.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/persistent_cache/block_cache_tier.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/persistent_cache/block_cache_tier_file.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/persistent_cache/block_cache_tier_metadata.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/persistent_cache/persistent_cache_tier.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/persistent_cache/volatile_tier_impl.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/simulator_cache/cache_simulator.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/simulator_cache/sim_cache.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/table_properties_collectors/compact_on_deletion_collector.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/trace/file_trace_reader_writer.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/transactions/lock/lock_manager.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/transactions/lock/point/point_lock_tracker.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/transactions/lock/point/point_lock_manager.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/transactions/optimistic_transaction_db_impl.cc" - 
"${ROCKSDB_SOURCE_DIR}/utilities/transactions/optimistic_transaction.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/transactions/pessimistic_transaction.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/transactions/pessimistic_transaction_db.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/transactions/snapshot_checker.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/transactions/transaction_base.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/transactions/transaction_db_mutex_impl.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/transactions/transaction_util.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/transactions/write_prepared_txn.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/transactions/write_prepared_txn_db.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/transactions/write_unprepared_txn.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/transactions/write_unprepared_txn_db.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/ttl/db_ttl_impl.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/write_batch_with_index/write_batch_with_index.cc" - "${ROCKSDB_SOURCE_DIR}/utilities/write_batch_with_index/write_batch_with_index_internal.cc" - $) + ${ROCKSDB_SOURCE_DIR}/cache/cache.cc + ${ROCKSDB_SOURCE_DIR}/cache/cache_entry_roles.cc + ${ROCKSDB_SOURCE_DIR}/cache/clock_cache.cc + ${ROCKSDB_SOURCE_DIR}/cache/lru_cache.cc + ${ROCKSDB_SOURCE_DIR}/cache/sharded_cache.cc + ${ROCKSDB_SOURCE_DIR}/db/arena_wrapped_db_iter.cc + ${ROCKSDB_SOURCE_DIR}/db/blob/blob_fetcher.cc + ${ROCKSDB_SOURCE_DIR}/db/blob/blob_file_addition.cc + ${ROCKSDB_SOURCE_DIR}/db/blob/blob_file_builder.cc + ${ROCKSDB_SOURCE_DIR}/db/blob/blob_file_cache.cc + ${ROCKSDB_SOURCE_DIR}/db/blob/blob_file_garbage.cc + ${ROCKSDB_SOURCE_DIR}/db/blob/blob_file_meta.cc + ${ROCKSDB_SOURCE_DIR}/db/blob/blob_file_reader.cc + ${ROCKSDB_SOURCE_DIR}/db/blob/blob_garbage_meter.cc + ${ROCKSDB_SOURCE_DIR}/db/blob/blob_log_format.cc + ${ROCKSDB_SOURCE_DIR}/db/blob/blob_log_sequential_reader.cc + ${ROCKSDB_SOURCE_DIR}/db/blob/blob_log_writer.cc + ${ROCKSDB_SOURCE_DIR}/db/builder.cc + ${ROCKSDB_SOURCE_DIR}/db/c.cc + ${ROCKSDB_SOURCE_DIR}/db/column_family.cc + ${ROCKSDB_SOURCE_DIR}/db/compaction/compaction.cc + ${ROCKSDB_SOURCE_DIR}/db/compaction/compaction_iterator.cc + ${ROCKSDB_SOURCE_DIR}/db/compaction/compaction_picker.cc + ${ROCKSDB_SOURCE_DIR}/db/compaction/compaction_job.cc + ${ROCKSDB_SOURCE_DIR}/db/compaction/compaction_picker_fifo.cc + ${ROCKSDB_SOURCE_DIR}/db/compaction/compaction_picker_level.cc + ${ROCKSDB_SOURCE_DIR}/db/compaction/compaction_picker_universal.cc + ${ROCKSDB_SOURCE_DIR}/db/compaction/sst_partitioner.cc + ${ROCKSDB_SOURCE_DIR}/db/convenience.cc + ${ROCKSDB_SOURCE_DIR}/db/db_filesnapshot.cc + ${ROCKSDB_SOURCE_DIR}/db/db_impl/compacted_db_impl.cc + ${ROCKSDB_SOURCE_DIR}/db/db_impl/db_impl.cc + ${ROCKSDB_SOURCE_DIR}/db/db_impl/db_impl_write.cc + ${ROCKSDB_SOURCE_DIR}/db/db_impl/db_impl_compaction_flush.cc + ${ROCKSDB_SOURCE_DIR}/db/db_impl/db_impl_files.cc + ${ROCKSDB_SOURCE_DIR}/db/db_impl/db_impl_open.cc + ${ROCKSDB_SOURCE_DIR}/db/db_impl/db_impl_debug.cc + ${ROCKSDB_SOURCE_DIR}/db/db_impl/db_impl_experimental.cc + ${ROCKSDB_SOURCE_DIR}/db/db_impl/db_impl_readonly.cc + ${ROCKSDB_SOURCE_DIR}/db/db_impl/db_impl_secondary.cc + ${ROCKSDB_SOURCE_DIR}/db/db_info_dumper.cc + ${ROCKSDB_SOURCE_DIR}/db/db_iter.cc + ${ROCKSDB_SOURCE_DIR}/db/dbformat.cc + ${ROCKSDB_SOURCE_DIR}/db/error_handler.cc + ${ROCKSDB_SOURCE_DIR}/db/event_helpers.cc + ${ROCKSDB_SOURCE_DIR}/db/experimental.cc + ${ROCKSDB_SOURCE_DIR}/db/external_sst_file_ingestion_job.cc + ${ROCKSDB_SOURCE_DIR}/db/file_indexer.cc + ${ROCKSDB_SOURCE_DIR}/db/flush_job.cc + 
${ROCKSDB_SOURCE_DIR}/db/flush_scheduler.cc + ${ROCKSDB_SOURCE_DIR}/db/forward_iterator.cc + ${ROCKSDB_SOURCE_DIR}/db/import_column_family_job.cc + ${ROCKSDB_SOURCE_DIR}/db/internal_stats.cc + ${ROCKSDB_SOURCE_DIR}/db/logs_with_prep_tracker.cc + ${ROCKSDB_SOURCE_DIR}/db/log_reader.cc + ${ROCKSDB_SOURCE_DIR}/db/log_writer.cc + ${ROCKSDB_SOURCE_DIR}/db/malloc_stats.cc + ${ROCKSDB_SOURCE_DIR}/db/memtable.cc + ${ROCKSDB_SOURCE_DIR}/db/memtable_list.cc + ${ROCKSDB_SOURCE_DIR}/db/merge_helper.cc + ${ROCKSDB_SOURCE_DIR}/db/merge_operator.cc + ${ROCKSDB_SOURCE_DIR}/db/output_validator.cc + ${ROCKSDB_SOURCE_DIR}/db/periodic_work_scheduler.cc + ${ROCKSDB_SOURCE_DIR}/db/range_del_aggregator.cc + ${ROCKSDB_SOURCE_DIR}/db/range_tombstone_fragmenter.cc + ${ROCKSDB_SOURCE_DIR}/db/repair.cc + ${ROCKSDB_SOURCE_DIR}/db/snapshot_impl.cc + ${ROCKSDB_SOURCE_DIR}/db/table_cache.cc + ${ROCKSDB_SOURCE_DIR}/db/table_properties_collector.cc + ${ROCKSDB_SOURCE_DIR}/db/transaction_log_impl.cc + ${ROCKSDB_SOURCE_DIR}/db/trim_history_scheduler.cc + ${ROCKSDB_SOURCE_DIR}/db/version_builder.cc + ${ROCKSDB_SOURCE_DIR}/db/version_edit.cc + ${ROCKSDB_SOURCE_DIR}/db/version_edit_handler.cc + ${ROCKSDB_SOURCE_DIR}/db/version_set.cc + ${ROCKSDB_SOURCE_DIR}/db/wal_edit.cc + ${ROCKSDB_SOURCE_DIR}/db/wal_manager.cc + ${ROCKSDB_SOURCE_DIR}/db/write_batch.cc + ${ROCKSDB_SOURCE_DIR}/db/write_batch_base.cc + ${ROCKSDB_SOURCE_DIR}/db/write_controller.cc + ${ROCKSDB_SOURCE_DIR}/db/write_thread.cc + ${ROCKSDB_SOURCE_DIR}/env/composite_env.cc + ${ROCKSDB_SOURCE_DIR}/env/env.cc + ${ROCKSDB_SOURCE_DIR}/env/env_chroot.cc + ${ROCKSDB_SOURCE_DIR}/env/env_encryption.cc + ${ROCKSDB_SOURCE_DIR}/env/env_hdfs.cc + ${ROCKSDB_SOURCE_DIR}/env/file_system.cc + ${ROCKSDB_SOURCE_DIR}/env/file_system_tracer.cc + ${ROCKSDB_SOURCE_DIR}/env/fs_remap.cc + ${ROCKSDB_SOURCE_DIR}/env/mock_env.cc + ${ROCKSDB_SOURCE_DIR}/file/delete_scheduler.cc + ${ROCKSDB_SOURCE_DIR}/file/file_prefetch_buffer.cc + ${ROCKSDB_SOURCE_DIR}/file/file_util.cc + ${ROCKSDB_SOURCE_DIR}/file/filename.cc + ${ROCKSDB_SOURCE_DIR}/file/line_file_reader.cc + ${ROCKSDB_SOURCE_DIR}/file/random_access_file_reader.cc + ${ROCKSDB_SOURCE_DIR}/file/read_write_util.cc + ${ROCKSDB_SOURCE_DIR}/file/readahead_raf.cc + ${ROCKSDB_SOURCE_DIR}/file/sequence_file_reader.cc + ${ROCKSDB_SOURCE_DIR}/file/sst_file_manager_impl.cc + ${ROCKSDB_SOURCE_DIR}/file/writable_file_writer.cc + ${ROCKSDB_SOURCE_DIR}/logging/auto_roll_logger.cc + ${ROCKSDB_SOURCE_DIR}/logging/event_logger.cc + ${ROCKSDB_SOURCE_DIR}/logging/log_buffer.cc + ${ROCKSDB_SOURCE_DIR}/memory/arena.cc + ${ROCKSDB_SOURCE_DIR}/memory/concurrent_arena.cc + ${ROCKSDB_SOURCE_DIR}/memory/jemalloc_nodump_allocator.cc + ${ROCKSDB_SOURCE_DIR}/memory/memkind_kmem_allocator.cc + ${ROCKSDB_SOURCE_DIR}/memtable/alloc_tracker.cc + ${ROCKSDB_SOURCE_DIR}/memtable/hash_linklist_rep.cc + ${ROCKSDB_SOURCE_DIR}/memtable/hash_skiplist_rep.cc + ${ROCKSDB_SOURCE_DIR}/memtable/skiplistrep.cc + ${ROCKSDB_SOURCE_DIR}/memtable/vectorrep.cc + ${ROCKSDB_SOURCE_DIR}/memtable/write_buffer_manager.cc + ${ROCKSDB_SOURCE_DIR}/monitoring/histogram.cc + ${ROCKSDB_SOURCE_DIR}/monitoring/histogram_windowing.cc + ${ROCKSDB_SOURCE_DIR}/monitoring/in_memory_stats_history.cc + ${ROCKSDB_SOURCE_DIR}/monitoring/instrumented_mutex.cc + ${ROCKSDB_SOURCE_DIR}/monitoring/iostats_context.cc + ${ROCKSDB_SOURCE_DIR}/monitoring/perf_context.cc + ${ROCKSDB_SOURCE_DIR}/monitoring/perf_level.cc + ${ROCKSDB_SOURCE_DIR}/monitoring/persistent_stats_history.cc + 
${ROCKSDB_SOURCE_DIR}/monitoring/statistics.cc + ${ROCKSDB_SOURCE_DIR}/monitoring/thread_status_impl.cc + ${ROCKSDB_SOURCE_DIR}/monitoring/thread_status_updater.cc + ${ROCKSDB_SOURCE_DIR}/monitoring/thread_status_util.cc + ${ROCKSDB_SOURCE_DIR}/monitoring/thread_status_util_debug.cc + ${ROCKSDB_SOURCE_DIR}/options/cf_options.cc + ${ROCKSDB_SOURCE_DIR}/options/configurable.cc + ${ROCKSDB_SOURCE_DIR}/options/customizable.cc + ${ROCKSDB_SOURCE_DIR}/options/db_options.cc + ${ROCKSDB_SOURCE_DIR}/options/options.cc + ${ROCKSDB_SOURCE_DIR}/options/options_helper.cc + ${ROCKSDB_SOURCE_DIR}/options/options_parser.cc + ${ROCKSDB_SOURCE_DIR}/port/stack_trace.cc + ${ROCKSDB_SOURCE_DIR}/table/adaptive/adaptive_table_factory.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/binary_search_index_reader.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/block.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/block_based_filter_block.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/block_based_table_builder.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/block_based_table_factory.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/block_based_table_iterator.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/block_based_table_reader.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/block_builder.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/block_prefetcher.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/block_prefix_index.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/data_block_hash_index.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/data_block_footer.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/filter_block_reader_common.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/filter_policy.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/flush_block_policy.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/full_filter_block.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/hash_index_reader.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/index_builder.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/index_reader_common.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/parsed_full_filter_block.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/partitioned_filter_block.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/partitioned_index_iterator.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/partitioned_index_reader.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/reader_common.cc + ${ROCKSDB_SOURCE_DIR}/table/block_based/uncompression_dict_reader.cc + ${ROCKSDB_SOURCE_DIR}/table/block_fetcher.cc + ${ROCKSDB_SOURCE_DIR}/table/cuckoo/cuckoo_table_builder.cc + ${ROCKSDB_SOURCE_DIR}/table/cuckoo/cuckoo_table_factory.cc + ${ROCKSDB_SOURCE_DIR}/table/cuckoo/cuckoo_table_reader.cc + ${ROCKSDB_SOURCE_DIR}/table/format.cc + ${ROCKSDB_SOURCE_DIR}/table/get_context.cc + ${ROCKSDB_SOURCE_DIR}/table/iterator.cc + ${ROCKSDB_SOURCE_DIR}/table/merging_iterator.cc + ${ROCKSDB_SOURCE_DIR}/table/meta_blocks.cc + ${ROCKSDB_SOURCE_DIR}/table/persistent_cache_helper.cc + ${ROCKSDB_SOURCE_DIR}/table/plain/plain_table_bloom.cc + ${ROCKSDB_SOURCE_DIR}/table/plain/plain_table_builder.cc + ${ROCKSDB_SOURCE_DIR}/table/plain/plain_table_factory.cc + ${ROCKSDB_SOURCE_DIR}/table/plain/plain_table_index.cc + ${ROCKSDB_SOURCE_DIR}/table/plain/plain_table_key_coding.cc + ${ROCKSDB_SOURCE_DIR}/table/plain/plain_table_reader.cc + ${ROCKSDB_SOURCE_DIR}/table/sst_file_dumper.cc + ${ROCKSDB_SOURCE_DIR}/table/sst_file_reader.cc + ${ROCKSDB_SOURCE_DIR}/table/sst_file_writer.cc + ${ROCKSDB_SOURCE_DIR}/table/table_factory.cc + ${ROCKSDB_SOURCE_DIR}/table/table_properties.cc + ${ROCKSDB_SOURCE_DIR}/table/two_level_iterator.cc + 
${ROCKSDB_SOURCE_DIR}/test_util/sync_point.cc + ${ROCKSDB_SOURCE_DIR}/test_util/sync_point_impl.cc + ${ROCKSDB_SOURCE_DIR}/test_util/testutil.cc + ${ROCKSDB_SOURCE_DIR}/test_util/transaction_test_util.cc + ${ROCKSDB_SOURCE_DIR}/tools/block_cache_analyzer/block_cache_trace_analyzer.cc + ${ROCKSDB_SOURCE_DIR}/tools/dump/db_dump_tool.cc + ${ROCKSDB_SOURCE_DIR}/tools/io_tracer_parser_tool.cc + ${ROCKSDB_SOURCE_DIR}/tools/ldb_cmd.cc + ${ROCKSDB_SOURCE_DIR}/tools/ldb_tool.cc + ${ROCKSDB_SOURCE_DIR}/tools/sst_dump_tool.cc + ${ROCKSDB_SOURCE_DIR}/tools/trace_analyzer_tool.cc + ${ROCKSDB_SOURCE_DIR}/trace_replay/trace_replay.cc + ${ROCKSDB_SOURCE_DIR}/trace_replay/block_cache_tracer.cc + ${ROCKSDB_SOURCE_DIR}/trace_replay/io_tracer.cc + ${ROCKSDB_SOURCE_DIR}/util/coding.cc + ${ROCKSDB_SOURCE_DIR}/util/compaction_job_stats_impl.cc + ${ROCKSDB_SOURCE_DIR}/util/comparator.cc + ${ROCKSDB_SOURCE_DIR}/util/compression_context_cache.cc + ${ROCKSDB_SOURCE_DIR}/util/concurrent_task_limiter_impl.cc + ${ROCKSDB_SOURCE_DIR}/util/crc32c.cc + ${ROCKSDB_SOURCE_DIR}/util/dynamic_bloom.cc + ${ROCKSDB_SOURCE_DIR}/util/hash.cc + ${ROCKSDB_SOURCE_DIR}/util/murmurhash.cc + ${ROCKSDB_SOURCE_DIR}/util/random.cc + ${ROCKSDB_SOURCE_DIR}/util/rate_limiter.cc + ${ROCKSDB_SOURCE_DIR}/util/ribbon_config.cc + ${ROCKSDB_SOURCE_DIR}/util/slice.cc + ${ROCKSDB_SOURCE_DIR}/util/file_checksum_helper.cc + ${ROCKSDB_SOURCE_DIR}/util/status.cc + ${ROCKSDB_SOURCE_DIR}/util/string_util.cc + ${ROCKSDB_SOURCE_DIR}/util/thread_local.cc + ${ROCKSDB_SOURCE_DIR}/util/threadpool_imp.cc + ${ROCKSDB_SOURCE_DIR}/util/xxhash.cc + ${ROCKSDB_SOURCE_DIR}/utilities/backupable/backupable_db.cc + ${ROCKSDB_SOURCE_DIR}/utilities/blob_db/blob_compaction_filter.cc + ${ROCKSDB_SOURCE_DIR}/utilities/blob_db/blob_db.cc + ${ROCKSDB_SOURCE_DIR}/utilities/blob_db/blob_db_impl.cc + ${ROCKSDB_SOURCE_DIR}/utilities/blob_db/blob_db_impl_filesnapshot.cc + ${ROCKSDB_SOURCE_DIR}/utilities/blob_db/blob_dump_tool.cc + ${ROCKSDB_SOURCE_DIR}/utilities/blob_db/blob_file.cc + ${ROCKSDB_SOURCE_DIR}/utilities/cassandra/cassandra_compaction_filter.cc + ${ROCKSDB_SOURCE_DIR}/utilities/cassandra/format.cc + ${ROCKSDB_SOURCE_DIR}/utilities/cassandra/merge_operator.cc + ${ROCKSDB_SOURCE_DIR}/utilities/checkpoint/checkpoint_impl.cc + ${ROCKSDB_SOURCE_DIR}/utilities/compaction_filters/remove_emptyvalue_compactionfilter.cc + ${ROCKSDB_SOURCE_DIR}/utilities/debug.cc + ${ROCKSDB_SOURCE_DIR}/utilities/env_mirror.cc + ${ROCKSDB_SOURCE_DIR}/utilities/env_timed.cc + ${ROCKSDB_SOURCE_DIR}/utilities/fault_injection_env.cc + ${ROCKSDB_SOURCE_DIR}/utilities/fault_injection_fs.cc + ${ROCKSDB_SOURCE_DIR}/utilities/leveldb_options/leveldb_options.cc + ${ROCKSDB_SOURCE_DIR}/utilities/memory/memory_util.cc + ${ROCKSDB_SOURCE_DIR}/utilities/merge_operators/bytesxor.cc + ${ROCKSDB_SOURCE_DIR}/utilities/merge_operators/max.cc + ${ROCKSDB_SOURCE_DIR}/utilities/merge_operators/put.cc + ${ROCKSDB_SOURCE_DIR}/utilities/merge_operators/sortlist.cc + ${ROCKSDB_SOURCE_DIR}/utilities/merge_operators/string_append/stringappend.cc + ${ROCKSDB_SOURCE_DIR}/utilities/merge_operators/string_append/stringappend2.cc + ${ROCKSDB_SOURCE_DIR}/utilities/merge_operators/uint64add.cc + ${ROCKSDB_SOURCE_DIR}/utilities/object_registry.cc + ${ROCKSDB_SOURCE_DIR}/utilities/option_change_migration/option_change_migration.cc + ${ROCKSDB_SOURCE_DIR}/utilities/options/options_util.cc + ${ROCKSDB_SOURCE_DIR}/utilities/persistent_cache/block_cache_tier.cc + ${ROCKSDB_SOURCE_DIR}/utilities/persistent_cache/block_cache_tier_file.cc + 
${ROCKSDB_SOURCE_DIR}/utilities/persistent_cache/block_cache_tier_metadata.cc + ${ROCKSDB_SOURCE_DIR}/utilities/persistent_cache/persistent_cache_tier.cc + ${ROCKSDB_SOURCE_DIR}/utilities/persistent_cache/volatile_tier_impl.cc + ${ROCKSDB_SOURCE_DIR}/utilities/simulator_cache/cache_simulator.cc + ${ROCKSDB_SOURCE_DIR}/utilities/simulator_cache/sim_cache.cc + ${ROCKSDB_SOURCE_DIR}/utilities/table_properties_collectors/compact_on_deletion_collector.cc + ${ROCKSDB_SOURCE_DIR}/utilities/trace/file_trace_reader_writer.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/lock/lock_manager.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/lock/point/point_lock_tracker.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/lock/point/point_lock_manager.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/lock/range/range_tree/range_tree_lock_manager.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/lock/range/range_tree/range_tree_lock_tracker.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/optimistic_transaction_db_impl.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/optimistic_transaction.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/pessimistic_transaction.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/pessimistic_transaction_db.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/snapshot_checker.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/transaction_base.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/transaction_db_mutex_impl.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/transaction_util.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/write_prepared_txn.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/write_prepared_txn_db.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/write_unprepared_txn.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/write_unprepared_txn_db.cc + ${ROCKSDB_SOURCE_DIR}/utilities/ttl/db_ttl_impl.cc + ${ROCKSDB_SOURCE_DIR}/utilities/write_batch_with_index/write_batch_with_index.cc + ${ROCKSDB_SOURCE_DIR}/utilities/write_batch_with_index/write_batch_with_index_internal.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/lock/range/range_tree/lib/locktree/concurrent_tree.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/lock/range/range_tree/lib/locktree/keyrange.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/lock/range/range_tree/lib/locktree/lock_request.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/lock/range/range_tree/lib/locktree/locktree.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/lock/range/range_tree/lib/locktree/manager.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/lock/range/range_tree/lib/locktree/range_buffer.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/lock/range/range_tree/lib/locktree/treenode.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/lock/range/range_tree/lib/locktree/txnid_set.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/lock/range/range_tree/lib/locktree/wfg.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/lock/range/range_tree/lib/standalone_port.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/lock/range/range_tree/lib/util/dbt.cc + ${ROCKSDB_SOURCE_DIR}/utilities/transactions/lock/range/range_tree/lib/util/memarena.cc + rocksdb_build_version.cc) if(HAVE_SSE42 AND NOT MSVC) set_source_files_properties( diff --git a/contrib/rocksdb-cmake/rocksdb_build_version.cc b/contrib/rocksdb-cmake/rocksdb_build_version.cc index 8697652ae9f..f9639da516f 100644 --- a/contrib/rocksdb-cmake/rocksdb_build_version.cc +++ b/contrib/rocksdb-cmake/rocksdb_build_version.cc @@ -1,3 +1,62 @@ -const char* rocksdb_build_git_sha = 
"rocksdb_build_git_sha:0"; -const char* rocksdb_build_git_date = "rocksdb_build_git_date:2000-01-01"; -const char* rocksdb_build_compile_date = "2000-01-01"; +// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved. +/// This file was edited for ClickHouse. + +#include + +#include "rocksdb/version.h" +#include "util/string_util.h" + +// The build script may replace these values with real values based +// on whether or not GIT is available and the platform settings +static const std::string rocksdb_build_git_sha = "rocksdb_build_git_sha:0"; +static const std::string rocksdb_build_git_tag = "rocksdb_build_git_tag:master"; +static const std::string rocksdb_build_date = "rocksdb_build_date:2000-01-01"; + +namespace ROCKSDB_NAMESPACE { +static void AddProperty(std::unordered_map *props, const std::string& name) { + size_t colon = name.find(":"); + if (colon != std::string::npos && colon > 0 && colon < name.length() - 1) { + // If we found a "@:", then this property was a build-time substitution that failed. Skip it + size_t at = name.find("@", colon); + if (at != colon + 1) { + // Everything before the colon is the name, after is the value + (*props)[name.substr(0, colon)] = name.substr(colon + 1); + } + } +} + +static std::unordered_map* LoadPropertiesSet() { + auto * properties = new std::unordered_map(); + AddProperty(properties, rocksdb_build_git_sha); + AddProperty(properties, rocksdb_build_git_tag); + AddProperty(properties, rocksdb_build_date); + return properties; +} + +const std::unordered_map& GetRocksBuildProperties() { + static std::unique_ptr> props(LoadPropertiesSet()); + return *props; +} + +std::string GetRocksVersionAsString(bool with_patch) { + std::string version = ToString(ROCKSDB_MAJOR) + "." + ToString(ROCKSDB_MINOR); + if (with_patch) { + return version + "." 
+ ToString(ROCKSDB_PATCH); + } else { + return version; + } +} + +std::string GetRocksBuildInfoAsString(const std::string& program, bool verbose) { + std::string info = program + " (RocksDB) " + GetRocksVersionAsString(true); + if (verbose) { + for (const auto& it : GetRocksBuildProperties()) { + info.append("\n "); + info.append(it.first); + info.append(": "); + info.append(it.second); + } + } + return info; +} +} // namespace ROCKSDB_NAMESPACE diff --git a/contrib/s2geometry-cmake/CMakeLists.txt b/contrib/s2geometry-cmake/CMakeLists.txt index f54562652a6..41d570c9afd 100644 --- a/contrib/s2geometry-cmake/CMakeLists.txt +++ b/contrib/s2geometry-cmake/CMakeLists.txt @@ -115,6 +115,8 @@ set(S2_SRCS add_library(s2 ${S2_SRCS}) +set_property(TARGET s2 PROPERTY CXX_STANDARD 11) + if (OPENSSL_FOUND) target_link_libraries(s2 PRIVATE ${OPENSSL_LIBRARIES}) endif() diff --git a/contrib/wordnet-blast b/contrib/wordnet-blast new file mode 160000 index 00000000000..1d16ac28036 --- /dev/null +++ b/contrib/wordnet-blast @@ -0,0 +1 @@ +Subproject commit 1d16ac28036e19fe8da7ba72c16a307fbdf8c87e diff --git a/contrib/wordnet-blast-cmake/CMakeLists.txt b/contrib/wordnet-blast-cmake/CMakeLists.txt new file mode 100644 index 00000000000..8d59c312664 --- /dev/null +++ b/contrib/wordnet-blast-cmake/CMakeLists.txt @@ -0,0 +1,13 @@ +set(LIBRARY_DIR "${ClickHouse_SOURCE_DIR}/contrib/wordnet-blast") + +set(SRCS + "${LIBRARY_DIR}/wnb/core/info_helper.cc" + "${LIBRARY_DIR}/wnb/core/load_wordnet.cc" + "${LIBRARY_DIR}/wnb/core/wordnet.cc" +) + +add_library(wnb ${SRCS}) + +target_link_libraries(wnb PRIVATE boost::headers_only boost::graph) + +target_include_directories(wnb PUBLIC "${LIBRARY_DIR}") \ No newline at end of file diff --git a/docker/packager/deb/Dockerfile b/docker/packager/deb/Dockerfile index 2f1d28efe61..241b691cd23 100644 --- a/docker/packager/deb/Dockerfile +++ b/docker/packager/deb/Dockerfile @@ -27,7 +27,7 @@ RUN apt-get update \ # Special dpkg-deb (https://github.com/ClickHouse-Extras/dpkg) version which is able # to compress files using pigz (https://zlib.net/pigz/) instead of gzip. # Significantly increase deb packaging speed and compatible with old systems -RUN curl -O https://clickhouse-builds.s3.yandex.net/utils/1/dpkg-deb \ +RUN curl -O https://clickhouse-datasets.s3.yandex.net/utils/1/dpkg-deb \ && chmod +x dpkg-deb \ && cp dpkg-deb /usr/bin diff --git a/docker/packager/unbundled/Dockerfile b/docker/packager/unbundled/Dockerfile index 4dd6dbc61d8..07031aa2d1b 100644 --- a/docker/packager/unbundled/Dockerfile +++ b/docker/packager/unbundled/Dockerfile @@ -2,7 +2,7 @@ FROM yandex/clickhouse-deb-builder RUN export CODENAME="$(lsb_release --codename --short | tr 'A-Z' 'a-z')" \ - && wget -nv -O /tmp/arrow-keyring.deb "https://apache.bintray.com/arrow/ubuntu/apache-arrow-archive-keyring-latest-${CODENAME}.deb" \ + && wget -nv -O /tmp/arrow-keyring.deb "https://apache.jfrog.io/artifactory/arrow/ubuntu/apache-arrow-apt-source-latest-${CODENAME}.deb" \ && dpkg -i /tmp/arrow-keyring.deb # Libraries from OS are only needed to test the "unbundled" build (that is not used in production). 
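As a side note on the rocksdb_build_version.cc hunk above: the new `AddProperty` helper only keeps strings of the form `name:value`, and drops values that start with `@` right after the colon (an unexpanded build-time placeholder). A minimal Python sketch of that rule, for illustration only — the function name and the sample strings below are ours, not part of the patch:

```python
# Illustrative sketch of the "name:value" parsing used in rocksdb_build_version.cc above.
def add_property(props: dict, raw: str) -> None:
    colon = raw.find(":")
    if colon <= 0 or colon >= len(raw) - 1:
        return  # no usable "name:value" pair
    if raw.find("@", colon) == colon + 1:
        return  # "@..." right after the colon means a failed build-time substitution
    props[raw[:colon]] = raw[colon + 1:]

props = {}
add_property(props, "rocksdb_build_git_sha:0")     # kept
add_property(props, "rocksdb_build_date:@DATE@")   # skipped
print(props)  # {'rocksdb_build_git_sha': '0'}
```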
@@ -23,6 +23,7 @@ RUN apt-get update \ libboost-regex-dev \ libboost-context-dev \ libboost-coroutine-dev \ + libboost-graph-dev \ zlib1g-dev \ liblz4-dev \ libdouble-conversion-dev \ diff --git a/docker/server/entrypoint.sh b/docker/server/entrypoint.sh index c93017bd0d3..40ba9f730cb 100755 --- a/docker/server/entrypoint.sh +++ b/docker/server/entrypoint.sh @@ -72,7 +72,10 @@ do if [ "$DO_CHOWN" = "1" ]; then # ensure proper directories permissions - chown -R "$USER:$GROUP" "$dir" + # but skip it if the directory already has proper permissions, because recursive chown may be slow + if [ "$(stat -c %u "$dir")" != "$USER" ] || [ "$(stat -c %g "$dir")" != "$GROUP" ]; then + chown -R "$USER:$GROUP" "$dir" + fi elif ! $gosu test -d "$dir" -a -w "$dir" -a -r "$dir"; then echo "Necessary directory '$dir' isn't accessible by user with id '$USER'" exit 1 @@ -161,6 +164,10 @@ fi # if no args passed to `docker run` or first argument start with `--`, then the user is passing clickhouse-server arguments if [[ $# -lt 1 ]] || [[ "$1" == "--"* ]]; then + # Watchdog is launched by default, but does not send SIGINT to the main process, + # so the container can't be stopped with Ctrl+C + CLICKHOUSE_WATCHDOG_ENABLE=${CLICKHOUSE_WATCHDOG_ENABLE:-0} + export CLICKHOUSE_WATCHDOG_ENABLE exec $gosu /usr/bin/clickhouse-server --config-file="$CLICKHOUSE_CONFIG" "$@" fi diff --git a/docker/test/base/Dockerfile b/docker/test/base/Dockerfile index a722132c3a5..29ac7a925b8 100644 --- a/docker/test/base/Dockerfile +++ b/docker/test/base/Dockerfile @@ -27,7 +27,7 @@ RUN apt-get update \ # Special dpkg-deb (https://github.com/ClickHouse-Extras/dpkg) version which is able # to compress files using pigz (https://zlib.net/pigz/) instead of gzip. # Significantly increase deb packaging speed and compatible with old systems -RUN curl -O https://clickhouse-builds.s3.yandex.net/utils/1/dpkg-deb \ +RUN curl -O https://clickhouse-datasets.s3.yandex.net/utils/1/dpkg-deb \ && chmod +x dpkg-deb \ && cp dpkg-deb /usr/bin @@ -61,4 +61,7 @@ ENV TSAN_OPTIONS='halt_on_error=1 history_size=7' ENV UBSAN_OPTIONS='print_stacktrace=1' ENV MSAN_OPTIONS='abort_on_error=1 poison_in_dtor=1' +ENV TZ=Europe/Moscow +RUN ln -snf "/usr/share/zoneinfo/$TZ" /etc/localtime && echo "$TZ" > /etc/timezone + CMD sleep 1 diff --git a/docker/test/fasttest/Dockerfile b/docker/test/fasttest/Dockerfile index 2864f7fc4da..916c94e7311 100644 --- a/docker/test/fasttest/Dockerfile +++ b/docker/test/fasttest/Dockerfile @@ -27,7 +27,7 @@ RUN apt-get update \ # Special dpkg-deb (https://github.com/ClickHouse-Extras/dpkg) version which is able # to compress files using pigz (https://zlib.net/pigz/) instead of gzip. 
# Significantly increase deb packaging speed and compatible with old systems -RUN curl -O https://clickhouse-builds.s3.yandex.net/utils/1/dpkg-deb \ +RUN curl -O https://clickhouse-datasets.s3.yandex.net/utils/1/dpkg-deb \ && chmod +x dpkg-deb \ && cp dpkg-deb /usr/bin @@ -65,7 +65,7 @@ RUN apt-get update \ unixodbc \ --yes --no-install-recommends -RUN pip3 install numpy scipy pandas +RUN pip3 install numpy scipy pandas Jinja2 # This symlink required by gcc to find lld compiler RUN ln -s /usr/bin/lld-${LLVM_VERSION} /usr/bin/ld.lld diff --git a/docker/test/fasttest/run.sh b/docker/test/fasttest/run.sh index 3e8bf306a83..6419ea3659c 100755 --- a/docker/test/fasttest/run.sh +++ b/docker/test/fasttest/run.sh @@ -299,6 +299,7 @@ function run_tests 01318_decrypt # Depends on OpenSSL 01663_aes_msan # Depends on OpenSSL 01667_aes_args_check # Depends on OpenSSL + 01683_codec_encrypted # Depends on OpenSSL 01776_decrypt_aead_size_check # Depends on OpenSSL 01811_filter_by_null # Depends on OpenSSL 01281_unsucceeded_insert_select_queries_counter @@ -310,6 +311,7 @@ function run_tests 01411_bayesian_ab_testing 01798_uniq_theta_sketch 01799_long_uniq_theta_sketch + 01890_stem # depends on libstemmer_c collate collation _orc_ diff --git a/docker/test/fuzzer/run-fuzzer.sh b/docker/test/fuzzer/run-fuzzer.sh index 3ca67a58278..3b074ae46cd 100755 --- a/docker/test/fuzzer/run-fuzzer.sh +++ b/docker/test/fuzzer/run-fuzzer.sh @@ -194,6 +194,10 @@ continue jobs pstree -aspgT + server_exit_code=0 + wait $server_pid || server_exit_code=$? + echo "Server exit code is $server_exit_code" + # Make files with status and description we'll show for this check on Github. task_exit_code=$fuzzer_exit_code if [ "$server_died" == 1 ] diff --git a/docker/test/integration/base/Dockerfile b/docker/test/integration/base/Dockerfile index e15697da029..344c1b9a698 100644 --- a/docker/test/integration/base/Dockerfile +++ b/docker/test/integration/base/Dockerfile @@ -32,7 +32,7 @@ RUN rm -rf \ RUN apt-get clean # Install MySQL ODBC driver -RUN curl 'https://cdn.mysql.com//Downloads/Connector-ODBC/8.0/mysql-connector-odbc-8.0.21-linux-glibc2.12-x86-64bit.tar.gz' --output 'mysql-connector.tar.gz' && tar -xzf mysql-connector.tar.gz && cd mysql-connector-odbc-8.0.21-linux-glibc2.12-x86-64bit/lib && mv * /usr/local/lib && ln -s /usr/local/lib/libmyodbc8a.so /usr/lib/x86_64-linux-gnu/odbc/libmyodbc.so +RUN curl 'https://downloads.mysql.com/archives/get/p/10/file/mysql-connector-odbc-8.0.21-linux-glibc2.12-x86-64bit.tar.gz' --location --output 'mysql-connector.tar.gz' && tar -xzf mysql-connector.tar.gz && cd mysql-connector-odbc-8.0.21-linux-glibc2.12-x86-64bit/lib && mv * /usr/local/lib && ln -s /usr/local/lib/libmyodbc8a.so /usr/lib/x86_64-linux-gnu/odbc/libmyodbc.so # Unfortunately this is required for a single test for conversion data from zookeeper to clickhouse-keeper. # ZooKeeper is not started by default, but consumes some space in containers. 
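A short aside on the run-fuzzer.sh hunk above: `wait $server_pid || server_exit_code=$?` captures the server's exit status while guaranteeing that a non-zero status from `wait` never fails the script itself. A rough Python analogue of the same idea, purely illustrative (the `sleep` command below is only a stand-in for the real server process):

```python
import subprocess

# Start a long-running child process; "sleep" is only a placeholder here.
server = subprocess.Popen(["sleep", "1"])

# Like `wait $server_pid || server_exit_code=$?`: collect the status for reporting
# instead of treating a non-zero exit code as a fatal error.
server_exit_code = server.wait()
print(f"Server exit code is {server_exit_code}")
```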
@@ -49,4 +49,3 @@ RUN mkdir /zookeeper && chmod -R 777 /zookeeper ENV TZ=Europe/Moscow RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone - diff --git a/docker/test/integration/runner/Dockerfile b/docker/test/integration/runner/Dockerfile index 0665ab7560f..6bde4ef60db 100644 --- a/docker/test/integration/runner/Dockerfile +++ b/docker/test/integration/runner/Dockerfile @@ -76,6 +76,7 @@ RUN python3 -m pip install \ pytest \ pytest-timeout \ pytest-xdist \ + pytest-repeat \ redis \ tzlocal \ urllib3 \ diff --git a/docker/test/integration/runner/compose/docker_compose_jdbc_bridge.yml b/docker/test/integration/runner/compose/docker_compose_jdbc_bridge.yml index e3e0d5d07ce..a65ef629df6 100644 --- a/docker/test/integration/runner/compose/docker_compose_jdbc_bridge.yml +++ b/docker/test/integration/runner/compose/docker_compose_jdbc_bridge.yml @@ -14,10 +14,14 @@ services: } EOF ./docker-entrypoint.sh' - ports: - - 9020:9019 + expose: + - 9019 healthcheck: test: ["CMD", "curl", "-s", "localhost:9019/ping"] interval: 5s timeout: 3s retries: 30 + volumes: + - type: ${JDBC_BRIDGE_FS:-tmpfs} + source: ${JDBC_BRIDGE_LOGS:-} + target: /app/logs \ No newline at end of file diff --git a/docker/test/integration/runner/compose/docker_compose_mongo_secure.yml b/docker/test/integration/runner/compose/docker_compose_mongo_secure.yml new file mode 100644 index 00000000000..5d283cfc343 --- /dev/null +++ b/docker/test/integration/runner/compose/docker_compose_mongo_secure.yml @@ -0,0 +1,13 @@ +version: '2.3' +services: + mongo1: + image: mongo:3.6 + restart: always + environment: + MONGO_INITDB_ROOT_USERNAME: root + MONGO_INITDB_ROOT_PASSWORD: clickhouse + volumes: + - ${MONGO_CONFIG_PATH}:/mongo/ + ports: + - ${MONGO_EXTERNAL_PORT}:${MONGO_INTERNAL_PORT} + command: --config /mongo/mongo_secure.conf --profile=2 --verbose diff --git a/docker/test/integration/runner/compose/docker_compose_mysql_5_7_for_materialize_mysql.yml b/docker/test/integration/runner/compose/docker_compose_mysql_5_7_for_materialized_mysql.yml similarity index 100% rename from docker/test/integration/runner/compose/docker_compose_mysql_5_7_for_materialize_mysql.yml rename to docker/test/integration/runner/compose/docker_compose_mysql_5_7_for_materialized_mysql.yml diff --git a/docker/test/integration/runner/compose/docker_compose_postgres.yml b/docker/test/integration/runner/compose/docker_compose_postgres.yml index 4b83ed21410..c444e71798e 100644 --- a/docker/test/integration/runner/compose/docker_compose_postgres.yml +++ b/docker/test/integration/runner/compose/docker_compose_postgres.yml @@ -2,7 +2,7 @@ version: '2.3' services: postgres1: image: postgres - command: ["postgres", "-c", "logging_collector=on", "-c", "log_directory=/postgres/logs", "-c", "log_filename=postgresql.log", "-c", "log_statement=all"] + command: ["postgres", "-c", "logging_collector=on", "-c", "log_directory=/postgres/logs", "-c", "log_filename=postgresql.log", "-c", "log_statement=all", "-c", "max_connections=200"] restart: always expose: - ${POSTGRES_PORT} diff --git a/docker/test/integration/runner/compose/docker_compose_rabbitmq.yml b/docker/test/integration/runner/compose/docker_compose_rabbitmq.yml index 99e0ea8e280..539c065e03b 100644 --- a/docker/test/integration/runner/compose/docker_compose_rabbitmq.yml +++ b/docker/test/integration/runner/compose/docker_compose_rabbitmq.yml @@ -2,7 +2,7 @@ version: '2.3' services: rabbitmq1: - image: rabbitmq:3-management-alpine + image: rabbitmq:3.8-management-alpine hostname: rabbitmq1 
expose: - ${RABBITMQ_PORT} diff --git a/docker/test/performance-comparison/compare.sh b/docker/test/performance-comparison/compare.sh index 9a8ffff7cd9..a6e1ee482d6 100755 --- a/docker/test/performance-comparison/compare.sh +++ b/docker/test/performance-comparison/compare.sh @@ -1196,7 +1196,7 @@ create table changes engine File(TSV, 'metrics/changes.tsv') as if(left > right, left / right, right / left) times_diff from metrics group by metric - having abs(diff) > 0.05 and isFinite(diff) + having abs(diff) > 0.05 and isFinite(diff) and isFinite(times_diff) ) order by diff desc ; diff --git a/docker/test/performance-comparison/perf.py b/docker/test/performance-comparison/perf.py index 9628c512e83..a6e7e397e32 100755 --- a/docker/test/performance-comparison/perf.py +++ b/docker/test/performance-comparison/perf.py @@ -183,6 +183,10 @@ for conn_index, c in enumerate(all_connections): # requires clickhouse-driver >= 1.1.5 to accept arbitrary new settings # (https://github.com/mymarilyn/clickhouse-driver/pull/142) c.settings[s.tag] = s.text + # We have to perform a query to make sure the settings work. Otherwise an + # unknown setting will lead to failing precondition check, and we will skip + # the test, which is wrong. + c.execute("select 1") reportStageEnd('settings') diff --git a/docker/test/stateful/run.sh b/docker/test/stateful/run.sh index d8ea2153b36..de058469192 100755 --- a/docker/test/stateful/run.sh +++ b/docker/test/stateful/run.sh @@ -2,6 +2,11 @@ set -e -x +# Choose random timezone for this test run +TZ="$(grep -v '#' /usr/share/zoneinfo/zone.tab | awk '{print $3}' | shuf | head -n1)" +echo "Choosen random timezone $TZ" +ln -snf "/usr/share/zoneinfo/$TZ" /etc/localtime && echo "$TZ" > /etc/timezone + dpkg -i package_folder/clickhouse-common-static_*.deb; dpkg -i package_folder/clickhouse-common-static-dbg_*.deb dpkg -i package_folder/clickhouse-server_*.deb diff --git a/docker/test/stateless/Dockerfile b/docker/test/stateless/Dockerfile index 17c89232e17..f5fa86a6f33 100644 --- a/docker/test/stateless/Dockerfile +++ b/docker/test/stateless/Dockerfile @@ -32,7 +32,7 @@ RUN apt-get update -y \ postgresql-client \ sqlite3 -RUN pip3 install numpy scipy pandas +RUN pip3 install numpy scipy pandas Jinja2 RUN mkdir -p /tmp/clickhouse-odbc-tmp \ && wget -nv -O - ${odbc_driver_url} | tar --strip-components=1 -xz -C /tmp/clickhouse-odbc-tmp \ diff --git a/docker/test/stateless/process_functional_tests_result.py b/docker/test/stateless/process_functional_tests_result.py index b3c8fa96144..e60424ad4d1 100755 --- a/docker/test/stateless/process_functional_tests_result.py +++ b/docker/test/stateless/process_functional_tests_result.py @@ -12,7 +12,7 @@ UNKNOWN_SIGN = "[ UNKNOWN " SKIPPED_SIGN = "[ SKIPPED " HUNG_SIGN = "Found hung queries in processlist" -NO_TASK_TIMEOUT_SIGN = "All tests have finished" +NO_TASK_TIMEOUT_SIGNS = ["All tests have finished", "No tests were run"] RETRIES_SIGN = "Some tests were restarted" @@ -29,7 +29,7 @@ def process_test_log(log_path): with open(log_path, 'r') as test_file: for line in test_file: line = line.strip() - if NO_TASK_TIMEOUT_SIGN in line: + if any(s in line for s in NO_TASK_TIMEOUT_SIGNS): task_timeout = False if HUNG_SIGN in line: hung = True @@ -80,6 +80,7 @@ def process_result(result_path): if result_path and os.path.exists(result_path): total, skipped, unknown, failed, success, hung, task_timeout, retries, test_results = process_test_log(result_path) is_flacky_check = 1 < int(os.environ.get('NUM_TRIES', 1)) + logging.info("Is flacky check: %s", 
is_flacky_check) # If no tests were run (success == 0) it indicates an error (e.g. server did not start or crashed immediately) # But it's Ok for "flaky checks" - they can contain just one test for check which is marked as skipped. if failed != 0 or unknown != 0 or (success == 0 and (not is_flacky_check)): diff --git a/docker/test/stateless/run.sh b/docker/test/stateless/run.sh index a7fb956bf94..e5ef72e747a 100755 --- a/docker/test/stateless/run.sh +++ b/docker/test/stateless/run.sh @@ -3,6 +3,11 @@ # fail on errors, verbose and export all env variables set -e -x -a +# Choose random timezone for this test run. +TZ="$(grep -v '#' /usr/share/zoneinfo/zone.tab | awk '{print $3}' | shuf | head -n1)" +echo "Choosen random timezone $TZ" +ln -snf "/usr/share/zoneinfo/$TZ" /etc/localtime && echo "$TZ" > /etc/timezone + dpkg -i package_folder/clickhouse-common-static_*.deb dpkg -i package_folder/clickhouse-common-static-dbg_*.deb dpkg -i package_folder/clickhouse-server_*.deb @@ -138,15 +143,18 @@ if [[ -n "$WITH_COVERAGE" ]] && [[ "$WITH_COVERAGE" -eq 1 ]]; then fi tar -chf /test_output/text_log_dump.tar /var/lib/clickhouse/data/system/text_log ||: tar -chf /test_output/query_log_dump.tar /var/lib/clickhouse/data/system/query_log ||: +tar -chf /test_output/zookeeper_log_dump.tar /var/lib/clickhouse/data/system/zookeeper_log ||: tar -chf /test_output/coordination.tar /var/lib/clickhouse/coordination ||: if [[ -n "$USE_DATABASE_REPLICATED" ]] && [[ "$USE_DATABASE_REPLICATED" -eq 1 ]]; then - grep -Fa "Fatal" /var/log/clickhouse-server/clickhouse-server1.log ||: - grep -Fa "Fatal" /var/log/clickhouse-server/clickhouse-server2.log ||: + grep -Fa "Fatal" /var/log/clickhouse-server/clickhouse-server1.log ||: + grep -Fa "Fatal" /var/log/clickhouse-server/clickhouse-server2.log ||: pigz < /var/log/clickhouse-server/clickhouse-server1.log > /test_output/clickhouse-server1.log.gz ||: pigz < /var/log/clickhouse-server/clickhouse-server2.log > /test_output/clickhouse-server2.log.gz ||: mv /var/log/clickhouse-server/stderr1.log /test_output/ ||: mv /var/log/clickhouse-server/stderr2.log /test_output/ ||: + tar -chf /test_output/zookeeper_log_dump1.tar /var/lib/clickhouse1/data/system/zookeeper_log ||: + tar -chf /test_output/zookeeper_log_dump2.tar /var/lib/clickhouse2/data/system/zookeeper_log ||: tar -chf /test_output/coordination1.tar /var/lib/clickhouse1/coordination ||: tar -chf /test_output/coordination2.tar /var/lib/clickhouse2/coordination ||: fi diff --git a/docker/test/stateless_unbundled/Dockerfile b/docker/test/stateless_unbundled/Dockerfile index c5463ac447d..53857a90ac7 100644 --- a/docker/test/stateless_unbundled/Dockerfile +++ b/docker/test/stateless_unbundled/Dockerfile @@ -77,9 +77,6 @@ RUN mkdir -p /tmp/clickhouse-odbc-tmp \ && odbcinst -i -s -l -f /tmp/clickhouse-odbc-tmp/share/doc/clickhouse-odbc/config/odbc.ini.sample \ && rm -rf /tmp/clickhouse-odbc-tmp -ENV TZ=Europe/Moscow -RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone - COPY run.sh / CMD ["/bin/bash", "/run.sh"] diff --git a/docker/test/stress/run.sh b/docker/test/stress/run.sh index 428fdb9fdb7..87d127ab946 100755 --- a/docker/test/stress/run.sh +++ b/docker/test/stress/run.sh @@ -58,11 +58,11 @@ function start() echo "Cannot start clickhouse-server" cat /var/log/clickhouse-server/stdout.log tail -n1000 /var/log/clickhouse-server/stderr.log - tail -n1000 /var/log/clickhouse-server/clickhouse-server.log + tail -n100000 /var/log/clickhouse-server/clickhouse-server.log | grep -F -v ' RaftInstance:' -e ' 
RaftInstance' | tail -n1000 break fi # use root to match with current uid - clickhouse start --user root >/var/log/clickhouse-server/stdout.log 2>/var/log/clickhouse-server/stderr.log + clickhouse start --user root >/var/log/clickhouse-server/stdout.log 2>>/var/log/clickhouse-server/stderr.log sleep 0.5 counter=$((counter + 1)) done @@ -118,35 +118,35 @@ clickhouse-client --query "SELECT 'Server successfully started', 'OK'" >> /test_ [ -f /var/log/clickhouse-server/stderr.log ] || echo -e "Stderr log does not exist\tFAIL" # Print Fatal log messages to stdout -zgrep -Fa " " /var/log/clickhouse-server/clickhouse-server.log +zgrep -Fa " " /var/log/clickhouse-server/clickhouse-server.log* # Grep logs for sanitizer asserts, crashes and other critical errors # Sanitizer asserts zgrep -Fa "==================" /var/log/clickhouse-server/stderr.log >> /test_output/tmp zgrep -Fa "WARNING" /var/log/clickhouse-server/stderr.log >> /test_output/tmp -zgrep -Fav "ASan doesn't fully support makecontext/swapcontext functions" > /dev/null \ +zgrep -Fav "ASan doesn't fully support makecontext/swapcontext functions" /test_output/tmp > /dev/null \ && echo -e 'Sanitizer assert (in stderr.log)\tFAIL' >> /test_output/test_results.tsv \ || echo -e 'No sanitizer asserts\tOK' >> /test_output/test_results.tsv rm -f /test_output/tmp # OOM -zgrep -Fa " Application: Child process was terminated by signal 9" /var/log/clickhouse-server/clickhouse-server.log > /dev/null \ +zgrep -Fa " Application: Child process was terminated by signal 9" /var/log/clickhouse-server/clickhouse-server.log* > /dev/null \ && echo -e 'OOM killer (or signal 9) in clickhouse-server.log\tFAIL' >> /test_output/test_results.tsv \ || echo -e 'No OOM messages in clickhouse-server.log\tOK' >> /test_output/test_results.tsv # Logical errors -zgrep -Fa "Code: 49, e.displayText() = DB::Exception:" /var/log/clickhouse-server/clickhouse-server.log > /dev/null \ +zgrep -Fa "Code: 49, e.displayText() = DB::Exception:" /var/log/clickhouse-server/clickhouse-server.log* > /dev/null \ && echo -e 'Logical error thrown (see clickhouse-server.log)\tFAIL' >> /test_output/test_results.tsv \ || echo -e 'No logical errors\tOK' >> /test_output/test_results.tsv # Crash -zgrep -Fa "########################################" /var/log/clickhouse-server/clickhouse-server.log > /dev/null \ +zgrep -Fa "########################################" /var/log/clickhouse-server/clickhouse-server.log* > /dev/null \ && echo -e 'Killed by signal (in clickhouse-server.log)\tFAIL' >> /test_output/test_results.tsv \ || echo -e 'Not crashed\tOK' >> /test_output/test_results.tsv # It also checks for crash without stacktrace (printed by watchdog) -zgrep -Fa " " /var/log/clickhouse-server/clickhouse-server.log > /dev/null \ +zgrep -Fa " " /var/log/clickhouse-server/clickhouse-server.log* > /dev/null \ && echo -e 'Fatal message in clickhouse-server.log\tFAIL' >> /test_output/test_results.tsv \ || echo -e 'No fatal messages in clickhouse-server.log\tOK' >> /test_output/test_results.tsv diff --git a/docker/test/stress/stress b/docker/test/stress/stress index c98a527c1fe..c71722809d7 100755 --- a/docker/test/stress/stress +++ b/docker/test/stress/stress @@ -20,6 +20,7 @@ def get_skip_list_cmd(path): def get_options(i): options = [] + client_options = [] if 0 < i: options.append("--order=random") @@ -27,25 +28,29 @@ def get_options(i): options.append("--db-engine=Ordinary") if i % 3 == 2: - options.append('''--client-option='allow_experimental_database_replicated=1' 
--db-engine="Replicated('/test/db/test_{}', 's1', 'r1')"'''.format(i)) + options.append('''--db-engine="Replicated('/test/db/test_{}', 's1', 'r1')"'''.format(i)) + client_options.append('allow_experimental_database_replicated=1') # If database name is not specified, new database is created for each functional test. # Run some threads with one database for all tests. if i % 2 == 1: options.append(" --database=test_{}".format(i)) - if i % 7 == 0: - options.append(" --client-option='join_use_nulls=1'") + if i % 5 == 1: + client_options.append("join_use_nulls=1") - if i % 14 == 0: - options.append(' --client-option="join_algorithm=\'partial_merge\'"') + if i % 15 == 6: + client_options.append("join_algorithm='partial_merge'") - if i % 21 == 0: - options.append(' --client-option="join_algorithm=\'auto\'"') - options.append(' --client-option="max_rows_in_join=1000"') + if i % 15 == 11: + client_options.append("join_algorithm='auto'") + client_options.append('max_rows_in_join=1000') if i == 13: - options.append(" --client-option='memory_tracker_fault_probability=0.00001'") + client_options.append('memory_tracker_fault_probability=0.001') + + if client_options: + options.append(" --client-option " + ' '.join(client_options)) return ' '.join(options) diff --git a/docker/test/testflows/runner/Dockerfile b/docker/test/testflows/runner/Dockerfile index 9fa028fedca..264b98c669d 100644 --- a/docker/test/testflows/runner/Dockerfile +++ b/docker/test/testflows/runner/Dockerfile @@ -35,7 +35,7 @@ RUN apt-get update \ ENV TZ=Europe/Moscow RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone -RUN pip3 install urllib3 testflows==1.6.90 docker-compose==1.29.1 docker==5.0.0 dicttoxml kazoo tzlocal python-dateutil numpy +RUN pip3 install urllib3 testflows==1.7.20 docker-compose==1.29.1 docker==5.0.0 dicttoxml kazoo tzlocal python-dateutil numpy ENV DOCKER_CHANNEL stable ENV DOCKER_VERSION 20.10.6 diff --git a/docker/test/unit/Dockerfile b/docker/test/unit/Dockerfile index e2f4a691939..e111611eecd 100644 --- a/docker/test/unit/Dockerfile +++ b/docker/test/unit/Dockerfile @@ -1,8 +1,6 @@ # docker build -t yandex/clickhouse-unit-test . FROM yandex/clickhouse-stateless-test -ENV TZ=Europe/Moscow -RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone RUN apt-get install gdb COPY run.sh / diff --git a/docs/README.md b/docs/README.md index a4df023a6ad..d71e92f20d1 100644 --- a/docs/README.md +++ b/docs/README.md @@ -9,7 +9,7 @@ Many developers can say that the code is the best docs by itself, and they are r If you want to help ClickHouse with documentation you can face, for example, the following questions: - "I don't know how to write." - + We have prepared some [recommendations](#what-to-write) for you. - "I know what I want to write, but I don't know how to contribute to docs." @@ -71,17 +71,17 @@ Contribute all new information in English language. Other languages are translat ``` - Bold text: `**asterisks**` or `__underlines__`. -- Links: `[link text](uri)`. Examples: +- Links: `[link text](uri)`. Examples: - External link: `[ClickHouse repo](https://github.com/ClickHouse/ClickHouse)` - Cross link: `[How to build docs](tools/README.md)` - Images: `![Exclamation sign](uri)`. You can refer to local images as well as remote in internet. - Lists: Lists can be of two types: - + - `- unordered`: Each item starts from the `-`. - `1. ordered`: Each item starts from the number. - + A list must be separated from the text by an empty line. 
Nested lists must be indented with 4 spaces. - Inline code: `` `in backticks` ``. @@ -107,7 +107,7 @@ Contribute all new information in English language. Other languages are translat - Text hidden behind a cut (single sting that opens on click): ```text -
<summary>Visible text</summary> +<summary>Visible text</summary> Hidden content. </details>
`. ``` diff --git a/docs/_description_templates/template-data-type.md b/docs/_description_templates/template-data-type.md index 5e560b9325d..d39be305838 100644 --- a/docs/_description_templates/template-data-type.md +++ b/docs/_description_templates/template-data-type.md @@ -1,6 +1,6 @@ --- -toc_priority: -toc_title: +toc_priority: +toc_title: --- # data_type_name {#data_type-name} diff --git a/docs/_description_templates/template-engine.md b/docs/_description_templates/template-engine.md index 490f490fc4e..392bc59ed33 100644 --- a/docs/_description_templates/template-engine.md +++ b/docs/_description_templates/template-engine.md @@ -58,6 +58,6 @@ Result: Follow up with any text to clarify the example. -**See Also** +**See Also** - [link](#) diff --git a/docs/_description_templates/template-function.md b/docs/_description_templates/template-function.md index 3d4d921898a..6bdc764c449 100644 --- a/docs/_description_templates/template-function.md +++ b/docs/_description_templates/template-function.md @@ -14,8 +14,8 @@ More text (Optional). **Arguments** (Optional) -- `x` — Description. Optional (only for optional arguments). Possible values: . Default value: . [Type name](relative/path/to/type/dscr.md#type). -- `y` — Description. Optional (only for optional arguments). Possible values: .Default value: . [Type name](relative/path/to/type/dscr.md#type). +- `x` — Description. Optional (only for optional arguments). Possible values: . Default value: . [Type name](relative/path/to/type/dscr.md#type). +- `y` — Description. Optional (only for optional arguments). Possible values: .Default value: . [Type name](relative/path/to/type/dscr.md#type). **Parameters** (Optional, only for parametric aggregate functions) @@ -23,7 +23,7 @@ More text (Optional). **Returned value(s)** -- Returned values list. +- Returned values list. Type: [Type name](relative/path/to/type/dscr.md#type). diff --git a/docs/_includes/cmake_in_clickhouse_footer.md b/docs/_includes/cmake_in_clickhouse_footer.md index ab884bd4dfe..bf8411ba815 100644 --- a/docs/_includes/cmake_in_clickhouse_footer.md +++ b/docs/_includes/cmake_in_clickhouse_footer.md @@ -16,8 +16,8 @@ Better: option(ENABLE_TESTS "Provide unit_test_dbms target with Google.test unit tests" OFF) ``` -If the option's purpose can't be guessed by its name, or the purpose guess may be misleading, or option has some -pre-conditions, leave a comment above the `option()` line and explain what it does. +If the option's purpose can't be guessed by its name, or the purpose guess may be misleading, or option has some +pre-conditions, leave a comment above the `option()` line and explain what it does. The best way would be linking the docs page (if it exists). The comment is parsed into a separate column (see below). @@ -33,7 +33,7 @@ option(ENABLE_TESTS "Provide unit_test_dbms target with Google.test unit tests" Suppose you have an option that may strip debug symbols from the ClickHouse's part. This can speed up the linking process, but produces a binary that cannot be debugged. -In that case, prefer explicitly raising a warning telling the developer that he may be doing something wrong. +In that case, prefer explicitly raising a warning telling the developer that he may be doing something wrong. Also, such options should be disabled if applies. Bad: diff --git a/docs/en/commercial/support.md b/docs/en/commercial/support.md index 1a3d1b71869..27f3f0c6a22 100644 --- a/docs/en/commercial/support.md +++ b/docs/en/commercial/support.md @@ -7,7 +7,7 @@ toc_title: Support !!! 
info "Info" If you have launched a ClickHouse commercial support service, feel free to [open a pull-request](https://github.com/ClickHouse/ClickHouse/edit/master/docs/en/commercial/support.md) adding it to the following list. - + ## Yandex.Cloud ClickHouse worldwide support from the authors of ClickHouse. Supports on-premise and cloud deployments. Ask details on clickhouse-support@yandex-team.com diff --git a/docs/en/development/adding_test_queries.md b/docs/en/development/adding_test_queries.md index 547d8b0fa37..4da027b8fb1 100644 --- a/docs/en/development/adding_test_queries.md +++ b/docs/en/development/adding_test_queries.md @@ -4,11 +4,11 @@ ClickHouse has hundreds (or even thousands) of features. Every commit gets check The core functionality is very well tested, but some corner-cases and different combinations of features can be uncovered with ClickHouse CI. -Most of the bugs/regressions we see happen in that 'grey area' where test coverage is poor. +Most of the bugs/regressions we see happen in that 'grey area' where test coverage is poor. -And we are very interested in covering most of the possible scenarios and feature combinations used in real life by tests. +And we are very interested in covering most of the possible scenarios and feature combinations used in real life by tests. -## Why adding tests +## Why adding tests Why/when you should add a test case into ClickHouse code: 1) you use some complicated scenarios / feature combinations / you have some corner case which is probably not widely used @@ -17,18 +17,18 @@ Why/when you should add a test case into ClickHouse code: 4) once the test is added/accepted, you can be sure the corner case you check will never be accidentally broken. 5) you will be a part of great open-source community 6) your name will be visible in the `system.contributors` table! -7) you will make a world bit better :) +7) you will make a world bit better :) ### Steps to do -#### Prerequisite +#### Prerequisite -I assume you run some Linux machine (you can use docker / virtual machines on other OS) and any modern browser / internet connection, and you have some basic Linux & SQL skills. +I assume you run some Linux machine (you can use docker / virtual machines on other OS) and any modern browser / internet connection, and you have some basic Linux & SQL skills. Any highly specialized knowledge is not needed (so you don't need to know C++ or know something about how ClickHouse CI works). 
-#### Preparation +#### Preparation 1) [create GitHub account](https://github.com/join) (if you haven't one yet) 2) [setup git](https://docs.github.com/en/free-pro-team@latest/github/getting-started-with-github/set-up-git) @@ -54,17 +54,17 @@ git remote add upstream https://github.com/ClickHouse/ClickHouse #### New branch for the test -1) create a new branch from the latest clickhouse master +1) create a new branch from the latest clickhouse master ``` cd ~/workspace/ClickHouse git fetch upstream -git checkout -b name_for_a_branch_with_my_test upstream/master +git checkout -b name_for_a_branch_with_my_test upstream/master ``` -#### Install & run clickhouse +#### Install & run clickhouse 1) install `clickhouse-server` (follow [official docs](https://clickhouse.tech/docs/en/getting-started/install/)) -2) install test configurations (it will use Zookeeper mock implementation and adjust some settings) +2) install test configurations (it will use Zookeeper mock implementation and adjust some settings) ``` cd ~/workspace/ClickHouse/tests/config sudo ./install.sh @@ -74,7 +74,7 @@ sudo ./install.sh sudo systemctl restart clickhouse-server ``` -#### Creating the test file +#### Creating the test file 1) find the number for your test - find the file with the biggest number in `tests/queries/0_stateless/` @@ -86,7 +86,7 @@ tests/queries/0_stateless/01520_client_print_query_id.reference ``` Currently, the last number for the test is `01520`, so my test will have the number `01521` -2) create an SQL file with the next number and name of the feature you test +2) create an SQL file with the next number and name of the feature you test ```sh touch tests/queries/0_stateless/01521_dummy_test.sql @@ -112,16 +112,16 @@ clickhouse-client -nmT < tests/queries/0_stateless/01521_dummy_test.sql | tee te - fast - should not take longer than a few seconds (better subseconds) - correct - fails then feature is not working - deterministic - - isolated / stateless + - isolated / stateless - don't rely on some environment things - - don't rely on timing when possible -- try to cover corner cases (zeros / Nulls / empty sets / throwing exceptions) + - don't rely on timing when possible +- try to cover corner cases (zeros / Nulls / empty sets / throwing exceptions) - to test that query return errors, you can put special comment after the query: `-- { serverError 60 }` or `-- { clientError 20 }` - don't switch databases (unless necessary) - you can create several table replicas on the same node if needed - you can use one of the test cluster definitions when needed (see system.clusters) - use `number` / `numbers_mt` / `zeros` / `zeros_mt` and similar for queries / to initialize data when applicable -- clean up the created objects after test and before the test (DROP IF EXISTS) - in case of some dirty state +- clean up the created objects after test and before the test (DROP IF EXISTS) - in case of some dirty state - prefer sync mode of operations (mutations, merges, etc.) - use other SQL files in the `0_stateless` folder as an example - ensure the feature / feature combination you want to test is not yet covered with existing tests @@ -138,7 +138,7 @@ It's important to name tests correctly, so one could turn some tests subset off #### Commit / push / create PR. 
-1) commit & push your changes +1) commit & push your changes ```sh cd ~/workspace/ClickHouse git add tests/queries/0_stateless/01521_dummy_test.sql @@ -147,5 +147,5 @@ git commit # use some nice commit message when possible git push origin HEAD ``` 2) use a link which was shown during the push, to create a PR into the main repo -3) adjust the PR title and contents, in `Changelog category (leave one)` keep -`Build/Testing/Packaging Improvement`, fill the rest of the fields if you want. +3) adjust the PR title and contents, in `Changelog category (leave one)` keep +`Build/Testing/Packaging Improvement`, fill the rest of the fields if you want. diff --git a/docs/en/development/contrib.md b/docs/en/development/contrib.md index a65ddb40af0..9daf6148324 100644 --- a/docs/en/development/contrib.md +++ b/docs/en/development/contrib.md @@ -8,7 +8,7 @@ toc_title: Third-Party Libraries Used The list of third-party libraries can be obtained by the following query: ``` sql -SELECT library_name, license_type, license_path FROM system.licenses ORDER BY library_name COLLATE 'en' +SELECT library_name, license_type, license_path FROM system.licenses ORDER BY library_name COLLATE 'en'; ``` [Example](https://gh-api.clickhouse.tech/play?user=play#U0VMRUNUIGxpYnJhcnlfbmFtZSwgbGljZW5zZV90eXBlLCBsaWNlbnNlX3BhdGggRlJPTSBzeXN0ZW0ubGljZW5zZXMgT1JERVIgQlkgbGlicmFyeV9uYW1lIENPTExBVEUgJ2VuJw==) diff --git a/docs/en/development/developer-instruction.md b/docs/en/development/developer-instruction.md index 90f406f3ba8..537ed6a9c4f 100644 --- a/docs/en/development/developer-instruction.md +++ b/docs/en/development/developer-instruction.md @@ -123,7 +123,7 @@ For installing CMake and Ninja on Mac OS X first install Homebrew and then insta /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" brew install cmake ninja -Next, check the version of CMake: `cmake --version`. If it is below 3.3, you should install a newer version from the website: https://cmake.org/download/. +Next, check the version of CMake: `cmake --version`. If it is below 3.12, you should install a newer version from the website: https://cmake.org/download/. ## Optional External Libraries {#optional-external-libraries} diff --git a/docs/en/development/style.md b/docs/en/development/style.md index c495e3f0417..987df275c1d 100644 --- a/docs/en/development/style.md +++ b/docs/en/development/style.md @@ -749,7 +749,7 @@ If your code in the `master` branch is not buildable yet, exclude it from the bu **1.** The C++20 standard library is used (experimental extensions are allowed), as well as `boost` and `Poco` frameworks. -**2.** It is not allowed to use libraries from OS packages. It is also not allowed to use pre-installed libraries. All libraries should be placed in form of source code in `contrib` directory and built with ClickHouse. +**2.** It is not allowed to use libraries from OS packages. It is also not allowed to use pre-installed libraries. All libraries should be placed in form of source code in `contrib` directory and built with ClickHouse. See [Guidelines for adding new third-party libraries](contrib.md#adding-third-party-libraries) for details. **3.** Preference is always given to libraries that are already in use. diff --git a/docs/en/development/tests.md b/docs/en/development/tests.md index 4231bda6c35..c7c0ec88be4 100644 --- a/docs/en/development/tests.md +++ b/docs/en/development/tests.md @@ -70,7 +70,13 @@ Note that integration of ClickHouse with third-party drivers is not tested. 
Also Unit tests are useful when you want to test not the ClickHouse as a whole, but a single isolated library or class. You can enable or disable build of tests with `ENABLE_TESTS` CMake option. Unit tests (and other test programs) are located in `tests` subdirectories across the code. To run unit tests, type `ninja test`. Some tests use `gtest`, but some are just programs that return non-zero exit code on test failure. -It’s not necessarily to have unit tests if the code is already covered by functional tests (and functional tests are usually much more simple to use). +It’s not necessary to have unit tests if the code is already covered by functional tests (and functional tests are usually much more simple to use). + +You can run individual gtest checks by calling the executable directly, for example: + +```bash +$ ./src/unit_tests_dbms --gtest_filter=LocalAddress* +``` ## Performance Tests {#performance-tests} diff --git a/docs/en/engines/database-engines/atomic.md b/docs/en/engines/database-engines/atomic.md index 4f5f69a5ab7..bdab87aa4b1 100644 --- a/docs/en/engines/database-engines/atomic.md +++ b/docs/en/engines/database-engines/atomic.md @@ -17,7 +17,7 @@ It supports non-blocking [DROP TABLE](#drop-detach-table) and [RENAME TABLE](#re ### Table UUID {#table-uuid} -All tables in database `Atomic` have persistent [UUID](../../sql-reference/data-types/uuid.md) and store data in directory `/clickhouse_path/store/xxx/xxxyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/`, where `xxxyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy` is UUID of the table. +All tables in database `Atomic` have persistent [UUID](../../sql-reference/data-types/uuid.md) and store data in directory `/clickhouse_path/store/xxx/xxxyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/`, where `xxxyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy` is UUID of the table. Usually, the UUID is generated automatically, but the user can also explicitly specify the UUID in the same way when creating the table (this is not recommended). To display the `SHOW CREATE` query with the UUID you can use setting [show_table_uuid_in_table_create_query_if_not_nil](../../operations/settings/settings.md#show_table_uuid_in_table_create_query_if_not_nil). For example: ```sql @@ -47,7 +47,7 @@ EXCHANGE TABLES new_table AND old_table; ### ReplicatedMergeTree in Atomic Database {#replicatedmergetree-in-atomic-database} -For [ReplicatedMergeTree](../table-engines/mergetree-family/replication.md#table_engines-replication) tables, it is recommended to not specify engine parameters - path in ZooKeeper and replica name. In this case, configuration parameters will be used [default_replica_path](../../operations/server-configuration-parameters/settings.md#default_replica_path) and [default_replica_name](../../operations/server-configuration-parameters/settings.md#default_replica_name). If you want to specify engine parameters explicitly, it is recommended to use {uuid} macros. This is useful so that unique paths are automatically generated for each table in ZooKeeper. +For [ReplicatedMergeTree](../table-engines/mergetree-family/replication.md#table_engines-replication) tables, it is recommended to not specify engine parameters - path in ZooKeeper and replica name. In this case, configuration parameters will be used [default_replica_path](../../operations/server-configuration-parameters/settings.md#default_replica_path) and [default_replica_name](../../operations/server-configuration-parameters/settings.md#default_replica_name). 
If you want to specify engine parameters explicitly, it is recommended to use `{uuid}` macros. This is useful so that unique paths are automatically generated for each table in ZooKeeper. ## See Also diff --git a/docs/en/engines/database-engines/index.md b/docs/en/engines/database-engines/index.md index b6892099378..264f7a44a4e 100644 --- a/docs/en/engines/database-engines/index.md +++ b/docs/en/engines/database-engines/index.md @@ -14,7 +14,7 @@ You can also use the following database engines: - [MySQL](../../engines/database-engines/mysql.md) -- [MaterializeMySQL](../../engines/database-engines/materialize-mysql.md) +- [MaterializedMySQL](../../engines/database-engines/materialized-mysql.md) - [Lazy](../../engines/database-engines/lazy.md) @@ -22,4 +22,4 @@ You can also use the following database engines: - [PostgreSQL](../../engines/database-engines/postgresql.md) -[Original article](https://clickhouse.tech/docs/en/database_engines/) +- [Replicated](../../engines/database-engines/replicated.md) diff --git a/docs/en/engines/database-engines/materialize-mysql.md b/docs/en/engines/database-engines/materialized-mysql.md similarity index 57% rename from docs/en/engines/database-engines/materialize-mysql.md rename to docs/en/engines/database-engines/materialized-mysql.md index 93e4aedfd5a..20e16473115 100644 --- a/docs/en/engines/database-engines/materialize-mysql.md +++ b/docs/en/engines/database-engines/materialized-mysql.md @@ -1,9 +1,11 @@ --- toc_priority: 29 -toc_title: MaterializeMySQL +toc_title: MaterializedMySQL --- -# MaterializeMySQL {#materialize-mysql} +# MaterializedMySQL {#materialized-mysql} + +**This is experimental feature that should not be used in production.** Creates ClickHouse database with all the tables existing in MySQL, and all the data in those tables. @@ -15,7 +17,7 @@ This feature is experimental. ``` sql CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster] -ENGINE = MaterializeMySQL('host:port', ['database' | database], 'user', 'password') [SETTINGS ...] +ENGINE = MaterializedMySQL('host:port', ['database' | database], 'user', 'password') [SETTINGS ...] ``` **Engine Parameters** @@ -25,13 +27,35 @@ ENGINE = MaterializeMySQL('host:port', ['database' | database], 'user', 'passwor - `user` — MySQL user. - `password` — User password. +**Engine Settings** +- `max_rows_in_buffer` — Max rows that data is allowed to cache in memory(for single table and the cache data unable to query). when rows is exceeded, the data will be materialized. Default: `65505`. +- `max_bytes_in_buffer` — Max bytes that data is allowed to cache in memory(for single table and the cache data unable to query). when rows is exceeded, the data will be materialized. Default: `1048576`. +- `max_rows_in_buffers` — Max rows that data is allowed to cache in memory(for database and the cache data unable to query). when rows is exceeded, the data will be materialized. Default: `65505`. +- `max_bytes_in_buffers` — Max bytes that data is allowed to cache in memory(for database and the cache data unable to query). when rows is exceeded, the data will be materialized. Default: `1048576`. +- `max_flush_data_time` — Max milliseconds that data is allowed to cache in memory(for database and the cache data unable to query). when this time is exceeded, the data will be materialized. Default: `1000`. +- `max_wait_time_when_mysql_unavailable` — Retry interval when MySQL is not available (milliseconds). Negative value disable retry. Default: `1000`. 
+- `allows_query_when_mysql_lost` — Allow query materialized table when mysql is lost. Default: `0` (`false`). +``` +CREATE DATABASE mysql ENGINE = MaterializedMySQL('localhost:3306', 'db', 'user', '***') + SETTINGS + allows_query_when_mysql_lost=true, + max_wait_time_when_mysql_unavailable=10000; +``` + +**Settings on MySQL-server side** + +For the correct work of `MaterializeMySQL`, there are few mandatory `MySQL`-side configuration settings that should be set: + +- `default_authentication_plugin = mysql_native_password` since `MaterializeMySQL` can only authorize with this method. +- `gtid_mode = on` since GTID based logging is a mandatory for providing correct `MaterializeMySQL` replication. Pay attention that while turning this mode `On` you should also specify `enforce_gtid_consistency = on`. + ## Virtual columns {#virtual-columns} -When working with the `MaterializeMySQL` database engine, [ReplacingMergeTree](../../engines/table-engines/mergetree-family/replacingmergetree.md) tables are used with virtual `_sign` and `_version` columns. - +When working with the `MaterializedMySQL` database engine, [ReplacingMergeTree](../../engines/table-engines/mergetree-family/replacingmergetree.md) tables are used with virtual `_sign` and `_version` columns. + - `_version` — Transaction counter. Type [UInt64](../../sql-reference/data-types/int-uint.md). - `_sign` — Deletion mark. Type [Int8](../../sql-reference/data-types/int-uint.md). Possible values: - - `1` — Row is not deleted, + - `1` — Row is not deleted, - `-1` — Row is deleted. ## Data Types Support {#data_types-support} @@ -53,6 +77,7 @@ When working with the `MaterializeMySQL` database engine, [ReplacingMergeTree](. | STRING | [String](../../sql-reference/data-types/string.md) | | VARCHAR, VAR_STRING | [String](../../sql-reference/data-types/string.md) | | BLOB | [String](../../sql-reference/data-types/string.md) | +| BINARY | [FixedString](../../sql-reference/data-types/fixedstring.md) | Other types are not supported. If MySQL table contains a column of such type, ClickHouse throws exception "Unhandled data type" and stops replication. @@ -60,28 +85,38 @@ Other types are not supported. If MySQL table contains a column of such type, Cl ## Specifics and Recommendations {#specifics-and-recommendations} +### Compatibility restrictions + +Apart of the data types limitations there are few restrictions comparing to `MySQL` databases, that should be resolved before replication will be possible: + +- Each table in `MySQL` should contain `PRIMARY KEY`. + +- Replication for tables, those are containing rows with `ENUM` field values out of range (specified in `ENUM` signature) will not work. + ### DDL Queries {#ddl-queries} MySQL DDL queries are converted into the corresponding ClickHouse DDL queries ([ALTER](../../sql-reference/statements/alter/index.md), [CREATE](../../sql-reference/statements/create/index.md), [DROP](../../sql-reference/statements/drop.md), [RENAME](../../sql-reference/statements/rename.md)). If ClickHouse cannot parse some DDL query, the query is ignored. ### Data Replication {#data-replication} -`MaterializeMySQL` does not support direct `INSERT`, `DELETE` and `UPDATE` queries. However, they are supported in terms of data replication: +`MaterializedMySQL` does not support direct `INSERT`, `DELETE` and `UPDATE` queries. However, they are supported in terms of data replication: - MySQL `INSERT` query is converted into `INSERT` with `_sign=1`. -- MySQL `DELETE` query is converted into `INSERT` with `_sign=-1`. 
+- MySQL `DELETE` query is converted into `INSERT` with `_sign=-1`. - MySQL `UPDATE` query is converted into `INSERT` with `_sign=-1` and `INSERT` with `_sign=1`. -### Selecting from MaterializeMySQL Tables {#select} +### Selecting from MaterializedMySQL Tables {#select} -`SELECT` query from `MaterializeMySQL` tables has some specifics: +`SELECT` query from `MaterializedMySQL` tables has some specifics: - If `_version` is not specified in the `SELECT` query, [FINAL](../../sql-reference/statements/select/from.md#select-from-final) modifier is used. So only rows with `MAX(_version)` are selected. - If `_sign` is not specified in the `SELECT` query, `WHERE _sign=1` is used by default. So the deleted rows are not included into the result set. +- The result includes columns comments in case they exist in MySQL database tables. + ### Index Conversion {#index-conversion} MySQL `PRIMARY KEY` and `INDEX` clauses are converted into `ORDER BY` tuples in ClickHouse tables. @@ -91,10 +126,10 @@ ClickHouse has only one physical order, which is determined by `ORDER BY` clause **Notes** - Rows with `_sign=-1` are not deleted physically from the tables. -- Cascade `UPDATE/DELETE` queries are not supported by the `MaterializeMySQL` engine. +- Cascade `UPDATE/DELETE` queries are not supported by the `MaterializedMySQL` engine. - Replication can be easily broken. - Manual operations on database and tables are forbidden. -- `MaterializeMySQL` is influenced by [optimize_on_insert](../../operations/settings/settings.md#optimize-on-insert) setting. The data is merged in the corresponding table in the `MaterializeMySQL` database when a table in the MySQL server changes. +- `MaterializedMySQL` is influenced by [optimize_on_insert](../../operations/settings/settings.md#optimize-on-insert) setting. The data is merged in the corresponding table in the `MaterializedMySQL` database when a table in the MySQL server changes. ## Examples of Use {#examples-of-use} @@ -111,9 +146,9 @@ mysql> SELECT * FROM test; ``` ```text -+---+------+------+ ++---+------+------+ | a | b | c | -+---+------+------+ ++---+------+------+ | 2 | 222 | Wow! | +---+------+------+ ``` @@ -123,7 +158,7 @@ Database in ClickHouse, exchanging data with the MySQL server: The database and the table created: ``` sql -CREATE DATABASE mysql ENGINE = MaterializeMySQL('localhost:3306', 'db', 'user', '***'); +CREATE DATABASE mysql ENGINE = MaterializedMySQL('localhost:3306', 'db', 'user', '***'); SHOW TABLES FROM mysql; ``` @@ -140,9 +175,9 @@ SELECT * FROM mysql.test; ``` ``` text -┌─a─┬──b─┠-│ 1 │ 11 │ -│ 2 │ 22 │ +┌─a─┬──b─┠+│ 1 │ 11 │ +│ 2 │ 22 │ └───┴────┘ ``` @@ -153,9 +188,9 @@ SELECT * FROM mysql.test; ``` ``` text -┌─a─┬───b─┬─c────┠-│ 2 │ 222 │ Wow! │ +┌─a─┬───b─┬─c────┠+│ 2 │ 222 │ Wow! │ └───┴─────┴──────┘ ``` -[Original article](https://clickhouse.tech/docs/en/engines/database-engines/materialize-mysql/) +[Original article](https://clickhouse.tech/docs/en/engines/database-engines/materialized-mysql/) diff --git a/docs/en/engines/database-engines/mysql.md b/docs/en/engines/database-engines/mysql.md index 2de07d1fe4b..0507be6bd46 100644 --- a/docs/en/engines/database-engines/mysql.md +++ b/docs/en/engines/database-engines/mysql.md @@ -53,7 +53,7 @@ All other MySQL data types are converted into [String](../../sql-reference/data- ## Global Variables Support {#global-variables-support} -For better compatibility you may address global variables in MySQL style, as `@@identifier`. 
+For better compatibility you may address global variables in MySQL style, as `@@identifier`. These variables are supported: - `version` diff --git a/docs/en/engines/database-engines/postgresql.md b/docs/en/engines/database-engines/postgresql.md index 1fa86b7ac21..9e339f9d6f4 100644 --- a/docs/en/engines/database-engines/postgresql.md +++ b/docs/en/engines/database-engines/postgresql.md @@ -14,8 +14,8 @@ Supports table structure modifications (`ALTER TABLE ... ADD|DROP COLUMN`). If ` ## Creating a Database {#creating-a-database} ``` sql -CREATE DATABASE test_database -ENGINE = PostgreSQL('host:port', 'database', 'user', 'password'[, `use_table_cache`]); +CREATE DATABASE test_database +ENGINE = PostgreSQL('host:port', 'database', 'user', 'password'[, `schema`, `use_table_cache`]); ``` **Engine Parameters** @@ -24,6 +24,7 @@ ENGINE = PostgreSQL('host:port', 'database', 'user', 'password'[, `use_table_cac - `database` — Remote database name. - `user` — PostgreSQL user. - `password` — User password. +- `schema` — PostgreSQL schema. - `use_table_cache` — Defines if the database table structure is cached or not. Optional. Default value: `0`. ## Data Types Support {#data_types-support} @@ -43,14 +44,14 @@ ENGINE = PostgreSQL('host:port', 'database', 'user', 'password'[, `use_table_cac | TEXT, CHAR | [String](../../sql-reference/data-types/string.md) | | INTEGER | Nullable([Int32](../../sql-reference/data-types/int-uint.md))| | ARRAY | [Array](../../sql-reference/data-types/array.md) | - + ## Examples of Use {#examples-of-use} Database in ClickHouse, exchanging data with the PostgreSQL server: ``` sql -CREATE DATABASE test_database +CREATE DATABASE test_database ENGINE = PostgreSQL('postgres1:5432', 'test_database', 'postgres', 'mysecretpassword', 1); ``` @@ -102,7 +103,7 @@ SELECT * FROM test_database.test_table; └────────┴───────┘ ``` -Consider the table structure was modified in PostgreSQL: +Consider the table structure was modified in PostgreSQL: ``` sql postgre> ALTER TABLE test_table ADD COLUMN data Text diff --git a/docs/en/engines/database-engines/replicated.md b/docs/en/engines/database-engines/replicated.md new file mode 100644 index 00000000000..ed8406e6a86 --- /dev/null +++ b/docs/en/engines/database-engines/replicated.md @@ -0,0 +1,115 @@ +# [experimental] Replicated {#replicated} + +The engine is based on the [Atomic](../../engines/database-engines/atomic.md) engine. It supports replication of metadata via DDL log being written to ZooKeeper and executed on all of the replicas for a given database. + +One ClickHouse server can have multiple replicated databases running and updating at the same time. But there can't be multiple replicas of the same replicated database. + +## Creating a Database {#creating-a-database} +``` sql + CREATE DATABASE testdb ENGINE = Replicated('zoo_path', 'shard_name', 'replica_name') [SETTINGS ...] +``` + +**Engine Parameters** + +- `zoo_path` — ZooKeeper path. The same ZooKeeper path corresponds to the same database. +- `shard_name` — Shard name. Database replicas are grouped into shards by `shard_name`. +- `replica_name` — Replica name. Replica names must be different for all replicas of the same shard. + +!!! note "Warning" + For [ReplicatedMergeTree](../table-engines/mergetree-family/replication.md#table_engines-replication) tables if no arguments provided, then default arguments are used: `/clickhouse/tables/{uuid}/{shard}` and `{replica}`. 
These can be changed in the server settings [default_replica_path](../../operations/server-configuration-parameters/settings.md#default_replica_path) and [default_replica_name](../../operations/server-configuration-parameters/settings.md#default_replica_name). Macro `{uuid}` is unfolded to table's uuid, `{shard}` and `{replica}` are unfolded to values from server config, not from database engine arguments. But in the future, it will be possible to use `shard_name` and `replica_name` of Replicated database. + +## Specifics and Recommendations {#specifics-and-recommendations} + +DDL queries with `Replicated` database work in a similar way to [ON CLUSTER](../../sql-reference/distributed-ddl.md) queries, but with minor differences. + +First, the DDL request tries to execute on the initiator (the host that originally received the request from the user). If the request is not fulfilled, then the user immediately receives an error, other hosts do not try to fulfill it. If the request has been successfully completed on the initiator, then all other hosts will automatically retry until they complete it. The initiator will try to wait for the query to be completed on other hosts (no longer than [distributed_ddl_task_timeout](../../operations/settings/settings.md#distributed_ddl_task_timeout)) and will return a table with the query execution statuses on each host. + +The behavior in case of errors is regulated by the [distributed_ddl_output_mode](../../operations/settings/settings.md#distributed_ddl_output_mode) setting, for a `Replicated` database it is better to set it to `null_status_on_timeout` — i.e. if some hosts did not have time to execute the request for [distributed_ddl_task_timeout](../../operations/settings/settings.md#distributed_ddl_task_timeout), then do not throw an exception, but show the `NULL` status for them in the table. + +The [system.clusters](../../operations/system-tables/clusters.md) system table contains a cluster named like the replicated database, which consists of all replicas of the database. This cluster is updated automatically when creating/deleting replicas, and it can be used for [Distributed](../../engines/table-engines/special/distributed.md#distributed) tables. + +When creating a new replica of the database, this replica creates tables by itself. If the replica has been unavailable for a long time and has lagged behind the replication log — it checks its local metadata with the current metadata in ZooKeeper, moves the extra tables with data to a separate non-replicated database (so as not to accidentally delete anything superfluous), creates the missing tables, updates the table names if they have been renamed. The data is replicated at the `ReplicatedMergeTree` level, i.e. if the table is not replicated, the data will not be replicated (the database is responsible only for metadata). 
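As a practical note on the recommendation above, the output mode can be applied per session before issuing DDL against a `Replicated` database — a minimal sketch (the `testdb.tbl` name is only an illustration, reusing the `testdb` database from the creation example):

```sql
-- show NULL for lagging hosts instead of throwing TIMEOUT_EXCEEDED
SET distributed_ddl_output_mode = 'null_status_on_timeout';

-- subsequent DDL is written to the database's DDL log and replicated to all replicas
CREATE TABLE testdb.tbl (n UInt64) ENGINE = ReplicatedMergeTree ORDER BY n;
```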
+
+
+## Usage Example {#usage-example}
+
+Creating a cluster with three hosts:
+
+``` sql
+node1 :) CREATE DATABASE r ENGINE=Replicated('some/path/r','shard1','replica1');
+node2 :) CREATE DATABASE r ENGINE=Replicated('some/path/r','shard1','other_replica');
+node3 :) CREATE DATABASE r ENGINE=Replicated('some/path/r','other_shard','{replica}');
+```
+
+Running the DDL query:
+
+``` sql
+CREATE TABLE r.rmt (n UInt64) ENGINE=ReplicatedMergeTree ORDER BY n;
+```
+
+``` text
+┌─────hosts────────────┬──status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
+│ shard1|replica1      │    0    │       │          2          │        0         │
+│ shard1|other_replica │    0    │       │          1          │        0         │
+│ other_shard|r1       │    0    │       │          0          │        0         │
+└──────────────────────┴─────────┴───────┴─────────────────────┴──────────────────┘
+```
+
+Showing the system table:
+
+``` sql
+SELECT cluster, shard_num, replica_num, host_name, host_address, port, is_local
+FROM system.clusters WHERE cluster='r';
+```
+
+``` text
+┌─cluster─┬─shard_num─┬─replica_num─┬─host_name─┬─host_address─┬─port─┬─is_local─┐
+│ r       │         1 │           1 │ node3     │ 127.0.0.1    │ 9002 │        0 │
+│ r       │         2 │           1 │ node2     │ 127.0.0.1    │ 9001 │        0 │
+│ r       │         2 │           2 │ node1     │ 127.0.0.1    │ 9000 │        1 │
+└─────────┴───────────┴─────────────┴───────────┴──────────────┴──────┴──────────┘
+```
+
+Creating a distributed table and inserting the data:
+
+``` sql
+node2 :) CREATE TABLE r.d (n UInt64) ENGINE=Distributed('r','r','rmt', n % 2);
+node3 :) INSERT INTO r.d SELECT * FROM numbers(10);
+node1 :) SELECT materialize(hostName()) AS host, groupArray(n) FROM r.d GROUP BY host;
+```
+
+``` text
+┌─hosts─┬─groupArray(n)─┐
+│ node1 │ [1,3,5,7,9]   │
+│ node2 │ [0,2,4,6,8]   │
+└───────┴───────────────┘
+```
+
+Adding a replica on one more host:
+
+``` sql
+node4 :) CREATE DATABASE r ENGINE=Replicated('some/path/r','other_shard','r2');
+```
+
+The cluster configuration will look like this:
+
+``` text
+┌─cluster─┬─shard_num─┬─replica_num─┬─host_name─┬─host_address─┬─port─┬─is_local─┐
+│ r       │         1 │           1 │ node3     │ 127.0.0.1    │ 9002 │        0 │
+│ r       │         1 │           2 │ node4     │ 127.0.0.1    │ 9003 │        0 │
+│ r       │         2 │           1 │ node2     │ 127.0.0.1    │ 9001 │        0 │
+│ r       │         2 │           2 │ node1     │ 127.0.0.1    │ 9000 │        1 │
+└─────────┴───────────┴─────────────┴───────────┴──────────────┴──────┴──────────┘
+```
+
+The distributed table will also get data from the new host:
+
+```sql
+node2 :) SELECT materialize(hostName()) AS host, groupArray(n) FROM r.d GROUP BY host;
+```
+
+```text
+┌─hosts─┬─groupArray(n)─┐
+│ node2 │ [1,3,5,7,9]   │
+│ node4 │ [0,2,4,6,8]   │
+└───────┴───────────────┘
+```
\ No newline at end of file
diff --git a/docs/en/engines/table-engines/integrations/ExternalDistributed.md b/docs/en/engines/table-engines/integrations/ExternalDistributed.md
index 819abdbf9d7..0ecbc5383e1 100644
--- a/docs/en/engines/table-engines/integrations/ExternalDistributed.md
+++ b/docs/en/engines/table-engines/integrations/ExternalDistributed.md
@@ -35,7 +35,7 @@ The table structure can differ from the original table structure:
 - `password` — User password.
 
 ## Implementation Details {#implementation-details}
- 
+
 Supports multiple replicas that must be listed by `|` and shards must be listed by `,`.
For example: ```sql diff --git a/docs/en/engines/table-engines/integrations/embedded-rocksdb.md b/docs/en/engines/table-engines/integrations/embedded-rocksdb.md index 88c8973eeab..b55f1b68aea 100644 --- a/docs/en/engines/table-engines/integrations/embedded-rocksdb.md +++ b/docs/en/engines/table-engines/integrations/embedded-rocksdb.md @@ -20,7 +20,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] Required parameters: -- `primary_key_name` – any column name in the column list. +- `primary_key_name` – any column name in the column list. - `primary key` must be specified, it supports only one column in the primary key. The primary key will be serialized in binary as a `rocksdb key`. - columns other than the primary key will be serialized in binary as `rocksdb` value in corresponding order. - queries with key `equals` or `in` filtering will be optimized to multi keys lookup from `rocksdb`. @@ -39,4 +39,46 @@ ENGINE = EmbeddedRocksDB PRIMARY KEY key ``` +## Metrics + +There is also `system.rocksdb` table, that expose rocksdb statistics: + +```sql +SELECT + name, + value +FROM system.rocksdb + +┌─name──────────────────────┬─value─┠+│ no.file.opens │ 1 │ +│ number.block.decompressed │ 1 │ +└───────────────────────────┴───────┘ +``` + +## Configuration + +You can also change any [rocksdb options](https://github.com/facebook/rocksdb/wiki/Option-String-and-Option-Map) using config: + +```xml + + + 8 + + + 2 + + + + TABLE + + 8 + + + 2 + +
+
+
+``` + [Original article](https://clickhouse.tech/docs/en/engines/table-engines/integrations/embedded-rocksdb/) diff --git a/docs/en/engines/table-engines/integrations/mongodb.md b/docs/en/engines/table-engines/integrations/mongodb.md index a378ab03f55..9839893d4e8 100644 --- a/docs/en/engines/table-engines/integrations/mongodb.md +++ b/docs/en/engines/table-engines/integrations/mongodb.md @@ -15,7 +15,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name name1 [type1], name2 [type2], ... -) ENGINE = MongoDB(host:port, database, collection, user, password); +) ENGINE = MongoDB(host:port, database, collection, user, password [, options]); ``` **Engine Parameters** @@ -30,18 +30,30 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name - `password` — User password. +- `options` — MongoDB connection string options (optional parameter). + ## Usage Example {#usage-example} -Table in ClickHouse which allows to read data from MongoDB collection: +Create a table in ClickHouse which allows to read data from MongoDB collection: ``` text CREATE TABLE mongo_table ( - key UInt64, + key UInt64, data String ) ENGINE = MongoDB('mongo1:27017', 'test', 'simple_table', 'testuser', 'clickhouse'); ``` +To read from an SSL secured MongoDB server: + +``` text +CREATE TABLE mongo_table_ssl +( + key UInt64, + data String +) ENGINE = MongoDB('mongo2:27017', 'test', 'simple_table', 'testuser', 'clickhouse', 'ssl=true'); +``` + Query: ``` sql diff --git a/docs/en/engines/table-engines/integrations/postgresql.md b/docs/en/engines/table-engines/integrations/postgresql.md index 1a8f2c4b758..4c763153a36 100644 --- a/docs/en/engines/table-engines/integrations/postgresql.md +++ b/docs/en/engines/table-engines/integrations/postgresql.md @@ -49,14 +49,14 @@ PostgreSQL `Array` types are converted into ClickHouse arrays. !!! info "Note" Be careful - in PostgreSQL an array data, created like a `type_name[]`, may contain multi-dimensional arrays of different dimensions in different table rows in same column. But in ClickHouse it is only allowed to have multidimensional arrays of the same count of dimensions in all table rows in same column. - + Supports multiple replicas that must be listed by `|`. For example: ```sql CREATE TABLE test_replicas (id UInt32, name String) ENGINE = PostgreSQL(`postgres{2|3|4}:5432`, 'clickhouse', 'test_replicas', 'postgres', 'mysecretpassword'); ``` -Replicas priority for PostgreSQL dictionary source is supported. The bigger the number in map, the less the priority. The highest priority is `0`. +Replicas priority for PostgreSQL dictionary source is supported. The bigger the number in map, the less the priority. The highest priority is `0`. In the example below replica `example01-1` has the highest priority: diff --git a/docs/en/engines/table-engines/log-family/index.md b/docs/en/engines/table-engines/log-family/index.md index 8cdde239f44..b505fe1c474 100644 --- a/docs/en/engines/table-engines/log-family/index.md +++ b/docs/en/engines/table-engines/log-family/index.md @@ -14,6 +14,8 @@ Engines of the family: - [Log](../../../engines/table-engines/log-family/log.md) - [TinyLog](../../../engines/table-engines/log-family/tinylog.md) +`Log` family table engines can store data to [HDFS](../../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-hdfs) or [S3](../../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-s3) distributed file systems. 
+ ## Common Properties {#common-properties} Engines: diff --git a/docs/en/engines/table-engines/log-family/log.md b/docs/en/engines/table-engines/log-family/log.md index 87cdb890e9f..2aeef171128 100644 --- a/docs/en/engines/table-engines/log-family/log.md +++ b/docs/en/engines/table-engines/log-family/log.md @@ -5,10 +5,8 @@ toc_title: Log # Log {#log} -Engine belongs to the family of log engines. See the common properties of log engines and their differences in the [Log Engine Family](../../../engines/table-engines/log-family/index.md) article. +The engine belongs to the family of `Log` engines. See the common properties of `Log` engines and their differences in the [Log Engine Family](../../../engines/table-engines/log-family/index.md) article. -Log differs from [TinyLog](../../../engines/table-engines/log-family/tinylog.md) in that a small file of “marks†resides with the column files. These marks are written on every data block and contain offsets that indicate where to start reading the file in order to skip the specified number of rows. This makes it possible to read table data in multiple threads. +`Log` differs from [TinyLog](../../../engines/table-engines/log-family/tinylog.md) in that a small file of "marks" resides with the column files. These marks are written on every data block and contain offsets that indicate where to start reading the file in order to skip the specified number of rows. This makes it possible to read table data in multiple threads. For concurrent data access, the read operations can be performed simultaneously, while write operations block reads and each other. -The Log engine does not support indexes. Similarly, if writing to a table failed, the table is broken, and reading from it returns an error. The Log engine is appropriate for temporary data, write-once tables, and for testing or demonstration purposes. - -[Original article](https://clickhouse.tech/docs/en/operations/table_engines/log/) +The `Log` engine does not support indexes. Similarly, if writing to a table failed, the table is broken, and reading from it returns an error. The `Log` engine is appropriate for temporary data, write-once tables, and for testing or demonstration purposes. diff --git a/docs/en/engines/table-engines/mergetree-family/mergetree.md b/docs/en/engines/table-engines/mergetree-family/mergetree.md index 9d259456ea5..561b0ad8023 100644 --- a/docs/en/engines/table-engines/mergetree-family/mergetree.md +++ b/docs/en/engines/table-engines/mergetree-family/mergetree.md @@ -76,7 +76,7 @@ For a description of parameters, see the [CREATE query description](../../../sql - `SAMPLE BY` — An expression for sampling. Optional. - If a sampling expression is used, the primary key must contain it. Example: `SAMPLE BY intHash32(UserID) ORDER BY (CounterID, EventDate, intHash32(UserID))`. + If a sampling expression is used, the primary key must contain it. The result of sampling expression must be unsigned integer. Example: `SAMPLE BY intHash32(UserID) ORDER BY (CounterID, EventDate, intHash32(UserID))`. - `TTL` — A list of rules specifying storage duration of rows and defining logic of automatic parts movement [between disks and volumes](#table_engine-mergetree-multiple-volumes). Optional. @@ -728,7 +728,9 @@ During this time, they are not moved to other volumes or disks. Therefore, until ## Using S3 for Data Storage {#table_engine-mergetree-s3} -`MergeTree` family table engines is able to store data to [S3](https://aws.amazon.com/s3/) using a disk with type `s3`. 
+`MergeTree` family table engines can store data to [S3](https://aws.amazon.com/s3/) using a disk with type `s3`. + +This feature is under development and not ready for production. There are known drawbacks such as very low performance. Configuration markup: ``` xml @@ -762,11 +764,13 @@ Configuration markup: ``` Required parameters: -- `endpoint` — S3 endpoint url in `path` or `virtual hosted` [styles](https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html). Endpoint url should contain bucket and root path to store data. + +- `endpoint` — S3 endpoint URL in `path` or `virtual hosted` [styles](https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html). Endpoint URL should contain a bucket and root path to store data. - `access_key_id` — S3 access key id. - `secret_access_key` — S3 secret access key. Optional parameters: + - `region` — S3 region name. - `use_environment_credentials` — Reads AWS credentials from the Environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN if they exist. Default value is `false`. - `use_insecure_imds_request` — If set to `true`, S3 client will use insecure IMDS request while obtaining credentials from Amazon EC2 metadata. Default value is `false`. @@ -782,7 +786,6 @@ Optional parameters: - `skip_access_check` — If true, disk access checks will not be performed on disk start-up. Default value is `false`. - `server_side_encryption_customer_key_base64` — If specified, required headers for accessing S3 objects with SSE-C encryption will be set. - S3 disk can be configured as `main` or `cold` storage: ``` xml @@ -821,4 +824,43 @@ S3 disk can be configured as `main` or `cold` storage: In case of `cold` option a data can be moved to S3 if local disk free size will be smaller than `move_factor * disk_size` or by TTL move rule. -[Original article](https://clickhouse.tech/docs/ru/operations/table_engines/mergetree/) +## Using HDFS for Data Storage {#table_engine-mergetree-hdfs} + +[HDFS](https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html) is a distributed file system for remote data storage. + +`MergeTree` family table engines can store data to HDFS using a disk with type `HDFS`. + +Configuration markup: +``` xml + + + + + hdfs + hdfs://hdfs1:9000/clickhouse/ + + + + + +
+ hdfs +
+
+
+
+
+ + + 0 + +
+``` + +Required parameters: + +- `endpoint` — HDFS endpoint URL in `path` format. Endpoint URL should contain a root path to store data. + +Optional parameters: + +- `min_bytes_for_seek` — The minimal number of bytes to use seek operation instead of sequential read. Default value: `1 Mb`. diff --git a/docs/en/engines/table-engines/mergetree-family/replication.md b/docs/en/engines/table-engines/mergetree-family/replication.md index 2db6686beb7..4fc30355927 100644 --- a/docs/en/engines/table-engines/mergetree-family/replication.md +++ b/docs/en/engines/table-engines/mergetree-family/replication.md @@ -101,7 +101,7 @@ For very large clusters, you can use different ZooKeeper clusters for different Replication is asynchronous and multi-master. `INSERT` queries (as well as `ALTER`) can be sent to any available server. Data is inserted on the server where the query is run, and then it is copied to the other servers. Because it is asynchronous, recently inserted data appears on the other replicas with some latency. If part of the replicas are not available, the data is written when they become available. If a replica is available, the latency is the amount of time it takes to transfer the block of compressed data over the network. The number of threads performing background tasks for replicated tables can be set by [background_schedule_pool_size](../../../operations/settings/settings.md#background_schedule_pool_size) setting. -`ReplicatedMergeTree` engine uses a separate thread pool for replicated fetches. Size of the pool is limited by the [background_fetches_pool_size](../../../operations/settings/settings.md#background_fetches_pool_size) setting which can be tuned with a server restart. +`ReplicatedMergeTree` engine uses a separate thread pool for replicated fetches. Size of the pool is limited by the [background_fetches_pool_size](../../../operations/settings/settings.md#background_fetches_pool_size) setting which can be tuned with a server restart. By default, an INSERT query waits for confirmation of writing the data from only one replica. If the data was successfully written to only one replica and the server with this replica ceases to exist, the stored data will be lost. To enable getting confirmation of data writes from multiple replicas, use the `insert_quorum` option. @@ -155,7 +155,7 @@ CREATE TABLE table_name
-As the example shows, these parameters can contain substitutions in curly brackets. The substituted values are taken from the «[macros](../../../operations/server-configuration-parameters/settings/#macros) section of the configuration file. +As the example shows, these parameters can contain substitutions in curly brackets. The substituted values are taken from the «[macros](../../../operations/server-configuration-parameters/settings/#macros) section of the configuration file. Example: @@ -198,7 +198,7 @@ In this case, you can omit arguments when creating tables: ``` sql CREATE TABLE table_name ( x UInt32 -) ENGINE = ReplicatedMergeTree +) ENGINE = ReplicatedMergeTree ORDER BY x; ``` @@ -207,7 +207,7 @@ It is equivalent to: ``` sql CREATE TABLE table_name ( x UInt32 -) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/table_name', '{replica}') +) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/table_name', '{replica}') ORDER BY x; ``` diff --git a/docs/en/engines/table-engines/special/distributed.md b/docs/en/engines/table-engines/special/distributed.md index 6de6602a216..5c911c6cc0a 100644 --- a/docs/en/engines/table-engines/special/distributed.md +++ b/docs/en/engines/table-engines/special/distributed.md @@ -37,6 +37,14 @@ Also, it accepts the following settings: - `max_delay_to_insert` - max delay of inserting data into Distributed table in seconds, if there are a lot of pending bytes for async send. Default 60. +- `monitor_batch_inserts` - same as [distributed_directory_monitor_batch_inserts](../../../operations/settings/settings.md#distributed_directory_monitor_batch_inserts) + +- `monitor_split_batch_on_failure` - same as [distributed_directory_monitor_split_batch_on_failure](../../../operations/settings/settings.md#distributed_directory_monitor_split_batch_on_failure) + +- `monitor_sleep_time_ms` - same as [distributed_directory_monitor_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_sleep_time_ms) + +- `monitor_max_sleep_time_ms` - same as [distributed_directory_monitor_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_max_sleep_time_ms) + !!! 
note "Note" **Durability settings** (`fsync_...`): diff --git a/docs/en/interfaces/formats.md b/docs/en/interfaces/formats.md index eb288721231..015afd1cd24 100644 --- a/docs/en/interfaces/formats.md +++ b/docs/en/interfaces/formats.md @@ -1130,17 +1130,18 @@ The table below shows supported data types and how they match ClickHouse [data t | `boolean`, `int`, `long`, `float`, `double` | [Int64](../sql-reference/data-types/int-uint.md), [UInt64](../sql-reference/data-types/int-uint.md) | `long` | | `boolean`, `int`, `long`, `float`, `double` | [Float32](../sql-reference/data-types/float.md) | `float` | | `boolean`, `int`, `long`, `float`, `double` | [Float64](../sql-reference/data-types/float.md) | `double` | -| `bytes`, `string`, `fixed`, `enum` | [String](../sql-reference/data-types/string.md) | `bytes` | +| `bytes`, `string`, `fixed`, `enum` | [String](../sql-reference/data-types/string.md) | `bytes` or `string` \* | | `bytes`, `string`, `fixed` | [FixedString(N)](../sql-reference/data-types/fixedstring.md) | `fixed(N)` | | `enum` | [Enum(8\|16)](../sql-reference/data-types/enum.md) | `enum` | | `array(T)` | [Array(T)](../sql-reference/data-types/array.md) | `array(T)` | | `union(null, T)`, `union(T, null)` | [Nullable(T)](../sql-reference/data-types/date.md) | `union(null, T)` | | `null` | [Nullable(Nothing)](../sql-reference/data-types/special-data-types/nothing.md) | `null` | -| `int (date)` \* | [Date](../sql-reference/data-types/date.md) | `int (date)` \* | -| `long (timestamp-millis)` \* | [DateTime64(3)](../sql-reference/data-types/datetime.md) | `long (timestamp-millis)` \* | -| `long (timestamp-micros)` \* | [DateTime64(6)](../sql-reference/data-types/datetime.md) | `long (timestamp-micros)` \* | +| `int (date)` \** | [Date](../sql-reference/data-types/date.md) | `int (date)` \** | +| `long (timestamp-millis)` \** | [DateTime64(3)](../sql-reference/data-types/datetime.md) | `long (timestamp-millis)` \* | +| `long (timestamp-micros)` \** | [DateTime64(6)](../sql-reference/data-types/datetime.md) | `long (timestamp-micros)` \* | -\* [Avro logical types](https://avro.apache.org/docs/current/spec.html#Logical+Types) +\* `bytes` is default, controlled by [output_format_avro_string_column_pattern](../operations/settings/settings.md#settings-output_format_avro_string_column_pattern) +\** [Avro logical types](https://avro.apache.org/docs/current/spec.html#Logical+Types) Unsupported Avro data types: `record` (non-root), `map` @@ -1246,12 +1247,14 @@ The table below shows supported data types and how they match ClickHouse [data t | `DOUBLE` | [Float64](../sql-reference/data-types/float.md) | `DOUBLE` | | `DATE32` | [Date](../sql-reference/data-types/date.md) | `UINT16` | | `DATE64`, `TIMESTAMP` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` | -| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `STRING` | -| — | [FixedString](../sql-reference/data-types/fixedstring.md) | `STRING` | +| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `BINARY` | +| — | [FixedString](../sql-reference/data-types/fixedstring.md) | `BINARY` | | `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | `DECIMAL` | | `LIST` | [Array](../sql-reference/data-types/array.md) | `LIST` | +| `STRUCT` | [Tuple](../sql-reference/data-types/tuple.md) | `STRUCT` | +| `MAP` | [Map](../sql-reference/data-types/map.md) | `MAP` | -Arrays can be nested and can have a value of the `Nullable` type as an argument. 
+Arrays can be nested and can have a value of the `Nullable` type as an argument. `Tuple` and `Map` types also can be nested. ClickHouse supports configurable precision of `Decimal` type. The `INSERT` query treats the Parquet `DECIMAL` type as the ClickHouse `Decimal128` type. @@ -1299,13 +1302,17 @@ The table below shows supported data types and how they match ClickHouse [data t | `DOUBLE` | [Float64](../sql-reference/data-types/float.md) | `FLOAT64` | | `DATE32` | [Date](../sql-reference/data-types/date.md) | `UINT16` | | `DATE64`, `TIMESTAMP` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` | -| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `UTF8` | -| `STRING`, `BINARY` | [FixedString](../sql-reference/data-types/fixedstring.md) | `UTF8` | +| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `BINARY` | +| `STRING`, `BINARY` | [FixedString](../sql-reference/data-types/fixedstring.md) | `BINARY` | | `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | `DECIMAL` | | `DECIMAL256` | [Decimal256](../sql-reference/data-types/decimal.md)| `DECIMAL256` | | `LIST` | [Array](../sql-reference/data-types/array.md) | `LIST` | +| `STRUCT` | [Tuple](../sql-reference/data-types/tuple.md) | `STRUCT` | +| `MAP` | [Map](../sql-reference/data-types/map.md) | `MAP` | -Arrays can be nested and can have a value of the `Nullable` type as an argument. +Arrays can be nested and can have a value of the `Nullable` type as an argument. `Tuple` and `Map` types also can be nested. + +The `DICTIONARY` type is supported for `INSERT` queries, and for `SELECT` queries there is an [output_format_arrow_low_cardinality_as_dictionary](../operations/settings/settings.md#output-format-arrow-low-cardinality-as-dictionary) setting that allows to output [LowCardinality](../sql-reference/data-types/lowcardinality.md) type as a `DICTIONARY` type. ClickHouse supports configurable precision of the `Decimal` type. The `INSERT` query treats the Arrow `DECIMAL` type as the ClickHouse `Decimal128` type. @@ -1358,8 +1365,10 @@ The table below shows supported data types and how they match ClickHouse [data t | `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `BINARY` | | `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | `DECIMAL` | | `LIST` | [Array](../sql-reference/data-types/array.md) | `LIST` | +| `STRUCT` | [Tuple](../sql-reference/data-types/tuple.md) | `STRUCT` | +| `MAP` | [Map](../sql-reference/data-types/map.md) | `MAP` | -Arrays can be nested and can have a value of the `Nullable` type as an argument. +Arrays can be nested and can have a value of the `Nullable` type as an argument. `Tuple` and `Map` types also can be nested. ClickHouse supports configurable precision of the `Decimal` type. The `INSERT` query treats the ORC `DECIMAL` type as the ClickHouse `Decimal128` type. diff --git a/docs/en/interfaces/http.md b/docs/en/interfaces/http.md index 5d0a17a5279..6454262122f 100644 --- a/docs/en/interfaces/http.md +++ b/docs/en/interfaces/http.md @@ -16,7 +16,7 @@ $ curl 'http://localhost:8123/' Ok. ``` -Web UI can be accessed here: `http://localhost:8123/play`. +Web UI can be accessed here: `http://localhost:8123/play`. ![Web UI](../images/play.png) diff --git a/docs/en/interfaces/third-party/gui.md b/docs/en/interfaces/third-party/gui.md index fffe0c87a53..2d7f3a24011 100644 --- a/docs/en/interfaces/third-party/gui.md +++ b/docs/en/interfaces/third-party/gui.md @@ -84,6 +84,8 @@ Features: - Table data preview. 
- Full-text search. +By default, DBeaver does not connect using a session (the CLI for example does). If you require session support (for example to set settings for your session), edit the driver connection properties and set session_id to a random string (it uses the http connection under the hood). Then you can use any setting from the query window + ### clickhouse-cli {#clickhouse-cli} [clickhouse-cli](https://github.com/hatarist/clickhouse-cli) is an alternative command-line client for ClickHouse, written in Python 3. diff --git a/docs/en/interfaces/third-party/integrations.md b/docs/en/interfaces/third-party/integrations.md index f28a870e206..48782268940 100644 --- a/docs/en/interfaces/third-party/integrations.md +++ b/docs/en/interfaces/third-party/integrations.md @@ -43,7 +43,7 @@ toc_title: Integrations - Monitoring - [Graphite](https://graphiteapp.org) - [graphouse](https://github.com/yandex/graphouse) - - [carbon-clickhouse](https://github.com/lomik/carbon-clickhouse) + + - [carbon-clickhouse](https://github.com/lomik/carbon-clickhouse) - [graphite-clickhouse](https://github.com/lomik/graphite-clickhouse) - [graphite-ch-optimizer](https://github.com/innogames/graphite-ch-optimizer) - optimizes staled partitions in [\*GraphiteMergeTree](../../engines/table-engines/mergetree-family/graphitemergetree.md#graphitemergetree) if rules from [rollup configuration](../../engines/table-engines/mergetree-family/graphitemergetree.md#rollup-configuration) could be applied - [Grafana](https://grafana.com/) diff --git a/docs/en/introduction/adopters.md b/docs/en/introduction/adopters.md index 990cb30346c..d408a3d6849 100644 --- a/docs/en/introduction/adopters.md +++ b/docs/en/introduction/adopters.md @@ -115,6 +115,7 @@ toc_title: Adopters | Sina | News | — | — | — | [Slides in Chinese, October 2018](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup19/6.%20ClickHouse最佳实践%20高é¹_新浪.pdf) | | SMI2 | News | Analytics | — | — | [Blog Post in Russian, November 2017](https://habr.com/ru/company/smi2/blog/314558/) | | Spark New Zealand | Telecommunications | Security Operations | — | — | [Blog Post, Feb 2020](https://blog.n0p.me/2020/02/2020-02-05-dnsmonster/) | +| Splitbee | Analytics | Main Product | — | — | [Blog Post, Mai 2021](https://splitbee.io/blog/new-pricing) | | Splunk | Business Analytics | Main product | — | — | [Slides in English, January 2018](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup12/splunk.pdf) | | Spotify | Music | Experimentation | — | — | [Slides, July 2018](https://www.slideshare.net/glebus/using-clickhouse-for-experimentation-104247173) | | Staffcop | Information Security | Main Product | — | — | [Official website, Documentation](https://www.staffcop.ru/sce43) | @@ -157,5 +158,6 @@ toc_title: Adopters | SigNoz | Observability Platform | Main Product | — | — | [Source code](https://github.com/SigNoz/signoz) | | ChelPipe Group | Analytics | — | — | — | [Blog post, June 2021](https://vc.ru/trade/253172-tyazhelomu-proizvodstvu-user-friendly-sayt-internet-magazin-trub-dlya-chtpz) | | Zagrava Trading | — | — | — | — | [Job offer, May 2021](https://twitter.com/datastackjobs/status/1394707267082063874) | +| Beeline | Telecom | Data Platform | — | — | [Blog post, July 2021](https://habr.com/en/company/beeline/blog/567508/) | [Original article](https://clickhouse.tech/docs/en/introduction/adopters/) diff --git a/docs/en/operations/clickhouse-keeper.md b/docs/en/operations/clickhouse-keeper.md index 6af12eb9b01..5fc1baa003c 100644 --- 
a/docs/en/operations/clickhouse-keeper.md +++ b/docs/en/operations/clickhouse-keeper.md @@ -10,7 +10,7 @@ ClickHouse server use [ZooKeeper](https://zookeeper.apache.org/) coordination sy !!! warning "Warning" This feature currently in pre-production stage. We test it in our CI and on small internal installations. -## Implemetation details +## Implementation details ZooKeeper is one of the first well-known open-source coordination systems. It's implemented in Java, has quite a simple and powerful data model. ZooKeeper's coordination algorithm called ZAB (ZooKeeper Atomic Broadcast) doesn't provide linearizability guarantees for reads, because each ZooKeeper node serves reads locally. Unlike ZooKeeper `clickhouse-keeper` written in C++ and use [RAFT algorithm](https://raft.github.io/) [implementation](https://github.com/eBay/NuRaft). This algorithm allows to have linearizability for reads and writes, has several open-source implementations in different languages. @@ -30,21 +30,25 @@ Other common parameters are inherited from clickhouse-server config (`listen_hos Internal coordination settings are located in `.` section: -- `operation_timeout_ms` — timeout for a single client operation -- `session_timeout_ms` — timeout for client session -- `dead_session_check_period_ms` — how often clickhouse-keeper check dead sessions and remove them -- `heart_beat_interval_ms` — how often a clickhouse-keeper leader will send heartbeats to followers -- `election_timeout_lower_bound_ms` — if follower didn't receive heartbeats from the leader in this interval, then it can initiate leader election -- `election_timeout_upper_bound_ms` — if follower didn't receive heartbeats from the leader in this interval, then it must initiate leader election -- `rotate_log_storage_interval` — how many logs to store in a single file -- `reserved_log_items` — how many coordination logs to store before compaction -- `snapshot_distance` — how often clickhouse-keeper will create new snapshots (in the number of logs) -- `snapshots_to_keep` — how many snapshots to keep -- `stale_log_gap` — the threshold when leader consider follower as stale and send snapshot to it instead of logs -- `force_sync` — call `fsync` on each write to coordination log -- `raft_logs_level` — text logging level about coordination (trace, debug, and so on) -- `shutdown_timeout` — wait to finish internal connections and shutdown -- `startup_timeout` — if the server doesn't connect to other quorum participants in the specified timeout it will terminate +- `operation_timeout_ms` — timeout for a single client operation (default: 10000) +- `session_timeout_ms` — timeout for client session (default: 30000) +- `dead_session_check_period_ms` — how often clickhouse-keeper check dead sessions and remove them (default: 500) +- `heart_beat_interval_ms` — how often a clickhouse-keeper leader will send heartbeats to followers (default: 500) +- `election_timeout_lower_bound_ms` — if follower didn't receive heartbeats from the leader in this interval, then it can initiate leader election (default: 1000) +- `election_timeout_upper_bound_ms` — if follower didn't receive heartbeats from the leader in this interval, then it must initiate leader election (default: 2000) +- `rotate_log_storage_interval` — how many log records to store in a single file (default: 100000) +- `reserved_log_items` — how many coordination log records to store before compaction (default: 100000) +- `snapshot_distance` — how often clickhouse-keeper will create new snapshots (in the number of records in 
logs) (default: 100000) +- `snapshots_to_keep` — how many snapshots to keep (default: 3) +- `stale_log_gap` — the threshold when leader consider follower as stale and send snapshot to it instead of logs (default: 10000) +- `fresh_log_gap` - when node became fresh (default: 200) +- `max_requests_batch_size` - max size of batch in requests count before it will be sent to RAFT (default: 100) +- `force_sync` — call `fsync` on each write to coordination log (default: true) +- `quorum_reads` - execute read requests as writes through whole RAFT consesus with similar speed (default: false) +- `raft_logs_level` — text logging level about coordination (trace, debug, and so on) (default: system default) +- `auto_forwarding` - allow to forward write requests from followers to leader (default: true) +- `shutdown_timeout` — wait to finish internal connections and shutdown (ms) (default: 5000) +- `startup_timeout` — if the server doesn't connect to other quorum participants in the specified timeout it will terminate (ms) (default: 30000) Quorum configuration is located in `.` section and contain servers description. The only parameter for the whole quorum is `secure`, which enables encrypted connection for communication between quorum participants. The main parameters for each `` are: diff --git a/docs/en/operations/performance-test.md b/docs/en/operations/performance-test.md index a808ffd0a85..d6d9cdb55cc 100644 --- a/docs/en/operations/performance-test.md +++ b/docs/en/operations/performance-test.md @@ -5,50 +5,67 @@ toc_title: Testing Hardware # How to Test Your Hardware with ClickHouse {#how-to-test-your-hardware-with-clickhouse} -With this instruction you can run basic ClickHouse performance test on any server without installation of ClickHouse packages. +You can run basic ClickHouse performance test on any server without installation of ClickHouse packages. -1. Go to “commits†page: https://github.com/ClickHouse/ClickHouse/commits/master -2. Click on the first green check mark or red cross with green “ClickHouse Build Check†and click on the “Details†link near “ClickHouse Build Checkâ€. There is no such link in some commits, for example commits with documentation. In this case, choose the nearest commit having this link. -3. Copy the link to `clickhouse` binary for amd64 or aarch64. -4. ssh to the server and download it with wget: + +## Automated Run + +You can run benchmark with a single script. + +1. Download the script. +``` +wget https://raw.githubusercontent.com/ClickHouse/ClickHouse/master/benchmark/hardware.sh +``` + +2. Run the script. +``` +chmod a+x ./hardware.sh +./hardware.sh +``` + +3. Copy the output and send it to clickhouse-feedback@yandex-team.com + +All the results are published here: https://clickhouse.tech/benchmark/hardware/ + + +## Manual Run + +Alternatively you can perform benchmark in the following steps. + +1. ssh to the server and download the binary with wget: ```bash -# These links are outdated, please obtain the fresh link from the "commits" page. 
# For amd64: -wget https://clickhouse-builds.s3.yandex.net/0/e29c4c3cc47ab2a6c4516486c1b77d57e7d42643/clickhouse_build_check/gcc-10_relwithdebuginfo_none_bundled_unsplitted_disable_False_binary/clickhouse +wget https://builds.clickhouse.tech/master/amd64/clickhouse # For aarch64: -wget https://clickhouse-builds.s3.yandex.net/0/e29c4c3cc47ab2a6c4516486c1b77d57e7d42643/clickhouse_special_build_check/clang-10-aarch64_relwithdebuginfo_none_bundled_unsplitted_disable_False_binary/clickhouse +wget https://builds.clickhouse.tech/master/aarch64/clickhouse # Then do: chmod a+x clickhouse ``` -5. Download benchmark files: +2. Download benchmark files: ```bash wget https://raw.githubusercontent.com/ClickHouse/ClickHouse/master/benchmark/clickhouse/benchmark-new.sh chmod a+x benchmark-new.sh wget https://raw.githubusercontent.com/ClickHouse/ClickHouse/master/benchmark/clickhouse/queries.sql ``` -6. Download test data according to the [Yandex.Metrica dataset](../getting-started/example-datasets/metrica.md) instruction (“hits†table containing 100 million rows). +3. Download test data according to the [Yandex.Metrica dataset](../getting-started/example-datasets/metrica.md) instruction (“hits†table containing 100 million rows). ```bash wget https://datasets.clickhouse.tech/hits/partitions/hits_100m_obfuscated_v1.tar.xz tar xvf hits_100m_obfuscated_v1.tar.xz -C . mv hits_100m_obfuscated_v1/* . ``` -7. Run the server: +4. Run the server: ```bash ./clickhouse server ``` -8. Check the data: ssh to the server in another terminal +5. Check the data: ssh to the server in another terminal ```bash ./clickhouse client --query "SELECT count() FROM hits_100m_obfuscated" 100000000 ``` -9. Edit the benchmark-new.sh, change `clickhouse-client` to `./clickhouse client` and add `--max_memory_usage 100000000000` parameter. -```bash -mcedit benchmark-new.sh -``` -10. Run the benchmark: +6. Run the benchmark: ```bash ./benchmark-new.sh hits_100m_obfuscated ``` -11. Send the numbers and the info about your hardware configuration to clickhouse-feedback@yandex-team.com +7. Send the numbers and the info about your hardware configuration to clickhouse-feedback@yandex-team.com All the results are published here: https://clickhouse.tech/benchmark/hardware/ diff --git a/docs/en/operations/server-configuration-parameters/settings.md b/docs/en/operations/server-configuration-parameters/settings.md index 801a1d27add..aedd1c107c4 100644 --- a/docs/en/operations/server-configuration-parameters/settings.md +++ b/docs/en/operations/server-configuration-parameters/settings.md @@ -34,6 +34,7 @@ Configuration template: ... ... ... + ... ... @@ -43,7 +44,8 @@ Configuration template: - `min_part_size` – The minimum size of a data part. - `min_part_size_ratio` – The ratio of the data part size to the table size. -- `method` – Compression method. Acceptable values: `lz4` or `zstd`. +- `method` – Compression method. Acceptable values: `lz4`, `lz4hc`, `zstd`. +- `level` – Compression level. See [Codecs](../../sql-reference/statements/create/table/#create-query-general-purpose-codecs). You can configure multiple `` sections. @@ -62,10 +64,33 @@ If no conditions met for a data part, ClickHouse uses the `lz4` compression. 10000000000 0.01 zstd + 1 ``` +## encryption {#server-settings-encryption} + +Configures a command to obtain a key to be used by [encryption codecs](../../sql-reference/statements/create/table.md#create-query-encryption-codecs). The command, or a shell script, is expected to write a Base64-encoded key of any length to the stdout. 
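For context, the key obtained by this command is consumed by column-level encryption codecs rather than referenced in queries directly. A minimal, hypothetical sketch of a table declaring such a codec (assuming the `AES_128_GCM_SIV` codec from the CREATE TABLE reference linked above; the table and column names are only illustrations) might look as follows — the configuration examples for the key command itself are shown below:

```sql
CREATE TABLE sensitive_data
(
    id UInt64,
    -- the column is compressed first, then encrypted with the key obtained from the configured command
    payload String CODEC(LZ4, AES_128_GCM_SIV)
)
ENGINE = MergeTree
ORDER BY id;
```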
+ +**Example** + +For Linux with systemd: + +```xml + + /usr/bin/systemd-ask-password --id="clickhouse-server" --timeout=0 "Enter the ClickHouse encryption passphrase:" | base64 + +``` + +For other systems: + +```xml + + /dev/tty "Enter the ClickHouse encryption passphrase: "; stty=`stty -F /dev/tty -g`; stty -F /dev/tty -echo; read k + +``` + ## custom_settings_prefixes {#custom_settings_prefixes} List of prefixes for [custom settings](../../operations/settings/index.md#custom_settings). The prefixes must be separated with commas. @@ -98,7 +123,7 @@ Default value: `1073741824` (1 GB). ```xml 1073741824 - + ``` ## database_atomic_delay_before_drop_table_sec {#database_atomic_delay_before_drop_table_sec} @@ -439,8 +464,8 @@ The server will need access to the public Internet via IPv4 (at the time of writ Keys: -- `enabled` – Boolean flag to enable the feature, `false` by default. Set to `true` to allow sending crash reports. -- `endpoint` – You can override the Sentry endpoint URL for sending crash reports. It can be either a separate Sentry account or your self-hosted Sentry instance. Use the [Sentry DSN](https://docs.sentry.io/error-reporting/quickstart/?platform=native#configure-the-sdk) syntax. +- `enabled` – Boolean flag to enable the feature, `false` by default. Set to `true` to allow sending crash reports. +- `endpoint` – You can override the Sentry endpoint URL for sending crash reports. It can be either a separate Sentry account or your self-hosted Sentry instance. Use the [Sentry DSN](https://docs.sentry.io/error-reporting/quickstart/?platform=native#configure-the-sdk) syntax. - `anonymize` - Avoid attaching the server hostname to the crash report. - `http_proxy` - Configure HTTP proxy for sending crash reports. - `debug` - Sets the Sentry client into debug mode. @@ -502,7 +527,7 @@ The default `max_server_memory_usage` value is calculated as `memory_amount * ma ## max_server_memory_usage_to_ram_ratio {#max_server_memory_usage_to_ram_ratio} -Defines the fraction of total physical RAM amount, available to the ClickHouse server. If the server tries to utilize more, the memory is cut down to the appropriate amount. +Defines the fraction of total physical RAM amount, available to the ClickHouse server. If the server tries to utilize more, the memory is cut down to the appropriate amount. Possible values: @@ -713,7 +738,7 @@ Keys for server/client settings: - extendedVerification – Automatically extended verification of certificates after the session ends. Acceptable values: `true`, `false`. - requireTLSv1 – Require a TLSv1 connection. Acceptable values: `true`, `false`. - requireTLSv1_1 – Require a TLSv1.1 connection. Acceptable values: `true`, `false`. -- requireTLSv1 – Require a TLSv1.2 connection. Acceptable values: `true`, `false`. +- requireTLSv1_2 – Require a TLSv1.2 connection. Acceptable values: `true`, `false`. - fips – Activates OpenSSL FIPS mode. Supported if the library’s OpenSSL version supports FIPS. - privateKeyPassphraseHandler – Class (PrivateKeyPassphraseHandler subclass) that requests the passphrase for accessing the private key. For example: ``, `KeyFileHandler`, `test`, ``. - invalidCertificateHandler – Class (a subclass of CertificateHandler) for verifying invalid certificates. For example: ` ConsoleCertificateHandler ` . @@ -880,7 +905,7 @@ Parameters: - `flush_interval_milliseconds` — Interval for flushing data from the buffer in memory to the table. 
**Example** -```xml +```xml notice diff --git a/docs/en/operations/settings/index.md b/docs/en/operations/settings/index.md index d38c98f51cb..b5dcef932e8 100644 --- a/docs/en/operations/settings/index.md +++ b/docs/en/operations/settings/index.md @@ -31,7 +31,7 @@ Settings that can only be made in the server config file are not covered in this ## Custom Settings {#custom_settings} -In addition to the common [settings](../../operations/settings/settings.md), users can define custom settings. +In addition to the common [settings](../../operations/settings/settings.md), users can define custom settings. A custom setting name must begin with one of predefined prefixes. The list of these prefixes must be declared in the [custom_settings_prefixes](../../operations/server-configuration-parameters/settings.md#custom_settings_prefixes) parameter in the server configuration file. @@ -48,7 +48,7 @@ SET custom_a = 123; To get the current value of a custom setting use `getSetting()` function: ```sql -SELECT getSetting('custom_a'); +SELECT getSetting('custom_a'); ``` **See Also** diff --git a/docs/en/operations/settings/merge-tree-settings.md b/docs/en/operations/settings/merge-tree-settings.md index 791ac344bcf..65d63438aea 100644 --- a/docs/en/operations/settings/merge-tree-settings.md +++ b/docs/en/operations/settings/merge-tree-settings.md @@ -278,4 +278,15 @@ Possible values: Default value: `0`. -[Original article](https://clickhouse.tech/docs/en/operations/settings/merge_tree_settings/) +## check_sample_column_is_correct {#check_sample_column_is_correct} + +Enables the check at table creation, that the data type of a column for sampling or sampling expression is correct. The data type must be one of unsigned [integer types](../../sql-reference/data-types/int-uint.md): `UInt8`, `UInt16`, `UInt32`, `UInt64`. + +Possible values: + +- true — The check is enabled. +- false — The check is disabled at table creation. + +Default value: `true`. + +By default, the ClickHouse server checks at table creation the data type of a column for sampling or sampling expression. If you already have tables with incorrect sampling expression and do not want the server to raise an exception during startup, set `check_sample_column_is_correct` to `false`. diff --git a/docs/en/operations/settings/query-complexity.md b/docs/en/operations/settings/query-complexity.md index d60aa170907..236359bfe55 100644 --- a/docs/en/operations/settings/query-complexity.md +++ b/docs/en/operations/settings/query-complexity.md @@ -65,20 +65,20 @@ What to do when the volume of data read exceeds one of the limits: ‘throw’ o The following restrictions can be checked on each block (instead of on each row). That is, the restrictions can be broken a little. A maximum number of rows that can be read from a local table on a leaf node when running a distributed query. While -distributed queries can issue a multiple sub-queries to each shard (leaf) - this limit will be checked only on the read -stage on the leaf nodes and ignored on results merging stage on the root node. For example, cluster consists of 2 shards -and each shard contains a table with 100 rows. Then distributed query which suppose to read all the data from both -tables with setting `max_rows_to_read=150` will fail as in total it will be 200 rows. While query +distributed queries can issue a multiple sub-queries to each shard (leaf) - this limit will be checked only on the read +stage on the leaf nodes and ignored on results merging stage on the root node. 
For example, cluster consists of 2 shards +and each shard contains a table with 100 rows. Then distributed query which suppose to read all the data from both +tables with setting `max_rows_to_read=150` will fail as in total it will be 200 rows. While query with `max_rows_to_read_leaf=150` will succeed since leaf nodes will read 100 rows at max. ## max_bytes_to_read_leaf {#max-bytes-to-read-leaf} -A maximum number of bytes (uncompressed data) that can be read from a local table on a leaf node when running -a distributed query. While distributed queries can issue a multiple sub-queries to each shard (leaf) - this limit will -be checked only on the read stage on the leaf nodes and ignored on results merging stage on the root node. -For example, cluster consists of 2 shards and each shard contains a table with 100 bytes of data. -Then distributed query which suppose to read all the data from both tables with setting `max_bytes_to_read=150` will fail -as in total it will be 200 bytes. While query with `max_bytes_to_read_leaf=150` will succeed since leaf nodes will read +A maximum number of bytes (uncompressed data) that can be read from a local table on a leaf node when running +a distributed query. While distributed queries can issue a multiple sub-queries to each shard (leaf) - this limit will +be checked only on the read stage on the leaf nodes and ignored on results merging stage on the root node. +For example, cluster consists of 2 shards and each shard contains a table with 100 bytes of data. +Then distributed query which suppose to read all the data from both tables with setting `max_bytes_to_read=150` will fail +as in total it will be 200 bytes. While query with `max_bytes_to_read_leaf=150` will succeed since leaf nodes will read 100 bytes at max. ## read_overflow_mode_leaf {#read-overflow-mode-leaf} diff --git a/docs/en/operations/settings/settings-users.md b/docs/en/operations/settings/settings-users.md index ee834dca98a..2c8315ad069 100644 --- a/docs/en/operations/settings/settings-users.md +++ b/docs/en/operations/settings/settings-users.md @@ -28,7 +28,7 @@ Structure of the `users` section: profile_name default - + default diff --git a/docs/en/operations/settings/settings.md b/docs/en/operations/settings/settings.md index a5d1f3b121e..d519cacb8dc 100644 --- a/docs/en/operations/settings/settings.md +++ b/docs/en/operations/settings/settings.md @@ -20,6 +20,29 @@ Possible values: - `global` — Replaces the `IN`/`JOIN` query with `GLOBAL IN`/`GLOBAL JOIN.` - `allow` — Allows the use of these types of subqueries. +## prefer_global_in_and_join {#prefer-global-in-and-join} + +Enables the replacement of `IN`/`JOIN` operators with `GLOBAL IN`/`GLOBAL JOIN`. + +Possible values: + +- 0 — Disabled. `IN`/`JOIN` operators are not replaced with `GLOBAL IN`/`GLOBAL JOIN`. +- 1 — Enabled. `IN`/`JOIN` operators are replaced with `GLOBAL IN`/`GLOBAL JOIN`. + +Default value: `0`. + +**Usage** + +Although `SET distributed_product_mode=global` can change the queries behavior for the distributed tables, it's not suitable for local tables or tables from external resources. Here is when the `prefer_global_in_and_join` setting comes into play. + +For example, we have query serving nodes that contain local tables, which are not suitable for distribution. We need to scatter their data on the fly during distributed processing with the `GLOBAL` keyword — `GLOBAL IN`/`GLOBAL JOIN`. + +Another use case of `prefer_global_in_and_join` is accessing tables created by external engines. 
This setting helps to reduce the number of calls to external sources while joining such tables: only one call per query. + +**See also:** + +- [Distributed subqueries](../../sql-reference/operators/in.md#select-distributed-subqueries) for more information on how to use `GLOBAL IN`/`GLOBAL JOIN` + ## enable_optimize_predicate_expression {#enable-optimize-predicate-expression} Turns on predicate pushdown in `SELECT` queries. @@ -153,6 +176,26 @@ Possible values: Default value: 1048576. +## table_function_remote_max_addresses {#table_function_remote_max_addresses} + +Sets the maximum number of addresses generated from patterns for the [remote](../../sql-reference/table-functions/remote.md) function. + +Possible values: + +- Positive integer. + +Default value: `1000`. + +## glob_expansion_max_elements {#glob_expansion_max_elements } + +Sets the maximum number of addresses generated from patterns for external storages and table functions (like [url](../../sql-reference/table-functions/url.md)) except the `remote` function. + +Possible values: + +- Positive integer. + +Default value: `1000`. + ## send_progress_in_http_headers {#settings-send_progress_in_http_headers} Enables or disables `X-ClickHouse-Progress` HTTP response headers in `clickhouse-server` responses. @@ -509,6 +552,23 @@ Possible values: Default value: `ALL`. +## join_algorithm {#settings-join_algorithm} + +Specifies [JOIN](../../sql-reference/statements/select/join.md) algorithm. + +Possible values: + +- `hash` — [Hash join algorithm](https://en.wikipedia.org/wiki/Hash_join) is used. +- `partial_merge` — [Sort-merge algorithm](https://en.wikipedia.org/wiki/Sort-merge_join) is used. +- `prefer_partial_merge` — ClickHouse always tries to use `merge` join if possible. +- `auto` — ClickHouse tries to change `hash` join to `merge` join on the fly to avoid out of memory. + +Default value: `hash`. + +When using `hash` algorithm the right part of `JOIN` is uploaded into RAM. + +When using `partial_merge` algorithm ClickHouse sorts the data and dumps it to the disk. The `merge` algorithm in ClickHouse differs a bit from the classic realization. First ClickHouse sorts the right table by [join key](../../sql-reference/statements/select/join.md#select-join) in blocks and creates min-max index for sorted blocks. Then it sorts parts of left table by `join key` and joins them over right table. The min-max index is also used to skip unneeded right table blocks. + ## join_any_take_last_row {#settings-join_any_take_last_row} Changes behaviour of join operations with `ANY` strictness. @@ -1234,7 +1294,7 @@ Default value: `3`. ## output_format_json_quote_64bit_integers {#session_settings-output_format_json_quote_64bit_integers} Controls quoting of 64-bit or bigger [integers](../../sql-reference/data-types/int-uint.md) (like `UInt64` or `Int128`) when they are output in a [JSON](../../interfaces/formats.md#json) format. -Such integers are enclosed in quotes by default. This behavior is compatible with most JavaScript implementations. +Such integers are enclosed in quotes by default. This behavior is compatible with most JavaScript implementations. Possible values: @@ -1758,7 +1818,7 @@ Default value: 0. ## optimize_functions_to_subcolumns {#optimize-functions-to-subcolumns} -Enables or disables optimization by transforming some functions to reading subcolumns. This reduces the amount of data to read. +Enables or disables optimization by transforming some functions to reading subcolumns. This reduces the amount of data to read. 
 These functions can be transformed:
 
@@ -1989,6 +2049,13 @@ Possible values: 32 (32 bytes) - 1073741824 (1 GiB)
 
 Default value: 32768 (32 KiB)
 
+## output_format_avro_string_column_pattern {#output_format_avro_string_column_pattern}
+
+Regexp of column names of type String to output as Avro `string` (default is `bytes`).
+RE2 syntax is supported.
+
+Type: string
+
 ## format_avro_schema_registry_url {#format_avro_schema_registry_url}
 
 Sets [Confluent Schema Registry](https://docs.confluent.io/current/schema-registry/index.html) URL to use with [AvroConfluent](../../interfaces/formats.md#data-format-avro-confluent) format.
@@ -2018,6 +2085,16 @@ Possible values:
 
 Default value: 16.
 
+## merge_selecting_sleep_ms {#merge_selecting_sleep_ms}
+
+Sleep time for merge selecting when no part is selected. A lower setting triggers selecting tasks in `background_schedule_pool` frequently, which results in a large number of requests to ZooKeeper in large-scale clusters.
+
+Possible values:
+
+- Any positive integer.
+
+Default value: `5000`.
+
 ## parallel_distributed_insert_select {#parallel_distributed_insert_select}
 
 Enables parallel distributed `INSERT ... SELECT` query.
@@ -2893,7 +2970,7 @@ Result:
 └─────────────┘
 ```
 
-Note that this setting influences [Materialized view](../../sql-reference/statements/create/view.md#materialized) and [MaterializeMySQL](../../engines/database-engines/materialize-mysql.md) behaviour.
+Note that this setting influences [Materialized view](../../sql-reference/statements/create/view.md#materialized) and [MaterializedMySQL](../../engines/database-engines/materialized-mysql.md) behaviour.
 
 ## engine_file_empty_if_not_exists {#engine-file-empty_if-not-exists}
 
@@ -3151,6 +3228,53 @@ SELECT
 FROM fuse_tbl
 ```
 
+## allow_experimental_database_replicated {#allow_experimental_database_replicated}
+
+Enables creating databases with the [Replicated](../../engines/database-engines/replicated.md) engine.
+
+Possible values:
+
+- 0 — Disabled.
+- 1 — Enabled.
+
+Default value: `0`.
+
+## database_replicated_initial_query_timeout_sec {#database_replicated_initial_query_timeout_sec}
+
+Sets how long the initial DDL query should wait for the `Replicated` database to process previous DDL queue entries, in seconds.
+
+Possible values:
+
+- Positive integer.
+- 0 — Unlimited.
+
+Default value: `300`.
+
+## distributed_ddl_task_timeout {#distributed_ddl_task_timeout}
+
+Sets the timeout for DDL query responses from all hosts in the cluster. If a DDL request has not been performed on all hosts, the response will contain a timeout error and the request will be executed in async mode. A negative value means infinite.
+
+Possible values:
+
+- Positive integer.
+- 0 — Async mode.
+- Negative integer — infinite timeout.
+
+Default value: `180`.
+
+## distributed_ddl_output_mode {#distributed_ddl_output_mode}
+
+Sets the format of distributed DDL query results.
+
+Possible values:
+
+- `throw` — Returns a result set with the query execution status for all hosts where the query has finished. If the query has failed on some hosts, it rethrows the first exception. If the query has not finished yet on some hosts and [distributed_ddl_task_timeout](#distributed_ddl_task_timeout) is exceeded, it throws a `TIMEOUT_EXCEEDED` exception.
+- `none` — Similar to `throw`, but the distributed DDL query returns no result set.
+- `null_status_on_timeout` — Returns `NULL` as the execution status in some rows of the result set instead of throwing `TIMEOUT_EXCEEDED` if the query has not finished on the corresponding hosts.
+- `never_throw` — Do not throw `TIMEOUT_EXCEEDED` and do not rethrow exceptions if query has failed on some hosts. + +Default value: `throw`. + ## flatten_nested {#flatten-nested} Sets the data format of a [nested](../../sql-reference/data-types/nested-data-structures/nested.md) columns. @@ -3230,3 +3354,14 @@ Default value: `1`. **Usage** If the setting is set to `0`, the table function does not make Nullable columns and inserts default values instead of NULL. This is also applicable for NULL values inside arrays. + +## output_format_arrow_low_cardinality_as_dictionary {#output-format-arrow-low-cardinality-as-dictionary} + +Allows to convert the [LowCardinality](../../sql-reference/data-types/lowcardinality.md) type to the `DICTIONARY` type of the [Arrow](../../interfaces/formats.md#data-format-arrow) format for `SELECT` queries. + +Possible values: + +- 0 — The `LowCardinality` type is not converted to the `DICTIONARY` type. +- 1 — The `LowCardinality` type is converted to the `DICTIONARY` type. + +Default value: `0`. diff --git a/docs/en/operations/system-tables/asynchronous_metric_log.md b/docs/en/operations/system-tables/asynchronous_metric_log.md index b0480dc256a..de2e8faab10 100644 --- a/docs/en/operations/system-tables/asynchronous_metric_log.md +++ b/docs/en/operations/system-tables/asynchronous_metric_log.md @@ -33,7 +33,7 @@ SELECT * FROM system.asynchronous_metric_log LIMIT 10 **See Also** -- [system.asynchronous_metrics](../system-tables/asynchronous_metrics.md) — Contains metrics, calculated periodically in the background. +- [system.asynchronous_metrics](../system-tables/asynchronous_metrics.md) — Contains metrics, calculated periodically in the background. - [system.metric_log](../system-tables/metric_log.md) — Contains history of metrics values from tables `system.metrics` and `system.events`, periodically flushed to disk. [Original article](https://clickhouse.tech/docs/en/operations/system-tables/asynchronous_metric_log) diff --git a/docs/en/operations/system-tables/columns.md b/docs/en/operations/system-tables/columns.md index 2a8009dddee..da4bcec48ed 100644 --- a/docs/en/operations/system-tables/columns.md +++ b/docs/en/operations/system-tables/columns.md @@ -4,7 +4,7 @@ Contains information about columns in all the tables. You can use this table to get information similar to the [DESCRIBE TABLE](../../sql-reference/statements/misc.md#misc-describe-table) query, but for multiple tables at once. -Columns from [temporary tables](../../sql-reference/statements/create/table.md#temporary-tables) are visible in the `system.columns` only in those session where they have been created. They are shown with the empty `database` field. +Columns from [temporary tables](../../sql-reference/statements/create/table.md#temporary-tables) are visible in the `system.columns` only in those session where they have been created. They are shown with the empty `database` field. 
Columns: @@ -38,17 +38,17 @@ database: system table: aggregate_function_combinators name: name type: String -default_kind: -default_expression: +default_kind: +default_expression: data_compressed_bytes: 0 data_uncompressed_bytes: 0 marks_bytes: 0 -comment: +comment: is_in_partition_key: 0 is_in_sorting_key: 0 is_in_primary_key: 0 is_in_sampling_key: 0 -compression_codec: +compression_codec: Row 2: ────── @@ -56,17 +56,17 @@ database: system table: aggregate_function_combinators name: is_internal type: UInt8 -default_kind: -default_expression: +default_kind: +default_expression: data_compressed_bytes: 0 data_uncompressed_bytes: 0 marks_bytes: 0 -comment: +comment: is_in_partition_key: 0 is_in_sorting_key: 0 is_in_primary_key: 0 is_in_sampling_key: 0 -compression_codec: +compression_codec: ``` The `system.columns` table contains the following columns (the column type is shown in brackets): diff --git a/docs/en/operations/system-tables/disks.md b/docs/en/operations/system-tables/disks.md index 833a0b3b16b..027e722dc55 100644 --- a/docs/en/operations/system-tables/disks.md +++ b/docs/en/operations/system-tables/disks.md @@ -21,7 +21,7 @@ Columns: │ default │ /var/lib/clickhouse/ │ 276392587264 │ 490652508160 │ 0 │ └─────────┴──────────────────────┴──────────────┴──────────────┴─────────────────┘ -1 rows in set. Elapsed: 0.001 sec. +1 rows in set. Elapsed: 0.001 sec. ``` [Original article](https://clickhouse.tech/docs/en/operations/system-tables/disks) diff --git a/docs/en/operations/system-tables/distributed_ddl_queue.md b/docs/en/operations/system-tables/distributed_ddl_queue.md index fa871d215b5..07f72d76324 100644 --- a/docs/en/operations/system-tables/distributed_ddl_queue.md +++ b/docs/en/operations/system-tables/distributed_ddl_queue.md @@ -62,4 +62,3 @@ exception_code: ZOK ``` [Original article](https://clickhouse.tech/docs/en/operations/system_tables/distributed_ddl_queuedistributed_ddl_queue.md) - \ No newline at end of file diff --git a/docs/en/operations/system-tables/distribution_queue.md b/docs/en/operations/system-tables/distribution_queue.md index 3b09c20874c..a7037a23312 100644 --- a/docs/en/operations/system-tables/distribution_queue.md +++ b/docs/en/operations/system-tables/distribution_queue.md @@ -40,7 +40,7 @@ is_blocked: 1 error_count: 0 data_files: 1 data_compressed_bytes: 499 -last_exception: +last_exception: ``` **See Also** diff --git a/docs/en/operations/system-tables/enabled-roles.md b/docs/en/operations/system-tables/enabled-roles.md index c03129b32dd..ecfb889b6ef 100644 --- a/docs/en/operations/system-tables/enabled-roles.md +++ b/docs/en/operations/system-tables/enabled-roles.md @@ -1,6 +1,6 @@ # system.enabled_roles {#system_tables-enabled_roles} -Contains all active roles at the moment, including current role of the current user and granted roles for current role. +Contains all active roles at the moment, including current role of the current user and granted roles for current role. Columns: diff --git a/docs/en/operations/system-tables/functions.md b/docs/en/operations/system-tables/functions.md index 888e768fc93..38aa62c9c09 100644 --- a/docs/en/operations/system-tables/functions.md +++ b/docs/en/operations/system-tables/functions.md @@ -27,7 +27,7 @@ Columns: │ JSONExtractInt │ 0 │ 0 │ │ └──────────────────────────┴──────────────┴──────────────────┴──────────┘ -10 rows in set. Elapsed: 0.002 sec. +10 rows in set. Elapsed: 0.002 sec. 
``` [Original article](https://clickhouse.tech/docs/en/operations/system-tables/functions) diff --git a/docs/en/operations/system-tables/licenses.md b/docs/en/operations/system-tables/licenses.md index a9cada507c6..d1c3e1cc5de 100644 --- a/docs/en/operations/system-tables/licenses.md +++ b/docs/en/operations/system-tables/licenses.md @@ -1,11 +1,11 @@ # system.licenses {#system-tables_system.licenses} -Сontains licenses of third-party libraries that are located in the [contrib](https://github.com/ClickHouse/ClickHouse/tree/master/contrib) directory of ClickHouse sources. +Сontains licenses of third-party libraries that are located in the [contrib](https://github.com/ClickHouse/ClickHouse/tree/master/contrib) directory of ClickHouse sources. Columns: - `library_name` ([String](../../sql-reference/data-types/string.md)) — Name of the library, which is license connected with. -- `license_type` ([String](../../sql-reference/data-types/string.md)) — License type — e.g. Apache, MIT. +- `license_type` ([String](../../sql-reference/data-types/string.md)) — License type — e.g. Apache, MIT. - `license_path` ([String](../../sql-reference/data-types/string.md)) — Path to the file with the license text. - `license_text` ([String](../../sql-reference/data-types/string.md)) — License text. diff --git a/docs/en/operations/system-tables/merge_tree_settings.md b/docs/en/operations/system-tables/merge_tree_settings.md index 309c1cbc9d1..ce82cd09b8a 100644 --- a/docs/en/operations/system-tables/merge_tree_settings.md +++ b/docs/en/operations/system-tables/merge_tree_settings.md @@ -48,7 +48,7 @@ changed: 0 description: How many rows in blocks should be formed for merge operations. type: SettingUInt64 -4 rows in set. Elapsed: 0.001 sec. +4 rows in set. Elapsed: 0.001 sec. ``` [Original article](https://clickhouse.tech/docs/en/operations/system-tables/merge_tree_settings) diff --git a/docs/en/operations/system-tables/mutations.md b/docs/en/operations/system-tables/mutations.md index 24fa559197c..e7d3e90b806 100644 --- a/docs/en/operations/system-tables/mutations.md +++ b/docs/en/operations/system-tables/mutations.md @@ -1,6 +1,6 @@ # system.mutations {#system_tables-mutations} -The table contains information about [mutations](../../sql-reference/statements/alter/index.md#mutations) of [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md) tables and their progress. Each mutation command is represented by a single row. +The table contains information about [mutations](../../sql-reference/statements/alter/index.md#mutations) of [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md) tables and their progress. Each mutation command is represented by a single row. Columns: @@ -16,17 +16,17 @@ Columns: - `block_numbers.partition_id` ([Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md))) — For mutations of replicated tables, the array contains the partitions' IDs (one record for each partition). For mutations of non-replicated tables the array is empty. -- `block_numbers.number` ([Array](../../sql-reference/data-types/array.md)([Int64](../../sql-reference/data-types/int-uint.md))) — For mutations of replicated tables, the array contains one record for each partition, with the block number that was acquired by the mutation. Only parts that contain blocks with numbers less than this number will be mutated in the partition. 
- +- `block_numbers.number` ([Array](../../sql-reference/data-types/array.md)([Int64](../../sql-reference/data-types/int-uint.md))) — For mutations of replicated tables, the array contains one record for each partition, with the block number that was acquired by the mutation. Only parts that contain blocks with numbers less than this number will be mutated in the partition. + In non-replicated tables, block numbers in all partitions form a single sequence. This means that for mutations of non-replicated tables, the column will contain one record with a single block number acquired by the mutation. - `parts_to_do_names` ([Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md))) — An array of names of data parts that need to be mutated for the mutation to complete. - `parts_to_do` ([Int64](../../sql-reference/data-types/int-uint.md)) — The number of data parts that need to be mutated for the mutation to complete. -- `is_done` ([UInt8](../../sql-reference/data-types/int-uint.md)) — The flag whether the mutation is done or not. Possible values: +- `is_done` ([UInt8](../../sql-reference/data-types/int-uint.md)) — The flag whether the mutation is done or not. Possible values: - `1` if the mutation is completed, - - `0` if the mutation is still in process. + - `0` if the mutation is still in process. !!! info "Note" Even if `parts_to_do = 0` it is possible that a mutation of a replicated table is not completed yet because of a long-running `INSERT` query, that will create a new data part needed to be mutated. diff --git a/docs/en/operations/system-tables/numbers.md b/docs/en/operations/system-tables/numbers.md index bf948d9dd5b..e9d64483525 100644 --- a/docs/en/operations/system-tables/numbers.md +++ b/docs/en/operations/system-tables/numbers.md @@ -26,7 +26,7 @@ Reads from this table are not parallelized. │ 9 │ └────────┘ -10 rows in set. Elapsed: 0.001 sec. +10 rows in set. Elapsed: 0.001 sec. ``` [Original article](https://clickhouse.tech/docs/en/operations/system-tables/numbers) diff --git a/docs/en/operations/system-tables/numbers_mt.md b/docs/en/operations/system-tables/numbers_mt.md index d7df1bc1e0e..e11515d4c06 100644 --- a/docs/en/operations/system-tables/numbers_mt.md +++ b/docs/en/operations/system-tables/numbers_mt.md @@ -24,7 +24,7 @@ Used for tests. │ 9 │ └────────┘ -10 rows in set. Elapsed: 0.001 sec. +10 rows in set. Elapsed: 0.001 sec. ``` [Original article](https://clickhouse.tech/docs/en/operations/system-tables/numbers_mt) diff --git a/docs/en/operations/system-tables/one.md b/docs/en/operations/system-tables/one.md index 10b2a1757d0..4a00e118b05 100644 --- a/docs/en/operations/system-tables/one.md +++ b/docs/en/operations/system-tables/one.md @@ -17,7 +17,7 @@ This is similar to the `DUAL` table found in other DBMSs. │ 0 │ └───────┘ -1 rows in set. Elapsed: 0.001 sec. +1 rows in set. Elapsed: 0.001 sec. 
``` [Original article](https://clickhouse.tech/docs/en/operations/system-tables/one) diff --git a/docs/en/operations/system-tables/part_log.md b/docs/en/operations/system-tables/part_log.md index b815d2366bb..ce983282880 100644 --- a/docs/en/operations/system-tables/part_log.md +++ b/docs/en/operations/system-tables/part_log.md @@ -63,7 +63,7 @@ read_rows: 0 read_bytes: 0 peak_memory_usage: 0 error: 0 -exception: +exception: ``` [Original article](https://clickhouse.tech/docs/en/operations/system-tables/part_log) diff --git a/docs/en/operations/system-tables/parts.md b/docs/en/operations/system-tables/parts.md index b9b5aa09b64..cd5b3fc799a 100644 --- a/docs/en/operations/system-tables/parts.md +++ b/docs/en/operations/system-tables/parts.md @@ -19,10 +19,10 @@ Columns: Possible Values: - - `Wide` — Each column is stored in a separate file in a filesystem. - - `Compact` — All columns are stored in one file in a filesystem. + - `Wide` — Each column is stored in a separate file in a filesystem. + - `Compact` — All columns are stored in one file in a filesystem. - Data storing format is controlled by the `min_bytes_for_wide_part` and `min_rows_for_wide_part` settings of the [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md) table. + Data storing format is controlled by the `min_bytes_for_wide_part` and `min_rows_for_wide_part` settings of the [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md) table. - `active` ([UInt8](../../sql-reference/data-types/int-uint.md)) – Flag that indicates whether the data part is active. If a data part is active, it’s used in a table. Otherwise, it’s deleted. Inactive data parts remain after merging. @@ -88,7 +88,7 @@ Columns: - `delete_ttl_info_max` ([DateTime](../../sql-reference/data-types/datetime.md)) — The maximum value of the date and time key for [TTL DELETE rule](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-ttl). -- `move_ttl_info.expression` ([Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md))) — Array of expressions. Each expression defines a [TTL MOVE rule](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-ttl). +- `move_ttl_info.expression` ([Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md))) — Array of expressions. Each expression defines a [TTL MOVE rule](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-ttl). !!! note "Warning" The `move_ttl_info.expression` array is kept mostly for backward compatibility, now the simpliest way to check `TTL MOVE` rule is to use the `move_ttl_info.min` and `move_ttl_info.max` fields. diff --git a/docs/en/operations/system-tables/parts_columns.md b/docs/en/operations/system-tables/parts_columns.md index 293abb18a50..81529544a8b 100644 --- a/docs/en/operations/system-tables/parts_columns.md +++ b/docs/en/operations/system-tables/parts_columns.md @@ -19,10 +19,10 @@ Columns: Possible values: - - `Wide` — Each column is stored in a separate file in a filesystem. - - `Compact` — All columns are stored in one file in a filesystem. + - `Wide` — Each column is stored in a separate file in a filesystem. + - `Compact` — All columns are stored in one file in a filesystem. - Data storing format is controlled by the `min_bytes_for_wide_part` and `min_rows_for_wide_part` settings of the [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md) table. 
+ Data storing format is controlled by the `min_bytes_for_wide_part` and `min_rows_for_wide_part` settings of the [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md) table. - `active` ([UInt8](../../sql-reference/data-types/int-uint.md)) — Flag that indicates whether the data part is active. If a data part is active, it’s used in a table. Otherwise, it’s deleted. Inactive data parts remain after merging. diff --git a/docs/en/operations/system-tables/query_log.md b/docs/en/operations/system-tables/query_log.md index d58e549616f..987f1968356 100644 --- a/docs/en/operations/system-tables/query_log.md +++ b/docs/en/operations/system-tables/query_log.md @@ -51,6 +51,7 @@ Columns: - `databases` ([Array](../../sql-reference/data-types/array.md)([LowCardinality(String)](../../sql-reference/data-types/lowcardinality.md))) — Names of the databases present in the query. - `tables` ([Array](../../sql-reference/data-types/array.md)([LowCardinality(String)](../../sql-reference/data-types/lowcardinality.md))) — Names of the tables present in the query. - `columns` ([Array](../../sql-reference/data-types/array.md)([LowCardinality(String)](../../sql-reference/data-types/lowcardinality.md))) — Names of the columns present in the query. +- `projections` ([String](../../sql-reference/data-types/string.md)) — Names of the projections used during the query execution. - `exception_code` ([Int32](../../sql-reference/data-types/int-uint.md)) — Code of an exception. - `exception` ([String](../../sql-reference/data-types/string.md)) — Exception message. - `stack_trace` ([String](../../sql-reference/data-types/string.md)) — [Stack trace](https://en.wikipedia.org/wiki/Stack_trace). An empty string, if the query was completed successfully. @@ -65,6 +66,8 @@ Columns: - `initial_query_id` ([String](../../sql-reference/data-types/string.md)) — ID of the initial query (for distributed query execution). - `initial_address` ([IPv6](../../sql-reference/data-types/domains/ipv6.md)) — IP address that the parent query was launched from. - `initial_port` ([UInt16](../../sql-reference/data-types/int-uint.md)) — The client port that was used to make the parent query. +- `initial_query_start_time` ([DateTime](../../sql-reference/data-types/datetime.md)) — Initial query starting time (for distributed query execution). +- `initial_query_start_time_microseconds` ([DateTime64](../../sql-reference/data-types/datetime64.md)) — Initial query starting time with microseconds precision (for distributed query execution). - `interface` ([UInt8](../../sql-reference/data-types/int-uint.md)) — Interface that the query was initiated from. Possible values: - 1 — TCP. - 2 — HTTP. 
@@ -101,55 +104,77 @@ Columns: **Example** ``` sql -SELECT * FROM system.query_log WHERE type = 'QueryFinish' AND (query LIKE '%toDate(\'2000-12-05\')%') ORDER BY query_start_time DESC LIMIT 1 FORMAT Vertical; +SELECT * FROM system.query_log WHERE type = 'QueryFinish' ORDER BY query_start_time DESC LIMIT 1 FORMAT Vertical; ``` ``` text Row 1: ────── -type: QueryStart -event_date: 2020-09-11 -event_time: 2020-09-11 10:08:17 -event_time_microseconds: 2020-09-11 10:08:17.063321 -query_start_time: 2020-09-11 10:08:17 -query_start_time_microseconds: 2020-09-11 10:08:17.063321 -query_duration_ms: 0 -read_rows: 0 -read_bytes: 0 -written_rows: 0 -written_bytes: 0 -result_rows: 0 -result_bytes: 0 -memory_usage: 0 -current_database: default -query: INSERT INTO test1 VALUES -exception_code: 0 +type: QueryFinish +event_date: 2021-07-28 +event_time: 2021-07-28 13:46:56 +event_time_microseconds: 2021-07-28 13:46:56.719791 +query_start_time: 2021-07-28 13:46:56 +query_start_time_microseconds: 2021-07-28 13:46:56.704542 +query_duration_ms: 14 +read_rows: 8393 +read_bytes: 374325 +written_rows: 0 +written_bytes: 0 +result_rows: 4201 +result_bytes: 153024 +memory_usage: 4714038 +current_database: default +query: SELECT DISTINCT arrayJoin(extractAll(name, '[\\w_]{2,}')) AS res FROM (SELECT name FROM system.functions UNION ALL SELECT name FROM system.table_engines UNION ALL SELECT name FROM system.formats UNION ALL SELECT name FROM system.table_functions UNION ALL SELECT name FROM system.data_type_families UNION ALL SELECT name FROM system.merge_tree_settings UNION ALL SELECT name FROM system.settings UNION ALL SELECT cluster FROM system.clusters UNION ALL SELECT macro FROM system.macros UNION ALL SELECT policy_name FROM system.storage_policies UNION ALL SELECT concat(func.name, comb.name) FROM system.functions AS func CROSS JOIN system.aggregate_function_combinators AS comb WHERE is_aggregate UNION ALL SELECT name FROM system.databases LIMIT 10000 UNION ALL SELECT DISTINCT name FROM system.tables LIMIT 10000 UNION ALL SELECT DISTINCT name FROM system.dictionaries LIMIT 10000 UNION ALL SELECT DISTINCT name FROM system.columns LIMIT 10000) WHERE notEmpty(res) +normalized_query_hash: 6666026786019643712 +query_kind: Select +databases: ['system'] +tables: ['system.aggregate_function_combinators','system.clusters','system.columns','system.data_type_families','system.databases','system.dictionaries','system.formats','system.functions','system.macros','system.merge_tree_settings','system.settings','system.storage_policies','system.table_engines','system.table_functions','system.tables'] +columns: ['system.aggregate_function_combinators.name','system.clusters.cluster','system.columns.name','system.data_type_families.name','system.databases.name','system.dictionaries.name','system.formats.name','system.functions.is_aggregate','system.functions.name','system.macros.macro','system.merge_tree_settings.name','system.settings.name','system.storage_policies.policy_name','system.table_engines.name','system.table_functions.name','system.tables.name'] +projections: [] +exception_code: 0 exception: stack_trace: -is_initial_query: 1 -user: default -query_id: 50a320fd-85a8-49b8-8761-98a86bcbacef -address: ::ffff:127.0.0.1 -port: 33452 -initial_user: default -initial_query_id: 50a320fd-85a8-49b8-8761-98a86bcbacef -initial_address: ::ffff:127.0.0.1 -initial_port: 33452 -interface: 1 -os_user: bharatnc -client_hostname: tower -client_name: ClickHouse -client_revision: 54437 -client_version_major: 20 -client_version_minor: 7 
-client_version_patch: 2
-http_method: 0
+is_initial_query: 1
+user: default
+query_id: a3361f6e-a1fd-4d54-9f6f-f93a08bab0bf
+address: ::ffff:127.0.0.1
+port: 51006
+initial_user: default
+initial_query_id: a3361f6e-a1fd-4d54-9f6f-f93a08bab0bf
+initial_address: ::ffff:127.0.0.1
+initial_port: 51006
+initial_query_start_time: 2021-07-28 13:46:56
+initial_query_start_time_microseconds: 2021-07-28 13:46:56.704542
+interface: 1
+os_user:
+client_hostname:
+client_name: ClickHouse client
+client_revision: 54449
+client_version_major: 21
+client_version_minor: 8
+client_version_patch: 0
+http_method: 0
 http_user_agent:
+http_referer:
+forwarded_for:
 quota_key:
-revision: 54440
-thread_ids: []
-ProfileEvents: {'Query':1,'SelectQuery':1,'ReadCompressedBytes':36,'CompressedReadBufferBlocks':1,'CompressedReadBufferBytes':10,'IOBufferAllocs':1,'IOBufferAllocBytes':89,'ContextLock':15,'RWLockAcquiredReadLocks':1}
-Settings: {'background_pool_size':'32','load_balancing':'random','allow_suspicious_low_cardinality_types':'1','distributed_aggregation_memory_efficient':'1','skip_unavailable_shards':'1','log_queries':'1','max_bytes_before_external_group_by':'20000000000','max_bytes_before_external_sort':'20000000000','allow_introspection_functions':'1'}
+revision: 54453
+log_comment:
+thread_ids: [5058,22097,22110,22094]
+ProfileEvents.Names: ['Query','SelectQuery','ArenaAllocChunks','ArenaAllocBytes','FunctionExecute','NetworkSendElapsedMicroseconds','SelectedRows','SelectedBytes','ContextLock','RWLockAcquiredReadLocks','RealTimeMicroseconds','UserTimeMicroseconds','SystemTimeMicroseconds','SoftPageFaults','OSCPUWaitMicroseconds','OSCPUVirtualTimeMicroseconds','OSWriteBytes','OSWriteChars']
+ProfileEvents.Values: [1,1,39,352256,64,360,8393,374325,412,440,34480,13108,4723,671,19,17828,8192,10240]
+Settings.Names: ['load_balancing','max_memory_usage']
+Settings.Values: ['random','10000000000']
+used_aggregate_functions: []
+used_aggregate_function_combinators: []
+used_database_engines: []
+used_data_type_families: ['UInt64','UInt8','Nullable','String','date']
+used_dictionaries: []
+used_formats: []
+used_functions: ['concat','notEmpty','extractAll']
+used_storages: []
+used_table_functions: []
 ```
 
 **See Also**
diff --git a/docs/en/operations/system-tables/replicas.md b/docs/en/operations/system-tables/replicas.md
index 5a6ec54723b..e2cc607f6d8 100644
--- a/docs/en/operations/system-tables/replicas.md
+++ b/docs/en/operations/system-tables/replicas.md
@@ -82,6 +82,7 @@ The next 4 columns have a non-zero value only where there is an active session w
 - `absolute_delay` (`UInt64`) - How big lag in seconds the current replica has.
 - `total_replicas` (`UInt8`) - The total number of known replicas of this table.
 - `active_replicas` (`UInt8`) - The number of replicas of this table that have a session in ZooKeeper (i.e., the number of functioning replicas).
+- `replica_is_active` ([Map(String, UInt8)](../../sql-reference/data-types/map.md)) — Map between the replica name and whether the replica is active.
 
 If you request all the columns, the table may work a bit slowly, since several reads from ZooKeeper are made for each row. If you do not request the last 4 columns (log_max_index, log_pointer, total_replicas, active_replicas), the table works quickly.
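+
+For example, a query along these lines (an illustrative sketch; the `is_readonly` column is assumed from this table's description, and the four ZooKeeper-heavy columns are deliberately avoided) can be used to spot lagging replicas:
+
+``` sql
+-- Check replication health without reading log_max_index, log_pointer,
+-- total_replicas or active_replicas, which require extra ZooKeeper reads.
+SELECT database, table, is_readonly, absolute_delay
+FROM system.replicas
+WHERE absolute_delay > 10
+ORDER BY absolute_delay DESC;
+```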
diff --git a/docs/en/operations/system-tables/role-grants.md b/docs/en/operations/system-tables/role-grants.md index d90bc1f77be..d754c6d7fb5 100644 --- a/docs/en/operations/system-tables/role-grants.md +++ b/docs/en/operations/system-tables/role-grants.md @@ -13,9 +13,9 @@ Columns: - `granted_role_is_default` ([UInt8](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Flag that shows whether `granted_role` is a default role. Possible values: - 1 — `granted_role` is a default role. - 0 — `granted_role` is not a default role. - + - `with_admin_option` ([UInt8](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Flag that shows whether `granted_role` is a role with [ADMIN OPTION](../../sql-reference/statements/grant.md#admin-option-privilege) privilege. Possible values: - 1 — The role has `ADMIN OPTION` privilege. - - 0 — The role without `ADMIN OPTION` privilege. + - 0 — The role without `ADMIN OPTION` privilege. [Original article](https://clickhouse.tech/docs/en/operations/system-tables/role-grants) diff --git a/docs/en/operations/system-tables/row_policies.md b/docs/en/operations/system-tables/row_policies.md index 767270d64ae..157ec75f05e 100644 --- a/docs/en/operations/system-tables/row_policies.md +++ b/docs/en/operations/system-tables/row_policies.md @@ -13,7 +13,7 @@ Columns: - `id` ([UUID](../../sql-reference/data-types/uuid.md)) — Row policy ID. -- `storage` ([String](../../sql-reference/data-types/string.md)) — Name of the directory where the row policy is stored. +- `storage` ([String](../../sql-reference/data-types/string.md)) — Name of the directory where the row policy is stored. - `select_filter` ([Nullable](../../sql-reference/data-types/nullable.md)([String](../../sql-reference/data-types/string.md))) — Condition which is used to filter rows. diff --git a/docs/en/operations/system-tables/settings_profiles.md b/docs/en/operations/system-tables/settings_profiles.md index 80dc5172f4e..7339c5c2ef6 100644 --- a/docs/en/operations/system-tables/settings_profiles.md +++ b/docs/en/operations/system-tables/settings_profiles.md @@ -7,7 +7,7 @@ Columns: - `id` ([UUID](../../sql-reference/data-types/uuid.md)) — Setting profile ID. -- `storage` ([String](../../sql-reference/data-types/string.md)) — Path to the storage of setting profiles. Configured in the `access_control_path` parameter. +- `storage` ([String](../../sql-reference/data-types/string.md)) — Path to the storage of setting profiles. Configured in the `access_control_path` parameter. - `num_elements` ([UInt64](../../sql-reference/data-types/int-uint.md)) — Number of elements for this profile in the `system.settings_profile_elements` table. diff --git a/docs/en/operations/system-tables/tables.md b/docs/en/operations/system-tables/tables.md index 4d7b20be311..f37da02cf5b 100644 --- a/docs/en/operations/system-tables/tables.md +++ b/docs/en/operations/system-tables/tables.md @@ -1,24 +1,24 @@ # system.tables {#system-tables} -Contains metadata of each table that the server knows about. +Contains metadata of each table that the server knows about. [Detached](../../sql-reference/statements/detach.md) tables are not shown in `system.tables`. -[Temporary tables](../../sql-reference/statements/create/table.md#temporary-tables) are visible in the `system.tables` only in those session where they have been created. They are shown with the empty `database` field and with the `is_temporary` flag switched on. 
+[Temporary tables](../../sql-reference/statements/create/table.md#temporary-tables) are visible in the `system.tables` only in those sessions where they have been created. They are shown with the empty `database` field and with the `is_temporary` flag switched on.
 
 Columns:
 
-- `database` ([String](../../sql-reference/data-types/string.md)) — The name of the database the table is in. 
+- `database` ([String](../../sql-reference/data-types/string.md)) — The name of the database the table is in.
 
-- `name` ([String](../../sql-reference/data-types/string.md)) — Table name. 
+- `name` ([String](../../sql-reference/data-types/string.md)) — Table name.
 
-- `engine` ([String](../../sql-reference/data-types/string.md)) — Table engine name (without parameters). 
+- `engine` ([String](../../sql-reference/data-types/string.md)) — Table engine name (without parameters).
 
-- `is_temporary` ([UInt8](../../sql-reference/data-types/int-uint.md)) - Flag that indicates whether the table is temporary. 
+- `is_temporary` ([UInt8](../../sql-reference/data-types/int-uint.md)) - Flag that indicates whether the table is temporary.
 
-- `data_path` ([String](../../sql-reference/data-types/string.md)) - Path to the table data in the file system. 
+- `data_path` ([String](../../sql-reference/data-types/string.md)) - Path to the table data in the file system.
 
-- `metadata_path` ([String](../../sql-reference/data-types/string.md)) - Path to the table metadata in the file system. 
+- `metadata_path` ([String](../../sql-reference/data-types/string.md)) - Path to the table metadata in the file system.
 
 - `metadata_modification_time` ([DateTime](../../sql-reference/data-types/datetime.md)) - Time of latest modification of the table metadata.
 
@@ -28,33 +28,33 @@ Columns:
 
 - `create_table_query` ([String](../../sql-reference/data-types/string.md)) - The query that was used to create the table.
 
-- `engine_full` ([String](../../sql-reference/data-types/string.md)) - Parameters of the table engine. 
+- `engine_full` ([String](../../sql-reference/data-types/string.md)) - Parameters of the table engine.
 
-- `partition_key` ([String](../../sql-reference/data-types/string.md)) - The partition key expression specified in the table. 
+- `partition_key` ([String](../../sql-reference/data-types/string.md)) - The partition key expression specified in the table.
 
-- `sorting_key` ([String](../../sql-reference/data-types/string.md)) - The sorting key expression specified in the table. 
+- `sorting_key` ([String](../../sql-reference/data-types/string.md)) - The sorting key expression specified in the table.
 
-- `primary_key` ([String](../../sql-reference/data-types/string.md)) - The primary key expression specified in the table. 
+- `primary_key` ([String](../../sql-reference/data-types/string.md)) - The primary key expression specified in the table.
 
-- `sampling_key` ([String](../../sql-reference/data-types/string.md)) - The sampling key expression specified in the table. 
+- `sampling_key` ([String](../../sql-reference/data-types/string.md)) - The sampling key expression specified in the table.
- `storage_policy` ([String](../../sql-reference/data-types/string.md)) - The storage policy:
 
     - [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-multiple-volumes)
     - [Distributed](../../engines/table-engines/special/distributed.md#distributed)
 
-- `total_rows` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) - Total number of rows, if it is possible to quickly determine exact number of rows in the table, otherwise `NULL` (including underying `Buffer` table). 
+- `total_rows` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) - Total number of rows, if it is possible to quickly determine the exact number of rows in the table, otherwise `NULL` (including the underlying `Buffer` table).
 
-- `total_bytes` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) - Total number of bytes, if it is possible to quickly determine exact number of bytes for the table on storage, otherwise `NULL` (does not includes any underlying storage). 
+- `total_bytes` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) - Total number of bytes, if it is possible to quickly determine the exact number of bytes for the table on storage, otherwise `NULL` (does not include any underlying storage).
 
     - If the table stores data on disk, returns used space on disk (i.e. compressed).
     - If the table stores data in memory, returns approximated number of used bytes in memory.
 
-- `lifetime_rows` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) - Total number of rows INSERTed since server start (only for `Buffer` tables). 
+- `lifetime_rows` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) - Total number of rows INSERTed since server start (only for `Buffer` tables).
 
-- `lifetime_bytes` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) - Total number of bytes INSERTed since server start (only for `Buffer` tables). 
+- `lifetime_bytes` ([Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md))) - Total number of bytes INSERTed since server start (only for `Buffer` tables).
 
-- `comment` ([String](../../sql-reference/data-types/string.md)) - The comment for the table. 
+- `comment` ([String](../../sql-reference/data-types/string.md)) - The comment for the table.
 
 The `system.tables` table is used in `SHOW TABLES` query implementation.
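+
+As an illustrative sketch that uses only the columns described above, the largest non-temporary tables can be listed like this:
+
+``` sql
+-- List the ten largest tables by reported size; total_bytes may be NULL
+-- for engines that cannot report their size quickly.
+SELECT database, name, engine, total_rows, total_bytes
+FROM system.tables
+WHERE is_temporary = 0
+ORDER BY total_bytes DESC
+LIMIT 10;
+```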
diff --git a/docs/en/operations/system-tables/text_log.md b/docs/en/operations/system-tables/text_log.md index ad95e91f0d2..e97af34beec 100644 --- a/docs/en/operations/system-tables/text_log.md +++ b/docs/en/operations/system-tables/text_log.md @@ -42,12 +42,12 @@ microseconds: 871397 thread_name: clickhouse-serv thread_id: 564917 level: Information -query_id: +query_id: logger_name: DNSCacheUpdater message: Update period 15 seconds revision: 54440 source_file: /ClickHouse/src/Interpreters/DNSCacheUpdater.cpp; void DB::DNSCacheUpdater::start() source_line: 45 ``` - + [Original article](https://clickhouse.tech/docs/en/operations/system-tables/text_log) diff --git a/docs/en/operations/system-tables/trace_log.md b/docs/en/operations/system-tables/trace_log.md index 5de597a0a51..cc70b2d8236 100644 --- a/docs/en/operations/system-tables/trace_log.md +++ b/docs/en/operations/system-tables/trace_log.md @@ -49,7 +49,7 @@ timestamp_ns: 1599762189872924510 revision: 54440 trace_type: Memory thread_id: 564963 -query_id: +query_id: trace: [371912858,371912789,371798468,371799717,371801313,371790250,624462773,566365041,566440261,566445834,566460071,566459914,566459842,566459580,566459469,566459389,566459341,566455774,371993941,371988245,372158848,372187428,372187309,372187093,372185478,140222123165193,140222122205443] size: 5244400 ``` diff --git a/docs/en/operations/system-tables/users.md b/docs/en/operations/system-tables/users.md index 11fdeb1e9ae..0170c85a6d5 100644 --- a/docs/en/operations/system-tables/users.md +++ b/docs/en/operations/system-tables/users.md @@ -7,7 +7,7 @@ Columns: - `id` ([UUID](../../sql-reference/data-types/uuid.md)) — User ID. -- `storage` ([String](../../sql-reference/data-types/string.md)) — Path to the storage of users. Configured in the `access_control_path` parameter. +- `storage` ([String](../../sql-reference/data-types/string.md)) — Path to the storage of users. Configured in the `access_control_path` parameter. - `auth_type` ([Enum8](../../sql-reference/data-types/enum.md)('no_password' = 0,'plaintext_password' = 1, 'sha256_password' = 2, 'double_sha1_password' = 3)) — Shows the authentication type. There are multiple ways of user identification: with no password, with plain text password, with [SHA256](https://ru.wikipedia.org/wiki/SHA-2)-encoded password or with [double SHA-1](https://ru.wikipedia.org/wiki/SHA-1)-encoded password. diff --git a/docs/en/operations/update.md b/docs/en/operations/update.md index dbcf9ae2b3e..ffb646ffce2 100644 --- a/docs/en/operations/update.md +++ b/docs/en/operations/update.md @@ -16,12 +16,12 @@ $ sudo service clickhouse-server restart If you installed ClickHouse using something other than the recommended `deb` packages, use the appropriate update method. !!! note "Note" - You can update multiple servers at once as soon as there is no moment when all replicas of one shard are offline. + You can update multiple servers at once as soon as there is no moment when all replicas of one shard are offline. The upgrade of older version of ClickHouse to specific version: As an example: - + `xx.yy.a.b` is a current stable version. 
The latest stable version could be found [here](https://github.com/ClickHouse/ClickHouse/releases) ```bash diff --git a/docs/en/operations/utilities/clickhouse-benchmark.md b/docs/en/operations/utilities/clickhouse-benchmark.md index 92a96f8cd6e..5971fc0f9b3 100644 --- a/docs/en/operations/utilities/clickhouse-benchmark.md +++ b/docs/en/operations/utilities/clickhouse-benchmark.md @@ -40,7 +40,7 @@ clickhouse-benchmark [keys] < queries_file; ## Keys {#clickhouse-benchmark-keys} -- `--query=QUERY` — Query to execute. If this parameter is not passed, `clickhouse-benchmark` will read queries from standard input. +- `--query=QUERY` — Query to execute. If this parameter is not passed, `clickhouse-benchmark` will read queries from standard input. - `-c N`, `--concurrency=N` — Number of queries that `clickhouse-benchmark` sends simultaneously. Default value: 1. - `-d N`, `--delay=N` — Interval in seconds between intermediate reports (to disable reports set 0). Default value: 1. - `-h HOST`, `--host=HOST` — Server host. Default value: `localhost`. For the [comparison mode](#clickhouse-benchmark-comparison-mode) you can use multiple `-h` keys. diff --git a/docs/en/operations/utilities/clickhouse-copier.md b/docs/en/operations/utilities/clickhouse-copier.md index 056b06271ef..d9c209f2101 100644 --- a/docs/en/operations/utilities/clickhouse-copier.md +++ b/docs/en/operations/utilities/clickhouse-copier.md @@ -74,7 +74,7 @@ Parameters: source cluster & destination clusters accept exactly the same parameters as parameters for the usual Distributed table see https://clickhouse.tech/docs/en/engines/table-engines/special/distributed/ - --> + --> false diff --git a/docs/en/operations/utilities/clickhouse-format.md b/docs/en/operations/utilities/clickhouse-format.md index 17948dce82d..edba55689e7 100644 --- a/docs/en/operations/utilities/clickhouse-format.md +++ b/docs/en/operations/utilities/clickhouse-format.md @@ -18,7 +18,7 @@ Keys: - `--seed ` — Seed arbitrary string that determines the result of obfuscation. - `--backslash` — Add a backslash at the end of each line of the formatted query. Can be useful when you copy a query from web or somewhere else with multiple lines, and want to execute it in command line. -## Examples {#examples} +## Examples {#examples} 1. Highlighting and single line: @@ -32,12 +32,12 @@ Result: SELECT sum(number) FROM numbers(5) ``` -2. Multiqueries: +2. Multiqueries: ```bash $ clickhouse-format -n <<< "SELECT * FROM (SELECT 1 AS x UNION ALL SELECT 1 UNION DISTINCT SELECT 3);" ``` - + Result: ```text @@ -58,13 +58,13 @@ FROM ```bash $ clickhouse-format --seed Hello --obfuscate <<< "SELECT cost_first_screen BETWEEN a AND b, CASE WHEN x >= 123 THEN y ELSE NULL END;" ``` - + Result: ```text SELECT treasury_mammoth_hazelnut BETWEEN nutmeg AND span, CASE WHEN chive >= 116 THEN switching ELSE ANYTHING END; ``` - + Same query and another seed string: ```bash @@ -95,4 +95,4 @@ FROM \ UNION DISTINCT \ SELECT 3 \ ) -``` +``` diff --git a/docs/en/operations/utilities/clickhouse-local.md b/docs/en/operations/utilities/clickhouse-local.md index cfabf42bff1..b166e2f1b3c 100644 --- a/docs/en/operations/utilities/clickhouse-local.md +++ b/docs/en/operations/utilities/clickhouse-local.md @@ -38,7 +38,7 @@ Arguments: - `-of`, `--format`, `--output-format` — output format, `TSV` by default. - `-d`, `--database` — default database, `_local` by default. - `--stacktrace` — whether to dump debug output in case of exception. -- `--echo` — print query before execution. 
+- `--echo` — print query before execution. - `--verbose` — more details on query execution. - `--logger.console` — Log to console. - `--logger.log` — Log file name. diff --git a/docs/en/sql-reference/aggregate-functions/parametric-functions.md b/docs/en/sql-reference/aggregate-functions/parametric-functions.md index c77dfd6fa90..bdf115acb34 100644 --- a/docs/en/sql-reference/aggregate-functions/parametric-functions.md +++ b/docs/en/sql-reference/aggregate-functions/parametric-functions.md @@ -255,7 +255,7 @@ windowFunnel(window, [mode, [mode, ... ]])(timestamp, cond1, cond2, ..., condN) - `window` — Length of the sliding window, it is the time interval between the first and the last condition. The unit of `window` depends on the `timestamp` itself and varies. Determined using the expression `timestamp of cond1 <= timestamp of cond2 <= ... <= timestamp of condN <= timestamp of cond1 + window`. - `mode` — It is an optional argument. One or more modes can be set. - - `'strict'` — If same condition holds for sequence of events then such non-unique events would be skipped. + - `'strict'` — If same condition holds for sequence of events then such non-unique events would be skipped. - `'strict_order'` — Don't allow interventions of other events. E.g. in the case of `A->B->D->C`, it stops finding `A->B->C` at the `D` and the max event level is 2. - `'strict_increase'` — Apply conditions only to events with strictly increasing timestamps. @@ -530,7 +530,7 @@ sequenceNextNode(direction, base)(timestamp, event_column, base_condition, event - tail — Set the base point to the last event. - first_match — Set the base point to the first matched `event1`. - last_match — Set the base point to the last matched `event1`. - + **Arguments** - `timestamp` — Name of the column containing the timestamp. Data types supported: [Date](../../sql-reference/data-types/date.md), [DateTime](../../sql-reference/data-types/datetime.md#data_type-datetime) and other unsigned integer types. 
@@ -553,11 +553,11 @@ The query statement searching the event following A->B: ``` sql CREATE TABLE test_flow ( - dt DateTime, - id int, + dt DateTime, + id int, page String) -ENGINE = MergeTree() -PARTITION BY toYYYYMMDD(dt) +ENGINE = MergeTree() +PARTITION BY toYYYYMMDD(dt) ORDER BY id; INSERT INTO test_flow VALUES (1, 1, 'A') (2, 1, 'B') (3, 1, 'C') (4, 1, 'D') (5, 1, 'E'); @@ -585,21 +585,21 @@ INSERT INTO test_flow VALUES (1, 3, 'Gift') (2, 3, 'Home') (3, 3, 'Gift') (4, 3, ``` sql SELECT id, sequenceNextNode('forward', 'head')(dt, page, page = 'Home', page = 'Home', page = 'Gift') FROM test_flow GROUP BY id; - + dt id page 1970-01-01 09:00:01 1 Home // Base point, Matched with Home 1970-01-01 09:00:02 1 Gift // Matched with Gift - 1970-01-01 09:00:03 1 Exit // The result + 1970-01-01 09:00:03 1 Exit // The result 1970-01-01 09:00:01 2 Home // Base point, Matched with Home 1970-01-01 09:00:02 2 Home // Unmatched with Gift 1970-01-01 09:00:03 2 Gift - 1970-01-01 09:00:04 2 Basket - + 1970-01-01 09:00:04 2 Basket + 1970-01-01 09:00:01 3 Gift // Base point, Unmatched with Home - 1970-01-01 09:00:02 3 Home - 1970-01-01 09:00:03 3 Gift - 1970-01-01 09:00:04 3 Basket + 1970-01-01 09:00:02 3 Home + 1970-01-01 09:00:03 3 Gift + 1970-01-01 09:00:04 3 Basket ``` **Behavior for `backward` and `tail`** @@ -611,14 +611,14 @@ SELECT id, sequenceNextNode('backward', 'tail')(dt, page, page = 'Basket', page 1970-01-01 09:00:01 1 Home 1970-01-01 09:00:02 1 Gift 1970-01-01 09:00:03 1 Exit // Base point, Unmatched with Basket - -1970-01-01 09:00:01 2 Home -1970-01-01 09:00:02 2 Home // The result + +1970-01-01 09:00:01 2 Home +1970-01-01 09:00:02 2 Home // The result 1970-01-01 09:00:03 2 Gift // Matched with Gift 1970-01-01 09:00:04 2 Basket // Base point, Matched with Basket - + 1970-01-01 09:00:01 3 Gift -1970-01-01 09:00:02 3 Home // The result +1970-01-01 09:00:02 3 Home // The result 1970-01-01 09:00:03 3 Gift // Base point, Matched with Gift 1970-01-01 09:00:04 3 Basket // Base point, Matched with Basket ``` @@ -633,16 +633,16 @@ SELECT id, sequenceNextNode('forward', 'first_match')(dt, page, page = 'Gift', p 1970-01-01 09:00:01 1 Home 1970-01-01 09:00:02 1 Gift // Base point 1970-01-01 09:00:03 1 Exit // The result - -1970-01-01 09:00:01 2 Home -1970-01-01 09:00:02 2 Home + +1970-01-01 09:00:01 2 Home +1970-01-01 09:00:02 2 Home 1970-01-01 09:00:03 2 Gift // Base point 1970-01-01 09:00:04 2 Basket The result - + 1970-01-01 09:00:01 3 Gift // Base point 1970-01-01 09:00:02 3 Home // The result -1970-01-01 09:00:03 3 Gift -1970-01-01 09:00:04 3 Basket +1970-01-01 09:00:03 3 Gift +1970-01-01 09:00:04 3 Basket ``` ``` sql @@ -652,16 +652,16 @@ SELECT id, sequenceNextNode('forward', 'first_match')(dt, page, page = 'Gift', p 1970-01-01 09:00:01 1 Home 1970-01-01 09:00:02 1 Gift // Base point 1970-01-01 09:00:03 1 Exit // Unmatched with Home - -1970-01-01 09:00:01 2 Home -1970-01-01 09:00:02 2 Home + +1970-01-01 09:00:01 2 Home +1970-01-01 09:00:02 2 Home 1970-01-01 09:00:03 2 Gift // Base point 1970-01-01 09:00:04 2 Basket // Unmatched with Home - + 1970-01-01 09:00:01 3 Gift // Base point 1970-01-01 09:00:02 3 Home // Matched with Home 1970-01-01 09:00:03 3 Gift // The result -1970-01-01 09:00:04 3 Basket +1970-01-01 09:00:04 3 Basket ``` @@ -673,17 +673,17 @@ SELECT id, sequenceNextNode('backward', 'last_match')(dt, page, page = 'Gift', p dt id page 1970-01-01 09:00:01 1 Home // The result 1970-01-01 09:00:02 1 Gift // Base point -1970-01-01 09:00:03 1 Exit - -1970-01-01 09:00:01 2 Home +1970-01-01 
09:00:03 1 Exit + +1970-01-01 09:00:01 2 Home 1970-01-01 09:00:02 2 Home // The result 1970-01-01 09:00:03 2 Gift // Base point -1970-01-01 09:00:04 2 Basket - -1970-01-01 09:00:01 3 Gift +1970-01-01 09:00:04 2 Basket + +1970-01-01 09:00:01 3 Gift 1970-01-01 09:00:02 3 Home // The result -1970-01-01 09:00:03 3 Gift // Base point -1970-01-01 09:00:04 3 Basket +1970-01-01 09:00:03 3 Gift // Base point +1970-01-01 09:00:04 3 Basket ``` ``` sql @@ -692,17 +692,17 @@ SELECT id, sequenceNextNode('backward', 'last_match')(dt, page, page = 'Gift', p dt id page 1970-01-01 09:00:01 1 Home // Matched with Home, the result is null 1970-01-01 09:00:02 1 Gift // Base point -1970-01-01 09:00:03 1 Exit - +1970-01-01 09:00:03 1 Exit + 1970-01-01 09:00:01 2 Home // The result 1970-01-01 09:00:02 2 Home // Matched with Home 1970-01-01 09:00:03 2 Gift // Base point -1970-01-01 09:00:04 2 Basket - +1970-01-01 09:00:04 2 Basket + 1970-01-01 09:00:01 3 Gift // The result 1970-01-01 09:00:02 3 Home // Matched with Home -1970-01-01 09:00:03 3 Gift // Base point -1970-01-01 09:00:04 3 Basket +1970-01-01 09:00:03 3 Gift // Base point +1970-01-01 09:00:04 3 Basket ``` @@ -726,39 +726,39 @@ INSERT INTO test_flow_basecond VALUES (1, 1, 'A', 'ref4') (2, 1, 'A', 'ref3') (3 ``` sql SELECT id, sequenceNextNode('forward', 'head')(dt, page, ref = 'ref1', page = 'A') FROM test_flow_basecond GROUP BY id; - dt id page ref + dt id page ref 1970-01-01 09:00:01 1 A ref4 // The head can not be base point because the ref column of the head unmatched with 'ref1'. - 1970-01-01 09:00:02 1 A ref3 - 1970-01-01 09:00:03 1 B ref2 - 1970-01-01 09:00:04 1 B ref1 + 1970-01-01 09:00:02 1 A ref3 + 1970-01-01 09:00:03 1 B ref2 + 1970-01-01 09:00:04 1 B ref1 ``` ``` sql SELECT id, sequenceNextNode('backward', 'tail')(dt, page, ref = 'ref4', page = 'B') FROM test_flow_basecond GROUP BY id; - dt id page ref + dt id page ref 1970-01-01 09:00:01 1 A ref4 - 1970-01-01 09:00:02 1 A ref3 - 1970-01-01 09:00:03 1 B ref2 + 1970-01-01 09:00:02 1 A ref3 + 1970-01-01 09:00:03 1 B ref2 1970-01-01 09:00:04 1 B ref1 // The tail can not be base point because the ref column of the tail unmatched with 'ref4'. ``` ``` sql SELECT id, sequenceNextNode('forward', 'first_match')(dt, page, ref = 'ref3', page = 'A') FROM test_flow_basecond GROUP BY id; - dt id page ref + dt id page ref 1970-01-01 09:00:01 1 A ref4 // This row can not be base point because the ref column unmatched with 'ref3'. 1970-01-01 09:00:02 1 A ref3 // Base point 1970-01-01 09:00:03 1 B ref2 // The result - 1970-01-01 09:00:04 1 B ref1 + 1970-01-01 09:00:04 1 B ref1 ``` ``` sql SELECT id, sequenceNextNode('backward', 'last_match')(dt, page, ref = 'ref2', page = 'B') FROM test_flow_basecond GROUP BY id; - dt id page ref + dt id page ref 1970-01-01 09:00:01 1 A ref4 1970-01-01 09:00:02 1 A ref3 // The result 1970-01-01 09:00:03 1 B ref2 // Base point - 1970-01-01 09:00:04 1 B ref1 // This row can not be base point because the ref column unmatched with 'ref2'. + 1970-01-01 09:00:04 1 B ref1 // This row can not be base point because the ref column unmatched with 'ref2'. 
``` diff --git a/docs/en/sql-reference/aggregate-functions/reference/avg.md b/docs/en/sql-reference/aggregate-functions/reference/avg.md index cbd409ccab6..14a4a4c5ad5 100644 --- a/docs/en/sql-reference/aggregate-functions/reference/avg.md +++ b/docs/en/sql-reference/aggregate-functions/reference/avg.md @@ -47,7 +47,7 @@ Query: CREATE table test (t UInt8) ENGINE = Memory; ``` -Get the arithmetic mean: +Get the arithmetic mean: Query: diff --git a/docs/en/sql-reference/aggregate-functions/reference/avgweighted.md b/docs/en/sql-reference/aggregate-functions/reference/avgweighted.md index 2df09e560b4..5f4d846e81b 100644 --- a/docs/en/sql-reference/aggregate-functions/reference/avgweighted.md +++ b/docs/en/sql-reference/aggregate-functions/reference/avgweighted.md @@ -19,7 +19,7 @@ avgWeighted(x, weight) `x` and `weight` must both be [Integer](../../../sql-reference/data-types/int-uint.md), -[floating-point](../../../sql-reference/data-types/float.md), or +[floating-point](../../../sql-reference/data-types/float.md), or [Decimal](../../../sql-reference/data-types/decimal.md), but may have different types. diff --git a/docs/en/sql-reference/aggregate-functions/reference/deltasumtimestamp.md b/docs/en/sql-reference/aggregate-functions/reference/deltasumtimestamp.md index 241010c4761..7238f73bc0d 100644 --- a/docs/en/sql-reference/aggregate-functions/reference/deltasumtimestamp.md +++ b/docs/en/sql-reference/aggregate-functions/reference/deltasumtimestamp.md @@ -32,7 +32,7 @@ Type: [Integer](../../data-types/int-uint.md) or [Float](../../data-types/float. Query: ```sql -SELECT deltaSumTimestamp(value, timestamp) +SELECT deltaSumTimestamp(value, timestamp) FROM (SELECT number AS timestamp, [0, 4, 8, 3, 0, 0, 0, 1, 3, 5][number] AS value FROM numbers(1, 10)); ``` diff --git a/docs/en/sql-reference/aggregate-functions/reference/grouparraysample.md b/docs/en/sql-reference/aggregate-functions/reference/grouparraysample.md index df0b8120eef..bd170ead577 100644 --- a/docs/en/sql-reference/aggregate-functions/reference/grouparraysample.md +++ b/docs/en/sql-reference/aggregate-functions/reference/grouparraysample.md @@ -4,7 +4,7 @@ toc_priority: 114 # groupArraySample {#grouparraysample} -Creates an array of sample argument values. The size of the resulting array is limited to `max_size` elements. Argument values are selected and added to the array randomly. +Creates an array of sample argument values. The size of the resulting array is limited to `max_size` elements. Argument values are selected and added to the array randomly. **Syntax** diff --git a/docs/en/sql-reference/aggregate-functions/reference/mannwhitneyutest.md b/docs/en/sql-reference/aggregate-functions/reference/mannwhitneyutest.md index 34e8188299c..8c57a3eb896 100644 --- a/docs/en/sql-reference/aggregate-functions/reference/mannwhitneyutest.md +++ b/docs/en/sql-reference/aggregate-functions/reference/mannwhitneyutest.md @@ -13,7 +13,7 @@ Applies the Mann-Whitney rank test to samples from two populations. mannWhitneyUTest[(alternative[, continuity_correction])](sample_data, sample_index) ``` -Values of both samples are in the `sample_data` column. If `sample_index` equals to 0 then the value in that row belongs to the sample from the first population. Otherwise it belongs to the sample from the second population. +Values of both samples are in the `sample_data` column. If `sample_index` equals to 0 then the value in that row belongs to the sample from the first population. Otherwise it belongs to the sample from the second population. 
The null hypothesis is that two populations are stochastically equal. Also one-sided hypothesises can be tested. This test does not assume that data have normal distribution. **Arguments** diff --git a/docs/en/sql-reference/aggregate-functions/reference/quantilebfloat16.md b/docs/en/sql-reference/aggregate-functions/reference/quantilebfloat16.md index b914e1feedf..25c7233aa56 100644 --- a/docs/en/sql-reference/aggregate-functions/reference/quantilebfloat16.md +++ b/docs/en/sql-reference/aggregate-functions/reference/quantilebfloat16.md @@ -4,7 +4,7 @@ toc_priority: 209 # quantileBFloat16 {#quantilebfloat16} -Computes an approximate [quantile](https://en.wikipedia.org/wiki/Quantile) of a sample consisting of [bfloat16](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) numbers. `bfloat16` is a floating-point data type with 1 sign bit, 8 exponent bits and 7 fraction bits. +Computes an approximate [quantile](https://en.wikipedia.org/wiki/Quantile) of a sample consisting of [bfloat16](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) numbers. `bfloat16` is a floating-point data type with 1 sign bit, 8 exponent bits and 7 fraction bits. The function converts input values to 32-bit floats and takes the most significant 16 bits. Then it calculates `bfloat16` quantile value and converts the result to a 64-bit float by appending zero bits. The function is a fast quantile estimator with a relative error no more than 0.390625%. @@ -18,7 +18,7 @@ Alias: `medianBFloat16` **Arguments** -- `expr` — Column with numeric data. [Integer](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md). +- `expr` — Column with numeric data. [Integer](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md). **Parameters** diff --git a/docs/en/sql-reference/aggregate-functions/reference/quantileexact.md b/docs/en/sql-reference/aggregate-functions/reference/quantileexact.md index 069aadc225b..bfd9d1e5a55 100644 --- a/docs/en/sql-reference/aggregate-functions/reference/quantileexact.md +++ b/docs/en/sql-reference/aggregate-functions/reference/quantileexact.md @@ -115,7 +115,7 @@ Similar to `quantileExact`, this computes the exact [quantile](https://en.wikipe All the passed values are combined into an array, which is then fully sorted, to get the exact value. The sorting [algorithm's](https://en.cppreference.com/w/cpp/algorithm/sort) complexity is `O(N·log(N))`, where `N = std::distance(first, last)` comparisons. -The return value depends on the quantile level and the number of elements in the selection, i.e. if the level is 0.5, then the function returns the higher median value for an even number of elements and the middle median value for an odd number of elements. Median is calculated similarly to the [median_high](https://docs.python.org/3/library/statistics.html#statistics.median_high) implementation which is used in python. For all other levels, the element at the index corresponding to the value of `level * size_of_array` is returned. +The return value depends on the quantile level and the number of elements in the selection, i.e. if the level is 0.5, then the function returns the higher median value for an even number of elements and the middle median value for an odd number of elements. Median is calculated similarly to the [median_high](https://docs.python.org/3/library/statistics.html#statistics.median_high) implementation which is used in python. 
For all other levels, the element at the index corresponding to the value of `level * size_of_array` is returned. This implementation behaves exactly similar to the current `quantileExact` implementation. @@ -214,7 +214,7 @@ Result: ## quantileExactInclusive {#quantileexactinclusive} -Exactly computes the [quantile](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence. +Exactly computes the [quantile](https://en.wikipedia.org/wiki/Quantile) of a numeric data sequence. To get exact value, all the passed values ​​are combined into an array, which is then partially sorted. Therefore, the function consumes `O(n)` memory, where `n` is a number of values that were passed. However, for a small number of values, the function is very effective. diff --git a/docs/en/sql-reference/aggregate-functions/reference/quantiles.md b/docs/en/sql-reference/aggregate-functions/reference/quantiles.md index 67d1c1ca7e5..9777570be83 100644 --- a/docs/en/sql-reference/aggregate-functions/reference/quantiles.md +++ b/docs/en/sql-reference/aggregate-functions/reference/quantiles.md @@ -16,7 +16,7 @@ Exactly computes the [quantiles](https://en.wikipedia.org/wiki/Quantile) of a nu To get exact value, all the passed values ​​are combined into an array, which is then partially sorted. Therefore, the function consumes `O(n)` memory, where `n` is a number of values that were passed. However, for a small number of values, the function is very effective. -This function is equivalent to [PERCENTILE.EXC](https://support.microsoft.com/en-us/office/percentile-exc-function-bbaa7204-e9e1-4010-85bf-c31dc5dce4ba) Excel function, ([type R6](https://en.wikipedia.org/wiki/Quantile#Estimating_quantiles_from_a_sample)). +This function is equivalent to [PERCENTILE.EXC](https://support.microsoft.com/en-us/office/percentile-exc-function-bbaa7204-e9e1-4010-85bf-c31dc5dce4ba) Excel function, ([type R6](https://en.wikipedia.org/wiki/Quantile#Estimating_quantiles_from_a_sample)). Works more efficiently with sets of levels than [quantileExactExclusive](../../../sql-reference/aggregate-functions/reference/quantileexact.md#quantileexactexclusive). diff --git a/docs/en/sql-reference/aggregate-functions/reference/studentttest.md b/docs/en/sql-reference/aggregate-functions/reference/studentttest.md index 3398fc1ca8c..36f80ae6cd7 100644 --- a/docs/en/sql-reference/aggregate-functions/reference/studentttest.md +++ b/docs/en/sql-reference/aggregate-functions/reference/studentttest.md @@ -5,7 +5,7 @@ toc_title: studentTTest # studentTTest {#studentttest} -Applies Student's t-test to samples from two populations. +Applies Student's t-test to samples from two populations. **Syntax** diff --git a/docs/en/sql-reference/aggregate-functions/reference/sumcount.md b/docs/en/sql-reference/aggregate-functions/reference/sumcount.md index b2cb2cfdc09..2986511e01a 100644 --- a/docs/en/sql-reference/aggregate-functions/reference/sumcount.md +++ b/docs/en/sql-reference/aggregate-functions/reference/sumcount.md @@ -12,7 +12,7 @@ Calculates the sum of the numbers and counts the number of rows at the same time sumCount(x) ``` -**Arguments** +**Arguments** - `x` — Input value, must be [Integer](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md), or [Decimal](../../../sql-reference/data-types/decimal.md). 
diff --git a/docs/en/sql-reference/aggregate-functions/reference/sumkahan.md b/docs/en/sql-reference/aggregate-functions/reference/sumkahan.md index 1f2b07f692b..d4d47fde1fa 100644 --- a/docs/en/sql-reference/aggregate-functions/reference/sumkahan.md +++ b/docs/en/sql-reference/aggregate-functions/reference/sumkahan.md @@ -15,13 +15,13 @@ The compensation works only for [Float](../../../sql-reference/data-types/float. sumKahan(x) ``` -**Arguments** +**Arguments** - `x` — Input value, must be [Integer](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md), or [Decimal](../../../sql-reference/data-types/decimal.md). **Returned value** -- the sum of numbers, with type [Integer](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md), or [Decimal](../../../sql-reference/data-types/decimal.md) depends on type of input arguments +- the sum of numbers, with type [Integer](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md), or [Decimal](../../../sql-reference/data-types/decimal.md) depends on type of input arguments **Example** diff --git a/docs/en/sql-reference/aggregate-functions/reference/welchttest.md b/docs/en/sql-reference/aggregate-functions/reference/welchttest.md index 02238de42ef..2c1a043aed6 100644 --- a/docs/en/sql-reference/aggregate-functions/reference/welchttest.md +++ b/docs/en/sql-reference/aggregate-functions/reference/welchttest.md @@ -5,7 +5,7 @@ toc_title: welchTTest # welchTTest {#welchttest} -Applies Welch's t-test to samples from two populations. +Applies Welch's t-test to samples from two populations. **Syntax** diff --git a/docs/en/sql-reference/data-types/array.md b/docs/en/sql-reference/data-types/array.md index 0a92a634b6b..4e7e7390e41 100644 --- a/docs/en/sql-reference/data-types/array.md +++ b/docs/en/sql-reference/data-types/array.md @@ -45,7 +45,7 @@ SELECT [1, 2] AS x, toTypeName(x) ## Working with Data Types {#working-with-data-types} -The maximum size of an array is limited to one million elements. +The maximum size of an array is limited to one million elements. When creating an array on the fly, ClickHouse automatically defines the argument type as the narrowest data type that can store all the listed arguments. If there are any [Nullable](../../sql-reference/data-types/nullable.md#data_type-nullable) or literal [NULL](../../sql-reference/syntax.md#null-literal) values, the type of an array element also becomes [Nullable](../../sql-reference/data-types/nullable.md). diff --git a/docs/en/sql-reference/data-types/geo.md b/docs/en/sql-reference/data-types/geo.md index 50093053686..d44f86a4262 100644 --- a/docs/en/sql-reference/data-types/geo.md +++ b/docs/en/sql-reference/data-types/geo.md @@ -5,7 +5,7 @@ toc_title: Geo # Geo Data Types {#geo-data-types} -ClickHouse supports data types for representing geographical objects — locations, lands, etc. +ClickHouse supports data types for representing geographical objects — locations, lands, etc. !!! warning "Warning" Currently geo data types are an experimental feature. To work with them you must set `allow_experimental_geo_types = 1`. 
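As a minimal sketch of what enabling the experimental flag looks like in practice (session-level `SET`; the table name below is illustrative):

``` sql
-- Without this session setting, creating columns of geo types raises an exception.
SET allow_experimental_geo_types = 1;

-- Illustrative table; Point is stored as Tuple(Float64, Float64).
CREATE TABLE geo_demo (p Point) ENGINE = Memory();
```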
@@ -28,7 +28,7 @@ CREATE TABLE geo_point (p Point) ENGINE = Memory(); INSERT INTO geo_point VALUES((10, 10)); SELECT p, toTypeName(p) FROM geo_point; ``` -Result: +Result: ``` text ┌─p─────┬─toTypeName(p)─┠@@ -50,7 +50,7 @@ CREATE TABLE geo_ring (r Ring) ENGINE = Memory(); INSERT INTO geo_ring VALUES([(0, 0), (10, 0), (10, 10), (0, 10)]); SELECT r, toTypeName(r) FROM geo_ring; ``` -Result: +Result: ``` text ┌─r─────────────────────────────┬─toTypeName(r)─┠@@ -73,7 +73,7 @@ INSERT INTO geo_polygon VALUES([[(20, 20), (50, 20), (50, 50), (20, 50)], [(30, SELECT pg, toTypeName(pg) FROM geo_polygon; ``` -Result: +Result: ``` text ┌─pg────────────────────────────────────────────────────────────┬─toTypeName(pg)─┠@@ -83,7 +83,7 @@ Result: ## MultiPolygon {#multipolygon-data-type} -`MultiPolygon` consists of multiple polygons and is stored as an array of polygons: [Array](array.md)([Polygon](#polygon-data-type)). +`MultiPolygon` consists of multiple polygons and is stored as an array of polygons: [Array](array.md)([Polygon](#polygon-data-type)). **Example** @@ -95,7 +95,7 @@ CREATE TABLE geo_multipolygon (mpg MultiPolygon) ENGINE = Memory(); INSERT INTO geo_multipolygon VALUES([[[(0, 0), (10, 0), (10, 10), (0, 10)]], [[(20, 20), (50, 20), (50, 50), (20, 50)],[(30, 30), (50, 50), (50, 30)]]]); SELECT mpg, toTypeName(mpg) FROM geo_multipolygon; ``` -Result: +Result: ``` text ┌─mpg─────────────────────────────────────────────────────────────────────────────────────────────┬─toTypeName(mpg)─┠diff --git a/docs/en/sql-reference/data-types/int-uint.md b/docs/en/sql-reference/data-types/int-uint.md index f0a706b0a37..95d1120ed3d 100644 --- a/docs/en/sql-reference/data-types/int-uint.md +++ b/docs/en/sql-reference/data-types/int-uint.md @@ -7,7 +7,7 @@ toc_title: UInt8, UInt16, UInt32, UInt64, UInt256, Int8, Int16, Int32, Int64, In Fixed-length integers, with or without a sign. -When creating tables, numeric parameters for integer numbers can be set (e.g. `TINYINT(8)`, `SMALLINT(16)`, `INT(32)`, `BIGINT(64)`), but ClickHouse ignores them. +When creating tables, numeric parameters for integer numbers can be set (e.g. `TINYINT(8)`, `SMALLINT(16)`, `INT(32)`, `BIGINT(64)`), but ClickHouse ignores them. ## Int Ranges {#int-ranges} diff --git a/docs/en/sql-reference/data-types/lowcardinality.md b/docs/en/sql-reference/data-types/lowcardinality.md index 5f0f400ce43..b3ff26a943d 100644 --- a/docs/en/sql-reference/data-types/lowcardinality.md +++ b/docs/en/sql-reference/data-types/lowcardinality.md @@ -47,6 +47,7 @@ Settings: - [low_cardinality_use_single_dictionary_for_part](../../operations/settings/settings.md#low_cardinality_use_single_dictionary_for_part) - [low_cardinality_allow_in_native_format](../../operations/settings/settings.md#low_cardinality_allow_in_native_format) - [allow_suspicious_low_cardinality_types](../../operations/settings/settings.md#allow_suspicious_low_cardinality_types) +- [output_format_arrow_low_cardinality_as_dictionary](../../operations/settings/settings.md#output-format-arrow-low-cardinality-as-dictionary) Functions: @@ -57,5 +58,3 @@ Functions: - [A Magical Mystery Tour of the LowCardinality Data Type](https://www.altinity.com/blog/2019/3/27/low-cardinality). - [Reducing ClickHouse Storage Cost with the Low Cardinality Type – Lessons from an Instana Engineer](https://www.instana.com/blog/reducing-clickhouse-storage-cost-with-the-low-cardinality-type-lessons-from-an-instana-engineer/). 
- [String Optimization (video presentation in Russian)](https://youtu.be/rqf-ILRgBdY?list=PL0Z2YDlm0b3iwXCpEFiOOYmwXzVmjJfEt). [Slides in English](https://github.com/yandex/clickhouse-presentations/raw/master/meetup19/string_optimization.pdf). - -[Original article](https://clickhouse.tech/docs/en/sql-reference/data-types/lowcardinality/) diff --git a/docs/en/sql-reference/data-types/map.md b/docs/en/sql-reference/data-types/map.md index 6abd150b20f..bfad4375f28 100644 --- a/docs/en/sql-reference/data-types/map.md +++ b/docs/en/sql-reference/data-types/map.md @@ -5,12 +5,12 @@ toc_title: Map(key, value) # Map(key, value) {#data_type-map} -`Map(key, value)` data type stores `key:value` pairs. +`Map(key, value)` data type stores `key:value` pairs. -**Parameters** +**Parameters** -- `key` — The key part of the pair. [String](../../sql-reference/data-types/string.md) or [Integer](../../sql-reference/data-types/int-uint.md). -- `value` — The value part of the pair. [String](../../sql-reference/data-types/string.md), [Integer](../../sql-reference/data-types/int-uint.md) or [Array](../../sql-reference/data-types/array.md). +- `key` — The key part of the pair. [String](../../sql-reference/data-types/string.md), [Integer](../../sql-reference/data-types/int-uint.md), [LowCardinality](../../sql-reference/data-types/lowcardinality.md), or [FixedString](../../sql-reference/data-types/fixedstring.md). +- `value` — The value part of the pair. [String](../../sql-reference/data-types/string.md), [Integer](../../sql-reference/data-types/int-uint.md), [Array](../../sql-reference/data-types/array.md), [LowCardinality](../../sql-reference/data-types/lowcardinality.md), or [FixedString](../../sql-reference/data-types/fixedstring.md). To get the value from an `a Map('key', 'value')` column, use `a['key']` syntax. This lookup works now with a linear complexity. @@ -23,7 +23,7 @@ CREATE TABLE table_map (a Map(String, UInt64)) ENGINE=Memory; INSERT INTO table_map VALUES ({'key1':1, 'key2':10}), ({'key1':2,'key2':20}), ({'key1':3,'key2':30}); ``` -Select all `key2` values: +Select all `key2` values: ```sql SELECT a['key2'] FROM table_map; @@ -38,7 +38,7 @@ Result: └─────────────────────────┘ ``` -If there's no such `key` in the `Map()` column, the query returns zeros for numerical values, empty strings or empty arrays. +If there's no such `key` in the `Map()` column, the query returns zeros for numerical values, empty strings or empty arrays. ```sql INSERT INTO table_map VALUES ({'key3':100}), ({}); diff --git a/docs/en/sql-reference/data-types/simpleaggregatefunction.md b/docs/en/sql-reference/data-types/simpleaggregatefunction.md index 8138d4a4103..e0d8001dcbb 100644 --- a/docs/en/sql-reference/data-types/simpleaggregatefunction.md +++ b/docs/en/sql-reference/data-types/simpleaggregatefunction.md @@ -24,7 +24,7 @@ The following aggregate functions are supported: !!! note "Note" Values of the `SimpleAggregateFunction(func, Type)` look and stored the same way as `Type`, so you do not need to apply functions with `-Merge`/`-State` suffixes. - + `SimpleAggregateFunction` has better performance than `AggregateFunction` with same aggregation function. **Parameters** diff --git a/docs/en/sql-reference/data-types/string.md b/docs/en/sql-reference/data-types/string.md index e72ce8f0b5a..cb3a70ec7f8 100644 --- a/docs/en/sql-reference/data-types/string.md +++ b/docs/en/sql-reference/data-types/string.md @@ -8,7 +8,7 @@ toc_title: String Strings of an arbitrary length. The length is not limited. 
The value can contain an arbitrary set of bytes, including null bytes. The String type replaces the types VARCHAR, BLOB, CLOB, and others from other DBMSs. -When creating tables, numeric parameters for string fields can be set (e.g. `VARCHAR(255)`), but ClickHouse ignores them. +When creating tables, numeric parameters for string fields can be set (e.g. `VARCHAR(255)`), but ClickHouse ignores them. ## Encodings {#encodings} diff --git a/docs/en/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-layout.md b/docs/en/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-layout.md index c69dc4224e6..ffa0fd6f29e 100644 --- a/docs/en/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-layout.md +++ b/docs/en/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-layout.md @@ -275,9 +275,13 @@ The dictionary is stored in a cache that has a fixed number of cells. These cell When searching for a dictionary, the cache is searched first. For each block of data, all keys that are not found in the cache or are outdated are requested from the source using `SELECT attrs... FROM db.table WHERE id IN (k1, k2, ...)`. The received data is then written to the cache. -For cache dictionaries, the expiration [lifetime](../../../sql-reference/dictionaries/external-dictionaries/external-dicts-dict-lifetime.md) of data in the cache can be set. If more time than `lifetime` has passed since loading the data in a cell, the cell’s value is not used, and it is re-requested the next time it needs to be used. +If keys are not found in dictionary, then update cache task is created and added into update queue. Update queue properties can be controlled with settings `max_update_queue_size`, `update_queue_push_timeout_milliseconds`, `query_wait_timeout_milliseconds`, `max_threads_for_updates`. + +For cache dictionaries, the expiration [lifetime](../../../sql-reference/dictionaries/external-dictionaries/external-dicts-dict-lifetime.md) of data in the cache can be set. If more time than `lifetime` has passed since loading the data in a cell, the cell’s value is not used and key becomes expired, and it is re-requested the next time it needs to be used this behaviour can be configured with setting `allow_read_expired_keys`. This is the least effective of all the ways to store dictionaries. The speed of the cache depends strongly on correct settings and the usage scenario. A cache type dictionary performs well only when the hit rates are high enough (recommended 99% and higher). You can view the average hit rate in the `system.dictionaries` table. +If setting `allow_read_expired_keys` is set to 1, by default 0. Then dictionary can support asynchronous updates. If a client requests keys and all of them are in cache, but some of them are expired, then dictionary will return expired keys for a client and request them asynchronously from the source. + To improve cache performance, use a subquery with `LIMIT`, and call the function with the dictionary externally. Supported [sources](../../../sql-reference/dictionaries/external-dictionaries/external-dicts-dict-sources.md): MySQL, ClickHouse, executable, HTTP. @@ -289,6 +293,16 @@ Example of settings: 1000000000 + + 0 + + 100000 + + 10 + + 60000 + + 4 ``` @@ -315,7 +329,7 @@ This type of storage is for use with composite [keys](../../../sql-reference/dic ### ssd_cache {#ssd-cache} -Similar to `cache`, but stores data on SSD and index in RAM. +Similar to `cache`, but stores data on SSD and index in RAM. 
All cache dictionary settings related to update queue can also be applied to SSD cache dictionaries. ``` xml diff --git a/docs/en/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-polygon.md b/docs/en/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-polygon.md index 93b9b340e89..d2a5011df98 100644 --- a/docs/en/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-polygon.md +++ b/docs/en/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-polygon.md @@ -1,6 +1,6 @@ --- toc_priority: 46 -toc_title: Polygon Dictionaries With Grids +toc_title: Polygon Dictionaries With Grids --- diff --git a/docs/en/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-sources.md b/docs/en/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-sources.md index 8022221843b..a1d787a37ea 100644 --- a/docs/en/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-sources.md +++ b/docs/en/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-sources.md @@ -598,7 +598,7 @@ SOURCE(CLICKHOUSE( table 'ids' where 'id=10' secure 1 -)) +)); ``` Setting fields: diff --git a/docs/en/sql-reference/functions/bit-functions.md b/docs/en/sql-reference/functions/bit-functions.md index 57e55a7da56..5af04f75e66 100644 --- a/docs/en/sql-reference/functions/bit-functions.md +++ b/docs/en/sql-reference/functions/bit-functions.md @@ -267,7 +267,7 @@ bitHammingDistance(int1, int2) **Returned value** -- The Hamming distance. +- The Hamming distance. Type: [UInt8](../../sql-reference/data-types/int-uint.md). diff --git a/docs/en/sql-reference/functions/encoding-functions.md b/docs/en/sql-reference/functions/encoding-functions.md index b6393e7b4e5..c22f041e0c3 100644 --- a/docs/en/sql-reference/functions/encoding-functions.md +++ b/docs/en/sql-reference/functions/encoding-functions.md @@ -85,7 +85,7 @@ hex(arg) The function is using uppercase letters `A-F` and not using any prefixes (like `0x`) or suffixes (like `h`). -For integer arguments, it prints hex digits (“nibblesâ€) from the most significant to least significant (big endian or “human readable†order). It starts with the most significant non-zero byte (leading zero bytes are omitted) but always prints both digits of every byte even if leading digit is zero. +For integer arguments, it prints hex digits (“nibblesâ€) from the most significant to least significant (big-endian or “human-readable†order). It starts with the most significant non-zero byte (leading zero bytes are omitted) but always prints both digits of every byte even if the leading digit is zero. **Example** @@ -105,7 +105,7 @@ Values of type `Date` and `DateTime` are formatted as corresponding integers (th For `String` and `FixedString`, all bytes are simply encoded as two hexadecimal numbers. Zero bytes are not omitted. -Values of floating point and Decimal types are encoded as their representation in memory. As we support little endian architecture, they are encoded in little endian. Zero leading/trailing bytes are not omitted. +Values of floating point and Decimal types are encoded as their representation in memory. As we support little-endian architecture, they are encoded in little-endian. Zero leading/trailing bytes are not omitted. **Arguments** @@ -206,6 +206,141 @@ Result: └──────┘ ``` +## bin {#bin} + +Returns a string containing the argument’s binary representation. + +Alias: `BIN`. 
+ +**Syntax** + +``` sql +bin(arg) +``` + +For integer arguments, it prints bin digits from the most significant to least significant (big-endian or “human-readable†order). It starts with the most significant non-zero byte (leading zero bytes are omitted) but always prints eight digits of every byte if the leading digit is zero. + +**Example** + +Query: + +``` sql +SELECT bin(1); +``` + +Result: + +``` text +00000001 +``` + +Values of type `Date` and `DateTime` are formatted as corresponding integers (the number of days since Epoch for Date and the value of Unix Timestamp for DateTime). + +For `String` and `FixedString`, all bytes are simply encoded as eight binary numbers. Zero bytes are not omitted. + +Values of floating-point and Decimal types are encoded as their representation in memory. As we support little-endian architecture, they are encoded in little-endian. Zero leading/trailing bytes are not omitted. + +**Arguments** + +- `arg` — A value to convert to binary. Types: [String](../../sql-reference/data-types/string.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md), [Decimal](../../sql-reference/data-types/decimal.md), [Date](../../sql-reference/data-types/date.md) or [DateTime](../../sql-reference/data-types/datetime.md). + +**Returned value** + +- A string with the binary representation of the argument. + +Type: `String`. + +**Example** + +Query: + +``` sql +SELECT bin(toFloat32(number)) as bin_presentation FROM numbers(15, 2); +``` + +Result: + +``` text +┌─bin_presentation─────────────────┠+│ 00000000000000000111000001000001 │ +│ 00000000000000001000000001000001 │ +└──────────────────────────────────┘ +``` + +Query: + +``` sql +SELECT bin(toFloat64(number)) as bin_presentation FROM numbers(15, 2); +``` + +Result: + +``` text +┌─bin_presentation─────────────────────────────────────────────────┠+│ 0000000000000000000000000000000000000000000000000010111001000000 │ +│ 0000000000000000000000000000000000000000000000000011000001000000 │ +└──────────────────────────────────────────────────────────────────┘ +``` + +## unbin {#unbinstr} + +Performs the opposite operation of [bin](#bin). It interprets each pair of binary digits (in the argument) as a number and converts it to the byte represented by the number. The return value is a binary string (BLOB). + +If you want to convert the result to a number, you can use the [reverse](../../sql-reference/functions/string-functions.md#reverse) and [reinterpretAs](../../sql-reference/functions/type-conversion-functions.md#type-conversion-functions) functions. + +!!! note "Note" + If `unbin` is invoked from within the `clickhouse-client`, binary strings display using UTF-8. + +Alias: `UNBIN`. + +**Syntax** + +``` sql +unbin(arg) +``` + +**Arguments** + +- `arg` — A string containing any number of binary digits. Type: [String](../../sql-reference/data-types/string.md). + +Supports binary digits `0-1`. The number of binary digits does not have to be multiples of eight. If the argument string contains anything other than binary digits, some implementation-defined result is returned (an exception isn’t thrown). For a numeric argument the inverse of bin(N) is not performed by unbin(). + +**Returned value** + +- A binary string (BLOB). + +Type: [String](../../sql-reference/data-types/string.md). 
+ +**Example** + +Query: + +``` sql +SELECT UNBIN('001100000011000100110010'), UNBIN('0100110101111001010100110101000101001100'); +``` + +Result: + +``` text +┌─unbin('001100000011000100110010')─┬─unbin('0100110101111001010100110101000101001100')─┠+│ 012 │ MySQL │ +└───────────────────────────────────┴───────────────────────────────────────────────────┘ +``` + +Query: + +``` sql +SELECT reinterpretAsUInt64(reverse(unbin('1010'))) AS num; +``` + +Result: + +``` text +┌─num─┠+│ 10 │ +└─────┘ +``` + ## UUIDStringToNum(str) {#uuidstringtonumstr} Accepts a string containing 36 characters in the format `123e4567-e89b-12d3-a456-426655440000`, and returns it as a set of bytes in a FixedString(16). diff --git a/docs/en/sql-reference/functions/encryption-functions.md b/docs/en/sql-reference/functions/encryption-functions.md index df27685dcb3..8dc59c65904 100644 --- a/docs/en/sql-reference/functions/encryption-functions.md +++ b/docs/en/sql-reference/functions/encryption-functions.md @@ -9,7 +9,7 @@ These functions implement encryption and decryption of data with AES (Advanced Key length depends on encryption mode. It is 16, 24, and 32 bytes long for `-128-`, `-196-`, and `-256-` modes respectively. -Initialization vector length is always 16 bytes (bytes in excess of 16 are ignored). +Initialization vector length is always 16 bytes (bytes in excess of 16 are ignored). Note that these functions work slowly until ClickHouse 21.1. @@ -168,7 +168,7 @@ Result: ``` text Received exception from server (version 21.1.2): -Code: 36. DB::Exception: Received from localhost:9000. DB::Exception: Invalid key size: 33 expected 32: While processing encrypt('aes-256-cfb128', 'Secret', '123456789101213141516171819202122', 'iviviviviviviviv123'). +Code: 36. DB::Exception: Received from localhost:9000. DB::Exception: Invalid key size: 33 expected 32: While processing encrypt('aes-256-cfb128', 'Secret', '123456789101213141516171819202122', 'iviviviviviviviv123'). ``` While `aes_encrypt_mysql` produces MySQL-compatitalbe output: diff --git a/docs/en/sql-reference/functions/ext-dict-functions.md b/docs/en/sql-reference/functions/ext-dict-functions.md index d7f142dd8b1..54b72e77f01 100644 --- a/docs/en/sql-reference/functions/ext-dict-functions.md +++ b/docs/en/sql-reference/functions/ext-dict-functions.md @@ -87,7 +87,7 @@ SELECT dictGetOrDefault('ext-dict-test', 'c1', number + 1, toUInt32(number * 10)) AS val, toTypeName(val) AS type FROM system.numbers -LIMIT 3 +LIMIT 3; ``` ``` text diff --git a/docs/en/sql-reference/functions/functions-for-nulls.md b/docs/en/sql-reference/functions/functions-for-nulls.md index c06711b3cd2..29de9ee4b70 100644 --- a/docs/en/sql-reference/functions/functions-for-nulls.md +++ b/docs/en/sql-reference/functions/functions-for-nulls.md @@ -211,7 +211,7 @@ SELECT nullIf(1, 2); ## assumeNotNull {#assumenotnull} -Results in a value of type [Nullable](../../sql-reference/data-types/nullable.md) for a non- `Nullable`, if the value is not `NULL`. +Results in an equivalent non-`Nullable` value for a [Nullable](../../sql-reference/data-types/nullable.md) type. In case the original value is `NULL` the result is undetermined. See also `ifNull` and `coalesce` functions. 
``` sql assumeNotNull(x) diff --git a/docs/en/sql-reference/functions/geo/geohash.md b/docs/en/sql-reference/functions/geo/geohash.md index cfe35746809..5fbd286eba6 100644 --- a/docs/en/sql-reference/functions/geo/geohash.md +++ b/docs/en/sql-reference/functions/geo/geohash.md @@ -4,7 +4,7 @@ toc_title: Geohash # Functions for Working with Geohash {#geohash} -[Geohash](https://en.wikipedia.org/wiki/Geohash) is the geocode system, which subdivides Earth’s surface into buckets of grid shape and encodes each cell into a short string of letters and digits. It is a hierarchical data structure, so the longer is the geohash string, the more precise is the geographic location. +[Geohash](https://en.wikipedia.org/wiki/Geohash) is the geocode system, which subdivides Earth’s surface into buckets of grid shape and encodes each cell into a short string of letters and digits. It is a hierarchical data structure, so the longer is the geohash string, the more precise is the geographic location. If you need to manually convert geographic coordinates to geohash strings, you can use [geohash.org](http://geohash.org/). diff --git a/docs/en/sql-reference/functions/geo/h3.md b/docs/en/sql-reference/functions/geo/h3.md index 6c03f55cebe..e178603dc45 100644 --- a/docs/en/sql-reference/functions/geo/h3.md +++ b/docs/en/sql-reference/functions/geo/h3.md @@ -4,15 +4,15 @@ toc_title: H3 Indexes # Functions for Working with H3 Indexes {#h3index} -[H3](https://eng.uber.com/h3/) is a geographical indexing system where Earth’s surface divided into a grid of even hexagonal cells. This system is hierarchical, i. e. each hexagon on the top level ("parent") can be splitted into seven even but smaller ones ("children"), and so on. +[H3](https://eng.uber.com/h3/) is a geographical indexing system where Earth’s surface divided into a grid of even hexagonal cells. This system is hierarchical, i. e. each hexagon on the top level ("parent") can be splitted into seven even but smaller ones ("children"), and so on. -The level of the hierarchy is called `resolution` and can receive a value from `0` till `15`, where `0` is the `base` level with the largest and coarsest cells. +The level of the hierarchy is called `resolution` and can receive a value from `0` till `15`, where `0` is the `base` level with the largest and coarsest cells. A latitude and longitude pair can be transformed to a 64-bit H3 index, identifying a grid cell. The H3 index is used primarily for bucketing locations and other geospatial manipulations. -The full description of the H3 system is available at [the Uber Engeneering site](https://eng.uber.com/h3/). +The full description of the H3 system is available at [the Uber Engeneering site](https://eng.uber.com/h3/). ## h3IsValid {#h3isvalid} @@ -142,7 +142,7 @@ h3EdgeLengthM(resolution) **Example** -Query: +Query: ``` sql SELECT h3EdgeLengthM(15) as edgeLengthM; diff --git a/docs/en/sql-reference/functions/index.md b/docs/en/sql-reference/functions/index.md index 58e0994a11d..54afd461e1d 100644 --- a/docs/en/sql-reference/functions/index.md +++ b/docs/en/sql-reference/functions/index.md @@ -48,7 +48,7 @@ Functions can’t change the values of their arguments – any changes are retur Higher-order functions can only accept lambda functions as their functional argument. To pass a lambda function to a higher-order function use `->` operator. The left side of the arrow has a formal parameter, which is any ID, or multiple formal parameters – any IDs in a tuple. 
The right side of the arrow has an expression that can use these formal parameters, as well as any table columns. -Examples: +Examples: ``` x -> 2 * x diff --git a/docs/en/sql-reference/functions/ip-address-functions.md b/docs/en/sql-reference/functions/ip-address-functions.md index 137ebc2407d..469a66d460f 100644 --- a/docs/en/sql-reference/functions/ip-address-functions.md +++ b/docs/en/sql-reference/functions/ip-address-functions.md @@ -53,7 +53,7 @@ Since using ‘xxx’ is highly unusual, this may be changed in the future. We r ### IPv6NumToString(x) {#ipv6numtostringx} Accepts a FixedString(16) value containing the IPv6 address in binary format. Returns a string containing this address in text format. -IPv6-mapped IPv4 addresses are output in the format ::ffff:111.222.33.44. +IPv6-mapped IPv4 addresses are output in the format ::ffff:111.222.33.44. Alias: `INET6_NTOA`. @@ -123,7 +123,7 @@ LIMIT 10 ## IPv6StringToNum {#ipv6stringtonums} -The reverse function of [IPv6NumToString](#ipv6numtostringx). If the IPv6 address has an invalid format, it returns a string of null bytes. +The reverse function of [IPv6NumToString](#ipv6numtostringx). If the IPv6 address has an invalid format, it returns a string of null bytes. If the input string contains a valid IPv4 address, returns its IPv6 equivalent. HEX can be uppercase or lowercase. @@ -136,13 +136,13 @@ Alias: `INET6_ATON`. IPv6StringToNum(string) ``` -**Argument** +**Argument** - `string` — IP address. [String](../../sql-reference/data-types/string.md). **Returned value** -- IPv6 address in binary format. +- IPv6 address in binary format. Type: [FixedString(16)](../../sql-reference/data-types/fixedstring.md). @@ -280,7 +280,7 @@ toIPv6(string) **Returned value** -- IP address. +- IP address. Type: [IPv6](../../sql-reference/data-types/domains/ipv6.md). diff --git a/docs/en/sql-reference/functions/json-functions.md b/docs/en/sql-reference/functions/json-functions.md index 596ad17f07d..fc49d3a810d 100644 --- a/docs/en/sql-reference/functions/json-functions.md +++ b/docs/en/sql-reference/functions/json-functions.md @@ -325,7 +325,7 @@ toJSONString(value) **Returned value** -- JSON representation of the value. +- JSON representation of the value. Type: [String](../../sql-reference/data-types/string.md). diff --git a/docs/en/sql-reference/functions/logical-functions.md b/docs/en/sql-reference/functions/logical-functions.md index 9d451dfe2b5..965ed97f20c 100644 --- a/docs/en/sql-reference/functions/logical-functions.md +++ b/docs/en/sql-reference/functions/logical-functions.md @@ -21,7 +21,7 @@ and(val1, val2...) **Arguments** -- `val1, val2, ...` — List of at least two values. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) or [Nullable](../../sql-reference/data-types/nullable.md). +- `val1, val2, ...` — List of at least two values. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) or [Nullable](../../sql-reference/data-types/nullable.md). **Returned value** @@ -73,7 +73,7 @@ and(val1, val2...) **Arguments** -- `val1, val2, ...` — List of at least two values. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) or [Nullable](../../sql-reference/data-types/nullable.md). +- `val1, val2, ...` — List of at least two values. 
[Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) or [Nullable](../../sql-reference/data-types/nullable.md). **Returned value** @@ -125,7 +125,7 @@ not(val); **Arguments** -- `val` — The value. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) or [Nullable](../../sql-reference/data-types/nullable.md). +- `val` — The value. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) or [Nullable](../../sql-reference/data-types/nullable.md). **Returned value** @@ -163,11 +163,11 @@ xor(val1, val2...) **Arguments** -- `val1, val2, ...` — List of at least two values. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) or [Nullable](../../sql-reference/data-types/nullable.md). +- `val1, val2, ...` — List of at least two values. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) or [Nullable](../../sql-reference/data-types/nullable.md). **Returned value** -- `1`, for two values: if one of the values is zero and other is not. +- `1`, for two values: if one of the values is zero and other is not. - `0`, for two values: if both values are zero or non-zero at the same time. - `NULL`, if there is at least one `NULL` value. diff --git a/docs/en/sql-reference/functions/machine-learning-functions.md b/docs/en/sql-reference/functions/machine-learning-functions.md index 60dabd73781..4d9322526df 100644 --- a/docs/en/sql-reference/functions/machine-learning-functions.md +++ b/docs/en/sql-reference/functions/machine-learning-functions.md @@ -21,13 +21,13 @@ The [stochasticLogisticRegression](../../sql-reference/aggregate-functions/refer Compares test groups (variants) and calculates for each group the probability to be the best one. The first group is used as a control group. -**Syntax** +**Syntax** ``` sql bayesAB(distribution_name, higher_is_better, variant_names, x, y) ``` -**Arguments** +**Arguments** - `distribution_name` — Name of the probability distribution. [String](../../sql-reference/data-types/string.md). Possible values: diff --git a/docs/en/sql-reference/functions/nlp-functions.md b/docs/en/sql-reference/functions/nlp-functions.md new file mode 100644 index 00000000000..1654771574b --- /dev/null +++ b/docs/en/sql-reference/functions/nlp-functions.md @@ -0,0 +1,132 @@ +--- +toc_priority: 67 +toc_title: NLP +--- + +# [experimental] Natural Language Processing functions {#nlp-functions} + +!!! warning "Warning" + This is an experimental feature that is currently in development and is not ready for general use. It will change in unpredictable backwards-incompatible ways in future releases. Set `allow_experimental_nlp_functions = 1` to enable it. + +## stem {#stem} + +Performs stemming on a given word. + +**Syntax** + +``` sql +stem('language', word) +``` + +**Arguments** + +- `language` — Language which rules will be applied. Must be in lowercase. [String](../../sql-reference/data-types/string.md#string). +- `word` — word that needs to be stemmed. Must be in lowercase. [String](../../sql-reference/data-types/string.md#string). 
+ +**Examples** + +Query: + +``` sql +SELECT SELECT arrayMap(x -> stem('en', x), ['I', 'think', 'it', 'is', 'a', 'blessing', 'in', 'disguise']) as res; +``` + +Result: + +``` text +┌─res────────────────────────────────────────────────┠+│ ['I','think','it','is','a','bless','in','disguis'] │ +└────────────────────────────────────────────────────┘ +``` + +## lemmatize {#lemmatize} + +Performs lemmatization on a given word. Needs dictionaries to operate, which can be obtained [here](https://github.com/vpodpecan/lemmagen3/tree/master/src/lemmagen3/models). + +**Syntax** + +``` sql +lemmatize('language', word) +``` + +**Arguments** + +- `language` — Language which rules will be applied. [String](../../sql-reference/data-types/string.md#string). +- `word` — Word that needs to be lemmatized. Must be lowercase. [String](../../sql-reference/data-types/string.md#string). + +**Examples** + +Query: + +``` sql +SELECT lemmatize('en', 'wolves'); +``` + +Result: + +``` text +┌─lemmatize("wolves")─┠+│ "wolf" │ +└─────────────────────┘ +``` + +Configuration: +``` xml + + + en + en.bin + + +``` + +## synonyms {#synonyms} + +Finds synonyms to a given word. There are two types of synonym extensions: `plain` and `wordnet`. + +With the `plain` extension type we need to provide a path to a simple text file, where each line corresponds to a certain synonym set. Words in this line must be separated with space or tab characters. + +With the `wordnet` extension type we need to provide a path to a directory with WordNet thesaurus in it. Thesaurus must contain a WordNet sense index. + +**Syntax** + +``` sql +synonyms('extension_name', word) +``` + +**Arguments** + +- `extension_name` — Name of the extension in which search will be performed. [String](../../sql-reference/data-types/string.md#string). +- `word` — Word that will be searched in extension. [String](../../sql-reference/data-types/string.md#string). + +**Examples** + +Query: + +``` sql +SELECT synonyms('list', 'important'); +``` + +Result: + +``` text +┌─synonyms('list', 'important')────────────┠+│ ['important','big','critical','crucial'] │ +└──────────────────────────────────────────┘ +``` + +Configuration: +``` xml + + + en + plain + en.txt + + + en + wordnet + en/ + + +``` diff --git a/docs/en/sql-reference/functions/other-functions.md b/docs/en/sql-reference/functions/other-functions.md index 30e2e427158..9b1a07a9faa 100644 --- a/docs/en/sql-reference/functions/other-functions.md +++ b/docs/en/sql-reference/functions/other-functions.md @@ -2138,3 +2138,52 @@ Result: - [tcp_port](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-tcp_port) +## currentProfiles {#current-profiles} + +Returns a list of the current [settings profiles](../../operations/access-rights.md#settings-profiles-management) for the current user. + +The command [SET PROFILE](../../sql-reference/statements/set.md#query-set) could be used to change the current setting profile. If the command `SET PROFILE` was not used the function returns the profiles specified at the current user's definition (see [CREATE USER](../../sql-reference/statements/create/user.md#create-user-statement)). + +**Syntax** + +``` sql +currentProfiles() +``` + +**Returned value** + +- List of the current user settings profiles. + +Type: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)). + +## enabledProfiles {#enabled-profiles} + + Returns settings profiles, assigned to the current user both explicitly and implicitly. 
Explicitly assigned profiles are the same as returned by the [currentProfiles](#current-profiles) function. Implicitly assigned profiles include parent profiles of other assigned profiles, profiles assigned via granted roles, profiles assigned via their own settings, and the main default profile (see the `default_profile` section in the main server configuration file). + +**Syntax** + +``` sql +enabledProfiles() +``` + +**Returned value** + +- List of the enabled settings profiles. + +Type: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)). + +## defaultProfiles {#default-profiles} + +Returns all the profiles specified at the current user's definition (see [CREATE USER](../../sql-reference/statements/create/user.md#create-user-statement) statement). + +**Syntax** + +``` sql +defaultProfiles() +``` + +**Returned value** + +- List of the default settings profiles. + +Type: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)). diff --git a/docs/en/sql-reference/functions/splitting-merging-functions.md b/docs/en/sql-reference/functions/splitting-merging-functions.md index 2d384f1aa3c..718d5a977b9 100644 --- a/docs/en/sql-reference/functions/splitting-merging-functions.md +++ b/docs/en/sql-reference/functions/splitting-merging-functions.md @@ -145,6 +145,72 @@ Result: └────────────────────────────┘ ``` +## splitByWhitespace(s) {#splitbywhitespaceseparator-s} + +Splits a string into substrings separated by whitespace characters. +Returns an array of selected substrings. + +**Syntax** + +``` sql +splitByWhitespace(s) +``` + +**Arguments** + +- `s` — The string to split. [String](../../sql-reference/data-types/string.md). + +**Returned value(s)** + +Returns an array of selected substrings. + +Type: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)). + +**Example** + +``` sql +SELECT splitByWhitespace(' 1! a, b. '); +``` + +``` text +┌─splitByWhitespace(' 1! a, b. ')─┠+│ ['1!','a,','b.'] │ +└─────────────────────────────────────┘ +``` + +## splitByNonAlpha(s) {#splitbynonalphaseparator-s} + +Splits a string into substrings separated by whitespace and punctuation characters. +Returns an array of selected substrings. + +**Syntax** + +``` sql +splitByNonAlpha(s) +``` + +**Arguments** + +- `s` — The string to split. [String](../../sql-reference/data-types/string.md). + +**Returned value(s)** + +Returns an array of selected substrings. + +Type: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)). + +**Example** + +``` sql +SELECT splitByNonAlpha(' 1! a, b. '); +``` + +``` text +┌─splitByNonAlpha(' 1! a, b. ')─┠+│ ['1','a','b'] │ +└───────────────────────────────────┘ +``` + ## arrayStringConcat(arr\[, separator\]) {#arraystringconcatarr-separator} Concatenates the strings listed in the array with the separator.’separator’ is an optional parameter: a constant string, set to an empty string by default. @@ -170,13 +236,13 @@ SELECT alphaTokens('abca1abc'); Extracts all groups from non-overlapping substrings matched by a regular expression. -**Syntax** +**Syntax** ``` sql -extractAllGroups(text, regexp) +extractAllGroups(text, regexp) ``` -**Arguments** +**Arguments** - `text` — [String](../data-types/string.md) or [FixedString](../data-types/fixedstring.md). - `regexp` — Regular expression. Constant. [String](../data-types/string.md) or [FixedString](../data-types/fixedstring.md). 
diff --git a/docs/en/sql-reference/functions/string-functions.md b/docs/en/sql-reference/functions/string-functions.md index 5074f478bc0..8ec8aa7339d 100644 --- a/docs/en/sql-reference/functions/string-functions.md +++ b/docs/en/sql-reference/functions/string-functions.md @@ -13,13 +13,14 @@ toc_title: Strings Returns 1 for an empty string or 0 for a non-empty string. The result type is UInt8. A string is considered non-empty if it contains at least one byte, even if this is a space or a null byte. -The function also works for arrays. +The function also works for arrays or UUID. +UUID is empty if it is all zeros (nil UUID). ## notEmpty {#notempty} Returns 0 for an empty string or 1 for a non-empty string. The result type is UInt8. -The function also works for arrays. +The function also works for arrays or UUID. ## length {#length} @@ -503,7 +504,7 @@ Replaces literals, sequences of literals and complex aliases with placeholders. normalizeQuery(x) ``` -**Arguments** +**Arguments** - `x` — Sequence of characters. [String](../../sql-reference/data-types/string.md). @@ -533,13 +534,13 @@ Result: Returns identical 64bit hash values without the values of literals for similar queries. It helps to analyze query log. -**Syntax** +**Syntax** ``` sql normalizedQueryHash(x) ``` -**Arguments** +**Arguments** - `x` — Sequence of characters. [String](../../sql-reference/data-types/string.md). @@ -571,13 +572,13 @@ Escapes characters to place string into XML text node or attribute. The following five XML predefined entities will be replaced: `<`, `&`, `>`, `"`, `'`. -**Syntax** +**Syntax** ``` sql encodeXMLComponent(x) ``` -**Arguments** +**Arguments** - `x` — The sequence of characters. [String](../../sql-reference/data-types/string.md). @@ -640,7 +641,7 @@ SELECT decodeXMLComponent('< Σ >'); Result: ``` text -'foo' +'foo' < Σ > ``` @@ -682,7 +683,7 @@ extractTextFromHTML(x) **Arguments** -- `x` — input text. [String](../../sql-reference/data-types/string.md). +- `x` — input text. [String](../../sql-reference/data-types/string.md). **Returned value** diff --git a/docs/en/sql-reference/functions/string-search-functions.md b/docs/en/sql-reference/functions/string-search-functions.md index 551c4aee8f0..d3b4fc908cc 100644 --- a/docs/en/sql-reference/functions/string-search-functions.md +++ b/docs/en/sql-reference/functions/string-search-functions.md @@ -14,7 +14,7 @@ The search is case-sensitive by default in all these functions. There are separa Searches for the substring `needle` in the string `haystack`. -Returns the position (in bytes) of the found substring in the string, starting from 1. +Returns the position (in bytes) of the found substring in the string, starting from 1. For a case-insensitive search, use the function [positionCaseInsensitive](#positioncaseinsensitive). @@ -22,11 +22,11 @@ For a case-insensitive search, use the function [positionCaseInsensitive](#posit ``` sql position(haystack, needle[, start_pos]) -``` +``` ``` sql position(needle IN haystack) -``` +``` Alias: `locate(haystack, needle[, start_pos])`. @@ -399,27 +399,27 @@ Extracts all the fragments of a string using a regular expression. If ‘haystac ## extractAllGroupsHorizontal {#extractallgroups-horizontal} -Matches all groups of the `haystack` string using the `pattern` regular expression. Returns an array of arrays, where the first array includes all fragments matching the first group, the second array - matching the second group, etc. +Matches all groups of the `haystack` string using the `pattern` regular expression. 
Returns an array of arrays, where the first array includes all fragments matching the first group, the second array - matching the second group, etc. !!! note "Note" `extractAllGroupsHorizontal` function is slower than [extractAllGroupsVertical](#extractallgroups-vertical). -**Syntax** +**Syntax** ``` sql extractAllGroupsHorizontal(haystack, pattern) ``` -**Arguments** +**Arguments** - `haystack` — Input string. Type: [String](../../sql-reference/data-types/string.md). -- `pattern` — Regular expression with [re2 syntax](https://github.com/google/re2/wiki/Syntax). Must contain groups, each group enclosed in parentheses. If `pattern` contains no groups, an exception is thrown. Type: [String](../../sql-reference/data-types/string.md). +- `pattern` — Regular expression with [re2 syntax](https://github.com/google/re2/wiki/Syntax). Must contain groups, each group enclosed in parentheses. If `pattern` contains no groups, an exception is thrown. Type: [String](../../sql-reference/data-types/string.md). **Returned value** - Type: [Array](../../sql-reference/data-types/array.md). -If `haystack` does not match the `pattern` regex, an array of empty arrays is returned. +If `haystack` does not match the `pattern` regex, an array of empty arrays is returned. **Example** @@ -445,13 +445,13 @@ Result: Matches all groups of the `haystack` string using the `pattern` regular expression. Returns an array of arrays, where each array includes matching fragments from every group. Fragments are grouped in order of appearance in the `haystack`. -**Syntax** +**Syntax** ``` sql extractAllGroupsVertical(haystack, pattern) ``` -**Arguments** +**Arguments** - `haystack` — Input string. Type: [String](../../sql-reference/data-types/string.md). - `pattern` — Regular expression with [re2 syntax](https://github.com/google/re2/wiki/Syntax). Must contain groups, each group enclosed in parentheses. If `pattern` contains no groups, an exception is thrown. Type: [String](../../sql-reference/data-types/string.md). @@ -460,7 +460,7 @@ extractAllGroupsVertical(haystack, pattern) - Type: [Array](../../sql-reference/data-types/array.md). -If `haystack` does not match the `pattern` regex, an empty array is returned. +If `haystack` does not match the `pattern` regex, an empty array is returned. **Example** @@ -731,7 +731,7 @@ SELECT countSubstringsCaseInsensitiveUTF8(haystack, needle[, start_pos]) Type: [UInt64](../../sql-reference/data-types/int-uint.md). -**Examples** +**Examples** Query: diff --git a/docs/en/sql-reference/functions/tuple-map-functions.md b/docs/en/sql-reference/functions/tuple-map-functions.md index dcfa18e04bf..ef5f5814017 100644 --- a/docs/en/sql-reference/functions/tuple-map-functions.md +++ b/docs/en/sql-reference/functions/tuple-map-functions.md @@ -9,13 +9,13 @@ toc_title: Working with maps Arranges `key:value` pairs into [Map(key, value)](../../sql-reference/data-types/map.md) data type. -**Syntax** +**Syntax** -``` sql +```sql map(key1, value1[, key2, value2, ...]) ``` -**Arguments** +**Arguments** - `key` — The key part of the pair. [String](../../sql-reference/data-types/string.md) or [Integer](../../sql-reference/data-types/int-uint.md). - `value` — The value part of the pair. [String](../../sql-reference/data-types/string.md), [Integer](../../sql-reference/data-types/int-uint.md) or [Array](../../sql-reference/data-types/array.md). @@ -30,7 +30,7 @@ Type: [Map(key, value)](../../sql-reference/data-types/map.md). 
Query: -``` sql +```sql SELECT map('key1', number, 'key2', number * 2) FROM numbers(3); ``` @@ -46,7 +46,7 @@ Result: Query: -``` sql +```sql CREATE TABLE table_map (a Map(String, UInt64)) ENGINE = MergeTree() ORDER BY a; INSERT INTO table_map SELECT map('key1', number, 'key2', number * 2) FROM numbers(3); SELECT a['key2'] FROM table_map; @@ -54,7 +54,7 @@ SELECT a['key2'] FROM table_map; Result: -``` text +```text ┌─arrayElement(a, 'key2')─┠│ 0 │ │ 2 │ @@ -62,7 +62,7 @@ Result: └─────────────────────────┘ ``` -**See Also** +**See Also** - [Map(key, value)](../../sql-reference/data-types/map.md) data type @@ -72,7 +72,7 @@ Collect all the keys and sum corresponding values. **Syntax** -``` sql +```sql mapAdd(arg1, arg2 [, ...]) ``` @@ -88,13 +88,13 @@ Arguments are [maps](../../sql-reference/data-types/map.md) or [tuples](../../sq Query with a tuple map: -``` sql +```sql SELECT mapAdd(([toUInt8(1), 2], [1, 1]), ([toUInt8(1), 2], [1, 1])) as res, toTypeName(res) as type; ``` Result: -``` text +```text ┌─res───────────┬─type───────────────────────────────┠│ ([1,2],[2,2]) │ Tuple(Array(UInt8), Array(UInt64)) │ └───────────────┴────────────────────────────────────┘ @@ -102,30 +102,39 @@ Result: Query with `Map` type: -``` sql +```sql +SELECT mapAdd(map(1,1), map(1,1)); +``` + +Result: + +```text +┌─mapAdd(map(1, 1), map(1, 1))─┠+│ {1:2} │ +└──────────────────────────────┘ ``` ## mapSubtract {#function-mapsubtract} Collect all the keys and subtract corresponding values. -**Syntax** +**Syntax** -``` sql +```sql mapSubtract(Tuple(Array, Array), Tuple(Array, Array) [, ...]) ``` -**Arguments** +**Arguments** -Arguments are [tuples](../../sql-reference/data-types/tuple.md#tuplet1-t2) of two [arrays](../../sql-reference/data-types/array.md#data-type-array), where items in the first array represent keys, and the second array contains values for the each key. All key arrays should have same type, and all value arrays should contain items which are promote to the one type ([Int64](../../sql-reference/data-types/int-uint.md#int-ranges), [UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges) or [Float64](../../sql-reference/data-types/float.md#float32-float64)). The common promoted type is used as a type for the result array. +Arguments are [maps](../../sql-reference/data-types/map.md) or [tuples](../../sql-reference/data-types/tuple.md#tuplet1-t2) of two [arrays](../../sql-reference/data-types/array.md#data-type-array), where items in the first array represent keys, and the second array contains values for the each key. All key arrays should have same type, and all value arrays should contain items which are promote to the one type ([Int64](../../sql-reference/data-types/int-uint.md#int-ranges), [UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges) or [Float64](../../sql-reference/data-types/float.md#float32-float64)). The common promoted type is used as a type for the result array. **Returned value** -- Returns one [tuple](../../sql-reference/data-types/tuple.md#tuplet1-t2), where the first array contains the sorted keys and the second array contains values. +- Depending on the arguments returns one [map](../../sql-reference/data-types/map.md) or [tuple](../../sql-reference/data-types/tuple.md#tuplet1-t2), where the first array contains the sorted keys and the second array contains values. 
**Example** -Query: +Query with a tuple map: ```sql SELECT mapSubtract(([toUInt8(1), 2], [toInt32(1), 1]), ([toUInt8(1), 2], [toInt32(2), 1])) as res, toTypeName(res) as type; @@ -139,32 +148,54 @@ Result: └────────────────┴───────────────────────────────────┘ ``` +Query with `Map` type: + +```sql +SELECT mapSubtract(map(1,1), map(1,1)); +``` + +Result: + +```text +┌─mapSubtract(map(1, 1), map(1, 1))─┠+│ {1:0} │ +└───────────────────────────────────┘ +``` + ## mapPopulateSeries {#function-mappopulateseries} Fills missing keys in the maps (key and value array pair), where keys are integers. Also, it supports specifying the max key, which is used to extend the keys array. +Arguments are [maps](../../sql-reference/data-types/map.md) or two [arrays](../../sql-reference/data-types/array.md#data-type-array), where the first array represent keys, and the second array contains values for the each key. -**Syntax** +For array arguments the number of elements in `keys` and `values` must be the same for each row. -``` sql +**Syntax** + +```sql mapPopulateSeries(keys, values[, max]) +mapPopulateSeries(map[, max]) ``` -Generates a map, where keys are a series of numbers, from minimum to maximum keys (or `max` argument if it specified) taken from `keys` array with a step size of one, and corresponding values taken from `values` array. If the value is not specified for the key, then it uses the default value in the resulting map. For repeated keys, only the first value (in order of appearing) gets associated with the key. - -The number of elements in `keys` and `values` must be the same for each row. +Generates a map (a tuple with two arrays or a value of `Map` type, depending on the arguments), where keys are a series of numbers, from minimum to maximum keys (or `max` argument if it specified) taken from the map with a step size of one, and corresponding values. If the value is not specified for the key, then it uses the default value in the resulting map. For repeated keys, only the first value (in order of appearing) gets associated with the key. **Arguments** +Mapped arrays: + - `keys` — Array of keys. [Array](../../sql-reference/data-types/array.md#data-type-array)([Int](../../sql-reference/data-types/int-uint.md#uint-ranges)). - `values` — Array of values. [Array](../../sql-reference/data-types/array.md#data-type-array)([Int](../../sql-reference/data-types/int-uint.md#uint-ranges)). +or + +- `map` — Map with integer keys. [Map](../../sql-reference/data-types/map.md). + **Returned value** -- Returns a [tuple](../../sql-reference/data-types/tuple.md#tuplet1-t2) of two [arrays](../../sql-reference/data-types/array.md#data-type-array): keys in sorted order, and values the corresponding keys. +- Depending on the arguments returns a [map](../../sql-reference/data-types/map.md) or a [tuple](../../sql-reference/data-types/tuple.md#tuplet1-t2) of two [arrays](../../sql-reference/data-types/array.md#data-type-array): keys in sorted order, and values the corresponding keys. 
**Example** -Query: +Query with mapped arrays: ```sql select mapPopulateSeries([1,2,4], [11,22,44], 5) as res, toTypeName(res) as type; @@ -178,17 +209,31 @@ Result: └──────────────────────────────┴───────────────────────────────────┘ ``` +Query with `Map` type: + +```sql +SELECT mapPopulateSeries(map(1, 10, 5, 20), 6); +``` + +Result: + +```text +┌─mapPopulateSeries(map(1, 10, 5, 20), 6)─┠+│ {1:10,2:0,3:0,4:0,5:20,6:0} │ +└─────────────────────────────────────────┘ +``` + ## mapContains {#mapcontains} Determines whether the `map` contains the `key` parameter. **Syntax** -``` sql +```sql mapContains(map, key) ``` -**Parameters** +**Parameters** - `map` — Map. [Map](../../sql-reference/data-types/map.md). - `key` — Key. Type matches the type of keys of `map` parameter. diff --git a/docs/en/sql-reference/functions/type-conversion-functions.md b/docs/en/sql-reference/functions/type-conversion-functions.md index 661469e6901..efd28def688 100644 --- a/docs/en/sql-reference/functions/type-conversion-functions.md +++ b/docs/en/sql-reference/functions/type-conversion-functions.md @@ -373,7 +373,7 @@ This function accepts a number or date or date with time, and returns a FixedStr ## reinterpretAsUUID {#reinterpretasuuid} -Accepts 16 bytes string and returns UUID containing bytes representing the corresponding value in network byte order (big-endian). If the string isn't long enough, the function works as if the string is padded with the necessary number of null bytes to the end. If the string longer than 16 bytes, the extra bytes at the end are ignored. +Accepts 16 bytes string and returns UUID containing bytes representing the corresponding value in network byte order (big-endian). If the string isn't long enough, the function works as if the string is padded with the necessary number of null bytes to the end. If the string longer than 16 bytes, the extra bytes at the end are ignored. **Syntax** @@ -439,8 +439,8 @@ reinterpret(x, type) **Arguments** -- `x` — Any type. -- `type` — Destination type. [String](../../sql-reference/data-types/string.md). +- `x` — Any type. +- `type` — Destination type. [String](../../sql-reference/data-types/string.md). **Returned value** @@ -465,27 +465,29 @@ Result: ## CAST(x, T) {#type_conversion_function-cast} -Converts input value `x` to the `T` data type. Unlike to `reinterpret` function, type conversion is performed in a natural way. - -The syntax `CAST(x AS t)` is also supported. - -!!! note "Note" - If value `x` does not fit the bounds of type `T`, the function overflows. For example, `CAST(-1, 'UInt8')` returns `255`. +Converts an input value to the specified data type. Unlike the [reinterpret](#type_conversion_function-reinterpret) function, `CAST` tries to present the same value using the new data type. If the conversion can not be done then an exception is raised. +Several syntax variants are supported. **Syntax** ``` sql CAST(x, T) +CAST(x AS t) +x::t ``` **Arguments** -- `x` — Any type. -- `T` — Destination type. [String](../../sql-reference/data-types/string.md). +- `x` — A value to convert. May be of any type. +- `T` — The name of the target data type. [String](../../sql-reference/data-types/string.md). +- `t` — The target data type. **Returned value** -- Destination type value. +- Converted value. + +!!! note "Note" + If the input value does not fit the bounds of the target type, the result overflows. For example, `CAST(-1, 'UInt8')` returns `255`. 
**Examples** @@ -494,16 +496,16 @@ Query: ```sql SELECT CAST(toInt8(-1), 'UInt8') AS cast_int_to_uint, - CAST(toInt8(1), 'Float32') AS cast_int_to_float, - CAST('1', 'UInt32') AS cast_string_to_int; + CAST(1.5 AS Decimal(3,2)) AS cast_float_to_decimal, + '1'::Int32 AS cast_string_to_int; ``` Result: ``` -┌─cast_int_to_uint─┬─cast_int_to_float─┬─cast_string_to_int─┠-│ 255 │ 1 │ 1 │ -└──────────────────┴───────────────────┴────────────────────┘ +┌─cast_int_to_uint─┬─cast_float_to_decimal─┬─cast_string_to_int─┠+│ 255 │ 1.50 │ 1 │ +└──────────────────┴───────────────────────┴────────────────────┘ ``` Query: @@ -527,7 +529,7 @@ Result: Conversion to FixedString(N) only works for arguments of type [String](../../sql-reference/data-types/string.md) or [FixedString](../../sql-reference/data-types/fixedstring.md). -Type conversion to [Nullable](../../sql-reference/data-types/nullable.md) and back is supported. +Type conversion to [Nullable](../../sql-reference/data-types/nullable.md) and back is supported. **Example** @@ -567,7 +569,7 @@ Result: ## accurateCast(x, T) {#type_conversion_function-accurate-cast} -Converts `x` to the `T` data type. +Converts `x` to the `T` data type. The difference from [cast(x, T)](#type_conversion_function-cast) is that `accurateCast` does not allow overflow of numeric types during cast if type value `x` does not fit the bounds of type `T`. For example, `accurateCast(-1, 'UInt8')` throws an exception. @@ -1168,7 +1170,7 @@ Result: ## toUnixTimestamp64Nano {#tounixtimestamp64nano} -Converts a `DateTime64` to a `Int64` value with fixed sub-second precision. Input value is scaled up or down appropriately depending on it precision. +Converts a `DateTime64` to a `Int64` value with fixed sub-second precision. Input value is scaled up or down appropriately depending on it precision. !!! info "Note" The output value is a timestamp in UTC, not in the timezone of `DateTime64`. @@ -1204,7 +1206,7 @@ Result: └──────────────────────────────┘ ``` -Query: +Query: ``` sql WITH toDateTime64('2019-09-16 19:20:12.345678910', 6) AS dt64 diff --git a/docs/en/sql-reference/functions/url-functions.md b/docs/en/sql-reference/functions/url-functions.md index 397ae45ec71..ae2113a2b64 100644 --- a/docs/en/sql-reference/functions/url-functions.md +++ b/docs/en/sql-reference/functions/url-functions.md @@ -283,7 +283,7 @@ SELECT firstSignificantSubdomainCustom('bar.foo.there-is-no-such-domain', 'publi Result: -```text +```text ┌─firstSignificantSubdomainCustom('bar.foo.there-is-no-such-domain', 'public_suffix_list')─┠│ foo │ └──────────────────────────────────────────────────────────────────────────────────────────┘ diff --git a/docs/en/sql-reference/statements/alter/column.md b/docs/en/sql-reference/statements/alter/column.md index 2e7cd1be952..801690afbb2 100644 --- a/docs/en/sql-reference/statements/alter/column.md +++ b/docs/en/sql-reference/statements/alter/column.md @@ -20,12 +20,11 @@ The following actions are supported: - [ADD COLUMN](#alter_add-column) — Adds a new column to the table. - [DROP COLUMN](#alter_drop-column) — Deletes the column. -- [RENAME COLUMN](#alter_rename-column) — Renames the column. +- [RENAME COLUMN](#alter_rename-column) — Renames an existing column. - [CLEAR COLUMN](#alter_clear-column) — Resets column values. - [COMMENT COLUMN](#alter_comment-column) — Adds a text comment to the column. - [MODIFY COLUMN](#alter_modify-column) — Changes column’s type, default expression and TTL. - [MODIFY COLUMN REMOVE](#modify-remove) — Removes one of the column properties. 
-- [RENAME COLUMN](#alter_rename-column) — Renames an existing column. These actions are described in detail below. @@ -35,7 +34,7 @@ These actions are described in detail below. ADD COLUMN [IF NOT EXISTS] name [type] [default_expr] [codec] [AFTER name_after | FIRST] ``` -Adds a new column to the table with the specified `name`, `type`, [`codec`](../../../sql-reference/statements/create/table.md#codecs) and `default_expr` (see the section [Default expressions](../../../sql-reference/statements/create/table.md#create-default-values)). +Adds a new column to the table with the specified `name`, `type`, [`codec`](../create/table.md#codecs) and `default_expr` (see the section [Default expressions](../../../sql-reference/statements/create/table.md#create-default-values)). If the `IF NOT EXISTS` clause is included, the query won’t return an error if the column already exists. If you specify `AFTER name_after` (the name of another column), the column is added after the specified one in the list of table columns. If you want to add a column to the beginning of the table use the `FIRST` clause. Otherwise, the column is added to the end of the table. For a chain of actions, `name_after` can be the name of a column that is added in one of the previous actions. @@ -64,6 +63,7 @@ Added2 UInt32 ToDrop UInt32 Added3 UInt32 ``` + ## DROP COLUMN {#alter_drop-column} ``` sql @@ -91,7 +91,7 @@ RENAME COLUMN [IF EXISTS] name to new_name Renames the column `name` to `new_name`. If the `IF EXISTS` clause is specified, the query won’t return an error if the column does not exist. Since renaming does not involve the underlying data, the query is completed almost instantly. -**NOTE**: Columns specified in the key expression of the table (either with `ORDER BY` or `PRIMARY KEY`) cannot be renamed. Trying to change these columns will produce `SQL Error [524]`. +**NOTE**: Columns specified in the key expression of the table (either with `ORDER BY` or `PRIMARY KEY`) cannot be renamed. Trying to change these columns will produce `SQL Error [524]`. Example: @@ -118,7 +118,7 @@ ALTER TABLE visits CLEAR COLUMN browser IN PARTITION tuple() ## COMMENT COLUMN {#alter_comment-column} ``` sql -COMMENT COLUMN [IF EXISTS] name 'comment' +COMMENT COLUMN [IF EXISTS] name 'Text comment' ``` Adds a comment to the column. If the `IF EXISTS` clause is specified, the query won’t return an error if the column does not exist. @@ -136,7 +136,7 @@ ALTER TABLE visits COMMENT COLUMN browser 'The table shows the browser used for ## MODIFY COLUMN {#alter_modify-column} ``` sql -MODIFY COLUMN [IF EXISTS] name [type] [default_expr] [TTL] [AFTER name_after | FIRST] +MODIFY COLUMN [IF EXISTS] name [type] [default_expr] [codec] [TTL] [AFTER name_after | FIRST] ``` This query changes the `name` column properties: @@ -145,8 +145,12 @@ This query changes the `name` column properties: - Default expression +- Compression Codec + - TTL +For examples of columns compression CODECS modifying, see [Column Compression Codecs](../create/table.md#codecs). + For examples of columns TTL modifying, see [Column TTL](../../../engines/table-engines/mergetree-family/mergetree.md#mergetree-column-ttl). If the `IF EXISTS` clause is specified, the query won’t return an error if the column does not exist. 
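As a quick illustration of the codec and TTL properties listed above, a minimal hedged sketch (the `visits` table and `duration` column are placeholder names, not taken from this patch):

```sql
-- change the column type and switch it to a Delta + ZSTD compression codec chain
ALTER TABLE visits MODIFY COLUMN duration UInt32 CODEC(Delta, ZSTD);
```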
@@ -179,6 +183,8 @@ ALTER TABLE table_name MODIFY column_name REMOVE property; **Example** +Remove TTL: + ```sql ALTER TABLE table_with_ttl MODIFY COLUMN column_ttl REMOVE TTL; ``` @@ -187,22 +193,6 @@ ALTER TABLE table_with_ttl MODIFY COLUMN column_ttl REMOVE TTL; - [REMOVE TTL](ttl.md). -## RENAME COLUMN {#alter_rename-column} - -Renames an existing column. - -Syntax: - -```sql -ALTER TABLE table_name RENAME COLUMN column_name TO new_column_name -``` - -**Example** - -```sql -ALTER TABLE table_with_ttl RENAME COLUMN column_ttl TO column_ttl_new; -``` - ## Limitations {#alter-query-limitations} The `ALTER` query lets you create and delete separate elements (columns) in nested data structures, but not whole nested data structures. To add a nested data structure, you can add columns with a name like `name.nested_name` and the type `Array(T)`. A nested data structure is equivalent to multiple array columns with a name that has the same prefix before the dot. @@ -213,4 +203,4 @@ If the `ALTER` query is not sufficient to make the table changes you need, you c The `ALTER` query blocks all reads and writes for the table. In other words, if a long `SELECT` is running at the time of the `ALTER` query, the `ALTER` query will wait for it to complete. At the same time, all new queries to the same table will wait while this `ALTER` is running. -For tables that do not store data themselves (such as `Merge` and `Distributed`), `ALTER` just changes the table structure, and does not change the structure of subordinate tables. For example, when running ALTER for a `Distributed` table, you will also need to run `ALTER` for the tables on all remote servers. +For tables that do not store data themselves (such as [Merge](../../../sql-reference/statements/alter/index.md) and [Distributed](../../../sql-reference/statements/alter/index.md)), `ALTER` just changes the table structure, and does not change the structure of subordinate tables. For example, when running ALTER for a `Distributed` table, you will also need to run `ALTER` for the tables on all remote servers. diff --git a/docs/en/sql-reference/statements/alter/partition.md b/docs/en/sql-reference/statements/alter/partition.md index 090cbe93c54..e1a76d2c0ae 100644 --- a/docs/en/sql-reference/statements/alter/partition.md +++ b/docs/en/sql-reference/statements/alter/partition.md @@ -88,7 +88,7 @@ ALTER TABLE visits ATTACH PART 201901_2_2_0; Read more about setting the partition expression in a section [How to specify the partition expression](#alter-how-to-specify-part-expr). -This query is replicated. The replica-initiator checks whether there is data in the `detached` directory. +This query is replicated. The replica-initiator checks whether there is data in the `detached` directory. If data exists, the query checks its integrity. If everything is correct, the query adds the data to the table. If the non-initiator replica, receiving the attach command, finds the part with the correct checksums in its own `detached` folder, it attaches the data without fetching it from other replicas. diff --git a/docs/en/sql-reference/statements/alter/role.md b/docs/en/sql-reference/statements/alter/role.md index 1253b72542d..ea6d3c61820 100644 --- a/docs/en/sql-reference/statements/alter/role.md +++ b/docs/en/sql-reference/statements/alter/role.md @@ -10,7 +10,7 @@ Changes roles. 
Syntax: ``` sql -ALTER ROLE [IF EXISTS] name1 [ON CLUSTER cluster_name1] [RENAME TO new_name1] +ALTER ROLE [IF EXISTS] name1 [ON CLUSTER cluster_name1] [RENAME TO new_name1] [, name2 [ON CLUSTER cluster_name2] [RENAME TO new_name2] ...] [SETTINGS variable [= value] [MIN [=] min_value] [MAX [=] max_value] [READONLY|WRITABLE] | PROFILE 'profile_name'] [,...] ``` diff --git a/docs/en/sql-reference/statements/alter/row-policy.md b/docs/en/sql-reference/statements/alter/row-policy.md index 56967f11605..bbf9f317737 100644 --- a/docs/en/sql-reference/statements/alter/row-policy.md +++ b/docs/en/sql-reference/statements/alter/row-policy.md @@ -10,7 +10,7 @@ Changes row policy. Syntax: ``` sql -ALTER [ROW] POLICY [IF EXISTS] name1 [ON CLUSTER cluster_name1] ON [database1.]table1 [RENAME TO new_name1] +ALTER [ROW] POLICY [IF EXISTS] name1 [ON CLUSTER cluster_name1] ON [database1.]table1 [RENAME TO new_name1] [, name2 [ON CLUSTER cluster_name2] ON [database2.]table2 [RENAME TO new_name2] ...] [AS {PERMISSIVE | RESTRICTIVE}] [FOR SELECT] diff --git a/docs/en/sql-reference/statements/alter/setting.md b/docs/en/sql-reference/statements/alter/setting.md new file mode 100644 index 00000000000..90747bc1919 --- /dev/null +++ b/docs/en/sql-reference/statements/alter/setting.md @@ -0,0 +1,60 @@ +--- +toc_priority: 38 +toc_title: SETTING +--- + +# Table Settings Manipulations {#table_settings_manipulations} + +There is a set of queries to change table settings. You can modify settings or reset them to default values. A single query can change several settings at once. +If a setting with the specified name does not exist, then the query raises an exception. + +**Syntax** + +``` sql +ALTER TABLE [db].name [ON CLUSTER cluster] MODIFY|RESET SETTING ... +``` + +!!! note "Note" + These queries can be applied to [MergeTree](../../../engines/table-engines/mergetree-family/mergetree.md) tables only. + + +## MODIFY SETTING {#alter_modify_setting} + +Changes table settings. + +**Syntax** + +```sql +MODIFY SETTING setting_name=value [, ...] +``` + +**Example** + +```sql +CREATE TABLE example_table (id UInt32, data String) ENGINE=MergeTree() ORDER BY id; + +ALTER TABLE example_table MODIFY SETTING max_part_loading_threads=8, max_parts_in_total=50000; +``` + +## RESET SETTING {#alter_reset_setting} + +Resets table settings to their default values. If a setting is in a default state, then no action is taken. + +**Syntax** + +```sql +RESET SETTING setting_name [, ...] +``` + +**Example** + +```sql +CREATE TABLE example_table (id UInt32, data String) ENGINE=MergeTree() ORDER BY id + SETTINGS max_part_loading_threads=8; + +ALTER TABLE example_table RESET SETTING max_part_loading_threads; +``` + +**See Also** + +- [MergeTree settings](../../../operations/settings/merge-tree-settings.md) diff --git a/docs/en/sql-reference/statements/alter/user.md b/docs/en/sql-reference/statements/alter/user.md index 73081bc8619..4873982e2a1 100644 --- a/docs/en/sql-reference/statements/alter/user.md +++ b/docs/en/sql-reference/statements/alter/user.md @@ -10,7 +10,7 @@ Changes ClickHouse user accounts. Syntax: ``` sql -ALTER USER [IF EXISTS] name1 [ON CLUSTER cluster_name1] [RENAME TO new_name1] +ALTER USER [IF EXISTS] name1 [ON CLUSTER cluster_name1] [RENAME TO new_name1] [, name2 [ON CLUSTER cluster_name2] [RENAME TO new_name2] ...] 
[NOT IDENTIFIED | IDENTIFIED {[WITH {no_password | plaintext_password | sha256_password | sha256_hash | double_sha1_password | double_sha1_hash}] BY {'password' | 'hash'}} | {WITH ldap SERVER 'server_name'} | {WITH kerberos [REALM 'realm']}] [[ADD | DROP] HOST {LOCAL | NAME 'name' | REGEXP 'name_regexp' | IP 'address' | LIKE 'pattern'} [,...] | ANY | NONE] diff --git a/docs/en/sql-reference/statements/attach.md b/docs/en/sql-reference/statements/attach.md index bebba01980e..84165d30357 100644 --- a/docs/en/sql-reference/statements/attach.md +++ b/docs/en/sql-reference/statements/attach.md @@ -5,7 +5,7 @@ toc_title: ATTACH # ATTACH Statement {#attach} -Attaches the table, for example, when moving a database to another server. +Attaches the table, for example, when moving a database to another server. The query does not create data on the disk, but assumes that data is already in the appropriate places, and just adds information about the table to the server. After executing an `ATTACH` query, the server will know about the existence of the table. diff --git a/docs/en/sql-reference/statements/check-table.md b/docs/en/sql-reference/statements/check-table.md index d40fe263b1a..bc89b11ae4d 100644 --- a/docs/en/sql-reference/statements/check-table.md +++ b/docs/en/sql-reference/statements/check-table.md @@ -32,7 +32,7 @@ Engines from the `*Log` family do not provide automatic data recovery on failure ## Checking the MergeTree Family Tables {#checking-mergetree-tables} -For `MergeTree` family engines, if [check_query_single_value_result](../../operations/settings/settings.md#check_query_single_value_result) = 0, the `CHECK TABLE` query shows a check status for every individual data part of a table on the local server. +For `MergeTree` family engines, if [check_query_single_value_result](../../operations/settings/settings.md#check_query_single_value_result) = 0, the `CHECK TABLE` query shows a check status for every individual data part of a table on the local server. ```sql SET check_query_single_value_result = 0; diff --git a/docs/en/sql-reference/statements/create/quota.md b/docs/en/sql-reference/statements/create/quota.md index 0698d9bede5..767846ead52 100644 --- a/docs/en/sql-reference/statements/create/quota.md +++ b/docs/en/sql-reference/statements/create/quota.md @@ -24,7 +24,7 @@ Parameters `queries`, `query_selects`, `query_inserts`, `errors`, `result_rows`, `ON CLUSTER` clause allows creating quotas on a cluster, see [Distributed DDL](../../../sql-reference/distributed-ddl.md). -**Examples** +**Examples** Limit the maximum number of queries for the current user with 123 queries in 15 months constraint: diff --git a/docs/en/sql-reference/statements/create/row-policy.md b/docs/en/sql-reference/statements/create/row-policy.md index 1df7cc36995..3f88d794619 100644 --- a/docs/en/sql-reference/statements/create/row-policy.md +++ b/docs/en/sql-reference/statements/create/row-policy.md @@ -13,8 +13,8 @@ Creates a [row policy](../../../operations/access-rights.md#row-policy-managemen Syntax: ``` sql -CREATE [ROW] POLICY [IF NOT EXISTS | OR REPLACE] policy_name1 [ON CLUSTER cluster_name1] ON [db1.]table1 - [, policy_name2 [ON CLUSTER cluster_name2] ON [db2.]table2 ...] +CREATE [ROW] POLICY [IF NOT EXISTS | OR REPLACE] policy_name1 [ON CLUSTER cluster_name1] ON [db1.]table1 + [, policy_name2 [ON CLUSTER cluster_name2] ON [db2.]table2 ...] [FOR SELECT] USING condition [AS {PERMISSIVE | RESTRICTIVE}] [TO {role1 [, role2 ...] 
| ALL | ALL EXCEPT role1 [, role2 ...]}] @@ -32,11 +32,11 @@ Keyword `ALL` means all the ClickHouse users including current user. Keyword `AL !!! note "Note" If there are no row policies defined for a table then any user can `SELECT` all the row from the table. Defining one or more row policies for the table makes the access to the table depending on the row policies no matter if those row policies are defined for the current user or not. For example, the following policy - + `CREATE ROW POLICY pol1 ON mydb.table1 USING b=1 TO mira, peter` forbids the users `mira` and `peter` to see the rows with `b != 1`, and any non-mentioned user (e.g., the user `paul`) will see no rows from `mydb.table1` at all. - + If that's not desirable it can't be fixed by adding one more row policy, like the following: `CREATE ROW POLICY pol2 ON mydb.table1 USING 1 TO ALL EXCEPT mira, peter` diff --git a/docs/en/sql-reference/statements/create/settings-profile.md b/docs/en/sql-reference/statements/create/settings-profile.md index 0dab6a8512c..07bb54c9da3 100644 --- a/docs/en/sql-reference/statements/create/settings-profile.md +++ b/docs/en/sql-reference/statements/create/settings-profile.md @@ -10,7 +10,7 @@ Creates [settings profiles](../../../operations/access-rights.md#settings-profil Syntax: ``` sql -CREATE SETTINGS PROFILE [IF NOT EXISTS | OR REPLACE] TO name1 [ON CLUSTER cluster_name1] +CREATE SETTINGS PROFILE [IF NOT EXISTS | OR REPLACE] TO name1 [ON CLUSTER cluster_name1] [, name2 [ON CLUSTER cluster_name2] ...] [SETTINGS variable [= value] [MIN [=] min_value] [MAX [=] max_value] [READONLY|WRITABLE] | INHERIT 'profile_name'] [,...] ``` diff --git a/docs/en/sql-reference/statements/create/table.md b/docs/en/sql-reference/statements/create/table.md index 70ac9acd186..c20981b6bbf 100644 --- a/docs/en/sql-reference/statements/create/table.md +++ b/docs/en/sql-reference/statements/create/table.md @@ -189,7 +189,7 @@ CREATE TABLE codec_example dt Date CODEC(ZSTD), ts DateTime CODEC(LZ4HC), float_value Float32 CODEC(NONE), - double_value Float64 CODEC(LZ4HC(9)) + double_value Float64 CODEC(LZ4HC(9)), value Float32 CODEC(Delta, ZSTD) ) ENGINE = @@ -254,6 +254,20 @@ CREATE TABLE codec_example ENGINE = MergeTree() ``` +### Encryption Codecs {#create-query-encryption-codecs} + +These codecs don't actually compress data, but instead encrypt data on disk. These are only available when an encryption key is specified by [encryption](../../../operations/server-configuration-parameters/settings.md#server-settings-encryption) settings. Note that encryption only makes sense at the end of codec pipelines, because encrypted data usually can't be compressed in any meaningful way. + +Encryption codecs: + +- `Encrypted('AES-128-GCM-SIV')` — Encrypts data with AES-128 in [RFC 8452](https://tools.ietf.org/html/rfc8452) GCM-SIV mode. This codec uses a fixed nonce and encryption is therefore deterministic. This makes it compatible with deduplicating engines such as [ReplicatedMergeTree](../../../engines/table-engines/mergetree-family/replication.md) but has a weakness: when the same data block is encrypted twice, the resulting ciphertext will be exactly the same so an adversary who can read the disk can see this equivalence (although only the equivalence). + +!!! attention "Attention" + Most engines including the "*MergeTree" family create index files on disk without applying codecs. This means plaintext will appear on disk if an encrypted column is indexed. + +!!! 
attention "Attention" + If you perform a SELECT query mentioning a specific value in an encrypted column (such as in its WHERE clause), the value may appear in [system.query_log](../../../operations/system-tables/query_log.md). You may want to disable the logging. + ## Temporary Tables {#temporary-tables} ClickHouse supports temporary tables which have the following characteristics: @@ -361,7 +375,7 @@ You can add a comment to the table when you creating it. !!!note "Note" The comment is supported for all table engines except [Kafka](../../../engines/table-engines/integrations/kafka.md), [RabbitMQ](../../../engines/table-engines/integrations/rabbitmq.md) and [EmbeddedRocksDB](../../../engines/table-engines/integrations/embedded-rocksdb.md). - + **Syntax** @@ -373,7 +387,7 @@ CREATE TABLE db.table_name ENGINE = engine COMMENT 'Comment' ``` - + **Example** Query: diff --git a/docs/en/sql-reference/statements/create/user.md b/docs/en/sql-reference/statements/create/user.md index ad9f203b768..dfa065f5d0a 100644 --- a/docs/en/sql-reference/statements/create/user.md +++ b/docs/en/sql-reference/statements/create/user.md @@ -10,11 +10,12 @@ Creates [user accounts](../../../operations/access-rights.md#user-account-manage Syntax: ``` sql -CREATE USER [IF NOT EXISTS | OR REPLACE] name1 [ON CLUSTER cluster_name1] +CREATE USER [IF NOT EXISTS | OR REPLACE] name1 [ON CLUSTER cluster_name1] [, name2 [ON CLUSTER cluster_name2] ...] [NOT IDENTIFIED | IDENTIFIED {[WITH {no_password | plaintext_password | sha256_password | sha256_hash | double_sha1_password | double_sha1_hash}] BY {'password' | 'hash'}} | {WITH ldap SERVER 'server_name'} | {WITH kerberos [REALM 'realm']}] [HOST {LOCAL | NAME 'name' | REGEXP 'name_regexp' | IP 'address' | LIKE 'pattern'} [,...] | ANY | NONE] [DEFAULT ROLE role [,...]] + [DEFAULT DATABASE database | NONE] [GRANTEES {user | role | ANY | NONE} [,...] [EXCEPT {user | role} [,...]]] [SETTINGS variable [= value] [MIN [=] min_value] [MAX [=] max_value] [READONLY | WRITABLE] | PROFILE 'profile_name'] [,...] ``` @@ -54,7 +55,7 @@ Another way of specifying host is to use `@` syntax following the username. Exam !!! info "Warning" ClickHouse treats `user_name@'address'` as a username as a whole. Thus, technically you can create multiple users with the same `user_name` and different constructions after `@`. However, we do not recommend to do so. - + ## GRANTEES Clause {#grantees} Specifies users or roles which are allowed to receive [privileges](../../../sql-reference/statements/grant.md#grant-privileges) from this user on the condition this user has also all required access granted with [GRANT OPTION](../../../sql-reference/statements/grant.md#grant-privigele-syntax). Options of the `GRANTEES` clause: diff --git a/docs/en/sql-reference/statements/create/view.md b/docs/en/sql-reference/statements/create/view.md index 4b51bb8b067..9693f584761 100644 --- a/docs/en/sql-reference/statements/create/view.md +++ b/docs/en/sql-reference/statements/create/view.md @@ -77,7 +77,7 @@ CREATE LIVE VIEW [IF NOT EXISTS] [db.]table_name [WITH [TIMEOUT [value_in_sec] [ Live views store result of the corresponding [SELECT](../../../sql-reference/statements/select/index.md) query and are updated any time the result of the query changes. Query result as well as partial result needed to combine with new data are stored in memory providing increased performance for repeated queries. 
Live views can provide push notifications when query result changes using the [WATCH](../../../sql-reference/statements/watch.md) query. -Live views are triggered by insert into the innermost table specified in the query. +Live views are triggered by insert into the innermost table specified in the query. Live views work similarly to how a query in a distributed table works. But instead of combining partial results from different servers they combine partial result from current data with partial result from the new data. When a live view query includes a subquery then the cached partial result is only stored for the innermost subquery. @@ -166,7 +166,7 @@ You can force live view refresh using the `ALTER LIVE VIEW [db.]table_name REFRE ### WITH TIMEOUT Clause {#live-view-with-timeout} -When a live view is created with a `WITH TIMEOUT` clause then the live view will be dropped automatically after the specified number of seconds elapse since the end of the last [WATCH](../../../sql-reference/statements/watch.md) query that was watching the live view. +When a live view is created with a `WITH TIMEOUT` clause then the live view will be dropped automatically after the specified number of seconds elapse since the end of the last [WATCH](../../../sql-reference/statements/watch.md) query that was watching the live view. ```sql CREATE LIVE VIEW [db.]table_name WITH TIMEOUT [value_in_sec] AS SELECT ... @@ -210,7 +210,7 @@ WATCH lv └─────────────────────┴──────────┘ ``` -You can combine `WITH TIMEOUT` and `WITH REFRESH` clauses using an `AND` clause. +You can combine `WITH TIMEOUT` and `WITH REFRESH` clauses using an `AND` clause. ```sql CREATE LIVE VIEW [db.]table_name WITH TIMEOUT [value_in_sec] AND REFRESH [value_in_sec] AS SELECT ... @@ -229,7 +229,7 @@ WATCH lv ``` ``` -Code: 60. DB::Exception: Received from localhost:9000. DB::Exception: Table default.lv does not exist.. +Code: 60. DB::Exception: Received from localhost:9000. DB::Exception: Table default.lv does not exist.. ``` ### Usage {#live-view-usage} diff --git a/docs/en/sql-reference/statements/detach.md b/docs/en/sql-reference/statements/detach.md index a181dd8deee..049c5a5dad9 100644 --- a/docs/en/sql-reference/statements/detach.md +++ b/docs/en/sql-reference/statements/detach.md @@ -13,13 +13,13 @@ Syntax: DETACH TABLE|VIEW [IF EXISTS] [db.]name [ON CLUSTER cluster] [PERMANENTLY] ``` -Detaching does not delete the data or metadata for the table or materialized view. If the table or view was not detached `PERMANENTLY`, on the next server launch the server will read the metadata and recall the table/view again. If the table or view was detached `PERMANENTLY`, there will be no automatic recall. +Detaching does not delete the data or metadata for the table or materialized view. If the table or view was not detached `PERMANENTLY`, on the next server launch the server will read the metadata and recall the table/view again. If the table or view was detached `PERMANENTLY`, there will be no automatic recall. Whether the table was detached permanently or not, in both cases you can reattach it using the [ATTACH](../../sql-reference/statements/attach.md). System log tables can be also attached back (e.g. `query_log`, `text_log`, etc). Other system tables can't be reattached. On the next server launch the server will recall those tables again. `ATTACH MATERIALIZED VIEW` does not work with short syntax (without `SELECT`), but you can attach it using the `ATTACH TABLE` query. 
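A minimal sketch of that reattachment behaviour (the table name `test` is only a placeholder):

```sql
-- the table will not be recalled automatically after a server restart
DETACH TABLE test PERMANENTLY;

-- but it can still be reattached explicitly
ATTACH TABLE test;
```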
-Note that you can not detach permanently the table which is already detached (temporary). But you can attach it back and then detach permanently again. +Note that you can not permanently detach a table which is already detached (temporarily). But you can attach it back and then detach it permanently again. Also you can not [DROP](../../sql-reference/statements/drop.md#drop-table) the detached table, [CREATE TABLE](../../sql-reference/statements/create/table.md) with the same name as a permanently detached table, or replace it with another table using the [RENAME TABLE](../../sql-reference/statements/rename.md) query. diff --git a/docs/en/sql-reference/statements/explain.md b/docs/en/sql-reference/statements/explain.md index f22f92c625a..0ab2cedd73e 100644 --- a/docs/en/sql-reference/statements/explain.md +++ b/docs/en/sql-reference/statements/explain.md @@ -240,7 +240,7 @@ EXPLAIN json = 1, description = 0, header = 1 SELECT 1, 2 + dummy; } ] ``` - + With `indexes` = 1, the `Indexes` key is added. It contains an array of used indexes. Each index is described as JSON with `Type` key (a string `MinMax`, `Partition`, `PrimaryKey` or `Skip`) and optional keys: - `Name` — An index name (for now, is used only for `Skip` index). diff --git a/docs/en/sql-reference/statements/grant.md b/docs/en/sql-reference/statements/grant.md index 8ca2b25ce66..25dffc36954 100644 --- a/docs/en/sql-reference/statements/grant.md +++ b/docs/en/sql-reference/statements/grant.md @@ -13,7 +13,7 @@ To revoke privileges, use the [REVOKE](../../sql-reference/statements/revoke.md) ## Granting Privilege Syntax {#grant-privigele-syntax} ``` sql -GRANT [ON CLUSTER cluster_name] privilege[(column_name [,...])] [,...] ON {db.table|db.*|*.*|table|*} TO {user | role | CURRENT_USER} [,...] [WITH GRANT OPTION] +GRANT [ON CLUSTER cluster_name] privilege[(column_name [,...])] [,...] ON {db.table|db.*|*.*|table|*} TO {user | role | CURRENT_USER} [,...] [WITH GRANT OPTION] [WITH REPLACE OPTION] ``` - `privilege` — Type of privilege. @@ -21,17 +21,19 @@ GRANT [ON CLUSTER cluster_name] privilege[(column_name [,...])] [,...] ON {db.ta - `user` — ClickHouse user account. The `WITH GRANT OPTION` clause grants `user` or `role` with permission to execute the `GRANT` query. Users can grant privileges of the same scope they have and less. +The `WITH REPLACE OPTION` clause replaces old privileges with new privileges for the `user` or `role`; if it is not specified, the new privileges are appended to the existing ones. ## Assigning Role Syntax {#assign-role-syntax} ``` sql -GRANT [ON CLUSTER cluster_name] role [,...] TO {user | another_role | CURRENT_USER} [,...] [WITH ADMIN OPTION] +GRANT [ON CLUSTER cluster_name] role [,...] TO {user | another_role | CURRENT_USER} [,...] [WITH ADMIN OPTION] [WITH REPLACE OPTION] ``` - `role` — ClickHouse user role. - `user` — ClickHouse user account. The `WITH ADMIN OPTION` clause grants [ADMIN OPTION](#admin-option-privilege) privilege to `user` or `role`. +The `WITH REPLACE OPTION` clause replaces old roles with new roles for the `user` or `role`; if it is not specified, the new roles are appended to the existing ones. ## Usage {#grant-usage} diff --git a/docs/en/sql-reference/statements/insert-into.md b/docs/en/sql-reference/statements/insert-into.md index db10ddd47c6..31b4d30835f 100644 --- a/docs/en/sql-reference/statements/insert-into.md +++ b/docs/en/sql-reference/statements/insert-into.md @@ -105,7 +105,7 @@ However, you can delete old data using `ALTER TABLE ... DROP PARTITION`.
`FORMAT` clause must be specified in the end of query if `SELECT` clause contains table function [input()](../../sql-reference/table-functions/input.md). -To insert a default value instead of `NULL` into a column with not nullable data type, enable [insert_null_as_default](../../operations/settings/settings.md#insert_null_as_default) setting. +To insert a default value instead of `NULL` into a column with not nullable data type, enable [insert_null_as_default](../../operations/settings/settings.md#insert_null_as_default) setting. ### Performance Considerations {#performance-considerations} diff --git a/docs/en/sql-reference/statements/optimize.md b/docs/en/sql-reference/statements/optimize.md index 5eaf0558d7b..864509cec94 100644 --- a/docs/en/sql-reference/statements/optimize.md +++ b/docs/en/sql-reference/statements/optimize.md @@ -37,7 +37,7 @@ If you want to perform deduplication on custom set of columns rather than on all **Syntax** ``` sql -OPTIMIZE TABLE table DEDUPLICATE; -- all columns +OPTIMIZE TABLE table DEDUPLICATE; -- all columns OPTIMIZE TABLE table DEDUPLICATE BY *; -- excludes MATERIALIZED and ALIAS columns OPTIMIZE TABLE table DEDUPLICATE BY colX,colY,colZ; OPTIMIZE TABLE table DEDUPLICATE BY * EXCEPT colX; @@ -65,7 +65,7 @@ PARTITION BY partition_key ORDER BY (primary_key, secondary_key); ``` ``` sql -INSERT INTO example (primary_key, secondary_key, value, partition_key) +INSERT INTO example (primary_key, secondary_key, value, partition_key) VALUES (0, 0, 0, 0), (0, 0, 0, 0), (1, 1, 2, 2), (1, 1, 2, 3), (1, 1, 3, 3); ``` ``` sql diff --git a/docs/en/sql-reference/statements/select/group-by.md b/docs/en/sql-reference/statements/select/group-by.md index e6affc07b78..7c2d3a20f43 100644 --- a/docs/en/sql-reference/statements/select/group-by.md +++ b/docs/en/sql-reference/statements/select/group-by.md @@ -49,14 +49,14 @@ If you pass several keys to `GROUP BY`, the result will give you all the combina `WITH ROLLUP` modifier is used to calculate subtotals for the key expressions, based on their order in the `GROUP BY` list. The subtotals rows are added after the result table. -The subtotals are calculated in the reverse order: at first subtotals are calculated for the last key expression in the list, then for the previous one, and so on up to the first key expression. +The subtotals are calculated in the reverse order: at first subtotals are calculated for the last key expression in the list, then for the previous one, and so on up to the first key expression. In the subtotals rows the values of already "grouped" key expressions are set to `0` or empty line. !!! note "Note" Mind that [HAVING](../../../sql-reference/statements/select/having.md) clause can affect the subtotals results. -**Example** +**Example** Consider the table t: @@ -115,7 +115,7 @@ In the subtotals rows the values of all "grouped" key expressions are set to `0` !!! note "Note" Mind that [HAVING](../../../sql-reference/statements/select/having.md) clause can affect the subtotals results. 
-**Example** +**Example** Consider the table t: @@ -138,13 +138,13 @@ SELECT year, month, day, count(*) FROM t GROUP BY year, month, day WITH CUBE; As `GROUP BY` section has three key expressions, the result contains eight tables with subtotals for all key expression combinations: -- `GROUP BY year, month, day` -- `GROUP BY year, month` +- `GROUP BY year, month, day` +- `GROUP BY year, month` - `GROUP BY year, day` -- `GROUP BY year` -- `GROUP BY month, day` -- `GROUP BY month` -- `GROUP BY day` +- `GROUP BY year` +- `GROUP BY month, day` +- `GROUP BY month` +- `GROUP BY day` - and totals. Columns, excluded from `GROUP BY`, are filled with zeros. @@ -257,7 +257,7 @@ Aggregation is one of the most important features of a column-oriented DBMS, and ### GROUP BY Optimization Depending on Table Sorting Key {#aggregation-in-order} -The aggregation can be performed more effectively, if a table is sorted by some key, and `GROUP BY` expression contains at least prefix of sorting key or injective functions. In this case when a new key is read from table, the in-between result of aggregation can be finalized and sent to client. This behaviour is switched on by the [optimize_aggregation_in_order](../../../operations/settings/settings.md#optimize_aggregation_in_order) setting. Such optimization reduces memory usage during aggregation, but in some cases may slow down the query execution. +The aggregation can be performed more effectively, if a table is sorted by some key, and `GROUP BY` expression contains at least prefix of sorting key or injective functions. In this case when a new key is read from table, the in-between result of aggregation can be finalized and sent to client. This behaviour is switched on by the [optimize_aggregation_in_order](../../../operations/settings/settings.md#optimize_aggregation_in_order) setting. Such optimization reduces memory usage during aggregation, but in some cases may slow down the query execution. ### GROUP BY in External Memory {#select-group-by-in-external-memory} diff --git a/docs/en/sql-reference/statements/select/index.md b/docs/en/sql-reference/statements/select/index.md index 2f2ce943225..04273ca1d4d 100644 --- a/docs/en/sql-reference/statements/select/index.md +++ b/docs/en/sql-reference/statements/select/index.md @@ -170,7 +170,7 @@ You can use the following modifiers in `SELECT` queries. ### APPLY {#apply-modifier} -Allows you to invoke some function for each row returned by an outer table expression of a query. +Allows you to invoke some function for each row returned by an outer table expression of a query. **Syntax:** @@ -178,7 +178,7 @@ Allows you to invoke some function for each row returned by an outer table expre SELECT APPLY( ) FROM [db.]table_name ``` -**Example:** +**Example:** ``` sql CREATE TABLE columns_transformers (i Int64, j Int16, k Int64) ENGINE = MergeTree ORDER by (i); @@ -272,9 +272,9 @@ SELECT * REPLACE(i + 1 AS i) EXCEPT (j) APPLY(sum) from columns_transformers; ## SETTINGS in SELECT Query {#settings-in-select} -You can specify the necessary settings right in the `SELECT` query. The setting value is applied only to this query and is reset to default or previous value after the query is executed. +You can specify the necessary settings right in the `SELECT` query. The setting value is applied only to this query and is reset to default or previous value after the query is executed. -Other ways to make settings see [here](../../../operations/settings/index.md). 
+Other ways to make settings see [here](../../../operations/settings/index.md). **Example** diff --git a/docs/en/sql-reference/statements/select/join.md b/docs/en/sql-reference/statements/select/join.md index c90b4bf0eaa..0002e6db313 100644 --- a/docs/en/sql-reference/statements/select/join.md +++ b/docs/en/sql-reference/statements/select/join.md @@ -36,14 +36,23 @@ Additional join types available in ClickHouse: - `LEFT ANY JOIN`, `RIGHT ANY JOIN` and `INNER ANY JOIN`, partially (for opposite side of `LEFT` and `RIGHT`) or completely (for `INNER` and `FULL`) disables the cartesian product for standard `JOIN` types. - `ASOF JOIN` and `LEFT ASOF JOIN`, joining sequences with a non-exact match. `ASOF JOIN` usage is described below. -## Setting {#join-settings} +## Settings {#join-settings} -!!! note "Note" - The default join type can be overriden using [join_default_strictness](../../../operations/settings/settings.md#settings-join_default_strictness) setting. +The default join type can be overriden using [join_default_strictness](../../../operations/settings/settings.md#settings-join_default_strictness) setting. - Also the behavior of ClickHouse server for `ANY JOIN` operations depends on the [any_join_distinct_right_table_keys](../../../operations/settings/settings.md#any_join_distinct_right_table_keys) setting. +The behavior of ClickHouse server for `ANY JOIN` operations depends on the [any_join_distinct_right_table_keys](../../../operations/settings/settings.md#any_join_distinct_right_table_keys) setting. -### ASOF JOIN Usage {#asof-join-usage} +**See also** + +- [join_algorithm](../../../operations/settings/settings.md#settings-join_algorithm) +- [join_any_take_last_row](../../../operations/settings/settings.md#settings-join_any_take_last_row) +- [join_use_nulls](../../../operations/settings/settings.md#join_use_nulls) +- [partial_merge_join_optimizations](../../../operations/settings/settings.md#partial_merge_join_optimizations) +- [partial_merge_join_rows_in_right_blocks](../../../operations/settings/settings.md#partial_merge_join_rows_in_right_blocks) +- [join_on_disk_max_files_to_merge](../../../operations/settings/settings.md#join_on_disk_max_files_to_merge) +- [any_join_distinct_right_table_keys](../../../operations/settings/settings.md#any_join_distinct_right_table_keys) + +## ASOF JOIN Usage {#asof-join-usage} `ASOF JOIN` is useful when you need to join records that have no exact match. @@ -93,7 +102,7 @@ For example, consider the following tables: !!! note "Note" `ASOF` join is **not** supported in the [Join](../../../engines/table-engines/special/join.md) table engine. -## Distributed Join {#global-join} +## Distributed JOIN {#global-join} There are two ways to execute join involving distributed tables: @@ -102,6 +111,42 @@ There are two ways to execute join involving distributed tables: Be careful when using `GLOBAL`. For more information, see the [Distributed subqueries](../../../sql-reference/operators/in.md#select-distributed-subqueries) section. +## Implicit Type Conversion {#implicit-type-conversion} + +`INNER JOIN`, `LEFT JOIN`, `RIGHT JOIN`, and `FULL JOIN` queries support the implicit type conversion for "join keys". However the query can not be executed, if join keys from the left and the right tables cannot be converted to a single type (for example, there is no data type that can hold all values from both `UInt64` and `Int64`, or `String` and `Int32`). 
+ +**Example** + +Consider the table `t_1`: +```text +┌─a─┬─b─┬─toTypeName(a)─┬─toTypeName(b)─┠+│ 1 │ 1 │ UInt16 │ UInt8 │ +│ 2 │ 2 │ UInt16 │ UInt8 │ +└───┴───┴───────────────┴───────────────┘ +``` +and the table `t_2`: +```text +┌──a─┬────b─┬─toTypeName(a)─┬─toTypeName(b)───┠+│ -1 │ 1 │ Int16 │ Nullable(Int64) │ +│ 1 │ -1 │ Int16 │ Nullable(Int64) │ +│ 1 │ 1 │ Int16 │ Nullable(Int64) │ +└────┴──────┴───────────────┴─────────────────┘ +``` + +The query +```sql +SELECT a, b, toTypeName(a), toTypeName(b) FROM t_1 FULL JOIN t_2 USING (a, b); +``` +returns the set: +```text +┌──a─┬────b─┬─toTypeName(a)─┬─toTypeName(b)───┠+│ 1 │ 1 │ Int32 │ Nullable(Int64) │ +│ 2 │ 2 │ Int32 │ Nullable(Int64) │ +│ -1 │ 1 │ Int32 │ Nullable(Int64) │ +│ 1 │ -1 │ Int32 │ Nullable(Int64) │ +└────┴──────┴───────────────┴─────────────────┘ +``` + ## Usage Recommendations {#usage-recommendations} ### Processing of Empty or NULL Cells {#processing-of-empty-or-null-cells} @@ -139,9 +184,9 @@ If you need a `JOIN` for joining with dimension tables (these are relatively sma ### Memory Limitations {#memory-limitations} -By default, ClickHouse uses the [hash join](https://en.wikipedia.org/wiki/Hash_join) algorithm. ClickHouse takes the `` and creates a hash table for it in RAM. After some threshold of memory consumption, ClickHouse falls back to merge join algorithm. +By default, ClickHouse uses the [hash join](https://en.wikipedia.org/wiki/Hash_join) algorithm. ClickHouse takes the right_table and creates a hash table for it in RAM. If `join_algorithm = 'auto'` is enabled, then after some threshold of memory consumption, ClickHouse falls back to [merge](https://en.wikipedia.org/wiki/Sort-merge_join) join algorithm. For `JOIN` algorithms description see the [join_algorithm](../../../operations/settings/settings.md#settings-join_algorithm) setting. -If you need to restrict join operation memory consumption use the following settings: +If you need to restrict `JOIN` operation memory consumption use the following settings: - [max_rows_in_join](../../../operations/settings/query-complexity.md#settings-max_rows_in_join) — Limits number of rows in the hash table. - [max_bytes_in_join](../../../operations/settings/query-complexity.md#settings-max_bytes_in_join) — Limits size of the hash table. diff --git a/docs/en/sql-reference/statements/select/offset.md b/docs/en/sql-reference/statements/select/offset.md index 3efd916bcb8..20ebd972a24 100644 --- a/docs/en/sql-reference/statements/select/offset.md +++ b/docs/en/sql-reference/statements/select/offset.md @@ -32,10 +32,10 @@ The `WITH TIES` option is used to return any additional rows that tie for the la !!! note "Note" According to the standard, the `OFFSET` clause must come before the `FETCH` clause if both are present. - + !!! note "Note" The real offset can also depend on the [offset](../../../operations/settings/settings.md#offset) setting. - + ## Examples {#examples} Input table: diff --git a/docs/en/sql-reference/statements/select/order-by.md b/docs/en/sql-reference/statements/select/order-by.md index a8fec5cfa26..156f68935b5 100644 --- a/docs/en/sql-reference/statements/select/order-by.md +++ b/docs/en/sql-reference/statements/select/order-by.md @@ -250,8 +250,8 @@ External sorting works much less effectively than sorting in RAM. 
## Optimization of Data Reading {#optimize_read_in_order} - If `ORDER BY` expression has a prefix that coincides with the table sorting key, you can optimize the query by using the [optimize_read_in_order](../../../operations/settings/settings.md#optimize_read_in_order) setting. - + If `ORDER BY` expression has a prefix that coincides with the table sorting key, you can optimize the query by using the [optimize_read_in_order](../../../operations/settings/settings.md#optimize_read_in_order) setting. + When the `optimize_read_in_order` setting is enabled, the ClickHouse server uses the table index and reads the data in order of the `ORDER BY` key. This allows to avoid reading all data in case of specified [LIMIT](../../../sql-reference/statements/select/limit.md). So queries on big data with small limit are processed faster. Optimization works with both `ASC` and `DESC` and does not work together with [GROUP BY](../../../sql-reference/statements/select/group-by.md) clause and [FINAL](../../../sql-reference/statements/select/from.md#select-from-final) modifier. @@ -274,28 +274,28 @@ This modifier also can be combined with [LIMIT … WITH TIES modifier](../../../ `WITH FILL` modifier can be set after `ORDER BY expr` with optional `FROM expr`, `TO expr` and `STEP expr` parameters. All missed values of `expr` column will be filled sequentially and other columns will be filled as defaults. -Use following syntax for filling multiple columns add `WITH FILL` modifier with optional parameters after each field name in `ORDER BY` section. +To fill multiple columns, add `WITH FILL` modifier with optional parameters after each field name in `ORDER BY` section. ``` sql ORDER BY expr [WITH FILL] [FROM const_expr] [TO const_expr] [STEP const_numeric_expr], ... exprN [WITH FILL] [FROM expr] [TO expr] [STEP numeric_expr] ``` -`WITH FILL` can be applied only for fields with Numeric (all kind of float, decimal, int) or Date/DateTime types. +`WITH FILL` can be applied for fields with Numeric (all kinds of float, decimal, int) or Date/DateTime types. When applied for `String` fields, missed values are filled with empty strings. When `FROM const_expr` not defined sequence of filling use minimal `expr` field value from `ORDER BY`. When `TO const_expr` not defined sequence of filling use maximum `expr` field value from `ORDER BY`. When `STEP const_numeric_expr` defined then `const_numeric_expr` interprets `as is` for numeric types as `days` for Date type and as `seconds` for DateTime type. When `STEP const_numeric_expr` omitted then sequence of filling use `1.0` for numeric type, `1 day` for Date type and `1 second` for DateTime type. -For example, the following query +Example of a query without `WITH FILL`: ``` sql SELECT n, source FROM ( SELECT toFloat32(number % 10) AS n, 'original' AS source FROM numbers(10) WHERE number % 3 = 1 -) ORDER BY n +) ORDER BY n; ``` -returns +Result: ``` text ┌─n─┬─source───┠@@ -305,16 +305,16 @@ returns └───┴──────────┘ ``` -but after apply `WITH FILL` modifier +Same query after applying `WITH FILL` modifier: ``` sql SELECT n, source FROM ( SELECT toFloat32(number % 10) AS n, 'original' AS source FROM numbers(10) WHERE number % 3 = 1 -) ORDER BY n WITH FILL FROM 0 TO 5.51 STEP 0.5 +) ORDER BY n WITH FILL FROM 0 TO 5.51 STEP 0.5; ``` -returns +Result: ``` text ┌───n─┬─source───┠@@ -334,7 +334,7 @@ returns └─────┴──────────┘ ``` -For the case when we have multiple fields `ORDER BY field2 WITH FILL, field1 WITH FILL` order of filling will follow the order of fields in `ORDER BY` clause. 
+For the case with multiple fields, such as `ORDER BY field2 WITH FILL, field1 WITH FILL`, the order of filling follows the order of the fields in the `ORDER BY` clause. Example: @@ -350,7 +350,7 @@ ORDER BY d1 WITH FILL STEP 5; ``` -returns +Result: ``` text ┌───d1───────┬───d2───────┬─source───┐ @@ -364,9 +364,9 @@ returns └────────────┴────────────┴──────────┘ ``` -Field `d1` does not fill and use default value cause we do not have repeated values for `d2` value, and sequence for `d1` can’t be properly calculated. +Field `d1` is not filled and uses the default value, because there are no repeated values for `d2`, so the sequence for `d1` can’t be properly calculated. -The following query with a changed field in `ORDER BY` +The following query with the changed field in `ORDER BY`: ``` sql SELECT d1, d2, source FROM ( SELECT toDate(number * 86400) AS d1, toDate(number * 86400) AS d2, 'original' AS source FROM numbers(10) WHERE (number % 3) = 1 ) ORDER BY d2 WITH FILL; ``` -returns +Result: ``` text ┌───d1───────┬───d2───────┬─source───┐ diff --git a/docs/en/sql-reference/statements/select/prewhere.md b/docs/en/sql-reference/statements/select/prewhere.md index 663b84f2d48..ada8fff7012 100644 --- a/docs/en/sql-reference/statements/select/prewhere.md +++ b/docs/en/sql-reference/statements/select/prewhere.md @@ -17,8 +17,8 @@ A query may simultaneously specify `PREWHERE` and `WHERE`. In this case, `PREWHE If the `optimize_move_to_prewhere` setting is set to 0, heuristics to automatically move parts of expressions from `WHERE` to `PREWHERE` are disabled. !!! note "Attention" - The `PREWHERE` section is executed before` FINAL`, so the results of `FROM FINAL` queries may be skewed when using` PREWHERE` with fields not in the `ORDER BY` section of a table. - + The `PREWHERE` section is executed before `FINAL`, so the results of `FROM FINAL` queries may be skewed when using `PREWHERE` with fields not in the `ORDER BY` section of a table. + ## Limitations {#limitations} `PREWHERE` is only supported by tables from the `*MergeTree` family. diff --git a/docs/en/sql-reference/statements/select/with.md b/docs/en/sql-reference/statements/select/with.md index 0958f651847..2dca9650340 100644 --- a/docs/en/sql-reference/statements/select/with.md +++ b/docs/en/sql-reference/statements/select/with.md @@ -63,7 +63,7 @@ LIMIT 10; **Example 4:** Reusing expression in a subquery ``` sql -WITH test1 AS (SELECT i + 1, j + 1 FROM test1) +WITH test1 AS (SELECT i + 1, j + 1 FROM test1) SELECT * FROM test1; ``` diff --git a/docs/en/sql-reference/statements/show.md b/docs/en/sql-reference/statements/show.md index eaded449a33..b5df38642ad 100644 --- a/docs/en/sql-reference/statements/show.md +++ b/docs/en/sql-reference/statements/show.md @@ -297,7 +297,7 @@ Returns a list of [user account](../../operations/access-rights.md#user-account- ``` sql SHOW USERS ``` - + ## SHOW ROLES {#show-roles-statement} Returns a list of [roles](../../operations/access-rights.md#role-management). To view other parameters, see system tables [system.roles](../../operations/system-tables/roles.md#system_tables-roles) and [system.role-grants](../../operations/system-tables/role-grants.md#system_tables-role_grants). @@ -335,8 +335,8 @@ Returns a list of [quotas](../../operations/access-rights.md#quotas-management). ``` sql SHOW QUOTAS -``` - +``` + ## SHOW QUOTA {#show-quota-statement} Returns a [quota](../../operations/quotas.md) consumption for all users or for current user.
To view another parameters, see system tables [system.quotas_usage](../../operations/system-tables/quotas_usage.md#system_tables-quotas_usage) and [system.quota_usage](../../operations/system-tables/quota_usage.md#system_tables-quota_usage). diff --git a/docs/en/sql-reference/table-functions/dictionary.md b/docs/en/sql-reference/table-functions/dictionary.md index 675fcb5bfdd..ad30cb30adf 100644 --- a/docs/en/sql-reference/table-functions/dictionary.md +++ b/docs/en/sql-reference/table-functions/dictionary.md @@ -13,7 +13,7 @@ Displays the [dictionary](../../sql-reference/dictionaries/external-dictionaries dictionary('dict') ``` -**Arguments** +**Arguments** - `dict` — A dictionary name. [String](../../sql-reference/data-types/string.md). diff --git a/docs/en/sql-reference/table-functions/jdbc.md b/docs/en/sql-reference/table-functions/jdbc.md index 333700358de..4943e0291df 100644 --- a/docs/en/sql-reference/table-functions/jdbc.md +++ b/docs/en/sql-reference/table-functions/jdbc.md @@ -25,7 +25,7 @@ SELECT * FROM jdbc('mysql-dev?p1=233', 'num Int32', 'select toInt32OrZero(''{{p1 ``` ``` sql -SELECT * +SELECT * FROM jdbc('mysql-dev?p1=233', 'num Int32', 'select toInt32OrZero(''{{p1}}'') as num') ``` diff --git a/docs/en/sql-reference/table-functions/mysql.md b/docs/en/sql-reference/table-functions/mysql.md index a174786d4b7..627b81a3ddb 100644 --- a/docs/en/sql-reference/table-functions/mysql.md +++ b/docs/en/sql-reference/table-functions/mysql.md @@ -56,7 +56,7 @@ SELECT name FROM mysql(`mysql1:3306|mysql2:3306|mysql3:3306`, 'mysql_database', A table object with the same columns as the original MySQL table. !!! info "Note" - In the `INSERT` query to distinguish table function `mysql(...)` from table name with column names list, you must use keywords `FUNCTION` or `TABLE FUNCTION`. See examples below. + In the `INSERT` query to distinguish table function `mysql(...)` from table name with column names list, you must use keywords `FUNCTION` or `TABLE FUNCTION`. See examples below. **Examples** diff --git a/docs/en/sql-reference/table-functions/null.md b/docs/en/sql-reference/table-functions/null.md index 355a45a83e1..273091f8fd1 100644 --- a/docs/en/sql-reference/table-functions/null.md +++ b/docs/en/sql-reference/table-functions/null.md @@ -7,13 +7,13 @@ toc_title: null function Creates a temporary table of the specified structure with the [Null](../../engines/table-engines/special/null.md) table engine. According to the `Null`-engine properties, the table data is ignored and the table itself is immediately droped right after the query execution. The function is used for the convenience of test writing and demonstrations. -**Syntax** +**Syntax** ``` sql null('structure') ``` -**Parameter** +**Parameter** - `structure` — A list of columns and column types. [String](../../sql-reference/data-types/string.md). @@ -36,7 +36,7 @@ INSERT INTO t SELECT * FROM numbers_mt(1000000000); DROP TABLE IF EXISTS t; ``` -See also: +See also: - [Null table engine](../../engines/table-engines/special/null.md) diff --git a/docs/en/sql-reference/table-functions/postgresql.md b/docs/en/sql-reference/table-functions/postgresql.md index 7ef664de269..85c9366daf9 100644 --- a/docs/en/sql-reference/table-functions/postgresql.md +++ b/docs/en/sql-reference/table-functions/postgresql.md @@ -43,7 +43,7 @@ PostgreSQL Array types converts into ClickHouse arrays. !!! 
info "Note" Be careful, in PostgreSQL an array data type column like Integer[] may contain arrays of different dimensions in different rows, but in ClickHouse it is only allowed to have multidimensional arrays of the same dimension in all rows. - + Supports multiple replicas that must be listed by `|`. For example: ```sql diff --git a/docs/en/sql-reference/table-functions/remote.md b/docs/en/sql-reference/table-functions/remote.md index ae399c7e612..9effbb03553 100644 --- a/docs/en/sql-reference/table-functions/remote.md +++ b/docs/en/sql-reference/table-functions/remote.md @@ -20,10 +20,10 @@ remoteSecure('addresses_expr', db.table[, 'user'[, 'password'], sharding_key]) **Parameters** -- `addresses_expr` — An expression that generates addresses of remote servers. This may be just one server address. The server address is `host:port`, or just `host`. - - The host can be specified as the server name, or as the IPv4 or IPv6 address. An IPv6 address is specified in square brackets. - +- `addresses_expr` — An expression that generates addresses of remote servers. This may be just one server address. The server address is `host:port`, or just `host`. + + The host can be specified as the server name, or as the IPv4 or IPv6 address. An IPv6 address is specified in square brackets. + The port is the TCP port on the remote server. If the port is omitted, it uses [tcp_port](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-tcp_port) from the server’s config file in `remote` (by default, 9000) and [tcp_port_secure](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-tcp_port_secure) in `remoteSecure` (by default, 9440). The port is required for an IPv6 address. @@ -68,28 +68,6 @@ Multiple addresses can be comma-separated. In this case, ClickHouse will use dis example01-01-1,example01-02-1 ``` -Part of the expression can be specified in curly brackets. The previous example can be written as follows: - -``` text -example01-0{1,2}-1 -``` - -Curly brackets can contain a range of numbers separated by two dots (non-negative integers). In this case, the range is expanded to a set of values that generate shard addresses. If the first number starts with zero, the values are formed with the same zero alignment. The previous example can be written as follows: - -``` text -example01-{01..02}-1 -``` - -If you have multiple pairs of curly brackets, it generates the direct product of the corresponding sets. - -Addresses and parts of addresses in curly brackets can be separated by the pipe symbol (\|). In this case, the corresponding sets of addresses are interpreted as replicas, and the query will be sent to the first healthy replica. However, the replicas are iterated in the order currently set in the [load_balancing](../../operations/settings/settings.md#settings-load_balancing) setting. This example specifies two shards that each have two replicas: - -``` text -example01-{01..02}-{1|2} -``` - -The number of addresses generated is limited by a constant. Right now this is 1000 addresses. - **Examples** Selecting data from a remote server: @@ -106,4 +84,15 @@ INSERT INTO FUNCTION remote('127.0.0.1', currentDatabase(), 'remote_table') VALU SELECT * FROM remote_table; ``` -[Original article](https://clickhouse.tech/docs/en/sql-reference/table-functions/remote/) +## Globs in Addresses {globs-in-addresses} + +Patterns in curly brackets `{ }` are used to generate a set of shards and to specify replicas. 
If there are multiple pairs of curly brackets, then the direct product of the corresponding sets is generated.
+The following pattern types are supported:
+
+- {*a*,*b*} - Any number of variants separated by a comma. The pattern is replaced with *a* in the first shard address, with *b* in the second shard address, and so on. For instance, `example0{1,2}-1` generates addresses `example01-1` and `example02-1`.
+- {*n*..*m*} - A range of numbers. This pattern generates shard addresses with incrementing indices from *n* to *m*. `example0{1..2}-1` generates `example01-1` and `example02-1`.
+- {*0n*..*0m*} - A range of numbers with leading zeroes. This modification preserves leading zeroes in indices. The pattern `example{01..03}-1` generates `example01-1`, `example02-1` and `example03-1`.
+- {*a*|*b*} - Any number of variants separated by a `|`. The pattern specifies replicas. For instance, `example01-{1|2}` generates replicas `example01-1` and `example01-2`.
+
+The query will be sent to the first healthy replica. However, for `remote` the replicas are iterated in the order currently set in the [load_balancing](../../operations/settings/settings.md#settings-load_balancing) setting.
+The number of generated addresses is limited by the [table_function_remote_max_addresses](../../operations/settings/settings.md#table_function_remote_max_addresses) setting.
diff --git a/docs/en/sql-reference/table-functions/url.md b/docs/en/sql-reference/table-functions/url.md
index 2192b69d006..bfad7a67e0d 100644
--- a/docs/en/sql-reference/table-functions/url.md
+++ b/docs/en/sql-reference/table-functions/url.md
@@ -41,4 +41,7 @@ INSERT INTO FUNCTION url('http://127.0.0.1:8123/?query=INSERT+INTO+test_table+FO
 SELECT * FROM test_table;
 ```
 
-[Original article](https://clickhouse.tech/docs/en/sql-reference/table-functions/url/)
+## Globs in URL {#globs-in-url}
+
+Patterns in curly brackets `{ }` are used to generate a set of shards or to specify failover addresses. See the description of the [remote](remote.md#globs-in-addresses) function for the supported pattern types and examples.
+The `|` character inside patterns is used to specify failover addresses. They are iterated in the same order as listed in the pattern. The number of generated addresses is limited by the [glob_expansion_max_elements](../../operations/settings/settings.md#glob_expansion_max_elements) setting.
diff --git a/docs/en/sql-reference/window-functions/index.md b/docs/en/sql-reference/window-functions/index.md
index dcab019c9d5..e62808a46bd 100644
--- a/docs/en/sql-reference/window-functions/index.md
+++ b/docs/en/sql-reference/window-functions/index.md
@@ -5,9 +5,6 @@ toc_title: Window Functions
 
 # [experimental] Window Functions
 
-!!! warning "Warning"
-    This is an experimental feature that is currently in development and is not ready for general use. It will change in unpredictable backwards-incompatible ways in the future releases. Set `allow_experimental_window_functions = 1` to enable it.
-
 ClickHouse supports the standard grammar for defining windows and window functions.
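As an illustration of that grammar, a typical windowed aggregate might look like the following sketch (the table and column names are hypothetical, not taken from this patch):

``` sql
SELECT
    id,
    value,
    sum(value) OVER (PARTITION BY id ORDER BY value) AS running_total
FROM some_table;
```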
The following features are currently supported: | Feature | Support or workaround | diff --git a/docs/en/whats-new/changelog/2020.md b/docs/en/whats-new/changelog/2020.md index 7fb1f5d9377..b54fbacd2b0 100644 --- a/docs/en/whats-new/changelog/2020.md +++ b/docs/en/whats-new/changelog/2020.md @@ -398,18 +398,18 @@ toc_title: '2020' ## ClickHouse release 20.10 ### ClickHouse release v20.10.7.4-stable, 2020-12-24 - -#### Bug Fix - -* Fixed issue when `clickhouse-odbc-bridge` process is unreachable by server on machines with dual `IPv4/IPv6` stack and fixed issue when ODBC dictionary updates are performed using malformed queries and/or cause crashes. This possibly closes [#14489](https://github.com/ClickHouse/ClickHouse/issues/14489). [#18278](https://github.com/ClickHouse/ClickHouse/pull/18278) ([Denis Glazachev](https://github.com/traceon)). -* Fix key comparison between Enum and Int types. This fixes [#17989](https://github.com/ClickHouse/ClickHouse/issues/17989). [#18214](https://github.com/ClickHouse/ClickHouse/pull/18214) ([Amos Bird](https://github.com/amosbird)). + +#### Bug Fix + +* Fixed issue when `clickhouse-odbc-bridge` process is unreachable by server on machines with dual `IPv4/IPv6` stack and fixed issue when ODBC dictionary updates are performed using malformed queries and/or cause crashes. This possibly closes [#14489](https://github.com/ClickHouse/ClickHouse/issues/14489). [#18278](https://github.com/ClickHouse/ClickHouse/pull/18278) ([Denis Glazachev](https://github.com/traceon)). +* Fix key comparison between Enum and Int types. This fixes [#17989](https://github.com/ClickHouse/ClickHouse/issues/17989). [#18214](https://github.com/ClickHouse/ClickHouse/pull/18214) ([Amos Bird](https://github.com/amosbird)). * Fixed unique key convert crash in `MaterializeMySQL` database engine. This fixes [#18186](https://github.com/ClickHouse/ClickHouse/issues/18186) and fixes [#16372](https://github.com/ClickHouse/ClickHouse/issues/16372) [#18211](https://github.com/ClickHouse/ClickHouse/pull/18211) ([Winter Zhang](https://github.com/zhang2014)). * Fixed `std::out_of_range: basic_string` in S3 URL parsing. [#18059](https://github.com/ClickHouse/ClickHouse/pull/18059) ([Vladimir Chebotarev](https://github.com/excitoon)). -* Fixed the issue when some tables not synchronized to ClickHouse from MySQL caused by the fact that convertion MySQL prefix index wasn't supported for MaterializeMySQL. This fixes [#15187](https://github.com/ClickHouse/ClickHouse/issues/15187) and fixes [#17912](https://github.com/ClickHouse/ClickHouse/issues/17912) [#17944](https://github.com/ClickHouse/ClickHouse/pull/17944) ([Winter Zhang](https://github.com/zhang2014)). -* Fix possible segfault in `topK` aggregate function. This closes [#17404](https://github.com/ClickHouse/ClickHouse/issues/17404). [#17845](https://github.com/ClickHouse/ClickHouse/pull/17845) ([Maksim Kita](https://github.com/kitaisreal)). -* Do not restore parts from `WAL` if `in_memory_parts_enable_wal` is disabled. [#17802](https://github.com/ClickHouse/ClickHouse/pull/17802) ([detailyang](https://github.com/detailyang)). -* Fixed problem when ClickHouse fails to resume connection to MySQL servers. [#17681](https://github.com/ClickHouse/ClickHouse/pull/17681) ([Alexander Kazakov](https://github.com/Akazz)). -* Fixed empty `system.stack_trace` table when server is running in daemon mode. [#17630](https://github.com/ClickHouse/ClickHouse/pull/17630) ([Amos Bird](https://github.com/amosbird)). 
+* Fixed the issue when some tables not synchronized to ClickHouse from MySQL caused by the fact that convertion MySQL prefix index wasn't supported for MaterializeMySQL. This fixes [#15187](https://github.com/ClickHouse/ClickHouse/issues/15187) and fixes [#17912](https://github.com/ClickHouse/ClickHouse/issues/17912) [#17944](https://github.com/ClickHouse/ClickHouse/pull/17944) ([Winter Zhang](https://github.com/zhang2014)). +* Fix possible segfault in `topK` aggregate function. This closes [#17404](https://github.com/ClickHouse/ClickHouse/issues/17404). [#17845](https://github.com/ClickHouse/ClickHouse/pull/17845) ([Maksim Kita](https://github.com/kitaisreal)). +* Do not restore parts from `WAL` if `in_memory_parts_enable_wal` is disabled. [#17802](https://github.com/ClickHouse/ClickHouse/pull/17802) ([detailyang](https://github.com/detailyang)). +* Fixed problem when ClickHouse fails to resume connection to MySQL servers. [#17681](https://github.com/ClickHouse/ClickHouse/pull/17681) ([Alexander Kazakov](https://github.com/Akazz)). +* Fixed empty `system.stack_trace` table when server is running in daemon mode. [#17630](https://github.com/ClickHouse/ClickHouse/pull/17630) ([Amos Bird](https://github.com/amosbird)). * Fixed the behaviour when `clickhouse-client` is used in interactive mode with multiline queries and single line comment was erronously extended till the end of query. This fixes [#13654](https://github.com/ClickHouse/ClickHouse/issues/13654). [#17565](https://github.com/ClickHouse/ClickHouse/pull/17565) ([alexey-milovidov](https://github.com/alexey-milovidov)). * Fixed the issue when server can stop accepting connections in very rare cases. [#17542](https://github.com/ClickHouse/ClickHouse/pull/17542) ([alexey-milovidov](https://github.com/alexey-milovidov)). * Fixed `ALTER` query hang when the corresponding mutation was killed on the different replica. This fixes [#16953](https://github.com/ClickHouse/ClickHouse/issues/16953). [#17499](https://github.com/ClickHouse/ClickHouse/pull/17499) ([alesapin](https://github.com/alesapin)). diff --git a/docs/en/whats-new/security-changelog.md b/docs/en/whats-new/security-changelog.md index bebc9a6035f..97cad9965fd 100644 --- a/docs/en/whats-new/security-changelog.md +++ b/docs/en/whats-new/security-changelog.md @@ -3,6 +3,16 @@ toc_priority: 76 toc_title: Security Changelog --- +## Fixed in ClickHouse 21.4.3.21, 2021-04-12 {#fixed-in-clickhouse-release-21-4-3-21-2021-04-12} + +### CVE-2021-25263 {#cve-2021-25263} + +An attacker that has CREATE DICTIONARY privilege, can read arbitary file outside permitted directory. + +Fix has been pushed to versions 20.8.18.32-lts, 21.1.9.41-stable, 21.2.9.41-stable, 21.3.6.55-lts, 21.4.3.21-stable and later. 
+ +Credits: [Vyacheslav Egoshin](https://twitter.com/vegoshin) + ## Fixed in ClickHouse Release 19.14.3.3, 2019-09-10 {#fixed-in-clickhouse-release-19-14-3-3-2019-09-10} ### CVE-2019-15024 {#cve-2019-15024} diff --git a/docs/ja/interfaces/third-party/integrations.md b/docs/ja/interfaces/third-party/integrations.md index 851eeec665f..1cf996a3011 100644 --- a/docs/ja/interfaces/third-party/integrations.md +++ b/docs/ja/interfaces/third-party/integrations.md @@ -45,7 +45,7 @@ toc_title: "\u7D71\u5408" - 監視 - [黒鉛](https://graphiteapp.org) - [グラファウス](https://github.com/yandex/graphouse) - - [カーボンクリックãƒã‚¦ã‚¹](https://github.com/lomik/carbon-clickhouse) + + - [カーボンクリックãƒã‚¦ã‚¹](https://github.com/lomik/carbon-clickhouse) - [グラファイト-クリック](https://github.com/lomik/graphite-clickhouse) - [黒鉛-ch-オプティマイザー](https://github.com/innogames/graphite-ch-optimizer) -staled仕切りを最大é™ã«æ´»ç”¨ã™ã‚‹ [\*GraphiteMergeTree](../../engines/table-engines/mergetree-family/graphitemergetree.md#graphitemergetree) ã‹ã‚‰ã®ãƒ«ãƒ¼ãƒ«ã®å ´åˆ [ロールアップ構æˆ](../../engines/table-engines/mergetree-family/graphitemergetree.md#rollup-configuration) 応用ã§ãã¾ã™ - [グラファナ](https://grafana.com/) @@ -103,7 +103,7 @@ toc_title: "\u7D71\u5408" - Ruby - [Ruby on rails](https://rubyonrails.org/) - [activecube](https://github.com/bitquery/activecube) - - [ActiveRecord](https://github.com/PNixx/clickhouse-activerecord) + - [ActiveRecord](https://github.com/PNixx/clickhouse-activerecord) - [GraphQL](https://github.com/graphql) - [activecube-graphql](https://github.com/bitquery/activecube-graphql) diff --git a/docs/ja/operations/server-configuration-parameters/settings.md b/docs/ja/operations/server-configuration-parameters/settings.md index 0ec71b2af69..f544a92e377 100644 --- a/docs/ja/operations/server-configuration-parameters/settings.md +++ b/docs/ja/operations/server-configuration-parameters/settings.md @@ -464,7 +464,7 @@ SSLã®ã‚µãƒãƒ¼ãƒˆã¯ä»¥ä¸‹ã«ã‚ˆã£ã¦æä¾›ã•ã‚Œã¾ã™ `libpoco` 図書館 - extendedVerification – Automatically extended verification of certificates after the session ends. Acceptable values: `true`, `false`. - requireTLSv1 – Require a TLSv1 connection. Acceptable values: `true`, `false`. - requireTLSv1_1 – Require a TLSv1.1 connection. Acceptable values: `true`, `false`. -- requireTLSv1 – Require a TLSv1.2 connection. Acceptable values: `true`, `false`. +- requireTLSv1_2 – Require a TLSv1.2 connection. Acceptable values: `true`, `false`. - fips – Activates OpenSSL FIPS mode. Supported if the library's OpenSSL version supports FIPS. - privateKeyPassphraseHandler – Class (PrivateKeyPassphraseHandler subclass) that requests the passphrase for accessing the private key. For example: ``, `KeyFileHandler`, `test`, ``. - invalidCertificateHandler – Class (a subclass of CertificateHandler) for verifying invalid certificates. For example: ` ConsoleCertificateHandler ` . diff --git a/docs/ja/sql-reference/statements/grant.md b/docs/ja/sql-reference/statements/grant.md index 63ac7f3aafa..4a088e0e78b 100644 --- a/docs/ja/sql-reference/statements/grant.md +++ b/docs/ja/sql-reference/statements/grant.md @@ -15,7 +15,7 @@ toc_title: GRANT ## 権é™æ§‹æ–‡ã®ä»˜ä¸Ž {#grant-privigele-syntax} ``` sql -GRANT [ON CLUSTER cluster_name] privilege[(column_name [,...])] [,...] ON {db.table|db.*|*.*|table|*} TO {user | role | CURRENT_USER} [,...] [WITH GRANT OPTION] +GRANT [ON CLUSTER cluster_name] privilege[(column_name [,...])] [,...] ON {db.table|db.*|*.*|table|*} TO {user | role | CURRENT_USER} [,...] 
[WITH GRANT OPTION] [WITH REPLACE OPTION] ``` - `privilege` — Type of privilege. @@ -23,17 +23,19 @@ GRANT [ON CLUSTER cluster_name] privilege[(column_name [,...])] [,...] ON {db.ta - `user` — ClickHouse user account. ã“ã® `WITH GRANT OPTION` å¥ã®ä»˜ä¸Ž `user` ã¾ãŸã¯ `role` 実行ã™ã‚‹è¨±å¯ã‚’得㦠`GRANT` クエリ。 ユーザーã¯ã€æŒã£ã¦ã„るスコープã¨ãれ以下ã®æ¨©é™ã‚’付与ã§ãã¾ã™ã€‚ +ã“ã® `WITH REPLACE OPTION` å¥ã¯ `user`ã¾ãŸã¯` role`ã®æ–°ã—ã„特権ã§å¤ã„特権を置ãæ›ãˆã¾ã™, 指定ã—ãªã„å ´åˆã¯ã€å¤ã„特権をå¤ã„ã‚‚ã®ã«è¿½åŠ ã—ã¦ãã ã•ã„ ## ロール構文ã®å‰²ã‚Šå½“㦠{#assign-role-syntax} ``` sql -GRANT [ON CLUSTER cluster_name] role [,...] TO {user | another_role | CURRENT_USER} [,...] [WITH ADMIN OPTION] +GRANT [ON CLUSTER cluster_name] role [,...] TO {user | another_role | CURRENT_USER} [,...] [WITH ADMIN OPTION] [WITH REPLACE OPTION] ``` - `role` — ClickHouse user role. - `user` — ClickHouse user account. ã“ã® `WITH ADMIN OPTION` å¥ã®ä»˜ä¸Ž [ADMIN OPTION](#admin-option-privilege) ã¸ã®ç‰¹æ¨© `user` ã¾ãŸã¯ `role`. +ã“ã® `WITH REPLACE OPTION` å¥ã¯`user`ã¾ãŸã¯` role`ã®æ–°ã—ã„役割ã«ã‚ˆã£ã¦å¤ã„役割を置ãæ›ãˆã¾ã™, 指定ã—ãªã„å ´åˆã¯ã€å¤ã„特権をå¤ã„ã‚‚ã®ã«è¿½åŠ ã—ã¦ãã ã•ã„ ## 使用法 {#grant-usage} diff --git a/docs/ja/sql-reference/statements/select/index.md b/docs/ja/sql-reference/statements/select/index.md index b1a97ba1b28..de337cab8df 100644 --- a/docs/ja/sql-reference/statements/select/index.md +++ b/docs/ja/sql-reference/statements/select/index.md @@ -170,7 +170,7 @@ You can use the following modifiers in `SELECT` queries. ### APPLY {#apply-modifier} -Allows you to invoke some function for each row returned by an outer table expression of a query. +Allows you to invoke some function for each row returned by an outer table expression of a query. **Syntax:** @@ -178,7 +178,7 @@ Allows you to invoke some function for each row returned by an outer table expre SELECT APPLY( ) FROM [db.]table_name ``` -**Example:** +**Example:** ``` sql CREATE TABLE columns_transformers (i Int64, j Int16, k Int64) ENGINE = MergeTree ORDER by (i); @@ -272,9 +272,9 @@ SELECT * REPLACE(i + 1 AS i) EXCEPT (j) APPLY(sum) from columns_transformers; ## SETTINGS in SELECT Query {#settings-in-select} -You can specify the necessary settings right in the `SELECT` query. The setting value is applied only to this query and is reset to default or previous value after the query is executed. +You can specify the necessary settings right in the `SELECT` query. The setting value is applied only to this query and is reset to default or previous value after the query is executed. -Other ways to make settings see [here](../../../operations/settings/index.md). +Other ways to make settings see [here](../../../operations/settings/index.md). **Example** diff --git a/docs/ja/sql-reference/statements/select/offset.md b/docs/ja/sql-reference/statements/select/offset.md index 3efd916bcb8..20ebd972a24 100644 --- a/docs/ja/sql-reference/statements/select/offset.md +++ b/docs/ja/sql-reference/statements/select/offset.md @@ -32,10 +32,10 @@ The `WITH TIES` option is used to return any additional rows that tie for the la !!! note "Note" According to the standard, the `OFFSET` clause must come before the `FETCH` clause if both are present. - + !!! note "Note" The real offset can also depend on the [offset](../../../operations/settings/settings.md#offset) setting. 
- + ## Examples {#examples} Input table: diff --git a/docs/ja/sql-reference/table-functions/jdbc.md b/docs/ja/sql-reference/table-functions/jdbc.md index 29ecbe25a95..313aefdf581 100644 --- a/docs/ja/sql-reference/table-functions/jdbc.md +++ b/docs/ja/sql-reference/table-functions/jdbc.md @@ -27,7 +27,7 @@ SELECT * FROM jdbc('mysql-dev?p1=233', 'num Int32', 'select toInt32OrZero(''{{p1 ``` ``` sql -SELECT * +SELECT * FROM jdbc('mysql-dev?p1=233', 'num Int32', 'select toInt32OrZero(''{{p1}}'') as num') ``` diff --git a/docs/ru/development/build-osx.md b/docs/ru/development/build-osx.md deleted file mode 120000 index 8e172b919d8..00000000000 --- a/docs/ru/development/build-osx.md +++ /dev/null @@ -1 +0,0 @@ -../../en/development/build-osx.md \ No newline at end of file diff --git a/docs/ru/development/build-osx.md b/docs/ru/development/build-osx.md new file mode 100644 index 00000000000..8d5d06a544c --- /dev/null +++ b/docs/ru/development/build-osx.md @@ -0,0 +1,125 @@ +--- +toc_priority: 65 +toc_title: Сборка на Mac OS X +--- +# Как Ñобрать ClickHouse на Mac OS X {#how-to-build-clickhouse-on-mac-os-x} + +Сборка должна запуÑкатьÑÑ Ñ x86_64 (Intel) на macOS верÑии 10.15 (Catalina) и выше в поÑледней верÑии компилÑтора Xcode's native AppleClang, Homebrew's vanilla Clang или в GCC-компилÑторах. + +## УÑтановка Homebrew {#install-homebrew} + +``` bash +$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)" +``` + +## УÑтановка Xcode и инÑтрументов командной Ñтроки {#install-xcode-and-command-line-tools} + + 1. УÑтановите из App Store поÑледнюю верÑию [Xcode](https://apps.apple.com/am/app/xcode/id497799835?mt=12). + + 2. ЗапуÑтите ее, чтобы принÑÑ‚ÑŒ лицензионное Ñоглашение. Ðеобходимые компоненты уÑтановÑÑ‚ÑÑ Ð°Ð²Ñ‚Ð¾Ð¼Ð°Ñ‚Ð¸Ñ‡ÐµÑки. + + 3. Затем убедитеÑÑŒ, что в ÑиÑтеме выбрана поÑледнÑÑ Ð²ÐµÑ€ÑÐ¸Ñ Ð¸Ð½Ñтрументов командной Ñтроки: + + ``` bash + $ sudo rm -rf /Library/Developer/CommandLineTools + $ sudo xcode-select --install + ``` + + 4. ПерезагрузитеÑÑŒ. + +## УÑтановка компилÑторов, инÑтрументов и библиотек {#install-required-compilers-tools-and-libraries} + + ``` bash + $ brew update + $ brew install cmake ninja libtool gettext llvm gcc + ``` + +## ПроÑмотр иÑходников ClickHouse {#checkout-clickhouse-sources} + + ``` bash + $ git clone --recursive git@github.com:ClickHouse/ClickHouse.git # or https://github.com/ClickHouse/ClickHouse.git + ``` + +## Сборка ClickHouse {#build-clickhouse} + + Чтобы запуÑтить Ñборку в компилÑторе Xcode's native AppleClang: + + ``` bash + $ cd ClickHouse + $ rm -rf build + $ mkdir build + $ cd build + $ cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_JEMALLOC=OFF .. + $ cmake --build . --config RelWithDebInfo + $ cd .. + ``` + +Чтобы запуÑтить Ñборку в компилÑторе Homebrew's vanilla Clang: + + ``` bash + $ cd ClickHouse + $ rm -rf build + $ mkdir build + $ cd build + $ cmake -DCMAKE_C_COMPILER=$(brew --prefix llvm)/bin/clang -DCMAKE_CXX_COMPILER==$(brew --prefix llvm)/bin/clang++ -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_JEMALLOC=OFF .. + $ cmake -DCMAKE_C_COMPILER=$(brew --prefix llvm)/bin/clang -DCMAKE_CXX_COMPILER=$(brew --prefix llvm)/bin/clang++ -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_JEMALLOC=OFF .. + $ cmake --build . --config RelWithDebInfo + $ cd .. 
+ ``` + +Чтобы Ñобрать Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ компилÑтора Homebrew's vanilla GCC: + + ``` bash + $ cd ClickHouse + $ rm -rf build + $ mkdir build + $ cd build + $ cmake -DCMAKE_C_COMPILER=$(brew --prefix gcc)/bin/gcc-10 -DCMAKE_CXX_COMPILER=$(brew --prefix gcc)/bin/g++-10 -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_JEMALLOC=OFF .. + $ cmake --build . --config RelWithDebInfo + $ cd .. + ``` + +## ÐŸÑ€ÐµÐ´ÑƒÐ¿Ñ€ÐµÐ¶Ð´ÐµÐ½Ð¸Ñ {#caveats} + +ЕÑли будете запуÑкать `clickhouse-server`, убедитеÑÑŒ, что увеличили ÑиÑтемную переменную `maxfiles`. + +!!! info "Note" + Вам понадобитÑÑ ÐºÐ¾Ð¼Ð°Ð½Ð´Ð° `sudo`. + +1. Создайте файл `/Library/LaunchDaemons/limit.maxfiles.plist` и помеÑтите в него Ñледующее: + + ``` xml + + + + + Label + limit.maxfiles + ProgramArguments + + launchctl + limit + maxfiles + 524288 + 524288 + + RunAtLoad + + ServiceIPC + + + + ``` + +2. Выполните команду: + + ``` bash + $ sudo chown root:wheel /Library/LaunchDaemons/limit.maxfiles.plist + ``` + +3. ПерезагрузитеÑÑŒ. + +4. Чтобы проверить, как Ñто работает, выполните команду `ulimit -n`. + +[Original article](https://clickhouse.tech/docs/en/development/build_osx/) diff --git a/docs/ru/development/contrib.md b/docs/ru/development/contrib.md index f3310836ba9..33a533d7f75 100644 --- a/docs/ru/development/contrib.md +++ b/docs/ru/development/contrib.md @@ -38,3 +38,15 @@ toc_title: "ИÑпользуемые Ñторонние библиотеки" | UnixODBC | [LGPL v2.1](https://github.com/ClickHouse-Extras/UnixODBC/tree/b0ad30f7f6289c12b76f04bfb9d466374bb32168) | | zlib-ng | [Zlib License](https://github.com/ClickHouse-Extras/zlib-ng/blob/develop/LICENSE.md) | | zstd | [BSD 3-Clause License](https://github.com/facebook/zstd/blob/dev/LICENSE) | + +## Рекомендации по добавлению Ñторонних библиотек и поддержанию в них пользовательÑких изменений {#adding-third-party-libraries} + +1. ВеÑÑŒ внешний Ñторонний код должен находитьÑÑ Ð² отдельных папках внутри папки `contrib` Ñ€ÐµÐ¿Ð¾Ð·Ð¸Ñ‚Ð¾Ñ€Ð¸Ñ ClickHouse. По возможноÑти, иÑпользуйте Ñабмодули Git. +2. Клонируйте официальный репозиторий [Clickhouse-extras](https://github.com/ClickHouse-Extras). ИÑпользуйте официальные репозитории GitHub, еÑли они доÑтупны. +3. Создавайте новую ветку на оÑнове той ветки, которую вы хотите интегрировать: например, `master` -> `clickhouse/master` или `release/vX.Y.Z` -> `clickhouse/release/vX.Y.Z`. +4. Ð’Ñе копии [Clickhouse-extras](https://github.com/ClickHouse-Extras) можно автоматичеÑки Ñинхронизировать Ñ ÑƒÐ´Ð°Ð»ÐµÐ½Ð½Ñ‹Ð¼Ð¸ репозиториÑми. Ветки `clickhouse/...` оÑтанутÑÑ Ð½ÐµÐ·Ð°Ñ‚Ñ€Ð¾Ð½ÑƒÑ‚Ñ‹Ð¼Ð¸, поÑкольку Ñкорее вÑего никто не будет иÑпользовать Ñтот шаблон Ð¸Ð¼ÐµÐ½Ð¾Ð²Ð°Ð½Ð¸Ñ Ð² Ñвоих репозиториÑÑ…. +5. Добавьте Ñабмодули в папку `contrib` Ñ€ÐµÐ¿Ð¾Ð·Ð¸Ñ‚Ð¾Ñ€Ð¸Ñ ClickHouse, на который ÑÑылаютÑÑ ÐºÐ»Ð¾Ð½Ð¸Ñ€Ð¾Ð²Ð°Ð½Ð½Ñ‹Ðµ репозитории. ÐаÑтройте Ñабмодули Ð´Ð»Ñ Ð¾Ñ‚ÑÐ»ÐµÐ¶Ð¸Ð²Ð°Ð½Ð¸Ñ Ð¸Ð·Ð¼ÐµÐ½ÐµÐ½Ð¸Ð¹ в ÑоответÑтвующих ветках `clickhouse/...`. +6. Каждый раз, когда необходимо внеÑти Ð¸Ð·Ð¼ÐµÐ½ÐµÐ½Ð¸Ñ Ð² код библиотеки, Ñледует Ñоздавать отдельную ветку, например `clickhouse/my-fix`. Затем Ñта ветка должна быть Ñлита (`merge`) в ветку, отÑлеживаемую Ñабмодулем, например, в `clickhouse/master` или `clickhouse/release/vX.Y.Z`. +7. Ðе добавлÑйте код в клоны Ñ€ÐµÐ¿Ð¾Ð·Ð¸Ñ‚Ð¾Ñ€Ð¸Ñ [Clickhouse-extras](https://github.com/ClickHouse-Extras), еÑли Ð¸Ð¼Ñ Ð²ÐµÑ‚ÐºÐ¸ не ÑоответÑтвует шаблону `clickhouse/...`. +8. Ð’Ñегда вноÑите Ð¸Ð·Ð¼ÐµÐ½ÐµÐ½Ð¸Ñ Ñ ÑƒÑ‡ÐµÑ‚Ð¾Ð¼ того, что они попадут в официальный репозиторий. 
ПоÑле того как PR будет влит из (ветки разработки/иÑправлений) вашего личного клона Ñ€ÐµÐ¿Ð¾Ð·Ð¸Ñ‚Ð¾Ñ€Ð¸Ñ Ð² [Clickhouse-extras](https://github.com/ClickHouse-Extras), и Ñабмодуль будет добавлен в репозиторий ClickHouse, рекомендуетÑÑ Ñделать еще один PR из (ветки разработки/иÑправлений) Ñ€ÐµÐ¿Ð¾Ð·Ð¸Ñ‚Ð¾Ñ€Ð¸Ñ [Clickhouse-extras](https://github.com/ClickHouse-Extras) в официальный репозиторий библиотеки. Таким образом будут решены Ñледующие задачи: 1) публикуемый код может быть иÑпользован многократно и будет иметь более выÑокую ценноÑÑ‚ÑŒ; 2) другие пользователи также Ñмогут иÑпользовать его в Ñвоих целÑÑ…; 3) поддержкой кода будут заниматьÑÑ Ð½Ðµ только разработчики ClickHouse. +9. Чтобы Ñабмодуль начал иÑпользовать новый код из иÑходной ветки (например, `master`), Ñначала Ñледует аккуратно выполнить ÑлиÑние (`master` -> `clickhouse/master`), и только поÑле Ñтого Ð¸Ð·Ð¼ÐµÐ½ÐµÐ½Ð¸Ñ Ð¼Ð¾Ð³ÑƒÑ‚ быть добавлены в оÑновной репозиторий ClickHouse. Это ÑвÑзано Ñ Ñ‚ÐµÐ¼, что в отÑлеживаемую ветку (например, `clickhouse/master`) могут быть внеÑены изменениÑ, и поÑтому ветка может отличатьÑÑ Ð¾Ñ‚ первоиÑточника (`master`). diff --git a/docs/ru/development/developer-instruction.md b/docs/ru/development/developer-instruction.md index 463d38a44fb..391d28d5a89 100644 --- a/docs/ru/development/developer-instruction.md +++ b/docs/ru/development/developer-instruction.md @@ -92,7 +92,7 @@ ClickHouse не работает и не ÑобираетÑÑ Ð½Ð° 32-битны # Две поÑледние команды могут быть объединены вмеÑте: git submodule update --init -The next commands would help you to reset all submodules to the initial state (!WARING! - any chenges inside will be deleted): +The next commands would help you to reset all submodules to the initial state (!WARING! - any changes inside will be deleted): Следующие команды помогут ÑброÑить вÑе Ñабмодули в изначальное ÑоÑтоÑние (!Ð’ÐИМÐÐИЕ! - вÑе Ð¸Ð·Ð¼ÐµÐ½ÐµÐ½Ð¸Ñ Ð² ÑабмодулÑÑ… будут утерÑны): # Synchronizes submodules' remote URL with .gitmodules @@ -128,7 +128,7 @@ Ninja - ÑиÑтема запуÑка Ñборочных задач. /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" brew install cmake ninja -Проверьте верÑию CMake: `cmake --version`. ЕÑли верÑÐ¸Ñ Ð¼ÐµÐ½ÑŒÑˆÐµ 3.3, то уÑтановите новую верÑию Ñ Ñайта https://cmake.org/download/ +Проверьте верÑию CMake: `cmake --version`. ЕÑли верÑÐ¸Ñ Ð¼ÐµÐ½ÑŒÑˆÐµ 3.12, то уÑтановите новую верÑию Ñ Ñайта https://cmake.org/download/ ## ÐеобÑзательные внешние библиотеки {#neobiazatelnye-vneshnie-biblioteki} @@ -242,6 +242,8 @@ sudo bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)" Стиль кода: https://clickhouse.tech/docs/ru/development/style/ +Рекомендации по добавлению Ñторонних библиотек и поддержанию в них пользовательÑких изменений: https://clickhouse.tech/docs/ru/development/contrib/#adding-third-party-libraries + Разработка теÑтов: https://clickhouse.tech/docs/ru/development/tests/ СпиÑок задач: https://github.com/ClickHouse/ClickHouse/issues?q=is%3Aopen+is%3Aissue+label%3A%22easy+task%22 diff --git a/docs/ru/development/style.md b/docs/ru/development/style.md index 6e1230b4831..c73eb138c9c 100644 --- a/docs/ru/development/style.md +++ b/docs/ru/development/style.md @@ -820,11 +820,11 @@ The dictionary is configured incorrectly. **10.** Ðенужный код удалÑетÑÑ Ð¸Ð· иÑходников. -## Библиотеки {#biblioteki} +## Библиотеки {#libraries} -**1.** ИÑпользуютÑÑ ÑÑ‚Ð°Ð½Ð´Ð°Ñ€Ñ‚Ð½Ð°Ñ Ð±Ð¸Ð±Ð»Ð¸Ð¾Ñ‚ÐµÐºÐ° C++20 (допуÑтимо иÑпользовать ÑкÑпериментальные раÑширениÑ) а также фреймворки `boost`, `Poco`. 
+**1.** ИÑпользуютÑÑ Ñтандартные библиотеки C++20 (допуÑтимо иÑпользовать ÑкÑпериментальные раÑширениÑ), а также фреймворки `boost`, `Poco`. -**2.** Библиотеки должны быть раÑположены в виде иÑходников в директории `contrib` и ÑобиратьÑÑ Ð²Ð¼ÐµÑте Ñ ClickHouse. Ðе разрешено иÑпользовать библиотеки, доÑтупные в пакетах ОС или любые другие ÑпоÑобы уÑтановки библиотек в ÑиÑтему. +**2.** Библиотеки должны быть раÑположены в виде иÑходников в директории `contrib` и ÑобиратьÑÑ Ð²Ð¼ÐµÑте Ñ ClickHouse. Ðе разрешено иÑпользовать библиотеки, доÑтупные в пакетах ОС, или любые другие ÑпоÑобы уÑтановки библиотек в ÑиÑтему. Подробнее Ñмотрите раздел [Рекомендации по добавлению Ñторонних библиотек и поддержанию в них пользовательÑких изменений](contrib.md#adding-third-party-libraries). **3.** Предпочтение отдаётÑÑ ÑƒÐ¶Ðµ иÑпользующимÑÑ Ð±Ð¸Ð±Ð»Ð¸Ð¾Ñ‚ÐµÐºÐ°Ð¼. @@ -902,4 +902,3 @@ function( const & RangesInDataParts ranges, size_t limit) ``` - diff --git a/docs/ru/engines/database-engines/atomic.md b/docs/ru/engines/database-engines/atomic.md index 8c75be3d93b..ecdd809b6ec 100644 --- a/docs/ru/engines/database-engines/atomic.md +++ b/docs/ru/engines/database-engines/atomic.md @@ -25,7 +25,7 @@ CREATE TABLE name UUID '28f1c61c-2970-457a-bffe-454156ddcfef' (n UInt64) ENGINE ``` ### RENAME TABLE {#rename-table} -ЗапроÑÑ‹ `RENAME` выполнÑÑŽÑ‚ÑÑ Ð±ÐµÐ· Ð¸Ð·Ð¼ÐµÐ½ÐµÐ½Ð¸Ñ UUID и Ð¿ÐµÑ€ÐµÐ¼ÐµÑ‰ÐµÐ½Ð¸Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ‡Ð½Ñ‹Ñ… данных. Эти запроÑÑ‹ не ожидают Ð·Ð°Ð²ÐµÑ€ÑˆÐµÐ½Ð¸Ñ Ð¸Ñпользующих таблицу запроÑов и будут выполнены мгновенно. +ЗапроÑÑ‹ `RENAME` выполнÑÑŽÑ‚ÑÑ Ð±ÐµÐ· Ð¸Ð·Ð¼ÐµÐ½ÐµÐ½Ð¸Ñ UUID и Ð¿ÐµÑ€ÐµÐ¼ÐµÑ‰ÐµÐ½Ð¸Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ‡Ð½Ñ‹Ñ… данных. Эти запроÑÑ‹ не ожидают Ð·Ð°Ð²ÐµÑ€ÑˆÐµÐ½Ð¸Ñ Ð¸Ñпользующих таблицу запроÑов и будут выполнены мгновенно. ### DROP/DETACH TABLE {#drop-detach-table} diff --git a/docs/ru/engines/database-engines/index.md b/docs/ru/engines/database-engines/index.md index d4fad8f43a9..704d8ce9df2 100644 --- a/docs/ru/engines/database-engines/index.md +++ b/docs/ru/engines/database-engines/index.md @@ -14,9 +14,11 @@ toc_title: "Введение" - [MySQL](../../engines/database-engines/mysql.md) -- [MaterializeMySQL](../../engines/database-engines/materialize-mysql.md) +- [MaterializedMySQL](../../engines/database-engines/materialized-mysql.md) - [Lazy](../../engines/database-engines/lazy.md) - [PostgreSQL](../../engines/database-engines/postgresql.md) +- [Replicated](../../engines/database-engines/replicated.md) + diff --git a/docs/ru/engines/database-engines/materialize-mysql.md b/docs/ru/engines/database-engines/materialized-mysql.md similarity index 80% rename from docs/ru/engines/database-engines/materialize-mysql.md rename to docs/ru/engines/database-engines/materialized-mysql.md index db2208a9016..f5f0166c9dc 100644 --- a/docs/ru/engines/database-engines/materialize-mysql.md +++ b/docs/ru/engines/database-engines/materialized-mysql.md @@ -1,21 +1,22 @@ + --- toc_priority: 29 -toc_title: MaterializeMySQL +toc_title: MaterializedMySQL --- -# MaterializeMySQL {#materialize-mysql} +# MaterializedMySQL {#materialized-mysql} Создает базу данных ClickHouse Ñо вÑеми таблицами, ÑущеÑтвующими в MySQL, и вÑеми данными в Ñтих таблицах. Сервер ClickHouse работает как реплика MySQL. Он читает файл binlog и выполнÑет DDL and DML-запроÑÑ‹. -`MaterializeMySQL` — ÑкÑпериментальный движок баз данных. +`MaterializedMySQL` — ÑкÑпериментальный движок баз данных. 
## Создание базы данных {#creating-a-database} ``` sql CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster] -ENGINE = MaterializeMySQL('host:port', ['database' | database], 'user', 'password') [SETTINGS ...] +ENGINE = MaterializedMySQL('host:port', ['database' | database], 'user', 'password') [SETTINGS ...] ``` **Параметры движка** @@ -27,11 +28,11 @@ ENGINE = MaterializeMySQL('host:port', ['database' | database], 'user', 'passwor ## Виртуальные Ñтолбцы {#virtual-columns} -При работе Ñ Ð´Ð²Ð¸Ð¶ÐºÐ¾Ð¼ баз данных `MaterializeMySQL` иÑпользуютÑÑ Ñ‚Ð°Ð±Ð»Ð¸Ñ†Ñ‹ ÑемейÑтва [ReplacingMergeTree](../../engines/table-engines/mergetree-family/replacingmergetree.md) Ñ Ð²Ð¸Ñ€Ñ‚ÑƒÐ°Ð»ÑŒÐ½Ñ‹Ð¼Ð¸ Ñтолбцами `_sign` и `_version`. +При работе Ñ Ð´Ð²Ð¸Ð¶ÐºÐ¾Ð¼ баз данных `MaterializedMySQL` иÑпользуютÑÑ Ñ‚Ð°Ð±Ð»Ð¸Ñ†Ñ‹ ÑемейÑтва [ReplacingMergeTree](../../engines/table-engines/mergetree-family/replacingmergetree.md) Ñ Ð²Ð¸Ñ€Ñ‚ÑƒÐ°Ð»ÑŒÐ½Ñ‹Ð¼Ð¸ Ñтолбцами `_sign` и `_version`. - `_version` — Ñчетчик транзакций. Тип [UInt64](../../sql-reference/data-types/int-uint.md). - `_sign` — метка удалениÑ. Тип [Int8](../../sql-reference/data-types/int-uint.md). Возможные значениÑ: - - `1` — Ñтрока не удалена, + - `1` — Ñтрока не удалена, - `-1` — Ñтрока удалена. ## Поддержка типов данных {#data_types-support} @@ -74,13 +75,15 @@ DDL-запроÑÑ‹ в MySQL конвертируютÑÑ Ð² ÑоответÑтв - Ð—Ð°Ð¿Ñ€Ð¾Ñ `UPDATE` конвертируетÑÑ Ð² ClickHouse в `INSERT` Ñ `_sign=-1` и `INSERT` Ñ `_sign=1`. -### Выборка из таблиц движка MaterializeMySQL {#select} +### Выборка из таблиц движка MaterializedMySQL {#select} -Ð—Ð°Ð¿Ñ€Ð¾Ñ `SELECT` из таблиц движка `MaterializeMySQL` имеет некоторую Ñпецифику: +Ð—Ð°Ð¿Ñ€Ð¾Ñ `SELECT` из таблиц движка `MaterializedMySQL` имеет некоторую Ñпецифику: - ЕÑли в запроÑе `SELECT` напрÑмую не указан Ñтолбец `_version`, то иÑпользуетÑÑ Ð¼Ð¾Ð´Ð¸Ñ„Ð¸ÐºÐ°Ñ‚Ð¾Ñ€ [FINAL](../../sql-reference/statements/select/from.md#select-from-final). Таким образом, выбираютÑÑ Ñ‚Ð¾Ð»ÑŒÐºÐ¾ Ñтроки Ñ `MAX(_version)`. -- ЕÑли в запроÑе `SELECT` напрÑмую не указан Ñтолбец `_sign`, то по умолчанию иÑпользуетÑÑ `WHERE _sign=1`. Таким образом, удаленные Ñтроки не включаютÑÑ Ð² результирующий набор. +- ЕÑли в запроÑе `SELECT` напрÑмую не указан Ñтолбец `_sign`, то по умолчанию иÑпользуетÑÑ `WHERE _sign=1`. Таким образом, удаленные Ñтроки не включаютÑÑ Ð² результирующий набор. + +- Результат включает комментарии к Ñтолбцам, еÑли они ÑущеÑтвуют в таблицах базы данных MySQL. ### ÐšÐ¾Ð½Ð²ÐµÑ€Ñ‚Ð°Ñ†Ð¸Ñ Ð¸Ð½Ð´ÐµÐºÑов {#index-conversion} @@ -91,10 +94,10 @@ DDL-запроÑÑ‹ в MySQL конвертируютÑÑ Ð² ÑоответÑтв **Примечание** - Строки Ñ `_sign=-1` физичеÑки не удалÑÑŽÑ‚ÑÑ Ð¸Ð· таблиц. -- КаÑкадные запроÑÑ‹ `UPDATE/DELETE` не поддерживаютÑÑ Ð´Ð²Ð¸Ð¶ÐºÐ¾Ð¼ `MaterializeMySQL`. +- КаÑкадные запроÑÑ‹ `UPDATE/DELETE` не поддерживаютÑÑ Ð´Ð²Ð¸Ð¶ÐºÐ¾Ð¼ `MaterializedMySQL`. - Ð ÐµÐ¿Ð»Ð¸ÐºÐ°Ñ†Ð¸Ñ Ð¼Ð¾Ð¶ÐµÑ‚ быть легко нарушена. -- ПрÑмые операции Ð¸Ð·Ð¼ÐµÐ½ÐµÐ½Ð¸Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ… в таблицах и базах данных `MaterializeMySQL` запрещены. -- Ðа работу `MaterializeMySQL` влиÑет наÑтройка [optimize_on_insert](../../operations/settings/settings.md#optimize-on-insert). Когда таблица на MySQL Ñервере менÑетÑÑ, проиÑходит ÑлиÑние данных в ÑоответÑвующей таблице в базе данных `MaterializeMySQL`. +- ПрÑмые операции Ð¸Ð·Ð¼ÐµÐ½ÐµÐ½Ð¸Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ… в таблицах и базах данных `MaterializedMySQL` запрещены. +- Ðа работу `MaterializedMySQL` влиÑет наÑтройка [optimize_on_insert](../../operations/settings/settings.md#optimize-on-insert). 
Когда таблица на MySQL Ñервере менÑетÑÑ, проиÑходит ÑлиÑние данных в ÑоответÑвующей таблице в базе данных `MaterializedMySQL`. ## Примеры иÑÐ¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ð½Ð¸Ñ {#examples-of-use} @@ -111,9 +114,9 @@ mysql> SELECT * FROM test; ``` ```text -+---+------+------+ ++---+------+------+ | a | b | c | -+---+------+------+ ++---+------+------+ | 2 | 222 | Wow! | +---+------+------+ ``` @@ -123,7 +126,7 @@ mysql> SELECT * FROM test; База данных и ÑÐ¾Ð·Ð´Ð°Ð½Ð½Ð°Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ†Ð°: ``` sql -CREATE DATABASE mysql ENGINE = MaterializeMySQL('localhost:3306', 'db', 'user', '***'); +CREATE DATABASE mysql ENGINE = MaterializedMySQL('localhost:3306', 'db', 'user', '***'); SHOW TABLES FROM mysql; ``` @@ -140,9 +143,9 @@ SELECT * FROM mysql.test; ``` ``` text -┌─a─┬──b─┠-│ 1 │ 11 │ -│ 2 │ 22 │ +┌─a─┬──b─┠+│ 1 │ 11 │ +│ 2 │ 22 │ └───┴────┘ ``` @@ -153,8 +156,8 @@ SELECT * FROM mysql.test; ``` ``` text -┌─a─┬───b─┬─c────┠-│ 2 │ 222 │ Wow! │ +┌─a─┬───b─┬─c────┠+│ 2 │ 222 │ Wow! │ └───┴─────┴──────┘ ``` diff --git a/docs/ru/engines/database-engines/mysql.md b/docs/ru/engines/database-engines/mysql.md index 8d9354a7d7a..ae7cb6dfdb1 100644 --- a/docs/ru/engines/database-engines/mysql.md +++ b/docs/ru/engines/database-engines/mysql.md @@ -53,7 +53,7 @@ ENGINE = MySQL('host:port', ['database' | database], 'user', 'password') ## ИÑпользование глобальных переменных {#global-variables-support} -Ð”Ð»Ñ Ð»ÑƒÑ‡ÑˆÐµÐ¹ ÑовмеÑтимоÑти к глобальным переменным можно обращатьÑÑ Ð² формате MySQL, как `@@identifier`. +Ð”Ð»Ñ Ð»ÑƒÑ‡ÑˆÐµÐ¹ ÑовмеÑтимоÑти к глобальным переменным можно обращатьÑÑ Ð² формате MySQL, как `@@identifier`. ПоддерживаютÑÑ Ñледующие переменные: - `version` diff --git a/docs/ru/engines/database-engines/postgresql.md b/docs/ru/engines/database-engines/postgresql.md index c11dab6f1aa..06e2b35b002 100644 --- a/docs/ru/engines/database-engines/postgresql.md +++ b/docs/ru/engines/database-engines/postgresql.md @@ -14,7 +14,7 @@ toc_title: PostgreSQL ## Создание БД {#creating-a-database} ``` sql -CREATE DATABASE test_database +CREATE DATABASE test_database ENGINE = PostgreSQL('host:port', 'database', 'user', 'password'[, `use_table_cache`]); ``` @@ -43,14 +43,14 @@ ENGINE = PostgreSQL('host:port', 'database', 'user', 'password'[, `use_table_cac | TEXT, CHAR | [String](../../sql-reference/data-types/string.md) | | INTEGER | Nullable([Int32](../../sql-reference/data-types/int-uint.md))| | ARRAY | [Array](../../sql-reference/data-types/array.md) | - + ## Примеры иÑÐ¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ð½Ð¸Ñ {#examples-of-use} Обмен данными между БД ClickHouse и Ñервером PostgreSQL: ``` sql -CREATE DATABASE test_database +CREATE DATABASE test_database ENGINE = PostgreSQL('postgres1:5432', 'test_database', 'postgres', 'mysecretpassword', 1); ``` @@ -102,7 +102,7 @@ SELECT * FROM test_database.test_table; └────────┴───────┘ ``` -ПуÑÑ‚ÑŒ Ñтруктура таблицы была изменена в PostgreSQL: +ПуÑÑ‚ÑŒ Ñтруктура таблицы была изменена в PostgreSQL: ``` sql postgre> ALTER TABLE test_table ADD COLUMN data Text diff --git a/docs/ru/engines/database-engines/replicated.md b/docs/ru/engines/database-engines/replicated.md new file mode 100644 index 00000000000..f1d5755647a --- /dev/null +++ b/docs/ru/engines/database-engines/replicated.md @@ -0,0 +1,119 @@ + +# [ÑкÑпериментальный] Replicated {#replicated} + +Движок оÑнован на движке [Atomic](../../engines/database-engines/atomic.md). Он поддерживает репликацию метаданных через журнал DDL, запиÑываемый в ZooKeeper и выполнÑемый на вÑех репликах Ð´Ð»Ñ Ð´Ð°Ð½Ð½Ð¾Ð¹ базы данных. 
+ +Ðа одном Ñервере ClickHouse может одновременно работать и обновлÑÑ‚ÑŒÑÑ Ð½ÐµÑколько реплицированных баз данных. Ðо не может ÑущеÑтвовать неÑкольких реплик одной и той же реплицированной базы данных. + +## Создание базы данных {#creating-a-database} +``` sql + CREATE DATABASE testdb ENGINE = Replicated('zoo_path', 'shard_name', 'replica_name') [SETTINGS ...] +``` + +**Параметры движка** + +- `zoo_path` — путь в ZooKeeper. Один и тот же путь ZooKeeper ÑоответÑтвует одной и той же базе данных. +- `shard_name` — Ð˜Ð¼Ñ ÑˆÐ°Ñ€Ð´Ð°. Реплики базы данных группируютÑÑ Ð² шарды по имени. +- `replica_name` — Ð˜Ð¼Ñ Ñ€ÐµÐ¿Ð»Ð¸ÐºÐ¸. Имена реплик должны быть разными Ð´Ð»Ñ Ð²Ñех реплик одного и того же шарда. + +!!! note "Предупреждение" + Ð”Ð»Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ† [ReplicatedMergeTree](../table-engines/mergetree-family/replication.md#table_engines-replication) еÑли аргументы не заданы, то иÑпользуютÑÑ Ð°Ñ€Ð³ÑƒÐ¼ÐµÐ½Ñ‚Ñ‹ по умолчанию: `/clickhouse/tables/{uuid}/{shard}` и `{replica}`. Они могут быть изменены в Ñерверных наÑтройках: [default_replica_path](../../operations/server-configuration-parameters/settings.md#default_replica_path) и [default_replica_name](../../operations/server-configuration-parameters/settings.md#default_replica_name). ÐœÐ°ÐºÑ€Ð¾Ñ `{uuid}` раÑкрываетÑÑ Ð² `UUID` таблицы, `{shard}` и `{replica}` — в Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð· конфига Ñервера. Ð’ будущем поÑвитÑÑ Ð²Ð¾Ð·Ð¼Ð¾Ð¶Ð½Ð¾ÑÑ‚ÑŒ иÑпользовать Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ `shard_name` и `replica_name` аргументов движка базы данных `Replicated`. + +## ОÑобенноÑти и рекомендации {#specifics-and-recommendations} + +DDL-запроÑÑ‹ Ñ Ð±Ð°Ð·Ð¾Ð¹ данных `Replicated` работают похожим образом на [ON CLUSTER](../../sql-reference/distributed-ddl.md) запроÑÑ‹, но Ñ Ð½ÐµÐ±Ð¾Ð»ÑŒÑˆÐ¸Ð¼Ð¸ отличиÑми. + +Сначала DDL-Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð¿Ñ‹Ñ‚Ð°ÐµÑ‚ÑÑ Ð²Ñ‹Ð¿Ð¾Ð»Ð½Ð¸Ñ‚ÑŒÑÑ Ð½Ð° инициаторе (том хоÑте, который изначально получил Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð¾Ñ‚ пользователÑ). ЕÑли Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð½Ðµ выполнилÑÑ, то пользователь Ñразу получает ошибку, другие хоÑÑ‚Ñ‹ не пытаютÑÑ ÐµÐ³Ð¾ выполнить. ЕÑли Ð·Ð°Ð¿Ñ€Ð¾Ñ ÑƒÑпешно выполнилÑÑ Ð½Ð° инициаторе, то вÑе оÑтальные хоÑÑ‚Ñ‹ будут автоматичеÑки делать попытки выполнить его. +Инициатор попытаетÑÑ Ð´Ð¾Ð¶Ð´Ð°Ñ‚ÑŒÑÑ Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñа на других хоÑтах (не дольше [distributed_ddl_task_timeout](../../operations/settings/settings.md#distributed_ddl_task_timeout)) и вернёт таблицу Ñо ÑтатуÑами Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñа на каждом хоÑте. + +Поведение в Ñлучае ошибок регулируетÑÑ Ð½Ð°Ñтройкой [distributed_ddl_output_mode](../../operations/settings/settings.md#distributed_ddl_output_mode), Ð´Ð»Ñ `Replicated` лучше выÑтавлÑÑ‚ÑŒ её в `null_status_on_timeout` — Ñ‚.е. еÑли какие-то хоÑÑ‚Ñ‹ не уÑпели выполнить Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð·Ð° [distributed_ddl_task_timeout](../../operations/settings/settings.md#distributed_ddl_task_timeout), то вмеÑто иÑÐºÐ»ÑŽÑ‡ÐµÐ½Ð¸Ñ Ð´Ð»Ñ Ð½Ð¸Ñ… будет показан ÑÑ‚Ð°Ñ‚ÑƒÑ `NULL` в таблице. + +Ð’ ÑиÑтемной таблице [system.clusters](../../operations/system-tables/clusters.md) еÑÑ‚ÑŒ клаÑтер Ñ Ð¸Ð¼ÐµÐ½ÐµÐ¼, как у реплицируемой базы, который ÑоÑтоит из вÑех реплик базы. Этот клаÑтер обновлÑетÑÑ Ð°Ð²Ñ‚Ð¾Ð¼Ð°Ñ‚Ð¸Ñ‡ÐµÑки при Ñоздании/удалении реплик, и его можно иÑпользовать Ð´Ð»Ñ [Distributed](../../engines/table-engines/special/distributed.md#distributed) таблиц. + + При Ñоздании новой реплики базы, Ñта реплика Ñама Ñоздаёт таблицы. 
ЕÑли реплика долго была недоÑтупна и отÑтала от лога репликации — она ÑверÑет Ñвои локальные метаданные Ñ Ð°ÐºÑ‚ÑƒÐ°Ð»ÑŒÐ½Ñ‹Ð¼Ð¸ метаданными в ZooKeeper, перекладывает лишние таблицы Ñ Ð´Ð°Ð½Ð½Ñ‹Ð¼Ð¸ в отдельную нереплицируемую базу (чтобы Ñлучайно не удалить что-нибудь лишнее), Ñоздаёт недоÑтающие таблицы, обновлÑет имена таблиц, еÑли были переименованиÑ. Данные реплицируютÑÑ Ð½Ð° уровне `ReplicatedMergeTree`, Ñ‚.е. еÑли таблица не реплицируемаÑ, то данные реплицироватьÑÑ Ð½Ðµ будут (база отвечает только за метаданные). + +## Примеры иÑÐ¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ð½Ð¸Ñ {#usage-example} + +Создадим реплицируемую базу на трех хоÑтах: + +``` sql +node1 :) CREATE DATABASE r ENGINE=Replicated('some/path/r','shard1','replica1'); +node2 :) CREATE DATABASE r ENGINE=Replicated('some/path/r','shard1','other_replica'); +node3 :) CREATE DATABASE r ENGINE=Replicated('some/path/r','other_shard','{replica}'); +``` + +Выполним DDL-Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð½Ð° одном из хоÑтов: + +``` sql +CREATE TABLE r.rmt (n UInt64) ENGINE=ReplicatedMergeTree ORDER BY n; +``` + +Ð—Ð°Ð¿Ñ€Ð¾Ñ Ð²Ñ‹Ð¿Ð¾Ð»Ð½Ð¸Ñ‚ÑÑ Ð½Ð° вÑех оÑтальных хоÑтах: + +``` text +┌─────hosts────────────┬──status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┠+│ shard1|replica1 │ 0 │ │ 2 │ 0 │ +│ shard1|other_replica │ 0 │ │ 1 │ 0 │ +│ other_shard|r1 │ 0 │ │ 0 │ 0 │ +└──────────────────────┴─────────┴───────┴─────────────────────┴──────────────────┘ +``` + +КлаÑтер в ÑиÑтемной таблице `system.clusters`: + +``` sql +SELECT cluster, shard_num, replica_num, host_name, host_address, port, is_local +FROM system.clusters WHERE cluster='r'; +``` + +``` text +┌─cluster─┬─shard_num─┬─replica_num─┬─host_name─┬─host_address─┬─port─┬─is_local─┠+│ r │ 1 │ 1 │ node3 │ 127.0.0.1 │ 9002 │ 0 │ +│ r │ 2 │ 1 │ node2 │ 127.0.0.1 │ 9001 │ 0 │ +│ r │ 2 │ 2 │ node1 │ 127.0.0.1 │ 9000 │ 1 │ +└─────────┴───────────┴─────────────┴───────────┴──────────────┴──────┴──────────┘ +``` + +Создадим раÑпределенную таблицу и вÑтавим в нее данные: + +``` sql +node2 :) CREATE TABLE r.d (n UInt64) ENGINE=Distributed('r','r','rmt', n % 2); +node3 :) INSERT INTO r SELECT * FROM numbers(10); +node1 :) SELECT materialize(hostName()) AS host, groupArray(n) FROM r.d GROUP BY host; +``` + +``` text +┌─hosts─┬─groupArray(n)─┠+│ node1 │ [1,3,5,7,9] │ +│ node2 │ [0,2,4,6,8] │ +└───────┴───────────────┘ +``` + +Добавление реплики: + +``` sql +node4 :) CREATE DATABASE r ENGINE=Replicated('some/path/r','other_shard','r2'); +``` + +ÐÐ¾Ð²Ð°Ñ Ñ€ÐµÐ¿Ð»Ð¸ÐºÐ° автоматичеÑки ÑоздаÑÑ‚ вÑе таблицы, которые еÑÑ‚ÑŒ в базе, а Ñтарые реплики перезагрузÑÑ‚ из ZooKeeper-а конфигурацию клаÑтера: + +``` text +┌─cluster─┬─shard_num─┬─replica_num─┬─host_name─┬─host_address─┬─port─┬─is_local─┠+│ r │ 1 │ 1 │ node3 │ 127.0.0.1 │ 9002 │ 0 │ +│ r │ 1 │ 2 │ node4 │ 127.0.0.1 │ 9003 │ 0 │ +│ r │ 2 │ 1 │ node2 │ 127.0.0.1 │ 9001 │ 0 │ +│ r │ 2 │ 2 │ node1 │ 127.0.0.1 │ 9000 │ 1 │ +└─────────┴───────────┴─────────────┴───────────┴──────────────┴──────┴──────────┘ +``` + +РаÑÐ¿Ñ€ÐµÐ´ÐµÐ»ÐµÐ½Ð½Ð°Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ†Ð° также получит данные от нового хоÑта: + +```sql +node2 :) SELECT materialize(hostName()) AS host, groupArray(n) FROM r.d GROUP BY host; +``` + +```text +┌─hosts─┬─groupArray(n)─┠+│ node2 │ [1,3,5,7,9] │ +│ node4 │ [0,2,4,6,8] │ +└───────┴───────────────┘ +``` \ No newline at end of file diff --git a/docs/ru/engines/table-engines/index.md b/docs/ru/engines/table-engines/index.md index b17b2124250..1636efc5112 100644 --- a/docs/ru/engines/table-engines/index.md +++ b/docs/ru/engines/table-engines/index.md @@ -87,7 +87,7 @@ toc_title: "Введение" 
Виртуальный Ñтолбец — Ñто неотъемлемый атрибут движка таблиц, определенный в иÑходном коде движка. -Виртуальные Ñтолбцы не надо указывать в запроÑе `CREATE TABLE` и их не отображаютÑÑ Ð² результатах запроÑов `SHOW CREATE TABLE` и `DESCRIBE TABLE`. Также виртуальные Ñтолбцы доÑтупны только Ð´Ð»Ñ Ñ‡Ñ‚ÐµÐ½Ð¸Ñ, поÑтому вы не можете вÑтавлÑÑ‚ÑŒ в них данные. +Виртуальные Ñтолбцы не надо указывать в запроÑе `CREATE TABLE` и они не отображаютÑÑ Ð² результатах запроÑов `SHOW CREATE TABLE` и `DESCRIBE TABLE`. Также виртуальные Ñтолбцы доÑтупны только Ð´Ð»Ñ Ñ‡Ñ‚ÐµÐ½Ð¸Ñ, поÑтому вы не можете вÑтавлÑÑ‚ÑŒ в них данные. Чтобы получить данные из виртуального Ñтолбца, необходимо указать его название в запроÑе `SELECT`. `SELECT *` не отображает данные из виртуальных Ñтолбцов. diff --git a/docs/ru/engines/table-engines/integrations/ExternalDistributed.md b/docs/ru/engines/table-engines/integrations/ExternalDistributed.md index 5b4386ff8b9..d5dd0782fb2 100644 --- a/docs/ru/engines/table-engines/integrations/ExternalDistributed.md +++ b/docs/ru/engines/table-engines/integrations/ExternalDistributed.md @@ -35,7 +35,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] - `password` — пароль пользователÑ. ## ОÑобенноÑти реализации {#implementation-details} - + Поддерживает неÑколько реплик, которые должны быть перечиÑлены через `|`, а шарды — через `,`. Ðапример: ```sql diff --git a/docs/ru/engines/table-engines/integrations/embedded-rocksdb.md b/docs/ru/engines/table-engines/integrations/embedded-rocksdb.md index 5a7909f63b2..617fd953406 100644 --- a/docs/ru/engines/table-engines/integrations/embedded-rocksdb.md +++ b/docs/ru/engines/table-engines/integrations/embedded-rocksdb.md @@ -15,16 +15,16 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1], name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2], ... -) ENGINE = EmbeddedRocksDB +) ENGINE = EmbeddedRocksDB PRIMARY KEY(primary_key_name); ``` ОбÑзательные параметры: - `primary_key_name` может быть любое Ð¸Ð¼Ñ Ñтолбца из ÑпиÑка Ñтолбцов. -- Указание первичного ключа `primary key` ÑвлÑетÑÑ Ð¾Ð±Ñзательным. Он будет Ñериализован в двоичном формате как ключ `rocksdb`. +- Указание первичного ключа `primary key` ÑвлÑетÑÑ Ð¾Ð±Ñзательным. Он будет Ñериализован в двоичном формате как ключ `rocksdb`. - ПоддерживаетÑÑ Ñ‚Ð¾Ð»ÑŒÐºÐ¾ один Ñтолбец в первичном ключе. -- Столбцы, которые отличаютÑÑ Ð¾Ñ‚ первичного ключа, будут Ñериализованы в двоичном формате как значение `rockdb` в ÑоответÑтвующем порÑдке. +- Столбцы, которые отличаютÑÑ Ð¾Ñ‚ первичного ключа, будут Ñериализованы в двоичном формате как значение `rockdb` в ÑоответÑтвующем порÑдке. - ЗапроÑÑ‹ Ñ Ñ„Ð¸Ð»ÑŒÑ‚Ñ€Ð°Ñ†Ð¸ÐµÐ¹ по ключу `equals` или `in` оптимизируютÑÑ Ð´Ð»Ñ Ð¿Ð¾Ð¸Ñка по неÑкольким ключам из `rocksdb`. 
Пример: diff --git a/docs/ru/engines/table-engines/integrations/mongodb.md b/docs/ru/engines/table-engines/integrations/mongodb.md index 97f903bdf89..05820d03fe6 100644 --- a/docs/ru/engines/table-engines/integrations/mongodb.md +++ b/docs/ru/engines/table-engines/integrations/mongodb.md @@ -37,7 +37,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name ``` text CREATE TABLE mongo_table ( - key UInt64, + key UInt64, data String ) ENGINE = MongoDB('mongo1:27017', 'test', 'simple_table', 'testuser', 'clickhouse'); ``` diff --git a/docs/ru/engines/table-engines/integrations/postgresql.md b/docs/ru/engines/table-engines/integrations/postgresql.md index caf3bb8c69a..b2b0db3067a 100644 --- a/docs/ru/engines/table-engines/integrations/postgresql.md +++ b/docs/ru/engines/table-engines/integrations/postgresql.md @@ -33,7 +33,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] - `table` — Ð¸Ð¼Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ†Ñ‹. - `user` — Ð¸Ð¼Ñ Ð¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ñ‚ÐµÐ»Ñ PostgreSQL. - `password` — пароль Ð¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ñ‚ÐµÐ»Ñ PostgreSQL. -- `schema` — Ð¸Ð¼Ñ Ñхемы, еÑли не иÑпользуетÑÑ Ñхема по умолчанию. ÐеобÑзательный аргумент. +- `schema` — Ð¸Ð¼Ñ Ñхемы, еÑли не иÑпользуетÑÑ Ñхема по умолчанию. ÐеобÑзательный аргумент. ## ОÑобенноÑти реализации {#implementation-details} @@ -49,14 +49,14 @@ PostgreSQL маÑÑивы конвертируютÑÑ Ð² маÑÑивы ClickHo !!! info "Внимание" Будьте внимательны, в PostgreSQL маÑÑивы, Ñозданные как `type_name[]`, ÑвлÑÑŽÑ‚ÑÑ Ð¼Ð½Ð¾Ð³Ð¾Ð¼ÐµÑ€Ð½Ñ‹Ð¼Ð¸ и могут Ñодержать в Ñебе разное количеÑтво измерений в разных Ñтроках одной таблицы. Внутри ClickHouse допуÑтимы только многомерные маÑÑивы Ñ Ð¾Ð´Ð¸Ð½Ð°ÐºÐ¾Ð²Ñ‹Ð¼ кол-вом измерений во вÑех Ñтроках таблицы. - + Поддерживает неÑколько реплик, которые должны быть перечиÑлены через `|`. Ðапример: ```sql CREATE TABLE test_replicas (id UInt32, name String) ENGINE = PostgreSQL(`postgres{2|3|4}:5432`, 'clickhouse', 'test_replicas', 'postgres', 'mysecretpassword'); ``` -При иÑпользовании ÑÐ»Ð¾Ð²Ð°Ñ€Ñ PostgreSQL поддерживаетÑÑ Ð¿Ñ€Ð¸Ð¾Ñ€Ð¸Ñ‚ÐµÑ‚ реплик. Чем больше номер реплики, тем ниже ее приоритет. ÐаивыÑший приоритет у реплики Ñ Ð½Ð¾Ð¼ÐµÑ€Ð¾Ð¼ `0`. +При иÑпользовании ÑÐ»Ð¾Ð²Ð°Ñ€Ñ PostgreSQL поддерживаетÑÑ Ð¿Ñ€Ð¸Ð¾Ñ€Ð¸Ñ‚ÐµÑ‚ реплик. Чем больше номер реплики, тем ниже ее приоритет. ÐаивыÑший приоритет у реплики Ñ Ð½Ð¾Ð¼ÐµÑ€Ð¾Ð¼ `0`. Ð’ примере ниже реплика `example01-1` имеет более выÑокий приоритет: @@ -119,7 +119,7 @@ ENGINE = PostgreSQL('localhost:5432', 'public', 'test', 'postges_user', 'postgre ``` ``` sql -SELECT * FROM postgresql_table WHERE str IN ('test'); +SELECT * FROM postgresql_table WHERE str IN ('test'); ``` ``` text @@ -143,7 +143,7 @@ CREATE TABLE pg_table_schema_with_dots (a UInt32) ENGINE PostgreSQL('localhost:5432', 'clickhouse', 'nice.table', 'postgrsql_user', 'password', 'nice.schema'); ``` -**См. также** +**См. 
также** - [Ð¢Ð°Ð±Ð»Ð¸Ñ‡Ð½Ð°Ñ Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ `postgresql`](../../../sql-reference/table-functions/postgresql.md) - [ИÑпользование PostgreSQL в качеÑтве иÑточника Ð´Ð»Ñ Ð²Ð½ÐµÑˆÐ½ÐµÐ³Ð¾ ÑловарÑ](../../../sql-reference/dictionaries/external-dictionaries/external-dicts-dict-sources.md#dicts-external_dicts_dict_sources-postgresql) diff --git a/docs/ru/engines/table-engines/log-family/index.md b/docs/ru/engines/table-engines/log-family/index.md index 7737eac2f43..5eaf8dd0cfc 100644 --- a/docs/ru/engines/table-engines/log-family/index.md +++ b/docs/ru/engines/table-engines/log-family/index.md @@ -14,6 +14,8 @@ toc_priority: 29 - [Log](log.md) - [TinyLog](tinylog.md) +Табличные движки ÑемейÑтва `Log` могут хранить данные в раÑпределенных файловых ÑиÑтемах [HDFS](../../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-hdfs) или [S3](../../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-s3). + ## Общие ÑвойÑтва {#obshchie-svoistva} Движки: diff --git a/docs/ru/engines/table-engines/log-family/log.md b/docs/ru/engines/table-engines/log-family/log.md index 6c5bf2221f8..8ead518ebee 100644 --- a/docs/ru/engines/table-engines/log-family/log.md +++ b/docs/ru/engines/table-engines/log-family/log.md @@ -5,9 +5,8 @@ toc_title: Log # Log {#log} -Движок отноÑитÑÑ Ðº ÑемейÑтву движков Log. Смотрите общие ÑвойÑтва и Ñ€Ð°Ð·Ð»Ð¸Ñ‡Ð¸Ñ Ð´Ð²Ð¸Ð¶ÐºÐ¾Ð² в Ñтатье [СемейÑтво Log](index.md). - -ОтличаетÑÑ Ð¾Ñ‚ [TinyLog](tinylog.md) тем, что вмеÑте Ñ Ñ„Ð°Ð¹Ð»Ð°Ð¼Ð¸ Ñтолбцов лежит небольшой файл «заÑечек». ЗаÑечки пишутÑÑ Ð½Ð° каждый блок данных и Ñодержат Ñмещение - Ñ ÐºÐ°ÐºÐ¾Ð³Ð¾ меÑта нужно читать файл, чтобы пропуÑтить заданное количеÑтво Ñтрок. Это позволÑет читать данные из таблицы в неÑколько потоков. -При конкурентном доÑтупе к данным, Ñ‡Ñ‚ÐµÐ½Ð¸Ñ Ð¼Ð¾Ð³ÑƒÑ‚ выполнÑÑ‚ÑŒÑÑ Ð¾Ð´Ð½Ð¾Ð²Ñ€ÐµÐ¼ÐµÐ½Ð½Ð¾, а запиÑи блокируют Ñ‡Ñ‚ÐµÐ½Ð¸Ñ Ð¸ друг друга. -Движок Log не поддерживает индекÑÑ‹. Также, еÑли при запиÑи в таблицу произошёл Ñбой, то таблица Ñтанет битой, и Ñ‡Ñ‚ÐµÐ½Ð¸Ñ Ð¸Ð· неё будут возвращать ошибку. Движок Log подходит Ð´Ð»Ñ Ð²Ñ€ÐµÐ¼ÐµÐ½Ð½Ñ‹Ñ… данных, write-once таблиц, а также Ð´Ð»Ñ Ñ‚ÐµÑтовых и демонÑтрационных целей. +Движок отноÑитÑÑ Ðº ÑемейÑтву движков `Log`. Смотрите общие ÑвойÑтва и Ñ€Ð°Ð·Ð»Ð¸Ñ‡Ð¸Ñ Ð´Ð²Ð¸Ð¶ÐºÐ¾Ð² в Ñтатье [СемейÑтво Log](../../../engines/table-engines/log-family/index.md). +ОтличаетÑÑ Ð¾Ñ‚ [TinyLog](../../../engines/table-engines/log-family/tinylog.md) тем, что вмеÑте Ñ Ñ„Ð°Ð¹Ð»Ð°Ð¼Ð¸ Ñтолбцов лежит небольшой файл "заÑечек". ЗаÑечки пишутÑÑ Ð½Ð° каждый блок данных и Ñодержат Ñмещение: Ñ ÐºÐ°ÐºÐ¾Ð³Ð¾ меÑта нужно читать файл, чтобы пропуÑтить заданное количеÑтво Ñтрок. Это позволÑет читать данные из таблицы в неÑколько потоков. +При конкурентном доÑтупе к данным Ñ‡Ñ‚ÐµÐ½Ð¸Ñ Ð¼Ð¾Ð³ÑƒÑ‚ выполнÑÑ‚ÑŒÑÑ Ð¾Ð´Ð½Ð¾Ð²Ñ€ÐµÐ¼ÐµÐ½Ð½Ð¾, а запиÑи блокируют Ñ‡Ñ‚ÐµÐ½Ð¸Ñ Ð¸ друг друга. +Движок `Log` не поддерживает индекÑÑ‹. Также, еÑли при запиÑи в таблицу произошёл Ñбой, то таблица Ñтанет битой, и Ñ‡Ñ‚ÐµÐ½Ð¸Ñ Ð¸Ð· нее будут возвращать ошибку. Движок `Log` подходит Ð´Ð»Ñ Ð²Ñ€ÐµÐ¼ÐµÐ½Ð½Ñ‹Ñ… данных, write-once таблиц, а также Ð´Ð»Ñ Ñ‚ÐµÑтовых и демонÑтрационных целей. 
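A minimal sketch of the write-once usage pattern described above (the table and column names are illustrative, not part of this patch):

``` sql
CREATE TABLE log_example (ts DateTime, message String) ENGINE = Log;

INSERT INTO log_example VALUES (now(), 'first entry'), (now(), 'second entry');

SELECT * FROM log_example;
```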
diff --git a/docs/ru/engines/table-engines/mergetree-family/collapsingmergetree.md b/docs/ru/engines/table-engines/mergetree-family/collapsingmergetree.md index 424fcbb5873..3e5acae2b1d 100644 --- a/docs/ru/engines/table-engines/mergetree-family/collapsingmergetree.md +++ b/docs/ru/engines/table-engines/mergetree-family/collapsingmergetree.md @@ -116,7 +116,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] Ð”Ð»Ñ ÐºÐ°Ð¶Ð´Ð¾Ð³Ð¾ результирующего куÑка данных ClickHouse ÑохранÑет: -1. Первую Ñтроку отмены ÑоÑтоÑÐ½Ð¸Ñ Ð¸ поÑледнюю Ñтроку ÑоÑтоÑниÑ, еÑли количеÑтво Ñтрок обоих видов Ñовпадает и поÑледнÑÑ Ñтрока — Ñтрока ÑоÑтоÑниÑ. +1. Первую Ñтроку отмены ÑоÑтоÑÐ½Ð¸Ñ Ð¸ поÑледнюю Ñтроку ÑоÑтоÑниÑ, еÑли количеÑтво Ñтрок обоих видов Ñовпадает и поÑледнÑÑ Ñтрока — Ñтрока ÑоÑтоÑниÑ. 2. ПоÑледнюю Ñтроку ÑоÑтоÑниÑ, еÑли Ñтрок ÑоÑтоÑÐ½Ð¸Ñ Ð½Ð° одну больше, чем Ñтрок отмены ÑоÑтоÑниÑ. 3. Первую Ñтроку отмены ÑоÑтоÑниÑ, еÑли их на одну больше, чем Ñтрок ÑоÑтоÑниÑ. 4. Ðи одну из Ñтрок во вÑех оÑтальных ÑлучаÑÑ…. diff --git a/docs/ru/engines/table-engines/mergetree-family/graphitemergetree.md b/docs/ru/engines/table-engines/mergetree-family/graphitemergetree.md index f3e915a413b..891d5227100 100644 --- a/docs/ru/engines/table-engines/mergetree-family/graphitemergetree.md +++ b/docs/ru/engines/table-engines/mergetree-family/graphitemergetree.md @@ -134,7 +134,7 @@ default - `regexp` – шаблон имени метрики. - `age` – минимальный возраÑÑ‚ данных в Ñекундах. - `precision` – точноÑÑ‚ÑŒ Ð¾Ð¿Ñ€ÐµÐ´ÐµÐ»ÐµÐ½Ð¸Ñ Ð²Ð¾Ð·Ñ€Ð°Ñта данных в Ñекундах. Должен быть делителем Ð´Ð»Ñ 86400 (количеÑтво Ñекунд в Ñутках). -- `function` – Ð¸Ð¼Ñ Ð°Ð³Ñ€ÐµÐ³Ð¸Ñ€ÑƒÑŽÑ‰ÐµÐ¹ функции, которую Ñледует применить к данным, чей возраÑÑ‚ оказалÑÑ Ð² интервале `[age, age + precision]`. +- `function` – Ð¸Ð¼Ñ Ð°Ð³Ñ€ÐµÐ³Ð¸Ñ€ÑƒÑŽÑ‰ÐµÐ¹ функции, которую Ñледует применить к данным, чей возраÑÑ‚ оказалÑÑ Ð² интервале `[age, age + precision]`. ДопуÑтимые функции: min/max/any/avg. Avg вычиÑлÑетÑÑ Ð½ÐµÑ‚Ð¾Ñ‡Ð½Ð¾, как Ñреднее от Ñредних. ### Пример конфигурации {#configuration-example} @@ -171,3 +171,6 @@ default ``` + +!!! warning "Внимание" + Прореживание данных производитÑÑ Ð²Ð¾ Ð²Ñ€ÐµÐ¼Ñ ÑлиÑний. Обычно Ð´Ð»Ñ Ñтарых партций ÑлиÑÐ½Ð¸Ñ Ð½Ðµ запуÑкаютÑÑ, поÑтому Ð´Ð»Ñ Ð¿Ñ€Ð¾Ñ€ÐµÐ¶Ð¸Ð²Ð°Ð½Ð¸Ñ Ð½Ð°Ð´Ð¾ иницировать незапланированное ÑлиÑние иÑÐ¿Ð¾Ð»ÑŒÐ·ÑƒÑ [optimize](../../../sql-reference/statements/optimize/). Или иÑпользовать дополнительные инÑтрументы, например [graphite-ch-optimizer](https://github.com/innogames/graphite-ch-optimizer). diff --git a/docs/ru/engines/table-engines/mergetree-family/mergetree.md b/docs/ru/engines/table-engines/mergetree-family/mergetree.md index e16f71b16cf..4bced6254d1 100644 --- a/docs/ru/engines/table-engines/mergetree-family/mergetree.md +++ b/docs/ru/engines/table-engines/mergetree-family/mergetree.md @@ -771,7 +771,6 @@ SETTINGS storage_policy = 'moving_from_ssd_to_hdd' - `cache_path` — путь в локальной файловой ÑиÑтеме, где будут хранитьÑÑ ÐºÑш заÑечек и файлы индекÑа. Значение по умолчанию: `/var/lib/clickhouse/disks//cache/`. - `skip_access_check` — признак, выполнÑÑ‚ÑŒ ли проверку доÑтупов при запуÑке диÑка. ЕÑли уÑтановлено значение `true`, то проверка не выполнÑетÑÑ. Значение по умолчанию: `false`. 
- ДиÑк S3 может быть Ñконфигурирован как `main` или `cold`: ``` xml @@ -810,3 +809,44 @@ SETTINGS storage_policy = 'moving_from_ssd_to_hdd' ``` ЕÑли диÑк Ñконфигурирован как `cold`, данные будут переноÑитьÑÑ Ð² S3 при Ñрабатывании правил TTL или когда Ñвободное меÑто на локальном диÑке Ñтанет меньше порогового значениÑ, которое определÑетÑÑ ÐºÐ°Ðº `move_factor * disk_size`. + +## ИÑпользование ÑервиÑа HDFS Ð´Ð»Ñ Ñ…Ñ€Ð°Ð½ÐµÐ½Ð¸Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ… {#table_engine-mergetree-hdfs} + +[HDFS](https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html) — Ñто раÑÐ¿Ñ€ÐµÐ´ÐµÐ»ÐµÐ½Ð½Ð°Ñ Ñ„Ð°Ð¹Ð»Ð¾Ð²Ð°Ñ ÑиÑтема Ð´Ð»Ñ ÑƒÐ´Ð°Ð»ÐµÐ½Ð½Ð¾Ð³Ð¾ Ñ…Ñ€Ð°Ð½ÐµÐ½Ð¸Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ…. + +Таблицы ÑемейÑтва `MergeTree` могут хранить данные в ÑервиÑе HDFS при иÑпользовании диÑка типа `HDFS`. + +Пример конфигурации: +``` xml + + + + + hdfs + hdfs://hdfs1:9000/clickhouse/ + + + + + +
+                    <disk>hdfs</disk>
+                </main>
+            </volumes>
+        </hdfs>
+    </policies>
+</storage_configuration>
+
+<merge_tree>
+    <min_bytes_for_wide_part>0</min_bytes_for_wide_part>
+</merge_tree>
+``` + +ОбÑзательные параметры: + +- `endpoint` — URL точки приема запроÑа на Ñтороне HDFS в формате `path`. URL точки должен Ñодержать путь к корневой директории на Ñервере, где хранÑÑ‚ÑÑ Ð´Ð°Ð½Ð½Ñ‹Ðµ. + +ÐеобÑзательные параметры: + +- `min_bytes_for_seek` — минимальное количеÑтво байтов, которые иÑпользуютÑÑ Ð´Ð»Ñ Ð¾Ð¿ÐµÑ€Ð°Ñ†Ð¸Ð¹ поиÑка вмеÑто поÑледовательного чтениÑ. Значение по умолчанию: 1 МБайт. diff --git a/docs/ru/engines/table-engines/mergetree-family/replication.md b/docs/ru/engines/table-engines/mergetree-family/replication.md index cb92084695a..6a259ebd3b8 100644 --- a/docs/ru/engines/table-engines/mergetree-family/replication.md +++ b/docs/ru/engines/table-engines/mergetree-family/replication.md @@ -65,7 +65,7 @@ ClickHouse хранит метаинформацию о репликах в [Apa Ð ÐµÐ¿Ð»Ð¸ÐºÐ°Ñ†Ð¸Ñ Ð°ÑинхроннаÑ, мульти-маÑтер. ЗапроÑÑ‹ `INSERT` и `ALTER` можно направлÑÑ‚ÑŒ на любой доÑтупный Ñервер. Данные вÑтавÑÑ‚ÑÑ Ð½Ð° Ñервер, где выполнен запроÑ, а затем ÑкопируютÑÑ Ð½Ð° оÑтальные Ñерверы. Ð’ ÑвÑзи Ñ Ð°ÑинхронноÑтью, только что вÑтавленные данные поÑвлÑÑŽÑ‚ÑÑ Ð½Ð° оÑтальных репликах Ñ Ð½ÐµÐ±Ð¾Ð»ÑŒÑˆÐ¾Ð¹ задержкой. ЕÑли чаÑÑ‚ÑŒ реплик недоÑтупна, данные на них запишутÑÑ Ñ‚Ð¾Ð³Ð´Ð°, когда они Ñтанут доÑтупны. ЕÑли реплика доÑтупна, то задержка ÑоÑтавлÑет Ñтолько времени, Ñколько требуетÑÑ Ð´Ð»Ñ Ð¿ÐµÑ€ÐµÐ´Ð°Ñ‡Ð¸ блока Ñжатых данных по Ñети. КоличеÑтво потоков Ð´Ð»Ñ Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ñ„Ð¾Ð½Ð¾Ð²Ñ‹Ñ… задач можно задать Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ наÑтройки [background_schedule_pool_size](../../../operations/settings/settings.md#background_schedule_pool_size). -Движок `ReplicatedMergeTree` иÑпользует отдельный пул потоков Ð´Ð»Ñ ÑÐºÐ°Ñ‡Ð¸Ð²Ð°Ð½Ð¸Ñ ÐºÑƒÑков данных. Размер пула ограничен наÑтройкой [background_fetches_pool_size](../../../operations/settings/settings.md#background_fetches_pool_size), которую можно указать при перезапуÑке Ñервера. +Движок `ReplicatedMergeTree` иÑпользует отдельный пул потоков Ð´Ð»Ñ ÑÐºÐ°Ñ‡Ð¸Ð²Ð°Ð½Ð¸Ñ ÐºÑƒÑков данных. Размер пула ограничен наÑтройкой [background_fetches_pool_size](../../../operations/settings/settings.md#background_fetches_pool_size), которую можно указать при перезапуÑке Ñервера. По умолчанию, Ð·Ð°Ð¿Ñ€Ð¾Ñ INSERT ждёт Ð¿Ð¾Ð´Ñ‚Ð²ÐµÑ€Ð¶Ð´ÐµÐ½Ð¸Ñ Ð·Ð°Ð¿Ð¸Ñи только от одной реплики. ЕÑли данные были уÑпешно запиÑаны только на одну реплику, и Ñервер Ñ Ñтой репликой переÑтал ÑущеÑтвовать, то запиÑанные данные будут потерÑны. Ð’Ñ‹ можете включить подтверждение запиÑи от неÑкольких реплик, иÑÐ¿Ð¾Ð»ÑŒÐ·ÑƒÑ Ð½Ð°Ñтройку `insert_quorum`. @@ -120,7 +120,7 @@ CREATE TABLE table_name -Как видно в примере, Ñти параметры могут Ñодержать подÑтановки в фигурных Ñкобках. ПодÑтавлÑемые Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð´Ð¾ÑтаютÑÑ Ð¸Ð· конфигурационного файла, из Ñекции «[macros](../../../operations/server-configuration-parameters/settings/#macros)». +Как видно в примере, Ñти параметры могут Ñодержать подÑтановки в фигурных Ñкобках. ПодÑтавлÑемые Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð´Ð¾ÑтаютÑÑ Ð¸Ð· конфигурационного файла, из Ñекции «[macros](../../../operations/server-configuration-parameters/settings/#macros)». Пример: @@ -142,7 +142,7 @@ CREATE TABLE table_name `table_name` - Ð¸Ð¼Ñ ÑƒÐ·Ð»Ð° Ð´Ð»Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ†Ñ‹ в ZooKeeper. Разумно делать его таким же, как Ð¸Ð¼Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ†Ñ‹. Оно указываетÑÑ Ñвно, так как, в отличие от имени таблицы, оно не менÑетÑÑ Ð¿Ð¾Ñле запроÑа RENAME. 
*ПодÑказка*: можно также указать Ð¸Ð¼Ñ Ð±Ð°Ð·Ñ‹ данных перед `table_name`, например `db_name.table_name` -Можно иÑпользовать две вÑтроенных подÑтановки `{database}` и `{table}`, они раÑкрываютÑÑ Ð² Ð¸Ð¼Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ†Ñ‹ и в Ð¸Ð¼Ñ Ð±Ð°Ð·Ñ‹ данных ÑоответÑтвенно (еÑли Ñти подÑтановки не переопределены в Ñекции `macros`). Т.о. Zookeeper путь можно задать как `'/clickhouse/tables/{layer}-{shard}/{database}/{table}'`. +Можно иÑпользовать две вÑтроенных подÑтановки `{database}` и `{table}`, они раÑкрываютÑÑ Ð² Ð¸Ð¼Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ†Ñ‹ и в Ð¸Ð¼Ñ Ð±Ð°Ð·Ñ‹ данных ÑоответÑтвенно (еÑли Ñти подÑтановки не переопределены в Ñекции `macros`). Т.о. Zookeeper путь можно задать как `'/clickhouse/tables/{layer}-{shard}/{database}/{table}'`. Будьте оÑторожны Ñ Ð¿ÐµÑ€ÐµÐ¸Ð¼ÐµÐ½Ð¾Ð²Ð°Ð½Ð¸Ñми таблицы при иÑпользовании Ñтих автоматичеÑких подÑтановок. Путь в Zookeeper-е Ð½ÐµÐ»ÑŒÐ·Ñ Ð¸Ð·Ð¼ÐµÐ½Ð¸Ñ‚ÑŒ, а подÑтановка при переименовании таблицы раÑкроетÑÑ Ð² другой путь, таблица будет обращатьÑÑ Ðº неÑущеÑтвующему в Zookeeper-е пути и перейдет в режим только Ð´Ð»Ñ Ñ‡Ñ‚ÐµÐ½Ð¸Ñ. Ð˜Ð¼Ñ Ñ€ÐµÐ¿Ð»Ð¸ÐºÐ¸ — то, что идентифицирует разные реплики одной и той же таблицы. Можно иÑпользовать Ð´Ð»Ñ Ð½ÐµÐ³Ð¾ Ð¸Ð¼Ñ Ñервера, как показано в примере. Впрочем, доÑтаточно, чтобы Ð¸Ð¼Ñ Ð±Ñ‹Ð»Ð¾ уникально лишь в пределах каждого шарда. @@ -163,7 +163,7 @@ CREATE TABLE table_name ``` sql CREATE TABLE table_name ( x UInt32 -) ENGINE = ReplicatedMergeTree +) ENGINE = ReplicatedMergeTree ORDER BY x; ``` @@ -172,7 +172,7 @@ ORDER BY x; ``` sql CREATE TABLE table_name ( x UInt32 -) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/table_name', '{replica}') +) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/table_name', '{replica}') ORDER BY x; ``` diff --git a/docs/ru/engines/table-engines/special/distributed.md b/docs/ru/engines/table-engines/special/distributed.md index 86eef35ebbc..b1f6f56623d 100644 --- a/docs/ru/engines/table-engines/special/distributed.md +++ b/docs/ru/engines/table-engines/special/distributed.md @@ -7,7 +7,7 @@ toc_title: Distributed **Движок Distributed не хранит данные ÑамоÑтоÑтельно**, а позволÑет обрабатывать запроÑÑ‹ раÑпределённо, на неÑкольких Ñерверах. Чтение автоматичеÑки раÑпараллеливаетÑÑ. При чтении будут иÑпользованы индекÑÑ‹ таблиц на удалённых Ñерверах, еÑли еÑÑ‚ÑŒ. -Движок Distributed принимает параметры: +Движок Distributed принимает параметры: - Ð¸Ð¼Ñ ÐºÐ»Ð°Ñтера в конфигурационном файле Ñервера @@ -21,7 +21,7 @@ toc_title: Distributed Смотрите также: - - наÑтройка `insert_distributed_sync` + - наÑтройка `insert_distributed_sync` - [MergeTree](../mergetree-family/mergetree.md#table_engine-mergetree-multiple-volumes) Ð´Ð»Ñ Ð¿Ñ€Ð¸Ð¼ÐµÑ€Ð° Пример: diff --git a/docs/ru/faq/general/columnar-database.md b/docs/ru/faq/general/columnar-database.md index f38e46cfe93..5ed6185736d 100644 --- a/docs/ru/faq/general/columnar-database.md +++ b/docs/ru/faq/general/columnar-database.md @@ -8,7 +8,7 @@ toc_priority: 101 Ð’ Ñтолбцовой БД данные каждого Ñтолбца хранÑÑ‚ÑÑ Ð¾Ñ‚Ð´ÐµÐ»ÑŒÐ½Ð¾ (незавиÑимо) от других Ñтолбцов. Такой принцип Ñ…Ñ€Ð°Ð½ÐµÐ½Ð¸Ñ Ð¿Ð¾Ð·Ð²Ð¾Ð»Ñет при выполнении запроÑа Ñчитывать Ñ Ð´Ð¸Ñка данные только тех Ñтолбцов, которые непоÑредÑтвенно учаÑтвуют в Ñтом запроÑе. ÐžÐ±Ñ€Ð°Ñ‚Ð½Ð°Ñ Ñторона такого принципа Ñ…Ñ€Ð°Ð½ÐµÐ½Ð¸Ñ Ð·Ð°ÐºÐ»ÑŽÑ‡Ð°ÐµÑ‚ÑÑ Ð² том, что выполнение операций над Ñтроками ÑтановитÑÑ Ð±Ð¾Ð»ÐµÐµ затратным. ClickHouse — типичный пример Ñтолбцовой СУБД. 
-Ключевые преимущеÑтва Ñтолбцовой СУБД: +Ключевые преимущеÑтва Ñтолбцовой СУБД: - выполнение запроÑов над отдельными Ñтолбцами таблицы, а не над вÑей таблицей Ñразу; - Ð°Ð³Ñ€ÐµÐ³Ð°Ñ†Ð¸Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñов на больших объемах данных; diff --git a/docs/ru/faq/general/mapreduce.md b/docs/ru/faq/general/mapreduce.md index 8a524c9f680..2e7520aef1a 100644 --- a/docs/ru/faq/general/mapreduce.md +++ b/docs/ru/faq/general/mapreduce.md @@ -6,7 +6,7 @@ toc_priority: 110 # Почему бы не иÑпользовать ÑиÑтемы типа MapReduce? {#why-not-use-something-like-mapreduce} -СиÑтемами типа MapReduce будем называть ÑиÑтемы раÑпределённых вычиÑлений, в которых Ð¾Ð¿ÐµÑ€Ð°Ñ†Ð¸Ñ Ñвёртки реализована на оÑнове раÑпределённой Ñортировки. Ðаиболее раÑпроÑтранённое решение Ñ Ð¾Ñ‚ÐºÑ€Ñ‹Ñ‚Ñ‹Ð¼ кодом в данном клаÑÑе — [Apache Hadoop](http://hadoop.apache.org). Ð¯Ð½Ð´ÐµÐºÑ Ð¿Ð¾Ð»ÑŒÐ·ÑƒÐµÑ‚ÑÑ ÑобÑтвенным решением — YT. +СиÑтемами типа MapReduce будем называть ÑиÑтемы раÑпределённых вычиÑлений, в которых Ð¾Ð¿ÐµÑ€Ð°Ñ†Ð¸Ñ Ñвёртки реализована на оÑнове раÑпределённой Ñортировки. Ðаиболее раÑпроÑтранённое решение Ñ Ð¾Ñ‚ÐºÑ€Ñ‹Ñ‚Ñ‹Ð¼ кодом в данном клаÑÑе — [Apache Hadoop](http://hadoop.apache.org). Ð¯Ð½Ð´ÐµÐºÑ Ð¿Ð¾Ð»ÑŒÐ·ÑƒÐµÑ‚ÑÑ ÑобÑтвенным решением — YT. Такие ÑиÑтемы не подходÑÑ‚ Ð´Ð»Ñ Ð¾Ð½Ð»Ð°Ð¹Ð½ запроÑов в Ñилу Ñлишком большой задержки. То еÑÑ‚ÑŒ не могут быть иÑпользованы в качеÑтве бÑкенда Ð´Ð»Ñ Ð²ÐµÐ±-интерфейÑа. Также Ñти ÑиÑтемы не подходÑÑ‚ Ð´Ð»Ñ Ð¾Ð±Ð½Ð¾Ð²Ð»ÐµÐ½Ð¸Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ… в реальном времени. РаÑÐ¿Ñ€ÐµÐ´ÐµÐ»Ñ‘Ð½Ð½Ð°Ñ Ñортировка ÑвлÑетÑÑ Ð½Ðµ оптимальным ÑпоÑобом Ð´Ð»Ñ Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð¾Ð¿ÐµÑ€Ð°Ñ†Ð¸Ð¸ Ñвёртки в Ñлучае запроÑов, выполнÑющихÑÑ Ð² режиме онлайн, потому что результат Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð¾Ð¿ÐµÑ€Ð°Ñ†Ð¸Ð¸ и вÑе промежуточные результаты (еÑли такие еÑÑ‚ÑŒ) помещаютÑÑ Ð² оперативную памÑÑ‚ÑŒ на одном Ñервере. Ð’ таком Ñлучае оптимальным ÑпоÑобом Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð¾Ð¿ÐµÑ€Ð°Ñ†Ð¸Ð¸ Ñвёртки ÑвлÑетÑÑ Ñ…ÐµÑˆ-таблица. ЧаÑтым ÑпоÑобом оптимизации "map-reduce" задач ÑвлÑетÑÑ Ð¿Ñ€ÐµÐ´Ð°Ð³Ñ€ÐµÐ³Ð°Ñ†Ð¸Ñ (чаÑÑ‚Ð¸Ñ‡Ð½Ð°Ñ Ñвёртка) Ñ Ð¸Ñпользованием хеш-таблицы в оперативной памÑти. Пользователь делает Ñту оптимизацию в ручном режиме. РаÑÐ¿Ñ€ÐµÐ´ÐµÐ»Ñ‘Ð½Ð½Ð°Ñ Ñортировка — оÑÐ½Ð¾Ð²Ð½Ð°Ñ Ð¿Ñ€Ð¸Ñ‡Ð¸Ð½Ð° тормозов при выполнении неÑложных задач типа "map-reduce". diff --git a/docs/ru/faq/general/ne-tormozit.md b/docs/ru/faq/general/ne-tormozit.md index 1230e34c475..c1cf87d5b0c 100644 --- a/docs/ru/faq/general/ne-tormozit.md +++ b/docs/ru/faq/general/ne-tormozit.md @@ -8,14 +8,14 @@ toc_priority: 11 Обычно Ñтот Ð²Ð¾Ð¿Ñ€Ð¾Ñ Ð²Ð¾Ð·Ð½Ð¸ÐºÐ°ÐµÑ‚, когда люди видÑÑ‚ официальные футболки ClickHouse. Ðа них большими буквами напиÑано **“ClickHouse не тормозитâ€**. -До того, как код ClickHouse Ñтал открытым, его разрабатывали как ÑобÑтвенную ÑиÑтему Ñ…Ñ€Ð°Ð½ÐµÐ½Ð¸Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ… в крупнейшей роÑÑийÑкой ИТ-компании [ЯндекÑ](https://yandex.com/company/). ПоÑтому оригинальный Ñлоган напиÑан по-руÑÑки. ПоÑле выхода верÑии Ñ Ð¾Ñ‚ÐºÑ€Ñ‹Ñ‚Ñ‹Ð¼ иÑходным кодом мы впервые выпуÑтили некоторое количеÑтво таких футболок Ð´Ð»Ñ Ð¼ÐµÑ€Ð¾Ð¿Ñ€Ð¸Ñтий в РоÑÑии, и проÑто оÑтавили прежний Ñлоган. +До того, как код ClickHouse Ñтал открытым, его разрабатывали как ÑобÑтвенную ÑиÑтему Ñ…Ñ€Ð°Ð½ÐµÐ½Ð¸Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ… в крупнейшей роÑÑийÑкой ИТ-компании [ЯндекÑ](https://yandex.com/company/). ПоÑтому оригинальный Ñлоган напиÑан по-руÑÑки. ПоÑле выхода верÑии Ñ Ð¾Ñ‚ÐºÑ€Ñ‹Ñ‚Ñ‹Ð¼ иÑходным кодом мы впервые выпуÑтили некоторое количеÑтво таких футболок Ð´Ð»Ñ Ð¼ÐµÑ€Ð¾Ð¿Ñ€Ð¸Ñтий в РоÑÑии, и проÑто оÑтавили прежний Ñлоган. 
Когда мы решили отправить партию Ñтих футболок на мероприÑÑ‚Ð¸Ñ Ð²Ð½Ðµ РоÑÑии, мы пробовали подобрать подходÑщий английÑкий Ñлоган. К Ñожалению, мы так и не Ñмогли придумать доÑтаточно точный и выразительный перевод, ведь на руÑÑком Ñтот Ñлоган звучит очень ёмко и при Ñтом довольно Ñлегантно. К тому же, ÑущеÑтвовало ограничение по количеÑтву Ñимволов на футболках. Ð’ итоге мы решили оÑтавить руÑÑкий вариант даже Ð´Ð»Ñ Ð¼ÐµÐ¶Ð´ÑƒÐ½Ð°Ñ€Ð¾Ð´Ð½Ñ‹Ñ… Ñобытий. И Ñто Ñтало прекраÑным решением, потому что люди по вÑему миру приÑтно удивлÑлиÑÑŒ, когда видели фразу и интереÑовалиÑÑŒ, что же там напиÑано. -Итак, как же объÑÑнить Ñту фразу на английÑком? Вот неÑколько вариантов: +Итак, как же объÑÑнить Ñту фразу на английÑком? Вот неÑколько вариантов: - ЕÑли переводить буквально, то получитÑÑ Ñ‡Ñ‚Ð¾-то подобное: *“ClickHouse doesn’t press the brake pedalâ€*. -- ЕÑли же вы хотите макÑимально Ñохранить том ÑмыÑл, который вкладывает в Ñту фразу человек из ИТ-Ñферы, то будет примерно Ñледующее: *“If your larger system lags, it’s not because it uses ClickHouseâ€*. +- ЕÑли же вы хотите макÑимально Ñохранить том ÑмыÑл, который вкладывает в Ñту фразу человек из ИТ-Ñферы, то будет примерно Ñледующее: *“If your larger system lags, it’s not because it uses ClickHouseâ€*. - Более короткие, но не такие точные верÑии: *“ClickHouse is not slowâ€*, *“ClickHouse doesn’t lagâ€* или проÑто *“ClickHouse is fastâ€*. ЕÑли вы не видели наших футболок, поÑмотрите видео о ClickHouse. Ðапример, вот Ñто: diff --git a/docs/ru/faq/general/olap.md b/docs/ru/faq/general/olap.md index 9dce0ffbdf7..42715a195ad 100644 --- a/docs/ru/faq/general/olap.md +++ b/docs/ru/faq/general/olap.md @@ -19,7 +19,7 @@ toc_priority: 100 ## OLAP Ñ Ñ‚Ð¾Ñ‡ÐºÐ¸ Ð·Ñ€ÐµÐ½Ð¸Ñ Ð±Ð¸Ð·Ð½ÐµÑа {#olap-from-the-business-perspective} -Ð’ поÑледние годы бизнеÑ-ÑообщеÑтво Ñтало оÑознавать ценноÑÑ‚ÑŒ данных. Компании, которые принимают Ñ€ÐµÑˆÐµÐ½Ð¸Ñ Ð²Ñлепую, чаще вÑего отÑтают от конкурентов. Управление бизнеÑом на оÑнове данных, которое применÑетÑÑ ÑƒÑпешными компаниÑми, побуждает Ñобирать вÑе данные, которые могут быть полезны в будущем Ð´Ð»Ñ Ð¿Ñ€Ð¸Ð½ÑÑ‚Ð¸Ñ Ð±Ð¸Ð·Ð½ÐµÑ-решений, а также подбирать механизмы, чтобы Ñвоевременно Ñти данные анализировать. Именно Ð´Ð»Ñ Ñтого и нужны СУБД Ñ OLAP. +Ð’ поÑледние годы бизнеÑ-ÑообщеÑтво Ñтало оÑознавать ценноÑÑ‚ÑŒ данных. Компании, которые принимают Ñ€ÐµÑˆÐµÐ½Ð¸Ñ Ð²Ñлепую, чаще вÑего отÑтают от конкурентов. Управление бизнеÑом на оÑнове данных, которое применÑетÑÑ ÑƒÑпешными компаниÑми, побуждает Ñобирать вÑе данные, которые могут быть полезны в будущем Ð´Ð»Ñ Ð¿Ñ€Ð¸Ð½ÑÑ‚Ð¸Ñ Ð±Ð¸Ð·Ð½ÐµÑ-решений, а также подбирать механизмы, чтобы Ñвоевременно Ñти данные анализировать. Именно Ð´Ð»Ñ Ñтого и нужны СУБД Ñ OLAP. С точки Ð·Ñ€ÐµÐ½Ð¸Ñ Ð±Ð¸Ð·Ð½ÐµÑа, OLAP позволÑет компаниÑм поÑтоÑнно планировать, анализировать и оценивать операционную деÑтельноÑÑ‚ÑŒ, чтобы повышать её ÑффективноÑÑ‚ÑŒ, уменьшать затраты и как ÑледÑтвие — увеличивать долю рынка. Это можно делать как в ÑобÑтвенной ÑиÑтеме, так и в облачной (SaaS), в веб или мобильных аналитичеÑких приложениÑÑ…, CRM-ÑиÑтемах и Ñ‚.д. Ð¢ÐµÑ…Ð½Ð¾Ð»Ð¾Ð³Ð¸Ñ OLAP иÑпользуетÑÑ Ð²Ð¾ многих приложениÑÑ… BI (Business Intelligence — бизнеÑ-аналитика). diff --git a/docs/ru/faq/general/why-clickhouse-is-so-fast.md b/docs/ru/faq/general/why-clickhouse-is-so-fast.md index 694488b40a9..fa73586fbcf 100644 --- a/docs/ru/faq/general/why-clickhouse-is-so-fast.md +++ b/docs/ru/faq/general/why-clickhouse-is-so-fast.md @@ -6,9 +6,9 @@ toc_priority: 8 # Почему ClickHouse так быÑтро работает? 
{#why-clickhouse-is-so-fast} -ПроизводительноÑÑ‚ÑŒ изначально заложена в архитектуре ClickHouse. Ð’Ñ‹ÑÐ¾ÐºÐ°Ñ ÑкороÑÑ‚ÑŒ Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñов была и оÑтаетÑÑ Ñамым важным критерием, который учитываетÑÑ Ð¿Ñ€Ð¸ разработке. Ðо мы обращаем внимание и на другие характериÑтики, такие как удобÑтво иÑпользованиÑ, маÑштабируемоÑÑ‚ÑŒ, безопаÑноÑÑ‚ÑŒ. Ð’ÑÑ‘ Ñто делает ClickHouse наÑтоÑщей промышленной разработкой. +ПроизводительноÑÑ‚ÑŒ изначально заложена в архитектуре ClickHouse. Ð’Ñ‹ÑÐ¾ÐºÐ°Ñ ÑкороÑÑ‚ÑŒ Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñов была и оÑтаетÑÑ Ñамым важным критерием, который учитываетÑÑ Ð¿Ñ€Ð¸ разработке. Ðо мы обращаем внимание и на другие характериÑтики, такие как удобÑтво иÑпользованиÑ, маÑштабируемоÑÑ‚ÑŒ, безопаÑноÑÑ‚ÑŒ. Ð’ÑÑ‘ Ñто делает ClickHouse наÑтоÑщей промышленной разработкой. -Сначала ClickHouse ÑоздавалÑÑ ÐºÐ°Ðº прототип, который должен был отлично ÑправлÑÑ‚ÑŒÑÑ Ñ Ð¾Ð´Ð½Ð¾Ð¹ единÑтвенной задачей — отбирать и агрегировать данные Ñ Ð¼Ð°ÐºÑимальной ÑкороÑтью. Это необходимо, чтобы Ñоздать обычный аналитичеÑкий отчет, и именно Ñто делает Ñтандартный Ð·Ð°Ð¿Ñ€Ð¾Ñ [GROUP BY](../../sql-reference/statements/select/group-by.md). Ð”Ð»Ñ Ñ€ÐµÑˆÐµÐ½Ð¸Ñ Ñ‚Ð°ÐºÐ¾Ð¹ задачи команда разработки ClickHouse принÑла неÑколько архитектурных решений: +Сначала ClickHouse ÑоздавалÑÑ ÐºÐ°Ðº прототип, который должен был отлично ÑправлÑÑ‚ÑŒÑÑ Ñ Ð¾Ð´Ð½Ð¾Ð¹ единÑтвенной задачей — отбирать и агрегировать данные Ñ Ð¼Ð°ÐºÑимальной ÑкороÑтью. Это необходимо, чтобы Ñоздать обычный аналитичеÑкий отчет, и именно Ñто делает Ñтандартный Ð·Ð°Ð¿Ñ€Ð¾Ñ [GROUP BY](../../sql-reference/statements/select/group-by.md). Ð”Ð»Ñ Ñ€ÐµÑˆÐµÐ½Ð¸Ñ Ñ‚Ð°ÐºÐ¾Ð¹ задачи команда разработки ClickHouse принÑла неÑколько архитектурных решений: Столбцовое хранение данных : ИÑходные данные чаÑто Ñодержат Ñотни или даже Ñ‚Ñ‹ÑÑчи Ñтолбцов, в то Ð²Ñ€ÐµÐ¼Ñ ÐºÐ°Ðº Ð´Ð»Ñ ÐºÐ¾Ð½ÐºÑ€ÐµÑ‚Ð½Ð¾Ð³Ð¾ отчета нужны только неÑколько из них. СиÑтема не должна читать ненужные Ñтолбцы, поÑкольку операции Ñ‡Ñ‚ÐµÐ½Ð¸Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ… Ñ Ð´Ð¸Ñка — Ñамые дорогоÑтоÑщие. @@ -17,7 +17,7 @@ toc_priority: 8 : ClickHouse хранит Ñтруктуры данных в оперативной памÑти, что позволÑет Ñчитывать не только нужные Ñтолбцы, но и нужные диапазоны Ñтрок Ð´Ð»Ñ Ñтих Ñтолбцов. Сжатие данных -: Различные ÑпоÑобы Ñ…Ñ€Ð°Ð½ÐµÐ½Ð¸Ñ Ñмежных значений в Ñтолбце позволÑÑŽÑ‚ доÑтигать более выÑокой Ñтепени ÑÐ¶Ð°Ñ‚Ð¸Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ… (по Ñравнению Ñ Ð¾Ð±Ñ‹Ñ‡Ð½Ñ‹Ð¼Ð¸ Ñтроковыми СУБД), Ñ‚.к. в Ñмежных Ñтроках Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ñ‡Ð°Ñто бывают одинаковыми или близкими. Ð’ дополнение к универÑальному Ñжатию ClickHouse поддерживает [Ñпециализированные кодеки](../../sql-reference/statements/create/table.md#create-query-specialized-codecs), которые позволÑÑŽÑ‚ еще больше уменьшить объемы хранимых данных. +: Различные ÑпоÑобы Ñ…Ñ€Ð°Ð½ÐµÐ½Ð¸Ñ Ñмежных значений в Ñтолбце позволÑÑŽÑ‚ доÑтигать более выÑокой Ñтепени ÑÐ¶Ð°Ñ‚Ð¸Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ… (по Ñравнению Ñ Ð¾Ð±Ñ‹Ñ‡Ð½Ñ‹Ð¼Ð¸ Ñтроковыми СУБД), Ñ‚.к. в Ñмежных Ñтроках Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ñ‡Ð°Ñто бывают одинаковыми или близкими. Ð’ дополнение к универÑальному Ñжатию ClickHouse поддерживает [Ñпециализированные кодеки](../../sql-reference/statements/create/table.md#create-query-specialized-codecs), которые позволÑÑŽÑ‚ еще больше уменьшить объемы хранимых данных. Векторные запроÑÑ‹ : ClickHouse не только хранит, но и обрабатывает данные в Ñтолбцах. Это приводит к лучшей утилизации кеша процеÑÑора и позволÑет иÑпользовать инÑтрукции [SIMD](https://en.wikipedia.org/wiki/SIMD). @@ -45,7 +45,7 @@ toc_priority: 8 - Как Ñравнивать данные? 
- Ðе ÑвлÑÑŽÑ‚ÑÑ Ð»Ð¸ данные чаÑтично отÑортированными? -Ðлгоритмы, оÑнованные на характериÑтиках рабочих данных, обычно дают лучшие результаты, чем их более универÑальные аналоги. ЕÑли заранее неизвеÑтно, Ñ ÐºÐ°ÐºÐ¸Ð¼Ð¸ данными придетÑÑ Ñ€Ð°Ð±Ð¾Ñ‚Ð°Ñ‚ÑŒ, ClickHouse будет в процеÑÑе Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð¿Ñ€Ð¾Ð±Ð¾Ð²Ð°Ñ‚ÑŒ различные реализации и в итоге выберет оптимальный вариант. Ðапример, рекомендуем прочитать [Ñтатью о том, как в ClickHouse реализуетÑÑ Ñ€Ð°Ñпаковка LZ4](https://habr.com/en/company/yandex/blog/457612/). +Ðлгоритмы, оÑнованные на характериÑтиках рабочих данных, обычно дают лучшие результаты, чем их более универÑальные аналоги. ЕÑли заранее неизвеÑтно, Ñ ÐºÐ°ÐºÐ¸Ð¼Ð¸ данными придетÑÑ Ñ€Ð°Ð±Ð¾Ñ‚Ð°Ñ‚ÑŒ, ClickHouse будет в процеÑÑе Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð¿Ñ€Ð¾Ð±Ð¾Ð²Ð°Ñ‚ÑŒ различные реализации и в итоге выберет оптимальный вариант. Ðапример, рекомендуем прочитать [Ñтатью о том, как в ClickHouse реализуетÑÑ Ñ€Ð°Ñпаковка LZ4](https://habr.com/en/company/yandex/blog/457612/). Ðу и поÑледнее, но тем не менее важное уÑловие: команда ClickHouse поÑтоÑнно отÑлеживает в интернете ÑÐ¾Ð¾Ð±Ñ‰ÐµÐ½Ð¸Ñ Ð¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ñ‚ÐµÐ»ÐµÐ¹ о найденных ими удачных реализациÑÑ…, алгоритмах или Ñтруктурах данных, анализирует и пробует новые идеи. Иногда в Ñтом потоке Ñообщений попадаютÑÑ Ð´ÐµÐ¹Ñтвительно ценные предложениÑ. diff --git a/docs/ru/faq/index.md b/docs/ru/faq/index.md index 08deec5f7ce..90de6b0aa84 100644 --- a/docs/ru/faq/index.md +++ b/docs/ru/faq/index.md @@ -10,7 +10,7 @@ toc_priority: 76 Категории: -- **[Общие вопроÑÑ‹](general/index.md)** +- **[Общие вопроÑÑ‹](general/index.md)** - [Что такое ClickHouse?](../index.md#what-is-clickhouse) - [Почему ClickHouse такой быÑтрый?](general/why-clickhouse-is-so-fast.md) - [Кто пользуетÑÑ ClickHouse?](general/who-is-using-clickhouse.md) @@ -22,7 +22,7 @@ toc_priority: 76 - **[Применение](use-cases/index.md)** - [Можно ли иÑпользовать ClickHouse как БД временных Ñ€Ñдов?](use-cases/time-series.md) - [Можно ли иÑпользовать ClickHouse Ð´Ð»Ñ Ñ…Ñ€Ð°Ð½ÐµÐ½Ð¸Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ… вида "ключ-значение"?](use-cases/key-value.md) -- **[Операции](operations/index.md)** +- **[Операции](operations/index.md)** - [Какую верÑию ClickHouse иÑпользовать?](operations/production.md) - [Возможно ли удалить Ñтарые запиÑи из таблицы ClickHouse?](operations/delete-old-data.md) - **[ИнтеграциÑ](integration/index.md)** diff --git a/docs/ru/faq/operations/production.md b/docs/ru/faq/operations/production.md index a82a7f5e888..3a2ef965958 100644 --- a/docs/ru/faq/operations/production.md +++ b/docs/ru/faq/operations/production.md @@ -8,14 +8,14 @@ toc_priority: 10 Во-первых, давайте обÑудим, почему возникает Ñтот вопроÑ. ЕÑÑ‚ÑŒ две оÑновные причины: -1. ClickHouse развиваетÑÑ Ð´Ð¾Ñтаточно быÑтро, и обычно мы выпуÑкаем более 10 Ñтабильных релизов в год. Так что еÑÑ‚ÑŒ из чего выбрать, а Ñто не вÑегда проÑто. +1. ClickHouse развиваетÑÑ Ð´Ð¾Ñтаточно быÑтро, и обычно мы выпуÑкаем более 10 Ñтабильных релизов в год. Так что еÑÑ‚ÑŒ из чего выбрать, а Ñто не вÑегда проÑто. 2. Ðекоторые пользователи не хотÑÑ‚ тратить Ð²Ñ€ÐµÐ¼Ñ Ð½Ð° анализ того, ÐºÐ°ÐºÐ°Ñ Ð²ÐµÑ€ÑÐ¸Ñ Ð»ÑƒÑ‡ÑˆÐµ подходит Ð´Ð»Ñ Ð¸Ñ… задач, и проÑто хотÑÑ‚ получить Ñовет от ÑкÑперта. Ð’Ñ‚Ð¾Ñ€Ð°Ñ Ð¿Ñ€Ð¸Ñ‡Ð¸Ð½Ð° более веÑомаÑ, так что начнем Ñ Ð½ÐµÐµ, а затем раÑÑмотрим, какие бывают релизы ClickHouse. ## Какую верÑию ClickHouse вы поÑоветуете? {#which-clickhouse-version-do-you-recommend} -КазалоÑÑŒ бы, Ñамый удобный вариант — нанÑÑ‚ÑŒ конÑультанта или доверитьÑÑ ÑкÑперту, и делегировать ему ответÑтвенноÑÑ‚ÑŒ за вашу ÑиÑтему. 
Ð’Ñ‹ уÑтанавливаете ту верÑию ClickHouse, которую вам рекомендовали, и теперь еÑли что-то пойдет не так — Ñто уже не ваша вина. Ðа Ñамом деле Ñто не так. Ðикто не может знать лучше ваÑ, что проиÑходит в вашей ÑиÑтеме. +КазалоÑÑŒ бы, Ñамый удобный вариант — нанÑÑ‚ÑŒ конÑультанта или доверитьÑÑ ÑкÑперту, и делегировать ему ответÑтвенноÑÑ‚ÑŒ за вашу ÑиÑтему. Ð’Ñ‹ уÑтанавливаете ту верÑию ClickHouse, которую вам рекомендовали, и теперь еÑли что-то пойдет не так — Ñто уже не ваша вина. Ðа Ñамом деле Ñто не так. Ðикто не может знать лучше ваÑ, что проиÑходит в вашей ÑиÑтеме. Как же правильно выбрать верÑию ClickHouse, на которую Ñтоит обновитьÑÑ? Или как выбрать верÑию, Ñ ÐºÐ¾Ñ‚Ð¾Ñ€Ð¾Ð¹ Ñледует начать, еÑли вы только внедрÑете ClickHouse? Во-первых, мы рекомендуем позаботитьÑÑ Ð¾ Ñоздании **реалиÑтичной теÑтовой Ñреды** (pre-production). Ð’ идеальном мире Ñто была бы Ð¿Ð¾Ð»Ð½Ð°Ñ ÐºÐ¾Ð¿Ð¸Ñ Ñ€Ð°Ð±Ð¾Ñ‡ÐµÐ¹ Ñреды, но чаще вÑего такое решение оказываетÑÑ Ñлишком дорогоÑтоÑщим. @@ -25,8 +25,8 @@ toc_priority: 10 - Ðе иÑпользуйте теÑтовую Ñреду в режиме "только Ð´Ð»Ñ Ñ‡Ñ‚ÐµÐ½Ð¸Ñ", Ñ€Ð°Ð±Ð¾Ñ‚Ð°Ñ Ñ ÐºÐ°ÐºÐ¸Ð¼-то Ñтатичным набором данных. - Ðе иÑпользуйте её в режиме "только Ð´Ð»Ñ Ð·Ð°Ð¿Ð¸Ñи", проверÑÑ Ð»Ð¸ÑˆÑŒ копирование данных, без поÑÑ‚Ñ€Ð¾ÐµÐ½Ð¸Ñ Ñ‚Ð¸Ð¿Ð¾Ð²Ñ‹Ñ… отчетов. - Ðе очищайте её, удалÑÑ Ð²Ñе данные подчиÑтую вмеÑто теÑÑ‚Ð¸Ñ€Ð¾Ð²Ð°Ð½Ð¸Ñ Ñ€Ð°Ð±Ð¾Ñ‡Ð¸Ñ… Ñхем миграции. -- ВыполнÑйте реальные запроÑÑ‹ на выборке из реальных рабочих данных. ПоÑтарайтеÑÑŒ подготовить репрезентативную выборку, на которой Ð·Ð°Ð¿Ñ€Ð¾Ñ `SELECT` будет возвращать адекватные результаты. ЕÑли регламенты безопаÑноÑти не позволÑÑŽÑ‚ иÑпользовать реальные данные за пределами защищенной рабочей Ñреды, иÑпользуйте обфуÑкацию. -- УбедитеÑÑŒ, что теÑÑ‚Ð¾Ð²Ð°Ñ Ñреда находитÑÑ Ð¿Ð¾Ð´ контролем тех же ÑиÑтем мониторинга и оповещениÑ, что и рабочаÑ. +- ВыполнÑйте реальные запроÑÑ‹ на выборке из реальных рабочих данных. ПоÑтарайтеÑÑŒ подготовить репрезентативную выборку, на которой Ð·Ð°Ð¿Ñ€Ð¾Ñ `SELECT` будет возвращать адекватные результаты. ЕÑли регламенты безопаÑноÑти не позволÑÑŽÑ‚ иÑпользовать реальные данные за пределами защищенной рабочей Ñреды, иÑпользуйте обфуÑкацию. +- УбедитеÑÑŒ, что теÑÑ‚Ð¾Ð²Ð°Ñ Ñреда находитÑÑ Ð¿Ð¾Ð´ контролем тех же ÑиÑтем мониторинга и оповещениÑ, что и рабочаÑ. - ЕÑли ваша Ñ€Ð°Ð±Ð¾Ñ‡Ð°Ñ Ñреда раÑпределена между разными дата-центрами и регионами, теÑÑ‚Ð¾Ð²Ð°Ñ Ñреда должна быть такой же. - ЕÑли в рабочей Ñреде иÑпользуютÑÑ Ñложные инÑтрументы типа репликации, раÑпределённых таблиц или каÑкадных материализованных предÑтавлений, теÑÑ‚Ð¾Ð²Ð°Ñ Ñреда должна быть Ñконфигурирована так же. - Обычно в теÑтовой Ñреде ÑтараютÑÑ Ð¸Ñпользовать то же количеÑтво Ñерверов и виртуальных машин, что и в рабочей, но делают их меньшего объема. Либо наоборот, иÑпользуют ÑущеÑтвенно меньшее чиÑло Ñерверов и Ð’Ðœ, но тех же объемов. Первый вариант Ñкорее позволит обнаружить проблемы, ÑвÑзанные Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ð¾Ð¹ Ñети, а второй вариант более проÑÑ‚ в управлении. diff --git a/docs/ru/faq/use-cases/key-value.md b/docs/ru/faq/use-cases/key-value.md index 4daa9773f84..b751bb2ce75 100644 --- a/docs/ru/faq/use-cases/key-value.md +++ b/docs/ru/faq/use-cases/key-value.md @@ -8,7 +8,7 @@ toc_priority: 101 ЕÑли отвечать коротко, то **"нет"**. Операции над данными вида "ключ-значение" занимают одну из верхних позиций в ÑпиÑке Ñитуаций, когда категоричеÑки **не Ñтоит**{.text-danger} иÑпользовать ClickHouse. 
Это [OLAP](../../faq/general/olap.md) СУБД, в то Ð²Ñ€ÐµÐ¼Ñ ÐºÐ°Ðº еÑÑ‚ÑŒ много Ñпециализированных СУБД Ð´Ð»Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ… вида "ключ-значение". -Тем не менее, в некоторых ÑитуациÑÑ… имеет ÑмыÑл иÑпользовать ClickHouse Ð´Ð»Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñов над данными вида "ключ-значение". Чаще вÑего Ñто отноÑитÑÑ Ðº ÑиÑтемам Ñ Ð¾Ñ‚Ð½Ð¾Ñительно невыÑокой нагрузкой, в которых оÑновной объем операций отноÑитÑÑ Ðº аналитичеÑкой обработке данных и отлично подходит Ð´Ð»Ñ ClickHouse. Однако в них еÑÑ‚ÑŒ некий второÑтепенный процеÑÑ, в котором нужно обрабатывать данные вида "ключ-значение", при Ñтом процеÑÑ Ð½Ðµ требует Ñлишком выÑокой производительноÑти и не имеет Ñтрогих ограничений по задержкам Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñов. ЕÑли у Ð²Ð°Ñ Ð½ÐµÑ‚ ограничений по бюджету, вы можете иÑпользовать Ð´Ð»Ñ Ñ‚Ð°ÐºÐ¸Ñ… операций вÑпомогательную базу данных "ключ-значение", но Ñто увеличит раÑходы на обÑлуживание еще одной СУБД (мониторинг, бÑкапы и Ñ‚.д.). +Тем не менее, в некоторых ÑитуациÑÑ… имеет ÑмыÑл иÑпользовать ClickHouse Ð´Ð»Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñов над данными вида "ключ-значение". Чаще вÑего Ñто отноÑитÑÑ Ðº ÑиÑтемам Ñ Ð¾Ñ‚Ð½Ð¾Ñительно невыÑокой нагрузкой, в которых оÑновной объем операций отноÑитÑÑ Ðº аналитичеÑкой обработке данных и отлично подходит Ð´Ð»Ñ ClickHouse. Однако в них еÑÑ‚ÑŒ некий второÑтепенный процеÑÑ, в котором нужно обрабатывать данные вида "ключ-значение", при Ñтом процеÑÑ Ð½Ðµ требует Ñлишком выÑокой производительноÑти и не имеет Ñтрогих ограничений по задержкам Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñов. ЕÑли у Ð²Ð°Ñ Ð½ÐµÑ‚ ограничений по бюджету, вы можете иÑпользовать Ð´Ð»Ñ Ñ‚Ð°ÐºÐ¸Ñ… операций вÑпомогательную базу данных "ключ-значение", но Ñто увеличит раÑходы на обÑлуживание еще одной СУБД (мониторинг, бÑкапы и Ñ‚.д.). ЕÑли вы вÑе же решите не Ñледовать рекомендациÑм и иÑпользовать ClickHouse Ð´Ð»Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ñ‹ Ñ Ð´Ð°Ð½Ð½Ñ‹Ð¼Ð¸ вида "ключ-значение", вот неÑколько Ñоветов: diff --git a/docs/ru/getting-started/install.md b/docs/ru/getting-started/install.md index 66a94bcfbca..1cbeb70ef96 100644 --- a/docs/ru/getting-started/install.md +++ b/docs/ru/getting-started/install.md @@ -100,11 +100,11 @@ sudo ./clickhouse install Ð”Ð»Ñ Ð´Ñ€ÑƒÐ³Ð¸Ñ… операционных ÑиÑтем и архитектуры AArch64 Ñборки ClickHouse предоÑтавлÑÑŽÑ‚ÑÑ Ð² виде кроÑÑ-компилированного бинарного файла из поÑледнего коммита ветки `master` (Ñ Ð·Ð°Ð´ÐµÑ€Ð¶ÐºÐ¾Ð¹ в неÑколько чаÑов). -- [macOS](https://builds.clickhouse.tech/master/macos/clickhouse) — `curl -O 'https://builds.clickhouse.tech/master/macos/clickhouse' && chmod a+x ./clickhouse` -- [AArch64](https://builds.clickhouse.tech/master/aarch64/clickhouse) — `curl -O 'https://builds.clickhouse.tech/master/aarch64/clickhouse' && chmod a+x ./clickhouse` -- [FreeBSD](https://builds.clickhouse.tech/master/freebsd/clickhouse) — `curl -O 'https://builds.clickhouse.tech/master/freebsd/clickhouse' && chmod a+x ./clickhouse` +- [macOS](https://builds.clickhouse.tech/master/macos/clickhouse) — `curl -O 'https://builds.clickhouse.tech/master/macos/clickhouse' && chmod a+x ./clickhouse` +- [FreeBSD](https://builds.clickhouse.tech/master/freebsd/clickhouse) — `curl -O 'https://builds.clickhouse.tech/master/freebsd/clickhouse' && chmod a+x ./clickhouse` +- [AArch64](https://builds.clickhouse.tech/master/aarch64/clickhouse) — `curl -O 'https://builds.clickhouse.tech/master/aarch64/clickhouse' && chmod a+x ./clickhouse` -ПоÑле ÑÐºÐ°Ñ‡Ð¸Ð²Ð°Ð½Ð¸Ñ Ð¼Ð¾Ð¶Ð½Ð¾ воÑпользоватьÑÑ `clickhouse client` Ð´Ð»Ñ Ð¿Ð¾Ð´ÐºÐ»ÑŽÑ‡ÐµÐ½Ð¸Ñ Ðº Ñерверу или `clickhouse local` Ð´Ð»Ñ Ð¾Ð±Ñ€Ð°Ð±Ð¾Ñ‚ÐºÐ¸ локальных данных. 
+ПоÑле ÑÐºÐ°Ñ‡Ð¸Ð²Ð°Ð½Ð¸Ñ Ð¼Ð¾Ð¶Ð½Ð¾ воÑпользоватьÑÑ `clickhouse client` Ð´Ð»Ñ Ð¿Ð¾Ð´ÐºÐ»ÑŽÑ‡ÐµÐ½Ð¸Ñ Ðº Ñерверу или `clickhouse local` Ð´Ð»Ñ Ð¾Ð±Ñ€Ð°Ð±Ð¾Ñ‚ÐºÐ¸ локальных данных. Чтобы уÑтановить ClickHouse в рамках вÑей ÑиÑтемы (Ñ Ð½ÐµÐ¾Ð±Ñ…Ð¾Ð´Ð¸Ð¼Ñ‹Ð¼Ð¸ конфигурационными файлами, наÑтройками пользователей и Ñ‚.д.), выполните `sudo ./clickhouse install`. Затем выполните команды `clickhouse start` (чтобы запуÑтить Ñервер) и `clickhouse-client` (чтобы подключитьÑÑ Ðº нему). diff --git a/docs/ru/guides/apply-catboost-model.md b/docs/ru/guides/apply-catboost-model.md index a9cba1d7f70..d864d6c75cc 100644 --- a/docs/ru/guides/apply-catboost-model.md +++ b/docs/ru/guides/apply-catboost-model.md @@ -162,7 +162,7 @@ FROM amazon_train ``` !!! note "Примечание" Ð’Ñ‹ можете позднее изменить путь к конфигурации модели CatBoost без перезагрузки Ñервера. - + ## 4. ЗапуÑтите вывод модели из SQL {#run-model-inference} Ð”Ð»Ñ Ñ‚ÐµÑÑ‚Ð¸Ñ€Ð¾Ð²Ð°Ð½Ð¸Ñ Ð¼Ð¾Ð´ÐµÐ»Ð¸ запуÑтите клиент ClickHouse `$ clickhouse client`. diff --git a/docs/ru/interfaces/formats.md b/docs/ru/interfaces/formats.md index 7780a75a706..563a137ac17 100644 --- a/docs/ru/interfaces/formats.md +++ b/docs/ru/interfaces/formats.md @@ -1165,12 +1165,14 @@ SELECT * FROM topic1_stream; | `DOUBLE` | [Float64](../sql-reference/data-types/float.md) | `DOUBLE` | | `DATE32` | [Date](../sql-reference/data-types/date.md) | `UINT16` | | `DATE64`, `TIMESTAMP` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` | -| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `STRING` | -| — | [FixedString](../sql-reference/data-types/fixedstring.md) | `STRING` | +| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `BINARY` | +| — | [FixedString](../sql-reference/data-types/fixedstring.md) | `BINARY` | | `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | `DECIMAL` | | `LIST` | [Array](../sql-reference/data-types/array.md) | `LIST` | +| `STRUCT` | [Tuple](../sql-reference/data-types/tuple.md) | `STRUCT` | +| `MAP` | [Map](../sql-reference/data-types/map.md) | `MAP` | -МаÑÑивы могут быть вложенными и иметь в качеÑтве аргумента значение типа `Nullable`. +МаÑÑивы могут быть вложенными и иметь в качеÑтве аргумента значение типа `Nullable`. Типы `Tuple` и `Map` также могут быть вложенными. ClickHouse поддерживает наÑтраиваемую точноÑÑ‚ÑŒ Ð´Ð»Ñ Ñ„Ð¾Ñ€Ð¼Ð°Ñ‚Ð° `Decimal`. При выполнении запроÑа `INSERT` ClickHouse обрабатывает тип данных Parquet `DECIMAL` как `Decimal128`. 
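As a hypothetical illustration of the nested-type mappings listed above (the table name is invented; `Map` may require enabling `allow_experimental_map_type` depending on the server version):

``` sql
-- Invented table covering the Array (LIST), Tuple (STRUCT) and Map (MAP) mappings.
CREATE TABLE IF NOT EXISTS parquet_demo
(
    ids    Array(Nullable(UInt32)),
    point  Tuple(Float64, Float64),
    labels Map(String, UInt64)
)
ENGINE = MergeTree
ORDER BY tuple();

-- Export/import through clickhouse-client, for example:
--   clickhouse-client --query="SELECT * FROM parquet_demo FORMAT Parquet" > demo.parquet
--   cat demo.parquet | clickhouse-client --query="INSERT INTO parquet_demo FORMAT Parquet"
```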
@@ -1218,12 +1220,17 @@ $ clickhouse-client --query="SELECT * FROM {some_table} FORMAT Parquet" > {some_ | `DOUBLE` | [Float64](../sql-reference/data-types/float.md) | `FLOAT64` | | `DATE32` | [Date](../sql-reference/data-types/date.md) | `UINT16` | | `DATE64`, `TIMESTAMP` | [DateTime](../sql-reference/data-types/datetime.md) | `UINT32` | -| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `UTF8` | -| `STRING`, `BINARY` | [FixedString](../sql-reference/data-types/fixedstring.md) | `UTF8` | +| `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `BINARY` | +| `STRING`, `BINARY` | [FixedString](../sql-reference/data-types/fixedstring.md) | `BINARY` | | `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | `DECIMAL` | +| `DECIMAL256` | [Decimal256](../sql-reference/data-types/decimal.md)| `DECIMAL256` | | `LIST` | [Array](../sql-reference/data-types/array.md) | `LIST` | +| `STRUCT` | [Tuple](../sql-reference/data-types/tuple.md) | `STRUCT` | +| `MAP` | [Map](../sql-reference/data-types/map.md) | `MAP` | -МаÑÑивы могут быть вложенными и иметь в качеÑтве аргумента значение типа `Nullable`. +МаÑÑивы могут быть вложенными и иметь в качеÑтве аргумента значение типа `Nullable`. Типы `Tuple` и `Map` также могут быть вложенными. + +Тип `DICTIONARY` поддерживаетÑÑ Ð´Ð»Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñов `INSERT`. Ð”Ð»Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñов `SELECT` еÑÑ‚ÑŒ наÑтройка [output_format_arrow_low_cardinality_as_dictionary](../operations/settings/settings.md#output-format-arrow-low-cardinality-as-dictionary), ÐºÐ¾Ñ‚Ð¾Ñ€Ð°Ñ Ð¿Ð¾Ð·Ð²Ð¾Ð»Ñет выводить тип [LowCardinality](../sql-reference/data-types/lowcardinality.md) как `DICTIONARY`. ClickHouse поддерживает наÑтраиваемую точноÑÑ‚ÑŒ Ð´Ð»Ñ Ñ„Ð¾Ñ€Ð¼Ð°Ñ‚Ð° `Decimal`. При выполнении запроÑа `INSERT` ClickHouse обрабатывает тип данных Arrow `DECIMAL` как `Decimal128`. @@ -1276,8 +1283,10 @@ $ clickhouse-client --query="SELECT * FROM {some_table} FORMAT Arrow" > {filenam | `STRING`, `BINARY` | [String](../sql-reference/data-types/string.md) | `BINARY` | | `DECIMAL` | [Decimal](../sql-reference/data-types/decimal.md) | `DECIMAL` | | `LIST` | [Array](../sql-reference/data-types/array.md) | `LIST` | +| `STRUCT` | [Tuple](../sql-reference/data-types/tuple.md) | `STRUCT` | +| `MAP` | [Map](../sql-reference/data-types/map.md) | `MAP` | -МаÑÑивы могут быть вложенными и иметь в качеÑтве аргумента значение типа `Nullable`. +МаÑÑивы могут быть вложенными и иметь в качеÑтве аргумента значение типа `Nullable`. Типы `Tuple` и `Map` также могут быть вложенными. ClickHouse поддерживает наÑтраиваемую точноÑÑ‚ÑŒ Ð´Ð»Ñ Ñ„Ð¾Ñ€Ð¼Ð°Ñ‚Ð° `Decimal`. При выполнении запроÑа `INSERT` ClickHouse обрабатывает тип данных ORC `DECIMAL` как `Decimal128`. diff --git a/docs/ru/interfaces/http.md b/docs/ru/interfaces/http.md index 278aeec44b1..fca343e8529 100644 --- a/docs/ru/interfaces/http.md +++ b/docs/ru/interfaces/http.md @@ -15,7 +15,7 @@ $ curl 'http://localhost:8123/' Ok. ``` -Веб-Ð¸Ð½Ñ‚ÐµÑ€Ñ„ÐµÐ¹Ñ Ð´Ð¾Ñтупен по адреÑу: `http://localhost:8123/play`. +Веб-Ð¸Ð½Ñ‚ÐµÑ€Ñ„ÐµÐ¹Ñ Ð´Ð¾Ñтупен по адреÑу: `http://localhost:8123/play`. ![Веб-интерфейÑ](../images/play.png) @@ -162,14 +162,14 @@ $ echo 'DROP TABLE t' | curl 'http://localhost:8123/' --data-binary @- ЕÑли вы указали `compress=1` в URL, то Ñервер Ñжимает данные, которые он отправлÑет. ЕÑли вы указали `decompress=1` в URL, Ñервер раÑпаковывает те данные, которые вы передаёте методом `POST`. -Также можно иÑпользовать [Ñжатие HTTP](https://en.wikipedia.org/wiki/HTTP_compression). 
ClickHouse поддерживает Ñледующие [методы ÑжатиÑ](https://en.wikipedia.org/wiki/HTTP_compression#Content-Encoding_tokens): +Также можно иÑпользовать [Ñжатие HTTP](https://en.wikipedia.org/wiki/HTTP_compression). ClickHouse поддерживает Ñледующие [методы ÑжатиÑ](https://en.wikipedia.org/wiki/HTTP_compression#Content-Encoding_tokens): - `gzip` - `br` - `deflate` - `xz` -Ð”Ð»Ñ Ð¾Ñ‚Ð¿Ñ€Ð°Ð²ÐºÐ¸ Ñжатого запроÑа `POST` добавьте заголовок `Content-Encoding: compression_method`. +Ð”Ð»Ñ Ð¾Ñ‚Ð¿Ñ€Ð°Ð²ÐºÐ¸ Ñжатого запроÑа `POST` добавьте заголовок `Content-Encoding: compression_method`. Чтобы ClickHouse Ñжимал ответ, разрешите Ñжатие наÑтройкой [enable_http_compression](../operations/settings/settings.md#settings-enable_http_compression) и добавьте заголовок `Accept-Encoding: compression_method`. Уровень ÑÐ¶Ð°Ñ‚Ð¸Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ… Ð´Ð»Ñ Ð²Ñех методов ÑÐ¶Ð°Ñ‚Ð¸Ñ Ð¼Ð¾Ð¶Ð½Ð¾ задать Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ наÑтройки [http_zlib_compression_level](../operations/settings/settings.md#settings-http_zlib_compression_level). !!! note "Примечание" @@ -403,13 +403,13 @@ $ curl -v 'http://localhost:8123/predefined_query' - `handler` Ñодержит оÑновную чаÑÑ‚ÑŒ обработчика. Ð¡ÐµÐ¹Ñ‡Ð°Ñ `handler` может наÑтраивать `type`, `status`, `content_type`, `response_content`, `query`, `query_param_name`. `type` на данный момент поддерживает три типа: [predefined_query_handler](#predefined_query_handler), [dynamic_query_handler](#dynamic_query_handler), [static](#static). - + - `query` — иÑпользуетÑÑ Ñ Ñ‚Ð¸Ð¿Ð¾Ð¼ `predefined_query_handler`, выполнÑет Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð¿Ñ€Ð¸ вызове обработчика. - + - `query_param_name` — иÑпользуетÑÑ Ñ Ñ‚Ð¸Ð¿Ð¾Ð¼ `dynamic_query_handler`, извлекает и выполнÑет значение, ÑоответÑтвующее значению `query_param_name` в параметрах HTTP-запроÑа. - + - `status` — иÑпользуетÑÑ Ñ Ñ‚Ð¸Ð¿Ð¾Ð¼ `static`, возвращает код ÑоÑтоÑÐ½Ð¸Ñ Ð¾Ñ‚Ð²ÐµÑ‚Ð°. - + - `content_type` — иÑпользуетÑÑ Ñ Ñ‚Ð¸Ð¿Ð¾Ð¼ `static`, возвращает [content-type](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type). - `response_content` — иÑпользуетÑÑ Ñ Ñ‚Ð¸Ð¿Ð¾Ð¼`static`, Ñодержимое ответа, отправленное клиенту, при иÑпользовании префикÑа ‘file://’ or ‘config://’, находит Ñодержимое из файла или конфигурации, отправленного клиенту. diff --git a/docs/ru/interfaces/third-party/gui.md b/docs/ru/interfaces/third-party/gui.md index dc96c32e996..9cb28a2c9a2 100644 --- a/docs/ru/interfaces/third-party/gui.md +++ b/docs/ru/interfaces/third-party/gui.md @@ -111,7 +111,7 @@ toc_title: "Визуальные интерфейÑÑ‹ от Ñторонних Ñ€ ### DataGrip {#datagrip} -[DataGrip](https://www.jetbrains.com/datagrip/) — Ñто IDE Ð´Ð»Ñ Ð±Ð°Ð· данных о JetBrains Ñ Ð²Ñ‹Ð´ÐµÐ»ÐµÐ½Ð½Ð¾Ð¹ поддержкой ClickHouse. Он также вÑтроен в другие инÑтрументы на оÑнове IntelliJ: PyCharm, IntelliJ IDEA, GoLand, PhpStorm и другие. +[DataGrip](https://www.jetbrains.com/datagrip/) — Ñто IDE Ð´Ð»Ñ Ð±Ð°Ð· данных от JetBrains Ñ Ð²Ñ‹Ð´ÐµÐ»ÐµÐ½Ð½Ð¾Ð¹ поддержкой ClickHouse. Он также вÑтроен в другие инÑтрументы на оÑнове IntelliJ: PyCharm, IntelliJ IDEA, GoLand, PhpStorm и другие. 
ОÑновные возможноÑти: diff --git a/docs/ru/interfaces/third-party/integrations.md b/docs/ru/interfaces/third-party/integrations.md index 198e9d6be76..70a4d233277 100644 --- a/docs/ru/interfaces/third-party/integrations.md +++ b/docs/ru/interfaces/third-party/integrations.md @@ -43,7 +43,7 @@ toc_title: "Библиотеки Ð´Ð»Ñ Ð¸Ð½Ñ‚ÐµÐ³Ñ€Ð°Ñ†Ð¸Ð¸ от Ñторонн - Мониторинг - [Graphite](https://graphiteapp.org) - [graphouse](https://github.com/yandex/graphouse) - - [carbon-clickhouse](https://github.com/lomik/carbon-clickhouse) + + - [carbon-clickhouse](https://github.com/lomik/carbon-clickhouse) - [graphite-clickhouse](https://github.com/lomik/graphite-clickhouse) - [graphite-ch-optimizer](https://github.com/innogames/graphite-ch-optimizer) - оптимизирует партиции таблиц [\*GraphiteMergeTree](../../engines/table-engines/mergetree-family/graphitemergetree.md#graphitemergetree) ÑоглаÑно правилам в [конфигурации rollup](../../engines/table-engines/mergetree-family/graphitemergetree.md#rollup-configuration) - [Grafana](https://grafana.com/) @@ -104,7 +104,7 @@ toc_title: "Библиотеки Ð´Ð»Ñ Ð¸Ð½Ñ‚ÐµÐ³Ñ€Ð°Ñ†Ð¸Ð¸ от Ñторонн - Ruby - [Ruby on Rails](https://rubyonrails.org/) - [activecube](https://github.com/bitquery/activecube) - - [ActiveRecord](https://github.com/PNixx/clickhouse-activerecord) + - [ActiveRecord](https://github.com/PNixx/clickhouse-activerecord) - [GraphQL](https://github.com/graphql) - [activecube-graphql](https://github.com/bitquery/activecube-graphql) - + diff --git a/docs/ru/operations/external-authenticators/kerberos.md b/docs/ru/operations/external-authenticators/kerberos.md index b90714d14fd..2d31e355bba 100644 --- a/docs/ru/operations/external-authenticators/kerberos.md +++ b/docs/ru/operations/external-authenticators/kerberos.md @@ -56,7 +56,7 @@ ClickHouse предоÑтавлÑет возможноÑÑ‚ÑŒ аутентифи Ð’ конфигурационном файле не могут быть указаны одновременно оба параметра. Ð’ противном Ñлучае, Ð°ÑƒÑ‚ÐµÐ½Ñ‚Ð¸Ñ„Ð¸ÐºÐ°Ñ†Ð¸Ñ Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ Kerberos будет недоÑтупна Ð´Ð»Ñ Ð²Ñех пользователей. !!! Warning "Важно" - Ð’ конфигурационном файле может быть не более одной Ñекции `kerberos`. Ð’ противном Ñлучае, Ð°ÑƒÑ‚ÐµÐ½Ñ‚Ð¸Ñ„Ð¸ÐºÐ°Ñ†Ð¸Ñ Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ Kerberos будет отключена Ð´Ð»Ñ Ð²Ñех пользователей. + Ð’ конфигурационном файле может быть не более одной Ñекции `kerberos`. Ð’ противном Ñлучае, Ð°ÑƒÑ‚ÐµÐ½Ñ‚Ð¸Ñ„Ð¸ÐºÐ°Ñ†Ð¸Ñ Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ Kerberos будет отключена Ð´Ð»Ñ Ð²Ñех пользователей. ## ÐÑƒÑ‚ÐµÐ½Ñ‚Ð¸Ñ„Ð¸ÐºÐ°Ñ†Ð¸Ñ Ð¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ñ‚ÐµÐ»ÐµÐ¹ Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ Kerberos {#kerberos-as-an-external-authenticator-for-existing-users} diff --git a/docs/ru/operations/external-authenticators/ldap.md b/docs/ru/operations/external-authenticators/ldap.md index 8df59cdfdad..8f1328c9aa6 100644 --- a/docs/ru/operations/external-authenticators/ldap.md +++ b/docs/ru/operations/external-authenticators/ldap.md @@ -32,7 +32,7 @@ /path/to/tls_ca_cert_dir ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:AES256-GCM-SHA384 - + localhost @@ -66,7 +66,7 @@ - При формировании фильтра вÑе подÑтроки `{user_name}`, `{bind_dn}`, `{user_dn}` и `{base_dn}` в шаблоне будут заменÑÑ‚ÑŒÑÑ Ð½Ð° фактичеÑкое Ð¸Ð¼Ñ Ð¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ñ‚ÐµÐ»Ñ, DN подключениÑ, DN Ð¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ñ‚ÐµÐ»Ñ Ð¸ базовый DN ÑоответÑтвенно при каждом LDAP поиÑке. - Обратите внимание, что Ñпециальные Ñимволы должны быть правильно Ñкранированы в XML. 
- `verification_cooldown` — промежуток времени (в Ñекундах) поÑле уÑпешной попытки подключениÑ, в течение которого пользователь будет ÑчитатьÑÑ Ð°ÑƒÑ‚ÐµÐ½Ñ‚Ð¸Ñ„Ð¸Ñ†Ð¸Ñ€Ð¾Ð²Ð°Ð½Ð½Ñ‹Ð¼ и Ñможет выполнÑÑ‚ÑŒ запроÑÑ‹ без повторного Ð¾Ð±Ñ€Ð°Ñ‰ÐµÐ½Ð¸Ñ Ðº Ñерверам LDAP. - - Чтобы отключить кеширование и заÑтавить обращатьÑÑ Ðº Ñерверу LDAP Ð´Ð»Ñ ÐºÐ°Ð¶Ð´Ð¾Ð³Ð¾ запроÑа аутентификации, укажите `0` (значение по умолчанию). + - Чтобы отключить кеширование и заÑтавить обращатьÑÑ Ðº Ñерверу LDAP Ð´Ð»Ñ ÐºÐ°Ð¶Ð´Ð¾Ð³Ð¾ запроÑа аутентификации, укажите `0` (значение по умолчанию). - `enable_tls` — флаг, включающий иÑпользование защищенного ÑÐ¾ÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ñ Ñ Ñервером LDAP. - Укажите `no` Ð´Ð»Ñ Ð¸ÑÐ¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ð½Ð¸Ñ Ñ‚ÐµÐºÑтового протокола `ldap://` (не рекомендовано). - Укажите `yes` Ð´Ð»Ñ Ð¾Ð±Ñ€Ð°Ñ‰ÐµÐ½Ð¸Ñ Ðº LDAP по протоколу SSL/TLS `ldaps://` (рекомендовано, иÑпользуетÑÑ Ð¿Ð¾ умолчанию). @@ -78,7 +78,7 @@ - `tls_cert_file` — путь к файлу Ñертификата. - `tls_key_file` — путь к файлу ключа Ñертификата. - `tls_ca_cert_file` — путь к файлу ЦС (certification authority) Ñертификата. -- `tls_ca_cert_dir` — путь к каталогу, Ñодержащему Ñертификаты ЦС. +- `tls_ca_cert_dir` — путь к каталогу, Ñодержащему Ñертификаты ЦС. - `tls_cipher_suite` — разрешенный набор шифров (в нотации OpenSSL). ## Внешний аутентификатор LDAP {#ldap-external-authenticator} @@ -143,7 +143,7 @@ CREATE USER my_user IDENTIFIED WITH ldap SERVER 'my_ldap_server'; clickhouse_ - + my_ad_server @@ -177,6 +177,6 @@ CREATE USER my_user IDENTIFIED WITH ldap SERVER 'my_ldap_server'; - При формировании фильтра вÑе подÑтроки `{user_name}`, `{bind_dn}`, `{user_dn}` и `{base_dn}` в шаблоне будут заменÑÑ‚ÑŒÑÑ Ð½Ð° фактичеÑкое Ð¸Ð¼Ñ Ð¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ñ‚ÐµÐ»Ñ, DN подключениÑ, DN Ð¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ñ‚ÐµÐ»Ñ Ð¸ базовый DN ÑоответÑтвенно при каждом LDAP поиÑке. - Обратите внимание, что Ñпециальные Ñимволы должны быть правильно Ñкранированы в XML. - `attribute` — Ð¸Ð¼Ñ Ð°Ñ‚Ñ€Ð¸Ð±ÑƒÑ‚Ð°, значение которого будет возвращатьÑÑ LDAP поиÑком. По умолчанию: `cn`. - - `prefix` — префикÑ, который, как предполагаетÑÑ, будет находитьÑÑ Ð¿ÐµÑ€ÐµÐ´ началом каждой Ñтроки в иÑходном ÑпиÑке Ñтрок, возвращаемых LDAP поиÑком. ÐŸÑ€ÐµÑ„Ð¸ÐºÑ Ð±ÑƒÐ´ÐµÑ‚ удален из иÑходных Ñтрок, а Ñами они будут раÑÑматриватьÑÑ ÐºÐ°Ðº имена локальных ролей. По умолчанию: пуÑÑ‚Ð°Ñ Ñтрока. + - `prefix` — префикÑ, который, как предполагаетÑÑ, будет находитьÑÑ Ð¿ÐµÑ€ÐµÐ´ началом каждой Ñтроки в иÑходном ÑпиÑке Ñтрок, возвращаемых LDAP поиÑком. ÐŸÑ€ÐµÑ„Ð¸ÐºÑ Ð±ÑƒÐ´ÐµÑ‚ удален из иÑходных Ñтрок, а Ñами они будут раÑÑматриватьÑÑ ÐºÐ°Ðº имена локальных ролей. По умолчанию: пуÑÑ‚Ð°Ñ Ñтрока. [ÐžÑ€Ð¸Ð³Ð¸Ð½Ð°Ð»ÑŒÐ½Ð°Ñ ÑтатьÑ](https://clickhouse.tech/docs/en/operations/external-authenticators/ldap) diff --git a/docs/ru/operations/opentelemetry.md b/docs/ru/operations/opentelemetry.md index 073e7c67e9c..4065f88350b 100644 --- a/docs/ru/operations/opentelemetry.md +++ b/docs/ru/operations/opentelemetry.md @@ -5,7 +5,7 @@ toc_title: Поддержка OpenTelemetry # [ÑкÑпериментально] Поддержка OpenTelemetry -ClickHouse поддерживает [OpenTelemetry](https://opentelemetry.io/) — открытый Ñтандарт Ð´Ð»Ñ Ñбора траÑÑировок и метрик из раÑпределенного приложениÑ. +ClickHouse поддерживает [OpenTelemetry](https://opentelemetry.io/) — открытый Ñтандарт Ð´Ð»Ñ Ñбора траÑÑировок и метрик из раÑпределенного приложениÑ. !!! warning "Предупреждение" Поддержка Ñтандарта ÑкÑÐ¿ÐµÑ€Ð¸Ð¼ÐµÐ½Ñ‚Ð°Ð»ÑŒÐ½Ð°Ñ Ð¸ будет Ñо временем менÑÑ‚ÑŒÑÑ. 
diff --git a/docs/ru/operations/server-configuration-parameters/settings.md b/docs/ru/operations/server-configuration-parameters/settings.md index abaf2a8f2da..6b4e25eb692 100644 --- a/docs/ru/operations/server-configuration-parameters/settings.md +++ b/docs/ru/operations/server-configuration-parameters/settings.md @@ -34,6 +34,7 @@ ClickHouse перезагружает вÑтроенные Ñловари Ñ Ð· ... ... ... + ... ... @@ -43,11 +44,12 @@ ClickHouse перезагружает вÑтроенные Ñловари Ñ Ð· - `min_part_size` - Минимальный размер чаÑти таблицы. - `min_part_size_ratio` - Отношение размера минимальной чаÑти таблицы к полному размеру таблицы. -- `method` - Метод ÑжатиÑ. Возможные значениÑ: `lz4`, `zstd`. +- `method` - Метод ÑжатиÑ. Возможные значениÑ: `lz4`, `lz4hc`, `zstd`. +- `level` – Уровень ÑжатиÑ. См. [Кодеки](../../sql-reference/statements/create/table/#create-query-common-purpose-codecs). Можно Ñконфигурировать неÑколько разделов ``. -ClickHouse проверÑет уÑÐ»Ð¾Ð²Ð¸Ñ Ð´Ð»Ñ `min_part_size` и `min_part_size_ratio` и выполнит те блоки `case`, Ð´Ð»Ñ ÐºÐ¾Ñ‚Ð¾Ñ€Ñ‹Ñ… уÑÐ»Ð¾Ð²Ð¸Ñ Ñовпали. +ClickHouse проверÑет уÑÐ»Ð¾Ð²Ð¸Ñ Ð´Ð»Ñ `min_part_size` и `min_part_size_ratio` и выполнит те блоки `case`, Ð´Ð»Ñ ÐºÐ¾Ñ‚Ð¾Ñ€Ñ‹Ñ… уÑÐ»Ð¾Ð²Ð¸Ñ Ñовпали. - ЕÑли куÑок данных Ñовпадает Ñ ÑƒÑловиÑми, ClickHouse иÑпользует указанные метод ÑжатиÑ. - ЕÑли куÑок данных Ñовпадает Ñ Ð½ÐµÑколькими блоками `case`, ClickHouse иÑпользует перый Ñовпавший блок уÑловий. @@ -62,6 +64,7 @@ ClickHouse проверÑет уÑÐ»Ð¾Ð²Ð¸Ñ Ð´Ð»Ñ `min_part_size` и `min_part 10000000000 0.01 zstd + 1 ``` @@ -98,7 +101,7 @@ ClickHouse проверÑет уÑÐ»Ð¾Ð²Ð¸Ñ Ð´Ð»Ñ `min_part_size` и `min_part ```xml 1073741824 - + ``` ## database_atomic_delay_before_drop_table_sec {#database_atomic_delay_before_drop_table_sec} @@ -154,7 +157,7 @@ ClickHouse проверÑет уÑÐ»Ð¾Ð²Ð¸Ñ Ð´Ð»Ñ `min_part_size` и `min_part ЕÑли `true`, то каждый Ñловарь ÑоздаётÑÑ Ð¿Ñ€Ð¸ первом иÑпользовании. ЕÑли Ñловарь не удалоÑÑŒ Ñоздать, то вызов функции, иÑпользующей Ñловарь, Ñгенерирует иÑключение. -ЕÑли `false`, то вÑе Ñловари ÑоздаютÑÑ Ð¿Ñ€Ð¸ Ñтарте Ñервера, еÑли Ñловарь или Ñловари ÑоздаютÑÑ Ñлишком долго или ÑоздаютÑÑ Ñ Ð¾ÑˆÐ¸Ð±ÐºÐ¾Ð¹, то Ñервер загружаетÑÑ Ð±ÐµÐ· +ЕÑли `false`, то вÑе Ñловари ÑоздаютÑÑ Ð¿Ñ€Ð¸ Ñтарте Ñервера, еÑли Ñловарь или Ñловари ÑоздаютÑÑ Ñлишком долго или ÑоздаютÑÑ Ñ Ð¾ÑˆÐ¸Ð±ÐºÐ¾Ð¹, то Ñервер загружаетÑÑ Ð±ÐµÐ· Ñтих Ñловарей и продолжает попытки Ñоздать Ñти Ñловари. По умолчанию - `true`. @@ -424,7 +427,7 @@ ClickHouse проверÑет уÑÐ»Ð¾Ð²Ð¸Ñ Ð´Ð»Ñ `min_part_size` и `min_part Ключи: -- `enabled` – Булевый флаг чтобы включить функциональноÑÑ‚ÑŒ, по умолчанию `false`. УÑтановите `true` чтобы разрешить отправку отчетов о ÑбоÑÑ…. +- `enabled` – Булевый флаг чтобы включить функциональноÑÑ‚ÑŒ, по умолчанию `false`. УÑтановите `true` чтобы разрешить отправку отчетов о ÑбоÑÑ…. - `endpoint` – Ð’Ñ‹ можете переопределить URL на который будут отÑылатьÑÑ Ð¾Ñ‚Ñ‡ÐµÑ‚Ñ‹ об ошибках и иÑпользовать ÑобÑтвенную инÑталÑцию Sentry. ИÑпользуйте URL ÑинтакÑÐ¸Ñ [Sentry DSN](https://docs.sentry.io/error-reporting/quickstart/?platform=native#configure-the-sdk). - `anonymize` - Запретить отÑылку имени хоÑта Ñервера в отчете о Ñбое. - `http_proxy` - ÐаÑтройка HTTP proxy Ð´Ð»Ñ Ð¾Ñ‚Ñылки отчетов о ÑбоÑÑ…. @@ -487,7 +490,7 @@ ClickHouse проверÑет уÑÐ»Ð¾Ð²Ð¸Ñ Ð´Ð»Ñ `min_part_size` и `min_part ## max_server_memory_usage_to_ram_ratio {#max_server_memory_usage_to_ram_ratio} -ОпределÑет долю оперативной памÑти, доÑтупную Ð´Ð»Ñ Ð¸ÑÐ¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ð½Ð¸Ñ Ñервером Clickhouse. 
ЕÑли Ñервер попытаетÑÑ Ð¸Ñпользовать больше, предоÑтавлÑемый ему объём памÑти будет ограничен до раÑчётного значениÑ. +ОпределÑет долю оперативной памÑти, доÑтупную Ð´Ð»Ñ Ð¸ÑÐ¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ð½Ð¸Ñ Ñервером Clickhouse. ЕÑли Ñервер попытаетÑÑ Ð¸Ñпользовать больше, предоÑтавлÑемый ему объём памÑти будет ограничен до раÑчётного значениÑ. Возможные значениÑ: @@ -515,7 +518,7 @@ ClickHouse проверÑет уÑÐ»Ð¾Ð²Ð¸Ñ Ð´Ð»Ñ `min_part_size` и `min_part ОпределÑет макÑимальное количеÑтво одновременно обрабатываемых запроÑов, ÑвÑзанных Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ†ÐµÐ¹ ÑемейÑтва `MergeTree`. ЗапроÑÑ‹ также могут быть ограничены наÑтройками: [max_concurrent_queries_for_all_users](#max-concurrent-queries-for-all-users), [min_marks_to_honor_max_concurrent_queries](#min-marks-to-honor-max-concurrent-queries). !!! info "Примечание" - Параметры Ñтих наÑтроек могут быть изменены во Ð²Ñ€ÐµÐ¼Ñ Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñов и вÑтупÑÑ‚ в Ñилу немедленно. ЗапроÑÑ‹, которые уже запущены, выполнÑÑ‚ÑÑ Ð±ÐµÐ· изменений. + Параметры Ñтих наÑтроек могут быть изменены во Ð²Ñ€ÐµÐ¼Ñ Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñов и вÑтупÑÑ‚ в Ñилу немедленно. ЗапроÑÑ‹, которые уже запущены, выполнÑÑ‚ÑÑ Ð±ÐµÐ· изменений. Возможные значениÑ: @@ -864,8 +867,8 @@ ClickHouse проверÑет уÑÐ»Ð¾Ð²Ð¸Ñ Ð´Ð»Ñ `min_part_size` и `min_part - `engine` - уÑтанавливает [наÑтройки MergeTree Engine](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-creating-a-table) Ð´Ð»Ñ ÑиÑтемной таблицы. ÐÐµÐ»ÑŒÐ·Ñ Ð¸Ñпользовать еÑли иÑпользуетÑÑ `partition_by`. - `flush_interval_milliseconds` — период ÑброÑа данных из буфера в памÑти в таблицу. -**Пример** -```xml +**Пример** +```xml notice @@ -929,7 +932,7 @@ The default server configuration file `config.xml` contains the following settin `system.events` таблица Ñодержит Ñчетчик `QueryMaskingRulesMatch` который Ñчитает общее кол-во Ñовпадений правил маÑкировки. -Ð”Ð»Ñ Ñ€Ð°Ñпределенных запроÑов каждый Ñервер должен быть Ñконфигурирован отдельно, иначе, подзапроÑÑ‹, +Ð”Ð»Ñ Ñ€Ð°Ñпределенных запроÑов каждый Ñервер должен быть Ñконфигурирован отдельно, иначе, подзапроÑÑ‹, переданные на другие узлы, будут ÑохранÑÑ‚ÑŒÑÑ Ð±ÐµÐ· маÑкировки. ## remote_servers {#server-settings-remote-servers} @@ -1177,7 +1180,7 @@ ClickHouse иÑпользует ZooKeeper Ð´Ð»Ñ Ñ…Ñ€Ð°Ð½ÐµÐ½Ð¸Ñ Ð¼ÐµÑ‚Ð°Ð´Ð°Ð½ Ð¡ÐµÐºÑ†Ð¸Ñ ÐºÐ¾Ð½Ñ„Ð¸Ð³ÑƒÑ€Ð°Ñ†Ð¸Ð¾Ð½Ð½Ð¾Ð³Ð¾ файла,ÐºÐ¾Ñ‚Ð¾Ñ€Ð°Ñ Ñодержит наÑтройки: - Путь к конфигурационному файлу Ñ Ð¿Ñ€ÐµÐ´ÑƒÑтановленными пользователÑми. -- Путь к файлу, в котором ÑодержатÑÑ Ð¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ñ‚ÐµÐ»Ð¸, Ñозданные при помощи SQL команд. +- Путь к файлу, в котором ÑодержатÑÑ Ð¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ñ‚ÐµÐ»Ð¸, Ñозданные при помощи SQL команд. ЕÑли Ñта ÑÐµÐºÑ†Ð¸Ñ Ð¾Ð¿Ñ€ÐµÐ´ÐµÐ»ÐµÐ½Ð°, путь из [users_config](../../operations/server-configuration-parameters/settings.md#users-config) и [access_control_path](../../operations/server-configuration-parameters/settings.md#access_control_path) не иÑпользуетÑÑ. diff --git a/docs/ru/operations/settings/index.md b/docs/ru/operations/settings/index.md index 050df975b47..926a1217ef6 100644 --- a/docs/ru/operations/settings/index.md +++ b/docs/ru/operations/settings/index.md @@ -24,13 +24,13 @@ toc_title: Introduction - При запуÑке конÑольного клиента ClickHouse в не интерактивном режиме уÑтановите параметр запуÑка `--setting=value`. - При иÑпользовании HTTP API передавайте cgi-параметры (`URL?setting_1=value&setting_2=value...`). - - Укажите необходимые наÑтройки в Ñекции [SETTINGS](../../sql-reference/statements/select/index.md#settings-in-select) запроÑа SELECT. 
Эти наÑтройки дейÑтвуют только в рамках данного запроÑа, а поÑле его Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ ÑбраÑываютÑÑ Ð´Ð¾ предыдущего Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð»Ð¸ Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¿Ð¾ умолчанию. + - Укажите необходимые наÑтройки в Ñекции [SETTINGS](../../sql-reference/statements/select/index.md#settings-in-select) запроÑа SELECT. Эти наÑтройки дейÑтвуют только в рамках данного запроÑа, а поÑле его Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ ÑбраÑываютÑÑ Ð´Ð¾ предыдущего Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð»Ð¸ Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¿Ð¾ умолчанию. ÐаÑтройки, которые можно задать только в конфигурационном файле Ñервера, в разделе не раÑÑматриваютÑÑ. ## ПользовательÑкие наÑтройки {#custom_settings} -Ð’ дополнение к общим [наÑтройкам](../../operations/settings/settings.md), пользователи могут определÑÑ‚ÑŒ ÑобÑтвенные наÑтройки. +Ð’ дополнение к общим [наÑтройкам](../../operations/settings/settings.md), пользователи могут определÑÑ‚ÑŒ ÑобÑтвенные наÑтройки. Ðазвание пользовательÑкой наÑтройки должно начинатьÑÑ Ñ Ð¾Ð´Ð½Ð¾Ð³Ð¾ из предопределённых префикÑов. СпиÑок Ñтих префикÑов должен быть задан в параметре [custom_settings_prefixes](../../operations/server-configuration-parameters/settings.md#custom_settings_prefixes) конфигурационнного файла Ñервера. @@ -47,7 +47,7 @@ SET custom_a = 123; Чтобы получить текущее значение пользовательÑкой наÑтройки, иÑпользуйте функцию `getSetting()`: ```sql -SELECT getSetting('custom_a'); +SELECT getSetting('custom_a'); ``` **См. также** diff --git a/docs/ru/operations/settings/merge-tree-settings.md b/docs/ru/operations/settings/merge-tree-settings.md index 9ae247cf7a7..88c511d4d80 100644 --- a/docs/ru/operations/settings/merge-tree-settings.md +++ b/docs/ru/operations/settings/merge-tree-settings.md @@ -119,7 +119,7 @@ EÑли Ñуммарное чиÑло активных куÑков во вÑе Значение по умолчанию: 100. Команда `Insert` Ñоздает один или неÑколько блоков (куÑков). При вÑтавке в Replicated таблицы ClickHouse Ð´Ð»Ñ [дедупликации вÑтавок](../../engines/table-engines/mergetree-family/replication.md) запиÑывает в Zookeeper хеш-Ñуммы Ñозданных куÑков. Ðо хранÑÑ‚ÑÑ Ñ‚Ð¾Ð»ÑŒÐºÐ¾ поÑледние `replicated_deduplication_window` хеш-Ñумм. Самые Ñтарые хеш-Ñуммы удалÑÑŽÑ‚ÑÑ Ð¸Ð· Zookeeper. -Большое значение `replicated_deduplication_window` замедлÑет `Insert`, так как приходитÑÑ Ñравнивать большее количеÑтво хеш-Ñумм. +Большое значение `replicated_deduplication_window` замедлÑет `Insert`, так как приходитÑÑ Ñравнивать большее количеÑтво хеш-Ñумм. Хеш-Ñумма раÑÑчитываетÑÑ Ð¿Ð¾ названиÑм и типам полей, а также по данным вÑтавленного куÑка (потока байт). ## non_replicated_deduplication_window {#non-replicated-deduplication-window} @@ -162,8 +162,8 @@ EÑли Ñуммарное чиÑло активных куÑков во вÑе При запиÑи нового куÑка `fsync` не вызываетÑÑ, поÑтому неактивные куÑки удалÑÑŽÑ‚ÑÑ Ð¿Ð¾Ð·Ð¶Ðµ. Это значит, что некоторое Ð²Ñ€ÐµÐ¼Ñ Ð½Ð¾Ð²Ñ‹Ð¹ куÑок находитÑÑ Ñ‚Ð¾Ð»ÑŒÐºÐ¾ в оперативной памÑти Ñервера (кеш ОС). ЕÑли Ñервер перезагрузитÑÑ Ñпонтанно, новый Ñлитый куÑок может иÑпортитьÑÑ Ð¸Ð»Ð¸ потерÑÑ‚ÑŒÑÑ. -Во Ð²Ñ€ÐµÐ¼Ñ Ð·Ð°Ð¿ÑƒÑка Ñервер ClickHouse проверÑет целоÑтноÑÑ‚ÑŒ куÑков. -ЕÑли новый (Ñлитый) куÑок поврежден, ClickHouse возвращает неактивные куÑки в ÑпиÑок активных и позже Ñнова выполнÑет ÑлиÑние. Ð’ Ñтом Ñлучае иÑпорченный куÑок получает новое Ð¸Ð¼Ñ (добавлÑетÑÑ Ð¿Ñ€ÐµÑ„Ð¸ÐºÑ `broken_`) и попадает в каталог `detached`. +Во Ð²Ñ€ÐµÐ¼Ñ Ð·Ð°Ð¿ÑƒÑка Ñервер ClickHouse проверÑет целоÑтноÑÑ‚ÑŒ куÑков. +ЕÑли новый (Ñлитый) куÑок поврежден, ClickHouse возвращает неактивные куÑки в ÑпиÑок активных и позже Ñнова выполнÑет ÑлиÑние. 
Ð’ Ñтом Ñлучае иÑпорченный куÑок получает новое Ð¸Ð¼Ñ (добавлÑетÑÑ Ð¿Ñ€ÐµÑ„Ð¸ÐºÑ `broken_`) и попадает в каталог `detached`. ЕÑли проверка целоÑтноÑти не выÑвлÑет проблем в Ñлитом куÑке, то иÑходные неактивные куÑки переименовываютÑÑ (добавлÑетÑÑ Ð¿Ñ€ÐµÑ„Ð¸ÐºÑ `ignored_`) и перемещаютÑÑ Ð² каталог `detached`. Стандартное Ð´Ð»Ñ Linux значение `dirty_expire_centisecs` — 30 Ñекунд. Это макÑимальное времÑ, в течение которого запиÑанные данные хранÑÑ‚ÑÑ Ñ‚Ð¾Ð»ÑŒÐºÐ¾ в оперативной памÑти. ЕÑли нагрузка на диÑковую ÑиÑтему большаÑ, то данные запиÑываютÑÑ Ð½Ð°Ð¼Ð½Ð¾Ð³Ð¾ позже. Значение 480 Ñекунд подобрали ÑкÑпериментальным путем — Ñто времÑ, за которое новый куÑок гарантированно запишетÑÑ Ð½Ð° диÑк. @@ -277,4 +277,15 @@ EÑли Ñуммарное чиÑло активных куÑков во вÑе Значение по умолчанию: `0`. -[Original article](https://clickhouse.tech/docs/ru/operations/settings/merge_tree_settings/) +## check_sample_column_is_correct {#check_sample_column_is_correct} + +Разрешает проверку того, что тип данных Ñтолбца Ð´Ð»Ñ ÑÑÐ¼Ð¿Ð»Ð¸Ñ€Ð¾Ð²Ð°Ð½Ð¸Ñ Ð¸Ð»Ð¸ Ð²Ñ‹Ñ€Ð°Ð¶ÐµÐ½Ð¸Ñ ÑÑÐ¼Ð¿Ð»Ð¸Ñ€Ð¾Ð²Ð°Ð½Ð¸Ñ Ð¿Ñ€Ð¸ Ñоздании таблицы верный. Тип данных должен ÑоответÑтвовать одному из беззнаковых [целочиÑленных типов](../../sql-reference/data-types/int-uint.md): `UInt8`, `UInt16`, `UInt32`, `UInt64`. + +Возможные значениÑ: + +- true — проверка включена. +- false — проверка при Ñоздании таблицы не проводитÑÑ. + +Значение по умолчанию: `true`. + +По умолчанию Ñервер ClickHouse при Ñоздании таблицы проверÑет тип данных Ñтолбца Ð´Ð»Ñ ÑÑÐ¼Ð¿Ð»Ð¸Ñ€Ð¾Ð²Ð°Ð½Ð¸Ñ Ð¸Ð»Ð¸ Ð²Ñ‹Ñ€Ð°Ð¶ÐµÐ½Ð¸Ñ ÑÑмплированиÑ. ЕÑли уже ÑущеÑтвуют таблицы Ñ Ð½ÐµÐºÐ¾Ñ€Ñ€ÐµÐºÑ‚Ð½Ñ‹Ð¼ выражением ÑÑмплированиÑ, то чтобы не возникало иÑключение при запуÑке Ñервера, уÑтановите `check_sample_column_is_correct` в значение `false`. diff --git a/docs/ru/operations/settings/query-complexity.md b/docs/ru/operations/settings/query-complexity.md index c2e00302d18..dcca2a254b8 100644 --- a/docs/ru/operations/settings/query-complexity.md +++ b/docs/ru/operations/settings/query-complexity.md @@ -66,21 +66,21 @@ toc_title: "ÐžÐ³Ñ€Ð°Ð½Ð¸Ñ‡ÐµÐ½Ð¸Ñ Ð½Ð° ÑложноÑÑ‚ÑŒ запроÑа" Следующие Ð¾Ð³Ñ€Ð°Ð½Ð¸Ñ‡ÐµÐ½Ð¸Ñ Ð¼Ð¾Ð³ÑƒÑ‚ проверÑÑ‚ÑŒÑÑ Ð½Ð° каждый блок (а не на каждую Ñтроку). То еÑÑ‚ÑŒ, Ð¾Ð³Ñ€Ð°Ð½Ð¸Ñ‡ÐµÐ½Ð¸Ñ Ð¼Ð¾Ð³ÑƒÑ‚ быть немного нарушены. МакÑимальное количеÑтво Ñтрочек, которое можно прочитать из таблицы на удалённом Ñервере при выполнении -раÑпределенного запроÑа. РаÑпределенные запроÑÑ‹ могут Ñоздавать неÑколько подзапроÑов к каждому из шардов в клаÑтере и -тогда Ñтот лимит будет применен при выполнении Ñ‡Ñ‚ÐµÐ½Ð¸Ñ Ð½Ð° удаленных Ñерверах (Ð²ÐºÐ»ÑŽÑ‡Ð°Ñ Ð¸ Ñервер-инициатор) и проигнорирован -на Ñервере-инициаторе запроÑа во Ð²Ñ€ÐµÐ¼Ñ Ð¾Ð±ÑŒÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ñ Ð¿Ð¾Ð»ÑƒÑ‡ÐµÐ½Ð½Ñ‹Ñ… результатов. Ðапример, клаÑтер ÑоÑтоит из 2 шард и каждый -из них хранит таблицу Ñ 100 Ñтрок. Тогда раÑпределнный Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð´Ð»Ñ Ð¿Ð¾Ð»ÑƒÑ‡ÐµÐ½Ð¸Ñ Ð²Ñех данных из Ñтих таблиц и уÑтановленной -наÑтройкой `max_rows_to_read=150` выброÑит иÑключение, Ñ‚.к. в общем он прочитает 200 Ñтрок. Ðо Ð·Ð°Ð¿Ñ€Ð¾Ñ +раÑпределенного запроÑа. РаÑпределенные запроÑÑ‹ могут Ñоздавать неÑколько подзапроÑов к каждому из шардов в клаÑтере и +тогда Ñтот лимит будет применен при выполнении Ñ‡Ñ‚ÐµÐ½Ð¸Ñ Ð½Ð° удаленных Ñерверах (Ð²ÐºÐ»ÑŽÑ‡Ð°Ñ Ð¸ Ñервер-инициатор) и проигнорирован +на Ñервере-инициаторе запроÑа во Ð²Ñ€ÐµÐ¼Ñ Ð¾Ð±ÑŒÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ñ Ð¿Ð¾Ð»ÑƒÑ‡ÐµÐ½Ð½Ñ‹Ñ… результатов. Ðапример, клаÑтер ÑоÑтоит из 2 шард и каждый +из них хранит таблицу Ñ 100 Ñтрок. 
Тогда раÑпределнный Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð´Ð»Ñ Ð¿Ð¾Ð»ÑƒÑ‡ÐµÐ½Ð¸Ñ Ð²Ñех данных из Ñтих таблиц и уÑтановленной +наÑтройкой `max_rows_to_read=150` выброÑит иÑключение, Ñ‚.к. в общем он прочитает 200 Ñтрок. Ðо Ð·Ð°Ð¿Ñ€Ð¾Ñ Ñ Ð½Ð°Ñтройкой `max_rows_to_read_leaf=150` завершитÑÑ ÑƒÑпешно, потому что каждый из шардов прочитает макÑимум 100 Ñтрок. ## max_bytes_to_read_leaf {#max-bytes-to-read-leaf} -МакÑимальное количеÑтво байт (неÑжатых данных), которое можно прочитать из таблицы на удалённом Ñервере при -выполнении раÑпределенного запроÑа. РаÑпределенные запроÑÑ‹ могут Ñоздавать неÑколько подзапроÑов к каждому из шардов в -клаÑтере и тогда Ñтот лимит будет применен при выполнении Ñ‡Ñ‚ÐµÐ½Ð¸Ñ Ð½Ð° удаленных Ñерверах (Ð²ÐºÐ»ÑŽÑ‡Ð°Ñ Ð¸ Ñервер-инициатор) -и проигнорирован на Ñервере-инициаторе запроÑа во Ð²Ñ€ÐµÐ¼Ñ Ð¾Ð±ÑŒÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ñ Ð¿Ð¾Ð»ÑƒÑ‡ÐµÐ½Ð½Ñ‹Ñ… результатов. Ðапример, клаÑтер ÑоÑтоит -из 2 шард и каждый из них хранит таблицу Ñо 100 байтами. Тогда раÑпределнный Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð´Ð»Ñ Ð¿Ð¾Ð»ÑƒÑ‡ÐµÐ½Ð¸Ñ Ð²Ñех данных из Ñтих таблиц -и уÑтановленной наÑтройкой `max_bytes_to_read=150` выброÑит иÑключение, Ñ‚.к. в общем он прочитает 200 байт. Ðо Ð·Ð°Ð¿Ñ€Ð¾Ñ +МакÑимальное количеÑтво байт (неÑжатых данных), которое можно прочитать из таблицы на удалённом Ñервере при +выполнении раÑпределенного запроÑа. РаÑпределенные запроÑÑ‹ могут Ñоздавать неÑколько подзапроÑов к каждому из шардов в +клаÑтере и тогда Ñтот лимит будет применен при выполнении Ñ‡Ñ‚ÐµÐ½Ð¸Ñ Ð½Ð° удаленных Ñерверах (Ð²ÐºÐ»ÑŽÑ‡Ð°Ñ Ð¸ Ñервер-инициатор) +и проигнорирован на Ñервере-инициаторе запроÑа во Ð²Ñ€ÐµÐ¼Ñ Ð¾Ð±ÑŒÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ñ Ð¿Ð¾Ð»ÑƒÑ‡ÐµÐ½Ð½Ñ‹Ñ… результатов. Ðапример, клаÑтер ÑоÑтоит +из 2 шард и каждый из них хранит таблицу Ñо 100 байтами. Тогда раÑпределнный Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð´Ð»Ñ Ð¿Ð¾Ð»ÑƒÑ‡ÐµÐ½Ð¸Ñ Ð²Ñех данных из Ñтих таблиц +и уÑтановленной наÑтройкой `max_bytes_to_read=150` выброÑит иÑключение, Ñ‚.к. в общем он прочитает 200 байт. Ðо Ð·Ð°Ð¿Ñ€Ð¾Ñ Ñ Ð½Ð°Ñтройкой `max_bytes_to_read_leaf=150` завершитÑÑ ÑƒÑпешно, потому что каждый из шардов прочитает макÑимум 100 байт. ## read_overflow_mode_leaf {#read-overflow-mode-leaf} diff --git a/docs/ru/operations/settings/settings-profiles.md b/docs/ru/operations/settings/settings-profiles.md index d3b3d29db94..fdf350cff41 100644 --- a/docs/ru/operations/settings/settings-profiles.md +++ b/docs/ru/operations/settings/settings-profiles.md @@ -71,9 +71,9 @@ SET profile = 'web' ``` -Ð’ примере задано два профилÑ: `default` и `web`. +Ð’ примере задано два профилÑ: `default` и `web`. -Профиль `default` имеет Ñпециальное значение — он обÑзателен и применÑетÑÑ Ð¿Ñ€Ð¸ запуÑке Ñервера. Профиль `default` Ñодержит наÑтройки по умолчанию. +Профиль `default` имеет Ñпециальное значение — он обÑзателен и применÑетÑÑ Ð¿Ñ€Ð¸ запуÑке Ñервера. Профиль `default` Ñодержит наÑтройки по умолчанию. Профиль `web` — обычный профиль, который может быть уÑтановлен Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ запроÑа `SET` или параметра URL при запроÑе по HTTP. diff --git a/docs/ru/operations/settings/settings.md b/docs/ru/operations/settings/settings.md index 5518736ff47..c74b3f5f2a5 100644 --- a/docs/ru/operations/settings/settings.md +++ b/docs/ru/operations/settings/settings.md @@ -25,6 +25,30 @@ ClickHouse применÑет наÑтройку в тех ÑлучаÑÑ…, ко - `global` — заменÑет Ð·Ð°Ð¿Ñ€Ð¾Ñ `IN`/`JOIN` на `GLOBAL IN`/`GLOBAL JOIN.` - `allow` — разрешает иÑпользование таких подзапроÑов. +## prefer_global_in_and_join {#prefer-global-in-and-join} + +ЗаменÑет Ð·Ð°Ð¿Ñ€Ð¾Ñ `IN`/`JOIN` на `GLOBAL IN`/`GLOBAL JOIN`. + +Возможные значениÑ: + +- 0 — выключена. 
Операторы `IN`/`JOIN` не заменÑÑŽÑ‚ÑÑ Ð½Ð° `GLOBAL IN`/`GLOBAL JOIN`. +- 1 — включена. Операторы `IN`/`JOIN` заменÑÑŽÑ‚ÑÑ Ð½Ð° `GLOBAL IN`/`GLOBAL JOIN`. + +Значение по умолчанию: `0`. + +**ИÑпользование** + +ÐаÑтройка `SET distributed_product_mode=global` менÑет поведение запроÑов Ð´Ð»Ñ Ñ€Ð°Ñпределенных таблиц, но она не подходит Ð´Ð»Ñ Ð»Ð¾ÐºÐ°Ð»ÑŒÐ½Ñ‹Ñ… таблиц или таблиц из внешних иÑточников. Ð’ Ñтих ÑлучаÑÑ… удобно иÑпользовать наÑтройку `prefer_global_in_and_join`. + +Ðапример, еÑли нужно объединить вÑе данные из локальных таблиц, которые находÑÑ‚ÑÑ Ð½Ð° разных узлах — Ð´Ð»Ñ Ñ€Ð°Ñпределенной обработки необходим `GLOBAL JOIN`. + +Другой вариант иÑÐ¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ð½Ð¸Ñ Ð½Ð°Ñтройки `prefer_global_in_and_join` — регулирование обращений к таблицам из внешних иÑточников. +Эта наÑтройка помогает уменьшить количеÑтво обращений к внешним реÑурÑам при объединении внешних таблиц: только один вызов на веÑÑŒ раÑпределенный запроÑ. + +**См. также:** + +- [РаÑпределенные подзапроÑÑ‹](../../sql-reference/operators/in.md#select-distributed-subqueries) `GLOBAL IN`/`GLOBAL JOIN` + ## enable_optimize_predicate_expression {#enable-optimize-predicate-expression} Включает пробраÑывание предикатов в подзапроÑÑ‹ Ð´Ð»Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñов `SELECT`. @@ -129,6 +153,26 @@ ClickHouse применÑет наÑтройку в тех ÑлучаÑÑ…, ко Значение по умолчанию: 1048576. +## table_function_remote_max_addresses {#table_function_remote_max_addresses} + +Задает макÑимальное количеÑтво адреÑов, которые могут быть Ñгенерированы из шаблонов Ð´Ð»Ñ Ñ„ÑƒÐ½ÐºÑ†Ð¸Ð¸ [remote](../../sql-reference/table-functions/remote.md). + +Возможные значениÑ: + +- Положительное целое. + +Значение по умолчанию: `1000`. + +## glob_expansion_max_elements {#glob_expansion_max_elements } + +Задает макÑимальное количеÑтво адреÑов, которые могут быть Ñгенерированы из шаблонов при иÑпользовании внешних хранилищ и при вызове табличных функциÑÑ… (например, [url](../../sql-reference/table-functions/url.md)), кроме функции `remote`. + +Возможные значениÑ: + +- Положительное целое. + +Значение по умолчанию: `1000`. + ## send_progress_in_http_headers {#settings-send_progress_in_http_headers} Включает или отключает HTTP-заголовки `X-ClickHouse-Progress` в ответах `clickhouse-server`. @@ -490,6 +534,23 @@ ClickHouse может парÑить только базовый формат `Y Значение по умолчанию: `ALL`. +## join_algorithm {#settings-join_algorithm} + +ОпределÑет алгоритм Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñа [JOIN](../../sql-reference/statements/select/join.md). + +Возможные значениÑ: + +- `hash` — иÑпользуетÑÑ [алгоритм ÑÐ¾ÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ñ Ñ…ÐµÑˆÐ¸Ñ€Ð¾Ð²Ð°Ð½Ð¸ÐµÐ¼](https://ru.wikipedia.org/wiki/Ðлгоритм_ÑоединениÑ_хешированием). +- `partial_merge` — иÑпользуетÑÑ [алгоритм ÑÐ¾ÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ñ ÑлиÑнием Ñортированных ÑпиÑков](https://ru.wikipedia.org/wiki/Ðлгоритм_ÑоединениÑ_ÑлиÑнием_Ñортированных_ÑпиÑков). +- `prefer_partial_merge` — иÑпользуетÑÑ Ð°Ð»Ð³Ð¾Ñ€Ð¸Ñ‚Ð¼ ÑÐ¾ÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ñ ÑлиÑнием Ñортированных ÑпиÑков, когда Ñто возможно. +- `auto` — Ñервер ClickHouse пытаетÑÑ Ð½Ð° лету заменить алгоритм `hash` на `merge`, чтобы избежать Ð¿ÐµÑ€ÐµÐ¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð¿Ð°Ð¼Ñти. + +Значение по умолчанию: `hash`. + +При иÑпользовании алгоритма `hash` Ð¿Ñ€Ð°Ð²Ð°Ñ Ñ‡Ð°ÑÑ‚ÑŒ `JOIN` загружаетÑÑ Ð² оперативную памÑÑ‚ÑŒ. + +При иÑпользовании алгоритма `partial_merge` Ñервер Ñортирует данные и ÑбраÑывает их на диÑк. Работа алгоритма `merge` в ClickHouse немного отличаетÑÑ Ð¾Ñ‚ клаÑÑичеÑкой реализации. 
Сначала ClickHouse Ñортирует правую таблицу по блокам на оÑнове [ключей ÑоединениÑ](../../sql-reference/statements/select/join.md#select-join) и Ð´Ð»Ñ Ð¾Ñ‚Ñортированных блоков Ñтроит индекÑÑ‹ min-max. Затем он Ñортирует куÑки левой таблицы на оÑнове ключей ÑÐ¾ÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ñ Ð¸ объединÑет их Ñ Ð¿Ñ€Ð°Ð²Ð¾Ð¹ таблицей операцией `JOIN`. Созданные min-max индекÑÑ‹ иÑпользуютÑÑ Ð´Ð»Ñ Ð¿Ñ€Ð¾Ð¿ÑƒÑка тех блоков из правой таблицы, которые не учаÑтвуют в данной операции `JOIN`. + ## join_any_take_last_row {#settings-join_any_take_last_row} ИзменÑет поведение операций, выполнÑемых Ñо ÑтрогоÑтью `ANY`. @@ -1821,7 +1882,7 @@ ClickHouse генерирует иÑключение Тип: unsigned int -озможные значениÑ: 32 (32 байта) - 1073741824 (1 GiB) +Возможные значениÑ: 32 (32 байта) - 1073741824 (1 GiB) Значение по умолчанию: 32768 (32 KiB) @@ -1835,6 +1896,16 @@ ClickHouse генерирует иÑключение Значение по умолчанию: 16. +## merge_selecting_sleep_ms {#merge_selecting_sleep_ms} + +Ð’Ñ€ÐµÐ¼Ñ Ð¾Ð¶Ð¸Ð´Ð°Ð½Ð¸Ñ Ð´Ð»Ñ ÑлиÑÐ½Ð¸Ñ Ð²Ñ‹Ð±Ð¾Ñ€ÐºÐ¸, еÑли ни один куÑок не выбран. Снижение времени Ð¾Ð¶Ð¸Ð´Ð°Ð½Ð¸Ñ Ð¿Ñ€Ð¸Ð²Ð¾Ð´Ð¸Ñ‚ к чаÑтому выбору задач в пуле `background_schedule_pool` и увеличению количеÑтва запроÑов к Zookeeper в крупных клаÑтерах. + +Возможные значениÑ: + +- Положительное целое чиÑло. + +Значение по умолчанию: `5000`. + ## parallel_distributed_insert_select {#parallel_distributed_insert_select} Включает параллельную обработку раÑпределённых запроÑов `INSERT ... SELECT`. @@ -2730,7 +2801,7 @@ SELECT * FROM test2; └─────────────┘ ``` -Обратите внимание на то, что Ñта наÑтройка влиÑет на поведение [материализованных предÑтавлений](../../sql-reference/statements/create/view.md#materialized) и БД [MaterializeMySQL](../../engines/database-engines/materialize-mysql.md). +Обратите внимание на то, что Ñта наÑтройка влиÑет на поведение [материализованных предÑтавлений](../../sql-reference/statements/create/view.md#materialized) и БД [MaterializedMySQL](../../engines/database-engines/materialized-mysql.md). ## engine_file_empty_if_not_exists {#engine-file-empty_if-not-exists} @@ -2986,6 +3057,53 @@ SELECT FROM fuse_tbl ``` +## allow_experimental_database_replicated {#allow_experimental_database_replicated} + +ПозволÑет Ñоздавать базы данных Ñ Ð´Ð²Ð¸Ð¶ÐºÐ¾Ð¼ [Replicated](../../engines/database-engines/replicated.md). + +Возможные значениÑ: + +- 0 — Disabled. +- 1 — Enabled. + +Значение по умолчанию: `0`. + +## database_replicated_initial_query_timeout_sec {#database_replicated_initial_query_timeout_sec} + +УÑтанавливает, как долго начальный DDL-Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð´Ð¾Ð»Ð¶ÐµÐ½ ждать, пока Ñ€ÐµÐ¿Ð»Ð¸Ñ†Ð¸Ñ€Ð¾Ð²Ð°Ð½Ð½Ð°Ñ Ð±Ð°Ð·Ð° данных прецеÑÑирует предыдущие запиÑи очереди DDL в Ñекундах. + +Возможные значениÑ: + +- Положительное целое чиÑло. +- 0 — Ðе ограничено. + +Значение по умолчанию: `300`. + +## distributed_ddl_task_timeout {#distributed_ddl_task_timeout} + +УÑтанавливает тайм-аут Ð´Ð»Ñ Ð¾Ñ‚Ð²ÐµÑ‚Ð¾Ð² на DDL-запроÑÑ‹ от вÑех хоÑтов в клаÑтере. ЕÑли DDL-Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð½Ðµ был выполнен на вÑех хоÑтах, ответ будет Ñодержать ошибку тайм-аута, и Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð±ÑƒÐ´ÐµÑ‚ выполнен в аÑинхронном режиме. + +Возможные значениÑ: + +- Положительное целое чиÑло. +- 0 — ÐÑинхронный режим. +- Отрицательное чиÑло — беÑконечный тайм-аут. + +Значение по умолчанию: `180`. + +## distributed_ddl_output_mode {#distributed_ddl_output_mode} + +Задает формат результата раÑпределенного DDL-запроÑа. + +Возможные значениÑ: + +- `throw` — возвращает набор результатов Ñо ÑтатуÑом Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñов Ð´Ð»Ñ Ð²Ñех хоÑтов, где завершен запроÑ. 
ЕÑли Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð½Ðµ выполнилÑÑ Ð½Ð° некоторых хоÑтах, то будет выброшено иÑключение. ЕÑли Ð·Ð°Ð¿Ñ€Ð¾Ñ ÐµÑ‰Ðµ не закончен на некоторых хоÑтах и таймаут [distributed_ddl_task_timeout](#distributed_ddl_task_timeout) превышен, то выбраÑываетÑÑ Ð¸Ñключение `TIMEOUT_EXCEEDED`. +- `none` — идентично `throw`, но раÑпределенный DDL-Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð½Ðµ возвращает набор результатов. +- `null_status_on_timeout` — возвращает `NULL` в качеÑтве ÑтатуÑа Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð² некоторых Ñтроках набора результатов вмеÑто выбраÑÑ‹Ð²Ð°Ð½Ð¸Ñ `TIMEOUT_EXCEEDED`, еÑли Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð½Ðµ закончен на ÑоответÑтвующих хоÑтах. +- `never_throw` — не выбраÑывает иÑключение и `TIMEOUT_EXCEEDED`, еÑли Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð½Ðµ удалÑÑ Ð½Ð° некоторых хоÑтах. + +Значение по умолчанию: `throw`. + ## flatten_nested {#flatten-nested} УÑтанавливает формат данных у [вложенных](../../sql-reference/data-types/nested-data-structures/nested.md) Ñтолбцов. @@ -3066,3 +3184,14 @@ SETTINGS index_granularity = 8192 │ **ИÑпользование** ЕÑли уÑтановлено значение `0`, то Ñ‚Ð°Ð±Ð»Ð¸Ñ‡Ð½Ð°Ñ Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð½Ðµ делает Nullable Ñтолбцы, а вмеÑто NULL выÑтавлÑет Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¿Ð¾ умолчанию Ð´Ð»Ñ ÑкалÑрного типа. Это также применимо Ð´Ð»Ñ Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ð¹ NULL внутри маÑÑивов. + +## output_format_arrow_low_cardinality_as_dictionary {#output-format-arrow-low-cardinality-as-dictionary} + +ПозволÑет конвертировать тип [LowCardinality](../../sql-reference/data-types/lowcardinality.md) в тип `DICTIONARY` формата [Arrow](../../interfaces/formats.md#data-format-arrow) Ð´Ð»Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñов `SELECT`. + +Возможные значениÑ: + +- 0 — тип `LowCardinality` не конвертируетÑÑ Ð² тип `DICTIONARY`. +- 1 — тип `LowCardinality` конвертируетÑÑ Ð² тип `DICTIONARY`. + +Значение по умолчанию: `0`. diff --git a/docs/ru/operations/system-tables/asynchronous_metric_log.md b/docs/ru/operations/system-tables/asynchronous_metric_log.md index 979b63f0cc8..8bb371de230 100644 --- a/docs/ru/operations/system-tables/asynchronous_metric_log.md +++ b/docs/ru/operations/system-tables/asynchronous_metric_log.md @@ -31,6 +31,6 @@ SELECT * FROM system.asynchronous_metric_log LIMIT 10 ``` **Смотрите также** -- [system.asynchronous_metrics](#system_tables-asynchronous_metrics) — Содержит метрики, которые периодичеÑки вычиÑлÑÑŽÑ‚ÑÑ Ð² фоновом режиме. +- [system.asynchronous_metrics](#system_tables-asynchronous_metrics) — Содержит метрики, которые периодичеÑки вычиÑлÑÑŽÑ‚ÑÑ Ð² фоновом режиме. - [system.metric_log](#system_tables-metric_log) — таблица фикÑÐ¸Ñ€ÑƒÑŽÑ‰Ð°Ñ Ð¸Ñторию значений метрик из `system.metrics` и `system.events`. diff --git a/docs/ru/operations/system-tables/asynchronous_metrics.md b/docs/ru/operations/system-tables/asynchronous_metrics.md index 9d12a119c43..faefdf0eee5 100644 --- a/docs/ru/operations/system-tables/asynchronous_metrics.md +++ b/docs/ru/operations/system-tables/asynchronous_metrics.md @@ -35,4 +35,3 @@ SELECT * FROM system.asynchronous_metrics LIMIT 10 - [system.events](#system_tables-events) — таблица Ñ ÐºÐ¾Ð»Ð¸Ñ‡ÐµÑтвом произошедших Ñобытий. - [system.metric_log](#system_tables-metric_log) — таблица фикÑÐ¸Ñ€ÑƒÑŽÑ‰Ð°Ñ Ð¸Ñторию значений метрик из `system.metrics` и `system.events`. 
- \ No newline at end of file diff --git a/docs/ru/operations/system-tables/columns.md b/docs/ru/operations/system-tables/columns.md index b8a0aef2299..a896360b3f9 100644 --- a/docs/ru/operations/system-tables/columns.md +++ b/docs/ru/operations/system-tables/columns.md @@ -4,7 +4,7 @@ С помощью Ñтой таблицы можно получить информацию аналогично запроÑу [DESCRIBE TABLE](../../sql-reference/statements/misc.md#misc-describe-table), но Ð´Ð»Ñ Ð¼Ð½Ð¾Ð³Ð¸Ñ… таблиц Ñразу. -Колонки [временных таблиц](../../sql-reference/statements/create/table.md#temporary-tables) ÑодержатÑÑ Ð² `system.columns` только в тех ÑеÑÑиÑÑ…, в которых Ñти таблицы были Ñозданы. Поле `database` у таких колонок пуÑтое. +Колонки [временных таблиц](../../sql-reference/statements/create/table.md#temporary-tables) ÑодержатÑÑ Ð² `system.columns` только в тех ÑеÑÑиÑÑ…, в которых Ñти таблицы были Ñозданы. Поле `database` у таких колонок пуÑтое. Cтолбцы: @@ -38,17 +38,17 @@ database: system table: aggregate_function_combinators name: name type: String -default_kind: -default_expression: +default_kind: +default_expression: data_compressed_bytes: 0 data_uncompressed_bytes: 0 marks_bytes: 0 -comment: +comment: is_in_partition_key: 0 is_in_sorting_key: 0 is_in_primary_key: 0 is_in_sampling_key: 0 -compression_codec: +compression_codec: Row 2: ────── @@ -56,15 +56,15 @@ database: system table: aggregate_function_combinators name: is_internal type: UInt8 -default_kind: -default_expression: +default_kind: +default_expression: data_compressed_bytes: 0 data_uncompressed_bytes: 0 marks_bytes: 0 -comment: +comment: is_in_partition_key: 0 is_in_sorting_key: 0 is_in_primary_key: 0 is_in_sampling_key: 0 -compression_codec: +compression_codec: ``` diff --git a/docs/ru/operations/system-tables/current-roles.md b/docs/ru/operations/system-tables/current-roles.md index 42ed4260fde..ee9cbb08b3d 100644 --- a/docs/ru/operations/system-tables/current-roles.md +++ b/docs/ru/operations/system-tables/current-roles.md @@ -6,5 +6,5 @@ - `role_name` ([String](../../sql-reference/data-types/string.md))) — Ð˜Ð¼Ñ Ñ€Ð¾Ð»Ð¸. - `with_admin_option` ([UInt8](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Флаг, который показывает, обладает ли `current_role` роль привилегией `ADMIN OPTION`. - - `is_default` ([UInt8](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Флаг, который показывает, ÑвлÑетÑÑ Ð»Ð¸ `current_role` ролью по умолчанию. + - `is_default` ([UInt8](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Флаг, который показывает, ÑвлÑетÑÑ Ð»Ð¸ `current_role` ролью по умолчанию. diff --git a/docs/ru/operations/system-tables/distributed_ddl_queue.md b/docs/ru/operations/system-tables/distributed_ddl_queue.md index 99d92574a0b..b384243834f 100644 --- a/docs/ru/operations/system-tables/distributed_ddl_queue.md +++ b/docs/ru/operations/system-tables/distributed_ddl_queue.md @@ -61,4 +61,3 @@ exception_code: ZOK 2 rows in set. Elapsed: 0.025 sec. 
``` - \ No newline at end of file diff --git a/docs/ru/operations/system-tables/distribution_queue.md b/docs/ru/operations/system-tables/distribution_queue.md index 5b811ab2be8..08f99d77343 100644 --- a/docs/ru/operations/system-tables/distribution_queue.md +++ b/docs/ru/operations/system-tables/distribution_queue.md @@ -36,7 +36,7 @@ is_blocked: 1 error_count: 0 data_files: 1 data_compressed_bytes: 499 -last_exception: +last_exception: ``` **Смотрите также** diff --git a/docs/ru/operations/system-tables/enabled-roles.md b/docs/ru/operations/system-tables/enabled-roles.md index a3f5ba179b3..2208f96e812 100644 --- a/docs/ru/operations/system-tables/enabled-roles.md +++ b/docs/ru/operations/system-tables/enabled-roles.md @@ -5,7 +5,7 @@ Столбцы: - `role_name` ([String](../../sql-reference/data-types/string.md))) — Ð˜Ð¼Ñ Ñ€Ð¾Ð»Ð¸. -- `with_admin_option` ([UInt8](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Флаг, который показывает, обладает ли `enabled_role` роль привилегией `ADMIN OPTION`. -- `is_current` ([UInt8](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Флаг, который показывает, ÑвлÑетÑÑ Ð»Ð¸ `enabled_role` текущей ролью текущего пользователÑ. +- `with_admin_option` ([UInt8](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Флаг, который показывает, обладает ли `enabled_role` роль привилегией `ADMIN OPTION`. +- `is_current` ([UInt8](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Флаг, который показывает, ÑвлÑетÑÑ Ð»Ð¸ `enabled_role` текущей ролью текущего пользователÑ. - `is_default` ([UInt8](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Флаг, который показывает, ÑвлÑетÑÑ Ð»Ð¸ `enabled_role` ролью по умолчанию. diff --git a/docs/ru/operations/system-tables/index.md b/docs/ru/operations/system-tables/index.md index fce93f33a27..73b839ddc1f 100644 --- a/docs/ru/operations/system-tables/index.md +++ b/docs/ru/operations/system-tables/index.md @@ -50,7 +50,7 @@ toc_title: "СиÑтемные таблицы" По умолчанию размер таблицы не ограничен. УправлÑÑ‚ÑŒ размером таблицы можно иÑÐ¿Ð¾Ð»ÑŒÐ·ÑƒÑ [TTL](../../sql-reference/statements/alter/ttl.md#manipuliatsii-s-ttl-tablitsy) Ð´Ð»Ñ ÑƒÐ´Ð°Ð»ÐµÐ½Ð¸Ñ ÑƒÑтаревших запиÑей журнала. Также вы можете иÑпользовать функцию Ð¿Ð°Ñ€Ñ‚Ð¸Ñ†Ð¸Ð¾Ð½Ð¸Ñ€Ð¾Ð²Ð°Ð½Ð¸Ñ Ð´Ð»Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ† `MergeTree`. -### ИÑточники ÑиÑтемных показателей +### ИÑточники ÑиÑтемных показателей Ð”Ð»Ñ Ñбора ÑиÑтемных показателей Ñервер ClickHouse иÑпользует: diff --git a/docs/ru/operations/system-tables/licenses.md b/docs/ru/operations/system-tables/licenses.md index 598da1e72ee..b22dc73b666 100644 --- a/docs/ru/operations/system-tables/licenses.md +++ b/docs/ru/operations/system-tables/licenses.md @@ -5,7 +5,7 @@ Столбцы: - `library_name` ([String](../../sql-reference/data-types/string.md)) — Ðазвание библиотеки, к которой отноÑитÑÑ Ð»Ð¸Ñ†ÐµÐ½Ð·Ð¸Ñ. -- `license_type` ([String](../../sql-reference/data-types/string.md)) — Тип лицензии, например, Apache, MIT. +- `license_type` ([String](../../sql-reference/data-types/string.md)) — Тип лицензии, например, Apache, MIT. - `license_path` ([String](../../sql-reference/data-types/string.md)) — Путь к файлу Ñ Ñ‚ÐµÐºÑтом лицензии. - `license_text` ([String](../../sql-reference/data-types/string.md)) — ТекÑÑ‚ лицензии. 
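A minimal illustrative query over `system.licenses`, using only the columns listed above (shown as a sketch; the exact set of bundled libraries depends on the build):

```sql
-- List a few bundled third-party libraries and their license types.
SELECT library_name, license_type, license_path
FROM system.licenses
LIMIT 3;
```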
diff --git a/docs/ru/operations/system-tables/mutations.md b/docs/ru/operations/system-tables/mutations.md index 4370ab593e7..bbd4d9fac13 100644 --- a/docs/ru/operations/system-tables/mutations.md +++ b/docs/ru/operations/system-tables/mutations.md @@ -1,6 +1,6 @@ # system.mutations {#system_tables-mutations} -Таблица Ñодержит информацию о ходе Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ [мутаций](../../sql-reference/statements/alter/index.md#mutations) таблиц ÑемейÑтва MergeTree. Каждой команде мутации ÑоответÑтвует одна Ñтрока таблицы. +Таблица Ñодержит информацию о ходе Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ [мутаций](../../sql-reference/statements/alter/index.md#mutations) таблиц ÑемейÑтва MergeTree. Каждой команде мутации ÑоответÑтвует одна Ñтрока таблицы. Столбцы: diff --git a/docs/ru/operations/system-tables/part_log.md b/docs/ru/operations/system-tables/part_log.md index a8d892f3b67..78e9a7c0fbe 100644 --- a/docs/ru/operations/system-tables/part_log.md +++ b/docs/ru/operations/system-tables/part_log.md @@ -63,6 +63,6 @@ read_rows: 0 read_bytes: 0 peak_memory_usage: 0 error: 0 -exception: +exception: ``` diff --git a/docs/ru/operations/system-tables/parts.md b/docs/ru/operations/system-tables/parts.md index 1c7f0ad2e9a..c73e1566a95 100644 --- a/docs/ru/operations/system-tables/parts.md +++ b/docs/ru/operations/system-tables/parts.md @@ -19,10 +19,10 @@ Возможные значениÑ: - - `Wide` — ÐºÐ°Ð¶Ð´Ð°Ñ ÐºÐ¾Ð»Ð¾Ð½ÐºÐ° хранитÑÑ Ð² отдельном файле. - - `Compact` — вÑе колонки хранÑÑ‚ÑÑ Ð² одном файле. + - `Wide` — ÐºÐ°Ð¶Ð´Ð°Ñ ÐºÐ¾Ð»Ð¾Ð½ÐºÐ° хранитÑÑ Ð² отдельном файле. + - `Compact` — вÑе колонки хранÑÑ‚ÑÑ Ð² одном файле. - Формат Ñ…Ñ€Ð°Ð½ÐµÐ½Ð¸Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ… определÑетÑÑ Ð½Ð°Ñтройками `min_bytes_for_wide_part` и `min_rows_for_wide_part` таблицы [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md). + Формат Ñ…Ñ€Ð°Ð½ÐµÐ½Ð¸Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ… определÑетÑÑ Ð½Ð°Ñтройками `min_bytes_for_wide_part` и `min_rows_for_wide_part` таблицы [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md). - `active` ([UInt8](../../sql-reference/data-types/int-uint.md)) – признак активноÑти. ЕÑли куÑок активен, то он иÑпользуетÑÑ Ñ‚Ð°Ð±Ð»Ð¸Ñ†ÐµÐ¹, в противном Ñлучает он будет удален. Ðеактивные куÑки оÑтаютÑÑ Ð¿Ð¾Ñле ÑлиÑний. @@ -88,7 +88,7 @@ - `delete_ttl_info_max` ([DateTime](../../sql-reference/data-types/datetime.md)) — МакÑимальное значение ключа даты и времени Ð´Ð»Ñ Ð¿Ñ€Ð°Ð²Ð¸Ð»Ð° [TTL DELETE](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-ttl). -- `move_ttl_info.expression` ([Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md))) — МаÑÑив выражений. Каждое выражение задаёт правило [TTL MOVE](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-ttl). +- `move_ttl_info.expression` ([Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md))) — МаÑÑив выражений. Каждое выражение задаёт правило [TTL MOVE](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-ttl). !!! note "Предупреждение" МаÑÑив выражений `move_ttl_info.expression` иÑпользуетÑÑ, в оÑновном, Ð´Ð»Ñ Ð¾Ð±Ñ€Ð°Ñ‚Ð½Ð¾Ð¹ ÑовмеÑтимоÑти. Ð”Ð»Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ñ‹ Ñ Ð¿Ñ€Ð°Ð²Ð¸Ð»Ð°Ð¼Ð¸ `TTL MOVE` лучше иÑпользовать Ð¿Ð¾Ð»Ñ `move_ttl_info.min` и `move_ttl_info.max`. 
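As a rough sketch of how the `system.parts` columns described above are usually inspected (the table name `visits` here is only a placeholder):

```sql
-- Show active data parts of a hypothetical table together with their storage format.
SELECT partition, name, part_type, active, rows
FROM system.parts
WHERE table = 'visits' AND active
LIMIT 2;
```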
diff --git a/docs/ru/operations/system-tables/parts_columns.md b/docs/ru/operations/system-tables/parts_columns.md index 5640929d810..04220ea480f 100644 --- a/docs/ru/operations/system-tables/parts_columns.md +++ b/docs/ru/operations/system-tables/parts_columns.md @@ -19,10 +19,10 @@ Возможные значениÑ: - - `Wide` — ÐºÐ°Ð¶Ð´Ð°Ñ ÐºÐ¾Ð»Ð¾Ð½ÐºÐ° хранитÑÑ Ð² отдельном файле. - - `Compact` — вÑе колонки хранÑÑ‚ÑÑ Ð² одном файле. + - `Wide` — ÐºÐ°Ð¶Ð´Ð°Ñ ÐºÐ¾Ð»Ð¾Ð½ÐºÐ° хранитÑÑ Ð² отдельном файле. + - `Compact` — вÑе колонки хранÑÑ‚ÑÑ Ð² одном файле. - Формат Ñ…Ñ€Ð°Ð½ÐµÐ½Ð¸Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ… определÑетÑÑ Ð½Ð°Ñтройками `min_bytes_for_wide_part` и `min_rows_for_wide_part` таблицы [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md). + Формат Ñ…Ñ€Ð°Ð½ÐµÐ½Ð¸Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ… определÑетÑÑ Ð½Ð°Ñтройками `min_bytes_for_wide_part` и `min_rows_for_wide_part` таблицы [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md). - `active` ([UInt8](../../sql-reference/data-types/int-uint.md)) — признак активноÑти. ЕÑли куÑок данных активен, то он иÑпользуетÑÑ Ñ‚Ð°Ð±Ð»Ð¸Ñ†ÐµÐ¹, в противном Ñлучае он будет удален. Ðеактивные куÑки оÑтаютÑÑ Ð¿Ð¾Ñле ÑлиÑний. diff --git a/docs/ru/operations/system-tables/query_log.md b/docs/ru/operations/system-tables/query_log.md index 8cdddba462c..7e98ddedcec 100644 --- a/docs/ru/operations/system-tables/query_log.md +++ b/docs/ru/operations/system-tables/query_log.md @@ -51,6 +51,7 @@ ClickHouse не удалÑет данные из таблица автомати - `databases` ([Array](../../sql-reference/data-types/array.md)([LowCardinality(String)](../../sql-reference/data-types/lowcardinality.md))) — имена баз данных, приÑутÑтвующих в запроÑе. - `tables` ([Array](../../sql-reference/data-types/array.md)([LowCardinality(String)](../../sql-reference/data-types/lowcardinality.md))) — имена таблиц, приÑутÑтвующих в запроÑе. - `columns` ([Array](../../sql-reference/data-types/array.md)([LowCardinality(String)](../../sql-reference/data-types/lowcardinality.md))) — имена Ñтолбцов, приÑутÑтвующих в запроÑе. +- `projections` ([String](../../sql-reference/data-types/string.md)) — имена проекций, иÑпользованных при выполнении запроÑа. - `exception_code` ([Int32](../../sql-reference/data-types/int-uint.md)) — код иÑключениÑ. - `exception` ([String](../../sql-reference/data-types/string.md)) — Ñообщение иÑключениÑ, еÑли Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð·Ð°Ð²ÐµÑ€ÑˆÐ¸Ð»ÑÑ Ð¿Ð¾ иÑключению. - `stack_trace` ([String](../../sql-reference/data-types/string.md)) — [stack trace](https://en.wikipedia.org/wiki/Stack_trace). ПуÑÑ‚Ð°Ñ Ñтрока, еÑли Ð·Ð°Ð¿Ñ€Ð¾Ñ ÑƒÑпешно завершен. @@ -65,6 +66,8 @@ ClickHouse не удалÑет данные из таблица автомати - `initial_query_id` ([String](../../sql-reference/data-types/string.md)) — ID родительÑкого запроÑа. - `initial_address` ([IPv6](../../sql-reference/data-types/domains/ipv6.md)) — IP адреÑ, Ñ ÐºÐ¾Ñ‚Ð¾Ñ€Ð¾Ð³Ð¾ пришел родительÑкий запроÑ. - `initial_port` ([UInt16](../../sql-reference/data-types/int-uint.md)) — порт, Ñ ÐºÐ¾Ñ‚Ð¾Ñ€Ð¾Ð³Ð¾ клиент Ñделал родительÑкий запроÑ. +- `initial_query_start_time` ([DateTime](../../sql-reference/data-types/datetime.md)) — Ð²Ñ€ÐµÐ¼Ñ Ð½Ð°Ñ‡Ð°Ð»Ð° обработки запроÑа (Ð´Ð»Ñ Ñ€Ð°Ñпределенных запроÑов). +- `initial_query_start_time_microseconds` ([DateTime64](../../sql-reference/data-types/datetime64.md)) — Ð²Ñ€ÐµÐ¼Ñ Ð½Ð°Ñ‡Ð°Ð»Ð° обработки запроÑа Ñ Ñ‚Ð¾Ñ‡Ð½Ð¾Ñтью до микроÑекунд (Ð´Ð»Ñ Ñ€Ð°Ñпределенных запроÑов). - `interface` ([UInt8](../../sql-reference/data-types/int-uint.md)) — интерфейÑ, Ñ ÐºÐ¾Ñ‚Ð¾Ñ€Ð¾Ð³Ð¾ ушёл запроÑ. Возможные значениÑ: - 1 — TCP. 
- 2 — HTTP. @@ -101,55 +104,77 @@ ClickHouse не удалÑет данные из таблица автомати **Пример** ``` sql -SELECT * FROM system.query_log WHERE type = 'QueryFinish' AND (query LIKE '%toDate(\'2000-12-05\')%') ORDER BY query_start_time DESC LIMIT 1 FORMAT Vertical; +SELECT * FROM system.query_log WHERE type = 'QueryFinish' ORDER BY query_start_time DESC LIMIT 1 FORMAT Vertical; ``` ``` text Row 1: ────── -type: QueryStart -event_date: 2020-09-11 -event_time: 2020-09-11 10:08:17 -event_time_microseconds: 2020-09-11 10:08:17.063321 -query_start_time: 2020-09-11 10:08:17 -query_start_time_microseconds: 2020-09-11 10:08:17.063321 -query_duration_ms: 0 -read_rows: 0 -read_bytes: 0 -written_rows: 0 -written_bytes: 0 -result_rows: 0 -result_bytes: 0 -memory_usage: 0 -current_database: default -query: INSERT INTO test1 VALUES -exception_code: 0 +type: QueryFinish +event_date: 2021-07-28 +event_time: 2021-07-28 13:46:56 +event_time_microseconds: 2021-07-28 13:46:56.719791 +query_start_time: 2021-07-28 13:46:56 +query_start_time_microseconds: 2021-07-28 13:46:56.704542 +query_duration_ms: 14 +read_rows: 8393 +read_bytes: 374325 +written_rows: 0 +written_bytes: 0 +result_rows: 4201 +result_bytes: 153024 +memory_usage: 4714038 +current_database: default +query: SELECT DISTINCT arrayJoin(extractAll(name, '[\\w_]{2,}')) AS res FROM (SELECT name FROM system.functions UNION ALL SELECT name FROM system.table_engines UNION ALL SELECT name FROM system.formats UNION ALL SELECT name FROM system.table_functions UNION ALL SELECT name FROM system.data_type_families UNION ALL SELECT name FROM system.merge_tree_settings UNION ALL SELECT name FROM system.settings UNION ALL SELECT cluster FROM system.clusters UNION ALL SELECT macro FROM system.macros UNION ALL SELECT policy_name FROM system.storage_policies UNION ALL SELECT concat(func.name, comb.name) FROM system.functions AS func CROSS JOIN system.aggregate_function_combinators AS comb WHERE is_aggregate UNION ALL SELECT name FROM system.databases LIMIT 10000 UNION ALL SELECT DISTINCT name FROM system.tables LIMIT 10000 UNION ALL SELECT DISTINCT name FROM system.dictionaries LIMIT 10000 UNION ALL SELECT DISTINCT name FROM system.columns LIMIT 10000) WHERE notEmpty(res) +normalized_query_hash: 6666026786019643712 +query_kind: Select +databases: ['system'] +tables: ['system.aggregate_function_combinators','system.clusters','system.columns','system.data_type_families','system.databases','system.dictionaries','system.formats','system.functions','system.macros','system.merge_tree_settings','system.settings','system.storage_policies','system.table_engines','system.table_functions','system.tables'] +columns: ['system.aggregate_function_combinators.name','system.clusters.cluster','system.columns.name','system.data_type_families.name','system.databases.name','system.dictionaries.name','system.formats.name','system.functions.is_aggregate','system.functions.name','system.macros.macro','system.merge_tree_settings.name','system.settings.name','system.storage_policies.policy_name','system.table_engines.name','system.table_functions.name','system.tables.name'] +projections: [] +exception_code: 0 exception: stack_trace: -is_initial_query: 1 -user: default -query_id: 50a320fd-85a8-49b8-8761-98a86bcbacef -address: ::ffff:127.0.0.1 -port: 33452 -initial_user: default -initial_query_id: 50a320fd-85a8-49b8-8761-98a86bcbacef -initial_address: ::ffff:127.0.0.1 -initial_port: 33452 -interface: 1 -os_user: bharatnc -client_hostname: tower -client_name: ClickHouse -client_revision: 54437 
-client_version_major: 20 -client_version_minor: 7 -client_version_patch: 2 -http_method: 0 +is_initial_query: 1 +user: default +query_id: a3361f6e-a1fd-4d54-9f6f-f93a08bab0bf +address: ::ffff:127.0.0.1 +port: 51006 +initial_user: default +initial_query_id: a3361f6e-a1fd-4d54-9f6f-f93a08bab0bf +initial_address: ::ffff:127.0.0.1 +initial_port: 51006 +initial_query_start_time: 2021-07-28 13:46:56 +initial_query_start_time_microseconds: 2021-07-28 13:46:56.704542 +interface: 1 +os_user: +client_hostname: +client_name: ClickHouse client +client_revision: 54449 +client_version_major: 21 +client_version_minor: 8 +client_version_patch: 0 +http_method: 0 http_user_agent: +http_referer: +forwarded_for: quota_key: -revision: 54440 -thread_ids: [] -ProfileEvents: {'Query':1,'SelectQuery':1,'ReadCompressedBytes':36,'CompressedReadBufferBlocks':1,'CompressedReadBufferBytes':10,'IOBufferAllocs':1,'IOBufferAllocBytes':89,'ContextLock':15,'RWLockAcquiredReadLocks':1} -Settings: {'background_pool_size':'32','load_balancing':'random','allow_suspicious_low_cardinality_types':'1','distributed_aggregation_memory_efficient':'1','skip_unavailable_shards':'1','log_queries':'1','max_bytes_before_external_group_by':'20000000000','max_bytes_before_external_sort':'20000000000','allow_introspection_functions':'1'} +revision: 54453 +log_comment: +thread_ids: [5058,22097,22110,22094] +ProfileEvents.Names: ['Query','SelectQuery','ArenaAllocChunks','ArenaAllocBytes','FunctionExecute','NetworkSendElapsedMicroseconds','SelectedRows','SelectedBytes','ContextLock','RWLockAcquiredReadLocks','RealTimeMicroseconds','UserTimeMicroseconds','SystemTimeMicroseconds','SoftPageFaults','OSCPUWaitMicroseconds','OSCPUVirtualTimeMicroseconds','OSWriteBytes','OSWriteChars'] +ProfileEvents.Values: [1,1,39,352256,64,360,8393,374325,412,440,34480,13108,4723,671,19,17828,8192,10240] +Settings.Names: ['load_balancing','max_memory_usage'] +Settings.Values: ['random','10000000000'] +used_aggregate_functions: [] +used_aggregate_function_combinators: [] +used_database_engines: [] +used_data_type_families: ['UInt64','UInt8','Nullable','String','date'] +used_dictionaries: [] +used_formats: [] +used_functions: ['concat','notEmpty','extractAll'] +used_storages: [] +used_table_functions: [] ``` **Смотрите также** diff --git a/docs/ru/operations/system-tables/quota_limits.md b/docs/ru/operations/system-tables/quota_limits.md index 4103391cfd6..21505b7d2c5 100644 --- a/docs/ru/operations/system-tables/quota_limits.md +++ b/docs/ru/operations/system-tables/quota_limits.md @@ -5,7 +5,7 @@ Столбцы: - `quota_name` ([String](../../sql-reference/data-types/string.md)) — Ð¸Ð¼Ñ ÐºÐ²Ð¾Ñ‚Ñ‹. -- `duration` ([UInt32](../../sql-reference/data-types/int-uint.md)) — длина временного интервала Ð´Ð»Ñ Ñ€Ð°Ñчета Ð¿Ð¾Ñ‚Ñ€ÐµÐ±Ð»ÐµÐ½Ð¸Ñ Ñ€ÐµÑурÑов, в Ñекундах. +- `duration` ([UInt32](../../sql-reference/data-types/int-uint.md)) — длина временного интервала Ð´Ð»Ñ Ñ€Ð°Ñчета Ð¿Ð¾Ñ‚Ñ€ÐµÐ±Ð»ÐµÐ½Ð¸Ñ Ñ€ÐµÑурÑов, в Ñекундах. - `is_randomized_interval` ([UInt8](../../sql-reference/data-types/int-uint.md#uint-ranges)) — логичеÑкое значение. Оно показывает, ÑвлÑетÑÑ Ð»Ð¸ интервал рандомизированным. Интервал вÑегда начинаетÑÑ Ð² одно и то же времÑ, еÑли он не рандомизирован. Ðапример, интервал в 1 минуту вÑегда начинаетÑÑ Ñ Ñ†ÐµÐ»Ð¾Ð³Ð¾ чиÑла минут (то еÑÑ‚ÑŒ он может начинатьÑÑ Ð² 11:20:00, но никогда не начинаетÑÑ Ð² 11:20:01), интервал в один день вÑегда начинаетÑÑ Ð² полночь UTC. 
ЕÑли интервал рандомизирован, то Ñамый первый интервал начинаетÑÑ Ð² произвольное времÑ, а поÑледующие интервалы начинаютÑÑ Ð¾Ð´Ð¸Ð½ за другим. ЗначениÑ: - `0` — интервал рандомизирован. - `1` — интервал не рандомизирован. diff --git a/docs/ru/operations/system-tables/quotas.md b/docs/ru/operations/system-tables/quotas.md index fe6b78cc44b..3715bc89596 100644 --- a/docs/ru/operations/system-tables/quotas.md +++ b/docs/ru/operations/system-tables/quotas.md @@ -7,14 +7,14 @@ - `name` ([String](../../sql-reference/data-types/string.md)) — Ð˜Ð¼Ñ ÐºÐ²Ð¾Ñ‚Ñ‹. - `id` ([UUID](../../sql-reference/data-types/uuid.md)) — ID квоты. - `storage`([String](../../sql-reference/data-types/string.md)) — Хранилище квот. Возможные значениÑ: "users.xml", еÑли квота задана в файле users.xml, "disk" — еÑли квота задана в SQL-запроÑе. -- `keys` ([Array](../../sql-reference/data-types/array.md)([Enum8](../../sql-reference/data-types/enum.md))) — Ключ определÑет ÑовмеÑтное иÑпользование квоты. ЕÑли два ÑÐ¾ÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ñ Ð¸Ñпользуют одну и ту же квоту, они ÑовмеÑтно иÑпользуют один и тот же объем реÑурÑов. ЗначениÑ: +- `keys` ([Array](../../sql-reference/data-types/array.md)([Enum8](../../sql-reference/data-types/enum.md))) — Ключ определÑет ÑовмеÑтное иÑпользование квоты. ЕÑли два ÑÐ¾ÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ñ Ð¸Ñпользуют одну и ту же квоту, они ÑовмеÑтно иÑпользуют один и тот же объем реÑурÑов. ЗначениÑ: - `[]` — Ð’Ñе пользователи иÑпользуют одну и ту же квоту. - - `['user_name']` — Ð¡Ð¾ÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ñ Ñ Ð¾Ð´Ð¸Ð½Ð°ÐºÐ¾Ð²Ñ‹Ð¼ именем Ð¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ñ‚ÐµÐ»Ñ Ð¸Ñпользуют одну и ту же квоту. - - `['ip_address']` — Ð¡Ð¾ÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ñ Ñ Ð¾Ð´Ð¸Ð½Ð°ÐºÐ¾Ð²Ñ‹Ð¼ IP-адреÑом иÑпользуют одну и ту же квоту. + - `['user_name']` — Ð¡Ð¾ÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ñ Ñ Ð¾Ð´Ð¸Ð½Ð°ÐºÐ¾Ð²Ñ‹Ð¼ именем Ð¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ñ‚ÐµÐ»Ñ Ð¸Ñпользуют одну и ту же квоту. + - `['ip_address']` — Ð¡Ð¾ÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ñ Ñ Ð¾Ð´Ð¸Ð½Ð°ÐºÐ¾Ð²Ñ‹Ð¼ IP-адреÑом иÑпользуют одну и ту же квоту. - `['client_key']` — Ð¡Ð¾ÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ñ Ñ Ð¾Ð´Ð¸Ð½Ð°ÐºÐ¾Ð²Ñ‹Ð¼ ключом иÑпользуют одну и ту же квоту. Ключ может быть Ñвно задан клиентом. При иÑпользовании [clickhouse-client](../../interfaces/cli.md), передайте ключевое значение в параметре `--quota-key`, или иÑпользуйте параметр `quota_key` файле наÑтроек клиента. Ð’ Ñлучае иÑÐ¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ð½Ð¸Ñ HTTP интерфейÑа, иÑпользуйте заголовок `X-ClickHouse-Quota`. - `['user_name', 'client_key']` — Ð¡Ð¾ÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ñ Ñ Ð¾Ð´Ð¸Ð½Ð°ÐºÐ¾Ð²Ñ‹Ð¼ ключом иÑпользуют одну и ту же квоту. ЕÑли ключ не предоÑтавлен клиентом, то квота отÑлеживаетÑÑ Ð´Ð»Ñ `user_name`. - `['client_key', 'ip_address']` — Ð¡Ð¾ÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ñ Ñ Ð¾Ð´Ð¸Ð½Ð°ÐºÐ¾Ð²Ñ‹Ð¼ ключом иÑпользуют одну и ту же квоту. ЕÑли ключ не предоÑтавлен клиентом, то квота отÑлеживаетÑÑ Ð´Ð»Ñ `ip_address`. -- `durations` ([Array](../../sql-reference/data-types/array.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — Длины временных интервалов Ð´Ð»Ñ Ñ€Ð°Ñчета Ð¿Ð¾Ñ‚Ñ€ÐµÐ±Ð»ÐµÐ½Ð¸Ñ Ñ€ÐµÑурÑов, в Ñекундах. +- `durations` ([Array](../../sql-reference/data-types/array.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — Длины временных интервалов Ð´Ð»Ñ Ñ€Ð°Ñчета Ð¿Ð¾Ñ‚Ñ€ÐµÐ±Ð»ÐµÐ½Ð¸Ñ Ñ€ÐµÑурÑов, в Ñекундах. - `apply_to_all` ([UInt8](../../sql-reference/data-types/int-uint.md#uint-ranges)) — ЛогичеÑкое значение. Он показывает, к каким пользователÑм применÑетÑÑ ÐºÐ²Ð¾Ñ‚Ð°. ЗначениÑ: - `0` — Квота применÑетÑÑ Ðº пользователÑм, перечиÑленным в ÑпиÑке `apply_to_list`. - `1` — Квота применÑетÑÑ Ðº пользователÑм, за иÑключением тех, что перечиÑлены в ÑпиÑке `apply_to_except`. 
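To make the key and interval semantics above concrete, a minimal query over `system.quotas` might look like this (a sketch only; the rows returned depend on the quotas configured on the server):

```sql
-- Inspect how each quota is keyed and which consumption intervals it tracks.
SELECT name, keys, durations, apply_to_all
FROM system.quotas
LIMIT 1;
```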
diff --git a/docs/ru/operations/system-tables/replication_queue.md b/docs/ru/operations/system-tables/replication_queue.md index 2f9d80be16f..661962e83c4 100644 --- a/docs/ru/operations/system-tables/replication_queue.md +++ b/docs/ru/operations/system-tables/replication_queue.md @@ -21,7 +21,7 @@ - `MERGE_PARTS` — выполнить ÑлиÑние куÑков. - `DROP_RANGE` — удалить куÑки в партициÑÑ… из указнного диапазона. - `CLEAR_COLUMN` — удалить указанный Ñтолбец из указанной партиции. Примечание: не иÑпользуетÑÑ Ñ 20.4. - - `CLEAR_INDEX` — удалить указанный Ð¸Ð½Ð´ÐµÐºÑ Ð¸Ð· указанной партиции. Примечание: не иÑпользуетÑÑ Ñ 20.4. + - `CLEAR_INDEX` — удалить указанный Ð¸Ð½Ð´ÐµÐºÑ Ð¸Ð· указанной партиции. Примечание: не иÑпользуетÑÑ Ñ 20.4. - `REPLACE_RANGE` — удалить указанный диапазон куÑков и заменить их на новые. - `MUTATE_PART` — применить одну или неÑколько мутаций к куÑку. - `ALTER_METADATA` — применить Ð¸Ð·Ð¼ÐµÐ½ÐµÐ½Ð¸Ñ Ñтруктуры таблицы в результате запроÑов Ñ Ð²Ñ‹Ñ€Ð°Ð¶ÐµÐ½Ð¸ÐµÐ¼ `ALTER`. diff --git a/docs/ru/operations/system-tables/role-grants.md b/docs/ru/operations/system-tables/role-grants.md index 2c80a597857..e392349af48 100644 --- a/docs/ru/operations/system-tables/role-grants.md +++ b/docs/ru/operations/system-tables/role-grants.md @@ -12,5 +12,5 @@ - 0 — `granted_role` не ÑвлÑетÑÑ Ñ€Ð¾Ð»ÑŒÑŽ по умолчанию. - `with_admin_option` ([UInt8](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Флаг, который показывает, обладает ли `granted_role` роль привилегией `ADMIN OPTION`. Возможные значениÑ: - 1 — Роль обладает привилегией `ADMIN OPTION`. - - 0 — Роль не обладает привилегией `ADMIN OPTION`. + - 0 — Роль не обладает привилегией `ADMIN OPTION`. diff --git a/docs/ru/operations/system-tables/settings_profiles.md b/docs/ru/operations/system-tables/settings_profiles.md index f8101fb0cb7..8e0a8fde702 100644 --- a/docs/ru/operations/system-tables/settings_profiles.md +++ b/docs/ru/operations/system-tables/settings_profiles.md @@ -7,7 +7,7 @@ - `id` ([UUID](../../sql-reference/data-types/uuid.md)) — ID Ð¿Ñ€Ð¾Ñ„Ð¸Ð»Ñ Ð½Ð°Ñтроек. -- `storage` ([String](../../sql-reference/data-types/string.md)) — Путь к хранилищу профилей наÑтроек. ÐаÑтраиваетÑÑ Ð² параметре `access_control_path`. +- `storage` ([String](../../sql-reference/data-types/string.md)) — Путь к хранилищу профилей наÑтроек. ÐаÑтраиваетÑÑ Ð² параметре `access_control_path`. - `num_elements` ([UInt64](../../sql-reference/data-types/int-uint.md)) — ЧиÑло Ñлементов Ð´Ð»Ñ Ñтого Ð¿Ñ€Ð¾Ñ„Ð¸Ð»Ñ Ð² таблице `system.settings_profile_elements`. diff --git a/docs/ru/operations/system-tables/tables.md b/docs/ru/operations/system-tables/tables.md index 3dec1e7d940..03ad174780f 100644 --- a/docs/ru/operations/system-tables/tables.md +++ b/docs/ru/operations/system-tables/tables.md @@ -1,10 +1,10 @@ # system.tables {#system-tables} -Содержит метаданные каждой таблицы, о которой знает Ñервер. +Содержит метаданные каждой таблицы, о которой знает Ñервер. ОтÑоединённые таблицы ([DETACH](../../sql-reference/statements/detach.md)) не отображаютÑÑ Ð² `system.tables`. -Ð˜Ð½Ñ„Ð¾Ñ€Ð¼Ð°Ñ†Ð¸Ñ Ð¾ [временных таблицах](../../sql-reference/statements/create/table.md#temporary-tables) ÑодержитÑÑ Ð² `system.tables` только в тех ÑеÑÑиÑÑ…, в которых Ñти таблицы были Ñозданы. Поле `database` у таких таблиц пуÑтое, а флаг `is_temporary` включен. +Ð˜Ð½Ñ„Ð¾Ñ€Ð¼Ð°Ñ†Ð¸Ñ Ð¾ [временных таблицах](../../sql-reference/statements/create/table.md#temporary-tables) ÑодержитÑÑ Ð² `system.tables` только в тех ÑеÑÑиÑÑ…, в которых Ñти таблицы были Ñозданы. 
Поле `database` у таких таблиц пуÑтое, а флаг `is_temporary` включен. Столбцы: diff --git a/docs/ru/operations/system-tables/text_log.md b/docs/ru/operations/system-tables/text_log.md index 97c6ef9e2cd..4936edc663b 100644 --- a/docs/ru/operations/system-tables/text_log.md +++ b/docs/ru/operations/system-tables/text_log.md @@ -42,7 +42,7 @@ microseconds: 871397 thread_name: clickhouse-serv thread_id: 564917 level: Information -query_id: +query_id: logger_name: DNSCacheUpdater message: Update period 15 seconds revision: 54440 diff --git a/docs/ru/operations/system-tables/trace_log.md b/docs/ru/operations/system-tables/trace_log.md index cc2eb4f9883..c43617ca7cf 100644 --- a/docs/ru/operations/system-tables/trace_log.md +++ b/docs/ru/operations/system-tables/trace_log.md @@ -47,7 +47,7 @@ timestamp_ns: 1599762189872924510 revision: 54440 trace_type: Memory thread_id: 564963 -query_id: +query_id: trace: [371912858,371912789,371798468,371799717,371801313,371790250,624462773,566365041,566440261,566445834,566460071,566459914,566459842,566459580,566459469,566459389,566459341,566455774,371993941,371988245,372158848,372187428,372187309,372187093,372185478,140222123165193,140222122205443] size: 5244400 ``` diff --git a/docs/ru/operations/system-tables/users.md b/docs/ru/operations/system-tables/users.md index 2a523ae4a9a..ba31382cc02 100644 --- a/docs/ru/operations/system-tables/users.md +++ b/docs/ru/operations/system-tables/users.md @@ -7,7 +7,7 @@ - `id` ([UUID](../../sql-reference/data-types/uuid.md)) — ID пользователÑ. -- `storage` ([String](../../sql-reference/data-types/string.md)) — Путь к хранилищу пользователей. ÐаÑтраиваетÑÑ Ð² параметре `access_control_path`. +- `storage` ([String](../../sql-reference/data-types/string.md)) — Путь к хранилищу пользователей. ÐаÑтраиваетÑÑ Ð² параметре `access_control_path`. - `auth_type` ([Enum8](../../sql-reference/data-types/enum.md)('no_password' = 0,'plaintext_password' = 1, 'sha256_password' = 2, 'double_sha1_password' = 3)) — Показывает тип аутентификации. СущеÑтвует неÑколько ÑпоÑобов идентификации пользователÑ: без паролÑ, Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ обычного текÑтового паролÑ, Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ ÑˆÐ¸Ñ„Ñ€Ð¾Ð²Ð°Ð½Ð¸Ñ [SHA256] (https://ru.wikipedia.org/wiki/SHA-2) или Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ ÑˆÐ¸Ñ„Ñ€Ð¾Ð²Ð°Ð½Ð¸Ñ [double SHA-1] (https://ru.wikipedia.org/wiki/SHA-1). diff --git a/docs/ru/operations/utilities/clickhouse-benchmark.md b/docs/ru/operations/utilities/clickhouse-benchmark.md index b4769b17818..ee14395332c 100644 --- a/docs/ru/operations/utilities/clickhouse-benchmark.md +++ b/docs/ru/operations/utilities/clickhouse-benchmark.md @@ -40,23 +40,23 @@ clickhouse-benchmark [keys] < queries_file; ## Ключи {#clickhouse-benchmark-keys} -- `--query=QUERY` — иÑполнÑемый запроÑ. ЕÑли параметр не передан, `clickhouse-benchmark` будет Ñчитывать запроÑÑ‹ из Ñтандартного ввода. +- `--query=QUERY` — иÑполнÑемый запроÑ. ЕÑли параметр не передан, `clickhouse-benchmark` будет Ñчитывать запроÑÑ‹ из Ñтандартного ввода. - `-c N`, `--concurrency=N` — количеÑтво запроÑов, которые `clickhouse-benchmark` отправлÑет одновременно. Значение по умолчанию: 1. - `-d N`, `--delay=N` — интервал в Ñекундах между промежуточными отчетами (чтобы отключить отчеты, уÑтановите 0). Значение по умолчанию: 1. -- `-h HOST`, `--host=HOST` — хоÑÑ‚ Ñервера. Значение по умолчанию: `localhost`. Ð”Ð»Ñ [режима ÑравнениÑ](#clickhouse-benchmark-comparison-mode) можно иÑпользовать неÑколько `-h` ключей. -- `-p N`, `--port=N` — порт Ñервера. Значение по умолчанию: 9000. 
Ð”Ð»Ñ [режима ÑравнениÑ](#clickhouse-benchmark-comparison-mode) можно иÑпользовать неÑколько `-p` ключей. +- `-h HOST`, `--host=HOST` — хоÑÑ‚ Ñервера. Значение по умолчанию: `localhost`. Ð”Ð»Ñ [режима ÑравнениÑ](#clickhouse-benchmark-comparison-mode) можно иÑпользовать неÑколько `-h` ключей. +- `-p N`, `--port=N` — порт Ñервера. Значение по умолчанию: 9000. Ð”Ð»Ñ [режима ÑравнениÑ](#clickhouse-benchmark-comparison-mode) можно иÑпользовать неÑколько `-p` ключей. - `-i N`, `--iterations=N` — общее чиÑло запроÑов. Значение по умолчанию: 0 (вечно будет повторÑÑ‚ÑŒÑÑ). - `-r`, `--randomize` — иÑпользовать Ñлучайный порÑдок Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñов при наличии более одного входного запроÑа. - `-s`, `--secure` — иÑпользуетÑÑ `TLS` Ñоединение. - `-t N`, `--timelimit=N` — лимит по времени в Ñекундах. `clickhouse-benchmark` переÑтает отправлÑÑ‚ÑŒ запроÑÑ‹ при доÑтижении лимита по времени. Значение по умолчанию: 0 (лимит отключен). - `--confidence=N` — уровень Ð´Ð¾Ð²ÐµÑ€Ð¸Ñ Ð´Ð»Ñ T-критериÑ. Возможные значениÑ: 0 (80%), 1 (90%), 2 (95%), 3 (98%), 4 (99%), 5 (99.5%). Значение по умолчанию: 5. Ð’ [режиме ÑравнениÑ](#clickhouse-benchmark-comparison-mode) `clickhouse-benchmark` проверÑет [двухвыборочный t-критерий Стьюдента Ð´Ð»Ñ Ð½ÐµÐ·Ð°Ð²Ð¸Ñимых выборок](https://en.wikipedia.org/wiki/Student%27s_t-test#Independent_two-sample_t-test) чтобы определить, различны ли две выборки при выбранном уровне довериÑ. - `--cumulative` — выводить ÑтатиÑтику за вÑе Ð²Ñ€ÐµÐ¼Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ñ‹, а не за поÑледний временной интервал. -- `--database=DATABASE_NAME` — Ð¸Ð¼Ñ Ð±Ð°Ð·Ñ‹ данных ClickHouse. Значение по умолчанию: `default`. +- `--database=DATABASE_NAME` — Ð¸Ð¼Ñ Ð±Ð°Ð·Ñ‹ данных ClickHouse. Значение по умолчанию: `default`. - `--json=FILEPATH` — дополнительный вывод в формате `JSON`. Когда Ñтот ключ указан, `clickhouse-benchmark` выводит отчет в указанный JSON-файл. - `--user=USERNAME` — Ð¸Ð¼Ñ Ð¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ñ‚ÐµÐ»Ñ ClickHouse. Значение по умолчанию: `default`. - `--password=PSWD` — пароль Ð¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ñ‚ÐµÐ»Ñ ClickHouse. Значение по умолчанию: пуÑÑ‚Ð°Ñ Ñтрока. - `--stacktrace` — вывод траÑÑировки Ñтека иÑключений. Когда Ñтот ключ указан, `clickhouse-bencmark` выводит траÑÑировку Ñтека иÑключений. -- `--stage=WORD` — ÑÑ‚Ð°Ð´Ð¸Ñ Ð¾Ð±Ñ€Ð°Ð±Ð¾Ñ‚ÐºÐ¸ запроÑа на Ñервере. ClickHouse оÑтанавливает обработку запроÑа и возвращает ответ `clickhouse-benchmark` на заданной Ñтадии. Возможные значениÑ: `complete`, `fetch_columns`, `with_mergeable_state`. Значение по умолчанию: `complete`. +- `--stage=WORD` — ÑÑ‚Ð°Ð´Ð¸Ñ Ð¾Ð±Ñ€Ð°Ð±Ð¾Ñ‚ÐºÐ¸ запроÑа на Ñервере. ClickHouse оÑтанавливает обработку запроÑа и возвращает ответ `clickhouse-benchmark` на заданной Ñтадии. Возможные значениÑ: `complete`, `fetch_columns`, `with_mergeable_state`. Значение по умолчанию: `complete`. - `--help` — показывает Ñправку. ЕÑли нужно применить [наÑтройки](../../operations/settings/index.md) Ð´Ð»Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñов, их можно передать как ключ `--= SETTING_VALUE`. Ðапример, `--max_memory_usage=1048576`. @@ -65,7 +65,7 @@ clickhouse-benchmark [keys] < queries_file; По умолчанию, `clickhouse-benchmark` выводит Ñообщение Ð´Ð»Ñ ÐºÐ°Ð¶Ð´Ð¾Ð³Ð¾ `--delay` интервала. -Пример ÑообщениÑ: +Пример ÑообщениÑ: ``` text Queries executed: 10. 
diff --git a/docs/ru/operations/utilities/clickhouse-copier.md b/docs/ru/operations/utilities/clickhouse-copier.md index aa4fd68f8e8..f1bde2be23d 100644 --- a/docs/ru/operations/utilities/clickhouse-copier.md +++ b/docs/ru/operations/utilities/clickhouse-copier.md @@ -71,7 +71,7 @@ $ clickhouse-copier --daemon --config zookeeper.xml --task-path /task/path --bas source cluster & destination clusters accept exactly the same parameters as parameters for the usual Distributed table see https://clickhouse.tech/docs/ru/engines/table-engines/special/distributed/ - --> + --> false diff --git a/docs/ru/operations/utilities/clickhouse-format.md b/docs/ru/operations/utilities/clickhouse-format.md index 43043fcc1d5..876c741e0ac 100644 --- a/docs/ru/operations/utilities/clickhouse-format.md +++ b/docs/ru/operations/utilities/clickhouse-format.md @@ -18,7 +18,7 @@ toc_title: clickhouse-format - `--seed <Ñтрока>` — задает Ñтроку, ÐºÐ¾Ñ‚Ð¾Ñ€Ð°Ñ Ð¾Ð¿Ñ€ÐµÐ´ÐµÐ»Ñет результат обфуÑкации. - `--backslash` — добавлÑет обратный Ñлеш в конце каждой Ñтроки отформатированного запроÑа. Удобно иÑпользовать еÑли многоÑтрочный Ð·Ð°Ð¿Ñ€Ð¾Ñ Ñкопирован из интернета или другого иÑточника и его нужно выполнить из командной Ñтроки. -## Примеры {#examples} +## Примеры {#examples} 1. ПодÑветка ÑинтакÑиÑа и форматирование в одну Ñтроку: @@ -32,12 +32,12 @@ $ clickhouse-format --oneline --hilite <<< "SELECT sum(number) FROM numbers(5);" SELECT sum(number) FROM numbers(5) ``` -2. ÐеÑколько запроÑов в одной Ñтроке: +2. ÐеÑколько запроÑов в одной Ñтроке: ```bash $ clickhouse-format -n <<< "SELECT * FROM (SELECT 1 AS x UNION ALL SELECT 1 UNION DISTINCT SELECT 3);" ``` - + Результат: ```text @@ -64,13 +64,13 @@ $ clickhouse-format --seed Hello --obfuscate <<< "SELECT cost_first_screen BETWE ```text SELECT treasury_mammoth_hazelnut BETWEEN nutmeg AND span, CASE WHEN chive >= 116 THEN switching ELSE ANYTHING END; ``` - + Тот же Ð·Ð°Ð¿Ñ€Ð¾Ñ Ñ Ð´Ñ€ÑƒÐ³Ð¾Ð¹ инициализацией обфуÑкатора: ```bash $ clickhouse-format --seed World --obfuscate <<< "SELECT cost_first_screen BETWEEN a AND b, CASE WHEN x >= 123 THEN y ELSE NULL END;" ``` - + Результат: ```text @@ -95,4 +95,4 @@ FROM \ UNION DISTINCT \ SELECT 3 \ ) -``` +``` diff --git a/docs/ru/operations/utilities/clickhouse-local.md b/docs/ru/operations/utilities/clickhouse-local.md index 682dc0b5ace..89ec424a9c2 100644 --- a/docs/ru/operations/utilities/clickhouse-local.md +++ b/docs/ru/operations/utilities/clickhouse-local.md @@ -14,7 +14,7 @@ toc_title: clickhouse-local !!! warning "Warning" Мы не рекомендуем подключать Ñерверную конфигурацию к `clickhouse-local`, поÑкольку данные можно легко повредить неоÑторожными дейÑтвиÑми. -Ð”Ð»Ñ Ð²Ñ€ÐµÐ¼ÐµÐ½Ð½Ñ‹Ñ… данных по умолчанию ÑоздаетÑÑ Ñпециальный каталог. +Ð”Ð»Ñ Ð²Ñ€ÐµÐ¼ÐµÐ½Ð½Ñ‹Ñ… данных по умолчанию ÑоздаетÑÑ Ñпециальный каталог. ## Вызов программы {#usage} @@ -36,7 +36,7 @@ $ clickhouse-local --structure "table_structure" --input-format "format_of_incom - `-of`, `--format`, `--output-format` — формат выходных данных. По умолчанию — `TSV`. - `-d`, `--database` — база данных по умолчанию. ЕÑли не указано, иÑпользуетÑÑ Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ðµ `_local`. - `--stacktrace` — вывод отладочной информации при иÑключениÑÑ…. -- `--echo` — перед выполнением Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð²Ñ‹Ð²Ð¾Ð´Ð¸Ñ‚ÑÑ Ð² конÑоль. +- `--echo` — перед выполнением Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð²Ñ‹Ð²Ð¾Ð´Ð¸Ñ‚ÑÑ Ð² конÑоль. - `--verbose` — подробный вывод при выполнении запроÑа. - `--logger.console` — логирование дейÑтвий в конÑоль. - `--logger.log` — логирование дейÑтвий в файл Ñ ÑƒÐºÐ°Ð·Ð°Ð½Ð½Ñ‹Ð¼ именем. 
diff --git a/docs/ru/operations/utilities/clickhouse-obfuscator.md b/docs/ru/operations/utilities/clickhouse-obfuscator.md index a52d538965b..ff1fdc70288 100644 --- a/docs/ru/operations/utilities/clickhouse-obfuscator.md +++ b/docs/ru/operations/utilities/clickhouse-obfuscator.md @@ -1,43 +1,43 @@ -# ОбфуÑкатор ClickHouse - -ПроÑтой инÑтрумент Ð´Ð»Ñ Ð¾Ð±Ñ„ÑƒÑкации табличных данных. - -Он Ñчитывает данные входной таблицы и Ñоздает выходную таблицу, ÐºÐ¾Ñ‚Ð¾Ñ€Ð°Ñ ÑохранÑет некоторые ÑвойÑтва входных данных, но при Ñтом Ñодержит другие данные. - -Это позволÑет публиковать практичеÑки реальные данные и иÑпользовать их в теÑтах на производительноÑÑ‚ÑŒ. - -ОбфуÑкатор предназначен Ð´Ð»Ñ ÑÐ¾Ñ…Ñ€Ð°Ð½ÐµÐ½Ð¸Ñ Ñледующих ÑвойÑтв данных: -- кардинальноÑÑ‚ÑŒ (количеÑтво уникальных данных) Ð´Ð»Ñ ÐºÐ°Ð¶Ð´Ð¾Ð³Ð¾ Ñтолбца и каждого кортежа Ñтолбцов; -- уÑÐ»Ð¾Ð²Ð½Ð°Ñ ÐºÐ°Ñ€Ð´Ð¸Ð½Ð°Ð»ÑŒÐ½Ð¾ÑÑ‚ÑŒ: количеÑтво уникальных данных одного Ñтолбца в ÑоответÑтвии Ñо значением другого Ñтолбца; -- вероÑтноÑтные раÑÐ¿Ñ€ÐµÐ´ÐµÐ»ÐµÐ½Ð¸Ñ Ð°Ð±Ñолютного Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ñ†ÐµÐ»Ñ‹Ñ… чиÑел; знак чиÑла типа Int; показатель Ñтепени и знак Ð´Ð»Ñ Ñ‡Ð¸Ñел Ñ Ð¿Ð»Ð°Ð²Ð°ÑŽÑ‰ÐµÐ¹ запÑтой; -- вероÑтноÑтное раÑпределение длины Ñтрок; -- вероÑтноÑÑ‚ÑŒ нулевых значений чиÑел; пуÑтые Ñтроки и маÑÑивы, `NULL`; -- Ñтепень ÑÐ¶Ð°Ñ‚Ð¸Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ… алгоритмом LZ77 и ÑемейÑтвом Ñнтропийных кодеков; - -- непрерывноÑÑ‚ÑŒ (величина разницы) значений времени в таблице; непрерывноÑÑ‚ÑŒ значений Ñ Ð¿Ð»Ð°Ð²Ð°ÑŽÑ‰ÐµÐ¹ запÑтой; -- дату из значений `DateTime`; - -- кодировка UTF-8 значений Ñтроки; -- Ñтроковые Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð²Ñ‹Ð³Ð»ÑдÑÑ‚ еÑтеÑтвенным образом. - - -БольшинÑтво перечиÑленных выше ÑвойÑтв пригодны Ð´Ð»Ñ Ñ‚ÐµÑÑ‚Ð¸Ñ€Ð¾Ð²Ð°Ð½Ð¸Ñ Ð¿Ñ€Ð¾Ð¸Ð·Ð²Ð¾Ð´Ð¸Ñ‚ÐµÐ»ÑŒÐ½Ð¾Ñти. Чтение данных, фильтрациÑ, агрегирование и Ñортировка будут работать почти Ñ Ñ‚Ð¾Ð¹ же ÑкороÑтью, что и иÑходные данные, Ð±Ð»Ð°Ð³Ð¾Ð´Ð°Ñ€Ñ Ñохраненной кардинальноÑти, величине, Ñтепени ÑÐ¶Ð°Ñ‚Ð¸Ñ Ð¸ Ñ‚. д. - -Он работает детерминированно. Ð’Ñ‹ задаёте значение инициализатора, а преобразование полноÑтью определÑетÑÑ Ð²Ñ…Ð¾Ð´Ð½Ñ‹Ð¼Ð¸ данными и инициализатором. - -Ðекоторые Ð¿Ñ€ÐµÐ¾Ð±Ñ€Ð°Ð·Ð¾Ð²Ð°Ð½Ð¸Ñ Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÑÑŽÑ‚ÑÑ Ð¾Ð´Ð¸Ð½ к одному, и их можно отменить. ПоÑтому нужно иÑпользовать большое значение инициализатора и хранить его в Ñекрете. - - -ОбфуÑкатор иÑпользует некоторые криптографичеÑкие примитивы Ð´Ð»Ñ Ð¿Ñ€ÐµÐ¾Ð±Ñ€Ð°Ð·Ð¾Ð²Ð°Ð½Ð¸Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ…, но, Ñ ÐºÑ€Ð¸Ð¿Ñ‚Ð¾Ð³Ñ€Ð°Ñ„Ð¸Ñ‡ÐµÑкой точки зрениÑ, результат будет небезопаÑным. Ð’ нем могут ÑохранитьÑÑ Ð´Ð°Ð½Ð½Ñ‹Ðµ, которые не Ñледует публиковать. - - -Он вÑегда оÑтавлÑет без изменений чиÑла 0, 1, -1, даты, длины маÑÑивов и нулевые флаги. -Ðапример, еÑли у Ð²Ð°Ñ ÐµÑÑ‚ÑŒ Ñтолбец `IsMobile` в таблице Ñо значениÑми 0 и 1, то в преобразованных данных он будет иметь то же значение. - -Таким образом, пользователь Ñможет поÑчитать точное Ñоотношение мобильного трафика. - -Давайте раÑÑмотрим Ñлучай, когда у Ð²Ð°Ñ ÐµÑÑ‚ÑŒ какие-то личные данные в таблице (например, ÑÐ»ÐµÐºÑ‚Ñ€Ð¾Ð½Ð½Ð°Ñ Ð¿Ð¾Ñ‡Ñ‚Ð° пользователÑ), и вы не хотите их публиковать. -ЕÑли ваша таблица доÑтаточно Ð±Ð¾Ð»ÑŒÑˆÐ°Ñ Ð¸ Ñодержит неÑколько разных Ñлектронных почтовых адреÑов, и ни один из них не вÑтречаетÑÑ Ñ‡Ð°Ñто, то обфуÑкатор полноÑтью анонимизирует вÑе данные. Ðо, еÑли у Ð²Ð°Ñ ÐµÑÑ‚ÑŒ небольшое количеÑтво разных значений в Ñтолбце, он может Ñкопировать некоторые из них. -Ð’ Ñтом Ñлучае вам Ñледует поÑмотреть на алгоритм работы инÑтрумента и наÑтроить параметры командной Ñтроки. - -ОбфуÑкатор полезен в работе Ñо Ñредним объемом данных (не менее 1000 Ñтрок). 
+# ОбфуÑкатор ClickHouse + +ПроÑтой инÑтрумент Ð´Ð»Ñ Ð¾Ð±Ñ„ÑƒÑкации табличных данных. + +Он Ñчитывает данные входной таблицы и Ñоздает выходную таблицу, ÐºÐ¾Ñ‚Ð¾Ñ€Ð°Ñ ÑохранÑет некоторые ÑвойÑтва входных данных, но при Ñтом Ñодержит другие данные. + +Это позволÑет публиковать практичеÑки реальные данные и иÑпользовать их в теÑтах на производительноÑÑ‚ÑŒ. + +ОбфуÑкатор предназначен Ð´Ð»Ñ ÑÐ¾Ñ…Ñ€Ð°Ð½ÐµÐ½Ð¸Ñ Ñледующих ÑвойÑтв данных: +- кардинальноÑÑ‚ÑŒ (количеÑтво уникальных данных) Ð´Ð»Ñ ÐºÐ°Ð¶Ð´Ð¾Ð³Ð¾ Ñтолбца и каждого кортежа Ñтолбцов; +- уÑÐ»Ð¾Ð²Ð½Ð°Ñ ÐºÐ°Ñ€Ð´Ð¸Ð½Ð°Ð»ÑŒÐ½Ð¾ÑÑ‚ÑŒ: количеÑтво уникальных данных одного Ñтолбца в ÑоответÑтвии Ñо значением другого Ñтолбца; +- вероÑтноÑтные раÑÐ¿Ñ€ÐµÐ´ÐµÐ»ÐµÐ½Ð¸Ñ Ð°Ð±Ñолютного Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ñ†ÐµÐ»Ñ‹Ñ… чиÑел; знак чиÑла типа Int; показатель Ñтепени и знак Ð´Ð»Ñ Ñ‡Ð¸Ñел Ñ Ð¿Ð»Ð°Ð²Ð°ÑŽÑ‰ÐµÐ¹ запÑтой; +- вероÑтноÑтное раÑпределение длины Ñтрок; +- вероÑтноÑÑ‚ÑŒ нулевых значений чиÑел; пуÑтые Ñтроки и маÑÑивы, `NULL`; +- Ñтепень ÑÐ¶Ð°Ñ‚Ð¸Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ… алгоритмом LZ77 и ÑемейÑтвом Ñнтропийных кодеков; + +- непрерывноÑÑ‚ÑŒ (величина разницы) значений времени в таблице; непрерывноÑÑ‚ÑŒ значений Ñ Ð¿Ð»Ð°Ð²Ð°ÑŽÑ‰ÐµÐ¹ запÑтой; +- дату из значений `DateTime`; + +- кодировка UTF-8 значений Ñтроки; +- Ñтроковые Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð²Ñ‹Ð³Ð»ÑдÑÑ‚ еÑтеÑтвенным образом. + + +БольшинÑтво перечиÑленных выше ÑвойÑтв пригодны Ð´Ð»Ñ Ñ‚ÐµÑÑ‚Ð¸Ñ€Ð¾Ð²Ð°Ð½Ð¸Ñ Ð¿Ñ€Ð¾Ð¸Ð·Ð²Ð¾Ð´Ð¸Ñ‚ÐµÐ»ÑŒÐ½Ð¾Ñти. Чтение данных, фильтрациÑ, агрегирование и Ñортировка будут работать почти Ñ Ñ‚Ð¾Ð¹ же ÑкороÑтью, что и иÑходные данные, Ð±Ð»Ð°Ð³Ð¾Ð´Ð°Ñ€Ñ Ñохраненной кардинальноÑти, величине, Ñтепени ÑÐ¶Ð°Ñ‚Ð¸Ñ Ð¸ Ñ‚. д. + +Он работает детерминированно. Ð’Ñ‹ задаёте значение инициализатора, а преобразование полноÑтью определÑетÑÑ Ð²Ñ…Ð¾Ð´Ð½Ñ‹Ð¼Ð¸ данными и инициализатором. + +Ðекоторые Ð¿Ñ€ÐµÐ¾Ð±Ñ€Ð°Ð·Ð¾Ð²Ð°Ð½Ð¸Ñ Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÑÑŽÑ‚ÑÑ Ð¾Ð´Ð¸Ð½ к одному, и их можно отменить. ПоÑтому нужно иÑпользовать большое значение инициализатора и хранить его в Ñекрете. + + +ОбфуÑкатор иÑпользует некоторые криптографичеÑкие примитивы Ð´Ð»Ñ Ð¿Ñ€ÐµÐ¾Ð±Ñ€Ð°Ð·Ð¾Ð²Ð°Ð½Ð¸Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ…, но, Ñ ÐºÑ€Ð¸Ð¿Ñ‚Ð¾Ð³Ñ€Ð°Ñ„Ð¸Ñ‡ÐµÑкой точки зрениÑ, результат будет небезопаÑным. Ð’ нем могут ÑохранитьÑÑ Ð´Ð°Ð½Ð½Ñ‹Ðµ, которые не Ñледует публиковать. + + +Он вÑегда оÑтавлÑет без изменений чиÑла 0, 1, -1, даты, длины маÑÑивов и нулевые флаги. +Ðапример, еÑли у Ð²Ð°Ñ ÐµÑÑ‚ÑŒ Ñтолбец `IsMobile` в таблице Ñо значениÑми 0 и 1, то в преобразованных данных он будет иметь то же значение. + +Таким образом, пользователь Ñможет поÑчитать точное Ñоотношение мобильного трафика. + +Давайте раÑÑмотрим Ñлучай, когда у Ð²Ð°Ñ ÐµÑÑ‚ÑŒ какие-то личные данные в таблице (например, ÑÐ»ÐµÐºÑ‚Ñ€Ð¾Ð½Ð½Ð°Ñ Ð¿Ð¾Ñ‡Ñ‚Ð° пользователÑ), и вы не хотите их публиковать. +ЕÑли ваша таблица доÑтаточно Ð±Ð¾Ð»ÑŒÑˆÐ°Ñ Ð¸ Ñодержит неÑколько разных Ñлектронных почтовых адреÑов, и ни один из них не вÑтречаетÑÑ Ñ‡Ð°Ñто, то обфуÑкатор полноÑтью анонимизирует вÑе данные. Ðо, еÑли у Ð²Ð°Ñ ÐµÑÑ‚ÑŒ небольшое количеÑтво разных значений в Ñтолбце, он может Ñкопировать некоторые из них. +Ð’ Ñтом Ñлучае вам Ñледует поÑмотреть на алгоритм работы инÑтрумента и наÑтроить параметры командной Ñтроки. + +ОбфуÑкатор полезен в работе Ñо Ñредним объемом данных (не менее 1000 Ñтрок). 
diff --git a/docs/ru/sql-reference/aggregate-functions/parametric-functions.md b/docs/ru/sql-reference/aggregate-functions/parametric-functions.md index 8942bfa3444..b1eefc3fc16 100644 --- a/docs/ru/sql-reference/aggregate-functions/parametric-functions.md +++ b/docs/ru/sql-reference/aggregate-functions/parametric-functions.md @@ -253,7 +253,7 @@ windowFunnel(window, [mode, [mode, ... ]])(timestamp, cond1, cond2, ..., condN) **Параметры** -- `window` — ширина ÑкользÑщего окна по времени. Это Ð²Ñ€ÐµÐ¼Ñ Ð¼ÐµÐ¶Ð´Ñƒ первым и поÑледним уÑловием. Единица Ð¸Ð·Ð¼ÐµÑ€ÐµÐ½Ð¸Ñ Ð·Ð°Ð²Ð¸Ñит от `timestamp` и может варьироватьÑÑ. Должно ÑоблюдатьÑÑ ÑƒÑловие `timestamp ÑÐ¾Ð±Ñ‹Ñ‚Ð¸Ñ cond1 <= timestamp ÑÐ¾Ð±Ñ‹Ñ‚Ð¸Ñ cond2 <= ... <= timestamp ÑÐ¾Ð±Ñ‹Ñ‚Ð¸Ñ condN <= timestamp ÑÐ¾Ð±Ñ‹Ñ‚Ð¸Ñ cond1 + window`. +- `window` — ширина ÑкользÑщего окна по времени. Это Ð²Ñ€ÐµÐ¼Ñ Ð¼ÐµÐ¶Ð´Ñƒ первым и поÑледним уÑловием. Единица Ð¸Ð·Ð¼ÐµÑ€ÐµÐ½Ð¸Ñ Ð·Ð°Ð²Ð¸Ñит от `timestamp` и может варьироватьÑÑ. Должно ÑоблюдатьÑÑ ÑƒÑловие `timestamp ÑÐ¾Ð±Ñ‹Ñ‚Ð¸Ñ cond1 <= timestamp ÑÐ¾Ð±Ñ‹Ñ‚Ð¸Ñ cond2 <= ... <= timestamp ÑÐ¾Ð±Ñ‹Ñ‚Ð¸Ñ condN <= timestamp ÑÐ¾Ð±Ñ‹Ñ‚Ð¸Ñ cond1 + window`. - `mode` — необÑзательный параметр. Может быть уÑтановленно неÑколько значений одновременно. - `'strict'` — не учитывать подрÑд идущие повторÑющиеÑÑ ÑобытиÑ. - `'strict_order'` — запрещает поÑторонние ÑÐ¾Ð±Ñ‹Ñ‚Ð¸Ñ Ð² иÑкомой поÑледовательноÑти. Ðапример, при поиÑке цепочки `A->B->C` в `A->B->D->C` поиÑк будет оÑтановлен на `D` и Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð²ÐµÑ€Ð½ÐµÑ‚ 2. @@ -519,7 +519,7 @@ sequenceNextNode(direction, base)(timestamp, event_column, base_condition, event - tail — уÑтановить начальную точку на поÑледнее Ñобытие цепочки. - first_match — уÑтановить начальную точку на первое ÑоответÑтвующее Ñобытие `event1`. - last_match — уÑтановить начальную точку на поÑледнее ÑоответÑтвующее Ñобытие `event1`. - + **Ðргументы** - `timestamp` — название Ñтолбца, Ñодержащего `timestamp`. Поддерживаемые типы данных: [Date](../../sql-reference/data-types/date.md), [DateTime](../../sql-reference/data-types/datetime.md#data_type-datetime) и другие беззнаковые целые типы. 
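
> A minimal sketch of how the `window` parameter and the chain of conditions described above for `windowFunnel` are typically combined. The `trend` table, its columns and the event IDs are hypothetical; the window of 21600 is in seconds because `timestamp` is assumed to be `DateTime`.

```sql
-- Hypothetical events table; counts how many users reached each funnel level
-- within a 6-hour window (21600 seconds, since timestamp is DateTime).
SELECT
    level,
    count() AS users
FROM
(
    SELECT
        user_id,
        windowFunnel(21600)(timestamp, event_id = 1001, event_id = 1002, event_id = 1003) AS level
    FROM trend
    GROUP BY user_id
)
GROUP BY level
ORDER BY level;
```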
@@ -542,11 +542,11 @@ sequenceNextNode(direction, base)(timestamp, event_column, base_condition, event ``` sql CREATE TABLE test_flow ( - dt DateTime, - id int, + dt DateTime, + id int, page String) -ENGINE = MergeTree() -PARTITION BY toYYYYMMDD(dt) +ENGINE = MergeTree() +PARTITION BY toYYYYMMDD(dt) ORDER BY id; INSERT INTO test_flow VALUES (1, 1, 'A') (2, 1, 'B') (3, 1, 'C') (4, 1, 'D') (5, 1, 'E'); @@ -574,21 +574,21 @@ INSERT INTO test_flow VALUES (1, 3, 'Gift') (2, 3, 'Home') (3, 3, 'Gift') (4, 3, ``` sql SELECT id, sequenceNextNode('forward', 'head')(dt, page, page = 'Home', page = 'Home', page = 'Gift') FROM test_flow GROUP BY id; - + dt id page 1970-01-01 09:00:01 1 Home // ИÑÑ…Ð¾Ð´Ð½Ð°Ñ Ñ‚Ð¾Ñ‡ÐºÐ°, Ñовпадение Ñ Home 1970-01-01 09:00:02 1 Gift // Совпадение Ñ Gift - 1970-01-01 09:00:03 1 Exit // Результат + 1970-01-01 09:00:03 1 Exit // Результат 1970-01-01 09:00:01 2 Home // ИÑÑ…Ð¾Ð´Ð½Ð°Ñ Ñ‚Ð¾Ñ‡ÐºÐ°, Ñовпадение Ñ Home 1970-01-01 09:00:02 2 Home // ÐеÑовпадение Ñ Gift 1970-01-01 09:00:03 2 Gift - 1970-01-01 09:00:04 2 Basket - + 1970-01-01 09:00:04 2 Basket + 1970-01-01 09:00:01 3 Gift // ИÑÑ…Ð¾Ð´Ð½Ð°Ñ Ñ‚Ð¾Ñ‡ÐºÐ°, неÑовпадение Ñ Home - 1970-01-01 09:00:02 3 Home - 1970-01-01 09:00:03 3 Gift - 1970-01-01 09:00:04 3 Basket + 1970-01-01 09:00:02 3 Home + 1970-01-01 09:00:03 3 Gift + 1970-01-01 09:00:04 3 Basket ``` **Поведение Ð´Ð»Ñ `backward` и `tail`** @@ -600,12 +600,12 @@ SELECT id, sequenceNextNode('backward', 'tail')(dt, page, page = 'Basket', page 1970-01-01 09:00:01 1 Home 1970-01-01 09:00:02 1 Gift 1970-01-01 09:00:03 1 Exit // ИÑÑ…Ð¾Ð´Ð½Ð°Ñ Ñ‚Ð¾Ñ‡ÐºÐ°, неÑовпадение Ñ Basket - -1970-01-01 09:00:01 2 Home + +1970-01-01 09:00:01 2 Home 1970-01-01 09:00:02 2 Home // Результат 1970-01-01 09:00:03 2 Gift // Совпадение Ñ Gift 1970-01-01 09:00:04 2 Basket // ИÑÑ…Ð¾Ð´Ð½Ð°Ñ Ñ‚Ð¾Ñ‡ÐºÐ°, Ñовпадение Ñ Basket - + 1970-01-01 09:00:01 3 Gift 1970-01-01 09:00:02 3 Home // Результат 1970-01-01 09:00:03 3 Gift // ИÑÑ…Ð¾Ð´Ð½Ð°Ñ Ñ‚Ð¾Ñ‡ÐºÐ°, Ñовпадение Ñ Gift @@ -622,16 +622,16 @@ SELECT id, sequenceNextNode('forward', 'first_match')(dt, page, page = 'Gift', p 1970-01-01 09:00:01 1 Home 1970-01-01 09:00:02 1 Gift // ИÑÑ…Ð¾Ð´Ð½Ð°Ñ Ñ‚Ð¾Ñ‡ÐºÐ° 1970-01-01 09:00:03 1 Exit // Результат - -1970-01-01 09:00:01 2 Home -1970-01-01 09:00:02 2 Home + +1970-01-01 09:00:01 2 Home +1970-01-01 09:00:02 2 Home 1970-01-01 09:00:03 2 Gift // ИÑÑ…Ð¾Ð´Ð½Ð°Ñ Ñ‚Ð¾Ñ‡ÐºÐ° 1970-01-01 09:00:04 2 Basket Результат - + 1970-01-01 09:00:01 3 Gift // ИÑÑ…Ð¾Ð´Ð½Ð°Ñ Ñ‚Ð¾Ñ‡ÐºÐ° 1970-01-01 09:00:02 3 Home // Результат -1970-01-01 09:00:03 3 Gift -1970-01-01 09:00:04 3 Basket +1970-01-01 09:00:03 3 Gift +1970-01-01 09:00:04 3 Basket ``` ``` sql @@ -641,16 +641,16 @@ SELECT id, sequenceNextNode('forward', 'first_match')(dt, page, page = 'Gift', p 1970-01-01 09:00:01 1 Home 1970-01-01 09:00:02 1 Gift // ИÑÑ…Ð¾Ð´Ð½Ð°Ñ Ñ‚Ð¾Ñ‡ÐºÐ° 1970-01-01 09:00:03 1 Exit // ÐеÑовпадение Ñ Home - -1970-01-01 09:00:01 2 Home -1970-01-01 09:00:02 2 Home + +1970-01-01 09:00:01 2 Home +1970-01-01 09:00:02 2 Home 1970-01-01 09:00:03 2 Gift // ИÑÑ…Ð¾Ð´Ð½Ð°Ñ Ñ‚Ð¾Ñ‡ÐºÐ° 1970-01-01 09:00:04 2 Basket // ÐеÑовпадение Ñ Home - + 1970-01-01 09:00:01 3 Gift // ИÑÑ…Ð¾Ð´Ð½Ð°Ñ Ñ‚Ð¾Ñ‡ÐºÐ° 1970-01-01 09:00:02 3 Home // Совпадение Ñ Home 1970-01-01 09:00:03 3 Gift // Результат -1970-01-01 09:00:04 3 Basket +1970-01-01 09:00:04 3 Basket ``` @@ -662,17 +662,17 @@ SELECT id, sequenceNextNode('backward', 'last_match')(dt, page, page = 'Gift', p dt id page 1970-01-01 09:00:01 1 Home // Результат 1970-01-01 09:00:02 1 Gift // ИÑÑ…Ð¾Ð´Ð½Ð°Ñ Ñ‚Ð¾Ñ‡ÐºÐ° -1970-01-01 09:00:03 1 Exit 
- -1970-01-01 09:00:01 2 Home +1970-01-01 09:00:03 1 Exit + +1970-01-01 09:00:01 2 Home 1970-01-01 09:00:02 2 Home // Результат 1970-01-01 09:00:03 2 Gift // ИÑÑ…Ð¾Ð´Ð½Ð°Ñ Ñ‚Ð¾Ñ‡ÐºÐ° -1970-01-01 09:00:04 2 Basket - -1970-01-01 09:00:01 3 Gift +1970-01-01 09:00:04 2 Basket + +1970-01-01 09:00:01 3 Gift 1970-01-01 09:00:02 3 Home // Результат 1970-01-01 09:00:03 3 Gift // ИÑÑ…Ð¾Ð´Ð½Ð°Ñ Ñ‚Ð¾Ñ‡ÐºÐ° -1970-01-01 09:00:04 3 Basket +1970-01-01 09:00:04 3 Basket ``` ``` sql @@ -681,17 +681,17 @@ SELECT id, sequenceNextNode('backward', 'last_match')(dt, page, page = 'Gift', p dt id page 1970-01-01 09:00:01 1 Home // Совпадение Ñ Home, результат `Null` 1970-01-01 09:00:02 1 Gift // ИÑÑ…Ð¾Ð´Ð½Ð°Ñ Ñ‚Ð¾Ñ‡ÐºÐ° -1970-01-01 09:00:03 1 Exit - +1970-01-01 09:00:03 1 Exit + 1970-01-01 09:00:01 2 Home // Результат 1970-01-01 09:00:02 2 Home // Совпадение Ñ Home 1970-01-01 09:00:03 2 Gift // ИÑÑ…Ð¾Ð´Ð½Ð°Ñ Ñ‚Ð¾Ñ‡ÐºÐ° -1970-01-01 09:00:04 2 Basket - +1970-01-01 09:00:04 2 Basket + 1970-01-01 09:00:01 3 Gift // Результат 1970-01-01 09:00:02 3 Home // Совпадение Ñ Home -1970-01-01 09:00:03 3 Gift // ИÑÑ…Ð¾Ð´Ð½Ð°Ñ Ñ‚Ð¾Ñ‡ÐºÐ° -1970-01-01 09:00:04 3 Basket +1970-01-01 09:00:03 3 Gift // ИÑÑ…Ð¾Ð´Ð½Ð°Ñ Ñ‚Ð¾Ñ‡ÐºÐ° +1970-01-01 09:00:04 3 Basket ``` @@ -715,39 +715,39 @@ INSERT INTO test_flow_basecond VALUES (1, 1, 'A', 'ref4') (2, 1, 'A', 'ref3') (3 ``` sql SELECT id, sequenceNextNode('forward', 'head')(dt, page, ref = 'ref1', page = 'A') FROM test_flow_basecond GROUP BY id; - dt id page ref + dt id page ref 1970-01-01 09:00:01 1 A ref4 // Ðачало не может быть иÑходной точкой, поÑкольку Ñтолбец ref не ÑоответÑтвует 'ref1'. - 1970-01-01 09:00:02 1 A ref3 - 1970-01-01 09:00:03 1 B ref2 - 1970-01-01 09:00:04 1 B ref1 + 1970-01-01 09:00:02 1 A ref3 + 1970-01-01 09:00:03 1 B ref2 + 1970-01-01 09:00:04 1 B ref1 ``` ``` sql SELECT id, sequenceNextNode('backward', 'tail')(dt, page, ref = 'ref4', page = 'B') FROM test_flow_basecond GROUP BY id; - dt id page ref + dt id page ref 1970-01-01 09:00:01 1 A ref4 - 1970-01-01 09:00:02 1 A ref3 - 1970-01-01 09:00:03 1 B ref2 + 1970-01-01 09:00:02 1 A ref3 + 1970-01-01 09:00:03 1 B ref2 1970-01-01 09:00:04 1 B ref1 // Конец не может быть иÑходной точкой, поÑкольку Ñтолбец ref не ÑоответÑтвует 'ref4'. ``` ``` sql SELECT id, sequenceNextNode('forward', 'first_match')(dt, page, ref = 'ref3', page = 'A') FROM test_flow_basecond GROUP BY id; - dt id page ref + dt id page ref 1970-01-01 09:00:01 1 A ref4 // Эта Ñтрока не может быть иÑходной точкой, поÑкольку Ñтолбец ref не ÑоответÑтвует 'ref3'. 1970-01-01 09:00:02 1 A ref3 // ИÑÑ…Ð¾Ð´Ð½Ð°Ñ Ñ‚Ð¾Ñ‡ÐºÐ° 1970-01-01 09:00:03 1 B ref2 // Результат - 1970-01-01 09:00:04 1 B ref1 + 1970-01-01 09:00:04 1 B ref1 ``` ``` sql SELECT id, sequenceNextNode('backward', 'last_match')(dt, page, ref = 'ref2', page = 'B') FROM test_flow_basecond GROUP BY id; - dt id page ref + dt id page ref 1970-01-01 09:00:01 1 A ref4 1970-01-01 09:00:02 1 A ref3 // Результат 1970-01-01 09:00:03 1 B ref2 // ИÑÑ…Ð¾Ð´Ð½Ð°Ñ Ñ‚Ð¾Ñ‡ÐºÐ° - 1970-01-01 09:00:04 1 B ref1 // Эта Ñтрока не может быть иÑходной точкой, поÑкольку Ñтолбец ref не ÑоответÑтвует 'ref2'. + 1970-01-01 09:00:04 1 B ref1 // Эта Ñтрока не может быть иÑходной точкой, поÑкольку Ñтолбец ref не ÑоответÑтвует 'ref2'. 
``` diff --git a/docs/ru/sql-reference/aggregate-functions/reference/argmax.md b/docs/ru/sql-reference/aggregate-functions/reference/argmax.md index edad26ee232..71289423035 100644 --- a/docs/ru/sql-reference/aggregate-functions/reference/argmax.md +++ b/docs/ru/sql-reference/aggregate-functions/reference/argmax.md @@ -29,7 +29,7 @@ argMax(tuple(arg, val)) - значение `arg`, ÑоответÑтвующее макÑимальному значению `val`. -Тип: ÑоответÑтвует типу `arg`. +Тип: ÑоответÑтвует типу `arg`. ЕÑли передан кортеж: diff --git a/docs/ru/sql-reference/aggregate-functions/reference/argmin.md b/docs/ru/sql-reference/aggregate-functions/reference/argmin.md index dc54c424fb3..4ee78a73a84 100644 --- a/docs/ru/sql-reference/aggregate-functions/reference/argmin.md +++ b/docs/ru/sql-reference/aggregate-functions/reference/argmin.md @@ -29,7 +29,7 @@ argMin(tuple(arg, val)) - Значение `arg`, ÑоответÑтвующее минимальному значению `val`. -Тип: ÑоответÑтвует типу `arg`. +Тип: ÑоответÑтвует типу `arg`. ЕÑли передан кортеж: diff --git a/docs/ru/sql-reference/aggregate-functions/reference/deltasumtimestamp.md b/docs/ru/sql-reference/aggregate-functions/reference/deltasumtimestamp.md index 10294eb9e6d..e0a4516c11c 100644 --- a/docs/ru/sql-reference/aggregate-functions/reference/deltasumtimestamp.md +++ b/docs/ru/sql-reference/aggregate-functions/reference/deltasumtimestamp.md @@ -4,7 +4,7 @@ toc_priority: 141 # deltaSumTimestamp {#agg_functions-deltasumtimestamp} -Суммирует разницу между поÑледовательными Ñтроками. ЕÑли разница отрицательна — она будет проигнорирована. +Суммирует разницу между поÑледовательными Ñтроками. ЕÑли разница отрицательна — она будет проигнорирована. Эта Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð¿Ñ€ÐµÐ´Ð½Ð°Ð·Ð½Ð°Ñ‡ÐµÐ½Ð° в первую очередь Ð´Ð»Ñ [материализованных предÑтавлений](../../../sql-reference/statements/create/view.md#materialized), упорÑдоченных по некоторому временному бакету ÑоглаÑно timestamp, например, по бакету `toStartOfMinute`. ПоÑкольку Ñтроки в таком материализованном предÑтавлении будут иметь одинаковый timestamp, невозможно объединить их в "правом" порÑдке. Ð¤ÑƒÐ½ÐºÑ†Ð¸Ñ Ð¾Ñ‚Ñлеживает `timestamp` наблюдаемых значений, поÑтому возможно правильно упорÑдочить ÑоÑтоÑÐ½Ð¸Ñ Ð²Ð¾ Ð²Ñ€ÐµÐ¼Ñ ÑлиÑниÑ. 
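
> A sketch of the materialized-view pattern described above: rows are bucketed with `toStartOfMinute`, and `deltaSumTimestamp` keeps the original timestamps so that partial states merge in the right order. All table and column names here are hypothetical; `deltaSumTimestampState` / `deltaSumTimestampMerge` are the usual `-State` / `-Merge` combinator forms.

```sql
-- Hypothetical source table and materialized view illustrating the pattern.
CREATE TABLE counter_readings
(
    ts    DateTime,
    value Float64
)
ENGINE = MergeTree()
ORDER BY ts;

CREATE MATERIALIZED VIEW counter_deltas
ENGINE = AggregatingMergeTree()
ORDER BY minute
AS SELECT
    toStartOfMinute(ts)               AS minute,
    deltaSumTimestampState(value, ts) AS delta_state
FROM counter_readings
GROUP BY minute;

-- Per-minute increase, merged in timestamp order:
SELECT minute, deltaSumTimestampMerge(delta_state) AS delta
FROM counter_deltas
GROUP BY minute
ORDER BY minute;
```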
@@ -32,7 +32,7 @@ deltaSumTimestamp(value, timestamp) ЗапроÑ: ```sql -SELECT deltaSumTimestamp(value, timestamp) +SELECT deltaSumTimestamp(value, timestamp) FROM (SELECT number AS timestamp, [0, 4, 8, 3, 0, 0, 0, 1, 3, 5][number] AS value FROM numbers(1, 10)); ``` diff --git a/docs/ru/sql-reference/aggregate-functions/reference/index.md b/docs/ru/sql-reference/aggregate-functions/reference/index.md index 1af07623ade..b2172e1e70e 100644 --- a/docs/ru/sql-reference/aggregate-functions/reference/index.md +++ b/docs/ru/sql-reference/aggregate-functions/reference/index.md @@ -1,67 +1,67 @@ ---- -toc_folder_title: "Справочник" -toc_priority: 36 -toc_hidden: true ---- - -# Перечень агрегатных функций {#aggregate-functions-list} - -Стандартные агрегатные функции: - -- [count](../../../sql-reference/aggregate-functions/reference/count.md) -- [min](../../../sql-reference/aggregate-functions/reference/min.md) -- [max](../../../sql-reference/aggregate-functions/reference/max.md) -- [sum](../../../sql-reference/aggregate-functions/reference/sum.md) -- [avg](../../../sql-reference/aggregate-functions/reference/avg.md) -- [any](../../../sql-reference/aggregate-functions/reference/any.md) -- [stddevPop](../../../sql-reference/aggregate-functions/reference/stddevpop.md) -- [stddevSamp](../../../sql-reference/aggregate-functions/reference/stddevsamp.md) -- [varPop](../../../sql-reference/aggregate-functions/reference/varpop.md) -- [varSamp](../../../sql-reference/aggregate-functions/reference/varsamp.md) -- [covarPop](../../../sql-reference/aggregate-functions/reference/covarpop.md) -- [covarSamp](../../../sql-reference/aggregate-functions/reference/covarsamp.md) - -Ðгрегатные функции, Ñпецифичные Ð´Ð»Ñ ClickHouse: - -- [anyHeavy](../../../sql-reference/aggregate-functions/reference/anyheavy.md) -- [anyLast](../../../sql-reference/aggregate-functions/reference/anylast.md) -- [argMin](../../../sql-reference/aggregate-functions/reference/argmin.md) -- [argMax](../../../sql-reference/aggregate-functions/reference/argmax.md) -- [avgWeighted](../../../sql-reference/aggregate-functions/reference/avgweighted.md) -- [topK](../../../sql-reference/aggregate-functions/reference/topk.md) -- [topKWeighted](../../../sql-reference/aggregate-functions/reference/topkweighted.md) -- [groupArray](../../../sql-reference/aggregate-functions/reference/grouparray.md) -- [groupUniqArray](../../../sql-reference/aggregate-functions/reference/groupuniqarray.md) -- [groupArrayInsertAt](../../../sql-reference/aggregate-functions/reference/grouparrayinsertat.md) -- [groupArrayMovingAvg](../../../sql-reference/aggregate-functions/reference/grouparraymovingavg.md) -- [groupArrayMovingSum](../../../sql-reference/aggregate-functions/reference/grouparraymovingsum.md) -- [groupBitAnd](../../../sql-reference/aggregate-functions/reference/groupbitand.md) -- [groupBitOr](../../../sql-reference/aggregate-functions/reference/groupbitor.md) -- [groupBitXor](../../../sql-reference/aggregate-functions/reference/groupbitxor.md) -- [groupBitmap](../../../sql-reference/aggregate-functions/reference/groupbitmap.md) -- [sumWithOverflow](../../../sql-reference/aggregate-functions/reference/sumwithoverflow.md) -- [sumMap](../../../sql-reference/aggregate-functions/reference/summap.md) -- [skewSamp](../../../sql-reference/aggregate-functions/reference/skewsamp.md) -- [skewPop](../../../sql-reference/aggregate-functions/reference/skewpop.md) -- [kurtSamp](../../../sql-reference/aggregate-functions/reference/kurtsamp.md) -- 
[kurtPop](../../../sql-reference/aggregate-functions/reference/kurtpop.md) -- [uniq](../../../sql-reference/aggregate-functions/reference/uniq.md) -- [uniqExact](../../../sql-reference/aggregate-functions/reference/uniqexact.md) -- [uniqCombined](../../../sql-reference/aggregate-functions/reference/uniqcombined.md) -- [uniqCombined64](../../../sql-reference/aggregate-functions/reference/uniqcombined64.md) -- [uniqHLL12](../../../sql-reference/aggregate-functions/reference/uniqhll12.md) -- [quantile](../../../sql-reference/aggregate-functions/reference/quantile.md) -- [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md) -- [quantileExact](../../../sql-reference/aggregate-functions/reference/quantileexact.md) -- [quantileExactLow](../../../sql-reference/aggregate-functions/reference/quantileexact.md#quantileexactlow) -- [quantileExactHigh](../../../sql-reference/aggregate-functions/reference/quantileexact.md#quantileexacthigh) -- [quantileExactWeighted](../../../sql-reference/aggregate-functions/reference/quantileexactweighted.md) -- [quantileTiming](../../../sql-reference/aggregate-functions/reference/quantiletiming.md) -- [quantileTimingWeighted](../../../sql-reference/aggregate-functions/reference/quantiletimingweighted.md) -- [quantileDeterministic](../../../sql-reference/aggregate-functions/reference/quantiledeterministic.md) -- [quantileTDigest](../../../sql-reference/aggregate-functions/reference/quantiletdigest.md) -- [quantileTDigestWeighted](../../../sql-reference/aggregate-functions/reference/quantiletdigestweighted.md) -- [simpleLinearRegression](../../../sql-reference/aggregate-functions/reference/simplelinearregression.md) -- [stochasticLinearRegression](../../../sql-reference/aggregate-functions/reference/stochasticlinearregression.md) -- [stochasticLogisticRegression](../../../sql-reference/aggregate-functions/reference/stochasticlogisticregression.md) - +--- +toc_folder_title: "Справочник" +toc_priority: 36 +toc_hidden: true +--- + +# Перечень агрегатных функций {#aggregate-functions-list} + +Стандартные агрегатные функции: + +- [count](../../../sql-reference/aggregate-functions/reference/count.md) +- [min](../../../sql-reference/aggregate-functions/reference/min.md) +- [max](../../../sql-reference/aggregate-functions/reference/max.md) +- [sum](../../../sql-reference/aggregate-functions/reference/sum.md) +- [avg](../../../sql-reference/aggregate-functions/reference/avg.md) +- [any](../../../sql-reference/aggregate-functions/reference/any.md) +- [stddevPop](../../../sql-reference/aggregate-functions/reference/stddevpop.md) +- [stddevSamp](../../../sql-reference/aggregate-functions/reference/stddevsamp.md) +- [varPop](../../../sql-reference/aggregate-functions/reference/varpop.md) +- [varSamp](../../../sql-reference/aggregate-functions/reference/varsamp.md) +- [covarPop](../../../sql-reference/aggregate-functions/reference/covarpop.md) +- [covarSamp](../../../sql-reference/aggregate-functions/reference/covarsamp.md) + +Ðгрегатные функции, Ñпецифичные Ð´Ð»Ñ ClickHouse: + +- [anyHeavy](../../../sql-reference/aggregate-functions/reference/anyheavy.md) +- [anyLast](../../../sql-reference/aggregate-functions/reference/anylast.md) +- [argMin](../../../sql-reference/aggregate-functions/reference/argmin.md) +- [argMax](../../../sql-reference/aggregate-functions/reference/argmax.md) +- [avgWeighted](../../../sql-reference/aggregate-functions/reference/avgweighted.md) +- [topK](../../../sql-reference/aggregate-functions/reference/topk.md) +- 
[topKWeighted](../../../sql-reference/aggregate-functions/reference/topkweighted.md) +- [groupArray](../../../sql-reference/aggregate-functions/reference/grouparray.md) +- [groupUniqArray](../../../sql-reference/aggregate-functions/reference/groupuniqarray.md) +- [groupArrayInsertAt](../../../sql-reference/aggregate-functions/reference/grouparrayinsertat.md) +- [groupArrayMovingAvg](../../../sql-reference/aggregate-functions/reference/grouparraymovingavg.md) +- [groupArrayMovingSum](../../../sql-reference/aggregate-functions/reference/grouparraymovingsum.md) +- [groupBitAnd](../../../sql-reference/aggregate-functions/reference/groupbitand.md) +- [groupBitOr](../../../sql-reference/aggregate-functions/reference/groupbitor.md) +- [groupBitXor](../../../sql-reference/aggregate-functions/reference/groupbitxor.md) +- [groupBitmap](../../../sql-reference/aggregate-functions/reference/groupbitmap.md) +- [sumWithOverflow](../../../sql-reference/aggregate-functions/reference/sumwithoverflow.md) +- [sumMap](../../../sql-reference/aggregate-functions/reference/summap.md) +- [skewSamp](../../../sql-reference/aggregate-functions/reference/skewsamp.md) +- [skewPop](../../../sql-reference/aggregate-functions/reference/skewpop.md) +- [kurtSamp](../../../sql-reference/aggregate-functions/reference/kurtsamp.md) +- [kurtPop](../../../sql-reference/aggregate-functions/reference/kurtpop.md) +- [uniq](../../../sql-reference/aggregate-functions/reference/uniq.md) +- [uniqExact](../../../sql-reference/aggregate-functions/reference/uniqexact.md) +- [uniqCombined](../../../sql-reference/aggregate-functions/reference/uniqcombined.md) +- [uniqCombined64](../../../sql-reference/aggregate-functions/reference/uniqcombined64.md) +- [uniqHLL12](../../../sql-reference/aggregate-functions/reference/uniqhll12.md) +- [quantile](../../../sql-reference/aggregate-functions/reference/quantile.md) +- [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md) +- [quantileExact](../../../sql-reference/aggregate-functions/reference/quantileexact.md) +- [quantileExactLow](../../../sql-reference/aggregate-functions/reference/quantileexact.md#quantileexactlow) +- [quantileExactHigh](../../../sql-reference/aggregate-functions/reference/quantileexact.md#quantileexacthigh) +- [quantileExactWeighted](../../../sql-reference/aggregate-functions/reference/quantileexactweighted.md) +- [quantileTiming](../../../sql-reference/aggregate-functions/reference/quantiletiming.md) +- [quantileTimingWeighted](../../../sql-reference/aggregate-functions/reference/quantiletimingweighted.md) +- [quantileDeterministic](../../../sql-reference/aggregate-functions/reference/quantiledeterministic.md) +- [quantileTDigest](../../../sql-reference/aggregate-functions/reference/quantiletdigest.md) +- [quantileTDigestWeighted](../../../sql-reference/aggregate-functions/reference/quantiletdigestweighted.md) +- [simpleLinearRegression](../../../sql-reference/aggregate-functions/reference/simplelinearregression.md) +- [stochasticLinearRegression](../../../sql-reference/aggregate-functions/reference/stochasticlinearregression.md) +- [stochasticLogisticRegression](../../../sql-reference/aggregate-functions/reference/stochasticlogisticregression.md) + diff --git a/docs/ru/sql-reference/aggregate-functions/reference/quantilebfloat16.md b/docs/ru/sql-reference/aggregate-functions/reference/quantilebfloat16.md index ba4a762dff7..f8622e3fd05 100644 --- a/docs/ru/sql-reference/aggregate-functions/reference/quantilebfloat16.md +++ 
b/docs/ru/sql-reference/aggregate-functions/reference/quantilebfloat16.md @@ -18,7 +18,7 @@ quantileBFloat16[(level)](expr) **Ðргументы** -- `expr` — Ñтолбец Ñ Ñ‡Ð¸Ñловыми данными. [Integer](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md). +- `expr` — Ñтолбец Ñ Ñ‡Ð¸Ñловыми данными. [Integer](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md). **Параметры** diff --git a/docs/ru/sql-reference/aggregate-functions/reference/quantileexact.md b/docs/ru/sql-reference/aggregate-functions/reference/quantileexact.md index 2f1e879eaa1..bb8bddc7a0e 100644 --- a/docs/ru/sql-reference/aggregate-functions/reference/quantileexact.md +++ b/docs/ru/sql-reference/aggregate-functions/reference/quantileexact.md @@ -1,270 +1,270 @@ ---- -toc_priority: 202 ---- - -# Функции quantileExact {#quantileexact-functions} - -## quantileExact {#quantileexact} - -Точно вычиÑлÑет [квантиль](https://ru.wikipedia.org/wiki/Квантиль) чиÑловой поÑледовательноÑти. - -Чтобы получить точный результат, вÑе переданные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ ÑобираютÑÑ Ð² маÑÑив, который затем чаÑтично ÑортируетÑÑ. Таким образом, Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð¿Ð¾Ñ‚Ñ€ÐµÐ±Ð»Ñет объем памÑти `O(n)`, где `n` — количеÑтво переданных значений. Ð”Ð»Ñ Ð½ÐµÐ±Ð¾Ð»ÑŒÑˆÐ¾Ð³Ð¾ чиÑла значений Ñта Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ñффективна. - -Внутренние ÑоÑтоÑÐ½Ð¸Ñ Ñ„ÑƒÐ½ÐºÑ†Ð¸Ð¹ `quantile*` не объединÑÑŽÑ‚ÑÑ, еÑли они иÑпользуютÑÑ Ð² одном запроÑе. ЕÑли вам необходимо вычиÑлить квантили неÑкольких уровней, иÑпользуйте функцию [quantiles](#quantiles), Ñто повыÑит ÑффективноÑÑ‚ÑŒ запроÑа. - -**СинтакÑиÑ** - -``` sql -quantileExact(level)(expr) -``` - -ÐлиаÑ: `medianExact`. - -**Ðргументы** - -- `level` — уровень квантили. Опционально. КонÑтантное значение Ñ Ð¿Ð»Ð°Ð²Ð°ÑŽÑ‰ÐµÐ¹ запÑтой от 0 до 1. Мы рекомендуем иÑпользовать значение `level` из диапазона `[0.01, 0.99]`. Значение по умолчанию: 0.5. При `level=0.5` Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð²Ñ‹Ñ‡Ð¸ÑлÑет [медиану](https://ru.wikipedia.org/wiki/Медиана_(ÑтатиÑтика)). -- `expr` — выражение, завиÑÑщее от значений Ñтолбцов, возвращающее данные [чиÑловых типов](../../../sql-reference/data-types/index.md#data_types) или типов [Date](../../../sql-reference/data-types/date.md), [DateTime](../../../sql-reference/data-types/datetime.md). - -**Возвращаемое значение** - -- Квантиль заданного уровнÑ. - -Тип: - -- [Float64](../../../sql-reference/data-types/float.md) Ð´Ð»Ñ Ð²Ñ…Ð¾Ð´Ð½Ñ‹Ñ… данных чиÑлового типа. -- [Date](../../../sql-reference/data-types/date.md), еÑли входные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð¼ÐµÑŽÑ‚ тип `Date`. -- [DateTime](../../../sql-reference/data-types/datetime.md), еÑли входные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð¼ÐµÑŽÑ‚ тип `DateTime`. - -**Пример** - -ЗапроÑ: - -``` sql -SELECT quantileExact(number) FROM numbers(10) -``` - -Результат: - -``` text -┌─quantileExact(number)─┠-│ 5 │ -└───────────────────────┘ -``` - -## quantileExactLow {#quantileexactlow} - -Как и `quantileExact`, Ñта Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð²Ñ‹Ñ‡Ð¸ÑлÑет точный [квантиль](https://en.wikipedia.org/wiki/Quantile) чиÑловой поÑледовательноÑти данных. - -Чтобы получить точное значение, вÑе переданные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¾Ð±ÑŠÐµÐ´Ð¸Ð½ÑÑŽÑ‚ÑÑ Ð² маÑÑив, который затем полноÑтью ÑортируетÑÑ. СложноÑÑ‚ÑŒ [алгоритма Ñортировки](https://en.cppreference.com/w/cpp/algorithm/sort) равна `O(N·log(N))`, где `N = std::distance(first, last)`. 
- -Возвращаемое значение завиÑит от ÑƒÑ€Ð¾Ð²Ð½Ñ ÐºÐ²Ð°Ð½Ñ‚Ð¸Ð»Ð¸ и количеÑтва Ñлементов в выборке, то еÑÑ‚ÑŒ еÑли уровень 0,5, то Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð²Ð¾Ð·Ð²Ñ€Ð°Ñ‰Ð°ÐµÑ‚ нижнюю медиану при чётном количеÑтве Ñлементов и медиану при нечётном. Медиана вычиÑлÑетÑÑ Ð°Ð½Ð°Ð»Ð¾Ð³Ð¸Ñ‡Ð½Ð¾ реализации [median_low](https://docs.python.org/3/library/statistics.html#statistics.median_low), ÐºÐ¾Ñ‚Ð¾Ñ€Ð°Ñ Ð¸ÑпользуетÑÑ Ð² python. - -Ð”Ð»Ñ Ð²Ñех оÑтальных уровней возвращаетÑÑ Ñлемент Ñ Ð¸Ð½Ð´ÐµÐºÑом, ÑоответÑтвующим значению `level * size_of_array`. Ðапример: - -``` sql -SELECT quantileExactLow(0.1)(number) FROM numbers(10) - -┌─quantileExactLow(0.1)(number)─┠-│ 1 │ -└───────────────────────────────┘ -``` - -При иÑпользовании в запроÑе неÑкольких функций `quantile*` Ñ Ñ€Ð°Ð·Ð½Ñ‹Ð¼Ð¸ уровнÑми, внутренние ÑоÑтоÑÐ½Ð¸Ñ Ð½Ðµ объединÑÑŽÑ‚ÑÑ (то еÑÑ‚ÑŒ Ð·Ð°Ð¿Ñ€Ð¾Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ð°ÐµÑ‚ менее Ñффективно). Ð’ Ñтом Ñлучае иÑпользуйте функцию [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles). - -**СинтакÑиÑ** - -``` sql -quantileExact(level)(expr) -``` - -ÐлиаÑ: `medianExactLow`. - -**Ðргументы** - -- `level` — уровень квантили. Опциональный параметр. КонÑтантное занчение Ñ Ð¿Ð»Ð°Ð²Ð°ÑŽÑ‰ÐµÐ¹ запÑтой от 0 до 1. Мы рекомендуем иÑпользовать значение `level` из диапазона `[0.01, 0.99]`. Значение по умолчанию: 0.5. При `level=0.5` Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð²Ñ‹Ñ‡Ð¸ÑлÑет [медиану](https://en.wikipedia.org/wiki/Median). -- `expr` — выражение, завиÑÑщее от значений Ñтолбцов, возвращающее данные [чиÑловых типов](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) или [DateTime](../../../sql-reference/data-types/datetime.md). - -**Возвращаемое значение** - -- Квантиль заданного уровнÑ. - -Тип: - -- [Float64](../../../sql-reference/data-types/float.md) Ð´Ð»Ñ Ð²Ñ…Ð¾Ð´Ð½Ñ‹Ñ… данных чиÑлового типа. -- [Date](../../../sql-reference/data-types/date.md) еÑли входные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð¼ÐµÑŽÑ‚ тип `Date`. -- [DateTime](../../../sql-reference/data-types/datetime.md) еÑли входные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð¼ÐµÑŽÑ‚ тип `DateTime`. - -**Пример** - -ЗапроÑ: - -``` sql -SELECT quantileExactLow(number) FROM numbers(10) -``` - -Результат: - -``` text -┌─quantileExactLow(number)─┠-│ 4 │ -└──────────────────────────┘ -``` -## quantileExactHigh {#quantileexacthigh} - -Как и `quantileExact`, Ñта Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð²Ñ‹Ñ‡Ð¸ÑлÑет точный [квантиль](https://en.wikipedia.org/wiki/Quantile) чиÑловой поÑледовательноÑти данных. - -Ð’Ñе переданные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¾Ð±ÑŠÐµÐ´Ð¸Ð½ÑÑŽÑ‚ÑÑ Ð² маÑÑив, который затем ÑортируетÑÑ, чтобы получить точное значение. СложноÑÑ‚ÑŒ [алгоритма Ñортировки](https://en.cppreference.com/w/cpp/algorithm/sort) равна `O(N·log(N))`, где `N = std::distance(first, last)`. - -Возвращаемое значение завиÑит от ÑƒÑ€Ð¾Ð²Ð½Ñ ÐºÐ²Ð°Ð½Ñ‚Ð¸Ð»Ð¸ и количеÑтва Ñлементов в выборке, то еÑÑ‚ÑŒ еÑли уровень 0,5, то Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð²Ð¾Ð·Ð²Ñ€Ð°Ñ‰Ð°ÐµÑ‚ верхнюю медиану при чётном количеÑтве Ñлементов и медиану при нечётном. Медиана вычиÑлÑетÑÑ Ð°Ð½Ð°Ð»Ð¾Ð³Ð¸Ñ‡Ð½Ð¾ реализации [median_high](https://docs.python.org/3/library/statistics.html#statistics.median_high), ÐºÐ¾Ñ‚Ð¾Ñ€Ð°Ñ Ð¸ÑпользуетÑÑ Ð² python. Ð”Ð»Ñ Ð²Ñех оÑтальных уровней возвращаетÑÑ Ñлемент Ñ Ð¸Ð½Ð´ÐµÐºÑом, ÑоответÑтвующим значению `level * size_of_array`. - -Эта Ñ€ÐµÐ°Ð»Ð¸Ð·Ð°Ñ†Ð¸Ñ Ð²ÐµÐ´ÐµÑ‚ ÑÐµÐ±Ñ Ñ‚Ð¾Ñ‡Ð½Ð¾ так же, как `quantileExact`. - -При иÑпользовании в запроÑе неÑкольких функций `quantile*` Ñ Ñ€Ð°Ð·Ð½Ñ‹Ð¼Ð¸ уровнÑми, внутренние ÑоÑтоÑÐ½Ð¸Ñ Ð½Ðµ объединÑÑŽÑ‚ÑÑ (то еÑÑ‚ÑŒ Ð·Ð°Ð¿Ñ€Ð¾Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ð°ÐµÑ‚ менее Ñффективно). 
Ð’ Ñтом Ñлучае иÑпользуйте функцию [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles). - -**СинтакÑиÑ** - -``` sql -quantileExactHigh(level)(expr) -``` - -ÐлиаÑ: `medianExactHigh`. - -**Ðргументы** - -- `level` — уровень квантили. Опциональный параметр. КонÑтантное занчение Ñ Ð¿Ð»Ð°Ð²Ð°ÑŽÑ‰ÐµÐ¹ запÑтой от 0 до 1. Мы рекомендуем иÑпользовать значение `level` из диапазона `[0.01, 0.99]`. Значение по умолчанию: 0.5. При `level=0.5` Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð²Ñ‹Ñ‡Ð¸ÑлÑет [медиану](https://en.wikipedia.org/wiki/Median). -- `expr` — выражение, завиÑÑщее от значений Ñтолбцов, возвращающее данные [чиÑловых типов](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) или [DateTime](../../../sql-reference/data-types/datetime.md). - -**Возвращаемое значение** - -- Квантиль заданного уровнÑ. - -Тип: - -- [Float64](../../../sql-reference/data-types/float.md) Ð´Ð»Ñ Ð²Ñ…Ð¾Ð´Ð½Ñ‹Ñ… данных чиÑлового типа. -- [Date](../../../sql-reference/data-types/date.md) еÑли входные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð¼ÐµÑŽÑ‚ тип `Date`. -- [DateTime](../../../sql-reference/data-types/datetime.md) еÑли входные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð¼ÐµÑŽÑ‚ тип `DateTime`. - -**Пример** - -ЗапроÑ: - -``` sql -SELECT quantileExactHigh(number) FROM numbers(10) -``` - -Результат: - -``` text -┌─quantileExactHigh(number)─┠-│ 5 │ -└───────────────────────────┘ -``` - -## quantileExactExclusive {#quantileexactexclusive} - -Точно вычиÑлÑет [квантиль](https://ru.wikipedia.org/wiki/Квантиль) чиÑловой поÑледовательноÑти. - -Чтобы получить точный результат, вÑе переданные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ ÑобираютÑÑ Ð² маÑÑив, который затем чаÑтично ÑортируетÑÑ. Таким образом, Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð¿Ð¾Ñ‚Ñ€ÐµÐ±Ð»Ñет объем памÑти `O(n)`, где `n` — количеÑтво переданных значений. Ð”Ð»Ñ Ð½ÐµÐ±Ð¾Ð»ÑŒÑˆÐ¾Ð³Ð¾ чиÑла значений Ñта Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ñффективна. - -Эта Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ñквивалентна Excel функции [PERCENTILE.EXC](https://support.microsoft.com/en-us/office/percentile-exc-function-bbaa7204-e9e1-4010-85bf-c31dc5dce4ba), [тип R6](https://en.wikipedia.org/wiki/Quantile#Estimating_quantiles_from_a_sample). - -ЕÑли в одном запроÑе вызываетÑÑ Ð½ÐµÑколько функций `quantileExactExclusive` Ñ Ñ€Ð°Ð·Ð½Ñ‹Ð¼Ð¸ значениÑми `level`, Ñти функции вычиÑлÑÑŽÑ‚ÑÑ Ð½ÐµÐ·Ð°Ð²Ð¸Ñимо друг от друга. Ð’ таких ÑлучаÑÑ… иÑпользуйте функцию [quantilesExactExclusive](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantilesexactexclusive), Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð±ÑƒÐ´ÐµÑ‚ выполнÑÑ‚ÑŒÑÑ Ñффективнее. - -**СинтакÑиÑ** - -``` sql -quantileExactExclusive(level)(expr) -``` - -**Ðргументы** - -- `expr` — выражение, завиÑÑщее от значений Ñтолбцов. Возвращает данные [чиÑловых типов](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) или [DateTime](../../../sql-reference/data-types/datetime.md). - -**Параметры** - -- `level` — уровень квантилÑ. ÐеобÑзательный параметр. Возможные значениÑ: (0, 1) — граничные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð½Ðµ учитываютÑÑ. Значение по умолчанию: 0.5. При `level=0.5` Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð²Ñ‹Ñ‡Ð¸ÑлÑет [медиану](https://ru.wikipedia.org/wiki/Медиана_(ÑтатиÑтика)). [Float](../../../sql-reference/data-types/float.md). - -**Возвращаемое значение** - -- Квантиль заданного уровнÑ. - -Тип: - -- [Float64](../../../sql-reference/data-types/float.md) Ð´Ð»Ñ Ð²Ñ…Ð¾Ð´Ð½Ñ‹Ñ… данных чиÑлового типа. -- [Date](../../../sql-reference/data-types/date.md), еÑли входные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð¼ÐµÑŽÑ‚ тип `Date`. 
-- [DateTime](../../../sql-reference/data-types/datetime.md), еÑли входные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð¼ÐµÑŽÑ‚ тип `DateTime`. - -**Пример** - -ЗапроÑ: - -``` sql -CREATE TABLE num AS numbers(1000); - -SELECT quantileExactExclusive(0.6)(x) FROM (SELECT number AS x FROM num); -``` - -Результат: - -``` text -┌─quantileExactExclusive(0.6)(x)─┠-│ 599.6 │ -└────────────────────────────────┘ -``` - -## quantileExactInclusive {#quantileexactinclusive} - -Точно вычиÑлÑет [квантиль](https://ru.wikipedia.org/wiki/Квантиль) чиÑловой поÑледовательноÑти. - -Чтобы получить точный результат, вÑе переданные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ ÑобираютÑÑ Ð² маÑÑив, который затем чаÑтично ÑортируетÑÑ. Таким образом, Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð¿Ð¾Ñ‚Ñ€ÐµÐ±Ð»Ñет объем памÑти `O(n)`, где `n` — количеÑтво переданных значений. Ð”Ð»Ñ Ð½ÐµÐ±Ð¾Ð»ÑŒÑˆÐ¾Ð³Ð¾ чиÑла значений Ñта Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ñффективна. - -Эта Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ñквивалентна Excel функции [PERCENTILE.INC](https://support.microsoft.com/en-us/office/percentile-inc-function-680f9539-45eb-410b-9a5e-c1355e5fe2ed), [тип R7](https://en.wikipedia.org/wiki/Quantile#Estimating_quantiles_from_a_sample). - -ЕÑли в одном запроÑе вызываетÑÑ Ð½ÐµÑколько функций `quantileExactInclusive` Ñ Ñ€Ð°Ð·Ð½Ñ‹Ð¼Ð¸ значениÑми `level`, Ñти функции вычиÑлÑÑŽÑ‚ÑÑ Ð½ÐµÐ·Ð°Ð²Ð¸Ñимо друг от друга. Ð’ таких ÑлучаÑÑ… иÑпользуйте функцию [quantilesExactInclusive](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantilesexactinclusive), Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð±ÑƒÐ´ÐµÑ‚ выполнÑÑ‚ÑŒÑÑ Ñффективнее. - -**СинтакÑиÑ** - -``` sql -quantileExactInclusive(level)(expr) -``` - -**Ðргументы** - -- `expr` — выражение, завиÑÑщее от значений Ñтолбцов. Возвращает данные [чиÑловых типов](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) или [DateTime](../../../sql-reference/data-types/datetime.md). - -**Параметры** - -- `level` — уровень квантилÑ. ÐеобÑзательный параметр. Возможные значениÑ: [0, 1] — граничные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ ÑƒÑ‡Ð¸Ñ‚Ñ‹Ð²Ð°ÑŽÑ‚ÑÑ. Значение по умолчанию: 0.5. При `level=0.5` Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð²Ñ‹Ñ‡Ð¸ÑлÑет [медиану](https://ru.wikipedia.org/wiki/Медиана_(ÑтатиÑтика)). [Float](../../../sql-reference/data-types/float.md). - -**Возвращаемое значение** - -- Квантиль заданного уровнÑ. - -Тип: - -- [Float64](../../../sql-reference/data-types/float.md) Ð´Ð»Ñ Ð²Ñ…Ð¾Ð´Ð½Ñ‹Ñ… данных чиÑлового типа. -- [Date](../../../sql-reference/data-types/date.md), еÑли входные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð¼ÐµÑŽÑ‚ тип `Date`. -- [DateTime](../../../sql-reference/data-types/datetime.md), еÑли входные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð¼ÐµÑŽÑ‚ тип `DateTime`. - -**Пример** - -ЗапроÑ: - -``` sql -CREATE TABLE num AS numbers(1000); - -SELECT quantileExactInclusive(0.6)(x) FROM (SELECT number AS x FROM num); -``` - -Результат: - -``` text -┌─quantileExactInclusive(0.6)(x)─┠-│ 599.4 │ -└────────────────────────────────┘ -``` - -**Смотрите также** - -- [median](../../../sql-reference/aggregate-functions/reference/median.md#median) -- [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles) +--- +toc_priority: 202 +--- + +# Функции quantileExact {#quantileexact-functions} + +## quantileExact {#quantileexact} + +Точно вычиÑлÑет [квантиль](https://ru.wikipedia.org/wiki/Квантиль) чиÑловой поÑледовательноÑти. + +Чтобы получить точный результат, вÑе переданные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ ÑобираютÑÑ Ð² маÑÑив, который затем чаÑтично ÑортируетÑÑ. Таким образом, Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð¿Ð¾Ñ‚Ñ€ÐµÐ±Ð»Ñет объем памÑти `O(n)`, где `n` — количеÑтво переданных значений. 
Ð”Ð»Ñ Ð½ÐµÐ±Ð¾Ð»ÑŒÑˆÐ¾Ð³Ð¾ чиÑла значений Ñта Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ñффективна. + +Внутренние ÑоÑтоÑÐ½Ð¸Ñ Ñ„ÑƒÐ½ÐºÑ†Ð¸Ð¹ `quantile*` не объединÑÑŽÑ‚ÑÑ, еÑли они иÑпользуютÑÑ Ð² одном запроÑе. ЕÑли вам необходимо вычиÑлить квантили неÑкольких уровней, иÑпользуйте функцию [quantiles](#quantiles), Ñто повыÑит ÑффективноÑÑ‚ÑŒ запроÑа. + +**СинтакÑиÑ** + +``` sql +quantileExact(level)(expr) +``` + +ÐлиаÑ: `medianExact`. + +**Ðргументы** + +- `level` — уровень квантили. Опционально. КонÑтантное значение Ñ Ð¿Ð»Ð°Ð²Ð°ÑŽÑ‰ÐµÐ¹ запÑтой от 0 до 1. Мы рекомендуем иÑпользовать значение `level` из диапазона `[0.01, 0.99]`. Значение по умолчанию: 0.5. При `level=0.5` Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð²Ñ‹Ñ‡Ð¸ÑлÑет [медиану](https://ru.wikipedia.org/wiki/Медиана_(ÑтатиÑтика)). +- `expr` — выражение, завиÑÑщее от значений Ñтолбцов, возвращающее данные [чиÑловых типов](../../../sql-reference/data-types/index.md#data_types) или типов [Date](../../../sql-reference/data-types/date.md), [DateTime](../../../sql-reference/data-types/datetime.md). + +**Возвращаемое значение** + +- Квантиль заданного уровнÑ. + +Тип: + +- [Float64](../../../sql-reference/data-types/float.md) Ð´Ð»Ñ Ð²Ñ…Ð¾Ð´Ð½Ñ‹Ñ… данных чиÑлового типа. +- [Date](../../../sql-reference/data-types/date.md), еÑли входные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð¼ÐµÑŽÑ‚ тип `Date`. +- [DateTime](../../../sql-reference/data-types/datetime.md), еÑли входные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð¼ÐµÑŽÑ‚ тип `DateTime`. + +**Пример** + +ЗапроÑ: + +``` sql +SELECT quantileExact(number) FROM numbers(10) +``` + +Результат: + +``` text +┌─quantileExact(number)─┠+│ 5 │ +└───────────────────────┘ +``` + +## quantileExactLow {#quantileexactlow} + +Как и `quantileExact`, Ñта Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð²Ñ‹Ñ‡Ð¸ÑлÑет точный [квантиль](https://en.wikipedia.org/wiki/Quantile) чиÑловой поÑледовательноÑти данных. + +Чтобы получить точное значение, вÑе переданные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¾Ð±ÑŠÐµÐ´Ð¸Ð½ÑÑŽÑ‚ÑÑ Ð² маÑÑив, который затем полноÑтью ÑортируетÑÑ. СложноÑÑ‚ÑŒ [алгоритма Ñортировки](https://en.cppreference.com/w/cpp/algorithm/sort) равна `O(N·log(N))`, где `N = std::distance(first, last)`. + +Возвращаемое значение завиÑит от ÑƒÑ€Ð¾Ð²Ð½Ñ ÐºÐ²Ð°Ð½Ñ‚Ð¸Ð»Ð¸ и количеÑтва Ñлементов в выборке, то еÑÑ‚ÑŒ еÑли уровень 0,5, то Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð²Ð¾Ð·Ð²Ñ€Ð°Ñ‰Ð°ÐµÑ‚ нижнюю медиану при чётном количеÑтве Ñлементов и медиану при нечётном. Медиана вычиÑлÑетÑÑ Ð°Ð½Ð°Ð»Ð¾Ð³Ð¸Ñ‡Ð½Ð¾ реализации [median_low](https://docs.python.org/3/library/statistics.html#statistics.median_low), ÐºÐ¾Ñ‚Ð¾Ñ€Ð°Ñ Ð¸ÑпользуетÑÑ Ð² python. + +Ð”Ð»Ñ Ð²Ñех оÑтальных уровней возвращаетÑÑ Ñлемент Ñ Ð¸Ð½Ð´ÐµÐºÑом, ÑоответÑтвующим значению `level * size_of_array`. Ðапример: + +``` sql +SELECT quantileExactLow(0.1)(number) FROM numbers(10) + +┌─quantileExactLow(0.1)(number)─┠+│ 1 │ +└───────────────────────────────┘ +``` + +При иÑпользовании в запроÑе неÑкольких функций `quantile*` Ñ Ñ€Ð°Ð·Ð½Ñ‹Ð¼Ð¸ уровнÑми, внутренние ÑоÑтоÑÐ½Ð¸Ñ Ð½Ðµ объединÑÑŽÑ‚ÑÑ (то еÑÑ‚ÑŒ Ð·Ð°Ð¿Ñ€Ð¾Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ð°ÐµÑ‚ менее Ñффективно). Ð’ Ñтом Ñлучае иÑпользуйте функцию [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles). + +**СинтакÑиÑ** + +``` sql +quantileExact(level)(expr) +``` + +ÐлиаÑ: `medianExactLow`. + +**Ðргументы** + +- `level` — уровень квантили. Опциональный параметр. КонÑтантное занчение Ñ Ð¿Ð»Ð°Ð²Ð°ÑŽÑ‰ÐµÐ¹ запÑтой от 0 до 1. Мы рекомендуем иÑпользовать значение `level` из диапазона `[0.01, 0.99]`. Значение по умолчанию: 0.5. При `level=0.5` Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð²Ñ‹Ñ‡Ð¸ÑлÑет [медиану](https://en.wikipedia.org/wiki/Median). 
+- `expr` — выражение, завиÑÑщее от значений Ñтолбцов, возвращающее данные [чиÑловых типов](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) или [DateTime](../../../sql-reference/data-types/datetime.md). + +**Возвращаемое значение** + +- Квантиль заданного уровнÑ. + +Тип: + +- [Float64](../../../sql-reference/data-types/float.md) Ð´Ð»Ñ Ð²Ñ…Ð¾Ð´Ð½Ñ‹Ñ… данных чиÑлового типа. +- [Date](../../../sql-reference/data-types/date.md) еÑли входные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð¼ÐµÑŽÑ‚ тип `Date`. +- [DateTime](../../../sql-reference/data-types/datetime.md) еÑли входные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð¼ÐµÑŽÑ‚ тип `DateTime`. + +**Пример** + +ЗапроÑ: + +``` sql +SELECT quantileExactLow(number) FROM numbers(10) +``` + +Результат: + +``` text +┌─quantileExactLow(number)─┠+│ 4 │ +└──────────────────────────┘ +``` +## quantileExactHigh {#quantileexacthigh} + +Как и `quantileExact`, Ñта Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð²Ñ‹Ñ‡Ð¸ÑлÑет точный [квантиль](https://en.wikipedia.org/wiki/Quantile) чиÑловой поÑледовательноÑти данных. + +Ð’Ñе переданные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¾Ð±ÑŠÐµÐ´Ð¸Ð½ÑÑŽÑ‚ÑÑ Ð² маÑÑив, который затем ÑортируетÑÑ, чтобы получить точное значение. СложноÑÑ‚ÑŒ [алгоритма Ñортировки](https://en.cppreference.com/w/cpp/algorithm/sort) равна `O(N·log(N))`, где `N = std::distance(first, last)`. + +Возвращаемое значение завиÑит от ÑƒÑ€Ð¾Ð²Ð½Ñ ÐºÐ²Ð°Ð½Ñ‚Ð¸Ð»Ð¸ и количеÑтва Ñлементов в выборке, то еÑÑ‚ÑŒ еÑли уровень 0,5, то Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð²Ð¾Ð·Ð²Ñ€Ð°Ñ‰Ð°ÐµÑ‚ верхнюю медиану при чётном количеÑтве Ñлементов и медиану при нечётном. Медиана вычиÑлÑетÑÑ Ð°Ð½Ð°Ð»Ð¾Ð³Ð¸Ñ‡Ð½Ð¾ реализации [median_high](https://docs.python.org/3/library/statistics.html#statistics.median_high), ÐºÐ¾Ñ‚Ð¾Ñ€Ð°Ñ Ð¸ÑпользуетÑÑ Ð² python. Ð”Ð»Ñ Ð²Ñех оÑтальных уровней возвращаетÑÑ Ñлемент Ñ Ð¸Ð½Ð´ÐµÐºÑом, ÑоответÑтвующим значению `level * size_of_array`. + +Эта Ñ€ÐµÐ°Ð»Ð¸Ð·Ð°Ñ†Ð¸Ñ Ð²ÐµÐ´ÐµÑ‚ ÑÐµÐ±Ñ Ñ‚Ð¾Ñ‡Ð½Ð¾ так же, как `quantileExact`. + +При иÑпользовании в запроÑе неÑкольких функций `quantile*` Ñ Ñ€Ð°Ð·Ð½Ñ‹Ð¼Ð¸ уровнÑми, внутренние ÑоÑтоÑÐ½Ð¸Ñ Ð½Ðµ объединÑÑŽÑ‚ÑÑ (то еÑÑ‚ÑŒ Ð·Ð°Ð¿Ñ€Ð¾Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ð°ÐµÑ‚ менее Ñффективно). Ð’ Ñтом Ñлучае иÑпользуйте функцию [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles). + +**СинтакÑиÑ** + +``` sql +quantileExactHigh(level)(expr) +``` + +ÐлиаÑ: `medianExactHigh`. + +**Ðргументы** + +- `level` — уровень квантили. Опциональный параметр. КонÑтантное занчение Ñ Ð¿Ð»Ð°Ð²Ð°ÑŽÑ‰ÐµÐ¹ запÑтой от 0 до 1. Мы рекомендуем иÑпользовать значение `level` из диапазона `[0.01, 0.99]`. Значение по умолчанию: 0.5. При `level=0.5` Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð²Ñ‹Ñ‡Ð¸ÑлÑет [медиану](https://en.wikipedia.org/wiki/Median). +- `expr` — выражение, завиÑÑщее от значений Ñтолбцов, возвращающее данные [чиÑловых типов](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) или [DateTime](../../../sql-reference/data-types/datetime.md). + +**Возвращаемое значение** + +- Квантиль заданного уровнÑ. + +Тип: + +- [Float64](../../../sql-reference/data-types/float.md) Ð´Ð»Ñ Ð²Ñ…Ð¾Ð´Ð½Ñ‹Ñ… данных чиÑлового типа. +- [Date](../../../sql-reference/data-types/date.md) еÑли входные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð¼ÐµÑŽÑ‚ тип `Date`. +- [DateTime](../../../sql-reference/data-types/datetime.md) еÑли входные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð¼ÐµÑŽÑ‚ тип `DateTime`. 
+ +**Пример** + +ЗапроÑ: + +``` sql +SELECT quantileExactHigh(number) FROM numbers(10) +``` + +Результат: + +``` text +┌─quantileExactHigh(number)─┠+│ 5 │ +└───────────────────────────┘ +``` + +## quantileExactExclusive {#quantileexactexclusive} + +Точно вычиÑлÑет [квантиль](https://ru.wikipedia.org/wiki/Квантиль) чиÑловой поÑледовательноÑти. + +Чтобы получить точный результат, вÑе переданные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ ÑобираютÑÑ Ð² маÑÑив, который затем чаÑтично ÑортируетÑÑ. Таким образом, Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð¿Ð¾Ñ‚Ñ€ÐµÐ±Ð»Ñет объем памÑти `O(n)`, где `n` — количеÑтво переданных значений. Ð”Ð»Ñ Ð½ÐµÐ±Ð¾Ð»ÑŒÑˆÐ¾Ð³Ð¾ чиÑла значений Ñта Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ñффективна. + +Эта Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ñквивалентна Excel функции [PERCENTILE.EXC](https://support.microsoft.com/en-us/office/percentile-exc-function-bbaa7204-e9e1-4010-85bf-c31dc5dce4ba), [тип R6](https://en.wikipedia.org/wiki/Quantile#Estimating_quantiles_from_a_sample). + +ЕÑли в одном запроÑе вызываетÑÑ Ð½ÐµÑколько функций `quantileExactExclusive` Ñ Ñ€Ð°Ð·Ð½Ñ‹Ð¼Ð¸ значениÑми `level`, Ñти функции вычиÑлÑÑŽÑ‚ÑÑ Ð½ÐµÐ·Ð°Ð²Ð¸Ñимо друг от друга. Ð’ таких ÑлучаÑÑ… иÑпользуйте функцию [quantilesExactExclusive](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantilesexactexclusive), Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð±ÑƒÐ´ÐµÑ‚ выполнÑÑ‚ÑŒÑÑ Ñффективнее. + +**СинтакÑиÑ** + +``` sql +quantileExactExclusive(level)(expr) +``` + +**Ðргументы** + +- `expr` — выражение, завиÑÑщее от значений Ñтолбцов. Возвращает данные [чиÑловых типов](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) или [DateTime](../../../sql-reference/data-types/datetime.md). + +**Параметры** + +- `level` — уровень квантилÑ. ÐеобÑзательный параметр. Возможные значениÑ: (0, 1) — граничные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð½Ðµ учитываютÑÑ. Значение по умолчанию: 0.5. При `level=0.5` Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð²Ñ‹Ñ‡Ð¸ÑлÑет [медиану](https://ru.wikipedia.org/wiki/Медиана_(ÑтатиÑтика)). [Float](../../../sql-reference/data-types/float.md). + +**Возвращаемое значение** + +- Квантиль заданного уровнÑ. + +Тип: + +- [Float64](../../../sql-reference/data-types/float.md) Ð´Ð»Ñ Ð²Ñ…Ð¾Ð´Ð½Ñ‹Ñ… данных чиÑлового типа. +- [Date](../../../sql-reference/data-types/date.md), еÑли входные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð¼ÐµÑŽÑ‚ тип `Date`. +- [DateTime](../../../sql-reference/data-types/datetime.md), еÑли входные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð¼ÐµÑŽÑ‚ тип `DateTime`. + +**Пример** + +ЗапроÑ: + +``` sql +CREATE TABLE num AS numbers(1000); + +SELECT quantileExactExclusive(0.6)(x) FROM (SELECT number AS x FROM num); +``` + +Результат: + +``` text +┌─quantileExactExclusive(0.6)(x)─┠+│ 599.6 │ +└────────────────────────────────┘ +``` + +## quantileExactInclusive {#quantileexactinclusive} + +Точно вычиÑлÑет [квантиль](https://ru.wikipedia.org/wiki/Квантиль) чиÑловой поÑледовательноÑти. + +Чтобы получить точный результат, вÑе переданные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ ÑобираютÑÑ Ð² маÑÑив, который затем чаÑтично ÑортируетÑÑ. Таким образом, Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð¿Ð¾Ñ‚Ñ€ÐµÐ±Ð»Ñет объем памÑти `O(n)`, где `n` — количеÑтво переданных значений. Ð”Ð»Ñ Ð½ÐµÐ±Ð¾Ð»ÑŒÑˆÐ¾Ð³Ð¾ чиÑла значений Ñта Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ñффективна. + +Эта Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ñквивалентна Excel функции [PERCENTILE.INC](https://support.microsoft.com/en-us/office/percentile-inc-function-680f9539-45eb-410b-9a5e-c1355e5fe2ed), [тип R7](https://en.wikipedia.org/wiki/Quantile#Estimating_quantiles_from_a_sample). + +ЕÑли в одном запроÑе вызываетÑÑ Ð½ÐµÑколько функций `quantileExactInclusive` Ñ Ñ€Ð°Ð·Ð½Ñ‹Ð¼Ð¸ значениÑми `level`, Ñти функции вычиÑлÑÑŽÑ‚ÑÑ Ð½ÐµÐ·Ð°Ð²Ð¸Ñимо друг от друга. 
Ð’ таких ÑлучаÑÑ… иÑпользуйте функцию [quantilesExactInclusive](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantilesexactinclusive), Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð±ÑƒÐ´ÐµÑ‚ выполнÑÑ‚ÑŒÑÑ Ñффективнее. + +**СинтакÑиÑ** + +``` sql +quantileExactInclusive(level)(expr) +``` + +**Ðргументы** + +- `expr` — выражение, завиÑÑщее от значений Ñтолбцов. Возвращает данные [чиÑловых типов](../../../sql-reference/data-types/index.md#data_types), [Date](../../../sql-reference/data-types/date.md) или [DateTime](../../../sql-reference/data-types/datetime.md). + +**Параметры** + +- `level` — уровень квантилÑ. ÐеобÑзательный параметр. Возможные значениÑ: [0, 1] — граничные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ ÑƒÑ‡Ð¸Ñ‚Ñ‹Ð²Ð°ÑŽÑ‚ÑÑ. Значение по умолчанию: 0.5. При `level=0.5` Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð²Ñ‹Ñ‡Ð¸ÑлÑет [медиану](https://ru.wikipedia.org/wiki/Медиана_(ÑтатиÑтика)). [Float](../../../sql-reference/data-types/float.md). + +**Возвращаемое значение** + +- Квантиль заданного уровнÑ. + +Тип: + +- [Float64](../../../sql-reference/data-types/float.md) Ð´Ð»Ñ Ð²Ñ…Ð¾Ð´Ð½Ñ‹Ñ… данных чиÑлового типа. +- [Date](../../../sql-reference/data-types/date.md), еÑли входные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð¼ÐµÑŽÑ‚ тип `Date`. +- [DateTime](../../../sql-reference/data-types/datetime.md), еÑли входные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð¼ÐµÑŽÑ‚ тип `DateTime`. + +**Пример** + +ЗапроÑ: + +``` sql +CREATE TABLE num AS numbers(1000); + +SELECT quantileExactInclusive(0.6)(x) FROM (SELECT number AS x FROM num); +``` + +Результат: + +``` text +┌─quantileExactInclusive(0.6)(x)─┠+│ 599.4 │ +└────────────────────────────────┘ +``` + +**Смотрите также** + +- [median](../../../sql-reference/aggregate-functions/reference/median.md#median) +- [quantiles](../../../sql-reference/aggregate-functions/reference/quantiles.md#quantiles) diff --git a/docs/ru/sql-reference/aggregate-functions/reference/studentttest.md b/docs/ru/sql-reference/aggregate-functions/reference/studentttest.md index 16daddfbecf..a00a11301ac 100644 --- a/docs/ru/sql-reference/aggregate-functions/reference/studentttest.md +++ b/docs/ru/sql-reference/aggregate-functions/reference/studentttest.md @@ -5,7 +5,7 @@ toc_title: studentTTest # studentTTest {#studentttest} -ВычиÑлÑет t-критерий Стьюдента Ð´Ð»Ñ Ð²Ñ‹Ð±Ð¾Ñ€Ð¾Ðº из двух генеральных ÑовокупноÑтей. +ВычиÑлÑет t-критерий Стьюдента Ð´Ð»Ñ Ð²Ñ‹Ð±Ð¾Ñ€Ð¾Ðº из двух генеральных ÑовокупноÑтей. **СинтакÑиÑ** diff --git a/docs/ru/sql-reference/aggregate-functions/reference/sumcount.md b/docs/ru/sql-reference/aggregate-functions/reference/sumcount.md index 0606b06fba0..5a8f93209cf 100644 --- a/docs/ru/sql-reference/aggregate-functions/reference/sumcount.md +++ b/docs/ru/sql-reference/aggregate-functions/reference/sumcount.md @@ -12,7 +12,7 @@ toc_priority: 144 sumCount(x) ``` -**Ðргументы** +**Ðргументы** - `x` — Входное значение типа [Integer](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md), или [Decimal](../../../sql-reference/data-types/decimal.md). diff --git a/docs/ru/sql-reference/aggregate-functions/reference/sumkahan.md b/docs/ru/sql-reference/aggregate-functions/reference/sumkahan.md index cdc713d5726..6e9c76c9371 100644 --- a/docs/ru/sql-reference/aggregate-functions/reference/sumkahan.md +++ b/docs/ru/sql-reference/aggregate-functions/reference/sumkahan.md @@ -5,7 +5,7 @@ toc_priority: 145 # sumKahan {#agg_function-sumKahan} ВычиÑлÑет Ñумму Ñ Ð¸Ñпользованием [компенÑационного ÑÑƒÐ¼Ð¼Ð¸Ñ€Ð¾Ð²Ð°Ð½Ð¸Ñ Ð¿Ð¾ алгоритму КÑÑ…Ñна](https://ru.wikipedia.org/wiki/Ðлгоритм_КÑÑ…Ñна). -Работает медленнее функции [sum](./sum.md). 
+Работает медленнее функции [sum](./sum.md). КомпенÑÐ°Ñ†Ð¸Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ð°ÐµÑ‚ только Ð´Ð»Ñ [Float](../../../sql-reference/data-types/float.md) типов. **СинтакÑиÑ** diff --git a/docs/ru/sql-reference/aggregate-functions/reference/topkweighted.md b/docs/ru/sql-reference/aggregate-functions/reference/topkweighted.md index d0fd3856b24..3c2c4ea0cba 100644 --- a/docs/ru/sql-reference/aggregate-functions/reference/topkweighted.md +++ b/docs/ru/sql-reference/aggregate-functions/reference/topkweighted.md @@ -4,7 +4,7 @@ toc_priority: 109 # topKWeighted {#topkweighted} -Возвращает маÑÑив наиболее чаÑто вÑтречающихÑÑ Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ð¹ в указанном Ñтолбце. Результирующий маÑÑив упорÑдочен по убыванию чаÑтоты Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ (не по Ñамим значениÑм). Дополнительно учитываетÑÑ Ð²ÐµÑ Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ. +Возвращает маÑÑив наиболее чаÑто вÑтречающихÑÑ Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ð¹ в указанном Ñтолбце. Результирующий маÑÑив упорÑдочен по убыванию чаÑтоты Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ (не по Ñамим значениÑм). Дополнительно учитываетÑÑ Ð²ÐµÑ Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ. **СинтакÑиÑ** diff --git a/docs/ru/sql-reference/aggregate-functions/reference/welchttest.md b/docs/ru/sql-reference/aggregate-functions/reference/welchttest.md index 594a609d89e..40588002917 100644 --- a/docs/ru/sql-reference/aggregate-functions/reference/welchttest.md +++ b/docs/ru/sql-reference/aggregate-functions/reference/welchttest.md @@ -5,7 +5,7 @@ toc_title: welchTTest # welchTTest {#welchttest} -ВычиÑлÑет t-критерий УÑлча Ð´Ð»Ñ Ð²Ñ‹Ð±Ð¾Ñ€Ð¾Ðº из двух генеральных ÑовокупноÑтей. +ВычиÑлÑет t-критерий УÑлча Ð´Ð»Ñ Ð²Ñ‹Ð±Ð¾Ñ€Ð¾Ðº из двух генеральных ÑовокупноÑтей. **СинтакÑиÑ** diff --git a/docs/ru/sql-reference/data-types/geo.md b/docs/ru/sql-reference/data-types/geo.md index 23b47f38d05..1b2e18d36c8 100644 --- a/docs/ru/sql-reference/data-types/geo.md +++ b/docs/ru/sql-reference/data-types/geo.md @@ -28,7 +28,7 @@ CREATE TABLE geo_point (p Point) ENGINE = Memory(); INSERT INTO geo_point VALUES((10, 10)); SELECT p, toTypeName(p) FROM geo_point; ``` -Результат: +Результат: ``` text ┌─p─────┬─toTypeName(p)─┠@@ -50,7 +50,7 @@ CREATE TABLE geo_ring (r Ring) ENGINE = Memory(); INSERT INTO geo_ring VALUES([(0, 0), (10, 0), (10, 10), (0, 10)]); SELECT r, toTypeName(r) FROM geo_ring; ``` -Результат: +Результат: ``` text ┌─r─────────────────────────────┬─toTypeName(r)─┠@@ -73,7 +73,7 @@ INSERT INTO geo_polygon VALUES([[(20, 20), (50, 20), (50, 50), (20, 50)], [(30, SELECT pg, toTypeName(pg) FROM geo_polygon; ``` -Результат: +Результат: ``` text ┌─pg────────────────────────────────────────────────────────────┬─toTypeName(pg)─┠@@ -83,7 +83,7 @@ SELECT pg, toTypeName(pg) FROM geo_polygon; ## MultiPolygon {#multipolygon-data-type} -Тип `MultiPolygon` опиÑывает Ñлемент, ÑоÑтоÑщий из неÑкольких проÑÑ‚Ñ‹Ñ… многоугольников (полигональную Ñетку). Он хранитÑÑ Ð² виде маÑÑива многоугольников: [Array](array.md)([Polygon](#polygon-data-type)). +Тип `MultiPolygon` опиÑывает Ñлемент, ÑоÑтоÑщий из неÑкольких проÑÑ‚Ñ‹Ñ… многоугольников (полигональную Ñетку). Он хранитÑÑ Ð² виде маÑÑива многоугольников: [Array](array.md)([Polygon](#polygon-data-type)). 
**Пример** @@ -95,7 +95,7 @@ CREATE TABLE geo_multipolygon (mpg MultiPolygon) ENGINE = Memory(); INSERT INTO geo_multipolygon VALUES([[[(0, 0), (10, 0), (10, 10), (0, 10)]], [[(20, 20), (50, 20), (50, 50), (20, 50)],[(30, 30), (50, 50), (50, 30)]]]); SELECT mpg, toTypeName(mpg) FROM geo_multipolygon; ``` -Result: +Result: ``` text ┌─mpg─────────────────────────────────────────────────────────────────────────────────────────────┬─toTypeName(mpg)─┠diff --git a/docs/ru/sql-reference/data-types/lowcardinality.md b/docs/ru/sql-reference/data-types/lowcardinality.md index fe9118b1e14..49ba5db0169 100644 --- a/docs/ru/sql-reference/data-types/lowcardinality.md +++ b/docs/ru/sql-reference/data-types/lowcardinality.md @@ -15,7 +15,7 @@ LowCardinality(data_type) **Параметры** -- `data_type` — [String](string.md), [FixedString](fixedstring.md), [Date](date.md), [DateTime](datetime.md) и чиÑла за иÑключением типа [Decimal](decimal.md). `LowCardinality` неÑффективен Ð´Ð»Ñ Ð½ÐµÐºÐ¾Ñ‚Ð¾Ñ€Ñ‹Ñ… типов данных, Ñм. опиÑание наÑтройки [allow_suspicious_low_cardinality_types](../../operations/settings/settings.md#allow_suspicious_low_cardinality_types). +- `data_type` — [String](string.md), [FixedString](fixedstring.md), [Date](date.md), [DateTime](datetime.md) и чиÑла за иÑключением типа [Decimal](decimal.md). `LowCardinality` неÑффективен Ð´Ð»Ñ Ð½ÐµÐºÐ¾Ñ‚Ð¾Ñ€Ñ‹Ñ… типов данных, Ñм. опиÑание наÑтройки [allow_suspicious_low_cardinality_types](../../operations/settings/settings.md#allow_suspicious_low_cardinality_types). ## ОпиÑание {#lowcardinality-dscr} @@ -23,16 +23,16 @@ LowCardinality(data_type) ЭффективноÑÑ‚ÑŒ иÑÐ¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ð½Ð¸Ñ Ñ‚Ð¸Ð¿Ð° данных `LowCarditality` завиÑит от Ñ€Ð°Ð·Ð½Ð¾Ð¾Ð±Ñ€Ð°Ð·Ð¸Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ…. ЕÑли Ñловарь Ñодержит менее 10 000 различных значений, ClickHouse в оÑновном показывает более выÑокую ÑффективноÑÑ‚ÑŒ Ñ‡Ñ‚ÐµÐ½Ð¸Ñ Ð¸ Ñ…Ñ€Ð°Ð½ÐµÐ½Ð¸Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ…. ЕÑли же Ñловарь Ñодержит более 100 000 различных значений, ClickHouse может работать хуже, чем при иÑпользовании обычных типов данных. -При работе Ñо Ñтроками, иÑпользование `LowCardinality` вмеÑто [Enum](enum.md) обеÑпечивает большую гибкоÑÑ‚ÑŒ в иÑпользовании и чаÑто показывает такую же или более выÑокую ÑффективноÑÑ‚ÑŒ. +При работе Ñо Ñтроками иÑпользование `LowCardinality` вмеÑто [Enum](enum.md) обеÑпечивает большую гибкоÑÑ‚ÑŒ в иÑпользовании и чаÑто показывает такую же или более выÑокую ÑффективноÑÑ‚ÑŒ. 
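
> An existing `String` column can also be switched to `LowCardinality(String)` in place; the sketch below is illustrative only, with a hypothetical `events` table and `browser` column (see also the CREATE TABLE example that follows).

```sql
-- Hypothetical table; convert an existing String column in place.
ALTER TABLE events MODIFY COLUMN browser LowCardinality(String);

-- A rough suitability check: a small number of distinct values
-- relative to the total row count favors LowCardinality.
SELECT uniq(browser) AS distinct_values, count() AS total_rows FROM events;
```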
## Пример -Создать таблицу Ñо Ñтолбцами типа `LowCardinality`: +Создание таблицы Ñо Ñтолбцами типа `LowCardinality`: ```sql CREATE TABLE lc_t ( - `id` UInt16, + `id` UInt16, `strings` LowCardinality(String) ) ENGINE = MergeTree() @@ -43,18 +43,18 @@ ORDER BY id ÐаÑтройки: -- [low_cardinality_max_dictionary_size](../../operations/settings/settings.md#low_cardinality_max_dictionary_size) -- [low_cardinality_use_single_dictionary_for_part](../../operations/settings/settings.md#low_cardinality_use_single_dictionary_for_part) -- [low_cardinality_allow_in_native_format](../../operations/settings/settings.md#low_cardinality_allow_in_native_format) -- [allow_suspicious_low_cardinality_types](../../operations/settings/settings.md#allow_suspicious_low_cardinality_types) +- [low_cardinality_max_dictionary_size](../../operations/settings/settings.md#low_cardinality_max_dictionary_size) +- [low_cardinality_use_single_dictionary_for_part](../../operations/settings/settings.md#low_cardinality_use_single_dictionary_for_part) +- [low_cardinality_allow_in_native_format](../../operations/settings/settings.md#low_cardinality_allow_in_native_format) +- [allow_suspicious_low_cardinality_types](../../operations/settings/settings.md#allow_suspicious_low_cardinality_types) +- [output_format_arrow_low_cardinality_as_dictionary](../../operations/settings/settings.md#output-format-arrow-low-cardinality-as-dictionary) Функции: -- [toLowCardinality](../functions/type-conversion-functions.md#tolowcardinality) +- [toLowCardinality](../functions/type-conversion-functions.md#tolowcardinality) ## Смотрите также -- [A Magical Mystery Tour of the LowCardinality Data Type](https://www.altinity.com/blog/2019/3/27/low-cardinality). -- [Reducing Clickhouse Storage Cost with the Low Cardinality Type – Lessons from an Instana Engineer](https://www.instana.com/blog/reducing-clickhouse-storage-cost-with-the-low-cardinality-type-lessons-from-an-instana-engineer/). -- [String Optimization (video presentation in Russian)](https://youtu.be/rqf-ILRgBdY?list=PL0Z2YDlm0b3iwXCpEFiOOYmwXzVmjJfEt). [Slides in English](https://github.com/yandex/clickhouse-presentations/raw/master/meetup19/string_optimization.pdf). - +- [A Magical Mystery Tour of the LowCardinality Data Type](https://www.altinity.com/blog/2019/3/27/low-cardinality). +- [Reducing Clickhouse Storage Cost with the Low Cardinality Type – Lessons from an Instana Engineer](https://www.instana.com/blog/reducing-clickhouse-storage-cost-with-the-low-cardinality-type-lessons-from-an-instana-engineer/). +- [String Optimization (video presentation in Russian)](https://youtu.be/rqf-ILRgBdY?list=PL0Z2YDlm0b3iwXCpEFiOOYmwXzVmjJfEt). [Slides in English](https://github.com/yandex/clickhouse-presentations/raw/master/meetup19/string_optimization.pdf). diff --git a/docs/ru/sql-reference/data-types/map.md b/docs/ru/sql-reference/data-types/map.md index aceeb21b6e6..32eb4a80b3e 100644 --- a/docs/ru/sql-reference/data-types/map.md +++ b/docs/ru/sql-reference/data-types/map.md @@ -5,12 +5,12 @@ toc_title: Map(key, value) # Map(key, value) {#data_type-map} -Тип данных `Map(key, value)` хранит пары `ключ:значение`. +Тип данных `Map(key, value)` хранит пары `ключ:значение`. -**Параметры** +**Параметры** -- `key` — ключ. [String](../../sql-reference/data-types/string.md) или [Integer](../../sql-reference/data-types/int-uint.md). -- `value` — значение. 
[String](../../sql-reference/data-types/string.md), [Integer](../../sql-reference/data-types/int-uint.md) или [Array](../../sql-reference/data-types/array.md). +- `key` — ключ. [String](../../sql-reference/data-types/string.md), [Integer](../../sql-reference/data-types/int-uint.md), [LowCardinality](../../sql-reference/data-types/lowcardinality.md) или [FixedString](../../sql-reference/data-types/fixedstring.md). +- `value` — значение. [String](../../sql-reference/data-types/string.md), [Integer](../../sql-reference/data-types/int-uint.md), [Array](../../sql-reference/data-types/array.md), [LowCardinality](../../sql-reference/data-types/lowcardinality.md) или [FixedString](../../sql-reference/data-types/fixedstring.md). Чтобы получить значение из колонки `a Map('key', 'value')`, иÑпользуйте ÑинтакÑÐ¸Ñ `a['key']`. Ð’ наÑтоÑщее Ð²Ñ€ÐµÐ¼Ñ Ñ‚Ð°ÐºÐ°Ñ Ð¿Ð¾Ð´Ñтановка работает по алгоритму Ñ Ð»Ð¸Ð½ÐµÐ¹Ð½Ð¾Ð¹ ÑложноÑтью. @@ -23,7 +23,7 @@ CREATE TABLE table_map (a Map(String, UInt64)) ENGINE=Memory; INSERT INTO table_map VALUES ({'key1':1, 'key2':10}), ({'key1':2,'key2':20}), ({'key1':3,'key2':30}); ``` -Выборка вÑех значений ключа `key2`: +Выборка вÑех значений ключа `key2`: ```sql SELECT a['key2'] FROM table_map; @@ -38,7 +38,7 @@ SELECT a['key2'] FROM table_map; └─────────────────────────┘ ``` -ЕÑли Ð´Ð»Ñ ÐºÐ°ÐºÐ¾Ð³Ð¾-то ключа `key` в колонке Ñ Ñ‚Ð¸Ð¿Ð¾Ð¼ `Map()` нет значениÑ, Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð²Ð¾Ð·Ð²Ñ€Ð°Ñ‰Ð°ÐµÑ‚ нули Ð´Ð»Ñ Ñ‡Ð¸Ñловых колонок, пуÑтые Ñтроки или пуÑтые маÑÑивы. +ЕÑли Ð´Ð»Ñ ÐºÐ°ÐºÐ¾Ð³Ð¾-то ключа `key` в колонке Ñ Ñ‚Ð¸Ð¿Ð¾Ð¼ `Map()` нет значениÑ, Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð²Ð¾Ð·Ð²Ñ€Ð°Ñ‰Ð°ÐµÑ‚ нули Ð´Ð»Ñ Ñ‡Ð¸Ñловых колонок, пуÑтые Ñтроки или пуÑтые маÑÑивы. ```sql INSERT INTO table_map VALUES ({'key3':100}), ({}); diff --git a/docs/ru/sql-reference/data-types/nullable.md b/docs/ru/sql-reference/data-types/nullable.md index 84a5d6a796c..ef5e486c128 100644 --- a/docs/ru/sql-reference/data-types/nullable.md +++ b/docs/ru/sql-reference/data-types/nullable.md @@ -33,7 +33,7 @@ toc_title: Nullable **Пример** -ЗапроÑ: +ЗапроÑ: ``` sql CREATE TABLE nullable (`n` Nullable(UInt32)) ENGINE = MergeTree ORDER BY tuple(); diff --git a/docs/ru/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-lifetime.md b/docs/ru/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-lifetime.md index ea1b62c6cef..c0c2b6ee43d 100644 --- a/docs/ru/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-lifetime.md +++ b/docs/ru/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-lifetime.md @@ -86,7 +86,7 @@ SOURCE(ODBC(... invalidate_query 'SELECT update_time FROM dictionary_source wher ... ``` -Ð”Ð»Ñ Ñловарей `Cache`, `ComplexKeyCache`, `SSDCache` и `SSDComplexKeyCache` поддерживаетÑÑ ÐºÐ°Ðº Ñинхронное, так и аÑинхронное обновление. +Ð”Ð»Ñ Ñловарей `Cache`, `ComplexKeyCache`, `SSDCache` и `SSDComplexKeyCache` поддерживаетÑÑ ÐºÐ°Ðº Ñинхронное, так и аÑинхронное обновление. Словари `Flat`, `Hashed` и `ComplexKeyHashed` могут запрашивать только те данные, которые были изменены поÑле предыдущего обновлениÑ. ЕÑли `update_field` указано как чаÑÑ‚ÑŒ конфигурации иÑточника ÑловарÑ, к запроÑу данных будет добавлено Ð²Ñ€ÐµÐ¼Ñ Ð¿Ñ€ÐµÐ´Ñ‹Ð´ÑƒÑ‰ÐµÐ³Ð¾ Ð¾Ð±Ð½Ð¾Ð²Ð»ÐµÐ½Ð¸Ñ Ð² Ñекундах. Ð’ завиÑимоÑти от типа иÑточника (Executable, HTTP, MySQL, PostgreSQL, ClickHouse, ODBC) к `update_field` будет применена ÑоответÑÑ‚Ð²ÑƒÑŽÑ‰Ð°Ñ Ð»Ð¾Ð³Ð¸ÐºÐ° перед запроÑом данных из внешнего иÑточника. 
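As a hedged illustration of the incremental-update mechanism described above, a DDL sketch that declares `update_field` on a ClickHouse source with a `hashed` layout; the dictionary name, source table, and `updated_at` column are hypothetical:

``` sql
CREATE DICTIONARY users_dict
(
    id UInt64,
    name String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(host 'localhost' port 9000 user 'default' password '' db 'default' table 'users' update_field 'updated_at'))
LAYOUT(HASHED())
LIFETIME(MIN 30 MAX 60);
```

On each refresh the dictionary is expected to request only rows whose `updated_at` is newer than the previous update time, following the source-specific logic mentioned above.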
diff --git a/docs/ru/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-sources.md b/docs/ru/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-sources.md index 58406e8ae14..d616960ce36 100644 --- a/docs/ru/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-sources.md +++ b/docs/ru/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-sources.md @@ -129,7 +129,7 @@ SOURCE(FILE(path './user_files/os.tsv' format 'TabSeparated')) ## ИÑполнÑемый пул {#dicts-external_dicts_dict_sources-executable_pool} -ИÑполнÑемый пул позволÑет загружать данные из пула процеÑÑов. Этот иÑточник не работает Ñо ÑловарÑми, которые требуют загрузки вÑех данных из иÑточника. ИÑполнÑемый пул работает ÑловарÑми, которые размещаютÑÑ [Ñледующими ÑпоÑобами](external-dicts-dict-layout.md#ways-to-store-dictionaries-in-memory): `cache`, `complex_key_cache`, `ssd_cache`, `complex_key_ssd_cache`, `direct`, `complex_key_direct`. +ИÑполнÑемый пул позволÑет загружать данные из пула процеÑÑов. Этот иÑточник не работает Ñо ÑловарÑми, которые требуют загрузки вÑех данных из иÑточника. ИÑполнÑемый пул работает ÑловарÑми, которые размещаютÑÑ [Ñледующими ÑпоÑобами](external-dicts-dict-layout.md#ways-to-store-dictionaries-in-memory): `cache`, `complex_key_cache`, `ssd_cache`, `complex_key_ssd_cache`, `direct`, `complex_key_direct`. ИÑполнÑемый пул генерирует пул процеÑÑов Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ указанной команды и оÑтавлÑет их активными, пока они не завершатÑÑ. Программа Ñчитывает данные из потока STDIN пока он доÑтупен и выводит результат в поток STDOUT, а затем ожидает Ñледующего блока данных из STDIN. ClickHouse не закрывает поток STDIN поÑле обработки блока данных и отправлÑет в него Ñледующую порцию данных, когда Ñто требуетÑÑ. ИÑполнÑемый Ñкрипт должен быть готов к такому ÑпоÑобу обработки данных — он должен заранее опрашивать STDIN и отправлÑÑ‚ÑŒ данные в STDOUT. @@ -581,6 +581,7 @@ SOURCE(MYSQL( default ids
id=10 + 1 ``` @@ -596,7 +597,8 @@ SOURCE(CLICKHOUSE( db 'default' table 'ids' where 'id=10' -)) + secure 1 +)); ``` ÐŸÐ¾Ð»Ñ Ð½Ð°Ñтройки: @@ -609,6 +611,7 @@ SOURCE(CLICKHOUSE( - `table` — Ð¸Ð¼Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ†Ñ‹. - `where` — уÑловие выбора. Может отÑутÑтвовать. - `invalidate_query` — Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð´Ð»Ñ Ð¿Ñ€Ð¾Ð²ÐµÑ€ÐºÐ¸ ÑтатуÑа ÑловарÑ. ÐеобÑзательный параметр. Читайте подробнее в разделе [Обновление Ñловарей](external-dicts-dict-lifetime.md). +- `secure` - флаг, разрешающий или не разрешающий защищённое SSL-Ñоединение. ### MongoDB {#dicts-external_dicts_dict_sources-mongodb} @@ -769,4 +772,3 @@ Setting fields: - `table` – Ð˜Ð¼Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ†Ñ‹. - `where` – УÑловие выборки. СинтакÑÐ¸Ñ Ð´Ð»Ñ ÑƒÑловий такой же как Ð´Ð»Ñ `WHERE` Ð²Ñ‹Ñ€Ð°Ð¶ÐµÐ½Ð¸Ñ Ð² PostgreSQL, Ð´Ð»Ñ Ð¿Ñ€Ð¸Ð¼ÐµÑ€Ð°, `id > 10 AND id < 20`. ÐеобÑзательный параметр. - `invalidate_query` – Ð—Ð°Ð¿Ñ€Ð¾Ñ Ð´Ð»Ñ Ð¿Ñ€Ð¾Ð²ÐµÑ€ÐºÐ¸ уÑÐ»Ð¾Ð²Ð¸Ñ Ð·Ð°Ð³Ñ€ÑƒÐ·ÐºÐ¸ ÑловарÑ. ÐеобÑзательный параметр. Читайте больше в разделе [Обновление Ñловарей](../../../sql-reference/dictionaries/external-dictionaries/external-dicts-dict-lifetime.md). - diff --git a/docs/ru/sql-reference/functions/bit-functions.md b/docs/ru/sql-reference/functions/bit-functions.md index a5124e67235..557288ae7c1 100644 --- a/docs/ru/sql-reference/functions/bit-functions.md +++ b/docs/ru/sql-reference/functions/bit-functions.md @@ -257,7 +257,7 @@ bitHammingDistance(int1, int2) **Возвращаемое значение** -- РаÑÑтоÑние Ð¥Ñмминга. +- РаÑÑтоÑние Ð¥Ñмминга. Тип: [UInt8](../../sql-reference/data-types/int-uint.md). diff --git a/docs/ru/sql-reference/functions/date-time-functions.md b/docs/ru/sql-reference/functions/date-time-functions.md index e7bd33bac45..282962b9e3f 100644 --- a/docs/ru/sql-reference/functions/date-time-functions.md +++ b/docs/ru/sql-reference/functions/date-time-functions.md @@ -340,7 +340,7 @@ toStartOfSecond(value[, timezone]) **Ðргументы** - `value` — дата и времÑ. [DateTime64](../data-types/datetime64.md). -- `timezone` — [чаÑовой поÑÑ](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-timezone) Ð´Ð»Ñ Ð²Ð¾Ð·Ð²Ñ€Ð°Ñ‰Ð°ÐµÐ¼Ð¾Ð³Ð¾ Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ (необÑзательно). ЕÑли параметр не задан, иÑпользуетÑÑ Ñ‡Ð°Ñовой поÑÑ Ð¿Ð°Ñ€Ð°Ð¼ÐµÑ‚Ñ€Ð° `value`. [String](../data-types/string.md). +- `timezone` — [чаÑовой поÑÑ](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-timezone) Ð´Ð»Ñ Ð²Ð¾Ð·Ð²Ñ€Ð°Ñ‰Ð°ÐµÐ¼Ð¾Ð³Ð¾ Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ (необÑзательно). ЕÑли параметр не задан, иÑпользуетÑÑ Ñ‡Ð°Ñовой поÑÑ Ð¿Ð°Ñ€Ð°Ð¼ÐµÑ‚Ñ€Ð° `value`. [String](../data-types/string.md). **Возвращаемое значение** @@ -558,13 +558,13 @@ SELECT toDate('2016-12-27') AS date, toYearWeek(date) AS yearWeek0, toYearWeek(d ОтÑекает от даты и времени чаÑти, меньшие чем ÑƒÐºÐ°Ð·Ð°Ð½Ð½Ð°Ñ Ñ‡Ð°ÑÑ‚ÑŒ. -**СинтакÑиÑ** +**СинтакÑиÑ** ``` sql date_trunc(unit, value[, timezone]) ``` -Синоним: `dateTrunc`. +Синоним: `dateTrunc`. **Ðргументы** @@ -648,7 +648,7 @@ date_add(unit, value, date) - `month` - `quarter` - `year` - + - `value` — значение интервала Ð´Ð»Ñ Ð´Ð¾Ð±Ð°Ð²Ð»ÐµÐ½Ð¸Ñ. [Int](../../sql-reference/data-types/int-uint.md). - `date` — дата или дата Ñо временем, к которой добавлÑетÑÑ `value`. [Date](../../sql-reference/data-types/date.md) или [DateTime](../../sql-reference/data-types/datetime.md). @@ -783,16 +783,16 @@ SELECT date_sub(YEAR, 3, toDate('2018-01-01')); ДобавлÑет интервал времени к указанной дате или дате Ñо временем. 
-**СинтакÑиÑ** +**СинтакÑиÑ** ``` sql timestamp_add(date, INTERVAL value unit) ``` -Синонимы: `timeStampAdd`, `TIMESTAMP_ADD`. +Синонимы: `timeStampAdd`, `TIMESTAMP_ADD`. **Ðргументы** - + - `date` — дата или дата Ñо временем. [Date](../../sql-reference/data-types/date.md) или [DateTime](../../sql-reference/data-types/datetime.md). - `value` — значение интервала Ð´Ð»Ñ Ð´Ð¾Ð±Ð°Ð²Ð»ÐµÐ½Ð¸Ñ. [Int](../../sql-reference/data-types/int-uint.md). - `unit` — единица Ð¸Ð·Ð¼ÐµÑ€ÐµÐ½Ð¸Ñ Ð²Ñ€ÐµÐ¼ÐµÐ½Ð¸, в которой задан интервал Ð´Ð»Ñ Ð´Ð¾Ð±Ð°Ð²Ð»ÐµÐ½Ð¸Ñ. [String](../../sql-reference/data-types/string.md). @@ -812,7 +812,7 @@ timestamp_add(date, INTERVAL value unit) Дата или дата Ñо временем, Ð¿Ð¾Ð»ÑƒÑ‡ÐµÐ½Ð½Ð°Ñ Ð² результате Ð´Ð¾Ð±Ð°Ð²Ð»ÐµÐ½Ð¸Ñ `value`, выраженного в `unit`, к `date`. Тип: [Date](../../sql-reference/data-types/date.md) или [DateTime](../../sql-reference/data-types/datetime.md). - + **Пример** ЗапроÑ: @@ -833,13 +833,13 @@ select timestamp_add(toDate('2018-01-01'), INTERVAL 3 MONTH); Вычитает интервал времени из указанной даты или даты Ñо временем. -**СинтакиÑ** +**СинтакиÑ** ``` sql timestamp_sub(unit, value, date) ``` -Синонимы: `timeStampSub`, `TIMESTAMP_SUB`. +Синонимы: `timeStampSub`, `TIMESTAMP_SUB`. **Ðргументы** @@ -854,8 +854,8 @@ timestamp_sub(unit, value, date) - `month` - `quarter` - `year` - -- `value` — значение интервала Ð´Ð»Ñ Ð²Ñ‹Ñ‡Ð¸Ñ‚Ð°Ð½Ð¸Ñ. [Int](../../sql-reference/data-types/int-uint.md). + +- `value` — значение интервала Ð´Ð»Ñ Ð²Ñ‹Ñ‡Ð¸Ñ‚Ð°Ð½Ð¸Ñ. [Int](../../sql-reference/data-types/int-uint.md). - `date` — дата или дата Ñо временем. [Date](../../sql-reference/data-types/date.md) или [DateTime](../../sql-reference/data-types/datetime.md). **Возвращаемое значение** @@ -882,9 +882,9 @@ select timestamp_sub(MONTH, 5, toDateTime('2018-12-18 01:02:03')); ## now {#now} -Возвращает текущую дату и времÑ. +Возвращает текущую дату и времÑ. -**СинтакÑиÑ** +**СинтакÑиÑ** ``` sql now([timezone]) @@ -1067,7 +1067,7 @@ SELECT dateName('year', date_value), dateName('month', date_value), dateName('da ## FROM\_UNIXTIME {#fromunixtime} -Ð¤ÑƒÐ½ÐºÑ†Ð¸Ñ Ð¿Ñ€ÐµÐ¾Ð±Ñ€Ð°Ð·ÑƒÐµÑ‚ Unix timestamp в календарную дату и времÑ. +Ð¤ÑƒÐ½ÐºÑ†Ð¸Ñ Ð¿Ñ€ÐµÐ¾Ð±Ñ€Ð°Ð·ÑƒÐµÑ‚ Unix timestamp в календарную дату и времÑ. **Примеры** diff --git a/docs/ru/sql-reference/functions/encryption-functions.md b/docs/ru/sql-reference/functions/encryption-functions.md index 44957fde152..d9bcbaf0887 100644 --- a/docs/ru/sql-reference/functions/encryption-functions.md +++ b/docs/ru/sql-reference/functions/encryption-functions.md @@ -9,7 +9,7 @@ toc_title: "Функции Ð´Ð»Ñ ÑˆÐ¸Ñ„Ñ€Ð¾Ð²Ð°Ð½Ð¸Ñ" Длина ключа завиÑит от режима шифрованиÑ. Он может быть длинной в 16, 24 и 32 байта Ð´Ð»Ñ Ñ€ÐµÐ¶Ð¸Ð¼Ð¾Ð² ÑˆÐ¸Ñ„Ñ€Ð¾Ð²Ð°Ð½Ð¸Ñ `-128-`, `-196-` и `-256-` ÑоответÑтвенно. -Длина инициализирующего вектора вÑегда 16 байт (лишние байты игнорируютÑÑ). +Длина инициализирующего вектора вÑегда 16 байт (лишние байты игнорируютÑÑ). Обратите внимание, что до верÑии Clickhouse 21.1 Ñти функции работали медленно. 
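To make the key/IV length rules above concrete, here is a small round trip with an `aes-256-*` mode, which expects a 32-byte key and uses 16 bytes of the IV; the literals are placeholders, not values to reuse:

``` sql
WITH
    '01234567890123456789012345678901' AS key, -- 32 bytes for the -256- modes
    'iviviviviviviviv' AS iv                   -- 16 bytes
SELECT decrypt('aes-256-cfb128', encrypt('aes-256-cfb128', 'Secret', key, iv), key, iv) AS roundtrip;
```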
@@ -58,7 +58,7 @@ CREATE TABLE encryption_test ENGINE = Memory; ``` -Ð’Ñтавим некоторые данные (замечание: не храните ключи или инициализирующие векторы в базе данных, так как Ñто компрометирует вÑÑŽ концепцию шифрованиÑ), также хранение "подÑказок" небезопаÑно и иÑпользуетÑÑ Ñ‚Ð¾Ð»ÑŒÐºÐ¾ Ð´Ð»Ñ Ð½Ð°Ð³Ð»ÑдноÑти: +Ð’Ñтавим некоторые данные (замечание: не храните ключи или инициализирующие векторы в базе данных, так как Ñто компрометирует вÑÑŽ концепцию шифрованиÑ), также хранение "подÑказок" небезопаÑно и иÑпользуетÑÑ Ñ‚Ð¾Ð»ÑŒÐºÐ¾ Ð´Ð»Ñ Ð½Ð°Ð³Ð»ÑдноÑти: ЗапроÑ: @@ -168,7 +168,7 @@ SELECT encrypt('aes-256-cfb128', 'Secret', '123456789101213141516171819202122', ``` text Received exception from server (version 21.1.2): -Code: 36. DB::Exception: Received from localhost:9000. DB::Exception: Invalid key size: 33 expected 32: While processing encrypt('aes-256-cfb128', 'Secret', '123456789101213141516171819202122', 'iviviviviviviviv123'). +Code: 36. DB::Exception: Received from localhost:9000. DB::Exception: Invalid key size: 33 expected 32: While processing encrypt('aes-256-cfb128', 'Secret', '123456789101213141516171819202122', 'iviviviviviviviv123'). ``` Однако Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ `aes_encrypt_mysql` в аналогичном Ñлучае возвращает результат, который может быть обработан MySQL: @@ -297,7 +297,7 @@ SELECT comment, decrypt('aes-256-cfb128', secret, '12345678910121314151617181920 ## aes_decrypt_mysql {#aes_decrypt_mysql} -СовмеÑтима Ñ ÑˆÐ¸Ñ„Ñ€Ð¾Ð²Ð°Ð½Ð¸ÐµÐ¼ myqsl и может раÑшифровать данные, зашифрованные функцией [AES_ENCRYPT](https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_aes-encrypt). +СовмеÑтима Ñ ÑˆÐ¸Ñ„Ñ€Ð¾Ð²Ð°Ð½Ð¸ÐµÐ¼ myqsl и может раÑшифровать данные, зашифрованные функцией [AES_ENCRYPT](https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_aes-encrypt). При одинаковых входÑщих значениÑÑ… раÑшифрованный текÑÑ‚ будет Ñовпадать Ñ Ñ€ÐµÐ·ÑƒÐ»ÑŒÑ‚Ð°Ñ‚Ð¾Ð¼, возвращаемым функцией `decrypt`. Однако еÑли `key` или `iv` длиннее, чем должны быть, `aes_decrypt_mysql` будет работать аналогично функции `aes_decrypt` в MySQL: Ñвернет ключ и проигнорирует лишнюю чаÑÑ‚ÑŒ `iv`. diff --git a/docs/ru/sql-reference/functions/ext-dict-functions.md b/docs/ru/sql-reference/functions/ext-dict-functions.md index 612477dc806..0e234f1d84e 100644 --- a/docs/ru/sql-reference/functions/ext-dict-functions.md +++ b/docs/ru/sql-reference/functions/ext-dict-functions.md @@ -23,8 +23,8 @@ dictGetOrNull('dict_name', attr_name, id_expr) **Ðргументы** - `dict_name` — Ð¸Ð¼Ñ ÑловарÑ. [Строковый литерал](../syntax.md#syntax-string-literal). -- `attr_names` — Ð¸Ð¼Ñ Ñтолбца ÑловарÑ, [Строковый литерал](../syntax.md#syntax-string-literal), или кортеж [Tuple](../../sql-reference/data-types/tuple.md) таких имен. -- `id_expr` — значение ключа ÑловарÑ. [Выражение](../syntax.md#syntax-expressions), возвращающее значение типа [UInt64](../../sql-reference/functions/ext-dict-functions.md) или [Tuple](../../sql-reference/functions/ext-dict-functions.md), в завиÑимоÑти от конфигурации ÑловарÑ. +- `attr_names` — Ð¸Ð¼Ñ Ñтолбца ÑловарÑ. [Строковый литерал](../syntax.md#syntax-string-literal), или кортеж [Tuple](../../sql-reference/data-types/tuple.md) таких имен. +- `id_expr` — значение ключа ÑловарÑ. [Expression](../../sql-reference/syntax.md#syntax-expressions) возвращает пару "ключ-значение" ÑÐ»Ð¾Ð²Ð°Ñ€Ñ Ð¸Ð»Ð¸ [Tuple](../../sql-reference/functions/ext-dict-functions.md), в завиÑимоÑти от конфигурации ÑловарÑ. 
- `default_value_expr` — значение, возвращаемое в том Ñлучае, когда Ñловарь не Ñодержит Ñтроки Ñ Ð·Ð°Ð´Ð°Ð½Ð½Ñ‹Ð¼ ключом `id_expr`. [Выражение](../syntax.md#syntax-expressions), возвращающее значение Ñ Ñ‚Ð¸Ð¿Ð¾Ð¼ данных, Ñконфигурированным Ð´Ð»Ñ Ð°Ñ‚Ñ€Ð¸Ð±ÑƒÑ‚Ð° `attr_names`, или кортеж [Tuple](../../sql-reference/data-types/tuple.md) таких выражений. **Возвращаемое значение** @@ -87,7 +87,7 @@ SELECT dictGetOrDefault('ext-dict-test', 'c1', number + 1, toUInt32(number * 10)) AS val, toTypeName(val) AS type FROM system.numbers -LIMIT 3 +LIMIT 3; ``` ``` text @@ -138,7 +138,7 @@ LIMIT 3 c2 String - + 0 @@ -237,7 +237,7 @@ dictHas('dict_name', id) **Ðргументы** - `dict_name` — Ð¸Ð¼Ñ ÑловарÑ. [Строковый литерал](../syntax.md#syntax-string-literal). -- `id_expr` — значение ключа ÑловарÑ. [Выражение](../syntax.md#syntax-expressions), возвращающее значение типа [UInt64](../../sql-reference/functions/ext-dict-functions.md) или [Tuple](../../sql-reference/functions/ext-dict-functions.md) в завиÑимоÑти от конфигурации ÑловарÑ. +- `id_expr` — значение ключа ÑловарÑ. [Expression](../../sql-reference/syntax.md#syntax-expressions) возвращает пару "ключ-значение" ÑÐ»Ð¾Ð²Ð°Ñ€Ñ Ð¸Ð»Ð¸ [Tuple](../../sql-reference/functions/ext-dict-functions.md) в завиÑимоÑти от конфигурации ÑловарÑ. **Возвращаемое значение** @@ -290,16 +290,16 @@ Type: [Array](../../sql-reference/data-types/array.md)([UInt64](../../sql-refere Возвращает потомков первого ÑƒÑ€Ð¾Ð²Ð½Ñ Ð² виде маÑÑива индекÑов. Это обратное преобразование Ð´Ð»Ñ [dictGetHierarchy](#dictgethierarchy). -**СинтакÑиÑ** +**СинтакÑиÑ** ``` sql dictGetChildren(dict_name, key) ``` -**Ðргументы** +**Ðргументы** -- `dict_name` — Ð¸Ð¼Ñ ÑловарÑ. [String literal](../../sql-reference/syntax.md#syntax-string-literal). -- `key` — значение ключа. [Выражение](../syntax.md#syntax-expressions), возвращающее значение типа [UInt64](../../sql-reference/functions/ext-dict-functions.md). +- `dict_name` — Ð¸Ð¼Ñ ÑловарÑ. [String literal](../../sql-reference/syntax.md#syntax-string-literal). +- `key` — значение ключа. [Выражение](../syntax.md#syntax-expressions), возвращающее значение типа [UInt64](../../sql-reference/functions/ext-dict-functions.md). **Возвращаемые значениÑ** @@ -337,7 +337,7 @@ SELECT dictGetChildren('hierarchy_flat_dictionary', number) FROM system.numbers ## dictGetDescendant {#dictgetdescendant} -Возвращает вÑех потомков, как еÑли бы Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ [dictGetChildren](#dictgetchildren) была выполнена `level` раз рекурÑивно. +Возвращает вÑех потомков, как еÑли бы Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ [dictGetChildren](#dictgetchildren) была выполнена `level` раз рекурÑивно. **СинтакÑиÑ** @@ -345,9 +345,9 @@ SELECT dictGetChildren('hierarchy_flat_dictionary', number) FROM system.numbers dictGetDescendants(dict_name, key, level) ``` -**Ðргументы** +**Ðргументы** -- `dict_name` — Ð¸Ð¼Ñ ÑловарÑ. [String literal](../../sql-reference/syntax.md#syntax-string-literal). +- `dict_name` — Ð¸Ð¼Ñ ÑловарÑ. [String literal](../../sql-reference/syntax.md#syntax-string-literal). - `key` — значение ключа. [Выражение](../syntax.md#syntax-expressions), возвращающее значение типа [UInt64](../../sql-reference/functions/ext-dict-functions.md). - `level` — уровень иерархии. ЕÑли `level = 0`, возвращаютÑÑ Ð²Ñе потомки. [UInt8](../../sql-reference/data-types/int-uint.md). 
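Continuing the `hierarchy_flat_dictionary` used in the `dictGetChildren` example above (the dictionary is assumed to exist as configured there), a sketch of `dictGetDescendants` restricted to the first hierarchy level:

``` sql
SELECT dictGetDescendants('hierarchy_flat_dictionary', number, 1) FROM system.numbers LIMIT 4;
```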
diff --git a/docs/ru/sql-reference/functions/geo/geohash.md b/docs/ru/sql-reference/functions/geo/geohash.md index 0992b620e60..73e739893ca 100644 --- a/docs/ru/sql-reference/functions/geo/geohash.md +++ b/docs/ru/sql-reference/functions/geo/geohash.md @@ -4,7 +4,7 @@ toc_title: "Функции Ð´Ð»Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ñ‹ Ñ ÑиÑтемой Geohash" # Функции Ð´Ð»Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ñ‹ Ñ ÑиÑтемой Geohash {#geohash} -[Geohash](https://en.wikipedia.org/wiki/Geohash) — Ñто ÑиÑтема геокодированиÑ, ÐºÐ¾Ñ‚Ð¾Ñ€Ð°Ñ Ð´ÐµÐ»Ð¸Ñ‚ поверхноÑÑ‚ÑŒ Земли на учаÑтки в виде "решетки", и каждую Ñчейку решетки кодирует в виде Ñтроки из букв и цифр. СиÑтема поддерживает иерархию (вложенноÑÑ‚ÑŒ) Ñчеек, поÑтому чем точнее определена геопозициÑ, тем длиннее Ñтрока Ñ ÐºÐ¾Ð´Ð¾Ð¼ ÑоответÑтвующей Ñчейки. +[Geohash](https://en.wikipedia.org/wiki/Geohash) — Ñто ÑиÑтема геокодированиÑ, ÐºÐ¾Ñ‚Ð¾Ñ€Ð°Ñ Ð´ÐµÐ»Ð¸Ñ‚ поверхноÑÑ‚ÑŒ Земли на учаÑтки в виде "решетки", и каждую Ñчейку решетки кодирует в виде Ñтроки из букв и цифр. СиÑтема поддерживает иерархию (вложенноÑÑ‚ÑŒ) Ñчеек, поÑтому чем точнее определена геопозициÑ, тем длиннее Ñтрока Ñ ÐºÐ¾Ð´Ð¾Ð¼ ÑоответÑтвующей Ñчейки. Ð”Ð»Ñ Ñ€ÑƒÑ‡Ð½Ð¾Ð³Ð¾ Ð¿Ñ€ÐµÐ¾Ð±Ñ€Ð°Ð·Ð¾Ð²Ð°Ð½Ð¸Ñ Ð³ÐµÐ¾Ð³Ñ€Ð°Ñ„Ð¸Ñ‡ÐµÑких координат в Ñтроку geohash можно иÑпользовать Ñайт [geohash.org](http://geohash.org/). @@ -89,7 +89,7 @@ geohashesInBox(longitude_min, latitude_min, longitude_max, latitude_max, precisi **Возвращаемые значениÑ** -- МаÑÑив Ñтрок, опиÑывающих учаÑтки, покрывающие заданный учаÑток. Длина каждой Ñтроки ÑоответÑтвует точноÑти geohash. ПорÑдок Ñтрок — произвольный. +- МаÑÑив Ñтрок, опиÑывающих учаÑтки, покрывающие заданный учаÑток. Длина каждой Ñтроки ÑоответÑтвует точноÑти geohash. ПорÑдок Ñтрок — произвольный. - \[\] - ЕÑли переданные минимальные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ ÑˆÐ¸Ñ€Ð¾Ñ‚Ñ‹ и долготы больше ÑоответÑтвующих макÑимальных значений, Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð²Ð¾Ð·Ð²Ñ€Ð°Ñ‰Ð°ÐµÑ‚ пуÑтой маÑÑив. Тип данных: [Array](../../../sql-reference/data-types/array.md)([String](../../../sql-reference/data-types/string.md)). diff --git a/docs/ru/sql-reference/functions/geo/h3.md b/docs/ru/sql-reference/functions/geo/h3.md index 27a512a9931..088814c4d7d 100644 --- a/docs/ru/sql-reference/functions/geo/h3.md +++ b/docs/ru/sql-reference/functions/geo/h3.md @@ -4,9 +4,9 @@ toc_title: "Функции Ð´Ð»Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ñ‹ Ñ Ð¸Ð½Ð´ÐµÐºÑами H3" # Функции Ð´Ð»Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ñ‹ Ñ Ð¸Ð½Ð´ÐµÐºÑами H3 {#h3index} -[H3](https://eng.uber.com/h3/) — Ñто ÑиÑтема геокодированиÑ, ÐºÐ¾Ñ‚Ð¾Ñ€Ð°Ñ Ð´ÐµÐ»Ð¸Ñ‚ поверхноÑÑ‚ÑŒ Земли на равные шеÑтигранные Ñчейки. СиÑтема поддерживает иерархию (вложенноÑÑ‚ÑŒ) Ñчеек, Ñ‚.е. каждый "родительÑкий" шеÑтигранник может быть поделен на Ñемь одинаковых вложенных "дочерних" шеÑтигранников, и так далее. +[H3](https://eng.uber.com/h3/) — Ñто ÑиÑтема геокодированиÑ, ÐºÐ¾Ñ‚Ð¾Ñ€Ð°Ñ Ð´ÐµÐ»Ð¸Ñ‚ поверхноÑÑ‚ÑŒ Земли на равные шеÑтигранные Ñчейки. СиÑтема поддерживает иерархию (вложенноÑÑ‚ÑŒ) Ñчеек, Ñ‚.е. каждый "родительÑкий" шеÑтигранник может быть поделен на Ñемь одинаковых вложенных "дочерних" шеÑтигранников, и так далее. -Уровень вложенноÑти назваетÑÑ `разрешением` и может принимать значение от `0` до `15`, где `0` ÑоответÑтвует `базовым` Ñчейкам Ñамого верхнего ÑƒÑ€Ð¾Ð²Ð½Ñ (наиболее крупным). +Уровень вложенноÑти назваетÑÑ `разрешением` и может принимать значение от `0` до `15`, где `0` ÑоответÑтвует `базовым` Ñчейкам Ñамого верхнего ÑƒÑ€Ð¾Ð²Ð½Ñ (наиболее крупным). Ð”Ð»Ñ ÐºÐ°Ð¶Ð´Ð¾Ð¹ точки, имеющей широту и долготу, можно получить 64-битный Ð¸Ð½Ð´ÐµÐºÑ H3, ÑоответÑтвующий номеру шеÑтигранной Ñчейки, где Ñта точка находитÑÑ. 
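For context, the 64-bit index mentioned above can be obtained from coordinates with `geoToH3(lon, lat, resolution)`; the coordinates below are arbitrary sample values:

``` sql
SELECT geoToH3(37.79506683, 55.71290588, 15) AS h3Index;
```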
@@ -65,7 +65,7 @@ h3GetResolution(h3index) **Возвращаемые значениÑ** -- Разрешение Ñетки. Диапазон значений: `[0, 15]`. +- Разрешение Ñетки. Диапазон значений: `[0, 15]`. - Ð”Ð»Ñ Ð½ÐµÑущеÑтвующего идентификатора может быть возвращено произвольное значение. ИÑпользуйте [h3IsValid](#h3isvalid) Ð´Ð»Ñ Ð¿Ñ€Ð¾Ð²ÐµÑ€ÐºÐ¸ идентификаторов. Тип: [UInt8](../../../sql-reference/data-types/int-uint.md). @@ -252,7 +252,7 @@ h3GetBaseCell(index) **Возвращаемое значение** -- Ð˜Ð½Ð´ÐµÐºÑ Ð±Ð°Ð·Ð¾Ð²Ð¾Ð¹ шеÑтиугольной Ñчейки. +- Ð˜Ð½Ð´ÐµÐºÑ Ð±Ð°Ð·Ð¾Ð²Ð¾Ð¹ шеÑтиугольной Ñчейки. Тип: [UInt8](../../../sql-reference/data-types/int-uint.md). @@ -274,7 +274,7 @@ SELECT h3GetBaseCell(612916788725809151) as basecell; ## h3HexAreaM2 {#h3hexaream2} -ОпределÑет Ñреднюю площадь шеÑтиугольной [H3](#h3index)-Ñчейки заданного Ñ€Ð°Ð·Ñ€ÐµÑˆÐµÐ½Ð¸Ñ Ð² квадратных метрах. +ОпределÑет Ñреднюю площадь шеÑтиугольной [H3](#h3index)-Ñчейки заданного Ñ€Ð°Ð·Ñ€ÐµÑˆÐµÐ½Ð¸Ñ Ð² квадратных метрах. **СинтакÑиÑ** @@ -324,7 +324,7 @@ h3IndexesAreNeighbors(index1, index2) **Возвращаемое значение** - `1` — Ñчейки ÑвлÑÑŽÑ‚ÑÑ ÑоÑедÑми. -- `0` — Ñчейки не ÑвлÑÑŽÑ‚ÑÑ ÑоÑедÑми. +- `0` — Ñчейки не ÑвлÑÑŽÑ‚ÑÑ ÑоÑедÑми. Тип: [UInt8](../../../sql-reference/data-types/int-uint.md). @@ -361,7 +361,7 @@ h3ToChildren(index, resolution) **Возвращаемое значение** -- МаÑÑив дочерних H3-Ñчеек. +- МаÑÑив дочерних H3-Ñчеек. Тип: [Array](../../../sql-reference/data-types/array.md)([UInt64](../../../sql-reference/data-types/int-uint.md)). @@ -398,7 +398,7 @@ h3ToParent(index, resolution) **Возвращаемое значение** -- Ð˜Ð½Ð´ÐµÐºÑ Ñ€Ð¾Ð´Ð¸Ñ‚ÐµÐ»ÑŒÑкой H3-Ñчейки. +- Ð˜Ð½Ð´ÐµÐºÑ Ñ€Ð¾Ð´Ð¸Ñ‚ÐµÐ»ÑŒÑкой H3-Ñчейки. Тип: [UInt64](../../../sql-reference/data-types/int-uint.md). @@ -432,7 +432,7 @@ h3ToString(index) **Возвращаемое значение** -- Строковое предÑтавление H3-индекÑа. +- Строковое предÑтавление H3-индекÑа. Тип: [String](../../../sql-reference/data-types/string.md). @@ -468,8 +468,8 @@ stringToH3(index_str) **Возвращаемое значение** -- ЧиÑловое предÑтавление индекÑа шеÑтиугольной Ñчейки. -- `0`, еÑли при преобразовании возникла ошибка. +- ЧиÑловое предÑтавление индекÑа шеÑтиугольной Ñчейки. +- `0`, еÑли при преобразовании возникла ошибка. Тип: [UInt64](../../../sql-reference/data-types/int-uint.md). @@ -505,7 +505,7 @@ h3GetResolution(index) **Возвращаемое значение** -- Разрешение Ñчейки. Диапазон: `[0, 15]`. +- Разрешение Ñчейки. Диапазон: `[0, 15]`. Тип: [UInt8](../../../sql-reference/data-types/int-uint.md). diff --git a/docs/ru/sql-reference/functions/index.md b/docs/ru/sql-reference/functions/index.md index 1eefd4d9f73..15da9d36ef5 100644 --- a/docs/ru/sql-reference/functions/index.md +++ b/docs/ru/sql-reference/functions/index.md @@ -48,9 +48,9 @@ toc_title: "Введение" Функции выÑшего порÑдка, в качеÑтве Ñвоего функционального аргумента могут принимать только лÑмбда-функции. Чтобы передать лÑмбда-функцию в функцию выÑшего порÑдка, иÑпользуйте оператор `->`. Слева от Ñтрелочки Ñтоит формальный параметр — произвольный идентификатор, или неÑколько формальных параметров — произвольные идентификаторы в кортеже. Справа от Ñтрелочки Ñтоит выражение, в котором могут иÑпользоватьÑÑ Ñти формальные параметры, а также любые Ñтолбцы таблицы. 
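A minimal illustration of passing a lambda to a higher-order function (here `arrayMap`), complementing the syntax examples that follow:

``` sql
SELECT arrayMap(x -> 2 * x, [1, 2, 3]) AS doubled; -- returns [2, 4, 6]
```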
-Примеры: +Примеры: ``` -x -> 2 * x +x -> 2 * x str -> str != Referer ``` diff --git a/docs/ru/sql-reference/functions/ip-address-functions.md b/docs/ru/sql-reference/functions/ip-address-functions.md index b02d45d7667..92ea368828e 100644 --- a/docs/ru/sql-reference/functions/ip-address-functions.md +++ b/docs/ru/sql-reference/functions/ip-address-functions.md @@ -53,7 +53,7 @@ LIMIT 10 ### IPv6NumToString(x) {#ipv6numtostringx} Принимает значение типа FixedString(16), Ñодержащее IPv6-Ð°Ð´Ñ€ÐµÑ Ð² бинарном виде. Возвращает Ñтроку, Ñодержащую Ñтот Ð°Ð´Ñ€ÐµÑ Ð² текÑтовом виде. -IPv6-mapped IPv4 адреÑа выводитÑÑ Ð² формате ::ffff:111.222.33.44. +IPv6-mapped IPv4 адреÑа выводитÑÑ Ð² формате ::ffff:111.222.33.44. Примеры: `INET6_NTOA`. @@ -137,7 +137,7 @@ HEX может быть в любом региÑтре. IPv6StringToNum(string) ``` -**Ðргумент** +**Ðргумент** - `string` — IP адреÑ. [String](../../sql-reference/data-types/string.md). @@ -281,7 +281,7 @@ toIPv6(string) **Возвращаемое значение** -- IP адреÑ. +- IP адреÑ. Тип: [IPv6](../../sql-reference/data-types/domains/ipv6.md). diff --git a/docs/ru/sql-reference/functions/json-functions.md b/docs/ru/sql-reference/functions/json-functions.md index b935244e821..d20d8cf5998 100644 --- a/docs/ru/sql-reference/functions/json-functions.md +++ b/docs/ru/sql-reference/functions/json-functions.md @@ -327,7 +327,7 @@ toJSONString(value) **Возвращаемое значение** -- JSON предÑтавление значениÑ. +- JSON предÑтавление значениÑ. Тип: [String](../../sql-reference/data-types/string.md). diff --git a/docs/ru/sql-reference/functions/logical-functions.md b/docs/ru/sql-reference/functions/logical-functions.md index f4dee477ee0..837541ec58f 100644 --- a/docs/ru/sql-reference/functions/logical-functions.md +++ b/docs/ru/sql-reference/functions/logical-functions.md @@ -21,7 +21,7 @@ and(val1, val2...) **Ðргументы** -- `val1, val2, ...` — ÑпиÑок из как минимум двух значений. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) или [Nullable](../../sql-reference/data-types/nullable.md). +- `val1, val2, ...` — ÑпиÑок из как минимум двух значений. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) или [Nullable](../../sql-reference/data-types/nullable.md). **Возвращаемое значение** @@ -73,7 +73,7 @@ and(val1, val2...) **Ðргументы** -- `val1, val2, ...` — ÑпиÑок из как минимум двух значений. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) или [Nullable](../../sql-reference/data-types/nullable.md). +- `val1, val2, ...` — ÑпиÑок из как минимум двух значений. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) или [Nullable](../../sql-reference/data-types/nullable.md). **Returned value** @@ -125,7 +125,7 @@ not(val); **Ðргументы** -- `val` — значение. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) или [Nullable](../../sql-reference/data-types/nullable.md). +- `val` — значение. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) или [Nullable](../../sql-reference/data-types/nullable.md). 
**Возвращаемое значение** @@ -163,7 +163,7 @@ xor(val1, val2...) **Ðргументы** -- `val1, val2, ...` — ÑпиÑок из как минимум двух значений. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) или [Nullable](../../sql-reference/data-types/nullable.md). +- `val1, val2, ...` — ÑпиÑок из как минимум двух значений. [Int](../../sql-reference/data-types/int-uint.md), [UInt](../../sql-reference/data-types/int-uint.md), [Float](../../sql-reference/data-types/float.md) или [Nullable](../../sql-reference/data-types/nullable.md). **Returned value** diff --git a/docs/ru/sql-reference/functions/machine-learning-functions.md b/docs/ru/sql-reference/functions/machine-learning-functions.md index a1716eed6c2..ce7d3cfd09e 100644 --- a/docs/ru/sql-reference/functions/machine-learning-functions.md +++ b/docs/ru/sql-reference/functions/machine-learning-functions.md @@ -21,13 +21,13 @@ toc_title: "Функции машинного обучениÑ" Сравнивает теÑтовые группы (варианты) и Ð´Ð»Ñ ÐºÐ°Ð¶Ð´Ð¾Ð¹ группы раÑÑчитывает вероÑтноÑÑ‚ÑŒ того, что Ñта группа окажетÑÑ Ð»ÑƒÑ‡ÑˆÐµÐ¹. ÐŸÐµÑ€Ð²Ð°Ñ Ð¸Ð· перечиÑленных групп ÑчитаетÑÑ ÐºÐ¾Ð½Ñ‚Ñ€Ð¾Ð»ÑŒÐ½Ð¾Ð¹. -**СинтакÑиÑ** +**СинтакÑиÑ** ``` sql bayesAB(distribution_name, higher_is_better, variant_names, x, y) ``` -**Ðргументы** +**Ðргументы** - `distribution_name` — вероÑтноÑтное раÑпределение. [String](../../sql-reference/data-types/string.md). Возможные значениÑ: diff --git a/docs/ru/sql-reference/functions/nlp-functions.md b/docs/ru/sql-reference/functions/nlp-functions.md new file mode 100644 index 00000000000..58c4eb86e35 --- /dev/null +++ b/docs/ru/sql-reference/functions/nlp-functions.md @@ -0,0 +1,132 @@ +--- +toc_priority: 67 +toc_title: NLP +--- + +# [ÑкÑпериментально] Функции Ð´Ð»Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ñ‹ Ñ ÐµÑтвеÑтвенным Ñзыком {#nlp-functions} + +!!! warning "Предупреждение" + Ð¡ÐµÐ¹Ñ‡Ð°Ñ Ð¸Ñпользование функций Ð´Ð»Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ñ‹ Ñ ÐµÑтвеÑтвенным Ñзыком ÑвлÑетÑÑ ÑкÑпериментальной возможноÑтью. Чтобы иÑпользовать данные функции, включите наÑтройку `allow_experimental_nlp_functions = 1`. + +## stem {#stem} + +Ð”Ð°Ð½Ð½Ð°Ñ Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð¿Ñ€Ð¾Ð²Ð¾Ð´Ð¸Ñ‚ Ñтемминг заданного Ñлова. + +**СинтакÑиÑ** + +``` sql +stem('language', word) +``` + +**Ðргументы** + +- `language` — Язык, правила которого будут применены Ð´Ð»Ñ Ñтемминга. ДопуÑкаетÑÑ Ñ‚Ð¾Ð»ÑŒÐºÐ¾ нижний региÑÑ‚Ñ€. [String](../../sql-reference/data-types/string.md#string). +- `word` — Слово подлежащее Ñтеммингу. ДопуÑкаетÑÑ Ñ‚Ð¾Ð»ÑŒÐºÐ¾ нижний региÑÑ‚Ñ€. [String](../../sql-reference/data-types/string.md#string). + +**Examples** + +Query: + +``` sql +SELECT SELECT arrayMap(x -> stem('en', x), ['I', 'think', 'it', 'is', 'a', 'blessing', 'in', 'disguise']) as res; +``` + +Result: + +``` text +┌─res────────────────────────────────────────────────┠+│ ['I','think','it','is','a','bless','in','disguis'] │ +└────────────────────────────────────────────────────┘ +``` + +## lemmatize {#lemmatize} + +Ð”Ð°Ð½Ð½Ð°Ñ Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð¿Ñ€Ð¾Ð²Ð¾Ð´Ð¸Ñ‚ лемматизацию Ð´Ð»Ñ Ð·Ð°Ð´Ð°Ð½Ð½Ð¾Ð³Ð¾ Ñлова. Ð”Ð»Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ñ‹ лемматизатора необходимы Ñловари, которые можно найти [здеÑÑŒ](https://github.com/vpodpecan/lemmagen3/tree/master/src/lemmagen3/models). + +**СинтакÑиÑ** + +``` sql +lemmatize('language', word) +``` + +**Ðргументы** + +- `language` — Язык, правила которого будут применены Ð´Ð»Ñ Ð»ÐµÐ¼Ð¼Ð°Ñ‚Ð¸Ð·Ð°Ñ†Ð¸Ð¸. [String](../../sql-reference/data-types/string.md#string). +- `word` — Слово, подлежащее лемматизации. ДопуÑкаетÑÑ Ñ‚Ð¾Ð»ÑŒÐºÐ¾ нижний региÑÑ‚Ñ€. 
[String](../../sql-reference/data-types/string.md#string). + +**Примеры** + +ЗапроÑ: + +``` sql +SELECT lemmatize('en', 'wolves'); +``` + +Результат: + +``` text +┌─lemmatize("wolves")─┠+│ "wolf" │ +└─────────────────────┘ +``` + +КонфигурациÑ: +``` xml + + + en + en.bin + + +``` + +## synonyms {#synonyms} + +Ðаходит Ñинонимы к заданному Ñлову. ПредÑтавлены два типа раÑширений Ñловарей: `plain` и `wordnet`. + +Ð”Ð»Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ñ‹ раÑÑˆÐ¸Ñ€ÐµÐ½Ð¸Ñ Ñ‚Ð¸Ð¿Ð° `plain` необходимо указать путь до проÑтого текÑтового файла, где ÐºÐ°Ð¶Ð´Ð°Ñ Ñтрока ÑоотвеÑтвует одному набору Ñинонимов. Слова в данной Ñтроке должны быть разделены Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ пробела или знака табулÑции. + +Ð”Ð»Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ñ‹ раÑÑˆÐ¸Ñ€ÐµÐ½Ð¸Ñ Ñ‚Ð¸Ð¿Ð° `plain` необходимо указать путь до WordNet тезауруÑа. Ð¢ÐµÐ·Ð°ÑƒÑ€ÑƒÑ Ð´Ð¾Ð»Ð¶ÐµÐ½ Ñодержать WordNet sense index. + +**СинтакÑиÑ** + +``` sql +synonyms('extension_name', word) +``` + +**Ðргументы** + +- `extension_name` — Ðазвание раÑширениÑ, в котором будет проводитьÑÑ Ð¿Ð¾Ð¸Ñк. [String](../../sql-reference/data-types/string.md#string). +- `word` — Слово, которое будет иÑкатьÑÑ Ð² раÑширении. [String](../../sql-reference/data-types/string.md#string). + +**Примеры** + +ЗапроÑ: + +``` sql +SELECT synonyms('list', 'important'); +``` + +Результат: + +``` text +┌─synonyms('list', 'important')────────────┠+│ ['important','big','critical','crucial'] │ +└──────────────────────────────────────────┘ +``` + +КонфигурациÑ: +``` xml + + + en + plain + en.txt + + + en + wordnet + en/ + + +``` \ No newline at end of file diff --git a/docs/ru/sql-reference/functions/other-functions.md b/docs/ru/sql-reference/functions/other-functions.md index a07bd19faa1..0e23c2f743f 100644 --- a/docs/ru/sql-reference/functions/other-functions.md +++ b/docs/ru/sql-reference/functions/other-functions.md @@ -2088,3 +2088,52 @@ SELECT tcpPort(); - [tcp_port](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-tcp_port) +## currentProfiles {#current-profiles} + +Возвращает ÑпиÑок [профилей наÑтроек](../../operations/access-rights.md#settings-profiles-management) Ð´Ð»Ñ Ñ‚ÐµÐºÑƒÑ‰ÐµÐ³Ð¾ пользователÑ. + +Ð”Ð»Ñ Ð¸Ð·Ð¼ÐµÐ½ÐµÐ½Ð¸Ñ Ñ‚ÐµÐºÑƒÑ‰ÐµÐ³Ð¾ Ð¿Ñ€Ð¾Ñ„Ð¸Ð»Ñ Ð½Ð°Ñтроек может быть иÑпользована команда [SET PROFILE](../../sql-reference/statements/set.md#set-statement#query-set). ЕÑли команда `SET PROFILE` не применÑлаÑÑŒ, Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð²Ð¾Ð·Ð²Ñ€Ð°Ñ‰Ð°ÐµÑ‚ профили, указанные при определении текущего Ð¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ñ‚ÐµÐ»Ñ (Ñм. [CREATE USER](../../sql-reference/statements/create/user.md#create-user-statement)). + +**СинтакÑиÑ** + +``` sql +currentProfiles() +``` + +**Возвращаемое значение** + +- СпиÑок профилей наÑтроек Ð´Ð»Ñ Ñ‚ÐµÐºÑƒÑ‰ÐµÐ³Ð¾ пользователÑ. + +Тип: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)). + +## enabledProfiles {#enabled-profiles} + +Возвращает профили наÑтроек, назначенные пользователю как Ñвно, так и неÑвно. Явно назначенные профили — Ñто те же профили, которые возвращает Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ [currentProfiles](#current-profiles). ÐеÑвно назначенные профили включают родительÑкие профили других назначенных профилей; профили, назначенные Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ предоÑтавленных ролей; профили, назначенные Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ ÑобÑтвенных наÑтроек; оÑновной профиль по умолчанию (Ñм. Ñекцию `default_profile` в оÑновном конфигурационном файле Ñервера). + +**СинтакÑиÑ** + +``` sql +enabledProfiles() +``` + +**Возвращаемое значение** + +- СпиÑок доÑтупных профилей Ð´Ð»Ñ Ñ‚ÐµÐºÑƒÑ‰ÐµÐ³Ð¾ пользователÑ. 
+ +Тип: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)). + +## defaultProfiles {#default-profiles} + +Возвращает вÑе профили, указанные при объÑвлении текущего Ð¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ñ‚ÐµÐ»Ñ (Ñм. [CREATE USER](../../sql-reference/statements/create/user.md#create-user-statement)) + +**СинтакÑиÑ** + +``` sql +defaultProfiles() +``` + +**Возвращаемое значение** + +- СпиÑок профилей по умолчанию. + +Тип: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)). \ No newline at end of file diff --git a/docs/ru/sql-reference/functions/splitting-merging-functions.md b/docs/ru/sql-reference/functions/splitting-merging-functions.md index 5a0c540cf3a..efe74dba043 100644 --- a/docs/ru/sql-reference/functions/splitting-merging-functions.md +++ b/docs/ru/sql-reference/functions/splitting-merging-functions.md @@ -146,6 +146,70 @@ SELECT splitByRegexp('', 'abcde'); └────────────────────────────┘ ``` +## splitByWhitespace(s) {#splitbywhitespaceseparator-s} + +Разбивает Ñтроку на подÑтроки, иÑÐ¿Ð¾Ð»ÑŒÐ·ÑƒÑ Ð² качеÑтве разделителей пробельные Ñимволы. + +**СинтакÑиÑ** + +``` sql +splitByWhitespace(s) +``` + +**Ðргументы** + +- `s` — Ñ€Ð°Ð·Ð±Ð¸Ð²Ð°ÐµÐ¼Ð°Ñ Ñтрока. [String](../../sql-reference/data-types/string.md). + +**Возвращаемые значениÑ** + +Возвращает маÑÑив подÑтрок. + +Тип: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)). + +**Пример** + +``` sql +SELECT splitByWhitespace(' 1! a, b. '); +``` + +``` text +┌─splitByWhitespace(' 1! a, b. ')─┠+│ ['1!','a,','b.'] │ +└─────────────────────────────────────┘ +``` + +## splitByNonAlpha(s) {#splitbynonalphaseparator-s} + +Разбивает Ñтроку на подÑтроки, иÑÐ¿Ð¾Ð»ÑŒÐ·ÑƒÑ Ð² качеÑтве разделителей пробельные Ñимволы и Ñимволы пунктуации. + +**СинтакÑиÑ** + +``` sql +splitByNonAlpha(s) +``` + +**Ðргументы** + +- `s` — Ñ€Ð°Ð·Ð±Ð¸Ð²Ð°ÐµÐ¼Ð°Ñ Ñтрока. [String](../../sql-reference/data-types/string.md). + +**Возвращаемые значениÑ** + +Возвращает маÑÑив подÑтрок. + +Тип: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)). + +**Пример** + +``` sql +SELECT splitByNonAlpha(' 1! a, b. '); +``` + +``` text +┌─splitByNonAlpha(' 1! a, b. ')─┠+│ ['1','a','b'] │ +└───────────────────────────────────┘ +``` + ## arrayStringConcat(arr\[, separator\]) {#arraystringconcatarr-separator} Склеивает Ñтроки, перечиÑленные в маÑÑиве, Ñ Ñ€Ð°Ð·Ð´ÐµÐ»Ð¸Ñ‚ÐµÐ»ÐµÐ¼ separator. diff --git a/docs/ru/sql-reference/functions/string-functions.md b/docs/ru/sql-reference/functions/string-functions.md index 04af599c09a..b587a991db1 100644 --- a/docs/ru/sql-reference/functions/string-functions.md +++ b/docs/ru/sql-reference/functions/string-functions.md @@ -500,7 +500,7 @@ SELECT trimBoth(' Hello, world! '); normalizeQuery(x) ``` -**Ðргументы** +**Ðргументы** - `x` — поÑледовательноÑÑ‚ÑŒ Ñимволов. [String](../../sql-reference/data-types/string.md). @@ -530,13 +530,13 @@ SELECT normalizeQuery('[1, 2, 3, x]') AS query; Возвращает идентичные 64-битные Ñ…Ñш - Ñуммы без значений литералов Ð´Ð»Ñ Ð°Ð½Ð°Ð»Ð¾Ð³Ð¸Ñ‡Ð½Ñ‹Ñ… запроÑов. Это помогает анализировать журнал запроÑов. -**СинтакÑиÑ** +**СинтакÑиÑ** ``` sql normalizedQueryHash(x) ``` -**Ðргументы** +**Ðргументы** - `x` — поÑледовательноÑÑ‚ÑŒ Ñимволов. [String](../../sql-reference/data-types/string.md). 
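A sketch of the typical use of `normalizedQueryHash` for query-log analysis; it assumes query logging is enabled so that `system.query_log` is populated:

``` sql
SELECT
    normalizedQueryHash(query) AS h,
    count() AS cnt,
    any(query) AS sample_query
FROM system.query_log
WHERE type = 'QueryFinish'
GROUP BY h
ORDER BY cnt DESC
LIMIT 10;
```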
@@ -568,13 +568,13 @@ SELECT normalizedQueryHash('SELECT 1 AS `xyz`') != normalizedQueryHash('SELECT 1 ЭкранируютÑÑ Ñимволы, которые в формате XML ÑвлÑÑŽÑ‚ÑÑ Ð·Ð°Ñ€ÐµÐ·ÐµÑ€Ð²Ð¸Ñ€Ð¾Ð²Ð°Ð½Ð½Ñ‹Ð¼Ð¸ (Ñлужебными): `<`, `&`, `>`, `"`, `'`. -**СинтакÑиÑ** +**СинтакÑиÑ** ``` sql encodeXMLComponent(x) ``` -**Ðргументы** +**Ðргументы** - `x` — поÑледовательноÑÑ‚ÑŒ Ñимволов. [String](../../sql-reference/data-types/string.md). @@ -637,7 +637,7 @@ SELECT decodeXMLComponent('< Σ >'); Результат: ``` text -'foo' +'foo' < Σ > ``` @@ -679,7 +679,7 @@ extractTextFromHTML(x) **Ðргументы** -- `x` — текÑÑ‚ Ð´Ð»Ñ Ð¾Ð±Ñ€Ð°Ð±Ð¾Ñ‚ÐºÐ¸. [String](../../sql-reference/data-types/string.md). +- `x` — текÑÑ‚ Ð´Ð»Ñ Ð¾Ð±Ñ€Ð°Ð±Ð¾Ñ‚ÐºÐ¸. [String](../../sql-reference/data-types/string.md). **Возвращаемое значение** diff --git a/docs/ru/sql-reference/functions/string-search-functions.md b/docs/ru/sql-reference/functions/string-search-functions.md index 2417a1c6ffd..5b6cbc68d2e 100644 --- a/docs/ru/sql-reference/functions/string-search-functions.md +++ b/docs/ru/sql-reference/functions/string-search-functions.md @@ -23,7 +23,7 @@ position(haystack, needle[, start_pos]) ``` sql position(needle IN haystack) -``` +``` ÐлиаÑ: `locate(haystack, needle[, start_pos])`. @@ -93,7 +93,7 @@ SELECT 1 = position('абв' IN 'абв'); └───────────────────────────────────┘ ``` -ЗапроÑ: +ЗапроÑ: ```sql SELECT 0 = position('абв' IN ''); @@ -383,27 +383,27 @@ Result: ## extractAllGroupsHorizontal {#extractallgroups-horizontal} -Разбирает Ñтроку `haystack` на фрагменты, ÑоответÑтвующие группам регулÑрного Ð²Ñ‹Ñ€Ð°Ð¶ÐµÐ½Ð¸Ñ `pattern`. Возвращает маÑÑив маÑÑивов, где первый маÑÑив Ñодержит вÑе фрагменты, ÑоответÑтвующие первой группе регулÑрного выражениÑ, второй маÑÑив - ÑоответÑтвующие второй группе, и Ñ‚.д. +Разбирает Ñтроку `haystack` на фрагменты, ÑоответÑтвующие группам регулÑрного Ð²Ñ‹Ñ€Ð°Ð¶ÐµÐ½Ð¸Ñ `pattern`. Возвращает маÑÑив маÑÑивов, где первый маÑÑив Ñодержит вÑе фрагменты, ÑоответÑтвующие первой группе регулÑрного выражениÑ, второй маÑÑив - ÑоответÑтвующие второй группе, и Ñ‚.д. !!! note "Замечание" Ð¤ÑƒÐ½ÐºÑ†Ð¸Ñ `extractAllGroupsHorizontal` работает медленнее, чем Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ [extractAllGroupsVertical](#extractallgroups-vertical). -**СинтакÑиÑ** +**СинтакÑиÑ** ``` sql extractAllGroupsHorizontal(haystack, pattern) ``` -**Ðргументы** +**Ðргументы** - `haystack` — Ñтрока Ð´Ð»Ñ Ñ€Ð°Ð·Ð±Ð¾Ñ€Ð°. Тип: [String](../../sql-reference/data-types/string.md). -- `pattern` — регулÑрное выражение, поÑтроенное по ÑинтакÑичеÑким правилам [re2](https://github.com/google/re2/wiki/Syntax). Выражение должно Ñодержать группы, заключенные в круглые Ñкобки. ЕÑли выражение не Ñодержит групп, генерируетÑÑ Ð¸Ñключение. Тип: [String](../../sql-reference/data-types/string.md). +- `pattern` — регулÑрное выражение, поÑтроенное по ÑинтакÑичеÑким правилам [re2](https://github.com/google/re2/wiki/Syntax). Выражение должно Ñодержать группы, заключенные в круглые Ñкобки. ЕÑли выражение не Ñодержит групп, генерируетÑÑ Ð¸Ñключение. Тип: [String](../../sql-reference/data-types/string.md). **Возвращаемое значение** - Тип: [Array](../../sql-reference/data-types/array.md). -ЕÑли в Ñтроке `haystack` нет групп, ÑоответÑтвующих регулÑрному выражению `pattern`, возвращаетÑÑ Ð¼Ð°ÑÑив пуÑÑ‚Ñ‹Ñ… маÑÑивов. +ЕÑли в Ñтроке `haystack` нет групп, ÑоответÑтвующих регулÑрному выражению `pattern`, возвращаетÑÑ Ð¼Ð°ÑÑив пуÑÑ‚Ñ‹Ñ… маÑÑивов. 
**Пример** @@ -429,13 +429,13 @@ SELECT extractAllGroupsHorizontal('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=( Разбирает Ñтроку `haystack` на фрагменты, ÑоответÑтвующие группам регулÑрного Ð²Ñ‹Ñ€Ð°Ð¶ÐµÐ½Ð¸Ñ `pattern`. Возвращает маÑÑив маÑÑивов, где каждый маÑÑив Ñодержит по одному фрагменту, ÑоответÑтвующему каждой группе регулÑрного выражениÑ. Фрагменты группируютÑÑ Ð² маÑÑивы в ÑоответÑтвии Ñ Ð¿Ð¾Ñ€Ñдком поÑÐ²Ð»ÐµÐ½Ð¸Ñ Ð² иÑходной Ñтроке. -**СинтакÑиÑ** +**СинтакÑиÑ** ``` sql extractAllGroupsVertical(haystack, pattern) ``` -**Ðргументы** +**Ðргументы** - `haystack` — Ñтрока Ð´Ð»Ñ Ñ€Ð°Ð·Ð±Ð¾Ñ€Ð°. Тип: [String](../../sql-reference/data-types/string.md). - `pattern` — регулÑрное выражение, поÑтроенное по ÑинтакÑичеÑким правилам [re2](https://github.com/google/re2/wiki/Syntax). Выражение должно Ñодержать группы, заключенные в круглые Ñкобки. ЕÑли выражение не Ñодержит групп, генерируетÑÑ Ð¸Ñключение. Тип: [String](../../sql-reference/data-types/string.md). @@ -444,7 +444,7 @@ extractAllGroupsVertical(haystack, pattern) - Тип: [Array](../../sql-reference/data-types/array.md). -ЕÑли в Ñтроке `haystack` нет групп, ÑоответÑтвующих регулÑрному выражению `pattern`, возвращаетÑÑ Ð¿ÑƒÑтой маÑÑив. +ЕÑли в Ñтроке `haystack` нет групп, ÑоответÑтвующих регулÑрному выражению `pattern`, возвращаетÑÑ Ð¿ÑƒÑтой маÑÑив. **Пример** @@ -558,7 +558,7 @@ SELECT * FROM Months WHERE ilike(name, '%j%'); !!! note "Примечание" Ð”Ð»Ñ ÑÐ»ÑƒÑ‡Ð°Ñ UTF-8 мы иÑпользуем триграммное раÑÑтоÑние. ВычиÑление n-граммного раÑÑтоÑÐ½Ð¸Ñ Ð½Ðµ ÑовÑем чеÑтное. Мы иÑпользуем 2-Ñ… байтные Ñ…Ñши Ð´Ð»Ñ Ñ…ÑÑˆÐ¸Ñ€Ð¾Ð²Ð°Ð½Ð¸Ñ n-грамм, а затем вычиÑлÑем (не)ÑимметричеÑкую разноÑÑ‚ÑŒ между Ñ…Ñш таблицами – могут возникнуть коллизии. Ð’ формате UTF-8 без учета региÑтра мы не иÑпользуем чеÑтную функцию `tolower` – мы обнулÑем 5-й бит (Ð½ÑƒÐ¼ÐµÑ€Ð°Ñ†Ð¸Ñ Ñ Ð½ÑƒÐ»Ñ) каждого байта кодовой точки, а также первый бит нулевого байта, еÑли байтов больше 1 – Ñто работает Ð´Ð»Ñ Ð»Ð°Ñ‚Ð¸Ð½Ð¸Ñ†Ñ‹ и почти Ð´Ð»Ñ Ð²Ñех кирилличеÑких букв. - + ## countMatches(haystack, pattern) {#countmatcheshaystack-pattern} Возвращает количеÑтво Ñовпадений, найденных в Ñтроке `haystack`, Ð´Ð»Ñ Ñ€ÐµÐ³ÑƒÐ»Ñрного Ð²Ñ‹Ñ€Ð°Ð¶ÐµÐ½Ð¸Ñ `pattern`. diff --git a/docs/ru/sql-reference/functions/tuple-map-functions.md b/docs/ru/sql-reference/functions/tuple-map-functions.md index 94dccb58622..4775152fb54 100644 --- a/docs/ru/sql-reference/functions/tuple-map-functions.md +++ b/docs/ru/sql-reference/functions/tuple-map-functions.md @@ -9,13 +9,13 @@ toc_title: Работа Ñ ÐºÐ¾Ð½Ñ‚ÐµÐ¹Ð½ÐµÑ€Ð°Ð¼Ð¸ map Преобразовывает пары `ключ:значение` в тип данных [Map(key, value)](../../sql-reference/data-types/map.md). -**СинтакÑиÑ** +**СинтакÑиÑ** ``` sql map(key1, value1[, key2, value2, ...]) ``` -**Ðргументы** +**Ðргументы** - `key` — ключ. [String](../../sql-reference/data-types/string.md) или [Integer](../../sql-reference/data-types/int-uint.md). - `value` — значение. [String](../../sql-reference/data-types/string.md), [Integer](../../sql-reference/data-types/int-uint.md) или [Array](../../sql-reference/data-types/array.md). @@ -62,7 +62,7 @@ SELECT a['key2'] FROM table_map; └─────────────────────────┘ ``` -**Смотрите также** +**Смотрите также** - тип данных [Map(key, value)](../../sql-reference/data-types/map.md) @@ -70,13 +70,13 @@ SELECT a['key2'] FROM table_map; Собирает вÑе ключи и Ñуммирует ÑоответÑтвующие значениÑ. 
-**СинтакÑиÑ** +**СинтакÑиÑ** ``` sql mapAdd(Tuple(Array, Array), Tuple(Array, Array) [, ...]) ``` -**Ðргументы** +**Ðргументы** Ðргументами ÑвлÑÑŽÑ‚ÑÑ [кортежи](../../sql-reference/data-types/tuple.md#tuplet1-t2) из двух [маÑÑивов](../../sql-reference/data-types/array.md#data-type-array), где Ñлементы в первом маÑÑиве предÑтавлÑÑŽÑ‚ ключи, а второй маÑÑив Ñодержит Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð´Ð»Ñ ÐºÐ°Ð¶Ð´Ð¾Ð³Ð¾ ключа. Ð’Ñе маÑÑивы ключей должны иметь один и тот же тип, а вÑе маÑÑивы значений должны Ñодержать Ñлементы, которые можно приводить к одному типу ([Int64](../../sql-reference/data-types/int-uint.md#int-ranges), [UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges) или [Float64](../../sql-reference/data-types/float.md#float32-float64)). @@ -106,7 +106,7 @@ SELECT mapAdd(([toUInt8(1), 2], [1, 1]), ([toUInt8(1), 2], [1, 1])) as res, toTy Собирает вÑе ключи и вычитает ÑоответÑтвующие значениÑ. -**СинтакÑиÑ** +**СинтакÑиÑ** ``` sql mapSubtract(Tuple(Array, Array), Tuple(Array, Array) [, ...]) @@ -142,7 +142,7 @@ SELECT mapSubtract(([toUInt8(1), 2], [toInt32(1), 1]), ([toUInt8(1), 2], [toInt3 ЗаполнÑет недоÑтающие ключи в контейнере map (пара маÑÑивов ключей и значений), где ключи ÑвлÑÑŽÑ‚ÑÑ Ñ†ÐµÐ»Ñ‹Ð¼Ð¸ чиÑлами. Кроме того, он поддерживает указание макÑимального ключа, который иÑпользуетÑÑ Ð´Ð»Ñ Ñ€Ð°ÑÑˆÐ¸Ñ€ÐµÐ½Ð¸Ñ Ð¼Ð°ÑÑива ключей. -**СинтакÑиÑ** +**СинтакÑиÑ** ``` sql mapPopulateSeries(keys, values[, max]) @@ -187,7 +187,7 @@ select mapPopulateSeries([1,2,4], [11,22,44], 5) as res, toTypeName(res) as type mapContains(map, key) ``` -**Ðргументы** +**Ðргументы** - `map` — контейнер Map. [Map](../../sql-reference/data-types/map.md). - `key` — ключ. Тип ÑоответÑтует типу ключей параметра `map`. diff --git a/docs/ru/sql-reference/functions/type-conversion-functions.md b/docs/ru/sql-reference/functions/type-conversion-functions.md index 8707642eb59..757afca9588 100644 --- a/docs/ru/sql-reference/functions/type-conversion-functions.md +++ b/docs/ru/sql-reference/functions/type-conversion-functions.md @@ -435,8 +435,8 @@ reinterpret(x, type) **Ðргументы** -- `x` — любой тип данных. -- `type` — конечный тип данных. [String](../../sql-reference/data-types/string.md). +- `x` — любой тип данных. +- `type` — конечный тип данных. [String](../../sql-reference/data-types/string.md). **Возвращаемое значение** @@ -462,27 +462,29 @@ SELECT reinterpret(toInt8(-1), 'UInt8') as int_to_uint, ## CAST(x, T) {#type_conversion_function-cast} -Преобразует входное значение `x` в указанный тип данных `T`. Ð’ отличии от функции `reinterpret` иÑпользует внешнее предÑтавление Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ `x`. - -ПоддерживаетÑÑ Ñ‚Ð°ÐºÐ¶Ðµ ÑинтакÑÐ¸Ñ `CAST(x AS t)`. - -!!! warning "Предупреждение" - ЕÑли значение `x` не может быть преобразовано к типу `T`, возникает переполнение. Ðапример, `CAST(-1, 'UInt8')` возвращает 255. +Преобразует входное значение к указанному типу данных. Ð’ отличие от функции [reinterpret](#type_conversion_function-reinterpret) `CAST` пытаетÑÑ Ð¿Ñ€ÐµÐ´Ñтавить то же Ñамое значение в новом типе данных. ЕÑли преобразование невозможно, то возникает иÑключение. +ПоддерживаетÑÑ Ð½ÐµÑколько вариантов ÑинтакÑиÑа. **СинтакÑиÑ** ``` sql CAST(x, T) +CAST(x AS t) +x::t ``` **Ðргументы** -- `x` — любой тип данных. -- `T` — конечный тип данных. [String](../../sql-reference/data-types/string.md). +- `x` — значение, которое нужно преобразовать. Может быть любого типа. +- `T` — Ð¸Ð¼Ñ Ñ‚Ð¸Ð¿Ð° данных. [String](../../sql-reference/data-types/string.md). +- `t` — тип данных. 
**Возвращаемое значение** -- Значение конечного типа данных. +- Преобразованное значение. + +!!! note "Примечание" + ЕÑли входное значение выходит за границы нового типа, то результат переполнÑетÑÑ. Ðапример, `CAST(-1, 'UInt8')` возвращает `255`. **Примеры** @@ -491,16 +493,16 @@ CAST(x, T) ```sql SELECT CAST(toInt8(-1), 'UInt8') AS cast_int_to_uint, - CAST(toInt8(1), 'Float32') AS cast_int_to_float, - CAST('1', 'UInt32') AS cast_string_to_int + CAST(1.5 AS Decimal(3,2)) AS cast_float_to_decimal, + '1'::Int32 AS cast_string_to_int; ``` Результат: ``` -┌─cast_int_to_uint─┬─cast_int_to_float─┬─cast_string_to_int─┠-│ 255 │ 1 │ 1 │ -└──────────────────┴───────────────────┴────────────────────┘ +┌─cast_int_to_uint─┬─cast_float_to_decimal─┬─cast_string_to_int─┠+│ 255 │ 1.50 │ 1 │ +└──────────────────┴───────────────────────┴────────────────────┘ ``` ЗапроÑ: @@ -524,7 +526,7 @@ SELECT Преобразование в FixedString(N) работает только Ð´Ð»Ñ Ð°Ñ€Ð³ÑƒÐ¼ÐµÐ½Ñ‚Ð¾Ð² типа [String](../../sql-reference/data-types/string.md) или [FixedString](../../sql-reference/data-types/fixedstring.md). -ПоддерживаетÑÑ Ð¿Ñ€ÐµÐ¾Ð±Ñ€Ð°Ð·Ð¾Ð²Ð°Ð½Ð¸Ðµ к типу [Nullable](../../sql-reference/functions/type-conversion-functions.md) и обратно. +ПоддерживаетÑÑ Ð¿Ñ€ÐµÐ¾Ð±Ñ€Ð°Ð·Ð¾Ð²Ð°Ð½Ð¸Ðµ к типу [Nullable](../../sql-reference/data-types/nullable.md) и обратно. **Примеры** @@ -573,7 +575,7 @@ SELECT toTypeName(CAST(x, 'Nullable(UInt16)')) FROM t_null; ЗапроÑ: ``` sql -SELECT cast(-1, 'UInt8') as uint8; +SELECT cast(-1, 'UInt8') as uint8; ``` Результат: @@ -1166,8 +1168,8 @@ SELECT toLowCardinality('1'); ## toUnixTimestamp64Nano {#tounixtimestamp64nano} -Преобразует значение `DateTime64` в значение `Int64` Ñ Ñ„Ð¸ÐºÑированной точноÑтью менее одной Ñекунды. -Входное значение округлÑетÑÑ ÑоответÑтвующим образом вверх или вниз в завиÑимоÑти от его точноÑти. +Преобразует значение `DateTime64` в значение `Int64` Ñ Ñ„Ð¸ÐºÑированной точноÑтью менее одной Ñекунды. +Входное значение округлÑетÑÑ ÑоответÑтвующим образом вверх или вниз в завиÑимоÑти от его точноÑти. !!! info "Примечание" Возвращаемое значение — Ñто Ð²Ñ€ÐµÐ¼ÐµÐ½Ð½Ð°Ñ Ð¼ÐµÑ‚ÐºÐ° в UTC, а не в чаÑовом поÑÑе `DateTime64`. @@ -1203,7 +1205,7 @@ SELECT toUnixTimestamp64Milli(dt64); └──────────────────────────────┘ ``` -ЗапроÑ: +ЗапроÑ: ``` sql WITH toDateTime64('2019-09-16 19:20:12.345678910', 6) AS dt64 @@ -1262,7 +1264,7 @@ SELECT fromUnixTimestamp64Milli(i64, 'UTC'); Преобразует произвольные Ð²Ñ‹Ñ€Ð°Ð¶ÐµÐ½Ð¸Ñ Ð² Ñтроку заданного формата. -**СинтакÑиÑ** +**СинтакÑиÑ** ``` sql formatRow(format, x, y, ...) @@ -1303,7 +1305,7 @@ FROM numbers(3); Преобразует произвольные Ð²Ñ‹Ñ€Ð°Ð¶ÐµÐ½Ð¸Ñ Ð² Ñтроку заданного формата. При Ñтом удалÑет лишние переводы Ñтрок `\n`, еÑли они поÑвилиÑÑŒ. -**СинтакÑиÑ** +**СинтакÑиÑ** ``` sql formatRowNoNewline(format, x, y, ...) 
diff --git a/docs/ru/sql-reference/functions/url-functions.md b/docs/ru/sql-reference/functions/url-functions.md index bdf9beeabf5..87e4b1b89c1 100644 --- a/docs/ru/sql-reference/functions/url-functions.md +++ b/docs/ru/sql-reference/functions/url-functions.md @@ -267,7 +267,7 @@ SELECT firstSignificantSubdomainCustom('bar.foo.there-is-no-such-domain', 'publi Результат: -```text +```text ┌─firstSignificantSubdomainCustom('bar.foo.there-is-no-such-domain', 'public_suffix_list')─┠│ foo │ └──────────────────────────────────────────────────────────────────────────────────────────┘ @@ -279,7 +279,7 @@ SELECT firstSignificantSubdomainCustom('bar.foo.there-is-no-such-domain', 'publi ### port(URL[, default_port = 0]) {#port} -Возвращает порт или значение `default_port`, еÑли в URL-адреÑе нет порта (или передан невалидный URL) +Возвращает порт или значение `default_port`, еÑли в URL-адреÑе нет порта (или передан невалидный URL) ### path {#path} diff --git a/docs/ru/sql-reference/operators/index.md b/docs/ru/sql-reference/operators/index.md index ee7f9a44388..98f6873f712 100644 --- a/docs/ru/sql-reference/operators/index.md +++ b/docs/ru/sql-reference/operators/index.md @@ -192,7 +192,7 @@ SELECT now() AS current_date_time, current_date_time + INTERVAL '4' day + INTERV Ð’Ñ‹ можете изменить дату, не иÑÐ¿Ð¾Ð»ÑŒÐ·ÑƒÑ ÑинтакÑÐ¸Ñ `INTERVAL`, а проÑто добавив или отнÑв Ñекунды, минуты и чаÑÑ‹. Ðапример, чтобы передвинуть дату на один день вперед, можно прибавить к ней значение `60*60*24`. !!! note "Примечание" - СинтакÑÐ¸Ñ `INTERVAL` или Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ `addDays` предпочтительнее Ð´Ð»Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ñ‹ Ñ Ð´Ð°Ñ‚Ð°Ð¼Ð¸. Сложение Ñ Ñ‡Ð¸Ñлом (например, ÑинтакÑÐ¸Ñ `now() + ...`) не учитывает региональные наÑтройки времени, например, переход на летнее времÑ. + СинтакÑÐ¸Ñ `INTERVAL` или Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ `addDays` предпочтительнее Ð´Ð»Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ñ‹ Ñ Ð´Ð°Ñ‚Ð°Ð¼Ð¸. Сложение Ñ Ñ‡Ð¸Ñлом (например, ÑинтакÑÐ¸Ñ `now() + ...`) не учитывает региональные наÑтройки времени, например, переход на летнее времÑ. Пример: diff --git a/docs/ru/sql-reference/statements/alter/column.md b/docs/ru/sql-reference/statements/alter/column.md index 158ab2e7385..9f59c79bfdd 100644 --- a/docs/ru/sql-reference/statements/alter/column.md +++ b/docs/ru/sql-reference/statements/alter/column.md @@ -5,15 +5,26 @@ toc_title: "МанипулÑции Ñо Ñтолбцами" # МанипулÑции Ñо Ñтолбцами {#manipuliatsii-so-stolbtsami} +Ðабор дейÑтвий, позволÑющих изменÑÑ‚ÑŒ Ñтруктуру таблицы. + +СинтакÑиÑ: + +``` sql +ALTER TABLE [db].name [ON CLUSTER cluster] ADD|DROP|CLEAR|COMMENT|MODIFY COLUMN ... +``` + +Ð’ запроÑе можно указать Ñразу неÑколько дейÑтвий над одной таблицей через запÑтую. +Каждое дейÑтвие — Ñто манипулÑÑ†Ð¸Ñ Ð½Ð°Ð´ Ñтолбцом. + СущеÑтвуют Ñледующие дейÑтвиÑ: - [ADD COLUMN](#alter_add-column) — добавлÑет Ñтолбец в таблицу; - [DROP COLUMN](#alter_drop-column) — удалÑет Ñтолбец; +- [RENAME COLUMN](#alter_rename-column) — переименовывает ÑущеÑтвующий Ñтолбец. - [CLEAR COLUMN](#alter_clear-column) — ÑбраÑывает вÑе Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð² Ñтолбце Ð´Ð»Ñ Ð·Ð°Ð´Ð°Ð½Ð½Ð¾Ð¹ партиции; - [COMMENT COLUMN](#alter_comment-column) — добавлÑет комментарий к Ñтолбцу; - [MODIFY COLUMN](#alter_modify-column) — изменÑет тип Ñтолбца, выражение Ð´Ð»Ñ Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¿Ð¾ умолчанию и TTL. - [MODIFY COLUMN REMOVE](#modify-remove) — удалÑет какое-либо из ÑвойÑтв Ñтолбца. -- [RENAME COLUMN](#alter_rename-column) — переименовывает ÑущеÑтвующий Ñтолбец. Подробное опиÑание Ð´Ð»Ñ ÐºÐ°Ð¶Ð´Ð¾Ð³Ð¾ дейÑÑ‚Ð²Ð¸Ñ Ð¿Ñ€Ð¸Ð²ÐµÐ´ÐµÐ½Ð¾ ниже. 
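Since several actions can be listed in one query separated by commas, a short sketch combining two of them; the `visits` table and the column names follow the examples used elsewhere on this page and are illustrative only:

``` sql
ALTER TABLE visits
    ADD COLUMN browser String,
    ADD COLUMN os String;
```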
@@ -72,6 +83,22 @@ DROP COLUMN [IF EXISTS] name ALTER TABLE visits DROP COLUMN browser ``` +## RENAME COLUMN {#alter_rename-column} + +``` sql +RENAME COLUMN [IF EXISTS] name to new_name +``` + +Переименовывает Ñтолбец `name` в `new_name`. ЕÑли указано выражение `IF EXISTS`, то Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð½Ðµ будет возвращать ошибку при уÑловии, что Ñтолбец `name` не ÑущеÑтвует. ПоÑкольку переименование не затрагивает физичеÑкие данные колонки, Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÑетÑÑ Ð¿Ñ€Ð°ÐºÑ‚Ð¸Ñ‡ÐµÑки мгновенно. + +**ЗÐМЕЧЕÐИЕ**: Столбцы, ÑвлÑющиеÑÑ Ñ‡Ð°Ñтью оÑновного ключа или ключа Ñортировки (заданные Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ `ORDER BY` или `PRIMARY KEY`), не могут быть переименованы. Попытка переименовать Ñти Ñлобцы приведет к `SQL Error [524]`. + +Пример: + +``` sql +ALTER TABLE visits RENAME COLUMN webBrowser TO browser +``` + ## CLEAR COLUMN {#alter_clear-column} ``` sql @@ -109,7 +136,7 @@ ALTER TABLE visits COMMENT COLUMN browser 'Столбец показывает, ## MODIFY COLUMN {#alter_modify-column} ``` sql -MODIFY COLUMN [IF EXISTS] name [type] [default_expr] [TTL] [AFTER name_after | FIRST] +MODIFY COLUMN [IF EXISTS] name [type] [default_expr] [codec] [TTL] [AFTER name_after | FIRST] ``` Ð—Ð°Ð¿Ñ€Ð¾Ñ Ð¸Ð·Ð¼ÐµÐ½Ñет Ñледующие ÑвойÑтва Ñтолбца `name`: @@ -118,11 +145,15 @@ MODIFY COLUMN [IF EXISTS] name [type] [default_expr] [TTL] [AFTER name_after | F - Значение по умолчанию +- Кодеки ÑÐ¶Ð°Ñ‚Ð¸Ñ + - TTL - Примеры Ð¸Ð·Ð¼ÐµÐ½ÐµÐ½Ð¸Ñ TTL Ñтолбца Ñмотрите в разделе [TTL Ñтолбца](../../../engines/table-engines/mergetree-family/mergetree.md#mergetree-column-ttl). +Примеры Ð¸Ð·Ð¼ÐµÐ½ÐµÐ½Ð¸Ñ ÐºÐ¾Ð´ÐµÐºÐ¾Ð² ÑÐ¶Ð°Ñ‚Ð¸Ñ Ñмотрите в разделе [Кодеки ÑÐ¶Ð°Ñ‚Ð¸Ñ Ñтолбцов](../create/table.md#codecs). -ЕÑли указано `IF EXISTS`, Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð½Ðµ возвращает ошибку, еÑли Ñтолбца не ÑущеÑтвует. +Примеры Ð¸Ð·Ð¼ÐµÐ½ÐµÐ½Ð¸Ñ TTL Ñтолбца Ñмотрите в разделе [TTL Ñтолбца](../../../engines/table-engines/mergetree-family/mergetree.md#mergetree-column-ttl). + +ЕÑли указано `IF EXISTS`, Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð½Ðµ возвращает ошибку при уÑловии, что Ñтолбец не ÑущеÑтвует. Ð—Ð°Ð¿Ñ€Ð¾Ñ Ñ‚Ð°ÐºÐ¶Ðµ может изменÑÑ‚ÑŒ порÑдок Ñтолбцов при помощи `FIRST | AFTER`, Ñмотрите опиÑание [ADD COLUMN](#alter_add-column). @@ -162,22 +193,6 @@ ALTER TABLE table_with_ttl MODIFY COLUMN column_ttl REMOVE TTL; - [REMOVE TTL](ttl.md). -## RENAME COLUMN {#alter_rename-column} - -Переименовывает ÑущеÑтвующий Ñтолбец. - -СинтакÑиÑ: - -```sql -ALTER TABLE table_name RENAME COLUMN column_name TO new_column_name -``` - -**Пример** - -```sql -ALTER TABLE table_with_ttl RENAME COLUMN column_ttl TO column_ttl_new; -``` - ## ÐžÐ³Ñ€Ð°Ð½Ð¸Ñ‡ÐµÐ½Ð¸Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñа ALTER {#ogranicheniia-zaprosa-alter} Ð—Ð°Ð¿Ñ€Ð¾Ñ `ALTER` позволÑет Ñоздавать и удалÑÑ‚ÑŒ отдельные Ñлементы (Ñтолбцы) вложенных Ñтруктур данных, но не вложенные Ñтруктуры данных целиком. Ð”Ð»Ñ Ð´Ð¾Ð±Ð°Ð²Ð»ÐµÐ½Ð¸Ñ Ð²Ð»Ð¾Ð¶ÐµÐ½Ð½Ð¾Ð¹ Ñтруктуры данных, вы можете добавить Ñтолбцы Ñ Ð¸Ð¼ÐµÐ½ÐµÐ¼ вида `name.nested_name` и типом `Array(T)` - Ð²Ð»Ð¾Ð¶ÐµÐ½Ð½Ð°Ñ Ñтруктура данных полноÑтью Ñквивалентна неÑкольким Ñтолбцам-маÑÑивам Ñ Ð¸Ð¼ÐµÐ½ÐµÐ¼, имеющим одинаковый Ð¿Ñ€ÐµÑ„Ð¸ÐºÑ Ð´Ð¾ точки. @@ -186,7 +201,6 @@ ALTER TABLE table_with_ttl RENAME COLUMN column_ttl TO column_ttl_new; ЕÑли возможноÑтей запроÑа `ALTER` не хватает Ð´Ð»Ñ Ð½ÑƒÐ¶Ð½Ð¾Ð³Ð¾ Ð¸Ð·Ð¼ÐµÐ½ÐµÐ½Ð¸Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ†Ñ‹, вы можете Ñоздать новую таблицу, Ñкопировать туда данные Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ запроÑа [INSERT SELECT](../insert-into.md#insert_query_insert-select), затем поменÑÑ‚ÑŒ таблицы меÑтами Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ запроÑа [RENAME](../misc.md#misc_operations-rename), и удалить Ñтарую таблицу. 
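A sketch of the rebuild-and-swap workflow described above; the table structures are hypothetical and only illustrate the sequence of statements:

``` sql
-- hypothetical structures: only the ordering key differs
CREATE TABLE visits_new (CounterID UInt32, EventDate Date) ENGINE = MergeTree ORDER BY (EventDate, CounterID);
INSERT INTO visits_new SELECT CounterID, EventDate FROM visits;
RENAME TABLE visits TO visits_old, visits_new TO visits;
DROP TABLE visits_old;
```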
Ð’ качеÑтве альтернативы Ð´Ð»Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñа `INSERT SELECT`, можно иÑпользовать инÑтрумент [clickhouse-copier](../../../sql-reference/statements/alter/index.md). -Ð—Ð°Ð¿Ñ€Ð¾Ñ `ALTER` блокирует вÑе Ñ‡Ñ‚ÐµÐ½Ð¸Ñ Ð¸ запиÑи Ð´Ð»Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ†Ñ‹. То еÑÑ‚ÑŒ, еÑли на момент запроÑа `ALTER`, выполнÑлÑÑ Ð´Ð¾Ð»Ð³Ð¸Ð¹ `SELECT`, то Ð·Ð°Ð¿Ñ€Ð¾Ñ `ALTER` Ñначала дождётÑÑ ÐµÐ³Ð¾ выполнениÑ. И в Ñто времÑ, вÑе новые запроÑÑ‹ к той же таблице, будут ждать, пока завершитÑÑ Ñтот `ALTER`. +Ð—Ð°Ð¿Ñ€Ð¾Ñ `ALTER` блокирует вÑе Ñ‡Ñ‚ÐµÐ½Ð¸Ñ Ð¸ запиÑи Ð´Ð»Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ†Ñ‹. То еÑÑ‚ÑŒ еÑли на момент запроÑа `ALTER` выполнÑлÑÑ Ð´Ð¾Ð»Ð³Ð¸Ð¹ `SELECT`, то Ð·Ð°Ð¿Ñ€Ð¾Ñ `ALTER` Ñначала дождётÑÑ ÐµÐ³Ð¾ выполнениÑ. И в Ñто Ð²Ñ€ÐµÐ¼Ñ Ð²Ñе новые запроÑÑ‹ к той же таблице будут ждать, пока завершитÑÑ Ñтот `ALTER`. Ð”Ð»Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ†, которые не хранÑÑ‚ данные ÑамоÑтоÑтельно (типа [Merge](../../../sql-reference/statements/alter/index.md) и [Distributed](../../../sql-reference/statements/alter/index.md)), `ALTER` вÑего лишь менÑет Ñтруктуру таблицы, но не менÑет Ñтруктуру подчинённых таблиц. Ð”Ð»Ñ Ð¿Ñ€Ð¸Ð¼ÐµÑ€Ð°, при ALTER-е таблицы типа `Distributed`, вам также потребуетÑÑ Ð²Ñ‹Ð¿Ð¾Ð»Ð½Ð¸Ñ‚ÑŒ Ð·Ð°Ð¿Ñ€Ð¾Ñ `ALTER` Ð´Ð»Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ† на вÑех удалённых Ñерверах. - diff --git a/docs/ru/sql-reference/statements/alter/index.md b/docs/ru/sql-reference/statements/alter/index.md index 648fb7e7c5c..043ac3839d9 100644 --- a/docs/ru/sql-reference/statements/alter/index.md +++ b/docs/ru/sql-reference/statements/alter/index.md @@ -67,5 +67,5 @@ ALTER TABLE [db.]table MATERIALIZE INDEX name IN PARTITION partition_name Ð”Ð»Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñов `ALTER ... ATTACH|DETACH|DROP` можно наÑтроить ожидание, Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ наÑтройки `replication_alter_partitions_sync`. Возможные значениÑ: `0` - не ждать, `1` - ждать Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ñ‚Ð¾Ð»ÑŒÐºÐ¾ у ÑÐµÐ±Ñ (по умолчанию), `2` - ждать вÑех. -Ð”Ð»Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñов `ALTER TABLE ... UPDATE|DELETE` ÑинхронноÑÑ‚ÑŒ Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð¾Ð¿Ñ€ÐµÐ´ÐµÐ»ÑетÑÑ Ð½Ð°Ñтройкой [mutations_sync](../../../operations/settings/settings.md#mutations_sync). +Ð”Ð»Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñов `ALTER TABLE ... UPDATE|DELETE` ÑинхронноÑÑ‚ÑŒ Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð¾Ð¿Ñ€ÐµÐ´ÐµÐ»ÑетÑÑ Ð½Ð°Ñтройкой [mutations_sync](../../../operations/settings/settings.md#mutations_sync). diff --git a/docs/ru/sql-reference/statements/alter/partition.md b/docs/ru/sql-reference/statements/alter/partition.md index 0a485c7b591..f875103a498 100644 --- a/docs/ru/sql-reference/statements/alter/partition.md +++ b/docs/ru/sql-reference/statements/alter/partition.md @@ -17,7 +17,7 @@ toc_title: PARTITION - [CLEAR INDEX IN PARTITION](#alter_clear-index-partition) — очиÑтить поÑтроенные вторичные индекÑÑ‹ Ð´Ð»Ñ Ð·Ð°Ð´Ð°Ð½Ð½Ð¾Ð¹ партиции; - [FREEZE PARTITION](#alter_freeze-partition) — Ñоздать резервную копию партиции; - [UNFREEZE PARTITION](#alter_unfreeze-partition) — удалить резервную копию партиции; -- [FETCH PARTITION](#alter_fetch-partition) — Ñкачать партицию Ñ Ð´Ñ€ÑƒÐ³Ð¾Ð³Ð¾ Ñервера; +- [FETCH PARTITION\|PART](#alter_fetch-partition) — Ñкачать партицию/куÑок Ñ Ð´Ñ€ÑƒÐ³Ð¾Ð³Ð¾ Ñервера; - [MOVE PARTITION\|PART](#alter_move-partition) — перемеÑтить партицию/куÑкок на другой диÑк или том. - [UPDATE IN PARTITION](#update-in-partition) — обновить данные внутри партиции по уÑловию. - [DELETE IN PARTITION](#delete-in-partition) — удалить данные внутри партиции по уÑловию. @@ -209,29 +209,35 @@ ALTER TABLE 'table_name' UNFREEZE [PARTITION 'part_expr'] WITH NAME 'backup_name УдалÑет Ñ Ð´Ð¸Ñка "замороженные" партиции Ñ ÑƒÐºÐ°Ð·Ð°Ð½Ð½Ñ‹Ð¼ именем. 
ЕÑли ÑÐµÐºÑ†Ð¸Ñ `PARTITION` опущена, Ð·Ð°Ð¿Ñ€Ð¾Ñ ÑƒÐ´Ð°Ð»Ñет резервную копию вÑех партиций Ñразу. -## FETCH PARTITION {#alter_fetch-partition} +## FETCH PARTITION\|PART {#alter_fetch-partition} ``` sql -ALTER TABLE table_name FETCH PARTITION partition_expr FROM 'path-in-zookeeper' +ALTER TABLE table_name FETCH PARTITION|PART partition_expr FROM 'path-in-zookeeper' ``` Загружает партицию Ñ Ð´Ñ€ÑƒÐ³Ð¾Ð³Ð¾ Ñервера. Этот Ð·Ð°Ð¿Ñ€Ð¾Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ð°ÐµÑ‚ только Ð´Ð»Ñ Ñ€ÐµÐ¿Ð»Ð¸Ñ†Ð¸Ñ€Ð¾Ð²Ð°Ð½Ð½Ñ‹Ñ… таблиц. Ð—Ð°Ð¿Ñ€Ð¾Ñ Ð²Ñ‹Ð¿Ð¾Ð»Ð½Ñет Ñледующее: -1. Загружает партицию Ñ ÑƒÐºÐ°Ð·Ð°Ð½Ð½Ð¾Ð³Ð¾ шарда. Путь к шарду задаетÑÑ Ð² Ñекции `FROM` (‘path-in-zookeeper’). Обратите внимание, нужно задавать путь к шарду в ZooKeeper. +1. Загружает партицию/куÑок Ñ ÑƒÐºÐ°Ð·Ð°Ð½Ð½Ð¾Ð³Ð¾ шарда. Путь к шарду задаетÑÑ Ð² Ñекции `FROM` (‘path-in-zookeeper’). Обратите внимание, нужно задавать путь к шарду в ZooKeeper. 2. Помещает загруженные данные в директорию `detached` таблицы `table_name`. Чтобы прикрепить Ñти данные к таблице, иÑпользуйте Ð·Ð°Ð¿Ñ€Ð¾Ñ [ATTACH PARTITION\|PART](#alter_attach-partition). Ðапример: +1. FETCH PARTITION ``` sql ALTER TABLE users FETCH PARTITION 201902 FROM '/clickhouse/tables/01-01/visits'; ALTER TABLE users ATTACH PARTITION 201902; ``` +2. FETCH PART +``` sql +ALTER TABLE users FETCH PART 201901_2_2_0 FROM '/clickhouse/tables/01-01/visits'; +ALTER TABLE users ATTACH PART 201901_2_2_0; +``` Следует иметь в виду: -- Ð—Ð°Ð¿Ñ€Ð¾Ñ `ALTER TABLE t FETCH PARTITION` не реплицируетÑÑ. Он загружает партицию в директорию `detached` только на локальном Ñервере. +- Ð—Ð°Ð¿Ñ€Ð¾Ñ `ALTER TABLE t FETCH PARTITION|PART` не реплицируетÑÑ. Он загружает партицию в директорию `detached` только на локальном Ñервере. - Ð—Ð°Ð¿Ñ€Ð¾Ñ `ALTER TABLE t ATTACH` реплицируетÑÑ â€” он добавлÑет данные в таблицу Ñразу на вÑех репликах. Ðа одной из реплик данные будут добавлены из директории `detached`, а на других — из ÑоÑедних реплик. Перед загрузкой данных ÑиÑтема проверÑет, ÑущеÑтвует ли Ð¿Ð°Ñ€Ñ‚Ð¸Ñ†Ð¸Ñ Ð¸ Ñовпадает ли её Ñтруктура Ñо Ñтруктурой таблицы. При Ñтом автоматичеÑки выбираетÑÑ Ð½Ð°Ð¸Ð±Ð¾Ð»ÐµÐµ Ð°ÐºÑ‚ÑƒÐ°Ð»ÑŒÐ½Ð°Ñ Ñ€ÐµÐ¿Ð»Ð¸ÐºÐ° Ñреди вÑех живых реплик. diff --git a/docs/ru/sql-reference/statements/alter/role.md b/docs/ru/sql-reference/statements/alter/role.md index e9ce62c58d5..73f13b005a7 100644 --- a/docs/ru/sql-reference/statements/alter/role.md +++ b/docs/ru/sql-reference/statements/alter/role.md @@ -10,7 +10,7 @@ toc_title: ROLE СинтакÑиÑ: ``` sql -ALTER ROLE [IF EXISTS] name1 [ON CLUSTER cluster_name1] [RENAME TO new_name1] +ALTER ROLE [IF EXISTS] name1 [ON CLUSTER cluster_name1] [RENAME TO new_name1] [, name2 [ON CLUSTER cluster_name2] [RENAME TO new_name2] ...] [SETTINGS variable [= value] [MIN [=] min_value] [MAX [=] max_value] [READONLY|WRITABLE] | PROFILE 'profile_name'] [,...] ``` diff --git a/docs/ru/sql-reference/statements/alter/row-policy.md b/docs/ru/sql-reference/statements/alter/row-policy.md index cff4d4e497a..63214f89e12 100644 --- a/docs/ru/sql-reference/statements/alter/row-policy.md +++ b/docs/ru/sql-reference/statements/alter/row-policy.md @@ -10,7 +10,7 @@ toc_title: ROW POLICY СинтакÑиÑ: ``` sql -ALTER [ROW] POLICY [IF EXISTS] name1 [ON CLUSTER cluster_name1] ON [database1.]table1 [RENAME TO new_name1] +ALTER [ROW] POLICY [IF EXISTS] name1 [ON CLUSTER cluster_name1] ON [database1.]table1 [RENAME TO new_name1] [, name2 [ON CLUSTER cluster_name2] ON [database2.]table2 [RENAME TO new_name2] ...] 
[AS {PERMISSIVE | RESTRICTIVE}] [FOR SELECT] diff --git a/docs/ru/sql-reference/statements/alter/setting.md b/docs/ru/sql-reference/statements/alter/setting.md new file mode 100644 index 00000000000..1378bdf5cf8 --- /dev/null +++ b/docs/ru/sql-reference/statements/alter/setting.md @@ -0,0 +1,60 @@ +--- +toc_priority: 38 +toc_title: SETTING +--- + +# Изменение наÑтроек таблицы {#table_settings_manipulations} + +СущеÑтвуют запроÑÑ‹, которые изменÑÑŽÑ‚ наÑтройки таблицы или ÑбраÑывают их в Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¿Ð¾ умолчанию. Ð’ одном запроÑе можно изменить Ñразу неÑколько наÑтроек. +ЕÑли наÑтройка Ñ ÑƒÐºÐ°Ð·Ð°Ð½Ð½Ñ‹Ð¼ именем не ÑущеÑтвует, то генерируетÑÑ Ð¸Ñключение. + +**СинтакÑиÑ** + +``` sql +ALTER TABLE [db].name [ON CLUSTER cluster] MODIFY|RESET SETTING ... +``` + +!!! note "Примечание" + Эти запроÑÑ‹ могут применÑÑ‚ÑŒÑÑ Ñ‚Ð¾Ð»ÑŒÐºÐ¾ к таблицам на движке [MergeTree](../../../engines/table-engines/mergetree-family/mergetree.md). + + +## MODIFY SETTING {#alter_modify_setting} + +ИзменÑет наÑтройки таблицы. + +**СинтакÑиÑ** + +```sql +MODIFY SETTING setting_name=value [, ...] +``` + +**Пример** + +```sql +CREATE TABLE example_table (id UInt32, data String) ENGINE=MergeTree() ORDER BY id; + +ALTER TABLE example_table MODIFY SETTING max_part_loading_threads=8, max_parts_in_total=50000; +``` + +## RESET SETTING {#alter_reset_setting} + +СбраÑывает наÑтройки таблицы в Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¿Ð¾ умолчанию. ЕÑли наÑтройка уже находитÑÑ Ð² ÑоÑтоÑнии по умолчанию, то никакие дейÑÑ‚Ð²Ð¸Ñ Ð½Ðµ выполнÑÑŽÑ‚ÑÑ. + +**СинтакÑиÑ** + +```sql +RESET SETTING setting_name [, ...] +``` + +**Пример** + +```sql +CREATE TABLE example_table (id UInt32, data String) ENGINE=MergeTree() ORDER BY id + SETTINGS max_part_loading_threads=8; + +ALTER TABLE example_table RESET SETTING max_part_loading_threads; +``` + +**Смотрите также** + +- [ÐаÑтройки MergeTree таблиц](../../../operations/settings/merge-tree-settings.md) diff --git a/docs/ru/sql-reference/statements/alter/user.md b/docs/ru/sql-reference/statements/alter/user.md index bb57c3bb328..42970982fa1 100644 --- a/docs/ru/sql-reference/statements/alter/user.md +++ b/docs/ru/sql-reference/statements/alter/user.md @@ -10,14 +10,14 @@ toc_title: USER СинтакÑиÑ: ``` sql -ALTER USER [IF EXISTS] name1 [ON CLUSTER cluster_name1] [RENAME TO new_name1] +ALTER USER [IF EXISTS] name1 [ON CLUSTER cluster_name1] [RENAME TO new_name1] [, name2 [ON CLUSTER cluster_name2] [RENAME TO new_name2] ...] [NOT IDENTIFIED | IDENTIFIED {[WITH {no_password | plaintext_password | sha256_password | sha256_hash | double_sha1_password | double_sha1_hash}] BY {'password' | 'hash'}} | {WITH ldap SERVER 'server_name'} | {WITH kerberos [REALM 'realm']}] [[ADD | DROP] HOST {LOCAL | NAME 'name' | REGEXP 'name_regexp' | IP 'address' | LIKE 'pattern'} [,...] | ANY | NONE] [DEFAULT ROLE role [,...] | ALL | ALL EXCEPT role [,...] ] [GRANTEES {user | role | ANY | NONE} [,...] [EXCEPT {user | role} [,...]]] [SETTINGS variable [= value] [MIN [=] min_value] [MAX [=] max_value] [READONLY | WRITABLE] | PROFILE 'profile_name'] [,...] -``` +``` Ð”Ð»Ñ Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ `ALTER USER` необходима Ð¿Ñ€Ð¸Ð²Ð¸Ð»ÐµÐ³Ð¸Ñ [ALTER USER](../grant.md#grant-access-management). 
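To complement the `ALTER USER` grammar shown above, a minimal usage sketch; the user names and the setting value are hypothetical and only follow the documented syntax:

``` sql
-- Hypothetical user: rename it and pin a read-only settings constraint in one statement.
ALTER USER IF EXISTS robot_reader RENAME TO robot_reporting
    SETTINGS max_memory_usage = 10000000000 READONLY;
```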
diff --git a/docs/ru/sql-reference/statements/attach.md b/docs/ru/sql-reference/statements/attach.md index b135507b818..2ffd0fe8d5b 100644 --- a/docs/ru/sql-reference/statements/attach.md +++ b/docs/ru/sql-reference/statements/attach.md @@ -5,7 +5,7 @@ toc_title: ATTACH # ATTACH Statement {#attach} -ВыполнÑет подключение таблицы, например, при перемещении базы данных на другой Ñервер. +ВыполнÑет подключение таблицы, например, при перемещении базы данных на другой Ñервер. Ð—Ð°Ð¿Ñ€Ð¾Ñ Ð½Ðµ Ñоздаёт данные на диÑке, а предполагает, что данные уже лежат в ÑоответÑтвующих меÑтах, и вÑего лишь добавлÑет информацию о таблице на Ñервер. ПоÑле Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñа `ATTACH` Ñервер будет знать о ÑущеÑтвовании таблицы. diff --git a/docs/ru/sql-reference/statements/check-table.md b/docs/ru/sql-reference/statements/check-table.md index 9592c1a5bc2..87330d75383 100644 --- a/docs/ru/sql-reference/statements/check-table.md +++ b/docs/ru/sql-reference/statements/check-table.md @@ -31,7 +31,7 @@ CHECK TABLE [db.]name ## Проверка таблиц ÑемейÑтва MergeTree {#checking-mergetree-tables} -Ð”Ð»Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ† ÑемейÑтва `MergeTree` еÑли [check_query_single_value_result](../../operations/settings/settings.md#check_query_single_value_result) = 0, Ð·Ð°Ð¿Ñ€Ð¾Ñ `CHECK TABLE` возвращает ÑÑ‚Ð°Ñ‚ÑƒÑ ÐºÐ°Ð¶Ð´Ð¾Ð³Ð¾ куÑка данных таблицы на локальном Ñервере. +Ð”Ð»Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ† ÑемейÑтва `MergeTree` еÑли [check_query_single_value_result](../../operations/settings/settings.md#check_query_single_value_result) = 0, Ð·Ð°Ð¿Ñ€Ð¾Ñ `CHECK TABLE` возвращает ÑÑ‚Ð°Ñ‚ÑƒÑ ÐºÐ°Ð¶Ð´Ð¾Ð³Ð¾ куÑка данных таблицы на локальном Ñервере. ```sql SET check_query_single_value_result = 0; diff --git a/docs/ru/sql-reference/statements/create/row-policy.md b/docs/ru/sql-reference/statements/create/row-policy.md index 6fe1dc45815..8f32c916309 100644 --- a/docs/ru/sql-reference/statements/create/row-policy.md +++ b/docs/ru/sql-reference/statements/create/row-policy.md @@ -10,8 +10,8 @@ toc_title: "Политика доÑтупа" СинтакÑиÑ: ``` sql -CREATE [ROW] POLICY [IF NOT EXISTS | OR REPLACE] policy_name1 [ON CLUSTER cluster_name1] ON [db1.]table1 - [, policy_name2 [ON CLUSTER cluster_name2] ON [db2.]table2 ...] +CREATE [ROW] POLICY [IF NOT EXISTS | OR REPLACE] policy_name1 [ON CLUSTER cluster_name1] ON [db1.]table1 + [, policy_name2 [ON CLUSTER cluster_name2] ON [db2.]table2 ...] [AS {PERMISSIVE | RESTRICTIVE}] [FOR SELECT] USING condition [TO {role [,...] | ALL | ALL EXCEPT role [,...]}] @@ -33,7 +33,7 @@ CREATE [ROW] POLICY [IF NOT EXISTS | OR REPLACE] policy_name1 [ON CLUSTER cluste `CREATE ROW POLICY pol1 ON mydb.table1 USING b=1 TO mira, peter` запретит пользователÑм `mira` и `peter` видеть Ñтроки Ñ `b != 1`, и еще запретит вÑем оÑтальным пользователÑм (например, пользователю `paul`) видеть какие-либо Ñтроки вообще из таблицы `mydb.table1`. 
- + ЕÑли Ñто нежелательно, такое поведение можно иÑправить, определив дополнительную политику: `CREATE ROW POLICY pol2 ON mydb.table1 USING 1 TO ALL EXCEPT mira, peter` diff --git a/docs/ru/sql-reference/statements/create/settings-profile.md b/docs/ru/sql-reference/statements/create/settings-profile.md index 522caf04c80..5d5b47f2efa 100644 --- a/docs/ru/sql-reference/statements/create/settings-profile.md +++ b/docs/ru/sql-reference/statements/create/settings-profile.md @@ -10,7 +10,7 @@ toc_title: "Профиль наÑтроек" СинтакÑиÑ: ``` sql -CREATE SETTINGS PROFILE [IF NOT EXISTS | OR REPLACE] TO name1 [ON CLUSTER cluster_name1] +CREATE SETTINGS PROFILE [IF NOT EXISTS | OR REPLACE] TO name1 [ON CLUSTER cluster_name1] [, name2 [ON CLUSTER cluster_name2] ...] [SETTINGS variable [= value] [MIN [=] min_value] [MAX [=] max_value] [READONLY|WRITABLE] | INHERIT 'profile_name'] [,...] ``` diff --git a/docs/ru/sql-reference/statements/create/table.md b/docs/ru/sql-reference/statements/create/table.md index 1d65d82b24c..5523557c18a 100644 --- a/docs/ru/sql-reference/statements/create/table.md +++ b/docs/ru/sql-reference/statements/create/table.md @@ -74,7 +74,7 @@ SELECT x, toTypeName(x) FROM t1; ## Модификатор NULL или NOT NULL {#null-modifiers} -Модификатор `NULL` или `NOT NULL`, указанный поÑле типа данных в определении Ñтолбца, позволÑет или не позволÑет типу данных быть [Nullable](../../../sql-reference/data-types/nullable.md#data_type-nullable). +Модификатор `NULL` или `NOT NULL`, указанный поÑле типа данных в определении Ñтолбца, позволÑет или не позволÑет типу данных быть [Nullable](../../../sql-reference/data-types/nullable.md#data_type-nullable). ЕÑли тип не `Nullable` и указан модификатор `NULL`, то Ñтолбец будет иметь тип `Nullable`; еÑли `NOT NULL`, то не `Nullable`. Ðапример, `INT NULL` то же, что и `Nullable(INT)`. ЕÑли тип `Nullable` и указаны модификаторы `NULL` или `NOT NULL`, то будет вызвано иÑключение. @@ -129,11 +129,11 @@ SELECT x, toTypeName(x) FROM t1; - в ÑпиÑке Ñтолбцов: ``` sql -CREATE TABLE db.table_name -( - name1 type1, name2 type2, ..., +CREATE TABLE db.table_name +( + name1 type1, name2 type2, ..., PRIMARY KEY(expr1[, expr2,...])] -) +) ENGINE = engine; ``` @@ -141,9 +141,9 @@ ENGINE = engine; ``` sql CREATE TABLE db.table_name -( +( name1 type1, name2 type2, ... -) +) ENGINE = engine PRIMARY KEY(expr1[, expr2,...]); ``` @@ -199,7 +199,7 @@ ENGINE = ALTER TABLE codec_example MODIFY COLUMN float_value CODEC(Default); ``` -Кодеки можно поÑледовательно комбинировать, например, `CODEC(Delta, Default)`. +Кодеки можно поÑледовательно комбинировать, например, `CODEC(Delta, Default)`. Чтобы выбрать наиболее подходÑщую Ð´Ð»Ñ Ð²Ð°ÑˆÐµÐ³Ð¾ проекта комбинацию кодеков, необходимо провеÑти Ñравнительные теÑÑ‚Ñ‹, подобные тем, что опиÑаны в Ñтатье Altinity [New Encodings to Improve ClickHouse Efficiency](https://www.altinity.com/blog/2019/7/new-encodings-to-improve-clickhouse). Ð”Ð»Ñ Ñтолбцов типа `ALIAS` кодеки не применÑÑŽÑ‚ÑÑ. @@ -352,7 +352,7 @@ SELECT * FROM base.t1; !!!note "Замечание" Комментарий поддерживаетÑÑ Ð´Ð»Ñ Ð²Ñех движков таблиц, кроме [Kafka](../../../engines/table-engines/integrations/kafka.md), [RabbitMQ](../../../engines/table-engines/integrations/rabbitmq.md) и [EmbeddedRocksDB](../../../engines/table-engines/integrations/embedded-rocksdb.md). 
- + **СинтакÑиÑ** ``` sql @@ -363,7 +363,7 @@ CREATE TABLE db.table_name ENGINE = engine COMMENT 'Comment' ``` - + **Пример** ЗапроÑ: diff --git a/docs/ru/sql-reference/statements/create/user.md b/docs/ru/sql-reference/statements/create/user.md index ea64bff061b..22efaa71bfc 100644 --- a/docs/ru/sql-reference/statements/create/user.md +++ b/docs/ru/sql-reference/statements/create/user.md @@ -10,7 +10,7 @@ toc_title: "Пользователь" СинтакÑиÑ: ``` sql -CREATE USER [IF NOT EXISTS | OR REPLACE] name1 [ON CLUSTER cluster_name1] +CREATE USER [IF NOT EXISTS | OR REPLACE] name1 [ON CLUSTER cluster_name1] [, name2 [ON CLUSTER cluster_name2] ...] [NOT IDENTIFIED | IDENTIFIED {[WITH {no_password | plaintext_password | sha256_password | sha256_hash | double_sha1_password | double_sha1_hash}] BY {'password' | 'hash'}} | {WITH ldap SERVER 'server_name'} | {WITH kerberos [REALM 'realm']}] [HOST {LOCAL | NAME 'name' | REGEXP 'name_regexp' | IP 'address' | LIKE 'pattern'} [,...] | ANY | NONE] @@ -53,7 +53,7 @@ CREATE USER [IF NOT EXISTS | OR REPLACE] name1 [ON CLUSTER cluster_name1] !!! info "Внимание" ClickHouse трактует конÑтрукцию `user_name@'address'` как Ð¸Ð¼Ñ Ð¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ñ‚ÐµÐ»Ñ Ñ†ÐµÐ»Ð¸ÐºÐ¾Ð¼. То еÑÑ‚ÑŒ техничеÑки вы можете Ñоздать неÑколько пользователей Ñ Ð¾Ð´Ð¸Ð½Ð°ÐºÐ¾Ð²Ñ‹Ð¼Ð¸ `user_name`, но разными чаÑÑ‚Ñми конÑтрукции поÑле `@`, но лучше так не делать. - + ## Ð¡ÐµÐºÑ†Ð¸Ñ GRANTEES {#grantees} УказываютÑÑ Ð¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ñ‚ÐµÐ»Ð¸ или роли, которым разрешено получать [привилегии](../../../sql-reference/statements/grant.md#grant-privileges) от Ñоздаваемого Ð¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ñ‚ÐµÐ»Ñ Ð¿Ñ€Ð¸ уÑловии, что Ñтому пользователю также предоÑтавлен веÑÑŒ необходимый доÑтуп Ñ Ð¸Ñпользованием [GRANT OPTION](../../../sql-reference/statements/grant.md#grant-privigele-syntax). Параметры Ñекции `GRANTEES`: diff --git a/docs/ru/sql-reference/statements/create/view.md b/docs/ru/sql-reference/statements/create/view.md index 4e34b5e3b6e..0be29b12aea 100644 --- a/docs/ru/sql-reference/statements/create/view.md +++ b/docs/ru/sql-reference/statements/create/view.md @@ -57,7 +57,7 @@ CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db.]table_name [ON CLUSTER] [TO[db.]na Ðедоработано выполнение запроÑов `ALTER` над материализованными предÑтавлениÑми, поÑтому они могут быть неудобными Ð´Ð»Ñ Ð¸ÑпользованиÑ. ЕÑли материализованное предÑтавление иÑпользует конÑтрукцию `TO [db.]name`, то можно выполнить `DETACH` предÑтавлениÑ, `ALTER` Ð´Ð»Ñ Ñ†ÐµÐ»ÐµÐ²Ð¾Ð¹ таблицы и поÑледующий `ATTACH` ранее отÑоединенного (`DETACH`) предÑтавлениÑ. Обратите внимание, что работа материализованного предÑÑ‚Ð°Ð²Ð»ÐµÐ½Ð¸Ñ Ð½Ð°Ñ…Ð¾Ð´Ð¸Ñ‚ÑÑ Ð¿Ð¾Ð´ влиÑнием наÑтройки [optimize_on_insert](../../../operations/settings/settings.md#optimize-on-insert). Перед вÑтавкой данных в таблицу проиÑходит их ÑлиÑние. - + ПредÑÑ‚Ð°Ð²Ð»ÐµÐ½Ð¸Ñ Ð²Ñ‹Ð³Ð»ÑдÑÑ‚ так же, как обычные таблицы. Ðапример, они перечиÑлÑÑŽÑ‚ÑÑ Ð² результате запроÑа `SHOW TABLES`. Чтобы удалить предÑтавление, Ñледует иÑпользовать [DROP VIEW](../../../sql-reference/statements/drop.md#drop-view). Впрочем, `DROP TABLE` тоже работает Ð´Ð»Ñ Ð¿Ñ€ÐµÐ´Ñтавлений. @@ -65,8 +65,8 @@ CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db.]table_name [ON CLUSTER] [TO[db.]na ## LIVE-предÑÑ‚Ð°Ð²Ð»ÐµÐ½Ð¸Ñ {#live-view} !!! important "Важно" - ПредÑÑ‚Ð°Ð²Ð»ÐµÐ½Ð¸Ñ `LIVE VIEW` ÑвлÑÑŽÑ‚ÑÑ ÑкÑпериментальной возможноÑтью. Их иÑпользование может повлечь потерю ÑовмеÑтимоÑти в будущих верÑиÑÑ…. 
- Чтобы иÑпользовать `LIVE VIEW` и запроÑÑ‹ `WATCH`, включите наÑтройку [allow_experimental_live_view](../../../operations/settings/settings.md#allow-experimental-live-view). + ПредÑÑ‚Ð°Ð²Ð»ÐµÐ½Ð¸Ñ `LIVE VIEW` ÑвлÑÑŽÑ‚ÑÑ ÑкÑпериментальной возможноÑтью. Их иÑпользование может повлечь потерю ÑовмеÑтимоÑти в будущих верÑиÑÑ…. + Чтобы иÑпользовать `LIVE VIEW` и запроÑÑ‹ `WATCH`, включите наÑтройку [allow_experimental_live_view](../../../operations/settings/settings.md#allow-experimental-live-view). ```sql CREATE LIVE VIEW [IF NOT EXISTS] [db.]table_name [WITH [TIMEOUT [value_in_sec] [AND]] [REFRESH [value_in_sec]]] AS SELECT ... @@ -154,7 +154,7 @@ SELECT * FROM [db.]live_view WHERE ... ### Ð¡ÐµÐºÑ†Ð¸Ñ WITH TIMEOUT {#live-view-with-timeout} -LIVE-предÑтавление, Ñозданное Ñ Ð¿Ð°Ñ€Ð°Ð¼ÐµÑ‚Ñ€Ð¾Ð¼ `WITH TIMEOUT`, будет автоматичеÑки удалено через определенное количеÑтво Ñекунд Ñ Ð¼Ð¾Ð¼ÐµÐ½Ñ‚Ð° предыдущего запроÑа [WATCH](../../../sql-reference/statements/watch.md), примененного к данному LIVE-предÑтавлению. +LIVE-предÑтавление, Ñозданное Ñ Ð¿Ð°Ñ€Ð°Ð¼ÐµÑ‚Ñ€Ð¾Ð¼ `WITH TIMEOUT`, будет автоматичеÑки удалено через определенное количеÑтво Ñекунд Ñ Ð¼Ð¾Ð¼ÐµÐ½Ñ‚Ð° предыдущего запроÑа [WATCH](../../../sql-reference/statements/watch.md), примененного к данному LIVE-предÑтавлению. ```sql CREATE LIVE VIEW [db.]table_name WITH TIMEOUT [value_in_sec] AS SELECT ... @@ -198,7 +198,7 @@ WATCH lv; └─────────────────────┴──────────┘ ``` -Параметры `WITH TIMEOUT` и `WITH REFRESH` можно Ñочетать Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ `AND`. +Параметры `WITH TIMEOUT` и `WITH REFRESH` можно Ñочетать Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ `AND`. ```sql CREATE LIVE VIEW [db.]table_name WITH TIMEOUT [value_in_sec] AND REFRESH [value_in_sec] AS SELECT ... @@ -217,7 +217,7 @@ WATCH lv; ``` ``` -Code: 60. DB::Exception: Received from localhost:9000. DB::Exception: Table default.lv doesn't exist.. +Code: 60. DB::Exception: Received from localhost:9000. DB::Exception: Table default.lv doesn't exist.. ``` ### ИÑпользование {#live-view-usage} diff --git a/docs/ru/sql-reference/statements/explain.md b/docs/ru/sql-reference/statements/explain.md index c925e7030a7..c11c33ae99a 100644 --- a/docs/ru/sql-reference/statements/explain.md +++ b/docs/ru/sql-reference/statements/explain.md @@ -135,7 +135,7 @@ Union !!! note "Примечание" Оценка ÑтоимоÑти Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ ÑˆÐ°Ð³Ð° и запроÑа не поддерживаетÑÑ. - + При `json = 1` шаги Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñа выводÑÑ‚ÑÑ Ð² формате JSON. Каждый узел — Ñто Ñловарь, в котором вÑегда еÑÑ‚ÑŒ ключи `Node Type` и `Plans`. `Node Type` — Ñто Ñтрока Ñ Ð¸Ð¼ÐµÐ½ÐµÐ¼ шага. `Plans` — Ñто маÑÑив Ñ Ð¾Ð¿Ð¸ÑаниÑми дочерних шагов. Другие дополнительные ключи могут быть добавлены в завиÑимоÑти от типа узла и наÑтроек. Пример: @@ -240,7 +240,7 @@ EXPLAIN json = 1, description = 0, header = 1 SELECT 1, 2 + dummy; } ] ``` - + При `indexes` = 1 добавлÑетÑÑ ÐºÐ»ÑŽÑ‡ `Indexes`. Он Ñодержит маÑÑив иÑпользуемых индекÑов. Каждый Ð¸Ð½Ð´ÐµÐºÑ Ð¾Ð¿Ð¸ÑываетÑÑ ÐºÐ°Ðº Ñтрока в формате JSON Ñ ÐºÐ»ÑŽÑ‡Ð¾Ð¼ `Type` (`MinMax`, `Partition`, `PrimaryKey` или `Skip`) и дополнительные ключи: - `Name` — Ð¸Ð¼Ñ Ð¸Ð½Ð´ÐµÐºÑа (на данный момент иÑпользуетÑÑ Ñ‚Ð¾Ð»ÑŒÐºÐ¾ Ð´Ð»Ñ Ð¸Ð½Ð´ÐµÐºÑа `Skip`). 
diff --git a/docs/ru/sql-reference/statements/grant.md b/docs/ru/sql-reference/statements/grant.md index 05ffaa22bbd..8d6605e1571 100644 --- a/docs/ru/sql-reference/statements/grant.md +++ b/docs/ru/sql-reference/statements/grant.md @@ -13,7 +13,7 @@ toc_title: GRANT ## СинтакÑÐ¸Ñ Ð¿Ñ€Ð¸ÑÐ²Ð¾ÐµÐ½Ð¸Ñ Ð¿Ñ€Ð¸Ð²Ð¸Ð»ÐµÐ³Ð¸Ð¹ {#grant-privigele-syntax} ```sql -GRANT [ON CLUSTER cluster_name] privilege[(column_name [,...])] [,...] ON {db.table|db.*|*.*|table|*} TO {user | role | CURRENT_USER} [,...] [WITH GRANT OPTION] +GRANT [ON CLUSTER cluster_name] privilege[(column_name [,...])] [,...] ON {db.table|db.*|*.*|table|*} TO {user | role | CURRENT_USER} [,...] [WITH GRANT OPTION] [WITH REPLACE OPTION] ``` - `privilege` — Тип привилегии @@ -21,18 +21,20 @@ GRANT [ON CLUSTER cluster_name] privilege[(column_name [,...])] [,...] ON {db.ta - `user` — Пользователь ClickHouse. `WITH GRANT OPTION` разрешает пользователю или роли выполнÑÑ‚ÑŒ Ð·Ð°Ð¿Ñ€Ð¾Ñ `GRANT`. Пользователь может выдавать только те привилегии, которые еÑÑ‚ÑŒ у него, той же или меньшей облаÑти дейÑтвий. +`WITH REPLACE OPTION` заменÑет вÑе Ñтарые привилегии новыми привилегиÑми Ð´Ð»Ñ `user` или `role`, ЕÑли не указано, добавьте новые привилегии Ð´Ð»Ñ Ñтарых. ## СинтакÑÐ¸Ñ Ð½Ð°Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ñ€Ð¾Ð»ÐµÐ¹ {#assign-role-syntax} ```sql -GRANT [ON CLUSTER cluster_name] role [,...] TO {user | another_role | CURRENT_USER} [,...] [WITH ADMIN OPTION] +GRANT [ON CLUSTER cluster_name] role [,...] TO {user | another_role | CURRENT_USER} [,...] [WITH ADMIN OPTION] [WITH REPLACE OPTION] ``` - `role` — Роль Ð¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ñ‚ÐµÐ»Ñ ClickHouse. - `user` — Пользователь ClickHouse. `WITH ADMIN OPTION` приÑваивает привилегию [ADMIN OPTION](#admin-option-privilege) пользователю или роли. +`WITH REPLACE OPTION` заменÑет вÑе Ñтарые роли новыми ролÑми Ð´Ð»Ñ Ð¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ñ‚ÐµÐ»Ñ `user` или `role`, ЕÑли не указано, добавьте новые роли в Ñтарые. ## ИÑпользование {#grant-usage} @@ -84,7 +86,7 @@ GRANT SELECT(x,y) ON db.table TO john WITH GRANT OPTION - `ALTER RENAME COLUMN` - `ALTER INDEX` - `ALTER ORDER BY` - - `ALTER SAMPLE BY` + - `ALTER SAMPLE BY` - `ALTER ADD INDEX` - `ALTER DROP INDEX` - `ALTER MATERIALIZE INDEX` @@ -183,7 +185,7 @@ GRANT SELECT(x,y) ON db.table TO john WITH GRANT OPTION Примеры того, как трактуетÑÑ Ð´Ð°Ð½Ð½Ð°Ñ Ð¸ÐµÑ€Ð°Ñ€Ñ…Ð¸Ñ: -- ÐŸÑ€Ð¸Ð²Ð¸Ð»ÐµÐ³Ð¸Ñ `ALTER` включает вÑе оÑтальные `ALTER*` привилегии. +- ÐŸÑ€Ð¸Ð²Ð¸Ð»ÐµÐ³Ð¸Ñ `ALTER` включает вÑе оÑтальные `ALTER*` привилегии. - `ALTER CONSTRAINT` включает `ALTER ADD CONSTRAINT` и `ALTER DROP CONSTRAINT`. Привилегии применÑÑŽÑ‚ÑÑ Ð½Ð° разных уровнÑÑ…. Уровень определÑет ÑинтакÑÐ¸Ñ Ð¿Ñ€Ð¸ÑÐ²Ð°Ð¸Ð²Ð°Ð½Ð¸Ñ Ð¿Ñ€Ð¸Ð²Ð¸Ð»ÐµÐ³Ð¸Ð¸. @@ -257,7 +259,7 @@ GRANT INSERT(x,y) ON db.table TO john Разрешает выполнÑÑ‚ÑŒ запроÑÑ‹ [ALTER](alter/index.md) в ÑоответÑтвии Ñо Ñледующей иерархией привилегий: -- `ALTER`. Уровень: `COLUMN`. +- `ALTER`. Уровень: `COLUMN`. - `ALTER TABLE`. Уровень: `GROUP` - `ALTER UPDATE`. Уровень: `COLUMN`. ÐлиаÑÑ‹: `UPDATE` - `ALTER DELETE`. Уровень: `COLUMN`. ÐлиаÑÑ‹: `DELETE` @@ -270,7 +272,7 @@ GRANT INSERT(x,y) ON db.table TO john - `ALTER RENAME COLUMN`. Уровень: `COLUMN`. ÐлиаÑÑ‹: `RENAME COLUMN` - `ALTER INDEX`. Уровень: `GROUP`. ÐлиаÑÑ‹: `INDEX` - `ALTER ORDER BY`. Уровень: `TABLE`. ÐлиаÑÑ‹: `ALTER MODIFY ORDER BY`, `MODIFY ORDER BY` - - `ALTER SAMPLE BY`. Уровень: `TABLE`. ÐлиаÑÑ‹: `ALTER MODIFY SAMPLE BY`, `MODIFY SAMPLE BY` + - `ALTER SAMPLE BY`. Уровень: `TABLE`. ÐлиаÑÑ‹: `ALTER MODIFY SAMPLE BY`, `MODIFY SAMPLE BY` - `ALTER ADD INDEX`. Уровень: `TABLE`. 
ÐлиаÑÑ‹: `ADD INDEX` - `ALTER DROP INDEX`. Уровень: `TABLE`. ÐлиаÑÑ‹: `DROP INDEX` - `ALTER MATERIALIZE INDEX`. Уровень: `TABLE`. ÐлиаÑÑ‹: `MATERIALIZE INDEX` @@ -282,7 +284,7 @@ GRANT INSERT(x,y) ON db.table TO john - `ALTER MATERIALIZE TTL`. Уровень: `TABLE`. ÐлиаÑÑ‹: `MATERIALIZE TTL` - `ALTER SETTINGS`. Уровень: `TABLE`. ÐлиаÑÑ‹: `ALTER SETTING`, `ALTER MODIFY SETTING`, `MODIFY SETTING` - `ALTER MOVE PARTITION`. Уровень: `TABLE`. ÐлиаÑÑ‹: `ALTER MOVE PART`, `MOVE PARTITION`, `MOVE PART` - - `ALTER FETCH PARTITION`. Уровень: `TABLE`. ÐлиаÑÑ‹: `FETCH PARTITION` + - `ALTER FETCH PARTITION`. Уровень: `TABLE`. ÐлиаÑÑ‹: `ALTER FETCH PART`, `FETCH PARTITION`, `FETCH PART` - `ALTER FREEZE PARTITION`. Уровень: `TABLE`. ÐлиаÑÑ‹: `FREEZE PARTITION` - `ALTER VIEW` Уровень: `GROUP` - `ALTER VIEW REFRESH `. Уровень: `VIEW`. ÐлиаÑÑ‹: `ALTER LIVE VIEW REFRESH`, `REFRESH VIEW` @@ -290,7 +292,7 @@ GRANT INSERT(x,y) ON db.table TO john Примеры того, как трактуетÑÑ Ð´Ð°Ð½Ð½Ð°Ñ Ð¸ÐµÑ€Ð°Ñ€Ñ…Ð¸Ñ: -- ÐŸÑ€Ð¸Ð²Ð¸Ð»ÐµÐ³Ð¸Ñ `ALTER` включает вÑе оÑтальные `ALTER*` привилегии. +- ÐŸÑ€Ð¸Ð²Ð¸Ð»ÐµÐ³Ð¸Ñ `ALTER` включает вÑе оÑтальные `ALTER*` привилегии. - `ALTER CONSTRAINT` включает `ALTER ADD CONSTRAINT` и `ALTER DROP CONSTRAINT`. **Дополнительно** @@ -481,4 +483,3 @@ GRANT INSERT(x,y) ON db.table TO john ### ADMIN OPTION {#admin-option-privilege} ÐŸÑ€Ð¸Ð²Ð¸Ð»ÐµÐ³Ð¸Ñ `ADMIN OPTION` разрешает пользователю назначать Ñвои роли другому пользователю. - diff --git a/docs/ru/sql-reference/statements/insert-into.md b/docs/ru/sql-reference/statements/insert-into.md index 328f1023624..da9f3a11101 100644 --- a/docs/ru/sql-reference/statements/insert-into.md +++ b/docs/ru/sql-reference/statements/insert-into.md @@ -107,7 +107,7 @@ INSERT INTO [db.]table [(c1, c2, c3)] SELECT ... Ð”Ð»Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ‡Ð½Ð¾Ð¹ функции [input()](../table-functions/input.md) поÑле Ñекции `SELECT` должна Ñледовать ÑÐµÐºÑ†Ð¸Ñ `FORMAT`. -Чтобы вÑтавить значение по умолчанию вмеÑто `NULL` в Ñтолбец, который не позволÑет хранить `NULL`, включите наÑтройку [insert_null_as_default](../../operations/settings/settings.md#insert_null_as_default). +Чтобы вÑтавить значение по умолчанию вмеÑто `NULL` в Ñтолбец, который не позволÑет хранить `NULL`, включите наÑтройку [insert_null_as_default](../../operations/settings/settings.md#insert_null_as_default). ### Ð—Ð°Ð¼ÐµÑ‡Ð°Ð½Ð¸Ñ Ð¾ производительноÑти {#zamechaniia-o-proizvoditelnosti} diff --git a/docs/ru/sql-reference/statements/optimize.md b/docs/ru/sql-reference/statements/optimize.md index 70503ec4de9..1f0c5a0ebe9 100644 --- a/docs/ru/sql-reference/statements/optimize.md +++ b/docs/ru/sql-reference/statements/optimize.md @@ -62,12 +62,12 @@ CREATE TABLE example ( materialized_value UInt32 MATERIALIZED 12345, aliased_value UInt32 ALIAS 2, PRIMARY KEY primary_key -) ENGINE=MergeTree  +) ENGINE=MergeTree PARTITION BY partition_key ORDER BY (primary_key, secondary_key); ``` ``` sql -INSERT INTO example (primary_key, secondary_key, value, partition_key) +INSERT INTO example (primary_key, secondary_key, value, partition_key) VALUES (0, 0, 0, 0), (0, 0, 0, 0), (1, 1, 2, 2), (1, 1, 2, 3), (1, 1, 3, 3); ``` ``` sql diff --git a/docs/ru/sql-reference/statements/select/group-by.md b/docs/ru/sql-reference/statements/select/group-by.md index e94091f156d..2f0cabd14fb 100644 --- a/docs/ru/sql-reference/statements/select/group-by.md +++ b/docs/ru/sql-reference/statements/select/group-by.md @@ -52,7 +52,7 @@ toc_title: GROUP BY !!! 
note "Примечание" ЕÑли в запроÑе еÑÑ‚ÑŒ ÑÐµÐºÑ†Ð¸Ñ [HAVING](../../../sql-reference/statements/select/having.md), она может повлиÑÑ‚ÑŒ на результаты раÑчета подытогов. -**Пример** +**Пример** РаÑÑмотрим таблицу t: @@ -105,14 +105,14 @@ SELECT year, month, day, count(*) FROM t GROUP BY year, month, day WITH ROLLUP; ## Модификатор WITH CUBE {#with-cube-modifier} -Модификатор `WITH CUBE` применÑтеÑÑ Ð´Ð»Ñ Ñ€Ð°Ñчета подытогов по вÑем комбинациÑм группировки ключевых выражений в ÑпиÑке `GROUP BY`. +Модификатор `WITH CUBE` применÑтеÑÑ Ð´Ð»Ñ Ñ€Ð°Ñчета подытогов по вÑем комбинациÑм группировки ключевых выражений в ÑпиÑке `GROUP BY`. Строки Ñ Ð¿Ð¾Ð´Ñ‹Ñ‚Ð¾Ð³Ð°Ð¼Ð¸ добавлÑÑŽÑ‚ÑÑ Ð² конец результирующей таблицы. Ð’ колонках, по которым выполнÑетÑÑ Ð³Ñ€ÑƒÐ¿Ð¿Ð¸Ñ€Ð¾Ð²ÐºÐ°, указываетÑÑ Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ðµ `0` или пуÑÑ‚Ð°Ñ Ñтрока. !!! note "Примечание" ЕÑли в запроÑе еÑÑ‚ÑŒ ÑÐµÐºÑ†Ð¸Ñ [HAVING](../../../sql-reference/statements/select/having.md), она может повлиÑÑ‚ÑŒ на результаты раÑчета подытогов. -**Пример** +**Пример** РаÑÑмотрим таблицу t: @@ -135,13 +135,13 @@ SELECT year, month, day, count(*) FROM t GROUP BY year, month, day WITH CUBE; ПоÑкольку ÑÐµÐºÑ†Ð¸Ñ `GROUP BY` Ñодержит три ключевых выражениÑ, результат ÑоÑтоит из воÑьми таблиц Ñ Ð¿Ð¾Ð´Ñ‹Ñ‚Ð¾Ð³Ð°Ð¼Ð¸ — по таблице Ð´Ð»Ñ ÐºÐ°Ð¶Ð´Ð¾Ð¹ комбинации ключевых выражений: -- `GROUP BY year, month, day` -- `GROUP BY year, month` +- `GROUP BY year, month, day` +- `GROUP BY year, month` - `GROUP BY year, day` -- `GROUP BY year` -- `GROUP BY month, day` -- `GROUP BY month` -- `GROUP BY day` +- `GROUP BY year` +- `GROUP BY month, day` +- `GROUP BY month` +- `GROUP BY day` - и общий итог. Колонки, которые не учаÑтвуют в `GROUP BY`, заполнены нулÑми. @@ -254,7 +254,7 @@ GROUP BY вычиÑлÑет Ð´Ð»Ñ ÐºÐ°Ð¶Ð´Ð¾Ð³Ð¾ вÑтретившегоÑÑ ### ÐžÐ¿Ñ‚Ð¸Ð¼Ð¸Ð·Ð°Ñ†Ð¸Ñ GROUP BY Ð´Ð»Ñ Ð¾Ñ‚Ñортированных таблиц {#aggregation-in-order} -Ðгрегирование данных в отÑортированных таблицах может выполнÑÑ‚ÑŒÑÑ Ð±Ð¾Ð»ÐµÐµ Ñффективно, еÑли выражение `GROUP BY` Ñодержит Ñ…Ð¾Ñ‚Ñ Ð±Ñ‹ Ð¿Ñ€ÐµÑ„Ð¸ÐºÑ ÐºÐ»ÑŽÑ‡Ð° Ñортировки или инъективную функцию Ñ Ñтим ключом. Ð’ таких ÑлучаÑÑ… в момент ÑÑ‡Ð¸Ñ‚Ñ‹Ð²Ð°Ð½Ð¸Ñ Ð¸Ð· таблицы нового Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ ÐºÐ»ÑŽÑ‡Ð° Ñортировки промежуточный результат Ð°Ð³Ñ€ÐµÐ³Ð¸Ñ€Ð¾Ð²Ð°Ð½Ð¸Ñ Ð±ÑƒÐ´ÐµÑ‚ финализироватьÑÑ Ð¸ отправлÑÑ‚ÑŒÑÑ Ð½Ð° клиентÑкую машину. Чтобы включить такой ÑпоÑоб Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñа, иÑпользуйте наÑтройку [optimize_aggregation_in_order](../../../operations/settings/settings.md#optimize_aggregation_in_order). ÐŸÐ¾Ð´Ð¾Ð±Ð½Ð°Ñ Ð¾Ð¿Ñ‚Ð¸Ð¼Ð¸Ð·Ð°Ñ†Ð¸Ñ Ð¿Ð¾Ð·Ð²Ð¾Ð»Ñет ÑÑкономить памÑÑ‚ÑŒ во Ð²Ñ€ÐµÐ¼Ñ Ð°Ð³Ñ€ÐµÐ³Ð°Ñ†Ð¸Ð¸, но в некоторых ÑлучаÑÑ… может привеÑти к увеличению времени Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñа. +Ðгрегирование данных в отÑортированных таблицах может выполнÑÑ‚ÑŒÑÑ Ð±Ð¾Ð»ÐµÐµ Ñффективно, еÑли выражение `GROUP BY` Ñодержит Ñ…Ð¾Ñ‚Ñ Ð±Ñ‹ Ð¿Ñ€ÐµÑ„Ð¸ÐºÑ ÐºÐ»ÑŽÑ‡Ð° Ñортировки или инъективную функцию Ñ Ñтим ключом. Ð’ таких ÑлучаÑÑ… в момент ÑÑ‡Ð¸Ñ‚Ñ‹Ð²Ð°Ð½Ð¸Ñ Ð¸Ð· таблицы нового Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ ÐºÐ»ÑŽÑ‡Ð° Ñортировки промежуточный результат Ð°Ð³Ñ€ÐµÐ³Ð¸Ñ€Ð¾Ð²Ð°Ð½Ð¸Ñ Ð±ÑƒÐ´ÐµÑ‚ финализироватьÑÑ Ð¸ отправлÑÑ‚ÑŒÑÑ Ð½Ð° клиентÑкую машину. Чтобы включить такой ÑпоÑоб Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñа, иÑпользуйте наÑтройку [optimize_aggregation_in_order](../../../operations/settings/settings.md#optimize_aggregation_in_order). ÐŸÐ¾Ð´Ð¾Ð±Ð½Ð°Ñ Ð¾Ð¿Ñ‚Ð¸Ð¼Ð¸Ð·Ð°Ñ†Ð¸Ñ Ð¿Ð¾Ð·Ð²Ð¾Ð»Ñет ÑÑкономить памÑÑ‚ÑŒ во Ð²Ñ€ÐµÐ¼Ñ Ð°Ð³Ñ€ÐµÐ³Ð°Ñ†Ð¸Ð¸, но в некоторых ÑлучаÑÑ… может привеÑти к увеличению времени Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñа. 
### Группировка во внешней памÑти {#select-group-by-in-external-memory} diff --git a/docs/ru/sql-reference/statements/select/index.md b/docs/ru/sql-reference/statements/select/index.md index a3b4e889397..a0a862cbf55 100644 --- a/docs/ru/sql-reference/statements/select/index.md +++ b/docs/ru/sql-reference/statements/select/index.md @@ -20,7 +20,7 @@ SELECT [DISTINCT] expr_list [WHERE expr] [GROUP BY expr_list] [WITH ROLLUP|WITH CUBE] [WITH TOTALS] [HAVING expr] -[ORDER BY expr_list] [WITH FILL] [FROM expr] [TO expr] [STEP expr] +[ORDER BY expr_list] [WITH FILL] [FROM expr] [TO expr] [STEP expr] [LIMIT [offset_value, ]n BY columns] [LIMIT [n, ]m] [WITH TIES] [SETTINGS ...] @@ -147,7 +147,7 @@ Code: 42. DB::Exception: Received from localhost:9000. DB::Exception: Number of ## Детали реализации {#implementation-details} ЕÑли в запроÑе отÑутÑтвуют Ñекции `DISTINCT`, `GROUP BY`, `ORDER BY`, подзапроÑÑ‹ в `IN` и `JOIN`, то Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð±ÑƒÐ´ÐµÑ‚ обработан полноÑтью потоково, Ñ Ð¸Ñпользованием O(1) количеÑтва оперативки. -Иначе Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð¼Ð¾Ð¶ÐµÑ‚ ÑъеÑÑ‚ÑŒ много оперативки, еÑли не указаны подходÑщие ограничениÑ: +Иначе Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð¼Ð¾Ð¶ÐµÑ‚ ÑъеÑÑ‚ÑŒ много оперативки, еÑли не указаны подходÑщие ограничениÑ: - `max_memory_usage` - `max_rows_to_group_by` @@ -169,7 +169,7 @@ Code: 42. DB::Exception: Received from localhost:9000. DB::Exception: Number of ### APPLY {#apply-modifier} -Вызывает указанную функцию Ð´Ð»Ñ ÐºÐ°Ð¶Ð´Ð¾Ð¹ Ñтроки, возвращаемой внешним табличным выражением запроÑа. +Вызывает указанную функцию Ð´Ð»Ñ ÐºÐ°Ð¶Ð´Ð¾Ð¹ Ñтроки, возвращаемой внешним табличным выражением запроÑа. **СинтакÑиÑ:** @@ -177,7 +177,7 @@ Code: 42. DB::Exception: Received from localhost:9000. DB::Exception: Number of SELECT APPLY( ) FROM [db.]table_name ``` -**Пример:** +**Пример:** ``` sql CREATE TABLE columns_transformers (i Int64, j Int16, k Int64) ENGINE = MergeTree ORDER by (i); @@ -271,9 +271,9 @@ SELECT * REPLACE(i + 1 AS i) EXCEPT (j) APPLY(sum) from columns_transformers; ## SETTINGS в запроÑе SELECT {#settings-in-select} -Ð’Ñ‹ можете задать Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð½ÐµÐ¾Ð±Ñ…Ð¾Ð´Ð¸Ð¼Ñ‹Ñ… наÑтроек непоÑредÑтвенно в запроÑе `SELECT` в Ñекции `SETTINGS`. Эти наÑтройки дейÑтвуют только в рамках данного запроÑа, а поÑле его Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ ÑбраÑываютÑÑ Ð´Ð¾ предыдущего Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð»Ð¸ Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¿Ð¾ умолчанию. +Ð’Ñ‹ можете задать Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð½ÐµÐ¾Ð±Ñ…Ð¾Ð´Ð¸Ð¼Ñ‹Ñ… наÑтроек непоÑредÑтвенно в запроÑе `SELECT` в Ñекции `SETTINGS`. Эти наÑтройки дейÑтвуют только в рамках данного запроÑа, а поÑле его Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ ÑбраÑываютÑÑ Ð´Ð¾ предыдущего Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¸Ð»Ð¸ Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð¿Ð¾ умолчанию. -Другие ÑпоÑобы Ð·Ð°Ð´Ð°Ð½Ð¸Ñ Ð½Ð°Ñтроек опиÑаны [здеÑÑŒ](../../../operations/settings/index.md). +Другие ÑпоÑобы Ð·Ð°Ð´Ð°Ð½Ð¸Ñ Ð½Ð°Ñтроек опиÑаны [здеÑÑŒ](../../../operations/settings/index.md). **Пример** diff --git a/docs/ru/sql-reference/statements/select/join.md b/docs/ru/sql-reference/statements/select/join.md index 4bd883c87ff..03018a953ce 100644 --- a/docs/ru/sql-reference/statements/select/join.md +++ b/docs/ru/sql-reference/statements/select/join.md @@ -4,7 +4,7 @@ toc_title: JOIN # Ð¡ÐµÐºÑ†Ð¸Ñ JOIN {#select-join} -Join Ñоздаёт новую таблицу путем Ð¾Ð±ÑŠÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ñ Ñтолбцов из одной или неÑкольких таблиц Ñ Ð¸Ñпользованием общих Ð´Ð»Ñ ÐºÐ°Ð¶Ð´Ð¾Ð¹ из них значений. Это Ð¾Ð±Ñ‹Ñ‡Ð½Ð°Ñ Ð¾Ð¿ÐµÑ€Ð°Ñ†Ð¸Ñ Ð² базах данных Ñ Ð¿Ð¾Ð´Ð´ÐµÑ€Ð¶ÐºÐ¾Ð¹ SQL, ÐºÐ¾Ñ‚Ð¾Ñ€Ð°Ñ ÑоответÑтвует join из [релÑционной алгебры](https://en.wikipedia.org/wiki/Relational_algebra#Joins_and_join-like_operators). 
ЧаÑтный Ñлучай ÑÐ¾ÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ñ Ð¾Ð´Ð½Ð¾Ð¹ таблицы чаÑто называют «self-join». +`JOIN` Ñоздаёт новую таблицу путем Ð¾Ð±ÑŠÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ñ Ñтолбцов из одной или неÑкольких таблиц Ñ Ð¸Ñпользованием общих Ð´Ð»Ñ ÐºÐ°Ð¶Ð´Ð¾Ð¹ из них значений. Это Ð¾Ð±Ñ‹Ñ‡Ð½Ð°Ñ Ð¾Ð¿ÐµÑ€Ð°Ñ†Ð¸Ñ Ð² базах данных Ñ Ð¿Ð¾Ð´Ð´ÐµÑ€Ð¶ÐºÐ¾Ð¹ SQL, ÐºÐ¾Ñ‚Ð¾Ñ€Ð°Ñ ÑоответÑтвует join из [релÑционной алгебры](https://en.wikipedia.org/wiki/Relational_algebra#Joins_and_join-like_operators). ЧаÑтный Ñлучай ÑÐ¾ÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ñ Ð¾Ð´Ð½Ð¾Ð¹ таблицы чаÑто называют self-join. СинтакÑиÑ: @@ -38,12 +38,21 @@ FROM ## ÐаÑтройки {#join-settings} -!!! note "Примечание" - Значение ÑтрогоÑти по умолчанию может быть переопределено Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ наÑтройки [join_default_strictness](../../../operations/settings/settings.md#settings-join_default_strictness). +Значение ÑтрогоÑти по умолчанию может быть переопределено Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ наÑтройки [join_default_strictness](../../../operations/settings/settings.md#settings-join_default_strictness). Поведение Ñервера ClickHouse Ð´Ð»Ñ Ð¾Ð¿ÐµÑ€Ð°Ñ†Ð¸Ð¹ `ANY JOIN` завиÑит от параметра [any_join_distinct_right_table_keys](../../../operations/settings/settings.md#any_join_distinct_right_table_keys). -### ИÑпользование ASOF JOIN {#asof-join-usage} +**См. также** + +- [join_algorithm](../../../operations/settings/settings.md#settings-join_algorithm) +- [join_any_take_last_row](../../../operations/settings/settings.md#settings-join_any_take_last_row) +- [join_use_nulls](../../../operations/settings/settings.md#join_use_nulls) +- [partial_merge_join_optimizations](../../../operations/settings/settings.md#partial_merge_join_optimizations) +- [partial_merge_join_rows_in_right_blocks](../../../operations/settings/settings.md#partial_merge_join_rows_in_right_blocks) +- [join_on_disk_max_files_to_merge](../../../operations/settings/settings.md#join_on_disk_max_files_to_merge) +- [any_join_distinct_right_table_keys](../../../operations/settings/settings.md#any_join_distinct_right_table_keys) + +## ИÑпользование ASOF JOIN {#asof-join-usage} `ASOF JOIN` применим в том Ñлучае, когда необходимо объединÑÑ‚ÑŒ запиÑи, которые не имеют точного ÑовпадениÑ. @@ -95,7 +104,7 @@ USING (equi_column1, ... equi_columnN, asof_column) Чтобы задать значение ÑтрогоÑти по умолчанию, иÑпользуйте ÑеÑÑионный параметр [join_default_strictness](../../../operations/settings/settings.md#settings-join_default_strictness). -#### РаÑпределённый join {#global-join} +## РаÑпределённый JOIN {#global-join} ЕÑÑ‚ÑŒ два пути Ð´Ð»Ñ Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ ÑÐ¾ÐµÐ´Ð¸Ð½ÐµÐ½Ð¸Ñ Ñ ÑƒÑ‡Ð°Ñтием раÑпределённых таблиц: @@ -104,6 +113,42 @@ USING (equi_column1, ... equi_columnN, asof_column) Будьте аккуратны при иÑпользовании `GLOBAL`. За дополнительной информацией обращайтеÑÑŒ в раздел [РаÑпределенные подзапроÑÑ‹](../../../sql-reference/operators/in.md#select-distributed-subqueries). +## ÐеÑвные Ð¿Ñ€ÐµÐ¾Ð±Ñ€Ð°Ð·Ð¾Ð²Ð°Ð½Ð¸Ñ Ñ‚Ð¸Ð¿Ð¾Ð² {#implicit-type-conversion} + +ЗапроÑÑ‹ `INNER JOIN`, `LEFT JOIN`, `RIGHT JOIN` и `FULL JOIN` поддерживают неÑвные Ð¿Ñ€ÐµÐ¾Ð±Ñ€Ð°Ð·Ð¾Ð²Ð°Ð½Ð¸Ñ Ñ‚Ð¸Ð¿Ð¾Ð² Ð´Ð»Ñ ÐºÐ»ÑŽÑ‡ÐµÐ¹ ÑоединениÑ. Однако Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð½Ðµ может быть выполнен, еÑли не ÑущеÑтвует типа, к которому можно привеÑти Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ ÐºÐ»ÑŽÑ‡ÐµÐ¹ Ñ Ð¾Ð±ÐµÐ¸Ñ… Ñторон (например, нет типа, который бы одновременно вмещал в ÑÐµÐ±Ñ Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ `UInt64` и `Int64`, или `String` и `Int32`). 
+ +**Пример** + +РаÑÑмотрим таблицу `t_1`: +```text +┌─a─┬─b─┬─toTypeName(a)─┬─toTypeName(b)─┠+│ 1 │ 1 │ UInt16 │ UInt8 │ +│ 2 │ 2 │ UInt16 │ UInt8 │ +└───┴───┴───────────────┴───────────────┘ +``` +и таблицу `t_2`: +```text +┌──a─┬────b─┬─toTypeName(a)─┬─toTypeName(b)───┠+│ -1 │ 1 │ Int16 │ Nullable(Int64) │ +│ 1 │ -1 │ Int16 │ Nullable(Int64) │ +│ 1 │ 1 │ Int16 │ Nullable(Int64) │ +└────┴──────┴───────────────┴─────────────────┘ +``` + +Ð—Ð°Ð¿Ñ€Ð¾Ñ +```sql +SELECT a, b, toTypeName(a), toTypeName(b) FROM t_1 FULL JOIN t_2 USING (a, b); +``` +вернёт результат: +```text +┌──a─┬────b─┬─toTypeName(a)─┬─toTypeName(b)───┠+│ 1 │ 1 │ Int32 │ Nullable(Int64) │ +│ 2 │ 2 │ Int32 │ Nullable(Int64) │ +│ -1 │ 1 │ Int32 │ Nullable(Int64) │ +│ 1 │ -1 │ Int32 │ Nullable(Int64) │ +└────┴──────┴───────────────┴─────────────────┘ +``` + ## Рекомендации по иÑпользованию {#usage-recommendations} ### Обработка пуÑÑ‚Ñ‹Ñ… Ñчеек и NULL {#processing-of-empty-or-null-cells} @@ -142,12 +187,14 @@ USING (equi_column1, ... equi_columnN, asof_column) ### ÐžÐ³Ñ€Ð°Ð½Ð¸Ñ‡ÐµÐ½Ð¸Ñ Ð¿Ð¾ памÑти {#memory-limitations} -По умолчанию ClickHouse иÑпользует алгоритм [hash join](https://en.wikipedia.org/wiki/Hash_join). ClickHouse берет `` и Ñоздает Ð´Ð»Ñ Ð½ÐµÐ³Ð¾ Ñ…Ñш-таблицу в оперативной памÑти. ПоÑле некоторого порога Ð¿Ð¾Ñ‚Ñ€ÐµÐ±Ð»ÐµÐ½Ð¸Ñ Ð¿Ð°Ð¼Ñти ClickHouse переходит к алгоритму merge join. +По умолчанию ClickHouse иÑпользует алгоритм [hash join](https://ru.wikipedia.org/wiki/Ðлгоритм_ÑоединениÑ_хешированием). ClickHouse берет правую таблицу и Ñоздает Ð´Ð»Ñ Ð½ÐµÐµ хеш-таблицу в оперативной памÑти. При включённой наÑтройке `join_algorithm = 'auto'`, поÑле некоторого порога Ð¿Ð¾Ñ‚Ñ€ÐµÐ±Ð»ÐµÐ½Ð¸Ñ Ð¿Ð°Ð¼Ñти ClickHouse переходит к алгоритму [merge join](https://ru.wikipedia.org/wiki/Ðлгоритм_ÑоединениÑ_ÑлиÑнием_Ñортированных_ÑпиÑков). ОпиÑание алгоритмов `JOIN` Ñм. в наÑтройке [join_algorithm](../../../operations/settings/settings.md#settings-join_algorithm). -- [max_rows_in_join](../../../operations/settings/query-complexity.md#settings-max_rows_in_join) — ограничивает количеÑтво Ñтрок в Ñ…Ñш-таблице. -- [max_bytes_in_join](../../../operations/settings/query-complexity.md#settings-max_bytes_in_join) — ограничивает размер Ñ…Ñш-таблицы. +ЕÑли вы хотите ограничить потребление памÑти во Ð²Ñ€ÐµÐ¼Ñ Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð¾Ð¿ÐµÑ€Ð°Ñ†Ð¸Ð¸ `JOIN`, иÑпользуйте наÑтройки: -По доÑтижении любого из Ñтих ограничений, ClickHouse дейÑтвует в ÑоответÑтвии Ñ Ð½Ð°Ñтройкой [join_overflow_mode](../../../operations/settings/query-complexity.md#settings-join_overflow_mode). +- [max_rows_in_join](../../../operations/settings/query-complexity.md#settings-max_rows_in_join) — ограничивает количеÑтво Ñтрок в хеш-таблице. +- [max_bytes_in_join](../../../operations/settings/query-complexity.md#settings-max_bytes_in_join) — ограничивает размер хеш-таблицы. + +По доÑтижении любого из Ñтих ограничений ClickHouse дейÑтвует в ÑоответÑтвии Ñ Ð½Ð°Ñтройкой [join_overflow_mode](../../../operations/settings/query-complexity.md#settings-join_overflow_mode). ## Примеры {#examples} diff --git a/docs/ru/sql-reference/statements/select/limit.md b/docs/ru/sql-reference/statements/select/limit.md index e4012e89556..0266dbc540c 100644 --- a/docs/ru/sql-reference/statements/select/limit.md +++ b/docs/ru/sql-reference/statements/select/limit.md @@ -12,7 +12,7 @@ toc_title: LIMIT При отÑутÑтвии Ñекции [ORDER BY](order-by.md), однозначно Ñортирующей результат, результат может быть произвольным и может ÑвлÑÑ‚ÑŒÑÑ Ð½ÐµÐ´ÐµÑ‚ÐµÑ€Ð¼Ð¸Ð½Ð¸Ñ€Ð¾Ð²Ð°Ð½Ð½Ñ‹Ð¼. -!!! note "Примечание" +!!! 
note "Примечание" КоличеÑтво возвращаемых Ñтрок может завиÑеть также от наÑтройки [limit](../../../operations/settings/settings.md#limit). ## Модификатор LIMIT ... WITH TIES {#limit-with-ties} @@ -46,7 +46,7 @@ SELECT * FROM ( ) ORDER BY n LIMIT 0,5 WITH TIES ``` -возвращает другой набор Ñтрок +возвращает другой набор Ñтрок ```text ┌─n─┠│ 0 │ diff --git a/docs/ru/sql-reference/statements/select/offset.md b/docs/ru/sql-reference/statements/select/offset.md index 31ff1d6ea8b..5b6f4cdfba8 100644 --- a/docs/ru/sql-reference/statements/select/offset.md +++ b/docs/ru/sql-reference/statements/select/offset.md @@ -35,7 +35,7 @@ SELECT * FROM test_fetch ORDER BY a LIMIT 3 OFFSET 1; !!! note "Примечание" Общее количеÑтво пропущенных Ñтрок может завиÑеть также от наÑтройки [offset](../../../operations/settings/settings.md#offset). - + ## Примеры {#examples} Ð’Ñ…Ð¾Ð´Ð½Ð°Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ†Ð°: diff --git a/docs/ru/sql-reference/statements/select/order-by.md b/docs/ru/sql-reference/statements/select/order-by.md index cb49d167b13..d7d2e9c7574 100644 --- a/docs/ru/sql-reference/statements/select/order-by.md +++ b/docs/ru/sql-reference/statements/select/order-by.md @@ -250,8 +250,8 @@ SELECT * FROM collate_test ORDER BY s ASC COLLATE 'en'; ## ÐžÐ¿Ñ‚Ð¸Ð¼Ð¸Ð·Ð°Ñ†Ð¸Ñ Ñ‡Ñ‚ÐµÐ½Ð¸Ñ Ð´Ð°Ð½Ð½Ñ‹Ñ… {#optimize_read_in_order} - ЕÑли в ÑпиÑке выражений в Ñекции `ORDER BY` первыми указаны те полÑ, по которым проиндекÑирована таблица, по которой ÑтроитÑÑ Ð²Ñ‹Ð±Ð¾Ñ€ÐºÐ°, такой Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð¼Ð¾Ð¶Ð½Ð¾ оптимизировать — Ð´Ð»Ñ Ñтого иÑпользуйте наÑтройку [optimize_read_in_order](../../../operations/settings/settings.md#optimize_read_in_order). - + ЕÑли в ÑпиÑке выражений в Ñекции `ORDER BY` первыми указаны те полÑ, по которым проиндекÑирована таблица, по которой ÑтроитÑÑ Ð²Ñ‹Ð±Ð¾Ñ€ÐºÐ°, такой Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð¼Ð¾Ð¶Ð½Ð¾ оптимизировать — Ð´Ð»Ñ Ñтого иÑпользуйте наÑтройку [optimize_read_in_order](../../../operations/settings/settings.md#optimize_read_in_order). + Когда наÑтройка `optimize_read_in_order` включена, при выполнении запроÑа Ñервер иÑпользует табличные индекÑÑ‹ и Ñчитывает данные в том порÑдке, который задан ÑпиÑком выражений `ORDER BY`. ПоÑтому еÑли в запроÑе уÑтановлен [LIMIT](../../../sql-reference/statements/select/limit.md), Ñервер не Ñтанет Ñчитывать лишние данные. Таким образом, запроÑÑ‹ к большим таблицам, но имеющие Ð¾Ð³Ñ€Ð°Ð½Ð¸Ñ‡ÐµÐ½Ð¸Ñ Ð¿Ð¾ чиÑлу запиÑей, выполнÑÑŽÑ‚ÑÑ Ð±Ñ‹Ñтрее. ÐžÐ¿Ñ‚Ð¸Ð¼Ð¸Ð·Ð°Ñ†Ð¸Ñ Ñ€Ð°Ð±Ð¾Ñ‚Ð°ÐµÑ‚ при любом порÑдке Ñортировки `ASC` или `DESC`, но не работает при иÑпользовании группировки [GROUP BY](../../../sql-reference/statements/select/group-by.md) и модификатора [FINAL](../../../sql-reference/statements/select/from.md#select-from-final). @@ -271,8 +271,8 @@ SELECT * FROM collate_test ORDER BY s ASC COLLATE 'en'; Этот модификатор также может быть Ñкобинирован Ñ Ð¼Ð¾Ð´Ð¸Ñ„Ð¸ÐºÐ°Ñ‚Ð¾Ñ€Ð¾Ð¼ [LIMIT ... WITH TIES](../../../sql-reference/statements/select/limit.md#limit-with-ties) -`WITH FILL` модификатор может быть уÑтановлен поÑле `ORDER BY expr` Ñ Ð¾Ð¿Ñ†Ð¸Ð¾Ð½Ð°Ð»ÑŒÐ½Ñ‹Ð¼Ð¸ параметрами `FROM expr`, `TO expr` и `STEP expr`. -Ð’Ñе пропущенные Ð·Ð½Ð°Ñ‡Ð½ÐµÐ¸Ñ Ð´Ð»Ñ ÐºÐ¾Ð»Ð¾Ð½ÐºÐ¸ `expr` будут заполненые значениÑми ÑоответÑвующими предполагаемой поÑледовательноÑти значений колонки, другие колонки будут заполнены значенÑми по умолчанию. +Модификатор `WITH FILL` может быть уÑтановлен поÑле `ORDER BY expr` Ñ Ð¾Ð¿Ñ†Ð¸Ð¾Ð½Ð°Ð»ÑŒÐ½Ñ‹Ð¼Ð¸ параметрами `FROM expr`, `TO expr` и `STEP expr`. 
+Ð’Ñе пропущенные Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð´Ð»Ñ ÐºÐ¾Ð»Ð¾Ð½ÐºÐ¸ `expr` будут заполнены значениÑми, ÑоответÑтвующими предполагаемой поÑледовательноÑти значений колонки, другие колонки будут заполнены значениÑми по умолчанию. ИÑпользуйте Ñледующую конÑтрукцию Ð´Ð»Ñ Ð·Ð°Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð½ÐµÑкольких колонок Ñ Ð¼Ð¾Ð´Ð¸Ñ„Ð¸ÐºÐ°Ñ‚Ð¾Ñ€Ð¾Ð¼ `WITH FILL` Ñ Ð½ÐµÐ¾Ð±Ñзательными параметрами поÑле каждого имени Ð¿Ð¾Ð»Ñ Ð² Ñекции `ORDER BY`. @@ -280,22 +280,22 @@ SELECT * FROM collate_test ORDER BY s ASC COLLATE 'en'; ORDER BY expr [WITH FILL] [FROM const_expr] [TO const_expr] [STEP const_numeric_expr], ... exprN [WITH FILL] [FROM expr] [TO expr] [STEP numeric_expr] ``` -`WITH FILL` может быть применене только к полÑм Ñ Ñ‡Ð¸Ñловыми (вÑе разновидноÑти float, int, decimal) или временными (вÑе разновидноÑти Date, DateTime) типами. +`WITH FILL` может быть применен к полÑм Ñ Ñ‡Ð¸Ñловыми (вÑе разновидноÑти float, int, decimal) или временными (вÑе разновидноÑти Date, DateTime) типами. Ð’ Ñлучае Ð¿Ñ€Ð¸Ð¼ÐµÐ½ÐµÐ½Ð¸Ñ Ðº полÑм типа `String` недоÑтающие Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð·Ð°Ð¿Ð¾Ð»Ð½ÑÑŽÑ‚ÑÑ Ð¿ÑƒÑтой Ñтрокой. Когда не определен `FROM const_expr`, поÑледовательноÑÑ‚ÑŒ Ð·Ð°Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð¸Ñпользует минимальное значение Ð¿Ð¾Ð»Ñ `expr` из `ORDER BY`. Когда не определен `TO const_expr`, поÑледовательноÑÑ‚ÑŒ Ð·Ð°Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð¸Ñпользует макÑимальное значение Ð¿Ð¾Ð»Ñ `expr` из `ORDER BY`. -Когда `STEP const_numeric_expr` определен, тогда `const_numeric_expr` интерпретируетÑÑ `как еÑÑ‚ÑŒ` Ð´Ð»Ñ Ñ‡Ð¸Ñловых типов, как `дни` Ð´Ð»Ñ Ñ‚Ð¸Ð¿Ð° Date и как `Ñекунды` Ð´Ð»Ñ Ñ‚Ð¸Ð¿Ð° DateTime. +Когда `STEP const_numeric_expr` определен, `const_numeric_expr` интерпретируетÑÑ "как еÑÑ‚ÑŒ" Ð´Ð»Ñ Ñ‡Ð¸Ñловых типов, как "дни" Ð´Ð»Ñ Ñ‚Ð¸Ð¿Ð° `Date` и как "Ñекунды" Ð´Ð»Ñ Ñ‚Ð¸Ð¿Ð° `DateTime`. + Когда `STEP const_numeric_expr` не указан, тогда иÑпользуетÑÑ `1.0` Ð´Ð»Ñ Ñ‡Ð¸Ñловых типов, `1 день` Ð´Ð»Ñ Ñ‚Ð¸Ð¿Ð° Date и `1 Ñекунда` Ð´Ð»Ñ Ñ‚Ð¸Ð¿Ð° DateTime. - -Ð”Ð»Ñ Ð¿Ñ€Ð¸Ð¼ÐµÑ€Ð°, Ñледующий Ð·Ð°Ð¿Ñ€Ð¾Ñ +Пример запроÑа без иÑÐ¿Ð¾Ð»ÑŒÐ·Ð¾Ð²Ð°Ð½Ð¸Ñ `WITH FILL`: ```sql SELECT n, source FROM ( SELECT toFloat32(number % 10) AS n, 'original' AS source FROM numbers(10) WHERE number % 3 = 1 -) ORDER BY n +) ORDER BY n; ``` -возвращает +Результат: ```text ┌─n─┬─source───┠│ 1 │ original │ @@ -304,7 +304,7 @@ SELECT n, source FROM ( └───┴──────────┘ ``` -но поÑле Ð¿Ñ€Ð¸Ð¼ÐµÐ½ÐµÐ½Ð¸Ñ Ð¼Ð¾Ð´Ð¸Ñ„Ð¸ÐºÐ°Ñ‚Ð¾Ñ€Ð° `WITH FILL` +Тот же Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð¿Ð¾Ñле Ð¿Ñ€Ð¸Ð¼ÐµÐ½ÐµÐ½Ð¸Ñ Ð¼Ð¾Ð´Ð¸Ñ„Ð¸ÐºÐ°Ñ‚Ð¾Ñ€Ð° `WITH FILL`: ```sql SELECT n, source FROM ( SELECT toFloat32(number % 10) AS n, 'original' AS source @@ -312,7 +312,8 @@ SELECT n, source FROM ( ) ORDER BY n WITH FILL FROM 0 TO 5.51 STEP 0.5 ``` -возвращает +Результат: + ```text ┌───n─┬─source───┠│ 0 │ │ @@ -331,7 +332,7 @@ SELECT n, source FROM ( └─────┴──────────┘ ``` -Ð”Ð»Ñ ÑÐ»ÑƒÑ‡Ð°Ñ ÐºÐ¾Ð³Ð´Ð° у Ð½Ð°Ñ ÐµÑÑ‚ÑŒ неÑколько полей `ORDER BY field2 WITH FILL, field1 WITH FILL` порÑдок Ð·Ð°Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð±ÑƒÐ´ÐµÑ‚ Ñледовать порÑдку полей в Ñекции `ORDER BY`. +Ð”Ð»Ñ ÑÐ»ÑƒÑ‡Ð°Ñ Ñ Ð½ÐµÑколькими полÑми `ORDER BY field2 WITH FILL, field1 WITH FILL` порÑдок Ð·Ð°Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð±ÑƒÐ´ÐµÑ‚ ÑоответÑтвовать порÑдку полей в Ñекции `ORDER BY`. 
Пример: ```sql @@ -341,12 +342,12 @@ SELECT 'original' AS source FROM numbers(10) WHERE (number % 3) = 1 -ORDER BY - d2 WITH FILL, +ORDER BY + d2 WITH FILL, d1 WITH FILL STEP 5; ``` -возвращает +Результат: ```text ┌───d1───────┬───d2───────┬─source───┠│ 1970-01-11 │ 1970-01-02 │ original │ @@ -354,41 +355,41 @@ ORDER BY │ 1970-01-01 │ 1970-01-04 │ │ │ 1970-02-10 │ 1970-01-05 │ original │ │ 1970-01-01 │ 1970-01-06 │ │ -│ 1970-01-01 │ 1970-01-07 │ │ +│ 1970-01-01 │ 1970-01-07 │ │ │ 1970-03-12 │ 1970-01-08 │ original │ -└────────────┴────────────┴──────────┘ +└────────────┴────────────┴──────────┘ ``` -Поле `d1` не заполнÑет и иÑпользуетÑÑ Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ðµ по умолчанию поÑкольку у Ð½Ð°Ñ Ð½ÐµÑ‚ повторÑющихÑÑ Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ð´Ð»Ñ `d2` поÑтому мы не можем правильно раÑÑчитать поÑледователноÑÑ‚ÑŒ Ð·Ð°Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð´Ð»Ñ`d1`. +Поле `d1` не заполнÑетÑÑ Ð¸ иÑпользует значение по умолчанию. ПоÑкольку у Ð½Ð°Ñ Ð½ÐµÑ‚ повторÑющихÑÑ Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ð¹ Ð´Ð»Ñ `d2`, мы не можем правильно раÑÑчитать поÑледователноÑÑ‚ÑŒ Ð·Ð°Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð´Ð»Ñ `d1`. -Cледующий Ð·Ð°Ð¿Ñ€Ð¾Ñ (Ñ Ð¸Ð·Ð¼ÐµÐ½ÐµÐ½Ñ‹Ð¼ порÑдком в ORDER BY) +Cледующий Ð·Ð°Ð¿Ñ€Ð¾Ñ (Ñ Ð¸Ð·Ð¼ÐµÐ½ÐµÐ½Ñ‹Ð¼ порÑдком в ORDER BY): ```sql -SELECT - toDate((number * 10) * 86400) AS d1, - toDate(number * 86400) AS d2, +SELECT + toDate((number * 10) * 86400) AS d1, + toDate(number * 86400) AS d2, 'original' AS source FROM numbers(10) WHERE (number % 3) = 1 -ORDER BY +ORDER BY d1 WITH FILL STEP 5, - d2 WITH FILL; + d2 WITH FILL; ``` -возвращает +Результат: ```text ┌───d1───────┬───d2───────┬─source───┠-│ 1970-01-11 │ 1970-01-02 │ original │ -│ 1970-01-16 │ 1970-01-01 │ │ -│ 1970-01-21 │ 1970-01-01 │ │ -│ 1970-01-26 │ 1970-01-01 │ │ -│ 1970-01-31 │ 1970-01-01 │ │ -│ 1970-02-05 │ 1970-01-01 │ │ +│ 1970-01-11 │ 1970-01-02 │ original │ +│ 1970-01-16 │ 1970-01-01 │ │ +│ 1970-01-21 │ 1970-01-01 │ │ +│ 1970-01-26 │ 1970-01-01 │ │ +│ 1970-01-31 │ 1970-01-01 │ │ +│ 1970-02-05 │ 1970-01-01 │ │ │ 1970-02-10 │ 1970-01-05 │ original │ -│ 1970-02-15 │ 1970-01-01 │ │ -│ 1970-02-20 │ 1970-01-01 │ │ -│ 1970-02-25 │ 1970-01-01 │ │ -│ 1970-03-02 │ 1970-01-01 │ │ +│ 1970-02-15 │ 1970-01-01 │ │ +│ 1970-02-20 │ 1970-01-01 │ │ +│ 1970-02-25 │ 1970-01-01 │ │ +│ 1970-03-02 │ 1970-01-01 │ │ │ 1970-03-07 │ 1970-01-01 │ │ -│ 1970-03-12 │ 1970-01-08 │ original │ -└────────────┴────────────┴──────────┘ +│ 1970-03-12 │ 1970-01-08 │ original │ +└────────────┴────────────┴──────────┘ ``` diff --git a/docs/ru/sql-reference/statements/select/with.md b/docs/ru/sql-reference/statements/select/with.md index 7e09d94770a..a80bc46a5a3 100644 --- a/docs/ru/sql-reference/statements/select/with.md +++ b/docs/ru/sql-reference/statements/select/with.md @@ -15,7 +15,7 @@ WITH AS ``` sql WITH AS ``` - + ## Примеры **Пример 1:** ИÑпользование конÑтантного Ð²Ñ‹Ñ€Ð°Ð¶ÐµÐ½Ð¸Ñ ÐºÐ°Ðº «переменной» @@ -63,7 +63,7 @@ LIMIT 10; **Пример 4:** ПереиÑпользование Ð²Ñ‹Ñ€Ð°Ð¶ÐµÐ½Ð¸Ñ ``` sql -WITH test1 AS (SELECT i + 1, j + 1 FROM test1) +WITH test1 AS (SELECT i + 1, j + 1 FROM test1) SELECT * FROM test1; ``` diff --git a/docs/ru/sql-reference/statements/show.md b/docs/ru/sql-reference/statements/show.md index 29d184f6c34..7b5296e988e 100644 --- a/docs/ru/sql-reference/statements/show.md +++ b/docs/ru/sql-reference/statements/show.md @@ -301,7 +301,7 @@ SHOW CREATE [SETTINGS] PROFILE name1 [, name2 ...] ``` sql SHOW USERS ``` - + ## SHOW ROLES {#show-roles-statement} Выводит ÑпиÑок [ролей](../../operations/access-rights.md#role-management). Ð”Ð»Ñ Ð¿Ñ€Ð¾Ñмотра параметров ролей, Ñм. 
ÑиÑтемные таблицы [system.roles](../../operations/system-tables/roles.md#system_tables-roles) и [system.role-grants](../../operations/system-tables/role-grants.md#system_tables-role_grants). @@ -340,8 +340,8 @@ SHOW [ROW] POLICIES [ON [db.]table] ``` sql SHOW QUOTAS -``` - +``` + ## SHOW QUOTA {#show-quota-statement} Выводит потребление [квоты](../../operations/quotas.md) Ð´Ð»Ñ Ð²Ñех пользователей или только Ð´Ð»Ñ Ñ‚ÐµÐºÑƒÑ‰ÐµÐ³Ð¾ пользователÑ. Ð”Ð»Ñ Ð¿Ñ€Ð¾Ñмотра других параметров, Ñм. ÑиÑтемные таблицы [system.quotas_usage](../../operations/system-tables/quotas_usage.md#system_tables-quotas_usage) и [system.quota_usage](../../operations/system-tables/quota_usage.md#system_tables-quota_usage). diff --git a/docs/ru/sql-reference/statements/watch.md b/docs/ru/sql-reference/statements/watch.md index ef5b2f80584..b7567bce6c4 100644 --- a/docs/ru/sql-reference/statements/watch.md +++ b/docs/ru/sql-reference/statements/watch.md @@ -7,7 +7,7 @@ toc_title: WATCH !!! important "Важно" Это ÑкÑÐ¿ÐµÑ€Ð¸Ð¼ÐµÐ½Ñ‚Ð°Ð»ÑŒÐ½Ð°Ñ Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ. Она может повлечь потерю ÑовмеÑтимоÑти в будущих верÑиÑÑ…. - Чтобы иÑпользовать `LIVE VIEW` и запроÑÑ‹ `WATCH`, включите наÑтройку `set allow_experimental_live_view = 1`. + Чтобы иÑпользовать `LIVE VIEW` и запроÑÑ‹ `WATCH`, включите наÑтройку `set allow_experimental_live_view = 1`. **СинтакÑиÑ** diff --git a/docs/ru/sql-reference/table-functions/cluster.md b/docs/ru/sql-reference/table-functions/cluster.md index b59ccf085ed..1a087971afe 100644 --- a/docs/ru/sql-reference/table-functions/cluster.md +++ b/docs/ru/sql-reference/table-functions/cluster.md @@ -6,7 +6,7 @@ toc_title: cluster # cluster, clusterAllReplicas {#cluster-clusterallreplicas} ПозволÑет обратитьÑÑ ÐºÐ¾ вÑем Ñерверам ÑущеÑтвующего клаÑтера, который приÑутÑтвует в таблице `system.clusters` и Ñконфигурирован в Ñекцци `remote_servers` без ÑÐ¾Ð·Ð´Ð°Ð½Ð¸Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ†Ñ‹ типа `Distributed`. -`clusterAllReplicas` - работает также как `cluster` но ÐºÐ°Ð¶Ð´Ð°Ñ Ñ€ÐµÐ¿Ð»Ð¸ÐºÐ° в клаÑтере будет иÑпользована как отдельный шард/отдельное Ñоединение. +`clusterAllReplicas` - работает также как `cluster` но ÐºÐ°Ð¶Ð´Ð°Ñ Ñ€ÐµÐ¿Ð»Ð¸ÐºÐ° в клаÑтере будет иÑпользована как отдельный шард/отдельное Ñоединение. Сигнатуры: @@ -18,7 +18,7 @@ clusterAllReplicas('cluster_name', db.table) clusterAllReplicas('cluster_name', db, table) ``` -`cluster_name` – Ð¸Ð¼Ñ ÐºÐ»Ð°Ñтера, который обÑзан приÑутÑтвовать в таблице `system.clusters` и обозначает подмножеÑтво адреÑов и параметров Ð¿Ð¾Ð´ÐºÐ»ÑŽÑ‡ÐµÐ½Ð¸Ñ Ðº удаленным и локальным Ñерверам, входÑщим в клаÑтер. +`cluster_name` – Ð¸Ð¼Ñ ÐºÐ»Ð°Ñтера, который обÑзан приÑутÑтвовать в таблице `system.clusters` и обозначает подмножеÑтво адреÑов и параметров Ð¿Ð¾Ð´ÐºÐ»ÑŽÑ‡ÐµÐ½Ð¸Ñ Ðº удаленным и локальным Ñерверам, входÑщим в клаÑтер. ИÑпользование табличных функций `cluster` и `clusterAllReplicas` менее оптимальное чем Ñоздание таблицы типа `Distributed`, поÑкольку в Ñтом Ñлучае Ñоединение Ñ Ñервером переуÑтанавливаетÑÑ Ð½Ð° каждый запроÑ. При обработке большого количеÑтва запроÑов, вÑегда Ñоздавайте `Distributed` таблицу заранее и не иÑпользуйте табличные функции `cluster` и `clusterAllReplicas`. diff --git a/docs/ru/sql-reference/table-functions/dictionary.md b/docs/ru/sql-reference/table-functions/dictionary.md index d4909bf5d9f..093ca6d03c7 100644 --- a/docs/ru/sql-reference/table-functions/dictionary.md +++ b/docs/ru/sql-reference/table-functions/dictionary.md @@ -13,7 +13,7 @@ toc_title: dictionary dictionary('dict') ``` -**Ðргументы** +**Ðргументы** - `dict` — Ð¸Ð¼Ñ ÑловарÑ. 
[String](../../sql-reference/data-types/string.md). diff --git a/docs/ru/sql-reference/table-functions/jdbc.md b/docs/ru/sql-reference/table-functions/jdbc.md index 3bbbbb240e5..3955846d8db 100644 --- a/docs/ru/sql-reference/table-functions/jdbc.md +++ b/docs/ru/sql-reference/table-functions/jdbc.md @@ -25,7 +25,7 @@ SELECT * FROM jdbc('mysql-dev?p1=233', 'num Int32', 'select toInt32OrZero(''{{p1 ``` ``` sql -SELECT * +SELECT * FROM jdbc('mysql-dev?p1=233', 'num Int32', 'select toInt32OrZero(''{{p1}}'') as num') ``` diff --git a/docs/ru/sql-reference/table-functions/mysql.md b/docs/ru/sql-reference/table-functions/mysql.md index e21d1a7fa06..e881961e3d9 100644 --- a/docs/ru/sql-reference/table-functions/mysql.md +++ b/docs/ru/sql-reference/table-functions/mysql.md @@ -29,9 +29,9 @@ mysql('host:port', 'database', 'table', 'user', 'password'[, replace_query, 'on_ - `0` - выполнÑетÑÑ Ð·Ð°Ð¿Ñ€Ð¾Ñ `INSERT INTO`. - `1` - выполнÑетÑÑ Ð·Ð°Ð¿Ñ€Ð¾Ñ `REPLACE INTO`. -- `on_duplicate_clause` — выражение `ON DUPLICATE KEY on_duplicate_clause`, добавлÑемое в Ð·Ð°Ð¿Ñ€Ð¾Ñ `INSERT`. Может быть передано только Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ `replace_query = 0` (еÑли вы одновременно передадите `replace_query = 1` и `on_duplicate_clause`, будет Ñгенерировано иÑключение). +- `on_duplicate_clause` — выражение `ON DUPLICATE KEY on_duplicate_clause`, добавлÑемое в Ð·Ð°Ð¿Ñ€Ð¾Ñ `INSERT`. Может быть передано только Ñ Ð¿Ð¾Ð¼Ð¾Ñ‰ÑŒÑŽ `replace_query = 0` (еÑли вы одновременно передадите `replace_query = 1` и `on_duplicate_clause`, будет Ñгенерировано иÑключение). - Пример: `INSERT INTO t (c1,c2) VALUES ('a', 2) ON DUPLICATE KEY UPDATE c2 = c2 + 1`, где `on_duplicate_clause` Ñто `UPDATE c2 = c2 + 1`. + Пример: `INSERT INTO t (c1,c2) VALUES ('a', 2) ON DUPLICATE KEY UPDATE c2 = c2 + 1`, где `on_duplicate_clause` Ñто `UPDATE c2 = c2 + 1`. ВыражениÑ, которые могут иÑпользоватьÑÑ Ð² качеÑтве `on_duplicate_clause` в Ñекции `ON DUPLICATE KEY`, можно поÑмотреть в документации по [MySQL](http://www.mysql.ru/docs/). ПроÑтые уÑÐ»Ð¾Ð²Ð¸Ñ `WHERE` такие как `=, !=, >, >=, <, =` выполнÑÑŽÑ‚ÑÑ Ð½Ð° Ñтороне Ñервера MySQL. @@ -104,7 +104,7 @@ SELECT * FROM mysql('localhost:3306', 'test', 'test', 'bayonet', '123'); └────────┴───────┘ ``` -**Смотрите также** +**Смотрите также** - [Движок таблиц ‘MySQL’](../../sql-reference/table-functions/mysql.md) - [ИÑпользование MySQL как иÑточника данных Ð´Ð»Ñ Ð²Ð½ÐµÑˆÐ½ÐµÐ³Ð¾ ÑловарÑ](../../sql-reference/table-functions/mysql.md#dicts-external_dicts_dict_sources-mysql) diff --git a/docs/ru/sql-reference/table-functions/null.md b/docs/ru/sql-reference/table-functions/null.md index 8e0173733f8..44fbc111db2 100644 --- a/docs/ru/sql-reference/table-functions/null.md +++ b/docs/ru/sql-reference/table-functions/null.md @@ -7,13 +7,13 @@ toc_title: null Ñ„ÑƒÐ½ÐºÑ†Ð¸Ñ Ð¡Ð¾Ð·Ð´Ð°ÐµÑ‚ временную таблицу указанной Ñтруктуры Ñ Ð´Ð²Ð¸Ð¶ÐºÐ¾Ð¼ [Null](../../engines/table-engines/special/null.md). Ð’ ÑоответÑтвии Ñо ÑвойÑтвами движка, данные в таблице игнорируютÑÑ, а Ñама таблица удалÑетÑÑ Ñразу поÑле Ð²Ñ‹Ð¿Ð¾Ð»Ð½ÐµÐ½Ð¸Ñ Ð·Ð°Ð¿Ñ€Ð¾Ñа. Ð¤ÑƒÐ½ÐºÑ†Ð¸Ñ Ð¸ÑпользуетÑÑ Ð´Ð»Ñ ÑƒÐ´Ð¾Ð±Ñтва напиÑÐ°Ð½Ð¸Ñ Ñ‚ÐµÑтов и демонÑтрационных примеров. -**СинтакÑиÑ** +**СинтакÑиÑ** ``` sql null('structure') ``` -**Параметр** +**Параметр** - `structure` — ÑпиÑок колонок и их типов. [String](../../sql-reference/data-types/string.md). @@ -36,7 +36,7 @@ INSERT INTO t SELECT * FROM numbers_mt(1000000000); DROP TABLE IF EXISTS t; ``` -См. также: +См. 
также: - [Движок таблиц Null](../../engines/table-engines/special/null.md) diff --git a/docs/ru/sql-reference/table-functions/postgresql.md b/docs/ru/sql-reference/table-functions/postgresql.md index 50f651527c5..76c2ee0fa18 100644 --- a/docs/ru/sql-reference/table-functions/postgresql.md +++ b/docs/ru/sql-reference/table-functions/postgresql.md @@ -20,7 +20,7 @@ postgresql('host:port', 'database', 'table', 'user', 'password'[, `schema`]) - `table` — Ð¸Ð¼Ñ Ñ‚Ð°Ð±Ð»Ð¸Ñ†Ñ‹ на удалённом Ñервере. - `user` — пользователь PostgreSQL. - `password` — пароль пользователÑ. -- `schema` — Ð¸Ð¼Ñ Ñхемы, еÑли не иÑпользуетÑÑ Ñхема по умолчанию. ÐеобÑзательный аргумент. +- `schema` — Ð¸Ð¼Ñ Ñхемы, еÑли не иÑпользуетÑÑ Ñхема по умолчанию. ÐеобÑзательный аргумент. **Возвращаемое значение** @@ -43,7 +43,7 @@ PostgreSQL маÑÑивы конвертируютÑÑ Ð² маÑÑивы ClickHo !!! info "Примечание" Будьте внимательны, в PostgreSQL маÑÑивы, Ñозданные как `type_name[]`, ÑвлÑÑŽÑ‚ÑÑ Ð¼Ð½Ð¾Ð³Ð¾Ð¼ÐµÑ€Ð½Ñ‹Ð¼Ð¸ и могут Ñодержать в Ñебе разное количеÑтво измерений в разных Ñтроках одной таблицы. Внутри ClickHouse допуÑтипы только многомерные маÑÑивы Ñ Ð¾Ð´Ð¸Ð½Ð°ÐºÐ¾Ð²Ñ‹Ð¼ кол-вом измерений во вÑех Ñтроках таблицы. - + Поддерживает неÑколько реплик, которые должны быть перечиÑлены через `|`. Ðапример: ```sql @@ -56,7 +56,7 @@ SELECT name FROM postgresql(`postgres{1|2|3}:5432`, 'postgres_database', 'postgr SELECT name FROM postgresql(`postgres1:5431|postgres2:5432`, 'postgres_database', 'postgres_table', 'user', 'password'); ``` -При иÑпользовании ÑÐ»Ð¾Ð²Ð°Ñ€Ñ PostgreSQL поддерживаетÑÑ Ð¿Ñ€Ð¸Ð¾Ñ€Ð¸Ñ‚ÐµÑ‚ реплик. Чем больше номер реплики, тем ниже ее приоритет. ÐаивыÑший приоритет у реплики Ñ Ð½Ð¾Ð¼ÐµÑ€Ð¾Ð¼ `0`. +При иÑпользовании ÑÐ»Ð¾Ð²Ð°Ñ€Ñ PostgreSQL поддерживаетÑÑ Ð¿Ñ€Ð¸Ð¾Ñ€Ð¸Ñ‚ÐµÑ‚ реплик. Чем больше номер реплики, тем ниже ее приоритет. ÐаивыÑший приоритет у реплики Ñ Ð½Ð¾Ð¼ÐµÑ€Ð¾Ð¼ `0`. **Примеры** @@ -86,7 +86,7 @@ postgresql> SELECT * FROM test; Получение данных в ClickHouse: ```sql -SELECT * FROM postgresql('localhost:5432', 'test', 'test', 'postgresql_user', 'password') WHERE str IN ('test'); +SELECT * FROM postgresql('localhost:5432', 'test', 'test', 'postgresql_user', 'password') WHERE str IN ('test'); ``` ``` text diff --git a/docs/ru/sql-reference/table-functions/remote.md b/docs/ru/sql-reference/table-functions/remote.md index 00179abb207..20c38ef2af7 100644 --- a/docs/ru/sql-reference/table-functions/remote.md +++ b/docs/ru/sql-reference/table-functions/remote.md @@ -23,7 +23,7 @@ remoteSecure('addresses_expr', db.table[, 'user'[, 'password']]) - `addresses_expr` — выражение, генерирующее адреÑа удалённых Ñерверов. Это может быть проÑто один Ð°Ð´Ñ€ÐµÑ Ñервера. ÐÐ´Ñ€ÐµÑ Ñервера — Ñто `host:port` или только `host`. ВмеÑто параметра `host` может быть указано Ð¸Ð¼Ñ Ñервера или его Ð°Ð´Ñ€ÐµÑ Ð² формате IPv4 или IPv6. IPv6 Ð°Ð´Ñ€ÐµÑ ÑƒÐºÐ°Ð·Ñ‹Ð²Ð°ÐµÑ‚ÑÑ Ð² квадратных Ñкобках. - + `port` — TCP-порт удалённого Ñервера. ЕÑли порт не указан, иÑпользуетÑÑ [tcp_port](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-tcp_port) из конфигурационного файла Ñервера, к которому обратилиÑÑŒ через функцию `remote` (по умолчанию - 9000), и [tcp_port_secure](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-tcp_port_secure), к которому обратилиÑÑŒ через функцию `remoteSecure` (по умолчанию — 9440). С IPv6-адреÑом обÑзательно нужно указывать порт. 
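For illustration, a minimal invocation of these functions might look like the sketch below. The host `example01-01-1` and the table `default.hits` are placeholders assumed for the example, not names taken from this page.

``` sql
-- Hypothetical host and table, shown only to illustrate the address forms described above.
SELECT count() FROM remote('example01-01-1:9000', default.hits);   -- explicit TCP port
SELECT count() FROM remoteSecure('example01-01-1', default.hits);  -- port falls back to tcp_port_secure
```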
@@ -68,28 +68,6 @@ localhost example01-01-1,example01-02-1 ``` -ЧаÑÑ‚ÑŒ Ð²Ñ‹Ñ€Ð°Ð¶ÐµÐ½Ð¸Ñ Ð¼Ð¾Ð¶ÐµÑ‚ быть указана в фигурных Ñкобках. Предыдущий пример может быть запиÑан Ñледующим образом: - -``` text -example01-0{1,2}-1 -``` - -Ð’ фигурных Ñкобках может быть указан диапазон (неотрицательных целых) чиÑел через две точки. Ð’ Ñтом Ñлучае диапазон раÑкрываетÑÑ Ð² множеÑтво значений, генерирующих адреÑа шардов. ЕÑли запиÑÑŒ первого чиÑла начинаетÑÑ Ñ Ð½ÑƒÐ»Ñ, то Ð·Ð½Ð°Ñ‡ÐµÐ½Ð¸Ñ Ñ„Ð¾Ñ€Ð¼Ð¸Ñ€ÑƒÑŽÑ‚ÑÑ Ñ Ñ‚Ð°ÐºÐ¸Ð¼ же выравниванием нулÑми. Предыдущий пример может быть запиÑан Ñледующим образом: - -``` text -example01-{01..02}-1 -``` - -При наличии неÑкольких пар фигурных Ñкобок генерируетÑÑ Ð¿Ñ€Ñмое произведение ÑоответÑтвующих множеÑтв. - -ÐдреÑа или их фрагменты в фигурных Ñкобках можно указать через Ñимвол \|. Ð’ Ñтом Ñлучае ÑоответÑтвующие множеÑтва адреÑов понимаютÑÑ ÐºÐ°Ðº реплики — Ð·Ð°Ð¿Ñ€Ð¾Ñ Ð±ÑƒÐ´ÐµÑ‚ отправлен на первую живую реплику. При Ñтом реплики перебираютÑÑ Ð² порÑдке, ÑоглаÑно текущей наÑтройке [load_balancing](../../operations/settings/settings.md#settings-load_balancing). Ð’ Ñтом примере указаны два шарда, в каждом из которых имеютÑÑ Ð´Ð²Ðµ реплики: - -``` text -example01-{01..02}-{1|2} -``` - -КоличеÑтво генерируемых адреÑов ограничено конÑтантой. Ð¡ÐµÐ¹Ñ‡Ð°Ñ Ñто 1000 адреÑов. - **Примеры** Выборка данных Ñ ÑƒÐ´Ð°Ð»ÐµÐ½Ð½Ð¾Ð³Ð¾ Ñервера: @@ -106,3 +84,15 @@ INSERT INTO FUNCTION remote('127.0.0.1', currentDatabase(), 'remote_table') VALU SELECT * FROM remote_table; ``` +## Символы подÑтановки в адреÑах {globs-in-addresses} + +Шаблоны в фигурных Ñкобках `{ }` иÑпользуютÑÑ, чтобы Ñгенерировать ÑпиÑок шардов или указать альтернативный Ð°Ð´Ñ€ÐµÑ Ð½Ð° Ñлучай отказа. Ð’ одном URL можно иÑпользовать неÑколько шаблонов. +ПоддерживаютÑÑ Ñледующие типы шаблонов. + +- {*a*,*b*} - неÑколько вариантов, разделенных запÑтой. ВеÑÑŒ шаблон заменÑетÑÑ Ð½Ð° *a* в адреÑе первого шарда, заменÑетÑÑ Ð½Ð° *b* в адреÑе второго шарда и так далее. Ðапример, `example0{1,2}-1` генерирует адреÑа `example01-1` и `example02-1`. +- {*n*..*m*} - диапазон чиÑел. Этот шаблон генерирует адреÑа шардов Ñ ÑƒÐ²ÐµÐ»Ð¸Ñ‡Ð¸Ð²Ð°ÑŽÑ‰Ð¸Ð¼Ð¸ÑÑ Ð¸Ð½Ð´ÐµÐºÑами от *n* до *m*. `example0{1..2}-1` генерирует `example01-1` и `example02-1`. +- {*0n*..*0m*} - диапазон чиÑел Ñ Ð²ÐµÐ´ÑƒÑ‰Ð¸Ð¼Ð¸ нулÑми. Такой вариант ÑохранÑет ведущие нули в индекÑах. По шаблону `example{01..03}-1` генерируютÑÑ `example01-1`, `example02-1` и `example03-1`. +- {*a*|*b*} - неÑколько вариантов, разделенных `|`. Шаблон задает адреÑа реплик. Ðапример, `example01-{1|2}` генерирует реплики`example01-1` и `example01-2`. + +Ð—Ð°Ð¿Ñ€Ð¾Ñ Ð±ÑƒÐ´ÐµÑ‚ отправлен на первую живую реплику. При Ñтом Ð´Ð»Ñ `remote` реплики перебираютÑÑ Ð² порÑдке, заданном наÑтройкой [load_balancing](../../operations/settings/settings.md#settings-load_balancing). +КоличеÑтво генерируемых адреÑов ограничено наÑтройкой [table_function_remote_max_addresses](../../operations/settings/settings.md#table_function_remote_max_addresses). diff --git a/docs/ru/sql-reference/table-functions/s3.md b/docs/ru/sql-reference/table-functions/s3.md index 5b54940e830..597f145c096 100644 --- a/docs/ru/sql-reference/table-functions/s3.md +++ b/docs/ru/sql-reference/table-functions/s3.md @@ -18,7 +18,7 @@ s3(path, [aws_access_key_id, aws_secret_access_key,] format, structure, [compres - `path` — URL-Ð°Ð´Ñ€ÐµÑ Ð±Ð°ÐºÐµÑ‚Ð° Ñ ÑƒÐºÐ°Ð·Ð°Ð½Ð¸ÐµÐ¼ пути к файлу. Поддерживает Ñледующие подÑтановочные знаки в режиме "только чтение": `*, ?, {abc,def} и {N..M}` где `N, M` — чиÑла, `'abc', 'def'` — Ñтроки. 
Подробнее Ñмотри [здеÑÑŒ](../../engines/table-engines/integrations/s3.md#wildcards-in-path). - `format` — [формат](../../interfaces/formats.md#formats) файла. - `structure` — cтруктура таблицы. Формат `'column1_name column1_type, column2_name column2_type, ...'`. -- `compression` — автоматичеÑки обнаруживает Ñжатие по раÑширению файла. Возможные значениÑ: `none`, `gzip/gz`, `brotli/br`, `xz/LZMA`, `zstd/zst`. ÐеобÑзательный параметр. +- `compression` — автоматичеÑки обнаруживает Ñжатие по раÑширению файла. Возможные значениÑ: `none`, `gzip/gz`, `brotli/br`, `xz/LZMA`, `zstd/zst`. ÐеобÑзательный параметр. **Возвращаемые значениÑ** @@ -50,8 +50,8 @@ LIMIT 2; ЗапроÑ: ``` sql -SELECT * -FROM s3('https://storage.yandexcloud.net/my-test-bucket-768/data.csv.gz', 'CSV', 'column1 UInt32, column2 UInt32, column3 UInt32', 'gzip') +SELECT * +FROM s3('https://storage.yandexcloud.net/my-test-bucket-768/data.csv.gz', 'CSV', 'column1 UInt32, column2 UInt32, column3 UInt32', 'gzip') LIMIT 2; ``` diff --git a/docs/ru/sql-reference/table-functions/url.md b/docs/ru/sql-reference/table-functions/url.md index a41a1f53cde..b5fd948cc45 100644 --- a/docs/ru/sql-reference/table-functions/url.md +++ b/docs/ru/sql-reference/table-functions/url.md @@ -41,3 +41,7 @@ INSERT INTO FUNCTION url('http://127.0.0.1:8123/?query=INSERT+INTO+test_table+FO SELECT * FROM test_table; ``` +## Символы подÑтановки в URL {globs-in-url} + +Шаблоны в фигурных Ñкобках `{ }` иÑпользуютÑÑ, чтобы Ñгенерировать ÑпиÑок шардов или указать альтернативные адреÑа на Ñлучай отказа. Поддерживаемые типы шаблонов и примеры Ñмотрите в опиÑании функции [remote](remote.md#globs-in-addresses). +Символ `|` внутри шаблонов иÑпользуетÑÑ, чтобы задать адреÑа, еÑли предыдущие оказалиÑÑŒ недоÑтупны. Эти адреÑа перебираютÑÑ Ð² том же порÑдке, в котором они указаны в шаблоне. КоличеÑтво адреÑов, которые могут быть Ñгенерированы, ограничено наÑтройкой [glob_expansion_max_elements](../../operations/settings/settings.md#glob_expansion_max_elements). diff --git a/docs/ru/whats-new/security-changelog.md b/docs/ru/whats-new/security-changelog.md index e3d26e772c4..60d6c2f1b66 100644 --- a/docs/ru/whats-new/security-changelog.md +++ b/docs/ru/whats-new/security-changelog.md @@ -5,6 +5,17 @@ toc_title: Security Changelog # Security Changelog {#security-changelog} +## ИÑправлено в релизе 21.4.3.21, 2021-04-12 {#fixed-in-clickhouse-release-21-4-3-21-2019-09-10} + +### CVE-2021-25263 {#cve-2021-25263} + +Злоумышленник Ñ Ð´Ð¾Ñтупом к Ñозданию Ñловарей может читать файлы на файловой ÑиÑтеме Ñервера Clickhouse. +Злоумышленник может обойти некорректную проверку пути к файлу ÑÐ»Ð¾Ð²Ð°Ñ€Ñ Ð¸ загрузить чаÑÑ‚ÑŒ любого файла как Ñловарь. При Ñтом, Ð¼Ð°Ð½Ð¸Ð¿ÑƒÐ»Ð¸Ñ€ÑƒÑ Ð¾Ð¿Ñ†Ð¸Ñми парÑинга файла, можно получить Ñледующую чаÑÑ‚ÑŒ файла и пошагово прочитать веÑÑŒ файл. + +ИÑправление доÑтупно в верÑиÑÑ… 20.8.18.32-lts, 21.1.9.41-stable, 21.2.9.41-stable, 21.3.6.55-lts, 21.4.3.21-stable и выше. 
+ +Обнаружено благодарÑ: [Ð’ÑчеÑлаву Егошину](https://twitter.com/vegoshin) + ## ИÑправлено в релизе 19.14.3.3, 2019-09-10 {#ispravleno-v-relize-19-14-3-3-2019-09-10} ### CVE-2019-15024 {#cve-2019-15024} diff --git a/docs/zh/commercial/cloud.md b/docs/zh/commercial/cloud.md index e0a297f51c8..651a1a15ec4 100644 --- a/docs/zh/commercial/cloud.md +++ b/docs/zh/commercial/cloud.md @@ -1,6 +1,6 @@ --- toc_priority: 1 -toc_title: 云 +toc_title: 云 --- # ClickHouse 云æœåŠ¡æ供商 {#clickhouse-cloud-service-providers} @@ -22,7 +22,7 @@ toc_title: 云 [Altinity.Cloud](https://altinity.com/cloud-database/) 是针对 Amazon 公共云的完全托管的 ClickHouse-as-a-Service -- 在 Amazon 资æºä¸Šå¿«é€Ÿéƒ¨ç½² ClickHouse 集群 +- 在 Amazon 资æºä¸Šå¿«é€Ÿéƒ¨ç½² ClickHouse 集群 - è½»æ¾è¿›è¡Œæ¨ªå‘扩展/纵å‘扩展以åŠèŠ‚点的垂直扩展 - 具有公共端点或VPC对等的租户隔离 - å¯é…置存储类型以åŠå·é…ç½® diff --git a/docs/zh/commercial/index.md b/docs/zh/commercial/index.md index 2bfd0767d1b..047ee817d7b 100644 --- a/docs/zh/commercial/index.md +++ b/docs/zh/commercial/index.md @@ -1,5 +1,5 @@ --- -toc_folder_title: å•†ä¸šæ”¯æŒ +toc_folder_title: å•†ä¸šæ”¯æŒ toc_priority: 70 toc_title: 简介 --- diff --git a/docs/zh/development/browse-code.md b/docs/zh/development/browse-code.md index 49da72a63aa..fd334dac55f 100644 --- a/docs/zh/development/browse-code.md +++ b/docs/zh/development/browse-code.md @@ -7,6 +7,6 @@ toc_title: "\u6D4F\u89C8\u6E90\u4EE3\u7801" 您å¯ä»¥ä½¿ç”¨ **Woboq** 在线代ç æµè§ˆå™¨ [点击这里](https://clickhouse.tech/codebrowser/html_report/ClickHouse/src/index.html). 它æ供了代ç å¯¼èˆªå’Œè¯­ä¹‰çªå‡ºæ˜¾ç¤ºã€æœç´¢å’Œç´¢å¼•ã€‚ 代ç å¿«ç…§æ¯å¤©æ›´æ–°ã€‚ -此外,您还å¯ä»¥åƒå¾€å¸¸ä¸€æ ·æµè§ˆæºä»£ç  [GitHub](https://github.com/ClickHouse/ClickHouse) +此外,您还å¯ä»¥åƒå¾€å¸¸ä¸€æ ·æµè§ˆæºä»£ç  [GitHub](https://github.com/ClickHouse/ClickHouse) 如果你希望了解哪ç§IDE较好,我们推è使用CLion,QT Creator,VS Codeå’ŒKDevelop(有注æ„事项)。 您å¯ä»¥ä½¿ç”¨ä»»ä½•æ‚¨å–œæ¬¢çš„IDE。 Vimå’ŒEmacs也å¯ä»¥ã€‚ diff --git a/docs/zh/engines/database-engines/atomic.md b/docs/zh/engines/database-engines/atomic.md index f019b94a00b..73e044b5e98 100644 --- a/docs/zh/engines/database-engines/atomic.md +++ b/docs/zh/engines/database-engines/atomic.md @@ -6,12 +6,12 @@ toc_title: Atomic # Atomic {#atomic} -It is supports non-blocking `DROP` and `RENAME TABLE` queries and atomic `EXCHANGE TABLES t1 AND t2` queries. Atomic database engine is used by default. 
+它支æŒéžé˜»å¡ž DROP å’Œ RENAME TABLE 查询以åŠåŽŸå­ EXCHANGE TABLES t1 AND t2 查询。默认情况下使用Atomicæ•°æ®åº“引擎。 -## Creating a Database {#creating-a-database} +## 创建数æ®åº“ {#creating-a-database} ```sql CREATE DATABASE test ENGINE = Atomic; ``` -[Original article](https://clickhouse.tech/docs/en/engines/database_engines/atomic/) +[原文](https://clickhouse.tech/docs/en/engines/database_engines/atomic/) diff --git a/docs/zh/engines/table-engines/integrations/mongodb.md b/docs/zh/engines/table-engines/integrations/mongodb.md index a3fa677c672..ff3fdae5b40 100644 --- a/docs/zh/engines/table-engines/integrations/mongodb.md +++ b/docs/zh/engines/table-engines/integrations/mongodb.md @@ -37,7 +37,7 @@ ClickHouse 中的表,从 MongoDB 集åˆä¸­è¯»å–æ•°æ®: ``` text CREATE TABLE mongo_table ( - key UInt64, + key UInt64, data String ) ENGINE = MongoDB('mongo1:27017', 'test', 'simple_table', 'testuser', 'clickhouse'); ``` diff --git a/docs/zh/engines/table-engines/mergetree-family/collapsingmergetree.md b/docs/zh/engines/table-engines/mergetree-family/collapsingmergetree.md index 6d1dfac7686..6fb57dc19d9 100644 --- a/docs/zh/engines/table-engines/mergetree-family/collapsingmergetree.md +++ b/docs/zh/engines/table-engines/mergetree-family/collapsingmergetree.md @@ -1,4 +1,4 @@ -# 折å æ ‘ {#table_engine-collapsingmergetree} +# CollapsingMergeTree {#table_engine-collapsingmergetree} 该引擎继承于 [MergeTree](mergetree.md),并在数æ®å—åˆå¹¶ç®—法中添加了折å è¡Œçš„逻辑。 @@ -203,4 +203,4 @@ SELECT * FROM UAct FINAL è¿™ç§æŸ¥è¯¢æ•°æ®çš„方法是éžå¸¸ä½Žæ•ˆçš„。ä¸è¦åœ¨å¤§è¡¨ä¸­ä½¿ç”¨å®ƒã€‚ -[æ¥æºæ–‡ç« ](https://clickhouse.tech/docs/en/operations/table_engines/collapsingmergetree/) +[原文](https://clickhouse.tech/docs/en/operations/table_engines/collapsingmergetree/) diff --git a/docs/zh/engines/table-engines/mergetree-family/graphitemergetree.md b/docs/zh/engines/table-engines/mergetree-family/graphitemergetree.md index 7440abcc027..fa17e0d5c5e 100644 --- a/docs/zh/engines/table-engines/mergetree-family/graphitemergetree.md +++ b/docs/zh/engines/table-engines/mergetree-family/graphitemergetree.md @@ -70,7 +70,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] 除了`config_section`,其它所有å‚æ•°å’Œ`MergeTree`的相应å‚数一样. 
-- `config_section` —é…置文件中设置汇总规则的节点 +- `config_section` —é…置文件中设置汇总规则的节点 diff --git a/docs/zh/engines/table-engines/mergetree-family/replacingmergetree.md b/docs/zh/engines/table-engines/mergetree-family/replacingmergetree.md index 73328015ea9..75ec4ea8b3d 100644 --- a/docs/zh/engines/table-engines/mergetree-family/replacingmergetree.md +++ b/docs/zh/engines/table-engines/mergetree-family/replacingmergetree.md @@ -28,7 +28,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] - `ver` — 版本列。类型为 `UInt*`, `Date` 或 `DateTime`。å¯é€‰å‚数。 在数æ®åˆå¹¶çš„时候,`ReplacingMergeTree` 从所有具有相åŒæŽ’åºé”®çš„行中选择一行留下: - + - 如果 `ver` 列未指定,ä¿ç•™æœ€åŽä¸€æ¡ã€‚ - 如果 `ver` 列已指定,ä¿ç•™ `ver` 值最大的版本。 diff --git a/docs/zh/engines/table-engines/mergetree-family/versionedcollapsingmergetree.md b/docs/zh/engines/table-engines/mergetree-family/versionedcollapsingmergetree.md index 3b89da9f595..8479c92f56f 100644 --- a/docs/zh/engines/table-engines/mergetree-family/versionedcollapsingmergetree.md +++ b/docs/zh/engines/table-engines/mergetree-family/versionedcollapsingmergetree.md @@ -3,7 +3,7 @@ toc_priority: 37 toc_title: "版本折å MergeTree" --- -# 版本折å MergeTree {#versionedcollapsingmergetree} +# VersionedCollapsingMergeTree {#versionedcollapsingmergetree} 这个引擎: @@ -47,7 +47,7 @@ VersionedCollapsingMergeTree(sign, version) **查询 Clauses** -当创建一个 `VersionedCollapsingMergeTree` 表时,跟创建一个 `MergeTree`表的时候需è¦ç›¸åŒ [Clause](mergetree.md) +当创建一个 `VersionedCollapsingMergeTree` 表时,跟创建一个 `MergeTree`表的时候需è¦ç›¸åŒ [Clause](mergetree.md)
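As a sketch of the point above, a table definition could look like the following; the column names and the sorting key are assumptions made up for the example, not taken from the page being patched.

``` sql
-- Illustrative only: columns and the ORDER BY key are assumed for the example.
CREATE TABLE UAct
(
    UserID UInt64,
    Views UInt8,
    Sign Int8,     -- 1 marks a "state" row, -1 marks its cancellation
    Version UInt8  -- version used to collapse rows that arrive out of order
)
ENGINE = VersionedCollapsingMergeTree(Sign, Version)
ORDER BY UserID;
```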
diff --git a/docs/zh/engines/table-engines/special/generate.md b/docs/zh/engines/table-engines/special/generate.md index 80966767462..885ad381d7f 100644 --- a/docs/zh/engines/table-engines/special/generate.md +++ b/docs/zh/engines/table-engines/special/generate.md @@ -5,7 +5,7 @@ toc_title: éšæœºæ•°ç”Ÿæˆ # éšæœºæ•°ç”Ÿæˆè¡¨å¼•æ“Ž {#table_engines-generate} -éšæœºæ•°ç”Ÿæˆè¡¨å¼•æ“Žä¸ºæŒ‡å®šçš„表模å¼ç”Ÿæˆéšæœºæ•° +éšæœºæ•°ç”Ÿæˆè¡¨å¼•æ“Žä¸ºæŒ‡å®šçš„表模å¼ç”Ÿæˆéšæœºæ•° 使用示例: - 测试时生æˆå¯å¤å†™çš„大表 diff --git a/docs/zh/guides/apply-catboost-model.md b/docs/zh/guides/apply-catboost-model.md index 9002e5cf005..ad325dc29db 100644 --- a/docs/zh/guides/apply-catboost-model.md +++ b/docs/zh/guides/apply-catboost-model.md @@ -130,7 +130,7 @@ CatBoost集æˆåˆ°ClickHouse步骤: **1.** 构建评估库。 -评估CatBoost模型的最快方法是编译 `libcatboostmodel.` 库文件. +评估CatBoost模型的最快方法是编译 `libcatboostmodel.` 库文件. 有关如何构建库文件的详细信æ¯ï¼Œè¯·å‚阅 [CatBoost文件](https://catboost.ai/docs/concepts/c-plus-plus-api_dynamic-c-pluplus-wrapper.html). @@ -186,7 +186,7 @@ CatBoost集æˆåˆ°ClickHouse步骤: ACTION AS target FROM amazon_train LIMIT 10 -``` +``` !!! note "注" 函数 [modelEvaluate](../sql-reference/functions/other-functions.md#function-modelevaluate) 返回带有多类模型的æ¯ç±»åŽŸå§‹é¢„测的元组。 diff --git a/docs/zh/guides/index.md b/docs/zh/guides/index.md index 8d822307c28..54cb4e1055c 100644 --- a/docs/zh/guides/index.md +++ b/docs/zh/guides/index.md @@ -8,9 +8,9 @@ toc_title: "\u6982\u8FF0" # ClickHouseæŒ‡å— {#clickhouse-guides} -详细的一步一步的说明,帮助解决使用ClickHouseçš„å„ç§ä»»åŠ¡åˆ—表: +列出了如何使用 Clickhouse 解决å„ç§ä»»åŠ¡çš„详细说明: -- [简å•é›†ç¾¤è®¾ç½®æ•™ç¨‹](../getting-started/tutorial.md) +- [关于简å•é›†ç¾¤è®¾ç½®çš„教程](../getting-started/tutorial.md) - [在ClickHouse中应用CatBoost模型](apply-catboost-model.md) [原始文章](https://clickhouse.tech/docs/en/guides/) diff --git a/docs/zh/interfaces/formats.md b/docs/zh/interfaces/formats.md index ef18679af12..1cd91690e57 100644 --- a/docs/zh/interfaces/formats.md +++ b/docs/zh/interfaces/formats.md @@ -62,7 +62,7 @@ ClickHouseå¯ä»¥æŽ¥å—和返回å„ç§æ ¼å¼çš„æ•°æ®ã€‚å—支æŒçš„è¾“å…¥æ ¼å¼ | [RawBLOB](#rawblob) | ✔ | ✔ | -您å¯ä»¥ä½¿ç”¨ClickHouse设置一些格å¼åŒ–å‚数。更多详情设置请å‚考[设置](../operations/settings/settings.md) +您å¯ä»¥ä½¿ç”¨ClickHouse设置一些格å¼åŒ–å‚数。更多详情设置请å‚考[设置](../operations/settings/settings.md) ## TabSeparated {#tabseparated} diff --git a/docs/zh/interfaces/http.md b/docs/zh/interfaces/http.md index 1bae0ad1df3..8ab54293b32 100644 --- a/docs/zh/interfaces/http.md +++ b/docs/zh/interfaces/http.md @@ -162,7 +162,7 @@ $ echo 'DROP TABLE t' | curl 'http://localhost:8123/' --data-binary @- 如果在URL中指定了`compress=1`,æœåŠ¡ä¼šè¿”回压缩的数æ®ã€‚ 如果在URL中指定了`decompress=1`,æœåŠ¡ä¼šè§£åŽ‹é€šè¿‡POST方法å‘é€çš„æ•°æ®ã€‚ -您也å¯ä»¥é€‰æ‹©ä½¿ç”¨[HTTP compression](https://en.wikipedia.org/wiki/HTTP_compression)。å‘é€ä¸€ä¸ªåŽ‹ç¼©çš„POST请求,附加请求头`Content-Encoding: compression_method`。为了使ClickHouseå“应,您必须附加`Accept-Encoding: compression_method`。ClickHouse支æŒ`gzip`,`br`å’Œ`deflate` [compression methods](https://en.wikipedia.org/wiki/HTTP_compression#Content-Encoding_tokens)。è¦å¯ç”¨HTTP压缩,必须使用ClickHouse[å¯ç”¨Http压缩](../operations/settings/settings.md#settings-enable_http_compression)é…置。您å¯ä»¥åœ¨[Http zlib压缩级别](#settings-http_zlib_compression_level)设置中为所有压缩方法é…置数æ®åŽ‹ç¼©çº§åˆ«ã€‚ +您也å¯ä»¥é€‰æ‹©ä½¿ç”¨[HTTP compression](https://en.wikipedia.org/wiki/HTTP_compression)。å‘é€ä¸€ä¸ªåŽ‹ç¼©çš„POST请求,附加请求头`Content-Encoding: compression_method`。为了使ClickHouseå“应,您必须附加`Accept-Encoding: compression_method`。ClickHouse支æŒ`gzip`,`br`å’Œ`deflate` [compression 
methods](https://en.wikipedia.org/wiki/HTTP_compression#Content-Encoding_tokens)。è¦å¯ç”¨HTTP压缩,必须使用ClickHouse[å¯ç”¨Http压缩](../operations/settings/settings.md#settings-enable_http_compression)é…置。您å¯ä»¥åœ¨[Http zlib压缩级别](#settings-http_zlib_compression_level)设置中为所有压缩方法é…置数æ®åŽ‹ç¼©çº§åˆ«ã€‚ 您å¯ä»¥ä½¿ç”¨å®ƒåœ¨ä¼ è¾“大é‡æ•°æ®æ—¶å‡å°‘网络æµé‡ï¼Œæˆ–者创建立å³åŽ‹ç¼©çš„转储。 diff --git a/docs/zh/interfaces/tcp.md b/docs/zh/interfaces/tcp.md index b779b9fea40..571fd22b758 100644 --- a/docs/zh/interfaces/tcp.md +++ b/docs/zh/interfaces/tcp.md @@ -5,6 +5,6 @@ toc_title: 原生接å£(TCP) # 原生接å£ï¼ˆTCP){#native-interface-tcp} -原生接å£ç”¨äºŽ[命令行客户端](cli.md),用于分布å¼æŸ¥è¯¢å¤„ç†æœŸé—´çš„æœåŠ¡å™¨é—´é€šä¿¡ï¼Œä»¥åŠå…¶ä»–C++程åºã€‚å¯æƒœçš„是,原生的ClickHouseå议还没有正å¼çš„规范,但它å¯ä»¥ä»ŽClickHouse[æºä»£ç ](https://github.com/ClickHouse/ClickHouse/tree/master/src/Client)通过拦截和分æžTCPæµé‡è¿›è¡Œåå‘工程。 +原生接å£å议用于[命令行客户端](cli.md),用于分布å¼æŸ¥è¯¢å¤„ç†æœŸé—´çš„æœåŠ¡å™¨é—´é€šä¿¡ï¼Œä»¥åŠå…¶ä»–C++ 程åºã€‚ä¸å¹¸çš„是,原生ClickHouseå议还没有正å¼çš„规范,但它å¯ä»¥ä»ŽClickHouseæºä»£ç [从这里开始](https://github.com/ClickHouse/ClickHouse/tree/master/src/Client)或通过拦截和分æžTCPæµé‡è¿›è¡Œé€†å‘工程。 -[æ¥æºæ–‡ç« ](https://clickhouse.tech/docs/zh/interfaces/tcp/) +[原文](https://clickhouse.tech/docs/en/interfaces/tcp/) diff --git a/docs/zh/interfaces/third-party/gui.md b/docs/zh/interfaces/third-party/gui.md index e85f8b2ec79..46baf55d564 100644 --- a/docs/zh/interfaces/third-party/gui.md +++ b/docs/zh/interfaces/third-party/gui.md @@ -57,9 +57,9 @@ ClickHouse Web ç•Œé¢ [Tabix](https://github.com/tabixio/tabix). - 表格预览。 - 自动完æˆã€‚ -### ツ环æ¿-ï½®ï¾‚å˜‰ï½¯ï¾‚å² {#clickhouse-cli} +### clickhouse-cli {#clickhouse-cli} -[ツ环æ¿-ョツ嘉ッツå²](https://github.com/hatarist/clickhouse-cli) 是ClickHouse的替代命令行客户端,用Python 3编写。 +[clickhouse-cli](https://github.com/hatarist/clickhouse-cli) 是ClickHouse的替代命令行客户端,用Python 3编写。 特å¾ï¼š @@ -68,15 +68,15 @@ ClickHouse Web ç•Œé¢ [Tabix](https://github.com/tabixio/tabix). - 寻呼机支æŒæ•°æ®è¾“出。 - 自定义PostgreSQL类命令。 -### ツ暗ェツ氾环催ツ団ツ法ツ人 {#clickhouse-flamegraph} +### clickhouse-flamegraph {#clickhouse-flamegraph} [clickhouse-flamegraph](https://github.com/Slach/clickhouse-flamegraph) 是一个å¯è§†åŒ–的专业工具`system.trace_log`如[flamegraph](http://www.brendangregg.com/flamegraphs.html). 
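Before reaching for a visualization tool, it can help to peek at the raw samples. The query below is only a sketch and assumes the query profiler is enabled so that `system.trace_log` actually receives rows.

``` sql
-- Rough sketch: profiler samples per query for today (assumes trace_log collection is enabled).
SELECT query_id, count() AS samples
FROM system.trace_log
WHERE event_date = today()
GROUP BY query_id
ORDER BY samples DESC
LIMIT 5;
```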
## 商业 {#shang-ye} -### ツ环æ¿Softwareョツ嘉ッ {#holistics-software} +### Holistics {#holistics-software} -[整体学](https://www.holistics.io/) 在2019年被Gartner FrontRunners列为å¯ç”¨æ€§æœ€é«˜æŽ’å第二的商业智能工具之一。 Holistics是一个基于SQL的全栈数æ®å¹³å°å’Œå•†ä¸šæ™ºèƒ½å·¥å…·ï¼Œç”¨äºŽè®¾ç½®æ‚¨çš„分æžæµç¨‹ã€‚ +[Holistics](https://www.holistics.io/) 在2019年被Gartner FrontRunners列为å¯ç”¨æ€§æœ€é«˜æŽ’å第二的商业智能工具之一。 Holistics是一个基于SQL的全栈数æ®å¹³å°å’Œå•†ä¸šæ™ºèƒ½å·¥å…·ï¼Œç”¨äºŽè®¾ç½®æ‚¨çš„分æžæµç¨‹ã€‚ 特å¾ï¼š diff --git a/docs/zh/interfaces/third-party/integrations.md b/docs/zh/interfaces/third-party/integrations.md index 403ef994bb9..20dcf47cabd 100644 --- a/docs/zh/interfaces/third-party/integrations.md +++ b/docs/zh/interfaces/third-party/integrations.md @@ -43,7 +43,7 @@ Yandex**没有**维护下é¢åˆ—出的库,也没有åšè¿‡ä»»ä½•å¹¿æ³›çš„测试 - Monitoring - [Graphite](https://graphiteapp.org) - [graphouse](https://github.com/yandex/graphouse) - - [carbon-clickhouse](https://github.com/lomik/carbon-clickhouse) + + - [carbon-clickhouse](https://github.com/lomik/carbon-clickhouse) - [graphite-clickhouse](https://github.com/lomik/graphite-clickhouse) - [graphite-ch-optimizer](https://github.com/innogames/graphite-ch-optimizer) - optimizes staled partitions in [\*GraphiteMergeTree](../../engines/table-engines/mergetree-family/graphitemergetree.md#graphitemergetree) if rules from [rollup configuration](../../engines/table-engines/mergetree-family/graphitemergetree.md#rollup-configuration) could be applied - [Grafana](https://grafana.com/) diff --git a/docs/zh/introduction/adopters.md b/docs/zh/introduction/adopters.md index ed78abeacfb..6b9ac7c2ceb 100644 --- a/docs/zh/introduction/adopters.md +++ b/docs/zh/introduction/adopters.md @@ -64,7 +64,7 @@ toc_title: "ClickHouse用户" | [Splunk](https://www.splunk.com/) | ä¸šåŠ¡åˆ†æž | 主è¦äº§å“ | — | — | [英文幻ç¯ç‰‡ï¼Œ2018å¹´1月](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup12/splunk.pdf) | | [Spotify](https://www.spotify.com) | éŸ³ä¹ | 实验 | — | — | [å¹»ç¯ç‰‡ï¼Œä¸ƒæœˆ2018](https://www.slideshare.net/glebus/using-clickhouse-for-experimentation-104247173) | | [腾讯](https://www.tencent.com) | å¤§æ•°æ® | æ•°æ®å¤„ç† | — | — | [中文幻ç¯ç‰‡ï¼Œ2018å¹´10月](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup19/5.%20ClickHouse大数æ®é›†ç¾¤åº”用_æŽä¿Šé£žè…¾è®¯ç½‘媒事业部.pdf) | -| 腾讯QQ音ä¹(TME) | å¤§æ•°æ® | æ•°æ®å¤„ç† | — | — | [åšå®¢æ–‡ç« ï¼Œ2020å¹´6月](https://cloud.tencent.com/developer/article/1637840) +| 腾讯QQ音ä¹(TME) | å¤§æ•°æ® | æ•°æ®å¤„ç† | — | — | [åšå®¢æ–‡ç« ï¼Œ2020å¹´6月](https://cloud.tencent.com/developer/article/1637840) | [优步](https://www.uber.com) | 出租车 | 日志记录 | — | — | [å¹»ç¯ç‰‡ï¼ŒäºŒæœˆ2020](https://presentations.clickhouse.tech/meetup40/uber.pdf) | | [VKontakte](https://vk.com) | 社交网络 | 统计,日志记录 | — | — | [俄文幻ç¯ç‰‡ï¼Œå…«æœˆ2018](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup17/3_vk.pdf) | | [Wisebits](https://wisebits.com/) | IT解决方案 | åˆ†æž | — | — | [俄文幻ç¯ç‰‡ï¼Œ2019å¹´5月](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup22/strategies.pdf) | diff --git a/docs/zh/operations/index.md b/docs/zh/operations/index.md index f35858279f5..5139f083ceb 100644 --- a/docs/zh/operations/index.md +++ b/docs/zh/operations/index.md @@ -5,9 +5,21 @@ toc_title: "æ“作" # æ“作 {#operations} -Clickhouseè¿ç»´æ‰‹å†Œä¸»è¦åŒ…å«ä¸‹é¢å‡ éƒ¨åˆ†ï¼š +ClickHouseæ“作手册由以下主è¦éƒ¨åˆ†ç»„æˆï¼š -- 安装è¦æ±‚ +- [安装è¦æ±‚](../operations/requirements.md) +- [监控](../operations/monitoring.md) +- [故障排除](../operations/troubleshooting.md) +- [使用建议](../operations/tips.md) +- 
[更新程åº](../operations/update.md) +- [访问æƒé™](../operations/access-rights.md) +- [æ•°æ®å¤‡ä»½](../operations/backup.md) +- [é…置文件](../operations/configuration-files.md) +- [é…é¢](../operations/quotas.md) +- [系统表](../operations/system-tables/index.md) +- [æœåŠ¡å™¨é…ç½®å‚æ•°](../operations/server-configuration-parameters/index.md) +- [如何用ClickHouse测试你的硬件](../operations/performance-test.md) +- [设置](../operations/settings/index.md) +- [实用工具](../operations/utilities/index.md) - -[原始文章](https://clickhouse.tech/docs/en/operations/) +[原文](https://clickhouse.tech/docs/en/operations/) diff --git a/docs/zh/operations/server-configuration-parameters/settings.md b/docs/zh/operations/server-configuration-parameters/settings.md index 615f5ef933d..0c6a5046877 100644 --- a/docs/zh/operations/server-configuration-parameters/settings.md +++ b/docs/zh/operations/server-configuration-parameters/settings.md @@ -462,7 +462,7 @@ SSL客户端/æœåŠ¡å™¨é…置。 - extendedVerification – Automatically extended verification of certificates after the session ends. Acceptable values: `true`, `false`. - requireTLSv1 – Require a TLSv1 connection. Acceptable values: `true`, `false`. - requireTLSv1_1 – Require a TLSv1.1 connection. Acceptable values: `true`, `false`. -- requireTLSv1 – Require a TLSv1.2 connection. Acceptable values: `true`, `false`. +- requireTLSv1_2 – Require a TLSv1.2 connection. Acceptable values: `true`, `false`. - fips – Activates OpenSSL FIPS mode. Supported if the library’s OpenSSL version supports FIPS. - privateKeyPassphraseHandler – Class (PrivateKeyPassphraseHandler subclass) that requests the passphrase for accessing the private key. For example: ``, `KeyFileHandler`, `test`, ``. - invalidCertificateHandler – Class (a subclass of CertificateHandler) for verifying invalid certificates. For example: ` ConsoleCertificateHandler ` . 
diff --git a/docs/zh/operations/system-tables/asynchronous_metrics.md b/docs/zh/operations/system-tables/asynchronous_metrics.md index 5a302f6da7b..d6d2682c9a1 100644 --- a/docs/zh/operations/system-tables/asynchronous_metrics.md +++ b/docs/zh/operations/system-tables/asynchronous_metrics.md @@ -9,8 +9,8 @@ machine_translated_rev: 5decc73b5dc60054f19087d3690c4eb99446a6c3 列: -- `metric` ([字符串](../../sql-reference/data-types/string.md)) — 指标å。 -- `value` ([Float64](../../sql-reference/data-types/float.md)) — 指标值。 +- `metric` ([字符串](../../sql-reference/data-types/string.md)) — 指标å。 +- `value` ([Float64](../../sql-reference/data-types/float.md)) — 指标值。 **示例** @@ -35,6 +35,6 @@ SELECT * FROM system.asynchronous_metrics LIMIT 10 **å¦è¯·å‚阅** - [监测](../../operations/monitoring.md) — ClickHouse监控的基本概念。 -- [系统。指标](../../operations/system-tables/metrics.md#system_tables-metrics) — 包å«å³æ—¶è®¡ç®—的指标。 -- [系统。活动](../../operations/system-tables/events.md#system_tables-events) — 包å«å‡ºçŽ°çš„事件的次数。 +- [系统。指标](../../operations/system-tables/metrics.md#system_tables-metrics) — 包å«å³æ—¶è®¡ç®—的指标。 +- [系统。活动](../../operations/system-tables/events.md#system_tables-events) — 包å«å‡ºçŽ°çš„事件的次数。 - [系统。metric\_log](../../operations/system-tables/metric_log.md#system_tables-metric_log) — 包å«`system.metrics` å’Œ `system.events`表中的指标的历å²å€¼ã€‚ diff --git a/docs/zh/operations/system-tables/clusters.md b/docs/zh/operations/system-tables/clusters.md index bcafff4970a..386bdfd539b 100644 --- a/docs/zh/operations/system-tables/clusters.md +++ b/docs/zh/operations/system-tables/clusters.md @@ -5,13 +5,13 @@ 列: - `cluster` (String) — 集群å。 -- `shard_num` (UInt32) — 集群中的分片数,从1开始。 +- `shard_num` (UInt32) — 集群中的分片数,从1开始。 - `shard_weight` (UInt32) — 写数æ®æ—¶è¯¥åˆ†ç‰‡çš„相对æƒé‡ã€‚ -- `replica_num` (UInt32) — 分片的副本数é‡ï¼Œä»Ž1开始。 -- `host_name` (String) — é…置中指定的主机å。 +- `replica_num` (UInt32) — 分片的副本数é‡ï¼Œä»Ž1开始。 +- `host_name` (String) — é…置中指定的主机å。 - `host_address` (String) — 从DNS获å–的主机IP地å€ã€‚ -- `port` (UInt16) — 连接到æœåŠ¡å™¨çš„端å£ã€‚ -- `user` (String) — 连接到æœåŠ¡å™¨çš„用户å。 +- `port` (UInt16) — 连接到æœåŠ¡å™¨çš„端å£ã€‚ +- `user` (String) — 连接到æœåŠ¡å™¨çš„用户å。 - `errors_count` (UInt32) - 此主机无法访问副本的次数。 - `slowdowns_count` (UInt32) - 与对冲请求建立连接时导致更改副本的å‡é€Ÿæ¬¡æ•°ã€‚ - `estimated_recovery_time` (UInt32) - 剩下的秒数,直到副本错误计数归零并被视为æ¢å¤æ­£å¸¸ã€‚ diff --git a/docs/zh/operations/system-tables/functions.md b/docs/zh/operations/system-tables/functions.md index 8229a94cd5c..695c7b7fee1 100644 --- a/docs/zh/operations/system-tables/functions.md +++ b/docs/zh/operations/system-tables/functions.md @@ -26,5 +26,5 @@ │ JSONExtractInt │ 0 │ 0 │ │ └──────────────────────────┴──────────────┴──────────────────┴──────────┘ -10 rows in set. Elapsed: 0.002 sec. +10 rows in set. Elapsed: 0.002 sec. 
``` diff --git a/docs/zh/operations/system-tables/tables.md b/docs/zh/operations/system-tables/tables.md index 0c3e913b9bb..d82911498ad 100644 --- a/docs/zh/operations/system-tables/tables.md +++ b/docs/zh/operations/system-tables/tables.md @@ -51,7 +51,7 @@ machine_translated_rev: 5decc73b5dc60054f19087d3690c4eb99446a6c3 - 如果表将数æ®å­˜åœ¨ç£ç›˜ä¸Šï¼Œè¿”回实际使用的ç£ç›˜ç©ºé—´ï¼ˆåŽ‹ç¼©åŽï¼‰ã€‚ - 如果表在内存中存储数æ®ï¼Œè¿”回在内存中使用的近似字节数。 -- `lifetime_rows` (Nullbale(UInt64))-æœåŠ¡å¯åŠ¨åŽæ’入的总行数(åªé’ˆå¯¹`Buffer`表)。 +- `lifetime_rows` (Nullbale(UInt64))-æœåŠ¡å¯åŠ¨åŽæ’入的总行数(åªé’ˆå¯¹`Buffer`表)。 `system.tables` 表被用于 `SHOW TABLES` 的查询实现中。 diff --git a/docs/zh/operations/system-tables/zookeeper.md b/docs/zh/operations/system-tables/zookeeper.md index ca767fba7aa..c25ce498460 100644 --- a/docs/zh/operations/system-tables/zookeeper.md +++ b/docs/zh/operations/system-tables/zookeeper.md @@ -20,14 +20,14 @@ machine_translated_rev: 5decc73b5dc60054f19087d3690c4eb99446a6c3 - `name` (String) — 节点的å字。 - `path` (String) — 节点的路径。 -- `value` (String) — 节点的值。 +- `value` (String) — 节点的值。 - `dataLength` (Int32) — 节点的值长度。 - `numChildren` (Int32) — å­èŠ‚点的个数。 - `czxid` (Int64) — 创建该节点的事务ID。 - `mzxid` (Int64) — 最åŽä¿®æ”¹è¯¥èŠ‚点的事务ID。 - `pzxid` (Int64) — 最åŽåˆ é™¤æˆ–者增加å­èŠ‚点的事务ID。 -- `ctime` (DateTime) — 节点的创建时间。 -- `mtime` (DateTime) — 节点的最åŽä¿®æ”¹æ—¶é—´ã€‚ +- `ctime` (DateTime) — 节点的创建时间。 +- `mtime` (DateTime) — 节点的最åŽä¿®æ”¹æ—¶é—´ã€‚ - `version` (Int32) — 节点版本:节点被修改的次数。 - `cversion` (Int32) — 增加或删除å­èŠ‚点的个数。 - `aversion` (Int32) — ACL的修改次数。 diff --git a/docs/zh/operations/utilities/clickhouse-benchmark.md b/docs/zh/operations/utilities/clickhouse-benchmark.md index 1c255f621c0..54dbadc79bc 100644 --- a/docs/zh/operations/utilities/clickhouse-benchmark.md +++ b/docs/zh/operations/utilities/clickhouse-benchmark.md @@ -89,11 +89,11 @@ localhost:9000, queries 10, QPS: 6.772, RPS: 67904487.440, MiB/s: 518.070, resul - ClickHouseæœåŠ¡å™¨çš„连接信æ¯ã€‚ - 已处ç†çš„查询数。 - - QPS:æœåŠ¡ç«¯æ¯ç§’处ç†çš„æŸ¥è¯¢æ•°é‡ - - RPS:æœåŠ¡å™¨æ¯ç§’读å–多少行 - - MiB/s:æœåŠ¡å™¨æ¯ç§’读å–å¤šå°‘å­—èŠ‚çš„æ•°æ® - - 结果RPS:æœåŠ¡ç«¯æ¯ç§’生æˆå¤šå°‘è¡Œçš„ç»“æžœé›†æ•°æ® - - 结果MiB/s.æœåŠ¡ç«¯æ¯ç§’生æˆå¤šå°‘å­—èŠ‚çš„ç»“æžœé›†æ•°æ® + - QPS:æœåŠ¡ç«¯æ¯ç§’处ç†çš„æŸ¥è¯¢æ•°é‡ + - RPS:æœåŠ¡å™¨æ¯ç§’读å–多少行 + - MiB/s:æœåŠ¡å™¨æ¯ç§’读å–å¤šå°‘å­—èŠ‚çš„æ•°æ® + - 结果RPS:æœåŠ¡ç«¯æ¯ç§’生æˆå¤šå°‘è¡Œçš„ç»“æžœé›†æ•°æ® + - 结果MiB/s.æœåŠ¡ç«¯æ¯ç§’生æˆå¤šå°‘å­—èŠ‚çš„ç»“æžœé›†æ•°æ® - 查询执行时间的百分比。 diff --git a/docs/zh/sql-reference/aggregate-functions/parametric-functions.md b/docs/zh/sql-reference/aggregate-functions/parametric-functions.md index fc0c7144305..468e8e9c76e 100644 --- a/docs/zh/sql-reference/aggregate-functions/parametric-functions.md +++ b/docs/zh/sql-reference/aggregate-functions/parametric-functions.md @@ -19,7 +19,7 @@ histogram(number_of_bins)(values) **å‚æ•°** -`number_of_bins` — 直方图bin个数,这个函数会自动计算binçš„æ•°é‡ï¼Œè€Œä¸”会尽é‡ä½¿ç”¨æŒ‡å®šå€¼ï¼Œå¦‚果无法åšåˆ°ï¼Œé‚£å°±ä½¿ç”¨æ›´å°çš„bin个数。 +`number_of_bins` — 直方图bin个数,这个函数会自动计算binçš„æ•°é‡ï¼Œè€Œä¸”会尽é‡ä½¿ç”¨æŒ‡å®šå€¼ï¼Œå¦‚果无法åšåˆ°ï¼Œé‚£å°±ä½¿ç”¨æ›´å°çš„bin个数。 `values` — [表达å¼](../syntax.md#syntax-expressions) 输入值。 diff --git a/docs/zh/sql-reference/aggregate-functions/reference/anylast.md b/docs/zh/sql-reference/aggregate-functions/reference/anylast.md index e6792e0e449..81c93957fbf 100644 --- a/docs/zh/sql-reference/aggregate-functions/reference/anylast.md +++ b/docs/zh/sql-reference/aggregate-functions/reference/anylast.md @@ -6,4 +6,3 @@ toc_priority: 104 选择é‡åˆ°çš„最åŽä¸€ä¸ªå€¼ã€‚ 其结果和[any](../../../sql-reference/aggregate-functions/reference/any.md) 
函数一样是ä¸ç¡®å®šçš„ 。 - \ No newline at end of file diff --git a/docs/zh/sql-reference/aggregate-functions/reference/avgweighted.md b/docs/zh/sql-reference/aggregate-functions/reference/avgweighted.md index 9b732f57b4a..433b1c12099 100644 --- a/docs/zh/sql-reference/aggregate-functions/reference/avgweighted.md +++ b/docs/zh/sql-reference/aggregate-functions/reference/avgweighted.md @@ -15,8 +15,8 @@ avgWeighted(x, weight) **å‚æ•°** -- `x` — 值。 -- `weight` — 值的加æƒã€‚ +- `x` — 值。 +- `weight` — 值的加æƒã€‚ `x` å’Œ `weight` 的类型必须是 [æ•´æ•°](../../../sql-reference/data-types/int-uint.md), 或 diff --git a/docs/zh/sql-reference/aggregate-functions/reference/rankCorr.md b/docs/zh/sql-reference/aggregate-functions/reference/rankCorr.md index c29a43f6ca9..716a9fb2440 100644 --- a/docs/zh/sql-reference/aggregate-functions/reference/rankCorr.md +++ b/docs/zh/sql-reference/aggregate-functions/reference/rankCorr.md @@ -17,7 +17,7 @@ rankCorr(x, y) - Returns a rank correlation coefficient of the ranks of x and y. The value of the correlation coefficient ranges from -1 to +1. If less than two arguments are passed, the function will return an exception. The value close to +1 denotes a high linear relationship, and with an increase of one random variable, the second random variable also increases. The value close to -1 denotes a high linear relationship, and with an increase of one random variable, the second random variable decreases. The value close or equal to 0 denotes no relationship between the two random variables. -类型: [Float64](../../../sql-reference/data-types/float.md#float32-float64)。 +类型: [Float64](../../../sql-reference/data-types/float.md#float32-float64)。 **示例** diff --git a/docs/zh/sql-reference/data-types/lowcardinality.md b/docs/zh/sql-reference/data-types/lowcardinality.md index b8985691f0f..a0cd4c270c3 100644 --- a/docs/zh/sql-reference/data-types/lowcardinality.md +++ b/docs/zh/sql-reference/data-types/lowcardinality.md @@ -32,7 +32,7 @@ LowCardinality(data_type) ```sql CREATE TABLE lc_t ( - `id` UInt16, + `id` UInt16, `strings` LowCardinality(String) ) ENGINE = MergeTree() diff --git a/docs/zh/sql-reference/data-types/special-data-types/interval.md b/docs/zh/sql-reference/data-types/special-data-types/interval.md index 9df25e3f555..7328988c68d 100644 --- a/docs/zh/sql-reference/data-types/special-data-types/interval.md +++ b/docs/zh/sql-reference/data-types/special-data-types/interval.md @@ -57,7 +57,7 @@ SELECT now() as current_date_time, current_date_time + INTERVAL 4 DAY ä¸åŒç±»åž‹çš„é—´éš”ä¸èƒ½åˆå¹¶ã€‚ ä½ ä¸èƒ½ä½¿ç”¨è¯¸å¦‚ `4 DAY 1 HOUR` 的时间间隔. 以å°äºŽæˆ–等于时间间隔最å°å•ä½çš„å•ä½æ¥æŒ‡å®šé—´éš”,例如,时间间隔 `1 day and an hour` å¯ä»¥è¡¨ç¤ºä¸º `25 HOUR` 或 `90000 SECOND`. ä½ ä¸èƒ½å¯¹ `Interval` 类型的值执行算术è¿ç®—,但你å¯ä»¥å‘ `Date` 或 `DateTime` æ•°æ®ç±»åž‹çš„值添加ä¸åŒç±»åž‹çš„时间间隔,例如: - + ``` sql SELECT now() AS current_date_time, current_date_time + INTERVAL 4 DAY + INTERVAL 3 HOUR ``` diff --git a/docs/zh/sql-reference/functions/date-time-functions.md b/docs/zh/sql-reference/functions/date-time-functions.md index 12024a11c3d..9f8e7f49d07 100644 --- a/docs/zh/sql-reference/functions/date-time-functions.md +++ b/docs/zh/sql-reference/functions/date-time-functions.md @@ -364,13 +364,13 @@ SELECT toDate('2016-12-27') AS date, toYearWeek(date) AS yearWeek0, toYearWeek(d å°†Date或DateTime按指定的å•ä½å‘å‰å–整到最接近的时间点。 -**语法** +**语法** ``` sql date_trunc(unit, value[, timezone]) ``` -别å: `dateTrunc`. +别å: `dateTrunc`. 
**å‚æ•°** @@ -433,7 +433,7 @@ SELECT now(), date_trunc('hour', now(), 'Europe/Moscow'); 返回当å‰æ—¥æœŸå’Œæ—¶é—´ã€‚ -**语法** +**语法** ``` sql now([timezone]) diff --git a/docs/zh/sql-reference/functions/encoding-functions.md b/docs/zh/sql-reference/functions/encoding-functions.md index 75e0118a88d..e0daab15dd3 100644 --- a/docs/zh/sql-reference/functions/encoding-functions.md +++ b/docs/zh/sql-reference/functions/encoding-functions.md @@ -1,7 +1,7 @@ # ç¼–ç å‡½æ•° {#bian-ma-han-shu} ## char {#char} - + 返回长度为传递å‚æ•°æ•°é‡çš„字符串,并且æ¯ä¸ªå­—节都有对应å‚数的值。接å—æ•°å­—Numeric类型的多个å‚数。如果å‚数的值超出了UInt8æ•°æ®ç±»åž‹çš„范围,则将其转æ¢ä¸ºUInt8,并å¯èƒ½è¿›è¡Œèˆå…¥å’Œæº¢å‡ºã€‚ **语法** diff --git a/docs/zh/sql-reference/statements/alter.md b/docs/zh/sql-reference/statements/alter.md index 4d1cdca71e5..81ef9124e45 100644 --- a/docs/zh/sql-reference/statements/alter.md +++ b/docs/zh/sql-reference/statements/alter.md @@ -208,7 +208,7 @@ ALTER TABLE [db].name DROP CONSTRAINT constraint_name; - [MOVE PARTITION TO TABLE](#alter_move_to_table-partition) — 从表中å¤åˆ¶æ•°æ®åˆ†åŒºåˆ°å…¶å®ƒè¡¨. - [CLEAR COLUMN IN PARTITION](#alter_clear-column-partition) — é‡ç½®åˆ†åŒºä¸­æŸä¸ªåˆ—的值 - [CLEAR INDEX IN PARTITION](#alter_clear-index-partition) — é‡ç½®åˆ†åŒºä¸­æŒ‡å®šçš„二级索引 -- [FREEZE PARTITION](#alter_freeze-partition) — 创建分区的备份 +- [FREEZE PARTITION](#alter_freeze-partition) — 创建分区的备份 - [FETCH PARTITION](#alter_fetch-partition) — 从其它æœåŠ¡å™¨ä¸Šä¸‹è½½åˆ† - [MOVE PARTITION\|PART](#alter_move-partition) — 将分区/æ•°æ®å—移动到å¦å¤–çš„ç£ç›˜/å· @@ -380,7 +380,7 @@ ALTER TABLE table_name FETCH PARTITION partition_expr FROM 'path-in-zookeeper' 从å¦ä¸€æœåŠ¡å™¨ä¸Šä¸‹è½½åˆ†åŒºæ•°æ®ã€‚仅支æŒå¯å¤åˆ¶å¼•æ“Žè¡¨ã€‚ 该æ“作åšäº†å¦‚下步骤: 1. 从指定数æ®åˆ†ç‰‡ä¸Šä¸‹è½½åˆ†åŒºã€‚在 path-in-zookeeper 这一å‚数你必须设置Zookeeper中该分片的path值。 -2. 然åŽå°†å·²ä¸‹è½½çš„æ•°æ®æ”¾åˆ° `table_name` 表的 `detached` 目录下。通过 [ATTACH PARTITION\|PART](#alter_attach-partition)将数æ®åŠ è½½åˆ°è¡¨ä¸­ã€‚ +2. 然åŽå°†å·²ä¸‹è½½çš„æ•°æ®æ”¾åˆ° `table_name` 表的 `detached` 目录下。通过 [ATTACH PARTITION\|PART](#alter_attach-partition)将数æ®åŠ è½½åˆ°è¡¨ä¸­ã€‚ 示例: diff --git a/docs/zh/sql-reference/statements/grant.md b/docs/zh/sql-reference/statements/grant.md index f8d85679fa3..cb3952767ac 100644 --- a/docs/zh/sql-reference/statements/grant.md +++ b/docs/zh/sql-reference/statements/grant.md @@ -12,7 +12,7 @@ toc_title: 授æƒæ“作 ## 授æƒæ“作语法 {#grant-privigele-syntax} ``` sql -GRANT [ON CLUSTER cluster_name] privilege[(column_name [,...])] [,...] ON {db.table|db.*|*.*|table|*} TO {user | role | CURRENT_USER} [,...] [WITH GRANT OPTION] +GRANT [ON CLUSTER cluster_name] privilege[(column_name [,...])] [,...] ON {db.table|db.*|*.*|table|*} TO {user | role | CURRENT_USER} [,...] [WITH GRANT OPTION] [WITH REPLACE OPTION] ``` - `privilege` — æƒé™ç±»åž‹ @@ -20,17 +20,19 @@ GRANT [ON CLUSTER cluster_name] privilege[(column_name [,...])] [,...] ON {db.ta - `user` — ç”¨æˆ·è´¦å· `WITH GRANT OPTION` 授予 `user` 或 `role`执行 `GRANT` æ“作的æƒé™ã€‚用户å¯å°†åœ¨è‡ªèº«æƒé™èŒƒå›´å†…çš„æƒé™è¿›è¡ŒæŽˆæƒ +`WITH REPLACE OPTION` 以当å‰sql里的新æƒé™æ›¿ä»£æŽ‰ `user` 或 `role`çš„æ—§æƒé™ï¼Œå¦‚果没有该选项则是追加授æƒã€‚ ## 角色分é…的语法 {#assign-role-syntax} ``` sql -GRANT [ON CLUSTER cluster_name] role [,...] TO {user | another_role | CURRENT_USER} [,...] [WITH ADMIN OPTION] +GRANT [ON CLUSTER cluster_name] role [,...] TO {user | another_role | CURRENT_USER} [,...] 
[WITH ADMIN OPTION] [WITH REPLACE OPTION] ``` - `role` — 角色 - `user` — 用户 `WITH ADMIN OPTION` 授予 `user` 或 `role` 执行[ADMIN OPTION](#admin-option-privilege) çš„æƒé™ +`WITH REPLACE OPTION` 以当å‰sql里的新role替代掉 `user` 或 `role`çš„æ—§role,如果没有该选项则是追加roles。 ## 用法 {#grant-usage} @@ -86,7 +88,7 @@ GRANT SELECT(x,y) ON db.table TO john WITH GRANT OPTION - `ALTER CLEAR INDEX` - `ALTER CONSTRAINT` - `ALTER ADD CONSTRAINT` - + - `ALTER DROP CONSTRAINT` - `ALTER TTL` - `ALTER MATERIALIZE TTL` diff --git a/docs/zh/sql-reference/statements/misc.md b/docs/zh/sql-reference/statements/misc.md index 26b3997a4d6..5ec589c9e9a 100644 --- a/docs/zh/sql-reference/statements/misc.md +++ b/docs/zh/sql-reference/statements/misc.md @@ -21,7 +21,7 @@ toc_title: "\u5176\u4ED6" ATTACH TABLE [IF NOT EXISTS] [db.]name [ON CLUSTER cluster] ``` -å¯åŠ¨æœåŠ¡å™¨æ—¶ä¼šè‡ªåŠ¨è§¦å‘此查询。 +å¯åŠ¨æœåŠ¡å™¨æ—¶ä¼šè‡ªåŠ¨è§¦å‘此查询。 æœåŠ¡å™¨å°†è¡¨çš„元数æ®ä½œä¸ºæ–‡ä»¶å­˜å‚¨ `ATTACH` 查询,它åªæ˜¯åœ¨å¯åŠ¨æ—¶è¿è¡Œã€‚有些表例外,如系统表,它们是在æœåŠ¡å™¨ä¸Šæ˜¾å¼æŒ‡å®šçš„。 diff --git a/docs/zh/sql-reference/statements/select/from.md b/docs/zh/sql-reference/statements/select/from.md index 71b7cd319eb..fae25c0c3c1 100644 --- a/docs/zh/sql-reference/statements/select/from.md +++ b/docs/zh/sql-reference/statements/select/from.md @@ -14,7 +14,7 @@ toc_title: FROM å­æŸ¥è¯¢æ˜¯å¦ä¸€ä¸ª `SELECT` å¯ä»¥æŒ‡å®šåœ¨ `FROM` åŽçš„括å·å†…的查询。 -`FROM` å­å¥å¯ä»¥åŒ…å«å¤šä¸ªæ•°æ®æºï¼Œç”¨é€—å·åˆ†éš”,这相当于在他们身上执行 [CROSS JOIN](../../../sql-reference/statements/select/join.md) +`FROM` å­å¥å¯ä»¥åŒ…å«å¤šä¸ªæ•°æ®æºï¼Œç”¨é€—å·åˆ†éš”,这相当于在他们身上执行 [CROSS JOIN](../../../sql-reference/statements/select/join.md) ## FINAL 修饰符 {#select-from-final} diff --git a/docs/zh/sql-reference/statements/select/index.md b/docs/zh/sql-reference/statements/select/index.md index 689a4f91a0c..d3de71efc6b 100644 --- a/docs/zh/sql-reference/statements/select/index.md +++ b/docs/zh/sql-reference/statements/select/index.md @@ -22,7 +22,7 @@ SELECT [DISTINCT] expr_list [WHERE expr] [GROUP BY expr_list] [WITH TOTALS] [HAVING expr] -[ORDER BY expr_list] [WITH FILL] [FROM expr] [TO expr] [STEP expr] +[ORDER BY expr_list] [WITH FILL] [FROM expr] [TO expr] [STEP expr] [LIMIT [offset_value, ]n BY columns] [LIMIT [n, ]m] [WITH TIES] [UNION ALL ...] diff --git a/docs/zh/sql-reference/statements/select/join.md b/docs/zh/sql-reference/statements/select/join.md index 407c8ca6101..911ff15f576 100644 --- a/docs/zh/sql-reference/statements/select/join.md +++ b/docs/zh/sql-reference/statements/select/join.md @@ -51,7 +51,7 @@ ClickHouse中æ供的其他è”接类型: - 必须包å«æœ‰åºåºåˆ—。 - å¯ä»¥æ˜¯ä»¥ä¸‹ç±»åž‹ä¹‹ä¸€: [Int*,UInt*](../../../sql-reference/data-types/int-uint.md), [Float\*](../../../sql-reference/data-types/float.md), [Date](../../../sql-reference/data-types/date.md), [DateTime](../../../sql-reference/data-types/datetime.md), [Decimal\*](../../../sql-reference/data-types/decimal.md). -- ä¸èƒ½æ˜¯`JOIN`å­å¥ä¸­å”¯ä¸€çš„列 +- ä¸èƒ½æ˜¯`JOIN`å­å¥ä¸­å”¯ä¸€çš„列 语法 `ASOF JOIN ... 
ON`: diff --git a/docs/zh/sql-reference/table-functions/input.md b/docs/zh/sql-reference/table-functions/input.md index a0215b26c8a..61bb58d73e2 100644 --- a/docs/zh/sql-reference/table-functions/input.md +++ b/docs/zh/sql-reference/table-functions/input.md @@ -16,7 +16,7 @@ toc_title: input æ•°æ®å¯ä»¥åƒæ™®é€š `INSERT` 查询一样å‘é€ï¼Œå¹¶ä»¥å¿…须在查询末尾指定的任何å¯ç”¨[æ ¼å¼](../../interfaces/formats.md#formats) 传递(与普通 `INSERT SELECT`ä¸åŒ)。 -该函数的主è¦ç‰¹ç‚¹æ˜¯ï¼Œå½“æœåŠ¡å™¨ä»Žå®¢æˆ·ç«¯æŽ¥æ”¶æ•°æ®æ—¶ï¼Œå®ƒä¼šåŒæ—¶æ ¹æ® `SELECT` å­å¥ä¸­çš„表达å¼åˆ—表将其转æ¢ï¼Œå¹¶æ’入到目标表中。 +该函数的主è¦ç‰¹ç‚¹æ˜¯ï¼Œå½“æœåŠ¡å™¨ä»Žå®¢æˆ·ç«¯æŽ¥æ”¶æ•°æ®æ—¶ï¼Œå®ƒä¼šåŒæ—¶æ ¹æ® `SELECT` å­å¥ä¸­çš„表达å¼åˆ—表将其转æ¢ï¼Œå¹¶æ’入到目标表中。 ä¸ä¼šåˆ›å»ºåŒ…å«æ‰€æœ‰å·²ä¼ è¾“æ•°æ®çš„临时表。 **例** diff --git a/docs/zh/sql-reference/table-functions/jdbc.md b/docs/zh/sql-reference/table-functions/jdbc.md index 02ebdcb407e..4d95a5ed203 100644 --- a/docs/zh/sql-reference/table-functions/jdbc.md +++ b/docs/zh/sql-reference/table-functions/jdbc.md @@ -25,7 +25,7 @@ SELECT * FROM jdbc('mysql-dev?p1=233', 'num Int32', 'select toInt32OrZero(''{{p1 ``` ``` sql -SELECT * +SELECT * FROM jdbc('mysql-dev?p1=233', 'num Int32', 'select toInt32OrZero(''{{p1}}'') as num') ``` diff --git a/docs/zh/sql-reference/table-functions/remote.md b/docs/zh/sql-reference/table-functions/remote.md index cacc68c0b71..dfe81d0bafe 100644 --- a/docs/zh/sql-reference/table-functions/remote.md +++ b/docs/zh/sql-reference/table-functions/remote.md @@ -16,27 +16,27 @@ remoteSecure('addresses_expr', db.table[, 'user'[, 'password'], sharding_key]) **å‚æ•°** - `addresses_expr` – 代表远程æœåŠ¡å™¨åœ°å€çš„一个表达å¼ã€‚å¯ä»¥åªæ˜¯å•ä¸ªæœåŠ¡å™¨åœ°å€ã€‚ æœåŠ¡å™¨åœ°å€å¯ä»¥æ˜¯ `host:port` 或 `host`。 - + `host` å¯ä»¥æŒ‡å®šä¸ºæœåŠ¡å™¨å称,或是IPV4或IPV6地å€ã€‚IPv6地å€åœ¨æ–¹æ‹¬å·ä¸­æŒ‡å®šã€‚ - + `port` 是远程æœåŠ¡å™¨ä¸Šçš„TCP端å£ã€‚ 如果çœç•¥ç«¯å£ï¼Œåˆ™ `remote` 使用æœåŠ¡å™¨é…置文件中的 [tcp_port](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-tcp_port) (默认情况为,9000),`remoteSecure` 使用 [tcp_port_secure](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-tcp_port_secure) (默认情况为,9440)。 IPv6地å€éœ€è¦æŒ‡å®šç«¯å£ã€‚ - + 类型: [String](../../sql-reference/data-types/string.md)。 - + - `db` — æ•°æ®åº“å。类型: [String](../../sql-reference/data-types/string.md)。 - `table` — 表å。类型: [String](../../sql-reference/data-types/string.md)。 - `user` — 用户å。如果未指定用户,则使用 `default` 。类型: [String](../../sql-reference/data-types/string.md)。 - `password` — 用户密ç ã€‚如果未指定密ç ï¼Œåˆ™ä½¿ç”¨ç©ºå¯†ç ã€‚类型: [String](../../sql-reference/data-types/string.md)。 - `sharding_key` — 分片键以支æŒåœ¨èŠ‚点之间分布数æ®ã€‚ 例如: `insert into remote('127.0.0.1:9000,127.0.0.2', db, table, 'default', rand())`。 类型: [UInt32](../../sql-reference/data-types/int-uint.md)。 - + **返回值** - + æ¥è‡ªè¿œç¨‹æœåŠ¡å™¨çš„æ•°æ®é›†ã€‚ - + **用法** - + 使用 `remote` 表函数没有创建一个 `Distributed` 表更优,因为在这ç§æƒ…况下,将为æ¯ä¸ªè¯·æ±‚é‡æ–°å»ºç«‹æœåŠ¡å™¨è¿žæŽ¥ã€‚此外,如果设置了主机å,则会解æžè¿™äº›å称,并且在使用å„ç§å‰¯æœ¬æ—¶ä¸ä¼šè®¡å…¥é”™è¯¯ã€‚ 在处ç†å¤§é‡æŸ¥è¯¢æ—¶ï¼Œå§‹ç»ˆä¼˜å…ˆåˆ›å»º `Distributed` 表,ä¸è¦ä½¿ç”¨ `remote` 表函数。 该 `remote` 表函数å¯ä»¥åœ¨ä»¥ä¸‹æƒ…况下是有用的: @@ -45,7 +45,7 @@ remoteSecure('addresses_expr', db.table[, 'user'[, 'password'], sharding_key]) - 在多个ClickHouse集群之间的用户研究目的的查询。 - 手动å‘出的ä¸é¢‘ç¹åˆ†å¸ƒå¼è¯·æ±‚。 - æ¯æ¬¡é‡æ–°å®šä¹‰æœåŠ¡å™¨é›†çš„分布å¼è¯·æ±‚。 - + **地å€** ``` text diff --git a/programs/benchmark/Benchmark.cpp b/programs/benchmark/Benchmark.cpp index 859222c236e..be57a3b92a0 100644 --- a/programs/benchmark/Benchmark.cpp +++ b/programs/benchmark/Benchmark.cpp @@ -58,7 +58,8 
@@ namespace ErrorCodes class Benchmark : public Poco::Util::Application { public: - Benchmark(unsigned concurrency_, double delay_, Strings && hosts_, Ports && ports_, + Benchmark(unsigned concurrency_, double delay_, + Strings && hosts_, Ports && ports_, bool round_robin_, bool cumulative_, bool secure_, const String & default_database_, const String & user_, const String & password_, const String & stage, bool randomize_, size_t max_iterations_, double max_time_, @@ -66,7 +67,7 @@ public: const String & query_id_, const String & query_to_execute_, bool continue_on_errors_, bool reconnect_, bool print_stacktrace_, const Settings & settings_) : - concurrency(concurrency_), delay(delay_), queue(concurrency), randomize(randomize_), + round_robin(round_robin_), concurrency(concurrency_), delay(delay_), queue(concurrency), randomize(randomize_), cumulative(cumulative_), max_iterations(max_iterations_), max_time(max_time_), json_path(json_path_), confidence(confidence_), query_id(query_id_), query_to_execute(query_to_execute_), continue_on_errors(continue_on_errors_), reconnect(reconnect_), @@ -78,8 +79,8 @@ public: size_t connections_cnt = std::max(ports_.size(), hosts_.size()); connections.reserve(connections_cnt); - comparison_info_total.reserve(connections_cnt); - comparison_info_per_interval.reserve(connections_cnt); + comparison_info_total.reserve(round_robin ? 1 : connections_cnt); + comparison_info_per_interval.reserve(round_robin ? 1 : connections_cnt); for (size_t i = 0; i < connections_cnt; ++i) { @@ -90,11 +91,17 @@ public: concurrency, cur_host, cur_port, default_database_, user_, password_, - "", /* cluster */ - "", /* cluster_secret */ - "benchmark", Protocol::Compression::Enable, secure)); - comparison_info_per_interval.emplace_back(std::make_shared()); - comparison_info_total.emplace_back(std::make_shared()); + /* cluster_= */ "", + /* cluster_secret_= */ "", + /* client_name_= */ "benchmark", + Protocol::Compression::Enable, + secure)); + + if (!round_robin || comparison_info_per_interval.empty()) + { + comparison_info_per_interval.emplace_back(std::make_shared()); + comparison_info_total.emplace_back(std::make_shared()); + } } global_context->makeGlobalContext(); @@ -134,6 +141,7 @@ private: using EntryPtr = std::shared_ptr; using EntryPtrs = std::vector; + bool round_robin; unsigned concurrency; double delay; @@ -271,7 +279,8 @@ private: if (max_time > 0 && total_watch.elapsedSeconds() >= max_time) { - std::cout << "Stopping launch of queries. Requested time limit is exhausted.\n"; + std::cout << "Stopping launch of queries." + << " Requested time limit " << max_time << " seconds is exhausted.\n"; return false; } @@ -313,6 +322,7 @@ private: } catch (...) { + shutdown = true; pool.wait(); throw; } @@ -368,8 +378,7 @@ private: { extracted = queue.tryPop(query, 100); - if (shutdown - || (max_iterations && queries_executed == max_iterations)) + if (shutdown || (max_iterations && queries_executed == max_iterations)) { return; } @@ -382,8 +391,9 @@ private: } catch (...) 
{ - std::cerr << "An error occurred while processing the query '" - << query << "'.\n"; + std::lock_guard lock(mutex); + std::cerr << "An error occurred while processing the query " << "'" << query << "'" + << ": " << getCurrentExceptionMessage(false) << std::endl; if (!continue_on_errors) { shutdown = true; @@ -394,8 +404,9 @@ private: std::cerr << getCurrentExceptionMessage(print_stacktrace, true /*check embedded stack trace*/) << std::endl; - comparison_info_per_interval[connection_index]->errors++; - comparison_info_total[connection_index]->errors++; + size_t info_index = round_robin ? 0 : connection_index; + comparison_info_per_interval[info_index]->errors++; + comparison_info_total[info_index]->errors++; } } // Count failed queries toward executed, so that we'd reach @@ -432,9 +443,10 @@ private: std::lock_guard lock(mutex); - comparison_info_per_interval[connection_index]->add(seconds, progress.read_rows, progress.read_bytes, info.rows, info.bytes); - comparison_info_total[connection_index]->add(seconds, progress.read_rows, progress.read_bytes, info.rows, info.bytes); - t_test.add(connection_index, seconds); + size_t info_index = round_robin ? 0 : connection_index; + comparison_info_per_interval[info_index]->add(seconds, progress.read_rows, progress.read_bytes, info.rows, info.bytes); + comparison_info_total[info_index]->add(seconds, progress.read_rows, progress.read_bytes, info.rows, info.bytes); + t_test.add(info_index, seconds); } void report(MultiStats & infos) @@ -452,8 +464,19 @@ private: double seconds = info->work_time / concurrency; + std::string connection_description = connections[i]->getDescription(); + if (round_robin) + { + connection_description.clear(); + for (const auto & conn : connections) + { + if (!connection_description.empty()) + connection_description += ", "; + connection_description += conn->getDescription(); + } + } std::cerr - << connections[i]->getDescription() << ", " + << connection_description << ", " << "queries " << info->queries << ", "; if (info->errors) { @@ -586,8 +609,9 @@ int mainEntryClickHouseBenchmark(int argc, char ** argv) ("timelimit,t", value()->default_value(0.), "stop launch of queries after specified time limit") ("randomize,r", value()->default_value(false), "randomize order of execution") ("json", value()->default_value(""), "write final report to specified file in JSON format") - ("host,h", value()->multitoken(), "") - ("port,p", value()->multitoken(), "") + ("host,h", value()->multitoken(), "list of hosts") + ("port,p", value()->multitoken(), "list of ports") + ("roundrobin", "Instead of comparing queries for different --host/--port just pick one random --host/--port for every query and send query to it.") ("cumulative", "prints cumulative data instead of data per interval") ("secure,s", "Use TLS connection") ("user", value()->default_value("default"), "") @@ -634,6 +658,7 @@ int mainEntryClickHouseBenchmark(int argc, char ** argv) options["delay"].as(), std::move(hosts), std::move(ports), + options.count("roundrobin"), options.count("cumulative"), options.count("secure"), options["database"].as(), diff --git a/programs/client/Client.cpp b/programs/client/Client.cpp index 9c1c8338321..b28ef8f7c7f 100644 --- a/programs/client/Client.cpp +++ b/programs/client/Client.cpp @@ -26,6 +26,10 @@ #include #include #include +#include +#include +#include +#include #include #include #include @@ -54,8 +58,7 @@ #include #include #include -#include -#include +#include #include #include #include @@ -79,6 +82,7 @@ #include #include #include 
+#include #include #include #include @@ -193,7 +197,7 @@ private: std::unique_ptr pager_cmd; /// The user can specify to redirect query output to a file. - std::optional out_file_buf; + std::unique_ptr out_file_buf; BlockOutputStreamPtr block_out_stream; /// The user could specify special file for server logs (stderr by default) @@ -301,26 +305,9 @@ private: } catch (const Exception & e) { - bool print_stack_trace = config().getBool("stacktrace", false); + bool print_stack_trace = config().getBool("stacktrace", false) && e.code() != ErrorCodes::NETWORK_ERROR; - std::string text = e.displayText(); - - /** If exception is received from server, then stack trace is embedded in message. - * If exception is thrown on client, then stack trace is in separate field. - */ - - auto embedded_stack_trace_pos = text.find("Stack trace"); - if (std::string::npos != embedded_stack_trace_pos && !print_stack_trace) - text.resize(embedded_stack_trace_pos); - - std::cerr << "Code: " << e.code() << ". " << text << std::endl << std::endl; - - /// Don't print the stack trace on the client if it was logged on the server. - /// Also don't print the stack trace in case of network errors. - if (print_stack_trace && e.code() != ErrorCodes::NETWORK_ERROR && std::string::npos == embedded_stack_trace_pos) - { - std::cerr << "Stack trace:" << std::endl << e.getStackTraceString(); - } + std::cerr << getExceptionMessage(e, print_stack_trace, true) << std::endl << std::endl; /// If exception code isn't zero, we should return non-zero return code anyway. return e.code() ? e.code() : -1; @@ -438,6 +425,7 @@ private: {TokenType::Semicolon, Replxx::Color::INTENSE}, {TokenType::Dot, Replxx::Color::INTENSE}, {TokenType::Asterisk, Replxx::Color::INTENSE}, + {TokenType::HereDoc, Replxx::Color::CYAN}, {TokenType::Plus, Replxx::Color::INTENSE}, {TokenType::Minus, Replxx::Color::INTENSE}, {TokenType::Slash, Replxx::Color::INTENSE}, @@ -463,8 +451,7 @@ private: {TokenType::ErrorDoubleQuoteIsNotClosed, Replxx::Color::RED}, {TokenType::ErrorSinglePipeMark, Replxx::Color::RED}, {TokenType::ErrorWrongNumber, Replxx::Color::RED}, - { TokenType::ErrorMaxQuerySizeExceeded, - Replxx::Color::RED }}; + {TokenType::ErrorMaxQuerySizeExceeded, Replxx::Color::RED }}; const Replxx::Color unknown_token_color = Replxx::Color::RED; @@ -487,6 +474,52 @@ private: } #endif + /// Make query to get all server warnings + std::vector loadWarningMessages() + { + std::vector messages; + connection->sendQuery(connection_parameters.timeouts, "SELECT message FROM system.warnings", "" /* query_id */, QueryProcessingStage::Complete); + while (true) + { + Packet packet = connection->receivePacket(); + switch (packet.type) + { + case Protocol::Server::Data: + if (packet.block) + { + const ColumnString & column = typeid_cast(*packet.block.getByPosition(0).column); + + size_t rows = packet.block.rows(); + for (size_t i = 0; i < rows; ++i) + messages.emplace_back(column.getDataAt(i).toString()); + } + continue; + + case Protocol::Server::Progress: + continue; + case Protocol::Server::ProfileInfo: + continue; + case Protocol::Server::Totals: + continue; + case Protocol::Server::Extremes: + continue; + case Protocol::Server::Log: + continue; + + case Protocol::Server::Exception: + packet.exception->rethrow(); + return messages; + + case Protocol::Server::EndOfStream: + return messages; + + default: + throw Exception(ErrorCodes::UNKNOWN_PACKET_FROM_SERVER, "Unknown packet {} from server {}", + packet.type, connection->getDescription()); + } + } + } + int mainImpl() { UseSSL 
use_ssl; @@ -565,6 +598,26 @@ private: suggest->load(connection_parameters, config().getInt("suggestion_limit")); } + /// Load Warnings at the beginning of connection + if (!config().has("no-warnings")) + { + try + { + std::vector messages = loadWarningMessages(); + if (!messages.empty()) + { + std::cout << "Warnings:" << std::endl; + for (const auto & message : messages) + std::cout << "* " << message << std::endl; + std::cout << std::endl; + } + } + catch (...) + { + /// Ignore exception + } + } + /// Load command history if present. if (config().has("history_file")) history_file = config().getString("history_file"); @@ -633,17 +686,10 @@ private: } catch (const Exception & e) { - // We don't need to handle the test hints in the interactive - // mode. - std::cerr << std::endl - << "Exception on client:" << std::endl - << "Code: " << e.code() << ". " << e.displayText() << std::endl; - - if (config().getBool("stacktrace", false)) - std::cerr << "Stack trace:" << std::endl << e.getStackTraceString() << std::endl; - - std::cerr << std::endl; + /// We don't need to handle the test hints in the interactive mode. + bool print_stack_trace = config().getBool("stacktrace", false); + std::cerr << "Exception on client:" << std::endl << getExceptionMessage(e, print_stack_trace, true) << std::endl << std::endl; client_exception = std::make_unique(e); } @@ -940,18 +986,11 @@ private: { if (server_exception) { - std::string text = server_exception->displayText(); - auto embedded_stack_trace_pos = text.find("Stack trace"); - if (std::string::npos != embedded_stack_trace_pos && !config().getBool("stacktrace", false)) - { - text.resize(embedded_stack_trace_pos); - } + bool print_stack_trace = config().getBool("stacktrace", false); std::cerr << "Received exception from server (version " << server_version << "):" << std::endl - << "Code: " << server_exception->code() << ". " << text << std::endl; + << getExceptionMessage(*server_exception, print_stack_trace, true) << std::endl; if (is_interactive) - { std::cerr << std::endl; - } } if (client_exception) @@ -1410,11 +1449,15 @@ private: { // Just report it, we'll terminate below. fmt::print(stderr, - "Error while reconnecting to the server: Code: {}: {}\n", - getCurrentExceptionCode(), + "Error while reconnecting to the server: {}\n", getCurrentExceptionMessage(true)); - assert(!connection->isConnected()); + // The reconnection might fail, but we'll still be connected + // in the sense of `connection->isConnected() = true`, + // in case when the requested database doesn't exist. + // Disconnect manually now, so that the following code doesn't + // have any doubts, and the connection state is predictable. 
+ connection->disconnect(); } } @@ -1890,19 +1933,24 @@ private: current_format = insert->format; } - BlockInputStreamPtr block_input = context->getInputFormat(current_format, buf, sample, insert_format_max_block_size); + auto source = FormatFactory::instance().getInput(current_format, buf, sample, context, insert_format_max_block_size); + Pipe pipe(source); if (columns_description.hasDefaults()) - block_input = std::make_shared(block_input, columns_description, context); - - BlockInputStreamPtr async_block_input = std::make_shared(block_input); - - async_block_input->readPrefix(); - - while (true) { - Block block = async_block_input->read(); + pipe.addSimpleTransform([&](const Block & header) + { + return std::make_shared(header, columns_description, *source, context); + }); + } + QueryPipeline pipeline; + pipeline.init(std::move(pipe)); + PullingAsyncPipelineExecutor executor(pipeline); + + Block block; + while (executor.pull(block)) + { /// Check if server send Log packet receiveLogs(); @@ -1914,18 +1962,18 @@ private: * We're exiting with error, so it makes sense to kill the * input stream without waiting for it to complete. */ - async_block_input->cancel(true); + executor.cancel(); return; } - connection->sendData(block); - processed_rows += block.rows(); - - if (!block) - break; + if (block) + { + connection->sendData(block); + processed_rows += block.rows(); + } } - async_block_input->readSuffix(); + connection->sendData({}); } @@ -2195,8 +2243,11 @@ private: const auto & out_file_node = query_with_output->out_file->as(); const auto & out_file = out_file_node.value.safeGet(); - out_file_buf.emplace(out_file, DBMS_DEFAULT_BUFFER_SIZE, O_WRONLY | O_EXCL | O_CREAT); - out_buf = &*out_file_buf; + out_file_buf = wrapWriteBufferWithCompressionMethod( + std::make_unique(out_file, DBMS_DEFAULT_BUFFER_SIZE, O_WRONLY | O_EXCL | O_CREAT), + chooseCompressionMethod(out_file, ""), + /* compression level = */ 3 + ); // We are writing to file, so default format is the same as in non-interactive mode. if (is_interactive && is_default_format) @@ -2216,9 +2267,9 @@ private: /// It is not clear how to write progress with parallel formatting. It may increase code complexity significantly. if (!need_render_progress) - block_out_stream = context->getOutputStreamParallelIfPossible(current_format, *out_buf, block); + block_out_stream = context->getOutputStreamParallelIfPossible(current_format, out_file_buf ? *out_file_buf : *out_buf, block); else - block_out_stream = context->getOutputStream(current_format, *out_buf, block); + block_out_stream = context->getOutputStream(current_format, out_file_buf ? *out_file_buf : *out_buf, block); block_out_stream->writePrefix(); } @@ -2529,6 +2580,7 @@ public: ("opentelemetry-traceparent", po::value(), "OpenTelemetry traceparent header as described by W3C Trace Context recommendation") ("opentelemetry-tracestate", po::value(), "OpenTelemetry tracestate header as described by W3C Trace Context recommendation") ("history_file", po::value(), "path to history file") + ("no-warnings", "disable warnings when client connects to server") ; Settings cmd_settings; @@ -2596,8 +2648,7 @@ public: } catch (const Exception & e) { - std::string text = e.displayText(); - std::cerr << "Code: " << e.code() << ". " << text << std::endl; + std::cerr << getExceptionMessage(e, false) << std::endl; std::cerr << "Table â„–" << i << std::endl << std::endl; /// Avoid the case when error exit code can possibly overflow to normal (zero). 
auto exit_code = e.code() % 256; @@ -2689,6 +2740,8 @@ public: config().setBool("highlight", options["highlight"].as()); if (options.count("history_file")) config().setString("history_file", options["history_file"].as()); + if (options.count("no-warnings")) + config().setBool("no-warnings", true); if ((query_fuzzer_runs = options["query-fuzzer-runs"].as())) { @@ -2740,8 +2793,7 @@ int mainEntryClickHouseClient(int argc, char ** argv) } catch (const DB::Exception & e) { - std::string text = e.displayText(); - std::cerr << "Code: " << e.code() << ". " << text << std::endl; + std::cerr << DB::getExceptionMessage(e, false) << std::endl; return 1; } catch (...) diff --git a/programs/copier/CMakeLists.txt b/programs/copier/CMakeLists.txt index dfb067b00f9..57e0996ed78 100644 --- a/programs/copier/CMakeLists.txt +++ b/programs/copier/CMakeLists.txt @@ -11,7 +11,6 @@ set (CLICKHOUSE_COPIER_LINK clickhouse_functions clickhouse_table_functions clickhouse_aggregate_functions - clickhouse_dictionaries string_utils PUBLIC diff --git a/programs/copier/ClusterCopier.cpp b/programs/copier/ClusterCopier.cpp index 128f8bc1cdd..cf0b6cc76a4 100644 --- a/programs/copier/ClusterCopier.cpp +++ b/programs/copier/ClusterCopier.cpp @@ -1702,14 +1702,15 @@ void ClusterCopier::dropParticularPartitionPieceFromAllHelpingTables(const TaskT LOG_INFO(log, "All helping tables dropped partition {}", partition_name); } -String ClusterCopier::getRemoteCreateTable(const DatabaseAndTableName & table, Connection & connection, const Settings & settings) +String ClusterCopier::getRemoteCreateTable( + const DatabaseAndTableName & table, Connection & connection, const Settings & settings) { auto remote_context = Context::createCopy(context); remote_context->setSettings(settings); String query = "SHOW CREATE TABLE " + getQuotedTable(table); - Block block = getBlockWithAllStreamData(std::make_shared( - connection, query, InterpreterShowCreateQuery::getSampleBlock(), remote_context)); + Block block = getBlockWithAllStreamData( + std::make_shared(connection, query, InterpreterShowCreateQuery::getSampleBlock(), remote_context)); return typeid_cast(*block.safeGetByPosition(0).column).getDataAt(0).toString(); } @@ -1719,10 +1720,8 @@ ASTPtr ClusterCopier::getCreateTableForPullShard(const ConnectionTimeouts & time { /// Fetch and parse (possibly) new definition auto connection_entry = task_shard.info.pool->get(timeouts, &task_cluster->settings_pull, true); - String create_query_pull_str = getRemoteCreateTable( - task_shard.task_table.table_pull, - *connection_entry, - task_cluster->settings_pull); + String create_query_pull_str + = getRemoteCreateTable(task_shard.task_table.table_pull, *connection_entry, task_cluster->settings_pull); ParserCreateQuery parser_create_query; const auto & settings = getContext()->getSettingsRef(); @@ -1953,8 +1952,8 @@ UInt64 ClusterCopier::executeQueryOnCluster( /// For unknown reason global context is passed to IStorage::read() method /// So, task_identifier is passed as constructor argument. It is more obvious. 
auto remote_query_executor = std::make_shared( - *connections.back(), query, header, getContext(), - /*throttler=*/nullptr, Scalars(), Tables(), QueryProcessingStage::Complete); + *connections.back(), query, header, getContext(), + /*throttler=*/nullptr, Scalars(), Tables(), QueryProcessingStage::Complete); try { diff --git a/programs/extract-from-config/ExtractFromConfig.cpp b/programs/extract-from-config/ExtractFromConfig.cpp index dff7e81c430..3fd665bcb26 100644 --- a/programs/extract-from-config/ExtractFromConfig.cpp +++ b/programs/extract-from-config/ExtractFromConfig.cpp @@ -33,7 +33,7 @@ static std::string extractFromConfig( { DB::ConfigurationPtr bootstrap_configuration(new Poco::Util::XMLConfiguration(config_xml)); zkutil::ZooKeeperPtr zookeeper = std::make_shared( - *bootstrap_configuration, "zookeeper"); + *bootstrap_configuration, "zookeeper", nullptr); zkutil::ZooKeeperNodeCache zk_node_cache([&] { return zookeeper; }); config_xml = processor.processConfig(&has_zk_includes, &zk_node_cache); } diff --git a/programs/library-bridge/HandlerFactory.cpp b/programs/library-bridge/HandlerFactory.cpp index 9f53a24156f..43087082c46 100644 --- a/programs/library-bridge/HandlerFactory.cpp +++ b/programs/library-bridge/HandlerFactory.cpp @@ -12,8 +12,8 @@ namespace DB Poco::URI uri{request.getURI()}; LOG_DEBUG(log, "Request URI: {}", uri.toString()); - if (uri == "/ping" && request.getMethod() == Poco::Net::HTTPRequest::HTTP_GET) - return std::make_unique(keep_alive_timeout); + if (request.getMethod() == Poco::Net::HTTPRequest::HTTP_GET) + return std::make_unique(keep_alive_timeout, getContext()); if (request.getMethod() == Poco::Net::HTTPRequest::HTTP_POST) return std::make_unique(keep_alive_timeout, getContext()); diff --git a/programs/library-bridge/Handlers.cpp b/programs/library-bridge/Handlers.cpp index ec82d7d52f4..2b6d0057bb2 100644 --- a/programs/library-bridge/Handlers.cpp +++ b/programs/library-bridge/Handlers.cpp @@ -17,8 +17,24 @@ namespace DB { + +namespace ErrorCodes +{ + extern const int BAD_REQUEST_PARAMETER; +} + namespace { + void processError(HTTPServerResponse & response, const std::string & message) + { + response.setStatusAndReason(HTTPResponse::HTTP_INTERNAL_SERVER_ERROR); + + if (!response.sent()) + *response.send() << message << std::endl; + + LOG_WARNING(&Poco::Logger::get("LibraryBridge"), message); + } + std::shared_ptr parseColumns(std::string && column_string) { auto sample_block = std::make_shared(); @@ -30,9 +46,8 @@ namespace return sample_block; } - std::vector parseIdsFromBinary(const std::string & ids_string) + std::vector parseIdsFromBinary(ReadBuffer & buf) { - ReadBufferFromString buf(ids_string); std::vector ids; readVectorBinary(ids, buf); return ids; @@ -67,13 +82,36 @@ void LibraryRequestHandler::handleRequest(HTTPServerRequest & request, HTTPServe std::string method = params.get("method"); std::string dictionary_id = params.get("dictionary_id"); - LOG_TRACE(log, "Library method: '{}', dictionary id: {}", method, dictionary_id); + LOG_TRACE(log, "Library method: '{}', dictionary id: {}", method, dictionary_id); WriteBufferFromHTTPServerResponse out(response, request.getMethod() == Poco::Net::HTTPRequest::HTTP_HEAD, keep_alive_timeout); try { - if (method == "libNew") + bool lib_new = (method == "libNew"); + if (method == "libClone") + { + if (!params.has("from_dictionary_id")) + { + processError(response, "No 'from_dictionary_id' in request URL"); + return; + } + + std::string from_dictionary_id = params.get("from_dictionary_id"); + bool cloned = 
false; + cloned = SharedLibraryHandlerFactory::instance().clone(from_dictionary_id, dictionary_id); + + if (cloned) + { + writeStringBinary("1", out); + } + else + { + LOG_TRACE(log, "Cannot clone from dictionary with id: {}, will call libNew instead", from_dictionary_id); + lib_new = true; + } + } + if (lib_new) { auto & read_buf = request.getStream(); params.read(read_buf); @@ -92,6 +130,8 @@ void LibraryRequestHandler::handleRequest(HTTPServerRequest & request, HTTPServe std::string library_path = params.get("library_path"); const auto & settings_string = params.get("library_settings"); + + LOG_DEBUG(log, "Parsing library settings from binary string"); std::vector library_settings = parseNamesFromBinary(settings_string); /// Needed for library dictionary @@ -102,6 +142,8 @@ void LibraryRequestHandler::handleRequest(HTTPServerRequest & request, HTTPServe } const auto & attributes_string = params.get("attributes_names"); + + LOG_DEBUG(log, "Parsing attributes names from binary string"); std::vector attributes_names = parseNamesFromBinary(attributes_string); /// Needed to parse block from binary string format @@ -140,59 +182,63 @@ void LibraryRequestHandler::handleRequest(HTTPServerRequest & request, HTTPServe SharedLibraryHandlerFactory::instance().create(dictionary_id, library_path, library_settings, sample_block_with_nulls, attributes_names); writeStringBinary("1", out); } - else if (method == "libClone") - { - if (!params.has("from_dictionary_id")) - { - processError(response, "No 'from_dictionary_id' in request URL"); - return; - } - - std::string from_dictionary_id = params.get("from_dictionary_id"); - LOG_TRACE(log, "Calling libClone from {} to {}", from_dictionary_id, dictionary_id); - SharedLibraryHandlerFactory::instance().clone(from_dictionary_id, dictionary_id); - writeStringBinary("1", out); - } else if (method == "libDelete") { - SharedLibraryHandlerFactory::instance().remove(dictionary_id); + auto deleted = SharedLibraryHandlerFactory::instance().remove(dictionary_id); + + /// Do not throw, a warning is ok.
+ if (!deleted) + LOG_WARNING(log, "Cannot delete library with dictionary id: {}, because such id was not found.", dictionary_id); + writeStringBinary("1", out); } else if (method == "isModified") { auto library_handler = SharedLibraryHandlerFactory::instance().get(dictionary_id); + if (!library_handler) + throw Exception(ErrorCodes::BAD_REQUEST_PARAMETER, "Not found dictionary with id: {}", dictionary_id); + bool res = library_handler->isModified(); writeStringBinary(std::to_string(res), out); } else if (method == "supportsSelectiveLoad") { auto library_handler = SharedLibraryHandlerFactory::instance().get(dictionary_id); + if (!library_handler) + throw Exception(ErrorCodes::BAD_REQUEST_PARAMETER, "Not found dictionary with id: {}", dictionary_id); + bool res = library_handler->supportsSelectiveLoad(); writeStringBinary(std::to_string(res), out); } else if (method == "loadAll") { auto library_handler = SharedLibraryHandlerFactory::instance().get(dictionary_id); + if (!library_handler) + throw Exception(ErrorCodes::BAD_REQUEST_PARAMETER, "Not found dictionary with id: {}", dictionary_id); + const auto & sample_block = library_handler->getSampleBlock(); + LOG_DEBUG(log, "Calling loadAll() for dictionary id: {}", dictionary_id); auto input = library_handler->loadAll(); + LOG_DEBUG(log, "Started sending result data for dictionary id: {}", dictionary_id); BlockOutputStreamPtr output = FormatFactory::instance().getOutputStream(FORMAT, out, sample_block, getContext()); copyData(*input, *output); } else if (method == "loadIds") { - params.read(request.getStream()); + LOG_DEBUG(log, "Getting dictionary ids for dictionary with id: {}", dictionary_id); + String ids_string; + std::vector ids = parseIdsFromBinary(request.getStream()); - if (!params.has("ids")) - { - processError(response, "No 'ids' in request URL"); - return; - } - - std::vector ids = parseIdsFromBinary(params.get("ids")); auto library_handler = SharedLibraryHandlerFactory::instance().get(dictionary_id); + if (!library_handler) + throw Exception(ErrorCodes::BAD_REQUEST_PARAMETER, "Not found dictionary with id: {}", dictionary_id); + const auto & sample_block = library_handler->getSampleBlock(); + LOG_DEBUG(log, "Calling loadIds() for dictionary id: {}", dictionary_id); auto input = library_handler->loadIds(ids); + + LOG_DEBUG(log, "Started sending result data for dictionary id: {}", dictionary_id); BlockOutputStreamPtr output = FormatFactory::instance().getOutputStream(FORMAT, out, sample_block, getContext()); copyData(*input, *output); } @@ -224,8 +270,14 @@ void LibraryRequestHandler::handleRequest(HTTPServerRequest & request, HTTPServe auto block = reader->read(); auto library_handler = SharedLibraryHandlerFactory::instance().get(dictionary_id); + if (!library_handler) + throw Exception(ErrorCodes::BAD_REQUEST_PARAMETER, "Not found dictionary with id: {}", dictionary_id); + const auto & sample_block = library_handler->getSampleBlock(); + LOG_DEBUG(log, "Calling loadKeys() for dictionary id: {}", dictionary_id); auto input = library_handler->loadKeys(block.getColumns()); + + LOG_DEBUG(log, "Started sending result data for dictionary id: {}", dictionary_id); BlockOutputStreamPtr output = FormatFactory::instance().getOutputStream(FORMAT, out, sample_block, getContext()); copyData(*input, *output); } @@ -233,8 +285,9 @@ void LibraryRequestHandler::handleRequest(HTTPServerRequest & request, HTTPServe catch (...)
{ auto message = getCurrentExceptionMessage(true); - response.setStatusAndReason(Poco::Net::HTTPResponse::HTTP_INTERNAL_SERVER_ERROR, message); // can't call process_error, because of too soon response sending + LOG_ERROR(log, "Failed to process request for dictionary_id: {}. Error: {}", dictionary_id, message); + response.setStatusAndReason(Poco::Net::HTTPResponse::HTTP_INTERNAL_SERVER_ERROR, message); // can't call process_error, because of too soon response sending try { writeStringBinary(message, out); @@ -244,8 +297,6 @@ { tryLogCurrentException(log); } - - tryLogCurrentException(log); } try @@ -259,24 +310,30 @@ } -void LibraryRequestHandler::processError(HTTPServerResponse & response, const std::string & message) -{ - response.setStatusAndReason(HTTPResponse::HTTP_INTERNAL_SERVER_ERROR); - - if (!response.sent()) - *response.send() << message << std::endl; - - LOG_WARNING(log, message); -} - - -void PingHandler::handleRequest(HTTPServerRequest & /* request */, HTTPServerResponse & response) +void LibraryExistsHandler::handleRequest(HTTPServerRequest & request, HTTPServerResponse & response) { try { + LOG_TRACE(log, "Request URI: {}", request.getURI()); + HTMLForm params(getContext()->getSettingsRef(), request); + + if (!params.has("dictionary_id")) + { + processError(response, "No 'dictionary_id' in request URL"); + return; + } + + std::string dictionary_id = params.get("dictionary_id"); + auto library_handler = SharedLibraryHandlerFactory::instance().get(dictionary_id); + String res; + if (library_handler) + res = "1"; + else + res = "0"; + setResponseDefaultHeaders(response, keep_alive_timeout); - const char * data = "Ok.\n"; - response.sendBuffer(data, strlen(data)); + LOG_TRACE(log, "Sending ping response: {} (dictionary id: {})", res, dictionary_id); + response.sendBuffer(res.data(), res.size()); } catch (...)
{ diff --git a/programs/library-bridge/Handlers.h b/programs/library-bridge/Handlers.h index dac61d3a735..58af24b06d1 100644 --- a/programs/library-bridge/Handlers.h +++ b/programs/library-bridge/Handlers.h @@ -22,8 +22,7 @@ class LibraryRequestHandler : public HTTPRequestHandler, WithContext public: LibraryRequestHandler( - size_t keep_alive_timeout_, - ContextPtr context_) + size_t keep_alive_timeout_, ContextPtr context_) : WithContext(context_) , log(&Poco::Logger::get("LibraryRequestHandler")) , keep_alive_timeout(keep_alive_timeout_) @@ -35,18 +34,18 @@ public: private: static constexpr inline auto FORMAT = "RowBinary"; - void processError(HTTPServerResponse & response, const std::string & message); - Poco::Logger * log; size_t keep_alive_timeout; }; -class PingHandler : public HTTPRequestHandler +class LibraryExistsHandler : public HTTPRequestHandler, WithContext { public: - explicit PingHandler(size_t keep_alive_timeout_) - : keep_alive_timeout(keep_alive_timeout_) + explicit LibraryExistsHandler(size_t keep_alive_timeout_, ContextPtr context_) + : WithContext(context_) + , keep_alive_timeout(keep_alive_timeout_) + , log(&Poco::Logger::get("LibraryRequestHandler")) { } @@ -54,6 +53,8 @@ public: private: const size_t keep_alive_timeout; + Poco::Logger * log; + }; } diff --git a/programs/library-bridge/SharedLibraryHandlerFactory.cpp b/programs/library-bridge/SharedLibraryHandlerFactory.cpp index 05494c313c4..a9358ca552a 100644 --- a/programs/library-bridge/SharedLibraryHandlerFactory.cpp +++ b/programs/library-bridge/SharedLibraryHandlerFactory.cpp @@ -4,11 +4,6 @@ namespace DB { -namespace ErrorCodes -{ - extern const int LOGICAL_ERROR; -} - SharedLibraryHandlerPtr SharedLibraryHandlerFactory::get(const std::string & dictionary_id) { std::lock_guard lock(mutex); @@ -29,32 +24,32 @@ void SharedLibraryHandlerFactory::create( const std::vector & attributes_names) { std::lock_guard lock(mutex); - library_handlers[dictionary_id] = std::make_shared(library_path, library_settings, sample_block, attributes_names); + if (!library_handlers.count(dictionary_id)) + library_handlers.emplace(std::make_pair(dictionary_id, std::make_shared(library_path, library_settings, sample_block, attributes_names))); + else + LOG_WARNING(&Poco::Logger::get("SharedLibraryHandlerFactory"), "Library handler with dictionary id {} already exists", dictionary_id); } -void SharedLibraryHandlerFactory::clone(const std::string & from_dictionary_id, const std::string & to_dictionary_id) +bool SharedLibraryHandlerFactory::clone(const std::string & from_dictionary_id, const std::string & to_dictionary_id) { std::lock_guard lock(mutex); auto from_library_handler = library_handlers.find(from_dictionary_id); - /// This is not supposed to happen as libClone is called from copy constructor of LibraryDictionarySource - /// object, and shared library handler of from_dictionary is removed only in its destructor. - /// And if for from_dictionary there was no shared library handler, it would have received and exception in - /// its constructor, so no libClone would be made from it. 
if (from_library_handler == library_handlers.end()) - throw Exception(ErrorCodes::LOGICAL_ERROR, "No shared library handler found"); + return false; /// libClone method will be called in copy constructor library_handlers[to_dictionary_id] = std::make_shared(*from_library_handler->second); + return true; } -void SharedLibraryHandlerFactory::remove(const std::string & dictionary_id) +bool SharedLibraryHandlerFactory::remove(const std::string & dictionary_id) { std::lock_guard lock(mutex); /// libDelete is called in destructor. - library_handlers.erase(dictionary_id); + return library_handlers.erase(dictionary_id); } diff --git a/programs/library-bridge/SharedLibraryHandlerFactory.h b/programs/library-bridge/SharedLibraryHandlerFactory.h index 473d90618a2..115cc78ae52 100644 --- a/programs/library-bridge/SharedLibraryHandlerFactory.h +++ b/programs/library-bridge/SharedLibraryHandlerFactory.h @@ -24,9 +24,9 @@ public: const Block & sample_block, const std::vector & attributes_names); - void clone(const std::string & from_dictionary_id, const std::string & to_dictionary_id); + bool clone(const std::string & from_dictionary_id, const std::string & to_dictionary_id); - void remove(const std::string & dictionary_id); + bool remove(const std::string & dictionary_id); private: /// map: dict_id -> sharedLibraryHandler diff --git a/programs/local/CMakeLists.txt b/programs/local/CMakeLists.txt index b61f0ea33b7..530128c2041 100644 --- a/programs/local/CMakeLists.txt +++ b/programs/local/CMakeLists.txt @@ -6,7 +6,6 @@ set (CLICKHOUSE_LOCAL_LINK clickhouse_aggregate_functions clickhouse_common_config clickhouse_common_io - clickhouse_dictionaries clickhouse_functions clickhouse_parsers clickhouse_storages_system diff --git a/programs/local/LocalServer.cpp b/programs/local/LocalServer.cpp index 6be7ba1ad73..e256338a538 100644 --- a/programs/local/LocalServer.cpp +++ b/programs/local/LocalServer.cpp @@ -433,7 +433,7 @@ void LocalServer::processQueries() try { - executeQuery(read_buf, write_buf, /* allow_into_outfile = */ true, context, {}, finalize_progress); + executeQuery(read_buf, write_buf, /* allow_into_outfile = */ true, context, {}, {}, finalize_progress); } catch (...) { diff --git a/programs/main.cpp b/programs/main.cpp index 225c1ac84de..7619f945f07 100644 --- a/programs/main.cpp +++ b/programs/main.cpp @@ -322,7 +322,7 @@ struct Checker { checkRequiredInstructions(); } -} checker; +} checker __attribute__((init_priority(101))); /// Run before other static initializers. 
} diff --git a/programs/obfuscator/Obfuscator.cpp b/programs/obfuscator/Obfuscator.cpp index ecefcac1faf..b1acc34ef93 100644 --- a/programs/obfuscator/Obfuscator.cpp +++ b/programs/obfuscator/Obfuscator.cpp @@ -15,8 +15,8 @@ #include #include #include -#include -#include +#include +#include #include #include #include @@ -24,6 +24,10 @@ #include #include #include +#include +#include +#include +#include #include #include #include @@ -1156,17 +1160,20 @@ try if (!silent) std::cerr << "Training models\n"; - BlockInputStreamPtr input = context->getInputFormat(input_format, file_in, header, max_block_size); + Pipe pipe(FormatFactory::instance().getInput(input_format, file_in, header, context, max_block_size)); - input->readPrefix(); - while (Block block = input->read()) + QueryPipeline pipeline; + pipeline.init(std::move(pipe)); + PullingPipelineExecutor executor(pipeline); + + Block block; + while (executor.pull(block)) { obfuscator.train(block.getColumns()); source_rows += block.rows(); if (!silent) std::cerr << "Processed " << source_rows << " rows\n"; } - input->readSuffix(); } obfuscator.finalize(); @@ -1183,15 +1190,26 @@ try file_in.seek(0, SEEK_SET); - BlockInputStreamPtr input = context->getInputFormat(input_format, file_in, header, max_block_size); - BlockOutputStreamPtr output = context->getOutputStreamParallelIfPossible(output_format, file_out, header); + Pipe pipe(FormatFactory::instance().getInput(input_format, file_in, header, context, max_block_size)); if (processed_rows + source_rows > limit) - input = std::make_shared(input, limit - processed_rows, 0); + { + pipe.addSimpleTransform([&](const Block & cur_header) + { + return std::make_shared(cur_header, limit - processed_rows, 0); + }); + } + + QueryPipeline pipeline; + pipeline.init(std::move(pipe)); + + BlockOutputStreamPtr output = context->getOutputStreamParallelIfPossible(output_format, file_out, header); + + PullingPipelineExecutor executor(pipeline); - input->readPrefix(); output->writePrefix(); - while (Block block = input->read()) + Block block; + while (executor.pull(block)) { Columns columns = obfuscator.generate(block.getColumns()); output->write(header.cloneWithColumns(columns)); @@ -1200,7 +1218,6 @@ try std::cerr << "Processed " << processed_rows << " rows\n"; } output->writeSuffix(); - input->readSuffix(); obfuscator.updateSeed(); } diff --git a/programs/server/CMakeLists.txt b/programs/server/CMakeLists.txt index 739d1004025..281c25d50eb 100644 --- a/programs/server/CMakeLists.txt +++ b/programs/server/CMakeLists.txt @@ -13,7 +13,6 @@ set (CLICKHOUSE_SERVER_LINK clickhouse_common_config clickhouse_common_io clickhouse_common_zookeeper - clickhouse_dictionaries clickhouse_functions clickhouse_parsers clickhouse_storages_system diff --git a/programs/server/Server.cpp b/programs/server/Server.cpp index d4f830e5a0c..86bb04351b1 100644 --- a/programs/server/Server.cpp +++ b/programs/server/Server.cpp @@ -26,6 +26,7 @@ #include #include #include +#include #include #include #include @@ -39,6 +40,7 @@ #include #include #include +#include #include #include #include @@ -59,6 +61,7 @@ #include #include #include +#include #include #include #include @@ -94,6 +97,9 @@ #endif #if USE_SSL +# if USE_INTERNAL_SSL_LIBRARY +# include +# endif # include # include #endif @@ -106,6 +112,10 @@ # include #endif +#if USE_BASE64 +# include +#endif + #if USE_JEMALLOC # include #endif @@ -241,6 +251,7 @@ namespace ErrorCodes extern const int SUPPORT_IS_DISABLED; extern const int ARGUMENT_OUT_OF_BOUND; extern const int 
EXCESSIVE_ELEMENT_IN_CONFIG; + extern const int INCORRECT_DATA; extern const int INVALID_CONFIG_PARAMETER; extern const int SYSTEM_ERROR; extern const int FAILED_TO_GETPWUID; @@ -444,6 +455,39 @@ void checkForUsersNotInMainConfig( } } +static void loadEncryptionKey(const std::string & key_command [[maybe_unused]], Poco::Logger * log) +{ +#if USE_BASE64 && USE_SSL && USE_INTERNAL_SSL_LIBRARY + + auto process = ShellCommand::execute(key_command); + + std::string b64_key; + readStringUntilEOF(b64_key, process->out); + process->wait(); + + // turbob64 doesn't like whitespace characters in input. Strip + // them before decoding. + std::erase_if(b64_key, [](char c) + { + return c == ' ' || c == '\t' || c == '\r' || c == '\n'; + }); + + std::vector buf(b64_key.size()); + const size_t key_size = tb64dec(reinterpret_cast(b64_key.data()), b64_key.size(), + reinterpret_cast(buf.data())); + if (!key_size) + throw Exception("Failed to decode encryption key", ErrorCodes::INCORRECT_DATA); + else if (key_size < 16) + LOG_WARNING(log, "The encryption key should be at least 16 octets long."); + + const std::string_view key = std::string_view(buf.data(), key_size); + CompressionCodecEncrypted::setMasterKey(key); + +#else + LOG_WARNING(log, "Server was built without Base64 or SSL support. Encryption is disabled."); +#endif +} + [[noreturn]] void forceShutdown() { @@ -503,6 +547,8 @@ if (ThreadFuzzer::instance().isEffective()) // ignore `max_thread_pool_size` in configs we fetch from ZK, but oh well. GlobalThreadPool::initialize(config().getUInt("max_thread_pool_size", 10000)); + ConnectionCollector::init(global_context, config().getUInt("max_threads_for_connection_collector", 10)); + bool has_zookeeper = config().has("zookeeper"); zkutil::ZooKeeperNodeCache main_config_zk_node_cache([&] { return global_context->getZooKeeper(); }); @@ -913,6 +959,10 @@ if (ThreadFuzzer::instance().isEffective()) global_context->getMergeTreeSettings().sanityCheck(settings); global_context->getReplicatedMergeTreeSettings().sanityCheck(settings); + /// Set up encryption. + if (config().has("encryption.key_command")) + loadEncryptionKey(config().getString("encryption.key_command"), log); + Poco::Timespan keep_alive_timeout(config().getUInt("keep_alive_timeout", 10), 0); Poco::ThreadPool server_pool(3, config().getUInt("max_connections", 1024)); @@ -1044,6 +1094,7 @@ if (ThreadFuzzer::instance().isEffective()) loadMetadataSystem(global_context); /// After attaching system databases we can initialize system log. global_context->initializeSystemLogs(); + global_context->setSystemZooKeeperLogAfterInitializationIfNeeded(); auto & database_catalog = DatabaseCatalog::instance(); /// After the system database is created, attach virtual system tables (in addition to query_log and part_log) attachSystemTablesServer(*database_catalog.getSystemDatabase(), has_zookeeper); diff --git a/programs/server/config.xml b/programs/server/config.xml index 6f0b228dda7..78182482c1c 100644 --- a/programs/server/config.xml +++ b/programs/server/config.xml @@ -1002,6 +1002,16 @@ --> + + + + + + @@ -1156,4 +1166,27 @@ + + + diff --git a/programs/server/play.html b/programs/server/play.html index 4165a2829bd..503bb92d03e 100644 --- a/programs/server/play.html +++ b/programs/server/play.html @@ -9,7 +9,7 @@ Do not use any JavaScript or CSS frameworks or preprocessors. This HTML page should not require any build systems (node.js, npm, gulp, etc.) This HTML page should not be minified, instead it should be reasonably minimalistic by itself. 
- This HTML page should not load any external resources + This HTML page should not load any external resources on load. (CSS and JavaScript must be embedded directly to the page. No external fonts or images should be loaded). This UI should look as lightweight, clean and fast as possible. All UI elements must be aligned in pixel-perfect way. @@ -343,34 +343,40 @@ /// Save query in history only if it is different. let previous_query = ''; - /// Substitute the address of the server where the page is served. - if (location.protocol != 'file:') { + const current_url = new URL(window.location); + + const server_address = current_url.searchParams.get('url'); + if (server_address) { + document.getElementById('url').value = server_address; + } else if (location.protocol != 'file:') { + /// Substitute the address of the server where the page is served. document.getElementById('url').value = location.origin; } /// Substitute user name if it's specified in the query string - let user_from_url = (new URL(window.location)).searchParams.get('user'); + const user_from_url = current_url.searchParams.get('user'); if (user_from_url) { document.getElementById('user').value = user_from_url; } function postImpl(posted_request_num, query) { - /// TODO: Check if URL already contains query string (append parameters). + const user = document.getElementById('user').value; + const password = document.getElementById('password').value; - let user = document.getElementById('user').value; - let password = document.getElementById('password').value; + const server_address = document.getElementById('url').value; - let url = document.getElementById('url').value + + const url = server_address + + (server_address.indexOf('?') >= 0 ? '&' : '?') + /// Ask server to allow cross-domain requests. - '?add_http_cors_header=1' + + 'add_http_cors_header=1' + '&user=' + encodeURIComponent(user) + '&password=' + encodeURIComponent(password) + '&default_format=JSONCompact' + /// Safety settings to prevent results that browser cannot display. '&max_result_rows=1000&max_result_bytes=10000000&result_overflow_mode=break'; - let xhr = new XMLHttpRequest; + const xhr = new XMLHttpRequest; xhr.open('POST', url, true); @@ -384,17 +390,24 @@ /// The query is saved in browser history (in state JSON object) /// as well as in URL fragment identifier. if (query != previous_query) { - let state = { + const state = { query: query, status: this.status, response: this.response.length > 100000 ? null : this.response /// Lower than the browser's limit. }; - let title = "ClickHouse Query: " + query; - let url = window.location.pathname + '?user=' + encodeURIComponent(user) + '#' + window.btoa(query); + const title = "ClickHouse Query: " + query; + + let history_url = window.location.pathname + '?user=' + encodeURIComponent(user); + if (server_address != location.origin) { + /// Save server's address in URL if it's not identical to the address of the play UI. + history_url += '&url=' + encodeURIComponent(server_address); + } + history_url += '#' + window.btoa(query); + if (previous_query == '') { - history.replaceState(state, title, url); + history.replaceState(state, title, history_url); } else { - history.pushState(state, title, url); + history.pushState(state, title, history_url); } document.title = title; previous_query = query; @@ -599,10 +612,16 @@ } /// Huge JS libraries should be loaded only if needed. 
- function loadJS(src) { + function loadJS(src, integrity) { return new Promise((resolve, reject) => { const script = document.createElement('script'); script.src = src; + if (integrity) { + script.crossOrigin = 'anonymous'; + script.integrity = integrity; + } else { + console.warn('no integrity for', src) + } script.addEventListener('load', function() { resolve(true); }); document.head.appendChild(script); }); @@ -613,10 +632,14 @@ if (load_dagre_promise) { return load_dagre_promise; } load_dagre_promise = Promise.all([ - loadJS('https://dagrejs.github.io/project/dagre/v0.8.5/dagre.min.js'), - loadJS('https://dagrejs.github.io/project/graphlib-dot/v0.6.4/graphlib-dot.min.js'), - loadJS('https://dagrejs.github.io/project/dagre-d3/v0.6.4/dagre-d3.min.js'), - loadJS('https://cdn.jsdelivr.net/npm/d3@7.0.0'), + loadJS('https://dagrejs.github.io/project/dagre/v0.8.5/dagre.min.js', + 'sha384-2IH3T69EIKYC4c+RXZifZRvaH5SRUdacJW7j6HtE5rQbvLhKKdawxq6vpIzJ7j9M'), + loadJS('https://dagrejs.github.io/project/graphlib-dot/v0.6.4/graphlib-dot.min.js', + 'sha384-Q7oatU+b+y0oTkSoiRH9wTLH6sROySROCILZso/AbMMm9uKeq++r8ujD4l4f+CWj'), + loadJS('https://dagrejs.github.io/project/dagre-d3/v0.6.4/dagre-d3.min.js', + 'sha384-9N1ty7Yz7VKL3aJbOk+8ParYNW8G5W+MvxEfFL9G7CRYPmkHI9gJqyAfSI/8190W'), + loadJS('https://cdn.jsdelivr.net/npm/d3@7.0.0', + 'sha384-S+Kf0r6YzKIhKA8d1k2/xtYv+j0xYUU3E7+5YLrcPVab6hBh/r1J6cq90OXhw80u'), ]); return load_dagre_promise; diff --git a/src/Access/AccessControlManager.cpp b/src/Access/AccessControlManager.cpp index 66023c1c0ea..08f1cda2fce 100644 --- a/src/Access/AccessControlManager.cpp +++ b/src/Access/AccessControlManager.cpp @@ -64,7 +64,12 @@ public: std::lock_guard lock{mutex}; auto x = cache.get(params); if (x) - return *x; + { + if ((*x)->getUser()) + return *x; + /// No user, probably the user has been dropped while it was in the cache. 
+ cache.remove(params); + } auto res = std::shared_ptr(new ContextAccess(manager, params)); cache.add(params, res); return res; @@ -484,11 +489,12 @@ std::shared_ptr AccessControlManager::getEnabledSettings( return settings_profiles_cache->getEnabledSettings(user_id, settings_from_user, enabled_roles, settings_from_enabled_roles); } -std::shared_ptr AccessControlManager::getProfileSettings(const String & profile_name) const +std::shared_ptr AccessControlManager::getSettingsProfileInfo(const UUID & profile_id) { - return settings_profiles_cache->getProfileSettings(profile_name); + return settings_profiles_cache->getSettingsProfileInfo(profile_id); } + const ExternalAuthenticators & AccessControlManager::getExternalAuthenticators() const { return *external_authenticators; diff --git a/src/Access/AccessControlManager.h b/src/Access/AccessControlManager.h index 789c33af1c1..e41aa80a257 100644 --- a/src/Access/AccessControlManager.h +++ b/src/Access/AccessControlManager.h @@ -32,8 +32,7 @@ class RowPolicyCache; class EnabledQuota; class QuotaCache; struct QuotaUsage; -struct SettingsProfile; -using SettingsProfilePtr = std::shared_ptr; +struct SettingsProfilesInfo; class EnabledSettings; class SettingsProfilesCache; class SettingsProfileElements; @@ -145,7 +144,7 @@ public: const boost::container::flat_set & enabled_roles, const SettingsProfileElements & settings_from_enabled_roles) const; - std::shared_ptr getProfileSettings(const String & profile_name) const; + std::shared_ptr getSettingsProfileInfo(const UUID & profile_id); const ExternalAuthenticators & getExternalAuthenticators() const; diff --git a/src/Access/AccessRights.cpp b/src/Access/AccessRights.cpp index f9c1d23350d..d4b2dc8a252 100644 --- a/src/Access/AccessRights.cpp +++ b/src/Access/AccessRights.cpp @@ -655,7 +655,7 @@ private: for (auto & [lhs_childname, lhs_child] : *children) { if (!rhs.tryGetChild(lhs_childname)) - lhs_child.flags |= rhs.flags & lhs_child.getAllGrantableFlags(); + lhs_child.addGrantsRec(rhs.flags); } } } @@ -673,7 +673,7 @@ private: for (auto & [lhs_childname, lhs_child] : *children) { if (!rhs.tryGetChild(lhs_childname)) - lhs_child.flags &= rhs.flags; + lhs_child.removeGrantsRec(~rhs.flags); } } } @@ -1041,17 +1041,15 @@ void AccessRights::makeIntersection(const AccessRights & other) auto helper = [](std::unique_ptr & root_node, const std::unique_ptr & other_root_node) { if (!root_node) + return; + if (!other_root_node) { - if (other_root_node) - root_node = std::make_unique(*other_root_node); + root_node = nullptr; return; } - if (other_root_node) - { - root_node->makeIntersection(*other_root_node); - if (!root_node->flags && !root_node->children) - root_node = nullptr; - } + root_node->makeIntersection(*other_root_node); + if (!root_node->flags && !root_node->children) + root_node = nullptr; }; helper(root, other.root); helper(root_with_grant_option, other.root_with_grant_option); diff --git a/src/Access/ContextAccess.cpp b/src/Access/ContextAccess.cpp index 90495a83dfc..39b57a40e7a 100644 --- a/src/Access/ContextAccess.cpp +++ b/src/Access/ContextAccess.cpp @@ -7,6 +7,7 @@ #include #include #include +#include #include #include #include @@ -163,11 +164,10 @@ void ContextAccess::setUser(const UserPtr & user_) const if (!user) { /// User has been dropped. 
- auto nothing_granted = std::make_shared(); - access = nothing_granted; - access_with_implicit = nothing_granted; subscription_for_user_change = {}; subscription_for_roles_changes = {}; + access = nullptr; + access_with_implicit = nullptr; enabled_roles = nullptr; roles_info = nullptr; enabled_row_policies = nullptr; @@ -252,32 +252,45 @@ String ContextAccess::getUserName() const std::shared_ptr ContextAccess::getRolesInfo() const { std::lock_guard lock{mutex}; - return roles_info; + if (roles_info) + return roles_info; + static const auto no_roles = std::make_shared(); + return no_roles; } std::shared_ptr ContextAccess::getEnabledRowPolicies() const { std::lock_guard lock{mutex}; - return enabled_row_policies; + if (enabled_row_policies) + return enabled_row_policies; + static const auto no_row_policies = std::make_shared(); + return no_row_policies; } ASTPtr ContextAccess::getRowPolicyCondition(const String & database, const String & table_name, RowPolicy::ConditionType index, const ASTPtr & extra_condition) const { std::lock_guard lock{mutex}; - return enabled_row_policies ? enabled_row_policies->getCondition(database, table_name, index, extra_condition) : nullptr; + if (enabled_row_policies) + return enabled_row_policies->getCondition(database, table_name, index, extra_condition); + return nullptr; } std::shared_ptr ContextAccess::getQuota() const { std::lock_guard lock{mutex}; - return enabled_quota; + if (enabled_quota) + return enabled_quota; + static const auto unlimited_quota = EnabledQuota::getUnlimitedQuota(); + return unlimited_quota; } std::optional ContextAccess::getQuotaUsage() const { std::lock_guard lock{mutex}; - return enabled_quota ? enabled_quota->getUsage() : std::optional{}; + if (enabled_quota) + return enabled_quota->getUsage(); + return {}; } @@ -288,38 +301,52 @@ std::shared_ptr ContextAccess::getFullAccess() auto full_access = std::shared_ptr(new ContextAccess); full_access->is_full_access = true; full_access->access = std::make_shared(AccessRights::getFullAccess()); - full_access->enabled_quota = EnabledQuota::getUnlimitedQuota(); + full_access->access_with_implicit = std::make_shared(addImplicitAccessRights(*full_access->access)); return full_access; }(); return res; } -std::shared_ptr ContextAccess::getDefaultSettings() const +SettingsChanges ContextAccess::getDefaultSettings() const { std::lock_guard lock{mutex}; - return enabled_settings ? enabled_settings->getSettings() : nullptr; + if (enabled_settings) + { + if (auto info = enabled_settings->getInfo()) + return info->settings; + } + return {}; } -std::shared_ptr ContextAccess::getSettingsConstraints() const +std::shared_ptr ContextAccess::getDefaultProfileInfo() const { std::lock_guard lock{mutex}; - return enabled_settings ? 
enabled_settings->getConstraints() : nullptr; + if (enabled_settings) + return enabled_settings->getInfo(); + static const auto everything_by_default = std::make_shared(*manager); + return everything_by_default; } std::shared_ptr ContextAccess::getAccessRights() const { std::lock_guard lock{mutex}; - return access; + if (access) + return access; + static const auto nothing_granted = std::make_shared(); + return nothing_granted; } std::shared_ptr ContextAccess::getAccessRightsWithImplicit() const { std::lock_guard lock{mutex}; - return access_with_implicit; + if (access_with_implicit) + return access_with_implicit; + static const auto nothing_granted = std::make_shared(); + return nothing_granted; } @@ -551,7 +578,7 @@ bool ContextAccess::checkAdminOptionImplHelper(const Container & role_ids, const for (auto it = std::begin(role_ids); it != std::end(role_ids); ++it, ++i) { const UUID & role_id = *it; - if (info && info->enabled_roles_with_admin_option.count(role_id)) + if (info->enabled_roles_with_admin_option.count(role_id)) continue; if (throw_if_denied) @@ -560,7 +587,7 @@ bool ContextAccess::checkAdminOptionImplHelper(const Container & role_ids, const if (!role_name) role_name = "ID {" + toString(role_id) + "}"; - if (info && info->enabled_roles.count(role_id)) + if (info->enabled_roles.count(role_id)) show_error("Not enough privileges. " "Role " + backQuote(*role_name) + " is granted, but without ADMIN option. " "To execute this query it's necessary to have the role " + backQuoteIfNeed(*role_name) + " granted with ADMIN option.", diff --git a/src/Access/ContextAccess.h b/src/Access/ContextAccess.h index a4373be4ff0..70145b0a3ef 100644 --- a/src/Access/ContextAccess.h +++ b/src/Access/ContextAccess.h @@ -23,7 +23,8 @@ class EnabledQuota; class EnabledSettings; struct QuotaUsage; struct Settings; -class SettingsConstraints; +struct SettingsProfilesInfo; +class SettingsChanges; class AccessControlManager; class IAST; using ASTPtr = std::shared_ptr; @@ -71,11 +72,9 @@ public: String getUserName() const; /// Returns information about current and enabled roles. - /// The function can return nullptr. std::shared_ptr getRolesInfo() const; /// Returns information about enabled row policies. - /// The function can return nullptr. std::shared_ptr getEnabledRowPolicies() const; /// Returns the row policy filter for a specified table. @@ -83,17 +82,12 @@ public: ASTPtr getRowPolicyCondition(const String & database, const String & table_name, RowPolicy::ConditionType index, const ASTPtr & extra_condition = nullptr) const; /// Returns the quota to track resource consumption. - /// The function returns nullptr if no tracking or limitation is needed. std::shared_ptr getQuota() const; std::optional getQuotaUsage() const; - /// Returns the default settings, i.e. the settings to apply on user's login. - /// The function returns nullptr if it's no need to apply settings. - std::shared_ptr getDefaultSettings() const; - - /// Returns the settings' constraints. - /// The function returns nullptr if there are no constraints. - std::shared_ptr getSettingsConstraints() const; + /// Returns the default settings, i.e. the settings which should be applied on user's login. + SettingsChanges getDefaultSettings() const; + std::shared_ptr getDefaultProfileInfo() const; /// Returns the current access rights. 
std::shared_ptr getAccessRights() const; diff --git a/src/Access/EnabledRolesInfo.h b/src/Access/EnabledRolesInfo.h index f06b7478daf..091e1b64002 100644 --- a/src/Access/EnabledRolesInfo.h +++ b/src/Access/EnabledRolesInfo.h @@ -10,7 +10,7 @@ namespace DB { -/// Information about a role. +/// Information about roles enabled for a user at some specific time. struct EnabledRolesInfo { boost::container::flat_set current_roles; diff --git a/src/Access/EnabledRowPolicies.cpp b/src/Access/EnabledRowPolicies.cpp index efd5ed4ae10..674dab3e0f0 100644 --- a/src/Access/EnabledRowPolicies.cpp +++ b/src/Access/EnabledRowPolicies.cpp @@ -12,8 +12,11 @@ size_t EnabledRowPolicies::Hash::operator()(const MixedConditionKey & key) const } -EnabledRowPolicies::EnabledRowPolicies(const Params & params_) - : params(params_) +EnabledRowPolicies::EnabledRowPolicies() : params() +{ +} + +EnabledRowPolicies::EnabledRowPolicies(const Params & params_) : params(params_) { } diff --git a/src/Access/EnabledRowPolicies.h b/src/Access/EnabledRowPolicies.h index 0ca4f16fcf1..5e819733963 100644 --- a/src/Access/EnabledRowPolicies.h +++ b/src/Access/EnabledRowPolicies.h @@ -32,6 +32,7 @@ public: friend bool operator >=(const Params & lhs, const Params & rhs) { return !(lhs < rhs); } }; + EnabledRowPolicies(); ~EnabledRowPolicies(); using ConditionType = RowPolicy::ConditionType; diff --git a/src/Access/EnabledSettings.cpp b/src/Access/EnabledSettings.cpp index f913acb0150..eca650298f6 100644 --- a/src/Access/EnabledSettings.cpp +++ b/src/Access/EnabledSettings.cpp @@ -11,27 +11,16 @@ EnabledSettings::EnabledSettings(const Params & params_) : params(params_) EnabledSettings::~EnabledSettings() = default; - -std::shared_ptr EnabledSettings::getSettings() const +std::shared_ptr EnabledSettings::getInfo() const { std::lock_guard lock{mutex}; - return settings; + return info; } - -std::shared_ptr EnabledSettings::getConstraints() const +void EnabledSettings::setInfo(const std::shared_ptr & info_) { std::lock_guard lock{mutex}; - return constraints; -} - - -void EnabledSettings::setSettingsAndConstraints( - const std::shared_ptr & settings_, const std::shared_ptr & constraints_) -{ - std::lock_guard lock{mutex}; - settings = settings_; - constraints = constraints_; + info = info_; } } diff --git a/src/Access/EnabledSettings.h b/src/Access/EnabledSettings.h index 80635ca4542..35493ef01ab 100644 --- a/src/Access/EnabledSettings.h +++ b/src/Access/EnabledSettings.h @@ -1,15 +1,15 @@ #pragma once -#include -#include -#include #include +#include #include #include namespace DB { +struct SettingsProfilesInfo; + /// Watches settings profiles for a specific user and roles. class EnabledSettings { @@ -30,27 +30,19 @@ public: friend bool operator >=(const Params & lhs, const Params & rhs) { return !(lhs < rhs); } }; - ~EnabledSettings(); - /// Returns the default settings come from settings profiles defined for the user /// and the roles passed in the constructor. - std::shared_ptr getSettings() const; + std::shared_ptr getInfo() const; - /// Returns the constraints come from settings profiles defined for the user - /// and the roles passed in the constructor. 
- std::shared_ptr getConstraints() const; + ~EnabledSettings(); private: friend class SettingsProfilesCache; EnabledSettings(const Params & params_); - - void setSettingsAndConstraints( - const std::shared_ptr & settings_, const std::shared_ptr & constraints_); + void setInfo(const std::shared_ptr & info_); const Params params; - SettingsProfileElements settings_from_enabled; - std::shared_ptr settings; - std::shared_ptr constraints; + std::shared_ptr info; mutable std::mutex mutex; }; } diff --git a/src/Access/GrantedRoles.cpp b/src/Access/GrantedRoles.cpp index 2659f8a3ec9..7d16e3e65bb 100644 --- a/src/Access/GrantedRoles.cpp +++ b/src/Access/GrantedRoles.cpp @@ -3,7 +3,6 @@ #include #include - namespace DB { void GrantedRoles::grant(const UUID & role_) @@ -80,7 +79,7 @@ std::vector GrantedRoles::findGranted(const boost::container::flat_set res; res.reserve(ids.size()); - boost::range::set_difference(ids, roles, std::back_inserter(res)); + boost::range::set_intersection(ids, roles, std::back_inserter(res)); return res; } @@ -111,7 +110,7 @@ std::vector GrantedRoles::findGrantedWithAdminOption(const boost::containe { std::vector res; res.reserve(ids.size()); - boost::range::set_difference(ids, roles_with_admin_option, std::back_inserter(res)); + boost::range::set_intersection(ids, roles_with_admin_option, std::back_inserter(res)); return res; } diff --git a/src/Access/IAccessStorage.cpp b/src/Access/IAccessStorage.cpp index e258935f386..348987899cb 100644 --- a/src/Access/IAccessStorage.cpp +++ b/src/Access/IAccessStorage.cpp @@ -197,6 +197,16 @@ String IAccessStorage::readName(const UUID & id) const } +Strings IAccessStorage::readNames(const std::vector & ids) const +{ + Strings res; + res.reserve(ids.size()); + for (const auto & id : ids) + res.emplace_back(readName(id)); + return res; +} + + std::optional IAccessStorage::tryReadName(const UUID & id) const { String name; @@ -207,6 +217,19 @@ std::optional IAccessStorage::tryReadName(const UUID & id) const } +Strings IAccessStorage::tryReadNames(const std::vector & ids) const +{ + Strings res; + res.reserve(ids.size()); + for (const auto & id : ids) + { + if (auto name = tryReadName(id)) + res.emplace_back(std::move(name).value()); + } + return res; +} + + UUID IAccessStorage::insert(const AccessEntityPtr & entity) { return insertImpl(entity, false); diff --git a/src/Access/IAccessStorage.h b/src/Access/IAccessStorage.h index f37bc53eeae..bf45ed5b5ce 100644 --- a/src/Access/IAccessStorage.h +++ b/src/Access/IAccessStorage.h @@ -84,7 +84,9 @@ public: /// Reads only name of an entity. String readName(const UUID & id) const; + Strings readNames(const std::vector & ids) const; std::optional tryReadName(const UUID & id) const; + Strings tryReadNames(const std::vector & ids) const; /// Returns true if a specified entity can be inserted into this storage. /// This function doesn't check whether there are no entities with such name in the storage. diff --git a/src/Access/SettingsConstraintsAndProfileIDs.h b/src/Access/SettingsConstraintsAndProfileIDs.h new file mode 100644 index 00000000000..5538a10555e --- /dev/null +++ b/src/Access/SettingsConstraintsAndProfileIDs.h @@ -0,0 +1,21 @@ +#pragma once + +#include +#include +#include + + +namespace DB +{ + +/// Information about currently applied constraints and profiles. 
+struct SettingsConstraintsAndProfileIDs +{ + SettingsConstraints constraints; + std::vector current_profiles; + std::vector enabled_profiles; + + SettingsConstraintsAndProfileIDs(const AccessControlManager & manager_) : constraints(manager_) {} +}; + +} diff --git a/src/Access/SettingsProfileElement.cpp b/src/Access/SettingsProfileElement.cpp index 1c682900b00..edbb7509e9e 100644 --- a/src/Access/SettingsProfileElement.cpp +++ b/src/Access/SettingsProfileElement.cpp @@ -7,6 +7,7 @@ #include #include #include +#include namespace DB @@ -172,4 +173,21 @@ SettingsConstraints SettingsProfileElements::toSettingsConstraints(const AccessC return res; } +std::vector SettingsProfileElements::toProfileIDs() const +{ + std::vector res; + for (const auto & elem : *this) + { + if (elem.parent_profile) + res.push_back(*elem.parent_profile); + } + + /// If some profile occurs multiple times (with some other settings in between), + /// the latest occurrence overrides all the previous ones. + removeDuplicatesKeepLast(res); + + return res; +} + + } diff --git a/src/Access/SettingsProfileElement.h b/src/Access/SettingsProfileElement.h index c9262fecb73..d0e2343e726 100644 --- a/src/Access/SettingsProfileElement.h +++ b/src/Access/SettingsProfileElement.h @@ -62,6 +62,7 @@ public: Settings toSettings() const; SettingsChanges toSettingsChanges() const; SettingsConstraints toSettingsConstraints(const AccessControlManager & manager) const; + std::vector toProfileIDs() const; }; } diff --git a/src/Access/SettingsProfilesCache.cpp b/src/Access/SettingsProfilesCache.cpp index ef4bffa11f9..3cd73720c3e 100644 --- a/src/Access/SettingsProfilesCache.cpp +++ b/src/Access/SettingsProfilesCache.cpp @@ -1,11 +1,8 @@ #include #include #include -#include -#include +#include #include -#include -#include namespace DB @@ -15,7 +12,6 @@ namespace ErrorCodes extern const int THERE_IS_NO_PROFILE; } - SettingsProfilesCache::SettingsProfilesCache(const AccessControlManager & manager_) : manager(manager_) {} @@ -67,7 +63,7 @@ void SettingsProfilesCache::profileAddedOrChanged(const UUID & profile_id, const profiles_by_name.erase(old_profile->getName()); profiles_by_name[new_profile->getName()] = profile_id; } - settings_for_profiles.clear(); + profile_infos_cache.clear(); mergeSettingsAndConstraints(); } @@ -80,7 +76,7 @@ void SettingsProfilesCache::profileRemoved(const UUID & profile_id) return; profiles_by_name.erase(it->second->getName()); all_profiles.erase(it); - settings_for_profiles.clear(); + profile_infos_cache.clear(); mergeSettingsAndConstraints(); } @@ -142,49 +138,52 @@ void SettingsProfilesCache::mergeSettingsAndConstraintsFor(EnabledSettings & ena merged_settings.merge(enabled.params.settings_from_enabled_roles); merged_settings.merge(enabled.params.settings_from_user); - substituteProfiles(merged_settings); + auto info = std::make_shared(manager); + info->profiles = enabled.params.settings_from_user.toProfileIDs(); + substituteProfiles(merged_settings, info->profiles_with_implicit, info->names_of_profiles); + info->settings = merged_settings.toSettingsChanges(); + info->constraints = merged_settings.toSettingsConstraints(manager); - auto settings = merged_settings.toSettings(); - auto constraints = merged_settings.toSettingsConstraints(manager); - enabled.setSettingsAndConstraints( - std::make_shared(std::move(settings)), std::make_shared(std::move(constraints))); + enabled.setInfo(std::move(info)); } -void SettingsProfilesCache::substituteProfiles(SettingsProfileElements & elements) const +void 
SettingsProfilesCache::substituteProfiles( + SettingsProfileElements & elements, + std::vector & substituted_profiles, + std::unordered_map & names_of_substituted_profiles) const { - boost::container::flat_set already_substituted; - for (size_t i = 0; i != elements.size();) + /// We should substitute profiles in reverse order because the same profile can occur + /// in `elements` multiple times (with some other settings in between) and in this case + /// the last occurrence should override all the previous ones. + boost::container::flat_set substituted_profiles_set; + size_t i = elements.size(); + while (i != 0) { - auto & element = elements[i]; + auto & element = elements[--i]; if (!element.parent_profile) - { - ++i; continue; - } - auto parent_profile_id = *element.parent_profile; + auto profile_id = *element.parent_profile; element.parent_profile.reset(); - if (already_substituted.count(parent_profile_id)) - { - ++i; + if (substituted_profiles_set.count(profile_id)) continue; - } - already_substituted.insert(parent_profile_id); - auto parent_profile = all_profiles.find(parent_profile_id); - if (parent_profile == all_profiles.end()) - { - ++i; + auto profile_it = all_profiles.find(profile_id); + if (profile_it == all_profiles.end()) continue; - } - const auto & parent_profile_elements = parent_profile->second->elements; - elements.insert(elements.begin() + i, parent_profile_elements.begin(), parent_profile_elements.end()); + const auto & profile = profile_it->second; + const auto & profile_elements = profile->elements; + elements.insert(elements.begin() + i, profile_elements.begin(), profile_elements.end()); + i += profile_elements.size(); + substituted_profiles.push_back(profile_id); + substituted_profiles_set.insert(profile_id); + names_of_substituted_profiles.emplace(profile_id, profile->getName()); } + std::reverse(substituted_profiles.begin(), substituted_profiles.end()); } - std::shared_ptr SettingsProfilesCache::getEnabledSettings( const UUID & user_id, const SettingsProfileElements & settings_from_user, @@ -216,26 +215,26 @@ std::shared_ptr SettingsProfilesCache::getEnabledSettings } -std::shared_ptr SettingsProfilesCache::getProfileSettings(const String & profile_name) +std::shared_ptr SettingsProfilesCache::getSettingsProfileInfo(const UUID & profile_id) { std::lock_guard lock{mutex}; ensureAllProfilesRead(); - auto it = profiles_by_name.find(profile_name); - if (it == profiles_by_name.end()) - throw Exception("Settings profile " + backQuote(profile_name) + " not found", ErrorCodes::THERE_IS_NO_PROFILE); - const UUID profile_id = it->second; - - auto it2 = settings_for_profiles.find(profile_id); - if (it2 != settings_for_profiles.end()) - return it2->second; + if (auto pos = this->profile_infos_cache.get(profile_id)) + return *pos; SettingsProfileElements elements = all_profiles[profile_id]->elements; - substituteProfiles(elements); - auto res = std::make_shared(elements.toSettingsChanges()); - settings_for_profiles.emplace(profile_id, res); - return res; + + auto info = std::make_shared(manager); + + info->profiles.push_back(profile_id); + info->profiles_with_implicit.push_back(profile_id); + substituteProfiles(elements, info->profiles_with_implicit, info->names_of_profiles); + info->settings = elements.toSettingsChanges(); + info->constraints.merge(elements.toSettingsConstraints(manager)); + + profile_infos_cache.add(profile_id, info); + return info; } - } diff --git a/src/Access/SettingsProfilesCache.h b/src/Access/SettingsProfilesCache.h index fb71d3f4bd0..23a37212883
100644 --- a/src/Access/SettingsProfilesCache.h +++ b/src/Access/SettingsProfilesCache.h @@ -1,8 +1,7 @@ #pragma once #include -#include -#include +#include #include #include #include @@ -13,9 +12,7 @@ namespace DB class AccessControlManager; struct SettingsProfile; using SettingsProfilePtr = std::shared_ptr; -class SettingsProfileElements; -class EnabledSettings; - +struct SettingsProfilesInfo; /// Reads and caches all the settings profiles. class SettingsProfilesCache @@ -32,7 +29,7 @@ public: const boost::container::flat_set & enabled_roles, const SettingsProfileElements & settings_from_enabled_roles_); - std::shared_ptr getProfileSettings(const String & profile_name); + std::shared_ptr getSettingsProfileInfo(const UUID & profile_id); private: void ensureAllProfilesRead(); @@ -40,7 +37,7 @@ private: void profileRemoved(const UUID & profile_id); void mergeSettingsAndConstraints(); void mergeSettingsAndConstraintsFor(EnabledSettings & enabled) const; - void substituteProfiles(SettingsProfileElements & elements) const; + void substituteProfiles(SettingsProfileElements & elements, std::vector & substituted_profiles, std::unordered_map & names_of_substituted_profiles) const; const AccessControlManager & manager; std::unordered_map all_profiles; @@ -49,7 +46,7 @@ private: scope_guard subscription; std::map> enabled_settings; std::optional default_profile_id; - std::unordered_map> settings_for_profiles; + Poco::LRUCache> profile_infos_cache; mutable std::mutex mutex; }; } diff --git a/src/Access/SettingsProfilesInfo.cpp b/src/Access/SettingsProfilesInfo.cpp new file mode 100644 index 00000000000..46ebb6084e7 --- /dev/null +++ b/src/Access/SettingsProfilesInfo.cpp @@ -0,0 +1,58 @@ +#include +#include +#include + + +namespace DB +{ + +bool operator==(const SettingsProfilesInfo & lhs, const SettingsProfilesInfo & rhs) +{ + if (lhs.settings != rhs.settings) + return false; + + if (lhs.constraints != rhs.constraints) + return false; + + if (lhs.profiles != rhs.profiles) + return false; + + if (lhs.profiles_with_implicit != rhs.profiles_with_implicit) + return false; + + if (lhs.names_of_profiles != rhs.names_of_profiles) + return false; + + return true; +} + +std::shared_ptr +SettingsProfilesInfo::getConstraintsAndProfileIDs(const std::shared_ptr & previous) const +{ + auto res = std::make_shared(manager); + res->current_profiles = profiles; + + if (previous) + { + res->constraints = previous->constraints; + res->constraints.merge(constraints); + } + else + res->constraints = constraints; + + if (previous) + { + res->enabled_profiles.reserve(previous->enabled_profiles.size() + profiles_with_implicit.size()); + res->enabled_profiles = previous->enabled_profiles; + } + res->enabled_profiles.insert(res->enabled_profiles.end(), profiles_with_implicit.begin(), profiles_with_implicit.end()); + + /// If some profile occurs multiple times (with some other settings in between), + /// the latest occurrence overrides all the previous ones. + removeDuplicatesKeepLast(res->current_profiles); + removeDuplicatesKeepLast(res->enabled_profiles); + + return res; +} + +} diff --git a/src/Access/SettingsProfilesInfo.h b/src/Access/SettingsProfilesInfo.h new file mode 100644 index 00000000000..d1fba0e9f5f --- /dev/null +++ b/src/Access/SettingsProfilesInfo.h @@ -0,0 +1,43 @@ +#pragma once + +#include +#include +#include +#include + + +namespace DB +{ +struct SettingsConstraintsAndProfileIDs; + +/// Information about the default settings which are applied to an user on login. 
+struct SettingsProfilesInfo +{ + SettingsChanges settings; + SettingsConstraints constraints; + + /// Profiles explicitly assigned to the user. + std::vector profiles; + + /// Profiles assigned to the user both explicitly and implicitly. + /// Implicitly assigned profiles include parent profiles of other assigned profiles, + /// profiles assigned via granted roles, profiles assigned via their own settings, + /// and the main default profile (see the section `default_profile` in the main configuration file). + /// The order of IDs in this vector corresponds the order of applying of these profiles. + std::vector profiles_with_implicit; + + /// Names of all the profiles in `profiles`. + std::unordered_map names_of_profiles; + + SettingsProfilesInfo(const AccessControlManager & manager_) : constraints(manager_), manager(manager_) {} + std::shared_ptr getConstraintsAndProfileIDs( + const std::shared_ptr & previous = nullptr) const; + + friend bool operator ==(const SettingsProfilesInfo & lhs, const SettingsProfilesInfo & rhs); + friend bool operator !=(const SettingsProfilesInfo & lhs, const SettingsProfilesInfo & rhs) { return !(lhs == rhs); } + +private: + const AccessControlManager & manager; +}; + +} diff --git a/src/Access/User.cpp b/src/Access/User.cpp index 016f378e83f..e21b48e11a0 100644 --- a/src/Access/User.cpp +++ b/src/Access/User.cpp @@ -11,7 +11,7 @@ bool User::equal(const IAccessEntity & other) const const auto & other_user = typeid_cast(other); return (authentication == other_user.authentication) && (allowed_client_hosts == other_user.allowed_client_hosts) && (access == other_user.access) && (granted_roles == other_user.granted_roles) && (default_roles == other_user.default_roles) - && (settings == other_user.settings) && (grantees == other_user.grantees); + && (settings == other_user.settings) && (grantees == other_user.grantees) && (default_database == other_user.default_database); } } diff --git a/src/Access/User.h b/src/Access/User.h index 5b10d953fc0..6b61d5afdea 100644 --- a/src/Access/User.h +++ b/src/Access/User.h @@ -22,6 +22,7 @@ struct User : public IAccessEntity RolesOrUsersSet default_roles = RolesOrUsersSet::AllTag{}; SettingsProfileElements settings; RolesOrUsersSet grantees = RolesOrUsersSet::AllTag{}; + String default_database; bool equal(const IAccessEntity & other) const override; std::shared_ptr clone() const override { return cloneImpl(); } diff --git a/src/Access/UsersConfigAccessStorage.cpp b/src/Access/UsersConfigAccessStorage.cpp index 5d8725f0cdf..68333ed45da 100644 --- a/src/Access/UsersConfigAccessStorage.cpp +++ b/src/Access/UsersConfigAccessStorage.cpp @@ -196,6 +196,9 @@ namespace user->access.revokeGrantOption(AccessType::ALL); } + String default_database = config.getString(user_config + ".default_database", ""); + user->default_database = default_database; + return user; } diff --git a/src/Access/tests/gtest_access_rights_ops.cpp b/src/Access/tests/gtest_access_rights_ops.cpp new file mode 100644 index 00000000000..3d7b396a6f2 --- /dev/null +++ b/src/Access/tests/gtest_access_rights_ops.cpp @@ -0,0 +1,94 @@ +#include +#include + +using namespace DB; + + +TEST(AccessRights, Union) +{ + AccessRights lhs, rhs; + lhs.grant(AccessType::CREATE_TABLE, "db1", "tb1"); + rhs.grant(AccessType::SELECT, "db2"); + lhs.makeUnion(rhs); + ASSERT_EQ(lhs.toString(), "GRANT CREATE TABLE ON db1.tb1, GRANT SELECT ON db2.*"); + + lhs.clear(); + rhs.clear(); + rhs.grant(AccessType::SELECT, "db2"); + lhs.grant(AccessType::CREATE_TABLE, "db1", "tb1"); + 
lhs.makeUnion(rhs); + ASSERT_EQ(lhs.toString(), "GRANT CREATE TABLE ON db1.tb1, GRANT SELECT ON db2.*"); + + lhs = {}; + rhs = {}; + lhs.grant(AccessType::SELECT); + rhs.grant(AccessType::SELECT, "db1", "tb1"); + lhs.makeUnion(rhs); + ASSERT_EQ(lhs.toString(), "GRANT SELECT ON *.*"); + + lhs = {}; + rhs = {}; + lhs.grant(AccessType::SELECT, "db1", "tb1", Strings{"col1", "col2"}); + rhs.grant(AccessType::SELECT, "db1", "tb1", Strings{"col2", "col3"}); + lhs.makeUnion(rhs); + ASSERT_EQ(lhs.toString(), "GRANT SELECT(col1, col2, col3) ON db1.tb1"); + + lhs = {}; + rhs = {}; + lhs.grant(AccessType::SELECT, "db1", "tb1", Strings{"col1", "col2"}); + rhs.grantWithGrantOption(AccessType::SELECT, "db1", "tb1", Strings{"col2", "col3"}); + lhs.makeUnion(rhs); + ASSERT_EQ(lhs.toString(), "GRANT SELECT(col1) ON db1.tb1, GRANT SELECT(col2, col3) ON db1.tb1 WITH GRANT OPTION"); + + lhs = {}; + rhs = {}; + lhs.grant(AccessType::INSERT); + rhs.grant(AccessType::ALL, "db1"); + lhs.makeUnion(rhs); + ASSERT_EQ(lhs.toString(), "GRANT INSERT ON *.*, GRANT SHOW, SELECT, ALTER, CREATE DATABASE, CREATE TABLE, CREATE VIEW, CREATE DICTIONARY, DROP, TRUNCATE, OPTIMIZE, SYSTEM MERGES, SYSTEM TTL MERGES, SYSTEM FETCHES, SYSTEM MOVES, SYSTEM SENDS, SYSTEM REPLICATION QUEUES, SYSTEM DROP REPLICA, SYSTEM SYNC REPLICA, SYSTEM RESTART REPLICA, SYSTEM RESTORE REPLICA, SYSTEM FLUSH DISTRIBUTED, dictGet ON db1.*"); +} + + +TEST(AccessRights, Intersection) +{ + AccessRights lhs, rhs; + lhs.grant(AccessType::CREATE_TABLE, "db1", "tb1"); + rhs.grant(AccessType::SELECT, "db2"); + lhs.makeIntersection(rhs); + ASSERT_EQ(lhs.toString(), "GRANT USAGE ON *.*"); + + lhs.clear(); + rhs.clear(); + lhs.grant(AccessType::SELECT, "db2"); + rhs.grant(AccessType::CREATE_TABLE, "db1", "tb1"); + lhs.makeIntersection(rhs); + ASSERT_EQ(lhs.toString(), "GRANT USAGE ON *.*"); + + lhs = {}; + rhs = {}; + lhs.grant(AccessType::SELECT); + rhs.grant(AccessType::SELECT, "db1", "tb1"); + lhs.makeIntersection(rhs); + ASSERT_EQ(lhs.toString(), "GRANT SELECT ON db1.tb1"); + + lhs = {}; + rhs = {}; + lhs.grant(AccessType::SELECT, "db1", "tb1", Strings{"col1", "col2"}); + rhs.grant(AccessType::SELECT, "db1", "tb1", Strings{"col2", "col3"}); + lhs.makeIntersection(rhs); + ASSERT_EQ(lhs.toString(), "GRANT SELECT(col2) ON db1.tb1"); + + lhs = {}; + rhs = {}; + lhs.grant(AccessType::SELECT, "db1", "tb1", Strings{"col1", "col2"}); + rhs.grantWithGrantOption(AccessType::SELECT, "db1", "tb1", Strings{"col2", "col3"}); + lhs.makeIntersection(rhs); + ASSERT_EQ(lhs.toString(), "GRANT SELECT(col2) ON db1.tb1"); + + lhs = {}; + rhs = {}; + lhs.grant(AccessType::INSERT); + rhs.grant(AccessType::ALL, "db1"); + lhs.makeIntersection(rhs); + ASSERT_EQ(lhs.toString(), "GRANT INSERT ON db1.*"); +} diff --git a/src/Access/ya.make b/src/Access/ya.make index e8584230538..5f2f410cabd 100644 --- a/src/Access/ya.make +++ b/src/Access/ya.make @@ -43,8 +43,10 @@ SRCS( SettingsProfile.cpp SettingsProfileElement.cpp SettingsProfilesCache.cpp + SettingsProfilesInfo.cpp User.cpp UsersConfigAccessStorage.cpp + tests/gtest_access_rights_ops.cpp ) diff --git a/src/AggregateFunctions/AggregateFunctionArray.cpp b/src/AggregateFunctions/AggregateFunctionArray.cpp index 5ec41fbdd82..982180ab50c 100644 --- a/src/AggregateFunctions/AggregateFunctionArray.cpp +++ b/src/AggregateFunctions/AggregateFunctionArray.cpp @@ -43,9 +43,9 @@ public: const AggregateFunctionPtr & nested_function, const AggregateFunctionProperties &, const DataTypes & arguments, - const Array &) const override + const Array & 
params) const override { - return std::make_shared(nested_function, arguments); + return std::make_shared(nested_function, arguments, params); } }; diff --git a/src/AggregateFunctions/AggregateFunctionArray.h b/src/AggregateFunctions/AggregateFunctionArray.h index f1005e2e43a..e6f2b46c67e 100644 --- a/src/AggregateFunctions/AggregateFunctionArray.h +++ b/src/AggregateFunctions/AggregateFunctionArray.h @@ -29,10 +29,11 @@ private: size_t num_arguments; public: - AggregateFunctionArray(AggregateFunctionPtr nested_, const DataTypes & arguments) - : IAggregateFunctionHelper(arguments, {}) + AggregateFunctionArray(AggregateFunctionPtr nested_, const DataTypes & arguments, const Array & params_) + : IAggregateFunctionHelper(arguments, params_) , nested_func(nested_), num_arguments(arguments.size()) { + assert(parameters == nested_func->getParameters()); for (const auto & type : arguments) if (!isArray(type)) throw Exception("All arguments for aggregate function " + getName() + " must be arrays", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); diff --git a/src/AggregateFunctions/AggregateFunctionDistinct.cpp b/src/AggregateFunctions/AggregateFunctionDistinct.cpp index d5e4d421bb1..f224768991b 100644 --- a/src/AggregateFunctions/AggregateFunctionDistinct.cpp +++ b/src/AggregateFunctions/AggregateFunctionDistinct.cpp @@ -34,14 +34,14 @@ public: const AggregateFunctionPtr & nested_function, const AggregateFunctionProperties &, const DataTypes & arguments, - const Array &) const override + const Array & params) const override { AggregateFunctionPtr res; if (arguments.size() == 1) { res.reset(createWithNumericType< AggregateFunctionDistinct, - AggregateFunctionDistinctSingleNumericData>(*arguments[0], nested_function, arguments)); + AggregateFunctionDistinctSingleNumericData>(*arguments[0], nested_function, arguments, params)); if (res) return res; @@ -49,14 +49,14 @@ public: if (arguments[0]->isValueUnambiguouslyRepresentedInContiguousMemoryRegion()) return std::make_shared< AggregateFunctionDistinct< - AggregateFunctionDistinctSingleGenericData>>(nested_function, arguments); + AggregateFunctionDistinctSingleGenericData>>(nested_function, arguments, params); else return std::make_shared< AggregateFunctionDistinct< - AggregateFunctionDistinctSingleGenericData>>(nested_function, arguments); + AggregateFunctionDistinctSingleGenericData>>(nested_function, arguments, params); } - return std::make_shared>(nested_function, arguments); + return std::make_shared>(nested_function, arguments, params); } }; diff --git a/src/AggregateFunctions/AggregateFunctionDistinct.h b/src/AggregateFunctions/AggregateFunctionDistinct.h index 9b7853f8665..0f085423bb9 100644 --- a/src/AggregateFunctions/AggregateFunctionDistinct.h +++ b/src/AggregateFunctions/AggregateFunctionDistinct.h @@ -167,8 +167,8 @@ private: } public: - AggregateFunctionDistinct(AggregateFunctionPtr nested_func_, const DataTypes & arguments) - : IAggregateFunctionDataHelper(arguments, nested_func_->getParameters()) + AggregateFunctionDistinct(AggregateFunctionPtr nested_func_, const DataTypes & arguments, const Array & params_) + : IAggregateFunctionDataHelper(arguments, params_) , nested_func(nested_func_) , arguments_num(arguments.size()) {} diff --git a/src/AggregateFunctions/AggregateFunctionFactory.cpp b/src/AggregateFunctions/AggregateFunctionFactory.cpp index 4a2b93e544a..c9dcdb54424 100644 --- a/src/AggregateFunctions/AggregateFunctionFactory.cpp +++ b/src/AggregateFunctions/AggregateFunctionFactory.cpp @@ -95,18 +95,18 @@ AggregateFunctionPtr 
AggregateFunctionFactory::get( // nullability themselves. Another special case is functions from Nothing // that are rewritten to AggregateFunctionNothing, in this case // nested_function is nullptr. - if (nested_function && nested_function->isOnlyWindowFunction()) + if (!nested_function || !nested_function->isOnlyWindowFunction()) { - return nested_function; + return combinator->transformAggregateFunction(nested_function, + out_properties, type_without_low_cardinality, parameters); } - - return combinator->transformAggregateFunction(nested_function, out_properties, type_without_low_cardinality, parameters); } - auto res = getImpl(name, type_without_low_cardinality, parameters, out_properties, false); - if (!res) + auto with_original_arguments = getImpl(name, type_without_low_cardinality, parameters, out_properties, false); + + if (!with_original_arguments) throw Exception("Logical error: AggregateFunctionFactory returned nullptr", ErrorCodes::LOGICAL_ERROR); - return res; + return with_original_arguments; } diff --git a/src/AggregateFunctions/AggregateFunctionForEach.cpp b/src/AggregateFunctions/AggregateFunctionForEach.cpp index 7b09c7d95da..cf448d602bf 100644 --- a/src/AggregateFunctions/AggregateFunctionForEach.cpp +++ b/src/AggregateFunctions/AggregateFunctionForEach.cpp @@ -38,9 +38,9 @@ public: const AggregateFunctionPtr & nested_function, const AggregateFunctionProperties &, const DataTypes & arguments, - const Array &) const override + const Array & params) const override { - return std::make_shared(nested_function, arguments); + return std::make_shared(nested_function, arguments, params); } }; diff --git a/src/AggregateFunctions/AggregateFunctionForEach.h b/src/AggregateFunctions/AggregateFunctionForEach.h index 66209d8c0f5..084396b2405 100644 --- a/src/AggregateFunctions/AggregateFunctionForEach.h +++ b/src/AggregateFunctions/AggregateFunctionForEach.h @@ -105,8 +105,8 @@ private: } public: - AggregateFunctionForEach(AggregateFunctionPtr nested_, const DataTypes & arguments) - : IAggregateFunctionDataHelper(arguments, {}) + AggregateFunctionForEach(AggregateFunctionPtr nested_, const DataTypes & arguments, const Array & params_) + : IAggregateFunctionDataHelper(arguments, params_) , nested_func(nested_), num_arguments(arguments.size()) { nested_size_of_data = nested_func->sizeOfData(); diff --git a/src/AggregateFunctions/AggregateFunctionGroupBitmap.h b/src/AggregateFunctions/AggregateFunctionGroupBitmap.h index e5097211928..3faeb781284 100644 --- a/src/AggregateFunctions/AggregateFunctionGroupBitmap.h +++ b/src/AggregateFunctions/AggregateFunctionGroupBitmap.h @@ -60,7 +60,7 @@ public: { } - String getName() const override { return Data::name(); } + String getName() const override { return Policy::name; } DataTypePtr getReturnType() const override { return std::make_shared>(); } @@ -120,6 +120,7 @@ template class BitmapAndPolicy { public: + static constexpr auto name = "groupBitmapAnd"; static void apply(Data & lhs, const Data & rhs) { lhs.rbs.rb_and(rhs.rbs); } }; @@ -127,6 +128,7 @@ template class BitmapOrPolicy { public: + static constexpr auto name = "groupBitmapOr"; static void apply(Data & lhs, const Data & rhs) { lhs.rbs.rb_or(rhs.rbs); } }; @@ -134,6 +136,7 @@ template class BitmapXorPolicy { public: + static constexpr auto name = "groupBitmapXor"; static void apply(Data & lhs, const Data & rhs) { lhs.rbs.rb_xor(rhs.rbs); } }; diff --git a/src/AggregateFunctions/AggregateFunctionGroupUniqArray.cpp b/src/AggregateFunctions/AggregateFunctionGroupUniqArray.cpp index 
646d0341343..7709357189c 100644 --- a/src/AggregateFunctions/AggregateFunctionGroupUniqArray.cpp +++ b/src/AggregateFunctions/AggregateFunctionGroupUniqArray.cpp @@ -25,8 +25,8 @@ template class AggregateFunctionGroupUniqArrayDate : public AggregateFunctionGroupUniqArray { public: - explicit AggregateFunctionGroupUniqArrayDate(const DataTypePtr & argument_type, UInt64 max_elems_ = std::numeric_limits::max()) - : AggregateFunctionGroupUniqArray(argument_type, max_elems_) {} + explicit AggregateFunctionGroupUniqArrayDate(const DataTypePtr & argument_type, const Array & parameters_, UInt64 max_elems_ = std::numeric_limits::max()) + : AggregateFunctionGroupUniqArray(argument_type, parameters_, max_elems_) {} DataTypePtr getReturnType() const override { return std::make_shared(std::make_shared()); } }; @@ -34,8 +34,8 @@ template class AggregateFunctionGroupUniqArrayDateTime : public AggregateFunctionGroupUniqArray { public: - explicit AggregateFunctionGroupUniqArrayDateTime(const DataTypePtr & argument_type, UInt64 max_elems_ = std::numeric_limits::max()) - : AggregateFunctionGroupUniqArray(argument_type, max_elems_) {} + explicit AggregateFunctionGroupUniqArrayDateTime(const DataTypePtr & argument_type, const Array & parameters_, UInt64 max_elems_ = std::numeric_limits::max()) + : AggregateFunctionGroupUniqArray(argument_type, parameters_, max_elems_) {} DataTypePtr getReturnType() const override { return std::make_shared(std::make_shared()); } }; @@ -102,9 +102,9 @@ AggregateFunctionPtr createAggregateFunctionGroupUniqArray( ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); if (!limit_size) - return createAggregateFunctionGroupUniqArrayImpl(name, argument_types[0]); + return createAggregateFunctionGroupUniqArrayImpl(name, argument_types[0], parameters); else - return createAggregateFunctionGroupUniqArrayImpl(name, argument_types[0], max_elems); + return createAggregateFunctionGroupUniqArrayImpl(name, argument_types[0], parameters, max_elems); } } diff --git a/src/AggregateFunctions/AggregateFunctionGroupUniqArray.h b/src/AggregateFunctions/AggregateFunctionGroupUniqArray.h index ccba789483f..cec160ee21f 100644 --- a/src/AggregateFunctions/AggregateFunctionGroupUniqArray.h +++ b/src/AggregateFunctions/AggregateFunctionGroupUniqArray.h @@ -48,9 +48,9 @@ private: using State = AggregateFunctionGroupUniqArrayData; public: - AggregateFunctionGroupUniqArray(const DataTypePtr & argument_type, UInt64 max_elems_ = std::numeric_limits::max()) + AggregateFunctionGroupUniqArray(const DataTypePtr & argument_type, const Array & parameters_, UInt64 max_elems_ = std::numeric_limits::max()) : IAggregateFunctionDataHelper, - AggregateFunctionGroupUniqArray>({argument_type}, {}), + AggregateFunctionGroupUniqArray>({argument_type}, parameters_), max_elems(max_elems_) {} String getName() const override { return "groupUniqArray"; } @@ -152,8 +152,8 @@ class AggregateFunctionGroupUniqArrayGeneric using State = AggregateFunctionGroupUniqArrayGenericData; public: - AggregateFunctionGroupUniqArrayGeneric(const DataTypePtr & input_data_type_, UInt64 max_elems_ = std::numeric_limits::max()) - : IAggregateFunctionDataHelper>({input_data_type_}, {}) + AggregateFunctionGroupUniqArrayGeneric(const DataTypePtr & input_data_type_, const Array & parameters_, UInt64 max_elems_ = std::numeric_limits::max()) + : IAggregateFunctionDataHelper>({input_data_type_}, parameters_) , input_data_type(this->argument_types[0]) , max_elems(max_elems_) {} diff --git a/src/AggregateFunctions/AggregateFunctionIf.cpp 
b/src/AggregateFunctions/AggregateFunctionIf.cpp index c074daf45be..d841fe8c06d 100644 --- a/src/AggregateFunctions/AggregateFunctionIf.cpp +++ b/src/AggregateFunctions/AggregateFunctionIf.cpp @@ -35,9 +35,9 @@ public: const AggregateFunctionPtr & nested_function, const AggregateFunctionProperties &, const DataTypes & arguments, - const Array &) const override + const Array & params) const override { - return std::make_shared(nested_function, arguments); + return std::make_shared(nested_function, arguments, params); } }; diff --git a/src/AggregateFunctions/AggregateFunctionIf.h b/src/AggregateFunctions/AggregateFunctionIf.h index 153c80e87b2..79999437ca1 100644 --- a/src/AggregateFunctions/AggregateFunctionIf.h +++ b/src/AggregateFunctions/AggregateFunctionIf.h @@ -37,8 +37,8 @@ private: size_t num_arguments; public: - AggregateFunctionIf(AggregateFunctionPtr nested, const DataTypes & types) - : IAggregateFunctionHelper(types, nested->getParameters()) + AggregateFunctionIf(AggregateFunctionPtr nested, const DataTypes & types, const Array & params_) + : IAggregateFunctionHelper(types, params_) , nested_func(nested), num_arguments(types.size()) { if (num_arguments == 0) diff --git a/src/AggregateFunctions/AggregateFunctionMerge.cpp b/src/AggregateFunctions/AggregateFunctionMerge.cpp index a19a21fd4a4..cdf399585f5 100644 --- a/src/AggregateFunctions/AggregateFunctionMerge.cpp +++ b/src/AggregateFunctions/AggregateFunctionMerge.cpp @@ -39,7 +39,7 @@ public: const AggregateFunctionPtr & nested_function, const AggregateFunctionProperties &, const DataTypes & arguments, - const Array &) const override + const Array & params) const override { const DataTypePtr & argument = arguments[0]; @@ -53,7 +53,7 @@ public: + ", because it corresponds to different aggregate function: " + function->getFunctionName() + " instead of " + nested_function->getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - return std::make_shared(nested_function, argument); + return std::make_shared(nested_function, argument, params); } }; diff --git a/src/AggregateFunctions/AggregateFunctionMerge.h b/src/AggregateFunctions/AggregateFunctionMerge.h index 3bb482e4ac9..af9257d3c57 100644 --- a/src/AggregateFunctions/AggregateFunctionMerge.h +++ b/src/AggregateFunctions/AggregateFunctionMerge.h @@ -29,15 +29,15 @@ private: AggregateFunctionPtr nested_func; public: - AggregateFunctionMerge(const AggregateFunctionPtr & nested_, const DataTypePtr & argument) - : IAggregateFunctionHelper({argument}, nested_->getParameters()) + AggregateFunctionMerge(const AggregateFunctionPtr & nested_, const DataTypePtr & argument, const Array & params_) + : IAggregateFunctionHelper({argument}, params_) , nested_func(nested_) { const DataTypeAggregateFunction * data_type = typeid_cast(argument.get()); - if (!data_type || data_type->getFunctionName() != nested_func->getName()) - throw Exception("Illegal type " + argument->getName() + " of argument for aggregate function " + getName(), - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + if (!data_type || !nested_func->haveSameStateRepresentation(*data_type->getFunction())) + throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Illegal type {} of argument for aggregate function {}, " + "expected {} or equivalent type", argument->getName(), getName(), getStateType()->getName()); } String getName() const override diff --git a/src/AggregateFunctions/AggregateFunctionQuantile.h b/src/AggregateFunctions/AggregateFunctionQuantile.h index 90745c7d749..a7a3d4042c2 100644 --- 
a/src/AggregateFunctions/AggregateFunctionQuantile.h +++ b/src/AggregateFunctions/AggregateFunctionQuantile.h @@ -105,6 +105,11 @@ public: return res; } + bool haveSameStateRepresentation(const IAggregateFunction & rhs) const override + { + return getName() == rhs.getName() && this->haveEqualArgumentTypes(rhs); + } + bool allocatesMemoryInArena() const override { return false; } void add(AggregateDataPtr __restrict place, const IColumn ** columns, size_t row_num, Arena *) const override diff --git a/src/AggregateFunctions/AggregateFunctionSequenceMatch.h b/src/AggregateFunctions/AggregateFunctionSequenceMatch.h index 88a6809d229..d05a4ca314d 100644 --- a/src/AggregateFunctions/AggregateFunctionSequenceMatch.h +++ b/src/AggregateFunctions/AggregateFunctionSequenceMatch.h @@ -179,6 +179,11 @@ public: this->data(place).deserialize(buf); } + bool haveSameStateRepresentation(const IAggregateFunction & rhs) const override + { + return this->getName() == rhs.getName() && this->haveEqualArgumentTypes(rhs); + } + private: enum class PatternActionType { diff --git a/src/AggregateFunctions/AggregateFunctionSequenceNextNode.cpp b/src/AggregateFunctions/AggregateFunctionSequenceNextNode.cpp index 965413afb1d..d86499b90f3 100644 --- a/src/AggregateFunctions/AggregateFunctionSequenceNextNode.cpp +++ b/src/AggregateFunctions/AggregateFunctionSequenceNextNode.cpp @@ -31,10 +31,10 @@ namespace template inline AggregateFunctionPtr createAggregateFunctionSequenceNodeImpl( - const DataTypePtr data_type, const DataTypes & argument_types, SequenceDirection direction, SequenceBase base) + const DataTypePtr data_type, const DataTypes & argument_types, const Array & parameters, SequenceDirection direction, SequenceBase base) { return std::make_shared>>( - data_type, argument_types, base, direction, min_required_args); + data_type, argument_types, parameters, base, direction, min_required_args); } AggregateFunctionPtr @@ -116,17 +116,17 @@ createAggregateFunctionSequenceNode(const std::string & name, const DataTypes & WhichDataType timestamp_type(argument_types[0].get()); if (timestamp_type.idx == TypeIndex::UInt8) - return createAggregateFunctionSequenceNodeImpl(data_type, argument_types, direction, base); + return createAggregateFunctionSequenceNodeImpl(data_type, argument_types, parameters, direction, base); if (timestamp_type.idx == TypeIndex::UInt16) - return createAggregateFunctionSequenceNodeImpl(data_type, argument_types, direction, base); + return createAggregateFunctionSequenceNodeImpl(data_type, argument_types, parameters, direction, base); if (timestamp_type.idx == TypeIndex::UInt32) - return createAggregateFunctionSequenceNodeImpl(data_type, argument_types, direction, base); + return createAggregateFunctionSequenceNodeImpl(data_type, argument_types, parameters, direction, base); if (timestamp_type.idx == TypeIndex::UInt64) - return createAggregateFunctionSequenceNodeImpl(data_type, argument_types, direction, base); + return createAggregateFunctionSequenceNodeImpl(data_type, argument_types, parameters, direction, base); if (timestamp_type.isDate()) - return createAggregateFunctionSequenceNodeImpl(data_type, argument_types, direction, base); + return createAggregateFunctionSequenceNodeImpl(data_type, argument_types, parameters, direction, base); if (timestamp_type.isDateTime()) - return createAggregateFunctionSequenceNodeImpl(data_type, argument_types, direction, base); + return createAggregateFunctionSequenceNodeImpl(data_type, argument_types, parameters, direction, base); throw Exception{"Illegal type " 
+ argument_types.front().get()->getName() + " of first argument of aggregate function " + name + ", must be Unsigned Number, Date, DateTime", diff --git a/src/AggregateFunctions/AggregateFunctionSequenceNextNode.h b/src/AggregateFunctions/AggregateFunctionSequenceNextNode.h index 116e53e95e8..e5b007232e2 100644 --- a/src/AggregateFunctions/AggregateFunctionSequenceNextNode.h +++ b/src/AggregateFunctions/AggregateFunctionSequenceNextNode.h @@ -175,11 +175,12 @@ public: SequenceNextNodeImpl( const DataTypePtr & data_type_, const DataTypes & arguments, + const Array & parameters_, SequenceBase seq_base_kind_, SequenceDirection seq_direction_, size_t min_required_args_, UInt64 max_elems_ = std::numeric_limits::max()) - : IAggregateFunctionDataHelper, Self>({data_type_}, {}) + : IAggregateFunctionDataHelper, Self>({data_type_}, parameters_) , seq_base_kind(seq_base_kind_) , seq_direction(seq_direction_) , min_required_args(min_required_args_) @@ -193,6 +194,11 @@ public: DataTypePtr getReturnType() const override { return data_type; } + bool haveSameStateRepresentation(const IAggregateFunction & rhs) const override + { + return this->getName() == rhs.getName() && this->haveEqualArgumentTypes(rhs); + } + AggregateFunctionPtr getOwnNullAdapter( const AggregateFunctionPtr & nested_function, const DataTypes & arguments, const Array & params, const AggregateFunctionProperties &) const override diff --git a/src/AggregateFunctions/AggregateFunctionSumMap.h b/src/AggregateFunctions/AggregateFunctionSumMap.h index 03327f76e48..b103f42fcc5 100644 --- a/src/AggregateFunctions/AggregateFunctionSumMap.h +++ b/src/AggregateFunctions/AggregateFunctionSumMap.h @@ -459,6 +459,8 @@ public: explicit FieldVisitorMax(const Field & rhs_) : rhs(rhs_) {} bool operator() (Null &) const { throw Exception("Cannot compare Nulls", ErrorCodes::LOGICAL_ERROR); } + bool operator() (NegativeInfinity &) const { throw Exception("Cannot compare -Inf", ErrorCodes::LOGICAL_ERROR); } + bool operator() (PositiveInfinity &) const { throw Exception("Cannot compare +Inf", ErrorCodes::LOGICAL_ERROR); } bool operator() (AggregateFunctionStateData &) const { throw Exception("Cannot compare AggregateFunctionStates", ErrorCodes::LOGICAL_ERROR); } bool operator() (Array & x) const { return compareImpl(x); } @@ -494,6 +496,8 @@ public: explicit FieldVisitorMin(const Field & rhs_) : rhs(rhs_) {} bool operator() (Null &) const { throw Exception("Cannot compare Nulls", ErrorCodes::LOGICAL_ERROR); } + bool operator() (NegativeInfinity &) const { throw Exception("Cannot compare -Inf", ErrorCodes::LOGICAL_ERROR); } + bool operator() (PositiveInfinity &) const { throw Exception("Cannot compare +Inf", ErrorCodes::LOGICAL_ERROR); } bool operator() (AggregateFunctionStateData &) const { throw Exception("Cannot sum AggregateFunctionStates", ErrorCodes::LOGICAL_ERROR); } bool operator() (Array & x) const { return compareImpl(x); } diff --git a/src/AggregateFunctions/IAggregateFunction.cpp b/src/AggregateFunctions/IAggregateFunction.cpp index 55998d963bf..ea4f8338fb8 100644 --- a/src/AggregateFunctions/IAggregateFunction.cpp +++ b/src/AggregateFunctions/IAggregateFunction.cpp @@ -50,4 +50,21 @@ String IAggregateFunction::getDescription() const return description; } + +bool IAggregateFunction::haveEqualArgumentTypes(const IAggregateFunction & rhs) const +{ + return std::equal(argument_types.begin(), argument_types.end(), + rhs.argument_types.begin(), rhs.argument_types.end(), + [](const auto & t1, const auto & t2) { return t1->equals(*t2); }); +} + +bool 
IAggregateFunction::haveSameStateRepresentation(const IAggregateFunction & rhs) const +{ + bool res = getName() == rhs.getName() + && parameters == rhs.parameters + && haveEqualArgumentTypes(rhs); + assert(res == (getStateType()->getName() == rhs.getStateType()->getName())); + return res; +} + } diff --git a/src/AggregateFunctions/IAggregateFunction.h b/src/AggregateFunctions/IAggregateFunction.h index 7acfa82a139..a06f1d12c0d 100644 --- a/src/AggregateFunctions/IAggregateFunction.h +++ b/src/AggregateFunctions/IAggregateFunction.h @@ -74,6 +74,16 @@ public: /// Get the data type of internal state. By default it is AggregateFunction(name(params), argument_types...). virtual DataTypePtr getStateType() const; + /// Returns true if two aggregate functions have the same state representation in memory and the same serialization, + /// so state of one aggregate function can be safely used with another. + /// Examples: + /// - quantile(x), quantile(a)(x), quantile(b)(x) - parameter doesn't affect state and used for finalization only + /// - foo(x) and fooIf(x) - If combinator doesn't affect state + /// By default returns true only if functions have exactly the same names, combinators and parameters. + virtual bool haveSameStateRepresentation(const IAggregateFunction & rhs) const; + + bool haveEqualArgumentTypes(const IAggregateFunction & rhs) const; + /// Get type which will be used for prediction result in case if function is an ML method. virtual DataTypePtr getReturnTypeToPredict() const { diff --git a/src/Bridge/IBridgeHelper.cpp b/src/Bridge/IBridgeHelper.cpp index b6f3446d0a6..5c884a2ca3d 100644 --- a/src/Bridge/IBridgeHelper.cpp +++ b/src/Bridge/IBridgeHelper.cpp @@ -33,24 +33,9 @@ Poco::URI IBridgeHelper::getPingURI() const } -bool IBridgeHelper::checkBridgeIsRunning() const +void IBridgeHelper::startBridgeSync() { - try - { - ReadWriteBufferFromHTTP buf( - getPingURI(), Poco::Net::HTTPRequest::HTTP_GET, {}, ConnectionTimeouts::getHTTPTimeouts(getContext())); - return checkString(PING_OK_ANSWER, buf); - } - catch (...) - { - return false; - } -} - - -void IBridgeHelper::startBridgeSync() const -{ - if (!checkBridgeIsRunning()) + if (!bridgeHandShake()) { LOG_TRACE(getLog(), "{} is not running, will try to start it", serviceAlias()); startBridge(startBridgeCommand()); @@ -64,7 +49,7 @@ void IBridgeHelper::startBridgeSync() const ++counter; LOG_TRACE(getLog(), "Checking {} is running, try {}", serviceAlias(), counter); - if (checkBridgeIsRunning()) + if (bridgeHandShake()) { started = true; break; @@ -81,7 +66,7 @@ void IBridgeHelper::startBridgeSync() const } -std::unique_ptr IBridgeHelper::startBridgeCommand() const +std::unique_ptr IBridgeHelper::startBridgeCommand() { if (startBridgeManually()) throw Exception(serviceAlias() + " is not running. Please, start it manually", ErrorCodes::EXTERNAL_SERVER_IS_NOT_RESPONDING); diff --git a/src/Bridge/IBridgeHelper.h b/src/Bridge/IBridgeHelper.h index caaf031b7d8..0537658663d 100644 --- a/src/Bridge/IBridgeHelper.h +++ b/src/Bridge/IBridgeHelper.h @@ -28,16 +28,19 @@ public: static const inline std::string MAIN_METHOD = Poco::Net::HTTPRequest::HTTP_POST; explicit IBridgeHelper(ContextPtr context_) : WithContext(context_) {} - virtual ~IBridgeHelper() = default; - void startBridgeSync() const; + virtual ~IBridgeHelper() = default; Poco::URI getMainURI() const; Poco::URI getPingURI() const; + void startBridgeSync(); protected: + /// Check bridge is running. Can also check something else in the mean time. 
+ virtual bool bridgeHandShake() = 0; + /// clickhouse-odbc-bridge, clickhouse-library-bridge virtual String serviceAlias() const = 0; @@ -61,9 +64,7 @@ protected: private: - bool checkBridgeIsRunning() const; - - std::unique_ptr startBridgeCommand() const; + std::unique_ptr startBridgeCommand(); }; } diff --git a/src/Bridge/LibraryBridgeHelper.cpp b/src/Bridge/LibraryBridgeHelper.cpp index 39ec80534b3..95a96074be0 100644 --- a/src/Bridge/LibraryBridgeHelper.cpp +++ b/src/Bridge/LibraryBridgeHelper.cpp @@ -1,12 +1,14 @@ #include "LibraryBridgeHelper.h" -#include #include #include +#include #include #include #include #include +#include +#include #include #include #include @@ -19,16 +21,25 @@ namespace DB { +namespace ErrorCodes +{ + extern const int EXTERNAL_LIBRARY_ERROR; + extern const int LOGICAL_ERROR; +} + LibraryBridgeHelper::LibraryBridgeHelper( ContextPtr context_, const Block & sample_block_, - const Field & dictionary_id_) + const Field & dictionary_id_, + const LibraryInitData & library_data_) : IBridgeHelper(context_->getGlobalContext()) , log(&Poco::Logger::get("LibraryBridgeHelper")) , sample_block(sample_block_) , config(context_->getConfigRef()) , http_timeout(context_->getGlobalContext()->getSettingsRef().http_receive_timeout.value) + , library_data(library_data_) , dictionary_id(dictionary_id_) + , http_timeouts(ConnectionTimeouts::getHTTPTimeouts(context_)) { bridge_port = config.getUInt("library_bridge.port", DEFAULT_PORT); bridge_host = config.getString("library_bridge.host", DEFAULT_HOST); @@ -60,26 +71,91 @@ void LibraryBridgeHelper::startBridge(std::unique_ptr cmd) const } -bool LibraryBridgeHelper::initLibrary(const std::string & library_path, const std::string library_settings, const std::string attributes_names) +bool LibraryBridgeHelper::bridgeHandShake() { - startBridgeSync(); - auto uri = createRequestURI(LIB_NEW_METHOD); + String result; + try + { + ReadWriteBufferFromHTTP buf(createRequestURI(PING), Poco::Net::HTTPRequest::HTTP_GET, {}, http_timeouts); + readString(result, buf); + } + catch (...) + { + return false; + } + /* + * When pinging the bridge we also pass the current dictionary_id. The bridge will check if there is such + * a dictionary. Such a dictionary_id can be absent only in two cases: + * 1. The dictionary source is being created and the library handler has not been initialized on the bridge side yet. + * 2. The bridge crashed or restarted for some reason while the server did not. + **/ + if (result.size() != 1) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected message from library bridge: {}. Check bridge and server have the same version.", result); + + UInt8 dictionary_id_exists; + auto parsed = tryParse(dictionary_id_exists, result); + if (!parsed || (dictionary_id_exists != 0 && dictionary_id_exists != 1)) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected message from library bridge: {} ({}). Check bridge and server have the same version.", + result, parsed ?
toString(dictionary_id_exists) : "failed to parse"); + + LOG_TRACE(log, "dictionary_id: {}, dictionary_id_exists on bridge side: {}, library confirmed to be initialized on server side: {}", + toString(dictionary_id), toString(dictionary_id_exists), library_initialized); + + if (dictionary_id_exists && !library_initialized) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Library was not initialized, but the bridge reported that it already has dictionary id: {}", dictionary_id); + + /// Here we ask the bridge to recreate the library handler for the current dictionary, + /// because it reported having lost it while we know it has already been created (a direct result of a bridge crash). + if (!dictionary_id_exists && library_initialized) + { + LOG_WARNING(log, "Library bridge does not have a library handler with dictionary id: {}. It will be reinitialized.", dictionary_id); + bool reinitialized = false; + try + { + auto uri = createRequestURI(LIB_NEW_METHOD); + reinitialized = executeRequest(uri, getInitLibraryCallback()); + } + catch (...) + { + tryLogCurrentException(log); + return false; + } + + if (!reinitialized) + throw Exception(ErrorCodes::EXTERNAL_LIBRARY_ERROR, + "Failed to reinitialize library handler on bridge side for dictionary with id: {}", dictionary_id); + } + + return true; +} + + +ReadWriteBufferFromHTTP::OutStreamCallback LibraryBridgeHelper::getInitLibraryCallback() const +{ /// Sample block must contain null values WriteBufferFromOwnString out; auto output_stream = getContext()->getOutputStream(LibraryBridgeHelper::DEFAULT_FORMAT, out, sample_block); formatBlock(output_stream, sample_block); auto block_string = out.str(); - auto out_stream_callback = [library_path, library_settings, attributes_names, block_string, this](std::ostream & os) + return [block_string, this](std::ostream & os) { - os << "library_path=" << escapeForFileName(library_path) << "&"; - os << "library_settings=" << escapeForFileName(library_settings) << "&"; - os << "attributes_names=" << escapeForFileName(attributes_names) << "&"; + os << "library_path=" << escapeForFileName(library_data.library_path) << "&"; + os << "library_settings=" << escapeForFileName(library_data.library_settings) << "&"; + os << "attributes_names=" << escapeForFileName(library_data.dict_attributes) << "&"; + os << "sample_block=" << escapeForFileName(sample_block.getNamesAndTypesList().toString()) << "&"; + os << "null_values=" << escapeForFileName(block_string); }; - return executeRequest(uri, out_stream_callback); +} + + +bool LibraryBridgeHelper::initLibrary() +{ + startBridgeSync(); + auto uri = createRequestURI(LIB_NEW_METHOD); + library_initialized = executeRequest(uri, getInitLibraryCallback()); + return library_initialized; +} @@ -88,15 +164,23 @@ bool LibraryBridgeHelper::cloneLibrary(const Field & other_dictionary_id) startBridgeSync(); auto uri = createRequestURI(LIB_CLONE_METHOD); uri.addQueryParameter("from_dictionary_id", toString(other_dictionary_id)); - return executeRequest(uri); + /// We also pass initialization settings in order to create a library handler + /// in case from_dictionary_id does not exist on the bridge side (possible after a bridge crash).
+ library_initialized = executeRequest(uri, getInitLibraryCallback()); + return library_initialized; } bool LibraryBridgeHelper::removeLibrary() { - startBridgeSync(); - auto uri = createRequestURI(LIB_DELETE_METHOD); - return executeRequest(uri); + /// Do not force a bridge restart if it is not running in case of removeLibrary, + /// because after a restart it will not have this dictionary id in memory anyway. + if (bridgeHandShake()) + { + auto uri = createRequestURI(LIB_DELETE_METHOD); + return executeRequest(uri); + } + return true; } @@ -124,11 +208,13 @@ BlockInputStreamPtr LibraryBridgeHelper::loadAll() } -BlockInputStreamPtr LibraryBridgeHelper::loadIds(const std::string ids_string) +BlockInputStreamPtr LibraryBridgeHelper::loadIds(const std::vector & ids) { startBridgeSync(); auto uri = createRequestURI(LOAD_IDS_METHOD); - return loadBase(uri, [ids_string](std::ostream & os) { os << "ids=" << ids_string; }); + uri.addQueryParameter("ids_num", toString(ids.size())); /// Unused parameter, but helpful + auto ids_string = getDictIdsString(ids); + return loadBase(uri, [ids_string](std::ostream & os) { os << ids_string; }); } @@ -148,13 +234,13 @@ BlockInputStreamPtr LibraryBridgeHelper::loadKeys(const Block & requested_block) } -bool LibraryBridgeHelper::executeRequest(const Poco::URI & uri, ReadWriteBufferFromHTTP::OutStreamCallback out_stream_callback) +bool LibraryBridgeHelper::executeRequest(const Poco::URI & uri, ReadWriteBufferFromHTTP::OutStreamCallback out_stream_callback) const { ReadWriteBufferFromHTTP buf( uri, Poco::Net::HTTPRequest::HTTP_POST, std::move(out_stream_callback), - ConnectionTimeouts::getHTTPTimeouts(getContext())); + http_timeouts); bool res; readBoolText(res, buf); @@ -168,7 +254,7 @@ BlockInputStreamPtr LibraryBridgeHelper::loadBase(const Poco::URI & uri, ReadWri uri, Poco::Net::HTTPRequest::HTTP_POST, std::move(out_stream_callback), - ConnectionTimeouts::getHTTPTimeouts(getContext()), + http_timeouts, 0, Poco::Net::HTTPBasicCredentials{}, DBMS_DEFAULT_BUFFER_SIZE, @@ -178,4 +264,13 @@ BlockInputStreamPtr LibraryBridgeHelper::loadBase(const Poco::URI & uri, ReadWri return std::make_shared>(input_stream, std::move(read_buf_ptr)); } + +String LibraryBridgeHelper::getDictIdsString(const std::vector & ids) +{ + WriteBufferFromOwnString out; + writeVectorBinary(ids, out); + return out.str(); +} + + } diff --git a/src/Bridge/LibraryBridgeHelper.h b/src/Bridge/LibraryBridgeHelper.h index 12fe0c33363..fec644240da 100644 --- a/src/Bridge/LibraryBridgeHelper.h +++ b/src/Bridge/LibraryBridgeHelper.h @@ -15,11 +15,18 @@ class LibraryBridgeHelper : public IBridgeHelper { public: + struct LibraryInitData + { + String library_path; + String library_settings; + String dict_attributes; + }; + static constexpr inline size_t DEFAULT_PORT = 9012; - LibraryBridgeHelper(ContextPtr context_, const Block & sample_block, const Field & dictionary_id_); + LibraryBridgeHelper(ContextPtr context_, const Block & sample_block, const Field & dictionary_id_, const LibraryInitData & library_data_); - bool initLibrary(const std::string & library_path, std::string library_settings, std::string attributes_names); + bool initLibrary(); bool cloneLibrary(const Field & other_dictionary_id); @@ -31,16 +38,19 @@ public: BlockInputStreamPtr loadAll(); - BlockInputStreamPtr loadIds(std::string ids_string); + BlockInputStreamPtr loadIds(const std::vector & ids); BlockInputStreamPtr loadKeys(const Block & requested_block); BlockInputStreamPtr loadBase(const Poco::URI & uri, 
ReadWriteBufferFromHTTP::OutStreamCallback out_stream_callback = {}); - bool executeRequest(const Poco::URI & uri, ReadWriteBufferFromHTTP::OutStreamCallback out_stream_callback = {}); + bool executeRequest(const Poco::URI & uri, ReadWriteBufferFromHTTP::OutStreamCallback out_stream_callback = {}) const; + LibraryInitData getLibraryData() const { return library_data; } protected: + bool bridgeHandShake() override; + void startBridge(std::unique_ptr cmd) const override; String serviceAlias() const override { return "clickhouse-library-bridge"; } @@ -61,6 +71,8 @@ protected: Poco::URI createBaseURI() const override; + ReadWriteBufferFromHTTP::OutStreamCallback getInitLibraryCallback() const; + private: static constexpr inline auto LIB_NEW_METHOD = "libNew"; static constexpr inline auto LIB_CLONE_METHOD = "libClone"; @@ -69,18 +81,24 @@ private: static constexpr inline auto LOAD_IDS_METHOD = "loadIds"; static constexpr inline auto LOAD_KEYS_METHOD = "loadKeys"; static constexpr inline auto IS_MODIFIED_METHOD = "isModified"; + static constexpr inline auto PING = "ping"; static constexpr inline auto SUPPORTS_SELECTIVE_LOAD_METHOD = "supportsSelectiveLoad"; Poco::URI createRequestURI(const String & method) const; + static String getDictIdsString(const std::vector & ids); + Poco::Logger * log; const Block sample_block; const Poco::Util::AbstractConfiguration & config; const Poco::Timespan http_timeout; + LibraryInitData library_data; Field dictionary_id; std::string bridge_host; size_t bridge_port; + bool library_initialized = false; + ConnectionTimeouts http_timeouts; }; } diff --git a/src/Bridge/XDBCBridgeHelper.h b/src/Bridge/XDBCBridgeHelper.h index 8be4c194962..b3613c381d0 100644 --- a/src/Bridge/XDBCBridgeHelper.h +++ b/src/Bridge/XDBCBridgeHelper.h @@ -60,20 +60,33 @@ public: static constexpr inline auto SCHEMA_ALLOWED_HANDLER = "/schema_allowed"; XDBCBridgeHelper( - ContextPtr context_, - Poco::Timespan http_timeout_, - const std::string & connection_string_) - : IXDBCBridgeHelper(context_->getGlobalContext()) - , log(&Poco::Logger::get(BridgeHelperMixin::getName() + "BridgeHelper")) - , connection_string(connection_string_) - , http_timeout(http_timeout_) - , config(context_->getGlobalContext()->getConfigRef()) -{ - bridge_host = config.getString(BridgeHelperMixin::configPrefix() + ".host", DEFAULT_HOST); - bridge_port = config.getUInt(BridgeHelperMixin::configPrefix() + ".port", DEFAULT_PORT); -} + ContextPtr context_, + Poco::Timespan http_timeout_, + const std::string & connection_string_) + : IXDBCBridgeHelper(context_->getGlobalContext()) + , log(&Poco::Logger::get(BridgeHelperMixin::getName() + "BridgeHelper")) + , connection_string(connection_string_) + , http_timeout(http_timeout_) + , config(context_->getGlobalContext()->getConfigRef()) + { + bridge_host = config.getString(BridgeHelperMixin::configPrefix() + ".host", DEFAULT_HOST); + bridge_port = config.getUInt(BridgeHelperMixin::configPrefix() + ".port", DEFAULT_PORT); + } protected: + bool bridgeHandShake() override + { + try + { + ReadWriteBufferFromHTTP buf(getPingURI(), Poco::Net::HTTPRequest::HTTP_GET, {}, ConnectionTimeouts::getHTTPTimeouts(getContext())); + return checkString(PING_OK_ANSWER, buf); + } + catch (...) 
+ { + return false; + } + } + auto getConnectionString() const { return connection_string; } String getName() const override { return BridgeHelperMixin::getName(); } diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt index 31286c740d4..a99201e4aaa 100644 --- a/src/CMakeLists.txt +++ b/src/CMakeLists.txt @@ -473,6 +473,12 @@ endif () dbms_target_link_libraries(PRIVATE _boost_context) +if (USE_NLP) + dbms_target_link_libraries (PUBLIC stemmer) + dbms_target_link_libraries (PUBLIC wnb) + dbms_target_link_libraries (PUBLIC lemmagen) +endif() + include ("${ClickHouse_SOURCE_DIR}/cmake/add_check.cmake") if (ENABLE_TESTS AND USE_GTEST) diff --git a/src/Client/Connection.cpp b/src/Client/Connection.cpp index 87f768d7e75..366e61bc8e2 100644 --- a/src/Client/Connection.cpp +++ b/src/Client/Connection.cpp @@ -580,6 +580,12 @@ void Connection::sendPreparedData(ReadBuffer & input, size_t size, const String void Connection::sendScalarsData(Scalars & data) { + /// Avoid sending scalars to old servers. Note that this isn't a full fix. We didn't introduce a + /// dedicated revision after introducing scalars, so this will still break some versions with + /// revision 54428. + if (server_revision < DBMS_MIN_REVISION_WITH_SCALARS) + return; + if (data.empty()) return; diff --git a/src/Client/ConnectionPool.cpp b/src/Client/ConnectionPool.cpp new file mode 100644 index 00000000000..c5f398c899f --- /dev/null +++ b/src/Client/ConnectionPool.cpp @@ -0,0 +1,86 @@ +#include + +#include + +namespace DB +{ + +ConnectionPoolPtr ConnectionPoolFactory::get( + unsigned max_connections, + String host, + UInt16 port, + String default_database, + String user, + String password, + String cluster, + String cluster_secret, + String client_name, + Protocol::Compression compression, + Protocol::Secure secure, + Int64 priority) +{ + Key key{ + max_connections, host, port, default_database, user, password, cluster, cluster_secret, client_name, compression, secure, priority}; + + std::unique_lock lock(mutex); + auto [it, inserted] = pools.emplace(key, ConnectionPoolPtr{}); + if (!inserted) + if (auto res = it->second.lock()) + return res; + + ConnectionPoolPtr ret + { + new ConnectionPool( + max_connections, + host, + port, + default_database, + user, + password, + cluster, + cluster_secret, + client_name, + compression, + secure, + priority), + [key, this](auto ptr) + { + { + std::lock_guard another_lock(mutex); + pools.erase(key); + } + delete ptr; + } + }; + it->second = ConnectionPoolWeakPtr(ret); + return ret; +} + +size_t ConnectionPoolFactory::KeyHash::operator()(const ConnectionPoolFactory::Key & k) const +{ + using boost::hash_combine; + using boost::hash_value; + size_t seed = 0; + hash_combine(seed, hash_value(k.max_connections)); + hash_combine(seed, hash_value(k.host)); + hash_combine(seed, hash_value(k.port)); + hash_combine(seed, hash_value(k.default_database)); + hash_combine(seed, hash_value(k.user)); + hash_combine(seed, hash_value(k.password)); + hash_combine(seed, hash_value(k.cluster)); + hash_combine(seed, hash_value(k.cluster_secret)); + hash_combine(seed, hash_value(k.client_name)); + hash_combine(seed, hash_value(k.compression)); + hash_combine(seed, hash_value(k.secure)); + hash_combine(seed, hash_value(k.priority)); + return seed; +} + + +ConnectionPoolFactory & ConnectionPoolFactory::instance() +{ + static ConnectionPoolFactory ret; + return ret; +} + +} diff --git a/src/Client/ConnectionPool.h b/src/Client/ConnectionPool.h index bf73e9756d2..8f7bb2116d4 100644 --- a/src/Client/ConnectionPool.h +++ 
b/src/Client/ConnectionPool.h @@ -135,4 +135,60 @@ private: }; +/** + * Connection pool factory. Responsible for creating new connection pools and reuse existing ones. + */ +class ConnectionPoolFactory final : private boost::noncopyable +{ +public: + struct Key + { + unsigned max_connections; + String host; + UInt16 port; + String default_database; + String user; + String password; + String cluster; + String cluster_secret; + String client_name; + Protocol::Compression compression; + Protocol::Secure secure; + Int64 priority; + }; + + struct KeyHash + { + size_t operator()(const ConnectionPoolFactory::Key & k) const; + }; + + static ConnectionPoolFactory & instance(); + + ConnectionPoolPtr + get(unsigned max_connections, + String host, + UInt16 port, + String default_database, + String user, + String password, + String cluster, + String cluster_secret, + String client_name, + Protocol::Compression compression, + Protocol::Secure secure, + Int64 priority); +private: + mutable std::mutex mutex; + using ConnectionPoolWeakPtr = std::weak_ptr; + std::unordered_map pools; +}; + +inline bool operator==(const ConnectionPoolFactory::Key & lhs, const ConnectionPoolFactory::Key & rhs) +{ + return lhs.max_connections == rhs.max_connections && lhs.host == rhs.host && lhs.port == rhs.port + && lhs.default_database == rhs.default_database && lhs.user == rhs.user && lhs.password == rhs.password + && lhs.cluster == rhs.cluster && lhs.cluster_secret == rhs.cluster_secret && lhs.client_name == rhs.client_name + && lhs.compression == rhs.compression && lhs.secure == rhs.secure && lhs.priority == rhs.priority; +} + } diff --git a/src/Client/HedgedConnections.cpp b/src/Client/HedgedConnections.cpp index 8455ef3117e..b833241b2bc 100644 --- a/src/Client/HedgedConnections.cpp +++ b/src/Client/HedgedConnections.cpp @@ -3,6 +3,7 @@ #include #include #include +#include namespace ProfileEvents { @@ -21,13 +22,16 @@ namespace ErrorCodes HedgedConnections::HedgedConnections( const ConnectionPoolWithFailoverPtr & pool_, - const Settings & settings_, + ContextPtr context_, const ConnectionTimeouts & timeouts_, const ThrottlerPtr & throttler_, PoolMode pool_mode, std::shared_ptr table_to_check_) - : hedged_connections_factory(pool_, &settings_, timeouts_, table_to_check_) - , settings(settings_) + : hedged_connections_factory(pool_, &context_->getSettingsRef(), timeouts_, table_to_check_) + , context(std::move(context_)) + , settings(context->getSettingsRef()) + , drain_timeout(settings.drain_timeout) + , allow_changing_replica_until_first_data_packet(settings.allow_changing_replica_until_first_data_packet) , throttler(throttler_) { std::vector connections = hedged_connections_factory.getManyConnections(pool_mode); @@ -251,7 +255,7 @@ Packet HedgedConnections::drain() while (!epoll.empty()) { - ReplicaLocation location = getReadyReplicaLocation(); + ReplicaLocation location = getReadyReplicaLocation(DrainCallback{drain_timeout}); Packet packet = receivePacketFromReplica(location); switch (packet.type) { @@ -278,10 +282,10 @@ Packet HedgedConnections::drain() Packet HedgedConnections::receivePacket() { std::lock_guard lock(cancel_mutex); - return receivePacketUnlocked({}); + return receivePacketUnlocked({}, false /* is_draining */); } -Packet HedgedConnections::receivePacketUnlocked(AsyncCallback async_callback) +Packet HedgedConnections::receivePacketUnlocked(AsyncCallback async_callback, bool /* is_draining */) { if (!sent_query) throw Exception("Cannot receive packets: no query sent.", ErrorCodes::LOGICAL_ERROR); @@ 
-353,6 +357,11 @@ bool HedgedConnections::resumePacketReceiver(const HedgedConnections::ReplicaLoc if (offset_states[location.offset].active_connection_count == 0 && !offset_states[location.offset].next_replica_in_process) throw NetException("Receive timeout expired", ErrorCodes::SOCKET_TIMEOUT); } + else if (std::holds_alternative(res)) + { + finishProcessReplica(replica_state, true); + std::rethrow_exception(std::move(std::get(res))); + } return false; } @@ -391,7 +400,7 @@ Packet HedgedConnections::receivePacketFromReplica(const ReplicaLocation & repli { /// If we are allowed to change replica until the first data packet, /// just restart timeout (if it hasn't expired yet). Otherwise disable changing replica with this offset. - if (settings.allow_changing_replica_until_first_data_packet && !replica.is_change_replica_timeout_expired) + if (allow_changing_replica_until_first_data_packet && !replica.is_change_replica_timeout_expired) replica.change_replica_timeout.setRelative(hedged_connections_factory.getConnectionTimeouts().receive_data_timeout); else disableChangingReplica(replica_location); @@ -472,6 +481,15 @@ void HedgedConnections::checkNewReplica() Connection * connection = nullptr; HedgedConnectionsFactory::State state = hedged_connections_factory.waitForReadyConnections(connection); + if (cancelled) + { + /// Do not start new connection if query is already canceled. + if (connection) + connection->disconnect(); + + state = HedgedConnectionsFactory::State::CANNOT_CHOOSE; + } + processNewReplicaState(state, connection); /// Check if we don't need to listen hedged_connections_factory file descriptor in epoll anymore. diff --git a/src/Client/HedgedConnections.h b/src/Client/HedgedConnections.h index 9f7d8837536..b0bff8e7c5d 100644 --- a/src/Client/HedgedConnections.h +++ b/src/Client/HedgedConnections.h @@ -72,7 +72,7 @@ public: }; HedgedConnections(const ConnectionPoolWithFailoverPtr & pool_, - const Settings & settings_, + ContextPtr context_, const ConnectionTimeouts & timeouts_, const ThrottlerPtr & throttler, PoolMode pool_mode, @@ -97,7 +97,7 @@ public: Packet receivePacket() override; - Packet receivePacketUnlocked(AsyncCallback async_callback) override; + Packet receivePacketUnlocked(AsyncCallback async_callback, bool is_draining) override; void disconnect() override; @@ -188,7 +188,14 @@ private: Packet last_received_packet; Epoll epoll; + ContextPtr context; const Settings & settings; + + /// The following two fields are from settings but can be referenced outside the lifetime of + /// settings when connection is drained asynchronously. + Poco::Timespan drain_timeout; + bool allow_changing_replica_until_first_data_packet; + ThrottlerPtr throttler; bool sent_query = false; bool cancelled = false; diff --git a/src/Client/IConnections.cpp b/src/Client/IConnections.cpp new file mode 100644 index 00000000000..dc57cae61a4 --- /dev/null +++ b/src/Client/IConnections.cpp @@ -0,0 +1,31 @@ +#include +#include + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int SOCKET_TIMEOUT; +} + +/// This wrapper struct allows us to use Poco's socket polling code with a raw fd. +/// The only difference from Poco::Net::SocketImpl is that we don't close the fd in the destructor. +struct PocoSocketWrapper : public Poco::Net::SocketImpl +{ + explicit PocoSocketWrapper(int fd) + { + reset(fd); + } + + // Do not close fd. 
+ ~PocoSocketWrapper() override { reset(-1); } +}; + +void IConnections::DrainCallback::operator()(int fd, Poco::Timespan, const std::string fd_description) const +{ + if (!PocoSocketWrapper(fd).poll(drain_timeout, Poco::Net::Socket::SELECT_READ)) + throw Exception(ErrorCodes::SOCKET_TIMEOUT, "Read timeout while draining from {}", fd_description); +} + +} diff --git a/src/Client/IConnections.h b/src/Client/IConnections.h index d251a5fb3ab..53267cbbb3e 100644 --- a/src/Client/IConnections.h +++ b/src/Client/IConnections.h @@ -10,6 +10,12 @@ namespace DB class IConnections : boost::noncopyable { public: + struct DrainCallback + { + Poco::Timespan drain_timeout; + void operator()(int fd, Poco::Timespan, const std::string fd_description = "") const; + }; + /// Send all scalars to replicas. virtual void sendScalarsData(Scalars & data) = 0; /// Send all content of external tables to replicas. @@ -30,7 +36,7 @@ public: virtual Packet receivePacket() = 0; /// Version of `receivePacket` function without locking. - virtual Packet receivePacketUnlocked(AsyncCallback async_callback) = 0; + virtual Packet receivePacketUnlocked(AsyncCallback async_callback, bool is_draining) = 0; /// Break all active connections. virtual void disconnect() = 0; diff --git a/src/Client/MultiplexedConnections.cpp b/src/Client/MultiplexedConnections.cpp index 350beffce28..fe3879fdd30 100644 --- a/src/Client/MultiplexedConnections.cpp +++ b/src/Client/MultiplexedConnections.cpp @@ -18,7 +18,7 @@ namespace ErrorCodes MultiplexedConnections::MultiplexedConnections(Connection & connection, const Settings & settings_, const ThrottlerPtr & throttler) - : settings(settings_) + : settings(settings_), drain_timeout(settings.drain_timeout), receive_timeout(settings.receive_timeout) { connection.setThrottler(throttler); @@ -29,10 +29,23 @@ MultiplexedConnections::MultiplexedConnections(Connection & connection, const Se active_connection_count = 1; } + +MultiplexedConnections::MultiplexedConnections(std::shared_ptr connection_ptr_, const Settings & settings_, const ThrottlerPtr & throttler) + : settings(settings_), drain_timeout(settings.drain_timeout), receive_timeout(settings.receive_timeout) + , connection_ptr(connection_ptr_) +{ + connection_ptr->setThrottler(throttler); + + ReplicaState replica_state; + replica_state.connection = connection_ptr.get(); + replica_states.push_back(replica_state); + + active_connection_count = 1; +} + MultiplexedConnections::MultiplexedConnections( - std::vector && connections, - const Settings & settings_, const ThrottlerPtr & throttler) - : settings(settings_) + std::vector && connections, const Settings & settings_, const ThrottlerPtr & throttler) + : settings(settings_), drain_timeout(settings.drain_timeout), receive_timeout(settings.receive_timeout) { /// If we didn't get any connections from pool and getMany() did not throw exceptions, this means that /// `skip_unavailable_shards` was set. Then just return. 
@@ -168,7 +181,7 @@ void MultiplexedConnections::sendReadTaskResponse(const String & response) Packet MultiplexedConnections::receivePacket() { std::lock_guard lock(cancel_mutex); - Packet packet = receivePacketUnlocked({}); + Packet packet = receivePacketUnlocked({}, false /* is_draining */); return packet; } @@ -216,7 +229,7 @@ Packet MultiplexedConnections::drain() while (hasActiveConnections()) { - Packet packet = receivePacketUnlocked({}); + Packet packet = receivePacketUnlocked(DrainCallback{drain_timeout}, true /* is_draining */); switch (packet.type) { @@ -264,14 +277,14 @@ std::string MultiplexedConnections::dumpAddressesUnlocked() const return buf.str(); } -Packet MultiplexedConnections::receivePacketUnlocked(AsyncCallback async_callback) +Packet MultiplexedConnections::receivePacketUnlocked(AsyncCallback async_callback, bool is_draining) { if (!sent_query) throw Exception("Cannot receive packets: no query sent.", ErrorCodes::LOGICAL_ERROR); if (!hasActiveConnections()) throw Exception("No more packets are available.", ErrorCodes::LOGICAL_ERROR); - ReplicaState & state = getReplicaForReading(); + ReplicaState & state = getReplicaForReading(is_draining); current_connection = state.connection; if (current_connection == nullptr) throw Exception("Logical error: no available replica", ErrorCodes::NO_AVAILABLE_REPLICA); @@ -323,9 +336,10 @@ Packet MultiplexedConnections::receivePacketUnlocked(AsyncCallback async_callbac return packet; } -MultiplexedConnections::ReplicaState & MultiplexedConnections::getReplicaForReading() +MultiplexedConnections::ReplicaState & MultiplexedConnections::getReplicaForReading(bool is_draining) { - if (replica_states.size() == 1) + /// Fast path when we only focus on one replica and are not draining the connection. + if (replica_states.size() == 1 && !is_draining) return replica_states[0]; Poco::Net::Socket::SocketList read_list; @@ -353,10 +367,26 @@ MultiplexedConnections::ReplicaState & MultiplexedConnections::getReplicaForRead read_list.push_back(*connection->socket); } - int n = Poco::Net::Socket::select(read_list, write_list, except_list, settings.receive_timeout); + int n = Poco::Net::Socket::select( + read_list, + write_list, + except_list, + is_draining ? drain_timeout : receive_timeout); if (n == 0) - throw Exception("Timeout exceeded while reading from " + dumpAddressesUnlocked(), ErrorCodes::TIMEOUT_EXCEEDED); + { + auto err_msg = fmt::format("Timeout exceeded while reading from {}", dumpAddressesUnlocked()); + for (ReplicaState & state : replica_states) + { + Connection * connection = state.connection; + if (connection != nullptr) + { + connection->disconnect(); + invalidateReplica(state); + } + } + throw Exception(err_msg, ErrorCodes::TIMEOUT_EXCEEDED); + } } /// TODO Absolutely wrong code: read_list could be empty; motivation of rand is unclear. diff --git a/src/Client/MultiplexedConnections.h b/src/Client/MultiplexedConnections.h index f642db1c4cd..4fb7d496b0c 100644 --- a/src/Client/MultiplexedConnections.h +++ b/src/Client/MultiplexedConnections.h @@ -22,6 +22,8 @@ class MultiplexedConnections final : public IConnections public: /// Accepts ready connection. MultiplexedConnections(Connection & connection, const Settings & settings_, const ThrottlerPtr & throttler_); + /// Accepts ready connection and keep it alive before drain + MultiplexedConnections(std::shared_ptr connection_, const Settings & settings_, const ThrottlerPtr & throttler_); /// Accepts a vector of connections to replicas of one shard already taken from pool. 
MultiplexedConnections( @@ -61,7 +63,7 @@ public: bool hasActiveConnections() const override { return active_connection_count > 0; } private: - Packet receivePacketUnlocked(AsyncCallback async_callback) override; + Packet receivePacketUnlocked(AsyncCallback async_callback, bool is_draining) override; /// Internal version of `dumpAddresses` function without locking. std::string dumpAddressesUnlocked() const; @@ -74,14 +76,18 @@ private: }; /// Get a replica where you can read the data. - ReplicaState & getReplicaForReading(); + ReplicaState & getReplicaForReading(bool is_draining); /// Mark the replica as invalid. void invalidateReplica(ReplicaState & replica_state); -private: const Settings & settings; + /// The following two fields are from settings but can be referenced outside the lifetime of + /// settings when connection is drained asynchronously. + Poco::Timespan drain_timeout; + Poco::Timespan receive_timeout; + /// The current number of valid connections to the replicas of this shard. size_t active_connection_count = 0; @@ -90,6 +96,8 @@ private: /// Connection that received last block. Connection * current_connection = nullptr; + /// Shared connection, may be empty. Used to keep object alive before draining. + std::shared_ptr connection_ptr; bool sent_query = false; bool cancelled = false; diff --git a/src/Client/PacketReceiver.h b/src/Client/PacketReceiver.h index 516491db994..ca0d62f0257 100644 --- a/src/Client/PacketReceiver.h +++ b/src/Client/PacketReceiver.h @@ -31,7 +31,7 @@ public: } /// Resume packet receiving. - std::variant resume() + std::variant resume() { /// If there is no pending data, check receive timeout. if (!connection->hasReadPendingData() && !checkReceiveTimeout()) @@ -43,7 +43,7 @@ public: /// Resume fiber. fiber = std::move(fiber).resume(); if (exception) - std::rethrow_exception(std::move(exception)); + return std::move(exception); if (is_read_in_process) return epoll.getFileDescriptor(); diff --git a/src/Client/ya.make b/src/Client/ya.make index 4201203a8e9..88fa14ad377 100644 --- a/src/Client/ya.make +++ b/src/Client/ya.make @@ -12,9 +12,11 @@ PEERDIR( SRCS( Connection.cpp ConnectionEstablisher.cpp + ConnectionPool.cpp ConnectionPoolWithFailover.cpp HedgedConnections.cpp HedgedConnectionsFactory.cpp + IConnections.cpp MultiplexedConnections.cpp ) diff --git a/src/Columns/ColumnNullable.cpp b/src/Columns/ColumnNullable.cpp index 62524315354..dec93fc7a30 100644 --- a/src/Columns/ColumnNullable.cpp +++ b/src/Columns/ColumnNullable.cpp @@ -546,97 +546,54 @@ namespace { /// The following function implements a slightly more general version -/// of getExtremes() than the implementation from ColumnVector. +/// of getExtremes() than the implementation from Not-Null IColumns. /// It takes into account the possible presence of nullable values. 
-template -void getExtremesFromNullableContent(const ColumnVector & col, const NullMap & null_map, Field & min, Field & max) +void getExtremesWithNulls(const IColumn & nested_column, const NullMap & null_array, Field & min, Field & max, bool null_last = false) { - const auto & data = col.getData(); - size_t size = data.size(); - - if (size == 0) + size_t number_of_nulls = 0; + size_t n = null_array.size(); + NullMap not_null_array(n); + for (auto i = 0ul; i < n; ++i) { - min = Null(); - max = Null(); - return; - } - - bool has_not_null = false; - bool has_not_nan = false; - - T cur_min = 0; - T cur_max = 0; - - for (size_t i = 0; i < size; ++i) - { - const T x = data[i]; - - if (null_map[i]) - continue; - - if (!has_not_null) + if (null_array[i]) { - cur_min = x; - cur_max = x; - has_not_null = true; - has_not_nan = !isNaN(x); - continue; + ++number_of_nulls; + not_null_array[i] = 0; } - - if (isNaN(x)) - continue; - - if (!has_not_nan) + else { - cur_min = x; - cur_max = x; - has_not_nan = true; - continue; + not_null_array[i] = 1; } - - if (x < cur_min) - cur_min = x; - else if (x > cur_max) - cur_max = x; } - - if (has_not_null) + if (number_of_nulls == 0) { - min = cur_min; - max = cur_max; + nested_column.getExtremes(min, max); + } + else if (number_of_nulls == n) + { + min = PositiveInfinity(); + max = PositiveInfinity(); + } + else + { + auto filtered_column = nested_column.filter(not_null_array, -1); + filtered_column->getExtremes(min, max); + if (null_last) + max = PositiveInfinity(); } } - } void ColumnNullable::getExtremes(Field & min, Field & max) const { - min = Null(); - max = Null(); + getExtremesWithNulls(getNestedColumn(), getNullMapData(), min, max); +} - const auto & null_map_data = getNullMapData(); - if (const auto * col_i8 = typeid_cast(nested_column.get())) - getExtremesFromNullableContent(*col_i8, null_map_data, min, max); - else if (const auto * col_i16 = typeid_cast(nested_column.get())) - getExtremesFromNullableContent(*col_i16, null_map_data, min, max); - else if (const auto * col_i32 = typeid_cast(nested_column.get())) - getExtremesFromNullableContent(*col_i32, null_map_data, min, max); - else if (const auto * col_i64 = typeid_cast(nested_column.get())) - getExtremesFromNullableContent(*col_i64, null_map_data, min, max); - else if (const auto * col_u8 = typeid_cast(nested_column.get())) - getExtremesFromNullableContent(*col_u8, null_map_data, min, max); - else if (const auto * col_u16 = typeid_cast(nested_column.get())) - getExtremesFromNullableContent(*col_u16, null_map_data, min, max); - else if (const auto * col_u32 = typeid_cast(nested_column.get())) - getExtremesFromNullableContent(*col_u32, null_map_data, min, max); - else if (const auto * col_u64 = typeid_cast(nested_column.get())) - getExtremesFromNullableContent(*col_u64, null_map_data, min, max); - else if (const auto * col_f32 = typeid_cast(nested_column.get())) - getExtremesFromNullableContent(*col_f32, null_map_data, min, max); - else if (const auto * col_f64 = typeid_cast(nested_column.get())) - getExtremesFromNullableContent(*col_f64, null_map_data, min, max); +void ColumnNullable::getExtremesNullLast(Field & min, Field & max) const +{ + getExtremesWithNulls(getNestedColumn(), getNullMapData(), min, max, true); } diff --git a/src/Columns/ColumnNullable.h b/src/Columns/ColumnNullable.h index 963b3e1e8fa..7b339893ff4 100644 --- a/src/Columns/ColumnNullable.h +++ b/src/Columns/ColumnNullable.h @@ -111,6 +111,8 @@ public: void updateWeakHash32(WeakHash32 & hash) const override; void 
updateHashFast(SipHash & hash) const override; void getExtremes(Field & min, Field & max) const override; + // Special function for nullable minmax index + void getExtremesNullLast(Field & min, Field & max) const; MutableColumns scatter(ColumnIndex num_columns, const Selector & selector) const override { diff --git a/src/Common/ColumnsHashingImpl.h b/src/Common/ColumnsHashingImpl.h index 9af746a69ad..aa7ae6ea29d 100644 --- a/src/Common/ColumnsHashingImpl.h +++ b/src/Common/ColumnsHashingImpl.h @@ -124,6 +124,10 @@ class FindResultImpl : public FindResultImplBase, public FindResultImplOffsetBas Mapped * value; public: + FindResultImpl() + : FindResultImplBase(false), FindResultImplOffsetBase(0) + {} + FindResultImpl(Mapped * value_, bool found_, size_t off) : FindResultImplBase(found_), FindResultImplOffsetBase(off), value(value_) {} Mapped & getMapped() const { return *value; } diff --git a/src/Common/CurrentMetrics.cpp b/src/Common/CurrentMetrics.cpp index e9fa13e11e6..f94c3421107 100644 --- a/src/Common/CurrentMetrics.cpp +++ b/src/Common/CurrentMetrics.cpp @@ -71,6 +71,10 @@ M(PartsInMemory, "In-memory parts.") \ M(MMappedFiles, "Total number of mmapped files.") \ M(MMappedFileBytes, "Sum size of mmapped file regions.") \ + M(AsyncDrainedConnections, "Number of connections drained asynchronously.") \ + M(ActiveAsyncDrainedConnections, "Number of active connections drained asynchronously.") \ + M(SyncDrainedConnections, "Number of connections drained synchronously.") \ + M(ActiveSyncDrainedConnections, "Number of active connections drained synchronously.") \ namespace CurrentMetrics { diff --git a/src/Common/DNSResolver.cpp b/src/Common/DNSResolver.cpp index 8b006bc550d..4fe0f0bb8c8 100644 --- a/src/Common/DNSResolver.cpp +++ b/src/Common/DNSResolver.cpp @@ -109,11 +109,23 @@ static DNSResolver::IPAddresses resolveIPAddressImpl(const std::string & host) /// It should not affect client address checking, since client cannot connect from IPv6 address /// if server has no IPv6 addresses. flags |= Poco::Net::DNS::DNS_HINT_AI_ADDRCONFIG; + + DNSResolver::IPAddresses addresses; + + try + { #if defined(ARCADIA_BUILD) - auto addresses = Poco::Net::DNS::hostByName(host, &Poco::Net::DNS::DEFAULT_DNS_TIMEOUT, flags).addresses(); + addresses = Poco::Net::DNS::hostByName(host, &Poco::Net::DNS::DEFAULT_DNS_TIMEOUT, flags).addresses(); #else - auto addresses = Poco::Net::DNS::hostByName(host, flags).addresses(); + addresses = Poco::Net::DNS::hostByName(host, flags).addresses(); #endif + } + catch (const Poco::Net::DNSException & e) + { + LOG_ERROR(&Poco::Logger::get("DNSResolver"), "Cannot resolve host ({}), error {}: {}.", host, e.code(), e.message()); + addresses.clear(); + } + if (addresses.empty()) throw Exception("Not found address of host: " + host, ErrorCodes::DNS_ERROR); diff --git a/src/Common/Exception.cpp b/src/Common/Exception.cpp index e98cd3c3046..641f8bbe0f0 100644 --- a/src/Common/Exception.cpp +++ b/src/Common/Exception.cpp @@ -313,7 +313,7 @@ std::string getCurrentExceptionMessage(bool with_stacktrace, bool check_embedded try { stream << "Poco::Exception. Code: " << ErrorCodes::POCO_EXCEPTION << ", e.code() = " << e.code() - << ", e.displayText() = " << e.displayText() + << ", " << e.displayText() << (with_stacktrace ? ", Stack trace (when copying this message, always include the lines below):\n\n" + getExceptionStackTraceString(e) : "") << (with_extra_info ? 
getExtraExceptionInfo(e) : "") << " (version " << VERSION_STRING << VERSION_OFFICIAL << ")"; @@ -433,7 +433,12 @@ std::string getExceptionMessage(const Exception & e, bool with_stacktrace, bool } } - stream << "Code: " << e.code() << ", e.displayText() = " << text; + stream << "Code: " << e.code() << ". " << text; + + if (!text.empty() && text.back() != '.') + stream << '.'; + + stream << " (" << ErrorCodes::getName(e.code()) << ")"; if (with_stacktrace && !has_embedded_stack_trace) stream << ", Stack trace (when copying this message, always include the lines below):\n\n" << e.getStackTraceString(); diff --git a/src/Common/FieldVisitorConvertToNumber.h b/src/Common/FieldVisitorConvertToNumber.h index 0f099c6215d..82a804691d7 100644 --- a/src/Common/FieldVisitorConvertToNumber.h +++ b/src/Common/FieldVisitorConvertToNumber.h @@ -26,6 +26,16 @@ public: throw Exception("Cannot convert NULL to " + demangle(typeid(T).name()), ErrorCodes::CANNOT_CONVERT_TYPE); } + T operator() (const NegativeInfinity &) const + { + throw Exception("Cannot convert -Inf to " + demangle(typeid(T).name()), ErrorCodes::CANNOT_CONVERT_TYPE); + } + + T operator() (const PositiveInfinity &) const + { + throw Exception("Cannot convert +Inf to " + demangle(typeid(T).name()), ErrorCodes::CANNOT_CONVERT_TYPE); + } + T operator() (const String &) const { throw Exception("Cannot convert String to " + demangle(typeid(T).name()), ErrorCodes::CANNOT_CONVERT_TYPE); diff --git a/src/Common/FieldVisitorDump.cpp b/src/Common/FieldVisitorDump.cpp index e6726a4502e..5e767cf30c1 100644 --- a/src/Common/FieldVisitorDump.cpp +++ b/src/Common/FieldVisitorDump.cpp @@ -25,6 +25,8 @@ static inline void writeQuoted(const DecimalField & x, WriteBuffer & buf) } String FieldVisitorDump::operator() (const Null &) const { return "NULL"; } +String FieldVisitorDump::operator() (const NegativeInfinity &) const { return "-Inf"; } +String FieldVisitorDump::operator() (const PositiveInfinity &) const { return "+Inf"; } String FieldVisitorDump::operator() (const UInt64 & x) const { return formatQuotedWithPrefix(x, "UInt64_"); } String FieldVisitorDump::operator() (const Int64 & x) const { return formatQuotedWithPrefix(x, "Int64_"); } String FieldVisitorDump::operator() (const Float64 & x) const { return formatQuotedWithPrefix(x, "Float64_"); } diff --git a/src/Common/FieldVisitorDump.h b/src/Common/FieldVisitorDump.h index 22e34d66ff7..bc82d35f0f1 100644 --- a/src/Common/FieldVisitorDump.h +++ b/src/Common/FieldVisitorDump.h @@ -10,6 +10,8 @@ class FieldVisitorDump : public StaticVisitor { public: String operator() (const Null & x) const; + String operator() (const NegativeInfinity & x) const; + String operator() (const PositiveInfinity & x) const; String operator() (const UInt64 & x) const; String operator() (const UInt128 & x) const; String operator() (const UInt256 & x) const; diff --git a/src/Common/FieldVisitorHash.cpp b/src/Common/FieldVisitorHash.cpp index 80d5f2daf65..259dd871d20 100644 --- a/src/Common/FieldVisitorHash.cpp +++ b/src/Common/FieldVisitorHash.cpp @@ -14,6 +14,18 @@ void FieldVisitorHash::operator() (const Null &) const hash.update(type); } +void FieldVisitorHash::operator() (const NegativeInfinity &) const +{ + UInt8 type = Field::Types::NegativeInfinity; + hash.update(type); +} + +void FieldVisitorHash::operator() (const PositiveInfinity &) const +{ + UInt8 type = Field::Types::PositiveInfinity; + hash.update(type); +} + void FieldVisitorHash::operator() (const UInt64 & x) const { UInt8 type = Field::Types::UInt64; diff --git 
a/src/Common/FieldVisitorHash.h b/src/Common/FieldVisitorHash.h index 6c786fda4ad..bf7c3d5004f 100644 --- a/src/Common/FieldVisitorHash.h +++ b/src/Common/FieldVisitorHash.h @@ -16,6 +16,8 @@ public: FieldVisitorHash(SipHash & hash_); void operator() (const Null & x) const; + void operator() (const NegativeInfinity & x) const; + void operator() (const PositiveInfinity & x) const; void operator() (const UInt64 & x) const; void operator() (const UInt128 & x) const; void operator() (const UInt256 & x) const; diff --git a/src/Common/FieldVisitorSum.cpp b/src/Common/FieldVisitorSum.cpp index 0064830c08a..e0ffca28341 100644 --- a/src/Common/FieldVisitorSum.cpp +++ b/src/Common/FieldVisitorSum.cpp @@ -22,6 +22,8 @@ bool FieldVisitorSum::operator() (UInt64 & x) const bool FieldVisitorSum::operator() (Float64 & x) const { x += get(rhs); return x != 0; } bool FieldVisitorSum::operator() (Null &) const { throw Exception("Cannot sum Nulls", ErrorCodes::LOGICAL_ERROR); } +bool FieldVisitorSum::operator() (NegativeInfinity &) const { throw Exception("Cannot sum -Inf", ErrorCodes::LOGICAL_ERROR); } +bool FieldVisitorSum::operator() (PositiveInfinity &) const { throw Exception("Cannot sum +Inf", ErrorCodes::LOGICAL_ERROR); } bool FieldVisitorSum::operator() (String &) const { throw Exception("Cannot sum Strings", ErrorCodes::LOGICAL_ERROR); } bool FieldVisitorSum::operator() (Array &) const { throw Exception("Cannot sum Arrays", ErrorCodes::LOGICAL_ERROR); } bool FieldVisitorSum::operator() (Tuple &) const { throw Exception("Cannot sum Tuples", ErrorCodes::LOGICAL_ERROR); } diff --git a/src/Common/FieldVisitorSum.h b/src/Common/FieldVisitorSum.h index e208933043b..4c34fa86455 100644 --- a/src/Common/FieldVisitorSum.h +++ b/src/Common/FieldVisitorSum.h @@ -21,6 +21,8 @@ public: bool operator() (UInt64 & x) const; bool operator() (Float64 & x) const; bool operator() (Null &) const; + bool operator() (NegativeInfinity & x) const; + bool operator() (PositiveInfinity & x) const; bool operator() (String &) const; bool operator() (Array &) const; bool operator() (Tuple &) const; diff --git a/src/Common/FieldVisitorToString.cpp b/src/Common/FieldVisitorToString.cpp index 45bc54f2c2a..74dfc55e1db 100644 --- a/src/Common/FieldVisitorToString.cpp +++ b/src/Common/FieldVisitorToString.cpp @@ -53,6 +53,8 @@ static String formatFloat(const Float64 x) String FieldVisitorToString::operator() (const Null &) const { return "NULL"; } +String FieldVisitorToString::operator() (const NegativeInfinity &) const { return "-Inf"; } +String FieldVisitorToString::operator() (const PositiveInfinity &) const { return "+Inf"; } String FieldVisitorToString::operator() (const UInt64 & x) const { return formatQuoted(x); } String FieldVisitorToString::operator() (const Int64 & x) const { return formatQuoted(x); } String FieldVisitorToString::operator() (const Float64 & x) const { return formatFloat(x); } diff --git a/src/Common/FieldVisitorToString.h b/src/Common/FieldVisitorToString.h index 39709f1c272..139f011927f 100644 --- a/src/Common/FieldVisitorToString.h +++ b/src/Common/FieldVisitorToString.h @@ -10,6 +10,8 @@ class FieldVisitorToString : public StaticVisitor { public: String operator() (const Null & x) const; + String operator() (const NegativeInfinity & x) const; + String operator() (const PositiveInfinity & x) const; String operator() (const UInt64 & x) const; String operator() (const UInt128 & x) const; String operator() (const UInt256 & x) const; diff --git a/src/Common/FieldVisitorWriteBinary.cpp 
b/src/Common/FieldVisitorWriteBinary.cpp index 8e991ad13d3..56df9f1e43a 100644 --- a/src/Common/FieldVisitorWriteBinary.cpp +++ b/src/Common/FieldVisitorWriteBinary.cpp @@ -7,6 +7,8 @@ namespace DB { void FieldVisitorWriteBinary::operator() (const Null &, WriteBuffer &) const { } +void FieldVisitorWriteBinary::operator() (const NegativeInfinity &, WriteBuffer &) const { } +void FieldVisitorWriteBinary::operator() (const PositiveInfinity &, WriteBuffer &) const { } void FieldVisitorWriteBinary::operator() (const UInt64 & x, WriteBuffer & buf) const { writeVarUInt(x, buf); } void FieldVisitorWriteBinary::operator() (const Int64 & x, WriteBuffer & buf) const { writeVarInt(x, buf); } void FieldVisitorWriteBinary::operator() (const Float64 & x, WriteBuffer & buf) const { writeFloatBinary(x, buf); } diff --git a/src/Common/FieldVisitorWriteBinary.h b/src/Common/FieldVisitorWriteBinary.h index ae864ca74f3..5f7bf578e32 100644 --- a/src/Common/FieldVisitorWriteBinary.h +++ b/src/Common/FieldVisitorWriteBinary.h @@ -9,6 +9,8 @@ class FieldVisitorWriteBinary { public: void operator() (const Null & x, WriteBuffer & buf) const; + void operator() (const NegativeInfinity & x, WriteBuffer & buf) const; + void operator() (const PositiveInfinity & x, WriteBuffer & buf) const; void operator() (const UInt64 & x, WriteBuffer & buf) const; void operator() (const UInt128 & x, WriteBuffer & buf) const; void operator() (const UInt256 & x, WriteBuffer & buf) const; diff --git a/src/Common/FieldVisitorsAccurateComparison.h b/src/Common/FieldVisitorsAccurateComparison.h index ba3fabd1535..9e6a93cee3f 100644 --- a/src/Common/FieldVisitorsAccurateComparison.h +++ b/src/Common/FieldVisitorsAccurateComparison.h @@ -26,8 +26,12 @@ public: template bool operator() (const T & l, const U & r) const { - if constexpr (std::is_same_v || std::is_same_v) + if constexpr (std::is_same_v || std::is_same_v + || std::is_same_v || std::is_same_v + || std::is_same_v || std::is_same_v) + { return std::is_same_v; + } else { if constexpr (std::is_same_v) @@ -77,6 +81,10 @@ public: { if constexpr (std::is_same_v || std::is_same_v) return false; + else if constexpr (std::is_same_v || std::is_same_v) + return !std::is_same_v; + else if constexpr (std::is_same_v || std::is_same_v) + return false; else { if constexpr (std::is_same_v) diff --git a/src/Common/ProfileEvents.cpp b/src/Common/ProfileEvents.cpp index 915d14466b6..f4f47148d56 100644 --- a/src/Common/ProfileEvents.cpp +++ b/src/Common/ProfileEvents.cpp @@ -224,7 +224,7 @@ M(PerfLocalMemoryReferences, "Local NUMA node memory reads") \ M(PerfLocalMemoryMisses, "Local NUMA node memory read misses") \ \ - M(CreatedHTTPConnections, "Total amount of created HTTP connections (closed or opened).") \ + M(CreatedHTTPConnections, "Total amount of created HTTP connections (counter increase every time connection is created).") \ \ M(CannotWriteToWriteBufferDiscard, "Number of stack traces dropped by query profiler or signal handler because pipe is full or cannot write to pipe.") \ M(QueryProfilerSignalOverruns, "Number of times we drop processing of a signal due to overrun plus the number of signals that OS has not delivered due to overrun.") \ @@ -248,6 +248,9 @@ M(S3WriteRequestsThrottling, "Number of 429 and 503 errors in POST, DELETE, PUT and PATCH requests to S3 storage.") \ M(S3WriteRequestsRedirects, "Number of redirects in POST, DELETE, PUT and PATCH requests to S3 storage.") \ M(QueryMemoryLimitExceeded, "Number of times when memory limit exceeded for query.") \ + \ + M(SleepFunctionCalls, 
"Number of times a sleep function (sleep, sleepEachRow) has been called.") \ + M(SleepFunctionMicroseconds, "Time spent sleeping due to a sleep function call.") \ namespace ProfileEvents diff --git a/src/Common/TLDListsHolder.cpp b/src/Common/TLDListsHolder.cpp index f0702f37e93..34bef8248b5 100644 --- a/src/Common/TLDListsHolder.cpp +++ b/src/Common/TLDListsHolder.cpp @@ -2,6 +2,7 @@ #include #include #include +#include #include #include @@ -11,11 +12,10 @@ namespace DB namespace ErrorCodes { extern const int TLD_LIST_NOT_FOUND; + extern const int LOGICAL_ERROR; } -/// /// TLDList -/// TLDList::TLDList(size_t size) : tld_container(size) , pool(std::make_unique(10 << 20)) @@ -31,9 +31,7 @@ bool TLDList::has(const StringRef & host) const return tld_container.has(host); } -/// /// TLDListsHolder -/// TLDListsHolder & TLDListsHolder::getInstance() { static TLDListsHolder instance; @@ -62,24 +60,22 @@ size_t TLDListsHolder::parseAndAddTldList(const std::string & name, const std::s std::unordered_set tld_list_tmp; ReadBufferFromFile in(path); + String line; while (!in.eof()) { - char * newline = find_first_symbols<'\n'>(in.position(), in.buffer().end()); - if (newline >= in.buffer().end()) - break; - - std::string_view line(in.position(), newline - in.position()); - in.position() = newline + 1; - + readEscapedStringUntilEOL(line, in); + ++in.position(); /// Skip comments if (line.size() > 2 && line[0] == '/' && line[1] == '/') continue; - trim(line); + line = trim(line, [](char c) { return std::isspace(c); }); /// Skip empty line if (line.empty()) continue; tld_list_tmp.emplace(line); } + if (!in.eof()) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Not all list had been read", name); TLDList tld_list(tld_list_tmp.size()); for (const auto & host : tld_list_tmp) diff --git a/src/Common/ZooKeeper/ZooKeeper.cpp b/src/Common/ZooKeeper/ZooKeeper.cpp index 1ee70b0cc3f..9b3c3191b5d 100644 --- a/src/Common/ZooKeeper/ZooKeeper.cpp +++ b/src/Common/ZooKeeper/ZooKeeper.cpp @@ -111,7 +111,8 @@ void ZooKeeper::init(const std::string & implementation_, const Strings & hosts_ identity_, Poco::Timespan(0, session_timeout_ms_ * 1000), Poco::Timespan(0, ZOOKEEPER_CONNECTION_TIMEOUT_MS * 1000), - Poco::Timespan(0, operation_timeout_ms_ * 1000)); + Poco::Timespan(0, operation_timeout_ms_ * 1000), + zk_log); if (chroot.empty()) LOG_TRACE(log, "Initialized, hosts: {}", fmt::join(hosts, ",")); @@ -134,8 +135,10 @@ void ZooKeeper::init(const std::string & implementation_, const Strings & hosts_ } ZooKeeper::ZooKeeper(const std::string & hosts_string, const std::string & identity_, int32_t session_timeout_ms_, - int32_t operation_timeout_ms_, const std::string & chroot_, const std::string & implementation_) + int32_t operation_timeout_ms_, const std::string & chroot_, const std::string & implementation_, + std::shared_ptr zk_log_) { + zk_log = std::move(zk_log_); Strings hosts_strings; splitInto<','>(hosts_strings, hosts_string); @@ -143,8 +146,10 @@ ZooKeeper::ZooKeeper(const std::string & hosts_string, const std::string & ident } ZooKeeper::ZooKeeper(const Strings & hosts_, const std::string & identity_, int32_t session_timeout_ms_, - int32_t operation_timeout_ms_, const std::string & chroot_, const std::string & implementation_) + int32_t operation_timeout_ms_, const std::string & chroot_, const std::string & implementation_, + std::shared_ptr zk_log_) { + zk_log = std::move(zk_log_); init(implementation_, hosts_, identity_, session_timeout_ms_, operation_timeout_ms_, chroot_); } @@ -209,7 +214,8 @@ struct 
ZooKeeperArgs std::string implementation; }; -ZooKeeper::ZooKeeper(const Poco::Util::AbstractConfiguration & config, const std::string & config_name) +ZooKeeper::ZooKeeper(const Poco::Util::AbstractConfiguration & config, const std::string & config_name, std::shared_ptr zk_log_) + : zk_log(std::move(zk_log_)) { ZooKeeperArgs args(config, config_name); init(args.implementation, args.hosts, args.identity, args.session_timeout_ms, args.operation_timeout_ms, args.chroot); @@ -727,7 +733,7 @@ bool ZooKeeper::waitForDisappear(const std::string & path, const WaitCondition & ZooKeeperPtr ZooKeeper::startNewSession() const { - return std::make_shared(hosts, identity, session_timeout_ms, operation_timeout_ms, chroot, implementation); + return std::make_shared(hosts, identity, session_timeout_ms, operation_timeout_ms, chroot, implementation, zk_log); } @@ -1018,6 +1024,14 @@ void ZooKeeper::finalize() impl->finalize(); } +void ZooKeeper::setZooKeeperLog(std::shared_ptr zk_log_) +{ + zk_log = std::move(zk_log_); + if (auto * zk = dynamic_cast(impl.get())) + zk->setZooKeeperLog(zk_log); +} + + size_t KeeperMultiException::getFailedOpIndex(Coordination::Error exception_code, const Coordination::Responses & responses) { if (responses.empty()) diff --git a/src/Common/ZooKeeper/ZooKeeper.h b/src/Common/ZooKeeper/ZooKeeper.h index 7aafee52bf0..bfbfea03aae 100644 --- a/src/Common/ZooKeeper/ZooKeeper.h +++ b/src/Common/ZooKeeper/ZooKeeper.h @@ -25,6 +25,10 @@ namespace CurrentMetrics extern const Metric EphemeralNode; } +namespace DB +{ + class ZooKeeperLog; +} namespace zkutil { @@ -52,13 +56,15 @@ public: int32_t session_timeout_ms_ = Coordination::DEFAULT_SESSION_TIMEOUT_MS, int32_t operation_timeout_ms_ = Coordination::DEFAULT_OPERATION_TIMEOUT_MS, const std::string & chroot_ = "", - const std::string & implementation_ = "zookeeper"); + const std::string & implementation_ = "zookeeper", + std::shared_ptr zk_log_ = nullptr); ZooKeeper(const Strings & hosts_, const std::string & identity_ = "", int32_t session_timeout_ms_ = Coordination::DEFAULT_SESSION_TIMEOUT_MS, int32_t operation_timeout_ms_ = Coordination::DEFAULT_OPERATION_TIMEOUT_MS, const std::string & chroot_ = "", - const std::string & implementation_ = "zookeeper"); + const std::string & implementation_ = "zookeeper", + std::shared_ptr zk_log_ = nullptr); /** Config of the form: @@ -82,7 +88,7 @@ public: user:password */ - ZooKeeper(const Poco::Util::AbstractConfiguration & config, const std::string & config_name); + ZooKeeper(const Poco::Util::AbstractConfiguration & config, const std::string & config_name, std::shared_ptr zk_log_); /// Creates a new session with the same parameters. This method can be used for reconnecting /// after the session has expired. 
@@ -269,6 +275,8 @@ public: void finalize(); + void setZooKeeperLog(std::shared_ptr zk_log_); + private: friend class EphemeralNodeHolder; @@ -298,6 +306,7 @@ private: std::mutex mutex; Poco::Logger * log = nullptr; + std::shared_ptr zk_log; }; diff --git a/src/Common/ZooKeeper/ZooKeeperCommon.cpp b/src/Common/ZooKeeper/ZooKeeperCommon.cpp index 1560d7a25da..f96d29d3290 100644 --- a/src/Common/ZooKeeper/ZooKeeperCommon.cpp +++ b/src/Common/ZooKeeper/ZooKeeperCommon.cpp @@ -537,6 +537,139 @@ void ZooKeeperSessionIDResponse::writeImpl(WriteBuffer & out) const Coordination::write(server_id, out); } + +void ZooKeeperRequest::createLogElements(LogElements & elems) const +{ + elems.emplace_back(); + auto & elem = elems.back(); + elem.xid = xid; + elem.has_watch = has_watch; + elem.op_num = static_cast(getOpNum()); + elem.path = getPath(); + elem.request_idx = elems.size() - 1; +} + + +void ZooKeeperCreateRequest::createLogElements(LogElements & elems) const +{ + ZooKeeperRequest::createLogElements(elems); + auto & elem = elems.back(); + elem.data = data; + elem.is_ephemeral = is_ephemeral; + elem.is_sequential = is_sequential; +} + +void ZooKeeperRemoveRequest::createLogElements(LogElements & elems) const +{ + ZooKeeperRequest::createLogElements(elems); + auto & elem = elems.back(); + elem.version = version; +} + +void ZooKeeperSetRequest::createLogElements(LogElements & elems) const +{ + ZooKeeperRequest::createLogElements(elems); + auto & elem = elems.back(); + elem.data = data; + elem.version = version; +} + +void ZooKeeperCheckRequest::createLogElements(LogElements & elems) const +{ + ZooKeeperRequest::createLogElements(elems); + auto & elem = elems.back(); + elem.version = version; +} + +void ZooKeeperMultiRequest::createLogElements(LogElements & elems) const +{ + ZooKeeperRequest::createLogElements(elems); + elems.back().requests_size = requests.size(); + for (const auto & request : requests) + { + auto & req = dynamic_cast(*request); + assert(!req.xid || req.xid == xid); + req.createLogElements(elems); + } +} + + +void ZooKeeperResponse::fillLogElements(LogElements & elems, size_t idx) const +{ + auto & elem = elems[idx]; + assert(!elem.xid || elem.xid == xid); + elem.xid = xid; + int32_t response_op = tryGetOpNum(); + assert(!elem.op_num || elem.op_num == response_op || response_op < 0); + elem.op_num = response_op; + + elem.zxid = zxid; + elem.error = static_cast(error); +} + +void ZooKeeperWatchResponse::fillLogElements(LogElements & elems, size_t idx) const +{ + ZooKeeperResponse::fillLogElements(elems, idx); + auto & elem = elems[idx]; + elem.watch_type = type; + elem.watch_state = state; + elem.path = path; +} + +void ZooKeeperCreateResponse::fillLogElements(LogElements & elems, size_t idx) const +{ + ZooKeeperResponse::fillLogElements(elems, idx); + auto & elem = elems[idx]; + elem.path_created = path_created; +} + +void ZooKeeperExistsResponse::fillLogElements(LogElements & elems, size_t idx) const +{ + ZooKeeperResponse::fillLogElements(elems, idx); + auto & elem = elems[idx]; + elem.stat = stat; +} + +void ZooKeeperGetResponse::fillLogElements(LogElements & elems, size_t idx) const +{ + ZooKeeperResponse::fillLogElements(elems, idx); + auto & elem = elems[idx]; + elem.data = data; + elem.stat = stat; +} + +void ZooKeeperSetResponse::fillLogElements(LogElements & elems, size_t idx) const +{ + ZooKeeperResponse::fillLogElements(elems, idx); + auto & elem = elems[idx]; + elem.stat = stat; +} + +void ZooKeeperListResponse::fillLogElements(LogElements & elems, size_t idx) const +{ + 
ZooKeeperResponse::fillLogElements(elems, idx); + auto & elem = elems[idx]; + elem.stat = stat; + elem.children = names; +} + +void ZooKeeperMultiResponse::fillLogElements(LogElements & elems, size_t idx) const +{ + assert(idx == 0); + assert(elems.size() == responses.size() + 1); + ZooKeeperResponse::fillLogElements(elems, idx); + for (const auto & response : responses) + { + auto & resp = dynamic_cast(*response); + assert(!resp.xid || resp.xid == xid); + assert(!resp.zxid || resp.zxid == zxid); + resp.xid = xid; + resp.zxid = zxid; + resp.fillLogElements(elems, ++idx); + } +} + + void ZooKeeperRequestFactory::registerRequest(OpNum op_num, Creator creator) { if (!op_num_to_request.try_emplace(op_num, creator).second) diff --git a/src/Common/ZooKeeper/ZooKeeperCommon.h b/src/Common/ZooKeeper/ZooKeeperCommon.h index eb7f42f900a..16190aa25d3 100644 --- a/src/Common/ZooKeeper/ZooKeeperCommon.h +++ b/src/Common/ZooKeeper/ZooKeeperCommon.h @@ -2,6 +2,7 @@ #include #include +#include #include #include @@ -22,6 +23,8 @@ namespace Coordination { +using LogElements = std::vector; + struct ZooKeeperResponse : virtual Response { XID xid = 0; @@ -32,6 +35,8 @@ struct ZooKeeperResponse : virtual Response virtual void writeImpl(WriteBuffer &) const = 0; virtual void write(WriteBuffer & out) const; virtual OpNum getOpNum() const = 0; + virtual void fillLogElements(LogElements & elems, size_t idx) const; + virtual int32_t tryGetOpNum() const { return static_cast(getOpNum()); } }; using ZooKeeperResponsePtr = std::shared_ptr; @@ -63,6 +68,8 @@ struct ZooKeeperRequest : virtual Request virtual ZooKeeperResponsePtr makeResponse() const = 0; virtual bool isReadRequest() const = 0; + + virtual void createLogElements(LogElements & elems) const; }; using ZooKeeperRequestPtr = std::shared_ptr; @@ -119,6 +126,9 @@ struct ZooKeeperWatchResponse final : WatchResponse, ZooKeeperResponse { throw Exception("OpNum for watch response doesn't exist", Error::ZRUNTIMEINCONSISTENCY); } + + void fillLogElements(LogElements & elems, size_t idx) const override; + int32_t tryGetOpNum() const override { return 0; } }; struct ZooKeeperAuthRequest final : ZooKeeperRequest @@ -188,6 +198,8 @@ struct ZooKeeperCreateRequest final : public CreateRequest, ZooKeeperRequest bool isReadRequest() const override { return false; } size_t bytesSize() const override { return CreateRequest::bytesSize() + sizeof(xid) + sizeof(has_watch); } + + void createLogElements(LogElements & elems) const override; }; struct ZooKeeperCreateResponse final : CreateResponse, ZooKeeperResponse @@ -199,6 +211,8 @@ struct ZooKeeperCreateResponse final : CreateResponse, ZooKeeperResponse OpNum getOpNum() const override { return OpNum::Create; } size_t bytesSize() const override { return CreateResponse::bytesSize() + sizeof(xid) + sizeof(zxid); } + + void fillLogElements(LogElements & elems, size_t idx) const override; }; struct ZooKeeperRemoveRequest final : RemoveRequest, ZooKeeperRequest @@ -214,6 +228,8 @@ struct ZooKeeperRemoveRequest final : RemoveRequest, ZooKeeperRequest bool isReadRequest() const override { return false; } size_t bytesSize() const override { return RemoveRequest::bytesSize() + sizeof(xid); } + + void createLogElements(LogElements & elems) const override; }; struct ZooKeeperRemoveResponse final : RemoveResponse, ZooKeeperResponse @@ -244,6 +260,8 @@ struct ZooKeeperExistsResponse final : ExistsResponse, ZooKeeperResponse OpNum getOpNum() const override { return OpNum::Exists; } size_t bytesSize() const override { return 
ExistsResponse::bytesSize() + sizeof(xid) + sizeof(zxid); } + + void fillLogElements(LogElements & elems, size_t idx) const override; }; struct ZooKeeperGetRequest final : GetRequest, ZooKeeperRequest @@ -265,6 +283,8 @@ struct ZooKeeperGetResponse final : GetResponse, ZooKeeperResponse OpNum getOpNum() const override { return OpNum::Get; } size_t bytesSize() const override { return GetResponse::bytesSize() + sizeof(xid) + sizeof(zxid); } + + void fillLogElements(LogElements & elems, size_t idx) const override; }; struct ZooKeeperSetRequest final : SetRequest, ZooKeeperRequest @@ -279,6 +299,8 @@ struct ZooKeeperSetRequest final : SetRequest, ZooKeeperRequest bool isReadRequest() const override { return false; } size_t bytesSize() const override { return SetRequest::bytesSize() + sizeof(xid); } + + void createLogElements(LogElements & elems) const override; }; struct ZooKeeperSetResponse final : SetResponse, ZooKeeperResponse @@ -288,6 +310,8 @@ struct ZooKeeperSetResponse final : SetResponse, ZooKeeperResponse OpNum getOpNum() const override { return OpNum::Set; } size_t bytesSize() const override { return SetResponse::bytesSize() + sizeof(xid) + sizeof(zxid); } + + void fillLogElements(LogElements & elems, size_t idx) const override; }; struct ZooKeeperListRequest : ListRequest, ZooKeeperRequest @@ -313,6 +337,8 @@ struct ZooKeeperListResponse : ListResponse, ZooKeeperResponse OpNum getOpNum() const override { return OpNum::List; } size_t bytesSize() const override { return ListResponse::bytesSize() + sizeof(xid) + sizeof(zxid); } + + void fillLogElements(LogElements & elems, size_t idx) const override; }; struct ZooKeeperSimpleListResponse final : ZooKeeperListResponse @@ -333,6 +359,8 @@ struct ZooKeeperCheckRequest final : CheckRequest, ZooKeeperRequest bool isReadRequest() const override { return true; } size_t bytesSize() const override { return CheckRequest::bytesSize() + sizeof(xid) + sizeof(has_watch); } + + void createLogElements(LogElements & elems) const override; }; struct ZooKeeperCheckResponse final : CheckResponse, ZooKeeperResponse @@ -409,6 +437,8 @@ struct ZooKeeperMultiRequest final : MultiRequest, ZooKeeperRequest bool isReadRequest() const override; size_t bytesSize() const override { return MultiRequest::bytesSize() + sizeof(xid) + sizeof(has_watch); } + + void createLogElements(LogElements & elems) const override; }; struct ZooKeeperMultiResponse final : MultiResponse, ZooKeeperResponse @@ -433,6 +463,8 @@ struct ZooKeeperMultiResponse final : MultiResponse, ZooKeeperResponse void writeImpl(WriteBuffer & out) const override; size_t bytesSize() const override { return MultiResponse::bytesSize() + sizeof(xid) + sizeof(zxid); } + + void fillLogElements(LogElements & elems, size_t idx) const override; }; /// Fake internal coordination (keeper) response. 
Never received from client diff --git a/src/Common/ZooKeeper/ZooKeeperImpl.cpp b/src/Common/ZooKeeper/ZooKeeperImpl.cpp index bd1620099bb..5f15a3b8b75 100644 --- a/src/Common/ZooKeeper/ZooKeeperImpl.cpp +++ b/src/Common/ZooKeeper/ZooKeeperImpl.cpp @@ -311,11 +311,14 @@ ZooKeeper::ZooKeeper( const String & auth_data, Poco::Timespan session_timeout_, Poco::Timespan connection_timeout, - Poco::Timespan operation_timeout_) + Poco::Timespan operation_timeout_, + std::shared_ptr zk_log_) : root_path(root_path_), session_timeout(session_timeout_), operation_timeout(std::min(operation_timeout_, session_timeout_)) { + std::atomic_store(&zk_log, std::move(zk_log_)); + if (!root_path.empty()) { if (root_path.back() == '/') @@ -578,6 +581,8 @@ void ZooKeeper::sendThread() info.request->probably_sent = true; info.request->write(*out); + logOperationIfNeeded(info.request); + /// We sent close request, exit if (info.request->xid == CLOSE_XID) break; @@ -747,6 +752,9 @@ void ZooKeeper::receiveEvent() if (!response) response = request_info.request->makeResponse(); + response->xid = xid; + response->zxid = zxid; + if (err != Error::ZOK) { response->error = err; @@ -785,6 +793,8 @@ void ZooKeeper::receiveEvent() int32_t actual_length = in->count() - count_before_event; if (length != actual_length) throw Exception("Response length doesn't match. Expected: " + DB::toString(length) + ", actual: " + DB::toString(actual_length), Error::ZMARSHALLINGERROR); + + logOperationIfNeeded(request_info.request, response); //-V614 } catch (...) { @@ -802,6 +812,8 @@ void ZooKeeper::receiveEvent() { if (request_info.callback) request_info.callback(*response); + + logOperationIfNeeded(request_info.request, response); } catch (...) { @@ -880,17 +892,19 @@ void ZooKeeper::finalize(bool error_send, bool error_receive) for (auto & op : operations) { RequestInfo & request_info = op.second; - ResponsePtr response = request_info.request->makeResponse(); + ZooKeeperResponsePtr response = request_info.request->makeResponse(); response->error = request_info.request->probably_sent ? Error::ZCONNECTIONLOSS : Error::ZSESSIONEXPIRED; + response->xid = request_info.request->xid; if (request_info.callback) { try { request_info.callback(*response); + logOperationIfNeeded(request_info.request, response, true); } catch (...) { @@ -942,13 +956,15 @@ void ZooKeeper::finalize(bool error_send, bool error_receive) { if (info.callback) { - ResponsePtr response = info.request->makeResponse(); + ZooKeeperResponsePtr response = info.request->makeResponse(); if (response) { response->error = Error::ZSESSIONEXPIRED; + response->xid = info.request->xid; try { info.callback(*response); + logOperationIfNeeded(info.request, response, true); } catch (...) { @@ -993,6 +1009,12 @@ void ZooKeeper::pushRequest(RequestInfo && info) throw Exception("xid equal to close_xid", Error::ZSESSIONEXPIRED); if (info.request->xid < 0) throw Exception("XID overflow", Error::ZSESSIONEXPIRED); + + if (auto * multi_request = dynamic_cast(info.request.get())) + { + for (auto & request : multi_request->requests) + dynamic_cast(*request).xid = multi_request->xid; + } } /// We must serialize 'pushRequest' and 'finalize' (from sendThread, receiveThread) calls @@ -1190,4 +1212,53 @@ void ZooKeeper::close() ProfileEvents::increment(ProfileEvents::ZooKeeperClose); } + +void ZooKeeper::setZooKeeperLog(std::shared_ptr zk_log_) +{ + /// logOperationIfNeeded(...) 
uses zk_log and can be called from different threads, so we have to use atomic shared_ptr + std::atomic_store(&zk_log, std::move(zk_log_)); +} + +void ZooKeeper::logOperationIfNeeded(const ZooKeeperRequestPtr & request, const ZooKeeperResponsePtr & response, bool finalize) +{ + auto maybe_zk_log = std::atomic_load(&zk_log); + if (!maybe_zk_log) + return; + + ZooKeeperLogElement::Type log_type = ZooKeeperLogElement::UNKNOWN; + Decimal64 event_time = std::chrono::duration_cast( + std::chrono::system_clock::now().time_since_epoch() + ).count(); + LogElements elems; + if (request) + { + request->createLogElements(elems); + log_type = ZooKeeperLogElement::REQUEST; + } + else + { + assert(response); + assert(response->xid == PING_XID || response->xid == WATCH_XID); + elems.emplace_back(); + } + + if (response) + { + response->fillLogElements(elems, 0); + log_type = ZooKeeperLogElement::RESPONSE; + } + + if (finalize) + log_type = ZooKeeperLogElement::FINALIZE; + + for (auto & elem : elems) + { + elem.type = log_type; + elem.event_time = event_time; + elem.address = socket.peerAddress(); + elem.session_id = session_id; + maybe_zk_log->add(elem); + } +} + } diff --git a/src/Common/ZooKeeper/ZooKeeperImpl.h b/src/Common/ZooKeeper/ZooKeeperImpl.h index 2210fd98b18..8f0f64ceafa 100644 --- a/src/Common/ZooKeeper/ZooKeeperImpl.h +++ b/src/Common/ZooKeeper/ZooKeeperImpl.h @@ -80,6 +80,10 @@ namespace CurrentMetrics extern const Metric ZooKeeperSession; } +namespace DB +{ + class ZooKeeperLog; +} namespace Coordination { @@ -110,7 +114,8 @@ public: const String & auth_data, Poco::Timespan session_timeout_, Poco::Timespan connection_timeout, - Poco::Timespan operation_timeout_); + Poco::Timespan operation_timeout_, + std::shared_ptr zk_log_); ~ZooKeeper() override; @@ -184,6 +189,8 @@ public: void finalize() override { finalize(false, false); } + void setZooKeeperLog(std::shared_ptr zk_log_); + private: String root_path; ACLs default_acls; @@ -258,7 +265,10 @@ private: template void read(T &); + void logOperationIfNeeded(const ZooKeeperRequestPtr & request, const ZooKeeperResponsePtr & response = nullptr, bool finalize = false); + CurrentMetrics::Increment active_session_metric_increment{CurrentMetrics::ZooKeeperSession}; + std::shared_ptr zk_log; }; } diff --git a/src/Common/ZooKeeper/examples/zk_many_watches_reconnect.cpp b/src/Common/ZooKeeper/examples/zk_many_watches_reconnect.cpp index fa4fea55580..cf819121234 100644 --- a/src/Common/ZooKeeper/examples/zk_many_watches_reconnect.cpp +++ b/src/Common/ZooKeeper/examples/zk_many_watches_reconnect.cpp @@ -25,7 +25,7 @@ int main(int argc, char ** argv) DB::ConfigProcessor processor(argv[1], false, true); auto config = processor.loadConfig().configuration; - zkutil::ZooKeeper zk(*config, "zookeeper"); + zkutil::ZooKeeper zk(*config, "zookeeper", nullptr); zkutil::EventPtr watch = std::make_shared(); /// NOTE: setting watches in multiple threads because doing it in a single thread is too slow. 
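> Editor's note (not part of the patch): the example programs above now pass an explicit trailing argument because the `zkutil::ZooKeeper` and `Coordination::ZooKeeper` constructors gained a `std::shared_ptr<DB::ZooKeeperLog>` parameter. A hedged sketch of how a caller inside the ClickHouse tree might use the new signature is shown below; it will not build standalone, and the include paths and the `config` object are assumptions for illustration.

```cpp
#include <memory>
#include <Common/ZooKeeper/ZooKeeper.h>
#include <Interpreters/ZooKeeperLog.h>   // assumed header for DB::ZooKeeperLog

// Construct a client without query logging, exactly as the updated examples do.
void connectWithoutLogging(const Poco::Util::AbstractConfiguration & config)
{
    zkutil::ZooKeeper zk(config, "zookeeper", /* zk_log_ = */ nullptr);
    zk.createAncestors("/clickhouse/example");   // ordinary calls are unchanged
}

// The log can also be attached (or replaced) after construction via the new setter.
void attachLogging(zkutil::ZooKeeper & zk, std::shared_ptr<DB::ZooKeeperLog> log)
{
    zk.setZooKeeperLog(std::move(log));
}
```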
diff --git a/src/Common/ZooKeeper/examples/zkutil_test_commands_new_lib.cpp b/src/Common/ZooKeeper/examples/zkutil_test_commands_new_lib.cpp index 89659fa5e46..5f1487da9a2 100644 --- a/src/Common/ZooKeeper/examples/zkutil_test_commands_new_lib.cpp +++ b/src/Common/ZooKeeper/examples/zkutil_test_commands_new_lib.cpp @@ -40,7 +40,7 @@ try } - ZooKeeper zk(nodes, {}, {}, {}, {5, 0}, {0, 50000}, {0, 50000}); + ZooKeeper zk(nodes, {}, {}, {}, {5, 0}, {0, 50000}, {0, 50000}, nullptr); Poco::Event event(true); diff --git a/src/Common/ZooKeeper/examples/zookeeper_impl.cpp b/src/Common/ZooKeeper/examples/zookeeper_impl.cpp index 3c2e52c93f2..e6ba4fe2a30 100644 --- a/src/Common/ZooKeeper/examples/zookeeper_impl.cpp +++ b/src/Common/ZooKeeper/examples/zookeeper_impl.cpp @@ -5,7 +5,7 @@ int main() try { - Coordination::ZooKeeper zookeeper({Coordination::ZooKeeper::Node{Poco::Net::SocketAddress{"localhost:2181"}, false}}, "", "", "", {30, 0}, {0, 50000}, {0, 50000}); + Coordination::ZooKeeper zookeeper({Coordination::ZooKeeper::Node{Poco::Net::SocketAddress{"localhost:2181"}, false}}, "", "", "", {30, 0}, {0, 50000}, {0, 50000}, nullptr); zookeeper.create("/test", "hello", false, false, {}, [](const Coordination::CreateResponse & response) { diff --git a/src/Common/config.h.in b/src/Common/config.h.in index df27a7b7d9e..0665b1717ed 100644 --- a/src/Common/config.h.in +++ b/src/Common/config.h.in @@ -2,8 +2,10 @@ // .h autogenerated by cmake! +#cmakedefine01 USE_BASE64 #cmakedefine01 USE_RE2_ST #cmakedefine01 USE_SSL +#cmakedefine01 USE_INTERNAL_SSL_LIBRARY #cmakedefine01 USE_HDFS #cmakedefine01 USE_INTERNAL_HDFS3_LIBRARY #cmakedefine01 USE_AWS_S3 diff --git a/src/Compression/CompressionCodecDelta.cpp b/src/Compression/CompressionCodecDelta.cpp index 447abe9e840..e281609ff43 100644 --- a/src/Compression/CompressionCodecDelta.cpp +++ b/src/Compression/CompressionCodecDelta.cpp @@ -132,6 +132,10 @@ void CompressionCodecDelta::doDecompressData(const char * source, UInt32 source_ throw Exception("Cannot decompress. File has wrong header", ErrorCodes::CANNOT_DECOMPRESS); UInt8 bytes_size = source[0]; + + if (bytes_size == 0) + throw Exception("Cannot decompress. File has wrong header", ErrorCodes::CANNOT_DECOMPRESS); + UInt8 bytes_to_skip = uncompressed_size % bytes_size; if (UInt32(2 + bytes_to_skip) > source_size) diff --git a/src/Compression/CompressionCodecDoubleDelta.cpp b/src/Compression/CompressionCodecDoubleDelta.cpp index 79ced55594a..c416582eb6b 100644 --- a/src/Compression/CompressionCodecDoubleDelta.cpp +++ b/src/Compression/CompressionCodecDoubleDelta.cpp @@ -502,6 +502,10 @@ void CompressionCodecDoubleDelta::doDecompressData(const char * source, UInt32 s throw Exception("Cannot decompress. File has wrong header", ErrorCodes::CANNOT_DECOMPRESS); UInt8 bytes_size = source[0]; + + if (bytes_size == 0) + throw Exception("Cannot decompress. 
File has wrong header", ErrorCodes::CANNOT_DECOMPRESS); + UInt8 bytes_to_skip = uncompressed_size % bytes_size; if (UInt32(2 + bytes_to_skip) > source_size) diff --git a/src/Compression/CompressionCodecEncrypted.cpp b/src/Compression/CompressionCodecEncrypted.cpp new file mode 100644 index 00000000000..d0904b4bf24 --- /dev/null +++ b/src/Compression/CompressionCodecEncrypted.cpp @@ -0,0 +1,212 @@ +#include +#include +#if USE_SSL && USE_INTERNAL_SSL_LIBRARY + +#include +#include +#include +#include +#include +#include +#include + +namespace DB +{ + namespace ErrorCodes + { + extern const int ILLEGAL_CODEC_PARAMETER; + extern const int ILLEGAL_SYNTAX_FOR_CODEC_TYPE; + extern const int NO_ELEMENTS_IN_CONFIG; + extern const int OPENSSL_ERROR; + } + + void CompressionCodecEncrypted::setMasterKey(const std::string_view & master_key) + { + keys.emplace(master_key); + } + + CompressionCodecEncrypted::KeyHolder::KeyHolder(const std::string_view & master_key) + { + // Derive a key from it. + keygen_key = deriveKey(master_key); + + // EVP_AEAD_CTX is not stateful so we can create an + // instance now. + EVP_AEAD_CTX_zero(&ctx); + const int ok = EVP_AEAD_CTX_init(&ctx, EVP_aead_aes_128_gcm(), + reinterpret_cast(keygen_key.data()), keygen_key.size(), + 16 /* tag size */, nullptr); + if (!ok) + throw Exception(lastErrorString(), ErrorCodes::OPENSSL_ERROR); + } + + CompressionCodecEncrypted::KeyHolder::~KeyHolder() + { + EVP_AEAD_CTX_cleanup(&ctx); + } + + const CompressionCodecEncrypted::KeyHolder & CompressionCodecEncrypted::getKeys() + { + if (keys) + return *keys; + else + throw Exception("There is no configuration for encryption in the server config", + ErrorCodes::NO_ELEMENTS_IN_CONFIG); + } + + CompressionCodecEncrypted::CompressionCodecEncrypted(const std::string_view & cipher) + { + setCodecDescription("Encrypted", {std::make_shared(cipher)}); + } + + uint8_t CompressionCodecEncrypted::getMethodByte() const + { + return static_cast(CompressionMethodByte::Encrypted); + } + + void CompressionCodecEncrypted::updateHash(SipHash & hash) const + { + getCodecDesc()->updateTreeHash(hash); + } + + UInt32 CompressionCodecEncrypted::getMaxCompressedDataSize(UInt32 uncompressed_size) const + { + // The GCM mode is a stream cipher. No paddings are + // involved. There will be a tag at the end of ciphertext (16 + // octets). + return uncompressed_size + 16; + } + + UInt32 CompressionCodecEncrypted::doCompressData(const char * source, UInt32 source_size, char * dest) const + { + // Generate an IV out of the data block and the key-generation + // key. It is completely deterministic, but does not leak any + // information about the data block except for equivalence of + // identical blocks (under the same master key). The IV will + // be used as an authentication tag. The ciphertext and the + // tag will be written directly in the dest buffer. + const std::string_view plaintext = std::string_view(source, source_size); + + encrypt(plaintext, dest); + return source_size + 16; + } + + void CompressionCodecEncrypted::doDecompressData(const char * source, UInt32 source_size, char * dest, UInt32 uncompressed_size [[maybe_unused]]) const + { + // Extract the IV from the encrypted data block. Decrypt the + // block with the extracted IV, and compare the tag. Throw an + // exception if tags don't match. 
+ const std::string_view ciphertext_and_tag = std::string_view(source, source_size); + assert(ciphertext_and_tag.size() == uncompressed_size + 16); + + decrypt(ciphertext_and_tag, dest); + } + + std::string CompressionCodecEncrypted::lastErrorString() + { + std::array buffer; + ERR_error_string_n(ERR_get_error(), buffer.data(), buffer.size()); + return std::string(buffer.data()); + } + + std::string CompressionCodecEncrypted::deriveKey(const std::string_view & master_key) + { + std::string_view salt(""); // No salt: derive keys in a deterministic manner. + std::string_view info("Codec Encrypted('AES-128-GCM-SIV') key generation key"); + std::array result; + + const int ok = HKDF(reinterpret_cast(result.data()), result.size(), + EVP_sha256(), + reinterpret_cast(master_key.data()), master_key.size(), + reinterpret_cast(salt.data()), salt.size(), + reinterpret_cast(info.data()), info.size()); + if (!ok) + throw Exception(lastErrorString(), ErrorCodes::OPENSSL_ERROR); + + return std::string(result.data(), 16); + } + + void CompressionCodecEncrypted::encrypt(const std::string_view & plaintext, char * ciphertext_and_tag) + { + // Fixed nonce. Yes this is unrecommended, but we have to live + // with it. + std::string_view nonce("\0\0\0\0\0\0\0\0\0\0\0\0", 12); + + size_t out_len; + const int ok = EVP_AEAD_CTX_seal(&getKeys().ctx, + reinterpret_cast(ciphertext_and_tag), + &out_len, plaintext.size() + 16, + reinterpret_cast(nonce.data()), nonce.size(), + reinterpret_cast(plaintext.data()), plaintext.size(), + nullptr, 0); + if (!ok) + throw Exception(lastErrorString(), ErrorCodes::OPENSSL_ERROR); + + assert(out_len == plaintext.size() + 16); + } + + void CompressionCodecEncrypted::decrypt(const std::string_view & ciphertext, char * plaintext) + { + std::string_view nonce("\0\0\0\0\0\0\0\0\0\0\0\0", 12); + + size_t out_len; + const int ok = EVP_AEAD_CTX_open(&getKeys().ctx, + reinterpret_cast(plaintext), + &out_len, ciphertext.size(), + reinterpret_cast(nonce.data()), nonce.size(), + reinterpret_cast(ciphertext.data()), ciphertext.size(), + nullptr, 0); + if (!ok) + throw Exception(lastErrorString(), ErrorCodes::OPENSSL_ERROR); + + assert(out_len == ciphertext.size() - 16); + } + + void registerCodecEncrypted(CompressionCodecFactory & factory) + { + const auto method_code = uint8_t(CompressionMethodByte::Encrypted); + factory.registerCompressionCodec("Encrypted", method_code, [&](const ASTPtr & arguments) -> CompressionCodecPtr + { + if (arguments) + { + if (arguments->children.size() != 1) + throw Exception("Codec Encrypted() must have 1 parameter, given " + + std::to_string(arguments->children.size()), + ErrorCodes::ILLEGAL_SYNTAX_FOR_CODEC_TYPE); + + const auto children = arguments->children; + const auto * literal = children[0]->as(); + if (!literal) + throw Exception("Wrong argument for codec Encrypted(). Expected a string literal", + ErrorCodes::ILLEGAL_SYNTAX_FOR_CODEC_TYPE); + + const String cipher = literal->value.safeGet(); + if (cipher == "AES-128-GCM-SIV") + return std::make_shared(cipher); + else + throw Exception("Cipher '" + cipher + "' is not supported", + ErrorCodes::ILLEGAL_CODEC_PARAMETER); + } + else + { + /* The factory is asking us to construct the codec + * only from the method code. How can that be + * possible? For now we only support a single cipher + * so it's not really a problem, but if we were to + * support more ciphers it would be catastrophic. 
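For reference, a simplified sketch of the argument checking that registerCodecEncrypted() performs above: exactly one argument is accepted and the only recognised cipher is AES-128-GCM-SIV. The free function and std::invalid_argument below are illustrative stand-ins, not the factory API:

```
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative stand-in for the factory lambda above.
std::string validateEncryptedCodecArguments(const std::vector<std::string> & arguments)
{
    if (arguments.size() != 1)
        throw std::invalid_argument(
            "Codec Encrypted() must have 1 parameter, given " + std::to_string(arguments.size()));

    const std::string & cipher = arguments.front();
    if (cipher != "AES-128-GCM-SIV")
        throw std::invalid_argument("Cipher '" + cipher + "' is not supported");

    return cipher;
}
```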
*/ + return std::make_shared("AES-128-GCM-SIV"); + } + }); + } +} + +#else /* USE_SSL && USE_INTERNAL_SSL_LIBRARY */ + +namespace DB +{ + void registerCodecEncrypted(CompressionCodecFactory &) + { + } +} + +#endif /* USE_SSL && USE_INTERNAL_SSL_LIBRARY */ diff --git a/src/Compression/CompressionCodecEncrypted.h b/src/Compression/CompressionCodecEncrypted.h new file mode 100644 index 00000000000..e58fd4ab173 --- /dev/null +++ b/src/Compression/CompressionCodecEncrypted.h @@ -0,0 +1,104 @@ +#pragma once + +// This depends on BoringSSL-specific API, notably . +#include +#if USE_SSL && USE_INTERNAL_SSL_LIBRARY + +#include +#include +#include +#include + +namespace DB +{ + /** This codec encrypts and decrypts blocks with AES-128 in + * GCM-SIV mode (RFC-8452), which is the only cipher currently + * supported. Although it is implemented as a compression codec + * it doesn't actually compress data. In fact encrypted data will + * no longer be compressible in any meaningful way. This means if + * you want to apply both compression and encryption to your + * columns, you need to put this codec at the end of the chain + * like "column Int32 Codec(Delta, LZ4, + * Encrypted('AES-128-GCM-SIV'))". + * + * The key is obtained by executing a command specified in the + * configuration file at startup, and if it doesn't specify a + * command the codec refuses to process any data. The command is + * expected to write a Base64-encoded key of any length, and we + * apply HKDF-SHA-256 to derive a 128-bit key-generation key + * (only the first half of the result is used). We then encrypt + * blocks in AES-128-GCM-SIV with a universally fixed nonce (12 + * repeated NUL characters). + * + * This construct has a weakness due to the nonce being fixed at + * all times: when the same data block is encrypted twice, the + * resulting ciphertext will be exactly the same. We have to live + * with this weakness because ciphertext must be deterministic, + * as otherwise our engines like ReplicatedMergeTree cannot + * deduplicate data blocks. + */ + class CompressionCodecEncrypted : public ICompressionCodec + { + public: + /** If a master key is available, the server is supposed to + * invoke this static method at the startup. The codec will + * refuse to compress or decompress any data until that. The + * key can be an arbitrary octet string, but it is + * recommended that the key is at least 16 octets long. + * + * Note that the master key is currently not guarded by a + * mutex. This method should be invoked no more than once. 
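The contract described in this comment — set the key once at startup, read it afterwards without locking — can be summarised with a small stand-alone sketch; EncryptedCodecKeys and the plain std::string key below are illustrative, not the real KeyHolder:

```
#include <optional>
#include <stdexcept>
#include <string>

// Minimal sketch of the initialisation contract: the derived key lives in a static
// std::optional that is written once at startup and only read afterwards. There is no
// mutex, so setMasterKey() must not race with compression work.
class EncryptedCodecKeys
{
public:
    static void setMasterKey(const std::string & master_key)
    {
        keys.emplace(master_key);   // call exactly once, before any queries run
    }

    static const std::string & getKeys()
    {
        if (!keys)
            throw std::runtime_error("There is no configuration for encryption in the server config");
        return *keys;
    }

private:
    static inline std::optional<std::string> keys;
};

int main()
{
    EncryptedCodecKeys::setMasterKey("example-master-key");   // startup
    return EncryptedCodecKeys::getKeys().empty() ? 1 : 0;     // later use
}
```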
+ */ + static void setMasterKey(const std::string_view & master_key); + + CompressionCodecEncrypted(const std::string_view & cipher); + + uint8_t getMethodByte() const override; + void updateHash(SipHash & hash) const override; + + bool isCompression() const override + { + return false; + } + + bool isGenericCompression() const override + { + return false; + } + + bool isPostProcessing() const override + { + return true; + } + + protected: + UInt32 getMaxCompressedDataSize(UInt32 uncompressed_size) const override; + + UInt32 doCompressData(const char * source, UInt32 source_size, char * dest) const override; + void doDecompressData(const char * source, UInt32 source_size, char * dest, UInt32 uncompressed_size) const override; + + private: + static std::string lastErrorString(); + static std::string deriveKey(const std::string_view & master_key); + static void encrypt(const std::string_view & plaintext, char * ciphertext_and_tag); + static void decrypt(const std::string_view & ciphertext_and_tag, char * plaintext); + + /** A private class that holds keys derived from the master + * key. + */ + struct KeyHolder : private boost::noncopyable + { + KeyHolder(const std::string_view & master_key); + ~KeyHolder(); + + std::string keygen_key; + EVP_AEAD_CTX ctx; + }; + + static const KeyHolder & getKeys(); + + static inline std::optional keys; + }; +} + +#endif /* USE_SSL && USE_INTERNAL_SSL_LIBRARY */ diff --git a/src/Compression/CompressionCodecGorilla.cpp b/src/Compression/CompressionCodecGorilla.cpp index 7fcb2183503..1276ac911f1 100644 --- a/src/Compression/CompressionCodecGorilla.cpp +++ b/src/Compression/CompressionCodecGorilla.cpp @@ -410,6 +410,10 @@ void CompressionCodecGorilla::doDecompressData(const char * source, UInt32 sourc throw Exception("Cannot decompress. File has wrong header", ErrorCodes::CANNOT_DECOMPRESS); UInt8 bytes_size = source[0]; + + if (bytes_size == 0) + throw Exception("Cannot decompress. 
File has wrong header", ErrorCodes::CANNOT_DECOMPRESS); + UInt8 bytes_to_skip = uncompressed_size % bytes_size; if (UInt32(2 + bytes_to_skip) > source_size) diff --git a/src/Compression/CompressionCodecLZ4.cpp b/src/Compression/CompressionCodecLZ4.cpp index 8cb81e460b1..396f6fad2c3 100644 --- a/src/Compression/CompressionCodecLZ4.cpp +++ b/src/Compression/CompressionCodecLZ4.cpp @@ -62,6 +62,7 @@ private: namespace ErrorCodes { extern const int CANNOT_COMPRESS; + extern const int CANNOT_DECOMPRESS; extern const int ILLEGAL_SYNTAX_FOR_CODEC_TYPE; extern const int ILLEGAL_CODEC_PARAMETER; } @@ -93,7 +94,10 @@ UInt32 CompressionCodecLZ4::doCompressData(const char * source, UInt32 source_si void CompressionCodecLZ4::doDecompressData(const char * source, UInt32 source_size, char * dest, UInt32 uncompressed_size) const { - LZ4::decompress(source, dest, source_size, uncompressed_size, lz4_stat); + bool success = LZ4::decompress(source, dest, source_size, uncompressed_size, lz4_stat); + + if (!success) + throw Exception("Cannot decompress", ErrorCodes::CANNOT_DECOMPRESS); } void registerCodecLZ4(CompressionCodecFactory & factory) diff --git a/src/Compression/CompressionFactory.cpp b/src/Compression/CompressionFactory.cpp index dc65713471c..bb2b00a56ef 100644 --- a/src/Compression/CompressionFactory.cpp +++ b/src/Compression/CompressionFactory.cpp @@ -79,6 +79,7 @@ ASTPtr CompressionCodecFactory::validateCodecAndGetPreprocessedAST( bool is_compression = false; bool has_none = false; std::optional generic_compression_codec_pos; + std::set post_processing_codecs; bool can_substitute_codec_arguments = true; for (size_t i = 0, size = func->arguments->children.size(); i < size; ++i) @@ -156,6 +157,9 @@ ASTPtr CompressionCodecFactory::validateCodecAndGetPreprocessedAST( if (!generic_compression_codec_pos && result_codec->isGenericCompression()) generic_compression_codec_pos = i; + + if (result_codec->isPostProcessing()) + post_processing_codecs.insert(i); } String codec_description = queryToString(codecs_descriptions); @@ -170,7 +174,8 @@ ASTPtr CompressionCodecFactory::validateCodecAndGetPreprocessedAST( /// Allow to explicitly specify single NONE codec if user don't want any compression. /// But applying other transformations solely without compression (e.g. Delta) does not make sense. - if (!is_compression && !has_none) + /// It's okay to apply post-processing codecs solely without anything else. + if (!is_compression && !has_none && post_processing_codecs.size() != codecs_descriptions->children.size()) throw Exception( "Compression codec " + codec_description + " does not compress anything." @@ -180,9 +185,19 @@ ASTPtr CompressionCodecFactory::validateCodecAndGetPreprocessedAST( " (Note: you can enable setting 'allow_suspicious_codecs' to skip this check).", ErrorCodes::BAD_ARGUMENTS); + /// It does not make sense to apply any non-post-processing codecs + /// after post-processing one. + if (!post_processing_codecs.empty() && + *post_processing_codecs.begin() != codecs_descriptions->children.size() - post_processing_codecs.size()) + throw Exception("The combination of compression codecs " + codec_description + " is meaningless," + " because it does not make sense to apply any non-post-processing codecs after" + " post-processing ones. (Note: you can enable setting 'allow_suspicious_codecs'" + " to skip this check).", ErrorCodes::BAD_ARGUMENTS); + /// It does not make sense to apply any transformations after generic compression algorithm /// So, generic compression can be only one and only at the end. 
- if (generic_compression_codec_pos && *generic_compression_codec_pos != codecs_descriptions->children.size() - 1) + if (generic_compression_codec_pos && + *generic_compression_codec_pos != codecs_descriptions->children.size() - 1 - post_processing_codecs.size()) throw Exception("The combination of compression codecs " + codec_description + " is meaningless," " because it does not make sense to apply any transformations after generic compression algorithm." " (Note: you can enable setting 'allow_suspicious_codecs' to skip this check).", ErrorCodes::BAD_ARGUMENTS); @@ -337,6 +352,7 @@ void registerCodecDelta(CompressionCodecFactory & factory); void registerCodecT64(CompressionCodecFactory & factory); void registerCodecDoubleDelta(CompressionCodecFactory & factory); void registerCodecGorilla(CompressionCodecFactory & factory); +void registerCodecEncrypted(CompressionCodecFactory & factory); void registerCodecMultiple(CompressionCodecFactory & factory); CompressionCodecFactory::CompressionCodecFactory() @@ -349,6 +365,7 @@ CompressionCodecFactory::CompressionCodecFactory() registerCodecT64(*this); registerCodecDoubleDelta(*this); registerCodecGorilla(*this); + registerCodecEncrypted(*this); registerCodecMultiple(*this); default_codec = get("LZ4", {}); diff --git a/src/Compression/CompressionInfo.h b/src/Compression/CompressionInfo.h index 58a39bb12a4..869c7110d62 100644 --- a/src/Compression/CompressionInfo.h +++ b/src/Compression/CompressionInfo.h @@ -41,8 +41,9 @@ enum class CompressionMethodByte : uint8_t Multiple = 0x91, Delta = 0x92, T64 = 0x93, - DoubleDelta = 0x94, - Gorilla = 0x95, + DoubleDelta = 0x94, + Gorilla = 0x95, + Encrypted = 0x96, }; } diff --git a/src/Compression/ICompressionCodec.h b/src/Compression/ICompressionCodec.h index 47b4d9bfb43..c49c16d8bad 100644 --- a/src/Compression/ICompressionCodec.h +++ b/src/Compression/ICompressionCodec.h @@ -73,6 +73,9 @@ public: /// Is it a generic compression algorithm like lz4, zstd. Usually it does not make sense to apply generic compression more than single time. virtual bool isGenericCompression() const = 0; + /// If it is a post-processing codec such as encryption. Usually it does not make sense to apply non-post-processing codecs after this. + virtual bool isPostProcessing() const { return false; } + /// It is a codec available only for evaluation purposes and not meant to be used in production. /// It will not be allowed to use unless the user will turn off the safety switch. virtual bool isExperimental() const { return false; } diff --git a/src/Compression/LZ4_decompress_faster.cpp b/src/Compression/LZ4_decompress_faster.cpp index dc293941310..6972457f11b 100644 --- a/src/Compression/LZ4_decompress_faster.cpp +++ b/src/Compression/LZ4_decompress_faster.cpp @@ -412,13 +412,16 @@ template <> void inline copyOverlap<32, false>(UInt8 * op, const UInt8 *& match, /// See also https://stackoverflow.com/a/30669632 template -void NO_INLINE decompressImpl( +bool NO_INLINE decompressImpl( const char * const source, char * const dest, + size_t source_size, size_t dest_size) { const UInt8 * ip = reinterpret_cast(source); UInt8 * op = reinterpret_cast(dest); + const UInt8 * const input_end = ip + source_size; + UInt8 * const output_begin = op; UInt8 * const output_end = op + dest_size; /// Unrolling with clang is doing >10% performance degrade. 
@@ -461,13 +464,19 @@ void NO_INLINE decompressImpl( /// output: xyzHello, w /// ^-op (we will overwrite excessive bytes on next iteration) - wildCopy(op, ip, copy_end); /// Here we can write up to copy_amount - 1 bytes after buffer. + { + auto * target = std::min(copy_end, output_end); + wildCopy(op, ip, target); /// Here we can write up to copy_amount - 1 bytes after buffer. + + if (target == output_end) + return true; + } ip += length; op = copy_end; - if (copy_end >= output_end) - return; + if (unlikely(ip > input_end)) + return false; /// Get match offset. @@ -475,6 +484,9 @@ void NO_INLINE decompressImpl( ip += 2; const UInt8 * match = op - offset; + if (unlikely(match < output_begin)) + return false; + /// Get match length. length = token & 0x0F; @@ -515,7 +527,10 @@ void NO_INLINE decompressImpl( copy(op, match); /// copy_amount + copy_amount - 1 - 4 * 2 bytes after buffer. if (length > copy_amount * 2) - wildCopy(op + copy_amount, match + copy_amount, copy_end); + { + auto * target = std::min(copy_end, output_end); + wildCopy(op + copy_amount, match + copy_amount, target); + } op = copy_end; } @@ -524,7 +539,7 @@ void NO_INLINE decompressImpl( } -void decompress( +bool decompress( const char * const source, char * const dest, size_t source_size, @@ -532,7 +547,7 @@ void decompress( PerformanceStatistics & statistics [[maybe_unused]]) { if (source_size == 0 || dest_size == 0) - return; + return true; /// Don't run timer if the block is too small. if (dest_size >= 32768) @@ -542,24 +557,27 @@ void decompress( /// Run the selected method and measure time. Stopwatch watch; + bool success = true; if (best_variant == 0) - decompressImpl<16, true>(source, dest, dest_size); + success = decompressImpl<16, true>(source, dest, source_size, dest_size); if (best_variant == 1) - decompressImpl<16, false>(source, dest, dest_size); + success = decompressImpl<16, false>(source, dest, source_size, dest_size); if (best_variant == 2) - decompressImpl<8, true>(source, dest, dest_size); + success = decompressImpl<8, true>(source, dest, source_size, dest_size); if (best_variant == 3) - decompressImpl<32, false>(source, dest, dest_size); + success = decompressImpl<32, false>(source, dest, source_size, dest_size); watch.stop(); /// Update performance statistics. statistics.data[best_variant].update(watch.elapsedSeconds(), dest_size); + + return success; } else { - decompressImpl<8, false>(source, dest, dest_size); + return decompressImpl<8, false>(source, dest, source_size, dest_size); } } diff --git a/src/Compression/LZ4_decompress_faster.h b/src/Compression/LZ4_decompress_faster.h index 30a0d7acb22..c1b54cf20c9 100644 --- a/src/Compression/LZ4_decompress_faster.h +++ b/src/Compression/LZ4_decompress_faster.h @@ -122,14 +122,14 @@ struct PerformanceStatistics return choose_method; } - PerformanceStatistics() {} - PerformanceStatistics(ssize_t choose_method_) : choose_method(choose_method_) {} + PerformanceStatistics() = default; + explicit PerformanceStatistics(ssize_t choose_method_) : choose_method(choose_method_) {} }; /** This method dispatch to one of different implementations depending on performance statistics. 
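The hunk above changes the decompression fast path from "trust the stream" to "verify and report": wild copies are clamped to the end of the output buffer, the input cursor is checked against the end of the compressed block, and back-references may not reach before the output already produced. A tiny sketch of the last two checks, with our own names rather than the real decompressImpl internals:

```
#include <cstddef>
#include <cstdint>

// Illustrative helper: reject malformed blocks instead of reading or writing out of bounds.
bool validateMatch(
    const uint8_t * ip, const uint8_t * input_end,
    const uint8_t * op, const uint8_t * output_begin,
    size_t offset)
{
    if (ip > input_end)                                           // ran past the compressed block
        return false;
    if (op - output_begin < static_cast<std::ptrdiff_t>(offset))  // match starts before the output
        return false;
    return true;
}
```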
*/ -void decompress( +bool decompress( const char * const source, char * const dest, size_t source_size, diff --git a/src/Compression/ya.make b/src/Compression/ya.make index 3e0429fa7fc..2dafbda262a 100644 --- a/src/Compression/ya.make +++ b/src/Compression/ya.make @@ -24,6 +24,7 @@ SRCS( CompressedWriteBuffer.cpp CompressionCodecDelta.cpp CompressionCodecDoubleDelta.cpp + CompressionCodecEncrypted.cpp CompressionCodecGorilla.cpp CompressionCodecLZ4.cpp CompressionCodecMultiple.cpp diff --git a/src/Coordination/CoordinationSettings.h b/src/Coordination/CoordinationSettings.h index 7a98e3f200d..2d8da8806cd 100644 --- a/src/Coordination/CoordinationSettings.h +++ b/src/Coordination/CoordinationSettings.h @@ -21,7 +21,7 @@ struct Settings; M(Milliseconds, dead_session_check_period_ms, 500, "How often leader will check sessions to consider them dead and remove", 0) \ M(Milliseconds, heart_beat_interval_ms, 500, "Heartbeat interval between quorum nodes", 0) \ M(Milliseconds, election_timeout_lower_bound_ms, 1000, "Lower bound of election timer (avoid too often leader elections)", 0) \ - M(Milliseconds, election_timeout_upper_bound_ms, 2000, "Lower bound of election timer (avoid too often leader elections)", 0) \ + M(Milliseconds, election_timeout_upper_bound_ms, 2000, "Upper bound of election timer (avoid too often leader elections)", 0) \ M(UInt64, reserved_log_items, 100000, "How many log items to store (don't remove during compaction)", 0) \ M(UInt64, snapshot_distance, 100000, "How many log items we have to collect to write new snapshot", 0) \ M(Bool, auto_forwarding, true, "Allow to forward write requests from followers to leader", 0) \ diff --git a/src/Core/Block.cpp b/src/Core/Block.cpp index fa78f052f37..efd8de43a3c 100644 --- a/src/Core/Block.cpp +++ b/src/Core/Block.cpp @@ -44,21 +44,7 @@ void Block::initializeIndexByName() } -void Block::insert(size_t position, const ColumnWithTypeAndName & elem) -{ - if (position > data.size()) - throw Exception("Position out of bound in Block::insert(), max position = " - + toString(data.size()), ErrorCodes::POSITION_OUT_OF_BOUND); - - for (auto & name_pos : index_by_name) - if (name_pos.second >= position) - ++name_pos.second; - - index_by_name.emplace(elem.name, position); - data.emplace(data.begin() + position, elem); -} - -void Block::insert(size_t position, ColumnWithTypeAndName && elem) +void Block::insert(size_t position, ColumnWithTypeAndName elem) { if (position > data.size()) throw Exception("Position out of bound in Block::insert(), max position = " @@ -73,26 +59,14 @@ void Block::insert(size_t position, ColumnWithTypeAndName && elem) } -void Block::insert(const ColumnWithTypeAndName & elem) -{ - index_by_name.emplace(elem.name, data.size()); - data.emplace_back(elem); -} - -void Block::insert(ColumnWithTypeAndName && elem) +void Block::insert(ColumnWithTypeAndName elem) { index_by_name.emplace(elem.name, data.size()); data.emplace_back(std::move(elem)); } -void Block::insertUnique(const ColumnWithTypeAndName & elem) -{ - if (index_by_name.end() == index_by_name.find(elem.name)) - insert(elem); -} - -void Block::insertUnique(ColumnWithTypeAndName && elem) +void Block::insertUnique(ColumnWithTypeAndName elem) { if (index_by_name.end() == index_by_name.find(elem.name)) insert(std::move(elem)); @@ -369,15 +343,19 @@ void Block::setColumns(const Columns & columns) } -void Block::setColumn(size_t position, ColumnWithTypeAndName && column) +void Block::setColumn(size_t position, ColumnWithTypeAndName column) { if (position >= data.size()) 
throw Exception(ErrorCodes::POSITION_OUT_OF_BOUND, "Position {} out of bound in Block::setColumn(), max position {}", position, toString(data.size())); - data[position].name = std::move(column.name); - data[position].type = std::move(column.type); - data[position].column = std::move(column.column); + if (data[position].name != column.name) + { + index_by_name.erase(data[position].name); + index_by_name.emplace(column.name, position); + } + + data[position] = std::move(column); } @@ -436,7 +414,7 @@ Block Block::sortColumns() const Block sorted_block; /// std::unordered_map (index_by_name) cannot be used to guarantee the sort order - std::vector sorted_index_by_name(index_by_name.size()); + std::vector sorted_index_by_name(index_by_name.size()); { size_t i = 0; for (auto it = index_by_name.begin(); it != index_by_name.end(); ++it) diff --git a/src/Core/Block.h b/src/Core/Block.h index a21bd290571..a2d91190795 100644 --- a/src/Core/Block.h +++ b/src/Core/Block.h @@ -39,14 +39,11 @@ public: Block(const ColumnsWithTypeAndName & data_); /// insert the column at the specified position - void insert(size_t position, const ColumnWithTypeAndName & elem); - void insert(size_t position, ColumnWithTypeAndName && elem); + void insert(size_t position, ColumnWithTypeAndName elem); /// insert the column to the end - void insert(const ColumnWithTypeAndName & elem); - void insert(ColumnWithTypeAndName && elem); + void insert(ColumnWithTypeAndName elem); /// insert the column to the end, if there is no column with that name yet - void insertUnique(const ColumnWithTypeAndName & elem); - void insertUnique(ColumnWithTypeAndName && elem); + void insertUnique(ColumnWithTypeAndName elem); /// remove the column at the specified position void erase(size_t position); /// remove the columns at the specified positions @@ -68,7 +65,7 @@ public: const_cast(this)->findByName(name)); } - const ColumnWithTypeAndName* findByName(const std::string & name) const; + const ColumnWithTypeAndName * findByName(const std::string & name) const; ColumnWithTypeAndName & getByName(const std::string & name) { @@ -125,7 +122,7 @@ public: Columns getColumns() const; void setColumns(const Columns & columns); - void setColumn(size_t position, ColumnWithTypeAndName && column); + void setColumn(size_t position, ColumnWithTypeAndName column); Block cloneWithColumns(const Columns & columns) const; Block cloneWithoutColumns() const; Block cloneWithCutColumns(size_t start, size_t length) const; diff --git a/src/Core/Defines.h b/src/Core/Defines.h index 5751f4beeb7..8244a0fc815 100644 --- a/src/Core/Defines.h +++ b/src/Core/Defines.h @@ -11,6 +11,7 @@ #define DBMS_DEFAULT_CONNECT_TIMEOUT_WITH_FAILOVER_SECURE_MS 100 #define DBMS_DEFAULT_SEND_TIMEOUT_SEC 300 #define DBMS_DEFAULT_RECEIVE_TIMEOUT_SEC 300 +#define DBMS_DEFAULT_DRAIN_TIMEOUT_SEC 3 /// Timeouts for hedged requests. #define DBMS_DEFAULT_HEDGED_CONNECTION_TIMEOUT_MS 100 #define DBMS_DEFAULT_RECEIVE_DATA_TIMEOUT_MS 2000 @@ -70,6 +71,7 @@ /// Minimum revision supporting SettingsBinaryFormat::STRINGS. 
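Two of the Block changes above are easy to miss in the diff: the insert()/setColumn() overload pairs collapse into single by-value signatures (callers either copy or move into the argument), and setColumn() now keeps the name-to-position index in sync when the replacement column is renamed. A stripped-down illustration with stand-in types, not DB::Block itself:

```
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct NamedValue { std::string name; std::string value; };

class MiniBlock
{
public:
    void insert(NamedValue elem)                 // by value: lvalues copy in, rvalues move in
    {
        index_by_name.emplace(elem.name, data.size());
        data.emplace_back(std::move(elem));
    }

    void setColumn(size_t position, NamedValue column)
    {
        if (data[position].name != column.name)  // keep the index consistent on rename
        {
            index_by_name.erase(data[position].name);
            index_by_name.emplace(column.name, position);
        }
        data[position] = std::move(column);
    }

private:
    std::vector<NamedValue> data;
    std::unordered_map<std::string, size_t> index_by_name;
};
```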
#define DBMS_MIN_REVISION_WITH_SETTINGS_SERIALIZED_AS_STRINGS 54429 +#define DBMS_MIN_REVISION_WITH_SCALARS 54429 /// Minimum revision supporting OpenTelemetry #define DBMS_MIN_REVISION_WITH_OPENTELEMETRY 54442 diff --git a/src/Core/ExternalTable.cpp b/src/Core/ExternalTable.cpp index 87945dd1ce6..9b53cd79a84 100644 --- a/src/Core/ExternalTable.cpp +++ b/src/Core/ExternalTable.cpp @@ -15,6 +15,7 @@ #include #include #include +#include #include #include @@ -159,12 +160,11 @@ void ExternalTablesHandler::handlePart(const Poco::Net::MessageHeader & header, auto temporary_table = TemporaryTableHolder(getContext(), ColumnsDescription{columns}, {}); auto storage = temporary_table.getTable(); getContext()->addExternalTable(data->table_name, std::move(temporary_table)); - BlockOutputStreamPtr output = storage->write(ASTPtr(), storage->getInMemoryMetadataPtr(), getContext()); + auto sink = storage->write(ASTPtr(), storage->getInMemoryMetadataPtr(), getContext()); /// Write data data->pipe->resize(1); - auto sink = std::make_shared(std::move(output)); connect(*data->pipe->getOutputPort(0), sink->getPort()); auto processors = Pipe::detachProcessors(std::move(*data->pipe)); diff --git a/src/Core/Field.cpp b/src/Core/Field.cpp index e625c92f826..b7b03951ac9 100644 --- a/src/Core/Field.cpp +++ b/src/Core/Field.cpp @@ -455,6 +455,16 @@ inline void writeText(const Null &, WriteBuffer & buf) writeText(std::string("NULL"), buf); } +inline void writeText(const NegativeInfinity &, WriteBuffer & buf) +{ + writeText(std::string("-Inf"), buf); +} + +inline void writeText(const PositiveInfinity &, WriteBuffer & buf) +{ + writeText(std::string("+Inf"), buf); +} + String toString(const Field & x) { return Field::dispatch( diff --git a/src/Core/Field.h b/src/Core/Field.h index 23569f5f9f1..744675d6e86 100644 --- a/src/Core/Field.h +++ b/src/Core/Field.h @@ -218,6 +218,8 @@ template <> struct NearestFieldTypeImpl { using Type = Tuple; }; template <> struct NearestFieldTypeImpl { using Type = Map; }; template <> struct NearestFieldTypeImpl { using Type = UInt64; }; template <> struct NearestFieldTypeImpl { using Type = Null; }; +template <> struct NearestFieldTypeImpl { using Type = NegativeInfinity; }; +template <> struct NearestFieldTypeImpl { using Type = PositiveInfinity; }; template <> struct NearestFieldTypeImpl { using Type = AggregateFunctionStateData; }; @@ -269,6 +271,10 @@ public: Int256 = 25, Map = 26, UUID = 27, + + // Special types for index analysis + NegativeInfinity = 254, + PositiveInfinity = 255, }; static const char * toString(Which which) @@ -276,6 +282,8 @@ public: switch (which) { case Null: return "Null"; + case NegativeInfinity: return "-Inf"; + case PositiveInfinity: return "+Inf"; case UInt64: return "UInt64"; case UInt128: return "UInt128"; case UInt256: return "UInt256"; @@ -404,7 +412,10 @@ public: Types::Which getType() const { return which; } const char * getTypeName() const { return Types::toString(which); } - bool isNull() const { return which == Types::Null; } + // Non-valued field are all denoted as Null + bool isNull() const { return which == Types::Null || which == Types::NegativeInfinity || which == Types::PositiveInfinity; } + bool isNegativeInfinity() const { return which == Types::NegativeInfinity; } + bool isPositiveInfinity() const { return which == Types::PositiveInfinity; } template @@ -459,7 +470,10 @@ public: switch (which) { - case Types::Null: return false; + case Types::Null: + case Types::NegativeInfinity: + case Types::PositiveInfinity: + return false; case 
Types::UInt64: return get() < rhs.get(); case Types::UInt128: return get() < rhs.get(); case Types::UInt256: return get() < rhs.get(); @@ -496,7 +510,10 @@ public: switch (which) { - case Types::Null: return true; + case Types::Null: + case Types::NegativeInfinity: + case Types::PositiveInfinity: + return true; case Types::UInt64: return get() <= rhs.get(); case Types::UInt128: return get() <= rhs.get(); case Types::UInt256: return get() <= rhs.get(); @@ -533,8 +550,11 @@ public: switch (which) { - case Types::Null: return true; - case Types::UInt64: return get() == rhs.get(); + case Types::Null: + case Types::NegativeInfinity: + case Types::PositiveInfinity: + return true; + case Types::UInt64: return get() == rhs.get(); case Types::Int64: return get() == rhs.get(); case Types::Float64: { @@ -573,6 +593,8 @@ public: switch (field.which) { case Types::Null: return f(field.template get()); + case Types::NegativeInfinity: return f(field.template get()); + case Types::PositiveInfinity: return f(field.template get()); // gcc 8.2.1 #if !defined(__clang__) #pragma GCC diagnostic push @@ -731,6 +753,8 @@ using Row = std::vector; template <> struct Field::TypeToEnum { static const Types::Which value = Types::Null; }; +template <> struct Field::TypeToEnum { static const Types::Which value = Types::NegativeInfinity; }; +template <> struct Field::TypeToEnum { static const Types::Which value = Types::PositiveInfinity; }; template <> struct Field::TypeToEnum { static const Types::Which value = Types::UInt64; }; template <> struct Field::TypeToEnum { static const Types::Which value = Types::UInt128; }; template <> struct Field::TypeToEnum { static const Types::Which value = Types::UInt256; }; @@ -751,6 +775,8 @@ template <> struct Field::TypeToEnum>{ static const Typ template <> struct Field::TypeToEnum{ static const Types::Which value = Types::AggregateFunctionState; }; template <> struct Field::EnumToType { using Type = Null; }; +template <> struct Field::EnumToType { using Type = NegativeInfinity; }; +template <> struct Field::EnumToType { using Type = PositiveInfinity; }; template <> struct Field::EnumToType { using Type = UInt64; }; template <> struct Field::EnumToType { using Type = UInt128; }; template <> struct Field::EnumToType { using Type = UInt256; }; diff --git a/src/Core/MySQL/MySQLClient.cpp b/src/Core/MySQL/MySQLClient.cpp index d103ea873e5..26535f05be7 100644 --- a/src/Core/MySQL/MySQLClient.cpp +++ b/src/Core/MySQL/MySQLClient.cpp @@ -24,16 +24,15 @@ namespace ErrorCodes } MySQLClient::MySQLClient(const String & host_, UInt16 port_, const String & user_, const String & password_) - : host(host_), port(port_), user(user_), password(std::move(password_)) + : host(host_), port(port_), user(user_), password(std::move(password_)), + client_capabilities(CLIENT_PROTOCOL_41 | CLIENT_PLUGIN_AUTH | CLIENT_SECURE_CONNECTION) { - mysql_context.client_capabilities = CLIENT_PROTOCOL_41 | CLIENT_PLUGIN_AUTH | CLIENT_SECURE_CONNECTION; } MySQLClient::MySQLClient(MySQLClient && other) : host(std::move(other.host)), port(other.port), user(std::move(other.user)), password(std::move(other.password)) - , mysql_context(other.mysql_context) + , client_capabilities(other.client_capabilities) { - mysql_context.sequence_id = 0; } void MySQLClient::connect() @@ -57,7 +56,8 @@ void MySQLClient::connect() in = std::make_shared(*socket); out = std::make_shared(*socket); - packet_endpoint = mysql_context.makeEndpoint(*in, *out); + packet_endpoint = MySQLProtocol::PacketEndpoint::create(*in, *out, sequence_id); + 
handshake(); } @@ -69,7 +69,7 @@ void MySQLClient::disconnect() socket->close(); socket = nullptr; connected = false; - mysql_context.sequence_id = 0; + sequence_id = 0; } /// https://dev.mysql.com/doc/internals/en/connection-phase-packets.html @@ -88,10 +88,10 @@ void MySQLClient::handshake() String auth_plugin_data = native41.getAuthPluginData(); HandshakeResponse handshake_response( - mysql_context.client_capabilities, MAX_PACKET_LENGTH, charset_utf8, user, "", auth_plugin_data, mysql_native_password); + client_capabilities, MAX_PACKET_LENGTH, charset_utf8, user, "", auth_plugin_data, mysql_native_password); packet_endpoint->sendPacket(handshake_response, true); - ResponsePacket packet_response(mysql_context.client_capabilities, true); + ResponsePacket packet_response(client_capabilities, true); packet_endpoint->receivePacket(packet_response); packet_endpoint->resetSequenceId(); @@ -106,7 +106,7 @@ void MySQLClient::writeCommand(char command, String query) WriteCommand write_command(command, query); packet_endpoint->sendPacket(write_command, true); - ResponsePacket packet_response(mysql_context.client_capabilities); + ResponsePacket packet_response(client_capabilities); packet_endpoint->receivePacket(packet_response); switch (packet_response.getType()) { @@ -125,7 +125,7 @@ void MySQLClient::registerSlaveOnMaster(UInt32 slave_id) RegisterSlave register_slave(slave_id); packet_endpoint->sendPacket(register_slave, true); - ResponsePacket packet_response(mysql_context.client_capabilities); + ResponsePacket packet_response(client_capabilities); packet_endpoint->receivePacket(packet_response); packet_endpoint->resetSequenceId(); if (packet_response.getType() == PACKET_ERR) diff --git a/src/Core/MySQL/MySQLClient.h b/src/Core/MySQL/MySQLClient.h index 6144b14690d..2c93fc888a3 100644 --- a/src/Core/MySQL/MySQLClient.h +++ b/src/Core/MySQL/MySQLClient.h @@ -45,7 +45,9 @@ private: String password; bool connected = false; - MySQLWireContext mysql_context; + uint8_t sequence_id = 0; + uint32_t client_capabilities = 0; + const UInt8 charset_utf8 = 33; const String mysql_native_password = "mysql_native_password"; diff --git a/src/Core/MySQL/PacketEndpoint.cpp b/src/Core/MySQL/PacketEndpoint.cpp index fa1d60034d2..0bc5c585516 100644 --- a/src/Core/MySQL/PacketEndpoint.cpp +++ b/src/Core/MySQL/PacketEndpoint.cpp @@ -68,15 +68,4 @@ String PacketEndpoint::packetToText(const String & payload) } - -MySQLProtocol::PacketEndpointPtr MySQLWireContext::makeEndpoint(WriteBuffer & out) -{ - return MySQLProtocol::PacketEndpoint::create(out, sequence_id); -} - -MySQLProtocol::PacketEndpointPtr MySQLWireContext::makeEndpoint(ReadBuffer & in, WriteBuffer & out) -{ - return MySQLProtocol::PacketEndpoint::create(in, out, sequence_id); -} - } diff --git a/src/Core/MySQL/PacketEndpoint.h b/src/Core/MySQL/PacketEndpoint.h index 3aa76ac93de..df81f49fefb 100644 --- a/src/Core/MySQL/PacketEndpoint.h +++ b/src/Core/MySQL/PacketEndpoint.h @@ -58,14 +58,4 @@ using PacketEndpointPtr = std::shared_ptr; } -struct MySQLWireContext -{ - uint8_t sequence_id = 0; - uint32_t client_capabilities = 0; - size_t max_packet_size = 0; - - MySQLProtocol::PacketEndpointPtr makeEndpoint(WriteBuffer & out); - MySQLProtocol::PacketEndpointPtr makeEndpoint(ReadBuffer & in, WriteBuffer & out); -}; - } diff --git a/src/Core/NamesAndTypes.cpp b/src/Core/NamesAndTypes.cpp index 57d29c96c53..91191c73fd0 100644 --- a/src/Core/NamesAndTypes.cpp +++ b/src/Core/NamesAndTypes.cpp @@ -6,6 +6,7 @@ #include #include #include +#include namespace DB @@ -161,18 
+162,24 @@ NamesAndTypesList NamesAndTypesList::filter(const Names & names) const NamesAndTypesList NamesAndTypesList::addTypes(const Names & names) const { - std::unordered_map self_columns; + /// NOTE: It's better to make a map in `IStorage` than to create it here every time again. +#if !defined(ARCADIA_BUILD) + google::dense_hash_map types; +#else + google::sparsehash::dense_hash_map types; +#endif + types.set_empty_key(StringRef()); for (const auto & column : *this) - self_columns[column.name] = &column; + types[column.name] = &column.type; NamesAndTypesList res; for (const String & name : names) { - auto it = self_columns.find(name); - if (it == self_columns.end()) + auto it = types.find(name); + if (it == types.end()) throw Exception("No column " + name, ErrorCodes::THERE_IS_NO_COLUMN); - res.emplace_back(*it->second); + res.emplace_back(name, *it->second); } return res; diff --git a/src/Core/PostgreSQL/insertPostgreSQLValue.cpp b/src/Core/PostgreSQL/insertPostgreSQLValue.cpp index e606300fc37..19560cec9ea 100644 --- a/src/Core/PostgreSQL/insertPostgreSQLValue.cpp +++ b/src/Core/PostgreSQL/insertPostgreSQLValue.cpp @@ -9,6 +9,7 @@ #include #include #include +#include #include #include #include @@ -102,7 +103,16 @@ void insertPostgreSQLValue( assert_cast(column).insertValue(time); break; } - case ExternalResultDescription::ValueType::vtDateTime64:[[fallthrough]]; + case ExternalResultDescription::ValueType::vtDateTime64: + { + ReadBufferFromString in(value); + DateTime64 time = 0; + readDateTime64Text(time, 6, in, assert_cast(data_type.get())->getTimeZone()); + if (time < 0) + time = 0; + assert_cast &>(column).insertValue(time); + break; + } case ExternalResultDescription::ValueType::vtDecimal32: [[fallthrough]]; case ExternalResultDescription::ValueType::vtDecimal64: [[fallthrough]]; case ExternalResultDescription::ValueType::vtDecimal128: [[fallthrough]]; @@ -204,6 +214,18 @@ void preparePostgreSQLArrayInfo( ReadBufferFromString in(field); time_t time = 0; readDateTimeText(time, in, assert_cast(nested.get())->getTimeZone()); + if (time < 0) + time = 0; + return time; + }; + else if (which.isDateTime64()) + parser = [nested](std::string & field) -> Field + { + ReadBufferFromString in(field); + DateTime64 time = 0; + readDateTime64Text(time, 6, in, assert_cast(nested.get())->getTimeZone()); + if (time < 0) + time = 0; return time; }; else if (which.isDecimal32()) diff --git a/src/Core/PostgreSQL/insertPostgreSQLValue.h b/src/Core/PostgreSQL/insertPostgreSQLValue.h index 7acba4f09bd..4ed3eb95aac 100644 --- a/src/Core/PostgreSQL/insertPostgreSQLValue.h +++ b/src/Core/PostgreSQL/insertPostgreSQLValue.h @@ -7,7 +7,6 @@ #if USE_LIBPQXX #include -#include #include #include diff --git a/src/Core/Settings.h b/src/Core/Settings.h index fcfeb5d9543..7b036a976c8 100644 --- a/src/Core/Settings.h +++ b/src/Core/Settings.h @@ -54,10 +54,11 @@ class IColumn; M(Milliseconds, connect_timeout_with_failover_secure_ms, DBMS_DEFAULT_CONNECT_TIMEOUT_WITH_FAILOVER_SECURE_MS, "Connection timeout for selecting first healthy replica (for secure connections).", 0) \ M(Seconds, receive_timeout, DBMS_DEFAULT_RECEIVE_TIMEOUT_SEC, "", 0) \ M(Seconds, send_timeout, DBMS_DEFAULT_SEND_TIMEOUT_SEC, "", 0) \ + M(Seconds, drain_timeout, DBMS_DEFAULT_DRAIN_TIMEOUT_SEC, "", 0) \ M(Seconds, tcp_keep_alive_timeout, 0, "The time in seconds the connection needs to remain idle before TCP starts sending keepalive probes", 0) \ M(Milliseconds, hedged_connection_timeout_ms, DBMS_DEFAULT_HEDGED_CONNECTION_TIMEOUT_MS, "Connection 
timeout for establishing connection with replica for Hedged requests", 0) \ M(Milliseconds, receive_data_timeout_ms, DBMS_DEFAULT_RECEIVE_DATA_TIMEOUT_MS, "Connection timeout for receiving first packet of data or packet with positive progress from replica", 0) \ - M(Bool, use_hedged_requests, false, "Use hedged requests for distributed queries", 0) \ + M(Bool, use_hedged_requests, true, "Use hedged requests for distributed queries", 0) \ M(Bool, allow_changing_replica_until_first_data_packet, false, "Allow HedgedConnections to change replica until receiving first data packet", 0) \ M(Milliseconds, queue_max_wait_ms, 0, "The wait time in the request queue, if the number of concurrent requests exceeds the maximum.", 0) \ M(Milliseconds, connection_pool_max_wait_ms, 0, "The wait time when the connection pool is full.", 0) \ @@ -108,7 +109,7 @@ class IColumn; M(Bool, compile_expressions, true, "Compile some scalar functions and operators to native code.", 0) \ M(UInt64, min_count_to_compile_expression, 3, "The number of identical expressions before they are JIT-compiled", 0) \ M(Bool, compile_aggregate_expressions, true, "Compile aggregate functions to native code.", 0) \ - M(UInt64, min_count_to_compile_aggregate_expression, 0, "The number of identical aggregate expressions before they are JIT-compiled", 0) \ + M(UInt64, min_count_to_compile_aggregate_expression, 3, "The number of identical aggregate expressions before they are JIT-compiled", 0) \ M(UInt64, group_by_two_level_threshold, 100000, "From what number of keys, a two-level aggregation starts. 0 - the threshold is not set.", 0) \ M(UInt64, group_by_two_level_threshold_bytes, 50000000, "From what size of the aggregation state in bytes, a two-level aggregation begins to be used. 0 - the threshold is not set. Two-level aggregation is used when at least one of the thresholds is triggered.", 0) \ M(Bool, distributed_aggregation_memory_efficient, true, "Is the memory-saving mode of distributed aggregation enabled.", 0) \ @@ -235,6 +236,8 @@ class IColumn; M(Milliseconds, sleep_in_send_tables_status_ms, 0, "Time to sleep in sending tables status response in TCPHandler", 0) \ M(Milliseconds, sleep_in_send_data_ms, 0, "Time to sleep in sending data in TCPHandler", 0) \ M(UInt64, unknown_packet_in_send_data, 0, "Send unknown packet instead of data Nth data packet", 0) \ + /** Settings for testing connection collector */ \ + M(Milliseconds, sleep_in_receive_cancel_ms, 0, "Time to sleep in receiving cancel in TCPHandler", 0) \ \ M(Bool, insert_allow_materialized_columns, 0, "If setting is enabled, Allow materialized columns in INSERT.", 0) \ M(Seconds, http_connection_timeout, DEFAULT_HTTP_READ_BUFFER_CONNECTION_TIMEOUT, "HTTP connection timeout.", 0) \ @@ -433,7 +436,7 @@ class IColumn; M(Bool, data_type_default_nullable, false, "Data types without NULL or NOT NULL will make Nullable", 0) \ M(Bool, cast_keep_nullable, false, "CAST operator keep Nullable for result data type", 0) \ M(Bool, alter_partition_verbose_result, false, "Output information about affected parts. 
Currently works only for FREEZE and ATTACH commands.", 0) \ - M(Bool, allow_experimental_database_materialize_mysql, false, "Allow to create database with Engine=MaterializeMySQL(...).", 0) \ + M(Bool, allow_experimental_database_materialized_mysql, false, "Allow to create database with Engine=MaterializedMySQL(...).", 0) \ M(Bool, allow_experimental_database_materialized_postgresql, false, "Allow to create database with Engine=MaterializedPostgreSQL(...).", 0) \ M(Bool, system_events_show_zero_values, false, "Include all metrics, even with zero values", 0) \ M(MySQLDataTypesSupport, mysql_datatypes_support_level, 0, "Which MySQL types should be converted to corresponding ClickHouse types (rather than being represented as String). Can be empty or any combination of 'decimal' or 'datetime64'. When empty MySQL's DECIMAL and DATETIME/TIMESTAMP with non-zero precision are seen as String on ClickHouse's side.", 0) \ @@ -448,7 +451,6 @@ class IColumn; M(Bool, optimize_skip_merged_partitions, false, "Skip partitions with one part with level > 0 in optimize final", 0) \ M(Bool, optimize_on_insert, true, "Do the same transformation for inserted block of data as if merge was done on this block.", 0) \ M(Bool, allow_experimental_map_type, true, "Obsolete setting, does nothing.", 0) \ - M(Bool, allow_experimental_window_functions, false, "Allow experimental window functions", 0) \ M(Bool, allow_experimental_projection_optimization, false, "Enable projection optimization when processing SELECT queries", 0) \ M(Bool, force_optimize_projection, false, "If projection optimization is enabled, SELECT queries need to use projection", 0) \ M(Bool, async_socket_for_remote, true, "Asynchronously read from socket executing remote query", 0) \ @@ -470,8 +472,8 @@ class IColumn; M(Bool, database_replicated_always_detach_permanently, false, "Execute DETACH TABLE as DETACH TABLE PERMANENTLY if database engine is Replicated", 0) \ M(DistributedDDLOutputMode, distributed_ddl_output_mode, DistributedDDLOutputMode::THROW, "Format of distributed DDL query result", 0) \ M(UInt64, distributed_ddl_entry_format_version, 1, "Version of DDL entry to write into ZooKeeper", 0) \ - M(UInt64, external_storage_max_read_rows, 0, "Limit maximum number of rows when table with external engine should flush history data. Now supported only for MySQL table engine, database engine, dictionary and MaterializeMySQL. If equal to 0, this setting is disabled", 0) \ - M(UInt64, external_storage_max_read_bytes, 0, "Limit maximum number of bytes when table with external engine should flush history data. Now supported only for MySQL table engine, database engine, dictionary and MaterializeMySQL. If equal to 0, this setting is disabled", 0) \ + M(UInt64, external_storage_max_read_rows, 0, "Limit maximum number of rows when table with external engine should flush history data. Now supported only for MySQL table engine, database engine, dictionary and MaterializedMySQL. If equal to 0, this setting is disabled", 0) \ + M(UInt64, external_storage_max_read_bytes, 0, "Limit maximum number of bytes when table with external engine should flush history data. Now supported only for MySQL table engine, database engine, dictionary and MaterializedMySQL. If equal to 0, this setting is disabled", 0) \ M(UnionMode, union_default_mode, UnionMode::Unspecified, "Set default Union Mode in SelectWithUnion query. Possible values: empty string, 'ALL', 'DISTINCT'. 
If empty, query without Union Mode will throw exception.", 0) \ M(Bool, optimize_aggregators_of_group_by_keys, true, "Eliminates min/max/any/anyLast aggregators of GROUP BY keys in SELECT section", 0) \ M(Bool, optimize_group_by_function_keys, true, "Eliminates functions of other keys in GROUP BY section", 0) \ @@ -480,12 +482,16 @@ class IColumn; M(Bool, query_plan_enable_optimizations, true, "Apply optimizations to query plan", 0) \ M(UInt64, query_plan_max_optimizations_to_apply, 10000, "Limit the total number of optimizations applied to query plan. If zero, ignored. If limit reached, throw exception", 0) \ M(Bool, query_plan_filter_push_down, true, "Allow to push down filter by predicate query plan step", 0) \ + M(UInt64, regexp_max_matches_per_row, 1000, "Max matches of any single regexp per row, used to safeguard 'extractAllGroupsHorizontal' against consuming too much memory with greedy RE.", 0) \ \ M(UInt64, limit, 0, "Limit on read rows from the most 'end' result for select query, default 0 means no limit length", 0) \ M(UInt64, offset, 0, "Offset on read rows from the most 'end' result for select query", 0) \ \ + M(UInt64, function_range_max_elements_in_block, 500000000, "Maximum number of values generated by function 'range' per block of data (sum of array sizes for every row in a block, see also 'max_block_size' and 'min_insert_block_size_rows'). It is a safety threshold.", 0) \ + \ /** Experimental functions */ \ M(Bool, allow_experimental_funnel_functions, false, "Enable experimental functions for funnel analysis.", 0) \ + M(Bool, allow_experimental_nlp_functions, false, "Enable experimental functions for natural language processing.", 0) \ \ \ /** Obsolete settings that do nothing but left for compatibility reasons. Remove each one after half a year of obsolescence. 
*/ \ @@ -526,6 +532,7 @@ class IColumn; M(Bool, input_format_values_accurate_types_of_literals, true, "For Values format: when parsing and interpreting expressions using template, check actual type of literal to avoid possible overflow and precision issues.", 0) \ M(Bool, input_format_avro_allow_missing_fields, false, "For Avro/AvroConfluent format: when field is not found in schema use default value instead of error", 0) \ M(URI, format_avro_schema_registry_url, "", "For AvroConfluent format: Confluent Schema Registry URL.", 0) \ + M(String, output_format_avro_string_column_pattern, "", "For Avro format: regexp of String columns to select as AVRO string.", 0) \ \ M(Bool, output_format_json_quote_64bit_integers, true, "Controls quoting of 64-bit integers in JSON output format.", 0) \ \ diff --git a/src/Core/SortDescription.h b/src/Core/SortDescription.h index 41b4e5b6b32..e1653b9102b 100644 --- a/src/Core/SortDescription.h +++ b/src/Core/SortDescription.h @@ -42,15 +42,15 @@ struct SortColumnDescription bool with_fill; FillColumnDescription fill_description; - SortColumnDescription( - size_t column_number_, int direction_, int nulls_direction_, + explicit SortColumnDescription( + size_t column_number_, int direction_ = 1, int nulls_direction_ = 1, const std::shared_ptr & collator_ = nullptr, bool with_fill_ = false, const FillColumnDescription & fill_description_ = {}) : column_number(column_number_), direction(direction_), nulls_direction(nulls_direction_), collator(collator_) , with_fill(with_fill_), fill_description(fill_description_) {} - SortColumnDescription( - const std::string & column_name_, int direction_, int nulls_direction_, + explicit SortColumnDescription( + const std::string & column_name_, int direction_ = 1, int nulls_direction_ = 1, const std::shared_ptr & collator_ = nullptr, bool with_fill_ = false, const FillColumnDescription & fill_description_ = {}) : column_name(column_name_), column_number(0), direction(direction_), nulls_direction(nulls_direction_) diff --git a/src/Core/Types.h b/src/Core/Types.h index 5496f09f3d3..b5f3c1bff9f 100644 --- a/src/Core/Types.h +++ b/src/Core/Types.h @@ -14,6 +14,8 @@ namespace DB /// Data types for representing elementary values from a database in RAM. 
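The new NegativeInfinity/PositiveInfinity field kinds above exist for index analysis: a key range can be unbounded on either side without carrying a separate flag. A minimal sketch of that idea, assuming a simplified KeyBound type rather than the real Field/KeyCondition machinery:

```
// Illustrative only; the real code stores these markers inside DB::Field.
struct KeyBound
{
    enum class Kind { NegativeInfinity, Finite, PositiveInfinity } kind = Kind::Finite;
    long long value = 0;   // meaningful only when kind == Finite
};

/// Does a finite key fall inside the closed range [left, right]?
bool contains(const KeyBound & left, const KeyBound & right, long long key)
{
    const bool above_left =
        left.kind == KeyBound::Kind::NegativeInfinity
        || (left.kind == KeyBound::Kind::Finite && left.value <= key);
    const bool below_right =
        right.kind == KeyBound::Kind::PositiveInfinity
        || (right.kind == KeyBound::Kind::Finite && right.value >= key);
    return above_left && below_right;
}
```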
struct Null {}; +struct NegativeInfinity {}; +struct PositiveInfinity {}; /// Ignore strange gcc warning https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55776 #if !defined(__clang__) diff --git a/src/Core/callOnTypeIndex.h b/src/Core/callOnTypeIndex.h index b0420073998..d3348466369 100644 --- a/src/Core/callOnTypeIndex.h +++ b/src/Core/callOnTypeIndex.h @@ -73,6 +73,7 @@ bool callOnBasicType(TypeIndex number, F && f) switch (number) { case TypeIndex::Date: return f(TypePair()); + case TypeIndex::Date32: return f(TypePair()); case TypeIndex::DateTime: return f(TypePair()); case TypeIndex::DateTime64: return f(TypePair()); default: @@ -142,6 +143,7 @@ inline bool callOnBasicTypes(TypeIndex type_num1, TypeIndex type_num2, F && f) switch (type_num1) { case TypeIndex::Date: return callOnBasicType(type_num2, std::forward(f)); + case TypeIndex::Date32: return callOnBasicType(type_num2, std::forward(f)); case TypeIndex::DateTime: return callOnBasicType(type_num2, std::forward(f)); case TypeIndex::DateTime64: return callOnBasicType(type_num2, std::forward(f)); default: @@ -154,6 +156,7 @@ inline bool callOnBasicTypes(TypeIndex type_num1, TypeIndex type_num2, F && f) class DataTypeDate; +class DataTypeDate32; class DataTypeString; class DataTypeFixedString; class DataTypeUUID; @@ -192,7 +195,7 @@ bool callOnIndexAndDataType(TypeIndex number, F && f, ExtraArgs && ... args) case TypeIndex::Decimal256: return f(TypePair, T>(), std::forward(args)...); case TypeIndex::Date: return f(TypePair(), std::forward(args)...); - case TypeIndex::Date32: return f(TypePair(), std::forward(args)...); + case TypeIndex::Date32: return f(TypePair(), std::forward(args)...); case TypeIndex::DateTime: return f(TypePair(), std::forward(args)...); case TypeIndex::DateTime64: return f(TypePair(), std::forward(args)...); diff --git a/src/Core/config_core.h.in b/src/Core/config_core.h.in index 45cbc6efe19..cc9c993b205 100644 --- a/src/Core/config_core.h.in +++ b/src/Core/config_core.h.in @@ -15,4 +15,5 @@ #cmakedefine01 USE_LIBPQXX #cmakedefine01 USE_SQLITE #cmakedefine01 USE_NURAFT +#cmakedefine01 USE_NLP #cmakedefine01 USE_KRB5 diff --git a/src/Core/iostream_debug_helpers.cpp b/src/Core/iostream_debug_helpers.cpp index 8ec06af049e..38e61ac4fca 100644 --- a/src/Core/iostream_debug_helpers.cpp +++ b/src/Core/iostream_debug_helpers.cpp @@ -6,7 +6,6 @@ #include #include #include -#include #include #include #include @@ -28,12 +27,6 @@ std::ostream & operator<< (std::ostream & stream, const Field & what) return stream; } -std::ostream & operator<<(std::ostream & stream, const IBlockInputStream & what) -{ - stream << "IBlockInputStream(name = " << what.getName() << ")"; - return stream; -} - std::ostream & operator<<(std::ostream & stream, const NameAndTypePair & what) { stream << "NameAndTypePair(name = " << what.name << ", type = " << what.type << ")"; diff --git a/src/Core/iostream_debug_helpers.h b/src/Core/iostream_debug_helpers.h index 7568fa6e445..f57788b63d8 100644 --- a/src/Core/iostream_debug_helpers.h +++ b/src/Core/iostream_debug_helpers.h @@ -10,9 +10,6 @@ class Field; template >> std::ostream & operator<<(std::ostream & stream, const T & what); -class IBlockInputStream; -std::ostream & operator<<(std::ostream & stream, const IBlockInputStream & what); - struct NameAndTypePair; std::ostream & operator<<(std::ostream & stream, const NameAndTypePair & what); diff --git a/src/Core/ya.make b/src/Core/ya.make index 6946d7a47bb..d1e352ee846 100644 --- a/src/Core/ya.make +++ b/src/Core/ya.make @@ -31,10 +31,6 @@ SRCS( 
MySQL/PacketsProtocolText.cpp MySQL/PacketsReplication.cpp NamesAndTypes.cpp - PostgreSQL/Connection.cpp - PostgreSQL/PoolWithFailover.cpp - PostgreSQL/Utils.cpp - PostgreSQL/insertPostgreSQLValue.cpp PostgreSQLProtocol.cpp QueryProcessingStage.cpp Settings.cpp diff --git a/src/DataStreams/BlocksListBlockInputStream.h b/src/DataStreams/BlocksListBlockInputStream.h deleted file mode 100644 index de287c2dc5e..00000000000 --- a/src/DataStreams/BlocksListBlockInputStream.h +++ /dev/null @@ -1,44 +0,0 @@ -#pragma once - -#include - - -namespace DB -{ - -/** A stream of blocks from which you can read the next block from an explicitly provided list. - * Also see OneBlockInputStream. - */ -class BlocksListBlockInputStream : public IBlockInputStream -{ -public: - /// Acquires the ownership of the block list. - BlocksListBlockInputStream(BlocksList && list_) - : list(std::move(list_)), it(list.begin()), end(list.end()) {} - - /// Uses a list of blocks lying somewhere else. - BlocksListBlockInputStream(BlocksList::iterator & begin_, BlocksList::iterator & end_) - : it(begin_), end(end_) {} - - String getName() const override { return "BlocksList"; } - -protected: - Block getHeader() const override { return list.empty() ? Block() : *list.begin(); } - - Block readImpl() override - { - if (it == end) - return Block(); - - Block res = *it; - ++it; - return res; - } - -private: - BlocksList list; - BlocksList::iterator it; - const BlocksList::iterator end; -}; - -} diff --git a/src/DataStreams/ConnectionCollector.cpp b/src/DataStreams/ConnectionCollector.cpp new file mode 100644 index 00000000000..3a411fd2e33 --- /dev/null +++ b/src/DataStreams/ConnectionCollector.cpp @@ -0,0 +1,115 @@ +#include + +#include +#include +#include +#include + +namespace CurrentMetrics +{ +extern const Metric AsyncDrainedConnections; +extern const Metric ActiveAsyncDrainedConnections; +} + +namespace DB +{ +namespace ErrorCodes +{ + extern const int LOGICAL_ERROR; + extern const int UNKNOWN_PACKET_FROM_SERVER; +} + +std::unique_ptr ConnectionCollector::connection_collector; + +static constexpr UInt64 max_connection_draining_tasks_per_thread = 20; + +ConnectionCollector::ConnectionCollector(ContextMutablePtr global_context_, size_t max_threads) + : WithMutableContext(global_context_), pool(max_threads, max_threads, max_threads * max_connection_draining_tasks_per_thread) +{ +} + +ConnectionCollector & ConnectionCollector::init(ContextMutablePtr global_context_, size_t max_threads) +{ + if (connection_collector) + { + throw Exception("Connection collector is initialized twice. This is a bug.", ErrorCodes::LOGICAL_ERROR); + } + + connection_collector.reset(new ConnectionCollector(global_context_, max_threads)); + return *connection_collector; +} + +struct AsyncDrainTask +{ + const ConnectionPoolWithFailoverPtr pool; + std::shared_ptr shared_connections; + void operator()() const + { + ConnectionCollector::drainConnections(*shared_connections); + } + + // We don't have std::unique_function yet. Wrap it in shared_ptr to make the functor copyable. 
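The shared_ptr wrapper in AsyncDrainTask above is a workaround worth spelling out: the draining task ends up stored in a copyable std::function-like slot, so a move-only RAII guard cannot be captured by value; holding it through a shared_ptr restores copyability. A self-contained sketch, with an illustrative Guard type instead of CurrentMetrics::Increment:

```
#include <functional>
#include <memory>
#include <vector>

struct Guard
{
    Guard() = default;
    Guard(const Guard &) = delete;            // move-only: cannot be captured by copy
    Guard & operator=(const Guard &) = delete;
};

int main()
{
    std::vector<std::function<void()>> queue;

    auto guard = std::make_shared<Guard>();   // shared ownership makes the lambda copyable
    queue.emplace_back([guard] { /* drain connections here */ });

    for (auto & task : queue)
        task();
    return 0;
}
```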
+ std::shared_ptr metric_increment + = std::make_shared(CurrentMetrics::ActiveAsyncDrainedConnections); +}; + +std::shared_ptr ConnectionCollector::enqueueConnectionCleanup( + const ConnectionPoolWithFailoverPtr & pool, std::shared_ptr connections) noexcept +{ + if (!connections) + return nullptr; + + if (connection_collector) + { + if (connection_collector->pool.trySchedule(AsyncDrainTask{pool, connections})) + { + CurrentMetrics::add(CurrentMetrics::AsyncDrainedConnections, 1); + return nullptr; + } + } + return connections; +} + +void ConnectionCollector::drainConnections(IConnections & connections) noexcept +{ + bool is_drained = false; + try + { + Packet packet = connections.drain(); + is_drained = true; + switch (packet.type) + { + case Protocol::Server::EndOfStream: + case Protocol::Server::Log: + break; + + case Protocol::Server::Exception: + packet.exception->rethrow(); + break; + + default: + throw Exception( + ErrorCodes::UNKNOWN_PACKET_FROM_SERVER, + "Unknown packet {} from one of the following replicas: {}", + toString(packet.type), + connections.dumpAddresses()); + } + } + catch (...) + { + tryLogCurrentException(&Poco::Logger::get("ConnectionCollector"), __PRETTY_FUNCTION__); + if (!is_drained) + { + try + { + connections.disconnect(); + } + catch (...) + { + tryLogCurrentException(&Poco::Logger::get("ConnectionCollector"), __PRETTY_FUNCTION__); + } + } + } +} + +} diff --git a/src/DataStreams/ConnectionCollector.h b/src/DataStreams/ConnectionCollector.h new file mode 100644 index 00000000000..5b6e82d000e --- /dev/null +++ b/src/DataStreams/ConnectionCollector.h @@ -0,0 +1,30 @@ +#pragma once + +#include +#include +#include +#include + +namespace DB +{ + +class ConnectionPoolWithFailover; +using ConnectionPoolWithFailoverPtr = std::shared_ptr; + +class ConnectionCollector : boost::noncopyable, WithMutableContext +{ +public: + static ConnectionCollector & init(ContextMutablePtr global_context_, size_t max_threads); + static std::shared_ptr + enqueueConnectionCleanup(const ConnectionPoolWithFailoverPtr & pool, std::shared_ptr connections) noexcept; + static void drainConnections(IConnections & connections) noexcept; + +private: + explicit ConnectionCollector(ContextMutablePtr global_context_, size_t max_threads); + + static constexpr size_t reschedule_time_ms = 1000; + ThreadPool pool; + static std::unique_ptr connection_collector; +}; + +} diff --git a/src/DataStreams/CountingBlockOutputStream.h b/src/DataStreams/CountingBlockOutputStream.h index 5c36c40c1ad..c7247b39945 100644 --- a/src/DataStreams/CountingBlockOutputStream.h +++ b/src/DataStreams/CountingBlockOutputStream.h @@ -1,6 +1,6 @@ #pragma once + #include -#include #include diff --git a/src/DataStreams/IBlockInputStream.cpp b/src/DataStreams/IBlockInputStream.cpp index a6484c41b4f..c3071cdcf20 100644 --- a/src/DataStreams/IBlockInputStream.cpp +++ b/src/DataStreams/IBlockInputStream.cpp @@ -25,7 +25,6 @@ namespace ErrorCodes extern const int TOO_MANY_BYTES; extern const int TOO_MANY_ROWS_OR_BYTES; extern const int LOGICAL_ERROR; - extern const int TOO_DEEP_PIPELINE; } @@ -357,74 +356,4 @@ Block IBlockInputStream::getExtremes() return res; } - -String IBlockInputStream::getTreeID() const -{ - WriteBufferFromOwnString s; - s << getName(); - - if (!children.empty()) - { - s << "("; - for (BlockInputStreams::const_iterator it = children.begin(); it != children.end(); ++it) - { - if (it != children.begin()) - s << ", "; - s << (*it)->getTreeID(); - } - s << ")"; - } - - return s.str(); -} - - -size_t 
IBlockInputStream::checkDepthImpl(size_t max_depth, size_t level) const -{ - if (children.empty()) - return 0; - - if (level > max_depth) - throw Exception("Query pipeline is too deep. Maximum: " + toString(max_depth), ErrorCodes::TOO_DEEP_PIPELINE); - - size_t res = 0; - for (const auto & child : children) - { - size_t child_depth = child->checkDepth(level + 1); - if (child_depth > res) - res = child_depth; - } - - return res + 1; -} - - -void IBlockInputStream::dumpTree(WriteBuffer & ostr, size_t indent, size_t multiplier) const -{ - ostr << String(indent, ' ') << getName(); - if (multiplier > 1) - ostr << " × " << multiplier; - //ostr << ": " << getHeader().dumpStructure(); - ostr << '\n'; - ++indent; - - /// If the subtree is repeated several times, then we output it once with the multiplier. - using Multipliers = std::map; - Multipliers multipliers; - - for (const auto & child : children) - ++multipliers[child->getTreeID()]; - - for (const auto & child : children) - { - String id = child->getTreeID(); - size_t & subtree_multiplier = multipliers[id]; - if (subtree_multiplier != 0) /// Already printed subtrees are marked with zero in the array of multipliers. - { - child->dumpTree(ostr, indent, subtree_multiplier); - subtree_multiplier = 0; - } - } -} - } diff --git a/src/DataStreams/IBlockInputStream.h b/src/DataStreams/IBlockInputStream.h index 090ea394fd6..8b3e2512e47 100644 --- a/src/DataStreams/IBlockInputStream.h +++ b/src/DataStreams/IBlockInputStream.h @@ -23,15 +23,6 @@ namespace ErrorCodes class ProcessListElement; class EnabledQuota; class QueryStatus; -struct SortColumnDescription; -using SortDescription = std::vector; - -/** Callback to track the progress of the query. - * Used in IBlockInputStream and Context. - * The function takes the number of rows in the last block, the number of bytes in the last block. - * Note that the callback can be called from different threads. - */ -using ProgressCallback = std::function; /** The stream interface for reading data by blocks from the database. @@ -93,15 +84,6 @@ public: */ virtual void readSuffix(); - /// Must be called before `read()` and `readPrefix()`. - void dumpTree(WriteBuffer & ostr, size_t indent = 0, size_t multiplier = 1) const; - - /** Check the depth of the pipeline. - * If max_depth is specified and the `depth` is greater - throw an exception. - * Must be called before `read()` and `readPrefix()`. - */ - size_t checkDepth(size_t max_depth) const { return checkDepthImpl(max_depth, max_depth); } - /// Do not allow to change the table while the blocks stream and its children are alive. void addTableLock(const TableLockHolder & lock) { table_locks.push_back(lock); } @@ -269,9 +251,6 @@ private: size_t checkDepthImpl(size_t max_depth, size_t level) const; - /// Get text with names of this source and the entire subtree. 
- String getTreeID() const; - template void forEachChild(F && f) { diff --git a/src/DataStreams/InputStreamFromASTInsertQuery.h b/src/DataStreams/InputStreamFromASTInsertQuery.h deleted file mode 100644 index 15b698a2d68..00000000000 --- a/src/DataStreams/InputStreamFromASTInsertQuery.h +++ /dev/null @@ -1,46 +0,0 @@ -#pragma once - -#include -#include -#include -#include - - -namespace DB -{ - -struct BlockIO; -class Context; -struct StorageInMemoryMetadata; -using StorageMetadataPtr = std::shared_ptr; - -/** Prepares an input stream which produce data containing in INSERT query - * Head of inserting data could be stored in INSERT ast directly - * Remaining (tail) data could be stored in input_buffer_tail_part - */ -class InputStreamFromASTInsertQuery : public IBlockInputStream -{ -public: - InputStreamFromASTInsertQuery( - const ASTPtr & ast, - ReadBuffer * input_buffer_tail_part, - const Block & header, - ContextPtr context, - const ASTPtr & input_function); - - Block readImpl() override { return res_stream->read(); } - void readPrefixImpl() override { return res_stream->readPrefix(); } - void readSuffixImpl() override { return res_stream->readSuffix(); } - - String getName() const override { return "InputStreamFromASTInsertQuery"; } - - Block getHeader() const override { return res_stream->getHeader(); } - -private: - std::unique_ptr input_buffer_ast_part; - std::unique_ptr input_buffer_contacenated; - - BlockInputStreamPtr res_stream; -}; - -} diff --git a/src/DataStreams/LazyBlockInputStream.h b/src/DataStreams/LazyBlockInputStream.h deleted file mode 100644 index 37089c9bb5b..00000000000 --- a/src/DataStreams/LazyBlockInputStream.h +++ /dev/null @@ -1,80 +0,0 @@ -#pragma once - -#include - - -namespace DB -{ - -/** Initialize another source on the first `read` call, and then use it. - * This is needed, for example, to read from a table that will be populated - * after creation of LazyBlockInputStream object, but before the first `read` call. - */ -class LazyBlockInputStream : public IBlockInputStream -{ -public: - using Generator = std::function; - - LazyBlockInputStream(const Block & header_, Generator generator_) - : header(header_), generator(std::move(generator_)) - { - } - - LazyBlockInputStream(const char * name_, const Block & header_, Generator generator_) - : name(name_), header(header_), generator(std::move(generator_)) - { - } - - String getName() const override { return name; } - - Block getHeader() const override - { - return header; - } - - /// We call readPrefix lazily. Suppress default behaviour. - void readPrefix() override {} - -protected: - Block readImpl() override - { - if (!input) - { - input = generator(); - - if (!input) - return Block(); - - auto * p_input = dynamic_cast(input.get()); - - if (p_input) - { - /// They could have been set before, but were not passed into the `input`. 
- if (progress_callback) - p_input->setProgressCallback(progress_callback); - if (process_list_elem) - p_input->setProcessListElement(process_list_elem); - } - - input->readPrefix(); - - { - addChild(input); - - if (isCancelled() && p_input) - p_input->cancel(is_killed); - } - } - - return input->read(); - } - -private: - const char * name = "Lazy"; - Block header; - Generator generator; - - BlockInputStreamPtr input; -}; - -} diff --git a/src/DataStreams/LimitBlockInputStream.cpp b/src/DataStreams/LimitBlockInputStream.cpp deleted file mode 100644 index 5e262e921e8..00000000000 --- a/src/DataStreams/LimitBlockInputStream.cpp +++ /dev/null @@ -1,158 +0,0 @@ -#include - -#include - - -namespace DB -{ - -/// gets pointers to all columns of block, which were used for ORDER BY -static ColumnRawPtrs extractSortColumns(const Block & block, const SortDescription & description) -{ - size_t size = description.size(); - ColumnRawPtrs res; - res.reserve(size); - - for (size_t i = 0; i < size; ++i) - { - const IColumn * column = !description[i].column_name.empty() - ? block.getByName(description[i].column_name).column.get() - : block.safeGetByPosition(description[i].column_number).column.get(); - res.emplace_back(column); - } - - return res; -} - - -LimitBlockInputStream::LimitBlockInputStream( - const BlockInputStreamPtr & input, UInt64 limit_, UInt64 offset_, bool always_read_till_end_, - bool use_limit_as_total_rows_approx, bool with_ties_, const SortDescription & description_) - : limit(limit_), offset(offset_), always_read_till_end(always_read_till_end_), with_ties(with_ties_) - , description(description_) -{ - if (use_limit_as_total_rows_approx) - { - addTotalRowsApprox(static_cast(limit)); - } - - children.push_back(input); -} - -Block LimitBlockInputStream::readImpl() -{ - Block res; - UInt64 rows = 0; - - /// pos >= offset + limit and all rows in the end of previous block were equal - /// to row at 'limit' position. So we check current block. - if (!ties_row_ref.empty() && pos >= offset + limit) - { - res = children.back()->read(); - rows = res.rows(); - - if (!res) - return res; - - SharedBlockPtr ptr = new detail::SharedBlock(std::move(res)); - ptr->sort_columns = extractSortColumns(*ptr, description); - - UInt64 len; - for (len = 0; len < rows; ++len) - { - SharedBlockRowRef current_row; - current_row.set(ptr, &ptr->sort_columns, len); - - if (current_row != ties_row_ref) - { - ties_row_ref.reset(); - break; - } - } - - if (len < rows) - { - for (size_t i = 0; i < ptr->columns(); ++i) - ptr->safeGetByPosition(i).column = ptr->safeGetByPosition(i).column->cut(0, len); - } - - return *ptr; - } - - if (pos >= offset + limit) - { - if (!always_read_till_end) - return res; - else - { - while (children.back()->read()) - ; - return res; - } - } - - do - { - res = children.back()->read(); - if (!res) - return res; - rows = res.rows(); - pos += rows; - } while (pos <= offset); - - SharedBlockPtr ptr = new detail::SharedBlock(std::move(res)); - if (with_ties) - ptr->sort_columns = extractSortColumns(*ptr, description); - - /// give away the whole block - if (pos >= offset + rows && pos <= offset + limit) - { - /// Save rowref for last row, because probalbly next block begins with the same row. 
- if (with_ties && pos == offset + limit) - ties_row_ref.set(ptr, &ptr->sort_columns, rows - 1); - return *ptr; - } - - /// give away a piece of the block - UInt64 start = std::max( - static_cast(0), - static_cast(offset) - static_cast(pos) + static_cast(rows)); - - UInt64 length = std::min( - static_cast(limit), std::min( - static_cast(pos) - static_cast(offset), - static_cast(limit) + static_cast(offset) - static_cast(pos) + static_cast(rows))); - - - /// check if other rows in current block equals to last one in limit - if (with_ties) - { - ties_row_ref.set(ptr, &ptr->sort_columns, start + length - 1); - - for (size_t i = ties_row_ref.row_num + 1; i < rows; ++i) - { - SharedBlockRowRef current_row; - current_row.set(ptr, &ptr->sort_columns, i); - if (current_row == ties_row_ref) - ++length; - else - { - ties_row_ref.reset(); - break; - } - } - } - - if (length == rows) - return *ptr; - - for (size_t i = 0; i < ptr->columns(); ++i) - ptr->safeGetByPosition(i).column = ptr->safeGetByPosition(i).column->cut(start, length); - - // TODO: we should provide feedback to child-block, so it will know how many rows are actually consumed. - // It's crucial for streaming engines like Kafka. - - return *ptr; -} - -} diff --git a/src/DataStreams/LimitBlockInputStream.h b/src/DataStreams/LimitBlockInputStream.h deleted file mode 100644 index 112e5dddb0c..00000000000 --- a/src/DataStreams/LimitBlockInputStream.h +++ /dev/null @@ -1,47 +0,0 @@ -#pragma once - -#include -#include -#include - - -namespace DB -{ - - -/** Implements the LIMIT relational operation. - */ -class LimitBlockInputStream : public IBlockInputStream -{ -public: - /** If always_read_till_end = false (by default), then after reading enough data, - * returns an empty block, and this causes the query to be canceled. - * If always_read_till_end = true - reads all the data to the end, but ignores them. This is necessary in rare cases: - * when otherwise, due to the cancellation of the request, we would not have received the data for GROUP BY WITH TOTALS from the remote server. - * If use_limit_as_total_rows_approx = true, then addTotalRowsApprox is called to use the limit in progress & stats - * with_ties = true, when query has WITH TIES modifier. 
If so, description should be provided - * description lets us know which row we should check for equality - */ - LimitBlockInputStream( - const BlockInputStreamPtr & input, UInt64 limit_, UInt64 offset_, - bool always_read_till_end_ = false, bool use_limit_as_total_rows_approx = false, - bool with_ties_ = false, const SortDescription & description_ = {}); - - String getName() const override { return "Limit"; } - - Block getHeader() const override { return children.at(0)->getHeader(); } - -protected: - Block readImpl() override; - -private: - UInt64 limit; - UInt64 offset; - UInt64 pos = 0; - bool always_read_till_end; - bool with_ties; - const SortDescription description; - SharedBlockRowRef ties_row_ref; -}; - -} diff --git a/src/DataStreams/MergingSortedBlockInputStream.cpp b/src/DataStreams/MergingSortedBlockInputStream.cpp deleted file mode 100644 index b7396a23d6a..00000000000 --- a/src/DataStreams/MergingSortedBlockInputStream.cpp +++ /dev/null @@ -1,273 +0,0 @@ -#include - -#include - -#include -#include - - -namespace DB -{ - -namespace ErrorCodes -{ - extern const int LOGICAL_ERROR; -} - - -MergingSortedBlockInputStream::MergingSortedBlockInputStream( - const BlockInputStreams & inputs_, SortDescription description_, - size_t max_block_size_, UInt64 limit_, WriteBuffer * out_row_sources_buf_, bool quiet_) - : description(std::move(description_)), max_block_size(max_block_size_), limit(limit_), quiet(quiet_) - , source_blocks(inputs_.size()) - , cursors(inputs_.size()), out_row_sources_buf(out_row_sources_buf_) - , log(&Poco::Logger::get("MergingSortedBlockInputStream")) -{ - children.insert(children.end(), inputs_.begin(), inputs_.end()); - header = children.at(0)->getHeader(); - num_columns = header.columns(); -} - -void MergingSortedBlockInputStream::init(MutableColumns & merged_columns) -{ - /// Read the first blocks, initialize the queue. - if (first) - { - first = false; - - for (size_t i = 0; i < source_blocks.size(); ++i) - { - Block & block = source_blocks[i]; - - if (block) - continue; - - block = children[i]->read(); - - const size_t rows = block.rows(); - - if (rows == 0) - continue; - - if (expected_block_size < rows) - expected_block_size = std::min(rows, max_block_size); - - cursors[i] = SortCursorImpl(block, description, i); - has_collation |= cursors[i].has_collation; - } - - if (has_collation) - queue_with_collation = SortingHeap(cursors); - else - queue_without_collation = SortingHeap(cursors); - } - - /// Let's check that all source blocks have the same structure. 
- for (const auto & block : source_blocks) - { - if (!block) - continue; - - assertBlocksHaveEqualStructure(block, header, getName()); - } - - merged_columns.resize(num_columns); - for (size_t i = 0; i < num_columns; ++i) - { - merged_columns[i] = header.safeGetByPosition(i).column->cloneEmpty(); - merged_columns[i]->reserve(expected_block_size); - } -} - - -Block MergingSortedBlockInputStream::readImpl() -{ - if (finished) - return {}; - - if (children.size() == 1) - return children[0]->read(); - - MutableColumns merged_columns; - - init(merged_columns); - if (merged_columns.empty()) - return {}; - - if (has_collation) - merge(merged_columns, queue_with_collation); - else - merge(merged_columns, queue_without_collation); - - return header.cloneWithColumns(std::move(merged_columns)); -} - - -template -void MergingSortedBlockInputStream::fetchNextBlock(const TSortCursor & current, SortingHeap & queue) -{ - size_t order = current->order; - size_t size = cursors.size(); - - if (order >= size || &cursors[order] != current.impl) - throw Exception("Logical error in MergingSortedBlockInputStream", ErrorCodes::LOGICAL_ERROR); - - while (true) - { - source_blocks[order] = children[order]->read(); - - if (!source_blocks[order]) - { - queue.removeTop(); - break; - } - - if (source_blocks[order].rows()) - { - cursors[order].reset(source_blocks[order]); - queue.replaceTop(&cursors[order]); - break; - } - } -} - - -template -void MergingSortedBlockInputStream::merge(MutableColumns & merged_columns, TSortingHeap & queue) -{ - size_t merged_rows = 0; - - /** Increase row counters. - * Return true if it's time to finish generating the current data block. - */ - auto count_row_and_check_limit = [&, this]() - { - ++total_merged_rows; - if (limit && total_merged_rows == limit) - { - // std::cerr << "Limit reached\n"; - cancel(false); - finished = true; - return true; - } - - ++merged_rows; - return merged_rows >= max_block_size; - }; - - /// Take rows in required order and put them into `merged_columns`, while the number of rows are no more than `max_block_size` - while (queue.isValid()) - { - auto current = queue.current(); - - /** And what if the block is totally less or equal than the rest for the current cursor? - * Or is there only one data source left in the queue? Then you can take the entire block on current cursor. - */ - if (current->isFirst() - && (queue.size() == 1 - || (queue.size() >= 2 && current.totallyLessOrEquals(queue.nextChild())))) - { -// std::cerr << "current block is totally less or equals\n"; - - /// If there are already data in the current block, we first return it. We'll get here again the next time we call the merge function. - if (merged_rows != 0) - { - //std::cerr << "merged rows is non-zero\n"; - return; - } - - /// Actually, current->order stores source number (i.e. 
cursors[current->order] == current) - size_t source_num = current->order; - - if (source_num >= cursors.size()) - throw Exception("Logical error in MergingSortedBlockInputStream", ErrorCodes::LOGICAL_ERROR); - - for (size_t i = 0; i < num_columns; ++i) - merged_columns[i] = IColumn::mutate(std::move(source_blocks[source_num].getByPosition(i).column)); - -// std::cerr << "copied columns\n"; - - merged_rows = merged_columns.at(0)->size(); - - /// Limit output - if (limit && total_merged_rows + merged_rows > limit) - { - merged_rows = limit - total_merged_rows; - for (size_t i = 0; i < num_columns; ++i) - { - auto & column = merged_columns[i]; - column = IColumn::mutate(column->cut(0, merged_rows)); - } - - cancel(false); - finished = true; - } - - /// Write order of rows for other columns - /// this data will be used in grather stream - if (out_row_sources_buf) - { - RowSourcePart row_source(source_num); - for (size_t i = 0; i < merged_rows; ++i) - out_row_sources_buf->write(row_source.data); - } - - //std::cerr << "fetching next block\n"; - - total_merged_rows += merged_rows; - fetchNextBlock(current, queue); - return; - } - -// std::cerr << "total_merged_rows: " << total_merged_rows << ", merged_rows: " << merged_rows << "\n"; -// std::cerr << "Inserting row\n"; - for (size_t i = 0; i < num_columns; ++i) - merged_columns[i]->insertFrom(*current->all_columns[i], current->getRow()); - - if (out_row_sources_buf) - { - /// Actually, current.impl->order stores source number (i.e. cursors[current.impl->order] == current.impl) - RowSourcePart row_source(current->order); - out_row_sources_buf->write(row_source.data); - } - - if (!current->isLast()) - { -// std::cerr << "moving to next row\n"; - queue.next(); - } - else - { - /// We get the next block from the corresponding source, if there is one. -// std::cerr << "It was last row, fetching next block\n"; - fetchNextBlock(current, queue); - } - - if (count_row_and_check_limit()) - return; - } - - /// We have read all data. Ask children to cancel providing more data. - cancel(false); - finished = true; -} - - -void MergingSortedBlockInputStream::readSuffixImpl() -{ - if (quiet) - return; - - const BlockStreamProfileInfo & profile_info = getProfileInfo(); - double seconds = profile_info.total_stopwatch.elapsedSeconds(); - - if (!seconds) - LOG_DEBUG(log, "Merge sorted {} blocks, {} rows in 0 sec.", profile_info.blocks, profile_info.rows); - else - LOG_DEBUG(log, "Merge sorted {} blocks, {} rows in {} sec., {} rows/sec., {}/sec", - profile_info.blocks, profile_info.rows, seconds, - profile_info.rows / seconds, - ReadableSize(profile_info.bytes / seconds)); -} - -} diff --git a/src/DataStreams/MergingSortedBlockInputStream.h b/src/DataStreams/MergingSortedBlockInputStream.h deleted file mode 100644 index 582b41ff3af..00000000000 --- a/src/DataStreams/MergingSortedBlockInputStream.h +++ /dev/null @@ -1,87 +0,0 @@ -#pragma once - -#include -#include - -#include - -#include - - -namespace Poco { class Logger; } - - -namespace DB -{ - -/** Merges several sorted streams into one sorted stream. - */ -class MergingSortedBlockInputStream : public IBlockInputStream -{ -public: - /** limit - if isn't 0, then we can produce only first limit rows in sorted order. 
- * out_row_sources - if isn't nullptr, then at the end of execution it should contain part numbers of each read row (and needed flag) - * quiet - don't log profiling info - */ - MergingSortedBlockInputStream( - const BlockInputStreams & inputs_, SortDescription description_, size_t max_block_size_, - UInt64 limit_ = 0, WriteBuffer * out_row_sources_buf_ = nullptr, bool quiet_ = false); - - String getName() const override { return "MergingSorted"; } - - Block getHeader() const override { return header; } - -protected: - Block readImpl() override; - - void readSuffixImpl() override; - - /// Initializes the queue and the columns of next result block. - void init(MutableColumns & merged_columns); - - /// Gets the next block from the source corresponding to the `current`. - template - void fetchNextBlock(const TSortCursor & current, SortingHeap & queue); - - Block header; - - const SortDescription description; - const size_t max_block_size; - UInt64 limit; - UInt64 total_merged_rows = 0; - - bool first = true; - bool has_collation = false; - bool quiet = false; - - /// May be smaller or equal to max_block_size. To do 'reserve' for columns. - size_t expected_block_size = 0; - - /// Blocks currently being merged. - size_t num_columns = 0; - Blocks source_blocks; - - SortCursorImpls cursors; - - SortingHeap queue_without_collation; - SortingHeap queue_with_collation; - - /// Used in Vertical merge algorithm to gather non-PK/non-index columns (on next step) - /// If it is not nullptr then it should be populated during execution - WriteBuffer * out_row_sources_buf; - -private: - - /** We support two different cursors - with Collation and without. - * Templates are used instead of polymorphic SortCursor and calls to virtual functions. - */ - template - void merge(MutableColumns & merged_columns, TSortingHeap & queue); - - Poco::Logger * log; - - /// Read is finished. - bool finished = false; -}; - -} diff --git a/src/DataStreams/NullBlockInputStream.h b/src/DataStreams/NullBlockInputStream.h deleted file mode 100644 index 2e4f78899dc..00000000000 --- a/src/DataStreams/NullBlockInputStream.h +++ /dev/null @@ -1,24 +0,0 @@ -#pragma once - -#include - - -namespace DB -{ - -/// Empty stream of blocks of specified structure. 
-class NullBlockInputStream : public IBlockInputStream -{ -public: - NullBlockInputStream(const Block & header_) : header(header_) {} - - Block getHeader() const override { return header; } - String getName() const override { return "Null"; } - -private: - Block header; - - Block readImpl() override { return {}; } -}; - -} diff --git a/src/DataStreams/ParallelInputsProcessor.h b/src/DataStreams/ParallelInputsProcessor.h index 07602954223..65c7e741ec2 100644 --- a/src/DataStreams/ParallelInputsProcessor.h +++ b/src/DataStreams/ParallelInputsProcessor.h @@ -8,7 +8,6 @@ #include -#include #include #include #include diff --git a/src/DataStreams/PushingToSinkBlockOutputStream.h b/src/DataStreams/PushingToSinkBlockOutputStream.h new file mode 100644 index 00000000000..eeca8506d8e --- /dev/null +++ b/src/DataStreams/PushingToSinkBlockOutputStream.h @@ -0,0 +1,114 @@ +#pragma once +#include +#include +#include +namespace DB +{ + +namespace ErrorCodes +{ + extern const int LOGICAL_ERROR; +} + +class PushingToSinkBlockOutputStream : public IBlockOutputStream +{ +public: + explicit PushingToSinkBlockOutputStream(SinkToStoragePtr sink_) + : sink(std::move(sink_)), port(sink->getPort().getHeader(), sink.get()) {} + + Block getHeader() const override { return sink->getPort().getHeader(); } + + void write(const Block & block) override + { + /// In case writePrefix was not called. + if (!port.isConnected()) + writePrefix(); + + if (!block) + return; + + size_t num_rows = block.rows(); + Chunk chunk(block.getColumns(), num_rows); + port.push(std::move(chunk)); + + while (true) + { + auto status = sink->prepare(); + switch (status) + { + case IProcessor::Status::Ready: + sink->work(); + continue; + case IProcessor::Status::NeedData: + return; + case IProcessor::Status::Async: [[fallthrough]]; + case IProcessor::Status::ExpandPipeline: [[fallthrough]]; + case IProcessor::Status::Finished: [[fallthrough]]; + case IProcessor::Status::PortFull: + throw Exception( + ErrorCodes::LOGICAL_ERROR, + "Status {} in not expected in PushingToSinkBlockOutputStream::writePrefix", + IProcessor::statusToName(status)); + } + } + } + + void writePrefix() override + { + connect(port, sink->getPort()); + + while (true) + { + auto status = sink->prepare(); + switch (status) + { + case IProcessor::Status::Ready: + sink->work(); + continue; + case IProcessor::Status::NeedData: + return; + case IProcessor::Status::Async: [[fallthrough]]; + case IProcessor::Status::ExpandPipeline: [[fallthrough]]; + case IProcessor::Status::Finished: [[fallthrough]]; + case IProcessor::Status::PortFull: + throw Exception( + ErrorCodes::LOGICAL_ERROR, + "Status {} in not expected in PushingToSinkBlockOutputStream::writePrefix", + IProcessor::statusToName(status)); + } + } + } + + void writeSuffix() override + { + port.finish(); + while (true) + { + auto status = sink->prepare(); + switch (status) + { + case IProcessor::Status::Ready: + sink->work(); + continue; + case IProcessor::Status::Finished: + + ///flush(); + return; + case IProcessor::Status::NeedData: + case IProcessor::Status::Async: + case IProcessor::Status::ExpandPipeline: + case IProcessor::Status::PortFull: + throw Exception( + ErrorCodes::LOGICAL_ERROR, + "Status {} in not expected in PushingToSinkBlockOutputStream::writeSuffix", + IProcessor::statusToName(status)); + } + } + } + +private: + SinkToStoragePtr sink; + OutputPort port; +}; + +} diff --git a/src/DataStreams/PushingToViewsBlockOutputStream.cpp b/src/DataStreams/PushingToViewsBlockOutputStream.cpp index 
c09420cab6f..7729eb5fb44 100644 --- a/src/DataStreams/PushingToViewsBlockOutputStream.cpp +++ b/src/DataStreams/PushingToViewsBlockOutputStream.cpp @@ -13,12 +13,12 @@ #include #include #include -#include +#include #include #include #include #include - +#include namespace DB { @@ -127,8 +127,12 @@ PushingToViewsBlockOutputStream::PushingToViewsBlockOutputStream( /// Do not push to destination table if the flag is set if (!no_destination) { - output = storage->write(query_ptr, storage->getInMemoryMetadataPtr(), getContext()); - replicated_output = dynamic_cast(output.get()); + auto sink = storage->write(query_ptr, storage->getInMemoryMetadataPtr(), getContext()); + + metadata_snapshot->check(sink->getPort().getHeader().getColumnsWithTypeAndName()); + + replicated_output = dynamic_cast(sink.get()); + output = std::make_shared(std::move(sink)); } } diff --git a/src/DataStreams/PushingToViewsBlockOutputStream.h b/src/DataStreams/PushingToViewsBlockOutputStream.h index 552a0a3452a..db6b671ce2c 100644 --- a/src/DataStreams/PushingToViewsBlockOutputStream.h +++ b/src/DataStreams/PushingToViewsBlockOutputStream.h @@ -13,7 +13,7 @@ class Logger; namespace DB { -class ReplicatedMergeTreeBlockOutputStream; +class ReplicatedMergeTreeSink; /** Writes data to the specified table and to all dependent materialized views. */ @@ -38,7 +38,7 @@ private: StoragePtr storage; StorageMetadataPtr metadata_snapshot; BlockOutputStreamPtr output; - ReplicatedMergeTreeBlockOutputStream * replicated_output = nullptr; + ReplicatedMergeTreeSink * replicated_output = nullptr; Poco::Logger * log; ASTPtr query_ptr; diff --git a/src/DataStreams/RemoteBlockInputStream.cpp b/src/DataStreams/RemoteBlockInputStream.cpp index c633600d37f..7caa54cff22 100644 --- a/src/DataStreams/RemoteBlockInputStream.cpp +++ b/src/DataStreams/RemoteBlockInputStream.cpp @@ -5,27 +5,28 @@ namespace DB { RemoteBlockInputStream::RemoteBlockInputStream( - Connection & connection, - const String & query_, const Block & header_, ContextPtr context_, - const ThrottlerPtr & throttler, const Scalars & scalars_, const Tables & external_tables_, QueryProcessingStage::Enum stage_) + Connection & connection, + const String & query_, const Block & header_, ContextPtr context_, + const ThrottlerPtr & throttler, const Scalars & scalars_, const Tables & external_tables_, QueryProcessingStage::Enum stage_) : query_executor(connection, query_, header_, context_, throttler, scalars_, external_tables_, stage_) { init(); } RemoteBlockInputStream::RemoteBlockInputStream( - std::vector && connections, - const String & query_, const Block & header_, ContextPtr context_, - const ThrottlerPtr & throttler, const Scalars & scalars_, const Tables & external_tables_, QueryProcessingStage::Enum stage_) - : query_executor(std::move(connections), query_, header_, context_, throttler, scalars_, external_tables_, stage_) + const ConnectionPoolWithFailoverPtr & pool, + std::vector && connections, + const String & query_, const Block & header_, ContextPtr context_, + const ThrottlerPtr & throttler, const Scalars & scalars_, const Tables & external_tables_, QueryProcessingStage::Enum stage_) + : query_executor(pool, std::move(connections), query_, header_, context_, throttler, scalars_, external_tables_, stage_) { init(); } RemoteBlockInputStream::RemoteBlockInputStream( - const ConnectionPoolWithFailoverPtr & pool, - const String & query_, const Block & header_, ContextPtr context_, - const ThrottlerPtr & throttler, const Scalars & scalars_, const Tables & external_tables_, 
QueryProcessingStage::Enum stage_) + const ConnectionPoolWithFailoverPtr & pool, + const String & query_, const Block & header_, ContextPtr context_, + const ThrottlerPtr & throttler, const Scalars & scalars_, const Tables & external_tables_, QueryProcessingStage::Enum stage_) : query_executor(pool, query_, header_, context_, throttler, scalars_, external_tables_, stage_) { init(); @@ -38,11 +39,6 @@ void RemoteBlockInputStream::init() query_executor.setLogger(log); } -void RemoteBlockInputStream::readPrefix() -{ - query_executor.sendQuery(); -} - void RemoteBlockInputStream::cancel(bool kill) { if (kill) diff --git a/src/DataStreams/RemoteBlockInputStream.h b/src/DataStreams/RemoteBlockInputStream.h index b0029da91bb..1be6b031520 100644 --- a/src/DataStreams/RemoteBlockInputStream.h +++ b/src/DataStreams/RemoteBlockInputStream.h @@ -24,24 +24,25 @@ class RemoteBlockInputStream : public IBlockInputStream public: /// Takes already set connection. RemoteBlockInputStream( - Connection & connection, - const String & query_, const Block & header_, ContextPtr context_, - const ThrottlerPtr & throttler = nullptr, const Scalars & scalars_ = Scalars(), const Tables & external_tables_ = Tables(), - QueryProcessingStage::Enum stage_ = QueryProcessingStage::Complete); + Connection & connection, + const String & query_, const Block & header_, ContextPtr context_, + const ThrottlerPtr & throttler = nullptr, const Scalars & scalars_ = Scalars(), const Tables & external_tables_ = Tables(), + QueryProcessingStage::Enum stage_ = QueryProcessingStage::Complete); /// Accepts several connections already taken from pool. RemoteBlockInputStream( - std::vector && connections, - const String & query_, const Block & header_, ContextPtr context_, - const ThrottlerPtr & throttler = nullptr, const Scalars & scalars_ = Scalars(), const Tables & external_tables_ = Tables(), - QueryProcessingStage::Enum stage_ = QueryProcessingStage::Complete); + const ConnectionPoolWithFailoverPtr & pool, + std::vector && connections, + const String & query_, const Block & header_, ContextPtr context_, + const ThrottlerPtr & throttler = nullptr, const Scalars & scalars_ = Scalars(), const Tables & external_tables_ = Tables(), + QueryProcessingStage::Enum stage_ = QueryProcessingStage::Complete); /// Takes a pool and gets one or several connections from it. RemoteBlockInputStream( - const ConnectionPoolWithFailoverPtr & pool, - const String & query_, const Block & header_, ContextPtr context_, - const ThrottlerPtr & throttler = nullptr, const Scalars & scalars_ = Scalars(), const Tables & external_tables_ = Tables(), - QueryProcessingStage::Enum stage_ = QueryProcessingStage::Complete); + const ConnectionPoolWithFailoverPtr & pool, + const String & query_, const Block & header_, ContextPtr context_, + const ThrottlerPtr & throttler = nullptr, const Scalars & scalars_ = Scalars(), const Tables & external_tables_ = Tables(), + QueryProcessingStage::Enum stage_ = QueryProcessingStage::Complete); /// Set the query_id. For now, used by performance test to later find the query /// in the server query_log. Must be called before sending the query to the server. @@ -52,9 +53,6 @@ public: void setMainTable(StorageID main_table_) { query_executor.setMainTable(std::move(main_table_)); } - /// Sends query (initiates calculation) before read() - void readPrefix() override; - /// Prevent default progress notification because progress' callback is called by its own. 
void progress(const Progress & /*value*/) override {} diff --git a/src/DataStreams/RemoteQueryExecutor.cpp b/src/DataStreams/RemoteQueryExecutor.cpp index 0c60bfdbfdb..21e874691c1 100644 --- a/src/DataStreams/RemoteQueryExecutor.cpp +++ b/src/DataStreams/RemoteQueryExecutor.cpp @@ -1,3 +1,4 @@ +#include #include #include @@ -17,6 +18,12 @@ #include #include +namespace CurrentMetrics +{ +extern const Metric SyncDrainedConnections; +extern const Metric ActiveSyncDrainedConnections; +} + namespace DB { @@ -27,42 +34,63 @@ namespace ErrorCodes extern const int DUPLICATED_PART_UUIDS; } +RemoteQueryExecutor::RemoteQueryExecutor( + const String & query_, const Block & header_, ContextPtr context_, + const Scalars & scalars_, const Tables & external_tables_, + QueryProcessingStage::Enum stage_, std::shared_ptr task_iterator_) + : header(header_), query(query_), context(context_), scalars(scalars_) + , external_tables(external_tables_), stage(stage_), task_iterator(task_iterator_) +{} + RemoteQueryExecutor::RemoteQueryExecutor( Connection & connection, const String & query_, const Block & header_, ContextPtr context_, ThrottlerPtr throttler, const Scalars & scalars_, const Tables & external_tables_, QueryProcessingStage::Enum stage_, std::shared_ptr task_iterator_) - : header(header_), query(query_), context(context_) - , scalars(scalars_), external_tables(external_tables_), stage(stage_), task_iterator(task_iterator_) + : RemoteQueryExecutor(query_, header_, context_, scalars_, external_tables_, stage_, task_iterator_) { create_connections = [this, &connection, throttler]() { - return std::make_unique(connection, context->getSettingsRef(), throttler); + return std::make_shared(connection, context->getSettingsRef(), throttler); }; } RemoteQueryExecutor::RemoteQueryExecutor( + std::shared_ptr connection_ptr, + const String & query_, const Block & header_, ContextPtr context_, + ThrottlerPtr throttler, const Scalars & scalars_, const Tables & external_tables_, + QueryProcessingStage::Enum stage_, std::shared_ptr task_iterator_) + : RemoteQueryExecutor(query_, header_, context_, scalars_, external_tables_, stage_, task_iterator_) +{ + create_connections = [this, connection_ptr, throttler]() + { + return std::make_shared(connection_ptr, context->getSettingsRef(), throttler); + }; +} + +RemoteQueryExecutor::RemoteQueryExecutor( + const ConnectionPoolWithFailoverPtr & pool_, std::vector && connections_, const String & query_, const Block & header_, ContextPtr context_, const ThrottlerPtr & throttler, const Scalars & scalars_, const Tables & external_tables_, QueryProcessingStage::Enum stage_, std::shared_ptr task_iterator_) : header(header_), query(query_), context(context_) - , scalars(scalars_), external_tables(external_tables_), stage(stage_), task_iterator(task_iterator_) + , scalars(scalars_), external_tables(external_tables_), stage(stage_), task_iterator(task_iterator_), pool(pool_) { create_connections = [this, connections_, throttler]() mutable { - return std::make_unique(std::move(connections_), context->getSettingsRef(), throttler); + return std::make_shared(std::move(connections_), context->getSettingsRef(), throttler); }; } RemoteQueryExecutor::RemoteQueryExecutor( - const ConnectionPoolWithFailoverPtr & pool, + const ConnectionPoolWithFailoverPtr & pool_, const String & query_, const Block & header_, ContextPtr context_, const ThrottlerPtr & throttler, const Scalars & scalars_, const Tables & external_tables_, QueryProcessingStage::Enum stage_, std::shared_ptr task_iterator_) : 
header(header_), query(query_), context(context_) - , scalars(scalars_), external_tables(external_tables_), stage(stage_), task_iterator(task_iterator_) + , scalars(scalars_), external_tables(external_tables_), stage(stage_), task_iterator(task_iterator_), pool(pool_) { - create_connections = [this, pool, throttler]()->std::unique_ptr + create_connections = [this, throttler]()->std::shared_ptr { const Settings & current_settings = context->getSettingsRef(); auto timeouts = ConnectionTimeouts::getTCPTimeoutsWithFailover(current_settings); @@ -74,7 +102,7 @@ RemoteQueryExecutor::RemoteQueryExecutor( if (main_table) table_to_check = std::make_shared(main_table.getQualifiedName()); - return std::make_unique(pool, current_settings, timeouts, throttler, pool_mode, table_to_check); + return std::make_shared(pool, context, timeouts, throttler, pool_mode, table_to_check); } #endif @@ -89,7 +117,7 @@ RemoteQueryExecutor::RemoteQueryExecutor( else connection_entries = pool->getMany(timeouts, ¤t_settings, pool_mode); - return std::make_unique(std::move(connection_entries), current_settings, throttler); + return std::make_shared(std::move(connection_entries), current_settings, throttler); }; } @@ -406,32 +434,15 @@ void RemoteQueryExecutor::finish(std::unique_ptr * read_context) /// Send the request to abort the execution of the request, if not already sent. tryCancel("Cancelling query because enough data has been read", read_context); - - /// Get the remaining packets so that there is no out of sync in the connections to the replicas. - Packet packet = connections->drain(); - switch (packet.type) + /// Try to drain connections asynchronously. + if (auto conn = ConnectionCollector::enqueueConnectionCleanup(pool, connections)) { - case Protocol::Server::EndOfStream: - finished = true; - break; - - case Protocol::Server::Log: - /// Pass logs from remote server to client - if (auto log_queue = CurrentThread::getInternalTextLogsQueue()) - log_queue->pushBlock(std::move(packet.block)); - break; - - case Protocol::Server::Exception: - got_exception_from_replica = true; - packet.exception->rethrow(); - break; - - default: - got_unknown_packet_from_replica = true; - throw Exception(ErrorCodes::UNKNOWN_PACKET_FROM_SERVER, "Unknown packet {} from one of the following replicas: {}", - toString(packet.type), - connections->dumpAddresses()); + /// Drain connections synchronously. + CurrentMetrics::Increment metric_increment(CurrentMetrics::ActiveSyncDrainedConnections); + ConnectionCollector::drainConnections(*conn); + CurrentMetrics::add(CurrentMetrics::SyncDrainedConnections, 1); } + finished = true; } void RemoteQueryExecutor::cancel(std::unique_ptr * read_context) @@ -506,20 +517,18 @@ void RemoteQueryExecutor::sendExternalTables() void RemoteQueryExecutor::tryCancel(const char * reason, std::unique_ptr * read_context) { - { - /// Flag was_cancelled is atomic because it is checked in read(). - std::lock_guard guard(was_cancelled_mutex); + /// Flag was_cancelled is atomic because it is checked in read(). 
+ std::lock_guard guard(was_cancelled_mutex); - if (was_cancelled) - return; + if (was_cancelled) + return; - was_cancelled = true; + was_cancelled = true; - if (read_context && *read_context) - (*read_context)->cancel(); + if (read_context && *read_context) + (*read_context)->cancel(); - connections->sendCancel(); - } + connections->sendCancel(); if (log) LOG_TRACE(log, "({}) {}", connections->dumpAddresses(), reason); diff --git a/src/DataStreams/RemoteQueryExecutor.h b/src/DataStreams/RemoteQueryExecutor.h index a9cffd9cf97..3df17fab3fc 100644 --- a/src/DataStreams/RemoteQueryExecutor.h +++ b/src/DataStreams/RemoteQueryExecutor.h @@ -36,14 +36,23 @@ public: using ReadContext = RemoteQueryExecutorReadContext; /// Takes already set connection. + /// We don't own connection, thus we have to drain it synchronously. RemoteQueryExecutor( Connection & connection, const String & query_, const Block & header_, ContextPtr context_, ThrottlerPtr throttler_ = nullptr, const Scalars & scalars_ = Scalars(), const Tables & external_tables_ = Tables(), QueryProcessingStage::Enum stage_ = QueryProcessingStage::Complete, std::shared_ptr task_iterator_ = {}); + /// Takes already set connection. + RemoteQueryExecutor( + std::shared_ptr connection, + const String & query_, const Block & header_, ContextPtr context_, + ThrottlerPtr throttler_ = nullptr, const Scalars & scalars_ = Scalars(), const Tables & external_tables_ = Tables(), + QueryProcessingStage::Enum stage_ = QueryProcessingStage::Complete, std::shared_ptr task_iterator_ = {}); + /// Accepts several connections already taken from pool. RemoteQueryExecutor( + const ConnectionPoolWithFailoverPtr & pool, std::vector && connections_, const String & query_, const Block & header_, ContextPtr context_, const ThrottlerPtr & throttler = nullptr, const Scalars & scalars_ = Scalars(), const Tables & external_tables_ = Tables(), @@ -103,13 +112,15 @@ public: const Block & getHeader() const { return header; } private: + RemoteQueryExecutor( + const String & query_, const Block & header_, ContextPtr context_, + const Scalars & scalars_, const Tables & external_tables_, + QueryProcessingStage::Enum stage_, std::shared_ptr task_iterator_); + Block header; Block totals; Block extremes; - std::function()> create_connections; - std::unique_ptr connections; - const String query; String query_id; ContextPtr context; @@ -125,6 +136,12 @@ private: /// Initiator identifier for distributed task processing std::shared_ptr task_iterator; + std::function()> create_connections; + /// Hold a shared reference to the connection pool so that asynchronous connection draining will + /// work safely. Make sure it's the first member so that we don't destruct it too early. 
+ const ConnectionPoolWithFailoverPtr pool; + std::shared_ptr connections; + /// Streams for reading from temporary tables and following sending of data /// to remote servers for GLOBAL-subqueries std::vector external_tables_data; diff --git a/src/DataStreams/RemoteQueryExecutorReadContext.cpp b/src/DataStreams/RemoteQueryExecutorReadContext.cpp index eb89fd9398f..c1f415bb597 100644 --- a/src/DataStreams/RemoteQueryExecutorReadContext.cpp +++ b/src/DataStreams/RemoteQueryExecutorReadContext.cpp @@ -43,7 +43,7 @@ struct RemoteQueryExecutorRoutine { while (true) { - read_context.packet = connections.receivePacketUnlocked(ReadCallback{read_context, sink}); + read_context.packet = connections.receivePacketUnlocked(ReadCallback{read_context, sink}, false /* is_draining */); sink = std::move(sink).resume(); } } @@ -144,7 +144,7 @@ bool RemoteQueryExecutorReadContext::checkTimeoutImpl(bool blocking) if (is_timer_alarmed && !is_socket_ready) { - /// Socket receive timeout. Drain it in case or error, or it may be hide by timeout exception. + /// Socket receive timeout. Drain it in case of error, or it may be hide by timeout exception. timer.drain(); throw NetException("Timeout exceeded", ErrorCodes::SOCKET_TIMEOUT); } diff --git a/src/DataStreams/TemporaryFileStream.h b/src/DataStreams/TemporaryFileStream.h index ce9071801d0..ec38f6c1baa 100644 --- a/src/DataStreams/TemporaryFileStream.h +++ b/src/DataStreams/TemporaryFileStream.h @@ -4,6 +4,9 @@ #include #include #include +#include +#include +#include #include #include #include @@ -32,32 +35,38 @@ struct TemporaryFileStream {} /// Flush data from input stream into file for future reading - static void write(const std::string & path, const Block & header, IBlockInputStream & input, - std::atomic * is_cancelled, const std::string & codec) + static void write(const std::string & path, const Block & header, QueryPipeline pipeline, const std::string & codec) { WriteBufferFromFile file_buf(path); CompressedWriteBuffer compressed_buf(file_buf, CompressionCodecFactory::instance().get(codec, {})); NativeBlockOutputStream output(compressed_buf, 0, header); - copyData(input, output, is_cancelled); + + PullingPipelineExecutor executor(pipeline); + + output.writePrefix(); + + Block block; + while (executor.pull(block)) + output.write(block); + + output.writeSuffix(); compressed_buf.finalize(); } }; -class TemporaryFileLazyInputStream : public IBlockInputStream +class TemporaryFileLazySource : public ISource { public: - TemporaryFileLazyInputStream(const std::string & path_, const Block & header_) - : path(path_) - , header(header_) + TemporaryFileLazySource(const std::string & path_, const Block & header_) + : ISource(header_) + , path(path_) , done(false) {} - String getName() const override { return "TemporaryFile"; } - Block getHeader() const override { return header; } - void readSuffix() override {} + String getName() const override { return "TemporaryFileLazySource"; } protected: - Block readImpl() override + Chunk generate() override { if (done) return {}; @@ -71,7 +80,7 @@ protected: done = true; stream.reset(); } - return block; + return Chunk(block.getColumns(), block.rows()); } private: diff --git a/src/DataStreams/formatBlock.cpp b/src/DataStreams/formatBlock.cpp new file mode 100644 index 00000000000..e38540256ac --- /dev/null +++ b/src/DataStreams/formatBlock.cpp @@ -0,0 +1,15 @@ +#include +#include +#include + +namespace DB +{ +void formatBlock(BlockOutputStreamPtr & out, const Block & block) +{ + out->writePrefix(); + out->write(block); + 
out->writeSuffix(); + out->flush(); +} + +} diff --git a/src/DataStreams/formatBlock.h b/src/DataStreams/formatBlock.h new file mode 100644 index 00000000000..939b72682c3 --- /dev/null +++ b/src/DataStreams/formatBlock.h @@ -0,0 +1,9 @@ +#pragma once + +#include + +namespace DB +{ +void formatBlock(BlockOutputStreamPtr & out, const Block & block); + +} diff --git a/src/DataStreams/narrowBlockInputStreams.h b/src/DataStreams/narrowBlockInputStreams.h index 97e9c164ddc..c026f5fbedf 100644 --- a/src/DataStreams/narrowBlockInputStreams.h +++ b/src/DataStreams/narrowBlockInputStreams.h @@ -1,6 +1,6 @@ #pragma once -#include +#include namespace DB diff --git a/src/DataStreams/tests/gtest_blocks_size_merging_streams.cpp b/src/DataStreams/tests/gtest_blocks_size_merging_streams.cpp index 0ce450c4e6c..aa4c717a28b 100644 --- a/src/DataStreams/tests/gtest_blocks_size_merging_streams.cpp +++ b/src/DataStreams/tests/gtest_blocks_size_merging_streams.cpp @@ -1,11 +1,10 @@ #include #include #include -#include +#include #include #include #include -#include #include #include #include @@ -40,7 +39,7 @@ static Pipe getInputStreams(const std::vector & column_names, const size_t start = stride; while (blocks_count--) blocks.push_back(getBlockWithSize(column_names, block_size_in_bytes, stride, start)); - pipes.emplace_back(std::make_shared(std::make_shared(std::move(blocks)))); + pipes.emplace_back(std::make_shared(std::move(blocks))); } return Pipe::unitePipes(std::move(pipes)); @@ -57,7 +56,7 @@ static Pipe getInputStreamsEqualStride(const std::vector & column_n size_t start = i; while (blocks_count--) blocks.push_back(getBlockWithSize(column_names, block_size_in_bytes, stride, start)); - pipes.emplace_back(std::make_shared(std::make_shared(std::move(blocks)))); + pipes.emplace_back(std::make_shared(std::move(blocks))); i++; } return Pipe::unitePipes(std::move(pipes)); @@ -84,7 +83,7 @@ TEST(MergingSortedTest, SimpleBlockSizeTest) EXPECT_EQ(pipe.numOutputPorts(), 3); auto transform = std::make_shared(pipe.getHeader(), pipe.numOutputPorts(), sort_description, - DEFAULT_MERGE_BLOCK_SIZE, 0, nullptr, false, true); + DEFAULT_MERGE_BLOCK_SIZE, 0, false, nullptr, false, true); pipe.addTransform(std::move(transform)); @@ -129,7 +128,7 @@ TEST(MergingSortedTest, MoreInterestingBlockSizes) EXPECT_EQ(pipe.numOutputPorts(), 3); auto transform = std::make_shared(pipe.getHeader(), pipe.numOutputPorts(), sort_description, - DEFAULT_MERGE_BLOCK_SIZE, 0, nullptr, false, true); + DEFAULT_MERGE_BLOCK_SIZE, 0, false, nullptr, false, true); pipe.addTransform(std::move(transform)); diff --git a/src/DataStreams/tests/gtest_check_sorted_stream.cpp b/src/DataStreams/tests/gtest_check_sorted_stream.cpp index 0c5cc6d58e1..2788c44389b 100644 --- a/src/DataStreams/tests/gtest_check_sorted_stream.cpp +++ b/src/DataStreams/tests/gtest_check_sorted_stream.cpp @@ -2,8 +2,10 @@ #include #include -#include -#include +#include +#include +#include +#include #include @@ -89,14 +91,22 @@ TEST(CheckSortedBlockInputStream, CheckGoodCase) for (size_t i = 0; i < 3; ++i) blocks.push_back(getSortedBlockWithSize(key_columns, 10, 1, i * 10)); - BlockInputStreamPtr stream = std::make_shared(std::move(blocks)); + Pipe pipe(std::make_shared(std::move(blocks))); + pipe.addSimpleTransform([&](const Block & header) + { + return std::make_shared(header, sort_description); + }); - CheckSortedBlockInputStream sorted(stream, sort_description); + QueryPipeline pipeline; + pipeline.init(std::move(pipe)); - EXPECT_NO_THROW(sorted.read()); - 
EXPECT_NO_THROW(sorted.read()); - EXPECT_NO_THROW(sorted.read()); - EXPECT_EQ(sorted.read(), Block()); + PullingPipelineExecutor executor(pipeline); + + Chunk chunk; + EXPECT_NO_THROW(executor.pull(chunk)); + EXPECT_NO_THROW(executor.pull(chunk)); + EXPECT_NO_THROW(executor.pull(chunk)); + EXPECT_FALSE(executor.pull(chunk)); } TEST(CheckSortedBlockInputStream, CheckBadLastRow) @@ -109,14 +119,21 @@ TEST(CheckSortedBlockInputStream, CheckBadLastRow) blocks.push_back(getSortedBlockWithSize(key_columns, 100, 1, 0)); blocks.push_back(getSortedBlockWithSize(key_columns, 100, 1, 300)); - BlockInputStreamPtr stream = std::make_shared(std::move(blocks)); + Pipe pipe(std::make_shared(std::move(blocks))); + pipe.addSimpleTransform([&](const Block & header) + { + return std::make_shared(header, sort_description); + }); - CheckSortedBlockInputStream sorted(stream, sort_description); + QueryPipeline pipeline; + pipeline.init(std::move(pipe)); + PullingPipelineExecutor executor(pipeline); - EXPECT_NO_THROW(sorted.read()); - EXPECT_NO_THROW(sorted.read()); - EXPECT_THROW(sorted.read(), DB::Exception); + Chunk chunk; + EXPECT_NO_THROW(executor.pull(chunk)); + EXPECT_NO_THROW(executor.pull(chunk)); + EXPECT_THROW(executor.pull(chunk), DB::Exception); } @@ -127,11 +144,19 @@ TEST(CheckSortedBlockInputStream, CheckUnsortedBlock1) BlocksList blocks; blocks.push_back(getUnSortedBlockWithSize(key_columns, 100, 1, 0, 5, 1, 77)); - BlockInputStreamPtr stream = std::make_shared(std::move(blocks)); + Pipe pipe(std::make_shared(std::move(blocks))); + pipe.addSimpleTransform([&](const Block & header) + { + return std::make_shared(header, sort_description); + }); - CheckSortedBlockInputStream sorted(stream, sort_description); + QueryPipeline pipeline; + pipeline.init(std::move(pipe)); - EXPECT_THROW(sorted.read(), DB::Exception); + PullingPipelineExecutor executor(pipeline); + + Chunk chunk; + EXPECT_THROW(executor.pull(chunk), DB::Exception); } TEST(CheckSortedBlockInputStream, CheckUnsortedBlock2) @@ -141,11 +166,19 @@ TEST(CheckSortedBlockInputStream, CheckUnsortedBlock2) BlocksList blocks; blocks.push_back(getUnSortedBlockWithSize(key_columns, 100, 1, 0, 99, 2, 77)); - BlockInputStreamPtr stream = std::make_shared(std::move(blocks)); + Pipe pipe(std::make_shared(std::move(blocks))); + pipe.addSimpleTransform([&](const Block & header) + { + return std::make_shared(header, sort_description); + }); - CheckSortedBlockInputStream sorted(stream, sort_description); + QueryPipeline pipeline; + pipeline.init(std::move(pipe)); - EXPECT_THROW(sorted.read(), DB::Exception); + PullingPipelineExecutor executor(pipeline); + + Chunk chunk; + EXPECT_THROW(executor.pull(chunk), DB::Exception); } TEST(CheckSortedBlockInputStream, CheckUnsortedBlock3) @@ -155,11 +188,19 @@ TEST(CheckSortedBlockInputStream, CheckUnsortedBlock3) BlocksList blocks; blocks.push_back(getUnSortedBlockWithSize(key_columns, 100, 1, 0, 50, 0, 77)); - BlockInputStreamPtr stream = std::make_shared(std::move(blocks)); + Pipe pipe(std::make_shared(std::move(blocks))); + pipe.addSimpleTransform([&](const Block & header) + { + return std::make_shared(header, sort_description); + }); - CheckSortedBlockInputStream sorted(stream, sort_description); + QueryPipeline pipeline; + pipeline.init(std::move(pipe)); - EXPECT_THROW(sorted.read(), DB::Exception); + PullingPipelineExecutor executor(pipeline); + + Chunk chunk; + EXPECT_THROW(executor.pull(chunk), DB::Exception); } TEST(CheckSortedBlockInputStream, CheckEqualBlock) @@ -171,11 +212,19 @@ 
TEST(CheckSortedBlockInputStream, CheckEqualBlock) blocks.push_back(getEqualValuesBlockWithSize(key_columns, 10)); blocks.push_back(getEqualValuesBlockWithSize(key_columns, 1)); - BlockInputStreamPtr stream = std::make_shared(std::move(blocks)); + Pipe pipe(std::make_shared(std::move(blocks))); + pipe.addSimpleTransform([&](const Block & header) + { + return std::make_shared(header, sort_description); + }); - CheckSortedBlockInputStream sorted(stream, sort_description); + QueryPipeline pipeline; + pipeline.init(std::move(pipe)); - EXPECT_NO_THROW(sorted.read()); - EXPECT_NO_THROW(sorted.read()); - EXPECT_NO_THROW(sorted.read()); + PullingPipelineExecutor executor(pipeline); + + Chunk chunk; + EXPECT_NO_THROW(executor.pull(chunk)); + EXPECT_NO_THROW(executor.pull(chunk)); + EXPECT_NO_THROW(executor.pull(chunk)); } diff --git a/src/DataStreams/ya.make b/src/DataStreams/ya.make index e6534ebc2f7..2012af76697 100644 --- a/src/DataStreams/ya.make +++ b/src/DataStreams/ya.make @@ -14,13 +14,12 @@ NO_COMPILER_WARNINGS() SRCS( AddingDefaultBlockOutputStream.cpp - AddingDefaultsBlockInputStream.cpp AsynchronousBlockInputStream.cpp BlockIO.cpp BlockStreamProfileInfo.cpp CheckConstraintsBlockOutputStream.cpp - CheckSortedBlockInputStream.cpp ColumnGathererStream.cpp + ConnectionCollector.cpp ConvertingBlockInputStream.cpp CountingBlockOutputStream.cpp DistinctSortedBlockInputStream.cpp @@ -28,11 +27,8 @@ SRCS( ExpressionBlockInputStream.cpp IBlockInputStream.cpp ITTLAlgorithm.cpp - InputStreamFromASTInsertQuery.cpp InternalTextLogsRowOutputStream.cpp - LimitBlockInputStream.cpp MaterializingBlockInputStream.cpp - MergingSortedBlockInputStream.cpp MongoDBBlockInputStream.cpp NativeBlockInputStream.cpp NativeBlockOutputStream.cpp diff --git a/src/DataTypes/DataTypeAggregateFunction.h b/src/DataTypes/DataTypeAggregateFunction.h index c3fea2ba727..f9bfbcafdab 100644 --- a/src/DataTypes/DataTypeAggregateFunction.h +++ b/src/DataTypes/DataTypeAggregateFunction.h @@ -33,6 +33,8 @@ public: const char * getFamilyName() const override { return "AggregateFunction"; } TypeIndex getTypeId() const override { return TypeIndex::AggregateFunction; } + Array getParameters() const { return parameters; } + bool canBeInsideNullable() const override { return false; } DataTypePtr getReturnType() const { return function->getReturnType(); } diff --git a/src/DataTypes/DataTypeDate32.h b/src/DataTypes/DataTypeDate32.h index 17f2f8b9924..e74e4553614 100644 --- a/src/DataTypes/DataTypeDate32.h +++ b/src/DataTypes/DataTypeDate32.h @@ -1,6 +1,7 @@ #pragma once #include +#include namespace DB { @@ -12,6 +13,11 @@ public: TypeIndex getTypeId() const override { return TypeIndex::Date32; } const char * getFamilyName() const override { return family_name; } + Field getDefault() const override + { + return -static_cast(DateLUT::instance().getDayNumOffsetEpoch()); + } + bool canBeUsedAsVersion() const override { return true; } bool canBeInsideNullable() const override { return true; } diff --git a/src/DataTypes/DataTypeInterval.h b/src/DataTypes/DataTypeInterval.h index d66b329185d..a44fd686b61 100644 --- a/src/DataTypes/DataTypeInterval.h +++ b/src/DataTypes/DataTypeInterval.h @@ -36,6 +36,7 @@ public: bool isParametric() const override { return true; } bool cannotBeStoredInTables() const override { return true; } bool isCategorial() const override { return false; } + bool canBeInsideNullable() const override { return true; } }; } diff --git a/src/DataTypes/FieldToDataType.cpp b/src/DataTypes/FieldToDataType.cpp index 
c1a8cacd5c2..3c3439593ed 100644 --- a/src/DataTypes/FieldToDataType.cpp +++ b/src/DataTypes/FieldToDataType.cpp @@ -19,6 +19,7 @@ namespace DB namespace ErrorCodes { extern const int EMPTY_DATA_PASSED; + extern const int LOGICAL_ERROR; } @@ -27,6 +28,16 @@ DataTypePtr FieldToDataType::operator() (const Null &) const return std::make_shared(std::make_shared()); } +DataTypePtr FieldToDataType::operator() (const NegativeInfinity &) const +{ + throw Exception("It's invalid to have -inf literals in SQL", ErrorCodes::LOGICAL_ERROR); +} + +DataTypePtr FieldToDataType::operator() (const PositiveInfinity &) const +{ + throw Exception("It's invalid to have +inf literals in SQL", ErrorCodes::LOGICAL_ERROR); +} + DataTypePtr FieldToDataType::operator() (const UInt64 & x) const { if (x <= std::numeric_limits::max()) return std::make_shared(); diff --git a/src/DataTypes/FieldToDataType.h b/src/DataTypes/FieldToDataType.h index ca83ce868fc..6d579b2bf65 100644 --- a/src/DataTypes/FieldToDataType.h +++ b/src/DataTypes/FieldToDataType.h @@ -21,6 +21,8 @@ class FieldToDataType : public StaticVisitor { public: DataTypePtr operator() (const Null & x) const; + DataTypePtr operator() (const NegativeInfinity & x) const; + DataTypePtr operator() (const PositiveInfinity & x) const; DataTypePtr operator() (const UInt64 & x) const; DataTypePtr operator() (const UInt128 & x) const; DataTypePtr operator() (const UInt256 & x) const; diff --git a/src/DataTypes/IDataType.h b/src/DataTypes/IDataType.h index 5eba65e39b9..c4f04282487 100644 --- a/src/DataTypes/IDataType.h +++ b/src/DataTypes/IDataType.h @@ -417,7 +417,7 @@ template inline bool isColumnedAsNumber(const T & data_type) { WhichDataType which(data_type); - return which.isInt() || which.isUInt() || which.isFloat() || which.isDate() || which.isDateTime() || which.isDateTime64() || which.isUUID(); + return which.isInt() || which.isUInt() || which.isFloat() || which.isDateOrDate32() || which.isDateTime() || which.isDateTime64() || which.isUUID(); } template @@ -484,6 +484,7 @@ template class DataTypeNumber; class DataTypeDate; +class DataTypeDate32; class DataTypeDateTime; class DataTypeDateTime64; @@ -493,6 +494,7 @@ template <> inline constexpr bool IsDataTypeDecimal = true; template constexpr bool IsDataTypeNumber> = true; template <> inline constexpr bool IsDataTypeDateOrDateTime = true; +template <> inline constexpr bool IsDataTypeDateOrDateTime = true; template <> inline constexpr bool IsDataTypeDateOrDateTime = true; template <> inline constexpr bool IsDataTypeDateOrDateTime = true; diff --git a/src/DataTypes/Serializations/SerializationIP.cpp b/src/DataTypes/Serializations/SerializationIP.cpp index ec49f960c77..14790c6b530 100644 --- a/src/DataTypes/Serializations/SerializationIP.cpp +++ b/src/DataTypes/Serializations/SerializationIP.cpp @@ -1,8 +1,11 @@ #include + #include +#include #include #include -#include +#include +#include namespace DB { diff --git a/src/Databases/DatabaseFactory.cpp b/src/Databases/DatabaseFactory.cpp index 6a1914bf046..75a3b9c9e1e 100644 --- a/src/Databases/DatabaseFactory.cpp +++ b/src/Databases/DatabaseFactory.cpp @@ -23,8 +23,8 @@ # include # include # include -# include -# include +# include +# include # include #endif @@ -103,8 +103,11 @@ DatabasePtr DatabaseFactory::getImpl(const ASTCreateQuery & create, const String const String & engine_name = engine_define->engine->name; const UUID & uuid = create.uuid; - bool engine_may_have_arguments = engine_name == "MySQL" || engine_name == "MaterializeMySQL" || engine_name == "Lazy" || 
- engine_name == "Replicated" || engine_name == "PostgreSQL" || engine_name == "MaterializedPostgreSQL" || engine_name == "SQLite"; + static const std::unordered_set engines_with_arguments{"MySQL", "MaterializeMySQL", "MaterializedMySQL", + "Lazy", "Replicated", "PostgreSQL", "MaterializedPostgreSQL", "SQLite"}; + + bool engine_may_have_arguments = engines_with_arguments.contains(engine_name); + if (engine_define->engine->arguments && !engine_may_have_arguments) throw Exception("Database engine " + engine_name + " cannot have arguments", ErrorCodes::BAD_ARGUMENTS); @@ -112,6 +115,7 @@ DatabasePtr DatabaseFactory::getImpl(const ASTCreateQuery & create, const String engine_define->primary_key || engine_define->order_by || engine_define->sample_by; bool may_have_settings = endsWith(engine_name, "MySQL") || engine_name == "Replicated" || engine_name == "MaterializedPostgreSQL"; + if (has_unexpected_element || (!may_have_settings && engine_define->settings)) throw Exception("Database engine " + engine_name + " cannot have parameters, primary_key, order_by, sample_by, settings", ErrorCodes::UNKNOWN_ELEMENT_IN_AST); @@ -127,7 +131,7 @@ DatabasePtr DatabaseFactory::getImpl(const ASTCreateQuery & create, const String #if USE_MYSQL - else if (engine_name == "MySQL" || engine_name == "MaterializeMySQL") + else if (engine_name == "MySQL" || engine_name == "MaterializeMySQL" || engine_name == "MaterializedMySQL") { const ASTFunction * engine = engine_define->engine; if (!engine->arguments || engine->arguments->children.size() != 4) @@ -165,17 +169,17 @@ DatabasePtr DatabaseFactory::getImpl(const ASTCreateQuery & create, const String auto mysql_pool = mysqlxx::Pool(mysql_database_name, remote_host_name, mysql_user_name, mysql_user_password, remote_port); - auto materialize_mode_settings = std::make_unique(); + auto materialize_mode_settings = std::make_unique(); if (engine_define->settings) materialize_mode_settings->loadFromQuery(*engine_define); if (create.uuid == UUIDHelpers::Nil) - return std::make_shared>( + return std::make_shared>( context, database_name, metadata_path, uuid, mysql_database_name, std::move(mysql_pool), std::move(client) , std::move(materialize_mode_settings)); else - return std::make_shared>( + return std::make_shared>( context, database_name, metadata_path, uuid, mysql_database_name, std::move(mysql_pool), std::move(client) , std::move(materialize_mode_settings)); } @@ -232,11 +236,10 @@ DatabasePtr DatabaseFactory::getImpl(const ASTCreateQuery & create, const String { const ASTFunction * engine = engine_define->engine; - if (!engine->arguments || engine->arguments->children.size() < 4 || engine->arguments->children.size() > 5) - throw Exception(fmt::format( - "{} Database require host:port, database_name, username, password arguments " - "[, use_table_cache = 0].", engine_name), - ErrorCodes::BAD_ARGUMENTS); + if (!engine->arguments || engine->arguments->children.size() < 4 || engine->arguments->children.size() > 6) + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "{} Database require `host:port`, `database_name`, `username`, `password` [, `schema` = "", `use_table_cache` = 0].", + engine_name); ASTs & engine_args = engine->arguments->children; @@ -248,9 +251,13 @@ DatabasePtr DatabaseFactory::getImpl(const ASTCreateQuery & create, const String const auto & username = safeGetLiteralValue(engine_args[2], engine_name); const auto & password = safeGetLiteralValue(engine_args[3], engine_name); + String schema; + if (engine->arguments->children.size() >= 5) + schema = 
safeGetLiteralValue(engine_args[4], engine_name); + auto use_table_cache = 0; - if (engine->arguments->children.size() == 5) - use_table_cache = safeGetLiteralValue(engine_args[4], engine_name); + if (engine->arguments->children.size() >= 6) + use_table_cache = safeGetLiteralValue(engine_args[5], engine_name); /// Split into replicas if needed. size_t max_addresses = context->getSettingsRef().glob_expansion_max_elements; @@ -265,7 +272,7 @@ DatabasePtr DatabaseFactory::getImpl(const ASTCreateQuery & create, const String context->getSettingsRef().postgresql_connection_pool_wait_timeout); return std::make_shared( - context, metadata_path, engine_define, database_name, postgres_database_name, connection_pool, use_table_cache); + context, metadata_path, engine_define, database_name, postgres_database_name, schema, connection_pool, use_table_cache); } else if (engine_name == "MaterializedPostgreSQL") { @@ -273,9 +280,9 @@ DatabasePtr DatabaseFactory::getImpl(const ASTCreateQuery & create, const String if (!engine->arguments || engine->arguments->children.size() != 4) { - throw Exception( - fmt::format("{} Database require host:port, database_name, username, password arguments ", engine_name), - ErrorCodes::BAD_ARGUMENTS); + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "{} Database require `host:port`, `database_name`, `username`, `password`.", + engine_name); } ASTs & engine_args = engine->arguments->children; @@ -317,7 +324,7 @@ DatabasePtr DatabaseFactory::getImpl(const ASTCreateQuery & create, const String String database_path = safeGetLiteralValue(arguments[0], "SQLite"); - return std::make_shared(context, engine_define, database_path); + return std::make_shared(context, engine_define, create.attach, database_path); } #endif diff --git a/src/Databases/DatabaseLazy.cpp b/src/Databases/DatabaseLazy.cpp index 28f9372a61e..abcb8dbb974 100644 --- a/src/Databases/DatabaseLazy.cpp +++ b/src/Databases/DatabaseLazy.cpp @@ -305,12 +305,12 @@ void DatabaseLazy::clearExpiredTables() const DatabaseLazyIterator::DatabaseLazyIterator(DatabaseLazy & database_, Strings && table_names_) - : database(database_) + : IDatabaseTablesIterator(database_.database_name) + , database(database_) , table_names(std::move(table_names_)) , iterator(table_names.begin()) , current_storage(nullptr) { - database_name = database.database_name; } void DatabaseLazyIterator::next() diff --git a/src/Databases/DatabaseReplicated.cpp b/src/Databases/DatabaseReplicated.cpp index b3e5fc67151..26dd8763c40 100644 --- a/src/Databases/DatabaseReplicated.cpp +++ b/src/Databases/DatabaseReplicated.cpp @@ -40,6 +40,7 @@ namespace ErrorCodes extern const int NOT_IMPLEMENTED; extern const int INCORRECT_QUERY; extern const int ALL_CONNECTION_TRIES_FAILED; + extern const int NO_ACTIVE_REPLICAS; } static constexpr const char * DROPPED_MARK = "DROPPED"; @@ -137,7 +138,9 @@ ClusterPtr DatabaseReplicated::getClusterImpl() const Coordination::Stat stat; hosts = zookeeper->getChildren(zookeeper_path + "/replicas", &stat); if (hosts.empty()) - throw Exception(ErrorCodes::LOGICAL_ERROR, "No hosts found"); + throw Exception(ErrorCodes::NO_ACTIVE_REPLICAS, "No replicas of database {} found. 
" + "It's possible if the first replica is not fully created yet " + "or if the last replica was just dropped or due to logical error", database_name); Int32 cversion = stat.cversion; std::sort(hosts.begin(), hosts.end()); @@ -193,7 +196,17 @@ ClusterPtr DatabaseReplicated::getClusterImpl() const UInt16 default_port = getContext()->getTCPPort(); bool secure = db_settings.cluster_secure_connection; - return std::make_shared(getContext()->getSettingsRef(), shards, username, password, default_port, false, secure); + bool treat_local_as_remote = false; + bool treat_local_port_as_remote = getContext()->getApplicationType() == Context::ApplicationType::LOCAL; + return std::make_shared( + getContext()->getSettingsRef(), + shards, + username, + password, + default_port, + treat_local_as_remote, + treat_local_port_as_remote, + secure); } void DatabaseReplicated::tryConnectToZooKeeperAndInitDatabase(bool force_attach) @@ -504,6 +517,19 @@ void DatabaseReplicated::recoverLostReplica(const ZooKeeperPtr & current_zookeep } } + auto make_query_context = [this, current_zookeeper]() + { + auto query_context = Context::createCopy(getContext()); + query_context->makeQueryContext(); + query_context->getClientInfo().query_kind = ClientInfo::QueryKind::SECONDARY_QUERY; + query_context->getClientInfo().is_replicated_database_internal = true; + query_context->setCurrentDatabase(database_name); + query_context->setCurrentQueryId(""); + auto txn = std::make_shared(current_zookeeper, zookeeper_path, false, ""); + query_context->initZooKeeperMetadataTransaction(txn); + return query_context; + }; + String db_name = getDatabaseName(); String to_db_name = getDatabaseName() + BROKEN_TABLES_SUFFIX; if (total_tables * db_settings.max_broken_tables_ratio < tables_to_detach.size()) @@ -538,7 +564,7 @@ void DatabaseReplicated::recoverLostReplica(const ZooKeeperPtr & current_zookeep dropped_dictionaries += table->isDictionary(); table->flushAndShutdown(); - DatabaseAtomic::dropTable(getContext(), table_name, true); + DatabaseAtomic::dropTable(make_query_context(), table_name, true); } else { @@ -548,7 +574,7 @@ void DatabaseReplicated::recoverLostReplica(const ZooKeeperPtr & current_zookeep assert(db_name < to_db_name); DDLGuardPtr to_table_guard = DatabaseCatalog::instance().getDDLGuard(to_db_name, to_name); auto to_db_ptr = DatabaseCatalog::instance().getDatabase(to_db_name); - DatabaseAtomic::renameTable(getContext(), table_name, *to_db_ptr, to_name, false, false); + DatabaseAtomic::renameTable(make_query_context(), table_name, *to_db_ptr, to_name, false, false); ++moved_tables; } } @@ -567,7 +593,7 @@ void DatabaseReplicated::recoverLostReplica(const ZooKeeperPtr & current_zookeep /// TODO Maybe we should do it in two steps: rename all tables to temporary names and then rename them to actual names? 
DDLGuardPtr table_guard = DatabaseCatalog::instance().getDDLGuard(db_name, std::min(from, to)); DDLGuardPtr to_table_guard = DatabaseCatalog::instance().getDDLGuard(db_name, std::max(from, to)); - DatabaseAtomic::renameTable(getContext(), from, *this, to, false, false); + DatabaseAtomic::renameTable(make_query_context(), from, *this, to, false, false); } for (const auto & id : dropped_tables) @@ -582,15 +608,9 @@ void DatabaseReplicated::recoverLostReplica(const ZooKeeperPtr & current_zookeep } auto query_ast = parseQueryFromMetadataInZooKeeper(name_and_meta.first, name_and_meta.second); - - auto query_context = Context::createCopy(getContext()); - query_context->makeQueryContext(); - query_context->getClientInfo().query_kind = ClientInfo::QueryKind::SECONDARY_QUERY; - query_context->setCurrentDatabase(database_name); - query_context->setCurrentQueryId(""); // generate random query_id - LOG_INFO(log, "Executing {}", serializeAST(*query_ast)); - InterpreterCreateQuery(query_ast, query_context).execute(); + auto create_query_context = make_query_context(); + InterpreterCreateQuery(query_ast, create_query_context).execute(); } current_zookeeper->set(replica_path + "/log_ptr", toString(max_log_ptr)); diff --git a/src/Databases/DatabaseReplicatedSettings.h b/src/Databases/DatabaseReplicatedSettings.h index 43003af1120..0aff26712c0 100644 --- a/src/Databases/DatabaseReplicatedSettings.h +++ b/src/Databases/DatabaseReplicatedSettings.h @@ -18,9 +18,6 @@ class ASTStorage; DECLARE_SETTINGS_TRAITS(DatabaseReplicatedSettingsTraits, LIST_OF_DATABASE_REPLICATED_SETTINGS) -/** Settings for the MaterializeMySQL database engine. - * Could be loaded from a CREATE DATABASE query (SETTINGS clause). - */ struct DatabaseReplicatedSettings : public BaseSettings { void loadFromQuery(ASTStorage & storage_def); diff --git a/src/Databases/DatabaseReplicatedWorker.cpp b/src/Databases/DatabaseReplicatedWorker.cpp index eb7e65e1b70..365a5d02816 100644 --- a/src/Databases/DatabaseReplicatedWorker.cpp +++ b/src/Databases/DatabaseReplicatedWorker.cpp @@ -60,12 +60,13 @@ void DatabaseReplicatedDDLWorker::initializeReplication() /// Check if we need to recover replica. /// Invariant: replica is lost if it's log_ptr value is less then max_log_ptr - logs_to_keep. 
- String log_ptr_str = current_zookeeper->get(database->replica_path + "/log_ptr"); + auto zookeeper = getAndSetZooKeeper(); + String log_ptr_str = zookeeper->get(database->replica_path + "/log_ptr"); UInt32 our_log_ptr = parse(log_ptr_str); - UInt32 max_log_ptr = parse(current_zookeeper->get(database->zookeeper_path + "/max_log_ptr")); - logs_to_keep = parse(current_zookeeper->get(database->zookeeper_path + "/logs_to_keep")); + UInt32 max_log_ptr = parse(zookeeper->get(database->zookeeper_path + "/max_log_ptr")); + logs_to_keep = parse(zookeeper->get(database->zookeeper_path + "/logs_to_keep")); if (our_log_ptr == 0 || our_log_ptr + logs_to_keep < max_log_ptr) - database->recoverLostReplica(current_zookeeper, our_log_ptr, max_log_ptr); + database->recoverLostReplica(zookeeper, our_log_ptr, max_log_ptr); else last_skipped_entry_name.emplace(DDLTaskBase::getLogEntryName(our_log_ptr)); } @@ -198,7 +199,7 @@ DDLTaskPtr DatabaseReplicatedDDLWorker::initAndCheckTask(const String & entry_na } } - UInt32 our_log_ptr = parse(current_zookeeper->get(fs::path(database->replica_path) / "log_ptr")); + UInt32 our_log_ptr = parse(zookeeper->get(fs::path(database->replica_path) / "log_ptr")); UInt32 entry_num = DatabaseReplicatedTask::getLogEntryNumber(entry_name); if (entry_num <= our_log_ptr) diff --git a/src/Databases/DatabaseReplicatedWorker.h b/src/Databases/DatabaseReplicatedWorker.h index 4020906f9b2..773612e403c 100644 --- a/src/Databases/DatabaseReplicatedWorker.h +++ b/src/Databases/DatabaseReplicatedWorker.h @@ -43,7 +43,7 @@ private: mutable std::mutex mutex; std::condition_variable wait_current_task_change; String current_task; - UInt32 logs_to_keep = std::numeric_limits::max(); + std::atomic logs_to_keep = std::numeric_limits::max(); }; } diff --git a/src/Databases/IDatabase.h b/src/Databases/IDatabase.h index ba5fa974d5c..0c8382465f7 100644 --- a/src/Databases/IDatabase.h +++ b/src/Databases/IDatabase.h @@ -45,6 +45,9 @@ public: /// - it maintains a list of tables but tables are loaded lazily). virtual const StoragePtr & table() const = 0; + IDatabaseTablesIterator(const String & database_name_) : database_name(database_name_) { } + IDatabaseTablesIterator(String && database_name_) : database_name(std::move(database_name_)) { } + virtual ~IDatabaseTablesIterator() = default; virtual UUID uuid() const { return UUIDHelpers::Nil; } @@ -52,7 +55,7 @@ public: const String & databaseName() const { assert(!database_name.empty()); return database_name; } protected: - String database_name; + const String database_name; }; /// Copies list of tables and iterates through such snapshot. 
@@ -64,26 +67,24 @@ private: protected: DatabaseTablesSnapshotIterator(DatabaseTablesSnapshotIterator && other) + : IDatabaseTablesIterator(std::move(other.database_name)) { size_t idx = std::distance(other.tables.begin(), other.it); std::swap(tables, other.tables); other.it = other.tables.end(); it = tables.begin(); std::advance(it, idx); - database_name = std::move(other.database_name); } public: DatabaseTablesSnapshotIterator(const Tables & tables_, const String & database_name_) - : tables(tables_), it(tables.begin()) + : IDatabaseTablesIterator(database_name_), tables(tables_), it(tables.begin()) { - database_name = database_name_; } DatabaseTablesSnapshotIterator(Tables && tables_, String && database_name_) - : tables(std::move(tables_)), it(tables.begin()) + : IDatabaseTablesIterator(std::move(database_name_)), tables(std::move(tables_)), it(tables.begin()) { - database_name = std::move(database_name_); } void next() override { ++it; } diff --git a/src/Databases/MySQL/DatabaseMaterializeMySQL.cpp b/src/Databases/MySQL/DatabaseMaterializedMySQL.cpp similarity index 59% rename from src/Databases/MySQL/DatabaseMaterializeMySQL.cpp rename to src/Databases/MySQL/DatabaseMaterializedMySQL.cpp index bf9d6cdfbfa..4046c6929c2 100644 --- a/src/Databases/MySQL/DatabaseMaterializeMySQL.cpp +++ b/src/Databases/MySQL/DatabaseMaterializedMySQL.cpp @@ -4,15 +4,15 @@ #if USE_MYSQL -# include +# include # include # include # include -# include -# include +# include +# include # include -# include +# include # include # include # include @@ -29,7 +29,7 @@ namespace ErrorCodes } template <> -DatabaseMaterializeMySQL::DatabaseMaterializeMySQL( +DatabaseMaterializedMySQL::DatabaseMaterializedMySQL( ContextPtr context_, const String & database_name_, const String & metadata_path_, @@ -37,12 +37,12 @@ DatabaseMaterializeMySQL::DatabaseMaterializeMySQL( const String & mysql_database_name_, mysqlxx::Pool && pool_, MySQLClient && client_, - std::unique_ptr settings_) + std::unique_ptr settings_) : DatabaseOrdinary( database_name_, metadata_path_, "data/" + escapeForFileName(database_name_) + "/", - "DatabaseMaterializeMySQL (" + database_name_ + ")", + "DatabaseMaterializedMySQL (" + database_name_ + ")", context_) , settings(std::move(settings_)) , materialize_thread(context_, database_name_, mysql_database_name_, std::move(pool_), std::move(client_), settings.get()) @@ -50,7 +50,7 @@ DatabaseMaterializeMySQL::DatabaseMaterializeMySQL( } template <> -DatabaseMaterializeMySQL::DatabaseMaterializeMySQL( +DatabaseMaterializedMySQL::DatabaseMaterializedMySQL( ContextPtr context_, const String & database_name_, const String & metadata_path_, @@ -58,15 +58,15 @@ DatabaseMaterializeMySQL::DatabaseMaterializeMySQL( const String & mysql_database_name_, mysqlxx::Pool && pool_, MySQLClient && client_, - std::unique_ptr settings_) - : DatabaseAtomic(database_name_, metadata_path_, uuid, "DatabaseMaterializeMySQL (" + database_name_ + ")", context_) + std::unique_ptr settings_) + : DatabaseAtomic(database_name_, metadata_path_, uuid, "DatabaseMaterializedMySQL (" + database_name_ + ")", context_) , settings(std::move(settings_)) , materialize_thread(context_, database_name_, mysql_database_name_, std::move(pool_), std::move(client_), settings.get()) { } template -void DatabaseMaterializeMySQL::rethrowExceptionIfNeed() const +void DatabaseMaterializedMySQL::rethrowExceptionIfNeed() const { std::unique_lock lock(Base::mutex); @@ -87,14 +87,14 @@ void DatabaseMaterializeMySQL::rethrowExceptionIfNeed() const } template -void 
DatabaseMaterializeMySQL::setException(const std::exception_ptr & exception_) +void DatabaseMaterializedMySQL::setException(const std::exception_ptr & exception_) { std::unique_lock lock(Base::mutex); exception = exception_; } template -void DatabaseMaterializeMySQL::loadStoredObjects(ContextMutablePtr context_, bool has_force_restore_data_flag, bool force_attach) +void DatabaseMaterializedMySQL::loadStoredObjects(ContextMutablePtr context_, bool has_force_restore_data_flag, bool force_attach) { Base::loadStoredObjects(context_, has_force_restore_data_flag, force_attach); if (!force_attach) @@ -105,59 +105,59 @@ void DatabaseMaterializeMySQL::loadStoredObjects(ContextMutablePtr context } template -void DatabaseMaterializeMySQL::createTable(ContextPtr context_, const String & name, const StoragePtr & table, const ASTPtr & query) +void DatabaseMaterializedMySQL::createTable(ContextPtr context_, const String & name, const StoragePtr & table, const ASTPtr & query) { assertCalledFromSyncThreadOrDrop("create table"); Base::createTable(context_, name, table, query); } template -void DatabaseMaterializeMySQL::dropTable(ContextPtr context_, const String & name, bool no_delay) +void DatabaseMaterializedMySQL::dropTable(ContextPtr context_, const String & name, bool no_delay) { assertCalledFromSyncThreadOrDrop("drop table"); Base::dropTable(context_, name, no_delay); } template -void DatabaseMaterializeMySQL::attachTable(const String & name, const StoragePtr & table, const String & relative_table_path) +void DatabaseMaterializedMySQL::attachTable(const String & name, const StoragePtr & table, const String & relative_table_path) { assertCalledFromSyncThreadOrDrop("attach table"); Base::attachTable(name, table, relative_table_path); } template -StoragePtr DatabaseMaterializeMySQL::detachTable(const String & name) +StoragePtr DatabaseMaterializedMySQL::detachTable(const String & name) { assertCalledFromSyncThreadOrDrop("detach table"); return Base::detachTable(name); } template -void DatabaseMaterializeMySQL::renameTable(ContextPtr context_, const String & name, IDatabase & to_database, const String & to_name, bool exchange, bool dictionary) +void DatabaseMaterializedMySQL::renameTable(ContextPtr context_, const String & name, IDatabase & to_database, const String & to_name, bool exchange, bool dictionary) { assertCalledFromSyncThreadOrDrop("rename table"); if (exchange) - throw Exception("MaterializeMySQL database not support exchange table.", ErrorCodes::NOT_IMPLEMENTED); + throw Exception("MaterializedMySQL database not support exchange table.", ErrorCodes::NOT_IMPLEMENTED); if (dictionary) - throw Exception("MaterializeMySQL database not support rename dictionary.", ErrorCodes::NOT_IMPLEMENTED); + throw Exception("MaterializedMySQL database not support rename dictionary.", ErrorCodes::NOT_IMPLEMENTED); if (to_database.getDatabaseName() != Base::getDatabaseName()) - throw Exception("Cannot rename with other database for MaterializeMySQL database.", ErrorCodes::NOT_IMPLEMENTED); + throw Exception("Cannot rename with other database for MaterializedMySQL database.", ErrorCodes::NOT_IMPLEMENTED); Base::renameTable(context_, name, *this, to_name, exchange, dictionary); } template -void DatabaseMaterializeMySQL::alterTable(ContextPtr context_, const StorageID & table_id, const StorageInMemoryMetadata & metadata) +void DatabaseMaterializedMySQL::alterTable(ContextPtr context_, const StorageID & table_id, const StorageInMemoryMetadata & metadata) { assertCalledFromSyncThreadOrDrop("alter table"); 
Base::alterTable(context_, table_id, metadata); } template -void DatabaseMaterializeMySQL::drop(ContextPtr context_) +void DatabaseMaterializedMySQL::drop(ContextPtr context_) { /// Remove metadata info fs::path metadata(Base::getMetadataPath() + "/.metadata"); @@ -169,16 +169,16 @@ void DatabaseMaterializeMySQL::drop(ContextPtr context_) } template -StoragePtr DatabaseMaterializeMySQL::tryGetTable(const String & name, ContextPtr context_) const +StoragePtr DatabaseMaterializedMySQL::tryGetTable(const String & name, ContextPtr context_) const { - if (!MaterializeMySQLSyncThread::isMySQLSyncThread()) + if (!MaterializedMySQLSyncThread::isMySQLSyncThread()) { StoragePtr nested_storage = Base::tryGetTable(name, context_); if (!nested_storage) return {}; - return std::make_shared(std::move(nested_storage), this); + return std::make_shared(std::move(nested_storage), this); } return Base::tryGetTable(name, context_); @@ -186,36 +186,36 @@ StoragePtr DatabaseMaterializeMySQL::tryGetTable(const String & name, Cont template DatabaseTablesIteratorPtr -DatabaseMaterializeMySQL::getTablesIterator(ContextPtr context_, const DatabaseOnDisk::FilterByNameFunction & filter_by_table_name) +DatabaseMaterializedMySQL::getTablesIterator(ContextPtr context_, const DatabaseOnDisk::FilterByNameFunction & filter_by_table_name) { - if (!MaterializeMySQLSyncThread::isMySQLSyncThread()) + if (!MaterializedMySQLSyncThread::isMySQLSyncThread()) { DatabaseTablesIteratorPtr iterator = Base::getTablesIterator(context_, filter_by_table_name); - return std::make_unique(std::move(iterator), this); + return std::make_unique(std::move(iterator), this); } return Base::getTablesIterator(context_, filter_by_table_name); } template -void DatabaseMaterializeMySQL::assertCalledFromSyncThreadOrDrop(const char * method) const +void DatabaseMaterializedMySQL::assertCalledFromSyncThreadOrDrop(const char * method) const { - if (!MaterializeMySQLSyncThread::isMySQLSyncThread() && started_up) - throw Exception(ErrorCodes::NOT_IMPLEMENTED, "MaterializeMySQL database not support {}", method); + if (!MaterializedMySQLSyncThread::isMySQLSyncThread() && started_up) + throw Exception(ErrorCodes::NOT_IMPLEMENTED, "MaterializedMySQL database not support {}", method); } template -void DatabaseMaterializeMySQL::shutdownSynchronizationThread() +void DatabaseMaterializedMySQL::shutdownSynchronizationThread() { materialize_thread.stopSynchronization(); started_up = false; } template class Helper, typename... Args> -auto castToMaterializeMySQLAndCallHelper(Database * database, Args && ... args) +auto castToMaterializedMySQLAndCallHelper(Database * database, Args && ... args) { - using Ordinary = DatabaseMaterializeMySQL; - using Atomic = DatabaseMaterializeMySQL; + using Ordinary = DatabaseMaterializedMySQL; + using Atomic = DatabaseMaterializedMySQL; using ToOrdinary = typename std::conditional_t, const Ordinary *, Ordinary *>; using ToAtomic = typename std::conditional_t, const Atomic *, Atomic *>; if (auto * database_materialize = typeid_cast(database)) @@ -223,29 +223,29 @@ auto castToMaterializeMySQLAndCallHelper(Database * database, Args && ... 
args) if (auto * database_materialize = typeid_cast(database)) return (database_materialize->*Helper::v)(std::forward(args)...); - throw Exception("LOGICAL_ERROR: cannot cast to DatabaseMaterializeMySQL, it is a bug.", ErrorCodes::LOGICAL_ERROR); + throw Exception("LOGICAL_ERROR: cannot cast to DatabaseMaterializedMySQL, it is a bug.", ErrorCodes::LOGICAL_ERROR); } template struct HelperSetException { static constexpr auto v = &T::setException; }; -void setSynchronizationThreadException(const DatabasePtr & materialize_mysql_db, const std::exception_ptr & exception) +void setSynchronizationThreadException(const DatabasePtr & materialized_mysql_db, const std::exception_ptr & exception) { - castToMaterializeMySQLAndCallHelper(materialize_mysql_db.get(), exception); + castToMaterializedMySQLAndCallHelper(materialized_mysql_db.get(), exception); } template struct HelperStopSync { static constexpr auto v = &T::shutdownSynchronizationThread; }; -void stopDatabaseSynchronization(const DatabasePtr & materialize_mysql_db) +void stopDatabaseSynchronization(const DatabasePtr & materialized_mysql_db) { - castToMaterializeMySQLAndCallHelper(materialize_mysql_db.get()); + castToMaterializedMySQLAndCallHelper(materialized_mysql_db.get()); } template struct HelperRethrow { static constexpr auto v = &T::rethrowExceptionIfNeed; }; -void rethrowSyncExceptionIfNeed(const IDatabase * materialize_mysql_db) +void rethrowSyncExceptionIfNeed(const IDatabase * materialized_mysql_db) { - castToMaterializeMySQLAndCallHelper(materialize_mysql_db); + castToMaterializedMySQLAndCallHelper(materialized_mysql_db); } -template class DatabaseMaterializeMySQL; -template class DatabaseMaterializeMySQL; +template class DatabaseMaterializedMySQL; +template class DatabaseMaterializedMySQL; } diff --git a/src/Databases/MySQL/DatabaseMaterializeMySQL.h b/src/Databases/MySQL/DatabaseMaterializedMySQL.h similarity index 75% rename from src/Databases/MySQL/DatabaseMaterializeMySQL.h rename to src/Databases/MySQL/DatabaseMaterializedMySQL.h index 74a3c06e6f0..cdeb8a595db 100644 --- a/src/Databases/MySQL/DatabaseMaterializeMySQL.h +++ b/src/Databases/MySQL/DatabaseMaterializedMySQL.h @@ -7,8 +7,8 @@ #include #include #include -#include -#include +#include +#include namespace DB { @@ -18,30 +18,30 @@ namespace DB * All table structure and data will be written to the local file system */ template -class DatabaseMaterializeMySQL : public Base +class DatabaseMaterializedMySQL : public Base { public: - DatabaseMaterializeMySQL( + DatabaseMaterializedMySQL( ContextPtr context, const String & database_name_, const String & metadata_path_, UUID uuid, const String & mysql_database_name_, mysqlxx::Pool && pool_, - MySQLClient && client_, std::unique_ptr settings_); + MySQLClient && client_, std::unique_ptr settings_); void rethrowExceptionIfNeed() const; void setException(const std::exception_ptr & exception); protected: - std::unique_ptr settings; + std::unique_ptr settings; - MaterializeMySQLSyncThread materialize_thread; + MaterializedMySQLSyncThread materialize_thread; std::exception_ptr exception; std::atomic_bool started_up{false}; public: - String getEngineName() const override { return "MaterializeMySQL"; } + String getEngineName() const override { return "MaterializedMySQL"; } void loadStoredObjects(ContextMutablePtr context_, bool has_force_restore_data_flag, bool force_attach) override; @@ -66,12 +66,14 @@ public: void assertCalledFromSyncThreadOrDrop(const char * method) const; void shutdownSynchronizationThread(); + + friend class 
DatabaseMaterializedTablesIterator; }; -void setSynchronizationThreadException(const DatabasePtr & materialize_mysql_db, const std::exception_ptr & exception); -void stopDatabaseSynchronization(const DatabasePtr & materialize_mysql_db); -void rethrowSyncExceptionIfNeed(const IDatabase * materialize_mysql_db); +void setSynchronizationThreadException(const DatabasePtr & materialized_mysql_db, const std::exception_ptr & exception); +void stopDatabaseSynchronization(const DatabasePtr & materialized_mysql_db); +void rethrowSyncExceptionIfNeed(const IDatabase * materialized_mysql_db); } diff --git a/src/Databases/MySQL/DatabaseMaterializeTablesIterator.h b/src/Databases/MySQL/DatabaseMaterializedTablesIterator.h similarity index 60% rename from src/Databases/MySQL/DatabaseMaterializeTablesIterator.h rename to src/Databases/MySQL/DatabaseMaterializedTablesIterator.h index 54031de40a2..8a7dbacf4a2 100644 --- a/src/Databases/MySQL/DatabaseMaterializeTablesIterator.h +++ b/src/Databases/MySQL/DatabaseMaterializedTablesIterator.h @@ -1,18 +1,18 @@ #pragma once #include -#include +#include namespace DB { -/** MaterializeMySQL database table iterator +/** MaterializedMySQL database table iterator * * The iterator returns different storage engine types depending on the visitor. * When MySQLSync thread accesses, it always returns MergeTree - * Other cases always convert MergeTree to StorageMaterializeMySQL + * Other cases always convert MergeTree to StorageMaterializedMySQL */ -class DatabaseMaterializeTablesIterator final : public IDatabaseTablesIterator +class DatabaseMaterializedTablesIterator final : public IDatabaseTablesIterator { public: void next() override { nested_iterator->next(); } @@ -23,14 +23,14 @@ public: const StoragePtr & table() const override { - StoragePtr storage = std::make_shared(nested_iterator->table(), database); + StoragePtr storage = std::make_shared(nested_iterator->table(), database); return tables.emplace_back(storage); } UUID uuid() const override { return nested_iterator->uuid(); } - DatabaseMaterializeTablesIterator(DatabaseTablesIteratorPtr nested_iterator_, const IDatabase * database_) - : nested_iterator(std::move(nested_iterator_)), database(database_) + DatabaseMaterializedTablesIterator(DatabaseTablesIteratorPtr nested_iterator_, const IDatabase * database_) + : IDatabaseTablesIterator(database_->getDatabaseName()), nested_iterator(std::move(nested_iterator_)), database(database_) { } diff --git a/src/Databases/MySQL/MaterializeMySQLSettings.cpp b/src/Databases/MySQL/MaterializedMySQLSettings.cpp similarity index 75% rename from src/Databases/MySQL/MaterializeMySQLSettings.cpp rename to src/Databases/MySQL/MaterializedMySQLSettings.cpp index a8672bf488e..d314e1f35a9 100644 --- a/src/Databases/MySQL/MaterializeMySQLSettings.cpp +++ b/src/Databases/MySQL/MaterializedMySQLSettings.cpp @@ -1,4 +1,4 @@ -#include +#include #include #include @@ -11,9 +11,9 @@ namespace ErrorCodes extern const int UNKNOWN_SETTING; } -IMPLEMENT_SETTINGS_TRAITS(MaterializeMySQLSettingsTraits, LIST_OF_MATERIALIZE_MODE_SETTINGS) +IMPLEMENT_SETTINGS_TRAITS(MaterializedMySQLSettingsTraits, LIST_OF_MATERIALIZE_MODE_SETTINGS) -void MaterializeMySQLSettings::loadFromQuery(ASTStorage & storage_def) +void MaterializedMySQLSettings::loadFromQuery(ASTStorage & storage_def) { if (storage_def.settings) { diff --git a/src/Databases/MySQL/MaterializeMySQLSettings.h b/src/Databases/MySQL/MaterializedMySQLSettings.h similarity index 87% rename from src/Databases/MySQL/MaterializeMySQLSettings.h rename to 
src/Databases/MySQL/MaterializedMySQLSettings.h index 9bd05b5382b..d5acdc81602 100644 --- a/src/Databases/MySQL/MaterializeMySQLSettings.h +++ b/src/Databases/MySQL/MaterializedMySQLSettings.h @@ -17,13 +17,13 @@ class ASTStorage; M(Int64, max_wait_time_when_mysql_unavailable, 1000, "Retry interval when MySQL is not available (milliseconds). Negative value disable retry.", 0) \ M(Bool, allows_query_when_mysql_lost, false, "Allow query materialized table when mysql is lost.", 0) \ - DECLARE_SETTINGS_TRAITS(MaterializeMySQLSettingsTraits, LIST_OF_MATERIALIZE_MODE_SETTINGS) + DECLARE_SETTINGS_TRAITS(MaterializedMySQLSettingsTraits, LIST_OF_MATERIALIZE_MODE_SETTINGS) -/** Settings for the MaterializeMySQL database engine. +/** Settings for the MaterializedMySQL database engine. * Could be loaded from a CREATE DATABASE query (SETTINGS clause). */ -struct MaterializeMySQLSettings : public BaseSettings +struct MaterializedMySQLSettings : public BaseSettings { void loadFromQuery(ASTStorage & storage_def); }; diff --git a/src/Databases/MySQL/MaterializeMySQLSyncThread.cpp b/src/Databases/MySQL/MaterializedMySQLSyncThread.cpp similarity index 94% rename from src/Databases/MySQL/MaterializeMySQLSyncThread.cpp rename to src/Databases/MySQL/MaterializedMySQLSyncThread.cpp index a8577ddbd03..dcf77f56e18 100644 --- a/src/Databases/MySQL/MaterializeMySQLSyncThread.cpp +++ b/src/Databases/MySQL/MaterializedMySQLSyncThread.cpp @@ -4,7 +4,7 @@ #if USE_MYSQL -#include +#include # include # include # include @@ -12,7 +12,7 @@ # include # include # include -# include +# include # include # include # include @@ -71,14 +71,14 @@ static BlockIO tryToExecuteQuery(const String & query_to_execute, ContextMutable catch (...) { tryLogCurrentException( - &Poco::Logger::get("MaterializeMySQLSyncThread(" + database + ")"), + &Poco::Logger::get("MaterializedMySQLSyncThread(" + database + ")"), "Query " + query_to_execute + " wasn't finished successfully"); throw; } } -MaterializeMySQLSyncThread::~MaterializeMySQLSyncThread() +MaterializedMySQLSyncThread::~MaterializedMySQLSyncThread() { try { @@ -129,7 +129,7 @@ static void checkMySQLVariables(const mysqlxx::Pool::Entry & connection, const S { bool first = true; WriteBufferFromOwnString error_message; - error_message << "Illegal MySQL variables, the MaterializeMySQL engine requires "; + error_message << "Illegal MySQL variables, the MaterializedMySQL engine requires "; for (const auto & [variable_name, variable_error_val] : variables_error_message) { error_message << (first ? 
"" : ", ") << variable_name << "='" << variable_error_val << "'"; @@ -142,15 +142,15 @@ static void checkMySQLVariables(const mysqlxx::Pool::Entry & connection, const S } } -MaterializeMySQLSyncThread::MaterializeMySQLSyncThread( +MaterializedMySQLSyncThread::MaterializedMySQLSyncThread( ContextPtr context_, const String & database_name_, const String & mysql_database_name_, mysqlxx::Pool && pool_, MySQLClient && client_, - MaterializeMySQLSettings * settings_) + MaterializedMySQLSettings * settings_) : WithContext(context_->getGlobalContext()) - , log(&Poco::Logger::get("MaterializeMySQLSyncThread")) + , log(&Poco::Logger::get("MaterializedMySQLSyncThread")) , database_name(database_name_) , mysql_database_name(mysql_database_name_) , pool(std::move(pool_)) @@ -160,7 +160,7 @@ MaterializeMySQLSyncThread::MaterializeMySQLSyncThread( query_prefix = "EXTERNAL DDL FROM MySQL(" + backQuoteIfNeed(database_name) + ", " + backQuoteIfNeed(mysql_database_name) + ") "; } -void MaterializeMySQLSyncThread::synchronization() +void MaterializedMySQLSyncThread::synchronization() { setThreadName(MYSQL_BACKGROUND_THREAD_NAME); @@ -221,7 +221,7 @@ void MaterializeMySQLSyncThread::synchronization() } } -void MaterializeMySQLSyncThread::stopSynchronization() +void MaterializedMySQLSyncThread::stopSynchronization() { if (!sync_quit && background_thread_pool) { @@ -231,12 +231,12 @@ void MaterializeMySQLSyncThread::stopSynchronization() } } -void MaterializeMySQLSyncThread::startSynchronization() +void MaterializedMySQLSyncThread::startSynchronization() { background_thread_pool = std::make_unique([this]() { synchronization(); }); } -void MaterializeMySQLSyncThread::assertMySQLAvailable() +void MaterializedMySQLSyncThread::assertMySQLAvailable() { try { @@ -334,7 +334,7 @@ static inline void dumpDataForTables( Stopwatch watch; copyData(input, *out, is_cancelled); const Progress & progress = out->getProgress(); - LOG_INFO(&Poco::Logger::get("MaterializeMySQLSyncThread(" + database_name + ")"), + LOG_INFO(&Poco::Logger::get("MaterializedMySQLSyncThread(" + database_name + ")"), "Materialize MySQL step 1: dump {}, {} rows, {} in {} sec., {} rows/sec., {}/sec." 
, table_name, formatReadableQuantity(progress.written_rows), formatReadableSizeWithBinarySuffix(progress.written_bytes) , watch.elapsedSeconds(), formatReadableQuantity(static_cast(progress.written_rows / watch.elapsedSeconds())) @@ -356,7 +356,7 @@ static inline UInt32 randomNumber() return dist6(rng); } -bool MaterializeMySQLSyncThread::prepareSynchronized(MaterializeMetadata & metadata) +bool MaterializedMySQLSyncThread::prepareSynchronized(MaterializeMetadata & metadata) { bool opened_transaction = false; mysqlxx::PoolWithFailover::Entry connection; @@ -441,7 +441,7 @@ bool MaterializeMySQLSyncThread::prepareSynchronized(MaterializeMetadata & metad return false; } -void MaterializeMySQLSyncThread::flushBuffersData(Buffers & buffers, MaterializeMetadata & metadata) +void MaterializedMySQLSyncThread::flushBuffersData(Buffers & buffers, MaterializeMetadata & metadata) { if (buffers.data.empty()) return; @@ -674,7 +674,7 @@ static inline size_t onUpdateData(const Row & rows_data, Block & buffer, size_t return buffer.bytes() - prev_bytes; } -void MaterializeMySQLSyncThread::onEvent(Buffers & buffers, const BinlogEventPtr & receive_event, MaterializeMetadata & metadata) +void MaterializedMySQLSyncThread::onEvent(Buffers & buffers, const BinlogEventPtr & receive_event, MaterializeMetadata & metadata) { if (receive_event->type() == MYSQL_WRITE_ROWS_EVENT) { @@ -729,7 +729,7 @@ void MaterializeMySQLSyncThread::onEvent(Buffers & buffers, const BinlogEventPtr } } -void MaterializeMySQLSyncThread::executeDDLAtomic(const QueryEvent & query_event) +void MaterializedMySQLSyncThread::executeDDLAtomic(const QueryEvent & query_event) { try { @@ -751,18 +751,18 @@ void MaterializeMySQLSyncThread::executeDDLAtomic(const QueryEvent & query_event } } -bool MaterializeMySQLSyncThread::isMySQLSyncThread() +bool MaterializedMySQLSyncThread::isMySQLSyncThread() { return getThreadName() == MYSQL_BACKGROUND_THREAD_NAME; } -void MaterializeMySQLSyncThread::setSynchronizationThreadException(const std::exception_ptr & exception) +void MaterializedMySQLSyncThread::setSynchronizationThreadException(const std::exception_ptr & exception) { auto db = DatabaseCatalog::instance().getDatabase(database_name); DB::setSynchronizationThreadException(db, exception); } -void MaterializeMySQLSyncThread::Buffers::add(size_t block_rows, size_t block_bytes, size_t written_rows, size_t written_bytes) +void MaterializedMySQLSyncThread::Buffers::add(size_t block_rows, size_t block_bytes, size_t written_rows, size_t written_bytes) { total_blocks_rows += written_rows; total_blocks_bytes += written_bytes; @@ -770,13 +770,13 @@ void MaterializeMySQLSyncThread::Buffers::add(size_t block_rows, size_t block_by max_block_bytes = std::max(block_bytes, max_block_bytes); } -bool MaterializeMySQLSyncThread::Buffers::checkThresholds(size_t check_block_rows, size_t check_block_bytes, size_t check_total_rows, size_t check_total_bytes) const +bool MaterializedMySQLSyncThread::Buffers::checkThresholds(size_t check_block_rows, size_t check_block_bytes, size_t check_total_rows, size_t check_total_bytes) const { return max_block_rows >= check_block_rows || max_block_bytes >= check_block_bytes || total_blocks_rows >= check_total_rows || total_blocks_bytes >= check_total_bytes; } -void MaterializeMySQLSyncThread::Buffers::commit(ContextPtr context) +void MaterializedMySQLSyncThread::Buffers::commit(ContextPtr context) { try { @@ -801,7 +801,7 @@ void MaterializeMySQLSyncThread::Buffers::commit(ContextPtr context) } } 
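The renames above keep the old `MaterializeMySQL` spelling accepted by `DatabaseFactory` while making `MaterializedMySQL` the canonical engine name; settings such as `allows_query_when_mysql_lost` are still read from the `SETTINGS` clause via `MaterializedMySQLSettings::loadFromQuery`. A small illustrative example (host, credentials and database names are placeholders, not from this diff):

```sql
-- Placeholder connection details.
CREATE DATABASE mysql_replica
ENGINE = MaterializedMySQL('mysql-host:3306', 'mysql_db', 'user', 'password')
SETTINGS allows_query_when_mysql_lost = 1, max_wait_time_when_mysql_unavailable = 5000;

-- The previous spelling is still recognized after this change:
-- ENGINE = MaterializeMySQL('mysql-host:3306', 'mysql_db', 'user', 'password')
```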
-MaterializeMySQLSyncThread::Buffers::BufferAndSortingColumnsPtr MaterializeMySQLSyncThread::Buffers::getTableDataBuffer( +MaterializedMySQLSyncThread::Buffers::BufferAndSortingColumnsPtr MaterializedMySQLSyncThread::Buffers::getTableDataBuffer( const String & table_name, ContextPtr context) { const auto & iterator = data.find(table_name); diff --git a/src/Databases/MySQL/MaterializeMySQLSyncThread.h b/src/Databases/MySQL/MaterializedMySQLSyncThread.h similarity index 93% rename from src/Databases/MySQL/MaterializeMySQLSyncThread.h rename to src/Databases/MySQL/MaterializedMySQLSyncThread.h index 03958fe10cc..0cd0701439f 100644 --- a/src/Databases/MySQL/MaterializeMySQLSyncThread.h +++ b/src/Databases/MySQL/MaterializedMySQLSyncThread.h @@ -14,7 +14,7 @@ # include # include # include -# include +# include # include # include # include @@ -36,18 +36,18 @@ namespace DB * real-time pull incremental data: * We will pull the binlog event of MySQL to parse and execute when the full data synchronization is completed. */ -class MaterializeMySQLSyncThread : WithContext +class MaterializedMySQLSyncThread : WithContext { public: - ~MaterializeMySQLSyncThread(); + ~MaterializedMySQLSyncThread(); - MaterializeMySQLSyncThread( + MaterializedMySQLSyncThread( ContextPtr context, const String & database_name_, const String & mysql_database_name_, mysqlxx::Pool && pool_, MySQLClient && client_, - MaterializeMySQLSettings * settings_); + MaterializedMySQLSettings * settings_); void stopSynchronization(); @@ -65,7 +65,7 @@ private: mutable mysqlxx::Pool pool; mutable MySQLClient client; - MaterializeMySQLSettings * settings; + MaterializedMySQLSettings * settings; String query_prefix; // USE MySQL ERROR CODE: diff --git a/src/Databases/PostgreSQL/DatabasePostgreSQL.cpp b/src/Databases/PostgreSQL/DatabasePostgreSQL.cpp index c8ef5a44682..c848c784712 100644 --- a/src/Databases/PostgreSQL/DatabasePostgreSQL.cpp +++ b/src/Databases/PostgreSQL/DatabasePostgreSQL.cpp @@ -39,14 +39,16 @@ DatabasePostgreSQL::DatabasePostgreSQL( const String & metadata_path_, const ASTStorage * database_engine_define_, const String & dbname_, - const String & postgres_dbname, + const String & postgres_dbname_, + const String & postgres_schema_, postgres::PoolWithFailoverPtr pool_, bool cache_tables_) : IDatabase(dbname_) , WithContext(context_->getGlobalContext()) , metadata_path(metadata_path_) , database_engine_define(database_engine_define_->clone()) - , dbname(postgres_dbname) + , postgres_dbname(postgres_dbname_) + , postgres_schema(postgres_schema_) , pool(std::move(pool_)) , cache_tables(cache_tables_) { @@ -55,12 +57,28 @@ DatabasePostgreSQL::DatabasePostgreSQL( } +String DatabasePostgreSQL::getTableNameForLogs(const String & table_name) const +{ + if (postgres_schema.empty()) + return fmt::format("{}.{}", postgres_dbname, table_name); + return fmt::format("{}.{}.{}", postgres_dbname, postgres_schema, table_name); +} + + +String DatabasePostgreSQL::formatTableName(const String & table_name) const +{ + if (postgres_schema.empty()) + return doubleQuoteString(table_name); + return fmt::format("{}.{}", doubleQuoteString(postgres_schema), doubleQuoteString(table_name)); +} + + bool DatabasePostgreSQL::empty() const { std::lock_guard lock(mutex); auto connection_holder = pool->get(); - auto tables_list = fetchPostgreSQLTablesList(connection_holder->get()); + auto tables_list = fetchPostgreSQLTablesList(connection_holder->get(), postgres_schema); for (const auto & table_name : tables_list) if (!detached_or_dropped.count(table_name)) 
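The `postgres_schema` member used in the `DatabasePostgreSQL` hunks above and below is filled from the new optional fifth engine argument parsed in `DatabaseFactory`, with `use_table_cache` moving to the sixth position; when it is empty, table listing keeps the old behaviour of showing every non-system schema. A usage sketch (connection details and the schema name are placeholders):

```sql
-- Old four-argument form, unchanged: tables from all non-system schemas are visible.
CREATE DATABASE pg_all
ENGINE = PostgreSQL('postgres-host:5432', 'postgres_db', 'user', 'password');

-- New optional arguments: restrict listing to one schema and cache StoragePostgreSQL objects.
CREATE DATABASE pg_sales
ENGINE = PostgreSQL('postgres-host:5432', 'postgres_db', 'user', 'password', 'sales', 1);
```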
@@ -76,7 +94,7 @@ DatabaseTablesIteratorPtr DatabasePostgreSQL::getTablesIterator(ContextPtr local Tables tables; auto connection_holder = pool->get(); - auto table_names = fetchPostgreSQLTablesList(connection_holder->get()); + auto table_names = fetchPostgreSQLTablesList(connection_holder->get(), postgres_schema); for (const auto & table_name : table_names) if (!detached_or_dropped.count(table_name)) @@ -104,8 +122,11 @@ bool DatabasePostgreSQL::checkPostgresTable(const String & table_name) const pqxx::result result = tx.exec(fmt::format( "SELECT '{}'::regclass, tablename " "FROM pg_catalog.pg_tables " - "WHERE schemaname != 'pg_catalog' AND schemaname != 'information_schema' " - "AND tablename = '{}'", table_name, table_name)); + "WHERE schemaname != 'pg_catalog' AND {} " + "AND tablename = '{}'", + formatTableName(table_name), + (postgres_schema.empty() ? "schemaname != 'information_schema'" : "schemaname = " + quoteString(postgres_schema)), + formatTableName(table_name))); } catch (pqxx::undefined_table const &) { @@ -151,14 +172,14 @@ StoragePtr DatabasePostgreSQL::fetchTable(const String & table_name, ContextPtr return StoragePtr{}; auto connection_holder = pool->get(); - auto columns = fetchPostgreSQLTableStructure(connection_holder->get(), doubleQuoteString(table_name)).columns; + auto columns = fetchPostgreSQLTableStructure(connection_holder->get(), formatTableName(table_name)).columns; if (!columns) return StoragePtr{}; auto storage = StoragePostgreSQL::create( StorageID(database_name, table_name), pool, table_name, - ColumnsDescription{*columns}, ConstraintsDescription{}, String{}, local_context); + ColumnsDescription{*columns}, ConstraintsDescription{}, String{}, local_context, postgres_schema); if (cache_tables) cached_tables[table_name] = storage; @@ -182,10 +203,14 @@ void DatabasePostgreSQL::attachTable(const String & table_name, const StoragePtr std::lock_guard lock{mutex}; if (!checkPostgresTable(table_name)) - throw Exception(fmt::format("Cannot attach table {}.{} because it does not exist", database_name, table_name), ErrorCodes::UNKNOWN_TABLE); + throw Exception(ErrorCodes::UNKNOWN_TABLE, + "Cannot attach PostgreSQL table {} because it does not exist in PostgreSQL", + getTableNameForLogs(table_name), database_name); if (!detached_or_dropped.count(table_name)) - throw Exception(fmt::format("Cannot attach table {}.{}. It already exists", database_name, table_name), ErrorCodes::TABLE_ALREADY_EXISTS); + throw Exception(ErrorCodes::TABLE_ALREADY_EXISTS, + "Cannot attach PostgreSQL table {} because it already exists", + getTableNameForLogs(table_name), database_name); if (cache_tables) cached_tables[table_name] = storage; @@ -203,10 +228,10 @@ StoragePtr DatabasePostgreSQL::detachTable(const String & table_name) std::lock_guard lock{mutex}; if (detached_or_dropped.count(table_name)) - throw Exception(fmt::format("Cannot detach table {}.{}. It is already dropped/detached", database_name, table_name), ErrorCodes::TABLE_IS_DROPPED); + throw Exception(ErrorCodes::TABLE_IS_DROPPED, "Cannot detach table {}. 
It is already dropped/detached", getTableNameForLogs(table_name)); if (!checkPostgresTable(table_name)) - throw Exception(fmt::format("Cannot detach table {}.{} because it does not exist", database_name, table_name), ErrorCodes::UNKNOWN_TABLE); + throw Exception(ErrorCodes::UNKNOWN_TABLE, "Cannot detach table {}, because it does not exist", getTableNameForLogs(table_name)); if (cache_tables) cached_tables.erase(table_name); @@ -234,10 +259,10 @@ void DatabasePostgreSQL::dropTable(ContextPtr, const String & table_name, bool / std::lock_guard lock{mutex}; if (!checkPostgresTable(table_name)) - throw Exception(fmt::format("Cannot drop table {}.{} because it does not exist", database_name, table_name), ErrorCodes::UNKNOWN_TABLE); + throw Exception(ErrorCodes::UNKNOWN_TABLE, "Cannot drop table {} because it does not exist", getTableNameForLogs(table_name)); if (detached_or_dropped.count(table_name)) - throw Exception(fmt::format("Table {}.{} is already dropped/detached", database_name, table_name), ErrorCodes::TABLE_IS_DROPPED); + throw Exception(ErrorCodes::TABLE_IS_DROPPED, "Table {} is already dropped/detached", getTableNameForLogs(table_name)); fs::path mark_table_removed = fs::path(getMetadataPath()) / (escapeForFileName(table_name) + suffix); FS::createFile(mark_table_removed); @@ -281,7 +306,7 @@ void DatabasePostgreSQL::removeOutdatedTables() { std::lock_guard lock{mutex}; auto connection_holder = pool->get(); - auto actual_tables = fetchPostgreSQLTablesList(connection_holder->get()); + auto actual_tables = fetchPostgreSQLTablesList(connection_holder->get(), postgres_schema); if (cache_tables) { @@ -334,7 +359,7 @@ ASTPtr DatabasePostgreSQL::getCreateTableQueryImpl(const String & table_name, Co if (!storage) { if (throw_on_error) - throw Exception(fmt::format("PostgreSQL table {}.{} does not exist", database_name, table_name), ErrorCodes::UNKNOWN_TABLE); + throw Exception(ErrorCodes::UNKNOWN_TABLE, "PostgreSQL table {} does not exist", getTableNameForLogs(table_name)); return nullptr; } @@ -367,9 +392,9 @@ ASTPtr DatabasePostgreSQL::getCreateTableQueryImpl(const String & table_name, Co ASTs storage_children = ast_storage->children; auto storage_engine_arguments = ast_storage->engine->arguments; - /// Remove extra engine argument (`use_table_cache`) - if (storage_engine_arguments->children.size() > 4) - storage_engine_arguments->children.resize(storage_engine_arguments->children.size() - 1); + /// Remove extra engine argument (`schema` and `use_table_cache`) + if (storage_engine_arguments->children.size() >= 5) + storage_engine_arguments->children.resize(4); /// Add table_name to engine arguments assert(storage_engine_arguments->children.size() >= 2); diff --git a/src/Databases/PostgreSQL/DatabasePostgreSQL.h b/src/Databases/PostgreSQL/DatabasePostgreSQL.h index ea465390099..ec5fb441958 100644 --- a/src/Databases/PostgreSQL/DatabasePostgreSQL.h +++ b/src/Databases/PostgreSQL/DatabasePostgreSQL.h @@ -32,7 +32,8 @@ public: const String & metadata_path_, const ASTStorage * database_engine_define, const String & dbname_, - const String & postgres_dbname, + const String & postgres_dbname_, + const String & postgres_schema_, postgres::PoolWithFailoverPtr pool_, bool cache_tables_); @@ -69,7 +70,8 @@ protected: private: String metadata_path; ASTPtr database_engine_define; - String dbname; + String postgres_dbname; + String postgres_schema; postgres::PoolWithFailoverPtr pool; const bool cache_tables; @@ -77,6 +79,10 @@ private: std::unordered_set detached_or_dropped; 
BackgroundSchedulePool::TaskHolder cleaner_task; + String getTableNameForLogs(const String & table_name) const; + + String formatTableName(const String & table_name) const; + bool checkPostgresTable(const String & table_name) const; StoragePtr fetchTable(const String & table_name, ContextPtr context, const bool table_checked) const; diff --git a/src/Databases/PostgreSQL/fetchPostgreSQLTableStructure.cpp b/src/Databases/PostgreSQL/fetchPostgreSQLTableStructure.cpp index 64d47720af9..1b77947264e 100644 --- a/src/Databases/PostgreSQL/fetchPostgreSQLTableStructure.cpp +++ b/src/Databases/PostgreSQL/fetchPostgreSQLTableStructure.cpp @@ -9,7 +9,7 @@ #include #include #include -#include +#include #include #include #include @@ -27,11 +27,12 @@ namespace ErrorCodes template -std::unordered_set fetchPostgreSQLTablesList(T & tx) +std::unordered_set fetchPostgreSQLTablesList(T & tx, const String & postgres_schema) { std::unordered_set tables; - std::string query = "SELECT tablename FROM pg_catalog.pg_tables " - "WHERE schemaname != 'pg_catalog' AND schemaname != 'information_schema'"; + std::string query = fmt::format("SELECT tablename FROM pg_catalog.pg_tables " + "WHERE schemaname != 'pg_catalog' AND {}", + postgres_schema.empty() ? "schemaname != 'information_schema'" : "schemaname = " + quoteString(postgres_schema)); for (auto table_name : tx.template stream(query)) tables.insert(std::get<0>(table_name)); @@ -71,7 +72,7 @@ static DataTypePtr convertPostgreSQLDataType(String & type, const std::function< else if (type == "bigserial") res = std::make_shared(); else if (type.starts_with("timestamp")) - res = std::make_shared(); + res = std::make_shared(6); else if (type == "date") res = std::make_shared(); else if (type.starts_with("numeric")) @@ -270,10 +271,10 @@ PostgreSQLTableStructure fetchPostgreSQLTableStructure(pqxx::connection & connec } -std::unordered_set fetchPostgreSQLTablesList(pqxx::connection & connection) +std::unordered_set fetchPostgreSQLTablesList(pqxx::connection & connection, const String & postgres_schema) { pqxx::ReadTransaction tx(connection); - auto result = fetchPostgreSQLTablesList(tx); + auto result = fetchPostgreSQLTablesList(tx, postgres_schema); tx.commit(); return result; } @@ -290,10 +291,10 @@ PostgreSQLTableStructure fetchPostgreSQLTableStructure( bool with_primary_key, bool with_replica_identity_index); template -std::unordered_set fetchPostgreSQLTablesList(pqxx::work & tx); +std::unordered_set fetchPostgreSQLTablesList(pqxx::work & tx, const String & postgres_schema); template -std::unordered_set fetchPostgreSQLTablesList(pqxx::ReadTransaction & tx); +std::unordered_set fetchPostgreSQLTablesList(pqxx::ReadTransaction & tx, const String & postgres_schema); } diff --git a/src/Databases/PostgreSQL/fetchPostgreSQLTableStructure.h b/src/Databases/PostgreSQL/fetchPostgreSQLTableStructure.h index 07562cd69fa..0097287701c 100644 --- a/src/Databases/PostgreSQL/fetchPostgreSQLTableStructure.h +++ b/src/Databases/PostgreSQL/fetchPostgreSQLTableStructure.h @@ -21,7 +21,7 @@ struct PostgreSQLTableStructure using PostgreSQLTableStructurePtr = std::unique_ptr; -std::unordered_set fetchPostgreSQLTablesList(pqxx::connection & connection); +std::unordered_set fetchPostgreSQLTablesList(pqxx::connection & connection, const String & postgres_schema); PostgreSQLTableStructure fetchPostgreSQLTableStructure( pqxx::connection & connection, const String & postgres_table_name, bool use_nulls = true); @@ -32,7 +32,7 @@ PostgreSQLTableStructure fetchPostgreSQLTableStructure( bool 
with_primary_key = false, bool with_replica_identity_index = false); template -std::unordered_set fetchPostgreSQLTablesList(T & tx); +std::unordered_set fetchPostgreSQLTablesList(T & tx, const String & postgres_schema); } diff --git a/src/Databases/SQLite/DatabaseSQLite.cpp b/src/Databases/SQLite/DatabaseSQLite.cpp index f8e04bf6973..b3966288316 100644 --- a/src/Databases/SQLite/DatabaseSQLite.cpp +++ b/src/Databases/SQLite/DatabaseSQLite.cpp @@ -10,6 +10,7 @@ #include #include #include +#include namespace DB @@ -24,23 +25,15 @@ namespace ErrorCodes DatabaseSQLite::DatabaseSQLite( ContextPtr context_, const ASTStorage * database_engine_define_, + bool is_attach_, const String & database_path_) : IDatabase("SQLite") , WithContext(context_->getGlobalContext()) , database_engine_define(database_engine_define_->clone()) + , database_path(database_path_) , log(&Poco::Logger::get("DatabaseSQLite")) { - sqlite3 * tmp_sqlite_db = nullptr; - int status = sqlite3_open(database_path_.c_str(), &tmp_sqlite_db); - - if (status != SQLITE_OK) - { - throw Exception(ErrorCodes::SQLITE_ENGINE_ERROR, - "Cannot access sqlite database. Error status: {}. Message: {}", - status, sqlite3_errstr(status)); - } - - sqlite_db = std::shared_ptr(tmp_sqlite_db, sqlite3_close); + sqlite_db = openSQLiteDB(database_path_, context_, !is_attach_); } @@ -66,6 +59,9 @@ DatabaseTablesIteratorPtr DatabaseSQLite::getTablesIterator(ContextPtr local_con std::unordered_set DatabaseSQLite::fetchTablesList() const { + if (!sqlite_db) + sqlite_db = openSQLiteDB(database_path, getContext(), /* throw_on_error */true); + std::unordered_set tables; std::string query = "SELECT name FROM sqlite_master " "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"; @@ -94,6 +90,9 @@ std::unordered_set DatabaseSQLite::fetchTablesList() const bool DatabaseSQLite::checkSQLiteTable(const String & table_name) const { + if (!sqlite_db) + sqlite_db = openSQLiteDB(database_path, getContext(), /* throw_on_error */true); + const String query = fmt::format("SELECT name FROM sqlite_master WHERE type='table' AND name='{table_name}';", table_name); auto callback_get_data = [](void * res, int, char **, char **) -> int @@ -134,6 +133,9 @@ StoragePtr DatabaseSQLite::tryGetTable(const String & table_name, ContextPtr loc StoragePtr DatabaseSQLite::fetchTable(const String & table_name, ContextPtr local_context, bool table_checked) const { + if (!sqlite_db) + sqlite_db = openSQLiteDB(database_path, getContext(), /* throw_on_error */true); + if (!table_checked && !checkSQLiteTable(table_name)) return StoragePtr{}; @@ -145,6 +147,7 @@ StoragePtr DatabaseSQLite::fetchTable(const String & table_name, ContextPtr loca auto storage = StorageSQLite::create( StorageID(database_name, table_name), sqlite_db, + database_path, table_name, ColumnsDescription{*columns}, ConstraintsDescription{}, diff --git a/src/Databases/SQLite/DatabaseSQLite.h b/src/Databases/SQLite/DatabaseSQLite.h index 35b1200f397..a754b56f8ed 100644 --- a/src/Databases/SQLite/DatabaseSQLite.h +++ b/src/Databases/SQLite/DatabaseSQLite.h @@ -19,7 +19,8 @@ class DatabaseSQLite final : public IDatabase, protected WithContext public: using SQLitePtr = std::shared_ptr; - DatabaseSQLite(ContextPtr context_, const ASTStorage * database_engine_define_, const String & database_path_); + DatabaseSQLite(ContextPtr context_, const ASTStorage * database_engine_define_, + bool is_attach_, const String & database_path_); String getEngineName() const override { return "SQLite"; } @@ -47,7 +48,9 @@ protected: private: ASTPtr 
database_engine_define; - SQLitePtr sqlite_db; + String database_path; + + mutable SQLitePtr sqlite_db; Poco::Logger * log; diff --git a/src/Databases/SQLite/SQLiteUtils.cpp b/src/Databases/SQLite/SQLiteUtils.cpp new file mode 100644 index 00000000000..3046eea0b56 --- /dev/null +++ b/src/Databases/SQLite/SQLiteUtils.cpp @@ -0,0 +1,68 @@ +#include "SQLiteUtils.h" + +#if USE_SQLITE +#include +#include + +namespace fs = std::filesystem; + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int PATH_ACCESS_DENIED; +} + + +void processSQLiteError(const String & message, bool throw_on_error) +{ + if (throw_on_error) + throw Exception(ErrorCodes::PATH_ACCESS_DENIED, message); + else + LOG_ERROR(&Poco::Logger::get("SQLiteEngine"), message); +} + + +String validateSQLiteDatabasePath(const String & path, const String & user_files_path, bool throw_on_error) +{ + String canonical_user_files_path = fs::canonical(user_files_path); + + String canonical_path; + std::error_code err; + + if (fs::path(path).is_relative()) + canonical_path = fs::canonical(fs::path(user_files_path) / path, err); + else + canonical_path = fs::canonical(path, err); + + if (err) + processSQLiteError(fmt::format("SQLite database path '{}' is invalid. Error: {}", path, err.message()), throw_on_error); + + if (!canonical_path.starts_with(canonical_user_files_path)) + processSQLiteError(fmt::format("SQLite database file path '{}' must be inside 'user_files' directory", path), throw_on_error); + + return canonical_path; +} + + +SQLitePtr openSQLiteDB(const String & database_path, ContextPtr context, bool throw_on_error) +{ + auto validated_path = validateSQLiteDatabasePath(database_path, context->getUserFilesPath(), throw_on_error); + + sqlite3 * tmp_sqlite_db = nullptr; + int status = sqlite3_open(validated_path.c_str(), &tmp_sqlite_db); + + if (status != SQLITE_OK) + { + processSQLiteError(fmt::format("Cannot access sqlite database. Error status: {}. 
Message: {}", + status, sqlite3_errstr(status)), throw_on_error); + return nullptr; + } + + return std::shared_ptr(tmp_sqlite_db, sqlite3_close); +} + +} + +#endif diff --git a/src/Databases/SQLite/SQLiteUtils.h b/src/Databases/SQLite/SQLiteUtils.h new file mode 100644 index 00000000000..35f00904d0d --- /dev/null +++ b/src/Databases/SQLite/SQLiteUtils.h @@ -0,0 +1,22 @@ +#pragma once + +#if !defined(ARCADIA_BUILD) +#include "config_core.h" +#endif + +#if USE_SQLITE +#include +#include +#include // Y_IGNORE + + +namespace DB +{ + +using SQLitePtr = std::shared_ptr; + +SQLitePtr openSQLiteDB(const String & database_path, ContextPtr context, bool throw_on_error = true); + +} + +#endif diff --git a/src/Databases/ya.make b/src/Databases/ya.make index d858dcb9bee..34f47a5edf0 100644 --- a/src/Databases/ya.make +++ b/src/Databases/ya.make @@ -21,13 +21,14 @@ SRCS( DatabaseReplicatedWorker.cpp DatabasesCommon.cpp MySQL/ConnectionMySQLSettings.cpp - MySQL/DatabaseMaterializeMySQL.cpp + MySQL/DatabaseMaterializedMySQL.cpp MySQL/DatabaseMySQL.cpp MySQL/FetchTablesColumnsList.cpp MySQL/MaterializeMetadata.cpp - MySQL/MaterializeMySQLSettings.cpp - MySQL/MaterializeMySQLSyncThread.cpp + MySQL/MaterializedMySQLSettings.cpp + MySQL/MaterializedMySQLSyncThread.cpp SQLite/DatabaseSQLite.cpp + SQLite/SQLiteUtils.cpp SQLite/fetchSQLiteTableStructure.cpp ) diff --git a/src/Dictionaries/DictionaryBlockInputStream.h b/src/Dictionaries/DictionaryBlockInputStream.h index de1acd294f7..7692c910b94 100644 --- a/src/Dictionaries/DictionaryBlockInputStream.h +++ b/src/Dictionaries/DictionaryBlockInputStream.h @@ -6,13 +6,13 @@ #include #include #include -#include #include #include #include "DictionaryBlockInputStreamBase.h" #include "DictionaryStructure.h" #include "IDictionary.h" + namespace DB { diff --git a/src/Dictionaries/DictionaryHelpers.h b/src/Dictionaries/DictionaryHelpers.h index 1478518dee4..ed124ce1e0a 100644 --- a/src/Dictionaries/DictionaryHelpers.h +++ b/src/Dictionaries/DictionaryHelpers.h @@ -9,13 +9,14 @@ #include #include #include -#include #include #include #include #include #include #include +#include + namespace DB { diff --git a/src/Dictionaries/DictionarySourceHelpers.cpp b/src/Dictionaries/DictionarySourceHelpers.cpp index 2d53ac4321e..54ed07092d3 100644 --- a/src/Dictionaries/DictionarySourceHelpers.cpp +++ b/src/Dictionaries/DictionarySourceHelpers.cpp @@ -1,7 +1,7 @@ #include "DictionarySourceHelpers.h" #include #include -#include +#include #include #include #include "DictionaryStructure.h" @@ -18,14 +18,6 @@ namespace ErrorCodes extern const int SIZES_OF_COLUMNS_DOESNT_MATCH; } -void formatBlock(BlockOutputStreamPtr & out, const Block & block) -{ - out->writePrefix(); - out->write(block); - out->writeSuffix(); - out->flush(); -} - /// For simple key Block blockForIds( diff --git a/src/Dictionaries/DictionarySourceHelpers.h b/src/Dictionaries/DictionarySourceHelpers.h index 6c9a321aa36..6fed4c7181c 100644 --- a/src/Dictionaries/DictionarySourceHelpers.h +++ b/src/Dictionaries/DictionarySourceHelpers.h @@ -13,15 +13,8 @@ namespace DB { -class IBlockOutputStream; -using BlockOutputStreamPtr = std::shared_ptr; - struct DictionaryStructure; -/// Write keys to block output stream. 
- -void formatBlock(BlockOutputStreamPtr & out, const Block & block); - /// For simple key Block blockForIds( diff --git a/src/Dictionaries/DirectDictionary.cpp b/src/Dictionaries/DirectDictionary.cpp index 0508a0d70ad..c9b38acfbb5 100644 --- a/src/Dictionaries/DirectDictionary.cpp +++ b/src/Dictionaries/DirectDictionary.cpp @@ -2,13 +2,13 @@ #include #include -#include #include #include #include #include + namespace DB { namespace ErrorCodes diff --git a/src/Dictionaries/ExecutableDictionarySource.cpp b/src/Dictionaries/ExecutableDictionarySource.cpp index 5247c8038cd..daf79965428 100644 --- a/src/Dictionaries/ExecutableDictionarySource.cpp +++ b/src/Dictionaries/ExecutableDictionarySource.cpp @@ -4,6 +4,7 @@ #include #include #include +#include #include #include #include @@ -266,7 +267,7 @@ void registerDictionarySourceExecutable(DictionarySourceFactory & factory) /// Executable dictionaries may execute arbitrary commands. /// It's OK for dictionaries created by administrator from xml-file, but /// maybe dangerous for dictionaries created from DDL-queries. - if (created_from_ddl) + if (created_from_ddl && context->getApplicationType() != Context::ApplicationType::LOCAL) throw Exception(ErrorCodes::DICTIONARY_ACCESS_DENIED, "Dictionaries with executable dictionary source are not allowed to be created from DDL query"); auto context_local_copy = copyContextAndApplySettings(config_prefix, context, config); diff --git a/src/Dictionaries/ExecutablePoolDictionarySource.cpp b/src/Dictionaries/ExecutablePoolDictionarySource.cpp index fe6b19b8253..9eacda343cf 100644 --- a/src/Dictionaries/ExecutablePoolDictionarySource.cpp +++ b/src/Dictionaries/ExecutablePoolDictionarySource.cpp @@ -3,6 +3,7 @@ #include #include #include +#include #include #include #include @@ -283,7 +284,7 @@ void registerDictionarySourceExecutablePool(DictionarySourceFactory & factory) /// Executable dictionaries may execute arbitrary commands. /// It's OK for dictionaries created by administrator from xml-file, but /// maybe dangerous for dictionaries created from DDL-queries. 
- if (created_from_ddl) + if (created_from_ddl && context->getApplicationType() != Context::ApplicationType::LOCAL) throw Exception(ErrorCodes::DICTIONARY_ACCESS_DENIED, "Dictionaries with executable pool dictionary source are not allowed to be created from DDL query"); auto context_local_copy = copyContextAndApplySettings(config_prefix, context, config); diff --git a/src/Dictionaries/HTTPDictionarySource.cpp b/src/Dictionaries/HTTPDictionarySource.cpp index b1b1968454c..ea26e9b7a2a 100644 --- a/src/Dictionaries/HTTPDictionarySource.cpp +++ b/src/Dictionaries/HTTPDictionarySource.cpp @@ -1,6 +1,7 @@ #include "HTTPDictionarySource.h" #include #include +#include #include #include #include diff --git a/src/Dictionaries/LibraryDictionarySource.cpp b/src/Dictionaries/LibraryDictionarySource.cpp index 0b8b52a2d67..4747d3f1216 100644 --- a/src/Dictionaries/LibraryDictionarySource.cpp +++ b/src/Dictionaries/LibraryDictionarySource.cpp @@ -41,6 +41,9 @@ LibraryDictionarySource::LibraryDictionarySource( , sample_block{sample_block_} , context(Context::createCopy(context_)) { + if (fs::path(path).is_relative()) + path = fs::canonical(path); + if (created_from_ddl && !pathStartsWith(path, context->getDictionariesLibPath())) throw Exception(ErrorCodes::PATH_ACCESS_DENIED, "File path {} is not inside {}", path, context->getDictionariesLibPath()); @@ -48,17 +51,32 @@ LibraryDictionarySource::LibraryDictionarySource( throw Exception(ErrorCodes::FILE_DOESNT_EXIST, "LibraryDictionarySource: Can't load library {}: file doesn't exist", path); description.init(sample_block); - bridge_helper = std::make_shared(context, description.sample_block, dictionary_id); - auto res = bridge_helper->initLibrary(path, getLibrarySettingsString(config, config_prefix + ".settings"), getDictAttributesString()); - if (!res) + LibraryBridgeHelper::LibraryInitData library_data + { + .library_path = path, + .library_settings = getLibrarySettingsString(config, config_prefix + ".settings"), + .dict_attributes = getDictAttributesString() + }; + + bridge_helper = std::make_shared(context, description.sample_block, dictionary_id, library_data); + + if (!bridge_helper->initLibrary()) throw Exception(ErrorCodes::EXTERNAL_LIBRARY_ERROR, "Failed to create shared library from path: {}", path); } LibraryDictionarySource::~LibraryDictionarySource() { - bridge_helper->removeLibrary(); + try + { + bridge_helper->removeLibrary(); + } + catch (...) 
+ { + tryLogCurrentException("LibraryDictionarySource"); + } + } @@ -72,8 +90,9 @@ LibraryDictionarySource::LibraryDictionarySource(const LibraryDictionarySource & , context(other.context) , description{other.description} { - bridge_helper = std::make_shared(context, description.sample_block, dictionary_id); - bridge_helper->cloneLibrary(other.dictionary_id); + bridge_helper = std::make_shared(context, description.sample_block, dictionary_id, other.bridge_helper->getLibraryData()); + if (!bridge_helper->cloneLibrary(other.dictionary_id)) + throw Exception(ErrorCodes::EXTERNAL_LIBRARY_ERROR, "Failed to clone library"); } @@ -99,7 +118,7 @@ BlockInputStreamPtr LibraryDictionarySource::loadAll() BlockInputStreamPtr LibraryDictionarySource::loadIds(const std::vector & ids) { LOG_TRACE(log, "loadIds {} size = {}", toString(), ids.size()); - return bridge_helper->loadIds(getDictIdsString(ids)); + return bridge_helper->loadIds(ids); } @@ -147,14 +166,6 @@ String LibraryDictionarySource::getLibrarySettingsString(const Poco::Util::Abstr } -String LibraryDictionarySource::getDictIdsString(const std::vector & ids) -{ - WriteBufferFromOwnString out; - writeVectorBinary(ids, out); - return out.str(); -} - - String LibraryDictionarySource::getDictAttributesString() { std::vector attributes_names(dict_struct.attributes.size()); diff --git a/src/Dictionaries/LibraryDictionarySource.h b/src/Dictionaries/LibraryDictionarySource.h index 88e133666e6..b5cf4f68b05 100644 --- a/src/Dictionaries/LibraryDictionarySource.h +++ b/src/Dictionaries/LibraryDictionarySource.h @@ -70,8 +70,6 @@ public: std::string toString() const override; private: - static String getDictIdsString(const std::vector & ids); - String getDictAttributesString(); static String getLibrarySettingsString(const Poco::Util::AbstractConfiguration & config, const std::string & config_root); @@ -82,7 +80,7 @@ private: const DictionaryStructure dict_struct; const std::string config_prefix; - const std::string path; + std::string path; const Field dictionary_id; Block sample_block; diff --git a/src/Dictionaries/RangeDictionaryBlockInputStream.h b/src/Dictionaries/RangeDictionaryBlockInputStream.h index bef28e71d57..7d40531cfa5 100644 --- a/src/Dictionaries/RangeDictionaryBlockInputStream.h +++ b/src/Dictionaries/RangeDictionaryBlockInputStream.h @@ -2,7 +2,6 @@ #include #include #include -#include #include #include #include @@ -11,6 +10,7 @@ #include "IDictionary.h" #include "RangeHashedDictionary.h" + namespace DB { /* diff --git a/src/Dictionaries/registerCacheDictionaries.cpp b/src/Dictionaries/registerCacheDictionaries.cpp index 500d0cc923c..d039c5b6630 100644 --- a/src/Dictionaries/registerCacheDictionaries.cpp +++ b/src/Dictionaries/registerCacheDictionaries.cpp @@ -17,27 +17,26 @@ namespace ErrorCodes } CacheDictionaryStorageConfiguration parseCacheStorageConfiguration( - const String & full_name, const Poco::Util::AbstractConfiguration & config, - const String & layout_prefix, - const DictionaryLifetime & dict_lifetime, - DictionaryKeyType dictionary_key_type) + const String & full_name, + const String & layout_type, + const String & dictionary_layout_prefix, + const DictionaryLifetime & dict_lifetime) { - String dictionary_type_prefix = (dictionary_key_type == DictionaryKeyType::complex) ? ".complex_key_cache." 
: ".cache."; - String dictionary_configuration_prefix = layout_prefix + dictionary_type_prefix; - - const size_t size = config.getUInt64(dictionary_configuration_prefix + "size_in_cells"); + size_t size = config.getUInt64(dictionary_layout_prefix + ".size_in_cells"); if (size == 0) - throw Exception(ErrorCodes::TOO_SMALL_BUFFER_SIZE, - "{}: cache dictionary cannot have 0 cells", - full_name); + throw Exception(ErrorCodes::TOO_SMALL_BUFFER_SIZE, "{}: dictionary of layout '{}' setting 'size_in_cells' must be greater than 0", full_name, layout_type); size_t dict_lifetime_seconds = static_cast(dict_lifetime.max_sec); - const size_t strict_max_lifetime_seconds = config.getUInt64(dictionary_configuration_prefix + "strict_max_lifetime_seconds", dict_lifetime_seconds); - + size_t strict_max_lifetime_seconds = config.getUInt64(dictionary_layout_prefix + ".strict_max_lifetime_seconds", dict_lifetime_seconds); size_t rounded_size = roundUpToPowerOfTwoOrZero(size); - CacheDictionaryStorageConfiguration storage_configuration{rounded_size, strict_max_lifetime_seconds, dict_lifetime}; + CacheDictionaryStorageConfiguration storage_configuration + { + .max_size_in_cells = rounded_size, + .strict_max_lifetime_seconds = strict_max_lifetime_seconds, + .lifetime = dict_lifetime + }; return storage_configuration; } @@ -45,17 +44,13 @@ CacheDictionaryStorageConfiguration parseCacheStorageConfiguration( #if defined(OS_LINUX) || defined(__FreeBSD__) SSDCacheDictionaryStorageConfiguration parseSSDCacheStorageConfiguration( - const String & full_name, const Poco::Util::AbstractConfiguration & config, - const String & layout_prefix, - const DictionaryLifetime & dict_lifetime, - DictionaryKeyType dictionary_key_type) + const String & full_name, + const String & layout_type, + const String & dictionary_layout_prefix, + const DictionaryLifetime & dict_lifetime) { - String dictionary_type_prefix = dictionary_key_type == DictionaryKeyType::complex ? ".complex_key_ssd_cache." 
: ".ssd_cache."; - String dictionary_configuration_prefix = layout_prefix + dictionary_type_prefix; - - const size_t strict_max_lifetime_seconds - = config.getUInt64(dictionary_configuration_prefix + "strict_max_lifetime_seconds", static_cast(dict_lifetime.max_sec)); + size_t strict_max_lifetime_seconds = config.getUInt64(dictionary_layout_prefix + ".strict_max_lifetime_seconds", static_cast(dict_lifetime.max_sec)); static constexpr size_t DEFAULT_SSD_BLOCK_SIZE_BYTES = DEFAULT_AIO_FILE_BLOCK_SIZE; static constexpr size_t DEFAULT_FILE_SIZE_BYTES = 4 * 1024 * 1024 * 1024ULL; @@ -64,44 +59,48 @@ SSDCacheDictionaryStorageConfiguration parseSSDCacheStorageConfiguration( static constexpr size_t DEFAULT_PARTITIONS_COUNT = 16; - const size_t max_partitions_count - = config.getInt64(dictionary_configuration_prefix + "ssd_cache.max_partitions_count", DEFAULT_PARTITIONS_COUNT); + size_t max_partitions_count = config.getInt64(dictionary_layout_prefix + ".max_partitions_count", DEFAULT_PARTITIONS_COUNT); - const size_t block_size = config.getInt64(dictionary_configuration_prefix + "block_size", DEFAULT_SSD_BLOCK_SIZE_BYTES); - const size_t file_size = config.getInt64(dictionary_configuration_prefix + "file_size", DEFAULT_FILE_SIZE_BYTES); + size_t block_size = config.getInt64(dictionary_layout_prefix + ".block_size", DEFAULT_SSD_BLOCK_SIZE_BYTES); + size_t file_size = config.getInt64(dictionary_layout_prefix + ".file_size", DEFAULT_FILE_SIZE_BYTES); if (file_size % block_size != 0) throw Exception(ErrorCodes::BAD_ARGUMENTS, - "{}: file_size must be a multiple of block_size", - full_name); + "{}: dictionary of layout '{}' setting 'file_size' must be a multiple of block_size", + full_name, + layout_type); - const size_t read_buffer_size = config.getInt64(dictionary_configuration_prefix + "read_buffer_size", DEFAULT_READ_BUFFER_SIZE_BYTES); + size_t read_buffer_size = config.getInt64(dictionary_layout_prefix + ".read_buffer_size", DEFAULT_READ_BUFFER_SIZE_BYTES); if (read_buffer_size % block_size != 0) throw Exception(ErrorCodes::BAD_ARGUMENTS, - "{}: read_buffer_size must be a multiple of block_size", - full_name); + "{}: dictionary of layout '{}' setting 'read_buffer_size' must be a multiple of block_size", + full_name, + layout_type); - const size_t write_buffer_size - = config.getInt64(dictionary_configuration_prefix + "write_buffer_size", DEFAULT_WRITE_BUFFER_SIZE_BYTES); + size_t write_buffer_size = config.getInt64(dictionary_layout_prefix + ".write_buffer_size", DEFAULT_WRITE_BUFFER_SIZE_BYTES); if (write_buffer_size % block_size != 0) throw Exception(ErrorCodes::BAD_ARGUMENTS, - "{}: write_buffer_size must be a multiple of block_size", - full_name); + "{}: dictionary of layout '{}' setting 'write_buffer_size' must be a multiple of block_size", + full_name, + layout_type); - auto file_path = config.getString(dictionary_configuration_prefix + "path"); + auto file_path = config.getString(dictionary_layout_prefix + ".path"); if (file_path.empty()) throw Exception(ErrorCodes::BAD_ARGUMENTS, - "{}: ssd cache dictionary cannot have empty path", - full_name); + "{}: dictionary of layout '{}' setting 'path' must be specified", + full_name, + layout_type); - SSDCacheDictionaryStorageConfiguration configuration{ - strict_max_lifetime_seconds, - dict_lifetime, - file_path, - max_partitions_count, - block_size, - file_size / block_size, - read_buffer_size / block_size, - write_buffer_size / block_size}; + SSDCacheDictionaryStorageConfiguration configuration + { + .strict_max_lifetime_seconds = 
strict_max_lifetime_seconds, + .lifetime = dict_lifetime, + .file_path = file_path, + .max_partitions_count = max_partitions_count, + .block_size = block_size, + .file_blocks_size = file_size / block_size, + .read_buffer_blocks_size = read_buffer_size / block_size, + .write_buffer_blocks_size = write_buffer_size / block_size + }; return configuration; } @@ -109,155 +108,131 @@ SSDCacheDictionaryStorageConfiguration parseSSDCacheStorageConfiguration( #endif CacheDictionaryUpdateQueueConfiguration parseCacheDictionaryUpdateQueueConfiguration( - const String & full_name, const Poco::Util::AbstractConfiguration & config, - const String & layout_prefix, - DictionaryKeyType key_type) + const String & full_name, + const String & layout_type, + const String & dictionary_layout_prefix) { - String layout_type = key_type == DictionaryKeyType::complex ? "complex_key_cache" : "cache"; - - const size_t max_update_queue_size = config.getUInt64(layout_prefix + ".cache.max_update_queue_size", 100000); + size_t max_update_queue_size = config.getUInt64(dictionary_layout_prefix + ".max_update_queue_size", 100000); if (max_update_queue_size == 0) throw Exception(ErrorCodes::TOO_SMALL_BUFFER_SIZE, - "{}: dictionary of layout '{}' cannot have empty update queue of size 0", + "{}: dictionary of layout '{}' setting 'max_update_queue_size' must be greater than 0", full_name, layout_type); - const size_t update_queue_push_timeout_milliseconds - = config.getUInt64(layout_prefix + ".cache.update_queue_push_timeout_milliseconds", 10); + size_t update_queue_push_timeout_milliseconds = config.getUInt64(dictionary_layout_prefix + ".update_queue_push_timeout_milliseconds", 10); if (update_queue_push_timeout_milliseconds < 10) throw Exception(ErrorCodes::BAD_ARGUMENTS, - "{}: dictionary of layout '{}' have too little update_queue_push_timeout", + "{}: dictionary of layout '{}' setting 'update_queue_push_timeout_milliseconds' must be greater or equal than 10", full_name, layout_type); - const size_t query_wait_timeout_milliseconds = config.getUInt64(layout_prefix + ".cache.query_wait_timeout_milliseconds", 60000); + size_t query_wait_timeout_milliseconds = config.getUInt64(dictionary_layout_prefix + ".query_wait_timeout_milliseconds", 60000); - const size_t max_threads_for_updates = config.getUInt64(layout_prefix + ".max_threads_for_updates", 4); + size_t max_threads_for_updates = config.getUInt64(dictionary_layout_prefix + ".max_threads_for_updates", 4); if (max_threads_for_updates == 0) throw Exception(ErrorCodes::BAD_ARGUMENTS, - "{}: dictionary of layout) '{}' cannot have zero threads for updates", + "{}: dictionary of layout '{}' setting 'max_threads_for_updates' must be greater than 0", full_name, layout_type); - CacheDictionaryUpdateQueueConfiguration update_queue_configuration{ - max_update_queue_size, max_threads_for_updates, update_queue_push_timeout_milliseconds, query_wait_timeout_milliseconds}; + CacheDictionaryUpdateQueueConfiguration update_queue_configuration + { + .max_update_queue_size = max_update_queue_size, + .max_threads_for_updates = max_threads_for_updates, + .update_queue_push_timeout_milliseconds = update_queue_push_timeout_milliseconds, + .query_wait_timeout_milliseconds = query_wait_timeout_milliseconds + }; return update_queue_configuration; } -template +template DictionaryPtr createCacheDictionaryLayout( - const String & full_name, - const DictionaryStructure & dict_struct, - const Poco::Util::AbstractConfiguration & config, - const std::string & config_prefix, - DictionarySourcePtr source_ptr) 
-{ - static_assert(dictionary_key_type != DictionaryKeyType::range, "Range key type is not supported by CacheDictionary"); - - if constexpr (dictionary_key_type == DictionaryKeyType::simple) - { - if (dict_struct.key) - throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "'key' is not supported for dictionary of layout 'cache'"); - } - else if constexpr (dictionary_key_type == DictionaryKeyType::complex) - { - if (dict_struct.id) - throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "'id' is not supported for dictionary of layout 'complex_key_cache'"); - } - - if (dict_struct.range_min || dict_struct.range_max) - throw Exception(ErrorCodes::BAD_ARGUMENTS, - "{}: elements .structure.range_min and .structure.range_max should be defined only " - "for a dictionary of layout 'range_hashed'", - full_name); - - const bool require_nonempty = config.getBool(config_prefix + ".require_nonempty", false); - if (require_nonempty) - throw Exception(ErrorCodes::BAD_ARGUMENTS, - "{}: cache dictionary of layout cannot have 'require_nonempty' attribute set", - full_name); - - const auto & layout_prefix = config_prefix + ".layout"; - - const auto dict_id = StorageID::fromDictionaryConfig(config, config_prefix); - - const DictionaryLifetime dict_lifetime{config, config_prefix + ".lifetime"}; - - const bool allow_read_expired_keys = config.getBool(layout_prefix + ".cache.allow_read_expired_keys", false); - - auto storage_configuration = parseCacheStorageConfiguration(full_name, config, layout_prefix, dict_lifetime, dictionary_key_type); - - std::shared_ptr storage = std::make_shared>(dict_struct, storage_configuration); - - auto update_queue_configuration = parseCacheDictionaryUpdateQueueConfiguration(full_name, config, layout_prefix, dictionary_key_type); - - return std::make_unique>( - dict_id, dict_struct, std::move(source_ptr), storage, update_queue_configuration, dict_lifetime, allow_read_expired_keys); -} - -#if defined(OS_LINUX) || defined(__FreeBSD__) - -template -DictionaryPtr createSSDCacheDictionaryLayout( const String & full_name, const DictionaryStructure & dict_struct, const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix, DictionarySourcePtr source_ptr, - ContextPtr context, - bool created_from_ddl) + ContextPtr context [[maybe_unused]], + bool created_from_ddl [[maybe_unused]]) { static_assert(dictionary_key_type != DictionaryKeyType::range, "Range key type is not supported by CacheDictionary"); + String layout_type; + if constexpr (dictionary_key_type == DictionaryKeyType::simple && !ssd) + layout_type = "cache"; + else if constexpr (dictionary_key_type == DictionaryKeyType::simple && ssd) + layout_type = "ssd_cache"; + else if constexpr (dictionary_key_type == DictionaryKeyType::complex && !ssd) + layout_type = "complex_key_cache"; + else if constexpr (dictionary_key_type == DictionaryKeyType::complex && ssd) + layout_type = "complex_key_ssd_cache"; + if constexpr (dictionary_key_type == DictionaryKeyType::simple) { if (dict_struct.key) - throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "'key' is not supported for dictionary of layout 'ssd_cache'"); + throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "{}: dictionary of layout '{}' 'key' is not supported", full_name, layout_type); } else if constexpr (dictionary_key_type == DictionaryKeyType::complex) { if (dict_struct.id) - throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "'id' is not supported for dictionary of layout 'complex_key_ssd_cache'"); + throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "{}: dictionary of layout '{}' 
'id' is not supported", full_name, layout_type); } if (dict_struct.range_min || dict_struct.range_max) throw Exception(ErrorCodes::BAD_ARGUMENTS, - "{}: elements .structure.range_min and .structure.range_max should be defined only " + "{}: dictionary of layout '{}' elements .structure.range_min and .structure.range_max must be defined only " "for a dictionary of layout 'range_hashed'", - full_name); + full_name, + layout_type); const bool require_nonempty = config.getBool(config_prefix + ".require_nonempty", false); if (require_nonempty) throw Exception(ErrorCodes::BAD_ARGUMENTS, - "{}: cache dictionary of layout cannot have 'require_nonempty' attribute set", - full_name); - - const auto & layout_prefix = config_prefix + ".layout"; - - const auto dict_id = StorageID::fromDictionaryConfig(config, config_prefix); + "{}: cache dictionary of layout '{}' cannot have 'require_nonempty' attribute set", + full_name, + layout_type); + const auto dictionary_identifier = StorageID::fromDictionaryConfig(config, config_prefix); const DictionaryLifetime dict_lifetime{config, config_prefix + ".lifetime"}; - const bool allow_read_expired_keys = config.getBool(layout_prefix + ".cache.allow_read_expired_keys", false); + const auto & layout_prefix = config_prefix + ".layout"; + const auto & dictionary_layout_prefix = layout_prefix + '.' + layout_type; + const bool allow_read_expired_keys = config.getBool(dictionary_layout_prefix + ".allow_read_expired_keys", false); - auto storage_configuration = parseSSDCacheStorageConfiguration(full_name, config, layout_prefix, dict_lifetime, dictionary_key_type); + auto update_queue_configuration = parseCacheDictionaryUpdateQueueConfiguration(config, full_name, layout_type, dictionary_layout_prefix); - if (created_from_ddl && !pathStartsWith(storage_configuration.file_path, context->getUserFilesPath())) - throw Exception(ErrorCodes::PATH_ACCESS_DENIED, "File path {} is not inside {}", storage_configuration.file_path, context->getUserFilesPath()); + std::shared_ptr storage; + if constexpr (!ssd) + { + auto storage_configuration = parseCacheStorageConfiguration(config, full_name, layout_type, dictionary_layout_prefix, dict_lifetime); + storage = std::make_shared>(dict_struct, storage_configuration); + } +#if defined(OS_LINUX) || defined(__FreeBSD__) + else + { + auto storage_configuration = parseSSDCacheStorageConfiguration(config, full_name, layout_type, dictionary_layout_prefix, dict_lifetime); + if (created_from_ddl && !pathStartsWith(storage_configuration.file_path, context->getUserFilesPath())) + throw Exception(ErrorCodes::PATH_ACCESS_DENIED, "File path {} is not inside {}", storage_configuration.file_path, context->getUserFilesPath()); - auto storage = std::make_shared>(storage_configuration); - - auto update_queue_configuration = parseCacheDictionaryUpdateQueueConfiguration(full_name, config, layout_prefix, dictionary_key_type); - - return std::make_unique>( - dict_id, dict_struct, std::move(source_ptr), storage, update_queue_configuration, dict_lifetime, allow_read_expired_keys); -} - + storage = std::make_shared>(storage_configuration); + } #endif + auto dictionary = std::make_unique>( + dictionary_identifier, + dict_struct, + std::move(source_ptr), + std::move(storage), + update_queue_configuration, + dict_lifetime, + allow_read_expired_keys); + + return dictionary; +} + void registerDictionaryCache(DictionaryFactory & factory) { auto create_simple_cache_layout = [=](const String & full_name, @@ -265,10 +240,10 @@ void registerDictionaryCache(DictionaryFactory & 
factory) const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix, DictionarySourcePtr source_ptr, - ContextPtr /* context */, - bool /* created_from_ddl */) -> DictionaryPtr + ContextPtr context, + bool created_from_ddl) -> DictionaryPtr { - return createCacheDictionaryLayout(full_name, dict_struct, config, config_prefix, std::move(source_ptr)); + return createCacheDictionaryLayout(full_name, dict_struct, config, config_prefix, std::move(source_ptr), std::move(context), created_from_ddl); }; factory.registerLayout("cache", create_simple_cache_layout, false); @@ -278,10 +253,10 @@ void registerDictionaryCache(DictionaryFactory & factory) const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix, DictionarySourcePtr source_ptr, - ContextPtr /* context */, - bool /* created_from_ddl */) -> DictionaryPtr + ContextPtr context, + bool created_from_ddl) -> DictionaryPtr { - return createCacheDictionaryLayout(full_name, dict_struct, config, config_prefix, std::move(source_ptr)); + return createCacheDictionaryLayout(full_name, dict_struct, config, config_prefix, std::move(source_ptr), std::move(context), created_from_ddl); }; factory.registerLayout("complex_key_cache", create_complex_key_cache_layout, true); @@ -296,7 +271,7 @@ void registerDictionaryCache(DictionaryFactory & factory) ContextPtr context, bool created_from_ddl) -> DictionaryPtr { - return createSSDCacheDictionaryLayout(full_name, dict_struct, config, config_prefix, std::move(source_ptr), context, created_from_ddl); + return createCacheDictionaryLayout(full_name, dict_struct, config, config_prefix, std::move(source_ptr), std::move(context), created_from_ddl); }; factory.registerLayout("ssd_cache", create_simple_ssd_cache_layout, false); @@ -308,11 +283,13 @@ void registerDictionaryCache(DictionaryFactory & factory) DictionarySourcePtr source_ptr, ContextPtr context, bool created_from_ddl) -> DictionaryPtr { - return createSSDCacheDictionaryLayout(full_name, dict_struct, config, config_prefix, std::move(source_ptr), context, created_from_ddl); + return createCacheDictionaryLayout(full_name, dict_struct, config, config_prefix, std::move(source_ptr), std::move(context), created_from_ddl); }; factory.registerLayout("complex_key_ssd_cache", create_complex_key_ssd_cache_layout, true); + #endif + } } diff --git a/src/Disks/DiskEncrypted.cpp b/src/Disks/DiskEncrypted.cpp index cec033ef465..9980dc0d8dc 100644 --- a/src/Disks/DiskEncrypted.cpp +++ b/src/Disks/DiskEncrypted.cpp @@ -5,6 +5,7 @@ #include #include #include +#include namespace DB @@ -12,13 +13,152 @@ namespace DB namespace ErrorCodes { + extern const int BAD_ARGUMENTS; extern const int INCORRECT_DISK_INDEX; - extern const int UNKNOWN_ELEMENT_IN_CONFIG; - extern const int LOGICAL_ERROR; + extern const int DATA_ENCRYPTION_ERROR; + extern const int NOT_IMPLEMENTED; } -using DiskEncryptedPtr = std::shared_ptr; -using namespace FileEncryption; +namespace +{ + using DiskEncryptedPtr = std::shared_ptr; + using namespace FileEncryption; + + constexpr Algorithm DEFAULT_ENCRYPTION_ALGORITHM = Algorithm::AES_128_CTR; + + String unhexKey(const String & hex) + { + try + { + return boost::algorithm::unhex(hex); + } + catch (const std::exception &) + { + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Cannot read key_hex, check for valid characters [0-9a-fA-F] and length"); + } + } + + std::unique_ptr parseDiskEncryptedSettings( + const String & name, const Poco::Util::AbstractConfiguration & config, const String & config_prefix, const DisksMap & 
map) + { + try + { + auto res = std::make_unique(); + res->current_algorithm = DEFAULT_ENCRYPTION_ALGORITHM; + if (config.has(config_prefix + ".algorithm")) + parseFromString(res->current_algorithm, config.getString(config_prefix + ".algorithm")); + + Strings config_keys; + config.keys(config_prefix, config_keys); + for (const std::string & config_key : config_keys) + { + String key; + UInt64 key_id; + + if ((config_key == "key") || config_key.starts_with("key[")) + { + key = config.getString(config_prefix + "." + config_key, ""); + key_id = config.getUInt64(config_prefix + "." + config_key + "[@id]", 0); + } + else if ((config_key == "key_hex") || config_key.starts_with("key_hex[")) + { + key = unhexKey(config.getString(config_prefix + "." + config_key, "")); + key_id = config.getUInt64(config_prefix + "." + config_key + "[@id]", 0); + } + else + continue; + + if (res->keys.contains(key_id)) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Multiple keys have the same ID {}", key_id); + res->keys[key_id] = key; + } + + if (res->keys.empty()) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "No keys, an encrypted disk needs keys to work"); + + res->current_key_id = config.getUInt64(config_prefix + ".current_key_id", 0); + if (!res->keys.contains(res->current_key_id)) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Not found a key with the current ID {}", res->current_key_id); + FileEncryption::checkKeySize(res->current_algorithm, res->keys[res->current_key_id].size()); + + String wrapped_disk_name = config.getString(config_prefix + ".disk", ""); + if (wrapped_disk_name.empty()) + throw Exception( + ErrorCodes::BAD_ARGUMENTS, + "Name of the wrapped disk must not be empty. Encrypted disk is a wrapper over another disk"); + + auto wrapped_disk_it = map.find(wrapped_disk_name); + if (wrapped_disk_it == map.end()) + throw Exception( + ErrorCodes::BAD_ARGUMENTS, + "The wrapped disk must have been announced earlier. 
No disk with name {}", + wrapped_disk_name); + res->wrapped_disk = wrapped_disk_it->second; + + res->disk_path = config.getString(config_prefix + ".path", ""); + if (!res->disk_path.empty() && (res->disk_path.back() != '/')) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Disk path must ends with '/', but '{}' doesn't.", quoteString(res->disk_path)); + + return res; + } + catch (Exception & e) + { + e.addMessage("Disk " + name); + throw; + } + } + + FileEncryption::Header readHeader(ReadBufferFromFileBase & read_buffer) + { + try + { + FileEncryption::Header header; + header.read(read_buffer); + return header; + } + catch (Exception & e) + { + e.addMessage("While reading the header of encrypted file " + quoteString(read_buffer.getFileName())); + throw; + } + } + + String getCurrentKey(const String & path, const DiskEncryptedSettings & settings) + { + auto it = settings.keys.find(settings.current_key_id); + if (it == settings.keys.end()) + throw Exception( + ErrorCodes::DATA_ENCRYPTION_ERROR, + "Not found a key with the current ID {} required to cipher file {}", + settings.current_key_id, + quoteString(path)); + + return it->second; + } + + String getKey(const String & path, const FileEncryption::Header & header, const DiskEncryptedSettings & settings) + { + auto it = settings.keys.find(header.key_id); + if (it == settings.keys.end()) + throw Exception( + ErrorCodes::DATA_ENCRYPTION_ERROR, + "Not found a key with ID {} required to decipher file {}", + header.key_id, + quoteString(path)); + + String key = it->second; + if (calculateKeyHash(key) != header.key_hash) + throw Exception( + ErrorCodes::DATA_ENCRYPTION_ERROR, "Wrong key with ID {}, could not decipher file {}", header.key_id, quoteString(path)); + + return key; + } + + bool inline isSameDiskType(const IDisk & one, const IDisk & another) + { + return typeid(one) == typeid(another); + } +} class DiskEncryptedReservation : public IReservation { @@ -46,6 +186,22 @@ private: std::unique_ptr reservation; }; +DiskEncrypted::DiskEncrypted( + const String & name_, const Poco::Util::AbstractConfiguration & config_, const String & config_prefix_, const DisksMap & map_) + : DiskEncrypted(name_, parseDiskEncryptedSettings(name_, config_, config_prefix_, map_)) +{ +} + +DiskEncrypted::DiskEncrypted(const String & name_, std::unique_ptr settings_) + : DiskDecorator(settings_->wrapped_disk) + , name(name_) + , disk_path(settings_->disk_path) + , disk_absolute_path(settings_->wrapped_disk->getPath() + settings_->disk_path) + , current_settings(std::move(settings_)) +{ + delegate->createDirectories(disk_path); +} + ReservationPtr DiskEncrypted::reserve(UInt64 bytes) { auto reservation = delegate->reserve(bytes); @@ -54,24 +210,30 @@ ReservationPtr DiskEncrypted::reserve(UInt64 bytes) return std::make_unique(std::static_pointer_cast(shared_from_this()), std::move(reservation)); } -DiskEncrypted::DiskEncrypted(const String & name_, DiskPtr disk_, const String & key_, const String & path_) - : DiskDecorator(disk_) - , name(name_), key(key_), disk_path(path_) - , disk_absolute_path(delegate->getPath() + disk_path) +void DiskEncrypted::copy(const String & from_path, const std::shared_ptr & to_disk, const String & to_path) { - initialize(); -} + /// Check if we can copy the file without deciphering. + if (isSameDiskType(*this, *to_disk)) + { + /// Disk type is the same, check if the key is the same too. 
+ if (auto * to_disk_enc = typeid_cast(to_disk.get())) + { + auto from_settings = current_settings.get(); + auto to_settings = to_disk_enc->current_settings.get(); + if (from_settings->keys == to_settings->keys) + { + /// Keys are the same so we can simply copy the encrypted file. + auto wrapped_from_path = wrappedPath(from_path); + auto to_delegate = to_disk_enc->delegate; + auto wrapped_to_path = to_disk_enc->wrappedPath(to_path); + delegate->copy(wrapped_from_path, to_delegate, wrapped_to_path); + return; + } + } + } -void DiskEncrypted::initialize() -{ - // use wrapped_disk as an EncryptedDisk store - if (disk_path.empty()) - return; - - if (disk_path.back() != '/') - throw Exception("Disk path must ends with '/', but '" + disk_path + "' doesn't.", ErrorCodes::LOGICAL_ERROR); - - delegate->createDirectories(disk_path); + /// Copy the file through buffers with deciphering. + copyThroughBuffers(from_path, to_disk, to_path); } std::unique_ptr DiskEncrypted::readFile( @@ -84,38 +246,41 @@ std::unique_ptr DiskEncrypted::readFile( { auto wrapped_path = wrappedPath(path); auto buffer = delegate->readFile(wrapped_path, buf_size, estimated_size, aio_threshold, mmap_threshold, mmap_cache); - - String iv; - size_t offset = 0; - - if (exists(path) && getFileSize(path)) - { - iv = readIV(kIVSize, *buffer); - offset = kIVSize; - } - else - iv = randomString(kIVSize); - - return std::make_unique(buf_size, std::move(buffer), iv, key, offset); + auto settings = current_settings.get(); + FileEncryption::Header header = readHeader(*buffer); + String key = getKey(path, header, *settings); + return std::make_unique(buf_size, std::move(buffer), key, header); } std::unique_ptr DiskEncrypted::writeFile(const String & path, size_t buf_size, WriteMode mode) { - String iv; - size_t start_offset = 0; auto wrapped_path = wrappedPath(path); - - if (mode == WriteMode::Append && exists(path) && getFileSize(path)) + FileEncryption::Header header; + String key; + UInt64 old_file_size = 0; + auto settings = current_settings.get(); + if (mode == WriteMode::Append && exists(path)) { - auto read_buffer = delegate->readFile(wrapped_path, kIVSize); - iv = readIV(kIVSize, *read_buffer); - start_offset = getFileSize(path); + old_file_size = getFileSize(path); + if (old_file_size) + { + /// Append mode: we continue to use the same header. + auto read_buffer = delegate->readFile(wrapped_path, FileEncryption::Header::kSize); + header = readHeader(*read_buffer); + key = getKey(path, header, *settings); + } + } + if (!old_file_size) + { + /// Rewrite mode: we generate a new header. + key = getCurrentKey(path, *settings); + header.algorithm = settings->current_algorithm; + header.key_id = settings->current_key_id; + header.key_hash = calculateKeyHash(key); + header.init_vector = InitVector::random(); } - else - iv = randomString(kIVSize); - auto buffer = delegate->writeFile(wrapped_path, buf_size, mode); - return std::make_unique(buf_size, std::move(buffer), iv, key, start_offset); + return std::make_unique(buf_size, std::move(buffer), key, header, old_file_size); } @@ -123,13 +288,13 @@ size_t DiskEncrypted::getFileSize(const String & path) const { auto wrapped_path = wrappedPath(path); size_t size = delegate->getFileSize(wrapped_path); - return size > kIVSize ? (size - kIVSize) : 0; + return size > FileEncryption::Header::kSize ? (size - FileEncryption::Header::kSize) : 0; } void DiskEncrypted::truncateFile(const String & path, size_t size) { auto wrapped_path = wrappedPath(path); - delegate->truncateFile(wrapped_path, size ? 
(size + kIVSize) : 0); + delegate->truncateFile(wrapped_path, size ? (size + FileEncryption::Header::kSize) : 0); } SyncGuardPtr DiskEncrypted::getDirectorySyncGuard(const String & path) const @@ -142,25 +307,16 @@ void DiskEncrypted::applyNewSettings( const Poco::Util::AbstractConfiguration & config, ContextPtr /*context*/, const String & config_prefix, - const DisksMap & map) + const DisksMap & disk_map) { - String wrapped_disk_name = config.getString(config_prefix + ".disk", ""); - if (wrapped_disk_name.empty()) - throw Exception("The wrapped disk name can not be empty. An encrypted disk is a wrapper over another disk. " - "Disk " + name, ErrorCodes::UNKNOWN_ELEMENT_IN_CONFIG); + auto new_settings = parseDiskEncryptedSettings(name, config, config_prefix, disk_map); + if (new_settings->wrapped_disk != delegate) + throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Changing wrapped disk on the fly is not supported. Disk {}", name); - key = config.getString(config_prefix + ".key", ""); - if (key.empty()) - throw Exception("Encrypted disk key can not be empty. Disk " + name, ErrorCodes::UNKNOWN_ELEMENT_IN_CONFIG); + if (new_settings->disk_path != disk_path) + throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Changing disk path on the fly is not supported. Disk {}", name); - auto wrapped_disk = map.find(wrapped_disk_name); - if (wrapped_disk == map.end()) - throw Exception("The wrapped disk must have been announced earlier. No disk with name " + wrapped_disk_name + ". Disk " + name, - ErrorCodes::UNKNOWN_ELEMENT_IN_CONFIG); - delegate = wrapped_disk->second; - - disk_path = config.getString(config_prefix + ".path", ""); - initialize(); + current_settings.set(std::move(new_settings)); } void registerDiskEncrypted(DiskFactory & factory) @@ -169,28 +325,9 @@ void registerDiskEncrypted(DiskFactory & factory) const Poco::Util::AbstractConfiguration & config, const String & config_prefix, ContextPtr /*context*/, - const DisksMap & map) -> DiskPtr { - - String wrapped_disk_name = config.getString(config_prefix + ".disk", ""); - if (wrapped_disk_name.empty()) - throw Exception("The wrapped disk name can not be empty. An encrypted disk is a wrapper over another disk. " - "Disk " + name, ErrorCodes::UNKNOWN_ELEMENT_IN_CONFIG); - - String key = config.getString(config_prefix + ".key", ""); - if (key.empty()) - throw Exception("Encrypted disk key can not be empty. Disk " + name, ErrorCodes::UNKNOWN_ELEMENT_IN_CONFIG); - if (key.size() != cipherKeyLength(defaultCipher())) - throw Exception("Expected key with size " + std::to_string(cipherKeyLength(defaultCipher())) + ", got key with size " + std::to_string(key.size()), - ErrorCodes::UNKNOWN_ELEMENT_IN_CONFIG); - - auto wrapped_disk = map.find(wrapped_disk_name); - if (wrapped_disk == map.end()) - throw Exception("The wrapped disk must have been announced earlier. No disk with name " + wrapped_disk_name + ". 
Disk " + name, - ErrorCodes::UNKNOWN_ELEMENT_IN_CONFIG); - - String relative_path = config.getString(config_prefix + ".path", ""); - - return std::make_shared(name, wrapped_disk->second, key, relative_path); + const DisksMap & map) -> DiskPtr + { + return std::make_shared(name, config, config_prefix, map); }; factory.registerDiskType("encrypted", creator); } diff --git a/src/Disks/DiskEncrypted.h b/src/Disks/DiskEncrypted.h index 0a38765a791..8217db4141a 100644 --- a/src/Disks/DiskEncrypted.h +++ b/src/Disks/DiskEncrypted.h @@ -7,17 +7,32 @@ #if USE_SSL #include #include +#include namespace DB { class ReadBufferFromFileBase; class WriteBufferFromFileBase; +namespace FileEncryption { enum class Algorithm; } +struct DiskEncryptedSettings +{ + DiskPtr wrapped_disk; + String disk_path; + std::unordered_map keys; + UInt64 current_key_id; + FileEncryption::Algorithm current_algorithm; +}; + +/// Encrypted disk ciphers all written files on the fly and writes the encrypted files to an underlying (normal) disk. +/// And when we read files from an encrypted disk it deciphers them automatically, +/// so we can work with a encrypted disk like it's a normal disk. class DiskEncrypted : public DiskDecorator { public: - DiskEncrypted(const String & name_, DiskPtr disk_, const String & key_, const String & path_); + DiskEncrypted(const String & name_, const Poco::Util::AbstractConfiguration & config_, const String & config_prefix_, const DisksMap & map_); + DiskEncrypted(const String & name_, std::unique_ptr settings_); const String & getName() const override { return name; } const String & getPath() const override { return disk_absolute_path; } @@ -102,10 +117,7 @@ public: delegate->listFiles(wrapped_path, file_names); } - void copy(const String & from_path, const std::shared_ptr & to_disk, const String & to_path) override - { - IDisk::copy(from_path, to_disk, to_path); - } + void copy(const String & from_path, const std::shared_ptr & to_disk, const String & to_path) override; std::unique_ptr readFile( const String & path, @@ -208,8 +220,6 @@ public: SyncGuardPtr getDirectorySyncGuard(const String & path) const override; private: - void initialize(); - String wrappedPath(const String & path) const { // if path starts_with disk_path -> got already wrapped path @@ -218,10 +228,10 @@ private: return disk_path + path; } - String name; - String key; - String disk_path; - String disk_absolute_path; + const String name; + const String disk_path; + const String disk_absolute_path; + MultiVersion current_settings; }; } diff --git a/src/Disks/DiskLocal.cpp b/src/Disks/DiskLocal.cpp index a723803cd88..b4d7f74f10f 100644 --- a/src/Disks/DiskLocal.cpp +++ b/src/Disks/DiskLocal.cpp @@ -31,6 +31,56 @@ std::mutex DiskLocal::reservation_mutex; using DiskLocalPtr = std::shared_ptr; +static void loadDiskLocalConfig(const String & name, + const Poco::Util::AbstractConfiguration & config, + const String & config_prefix, + ContextPtr context, + String & path, + UInt64 & keep_free_space_bytes) +{ + path = config.getString(config_prefix + ".path", ""); + if (name == "default") + { + if (!path.empty()) + throw Exception( + "\"default\" disk path should be provided in not it ", + ErrorCodes::UNKNOWN_ELEMENT_IN_CONFIG); + path = context->getPath(); + } + else + { + if (path.empty()) + throw Exception("Disk path can not be empty. Disk " + name, ErrorCodes::UNKNOWN_ELEMENT_IN_CONFIG); + if (path.back() != '/') + throw Exception("Disk path must end with /. 
Disk " + name, ErrorCodes::UNKNOWN_ELEMENT_IN_CONFIG); + } + + if (!FS::canRead(path) || !FS::canWrite(path)) + throw Exception("There is no RW access to the disk " + name + " (" + path + ")", ErrorCodes::PATH_ACCESS_DENIED); + + bool has_space_ratio = config.has(config_prefix + ".keep_free_space_ratio"); + + if (config.has(config_prefix + ".keep_free_space_bytes") && has_space_ratio) + throw Exception( + "Only one of 'keep_free_space_bytes' and 'keep_free_space_ratio' can be specified", + ErrorCodes::EXCESSIVE_ELEMENT_IN_CONFIG); + + keep_free_space_bytes = config.getUInt64(config_prefix + ".keep_free_space_bytes", 0); + + if (has_space_ratio) + { + auto ratio = config.getDouble(config_prefix + ".keep_free_space_ratio"); + if (ratio < 0 || ratio > 1) + throw Exception("'keep_free_space_ratio' have to be between 0 and 1", ErrorCodes::EXCESSIVE_ELEMENT_IN_CONFIG); + String tmp_path = path; + if (tmp_path.empty()) + tmp_path = context->getPath(); + + // Create tmp disk for getting total disk space. + keep_free_space_bytes = static_cast(DiskLocal("tmp", tmp_path, 0).getTotalSpace() * ratio); + } +} + class DiskLocalReservation : public IReservation { public: @@ -309,7 +359,7 @@ void DiskLocal::copy(const String & from_path, const std::shared_ptr & to fs::copy(from, to, fs::copy_options::recursive | fs::copy_options::overwrite_existing); /// Use more optimal way. } else - IDisk::copy(from_path, to_disk, to_path); /// Copy files through buffers. + copyThroughBuffers(from_path, to_disk, to_path); /// Base implementation. } SyncGuardPtr DiskLocal::getDirectorySyncGuard(const String & path) const @@ -317,6 +367,21 @@ SyncGuardPtr DiskLocal::getDirectorySyncGuard(const String & path) const return std::make_unique(fs::path(disk_path) / path); } + +void DiskLocal::applyNewSettings(const Poco::Util::AbstractConfiguration & config, ContextPtr context, const String & config_prefix, const DisksMap &) +{ + String new_disk_path; + UInt64 new_keep_free_space_bytes; + + loadDiskLocalConfig(name, config, config_prefix, context, new_disk_path, new_keep_free_space_bytes); + + if (disk_path != new_disk_path) + throw Exception("Disk path can't be updated from config " + name, ErrorCodes::UNKNOWN_ELEMENT_IN_CONFIG); + + if (keep_free_space_bytes != new_keep_free_space_bytes) + keep_free_space_bytes = new_keep_free_space_bytes; +} + DiskPtr DiskLocalReservation::getDisk(size_t i) const { if (i != 0) @@ -334,7 +399,6 @@ void DiskLocalReservation::update(UInt64 new_size) disk->reserved_bytes += size; } - DiskLocalReservation::~DiskLocalReservation() { try @@ -369,48 +433,9 @@ void registerDiskLocal(DiskFactory & factory) const String & config_prefix, ContextPtr context, const DisksMap & /*map*/) -> DiskPtr { - String path = config.getString(config_prefix + ".path", ""); - if (name == "default") - { - if (!path.empty()) - throw Exception( - "\"default\" disk path should be provided in not it ", - ErrorCodes::UNKNOWN_ELEMENT_IN_CONFIG); - path = context->getPath(); - } - else - { - if (path.empty()) - throw Exception("Disk path can not be empty. Disk " + name, ErrorCodes::UNKNOWN_ELEMENT_IN_CONFIG); - if (path.back() != '/') - throw Exception("Disk path must end with /. 
Disk " + name, ErrorCodes::UNKNOWN_ELEMENT_IN_CONFIG); - } - - if (!FS::canRead(path) || !FS::canWrite(path)) - throw Exception("There is no RW access to the disk " + name + " (" + path + ")", ErrorCodes::PATH_ACCESS_DENIED); - - bool has_space_ratio = config.has(config_prefix + ".keep_free_space_ratio"); - - if (config.has(config_prefix + ".keep_free_space_bytes") && has_space_ratio) - throw Exception( - "Only one of 'keep_free_space_bytes' and 'keep_free_space_ratio' can be specified", - ErrorCodes::EXCESSIVE_ELEMENT_IN_CONFIG); - - UInt64 keep_free_space_bytes = config.getUInt64(config_prefix + ".keep_free_space_bytes", 0); - - if (has_space_ratio) - { - auto ratio = config.getDouble(config_prefix + ".keep_free_space_ratio"); - if (ratio < 0 || ratio > 1) - throw Exception("'keep_free_space_ratio' have to be between 0 and 1", ErrorCodes::EXCESSIVE_ELEMENT_IN_CONFIG); - String tmp_path = path; - if (tmp_path.empty()) - tmp_path = context->getPath(); - - // Create tmp disk for getting total disk space. - keep_free_space_bytes = static_cast(DiskLocal("tmp", tmp_path, 0).getTotalSpace() * ratio); - } - + String path; + UInt64 keep_free_space_bytes; + loadDiskLocalConfig(name, config, config_prefix, context, path, keep_free_space_bytes); return std::make_shared(name, path, keep_free_space_bytes); }; factory.registerDiskType("local", creator); diff --git a/src/Disks/DiskLocal.h b/src/Disks/DiskLocal.h index 3aa243b103b..0cf0cf39bbc 100644 --- a/src/Disks/DiskLocal.h +++ b/src/Disks/DiskLocal.h @@ -5,6 +5,7 @@ #include #include #include +#include namespace DB @@ -104,13 +105,15 @@ public: SyncGuardPtr getDirectorySyncGuard(const String & path) const override; + void applyNewSettings(const Poco::Util::AbstractConfiguration & config, ContextPtr context, const String & config_prefix, const DisksMap &) override; + private: bool tryReserve(UInt64 bytes); private: const String name; const String disk_path; - const UInt64 keep_free_space_bytes; + std::atomic keep_free_space_bytes; UInt64 reserved_bytes = 0; UInt64 reservation_count = 0; @@ -120,4 +123,5 @@ private: Poco::Logger * log = &Poco::Logger::get("DiskLocal"); }; + } diff --git a/src/Disks/DiskSelector.h b/src/Disks/DiskSelector.h index 88cc6ee5197..54752215441 100644 --- a/src/Disks/DiskSelector.h +++ b/src/Disks/DiskSelector.h @@ -32,7 +32,7 @@ public: /// Get all disks with names const DisksMap & getDisksMap() const { return disks; } - void addToDiskMap(String name, DiskPtr disk) + void addToDiskMap(const String & name, DiskPtr disk) { disks.emplace(name, disk); } diff --git a/src/Disks/IDisk.cpp b/src/Disks/IDisk.cpp index 82705b5dcc8..df0f921389f 100644 --- a/src/Disks/IDisk.cpp +++ b/src/Disks/IDisk.cpp @@ -58,7 +58,7 @@ void asyncCopy(IDisk & from_disk, String from_path, IDisk & to_disk, String to_p } } -void IDisk::copy(const String & from_path, const std::shared_ptr & to_disk, const String & to_path) +void IDisk::copyThroughBuffers(const String & from_path, const std::shared_ptr & to_disk, const String & to_path) { auto & exec = to_disk->getExecutor(); ResultsCollector results; @@ -71,6 +71,11 @@ void IDisk::copy(const String & from_path, const std::shared_ptr & to_dis result.get(); } +void IDisk::copy(const String & from_path, const std::shared_ptr & to_disk, const String & to_path) +{ + copyThroughBuffers(from_path, to_disk, to_path); +} + void IDisk::truncateFile(const String &, size_t) { throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Truncate operation is not implemented for disk of type {}", getType()); diff --git 
a/src/Disks/IDisk.h b/src/Disks/IDisk.h index a70b44a789f..61c805961ae 100644 --- a/src/Disks/IDisk.h +++ b/src/Disks/IDisk.h @@ -13,9 +13,9 @@ #include #include #include +#include "Poco/Util/AbstractConfiguration.h" #include #include -#include "Poco/Util/AbstractConfiguration.h" namespace fs = std::filesystem; @@ -246,6 +246,11 @@ protected: /// Returns executor to perform asynchronous operations. virtual Executor & getExecutor() { return *executor; } + /// Base implementation of the function copy(). + /// It just opens two files, reads data by portions from the first file, and writes it to the second one. + /// A derived class may override copy() to provide a faster implementation. + void copyThroughBuffers(const String & from_path, const std::shared_ptr & to_disk, const String & to_path); + private: std::unique_ptr executor; }; diff --git a/src/Disks/S3/DiskS3.cpp b/src/Disks/S3/DiskS3.cpp index 1f1c73c32c3..6dd29165566 100644 --- a/src/Disks/S3/DiskS3.cpp +++ b/src/Disks/S3/DiskS3.cpp @@ -363,7 +363,8 @@ int DiskS3::readSchemaVersion(const String & source_bucket, const String & sourc settings->client, source_bucket, source_path + SCHEMA_VERSION_OBJECT, - settings->s3_max_single_read_retries); + settings->s3_max_single_read_retries, + DBMS_DEFAULT_BUFFER_SIZE); readIntText(version, buffer); diff --git a/src/Disks/S3/ProxyConfiguration.h b/src/Disks/S3/ProxyConfiguration.h index 888f4c6faf9..793170e727c 100644 --- a/src/Disks/S3/ProxyConfiguration.h +++ b/src/Disks/S3/ProxyConfiguration.h @@ -19,6 +19,7 @@ public: virtual ~ProxyConfiguration() = default; /// Returns proxy configuration on each HTTP request. virtual Aws::Client::ClientConfigurationPerRequest getConfiguration(const Aws::Http::HttpRequest & request) = 0; + virtual void errorReport(const Aws::Client::ClientConfigurationPerRequest & config) = 0; }; } diff --git a/src/Disks/S3/ProxyListConfiguration.h b/src/Disks/S3/ProxyListConfiguration.h index 4d37d2b6d69..bd5bbba19a4 100644 --- a/src/Disks/S3/ProxyListConfiguration.h +++ b/src/Disks/S3/ProxyListConfiguration.h @@ -20,6 +20,7 @@ class ProxyListConfiguration : public ProxyConfiguration public: explicit ProxyListConfiguration(std::vector proxies_); Aws::Client::ClientConfigurationPerRequest getConfiguration(const Aws::Http::HttpRequest & request) override; + void errorReport(const Aws::Client::ClientConfigurationPerRequest &) override {} private: /// List of configured proxies. 
diff --git a/src/Disks/S3/ProxyResolverConfiguration.cpp b/src/Disks/S3/ProxyResolverConfiguration.cpp index b959d8b4415..17dd19fe444 100644 --- a/src/Disks/S3/ProxyResolverConfiguration.cpp +++ b/src/Disks/S3/ProxyResolverConfiguration.cpp @@ -16,8 +16,10 @@ namespace DB::ErrorCodes namespace DB::S3 { -ProxyResolverConfiguration::ProxyResolverConfiguration(const Poco::URI & endpoint_, String proxy_scheme_, unsigned proxy_port_) - : endpoint(endpoint_), proxy_scheme(std::move(proxy_scheme_)), proxy_port(proxy_port_) + +ProxyResolverConfiguration::ProxyResolverConfiguration(const Poco::URI & endpoint_, String proxy_scheme_ + , unsigned proxy_port_, unsigned cache_ttl_) + : endpoint(endpoint_), proxy_scheme(std::move(proxy_scheme_)), proxy_port(proxy_port_), cache_ttl(cache_ttl_) { } @@ -25,16 +27,25 @@ Aws::Client::ClientConfigurationPerRequest ProxyResolverConfiguration::getConfig { LOG_DEBUG(&Poco::Logger::get("AWSClient"), "Obtain proxy using resolver: {}", endpoint.toString()); + std::unique_lock lock(cache_mutex); + + std::chrono::time_point now = std::chrono::system_clock::now(); + + if (cache_ttl.count() && cache_valid && now <= cache_timestamp + cache_ttl && now >= cache_timestamp) + { + LOG_DEBUG(&Poco::Logger::get("AWSClient"), "Use cached proxy: {}://{}:{}", Aws::Http::SchemeMapper::ToString(cached_config.proxyScheme), cached_config.proxyHost, cached_config.proxyPort); + return cached_config; + } + /// 1 second is enough for now. /// TODO: Make timeouts configurable. ConnectionTimeouts timeouts( Poco::Timespan(1000000), /// Connection timeout. Poco::Timespan(1000000), /// Send timeout. - Poco::Timespan(1000000) /// Receive timeout. + Poco::Timespan(1000000) /// Receive timeout. ); auto session = makeHTTPSession(endpoint, timeouts); - Aws::Client::ClientConfigurationPerRequest cfg; try { /// It should be just empty GET request. @@ -53,20 +64,41 @@ Aws::Client::ClientConfigurationPerRequest ProxyResolverConfiguration::getConfig LOG_DEBUG(&Poco::Logger::get("AWSClient"), "Use proxy: {}://{}:{}", proxy_scheme, proxy_host, proxy_port); - cfg.proxyScheme = Aws::Http::SchemeMapper::FromString(proxy_scheme.c_str()); - cfg.proxyHost = proxy_host; - cfg.proxyPort = proxy_port; + cached_config.proxyScheme = Aws::Http::SchemeMapper::FromString(proxy_scheme.c_str()); + cached_config.proxyHost = proxy_host; + cached_config.proxyPort = proxy_port; + cache_timestamp = std::chrono::system_clock::now(); + cache_valid = true; - return cfg; + return cached_config; } catch (...) { tryLogCurrentException("AWSClient", "Failed to obtain proxy"); /// Don't use proxy if it can't be obtained. 
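The resolver change above adds a small TTL cache guarded by a mutex: a freshly resolved proxy is reused while `cache_ttl` has not elapsed (and the clock has not gone backwards), and `errorReport()` below drops the cached entry when a request through that proxy fails. A hedged sketch of the same pattern with generic types, not the AWS/ClickHouse classes:

```cpp
// Sketch of a TTL cache with error-driven invalidation (illustrative names).
#include <chrono>
#include <mutex>
#include <string>

class CachedProxyResolver
{
public:
    explicit CachedProxyResolver(std::chrono::seconds ttl_) : ttl(ttl_) {}

    std::string getProxy()
    {
        std::lock_guard lock(mutex);
        auto now = std::chrono::system_clock::now();

        /// Serve from cache while it is valid and the timestamp is sane.
        if (ttl.count() && valid && now >= timestamp && now <= timestamp + ttl)
            return cached;

        cached = resolve();   /// e.g. an HTTP request to a resolver endpoint
        timestamp = now;
        valid = true;
        return cached;
    }

    /// Called when a request through `proxy` failed: drop it from the cache
    /// so the next call re-resolves instead of reusing a broken proxy.
    void reportError(const std::string & proxy)
    {
        std::lock_guard lock(mutex);
        if (valid && proxy == cached)
            valid = false;
    }

private:
    std::string resolve() { return "http://proxy.internal:3128"; }  /// placeholder value

    std::mutex mutex;
    bool valid = false;
    std::string cached;
    std::chrono::system_clock::time_point timestamp;
    const std::chrono::seconds ttl;
};
```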
+ Aws::Client::ClientConfigurationPerRequest cfg; return cfg; } } +void ProxyResolverConfiguration::errorReport(const Aws::Client::ClientConfigurationPerRequest & config) +{ + if (config.proxyHost.empty()) + return; + + std::unique_lock lock(cache_mutex); + + if (!cache_ttl.count() || !cache_valid) + return; + + if (cached_config.proxyScheme != config.proxyScheme || cached_config.proxyHost != config.proxyHost + || cached_config.proxyPort != config.proxyPort) + return; + + /// Invalidate cached proxy when got error with this proxy + cache_valid = false; +} + } #endif diff --git a/src/Disks/S3/ProxyResolverConfiguration.h b/src/Disks/S3/ProxyResolverConfiguration.h index 8eea662f257..f7eba8d028a 100644 --- a/src/Disks/S3/ProxyResolverConfiguration.h +++ b/src/Disks/S3/ProxyResolverConfiguration.h @@ -8,6 +8,8 @@ #include "ProxyConfiguration.h" +#include + namespace DB::S3 { /** @@ -18,8 +20,9 @@ namespace DB::S3 class ProxyResolverConfiguration : public ProxyConfiguration { public: - ProxyResolverConfiguration(const Poco::URI & endpoint_, String proxy_scheme_, unsigned proxy_port_); + ProxyResolverConfiguration(const Poco::URI & endpoint_, String proxy_scheme_, unsigned proxy_port_, unsigned cache_ttl_); Aws::Client::ClientConfigurationPerRequest getConfiguration(const Aws::Http::HttpRequest & request) override; + void errorReport(const Aws::Client::ClientConfigurationPerRequest & config) override; private: /// Endpoint to obtain a proxy host. @@ -28,6 +31,12 @@ private: const String proxy_scheme; /// Port for obtained proxy. const unsigned proxy_port; + + std::mutex cache_mutex; + bool cache_valid = false; + std::chrono::time_point cache_timestamp; + const std::chrono::seconds cache_ttl{0}; + Aws::Client::ClientConfigurationPerRequest cached_config; }; } diff --git a/src/Disks/S3/registerDiskS3.cpp b/src/Disks/S3/registerDiskS3.cpp index 49a11b1dbb9..01b2cea2045 100644 --- a/src/Disks/S3/registerDiskS3.cpp +++ b/src/Disks/S3/registerDiskS3.cpp @@ -56,11 +56,12 @@ std::shared_ptr getProxyResolverConfiguration( if (proxy_scheme != "http" && proxy_scheme != "https") throw Exception("Only HTTP/HTTPS schemas allowed in proxy resolver config: " + proxy_scheme, ErrorCodes::BAD_ARGUMENTS); auto proxy_port = proxy_resolver_config.getUInt(prefix + ".proxy_port"); + auto cache_ttl = proxy_resolver_config.getUInt(prefix + ".proxy_cache_time", 10); LOG_DEBUG(&Poco::Logger::get("DiskS3"), "Configured proxy resolver: {}, Scheme: {}, Port: {}", endpoint.toString(), proxy_scheme, proxy_port); - return std::make_shared(endpoint, proxy_scheme, proxy_port); + return std::make_shared(endpoint, proxy_scheme, proxy_port, cache_ttl); } std::shared_ptr getProxyListConfiguration( @@ -128,8 +129,12 @@ getClient(const Poco::Util::AbstractConfiguration & config, const String & confi auto proxy_config = getProxyConfiguration(config_prefix, config); if (proxy_config) + { client_configuration.perRequestConfiguration = [proxy_config](const auto & request) { return proxy_config->getConfiguration(request); }; + client_configuration.error_report + = [proxy_config](const auto & request_config) { proxy_config->errorReport(request_config); }; + } client_configuration.retryStrategy = std::make_shared(config.getUInt(config_prefix + ".retry_attempts", 10)); diff --git a/src/Formats/CMakeLists.txt b/src/Formats/CMakeLists.txt index 0a342917073..12def0fb1d0 100644 --- a/src/Formats/CMakeLists.txt +++ b/src/Formats/CMakeLists.txt @@ -1,5 +1 @@ configure_file(config_formats.h.in ${ConfigIncludePath}/config_formats.h) - -if 
(ENABLE_EXAMPLES) - add_subdirectory(examples) -endif() diff --git a/src/Formats/FormatFactory.cpp b/src/Formats/FormatFactory.cpp index a00839fc5f5..d2d6d92dea3 100644 --- a/src/Formats/FormatFactory.cpp +++ b/src/Formats/FormatFactory.cpp @@ -9,7 +9,6 @@ #include #include #include -#include #include #include #include @@ -60,6 +59,7 @@ FormatSettings getFormatSettings(ContextPtr context, const Settings & settings) format_settings.avro.output_codec = settings.output_format_avro_codec; format_settings.avro.output_sync_interval = settings.output_format_avro_sync_interval; format_settings.avro.schema_registry_url = settings.format_avro_schema_registry_url.toString(); + format_settings.avro.string_column_pattern = settings.output_format_avro_string_column_pattern.toString(); format_settings.csv.allow_double_quotes = settings.format_csv_allow_double_quotes; format_settings.csv.allow_single_quotes = settings.format_csv_allow_single_quotes; format_settings.csv.crlf_end_of_line = settings.output_format_csv_crlf_end_of_line; @@ -208,9 +208,6 @@ BlockOutputStreamPtr FormatFactory::getOutputStreamParallelIfPossible( WriteCallback callback, const std::optional & _format_settings) const { - if (context->getMySQLProtocolContext() && name != "MySQLWire") - throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "MySQL protocol does not support custom output formats"); - const auto & output_getter = getCreators(name).output_processor_creator; const Settings & settings = context->getSettingsRef(); @@ -315,9 +312,6 @@ OutputFormatPtr FormatFactory::getOutputFormatParallelIfPossible( if (!output_getter) throw Exception(ErrorCodes::FORMAT_IS_NOT_SUITABLE_FOR_OUTPUT, "Format {} is not suitable for output (with processors)", name); - if (context->getMySQLProtocolContext() && name != "MySQLWire") - throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "MySQL protocol does not support custom output formats"); - auto format_settings = _format_settings ? *_format_settings : getFormatSettings(context); const Settings & settings = context->getSettingsRef(); @@ -359,8 +353,11 @@ OutputFormatPtr FormatFactory::getOutputFormat( RowOutputFormatParams params; params.callback = std::move(callback); - auto format_settings = _format_settings - ? *_format_settings : getFormatSettings(context); + auto format_settings = _format_settings ? *_format_settings : getFormatSettings(context); + + /// If we're handling MySQL protocol connection right now then MySQLWire is only allowed output format. + if (format_settings.mysql_wire.sequence_id && (name != "MySQLWire")) + throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "MySQL protocol does not support custom output formats"); /** TODO: Materialization is needed, because formats can use the functions `IDataType`, * which only work with full columns. diff --git a/src/Formats/FormatSettings.h b/src/Formats/FormatSettings.h index 1773f2cc2c6..69df095bca8 100644 --- a/src/Formats/FormatSettings.h +++ b/src/Formats/FormatSettings.h @@ -61,6 +61,7 @@ struct FormatSettings String output_codec; UInt64 output_sync_interval = 16 * 1024; bool allow_missing_fields = false; + String string_column_pattern; } avro; struct CSV @@ -131,6 +132,13 @@ struct FormatSettings bool allow_multiple_rows_without_delimiter = false; } protobuf; + struct + { + uint32_t client_capabilities = 0; + size_t max_packet_size = 0; + uint8_t * sequence_id = nullptr; /// Not null if it's MySQLWire output format used to handle MySQL protocol connections. 
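The `FormatFactory` hunk above replaces the context-based check with one driven by the settings themselves: only the MySQL handler fills `mysql_wire.sequence_id`, so a non-null pointer means the output goes to a MySQL protocol client and any format other than MySQLWire can be rejected. A simplified sketch of that guard, with illustrative names:

```cpp
// Hedged sketch of the guard: a settings field set only by the MySQL handler
// doubles as the "we are inside a MySQL protocol connection" flag.
#include <cstdint>
#include <stdexcept>
#include <string>

struct FormatSettingsSketch
{
    struct
    {
        uint32_t client_capabilities = 0;
        size_t max_packet_size = 0;
        uint8_t * sequence_id = nullptr;  /// non-null only when handling a MySQL connection
    } mysql_wire;
};

void checkOutputFormat(const std::string & name, const FormatSettingsSketch & settings)
{
    if (settings.mysql_wire.sequence_id && name != "MySQLWire")
        throw std::runtime_error("MySQL protocol does not support custom output formats");
}
```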
+ } mysql_wire; + struct { std::string regexp; @@ -169,4 +177,3 @@ struct FormatSettings }; } - diff --git a/src/Formats/JSONEachRowUtils.cpp b/src/Formats/JSONEachRowUtils.cpp index 28ba625d9fb..d06f507f044 100644 --- a/src/Formats/JSONEachRowUtils.cpp +++ b/src/Formats/JSONEachRowUtils.cpp @@ -29,10 +29,12 @@ std::pair fileSegmentationEngineJSONEachRowImpl(ReadBuffer & in, D if (quotes) { pos = find_first_symbols<'\\', '"'>(pos, in.buffer().end()); + if (pos > in.buffer().end()) throw Exception("Position in buffer is out of bounds. There must be a bug.", ErrorCodes::LOGICAL_ERROR); else if (pos == in.buffer().end()) continue; + if (*pos == '\\') { ++pos; @@ -48,10 +50,12 @@ std::pair fileSegmentationEngineJSONEachRowImpl(ReadBuffer & in, D else { pos = find_first_symbols<'{', '}', '\\', '"'>(pos, in.buffer().end()); + if (pos > in.buffer().end()) throw Exception("Position in buffer is out of bounds. There must be a bug.", ErrorCodes::LOGICAL_ERROR); else if (pos == in.buffer().end()) continue; + else if (*pos == '{') { ++balance; diff --git a/src/Formats/MySQLBlockInputStream.cpp b/src/Formats/MySQLBlockInputStream.cpp index 40b2a0bc1a9..79ebeacfad5 100644 --- a/src/Formats/MySQLBlockInputStream.cpp +++ b/src/Formats/MySQLBlockInputStream.cpp @@ -49,7 +49,7 @@ MySQLBlockInputStream::Connection::Connection( { } -/// Used in MaterializeMySQL and in doInvalidateQuery for dictionary source. +/// Used in MaterializedMySQL and in doInvalidateQuery for dictionary source. MySQLBlockInputStream::MySQLBlockInputStream( const mysqlxx::PoolWithFailover::Entry & entry, const std::string & query_str, diff --git a/src/Formats/ProtobufSerializer.cpp b/src/Formats/ProtobufSerializer.cpp index ac1d5f048e3..c5781ee6c9f 100644 --- a/src/Formats/ProtobufSerializer.cpp +++ b/src/Formats/ProtobufSerializer.cpp @@ -1423,18 +1423,23 @@ namespace }; - /// Serializes a ColumnVector containing dates to a field of any type except TYPE_MESSAGE, TYPE_GROUP, TYPE_BOOL, TYPE_ENUM. + /// Serializes a ColumnVector containing datetimes to a field of any type except TYPE_MESSAGE, TYPE_GROUP, TYPE_BOOL, TYPE_ENUM. 
class ProtobufSerializerDateTime : public ProtobufSerializerNumber { public: ProtobufSerializerDateTime( - const FieldDescriptor & field_descriptor_, const ProtobufReaderOrWriter & reader_or_writer_) - : ProtobufSerializerNumber(field_descriptor_, reader_or_writer_) + const DataTypeDateTime & type, + const FieldDescriptor & field_descriptor_, + const ProtobufReaderOrWriter & reader_or_writer_) + : ProtobufSerializerNumber(field_descriptor_, reader_or_writer_), + date_lut(type.getTimeZone()) { setFunctions(); } protected: + const DateLUTImpl & date_lut; + void setFunctions() { switch (field_typeid) @@ -1458,17 +1463,17 @@ namespace { write_function = [this](UInt32 value) { - dateTimeToString(value, text_buffer); + dateTimeToString(value, text_buffer, date_lut); writeStr(text_buffer); }; read_function = [this]() -> UInt32 { readStr(text_buffer); - return stringToDateTime(text_buffer); + return stringToDateTime(text_buffer, date_lut); }; - default_function = [this]() -> UInt32 { return stringToDateTime(field_descriptor.default_value_string()); }; + default_function = [this]() -> UInt32 { return stringToDateTime(field_descriptor.default_value_string(), date_lut); }; break; } @@ -1477,17 +1482,17 @@ namespace } } - static void dateTimeToString(time_t tm, String & str) + static void dateTimeToString(time_t tm, String & str, const DateLUTImpl & lut) { WriteBufferFromString buf{str}; - writeDateTimeText(tm, buf); + writeDateTimeText(tm, buf, lut); } - static time_t stringToDateTime(const String & str) + static time_t stringToDateTime(const String & str, const DateLUTImpl & lut) { ReadBufferFromString buf{str}; time_t tm = 0; - readDateTimeText(tm, buf); + readDateTimeText(tm, buf, lut); if (tm < 0) tm = 0; return tm; @@ -2833,7 +2838,7 @@ namespace case TypeIndex::Float32: return std::make_unique>(field_descriptor, reader_or_writer); case TypeIndex::Float64: return std::make_unique>(field_descriptor, reader_or_writer); case TypeIndex::Date: return std::make_unique(field_descriptor, reader_or_writer); - case TypeIndex::DateTime: return std::make_unique(field_descriptor, reader_or_writer); + case TypeIndex::DateTime: return std::make_unique(assert_cast(*data_type), field_descriptor, reader_or_writer); case TypeIndex::DateTime64: return std::make_unique(assert_cast(*data_type), field_descriptor, reader_or_writer); case TypeIndex::String: return std::make_unique>(field_descriptor, reader_or_writer); case TypeIndex::FixedString: return std::make_unique>(typeid_cast>(data_type), field_descriptor, reader_or_writer); diff --git a/src/Formats/examples/CMakeLists.txt b/src/Formats/examples/CMakeLists.txt deleted file mode 100644 index e1cb7604fab..00000000000 --- a/src/Formats/examples/CMakeLists.txt +++ /dev/null @@ -1,4 +0,0 @@ -set(SRCS ) - -add_executable (tab_separated_streams tab_separated_streams.cpp ${SRCS}) -target_link_libraries (tab_separated_streams PRIVATE clickhouse_aggregate_functions dbms) diff --git a/src/Formats/examples/tab_separated_streams.cpp b/src/Formats/examples/tab_separated_streams.cpp deleted file mode 100644 index bd733e4b9aa..00000000000 --- a/src/Formats/examples/tab_separated_streams.cpp +++ /dev/null @@ -1,57 +0,0 @@ -#include - -#include - -#include -#include - -#include -#include - -#include - -#include -#include -#include -#include - - -using namespace DB; - -int main(int, char **) -try -{ - Block sample; - { - ColumnWithTypeAndName col; - col.type = std::make_shared(); - sample.insert(std::move(col)); - } - { - ColumnWithTypeAndName col; - col.type = std::make_shared(); 
- sample.insert(std::move(col)); - } - - ReadBufferFromFile in_buf("test_in"); - WriteBufferFromFile out_buf("test_out"); - - FormatSettings format_settings; - - RowInputFormatParams in_params{DEFAULT_INSERT_BLOCK_SIZE, 0, 0}; - RowOutputFormatParams out_params{[](const Columns & /* columns */, size_t /* row */){}}; - - InputFormatPtr input_format = std::make_shared(sample, in_buf, in_params, false, false, format_settings); - BlockInputStreamPtr block_input = std::make_shared(std::move(input_format)); - - BlockOutputStreamPtr block_output = std::make_shared( - std::make_shared(out_buf, sample, false, false, out_params, format_settings)); - - copyData(*block_input, *block_output); - return 0; -} -catch (...) -{ - std::cerr << getCurrentExceptionMessage(true) << '\n'; - return 1; -} diff --git a/src/Formats/registerFormats.cpp b/src/Formats/registerFormats.cpp index 89fb7c6cc02..c035ec0a1d1 100644 --- a/src/Formats/registerFormats.cpp +++ b/src/Formats/registerFormats.cpp @@ -123,12 +123,13 @@ void registerFormats() registerOutputFormatProcessorORC(factory); registerInputFormatProcessorParquet(factory); registerOutputFormatProcessorParquet(factory); - registerInputFormatProcessorArrow(factory); - registerOutputFormatProcessorArrow(factory); registerInputFormatProcessorAvro(factory); registerOutputFormatProcessorAvro(factory); #endif + registerInputFormatProcessorArrow(factory); + registerOutputFormatProcessorArrow(factory); + registerOutputFormatNull(factory); registerOutputFormatProcessorPretty(factory); diff --git a/src/Functions/DateTimeTransforms.h b/src/Functions/DateTimeTransforms.h index d12bc1701ad..1891410a18e 100644 --- a/src/Functions/DateTimeTransforms.h +++ b/src/Functions/DateTimeTransforms.h @@ -74,6 +74,30 @@ struct ToDateImpl using FactorTransform = ZeroTransform; }; +struct ToDate32Impl +{ + static constexpr auto name = "toDate32"; + + static inline Int32 execute(Int64 t, const DateLUTImpl & time_zone) + { + return Int32(time_zone.toDayNum(t)); + } + static inline Int32 execute(UInt32 t, const DateLUTImpl & time_zone) + { + return Int32(time_zone.toDayNum(t)); + } + static inline Int32 execute(Int32 d, const DateLUTImpl &) + { + return d; + } + static inline Int32 execute(UInt16 d, const DateLUTImpl &) + { + return d; + } + + using FactorTransform = ZeroTransform; +}; + struct ToStartOfDayImpl { static constexpr auto name = "toStartOfDay"; diff --git a/src/Functions/FunctionChar.cpp b/src/Functions/FunctionChar.cpp new file mode 100644 index 00000000000..1cbb60b7760 --- /dev/null +++ b/src/Functions/FunctionChar.cpp @@ -0,0 +1,120 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int ILLEGAL_TYPE_OF_ARGUMENT; + extern const int ILLEGAL_COLUMN; +} + +class FunctionChar : public IFunction +{ +public: + static constexpr auto name = "char"; + static FunctionPtr create(ContextPtr) { return std::make_shared(); } + + String getName() const override + { + return name; + } + + bool isVariadic() const override { return true; } + bool isInjective(const ColumnsWithTypeAndName &) const override { return true; } + size_t getNumberOfArguments() const override { return 0; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + if (arguments.empty()) + throw Exception("Number of arguments for function " + getName() + " can't be " + toString(arguments.size()) + + ", should be at least 1", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + for (const auto & arg : 
arguments) + { + WhichDataType which(arg); + if (!(which.isInt() || which.isUInt() || which.isFloat())) + throw Exception("Illegal type " + arg->getName() + " of argument of function " + getName() + + ", must be Int, UInt or Float number", + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + } + return std::make_shared(); + } + + bool useDefaultImplementationForConstants() const override { return true; } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override + { + auto col_str = ColumnString::create(); + ColumnString::Chars & out_vec = col_str->getChars(); + ColumnString::Offsets & out_offsets = col_str->getOffsets(); + + const auto size_per_row = arguments.size() + 1; + out_vec.resize(size_per_row * input_rows_count); + out_offsets.resize(input_rows_count); + + for (size_t row = 0; row < input_rows_count; ++row) + { + out_offsets[row] = size_per_row + out_offsets[row - 1]; + out_vec[row * size_per_row + size_per_row - 1] = '\0'; + } + + Columns columns_holder(arguments.size()); + for (size_t idx = 0; idx < arguments.size(); ++idx) + { + //partial const column + columns_holder[idx] = arguments[idx].column->convertToFullColumnIfConst(); + const IColumn * column = columns_holder[idx].get(); + + if (!(executeNumber(*column, out_vec, idx, input_rows_count, size_per_row) + || executeNumber(*column, out_vec, idx, input_rows_count, size_per_row) + || executeNumber(*column, out_vec, idx, input_rows_count, size_per_row) + || executeNumber(*column, out_vec, idx, input_rows_count, size_per_row) + || executeNumber(*column, out_vec, idx, input_rows_count, size_per_row) + || executeNumber(*column, out_vec, idx, input_rows_count, size_per_row) + || executeNumber(*column, out_vec, idx, input_rows_count, size_per_row) + || executeNumber(*column, out_vec, idx, input_rows_count, size_per_row) + || executeNumber(*column, out_vec, idx, input_rows_count, size_per_row) + || executeNumber(*column, out_vec, idx, input_rows_count, size_per_row))) + { + throw Exception{"Illegal column " + arguments[idx].column->getName() + + " of first argument of function " + getName(), ErrorCodes::ILLEGAL_COLUMN}; + } + } + + return col_str; + } + +private: + template + bool executeNumber(const IColumn & src_data, ColumnString::Chars & out_vec, const size_t & column_idx, const size_t & rows, const size_t & size_per_row) const + { + const ColumnVector * src_data_concrete = checkAndGetColumn>(&src_data); + + if (!src_data_concrete) + { + return false; + } + + for (size_t row = 0; row < rows; ++row) + { + out_vec[row * size_per_row + column_idx] = static_cast(src_data_concrete->getInt(row)); + } + return true; + } +}; + +void registerFunctionChar(FunctionFactory & factory) +{ + factory.registerFunction(FunctionFactory::CaseInsensitive); +} + +} diff --git a/src/Functions/FunctionDateOrDateTimeToSomething.h b/src/Functions/FunctionDateOrDateTimeToSomething.h index 8bd5218261e..abf7f967653 100644 --- a/src/Functions/FunctionDateOrDateTimeToSomething.h +++ b/src/Functions/FunctionDateOrDateTimeToSomething.h @@ -39,7 +39,7 @@ public: { if (arguments.size() == 1) { - if (!isDate(arguments[0].type) && !isDate32(arguments[0].type) && !isDateTime(arguments[0].type) && !isDateTime64(arguments[0].type)) + if (!isDateOrDate32(arguments[0].type) && !isDateTime(arguments[0].type) && !isDateTime64(arguments[0].type)) throw Exception( "Illegal type " + arguments[0].type->getName() + " of argument of function " + getName() + ". 
Should be a date or a date with time", @@ -47,7 +47,7 @@ public: } else if (arguments.size() == 2) { - if (!isDate(arguments[0].type) && !isDate32(arguments[0].type) && !isDateTime(arguments[0].type) && !isDateTime64(arguments[0].type)) + if (!isDateOrDate32(arguments[0].type) && !isDateTime(arguments[0].type) && !isDateTime64(arguments[0].type)) throw Exception( "Illegal type " + arguments[0].type->getName() + " of argument of function " + getName() + ". Should be a date or a date with time", @@ -165,4 +165,3 @@ public: }; } - diff --git a/src/Functions/FunctionsBinaryRepr.cpp b/src/Functions/FunctionsBinaryRepr.cpp new file mode 100644 index 00000000000..08d74b30166 --- /dev/null +++ b/src/Functions/FunctionsBinaryRepr.cpp @@ -0,0 +1,562 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int ILLEGAL_TYPE_OF_ARGUMENT; + extern const int LOGICAL_ERROR; + extern const int ILLEGAL_COLUMN; +} + +/* + * hex(x) - Returns hexadecimal representation; capital letters; there are no prefixes 0x or suffixes h. + * For numbers, returns a variable-length string - hex in the "human" (big endian) format, with the leading zeros being cut, + * but only by whole bytes. For dates and datetimes - the same as for numbers. + * For example, hex(257) = '0101'. + * + * unhex(string) - Returns a string, hex of which is equal to `string` with regard of case and discarding one leading zero. + * If such a string does not exist, could return arbitrary implementation specific value. + * + * bin(x) - Returns binary representation. + * + * unbin(x) - Returns a string, opposite to `bin`. + * + */ + +struct HexImpl +{ + static constexpr auto name = "hex"; + static constexpr size_t word_size = 2; + + template + static void executeOneUInt(T x, char *& out) + { + bool was_nonzero = false; + for (int offset = (sizeof(T) - 1) * 8; offset >= 0; offset -= 8) + { + UInt8 byte = x >> offset; + + /// Skip leading zeros + if (byte == 0 && !was_nonzero && offset) + continue; + + was_nonzero = true; + writeHexByteUppercase(byte, out); + out += word_size; + } + *out = '\0'; + ++out; + } + + static void executeOneString(const UInt8 * pos, const UInt8 * end, char *& out) + { + while (pos < end) + { + writeHexByteUppercase(*pos, out); + ++pos; + out += word_size; + } + *out = '\0'; + ++out; + } + + template + static void executeFloatAndDecimal(const T & in_vec, ColumnPtr & col_res, const size_t type_size_in_bytes) + { + const size_t hex_length = type_size_in_bytes * word_size + 1; /// Including trailing zero byte. 
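The header comment above pins down the integer behaviour of `hex()`: big-endian byte order, leading all-zero bytes dropped (whole bytes only), uppercase digits, so `hex(257) = '0101'`. A small self-contained sketch of that encoding, separate from the column machinery:

```cpp
// Standalone sketch of the integer-to-hex behaviour described above:
// 257 -> "0101", 1 -> "01", 0 -> "00".
#include <cassert>
#include <string>

template <typename T>
std::string hexUInt(T x)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    bool was_nonzero = false;
    for (int offset = (sizeof(T) - 1) * 8; offset >= 0; offset -= 8)
    {
        unsigned char byte = static_cast<unsigned char>(x >> offset);
        if (byte == 0 && !was_nonzero && offset)  /// skip leading zero bytes, keep the last one
            continue;
        was_nonzero = true;
        out += digits[byte >> 4];
        out += digits[byte & 0x0F];
    }
    return out;
}

int main()
{
    assert(hexUInt<unsigned>(257) == "0101");
    assert(hexUInt<unsigned>(1) == "01");
    assert(hexUInt<unsigned>(0) == "00");
}
```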
+ auto col_str = ColumnString::create(); + + ColumnString::Chars & out_vec = col_str->getChars(); + ColumnString::Offsets & out_offsets = col_str->getOffsets(); + + size_t size = in_vec.size(); + out_offsets.resize(size); + out_vec.resize(size * hex_length); + + size_t pos = 0; + char * out = reinterpret_cast(&out_vec[0]); + for (size_t i = 0; i < size; ++i) + { + const UInt8 * in_pos = reinterpret_cast(&in_vec[i]); + executeOneString(in_pos, in_pos + type_size_in_bytes, out); + + pos += hex_length; + out_offsets[i] = pos; + } + col_res = std::move(col_str); + } +}; + +struct UnhexImpl +{ + static constexpr auto name = "unhex"; + static constexpr size_t word_size = 2; + + static void decode(const char * pos, const char * end, char *& out) + { + if ((end - pos) & 1) + { + *out = unhex(*pos); + ++out; + ++pos; + } + while (pos < end) + { + *out = unhex2(pos); + pos += word_size; + ++out; + } + *out = '\0'; + ++out; + } +}; + +struct BinImpl +{ + static constexpr auto name = "bin"; + static constexpr size_t word_size = 8; + + template + static void executeOneUInt(T x, char *& out) + { + bool was_nonzero = false; + for (int offset = (sizeof(T) - 1) * 8; offset >= 0; offset -= 8) + { + UInt8 byte = x >> offset; + + /// Skip leading zeros + if (byte == 0 && !was_nonzero && offset) + continue; + + was_nonzero = true; + writeBinByte(byte, out); + out += word_size; + } + *out = '\0'; + ++out; + } + + template + static void executeFloatAndDecimal(const T & in_vec, ColumnPtr & col_res, const size_t type_size_in_bytes) + { + const size_t hex_length = type_size_in_bytes * word_size + 1; /// Including trailing zero byte. + auto col_str = ColumnString::create(); + + ColumnString::Chars & out_vec = col_str->getChars(); + ColumnString::Offsets & out_offsets = col_str->getOffsets(); + + size_t size = in_vec.size(); + out_offsets.resize(size); + out_vec.resize(size * hex_length); + + size_t pos = 0; + char * out = reinterpret_cast(out_vec.data()); + for (size_t i = 0; i < size; ++i) + { + const UInt8 * in_pos = reinterpret_cast(&in_vec[i]); + executeOneString(in_pos, in_pos + type_size_in_bytes, out); + + pos += hex_length; + out_offsets[i] = pos; + } + col_res = std::move(col_str); + } + + static void executeOneString(const UInt8 * pos, const UInt8 * end, char *& out) + { + while (pos < end) + { + writeBinByte(*pos, out); + ++pos; + out += word_size; + } + *out = '\0'; + ++out; + } +}; + +struct UnbinImpl +{ + static constexpr auto name = "unbin"; + static constexpr size_t word_size = 8; + + static void decode(const char * pos, const char * end, char *& out) + { + if (pos == end) + { + *out = '\0'; + ++out; + return; + } + + UInt8 left = 0; + + /// end - pos is the length of input. + /// (length & 7) to make remain bits length mod 8 is zero to split. + /// e.g. the length is 9 and the input is "101000001", + /// first left_cnt is 1, left is 0, right shift, pos is 1, left = 1 + /// then, left_cnt is 0, remain input is '01000001'. 
+ for (UInt8 left_cnt = (end - pos) & 7; left_cnt > 0; --left_cnt) + { + left = left << 1; + if (*pos != '0') + left += 1; + ++pos; + } + + if (left != 0 || end - pos == 0) + { + *out = left; + ++out; + } + + assert((end - pos) % 8 == 0); + + while (end - pos != 0) + { + UInt8 c = 0; + for (UInt8 i = 0; i < 8; ++i) + { + c = c << 1; + if (*pos != '0') + c += 1; + ++pos; + } + *out = c; + ++out; + } + + *out = '\0'; + ++out; + } +}; + +/// Encode number or string to string with binary or hexadecimal representation +template +class EncodeToBinaryRepr : public IFunction +{ +public: + static constexpr auto name = Impl::name; + static constexpr size_t word_size = Impl::word_size; + + static FunctionPtr create(ContextPtr) { return std::make_shared(); } + + String getName() const override { return name; } + + size_t getNumberOfArguments() const override { return 1; } + + bool useDefaultImplementationForConstants() const override { return true; } + + bool isInjective(const ColumnsWithTypeAndName &) const override { return true; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + WhichDataType which(arguments[0]); + + if (!which.isStringOrFixedString() && + !which.isDate() && + !which.isDateTime() && + !which.isDateTime64() && + !which.isUInt() && + !which.isFloat() && + !which.isDecimal() && + !which.isAggregateFunction()) + throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + return std::make_shared(); + } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override + { + const IColumn * column = arguments[0].column.get(); + ColumnPtr res_column; + + WhichDataType which(column->getDataType()); + if (which.isAggregateFunction()) + { + const ColumnPtr to_string = castColumn(arguments[0], std::make_shared()); + const auto * str_column = checkAndGetColumn(to_string.get()); + tryExecuteString(str_column, res_column); + return res_column; + } + + if (tryExecuteUInt(column, res_column) || + tryExecuteUInt(column, res_column) || + tryExecuteUInt(column, res_column) || + tryExecuteUInt(column, res_column) || + tryExecuteString(column, res_column) || + tryExecuteFixedString(column, res_column) || + tryExecuteFloat(column, res_column) || + tryExecuteFloat(column, res_column) || + tryExecuteDecimal(column, res_column) || + tryExecuteDecimal(column, res_column) || + tryExecuteDecimal(column, res_column)) + return res_column; + + throw Exception("Illegal column " + arguments[0].column->getName() + + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_COLUMN); + } + + template + bool tryExecuteUInt(const IColumn * col, ColumnPtr & col_res) const + { + const ColumnVector * col_vec = checkAndGetColumn>(col); + + static constexpr size_t MAX_LENGTH = sizeof(T) * word_size + 1; /// Including trailing zero byte. + + if (col_vec) + { + auto col_str = ColumnString::create(); + ColumnString::Chars & out_vec = col_str->getChars(); + ColumnString::Offsets & out_offsets = col_str->getOffsets(); + + const typename ColumnVector::Container & in_vec = col_vec->getData(); + + size_t size = in_vec.size(); + out_offsets.resize(size); + out_vec.resize(size * (word_size+1) + MAX_LENGTH); /// word_size+1 is length of one byte in hex/bin plus zero byte. 
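The `unbin()` decoder above first folds the leading `length % 8` characters into one byte and then consumes the rest eight characters at a time, as the worked example in its comment ("101000001" becomes 0x01 followed by 'A') shows. A standalone sketch of the same decoding on a plain `std::string`:

```cpp
// Sketch of the unbin() decoding scheme described above (not ClickHouse code).
#include <cassert>
#include <string>

std::string unbinSketch(const std::string & in)
{
    std::string out;
    if (in.empty())
        return out;

    size_t pos = 0;

    /// The leading (length % 8) characters form the first, possibly partial, byte.
    unsigned char left = 0;
    for (size_t left_cnt = in.size() % 8; left_cnt > 0; --left_cnt, ++pos)
        left = static_cast<unsigned char>((left << 1) | (in[pos] != '0'));
    if (left != 0 || pos == in.size())
        out.push_back(static_cast<char>(left));

    /// The rest of the input is a whole number of 8-character groups.
    while (pos < in.size())
    {
        unsigned char c = 0;
        for (int i = 0; i < 8; ++i, ++pos)
            c = static_cast<unsigned char>((c << 1) | (in[pos] != '0'));
        out.push_back(static_cast<char>(c));
    }
    return out;
}

int main()
{
    assert(unbinSketch("101000001") == std::string("\x01" "A"));
    assert(unbinSketch("01000001") == "A");
    assert(unbinSketch("0") == std::string(1, '\0'));
}
```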
+ + size_t pos = 0; + for (size_t i = 0; i < size; ++i) + { + /// Manual exponential growth, so as not to rely on the linear amortized work time of `resize` (no one guarantees it). + if (pos + MAX_LENGTH > out_vec.size()) + out_vec.resize(out_vec.size() * word_size + MAX_LENGTH); + + char * begin = reinterpret_cast(&out_vec[pos]); + char * end = begin; + Impl::executeOneUInt(in_vec[i], end); + + pos += end - begin; + out_offsets[i] = pos; + } + out_vec.resize(pos); + + col_res = std::move(col_str); + return true; + } + else + { + return false; + } + } + + bool tryExecuteString(const IColumn *col, ColumnPtr &col_res) const + { + const ColumnString * col_str_in = checkAndGetColumn(col); + + if (col_str_in) + { + auto col_str = ColumnString::create(); + ColumnString::Chars & out_vec = col_str->getChars(); + ColumnString::Offsets & out_offsets = col_str->getOffsets(); + + const ColumnString::Chars & in_vec = col_str_in->getChars(); + const ColumnString::Offsets & in_offsets = col_str_in->getOffsets(); + + size_t size = in_offsets.size(); + + out_offsets.resize(size); + /// reserve `word_size` bytes for each non trailing zero byte from input + `size` bytes for trailing zeros + out_vec.resize((in_vec.size() - size) * word_size + size); + + char * begin = reinterpret_cast(out_vec.data()); + char * pos = begin; + size_t prev_offset = 0; + + for (size_t i = 0; i < size; ++i) + { + size_t new_offset = in_offsets[i]; + + Impl::executeOneString(&in_vec[prev_offset], &in_vec[new_offset - 1], pos); + + out_offsets[i] = pos - begin; + + prev_offset = new_offset; + } + if (!out_offsets.empty() && out_offsets.back() != out_vec.size()) + throw Exception("Column size mismatch (internal logical error)", ErrorCodes::LOGICAL_ERROR); + + col_res = std::move(col_str); + return true; + } + else + { + return false; + } + } + + template + bool tryExecuteDecimal(const IColumn * col, ColumnPtr & col_res) const + { + const ColumnDecimal * col_dec = checkAndGetColumn>(col); + if (col_dec) + { + const typename ColumnDecimal::Container & in_vec = col_dec->getData(); + Impl::executeFloatAndDecimal(in_vec, col_res, sizeof(T)); + return true; + } + else + { + return false; + } + } + + static bool tryExecuteFixedString(const IColumn * col, ColumnPtr & col_res) + { + const ColumnFixedString * col_fstr_in = checkAndGetColumn(col); + + if (col_fstr_in) + { + auto col_str = ColumnString::create(); + ColumnString::Chars & out_vec = col_str->getChars(); + ColumnString::Offsets & out_offsets = col_str->getOffsets(); + + const ColumnString::Chars & in_vec = col_fstr_in->getChars(); + + size_t size = col_fstr_in->size(); + + out_offsets.resize(size); + out_vec.resize(in_vec.size() * word_size + size); + + char * begin = reinterpret_cast(out_vec.data()); + char * pos = begin; + + size_t n = col_fstr_in->getN(); + + size_t prev_offset = 0; + + for (size_t i = 0; i < size; ++i) + { + size_t new_offset = prev_offset + n; + + Impl::executeOneString(&in_vec[prev_offset], &in_vec[new_offset], pos); + + out_offsets[i] = pos - begin; + prev_offset = new_offset; + } + + if (!out_offsets.empty() && out_offsets.back() != out_vec.size()) + throw Exception("Column size mismatch (internal logical error)", ErrorCodes::LOGICAL_ERROR); + + col_res = std::move(col_str); + return true; + } + else + { + return false; + } + } + + template + bool tryExecuteFloat(const IColumn * col, ColumnPtr & col_res) const + { + const ColumnVector * col_vec = checkAndGetColumn>(col); + if (col_vec) + { + const typename ColumnVector::Container & in_vec = 
col_vec->getData(); + Impl::executeFloatAndDecimal(in_vec, col_res, sizeof(T)); + return true; + } + else + { + return false; + } + } +}; + +/// Decode number or string from string with binary or hexadecimal representation +template +class DecodeFromBinaryRepr : public IFunction +{ +public: + static constexpr auto name = Impl::name; + static constexpr size_t word_size = Impl::word_size; + static FunctionPtr create(ContextPtr) { return std::make_shared(); } + + String getName() const override { return name; } + + size_t getNumberOfArguments() const override { return 1; } + bool isInjective(const ColumnsWithTypeAndName &) const override { return true; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + if (!isString(arguments[0])) + throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + return std::make_shared(); + } + + bool useDefaultImplementationForConstants() const override { return true; } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override + { + const ColumnPtr & column = arguments[0].column; + + if (const ColumnString * col = checkAndGetColumn(column.get())) + { + auto col_res = ColumnString::create(); + + ColumnString::Chars & out_vec = col_res->getChars(); + ColumnString::Offsets & out_offsets = col_res->getOffsets(); + + const ColumnString::Chars & in_vec = col->getChars(); + const ColumnString::Offsets & in_offsets = col->getOffsets(); + + size_t size = in_offsets.size(); + out_offsets.resize(size); + out_vec.resize(in_vec.size() / word_size + size); + + char * begin = reinterpret_cast(out_vec.data()); + char * pos = begin; + size_t prev_offset = 0; + + for (size_t i = 0; i < size; ++i) + { + size_t new_offset = in_offsets[i]; + + Impl::decode(reinterpret_cast(&in_vec[prev_offset]), reinterpret_cast(&in_vec[new_offset - 1]), pos); + + out_offsets[i] = pos - begin; + + prev_offset = new_offset; + } + + out_vec.resize(pos - begin); + + return col_res; + } + else + { + throw Exception("Illegal column " + arguments[0].column->getName() + + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_COLUMN); + } + } +}; + +void registerFunctionsBinaryRepr(FunctionFactory & factory) +{ + factory.registerFunction>(FunctionFactory::CaseInsensitive); + factory.registerFunction>(FunctionFactory::CaseInsensitive); + factory.registerFunction>(FunctionFactory::CaseInsensitive); + factory.registerFunction>(FunctionFactory::CaseInsensitive); +} + +} diff --git a/src/Functions/FunctionsBitToArray.cpp b/src/Functions/FunctionsBitToArray.cpp new file mode 100644 index 00000000000..32c45823e0f --- /dev/null +++ b/src/Functions/FunctionsBitToArray.cpp @@ -0,0 +1,337 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int ILLEGAL_TYPE_OF_ARGUMENT; + extern const int ILLEGAL_COLUMN; +} + + +/** Functions for an unusual conversion to a string or array: + * + * bitmaskToList - takes an integer - a bitmask, returns a string of degrees of 2 separated by a comma. + * for example, bitmaskToList(50) = '2,16,32' + * + * bitmaskToArray(x) - Returns an array of powers of two in the binary form of x. For example, bitmaskToArray(50) = [2, 16, 32]. 
+ * + */ + +namespace +{ + +class FunctionBitmaskToList : public IFunction +{ +public: + static constexpr auto name = "bitmaskToList"; + static FunctionPtr create(ContextPtr) { return std::make_shared(); } + + String getName() const override + { + return name; + } + + size_t getNumberOfArguments() const override { return 1; } + bool isInjective(const ColumnsWithTypeAndName &) const override { return true; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + const DataTypePtr & type = arguments[0]; + + if (!isInteger(type)) + throw Exception("Cannot format " + type->getName() + " as bitmask string", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + return std::make_shared(); + } + + bool useDefaultImplementationForConstants() const override { return true; } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override + { + ColumnPtr res; + if (!((res = executeType(arguments)) + || (res = executeType(arguments)) + || (res = executeType(arguments)) + || (res = executeType(arguments)) + || (res = executeType(arguments)) + || (res = executeType(arguments)) + || (res = executeType(arguments)) + || (res = executeType(arguments)))) + throw Exception("Illegal column " + arguments[0].column->getName() + + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_COLUMN); + + return res; + } + +private: + template + inline static void writeBitmask(T x, WriteBuffer & out) + { + using UnsignedT = make_unsigned_t; + UnsignedT u_x = x; + + bool first = true; + while (u_x) + { + UnsignedT y = u_x & (u_x - 1); + UnsignedT bit = u_x ^ y; + u_x = y; + if (!first) + writeChar(',', out); + first = false; + writeIntText(T(bit), out); + } + } + + template + ColumnPtr executeType(const ColumnsWithTypeAndName & columns) const + { + if (const ColumnVector * col_from = checkAndGetColumn>(columns[0].column.get())) + { + auto col_to = ColumnString::create(); + + const typename ColumnVector::Container & vec_from = col_from->getData(); + ColumnString::Chars & data_to = col_to->getChars(); + ColumnString::Offsets & offsets_to = col_to->getOffsets(); + size_t size = vec_from.size(); + data_to.resize(size * 2); + offsets_to.resize(size); + + WriteBufferFromVector buf_to(data_to); + + for (size_t i = 0; i < size; ++i) + { + writeBitmask(vec_from[i], buf_to); + writeChar(0, buf_to); + offsets_to[i] = buf_to.count(); + } + + buf_to.finalize(); + return col_to; + } + + return nullptr; + } +}; + + +class FunctionBitmaskToArray : public IFunction +{ +public: + static constexpr auto name = "bitmaskToArray"; + static FunctionPtr create(ContextPtr) { return std::make_shared(); } + + String getName() const override + { + return name; + } + + size_t getNumberOfArguments() const override { return 1; } + bool isInjective(const ColumnsWithTypeAndName &) const override { return true; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + if (!isInteger(arguments[0])) + throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + return std::make_shared(arguments[0]); + } + + bool useDefaultImplementationForConstants() const override { return true; } + + template + bool tryExecute(const IColumn * column, ColumnPtr & out_column) const + { + using UnsignedT = make_unsigned_t; + + if (const ColumnVector * col_from = checkAndGetColumn>(column)) + { + auto col_values = ColumnVector::create(); + auto col_offsets = 
ColumnArray::ColumnOffsets::create(); + + typename ColumnVector::Container & res_values = col_values->getData(); + ColumnArray::Offsets & res_offsets = col_offsets->getData(); + + const typename ColumnVector::Container & vec_from = col_from->getData(); + size_t size = vec_from.size(); + res_offsets.resize(size); + res_values.reserve(size * 2); + + for (size_t row = 0; row < size; ++row) + { + UnsignedT x = vec_from[row]; + while (x) + { + UnsignedT y = x & (x - 1); + UnsignedT bit = x ^ y; + x = y; + res_values.push_back(bit); + } + res_offsets[row] = res_values.size(); + } + + out_column = ColumnArray::create(std::move(col_values), std::move(col_offsets)); + return true; + } + else + { + return false; + } + } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override + { + const IColumn * in_column = arguments[0].column.get(); + ColumnPtr out_column; + + if (tryExecute(in_column, out_column) || + tryExecute(in_column, out_column) || + tryExecute(in_column, out_column) || + tryExecute(in_column, out_column) || + tryExecute(in_column, out_column) || + tryExecute(in_column, out_column) || + tryExecute(in_column, out_column) || + tryExecute(in_column, out_column)) + return out_column; + + throw Exception("Illegal column " + arguments[0].column->getName() + + " of first argument of function " + getName(), + ErrorCodes::ILLEGAL_COLUMN); + } +}; + +class FunctionBitPositionsToArray : public IFunction +{ +public: + static constexpr auto name = "bitPositionsToArray"; + static FunctionPtr create(ContextPtr) { return std::make_shared(); } + + String getName() const override + { + return name; + } + + size_t getNumberOfArguments() const override { return 1; } + bool isInjective(const ColumnsWithTypeAndName &) const override { return true; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + if (!isInteger(arguments[0])) + throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, + "Illegal type {} of argument of function {}", + getName(), + arguments[0]->getName()); + + return std::make_shared(std::make_shared()); + } + + bool useDefaultImplementationForConstants() const override { return true; } + + template + ColumnPtr executeType(const IColumn * column) const + { + const ColumnVector * col_from = checkAndGetColumn>(column); + if (!col_from) + return nullptr; + + auto result_array_values = ColumnVector::create(); + auto result_array_offsets = ColumnArray::ColumnOffsets::create(); + + auto & result_array_values_data = result_array_values->getData(); + auto & result_array_offsets_data = result_array_offsets->getData(); + + auto & vec_from = col_from->getData(); + size_t size = vec_from.size(); + result_array_offsets_data.resize(size); + result_array_values_data.reserve(size * 2); + + using UnsignedType = make_unsigned_t; + + for (size_t row = 0; row < size; ++row) + { + UnsignedType x = static_cast(vec_from[row]); + + if constexpr (is_big_int_v) + { + size_t position = 0; + + while (x) + { + if (x & 1) + result_array_values_data.push_back(position); + + x >>= 1; + ++position; + } + } + else + { + while (x) + { + result_array_values_data.push_back(getTrailingZeroBitsUnsafe(x)); + x &= (x - 1); + } + } + + result_array_offsets_data[row] = result_array_values_data.size(); + } + + auto result_column = ColumnArray::create(std::move(result_array_values), std::move(result_array_offsets)); + + return result_column; + } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, 
size_t /*input_rows_count*/) const override + { + const IColumn * in_column = arguments[0].column.get(); + ColumnPtr result_column; + + if (!((result_column = executeType(in_column)) + || (result_column = executeType(in_column)) + || (result_column = executeType(in_column)) + || (result_column = executeType(in_column)) + || (result_column = executeType(in_column)) + || (result_column = executeType(in_column)) + || (result_column = executeType(in_column)) + || (result_column = executeType(in_column)) + || (result_column = executeType(in_column)) + || (result_column = executeType(in_column)) + || (result_column = executeType(in_column)) + || (result_column = executeType(in_column)) + || (result_column = executeType(in_column)))) + { + throw Exception(ErrorCodes::ILLEGAL_COLUMN, + "Illegal column {} of first argument of function {}", + arguments[0].column->getName(), + getName()); + } + + return result_column; + } +}; + +} + +void registerFunctionsBitToArray(FunctionFactory & factory) +{ + factory.registerFunction(); + factory.registerFunction(); + factory.registerFunction(); +} + +} + diff --git a/src/Functions/FunctionsCoding.cpp b/src/Functions/FunctionsCoding.cpp deleted file mode 100644 index f1bbeb5c43f..00000000000 --- a/src/Functions/FunctionsCoding.cpp +++ /dev/null @@ -1,54 +0,0 @@ -#include -#include - - -namespace DB -{ - -struct NameFunctionIPv4NumToString { static constexpr auto name = "IPv4NumToString"; }; -struct NameFunctionIPv4NumToStringClassC { static constexpr auto name = "IPv4NumToStringClassC"; }; - - -void registerFunctionsCoding(FunctionFactory & factory) -{ - factory.registerFunction(); - factory.registerFunction(); - factory.registerFunction(); - factory.registerFunction(); - factory.registerFunction>(); - factory.registerFunction>(); - factory.registerFunction(); - factory.registerFunction(); - factory.registerFunction(FunctionFactory::CaseInsensitive); - factory.registerFunction(FunctionFactory::CaseInsensitive); - factory.registerFunction(FunctionFactory::CaseInsensitive); - factory.registerFunction(FunctionFactory::CaseInsensitive); - factory.registerFunction(FunctionFactory::CaseInsensitive); - factory.registerFunction(); - factory.registerFunction(); - factory.registerFunction(); - factory.registerFunction(); - factory.registerFunction(); - factory.registerFunction(); - factory.registerFunction(); - factory.registerFunction(); - - factory.registerFunction>(); - factory.registerFunction>(); - /// MysQL compatibility alias. - factory.registerFunction>("INET_NTOA", FunctionFactory::CaseInsensitive); - - factory.registerFunction(); - /// MysQL compatibility alias. - factory.registerFunction("INET_ATON", FunctionFactory::CaseInsensitive); - - factory.registerFunction(); - /// MysQL compatibility alias. - factory.registerFunction("INET6_NTOA", FunctionFactory::CaseInsensitive); - - factory.registerFunction(); - /// MysQL compatibility alias. 
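`bitmaskToArray()` and `bitPositionsToArray()` above both rely on the same bit tricks: `x & (x - 1)` clears the lowest set bit, XOR-ing with the original isolates that bit's value, and counting trailing zeros gives its position. A short sketch showing both on `50 = 0b110010` (values 2, 16, 32; positions 1, 4, 5):

```cpp
// Sketch of the bit tricks used by bitmaskToArray / bitPositionsToArray above.
#include <bit>
#include <cstdint>
#include <iostream>
#include <vector>

int main()
{
    uint64_t x = 50;  /// 0b110010
    std::vector<uint64_t> values;
    std::vector<unsigned> positions;

    while (x)
    {
        uint64_t without_lowest = x & (x - 1);    /// clear the lowest set bit
        values.push_back(x ^ without_lowest);     /// the bit that was cleared
        positions.push_back(std::countr_zero(x)); /// its zero-based position
        x = without_lowest;
    }

    for (auto v : values)
        std::cout << v << ' ';   /// 2 16 32
    std::cout << '\n';
    for (auto p : positions)
        std::cout << p << ' ';   /// 1 4 5
    std::cout << '\n';
}
```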
- factory.registerFunction("INET6_ATON", FunctionFactory::CaseInsensitive); -} - -} diff --git a/src/Functions/FunctionsCoding.h b/src/Functions/FunctionsCoding.h deleted file mode 100644 index 00b09acea1f..00000000000 --- a/src/Functions/FunctionsCoding.h +++ /dev/null @@ -1,2218 +0,0 @@ -#pragma once - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include -#include -#include - - -namespace DB -{ - -namespace ErrorCodes -{ - extern const int ILLEGAL_TYPE_OF_ARGUMENT; - extern const int LOGICAL_ERROR; - extern const int ILLEGAL_COLUMN; -} - - -/** TODO This file contains ridiculous amount of copy-paste. - */ - -/** Encoding functions: - * - * IPv4NumToString (num) - See below. - * IPv4StringToNum(string) - Convert, for example, '192.168.0.1' to 3232235521 and vice versa. - * - * hex(x) - Returns hex; capital letters; there are no prefixes 0x or suffixes h. - * For numbers, returns a variable-length string - hex in the "human" (big endian) format, with the leading zeros being cut, - * but only by whole bytes. For dates and datetimes - the same as for numbers. - * For example, hex(257) = '0101'. - * unhex(string) - Returns a string, hex of which is equal to `string` with regard of case and discarding one leading zero. - * If such a string does not exist, could return arbitrary implementation specific value. - * - * bitmaskToArray(x) - Returns an array of powers of two in the binary form of x. For example, bitmaskToArray(50) = [2, 16, 32]. - */ - - -constexpr size_t uuid_bytes_length = 16; -constexpr size_t uuid_text_length = 36; - -class FunctionIPv6NumToString : public IFunction -{ -public: - static constexpr auto name = "IPv6NumToString"; - static FunctionPtr create(ContextPtr) { return std::make_shared(); } - - String getName() const override { return name; } - - size_t getNumberOfArguments() const override { return 1; } - bool isInjective(const ColumnsWithTypeAndName &) const override { return true; } - - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override - { - const auto * ptr = checkAndGetDataType(arguments[0].get()); - if (!ptr || ptr->getN() != IPV6_BINARY_LENGTH) - throw Exception("Illegal type " + arguments[0]->getName() + - " of argument of function " + getName() + - ", expected FixedString(" + toString(IPV6_BINARY_LENGTH) + ")", - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - return std::make_shared(); - } - - bool useDefaultImplementationForConstants() const override { return true; } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override - { - const auto & col_type_name = arguments[0]; - const ColumnPtr & column = col_type_name.column; - - if (const auto * col_in = checkAndGetColumn(column.get())) - { - if (col_in->getN() != IPV6_BINARY_LENGTH) - throw Exception("Illegal type " + col_type_name.type->getName() + - " of column " + col_in->getName() + - " argument of function " + getName() + - ", expected FixedString(" + toString(IPV6_BINARY_LENGTH) + ")", - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - const auto size = col_in->size(); - const auto & vec_in = col_in->getChars(); - - auto col_res = ColumnString::create(); - - ColumnString::Chars & vec_res = col_res->getChars(); - ColumnString::Offsets & offsets_res = col_res->getOffsets(); - vec_res.resize(size * 
(IPV6_MAX_TEXT_LENGTH + 1)); - offsets_res.resize(size); - - auto * begin = reinterpret_cast(vec_res.data()); - auto * pos = begin; - - for (size_t offset = 0, i = 0; offset < vec_in.size(); offset += IPV6_BINARY_LENGTH, ++i) - { - formatIPv6(reinterpret_cast(&vec_in[offset]), pos); - offsets_res[i] = pos - begin; - } - - vec_res.resize(pos - begin); - - return col_res; - } - else - throw Exception("Illegal column " + arguments[0].column->getName() - + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_COLUMN); - } -}; - - -class FunctionCutIPv6 : public IFunction -{ -public: - static constexpr auto name = "cutIPv6"; - static FunctionPtr create(ContextPtr) { return std::make_shared(); } - - String getName() const override { return name; } - - size_t getNumberOfArguments() const override { return 3; } - - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override - { - const auto * ptr = checkAndGetDataType(arguments[0].get()); - if (!ptr || ptr->getN() != IPV6_BINARY_LENGTH) - throw Exception("Illegal type " + arguments[0]->getName() + - " of argument 1 of function " + getName() + - ", expected FixedString(" + toString(IPV6_BINARY_LENGTH) + ")", - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - if (!WhichDataType(arguments[1]).isUInt8()) - throw Exception("Illegal type " + arguments[1]->getName() + - " of argument 2 of function " + getName(), - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - if (!WhichDataType(arguments[2]).isUInt8()) - throw Exception("Illegal type " + arguments[2]->getName() + - " of argument 3 of function " + getName(), - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - return std::make_shared(); - } - - bool useDefaultImplementationForConstants() const override { return true; } - ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {1, 2}; } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override - { - const auto & col_type_name = arguments[0]; - const ColumnPtr & column = col_type_name.column; - - const auto & col_ipv6_zeroed_tail_bytes_type = arguments[1]; - const auto & col_ipv6_zeroed_tail_bytes = col_ipv6_zeroed_tail_bytes_type.column; - const auto & col_ipv4_zeroed_tail_bytes_type = arguments[2]; - const auto & col_ipv4_zeroed_tail_bytes = col_ipv4_zeroed_tail_bytes_type.column; - - if (const auto * col_in = checkAndGetColumn(column.get())) - { - if (col_in->getN() != IPV6_BINARY_LENGTH) - throw Exception("Illegal type " + col_type_name.type->getName() + - " of column " + col_in->getName() + - " argument of function " + getName() + - ", expected FixedString(" + toString(IPV6_BINARY_LENGTH) + ")", - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - const auto * ipv6_zeroed_tail_bytes = checkAndGetColumnConst>(col_ipv6_zeroed_tail_bytes.get()); - if (!ipv6_zeroed_tail_bytes) - throw Exception("Illegal type " + col_ipv6_zeroed_tail_bytes_type.type->getName() + - " of argument 2 of function " + getName(), - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - UInt8 ipv6_zeroed_tail_bytes_count = ipv6_zeroed_tail_bytes->getValue(); - if (ipv6_zeroed_tail_bytes_count > IPV6_BINARY_LENGTH) - throw Exception("Illegal value for argument 2 " + col_ipv6_zeroed_tail_bytes_type.type->getName() + - " of function " + getName(), - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - const auto * ipv4_zeroed_tail_bytes = checkAndGetColumnConst>(col_ipv4_zeroed_tail_bytes.get()); - if (!ipv4_zeroed_tail_bytes) - throw Exception("Illegal type " + col_ipv4_zeroed_tail_bytes_type.type->getName() + - " 
of argument 3 of function " + getName(), - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - UInt8 ipv4_zeroed_tail_bytes_count = ipv4_zeroed_tail_bytes->getValue(); - if (ipv4_zeroed_tail_bytes_count > IPV6_BINARY_LENGTH) - throw Exception("Illegal value for argument 3 " + col_ipv4_zeroed_tail_bytes_type.type->getName() + - " of function " + getName(), - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - const auto size = col_in->size(); - const auto & vec_in = col_in->getChars(); - - auto col_res = ColumnString::create(); - - ColumnString::Chars & vec_res = col_res->getChars(); - ColumnString::Offsets & offsets_res = col_res->getOffsets(); - vec_res.resize(size * (IPV6_MAX_TEXT_LENGTH + 1)); - offsets_res.resize(size); - - auto * begin = reinterpret_cast(vec_res.data()); - auto * pos = begin; - - for (size_t offset = 0, i = 0; offset < vec_in.size(); offset += IPV6_BINARY_LENGTH, ++i) - { - const auto * address = &vec_in[offset]; - UInt8 zeroed_tail_bytes_count = isIPv4Mapped(address) ? ipv4_zeroed_tail_bytes_count : ipv6_zeroed_tail_bytes_count; - cutAddress(reinterpret_cast(address), pos, zeroed_tail_bytes_count); - offsets_res[i] = pos - begin; - } - - vec_res.resize(pos - begin); - - return col_res; - } - else - throw Exception("Illegal column " + arguments[0].column->getName() - + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_COLUMN); - } - -private: - static bool isIPv4Mapped(const UInt8 * address) - { - return (unalignedLoad(address) == 0) && - ((unalignedLoad(address + 8) & 0x00000000FFFFFFFFull) == 0x00000000FFFF0000ull); - } - - static void cutAddress(const unsigned char * address, char *& dst, UInt8 zeroed_tail_bytes_count) - { - formatIPv6(address, dst, zeroed_tail_bytes_count); - } -}; - - -class FunctionIPv6StringToNum : public IFunction -{ -public: - static constexpr auto name = "IPv6StringToNum"; - static FunctionPtr create(ContextPtr) { return std::make_shared(); } - - static inline bool tryParseIPv4(const char * pos) - { - UInt32 result = 0; - return DB::parseIPv4(pos, reinterpret_cast(&result)); - } - - String getName() const override { return name; } - - size_t getNumberOfArguments() const override { return 1; } - - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override - { - if (!isString(arguments[0])) - throw Exception( - "Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - return std::make_shared(IPV6_BINARY_LENGTH); - } - - bool useDefaultImplementationForConstants() const override { return true; } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override - { - const ColumnPtr & column = arguments[0].column; - - if (const auto * col_in = checkAndGetColumn(column.get())) - { - auto col_res = ColumnFixedString::create(IPV6_BINARY_LENGTH); - - auto & vec_res = col_res->getChars(); - vec_res.resize(col_in->size() * IPV6_BINARY_LENGTH); - - const ColumnString::Chars & vec_src = col_in->getChars(); - const ColumnString::Offsets & offsets_src = col_in->getOffsets(); - size_t src_offset = 0; - char src_ipv4_buf[sizeof("::ffff:") + IPV4_MAX_TEXT_LENGTH + 1] = "::ffff:"; - - for (size_t out_offset = 0, i = 0; out_offset < vec_res.size(); out_offset += IPV6_BINARY_LENGTH, ++i) - { - /// For both cases below: In case of failure, the function parseIPv6 fills vec_res with zero bytes. - - /// If the source IP address is parsable as an IPv4 address, then transform it into a valid IPv6 address. 
- /// Keeping it simple by just prefixing `::ffff:` to the IPv4 address to represent it as a valid IPv6 address. - if (tryParseIPv4(reinterpret_cast(&vec_src[src_offset]))) - { - std::memcpy( - src_ipv4_buf + std::strlen("::ffff:"), - reinterpret_cast(&vec_src[src_offset]), - std::min(offsets_src[i] - src_offset, IPV4_MAX_TEXT_LENGTH + 1)); - parseIPv6(src_ipv4_buf, reinterpret_cast(&vec_res[out_offset])); - } - else - { - parseIPv6( - reinterpret_cast(&vec_src[src_offset]), reinterpret_cast(&vec_res[out_offset])); - } - src_offset = offsets_src[i]; - } - - return col_res; - } - else - throw Exception("Illegal column " + arguments[0].column->getName() - + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_COLUMN); - } -}; - - -/** If mask_tail_octets > 0, the last specified number of octets will be filled with "xxx". - */ -template -class FunctionIPv4NumToString : public IFunction -{ -public: - static constexpr auto name = Name::name; - static FunctionPtr create(ContextPtr) { return std::make_shared>(); } - - String getName() const override - { - return name; - } - - size_t getNumberOfArguments() const override { return 1; } - bool isInjective(const ColumnsWithTypeAndName &) const override { return mask_tail_octets == 0; } - - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override - { - if (!WhichDataType(arguments[0]).isUInt32()) - throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName() + ", expected UInt32", - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - return std::make_shared(); - } - - bool useDefaultImplementationForConstants() const override { return true; } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override - { - const ColumnPtr & column = arguments[0].column; - - if (const ColumnUInt32 * col = typeid_cast(column.get())) - { - const ColumnUInt32::Container & vec_in = col->getData(); - - auto col_res = ColumnString::create(); - - ColumnString::Chars & vec_res = col_res->getChars(); - ColumnString::Offsets & offsets_res = col_res->getOffsets(); - - vec_res.resize(vec_in.size() * (IPV4_MAX_TEXT_LENGTH + 1)); /// the longest value is: 255.255.255.255\0 - offsets_res.resize(vec_in.size()); - char * begin = reinterpret_cast(vec_res.data()); - char * pos = begin; - - for (size_t i = 0; i < vec_in.size(); ++i) - { - DB::formatIPv4(reinterpret_cast(&vec_in[i]), pos, mask_tail_octets, "xxx"); - offsets_res[i] = pos - begin; - } - - vec_res.resize(pos - begin); - - return col_res; - } - else - throw Exception("Illegal column " + arguments[0].column->getName() - + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_COLUMN); - } -}; - - -class FunctionIPv4StringToNum : public IFunction -{ -public: - static constexpr auto name = "IPv4StringToNum"; - static FunctionPtr create(ContextPtr) { return std::make_shared(); } - - String getName() const override - { - return name; - } - - size_t getNumberOfArguments() const override { return 1; } - - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override - { - if (!isString(arguments[0])) - throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - return std::make_shared(); - } - - static inline UInt32 parseIPv4(const char * pos) - { - UInt32 result = 0; - DB::parseIPv4(pos, reinterpret_cast(&result)); - - return result; - } - - bool useDefaultImplementationForConstants() const override 
{ return true; } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override - { - const ColumnPtr & column = arguments[0].column; - - if (const ColumnString * col = checkAndGetColumn(column.get())) - { - auto col_res = ColumnUInt32::create(); - - ColumnUInt32::Container & vec_res = col_res->getData(); - vec_res.resize(col->size()); - - const ColumnString::Chars & vec_src = col->getChars(); - const ColumnString::Offsets & offsets_src = col->getOffsets(); - size_t prev_offset = 0; - - for (size_t i = 0; i < vec_res.size(); ++i) - { - vec_res[i] = parseIPv4(reinterpret_cast(&vec_src[prev_offset])); - prev_offset = offsets_src[i]; - } - - return col_res; - } - else - throw Exception("Illegal column " + arguments[0].column->getName() - + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_COLUMN); - } -}; - - -class FunctionIPv4ToIPv6 : public IFunction -{ -public: - static constexpr auto name = "IPv4ToIPv6"; - static FunctionPtr create(ContextPtr) { return std::make_shared(); } - - String getName() const override { return name; } - - size_t getNumberOfArguments() const override { return 1; } - bool isInjective(const ColumnsWithTypeAndName &) const override { return true; } - - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override - { - if (!checkAndGetDataType(arguments[0].get())) - throw Exception("Illegal type " + arguments[0]->getName() + - " of argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - return std::make_shared(16); - } - - bool useDefaultImplementationForConstants() const override { return true; } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override - { - const auto & col_type_name = arguments[0]; - const ColumnPtr & column = col_type_name.column; - - if (const auto * col_in = typeid_cast(column.get())) - { - auto col_res = ColumnFixedString::create(IPV6_BINARY_LENGTH); - - auto & vec_res = col_res->getChars(); - vec_res.resize(col_in->size() * IPV6_BINARY_LENGTH); - - const auto & vec_in = col_in->getData(); - - for (size_t out_offset = 0, i = 0; out_offset < vec_res.size(); out_offset += IPV6_BINARY_LENGTH, ++i) - mapIPv4ToIPv6(vec_in[i], &vec_res[out_offset]); - - return col_res; - } - else - throw Exception("Illegal column " + arguments[0].column->getName() - + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_COLUMN); - } - -private: - static void mapIPv4ToIPv6(UInt32 in, UInt8 * buf) - { - unalignedStore(buf, 0); - unalignedStore(buf + 8, 0x00000000FFFF0000ull | (static_cast(ntohl(in)) << 32)); - } -}; - -class FunctionToIPv4 : public FunctionIPv4StringToNum -{ -public: - static constexpr auto name = "toIPv4"; - static FunctionPtr create(ContextPtr) { return std::make_shared(); } - - String getName() const override - { - return name; - } - - size_t getNumberOfArguments() const override { return 1; } - - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override - { - if (!isString(arguments[0])) - throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - return DataTypeFactory::instance().get("IPv4"); - } -}; - -class FunctionToIPv6 : public FunctionIPv6StringToNum -{ -public: - static constexpr auto name = "toIPv6"; - static FunctionPtr create(ContextPtr) { return std::make_shared(); } - - String getName() const override { return name; } - - 
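A minimal standalone sketch of the IPv4-mapped IPv6 layout that `mapIPv4ToIPv6` above writes with `unalignedStore` and that `isIPv4Mapped` in `cutIPv6` tests for: bytes 0..9 are zero, bytes 10..11 are `0xFF`, and bytes 12..15 carry the IPv4 address in network byte order. The helper name and the `main` driver below are illustrative only, not part of this patch.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

/// Illustrative helper: build the IPv4-mapped IPv6 form ::ffff:a.b.c.d.
static void map_ipv4_to_ipv6(uint32_t ipv4_host_order, uint8_t * buf16)
{
    std::memset(buf16, 0, 10);           /// bytes 0..9 are zero
    buf16[10] = 0xFF;                    /// bytes 10..11 mark the mapping
    buf16[11] = 0xFF;
    buf16[12] = static_cast<uint8_t>(ipv4_host_order >> 24);  /// IPv4 in network byte order
    buf16[13] = static_cast<uint8_t>(ipv4_host_order >> 16);
    buf16[14] = static_cast<uint8_t>(ipv4_host_order >> 8);
    buf16[15] = static_cast<uint8_t>(ipv4_host_order);
}

int main()
{
    uint8_t buf[16];
    map_ipv4_to_ipv6(0xC0A80001u, buf);  /// 192.168.0.1
    for (int i = 0; i < 16; ++i)
        std::printf("%02x%c", buf[i], i + 1 < 16 ? ' ' : '\n');
    /// prints: 00 00 00 00 00 00 00 00 00 00 ff ff c0 a8 00 01
}
```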
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override - { - if (!isString(arguments[0])) - throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - return DataTypeFactory::instance().get("IPv6"); - } -}; - -class FunctionMACNumToString : public IFunction -{ -public: - static constexpr auto name = "MACNumToString"; - static FunctionPtr create(ContextPtr) { return std::make_shared(); } - - String getName() const override - { - return name; - } - - size_t getNumberOfArguments() const override { return 1; } - bool isInjective(const ColumnsWithTypeAndName &) const override { return true; } - - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override - { - if (!WhichDataType(arguments[0]).isUInt64()) - throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName() + ", expected UInt64", - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - return std::make_shared(); - } - - static void formatMAC(UInt64 mac, UInt8 * out) - { - /// MAC address is represented in UInt64 in natural order (so, MAC addresses are compared in same order as UInt64). - /// Higher two bytes in UInt64 are just ignored. - - writeHexByteUppercase(mac >> 40, &out[0]); - out[2] = ':'; - writeHexByteUppercase(mac >> 32, &out[3]); - out[5] = ':'; - writeHexByteUppercase(mac >> 24, &out[6]); - out[8] = ':'; - writeHexByteUppercase(mac >> 16, &out[9]); - out[11] = ':'; - writeHexByteUppercase(mac >> 8, &out[12]); - out[14] = ':'; - writeHexByteUppercase(mac, &out[15]); - out[17] = '\0'; - } - - bool useDefaultImplementationForConstants() const override { return true; } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override - { - const ColumnPtr & column = arguments[0].column; - - if (const ColumnUInt64 * col = typeid_cast(column.get())) - { - const ColumnUInt64::Container & vec_in = col->getData(); - - auto col_res = ColumnString::create(); - - ColumnString::Chars & vec_res = col_res->getChars(); - ColumnString::Offsets & offsets_res = col_res->getOffsets(); - - vec_res.resize(vec_in.size() * 18); /// the value is: xx:xx:xx:xx:xx:xx\0 - offsets_res.resize(vec_in.size()); - - size_t current_offset = 0; - for (size_t i = 0; i < vec_in.size(); ++i) - { - formatMAC(vec_in[i], &vec_res[current_offset]); - current_offset += 18; - offsets_res[i] = current_offset; - } - - return col_res; - } - else - throw Exception("Illegal column " + arguments[0].column->getName() - + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_COLUMN); - } -}; - - -struct ParseMACImpl -{ - static constexpr size_t min_string_size = 17; - static constexpr size_t max_string_size = 17; - - /** Example: 01:02:03:04:05:06. - * There could be any separators instead of : and them are just ignored. - * The order of resulting integers are correspond to the order of MAC address. - * If there are any chars other than valid hex digits for bytes, the behaviour is implementation specific. 
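- * For example, "01:02:03:04:05:06" is parsed into the UInt64 value 0x010203040506.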
- */ - static UInt64 parse(const char * pos) - { - return (UInt64(unhex(pos[0])) << 44) - | (UInt64(unhex(pos[1])) << 40) - | (UInt64(unhex(pos[3])) << 36) - | (UInt64(unhex(pos[4])) << 32) - | (UInt64(unhex(pos[6])) << 28) - | (UInt64(unhex(pos[7])) << 24) - | (UInt64(unhex(pos[9])) << 20) - | (UInt64(unhex(pos[10])) << 16) - | (UInt64(unhex(pos[12])) << 12) - | (UInt64(unhex(pos[13])) << 8) - | (UInt64(unhex(pos[15])) << 4) - | (UInt64(unhex(pos[16]))); - } - - static constexpr auto name = "MACStringToNum"; -}; - -struct ParseOUIImpl -{ - static constexpr size_t min_string_size = 8; - static constexpr size_t max_string_size = 17; - - /** OUI is the first three bytes of MAC address. - * Example: 01:02:03. - */ - static UInt64 parse(const char * pos) - { - return (UInt64(unhex(pos[0])) << 20) - | (UInt64(unhex(pos[1])) << 16) - | (UInt64(unhex(pos[3])) << 12) - | (UInt64(unhex(pos[4])) << 8) - | (UInt64(unhex(pos[6])) << 4) - | (UInt64(unhex(pos[7]))); - } - - static constexpr auto name = "MACStringToOUI"; -}; - - -template -class FunctionMACStringTo : public IFunction -{ -public: - static constexpr auto name = Impl::name; - static FunctionPtr create(ContextPtr) { return std::make_shared>(); } - - String getName() const override - { - return name; - } - - size_t getNumberOfArguments() const override { return 1; } - - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override - { - if (!isString(arguments[0])) - throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - return std::make_shared(); - } - - bool useDefaultImplementationForConstants() const override { return true; } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override - { - const ColumnPtr & column = arguments[0].column; - - if (const ColumnString * col = checkAndGetColumn(column.get())) - { - auto col_res = ColumnUInt64::create(); - - ColumnUInt64::Container & vec_res = col_res->getData(); - vec_res.resize(col->size()); - - const ColumnString::Chars & vec_src = col->getChars(); - const ColumnString::Offsets & offsets_src = col->getOffsets(); - size_t prev_offset = 0; - - for (size_t i = 0; i < vec_res.size(); ++i) - { - size_t current_offset = offsets_src[i]; - size_t string_size = current_offset - prev_offset - 1; /// mind the terminating zero byte - - if (string_size >= Impl::min_string_size && string_size <= Impl::max_string_size) - vec_res[i] = Impl::parse(reinterpret_cast(&vec_src[prev_offset])); - else - vec_res[i] = 0; - - prev_offset = current_offset; - } - - return col_res; - } - else - throw Exception("Illegal column " + arguments[0].column->getName() - + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_COLUMN); - } -}; - - -class FunctionUUIDNumToString : public IFunction -{ - -public: - static constexpr auto name = "UUIDNumToString"; - static FunctionPtr create(ContextPtr) { return std::make_shared(); } - - String getName() const override - { - return name; - } - - size_t getNumberOfArguments() const override { return 1; } - bool isInjective(const ColumnsWithTypeAndName &) const override { return true; } - - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override - { - const auto * ptr = checkAndGetDataType(arguments[0].get()); - if (!ptr || ptr->getN() != uuid_bytes_length) - throw Exception("Illegal type " + arguments[0]->getName() + - " of argument of function " + getName() + - ", expected 
FixedString(" + toString(uuid_bytes_length) + ")", - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - return std::make_shared(); - } - - bool useDefaultImplementationForConstants() const override { return true; } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override - { - const ColumnWithTypeAndName & col_type_name = arguments[0]; - const ColumnPtr & column = col_type_name.column; - - if (const auto * col_in = checkAndGetColumn(column.get())) - { - if (col_in->getN() != uuid_bytes_length) - throw Exception("Illegal type " + col_type_name.type->getName() + - " of column " + col_in->getName() + - " argument of function " + getName() + - ", expected FixedString(" + toString(uuid_bytes_length) + ")", - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - const auto size = col_in->size(); - const auto & vec_in = col_in->getChars(); - - auto col_res = ColumnString::create(); - - ColumnString::Chars & vec_res = col_res->getChars(); - ColumnString::Offsets & offsets_res = col_res->getOffsets(); - vec_res.resize(size * (uuid_text_length + 1)); - offsets_res.resize(size); - - size_t src_offset = 0; - size_t dst_offset = 0; - - for (size_t i = 0; i < size; ++i) - { - formatUUID(&vec_in[src_offset], &vec_res[dst_offset]); - src_offset += uuid_bytes_length; - dst_offset += uuid_text_length; - vec_res[dst_offset] = 0; - ++dst_offset; - offsets_res[i] = dst_offset; - } - - return col_res; - } - else - throw Exception("Illegal column " + arguments[0].column->getName() - + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_COLUMN); - } -}; - - -class FunctionUUIDStringToNum : public IFunction -{ -private: - static void parseHex(const UInt8 * __restrict src, UInt8 * __restrict dst, const size_t num_bytes) - { - size_t src_pos = 0; - size_t dst_pos = 0; - for (; dst_pos < num_bytes; ++dst_pos) - { - dst[dst_pos] = unhex2(reinterpret_cast(&src[src_pos])); - src_pos += 2; - } - } - - static void parseUUID(const UInt8 * src36, UInt8 * dst16) - { - /// If string is not like UUID - implementation specific behaviour. 
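- /// The offsets below follow the canonical 8-4-4-4-12 text layout: hex digits at positions [0,8), [9,13), [14,18), [19,23) and [24,36), with '-' separators at positions 8, 13, 18 and 23.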
- - parseHex(&src36[0], &dst16[0], 4); - parseHex(&src36[9], &dst16[4], 2); - parseHex(&src36[14], &dst16[6], 2); - parseHex(&src36[19], &dst16[8], 2); - parseHex(&src36[24], &dst16[10], 6); - } - -public: - static constexpr auto name = "UUIDStringToNum"; - static FunctionPtr create(ContextPtr) { return std::make_shared(); } - - String getName() const override - { - return name; - } - - size_t getNumberOfArguments() const override { return 1; } - bool isInjective(const ColumnsWithTypeAndName &) const override { return true; } - - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override - { - /// String or FixedString(36) - if (!isString(arguments[0])) - { - const auto * ptr = checkAndGetDataType(arguments[0].get()); - if (!ptr || ptr->getN() != uuid_text_length) - throw Exception("Illegal type " + arguments[0]->getName() + - " of argument of function " + getName() + - ", expected FixedString(" + toString(uuid_text_length) + ")", - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - } - - return std::make_shared(uuid_bytes_length); - } - - bool useDefaultImplementationForConstants() const override { return true; } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override - { - const ColumnWithTypeAndName & col_type_name = arguments[0]; - const ColumnPtr & column = col_type_name.column; - - if (const auto * col_in = checkAndGetColumn(column.get())) - { - const auto & vec_in = col_in->getChars(); - const auto & offsets_in = col_in->getOffsets(); - const size_t size = offsets_in.size(); - - auto col_res = ColumnFixedString::create(uuid_bytes_length); - - ColumnString::Chars & vec_res = col_res->getChars(); - vec_res.resize(size * uuid_bytes_length); - - size_t src_offset = 0; - size_t dst_offset = 0; - - for (size_t i = 0; i < size; ++i) - { - /// If string has incorrect length - then return zero UUID. - /// If string has correct length but contains something not like UUID - implementation specific behaviour. 
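- /// A well-formed value therefore occupies uuid_text_length characters plus the terminating zero byte that ColumnString stores after each value.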
- - size_t string_size = offsets_in[i] - src_offset; - if (string_size == uuid_text_length + 1) - parseUUID(&vec_in[src_offset], &vec_res[dst_offset]); - else - memset(&vec_res[dst_offset], 0, uuid_bytes_length); - - dst_offset += uuid_bytes_length; - src_offset += string_size; - } - - return col_res; - } - else if (const auto * col_in_fixed = checkAndGetColumn(column.get())) - { - if (col_in_fixed->getN() != uuid_text_length) - throw Exception("Illegal type " + col_type_name.type->getName() + - " of column " + col_in_fixed->getName() + - " argument of function " + getName() + - ", expected FixedString(" + toString(uuid_text_length) + ")", - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - const auto size = col_in_fixed->size(); - const auto & vec_in = col_in_fixed->getChars(); - - auto col_res = ColumnFixedString::create(uuid_bytes_length); - - ColumnString::Chars & vec_res = col_res->getChars(); - vec_res.resize(size * uuid_bytes_length); - - size_t src_offset = 0; - size_t dst_offset = 0; - - for (size_t i = 0; i < size; ++i) - { - parseUUID(&vec_in[src_offset], &vec_res[dst_offset]); - src_offset += uuid_text_length; - dst_offset += uuid_bytes_length; - } - - return col_res; - } - else - throw Exception("Illegal column " + arguments[0].column->getName() - + " of argument of function " + getName(), ErrorCodes::ILLEGAL_COLUMN); - } -}; - -/// Encode number or string to string with binary or hexadecimal representation -template -class EncodeToBinaryRepr : public IFunction -{ -public: - static constexpr auto name = Impl::name; - static constexpr size_t word_size = Impl::word_size; - - static FunctionPtr create(ContextPtr) { return std::make_shared(); } - - String getName() const override { return name; } - - size_t getNumberOfArguments() const override { return 1; } - - bool useDefaultImplementationForConstants() const override { return true; } - - bool isInjective(const ColumnsWithTypeAndName &) const override { return true; } - - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override - { - WhichDataType which(arguments[0]); - - if (!which.isStringOrFixedString() && - !which.isDate() && - !which.isDateTime() && - !which.isDateTime64() && - !which.isUInt() && - !which.isFloat() && - !which.isDecimal() && - !which.isAggregateFunction()) - throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - return std::make_shared(); - } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override - { - const IColumn * column = arguments[0].column.get(); - ColumnPtr res_column; - - WhichDataType which(column->getDataType()); - if (which.isAggregateFunction()) - { - const ColumnPtr to_string = castColumn(arguments[0], std::make_shared()); - const auto * str_column = checkAndGetColumn(to_string.get()); - tryExecuteString(str_column, res_column); - return res_column; - } - - if (tryExecuteUInt(column, res_column) || - tryExecuteUInt(column, res_column) || - tryExecuteUInt(column, res_column) || - tryExecuteUInt(column, res_column) || - tryExecuteString(column, res_column) || - tryExecuteFixedString(column, res_column) || - tryExecuteFloat(column, res_column) || - tryExecuteFloat(column, res_column) || - tryExecuteDecimal(column, res_column) || - tryExecuteDecimal(column, res_column) || - tryExecuteDecimal(column, res_column)) - return res_column; - - throw Exception("Illegal column " + arguments[0].column->getName() - + " of argument 
of function " + getName(), - ErrorCodes::ILLEGAL_COLUMN); - } - - template - bool tryExecuteUInt(const IColumn * col, ColumnPtr & col_res) const - { - const ColumnVector * col_vec = checkAndGetColumn>(col); - - static constexpr size_t MAX_LENGTH = sizeof(T) * word_size + 1; /// Including trailing zero byte. - - if (col_vec) - { - auto col_str = ColumnString::create(); - ColumnString::Chars & out_vec = col_str->getChars(); - ColumnString::Offsets & out_offsets = col_str->getOffsets(); - - const typename ColumnVector::Container & in_vec = col_vec->getData(); - - size_t size = in_vec.size(); - out_offsets.resize(size); - out_vec.resize(size * (word_size+1) + MAX_LENGTH); /// word_size+1 is length of one byte in hex/bin plus zero byte. - - size_t pos = 0; - for (size_t i = 0; i < size; ++i) - { - /// Manual exponential growth, so as not to rely on the linear amortized work time of `resize` (no one guarantees it). - if (pos + MAX_LENGTH > out_vec.size()) - out_vec.resize(out_vec.size() * word_size + MAX_LENGTH); - - char * begin = reinterpret_cast(&out_vec[pos]); - char * end = begin; - Impl::executeOneUInt(in_vec[i], end); - - pos += end - begin; - out_offsets[i] = pos; - } - out_vec.resize(pos); - - col_res = std::move(col_str); - return true; - } - else - { - return false; - } - } - - bool tryExecuteString(const IColumn *col, ColumnPtr &col_res) const - { - const ColumnString * col_str_in = checkAndGetColumn(col); - - if (col_str_in) - { - auto col_str = ColumnString::create(); - ColumnString::Chars & out_vec = col_str->getChars(); - ColumnString::Offsets & out_offsets = col_str->getOffsets(); - - const ColumnString::Chars & in_vec = col_str_in->getChars(); - const ColumnString::Offsets & in_offsets = col_str_in->getOffsets(); - - size_t size = in_offsets.size(); - - out_offsets.resize(size); - /// reserve `word_size` bytes for each non trailing zero byte from input + `size` bytes for trailing zeros - out_vec.resize((in_vec.size() - size) * word_size + size); - - char * begin = reinterpret_cast(out_vec.data()); - char * pos = begin; - size_t prev_offset = 0; - - for (size_t i = 0; i < size; ++i) - { - size_t new_offset = in_offsets[i]; - - Impl::executeOneString(&in_vec[prev_offset], &in_vec[new_offset - 1], pos); - - out_offsets[i] = pos - begin; - - prev_offset = new_offset; - } - if (!out_offsets.empty() && out_offsets.back() != out_vec.size()) - throw Exception("Column size mismatch (internal logical error)", ErrorCodes::LOGICAL_ERROR); - - col_res = std::move(col_str); - return true; - } - else - { - return false; - } - } - - template - bool tryExecuteDecimal(const IColumn * col, ColumnPtr & col_res) const - { - const ColumnDecimal * col_dec = checkAndGetColumn>(col); - if (col_dec) - { - const typename ColumnDecimal::Container & in_vec = col_dec->getData(); - Impl::executeFloatAndDecimal(in_vec, col_res, sizeof(T)); - return true; - } - else - { - return false; - } - } - - static bool tryExecuteFixedString(const IColumn * col, ColumnPtr & col_res) - { - const ColumnFixedString * col_fstr_in = checkAndGetColumn(col); - - if (col_fstr_in) - { - auto col_str = ColumnString::create(); - ColumnString::Chars & out_vec = col_str->getChars(); - ColumnString::Offsets & out_offsets = col_str->getOffsets(); - - const ColumnString::Chars & in_vec = col_fstr_in->getChars(); - - size_t size = col_fstr_in->size(); - - out_offsets.resize(size); - out_vec.resize(in_vec.size() * word_size + size); - - char * begin = reinterpret_cast(out_vec.data()); - char * pos = begin; - - size_t n = 
col_fstr_in->getN(); - - size_t prev_offset = 0; - - for (size_t i = 0; i < size; ++i) - { - size_t new_offset = prev_offset + n; - - Impl::executeOneString(&in_vec[prev_offset], &in_vec[new_offset], pos); - - out_offsets[i] = pos - begin; - prev_offset = new_offset; - } - - if (!out_offsets.empty() && out_offsets.back() != out_vec.size()) - throw Exception("Column size mismatch (internal logical error)", ErrorCodes::LOGICAL_ERROR); - - col_res = std::move(col_str); - return true; - } - else - { - return false; - } - } - - template - bool tryExecuteFloat(const IColumn * col, ColumnPtr & col_res) const - { - const ColumnVector * col_vec = checkAndGetColumn>(col); - if (col_vec) - { - const typename ColumnVector::Container & in_vec = col_vec->getData(); - Impl::executeFloatAndDecimal(in_vec, col_res, sizeof(T)); - return true; - } - else - { - return false; - } - } -}; - -/// Decode number or string from string with binary or hexadecimal representation -template -class DecodeFromBinaryRepr : public IFunction -{ -public: - static constexpr auto name = Impl::name; - static constexpr size_t word_size = Impl::word_size; - static FunctionPtr create(ContextPtr) { return std::make_shared(); } - - String getName() const override { return name; } - - size_t getNumberOfArguments() const override { return 1; } - bool isInjective(const ColumnsWithTypeAndName &) const override { return true; } - - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override - { - if (!isString(arguments[0])) - throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - return std::make_shared(); - } - - bool useDefaultImplementationForConstants() const override { return true; } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override - { - const ColumnPtr & column = arguments[0].column; - - if (const ColumnString * col = checkAndGetColumn(column.get())) - { - auto col_res = ColumnString::create(); - - ColumnString::Chars & out_vec = col_res->getChars(); - ColumnString::Offsets & out_offsets = col_res->getOffsets(); - - const ColumnString::Chars & in_vec = col->getChars(); - const ColumnString::Offsets & in_offsets = col->getOffsets(); - - size_t size = in_offsets.size(); - out_offsets.resize(size); - out_vec.resize(in_vec.size() / word_size + size); - - char * begin = reinterpret_cast(out_vec.data()); - char * pos = begin; - size_t prev_offset = 0; - - for (size_t i = 0; i < size; ++i) - { - size_t new_offset = in_offsets[i]; - - Impl::decode(reinterpret_cast(&in_vec[prev_offset]), reinterpret_cast(&in_vec[new_offset - 1]), pos); - - out_offsets[i] = pos - begin; - - prev_offset = new_offset; - } - - out_vec.resize(pos - begin); - - return col_res; - } - else - { - throw Exception("Illegal column " + arguments[0].column->getName() - + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_COLUMN); - } - } -}; - -struct HexImpl -{ - static constexpr auto name = "hex"; - static constexpr size_t word_size = 2; - - template - static void executeOneUInt(T x, char *& out) - { - bool was_nonzero = false; - for (int offset = (sizeof(T) - 1) * 8; offset >= 0; offset -= 8) - { - UInt8 byte = x >> offset; - - /// Skip leading zeros - if (byte == 0 && !was_nonzero && offset) - continue; - - was_nonzero = true; - writeHexByteUppercase(byte, out); - out += word_size; - } - *out = '\0'; - ++out; - } - - static void executeOneString(const UInt8 * pos, const 
UInt8 * end, char *& out) - { - while (pos < end) - { - writeHexByteUppercase(*pos, out); - ++pos; - out += word_size; - } - *out = '\0'; - ++out; - } - - template - static void executeFloatAndDecimal(const T & in_vec, ColumnPtr & col_res, const size_t type_size_in_bytes) - { - const size_t hex_length = type_size_in_bytes * word_size + 1; /// Including trailing zero byte. - auto col_str = ColumnString::create(); - - ColumnString::Chars & out_vec = col_str->getChars(); - ColumnString::Offsets & out_offsets = col_str->getOffsets(); - - size_t size = in_vec.size(); - out_offsets.resize(size); - out_vec.resize(size * hex_length); - - size_t pos = 0; - char * out = reinterpret_cast(&out_vec[0]); - for (size_t i = 0; i < size; ++i) - { - const UInt8 * in_pos = reinterpret_cast(&in_vec[i]); - executeOneString(in_pos, in_pos + type_size_in_bytes, out); - - pos += hex_length; - out_offsets[i] = pos; - } - col_res = std::move(col_str); - } -}; - -struct UnhexImpl -{ - static constexpr auto name = "unhex"; - static constexpr size_t word_size = 2; - - static void decode(const char * pos, const char * end, char *& out) - { - if ((end - pos) & 1) - { - *out = unhex(*pos); - ++out; - ++pos; - } - while (pos < end) - { - *out = unhex2(pos); - pos += word_size; - ++out; - } - *out = '\0'; - ++out; - } -}; - -struct BinImpl -{ - static constexpr auto name = "bin"; - static constexpr size_t word_size = 8; - - template - static void executeOneUInt(T x, char *& out) - { - bool was_nonzero = false; - for (int offset = (sizeof(T) - 1) * 8; offset >= 0; offset -= 8) - { - UInt8 byte = x >> offset; - - /// Skip leading zeros - if (byte == 0 && !was_nonzero && offset) - continue; - - was_nonzero = true; - writeBinByte(byte, out); - out += word_size; - } - *out = '\0'; - ++out; - } - - template - static void executeFloatAndDecimal(const T & in_vec, ColumnPtr & col_res, const size_t type_size_in_bytes) - { - const size_t hex_length = type_size_in_bytes * word_size + 1; /// Including trailing zero byte. - auto col_str = ColumnString::create(); - - ColumnString::Chars & out_vec = col_str->getChars(); - ColumnString::Offsets & out_offsets = col_str->getOffsets(); - - size_t size = in_vec.size(); - out_offsets.resize(size); - out_vec.resize(size * hex_length); - - size_t pos = 0; - char * out = reinterpret_cast(out_vec.data()); - for (size_t i = 0; i < size; ++i) - { - const UInt8 * in_pos = reinterpret_cast(&in_vec[i]); - executeOneString(in_pos, in_pos + type_size_in_bytes, out); - - pos += hex_length; - out_offsets[i] = pos; - } - col_res = std::move(col_str); - } - - static void executeOneString(const UInt8 * pos, const UInt8 * end, char *& out) - { - while (pos < end) - { - writeBinByte(*pos, out); - ++pos; - out += word_size; - } - *out = '\0'; - ++out; - } -}; - -struct UnbinImpl -{ - static constexpr auto name = "unbin"; - static constexpr size_t word_size = 8; - - static void decode(const char * pos, const char * end, char *& out) - { - if (pos == end) - { - *out = '\0'; - ++out; - return; - } - - UInt8 left = 0; - - /// end - pos is the length of input. - /// (length & 7) to make remain bits length mod 8 is zero to split. - /// e.g. the length is 9 and the input is "101000001", - /// first left_cnt is 1, left is 0, right shift, pos is 1, left = 1 - /// then, left_cnt is 0, remain input is '01000001'. 
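- /// In that example the decoded output is the byte 0x01 followed by 0x41 ('A').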
- for (UInt8 left_cnt = (end - pos) & 7; left_cnt > 0; --left_cnt) - { - left = left << 1; - if (*pos != '0') - left += 1; - ++pos; - } - - if (left != 0 || end - pos == 0) - { - *out = left; - ++out; - } - - assert((end - pos) % 8 == 0); - - while (end - pos != 0) - { - UInt8 c = 0; - for (UInt8 i = 0; i < 8; ++i) - { - c = c << 1; - if (*pos != '0') - c += 1; - ++pos; - } - *out = c; - ++out; - } - - *out = '\0'; - ++out; - } -}; - -using FunctionHex = EncodeToBinaryRepr; -using FunctionUnhex = DecodeFromBinaryRepr; -using FunctionBin = EncodeToBinaryRepr; -using FunctionUnbin = DecodeFromBinaryRepr; - -class FunctionChar : public IFunction -{ -public: - static constexpr auto name = "char"; - static FunctionPtr create(ContextPtr) { return std::make_shared(); } - - String getName() const override - { - return name; - } - - bool isVariadic() const override { return true; } - bool isInjective(const ColumnsWithTypeAndName &) const override { return true; } - size_t getNumberOfArguments() const override { return 0; } - - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override - { - if (arguments.empty()) - throw Exception("Number of arguments for function " + getName() + " can't be " + toString(arguments.size()) - + ", should be at least 1", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - for (const auto & arg : arguments) - { - WhichDataType which(arg); - if (!(which.isInt() || which.isUInt() || which.isFloat())) - throw Exception("Illegal type " + arg->getName() + " of argument of function " + getName() - + ", must be Int, UInt or Float number", - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - } - return std::make_shared(); - } - - bool useDefaultImplementationForConstants() const override { return true; } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override - { - auto col_str = ColumnString::create(); - ColumnString::Chars & out_vec = col_str->getChars(); - ColumnString::Offsets & out_offsets = col_str->getOffsets(); - - const auto size_per_row = arguments.size() + 1; - out_vec.resize(size_per_row * input_rows_count); - out_offsets.resize(input_rows_count); - - for (size_t row = 0; row < input_rows_count; ++row) - { - out_offsets[row] = size_per_row + out_offsets[row - 1]; - out_vec[row * size_per_row + size_per_row - 1] = '\0'; - } - - Columns columns_holder(arguments.size()); - for (size_t idx = 0; idx < arguments.size(); ++idx) - { - //partial const column - columns_holder[idx] = arguments[idx].column->convertToFullColumnIfConst(); - const IColumn * column = columns_holder[idx].get(); - - if (!(executeNumber(*column, out_vec, idx, input_rows_count, size_per_row) - || executeNumber(*column, out_vec, idx, input_rows_count, size_per_row) - || executeNumber(*column, out_vec, idx, input_rows_count, size_per_row) - || executeNumber(*column, out_vec, idx, input_rows_count, size_per_row) - || executeNumber(*column, out_vec, idx, input_rows_count, size_per_row) - || executeNumber(*column, out_vec, idx, input_rows_count, size_per_row) - || executeNumber(*column, out_vec, idx, input_rows_count, size_per_row) - || executeNumber(*column, out_vec, idx, input_rows_count, size_per_row) - || executeNumber(*column, out_vec, idx, input_rows_count, size_per_row) - || executeNumber(*column, out_vec, idx, input_rows_count, size_per_row))) - { - throw Exception{"Illegal column " + arguments[idx].column->getName() - + " of first argument of function " + getName(), ErrorCodes::ILLEGAL_COLUMN}; - } - } - - return col_str; - } - 
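The output layout used by `char` above is a fixed stride: each row occupies `arguments.size() + 1` bytes, one byte per argument plus a terminating zero, filled column by column via `out_vec[row * size_per_row + column_idx]`. A tiny standalone sketch of that indexing (the names and sample values below are illustrative only, not part of this patch):

```cpp
#include <cstdio>
#include <vector>

int main()
{
    /// Two "argument columns" with two rows each, mirroring
    /// out_vec[row * size_per_row + column_idx] = value.
    const std::vector<std::vector<int>> args = {{72, 119}, {105, 111}};
    const size_t rows = 2;
    const size_t size_per_row = args.size() + 1;  /// one byte per argument + '\0'
    std::vector<char> out(rows * size_per_row, '\0');

    for (size_t col = 0; col < args.size(); ++col)
        for (size_t row = 0; row < rows; ++row)
            out[row * size_per_row + col] = static_cast<char>(args[col][row]);

    std::printf("%s %s\n", out.data(), out.data() + size_per_row);  /// prints "Hi wo"
}
```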
-private: - template - bool executeNumber(const IColumn & src_data, ColumnString::Chars & out_vec, const size_t & column_idx, const size_t & rows, const size_t & size_per_row) const - { - const ColumnVector * src_data_concrete = checkAndGetColumn>(&src_data); - - if (!src_data_concrete) - { - return false; - } - - for (size_t row = 0; row < rows; ++row) - { - out_vec[row * size_per_row + column_idx] = static_cast(src_data_concrete->getInt(row)); - } - return true; - } -}; - -class FunctionBitmaskToArray : public IFunction -{ -public: - static constexpr auto name = "bitmaskToArray"; - static FunctionPtr create(ContextPtr) { return std::make_shared(); } - - String getName() const override - { - return name; - } - - size_t getNumberOfArguments() const override { return 1; } - bool isInjective(const ColumnsWithTypeAndName &) const override { return true; } - - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override - { - if (!isInteger(arguments[0])) - throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - return std::make_shared(arguments[0]); - } - - bool useDefaultImplementationForConstants() const override { return true; } - - template - bool tryExecute(const IColumn * column, ColumnPtr & out_column) const - { - using UnsignedT = make_unsigned_t; - - if (const ColumnVector * col_from = checkAndGetColumn>(column)) - { - auto col_values = ColumnVector::create(); - auto col_offsets = ColumnArray::ColumnOffsets::create(); - - typename ColumnVector::Container & res_values = col_values->getData(); - ColumnArray::Offsets & res_offsets = col_offsets->getData(); - - const typename ColumnVector::Container & vec_from = col_from->getData(); - size_t size = vec_from.size(); - res_offsets.resize(size); - res_values.reserve(size * 2); - - for (size_t row = 0; row < size; ++row) - { - UnsignedT x = vec_from[row]; - while (x) - { - UnsignedT y = x & (x - 1); - UnsignedT bit = x ^ y; - x = y; - res_values.push_back(bit); - } - res_offsets[row] = res_values.size(); - } - - out_column = ColumnArray::create(std::move(col_values), std::move(col_offsets)); - return true; - } - else - { - return false; - } - } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override - { - const IColumn * in_column = arguments[0].column.get(); - ColumnPtr out_column; - - if (tryExecute(in_column, out_column) || - tryExecute(in_column, out_column) || - tryExecute(in_column, out_column) || - tryExecute(in_column, out_column) || - tryExecute(in_column, out_column) || - tryExecute(in_column, out_column) || - tryExecute(in_column, out_column) || - tryExecute(in_column, out_column)) - return out_column; - - throw Exception("Illegal column " + arguments[0].column->getName() - + " of first argument of function " + getName(), - ErrorCodes::ILLEGAL_COLUMN); - } -}; - -class FunctionBitPositionsToArray : public IFunction -{ -public: - static constexpr auto name = "bitPositionsToArray"; - static FunctionPtr create(ContextPtr) { return std::make_shared(); } - - String getName() const override - { - return name; - } - - size_t getNumberOfArguments() const override { return 1; } - bool isInjective(const ColumnsWithTypeAndName &) const override { return true; } - - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override - { - if (!isInteger(arguments[0])) - throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, - "Illegal type {} of argument of 
function {}", - getName(), - arguments[0]->getName()); - - return std::make_shared(std::make_shared()); - } - - bool useDefaultImplementationForConstants() const override { return true; } - - template - ColumnPtr executeType(const IColumn * column) const - { - const ColumnVector * col_from = checkAndGetColumn>(column); - if (!col_from) - return nullptr; - - auto result_array_values = ColumnVector::create(); - auto result_array_offsets = ColumnArray::ColumnOffsets::create(); - - auto & result_array_values_data = result_array_values->getData(); - auto & result_array_offsets_data = result_array_offsets->getData(); - - auto & vec_from = col_from->getData(); - size_t size = vec_from.size(); - result_array_offsets_data.resize(size); - result_array_values_data.reserve(size * 2); - - using UnsignedType = make_unsigned_t; - - for (size_t row = 0; row < size; ++row) - { - UnsignedType x = static_cast(vec_from[row]); - - if constexpr (is_big_int_v) - { - size_t position = 0; - - while (x) - { - if (x & 1) - result_array_values_data.push_back(position); - - x >>= 1; - ++position; - } - } - else - { - while (x) - { - result_array_values_data.push_back(getTrailingZeroBitsUnsafe(x)); - x &= (x - 1); - } - } - - result_array_offsets_data[row] = result_array_values_data.size(); - } - - auto result_column = ColumnArray::create(std::move(result_array_values), std::move(result_array_offsets)); - - return result_column; - } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override - { - const IColumn * in_column = arguments[0].column.get(); - ColumnPtr result_column; - - if (!((result_column = executeType(in_column)) - || (result_column = executeType(in_column)) - || (result_column = executeType(in_column)) - || (result_column = executeType(in_column)) - || (result_column = executeType(in_column)) - || (result_column = executeType(in_column)) - || (result_column = executeType(in_column)) - || (result_column = executeType(in_column)) - || (result_column = executeType(in_column)) - || (result_column = executeType(in_column)) - || (result_column = executeType(in_column)) - || (result_column = executeType(in_column)) - || (result_column = executeType(in_column)))) - { - throw Exception(ErrorCodes::ILLEGAL_COLUMN, - "Illegal column {} of first argument of function {}", - arguments[0].column->getName(), - getName()); - } - - return result_column; - } -}; - -class FunctionToStringCutToZero : public IFunction -{ -public: - static constexpr auto name = "toStringCutToZero"; - static FunctionPtr create(ContextPtr) { return std::make_shared(); } - - String getName() const override - { - return name; - } - - size_t getNumberOfArguments() const override { return 1; } - - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override - { - if (!isStringOrFixedString(arguments[0])) - throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - return std::make_shared(); - } - - bool useDefaultImplementationForConstants() const override { return true; } - - static bool tryExecuteString(const IColumn * col, ColumnPtr & col_res) - { - const ColumnString * col_str_in = checkAndGetColumn(col); - - if (col_str_in) - { - auto col_str = ColumnString::create(); - ColumnString::Chars & out_vec = col_str->getChars(); - ColumnString::Offsets & out_offsets = col_str->getOffsets(); - - const ColumnString::Chars & in_vec = col_str_in->getChars(); - const 
ColumnString::Offsets & in_offsets = col_str_in->getOffsets(); - - size_t size = in_offsets.size(); - out_offsets.resize(size); - out_vec.resize(in_vec.size()); - - char * begin = reinterpret_cast(out_vec.data()); - char * pos = begin; - - ColumnString::Offset current_in_offset = 0; - - for (size_t i = 0; i < size; ++i) - { - const char * pos_in = reinterpret_cast(&in_vec[current_in_offset]); - size_t current_size = strlen(pos_in); - memcpySmallAllowReadWriteOverflow15(pos, pos_in, current_size); - pos += current_size; - *pos = '\0'; - ++pos; - out_offsets[i] = pos - begin; - current_in_offset = in_offsets[i]; - } - out_vec.resize(pos - begin); - - if (!out_offsets.empty() && out_offsets.back() != out_vec.size()) - throw Exception("Column size mismatch (internal logical error)", ErrorCodes::LOGICAL_ERROR); - - col_res = std::move(col_str); - return true; - } - else - { - return false; - } - } - - static bool tryExecuteFixedString(const IColumn * col, ColumnPtr & col_res) - { - const ColumnFixedString * col_fstr_in = checkAndGetColumn(col); - - if (col_fstr_in) - { - auto col_str = ColumnString::create(); - ColumnString::Chars & out_vec = col_str->getChars(); - ColumnString::Offsets & out_offsets = col_str->getOffsets(); - - const ColumnString::Chars & in_vec = col_fstr_in->getChars(); - - size_t size = col_fstr_in->size(); - - out_offsets.resize(size); - out_vec.resize(in_vec.size() + size); - - char * begin = reinterpret_cast(out_vec.data()); - char * pos = begin; - const char * pos_in = reinterpret_cast(in_vec.data()); - - size_t n = col_fstr_in->getN(); - - for (size_t i = 0; i < size; ++i) - { - size_t current_size = strnlen(pos_in, n); - memcpySmallAllowReadWriteOverflow15(pos, pos_in, current_size); - pos += current_size; - *pos = '\0'; - out_offsets[i] = ++pos - begin; - pos_in += n; - } - out_vec.resize(pos - begin); - - if (!out_offsets.empty() && out_offsets.back() != out_vec.size()) - throw Exception("Column size mismatch (internal logical error)", ErrorCodes::LOGICAL_ERROR); - - col_res = std::move(col_str); - return true; - } - else - { - return false; - } - } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override - { - const IColumn * column = arguments[0].column.get(); - ColumnPtr res_column; - - if (tryExecuteFixedString(column, res_column) || tryExecuteString(column, res_column)) - return res_column; - - throw Exception("Illegal column " + arguments[0].column->getName() - + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_COLUMN); - } -}; - - -class FunctionIPv6CIDRToRange : public IFunction -{ -private: - -#if defined(__SSE2__) - - #include - - static inline void applyCIDRMask(const UInt8 * __restrict src, UInt8 * __restrict dst_lower, UInt8 * __restrict dst_upper, UInt8 bits_to_keep) - { - __m128i mask = _mm_loadu_si128(reinterpret_cast(getCIDRMaskIPv6(bits_to_keep).data())); - __m128i lower = _mm_and_si128(_mm_loadu_si128(reinterpret_cast(src)), mask); - _mm_storeu_si128(reinterpret_cast<__m128i *>(dst_lower), lower); - - __m128i inv_mask = _mm_xor_si128(mask, _mm_cmpeq_epi32(_mm_setzero_si128(), _mm_setzero_si128())); - __m128i upper = _mm_or_si128(lower, inv_mask); - _mm_storeu_si128(reinterpret_cast<__m128i *>(dst_upper), upper); - } - -#else - - /// NOTE IPv6 is stored in memory in big endian format that makes some difficulties. 
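- /// For example, with bits_to_keep = 120 (first 120 bits kept by the mask) the lower bound zeroes the last byte and the upper bound sets it to 0xFF.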
- static void applyCIDRMask(const UInt8 * __restrict src, UInt8 * __restrict dst_lower, UInt8 * __restrict dst_upper, UInt8 bits_to_keep) - { - const auto & mask = getCIDRMaskIPv6(bits_to_keep); - - for (size_t i = 0; i < 16; ++i) - { - dst_lower[i] = src[i] & mask[i]; - dst_upper[i] = dst_lower[i] | ~mask[i]; - } - } - -#endif - -public: - static constexpr auto name = "IPv6CIDRToRange"; - static FunctionPtr create(ContextPtr) { return std::make_shared(); } - - String getName() const override { return name; } - size_t getNumberOfArguments() const override { return 2; } - - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override - { - const auto * first_argument = checkAndGetDataType(arguments[0].get()); - if (!first_argument || first_argument->getN() != IPV6_BINARY_LENGTH) - throw Exception("Illegal type " + arguments[0]->getName() + - " of first argument of function " + getName() + - ", expected FixedString(" + toString(IPV6_BINARY_LENGTH) + ")", - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - const DataTypePtr & second_argument = arguments[1]; - if (!isUInt8(second_argument)) - throw Exception{"Illegal type " + second_argument->getName() - + " of second argument of function " + getName() - + ", expected UInt8", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT}; - - DataTypePtr element = DataTypeFactory::instance().get("IPv6"); - return std::make_shared(DataTypes{element, element}); - } - - bool useDefaultImplementationForConstants() const override { return true; } - - - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override - { - const auto & col_type_name_ip = arguments[0]; - const ColumnPtr & column_ip = col_type_name_ip.column; - - const auto * col_const_ip_in = checkAndGetColumnConst(column_ip.get()); - const auto * col_ip_in = checkAndGetColumn(column_ip.get()); - - if (!col_ip_in && !col_const_ip_in) - throw Exception("Illegal column " + arguments[0].column->getName() - + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_COLUMN); - - if ((col_const_ip_in && col_const_ip_in->getValue().size() != IPV6_BINARY_LENGTH) || - (col_ip_in && col_ip_in->getN() != IPV6_BINARY_LENGTH)) - throw Exception("Illegal type " + col_type_name_ip.type->getName() + - " of column " + column_ip->getName() + - " argument of function " + getName() + - ", expected FixedString(" + toString(IPV6_BINARY_LENGTH) + ")", - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - const auto & col_type_name_cidr = arguments[1]; - const ColumnPtr & column_cidr = col_type_name_cidr.column; - - const auto * col_const_cidr_in = checkAndGetColumnConst(column_cidr.get()); - const auto * col_cidr_in = checkAndGetColumn(column_cidr.get()); - - if (!col_const_cidr_in && !col_cidr_in) - throw Exception("Illegal column " + arguments[1].column->getName() - + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_COLUMN); - - auto col_res_lower_range = ColumnFixedString::create(IPV6_BINARY_LENGTH); - auto col_res_upper_range = ColumnFixedString::create(IPV6_BINARY_LENGTH); - - ColumnString::Chars & vec_res_lower_range = col_res_lower_range->getChars(); - vec_res_lower_range.resize(input_rows_count * IPV6_BINARY_LENGTH); - - ColumnString::Chars & vec_res_upper_range = col_res_upper_range->getChars(); - vec_res_upper_range.resize(input_rows_count * IPV6_BINARY_LENGTH); - - static constexpr UInt8 max_cidr_mask = IPV6_BINARY_LENGTH * 8; - - const String col_const_ip_str = col_const_ip_in ? 
col_const_ip_in->getValue() : ""; - const UInt8 * col_const_ip_value = col_const_ip_in ? reinterpret_cast(col_const_ip_str.c_str()) : nullptr; - - for (size_t offset = 0; offset < input_rows_count; ++offset) - { - const size_t offset_ipv6 = offset * IPV6_BINARY_LENGTH; - - const UInt8 * ip = col_const_ip_in - ? col_const_ip_value - : &col_ip_in->getChars()[offset_ipv6]; - - UInt8 cidr = col_const_cidr_in - ? col_const_cidr_in->getValue() - : col_cidr_in->getData()[offset]; - - cidr = std::min(cidr, max_cidr_mask); - - applyCIDRMask(ip, &vec_res_lower_range[offset_ipv6], &vec_res_upper_range[offset_ipv6], cidr); - } - - return ColumnTuple::create(Columns{std::move(col_res_lower_range), std::move(col_res_upper_range)}); - } -}; - - -class FunctionIPv4CIDRToRange : public IFunction -{ -private: - static inline std::pair applyCIDRMask(UInt32 src, UInt8 bits_to_keep) - { - if (bits_to_keep >= 8 * sizeof(UInt32)) - return { src, src }; - if (bits_to_keep == 0) - return { UInt32(0), UInt32(-1) }; - - UInt32 mask = UInt32(-1) << (8 * sizeof(UInt32) - bits_to_keep); - UInt32 lower = src & mask; - UInt32 upper = lower | ~mask; - - return { lower, upper }; - } - -public: - static constexpr auto name = "IPv4CIDRToRange"; - static FunctionPtr create(ContextPtr) { return std::make_shared(); } - - String getName() const override { return name; } - size_t getNumberOfArguments() const override { return 2; } - - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override - { - if (!WhichDataType(arguments[0]).isUInt32()) - throw Exception("Illegal type " + arguments[0]->getName() + - " of first argument of function " + getName() + - ", expected UInt32", - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - - const DataTypePtr & second_argument = arguments[1]; - if (!isUInt8(second_argument)) - throw Exception{"Illegal type " + second_argument->getName() - + " of second argument of function " + getName() - + ", expected UInt8", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT}; - - DataTypePtr element = DataTypeFactory::instance().get("IPv4"); - return std::make_shared(DataTypes{element, element}); - } - - bool useDefaultImplementationForConstants() const override { return true; } - - - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override - { - const auto & col_type_name_ip = arguments[0]; - const ColumnPtr & column_ip = col_type_name_ip.column; - - const auto * col_const_ip_in = checkAndGetColumnConst(column_ip.get()); - const auto * col_ip_in = checkAndGetColumn(column_ip.get()); - if (!col_const_ip_in && !col_ip_in) - throw Exception("Illegal column " + arguments[0].column->getName() - + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_COLUMN); - - const auto & col_type_name_cidr = arguments[1]; - const ColumnPtr & column_cidr = col_type_name_cidr.column; - - const auto * col_const_cidr_in = checkAndGetColumnConst(column_cidr.get()); - const auto * col_cidr_in = checkAndGetColumn(column_cidr.get()); - - if (!col_const_cidr_in && !col_cidr_in) - throw Exception("Illegal column " + arguments[1].column->getName() - + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_COLUMN); - - auto col_res_lower_range = ColumnUInt32::create(); - auto col_res_upper_range = ColumnUInt32::create(); - - auto & vec_res_lower_range = col_res_lower_range->getData(); - vec_res_lower_range.resize(input_rows_count); - - auto & vec_res_upper_range = col_res_upper_range->getData(); - vec_res_upper_range.resize(input_rows_count); - - for (size_t 
i = 0; i < input_rows_count; ++i) - { - UInt32 ip = col_const_ip_in - ? col_const_ip_in->getValue() - : col_ip_in->getData()[i]; - - UInt8 cidr = col_const_cidr_in - ? col_const_cidr_in->getValue() - : col_cidr_in->getData()[i]; - - std::tie(vec_res_lower_range[i], vec_res_upper_range[i]) = applyCIDRMask(ip, cidr); - } - - return ColumnTuple::create(Columns{std::move(col_res_lower_range), std::move(col_res_upper_range)}); - } -}; - -class FunctionIsIPv4String : public FunctionIPv4StringToNum -{ -public: - static constexpr auto name = "isIPv4String"; - - static FunctionPtr create(ContextPtr) { return std::make_shared(); } - - String getName() const override { return name; } - - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override - { - if (!isString(arguments[0])) - throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - return std::make_shared(); - } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override - { - const ColumnPtr & column = arguments[0].column; - if (const ColumnString * col = checkAndGetColumn(column.get())) - { - auto col_res = ColumnUInt8::create(); - - ColumnUInt8::Container & vec_res = col_res->getData(); - vec_res.resize(col->size()); - - const ColumnString::Chars & vec_src = col->getChars(); - const ColumnString::Offsets & offsets_src = col->getOffsets(); - size_t prev_offset = 0; - UInt32 result = 0; - - for (size_t i = 0; i < vec_res.size(); ++i) - { - vec_res[i] = DB::parseIPv4(reinterpret_cast(&vec_src[prev_offset]), reinterpret_cast(&result)); - prev_offset = offsets_src[i]; - } - return col_res; - } - else - throw Exception("Illegal column " + arguments[0].column->getName() - + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_COLUMN); - } -}; - -class FunctionIsIPv6String : public FunctionIPv6StringToNum -{ -public: - static constexpr auto name = "isIPv6String"; - - static FunctionPtr create(ContextPtr) { return std::make_shared(); } - - String getName() const override { return name; } - - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override - { - if (!isString(arguments[0])) - throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - return std::make_shared(); - } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override - { - const ColumnPtr & column = arguments[0].column; - - if (const ColumnString * col = checkAndGetColumn(column.get())) - { - auto col_res = ColumnUInt8::create(); - - ColumnUInt8::Container & vec_res = col_res->getData(); - vec_res.resize(col->size()); - - const ColumnString::Chars & vec_src = col->getChars(); - const ColumnString::Offsets & offsets_src = col->getOffsets(); - size_t prev_offset = 0; - char v[IPV6_BINARY_LENGTH]; - - for (size_t i = 0; i < vec_res.size(); ++i) - { - vec_res[i] = DB::parseIPv6(reinterpret_cast(&vec_src[prev_offset]), reinterpret_cast(v)); - prev_offset = offsets_src[i]; - } - return col_res; - } - else - throw Exception("Illegal column " + arguments[0].column->getName() - + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_COLUMN); - } -}; - -} diff --git a/src/Functions/FunctionsCodingIP.cpp b/src/Functions/FunctionsCodingIP.cpp new file mode 100644 index 00000000000..20af7d41aca --- /dev/null +++ 
b/src/Functions/FunctionsCodingIP.cpp @@ -0,0 +1,1077 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int ILLEGAL_TYPE_OF_ARGUMENT; + extern const int ILLEGAL_COLUMN; +} + + +/** Encoding functions for network addresses: + * + * IPv4NumToString (num) - See below. + * IPv4StringToNum(string) - Convert, for example, '192.168.0.1' to 3232235521 and vice versa. + */ +class FunctionIPv6NumToString : public IFunction +{ +public: + static constexpr auto name = "IPv6NumToString"; + static FunctionPtr create(ContextPtr) { return std::make_shared(); } + + String getName() const override { return name; } + + size_t getNumberOfArguments() const override { return 1; } + bool isInjective(const ColumnsWithTypeAndName &) const override { return true; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + const auto * ptr = checkAndGetDataType(arguments[0].get()); + if (!ptr || ptr->getN() != IPV6_BINARY_LENGTH) + throw Exception("Illegal type " + arguments[0]->getName() + + " of argument of function " + getName() + + ", expected FixedString(" + toString(IPV6_BINARY_LENGTH) + ")", + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + return std::make_shared(); + } + + bool useDefaultImplementationForConstants() const override { return true; } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override + { + const auto & col_type_name = arguments[0]; + const ColumnPtr & column = col_type_name.column; + + if (const auto * col_in = checkAndGetColumn(column.get())) + { + if (col_in->getN() != IPV6_BINARY_LENGTH) + throw Exception("Illegal type " + col_type_name.type->getName() + + " of column " + col_in->getName() + + " argument of function " + getName() + + ", expected FixedString(" + toString(IPV6_BINARY_LENGTH) + ")", + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + const auto size = col_in->size(); + const auto & vec_in = col_in->getChars(); + + auto col_res = ColumnString::create(); + + ColumnString::Chars & vec_res = col_res->getChars(); + ColumnString::Offsets & offsets_res = col_res->getOffsets(); + vec_res.resize(size * (IPV6_MAX_TEXT_LENGTH + 1)); + offsets_res.resize(size); + + auto * begin = reinterpret_cast(vec_res.data()); + auto * pos = begin; + + for (size_t offset = 0, i = 0; offset < vec_in.size(); offset += IPV6_BINARY_LENGTH, ++i) + { + formatIPv6(reinterpret_cast(&vec_in[offset]), pos); + offsets_res[i] = pos - begin; + } + + vec_res.resize(pos - begin); + + return col_res; + } + else + throw Exception("Illegal column " + arguments[0].column->getName() + + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_COLUMN); + } +}; + + +class FunctionCutIPv6 : public IFunction +{ +public: + static constexpr auto name = "cutIPv6"; + static FunctionPtr create(ContextPtr) { return std::make_shared(); } + + String getName() const override { return name; } + + size_t getNumberOfArguments() const override { return 3; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + const auto * ptr = checkAndGetDataType(arguments[0].get()); + if (!ptr || ptr->getN() != IPV6_BINARY_LENGTH) + throw Exception("Illegal type " + arguments[0]->getName() + + " of argument 1 of function " + getName() + + ", expected FixedString(" + 
toString(IPV6_BINARY_LENGTH) + ")", + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + if (!WhichDataType(arguments[1]).isUInt8()) + throw Exception("Illegal type " + arguments[1]->getName() + + " of argument 2 of function " + getName(), + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + if (!WhichDataType(arguments[2]).isUInt8()) + throw Exception("Illegal type " + arguments[2]->getName() + + " of argument 3 of function " + getName(), + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + return std::make_shared(); + } + + bool useDefaultImplementationForConstants() const override { return true; } + ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {1, 2}; } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override + { + const auto & col_type_name = arguments[0]; + const ColumnPtr & column = col_type_name.column; + + const auto & col_ipv6_zeroed_tail_bytes_type = arguments[1]; + const auto & col_ipv6_zeroed_tail_bytes = col_ipv6_zeroed_tail_bytes_type.column; + const auto & col_ipv4_zeroed_tail_bytes_type = arguments[2]; + const auto & col_ipv4_zeroed_tail_bytes = col_ipv4_zeroed_tail_bytes_type.column; + + if (const auto * col_in = checkAndGetColumn(column.get())) + { + if (col_in->getN() != IPV6_BINARY_LENGTH) + throw Exception("Illegal type " + col_type_name.type->getName() + + " of column " + col_in->getName() + + " argument of function " + getName() + + ", expected FixedString(" + toString(IPV6_BINARY_LENGTH) + ")", + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + const auto * ipv6_zeroed_tail_bytes = checkAndGetColumnConst>(col_ipv6_zeroed_tail_bytes.get()); + if (!ipv6_zeroed_tail_bytes) + throw Exception("Illegal type " + col_ipv6_zeroed_tail_bytes_type.type->getName() + + " of argument 2 of function " + getName(), + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + UInt8 ipv6_zeroed_tail_bytes_count = ipv6_zeroed_tail_bytes->getValue(); + if (ipv6_zeroed_tail_bytes_count > IPV6_BINARY_LENGTH) + throw Exception("Illegal value for argument 2 " + col_ipv6_zeroed_tail_bytes_type.type->getName() + + " of function " + getName(), + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + const auto * ipv4_zeroed_tail_bytes = checkAndGetColumnConst>(col_ipv4_zeroed_tail_bytes.get()); + if (!ipv4_zeroed_tail_bytes) + throw Exception("Illegal type " + col_ipv4_zeroed_tail_bytes_type.type->getName() + + " of argument 3 of function " + getName(), + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + UInt8 ipv4_zeroed_tail_bytes_count = ipv4_zeroed_tail_bytes->getValue(); + if (ipv4_zeroed_tail_bytes_count > IPV6_BINARY_LENGTH) + throw Exception("Illegal value for argument 3 " + col_ipv4_zeroed_tail_bytes_type.type->getName() + + " of function " + getName(), + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + const auto size = col_in->size(); + const auto & vec_in = col_in->getChars(); + + auto col_res = ColumnString::create(); + + ColumnString::Chars & vec_res = col_res->getChars(); + ColumnString::Offsets & offsets_res = col_res->getOffsets(); + vec_res.resize(size * (IPV6_MAX_TEXT_LENGTH + 1)); + offsets_res.resize(size); + + auto * begin = reinterpret_cast(vec_res.data()); + auto * pos = begin; + + for (size_t offset = 0, i = 0; offset < vec_in.size(); offset += IPV6_BINARY_LENGTH, ++i) + { + const auto * address = &vec_in[offset]; + UInt8 zeroed_tail_bytes_count = isIPv4Mapped(address) ? 
ipv4_zeroed_tail_bytes_count : ipv6_zeroed_tail_bytes_count; + cutAddress(reinterpret_cast(address), pos, zeroed_tail_bytes_count); + offsets_res[i] = pos - begin; + } + + vec_res.resize(pos - begin); + + return col_res; + } + else + throw Exception("Illegal column " + arguments[0].column->getName() + + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_COLUMN); + } + +private: + static bool isIPv4Mapped(const UInt8 * address) + { + return (unalignedLoad(address) == 0) && + ((unalignedLoad(address + 8) & 0x00000000FFFFFFFFull) == 0x00000000FFFF0000ull); + } + + static void cutAddress(const unsigned char * address, char *& dst, UInt8 zeroed_tail_bytes_count) + { + formatIPv6(address, dst, zeroed_tail_bytes_count); + } +}; + + +class FunctionIPv6StringToNum : public IFunction +{ +public: + static constexpr auto name = "IPv6StringToNum"; + static FunctionPtr create(ContextPtr) { return std::make_shared(); } + + static inline bool tryParseIPv4(const char * pos) + { + UInt32 result = 0; + return DB::parseIPv4(pos, reinterpret_cast(&result)); + } + + String getName() const override { return name; } + + size_t getNumberOfArguments() const override { return 1; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + if (!isString(arguments[0])) + throw Exception( + "Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + return std::make_shared(IPV6_BINARY_LENGTH); + } + + bool useDefaultImplementationForConstants() const override { return true; } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override + { + const ColumnPtr & column = arguments[0].column; + + if (const auto * col_in = checkAndGetColumn(column.get())) + { + auto col_res = ColumnFixedString::create(IPV6_BINARY_LENGTH); + + auto & vec_res = col_res->getChars(); + vec_res.resize(col_in->size() * IPV6_BINARY_LENGTH); + + const ColumnString::Chars & vec_src = col_in->getChars(); + const ColumnString::Offsets & offsets_src = col_in->getOffsets(); + size_t src_offset = 0; + char src_ipv4_buf[sizeof("::ffff:") + IPV4_MAX_TEXT_LENGTH + 1] = "::ffff:"; + + for (size_t out_offset = 0, i = 0; out_offset < vec_res.size(); out_offset += IPV6_BINARY_LENGTH, ++i) + { + /// For both cases below: In case of failure, the function parseIPv6 fills vec_res with zero bytes. + + /// If the source IP address is parsable as an IPv4 address, then transform it into a valid IPv6 address. + /// Keeping it simple by just prefixing `::ffff:` to the IPv4 address to represent it as a valid IPv6 address. + if (tryParseIPv4(reinterpret_cast(&vec_src[src_offset]))) + { + std::memcpy( + src_ipv4_buf + std::strlen("::ffff:"), + reinterpret_cast(&vec_src[src_offset]), + std::min(offsets_src[i] - src_offset, IPV4_MAX_TEXT_LENGTH + 1)); + parseIPv6(src_ipv4_buf, reinterpret_cast(&vec_res[out_offset])); + } + else + { + parseIPv6( + reinterpret_cast(&vec_src[src_offset]), reinterpret_cast(&vec_res[out_offset])); + } + src_offset = offsets_src[i]; + } + + return col_res; + } + else + throw Exception("Illegal column " + arguments[0].column->getName() + + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_COLUMN); + } +}; + + +/** If mask_tail_octets > 0, the last specified number of octets will be filled with "xxx". 
+ */ +template +class FunctionIPv4NumToString : public IFunction +{ +public: + static constexpr auto name = Name::name; + static FunctionPtr create(ContextPtr) { return std::make_shared>(); } + + String getName() const override + { + return name; + } + + size_t getNumberOfArguments() const override { return 1; } + bool isInjective(const ColumnsWithTypeAndName &) const override { return mask_tail_octets == 0; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + if (!WhichDataType(arguments[0]).isUInt32()) + throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName() + ", expected UInt32", + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + return std::make_shared(); + } + + bool useDefaultImplementationForConstants() const override { return true; } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override + { + const ColumnPtr & column = arguments[0].column; + + if (const ColumnUInt32 * col = typeid_cast(column.get())) + { + const ColumnUInt32::Container & vec_in = col->getData(); + + auto col_res = ColumnString::create(); + + ColumnString::Chars & vec_res = col_res->getChars(); + ColumnString::Offsets & offsets_res = col_res->getOffsets(); + + vec_res.resize(vec_in.size() * (IPV4_MAX_TEXT_LENGTH + 1)); /// the longest value is: 255.255.255.255\0 + offsets_res.resize(vec_in.size()); + char * begin = reinterpret_cast(vec_res.data()); + char * pos = begin; + + for (size_t i = 0; i < vec_in.size(); ++i) + { + DB::formatIPv4(reinterpret_cast(&vec_in[i]), pos, mask_tail_octets, "xxx"); + offsets_res[i] = pos - begin; + } + + vec_res.resize(pos - begin); + + return col_res; + } + else + throw Exception("Illegal column " + arguments[0].column->getName() + + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_COLUMN); + } +}; + + +class FunctionIPv4StringToNum : public IFunction +{ +public: + static constexpr auto name = "IPv4StringToNum"; + static FunctionPtr create(ContextPtr) { return std::make_shared(); } + + String getName() const override + { + return name; + } + + size_t getNumberOfArguments() const override { return 1; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + if (!isString(arguments[0])) + throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + return std::make_shared(); + } + + static inline UInt32 parseIPv4(const char * pos) + { + UInt32 result = 0; + DB::parseIPv4(pos, reinterpret_cast(&result)); + + return result; + } + + bool useDefaultImplementationForConstants() const override { return true; } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override + { + const ColumnPtr & column = arguments[0].column; + + if (const ColumnString * col = checkAndGetColumn(column.get())) + { + auto col_res = ColumnUInt32::create(); + + ColumnUInt32::Container & vec_res = col_res->getData(); + vec_res.resize(col->size()); + + const ColumnString::Chars & vec_src = col->getChars(); + const ColumnString::Offsets & offsets_src = col->getOffsets(); + size_t prev_offset = 0; + + for (size_t i = 0; i < vec_res.size(); ++i) + { + vec_res[i] = parseIPv4(reinterpret_cast(&vec_src[prev_offset])); + prev_offset = offsets_src[i]; + } + + return col_res; + } + else + throw Exception("Illegal column " + arguments[0].column->getName() + + " of argument of 
function " + getName(), + ErrorCodes::ILLEGAL_COLUMN); + } +}; + + +class FunctionIPv4ToIPv6 : public IFunction +{ +public: + static constexpr auto name = "IPv4ToIPv6"; + static FunctionPtr create(ContextPtr) { return std::make_shared(); } + + String getName() const override { return name; } + + size_t getNumberOfArguments() const override { return 1; } + bool isInjective(const ColumnsWithTypeAndName &) const override { return true; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + if (!checkAndGetDataType(arguments[0].get())) + throw Exception("Illegal type " + arguments[0]->getName() + + " of argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + return std::make_shared(16); + } + + bool useDefaultImplementationForConstants() const override { return true; } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override + { + const auto & col_type_name = arguments[0]; + const ColumnPtr & column = col_type_name.column; + + if (const auto * col_in = typeid_cast(column.get())) + { + auto col_res = ColumnFixedString::create(IPV6_BINARY_LENGTH); + + auto & vec_res = col_res->getChars(); + vec_res.resize(col_in->size() * IPV6_BINARY_LENGTH); + + const auto & vec_in = col_in->getData(); + + for (size_t out_offset = 0, i = 0; out_offset < vec_res.size(); out_offset += IPV6_BINARY_LENGTH, ++i) + mapIPv4ToIPv6(vec_in[i], &vec_res[out_offset]); + + return col_res; + } + else + throw Exception("Illegal column " + arguments[0].column->getName() + + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_COLUMN); + } + +private: + static void mapIPv4ToIPv6(UInt32 in, UInt8 * buf) + { + unalignedStore(buf, 0); + unalignedStore(buf + 8, 0x00000000FFFF0000ull | (static_cast(ntohl(in)) << 32)); + } +}; + +class FunctionToIPv4 : public FunctionIPv4StringToNum +{ +public: + static constexpr auto name = "toIPv4"; + static FunctionPtr create(ContextPtr) { return std::make_shared(); } + + String getName() const override + { + return name; + } + + size_t getNumberOfArguments() const override { return 1; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + if (!isString(arguments[0])) + throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + return DataTypeFactory::instance().get("IPv4"); + } +}; + +class FunctionToIPv6 : public FunctionIPv6StringToNum +{ +public: + static constexpr auto name = "toIPv6"; + static FunctionPtr create(ContextPtr) { return std::make_shared(); } + + String getName() const override { return name; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + if (!isString(arguments[0])) + throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + return DataTypeFactory::instance().get("IPv6"); + } +}; + +class FunctionMACNumToString : public IFunction +{ +public: + static constexpr auto name = "MACNumToString"; + static FunctionPtr create(ContextPtr) { return std::make_shared(); } + + String getName() const override + { + return name; + } + + size_t getNumberOfArguments() const override { return 1; } + bool isInjective(const ColumnsWithTypeAndName &) const override { return true; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + if (!WhichDataType(arguments[0]).isUInt64()) + throw 
Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName() + ", expected UInt64", + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + return std::make_shared(); + } + + static void formatMAC(UInt64 mac, UInt8 * out) + { + /// MAC address is represented in UInt64 in natural order (so, MAC addresses are compared in same order as UInt64). + /// Higher two bytes in UInt64 are just ignored. + + writeHexByteUppercase(mac >> 40, &out[0]); + out[2] = ':'; + writeHexByteUppercase(mac >> 32, &out[3]); + out[5] = ':'; + writeHexByteUppercase(mac >> 24, &out[6]); + out[8] = ':'; + writeHexByteUppercase(mac >> 16, &out[9]); + out[11] = ':'; + writeHexByteUppercase(mac >> 8, &out[12]); + out[14] = ':'; + writeHexByteUppercase(mac, &out[15]); + out[17] = '\0'; + } + + bool useDefaultImplementationForConstants() const override { return true; } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override + { + const ColumnPtr & column = arguments[0].column; + + if (const ColumnUInt64 * col = typeid_cast(column.get())) + { + const ColumnUInt64::Container & vec_in = col->getData(); + + auto col_res = ColumnString::create(); + + ColumnString::Chars & vec_res = col_res->getChars(); + ColumnString::Offsets & offsets_res = col_res->getOffsets(); + + vec_res.resize(vec_in.size() * 18); /// the value is: xx:xx:xx:xx:xx:xx\0 + offsets_res.resize(vec_in.size()); + + size_t current_offset = 0; + for (size_t i = 0; i < vec_in.size(); ++i) + { + formatMAC(vec_in[i], &vec_res[current_offset]); + current_offset += 18; + offsets_res[i] = current_offset; + } + + return col_res; + } + else + throw Exception("Illegal column " + arguments[0].column->getName() + + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_COLUMN); + } +}; + + +struct ParseMACImpl +{ + static constexpr size_t min_string_size = 17; + static constexpr size_t max_string_size = 17; + + /** Example: 01:02:03:04:05:06. + * There could be any separators instead of : and them are just ignored. + * The order of resulting integers are correspond to the order of MAC address. + * If there are any chars other than valid hex digits for bytes, the behaviour is implementation specific. + */ + static UInt64 parse(const char * pos) + { + return (UInt64(unhex(pos[0])) << 44) + | (UInt64(unhex(pos[1])) << 40) + | (UInt64(unhex(pos[3])) << 36) + | (UInt64(unhex(pos[4])) << 32) + | (UInt64(unhex(pos[6])) << 28) + | (UInt64(unhex(pos[7])) << 24) + | (UInt64(unhex(pos[9])) << 20) + | (UInt64(unhex(pos[10])) << 16) + | (UInt64(unhex(pos[12])) << 12) + | (UInt64(unhex(pos[13])) << 8) + | (UInt64(unhex(pos[15])) << 4) + | (UInt64(unhex(pos[16]))); + } + + static constexpr auto name = "MACStringToNum"; +}; + +struct ParseOUIImpl +{ + static constexpr size_t min_string_size = 8; + static constexpr size_t max_string_size = 17; + + /** OUI is the first three bytes of MAC address. + * Example: 01:02:03. 
+ */ + static UInt64 parse(const char * pos) + { + return (UInt64(unhex(pos[0])) << 20) + | (UInt64(unhex(pos[1])) << 16) + | (UInt64(unhex(pos[3])) << 12) + | (UInt64(unhex(pos[4])) << 8) + | (UInt64(unhex(pos[6])) << 4) + | (UInt64(unhex(pos[7]))); + } + + static constexpr auto name = "MACStringToOUI"; +}; + + +template +class FunctionMACStringTo : public IFunction +{ +public: + static constexpr auto name = Impl::name; + static FunctionPtr create(ContextPtr) { return std::make_shared>(); } + + String getName() const override + { + return name; + } + + size_t getNumberOfArguments() const override { return 1; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + if (!isString(arguments[0])) + throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + return std::make_shared(); + } + + bool useDefaultImplementationForConstants() const override { return true; } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override + { + const ColumnPtr & column = arguments[0].column; + + if (const ColumnString * col = checkAndGetColumn(column.get())) + { + auto col_res = ColumnUInt64::create(); + + ColumnUInt64::Container & vec_res = col_res->getData(); + vec_res.resize(col->size()); + + const ColumnString::Chars & vec_src = col->getChars(); + const ColumnString::Offsets & offsets_src = col->getOffsets(); + size_t prev_offset = 0; + + for (size_t i = 0; i < vec_res.size(); ++i) + { + size_t current_offset = offsets_src[i]; + size_t string_size = current_offset - prev_offset - 1; /// mind the terminating zero byte + + if (string_size >= Impl::min_string_size && string_size <= Impl::max_string_size) + vec_res[i] = Impl::parse(reinterpret_cast(&vec_src[prev_offset])); + else + vec_res[i] = 0; + + prev_offset = current_offset; + } + + return col_res; + } + else + throw Exception("Illegal column " + arguments[0].column->getName() + + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_COLUMN); + } +}; + +class FunctionIPv6CIDRToRange : public IFunction +{ +private: + +#if defined(__SSE2__) + +#include + + static inline void applyCIDRMask(const UInt8 * __restrict src, UInt8 * __restrict dst_lower, UInt8 * __restrict dst_upper, UInt8 bits_to_keep) + { + __m128i mask = _mm_loadu_si128(reinterpret_cast(getCIDRMaskIPv6(bits_to_keep).data())); + __m128i lower = _mm_and_si128(_mm_loadu_si128(reinterpret_cast(src)), mask); + _mm_storeu_si128(reinterpret_cast<__m128i *>(dst_lower), lower); + + __m128i inv_mask = _mm_xor_si128(mask, _mm_cmpeq_epi32(_mm_setzero_si128(), _mm_setzero_si128())); + __m128i upper = _mm_or_si128(lower, inv_mask); + _mm_storeu_si128(reinterpret_cast<__m128i *>(dst_upper), upper); + } + +#else + + /// NOTE IPv6 is stored in memory in big endian format that makes some difficulties. 
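Aside (not part of the patch): the scalar fallback that follows applies the prefix mask byte by byte. A minimal, self-contained sketch of that arithmetic, with an illustrative `cidrMask` standing in for the `getCIDRMaskIPv6` helper used by the real code:

```
#include <array>
#include <cstdint>
#include <cstdio>

/// Build a 16-byte mask whose first `bits_to_keep` bits are set (network byte order),
/// then derive the lowest and highest addresses of the CIDR block.
static std::array<uint8_t, 16> cidrMask(uint8_t bits_to_keep)
{
    std::array<uint8_t, 16> mask{};
    for (size_t i = 0; i < mask.size(); ++i)
    {
        if (bits_to_keep >= 8)
        {
            mask[i] = 0xFF;
            bits_to_keep -= 8;
        }
        else
        {
            mask[i] = static_cast<uint8_t>(0xFF << (8 - bits_to_keep));
            bits_to_keep = 0;
        }
    }
    return mask;
}

int main()
{
    const std::array<uint8_t, 16> addr = {0x20, 0x01, 0x0d, 0xb8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x42};
    std::array<uint8_t, 16> lower{}, upper{};
    const auto mask = cidrMask(32);  /// a /32 prefix, for example

    for (size_t i = 0; i < addr.size(); ++i)
    {
        lower[i] = addr[i] & mask[i];    /// clear host bits: first address of the range
        upper[i] = lower[i] | ~mask[i];  /// set host bits: last address of the range
    }

    for (size_t i = 0; i < upper.size(); ++i)
        std::printf("%02x%s", upper[i], i + 1 < upper.size() ? " " : "\n");
}
```

The SSE2 branch above performs the same two operations (AND with the mask, OR with its complement) on all 16 bytes at once.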
+ static void applyCIDRMask(const UInt8 * __restrict src, UInt8 * __restrict dst_lower, UInt8 * __restrict dst_upper, UInt8 bits_to_keep) + { + const auto & mask = getCIDRMaskIPv6(bits_to_keep); + + for (size_t i = 0; i < 16; ++i) + { + dst_lower[i] = src[i] & mask[i]; + dst_upper[i] = dst_lower[i] | ~mask[i]; + } + } + +#endif + +public: + static constexpr auto name = "IPv6CIDRToRange"; + static FunctionPtr create(ContextPtr) { return std::make_shared(); } + + String getName() const override { return name; } + size_t getNumberOfArguments() const override { return 2; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + const auto * first_argument = checkAndGetDataType(arguments[0].get()); + if (!first_argument || first_argument->getN() != IPV6_BINARY_LENGTH) + throw Exception("Illegal type " + arguments[0]->getName() + + " of first argument of function " + getName() + + ", expected FixedString(" + toString(IPV6_BINARY_LENGTH) + ")", + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + const DataTypePtr & second_argument = arguments[1]; + if (!isUInt8(second_argument)) + throw Exception{"Illegal type " + second_argument->getName() + + " of second argument of function " + getName() + + ", expected UInt8", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT}; + + DataTypePtr element = DataTypeFactory::instance().get("IPv6"); + return std::make_shared(DataTypes{element, element}); + } + + bool useDefaultImplementationForConstants() const override { return true; } + + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override + { + const auto & col_type_name_ip = arguments[0]; + const ColumnPtr & column_ip = col_type_name_ip.column; + + const auto * col_const_ip_in = checkAndGetColumnConst(column_ip.get()); + const auto * col_ip_in = checkAndGetColumn(column_ip.get()); + + if (!col_ip_in && !col_const_ip_in) + throw Exception("Illegal column " + arguments[0].column->getName() + + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_COLUMN); + + if ((col_const_ip_in && col_const_ip_in->getValue().size() != IPV6_BINARY_LENGTH) || + (col_ip_in && col_ip_in->getN() != IPV6_BINARY_LENGTH)) + throw Exception("Illegal type " + col_type_name_ip.type->getName() + + " of column " + column_ip->getName() + + " argument of function " + getName() + + ", expected FixedString(" + toString(IPV6_BINARY_LENGTH) + ")", + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + const auto & col_type_name_cidr = arguments[1]; + const ColumnPtr & column_cidr = col_type_name_cidr.column; + + const auto * col_const_cidr_in = checkAndGetColumnConst(column_cidr.get()); + const auto * col_cidr_in = checkAndGetColumn(column_cidr.get()); + + if (!col_const_cidr_in && !col_cidr_in) + throw Exception("Illegal column " + arguments[1].column->getName() + + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_COLUMN); + + auto col_res_lower_range = ColumnFixedString::create(IPV6_BINARY_LENGTH); + auto col_res_upper_range = ColumnFixedString::create(IPV6_BINARY_LENGTH); + + ColumnString::Chars & vec_res_lower_range = col_res_lower_range->getChars(); + vec_res_lower_range.resize(input_rows_count * IPV6_BINARY_LENGTH); + + ColumnString::Chars & vec_res_upper_range = col_res_upper_range->getChars(); + vec_res_upper_range.resize(input_rows_count * IPV6_BINARY_LENGTH); + + static constexpr UInt8 max_cidr_mask = IPV6_BINARY_LENGTH * 8; + + const String col_const_ip_str = col_const_ip_in ? 
col_const_ip_in->getValue() : ""; + const UInt8 * col_const_ip_value = col_const_ip_in ? reinterpret_cast(col_const_ip_str.c_str()) : nullptr; + + for (size_t offset = 0; offset < input_rows_count; ++offset) + { + const size_t offset_ipv6 = offset * IPV6_BINARY_LENGTH; + + const UInt8 * ip = col_const_ip_in + ? col_const_ip_value + : &col_ip_in->getChars()[offset_ipv6]; + + UInt8 cidr = col_const_cidr_in + ? col_const_cidr_in->getValue() + : col_cidr_in->getData()[offset]; + + cidr = std::min(cidr, max_cidr_mask); + + applyCIDRMask(ip, &vec_res_lower_range[offset_ipv6], &vec_res_upper_range[offset_ipv6], cidr); + } + + return ColumnTuple::create(Columns{std::move(col_res_lower_range), std::move(col_res_upper_range)}); + } +}; + + +class FunctionIPv4CIDRToRange : public IFunction +{ +private: + static inline std::pair applyCIDRMask(UInt32 src, UInt8 bits_to_keep) + { + if (bits_to_keep >= 8 * sizeof(UInt32)) + return { src, src }; + if (bits_to_keep == 0) + return { UInt32(0), UInt32(-1) }; + + UInt32 mask = UInt32(-1) << (8 * sizeof(UInt32) - bits_to_keep); + UInt32 lower = src & mask; + UInt32 upper = lower | ~mask; + + return { lower, upper }; + } + +public: + static constexpr auto name = "IPv4CIDRToRange"; + static FunctionPtr create(ContextPtr) { return std::make_shared(); } + + String getName() const override { return name; } + size_t getNumberOfArguments() const override { return 2; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + if (!WhichDataType(arguments[0]).isUInt32()) + throw Exception("Illegal type " + arguments[0]->getName() + + " of first argument of function " + getName() + + ", expected UInt32", + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + + const DataTypePtr & second_argument = arguments[1]; + if (!isUInt8(second_argument)) + throw Exception{"Illegal type " + second_argument->getName() + + " of second argument of function " + getName() + + ", expected UInt8", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT}; + + DataTypePtr element = DataTypeFactory::instance().get("IPv4"); + return std::make_shared(DataTypes{element, element}); + } + + bool useDefaultImplementationForConstants() const override { return true; } + + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override + { + const auto & col_type_name_ip = arguments[0]; + const ColumnPtr & column_ip = col_type_name_ip.column; + + const auto * col_const_ip_in = checkAndGetColumnConst(column_ip.get()); + const auto * col_ip_in = checkAndGetColumn(column_ip.get()); + if (!col_const_ip_in && !col_ip_in) + throw Exception("Illegal column " + arguments[0].column->getName() + + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_COLUMN); + + const auto & col_type_name_cidr = arguments[1]; + const ColumnPtr & column_cidr = col_type_name_cidr.column; + + const auto * col_const_cidr_in = checkAndGetColumnConst(column_cidr.get()); + const auto * col_cidr_in = checkAndGetColumn(column_cidr.get()); + + if (!col_const_cidr_in && !col_cidr_in) + throw Exception("Illegal column " + arguments[1].column->getName() + + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_COLUMN); + + auto col_res_lower_range = ColumnUInt32::create(); + auto col_res_upper_range = ColumnUInt32::create(); + + auto & vec_res_lower_range = col_res_lower_range->getData(); + vec_res_lower_range.resize(input_rows_count); + + auto & vec_res_upper_range = col_res_upper_range->getData(); + vec_res_upper_range.resize(input_rows_count); + + for (size_t 
i = 0; i < input_rows_count; ++i) + { + UInt32 ip = col_const_ip_in + ? col_const_ip_in->getValue() + : col_ip_in->getData()[i]; + + UInt8 cidr = col_const_cidr_in + ? col_const_cidr_in->getValue() + : col_cidr_in->getData()[i]; + + std::tie(vec_res_lower_range[i], vec_res_upper_range[i]) = applyCIDRMask(ip, cidr); + } + + return ColumnTuple::create(Columns{std::move(col_res_lower_range), std::move(col_res_upper_range)}); + } +}; + +class FunctionIsIPv4String : public FunctionIPv4StringToNum +{ +public: + static constexpr auto name = "isIPv4String"; + + static FunctionPtr create(ContextPtr) { return std::make_shared(); } + + String getName() const override { return name; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + if (!isString(arguments[0])) + throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + return std::make_shared(); + } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override + { + const ColumnPtr & column = arguments[0].column; + if (const ColumnString * col = checkAndGetColumn(column.get())) + { + auto col_res = ColumnUInt8::create(); + + ColumnUInt8::Container & vec_res = col_res->getData(); + vec_res.resize(col->size()); + + const ColumnString::Chars & vec_src = col->getChars(); + const ColumnString::Offsets & offsets_src = col->getOffsets(); + size_t prev_offset = 0; + UInt32 result = 0; + + for (size_t i = 0; i < vec_res.size(); ++i) + { + vec_res[i] = DB::parseIPv4(reinterpret_cast(&vec_src[prev_offset]), reinterpret_cast(&result)); + prev_offset = offsets_src[i]; + } + return col_res; + } + else + throw Exception("Illegal column " + arguments[0].column->getName() + + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_COLUMN); + } +}; + +class FunctionIsIPv6String : public FunctionIPv6StringToNum +{ +public: + static constexpr auto name = "isIPv6String"; + + static FunctionPtr create(ContextPtr) { return std::make_shared(); } + + String getName() const override { return name; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + if (!isString(arguments[0])) + throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + return std::make_shared(); + } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override + { + const ColumnPtr & column = arguments[0].column; + + if (const ColumnString * col = checkAndGetColumn(column.get())) + { + auto col_res = ColumnUInt8::create(); + + ColumnUInt8::Container & vec_res = col_res->getData(); + vec_res.resize(col->size()); + + const ColumnString::Chars & vec_src = col->getChars(); + const ColumnString::Offsets & offsets_src = col->getOffsets(); + size_t prev_offset = 0; + char v[IPV6_BINARY_LENGTH]; + + for (size_t i = 0; i < vec_res.size(); ++i) + { + vec_res[i] = DB::parseIPv6(reinterpret_cast(&vec_src[prev_offset]), reinterpret_cast(v)); + prev_offset = offsets_src[i]; + } + return col_res; + } + else + throw Exception("Illegal column " + arguments[0].column->getName() + + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_COLUMN); + } +}; + +struct NameFunctionIPv4NumToString { static constexpr auto name = "IPv4NumToString"; }; +struct NameFunctionIPv4NumToStringClassC { static constexpr auto name = 
"IPv4NumToStringClassC"; }; + +void registerFunctionsCoding(FunctionFactory & factory) +{ + factory.registerFunction(); + factory.registerFunction(); + factory.registerFunction(); + factory.registerFunction>(); + factory.registerFunction>(); + factory.registerFunction(); + factory.registerFunction(); + factory.registerFunction(); + factory.registerFunction(); + factory.registerFunction(); + factory.registerFunction(); + + factory.registerFunction>(); + factory.registerFunction>(); + + factory.registerFunction(); + factory.registerFunction(); + factory.registerFunction(); + + /// MysQL compatibility aliases: + factory.registerAlias("INET_ATON", FunctionIPv4StringToNum::name, FunctionFactory::CaseInsensitive); + factory.registerAlias("INET6_NTOA", FunctionIPv6NumToString::name, FunctionFactory::CaseInsensitive); + factory.registerAlias("INET6_ATON", FunctionIPv6StringToNum::name, FunctionFactory::CaseInsensitive); + factory.registerAlias("INET_NTOA", NameFunctionIPv4NumToString::name, FunctionFactory::CaseInsensitive); +} + +} diff --git a/src/Functions/FunctionsCodingUUID.cpp b/src/Functions/FunctionsCodingUUID.cpp new file mode 100644 index 00000000000..5f3e7b0de4a --- /dev/null +++ b/src/Functions/FunctionsCodingUUID.cpp @@ -0,0 +1,236 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int ILLEGAL_TYPE_OF_ARGUMENT; + extern const int ILLEGAL_COLUMN; +} + +constexpr size_t uuid_bytes_length = 16; +constexpr size_t uuid_text_length = 36; + +class FunctionUUIDNumToString : public IFunction +{ + +public: + static constexpr auto name = "UUIDNumToString"; + static FunctionPtr create(ContextPtr) { return std::make_shared(); } + + String getName() const override + { + return name; + } + + size_t getNumberOfArguments() const override { return 1; } + bool isInjective(const ColumnsWithTypeAndName &) const override { return true; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + const auto * ptr = checkAndGetDataType(arguments[0].get()); + if (!ptr || ptr->getN() != uuid_bytes_length) + throw Exception("Illegal type " + arguments[0]->getName() + + " of argument of function " + getName() + + ", expected FixedString(" + toString(uuid_bytes_length) + ")", + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + return std::make_shared(); + } + + bool useDefaultImplementationForConstants() const override { return true; } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override + { + const ColumnWithTypeAndName & col_type_name = arguments[0]; + const ColumnPtr & column = col_type_name.column; + + if (const auto * col_in = checkAndGetColumn(column.get())) + { + if (col_in->getN() != uuid_bytes_length) + throw Exception("Illegal type " + col_type_name.type->getName() + + " of column " + col_in->getName() + + " argument of function " + getName() + + ", expected FixedString(" + toString(uuid_bytes_length) + ")", + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + const auto size = col_in->size(); + const auto & vec_in = col_in->getChars(); + + auto col_res = ColumnString::create(); + + ColumnString::Chars & vec_res = col_res->getChars(); + ColumnString::Offsets & offsets_res = col_res->getOffsets(); + vec_res.resize(size * (uuid_text_length + 1)); + offsets_res.resize(size); + + size_t src_offset = 0; + size_t dst_offset = 0; + + for (size_t i = 0; i < size; ++i) + { 
+ formatUUID(&vec_in[src_offset], &vec_res[dst_offset]); + src_offset += uuid_bytes_length; + dst_offset += uuid_text_length; + vec_res[dst_offset] = 0; + ++dst_offset; + offsets_res[i] = dst_offset; + } + + return col_res; + } + else + throw Exception("Illegal column " + arguments[0].column->getName() + + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_COLUMN); + } +}; + + +class FunctionUUIDStringToNum : public IFunction +{ +private: + static void parseHex(const UInt8 * __restrict src, UInt8 * __restrict dst, const size_t num_bytes) + { + size_t src_pos = 0; + size_t dst_pos = 0; + for (; dst_pos < num_bytes; ++dst_pos) + { + dst[dst_pos] = unhex2(reinterpret_cast(&src[src_pos])); + src_pos += 2; + } + } + + static void parseUUID(const UInt8 * src36, UInt8 * dst16) + { + /// If string is not like UUID - implementation specific behaviour. + + parseHex(&src36[0], &dst16[0], 4); + parseHex(&src36[9], &dst16[4], 2); + parseHex(&src36[14], &dst16[6], 2); + parseHex(&src36[19], &dst16[8], 2); + parseHex(&src36[24], &dst16[10], 6); + } + +public: + static constexpr auto name = "UUIDStringToNum"; + static FunctionPtr create(ContextPtr) { return std::make_shared(); } + + String getName() const override + { + return name; + } + + size_t getNumberOfArguments() const override { return 1; } + bool isInjective(const ColumnsWithTypeAndName &) const override { return true; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + /// String or FixedString(36) + if (!isString(arguments[0])) + { + const auto * ptr = checkAndGetDataType(arguments[0].get()); + if (!ptr || ptr->getN() != uuid_text_length) + throw Exception("Illegal type " + arguments[0]->getName() + + " of argument of function " + getName() + + ", expected FixedString(" + toString(uuid_text_length) + ")", + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + } + + return std::make_shared(uuid_bytes_length); + } + + bool useDefaultImplementationForConstants() const override { return true; } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override + { + const ColumnWithTypeAndName & col_type_name = arguments[0]; + const ColumnPtr & column = col_type_name.column; + + if (const auto * col_in = checkAndGetColumn(column.get())) + { + const auto & vec_in = col_in->getChars(); + const auto & offsets_in = col_in->getOffsets(); + const size_t size = offsets_in.size(); + + auto col_res = ColumnFixedString::create(uuid_bytes_length); + + ColumnString::Chars & vec_res = col_res->getChars(); + vec_res.resize(size * uuid_bytes_length); + + size_t src_offset = 0; + size_t dst_offset = 0; + + for (size_t i = 0; i < size; ++i) + { + /// If string has incorrect length - then return zero UUID. + /// If string has correct length but contains something not like UUID - implementation specific behaviour. 
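Aside (not part of the patch): `parseUUID` above relies on the fixed 8-4-4-4-12 text layout, reading hex pairs starting at offsets 0, 9, 14, 19 and 24. A standalone sketch of the same mapping; the helper names here are illustrative stand-ins, and, as the comment notes, invalid hex characters are not detected:

```
#include <cstdint>
#include <cstdio>

/// Hex pair -> byte, assuming valid input (local stand-in for the unhex2 helper).
static uint8_t hexPair(const char * p)
{
    auto nibble = [](char c) -> uint8_t
    {
        return c <= '9' ? c - '0' : (c & ~0x20) - 'A' + 10;
    };
    return (nibble(p[0]) << 4) | nibble(p[1]);
}

/// Hex groups of the text form start at positions 0, 9, 14, 19 and 24 (skipping dashes)
/// and land at byte offsets 0, 4, 6, 8 and 10 of the 16-byte binary value.
static void parseUuidText(const char * src36, uint8_t * dst16)
{
    static const struct { int src; int dst; int bytes; } groups[] =
        {{0, 0, 4}, {9, 4, 2}, {14, 6, 2}, {19, 8, 2}, {24, 10, 6}};

    for (const auto & g : groups)
        for (int i = 0; i < g.bytes; ++i)
            dst16[g.dst + i] = hexPair(src36 + g.src + 2 * i);
}

int main()
{
    const char * text = "0123e456-e89b-12d3-a456-426614174000";
    uint8_t bin[16];
    parseUuidText(text, bin);

    for (uint8_t b : bin)
        std::printf("%02x", b);
    std::printf("\n");
}
```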
+ + size_t string_size = offsets_in[i] - src_offset; + if (string_size == uuid_text_length + 1) + parseUUID(&vec_in[src_offset], &vec_res[dst_offset]); + else + memset(&vec_res[dst_offset], 0, uuid_bytes_length); + + dst_offset += uuid_bytes_length; + src_offset += string_size; + } + + return col_res; + } + else if (const auto * col_in_fixed = checkAndGetColumn(column.get())) + { + if (col_in_fixed->getN() != uuid_text_length) + throw Exception("Illegal type " + col_type_name.type->getName() + + " of column " + col_in_fixed->getName() + + " argument of function " + getName() + + ", expected FixedString(" + toString(uuid_text_length) + ")", + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + const auto size = col_in_fixed->size(); + const auto & vec_in = col_in_fixed->getChars(); + + auto col_res = ColumnFixedString::create(uuid_bytes_length); + + ColumnString::Chars & vec_res = col_res->getChars(); + vec_res.resize(size * uuid_bytes_length); + + size_t src_offset = 0; + size_t dst_offset = 0; + + for (size_t i = 0; i < size; ++i) + { + parseUUID(&vec_in[src_offset], &vec_res[dst_offset]); + src_offset += uuid_text_length; + dst_offset += uuid_bytes_length; + } + + return col_res; + } + else + throw Exception("Illegal column " + arguments[0].column->getName() + + " of argument of function " + getName(), ErrorCodes::ILLEGAL_COLUMN); + } +}; + +void registerFunctionsCodingUUID(FunctionFactory & factory) +{ + factory.registerFunction(); + factory.registerFunction(); +} + +} diff --git a/src/Functions/FunctionsComparison.h b/src/Functions/FunctionsComparison.h index 389b150e381..b9c7c211b74 100644 --- a/src/Functions/FunctionsComparison.h +++ b/src/Functions/FunctionsComparison.h @@ -1218,17 +1218,36 @@ public: { return res; } - else if ((isColumnedAsDecimal(left_type) || isColumnedAsDecimal(right_type)) - // Comparing Date and DateTime64 requires implicit conversion, - // otherwise Date is treated as number. 
- && !(date_and_datetime && (isDate(left_type) || isDate(right_type)))) + else if ((isColumnedAsDecimal(left_type) || isColumnedAsDecimal(right_type))) { - // compare - if (!allowDecimalComparison(left_type, right_type) && !date_and_datetime) - throw Exception("No operation " + getName() + " between " + left_type->getName() + " and " + right_type->getName(), - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + // Comparing Date and DateTime64 requires implicit conversion, + if (date_and_datetime && (isDate(left_type) || isDate(right_type))) + { + DataTypePtr common_type = getLeastSupertype({left_type, right_type}); + ColumnPtr c0_converted = castColumn(col_with_type_and_name_left, common_type); + ColumnPtr c1_converted = castColumn(col_with_type_and_name_right, common_type); + return executeDecimal({c0_converted, common_type, "left"}, {c1_converted, common_type, "right"}); + } + else + { + // compare + if (!allowDecimalComparison(left_type, right_type) && !date_and_datetime) + throw Exception( + "No operation " + getName() + " between " + left_type->getName() + " and " + right_type->getName(), + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + return executeDecimal(col_with_type_and_name_left, col_with_type_and_name_right); + } - return executeDecimal(col_with_type_and_name_left, col_with_type_and_name_right); + } + else if (date_and_datetime) + { + DataTypePtr common_type = getLeastSupertype({left_type, right_type}); + ColumnPtr c0_converted = castColumn(col_with_type_and_name_left, common_type); + ColumnPtr c1_converted = castColumn(col_with_type_and_name_right, common_type); + if (!((res = executeNumLeftType(c0_converted.get(), c1_converted.get())) + || (res = executeNumLeftType(c0_converted.get(), c1_converted.get())))) + throw Exception("Date related common types can only be UInt32 or UInt64", ErrorCodes::LOGICAL_ERROR); + return res; } else if (left_type->equals(*right_type)) { diff --git a/src/Functions/FunctionsConversion.h b/src/Functions/FunctionsConversion.h index c7bbb062b40..67a02e3fd34 100644 --- a/src/Functions/FunctionsConversion.h +++ b/src/Functions/FunctionsConversion.h @@ -132,7 +132,7 @@ struct ConvertImpl if (std::is_same_v) { - if (isDate(named_from.type)) + if (isDateOrDate32(named_from.type)) throw Exception("Illegal type " + named_from.type->getName() + " of first argument of function " + Name::name, ErrorCodes::ILLEGAL_COLUMN); } @@ -285,6 +285,10 @@ struct ConvertImpl template struct ConvertImpl : DateTimeTransformImpl {}; +/** Conversion of DateTime to Date32: throw off time component. + */ +template struct ConvertImpl + : DateTimeTransformImpl {}; /** Conversion of Date to DateTime: adding 00:00:00 time component. */ @@ -297,6 +301,11 @@ struct ToDateTimeImpl return time_zone.fromDayNum(DayNum(d)); } + static inline UInt32 execute(Int32 d, const DateLUTImpl & time_zone) + { + return time_zone.fromDayNum(ExtendedDayNum(d)); + } + static inline UInt32 execute(UInt32 dt, const DateLUTImpl & /*time_zone*/) { return dt; @@ -312,6 +321,9 @@ struct ToDateTimeImpl template struct ConvertImpl : DateTimeTransformImpl {}; +template struct ConvertImpl + : DateTimeTransformImpl {}; + /// Implementation of toDate function. template @@ -322,7 +334,7 @@ struct ToDateTransform32Or64 static inline NO_SANITIZE_UNDEFINED ToType execute(const FromType & from, const DateLUTImpl & time_zone) { // since converting to Date, no need in values outside of default LUT range. - return (from < 0xFFFF) + return (from < DATE_LUT_MAX_DAY_NUM) ? 
from : time_zone.toDayNum(std::min(time_t(from), time_t(0xFFFFFFFF))); } @@ -339,7 +351,7 @@ struct ToDateTransform32Or64Signed /// The function should be monotonic (better for query optimizations), so we saturate instead of overflow. if (from < 0) return 0; - return (from < 0xFFFF) + return (from < DATE_LUT_MAX_DAY_NUM) ? from : time_zone.toDayNum(std::min(time_t(from), time_t(0xFFFFFFFF))); } @@ -358,6 +370,48 @@ struct ToDateTransform8Or16Signed } }; +/// Implementation of toDate32 function. + +template +struct ToDate32Transform32Or64 +{ + static constexpr auto name = "toDate32"; + + static inline NO_SANITIZE_UNDEFINED ToType execute(const FromType & from, const DateLUTImpl & time_zone) + { + return (from < DATE_LUT_MAX_EXTEND_DAY_NUM) + ? from + : time_zone.toDayNum(std::min(time_t(from), time_t(0xFFFFFFFF))); + } +}; + +template +struct ToDate32Transform32Or64Signed +{ + static constexpr auto name = "toDate32"; + + static inline NO_SANITIZE_UNDEFINED ToType execute(const FromType & from, const DateLUTImpl & time_zone) + { + static const Int32 daynum_min_offset = -static_cast(DateLUT::instance().getDayNumOffsetEpoch()); + if (from < daynum_min_offset) + return daynum_min_offset; + return (from < DATE_LUT_MAX_EXTEND_DAY_NUM) + ? from + : time_zone.toDayNum(std::min(time_t(from), time_t(0xFFFFFFFF))); + } +}; + +template +struct ToDate32Transform8Or16Signed +{ + static constexpr auto name = "toDate32"; + + static inline NO_SANITIZE_UNDEFINED ToType execute(const FromType & from, const DateLUTImpl &) + { + return from; + } +}; + /** Special case of converting Int8, Int16, (U)Int32 or (U)Int64 (and also, for convenience, * Float32, Float64) to Date. If the number is negative, saturate it to unix epoch time. If the * number is less than 65536, then it is treated as DayNum, and if it's greater or equals to 65536, @@ -384,6 +438,23 @@ template struct ConvertImpl struct ConvertImpl : DateTimeTransformImpl> {}; +template struct ConvertImpl + : DateTimeTransformImpl> {}; +template struct ConvertImpl + : DateTimeTransformImpl> {}; +template struct ConvertImpl + : DateTimeTransformImpl> {}; +template struct ConvertImpl + : DateTimeTransformImpl> {}; +template struct ConvertImpl + : DateTimeTransformImpl> {}; +template struct ConvertImpl + : DateTimeTransformImpl> {}; +template struct ConvertImpl + : DateTimeTransformImpl> {}; +template struct ConvertImpl + : DateTimeTransformImpl> {}; + template struct ToDateTimeTransform64 @@ -599,6 +670,17 @@ struct FormatImpl } }; +template <> +struct FormatImpl +{ + template + static ReturnType execute(const DataTypeDate::FieldType x, WriteBuffer & wb, const DataTypeDate32 *, const DateLUTImpl *) + { + writeDateText(ExtendedDayNum(x), wb); + return ReturnType(true); + } +}; + template <> struct FormatImpl { @@ -867,7 +949,9 @@ inline bool tryParseImpl(DataTypeDate32::FieldType & x, ReadBuff { ExtendedDayNum tmp(0); if (!tryReadDateText(tmp, rb)) + { return false; + } x = tmp; return true; } @@ -1096,7 +1180,9 @@ struct ConvertThroughParsing SerializationDecimal::readText( vec_to[i], read_buffer, ToDataType::maxPrecision(), vec_to.getScale()); else + { parseImpl(vec_to[i], read_buffer, local_time_zone); + } } if (!isAllRead(read_buffer)) @@ -1146,7 +1232,16 @@ struct ConvertThroughParsing parsed = false; if (!parsed) - vec_to[i] = static_cast(0); + { + if constexpr (std::is_same_v) + { + vec_to[i] = -static_cast(DateLUT::instance().getDayNumOffsetEpoch()); + } + else + { + vec_to[i] = static_cast(0); + } + } if constexpr (exception_mode == 
ConvertFromStringExceptionMode::Null) (*vec_null_map_to)[i] = !parsed; @@ -1420,6 +1515,8 @@ public: || std::is_same_v // toDate(value[, timezone : String]) || std::is_same_v // TODO: shall we allow timestamp argument for toDate? DateTime knows nothing about timezones and this argument is ignored below. + // toDate(value[, timezone : String]) + || std::is_same_v // toDateTime(value[, timezone: String]) || std::is_same_v // toDateTime64(value, scale : Integer[, timezone: String]) @@ -1607,7 +1704,9 @@ private: result_column = ConvertImpl::execute(arguments, result_type, input_rows_count); } else + { result_column = ConvertImpl::execute(arguments, result_type, input_rows_count); + } return true; }; @@ -1979,7 +2078,7 @@ struct ToDateMonotonicity static IFunction::Monotonicity get(const IDataType & type, const Field & left, const Field & right) { auto which = WhichDataType(type); - if (which.isDate() || which.isDate32() || which.isDateTime() || which.isDateTime64() || which.isInt8() || which.isInt16() || which.isUInt8() || which.isUInt16()) + if (which.isDateOrDate32() || which.isDateTime() || which.isDateTime64() || which.isInt8() || which.isInt16() || which.isUInt8() || which.isUInt16()) return {true, true, true}; else if ( (which.isUInt() && ((left.isNull() || left.get() < 0xFFFF) && (right.isNull() || right.get() >= 0xFFFF))) @@ -2021,8 +2120,8 @@ struct ToStringMonotonicity if (const auto * low_cardinality_type = checkAndGetDataType(type_ptr)) type_ptr = low_cardinality_type->getDictionaryType().get(); - /// `toString` function is monotonous if the argument is Date or DateTime or String, or non-negative numbers with the same number of symbols. - if (checkDataTypes(type_ptr)) + /// `toString` function is monotonous if the argument is Date or Date32 or DateTime or String, or non-negative numbers with the same number of symbols. 
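Aside (not part of the patch): the comment being extended here is easy to verify. For non-negative integers with the same number of digits, the lexicographic order of the rendered strings coincides with numeric order, which is exactly the property the monotonicity analysis relies on:

```
#include <cassert>
#include <string>

int main()
{
    /// Equal digit count: string order matches numeric order.
    assert(std::to_string(123) < std::to_string(456));
    assert(std::to_string(20210101) < std::to_string(20210102));

    /// Different digit count: the orders diverge, so monotonicity only holds per width.
    assert(std::to_string(9) > std::to_string(10));
}
```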
+ if (checkDataTypes(type_ptr)) return positive; if (left.isNull() || right.isNull()) @@ -2110,6 +2209,7 @@ template <> struct FunctionTo { using Type = FunctionToInt256; } template <> struct FunctionTo { using Type = FunctionToFloat32; }; template <> struct FunctionTo { using Type = FunctionToFloat64; }; template <> struct FunctionTo { using Type = FunctionToDate; }; +template <> struct FunctionTo { using Type = FunctionToDate32; }; template <> struct FunctionTo { using Type = FunctionToDateTime; }; template <> struct FunctionTo { using Type = FunctionToDateTime64; }; template <> struct FunctionTo { using Type = FunctionToUUID; }; @@ -2502,7 +2602,7 @@ private: UInt32 scale = to_type->getScale(); WhichDataType which(type_index); - bool ok = which.isNativeInt() || which.isNativeUInt() || which.isDecimal() || which.isFloat() || which.isDate() || which.isDateTime() || which.isDateTime64() + bool ok = which.isNativeInt() || which.isNativeUInt() || which.isDecimal() || which.isFloat() || which.isDateOrDate32() || which.isDateTime() || which.isDateTime64() || which.isStringOrFixedString(); if (!ok) { @@ -3164,6 +3264,7 @@ private: std::is_same_v || std::is_same_v || std::is_same_v || + std::is_same_v || std::is_same_v || std::is_same_v) { @@ -3263,6 +3364,8 @@ public: return monotonicityForType(type); if (const auto * type = checkAndGetDataType(to_type)) return monotonicityForType(type); + if (const auto * type = checkAndGetDataType(to_type)) + return monotonicityForType(type); if (const auto * type = checkAndGetDataType(to_type)) return monotonicityForType(type); if (const auto * type = checkAndGetDataType(to_type)) diff --git a/src/Functions/FunctionsStringArray.cpp b/src/Functions/FunctionsStringArray.cpp index 14092d7dd3d..765317093c1 100644 --- a/src/Functions/FunctionsStringArray.cpp +++ b/src/Functions/FunctionsStringArray.cpp @@ -9,6 +9,8 @@ void registerFunctionsStringArray(FunctionFactory & factory) { factory.registerFunction(); factory.registerFunction(); + factory.registerFunction(); + factory.registerFunction(); factory.registerFunction(); factory.registerFunction(); factory.registerFunction(); diff --git a/src/Functions/FunctionsStringArray.h b/src/Functions/FunctionsStringArray.h index 27f10797651..4d2312f207c 100644 --- a/src/Functions/FunctionsStringArray.h +++ b/src/Functions/FunctionsStringArray.h @@ -33,6 +33,9 @@ namespace ErrorCodes * splitByString(sep, s) * splitByRegexp(regexp, s) * + * splitByWhitespace(s) - split the string by whitespace characters + * splitByNonAlpha(s) - split the string by whitespace and punctuation characters + * * extractAll(s, regexp) - select from the string the subsequences corresponding to the regexp. * - first subpattern, if regexp has subpattern; * - zero subpattern (the match part, otherwise); @@ -111,6 +114,121 @@ public: } }; +class SplitByNonAlphaImpl +{ +private: + Pos pos; + Pos end; + +public: + /// Get the name of the function. + static constexpr auto name = "splitByNonAlpha"; + static String getName() { return name; } + + static size_t getNumberOfArguments() { return 1; } + + /// Check the type of the function's arguments. + static void checkArguments(const DataTypes & arguments) + { + if (!isString(arguments[0])) + throw Exception("Illegal type " + arguments[0]->getName() + " of first argument of function " + getName() + ". Must be String.", + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + } + + /// Initialize by the function arguments. + void init(const ColumnsWithTypeAndName & /*arguments*/) {} + + /// Called for each next string. 
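Aside (not part of the patch): both new splitter classes implement the same two-pointer scan in their `get()` methods just below; skip separators, then collect characters up to the next separator. A self-contained sketch of that loop, using `<cctype>` in place of ClickHouse's `isWhitespaceASCII`/`isPunctuationASCII`:

```
#include <cctype>
#include <cstdio>
#include <string>
#include <vector>

static std::vector<std::string> splitByWhitespaceAndPunct(const std::string & s)
{
    std::vector<std::string> tokens;
    const char * pos = s.data();
    const char * end = s.data() + s.size();

    auto is_sep = [](unsigned char c) { return std::isspace(c) || std::ispunct(c); };

    while (pos < end)
    {
        while (pos < end && is_sep(*pos))   /// skip garbage
            ++pos;
        const char * token_begin = pos;
        while (pos < end && !is_sep(*pos))  /// collect the token
            ++pos;
        if (token_begin != pos)
            tokens.emplace_back(token_begin, pos);
    }
    return tokens;
}

int main()
{
    for (const auto & t : splitByWhitespaceAndPunct("Hello, brave  new world!"))
        std::printf("[%s]\n", t.c_str());   /// [Hello] [brave] [new] [world]
}
```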
+ void set(Pos pos_, Pos end_) + { + pos = pos_; + end = end_; + } + + /// Returns the position of the argument, that is the column of strings + size_t getStringsArgumentPosition() + { + return 0; + } + + /// Get the next token, if any, or return false. + bool get(Pos & token_begin, Pos & token_end) + { + /// Skip garbage + while (pos < end && (isWhitespaceASCII(*pos) || isPunctuationASCII(*pos))) + ++pos; + + if (pos == end) + return false; + + token_begin = pos; + + while (pos < end && !(isWhitespaceASCII(*pos) || isPunctuationASCII(*pos))) + ++pos; + + token_end = pos; + + return true; + } +}; + +class SplitByWhitespaceImpl +{ +private: + Pos pos; + Pos end; + +public: + /// Get the name of the function. + static constexpr auto name = "splitByWhitespace"; + static String getName() { return name; } + + static size_t getNumberOfArguments() { return 1; } + + /// Check the type of the function's arguments. + static void checkArguments(const DataTypes & arguments) + { + if (!isString(arguments[0])) + throw Exception("Illegal type " + arguments[0]->getName() + " of first argument of function " + getName() + ". Must be String.", + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + } + + /// Initialize by the function arguments. + void init(const ColumnsWithTypeAndName & /*arguments*/) {} + + /// Called for each next string. + void set(Pos pos_, Pos end_) + { + pos = pos_; + end = end_; + } + + /// Returns the position of the argument, that is the column of strings + size_t getStringsArgumentPosition() + { + return 0; + } + + /// Get the next token, if any, or return false. + bool get(Pos & token_begin, Pos & token_end) + { + /// Skip garbage + while (pos < end && isWhitespaceASCII(*pos)) + ++pos; + + if (pos == end) + return false; + + token_begin = pos; + + while (pos < end && !isWhitespaceASCII(*pos)) + ++pos; + + token_end = pos; + + return true; + } +}; class SplitByCharImpl { @@ -662,6 +780,8 @@ public: using FunctionAlphaTokens = FunctionTokens; +using FunctionSplitByNonAlpha = FunctionTokens; +using FunctionSplitByWhitespace = FunctionTokens; using FunctionSplitByChar = FunctionTokens; using FunctionSplitByString = FunctionTokens; using FunctionSplitByRegexp = FunctionTokens; diff --git a/src/Functions/IFunction.cpp b/src/Functions/IFunction.cpp index 998d48941ba..e3802b98abf 100644 --- a/src/Functions/IFunction.cpp +++ b/src/Functions/IFunction.cpp @@ -181,7 +181,10 @@ ColumnPtr IExecutableFunction::defaultImplementationForNulls( { // Default implementation for nulls returns null result for null arguments, // so the result type must be nullable. 
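Aside (not part of the patch): the hunk below swaps an `assert` for an explicit check that throws a descriptive logical error, so the invariant is still enforced in release builds. The same pattern in miniature, with `std::logic_error` standing in for ClickHouse's `Exception` with `ErrorCodes::LOGICAL_ERROR`:

```
#include <cstdio>
#include <stdexcept>
#include <string>

/// Check an invariant that depends on runtime data and report it with context;
/// unlike assert(), the check does not vanish when NDEBUG is defined.
static void checkResultTypeIsNullable(bool is_nullable, const std::string & type_name)
{
    if (!is_nullable)
        throw std::logic_error(
            "Default implementation for Nulls is expected to return a Nullable result, got " + type_name);
}

int main()
{
    try
    {
        checkResultTypeIsNullable(false, "UInt8");
    }
    catch (const std::logic_error & e)
    {
        std::fprintf(stderr, "%s\n", e.what());
    }
}
```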
- assert(result_type->isNullable()); + if (!result_type->isNullable()) + throw Exception(ErrorCodes::LOGICAL_ERROR, + "Function {} with Null argument and default implementation for Nulls " + "is expected to return Nullable result, got {}", result_type->getName()); return result_type->createColumnConstWithDefaultValue(input_rows_count); } diff --git a/src/Functions/Regexps.h b/src/Functions/Regexps.h index 11f3e31e22e..5da1224ab8c 100644 --- a/src/Functions/Regexps.h +++ b/src/Functions/Regexps.h @@ -113,12 +113,34 @@ namespace MultiRegexps ScratchPtr scratch; }; + class RegexpsConstructor + { + public: + RegexpsConstructor() = default; + + void setConstructor(std::function constructor_) { constructor = std::move(constructor_); } + + Regexps * operator()() + { + std::unique_lock lock(mutex); + if (regexp) + return &*regexp; + regexp = constructor(); + return &*regexp; + } + + private: + std::function constructor; + std::optional regexp; + std::mutex mutex; + }; + struct Pool { /// Mutex for finding in map. std::mutex mutex; /// Patterns + possible edit_distance to database and scratch. - std::map, std::optional>, Regexps> storage; + std::map, std::optional>, RegexpsConstructor> storage; }; template @@ -250,15 +272,19 @@ namespace MultiRegexps /// If not found, compile and let other threads wait. if (known_regexps.storage.end() == it) + { it = known_regexps.storage - .emplace( - std::pair{str_patterns, edit_distance}, - constructRegexps(str_patterns, edit_distance)) + .emplace(std::piecewise_construct, std::make_tuple(std::move(str_patterns), edit_distance), std::make_tuple()) .first; - /// If found, unlock and return the database. - lock.unlock(); + it->second.setConstructor([&str_patterns = it->first.first, edit_distance]() + { + return constructRegexps(str_patterns, edit_distance); + }); + } - return &it->second; + /// Unlock before possible construction. + lock.unlock(); + return it->second(); } } diff --git a/src/Functions/array/arrayIndex.h b/src/Functions/array/arrayIndex.h index f3b279faaef..a390abc4eaf 100644 --- a/src/Functions/array/arrayIndex.h +++ b/src/Functions/array/arrayIndex.h @@ -58,10 +58,10 @@ struct CountEqualAction namespace Impl { template < - class ConcreteAction, + typename ConcreteAction, bool RightArgIsConstant = false, - class IntegralInitial = UInt64, - class IntegralResult = UInt64> + typename IntegralInitial = UInt64, + typename IntegralResult = UInt64> struct Main { private: @@ -94,13 +94,13 @@ private: } /// LowCardinality - static bool compare(const IColumn & left, const Result& right, size_t i, size_t) + static bool compare(const IColumn & left, const Result & right, size_t i, size_t) { return left.getUInt(i) == right; } /// Generic - static bool compare(const IColumn& left, const IColumn& right, size_t i, size_t j) + static bool compare(const IColumn & left, const IColumn & right, size_t i, size_t j) { return 0 == left.compareAt(i, RightArgIsConstant ? 
0 : j, right, 1); } @@ -109,7 +109,7 @@ private: static constexpr bool hasNull(const NullMap * const null_map, size_t i) noexcept { return (*null_map)[i]; } - template + template static void process( const Data & data, const ArrOffsets & offsets, const Target & target, ResultArr & result, [[maybe_unused]] const NullMap * const null_map_data, @@ -148,7 +148,7 @@ private: continue; } else if (!compare(data, target, current_offset + j, i)) - continue; + continue; ConcreteAction::apply(current, j); @@ -162,7 +162,7 @@ private: } public: - template + template static void vector( const Data & data, const ArrOffsets & offsets, @@ -183,7 +183,7 @@ public: }; /// When the 2nd function argument is a NULL value. -template +template struct Null { using ResultType = typename ConcreteAction::ResultType; @@ -227,7 +227,7 @@ struct Null } }; -template +template struct String { private: @@ -350,7 +350,7 @@ public: }; } -template +template class FunctionArrayIndex : public IFunction { public: @@ -565,7 +565,7 @@ private: * Integral s = {s1, s2, ...} * (s1, s1, s2, ...), (s2, s1, s2, ...), (s3, s1, s2, ...) */ - template + template static inline ColumnPtr executeIntegral(const ColumnsWithTypeAndName & arguments) { const ColumnArray * const left = checkAndGetColumn(arguments[0].column.get()); @@ -590,14 +590,14 @@ private: return nullptr; } - template + template static inline bool executeIntegral(ExecutionData& data) { return (executeIntegralExpanded(data) || ...); } /// Invoke executeIntegralImpl with such parameters: (A, other1), (A, other2), ... - template + template static inline bool executeIntegralExpanded(ExecutionData& data) { return (executeIntegralImpl(data) || ...); @@ -608,7 +608,7 @@ private: * second argument, namely, the @e value, so it's possible to invoke the has(Array(Int8), UInt64) e.g. * so we have to check all possible variants for #Initial and #Resulting types. */ - template + template static bool executeIntegralImpl(ExecutionData& data) { const ColumnVector * col_nested = checkAndGetColumn>(&data.left); @@ -647,7 +647,7 @@ private: } /** - * Catches arguments of type LC(T) (left) and U (right). + * Catches arguments of type LowCardinality(T) (left) and U (right). * * The perftests * https://clickhouse-test-reports.s3.yandex.net/12550/2d27fa0fa8c198a82bf1fe3625050ccf56695976/integration_tests_(release).html @@ -726,7 +726,7 @@ private: return col_result; } - else if (col_lc->nestedIsNullable()) // LC(Nullable(T)) and U + else if (col_lc->nestedIsNullable()) // LowCardinality(Nullable(T)) and U { const ColumnPtr left_casted = col_lc->convertToFullColumnIfLowCardinality(); // Nullable(T) const ColumnNullable& left_nullable = *checkAndGetColumn(left_casted.get()); @@ -746,16 +746,17 @@ private: ? 
right_nullable->getNestedColumn() : *right_casted.get(); - ExecutionData data = { + ExecutionData data = + { left_ptr, right_ptr, col_array->getOffsets(), nullptr, {null_map_left_casted, null_map_right_casted}}; - if (dispatchConvertedLCColumns(data)) + if (dispatchConvertedLowCardinalityColumns(data)) return data.result_column; } - else // LC(T) and U, T not Nullable + else // LowCardinality(T) and U, T not Nullable { if (col_arg.isNullable()) return nullptr; @@ -764,24 +765,25 @@ private: arg_lc && arg_lc->isNullable()) return nullptr; - // LC(T) and U (possibly LC(V)) + // LowCardinality(T) and U (possibly LowCardinality(V)) const ColumnPtr left_casted = col_lc->convertToFullColumnIfLowCardinality(); const ColumnPtr right_casted = col_arg.convertToFullColumnIfLowCardinality(); - ExecutionData data = { + ExecutionData data = + { *left_casted.get(), *right_casted.get(), col_array->getOffsets(), nullptr, {null_map_data, null_map_item} }; - if (dispatchConvertedLCColumns(data)) + if (dispatchConvertedLowCardinalityColumns(data)) return data.result_column; } return nullptr; } - static bool dispatchConvertedLCColumns(ExecutionData& data) + static bool dispatchConvertedLowCardinalityColumns(ExecutionData & data) { if (data.left.isNumeric() && data.right.isNumeric()) // ColumnArrays return executeIntegral(data); diff --git a/src/Functions/array/mapPopulateSeries.cpp b/src/Functions/array/mapPopulateSeries.cpp index eb2f6192346..51e436e8022 100644 --- a/src/Functions/array/mapPopulateSeries.cpp +++ b/src/Functions/array/mapPopulateSeries.cpp @@ -1,4 +1,5 @@ #include +#include #include #include #include @@ -7,6 +8,7 @@ #include #include #include "Core/ColumnWithTypeAndName.h" +#include "DataTypes/DataTypeMap.h" #include "DataTypes/IDataType.h" namespace DB @@ -32,85 +34,211 @@ private: bool isVariadic() const override { return true; } bool useDefaultImplementationForConstants() const override { return true; } - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + void checkTypes(const DataTypePtr & key_type, const DataTypePtr max_key_type) const + { + WhichDataType which_key(key_type); + if (!(which_key.isInt() || which_key.isUInt())) + { + throw Exception( + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Keys for {} function should be of integer type (signed or unsigned)", getName()); + } + + if (max_key_type) + { + WhichDataType which_max_key(max_key_type); + + if (which_max_key.isNullable()) + throw Exception( + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, + "Max key argument in arguments of function " + getName() + " can not be Nullable"); + + if (key_type->getTypeId() != max_key_type->getTypeId()) + throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Max key type in {} should be same as keys type", getName()); + } + } + + DataTypePtr getReturnTypeForTuple(const DataTypes & arguments) const { if (arguments.size() < 2) - throw Exception{getName() + " accepts at least two arrays for key and value", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH}; + throw Exception( + ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, "Function {} accepts at least two arrays for key and value", getName()); if (arguments.size() > 3) - throw Exception{"too many arguments in " + getName() + " call", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH}; + throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, "Too many arguments in {} call", getName()); const DataTypeArray * key_array_type = checkAndGetDataType(arguments[0].get()); const DataTypeArray * val_array_type = checkAndGetDataType(arguments[1].get()); 
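The Regexps.h change earlier in this diff replaces eagerly compiled pattern databases with a per-entry holder that compiles on first use, so the global map mutex can be released before the expensive compilation runs. A minimal standalone sketch of that lazy, mutex-guarded construction pattern, using a placeholder `Database` struct instead of the real Hyperscan wrapper (the names here are illustrative, not ClickHouse APIs):

```
#include <functional>
#include <iostream>
#include <mutex>
#include <optional>
#include <string>
#include <vector>

/// Placeholder for the expensive-to-build object (the real code builds a Hyperscan database).
struct Database
{
    std::vector<std::string> patterns;
};

/// Builds its value on first call; concurrent callers block on the per-entry mutex,
/// so the map that owns these entries can stay unlocked while compilation runs.
class LazyHolder
{
public:
    void setConstructor(std::function<Database()> constructor_) { constructor = std::move(constructor_); }

    Database * operator()()
    {
        std::lock_guard<std::mutex> lock(mutex);
        if (!value)
            value = constructor();   /// Only the first caller pays the construction cost.
        return &*value;
    }

private:
    std::function<Database()> constructor;
    std::optional<Database> value;
    std::mutex mutex;
};

int main()
{
    LazyHolder holder;
    holder.setConstructor([] { return Database{{"foo.*", "bar?"}}; });

    /// Both calls return the same lazily built object.
    std::cout << holder()->patterns.size() << ' ' << holder()->patterns.size() << '\n';
}
```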
if (!key_array_type || !val_array_type) - throw Exception{getName() + " accepts two arrays for key and value", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT}; + throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Function {} accepts two arrays for key and value", getName()); - DataTypePtr keys_type = key_array_type->getNestedType(); - WhichDataType which_key(keys_type); - if (!(which_key.isNativeInt() || which_key.isNativeUInt())) - { - throw Exception( - "Keys for " + getName() + " should be of native integer type (signed or unsigned)", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - } + const auto & key_type = key_array_type->getNestedType(); if (arguments.size() == 3) - { - DataTypePtr max_key_type = arguments[2]; - WhichDataType which_max_key(max_key_type); - - if (which_max_key.isNullable()) - throw Exception( - "Max key argument in arguments of function " + getName() + " can not be Nullable", - ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - if (keys_type->getTypeId() != max_key_type->getTypeId()) - throw Exception("Max key type in " + getName() + " should be same as keys type", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - } + this->checkTypes(key_type, arguments[2]); + else + this->checkTypes(key_type, nullptr); return std::make_shared(DataTypes{arguments[0], arguments[1]}); } - template - ColumnPtr execute2(ColumnPtr key_column, ColumnPtr val_column, ColumnPtr max_key_column, const DataTypeTuple & res_type) const + DataTypePtr getReturnTypeForMap(const DataTypes & arguments) const { - MutableColumnPtr res_tuple = res_type.createColumn(); + const auto * map = assert_cast(arguments[0].get()); + if (arguments.size() == 1) + this->checkTypes(map->getKeyType(), nullptr); + else if (arguments.size() == 2) + this->checkTypes(map->getKeyType(), arguments[1]); + else + throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, "Too many arguments in {} call", getName()); - auto * to_tuple = assert_cast(res_tuple.get()); - auto & to_keys_arr = assert_cast(to_tuple->getColumn(0)); - auto & to_keys_data = to_keys_arr.getData(); - auto & to_keys_offsets = to_keys_arr.getOffsets(); + return std::make_shared(map->getKeyType(), map->getValueType()); + } - auto & to_vals_arr = assert_cast(to_tuple->getColumn(1)); - auto & to_values_data = to_vals_arr.getData(); + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + if (arguments.empty()) + throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, getName() + " accepts at least one map or two arrays"); - bool max_key_is_const = false, key_is_const = false, val_is_const = false; + if (arguments[0]->getTypeId() == TypeIndex::Array) + return getReturnTypeForTuple(arguments); + else if (arguments[0]->getTypeId() == TypeIndex::Map) + return getReturnTypeForMap(arguments); + else + throw Exception( + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, + "Function {} only accepts one map or arrays, but got {}", + getName(), + arguments[0]->getName()); + } - const auto * keys_array = checkAndGetColumn(key_column.get()); - if (!keys_array) + // Struct holds input and output columns references, + // Both arrays and maps have similar columns to work with but extracted differently + template + struct ColumnsInOut + { + // inputs + const PaddedPODArray & in_keys_data; + const PaddedPODArray & in_vals_data; + const IColumn::Offsets & in_key_offsets; + const IColumn::Offsets & in_val_offsets; + size_t row_count; + bool key_is_const; + bool val_is_const; + + // outputs + PaddedPODArray & out_keys_data; + PaddedPODArray & out_vals_data; + + IColumn::Offsets & 
out_keys_offsets; + // with map argument this field will not be used + IColumn::Offsets * out_vals_offsets; + }; + + template + ColumnsInOut getInOutDataFromArrays(MutableColumnPtr & res_column, ColumnPtr * arg_columns) const + { + auto * out_tuple = assert_cast(res_column.get()); + auto & out_keys_array = assert_cast(out_tuple->getColumn(0)); + auto & out_vals_array = assert_cast(out_tuple->getColumn(1)); + + const auto * key_column = arg_columns[0].get(); + const auto * in_keys_array = checkAndGetColumn(key_column); + + bool key_is_const = false, val_is_const = false; + + if (!in_keys_array) { - const ColumnConst * const_array = checkAndGetColumnConst(key_column.get()); + const ColumnConst * const_array = checkAndGetColumnConst(key_column); if (!const_array) - throw Exception("Expected array column, found " + key_column->getName(), ErrorCodes::ILLEGAL_COLUMN); + throw Exception( + ErrorCodes::ILLEGAL_COLUMN, "Expected array column in function {}, found {}", getName(), key_column->getName()); - keys_array = checkAndGetColumn(const_array->getDataColumnPtr().get()); + in_keys_array = checkAndGetColumn(const_array->getDataColumnPtr().get()); key_is_const = true; } - const auto * values_array = checkAndGetColumn(val_column.get()); - if (!values_array) + const auto * val_column = arg_columns[1].get(); + const auto * in_values_array = checkAndGetColumn(val_column); + if (!in_values_array) { - const ColumnConst * const_array = checkAndGetColumnConst(val_column.get()); + const ColumnConst * const_array = checkAndGetColumnConst(val_column); if (!const_array) - throw Exception("Expected array column, found " + val_column->getName(), ErrorCodes::ILLEGAL_COLUMN); + throw Exception( + ErrorCodes::ILLEGAL_COLUMN, "Expected array column in function {}, found {}", getName(), val_column->getName()); - values_array = checkAndGetColumn(const_array->getDataColumnPtr().get()); + in_values_array = checkAndGetColumn(const_array->getDataColumnPtr().get()); val_is_const = true; } - if (!keys_array || !values_array) + if (!in_keys_array || !in_values_array) /* something went wrong */ - throw Exception{"Illegal columns in arguments of function " + getName(), ErrorCodes::ILLEGAL_COLUMN}; + throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Illegal columns in arguments of function " + getName()); + + const auto & in_keys_data = assert_cast &>(in_keys_array->getData()).getData(); + const auto & in_values_data = assert_cast &>(in_values_array->getData()).getData(); + const auto & in_keys_offsets = in_keys_array->getOffsets(); + const auto & in_vals_offsets = in_values_array->getOffsets(); + + auto & out_keys_data = assert_cast &>(out_keys_array.getData()).getData(); + auto & out_vals_data = assert_cast &>(out_vals_array.getData()).getData(); + auto & out_keys_offsets = out_keys_array.getOffsets(); + + size_t row_count = key_is_const ? 
in_values_array->size() : in_keys_array->size(); + IColumn::Offsets * out_vals_offsets = &out_vals_array.getOffsets(); + + return { + in_keys_data, + in_values_data, + in_keys_offsets, + in_vals_offsets, + row_count, + key_is_const, + val_is_const, + out_keys_data, + out_vals_data, + out_keys_offsets, + out_vals_offsets}; + } + + template + ColumnsInOut getInOutDataFromMap(MutableColumnPtr & res_column, ColumnPtr * arg_columns) const + { + const auto * in_map = assert_cast(arg_columns[0].get()); + const auto & in_nested_array = in_map->getNestedColumn(); + const auto & in_nested_tuple = in_map->getNestedData(); + const auto & in_keys_data = assert_cast &>(in_nested_tuple.getColumn(0)).getData(); + const auto & in_vals_data = assert_cast &>(in_nested_tuple.getColumn(1)).getData(); + const auto & in_keys_offsets = in_nested_array.getOffsets(); + + auto * out_map = assert_cast(res_column.get()); + auto & out_nested_array = out_map->getNestedColumn(); + auto & out_nested_tuple = out_map->getNestedData(); + auto & out_keys_data = assert_cast &>(out_nested_tuple.getColumn(0)).getData(); + auto & out_vals_data = assert_cast &>(out_nested_tuple.getColumn(1)).getData(); + auto & out_keys_offsets = out_nested_array.getOffsets(); + + return { + in_keys_data, + in_vals_data, + in_keys_offsets, + in_keys_offsets, + in_nested_array.size(), + false, + false, + out_keys_data, + out_vals_data, + out_keys_offsets, + nullptr}; + } + + template + ColumnPtr execute2(ColumnPtr * arg_columns, ColumnPtr max_key_column, const DataTypePtr & res_type) const + { + MutableColumnPtr res_column = res_type->createColumn(); + bool max_key_is_const = false; + auto columns = res_column->getDataType() == TypeIndex::Tuple ? getInOutDataFromArrays(res_column, arg_columns) + : getInOutDataFromMap(res_column, arg_columns); KeyType max_key_const{0}; @@ -121,49 +249,43 @@ private: max_key_is_const = true; } - auto & keys_data = assert_cast &>(keys_array->getData()).getData(); - auto & values_data = assert_cast &>(values_array->getData()).getData(); - - // Original offsets - const IColumn::Offsets & key_offsets = keys_array->getOffsets(); - const IColumn::Offsets & val_offsets = values_array->getOffsets(); - IColumn::Offset offset{0}; - size_t row_count = key_is_const ? values_array->size() : keys_array->size(); - std::map res_map; //Iterate through two arrays and fill result values. - for (size_t row = 0; row < row_count; ++row) + for (size_t row = 0; row < columns.row_count; ++row) { - size_t key_offset = 0, val_offset = 0, array_size = key_offsets[0], val_array_size = val_offsets[0]; + size_t key_offset = 0, val_offset = 0, items_count = columns.in_key_offsets[0], val_array_size = columns.in_val_offsets[0]; res_map.clear(); - if (!key_is_const) + if (!columns.key_is_const) { - key_offset = row > 0 ? key_offsets[row - 1] : 0; - array_size = key_offsets[row] - key_offset; + key_offset = row > 0 ? columns.in_key_offsets[row - 1] : 0; + items_count = columns.in_key_offsets[row] - key_offset; } - if (!val_is_const) + if (!columns.val_is_const) { - val_offset = row > 0 ? val_offsets[row - 1] : 0; - val_array_size = val_offsets[row] - val_offset; + val_offset = row > 0 ? 
columns.in_val_offsets[row - 1] : 0; + val_array_size = columns.in_val_offsets[row] - val_offset; } - if (array_size != val_array_size) - throw Exception("Key and value array should have same amount of elements", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); + if (items_count != val_array_size) + throw Exception( + ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, + "Key and value array should have same amount of elements in function {}", + getName()); - if (array_size == 0) + if (items_count == 0) { - to_keys_offsets.push_back(offset); + columns.out_keys_offsets.push_back(offset); continue; } - for (size_t i = 0; i < array_size; ++i) + for (size_t i = 0; i < items_count; ++i) { - res_map.insert({keys_data[key_offset + i], values_data[val_offset + i]}); + res_map.insert({columns.in_keys_data[key_offset + i], columns.in_vals_data[val_offset + i]}); } auto min_key = res_map.begin()->first; @@ -184,7 +306,7 @@ private: /* no need to add anything, max key is less that first key */ if (max_key < min_key) { - to_keys_offsets.push_back(offset); + columns.out_keys_offsets.push_back(offset); continue; } } @@ -197,16 +319,16 @@ private: KeyType key; for (key = min_key;; ++key) { - to_keys_data.insert(key); + columns.out_keys_data.push_back(key); auto it = res_map.find(key); if (it != res_map.end()) { - to_values_data.insert(it->second); + columns.out_vals_data.push_back(it->second); } else { - to_values_data.insertDefault(); + columns.out_vals_data.push_back(0); } ++offset; @@ -214,80 +336,112 @@ private: break; } - to_keys_offsets.push_back(offset); + columns.out_keys_offsets.push_back(offset); } - to_vals_arr.getOffsets().insert(to_keys_offsets.begin(), to_keys_offsets.end()); - return res_tuple; + if (columns.out_vals_offsets) + columns.out_vals_offsets->insert(columns.out_keys_offsets.begin(), columns.out_keys_offsets.end()); + + return res_column; } template - ColumnPtr execute1(ColumnPtr key_column, ColumnPtr val_column, ColumnPtr max_key_column, const DataTypeTuple & res_type) const + ColumnPtr execute1(ColumnPtr * arg_columns, ColumnPtr max_key_column, const DataTypePtr & res_type, const DataTypePtr & val_type) const { - const auto & val_type = (assert_cast(res_type.getElements()[1].get()))->getNestedType(); switch (val_type->getTypeId()) { case TypeIndex::Int8: - return execute2(key_column, val_column, max_key_column, res_type); + return execute2(arg_columns, max_key_column, res_type); case TypeIndex::Int16: - return execute2(key_column, val_column, max_key_column, res_type); + return execute2(arg_columns, max_key_column, res_type); case TypeIndex::Int32: - return execute2(key_column, val_column, max_key_column, res_type); + return execute2(arg_columns, max_key_column, res_type); case TypeIndex::Int64: - return execute2(key_column, val_column, max_key_column, res_type); + return execute2(arg_columns, max_key_column, res_type); + case TypeIndex::Int128: + return execute2(arg_columns, max_key_column, res_type); + case TypeIndex::Int256: + return execute2(arg_columns, max_key_column, res_type); case TypeIndex::UInt8: - return execute2(key_column, val_column, max_key_column, res_type); + return execute2(arg_columns, max_key_column, res_type); case TypeIndex::UInt16: - return execute2(key_column, val_column, max_key_column, res_type); + return execute2(arg_columns, max_key_column, res_type); case TypeIndex::UInt32: - return execute2(key_column, val_column, max_key_column, res_type); + return execute2(arg_columns, max_key_column, res_type); case TypeIndex::UInt64: - return execute2(key_column, 
val_column, max_key_column, res_type); + return execute2(arg_columns, max_key_column, res_type); + case TypeIndex::UInt128: + return execute2(arg_columns, max_key_column, res_type); + case TypeIndex::UInt256: + return execute2(arg_columns, max_key_column, res_type); default: - throw Exception{"Illegal columns in arguments of function " + getName(), ErrorCodes::ILLEGAL_COLUMN}; + throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Illegal columns in arguments of function " + getName()); } } ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t) const override { - auto col1 = arguments[0]; - auto col2 = arguments[1]; - - const auto * k = assert_cast(col1.type.get()); - const auto * v = assert_cast(col2.type.get()); - - /* determine output type */ - const DataTypeTuple & res_type = DataTypeTuple( - DataTypes{std::make_shared(k->getNestedType()), std::make_shared(v->getNestedType())}); - + DataTypePtr res_type, key_type, val_type; ColumnPtr max_key_column = nullptr; + ColumnPtr arg_columns[] = {arguments[0].column, nullptr}; - if (arguments.size() == 3) + if (arguments[0].type->getTypeId() == TypeIndex::Array) { - /* max key provided */ - max_key_column = arguments[2].column; + key_type = assert_cast(arguments[0].type.get())->getNestedType(); + val_type = assert_cast(arguments[1].type.get())->getNestedType(); + res_type = getReturnTypeImpl(DataTypes{arguments[0].type, arguments[1].type}); + + arg_columns[1] = arguments[1].column; + if (arguments.size() == 3) + { + /* max key provided */ + max_key_column = arguments[2].column; + } + } + else + { + assert(arguments[0].type->getTypeId() == TypeIndex::Map); + + const auto * map_type = assert_cast(arguments[0].type.get()); + res_type = getReturnTypeImpl(DataTypes{arguments[0].type}); + key_type = map_type->getKeyType(); + val_type = map_type->getValueType(); + + if (arguments.size() == 2) + { + /* max key provided */ + max_key_column = arguments[1].column; + } } - switch (k->getNestedType()->getTypeId()) + switch (key_type->getTypeId()) { case TypeIndex::Int8: - return execute1(col1.column, col2.column, max_key_column, res_type); + return execute1(arg_columns, max_key_column, res_type, val_type); case TypeIndex::Int16: - return execute1(col1.column, col2.column, max_key_column, res_type); + return execute1(arg_columns, max_key_column, res_type, val_type); case TypeIndex::Int32: - return execute1(col1.column, col2.column, max_key_column, res_type); + return execute1(arg_columns, max_key_column, res_type, val_type); case TypeIndex::Int64: - return execute1(col1.column, col2.column, max_key_column, res_type); + return execute1(arg_columns, max_key_column, res_type, val_type); + case TypeIndex::Int128: + return execute1(arg_columns, max_key_column, res_type, val_type); + case TypeIndex::Int256: + return execute1(arg_columns, max_key_column, res_type, val_type); case TypeIndex::UInt8: - return execute1(col1.column, col2.column, max_key_column, res_type); + return execute1(arg_columns, max_key_column, res_type, val_type); case TypeIndex::UInt16: - return execute1(col1.column, col2.column, max_key_column, res_type); + return execute1(arg_columns, max_key_column, res_type, val_type); case TypeIndex::UInt32: - return execute1(col1.column, col2.column, max_key_column, res_type); + return execute1(arg_columns, max_key_column, res_type, val_type); case TypeIndex::UInt64: - return execute1(col1.column, col2.column, max_key_column, res_type); + return execute1(arg_columns, max_key_column, res_type, val_type); + case 
TypeIndex::UInt128: + return execute1(arg_columns, max_key_column, res_type, val_type); + case TypeIndex::UInt256: + return execute1(arg_columns, max_key_column, res_type, val_type); default: - throw Exception{"Illegal columns in arguments of function " + getName(), ErrorCodes::ILLEGAL_COLUMN}; + throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Illegal columns in arguments of function " + getName()); } } }; @@ -296,5 +450,4 @@ void registerFunctionMapPopulateSeries(FunctionFactory & factory) { factory.registerFunction(); } - } diff --git a/src/Functions/array/range.cpp b/src/Functions/array/range.cpp index 5b9886580dc..9eefc4f178d 100644 --- a/src/Functions/array/range.cpp +++ b/src/Functions/array/range.cpp @@ -7,6 +7,7 @@ #include #include #include +#include #include @@ -31,8 +32,10 @@ class FunctionRange : public IFunction { public: static constexpr auto name = "range"; - static constexpr size_t max_elements = 100'000'000; - static FunctionPtr create(ContextPtr) { return std::make_shared(); } + + const size_t max_elements; + static FunctionPtr create(ContextPtr context_) { return std::make_shared(std::move(context_)); } + explicit FunctionRange(ContextPtr context) : max_elements(context->getSettingsRef().function_range_max_elements_in_block) {} private: String getName() const override { return name; } diff --git a/src/Functions/bitmaskToList.cpp b/src/Functions/bitmaskToList.cpp deleted file mode 100644 index 8c3105724ac..00000000000 --- a/src/Functions/bitmaskToList.cpp +++ /dev/null @@ -1,132 +0,0 @@ -#include -#include -#include -#include -#include -#include -#include - - -namespace DB -{ - -namespace ErrorCodes -{ - extern const int ILLEGAL_TYPE_OF_ARGUMENT; - extern const int ILLEGAL_COLUMN; -} - - -/** Function for an unusual conversion to a string: - * - * bitmaskToList - takes an integer - a bitmask, returns a string of degrees of 2 separated by a comma. 
- * for example, bitmaskToList(50) = '2,16,32' - */ - -namespace -{ - -class FunctionBitmaskToList : public IFunction -{ -public: - static constexpr auto name = "bitmaskToList"; - static FunctionPtr create(ContextPtr) { return std::make_shared(); } - - String getName() const override - { - return name; - } - - size_t getNumberOfArguments() const override { return 1; } - bool isInjective(const ColumnsWithTypeAndName &) const override { return true; } - - DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override - { - const DataTypePtr & type = arguments[0]; - - if (!isInteger(type)) - throw Exception("Cannot format " + type->getName() + " as bitmask string", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - - return std::make_shared(); - } - - bool useDefaultImplementationForConstants() const override { return true; } - - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override - { - ColumnPtr res; - if (!((res = executeType(arguments)) - || (res = executeType(arguments)) - || (res = executeType(arguments)) - || (res = executeType(arguments)) - || (res = executeType(arguments)) - || (res = executeType(arguments)) - || (res = executeType(arguments)) - || (res = executeType(arguments)))) - throw Exception("Illegal column " + arguments[0].column->getName() - + " of argument of function " + getName(), - ErrorCodes::ILLEGAL_COLUMN); - - return res; - } - -private: - template - inline static void writeBitmask(T x, WriteBuffer & out) - { - using UnsignedT = make_unsigned_t; - UnsignedT u_x = x; - - bool first = true; - while (u_x) - { - UnsignedT y = u_x & (u_x - 1); - UnsignedT bit = u_x ^ y; - u_x = y; - if (!first) - writeChar(',', out); - first = false; - writeIntText(T(bit), out); - } - } - - template - ColumnPtr executeType(const ColumnsWithTypeAndName & columns) const - { - if (const ColumnVector * col_from = checkAndGetColumn>(columns[0].column.get())) - { - auto col_to = ColumnString::create(); - - const typename ColumnVector::Container & vec_from = col_from->getData(); - ColumnString::Chars & data_to = col_to->getChars(); - ColumnString::Offsets & offsets_to = col_to->getOffsets(); - size_t size = vec_from.size(); - data_to.resize(size * 2); - offsets_to.resize(size); - - WriteBufferFromVector buf_to(data_to); - - for (size_t i = 0; i < size; ++i) - { - writeBitmask(vec_from[i], buf_to); - writeChar(0, buf_to); - offsets_to[i] = buf_to.count(); - } - - buf_to.finalize(); - return col_to; - } - - return nullptr; - } -}; - -} - -void registerFunctionBitmaskToList(FunctionFactory & factory) -{ - factory.registerFunction(); -} - -} - diff --git a/src/Functions/buildId.cpp b/src/Functions/buildId.cpp index cc0c21350ca..5329d4a40b4 100644 --- a/src/Functions/buildId.cpp +++ b/src/Functions/buildId.cpp @@ -5,6 +5,7 @@ #include #include #include +#include namespace DB @@ -18,9 +19,13 @@ class FunctionBuildId : public IFunction { public: static constexpr auto name = "buildId"; - static FunctionPtr create(ContextPtr) + static FunctionPtr create(ContextPtr context) + { + return std::make_shared(context->isDistributed()); + } + + explicit FunctionBuildId(bool is_distributed_) : is_distributed(is_distributed_) { - return std::make_shared(); } String getName() const override @@ -33,6 +38,10 @@ public: return 0; } + bool isDeterministic() const override { return false; } + bool isDeterministicInScopeOfQuery() const override { return true; } + bool isSuitableForConstantFolding() const override { return !is_distributed; } + 
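The mapPopulateSeries.cpp rework above keeps the core algorithm: collect a row's key/value pairs into an ordered map, then emit every key from the minimum up to the maximum (or an explicit max key), filling missing keys with zero. A rough standalone sketch of that gap-filling step on plain vectors, without ClickHouse column types:

```
#include <cstdint>
#include <iostream>
#include <map>
#include <optional>
#include <utility>
#include <vector>

/// Populate a series: for keys in [min(keys), max(keys)] (or up to an explicit max key),
/// output every key; values for missing keys default to 0.
std::pair<std::vector<int64_t>, std::vector<int64_t>> populateSeries(
    const std::vector<int64_t> & keys,
    const std::vector<int64_t> & values,
    std::optional<int64_t> max_key = std::nullopt)
{
    std::map<int64_t, int64_t> row;
    for (size_t i = 0; i < keys.size(); ++i)
        row.emplace(keys[i], values[i]);

    std::vector<int64_t> out_keys, out_values;
    if (row.empty())
        return {out_keys, out_values};

    int64_t from = row.begin()->first;
    int64_t to = max_key.value_or(row.rbegin()->first);

    for (int64_t key = from; key <= to; ++key)
    {
        out_keys.push_back(key);
        auto it = row.find(key);
        out_values.push_back(it != row.end() ? it->second : 0);   /// Missing keys get the default value.
    }
    return {out_keys, out_values};
}

int main()
{
    auto [k, v] = populateSeries({1, 2, 4}, {11, 22, 44});
    for (size_t i = 0; i < k.size(); ++i)
        std::cout << k[i] << " -> " << v[i] << '\n';   /// Key 3 is filled in with value 0.
}
```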
DataTypePtr getReturnTypeImpl(const DataTypes & /*arguments*/) const override { return std::make_shared(); @@ -42,6 +51,9 @@ public: { return DataTypeString().createColumnConst(input_rows_count, SymbolIndex::instance()->getBuildIDHex()); } + +private: + bool is_distributed; }; } diff --git a/src/Functions/currentProfiles.cpp b/src/Functions/currentProfiles.cpp new file mode 100644 index 00000000000..afe54fd6582 --- /dev/null +++ b/src/Functions/currentProfiles.cpp @@ -0,0 +1,88 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + + +namespace DB +{ + +namespace +{ + enum class Kind + { + CURRENT_PROFILES, + ENABLED_PROFILES, + DEFAULT_PROFILES, + }; + + template + class FunctionCurrentProfiles : public IFunction + { + public: + static constexpr auto name = (kind == Kind::CURRENT_PROFILES) ? "currentProfiles" : ((kind == Kind::ENABLED_PROFILES) ? "enabledProfiles" : "defaultProfiles"); + static FunctionPtr create(const ContextPtr & context) { return std::make_shared(context); } + + String getName() const override { return name; } + + explicit FunctionCurrentProfiles(const ContextPtr & context) + { + const auto & manager = context->getAccessControlManager(); + + std::vector profile_ids; + if constexpr (kind == Kind::CURRENT_PROFILES) + { + profile_ids = context->getCurrentProfiles(); + } + else if constexpr (kind == Kind::ENABLED_PROFILES) + { + profile_ids = context->getEnabledProfiles(); + } + else + { + static_assert(kind == Kind::DEFAULT_PROFILES); + if (auto user = context->getUser()) + profile_ids = user->settings.toProfileIDs(); + } + + profile_names = manager.tryReadNames(profile_ids); + } + + size_t getNumberOfArguments() const override { return 0; } + bool isDeterministic() const override { return false; } + + DataTypePtr getReturnTypeImpl(const DataTypes & /*arguments*/) const override + { + return std::make_shared(std::make_shared()); + } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName &, const DataTypePtr &, size_t input_rows_count) const override + { + auto col_res = ColumnArray::create(ColumnString::create()); + ColumnString & res_strings = typeid_cast(col_res->getData()); + ColumnArray::Offsets & res_offsets = col_res->getOffsets(); + for (const String & profile_name : profile_names) + res_strings.insertData(profile_name.data(), profile_name.length()); + res_offsets.push_back(res_strings.size()); + return ColumnConst::create(std::move(col_res), input_rows_count); + } + + private: + Strings profile_names; + }; +} + +void registerFunctionCurrentProfiles(FunctionFactory & factory) +{ + factory.registerFunction>(); + factory.registerFunction>(); + factory.registerFunction>(); +} + +} diff --git a/src/Functions/currentRoles.cpp b/src/Functions/currentRoles.cpp new file mode 100644 index 00000000000..0a4e23308d8 --- /dev/null +++ b/src/Functions/currentRoles.cpp @@ -0,0 +1,88 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + + +namespace DB +{ + +namespace +{ + enum class Kind + { + CURRENT_ROLES, + ENABLED_ROLES, + DEFAULT_ROLES, + }; + + template + class FunctionCurrentRoles : public IFunction + { + public: + static constexpr auto name = (kind == Kind::CURRENT_ROLES) ? "currentRoles" : ((kind == Kind::ENABLED_ROLES) ? 
"enabledRoles" : "defaultRoles"); + static FunctionPtr create(const ContextPtr & context) { return std::make_shared(context); } + + String getName() const override { return name; } + + explicit FunctionCurrentRoles(const ContextPtr & context) + { + if constexpr (kind == Kind::CURRENT_ROLES) + { + role_names = context->getRolesInfo()->getCurrentRolesNames(); + } + else if constexpr (kind == Kind::ENABLED_ROLES) + { + role_names = context->getRolesInfo()->getEnabledRolesNames(); + } + else + { + static_assert(kind == Kind::DEFAULT_ROLES); + const auto & manager = context->getAccessControlManager(); + if (auto user = context->getUser()) + role_names = manager.tryReadNames(user->granted_roles.findGranted(user->default_roles)); + } + + /// We sort the names because the result of the function should not depend on the order of UUIDs. + std::sort(role_names.begin(), role_names.end()); + } + + size_t getNumberOfArguments() const override { return 0; } + bool isDeterministic() const override { return false; } + + DataTypePtr getReturnTypeImpl(const DataTypes & /*arguments*/) const override + { + return std::make_shared(std::make_shared()); + } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName &, const DataTypePtr &, size_t input_rows_count) const override + { + auto col_res = ColumnArray::create(ColumnString::create()); + ColumnString & res_strings = typeid_cast(col_res->getData()); + ColumnArray::Offsets & res_offsets = col_res->getOffsets(); + for (const String & role_name : role_names) + res_strings.insertData(role_name.data(), role_name.length()); + res_offsets.push_back(res_strings.size()); + return ColumnConst::create(std::move(col_res), input_rows_count); + } + + private: + Strings role_names; + }; +} + +void registerFunctionCurrentRoles(FunctionFactory & factory) +{ + factory.registerFunction>(); + factory.registerFunction>(); + factory.registerFunction>(); +} + +} diff --git a/src/Functions/extractAllGroups.h b/src/Functions/extractAllGroups.h index 864a788cf18..62026dcc147 100644 --- a/src/Functions/extractAllGroups.h +++ b/src/Functions/extractAllGroups.h @@ -7,6 +7,8 @@ #include #include #include +#include +#include #include #include @@ -47,11 +49,17 @@ enum class ExtractAllGroupsResultKind template class FunctionExtractAllGroups : public IFunction { + ContextPtr context; + public: static constexpr auto Kind = Impl::Kind; static constexpr auto name = Impl::Name; - static FunctionPtr create(ContextPtr) { return std::make_shared(); } + FunctionExtractAllGroups(ContextPtr context_) + : context(context_) + {} + + static FunctionPtr create(ContextPtr context) { return std::make_shared(context); } String getName() const override { return name; } @@ -147,6 +155,9 @@ public: } else { + /// Additional limit to fail fast on supposedly incorrect usage. + const auto max_matches_per_row = context->getSettingsRef().regexp_max_matches_per_row; + PODArray all_matches; /// Number of times RE matched on each row of haystack column. PODArray number_of_matches_per_row; @@ -172,16 +183,13 @@ public: for (size_t group = 1; group <= groups_count; ++group) all_matches.push_back(matched_groups[group]); - /// Additional limit to fail fast on supposedly incorrect usage, arbitrary value. 
- static constexpr size_t MAX_MATCHES_PER_ROW = 1000; - if (matches_per_row > MAX_MATCHES_PER_ROW) + ++matches_per_row; + if (matches_per_row > max_matches_per_row) throw Exception(ErrorCodes::TOO_LARGE_ARRAY_SIZE, "Too many matches per row (> {}) in the result of function {}", - MAX_MATCHES_PER_ROW, getName()); + max_matches_per_row, getName()); pos = matched_groups[0].data() + std::max(1, matched_groups[0].size()); - - ++matches_per_row; } number_of_matches_per_row.push_back(matches_per_row); diff --git a/src/Functions/getScalar.cpp b/src/Functions/getScalar.cpp index a29abd257e7..13ec97a94fb 100644 --- a/src/Functions/getScalar.cpp +++ b/src/Functions/getScalar.cpp @@ -2,6 +2,7 @@ #include #include #include +#include #include #include #include @@ -60,11 +61,89 @@ private: mutable ColumnWithTypeAndName scalar; }; + +/** Get special scalar values + */ +template +class FunctionGetSpecialScalar : public IFunction +{ +public: + static constexpr auto name = Scalar::name; + static FunctionPtr create(ContextPtr context_) + { + return std::make_shared>(context_); + } + + static ColumnWithTypeAndName createScalar(ContextPtr context_) + { + if (const auto * block = context_->tryGetLocalScalar(Scalar::scalar_name)) + return block->getByPosition(0); + else if (context_->hasQueryContext()) + { + if (context_->getQueryContext()->hasScalar(Scalar::scalar_name)) + return context_->getQueryContext()->getScalar(Scalar::scalar_name).getByPosition(0); + } + return {DataTypeUInt32().createColumnConst(1, 0), std::make_shared(), Scalar::scalar_name}; + } + + explicit FunctionGetSpecialScalar(ContextPtr context_) + : scalar(createScalar(context_)), is_distributed(context_->isDistributed()) + { + } + + String getName() const override + { + return name; + } + + bool isDeterministic() const override { return false; } + + bool isDeterministicInScopeOfQuery() const override + { + return true; + } + + bool isSuitableForConstantFolding() const override { return !is_distributed; } + + size_t getNumberOfArguments() const override + { + return 0; + } + + DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName &) const override + { + return scalar.type; + } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName &, const DataTypePtr &, size_t input_rows_count) const override + { + return ColumnConst::create(scalar.column, input_rows_count); + } + +private: + ColumnWithTypeAndName scalar; + bool is_distributed; +}; + +struct GetShardNum +{ + static constexpr auto name = "shardNum"; + static constexpr auto scalar_name = "_shard_num"; +}; + +struct GetShardCount +{ + static constexpr auto name = "shardCount"; + static constexpr auto scalar_name = "_shard_count"; +}; + } void registerFunctionGetScalar(FunctionFactory & factory) { factory.registerFunction(); + factory.registerFunction>(); + factory.registerFunction>(); } } diff --git a/src/Functions/hasColumnInTable.cpp b/src/Functions/hasColumnInTable.cpp index 0fa0562389b..2e7428332cb 100644 --- a/src/Functions/hasColumnInTable.cpp +++ b/src/Functions/hasColumnInTable.cpp @@ -127,13 +127,16 @@ ColumnPtr FunctionHasColumnInTable::executeImpl(const ColumnsWithTypeAndName & a { std::vector> host_names = {{ host_name }}; + bool treat_local_as_remote = false; + bool treat_local_port_as_remote = getContext()->getApplicationType() == Context::ApplicationType::LOCAL; auto cluster = std::make_shared( getContext()->getSettings(), host_names, !user_name.empty() ? 
user_name : "default", password, getContext()->getTCPPort(), - false); + treat_local_as_remote, + treat_local_port_as_remote); // FIXME this (probably) needs a non-constant access to query context, // because it might initialized a storage. Ideally, the tables required diff --git a/src/Functions/hostName.cpp b/src/Functions/hostName.cpp index 0aba155bb36..fd8400d3b9f 100644 --- a/src/Functions/hostName.cpp +++ b/src/Functions/hostName.cpp @@ -3,6 +3,7 @@ #include #include #include +#include namespace DB @@ -15,9 +16,13 @@ class FunctionHostName : public IFunction { public: static constexpr auto name = "hostName"; - static FunctionPtr create(ContextPtr) + static FunctionPtr create(ContextPtr context) + { + return std::make_shared(context->isDistributed()); + } + + explicit FunctionHostName(bool is_distributed_) : is_distributed(is_distributed_) { - return std::make_shared(); } String getName() const override @@ -29,10 +34,10 @@ public: bool isDeterministicInScopeOfQuery() const override { - return false; + return true; } - bool isSuitableForConstantFolding() const override { return false; } + bool isSuitableForConstantFolding() const override { return !is_distributed; } size_t getNumberOfArguments() const override { @@ -44,14 +49,12 @@ public: return std::make_shared(); } - /** convertToFullColumn needed because in distributed query processing, - * each server returns its own value. - */ ColumnPtr executeImpl(const ColumnsWithTypeAndName &, const DataTypePtr & result_type, size_t input_rows_count) const override { - return result_type->createColumnConst( - input_rows_count, DNSResolver::instance().getHostName())->convertToFullColumnIfConst(); + return result_type->createColumnConst(input_rows_count, DNSResolver::instance().getHostName()); } +private: + bool is_distributed; }; } diff --git a/src/Functions/initialQueryID.cpp b/src/Functions/initialQueryID.cpp new file mode 100644 index 00000000000..118339a8fb6 --- /dev/null +++ b/src/Functions/initialQueryID.cpp @@ -0,0 +1,44 @@ +#include +#include +#include +#include +#include + +namespace DB +{ +class FunctionInitialQueryID : public IFunction +{ + const String initial_query_id; + +public: + static constexpr auto name = "initialQueryID"; + static FunctionPtr create(ContextPtr context) + { + return std::make_shared(context->getClientInfo().initial_query_id); + } + + explicit FunctionInitialQueryID(const String & initial_query_id_) : initial_query_id(initial_query_id_) {} + + inline String getName() const override { return name; } + + inline size_t getNumberOfArguments() const override { return 0; } + + DataTypePtr getReturnTypeImpl(const DataTypes & /*arguments*/) const override + { + return std::make_shared(); + } + + inline bool isDeterministic() const override { return false; } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName &, const DataTypePtr &, size_t input_rows_count) const override + { + return DataTypeString().createColumnConst(input_rows_count, initial_query_id); + } +}; + +void registerFunctionInitialQueryID(FunctionFactory & factory) +{ + factory.registerFunction(); + factory.registerAlias("initial_query_id", FunctionInitialQueryID::name, FunctionFactory::CaseInsensitive); +} +} diff --git a/src/Functions/lemmatize.cpp b/src/Functions/lemmatize.cpp new file mode 100644 index 00000000000..35d2bfebe08 --- /dev/null +++ b/src/Functions/lemmatize.cpp @@ -0,0 +1,130 @@ +#if !defined(ARCADIA_BUILD) +# include "config_core.h" +#endif + +#if USE_NLP + +#include +#include +#include +#include +#include +#include +#include + 
+namespace DB +{ +namespace ErrorCodes +{ + extern const int ILLEGAL_COLUMN; + extern const int ILLEGAL_TYPE_OF_ARGUMENT; + extern const int SUPPORT_IS_DISABLED; +} + +namespace +{ + +struct LemmatizeImpl +{ + static void vector( + const ColumnString::Chars & data, + const ColumnString::Offsets & offsets, + ColumnString::Chars & res_data, + ColumnString::Offsets & res_offsets, + Lemmatizers::LemmPtr & lemmatizer) + { + res_data.resize(data.size()); + res_offsets.assign(offsets); + + UInt64 data_size = 0; + for (UInt64 i = 0; i < offsets.size(); ++i) + { + /// lemmatize() uses the fact the fact that each string ends with '\0' + auto result = lemmatizer->lemmatize(reinterpret_cast(data.data() + offsets[i - 1])); + size_t new_size = strlen(result.get()) + 1; + + if (data_size + new_size > res_data.size()) + res_data.resize(data_size + new_size); + + memcpy(res_data.data() + data_size, reinterpret_cast(result.get()), new_size); + + data_size += new_size; + res_offsets[i] = data_size; + } + res_data.resize(data_size); + } +}; + + +class FunctionLemmatize : public IFunction +{ +public: + static constexpr auto name = "lemmatize"; + static FunctionPtr create(ContextPtr context) + { + if (!context->getSettingsRef().allow_experimental_nlp_functions) + throw Exception(ErrorCodes::SUPPORT_IS_DISABLED, "Natural language processing function '{}' is experimental. Set `allow_experimental_nlp_functions` setting to enable it", name); + + return std::make_shared(context->getLemmatizers()); + } + +private: + Lemmatizers & lemmatizers; + +public: + explicit FunctionLemmatize(Lemmatizers & lemmatizers_) + : lemmatizers(lemmatizers_) {} + + String getName() const override { return name; } + + size_t getNumberOfArguments() const override { return 2; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + if (!isString(arguments[0])) + throw Exception( + "Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + if (!isString(arguments[1])) + throw Exception( + "Illegal type " + arguments[1]->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + return arguments[1]; + } + + bool useDefaultImplementationForConstants() const override { return true; } + + ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {0}; } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t) const override + { + const auto & langcolumn = arguments[0].column; + const auto & strcolumn = arguments[1].column; + + const ColumnConst * lang_col = checkAndGetColumn(langcolumn.get()); + const ColumnString * words_col = checkAndGetColumn(strcolumn.get()); + + if (!lang_col) + throw Exception( + "Illegal column " + arguments[0].column->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_COLUMN); + if (!words_col) + throw Exception( + "Illegal column " + arguments[1].column->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_COLUMN); + + String language = lang_col->getValue(); + auto lemmatizer = lemmatizers.getLemmatizer(language); + + auto col_res = ColumnString::create(); + LemmatizeImpl::vector(words_col->getChars(), words_col->getOffsets(), col_res->getChars(), col_res->getOffsets(), lemmatizer); + return col_res; + } +}; + +} + +void registerFunctionLemmatize(FunctionFactory & factory) +{ + factory.registerFunction(FunctionFactory::CaseInsensitive); +} + +} + +#endif diff --git a/src/Functions/queryID.cpp 
b/src/Functions/queryID.cpp new file mode 100644 index 00000000000..b55d3fa0326 --- /dev/null +++ b/src/Functions/queryID.cpp @@ -0,0 +1,44 @@ +#include +#include +#include +#include +#include + +namespace DB +{ +class FunctionQueryID : public IFunction +{ + const String query_id; + +public: + static constexpr auto name = "queryID"; + static FunctionPtr create(ContextPtr context) + { + return std::make_shared(context->getClientInfo().current_query_id); + } + + explicit FunctionQueryID(const String & query_id_) : query_id(query_id_) {} + + inline String getName() const override { return name; } + + inline size_t getNumberOfArguments() const override { return 0; } + + DataTypePtr getReturnTypeImpl(const DataTypes & /*arguments*/) const override + { + return std::make_shared(); + } + + inline bool isDeterministic() const override { return false; } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName &, const DataTypePtr &, size_t input_rows_count) const override + { + return DataTypeString().createColumnConst(input_rows_count, query_id)->convertToFullColumnIfConst(); + } +}; + +void registerFunctionQueryID(FunctionFactory & factory) +{ + factory.registerFunction(); + factory.registerAlias("query_id", FunctionQueryID::name, FunctionFactory::CaseInsensitive); +} +} diff --git a/src/Functions/registerFunctions.cpp b/src/Functions/registerFunctions.cpp index 29343a871a8..7e8f35bc0c4 100644 --- a/src/Functions/registerFunctions.cpp +++ b/src/Functions/registerFunctions.cpp @@ -12,7 +12,10 @@ void registerFunctionsArray(FunctionFactory &); void registerFunctionsTuple(FunctionFactory &); void registerFunctionsMap(FunctionFactory &); void registerFunctionsBitmap(FunctionFactory &); +void registerFunctionsBinaryRepr(FunctionFactory &); void registerFunctionsCoding(FunctionFactory &); +void registerFunctionsCodingUUID(FunctionFactory &); +void registerFunctionChar(FunctionFactory &); void registerFunctionsComparison(FunctionFactory &); void registerFunctionsConditional(FunctionFactory &); void registerFunctionsConversion(FunctionFactory &); @@ -73,7 +76,10 @@ void registerFunctions() #if !defined(ARCADIA_BUILD) registerFunctionsBitmap(factory); #endif + registerFunctionsBinaryRepr(factory); registerFunctionsCoding(factory); + registerFunctionsCodingUUID(factory); + registerFunctionChar(factory); registerFunctionsComparison(factory); registerFunctionsConditional(factory); registerFunctionsConversion(factory); diff --git a/src/Functions/registerFunctionsFormatting.cpp b/src/Functions/registerFunctionsFormatting.cpp index ab258589b92..e434b0e49f0 100644 --- a/src/Functions/registerFunctionsFormatting.cpp +++ b/src/Functions/registerFunctionsFormatting.cpp @@ -3,14 +3,14 @@ namespace DB class FunctionFactory; -void registerFunctionBitmaskToList(FunctionFactory &); +void registerFunctionsBitToArray(FunctionFactory &); void registerFunctionFormatReadableSize(FunctionFactory &); void registerFunctionFormatReadableQuantity(FunctionFactory &); void registerFunctionFormatReadableTimeDelta(FunctionFactory &); void registerFunctionsFormatting(FunctionFactory & factory) { - registerFunctionBitmaskToList(factory); + registerFunctionsBitToArray(factory); registerFunctionFormatReadableSize(factory); registerFunctionFormatReadableQuantity(factory); registerFunctionFormatReadableTimeDelta(factory); diff --git a/src/Functions/registerFunctionsMiscellaneous.cpp b/src/Functions/registerFunctionsMiscellaneous.cpp index 3403390ea72..12c54aeeefd 100644 --- a/src/Functions/registerFunctionsMiscellaneous.cpp +++ 
b/src/Functions/registerFunctionsMiscellaneous.cpp @@ -9,6 +9,8 @@ class FunctionFactory; void registerFunctionCurrentDatabase(FunctionFactory &); void registerFunctionCurrentUser(FunctionFactory &); +void registerFunctionCurrentProfiles(FunctionFactory &); +void registerFunctionCurrentRoles(FunctionFactory &); void registerFunctionHostName(FunctionFactory &); void registerFunctionFQDN(FunctionFactory &); void registerFunctionVisibleWidth(FunctionFactory &); @@ -74,6 +76,8 @@ void registerFunctionFile(FunctionFactory & factory); void registerFunctionConnectionId(FunctionFactory & factory); void registerFunctionPartitionId(FunctionFactory & factory); void registerFunctionIsIPAddressContainedIn(FunctionFactory &); +void registerFunctionQueryID(FunctionFactory & factory); +void registerFunctionInitialQueryID(FunctionFactory & factory); #if USE_ICU void registerFunctionConvertCharset(FunctionFactory &); @@ -83,6 +87,8 @@ void registerFunctionsMiscellaneous(FunctionFactory & factory) { registerFunctionCurrentDatabase(factory); registerFunctionCurrentUser(factory); + registerFunctionCurrentProfiles(factory); + registerFunctionCurrentRoles(factory); registerFunctionHostName(factory); registerFunctionFQDN(factory); registerFunctionVisibleWidth(factory); @@ -148,6 +154,8 @@ void registerFunctionsMiscellaneous(FunctionFactory & factory) registerFunctionConnectionId(factory); registerFunctionPartitionId(factory); registerFunctionIsIPAddressContainedIn(factory); + registerFunctionQueryID(factory); + registerFunctionInitialQueryID(factory); #if USE_ICU registerFunctionConvertCharset(factory); diff --git a/src/Functions/registerFunctionsString.cpp b/src/Functions/registerFunctionsString.cpp index 18a30469386..ba6a294abba 100644 --- a/src/Functions/registerFunctionsString.cpp +++ b/src/Functions/registerFunctionsString.cpp @@ -1,5 +1,6 @@ #if !defined(ARCADIA_BUILD) # include "config_functions.h" +# include "config_core.h" #endif namespace DB @@ -37,7 +38,7 @@ void registerFunctionCountMatches(FunctionFactory &); void registerFunctionEncodeXMLComponent(FunctionFactory &); void registerFunctionDecodeXMLComponent(FunctionFactory &); void registerFunctionExtractTextFromHTML(FunctionFactory &); - +void registerFunctionToStringCutToZero(FunctionFactory &); #if USE_BASE64 void registerFunctionBase64Encode(FunctionFactory &); @@ -45,6 +46,12 @@ void registerFunctionBase64Decode(FunctionFactory &); void registerFunctionTryBase64Decode(FunctionFactory &); #endif +#if USE_NLP +void registerFunctionStem(FunctionFactory &); +void registerFunctionSynonyms(FunctionFactory &); +void registerFunctionLemmatize(FunctionFactory &); +#endif + void registerFunctionsString(FunctionFactory & factory) { registerFunctionRepeat(factory); @@ -77,11 +84,19 @@ void registerFunctionsString(FunctionFactory & factory) registerFunctionEncodeXMLComponent(factory); registerFunctionDecodeXMLComponent(factory); registerFunctionExtractTextFromHTML(factory); + registerFunctionToStringCutToZero(factory); + #if USE_BASE64 registerFunctionBase64Encode(factory); registerFunctionBase64Decode(factory); registerFunctionTryBase64Decode(factory); #endif + +#if USE_NLP + registerFunctionStem(factory); + registerFunctionSynonyms(factory); + registerFunctionLemmatize(factory); +#endif } } diff --git a/src/Functions/sleep.h b/src/Functions/sleep.h index 8f78fd19a1f..304d51760de 100644 --- a/src/Functions/sleep.h +++ b/src/Functions/sleep.h @@ -5,11 +5,17 @@ #include #include #include +#include #include #include #include #include +namespace ProfileEvents 
+{ +extern const Event SleepFunctionCalls; +extern const Event SleepFunctionMicroseconds; +} namespace DB { @@ -91,8 +97,11 @@ public: if (seconds > 3.0) /// The choice is arbitrary throw Exception("The maximum sleep time is 3 seconds. Requested: " + toString(seconds), ErrorCodes::TOO_SLOW); - UInt64 microseconds = seconds * (variant == FunctionSleepVariant::PerBlock ? 1 : size) * 1e6; + UInt64 count = (variant == FunctionSleepVariant::PerBlock ? 1 : size); + UInt64 microseconds = seconds * count * 1e6; sleepForMicroseconds(microseconds); + ProfileEvents::increment(ProfileEvents::SleepFunctionCalls, count); + ProfileEvents::increment(ProfileEvents::SleepFunctionMicroseconds, microseconds); } /// convertToFullColumn needed, because otherwise (constant expression case) function will not get called on each columns. diff --git a/src/Functions/stem.cpp b/src/Functions/stem.cpp new file mode 100644 index 00000000000..98dcbccd005 --- /dev/null +++ b/src/Functions/stem.cpp @@ -0,0 +1,135 @@ +#if !defined(ARCADIA_BUILD) +# include "config_core.h" +#endif + +#if USE_NLP + +#include +#include +#include +#include +#include +#include + +#include + + +namespace DB +{ +namespace ErrorCodes +{ + extern const int ILLEGAL_COLUMN; + extern const int ILLEGAL_TYPE_OF_ARGUMENT; + extern const int SUPPORT_IS_DISABLED; +} + +namespace +{ + +struct StemImpl +{ + static void vector( + const ColumnString::Chars & data, + const ColumnString::Offsets & offsets, + ColumnString::Chars & res_data, + ColumnString::Offsets & res_offsets, + const String & language) + { + sb_stemmer * stemmer = sb_stemmer_new(language.data(), "UTF_8"); + + if (stemmer == nullptr) + { + throw Exception( + "Language " + language + " is not supported for function stem", + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + } + + res_data.resize(data.size()); + res_offsets.assign(offsets); + + UInt64 data_size = 0; + for (UInt64 i = 0; i < offsets.size(); ++i) + { + /// Note that accessing -1th element is valid for PaddedPODArray. + size_t original_size = offsets[i] - offsets[i - 1]; + const sb_symbol * result = sb_stemmer_stem(stemmer, + reinterpret_cast(data.data() + offsets[i - 1]), + original_size - 1); + size_t new_size = sb_stemmer_length(stemmer) + 1; + + memcpy(res_data.data() + data_size, result, new_size); + + data_size += new_size; + res_offsets[i] = data_size; + } + res_data.resize(data_size); + sb_stemmer_delete(stemmer); + } +}; + + +class FunctionStem : public IFunction +{ +public: + static constexpr auto name = "stem"; + + static FunctionPtr create(ContextPtr context) + { + if (!context->getSettingsRef().allow_experimental_nlp_functions) + throw Exception(ErrorCodes::SUPPORT_IS_DISABLED, "Natural language processing function '{}' is experimental. 
Set `allow_experimental_nlp_functions` setting to enable it", name); + + return std::make_shared(); + } + + String getName() const override { return name; } + + size_t getNumberOfArguments() const override { return 2; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + if (!isString(arguments[0])) + throw Exception( + "Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + if (!isString(arguments[1])) + throw Exception( + "Illegal type " + arguments[1]->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + return arguments[1]; + } + + bool useDefaultImplementationForConstants() const override { return true; } + + ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {0}; } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t) const override + { + const auto & langcolumn = arguments[0].column; + const auto & strcolumn = arguments[1].column; + + const ColumnConst * lang_col = checkAndGetColumn(langcolumn.get()); + const ColumnString * words_col = checkAndGetColumn(strcolumn.get()); + + if (!lang_col) + throw Exception( + "Illegal column " + arguments[0].column->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_COLUMN); + if (!words_col) + throw Exception( + "Illegal column " + arguments[1].column->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_COLUMN); + + String language = lang_col->getValue(); + + auto col_res = ColumnString::create(); + StemImpl::vector(words_col->getChars(), words_col->getOffsets(), col_res->getChars(), col_res->getOffsets(), language); + return col_res; + } +}; + +} + +void registerFunctionStem(FunctionFactory & factory) +{ + factory.registerFunction(FunctionFactory::CaseInsensitive); +} + +} + +#endif diff --git a/src/Functions/stringCutToZero.cpp b/src/Functions/stringCutToZero.cpp new file mode 100644 index 00000000000..ed8cee0d70c --- /dev/null +++ b/src/Functions/stringCutToZero.cpp @@ -0,0 +1,154 @@ +#include +#include +#include +#include +#include + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int ILLEGAL_TYPE_OF_ARGUMENT; + extern const int LOGICAL_ERROR; + extern const int ILLEGAL_COLUMN; +} + +class FunctionToStringCutToZero : public IFunction +{ +public: + static constexpr auto name = "toStringCutToZero"; + static FunctionPtr create(ContextPtr) { return std::make_shared(); } + + String getName() const override + { + return name; + } + + size_t getNumberOfArguments() const override { return 1; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + if (!isStringOrFixedString(arguments[0])) + throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + return std::make_shared(); + } + + bool useDefaultImplementationForConstants() const override { return true; } + + static bool tryExecuteString(const IColumn * col, ColumnPtr & col_res) + { + const ColumnString * col_str_in = checkAndGetColumn(col); + + if (col_str_in) + { + auto col_str = ColumnString::create(); + ColumnString::Chars & out_vec = col_str->getChars(); + ColumnString::Offsets & out_offsets = col_str->getOffsets(); + + const ColumnString::Chars & in_vec = col_str_in->getChars(); + const ColumnString::Offsets & in_offsets = col_str_in->getOffsets(); + + size_t size = in_offsets.size(); + out_offsets.resize(size); + 
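The new toStringCutToZero function in this hunk truncates each value at its first zero byte (via strlen for String columns and strnlen for FixedString). A minimal sketch of that semantic on ordinary std::string values, where an embedded '\0' marks the cut point (simplified; the real code writes directly into the flat chars/offsets buffers):

```
#include <iostream>
#include <string>

/// Cut the value at the first zero byte, like toStringCutToZero does per row.
std::string cutToZero(const std::string & value)
{
    size_t pos = value.find('\0');
    return pos == std::string::npos ? value : value.substr(0, pos);
}

int main()
{
    std::string fixed("ab\0cd", 5);         /// FixedString-like value with an embedded zero byte.
    std::cout << cutToZero(fixed) << '\n';  /// Prints "ab".
    std::cout << cutToZero("plain") << '\n';
}
```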
out_vec.resize(in_vec.size()); + + char * begin = reinterpret_cast(out_vec.data()); + char * pos = begin; + + ColumnString::Offset current_in_offset = 0; + + for (size_t i = 0; i < size; ++i) + { + const char * pos_in = reinterpret_cast(&in_vec[current_in_offset]); + size_t current_size = strlen(pos_in); + memcpySmallAllowReadWriteOverflow15(pos, pos_in, current_size); + pos += current_size; + *pos = '\0'; + ++pos; + out_offsets[i] = pos - begin; + current_in_offset = in_offsets[i]; + } + out_vec.resize(pos - begin); + + if (!out_offsets.empty() && out_offsets.back() != out_vec.size()) + throw Exception("Column size mismatch (internal logical error)", ErrorCodes::LOGICAL_ERROR); + + col_res = std::move(col_str); + return true; + } + else + { + return false; + } + } + + static bool tryExecuteFixedString(const IColumn * col, ColumnPtr & col_res) + { + const ColumnFixedString * col_fstr_in = checkAndGetColumn(col); + + if (col_fstr_in) + { + auto col_str = ColumnString::create(); + ColumnString::Chars & out_vec = col_str->getChars(); + ColumnString::Offsets & out_offsets = col_str->getOffsets(); + + const ColumnString::Chars & in_vec = col_fstr_in->getChars(); + + size_t size = col_fstr_in->size(); + + out_offsets.resize(size); + out_vec.resize(in_vec.size() + size); + + char * begin = reinterpret_cast(out_vec.data()); + char * pos = begin; + const char * pos_in = reinterpret_cast(in_vec.data()); + + size_t n = col_fstr_in->getN(); + + for (size_t i = 0; i < size; ++i) + { + size_t current_size = strnlen(pos_in, n); + memcpySmallAllowReadWriteOverflow15(pos, pos_in, current_size); + pos += current_size; + *pos = '\0'; + out_offsets[i] = ++pos - begin; + pos_in += n; + } + out_vec.resize(pos - begin); + + if (!out_offsets.empty() && out_offsets.back() != out_vec.size()) + throw Exception("Column size mismatch (internal logical error)", ErrorCodes::LOGICAL_ERROR); + + col_res = std::move(col_str); + return true; + } + else + { + return false; + } + } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override + { + const IColumn * column = arguments[0].column.get(); + ColumnPtr res_column; + + if (tryExecuteFixedString(column, res_column) || tryExecuteString(column, res_column)) + return res_column; + + throw Exception("Illegal column " + arguments[0].column->getName() + + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_COLUMN); + } +}; + + +void registerFunctionToStringCutToZero(FunctionFactory & factory) +{ + factory.registerFunction(); +} + +} diff --git a/src/Functions/synonyms.cpp b/src/Functions/synonyms.cpp new file mode 100644 index 00000000000..4201fbfa677 --- /dev/null +++ b/src/Functions/synonyms.cpp @@ -0,0 +1,128 @@ +#if !defined(ARCADIA_BUILD) +# include "config_core.h" +#endif + +#if USE_NLP + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + + +namespace DB +{ +namespace ErrorCodes +{ + extern const int ILLEGAL_COLUMN; + extern const int ILLEGAL_TYPE_OF_ARGUMENT; + extern const int SUPPORT_IS_DISABLED; +} + +class FunctionSynonyms : public IFunction +{ +public: + static constexpr auto name = "synonyms"; + static FunctionPtr create(ContextPtr context) + { + if (!context->getSettingsRef().allow_experimental_nlp_functions) + throw Exception(ErrorCodes::SUPPORT_IS_DISABLED, "Natural language processing function '{}' is experimental. 
Set `allow_experimental_nlp_functions` setting to enable it", name); + + return std::make_shared(context->getSynonymsExtensions()); + } + +private: + SynonymsExtensions & extensions; + +public: + explicit FunctionSynonyms(SynonymsExtensions & extensions_) + : extensions(extensions_) {} + + String getName() const override { return name; } + + size_t getNumberOfArguments() const override { return 2; } + + DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override + { + if (!isString(arguments[0])) + throw Exception( + "Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + if (!isString(arguments[1])) + throw Exception( + "Illegal type " + arguments[1]->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + return std::make_shared(std::make_shared()); + } + + bool useDefaultImplementationForConstants() const override { return true; } + + ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {0}; } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count) const override + { + const auto & extcolumn = arguments[0].column; + const auto & strcolumn = arguments[1].column; + + const ColumnConst * ext_col = checkAndGetColumn(extcolumn.get()); + const ColumnString * word_col = checkAndGetColumn(strcolumn.get()); + + if (!ext_col) + throw Exception( + "Illegal column " + arguments[0].column->getName() + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_COLUMN); + if (!word_col) + throw Exception( + "Illegal column " + arguments[1].column->getName() + " of argument of function " + getName(), + ErrorCodes::ILLEGAL_COLUMN); + + String ext_name = ext_col->getValue(); + auto extension = extensions.getExtension(ext_name); + + /// Create and fill the result array. 
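The loop that follows assembles the Array(String) result: every synonym is appended to the nested data column and a running element count is recorded per input row. Below is a minimal standalone sketch of that offsets bookkeeping, using plain std::vector instead of the real IColumn/ColumnArray API; the sample synonym sets are hypothetical.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    /// Hypothetical per-row synonym sets, standing in for extension->getSynonyms(word).
    std::vector<std::vector<std::string>> synsets =
    {
        {"important", "significant", "crucial"},
        {},                          /// a word without synonyms contributes zero elements
        {"big", "large"},
    };

    std::vector<std::string> flat_data;  /// nested data column of the Array(String) result
    std::vector<size_t> offsets;         /// cumulative element count after each row

    size_t current_offset = 0;
    for (const auto & synset : synsets)
    {
        for (const auto & token : synset)
            flat_data.push_back(token);
        current_offset += synset.size();
        offsets.push_back(current_offset);   /// row i occupies [offsets[i - 1], offsets[i])
    }

    for (size_t off : offsets)
        std::cout << off << ' ';             /// prints "3 3 5"
    std::cout << '\n';
}
```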
+ const DataTypePtr & elem_type = static_cast(*result_type).getNestedType(); + + auto out = ColumnArray::create(elem_type->createColumn()); + IColumn & out_data = out->getData(); + IColumn::Offsets & out_offsets = out->getOffsets(); + + const ColumnString::Chars & data = word_col->getChars(); + const ColumnString::Offsets & offsets = word_col->getOffsets(); + out_data.reserve(input_rows_count); + out_offsets.resize(input_rows_count); + + IColumn::Offset current_offset = 0; + for (size_t i = 0; i < offsets.size(); ++i) + { + std::string_view word(reinterpret_cast(data.data() + offsets[i - 1]), offsets[i] - offsets[i - 1] - 1); + + const auto * synset = extension->getSynonyms(word); + + if (synset) + { + for (const auto & token : *synset) + out_data.insert(Field(token.data(), token.size())); + + current_offset += synset->size(); + } + out_offsets[i] = current_offset; + } + + return out; + } +}; + +void registerFunctionSynonyms(FunctionFactory & factory) +{ + factory.registerFunction(FunctionFactory::CaseInsensitive); +} + +} + +#endif diff --git a/src/Functions/tcpPort.cpp b/src/Functions/tcpPort.cpp index 484843ced3f..e0905ed4d0c 100644 --- a/src/Functions/tcpPort.cpp +++ b/src/Functions/tcpPort.cpp @@ -16,10 +16,10 @@ public: static FunctionPtr create(ContextPtr context) { - return std::make_shared(context->getTCPPort()); + return std::make_shared(context->isDistributed(), context->getTCPPort()); } - explicit FunctionTcpPort(UInt16 port_) : port(port_) + explicit FunctionTcpPort(bool is_distributed_, UInt16 port_) : is_distributed(is_distributed_), port(port_) { } @@ -31,12 +31,17 @@ public: bool isDeterministic() const override { return false; } + bool isDeterministicInScopeOfQuery() const override { return true; } + + bool isSuitableForConstantFolding() const override { return !is_distributed; } + ColumnPtr executeImpl(const ColumnsWithTypeAndName &, const DataTypePtr &, size_t input_rows_count) const override { return DataTypeUInt16().createColumnConst(input_rows_count, port); } private: + bool is_distributed; const UInt64 port; }; diff --git a/src/Functions/timezone.cpp b/src/Functions/timezone.cpp index 67f7462fc95..dda30352750 100644 --- a/src/Functions/timezone.cpp +++ b/src/Functions/timezone.cpp @@ -3,6 +3,7 @@ #include #include #include +#include namespace DB @@ -16,9 +17,13 @@ class FunctionTimezone : public IFunction { public: static constexpr auto name = "timezone"; - static FunctionPtr create(ContextPtr) + static FunctionPtr create(ContextPtr context) + { + return std::make_shared(context->isDistributed()); + } + + explicit FunctionTimezone(bool is_distributed_) : is_distributed(is_distributed_) { - return std::make_shared(); } String getName() const override @@ -36,11 +41,15 @@ public: } bool isDeterministic() const override { return false; } + bool isDeterministicInScopeOfQuery() const override { return true; } + bool isSuitableForConstantFolding() const override { return !is_distributed; } ColumnPtr executeImpl(const ColumnsWithTypeAndName &, const DataTypePtr &, size_t input_rows_count) const override { return DataTypeString().createColumnConst(input_rows_count, DateLUT::instance().getTimeZone()); } +private: + bool is_distributed; }; } diff --git a/src/Functions/toTimezone.cpp b/src/Functions/toTimezone.cpp index 551e07a8354..4bb5ab47659 100644 --- a/src/Functions/toTimezone.cpp +++ b/src/Functions/toTimezone.cpp @@ -19,20 +19,70 @@ namespace ErrorCodes namespace { +class ExecutableFunctionToTimeZone : public IExecutableFunction +{ +public: + explicit 
ExecutableFunctionToTimeZone() = default; + + String getName() const override { return "toTimezone"; } + + ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & /*result_type*/, size_t /*input_rows_count*/) const override + { + return arguments[0].column; + } +}; + +class FunctionBaseToTimeZone : public IFunctionBase +{ +public: + FunctionBaseToTimeZone( + bool is_constant_timezone_, + DataTypes argument_types_, + DataTypePtr return_type_) + : is_constant_timezone(is_constant_timezone_) + , argument_types(std::move(argument_types_)) + , return_type(std::move(return_type_)) {} + + String getName() const override { return "toTimezone"; } + + const DataTypes & getArgumentTypes() const override + { + return argument_types; + } + + const DataTypePtr & getResultType() const override + { + return return_type; + } + + ExecutableFunctionPtr prepare(const ColumnsWithTypeAndName & /*arguments*/) const override + { + return std::make_unique(); + } + + bool hasInformationAboutMonotonicity() const override { return is_constant_timezone; } + + Monotonicity getMonotonicityForRange(const IDataType & /*type*/, const Field & /*left*/, const Field & /*right*/) const override + { + return {is_constant_timezone, is_constant_timezone, is_constant_timezone}; + } + +private: + bool is_constant_timezone; + DataTypes argument_types; + DataTypePtr return_type; +}; /// Just changes time zone information for data type. The calculation is free. -class FunctionToTimezone : public IFunction +class ToTimeZoneOverloadResolver : public IFunctionOverloadResolver { public: static constexpr auto name = "toTimezone"; - static FunctionPtr create(ContextPtr) { return std::make_shared(); } - String getName() const override - { - return name; - } + String getName() const override { return name; } size_t getNumberOfArguments() const override { return 2; } + static FunctionOverloadResolverPtr create(ContextPtr) { return std::make_unique(); } DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override { @@ -54,9 +104,17 @@ public: return std::make_shared(date_time64->getScale(), time_zone_name); } - ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t /*input_rows_count*/) const override + FunctionBasePtr buildImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type) const override { - return arguments[0].column; + bool is_constant_timezone = false; + if (arguments[1].column) + is_constant_timezone = isColumnConst(*arguments[1].column); + + DataTypes data_types(arguments.size()); + for (size_t i = 0; i < arguments.size(); ++i) + data_types[i] = arguments[i].type; + + return std::make_unique(is_constant_timezone, data_types, result_type); } }; @@ -64,7 +122,7 @@ public: void registerFunctionToTimeZone(FunctionFactory & factory) { - factory.registerFunction(); + factory.registerFunction(); factory.registerAlias("toTimeZone", "toTimezone"); } diff --git a/src/Functions/uptime.cpp b/src/Functions/uptime.cpp index 02454df4de5..f0175031fc4 100644 --- a/src/Functions/uptime.cpp +++ b/src/Functions/uptime.cpp @@ -15,10 +15,10 @@ public: static constexpr auto name = "uptime"; static FunctionPtr create(ContextPtr context) { - return std::make_shared(context->getUptimeSeconds()); + return std::make_shared(context->isDistributed(), context->getUptimeSeconds()); } - explicit FunctionUptime(time_t uptime_) : uptime(uptime_) + explicit FunctionUptime(bool is_distributed_, time_t uptime_) : is_distributed(is_distributed_), 
uptime(uptime_) { } @@ -38,6 +38,8 @@ public: } bool isDeterministic() const override { return false; } + bool isDeterministicInScopeOfQuery() const override { return true; } + bool isSuitableForConstantFolding() const override { return !is_distributed; } ColumnPtr executeImpl(const ColumnsWithTypeAndName &, const DataTypePtr &, size_t input_rows_count) const override { @@ -45,6 +47,7 @@ public: } private: + bool is_distributed; time_t uptime; }; diff --git a/src/Functions/version.cpp b/src/Functions/version.cpp index 4e0ddf60975..5a31bd073d4 100644 --- a/src/Functions/version.cpp +++ b/src/Functions/version.cpp @@ -2,6 +2,7 @@ #include #include #include +#include #if !defined(ARCADIA_BUILD) # include @@ -16,9 +17,13 @@ class FunctionVersion : public IFunction { public: static constexpr auto name = "version"; - static FunctionPtr create(ContextPtr) + static FunctionPtr create(ContextPtr context) + { + return std::make_shared(context->isDistributed()); + } + + explicit FunctionVersion(bool is_distributed_) : is_distributed(is_distributed_) { - return std::make_shared(); } String getName() const override @@ -27,8 +32,8 @@ public: } bool isDeterministic() const override { return false; } - bool isDeterministicInScopeOfQuery() const override { return false; } - bool isSuitableForConstantFolding() const override { return false; } + bool isDeterministicInScopeOfQuery() const override { return true; } + bool isSuitableForConstantFolding() const override { return !is_distributed; } size_t getNumberOfArguments() const override { @@ -44,6 +49,8 @@ public: { return DataTypeString().createColumnConst(input_rows_count, VERSION_STRING); } +private: + bool is_distributed; }; diff --git a/src/Functions/ya.make b/src/Functions/ya.make index 7955d4091e9..2b9b3d94313 100644 --- a/src/Functions/ya.make +++ b/src/Functions/ya.make @@ -39,6 +39,7 @@ PEERDIR( SRCS( CRC.cpp + FunctionChar.cpp FunctionFQDN.cpp FunctionFactory.cpp FunctionFile.cpp @@ -46,7 +47,10 @@ SRCS( FunctionJoinGet.cpp FunctionSQLJSON.cpp FunctionsAES.cpp - FunctionsCoding.cpp + FunctionsBinaryRepr.cpp + FunctionsBitToArray.cpp + FunctionsCodingIP.cpp + FunctionsCodingUUID.cpp FunctionsConversion.cpp FunctionsEmbeddedDictionaries.cpp FunctionsExternalDictionaries.cpp @@ -209,7 +213,6 @@ SRCS( bitTestAny.cpp bitWrapperFunc.cpp bitXor.cpp - bitmaskToList.cpp blockNumber.cpp blockSerializedSize.cpp blockSize.cpp @@ -229,6 +232,8 @@ SRCS( countSubstringsCaseInsensitive.cpp countSubstringsCaseInsensitiveUTF8.cpp currentDatabase.cpp + currentProfiles.cpp + currentRoles.cpp currentUser.cpp dateDiff.cpp dateName.cpp @@ -316,6 +321,7 @@ SRCS( ilike.cpp in.cpp indexHint.cpp + initialQueryID.cpp initializeAggregation.cpp intDiv.cpp intDivOrZero.cpp @@ -334,6 +340,7 @@ SRCS( jumpConsistentHash.cpp lcm.cpp least.cpp + lemmatize.cpp lengthUTF8.cpp less.cpp lessOrEquals.cpp @@ -409,6 +416,7 @@ SRCS( positionCaseInsensitiveUTF8.cpp positionUTF8.cpp pow.cpp + queryID.cpp rand.cpp rand64.cpp randConstant.cpp @@ -474,6 +482,8 @@ SRCS( sleepEachRow.cpp sqrt.cpp startsWith.cpp + stem.cpp + stringCutToZero.cpp stringToH3.cpp substring.cpp subtractDays.cpp @@ -485,6 +495,7 @@ SRCS( subtractWeeks.cpp subtractYears.cpp svg.cpp + synonyms.cpp tan.cpp tanh.cpp tcpPort.cpp diff --git a/src/IO/FileEncryptionCommon.cpp b/src/IO/FileEncryptionCommon.cpp index 9cbc8ff0f3c..134c446c102 100644 --- a/src/IO/FileEncryptionCommon.cpp +++ b/src/IO/FileEncryptionCommon.cpp @@ -1,11 +1,13 @@ #include #if USE_SSL -#include #include +#include #include #include +#include +#include 
#include #include @@ -15,6 +17,7 @@ namespace DB namespace ErrorCodes { + extern const int BAD_ARGUMENTS; extern const int DATA_ENCRYPTION_ERROR; } @@ -23,244 +26,387 @@ namespace FileEncryption namespace { - String toBigEndianString(UInt128 value) + const EVP_CIPHER * getCipher(Algorithm algorithm) { - WriteBufferFromOwnString out; - writeBinaryBigEndian(value, out); - return std::move(out.str()); + switch (algorithm) + { + case Algorithm::AES_128_CTR: return EVP_aes_128_ctr(); + case Algorithm::AES_192_CTR: return EVP_aes_192_ctr(); + case Algorithm::AES_256_CTR: return EVP_aes_256_ctr(); + } + throw Exception( + ErrorCodes::BAD_ARGUMENTS, + "Encryption algorithm {} is not supported, specify one of the following: aes_128_ctr, aes_192_ctr, aes_256_ctr", + std::to_string(static_cast(algorithm))); } - UInt128 fromBigEndianString(const String & str) + void checkKeySize(const EVP_CIPHER * evp_cipher, size_t key_size) { - ReadBufferFromMemory in{str.data(), str.length()}; - UInt128 result; - readBinaryBigEndian(result, in); - return result; - } -} - -InitVector::InitVector(const String & iv_) : iv(fromBigEndianString(iv_)) {} - -const String & InitVector::str() const -{ - local = toBigEndianString(iv + counter); - return local; -} - -Encryption::Encryption(const String & iv_, const EncryptionKey & key_, size_t offset_) - : evp_cipher(defaultCipher()) - , init_vector(iv_) - , key(key_) - , block_size(cipherIVLength(evp_cipher)) -{ - if (iv_.size() != cipherIVLength(evp_cipher)) - throw DB::Exception("Expected iv with size " + std::to_string(cipherIVLength(evp_cipher)) + ", got iv with size " + std::to_string(iv_.size()), - DB::ErrorCodes::DATA_ENCRYPTION_ERROR); - if (key_.size() != cipherKeyLength(evp_cipher)) - throw DB::Exception("Expected key with size " + std::to_string(cipherKeyLength(evp_cipher)) + ", got iv with size " + std::to_string(key_.size()), - DB::ErrorCodes::DATA_ENCRYPTION_ERROR); - - offset = offset_; -} - -size_t Encryption::partBlockSize(size_t size, size_t off) const -{ - assert(off < block_size); - /// write the part as usual block - if (off == 0) - return 0; - return off + size <= block_size ? 
size : (block_size - off) % block_size; -} - -void Encryptor::encrypt(const char * plaintext, WriteBuffer & buf, size_t size) -{ - if (!size) - return; - - auto iv = InitVector(init_vector); - auto off = blockOffset(offset); - iv.set(blocks(offset)); - - size_t part_size = partBlockSize(size, off); - if (off) - { - buf.write(encryptPartialBlock(plaintext, part_size, iv, off).data(), part_size); - offset += part_size; - size -= part_size; - iv.inc(); + if (!key_size) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Encryption key must not be empty"); + size_t expected_key_size = static_cast(EVP_CIPHER_key_length(evp_cipher)); + if (key_size != expected_key_size) + throw Exception( + ErrorCodes::BAD_ARGUMENTS, "Got an encryption key with unexpected size {}, the size should be {}", key_size, expected_key_size); } - if (size) + void checkInitVectorSize(const EVP_CIPHER * evp_cipher) { - buf.write(encryptNBytes(plaintext + part_size, size, iv).data(), size); - offset += size; - } -} - -String Encryptor::encryptPartialBlock(const char * partial_block, size_t size, const InitVector & iv, size_t off) const -{ - if (size > block_size) - throw Exception("Expected partial block, got block with size > block_size: size = " + std::to_string(size) + " and offset = " + std::to_string(off), - ErrorCodes::DATA_ENCRYPTION_ERROR); - - String plaintext(block_size, '\0'); - for (size_t i = 0; i < size; ++i) - plaintext[i + off] = partial_block[i]; - - return String(encryptNBytes(plaintext.data(), block_size, iv), off, size); -} - -String Encryptor::encryptNBytes(const char * data, size_t bytes, const InitVector & iv) const -{ - String ciphertext(bytes, '\0'); - auto * ciphertext_ref = ciphertext.data(); - - auto evp_ctx_ptr = std::unique_ptr(EVP_CIPHER_CTX_new(), &EVP_CIPHER_CTX_free); - auto * evp_ctx = evp_ctx_ptr.get(); - - if (EVP_EncryptInit_ex(evp_ctx, evp_cipher, nullptr, nullptr, nullptr) != 1) - throw Exception("Failed to initialize encryption context with cipher", ErrorCodes::DATA_ENCRYPTION_ERROR); - - if (EVP_EncryptInit_ex(evp_ctx, nullptr, nullptr, - reinterpret_cast(key.str().data()), - reinterpret_cast(iv.str().data())) != 1) - throw Exception("Failed to set key and IV for encryption", ErrorCodes::DATA_ENCRYPTION_ERROR); - - int output_len = 0; - if (EVP_EncryptUpdate(evp_ctx, - reinterpret_cast(ciphertext_ref), &output_len, - reinterpret_cast(data), static_cast(bytes)) != 1) - throw Exception("Failed to encrypt", ErrorCodes::DATA_ENCRYPTION_ERROR); - - ciphertext_ref += output_len; - - int final_output_len = 0; - if (EVP_EncryptFinal_ex(evp_ctx, - reinterpret_cast(ciphertext_ref), &final_output_len) != 1) - throw Exception("Failed to fetch ciphertext", ErrorCodes::DATA_ENCRYPTION_ERROR); - - if (output_len < 0 || final_output_len < 0 || static_cast(output_len) + static_cast(final_output_len) != bytes) - throw Exception("Only part of the data was encrypted", ErrorCodes::DATA_ENCRYPTION_ERROR); - - return ciphertext; -} - -void Decryptor::decrypt(const char * ciphertext, BufferBase::Position buf, size_t size, size_t off) -{ - if (!size) - return; - - auto iv = InitVector(init_vector); - iv.set(blocks(off)); - off = blockOffset(off); - - size_t part_size = partBlockSize(size, off); - if (off) - { - decryptPartialBlock(buf, ciphertext, part_size, iv, off); - size -= part_size; - if (part_size + off == block_size) - iv.inc(); + size_t expected_iv_length = static_cast(EVP_CIPHER_iv_length(evp_cipher)); + if (InitVector::kSize != expected_iv_length) + throw Exception( + ErrorCodes::DATA_ENCRYPTION_ERROR, 
+ "Got an initialization vector with unexpected size {}, the size should be {}", + InitVector::kSize, + expected_iv_length); } - if (size) - decryptNBytes(buf, ciphertext + part_size, size, iv); + constexpr const size_t kBlockSize = 16; + + size_t blockOffset(size_t pos) { return pos % kBlockSize; } + size_t blocks(size_t pos) { return pos / kBlockSize; } + + size_t partBlockSize(size_t size, size_t off) + { + assert(off < kBlockSize); + /// write the part as usual block + if (off == 0) + return 0; + return off + size <= kBlockSize ? size : (kBlockSize - off) % kBlockSize; + } + + size_t encryptBlocks(EVP_CIPHER_CTX * evp_ctx, const char * data, size_t size, WriteBuffer & out) + { + const uint8_t * in = reinterpret_cast(data); + size_t in_size = 0; + size_t out_size = 0; + + while (in_size < size) + { + out.nextIfAtEnd(); + size_t part_size = std::min(size - in_size, out.available()); + uint8_t * ciphertext = reinterpret_cast(out.position()); + int ciphertext_size = 0; + if (!EVP_EncryptUpdate(evp_ctx, ciphertext, &ciphertext_size, &in[in_size], part_size)) + throw Exception("Failed to encrypt", ErrorCodes::DATA_ENCRYPTION_ERROR); + + in_size += part_size; + if (ciphertext_size) + { + out.position() += ciphertext_size; + out_size += ciphertext_size; + } + } + + return out_size; + } + + size_t encryptBlockWithPadding(EVP_CIPHER_CTX * evp_ctx, const char * data, size_t size, size_t pad_left, WriteBuffer & out) + { + assert((size <= kBlockSize) && (size + pad_left <= kBlockSize)); + uint8_t padded_data[kBlockSize] = {}; + memcpy(&padded_data[pad_left], data, size); + size_t padded_data_size = pad_left + size; + + uint8_t ciphertext[kBlockSize]; + int ciphertext_size = 0; + if (!EVP_EncryptUpdate(evp_ctx, ciphertext, &ciphertext_size, padded_data, padded_data_size)) + throw Exception("Failed to encrypt", ErrorCodes::DATA_ENCRYPTION_ERROR); + + if (!ciphertext_size) + return 0; + + if (static_cast(ciphertext_size) < pad_left) + throw Exception(ErrorCodes::DATA_ENCRYPTION_ERROR, "Unexpected size of encrypted data: {} < {}", ciphertext_size, pad_left); + + uint8_t * ciphertext_begin = &ciphertext[pad_left]; + ciphertext_size -= pad_left; + out.write(reinterpret_cast(ciphertext_begin), ciphertext_size); + return ciphertext_size; + } + + size_t encryptFinal(EVP_CIPHER_CTX * evp_ctx, WriteBuffer & out) + { + uint8_t ciphertext[kBlockSize]; + int ciphertext_size = 0; + if (!EVP_EncryptFinal_ex(evp_ctx, + ciphertext, &ciphertext_size)) + throw Exception("Failed to finalize encrypting", ErrorCodes::DATA_ENCRYPTION_ERROR); + if (ciphertext_size) + out.write(reinterpret_cast(ciphertext), ciphertext_size); + return ciphertext_size; + } + + size_t decryptBlocks(EVP_CIPHER_CTX * evp_ctx, const char * data, size_t size, char * out) + { + const uint8_t * in = reinterpret_cast(data); + uint8_t * plaintext = reinterpret_cast(out); + int plaintext_size = 0; + if (!EVP_DecryptUpdate(evp_ctx, plaintext, &plaintext_size, in, size)) + throw Exception("Failed to decrypt", ErrorCodes::DATA_ENCRYPTION_ERROR); + return plaintext_size; + } + + size_t decryptBlockWithPadding(EVP_CIPHER_CTX * evp_ctx, const char * data, size_t size, size_t pad_left, char * out) + { + assert((size <= kBlockSize) && (size + pad_left <= kBlockSize)); + uint8_t padded_data[kBlockSize] = {}; + memcpy(&padded_data[pad_left], data, size); + size_t padded_data_size = pad_left + size; + + uint8_t plaintext[kBlockSize]; + int plaintext_size = 0; + if (!EVP_DecryptUpdate(evp_ctx, plaintext, &plaintext_size, padded_data, padded_data_size)) + throw 
Exception("Failed to decrypt", ErrorCodes::DATA_ENCRYPTION_ERROR); + + if (!plaintext_size) + return 0; + + if (static_cast(plaintext_size) < pad_left) + throw Exception(ErrorCodes::DATA_ENCRYPTION_ERROR, "Unexpected size of decrypted data: {} < {}", plaintext_size, pad_left); + + const uint8_t * plaintext_begin = &plaintext[pad_left]; + plaintext_size -= pad_left; + memcpy(out, plaintext_begin, plaintext_size); + return plaintext_size; + } + + size_t decryptFinal(EVP_CIPHER_CTX * evp_ctx, char * out) + { + uint8_t plaintext[kBlockSize]; + int plaintext_size = 0; + if (!EVP_DecryptFinal_ex(evp_ctx, plaintext, &plaintext_size)) + throw Exception("Failed to finalize decrypting", ErrorCodes::DATA_ENCRYPTION_ERROR); + if (plaintext_size) + memcpy(out, plaintext, plaintext_size); + return plaintext_size; + } + + constexpr const char kHeaderSignature[] = "ENC"; + constexpr const UInt16 kHeaderCurrentVersion = 1; } -void Decryptor::decryptPartialBlock(BufferBase::Position & to, const char * partial_block, size_t size, const InitVector & iv, size_t off) const + +String toString(Algorithm algorithm) { - if (size > block_size) - throw Exception("Expecter partial block, got block with size > block_size: size = " + std::to_string(size) + " and offset = " + std::to_string(off), - ErrorCodes::DATA_ENCRYPTION_ERROR); - - String ciphertext(block_size, '\0'); - String plaintext(block_size, '\0'); - for (size_t i = 0; i < size; ++i) - ciphertext[i + off] = partial_block[i]; - - auto * plaintext_ref = plaintext.data(); - decryptNBytes(plaintext_ref, ciphertext.data(), off + size, iv); - - for (size_t i = 0; i < size; ++i) - *(to++) = plaintext[i + off]; + switch (algorithm) + { + case Algorithm::AES_128_CTR: return "aes_128_ctr"; + case Algorithm::AES_192_CTR: return "aes_192_ctr"; + case Algorithm::AES_256_CTR: return "aes_256_ctr"; + } + throw Exception( + ErrorCodes::BAD_ARGUMENTS, + "Encryption algorithm {} is not supported, specify one of the following: aes_128_ctr, aes_192_ctr, aes_256_ctr", + std::to_string(static_cast(algorithm))); } -void Decryptor::decryptNBytes(BufferBase::Position & to, const char * data, size_t bytes, const InitVector & iv) const +void parseFromString(Algorithm & algorithm, const String & str) { - auto evp_ctx_ptr = std::unique_ptr(EVP_CIPHER_CTX_new(), &EVP_CIPHER_CTX_free); - auto * evp_ctx = evp_ctx_ptr.get(); - - if (EVP_DecryptInit_ex(evp_ctx, evp_cipher, nullptr, nullptr, nullptr) != 1) - throw Exception("Failed to initialize decryption context with cipher", ErrorCodes::DATA_ENCRYPTION_ERROR); - - if (EVP_DecryptInit_ex(evp_ctx, nullptr, nullptr, - reinterpret_cast(key.str().data()), - reinterpret_cast(iv.str().data())) != 1) - throw Exception("Failed to set key and IV for decryption", ErrorCodes::DATA_ENCRYPTION_ERROR); - - int output_len = 0; - if (EVP_DecryptUpdate(evp_ctx, - reinterpret_cast(to), &output_len, - reinterpret_cast(data), static_cast(bytes)) != 1) - throw Exception("Failed to decrypt", ErrorCodes::DATA_ENCRYPTION_ERROR); - - to += output_len; - - int final_output_len = 0; - if (EVP_DecryptFinal_ex(evp_ctx, - reinterpret_cast(to), &final_output_len) != 1) - throw Exception("Failed to fetch plaintext", ErrorCodes::DATA_ENCRYPTION_ERROR); - - if (output_len < 0 || final_output_len < 0 || static_cast(output_len) + static_cast(final_output_len) != bytes) - throw Exception("Only part of the data was decrypted", ErrorCodes::DATA_ENCRYPTION_ERROR); + if (boost::iequals(str, "aes_128_ctr")) + algorithm = Algorithm::AES_128_CTR; + else if (boost::iequals(str, 
"aes_192_ctr")) + algorithm = Algorithm::AES_192_CTR; + else if (boost::iequals(str, "aes_256_ctr")) + algorithm = Algorithm::AES_256_CTR; + else + throw Exception( + ErrorCodes::BAD_ARGUMENTS, + "Encryption algorithm '{}' is not supported, specify one of the following: aes_128_ctr, aes_192_ctr, aes_256_ctr", + str); } -String readIV(size_t size, ReadBuffer & in) +void checkKeySize(Algorithm algorithm, size_t key_size) { checkKeySize(getCipher(algorithm), key_size); } + + +String InitVector::toString() const { - String iv(size, 0); - in.readStrict(reinterpret_cast(iv.data()), size); - return iv; + static_assert(sizeof(counter) == InitVector::kSize); + WriteBufferFromOwnString out; + writeBinaryBigEndian(counter, out); + return std::move(out.str()); } -String randomString(size_t size) +InitVector InitVector::fromString(const String & str) { - String iv(size, 0); + if (str.length() != InitVector::kSize) + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Expected iv with size {}, got iv with size {}", InitVector::kSize, str.length()); + ReadBufferFromMemory in{str.data(), str.length()}; + UInt128 counter; + readBinaryBigEndian(counter, in); + return InitVector{counter}; +} +void InitVector::read(ReadBuffer & in) +{ + readBinaryBigEndian(counter, in); +} + +void InitVector::write(WriteBuffer & out) const +{ + writeBinaryBigEndian(counter, out); +} + +InitVector InitVector::random() +{ std::random_device rd; std::mt19937 gen{rd()}; - std::uniform_int_distribution dis; + std::uniform_int_distribution dis; + UInt128 counter; + for (size_t i = 0; i != std::size(counter.items); ++i) + counter.items[i] = dis(gen); + return InitVector{counter}; +} - char * ptr = iv.data(); - while (size) + +Encryptor::Encryptor(Algorithm algorithm_, const String & key_, const InitVector & iv_) + : key(key_) + , init_vector(iv_) + , evp_cipher(getCipher(algorithm_)) +{ + checkKeySize(evp_cipher, key.size()); + checkInitVectorSize(evp_cipher); +} + +void Encryptor::encrypt(const char * data, size_t size, WriteBuffer & out) +{ + if (!size) + return; + + auto current_iv = (init_vector + blocks(offset)).toString(); + + auto evp_ctx_ptr = std::unique_ptr(EVP_CIPHER_CTX_new(), &EVP_CIPHER_CTX_free); + auto * evp_ctx = evp_ctx_ptr.get(); + + if (!EVP_EncryptInit_ex(evp_ctx, evp_cipher, nullptr, nullptr, nullptr)) + throw Exception("Failed to initialize encryption context with cipher", ErrorCodes::DATA_ENCRYPTION_ERROR); + + if (!EVP_EncryptInit_ex(evp_ctx, nullptr, nullptr, + reinterpret_cast(key.c_str()), reinterpret_cast(current_iv.c_str()))) + throw Exception("Failed to set key and IV for encryption", ErrorCodes::DATA_ENCRYPTION_ERROR); + + size_t in_size = 0; + size_t out_size = 0; + + auto off = blockOffset(offset); + if (off) { - auto value = dis(gen); - size_t n = std::min(size, sizeof(value)); - memcpy(ptr, &value, n); - ptr += n; - size -= n; + size_t in_part_size = partBlockSize(size, off); + size_t out_part_size = encryptBlockWithPadding(evp_ctx, &data[in_size], in_part_size, off, out); + in_size += in_part_size; + out_size += out_part_size; } - return iv; + if (in_size < size) + { + size_t in_part_size = size - in_size; + size_t out_part_size = encryptBlocks(evp_ctx, &data[in_size], in_part_size, out); + in_size += in_part_size; + out_size += out_part_size; + } + + out_size += encryptFinal(evp_ctx, out); + + if (out_size != in_size) + throw Exception("Only part of the data was encrypted", ErrorCodes::DATA_ENCRYPTION_ERROR); + offset += in_size; } -void writeIV(const String & iv, WriteBuffer & out) +void 
Encryptor::decrypt(const char * data, size_t size, char * out) { - out.write(iv.data(), iv.length()); + if (!size) + return; + + auto current_iv = (init_vector + blocks(offset)).toString(); + + auto evp_ctx_ptr = std::unique_ptr(EVP_CIPHER_CTX_new(), &EVP_CIPHER_CTX_free); + auto * evp_ctx = evp_ctx_ptr.get(); + + if (!EVP_DecryptInit_ex(evp_ctx, evp_cipher, nullptr, nullptr, nullptr)) + throw Exception("Failed to initialize decryption context with cipher", ErrorCodes::DATA_ENCRYPTION_ERROR); + + if (!EVP_DecryptInit_ex(evp_ctx, nullptr, nullptr, + reinterpret_cast(key.c_str()), reinterpret_cast(current_iv.c_str()))) + throw Exception("Failed to set key and IV for decryption", ErrorCodes::DATA_ENCRYPTION_ERROR); + + size_t in_size = 0; + size_t out_size = 0; + + auto off = blockOffset(offset); + if (off) + { + size_t in_part_size = partBlockSize(size, off); + size_t out_part_size = decryptBlockWithPadding(evp_ctx, &data[in_size], in_part_size, off, &out[out_size]); + in_size += in_part_size; + out_size += out_part_size; + } + + if (in_size < size) + { + size_t in_part_size = size - in_size; + size_t out_part_size = decryptBlocks(evp_ctx, &data[in_size], in_part_size, &out[out_size]); + in_size += in_part_size; + out_size += out_part_size; + } + + out_size += decryptFinal(evp_ctx, &out[out_size]); + + if (out_size != in_size) + throw Exception("Only part of the data was decrypted", ErrorCodes::DATA_ENCRYPTION_ERROR); + offset += in_size; } -size_t cipherKeyLength(const EVP_CIPHER * evp_cipher) + +void Header::read(ReadBuffer & in) { - return static_cast(EVP_CIPHER_key_length(evp_cipher)); + constexpr size_t header_signature_size = std::size(kHeaderSignature) - 1; + char signature[std::size(kHeaderSignature)] = {}; + in.readStrict(signature, header_signature_size); + if (strcmp(signature, kHeaderSignature) != 0) + throw Exception(ErrorCodes::DATA_ENCRYPTION_ERROR, "Wrong signature, this is not an encrypted file"); + + UInt16 version; + readPODBinary(version, in); + if (version != kHeaderCurrentVersion) + throw Exception(ErrorCodes::DATA_ENCRYPTION_ERROR, "Version {} of the header is not supported", version); + + UInt16 algorithm_u16; + readPODBinary(algorithm_u16, in); + algorithm = static_cast(algorithm_u16); + + readPODBinary(key_id, in); + readPODBinary(key_hash, in); + init_vector.read(in); + + constexpr size_t reserved_size = kSize - header_signature_size - sizeof(version) - sizeof(algorithm_u16) - sizeof(key_id) - sizeof(key_hash) - InitVector::kSize; + static_assert(reserved_size < kSize); + in.ignore(reserved_size); } -size_t cipherIVLength(const EVP_CIPHER * evp_cipher) +void Header::write(WriteBuffer & out) const { - return static_cast(EVP_CIPHER_iv_length(evp_cipher)); + constexpr size_t header_signature_size = std::size(kHeaderSignature) - 1; + out.write(kHeaderSignature, header_signature_size); + + UInt16 version = kHeaderCurrentVersion; + writePODBinary(version, out); + + UInt16 algorithm_u16 = static_cast(algorithm); + writePODBinary(algorithm_u16, out); + + writePODBinary(key_id, out); + writePODBinary(key_hash, out); + init_vector.write(out); + + constexpr size_t reserved_size = kSize - header_signature_size - sizeof(version) - sizeof(algorithm_u16) - sizeof(key_id) - sizeof(key_hash) - InitVector::kSize; + static_assert(reserved_size < kSize); + char reserved_zero_bytes[reserved_size] = {}; + out.write(reserved_zero_bytes, reserved_size); } -const EVP_CIPHER * defaultCipher() +UInt8 calculateKeyHash(const String & key) { - return EVP_aes_128_ctr(); + return 
static_cast(sipHash64(key.data(), key.size())) & 0x0F; } } diff --git a/src/IO/FileEncryptionCommon.h b/src/IO/FileEncryptionCommon.h index f40de99faf6..91fc80c0b22 100644 --- a/src/IO/FileEncryptionCommon.h +++ b/src/IO/FileEncryptionCommon.h @@ -16,87 +16,116 @@ class WriteBuffer; namespace FileEncryption { -constexpr size_t kIVSize = sizeof(UInt128); +/// Encryption algorithm. +/// We chose to use CTR cipther algorithms because they have the following features which are important for us: +/// - No right padding, so we can append encrypted files without deciphering; +/// - One byte is always ciphered as one byte, so we get random access to encrypted files easily. +enum class Algorithm +{ + AES_128_CTR, /// Size of key is 16 bytes. + AES_192_CTR, /// Size of key is 24 bytes. + AES_256_CTR, /// Size of key is 32 bytes. +}; +String toString(Algorithm algorithm); +void parseFromString(Algorithm & algorithm, const String & str); + +/// Throws an exception if a specified key size doesn't correspond a specified encryption algorithm. +void checkKeySize(Algorithm algorithm, size_t key_size); + + +/// Initialization vector. Its size is always 16 bytes. class InitVector { public: - InitVector(const String & iv_); - const String & str() const; - void inc() { ++counter; } - void inc(size_t n) { counter += n; } - void set(size_t n) { counter = n; } + static constexpr const size_t kSize = 16; + + InitVector() = default; + explicit InitVector(const UInt128 & counter_) { set(counter_); } + + void set(const UInt128 & counter_) { counter = counter_; } + UInt128 get() const { return counter; } + + void read(ReadBuffer & in); + void write(WriteBuffer & out) const; + + /// Write 16 bytes of the counter to a string in big endian order. + /// We need big endian because the used cipher algorithms treat an initialization vector as a counter in big endian. + String toString() const; + + /// Converts a string of 16 bytes length in big endian order to a counter. + static InitVector fromString(const String & str_); + + /// Adds a specified offset to the counter. + InitVector & operator++() { ++counter; return *this; } + InitVector operator++(int) { InitVector res = *this; ++counter; return res; } + InitVector & operator+=(size_t offset) { counter += offset; return *this; } + InitVector operator+(size_t offset) const { InitVector res = *this; return res += offset; } + + /// Generates a random initialization vector. + static InitVector random(); private: - UInt128 iv; UInt128 counter = 0; - mutable String local; }; -class EncryptionKey +/// Encrypts or decrypts data. +class Encryptor { public: - EncryptionKey(const String & key_) : key(key_) { } - size_t size() const { return key.size(); } - const String & str() const { return key; } + /// The `key` should have size 16 or 24 or 32 bytes depending on which `algorithm` is specified. + Encryptor(Algorithm algorithm_, const String & key_, const InitVector & iv_); + + /// Sets the current position in the data stream from the very beginning of data. + /// It affects how the data will be encrypted or decrypted because + /// the initialization vector is increased by an index of the current block + /// and the index of the current block is calculated from this offset. + void setOffset(size_t offset_) { offset = offset_; } + + /// Encrypts some data. + /// Also the function moves `offset` by `size` (for successive encryptions). + void encrypt(const char * data, size_t size, WriteBuffer & out); + + /// Decrypts some data. 
+ /// The used cipher algorithms generate the same number of bytes in output as they were in input, + /// so the function always writes `size` bytes of the plaintext to `out`. + /// Also the function moves `offset` by `size` (for successive decryptions). + void decrypt(const char * data, size_t size, char * out); private: - String key; -}; + const String key; + const InitVector init_vector; + const EVP_CIPHER * const evp_cipher; - -class Encryption -{ -public: - Encryption(const String & iv_, const EncryptionKey & key_, size_t offset_); - -protected: - size_t blockOffset(size_t pos) const { return pos % block_size; } - size_t blocks(size_t pos) const { return pos / block_size; } - size_t partBlockSize(size_t size, size_t off) const; - const EVP_CIPHER * get() const { return evp_cipher; } - - const EVP_CIPHER * evp_cipher; - const String init_vector; - const EncryptionKey key; - size_t block_size; - - /// absolute offset + /// The current position in the data stream from the very beginning of data. size_t offset = 0; }; -class Encryptor : public Encryption +/// File header which is stored at the beginning of encrypted files. +struct Header { -public: - using Encryption::Encryption; - void encrypt(const char * plaintext, WriteBuffer & buf, size_t size); + Algorithm algorithm = Algorithm::AES_128_CTR; -private: - String encryptPartialBlock(const char * partial_block, size_t size, const InitVector & iv, size_t off) const; - String encryptNBytes(const char * data, size_t bytes, const InitVector & iv) const; + /// Identifier of the key to encrypt or decrypt this file. + UInt64 key_id = 0; + + /// Hash of the key to encrypt or decrypt this file. + UInt8 key_hash = 0; + + InitVector init_vector; + + /// The size of this header in bytes, including reserved bytes. + static constexpr const size_t kSize = 64; + + void read(ReadBuffer & in); + void write(WriteBuffer & out) const; }; - -class Decryptor : public Encryption -{ -public: - Decryptor(const String & iv_, const EncryptionKey & key_) : Encryption(iv_, key_, 0) { } - void decrypt(const char * ciphertext, char * buf, size_t size, size_t off); - -private: - void decryptPartialBlock(char *& to, const char * partial_block, size_t size, const InitVector & iv, size_t off) const; - void decryptNBytes(char *& to, const char * data, size_t bytes, const InitVector & iv) const; -}; - - -String readIV(size_t size, ReadBuffer & in); -String randomString(size_t size); -void writeIV(const String & iv, WriteBuffer & out); -size_t cipherKeyLength(const EVP_CIPHER * evp_cipher); -size_t cipherIVLength(const EVP_CIPHER * evp_cipher); -const EVP_CIPHER * defaultCipher(); +/// Calculates the hash of a passed key. +/// 1 byte is enough because this hash is used only for the first check. +UInt8 calculateKeyHash(const String & key); } } diff --git a/src/IO/OpenedFile.cpp b/src/IO/OpenedFile.cpp new file mode 100644 index 00000000000..6df21e836b4 --- /dev/null +++ b/src/IO/OpenedFile.cpp @@ -0,0 +1,67 @@ +#include +#include + +#include +#include +#include + + +namespace ProfileEvents +{ + extern const Event FileOpen; +} + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int FILE_DOESNT_EXIST; + extern const int CANNOT_OPEN_FILE; + extern const int CANNOT_CLOSE_FILE; +} + + +void OpenedFile::open(int flags) +{ + ProfileEvents::increment(ProfileEvents::FileOpen); + + fd = ::open(file_name.c_str(), (flags == -1 ? 0 : flags) | O_RDONLY | O_CLOEXEC); + + if (-1 == fd) + throwFromErrnoWithPath("Cannot open file " + file_name, file_name, + errno == ENOENT ? 
ErrorCodes::FILE_DOESNT_EXIST : ErrorCodes::CANNOT_OPEN_FILE); +} + + +std::string OpenedFile::getFileName() const +{ + return file_name; +} + + +OpenedFile::OpenedFile(const std::string & file_name_, int flags) + : file_name(file_name_) +{ + open(flags); +} + + +OpenedFile::~OpenedFile() +{ + if (fd != -1) + close(); /// Exceptions will lead to std::terminate and that's Ok. +} + + +void OpenedFile::close() +{ + if (0 != ::close(fd)) + throw Exception("Cannot close file", ErrorCodes::CANNOT_CLOSE_FILE); + + fd = -1; + metric_increment.destroy(); +} + +} + diff --git a/src/IO/OpenedFile.h b/src/IO/OpenedFile.h new file mode 100644 index 00000000000..8af0c83c363 --- /dev/null +++ b/src/IO/OpenedFile.h @@ -0,0 +1,39 @@ +#pragma once + +#include +#include + + +namespace CurrentMetrics +{ + extern const Metric OpenFileForRead; +} + + +namespace DB +{ + +/// RAII for readonly opened file descriptor. +class OpenedFile +{ +public: + OpenedFile(const std::string & file_name_, int flags); + ~OpenedFile(); + + /// Close prematurally. + void close(); + + int getFD() const { return fd; } + std::string getFileName() const; + +private: + std::string file_name; + int fd = -1; + + CurrentMetrics::Increment metric_increment{CurrentMetrics::OpenFileForRead}; + + void open(int flags); +}; + +} + diff --git a/src/IO/OpenedFileCache.h b/src/IO/OpenedFileCache.h new file mode 100644 index 00000000000..3ae668c20e1 --- /dev/null +++ b/src/IO/OpenedFileCache.h @@ -0,0 +1,74 @@ +#pragma once + +#include +#include + +#include +#include +#include + + +namespace ProfileEvents +{ + extern const Event OpenedFileCacheHits; + extern const Event OpenedFileCacheMisses; +} + +namespace DB +{ + + +/** Cache of opened files for reading. + * It allows to share file descriptors when doing reading with 'pread' syscalls on readonly files. + * Note: open/close of files is very cheap on Linux and we should not bother doing it 10 000 times a second. + * (This may not be the case on Windows with WSL. This is also not the case if strace is active. Neither when some eBPF is loaded). + * But sometimes we may end up opening one file multiple times, that increases chance exhausting opened files limit. + */ +class OpenedFileCache +{ +private: + using Key = std::pair; + + using OpenedFileWeakPtr = std::weak_ptr; + using Files = std::map; + + Files files; + std::mutex mutex; + +public: + using OpenedFilePtr = std::shared_ptr; + + OpenedFilePtr get(const std::string & path, int flags) + { + Key key(path, flags); + + std::lock_guard lock(mutex); + + auto [it, inserted] = files.emplace(key, OpenedFilePtr{}); + if (!inserted) + if (auto res = it->second.lock()) + return res; + + OpenedFilePtr res + { + new OpenedFile(path, flags), + [key, this](auto ptr) + { + { + std::lock_guard another_lock(mutex); + files.erase(key); + } + delete ptr; + } + }; + + it->second = res; + return res; + } +}; + +using OpenedFileCachePtr = std::shared_ptr; + +} + + diff --git a/src/IO/Progress.h b/src/IO/Progress.h index 446acef9abd..e1253ab8eb8 100644 --- a/src/IO/Progress.h +++ b/src/IO/Progress.h @@ -2,6 +2,7 @@ #include #include +#include #include #include @@ -120,4 +121,12 @@ struct Progress } }; + +/** Callback to track the progress of the query. + * Used in IBlockInputStream and Context. + * The function takes the number of rows in the last block, the number of bytes in the last block. + * Note that the callback can be called from different threads. 
+ */ +using ProgressCallback = std::function; + } diff --git a/src/IO/ReadBufferFromEncryptedFile.cpp b/src/IO/ReadBufferFromEncryptedFile.cpp index 7a4d0e4ca14..445c55ac269 100644 --- a/src/IO/ReadBufferFromEncryptedFile.cpp +++ b/src/IO/ReadBufferFromEncryptedFile.cpp @@ -10,90 +10,94 @@ namespace ErrorCodes } ReadBufferFromEncryptedFile::ReadBufferFromEncryptedFile( - size_t buf_size_, + size_t buffer_size_, std::unique_ptr in_, - const String & init_vector_, - const FileEncryption::EncryptionKey & key_, - const size_t iv_offset_) - : ReadBufferFromFileBase(buf_size_, nullptr, 0) + const String & key_, + const FileEncryption::Header & header_, + size_t offset_) + : ReadBufferFromFileBase(buffer_size_, nullptr, 0) , in(std::move(in_)) - , buf_size(buf_size_) - , decryptor(FileEncryption::Decryptor(init_vector_, key_)) - , iv_offset(iv_offset_) + , encrypted_buffer(buffer_size_) + , encryptor(header_.algorithm, key_, header_.init_vector) { + offset = offset_; + encryptor.setOffset(offset_); + need_seek = true; } off_t ReadBufferFromEncryptedFile::seek(off_t off, int whence) { - if (whence == SEEK_CUR) - { - if (off < 0 && -off > getPosition()) - throw Exception("SEEK_CUR shift out of bounds", ErrorCodes::ARGUMENT_OUT_OF_BOUND); - - if (!working_buffer.empty() && static_cast(offset() + off) < working_buffer.size()) - { - pos += off; - return getPosition(); - } - else - start_pos = off + getPosition(); - } - else if (whence == SEEK_SET) + off_t new_pos; + if (whence == SEEK_SET) { if (off < 0) throw Exception("SEEK_SET underflow: off = " + std::to_string(off), ErrorCodes::ARGUMENT_OUT_OF_BOUND); - - if (!working_buffer.empty() && static_cast(off) >= start_pos - && static_cast(off) < (start_pos + working_buffer.size())) - { - pos = working_buffer.begin() + (off - start_pos); - return getPosition(); - } - else - start_pos = off; + new_pos = off; + } + else if (whence == SEEK_CUR) + { + if (off < 0 && -off > getPosition()) + throw Exception("SEEK_CUR shift out of bounds", ErrorCodes::ARGUMENT_OUT_OF_BOUND); + new_pos = getPosition() + off; } else - throw Exception("ReadBufferFromEncryptedFile::seek expects SEEK_SET or SEEK_CUR as whence", ErrorCodes::ARGUMENT_OUT_OF_BOUND); + throw Exception("ReadBufferFromFileEncrypted::seek expects SEEK_SET or SEEK_CUR as whence", ErrorCodes::ARGUMENT_OUT_OF_BOUND); - initialize(); - return start_pos; + if ((offset - static_cast(working_buffer.size()) <= new_pos) && (new_pos <= offset) && !need_seek) + { + /// Position is still inside buffer. + pos = working_buffer.end() - offset + new_pos; + assert(pos >= working_buffer.begin()); + assert(pos <= working_buffer.end()); + } + else + { + need_seek = true; + offset = new_pos; + + /// No more reading from the current working buffer until next() is called. + pos = working_buffer.end(); + assert(!hasPendingData()); + } + + /// The encryptor always needs to know what the current offset is. 
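That offset matters because CTR mode derives its keystream from the block index: plaintext byte `pos` falls into block `pos / 16`, at position `pos % 16` inside that block, which is exactly what `blocks()` and `blockOffset()` in FileEncryptionCommon.cpp compute. A standalone sketch of the mapping (the seek target 37 is only an example):

```cpp
#include <cstddef>
#include <iostream>

constexpr size_t kBlockSize = 16;   /// AES block size; the CTR counter advances once per block

/// Same arithmetic as blocks() / blockOffset() in FileEncryptionCommon.cpp.
size_t blocks(size_t pos) { return pos / kBlockSize; }
size_t blockOffset(size_t pos) { return pos % kBlockSize; }

int main()
{
    size_t pos = 37;   /// plaintext offset we are seeking to

    /// The initialization vector handed to OpenSSL is the file's base IV plus this block count,
    /// so decryption starts with the correct keystream block...
    std::cout << "advance counter by " << blocks(pos) << " blocks\n";        /// 2

    /// ...and the first blockOffset(pos) bytes of that block are recreated as left padding
    /// and discarded (see decryptBlockWithPadding above).
    std::cout << "skip " << blockOffset(pos) << " bytes inside the block\n"; /// 5
}
```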
+ encryptor.setOffset(new_pos); + + return new_pos; +} + +off_t ReadBufferFromEncryptedFile::getPosition() +{ + return offset - available(); } bool ReadBufferFromEncryptedFile::nextImpl() { + if (need_seek) + { + off_t raw_offset = offset + FileEncryption::Header::kSize; + if (in->seek(raw_offset, SEEK_SET) != raw_offset) + return false; + need_seek = false; + } + if (in->eof()) return false; - if (initialized) - start_pos += working_buffer.size(); - initialize(); - return true; -} - -void ReadBufferFromEncryptedFile::initialize() -{ - size_t in_pos = start_pos + iv_offset; - - String data; - data.resize(buf_size); - size_t data_size = 0; - - in->seek(in_pos, SEEK_SET); - while (data_size < buf_size && !in->eof()) + /// Read up to the size of `encrypted_buffer`. + size_t bytes_read = 0; + while (bytes_read < encrypted_buffer.size() && !in->eof()) { - auto size = in->read(data.data() + data_size, buf_size - data_size); - data_size += size; - in_pos += size; - in->seek(in_pos, SEEK_SET); + bytes_read += in->read(encrypted_buffer.data() + bytes_read, encrypted_buffer.size() - bytes_read); } - data.resize(data_size); - working_buffer.resize(data_size); - - decryptor.decrypt(data.data(), working_buffer.begin(), data_size, start_pos); + /// The used cipher algorithms generate the same number of bytes in output as it were in input, + /// so after deciphering the numbers of bytes will be still `bytes_read`. + working_buffer.resize(bytes_read); + encryptor.decrypt(encrypted_buffer.data(), bytes_read, working_buffer.begin()); pos = working_buffer.begin(); - initialized = true; + return true; } } diff --git a/src/IO/ReadBufferFromEncryptedFile.h b/src/IO/ReadBufferFromEncryptedFile.h index b9c84537f17..9409c4577d2 100644 --- a/src/IO/ReadBufferFromEncryptedFile.h +++ b/src/IO/ReadBufferFromEncryptedFile.h @@ -12,39 +12,34 @@ namespace DB { +/// Reads data from the underlying read buffer and decrypts it. 
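For orientation, here is a hypothetical usage sketch of this class, put together only from signatures visible in this diff; the helper name `readEncryptedFile` is made up, and key management and the `USE_SSL` guard are omitted.

```cpp
#include <IO/ReadBufferFromEncryptedFile.h>
#include <IO/ReadBufferFromFile.h>
#include <IO/ReadHelpers.h>
#include <memory>

using namespace DB;

/// Hypothetical helper: read the whole plaintext of an encrypted file.
String readEncryptedFile(const String & path, const String & key)
{
    auto raw = std::make_unique<ReadBufferFromFile>(path);

    /// Every encrypted file starts with the 64-byte header: signature, version,
    /// algorithm, key id, key hash and the initialization vector.
    FileEncryption::Header header;
    header.read(*raw);

    /// The buffer seeks the underlying file to `offset + Header::kSize` by itself
    /// (see nextImpl above), so the initial plaintext offset is simply 0.
    ReadBufferFromEncryptedFile in(DBMS_DEFAULT_BUFFER_SIZE, std::move(raw), key, header);

    String plaintext;
    readStringUntilEOF(plaintext, in);
    return plaintext;
}
```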
class ReadBufferFromEncryptedFile : public ReadBufferFromFileBase { public: ReadBufferFromEncryptedFile( - size_t buf_size_, + size_t buffer_size_, std::unique_ptr in_, - const String & init_vector_, - const FileEncryption::EncryptionKey & key_, - const size_t iv_offset_); + const String & key_, + const FileEncryption::Header & header_, + size_t offset_ = 0); off_t seek(off_t off, int whence) override; - - off_t getPosition() override { return start_pos + offset(); } + off_t getPosition() override; std::string getFileName() const override { return in->getFileName(); } private: bool nextImpl() override; - void initialize(); - std::unique_ptr in; - size_t buf_size; - FileEncryption::Decryptor decryptor; - bool initialized = false; + off_t offset = 0; + bool need_seek = false; - // current working_buffer.begin() offset from decrypted file - size_t start_pos = 0; - size_t iv_offset = 0; + Memory<> encrypted_buffer; + FileEncryption::Encryptor encryptor; }; } - #endif diff --git a/src/IO/ReadBufferFromFile.cpp b/src/IO/ReadBufferFromFile.cpp index d0f94441622..2d0d135f886 100644 --- a/src/IO/ReadBufferFromFile.cpp +++ b/src/IO/ReadBufferFromFile.cpp @@ -88,4 +88,7 @@ void ReadBufferFromFile::close() metric_increment.destroy(); } + +OpenedFileCache ReadBufferFromFilePReadWithCache::cache; + } diff --git a/src/IO/ReadBufferFromFile.h b/src/IO/ReadBufferFromFile.h index 676f53afeb8..5b4f997ce3d 100644 --- a/src/IO/ReadBufferFromFile.h +++ b/src/IO/ReadBufferFromFile.h @@ -1,12 +1,14 @@ #pragma once #include +#include #include #ifndef O_DIRECT #define O_DIRECT 00040000 #endif + namespace CurrentMetrics { extern const Metric OpenFileForRead; @@ -60,4 +62,31 @@ public: } }; + +/** Similar to ReadBufferFromFilePRead but also transparently shares open file descriptors. 
+ */ +class ReadBufferFromFilePReadWithCache : public ReadBufferFromFileDescriptorPRead +{ +private: + static OpenedFileCache cache; + + std::string file_name; + OpenedFileCache::OpenedFilePtr file; + +public: + ReadBufferFromFilePReadWithCache(const std::string & file_name_, size_t buf_size = DBMS_DEFAULT_BUFFER_SIZE, int flags = -1, + char * existing_memory = nullptr, size_t alignment = 0) + : ReadBufferFromFileDescriptorPRead(-1, buf_size, existing_memory, alignment), + file_name(file_name_) + { + file = cache.get(file_name, flags); + fd = file->getFD(); + } + + std::string getFileName() const override + { + return file_name; + } +}; + } diff --git a/src/IO/ReadBufferFromS3.h b/src/IO/ReadBufferFromS3.h index 46fb79a4a14..e24d571b557 100644 --- a/src/IO/ReadBufferFromS3.h +++ b/src/IO/ReadBufferFromS3.h @@ -43,7 +43,7 @@ public: const String & bucket_, const String & key_, UInt64 max_single_read_retries_, - size_t buffer_size_ = DBMS_DEFAULT_BUFFER_SIZE); + size_t buffer_size_); bool nextImpl() override; diff --git a/src/IO/ReadHelpers.cpp b/src/IO/ReadHelpers.cpp index 2a5594a6866..f6ccfbd56bb 100644 --- a/src/IO/ReadHelpers.cpp +++ b/src/IO/ReadHelpers.cpp @@ -327,6 +327,7 @@ static void parseComplexEscapeSequence(Vector & s, ReadBuffer & buf) && decoded_char != '"' && decoded_char != '`' /// MySQL style identifiers && decoded_char != '/' /// JavaScript in HTML + && decoded_char != '=' /// Yandex's TSKV && !isControlASCII(decoded_char)) { s.push_back('\\'); @@ -351,9 +352,12 @@ static ReturnType parseJSONEscapeSequence(Vector & s, ReadBuffer & buf) }; ++buf.position(); + if (buf.eof()) return error("Cannot parse escape sequence", ErrorCodes::CANNOT_PARSE_ESCAPE_SEQUENCE); + assert(buf.hasPendingData()); + switch (*buf.position()) { case '"': @@ -1124,10 +1128,13 @@ void saveUpToPosition(ReadBuffer & in, DB::Memory<> & memory, char * current) const size_t old_bytes = memory.size(); const size_t additional_bytes = current - in.position(); const size_t new_bytes = old_bytes + additional_bytes; + /// There are no new bytes to add to memory. /// No need to do extra stuff. if (new_bytes == 0) return; + + assert(in.position() + additional_bytes <= in.buffer().end()); memory.resize(new_bytes); memcpy(memory.data() + old_bytes, in.position(), additional_bytes); in.position() = current; diff --git a/src/IO/ReadHelpers.h b/src/IO/ReadHelpers.h index d140120aa58..f985070788c 100644 --- a/src/IO/ReadHelpers.h +++ b/src/IO/ReadHelpers.h @@ -403,7 +403,6 @@ bool tryReadIntText(T & x, ReadBuffer & buf) // -V1071 * Differs in following: * - for numbers starting with zero, parsed only zero; * - symbol '+' before number is not supported; - * - symbols :;<=>? are parsed as some numbers. */ template void readIntTextUnsafe(T & x, ReadBuffer & buf) @@ -437,15 +436,12 @@ void readIntTextUnsafe(T & x, ReadBuffer & buf) while (!buf.eof()) { - /// This check is suddenly faster than - /// unsigned char c = *buf.position() - '0'; - /// if (c < 10) - /// for unknown reason on Xeon E5645. + unsigned char value = *buf.position() - '0'; - if ((*buf.position() & 0xF0) == 0x30) /// It makes sense to have this condition inside loop. 
+ if (value < 10) { res *= 10; - res += *buf.position() & 0x0F; + res += value; ++buf.position(); } else @@ -644,7 +640,7 @@ inline ReturnType readDateTextImpl(ExtendedDayNum & date, ReadBuffer & buf) else if (!readDateTextImpl(local_date, buf)) return false; /// When the parameter is out of rule or out of range, Date32 uses 1925-01-01 as the default value (-DateLUT::instance().getDayNumOffsetEpoch(), -16436) and Date uses 1970-01-01. - date = DateLUT::instance().makeDayNum(local_date.year(), local_date.month(), local_date.day(), -DateLUT::instance().getDayNumOffsetEpoch()); + date = DateLUT::instance().makeDayNum(local_date.year(), local_date.month(), local_date.day(), -static_cast(DateLUT::instance().getDayNumOffsetEpoch())); return ReturnType(true); } diff --git a/src/IO/S3/PocoHTTPClient.cpp b/src/IO/S3/PocoHTTPClient.cpp index 618d9ab7661..78cb5300101 100644 --- a/src/IO/S3/PocoHTTPClient.cpp +++ b/src/IO/S3/PocoHTTPClient.cpp @@ -89,6 +89,7 @@ void PocoHTTPClientConfiguration::updateSchemeAndRegion() PocoHTTPClient::PocoHTTPClient(const PocoHTTPClientConfiguration & clientConfiguration) : per_request_configuration(clientConfiguration.perRequestConfiguration) + , error_report(clientConfiguration.error_report) , timeouts(ConnectionTimeouts( Poco::Timespan(clientConfiguration.connectTimeoutMs * 1000), /// connection timeout. Poco::Timespan(clientConfiguration.requestTimeoutMs * 1000), /// send timeout. @@ -296,6 +297,8 @@ void PocoHTTPClient::makeRequestInternal( else if (status_code >= 300) { ProfileEvents::increment(select_metric(S3MetricType::Errors)); + if (status_code >= 500 && error_report) + error_report(request_configuration); } response->SetResponseBody(response_body_stream, session); diff --git a/src/IO/S3/PocoHTTPClient.h b/src/IO/S3/PocoHTTPClient.h index e374863cf00..12f5af60ed4 100644 --- a/src/IO/S3/PocoHTTPClient.h +++ b/src/IO/S3/PocoHTTPClient.h @@ -37,6 +37,8 @@ struct PocoHTTPClientConfiguration : public Aws::Client::ClientConfiguration void updateSchemeAndRegion(); + std::function error_report; + private: PocoHTTPClientConfiguration(const String & force_region_, const RemoteHostFilter & remote_host_filter_, unsigned int s3_max_redirects_); @@ -95,6 +97,7 @@ private: Aws::Utils::RateLimits::RateLimiterInterface * writeLimiter) const; std::function per_request_configuration; + std::function error_report; ConnectionTimeouts timeouts; const RemoteHostFilter & remote_host_filter; unsigned int s3_max_redirects; diff --git a/src/IO/S3Common.cpp b/src/IO/S3Common.cpp index 511ebaf1edd..74c328661c4 100644 --- a/src/IO/S3Common.cpp +++ b/src/IO/S3Common.cpp @@ -661,7 +661,7 @@ namespace S3 /// S3 specification requires at least 3 and at most 63 characters in bucket name. 
/// https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-s3-bucket-naming-requirements.html if (bucket.length() < 3 || bucket.length() > 63) - throw Exception(ErrorCodes::BAD_ARGUMENTS, "Key name is empty in path style S3 URI: {} ({})", quoteString(key), uri.toString()); + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Bucket name length is out of bounds in virtual hosted style S3 URI: {} ({})", quoteString(bucket), uri.toString()); } else throw Exception(ErrorCodes::BAD_ARGUMENTS, "Bucket or key name are invalid in S3 URI: {}", uri.toString()); diff --git a/src/IO/WriteBufferFromEncryptedFile.cpp b/src/IO/WriteBufferFromEncryptedFile.cpp index ebc6b8610a1..63d30477ca4 100644 --- a/src/IO/WriteBufferFromEncryptedFile.cpp +++ b/src/IO/WriteBufferFromEncryptedFile.cpp @@ -7,17 +7,18 @@ namespace DB { WriteBufferFromEncryptedFile::WriteBufferFromEncryptedFile( - size_t buf_size_, + size_t buffer_size_, std::unique_ptr out_, - const String & init_vector_, - const FileEncryption::EncryptionKey & key_, - const size_t & file_size) - : WriteBufferFromFileBase(buf_size_, nullptr, 0) + const String & key_, + const FileEncryption::Header & header_, + size_t old_file_size) + : WriteBufferFromFileBase(buffer_size_, nullptr, 0) , out(std::move(out_)) - , flush_iv(!file_size) - , iv(init_vector_) - , encryptor(FileEncryption::Encryptor(init_vector_, key_, file_size)) + , header(header_) + , flush_header(!old_file_size) + , encryptor(header.algorithm, key_, header.init_vector) { + encryptor.setOffset(old_file_size); } WriteBufferFromEncryptedFile::~WriteBufferFromEncryptedFile() @@ -51,6 +52,11 @@ void WriteBufferFromEncryptedFile::finishImpl() { /// If buffer has pending data - write it. next(); + + /// Note that if there is no data to write an empty file will be written, even without the initialization vector + /// (see nextImpl(): it writes the initialization vector only if there is some data ready to write). + /// That's fine because DiskEncrypted allows files without initialization vectors when they're empty. + out->finalize(); } @@ -58,6 +64,7 @@ void WriteBufferFromEncryptedFile::sync() { /// If buffer has pending data - write it. next(); + out->sync(); } @@ -66,14 +73,15 @@ void WriteBufferFromEncryptedFile::nextImpl() if (!offset()) return; - if (flush_iv) + if (flush_header) { - FileEncryption::writeIV(iv, *out); - flush_iv = false; + header.write(*out); + flush_header = false; } - encryptor.encrypt(working_buffer.begin(), *out, offset()); + encryptor.encrypt(working_buffer.begin(), offset(), *out); } + } #endif diff --git a/src/IO/WriteBufferFromEncryptedFile.h b/src/IO/WriteBufferFromEncryptedFile.h index 132b9886ef5..8ae72b405d6 100644 --- a/src/IO/WriteBufferFromEncryptedFile.h +++ b/src/IO/WriteBufferFromEncryptedFile.h @@ -12,15 +12,17 @@ namespace DB { +/// Encrypts data and writes the encrypted data to the underlying write buffer. class WriteBufferFromEncryptedFile : public WriteBufferFromFileBase { public: + /// `old_file_size` should be set to non-zero if we're going to append an existing file. 
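When the file is brand new, the first nextImpl() call above writes the 64-byte header before any ciphertext. Below is a standalone sketch of that layout, with field order and sizes taken from Header::write earlier in this diff; the concrete values are illustrative, and note that writePODBinary stores the integers in host byte order while only the init vector is serialized big-endian.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
    unsigned char header[64] = {};           /// Header::kSize; the unused tail stays zeroed
    size_t pos = 0;

    std::memcpy(header + pos, "ENC", 3);     /// signature, without the trailing '\0'
    pos += 3;

    uint16_t version = 1;                    /// kHeaderCurrentVersion
    std::memcpy(header + pos, &version, sizeof(version));
    pos += sizeof(version);

    uint16_t algorithm = 0;                  /// e.g. Algorithm::AES_128_CTR
    std::memcpy(header + pos, &algorithm, sizeof(algorithm));
    pos += sizeof(algorithm);

    uint64_t key_id = 0;                     /// identifier of the encryption key
    std::memcpy(header + pos, &key_id, sizeof(key_id));
    pos += sizeof(key_id);

    header[pos++] = 0;                       /// key_hash: one byte from calculateKeyHash(key)

    unsigned char init_vector[16] = {};      /// InitVector::kSize, stored big-endian
    std::memcpy(header + pos, init_vector, sizeof(init_vector));
    pos += sizeof(init_vector);

    std::cout << "payload: " << pos << " bytes, reserved: " << 64 - pos << " bytes\n";  /// 32 / 32
}
```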
WriteBufferFromEncryptedFile( - size_t buf_size_, + size_t buffer_size_, std::unique_ptr out_, - const String & init_vector_, - const FileEncryption::EncryptionKey & key_, - const size_t & file_size); + const String & key_, + const FileEncryption::Header & header_, + size_t old_file_size = 0); ~WriteBufferFromEncryptedFile() override; void sync() override; @@ -37,8 +39,9 @@ private: bool finished = false; std::unique_ptr out; - bool flush_iv; - String iv; + FileEncryption::Header header; + bool flush_header = false; + FileEncryption::Encryptor encryptor; }; diff --git a/src/IO/ZstdInflatingReadBuffer.cpp b/src/IO/ZstdInflatingReadBuffer.cpp index b441a6a7210..6c03ea420a9 100644 --- a/src/IO/ZstdInflatingReadBuffer.cpp +++ b/src/IO/ZstdInflatingReadBuffer.cpp @@ -56,6 +56,13 @@ bool ZstdInflatingReadBuffer::nextImpl() eof = true; return !working_buffer.empty(); } + else if (output.pos == 0) + { + /// It is possible, that input buffer is not at eof yet, but nothing was decompressed in current iteration. + /// But there are cases, when such behaviour is not allowed - i.e. if input buffer is not eof, then + /// it has to be guaranteed that working_buffer is not empty. So if it is empty, continue. + return nextImpl(); + } return true; } diff --git a/src/IO/createReadBufferFromFileBase.cpp b/src/IO/createReadBufferFromFileBase.cpp index 11a0937ee48..9675d89c0dd 100644 --- a/src/IO/createReadBufferFromFileBase.cpp +++ b/src/IO/createReadBufferFromFileBase.cpp @@ -75,7 +75,7 @@ std::unique_ptr createReadBufferFromFileBase( /// Attempt to open a file with O_DIRECT try { - auto res = std::make_unique( + auto res = std::make_unique( filename, buffer_size, (flags == -1 ? O_RDONLY | O_CLOEXEC : flags) | O_DIRECT, existing_memory, alignment); ProfileEvents::increment(ProfileEvents::CreatedReadBufferDirectIO); return res; @@ -92,7 +92,7 @@ std::unique_ptr createReadBufferFromFileBase( #endif ProfileEvents::increment(ProfileEvents::CreatedReadBufferOrdinary); - return std::make_unique(filename, buffer_size, flags, existing_memory, alignment); + return std::make_unique(filename, buffer_size, flags, existing_memory, alignment); } } diff --git a/src/IO/tests/gtest_file_encryption.cpp b/src/IO/tests/gtest_file_encryption.cpp index 1f6d793dc76..187073c7262 100644 --- a/src/IO/tests/gtest_file_encryption.cpp +++ b/src/IO/tests/gtest_file_encryption.cpp @@ -11,140 +11,205 @@ using namespace DB; using namespace DB::FileEncryption; + struct InitVectorTestParam { - const std::string_view comment; const String init; - UInt128 adder; - UInt128 setter; const String after_inc; + const UInt64 adder; const String after_add; - const String after_set; }; +class FileEncryptionInitVectorTest : public ::testing::TestWithParam {}; -class InitVectorTest : public ::testing::TestWithParam {}; - - -String string_ends_with(size_t size, String str) -{ - String res(size, 0); - res.replace(size - str.size(), str.size(), str); - return res; -} - - -static std::ostream & operator << (std::ostream & ostr, const InitVectorTestParam & param) -{ - return ostr << param.comment; -} - - -TEST_P(InitVectorTest, InitVector) +TEST_P(FileEncryptionInitVectorTest, InitVector) { const auto & param = GetParam(); - auto iv = InitVector(param.init); - ASSERT_EQ(param.init, iv.str()); + auto iv = InitVector::fromString(param.init); + ASSERT_EQ(param.init, iv.toString()); - iv.inc(); - ASSERT_EQ(param.after_inc, iv.str()); + ++iv; + ASSERT_EQ(param.after_inc, iv.toString()); - iv.inc(param.adder); - ASSERT_EQ(param.after_add, iv.str()); - - 
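For context on the `output.pos == 0` branch added to `ZstdInflatingReadBuffer::nextImpl()` above: a single decompression pass may legitimately produce no output even though the stream is not finished, so the reader has to retry instead of surfacing an empty non-eof buffer. The same invariant can be shown with the plain zstd streaming API; this is only an illustration, not the ClickHouse class:

```
#include <zstd.h>
#include <cstddef>

/// Returns the number of bytes produced into `out`; keeps retrying while a pass
/// yields nothing but unread input remains, so "no output" is only ever reported
/// together with genuinely exhausted data.
size_t decompressSome(ZSTD_DStream * dstream, ZSTD_inBuffer & input, char * out, size_t out_size)
{
    ZSTD_outBuffer output = {out, out_size, 0};
    while (true)
    {
        size_t ret = ZSTD_decompressStream(dstream, &output, &input);
        if (ZSTD_isError(ret))
            return 0;                                          // error handling elided in this sketch
        if (output.pos > 0 || ret == 0 || input.pos == input.size)
            return output.pos;                                 // produced data, finished frame, or consumed all input
        /// Nothing produced yet but unread input remains: try another pass.
    }
}
```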
iv.set(param.setter); - ASSERT_EQ(param.after_set, iv.str()); - - iv.set(0); - ASSERT_EQ(param.init, iv.str()); + iv += param.adder; + ASSERT_EQ(param.after_add, iv.toString()); } - -INSTANTIATE_TEST_SUITE_P(InitVectorInputs, - InitVectorTest, - ::testing::ValuesIn(std::initializer_list{ - { - "Basic init vector test. Get zero-string, add 0, set 0", +INSTANTIATE_TEST_SUITE_P(All, + FileEncryptionInitVectorTest, + ::testing::ValuesIn(std::initializer_list + { + { // #0. Basic init vector test. Get zero-string, add 1, add 0. String(16, 0), + String("\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01", 16), 0, - 0, - string_ends_with(16, "\x1"), - string_ends_with(16, "\x1"), - String(16, 0), + String("\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01", 16), }, { - "Init vector test. Get zero-string, add 85, set 1024", + // #1. Init vector test. Get zero-string, add 1, add 85, add 1024. String(16, 0), + String("\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01", 16), 85, - 1024, - string_ends_with(16, "\x1"), - string_ends_with(16, "\x56"), - string_ends_with(16, String("\x4\0", 2)), + String("\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x56", 16), }, { - "Long init vector test", - "\xa8\x65\x9c\x73\xf8\x5d\x83\xb4\x5c\xa6\x8c\x19\xf4\x77\x80\xe1", - 3349249125638641, - 1698923461902341, - "\xa8\x65\x9c\x73\xf8\x5d\x83\xb4\x5c\xa6\x8c\x19\xf4\x77\x80\xe2", - "\xa8\x65\x9c\x73\xf8\x5d\x83\xb4\x5c\xb2\x72\x39\xc8\xdd\x62\xd3", - String("\xa8\x65\x9c\x73\xf8\x5d\x83\xb4\x5c\xac\x95\x43\x65\xea\x00\xe6", 16) + // #2. Init vector test #2. Get zero-string, add 1, add 1024. + String(16, 0), + String("\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01", 16), + 1024, + String("\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x04\x01", 16) + }, + { + // #3. Long init vector test. 
+ String("\xa8\x65\x9c\x73\xf8\x5d\x83\xb4\x9c\xa6\x8c\x19\xf4\x77\x80\xe1", 16), + String("\xa8\x65\x9c\x73\xf8\x5d\x83\xb4\x9c\xa6\x8c\x19\xf4\x77\x80\xe2", 16), + 9349249176525638641ULL, + String("\xa8\x65\x9c\x73\xf8\x5d\x83\xb5\x1e\x65\xc0\xb1\x67\xe4\x0c\xd3", 16) }, }) ); -TEST(FileEncryption, Encryption) +struct CipherTestParam { - String iv(16, 0); - EncryptionKey key("1234567812345678"); - String input = "abcd1234efgh5678ijkl"; - String expected = "\xfb\x8a\x9e\x66\x82\x72\x1b\xbe\x6b\x1d\xd8\x98\xc5\x8c\x63\xee\xcd\x36\x4a\x50"; + const Algorithm algorithm; + const String key; + const InitVector iv; + const size_t offset; + const String plaintext; + const String ciphertext; +}; - String result(expected.size(), 0); - for (size_t i = 0; i <= expected.size(); ++i) +class FileEncryptionCipherTest : public ::testing::TestWithParam {}; + +TEST_P(FileEncryptionCipherTest, Encryption) +{ + const auto & param = GetParam(); + + Encryptor encryptor{param.algorithm, param.key, param.iv}; + std::string_view input = param.plaintext; + std::string_view expected = param.ciphertext; + size_t base_offset = param.offset; + + encryptor.setOffset(base_offset); + for (size_t i = 0; i < expected.size(); ++i) { - auto buf = WriteBufferFromString(result); - auto encryptor = Encryptor(iv, key, 0); - encryptor.encrypt(input.data(), buf, i); - ASSERT_EQ(expected.substr(0, i), result.substr(0, i)); + WriteBufferFromOwnString buf; + encryptor.encrypt(&input[i], 1, buf); + ASSERT_EQ(expected.substr(i, 1), buf.str()); + } + + for (size_t i = 0; i < expected.size(); ++i) + { + WriteBufferFromOwnString buf; + encryptor.setOffset(base_offset + i); + encryptor.encrypt(&input[i], 1, buf); + ASSERT_EQ(expected.substr(i, 1), buf.str()); } - size_t offset = 25; - String offset_expected = "\x6c\x67\xe4\xf5\x8f\x86\xb0\x19\xe5\xcd\x53\x59\xe0\xc6\x01\x5e\xc1\xfd\x60\x9d"; for (size_t i = 0; i <= expected.size(); ++i) { - auto buf = WriteBufferFromString(result); - auto encryptor = Encryptor(iv, key, offset); - encryptor.encrypt(input.data(), buf, i); - ASSERT_EQ(offset_expected.substr(0, i), result.substr(0, i)); + WriteBufferFromOwnString buf; + encryptor.setOffset(base_offset); + encryptor.encrypt(input.data(), i, buf); + ASSERT_EQ(expected.substr(0, i), buf.str()); } } - -TEST(FileEncryption, Decryption) +TEST_P(FileEncryptionCipherTest, Decryption) { - String iv(16, 0); - EncryptionKey key("1234567812345678"); - String expected = "abcd1234efgh5678ijkl"; - String input = "\xfb\x8a\x9e\x66\x82\x72\x1b\xbe\x6b\x1d\xd8\x98\xc5\x8c\x63\xee\xcd\x36\x4a\x50"; - auto decryptor = Decryptor(iv, key); - String result(expected.size(), 0); + const auto & param = GetParam(); - for (size_t i = 0; i <= expected.size(); ++i) + Encryptor encryptor{param.algorithm, param.key, param.iv}; + std::string_view input = param.ciphertext; + std::string_view expected = param.plaintext; + size_t base_offset = param.offset; + + encryptor.setOffset(base_offset); + for (size_t i = 0; i < expected.size(); ++i) { - decryptor.decrypt(input.data(), result.data(), i, 0); - ASSERT_EQ(expected.substr(0, i), result.substr(0, i)); + char c; + encryptor.decrypt(&input[i], 1, &c); + ASSERT_EQ(expected[i], c); } - size_t offset = 25; - String offset_input = "\x6c\x67\xe4\xf5\x8f\x86\xb0\x19\xe5\xcd\x53\x59\xe0\xc6\x01\x5e\xc1\xfd\x60\x9d"; + for (size_t i = 0; i < expected.size(); ++i) + { + char c; + encryptor.setOffset(base_offset + i); + encryptor.decrypt(&input[i], 1, &c); + ASSERT_EQ(expected[i], c); + } + + String buf(expected.size(), 0); for (size_t i 
= 0; i <= expected.size(); ++i) { - decryptor.decrypt(offset_input.data(), result.data(), i, offset); - ASSERT_EQ(expected.substr(0, i), result.substr(0, i)); + encryptor.setOffset(base_offset); + encryptor.decrypt(input.data(), i, buf.data()); + ASSERT_EQ(expected.substr(0, i), buf.substr(0, i)); } } +INSTANTIATE_TEST_SUITE_P(All, + FileEncryptionCipherTest, + ::testing::ValuesIn(std::initializer_list + { + { + // #0 + Algorithm::AES_128_CTR, + "1234567812345678", + InitVector{}, + 0, + "abcd1234efgh5678ijkl", + "\xfb\x8a\x9e\x66\x82\x72\x1b\xbe\x6b\x1d\xd8\x98\xc5\x8c\x63\xee\xcd\x36\x4a\x50" + }, + { + // #1 + Algorithm::AES_128_CTR, + "1234567812345678", + InitVector{}, + 25, + "abcd1234efgh5678ijkl", + "\x6c\x67\xe4\xf5\x8f\x86\xb0\x19\xe5\xcd\x53\x59\xe0\xc6\x01\x5e\xc1\xfd\x60\x9d" + }, + { + // #2 + Algorithm::AES_128_CTR, + String{"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", 16}, + InitVector{}, + 0, + "abcd1234efgh5678ijkl", + "\xa7\xc3\x58\x53\xb6\xbd\x68\xb6\x0a\x29\xe6\x0a\x94\xfe\xef\x41\x1a\x2c\x78\xf9" + }, + { + // #3 + Algorithm::AES_128_CTR, + "1234567812345678", + InitVector::fromString(String{"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", 16}), + 0, + "abcd1234efgh5678ijkl", + "\xcf\xab\x7c\xad\xa9\xdc\x67\x60\x90\x85\x7b\xb8\x72\xa9\x6f\x9c\x29\xb2\x4f\xf6" + }, + { + // #4 + Algorithm::AES_192_CTR, + "123456781234567812345678", + InitVector{}, + 0, + "abcd1234efgh5678ijkl", + "\xcc\x25\x2b\xad\xe8\xa2\xdc\x64\x3e\xf9\x60\xe0\x6e\xde\x70\xb6\x63\xa8\xfa\x02" + }, + { + // #5 + Algorithm::AES_256_CTR, + "12345678123456781234567812345678", + InitVector{}, + 0, + "abcd1234efgh5678ijkl", + "\xc7\x41\xa6\x63\x04\x60\x1b\x1a\xcb\x84\x19\xce\x3a\x36\xa3\xbd\x21\x71\x93\xfb" + }, + }) +); + #endif diff --git a/src/IO/ya.make b/src/IO/ya.make index 3bd704ec6f0..9e35a062a96 100644 --- a/src/IO/ya.make +++ b/src/IO/ya.make @@ -5,7 +5,7 @@ LIBRARY() ADDINCL( contrib/libs/zstd/include - contrib/restricted/fast_float + contrib/restricted/fast_float/include ) PEERDIR( @@ -43,6 +43,7 @@ SRCS( MySQLPacketPayloadReadBuffer.cpp MySQLPacketPayloadWriteBuffer.cpp NullWriteBuffer.cpp + OpenedFile.cpp PeekableReadBuffer.cpp Progress.cpp ReadBufferFromEncryptedFile.cpp diff --git a/src/IO/ya.make.in b/src/IO/ya.make.in index 4c28475e0e0..0b579b0df37 100644 --- a/src/IO/ya.make.in +++ b/src/IO/ya.make.in @@ -4,7 +4,7 @@ LIBRARY() ADDINCL( contrib/libs/zstd/include - contrib/restricted/fast_float + contrib/restricted/fast_float/include ) PEERDIR( diff --git a/src/Interpreters/Aggregator.cpp b/src/Interpreters/Aggregator.cpp index 715d44eecf0..7ffae761c0c 100644 --- a/src/Interpreters/Aggregator.cpp +++ b/src/Interpreters/Aggregator.cpp @@ -293,14 +293,14 @@ Aggregator::Aggregator(const Params & params_) aggregation_state_cache = AggregatedDataVariants::createCache(method_chosen, cache_settings); #if USE_EMBEDDED_COMPILER - compileAggregateFunctions(); + compileAggregateFunctionsIfNeeded(); #endif } #if USE_EMBEDDED_COMPILER -void Aggregator::compileAggregateFunctions() +void Aggregator::compileAggregateFunctionsIfNeeded() { static std::unordered_map aggregate_functions_description_to_count; static std::mutex mtx; @@ -362,7 +362,7 @@ void Aggregator::compileAggregateFunctions() { LOG_TRACE(log, "Compile expression {}", functions_description); - auto compiled_aggregate_functions = compileAggregateFunctons(getJITInstance(), functions_to_compile, functions_description); + auto compiled_aggregate_functions = compileAggregateFunctions(getJITInstance(), 
functions_to_compile, functions_description); return std::make_shared(std::move(compiled_aggregate_functions)); }); @@ -371,7 +371,7 @@ void Aggregator::compileAggregateFunctions() else { LOG_TRACE(log, "Compile expression {}", functions_description); - auto compiled_aggregate_functions = compileAggregateFunctons(getJITInstance(), functions_to_compile, functions_description); + auto compiled_aggregate_functions = compileAggregateFunctions(getJITInstance(), functions_to_compile, functions_description); compiled_aggregate_functions_holder = std::make_shared(std::move(compiled_aggregate_functions)); } } diff --git a/src/Interpreters/Aggregator.h b/src/Interpreters/Aggregator.h index bb36ae54a5d..265ef912794 100644 --- a/src/Interpreters/Aggregator.h +++ b/src/Interpreters/Aggregator.h @@ -1093,7 +1093,7 @@ private: /** Try to compile aggregate functions. */ - void compileAggregateFunctions(); + void compileAggregateFunctionsIfNeeded(); /** Select the aggregation method based on the number and types of keys. */ AggregatedDataVariants::Type chooseAggregationMethod(); diff --git a/src/Interpreters/AsynchronousMetrics.cpp b/src/Interpreters/AsynchronousMetrics.cpp index da514759eb5..d708ff4f9e0 100644 --- a/src/Interpreters/AsynchronousMetrics.cpp +++ b/src/Interpreters/AsynchronousMetrics.cpp @@ -77,6 +77,7 @@ AsynchronousMetrics::AsynchronousMetrics( , update_period(update_period_seconds) , servers_to_start_before_tables(servers_to_start_before_tables_) , servers(servers_) + , log(&Poco::Logger::get("AsynchronousMetrics")) { #if defined(OS_LINUX) openFileIfExists("/proc/meminfo", meminfo); @@ -174,26 +175,39 @@ AsynchronousMetrics::AsynchronousMetrics( edac.back().second = openFileIfExists(edac_uncorrectable_file); } - if (std::filesystem::exists("/sys/block")) - { - for (const auto & device_dir : std::filesystem::directory_iterator("/sys/block")) - { - String device_name = device_dir.path().filename(); - - /// We are not interested in loopback devices. - if (device_name.starts_with("loop")) - continue; - - std::unique_ptr file = openFileIfExists(device_dir.path() / "stat"); - if (!file) - continue; - - block_devs[device_name] = std::move(file); - } - } + openBlockDevices(); #endif } +#if defined(OS_LINUX) +void AsynchronousMetrics::openBlockDevices() +{ + LOG_TRACE(log, "Scanning /sys/block"); + + if (!std::filesystem::exists("/sys/block")) + return; + + block_devices_rescan_delay.restart(); + + block_devs.clear(); + + for (const auto & device_dir : std::filesystem::directory_iterator("/sys/block")) + { + String device_name = device_dir.path().filename(); + + /// We are not interested in loopback devices. + if (device_name.starts_with("loop")) + continue; + + std::unique_ptr file = openFileIfExists(device_dir.path() / "stat"); + if (!file) + continue; + + block_devs[device_name] = std::move(file); + } +} +#endif + void AsynchronousMetrics::start() { /// Update once right now, to make metrics available just after server start @@ -546,13 +560,16 @@ void AsynchronousMetrics::update(std::chrono::system_clock::time_point update_ti Int64 peak = total_memory_tracker.getPeak(); Int64 new_amount = data.resident; - LOG_DEBUG(&Poco::Logger::get("AsynchronousMetrics"), - "MemoryTracking: was {}, peak {}, will set to {} (RSS), difference: {}", - ReadableSize(amount), - ReadableSize(peak), - ReadableSize(new_amount), - ReadableSize(new_amount - amount) - ); + Int64 difference = new_amount - amount; + + /// Log only if difference is high. This is for convenience. The threshold is arbitrary. 
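The threshold mentioned above is only a convenience guard so that small fluctuations do not flood the trace log; 1048576 bytes is 1 MiB and, as the comment says, the exact value is arbitrary. A trivial standalone illustration of the check (names are made up for the sketch):

```
#include <cstdint>
#include <cstdio>
#include <cstdlib>

/// Only report memory-tracking corrections of at least 1 MiB in either direction.
void maybeLogRssCorrection(int64_t tracked, int64_t rss)
{
    const int64_t difference = rss - tracked;
    if (std::llabs(difference) >= 1048576)
        std::printf("MemoryTracking: was %lld, will set to %lld (RSS), difference: %lld\n",
                    static_cast<long long>(tracked),
                    static_cast<long long>(rss),
                    static_cast<long long>(difference));
}
```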
+ if (difference >= 1048576 || difference <= -1048576) + LOG_TRACE(log, + "MemoryTracking: was {}, peak {}, will set to {} (RSS), difference: {}", + ReadableSize(amount), + ReadableSize(peak), + ReadableSize(new_amount), + ReadableSize(difference)); total_memory_tracker.set(new_amount); CurrentMetrics::set(CurrentMetrics::MemoryTracking, new_amount); @@ -874,6 +891,11 @@ void AsynchronousMetrics::update(std::chrono::system_clock::time_point update_ti } } + /// Update list of block devices periodically + /// (i.e. someone may add new disk to RAID array) + if (block_devices_rescan_delay.elapsedSeconds() >= 300) + openBlockDevices(); + for (auto & [name, device] : block_devs) { try @@ -927,6 +949,16 @@ void AsynchronousMetrics::update(std::chrono::system_clock::time_point update_ti } catch (...) { + /// Try to reopen block devices in case of error + /// (i.e. ENOENT means that some disk had been replaced, and it may apperas with a new name) + try + { + openBlockDevices(); + } + catch (...) + { + tryLogCurrentException(__PRETTY_FUNCTION__); + } tryLogCurrentException(__PRETTY_FUNCTION__); } } @@ -1300,9 +1332,9 @@ void AsynchronousMetrics::update(std::chrono::system_clock::time_point update_ti new_values["AsynchronousMetricsCalculationTimeSpent"] = watch.elapsedSeconds(); /// Log the new metrics. - if (auto log = getContext()->getAsynchronousMetricLog()) + if (auto asynchronous_metric_log = getContext()->getAsynchronousMetricLog()) { - log->addValues(new_values); + asynchronous_metric_log->addValues(new_values); } first_run = false; diff --git a/src/Interpreters/AsynchronousMetrics.h b/src/Interpreters/AsynchronousMetrics.h index 07e117c4dd9..c8677ac3ced 100644 --- a/src/Interpreters/AsynchronousMetrics.h +++ b/src/Interpreters/AsynchronousMetrics.h @@ -3,6 +3,7 @@ #include #include #include +#include #include #include @@ -15,6 +16,11 @@ #include +namespace Poco +{ +class Logger; +} + namespace DB { @@ -175,12 +181,17 @@ private: std::unordered_map network_interface_stats; + Stopwatch block_devices_rescan_delay; + + void openBlockDevices(); #endif std::unique_ptr thread; void run(); void update(std::chrono::system_clock::time_point update_time); + + Poco::Logger * log; }; } diff --git a/src/Interpreters/ClientInfo.h b/src/Interpreters/ClientInfo.h index d6158a2d7d5..7c169e6ebb5 100644 --- a/src/Interpreters/ClientInfo.h +++ b/src/Interpreters/ClientInfo.h @@ -100,6 +100,8 @@ public: UInt64 distributed_depth = 0; + bool is_replicated_database_internal = false; + bool empty() const { return query_kind == QueryKind::NO_QUERY; } /** Serialization and deserialization. 
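To summarize the AsynchronousMetrics change above in isolation: the set of block devices is re-scanned once at least 300 seconds have passed, and additionally after a read error, since a device may disappear and come back under a new name (for example after a RAID member swap). A minimal self-contained sketch of the periodic part of that pattern, with placeholder names instead of the Stopwatch-based members:

```
#include <chrono>
#include <functional>

/// Re-runs `rescan` when at least `period` has elapsed since the last run.
class PeriodicRescan
{
public:
    explicit PeriodicRescan(std::chrono::seconds period_) : period(period_) {}

    void maybeRescan(const std::function<void()> & rescan)
    {
        auto now = std::chrono::steady_clock::now();
        if (now - last_scan >= period)
        {
            rescan();
            last_scan = now;
        }
    }

private:
    std::chrono::seconds period;
    std::chrono::steady_clock::time_point last_scan{};  // epoch default forces an initial scan
};
```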
diff --git a/src/Interpreters/Cluster.cpp b/src/Interpreters/Cluster.cpp index 2fb5d7afcbd..1a7554a8c27 100644 --- a/src/Interpreters/Cluster.cpp +++ b/src/Interpreters/Cluster.cpp @@ -115,23 +115,44 @@ Cluster::Address::Address( Cluster::Address::Address( - const String & host_port_, - const String & user_, - const String & password_, - UInt16 clickhouse_port, - bool secure_, - Int64 priority_, - UInt32 shard_index_, - UInt32 replica_index_) - : user(user_) - , password(password_) + const String & host_port_, + const String & user_, + const String & password_, + UInt16 clickhouse_port, + bool treat_local_port_as_remote, + bool secure_, + Int64 priority_, + UInt32 shard_index_, + UInt32 replica_index_) + : user(user_), password(password_) { - auto parsed_host_port = parseAddress(host_port_, clickhouse_port); + bool can_be_local = true; + std::pair parsed_host_port; + if (!treat_local_port_as_remote) + { + parsed_host_port = parseAddress(host_port_, clickhouse_port); + } + else + { + /// For clickhouse-local (treat_local_port_as_remote) try to read the address without passing a default port + /// If it works we have a full address that includes a port, which means it won't be local + /// since clickhouse-local doesn't listen in any port + /// If it doesn't include a port then use the default one and it could be local (if the address is) + try + { + parsed_host_port = parseAddress(host_port_, 0); + can_be_local = false; + } + catch (...) + { + parsed_host_port = parseAddress(host_port_, clickhouse_port); + } + } host_name = parsed_host_port.first; port = parsed_host_port.second; secure = secure_ ? Protocol::Secure::Enable : Protocol::Secure::Disable; priority = priority_; - is_local = isLocal(clickhouse_port); + is_local = can_be_local && isLocal(clickhouse_port); shard_index = shard_index_; replica_index = replica_index_; } @@ -329,7 +350,7 @@ Clusters::Impl Clusters::getContainer() const Cluster::Cluster(const Poco::Util::AbstractConfiguration & config, const Settings & settings, const String & config_prefix_, - const String & cluster_name) + const String & cluster_name) : name(cluster_name) { auto config_prefix = config_prefix_ + "." 
+ cluster_name; @@ -366,7 +387,7 @@ Cluster::Cluster(const Poco::Util::AbstractConfiguration & config, if (address.is_local) info.local_addresses.push_back(address); - ConnectionPoolPtr pool = std::make_shared( + auto pool = ConnectionPoolFactory::instance().get( settings.distributed_connections_pool_size, address.host_name, address.port, address.default_database, address.user, address.password, @@ -439,7 +460,7 @@ Cluster::Cluster(const Poco::Util::AbstractConfiguration & config, for (const auto & replica : replica_addresses) { - auto replica_pool = std::make_shared( + auto replica_pool = ConnectionPoolFactory::instance().get( settings.distributed_connections_pool_size, replica.host_name, replica.port, replica.default_database, replica.user, replica.password, @@ -482,9 +503,16 @@ Cluster::Cluster(const Poco::Util::AbstractConfiguration & config, } -Cluster::Cluster(const Settings & settings, const std::vector> & names, - const String & username, const String & password, UInt16 clickhouse_port, bool treat_local_as_remote, - bool secure, Int64 priority) +Cluster::Cluster( + const Settings & settings, + const std::vector> & names, + const String & username, + const String & password, + UInt16 clickhouse_port, + bool treat_local_as_remote, + bool treat_local_port_as_remote, + bool secure, + Int64 priority) { UInt32 current_shard_num = 1; @@ -492,7 +520,16 @@ Cluster::Cluster(const Settings & settings, const std::vector( + auto replica_pool = ConnectionPoolFactory::instance().get( settings.distributed_connections_pool_size, replica.host_name, replica.port, replica.default_database, replica.user, replica.password, @@ -606,7 +643,7 @@ Cluster::Cluster(Cluster::ReplicasAsShardsTag, const Cluster & from, const Setti if (address.is_local) info.local_addresses.push_back(address); - ConnectionPoolPtr pool = std::make_shared( + auto pool = ConnectionPoolFactory::instance().get( settings.distributed_connections_pool_size, address.host_name, address.port, diff --git a/src/Interpreters/Cluster.h b/src/Interpreters/Cluster.h index 0afc43b85b2..a77eb3983dc 100644 --- a/src/Interpreters/Cluster.h +++ b/src/Interpreters/Cluster.h @@ -39,14 +39,21 @@ public: /// Construct a cluster by the names of shards and replicas. /// Local are treated as well as remote ones if treat_local_as_remote is true. + /// Local are also treated as remote if treat_local_port_as_remote is set and the local address includes a port /// 'clickhouse_port' - port that this server instance listen for queries. /// This parameter is needed only to check that some address is local (points to ourself). /// /// Used for remote() function. 
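The new `treat_local_port_as_remote` flag deserves a short illustration: clickhouse-local does not listen on any TCP port, so an address that carries an explicit port can never point at the current process, while a bare host name still can. A simplified standalone sketch of that decision (it ignores IPv6 literals and parse errors, and the names are made up):

```
#include <string>

/// Returns true when the address may still refer to this very process.
/// Simplified: a real implementation must also handle IPv6 literals.
bool addressCanBeLocal(const std::string & host_port, bool treat_local_port_as_remote)
{
    if (!treat_local_port_as_remote)
        return true;   // locality is decided later by comparing host and port

    /// clickhouse-local: an explicit "host:port" always targets some other server.
    return host_port.find(':') == std::string::npos;
}
```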
- Cluster(const Settings & settings, const std::vector> & names, - const String & username, const String & password, - UInt16 clickhouse_port, bool treat_local_as_remote, - bool secure = false, Int64 priority = 1); + Cluster( + const Settings & settings, + const std::vector> & names, + const String & username, + const String & password, + UInt16 clickhouse_port, + bool treat_local_as_remote, + bool treat_local_port_as_remote, + bool secure = false, + Int64 priority = 1); Cluster(const Cluster &)= delete; Cluster & operator=(const Cluster &) = delete; @@ -115,6 +122,7 @@ public: const String & user_, const String & password_, UInt16 clickhouse_port, + bool treat_local_port_as_remote, bool secure_ = false, Int64 priority_ = 1, UInt32 shard_index_ = 0, @@ -265,6 +273,8 @@ private: size_t remote_shard_count = 0; size_t local_shard_count = 0; + + String name; }; using ClusterPtr = std::shared_ptr; diff --git a/src/Interpreters/ClusterProxy/IStreamFactory.h b/src/Interpreters/ClusterProxy/IStreamFactory.h index f66eee93e0a..6360aee2f55 100644 --- a/src/Interpreters/ClusterProxy/IStreamFactory.h +++ b/src/Interpreters/ClusterProxy/IStreamFactory.h @@ -18,6 +18,8 @@ using Pipes = std::vector; class QueryPlan; using QueryPlanPtr = std::unique_ptr; +struct StorageID; + namespace ClusterProxy { @@ -28,15 +30,32 @@ class IStreamFactory public: virtual ~IStreamFactory() = default; + struct Shard + { + /// Query and header may be changed depending on shard. + ASTPtr query; + Block header; + + size_t shard_num = 0; + ConnectionPoolWithFailoverPtr pool; + + /// If we connect to replicas lazily. + /// (When there is a local replica with big delay). + bool lazy = false; + UInt32 local_delay = 0; + }; + + using Shards = std::vector; + virtual void createForShard( const Cluster::ShardInfo & shard_info, const ASTPtr & query_ast, - ContextPtr context, const ThrottlerPtr & throttler, - const SelectQueryInfo & query_info, - std::vector & res, - Pipes & remote_pipes, - Pipes & delayed_pipes, - Poco::Logger * log) = 0; + const StorageID & main_table, + const ASTPtr & table_func_ptr, + ContextPtr context, + std::vector & local_plans, + Shards & remote_shards, + UInt32 shard_count) = 0; }; } diff --git a/src/Interpreters/ClusterProxy/SelectStreamFactory.cpp b/src/Interpreters/ClusterProxy/SelectStreamFactory.cpp index 0c9d42e1381..961de45c491 100644 --- a/src/Interpreters/ClusterProxy/SelectStreamFactory.cpp +++ b/src/Interpreters/ClusterProxy/SelectStreamFactory.cpp @@ -1,6 +1,5 @@ #include #include -#include #include #include #include @@ -11,10 +10,6 @@ #include #include -#include -#include -#include -#include #include #include #include @@ -32,7 +27,6 @@ namespace DB namespace ErrorCodes { - extern const int ALL_CONNECTION_TRIES_FAILED; extern const int ALL_REPLICAS_ARE_STALE; } @@ -42,94 +36,17 @@ namespace ClusterProxy SelectStreamFactory::SelectStreamFactory( const Block & header_, QueryProcessingStage::Enum processed_stage_, - StorageID main_table_, - const Scalars & scalars_, - bool has_virtual_shard_num_column_, - const Tables & external_tables_) + bool has_virtual_shard_num_column_) : header(header_), processed_stage{processed_stage_}, - main_table(std::move(main_table_)), - table_func_ptr{nullptr}, - scalars{scalars_}, - has_virtual_shard_num_column(has_virtual_shard_num_column_), - external_tables{external_tables_} + has_virtual_shard_num_column(has_virtual_shard_num_column_) { } -SelectStreamFactory::SelectStreamFactory( - const Block & header_, - QueryProcessingStage::Enum processed_stage_, - ASTPtr 
table_func_ptr_, - const Scalars & scalars_, - bool has_virtual_shard_num_column_, - const Tables & external_tables_) - : header(header_), - processed_stage{processed_stage_}, - table_func_ptr{table_func_ptr_}, - scalars{scalars_}, - has_virtual_shard_num_column(has_virtual_shard_num_column_), - external_tables{external_tables_} -{ -} namespace { -/// Special support for the case when `_shard_num` column is used in GROUP BY key expression. -/// This column is a constant for shard. -/// Constant expression with this column may be removed from intermediate header. -/// However, this column is not constant for initiator, and it expect intermediate header has it. -/// -/// To fix it, the following trick is applied. -/// We check all GROUP BY keys which depend only on `_shard_num`. -/// Calculate such expression for current shard if it is used in header. -/// Those columns will be added to modified header as already known constants. -/// -/// For local shard, missed constants will be added by converting actions. -/// For remote shard, RemoteQueryExecutor will automatically add missing constant. -Block evaluateConstantGroupByKeysWithShardNumber( - const ContextPtr & context, const ASTPtr & query_ast, const Block & header, UInt32 shard_num) -{ - Block res; - - ColumnWithTypeAndName shard_num_col; - shard_num_col.type = std::make_shared(); - shard_num_col.column = shard_num_col.type->createColumnConst(0, shard_num); - shard_num_col.name = "_shard_num"; - - if (auto group_by = query_ast->as().groupBy()) - { - for (const auto & elem : group_by->children) - { - String key_name = elem->getColumnName(); - if (header.has(key_name)) - { - auto ast = elem->clone(); - - RequiredSourceColumnsVisitor::Data columns_context; - RequiredSourceColumnsVisitor(columns_context).visit(ast); - - auto required_columns = columns_context.requiredColumns(); - if (required_columns.size() != 1 || required_columns.count("_shard_num") == 0) - continue; - - Block block({shard_num_col}); - auto syntax_result = TreeRewriter(context).analyze(ast, {NameAndTypePair{shard_num_col.name, shard_num_col.type}}); - ExpressionAnalyzer(ast, syntax_result, context).getActions(true, false)->execute(block); - - res.insert(block.getByName(key_name)); - } - } - } - - /// We always add _shard_num constant just in case. - /// For initial query it is considered as a column from table, and may be required by intermediate block. - if (!res.has(shard_num_col.name)) - res.insert(std::move(shard_num_col)); - - return res; -} - ActionsDAGPtr getConvertingDAG(const Block & block, const Block & header) { /// Convert header structure to expected. 
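As background for the converting step kept here: local and remote plans may produce columns in a different order, or with extra constants, so the pipeline inserts an expression that reshapes each block to the expected header by column name. A toy, ClickHouse-free illustration of the idea (placeholder types, not Block/ActionsDAG):

```
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using Row = std::map<std::string, std::string>;   // column name -> value (toy model)

/// Select and reorder the columns of `row` so they match `expected_columns` exactly.
std::vector<std::string> convertToHeader(const Row & row, const std::vector<std::string> & expected_columns)
{
    std::vector<std::string> result;
    result.reserve(expected_columns.size());
    for (const auto & name : expected_columns)
    {
        auto it = row.find(name);
        if (it == row.end())
            throw std::runtime_error("Missing column required by the expected header: " + name);
        result.push_back(it->second);
    }
    return result;
}
```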
@@ -152,29 +69,20 @@ void addConvertingActions(QueryPlan & plan, const Block & header) plan.addStep(std::move(converting)); } -void addConvertingActions(Pipe & pipe, const Block & header) -{ - if (blocksHaveEqualStructure(pipe.getHeader(), header)) - return; - - auto convert_actions = std::make_shared(getConvertingDAG(pipe.getHeader(), header)); - pipe.addSimpleTransform([&](const Block & cur_header, Pipe::StreamType) -> ProcessorPtr - { - return std::make_shared(cur_header, convert_actions); - }); -} - std::unique_ptr createLocalPlan( const ASTPtr & query_ast, const Block & header, ContextPtr context, - QueryProcessingStage::Enum processed_stage) + QueryProcessingStage::Enum processed_stage, + UInt32 shard_num, + UInt32 shard_count) { checkStackSize(); auto query_plan = std::make_unique(); - InterpreterSelectQuery interpreter(query_ast, context, SelectQueryOptions(processed_stage)); + InterpreterSelectQuery interpreter( + query_ast, context, SelectQueryOptions(processed_stage).setShardInfo(shard_num, shard_count)); interpreter.buildQueryPlan(*query_plan); addConvertingActions(*query_plan, header); @@ -182,74 +90,37 @@ std::unique_ptr createLocalPlan( return query_plan; } -String formattedAST(const ASTPtr & ast) -{ - if (!ast) - return {}; - WriteBufferFromOwnString buf; - formatAST(*ast, buf, false, true); - return buf.str(); -} - } void SelectStreamFactory::createForShard( const Cluster::ShardInfo & shard_info, const ASTPtr & query_ast, - ContextPtr context, const ThrottlerPtr & throttler, - const SelectQueryInfo &, - std::vector & plans, - Pipes & remote_pipes, - Pipes & delayed_pipes, - Poco::Logger * log) + const StorageID & main_table, + const ASTPtr & table_func_ptr, + ContextPtr context, + std::vector & local_plans, + Shards & remote_shards, + UInt32 shard_count) { - bool add_agg_info = processed_stage == QueryProcessingStage::WithMergeableState; - bool add_totals = false; - bool add_extremes = false; - bool async_read = context->getSettingsRef().async_socket_for_remote; - if (processed_stage == QueryProcessingStage::Complete) - { - add_totals = query_ast->as().group_by_with_totals; - add_extremes = context->getSettingsRef().extremes; - } - auto modified_query_ast = query_ast->clone(); - auto modified_header = header; if (has_virtual_shard_num_column) - { VirtualColumnUtils::rewriteEntityInAst(modified_query_ast, "_shard_num", shard_info.shard_num, "toUInt32"); - auto shard_num_constants = evaluateConstantGroupByKeysWithShardNumber(context, query_ast, modified_header, shard_info.shard_num); - - for (auto & col : shard_num_constants) - { - if (modified_header.has(col.name)) - modified_header.getByName(col.name).column = std::move(col.column); - else - modified_header.insert(std::move(col)); - } - } auto emplace_local_stream = [&]() { - plans.emplace_back(createLocalPlan(modified_query_ast, modified_header, context, processed_stage)); - addConvertingActions(*plans.back(), header); + local_plans.emplace_back(createLocalPlan(modified_query_ast, header, context, processed_stage, shard_info.shard_num, shard_count)); }; - String modified_query = formattedAST(modified_query_ast); - - auto emplace_remote_stream = [&]() + auto emplace_remote_stream = [&](bool lazy = false, UInt32 local_delay = 0) { - auto remote_query_executor = std::make_shared( - shard_info.pool, modified_query, modified_header, context, throttler, scalars, external_tables, processed_stage); - remote_query_executor->setLogger(log); - - remote_query_executor->setPoolMode(PoolMode::GET_MANY); - if (!table_func_ptr) - 
remote_query_executor->setMainTable(main_table); - - remote_pipes.emplace_back(createRemoteSourcePipe(remote_query_executor, add_agg_info, add_totals, add_extremes, async_read)); - remote_pipes.back().addInterpreterContext(context); - addConvertingActions(remote_pipes.back(), header); + remote_shards.emplace_back(Shard{ + .query = modified_query_ast, + .header = header, + .shard_num = shard_info.shard_num, + .pool = shard_info.pool, + .lazy = lazy, + .local_delay = local_delay, + }); }; const auto & settings = context->getSettingsRef(); @@ -339,66 +210,7 @@ void SelectStreamFactory::createForShard( /// Try our luck with remote replicas, but if they are stale too, then fallback to local replica. /// Do it lazily to avoid connecting in the main thread. - - auto lazily_create_stream = [ - pool = shard_info.pool, shard_num = shard_info.shard_num, modified_query, header = modified_header, modified_query_ast, - context, throttler, - main_table = main_table, table_func_ptr = table_func_ptr, scalars = scalars, external_tables = external_tables, - stage = processed_stage, local_delay, add_agg_info, add_totals, add_extremes, async_read]() - -> Pipe - { - auto current_settings = context->getSettingsRef(); - auto timeouts = ConnectionTimeouts::getTCPTimeoutsWithFailover( - current_settings).getSaturated( - current_settings.max_execution_time); - std::vector try_results; - try - { - if (table_func_ptr) - try_results = pool->getManyForTableFunction(timeouts, ¤t_settings, PoolMode::GET_MANY); - else - try_results = pool->getManyChecked(timeouts, ¤t_settings, PoolMode::GET_MANY, main_table.getQualifiedName()); - } - catch (const Exception & ex) - { - if (ex.code() == ErrorCodes::ALL_CONNECTION_TRIES_FAILED) - LOG_WARNING(&Poco::Logger::get("ClusterProxy::SelectStreamFactory"), - "Connections to remote replicas of local shard {} failed, will use stale local replica", shard_num); - else - throw; - } - - double max_remote_delay = 0.0; - for (const auto & try_result : try_results) - { - if (!try_result.is_up_to_date) - max_remote_delay = std::max(try_result.staleness, max_remote_delay); - } - - if (try_results.empty() || local_delay < max_remote_delay) - { - auto plan = createLocalPlan(modified_query_ast, header, context, stage); - return QueryPipeline::getPipe(std::move(*plan->buildQueryPipeline( - QueryPlanOptimizationSettings::fromContext(context), - BuildQueryPipelineSettings::fromContext(context)))); - } - else - { - std::vector connections; - connections.reserve(try_results.size()); - for (auto & try_result : try_results) - connections.emplace_back(std::move(try_result.entry)); - - auto remote_query_executor = std::make_shared( - std::move(connections), modified_query, header, context, throttler, scalars, external_tables, stage); - - return createRemoteSourcePipe(remote_query_executor, add_agg_info, add_totals, add_extremes, async_read); - } - }; - - delayed_pipes.emplace_back(createDelayedPipe(modified_header, lazily_create_stream, add_totals, add_extremes)); - delayed_pipes.back().addInterpreterContext(context); - addConvertingActions(delayed_pipes.back(), header); + emplace_remote_stream(true /* lazy */, local_delay); } else emplace_remote_stream(); diff --git a/src/Interpreters/ClusterProxy/SelectStreamFactory.h b/src/Interpreters/ClusterProxy/SelectStreamFactory.h index 0705bcb2903..dda6fb96f01 100644 --- a/src/Interpreters/ClusterProxy/SelectStreamFactory.h +++ b/src/Interpreters/ClusterProxy/SelectStreamFactory.h @@ -14,42 +14,26 @@ namespace ClusterProxy class SelectStreamFactory final : public 
IStreamFactory { public: - /// Database in a query. SelectStreamFactory( const Block & header_, QueryProcessingStage::Enum processed_stage_, - StorageID main_table_, - const Scalars & scalars_, - bool has_virtual_shard_num_column_, - const Tables & external_tables); - - /// TableFunction in a query. - SelectStreamFactory( - const Block & header_, - QueryProcessingStage::Enum processed_stage_, - ASTPtr table_func_ptr_, - const Scalars & scalars_, - bool has_virtual_shard_num_column_, - const Tables & external_tables_); + bool has_virtual_shard_num_column_); void createForShard( const Cluster::ShardInfo & shard_info, const ASTPtr & query_ast, - ContextPtr context, const ThrottlerPtr & throttler, - const SelectQueryInfo & query_info, - std::vector & plans, - Pipes & remote_pipes, - Pipes & delayed_pipes, - Poco::Logger * log) override; + const StorageID & main_table, + const ASTPtr & table_func_ptr, + ContextPtr context, + std::vector & local_plans, + Shards & remote_shards, + UInt32 shard_count) override; private: const Block header; QueryProcessingStage::Enum processed_stage; - StorageID main_table = StorageID::createEmpty(); - ASTPtr table_func_ptr; - Scalars scalars; + bool has_virtual_shard_num_column = false; - Tables external_tables; }; } diff --git a/src/Interpreters/ClusterProxy/executeQuery.cpp b/src/Interpreters/ClusterProxy/executeQuery.cpp index 59d8942538c..95b279fd59b 100644 --- a/src/Interpreters/ClusterProxy/executeQuery.cpp +++ b/src/Interpreters/ClusterProxy/executeQuery.cpp @@ -8,9 +8,10 @@ #include #include #include -#include +#include #include #include +#include namespace DB @@ -101,6 +102,10 @@ ContextMutablePtr updateSettingsForCluster(const Cluster & cluster, ContextPtr c void executeQuery( QueryPlan & query_plan, + const Block & header, + QueryProcessingStage::Enum processed_stage, + const StorageID & main_table, + const ASTPtr & table_func_ptr, IStreamFactory & stream_factory, Poco::Logger * log, const ASTPtr & query_ast, ContextPtr context, const SelectQueryInfo & query_info, const ExpressionActionsPtr & sharding_key_expr, @@ -115,8 +120,7 @@ void executeQuery( throw Exception("Maximum distributed depth exceeded", ErrorCodes::TOO_LARGE_DISTRIBUTED_DEPTH); std::vector plans; - Pipes remote_pipes; - Pipes delayed_pipes; + IStreamFactory::Shards remote_shards; auto new_context = updateSettingsForCluster(*query_info.getCluster(), context, settings, log); @@ -161,29 +165,36 @@ void executeQuery( query_ast_for_shard = query_ast; stream_factory.createForShard(shard_info, - query_ast_for_shard, - new_context, throttler, query_info, plans, - remote_pipes, delayed_pipes, log); + query_ast_for_shard, main_table, table_func_ptr, + new_context, plans, remote_shards, shards); } - if (!remote_pipes.empty()) + if (!remote_shards.empty()) { + Scalars scalars = context->hasQueryContext() ? 
context->getQueryContext()->getScalars() : Scalars{}; + scalars.emplace( + "_shard_count", Block{{DataTypeUInt32().createColumnConst(1, shards), std::make_shared(), "_shard_count"}}); + auto external_tables = context->getExternalTables(); + auto plan = std::make_unique(); - auto read_from_remote = std::make_unique(Pipe::unitePipes(std::move(remote_pipes))); + auto read_from_remote = std::make_unique( + std::move(remote_shards), + header, + processed_stage, + main_table, + table_func_ptr, + new_context, + throttler, + std::move(scalars), + std::move(external_tables), + log, + shards); + read_from_remote->setStepDescription("Read from remote replica"); plan->addStep(std::move(read_from_remote)); plans.emplace_back(std::move(plan)); } - if (!delayed_pipes.empty()) - { - auto plan = std::make_unique(); - auto read_from_remote = std::make_unique(Pipe::unitePipes(std::move(delayed_pipes))); - read_from_remote->setStepDescription("Read from delayed local replica"); - plan->addStep(std::move(read_from_remote)); - plans.emplace_back(std::move(plan)); - } - if (plans.empty()) return; diff --git a/src/Interpreters/ClusterProxy/executeQuery.h b/src/Interpreters/ClusterProxy/executeQuery.h index c9efedfc422..0a77b7b6035 100644 --- a/src/Interpreters/ClusterProxy/executeQuery.h +++ b/src/Interpreters/ClusterProxy/executeQuery.h @@ -1,6 +1,7 @@ #pragma once #include +#include #include namespace DB @@ -17,6 +18,8 @@ class QueryPlan; class ExpressionActions; using ExpressionActionsPtr = std::shared_ptr; +struct StorageID; + namespace ClusterProxy { @@ -38,6 +41,10 @@ ContextMutablePtr updateSettingsForCluster(const Cluster & cluster, ContextPtr c /// (currently SELECT, DESCRIBE). void executeQuery( QueryPlan & query_plan, + const Block & header, + QueryProcessingStage::Enum processed_stage, + const StorageID & main_table, + const ASTPtr & table_func_ptr, IStreamFactory & stream_factory, Poco::Logger * log, const ASTPtr & query_ast, ContextPtr context, const SelectQueryInfo & query_info, const ExpressionActionsPtr & sharding_key_expr, diff --git a/src/Interpreters/CollectJoinOnKeysVisitor.cpp b/src/Interpreters/CollectJoinOnKeysVisitor.cpp index 3b3fdaa65cb..9715af01a0a 100644 --- a/src/Interpreters/CollectJoinOnKeysVisitor.cpp +++ b/src/Interpreters/CollectJoinOnKeysVisitor.cpp @@ -12,48 +12,77 @@ namespace ErrorCodes extern const int INVALID_JOIN_ON_EXPRESSION; extern const int AMBIGUOUS_COLUMN_NAME; extern const int SYNTAX_ERROR; - extern const int NOT_IMPLEMENTED; extern const int LOGICAL_ERROR; } -void CollectJoinOnKeysMatcher::Data::addJoinKeys(const ASTPtr & left_ast, const ASTPtr & right_ast, - const std::pair & table_no) +namespace +{ + +bool isLeftIdentifier(JoinIdentifierPos pos) +{ + /// Unknown identifiers considered as left, we will try to process it on later stages + /// Usually such identifiers came from `ARRAY JOIN ... 
AS ...` + return pos == JoinIdentifierPos::Left || pos == JoinIdentifierPos::Unknown; +} + +bool isRightIdentifier(JoinIdentifierPos pos) +{ + return pos == JoinIdentifierPos::Right; +} + +} + +void CollectJoinOnKeysMatcher::Data::addJoinKeys(const ASTPtr & left_ast, const ASTPtr & right_ast, JoinIdentifierPosPair table_pos) { ASTPtr left = left_ast->clone(); ASTPtr right = right_ast->clone(); - if (table_no.first == 1 || table_no.second == 2) + if (isLeftIdentifier(table_pos.first) && isRightIdentifier(table_pos.second)) analyzed_join.addOnKeys(left, right); - else if (table_no.first == 2 || table_no.second == 1) + else if (isRightIdentifier(table_pos.first) && isLeftIdentifier(table_pos.second)) analyzed_join.addOnKeys(right, left); else throw Exception("Cannot detect left and right JOIN keys. JOIN ON section is ambiguous.", - ErrorCodes::AMBIGUOUS_COLUMN_NAME); - has_some = true; + ErrorCodes::INVALID_JOIN_ON_EXPRESSION); } void CollectJoinOnKeysMatcher::Data::addAsofJoinKeys(const ASTPtr & left_ast, const ASTPtr & right_ast, - const std::pair & table_no, const ASOF::Inequality & inequality) + JoinIdentifierPosPair table_pos, const ASOF::Inequality & inequality) { - if (table_no.first == 1 || table_no.second == 2) + if (isLeftIdentifier(table_pos.first) && isRightIdentifier(table_pos.second)) { asof_left_key = left_ast->clone(); asof_right_key = right_ast->clone(); analyzed_join.setAsofInequality(inequality); } - else if (table_no.first == 2 || table_no.second == 1) + else if (isRightIdentifier(table_pos.first) && isLeftIdentifier(table_pos.second)) { asof_left_key = right_ast->clone(); asof_right_key = left_ast->clone(); analyzed_join.setAsofInequality(ASOF::reverseInequality(inequality)); } + else + { + throw Exception(ErrorCodes::INVALID_JOIN_ON_EXPRESSION, + "Expressions {} and {} are from the same table but from different arguments of equal function in ASOF JOIN", + queryToString(left_ast), queryToString(right_ast)); + } } void CollectJoinOnKeysMatcher::Data::asofToJoinKeys() { if (!asof_left_key || !asof_right_key) throw Exception("No inequality in ASOF JOIN ON section.", ErrorCodes::INVALID_JOIN_ON_EXPRESSION); - addJoinKeys(asof_left_key, asof_right_key, {1, 2}); + addJoinKeys(asof_left_key, asof_right_key, {JoinIdentifierPos::Left, JoinIdentifierPos::Right}); +} + +void CollectJoinOnKeysMatcher::visit(const ASTIdentifier & ident, const ASTPtr & ast, CollectJoinOnKeysMatcher::Data & data) +{ + if (auto expr_from_table = getTableForIdentifiers(ast, false, data); expr_from_table != JoinIdentifierPos::Unknown) + data.analyzed_join.addJoinCondition(ast, isLeftIdentifier(expr_from_table)); + else + throw Exception("Unexpected identifier '" + ident.name() + "' in JOIN ON section", + ErrorCodes::INVALID_JOIN_ON_EXPRESSION); } void CollectJoinOnKeysMatcher::visit(const ASTFunction & func, const ASTPtr & ast, Data & data) @@ -61,9 +90,6 @@ void CollectJoinOnKeysMatcher::visit(const ASTFunction & func, const ASTPtr & as if (func.name == "and") return; /// go into children - if (func.name == "or") - throw Exception("JOIN ON does not support OR. 
Unexpected '" + queryToString(ast) + "'", ErrorCodes::NOT_IMPLEMENTED); - ASOF::Inequality inequality = ASOF::getInequality(func.name); if (func.name == "equals" || inequality != ASOF::Inequality::None) { @@ -71,32 +97,50 @@ void CollectJoinOnKeysMatcher::visit(const ASTFunction & func, const ASTPtr & as throw Exception("Function " + func.name + " takes two arguments, got '" + func.formatForErrorMessage() + "' instead", ErrorCodes::SYNTAX_ERROR); } - else - throw Exception("Expected equality or inequality, got '" + queryToString(ast) + "'", ErrorCodes::INVALID_JOIN_ON_EXPRESSION); if (func.name == "equals") { ASTPtr left = func.arguments->children.at(0); ASTPtr right = func.arguments->children.at(1); - auto table_numbers = getTableNumbers(ast, left, right, data); - data.addJoinKeys(left, right, table_numbers); - } - else if (inequality != ASOF::Inequality::None) - { - if (!data.is_asof) - throw Exception("JOIN ON inequalities are not supported. Unexpected '" + queryToString(ast) + "'", - ErrorCodes::NOT_IMPLEMENTED); + auto table_numbers = getTableNumbers(left, right, data); + if (table_numbers.first == table_numbers.second) + { + if (table_numbers.first == JoinIdentifierPos::Unknown) + throw Exception("Ambiguous column in expression '" + queryToString(ast) + "' in JOIN ON section", + ErrorCodes::AMBIGUOUS_COLUMN_NAME); + data.analyzed_join.addJoinCondition(ast, isLeftIdentifier(table_numbers.first)); + return; + } + if (table_numbers.first != JoinIdentifierPos::NotApplicable && table_numbers.second != JoinIdentifierPos::NotApplicable) + { + data.addJoinKeys(left, right, table_numbers); + return; + } + } + + if (auto expr_from_table = getTableForIdentifiers(ast, false, data); expr_from_table != JoinIdentifierPos::Unknown) + { + data.analyzed_join.addJoinCondition(ast, isLeftIdentifier(expr_from_table)); + return; + } + + if (data.is_asof && inequality != ASOF::Inequality::None) + { if (data.asof_left_key || data.asof_right_key) throw Exception("ASOF JOIN expects exactly one inequality in ON section. Unexpected '" + queryToString(ast) + "'", ErrorCodes::INVALID_JOIN_ON_EXPRESSION); ASTPtr left = func.arguments->children.at(0); ASTPtr right = func.arguments->children.at(1); - auto table_numbers = getTableNumbers(ast, left, right, data); + auto table_numbers = getTableNumbers(left, right, data); data.addAsofJoinKeys(left, right, table_numbers, inequality); + return; } + + throw Exception("Unsupported JOIN ON conditions. Unexpected '" + queryToString(ast) + "'", + ErrorCodes::INVALID_JOIN_ON_EXPRESSION); } void CollectJoinOnKeysMatcher::getIdentifiers(const ASTPtr & ast, std::vector & out) @@ -118,32 +162,10 @@ void CollectJoinOnKeysMatcher::getIdentifiers(const ASTPtr & ast, std::vector CollectJoinOnKeysMatcher::getTableNumbers(const ASTPtr & expr, const ASTPtr & left_ast, const ASTPtr & right_ast, - Data & data) +JoinIdentifierPosPair CollectJoinOnKeysMatcher::getTableNumbers(const ASTPtr & left_ast, const ASTPtr & right_ast, Data & data) { - std::vector left_identifiers; - std::vector right_identifiers; - - getIdentifiers(left_ast, left_identifiers); - getIdentifiers(right_ast, right_identifiers); - - if (left_identifiers.empty() || right_identifiers.empty()) - { - throw Exception("Not equi-join ON expression: " + queryToString(expr) + ". 
No columns in one of equality side.", - ErrorCodes::INVALID_JOIN_ON_EXPRESSION); - } - - size_t left_idents_table = getTableForIdentifiers(left_identifiers, data); - size_t right_idents_table = getTableForIdentifiers(right_identifiers, data); - - if (left_idents_table && left_idents_table == right_idents_table) - { - auto left_name = queryToString(*left_identifiers[0]); - auto right_name = queryToString(*right_identifiers[0]); - - throw Exception("In expression " + queryToString(expr) + " columns " + left_name + " and " + right_name - + " are from the same table but from different arguments of equal function", ErrorCodes::INVALID_JOIN_ON_EXPRESSION); - } + auto left_idents_table = getTableForIdentifiers(left_ast, true, data); + auto right_idents_table = getTableForIdentifiers(right_ast, true, data); return std::make_pair(left_idents_table, right_idents_table); } @@ -173,11 +195,16 @@ const ASTIdentifier * CollectJoinOnKeysMatcher::unrollAliases(const ASTIdentifie return identifier; } -/// @returns 1 if identifiers belongs to left table, 2 for right table and 0 if unknown. Throws on table mix. +/// @returns Left or right table identifiers belongs to. /// Place detected identifier into identifiers[0] if any. -size_t CollectJoinOnKeysMatcher::getTableForIdentifiers(std::vector & identifiers, const Data & data) +JoinIdentifierPos CollectJoinOnKeysMatcher::getTableForIdentifiers(const ASTPtr & ast, bool throw_on_table_mix, const Data & data) { - size_t table_number = 0; + std::vector identifiers; + getIdentifiers(ast, identifiers); + if (identifiers.empty()) + return JoinIdentifierPos::NotApplicable; + + JoinIdentifierPos table_number = JoinIdentifierPos::Unknown; for (auto & ident : identifiers) { @@ -187,10 +214,20 @@ size_t CollectJoinOnKeysMatcher::getTableForIdentifiers(std::vectorname()); + } - if (!membership) + if (membership == JoinIdentifierPos::Unknown) { const String & name = identifier->name(); bool in_left_table = data.left_table.hasColumn(name); @@ -211,22 +248,24 @@ size_t CollectJoinOnKeysMatcher::getTableForIdentifiers(std::vectorgetAliasOrColumnName() + " and " + ident->getAliasOrColumnName() - + " are from different tables.", ErrorCodes::INVALID_JOIN_ON_EXPRESSION); + if (throw_on_table_mix) + throw Exception("Invalid columns in JOIN ON section. Columns " + + identifiers[0]->getAliasOrColumnName() + " and " + ident->getAliasOrColumnName() + + " are from different tables.", ErrorCodes::INVALID_JOIN_ON_EXPRESSION); + return JoinIdentifierPos::Unknown; } } diff --git a/src/Interpreters/CollectJoinOnKeysVisitor.h b/src/Interpreters/CollectJoinOnKeysVisitor.h index 54e008a114e..0647f58f79b 100644 --- a/src/Interpreters/CollectJoinOnKeysVisitor.h +++ b/src/Interpreters/CollectJoinOnKeysVisitor.h @@ -18,6 +18,21 @@ namespace ASOF enum class Inequality; } +enum class JoinIdentifierPos +{ + /// Position can't be established, identifier not resolved + Unknown, + /// Left side of JOIN + Left, + /// Right side of JOIN + Right, + /// Expression not valid, e.g. 
doesn't contain identifiers + NotApplicable, +}; + +using JoinIdentifierPosPair = std::pair; + + class CollectJoinOnKeysMatcher { public: @@ -32,10 +47,9 @@ public: const bool is_asof{false}; ASTPtr asof_left_key{}; ASTPtr asof_right_key{}; - bool has_some{false}; - void addJoinKeys(const ASTPtr & left_ast, const ASTPtr & right_ast, const std::pair & table_no); - void addAsofJoinKeys(const ASTPtr & left_ast, const ASTPtr & right_ast, const std::pair & table_no, + void addJoinKeys(const ASTPtr & left_ast, const ASTPtr & right_ast, JoinIdentifierPosPair table_pos); + void addAsofJoinKeys(const ASTPtr & left_ast, const ASTPtr & right_ast, JoinIdentifierPosPair table_pos, const ASOF::Inequality & asof_inequality); void asofToJoinKeys(); }; @@ -43,7 +57,17 @@ public: static void visit(const ASTPtr & ast, Data & data) { if (auto * func = ast->as()) + { visit(*func, ast, data); + } + else if (auto * ident = ast->as()) + { + visit(*ident, ast, data); + } + else + { + /// visit children + } } static bool needChildVisit(const ASTPtr & node, const ASTPtr &) @@ -55,11 +79,12 @@ public: private: static void visit(const ASTFunction & func, const ASTPtr & ast, Data & data); + static void visit(const ASTIdentifier & ident, const ASTPtr & ast, Data & data); static void getIdentifiers(const ASTPtr & ast, std::vector & out); - static std::pair getTableNumbers(const ASTPtr & expr, const ASTPtr & left_ast, const ASTPtr & right_ast, Data & data); + static JoinIdentifierPosPair getTableNumbers(const ASTPtr & left_ast, const ASTPtr & right_ast, Data & data); static const ASTIdentifier * unrollAliases(const ASTIdentifier * identifier, const Aliases & aliases); - static size_t getTableForIdentifiers(std::vector & identifiers, const Data & data); + static JoinIdentifierPos getTableForIdentifiers(const ASTPtr & ast, bool throw_on_table_mix, const Data & data); }; /// Parse JOIN ON expression and collect ASTs for joined columns. diff --git a/src/Interpreters/ColumnAliasesVisitor.cpp b/src/Interpreters/ColumnAliasesVisitor.cpp index b239d36ee13..9b7e0a91c18 100644 --- a/src/Interpreters/ColumnAliasesVisitor.cpp +++ b/src/Interpreters/ColumnAliasesVisitor.cpp @@ -81,6 +81,7 @@ void ColumnAliasesMatcher::visit(ASTIdentifier & node, ASTPtr & ast, Data & data else ast->setAlias(*column_name); + data.changed = true; // revisit ast to track recursive alias columns Visitor(data).visit(ast); } diff --git a/src/Interpreters/ColumnAliasesVisitor.h b/src/Interpreters/ColumnAliasesVisitor.h index e340ab0daa0..9be83d83d49 100644 --- a/src/Interpreters/ColumnAliasesVisitor.h +++ b/src/Interpreters/ColumnAliasesVisitor.h @@ -60,6 +60,9 @@ public: /// private_aliases are from lambda, so these are local names. NameSet private_aliases; + /// Check if query is changed by this visitor. 
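Looping back to the `JoinIdentifierPos` classification introduced above: the visitor determines, for each side of an equality, which table it references, treats unresolved identifiers as left-side (they may come from `ARRAY JOIN ... AS ...` and are re-checked later), and swaps the pair when the sides arrive in (right, left) order. A condensed standalone sketch of that rule with placeholder types instead of ASTs:

```
#include <stdexcept>
#include <string>
#include <utility>

enum class JoinIdentifierPos { Unknown, Left, Right, NotApplicable };

bool isLeftIdentifier(JoinIdentifierPos pos)
{
    /// Unresolved identifiers are optimistically treated as left and re-checked later.
    return pos == JoinIdentifierPos::Left || pos == JoinIdentifierPos::Unknown;
}

bool isRightIdentifier(JoinIdentifierPos pos)
{
    return pos == JoinIdentifierPos::Right;
}

/// Returns the key pair ordered as (left table key, right table key).
std::pair<std::string, std::string> orderJoinKeys(
    const std::string & lhs, JoinIdentifierPos lhs_pos,
    const std::string & rhs, JoinIdentifierPos rhs_pos)
{
    if (isLeftIdentifier(lhs_pos) && isRightIdentifier(rhs_pos))
        return {lhs, rhs};
    if (isRightIdentifier(lhs_pos) && isLeftIdentifier(rhs_pos))
        return {rhs, lhs};
    throw std::runtime_error("Cannot detect left and right JOIN keys. JOIN ON section is ambiguous.");
}
```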
+ bool changed = false; + Data(const ColumnsDescription & columns_, const NameToNameMap & array_join_result_columns_, ContextPtr context_) : columns(columns_), context(context_) { diff --git a/src/Interpreters/Context.cpp b/src/Interpreters/Context.cpp index cbf2c0820f5..635af6f3cb7 100644 --- a/src/Interpreters/Context.cpp +++ b/src/Interpreters/Context.cpp @@ -42,7 +42,8 @@ #include #include #include -#include +#include +#include #include #include #include @@ -76,6 +77,8 @@ #include #include #include +#include +#include #include @@ -348,6 +351,11 @@ struct ContextSharedPart scope_guard dictionaries_xmls; +#if USE_NLP + mutable std::optional synonyms_extensions; + mutable std::optional lemmatizers; +#endif + String default_profile_name; /// Default profile name used for default values. String system_profile_name; /// Profile used by system processes String buffer_profile_name; /// Profile used by Buffer engine for flushing to the underlying @@ -798,10 +806,16 @@ void Context::setUser(const Credentials & credentials, const Poco::Net::SocketAd user_id = new_user_id; access = std::move(new_access); - current_roles.clear(); - use_default_roles = true; - setSettings(*access->getDefaultSettings()); + auto user = access->getUser(); + current_roles = std::make_shared>(user->granted_roles.findGranted(user->default_roles)); + + if (!user->default_database.empty()) + setCurrentDatabase(user->default_database); + + auto default_profile_info = access->getDefaultProfileInfo(); + settings_constraints_and_current_profiles = default_profile_info->getConstraintsAndProfileIDs(); + applySettingsChanges(default_profile_info->settings); } void Context::setUser(const String & name, const String & password, const Poco::Net::SocketAddress & address) @@ -840,21 +854,16 @@ std::optional Context::getUserID() const void Context::setCurrentRoles(const std::vector & current_roles_) { auto lock = getLock(); - if (current_roles == current_roles_ && !use_default_roles) - return; - current_roles = current_roles_; - use_default_roles = false; + if (current_roles ? (*current_roles == current_roles_) : current_roles_.empty()) + return; + current_roles = std::make_shared>(current_roles_); calculateAccessRights(); } void Context::setCurrentRolesDefault() { - auto lock = getLock(); - if (use_default_roles) - return; - current_roles.clear(); - use_default_roles = true; - calculateAccessRights(); + auto user = getUser(); + setCurrentRoles(user->granted_roles.findGranted(user->default_roles)); } boost::container::flat_set Context::getCurrentRoles() const @@ -877,7 +886,13 @@ void Context::calculateAccessRights() { auto lock = getLock(); if (user_id) - access = getAccessControlManager().getContextAccess(*user_id, current_roles, use_default_roles, settings, current_database, client_info); + access = getAccessControlManager().getContextAccess( + *user_id, + current_roles ? 
*current_roles : std::vector{}, + /* use_default_roles = */ false, + settings, + current_database, + client_info); } @@ -936,19 +951,41 @@ std::optional Context::getQuotaUsage() const } -void Context::setProfile(const String & profile_name) +void Context::setCurrentProfile(const String & profile_name) { - SettingsChanges profile_settings_changes = *getAccessControlManager().getProfileSettings(profile_name); + auto lock = getLock(); try { - checkSettingsConstraints(profile_settings_changes); + UUID profile_id = getAccessControlManager().getID(profile_name); + setCurrentProfile(profile_id); } catch (Exception & e) { e.addMessage(", while trying to set settings profile {}", profile_name); throw; } - applySettingsChanges(profile_settings_changes); +} + +void Context::setCurrentProfile(const UUID & profile_id) +{ + auto lock = getLock(); + auto profile_info = getAccessControlManager().getSettingsProfileInfo(profile_id); + checkSettingsConstraints(profile_info->settings); + applySettingsChanges(profile_info->settings); + settings_constraints_and_current_profiles = profile_info->getConstraintsAndProfileIDs(settings_constraints_and_current_profiles); +} + + +std::vector Context::getCurrentProfiles() const +{ + auto lock = getLock(); + return settings_constraints_and_current_profiles->current_profiles; +} + +std::vector Context::getEnabledProfiles() const +{ + auto lock = getLock(); + return settings_constraints_and_current_profiles->enabled_profiles; } @@ -970,6 +1007,13 @@ const Block & Context::getScalar(const String & name) const return it->second; } +const Block * Context::tryGetLocalScalar(const String & name) const +{ + auto it = local_scalars.find(name); + if (local_scalars.end() == it) + return nullptr; + return &it->second; +} Tables Context::getExternalTables() const { @@ -1029,6 +1073,13 @@ void Context::addScalar(const String & name, const Block & block) } +void Context::addLocalScalar(const String & name, const Block & block) +{ + assert(!isGlobalContext() || getApplicationType() == ApplicationType::LOCAL); + local_scalars[name] = block; +} + + bool Context::hasScalar(const String & name) const { assert(!isGlobalContext() || getApplicationType() == ApplicationType::LOCAL); @@ -1147,7 +1198,7 @@ void Context::setSetting(const StringRef & name, const String & value) auto lock = getLock(); if (name == "profile") { - setProfile(value); + setCurrentProfile(value); return; } settings.set(std::string_view{name}, value); @@ -1162,7 +1213,7 @@ void Context::setSetting(const StringRef & name, const Field & value) auto lock = getLock(); if (name == "profile") { - setProfile(value.safeGet()); + setCurrentProfile(value.safeGet()); return; } settings.set(std::string_view{name}, value); @@ -1198,31 +1249,31 @@ void Context::applySettingsChanges(const SettingsChanges & changes) void Context::checkSettingsConstraints(const SettingChange & change) const { - if (auto settings_constraints = getSettingsConstraints()) - settings_constraints->check(settings, change); + getSettingsConstraintsAndCurrentProfiles()->constraints.check(settings, change); } void Context::checkSettingsConstraints(const SettingsChanges & changes) const { - if (auto settings_constraints = getSettingsConstraints()) - settings_constraints->check(settings, changes); + getSettingsConstraintsAndCurrentProfiles()->constraints.check(settings, changes); } void Context::checkSettingsConstraints(SettingsChanges & changes) const { - if (auto settings_constraints = getSettingsConstraints()) - settings_constraints->check(settings, changes); + 
getSettingsConstraintsAndCurrentProfiles()->constraints.check(settings, changes); } void Context::clampToSettingsConstraints(SettingsChanges & changes) const { - if (auto settings_constraints = getSettingsConstraints()) - settings_constraints->clamp(settings, changes); + getSettingsConstraintsAndCurrentProfiles()->constraints.clamp(settings, changes); } -std::shared_ptr Context::getSettingsConstraints() const +std::shared_ptr Context::getSettingsConstraintsAndCurrentProfiles() const { - return getAccess()->getSettingsConstraints(); + auto lock = getLock(); + if (settings_constraints_and_current_profiles) + return settings_constraints_and_current_profiles; + static auto no_constraints_or_profiles = std::make_shared(getAccessControlManager()); + return no_constraints_or_profiles; } @@ -1475,6 +1526,29 @@ void Context::loadDictionaries(const Poco::Util::AbstractConfiguration & config) std::make_unique(config, "dictionaries_config")); } +#if USE_NLP + +SynonymsExtensions & Context::getSynonymsExtensions() const +{ + auto lock = getLock(); + + if (!shared->synonyms_extensions) + shared->synonyms_extensions.emplace(getConfigRef()); + + return *shared->synonyms_extensions; +} + +Lemmatizers & Context::getLemmatizers() const +{ + auto lock = getLock(); + + if (!shared->lemmatizers) + shared->lemmatizers.emplace(getConfigRef()); + + return *shared->lemmatizers; +} +#endif + void Context::setProgressCallback(ProgressCallback callback) { /// Callback is set to a session or to a query. In the session, only one query is processed at a time. Therefore, the lock is not needed. @@ -1718,13 +1792,31 @@ zkutil::ZooKeeperPtr Context::getZooKeeper() const const auto & config = shared->zookeeper_config ? *shared->zookeeper_config : getConfigRef(); if (!shared->zookeeper) - shared->zookeeper = std::make_shared(config, "zookeeper"); + shared->zookeeper = std::make_shared(config, "zookeeper", getZooKeeperLog()); else if (shared->zookeeper->expired()) shared->zookeeper = shared->zookeeper->startNewSession(); return shared->zookeeper; } +void Context::setSystemZooKeeperLogAfterInitializationIfNeeded() +{ + /// It can be nearly impossible to understand in which order global objects are initialized on server startup. + /// If getZooKeeper() is called before initializeSystemLogs(), then zkutil::ZooKeeper gets nullptr + /// instead of pointer to system table and it logs nothing. + /// This method explicitly sets correct pointer to system log after its initialization. + /// TODO get rid of this if possible + + std::lock_guard lock(shared->zookeeper_mutex); + if (!shared->system_logs || !shared->system_logs->zookeeper_log) + return; + + if (shared->zookeeper) + shared->zookeeper->setZooKeeperLog(shared->system_logs->zookeeper_log); + + for (auto & zk : shared->auxiliary_zookeepers) + zk.second->setZooKeeperLog(shared->system_logs->zookeeper_log); +} void Context::initializeKeeperStorageDispatcher() const { @@ -1782,8 +1874,8 @@ zkutil::ZooKeeperPtr Context::getAuxiliaryZooKeeper(const String & name) const "config.xml", name); - zookeeper - = shared->auxiliary_zookeepers.emplace(name, std::make_shared(config, "auxiliary_zookeepers." + name)).first; + zookeeper = shared->auxiliary_zookeepers.emplace(name, + std::make_shared(config, "auxiliary_zookeepers." 
+ name, getZooKeeperLog())).first; } else if (zookeeper->second->expired()) zookeeper->second = zookeeper->second->startNewSession(); @@ -1797,14 +1889,15 @@ void Context::resetZooKeeper() const shared->zookeeper.reset(); } -static void reloadZooKeeperIfChangedImpl(const ConfigurationPtr & config, const std::string & config_name, zkutil::ZooKeeperPtr & zk) +static void reloadZooKeeperIfChangedImpl(const ConfigurationPtr & config, const std::string & config_name, zkutil::ZooKeeperPtr & zk, + std::shared_ptr zk_log) { if (!zk || zk->configChanged(*config, config_name)) { if (zk) zk->finalize(); - zk = std::make_shared(*config, config_name); + zk = std::make_shared(*config, config_name, std::move(zk_log)); } } @@ -1812,7 +1905,7 @@ void Context::reloadZooKeeperIfChanged(const ConfigurationPtr & config) const { std::lock_guard lock(shared->zookeeper_mutex); shared->zookeeper_config = config; - reloadZooKeeperIfChangedImpl(config, "zookeeper", shared->zookeeper); + reloadZooKeeperIfChangedImpl(config, "zookeeper", shared->zookeeper, getZooKeeperLog()); } void Context::reloadAuxiliaryZooKeepersConfigIfChanged(const ConfigurationPtr & config) @@ -1827,7 +1920,7 @@ void Context::reloadAuxiliaryZooKeepersConfigIfChanged(const ConfigurationPtr & it = shared->auxiliary_zookeepers.erase(it); else { - reloadZooKeeperIfChangedImpl(config, "auxiliary_zookeepers." + it->first, it->second); + reloadZooKeeperIfChangedImpl(config, "auxiliary_zookeepers." + it->first, it->second, getZooKeeperLog()); ++it; } } @@ -2110,6 +2203,17 @@ std::shared_ptr Context::getOpenTelemetrySpanLog() const } +std::shared_ptr Context::getZooKeeperLog() const +{ + auto lock = getLock(); + + if (!shared->system_logs) + return {}; + + return shared->system_logs->zookeeper_log; +} + + CompressionCodecPtr Context::chooseCompressionCodec(size_t part_size, double part_size_ratio) const { auto lock = getLock(); @@ -2413,13 +2517,13 @@ void Context::setDefaultProfiles(const Poco::Util::AbstractConfiguration & confi getAccessControlManager().setDefaultProfileName(shared->default_profile_name); shared->system_profile_name = config.getString("system_profile", shared->default_profile_name); - setProfile(shared->system_profile_name); + setCurrentProfile(shared->system_profile_name); applySettingsQuirks(settings, &Poco::Logger::get("SettingsQuirks")); shared->buffer_profile_name = config.getString("buffer_profile", shared->system_profile_name); buffer_context = Context::createCopy(shared_from_this()); - buffer_context->setProfile(shared->buffer_profile_name); + buffer_context->setCurrentProfile(shared->buffer_profile_name); } String Context::getDefaultProfileName() const @@ -2692,6 +2796,13 @@ ZooKeeperMetadataTransactionPtr Context::getZooKeeperMetadataTransaction() const return metadata_transaction; } +void Context::resetZooKeeperMetadataTransaction() +{ + assert(metadata_transaction); + assert(hasQueryContext()); + metadata_transaction = nullptr; +} + PartUUIDsPtr Context::getPartUUIDs() const { auto lock = getLock(); @@ -2727,18 +2838,4 @@ PartUUIDsPtr Context::getIgnoredPartUUIDs() const return ignored_part_uuids; } -void Context::setMySQLProtocolContext(MySQLWireContext * mysql_context) -{ - assert(session_context.lock().get() == this); - assert(!mysql_protocol_context); - assert(mysql_context); - mysql_protocol_context = mysql_context; -} - -MySQLWireContext * Context::getMySQLProtocolContext() const -{ - assert(!mysql_protocol_context || session_context.lock().get()); - return mysql_protocol_context; -} - } diff --git 
a/src/Interpreters/Context.h b/src/Interpreters/Context.h index 05eab209eff..66fac7e6e70 100644 --- a/src/Interpreters/Context.h +++ b/src/Interpreters/Context.h @@ -76,6 +76,7 @@ class TraceLog; class MetricLog; class AsynchronousMetricLog; class OpenTelemetrySpanLog; +class ZooKeeperLog; struct MergeTreeSettings; class StorageS3Settings; class IDatabase; @@ -89,7 +90,7 @@ class ICompressionCodec; class AccessControlManager; class Credentials; class GSSAcceptorContext; -class SettingsConstraints; +struct SettingsConstraintsAndProfileIDs; class RemoteHostFilter; struct StorageID; class IDisk; @@ -113,14 +114,17 @@ using VolumePtr = std::shared_ptr; struct NamedSession; struct BackgroundTaskSchedulingSettings; +#if USE_NLP + class SynonymsExtensions; + class Lemmatizers; +#endif + class Throttler; using ThrottlerPtr = std::shared_ptr; class ZooKeeperMetadataTransaction; using ZooKeeperMetadataTransactionPtr = std::shared_ptr; -struct MySQLWireContext; - /// Callback for external tables initializer using ExternalTablesInitializer = std::function; @@ -177,8 +181,8 @@ private: InputBlocksReader input_blocks_reader; std::optional user_id; - std::vector current_roles; - bool use_default_roles = false; + std::shared_ptr> current_roles; + std::shared_ptr settings_constraints_and_current_profiles; std::shared_ptr access; std::shared_ptr initial_row_policy; String current_database; @@ -192,11 +196,13 @@ private: QueryStatus * process_list_elem = nullptr; /// For tracking total resource usage for query. StorageID insertion_table = StorageID::createEmpty(); /// Saved insertion table in query context + bool is_distributed = false; /// Whether the current context is used for a distributed query String default_format; /// Format, used when server formats data by itself and if query does not have FORMAT specification. /// Thus, used in HTTP interface. If not specified - then some globally default format is used. TemporaryTablesMapping external_tables_mapping; Scalars scalars; + Scalars local_scalars; /// Fields for distributed s3 function std::optional next_task_callback; @@ -300,8 +306,6 @@ private: /// thousands of signatures. /// And I hope it will be replaced with more common Transaction sometime. - MySQLWireContext * mysql_protocol_context = nullptr; - Context(); Context(const Context &); Context & operator=(const Context &); @@ -382,6 +386,11 @@ public: boost::container::flat_set getEnabledRoles() const; std::shared_ptr getRolesInfo() const; + void setCurrentProfile(const String & profile_name); + void setCurrentProfile(const UUID & profile_id); + std::vector getCurrentProfiles() const; + std::vector getEnabledProfiles() const; + /// Checks access rights. /// Empty database means the current database.
void checkAccess(const AccessFlags & flags) const; @@ -452,6 +461,9 @@ public: void addScalar(const String & name, const Block & block); bool hasScalar(const String & name) const; + const Block * tryGetLocalScalar(const String & name) const; + void addLocalScalar(const String & name, const Block & block); + const QueryAccessInfo & getQueryAccessInfo() const { return query_access_info; } void addQueryAccessInfo( const String & quoted_database_name, @@ -498,6 +510,9 @@ public: void setInsertionTable(StorageID db_and_table) { insertion_table = std::move(db_and_table); } const StorageID & getInsertionTable() const { return insertion_table; } + void setDistributed(bool is_distributed_) { is_distributed = is_distributed_; } + bool isDistributed() const { return is_distributed; } + String getDefaultFormat() const; /// If default_format is not specified, some global default format is returned. void setDefaultFormat(const String & name); @@ -520,7 +535,7 @@ public: void clampToSettingsConstraints(SettingsChanges & changes) const; /// Returns the current constraints (can return null). - std::shared_ptr getSettingsConstraints() const; + std::shared_ptr getSettingsConstraintsAndCurrentProfiles() const; const EmbeddedDictionaries & getEmbeddedDictionaries() const; const ExternalDictionariesLoader & getExternalDictionariesLoader() const; @@ -532,6 +547,11 @@ public: void tryCreateEmbeddedDictionaries() const; void loadDictionaries(const Poco::Util::AbstractConfiguration & config); +#if USE_NLP + SynonymsExtensions & getSynonymsExtensions() const; + Lemmatizers & getLemmatizers() const; +#endif + void setExternalModelsConfig(const ConfigurationPtr & config, const std::string & config_name = "models_config"); /// I/O formats. @@ -647,6 +667,8 @@ public: // Reload Zookeeper void reloadZooKeeperIfChanged(const ConfigurationPtr & config) const; + void setSystemZooKeeperLogAfterInitializationIfNeeded(); + /// Create a cache of uncompressed blocks of specified size. This can be done only once. void setUncompressedCache(size_t max_size_in_bytes); std::shared_ptr getUncompressedCache() const; @@ -713,6 +735,7 @@ public: std::shared_ptr getMetricLog() const; std::shared_ptr getAsynchronousMetricLog() const; std::shared_ptr getOpenTelemetrySpanLog() const; + std::shared_ptr getZooKeeperLog() const; /// Returns an object used to log operations with parts if it possible. /// Provide table name to make required checks. @@ -796,11 +819,8 @@ public: void initZooKeeperMetadataTransaction(ZooKeeperMetadataTransactionPtr txn, bool attach_existing = false); /// Returns context of current distributed DDL query or nullptr. ZooKeeperMetadataTransactionPtr getZooKeeperMetadataTransaction() const; - - /// Caller is responsible for lifetime of mysql_context. - /// Used in MySQLHandler for session context. - void setMySQLProtocolContext(MySQLWireContext * mysql_context); - MySQLWireContext * getMySQLProtocolContext() const; + /// Removes context of current distributed DDL. + void resetZooKeeperMetadataTransaction(); PartUUIDsPtr getPartUUIDs() const; PartUUIDsPtr getIgnoredPartUUIDs() const; @@ -819,8 +839,6 @@ private: template void checkAccessImpl(const Args &... 
args) const; - void setProfile(const String & profile); - EmbeddedDictionaries & getEmbeddedDictionariesImpl(bool throw_on_error) const; void checkCanBeDropped(const String & database, const String & table, const size_t & size, const size_t & max_size_to_drop) const; diff --git a/src/Interpreters/DDLTask.cpp b/src/Interpreters/DDLTask.cpp index 4fb44738d8d..0391e76e763 100644 --- a/src/Interpreters/DDLTask.cpp +++ b/src/Interpreters/DDLTask.cpp @@ -22,6 +22,7 @@ namespace ErrorCodes extern const int UNKNOWN_FORMAT_VERSION; extern const int UNKNOWN_TYPE_OF_QUERY; extern const int INCONSISTENT_CLUSTER_DEFINITION; + extern const int LOGICAL_ERROR; } HostID HostID::fromString(const String & host_port_str) @@ -359,9 +360,10 @@ ContextMutablePtr DatabaseReplicatedTask::makeQueryContext(ContextPtr from_conte { auto query_context = DDLTaskBase::makeQueryContext(from_context, zookeeper); query_context->getClientInfo().query_kind = ClientInfo::QueryKind::SECONDARY_QUERY; + query_context->getClientInfo().is_replicated_database_internal = true; query_context->setCurrentDatabase(database->getDatabaseName()); - auto txn = std::make_shared(zookeeper, database->zookeeper_path, is_initial_query); + auto txn = std::make_shared(zookeeper, database->zookeeper_path, is_initial_query, entry_path); query_context->initZooKeeperMetadataTransaction(txn); if (is_initial_query) @@ -401,7 +403,8 @@ UInt32 DDLTaskBase::getLogEntryNumber(const String & log_entry_name) void ZooKeeperMetadataTransaction::commit() { - assert(state == CREATED); + if (state != CREATED) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Incorrect state ({}), it's a bug", state); state = FAILED; current_zookeeper->multi(ops); state = COMMITTED; diff --git a/src/Interpreters/DDLTask.h b/src/Interpreters/DDLTask.h index 703d691a358..ee49274707a 100644 --- a/src/Interpreters/DDLTask.h +++ b/src/Interpreters/DDLTask.h @@ -20,6 +20,11 @@ namespace fs = std::filesystem; namespace DB { +namespace ErrorCodes +{ + extern const int LOGICAL_ERROR; +} + class ASTQueryWithOnCluster; using ZooKeeperPtr = std::shared_ptr; using ClusterPtr = std::shared_ptr; @@ -164,13 +169,15 @@ class ZooKeeperMetadataTransaction ZooKeeperPtr current_zookeeper; String zookeeper_path; bool is_initial_query; + String task_path; Coordination::Requests ops; public: - ZooKeeperMetadataTransaction(const ZooKeeperPtr & current_zookeeper_, const String & zookeeper_path_, bool is_initial_query_) + ZooKeeperMetadataTransaction(const ZooKeeperPtr & current_zookeeper_, const String & zookeeper_path_, bool is_initial_query_, const String & task_path_) : current_zookeeper(current_zookeeper_) , zookeeper_path(zookeeper_path_) , is_initial_query(is_initial_query_) + , task_path(task_path_) { } @@ -180,15 +187,21 @@ public: String getDatabaseZooKeeperPath() const { return zookeeper_path; } + String getTaskZooKeeperPath() const { return task_path; } + + ZooKeeperPtr getZooKeeper() const { return current_zookeeper; } + void addOp(Coordination::RequestPtr && op) { - assert(!isExecuted()); + if (isExecuted()) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot add ZooKeeper operation because query is executed. It's a bug."); ops.emplace_back(op); } void moveOpsTo(Coordination::Requests & other_ops) { - assert(!isExecuted()); + if (isExecuted()) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot add ZooKeeper operation because query is executed. 
It's a bug."); std::move(ops.begin(), ops.end(), std::back_inserter(other_ops)); ops.clear(); state = COMMITTED; @@ -196,7 +209,7 @@ public: void commit(); - ~ZooKeeperMetadataTransaction() { assert(isExecuted() || std::uncaught_exceptions()); } + ~ZooKeeperMetadataTransaction() { assert(isExecuted() || std::uncaught_exceptions() || ops.empty()); } }; ClusterPtr tryGetReplicatedDatabaseCluster(const String & cluster_name); diff --git a/src/Interpreters/DDLWorker.cpp b/src/Interpreters/DDLWorker.cpp index 4e51c346b6f..47ca2b72db8 100644 --- a/src/Interpreters/DDLWorker.cpp +++ b/src/Interpreters/DDLWorker.cpp @@ -31,6 +31,8 @@ #include #include +#include + namespace fs = std::filesystem; @@ -371,7 +373,7 @@ void DDLWorker::scheduleTasks(bool reinitialized) } } - Strings queue_nodes = zookeeper->getChildren(queue_dir, nullptr, queue_updated_event); + Strings queue_nodes = zookeeper->getChildren(queue_dir, &queue_node_stat, queue_updated_event); size_t size_before_filtering = queue_nodes.size(); filterAndSortQueueNodes(queue_nodes); /// The following message is too verbose, but it can be useful too debug mysterious test failures in CI @@ -1136,10 +1138,32 @@ void DDLWorker::runMainThread() cleanup_event->set(); scheduleTasks(reinitialized); - LOG_DEBUG(log, "Waiting for queue updates"); + LOG_DEBUG(log, "Waiting for queue updates (stat: {}, {}, {}, {})", + queue_node_stat.version, queue_node_stat.cversion, queue_node_stat.numChildren, queue_node_stat.pzxid); /// FIXME It may hang for unknown reason. Timeout is just a hotfix. constexpr int queue_wait_timeout_ms = 10000; - queue_updated_event->tryWait(queue_wait_timeout_ms); + bool updated = queue_updated_event->tryWait(queue_wait_timeout_ms); + if (!updated) + { + Coordination::Stat new_stat; + tryGetZooKeeper()->get(queue_dir, &new_stat); + bool queue_changed = memcmp(&queue_node_stat, &new_stat, sizeof(Coordination::Stat)) != 0; + bool watch_triggered = queue_updated_event->tryWait(0); + if (queue_changed && !watch_triggered) + { + /// It should never happen. + /// Maybe log message, abort() and system.zookeeper_log will help to debug it and remove timeout (#26036). + LOG_TRACE( + log, + "Queue was not updated (stat: {}, {}, {}, {})", + new_stat.version, + new_stat.cversion, + new_stat.numChildren, + new_stat.pzxid); + context->getZooKeeperLog()->flush(); + abort(); + } + } } catch (const Coordination::Exception & e) { diff --git a/src/Interpreters/DDLWorker.h b/src/Interpreters/DDLWorker.h index 45218226fee..d05b9b27611 100644 --- a/src/Interpreters/DDLWorker.h +++ b/src/Interpreters/DDLWorker.h @@ -3,6 +3,7 @@ #include #include #include +#include #include #include #include @@ -125,6 +126,7 @@ protected: std::optional first_failed_task_name; std::list current_tasks; + Coordination::Stat queue_node_stat; std::shared_ptr queue_updated_event = std::make_shared(); std::shared_ptr cleanup_event = std::make_shared(); std::atomic initialized = false; diff --git a/src/Interpreters/DatabaseCatalog.cpp b/src/Interpreters/DatabaseCatalog.cpp index 0d0c82f1abc..fd6b5b9a810 100644 --- a/src/Interpreters/DatabaseCatalog.cpp +++ b/src/Interpreters/DatabaseCatalog.cpp @@ -24,8 +24,8 @@ #endif #if USE_MYSQL -# include -# include +# include +# include #endif #if USE_LIBPQXX @@ -246,11 +246,11 @@ DatabaseAndTable DatabaseCatalog::getTableImpl( #endif #if USE_MYSQL - /// It's definitely not the best place for this logic, but behaviour must be consistent with DatabaseMaterializeMySQL::tryGetTable(...) 
- if (db_and_table.first->getEngineName() == "MaterializeMySQL") + /// It's definitely not the best place for this logic, but behaviour must be consistent with DatabaseMaterializedMySQL::tryGetTable(...) + if (db_and_table.first->getEngineName() == "MaterializedMySQL") { - if (!MaterializeMySQLSyncThread::isMySQLSyncThread()) - db_and_table.second = std::make_shared(std::move(db_and_table.second), db_and_table.first.get()); + if (!MaterializedMySQLSyncThread::isMySQLSyncThread()) + db_and_table.second = std::make_shared(std::move(db_and_table.second), db_and_table.first.get()); } #endif return db_and_table; diff --git a/src/Interpreters/ExecuteScalarSubqueriesVisitor.cpp b/src/Interpreters/ExecuteScalarSubqueriesVisitor.cpp index 1ce6c4f36d8..f46cbdd2465 100644 --- a/src/Interpreters/ExecuteScalarSubqueriesVisitor.cpp +++ b/src/Interpreters/ExecuteScalarSubqueriesVisitor.cpp @@ -1,9 +1,10 @@ #include #include -#include +#include #include #include +#include #include #include #include @@ -16,8 +17,10 @@ #include #include #include +#include #include + namespace DB { @@ -119,8 +122,24 @@ void ExecuteScalarSubqueriesMatcher::visit(const ASTSubquery & subquery, ASTPtr if (block.rows() == 0) { - /// Interpret subquery with empty result as Null literal - auto ast_new = std::make_unique(Null()); + auto types = interpreter.getSampleBlock().getDataTypes(); + if (types.size() != 1) + types = {std::make_shared(types)}; + + auto & type = types[0]; + if (!type->isNullable()) + { + if (!type->canBeInsideNullable()) + throw Exception(ErrorCodes::INCORRECT_RESULT_OF_SCALAR_SUBQUERY, + "Scalar subquery returned empty result of type {} which cannot be Nullable", + type->getName()); + + type = makeNullable(type); + } + + ASTPtr ast_new = std::make_shared(Null()); + ast_new = addTypeConversionToAST(std::move(ast_new), type->getName()); + ast_new->setAlias(ast->tryGetAlias()); ast = std::move(ast_new); return; @@ -140,10 +159,20 @@ void ExecuteScalarSubqueriesMatcher::visit(const ASTSubquery & subquery, ASTPtr size_t columns = block.columns(); if (columns == 1) + { + auto & column = block.getByPosition(0); + /// Here we wrap the type in Nullable if we can. + /// It is needed because if the subquery returns no rows, its result will be Null. + /// In case of many columns, do not check it because a tuple can't be nullable. + if (!column.type->isNullable() && column.type->canBeInsideNullable()) + { + column.type = makeNullable(column.type); + column.column = makeNullable(column.column); + } scalar = block; + } else { - ColumnWithTypeAndName ctn; ctn.type = std::make_shared(block.getDataTypes()); ctn.column = ColumnTuple::create(block.getColumns()); @@ -157,9 +186,14 @@ void ExecuteScalarSubqueriesMatcher::visit(const ASTSubquery & subquery, ASTPtr if (data.only_analyze || !settings.enable_scalar_subquery_optimization || worthConvertingToLiteral(scalar) || !data.getContext()->hasQueryContext()) { + /// subquery and ast can be the same object and ast will be moved. + /// Save these fields to avoid use after move.
+ auto alias = subquery.alias; + auto prefer_alias_to_column_name = subquery.prefer_alias_to_column_name; + auto lit = std::make_unique((*scalar.safeGetByPosition(0).column)[0]); - lit->alias = subquery.alias; - lit->prefer_alias_to_column_name = subquery.prefer_alias_to_column_name; + lit->alias = alias; + lit->prefer_alias_to_column_name = prefer_alias_to_column_name; ast = addTypeConversionToAST(std::move(lit), scalar.safeGetByPosition(0).type->getName()); /// If only analyze was requested the expression is not suitable for constant folding, disable it. @@ -167,8 +201,8 @@ void ExecuteScalarSubqueriesMatcher::visit(const ASTSubquery & subquery, ASTPtr { ast->as()->alias.clear(); auto func = makeASTFunction("identity", std::move(ast)); - func->alias = subquery.alias; - func->prefer_alias_to_column_name = subquery.prefer_alias_to_column_name; + func->alias = alias; + func->prefer_alias_to_column_name = prefer_alias_to_column_name; ast = std::move(func); } } diff --git a/src/Interpreters/ExpressionActions.cpp b/src/Interpreters/ExpressionActions.cpp index 905fcf0331c..6797947a101 100644 --- a/src/Interpreters/ExpressionActions.cpp +++ b/src/Interpreters/ExpressionActions.cpp @@ -812,6 +812,9 @@ void ExpressionActionsChain::JoinStep::finalize(const NameSet & required_output_ for (const auto & name : analyzed_join->keyNamesLeft()) required_names.emplace(name); + if (ASTPtr extra_condition_column = analyzed_join->joinConditionColumn(JoinTableSide::Left)) + required_names.emplace(extra_condition_column->getColumnName()); + for (const auto & column : required_columns) { if (required_names.count(column.name) != 0) diff --git a/src/Interpreters/ExpressionAnalyzer.cpp b/src/Interpreters/ExpressionAnalyzer.cpp index 1496ea3dc61..6245b297b36 100644 --- a/src/Interpreters/ExpressionAnalyzer.cpp +++ b/src/Interpreters/ExpressionAnalyzer.cpp @@ -126,6 +126,7 @@ ExpressionAnalyzerData::~ExpressionAnalyzerData() = default; ExpressionAnalyzer::ExtractedSettings::ExtractedSettings(const Settings & settings_) : use_index_for_in_with_subqueries(settings_.use_index_for_in_with_subqueries) , size_limits_for_set(settings_.max_rows_in_set, settings_.max_bytes_in_set, settings_.set_overflow_mode) + , distributed_group_by_no_merge(settings_.distributed_group_by_no_merge) {} ExpressionAnalyzer::~ExpressionAnalyzer() = default; @@ -247,21 +248,25 @@ void ExpressionAnalyzer::analyzeAggregation() if (!node) throw Exception("Unknown identifier (in GROUP BY): " + column_name, ErrorCodes::UNKNOWN_IDENTIFIER); - /// Constant expressions have non-null column pointer at this stage. - if (node->column && isColumnConst(*node->column)) + /// Only removes constant keys if it's an initiator or distributed_group_by_no_merge is enabled. + if (getContext()->getClientInfo().distributed_depth == 0 || settings.distributed_group_by_no_merge > 0) { - select_query->group_by_with_constant_keys = true; - - /// But don't remove last key column if no aggregate functions, otherwise aggregation will not work. - if (!aggregate_descriptions.empty() || size > 1) + /// Constant expressions have non-null column pointer at this stage. + if (node->column && isColumnConst(*node->column)) { - if (i + 1 < static_cast(size)) - group_asts[i] = std::move(group_asts.back()); + select_query->group_by_with_constant_keys = true; - group_asts.pop_back(); + /// But don't remove last key column if no aggregate functions, otherwise aggregation will not work. 
+ if (!aggregate_descriptions.empty() || size > 1) + { + if (i + 1 < static_cast(size)) + group_asts[i] = std::move(group_asts.back()); - --i; - continue; + group_asts.pop_back(); + + --i; + continue; + } } } @@ -364,7 +369,7 @@ SetPtr ExpressionAnalyzer::isPlainStorageSetInSubquery(const ASTPtr & subquery_o } -/// Performance optimisation for IN() if storage supports it. +/// Performance optimization for IN() if storage supports it. void SelectQueryExpressionAnalyzer::makeSetsForIndex(const ASTPtr & node) { if (!node || !storage() || !storage()->supportsIndexForIn()) @@ -608,18 +613,6 @@ void makeWindowDescriptionFromAST(const Context & context, void ExpressionAnalyzer::makeWindowDescriptions(ActionsDAGPtr actions) { - // Convenient to check here because at least we have the Context. - if (!syntax->window_function_asts.empty() && - !getContext()->getSettingsRef().allow_experimental_window_functions) - { - throw Exception(ErrorCodes::NOT_IMPLEMENTED, - "The support for window functions is experimental and will change" - " in backwards-incompatible ways in the future releases. Set" - " allow_experimental_window_functions = 1 to enable it." - " While processing '{}'", - syntax->window_function_asts[0]->formatForErrorMessage()); - } - // Window definitions from the WINDOW clause const auto * select_query = query->as(); if (select_query && select_query->window()) @@ -1478,12 +1471,6 @@ ExpressionAnalysisResult::ExpressionAnalysisResult( chain.clear(); }; - if (storage) - { - query_analyzer.makeSetsForIndex(query.where()); - query_analyzer.makeSetsForIndex(query.prewhere()); - } - { ExpressionActionsChain chain(context); Names additional_required_columns_after_prewhere; diff --git a/src/Interpreters/ExpressionAnalyzer.h b/src/Interpreters/ExpressionAnalyzer.h index ac5d281f337..2d0041bd96b 100644 --- a/src/Interpreters/ExpressionAnalyzer.h +++ b/src/Interpreters/ExpressionAnalyzer.h @@ -90,6 +90,7 @@ private: { const bool use_index_for_in_with_subqueries; const SizeLimits size_limits_for_set; + const UInt64 distributed_group_by_no_merge; ExtractedSettings(const Settings & settings_); }; @@ -326,15 +327,15 @@ public: /// Deletes all columns except mentioned by SELECT, arranges the remaining columns and renames them to aliases. ActionsDAGPtr appendProjectResult(ExpressionActionsChain & chain) const; + /// Create Set-s that we make from IN section to use index on them. + void makeSetsForIndex(const ASTPtr & node); + private: StorageMetadataPtr metadata_snapshot; /// If non-empty, ignore all expressions not from this list. NameSet required_result_columns; SelectQueryOptions query_options; - /// Create Set-s that we make from IN section to use index on them. 
- void makeSetsForIndex(const ASTPtr & node); - JoinPtr makeTableJoin( const ASTTablesInSelectQueryElement & join_element, const ColumnsWithTypeAndName & left_sample_columns); diff --git a/src/Interpreters/ExternalDictionariesLoader.cpp b/src/Interpreters/ExternalDictionariesLoader.cpp index cfbe2b45f44..83931649443 100644 --- a/src/Interpreters/ExternalDictionariesLoader.cpp +++ b/src/Interpreters/ExternalDictionariesLoader.cpp @@ -81,8 +81,12 @@ DictionaryStructure ExternalDictionariesLoader::getDictionaryStructure(const std std::string ExternalDictionariesLoader::resolveDictionaryName(const std::string & dictionary_name, const std::string & current_database_name) const { + bool has_dictionary = has(dictionary_name); + if (has_dictionary) + return dictionary_name; + std::string resolved_name = resolveDictionaryNameFromDatabaseCatalog(dictionary_name); - bool has_dictionary = has(resolved_name); + has_dictionary = has(resolved_name); if (!has_dictionary) { diff --git a/src/Interpreters/GlobalSubqueriesVisitor.h b/src/Interpreters/GlobalSubqueriesVisitor.h index a9c7cb61a0a..6a87527dc9c 100644 --- a/src/Interpreters/GlobalSubqueriesVisitor.h +++ b/src/Interpreters/GlobalSubqueriesVisitor.h @@ -15,7 +15,8 @@ #include #include #include -#include +#include +#include #include namespace DB @@ -62,7 +63,7 @@ public: return; bool is_table = false; - ASTPtr subquery_or_table_name = ast; /// ASTTableIdentifier | ASTSubquery | ASTTableExpression + ASTPtr subquery_or_table_name; /// ASTTableIdentifier | ASTSubquery | ASTTableExpression if (const auto * ast_table_expr = ast->as()) { @@ -75,7 +76,14 @@ public: } } else if (ast->as()) + { + subquery_or_table_name = ast; is_table = true; + } + else if (ast->as()) + { + subquery_or_table_name = ast; + } if (!subquery_or_table_name) throw Exception("Global subquery requires subquery or table name", ErrorCodes::WRONG_GLOBAL_SUBQUERY); @@ -150,14 +158,13 @@ public: auto external_table = external_storage_holder->getTable(); auto table_out = external_table->write({}, external_table->getInMemoryMetadataPtr(), getContext()); auto io = interpreter->execute(); - PullingPipelineExecutor executor(io.pipeline); - - table_out->writePrefix(); - Block block; - while (executor.pull(block)) - table_out->write(block); - - table_out->writeSuffix(); + io.pipeline.resize(1); + io.pipeline.setSinks([&](const Block &, Pipe::StreamType) -> ProcessorPtr + { + return table_out; + }); + auto executor = io.pipeline.execute(); + executor->execute(io.pipeline.getNumStreams()); } else { diff --git a/src/Interpreters/HashJoin.cpp b/src/Interpreters/HashJoin.cpp index 6e5f7df99bd..dd17fc1004c 100644 --- a/src/Interpreters/HashJoin.cpp +++ b/src/Interpreters/HashJoin.cpp @@ -190,9 +190,12 @@ HashJoin::HashJoin(std::shared_ptr table_join_, const Block & right_s { LOG_DEBUG(log, "Right sample block: {}", right_sample_block.dumpStructure()); - table_join->splitAdditionalColumns(right_sample_block, right_table_keys, sample_block_with_columns_to_add); + JoinCommon::splitAdditionalColumns(key_names_right, right_sample_block, right_table_keys, sample_block_with_columns_to_add); + required_right_keys = table_join->getRequiredRightKeys(right_table_keys, required_right_keys_sources); + std::tie(condition_mask_column_name_left, condition_mask_column_name_right) = table_join->joinConditionColumnNames(); + JoinCommon::removeLowCardinalityInplace(right_table_keys); initRightBlockStructure(data->sample_block); @@ -500,7 +503,7 @@ namespace template size_t NO_INLINE insertFromBlockImplTypeCase( HashJoin 
& join, Map & map, size_t rows, const ColumnRawPtrs & key_columns, - const Sizes & key_sizes, Block * stored_block, ConstNullMapPtr null_map, Arena & pool) + const Sizes & key_sizes, Block * stored_block, ConstNullMapPtr null_map, UInt8ColumnDataPtr join_mask, Arena & pool) { [[maybe_unused]] constexpr bool mapped_one = std::is_same_v; constexpr bool is_asof_join = STRICTNESS == ASTTableJoin::Strictness::Asof; @@ -516,6 +519,10 @@ namespace if (has_null_map && (*null_map)[i]) continue; + /// Check condition for right table from ON section + if (join_mask && !(*join_mask)[i]) + continue; + if constexpr (is_asof_join) Inserter::insertAsof(join, map, key_getter, stored_block, i, pool, *asof_column); else if constexpr (mapped_one) @@ -530,19 +537,21 @@ namespace template size_t insertFromBlockImplType( HashJoin & join, Map & map, size_t rows, const ColumnRawPtrs & key_columns, - const Sizes & key_sizes, Block * stored_block, ConstNullMapPtr null_map, Arena & pool) + const Sizes & key_sizes, Block * stored_block, ConstNullMapPtr null_map, UInt8ColumnDataPtr join_mask, Arena & pool) { if (null_map) - return insertFromBlockImplTypeCase(join, map, rows, key_columns, key_sizes, stored_block, null_map, pool); + return insertFromBlockImplTypeCase( + join, map, rows, key_columns, key_sizes, stored_block, null_map, join_mask, pool); else - return insertFromBlockImplTypeCase(join, map, rows, key_columns, key_sizes, stored_block, null_map, pool); + return insertFromBlockImplTypeCase( + join, map, rows, key_columns, key_sizes, stored_block, null_map, join_mask, pool); } template size_t insertFromBlockImpl( HashJoin & join, HashJoin::Type type, Maps & maps, size_t rows, const ColumnRawPtrs & key_columns, - const Sizes & key_sizes, Block * stored_block, ConstNullMapPtr null_map, Arena & pool) + const Sizes & key_sizes, Block * stored_block, ConstNullMapPtr null_map, UInt8ColumnDataPtr join_mask, Arena & pool) { switch (type) { @@ -553,7 +562,7 @@ namespace #define M(TYPE) \ case HashJoin::Type::TYPE: \ return insertFromBlockImplType>::Type>(\ - join, *maps.TYPE, rows, key_columns, key_sizes, stored_block, null_map, pool); \ + join, *maps.TYPE, rows, key_columns, key_sizes, stored_block, null_map, join_mask, pool); \ break; APPLY_FOR_JOIN_VARIANTS(M) #undef M @@ -624,10 +633,34 @@ bool HashJoin::addJoinedBlock(const Block & source_block, bool check_limits) UInt8 save_nullmap = 0; if (isRightOrFull(kind) && null_map) { + /// Save rows with NULL keys for (size_t i = 0; !save_nullmap && i < null_map->size(); ++i) save_nullmap |= (*null_map)[i]; } + auto join_mask_col = JoinCommon::getColumnAsMask(block, condition_mask_column_name_right); + + /// Save blocks that do not hold conditions in ON section + ColumnUInt8::MutablePtr not_joined_map = nullptr; + if (isRightOrFull(kind) && join_mask_col) + { + const auto & join_mask = assert_cast(*join_mask_col).getData(); + /// Save rows that do not hold conditions + not_joined_map = ColumnUInt8::create(block.rows(), 0); + for (size_t i = 0, sz = join_mask.size(); i < sz; ++i) + { + /// Condition holds, do not save row + if (join_mask[i]) + continue; + + /// NULL key will be saved anyway, do not save it twice + if (save_nullmap && (*null_map)[i]) + continue; + + not_joined_map->getData()[i] = 1; + } + } + Block structured_block = structureRightBlock(block); size_t total_rows = 0; size_t total_bytes = 0; @@ -647,7 +680,10 @@ bool HashJoin::addJoinedBlock(const Block & source_block, bool check_limits) { joinDispatch(kind, strictness, data->maps, [&](auto kind_, auto
strictness_, auto & map) { - size_t size = insertFromBlockImpl(*this, data->type, map, rows, key_columns, key_sizes, stored_block, null_map, data->pool); + size_t size = insertFromBlockImpl( + *this, data->type, map, rows, key_columns, key_sizes, stored_block, null_map, + join_mask_col ? &assert_cast(*join_mask_col).getData() : nullptr, + data->pool); /// Number of buckets + 1 value from zero storage used_flags.reinit(size + 1); }); @@ -656,6 +692,9 @@ bool HashJoin::addJoinedBlock(const Block & source_block, bool check_limits) if (save_nullmap) data->blocks_nullmaps.emplace_back(stored_block, null_map_holder); + if (not_joined_map) + data->blocks_nullmaps.emplace_back(stored_block, std::move(not_joined_map)); + if (!check_limits) return true; @@ -693,6 +732,7 @@ public: const HashJoin & join, const ColumnRawPtrs & key_columns_, const Sizes & key_sizes_, + const UInt8ColumnDataPtr & join_mask_column_, bool is_asof_join, bool is_join_get_) : key_columns(key_columns_) @@ -700,6 +740,7 @@ public: , rows_to_add(block.rows()) , asof_type(join.getAsofType()) , asof_inequality(join.getAsofInequality()) + , join_mask_column(join_mask_column_) , is_join_get(is_join_get_) { size_t num_columns_to_add = block_with_columns_to_add.columns(); @@ -784,6 +825,8 @@ public: ASOF::Inequality asofInequality() const { return asof_inequality; } const IColumn & leftAsofKey() const { return *left_asof_key; } + bool isRowFiltered(size_t i) { return join_mask_column && !(*join_mask_column)[i]; } + const ColumnRawPtrs & key_columns; const Sizes & key_sizes; size_t rows_to_add; @@ -799,6 +842,7 @@ private: std::optional asof_type; ASOF::Inequality asof_inequality; const IColumn * left_asof_key = nullptr; + UInt8ColumnDataPtr join_mask_column; bool is_join_get; void addColumn(const ColumnWithTypeAndName & src_column, const std::string & qualified_name) @@ -891,7 +935,9 @@ NO_INLINE IColumn::Filter joinRightColumns( } } - auto find_result = key_getter.findKey(map, i, pool); + bool row_acceptable = !added_columns.isRowFiltered(i); + using FindResult = typename KeyGetter::FindResult; + auto find_result = row_acceptable ? key_getter.findKey(map, i, pool) : FindResult(); if (find_result.isFound()) { @@ -1098,7 +1144,20 @@ void HashJoin::joinBlockImpl( * For ASOF, the last column is used as the ASOF column */ - AddedColumns added_columns(block_with_columns_to_add, block, savedBlockSample(), *this, left_key_columns, key_sizes, is_asof_join, is_join_get); + /// Only rows where mask == true can be joined + ColumnPtr join_mask_column = JoinCommon::getColumnAsMask(block, condition_mask_column_name_left); + + AddedColumns added_columns( + block_with_columns_to_add, + block, + savedBlockSample(), + *this, + left_key_columns, + key_sizes, + join_mask_column ? 
&assert_cast(*join_mask_column).getData() : nullptr, + is_asof_join, + is_join_get); + bool has_required_right_keys = (required_right_keys.columns() != 0); added_columns.need_filter = need_filter || has_required_right_keys; @@ -1324,7 +1383,8 @@ ColumnWithTypeAndName HashJoin::joinGet(const Block & block, const Block & block void HashJoin::joinBlock(Block & block, ExtraBlockPtr & not_processed) { const Names & key_names_left = table_join->keyNamesLeft(); - JoinCommon::checkTypesOfKeys(block, key_names_left, right_table_keys, key_names_right); + JoinCommon::checkTypesOfKeys(block, key_names_left, condition_mask_column_name_left, + right_sample_block, key_names_right, condition_mask_column_name_right); if (overDictionary()) { @@ -1368,18 +1428,6 @@ void HashJoin::joinBlock(Block & block, ExtraBlockPtr & not_processed) throw Exception("Logical error: unknown combination of JOIN", ErrorCodes::LOGICAL_ERROR); } - -void HashJoin::joinTotals(Block & block) const -{ - Block sample_right_block = sample_block_with_columns_to_add.cloneEmpty(); - /// For StorageJoin column names isn't qualified in sample_block_with_columns_to_add - for (auto & col : sample_right_block) - col.name = getTableJoin().renamedRightColumnName(col.name); - - JoinCommon::joinTotals(totals, sample_right_block, *table_join, block); -} - - template struct AdderNonJoined { diff --git a/src/Interpreters/HashJoin.h b/src/Interpreters/HashJoin.h index 84c447d875e..65e3f5dbabe 100644 --- a/src/Interpreters/HashJoin.h +++ b/src/Interpreters/HashJoin.h @@ -155,9 +155,7 @@ public: /** Keep "totals" (separate part of dataset, see WITH TOTALS) to use later. */ void setTotals(const Block & block) override { totals = block; } - bool hasTotals() const override { return totals; } - - void joinTotals(Block & block) const override; + const Block & getTotals() const override { return totals; } bool isFilled() const override { return from_storage_join || data->type == Type::DICT; } @@ -379,6 +377,10 @@ private: /// Left table column names that are sources for required_right_keys columns std::vector required_right_keys_sources; + /// Additional conditions for rows to join from JOIN ON section + String condition_mask_column_name_left; + String condition_mask_column_name_right; + Poco::Logger * log; Block totals; diff --git a/src/Interpreters/IInterpreterUnionOrSelectQuery.h b/src/Interpreters/IInterpreterUnionOrSelectQuery.h index 0b07f27e14a..cc960e748f6 100644 --- a/src/Interpreters/IInterpreterUnionOrSelectQuery.h +++ b/src/Interpreters/IInterpreterUnionOrSelectQuery.h @@ -4,6 +4,7 @@ #include #include #include +#include namespace DB { @@ -16,6 +17,14 @@ public: , options(options_) , max_streams(context->getSettingsRef().max_threads) { + if (options.shard_num) + context->addLocalScalar( + "_shard_num", + Block{{DataTypeUInt32().createColumnConst(1, *options.shard_num), std::make_shared(), "_shard_num"}}); + if (options.shard_count) + context->addLocalScalar( + "_shard_count", + Block{{DataTypeUInt32().createColumnConst(1, *options.shard_count), std::make_shared(), "_shard_count"}}); } virtual void buildQueryPlan(QueryPlan & query_plan) = 0; diff --git a/src/Interpreters/IJoin.h b/src/Interpreters/IJoin.h index 0f486fbe523..8fa85de4951 100644 --- a/src/Interpreters/IJoin.h +++ b/src/Interpreters/IJoin.h @@ -31,15 +31,13 @@ public: /// Could be called from different threads in parallel. 
virtual void joinBlock(Block & block, std::shared_ptr & not_processed) = 0; - virtual bool hasTotals() const = 0; - /// Set totals for right table + /// Set/Get totals for right table virtual void setTotals(const Block & block) = 0; - /// Add totals to block from left table - virtual void joinTotals(Block & block) const = 0; + virtual const Block & getTotals() const = 0; virtual size_t getTotalRowCount() const = 0; virtual size_t getTotalByteCount() const = 0; - virtual bool alwaysReturnsEmptySet() const { return false; } + virtual bool alwaysReturnsEmptySet() const = 0; /// StorageJoin/Dictionary is already filled. No need to call addJoinedBlock. /// Different query plan is used for such joins. diff --git a/src/Interpreters/InterpreterAlterQuery.cpp b/src/Interpreters/InterpreterAlterQuery.cpp index 6f0af049d05..76e7afb7009 100644 --- a/src/Interpreters/InterpreterAlterQuery.cpp +++ b/src/Interpreters/InterpreterAlterQuery.cpp @@ -54,7 +54,7 @@ BlockIO InterpreterAlterQuery::execute() DatabasePtr database = DatabaseCatalog::instance().getDatabase(table_id.database_name); if (typeid_cast(database.get()) - && getContext()->getClientInfo().query_kind != ClientInfo::QueryKind::SECONDARY_QUERY) + && !getContext()->getClientInfo().is_replicated_database_internal) { auto guard = DatabaseCatalog::instance().getDDLGuard(table_id.database_name, table_id.table_name); guard->releaseTableLock(); @@ -100,7 +100,8 @@ BlockIO InterpreterAlterQuery::execute() if (typeid_cast(database.get())) { int command_types_count = !mutation_commands.empty() + !partition_commands.empty() + !live_view_commands.empty() + !alter_commands.empty(); - if (1 < command_types_count) + bool mixed_settings_amd_metadata_alter = alter_commands.hasSettingsAlterCommand() && !alter_commands.isSettingsAlter(); + if (1 < command_types_count || mixed_settings_amd_metadata_alter) throw Exception(ErrorCodes::NOT_IMPLEMENTED, "For Replicated databases it's not allowed " "to execute ALTERs of different types in single query"); } diff --git a/src/Interpreters/InterpreterCreateQuery.cpp b/src/Interpreters/InterpreterCreateQuery.cpp index 38f8cd8d642..bf2cf6338aa 100644 --- a/src/Interpreters/InterpreterCreateQuery.cpp +++ b/src/Interpreters/InterpreterCreateQuery.cpp @@ -8,6 +8,7 @@ #include #include #include +#include #include #include @@ -31,7 +32,9 @@ #include #include +#include #include +#include #include #include #include @@ -84,7 +87,6 @@ namespace ErrorCodes extern const int UNKNOWN_DATABASE; extern const int PATH_ACCESS_DENIED; extern const int NOT_IMPLEMENTED; - extern const int UNKNOWN_TABLE; } namespace fs = std::filesystem; @@ -164,7 +166,7 @@ BlockIO InterpreterCreateQuery::createDatabase(ASTCreateQuery & create) if (!create.attach && fs::exists(metadata_path)) throw Exception(ErrorCodes::DATABASE_ALREADY_EXISTS, "Metadata directory {} already exists", metadata_path.string()); } - else if (create.storage->engine->name == "MaterializeMySQL") + else if (create.storage->engine->name == "MaterializeMySQL" || create.storage->engine->name == "MaterializedMySQL") { /// It creates nested database with Ordinary or Atomic engine depending on UUID in query and default engine setting. 
/// Do nothing if it's an internal ATTACH on server startup or short-syntax ATTACH query from user, @@ -204,11 +206,12 @@ BlockIO InterpreterCreateQuery::createDatabase(ASTCreateQuery & create) metadata_path = metadata_path / "metadata" / database_name_escaped; } - if (create.storage->engine->name == "MaterializeMySQL" && !getContext()->getSettingsRef().allow_experimental_database_materialize_mysql + if ((create.storage->engine->name == "MaterializeMySQL" || create.storage->engine->name == "MaterializedMySQL") + && !getContext()->getSettingsRef().allow_experimental_database_materialized_mysql && !internal) { - throw Exception("MaterializeMySQL is an experimental database engine. " - "Enable allow_experimental_database_materialize_mysql to use it.", ErrorCodes::UNKNOWN_DATABASE_ENGINE); + throw Exception("MaterializedMySQL is an experimental database engine. " + "Enable allow_experimental_database_materialized_mysql to use it.", ErrorCodes::UNKNOWN_DATABASE_ENGINE); } if (create.storage->engine->name == "Replicated" && !getContext()->getSettingsRef().allow_experimental_database_replicated && !internal) @@ -802,36 +805,6 @@ void InterpreterCreateQuery::assertOrSetUUID(ASTCreateQuery & create, const Data create.uuid = UUIDHelpers::Nil; create.to_inner_uuid = UUIDHelpers::Nil; } - - if (create.replace_table) - { - if (database->getUUID() == UUIDHelpers::Nil) - throw Exception(ErrorCodes::INCORRECT_QUERY, - "{} query is supported only for Atomic databases", - create.create_or_replace ? "CREATE OR REPLACE TABLE" : "REPLACE TABLE"); - - UUID uuid_of_table_to_replace; - if (create.create_or_replace) - { - uuid_of_table_to_replace = getContext()->tryResolveStorageID(StorageID(create.database, create.table)).uuid; - if (uuid_of_table_to_replace == UUIDHelpers::Nil) - { - /// Convert to usual CREATE - create.replace_table = false; - assert(!database->isTableExist(create.table, getContext())); - } - else - create.table = "_tmp_replace_" + toString(uuid_of_table_to_replace); - } - else - { - uuid_of_table_to_replace = getContext()->resolveStorageID(StorageID(create.database, create.table)).uuid; - if (uuid_of_table_to_replace == UUIDHelpers::Nil) - throw Exception(ErrorCodes::UNKNOWN_TABLE, "Table {}.{} doesn't exist", - backQuoteIfNeed(create.database), backQuoteIfNeed(create.table)); - create.table = "_tmp_replace_" + toString(uuid_of_table_to_replace); - } - } } @@ -855,7 +828,7 @@ BlockIO InterpreterCreateQuery::createTable(ASTCreateQuery & create) auto guard = DatabaseCatalog::instance().getDDLGuard(database_name, create.table); if (auto* ptr = typeid_cast(database.get()); - ptr && getContext()->getClientInfo().query_kind != ClientInfo::QueryKind::SECONDARY_QUERY) + ptr && !getContext()->getClientInfo().is_replicated_database_internal) { create.database = database_name; guard->releaseTableLock(); @@ -949,7 +922,7 @@ BlockIO InterpreterCreateQuery::createTable(ASTCreateQuery & create) auto guard = DatabaseCatalog::instance().getDDLGuard(create.database, create.table); if (auto * ptr = typeid_cast(database.get()); - ptr && getContext()->getClientInfo().query_kind != ClientInfo::QueryKind::SECONDARY_QUERY) + ptr && !getContext()->getClientInfo().is_replicated_database_internal) { assertOrSetUUID(create, database); guard->releaseTableLock(); @@ -1109,23 +1082,72 @@ bool InterpreterCreateQuery::doCreateTable(ASTCreateQuery & create, BlockIO InterpreterCreateQuery::doCreateOrReplaceTable(ASTCreateQuery & create, const InterpreterCreateQuery::TableProperties & properties) { + /// Replicated database 
requires separate contexts for each DDL query + ContextPtr current_context = getContext(); + ContextMutablePtr create_context = Context::createCopy(current_context); + create_context->setQueryContext(std::const_pointer_cast(current_context)); + + auto make_drop_context = [&](bool on_error) -> ContextMutablePtr + { + ContextMutablePtr drop_context = Context::createCopy(current_context); + drop_context->makeQueryContext(); + if (on_error) + return drop_context; + + if (auto txn = current_context->getZooKeeperMetadataTransaction()) + { + /// Execute drop as separate query, because [CREATE OR] REPLACE query can be considered as + /// successfully executed after RENAME/EXCHANGE query. + drop_context->resetZooKeeperMetadataTransaction(); + auto drop_txn = std::make_shared(txn->getZooKeeper(), txn->getDatabaseZooKeeperPath(), + txn->isInitialQuery(), txn->getTaskZooKeeperPath()); + drop_context->initZooKeeperMetadataTransaction(drop_txn); + } + return drop_context; + }; + auto ast_drop = std::make_shared(); String table_to_replace_name = create.table; - bool created = false; - bool replaced = false; - try { - [[maybe_unused]] bool done = doCreateTable(create, properties); - assert(done); + auto database = DatabaseCatalog::instance().getDatabase(create.database); + if (database->getUUID() == UUIDHelpers::Nil) + throw Exception(ErrorCodes::INCORRECT_QUERY, + "{} query is supported only for Atomic databases", + create.create_or_replace ? "CREATE OR REPLACE TABLE" : "REPLACE TABLE"); + + + UInt64 name_hash = sipHash64(create.database + create.table); + UInt16 random_suffix = thread_local_rng(); + if (auto txn = current_context->getZooKeeperMetadataTransaction()) + { + /// Avoid different table name on database replicas + random_suffix = sipHash64(txn->getTaskZooKeeperPath()); + } + create.table = fmt::format("_tmp_replace_{}_{}", + getHexUIntLowercase(name_hash), + getHexUIntLowercase(random_suffix)); + ast_drop->table = create.table; ast_drop->is_dictionary = create.is_dictionary; ast_drop->database = create.database; ast_drop->kind = ASTDropQuery::Drop; - created = true; - if (!create.replace_table) - return fillTableIfNeeded(create); + } + bool created = false; + bool renamed = false; + try + { + /// Create temporary table (random name will be generated) + [[maybe_unused]] bool done = InterpreterCreateQuery(query_ptr, create_context).doCreateTable(create, properties); + assert(done); + created = true; + + /// Try fill temporary table + BlockIO fill_io = fillTableIfNeeded(create); + executeTrivialBlockIO(fill_io, getContext()); + + /// Replace target table with created one auto ast_rename = std::make_shared(); ASTRenameQuery::Element elem { @@ -1134,22 +1156,44 @@ BlockIO InterpreterCreateQuery::doCreateOrReplaceTable(ASTCreateQuery & create, }; ast_rename->elements.push_back(std::move(elem)); - ast_rename->exchange = true; ast_rename->dictionary = create.is_dictionary; + if (create.create_or_replace) + { + /// CREATE OR REPLACE TABLE + /// Will execute ordinary RENAME instead of EXCHANGE if the target table does not exist + ast_rename->rename_if_cannot_exchange = true; + ast_rename->exchange = false; + } + else + { + /// REPLACE TABLE + /// Will execute EXCHANGE query and fail if the target table does not exist + ast_rename->exchange = true; + } - InterpreterRenameQuery(ast_rename, getContext()).execute(); - replaced = true; + InterpreterRenameQuery interpreter_rename{ast_rename, current_context}; + interpreter_rename.execute(); + renamed = true; - InterpreterDropQuery(ast_drop, 
getContext()).execute(); + if (!interpreter_rename.renamedInsteadOfExchange()) + { + /// Target table was replaced with new one, drop old table + auto drop_context = make_drop_context(false); + InterpreterDropQuery(ast_drop, drop_context).execute(); + } create.table = table_to_replace_name; - return fillTableIfNeeded(create); + return {}; } catch (...) { - if (created && create.replace_table && !replaced) - InterpreterDropQuery(ast_drop, getContext()).execute(); + /// Drop temporary table if it was successfully created, but was not renamed to target name + if (created && !renamed) + { + auto drop_context = make_drop_context(true); + InterpreterDropQuery(ast_drop, drop_context).execute(); + } throw; } } diff --git a/src/Interpreters/InterpreterCreateUserQuery.cpp b/src/Interpreters/InterpreterCreateUserQuery.cpp index 7f4969ff9ef..6f963a3b338 100644 --- a/src/Interpreters/InterpreterCreateUserQuery.cpp +++ b/src/Interpreters/InterpreterCreateUserQuery.cpp @@ -5,6 +5,7 @@ #include #include #include +#include #include #include #include @@ -59,6 +60,9 @@ namespace else if (query.default_roles) set_default_roles(*query.default_roles); + if (query.default_database) + user.default_database = query.default_database->database_name; + if (override_settings) user.settings = *override_settings; else if (query.settings) diff --git a/src/Interpreters/InterpreterDropQuery.cpp b/src/Interpreters/InterpreterDropQuery.cpp index 94d5fbf3ea7..0e15c6be27c 100644 --- a/src/Interpreters/InterpreterDropQuery.cpp +++ b/src/Interpreters/InterpreterDropQuery.cpp @@ -17,7 +17,7 @@ #endif #if USE_MYSQL -# include +# include #endif #if USE_LIBPQXX @@ -133,7 +133,7 @@ BlockIO InterpreterDropQuery::executeToTableImpl(ASTDropQuery & query, DatabaseP /// Prevents recursive drop from drop database query. The original query must specify a table. 
bool is_drop_or_detach_database = query_ptr->as()->table.empty(); bool is_replicated_ddl_query = typeid_cast(database.get()) && - getContext()->getClientInfo().query_kind != ClientInfo::QueryKind::SECONDARY_QUERY && + !getContext()->getClientInfo().is_replicated_database_internal && !is_drop_or_detach_database; AccessFlags drop_storage; @@ -315,7 +315,7 @@ BlockIO InterpreterDropQuery::executeToDatabaseImpl(const ASTDropQuery & query, throw Exception("DETACH PERMANENTLY is not implemented for databases", ErrorCodes::NOT_IMPLEMENTED); #if USE_MYSQL - if (database->getEngineName() == "MaterializeMySQL") + if (database->getEngineName() == "MaterializedMySQL") stopDatabaseSynchronization(database); #endif if (auto * replicated = typeid_cast(database.get())) @@ -335,7 +335,7 @@ BlockIO InterpreterDropQuery::executeToDatabaseImpl(const ASTDropQuery & query, /// Flush should not be done if shouldBeEmptyOnDetach() == false, /// since in this case getTablesIterator() may do some additional work, - /// see DatabaseMaterializeMySQL<>::getTablesIterator() + /// see DatabaseMaterializedMySQL<>::getTablesIterator() for (auto iterator = database->getTablesIterator(getContext()); iterator->isValid(); iterator->next()) { iterator->table()->flush(); @@ -426,6 +426,7 @@ void InterpreterDropQuery::executeDropQuery(ASTDropQuery::Kind kind, ContextPtr if (auto txn = current_context->getZooKeeperMetadataTransaction()) { /// For Replicated database + drop_context->getClientInfo().is_replicated_database_internal = true; drop_context->setQueryContext(std::const_pointer_cast(current_context)); drop_context->initZooKeeperMetadataTransaction(txn, true); } diff --git a/src/Interpreters/InterpreterExplainQuery.cpp b/src/Interpreters/InterpreterExplainQuery.cpp index b4a91170bc4..37650f5caa7 100644 --- a/src/Interpreters/InterpreterExplainQuery.cpp +++ b/src/Interpreters/InterpreterExplainQuery.cpp @@ -78,17 +78,35 @@ BlockIO InterpreterExplainQuery::execute() } -Block InterpreterExplainQuery::getSampleBlock() +Block InterpreterExplainQuery::getSampleBlock(const ASTExplainQuery::ExplainKind kind) { - Block block; - - ColumnWithTypeAndName col; - col.name = "explain"; - col.type = std::make_shared(); - col.column = col.type->createColumn(); - block.insert(col); - - return block; + if (kind == ASTExplainQuery::ExplainKind::QueryEstimates) + { + auto cols = NamesAndTypes{ + {"database", std::make_shared()}, + {"table", std::make_shared()}, + {"parts", std::make_shared()}, + {"rows", std::make_shared()}, + {"marks", std::make_shared()}, + }; + return Block({ + {cols[0].type->createColumn(), cols[0].type, cols[0].name}, + {cols[1].type->createColumn(), cols[1].type, cols[1].name}, + {cols[2].type->createColumn(), cols[2].type, cols[2].name}, + {cols[3].type->createColumn(), cols[3].type, cols[3].name}, + {cols[4].type->createColumn(), cols[4].type, cols[4].name}, + }); + } + else + { + Block res; + ColumnWithTypeAndName col; + col.name = "explain"; + col.type = std::make_shared(); + col.column = col.type->createColumn(); + res.insert(col); + return res; + } } /// Split str by line feed and write as separate row to ColumnString. 
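> The `getSampleBlock(kind)` change above makes the result header of EXPLAIN depend on the explain kind: `EXPLAIN ESTIMATE` returns `database`, `table`, `parts`, `rows`, `marks`, while the other kinds keep the single `explain` column. A rough sketch of that kind-dependent header selection, with plain standard-library types standing in for the real `Block`/`DataType` classes:

```cpp
#include <iostream>
#include <string>
#include <utility>
#include <vector>

enum class ExplainKind { ParsedAST, AnalyzedSyntax, QueryPlan, QueryPipeline, QueryEstimates };

/// Column (name, type) pairs; a stand-in for the real sample Block.
using Header = std::vector<std::pair<std::string, std::string>>;

Header sampleHeader(ExplainKind kind)
{
    if (kind == ExplainKind::QueryEstimates)
        return {{"database", "String"}, {"table", "String"},
                {"parts", "UInt64"}, {"rows", "UInt64"}, {"marks", "UInt64"}};
    return {{"explain", "String"}};
}

int main()
{
    for (const auto & [name, type] : sampleHeader(ExplainKind::QueryEstimates))
        std::cout << name << ' ' << type << '\n';
}
```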
@@ -223,9 +241,9 @@ ExplainSettings checkAndGetSettings(const ASTPtr & ast_settings) BlockInputStreamPtr InterpreterExplainQuery::executeImpl() { - const auto & ast = query->as(); + const auto & ast = query->as(); - Block sample_block = getSampleBlock(); + Block sample_block = getSampleBlock(ast.getKind()); MutableColumns res_columns = sample_block.cloneEmptyColumns(); WriteBufferFromOwnString buf; @@ -313,11 +331,32 @@ BlockInputStreamPtr InterpreterExplainQuery::executeImpl() plan.explainPipeline(buf, settings.query_pipeline_options); } } + else if (ast.getKind() == ASTExplainQuery::QueryEstimates) + { + if (!dynamic_cast(ast.getExplainedQuery().get())) + throw Exception("Only SELECT is supported for EXPLAIN ESTIMATE query", ErrorCodes::INCORRECT_QUERY); - if (single_line) - res_columns[0]->insertData(buf.str().data(), buf.str().size()); - else - fillColumn(*res_columns[0], buf.str()); + auto settings = checkAndGetSettings(ast.getSettings()); + QueryPlan plan; + + InterpreterSelectWithUnionQuery interpreter(ast.getExplainedQuery(), getContext(), SelectQueryOptions()); + interpreter.buildQueryPlan(plan); + // collect the selected marks, rows, parts during build query pipeline. + plan.buildQueryPipeline( + QueryPlanOptimizationSettings::fromContext(getContext()), + BuildQueryPipelineSettings::fromContext(getContext())); + + if (settings.optimize) + plan.optimize(QueryPlanOptimizationSettings::fromContext(getContext())); + plan.explainEstimate(res_columns); + } + if (ast.getKind() != ASTExplainQuery::QueryEstimates) + { + if (single_line) + res_columns[0]->insertData(buf.str().data(), buf.str().size()); + else + fillColumn(*res_columns[0], buf.str()); + } return std::make_shared(sample_block.cloneWithColumns(std::move(res_columns))); } diff --git a/src/Interpreters/InterpreterExplainQuery.h b/src/Interpreters/InterpreterExplainQuery.h index f16b1a8f69d..a7f54a10e3e 100644 --- a/src/Interpreters/InterpreterExplainQuery.h +++ b/src/Interpreters/InterpreterExplainQuery.h @@ -2,7 +2,7 @@ #include #include - +#include namespace DB { @@ -15,7 +15,7 @@ public: BlockIO execute() override; - static Block getSampleBlock(); + static Block getSampleBlock(const ASTExplainQuery::ExplainKind kind); private: ASTPtr query; diff --git a/src/Interpreters/InterpreterFactory.cpp b/src/Interpreters/InterpreterFactory.cpp index 79cda364c42..9e6061c2525 100644 --- a/src/Interpreters/InterpreterFactory.cpp +++ b/src/Interpreters/InterpreterFactory.cpp @@ -70,7 +70,7 @@ #include -#include +#include #include #include #include diff --git a/src/Interpreters/InterpreterGrantQuery.cpp b/src/Interpreters/InterpreterGrantQuery.cpp index 7487ca79bde..42c440b4c52 100644 --- a/src/Interpreters/InterpreterGrantQuery.cpp +++ b/src/Interpreters/InterpreterGrantQuery.cpp @@ -28,6 +28,15 @@ namespace const ASTGrantQuery & query, const std::vector & roles_to_grant_or_revoke) { + if (!query.is_revoke) + { + if (query.replace_access) + grantee.access = {}; + if (query.replace_granted_roles) + grantee.granted_roles = {}; + } + + if (!query.access_rights_elements.empty()) { if (query.is_revoke) @@ -93,24 +102,28 @@ namespace const AccessControlManager & access_control, const ContextAccess & access, const ASTGrantQuery & query, - const std::vector & grantees_from_query) + const std::vector & grantees_from_query, + bool & need_check_grantees_are_allowed) { const auto & elements = query.access_rights_elements; + need_check_grantees_are_allowed = true; if (elements.empty()) + { + /// No access rights to grant or revoke. 
+ need_check_grantees_are_allowed = false; return; + } - /// To execute the command GRANT the current user needs to have the access granted - /// with GRANT OPTION. if (!query.is_revoke) { + /// To execute the command GRANT the current user needs to have the access granted with GRANT OPTION. access.checkGrantOption(elements); - checkGranteesAreAllowed(access_control, access, grantees_from_query); return; } if (access.hasGrantOption(elements)) { - checkGranteesAreAllowed(access_control, access, grantees_from_query); + /// Simple case: the current user has the grant option for all the access rights specified for REVOKE. return; } @@ -137,6 +150,7 @@ namespace all_granted_access.makeUnion(user->access); } } + need_check_grantees_are_allowed = false; /// already checked AccessRights required_access; if (elements[0].is_partial_revoke) @@ -158,21 +172,28 @@ namespace } } - std::vector getRoleIDsAndCheckAdminOption( const AccessControlManager & access_control, const ContextAccess & access, const ASTGrantQuery & query, const RolesOrUsersSet & roles_from_query, - const std::vector & grantees_from_query) + const std::vector & grantees_from_query, + bool & need_check_grantees_are_allowed) { - std::vector matching_ids; + need_check_grantees_are_allowed = true; + if (roles_from_query.empty()) + { + /// No roles to grant or revoke. + need_check_grantees_are_allowed = false; + return {}; + } + std::vector matching_ids; if (!query.is_revoke) { + /// To execute the command GRANT the current user needs to have the roles granted with ADMIN OPTION. matching_ids = roles_from_query.getMatchingIDs(access_control); access.checkAdminOption(matching_ids); - checkGranteesAreAllowed(access_control, access, grantees_from_query); return matching_ids; } @@ -181,7 +202,7 @@ namespace matching_ids = roles_from_query.getMatchingIDs(); if (access.hasAdminOption(matching_ids)) { - checkGranteesAreAllowed(access_control, access, grantees_from_query); + /// Simple case: the current user has the admin option for all the roles specified for REVOKE. return matching_ids; } } @@ -209,6 +230,7 @@ namespace all_granted_roles.makeUnion(user->granted_roles); } } + need_check_grantees_are_allowed = false; /// already checked const auto & all_granted_roles_set = query.admin_option ? 
all_granted_roles.getGrantedWithAdminOption() : all_granted_roles.getGranted(); if (roles_from_query.all) @@ -218,6 +240,33 @@ namespace access.checkAdminOption(matching_ids); return matching_ids; } + + void checkGrantOptionAndGrantees( + const AccessControlManager & access_control, + const ContextAccess & access, + const ASTGrantQuery & query, + const std::vector & grantees_from_query) + { + bool need_check_grantees_are_allowed = true; + checkGrantOption(access_control, access, query, grantees_from_query, need_check_grantees_are_allowed); + if (need_check_grantees_are_allowed) + checkGranteesAreAllowed(access_control, access, grantees_from_query); + } + + std::vector getRoleIDsAndCheckAdminOptionAndGrantees( + const AccessControlManager & access_control, + const ContextAccess & access, + const ASTGrantQuery & query, + const RolesOrUsersSet & roles_from_query, + const std::vector & grantees_from_query) + { + bool need_check_grantees_are_allowed = true; + auto role_ids = getRoleIDsAndCheckAdminOption( + access_control, access, query, roles_from_query, grantees_from_query, need_check_grantees_are_allowed); + if (need_check_grantees_are_allowed) + checkGranteesAreAllowed(access_control, access, grantees_from_query); + return role_ids; + } } @@ -243,7 +292,7 @@ BlockIO InterpreterGrantQuery::execute() /// Check if the current user has corresponding roles granted with admin option. std::vector roles; if (roles_set) - roles = getRoleIDsAndCheckAdminOption(access_control, *getContext()->getAccess(), query, *roles_set, grantees); + roles = getRoleIDsAndCheckAdminOptionAndGrantees(access_control, *getContext()->getAccess(), query, *roles_set, grantees); if (!query.cluster.empty()) { @@ -258,7 +307,7 @@ BlockIO InterpreterGrantQuery::execute() /// Check if the current user has corresponding access rights with grant option. if (!query.access_rights_elements.empty()) - checkGrantOption(access_control, *getContext()->getAccess(), query, grantees); + checkGrantOptionAndGrantees(access_control, *getContext()->getAccess(), query, grantees); /// Update roles and users listed in `grantees`. auto update_func = [&](const AccessEntityPtr & entity) -> AccessEntityPtr diff --git a/src/Interpreters/InterpreterInsertQuery.cpp b/src/Interpreters/InterpreterInsertQuery.cpp index 4d9e293d762..e5d4d952a0c 100644 --- a/src/Interpreters/InterpreterInsertQuery.cpp +++ b/src/Interpreters/InterpreterInsertQuery.cpp @@ -4,11 +4,11 @@ #include #include #include -#include -#include +#include #include #include #include +#include #include #include #include @@ -272,7 +272,7 @@ BlockIO InterpreterInsertQuery::execute() /// NOTE: we explicitly ignore bound materialized views when inserting into Kafka Storage. /// Otherwise we'll get duplicates when MV reads same rows again from Kafka. 
if (table->noPushingToViews() && !no_destination) - out = table->write(query_ptr, metadata_snapshot, getContext()); + out = std::make_shared(table->write(query_ptr, metadata_snapshot, getContext())); else out = std::make_shared(table, metadata_snapshot, getContext(), query_ptr, no_destination); @@ -351,9 +351,13 @@ BlockIO InterpreterInsertQuery::execute() } else if (query.data && !query.has_tail) /// can execute without additional data { - // res.out = std::move(out_streams.at(0)); - res.in = std::make_shared(query_ptr, nullptr, query_sample_block, getContext(), nullptr); - res.in = std::make_shared(res.in, out_streams.at(0)); + auto pipe = getSourceFromFromASTInsertQuery(query_ptr, nullptr, query_sample_block, getContext(), nullptr); + res.pipeline.init(std::move(pipe)); + res.pipeline.resize(1); + res.pipeline.setSinks([&](const Block &, Pipe::StreamType) + { + return std::make_shared(out_streams.at(0)); + }); } else res.out = std::move(out_streams.at(0)); diff --git a/src/Interpreters/InterpreterRenameQuery.cpp b/src/Interpreters/InterpreterRenameQuery.cpp index 515559ad903..e3d52487a52 100644 --- a/src/Interpreters/InterpreterRenameQuery.cpp +++ b/src/Interpreters/InterpreterRenameQuery.cpp @@ -72,16 +72,31 @@ BlockIO InterpreterRenameQuery::execute() BlockIO InterpreterRenameQuery::executeToTables(const ASTRenameQuery & rename, const RenameDescriptions & descriptions, TableGuards & ddl_guards) { + assert(!rename.rename_if_cannot_exchange || descriptions.size() == 1); + assert(!(rename.rename_if_cannot_exchange && rename.exchange)); auto & database_catalog = DatabaseCatalog::instance(); for (const auto & elem : descriptions) { - if (!rename.exchange) + bool exchange_tables; + if (rename.exchange) + { + exchange_tables = true; + } + else if (rename.rename_if_cannot_exchange) + { + exchange_tables = database_catalog.isTableExist(StorageID(elem.to_database_name, elem.to_table_name), getContext()); + renamed_instead_of_exchange = !exchange_tables; + } + else + { + exchange_tables = false; database_catalog.assertTableDoesntExist(StorageID(elem.to_database_name, elem.to_table_name), getContext()); + } DatabasePtr database = database_catalog.getDatabase(elem.from_database_name); if (typeid_cast(database.get()) - && getContext()->getClientInfo().query_kind != ClientInfo::QueryKind::SECONDARY_QUERY) + && !getContext()->getClientInfo().is_replicated_database_internal) { if (1 < descriptions.size()) throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Database {} is Replicated, " @@ -100,7 +115,7 @@ BlockIO InterpreterRenameQuery::executeToTables(const ASTRenameQuery & rename, c elem.from_table_name, *database_catalog.getDatabase(elem.to_database_name), elem.to_table_name, - rename.exchange, + exchange_tables, rename.dictionary); } } diff --git a/src/Interpreters/InterpreterRenameQuery.h b/src/Interpreters/InterpreterRenameQuery.h index 49fdd50f52d..dfcd741754e 100644 --- a/src/Interpreters/InterpreterRenameQuery.h +++ b/src/Interpreters/InterpreterRenameQuery.h @@ -55,6 +55,8 @@ public: BlockIO execute() override; void extendQueryLogElemImpl(QueryLogElement & elem, const ASTPtr & ast, ContextPtr) const override; + bool renamedInsteadOfExchange() const { return renamed_instead_of_exchange; } + private: BlockIO executeToTables(const ASTRenameQuery & rename, const RenameDescriptions & descriptions, TableGuards & ddl_guards); static BlockIO executeToDatabase(const ASTRenameQuery & rename, const RenameDescriptions & descriptions); @@ -62,6 +64,7 @@ private: AccessRightsElements getRequiredAccess() const; 
ASTPtr query_ptr; + bool renamed_instead_of_exchange{false}; }; } diff --git a/src/Interpreters/InterpreterSelectQuery.cpp b/src/Interpreters/InterpreterSelectQuery.cpp index 22314b0aab6..33f9deaf805 100644 --- a/src/Interpreters/InterpreterSelectQuery.cpp +++ b/src/Interpreters/InterpreterSelectQuery.cpp @@ -387,6 +387,10 @@ InterpreterSelectQuery::InterpreterSelectQuery( options, joined_tables.tablesWithColumns(), required_result_column_names, table_join); query_info.syntax_analyzer_result = syntax_analyzer_result; + context->setDistributed(syntax_analyzer_result->is_remote_storage); + + if (storage && !query.final() && storage->needRewriteQueryWithFinal(syntax_analyzer_result->requiredSourceColumns())) + query.setFinal(); /// Save scalar sub queries's results in the query context if (!options.only_analyze && context->hasQueryContext()) @@ -609,17 +613,17 @@ Block InterpreterSelectQuery::getSampleBlockImpl() query_info.query = query_ptr; query_info.has_window = query_analyzer->hasWindow(); - if (storage && !options.only_analyze) { - from_stage = storage->getQueryProcessingStage(context, options.to_stage, metadata_snapshot, query_info); - - /// TODO how can we make IN index work if we cache parts before selecting a projection? - /// XXX Used for IN set index analysis. Is this a proper way? - if (query_info.projection) - metadata_snapshot->selected_projection = query_info.projection->desc; + auto & query = getSelectQuery(); + query_analyzer->makeSetsForIndex(query.where()); + query_analyzer->makeSetsForIndex(query.prewhere()); + query_info.sets = query_analyzer->getPreparedSets(); } + if (storage && !options.only_analyze) + from_stage = storage->getQueryProcessingStage(context, options.to_stage, metadata_snapshot, query_info); + /// Do I need to perform the first part of the pipeline? /// Running on remote servers during distributed processing or if query is not distributed. /// @@ -1729,7 +1733,7 @@ void InterpreterSelectQuery::executeFetchColumns(QueryProcessingStage::Enum proc syntax_analyzer_result->optimize_trivial_count && (settings.max_parallel_replicas <= 1) && storage - && storage->getName() != "MaterializeMySQL" + && storage->getName() != "MaterializedMySQL" && !row_policy_filter && processing_stage == QueryProcessingStage::FetchColumns && query_analyzer->hasAggregation() @@ -1882,8 +1886,6 @@ void InterpreterSelectQuery::executeFetchColumns(QueryProcessingStage::Enum proc if (max_streams > 1 && !is_remote) max_streams *= settings.max_streams_to_max_threads_ratio; - // TODO figure out how to make set for projections - query_info.sets = query_analyzer->getPreparedSets(); auto & prewhere_info = analysis_result.prewhere_info; if (prewhere_info) @@ -1926,11 +1928,13 @@ void InterpreterSelectQuery::executeFetchColumns(QueryProcessingStage::Enum proc } } + /// If we don't have filtration, we can pushdown limit to reading stage for optimizations. + UInt64 limit = (query.hasFiltration() || query.groupBy()) ? 
0 : getLimitForSorting(query, context); if (query_info.projection) query_info.projection->input_order_info - = query_info.projection->order_optimizer->getInputOrder(query_info.projection->desc->metadata, context); + = query_info.projection->order_optimizer->getInputOrder(query_info.projection->desc->metadata, context, limit); else - query_info.input_order_info = query_info.order_optimizer->getInputOrder(metadata_snapshot, context); + query_info.input_order_info = query_info.order_optimizer->getInputOrder(metadata_snapshot, context, limit); } StreamLocalLimits limits; @@ -2288,8 +2292,14 @@ void InterpreterSelectQuery::executeOrderOptimized(QueryPlan & query_plan, Input { const Settings & settings = context->getSettingsRef(); + const auto & query = getSelectQuery(); auto finish_sorting_step = std::make_unique( - query_plan.getCurrentDataStream(), input_sorting_info->order_key_prefix_descr, output_order_descr, settings.max_block_size, limit); + query_plan.getCurrentDataStream(), + input_sorting_info->order_key_prefix_descr, + output_order_descr, + settings.max_block_size, + limit, + query.hasFiltration()); query_plan.addStep(std::move(finish_sorting_step)); } diff --git a/src/Interpreters/InterpreterShowCreateAccessEntityQuery.cpp b/src/Interpreters/InterpreterShowCreateAccessEntityQuery.cpp index fbe4855f0af..8115b4a63df 100644 --- a/src/Interpreters/InterpreterShowCreateAccessEntityQuery.cpp +++ b/src/Interpreters/InterpreterShowCreateAccessEntityQuery.cpp @@ -82,6 +82,13 @@ namespace query->grantees->use_keyword_any = true; } + if (!user.default_database.empty()) + { + auto ast = std::make_shared(); + ast->database_name = user.default_database; + query->default_database = ast; + } + return query; } diff --git a/src/Interpreters/InterpreterSystemQuery.cpp b/src/Interpreters/InterpreterSystemQuery.cpp index bdeb4a30e9e..e1ca021deeb 100644 --- a/src/Interpreters/InterpreterSystemQuery.cpp +++ b/src/Interpreters/InterpreterSystemQuery.cpp @@ -25,6 +25,7 @@ #include #include #include +#include #include #include #include @@ -416,7 +417,8 @@ BlockIO InterpreterSystemQuery::execute() [&] { if (auto text_log = getContext()->getTextLog()) text_log->flush(true); }, [&] { if (auto metric_log = getContext()->getMetricLog()) metric_log->flush(true); }, [&] { if (auto asynchronous_metric_log = getContext()->getAsynchronousMetricLog()) asynchronous_metric_log->flush(true); }, - [&] { if (auto opentelemetry_span_log = getContext()->getOpenTelemetrySpanLog()) opentelemetry_span_log->flush(true); } + [&] { if (auto opentelemetry_span_log = getContext()->getOpenTelemetrySpanLog()) opentelemetry_span_log->flush(true); }, + [&] { if (auto zookeeper_log = getContext()->getZooKeeperLog()) zookeeper_log->flush(true); } ); break; } @@ -663,7 +665,7 @@ void InterpreterSystemQuery::syncReplica(ASTSystemQuery &) { LOG_ERROR(log, "SYNC REPLICA {}: Timed out!", table_id.getNameForLogs()); throw Exception( - "SYNC REPLICA " + table_id.getNameForLogs() + ": command timed out! " + "SYNC REPLICA " + table_id.getNameForLogs() + ": command timed out. " "See the 'receive_timeout' setting", ErrorCodes::TIMEOUT_EXCEEDED); } LOG_TRACE(log, "SYNC REPLICA {}: OK", table_id.getNameForLogs()); diff --git a/src/Interpreters/InterpreterWatchQuery.h b/src/Interpreters/InterpreterWatchQuery.h index 45b61a18b66..51eb4a00556 100644 --- a/src/Interpreters/InterpreterWatchQuery.h +++ b/src/Interpreters/InterpreterWatchQuery.h @@ -13,7 +13,6 @@ limitations under the License. 
*/ #include #include -#include #include #include #include diff --git a/src/Interpreters/JIT/compileFunction.cpp b/src/Interpreters/JIT/compileFunction.cpp index 18a2400d22f..03ef0500757 100644 --- a/src/Interpreters/JIT/compileFunction.cpp +++ b/src/Interpreters/JIT/compileFunction.cpp @@ -563,8 +563,10 @@ static void compileInsertAggregatesIntoResultColumns(llvm::Module & module, cons b.CreateRetVoid(); } -CompiledAggregateFunctions compileAggregateFunctons(CHJIT & jit, const std::vector & functions, std::string functions_dump_name) +CompiledAggregateFunctions compileAggregateFunctions(CHJIT & jit, const std::vector & functions, std::string functions_dump_name) { + Stopwatch watch; + std::string create_aggregate_states_functions_name = functions_dump_name + "_create"; std::string add_aggregate_states_functions_name = functions_dump_name + "_add"; std::string merge_aggregate_states_functions_name = functions_dump_name + "_merge"; @@ -588,6 +590,10 @@ CompiledAggregateFunctions compileAggregateFunctons(CHJIT & jit, const std::vect assert(merge_aggregate_states_function); assert(insert_aggregate_states_function); + ProfileEvents::increment(ProfileEvents::CompileExpressionsMicroseconds, watch.elapsedMicroseconds()); + ProfileEvents::increment(ProfileEvents::CompileExpressionsBytes, compiled_module.size); + ProfileEvents::increment(ProfileEvents::CompileFunction); + CompiledAggregateFunctions compiled_aggregate_functions { .create_aggregate_states_function = create_aggregate_states_function, diff --git a/src/Interpreters/JIT/compileFunction.h b/src/Interpreters/JIT/compileFunction.h index 5355227defe..1cf15e15201 100644 --- a/src/Interpreters/JIT/compileFunction.h +++ b/src/Interpreters/JIT/compileFunction.h @@ -78,7 +78,7 @@ struct CompiledAggregateFunctions * JITMergeAggregateStatesFunction will merge aggregate states for aggregate functions. * JITInsertAggregateStatesIntoColumnsFunction will insert aggregate states for aggregate functions into result columns. 
*/ -CompiledAggregateFunctions compileAggregateFunctons(CHJIT & jit, const std::vector & functions, std::string functions_dump_name); +CompiledAggregateFunctions compileAggregateFunctions(CHJIT & jit, const std::vector & functions, std::string functions_dump_name); } diff --git a/src/Interpreters/JoinSwitcher.h b/src/Interpreters/JoinSwitcher.h index 75ff7bb9b2c..a89ac6d5d98 100644 --- a/src/Interpreters/JoinSwitcher.h +++ b/src/Interpreters/JoinSwitcher.h @@ -31,9 +31,9 @@ public: join->joinBlock(block, not_processed); } - bool hasTotals() const override + const Block & getTotals() const override { - return join->hasTotals(); + return join->getTotals(); } void setTotals(const Block & block) override @@ -41,11 +41,6 @@ public: join->setTotals(block); } - void joinTotals(Block & block) const override - { - join->joinTotals(block); - } - size_t getTotalRowCount() const override { return join->getTotalRowCount(); diff --git a/src/Interpreters/JoinToSubqueryTransformVisitor.cpp b/src/Interpreters/JoinToSubqueryTransformVisitor.cpp index 8595d3a4179..eabdeaefc04 100644 --- a/src/Interpreters/JoinToSubqueryTransformVisitor.cpp +++ b/src/Interpreters/JoinToSubqueryTransformVisitor.cpp @@ -551,8 +551,6 @@ std::vector normalizeColumnNamesExtractNeeded( else needed_columns[*table_pos].no_clashes.emplace(ident->shortName()); } - else if (!got_alias) - throw Exception("Unknown column name '" + ident->name() + "'", ErrorCodes::UNKNOWN_IDENTIFIER); } return needed_columns; diff --git a/src/Interpreters/Lemmatizers.cpp b/src/Interpreters/Lemmatizers.cpp new file mode 100644 index 00000000000..38cd4c33678 --- /dev/null +++ b/src/Interpreters/Lemmatizers.cpp @@ -0,0 +1,100 @@ + +#if !defined(ARCADIA_BUILD) +# include "config_core.h" +#endif + +#if USE_NLP + +#include +#include +#include + +#include +#include + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int UNKNOWN_ELEMENT_IN_CONFIG; + extern const int INVALID_CONFIG_PARAMETER; +} + + +class Lemmatizer : public ILemmatizer +{ +private: + RdrLemmatizer lemmatizer; + +public: + explicit Lemmatizer(const String & path) : lemmatizer(path.data()) {} + + TokenPtr lemmatize(const char * token) override + { + return TokenPtr(lemmatizer.Lemmatize(token)); + } +}; + +/// Duplicate of code from StringUtils.h. Copied here for less dependencies. +static bool startsWith(const std::string & s, const char * prefix) +{ + return s.size() >= strlen(prefix) && 0 == memcmp(s.data(), prefix, strlen(prefix)); +} + +Lemmatizers::Lemmatizers(const Poco::Util::AbstractConfiguration & config) +{ + String prefix = "lemmatizers"; + Poco::Util::AbstractConfiguration::Keys keys; + + if (!config.has(prefix)) + throw Exception(ErrorCodes::INVALID_CONFIG_PARAMETER, "No lemmatizers specified in server config on prefix '{}'", prefix); + + config.keys(prefix, keys); + + for (const auto & key : keys) + { + if (startsWith(key, "lemmatizer")) + { + const auto & lemm_name = config.getString(prefix + "." + key + ".lang", ""); + const auto & lemm_path = config.getString(prefix + "." + key + ".path", ""); + + if (lemm_name.empty()) + throw Exception("Lemmatizer language in config is not specified here: " + prefix + "." + key + ".lang", + ErrorCodes::INVALID_CONFIG_PARAMETER); + if (lemm_path.empty()) + throw Exception("Path to lemmatizer in config is not specified here: " + prefix + "." + key + ".path", + ErrorCodes::INVALID_CONFIG_PARAMETER); + + paths[lemm_name] = lemm_path; + } + else + throw Exception("Unknown element in config: " + prefix + "." 
+ key + ", must be 'lemmatizer'", + ErrorCodes::UNKNOWN_ELEMENT_IN_CONFIG); + } +} + +Lemmatizers::LemmPtr Lemmatizers::getLemmatizer(const String & name) +{ + std::lock_guard guard(mutex); + + if (lemmatizers.find(name) != lemmatizers.end()) + return lemmatizers[name]; + + if (paths.find(name) != paths.end()) + { + if (!std::filesystem::exists(paths[name])) + throw Exception("Incorrect path to lemmatizer: " + paths[name], + ErrorCodes::INVALID_CONFIG_PARAMETER); + + lemmatizers[name] = std::make_shared(paths[name]); + return lemmatizers[name]; + } + + throw Exception("Lemmatizer named: '" + name + "' is not found", + ErrorCodes::INVALID_CONFIG_PARAMETER); +} + +} + +#endif diff --git a/src/Interpreters/Lemmatizers.h b/src/Interpreters/Lemmatizers.h new file mode 100644 index 00000000000..6682afaa415 --- /dev/null +++ b/src/Interpreters/Lemmatizers.h @@ -0,0 +1,48 @@ +#pragma once + +#if !defined(ARCADIA_BUILD) +# include "config_core.h" +#endif + +#if USE_NLP + +#include +#include + +#include +#include + + +namespace DB +{ + +class ILemmatizer +{ +public: + using TokenPtr = std::shared_ptr; + + virtual TokenPtr lemmatize(const char * token) = 0; + + virtual ~ILemmatizer() = default; +}; + + +class Lemmatizers +{ +public: + using LemmPtr = std::shared_ptr; + +private: + std::mutex mutex; + std::unordered_map lemmatizers; + std::unordered_map paths; + +public: + explicit Lemmatizers(const Poco::Util::AbstractConfiguration & config); + + LemmPtr getLemmatizer(const String & name); +}; + +} + +#endif diff --git a/src/Interpreters/MergeJoin.cpp b/src/Interpreters/MergeJoin.cpp index 26463c8c6ed..b93a94b4215 100644 --- a/src/Interpreters/MergeJoin.cpp +++ b/src/Interpreters/MergeJoin.cpp @@ -1,19 +1,20 @@ #include +#include #include #include -#include +#include +#include +#include +#include #include #include -#include #include -#include -#include -#include +#include +#include #include #include #include -#include namespace DB @@ -23,12 +24,50 @@ namespace ErrorCodes { extern const int NOT_IMPLEMENTED; extern const int PARAMETER_OUT_OF_BOUND; + extern const int ILLEGAL_COLUMN; extern const int LOGICAL_ERROR; } namespace { +String deriveTempName(const String & name) +{ + return "--" + name; +} + +/* + * Convert column with conditions for left or right table to join to joining key. + * Input column type is UInt8 output is Nullable(UInt8). 
+ * 0 converted to NULL and such rows won't be joined, + * 1 converted to 0 (any constant non-NULL value to join) + */ +ColumnWithTypeAndName condtitionColumnToJoinable(const Block & block, const String & src_column_name) +{ + size_t res_size = block.rows(); + auto data_col = ColumnUInt8::create(res_size, 0); + auto null_map = ColumnUInt8::create(res_size, 0); + + if (!src_column_name.empty()) + { + auto mask_col = JoinCommon::getColumnAsMask(block, src_column_name); + assert(mask_col); + const auto & mask_data = assert_cast(*mask_col).getData(); + + for (size_t i = 0; i < res_size; ++i) + null_map->getData()[i] = !mask_data[i]; + } + + ColumnPtr res_col = ColumnNullable::create(std::move(data_col), std::move(null_map)); + DataTypePtr res_col_type = std::make_shared(std::make_shared()); + String res_name = deriveTempName(src_column_name); + + if (block.has(res_name)) + throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Conflicting column name '{}'", res_name); + + return {res_col, res_col_type, res_name}; +} + template int nullableCompareAt(const IColumn & left_column, const IColumn & right_column, size_t lhs_pos, size_t rhs_pos) { @@ -180,7 +219,7 @@ class MergeJoinCursor { public: MergeJoinCursor(const Block & block, const SortDescription & desc_) - : impl(SortCursorImpl(block, desc_)) + : impl(block, desc_) { /// SortCursorImpl can work with permutation, but MergeJoinCursor can't. if (impl.permutation) @@ -320,14 +359,17 @@ MutableColumns makeMutableColumns(const Block & block, size_t rows_to_reserve = void makeSortAndMerge(const Names & keys, SortDescription & sort, SortDescription & merge) { NameSet unique_keys; + for (const auto & sd: merge) + unique_keys.insert(sd.column_name); + for (const auto & key_name : keys) { - merge.emplace_back(SortColumnDescription(key_name, 1, 1)); + merge.emplace_back(key_name); - if (!unique_keys.count(key_name)) + if (!unique_keys.contains(key_name)) { unique_keys.insert(key_name); - sort.emplace_back(SortColumnDescription(key_name, 1, 1)); + sort.emplace_back(key_name); } } } @@ -464,15 +506,31 @@ MergeJoin::MergeJoin(std::shared_ptr table_join_, const Block & right ErrorCodes::PARAMETER_OUT_OF_BOUND); } - for (const auto & right_key : table_join->keyNamesRight()) + std::tie(mask_column_name_left, mask_column_name_right) = table_join->joinConditionColumnNames(); + + /// Add auxiliary joining keys to join only rows where conditions from JOIN ON sections holds + /// Input boolean column converted to nullable and only rows with non NULLS value will be joined + if (!mask_column_name_left.empty() || !mask_column_name_right.empty()) + { + JoinCommon::checkTypesOfMasks({}, "", right_sample_block, mask_column_name_right); + + key_names_left.push_back(deriveTempName(mask_column_name_left)); + key_names_right.push_back(deriveTempName(mask_column_name_right)); + } + + key_names_left.insert(key_names_left.end(), table_join->keyNamesLeft().begin(), table_join->keyNamesLeft().end()); + key_names_right.insert(key_names_right.end(), table_join->keyNamesRight().begin(), table_join->keyNamesRight().end()); + + addConditionJoinColumn(right_sample_block, JoinTableSide::Right); + JoinCommon::splitAdditionalColumns(key_names_right, right_sample_block, right_table_keys, right_columns_to_add); + + for (const auto & right_key : key_names_right) { if (right_sample_block.getByName(right_key).type->lowCardinality()) lowcard_right_keys.push_back(right_key); } - - table_join->splitAdditionalColumns(right_sample_block, right_table_keys, right_columns_to_add); 
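> The condition-to-joinable conversion described above turns a boolean JOIN ON mask into an auxiliary nullable key: rows where the condition is 0 become NULL (and therefore never match), rows where it is 1 become the constant 0 (any non-NULL constant would do). A minimal sketch of that conversion with plain vectors standing in for the real column classes (the (data, null_map) pair mimics how a Nullable column is represented):

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

/// A nullable UInt8 column modelled as (data, null_map): null_map[i] == 1 means NULL.
struct NullableUInt8Column
{
    std::vector<uint8_t> data;
    std::vector<uint8_t> null_map;
};

/// mask == 0  ->  NULL (row is excluded from the join);  mask == 1  ->  constant 0 (joinable).
NullableUInt8Column conditionColumnToJoinable(const std::vector<uint8_t> & mask)
{
    NullableUInt8Column res;
    res.data.assign(mask.size(), 0);
    res.null_map.resize(mask.size());
    for (size_t i = 0; i < mask.size(); ++i)
        res.null_map[i] = !mask[i];
    return res;
}

int main()
{
    auto col = conditionColumnToJoinable({1, 0, 1});
    for (size_t i = 0; i < col.data.size(); ++i)
        std::cout << (col.null_map[i] ? "NULL" : "0") << '\n';   /// prints 0, NULL, 0
}
```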
JoinCommon::removeLowCardinalityInplace(right_table_keys); - JoinCommon::removeLowCardinalityInplace(right_sample_block, table_join->keyNamesRight()); + JoinCommon::removeLowCardinalityInplace(right_sample_block, key_names_right); const NameSet required_right_keys = table_join->requiredRightKeys(); for (const auto & column : right_table_keys) @@ -484,8 +542,8 @@ MergeJoin::MergeJoin(std::shared_ptr table_join_, const Block & right if (nullable_right_side) JoinCommon::convertColumnsToNullable(right_columns_to_add); - makeSortAndMerge(table_join->keyNamesLeft(), left_sort_description, left_merge_description); - makeSortAndMerge(table_join->keyNamesRight(), right_sort_description, right_merge_description); + makeSortAndMerge(key_names_left, left_sort_description, left_merge_description); + makeSortAndMerge(key_names_right, right_sort_description, right_merge_description); /// Temporary disable 'partial_merge_join_left_table_buffer_bytes' without 'partial_merge_join_optimizations' if (table_join->enablePartialMergeJoinOptimizations()) @@ -503,11 +561,6 @@ void MergeJoin::setTotals(const Block & totals_block) used_rows_bitmap = std::make_shared(getRightBlocksCount()); } -void MergeJoin::joinTotals(Block & block) const -{ - JoinCommon::joinTotals(totals, right_columns_to_add, *table_join, block); -} - void MergeJoin::mergeRightBlocks() { if (is_in_memory) @@ -523,15 +576,15 @@ void MergeJoin::mergeInMemoryRightBlocks() if (right_blocks.empty()) return; - auto stream = std::make_shared(std::move(right_blocks.blocks)); - Pipe source(std::make_shared(std::move(stream))); + Pipe source(std::make_shared(std::move(right_blocks.blocks))); right_blocks.clear(); QueryPipeline pipeline; pipeline.init(std::move(source)); /// TODO: there should be no split keys by blocks for RIGHT|FULL JOIN - pipeline.addTransform(std::make_shared(pipeline.getHeader(), right_sort_description, max_rows_in_right_block, 0, 0, 0, 0, nullptr, 0)); + pipeline.addTransform(std::make_shared( + pipeline.getHeader(), right_sort_description, max_rows_in_right_block, 0, 0, 0, 0, nullptr, 0)); auto sorted_input = PipelineExecutingBlockInputStream(std::move(pipeline)); @@ -607,6 +660,7 @@ bool MergeJoin::addJoinedBlock(const Block & src_block, bool) { Block block = modifyRightBlock(src_block); + addConditionJoinColumn(block, JoinTableSide::Right); sortBlock(block, right_sort_description); return saveRightBlock(std::move(block)); } @@ -616,16 +670,22 @@ void MergeJoin::joinBlock(Block & block, ExtraBlockPtr & not_processed) Names lowcard_keys = lowcard_right_keys; if (block) { - JoinCommon::checkTypesOfKeys(block, table_join->keyNamesLeft(), right_table_keys, table_join->keyNamesRight()); + JoinCommon::checkTypesOfMasks(block, mask_column_name_left, right_sample_block, mask_column_name_right); + + /// Add auxiliary column, will be removed after joining + addConditionJoinColumn(block, JoinTableSide::Left); + + JoinCommon::checkTypesOfKeys(block, key_names_left, right_table_keys, key_names_right); + materializeBlockInplace(block); - for (const auto & column_name : table_join->keyNamesLeft()) + for (const auto & column_name : key_names_left) { if (block.getByName(column_name).type->lowCardinality()) lowcard_keys.push_back(column_name); } - JoinCommon::removeLowCardinalityInplace(block, table_join->keyNamesLeft(), false); + JoinCommon::removeLowCardinalityInplace(block, key_names_left, false); sortBlock(block, left_sort_description); @@ -660,6 +720,9 @@ void MergeJoin::joinBlock(Block & block, ExtraBlockPtr & not_processed) if (!not_processed 
&& left_blocks_buffer) not_processed = std::make_shared(NotProcessed{{}, 0, 0, 0}); + if (needConditionJoinColumn()) + block.erase(deriveTempName(mask_column_name_left)); + for (const auto & column_name : lowcard_keys) { if (!block.has(column_name)) @@ -702,7 +765,7 @@ void MergeJoin::joinSortedBlock(Block & block, ExtraBlockPtr & not_processed) if (skip_not_intersected) { - int intersection = left_cursor.intersect(min_max_right_blocks[i], table_join->keyNamesRight()); + int intersection = left_cursor.intersect(min_max_right_blocks[i], key_names_right); if (intersection < 0) break; /// (left) ... (right) if (intersection > 0) @@ -735,7 +798,7 @@ void MergeJoin::joinSortedBlock(Block & block, ExtraBlockPtr & not_processed) if (skip_not_intersected) { - int intersection = left_cursor.intersect(min_max_right_blocks[i], table_join->keyNamesRight()); + int intersection = left_cursor.intersect(min_max_right_blocks[i], key_names_right); if (intersection < 0) break; /// (left) ... (right) if (intersection > 0) @@ -836,7 +899,7 @@ bool MergeJoin::leftJoin(MergeJoinCursor & left_cursor, const Block & left_block } bool MergeJoin::allInnerJoin(MergeJoinCursor & left_cursor, const Block & left_block, RightBlockInfo & right_block_info, - MutableColumns & left_columns, MutableColumns & right_columns, size_t & left_key_tail) + MutableColumns & left_columns, MutableColumns & right_columns, size_t & left_key_tail) { const Block & right_block = *right_block_info.block; MergeJoinCursor right_cursor(right_block, right_merge_description); @@ -975,11 +1038,15 @@ void MergeJoin::initRightTableWriter() class NonMergeJoinedBlockInputStream : private NotJoined, public IBlockInputStream { public: - NonMergeJoinedBlockInputStream(const MergeJoin & parent_, const Block & result_sample_block_, UInt64 max_block_size_) + NonMergeJoinedBlockInputStream(const MergeJoin & parent_, + const Block & result_sample_block_, + const Names & key_names_right_, + UInt64 max_block_size_) : NotJoined(*parent_.table_join, parent_.modifyRightBlock(parent_.right_sample_block), parent_.right_sample_block, - result_sample_block_) + result_sample_block_, + {}, key_names_right_) , parent(parent_) , max_block_size(max_block_size_) {} @@ -1053,7 +1120,10 @@ private: } if (rows_added >= max_block_size) + { + ++block_number; break; + } } return rows_added; @@ -1064,10 +1134,26 @@ private: BlockInputStreamPtr MergeJoin::createStreamWithNonJoinedRows(const Block & result_sample_block, UInt64 max_block_size) const { if (table_join->strictness() == ASTTableJoin::Strictness::All && (is_right || is_full)) - return std::make_shared(*this, result_sample_block, max_block_size); + return std::make_shared(*this, result_sample_block, key_names_right, max_block_size); return {}; } +bool MergeJoin::needConditionJoinColumn() const +{ + return !mask_column_name_left.empty() || !mask_column_name_right.empty(); +} + +void MergeJoin::addConditionJoinColumn(Block & block, JoinTableSide block_side) const +{ + if (needConditionJoinColumn()) + { + if (block_side == JoinTableSide::Left) + block.insert(condtitionColumnToJoinable(block, mask_column_name_left)); + else + block.insert(condtitionColumnToJoinable(block, mask_column_name_right)); + } +} + MergeJoin::RightBlockInfo::RightBlockInfo(std::shared_ptr block_, size_t block_number_, size_t & skip_, RowBitmaps * bitmaps_) : block(block_) diff --git a/src/Interpreters/MergeJoin.h b/src/Interpreters/MergeJoin.h index b6bde8fb131..844c730de4f 100644 --- a/src/Interpreters/MergeJoin.h +++ b/src/Interpreters/MergeJoin.h @@ 
-16,7 +16,7 @@ class TableJoin; class MergeJoinCursor; struct MergeJoinEqualRange; class RowBitmaps; - +enum class JoinTableSide; class MergeJoin : public IJoin { @@ -26,11 +26,14 @@ public: const TableJoin & getTableJoin() const override { return *table_join; } bool addJoinedBlock(const Block & block, bool check_limits) override; void joinBlock(Block &, ExtraBlockPtr & not_processed) override; - void joinTotals(Block &) const override; + void setTotals(const Block &) override; - bool hasTotals() const override { return totals; } + const Block & getTotals() const override { return totals; } + size_t getTotalRowCount() const override { return right_blocks.row_count; } size_t getTotalByteCount() const override { return right_blocks.bytes; } + /// Has to be called only after setTotals()/mergeRightBlocks() + bool alwaysReturnsEmptySet() const override { return (is_right || is_inner) && min_max_right_blocks.empty(); } BlockInputStreamPtr createStreamWithNonJoinedRows(const Block & result_sample_block, UInt64 max_block_size) const override; @@ -78,6 +81,14 @@ private: Block right_columns_to_add; SortedBlocksWriter::Blocks right_blocks; + Names key_names_right; + Names key_names_left; + + /// Additional conditions for rows to join from JOIN ON section. + /// Only rows where conditions are met can be joined. + String mask_column_name_left; + String mask_column_name_right; + /// Each block stores first and last row from corresponding sorted block on disk Blocks min_max_right_blocks; std::shared_ptr left_blocks_buffer; @@ -150,6 +161,9 @@ private: void mergeFlushedRightBlocks(); void initRightTableWriter(); + + bool needConditionJoinColumn() const; + void addConditionJoinColumn(Block & block, JoinTableSide block_side) const; }; } diff --git a/src/Interpreters/MutationsInterpreter.cpp b/src/Interpreters/MutationsInterpreter.cpp index 03a2a4da1d1..fe0594bb58f 100644 --- a/src/Interpreters/MutationsInterpreter.cpp +++ b/src/Interpreters/MutationsInterpreter.cpp @@ -17,7 +17,7 @@ #include #include #include -#include +#include #include #include #include @@ -901,12 +901,18 @@ BlockInputStreamPtr MutationsInterpreter::execute() select_interpreter->buildQueryPlan(plan); auto pipeline = addStreamsForLaterStages(stages, plan); - BlockInputStreamPtr result_stream = std::make_shared(std::move(*pipeline)); /// Sometimes we update just part of columns (for example UPDATE mutation) /// in this case we don't read sorting key, so just we don't check anything. 
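> In the MutationsInterpreter hunk just below, the sort-order check moves from a wrapping input stream into a per-stream pipeline transform; either way the check itself is the same idea: verify ordering across block boundaries by remembering the last key of the previous block. A rough stand-alone sketch of that check, assuming integer keys stand in for the storage sorting key columns:

```cpp
#include <iostream>
#include <optional>
#include <stdexcept>
#include <vector>

/// Verifies that a stream of blocks is globally sorted by remembering the last key of the previous block.
struct CheckSorted
{
    std::optional<int> last_key;

    void checkBlock(const std::vector<int> & block)
    {
        for (int key : block)
        {
            if (last_key && key < *last_key)
                throw std::logic_error("Rows are not sorted");
            last_key = key;
        }
    }
};

int main()
{
    CheckSorted checker;
    checker.checkBlock({1, 2, 3});
    checker.checkBlock({3, 5, 8});
    std::cout << "sorted\n";
}
```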
- if (auto sort_desc = getStorageSortDescriptionIfPossible(result_stream->getHeader())) - result_stream = std::make_shared(result_stream, *sort_desc); + if (auto sort_desc = getStorageSortDescriptionIfPossible(pipeline->getHeader())) + { + pipeline->addSimpleTransform([&](const Block & header) + { + return std::make_shared(header, *sort_desc); + }); + } + + BlockInputStreamPtr result_stream = std::make_shared(std::move(*pipeline)); if (!updated_header) updated_header = std::make_unique(result_stream->getHeader()); diff --git a/src/Interpreters/MutationsInterpreter.h b/src/Interpreters/MutationsInterpreter.h index 65ad027118a..c9a589e6b6d 100644 --- a/src/Interpreters/MutationsInterpreter.h +++ b/src/Interpreters/MutationsInterpreter.h @@ -1,6 +1,5 @@ #pragma once -#include #include #include #include diff --git a/src/Interpreters/MySQL/tests/gtest_create_rewritten.cpp b/src/Interpreters/MySQL/tests/gtest_create_rewritten.cpp index 036b933a461..08af123448e 100644 --- a/src/Interpreters/MySQL/tests/gtest_create_rewritten.cpp +++ b/src/Interpreters/MySQL/tests/gtest_create_rewritten.cpp @@ -28,7 +28,7 @@ static inline ASTPtr tryRewrittenCreateQuery(const String & query, ContextPtr co context, "test_database", "test_database")[0]; } -static const char MATERIALIZEMYSQL_TABLE_COLUMNS[] = ", `_sign` Int8() MATERIALIZED 1" +static const char MATERIALIZEDMYSQL_TABLE_COLUMNS[] = ", `_sign` Int8() MATERIALIZED 1" ", `_version` UInt64() MATERIALIZED 1" ", INDEX _version _version TYPE minmax GRANULARITY 1"; @@ -50,19 +50,19 @@ TEST(MySQLCreateRewritten, ColumnsDataType) EXPECT_EQ(queryToString(tryRewrittenCreateQuery( "CREATE TABLE `test_database`.`test_table_1`(`key` INT NOT NULL PRIMARY KEY, test " + test_type + ")", context_holder.context)), "CREATE TABLE test_database.test_table_1 (`key` Int32, `test` Nullable(" + mapped_type + ")" + - MATERIALIZEMYSQL_TABLE_COLUMNS + ") ENGINE = " + MATERIALIZEDMYSQL_TABLE_COLUMNS + ") ENGINE = " "ReplacingMergeTree(_version) PARTITION BY intDiv(key, 4294967) ORDER BY tuple(key)"); EXPECT_EQ(queryToString(tryRewrittenCreateQuery( "CREATE TABLE `test_database`.`test_table_1`(`key` INT NOT NULL PRIMARY KEY, test " + test_type + " NOT NULL)", context_holder.context)), "CREATE TABLE test_database.test_table_1 (`key` Int32, `test` " + mapped_type + - MATERIALIZEMYSQL_TABLE_COLUMNS + ") ENGINE = " + MATERIALIZEDMYSQL_TABLE_COLUMNS + ") ENGINE = " "ReplacingMergeTree(_version) PARTITION BY intDiv(key, 4294967) ORDER BY tuple(key)"); EXPECT_EQ(queryToString(tryRewrittenCreateQuery( "CREATE TABLE `test_database`.`test_table_1`(`key` INT NOT NULL PRIMARY KEY, test " + test_type + " COMMENT 'test_comment' NOT NULL)", context_holder.context)), "CREATE TABLE test_database.test_table_1 (`key` Int32, `test` " + mapped_type + " COMMENT 'test_comment'" + - MATERIALIZEMYSQL_TABLE_COLUMNS + ") ENGINE = " + MATERIALIZEDMYSQL_TABLE_COLUMNS + ") ENGINE = " "ReplacingMergeTree(_version) PARTITION BY intDiv(key, 4294967) ORDER BY tuple(key)"); if (Poco::toUpper(test_type).find("INT") != std::string::npos) @@ -70,25 +70,25 @@ TEST(MySQLCreateRewritten, ColumnsDataType) EXPECT_EQ(queryToString(tryRewrittenCreateQuery( "CREATE TABLE `test_database`.`test_table_1`(`key` INT NOT NULL PRIMARY KEY, test " + test_type + " UNSIGNED)", context_holder.context)), "CREATE TABLE test_database.test_table_1 (`key` Int32, `test` Nullable(U" + mapped_type + ")" + - MATERIALIZEMYSQL_TABLE_COLUMNS + ") ENGINE = " + MATERIALIZEDMYSQL_TABLE_COLUMNS + ") ENGINE = " "ReplacingMergeTree(_version) PARTITION BY 
intDiv(key, 4294967) ORDER BY tuple(key)"); EXPECT_EQ(queryToString(tryRewrittenCreateQuery( "CREATE TABLE `test_database`.`test_table_1`(`key` INT NOT NULL PRIMARY KEY, test " + test_type + " COMMENT 'test_comment' UNSIGNED)", context_holder.context)), "CREATE TABLE test_database.test_table_1 (`key` Int32, `test` Nullable(U" + mapped_type + ")" + " COMMENT 'test_comment'" + - MATERIALIZEMYSQL_TABLE_COLUMNS + ") ENGINE = " + MATERIALIZEDMYSQL_TABLE_COLUMNS + ") ENGINE = " "ReplacingMergeTree(_version) PARTITION BY intDiv(key, 4294967) ORDER BY tuple(key)"); EXPECT_EQ(queryToString(tryRewrittenCreateQuery( "CREATE TABLE `test_database`.`test_table_1`(`key` INT NOT NULL PRIMARY KEY, test " + test_type + " NOT NULL UNSIGNED)", context_holder.context)), "CREATE TABLE test_database.test_table_1 (`key` Int32, `test` U" + mapped_type + - MATERIALIZEMYSQL_TABLE_COLUMNS + ") ENGINE = " + MATERIALIZEDMYSQL_TABLE_COLUMNS + ") ENGINE = " "ReplacingMergeTree(_version) PARTITION BY intDiv(key, 4294967) ORDER BY tuple(key)"); EXPECT_EQ(queryToString(tryRewrittenCreateQuery( "CREATE TABLE `test_database`.`test_table_1`(`key` INT NOT NULL PRIMARY KEY, test " + test_type + " COMMENT 'test_comment' UNSIGNED NOT NULL)", context_holder.context)), "CREATE TABLE test_database.test_table_1 (`key` Int32, `test` U" + mapped_type + " COMMENT 'test_comment'" + - MATERIALIZEMYSQL_TABLE_COLUMNS + ") ENGINE = " + MATERIALIZEDMYSQL_TABLE_COLUMNS + ") ENGINE = " "ReplacingMergeTree(_version) PARTITION BY intDiv(key, 4294967) ORDER BY tuple(key)"); } } @@ -114,13 +114,13 @@ TEST(MySQLCreateRewritten, PartitionPolicy) EXPECT_EQ(queryToString(tryRewrittenCreateQuery( "CREATE TABLE `test_database`.`test_table_1` (`key` " + test_type + " PRIMARY KEY)", context_holder.context)), "CREATE TABLE test_database.test_table_1 (`key` " + mapped_type + - MATERIALIZEMYSQL_TABLE_COLUMNS + + MATERIALIZEDMYSQL_TABLE_COLUMNS + ") ENGINE = ReplacingMergeTree(_version)" + partition_policy + " ORDER BY tuple(key)"); EXPECT_EQ(queryToString(tryRewrittenCreateQuery( "CREATE TABLE `test_database`.`test_table_1` (`key` " + test_type + " NOT NULL PRIMARY KEY)", context_holder.context)), "CREATE TABLE test_database.test_table_1 (`key` " + mapped_type + - MATERIALIZEMYSQL_TABLE_COLUMNS + + MATERIALIZEDMYSQL_TABLE_COLUMNS + ") ENGINE = ReplacingMergeTree(_version)" + partition_policy + " ORDER BY tuple(key)"); } } @@ -145,25 +145,25 @@ TEST(MySQLCreateRewritten, OrderbyPolicy) EXPECT_EQ(queryToString(tryRewrittenCreateQuery( "CREATE TABLE `test_database`.`test_table_1` (`key` " + test_type + " PRIMARY KEY, `key2` " + test_type + " UNIQUE KEY)", context_holder.context)), "CREATE TABLE test_database.test_table_1 (`key` " + mapped_type + ", `key2` Nullable(" + mapped_type + ")" + - MATERIALIZEMYSQL_TABLE_COLUMNS + + MATERIALIZEDMYSQL_TABLE_COLUMNS + ") ENGINE = ReplacingMergeTree(_version)" + partition_policy + " ORDER BY (key, assumeNotNull(key2))"); EXPECT_EQ(queryToString(tryRewrittenCreateQuery( "CREATE TABLE `test_database`.`test_table_1` (`key` " + test_type + " NOT NULL PRIMARY KEY, `key2` " + test_type + " NOT NULL UNIQUE KEY)", context_holder.context)), "CREATE TABLE test_database.test_table_1 (`key` " + mapped_type + ", `key2` " + mapped_type + - MATERIALIZEMYSQL_TABLE_COLUMNS + + MATERIALIZEDMYSQL_TABLE_COLUMNS + ") ENGINE = ReplacingMergeTree(_version)" + partition_policy + " ORDER BY (key, key2)"); EXPECT_EQ(queryToString(tryRewrittenCreateQuery( "CREATE TABLE `test_database`.`test_table_1` (`key` " + test_type + " KEY UNIQUE KEY)", 
context_holder.context)), "CREATE TABLE test_database.test_table_1 (`key` " + mapped_type + - MATERIALIZEMYSQL_TABLE_COLUMNS + + MATERIALIZEDMYSQL_TABLE_COLUMNS + ") ENGINE = ReplacingMergeTree(_version)" + partition_policy + " ORDER BY tuple(key)"); EXPECT_EQ(queryToString(tryRewrittenCreateQuery( "CREATE TABLE `test_database`.`test_table_1` (`key` " + test_type + ", `key2` " + test_type + " UNIQUE KEY, PRIMARY KEY(`key`, `key2`))", context_holder.context)), "CREATE TABLE test_database.test_table_1 (`key` " + mapped_type + ", `key2` " + mapped_type + - MATERIALIZEMYSQL_TABLE_COLUMNS + + MATERIALIZEDMYSQL_TABLE_COLUMNS + ") ENGINE = ReplacingMergeTree(_version)" + partition_policy + " ORDER BY (key, key2)"); } } @@ -176,25 +176,25 @@ TEST(MySQLCreateRewritten, RewrittenQueryWithPrimaryKey) EXPECT_EQ(queryToString(tryRewrittenCreateQuery( "CREATE TABLE `test_database`.`test_table_1` (`key` int NOT NULL PRIMARY KEY) ENGINE=InnoDB DEFAULT CHARSET=utf8", context_holder.context)), "CREATE TABLE test_database.test_table_1 (`key` Int32" + - std::string(MATERIALIZEMYSQL_TABLE_COLUMNS) + + std::string(MATERIALIZEDMYSQL_TABLE_COLUMNS) + ") ENGINE = ReplacingMergeTree(_version) PARTITION BY intDiv(key, 4294967) ORDER BY tuple(key)"); EXPECT_EQ(queryToString(tryRewrittenCreateQuery( "CREATE TABLE `test_database`.`test_table_1` (`key` int NOT NULL, PRIMARY KEY (`key`)) ENGINE=InnoDB DEFAULT CHARSET=utf8", context_holder.context)), "CREATE TABLE test_database.test_table_1 (`key` Int32" + - std::string(MATERIALIZEMYSQL_TABLE_COLUMNS) + + std::string(MATERIALIZEDMYSQL_TABLE_COLUMNS) + ") ENGINE = ReplacingMergeTree(_version) PARTITION BY intDiv(key, 4294967) ORDER BY tuple(key)"); EXPECT_EQ(queryToString(tryRewrittenCreateQuery( "CREATE TABLE `test_database`.`test_table_1` (`key_1` int NOT NULL, key_2 INT NOT NULL, PRIMARY KEY (`key_1`, `key_2`)) ENGINE=InnoDB DEFAULT CHARSET=utf8", context_holder.context)), "CREATE TABLE test_database.test_table_1 (`key_1` Int32, `key_2` Int32" + - std::string(MATERIALIZEMYSQL_TABLE_COLUMNS) + + std::string(MATERIALIZEDMYSQL_TABLE_COLUMNS) + ") ENGINE = ReplacingMergeTree(_version) PARTITION BY intDiv(key_1, 4294967) ORDER BY (key_1, key_2)"); EXPECT_EQ(queryToString(tryRewrittenCreateQuery( "CREATE TABLE `test_database`.`test_table_1` (`key_1` BIGINT NOT NULL, key_2 INT NOT NULL, PRIMARY KEY (`key_1`, `key_2`)) ENGINE=InnoDB DEFAULT CHARSET=utf8", context_holder.context)), "CREATE TABLE test_database.test_table_1 (`key_1` Int64, `key_2` Int32" + - std::string(MATERIALIZEMYSQL_TABLE_COLUMNS) + + std::string(MATERIALIZEDMYSQL_TABLE_COLUMNS) + ") ENGINE = ReplacingMergeTree(_version) PARTITION BY intDiv(key_2, 4294967) ORDER BY (key_1, key_2)"); } @@ -206,7 +206,7 @@ TEST(MySQLCreateRewritten, RewrittenQueryWithPrefixKey) EXPECT_EQ(queryToString(tryRewrittenCreateQuery( "CREATE TABLE `test_database`.`test_table_1` (`key` int NOT NULL PRIMARY KEY, `prefix_key` varchar(200) NOT NULL, KEY prefix_key_index(prefix_key(2))) ENGINE=InnoDB DEFAULT CHARSET=utf8", context_holder.context)), "CREATE TABLE test_database.test_table_1 (`key` Int32, `prefix_key` String" + - std::string(MATERIALIZEMYSQL_TABLE_COLUMNS) + ") ENGINE = " + std::string(MATERIALIZEDMYSQL_TABLE_COLUMNS) + ") ENGINE = " "ReplacingMergeTree(_version) PARTITION BY intDiv(key, 4294967) ORDER BY (key, prefix_key)"); } @@ -220,7 +220,7 @@ TEST(MySQLCreateRewritten, UniqueKeysConvert) " id bigint NOT NULL AUTO_INCREMENT, tenant_id bigint NOT NULL, PRIMARY KEY (id), UNIQUE KEY code_id (code, tenant_id), UNIQUE KEY 
name_id (name, tenant_id))" " ENGINE=InnoDB AUTO_INCREMENT=100 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;", context_holder.context)), "CREATE TABLE test_database.test_table_1 (`code` String, `name` String, `id` Int64, `tenant_id` Int64" + - std::string(MATERIALIZEMYSQL_TABLE_COLUMNS) + + std::string(MATERIALIZEDMYSQL_TABLE_COLUMNS) + ") ENGINE = ReplacingMergeTree(_version) PARTITION BY intDiv(id, 18446744073709551) ORDER BY (code, name, tenant_id, id)"); } @@ -232,7 +232,7 @@ TEST(MySQLCreateRewritten, QueryWithColumnComments) EXPECT_EQ(queryToString(tryRewrittenCreateQuery( "CREATE TABLE `test_database`.`test_table_1`(`key` INT NOT NULL PRIMARY KEY, `test` INT COMMENT 'test_comment')", context_holder.context)), "CREATE TABLE test_database.test_table_1 (`key` Int32, `test` Nullable(Int32) COMMENT 'test_comment'" + - std::string(MATERIALIZEMYSQL_TABLE_COLUMNS) + + std::string(MATERIALIZEDMYSQL_TABLE_COLUMNS) + ") ENGINE = ReplacingMergeTree(_version) PARTITION BY intDiv(key, 4294967) ORDER BY tuple(key)"); } @@ -244,16 +244,16 @@ TEST(MySQLCreateRewritten, QueryWithEnum) EXPECT_EQ(queryToString(tryRewrittenCreateQuery( "CREATE TABLE `test_database`.`test_table_1`(`key` INT NOT NULL PRIMARY KEY, `test` ENUM('a','b','c'))", context_holder.context)), "CREATE TABLE test_database.test_table_1 (`key` Int32, `test` Nullable(Enum8('a' = 1, 'b' = 2, 'c' = 3))" + - std::string(MATERIALIZEMYSQL_TABLE_COLUMNS) + + std::string(MATERIALIZEDMYSQL_TABLE_COLUMNS) + ") ENGINE = ReplacingMergeTree(_version) PARTITION BY intDiv(key, 4294967) ORDER BY tuple(key)"); EXPECT_EQ(queryToString(tryRewrittenCreateQuery( "CREATE TABLE `test_database`.`test_table_1`(`key` INT NOT NULL PRIMARY KEY, `test` ENUM('a','b','c') NOT NULL)", context_holder.context)), "CREATE TABLE test_database.test_table_1 (`key` Int32, `test` Enum8('a' = 1, 'b' = 2, 'c' = 3)" + - std::string(MATERIALIZEMYSQL_TABLE_COLUMNS) + + std::string(MATERIALIZEDMYSQL_TABLE_COLUMNS) + ") ENGINE = ReplacingMergeTree(_version) PARTITION BY intDiv(key, 4294967) ORDER BY tuple(key)"); EXPECT_EQ(queryToString(tryRewrittenCreateQuery( "CREATE TABLE `test_database`.`test_table_1`(`key` INT NOT NULL PRIMARY KEY, `test` ENUM('a','b','c') COMMENT 'test_comment')", context_holder.context)), "CREATE TABLE test_database.test_table_1 (`key` Int32, `test` Nullable(Enum8('a' = 1, 'b' = 2, 'c' = 3)) COMMENT 'test_comment'" + - std::string(MATERIALIZEMYSQL_TABLE_COLUMNS) + + std::string(MATERIALIZEDMYSQL_TABLE_COLUMNS) + ") ENGINE = ReplacingMergeTree(_version) PARTITION BY intDiv(key, 4294967) ORDER BY tuple(key)"); } diff --git a/src/Interpreters/OptimizeIfWithConstantConditionVisitor.cpp b/src/Interpreters/OptimizeIfWithConstantConditionVisitor.cpp index cdcf6f7dddd..a8e2d371e05 100644 --- a/src/Interpreters/OptimizeIfWithConstantConditionVisitor.cpp +++ b/src/Interpreters/OptimizeIfWithConstantConditionVisitor.cpp @@ -39,9 +39,12 @@ static bool tryExtractConstValueFromCondition(const ASTPtr & condition, bool & v const ASTPtr & type_ast = expr_list->children.at(1); if (const auto * type_literal = type_ast->as()) { - if (type_literal->value.getType() == Field::Types::String && - type_literal->value.get() == "UInt8") - return tryExtractConstValueFromCondition(expr_list->children.at(0), value); + if (type_literal->value.getType() == Field::Types::String) + { + const auto & type_str = type_literal->value.get(); + if (type_str == "UInt8" || type_str == "Nullable(UInt8)") + return tryExtractConstValueFromCondition(expr_list->children.at(0), value); + } } } } diff --git 
a/src/Interpreters/ProcessList.cpp b/src/Interpreters/ProcessList.cpp index 951ff6420c4..06320f00dfa 100644 --- a/src/Interpreters/ProcessList.cpp +++ b/src/Interpreters/ProcessList.cpp @@ -6,6 +6,7 @@ #include #include #include +#include #include #include #include @@ -297,7 +298,10 @@ QueryStatus::QueryStatus( { } -QueryStatus::~QueryStatus() = default; +QueryStatus::~QueryStatus() +{ + assert(executors.empty()); +} void QueryStatus::setQueryStreams(const BlockIO & io) { @@ -351,6 +355,11 @@ CancellationCode QueryStatus::cancelQuery(bool kill) BlockInputStreamPtr input_stream; BlockOutputStreamPtr output_stream; + SCOPE_EXIT({ + std::lock_guard lock(query_streams_mutex); + for (auto * e : executors) + e->cancel(); + }); if (tryGetQueryStreams(input_stream, output_stream)) { @@ -366,6 +375,20 @@ CancellationCode QueryStatus::cancelQuery(bool kill) return CancellationCode::CancelSent; } +void QueryStatus::addPipelineExecutor(PipelineExecutor * e) +{ + std::lock_guard lock(query_streams_mutex); + assert(std::find(executors.begin(), executors.end(), e) == executors.end()); + executors.push_back(e); +} + +void QueryStatus::removePipelineExecutor(PipelineExecutor * e) +{ + std::lock_guard lock(query_streams_mutex); + assert(std::find(executors.begin(), executors.end(), e) != executors.end()); + std::erase_if(executors, [e](PipelineExecutor * x) { return x == e; }); +} + void QueryStatus::setUserProcessList(ProcessListForUser * user_process_list_) { diff --git a/src/Interpreters/ProcessList.h b/src/Interpreters/ProcessList.h index fca28b031db..1adad84c040 100644 --- a/src/Interpreters/ProcessList.h +++ b/src/Interpreters/ProcessList.h @@ -22,6 +22,7 @@ #include #include #include +#include namespace CurrentMetrics @@ -34,6 +35,7 @@ namespace DB struct Settings; class IAST; +class PipelineExecutor; struct ProcessListForUser; class QueryStatus; @@ -109,6 +111,9 @@ protected: BlockInputStreamPtr query_stream_in; BlockOutputStreamPtr query_stream_out; + /// Array of PipelineExecutors to be cancelled when a cancelQuery is received + std::vector executors; + enum QueryStreamsStatus { NotInitialized, @@ -183,6 +188,12 @@ public: CancellationCode cancelQuery(bool kill); bool isKilled() const { return is_killed; } + + /// Adds a pipeline to the QueryStatus + void addPipelineExecutor(PipelineExecutor * e); + + /// Removes a pipeline to the QueryStatus + void removePipelineExecutor(PipelineExecutor * e); }; diff --git a/src/Interpreters/QueryNormalizer.cpp b/src/Interpreters/QueryNormalizer.cpp index aae714198b5..ea61ade2b49 100644 --- a/src/Interpreters/QueryNormalizer.cpp +++ b/src/Interpreters/QueryNormalizer.cpp @@ -3,6 +3,7 @@ #include #include #include +#include #include #include #include @@ -170,6 +171,24 @@ void QueryNormalizer::visitChildren(IAST * node, Data & data) /// Don't go into query argument. return; } + + /// For lambda functions we need to avoid replacing lambda parameters with external aliases, for example, + /// Select 1 as x, arrayMap(x -> x + 2, [1, 2, 3]) + /// shouldn't be replaced with Select 1 as x, arrayMap(x -> **(1 as x)** + 2, [1, 2, 3]) + Aliases extracted_aliases; + if (func_node->name == "lambda") + { + Names lambda_aliases = RequiredSourceColumnsMatcher::extractNamesFromLambda(*func_node); + for (const auto & name : lambda_aliases) + { + auto it = data.aliases.find(name); + if (it != data.aliases.end()) + { + extracted_aliases.insert(data.aliases.extract(it)); + } + } + } + /// We skip the first argument. We also assume that the lambda function can not have parameters. 
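The QueryNormalizer hunk above avoids capturing lambda parameters with outer aliases: matching entries are pulled out of the alias map before the lambda body is visited and re-inserted afterwards. A minimal standalone sketch of that extract-and-restore pattern follows; it uses a plain std::unordered_map and illustrative names, not the actual ClickHouse types.

```
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

using Aliases = std::unordered_map<std::string, std::string>;

// Visit a lambda body while its parameter names temporarily shadow any outer aliases.
void visitLambda(const std::vector<std::string> & lambda_params, Aliases & aliases)
{
    Aliases shadowed;
    for (const auto & name : lambda_params)
    {
        auto it = aliases.find(name);
        if (it != aliases.end())
            shadowed.insert(aliases.extract(it));  // take the node out, keep it for later
    }

    // ... the lambda body would be visited here; `x` no longer resolves to the outer alias ...

    for (auto & node : shadowed)
        aliases.insert(node);                      // restore the outer aliases afterwards
}

int main()
{
    Aliases aliases{{"x", "1 AS x"}};
    visitLambda({"x"}, aliases);
    std::cout << aliases.count("x") << '\n';       // 1: the outer alias survived the visit
}
```

Using the map's node extract/insert API keeps the shadowing cheap: the alias values are moved aside and back without copies.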
size_t first_pos = 0; if (func_node->name == "lambda") @@ -192,6 +211,11 @@ void QueryNormalizer::visitChildren(IAST * node, Data & data) { visitChildren(func_node->window_definition.get(), data); } + + for (auto & it : extracted_aliases) + { + data.aliases.insert(it); + } } else if (!node->as()) { diff --git a/src/Interpreters/QueryNormalizer.h b/src/Interpreters/QueryNormalizer.h index 7fc0f4bdf82..eebcff62cde 100644 --- a/src/Interpreters/QueryNormalizer.h +++ b/src/Interpreters/QueryNormalizer.h @@ -39,7 +39,7 @@ public: using SetOfASTs = std::set; using MapOfASTs = std::map; - const Aliases & aliases; + Aliases & aliases; const NameSet & source_columns_set; ExtractedSettings settings; @@ -53,7 +53,7 @@ public: /// It's Ok to have "c + 1 AS c" in queries, but not in table definition const bool allow_self_aliases; /// for constructs like "SELECT column + 1 AS column" - Data(const Aliases & aliases_, const NameSet & source_columns_set_, bool ignore_alias_, ExtractedSettings && settings_, bool allow_self_aliases_) + Data(Aliases & aliases_, const NameSet & source_columns_set_, bool ignore_alias_, ExtractedSettings && settings_, bool allow_self_aliases_) : aliases(aliases_) , source_columns_set(source_columns_set_) , settings(settings_) diff --git a/src/Interpreters/SelectQueryOptions.h b/src/Interpreters/SelectQueryOptions.h index 52ce7c83741..709ecdc239c 100644 --- a/src/Interpreters/SelectQueryOptions.h +++ b/src/Interpreters/SelectQueryOptions.h @@ -1,6 +1,7 @@ #pragma once #include +#include namespace DB { @@ -45,6 +46,12 @@ struct SelectQueryOptions bool is_subquery = false; // non-subquery can also have subquery_depth > 0, e.g. insert select bool with_all_cols = false; /// asterisk include materialized and aliased columns + /// These two fields are used to evaluate shardNum() and shardCount() function when + /// prefer_localhost_replica == 1 and local instance is selected. They are needed because local + /// instance might have multiple shards and scalars can only hold one value. + std::optional shard_num; + std::optional shard_count; + SelectQueryOptions( QueryProcessingStage::Enum stage = QueryProcessingStage::Complete, size_t depth = 0, @@ -124,6 +131,13 @@ struct SelectQueryOptions with_all_cols = value; return *this; } + + SelectQueryOptions & setShardInfo(UInt32 shard_num_, UInt32 shard_count_) + { + shard_num = shard_num_; + shard_count = shard_count_; + return *this; + } }; } diff --git a/src/Interpreters/Set.cpp b/src/Interpreters/Set.cpp index 66ba1f9ac9c..ff502b499cd 100644 --- a/src/Interpreters/Set.cpp +++ b/src/Interpreters/Set.cpp @@ -7,8 +7,6 @@ #include -#include - #include #include @@ -217,6 +215,8 @@ bool Set::insertFromBlock(const Block & block) set_elements[i] = filtered_column; else set_elements[i]->insertRangeFrom(*filtered_column, 0, filtered_column->size()); + if (transform_null_in && null_map_holder) + set_elements[i]->insert(Null{}); } } @@ -281,7 +281,7 @@ ColumnPtr Set::execute(const Block & block, bool negative) const key_columns.emplace_back() = materialized_columns.back().get(); } - /// We will check existence in Set only for keys, where all components are not NULL. + /// We will check existence in Set only for keys whose components do not contain any NULL value. 
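With the Set change above, a NULL literal inside the IN list is now kept as a real element of set_elements when transform_null_in is enabled, while the default path still skips keys that contain NULL. A toy model of those two behaviours, using std::optional to stand in for a nullable column value (all names here are illustrative, not the real Set class):

```
#include <iostream>
#include <optional>
#include <set>

// Toy IN-set semantics: with transform_null_in the set also remembers whether a NULL
// literal was present, so `NULL IN (..., NULL)` can be true; without it, any key
// containing NULL simply never matches.
struct ToySet
{
    std::set<int> values;
    bool has_null = false;
    bool transform_null_in = false;

    void insert(std::optional<int> v)
    {
        if (!v)
        {
            if (transform_null_in)
                has_null = true;   // keep NULL as a real element
            return;                // otherwise NULL keys are skipped entirely
        }
        values.insert(*v);
    }

    bool contains(std::optional<int> key) const
    {
        if (!key)
            return transform_null_in && has_null;
        return values.count(*key) > 0;
    }
};

int main()
{
    ToySet s;
    s.transform_null_in = true;
    s.insert(1);
    s.insert(std::nullopt);
    std::cout << s.contains(std::nullopt) << '\n';  // 1
}
```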
ConstNullMapPtr null_map{}; ColumnPtr null_map_holder; if (!transform_null_in) @@ -408,7 +408,7 @@ MergeTreeSetIndex::MergeTreeSetIndex(const Columns & set_elements, std::vector & key_ranges, { size_t tuple_size = indexes_mapping.size(); - ColumnsWithInfinity left_point; - ColumnsWithInfinity right_point; + FieldValues left_point; + FieldValues right_point; left_point.reserve(tuple_size); right_point.reserve(tuple_size); @@ -458,8 +458,8 @@ BoolMask MergeTreeSetIndex::checkInRange(const std::vector & key_ranges, right_point.emplace_back(ordered_set[i]->cloneEmpty()); } - bool invert_left_infinities = false; - bool invert_right_infinities = false; + bool left_included = true; + bool right_included = true; for (size_t i = 0; i < tuple_size; ++i) { @@ -471,48 +471,29 @@ BoolMask MergeTreeSetIndex::checkInRange(const std::vector & key_ranges, if (!new_range) return {true, true}; - /** A range that ends in (x, y, ..., +inf) exclusive is the same as a range - * that ends in (x, y, ..., -inf) inclusive and vice versa for the left bound. - */ - if (new_range->left_bounded) - { - if (!new_range->left_included) - invert_left_infinities = true; - - left_point[i].update(new_range->left); - } - else - { - if (invert_left_infinities) - left_point[i].update(ValueWithInfinity::PLUS_INFINITY); - else - left_point[i].update(ValueWithInfinity::MINUS_INFINITY); - } - - if (new_range->right_bounded) - { - if (!new_range->right_included) - invert_right_infinities = true; - - right_point[i].update(new_range->right); - } - else - { - if (invert_right_infinities) - right_point[i].update(ValueWithInfinity::MINUS_INFINITY); - else - right_point[i].update(ValueWithInfinity::PLUS_INFINITY); - } + left_point[i].update(new_range->left); + left_included &= new_range->left_included; + right_point[i].update(new_range->right); + right_included &= new_range->right_included; } - auto compare = [](const IColumn & lhs, const ValueWithInfinity & rhs, size_t row) + /// lhs < rhs return -1 + /// lhs == rhs return 0 + /// lhs > rhs return 1 + auto compare = [](const IColumn & lhs, const FieldValue & rhs, size_t row) { - auto type = rhs.getType(); - /// Return inverted infinity sign, because in 'lhs' all values are finite. - if (type != ValueWithInfinity::NORMAL) - return -static_cast(type); - - return lhs.compareAt(row, 0, rhs.getColumnIfFinite(), 1); + if (rhs.isNegativeInfinity()) + return 1; + if (rhs.isPositiveInfinity()) + { + Field f; + lhs.get(row, f); + if (f.isNull()) + return 0; // +Inf == +Inf + else + return -1; + } + return lhs.compareAt(row, 0, *rhs.column, 1); }; auto less = [this, &compare, tuple_size](size_t row, const auto & point) @@ -535,31 +516,32 @@ BoolMask MergeTreeSetIndex::checkInRange(const std::vector & key_ranges, }; /** Because each hyperrectangle maps to a contiguous sequence of elements - * laid out in the lexicographically increasing order, the set intersects the range - * if and only if either bound coincides with an element or at least one element - * is between the lower bounds - */ + * laid out in the lexicographically increasing order, the set intersects the range + * if and only if either bound coincides with an element or at least one element + * is between the lower bounds + */ auto indices = collections::range(0, size()); auto left_lower = std::lower_bound(indices.begin(), indices.end(), left_point, less); auto right_lower = std::lower_bound(indices.begin(), indices.end(), right_point, less); - /// A special case of 1-element KeyRange. 
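The rewritten checkInRange drops the old sign-inversion trick and instead carries explicit -Inf/+Inf markers in the range bounds (FieldValue) together with per-side inclusivity flags. A simplified sketch of comparing a finite set element against such a bound, using plain ints rather than IColumn; it assumes, as the ordered set does, that all stored elements are finite.

```
#include <iostream>

// A range bound that is either a finite value or one of the infinities,
// mirroring the idea behind FieldValue (illustrative only).
struct Bound
{
    enum Kind { NegInf, Normal, PosInf } kind = Normal;
    int value = 0;
};

// Returns -1/0/1 for element <, ==, > bound; all set elements are finite,
// so -Inf always compares below them and +Inf above.
int compareWithBound(int element, const Bound & bound)
{
    if (bound.kind == Bound::NegInf)
        return 1;
    if (bound.kind == Bound::PosInf)
        return -1;
    return (element < bound.value) ? -1 : (element > bound.value ? 1 : 0);
}

int main()
{
    std::cout << compareWithBound(5, {Bound::NegInf, 0}) << ' '    // 1
              << compareWithBound(5, {Bound::Normal, 5}) << ' '    // 0
              << compareWithBound(5, {Bound::PosInf, 0}) << '\n';  // -1
}
```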
It's useful for partition pruning + /// A special case of 1-element KeyRange. It's useful for partition pruning. bool one_element_range = true; for (size_t i = 0; i < tuple_size; ++i) { auto & left = left_point[i]; auto & right = right_point[i]; - if (left.getType() == right.getType()) + if (left.isNormal() && right.isNormal()) { - if (left.getType() == ValueWithInfinity::NORMAL) + if (0 != left.column->compareAt(0, 0, *right.column, 1)) { - if (0 != left.getColumnIfFinite().compareAt(0, 0, right.getColumnIfFinite(), 1)) - { - one_element_range = false; - break; - } + one_element_range = false; + break; } } + else if ((left.isPositiveInfinity() && right.isPositiveInfinity()) || (left.isNegativeInfinity() && right.isNegativeInfinity())) + { + /// Special value equality. + } else { one_element_range = false; @@ -571,19 +553,40 @@ BoolMask MergeTreeSetIndex::checkInRange(const std::vector & key_ranges, /// Here we know that there is one element in range. /// The main difference with the normal case is that we can definitely say that /// condition in this range always TRUE (can_be_false = 0) xor always FALSE (can_be_true = 0). - if (left_lower != indices.end() && equals(*left_lower, left_point)) + + /// Check if it's an empty range + if (!left_included || !right_included) + return {false, true}; + else if (left_lower != indices.end() && equals(*left_lower, left_point)) return {true, false}; else return {false, true}; } - return + /// If there are more than one element in the range, it can always be false. Thus we only need to check if it may be true or not. + /// Given left_lower >= left_point, right_lower >= right_point, find if there may be a match in between left_lower and right_lower. + if (left_lower + 1 < right_lower) { - left_lower != right_lower - || (left_lower != indices.end() && equals(*left_lower, left_point)) - || (right_lower != indices.end() && equals(*right_lower, right_point)), - true - }; + /// There is an point in between: left_lower + 1 + return {true, true}; + } + else if (left_lower + 1 == right_lower) + { + /// Need to check if left_lower is a valid match, as left_point <= left_lower < right_point <= right_lower. + /// Note: left_lower is valid. + if (left_included || !equals(*left_lower, left_point)) + return {true, true}; + + /// We are unlucky that left_point fails to cover a point. Now we need to check if right_point can cover right_lower. + /// Check if there is a match at the right boundary. + return {right_included && right_lower != indices.end() && equals(*right_lower, right_point), true}; + } + else // left_lower == right_lower + { + /// Need to check if right_point is a valid match, as left_point < right_point <= left_lower = right_lower. + /// Check if there is a match at the left boundary. + return {right_included && right_lower != indices.end() && equals(*right_lower, right_point), true}; + } } bool MergeTreeSetIndex::hasMonotonicFunctionsChain() const @@ -594,23 +597,18 @@ bool MergeTreeSetIndex::hasMonotonicFunctionsChain() const return false; } -void ValueWithInfinity::update(const Field & x) +void FieldValue::update(const Field & x) { - /// Keep at most one element in column. 
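The tail of the new checkInRange reduces "can this range contain a set element?" to two std::lower_bound lookups plus a small case analysis on how far apart they land and whether the bounds are inclusive. A self-contained sketch of that case analysis over a sorted vector of ints; it is simplified to a single can_be_true answer, whereas the real code also reports can_be_false.

```
#include <algorithm>
#include <iostream>
#include <vector>

// Can the interval between `left` and `right` (with per-side inclusivity) contain
// an element of the sorted vector?
bool mayIntersect(const std::vector<int> & sorted, int left, bool left_included,
                  int right, bool right_included)
{
    // Indices of the first elements >= left and >= right respectively.
    size_t left_lower  = std::lower_bound(sorted.begin(), sorted.end(), left) - sorted.begin();
    size_t right_lower = std::lower_bound(sorted.begin(), sorted.end(), right) - sorted.begin();

    if (left_lower + 1 < right_lower)
        return true;  // at least one element lies strictly between the two bounds

    if (left_lower + 1 == right_lower)
    {
        // sorted[left_lower] is the only candidate below `right`; it fails only when it
        // coincides with an excluded left bound.
        if (left_included || sorted[left_lower] != left)
            return true;
        return right_included && right_lower != sorted.size() && sorted[right_lower] == right;
    }

    // left_lower == right_lower: only the right bound itself can still match.
    return right_included && right_lower != sorted.size() && sorted[right_lower] == right;
}

int main()
{
    std::vector<int> s{1, 3, 5};
    std::cout << mayIntersect(s, 2, true, 4, true) << ' '      // 1: 3 lies inside [2, 4]
              << mayIntersect(s, 3, false, 5, false) << '\n';  // 0: (3, 5) contains no element
}
```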
- if (!column->empty()) - column->popBack(1); - column->insert(x); - type = NORMAL; -} - -const IColumn & ValueWithInfinity::getColumnIfFinite() const -{ -#ifndef NDEBUG - if (type != NORMAL) - throw Exception("Trying to get column of infinite type", ErrorCodes::LOGICAL_ERROR); -#endif - - return *column; + if (x.isNegativeInfinity() || x.isPositiveInfinity()) + value = x; + else + { + /// Keep at most one element in column. + if (!column->empty()) + column->popBack(1); + column->insert(x); + value = Field(); // Set back to normal value. + } } } diff --git a/src/Interpreters/Set.h b/src/Interpreters/Set.h index c9bfbf0625c..9bf6630b844 100644 --- a/src/Interpreters/Set.h +++ b/src/Interpreters/Set.h @@ -178,29 +178,19 @@ using FunctionPtr = std::shared_ptr; * Single field is stored in column for more optimal inplace comparisons with other regular columns. * Extracting fields from columns and further their comparison is suboptimal and requires extra copying. */ -class ValueWithInfinity +struct FieldValue { -public: - enum Type - { - MINUS_INFINITY = -1, - NORMAL = 0, - PLUS_INFINITY = 1 - }; - - ValueWithInfinity(MutableColumnPtr && column_) - : column(std::move(column_)), type(NORMAL) {} - + FieldValue(MutableColumnPtr && column_) : column(std::move(column_)) {} void update(const Field & x); - void update(Type type_) { type = type_; } - const IColumn & getColumnIfFinite() const; + bool isNormal() const { return !value.isPositiveInfinity() && !value.isNegativeInfinity(); } + bool isPositiveInfinity() const { return value.isPositiveInfinity(); } + bool isNegativeInfinity() const { return value.isNegativeInfinity(); } - Type getType() const { return type; } + Field value; // Null, -Inf, +Inf -private: + // If value is Null, uses the actual value in column MutableColumnPtr column; - Type type; }; @@ -230,7 +220,7 @@ private: Columns ordered_set; std::vector indexes_mapping; - using ColumnsWithInfinity = std::vector; + using FieldValues = std::vector; }; } diff --git a/src/Interpreters/SortedBlocksWriter.cpp b/src/Interpreters/SortedBlocksWriter.cpp index b12616dba1e..3ce9f2d1b90 100644 --- a/src/Interpreters/SortedBlocksWriter.cpp +++ b/src/Interpreters/SortedBlocksWriter.cpp @@ -1,48 +1,47 @@ #include #include -#include -#include +#include +#include +#include +#include #include #include #include + namespace DB { -namespace ErrorCodes -{ - extern const int NOT_ENOUGH_SPACE; -} - namespace { -std::unique_ptr flushToFile(const String & tmp_path, const Block & header, IBlockInputStream & stream, const String & codec) +std::unique_ptr flushToFile(const String & tmp_path, const Block & header, QueryPipeline pipeline, const String & codec) { auto tmp_file = createTemporaryFile(tmp_path); - std::atomic is_cancelled{false}; - TemporaryFileStream::write(tmp_file->path(), header, stream, &is_cancelled, codec); - if (is_cancelled) - throw Exception("Cannot flush MergeJoin data on disk. 
No space at " + tmp_path, ErrorCodes::NOT_ENOUGH_SPACE); + TemporaryFileStream::write(tmp_file->path(), header, std::move(pipeline), codec); return tmp_file; } -SortedBlocksWriter::SortedFiles flushToManyFiles(const String & tmp_path, const Block & header, IBlockInputStream & stream, +SortedBlocksWriter::SortedFiles flushToManyFiles(const String & tmp_path, const Block & header, QueryPipeline pipeline, const String & codec, std::function callback = [](const Block &){}) { std::vector> files; + PullingPipelineExecutor executor(pipeline); - while (Block block = stream.read()) + Block block; + while (executor.pull(block)) { if (!block.rows()) continue; callback(block); - OneBlockInputStream block_stream(block); - auto tmp_file = flushToFile(tmp_path, header, block_stream, codec); + QueryPipeline one_block_pipeline; + Chunk chunk(block.getColumns(), block.rows()); + one_block_pipeline.init(Pipe(std::make_shared(block.cloneEmpty(), std::move(chunk)))); + auto tmp_file = flushToFile(tmp_path, header, std::move(one_block_pipeline), codec); files.emplace_back(std::move(tmp_file)); } @@ -118,23 +117,30 @@ SortedBlocksWriter::TmpFilePtr SortedBlocksWriter::flush(const BlocksList & bloc { const std::string path = getPath(); - if (blocks.empty()) + Pipes pipes; + pipes.reserve(blocks.size()); + for (const auto & block : blocks) + if (auto num_rows = block.rows()) + pipes.emplace_back(std::make_shared(block.cloneEmpty(), Chunk(block.getColumns(), num_rows))); + + if (pipes.empty()) return {}; - if (blocks.size() == 1) + QueryPipeline pipeline; + pipeline.init(Pipe::unitePipes(std::move(pipes))); + + if (pipeline.getNumStreams() > 1) { - OneBlockInputStream sorted_input(blocks.front()); - return flushToFile(path, sample_block, sorted_input, codec); + auto transform = std::make_shared( + pipeline.getHeader(), + pipeline.getNumStreams(), + sort_description, + rows_in_block); + + pipeline.addTransform(std::move(transform)); } - BlockInputStreams inputs; - inputs.reserve(blocks.size()); - for (const auto & block : blocks) - if (block.rows()) - inputs.push_back(std::make_shared(block)); - - MergingSortedBlockInputStream sorted_input(inputs, sort_description, rows_in_block); - return flushToFile(path, sample_block, sorted_input, codec); + return flushToFile(path, sample_block, std::move(pipeline), codec); } SortedBlocksWriter::PremergedFiles SortedBlocksWriter::premerge() @@ -157,8 +163,8 @@ SortedBlocksWriter::PremergedFiles SortedBlocksWriter::premerge() if (!blocks.empty()) files.emplace_back(flush(blocks)); - BlockInputStreams inputs; - inputs.reserve(num_files_for_merge); + Pipes pipes; + pipes.reserve(num_files_for_merge); /// Merge by parts to save memory. It's possible to exchange disk I/O and memory by num_files_for_merge. 
{ @@ -169,13 +175,26 @@ SortedBlocksWriter::PremergedFiles SortedBlocksWriter::premerge() { for (const auto & file : files) { - inputs.emplace_back(streamFromFile(file)); + pipes.emplace_back(streamFromFile(file)); - if (inputs.size() == num_files_for_merge || &file == &files.back()) + if (pipes.size() == num_files_for_merge || &file == &files.back()) { - MergingSortedBlockInputStream sorted_input(inputs, sort_description, rows_in_block); - new_files.emplace_back(flushToFile(getPath(), sample_block, sorted_input, codec)); - inputs.clear(); + QueryPipeline pipeline; + pipeline.init(Pipe::unitePipes(std::move(pipes))); + pipes = Pipes(); + + if (pipeline.getNumStreams() > 1) + { + auto transform = std::make_shared( + pipeline.getHeader(), + pipeline.getNumStreams(), + sort_description, + rows_in_block); + + pipeline.addTransform(std::move(transform)); + } + + new_files.emplace_back(flushToFile(getPath(), sample_block, std::move(pipeline), codec)); } } @@ -184,22 +203,35 @@ SortedBlocksWriter::PremergedFiles SortedBlocksWriter::premerge() } for (const auto & file : files) - inputs.emplace_back(streamFromFile(file)); + pipes.emplace_back(streamFromFile(file)); } - return PremergedFiles{std::move(files), std::move(inputs)}; + return PremergedFiles{std::move(files), Pipe::unitePipes(std::move(pipes))}; } SortedBlocksWriter::SortedFiles SortedBlocksWriter::finishMerge(std::function callback) { PremergedFiles files = premerge(); - MergingSortedBlockInputStream sorted_input(files.streams, sort_description, rows_in_block); - return flushToManyFiles(getPath(), sample_block, sorted_input, codec, callback); + QueryPipeline pipeline; + pipeline.init(std::move(files.pipe)); + + if (pipeline.getNumStreams() > 1) + { + auto transform = std::make_shared( + pipeline.getHeader(), + pipeline.getNumStreams(), + sort_description, + rows_in_block); + + pipeline.addTransform(std::move(transform)); + } + + return flushToManyFiles(getPath(), sample_block, std::move(pipeline), codec, callback); } -BlockInputStreamPtr SortedBlocksWriter::streamFromFile(const TmpFilePtr & file) const +Pipe SortedBlocksWriter::streamFromFile(const TmpFilePtr & file) const { - return std::make_shared(file->path(), materializeBlock(sample_block)); + return Pipe(std::make_shared(file->path(), materializeBlock(sample_block))); } String SortedBlocksWriter::getPath() const @@ -249,18 +281,35 @@ Block SortedBlocksBuffer::mergeBlocks(Blocks && blocks) const size_t num_rows = 0; { /// Merge sort blocks - BlockInputStreams inputs; - inputs.reserve(blocks.size()); + Pipes pipes; + pipes.reserve(blocks.size()); for (auto & block : blocks) { num_rows += block.rows(); - inputs.emplace_back(std::make_shared(block)); + Chunk chunk(block.getColumns(), block.rows()); + pipes.emplace_back(std::make_shared(block.cloneEmpty(), std::move(chunk))); } Blocks tmp_blocks; - MergingSortedBlockInputStream stream(inputs, sort_description, num_rows); - while (const auto & block = stream.read()) + + QueryPipeline pipeline; + pipeline.init(Pipe::unitePipes(std::move(pipes))); + + if (pipeline.getNumStreams() > 1) + { + auto transform = std::make_shared( + pipeline.getHeader(), + pipeline.getNumStreams(), + sort_description, + num_rows); + + pipeline.addTransform(std::move(transform)); + } + + PullingPipelineExecutor executor(pipeline); + Block block; + while (executor.pull(block)) tmp_blocks.emplace_back(block); blocks.swap(tmp_blocks); diff --git a/src/Interpreters/SortedBlocksWriter.h b/src/Interpreters/SortedBlocksWriter.h index 3c7bd8dc625..c65511e943e 100644 --- 
a/src/Interpreters/SortedBlocksWriter.h +++ b/src/Interpreters/SortedBlocksWriter.h @@ -6,9 +6,11 @@ #include #include #include +#include #include #include + namespace DB { @@ -16,6 +18,8 @@ class TableJoin; class MergeJoinCursor; struct MergeJoinEqualRange; +class Pipe; + class IVolume; using VolumePtr = std::shared_ptr; @@ -55,7 +59,7 @@ struct SortedBlocksWriter struct PremergedFiles { SortedFiles files; - BlockInputStreams streams; + Pipe pipe; }; static constexpr const size_t num_streams = 2; @@ -93,7 +97,7 @@ struct SortedBlocksWriter } String getPath() const; - BlockInputStreamPtr streamFromFile(const TmpFilePtr & file) const; + Pipe streamFromFile(const TmpFilePtr & file) const; void insert(Block && block); TmpFilePtr flush(const BlocksList & blocks) const; diff --git a/src/Interpreters/SynonymsExtensions.cpp b/src/Interpreters/SynonymsExtensions.cpp new file mode 100644 index 00000000000..22fa91a4349 --- /dev/null +++ b/src/Interpreters/SynonymsExtensions.cpp @@ -0,0 +1,157 @@ +#if !defined(ARCADIA_BUILD) +# include "config_core.h" +#endif + +#if USE_NLP + +#include +#include + +#include +#include + +#include +#include + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int LOGICAL_ERROR; + extern const int UNKNOWN_ELEMENT_IN_CONFIG; + extern const int INVALID_CONFIG_PARAMETER; +} + +class PlainSynonymsExtension : public ISynonymsExtension +{ +private: + using Container = std::list; + using LookupTable = std::unordered_map; + + Container synsets; + LookupTable table; + +public: + explicit PlainSynonymsExtension(const String & path) + { + std::ifstream file(path); + if (!file.is_open()) + throw Exception("Cannot find synonyms extension at: " + path, + ErrorCodes::INVALID_CONFIG_PARAMETER); + + String line; + while (std::getline(file, line)) + { + Synset synset; + boost::split(synset, line, boost::is_any_of("\t ")); + if (!synset.empty()) + { + synsets.emplace_back(std::move(synset)); + + for (const auto &word : synsets.back()) + table[word] = &synsets.back(); + } + } + } + + const Synset * getSynonyms(std::string_view token) const override + { + auto it = table.find(token); + + if (it != table.end()) + return (*it).second; + + return nullptr; + } +}; + +class WordnetSynonymsExtension : public ISynonymsExtension +{ +private: + wnb::wordnet wn; + +public: + explicit WordnetSynonymsExtension(const String & path) : wn(path) {} + + const Synset * getSynonyms(std::string_view token) const override + { + return wn.get_synset(std::string(token)); + } +}; + +/// Duplicate of code from StringUtils.h. Copied here for less dependencies. +static bool startsWith(const std::string & s, const char * prefix) +{ + return s.size() >= strlen(prefix) && 0 == memcmp(s.data(), prefix, strlen(prefix)); +} + +SynonymsExtensions::SynonymsExtensions(const Poco::Util::AbstractConfiguration & config) +{ + String prefix = "synonyms_extensions"; + Poco::Util::AbstractConfiguration::Keys keys; + + if (!config.has(prefix)) + throw Exception(ErrorCodes::INVALID_CONFIG_PARAMETER, + "No synonims extensions specified in server config on prefix '{}'", prefix); + + config.keys(prefix, keys); + + for (const auto & key : keys) + { + if (startsWith(key, "extension")) + { + const auto & ext_name = config.getString(prefix + "." + key + ".name", ""); + const auto & ext_path = config.getString(prefix + "." + key + ".path", ""); + const auto & ext_type = config.getString(prefix + "." + key + ".type", ""); + + if (ext_name.empty()) + throw Exception("Extension name in config is not specified here: " + prefix + "." 
+ key + ".name", + ErrorCodes::INVALID_CONFIG_PARAMETER); + if (ext_path.empty()) + throw Exception("Extension path in config is not specified here: " + prefix + "." + key + ".path", + ErrorCodes::INVALID_CONFIG_PARAMETER); + if (ext_type.empty()) + throw Exception("Extension type in config is not specified here: " + prefix + "." + key + ".type", + ErrorCodes::INVALID_CONFIG_PARAMETER); + if (ext_type != "plain" && ext_type != "wordnet") + throw Exception("Unknown extension type in config: " + prefix + "." + key + ".type, must be 'plain' or 'wordnet'", + ErrorCodes::INVALID_CONFIG_PARAMETER); + + info[ext_name].path = ext_path; + info[ext_name].type = ext_type; + } + else + throw Exception("Unknown element in config: " + prefix + "." + key + ", must be 'extension'", + ErrorCodes::UNKNOWN_ELEMENT_IN_CONFIG); + } +} + +SynonymsExtensions::ExtPtr SynonymsExtensions::getExtension(const String & name) +{ + std::lock_guard guard(mutex); + + if (extensions.find(name) != extensions.end()) + return extensions[name]; + + if (info.find(name) != info.end()) + { + const Info & ext_info = info[name]; + + if (ext_info.type == "plain") + extensions[name] = std::make_shared(ext_info.path); + else if (ext_info.type == "wordnet") + extensions[name] = std::make_shared(ext_info.path); + else + throw Exception("Unknown extension type: " + ext_info.type, ErrorCodes::LOGICAL_ERROR); + + return extensions[name]; + } + + throw Exception("Extension named: '" + name + "' is not found", + ErrorCodes::INVALID_CONFIG_PARAMETER); +} + +} + +#endif diff --git a/src/Interpreters/SynonymsExtensions.h b/src/Interpreters/SynonymsExtensions.h new file mode 100644 index 00000000000..fd2bf03e162 --- /dev/null +++ b/src/Interpreters/SynonymsExtensions.h @@ -0,0 +1,57 @@ +#pragma once + +#if !defined(ARCADIA_BUILD) +# include "config_core.h" +#endif + +#if USE_NLP + +#include +#include + +#include +#include +#include +#include +#include + +namespace DB +{ + +class ISynonymsExtension +{ +public: + using Synset = std::vector; + + virtual const Synset * getSynonyms(std::string_view token) const = 0; + + virtual ~ISynonymsExtension() = default; +}; + +class SynonymsExtensions +{ +public: + using ExtPtr = std::shared_ptr; + + explicit SynonymsExtensions(const Poco::Util::AbstractConfiguration & config); + + ExtPtr getExtension(const String & name); + +private: + struct Info + { + String path; + String type; + }; + + using ExtContainer = std::unordered_map; + using InfoContainer = std::unordered_map; + + std::mutex mutex; + ExtContainer extensions; + InfoContainer info; +}; + +} + +#endif diff --git a/src/Interpreters/SystemLog.cpp b/src/Interpreters/SystemLog.cpp index 31ceca8ec05..d3224a53ccd 100644 --- a/src/Interpreters/SystemLog.cpp +++ b/src/Interpreters/SystemLog.cpp @@ -8,6 +8,7 @@ #include #include #include +#include #include #include @@ -103,6 +104,7 @@ SystemLogs::SystemLogs(ContextPtr global_context, const Poco::Util::AbstractConf opentelemetry_span_log = createSystemLog( global_context, "system", "opentelemetry_span_log", config, "opentelemetry_span_log"); + zookeeper_log = createSystemLog(global_context, "system", "zookeeper_log", config, "zookeeper_log"); if (query_log) logs.emplace_back(query_log.get()); @@ -122,6 +124,8 @@ SystemLogs::SystemLogs(ContextPtr global_context, const Poco::Util::AbstractConf logs.emplace_back(asynchronous_metric_log.get()); if (opentelemetry_span_log) logs.emplace_back(opentelemetry_span_log.get()); + if (zookeeper_log) + logs.emplace_back(zookeeper_log.get()); try { diff --git 
a/src/Interpreters/SystemLog.h b/src/Interpreters/SystemLog.h index ee3116362e5..b94f3f7d456 100644 --- a/src/Interpreters/SystemLog.h +++ b/src/Interpreters/SystemLog.h @@ -74,6 +74,7 @@ class CrashLog; class MetricLog; class AsynchronousMetricLog; class OpenTelemetrySpanLog; +class ZooKeeperLog; class ISystemLog @@ -110,6 +111,8 @@ struct SystemLogs std::shared_ptr asynchronous_metric_log; /// OpenTelemetry trace spans. std::shared_ptr opentelemetry_span_log; + /// Used to log all actions of ZooKeeper client + std::shared_ptr zookeeper_log; std::vector logs; }; diff --git a/src/Interpreters/TableJoin.cpp b/src/Interpreters/TableJoin.cpp index 122e2cd6479..20e8f6b18b4 100644 --- a/src/Interpreters/TableJoin.cpp +++ b/src/Interpreters/TableJoin.cpp @@ -1,17 +1,17 @@ #include -#include - -#include - -#include -#include -#include - #include +#include +#include +#include + #include -#include +#include +#include +#include + +#include namespace DB @@ -132,6 +132,8 @@ ASTPtr TableJoin::leftKeysList() const { ASTPtr keys_list = std::make_shared(); keys_list->children = key_asts_left; + if (ASTPtr extra_cond = joinConditionColumn(JoinTableSide::Left)) + keys_list->children.push_back(extra_cond); return keys_list; } @@ -140,6 +142,8 @@ ASTPtr TableJoin::rightKeysList() const ASTPtr keys_list = std::make_shared(); if (hasOn()) keys_list->children = key_asts_right; + if (ASTPtr extra_cond = joinConditionColumn(JoinTableSide::Right)) + keys_list->children.push_back(extra_cond); return keys_list; } @@ -176,22 +180,6 @@ NamesWithAliases TableJoin::getRequiredColumns(const Block & sample, const Names return getNamesWithAliases(required_columns); } -void TableJoin::splitAdditionalColumns(const Block & sample_block, Block & block_keys, Block & block_others) const -{ - block_others = materializeBlock(sample_block); - - for (const String & column_name : key_names_right) - { - /// Extract right keys with correct keys order. There could be the same key names. - if (!block_keys.has(column_name)) - { - auto & col = block_others.getByName(column_name); - block_keys.insert(col); - block_others.erase(column_name); - } - } -} - Block TableJoin::getRequiredRightKeys(const Block & right_table_keys, std::vector & keys_sources) const { const Names & left_keys = keyNamesLeft(); @@ -474,4 +462,48 @@ String TableJoin::renamedRightColumnName(const String & name) const return name; } +void TableJoin::addJoinCondition(const ASTPtr & ast, bool is_left) +{ + LOG_TRACE(&Poco::Logger::get("TableJoin"), "Add join condition for {} table: {}", (is_left ? 
"left" : "right"), queryToString(ast)); + + if (is_left) + on_filter_condition_asts_left.push_back(ast); + else + on_filter_condition_asts_right.push_back(ast); +} + +/// Returns all conditions related to one table joined with 'and' function +static ASTPtr buildJoinConditionColumn(const ASTs & on_filter_condition_asts) +{ + if (on_filter_condition_asts.empty()) + return nullptr; + + if (on_filter_condition_asts.size() == 1) + return on_filter_condition_asts[0]; + + auto function = std::make_shared(); + function->name = "and"; + function->arguments = std::make_shared(); + function->children.push_back(function->arguments); + function->arguments->children = on_filter_condition_asts; + return function; +} + +ASTPtr TableJoin::joinConditionColumn(JoinTableSide side) const +{ + if (side == JoinTableSide::Left) + return buildJoinConditionColumn(on_filter_condition_asts_left); + return buildJoinConditionColumn(on_filter_condition_asts_right); +} + +std::pair TableJoin::joinConditionColumnNames() const +{ + std::pair res; + if (auto cond_ast = joinConditionColumn(JoinTableSide::Left)) + res.first = cond_ast->getColumnName(); + if (auto cond_ast = joinConditionColumn(JoinTableSide::Right)) + res.second = cond_ast->getColumnName(); + return res; +} + } diff --git a/src/Interpreters/TableJoin.h b/src/Interpreters/TableJoin.h index 08098e5378c..4c8c16028f5 100644 --- a/src/Interpreters/TableJoin.h +++ b/src/Interpreters/TableJoin.h @@ -33,6 +33,12 @@ struct Settings; class IVolume; using VolumePtr = std::shared_ptr; +enum class JoinTableSide +{ + Left, + Right +}; + class TableJoin { @@ -67,9 +73,12 @@ private: Names key_names_left; Names key_names_right; /// Duplicating names are qualified. + ASTs on_filter_condition_asts_left; + ASTs on_filter_condition_asts_right; ASTs key_asts_left; ASTs key_asts_right; + ASTTableJoin table_join; ASOF::Inequality asof_inequality = ASOF::Inequality::GreaterOrEquals; @@ -150,6 +159,23 @@ public: void addUsingKey(const ASTPtr & ast); void addOnKeys(ASTPtr & left_table_ast, ASTPtr & right_table_ast); + /* Conditions for left/right table from JOIN ON section. + * + * Conditions for left and right tables stored separately and united with 'and' function into one column. + * For example for query: + * SELECT ... JOIN ... ON t1.id == t2.id AND expr11(t1) AND expr21(t2) AND expr12(t1) AND expr22(t2) + * + * We will build two new ASTs: `expr11(t1) AND expr12(t1)`, `expr21(t2) AND expr22(t2)` + * Such columns will be added and calculated for left and right tables respectively. + * Only rows where conditions are met (where new columns have non-zero value) will be joined. + * + * NOTE: non-equi condition containing columns from different tables (like `... ON t1.id = t2.id AND t1.val > t2.val) + * doesn't supported yet, it can be added later. 
+ */ + void addJoinCondition(const ASTPtr & ast, bool is_left); + ASTPtr joinConditionColumn(JoinTableSide side) const; + std::pair joinConditionColumnNames() const; + bool hasUsing() const { return table_join.using_expression_list != nullptr; } bool hasOn() const { return table_join.on_expression != nullptr; } @@ -201,8 +227,6 @@ public: /// StorageJoin overrides key names (cause of different names qualification) void setRightKeys(const Names & keys) { key_names_right = keys; } - /// Split key and other columns by keys name list - void splitAdditionalColumns(const Block & sample_block, Block & block_keys, Block & block_others) const; Block getRequiredRightKeys(const Block & right_table_keys, std::vector & keys_sources) const; String renamedRightColumnName(const String & name) const; diff --git a/src/Interpreters/TreeRewriter.cpp b/src/Interpreters/TreeRewriter.cpp index 2bdad8b698f..cc345004f6f 100644 --- a/src/Interpreters/TreeRewriter.cpp +++ b/src/Interpreters/TreeRewriter.cpp @@ -532,9 +532,12 @@ void collectJoinedColumns(TableJoin & analyzed_join, const ASTTableJoin & table_ CollectJoinOnKeysVisitor::Data data{analyzed_join, tables[0], tables[1], aliases, is_asof}; CollectJoinOnKeysVisitor(data).visit(table_join.on_expression); - if (!data.has_some) + if (analyzed_join.keyNamesLeft().empty()) + { throw Exception("Cannot get JOIN keys from JOIN ON section: " + queryToString(table_join.on_expression), ErrorCodes::INVALID_JOIN_ON_EXPRESSION); + } + if (is_asof) data.asofToJoinKeys(); } @@ -951,8 +954,13 @@ TreeRewriterResultPtr TreeRewriter::analyzeSelect( /// rewrite filters for select query, must go after getArrayJoinedColumns if (settings.optimize_respect_aliases && result.metadata_snapshot) { - replaceAliasColumnsInQuery(query, result.metadata_snapshot->getColumns(), result.array_join_result_to_source, getContext()); - result.collectUsedColumns(query, true); + /// If query is changed, we need to redo some work to correct name resolution. 
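The TreeRewriter hunk that follows only re-runs getAggregates, getWindowFunctions and collectUsedColumns when replaceAliasColumnsInQuery reports that it actually changed the query. A small sketch of that "recompute derived results only if the rewrite changed something" pattern, with a string standing in for the AST (all names here are illustrative):

```
#include <iostream>
#include <string>
#include <vector>

// A rewrite step that reports whether it changed anything.
bool replaceAliases(std::string & query)
{
    auto pos = query.find("alias_col");
    if (pos == std::string::npos)
        return false;
    query.replace(pos, 9, "(x + 1)");
    return true;
}

// A result derived from the query text; it becomes stale if the text changes.
std::vector<std::string> collectAggregates(const std::string & query)
{
    std::vector<std::string> res;
    if (query.find("sum(") != std::string::npos)
        res.push_back("sum");
    return res;
}

int main()
{
    std::string query = "SELECT sum(alias_col) FROM t";
    auto aggregates = collectAggregates(query);

    if (replaceAliases(query))                  // the query changed...
        aggregates = collectAggregates(query);  // ...so re-derive anything computed from it

    std::cout << query << " -> " << aggregates.size() << " aggregate(s)\n";
}
```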
+ if (replaceAliasColumnsInQuery(query, result.metadata_snapshot->getColumns(), result.array_join_result_to_source, getContext())) + { + result.aggregates = getAggregates(query, *select_query); + result.window_function_asts = getWindowFunctions(query, *select_query); + result.collectUsedColumns(query, true); + } } result.ast_join = select_query->join(); diff --git a/src/Interpreters/ZooKeeperLog.cpp b/src/Interpreters/ZooKeeperLog.cpp new file mode 100644 index 00000000000..bc187876d24 --- /dev/null +++ b/src/Interpreters/ZooKeeperLog.cpp @@ -0,0 +1,202 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace DB +{ + +NamesAndTypesList ZooKeeperLogElement::getNamesAndTypes() +{ + auto type_enum = std::make_shared( + DataTypeEnum8::Values + { + {"Request", static_cast(REQUEST)}, + {"Response", static_cast(RESPONSE)}, + {"Finalize", static_cast(FINALIZE)}, + }); + + auto op_num_enum = std::make_shared( + DataTypeEnum16::Values + { + {"Watch", 0}, + {"Close", static_cast(Coordination::OpNum::Close)}, + {"Error", static_cast(Coordination::OpNum::Error)}, + {"Create", static_cast(Coordination::OpNum::Create)}, + {"Remove", static_cast(Coordination::OpNum::Remove)}, + {"Exists", static_cast(Coordination::OpNum::Exists)}, + {"Get", static_cast(Coordination::OpNum::Get)}, + {"Set", static_cast(Coordination::OpNum::Set)}, + {"GetACL", static_cast(Coordination::OpNum::GetACL)}, + {"SetACL", static_cast(Coordination::OpNum::SetACL)}, + {"SimpleList", static_cast(Coordination::OpNum::SimpleList)}, + {"Sync", static_cast(Coordination::OpNum::Sync)}, + {"Heartbeat", static_cast(Coordination::OpNum::Heartbeat)}, + {"List", static_cast(Coordination::OpNum::List)}, + {"Check", static_cast(Coordination::OpNum::Check)}, + {"Multi", static_cast(Coordination::OpNum::Multi)}, + {"Auth", static_cast(Coordination::OpNum::Auth)}, + {"SessionID", static_cast(Coordination::OpNum::SessionID)}, + }); + + auto error_enum = std::make_shared( + DataTypeEnum8::Values + { + {"ZOK", static_cast(Coordination::Error::ZOK)}, + + {"ZSYSTEMERROR", static_cast(Coordination::Error::ZSYSTEMERROR)}, + {"ZRUNTIMEINCONSISTENCY", static_cast(Coordination::Error::ZRUNTIMEINCONSISTENCY)}, + {"ZDATAINCONSISTENCY", static_cast(Coordination::Error::ZDATAINCONSISTENCY)}, + {"ZCONNECTIONLOSS", static_cast(Coordination::Error::ZCONNECTIONLOSS)}, + {"ZMARSHALLINGERROR", static_cast(Coordination::Error::ZMARSHALLINGERROR)}, + {"ZUNIMPLEMENTED", static_cast(Coordination::Error::ZUNIMPLEMENTED)}, + {"ZOPERATIONTIMEOUT", static_cast(Coordination::Error::ZOPERATIONTIMEOUT)}, + {"ZBADARGUMENTS", static_cast(Coordination::Error::ZBADARGUMENTS)}, + {"ZINVALIDSTATE", static_cast(Coordination::Error::ZINVALIDSTATE)}, + + {"ZAPIERROR", static_cast(Coordination::Error::ZAPIERROR)}, + {"ZNONODE", static_cast(Coordination::Error::ZNONODE)}, + {"ZNOAUTH", static_cast(Coordination::Error::ZNOAUTH)}, + {"ZBADVERSION", static_cast(Coordination::Error::ZBADVERSION)}, + {"ZNOCHILDRENFOREPHEMERALS", static_cast(Coordination::Error::ZNOCHILDRENFOREPHEMERALS)}, + {"ZNODEEXISTS", static_cast(Coordination::Error::ZNODEEXISTS)}, + {"ZNOTEMPTY", static_cast(Coordination::Error::ZNOTEMPTY)}, + {"ZSESSIONEXPIRED", static_cast(Coordination::Error::ZSESSIONEXPIRED)}, + {"ZINVALIDCALLBACK", static_cast(Coordination::Error::ZINVALIDCALLBACK)}, + {"ZINVALIDACL", static_cast(Coordination::Error::ZINVALIDACL)}, + {"ZAUTHFAILED", 
static_cast(Coordination::Error::ZAUTHFAILED)}, + {"ZCLOSING", static_cast(Coordination::Error::ZCLOSING)}, + {"ZNOTHING", static_cast(Coordination::Error::ZNOTHING)}, + {"ZSESSIONMOVED", static_cast(Coordination::Error::ZSESSIONMOVED)}, + }); + + auto watch_type_enum = std::make_shared( + DataTypeEnum8::Values + { + {"CREATED", static_cast(Coordination::Event::CREATED)}, + {"DELETED", static_cast(Coordination::Event::DELETED)}, + {"CHANGED", static_cast(Coordination::Event::CHANGED)}, + {"CHILD", static_cast(Coordination::Event::CHILD)}, + {"SESSION", static_cast(Coordination::Event::SESSION)}, + {"NOTWATCHING", static_cast(Coordination::Event::NOTWATCHING)}, + }); + + auto watch_state_enum = std::make_shared( + DataTypeEnum16::Values + { + {"EXPIRED_SESSION", static_cast(Coordination::State::EXPIRED_SESSION)}, + {"AUTH_FAILED", static_cast(Coordination::State::AUTH_FAILED)}, + {"CONNECTING", static_cast(Coordination::State::CONNECTING)}, + {"ASSOCIATING", static_cast(Coordination::State::ASSOCIATING)}, + {"CONNECTED", static_cast(Coordination::State::CONNECTED)}, + {"NOTCONNECTED", static_cast(Coordination::State::NOTCONNECTED)}, + }); + + return + { + {"type", std::move(type_enum)}, + {"event_date", std::make_shared()}, + {"event_time", std::make_shared(6)}, + {"address", DataTypeFactory::instance().get("IPv6")}, + {"port", std::make_shared()}, + {"session_id", std::make_shared()}, + + {"xid", std::make_shared()}, + {"has_watch", std::make_shared()}, + {"op_num", op_num_enum}, + {"path", std::make_shared()}, + + {"data", std::make_shared()}, + + {"is_ephemeral", std::make_shared()}, + {"is_sequential", std::make_shared()}, + + {"version", std::make_shared(std::make_shared())}, + + {"requests_size", std::make_shared()}, + {"request_idx", std::make_shared()}, + + {"zxid", std::make_shared()}, + {"error", std::make_shared(error_enum)}, + + {"watch_type", std::make_shared(watch_type_enum)}, + {"watch_state", std::make_shared(watch_state_enum)}, + + {"path_created", std::make_shared()}, + + {"stat_czxid", std::make_shared()}, + {"stat_mzxid", std::make_shared()}, + {"stat_pzxid", std::make_shared()}, + {"stat_version", std::make_shared()}, + {"stat_cversion", std::make_shared()}, + {"stat_dataLength", std::make_shared()}, + {"stat_numChildren", std::make_shared()}, + + {"children", std::make_shared(std::make_shared())}, + }; +} + +void ZooKeeperLogElement::appendToBlock(MutableColumns & columns) const +{ + assert(type != UNKNOWN); + size_t i = 0; + + columns[i++]->insert(type); + auto event_time_seconds = event_time / 1000000; + columns[i++]->insert(DateLUT::instance().toDayNum(event_time_seconds).toUnderType()); + columns[i++]->insert(event_time); + columns[i++]->insert(IPv6ToBinary(address.host()).data()); + columns[i++]->insert(address.port()); + columns[i++]->insert(session_id); + + columns[i++]->insert(xid); + columns[i++]->insert(has_watch); + columns[i++]->insert(op_num); + columns[i++]->insert(path); + + columns[i++]->insert(data); + + columns[i++]->insert(is_ephemeral); + columns[i++]->insert(is_sequential); + + columns[i++]->insert(version ? Field(*version) : Field()); + + columns[i++]->insert(requests_size); + columns[i++]->insert(request_idx); + + columns[i++]->insert(zxid); + columns[i++]->insert(error ? Field(*error) : Field()); + + columns[i++]->insert(watch_type ? Field(*watch_type) : Field()); + columns[i++]->insert(watch_state ? 
Field(*watch_state) : Field()); + + columns[i++]->insert(path_created); + + columns[i++]->insert(stat.czxid); + columns[i++]->insert(stat.mzxid); + columns[i++]->insert(stat.pzxid); + columns[i++]->insert(stat.version); + columns[i++]->insert(stat.cversion); + columns[i++]->insert(stat.dataLength); + columns[i++]->insert(stat.numChildren); + + Array children_array; + for (const auto & c : children) + children_array.emplace_back(c); + columns[i++]->insert(children_array); +} + +}; diff --git a/src/Interpreters/ZooKeeperLog.h b/src/Interpreters/ZooKeeperLog.h new file mode 100644 index 00000000000..d3ef68625af --- /dev/null +++ b/src/Interpreters/ZooKeeperLog.h @@ -0,0 +1,76 @@ +#pragma once + +#include +#include +#include +#include + + +namespace DB +{ + +struct ZooKeeperLogElement +{ + enum Type + { + UNKNOWN = 0, + REQUEST = 1, + RESPONSE = 2, + FINALIZE = 3 + }; + + Type type = UNKNOWN; + Decimal64 event_time = 0; + Poco::Net::SocketAddress address; + Int64 session_id = 0; + + /// Common request info + Int32 xid = 0; + bool has_watch = false; + Int32 op_num = 0; + String path; + + /// create, set + String data; + + /// create + bool is_ephemeral = false; + bool is_sequential = false; + + /// remove, check, set + std::optional version; + + /// multi + UInt32 requests_size = 0; + UInt32 request_idx = 0; + + /// Common response info + Int64 zxid = 0; + std::optional error; + + /// watch + std::optional watch_type; + std::optional watch_state; + + /// create + String path_created; + + /// exists, get, set, list + Coordination::Stat stat = {}; + + /// list + Strings children; + + + static std::string name() { return "ZooKeeperLog"; } + static NamesAndTypesList getNamesAndTypes(); + static NamesAndAliases getNamesAndAliases() { return {}; } + void appendToBlock(MutableColumns & columns) const; +}; + +class ZooKeeperLog : public SystemLog +{ + using SystemLog::SystemLog; +}; + +} diff --git a/src/Interpreters/executeDDLQueryOnCluster.cpp b/src/Interpreters/executeDDLQueryOnCluster.cpp index c5dec2cf214..180a4f9af3e 100644 --- a/src/Interpreters/executeDDLQueryOnCluster.cpp +++ b/src/Interpreters/executeDDLQueryOnCluster.cpp @@ -14,11 +14,10 @@ #include #include #include -#include -#include -#include +#include #include + namespace fs = std::filesystem; namespace DB @@ -168,48 +167,72 @@ BlockIO executeDDLQueryOnCluster(const ASTPtr & query_ptr_, ContextPtr context, return getDistributedDDLStatus(node_path, entry, context); } + +class DDLQueryStatusSource final : public SourceWithProgress +{ +public: + DDLQueryStatusSource( + const String & zk_node_path, const DDLLogEntry & entry, ContextPtr context_, const std::optional & hosts_to_wait = {}); + + String getName() const override { return "DDLQueryStatus"; } + Chunk generate() override; + Status prepare() override; + +private: + static Strings getChildrenAllowNoNode(const std::shared_ptr & zookeeper, const String & node_path); + + Strings getNewAndUpdate(const Strings & current_list_of_finished_hosts); + + std::pair parseHostAndPort(const String & host_id) const; + + String node_path; + ContextPtr context; + Stopwatch watch; + Poco::Logger * log; + + NameSet waiting_hosts; /// hosts from task host list + NameSet finished_hosts; /// finished hosts from host list + NameSet ignoring_hosts; /// appeared hosts that are not in hosts list + Strings current_active_hosts; /// Hosts that were in active state at the last check + size_t num_hosts_finished = 0; + + /// Save the first detected error and throw it at the end of execution + std::unique_ptr 
first_exception; + + Int64 timeout_seconds = 120; + bool by_hostname = true; + bool throw_on_timeout = true; + bool timeout_exceeded = false; +}; + + BlockIO getDistributedDDLStatus(const String & node_path, const DDLLogEntry & entry, ContextPtr context, const std::optional & hosts_to_wait) { BlockIO io; if (context->getSettingsRef().distributed_ddl_task_timeout == 0) return io; - BlockInputStreamPtr stream = std::make_shared(node_path, entry, context, hosts_to_wait); - if (context->getSettingsRef().distributed_ddl_output_mode == DistributedDDLOutputMode::NONE) - { - /// Wait for query to finish, but ignore output - auto null_output = std::make_shared(stream->getHeader()); - stream = std::make_shared(std::move(stream), std::move(null_output)); - } + ProcessorPtr processor = std::make_shared(node_path, entry, context, hosts_to_wait); + io.pipeline.init(Pipe{processor}); + + if (context->getSettingsRef().distributed_ddl_output_mode == DistributedDDLOutputMode::NONE) + io.pipeline.setSinks([](const Block & header, QueryPipeline::StreamType){ return std::make_shared(header); }); - io.in = std::move(stream); return io; } -DDLQueryStatusInputStream::DDLQueryStatusInputStream(const String & zk_node_path, const DDLLogEntry & entry, ContextPtr context_, - const std::optional & hosts_to_wait) - : node_path(zk_node_path) - , context(context_) - , watch(CLOCK_MONOTONIC_COARSE) - , log(&Poco::Logger::get("DDLQueryStatusInputStream")) +static Block getSampleBlock(ContextPtr context_, bool hosts_to_wait) { - if (context->getSettingsRef().distributed_ddl_output_mode == DistributedDDLOutputMode::THROW || - context->getSettingsRef().distributed_ddl_output_mode == DistributedDDLOutputMode::NONE) - throw_on_timeout = true; - else if (context->getSettingsRef().distributed_ddl_output_mode == DistributedDDLOutputMode::NULL_STATUS_ON_TIMEOUT || - context->getSettingsRef().distributed_ddl_output_mode == DistributedDDLOutputMode::NEVER_THROW) - throw_on_timeout = false; - else - throw Exception(ErrorCodes::LOGICAL_ERROR, "Unknown output mode"); + auto output_mode = context_->getSettingsRef().distributed_ddl_output_mode; auto maybe_make_nullable = [&](const DataTypePtr & type) -> DataTypePtr { - if (throw_on_timeout) + if (output_mode == DistributedDDLOutputMode::THROW || output_mode == DistributedDDLOutputMode::NONE) return type; return std::make_shared(type); }; - sample = Block{ + Block res = Block{ {std::make_shared(), "host"}, {std::make_shared(), "port"}, {maybe_make_nullable(std::make_shared()), "status"}, @@ -218,11 +241,27 @@ DDLQueryStatusInputStream::DDLQueryStatusInputStream(const String & zk_node_path {std::make_shared(), "num_hosts_active"}, }; + if (hosts_to_wait) + res.erase("port"); + + return res; +} + +DDLQueryStatusSource::DDLQueryStatusSource( + const String & zk_node_path, const DDLLogEntry & entry, ContextPtr context_, const std::optional & hosts_to_wait) + : SourceWithProgress(getSampleBlock(context_, hosts_to_wait.has_value()), true) + , node_path(zk_node_path) + , context(context_) + , watch(CLOCK_MONOTONIC_COARSE) + , log(&Poco::Logger::get("DDLQueryStatusInputStream")) +{ + auto output_mode = context->getSettingsRef().distributed_ddl_output_mode; + throw_on_timeout = output_mode == DistributedDDLOutputMode::THROW || output_mode == DistributedDDLOutputMode::NONE; + if (hosts_to_wait) { waiting_hosts = NameSet(hosts_to_wait->begin(), hosts_to_wait->end()); by_hostname = false; - sample.erase("port"); } else { @@ -231,11 +270,10 @@ DDLQueryStatusInputStream::DDLQueryStatusInputStream(const 
String & zk_node_path } addTotalRowsApprox(waiting_hosts.size()); - timeout_seconds = context->getSettingsRef().distributed_ddl_task_timeout; } -std::pair DDLQueryStatusInputStream::parseHostAndPort(const String & host_id) const +std::pair DDLQueryStatusSource::parseHostAndPort(const String & host_id) const { String host = host_id; UInt16 port = 0; @@ -248,37 +286,28 @@ std::pair DDLQueryStatusInputStream::parseHostAndPort(const Stri return {host, port}; } -Block DDLQueryStatusInputStream::readImpl() +Chunk DDLQueryStatusSource::generate() { - Block res; bool all_hosts_finished = num_hosts_finished >= waiting_hosts.size(); + /// Seems like num_hosts_finished cannot be strictly greater than waiting_hosts.size() assert(num_hosts_finished <= waiting_hosts.size()); - if (all_hosts_finished || timeout_exceeded) - { - bool throw_if_error_on_host = context->getSettingsRef().distributed_ddl_output_mode != DistributedDDLOutputMode::NEVER_THROW; - if (first_exception && throw_if_error_on_host) - throw Exception(*first_exception); - return res; - } + if (all_hosts_finished || timeout_exceeded) + return {}; auto zookeeper = context->getZooKeeper(); size_t try_number = 0; - while (res.rows() == 0) + while (true) { if (isCancelled()) - { - bool throw_if_error_on_host = context->getSettingsRef().distributed_ddl_output_mode != DistributedDDLOutputMode::NEVER_THROW; - if (first_exception && throw_if_error_on_host) - throw Exception(*first_exception); - - return res; - } + return {}; if (timeout_seconds >= 0 && watch.elapsedSeconds() > timeout_seconds) { + timeout_exceeded = true; + size_t num_unfinished_hosts = waiting_hosts.size() - num_hosts_finished; size_t num_active_hosts = current_active_hosts.size(); @@ -286,10 +315,13 @@ Block DDLQueryStatusInputStream::readImpl() "There are {} unfinished hosts ({} of them are currently active), " "they are going to execute the query in background"; if (throw_on_timeout) - throw Exception(ErrorCodes::TIMEOUT_EXCEEDED, msg_format, - node_path, timeout_seconds, num_unfinished_hosts, num_active_hosts); + { + if (!first_exception) + first_exception = std::make_unique(ErrorCodes::TIMEOUT_EXCEEDED, msg_format, + node_path, timeout_seconds, num_unfinished_hosts, num_active_hosts); + return {}; + } - timeout_exceeded = true; LOG_INFO(log, msg_format, node_path, timeout_seconds, num_unfinished_hosts, num_active_hosts); NameSet unfinished_hosts = waiting_hosts; @@ -297,7 +329,7 @@ Block DDLQueryStatusInputStream::readImpl() unfinished_hosts.erase(host_id); /// Query is not finished on the rest hosts, so fill the corresponding rows with NULLs. - MutableColumns columns = sample.cloneEmptyColumns(); + MutableColumns columns = output.getHeader().cloneEmptyColumns(); for (const String & host_id : unfinished_hosts) { auto [host, port] = parseHostAndPort(host_id); @@ -310,8 +342,7 @@ Block DDLQueryStatusInputStream::readImpl() columns[num++]->insert(num_unfinished_hosts); columns[num++]->insert(num_active_hosts); } - res = sample.cloneWithColumns(std::move(columns)); - return res; + return Chunk(std::move(columns), unfinished_hosts.size()); } if (num_hosts_finished != 0 || try_number != 0) @@ -321,9 +352,13 @@ Block DDLQueryStatusInputStream::readImpl() if (!zookeeper->exists(node_path)) { - throw Exception(ErrorCodes::UNFINISHED, - "Cannot provide query execution status. The query's node {} has been deleted by the cleaner since it was finished (or its lifetime is expired)", - node_path); + /// Paradoxically, this exception will be throw even in case of "never_throw" mode. 
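Instead of throwing from the middle of generate(), DDLQueryStatusSource now records the first error and surfaces it only after all status rows have been produced (prepare() pushes it into the output port once the source is finished). A simplified standalone model of that deferred-error pattern; the names are illustrative and nothing here uses the real processor classes.

```
#include <iostream>
#include <memory>
#include <optional>
#include <stdexcept>
#include <vector>

// Deliver all available rows first and raise the recorded error only at the end,
// instead of aborting the read loop as soon as a bad row is seen.
class DeferredErrorSource
{
public:
    explicit DeferredErrorSource(std::vector<int> rows_) : rows(std::move(rows_)) {}

    std::optional<int> next()
    {
        if (pos < rows.size())
        {
            int value = rows[pos++];
            if (value < 0 && !first_error)
                first_error = std::make_unique<std::runtime_error>("negative status on a host");
            return value;             // still emit the row that carried the error
        }
        if (first_error)
            throw *first_error;       // surface the error only after all rows were read
        return std::nullopt;
    }

private:
    std::vector<int> rows;
    size_t pos = 0;
    std::unique_ptr<std::runtime_error> first_error;
};

int main()
{
    DeferredErrorSource source({0, -1, 0});
    try
    {
        while (auto row = source.next())
            std::cout << *row << ' ';
    }
    catch (const std::exception & e)
    {
        std::cout << "\nerror: " << e.what() << '\n';
    }
}
```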
+ + if (!first_exception) + first_exception = std::make_unique(ErrorCodes::UNFINISHED, + "Cannot provide query execution status. The query's node {} has been deleted by the cleaner" + " since it was finished (or its lifetime is expired)", node_path); + return {}; } Strings new_hosts = getNewAndUpdate(getChildrenAllowNoNode(zookeeper, fs::path(node_path) / "finished")); @@ -333,7 +368,7 @@ Block DDLQueryStatusInputStream::readImpl() current_active_hosts = getChildrenAllowNoNode(zookeeper, fs::path(node_path) / "active"); - MutableColumns columns = sample.cloneEmptyColumns(); + MutableColumns columns = output.getHeader().cloneEmptyColumns(); for (const String & host_id : new_hosts) { ExecutionStatus status(-1, "Cannot obtain error message"); @@ -345,8 +380,11 @@ Block DDLQueryStatusInputStream::readImpl() auto [host, port] = parseHostAndPort(host_id); - if (status.code != 0 && first_exception == nullptr) + if (status.code != 0 && !first_exception + && context->getSettingsRef().distributed_ddl_output_mode != DistributedDDLOutputMode::NEVER_THROW) + { first_exception = std::make_unique(status.code, "There was an error on [{}:{}]: {}", host, port, status.message); + } ++num_hosts_finished; @@ -359,13 +397,34 @@ Block DDLQueryStatusInputStream::readImpl() columns[num++]->insert(waiting_hosts.size() - num_hosts_finished); columns[num++]->insert(current_active_hosts.size()); } - res = sample.cloneWithColumns(std::move(columns)); - } - return res; + return Chunk(std::move(columns), new_hosts.size()); + } } -Strings DDLQueryStatusInputStream::getChildrenAllowNoNode(const std::shared_ptr & zookeeper, const String & node_path) +IProcessor::Status DDLQueryStatusSource::prepare() +{ + /// This method is overloaded to throw exception after all data is read. + /// Exception is pushed into pipe (instead of simply being thrown) to ensure the order of data processing and exception. + + if (finished) + { + if (first_exception) + { + if (!output.canPush()) + return Status::PortFull; + + output.pushException(std::make_exception_ptr(*first_exception)); + } + + output.finish(); + return Status::Finished; + } + else + return SourceWithProgress::prepare(); +} + +Strings DDLQueryStatusSource::getChildrenAllowNoNode(const std::shared_ptr & zookeeper, const String & node_path) { Strings res; Coordination::Error code = zookeeper->tryGetChildren(node_path, res); @@ -374,7 +433,7 @@ Strings DDLQueryStatusInputStream::getChildrenAllowNoNode(const std::shared_ptr< return res; } -Strings DDLQueryStatusInputStream::getNewAndUpdate(const Strings & current_list_of_finished_hosts) +Strings DDLQueryStatusSource::getNewAndUpdate(const Strings & current_list_of_finished_hosts) { Strings diff; for (const String & host : current_list_of_finished_hosts) @@ -384,7 +443,7 @@ Strings DDLQueryStatusInputStream::getNewAndUpdate(const Strings & current_list_ if (!ignoring_hosts.count(host)) { ignoring_hosts.emplace(host); - LOG_INFO(log, "Unexpected host {} appeared in task {}", host, node_path); + LOG_INFO(log, "Unexpected host {} appeared in task {}", host, node_path); } continue; } diff --git a/src/Interpreters/executeDDLQueryOnCluster.h b/src/Interpreters/executeDDLQueryOnCluster.h index bbd39a6e8ec..f430c2364b2 100644 --- a/src/Interpreters/executeDDLQueryOnCluster.h +++ b/src/Interpreters/executeDDLQueryOnCluster.h @@ -1,8 +1,11 @@ #pragma once #include +#include #include #include +#include + namespace zkutil { @@ -20,54 +23,12 @@ struct DDLLogEntry; bool isSupportedAlterType(int type); /// Pushes distributed DDL query to the queue. 
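getNewAndUpdate() (shown earlier in this hunk) turns the current list of "finished" children into the set of newly finished hosts, warning once about hosts that were never waited for. A self-contained approximation with standard containers, logging via std::cout instead of Poco::Logger:

```
#include <iostream>
#include <set>
#include <string>
#include <vector>

// Return only the hosts that newly appeared in the "finished" list, ignoring hosts
// we never waited for (each unexpected host is reported once).
std::vector<std::string> getNewAndUpdate(
    const std::set<std::string> & waiting_hosts,
    std::set<std::string> & finished_hosts,
    std::set<std::string> & ignoring_hosts,
    const std::vector<std::string> & current_finished)
{
    std::vector<std::string> diff;
    for (const auto & host : current_finished)
    {
        if (!waiting_hosts.count(host))
        {
            if (ignoring_hosts.insert(host).second)
                std::cout << "Unexpected host " << host << " appeared\n";
            continue;
        }
        if (finished_hosts.insert(host).second)
            diff.push_back(host);
    }
    return diff;
}

int main()
{
    std::set<std::string> waiting{"a:9000", "b:9000"};
    std::set<std::string> finished, ignoring;
    auto fresh = getNewAndUpdate(waiting, finished, ignoring, {"a:9000", "c:9000"});
    std::cout << fresh.size() << " newly finished host(s)\n";   // 1
}
```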
-/// Returns DDLQueryStatusInputStream, which reads results of query execution on each host in the cluster. +/// Returns DDLQueryStatusSource, which reads results of query execution on each host in the cluster. BlockIO executeDDLQueryOnCluster(const ASTPtr & query_ptr, ContextPtr context); BlockIO executeDDLQueryOnCluster(const ASTPtr & query_ptr, ContextPtr context, const AccessRightsElements & query_requires_access); BlockIO executeDDLQueryOnCluster(const ASTPtr & query_ptr, ContextPtr context, AccessRightsElements && query_requires_access); -BlockIO getDistributedDDLStatus(const String & node_path, const DDLLogEntry & entry, ContextPtr context, const std::optional & hosts_to_wait = {}); - -class DDLQueryStatusInputStream final : public IBlockInputStream -{ -public: - DDLQueryStatusInputStream(const String & zk_node_path, const DDLLogEntry & entry, ContextPtr context_, const std::optional & hosts_to_wait = {}); - - String getName() const override { return "DDLQueryStatusInputStream"; } - - Block getHeader() const override { return sample; } - - Block getSampleBlock() const { return sample.cloneEmpty(); } - - Block readImpl() override; - -private: - - static Strings getChildrenAllowNoNode(const std::shared_ptr & zookeeper, const String & node_path); - - Strings getNewAndUpdate(const Strings & current_list_of_finished_hosts); - - std::pair parseHostAndPort(const String & host_id) const; - - String node_path; - ContextPtr context; - Stopwatch watch; - Poco::Logger * log; - - Block sample; - - NameSet waiting_hosts; /// hosts from task host list - NameSet finished_hosts; /// finished hosts from host list - NameSet ignoring_hosts; /// appeared hosts that are not in hosts list - Strings current_active_hosts; /// Hosts that were in active state at the last check - size_t num_hosts_finished = 0; - - /// Save the first detected error and throw it at the end of execution - std::unique_ptr first_exception; - - Int64 timeout_seconds = 120; - bool by_hostname = true; - bool throw_on_timeout = true; - bool timeout_exceeded = false; -}; +BlockIO getDistributedDDLStatus( + const String & node_path, const DDLLogEntry & entry, ContextPtr context, const std::optional & hosts_to_wait = {}); } diff --git a/src/Interpreters/executeQuery.cpp b/src/Interpreters/executeQuery.cpp index 99c08c70b7c..3756f1b2765 100644 --- a/src/Interpreters/executeQuery.cpp +++ b/src/Interpreters/executeQuery.cpp @@ -11,7 +11,7 @@ #include #include #include -#include +#include #include #include @@ -31,6 +31,7 @@ #include #include +#include #include #include @@ -52,6 +53,7 @@ #include #include #include +#include namespace ProfileEvents @@ -72,6 +74,7 @@ namespace ErrorCodes { extern const int INTO_OUTFILE_NOT_ALLOWED; extern const int QUERY_WAS_CANCELLED; + extern const int LOGICAL_ERROR; } @@ -260,7 +263,11 @@ static void onExceptionBeforeStart(const String & query_for_logging, ContextPtr elem.query = query_for_logging; elem.normalized_query_hash = normalizedQueryHash(query_for_logging); - // We don't calculate query_kind, databases, tables and columns when the query isn't able to start + // Try log query_kind if ast is valid + if (ast) + elem.query_kind = ast->getQueryKindString(); + + // We don't calculate databases, tables and columns when the query isn't able to start elem.exception_code = getCurrentExceptionCode(); elem.exception = getCurrentExceptionMessage(false); @@ -511,9 +518,9 @@ static std::tuple executeQueryImpl( StoragePtr storage = context->executeTableFunction(input_function); auto & input_storage = 
dynamic_cast(*storage); auto input_metadata_snapshot = input_storage.getInMemoryMetadataPtr(); - BlockInputStreamPtr input_stream = std::make_shared( + auto pipe = getSourceFromFromASTInsertQuery( ast, istr, input_metadata_snapshot->getSampleBlock(), context, input_function); - input_storage.setInputStream(input_stream); + input_storage.setPipe(std::move(pipe)); } } } @@ -669,7 +676,7 @@ static std::tuple executeQueryImpl( } /// Common code for finish and exception callbacks - auto status_info_to_query_log = [](QueryLogElement &element, const QueryStatusInfo &info, const ASTPtr query_ast) mutable + auto status_info_to_query_log = [](QueryLogElement & element, const QueryStatusInfo & info, const ASTPtr query_ast, const ContextPtr context_ptr) mutable { DB::UInt64 query_time = info.elapsed_seconds * 1000000; ProfileEvents::increment(ProfileEvents::QueryTimeMicroseconds, query_time); @@ -694,6 +701,17 @@ static std::tuple executeQueryImpl( element.thread_ids = std::move(info.thread_ids); element.profile_counters = std::move(info.profile_counters); + + const auto & factories_info = context_ptr->getQueryFactoriesInfo(); + element.used_aggregate_functions = factories_info.aggregate_functions; + element.used_aggregate_function_combinators = factories_info.aggregate_function_combinators; + element.used_database_engines = factories_info.database_engines; + element.used_data_type_families = factories_info.data_type_families; + element.used_dictionaries = factories_info.dictionaries; + element.used_formats = factories_info.formats; + element.used_functions = factories_info.functions; + element.used_storages = factories_info.storages; + element.used_table_functions = factories_info.table_functions; }; /// Also make possible for caller to log successful query finish and exception during execution. 
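The extended status_info_to_query_log callback above copies the per-query factories info (aggregate functions, formats, storages and so on that the query touched) into the query_log element. A minimal standard-library sketch of that bookkeeping pattern, with invented names UsageRegistry and QueryLogRecord:

```
#include <iostream>
#include <set>
#include <string>

// Collects the names of entities a query resolved while it was being analysed.
struct UsageRegistry
{
    std::set<std::string> functions;
    std::set<std::string> storages;

    void recordFunction(const std::string & name) { functions.insert(name); }
    void recordStorage(const std::string & name) { storages.insert(name); }
};

// Stand-in for the query_log row filled in the finish/exception callbacks.
struct QueryLogRecord
{
    std::set<std::string> used_functions;
    std::set<std::string> used_storages;
};

int main()
{
    UsageRegistry registry;
    registry.recordFunction("plus");
    registry.recordFunction("toDate");
    registry.recordStorage("MergeTree");

    QueryLogRecord record;
    record.used_functions = registry.functions;   // snapshot taken when the query finishes
    record.used_storages = registry.storages;

    for (const auto & f : record.used_functions)
        std::cout << "used function: " << f << '\n';
}
```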
@@ -724,7 +742,7 @@ static std::tuple executeQueryImpl( const auto finish_time = std::chrono::system_clock::now(); elem.event_time = time_in_seconds(finish_time); elem.event_time_microseconds = time_in_microseconds(finish_time); - status_info_to_query_log(elem, info, ast); + status_info_to_query_log(elem, info, ast, context); auto progress_callback = context->getProgressCallback(); @@ -765,20 +783,6 @@ static std::tuple executeQueryImpl( ReadableSize(elem.read_bytes / elapsed_seconds)); } - elem.thread_ids = std::move(info.thread_ids); - elem.profile_counters = std::move(info.profile_counters); - - const auto & factories_info = context->getQueryFactoriesInfo(); - elem.used_aggregate_functions = factories_info.aggregate_functions; - elem.used_aggregate_function_combinators = factories_info.aggregate_function_combinators; - elem.used_database_engines = factories_info.database_engines; - elem.used_data_type_families = factories_info.data_type_families; - elem.used_dictionaries = factories_info.dictionaries; - elem.used_formats = factories_info.formats; - elem.used_functions = factories_info.functions; - elem.used_storages = factories_info.storages; - elem.used_table_functions = factories_info.table_functions; - if (log_queries && elem.type >= log_queries_min_type && Int64(elem.query_duration_ms) >= log_queries_min_query_duration_ms) { if (auto query_log = context->getQueryLog()) @@ -847,7 +851,7 @@ static std::tuple executeQueryImpl( if (process_list_elem) { QueryStatusInfo info = process_list_elem->getInfo(true, current_settings.log_profile_events, false); - status_info_to_query_log(elem, info, ast); + status_info_to_query_log(elem, info, ast, context); } if (current_settings.calculate_text_stack_trace) @@ -875,13 +879,6 @@ static std::tuple executeQueryImpl( res.finish_callback = std::move(finish_callback); res.exception_callback = std::move(exception_callback); - - if (!internal && res.in) - { - WriteBufferFromOwnString msg_buf; - res.in->dumpTree(msg_buf); - LOG_DEBUG(&Poco::Logger::get("executeQuery"), "Query pipeline:\n{}", msg_buf.str()); - } } } catch (...) 
@@ -949,6 +946,7 @@ void executeQuery( bool allow_into_outfile, ContextMutablePtr context, std::function set_result_details, + const std::optional & output_format_settings, std::function before_finalize_callback) { PODArray parse_buf; @@ -997,30 +995,48 @@ void executeQuery( { if (streams.out) { - InputStreamFromASTInsertQuery in(ast, &istr, streams.out->getHeader(), context, nullptr); - copyData(in, *streams.out); + auto pipe = getSourceFromFromASTInsertQuery(ast, &istr, streams.out->getHeader(), context, nullptr); + + pipeline.init(std::move(pipe)); + pipeline.resize(1); + pipeline.setSinks([&](const Block &, Pipe::StreamType) + { + return std::make_shared(streams.out); + }); + + auto executor = pipeline.execute(); + executor->execute(pipeline.getNumThreads()); } else if (streams.in) { const auto * ast_query_with_output = dynamic_cast(ast.get()); WriteBuffer * out_buf = &ostr; - std::optional out_file_buf; + std::unique_ptr compressed_buffer; if (ast_query_with_output && ast_query_with_output->out_file) { if (!allow_into_outfile) throw Exception("INTO OUTFILE is not allowed", ErrorCodes::INTO_OUTFILE_NOT_ALLOWED); const auto & out_file = ast_query_with_output->out_file->as().value.safeGet(); - out_file_buf.emplace(out_file, DBMS_DEFAULT_BUFFER_SIZE, O_WRONLY | O_EXCL | O_CREAT); - out_buf = &*out_file_buf; + compressed_buffer = wrapWriteBufferWithCompressionMethod( + std::make_unique(out_file, DBMS_DEFAULT_BUFFER_SIZE, O_WRONLY | O_EXCL | O_CREAT), + chooseCompressionMethod(out_file, ""), + /* compression level = */ 3 + ); } String format_name = ast_query_with_output && (ast_query_with_output->format != nullptr) ? getIdentifierName(ast_query_with_output->format) : context->getDefaultFormat(); - auto out = context->getOutputStreamParallelIfPossible(format_name, *out_buf, streams.in->getHeader()); + auto out = FormatFactory::instance().getOutputStreamParallelIfPossible( + format_name, + compressed_buffer ? *compressed_buffer : *out_buf, + streams.in->getHeader(), + context, + {}, + output_format_settings); /// Save previous progress callback if any. TODO Do it more conveniently. auto previous_progress_callback = context->getProgressCallback(); @@ -1044,15 +1060,18 @@ void executeQuery( const ASTQueryWithOutput * ast_query_with_output = dynamic_cast(ast.get()); WriteBuffer * out_buf = &ostr; - std::optional out_file_buf; + std::unique_ptr compressed_buffer; if (ast_query_with_output && ast_query_with_output->out_file) { if (!allow_into_outfile) throw Exception("INTO OUTFILE is not allowed", ErrorCodes::INTO_OUTFILE_NOT_ALLOWED); const auto & out_file = typeid_cast(*ast_query_with_output->out_file).value.safeGet(); - out_file_buf.emplace(out_file, DBMS_DEFAULT_BUFFER_SIZE, O_WRONLY | O_EXCL | O_CREAT); - out_buf = &*out_file_buf; + compressed_buffer = wrapWriteBufferWithCompressionMethod( + std::make_unique(out_file, DBMS_DEFAULT_BUFFER_SIZE, O_WRONLY | O_EXCL | O_CREAT), + chooseCompressionMethod(out_file, ""), + /* compression level = */ 3 + ); } String format_name = ast_query_with_output && (ast_query_with_output->format != nullptr) @@ -1066,7 +1085,14 @@ void executeQuery( return std::make_shared(header); }); - auto out = context->getOutputFormatParallelIfPossible(format_name, *out_buf, pipeline.getHeader()); + auto out = FormatFactory::instance().getOutputFormatParallelIfPossible( + format_name, + compressed_buffer ? *compressed_buffer : *out_buf, + pipeline.getHeader(), + context, + {}, + output_format_settings); + out->setAutoFlush(); /// Save previous progress callback if any. 
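Both INTO OUTFILE branches above now route the output file through wrapWriteBufferWithCompressionMethod, with the codec picked from the file name by chooseCompressionMethod and a fixed compression level of 3. The extension-based selection can be sketched in isolation as follows; pickCompression and its extension table are illustrative only, not the actual ClickHouse implementation.

```
#include <iostream>
#include <string>
#include <string_view>

enum class Compression { None, Gzip, Zstd, Xz };

// Guess a compression codec from the file extension, defaulting to none.
Compression pickCompression(std::string_view path)
{
    auto ends_with = [&](std::string_view suffix)
    {
        return path.size() >= suffix.size()
            && path.compare(path.size() - suffix.size(), suffix.size(), suffix) == 0;
    };

    if (ends_with(".gz"))  return Compression::Gzip;
    if (ends_with(".zst")) return Compression::Zstd;
    if (ends_with(".xz"))  return Compression::Xz;
    return Compression::None;
}

int main()
{
    for (std::string_view file : {"result.csv", "result.csv.gz", "result.csv.zst"})
        std::cout << file << " -> " << static_cast<int>(pickCompression(file)) << '\n';
}
```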
TODO Do it more conveniently. @@ -1108,4 +1134,32 @@ void executeQuery( streams.onFinish(); } +void executeTrivialBlockIO(BlockIO & streams, ContextPtr context) +{ + try + { + if (streams.out) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Query stream requires input, but no input buffer provided, it's a bug"); + if (streams.in) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Query stream requires output, but no output buffer provided, it's a bug"); + + if (!streams.pipeline.initialized()) + return; + + if (!streams.pipeline.isCompleted()) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Query pipeline requires output, but no output buffer provided, it's a bug"); + + streams.pipeline.setProgressCallback(context->getProgressCallback()); + auto executor = streams.pipeline.execute(); + executor->execute(streams.pipeline.getNumThreads()); + } + catch (...) + { + streams.onException(); + throw; + } + + streams.onFinish(); +} + } diff --git a/src/Interpreters/executeQuery.h b/src/Interpreters/executeQuery.h index 77f142de121..9672c4e7517 100644 --- a/src/Interpreters/executeQuery.h +++ b/src/Interpreters/executeQuery.h @@ -16,8 +16,9 @@ void executeQuery( ReadBuffer & istr, /// Where to read query from (and data for INSERT, if present). WriteBuffer & ostr, /// Where to write query output to. bool allow_into_outfile, /// If true and the query contains INTO OUTFILE section, redirect output to that file. - ContextMutablePtr context, /// DB, tables, data types, storage engines, functions, aggregate functions... + ContextMutablePtr context, /// DB, tables, data types, storage engines, functions, aggregate functions... std::function set_result_details, /// If a non-empty callback is passed, it will be called with the query id, the content-type, the format, and the timezone. + const std::optional & output_format_settings = std::nullopt, /// Format settings for output format, will be calculated from the context if not set. std::function before_finalize_callback = {} /// Will be set in output format to be called before finalize. ); @@ -54,4 +55,8 @@ BlockIO executeQuery( bool allow_processors /// If can use processors pipeline ); +/// Executes BlockIO returned from executeQuery(...) +/// if built pipeline does not require any input and does not produce any output. 
+void executeTrivialBlockIO(BlockIO & streams, ContextPtr context); + } diff --git a/src/Interpreters/inplaceBlockConversions.cpp b/src/Interpreters/inplaceBlockConversions.cpp index ff16c7b3ff6..26cf6912bc7 100644 --- a/src/Interpreters/inplaceBlockConversions.cpp +++ b/src/Interpreters/inplaceBlockConversions.cpp @@ -53,6 +53,7 @@ void addDefaultRequiredExpressionsRecursively( NameSet required_columns_names = columns_context.requiredColumns(); auto expr = makeASTFunction("CAST", column_default_expr, std::make_shared(columns.get(required_column_name).type->getName())); + if (is_column_in_query && convert_null_to_default) expr = makeASTFunction("ifNull", std::make_shared(required_column_name), std::move(expr)); default_expr_list_accum->children.emplace_back(setAlias(expr, required_column_name)); @@ -62,6 +63,15 @@ void addDefaultRequiredExpressionsRecursively( for (const auto & next_required_column_name : required_columns_names) addDefaultRequiredExpressionsRecursively(block, next_required_column_name, required_column_type, columns, default_expr_list_accum, added_columns, null_as_default); } + else + { + /// This column is required, but doesn't have default expression, so lets use "default default" + auto column = columns.get(required_column_name); + auto default_value = column.type->getDefault(); + auto default_ast = std::make_shared(default_value); + default_expr_list_accum->children.emplace_back(setAlias(default_ast, required_column_name)); + added_columns.emplace(required_column_name); + } } ASTPtr defaultRequiredExpressions(const Block & block, const NamesAndTypesList & required_columns, const ColumnsDescription & columns, bool null_as_default) diff --git a/src/Interpreters/join_common.cpp b/src/Interpreters/join_common.cpp index 5548667e1a7..9d6abda42ed 100644 --- a/src/Interpreters/join_common.cpp +++ b/src/Interpreters/join_common.cpp @@ -1,21 +1,29 @@ #include -#include -#include -#include + #include -#include -#include -#include +#include + #include + +#include +#include +#include +#include + #include +#include +#include + +#include namespace DB { namespace ErrorCodes { - extern const int TYPE_MISMATCH; + extern const int INVALID_JOIN_ON_EXPRESSION; extern const int LOGICAL_ERROR; + extern const int TYPE_MISMATCH; } namespace @@ -220,6 +228,12 @@ ColumnRawPtrs materializeColumnsInplace(Block & block, const Names & names) return ptrs; } +ColumnPtr materializeColumn(const Block & block, const String & column_name) +{ + const auto & src_column = block.getByName(column_name).column; + return recursiveRemoveLowCardinality(src_column->convertToFullColumnIfConst()); +} + Columns materializeColumns(const Block & block, const Names & names) { Columns materialized; @@ -227,8 +241,7 @@ Columns materializeColumns(const Block & block, const Names & names) for (const auto & column_name : names) { - const auto & src_column = block.getByName(column_name).column; - materialized.emplace_back(recursiveRemoveLowCardinality(src_column->convertToFullColumnIfConst())); + materialized.emplace_back(materializeColumn(block, column_name)); } return materialized; @@ -294,7 +307,8 @@ ColumnRawPtrs extractKeysForJoin(const Block & block_keys, const Names & key_nam return key_columns; } -void checkTypesOfKeys(const Block & block_left, const Names & key_names_left, const Block & block_right, const Names & key_names_right) +void checkTypesOfKeys(const Block & block_left, const Names & key_names_left, + const Block & block_right, const Names & key_names_right) { size_t keys_size = key_names_left.size(); @@ 
-305,12 +319,38 @@ void checkTypesOfKeys(const Block & block_left, const Names & key_names_left, co if (!left_type->equals(*right_type)) throw Exception("Type mismatch of columns to JOIN by: " - + key_names_left[i] + " " + left_type->getName() + " at left, " - + key_names_right[i] + " " + right_type->getName() + " at right", - ErrorCodes::TYPE_MISMATCH); + + key_names_left[i] + " " + left_type->getName() + " at left, " + + key_names_right[i] + " " + right_type->getName() + " at right", + ErrorCodes::TYPE_MISMATCH); } } +void checkTypesOfKeys(const Block & block_left, const Names & key_names_left, const String & condition_name_left, + const Block & block_right, const Names & key_names_right, const String & condition_name_right) +{ + checkTypesOfKeys(block_left, key_names_left,block_right,key_names_right); + checkTypesOfMasks(block_left, condition_name_left, block_right, condition_name_right); +} + +void checkTypesOfMasks(const Block & block_left, const String & condition_name_left, + const Block & block_right, const String & condition_name_right) +{ + auto check_cond_column_type = [](const Block & block, const String & col_name) + { + if (col_name.empty()) + return; + + DataTypePtr dtype = removeNullable(recursiveRemoveLowCardinality(block.getByName(col_name).type)); + + if (!dtype->equals(DataTypeUInt8{})) + throw Exception(ErrorCodes::INVALID_JOIN_ON_EXPRESSION, + "Expected logical expression in JOIN ON section, got unexpected column '{}' of type '{}'", + col_name, dtype->getName()); + }; + check_cond_column_type(block_left, condition_name_left); + check_cond_column_type(block_right, condition_name_right); +} + void createMissedColumns(Block & block) { for (size_t i = 0; i < block.columns(); ++i) @@ -322,46 +362,26 @@ void createMissedColumns(Block & block) } /// Append totals from right to left block, correct types if needed -void joinTotals(const Block & totals, const Block & columns_to_add, const TableJoin & table_join, Block & block) +void joinTotals(Block left_totals, Block right_totals, const TableJoin & table_join, Block & out_block) { if (table_join.forceNullableLeft()) - convertColumnsToNullable(block); + JoinCommon::convertColumnsToNullable(left_totals); - if (Block totals_without_keys = totals) + if (table_join.forceNullableRight()) + JoinCommon::convertColumnsToNullable(right_totals); + + for (auto & col : out_block) { - for (const auto & name : table_join.keyNamesRight()) - totals_without_keys.erase(totals_without_keys.getPositionByName(name)); + if (const auto * left_col = left_totals.findByName(col.name)) + col = *left_col; + else if (const auto * right_col = right_totals.findByName(col.name)) + col = *right_col; + else + col.column = col.type->createColumnConstWithDefaultValue(1)->convertToFullColumnIfConst(); - for (auto & col : totals_without_keys) - { - if (table_join.rightBecomeNullable(col.type)) - JoinCommon::convertColumnToNullable(col); - - /// In case of arrayJoin it can be not one row - if (col.column->size() != 1) - col.column = col.column->cloneResized(1); - } - - for (size_t i = 0; i < totals_without_keys.columns(); ++i) - block.insert(totals_without_keys.safeGetByPosition(i)); - } - else - { - /// We will join empty `totals` - from one row with the default values. 
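The rewritten joinTotals above fills each output column from the left totals if present, otherwise from the right totals, otherwise with a single default value, and finally clips every column to exactly one row. A rough standard-library analogue of that per-column fallback, using the hypothetical names TotalsRow and mergeTotals:

```
#include <iostream>
#include <map>
#include <string>
#include <vector>

using TotalsRow = std::map<std::string, long>;  // column name -> single totals value

// For every requested output column take the left value, else the right one,
// else a default-constructed value, mirroring the column-wise fallback.
TotalsRow mergeTotals(const std::vector<std::string> & out_columns,
                      const TotalsRow & left, const TotalsRow & right)
{
    TotalsRow result;
    for (const auto & name : out_columns)
    {
        if (auto it = left.find(name); it != left.end())
            result[name] = it->second;
        else if (auto it = right.find(name); it != right.end())
            result[name] = it->second;
        else
            result[name] = 0;  // default value for columns present in neither side
    }
    return result;
}

int main()
{
    TotalsRow merged = mergeTotals({"sum_left", "sum_right", "extra"},
                                   {{"sum_left", 10}},
                                   {{"sum_right", 20}});
    for (const auto & [name, value] : merged)
        std::cout << name << " = " << value << '\n';
}
```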
- - for (size_t i = 0; i < columns_to_add.columns(); ++i) - { - const auto & col = columns_to_add.getByPosition(i); - if (block.has(col.name)) - { - /// For StorageJoin we discarded table qualifiers, so some names may clash - continue; - } - block.insert({ - col.type->createColumnConstWithDefaultValue(1)->convertToFullColumnIfConst(), - col.type, - col.name}); - } + /// In case of using `arrayJoin` we can get more or less rows than one + if (col.column->size() != 1) + col.column = col.column->cloneResized(1); } } @@ -379,28 +399,80 @@ bool typesEqualUpToNullability(DataTypePtr left_type, DataTypePtr right_type) return left_type_strict->equals(*right_type_strict); } +ColumnPtr getColumnAsMask(const Block & block, const String & column_name) +{ + if (column_name.empty()) + return nullptr; + + const auto & src_col = block.getByName(column_name); + + DataTypePtr col_type = recursiveRemoveLowCardinality(src_col.type); + if (isNothing(col_type)) + return ColumnUInt8::create(block.rows(), 0); + + const auto & join_condition_col = recursiveRemoveLowCardinality(src_col.column->convertToFullColumnIfConst()); + + if (const auto * nullable_col = typeid_cast(join_condition_col.get())) + { + if (isNothing(assert_cast(*col_type).getNestedType())) + return ColumnUInt8::create(block.rows(), 0); + + /// Return nested column with NULL set to false + const auto & nest_col = assert_cast(nullable_col->getNestedColumn()); + const auto & null_map = nullable_col->getNullMapColumn(); + + auto res = ColumnUInt8::create(nullable_col->size(), 0); + for (size_t i = 0, sz = nullable_col->size(); i < sz; ++i) + res->getData()[i] = !null_map.getData()[i] && nest_col.getData()[i]; + return res; + } + else + return join_condition_col; +} + + +void splitAdditionalColumns(const Names & key_names, const Block & sample_block, Block & block_keys, Block & block_others) +{ + block_others = materializeBlock(sample_block); + + for (const String & column_name : key_names) + { + /// Extract right keys with correct keys order. There could be the same key names. + if (!block_keys.has(column_name)) + { + auto & col = block_others.getByName(column_name); + block_keys.insert(col); + block_others.erase(column_name); + } + } +} + } NotJoined::NotJoined(const TableJoin & table_join, const Block & saved_block_sample_, const Block & right_sample_block, - const Block & result_sample_block_) + const Block & result_sample_block_, const Names & key_names_left_, const Names & key_names_right_) : saved_block_sample(saved_block_sample_) , result_sample_block(materializeBlock(result_sample_block_)) + , key_names_left(key_names_left_.empty() ? table_join.keyNamesLeft() : key_names_left_) + , key_names_right(key_names_right_.empty() ? 
table_join.keyNamesRight() : key_names_right_) { std::vector tmp; Block right_table_keys; Block sample_block_with_columns_to_add; - table_join.splitAdditionalColumns(right_sample_block, right_table_keys, sample_block_with_columns_to_add); + + JoinCommon::splitAdditionalColumns(key_names_right, right_sample_block, right_table_keys, + sample_block_with_columns_to_add); Block required_right_keys = table_join.getRequiredRightKeys(right_table_keys, tmp); std::unordered_map left_to_right_key_remap; if (table_join.hasUsing()) { - for (size_t i = 0; i < table_join.keyNamesLeft().size(); ++i) + for (size_t i = 0; i < key_names_left.size(); ++i) { - const String & left_key_name = table_join.keyNamesLeft()[i]; - const String & right_key_name = table_join.keyNamesRight()[i]; + const String & left_key_name = key_names_left[i]; + const String & right_key_name = key_names_right[i]; size_t left_key_pos = result_sample_block.getPositionByName(left_key_name); size_t right_key_pos = saved_block_sample.getPositionByName(right_key_name); diff --git a/src/Interpreters/join_common.h b/src/Interpreters/join_common.h index 9334b9d672f..8862116d1fa 100644 --- a/src/Interpreters/join_common.h +++ b/src/Interpreters/join_common.h @@ -1,5 +1,6 @@ #pragma once +#include #include #include #include @@ -12,6 +13,7 @@ struct ColumnWithTypeAndName; class TableJoin; class IColumn; using ColumnRawPtrs = std::vector; +using UInt8ColumnDataPtr = const ColumnUInt8::Container *; namespace JoinCommon { @@ -22,6 +24,7 @@ void convertColumnsToNullable(Block & block, size_t starting_pos = 0); void removeColumnNullability(ColumnWithTypeAndName & column); void changeColumnRepresentation(const ColumnPtr & src_column, ColumnPtr & dst_column); ColumnPtr emptyNotNullableClone(const ColumnPtr & column); +ColumnPtr materializeColumn(const Block & block, const String & name); Columns materializeColumns(const Block & block, const Names & names); ColumnRawPtrs materializeColumnsInplace(Block & block, const Names & names); ColumnRawPtrs getRawPointers(const Columns & columns); @@ -31,16 +34,31 @@ void restoreLowCardinalityInplace(Block & block); ColumnRawPtrs extractKeysForJoin(const Block & block_keys, const Names & key_names_right); -/// Throw an exception if blocks have different types of key columns. Compare up to Nullability. -void checkTypesOfKeys(const Block & block_left, const Names & key_names_left, const Block & block_right, const Names & key_names_right); +/// Throw an exception if join condition column is not UIint8 +void checkTypesOfMasks(const Block & block_left, const String & condition_name_left, + const Block & block_right, const String & condition_name_right); + +/// Throw an exception if blocks have different types of key columns . Compare up to Nullability. 
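getColumnAsMask above turns a JOIN ON condition column into a plain UInt8 mask: constants are unfolded, LowCardinality is stripped, and for Nullable(UInt8) a NULL is treated as false. The NULL-to-false collapse is the interesting part; a small sketch over std::optional, with names chosen here rather than taken from the codebase:

```
#include <cstdint>
#include <iostream>
#include <optional>
#include <vector>

// Collapse a nullable boolean column into a dense 0/1 mask:
// NULL behaves like false, any non-zero value like true.
std::vector<uint8_t> toMask(const std::vector<std::optional<uint8_t>> & column)
{
    std::vector<uint8_t> mask(column.size(), 0);
    for (size_t i = 0; i < column.size(); ++i)
        mask[i] = (column[i].has_value() && *column[i]) ? 1 : 0;
    return mask;
}

int main()
{
    std::vector<std::optional<uint8_t>> condition{1, 0, std::nullopt, 5};
    for (uint8_t v : toMask(condition))
        std::cout << int(v) << ' ';
    std::cout << '\n';  // prints: 1 0 0 1
}
```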
+void checkTypesOfKeys(const Block & block_left, const Names & key_names_left, + const Block & block_right, const Names & key_names_right); + +/// Check both keys and conditions +void checkTypesOfKeys(const Block & block_left, const Names & key_names_left, const String & condition_name_left, + const Block & block_right, const Names & key_names_right, const String & condition_name_right); void createMissedColumns(Block & block); -void joinTotals(const Block & totals, const Block & columns_to_add, const TableJoin & table_join, Block & block); +void joinTotals(Block left_totals, Block right_totals, const TableJoin & table_join, Block & out_block); void addDefaultValues(IColumn & column, const DataTypePtr & type, size_t count); bool typesEqualUpToNullability(DataTypePtr left_type, DataTypePtr right_type); +/// Return mask array of type ColumnUInt8 for specified column. Source should have type UInt8 or Nullable(UInt8). +ColumnPtr getColumnAsMask(const Block & block, const String & column_name); + +/// Split key and other columns by keys name list +void splitAdditionalColumns(const Names & key_names, const Block & sample_block, Block & block_keys, Block & block_others); + void changeLowCardinalityInplace(ColumnWithTypeAndName & column); } @@ -50,7 +68,7 @@ class NotJoined { public: NotJoined(const TableJoin & table_join, const Block & saved_block_sample_, const Block & right_sample_block, - const Block & result_sample_block_); + const Block & result_sample_block_, const Names & key_names_left_ = {}, const Names & key_names_right_ = {}); void correctLowcardAndNullability(MutableColumns & columns_right); void addLeftColumns(Block & block, size_t rows_added) const; @@ -61,6 +79,9 @@ protected: Block saved_block_sample; Block result_sample_block; + Names key_names_left; + Names key_names_right; + ~NotJoined() = default; private: diff --git a/src/Interpreters/replaceAliasColumnsInQuery.cpp b/src/Interpreters/replaceAliasColumnsInQuery.cpp index 3f789ec3d4f..604ba3590ae 100644 --- a/src/Interpreters/replaceAliasColumnsInQuery.cpp +++ b/src/Interpreters/replaceAliasColumnsInQuery.cpp @@ -6,12 +6,13 @@ namespace DB { -void replaceAliasColumnsInQuery( +bool replaceAliasColumnsInQuery( ASTPtr & ast, const ColumnsDescription & columns, const NameToNameMap & array_join_result_to_source, ContextPtr context) { ColumnAliasesVisitor::Data aliases_column_data(columns, array_join_result_to_source, context); ColumnAliasesVisitor aliases_column_visitor(aliases_column_data); aliases_column_visitor.visit(ast); + return aliases_column_data.changed; } } diff --git a/src/Interpreters/replaceAliasColumnsInQuery.h b/src/Interpreters/replaceAliasColumnsInQuery.h index fadebe3c9e6..5d9207ad11b 100644 --- a/src/Interpreters/replaceAliasColumnsInQuery.h +++ b/src/Interpreters/replaceAliasColumnsInQuery.h @@ -10,7 +10,8 @@ namespace DB class ColumnsDescription; -void replaceAliasColumnsInQuery( +/// Replace storage alias columns in select query if possible. Return true if the query is changed. 
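replaceAliasColumnsInQuery now reports through its bool return value whether the visitor actually rewrote anything, so callers can skip further work when the query is untouched. The same pattern, reduced to a toy string rewriter (rewriteAll is an invented name):

```
#include <iostream>
#include <string>

// Replace every occurrence of `from` with `to` and report whether the text changed,
// so the caller can avoid re-analysing an untouched query.
bool rewriteAll(std::string & text, const std::string & from, const std::string & to)
{
    bool changed = false;
    size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos)
    {
        text.replace(pos, from.size(), to);
        pos += to.size();
        changed = true;
    }
    return changed;
}

int main()
{
    std::string query = "SELECT alias_col FROM t";
    if (rewriteAll(query, "alias_col", "(real_col + 1)"))
        std::cout << "rewritten: " << query << '\n';
    else
        std::cout << "nothing to do\n";
}
```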
+bool replaceAliasColumnsInQuery( ASTPtr & ast, const ColumnsDescription & columns, const NameToNameMap & array_join_result_to_source, ContextPtr context); } diff --git a/src/Interpreters/ya.make b/src/Interpreters/ya.make index 17157fe3a8c..462c778bf3d 100644 --- a/src/Interpreters/ya.make +++ b/src/Interpreters/ya.make @@ -108,6 +108,7 @@ SRCS( JoinSwitcher.cpp JoinToSubqueryTransformVisitor.cpp JoinedTables.cpp + Lemmatizers.cpp LogicalExpressionsOptimizer.cpp MarkTableIdentifiersVisitor.cpp MergeJoin.cpp @@ -145,6 +146,7 @@ SRCS( SortedBlocksWriter.cpp StorageID.cpp SubqueryForSet.cpp + SynonymsExtensions.cpp SystemLog.cpp TableJoin.cpp TablesStatus.cpp @@ -155,6 +157,7 @@ SRCS( TreeOptimizer.cpp TreeRewriter.cpp WindowDescription.cpp + ZooKeeperLog.cpp addMissingDefaults.cpp addTypeConversionToAST.cpp castColumn.cpp diff --git a/src/Parsers/ASTAlterQuery.h b/src/Parsers/ASTAlterQuery.h index 5fc146a3072..0fd6d2805ea 100644 --- a/src/Parsers/ASTAlterQuery.h +++ b/src/Parsers/ASTAlterQuery.h @@ -225,6 +225,8 @@ public: return removeOnCluster(clone(), new_database); } + const char * getQueryKindString() const override { return "Alter"; } + protected: void formatQueryImpl(const FormatSettings & settings, FormatState & state, FormatStateStacked frame) const override; diff --git a/src/Parsers/ASTCreateQuery.h b/src/Parsers/ASTCreateQuery.h index c7be67d9b78..e4f2b628886 100644 --- a/src/Parsers/ASTCreateQuery.h +++ b/src/Parsers/ASTCreateQuery.h @@ -102,6 +102,8 @@ public: bool isView() const { return is_ordinary_view || is_materialized_view || is_live_view; } + const char * getQueryKindString() const override { return "Create"; } + protected: void formatQueryImpl(const FormatSettings & settings, FormatState & state, FormatStateStacked frame) const override; }; diff --git a/src/Parsers/ASTCreateUserQuery.cpp b/src/Parsers/ASTCreateUserQuery.cpp index 696b88ea9c1..594d21f2a4b 100644 --- a/src/Parsers/ASTCreateUserQuery.cpp +++ b/src/Parsers/ASTCreateUserQuery.cpp @@ -210,6 +210,12 @@ namespace settings.ostr << (settings.hilite ? IAST::hilite_keyword : "") << " GRANTEES " << (settings.hilite ? IAST::hilite_none : ""); grantees.format(settings); } + + void formatDefaultDatabase(const ASTDatabaseOrNone & default_database, const IAST::FormatSettings & settings) + { + settings.ostr << (settings.hilite ? IAST::hilite_keyword : "") << " DEFAULT DATABASE " << (settings.hilite ? 
IAST::hilite_none : ""); + default_database.format(settings); + } } @@ -262,6 +268,9 @@ void ASTCreateUserQuery::formatImpl(const FormatSettings & format, FormatState & if (remove_hosts) formatHosts("DROP", *remove_hosts, format); + if (default_database) + formatDefaultDatabase(*default_database, format); + if (default_roles) formatDefaultRoles(*default_roles, format); diff --git a/src/Parsers/ASTCreateUserQuery.h b/src/Parsers/ASTCreateUserQuery.h index 1612c213f34..9e80abcb6dd 100644 --- a/src/Parsers/ASTCreateUserQuery.h +++ b/src/Parsers/ASTCreateUserQuery.h @@ -2,6 +2,7 @@ #include #include +#include #include #include @@ -10,12 +11,14 @@ namespace DB { class ASTUserNamesWithHost; class ASTRolesOrUsersSet; +class ASTDatabaseOrNone; class ASTSettingsProfileElements; /** CREATE USER [IF NOT EXISTS | OR REPLACE] name * [NOT IDENTIFIED | IDENTIFIED {[WITH {no_password|plaintext_password|sha256_password|sha256_hash|double_sha1_password|double_sha1_hash}] BY {'password'|'hash'}}|{WITH ldap SERVER 'server_name'}|{WITH kerberos [REALM 'realm']}] * [HOST {LOCAL | NAME 'name' | REGEXP 'name_regexp' | IP 'address' | LIKE 'pattern'} [,...] | ANY | NONE] * [DEFAULT ROLE role [,...]] + * [DEFAULT DATABASE database | NONE] * [SETTINGS variable [= value] [MIN [=] min_value] [MAX [=] max_value] [READONLY|WRITABLE] | PROFILE 'profile_name'] [,...] * [GRANTEES {user | role | ANY | NONE} [,...] [EXCEPT {user | role} [,...]]] * @@ -24,6 +27,7 @@ class ASTSettingsProfileElements; * [NOT IDENTIFIED | IDENTIFIED {[WITH {no_password|plaintext_password|sha256_password|sha256_hash|double_sha1_password|double_sha1_hash}] BY {'password'|'hash'}}|{WITH ldap SERVER 'server_name'}|{WITH kerberos [REALM 'realm']}] * [[ADD|DROP] HOST {LOCAL | NAME 'name' | REGEXP 'name_regexp' | IP 'address' | LIKE 'pattern'} [,...] | ANY | NONE] * [DEFAULT ROLE role [,...] | ALL | ALL EXCEPT role [,...] ] + * [DEFAULT DATABASE database | NONE] * [SETTINGS variable [= value] [MIN [=] min_value] [MAX [=] max_value] [READONLY|WRITABLE] | PROFILE 'profile_name'] [,...] * [GRANTEES {user | role | ANY | NONE} [,...] [EXCEPT {user | role} [,...]]] */ @@ -51,6 +55,8 @@ public: std::shared_ptr settings; std::shared_ptr grantees; + std::shared_ptr default_database; + String getID(char) const override; ASTPtr clone() const override; void formatImpl(const FormatSettings & format, FormatState &, FormatStateStacked) const override; diff --git a/src/Parsers/ASTDatabaseOrNone.cpp b/src/Parsers/ASTDatabaseOrNone.cpp new file mode 100644 index 00000000000..f93322ef00c --- /dev/null +++ b/src/Parsers/ASTDatabaseOrNone.cpp @@ -0,0 +1,16 @@ +#include +#include + +namespace DB +{ +void ASTDatabaseOrNone::formatImpl(const FormatSettings & settings, FormatState &, FormatStateStacked) const +{ + if (none) + { + settings.ostr << (settings.hilite ? IAST::hilite_keyword : "") << "NONE" << (settings.hilite ? 
IAST::hilite_none : ""); + return; + } + settings.ostr << database_name; +} + +} diff --git a/src/Parsers/ASTDatabaseOrNone.h b/src/Parsers/ASTDatabaseOrNone.h new file mode 100644 index 00000000000..65ee6a59126 --- /dev/null +++ b/src/Parsers/ASTDatabaseOrNone.h @@ -0,0 +1,21 @@ +#pragma once + +#include + +namespace DB +{ + +class ASTDatabaseOrNone : public IAST +{ +public: + bool none = false; + String database_name; + + bool isNone() const { return none; } + String getID(char) const override { return "DatabaseOrNone"; } + ASTPtr clone() const override { return std::make_shared(*this); } + void formatImpl(const FormatSettings & settings, FormatState &, FormatStateStacked) const override; +}; +} + + diff --git a/src/Parsers/ASTDropQuery.h b/src/Parsers/ASTDropQuery.h index b612618aaec..6e5fd5854d8 100644 --- a/src/Parsers/ASTDropQuery.h +++ b/src/Parsers/ASTDropQuery.h @@ -45,6 +45,8 @@ public: return removeOnCluster(clone(), new_database); } + const char * getQueryKindString() const override { return "Drop"; } + protected: void formatQueryImpl(const FormatSettings & settings, FormatState &, FormatStateStacked) const override; }; diff --git a/src/Parsers/ASTExplainQuery.h b/src/Parsers/ASTExplainQuery.h index 95a3a362030..5c50a8cd82e 100644 --- a/src/Parsers/ASTExplainQuery.h +++ b/src/Parsers/ASTExplainQuery.h @@ -17,6 +17,7 @@ public: AnalyzedSyntax, /// 'EXPLAIN SYNTAX SELECT ...' QueryPlan, /// 'EXPLAIN SELECT ...' QueryPipeline, /// 'EXPLAIN PIPELINE ...' + QueryEstimates, /// 'EXPLAIN ESTIMATE ...' }; explicit ASTExplainQuery(ExplainKind kind_) : kind(kind_) {} @@ -76,6 +77,7 @@ private: case AnalyzedSyntax: return "EXPLAIN SYNTAX"; case QueryPlan: return "EXPLAIN"; case QueryPipeline: return "EXPLAIN PIPELINE"; + case QueryEstimates: return "EXPLAIN ESTIMATE"; } __builtin_unreachable(); diff --git a/src/Parsers/ASTFunction.cpp b/src/Parsers/ASTFunction.cpp index dfbf2532f1f..daae3e76aa1 100644 --- a/src/Parsers/ASTFunction.cpp +++ b/src/Parsers/ASTFunction.cpp @@ -370,7 +370,7 @@ void ASTFunction::formatImplWithoutAlias(const FormatSettings & settings, Format if (!written && 0 == strcmp(name.c_str(), "tupleElement")) { - // fuzzer sometimes may inserts tupleElement() created from ASTLiteral: + // fuzzer sometimes may insert tupleElement() created from ASTLiteral: // // Function_tupleElement, 0xx // -ExpressionList_, 0xx diff --git a/src/Parsers/ASTGrantQuery.cpp b/src/Parsers/ASTGrantQuery.cpp index aca53868226..e2ac7658c0f 100644 --- a/src/Parsers/ASTGrantQuery.cpp +++ b/src/Parsers/ASTGrantQuery.cpp @@ -102,7 +102,9 @@ ASTPtr ASTGrantQuery::clone() const void ASTGrantQuery::formatImpl(const FormatSettings & settings, FormatState &, FormatStateStacked) const { - settings.ostr << (settings.hilite ? IAST::hilite_keyword : "") << (attach_mode ? "ATTACH " : "") << (is_revoke ? "REVOKE" : "GRANT") + settings.ostr << (settings.hilite ? IAST::hilite_keyword : "") << (attach_mode ? "ATTACH " : "") + << (settings.hilite ? hilite_keyword : "") << ((!is_revoke && (replace_access || replace_granted_roles)) ? "REPLACE " : "") << (settings.hilite ? hilite_none : "") + << (settings.hilite ? hilite_keyword : "") << (is_revoke ? "REVOKE" : "GRANT") << (settings.hilite ? 
IAST::hilite_none : ""); if (!access_rights_elements.sameOptions()) diff --git a/src/Parsers/ASTGrantQuery.h b/src/Parsers/ASTGrantQuery.h index 833c4db8ec6..b0fb64cb33e 100644 --- a/src/Parsers/ASTGrantQuery.h +++ b/src/Parsers/ASTGrantQuery.h @@ -24,6 +24,8 @@ public: AccessRightsElements access_rights_elements; std::shared_ptr roles; bool admin_option = false; + bool replace_access = false; + bool replace_granted_roles = false; std::shared_ptr grantees; String getID(char) const override; @@ -32,5 +34,6 @@ public: void replaceEmptyDatabase(const String & current_database); void replaceCurrentUserTag(const String & current_user_name) const; ASTPtr getRewrittenASTWithoutOnCluster(const std::string &) const override { return removeOnCluster(clone()); } + const char * getQueryKindString() const override { return is_revoke ? "Revoke" : "Grant"; } }; } diff --git a/src/Parsers/ASTInsertQuery.h b/src/Parsers/ASTInsertQuery.h index 982f310fdb3..a454f46c3f1 100644 --- a/src/Parsers/ASTInsertQuery.h +++ b/src/Parsers/ASTInsertQuery.h @@ -47,6 +47,8 @@ public: return res; } + const char * getQueryKindString() const override { return "Insert"; } + protected: void formatImpl(const FormatSettings & settings, FormatState & state, FormatStateStacked frame) const override; }; diff --git a/src/Parsers/ASTRenameQuery.h b/src/Parsers/ASTRenameQuery.h index 611f81dc9e9..4940bf42a3a 100644 --- a/src/Parsers/ASTRenameQuery.h +++ b/src/Parsers/ASTRenameQuery.h @@ -34,6 +34,9 @@ public: bool database{false}; /// For RENAME DATABASE bool dictionary{false}; /// For RENAME DICTIONARY + /// Special flag for CREATE OR REPLACE. Do not throw if the second table does not exist. + bool rename_if_cannot_exchange{false}; + /** Get the text that identifies this element. */ String getID(char) const override { return "Rename"; } @@ -61,6 +64,8 @@ public: return query_ptr; } + const char * getQueryKindString() const override { return "Rename"; } + protected: void formatQueryImpl(const FormatSettings & settings, FormatState &, FormatStateStacked) const override { diff --git a/src/Parsers/ASTSelectQuery.cpp b/src/Parsers/ASTSelectQuery.cpp index 84a2e1070d6..7699d380623 100644 --- a/src/Parsers/ASTSelectQuery.cpp +++ b/src/Parsers/ASTSelectQuery.cpp @@ -438,4 +438,19 @@ ASTPtr & ASTSelectQuery::getExpression(Expression expr) return children[positions[expr]]; } +void ASTSelectQuery::setFinal() // NOLINT method can be made const +{ + auto & tables_in_select_query = tables()->as(); + + if (tables_in_select_query.children.empty()) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Tables list is empty, it's a bug"); + + auto & tables_element = tables_in_select_query.children[0]->as(); + + if (!tables_element.table_expression) + throw Exception(ErrorCodes::LOGICAL_ERROR, "There is no table expression, it's a bug"); + + tables_element.table_expression->as().final = true; +} + } diff --git a/src/Parsers/ASTSelectQuery.h b/src/Parsers/ASTSelectQuery.h index 3fc8efb5311..12817199d13 100644 --- a/src/Parsers/ASTSelectQuery.h +++ b/src/Parsers/ASTSelectQuery.h @@ -69,6 +69,8 @@ public: const ASTPtr limitLength() const { return getExpression(Expression::LIMIT_LENGTH); } const ASTPtr settings() const { return getExpression(Expression::SETTINGS); } + bool hasFiltration() const { return where() || prewhere() || having(); } + /// Set/Reset/Remove expression. 
void setExpression(Expression expr, ASTPtr && ast); @@ -93,6 +95,10 @@ public: void addTableFunction(ASTPtr & table_function_ptr); void updateTreeHashImpl(SipHash & hash_state) const override; + void setFinal(); + + const char * getQueryKindString() const override { return "Select"; } + protected: void formatImpl(const FormatSettings & settings, FormatState & state, FormatStateStacked frame) const override; diff --git a/src/Parsers/ASTSelectWithUnionQuery.h b/src/Parsers/ASTSelectWithUnionQuery.h index ecf03bb6a05..0465bdac3a6 100644 --- a/src/Parsers/ASTSelectWithUnionQuery.h +++ b/src/Parsers/ASTSelectWithUnionQuery.h @@ -16,6 +16,8 @@ public: ASTPtr clone() const override; void formatQueryImpl(const FormatSettings & settings, FormatState & state, FormatStateStacked frame) const override; + const char * getQueryKindString() const override { return "Select"; } + enum class Mode { Unspecified, diff --git a/src/Parsers/ASTSystemQuery.h b/src/Parsers/ASTSystemQuery.h index cbe82cd936f..fa7b6ece59a 100644 --- a/src/Parsers/ASTSystemQuery.h +++ b/src/Parsers/ASTSystemQuery.h @@ -86,6 +86,8 @@ public: return removeOnCluster(clone(), new_database); } + const char * getQueryKindString() const override { return "System"; } + protected: void formatImpl(const FormatSettings & settings, FormatState & state, FormatStateStacked frame) const override; diff --git a/src/Parsers/ExpressionElementParsers.cpp b/src/Parsers/ExpressionElementParsers.cpp index e2c96b73277..ca563ddea41 100644 --- a/src/Parsers/ExpressionElementParsers.cpp +++ b/src/Parsers/ExpressionElementParsers.cpp @@ -1555,26 +1555,37 @@ bool ParserUnsignedInteger::parseImpl(Pos & pos, ASTPtr & node, Expected & expec bool ParserStringLiteral::parseImpl(Pos & pos, ASTPtr & node, Expected & expected) { - if (pos->type != TokenType::StringLiteral) + if (pos->type != TokenType::StringLiteral && pos->type != TokenType::HereDoc) return false; String s; - ReadBufferFromMemory in(pos->begin, pos->size()); - try + if (pos->type == TokenType::StringLiteral) { - readQuotedStringWithSQLStyle(s, in); - } - catch (const Exception &) - { - expected.add(pos, "string literal"); - return false; - } + ReadBufferFromMemory in(pos->begin, pos->size()); - if (in.count() != pos->size()) + try + { + readQuotedStringWithSQLStyle(s, in); + } + catch (const Exception &) + { + expected.add(pos, "string literal"); + return false; + } + + if (in.count() != pos->size()) + { + expected.add(pos, "string literal"); + return false; + } + } + else if (pos->type == TokenType::HereDoc) { - expected.add(pos, "string literal"); - return false; + std::string_view here_doc(pos->begin, pos->size()); + size_t heredoc_size = here_doc.find('$', 1) + 1; + assert(heredoc_size != std::string_view::npos); + s = String(pos->begin + heredoc_size, pos->size() - heredoc_size * 2); } auto literal = std::make_shared(s); diff --git a/src/Parsers/ExpressionElementParsers.h b/src/Parsers/ExpressionElementParsers.h index 1dfdafa0888..c4ddb056a4d 100644 --- a/src/Parsers/ExpressionElementParsers.h +++ b/src/Parsers/ExpressionElementParsers.h @@ -308,6 +308,7 @@ protected: /** String in single quotes. + * String in heredoc $here$txt$here$ equivalent to 'txt'. */ class ParserStringLiteral : public IParserBase { diff --git a/src/Parsers/IAST.h b/src/Parsers/IAST.h index 143094e1d7a..94a7b1a52ab 100644 --- a/src/Parsers/IAST.h +++ b/src/Parsers/IAST.h @@ -231,6 +231,9 @@ public: void cloneChildren(); + // Return query_kind string representation of this AST query. 
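The lexer and parser changes above add a $tag$ ... $tag$ heredoc form for string literals: the lexer scans for the closing '$' of the opening tag, then searches for the same tag again to end the token, and ParserStringLiteral strips the tag from both ends. The delimiter arithmetic can be checked in isolation with the sketch below; extractHeredocBody is an invented helper, not part of the patch.

```
#include <iostream>
#include <optional>
#include <string>
#include <string_view>

// Given a token that starts with a $tag$ heredoc, return the text between the
// opening and closing tag, or nullopt if the token is not a well-formed heredoc.
std::optional<std::string> extractHeredocBody(std::string_view token)
{
    if (token.empty() || token.front() != '$')
        return std::nullopt;

    size_t tag_end = token.find('$', 1);            // position of the second '$'
    if (tag_end == std::string_view::npos)
        return std::nullopt;

    size_t tag_size = tag_end + 1;                  // length of "$tag$"
    std::string_view tag = token.substr(0, tag_size);

    size_t body_end = token.find(tag, tag_size);    // start of the closing "$tag$"
    if (body_end == std::string_view::npos)
        return std::nullopt;

    return std::string(token.substr(tag_size, body_end - tag_size));
}

int main()
{
    if (auto body = extractHeredocBody("$here$txt$here$"))
        std::cout << *body << '\n';   // prints: txt
}
```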
+ virtual const char * getQueryKindString() const { return ""; } + public: /// For syntax highlighting. static const char * hilite_keyword; diff --git a/src/Parsers/Lexer.cpp b/src/Parsers/Lexer.cpp index be956ee705a..24390773d18 100644 --- a/src/Parsers/Lexer.cpp +++ b/src/Parsers/Lexer.cpp @@ -2,7 +2,6 @@ #include #include - namespace DB { @@ -338,10 +337,33 @@ Token Lexer::nextTokenImpl() } default: - if (*pos == '$' && ((pos + 1 < end && !isWordCharASCII(pos[1])) || pos + 1 == end)) + if (*pos == '$') { - /// Capture standalone dollar sign - return Token(TokenType::DollarSign, token_begin, ++pos); + /// Try to capture dollar sign as start of here doc + + std::string_view token_stream(pos, end - pos); + auto heredoc_name_end_position = token_stream.find('$', 1); + if (heredoc_name_end_position != std::string::npos) + { + size_t heredoc_size = heredoc_name_end_position + 1; + std::string_view heredoc = {token_stream.data(), heredoc_size}; + + size_t heredoc_end_position = token_stream.find(heredoc, heredoc_size); + if (heredoc_end_position != std::string::npos) + { + + pos += heredoc_end_position; + pos += heredoc_size; + + return Token(TokenType::HereDoc, token_begin, pos); + } + } + + if (((pos + 1 < end && !isWordCharASCII(pos[1])) || pos + 1 == end)) + { + /// Capture standalone dollar sign + return Token(TokenType::DollarSign, token_begin, ++pos); + } } if (isWordCharASCII(*pos) || *pos == '$') { diff --git a/src/Parsers/Lexer.h b/src/Parsers/Lexer.h index 1bcfbb3afb9..f41e05147e5 100644 --- a/src/Parsers/Lexer.h +++ b/src/Parsers/Lexer.h @@ -33,6 +33,8 @@ namespace DB \ M(Asterisk) /** Could be used as multiplication operator or on it's own: "SELECT *" */ \ \ + M(HereDoc) \ + \ M(DollarSign) \ M(Plus) \ M(Minus) \ diff --git a/src/Parsers/ParserCreateUserQuery.cpp b/src/Parsers/ParserCreateUserQuery.cpp index 1d132582580..72246a27f80 100644 --- a/src/Parsers/ParserCreateUserQuery.cpp +++ b/src/Parsers/ParserCreateUserQuery.cpp @@ -12,6 +12,7 @@ #include #include #include +#include #include #include #include @@ -300,6 +301,23 @@ namespace return ParserKeyword{"ON"}.ignore(pos, expected) && ASTQueryWithOnCluster::parse(pos, cluster, expected); }); } + + bool parseDefaultDatabase(IParserBase::Pos & pos, Expected & expected, std::shared_ptr & default_database) + { + return IParserBase::wrapParseImpl(pos, [&] + { + if (!ParserKeyword{"DEFAULT DATABASE"}.ignore(pos, expected)) + return false; + + ASTPtr ast; + ParserDatabaseOrNone database_p; + if (!database_p.parse(pos, ast, expected)) + return false; + + default_database = typeid_cast>(ast); + return true; + }); + } } @@ -349,6 +367,7 @@ bool ParserCreateUserQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expec std::shared_ptr default_roles; std::shared_ptr settings; std::shared_ptr grantees; + std::shared_ptr default_database; String cluster; while (true) @@ -390,6 +409,9 @@ bool ParserCreateUserQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expec if (!grantees && parseGrantees(pos, expected, attach_mode, grantees)) continue; + if (!default_database && parseDefaultDatabase(pos, expected, default_database)) + continue; + if (alter) { if (new_name.empty() && (names->size() == 1) && parseRenameTo(pos, expected, new_name)) @@ -445,6 +467,7 @@ bool ParserCreateUserQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expec query->default_roles = std::move(default_roles); query->settings = std::move(settings); query->grantees = std::move(grantees); + query->default_database = std::move(default_database); return true; } diff 
--git a/src/Parsers/ParserDatabaseOrNone.cpp b/src/Parsers/ParserDatabaseOrNone.cpp new file mode 100644 index 00000000000..c53c547aedd --- /dev/null +++ b/src/Parsers/ParserDatabaseOrNone.cpp @@ -0,0 +1,29 @@ +#include +#include +#include +#include + +namespace DB +{ +bool ParserDatabaseOrNone::parseImpl(Pos & pos, ASTPtr & node, Expected & expected) +{ + auto result = std::make_shared(); + node = result; + + if (ParserKeyword{"NONE"}.ignore(pos, expected)) + { + result->none = true; + return true; + } + String database_name; + if (parseIdentifierOrStringLiteral(pos, expected, database_name)) + { + result->database_name = database_name; + return true; + } + return false; + + +} + +} diff --git a/src/Parsers/ParserDatabaseOrNone.h b/src/Parsers/ParserDatabaseOrNone.h new file mode 100644 index 00000000000..e5a21df4316 --- /dev/null +++ b/src/Parsers/ParserDatabaseOrNone.h @@ -0,0 +1,17 @@ +#pragma once +#include + +namespace DB +{ + +class ParserDatabaseOrNone : public IParserBase +{ +protected: + const char * getName() const override { return "DatabaseOrNone"; } + bool parseImpl(Pos & pos, ASTPtr & node, Expected & expected) override; + +}; + +} + + diff --git a/src/Parsers/ParserExplainQuery.cpp b/src/Parsers/ParserExplainQuery.cpp index dc548164157..b4ba0523239 100644 --- a/src/Parsers/ParserExplainQuery.cpp +++ b/src/Parsers/ParserExplainQuery.cpp @@ -19,6 +19,7 @@ bool ParserExplainQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expected ParserKeyword s_syntax("SYNTAX"); ParserKeyword s_pipeline("PIPELINE"); ParserKeyword s_plan("PLAN"); + ParserKeyword s_estimates("ESTIMATE"); if (s_explain.ignore(pos, expected)) { @@ -32,6 +33,8 @@ bool ParserExplainQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expected kind = ASTExplainQuery::ExplainKind::QueryPipeline; else if (s_plan.ignore(pos, expected)) kind = ASTExplainQuery::ExplainKind::QueryPlan; //-V1048 + else if (s_estimates.ignore(pos, expected)) + kind = ASTExplainQuery::ExplainKind::QueryEstimates; //-V1048 } else return false; diff --git a/src/Parsers/ParserGrantQuery.cpp b/src/Parsers/ParserGrantQuery.cpp index 9411fa93892..85a6c9c71d4 100644 --- a/src/Parsers/ParserGrantQuery.cpp +++ b/src/Parsers/ParserGrantQuery.cpp @@ -231,6 +231,7 @@ bool ParserGrantQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expected) if (attach_mode && !ParserKeyword{"ATTACH"}.ignore(pos, expected)) return false; + bool is_replace = false; bool is_revoke = false; if (ParserKeyword{"REVOKE"}.ignore(pos, expected)) is_revoke = true; @@ -271,6 +272,9 @@ bool ParserGrantQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expected) grant_option = true; else if (ParserKeyword{"WITH ADMIN OPTION"}.ignore(pos, expected)) admin_option = true; + + if (ParserKeyword{"WITH REPLACE OPTION"}.ignore(pos, expected)) + is_replace = true; } if (cluster.empty()) @@ -287,6 +291,17 @@ bool ParserGrantQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expected) element.grant_option = true; } + + bool replace_access = false; + bool replace_role = false; + if (is_replace) + { + if (roles) + replace_role = true; + else + replace_access = true; + } + if (!is_revoke) eraseNonGrantable(elements); @@ -300,6 +315,8 @@ bool ParserGrantQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expected) query->roles = std::move(roles); query->grantees = std::move(grantees); query->admin_option = admin_option; + query->replace_access = replace_access; + query->replace_granted_roles = replace_role; return true; } diff --git a/src/Parsers/queryNormalization.h 
b/src/Parsers/queryNormalization.h index bea365c02e4..e5b1724ccfa 100644 --- a/src/Parsers/queryNormalization.h +++ b/src/Parsers/queryNormalization.h @@ -28,7 +28,7 @@ inline UInt64 ALWAYS_INLINE normalizedQueryHash(const char * begin, const char * continue; /// Literals. - if (token.type == TokenType::Number || token.type == TokenType::StringLiteral) + if (token.type == TokenType::Number || token.type == TokenType::StringLiteral || token.type == TokenType::HereDoc) { if (0 == num_literals_in_sequence) hash.update("\x00", 1); @@ -156,7 +156,7 @@ inline void ALWAYS_INLINE normalizeQueryToPODArray(const char * begin, const cha prev_insignificant = false; /// Literals. - if (token.type == TokenType::Number || token.type == TokenType::StringLiteral) + if (token.type == TokenType::Number || token.type == TokenType::StringLiteral || token.type == TokenType::HereDoc) { if (0 == num_literals_in_sequence) res_data.push_back('?'); diff --git a/src/Processors/Executors/PipelineExecutor.cpp b/src/Processors/Executors/PipelineExecutor.cpp index ac6a2176979..b91c1caa4a5 100644 --- a/src/Processors/Executors/PipelineExecutor.cpp +++ b/src/Processors/Executors/PipelineExecutor.cpp @@ -45,6 +45,8 @@ PipelineExecutor::PipelineExecutor(Processors & processors_, QueryStatus * elem) try { graph = std::make_unique(processors); + if (process_list_element) + process_list_element->addPipelineExecutor(this); } catch (Exception & exception) { @@ -59,6 +61,12 @@ PipelineExecutor::PipelineExecutor(Processors & processors_, QueryStatus * elem) } } +PipelineExecutor::~PipelineExecutor() +{ + if (process_list_element) + process_list_element->removePipelineExecutor(this); +} + void PipelineExecutor::addChildlessProcessorsToStack(Stack & stack) { UInt64 num_processors = processors.size(); @@ -391,6 +399,9 @@ void PipelineExecutor::finish() void PipelineExecutor::execute(size_t num_threads) { + if (num_threads < 1) + num_threads = 1; + try { executeImpl(num_threads); diff --git a/src/Processors/Executors/PipelineExecutor.h b/src/Processors/Executors/PipelineExecutor.h index 213446ad43f..0652d81addc 100644 --- a/src/Processors/Executors/PipelineExecutor.h +++ b/src/Processors/Executors/PipelineExecutor.h @@ -31,6 +31,7 @@ public: /// /// Explicit graph representation is built in constructor. Throws if graph is not correct. explicit PipelineExecutor(Processors & processors_, QueryStatus * elem = nullptr); + ~PipelineExecutor(); /// Execute pipeline in multiple threads. Must be called once. /// In case of exception during execution throws any occurred. @@ -127,7 +128,7 @@ private: ProcessorsMap processors_map; /// Now it's used to check if query was killed. - QueryStatus * process_list_element = nullptr; + QueryStatus * const process_list_element = nullptr; /// Graph related methods. bool expandPipeline(Stack & stack, UInt64 pid); diff --git a/src/Processors/Executors/PullingAsyncPipelineExecutor.cpp b/src/Processors/Executors/PullingAsyncPipelineExecutor.cpp index ca5f4cc290f..8ecbe75af3a 100644 --- a/src/Processors/Executors/PullingAsyncPipelineExecutor.cpp +++ b/src/Processors/Executors/PullingAsyncPipelineExecutor.cpp @@ -174,9 +174,8 @@ void PullingAsyncPipelineExecutor::cancel() if (data && !data->is_finished && data->executor) data->executor->cancel(); - /// Finish lazy format. Otherwise thread.join() may hung. - if (lazy_format && !lazy_format->isFinished()) - lazy_format->finish(); + /// The following code is needed to rethrow exception from PipelineExecutor. 
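PipelineExecutor now registers itself with its QueryStatus in the constructor and unregisters in the destructor, so the process list can reach every executor that belongs to a running query (for cancellation, for example). A condensed sketch of that RAII registration, with invented ProcessListEntry and Executor types standing in for the real classes:

```
#include <iostream>
#include <mutex>
#include <set>

class Executor;

// Per-query bookkeeping: tracks all executors currently working for the query.
class ProcessListEntry
{
public:
    void add(Executor * e)    { std::lock_guard lock(mutex); executors.insert(e); }
    void remove(Executor * e) { std::lock_guard lock(mutex); executors.erase(e); }
    size_t activeExecutors() const { std::lock_guard lock(mutex); return executors.size(); }

private:
    mutable std::mutex mutex;
    std::set<Executor *> executors;
};

// The executor registers itself for its whole lifetime (RAII), so a kill-query
// style operation can find it through the process list entry.
class Executor
{
public:
    explicit Executor(ProcessListEntry * entry_) : entry(entry_)
    {
        if (entry)
            entry->add(this);
    }
    ~Executor()
    {
        if (entry)
            entry->remove(this);
    }

private:
    ProcessListEntry * const entry;
};

int main()
{
    ProcessListEntry query;
    {
        Executor a(&query), b(&query);
        std::cout << query.activeExecutors() << '\n';  // 2
    }
    std::cout << query.activeExecutors() << '\n';      // 0
}
```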
+ /// It could have been thrown from pull(), but we will not likely call it again. /// Join thread here to wait for possible exception. if (data && data->thread.joinable()) diff --git a/src/Processors/Formats/IInputFormat.h b/src/Processors/Formats/IInputFormat.h index f8811962260..59d33e3295e 100644 --- a/src/Processors/Formats/IInputFormat.h +++ b/src/Processors/Formats/IInputFormat.h @@ -72,12 +72,16 @@ public: size_t getCurrentUnitNumber() const { return current_unit_number; } void setCurrentUnitNumber(size_t current_unit_number_) { current_unit_number = current_unit_number_; } + void addBuffer(std::unique_ptr buffer) { owned_buffers.emplace_back(std::move(buffer)); } + protected: ColumnMappingPtr column_mapping{}; private: /// Number of currently parsed chunk (if parallel parsing is enabled) size_t current_unit_number = 0; + + std::vector> owned_buffers; }; } diff --git a/src/Processors/Formats/Impl/ArrowBlockInputFormat.h b/src/Processors/Formats/Impl/ArrowBlockInputFormat.h index 3bfead93bf1..9f458dece7f 100644 --- a/src/Processors/Formats/Impl/ArrowBlockInputFormat.h +++ b/src/Processors/Formats/Impl/ArrowBlockInputFormat.h @@ -1,5 +1,8 @@ #pragma once -#include "config_formats.h" +#if !defined(ARCADIA_BUILD) +# include "config_formats.h" +#endif + #if USE_ARROW #include diff --git a/src/Processors/Formats/Impl/ArrowBlockOutputFormat.h b/src/Processors/Formats/Impl/ArrowBlockOutputFormat.h index 40d81f8b919..44d46e97d2a 100644 --- a/src/Processors/Formats/Impl/ArrowBlockOutputFormat.h +++ b/src/Processors/Formats/Impl/ArrowBlockOutputFormat.h @@ -1,5 +1,8 @@ #pragma once -#include "config_formats.h" +#if !defined(ARCADIA_BUILD) +# include "config_formats.h" +#endif + #if USE_ARROW #include diff --git a/src/Processors/Formats/Impl/ArrowBufferedStreams.cpp b/src/Processors/Formats/Impl/ArrowBufferedStreams.cpp index 9582e0c3312..243f3da5903 100644 --- a/src/Processors/Formats/Impl/ArrowBufferedStreams.cpp +++ b/src/Processors/Formats/Impl/ArrowBufferedStreams.cpp @@ -6,7 +6,7 @@ #include #include #include -#include +#include #include #include diff --git a/src/Processors/Formats/Impl/ArrowBufferedStreams.h b/src/Processors/Formats/Impl/ArrowBufferedStreams.h index a10a5bcabdb..a49936f326c 100644 --- a/src/Processors/Formats/Impl/ArrowBufferedStreams.h +++ b/src/Processors/Formats/Impl/ArrowBufferedStreams.h @@ -1,5 +1,8 @@ #pragma once -#include "config_formats.h" +#if !defined(ARCADIA_BUILD) +# include "config_formats.h" +#endif + #if USE_ARROW || USE_ORC || USE_PARQUET #include diff --git a/src/Processors/Formats/Impl/ArrowColumnToCHColumn.cpp b/src/Processors/Formats/Impl/ArrowColumnToCHColumn.cpp index 01c19deb837..84c56f0f2b7 100644 --- a/src/Processors/Formats/Impl/ArrowColumnToCHColumn.cpp +++ b/src/Processors/Formats/Impl/ArrowColumnToCHColumn.cpp @@ -1,7 +1,7 @@ -#include "config_formats.h" #include "ArrowColumnToCHColumn.h" #if USE_ARROW || USE_ORC || USE_PARQUET + #include #include #include @@ -10,9 +10,11 @@ #include #include #include +#include #include #include #include +#include #include #include #include @@ -22,128 +24,153 @@ #include #include #include -#include +#include + + +#define FOR_ARROW_NUMERIC_TYPES(M) \ + M(arrow::Type::UINT8, DB::UInt8) \ + M(arrow::Type::INT8, DB::Int8) \ + M(arrow::Type::UINT16, DB::UInt16) \ + M(arrow::Type::INT16, DB::Int16) \ + M(arrow::Type::UINT32, DB::UInt32) \ + M(arrow::Type::INT32, DB::Int32) \ + M(arrow::Type::UINT64, DB::UInt64) \ + M(arrow::Type::INT64, DB::Int64) \ + M(arrow::Type::HALF_FLOAT, DB::Float32) \ + 
M(arrow::Type::FLOAT, DB::Float32) \ + M(arrow::Type::DOUBLE, DB::Float64) + +#define FOR_ARROW_INDEXES_TYPES(M) \ + M(arrow::Type::UINT8, DB::UInt8) \ + M(arrow::Type::INT8, DB::UInt8) \ + M(arrow::Type::UINT16, DB::UInt16) \ + M(arrow::Type::INT16, DB::UInt16) \ + M(arrow::Type::UINT32, DB::UInt32) \ + M(arrow::Type::INT32, DB::UInt32) \ + M(arrow::Type::UINT64, DB::UInt64) \ + M(arrow::Type::INT64, DB::UInt64) namespace DB { - namespace ErrorCodes - { - extern const int UNKNOWN_TYPE; - extern const int VALUE_IS_OUT_OF_RANGE_OF_DATA_TYPE; - extern const int CANNOT_CONVERT_TYPE; - extern const int CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN; - extern const int THERE_IS_NO_COLUMN; - extern const int BAD_ARGUMENTS; - } - static const std::initializer_list> arrow_type_to_internal_type = - { - {arrow::Type::UINT8, "UInt8"}, - {arrow::Type::INT8, "Int8"}, - {arrow::Type::UINT16, "UInt16"}, - {arrow::Type::INT16, "Int16"}, - {arrow::Type::UINT32, "UInt32"}, - {arrow::Type::INT32, "Int32"}, - {arrow::Type::UINT64, "UInt64"}, - {arrow::Type::INT64, "Int64"}, - {arrow::Type::HALF_FLOAT, "Float32"}, - {arrow::Type::FLOAT, "Float32"}, - {arrow::Type::DOUBLE, "Float64"}, +namespace ErrorCodes +{ + extern const int UNKNOWN_TYPE; + extern const int VALUE_IS_OUT_OF_RANGE_OF_DATA_TYPE; + extern const int CANNOT_CONVERT_TYPE; + extern const int CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN; + extern const int THERE_IS_NO_COLUMN; + extern const int BAD_ARGUMENTS; +} - {arrow::Type::BOOL, "UInt8"}, - {arrow::Type::DATE32, "Date"}, - {arrow::Type::DATE32, "Date32"}, - {arrow::Type::DATE64, "DateTime"}, - {arrow::Type::TIMESTAMP, "DateTime"}, +static const std::initializer_list> arrow_type_to_internal_type = +{ + {arrow::Type::UINT8, "UInt8"}, + {arrow::Type::INT8, "Int8"}, + {arrow::Type::UINT16, "UInt16"}, + {arrow::Type::INT16, "Int16"}, + {arrow::Type::UINT32, "UInt32"}, + {arrow::Type::INT32, "Int32"}, + {arrow::Type::UINT64, "UInt64"}, + {arrow::Type::INT64, "Int64"}, + {arrow::Type::HALF_FLOAT, "Float32"}, + {arrow::Type::FLOAT, "Float32"}, + {arrow::Type::DOUBLE, "Float64"}, - {arrow::Type::STRING, "String"}, - {arrow::Type::BINARY, "String"}, + {arrow::Type::BOOL, "UInt8"}, + {arrow::Type::DATE32, "Date"}, + {arrow::Type::DATE32, "Date32"}, + {arrow::Type::DATE64, "DateTime"}, + {arrow::Type::TIMESTAMP, "DateTime"}, - // TODO: add other types that are convertible to internal ones: - // 0. ENUM? - // 1. UUID -> String - // 2. JSON -> String - // Full list of types: contrib/arrow/cpp/src/arrow/type.h - }; + {arrow::Type::STRING, "String"}, + {arrow::Type::BINARY, "String"}, + + // TODO: add other types that are convertible to internal ones: + // 0. ENUM? + // 1. UUID -> String + // 2. 
JSON -> String + // Full list of types: contrib/arrow/cpp/src/arrow/type.h +}; /// Inserts numeric data right into internal column data to reduce an overhead - template > - static void fillColumnWithNumericData(std::shared_ptr & arrow_column, IColumn & internal_column) +template > +static void fillColumnWithNumericData(std::shared_ptr & arrow_column, IColumn & internal_column) +{ + auto & column_data = static_cast(internal_column).getData(); + column_data.reserve(arrow_column->length()); + + for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) { - auto & column_data = static_cast(internal_column).getData(); - column_data.reserve(arrow_column->length()); + std::shared_ptr chunk = arrow_column->chunk(chunk_i); + /// buffers[0] is a null bitmap and buffers[1] are actual values + std::shared_ptr buffer = chunk->data()->buffers[1]; - for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) - { - std::shared_ptr chunk = arrow_column->chunk(chunk_i); - /// buffers[0] is a null bitmap and buffers[1] are actual values - std::shared_ptr buffer = chunk->data()->buffers[1]; - - const auto * raw_data = reinterpret_cast(buffer->data()); - column_data.insert_assume_reserved(raw_data, raw_data + chunk->length()); - } + const auto * raw_data = reinterpret_cast(buffer->data()); + column_data.insert_assume_reserved(raw_data, raw_data + chunk->length()); } +} /// Inserts chars and offsets right into internal column data to reduce an overhead. /// Internal offsets are shifted by one to the right in comparison with Arrow ones. So the last offset should map to the end of all chars. /// Also internal strings are null terminated. - static void fillColumnWithStringData(std::shared_ptr & arrow_column, IColumn & internal_column) +static void fillColumnWithStringData(std::shared_ptr & arrow_column, IColumn & internal_column) +{ + PaddedPODArray & column_chars_t = assert_cast(internal_column).getChars(); + PaddedPODArray & column_offsets = assert_cast(internal_column).getOffsets(); + + size_t chars_t_size = 0; + for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) { - PaddedPODArray & column_chars_t = assert_cast(internal_column).getChars(); - PaddedPODArray & column_offsets = assert_cast(internal_column).getOffsets(); + arrow::BinaryArray & chunk = dynamic_cast(*(arrow_column->chunk(chunk_i))); + const size_t chunk_length = chunk.length(); - size_t chars_t_size = 0; - for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) + if (chunk_length > 0) { - arrow::BinaryArray & chunk = static_cast(*(arrow_column->chunk(chunk_i))); - const size_t chunk_length = chunk.length(); - - if (chunk_length > 0) - { - chars_t_size += chunk.value_offset(chunk_length - 1) + chunk.value_length(chunk_length - 1); - chars_t_size += chunk_length; /// additional space for null bytes - } - } - - column_chars_t.reserve(chars_t_size); - column_offsets.reserve(arrow_column->length()); - - for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) - { - arrow::BinaryArray & chunk = static_cast(*(arrow_column->chunk(chunk_i))); - std::shared_ptr buffer = chunk.value_data(); - const size_t chunk_length = chunk.length(); - - for (size_t offset_i = 0; offset_i != chunk_length; ++offset_i) - { - if (!chunk.IsNull(offset_i) && buffer) - { - const auto * raw_data = 
buffer->data() + chunk.value_offset(offset_i); - column_chars_t.insert_assume_reserved(raw_data, raw_data + chunk.value_length(offset_i)); - } - column_chars_t.emplace_back('\0'); - - column_offsets.emplace_back(column_chars_t.size()); - } + chars_t_size += chunk.value_offset(chunk_length - 1) + chunk.value_length(chunk_length - 1); + chars_t_size += chunk_length; /// additional space for null bytes } } - static void fillColumnWithBooleanData(std::shared_ptr & arrow_column, IColumn & internal_column) + column_chars_t.reserve(chars_t_size); + column_offsets.reserve(arrow_column->length()); + + for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) { - auto & column_data = assert_cast &>(internal_column).getData(); - column_data.reserve(arrow_column->length()); + arrow::BinaryArray & chunk = dynamic_cast(*(arrow_column->chunk(chunk_i))); + std::shared_ptr buffer = chunk.value_data(); + const size_t chunk_length = chunk.length(); - for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) + for (size_t offset_i = 0; offset_i != chunk_length; ++offset_i) { - arrow::BooleanArray & chunk = static_cast(*(arrow_column->chunk(chunk_i))); - /// buffers[0] is a null bitmap and buffers[1] are actual values - std::shared_ptr buffer = chunk.data()->buffers[1]; + if (!chunk.IsNull(offset_i) && buffer) + { + const auto * raw_data = buffer->data() + chunk.value_offset(offset_i); + column_chars_t.insert_assume_reserved(raw_data, raw_data + chunk.value_length(offset_i)); + } + column_chars_t.emplace_back('\0'); - for (size_t bool_i = 0; bool_i != static_cast(chunk.length()); ++bool_i) - column_data.emplace_back(chunk.Value(bool_i)); + column_offsets.emplace_back(column_chars_t.size()); } } +} + +static void fillColumnWithBooleanData(std::shared_ptr & arrow_column, IColumn & internal_column) +{ + auto & column_data = assert_cast &>(internal_column).getData(); + column_data.reserve(arrow_column->length()); + + for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) + { + arrow::BooleanArray & chunk = dynamic_cast(*(arrow_column->chunk(chunk_i))); + /// buffers[0] is a null bitmap and buffers[1] are actual values + std::shared_ptr buffer = chunk.data()->buffers[1]; + + for (size_t bool_i = 0; bool_i != static_cast(chunk.length()); ++bool_i) + column_data.emplace_back(chunk.Value(bool_i)); + } +} /// Arrow stores Parquet::DATE in Int32, while ClickHouse stores Date in UInt16. Therefore, it should be checked before saving static void fillColumnWithDate32Data(std::shared_ptr & arrow_column, IColumn & internal_column) @@ -153,487 +180,506 @@ static void fillColumnWithDate32Data(std::shared_ptr & arro for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) { - arrow::Date32Array & chunk = static_cast(*(arrow_column->chunk(chunk_i))); + arrow::Date32Array & chunk = dynamic_cast(*(arrow_column->chunk(chunk_i))); for (size_t value_i = 0, length = static_cast(chunk.length()); value_i < length; ++value_i) { UInt32 days_num = static_cast(chunk.Value(value_i)); + if (days_num > DATE_LUT_MAX_DAY_NUM) - { - // TODO: will it rollback correctly? 
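The string conversion above depends on the difference between the two layouts: Arrow keeps N + 1 offsets starting at zero, while ClickHouse keeps N offsets shifted by one and null-terminates every value. A minimal sketch of that re-layout, using plain standard containers instead of `PaddedPODArray`/`ColumnString`:

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    /// Arrow-style string column: one contiguous blob plus N + 1 offsets starting at 0.
    std::string arrow_data = "foobazz";                 /// values "foo", "", "bazz"
    std::vector<int32_t> arrow_offsets = {0, 3, 3, 7};

    /// ClickHouse-style layout: chars with a trailing '\0' per value and N offsets,
    /// each pointing one past the terminating zero byte of its value.
    std::vector<char> chars;
    std::vector<uint64_t> offsets;

    for (size_t i = 0; i + 1 < arrow_offsets.size(); ++i)
    {
        chars.insert(chars.end(),
                     arrow_data.begin() + arrow_offsets[i],
                     arrow_data.begin() + arrow_offsets[i + 1]);
        chars.push_back('\0');                          /// values are null-terminated
        offsets.push_back(chars.size());                /// shifted by one vs. Arrow
    }

    for (uint64_t offset : offsets)
        std::cout << offset << ' ';                     /// prints: 4 5 10
    std::cout << '\n';
}
```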
- throw Exception - { - fmt::format("Input value {} of a column \"{}\" is greater than max allowed Date value, which is {}", days_num, internal_column.getName(), DATE_LUT_MAX_DAY_NUM), - ErrorCodes::VALUE_IS_OUT_OF_RANGE_OF_DATA_TYPE - }; - } + throw Exception(ErrorCodes::VALUE_IS_OUT_OF_RANGE_OF_DATA_TYPE, + "Input value {} of a column '{}' is greater than max allowed Date value, which is {}", + days_num, internal_column.getName(), DATE_LUT_MAX_DAY_NUM); column_data.emplace_back(days_num); } } } - static void fillDate32ColumnWithDate32Data(std::shared_ptr & arrow_column, IColumn & internal_column) +static void fillDate32ColumnWithDate32Data(std::shared_ptr & arrow_column, IColumn & internal_column) +{ + PaddedPODArray & column_data = assert_cast &>(internal_column).getData(); + column_data.reserve(arrow_column->length()); + + for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) { - PaddedPODArray & column_data = assert_cast &>(internal_column).getData(); - column_data.reserve(arrow_column->length()); + arrow::Date32Array & chunk = dynamic_cast(*(arrow_column->chunk(chunk_i))); - for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) + for (size_t value_i = 0, length = static_cast(chunk.length()); value_i < length; ++value_i) { - arrow::Date32Array & chunk = static_cast(*(arrow_column->chunk(chunk_i))); + Int32 days_num = static_cast(chunk.Value(value_i)); + if (days_num > DATE_LUT_MAX_EXTEND_DAY_NUM) + throw Exception(ErrorCodes::VALUE_IS_OUT_OF_RANGE_OF_DATA_TYPE, + "Input value {} of a column '{}' is greater than max allowed Date value, which is {}", days_num, internal_column.getName(), DATE_LUT_MAX_DAY_NUM); - for (size_t value_i = 0, length = static_cast(chunk.length()); value_i < length; ++value_i) - { - Int32 days_num = static_cast(chunk.Value(value_i)); - if (days_num > DATE_LUT_MAX_EXTEND_DAY_NUM) - { - // TODO: will it rollback correctly? - throw Exception - { - fmt::format("Input value {} of a column \"{}\" is greater than max allowed Date value, which is {}", days_num, internal_column.getName(), DATE_LUT_MAX_DAY_NUM), - ErrorCodes::VALUE_IS_OUT_OF_RANGE_OF_DATA_TYPE - }; - } - - column_data.emplace_back(days_num); - } + column_data.emplace_back(days_num); } } +} /// Arrow stores Parquet::DATETIME in Int64, while ClickHouse stores DateTime in UInt32. Therefore, it should be checked before saving - static void fillColumnWithDate64Data(std::shared_ptr & arrow_column, IColumn & internal_column) - { - auto & column_data = assert_cast &>(internal_column).getData(); - column_data.reserve(arrow_column->length()); +static void fillColumnWithDate64Data(std::shared_ptr & arrow_column, IColumn & internal_column) +{ + auto & column_data = assert_cast &>(internal_column).getData(); + column_data.reserve(arrow_column->length()); - for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) + for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) + { + auto & chunk = dynamic_cast(*(arrow_column->chunk(chunk_i))); + for (size_t value_i = 0, length = static_cast(chunk.length()); value_i < length; ++value_i) { - auto & chunk = static_cast(*(arrow_column->chunk(chunk_i))); - for (size_t value_i = 0, length = static_cast(chunk.length()); value_i < length; ++value_i) - { - auto timestamp = static_cast(chunk.Value(value_i) / 1000); // Always? 
in ms - column_data.emplace_back(timestamp); - } + auto timestamp = static_cast(chunk.Value(value_i) / 1000); // Always? in ms + column_data.emplace_back(timestamp); } } +} - static void fillColumnWithTimestampData(std::shared_ptr & arrow_column, IColumn & internal_column) +static void fillColumnWithTimestampData(std::shared_ptr & arrow_column, IColumn & internal_column) +{ + auto & column_data = assert_cast &>(internal_column).getData(); + column_data.reserve(arrow_column->length()); + + for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) { - auto & column_data = assert_cast &>(internal_column).getData(); - column_data.reserve(arrow_column->length()); + auto & chunk = dynamic_cast(*(arrow_column->chunk(chunk_i))); + const auto & type = static_cast(*chunk.type()); - for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) + UInt32 divide = 1; + const auto unit = type.unit(); + switch (unit) { - auto & chunk = static_cast(*(arrow_column->chunk(chunk_i))); - const auto & type = static_cast(*chunk.type()); + case arrow::TimeUnit::SECOND: + divide = 1; + break; + case arrow::TimeUnit::MILLI: + divide = 1000; + break; + case arrow::TimeUnit::MICRO: + divide = 1000000; + break; + case arrow::TimeUnit::NANO: + divide = 1000000000; + break; + } - UInt32 divide = 1; - const auto unit = type.unit(); - switch (unit) - { - case arrow::TimeUnit::SECOND: - divide = 1; - break; - case arrow::TimeUnit::MILLI: - divide = 1000; - break; - case arrow::TimeUnit::MICRO: - divide = 1000000; - break; - case arrow::TimeUnit::NANO: - divide = 1000000000; - break; - } - - for (size_t value_i = 0, length = static_cast(chunk.length()); value_i < length; ++value_i) - { - auto timestamp = static_cast(chunk.Value(value_i) / divide); // ms! TODO: check other 's' 'ns' ... - column_data.emplace_back(timestamp); - } + for (size_t value_i = 0, length = static_cast(chunk.length()); value_i < length; ++value_i) + { + auto timestamp = static_cast(chunk.Value(value_i) / divide); // ms! TODO: check other 's' 'ns' ... + column_data.emplace_back(timestamp); } } +} - template - static void fillColumnWithDecimalData(std::shared_ptr & arrow_column, IColumn & internal_column) +template +static void fillColumnWithDecimalData(std::shared_ptr & arrow_column, IColumn & internal_column) +{ + auto & column = assert_cast &>(internal_column); + auto & column_data = column.getData(); + column_data.reserve(arrow_column->length()); + + for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) { - auto & column = assert_cast &>(internal_column); - auto & column_data = column.getData(); - column_data.reserve(arrow_column->length()); - - for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) + auto & chunk = static_cast(*(arrow_column->chunk(chunk_i))); + for (size_t value_i = 0, length = static_cast(chunk.length()); value_i < length; ++value_i) { - auto & chunk = static_cast(*(arrow_column->chunk(chunk_i))); - for (size_t value_i = 0, length = static_cast(chunk.length()); value_i < length; ++value_i) - { - column_data.emplace_back(chunk.IsNull(value_i) ? DecimalType(0) : *reinterpret_cast(chunk.Value(value_i))); // TODO: copy column - } + column_data.emplace_back(chunk.IsNull(value_i) ? 
DecimalType(0) : *reinterpret_cast(chunk.Value(value_i))); // TODO: copy column } } +} /// Creates a null bytemap from arrow's null bitmap - static void fillByteMapFromArrowColumn(std::shared_ptr & arrow_column, IColumn & bytemap) +static void fillByteMapFromArrowColumn(std::shared_ptr & arrow_column, IColumn & bytemap) +{ + PaddedPODArray & bytemap_data = assert_cast &>(bytemap).getData(); + bytemap_data.reserve(arrow_column->length()); + + for (size_t chunk_i = 0; chunk_i != static_cast(arrow_column->num_chunks()); ++chunk_i) { - PaddedPODArray & bytemap_data = assert_cast &>(bytemap).getData(); - bytemap_data.reserve(arrow_column->length()); + std::shared_ptr chunk = arrow_column->chunk(chunk_i); - for (size_t chunk_i = 0; chunk_i != static_cast(arrow_column->num_chunks()); ++chunk_i) - { - std::shared_ptr chunk = arrow_column->chunk(chunk_i); - - for (size_t value_i = 0; value_i != static_cast(chunk->length()); ++value_i) - bytemap_data.emplace_back(chunk->IsNull(value_i)); - } + for (size_t value_i = 0; value_i != static_cast(chunk->length()); ++value_i) + bytemap_data.emplace_back(chunk->IsNull(value_i)); } +} - static void fillOffsetsFromArrowListColumn(std::shared_ptr & arrow_column, IColumn & offsets) +static void fillOffsetsFromArrowListColumn(std::shared_ptr & arrow_column, IColumn & offsets) +{ + ColumnArray::Offsets & offsets_data = assert_cast &>(offsets).getData(); + offsets_data.reserve(arrow_column->length()); + + for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) { - ColumnArray::Offsets & offsets_data = assert_cast &>(offsets).getData(); - offsets_data.reserve(arrow_column->length()); - - for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) - { - arrow::ListArray & list_chunk = static_cast(*(arrow_column->chunk(chunk_i))); - auto arrow_offsets_array = list_chunk.offsets(); - auto & arrow_offsets = static_cast(*arrow_offsets_array); - auto start = offsets_data.back(); - for (int64_t i = 1; i < arrow_offsets.length(); ++i) - offsets_data.emplace_back(start + arrow_offsets.Value(i)); - } + arrow::ListArray & list_chunk = dynamic_cast(*(arrow_column->chunk(chunk_i))); + auto arrow_offsets_array = list_chunk.offsets(); + auto & arrow_offsets = dynamic_cast(*arrow_offsets_array); + auto start = offsets_data.back(); + for (int64_t i = 1; i < arrow_offsets.length(); ++i) + offsets_data.emplace_back(start + arrow_offsets.Value(i)); } - static ColumnPtr createAndFillColumnWithIndexesData(std::shared_ptr & arrow_column) +} +static ColumnPtr createAndFillColumnWithIndexesData(std::shared_ptr & arrow_column) +{ + switch (arrow_column->type()->id()) { - switch (arrow_column->type()->id()) - { -# define DISPATCH(ARROW_NUMERIC_TYPE, CPP_NUMERIC_TYPE) \ - case ARROW_NUMERIC_TYPE: \ - { \ - auto column = DataTypeNumber().createColumn(); \ - fillColumnWithNumericData(arrow_column, *column); \ - return column; \ - } - FOR_ARROW_INDEXES_TYPES(DISPATCH) -# undef DISPATCH - default: - throw Exception(fmt::format("Unsupported type for indexes in LowCardinality: {}.", arrow_column->type()->name()), ErrorCodes::BAD_ARGUMENTS); - } - } - - static void readColumnFromArrowColumn( - std::shared_ptr & arrow_column, - IColumn & internal_column, - const std::string & column_name, - const std::string & format_name, - bool is_nullable, - std::unordered_map dictionary_values) - { - if (internal_column.isNullable()) - { - ColumnNullable & column_nullable = assert_cast(internal_column); - 
readColumnFromArrowColumn(arrow_column, column_nullable.getNestedColumn(), column_name, format_name, true, dictionary_values); - fillByteMapFromArrowColumn(arrow_column, column_nullable.getNullMapColumn()); - return; - } - - /// TODO: check if a column is const? - if (!is_nullable && arrow_column->null_count() && arrow_column->type()->id() != arrow::Type::LIST - && arrow_column->type()->id() != arrow::Type::MAP && arrow_column->type()->id() != arrow::Type::STRUCT) - { - throw Exception - { - fmt::format("Can not insert NULL data into non-nullable column \"{}\".", column_name), - ErrorCodes::CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN - }; - } - - switch (arrow_column->type()->id()) - { - case arrow::Type::STRING: - case arrow::Type::BINARY: - //case arrow::Type::FIXED_SIZE_BINARY: - fillColumnWithStringData(arrow_column, internal_column); - break; - case arrow::Type::BOOL: - fillColumnWithBooleanData(arrow_column, internal_column); - break; - case arrow::Type::DATE32: - if (WhichDataType(internal_column.getDataType()).isUInt16()) - { - fillColumnWithDate32Data(arrow_column, internal_column); - } - else - { - fillDate32ColumnWithDate32Data(arrow_column, internal_column); - } - break; - case arrow::Type::DATE64: - fillColumnWithDate64Data(arrow_column, internal_column); - break; - case arrow::Type::TIMESTAMP: - fillColumnWithTimestampData(arrow_column, internal_column); - break; - case arrow::Type::DECIMAL128: - fillColumnWithDecimalData(arrow_column, internal_column /*, internal_nested_type*/); - break; - case arrow::Type::DECIMAL256: - fillColumnWithDecimalData(arrow_column, internal_column /*, internal_nested_type*/); - break; - case arrow::Type::MAP: [[fallthrough]]; - case arrow::Type::LIST: - { - arrow::ArrayVector array_vector; - array_vector.reserve(arrow_column->num_chunks()); - for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) - { - arrow::ListArray & list_chunk = static_cast(*(arrow_column->chunk(chunk_i))); - std::shared_ptr chunk = list_chunk.values(); - array_vector.emplace_back(std::move(chunk)); - } - auto arrow_nested_column = std::make_shared(array_vector); - - ColumnArray & column_array = arrow_column->type()->id() == arrow::Type::MAP - ? assert_cast(internal_column).getNestedColumn() - : assert_cast(internal_column); - - readColumnFromArrowColumn(arrow_nested_column, column_array.getData(), column_name, format_name, false, dictionary_values); - fillOffsetsFromArrowListColumn(arrow_column, column_array.getOffsetsColumn()); - break; - } - case arrow::Type::STRUCT: - { - ColumnTuple & column_tuple = assert_cast(internal_column); - int fields_count = column_tuple.tupleSize(); - std::vector nested_arrow_columns(fields_count); - for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) - { - arrow::StructArray & struct_chunk = static_cast(*(arrow_column->chunk(chunk_i))); - for (int i = 0; i < fields_count; ++i) - nested_arrow_columns[i].emplace_back(struct_chunk.field(i)); - } - - for (int i = 0; i != fields_count; ++i) - { - auto nested_arrow_column = std::make_shared(nested_arrow_columns[i]); - readColumnFromArrowColumn(nested_arrow_column, column_tuple.getColumn(i), column_name, format_name, false, dictionary_values); - } - break; - } - case arrow::Type::DICTIONARY: - { - ColumnLowCardinality & column_lc = assert_cast(internal_column); - auto & dict_values = dictionary_values[column_name]; - /// Load dictionary values only once and reuse it. 
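For the TIMESTAMP branch used above, Arrow values may arrive in seconds, milliseconds, microseconds or nanoseconds, and the converter normalizes them to whole seconds with a unit-dependent divisor. The same switch in isolation, with an illustrative enum standing in for `arrow::TimeUnit`:

```cpp
#include <cstdint>
#include <iostream>

enum class TimeUnit { Second, Milli, Micro, Nano };   /// stands in for arrow::TimeUnit

uint32_t toSeconds(int64_t value, TimeUnit unit)
{
    uint32_t divide = 1;
    switch (unit)
    {
        case TimeUnit::Second: divide = 1; break;
        case TimeUnit::Milli:  divide = 1000; break;
        case TimeUnit::Micro:  divide = 1000000; break;
        case TimeUnit::Nano:   divide = 1000000000; break;
    }
    return static_cast<uint32_t>(value / divide);
}

int main()
{
    std::cout << toSeconds(1625000000123, TimeUnit::Milli) << '\n';   /// 1625000000
    std::cout << toSeconds(1625000000, TimeUnit::Second) << '\n';     /// 1625000000
}
```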
- if (!dict_values) - { - arrow::ArrayVector dict_array; - for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) - { - arrow::DictionaryArray & dict_chunk = static_cast(*(arrow_column->chunk(chunk_i))); - dict_array.emplace_back(dict_chunk.dictionary()); - } - auto arrow_dict_column = std::make_shared(dict_array); - - auto dict_column = IColumn::mutate(column_lc.getDictionaryPtr()); - auto * uniq_column = static_cast(dict_column.get()); - auto values_column = uniq_column->getNestedColumn()->cloneEmpty(); - readColumnFromArrowColumn(arrow_dict_column, *values_column, column_name, format_name, false, dictionary_values); - uniq_column->uniqueInsertRangeFrom(*values_column, 0, values_column->size()); - dict_values = std::move(dict_column); - } - - arrow::ArrayVector indexes_array; - for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) - { - arrow::DictionaryArray & dict_chunk = static_cast(*(arrow_column->chunk(chunk_i))); - indexes_array.emplace_back(dict_chunk.indices()); - } - - auto arrow_indexes_column = std::make_shared(indexes_array); - auto indexes_column = createAndFillColumnWithIndexesData(arrow_indexes_column); - - auto new_column_lc = ColumnLowCardinality::create(dict_values, std::move(indexes_column)); - column_lc = std::move(*new_column_lc); - break; - } # define DISPATCH(ARROW_NUMERIC_TYPE, CPP_NUMERIC_TYPE) \ case ARROW_NUMERIC_TYPE: \ - fillColumnWithNumericData(arrow_column, internal_column); \ - break; - - FOR_ARROW_NUMERIC_TYPES(DISPATCH) -# undef DISPATCH - // TODO: support TIMESTAMP_MICROS and TIMESTAMP_MILLIS with truncated micro- and milliseconds? - // TODO: read JSON as a string? - // TODO: read UUID as a string? - default: - throw Exception - { - fmt::format(R"(Unsupported {} type "{}" of an input column "{}".)", format_name, arrow_column->type()->name(), column_name), - ErrorCodes::UNKNOWN_TYPE - }; + { \ + auto column = DataTypeNumber().createColumn(); \ + fillColumnWithNumericData(arrow_column, *column); \ + return column; \ } + FOR_ARROW_INDEXES_TYPES(DISPATCH) +# undef DISPATCH + default: + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Unsupported type for indexes in LowCardinality: {}.", arrow_column->type()->name()); + } +} + +static void readColumnFromArrowColumn( + std::shared_ptr & arrow_column, + IColumn & internal_column, + const std::string & column_name, + const std::string & format_name, + bool is_nullable, + std::unordered_map dictionary_values) +{ + if (internal_column.isNullable()) + { + ColumnNullable & column_nullable = assert_cast(internal_column); + readColumnFromArrowColumn( + arrow_column, column_nullable.getNestedColumn(), column_name, format_name, true, dictionary_values); + fillByteMapFromArrowColumn(arrow_column, column_nullable.getNullMapColumn()); + return; } - static DataTypePtr getInternalType(std::shared_ptr arrow_type, const DataTypePtr & column_type, const std::string & column_name, const std::string & format_name) + /// TODO: check if a column is const? 
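Nullable columns are handled above by reading the nested column first and then filling a separate null byte map. Arrow tracks validity as a bitmap (bit set means the value is present), while ClickHouse stores one byte per row where 1 means NULL. A minimal sketch of that inversion, assuming direct access to the raw bitmap rather than going through Arrow's `IsNull()` helper:

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

/// Turn an Arrow-style validity bitmap (bit set == value is present) into a
/// ClickHouse-style null bytemap (byte == 1 means the value is NULL).
std::vector<uint8_t> bitmapToNullBytemap(const std::vector<uint8_t> & validity_bits, size_t length)
{
    std::vector<uint8_t> null_bytemap(length);
    for (size_t i = 0; i < length; ++i)
    {
        bool is_valid = validity_bits[i / 8] & (1u << (i % 8));
        null_bytemap[i] = is_valid ? 0 : 1;
    }
    return null_bytemap;
}

int main()
{
    std::vector<uint8_t> validity = {0b00000101};       /// rows 0 and 2 present, row 1 NULL
    for (uint8_t flag : bitmapToNullBytemap(validity, 3))
        std::cout << int(flag) << ' ';                  /// prints: 0 1 0
    std::cout << '\n';
}
```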
+ if (!is_nullable && arrow_column->null_count() && arrow_column->type()->id() != arrow::Type::LIST + && arrow_column->type()->id() != arrow::Type::MAP && arrow_column->type()->id() != arrow::Type::STRUCT) { - if (column_type->isNullable()) - { - DataTypePtr nested_type = assert_cast(column_type.get())->getNestedType(); - return makeNullable(getInternalType(arrow_type, nested_type, column_name, format_name)); - } - - if (arrow_type->id() == arrow::Type::DECIMAL128) - { - const auto * decimal_type = static_cast(arrow_type.get()); - return std::make_shared>(decimal_type->precision(), decimal_type->scale()); - } - - if (arrow_type->id() == arrow::Type::DECIMAL256) - { - const auto * decimal_type = static_cast(arrow_type.get()); - return std::make_shared>(decimal_type->precision(), decimal_type->scale()); - } - - if (arrow_type->id() == arrow::Type::LIST) - { - const auto * list_type = static_cast(arrow_type.get()); - auto list_nested_type = list_type->value_type(); - - const DataTypeArray * array_type = typeid_cast(column_type.get()); - if (!array_type) - throw Exception{fmt::format("Cannot convert arrow LIST type to a not Array ClickHouse type {}.", column_type->getName()), ErrorCodes::CANNOT_CONVERT_TYPE}; - - return std::make_shared(getInternalType(list_nested_type, array_type->getNestedType(), column_name, format_name)); - } - - if (arrow_type->id() == arrow::Type::STRUCT) - { - const auto * struct_type = static_cast(arrow_type.get()); - const DataTypeTuple * tuple_type = typeid_cast(column_type.get()); - if (!tuple_type) - throw Exception{fmt::format("Cannot convert arrow STRUCT type to a not Tuple ClickHouse type {}.", column_type->getName()), ErrorCodes::CANNOT_CONVERT_TYPE}; - - const DataTypes & tuple_nested_types = tuple_type->getElements(); - int internal_fields_num = tuple_nested_types.size(); - /// If internal column has less elements then arrow struct, we will select only first internal_fields_num columns. - if (internal_fields_num > struct_type->num_fields()) - throw Exception - { - fmt::format( - "Cannot convert arrow STRUCT with {} fields to a ClickHouse Tuple with {} elements: {}.", - struct_type->num_fields(), - internal_fields_num, - column_type->getName()), - ErrorCodes::CANNOT_CONVERT_TYPE - }; - - DataTypes nested_types; - for (int i = 0; i < internal_fields_num; ++i) - nested_types.push_back(getInternalType(struct_type->field(i)->type(), tuple_nested_types[i], column_name, format_name)); - - return std::make_shared(std::move(nested_types)); - } - - if (arrow_type->id() == arrow::Type::DICTIONARY) - { - const auto * arrow_dict_type = static_cast(arrow_type.get()); - const auto * lc_type = typeid_cast(column_type.get()); - /// We allow to insert arrow dictionary into a non-LowCardinality column. - const auto & dict_type = lc_type ? 
lc_type->getDictionaryType() : column_type; - return std::make_shared(getInternalType(arrow_dict_type->value_type(), dict_type, column_name, format_name)); - } - - if (arrow_type->id() == arrow::Type::MAP) - { - const auto * arrow_map_type = typeid_cast(arrow_type.get()); - const auto * map_type = typeid_cast(column_type.get()); - if (!map_type) - throw Exception{fmt::format("Cannot convert arrow MAP type to a not Map ClickHouse type {}.", column_type->getName()), ErrorCodes::CANNOT_CONVERT_TYPE}; - - return std::make_shared( - getInternalType(arrow_map_type->key_type(), map_type->getKeyType(), column_name, format_name), - getInternalType(arrow_map_type->item_type(), map_type->getValueType(), column_name, format_name) - ); - } - - auto filter = [=](auto && elem) - { - auto which = WhichDataType(column_type); - if (arrow_type->id() == arrow::Type::DATE32 && which.isDateOrDate32()) - { - return (strcmp(elem.second, "Date") == 0 && which.isDate()) || (strcmp(elem.second, "Date32") == 0 && which.isDate32()); - } - else - { - return elem.first == arrow_type->id(); - } - }; - if (const auto * internal_type_it = std::find_if(arrow_type_to_internal_type.begin(), arrow_type_to_internal_type.end(), filter); - internal_type_it != arrow_type_to_internal_type.end()) - { - return DataTypeFactory::instance().get(internal_type_it->second); - } throw Exception { - fmt::format(R"(The type "{}" of an input column "{}" is not supported for conversion from a {} data format.)", arrow_type->name(), column_name, format_name), - ErrorCodes::CANNOT_CONVERT_TYPE + ErrorCodes::CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN, + "Can not insert NULL data into non-nullable column \"{}\".", column_name }; } - ArrowColumnToCHColumn::ArrowColumnToCHColumn(const Block & header_, std::shared_ptr schema_, const std::string & format_name_) : header(header_), format_name(format_name_) + switch (arrow_column->type()->id()) { - for (const auto & field : schema_->fields()) - { - if (header.has(field->name())) + case arrow::Type::STRING: + case arrow::Type::BINARY: + //case arrow::Type::FIXED_SIZE_BINARY: + fillColumnWithStringData(arrow_column, internal_column); + break; + case arrow::Type::BOOL: + fillColumnWithBooleanData(arrow_column, internal_column); + break; + case arrow::Type::DATE32: + if (WhichDataType(internal_column.getDataType()).isUInt16()) { - const auto column_type = recursiveRemoveLowCardinality(header.getByName(field->name()).type); - name_to_internal_type[field->name()] = getInternalType(field->type(), column_type, field->name(), format_name); + fillColumnWithDate32Data(arrow_column, internal_column); } - } - } - - void ArrowColumnToCHColumn::arrowTableToCHChunk(Chunk & res, std::shared_ptr & table) - { - Columns columns_list; - UInt64 num_rows = 0; - - columns_list.reserve(header.rows()); - - using NameToColumnPtr = std::unordered_map>; - - NameToColumnPtr name_to_column_ptr; - for (const auto& column_name : table->ColumnNames()) + else + { + fillDate32ColumnWithDate32Data(arrow_column, internal_column); + } + break; + case arrow::Type::DATE64: + fillColumnWithDate64Data(arrow_column, internal_column); + break; + case arrow::Type::TIMESTAMP: + fillColumnWithTimestampData(arrow_column, internal_column); + break; +#if defined(ARCADIA_BUILD) + case arrow::Type::DECIMAL: + fillColumnWithDecimalData(arrow_column, internal_column /*, internal_nested_type*/); + break; +#else + case arrow::Type::DECIMAL128: + fillColumnWithDecimalData(arrow_column, internal_column /*, internal_nested_type*/); + break; + case 
arrow::Type::DECIMAL256: + fillColumnWithDecimalData(arrow_column, internal_column /*, internal_nested_type*/); + break; +#endif + case arrow::Type::MAP: [[fallthrough]]; + case arrow::Type::LIST: { - std::shared_ptr arrow_column = table->GetColumnByName(column_name); - name_to_column_ptr[column_name] = arrow_column; - } + arrow::ArrayVector array_vector; + array_vector.reserve(arrow_column->num_chunks()); + for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) + { + arrow::ListArray & list_chunk = dynamic_cast(*(arrow_column->chunk(chunk_i))); + std::shared_ptr chunk = list_chunk.values(); + array_vector.emplace_back(std::move(chunk)); + } + auto arrow_nested_column = std::make_shared(array_vector); - for (size_t column_i = 0, columns = header.columns(); column_i < columns; ++column_i) + ColumnArray & column_array = arrow_column->type()->id() == arrow::Type::MAP + ? assert_cast(internal_column).getNestedColumn() + : assert_cast(internal_column); + + readColumnFromArrowColumn( + arrow_nested_column, column_array.getData(), column_name, format_name, false, dictionary_values); + + fillOffsetsFromArrowListColumn(arrow_column, column_array.getOffsetsColumn()); + break; + } + case arrow::Type::STRUCT: { - const ColumnWithTypeAndName & header_column = header.getByPosition(column_i); + ColumnTuple & column_tuple = assert_cast(internal_column); + int fields_count = column_tuple.tupleSize(); + std::vector nested_arrow_columns(fields_count); + for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) + { + arrow::StructArray & struct_chunk = dynamic_cast(*(arrow_column->chunk(chunk_i))); + for (int i = 0; i < fields_count; ++i) + nested_arrow_columns[i].emplace_back(struct_chunk.field(i)); + } - if (name_to_column_ptr.find(header_column.name) == name_to_column_ptr.end()) - // TODO: What if some columns were not presented? Insert NULLs? What if a column is not nullable? - throw Exception{fmt::format("Column \"{}\" is not presented in input data.", header_column.name), - ErrorCodes::THERE_IS_NO_COLUMN}; - - std::shared_ptr arrow_column = name_to_column_ptr[header_column.name]; - - DataTypePtr & internal_type = name_to_internal_type[header_column.name]; - MutableColumnPtr read_column = internal_type->createColumn(); - readColumnFromArrowColumn(arrow_column, *read_column, header_column.name, format_name, false, dictionary_values); - - ColumnWithTypeAndName column; - column.name = header_column.name; - column.type = internal_type; - column.column = std::move(read_column); - - column.column = castColumn(column, header_column.type); - column.type = header_column.type; - num_rows = column.column->size(); - columns_list.push_back(std::move(column.column)); + for (int i = 0; i != fields_count; ++i) + { + auto nested_arrow_column = std::make_shared(nested_arrow_columns[i]); + readColumnFromArrowColumn( + nested_arrow_column, column_tuple.getColumn(i), column_name, format_name, false, dictionary_values); + } + break; } + case arrow::Type::DICTIONARY: + { + ColumnLowCardinality & column_lc = assert_cast(internal_column); + auto & dict_values = dictionary_values[column_name]; - res.setColumns(columns_list, num_rows); + /// Load dictionary values only once and reuse it. 
+ if (!dict_values) + { + arrow::ArrayVector dict_array; + for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) + { + arrow::DictionaryArray & dict_chunk = dynamic_cast(*(arrow_column->chunk(chunk_i))); + dict_array.emplace_back(dict_chunk.dictionary()); + } + auto arrow_dict_column = std::make_shared(dict_array); + + auto dict_column = IColumn::mutate(column_lc.getDictionaryPtr()); + auto * uniq_column = static_cast(dict_column.get()); + auto values_column = uniq_column->getNestedColumn()->cloneEmpty(); + readColumnFromArrowColumn( + arrow_dict_column, *values_column, column_name, format_name, false, dictionary_values); + uniq_column->uniqueInsertRangeFrom(*values_column, 0, values_column->size()); + dict_values = std::move(dict_column); + } + + arrow::ArrayVector indexes_array; + for (size_t chunk_i = 0, num_chunks = static_cast(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i) + { + arrow::DictionaryArray & dict_chunk = dynamic_cast(*(arrow_column->chunk(chunk_i))); + indexes_array.emplace_back(dict_chunk.indices()); + } + + auto arrow_indexes_column = std::make_shared(indexes_array); + auto indexes_column = createAndFillColumnWithIndexesData(arrow_indexes_column); + + auto new_column_lc = ColumnLowCardinality::create(dict_values, std::move(indexes_column)); + column_lc = std::move(*new_column_lc); + break; + } +# define DISPATCH(ARROW_NUMERIC_TYPE, CPP_NUMERIC_TYPE) \ + case ARROW_NUMERIC_TYPE: \ + fillColumnWithNumericData(arrow_column, internal_column); \ + break; + + FOR_ARROW_NUMERIC_TYPES(DISPATCH) +# undef DISPATCH + // TODO: support TIMESTAMP_MICROS and TIMESTAMP_MILLIS with truncated micro- and milliseconds? + // TODO: read JSON as a string? + // TODO: read UUID as a string? 
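The DICTIONARY branch above mirrors how LowCardinality works: the unique values are materialized once per column and reused for every chunk, and each row only carries an index into that dictionary. A self-contained sketch of the encoding idea, without the real `ColumnLowCardinality`/`ColumnUnique` machinery:

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

int main()
{
    std::vector<std::string> rows = {"red", "green", "red", "red", "blue"};

    std::vector<std::string> dictionary;                 /// unique values, built once and reused
    std::unordered_map<std::string, uint32_t> positions;
    std::vector<uint32_t> indexes;                       /// per-row indexes into the dictionary

    for (const auto & value : rows)
    {
        auto [it, inserted] = positions.try_emplace(value, static_cast<uint32_t>(dictionary.size()));
        if (inserted)
            dictionary.push_back(value);
        indexes.push_back(it->second);
    }

    std::cout << dictionary.size() << " unique values\n"; /// 3 unique values
    for (uint32_t index : indexes)
        std::cout << index << ' ';                        /// prints: 0 1 0 0 2
    std::cout << '\n';
}
```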
+ default: + throw Exception(ErrorCodes::UNKNOWN_TYPE, + "Unsupported {} type '{}' of an input column '{}'.", format_name, arrow_column->type()->name(), column_name); } } + +static DataTypePtr getInternalType( + std::shared_ptr arrow_type, + const DataTypePtr & column_type, + const std::string & column_name, + const std::string & format_name) +{ + if (column_type->isNullable()) + { + DataTypePtr nested_type = assert_cast(column_type.get())->getNestedType(); + return makeNullable(getInternalType(arrow_type, nested_type, column_name, format_name)); + } + +#if defined(ARCADIA_BUILD) + if (arrow_type->id() == arrow::Type::DECIMAL) + { + const auto & decimal_type = dynamic_cast(*arrow_type); + return std::make_shared>(decimal_type.precision(), decimal_type.scale()); + } +#else + if (arrow_type->id() == arrow::Type::DECIMAL128) + { + const auto & decimal_type = dynamic_cast(*arrow_type); + return std::make_shared>(decimal_type.precision(), decimal_type.scale()); + } + + if (arrow_type->id() == arrow::Type::DECIMAL256) + { + const auto & decimal_type = dynamic_cast(*arrow_type); + return std::make_shared>(decimal_type.precision(), decimal_type.scale()); + } +#endif + + if (arrow_type->id() == arrow::Type::LIST) + { + const auto & list_type = dynamic_cast(*arrow_type); + auto list_nested_type = list_type.value_type(); + + const DataTypeArray * array_type = typeid_cast(column_type.get()); + if (!array_type) + throw Exception{ErrorCodes::CANNOT_CONVERT_TYPE, + "Cannot convert arrow LIST type to a not Array ClickHouse type {}.", column_type->getName()}; + + return std::make_shared(getInternalType(list_nested_type, array_type->getNestedType(), column_name, format_name)); + } + + if (arrow_type->id() == arrow::Type::STRUCT) + { + const auto & struct_type = dynamic_cast(*arrow_type); + const DataTypeTuple * tuple_type = typeid_cast(column_type.get()); + if (!tuple_type) + throw Exception{ErrorCodes::CANNOT_CONVERT_TYPE, + "Cannot convert arrow STRUCT type to a not Tuple ClickHouse type {}.", column_type->getName()}; + + const DataTypes & tuple_nested_types = tuple_type->getElements(); + int internal_fields_num = tuple_nested_types.size(); + /// If internal column has less elements then arrow struct, we will select only first internal_fields_num columns. + if (internal_fields_num > struct_type.num_fields()) + throw Exception( + ErrorCodes::CANNOT_CONVERT_TYPE, + "Cannot convert arrow STRUCT with {} fields to a ClickHouse Tuple with {} elements: {}.", + struct_type.num_fields(), + internal_fields_num, + column_type->getName()); + + DataTypes nested_types; + for (int i = 0; i < internal_fields_num; ++i) + nested_types.push_back(getInternalType(struct_type.field(i)->type(), tuple_nested_types[i], column_name, format_name)); + + return std::make_shared(std::move(nested_types)); + } + + if (arrow_type->id() == arrow::Type::DICTIONARY) + { + const auto & arrow_dict_type = dynamic_cast(*arrow_type); + const auto * lc_type = typeid_cast(column_type.get()); + /// We allow to insert arrow dictionary into a non-LowCardinality column. + const auto & dict_type = lc_type ? 
lc_type->getDictionaryType() : column_type; + return std::make_shared(getInternalType(arrow_dict_type.value_type(), dict_type, column_name, format_name)); + } + + if (arrow_type->id() == arrow::Type::MAP) + { + const auto & arrow_map_type = typeid_cast(*arrow_type); + const auto * map_type = typeid_cast(column_type.get()); + if (!map_type) + throw Exception{ErrorCodes::CANNOT_CONVERT_TYPE, "Cannot convert arrow MAP type to a not Map ClickHouse type {}.", column_type->getName()}; + + return std::make_shared( + getInternalType(arrow_map_type.key_type(), map_type->getKeyType(), column_name, format_name), + getInternalType(arrow_map_type.item_type(), map_type->getValueType(), column_name, format_name)); + } + + if (arrow_type->id() == arrow::Type::UINT16 + && (isDate(column_type) || isDateTime(column_type) || isDate32(column_type) || isDateTime64(column_type))) + { + /// Read UInt16 as Date. It will allow correct conversion to DateTime further. + return std::make_shared(); + } + + auto filter = [=](auto && elem) + { + auto which = WhichDataType(column_type); + if (arrow_type->id() == arrow::Type::DATE32 && which.isDateOrDate32()) + { + return (strcmp(elem.second, "Date") == 0 && which.isDate()) + || (strcmp(elem.second, "Date32") == 0 && which.isDate32()); + } + else + { + return elem.first == arrow_type->id(); + } + }; + if (const auto * internal_type_it = std::find_if(arrow_type_to_internal_type.begin(), arrow_type_to_internal_type.end(), filter); + internal_type_it != arrow_type_to_internal_type.end()) + { + return DataTypeFactory::instance().get(internal_type_it->second); + } + + throw Exception(ErrorCodes::CANNOT_CONVERT_TYPE, + "The type '{}' of an input column '{}' is not supported for conversion from {} data format.", + arrow_type->name(), column_name, format_name); +} + +ArrowColumnToCHColumn::ArrowColumnToCHColumn(const Block & header_, std::shared_ptr schema_, const std::string & format_name_) + : header(header_), format_name(format_name_) +{ + for (const auto & field : schema_->fields()) + { + if (header.has(field->name())) + { + const auto column_type = recursiveRemoveLowCardinality(header.getByName(field->name()).type); + name_to_internal_type[field->name()] = getInternalType(field->type(), column_type, field->name(), format_name); + } + } +} + +void ArrowColumnToCHColumn::arrowTableToCHChunk(Chunk & res, std::shared_ptr & table) +{ + Columns columns_list; + UInt64 num_rows = 0; + + columns_list.reserve(header.rows()); + + using NameToColumnPtr = std::unordered_map>; + + NameToColumnPtr name_to_column_ptr; + for (const auto & column_name : table->ColumnNames()) + { + std::shared_ptr arrow_column = table->GetColumnByName(column_name); + name_to_column_ptr[column_name] = arrow_column; + } + + for (size_t column_i = 0, columns = header.columns(); column_i < columns; ++column_i) + { + const ColumnWithTypeAndName & header_column = header.getByPosition(column_i); + + if (name_to_column_ptr.find(header_column.name) == name_to_column_ptr.end()) + // TODO: What if some columns were not presented? Insert NULLs? What if a column is not nullable? 
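Most scalar types above are resolved through the flat `arrow_type_to_internal_type` list with a `find_if` whose predicate special-cases DATE32, which may map to either `Date` or `Date32` depending on the target column. A reduced sketch of that lookup, with a hypothetical enum standing in for `arrow::Type`:

```cpp
#include <algorithm>
#include <cstring>
#include <initializer_list>
#include <iostream>
#include <string>
#include <utility>

enum class ArrowTypeId { UINT8, INT64, STRING, DATE32 };  /// hypothetical subset of arrow::Type

static const std::initializer_list<std::pair<ArrowTypeId, const char *>> type_mapping =
{
    {ArrowTypeId::UINT8, "UInt8"},
    {ArrowTypeId::INT64, "Int64"},
    {ArrowTypeId::STRING, "String"},
    {ArrowTypeId::DATE32, "Date"},     /// one Arrow id may map to several candidates
    {ArrowTypeId::DATE32, "Date32"},
};

std::string lookup(ArrowTypeId id, bool target_is_date32)
{
    auto it = std::find_if(type_mapping.begin(), type_mapping.end(), [&](const auto & elem)
    {
        if (id == ArrowTypeId::DATE32)
            return target_is_date32 ? std::strcmp(elem.second, "Date32") == 0
                                    : std::strcmp(elem.second, "Date") == 0;
        return elem.first == id;
    });
    return it != type_mapping.end() ? std::string(it->second) : std::string("unsupported");
}

int main()
{
    std::cout << lookup(ArrowTypeId::DATE32, true) << '\n';    /// Date32
    std::cout << lookup(ArrowTypeId::INT64, false) << '\n';    /// Int64
}
```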
+ throw Exception(ErrorCodes::THERE_IS_NO_COLUMN, + "Column '{}' is not presented in input data.", header_column.name); + + std::shared_ptr arrow_column = name_to_column_ptr[header_column.name]; + + DataTypePtr & internal_type = name_to_internal_type[header_column.name]; + MutableColumnPtr read_column = internal_type->createColumn(); + readColumnFromArrowColumn(arrow_column, *read_column, header_column.name, format_name, false, dictionary_values); + + ColumnWithTypeAndName column; + column.name = header_column.name; + column.type = internal_type; + column.column = std::move(read_column); + + column.column = castColumn(column, header_column.type); + column.type = header_column.type; + num_rows = column.column->size(); + columns_list.push_back(std::move(column.column)); + } + + res.setColumns(columns_list, num_rows); +} + +} + #endif diff --git a/src/Processors/Formats/Impl/ArrowColumnToCHColumn.h b/src/Processors/Formats/Impl/ArrowColumnToCHColumn.h index 7da54a8a02d..7f38dc7a31c 100644 --- a/src/Processors/Formats/Impl/ArrowColumnToCHColumn.h +++ b/src/Processors/Formats/Impl/ArrowColumnToCHColumn.h @@ -1,24 +1,21 @@ #pragma once -#include "config_formats.h" + +#if !defined(ARCADIA_BUILD) +# include "config_formats.h" +#endif #if USE_ARROW || USE_ORC || USE_PARQUET #include -#include -#include -#include -#include -#include -#include #include -#include -#include -#include -#include + namespace DB { +class Block; +class Chunk; + class ArrowColumnToCHColumn { public: @@ -27,37 +24,16 @@ public: void arrowTableToCHChunk(Chunk & res, std::shared_ptr & table); private: -#define FOR_ARROW_NUMERIC_TYPES(M) \ - M(arrow::Type::UINT8, DB::UInt8) \ - M(arrow::Type::INT8, DB::Int8) \ - M(arrow::Type::UINT16, DB::UInt16) \ - M(arrow::Type::INT16, DB::Int16) \ - M(arrow::Type::UINT32, DB::UInt32) \ - M(arrow::Type::INT32, DB::Int32) \ - M(arrow::Type::UINT64, DB::UInt64) \ - M(arrow::Type::INT64, DB::Int64) \ - M(arrow::Type::HALF_FLOAT, DB::Float32) \ - M(arrow::Type::FLOAT, DB::Float32) \ - M(arrow::Type::DOUBLE, DB::Float64) - -#define FOR_ARROW_INDEXES_TYPES(M) \ - M(arrow::Type::UINT8, DB::UInt8) \ - M(arrow::Type::INT8, DB::UInt8) \ - M(arrow::Type::UINT16, DB::UInt16) \ - M(arrow::Type::INT16, DB::UInt16) \ - M(arrow::Type::UINT32, DB::UInt32) \ - M(arrow::Type::INT32, DB::UInt32) \ - M(arrow::Type::UINT64, DB::UInt64) \ - M(arrow::Type::INT64, DB::UInt64) - - const Block & header; std::unordered_map name_to_internal_type; const std::string format_name; + /// Map {column name : dictionary column}. /// To avoid converting dictionary from Arrow Dictionary /// to LowCardinality every chunk we save it and reuse. std::unordered_map dictionary_values; }; + } + #endif diff --git a/src/Processors/Formats/Impl/AvroRowOutputFormat.cpp b/src/Processors/Formats/Impl/AvroRowOutputFormat.cpp index 81922bdde80..24b231e9ea8 100644 --- a/src/Processors/Formats/Impl/AvroRowOutputFormat.cpp +++ b/src/Processors/Formats/Impl/AvroRowOutputFormat.cpp @@ -41,6 +41,7 @@ #include #include +#include namespace DB { @@ -48,8 +49,32 @@ namespace ErrorCodes { extern const int ILLEGAL_COLUMN; extern const int BAD_ARGUMENTS; + extern const int CANNOT_COMPILE_REGEXP; } +class AvroSerializerTraits +{ +public: + explicit AvroSerializerTraits(const FormatSettings & settings_) + : string_to_string_regexp(settings_.avro.string_column_pattern) + { + if (!string_to_string_regexp.ok()) + throw DB::Exception( + "Avro: cannot compile re2: " + settings_.avro.string_column_pattern + ", error: " + string_to_string_regexp.error() + + ". 
Look at https://github.com/google/re2/wiki/Syntax for reference.", + DB::ErrorCodes::CANNOT_COMPILE_REGEXP); + } + + bool isStringAsString(const String & column_name) + { + return RE2::PartialMatch(column_name, string_to_string_regexp); + } + +private: + const RE2 string_to_string_regexp; +}; + + class OutputStreamWriteBufferAdapter : public avro::OutputStream { public: @@ -75,7 +100,7 @@ private: }; -AvroSerializer::SchemaWithSerializeFn AvroSerializer::createSchemaWithSerializeFn(DataTypePtr data_type, size_t & type_name_increment) +AvroSerializer::SchemaWithSerializeFn AvroSerializer::createSchemaWithSerializeFn(DataTypePtr data_type, size_t & type_name_increment, const String & column_name) { ++type_name_increment; @@ -161,11 +186,20 @@ AvroSerializer::SchemaWithSerializeFn AvroSerializer::createSchemaWithSerializeF }}; } case TypeIndex::String: - return {avro::BytesSchema(), [](const IColumn & column, size_t row_num, avro::Encoder & encoder) - { - const StringRef & s = assert_cast(column).getDataAt(row_num); - encoder.encodeBytes(reinterpret_cast(s.data), s.size); - }}; + if (traits->isStringAsString(column_name)) + return {avro::StringSchema(), [](const IColumn & column, size_t row_num, avro::Encoder & encoder) + { + const StringRef & s = assert_cast(column).getDataAt(row_num); + encoder.encodeString(s.toString()); + } + }; + else + return {avro::BytesSchema(), [](const IColumn & column, size_t row_num, avro::Encoder & encoder) + { + const StringRef & s = assert_cast(column).getDataAt(row_num); + encoder.encodeBytes(reinterpret_cast(s.data), s.size); + } + }; case TypeIndex::FixedString: { auto size = data_type->getSizeOfValueInMemory(); @@ -223,7 +257,7 @@ AvroSerializer::SchemaWithSerializeFn AvroSerializer::createSchemaWithSerializeF case TypeIndex::Array: { const auto & array_type = assert_cast(*data_type); - auto nested_mapping = createSchemaWithSerializeFn(array_type.getNestedType(), type_name_increment); + auto nested_mapping = createSchemaWithSerializeFn(array_type.getNestedType(), type_name_increment, column_name); auto schema = avro::ArraySchema(nested_mapping.schema); return {schema, [nested_mapping](const IColumn & column, size_t row_num, avro::Encoder & encoder) { @@ -249,7 +283,7 @@ AvroSerializer::SchemaWithSerializeFn AvroSerializer::createSchemaWithSerializeF case TypeIndex::Nullable: { auto nested_type = removeNullable(data_type); - auto nested_mapping = createSchemaWithSerializeFn(nested_type, type_name_increment); + auto nested_mapping = createSchemaWithSerializeFn(nested_type, type_name_increment, column_name); if (nested_type->getTypeId() == TypeIndex::Nothing) { return nested_mapping; @@ -278,7 +312,7 @@ AvroSerializer::SchemaWithSerializeFn AvroSerializer::createSchemaWithSerializeF case TypeIndex::LowCardinality: { const auto & nested_type = removeLowCardinality(data_type); - auto nested_mapping = createSchemaWithSerializeFn(nested_type, type_name_increment); + auto nested_mapping = createSchemaWithSerializeFn(nested_type, type_name_increment, column_name); return {nested_mapping.schema, [nested_mapping](const IColumn & column, size_t row_num, avro::Encoder & encoder) { const auto & col = assert_cast(column); @@ -294,7 +328,8 @@ AvroSerializer::SchemaWithSerializeFn AvroSerializer::createSchemaWithSerializeF } -AvroSerializer::AvroSerializer(const ColumnsWithTypeAndName & columns) +AvroSerializer::AvroSerializer(const ColumnsWithTypeAndName & columns, std::unique_ptr traits_) + : traits(std::move(traits_)) { avro::RecordSchema record_schema("row"); @@ -303,7 
+338,7 @@ AvroSerializer::AvroSerializer(const ColumnsWithTypeAndName & columns) { try { - auto field_mapping = createSchemaWithSerializeFn(column.type, type_name_increment); + auto field_mapping = createSchemaWithSerializeFn(column.type, type_name_increment, column.name); serialize_fns.push_back(field_mapping.serialize); //TODO: verify name starts with A-Za-z_ record_schema.addField(column.name, field_mapping.schema); @@ -314,7 +349,7 @@ AvroSerializer::AvroSerializer(const ColumnsWithTypeAndName & columns) throw; } } - schema.setSchema(record_schema); + valid_schema.setSchema(record_schema); } void AvroSerializer::serializeRow(const Columns & columns, size_t row_num, avro::Encoder & encoder) @@ -350,7 +385,7 @@ AvroRowOutputFormat::AvroRowOutputFormat( WriteBuffer & out_, const Block & header_, const RowOutputFormatParams & params_, const FormatSettings & settings_) : IRowOutputFormat(header_, out_, params_) , settings(settings_) - , serializer(header_.getColumnsWithTypeAndName()) + , serializer(header_.getColumnsWithTypeAndName(), std::make_unique(settings)) , file_writer( std::make_unique(out_), serializer.getSchema(), diff --git a/src/Processors/Formats/Impl/AvroRowOutputFormat.h b/src/Processors/Formats/Impl/AvroRowOutputFormat.h index 8d0581d3307..c807736071e 100644 --- a/src/Processors/Formats/Impl/AvroRowOutputFormat.h +++ b/src/Processors/Formats/Impl/AvroRowOutputFormat.h @@ -18,11 +18,13 @@ namespace DB { class WriteBuffer; +class AvroSerializerTraits; + class AvroSerializer { public: - AvroSerializer(const ColumnsWithTypeAndName & columns); - const avro::ValidSchema & getSchema() const { return schema; } + AvroSerializer(const ColumnsWithTypeAndName & columns, std::unique_ptr); + const avro::ValidSchema & getSchema() const { return valid_schema; } void serializeRow(const Columns & columns, size_t row_num, avro::Encoder & encoder); private: @@ -34,10 +36,11 @@ private: }; /// Type names for different complex types (e.g. enums, fixed strings) must be unique. We use simple incremental number to give them different names. 
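The new string-column-pattern behaviour above boils down to matching column names against a regular expression: matching columns are emitted with the Avro `string` schema, the rest keep `bytes`. A sketch of that decision, using `std::regex` purely to stay self-contained (the actual implementation compiles the pattern with re2 and throws `CANNOT_COMPILE_REGEXP` if it does not compile):

```cpp
#include <iostream>
#include <regex>
#include <string>
#include <vector>

int main()
{
    /// Columns whose names match the pattern are written as Avro "string",
    /// everything else keeps the default "bytes" schema.
    std::regex string_column_pattern("^name$|_text$");
    std::vector<std::string> columns = {"id", "name", "comment_text", "payload"};

    for (const auto & column : columns)
    {
        bool as_string = std::regex_search(column, string_column_pattern);
        std::cout << column << " -> " << (as_string ? "string" : "bytes") << '\n';
    }
}
```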
- static SchemaWithSerializeFn createSchemaWithSerializeFn(DataTypePtr data_type, size_t & type_name_increment); + SchemaWithSerializeFn createSchemaWithSerializeFn(DataTypePtr data_type, size_t & type_name_increment, const String & column_name); std::vector serialize_fns; - avro::ValidSchema schema; + avro::ValidSchema valid_schema; + std::unique_ptr traits; }; class AvroRowOutputFormat : public IRowOutputFormat diff --git a/src/Processors/Formats/Impl/CHColumnToArrowColumn.cpp b/src/Processors/Formats/Impl/CHColumnToArrowColumn.cpp index 230b28c657e..42aa9e6ddc7 100644 --- a/src/Processors/Formats/Impl/CHColumnToArrowColumn.cpp +++ b/src/Processors/Formats/Impl/CHColumnToArrowColumn.cpp @@ -23,6 +23,30 @@ #include #include +#define FOR_INTERNAL_NUMERIC_TYPES(M) \ + M(UInt8, arrow::UInt8Builder) \ + M(Int8, arrow::Int8Builder) \ + M(UInt16, arrow::UInt16Builder) \ + M(Int16, arrow::Int16Builder) \ + M(UInt32, arrow::UInt32Builder) \ + M(Int32, arrow::Int32Builder) \ + M(UInt64, arrow::UInt64Builder) \ + M(Int64, arrow::Int64Builder) \ + M(Float32, arrow::FloatBuilder) \ + M(Float64, arrow::DoubleBuilder) + +#define FOR_ARROW_TYPES(M) \ + M(UINT8, arrow::UInt8Type) \ + M(INT8, arrow::Int8Type) \ + M(UINT16, arrow::UInt16Type) \ + M(INT16, arrow::Int16Type) \ + M(UINT32, arrow::UInt32Type) \ + M(INT32, arrow::Int32Type) \ + M(UINT64, arrow::UInt64Type) \ + M(INT64, arrow::Int64Type) \ + M(FLOAT, arrow::FloatType) \ + M(DOUBLE, arrow::DoubleType) \ + M(STRING, arrow::StringType) namespace DB { @@ -46,11 +70,8 @@ namespace DB {"Float32", arrow::float32()}, {"Float64", arrow::float64()}, - //{"Date", arrow::date64()}, - //{"Date", arrow::date32()}, - {"Date", arrow::uint16()}, // CHECK - //{"DateTime", arrow::date64()}, // BUG! saves as date32 - {"DateTime", arrow::uint32()}, + {"Date", arrow::uint16()}, /// uint16 is used instead of date32, because Apache Arrow cannot correctly serialize Date32Array. + {"DateTime", arrow::uint32()}, /// uint32 is used instead of date64, because we don't need milliseconds. {"String", arrow::binary()}, {"FixedString", arrow::binary()}, @@ -265,11 +286,11 @@ namespace DB auto value_type = assert_cast(array_builder->type().get())->value_type(); #define DISPATCH(ARROW_TYPE_ID, ARROW_TYPE) \ - if (arrow::Type::ARROW_TYPE_ID == value_type->id()) \ - { \ - fillArrowArrayWithLowCardinalityColumnDataImpl(column_name, column, column_type, null_bytemap, array_builder, format_name, start, end, dictionary_values); \ - return; \ - } + if (arrow::Type::ARROW_TYPE_ID == value_type->id()) \ + { \ + fillArrowArrayWithLowCardinalityColumnDataImpl(column_name, column, column_type, null_bytemap, array_builder, format_name, start, end, dictionary_values); \ + return; \ + } FOR_ARROW_TYPES(DISPATCH) #undef DISPATCH @@ -337,7 +358,6 @@ namespace DB size_t end) { const auto & internal_data = assert_cast &>(*write_column).getData(); - //arrow::Date64Builder builder; arrow::UInt32Builder & builder = assert_cast(*array_builder); arrow::Status status; @@ -346,8 +366,6 @@ namespace DB if (null_bytemap && (*null_bytemap)[value_i]) status = builder.AppendNull(); else - /// Implicitly converts UInt16 to Int32 - //status = date_builder.Append(static_cast(internal_data[value_i]) * 1000); // now ms. 
TODO check other units status = builder.Append(internal_data[value_i]); checkStatus(status, write_column->getName(), format_name); @@ -367,7 +385,7 @@ namespace DB { const String column_type_name = column_type->getFamilyName(); - if ("Nullable" == column_type_name) + if (column_type->isNullable()) { const ColumnNullable * column_nullable = assert_cast(column.get()); ColumnPtr nested_column = column_nullable->getNestedColumnPtr(); @@ -376,35 +394,35 @@ namespace DB const PaddedPODArray & bytemap = assert_cast &>(*null_column).getData(); fillArrowArray(column_name, nested_column, nested_type, &bytemap, array_builder, format_name, start, end, dictionary_values); } - else if ("String" == column_type_name) + else if (isString(column_type)) { fillArrowArrayWithStringColumnData(column, null_bytemap, format_name, array_builder, start, end); } - else if ("FixedString" == column_type_name) + else if (isFixedString(column_type)) { fillArrowArrayWithStringColumnData(column, null_bytemap, format_name, array_builder, start, end); } - else if ("Date" == column_type_name) + else if (isDate(column_type)) { fillArrowArrayWithDateColumnData(column, null_bytemap, format_name, array_builder, start, end); } - else if ("DateTime" == column_type_name) + else if (isDateTime(column_type)) { fillArrowArrayWithDateTimeColumnData(column, null_bytemap, format_name, array_builder, start, end); } - else if ("Array" == column_type_name) + else if (isArray(column_type)) { fillArrowArrayWithArrayColumnData(column_name, column, column_type, null_bytemap, array_builder, format_name, start, end, dictionary_values); } - else if ("Tuple" == column_type_name) + else if (isTuple(column_type)) { fillArrowArrayWithTupleColumnData(column_name, column, column_type, null_bytemap, array_builder, format_name, start, end, dictionary_values); } - else if ("LowCardinality" == column_type_name) + else if (column_type->getTypeId() == TypeIndex::LowCardinality) { fillArrowArrayWithLowCardinalityColumnData(column_name, column, column_type, null_bytemap, array_builder, format_name, start, end, dictionary_values); } - else if ("Map" == column_type_name) + else if (isMap(column_type)) { ColumnPtr column_array = assert_cast(column.get())->getNestedColumnPtr(); DataTypePtr array_type = assert_cast(column_type.get())->getNestedType(); @@ -424,11 +442,13 @@ namespace DB fillArrowArrayWithDecimalColumnData(column, null_bytemap, array_builder, format_name, start, end); return true; } +#if !defined(ARCADIA_BUILD) if constexpr (std::is_same_v>) { fillArrowArrayWithDecimalColumnData(column, null_bytemap, array_builder, format_name, start, end); return true; } +#endif return false; }; @@ -437,10 +457,10 @@ namespace DB throw Exception{ErrorCodes::LOGICAL_ERROR, "Cannot fill arrow array with decimal data with type {}", column_type_name}; } #define DISPATCH(CPP_NUMERIC_TYPE, ARROW_BUILDER_TYPE) \ - else if (#CPP_NUMERIC_TYPE == column_type_name) \ - { \ - fillArrowArrayWithNumericColumnData(column, null_bytemap, format_name, array_builder, start, end); \ - } + else if (#CPP_NUMERIC_TYPE == column_type_name) \ + { \ + fillArrowArrayWithNumericColumnData(column, null_bytemap, format_name, array_builder, start, end); \ + } FOR_INTERNAL_NUMERIC_TYPES(DISPATCH) #undef DISPATCH @@ -448,7 +468,7 @@ namespace DB { throw Exception { - fmt::format(R"(Internal type "{}" of a column "{}" is not supported for conversion into a {} data format.)", column_type_name, column_name, format_name), + fmt::format("Internal type '{}' of a column '{}' is not supported for conversion 
into {} data format.", column_type_name, column_name, format_name), ErrorCodes::UNKNOWN_TYPE }; } @@ -502,14 +522,15 @@ namespace DB } } - static std::shared_ptr getArrowType(DataTypePtr column_type, ColumnPtr column, const std::string & column_name, const std::string & format_name, bool * is_column_nullable) + static std::shared_ptr getArrowType( + DataTypePtr column_type, ColumnPtr column, const std::string & column_name, const std::string & format_name, bool * out_is_column_nullable) { if (column_type->isNullable()) { DataTypePtr nested_type = assert_cast(column_type.get())->getNestedType(); ColumnPtr nested_column = assert_cast(column.get())->getNestedColumnPtr(); - auto arrow_type = getArrowType(nested_type, nested_column, column_name, format_name, is_column_nullable); - *is_column_nullable = true; + auto arrow_type = getArrowType(nested_type, nested_column, column_name, format_name, out_is_column_nullable); + *out_is_column_nullable = true; return arrow_type; } @@ -542,7 +563,7 @@ namespace DB { auto nested_type = assert_cast(column_type.get())->getNestedType(); auto nested_column = assert_cast(column.get())->getDataPtr(); - auto nested_arrow_type = getArrowType(nested_type, nested_column, column_name, format_name, is_column_nullable); + auto nested_arrow_type = getArrowType(nested_type, nested_column, column_name, format_name, out_is_column_nullable); return arrow::list(nested_arrow_type); } @@ -554,8 +575,8 @@ namespace DB for (size_t i = 0; i != nested_types.size(); ++i) { String name = column_name + "." + std::to_string(i); - auto nested_arrow_type = getArrowType(nested_types[i], tuple_column->getColumnPtr(i), name, format_name, is_column_nullable); - nested_fields.push_back(std::make_shared(name, nested_arrow_type, *is_column_nullable)); + auto nested_arrow_type = getArrowType(nested_types[i], tuple_column->getColumnPtr(i), name, format_name, out_is_column_nullable); + nested_fields.push_back(std::make_shared(name, nested_arrow_type, *out_is_column_nullable)); } return arrow::struct_(std::move(nested_fields)); } @@ -568,7 +589,7 @@ namespace DB const auto & indexes_column = lc_column->getIndexesPtr(); return arrow::dictionary( getArrowTypeForLowCardinalityIndexes(indexes_column), - getArrowType(nested_type, nested_column, column_name, format_name, is_column_nullable)); + getArrowType(nested_type, nested_column, column_name, format_name, out_is_column_nullable)); } if (isMap(column_type)) @@ -579,9 +600,8 @@ namespace DB const auto & columns = assert_cast(column.get())->getNestedData().getColumns(); return arrow::map( - getArrowType(key_type, columns[0], column_name, format_name, is_column_nullable), - getArrowType(val_type, columns[1], column_name, format_name, is_column_nullable) - ); + getArrowType(key_type, columns[0], column_name, format_name, out_is_column_nullable), + getArrowType(val_type, columns[1], column_name, format_name, out_is_column_nullable)); } const std::string type_name = column_type->getFamilyName(); @@ -594,8 +614,9 @@ namespace DB return arrow_type_it->second; } - throw Exception{fmt::format(R"(The type "{}" of a column "{}" is not supported for conversion into a {} data format.)", column_type->getName(), column_name, format_name), - ErrorCodes::UNKNOWN_TYPE}; + throw Exception(ErrorCodes::UNKNOWN_TYPE, + "The type '{}' of a column '{}' is not supported for conversion into {} data format.", + column_type->getName(), column_name, format_name); } CHColumnToArrowColumn::CHColumnToArrowColumn(const Block & header, const std::string & format_name_, bool 
low_cardinality_as_dictionary_) @@ -638,7 +659,8 @@ namespace DB arrow::Status status = MakeBuilder(pool, arrow_fields[column_i]->type(), &array_builder); checkStatus(status, column->getName(), format_name); - fillArrowArray(header_column.name, column, header_column.type, nullptr, array_builder.get(), format_name, 0, column->size(), dictionary_values); + fillArrowArray( + header_column.name, column, header_column.type, nullptr, array_builder.get(), format_name, 0, column->size(), dictionary_values); std::shared_ptr arrow_array; status = array_builder->Finish(&arrow_array); diff --git a/src/Processors/Formats/Impl/CHColumnToArrowColumn.h b/src/Processors/Formats/Impl/CHColumnToArrowColumn.h index efe02a0d7d9..c0885d3778c 100644 --- a/src/Processors/Formats/Impl/CHColumnToArrowColumn.h +++ b/src/Processors/Formats/Impl/CHColumnToArrowColumn.h @@ -1,5 +1,8 @@ #pragma once -#include "config_formats.h" +#if !defined(ARCADIA_BUILD) +# include "config_formats.h" +#endif + #if USE_ARROW || USE_PARQUET @@ -7,42 +10,18 @@ #include #include + namespace DB { class CHColumnToArrowColumn { public: - CHColumnToArrowColumn(const Block & header, const std::string & format_name_, bool low_cardinality_as_dictionary_ = false); + CHColumnToArrowColumn(const Block & header, const std::string & format_name_, bool low_cardinality_as_dictionary_); void chChunkToArrowTable(std::shared_ptr & res, const Chunk & chunk, size_t columns_num); + private: - -#define FOR_INTERNAL_NUMERIC_TYPES(M) \ - M(UInt8, arrow::UInt8Builder) \ - M(Int8, arrow::Int8Builder) \ - M(UInt16, arrow::UInt16Builder) \ - M(Int16, arrow::Int16Builder) \ - M(UInt32, arrow::UInt32Builder) \ - M(Int32, arrow::Int32Builder) \ - M(UInt64, arrow::UInt64Builder) \ - M(Int64, arrow::Int64Builder) \ - M(Float32, arrow::FloatBuilder) \ - M(Float64, arrow::DoubleBuilder) - -#define FOR_ARROW_TYPES(M) \ - M(UINT8, arrow::UInt8Type) \ - M(INT8, arrow::Int8Type) \ - M(UINT16, arrow::UInt16Type) \ - M(INT16, arrow::Int16Type) \ - M(UINT32, arrow::UInt32Type) \ - M(INT32, arrow::Int32Type) \ - M(UINT64, arrow::UInt64Type) \ - M(INT64, arrow::Int64Type) \ - M(FLOAT, arrow::FloatType) \ - M(DOUBLE, arrow::DoubleType) \ - M(STRING, arrow::StringType) - ColumnsWithTypeAndName header_columns; std::vector> arrow_fields; const std::string format_name; @@ -52,5 +31,7 @@ private: /// Dictionary every chunk we save it and reuse. std::unordered_map> dictionary_values; }; + } + #endif diff --git a/src/Processors/Formats/Impl/MySQLOutputFormat.cpp b/src/Processors/Formats/Impl/MySQLOutputFormat.cpp index 0f6d90b720e..6fdcc544a18 100644 --- a/src/Processors/Formats/Impl/MySQLOutputFormat.cpp +++ b/src/Processors/Formats/Impl/MySQLOutputFormat.cpp @@ -1,7 +1,11 @@ #include -#include +#include +#include #include +#include #include +#include + namespace DB { @@ -13,24 +17,18 @@ using namespace MySQLProtocol::ProtocolText; MySQLOutputFormat::MySQLOutputFormat(WriteBuffer & out_, const Block & header_, const FormatSettings & settings_) : IOutputFormat(header_, out_) - , format_settings(settings_) + , client_capabilities(settings_.mysql_wire.client_capabilities) { + /// MySQlWire is a special format that is usually used as output format for MySQL protocol connections. + /// In this case we have a correct `sequence_id` stored in `settings_.mysql_wire`. + /// But it's also possible to specify MySQLWire as output format for clickhouse-client or clickhouse-local. + /// There is no `sequence_id` stored in `settings_.mysql_wire` in this case, so we create a dummy one. 
+ sequence_id = settings_.mysql_wire.sequence_id ? settings_.mysql_wire.sequence_id : &dummy_sequence_id; } void MySQLOutputFormat::setContext(ContextPtr context_) { context = context_; - /// MySQlWire is a special format that is usually used as output format for MySQL protocol connections. - /// In this case we have to use the corresponding session context to set correct sequence_id. - mysql_context = getContext()->getMySQLProtocolContext(); - if (!mysql_context) - { - /// But it's also possible to specify MySQLWire as output format for clickhouse-client or clickhouse-local. - /// There is no MySQL protocol context in this case, so we create dummy one. - own_mysql_context.emplace(); - mysql_context = &own_mysql_context.value(); - } - packet_endpoint = mysql_context->makeEndpoint(out); } void MySQLOutputFormat::initialize() @@ -39,6 +37,7 @@ void MySQLOutputFormat::initialize() return; initialized = true; + const auto & header = getPort(PortKind::Main).getHeader(); data_types = header.getDataTypes(); @@ -46,6 +45,8 @@ void MySQLOutputFormat::initialize() for (const auto & type : data_types) serializations.emplace_back(type->getDefaultSerialization()); + packet_endpoint = MySQLProtocol::PacketEndpoint::create(out, *sequence_id); + if (header.columns()) { packet_endpoint->sendPacket(LengthEncodedNumber(header.columns())); @@ -56,7 +57,7 @@ void MySQLOutputFormat::initialize() packet_endpoint->sendPacket(getColumnDefinition(column_name, data_types[i]->getTypeId())); } - if (!(mysql_context->client_capabilities & Capability::CLIENT_DEPRECATE_EOF)) + if (!(client_capabilities & Capability::CLIENT_DEPRECATE_EOF)) { packet_endpoint->sendPacket(EOFPacket(0, 0)); } @@ -66,7 +67,6 @@ void MySQLOutputFormat::initialize() void MySQLOutputFormat::consume(Chunk chunk) { - initialize(); for (size_t i = 0; i < chunk.getNumRows(); i++) @@ -94,11 +94,9 @@ void MySQLOutputFormat::finalize() const auto & header = getPort(PortKind::Main).getHeader(); if (header.columns() == 0) - packet_endpoint->sendPacket( - OKPacket(0x0, mysql_context->client_capabilities, affected_rows, 0, 0, "", human_readable_info), true); - else if (mysql_context->client_capabilities & CLIENT_DEPRECATE_EOF) - packet_endpoint->sendPacket( - OKPacket(0xfe, mysql_context->client_capabilities, affected_rows, 0, 0, "", human_readable_info), true); + packet_endpoint->sendPacket(OKPacket(0x0, client_capabilities, affected_rows, 0, 0, "", human_readable_info), true); + else if (client_capabilities & CLIENT_DEPRECATE_EOF) + packet_endpoint->sendPacket(OKPacket(0xfe, client_capabilities, affected_rows, 0, 0, "", human_readable_info), true); else packet_endpoint->sendPacket(EOFPacket(0, 0), true); } diff --git a/src/Processors/Formats/Impl/MySQLOutputFormat.h b/src/Processors/Formats/Impl/MySQLOutputFormat.h index fed2a431860..a8e1ada3d6a 100644 --- a/src/Processors/Formats/Impl/MySQLOutputFormat.h +++ b/src/Processors/Formats/Impl/MySQLOutputFormat.h @@ -3,11 +3,9 @@ #include #include -#include -#include -#include -#include -#include +#include +#include + namespace DB { @@ -15,6 +13,7 @@ namespace DB class IColumn; class IDataType; class WriteBuffer; +struct FormatSettings; /** A stream for outputting data in a binary line-by-line format. 
*/ @@ -32,15 +31,14 @@ public: void flush() override; void doWritePrefix() override { initialize(); } +private: void initialize(); -private: bool initialized = false; - - std::optional own_mysql_context; - MySQLWireContext * mysql_context = nullptr; + uint32_t client_capabilities = 0; + uint8_t * sequence_id = nullptr; + uint8_t dummy_sequence_id = 0; MySQLProtocol::PacketEndpointPtr packet_endpoint; - FormatSettings format_settings; DataTypes data_types; Serializations serializations; }; diff --git a/src/Processors/Formats/Impl/ParallelFormattingOutputFormat.cpp b/src/Processors/Formats/Impl/ParallelFormattingOutputFormat.cpp index ce7dd1abd51..f7723e3f1d2 100644 --- a/src/Processors/Formats/Impl/ParallelFormattingOutputFormat.cpp +++ b/src/Processors/Formats/Impl/ParallelFormattingOutputFormat.cpp @@ -13,11 +13,15 @@ namespace DB addChunk(Chunk{}, ProcessingUnitType::FINALIZE, /*can_throw_exception*/ false); collector_finished.wait(); - if (collector_thread.joinable()) - collector_thread.join(); + { + std::lock_guard lock(collector_thread_mutex); + if (collector_thread.joinable()) + collector_thread.join(); + } { std::unique_lock lock(mutex); + if (background_exception) std::rethrow_exception(background_exception); } @@ -66,8 +70,11 @@ namespace DB writer_condvar.notify_all(); } - if (collector_thread.joinable()) - collector_thread.join(); + { + std::lock_guard lock(collector_thread_mutex); + if (collector_thread.joinable()) + collector_thread.join(); + } try { diff --git a/src/Processors/Formats/Impl/ParallelFormattingOutputFormat.h b/src/Processors/Formats/Impl/ParallelFormattingOutputFormat.h index 8b9e8293c69..e7a6435981b 100644 --- a/src/Processors/Formats/Impl/ParallelFormattingOutputFormat.h +++ b/src/Processors/Formats/Impl/ParallelFormattingOutputFormat.h @@ -172,6 +172,7 @@ private: ThreadPool pool; // Collecting all memory to original ReadBuffer ThreadFromGlobalPool collector_thread; + std::mutex collector_thread_mutex; std::exception_ptr background_exception = nullptr; diff --git a/src/Processors/Formats/Impl/ParallelParsingInputFormat.h b/src/Processors/Formats/Impl/ParallelParsingInputFormat.h index dafaf9bed72..5cf83bd3bb3 100644 --- a/src/Processors/Formats/Impl/ParallelParsingInputFormat.h +++ b/src/Processors/Formats/Impl/ParallelParsingInputFormat.h @@ -1,7 +1,6 @@ #pragma once #include -#include #include #include #include @@ -13,6 +12,7 @@ #include #include + namespace DB { diff --git a/src/Processors/Formats/Impl/ParquetBlockOutputFormat.cpp b/src/Processors/Formats/Impl/ParquetBlockOutputFormat.cpp index 800fd0ff0e8..c3771c7b552 100644 --- a/src/Processors/Formats/Impl/ParquetBlockOutputFormat.cpp +++ b/src/Processors/Formats/Impl/ParquetBlockOutputFormat.cpp @@ -2,16 +2,7 @@ #if USE_PARQUET -// TODO: clean includes -#include -#include -#include -#include -#include #include -#include -#include -#include #include #include "ArrowBufferedStreams.h" #include "CHColumnToArrowColumn.h" @@ -19,6 +10,7 @@ namespace DB { + namespace ErrorCodes { extern const int UNKNOWN_EXCEPTION; @@ -37,7 +29,7 @@ void ParquetBlockOutputFormat::consume(Chunk chunk) if (!ch_column_to_arrow_column) { const Block & header = getPort(PortKind::Main).getHeader(); - ch_column_to_arrow_column = std::make_unique(header, "Parquet"); + ch_column_to_arrow_column = std::make_unique(header, "Parquet", false); } ch_column_to_arrow_column->chChunkToArrowTable(arrow_table, chunk, columns_num); @@ -91,11 +83,7 @@ void registerOutputFormatProcessorParquet(FormatFactory & factory) const RowOutputFormatParams 
&, const FormatSettings & format_settings) { - auto impl = std::make_shared(buf, sample, format_settings); - /// TODO - // auto res = std::make_shared(impl, impl->getHeader(), format_settings.parquet.row_group_size, 0); - // res->disableFlush(); - return impl; + return std::make_shared(buf, sample, format_settings); }); } diff --git a/src/Processors/Formats/LazyOutputFormat.h b/src/Processors/Formats/LazyOutputFormat.h index 15ea5022f82..c6be0adb347 100644 --- a/src/Processors/Formats/LazyOutputFormat.h +++ b/src/Processors/Formats/LazyOutputFormat.h @@ -29,7 +29,7 @@ public: void setRowsBeforeLimit(size_t rows_before_limit) override; - void finish() + void onCancel() override { finished_processing = true; /// Clear queue in case if somebody is waiting lazy_format to push. diff --git a/src/Processors/ISink.cpp b/src/Processors/ISink.cpp index bfe015f876c..0de3ed37a61 100644 --- a/src/Processors/ISink.cpp +++ b/src/Processors/ISink.cpp @@ -11,12 +11,17 @@ ISink::ISink(Block header) ISink::Status ISink::prepare() { + if (!was_on_start_called) + return Status::Ready; + if (has_input) return Status::Ready; if (input.isFinished()) { - onFinish(); + if (!was_on_finish_called) + return Status::Ready; + return Status::Finished; } @@ -31,9 +36,21 @@ ISink::Status ISink::prepare() void ISink::work() { - consume(std::move(current_chunk)); - has_input = false; + if (!was_on_start_called) + { + was_on_start_called = true; + onStart(); + } + else if (has_input) + { + has_input = false; + consume(std::move(current_chunk)); + } + else if (!was_on_finish_called) + { + was_on_finish_called = true; + onFinish(); + } } } - diff --git a/src/Processors/ISink.h b/src/Processors/ISink.h index 33cb361e30b..f960def1cdd 100644 --- a/src/Processors/ISink.h +++ b/src/Processors/ISink.h @@ -12,9 +12,11 @@ protected: InputPort & input; Chunk current_chunk; bool has_input = false; + bool was_on_start_called = false; + bool was_on_finish_called = false; virtual void consume(Chunk block) = 0; - + virtual void onStart() {} virtual void onFinish() {} public: diff --git a/src/Processors/Merges/AggregatingSortedTransform.h b/src/Processors/Merges/AggregatingSortedTransform.h index a0425d4c376..e8bf90c2b31 100644 --- a/src/Processors/Merges/AggregatingSortedTransform.h +++ b/src/Processors/Merges/AggregatingSortedTransform.h @@ -16,7 +16,7 @@ public: const Block & header, size_t num_inputs, SortDescription description_, size_t max_block_size) : IMergingTransform( - num_inputs, header, header, true, + num_inputs, header, header, /*have_all_inputs_=*/ true, /*has_limit_below_one_block_=*/ false, header, num_inputs, std::move(description_), diff --git a/src/Processors/Merges/Algorithms/GraphiteRollupSortedAlgorithm.cpp b/src/Processors/Merges/Algorithms/GraphiteRollupSortedAlgorithm.cpp index df10fb26d40..7795292f922 100644 --- a/src/Processors/Merges/Algorithms/GraphiteRollupSortedAlgorithm.cpp +++ b/src/Processors/Merges/Algorithms/GraphiteRollupSortedAlgorithm.cpp @@ -2,6 +2,8 @@ #include #include #include +#include + namespace DB { @@ -16,6 +18,8 @@ static GraphiteRollupSortedAlgorithm::ColumnsDefinition defineColumns( def.value_column_num = header.getPositionByName(params.value_column_name); def.version_column_num = header.getPositionByName(params.version_column_name); + def.time_column_type = header.getByPosition(def.time_column_num).type; + size_t num_columns = header.columns(); for (size_t i = 0; i < num_columns; ++i) if (i != def.time_column_num && i != def.value_column_num && i != def.version_column_num) @@ -122,8 +126,8 
@@ UInt32 GraphiteRollupSortedAlgorithm::selectPrecision(const Graphite::Retentions * In this case, the date should not change. The date is calculated using the local time zone. * * If the rounding value is less than an hour, - * then, assuming that time zones that differ from UTC by a non-integer number of hours are not supported, - * just simply round the unix timestamp down to a multiple of 3600. + * then, assuming that time zones that differ from UTC by a multiple of 15-minute intervals + * (that is true for all modern timezones but not true for historical timezones). * And if the rounding value is greater, * then we will round down the number of seconds from the beginning of the day in the local time zone. * @@ -131,7 +135,7 @@ UInt32 GraphiteRollupSortedAlgorithm::selectPrecision(const Graphite::Retentions */ static time_t roundTimeToPrecision(const DateLUTImpl & date_lut, time_t time, UInt32 precision) { - if (precision <= 3600) + if (precision <= 900) { return time / precision * precision; } @@ -145,7 +149,10 @@ static time_t roundTimeToPrecision(const DateLUTImpl & date_lut, time_t time, UI IMergingAlgorithm::Status GraphiteRollupSortedAlgorithm::merge() { - const DateLUTImpl & date_lut = DateLUT::instance(); + /// Timestamp column can be DateTime or UInt32. If it is DateTime, we can use its timezone for calculations. + const TimezoneMixin * timezone = dynamic_cast(columns_definition.time_column_type.get()); + + const DateLUTImpl & date_lut = timezone ? timezone->getTimeZone() : DateLUT::instance(); /// Take rows in needed order and put them into `merged_data` until we get `max_block_size` rows. /// diff --git a/src/Processors/Merges/Algorithms/GraphiteRollupSortedAlgorithm.h b/src/Processors/Merges/Algorithms/GraphiteRollupSortedAlgorithm.h index a0e8f1662aa..0155b73b238 100644 --- a/src/Processors/Merges/Algorithms/GraphiteRollupSortedAlgorithm.h +++ b/src/Processors/Merges/Algorithms/GraphiteRollupSortedAlgorithm.h @@ -35,6 +35,8 @@ public: size_t value_column_num; size_t version_column_num; + DataTypePtr time_column_type; + /// All columns other than 'time', 'value', 'version'. They are unmodified during rollup. 
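> The new 15-minute threshold in `roundTimeToPrecision` is worth spelling out. Below is a minimal sketch of the rounding rule only, not code from the patch: the real implementation goes through `DateLUTImpl` (taken from the DateTime column's time zone when one is available), while here the start of the local day is passed in explicitly as a stand-in.

```cpp
#include <cstdint>
#include <ctime>

/// Illustrative sketch only; `start_of_day_local` stands in for
/// "00:00 of the same calendar day in the local time zone".
uint32_t roundTimeToPrecisionSketch(time_t time, uint32_t precision, time_t start_of_day_local)
{
    if (precision <= 900)
    {
        /// Up to 15 minutes: modern time zones are offset from UTC by a multiple of
        /// 15 minutes, so rounding the raw unix timestamp cannot change the local date.
        return static_cast<uint32_t>(time / precision * precision);
    }

    /// Coarser precision: round the number of seconds since the local start of day,
    /// so the result always stays within the same local date.
    const time_t seconds_into_day = time - start_of_day_local;
    return static_cast<uint32_t>(start_of_day_local + seconds_into_day / precision * precision);
}
```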
ColumnNumbers unmodified_column_numbers; }; diff --git a/src/Processors/Merges/CollapsingSortedTransform.h b/src/Processors/Merges/CollapsingSortedTransform.h index 9e6bd306eee..87c466f31e8 100644 --- a/src/Processors/Merges/CollapsingSortedTransform.h +++ b/src/Processors/Merges/CollapsingSortedTransform.h @@ -20,7 +20,7 @@ public: WriteBuffer * out_row_sources_buf_ = nullptr, bool use_average_block_sizes = false) : IMergingTransform( - num_inputs, header, header, true, + num_inputs, header, header, /*have_all_inputs_=*/ true, /*has_limit_below_one_block_=*/ false, header, num_inputs, std::move(description_), diff --git a/src/Processors/Merges/FinishAggregatingInOrderTransform.h b/src/Processors/Merges/FinishAggregatingInOrderTransform.h index 4f9e53bd7d5..6d5e334311f 100644 --- a/src/Processors/Merges/FinishAggregatingInOrderTransform.h +++ b/src/Processors/Merges/FinishAggregatingInOrderTransform.h @@ -19,7 +19,7 @@ public: SortDescription description, size_t max_block_size) : IMergingTransform( - num_inputs, header, header, true, + num_inputs, header, header, /*have_all_inputs_=*/ true, /*has_limit_below_one_block_=*/ false, header, num_inputs, params, diff --git a/src/Processors/Merges/GraphiteRollupSortedTransform.h b/src/Processors/Merges/GraphiteRollupSortedTransform.h index 5104801aa0d..46272f00eed 100644 --- a/src/Processors/Merges/GraphiteRollupSortedTransform.h +++ b/src/Processors/Merges/GraphiteRollupSortedTransform.h @@ -15,7 +15,7 @@ public: SortDescription description_, size_t max_block_size, Graphite::Params params_, time_t time_of_merge_) : IMergingTransform( - num_inputs, header, header, true, + num_inputs, header, header, /*have_all_inputs_=*/ true, /*has_limit_below_one_block_=*/ false, header, num_inputs, std::move(description_), diff --git a/src/Processors/Merges/IMergingTransform.cpp b/src/Processors/Merges/IMergingTransform.cpp index eff786b150f..cba78390c97 100644 --- a/src/Processors/Merges/IMergingTransform.cpp +++ b/src/Processors/Merges/IMergingTransform.cpp @@ -14,9 +14,11 @@ IMergingTransformBase::IMergingTransformBase( size_t num_inputs, const Block & input_header, const Block & output_header, - bool have_all_inputs_) + bool have_all_inputs_, + bool has_limit_below_one_block_) : IProcessor(InputPorts(num_inputs, input_header), {output_header}) , have_all_inputs(have_all_inputs_) + , has_limit_below_one_block(has_limit_below_one_block_) { } @@ -64,10 +66,7 @@ IProcessor::Status IMergingTransformBase::prepareInitializeInputs() continue; if (input_states[i].is_initialized) - { - // input.setNotNeeded(); continue; - } input.setNeeded(); @@ -77,12 +76,17 @@ IProcessor::Status IMergingTransformBase::prepareInitializeInputs() continue; } - auto chunk = input.pull(); + /// setNotNeeded after reading first chunk, because in optimismtic case + /// (e.g. 
with optimized 'ORDER BY primary_key LIMIT n' and small 'n') + /// we won't have to read any chunks anymore; + auto chunk = input.pull(has_limit_below_one_block); if (!chunk.hasRows()) { - if (!input.isFinished()) + { + input.setNeeded(); all_inputs_has_data = false; + } continue; } diff --git a/src/Processors/Merges/IMergingTransform.h b/src/Processors/Merges/IMergingTransform.h index ce673131ab6..8b0a44ae025 100644 --- a/src/Processors/Merges/IMergingTransform.h +++ b/src/Processors/Merges/IMergingTransform.h @@ -16,7 +16,8 @@ public: size_t num_inputs, const Block & input_header, const Block & output_header, - bool have_all_inputs_); + bool have_all_inputs_, + bool has_limit_below_one_block_); OutputPort & getOutputPort() { return outputs.front(); } @@ -66,6 +67,7 @@ private: std::vector input_states; std::atomic have_all_inputs; bool is_initialized = false; + bool has_limit_below_one_block = false; IProcessor::Status prepareInitializeInputs(); }; @@ -81,8 +83,9 @@ public: const Block & input_header, const Block & output_header, bool have_all_inputs_, + bool has_limit_below_one_block_, Args && ... args) - : IMergingTransformBase(num_inputs, input_header, output_header, have_all_inputs_) + : IMergingTransformBase(num_inputs, input_header, output_header, have_all_inputs_, has_limit_below_one_block_) , algorithm(std::forward(args) ...) { } diff --git a/src/Processors/Merges/MergingSortedTransform.cpp b/src/Processors/Merges/MergingSortedTransform.cpp index ec1bdc59683..92fafa4242c 100644 --- a/src/Processors/Merges/MergingSortedTransform.cpp +++ b/src/Processors/Merges/MergingSortedTransform.cpp @@ -13,12 +13,13 @@ MergingSortedTransform::MergingSortedTransform( SortDescription description_, size_t max_block_size, UInt64 limit_, + bool has_limit_below_one_block_, WriteBuffer * out_row_sources_buf_, bool quiet_, bool use_average_block_sizes, bool have_all_inputs_) : IMergingTransform( - num_inputs, header, header, have_all_inputs_, + num_inputs, header, header, have_all_inputs_, has_limit_below_one_block_, header, num_inputs, std::move(description_), diff --git a/src/Processors/Merges/MergingSortedTransform.h b/src/Processors/Merges/MergingSortedTransform.h index 93bd36d8aec..1fa9b1275bd 100644 --- a/src/Processors/Merges/MergingSortedTransform.h +++ b/src/Processors/Merges/MergingSortedTransform.h @@ -17,6 +17,7 @@ public: SortDescription description, size_t max_block_size, UInt64 limit_ = 0, + bool has_limit_below_one_block_ = false, WriteBuffer * out_row_sources_buf_ = nullptr, bool quiet_ = false, bool use_average_block_sizes = false, diff --git a/src/Processors/Merges/ReplacingSortedTransform.h b/src/Processors/Merges/ReplacingSortedTransform.h index 757e19e2cbe..e760cdf0d2b 100644 --- a/src/Processors/Merges/ReplacingSortedTransform.h +++ b/src/Processors/Merges/ReplacingSortedTransform.h @@ -18,7 +18,7 @@ public: WriteBuffer * out_row_sources_buf_ = nullptr, bool use_average_block_sizes = false) : IMergingTransform( - num_inputs, header, header, true, + num_inputs, header, header, /*have_all_inputs_=*/ true, /*has_limit_below_one_block_=*/ false, header, num_inputs, std::move(description_), diff --git a/src/Processors/Merges/SummingSortedTransform.h b/src/Processors/Merges/SummingSortedTransform.h index 22361bb1a44..0287caed5aa 100644 --- a/src/Processors/Merges/SummingSortedTransform.h +++ b/src/Processors/Merges/SummingSortedTransform.h @@ -19,7 +19,7 @@ public: const Names & partition_key_columns, size_t max_block_size) : IMergingTransform( - num_inputs, header, header, true, + 
num_inputs, header, header, /*have_all_inputs_=*/ true, /*has_limit_below_one_block_=*/ false, header, num_inputs, std::move(description_), diff --git a/src/Processors/Merges/VersionedCollapsingTransform.h b/src/Processors/Merges/VersionedCollapsingTransform.h index f593734c603..f260e20f1da 100644 --- a/src/Processors/Merges/VersionedCollapsingTransform.h +++ b/src/Processors/Merges/VersionedCollapsingTransform.h @@ -19,7 +19,7 @@ public: WriteBuffer * out_row_sources_buf_ = nullptr, bool use_average_block_sizes = false) : IMergingTransform( - num_inputs, header, header, true, + num_inputs, header, header, /*have_all_inputs_=*/ true, /*has_limit_below_one_block_=*/ false, header, num_inputs, std::move(description_), diff --git a/src/Processors/Pipe.cpp b/src/Processors/Pipe.cpp index abd4df1f51f..e0da79f148d 100644 --- a/src/Processors/Pipe.cpp +++ b/src/Processors/Pipe.cpp @@ -4,7 +4,8 @@ #include #include #include -#include +#include +#include #include #include #include diff --git a/src/Processors/Port.h b/src/Processors/Port.h index ac71c394518..9f27b440be5 100644 --- a/src/Processors/Port.h +++ b/src/Processors/Port.h @@ -394,7 +394,7 @@ public: pushData({.chunk = std::move(chunk), .exception = {}}); } - void ALWAYS_INLINE push(std::exception_ptr exception) + void ALWAYS_INLINE pushException(std::exception_ptr exception) { pushData({.chunk = {}, .exception = std::move(exception)}); } diff --git a/src/Processors/QueryPipeline.h b/src/Processors/QueryPipeline.h index 1585f2532ff..358d31a6dff 100644 --- a/src/Processors/QueryPipeline.h +++ b/src/Processors/QueryPipeline.h @@ -1,6 +1,5 @@ #pragma once -#include #include #include #include diff --git a/src/Processors/QueryPlan/FinishSortingStep.cpp b/src/Processors/QueryPlan/FinishSortingStep.cpp index a2e056b3029..718eeb96cd8 100644 --- a/src/Processors/QueryPlan/FinishSortingStep.cpp +++ b/src/Processors/QueryPlan/FinishSortingStep.cpp @@ -31,12 +31,14 @@ FinishSortingStep::FinishSortingStep( SortDescription prefix_description_, SortDescription result_description_, size_t max_block_size_, - UInt64 limit_) + UInt64 limit_, + bool has_filtration_) : ITransformingStep(input_stream_, input_stream_.header, getTraits(limit_)) , prefix_description(std::move(prefix_description_)) , result_description(std::move(result_description_)) , max_block_size(max_block_size_) , limit(limit_) + , has_filtration(has_filtration_) { /// TODO: check input_stream is sorted by prefix_description. output_stream->sort_description = result_description; @@ -58,11 +60,14 @@ void FinishSortingStep::transformPipeline(QueryPipeline & pipeline, const BuildQ if (pipeline.getNumStreams() > 1) { UInt64 limit_for_merging = (need_finish_sorting ? 
0 : limit); + bool has_limit_below_one_block = !has_filtration && limit_for_merging && limit_for_merging < max_block_size; auto transform = std::make_shared( pipeline.getHeader(), pipeline.getNumStreams(), prefix_description, - max_block_size, limit_for_merging); + max_block_size, + limit_for_merging, + has_limit_below_one_block); pipeline.addTransform(std::move(transform)); } diff --git a/src/Processors/QueryPlan/FinishSortingStep.h b/src/Processors/QueryPlan/FinishSortingStep.h index 9fe031e792d..5ea3a6d91b5 100644 --- a/src/Processors/QueryPlan/FinishSortingStep.h +++ b/src/Processors/QueryPlan/FinishSortingStep.h @@ -13,8 +13,9 @@ public: const DataStream & input_stream_, SortDescription prefix_description_, SortDescription result_description_, - size_t max_block_size, - UInt64 limit); + size_t max_block_size_, + UInt64 limit_, + bool has_filtration_); String getName() const override { return "FinishSorting"; } @@ -31,6 +32,7 @@ private: SortDescription result_description; size_t max_block_size; UInt64 limit; + bool has_filtration; }; } diff --git a/src/Processors/QueryPlan/JoinStep.cpp b/src/Processors/QueryPlan/JoinStep.cpp index b06d6628dcb..736d7eb37c1 100644 --- a/src/Processors/QueryPlan/JoinStep.cpp +++ b/src/Processors/QueryPlan/JoinStep.cpp @@ -70,7 +70,7 @@ FilledJoinStep::FilledJoinStep(const DataStream & input_stream_, JoinPtr join_, void FilledJoinStep::transformPipeline(QueryPipeline & pipeline, const BuildQueryPipelineSettings &) { bool default_totals = false; - if (!pipeline.hasTotals() && join->hasTotals()) + if (!pipeline.hasTotals() && join->getTotals()) { pipeline.addDefaultTotals(); default_totals = true; diff --git a/src/Processors/QueryPlan/QueryPlan.cpp b/src/Processors/QueryPlan/QueryPlan.cpp index 44c5c48975c..bc3b8458531 100644 --- a/src/Processors/QueryPlan/QueryPlan.cpp +++ b/src/Processors/QueryPlan/QueryPlan.cpp @@ -9,6 +9,7 @@ #include #include #include +#include #include namespace DB @@ -434,4 +435,59 @@ void QueryPlan::optimize(const QueryPlanOptimizationSettings & optimization_sett QueryPlanOptimizations::optimizeTree(optimization_settings, *root, nodes); } +void QueryPlan::explainEstimate(MutableColumns & columns) +{ + checkInitialized(); + + struct EstimateCounters + { + std::string database_name; + std::string table_name; + UInt64 parts = 0; + UInt64 rows = 0; + UInt64 marks = 0; + + EstimateCounters(const std::string & database, const std::string & table) : database_name(database), table_name(table) + { + } + }; + + using CountersPtr = std::shared_ptr; + std::unordered_map counters; + using processNodeFuncType = std::function; + processNodeFuncType process_node = [&counters, &process_node] (const Node * node) + { + if (!node) + return; + if (const auto * step = dynamic_cast(node->step.get())) + { + const auto & id = step->getStorageID(); + auto key = id.database_name + "." 
+ id.table_name; + auto it = counters.find(key); + if (it == counters.end()) + { + it = counters.insert({key, std::make_shared(id.database_name, id.table_name)}).first; + } + it->second->parts += step->getSelectedParts(); + it->second->rows += step->getSelectedRows(); + it->second->marks += step->getSelectedMarks(); + } + for (const auto * child : node->children) + process_node(child); + }; + process_node(root); + + for (const auto & counter : counters) + { + size_t index = 0; + const auto & database_name = counter.second->database_name; + const auto & table_name = counter.second->table_name; + columns[index++]->insertData(database_name.c_str(), database_name.size()); + columns[index++]->insertData(table_name.c_str(), table_name.size()); + columns[index++]->insert(counter.second->parts); + columns[index++]->insert(counter.second->rows); + columns[index++]->insert(counter.second->marks); + } +} + } diff --git a/src/Processors/QueryPlan/QueryPlan.h b/src/Processors/QueryPlan/QueryPlan.h index 4c75f00cf4d..95034d34c9c 100644 --- a/src/Processors/QueryPlan/QueryPlan.h +++ b/src/Processors/QueryPlan/QueryPlan.h @@ -2,6 +2,7 @@ #include #include +#include #include #include @@ -85,6 +86,7 @@ public: JSONBuilder::ItemPtr explainPlan(const ExplainPlanOptions & options); void explainPlan(WriteBuffer & buffer, const ExplainPlanOptions & options); void explainPipeline(WriteBuffer & buffer, const ExplainPipelineOptions & options); + void explainEstimate(MutableColumns & columns); /// Set upper limit for the recommend number of threads. Will be applied to the newly-created pipelines. /// TODO: make it in a better way. diff --git a/src/Processors/QueryPlan/ReadFromMergeTree.cpp b/src/Processors/QueryPlan/ReadFromMergeTree.cpp index 2dc8246cde7..f8c12449c7e 100644 --- a/src/Processors/QueryPlan/ReadFromMergeTree.cpp +++ b/src/Processors/QueryPlan/ReadFromMergeTree.cpp @@ -13,9 +13,9 @@ #include #include #include -#include +#include #include -#include +#include #include #include #include @@ -47,6 +47,9 @@ struct ReadFromMergeTree::AnalysisResult IndexStats index_stats; Names column_names_to_read; ReadFromMergeTree::ReadType read_type = ReadFromMergeTree::ReadType::Default; + UInt64 selected_rows = 0; + UInt64 selected_marks = 0; + UInt64 selected_parts = 0; }; static MergeTreeReaderSettings getMergeTreeReaderSettings(const ContextPtr & context) @@ -154,7 +157,7 @@ Pipe ReadFromMergeTree::readFromPool( for (size_t i = 0; i < max_streams; ++i) { - auto source = std::make_shared( + auto source = std::make_shared( i, pool, min_marks_for_concurrent_read, max_block_size, settings.preferred_block_size_bytes, settings.preferred_max_column_in_block_size_bytes, data, metadata_snapshot, use_uncompressed_cache, @@ -176,26 +179,32 @@ template ProcessorPtr ReadFromMergeTree::createSource( const RangesInDataPart & part, const Names & required_columns, - bool use_uncompressed_cache) + bool use_uncompressed_cache, + bool has_limit_below_one_block) { return std::make_shared( data, metadata_snapshot, part.data_part, max_block_size, preferred_block_size_bytes, - preferred_max_column_in_block_size_bytes, required_columns, part.ranges, use_uncompressed_cache, - prewhere_info, actions_settings, true, reader_settings, virt_column_names, part.part_index_in_query); + preferred_max_column_in_block_size_bytes, required_columns, part.ranges, use_uncompressed_cache, prewhere_info, + actions_settings, true, reader_settings, virt_column_names, part.part_index_in_query, has_limit_below_one_block); } Pipe ReadFromMergeTree::readInOrder( 
RangesInDataParts parts_with_range, Names required_columns, ReadType read_type, - bool use_uncompressed_cache) + bool use_uncompressed_cache, + UInt64 limit) { Pipes pipes; + /// For reading in order it makes sense to read only + /// one range per task to reduce number of read rows. + bool has_limit_below_one_block = read_type != ReadType::Default && limit && limit < max_block_size; + for (const auto & part : parts_with_range) { auto source = read_type == ReadType::InReverseOrder - ? createSource(part, required_columns, use_uncompressed_cache) - : createSource(part, required_columns, use_uncompressed_cache); + ? createSource(part, required_columns, use_uncompressed_cache, has_limit_below_one_block) + : createSource(part, required_columns, use_uncompressed_cache, has_limit_below_one_block); pipes.emplace_back(std::move(source)); } @@ -221,7 +230,7 @@ Pipe ReadFromMergeTree::read( return readFromPool(parts_with_range, required_columns, max_streams, min_marks_for_concurrent_read, use_uncompressed_cache); - auto pipe = readInOrder(parts_with_range, required_columns, read_type, use_uncompressed_cache); + auto pipe = readInOrder(parts_with_range, required_columns, read_type, use_uncompressed_cache, 0); /// Use ConcatProcessor to concat sources together. /// It is needed to read in parts order (and so in PK order) if single thread is used. @@ -400,7 +409,6 @@ Pipe ReadFromMergeTree::spreadMarkRangesAmongStreamsWithOrder( { RangesInDataPart part = parts_with_ranges.back(); parts_with_ranges.pop_back(); - size_t & marks_in_part = info.sum_marks_in_parts.back(); /// We will not take too few rows from a part. @@ -415,8 +423,13 @@ Pipe ReadFromMergeTree::spreadMarkRangesAmongStreamsWithOrder( MarkRanges ranges_to_get_from_part; + /// We take full part if it contains enough marks or + /// if we know limit and part contains less than 'limit' rows. + bool take_full_part = marks_in_part <= need_marks + || (input_order_info->limit && input_order_info->limit < part.getRowsCount()); + /// We take the whole part if it is small enough. - if (marks_in_part <= need_marks) + if (take_full_part) { ranges_to_get_from_part = part.ranges; @@ -446,6 +459,7 @@ Pipe ReadFromMergeTree::spreadMarkRangesAmongStreamsWithOrder( } parts_with_ranges.emplace_back(part); } + ranges_to_get_from_part = split_ranges(ranges_to_get_from_part, input_order_info->direction); new_parts.emplace_back(part.data_part, part.part_index_in_query, std::move(ranges_to_get_from_part)); } @@ -454,8 +468,8 @@ Pipe ReadFromMergeTree::spreadMarkRangesAmongStreamsWithOrder( ? ReadFromMergeTree::ReadType::InOrder : ReadFromMergeTree::ReadType::InReverseOrder; - pipes.emplace_back(read(std::move(new_parts), column_names, read_type, - requested_num_streams, info.min_marks_for_concurrent_read, info.use_uncompressed_cache)); + pipes.emplace_back(readInOrder(std::move(new_parts), column_names, read_type, + info.use_uncompressed_cache, input_order_info->limit)); } if (need_preliminary_merge) @@ -483,7 +497,8 @@ Pipe ReadFromMergeTree::spreadMarkRangesAmongStreamsWithOrder( pipe.getHeader(), pipe.numOutputPorts(), sort_description, - max_block_size); + max_block_size, + 0, true); pipe.addTransform(std::move(transform)); } @@ -659,7 +674,7 @@ Pipe ReadFromMergeTree::spreadMarkRangesAmongStreamsFinal( /// If do_not_merge_across_partitions_select_final is true and there is only one part in partition /// with level > 0 then we won't postprocess this part and if num_streams > 1 we /// can use parallel select on such parts. 
We save such parts in one vector and then use - /// MergeTreeReadPool and MergeTreeThreadSelectBlockInputProcessor for parallel select. + /// MergeTreeReadPool and MergeTreeThreadSelectProcessor for parallel select. if (num_streams > 1 && settings.do_not_merge_across_partitions_select_final && std::distance(parts_to_merge_ranges[range_index], parts_to_merge_ranges[range_index + 1]) == 1 && parts_to_merge_ranges[range_index]->data_part->info.level > 0) @@ -829,7 +844,8 @@ ReadFromMergeTree::AnalysisResult ReadFromMergeTree::selectRangesToRead(MergeTre log, requested_num_streams, result.index_stats, - true); + true /* use_skip_indexes */, + true /* check_limits */); size_t sum_marks_pk = total_marks_pk; for (const auto & stat : result.index_stats) @@ -838,13 +854,17 @@ ReadFromMergeTree::AnalysisResult ReadFromMergeTree::selectRangesToRead(MergeTre size_t sum_marks = 0; size_t sum_ranges = 0; + size_t sum_rows = 0; for (const auto & part : result.parts_with_ranges) { sum_ranges += part.ranges.size(); sum_marks += part.getMarksCount(); + sum_rows += part.getRowsCount(); } - + result.selected_parts = result.parts_with_ranges.size(); + result.selected_marks = sum_marks; + result.selected_rows = sum_rows; LOG_DEBUG( log, "Selected {}/{} parts by partition key, {} parts by primary key, {}/{} marks by primary key, {} marks to read from {} ranges", @@ -882,6 +902,9 @@ void ReadFromMergeTree::initializePipeline(QueryPipeline & pipeline, const Build return; } + selected_marks = result.selected_marks; + selected_rows = result.selected_rows; + selected_parts = result.selected_parts; /// Projection, that needed to drop columns, which have appeared by execution /// of some extra expressions, and to allow execute the same expressions later. /// NOTE: It may lead to double computation of expressions. 
diff --git a/src/Processors/QueryPlan/ReadFromMergeTree.h b/src/Processors/QueryPlan/ReadFromMergeTree.h index a5184d28593..e83746c3ff0 100644 --- a/src/Processors/QueryPlan/ReadFromMergeTree.h +++ b/src/Processors/QueryPlan/ReadFromMergeTree.h @@ -80,6 +80,10 @@ public: void describeActions(JSONBuilder::JSONMap & map) const override; void describeIndexes(JSONBuilder::JSONMap & map) const override; + const StorageID getStorageID() const { return data.getStorageID(); } + UInt64 getSelectedParts() const { return selected_parts; } + UInt64 getSelectedRows() const { return selected_rows; } + UInt64 getSelectedMarks() const { return selected_marks; } private: const MergeTreeReaderSettings reader_settings; @@ -106,13 +110,16 @@ private: std::shared_ptr max_block_numbers_to_read; Poco::Logger * log; + UInt64 selected_parts = 0; + UInt64 selected_rows = 0; + UInt64 selected_marks = 0; Pipe read(RangesInDataParts parts_with_range, Names required_columns, ReadType read_type, size_t max_streams, size_t min_marks_for_concurrent_read, bool use_uncompressed_cache); Pipe readFromPool(RangesInDataParts parts_with_ranges, Names required_columns, size_t max_streams, size_t min_marks_for_concurrent_read, bool use_uncompressed_cache); - Pipe readInOrder(RangesInDataParts parts_with_range, Names required_columns, ReadType read_type, bool use_uncompressed_cache); + Pipe readInOrder(RangesInDataParts parts_with_range, Names required_columns, ReadType read_type, bool use_uncompressed_cache, UInt64 limit); template - ProcessorPtr createSource(const RangesInDataPart & part, const Names & required_columns, bool use_uncompressed_cache); + ProcessorPtr createSource(const RangesInDataPart & part, const Names & required_columns, bool use_uncompressed_cache, bool has_limit_below_one_block); Pipe spreadMarkRangesAmongStreams( RangesInDataParts && parts_with_ranges, diff --git a/src/Processors/QueryPlan/ReadFromRemote.cpp b/src/Processors/QueryPlan/ReadFromRemote.cpp new file mode 100644 index 00000000000..506ef795473 --- /dev/null +++ b/src/Processors/QueryPlan/ReadFromRemote.cpp @@ -0,0 +1,237 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int ALL_CONNECTION_TRIES_FAILED; +} + +static ActionsDAGPtr getConvertingDAG(const Block & block, const Block & header) +{ + /// Convert header structure to expected. + /// Also we ignore constants from result and replace it with constants from header. + /// It is needed for functions like `now64()` or `randConstant()` because their values may be different. 
+ return ActionsDAG::makeConvertingActions( + block.getColumnsWithTypeAndName(), + header.getColumnsWithTypeAndName(), + ActionsDAG::MatchColumnsMode::Name, + true); +} + +void addConvertingActions(QueryPlan & plan, const Block & header) +{ + if (blocksHaveEqualStructure(plan.getCurrentDataStream().header, header)) + return; + + auto convert_actions_dag = getConvertingDAG(plan.getCurrentDataStream().header, header); + auto converting = std::make_unique(plan.getCurrentDataStream(), convert_actions_dag); + plan.addStep(std::move(converting)); +} + +static void addConvertingActions(Pipe & pipe, const Block & header) +{ + if (blocksHaveEqualStructure(pipe.getHeader(), header)) + return; + + auto convert_actions = std::make_shared(getConvertingDAG(pipe.getHeader(), header)); + pipe.addSimpleTransform([&](const Block & cur_header, Pipe::StreamType) -> ProcessorPtr + { + return std::make_shared(cur_header, convert_actions); + }); +} + +static String formattedAST(const ASTPtr & ast) +{ + if (!ast) + return {}; + WriteBufferFromOwnString buf; + formatAST(*ast, buf, false, true); + return buf.str(); +} + +static std::unique_ptr createLocalPlan( + const ASTPtr & query_ast, + const Block & header, + ContextPtr context, + QueryProcessingStage::Enum processed_stage, + UInt32 shard_num, + UInt32 shard_count) +{ + checkStackSize(); + + auto query_plan = std::make_unique(); + + InterpreterSelectQuery interpreter( + query_ast, context, SelectQueryOptions(processed_stage).setShardInfo(shard_num, shard_count)); + interpreter.buildQueryPlan(*query_plan); + + addConvertingActions(*query_plan, header); + + return query_plan; +} + + +ReadFromRemote::ReadFromRemote( + ClusterProxy::IStreamFactory::Shards shards_, + Block header_, + QueryProcessingStage::Enum stage_, + StorageID main_table_, + ASTPtr table_func_ptr_, + ContextPtr context_, + ThrottlerPtr throttler_, + Scalars scalars_, + Tables external_tables_, + Poco::Logger * log_, + UInt32 shard_count_) + : ISourceStep(DataStream{.header = std::move(header_)}) + , shards(std::move(shards_)) + , stage(stage_) + , main_table(std::move(main_table_)) + , table_func_ptr(std::move(table_func_ptr_)) + , context(std::move(context_)) + , throttler(std::move(throttler_)) + , scalars(std::move(scalars_)) + , external_tables(std::move(external_tables_)) + , log(log_) + , shard_count(shard_count_) +{ +} + +void ReadFromRemote::addLazyPipe(Pipes & pipes, const ClusterProxy::IStreamFactory::Shard & shard) +{ + bool add_agg_info = stage == QueryProcessingStage::WithMergeableState; + bool add_totals = false; + bool add_extremes = false; + bool async_read = context->getSettingsRef().async_socket_for_remote; + if (stage == QueryProcessingStage::Complete) + { + add_totals = shard.query->as().group_by_with_totals; + add_extremes = context->getSettingsRef().extremes; + } + + auto lazily_create_stream = [ + pool = shard.pool, shard_num = shard.shard_num, shard_count = shard_count, query = shard.query, header = shard.header, + context = context, throttler = throttler, + main_table = main_table, table_func_ptr = table_func_ptr, + scalars = scalars, external_tables = external_tables, + stage = stage, local_delay = shard.local_delay, + add_agg_info, add_totals, add_extremes, async_read]() mutable + -> Pipe + { + auto current_settings = context->getSettingsRef(); + auto timeouts = ConnectionTimeouts::getTCPTimeoutsWithFailover( + current_settings).getSaturated( + current_settings.max_execution_time); + std::vector try_results; + try + { + if (table_func_ptr) + try_results = 
pool->getManyForTableFunction(timeouts, ¤t_settings, PoolMode::GET_MANY); + else + try_results = pool->getManyChecked(timeouts, ¤t_settings, PoolMode::GET_MANY, main_table.getQualifiedName()); + } + catch (const Exception & ex) + { + if (ex.code() == ErrorCodes::ALL_CONNECTION_TRIES_FAILED) + LOG_WARNING(&Poco::Logger::get("ClusterProxy::SelectStreamFactory"), + "Connections to remote replicas of local shard {} failed, will use stale local replica", shard_num); + else + throw; + } + + double max_remote_delay = 0.0; + for (const auto & try_result : try_results) + { + if (!try_result.is_up_to_date) + max_remote_delay = std::max(try_result.staleness, max_remote_delay); + } + + if (try_results.empty() || local_delay < max_remote_delay) + { + auto plan = createLocalPlan(query, header, context, stage, shard_num, shard_count); + return QueryPipeline::getPipe(std::move(*plan->buildQueryPipeline( + QueryPlanOptimizationSettings::fromContext(context), + BuildQueryPipelineSettings::fromContext(context)))); + } + else + { + std::vector connections; + connections.reserve(try_results.size()); + for (auto & try_result : try_results) + connections.emplace_back(std::move(try_result.entry)); + + String query_string = formattedAST(query); + + scalars["_shard_num"] + = Block{{DataTypeUInt32().createColumnConst(1, shard_num), std::make_shared(), "_shard_num"}}; + auto remote_query_executor = std::make_shared( + pool, std::move(connections), query_string, header, context, throttler, scalars, external_tables, stage); + + return createRemoteSourcePipe(remote_query_executor, add_agg_info, add_totals, add_extremes, async_read); + } + }; + + pipes.emplace_back(createDelayedPipe(shard.header, lazily_create_stream, add_totals, add_extremes)); + pipes.back().addInterpreterContext(context); + addConvertingActions(pipes.back(), output_stream->header); +} + +void ReadFromRemote::addPipe(Pipes & pipes, const ClusterProxy::IStreamFactory::Shard & shard) +{ + bool add_agg_info = stage == QueryProcessingStage::WithMergeableState; + bool add_totals = false; + bool add_extremes = false; + bool async_read = context->getSettingsRef().async_socket_for_remote; + if (stage == QueryProcessingStage::Complete) + { + add_totals = shard.query->as().group_by_with_totals; + add_extremes = context->getSettingsRef().extremes; + } + + String query_string = formattedAST(shard.query); + + scalars["_shard_num"] + = Block{{DataTypeUInt32().createColumnConst(1, shard.shard_num), std::make_shared(), "_shard_num"}}; + auto remote_query_executor = std::make_shared( + shard.pool, query_string, shard.header, context, throttler, scalars, external_tables, stage); + remote_query_executor->setLogger(log); + + remote_query_executor->setPoolMode(PoolMode::GET_MANY); + if (!table_func_ptr) + remote_query_executor->setMainTable(main_table); + + pipes.emplace_back(createRemoteSourcePipe(remote_query_executor, add_agg_info, add_totals, add_extremes, async_read)); + pipes.back().addInterpreterContext(context); + addConvertingActions(pipes.back(), output_stream->header); +} + +void ReadFromRemote::initializePipeline(QueryPipeline & pipeline, const BuildQueryPipelineSettings &) +{ + Pipes pipes; + for (const auto & shard : shards) + { + if (shard.lazy) + addLazyPipe(pipes, shard); + else + addPipe(pipes, shard); + } + + auto pipe = Pipe::unitePipes(std::move(pipes)); + pipeline.init(std::move(pipe)); +} + +} diff --git a/src/Processors/QueryPlan/ReadFromRemote.h b/src/Processors/QueryPlan/ReadFromRemote.h new file mode 100644 index 00000000000..ba0060d5470 --- 
/dev/null +++ b/src/Processors/QueryPlan/ReadFromRemote.h @@ -0,0 +1,59 @@ +#pragma once +#include +#include +#include +#include +#include + +namespace DB +{ + +class ConnectionPoolWithFailover; +using ConnectionPoolWithFailoverPtr = std::shared_ptr; + +class Throttler; +using ThrottlerPtr = std::shared_ptr; + +/// Reading step from remote servers. +/// Unite query results from several shards. +class ReadFromRemote final : public ISourceStep +{ +public: + ReadFromRemote( + ClusterProxy::IStreamFactory::Shards shards_, + Block header_, + QueryProcessingStage::Enum stage_, + StorageID main_table_, + ASTPtr table_func_ptr_, + ContextPtr context_, + ThrottlerPtr throttler_, + Scalars scalars_, + Tables external_tables_, + Poco::Logger * log_, + UInt32 shard_count_); + + String getName() const override { return "ReadFromRemote"; } + + void initializePipeline(QueryPipeline & pipeline, const BuildQueryPipelineSettings &) override; + +private: + ClusterProxy::IStreamFactory::Shards shards; + QueryProcessingStage::Enum stage; + + StorageID main_table; + ASTPtr table_func_ptr; + + ContextPtr context; + + ThrottlerPtr throttler; + Scalars scalars; + Tables external_tables; + + Poco::Logger * log; + + UInt32 shard_count; + void addLazyPipe(Pipes & pipes, const ClusterProxy::IStreamFactory::Shard & shard); + void addPipe(Pipes & pipes, const ClusterProxy::IStreamFactory::Shard & shard); +}; + +} diff --git a/src/Processors/QueueBuffer.h b/src/Processors/QueueBuffer.h index d81f7e779a3..826f4a22b8b 100644 --- a/src/Processors/QueueBuffer.h +++ b/src/Processors/QueueBuffer.h @@ -7,6 +7,9 @@ namespace DB { +/** Reads all data into queue. + * After all data has been read - output it in the same order. + */ class QueueBuffer : public IAccumulatingTransform { private: diff --git a/src/Processors/Sinks/EmptySink.h b/src/Processors/Sinks/EmptySink.h new file mode 100644 index 00000000000..6f4b1675779 --- /dev/null +++ b/src/Processors/Sinks/EmptySink.h @@ -0,0 +1,18 @@ +#pragma once +#include + +namespace DB +{ + +/// Sink which reads everything and do nothing with it. +class EmptySink : public ISink +{ +public: + explicit EmptySink(Block header) : ISink(std::move(header)) {} + String getName() const override { return "EmptySink"; } + +protected: + void consume(Chunk) override {} +}; + +} diff --git a/src/Processors/NullSink.h b/src/Processors/Sinks/NullSink.h similarity index 57% rename from src/Processors/NullSink.h rename to src/Processors/Sinks/NullSink.h index 962e65b5560..c63ef1a88b1 100644 --- a/src/Processors/NullSink.h +++ b/src/Processors/Sinks/NullSink.h @@ -1,5 +1,4 @@ #pragma once -#include #include namespace DB @@ -21,15 +20,4 @@ protected: void consume(Chunk) override {} }; -/// Sink which reads everything and do nothing with it. -class EmptySink : public ISink -{ -public: - explicit EmptySink(Block header) : ISink(std::move(header)) {} - String getName() const override { return "EmptySink"; } - -protected: - void consume(Chunk) override {} -}; - } diff --git a/src/Processors/Sinks/SinkToStorage.h b/src/Processors/Sinks/SinkToStorage.h new file mode 100644 index 00000000000..c57adef568f --- /dev/null +++ b/src/Processors/Sinks/SinkToStorage.h @@ -0,0 +1,32 @@ +#pragma once +#include +#include + +namespace DB +{ + +/// Sink which is returned from Storage::read. +/// The same as ISink, but also can hold table lock. 
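> One note on the new sink hierarchy: `addTableLock` exists so that a sink handed back by the storage keeps its `TableLockHolder` alive for as long as the pipeline holds the sink. A hypothetical usage sketch, not taken from the patch (the helper name and its caller are invented for illustration):

```cpp
#include <Processors/Sinks/SinkToStorage.h>

namespace DB
{

/// Hypothetical helper: a sink that discards data but keeps the table locked
/// until the pipeline releases the sink (and with it the TableLockHolder).
SinkToStoragePtr makeLockedNullSink(const Block & header, const TableLockHolder & table_lock)
{
    auto sink = std::make_shared<NullSinkToStorage>(header);
    sink->addTableLock(table_lock);
    return sink;
}

}
```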
+class SinkToStorage : public ISink +{ +public: + using ISink::ISink; + + void addTableLock(const TableLockHolder & lock) { table_locks.push_back(lock); } + +private: + std::vector table_locks; +}; + +using SinkToStoragePtr = std::shared_ptr; + + +class NullSinkToStorage : public SinkToStorage +{ +public: + using SinkToStorage::SinkToStorage; + std::string getName() const override { return "NullSinkToStorage"; } + void consume(Chunk) override {} +}; + +} diff --git a/src/Processors/Sources/BlocksListSource.h b/src/Processors/Sources/BlocksListSource.h new file mode 100644 index 00000000000..e0388214f3e --- /dev/null +++ b/src/Processors/Sources/BlocksListSource.h @@ -0,0 +1,47 @@ +#pragma once + +#include + + +namespace DB +{ + +/** A stream of blocks from which you can read the next block from an explicitly provided list. + * Also see OneBlockInputStream. + */ +class BlocksListSource : public SourceWithProgress +{ +public: + /// Acquires the ownership of the block list. + explicit BlocksListSource(BlocksList && list_) + : SourceWithProgress(list_.empty() ? Block() : list_.front().cloneEmpty()) + , list(std::move(list_)), it(list.begin()), end(list.end()) {} + + /// Uses a list of blocks lying somewhere else. + BlocksListSource(BlocksList::iterator & begin_, BlocksList::iterator & end_) + : SourceWithProgress(begin_ == end_ ? Block() : begin_->cloneEmpty()) + , it(begin_), end(end_) {} + + String getName() const override { return "BlocksListSource"; } + +protected: + + Chunk generate() override + { + if (it == end) + return {}; + + Block res = *it; + ++it; + + size_t num_rows = res.rows(); + return Chunk(res.getColumns(), num_rows); + } + +private: + BlocksList list; + BlocksList::iterator it; + const BlocksList::iterator end; +}; + +} diff --git a/src/DataStreams/BlocksSource.h b/src/Processors/Sources/BlocksSource.h similarity index 97% rename from src/DataStreams/BlocksSource.h rename to src/Processors/Sources/BlocksSource.h index 249f089f9af..a416a48e9d2 100644 --- a/src/DataStreams/BlocksSource.h +++ b/src/Processors/Sources/BlocksSource.h @@ -11,7 +11,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. */ -#include #include #include diff --git a/src/Processors/Sources/DelayedSource.cpp b/src/Processors/Sources/DelayedSource.cpp index 054453c4b5a..205ea6e2253 100644 --- a/src/Processors/Sources/DelayedSource.cpp +++ b/src/Processors/Sources/DelayedSource.cpp @@ -1,6 +1,6 @@ #include #include -#include +#include #include namespace DB diff --git a/src/Processors/Sources/SourceFromInputStream.h b/src/Processors/Sources/SourceFromInputStream.h index 2e8cf007623..9649385909c 100644 --- a/src/Processors/Sources/SourceFromInputStream.h +++ b/src/Processors/Sources/SourceFromInputStream.h @@ -1,6 +1,9 @@ #pragma once + #include #include +#include + namespace DB { diff --git a/src/Processors/Sources/SourceWithProgress.h b/src/Processors/Sources/SourceWithProgress.h index 78e56eafb52..49728be01e3 100644 --- a/src/Processors/Sources/SourceWithProgress.h +++ b/src/Processors/Sources/SourceWithProgress.h @@ -1,12 +1,16 @@ #pragma once #include -#include #include #include +#include + namespace DB { +class QueryStatus; +class EnabledQuota; + /// Adds progress to ISource. /// This class takes care of limits, quotas, callback on progress and updating performance counters for current thread. 
class ISourceWithProgress : public ISource diff --git a/src/DataStreams/AddingDefaultsBlockInputStream.cpp b/src/Processors/Transforms/AddingDefaultsTransform.cpp similarity index 89% rename from src/DataStreams/AddingDefaultsBlockInputStream.cpp rename to src/Processors/Transforms/AddingDefaultsTransform.cpp index 81be24439a5..c92d4d7a456 100644 --- a/src/DataStreams/AddingDefaultsBlockInputStream.cpp +++ b/src/Processors/Transforms/AddingDefaultsTransform.cpp @@ -2,7 +2,8 @@ #include #include #include -#include +#include +#include #include #include @@ -13,6 +14,7 @@ #include #include #include +#include #include #include #include @@ -127,31 +129,32 @@ static MutableColumnPtr mixColumns(const ColumnWithTypeAndName & col_read, } -AddingDefaultsBlockInputStream::AddingDefaultsBlockInputStream( - const BlockInputStreamPtr & input, +AddingDefaultsTransform::AddingDefaultsTransform( + const Block & header, const ColumnsDescription & columns_, + IInputFormat & input_format_, ContextPtr context_) - : columns(columns_) + : ISimpleTransform(header, header, true) + , columns(columns_) , column_defaults(columns.getDefaults()) + , input_format(input_format_) , context(context_) { - children.push_back(input); - header = input->getHeader(); } -Block AddingDefaultsBlockInputStream::readImpl() +void AddingDefaultsTransform::transform(Chunk & chunk) { - Block res = children.back()->read(); - if (!res) - return res; - if (column_defaults.empty()) - return res; + return; - const BlockMissingValues & block_missing_values = children.back()->getMissingValues(); + const BlockMissingValues & block_missing_values = input_format.getMissingValues(); if (block_missing_values.empty()) - return res; + return; + + const auto & header = getOutputPort().getHeader(); + size_t num_rows = chunk.getNumRows(); + auto res = header.cloneWithColumns(chunk.detachColumns()); /// res block already has all columns values, with default value for type /// (not value specified in table). We identify which columns we need to @@ -169,7 +172,7 @@ Block AddingDefaultsBlockInputStream::readImpl() } if (!evaluate_block.columns()) - evaluate_block.insert({ColumnConst::create(ColumnUInt8::create(1, 0), res.rows()), std::make_shared(), "_dummy"}); + evaluate_block.insert({ColumnConst::create(ColumnUInt8::create(1, 0), num_rows), std::make_shared(), "_dummy"}); auto dag = evaluateMissingDefaults(evaluate_block, header.getNamesAndTypesList(), columns, context, false); if (dag) @@ -223,7 +226,7 @@ Block AddingDefaultsBlockInputStream::readImpl() res.setColumns(std::move(mutation)); } - return res; + chunk.setColumns(res.getColumns(), num_rows); } } diff --git a/src/DataStreams/AddingDefaultsBlockInputStream.h b/src/Processors/Transforms/AddingDefaultsTransform.h similarity index 52% rename from src/DataStreams/AddingDefaultsBlockInputStream.h rename to src/Processors/Transforms/AddingDefaultsTransform.h index 957f14caff3..844f4fb96e6 100644 --- a/src/DataStreams/AddingDefaultsBlockInputStream.h +++ b/src/Processors/Transforms/AddingDefaultsTransform.h @@ -1,31 +1,33 @@ #pragma once -#include +#include #include namespace DB { +class IInputFormat; + /// Adds defaults to columns using BlockDelayedDefaults bitmask attached to Block by child InputStream. 
-class AddingDefaultsBlockInputStream : public IBlockInputStream +class AddingDefaultsTransform : public ISimpleTransform { public: - AddingDefaultsBlockInputStream( - const BlockInputStreamPtr & input, + AddingDefaultsTransform( + const Block & header, const ColumnsDescription & columns_, + IInputFormat & input_format_, ContextPtr context_); - String getName() const override { return "AddingDefaults"; } - Block getHeader() const override { return header; } + String getName() const override { return "AddingDefaultsTransform"; } protected: - Block readImpl() override; + void transform(Chunk & chunk) override; private: - Block header; const ColumnsDescription columns; const ColumnDefaults column_defaults; + IInputFormat & input_format; ContextPtr context; }; diff --git a/src/DataStreams/CheckSortedBlockInputStream.cpp b/src/Processors/Transforms/CheckSortedTransform.cpp similarity index 68% rename from src/DataStreams/CheckSortedBlockInputStream.cpp rename to src/Processors/Transforms/CheckSortedTransform.cpp index 064c1b690b8..3d4518a935d 100644 --- a/src/DataStreams/CheckSortedBlockInputStream.cpp +++ b/src/Processors/Transforms/CheckSortedTransform.cpp @@ -1,4 +1,4 @@ -#include +#include #include #include #include @@ -12,20 +12,20 @@ namespace ErrorCodes extern const int LOGICAL_ERROR; } -CheckSortedBlockInputStream::CheckSortedBlockInputStream( - const BlockInputStreamPtr & input_, +CheckSortedTransform::CheckSortedTransform( + const Block & header_, const SortDescription & sort_description_) - : header(input_->getHeader()) + : ISimpleTransform(header_, header_, false) , sort_description_map(addPositionsToSortDescriptions(sort_description_)) { - children.push_back(input_); } SortDescriptionsWithPositions -CheckSortedBlockInputStream::addPositionsToSortDescriptions(const SortDescription & sort_description) +CheckSortedTransform::addPositionsToSortDescriptions(const SortDescription & sort_description) { SortDescriptionsWithPositions result; result.reserve(sort_description.size()); + const auto & header = getInputPort().getHeader(); for (SortColumnDescription description_copy : sort_description) { @@ -39,11 +39,11 @@ CheckSortedBlockInputStream::addPositionsToSortDescriptions(const SortDescriptio } -Block CheckSortedBlockInputStream::readImpl() +void CheckSortedTransform::transform(Chunk & chunk) { - Block block = children.back()->read(); - if (!block || block.rows() == 0) - return block; + size_t num_rows = chunk.getNumRows(); + if (num_rows == 0) + return; auto check = [this](const Columns & left, size_t left_index, const Columns & right, size_t right_index) { @@ -70,23 +70,20 @@ Block CheckSortedBlockInputStream::readImpl() } }; - auto block_columns = block.getColumns(); + const auto & chunk_columns = chunk.getColumns(); if (!last_row.empty()) - check(last_row, 0, block_columns, 0); + check(last_row, 0, chunk_columns, 0); - size_t rows = block.rows(); - for (size_t i = 1; i < rows; ++i) - check(block_columns, i - 1, block_columns, i); + for (size_t i = 1; i < num_rows; ++i) + check(chunk_columns, i - 1, chunk_columns, i); last_row.clear(); - for (size_t i = 0; i < block.columns(); ++i) + for (const auto & chunk_column : chunk_columns) { - auto column = block_columns[i]->cloneEmpty(); - column->insertFrom(*block_columns[i], rows - 1); + auto column = chunk_column->cloneEmpty(); + column->insertFrom(*chunk_column, num_rows - 1); last_row.emplace_back(std::move(column)); } - - return block; } } diff --git a/src/DataStreams/CheckSortedBlockInputStream.h 
b/src/Processors/Transforms/CheckSortedTransform.h similarity index 64% rename from src/DataStreams/CheckSortedBlockInputStream.h rename to src/Processors/Transforms/CheckSortedTransform.h index 42060befeeb..d1b13d22578 100644 --- a/src/DataStreams/CheckSortedBlockInputStream.h +++ b/src/Processors/Transforms/CheckSortedTransform.h @@ -1,5 +1,5 @@ #pragma once -#include +#include #include #include @@ -9,26 +9,23 @@ using SortDescriptionsWithPositions = std::vector; /// Streams checks that flow of blocks is sorted in the sort_description order /// Othrewise throws exception in readImpl function. -class CheckSortedBlockInputStream : public IBlockInputStream +class CheckSortedTransform : public ISimpleTransform { public: - CheckSortedBlockInputStream( - const BlockInputStreamPtr & input_, + CheckSortedTransform( + const Block & header_, const SortDescription & sort_description_); - String getName() const override { return "CheckingSorted"; } + String getName() const override { return "CheckSortedTransform"; } - Block getHeader() const override { return header; } protected: - Block readImpl() override; + void transform(Chunk & chunk) override; private: - Block header; SortDescriptionsWithPositions sort_description_map; Columns last_row; -private: /// Just checks, that all sort_descriptions has column_number SortDescriptionsWithPositions addPositionsToSortDescriptions(const SortDescription & sort_description); }; diff --git a/src/Processors/Transforms/CreatingSetsTransform.cpp b/src/Processors/Transforms/CreatingSetsTransform.cpp index 86051019235..6f69765ee23 100644 --- a/src/Processors/Transforms/CreatingSetsTransform.cpp +++ b/src/Processors/Transforms/CreatingSetsTransform.cpp @@ -1,6 +1,6 @@ #include +#include -#include #include #include @@ -10,6 +10,7 @@ #include #include + namespace DB { @@ -49,7 +50,7 @@ void CreatingSetsTransform::startSubquery() LOG_TRACE(log, "Filling temporary table."); if (subquery.table) - table_out = subquery.table->write({}, subquery.table->getInMemoryMetadataPtr(), getContext()); + table_out = std::make_shared(subquery.table->write({}, subquery.table->getInMemoryMetadataPtr(), getContext())); done_with_set = !subquery.set; done_with_table = !subquery.table; diff --git a/src/Processors/Transforms/JoiningTransform.cpp b/src/Processors/Transforms/JoiningTransform.cpp index 31b2da46ab3..e402fd788bc 100644 --- a/src/Processors/Transforms/JoiningTransform.cpp +++ b/src/Processors/Transforms/JoiningTransform.cpp @@ -1,8 +1,9 @@ #include #include -#include -#include +#include #include +#include + namespace DB { @@ -159,19 +160,16 @@ void JoiningTransform::transform(Chunk & chunk) Block block; if (on_totals) { - /// We have to make chunk empty before return - /// In case of using `arrayJoin` we can get more or less rows than one - auto cols = chunk.detachColumns(); - for (auto & col : cols) - col = col->cloneResized(1); - block = inputs.front().getHeader().cloneWithColumns(std::move(cols)); + const auto & left_totals = inputs.front().getHeader().cloneWithColumns(chunk.detachColumns()); + const auto & right_totals = join->getTotals(); /// Drop totals if both out stream and joined stream doesn't have ones. 
/// See comment in ExpressionTransform.h - if (default_totals && !join->hasTotals()) + if (default_totals && !right_totals) return; - join->joinTotals(block); + block = outputs.front().getHeader().cloneEmpty(); + JoinCommon::joinTotals(left_totals, right_totals, join->getTableJoin(), block); } else block = readExecute(chunk); @@ -183,11 +181,9 @@ void JoiningTransform::transform(Chunk & chunk) Block JoiningTransform::readExecute(Chunk & chunk) { Block res; - // std::cerr << "=== Chunk rows " << chunk.getNumRows() << " cols " << chunk.getNumColumns() << std::endl; if (!not_processed) { - // std::cerr << "!not_processed " << std::endl; if (chunk.hasColumns()) res = inputs.front().getHeader().cloneWithColumns(chunk.detachColumns()); @@ -196,7 +192,6 @@ Block JoiningTransform::readExecute(Chunk & chunk) } else if (not_processed->empty()) /// There's not processed data inside expression. { - // std::cerr << "not_processed->empty() " << std::endl; if (chunk.hasColumns()) res = inputs.front().getHeader().cloneWithColumns(chunk.detachColumns()); @@ -205,12 +200,10 @@ Block JoiningTransform::readExecute(Chunk & chunk) } else { - // std::cerr << "not not_processed->empty() " << std::endl; res = std::move(not_processed->block); join->joinBlock(res, not_processed); } - // std::cerr << "Res block rows " << res.rows() << " cols " << res.columns() << std::endl; return res; } diff --git a/src/Processors/Transforms/MergeSortingTransform.cpp b/src/Processors/Transforms/MergeSortingTransform.cpp index 1806693db3a..ca78a29071e 100644 --- a/src/Processors/Transforms/MergeSortingTransform.cpp +++ b/src/Processors/Transforms/MergeSortingTransform.cpp @@ -200,6 +200,7 @@ void MergeSortingTransform::consume(Chunk chunk) description, max_merged_block_size, limit, + false, nullptr, quiet, use_average_block_sizes, diff --git a/src/Processors/Transforms/SortingTransform.h b/src/Processors/Transforms/SortingTransform.h index 9178991f324..0f7cb4347a4 100644 --- a/src/Processors/Transforms/SortingTransform.h +++ b/src/Processors/Transforms/SortingTransform.h @@ -3,7 +3,6 @@ #include #include #include -#include #include diff --git a/src/Processors/Transforms/TotalsHavingTransform.h b/src/Processors/Transforms/TotalsHavingTransform.h index 5809f382e0e..d42543d311a 100644 --- a/src/Processors/Transforms/TotalsHavingTransform.h +++ b/src/Processors/Transforms/TotalsHavingTransform.h @@ -70,7 +70,7 @@ private: /// They are added or not added to the current_totals, depending on the totals_mode. Chunk overflow_aggregates; - /// Here, total values are accumulated. After the work is finished, they will be placed in IBlockInputStream::totals. + /// Here, total values are accumulated. After the work is finished, they will be placed in totals. MutableColumns current_totals; }; diff --git a/src/Processors/Transforms/WindowTransform.cpp b/src/Processors/Transforms/WindowTransform.cpp index 43a7e745842..3ab1a23537b 100644 --- a/src/Processors/Transforms/WindowTransform.cpp +++ b/src/Processors/Transforms/WindowTransform.cpp @@ -4,6 +4,7 @@ #include #include #include +#include #include #include #include @@ -196,6 +197,16 @@ WindowTransform::WindowTransform(const Block & input_header_, , input_header(input_header_) , window_description(window_description_) { + // Materialize all columns in header, because we materialize all columns + // in chunks and it's convenient if they match. 
+ auto input_columns = input_header.getColumns(); + for (auto & column : input_columns) + { + column = std::move(column)->convertToFullColumnIfConst(); + } + input_header.setColumns(std::move(input_columns)); + + // Initialize window function workspaces. workspaces.reserve(functions.size()); for (const auto & f : functions) { @@ -347,8 +358,8 @@ void WindowTransform::advancePartitionEnd() assert(end.block == partition_end.block + 1); // Try to advance the partition end pointer. - const size_t n = partition_by_indices.size(); - if (n == 0) + const size_t partition_by_columns = partition_by_indices.size(); + if (partition_by_columns == 0) { // No PARTITION BY. All input is one partition, which will end when the // input ends. @@ -359,27 +370,44 @@ void WindowTransform::advancePartitionEnd() // Check for partition end. // The partition ends when the PARTITION BY columns change. We need // some reference columns for comparison. We might have already - // dropped the blocks where the partition starts, but any row in the - // partition will do. We use the current_row for this. It might be the same - // as the partition_end if we're at the first row of the first partition, so - // we will compare it to itself, but it still works correctly. + // dropped the blocks where the partition starts, but any other row in the + // partition will do. We can't use frame_start or frame_end or current_row (the next row + // for which we are calculating the window functions), because they all might be + // past the end of the partition. prev_frame_start is suitable, because it + // is a pointer to the first row of the previous frame that must have been + // valid, or to the first row of the partition, and we make sure not to drop + // its block. + assert(partition_start <= prev_frame_start); + // The frame start should be inside the prospective partition, except the + // case when it still has no rows. + assert(prev_frame_start < partition_end || partition_start == partition_end); + assert(first_block_number <= prev_frame_start.block); const auto block_rows = blockRowsNumber(partition_end); for (; partition_end.row < block_rows; ++partition_end.row) { +// fmt::print(stderr, "compare reference '{}' to compared '{}'\n", +// prev_frame_start, partition_end); + size_t i = 0; - for (; i < n; i++) + for (; i < partition_by_columns; i++) { - const auto * ref = inputAt(current_row)[partition_by_indices[i]].get(); - const auto * c = inputAt(partition_end)[partition_by_indices[i]].get(); - if (c->compareAt(partition_end.row, - current_row.row, *ref, + const auto * reference_column + = inputAt(prev_frame_start)[partition_by_indices[i]].get(); + const auto * compared_column + = inputAt(partition_end)[partition_by_indices[i]].get(); + +// fmt::print(stderr, "reference '{}', compared '{}'\n", +// (*reference_column)[prev_frame_start.row], +// (*compared_column)[partition_end.row]); + if (compared_column->compareAt(partition_end.row, + prev_frame_start.row, *reference_column, 1 /* nan_direction_hint */) != 0) { break; } } - if (i < n) + if (i < partition_by_columns) { partition_ended = true; return; @@ -850,6 +878,8 @@ void WindowTransform::updateAggregationState() assert(prev_frame_start <= prev_frame_end); assert(prev_frame_start <= frame_start); assert(prev_frame_end <= frame_end); + assert(partition_start <= frame_start); + assert(frame_end <= partition_end); // We might have to reset aggregation state and/or add some rows to it. // Figure out what to do. 
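(Editor's aside, not part of the patch: the advancePartitionEnd() change above finds the end of a partition by comparing the PARTITION BY columns of each candidate row against a reference row, prev_frame_start, that is known to lie inside the current partition. Below is a minimal, framework-free sketch of that search; the Row/Rows types and the findPartitionEnd name are illustrative stand-ins, not ClickHouse types.)

```
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

using Row = std::vector<std::string>;   // one value per column
using Rows = std::vector<Row>;          // rows of the current block

/// Returns the index of the first row whose PARTITION BY columns differ from
/// the reference row, or rows.size() if the whole range belongs to one
/// partition and the boundary may lie in a later block.
size_t findPartitionEnd(const Rows & rows, size_t begin, size_t reference,
                        const std::vector<size_t> & partition_by_indices)
{
    for (size_t row = begin; row < rows.size(); ++row)
        for (size_t key : partition_by_indices)
            if (rows[row][key] != rows[reference][key])
                return row;   // PARTITION BY columns changed: partition ends here
    return rows.size();
}

int main()
{
    /// Two columns (city, value); PARTITION BY city, i.e. key column 0.
    Rows rows = {{"amsterdam", "1"}, {"amsterdam", "2"}, {"berlin", "3"}};
    std::cout << findPartitionEnd(rows, /*begin=*/0, /*reference=*/0, {0}) << '\n';   // prints 2
}
```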
@@ -963,12 +993,42 @@ void WindowTransform::writeOutCurrentRow() a->insertResultInto(buf, *result_column, arena.get()); } } + +// fmt::print(stderr, "wrote out aggregation state for current row '{}'\n", +// current_row); +} + +static void assertSameColumns(const Columns & left_all, + const Columns & right_all) +{ + assert(left_all.size() == right_all.size()); + + for (size_t i = 0; i < left_all.size(); ++i) + { + const auto * left_column = left_all[i].get(); + const auto * right_column = right_all[i].get(); + + assert(left_column); + assert(right_column); + + assert(typeid(*left_column).hash_code() + == typeid(*right_column).hash_code()); + + if (isColumnConst(*left_column)) + { + Field left_value = assert_cast(*left_column).getField(); + Field right_value = assert_cast(*right_column).getField(); + + assert(left_value == right_value); + } + } } void WindowTransform::appendChunk(Chunk & chunk) { // fmt::print(stderr, "new chunk, {} rows, finished={}\n", chunk.getNumRows(), // input_is_finished); +// fmt::print(stderr, "chunk structure '{}'\n", chunk.dumpStructure()); // First, prepare the new input block and add it to the queue. We might not // have it if it's end of data, though. @@ -984,28 +1044,42 @@ void WindowTransform::appendChunk(Chunk & chunk) blocks.push_back({}); auto & block = blocks.back(); + // Use the number of rows from the Chunk, because it is correct even in // the case where the Chunk has no columns. Not sure if this actually // happens, because even in the case of `count() over ()` we have a dummy // input column. block.rows = chunk.getNumRows(); - block.input_columns = chunk.detachColumns(); + // If we have a (logically) constant column, some Chunks will have a + // Const column for it, and some -- materialized. Such difference is + // generated by e.g. MergingSortedAlgorithm, which mostly materializes + // the constant ORDER BY columns, but in some obscure cases passes them + // through, unmaterialized. This mix is a pain to work with in Window + // Transform, because we have to compare columns across blocks, when e.g. + // searching for peer group boundaries, and each of the four combinations + // of const and materialized requires different code. + // Another problem with Const columns is that the aggregate functions + // can't work with them, so we have to materialize them like the + // Aggregator does. + // Just materialize everything. + auto columns = chunk.detachColumns(); + for (auto & column : columns) + column = std::move(column)->convertToFullColumnIfConst(); + block.input_columns = std::move(columns); + + // Initialize output columns. for (auto & ws : workspaces) { - // Aggregate functions can't work with constant columns, so we have to - // materialize them like the Aggregator does. - for (const auto column_index : ws.argument_column_indices) - { - block.input_columns[column_index] - = std::move(block.input_columns[column_index]) - ->convertToFullColumnIfConst(); - } - block.output_columns.push_back(ws.aggregate_function->getReturnType() ->createColumn()); block.output_columns.back()->reserve(block.rows); } + + // As a debugging aid, assert that all chunks have the same C++ type of + // columns, that also matches the input header, because we often have to + // work across chunks. + assertSameColumns(input_header.getColumns(), block.input_columns); } // Start the calculations. First, advance the partition end. 
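(Editor's aside, not part of the patch: the appendChunk() comment above explains why logically constant columns are expanded up front, so that cross-chunk comparisons and the aggregate functions only ever see one column representation. Below is a self-contained sketch of that normalization step; the Column struct is an illustrative stand-in for ClickHouse's IColumn/ColumnConst, not the real API.)

```
#include <cassert>
#include <cstdint>
#include <iostream>
#include <vector>

/// A column is either a constant (one value plus a row count) or fully
/// materialized data. Mixing the two forms across chunks would force four
/// const/full code paths for every comparison.
struct Column
{
    bool is_const = false;
    int64_t const_value = 0;        /// used only when is_const
    size_t rows = 0;                /// row count when is_const
    std::vector<int64_t> data;      /// used only when !is_const
};

/// "Just materialize everything": expand constants, pass full columns through.
Column materialize(Column col)
{
    if (!col.is_const)
        return col;
    Column full;
    full.data.assign(col.rows, col.const_value);
    return full;
}

int main()
{
    Column constant{.is_const = true, .const_value = 42, .rows = 3};
    Column full = materialize(constant);
    assert(!full.is_const && full.data.size() == 3);
    std::cout << full.data.front() << ' ' << full.data.back() << '\n';   /// 42 42
}
```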
@@ -1136,7 +1210,7 @@ void WindowTransform::appendChunk(Chunk & chunk) peer_group_number = 1; // fmt::print(stderr, "reinitialize agg data at start of {}\n", -// new_partition_start); +// partition_start); // Reinitialize the aggregate function states because the new partition // has started. for (auto & ws : workspaces) @@ -1313,13 +1387,16 @@ void WindowTransform::work() } // We don't really have to keep the entire partition, and it can be big, so - // we want to drop the starting blocks to save memory. - // We can drop the old blocks if we already returned them as output, and the - // frame and the current row are already past them. Note that the frame - // start can be further than current row for some frame specs (e.g. EXCLUDE - // CURRENT ROW), so we have to check both. + // we want to drop the starting blocks to save memory. We can drop the old + // blocks if we already returned them as output, and the frame and the + // current row are already past them. We also need to keep the previous + // frame start because we use it as the partition etalon. It is always less + // than the current frame start, so we don't have to check the latter. Note + // that the frame start can be further than current row for some frame specs + // (e.g. EXCLUDE CURRENT ROW), so we have to check both. + assert(prev_frame_start <= frame_start); const auto first_used_block = std::min(next_output_block_number, - std::min(frame_start.block, current_row.block)); + std::min(prev_frame_start.block, current_row.block)); if (first_block_number < first_used_block) { @@ -1332,6 +1409,7 @@ void WindowTransform::work() assert(next_output_block_number >= first_block_number); assert(frame_start.block >= first_block_number); + assert(prev_frame_start.block >= first_block_number); assert(current_row.block >= first_block_number); assert(peer_group_start.block >= first_block_number); } @@ -1475,12 +1553,21 @@ struct WindowFunctionLagLeadInFrame final : public WindowFunction return; } - if (!getLeastSupertype({argument_types[0], argument_types[2]})) + const auto supertype = getLeastSupertype({argument_types[0], argument_types[2]}); + if (!supertype) { throw Exception(ErrorCodes::BAD_ARGUMENTS, - "The default value type '{}' is not convertible to the argument type '{}'", - argument_types[2]->getName(), - argument_types[0]->getName()); + "There is no supertype for the argument type '{}' and the default value type '{}'", + argument_types[0]->getName(), + argument_types[2]->getName()); + } + if (!argument_types[0]->equals(*supertype)) + { + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "The supertype '{}' for the argument type '{}' and the default value type '{}' is not the same as the argument type", + supertype->getName(), + argument_types[0]->getName(), + argument_types[2]->getName()); } if (argument_types.size() > 3) @@ -1491,8 +1578,7 @@ struct WindowFunctionLagLeadInFrame final : public WindowFunction } } - DataTypePtr getReturnType() const override - { return argument_types[0]; } + DataTypePtr getReturnType() const override { return argument_types[0]; } bool allocatesMemoryInArena() const override { return false; } @@ -1534,9 +1620,13 @@ struct WindowFunctionLagLeadInFrame final : public WindowFunction if (argument_types.size() > 2) { // Column with default values is specified. 
- to.insertFrom(*current_block.input_columns[ - workspace.argument_column_indices[2]], - transform->current_row.row); + // The conversion through Field is inefficient, but we accept + // subtypes of the argument type as a default value (for convenience), + // and it's a pain to write conversion that respects ColumnNothing + // and ColumnConst and so on. + const IColumn & default_column = *current_block.input_columns[ + workspace.argument_column_indices[2]].get(); + to.insert(default_column[transform->current_row.row]); } else { @@ -1553,6 +1643,74 @@ struct WindowFunctionLagLeadInFrame final : public WindowFunction } }; +struct WindowFunctionNthValue final : public WindowFunction +{ + WindowFunctionNthValue(const std::string & name_, + const DataTypes & argument_types_, const Array & parameters_) + : WindowFunction(name_, argument_types_, parameters_) + { + if (!parameters.empty()) + { + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "Function {} cannot be parameterized", name_); + } + + if (argument_types.size() != 2) + { + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "Function '{}' accepts 2 arguments, {} given", + name_, argument_types.size()); + } + } + + DataTypePtr getReturnType() const override + { return argument_types[0]; } + + bool allocatesMemoryInArena() const override { return false; } + + void windowInsertResultInto(const WindowTransform * transform, + size_t function_index) override + { + const auto & current_block = transform->blockAt(transform->current_row); + IColumn & to = *(current_block.output_columns[function_index]); + const auto & workspace = transform->workspaces[function_index]; + + int64_t offset = (*current_block.input_columns[ + workspace.argument_column_indices[1]])[ + transform->current_row.row].get() - 1; + + if (offset < 0) + { + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "The offset for function {} must be non-negative, {} given", + getName(), offset); + } + + if (offset > INT_MAX) + { + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "The offset for function {} must be less than {}, {} given", + getName(), INT_MAX, offset); + } + + const auto [target_row, offset_left] = transform->moveRowNumber(transform->frame_start, offset); + if (offset_left != 0 + || target_row < transform->frame_start + || transform->frame_end <= target_row) + { + // Offset is outside the frame. + to.insertDefault(); + } + else + { + // Offset is inside the frame. + to.insertFrom(*transform->blockAt(target_row).input_columns[ + workspace.argument_column_indices[0]], + target_row.row); + } + } +}; + void registerWindowFunctions(AggregateFunctionFactory & factory) { // Why didn't I implement lag/lead yet? Because they are a mess. I imagine @@ -1573,40 +1731,56 @@ void registerWindowFunctions(AggregateFunctionFactory & factory) // to a (rows between unbounded preceding and unbounded following) frame, // instead of adding separate logic for them. - factory.registerFunction("rank", [](const std::string & name, + const AggregateFunctionProperties properties = { + // By default, if an aggregate function has a null argument, it will be + // replaced with AggregateFunctionNothing. We don't need this behavior + // e.g. for lagInFrame(number, 1, null). + .returns_default_when_only_null = true, + // This probably doesn't make any difference for window functions because + // it is an Aggregator-specific setting. 
+ .is_order_dependent = true }; + + factory.registerFunction("rank", {[](const std::string & name, const DataTypes & argument_types, const Array & parameters, const Settings *) { return std::make_shared(name, argument_types, parameters); - }); + }, properties}); - factory.registerFunction("dense_rank", [](const std::string & name, + factory.registerFunction("dense_rank", {[](const std::string & name, const DataTypes & argument_types, const Array & parameters, const Settings *) { return std::make_shared(name, argument_types, parameters); - }); + }, properties}); - factory.registerFunction("row_number", [](const std::string & name, + factory.registerFunction("row_number", {[](const std::string & name, const DataTypes & argument_types, const Array & parameters, const Settings *) { return std::make_shared(name, argument_types, parameters); - }); + }, properties}); - factory.registerFunction("lagInFrame", [](const std::string & name, + factory.registerFunction("lagInFrame", {[](const std::string & name, const DataTypes & argument_types, const Array & parameters, const Settings *) { return std::make_shared>( name, argument_types, parameters); - }); + }, properties}); - factory.registerFunction("leadInFrame", [](const std::string & name, + factory.registerFunction("leadInFrame", {[](const std::string & name, const DataTypes & argument_types, const Array & parameters, const Settings *) { return std::make_shared>( name, argument_types, parameters); - }); + }, properties}); + + factory.registerFunction("nth_value", {[](const std::string & name, + const DataTypes & argument_types, const Array & parameters, const Settings *) + { + return std::make_shared( + name, argument_types, parameters); + }, properties}); } } diff --git a/src/Processors/Transforms/WindowTransform.h b/src/Processors/Transforms/WindowTransform.h index 611b03ebf72..d7211f9edd7 100644 --- a/src/Processors/Transforms/WindowTransform.h +++ b/src/Processors/Transforms/WindowTransform.h @@ -139,7 +139,9 @@ public: } const Columns & inputAt(const RowNumber & x) const - { return const_cast(this)->inputAt(x); } + { + return const_cast(this)->inputAt(x); + } auto & blockAt(const uint64_t block_number) { @@ -149,13 +151,19 @@ public: } const auto & blockAt(const uint64_t block_number) const - { return const_cast(this)->blockAt(block_number); } + { + return const_cast(this)->blockAt(block_number); + } auto & blockAt(const RowNumber & x) - { return blockAt(x.block); } + { + return blockAt(x.block); + } const auto & blockAt(const RowNumber & x) const - { return const_cast(this)->blockAt(x); } + { + return const_cast(this)->blockAt(x); + } size_t blockRowsNumber(const RowNumber & x) const { @@ -225,10 +233,14 @@ public: } RowNumber blocksEnd() const - { return RowNumber{first_block_number + blocks.size(), 0}; } + { + return RowNumber{first_block_number + blocks.size(), 0}; + } RowNumber blocksBegin() const - { return RowNumber{first_block_number, 0}; } + { + return RowNumber{first_block_number, 0}; + } public: /* diff --git a/src/DataStreams/InputStreamFromASTInsertQuery.cpp b/src/Processors/Transforms/getSourceFromFromASTInsertQuery.cpp similarity index 69% rename from src/DataStreams/InputStreamFromASTInsertQuery.cpp rename to src/Processors/Transforms/getSourceFromFromASTInsertQuery.cpp index 0848d838276..8d8a4761657 100644 --- a/src/DataStreams/InputStreamFromASTInsertQuery.cpp +++ b/src/Processors/Transforms/getSourceFromFromASTInsertQuery.cpp @@ -1,13 +1,16 @@ #include #include #include +#include #include #include #include -#include 
-#include +#include +#include #include #include +#include +#include namespace DB @@ -20,7 +23,7 @@ namespace ErrorCodes } -InputStreamFromASTInsertQuery::InputStreamFromASTInsertQuery( +Pipe getSourceFromFromASTInsertQuery( const ASTPtr & ast, ReadBuffer * input_buffer_tail_part, const Block & header, @@ -42,7 +45,7 @@ InputStreamFromASTInsertQuery::InputStreamFromASTInsertQuery( /// Data could be in parsed (ast_insert_query.data) and in not parsed yet (input_buffer_tail_part) part of query. - input_buffer_ast_part = std::make_unique( + auto input_buffer_ast_part = std::make_unique( ast_insert_query->data, ast_insert_query->data ? ast_insert_query->end - ast_insert_query->data : 0); ConcatReadBuffer::ReadBuffers buffers; @@ -56,9 +59,10 @@ InputStreamFromASTInsertQuery::InputStreamFromASTInsertQuery( * - because 'query.data' could refer to memory piece, used as buffer for 'input_buffer_tail_part'. */ - input_buffer_contacenated = std::make_unique(buffers); + auto input_buffer_contacenated = std::make_unique(buffers); - res_stream = context->getInputFormat(format, *input_buffer_contacenated, header, context->getSettings().max_insert_block_size); + auto source = FormatFactory::instance().getInput(format, *input_buffer_contacenated, header, context, context->getSettings().max_insert_block_size); + Pipe pipe(source); if (context->getSettingsRef().input_format_defaults_for_omitted_fields && ast_insert_query->table_id && !input_function) { @@ -66,8 +70,18 @@ InputStreamFromASTInsertQuery::InputStreamFromASTInsertQuery( auto metadata_snapshot = storage->getInMemoryMetadataPtr(); const auto & columns = metadata_snapshot->getColumns(); if (columns.hasDefaults()) - res_stream = std::make_shared(res_stream, columns, context); + { + pipe.addSimpleTransform([&](const Block & cur_header) + { + return std::make_shared(cur_header, columns, *source, context); + }); + } } + + source->addBuffer(std::move(input_buffer_ast_part)); + source->addBuffer(std::move(input_buffer_contacenated)); + + return pipe; } } diff --git a/src/Processors/Transforms/getSourceFromFromASTInsertQuery.h b/src/Processors/Transforms/getSourceFromFromASTInsertQuery.h new file mode 100644 index 00000000000..3c00bd47ea0 --- /dev/null +++ b/src/Processors/Transforms/getSourceFromFromASTInsertQuery.h @@ -0,0 +1,26 @@ +#pragma once + +#include +#include +#include +#include + + +namespace DB +{ + +/** Prepares a pipe which produce data containing in INSERT query + * Head of inserting data could be stored in INSERT ast directly + * Remaining (tail) data could be stored in input_buffer_tail_part + */ + +class Pipe; + +Pipe getSourceFromFromASTInsertQuery( + const ASTPtr & ast, + ReadBuffer * input_buffer_tail_part, + const Block & header, + ContextPtr context, + const ASTPtr & input_function); + +} diff --git a/src/Processors/tests/gtest_exception_on_incorrect_pipeline.cpp b/src/Processors/tests/gtest_exception_on_incorrect_pipeline.cpp index 3d85ffede9a..df3901e2eb1 100644 --- a/src/Processors/tests/gtest_exception_on_incorrect_pipeline.cpp +++ b/src/Processors/tests/gtest_exception_on_incorrect_pipeline.cpp @@ -1,7 +1,7 @@ #include #include -#include +#include #include #include diff --git a/src/Processors/ya.make b/src/Processors/ya.make index 86a40685d1f..4b95484a828 100644 --- a/src/Processors/ya.make +++ b/src/Processors/ya.make @@ -7,8 +7,14 @@ PEERDIR( clickhouse/src/Common contrib/libs/msgpack contrib/libs/protobuf + contrib/libs/arrow ) +ADDINCL( + contrib/libs/arrow/src +) + +CFLAGS(-DUSE_ARROW=1) SRCS( Chunk.cpp @@ -25,6 +31,11 @@ 
SRCS( Formats/IOutputFormat.cpp Formats/IRowInputFormat.cpp Formats/IRowOutputFormat.cpp + Formats/Impl/ArrowBlockInputFormat.cpp + Formats/Impl/ArrowBlockOutputFormat.cpp + Formats/Impl/ArrowBufferedStreams.cpp + Formats/Impl/ArrowColumnToCHColumn.cpp + Formats/Impl/CHColumnToArrowColumn.cpp Formats/Impl/BinaryRowInputFormat.cpp Formats/Impl/BinaryRowOutputFormat.cpp Formats/Impl/CSVRowInputFormat.cpp @@ -126,6 +137,7 @@ SRCS( QueryPlan/QueryPlan.cpp QueryPlan/ReadFromMergeTree.cpp QueryPlan/ReadFromPreparedSource.cpp + QueryPlan/ReadFromRemote.cpp QueryPlan/ReadNothingStep.cpp QueryPlan/RollupStep.cpp QueryPlan/SettingQuotaAndLimitsStep.cpp @@ -138,10 +150,12 @@ SRCS( Sources/SinkToOutputStream.cpp Sources/SourceFromInputStream.cpp Sources/SourceWithProgress.cpp + Transforms/AddingDefaultsTransform.cpp Transforms/AddingSelectorTransform.cpp Transforms/AggregatingInOrderTransform.cpp Transforms/AggregatingTransform.cpp Transforms/ArrayJoinTransform.cpp + Transforms/CheckSortedTransform.cpp Transforms/CopyTransform.cpp Transforms/CreatingSetsTransform.cpp Transforms/CubeTransform.cpp @@ -164,6 +178,7 @@ SRCS( Transforms/SortingTransform.cpp Transforms/TotalsHavingTransform.cpp Transforms/WindowTransform.cpp + Transforms/getSourceFromFromASTInsertQuery.cpp printPipeline.cpp ) diff --git a/src/Server/GRPCServer.cpp b/src/Server/GRPCServer.cpp index 82e5ed4d0db..b90b0c33f17 100644 --- a/src/Server/GRPCServer.cpp +++ b/src/Server/GRPCServer.cpp @@ -5,8 +5,10 @@ #include #include #include -#include +#include +#include #include +#include #include #include #include @@ -20,6 +22,10 @@ #include #include #include +#include +#include +#include +#include #include #include #include @@ -475,6 +481,40 @@ namespace }; + /// A boolean state protected by mutex able to wait until other thread sets it to a specific value. + class BoolState + { + public: + explicit BoolState(bool initial_value) : value(initial_value) {} + + bool get() const + { + std::lock_guard lock{mutex}; + return value; + } + + void set(bool new_value) + { + std::lock_guard lock{mutex}; + if (value == new_value) + return; + value = new_value; + changed.notify_all(); + } + + void wait(bool wanted_value) const + { + std::unique_lock lock{mutex}; + changed.wait(lock, [this, wanted_value]() { return value == wanted_value; }); + } + + private: + bool value; + mutable std::mutex mutex; + mutable std::condition_variable changed; + }; + + /// Handles a connection after a responder is started (i.e. after getting a new call). class Call { @@ -547,7 +587,8 @@ namespace std::optional read_buffer; std::optional write_buffer; - BlockInputStreamPtr block_input_stream; + std::unique_ptr pipeline; + std::unique_ptr pipeline_executor; BlockOutputStreamPtr block_output_stream; bool need_input_data_from_insert_query = true; bool need_input_data_from_query_info = true; @@ -558,18 +599,15 @@ namespace UInt64 waited_for_client_writing = 0; /// The following fields are accessed both from call_thread and queue_thread. - std::atomic reading_query_info = false; + BoolState reading_query_info{false}; std::atomic failed_to_read_query_info = false; GRPCQueryInfo next_query_info_while_reading; std::atomic want_to_cancel = false; std::atomic check_query_info_contains_cancel_only = false; - std::atomic sending_result = false; + BoolState sending_result{false}; std::atomic failed_to_send_result = false; ThreadFromGlobalPool call_thread; - std::condition_variable read_finished; - std::condition_variable write_finished; - std::mutex dummy_mutex; /// Doesn't protect anything. 
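(Editor's aside, not part of the patch: the BoolState helper introduced above replaces the condition_variable plus "dummy" mutex pair with a flag that one thread sets and another can block on. Below is the class again in isolation with a tiny usage example; the sender thread and the printed message are illustrative only.)

```
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

/// A boolean protected by a mutex; other threads can wait until it reaches
/// a specific value (mirrors the BoolState class added in GRPCServer.cpp).
class BoolState
{
public:
    explicit BoolState(bool initial_value) : value(initial_value) {}

    bool get() const
    {
        std::lock_guard lock{mutex};
        return value;
    }

    void set(bool new_value)
    {
        std::lock_guard lock{mutex};
        if (value == new_value)
            return;
        value = new_value;
        changed.notify_all();
    }

    void wait(bool wanted_value) const
    {
        std::unique_lock lock{mutex};
        changed.wait(lock, [this, wanted_value] { return value == wanted_value; });
    }

private:
    bool value;
    mutable std::mutex mutex;
    mutable std::condition_variable changed;
};

int main()
{
    BoolState sending_result{false};

    std::thread sender([&]
    {
        sending_result.set(true);    /// result is being sent
        sending_result.set(false);   /// done, wake up any waiter
    });

    sending_result.wait(false);      /// returns once nothing is being sent
    sender.join();
    std::cout << "finished\n";
}
```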
}; Call::Call(CallType call_type_, std::unique_ptr responder_, IServer & iserver_, Poco::Logger * log_) @@ -604,6 +642,7 @@ namespace { try { + setThreadName("GRPCServerCall"); receiveQuery(); executeQuery(); processInput(); @@ -755,16 +794,16 @@ namespace throw Exception("Unexpected context in Input initializer", ErrorCodes::LOGICAL_ERROR); input_function_is_used = true; initializeBlockInputStream(input_storage->getInMemoryMetadataPtr()->getSampleBlock()); - block_input_stream->readPrefix(); }); query_context->setInputBlocksReaderCallback([this](ContextPtr context) -> Block { if (context != query_context) throw Exception("Unexpected context in InputBlocksReader", ErrorCodes::LOGICAL_ERROR); - auto block = block_input_stream->read(); - if (!block) - block_input_stream->readSuffix(); + + Block block; + while (!block && pipeline_executor->pull(block)); + return block; }); @@ -797,13 +836,15 @@ namespace /// So we mustn't touch the input stream from other thread. initializeBlockInputStream(io.out->getHeader()); - block_input_stream->readPrefix(); io.out->writePrefix(); - while (auto block = block_input_stream->read()) - io.out->write(block); + Block block; + while (pipeline_executor->pull(block)) + { + if (block) + io.out->write(block); + } - block_input_stream->readSuffix(); io.out->writeSuffix(); } @@ -866,9 +907,11 @@ namespace return {nullptr, 0}; /// no more input data }); - assert(!block_input_stream); - block_input_stream = query_context->getInputFormat( - input_format, *read_buffer, header, query_context->getSettings().max_insert_block_size); + assert(!pipeline); + pipeline = std::make_unique(); + auto source = FormatFactory::instance().getInput( + input_format, *read_buffer, header, query_context, query_context->getSettings().max_insert_block_size); + pipeline->init(Pipe(source)); /// Add default values if necessary. if (ast) @@ -881,10 +924,17 @@ namespace StoragePtr storage = DatabaseCatalog::instance().getTable(table_id, query_context); const auto & columns = storage->getInMemoryMetadataPtr()->getColumns(); if (!columns.empty()) - block_input_stream = std::make_shared(block_input_stream, columns, query_context); + { + pipeline->addSimpleTransform([&](const Block & cur_header) + { + return std::make_shared(cur_header, columns, *source, query_context); + }); + } } } } + + pipeline_executor = std::make_unique(*pipeline); } void Call::createExternalTables() @@ -927,7 +977,7 @@ namespace { /// The data will be written directly to the table. 
auto metadata_snapshot = storage->getInMemoryMetadataPtr(); - auto out_stream = storage->write(ASTPtr(), metadata_snapshot, query_context); + auto out_stream = std::make_shared(storage->write(ASTPtr(), metadata_snapshot, query_context)); ReadBufferFromMemory data(external_table.data().data(), external_table.data().size()); String format = external_table.format(); if (format.empty()) @@ -1150,7 +1200,7 @@ namespace { io.onException(); - LOG_ERROR(log, "Code: {}, e.displayText() = {}, Stack trace:\n\n{}", exception.code(), exception.displayText(), exception.getStackTraceString()); + LOG_ERROR(log, getExceptionMessage(exception, true)); if (responder && !responder_finished) { @@ -1196,7 +1246,8 @@ namespace void Call::close() { responder.reset(); - block_input_stream.reset(); + pipeline_executor.reset(); + pipeline.reset(); block_output_stream.reset(); read_buffer.reset(); write_buffer.reset(); @@ -1212,8 +1263,7 @@ namespace { auto start_reading = [&] { - assert(!reading_query_info); - reading_query_info = true; + reading_query_info.set(true); responder->read(next_query_info_while_reading, [this](bool ok) { /// Called on queue_thread. @@ -1238,18 +1288,16 @@ namespace /// on queue_thread. failed_to_read_query_info = true; } - reading_query_info = false; - read_finished.notify_one(); + reading_query_info.set(false); }); }; auto finish_reading = [&] { - if (reading_query_info) + if (reading_query_info.get()) { Stopwatch client_writing_watch; - std::unique_lock lock{dummy_mutex}; - read_finished.wait(lock, [this] { return !reading_query_info; }); + reading_query_info.wait(false); waited_for_client_writing += client_writing_watch.elapsedNanoseconds(); } throwIfFailedToReadQueryInfo(); @@ -1412,11 +1460,10 @@ namespace /// Wait for previous write to finish. /// (gRPC doesn't allow to start sending another result while the previous is still being sending.) - if (sending_result) + if (sending_result.get()) { Stopwatch client_reading_watch; - std::unique_lock lock{dummy_mutex}; - write_finished.wait(lock, [this] { return !sending_result; }); + sending_result.wait(false); waited_for_client_reading += client_reading_watch.elapsedNanoseconds(); } throwIfFailedToSendResult(); @@ -1427,14 +1474,13 @@ namespace if (write_buffer) write_buffer->finalize(); - sending_result = true; + sending_result.set(true); auto callback = [this](bool ok) { /// Called on queue_thread. if (!ok) failed_to_send_result = true; - sending_result = false; - write_finished.notify_one(); + sending_result.set(false); }; Stopwatch client_reading_final_watch; @@ -1454,8 +1500,7 @@ namespace if (send_final_message) { /// Wait until the result is actually sent. - std::unique_lock lock{dummy_mutex}; - write_finished.wait(lock, [this] { return !sending_result; }); + sending_result.wait(false); waited_for_client_reading += client_reading_final_watch.elapsedNanoseconds(); throwIfFailedToSendResult(); LOG_TRACE(log, "Final result has been sent to the client"); @@ -1566,7 +1611,7 @@ private: { /// Called on call_thread. That's why we can't destroy the `call` right now /// (thread can't join to itself). Thus here we only move the `call` from - /// `current_call` to `finished_calls` and run() will actually destroy the `call`. + /// `current_calls` to `finished_calls` and run() will actually destroy the `call`. 
std::lock_guard lock{mutex}; auto it = current_calls.find(call); finished_calls.push_back(std::move(it->second)); @@ -1575,6 +1620,7 @@ private: void run() { + setThreadName("GRPCServerQueue"); while (true) { { diff --git a/src/Server/HTTPHandler.cpp b/src/Server/HTTPHandler.cpp index ad38cfb341a..8e0bed4b4c2 100644 --- a/src/Server/HTTPHandler.cpp +++ b/src/Server/HTTPHandler.cpp @@ -6,7 +6,6 @@ #include #include #include -#include #include #include #include diff --git a/src/Server/MySQLHandler.cpp b/src/Server/MySQLHandler.cpp index b8913f5e64f..375f248d939 100644 --- a/src/Server/MySQLHandler.cpp +++ b/src/Server/MySQLHandler.cpp @@ -73,13 +73,13 @@ MySQLHandler::MySQLHandler(IServer & server_, const Poco::Net::StreamSocket & so : Poco::Net::TCPServerConnection(socket_) , server(server_) , log(&Poco::Logger::get("MySQLHandler")) - , connection_context(Context::createCopy(server.context())) , connection_id(connection_id_) + , connection_context(Context::createCopy(server.context())) , auth_plugin(new MySQLProtocol::Authentication::Native41()) { - server_capability_flags = CLIENT_PROTOCOL_41 | CLIENT_SECURE_CONNECTION | CLIENT_PLUGIN_AUTH | CLIENT_PLUGIN_AUTH_LENENC_CLIENT_DATA | CLIENT_CONNECT_WITH_DB | CLIENT_DEPRECATE_EOF; + server_capabilities = CLIENT_PROTOCOL_41 | CLIENT_SECURE_CONNECTION | CLIENT_PLUGIN_AUTH | CLIENT_PLUGIN_AUTH_LENENC_CLIENT_DATA | CLIENT_CONNECT_WITH_DB | CLIENT_DEPRECATE_EOF; if (ssl_enabled) - server_capability_flags |= CLIENT_SSL; + server_capabilities |= CLIENT_SSL; replacements.emplace("KILL QUERY", killConnectionIdReplacementQuery); replacements.emplace("SHOW TABLE STATUS LIKE", showTableStatusReplacementQuery); @@ -95,15 +95,15 @@ void MySQLHandler::run() connection_context->getClientInfo().interface = ClientInfo::Interface::MYSQL; connection_context->setDefaultFormat("MySQLWire"); connection_context->getClientInfo().connection_id = connection_id; - connection_context->setMySQLProtocolContext(&connection_context_mysql); + connection_context->getClientInfo().query_kind = ClientInfo::QueryKind::INITIAL_QUERY; in = std::make_shared(socket()); out = std::make_shared(socket()); - packet_endpoint = connection_context_mysql.makeEndpoint(*in, *out); + packet_endpoint = MySQLProtocol::PacketEndpoint::create(*in, *out, sequence_id); try { - Handshake handshake(server_capability_flags, connection_id, VERSION_STRING + String("-") + VERSION_NAME, + Handshake handshake(server_capabilities, connection_id, VERSION_STRING + String("-") + VERSION_NAME, auth_plugin->getName(), auth_plugin->getAuthPluginData(), CharacterSet::utf8_general_ci); packet_endpoint->sendPacket(handshake, true); @@ -111,11 +111,8 @@ void MySQLHandler::run() HandshakeResponse handshake_response; finishHandshake(handshake_response); - connection_context_mysql.client_capabilities = handshake_response.capability_flags; - if (handshake_response.max_packet_size) - connection_context_mysql.max_packet_size = handshake_response.max_packet_size; - if (!connection_context_mysql.max_packet_size) - connection_context_mysql.max_packet_size = MAX_PACKET_LENGTH; + client_capabilities = handshake_response.capability_flags; + max_packet_size = handshake_response.max_packet_size ? 
handshake_response.max_packet_size : MAX_PACKET_LENGTH; LOG_TRACE(log, "Capabilities: {}, max_packet_size: {}, character_set: {}, user: {}, auth_response length: {}, database: {}, auth_plugin_name: {}", @@ -127,8 +124,7 @@ void MySQLHandler::run() handshake_response.database, handshake_response.auth_plugin_name); - client_capability_flags = handshake_response.capability_flags; - if (!(client_capability_flags & CLIENT_PROTOCOL_41)) + if (!(client_capabilities & CLIENT_PROTOCOL_41)) throw Exception("Required capability: CLIENT_PROTOCOL_41.", ErrorCodes::MYSQL_CLIENT_INSUFFICIENT_CAPABILITIES); authenticate(handshake_response.username, handshake_response.auth_plugin_name, handshake_response.auth_response); @@ -282,7 +278,7 @@ void MySQLHandler::comInitDB(ReadBuffer & payload) readStringUntilEOF(database, payload); LOG_DEBUG(log, "Setting current database to {}", database); connection_context->setCurrentDatabase(database); - packet_endpoint->sendPacket(OKPacket(0, client_capability_flags, 0, 0, 1), true); + packet_endpoint->sendPacket(OKPacket(0, client_capabilities, 0, 0, 1), true); } void MySQLHandler::comFieldList(ReadBuffer & payload) @@ -299,12 +295,12 @@ void MySQLHandler::comFieldList(ReadBuffer & payload) ); packet_endpoint->sendPacket(column_definition); } - packet_endpoint->sendPacket(OKPacket(0xfe, client_capability_flags, 0, 0, 0), true); + packet_endpoint->sendPacket(OKPacket(0xfe, client_capabilities, 0, 0, 0), true); } void MySQLHandler::comPing() { - packet_endpoint->sendPacket(OKPacket(0x0, client_capability_flags, 0, 0, 0), true); + packet_endpoint->sendPacket(OKPacket(0x0, client_capabilities, 0, 0, 0), true); } static bool isFederatedServerSetupSetCommand(const String & query); @@ -317,7 +313,7 @@ void MySQLHandler::comQuery(ReadBuffer & payload) // As Clickhouse doesn't support these statements, we just send OK packet in response. if (isFederatedServerSetupSetCommand(query)) { - packet_endpoint->sendPacket(OKPacket(0x00, client_capability_flags, 0, 0, 0), true); + packet_endpoint->sendPacket(OKPacket(0x00, client_capabilities, 0, 0, 0), true); } else { @@ -351,15 +347,20 @@ void MySQLHandler::comQuery(ReadBuffer & payload) CurrentThread::QueryScope query_scope{query_context}; - executeQuery(should_replace ? replacement : payload, *out, false, query_context, - [&with_output](const String &, const String &, const String &, const String &) - { - with_output = true; - } - ); + FormatSettings format_settings; + format_settings.mysql_wire.client_capabilities = client_capabilities; + format_settings.mysql_wire.max_packet_size = max_packet_size; + format_settings.mysql_wire.sequence_id = &sequence_id; + + auto set_result_details = [&with_output](const String &, const String &, const String &, const String &) + { + with_output = true; + }; + + executeQuery(should_replace ? replacement : payload, *out, false, query_context, set_result_details, format_settings); if (!with_output) - packet_endpoint->sendPacket(OKPacket(0x00, client_capability_flags, affected_rows, 0, 0), true); + packet_endpoint->sendPacket(OKPacket(0x00, client_capabilities, affected_rows, 0, 0), true); } } @@ -396,14 +397,14 @@ void MySQLHandlerSSL::finishHandshakeSSL( ReadBufferFromMemory payload(buf, pos); payload.ignore(PACKET_HEADER_SIZE); ssl_request.readPayloadWithUnpacked(payload); - connection_context_mysql.client_capabilities = ssl_request.capability_flags; - connection_context_mysql.max_packet_size = ssl_request.max_packet_size ? 
ssl_request.max_packet_size : MAX_PACKET_LENGTH; + client_capabilities = ssl_request.capability_flags; + max_packet_size = ssl_request.max_packet_size ? ssl_request.max_packet_size : MAX_PACKET_LENGTH; secure_connection = true; ss = std::make_shared(SecureStreamSocket::attach(socket(), SSLManager::instance().defaultServerContext())); in = std::make_shared(*ss); out = std::make_shared(*ss); - connection_context_mysql.sequence_id = 2; - packet_endpoint = connection_context_mysql.makeEndpoint(*in, *out); + sequence_id = 2; + packet_endpoint = MySQLProtocol::PacketEndpoint::create(*in, *out, sequence_id); packet_endpoint->receivePacket(packet); /// Reading HandshakeResponse from secure socket. } diff --git a/src/Server/MySQLHandler.h b/src/Server/MySQLHandler.h index 2ea5695a0a6..96467797105 100644 --- a/src/Server/MySQLHandler.h +++ b/src/Server/MySQLHandler.h @@ -32,7 +32,7 @@ public: void run() final; -private: +protected: CurrentMetrics::Increment metric_increment{CurrentMetrics::MySQLConnection}; /// Enables SSL, if client requested. @@ -52,33 +52,25 @@ private: virtual void finishHandshakeSSL(size_t packet_size, char * buf, size_t pos, std::function read_bytes, MySQLProtocol::ConnectionPhase::HandshakeResponse & packet); IServer & server; - -protected: Poco::Logger * log; - - MySQLWireContext connection_context_mysql; - ContextMutablePtr connection_context; - - MySQLProtocol::PacketEndpointPtr packet_endpoint; - -private: UInt64 connection_id = 0; - size_t server_capability_flags = 0; - size_t client_capability_flags = 0; + uint32_t server_capabilities = 0; + uint32_t client_capabilities = 0; + size_t max_packet_size = 0; + uint8_t sequence_id = 0; -protected: - std::unique_ptr auth_plugin; + MySQLProtocol::PacketEndpointPtr packet_endpoint; + ContextMutablePtr connection_context; - std::shared_ptr in; - std::shared_ptr out; - - bool secure_connection = false; - -private: using ReplacementFn = std::function; using Replacements = std::unordered_map; Replacements replacements; + + std::unique_ptr auth_plugin; + std::shared_ptr in; + std::shared_ptr out; + bool secure_connection = false; }; #if USE_SSL diff --git a/src/Server/PostgreSQLHandler.cpp b/src/Server/PostgreSQLHandler.cpp index 01887444c65..1e98ed2e134 100644 --- a/src/Server/PostgreSQLHandler.cpp +++ b/src/Server/PostgreSQLHandler.cpp @@ -55,6 +55,7 @@ void PostgreSQLHandler::run() connection_context->makeSessionContext(); connection_context->getClientInfo().interface = ClientInfo::Interface::POSTGRESQL; connection_context->setDefaultFormat("PostgreSQLWire"); + connection_context->getClientInfo().query_kind = ClientInfo::QueryKind::INITIAL_QUERY; try { diff --git a/src/Server/TCPHandler.cpp b/src/Server/TCPHandler.cpp index 108b7b8070a..269c33d952e 100644 --- a/src/Server/TCPHandler.cpp +++ b/src/Server/TCPHandler.cpp @@ -19,6 +19,7 @@ #include #include #include +#include #include #include #include @@ -149,7 +150,7 @@ void TCPHandler::runImpl() if (!DatabaseCatalog::instance().isDatabaseExist(default_database)) { Exception e("Database " + backQuote(default_database) + " doesn't exist", ErrorCodes::UNKNOWN_DATABASE); - LOG_ERROR(log, "Code: {}, e.displayText() = {}, Stack trace:\n\n{}", e.code(), e.displayText(), e.getStackTraceString()); + LOG_ERROR(log, getExceptionMessage(e, true)); sendException(e, connection_context->getSettingsRef().calculate_text_stack_trace); return; } @@ -422,7 +423,7 @@ void TCPHandler::runImpl() } const auto & e = *exception; - LOG_ERROR(log, "Code: {}, e.displayText() = {}, Stack trace:\n\n{}", 
e.code(), e.displayText(), e.getStackTraceString()); + LOG_ERROR(log, getExceptionMessage(e, true)); sendException(*exception, send_exception_with_stack_trace); } } @@ -1026,7 +1027,17 @@ bool TCPHandler::receivePacket() return false; case Protocol::Client::Cancel: + { + /// For testing connection collector. + const Settings & settings = query_context->getSettingsRef(); + if (settings.sleep_in_receive_cancel_ms.totalMilliseconds()) + { + std::chrono::milliseconds ms(settings.sleep_in_receive_cancel_ms.totalMilliseconds()); + std::this_thread::sleep_for(ms); + } + return false; + } case Protocol::Client::Hello: receiveUnexpectedHello(); @@ -1063,6 +1074,13 @@ String TCPHandler::receiveReadTaskResponseAssumeLocked() if (packet_type == Protocol::Client::Cancel) { state.is_cancelled = true; + /// For testing connection collector. + const Settings & settings = query_context->getSettingsRef(); + if (settings.sleep_in_receive_cancel_ms.totalMilliseconds()) + { + std::chrono::milliseconds ms(settings.sleep_in_receive_cancel_ms.totalMilliseconds()); + std::this_thread::sleep_for(ms); + } return {}; } else @@ -1313,7 +1331,7 @@ bool TCPHandler::receiveData(bool scalar) } auto metadata_snapshot = storage->getInMemoryMetadataPtr(); /// The data will be written directly to the table. - auto temporary_table_out = storage->write(ASTPtr(), metadata_snapshot, query_context); + auto temporary_table_out = std::make_shared(storage->write(ASTPtr(), metadata_snapshot, query_context)); temporary_table_out->write(block); temporary_table_out->writeSuffix(); @@ -1461,6 +1479,16 @@ bool TCPHandler::isQueryCancelled() throw NetException("Unexpected packet Cancel received from client", ErrorCodes::UNEXPECTED_PACKET_FROM_CLIENT); LOG_INFO(log, "Query was cancelled."); state.is_cancelled = true; + /// For testing connection collector. + { + const Settings & settings = query_context->getSettingsRef(); + if (settings.sleep_in_receive_cancel_ms.totalMilliseconds()) + { + std::chrono::milliseconds ms(settings.sleep_in_receive_cancel_ms.totalMilliseconds()); + std::this_thread::sleep_for(ms); + } + } + return true; default: diff --git a/src/Storages/AlterCommands.cpp b/src/Storages/AlterCommands.cpp index 9e9510b51a4..1bb8e6bb9b0 100644 --- a/src/Storages/AlterCommands.cpp +++ b/src/Storages/AlterCommands.cpp @@ -1276,6 +1276,11 @@ void AlterCommands::validate(const StorageInMemoryMetadata & metadata, ContextPt validateColumnsDefaultsAndGetSampleBlock(default_expr_list, all_columns.getAll(), context); } +bool AlterCommands::hasSettingsAlterCommand() const +{ + return std::any_of(begin(), end(), [](const AlterCommand & c) { return c.isSettingsAlter(); }); +} + bool AlterCommands::isSettingsAlter() const { return std::all_of(begin(), end(), [](const AlterCommand & c) { return c.isSettingsAlter(); }); diff --git a/src/Storages/AlterCommands.h b/src/Storages/AlterCommands.h index 6987de68f9c..60f4ad7d552 100644 --- a/src/Storages/AlterCommands.h +++ b/src/Storages/AlterCommands.h @@ -195,9 +195,12 @@ public: void apply(StorageInMemoryMetadata & metadata, ContextPtr context) const; /// At least one command modify settings. + bool hasSettingsAlterCommand() const; + + /// All commands modify settings only. bool isSettingsAlter() const; - /// At least one command modify comments. + /// All commands modify comments only. 
bool isCommentAlter() const; /// Return mutation commands which some storages may execute as part of diff --git a/src/Storages/ColumnsDescription.cpp b/src/Storages/ColumnsDescription.cpp index 179204a1a0b..c05441148df 100644 --- a/src/Storages/ColumnsDescription.cpp +++ b/src/Storages/ColumnsDescription.cpp @@ -168,7 +168,7 @@ ColumnsDescription::ColumnsDescription(NamesAndTypesList ordinary, NamesAndAlias /// We are trying to find first column from end with name `column_name` or with a name beginning with `column_name` and ".". /// For example "fruits.bananas" /// names are considered the same if they completely match or `name_without_dot` matches the part of the name to the point -static auto getNameRange(const ColumnsDescription::Container & columns, const String & name_without_dot) +static auto getNameRange(const ColumnsDescription::ColumnsContainer & columns, const String & name_without_dot) { String name_with_dot = name_without_dot + "."; @@ -228,7 +228,7 @@ void ColumnsDescription::remove(const String & column_name) for (auto list_it = range.first; list_it != range.second;) { - removeSubcolumns(list_it->name, list_it->type); + removeSubcolumns(list_it->name); list_it = columns.get<0>().erase(list_it); } } @@ -303,7 +303,7 @@ void ColumnsDescription::flattenNested() } ColumnDescription column = std::move(*it); - removeSubcolumns(column.name, column.type); + removeSubcolumns(column.name); it = columns.get<0>().erase(it); const DataTypes & elements = type_tuple->getElements(); @@ -372,12 +372,7 @@ bool ColumnsDescription::hasNested(const String & column_name) const bool ColumnsDescription::hasSubcolumn(const String & column_name) const { - return subcolumns.find(column_name) != subcolumns.end(); -} - -bool ColumnsDescription::hasInStorageOrSubcolumn(const String & column_name) const -{ - return has(column_name) || hasSubcolumn(column_name); + return subcolumns.get<0>().count(column_name); } const ColumnDescription & ColumnsDescription::get(const String & column_name) const @@ -390,6 +385,50 @@ const ColumnDescription & ColumnsDescription::get(const String & column_name) co return *it; } +static ColumnsDescription::GetFlags defaultKindToGetFlag(ColumnDefaultKind kind) +{ + switch (kind) + { + case ColumnDefaultKind::Default: + return ColumnsDescription::Ordinary; + case ColumnDefaultKind::Materialized: + return ColumnsDescription::Materialized; + case ColumnDefaultKind::Alias: + return ColumnsDescription::Aliases; + } + __builtin_unreachable(); +} + +NamesAndTypesList ColumnsDescription::getByNames(GetFlags flags, const Names & names, bool with_subcolumns) const +{ + NamesAndTypesList res; + for (const auto & name : names) + { + if (auto it = columns.get<1>().find(name); it != columns.get<1>().end()) + { + auto kind = defaultKindToGetFlag(it->default_desc.kind); + if (flags & kind) + { + res.emplace_back(name, it->type); + continue; + } + } + else if (with_subcolumns) + { + auto jt = subcolumns.get<0>().find(name); + if (jt != subcolumns.get<0>().end()) + { + res.push_back(*jt); + continue; + } + } + + throw Exception(ErrorCodes::NO_SUCH_COLUMN_IN_TABLE, "There is no column {} in table", name); + } + + return res; +} + NamesAndTypesList ColumnsDescription::getAllPhysical() const { @@ -409,29 +448,46 @@ Names ColumnsDescription::getNamesOfPhysical() const return ret; } -NameAndTypePair ColumnsDescription::getPhysical(const String & column_name) const +std::optional ColumnsDescription::tryGetColumnOrSubcolumn(GetFlags flags, const String & column_name) const +{ + auto it = 
columns.get<1>().find(column_name); + if (it != columns.get<1>().end() && (defaultKindToGetFlag(it->default_desc.kind) & flags)) + return NameAndTypePair(it->name, it->type); + + auto jt = subcolumns.get<0>().find(column_name); + if (jt != subcolumns.get<0>().end()) + return *jt; + + return {}; +} + +NameAndTypePair ColumnsDescription::getColumnOrSubcolumn(GetFlags flags, const String & column_name) const +{ + auto column = tryGetColumnOrSubcolumn(flags, column_name); + if (!column) + throw Exception(ErrorCodes::NO_SUCH_COLUMN_IN_TABLE, + "There is no column or subcolumn {} in table.", column_name); + + return *column; +} + +std::optional ColumnsDescription::tryGetPhysical(const String & column_name) const { auto it = columns.get<1>().find(column_name); if (it == columns.get<1>().end() || it->default_desc.kind == ColumnDefaultKind::Alias) - throw Exception("There is no physical column " + column_name + " in table.", ErrorCodes::NO_SUCH_COLUMN_IN_TABLE); + return {}; + return NameAndTypePair(it->name, it->type); } -NameAndTypePair ColumnsDescription::getPhysicalOrSubcolumn(const String & column_name) const +NameAndTypePair ColumnsDescription::getPhysical(const String & column_name) const { - if (auto it = columns.get<1>().find(column_name); it != columns.get<1>().end() - && it->default_desc.kind != ColumnDefaultKind::Alias) - { - return NameAndTypePair(it->name, it->type); - } + auto column = tryGetPhysical(column_name); + if (!column) + throw Exception(ErrorCodes::NO_SUCH_COLUMN_IN_TABLE, + "There is no physical column {} in table.", column_name); - if (auto it = subcolumns.find(column_name); it != subcolumns.end()) - { - return it->second; - } - - throw Exception(ErrorCodes::NO_SUCH_COLUMN_IN_TABLE, - "There is no physical column or subcolumn {} in table.", column_name); + return *column; } bool ColumnsDescription::hasPhysical(const String & column_name) const @@ -440,32 +496,39 @@ bool ColumnsDescription::hasPhysical(const String & column_name) const return it != columns.get<1>().end() && it->default_desc.kind != ColumnDefaultKind::Alias; } -bool ColumnsDescription::hasPhysicalOrSubcolumn(const String & column_name) const +bool ColumnsDescription::hasColumnOrSubcolumn(GetFlags flags, const String & column_name) const { - return hasPhysical(column_name) || subcolumns.find(column_name) != subcolumns.end(); + auto it = columns.get<1>().find(column_name); + return (it != columns.get<1>().end() + && (defaultKindToGetFlag(it->default_desc.kind) & flags)) + || hasSubcolumn(column_name); } -static NamesAndTypesList getWithSubcolumns(NamesAndTypesList && source_list) +void ColumnsDescription::addSubcolumnsToList(NamesAndTypesList & source_list) const { - NamesAndTypesList ret; + NamesAndTypesList subcolumns_list; for (const auto & col : source_list) { - ret.emplace_back(col.name, col.type); - for (const auto & subcolumn : col.type->getSubcolumnNames()) - ret.emplace_back(col.name, subcolumn, col.type, col.type->getSubcolumnType(subcolumn)); + auto range = subcolumns.get<1>().equal_range(col.name); + if (range.first != range.second) + subcolumns_list.insert(subcolumns_list.end(), range.first, range.second); } - return ret; + source_list.splice(source_list.end(), std::move(subcolumns_list)); } NamesAndTypesList ColumnsDescription::getAllWithSubcolumns() const { - return getWithSubcolumns(getAll()); + auto columns_list = getAll(); + addSubcolumnsToList(columns_list); + return columns_list; } NamesAndTypesList ColumnsDescription::getAllPhysicalWithSubcolumns() const { - return 
getWithSubcolumns(getAllPhysical()); + auto columns_list = getAllPhysical(); + addSubcolumnsToList(columns_list); + return columns_list; } bool ColumnsDescription::hasDefaults() const @@ -591,14 +654,15 @@ void ColumnsDescription::addSubcolumns(const String & name_in_storage, const Dat throw Exception(ErrorCodes::ILLEGAL_COLUMN, "Cannot add subcolumn {}: column with this name already exists", subcolumn.name); - subcolumns[subcolumn.name] = subcolumn; + subcolumns.get<0>().insert(std::move(subcolumn)); } } -void ColumnsDescription::removeSubcolumns(const String & name_in_storage, const DataTypePtr & type_in_storage) +void ColumnsDescription::removeSubcolumns(const String & name_in_storage) { - for (const auto & subcolumn_name : type_in_storage->getSubcolumnNames()) - subcolumns.erase(name_in_storage + "." + subcolumn_name); + auto range = subcolumns.get<1>().equal_range(name_in_storage); + if (range.first != range.second) + subcolumns.get<1>().erase(range.first, range.second); } Block validateColumnsDefaultsAndGetSampleBlock(ASTPtr default_expr_list, const NamesAndTypesList & all_columns, ContextPtr context) diff --git a/src/Storages/ColumnsDescription.h b/src/Storages/ColumnsDescription.h index f1887d772ca..44f895c89ce 100644 --- a/src/Storages/ColumnsDescription.h +++ b/src/Storages/ColumnsDescription.h @@ -11,6 +11,8 @@ #include #include +#include +#include #include #include #include @@ -77,6 +79,18 @@ public: auto begin() const { return columns.begin(); } auto end() const { return columns.end(); } + enum GetFlags : UInt8 + { + Ordinary = 1, + Materialized = 2, + Aliases = 4, + + AllPhysical = Ordinary | Materialized, + All = AllPhysical | Aliases, + }; + + NamesAndTypesList getByNames(GetFlags flags, const Names & names, bool with_subcolumns) const; + NamesAndTypesList getOrdinary() const; NamesAndTypesList getMaterialized() const; NamesAndTypesList getAliases() const; @@ -91,7 +105,6 @@ public: bool has(const String & column_name) const; bool hasNested(const String & column_name) const; bool hasSubcolumn(const String & column_name) const; - bool hasInStorageOrSubcolumn(const String & column_name) const; const ColumnDescription & get(const String & column_name) const; template @@ -113,10 +126,15 @@ public: } Names getNamesOfPhysical() const; + bool hasPhysical(const String & column_name) const; - bool hasPhysicalOrSubcolumn(const String & column_name) const; + bool hasColumnOrSubcolumn(GetFlags flags, const String & column_name) const; + NameAndTypePair getPhysical(const String & column_name) const; - NameAndTypePair getPhysicalOrSubcolumn(const String & column_name) const; + NameAndTypePair getColumnOrSubcolumn(GetFlags flags, const String & column_name) const; + + std::optional tryGetPhysical(const String & column_name) const; + std::optional tryGetColumnOrSubcolumn(GetFlags flags, const String & column_name) const; ColumnDefaults getDefaults() const; /// TODO: remove bool hasDefault(const String & column_name) const; @@ -143,21 +161,27 @@ public: } /// Keep the sequence of columns and allow to lookup by name. 
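    /// Subcolumns live in their own container, hashed both by the full subcolumn name
    /// (unique, used by hasSubcolumn/tryGetColumnOrSubcolumn) and by the name of the column
    /// in storage (non-unique, used by addSubcolumnsToList/removeSubcolumns).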
- using Container = boost::multi_index_container< + using ColumnsContainer = boost::multi_index_container< ColumnDescription, boost::multi_index::indexed_by< boost::multi_index::sequenced<>, boost::multi_index::ordered_unique>>>; -private: - Container columns; + using SubcolumnsContainter = boost::multi_index_container< + NameAndTypePair, + boost::multi_index::indexed_by< + boost::multi_index::hashed_unique>, + boost::multi_index::hashed_non_unique>>>; - using SubcolumnsContainer = std::unordered_map; - SubcolumnsContainer subcolumns; +private: + ColumnsContainer columns; + SubcolumnsContainter subcolumns; void modifyColumnOrder(const String & column_name, const String & after_column, bool first); + void addSubcolumnsToList(NamesAndTypesList & source_list) const; + void addSubcolumns(const String & name_in_storage, const DataTypePtr & type_in_storage); - void removeSubcolumns(const String & name_in_storage, const DataTypePtr & type_in_storage); + void removeSubcolumns(const String & name_in_storage); }; /// Validate default expressions and corresponding types compatibility, i.e. diff --git a/src/Storages/Distributed/DirectoryMonitor.cpp b/src/Storages/Distributed/DirectoryMonitor.cpp index 17c0eec5c49..c674a705de1 100644 --- a/src/Storages/Distributed/DirectoryMonitor.cpp +++ b/src/Storages/Distributed/DirectoryMonitor.cpp @@ -2,6 +2,7 @@ #include #include #include +#include #include #include #include @@ -27,6 +28,7 @@ #include #include #include +#include #include @@ -330,6 +332,13 @@ namespace CheckingCompressedReadBuffer checking_in(in); remote.writePrepared(checking_in); } + + uint64_t doubleToUInt64(double d) + { + if (d >= std::numeric_limits::max()) + return std::numeric_limits::max(); + return static_cast(d); + } } @@ -345,15 +354,15 @@ StorageDistributedDirectoryMonitor::StorageDistributedDirectoryMonitor( , disk(disk_) , relative_path(relative_path_) , path(fs::path(disk->getPath()) / relative_path / "") - , should_batch_inserts(storage.getContext()->getSettingsRef().distributed_directory_monitor_batch_inserts) - , split_batch_on_failure(storage.getContext()->getSettingsRef().distributed_directory_monitor_split_batch_on_failure) + , should_batch_inserts(storage.getDistributedSettingsRef().monitor_batch_inserts) + , split_batch_on_failure(storage.getDistributedSettingsRef().monitor_split_batch_on_failure) , dir_fsync(storage.getDistributedSettingsRef().fsync_directories) , min_batched_block_size_rows(storage.getContext()->getSettingsRef().min_insert_block_size_rows) , min_batched_block_size_bytes(storage.getContext()->getSettingsRef().min_insert_block_size_bytes) , current_batch_file_path(path + "current_batch.txt") - , default_sleep_time(storage.getContext()->getSettingsRef().distributed_directory_monitor_sleep_time_ms.totalMilliseconds()) + , default_sleep_time(storage.getDistributedSettingsRef().monitor_sleep_time_ms.totalMilliseconds()) , sleep_time(default_sleep_time) - , max_sleep_time(storage.getContext()->getSettingsRef().distributed_directory_monitor_max_sleep_time_ms.totalMilliseconds()) + , max_sleep_time(storage.getDistributedSettingsRef().monitor_max_sleep_time_ms.totalMilliseconds()) , log(&Poco::Logger::get(getLoggerName())) , monitor_blocker(monitor_blocker_) , metric_pending_files(CurrentMetrics::DistributedFilesToInsert, 0) @@ -431,9 +440,14 @@ void StorageDistributedDirectoryMonitor::run() do_sleep = true; ++status.error_count; - sleep_time = std::min( - std::chrono::milliseconds{Int64(default_sleep_time.count() * std::exp2(status.error_count))}, - 
max_sleep_time); + + UInt64 q = doubleToUInt64(std::exp2(status.error_count)); + std::chrono::milliseconds new_sleep_time(default_sleep_time.count() * q); + if (new_sleep_time.count() < 0) + sleep_time = max_sleep_time; + else + sleep_time = std::min(new_sleep_time, max_sleep_time); + tryLogCurrentException(getLoggerName().data()); status.last_exception = std::current_exception(); } @@ -763,8 +777,8 @@ struct StorageDistributedDirectoryMonitor::Batch else { std::vector files(file_index_to_path.size()); - for (const auto & [index, name] : file_index_to_path) - files.push_back(name); + for (const auto && file_info : file_index_to_path | boost::adaptors::indexed()) + files[file_info.index()] = file_info.value().second; e.addMessage(fmt::format("While sending batch {}", fmt::join(files, "\n"))); throw; @@ -889,50 +903,78 @@ private: } }; -class DirectoryMonitorBlockInputStream : public IBlockInputStream +class DirectoryMonitorSource : public SourceWithProgress { public: - explicit DirectoryMonitorBlockInputStream(const String & file_name) - : in(file_name) - , decompressing_in(in) - , block_in(decompressing_in, DBMS_TCP_PROTOCOL_VERSION) - , log{&Poco::Logger::get("DirectoryMonitorBlockInputStream")} - { - readDistributedHeader(in, log); - block_in.readPrefix(); - first_block = block_in.read(); - header = first_block.cloneEmpty(); + struct Data + { + std::unique_ptr in; + std::unique_ptr decompressing_in; + std::unique_ptr block_in; + + Poco::Logger * log = nullptr; + + Block first_block; + + explicit Data(const String & file_name) + { + in = std::make_unique(file_name); + decompressing_in = std::make_unique(*in); + block_in = std::make_unique(*decompressing_in, DBMS_TCP_PROTOCOL_VERSION); + log = &Poco::Logger::get("DirectoryMonitorSource"); + + readDistributedHeader(*in, log); + + block_in->readPrefix(); + first_block = block_in->read(); + } + + Data(Data &&) = default; + }; + + explicit DirectoryMonitorSource(const String & file_name) + : DirectoryMonitorSource(Data(file_name)) + { } - String getName() const override { return "DirectoryMonitor"; } + explicit DirectoryMonitorSource(Data data_) + : SourceWithProgress(data_.first_block.cloneEmpty()) + , data(std::move(data_)) + { + } + + String getName() const override { return "DirectoryMonitorSource"; } protected: - Block getHeader() const override { return header; } - Block readImpl() override + Chunk generate() override { - if (first_block) - return std::move(first_block); + if (data.first_block) + { + size_t num_rows = data.first_block.rows(); + Chunk res(data.first_block.getColumns(), num_rows); + data.first_block.clear(); + return res; + } - return block_in.read(); + auto block = data.block_in->read(); + if (!block) + { + data.block_in->readSuffix(); + return {}; + } + + size_t num_rows = block.rows(); + return Chunk(block.getColumns(), num_rows); } - void readSuffix() override { block_in.readSuffix(); } - private: - ReadBufferFromFile in; - CompressedReadBuffer decompressing_in; - NativeBlockInputStream block_in; - - Block first_block; - Block header; - - Poco::Logger * log; + Data data; }; -BlockInputStreamPtr StorageDistributedDirectoryMonitor::createStreamFromFile(const String & file_name) +ProcessorPtr StorageDistributedDirectoryMonitor::createSourceFromFile(const String & file_name) { - return std::make_shared(file_name); + return std::make_shared(file_name); } bool StorageDistributedDirectoryMonitor::addAndSchedule(size_t file_size, size_t ms) diff --git a/src/Storages/Distributed/DirectoryMonitor.h 
b/src/Storages/Distributed/DirectoryMonitor.h index c04c49f3b9b..cd1d25179f3 100644 --- a/src/Storages/Distributed/DirectoryMonitor.h +++ b/src/Storages/Distributed/DirectoryMonitor.h @@ -21,6 +21,9 @@ class StorageDistributed; class ActionBlocker; class BackgroundSchedulePool; +class IProcessor; +using ProcessorPtr = std::shared_ptr; + /** Details of StorageDistributed. * This type is not designed for standalone use. */ @@ -45,7 +48,7 @@ public: void shutdownAndDropAllData(); - static BlockInputStreamPtr createStreamFromFile(const String & file_name); + static ProcessorPtr createSourceFromFile(const String & file_name); /// For scheduling via DistributedBlockOutputStream bool addAndSchedule(size_t file_size, size_t ms); diff --git a/src/Storages/Distributed/DistributedSettings.h b/src/Storages/Distributed/DistributedSettings.h index 7296fa11ffd..8cc942cab02 100644 --- a/src/Storages/Distributed/DistributedSettings.h +++ b/src/Storages/Distributed/DistributedSettings.h @@ -21,6 +21,11 @@ class ASTStorage; M(UInt64, bytes_to_throw_insert, 0, "If more than this number of compressed bytes will be pending for async INSERT, an exception will be thrown. 0 - do not throw.", 0) \ M(UInt64, bytes_to_delay_insert, 0, "If more than this number of compressed bytes will be pending for async INSERT, the query will be delayed. 0 - do not delay.", 0) \ M(UInt64, max_delay_to_insert, 60, "Max delay of inserting data into Distributed table in seconds, if there are a lot of pending bytes for async send.", 0) \ + /** Directory monitor settings */ \ + M(UInt64, monitor_batch_inserts, 0, "Default - distributed_directory_monitor_batch_inserts", 0) \ + M(UInt64, monitor_split_batch_on_failure, 0, "Default - distributed_directory_monitor_split_batch_on_failure", 0) \ + M(Milliseconds, monitor_sleep_time_ms, 0, "Default - distributed_directory_monitor_sleep_time_ms", 0) \ + M(Milliseconds, monitor_max_sleep_time_ms, 0, "Default - distributed_directory_monitor_max_sleep_time_ms", 0) \ DECLARE_SETTINGS_TRAITS(DistributedSettingsTraits, LIST_OF_DISTRIBUTED_SETTINGS) diff --git a/src/Storages/Distributed/DistributedBlockOutputStream.cpp b/src/Storages/Distributed/DistributedSink.cpp similarity index 93% rename from src/Storages/Distributed/DistributedBlockOutputStream.cpp rename to src/Storages/Distributed/DistributedSink.cpp index c0d7541eacc..ec3f82d914c 100644 --- a/src/Storages/Distributed/DistributedBlockOutputStream.cpp +++ b/src/Storages/Distributed/DistributedSink.cpp @@ -1,4 +1,4 @@ -#include +#include #include #include #include @@ -86,7 +86,7 @@ static void writeBlockConvert(const BlockOutputStreamPtr & out, const Block & bl } -DistributedBlockOutputStream::DistributedBlockOutputStream( +DistributedSink::DistributedSink( ContextPtr context_, StorageDistributed & storage_, const StorageMetadataPtr & metadata_snapshot_, @@ -95,7 +95,8 @@ DistributedBlockOutputStream::DistributedBlockOutputStream( bool insert_sync_, UInt64 insert_timeout_, StorageID main_table_) - : context(Context::createCopy(context_)) + : SinkToStorage(metadata_snapshot_->getSampleBlock()) + , context(Context::createCopy(context_)) , storage(storage_) , metadata_snapshot(metadata_snapshot_) , query_ast(query_ast_) @@ -115,24 +116,15 @@ DistributedBlockOutputStream::DistributedBlockOutputStream( } -Block DistributedBlockOutputStream::getHeader() const +void DistributedSink::consume(Chunk chunk) { - if (!allow_materialized) - return metadata_snapshot->getSampleBlockNonMaterialized(); - else - return metadata_snapshot->getSampleBlock(); -} + if 
(is_first_chunk) + { + storage.delayInsertOrThrowIfNeeded(); + is_first_chunk = false; + } - -void DistributedBlockOutputStream::writePrefix() -{ - storage.delayInsertOrThrowIfNeeded(); -} - - -void DistributedBlockOutputStream::write(const Block & block) -{ - Block ordinary_block{ block }; + auto ordinary_block = getPort().getHeader().cloneWithColumns(chunk.detachColumns()); if (!allow_materialized) { @@ -155,7 +147,7 @@ void DistributedBlockOutputStream::write(const Block & block) writeAsync(ordinary_block); } -void DistributedBlockOutputStream::writeAsync(const Block & block) +void DistributedSink::writeAsync(const Block & block) { if (random_shard_insert) { @@ -174,7 +166,7 @@ void DistributedBlockOutputStream::writeAsync(const Block & block) } -std::string DistributedBlockOutputStream::getCurrentStateDescription() +std::string DistributedSink::getCurrentStateDescription() { WriteBufferFromOwnString buffer; const auto & addresses = cluster->getShardsAddresses(); @@ -203,7 +195,7 @@ std::string DistributedBlockOutputStream::getCurrentStateDescription() } -void DistributedBlockOutputStream::initWritingJobs(const Block & first_block, size_t start, size_t end) +void DistributedSink::initWritingJobs(const Block & first_block, size_t start, size_t end) { const Settings & settings = context->getSettingsRef(); const auto & addresses_with_failovers = cluster->getShardsAddresses(); @@ -249,7 +241,7 @@ void DistributedBlockOutputStream::initWritingJobs(const Block & first_block, si } -void DistributedBlockOutputStream::waitForJobs() +void DistributedSink::waitForJobs() { pool->wait(); @@ -279,7 +271,7 @@ void DistributedBlockOutputStream::waitForJobs() ThreadPool::Job -DistributedBlockOutputStream::runWritingJob(DistributedBlockOutputStream::JobReplica & job, const Block & current_block, size_t num_shards) +DistributedSink::runWritingJob(JobReplica & job, const Block & current_block, size_t num_shards) { auto thread_group = CurrentThread::getGroup(); return [this, thread_group, &job, ¤t_block, num_shards]() @@ -403,7 +395,7 @@ DistributedBlockOutputStream::runWritingJob(DistributedBlockOutputStream::JobRep } -void DistributedBlockOutputStream::writeSync(const Block & block) +void DistributedSink::writeSync(const Block & block) { const Settings & settings = context->getSettingsRef(); const auto & shards_info = cluster->getShardsInfo(); @@ -487,7 +479,7 @@ void DistributedBlockOutputStream::writeSync(const Block & block) } -void DistributedBlockOutputStream::writeSuffix() +void DistributedSink::onFinish() { auto log_performance = [this]() { @@ -537,7 +529,7 @@ void DistributedBlockOutputStream::writeSuffix() } -IColumn::Selector DistributedBlockOutputStream::createSelector(const Block & source_block) const +IColumn::Selector DistributedSink::createSelector(const Block & source_block) const { Block current_block_with_sharding_key_expr = source_block; storage.getShardingKeyExpr()->execute(current_block_with_sharding_key_expr); @@ -548,7 +540,7 @@ IColumn::Selector DistributedBlockOutputStream::createSelector(const Block & sou } -Blocks DistributedBlockOutputStream::splitBlock(const Block & block) +Blocks DistributedSink::splitBlock(const Block & block) { auto selector = createSelector(block); @@ -572,7 +564,7 @@ Blocks DistributedBlockOutputStream::splitBlock(const Block & block) } -void DistributedBlockOutputStream::writeSplitAsync(const Block & block) +void DistributedSink::writeSplitAsync(const Block & block) { Blocks splitted_blocks = splitBlock(block); const size_t num_shards = 
splitted_blocks.size(); @@ -585,7 +577,7 @@ void DistributedBlockOutputStream::writeSplitAsync(const Block & block) } -void DistributedBlockOutputStream::writeAsyncImpl(const Block & block, size_t shard_id) +void DistributedSink::writeAsyncImpl(const Block & block, size_t shard_id) { const auto & shard_info = cluster->getShardsInfo()[shard_id]; const auto & settings = context->getSettingsRef(); @@ -621,7 +613,7 @@ void DistributedBlockOutputStream::writeAsyncImpl(const Block & block, size_t sh } -void DistributedBlockOutputStream::writeToLocal(const Block & block, size_t repeats) +void DistributedSink::writeToLocal(const Block & block, size_t repeats) { InterpreterInsertQuery interp(query_ast, context, allow_materialized); @@ -633,7 +625,7 @@ void DistributedBlockOutputStream::writeToLocal(const Block & block, size_t repe } -void DistributedBlockOutputStream::writeToShard(const Block & block, const std::vector & dir_names) +void DistributedSink::writeToShard(const Block & block, const std::vector & dir_names) { const auto & settings = context->getSettingsRef(); const auto & distributed_settings = storage.getDistributedSettingsRef(); diff --git a/src/Storages/Distributed/DistributedBlockOutputStream.h b/src/Storages/Distributed/DistributedSink.h similarity index 90% rename from src/Storages/Distributed/DistributedBlockOutputStream.h rename to src/Storages/Distributed/DistributedSink.h index 8e6e914cb29..af04f8c8aac 100644 --- a/src/Storages/Distributed/DistributedBlockOutputStream.h +++ b/src/Storages/Distributed/DistributedSink.h @@ -1,7 +1,7 @@ #pragma once #include -#include +#include #include #include #include @@ -34,10 +34,10 @@ class StorageDistributed; * and the resulting blocks are written in a compressed Native format in separate directories for sending. * For each destination address (each directory with data to send), a separate thread is created in StorageDistributed, * which monitors the directory and sends data. 
*/ -class DistributedBlockOutputStream : public IBlockOutputStream +class DistributedSink : public SinkToStorage { public: - DistributedBlockOutputStream( + DistributedSink( ContextPtr context_, StorageDistributed & storage_, const StorageMetadataPtr & metadata_snapshot_, @@ -47,11 +47,9 @@ public: UInt64 insert_timeout_, StorageID main_table_); - Block getHeader() const override; - void write(const Block & block) override; - void writePrefix() override; - - void writeSuffix() override; + String getName() const override { return "DistributedSink"; } + void consume(Chunk chunk) override; + void onFinish() override; private: IColumn::Selector createSelector(const Block & source_block) const; @@ -77,7 +75,7 @@ private: void initWritingJobs(const Block & first_block, size_t start, size_t end); struct JobReplica; - ThreadPool::Job runWritingJob(DistributedBlockOutputStream::JobReplica & job, const Block & current_block, size_t num_shards); + ThreadPool::Job runWritingJob(JobReplica & job, const Block & current_block, size_t num_shards); void waitForJobs(); @@ -97,6 +95,8 @@ private: bool random_shard_insert; bool allow_materialized; + bool is_first_chunk = true; + /// Sync-related stuff UInt64 insert_timeout; // in seconds StorageID main_table; diff --git a/src/Storages/HDFS/StorageHDFS.cpp b/src/Storages/HDFS/StorageHDFS.cpp index 578da239c20..9600eb975b4 100644 --- a/src/Storages/HDFS/StorageHDFS.cpp +++ b/src/Storages/HDFS/StorageHDFS.cpp @@ -15,9 +15,8 @@ #include #include #include -#include +#include #include -#include #include #include #include @@ -27,6 +26,7 @@ #include #include + namespace fs = std::filesystem; namespace DB @@ -172,36 +172,33 @@ private: Block sample_block; }; -class HDFSBlockOutputStream : public IBlockOutputStream +class HDFSSink : public SinkToStorage { public: - HDFSBlockOutputStream(const String & uri, + HDFSSink(const String & uri, const String & format, - const Block & sample_block_, + const Block & sample_block, ContextPtr context, const CompressionMethod compression_method) - : sample_block(sample_block_) + : SinkToStorage(sample_block) { write_buf = wrapWriteBufferWithCompressionMethod(std::make_unique(uri, context->getGlobalContext()->getConfigRef()), compression_method, 3); writer = FormatFactory::instance().getOutputStreamParallelIfPossible(format, *write_buf, sample_block, context); } - Block getHeader() const override + String getName() const override { return "HDFSSink"; } + + void consume(Chunk chunk) override { - return sample_block; + if (is_first_chunk) + { + writer->writePrefix(); + is_first_chunk = false; + } + writer->write(getPort().getHeader().cloneWithColumns(chunk.detachColumns())); } - void write(const Block & block) override - { - writer->write(block); - } - - void writePrefix() override - { - writer->writePrefix(); - } - - void writeSuffix() override + void onFinish() override { try { @@ -218,9 +215,9 @@ public: } private: - Block sample_block; std::unique_ptr write_buf; BlockOutputStreamPtr writer; + bool is_first_chunk = true; }; /* Recursive directory listing with matched paths as a result. 
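The same sink conversion recurs throughout this patch (DistributedSink, HDFSSink, KafkaSink): each class derives from SinkToStorage, passes its header to the base constructor, rebuilds a Block from every Chunk in consume(), and moves writeSuffix()-style cleanup into onFinish(). A minimal sketch of that shape, with a hypothetical ExampleSink and an assumed header path:

#include <Processors/Sinks/SinkToStorage.h>  /// assumed location of SinkToStorage

namespace DB
{

/// Hypothetical sink illustrating only the override surface used by the new sinks.
class ExampleSink : public SinkToStorage
{
public:
    explicit ExampleSink(const Block & header) : SinkToStorage(header) {}

    String getName() const override { return "ExampleSink"; }

    void consume(Chunk chunk) override
    {
        /// Rebuild a Block from the sink's header and the chunk's columns.
        Block block = getPort().getHeader().cloneWithColumns(chunk.detachColumns());
        /// ... write `block` to the underlying storage here ...
    }

    void onFinish() override
    {
        /// Flush or finalize the underlying writer (the old writeSuffix()).
    }
};

}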
@@ -314,9 +311,9 @@ Pipe StorageHDFS::read( return Pipe::unitePipes(std::move(pipes)); } -BlockOutputStreamPtr StorageHDFS::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr /*context*/) +SinkToStoragePtr StorageHDFS::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr /*context*/) { - return std::make_shared(uri, + return std::make_shared(uri, format_name, metadata_snapshot->getSampleBlock(), getContext(), diff --git a/src/Storages/HDFS/StorageHDFS.h b/src/Storages/HDFS/StorageHDFS.h index 4a6614be2e0..268f2205b97 100644 --- a/src/Storages/HDFS/StorageHDFS.h +++ b/src/Storages/HDFS/StorageHDFS.h @@ -32,7 +32,7 @@ public: size_t max_block_size, unsigned num_streams) override; - BlockOutputStreamPtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; + SinkToStoragePtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; void truncate(const ASTPtr & query, const StorageMetadataPtr & metadata_snapshot, ContextPtr context_, TableExclusiveLockHolder &) override; diff --git a/src/Storages/IStorage.h b/src/Storages/IStorage.h index 2d6109bd7af..2180f92df98 100644 --- a/src/Storages/IStorage.h +++ b/src/Storages/IStorage.h @@ -51,6 +51,9 @@ class Pipe; class QueryPlan; using QueryPlanPtr = std::unique_ptr; +class SinkToStorage; +using SinkToStoragePtr = std::shared_ptr; + class QueryPipeline; using QueryPipelinePtr = std::unique_ptr; @@ -272,6 +275,10 @@ public: throw Exception("Method watch is not supported by storage " + getName(), ErrorCodes::NOT_IMPLEMENTED); } + /// Returns true if FINAL modifier must be added to SELECT query depending on required columns. + /// It's needed for ReplacingMergeTree wrappers such as MaterializedMySQL and MaterializedPostrgeSQL + virtual bool needRewriteQueryWithFinal(const Names & /*column_names*/) const { return false; } + /** Read a set of columns from the table. * Accepts a list of columns to read, as well as a description of the query, * from which information can be extracted about how to retrieve data @@ -322,7 +329,7 @@ public: * changed during lifetime of the returned streams, but the snapshot is * guaranteed to be immutable. 
*/ - virtual BlockOutputStreamPtr write( + virtual SinkToStoragePtr write( const ASTPtr & /*query*/, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr /*context*/) diff --git a/src/Storages/Kafka/KafkaBlockOutputStream.cpp b/src/Storages/Kafka/KafkaBlockOutputStream.cpp index 21de27708b4..c7fe71f42c1 100644 --- a/src/Storages/Kafka/KafkaBlockOutputStream.cpp +++ b/src/Storages/Kafka/KafkaBlockOutputStream.cpp @@ -6,30 +6,26 @@ namespace DB { -KafkaBlockOutputStream::KafkaBlockOutputStream( +KafkaSink::KafkaSink( StorageKafka & storage_, const StorageMetadataPtr & metadata_snapshot_, const ContextPtr & context_) - : storage(storage_) + : SinkToStorage(metadata_snapshot_->getSampleBlockNonMaterialized()) + , storage(storage_) , metadata_snapshot(metadata_snapshot_) , context(context_) { } -Block KafkaBlockOutputStream::getHeader() const +void KafkaSink::onStart() { - return metadata_snapshot->getSampleBlockNonMaterialized(); -} - -void KafkaBlockOutputStream::writePrefix() -{ - buffer = storage.createWriteBuffer(getHeader()); + buffer = storage.createWriteBuffer(getPort().getHeader()); auto format_settings = getFormatSettings(context); format_settings.protobuf.allow_multiple_rows_without_delimiter = true; child = FormatFactory::instance().getOutputStream(storage.getFormatName(), *buffer, - getHeader(), context, + getPort().getHeader(), context, [this](const Columns & columns, size_t row) { buffer->countRow(columns, row); @@ -37,20 +33,17 @@ void KafkaBlockOutputStream::writePrefix() format_settings); } -void KafkaBlockOutputStream::write(const Block & block) +void KafkaSink::consume(Chunk chunk) { - child->write(block); + child->write(getPort().getHeader().cloneWithColumns(chunk.detachColumns())); } -void KafkaBlockOutputStream::writeSuffix() +void KafkaSink::onFinish() { if (child) child->writeSuffix(); - flush(); -} + //flush(); -void KafkaBlockOutputStream::flush() -{ if (buffer) buffer->flush(); } diff --git a/src/Storages/Kafka/KafkaBlockOutputStream.h b/src/Storages/Kafka/KafkaBlockOutputStream.h index 9f413ae527f..7d65ac99998 100644 --- a/src/Storages/Kafka/KafkaBlockOutputStream.h +++ b/src/Storages/Kafka/KafkaBlockOutputStream.h @@ -1,26 +1,25 @@ #pragma once -#include +#include #include namespace DB { -class KafkaBlockOutputStream : public IBlockOutputStream +class KafkaSink : public SinkToStorage { public: - explicit KafkaBlockOutputStream( + explicit KafkaSink( StorageKafka & storage_, const StorageMetadataPtr & metadata_snapshot_, const std::shared_ptr & context_); - Block getHeader() const override; + void consume(Chunk chunk) override; + void onStart() override; + void onFinish() override; + String getName() const override { return "KafkaSink"; } - void writePrefix() override; - void write(const Block & block) override; - void writeSuffix() override; - - void flush() override; + ///void flush() override; private: StorageKafka & storage; diff --git a/src/Storages/Kafka/StorageKafka.cpp b/src/Storages/Kafka/StorageKafka.cpp index 15dd5b553b0..cba67bc3bcb 100644 --- a/src/Storages/Kafka/StorageKafka.cpp +++ b/src/Storages/Kafka/StorageKafka.cpp @@ -1,7 +1,6 @@ #include #include -#include #include #include #include @@ -34,6 +33,7 @@ #include #include #include +#include #include #include #include @@ -289,14 +289,14 @@ Pipe StorageKafka::read( } -BlockOutputStreamPtr StorageKafka::write(const ASTPtr &, const StorageMetadataPtr & metadata_snapshot, ContextPtr local_context) +SinkToStoragePtr StorageKafka::write(const ASTPtr &, const StorageMetadataPtr & metadata_snapshot, 
ContextPtr local_context) { auto modified_context = Context::createCopy(local_context); modified_context->applySettingsChanges(settings_adjustments); if (topics.size() > 1) throw Exception("Can't write to Kafka table with multiple topics!", ErrorCodes::NOT_IMPLEMENTED); - return std::make_shared(*this, metadata_snapshot, modified_context); + return std::make_shared(*this, metadata_snapshot, modified_context); } @@ -747,10 +747,11 @@ void registerStorageKafka(StorageFactory & factory) #undef CHECK_KAFKA_STORAGE_ARGUMENT auto num_consumers = kafka_settings->kafka_num_consumers.value; + auto physical_cpu_cores = getNumberOfPhysicalCPUCores(); - if (num_consumers > 16) + if (num_consumers > physical_cpu_cores) { - throw Exception("Number of consumers can not be bigger than 16", ErrorCodes::BAD_ARGUMENTS); + throw Exception(ErrorCodes::BAD_ARGUMENTS, "Number of consumers can not be bigger than {}", physical_cpu_cores); } else if (num_consumers < 1) { diff --git a/src/Storages/Kafka/StorageKafka.h b/src/Storages/Kafka/StorageKafka.h index 805fa9d510c..2d3abca6059 100644 --- a/src/Storages/Kafka/StorageKafka.h +++ b/src/Storages/Kafka/StorageKafka.h @@ -50,7 +50,7 @@ public: size_t max_block_size, unsigned num_streams) override; - BlockOutputStreamPtr write( + SinkToStoragePtr write( const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; diff --git a/src/Storages/LiveView/StorageLiveView.cpp b/src/Storages/LiveView/StorageLiveView.cpp index f54abda6d7f..5f5ce8a4a37 100644 --- a/src/Storages/LiveView/StorageLiveView.cpp +++ b/src/Storages/LiveView/StorageLiveView.cpp @@ -16,7 +16,7 @@ limitations under the License. */ #include #include #include -#include +#include #include #include #include diff --git a/src/Storages/MergeTree/DataPartsExchange.cpp b/src/Storages/MergeTree/DataPartsExchange.cpp index e30da82416d..6ff9c16dad5 100644 --- a/src/Storages/MergeTree/DataPartsExchange.cpp +++ b/src/Storages/MergeTree/DataPartsExchange.cpp @@ -712,7 +712,7 @@ MergeTreeData::MutableDataPartPtr Fetcher::downloadPartToDisk( MergeTreeData::DataPart::Checksums & checksums, ThrottlerPtr throttler) { - static const String TMP_PREFIX = "tmp_fetch_"; + static const String TMP_PREFIX = "tmp-fetch_"; String tmp_prefix = tmp_prefix_.empty() ? TMP_PREFIX : tmp_prefix_; /// We will remove directory if it's already exists. Make precautions. @@ -784,7 +784,7 @@ MergeTreeData::MutableDataPartPtr Fetcher::downloadPartToDiskRemoteMeta( LOG_DEBUG(log, "Downloading Part {} unique id {} metadata onto disk {}.", part_name, part_id, disk->getName()); - static const String TMP_PREFIX = "tmp_fetch_"; + static const String TMP_PREFIX = "tmp-fetch_"; String tmp_prefix = tmp_prefix_.empty() ? TMP_PREFIX : tmp_prefix_; String part_relative_path = String(to_detached ? 
"detached/" : "") + tmp_prefix + part_name; diff --git a/src/Storages/MergeTree/IMergeTreeDataPart.cpp b/src/Storages/MergeTree/IMergeTreeDataPart.cpp index 70e72c85e79..db271cc280b 100644 --- a/src/Storages/MergeTree/IMergeTreeDataPart.cpp +++ b/src/Storages/MergeTree/IMergeTreeDataPart.cpp @@ -14,6 +14,7 @@ #include #include #include +#include #include #include #include @@ -78,6 +79,12 @@ void IMergeTreeDataPart::MinMaxIndex::load(const MergeTreeData & data, const Dis Field max_val; serialization->deserializeBinary(max_val, *file); + // NULL_LAST + if (min_val.isNull()) + min_val = PositiveInfinity(); + if (max_val.isNull()) + max_val = PositiveInfinity(); + hyperrectangle.emplace_back(min_val, true, max_val, true); } initialized = true; @@ -132,14 +139,19 @@ void IMergeTreeDataPart::MinMaxIndex::update(const Block & block, const Names & FieldRef min_value; FieldRef max_value; const ColumnWithTypeAndName & column = block.getByName(column_names[i]); - column.column->getExtremes(min_value, max_value); + if (const auto * column_nullable = typeid_cast(column.column.get())) + column_nullable->getExtremesNullLast(min_value, max_value); + else + column.column->getExtremes(min_value, max_value); if (!initialized) hyperrectangle.emplace_back(min_value, true, max_value, true); else { - hyperrectangle[i].left = std::min(hyperrectangle[i].left, min_value); - hyperrectangle[i].right = std::max(hyperrectangle[i].right, max_value); + hyperrectangle[i].left + = applyVisitor(FieldVisitorAccurateLess(), hyperrectangle[i].left, min_value) ? hyperrectangle[i].left : min_value; + hyperrectangle[i].right + = applyVisitor(FieldVisitorAccurateLess(), hyperrectangle[i].right, max_value) ? max_value : hyperrectangle[i].right; } } @@ -1310,6 +1322,9 @@ String IMergeTreeDataPart::getRelativePathForDetachedPart(const String & prefix) { /// Do not allow underscores in the prefix because they are used as separators. 
assert(prefix.find_first_of('_') == String::npos); + assert(prefix.empty() || std::find(DetachedPartInfo::DETACH_REASONS.begin(), + DetachedPartInfo::DETACH_REASONS.end(), + prefix) != DetachedPartInfo::DETACH_REASONS.end()); return "detached/" + getRelativePathForPrefix(prefix); } diff --git a/src/Storages/MergeTree/IMergeTreeDataPart.h b/src/Storages/MergeTree/IMergeTreeDataPart.h index 3c2c2d44271..8b7a15e5da0 100644 --- a/src/Storages/MergeTree/IMergeTreeDataPart.h +++ b/src/Storages/MergeTree/IMergeTreeDataPart.h @@ -1,7 +1,5 @@ #pragma once -#include - #include #include #include @@ -19,6 +17,7 @@ #include + namespace zkutil { class ZooKeeper; diff --git a/src/Storages/MergeTree/IMergeTreeReader.cpp b/src/Storages/MergeTree/IMergeTreeReader.cpp index 14187564536..4efd3d669eb 100644 --- a/src/Storages/MergeTree/IMergeTreeReader.cpp +++ b/src/Storages/MergeTree/IMergeTreeReader.cpp @@ -33,6 +33,7 @@ IMergeTreeReader::IMergeTreeReader( : data_part(data_part_) , avg_value_size_hints(avg_value_size_hints_) , columns(columns_) + , part_columns(data_part->getColumns()) , uncompressed_cache(uncompressed_cache_) , mark_cache(mark_cache_) , settings(settings_) @@ -41,15 +42,15 @@ IMergeTreeReader::IMergeTreeReader( , all_mark_ranges(all_mark_ranges_) , alter_conversions(storage.getAlterConversionsForPart(data_part)) { - auto part_columns = data_part->getColumns(); if (settings.convert_nested_to_subcolumns) { columns = Nested::convertToSubcolumns(columns); part_columns = Nested::collect(part_columns); } - for (const NameAndTypePair & column_from_part : part_columns) - columns_from_part[column_from_part.name] = column_from_part.type; + columns_from_part.set_empty_key(StringRef()); + for (const auto & column_from_part : part_columns) + columns_from_part.emplace(column_from_part.name, &column_from_part.type); } IMergeTreeReader::~IMergeTreeReader() = default; @@ -226,18 +227,19 @@ NameAndTypePair IMergeTreeReader::getColumnFromPart(const NameAndTypePair & requ if (it == columns_from_part.end()) return required_column; + const auto & type = *it->second; if (required_column.isSubcolumn()) { auto subcolumn_name = required_column.getSubcolumnName(); - auto subcolumn_type = it->second->tryGetSubcolumnType(subcolumn_name); + auto subcolumn_type = type->tryGetSubcolumnType(subcolumn_name); if (!subcolumn_type) return required_column; - return {it->first, subcolumn_name, it->second, subcolumn_type}; + return {String(it->first), subcolumn_name, type, subcolumn_type}; } - return {it->first, it->second}; + return {String(it->first), type}; } void IMergeTreeReader::performRequiredConversions(Columns & res_columns) diff --git a/src/Storages/MergeTree/IMergeTreeReader.h b/src/Storages/MergeTree/IMergeTreeReader.h index 0771bc3d5cb..ab412e48822 100644 --- a/src/Storages/MergeTree/IMergeTreeReader.h +++ b/src/Storages/MergeTree/IMergeTreeReader.h @@ -3,6 +3,7 @@ #include #include #include +#include namespace DB { @@ -72,6 +73,7 @@ protected: /// Columns that are read. 
NamesAndTypesList columns; + NamesAndTypesList part_columns; UncompressedCache * uncompressed_cache; MarkCache * mark_cache; @@ -92,7 +94,12 @@ private: MergeTreeData::AlterConversions alter_conversions; /// Actual data type of columns in part - std::unordered_map columns_from_part; + +#if !defined(ARCADIA_BUILD) + google::dense_hash_map columns_from_part; +#else + google::sparsehash::dense_hash_map columns_from_part; +#endif }; } diff --git a/src/Storages/MergeTree/KeyCondition.cpp b/src/Storages/MergeTree/KeyCondition.cpp index 476032e66aa..235cadfba11 100644 --- a/src/Storages/MergeTree/KeyCondition.cpp +++ b/src/Storages/MergeTree/KeyCondition.cpp @@ -43,15 +43,8 @@ String Range::toString() const { WriteBufferFromOwnString str; - if (!left_bounded) - str << "(-inf, "; - else - str << (left_included ? '[' : '(') << applyVisitor(FieldVisitorToString(), left) << ", "; - - if (!right_bounded) - str << "+inf)"; - else - str << applyVisitor(FieldVisitorToString(), right) << (right_included ? ']' : ')'); + str << (left_included ? '[' : '(') << applyVisitor(FieldVisitorToString(), left) << ", "; + str << applyVisitor(FieldVisitorToString(), right) << (right_included ? ']' : ')'); return str.str(); } @@ -205,6 +198,38 @@ const KeyCondition::AtomMap KeyCondition::atom_map return true; } }, + { + "nullIn", + [] (RPNElement & out, const Field &) + { + out.function = RPNElement::FUNCTION_IN_SET; + return true; + } + }, + { + "notNullIn", + [] (RPNElement & out, const Field &) + { + out.function = RPNElement::FUNCTION_NOT_IN_SET; + return true; + } + }, + { + "globalNullIn", + [] (RPNElement & out, const Field &) + { + out.function = RPNElement::FUNCTION_IN_SET; + return true; + } + }, + { + "globalNotNullIn", + [] (RPNElement & out, const Field &) + { + out.function = RPNElement::FUNCTION_NOT_IN_SET; + return true; + } + }, { "empty", [] (RPNElement & out, const Field & value) @@ -291,6 +316,26 @@ const KeyCondition::AtomMap KeyCondition::atom_map return true; } + }, + { + "isNotNull", + [] (RPNElement & out, const Field &) + { + out.function = RPNElement::FUNCTION_IS_NOT_NULL; + // isNotNull means (-Inf, +Inf), which is the default Range + out.range = Range(); + return true; + } + }, + { + "isNull", + [] (RPNElement & out, const Field &) + { + out.function = RPNElement::FUNCTION_IS_NULL; + // When using NULL_LAST, isNull means [+Inf, +Inf] + out.range = Range(Field(PositiveInfinity{})); + return true; + } } }; @@ -304,6 +349,14 @@ static const std::map inverse_relations = { {"lessOrEquals", "greater"}, {"in", "notIn"}, {"notIn", "in"}, + {"globalIn", "globalNotIn"}, + {"globalNotIn", "globalIn"}, + {"nullIn", "notNullIn"}, + {"notNullIn", "nullIn"}, + {"globalNullIn", "globalNotNullIn"}, + {"globalNullNotIn", "globalNullIn"}, + {"isNull", "isNotNull"}, + {"isNotNull", "isNull"}, {"like", "notLike"}, {"notLike", "like"}, {"empty", "notEmpty"}, @@ -478,6 +531,11 @@ bool KeyCondition::getConstant(const ASTPtr & expr, Block & block_with_constants /// Simple literal out_value = lit->value; out_type = block_with_constants.getByName(column_name).type; + + /// If constant is not Null, we can assume it's type is not Nullable as well. 
+ if (!out_value.isNull()) + out_type = removeNullable(out_type); + return true; } else if (block_with_constants.has(column_name) && isColumnConst(*block_with_constants.getByName(column_name).column)) @@ -486,6 +544,10 @@ bool KeyCondition::getConstant(const ASTPtr & expr, Block & block_with_constants const auto & expr_info = block_with_constants.getByName(column_name); out_value = (*expr_info.column)[0]; out_type = expr_info.type; + + if (!out_value.isNull()) + out_type = removeNullable(out_type); + return true; } else @@ -620,7 +682,6 @@ bool KeyCondition::canConstantBeWrappedByMonotonicFunctions( if (key_subexpr_names.count(expr_name) == 0) return false; - /// TODO Nullable index is not yet landed. if (out_value.isNull()) return false; @@ -745,7 +806,6 @@ bool KeyCondition::canConstantBeWrappedByFunctions( const auto & sample_block = key_expr->getSampleBlock(); - /// TODO Nullable index is not yet landed. if (out_value.isNull()) return false; @@ -1147,7 +1207,7 @@ static void castValueToType(const DataTypePtr & desired_type, Field & src_value, bool KeyCondition::tryParseAtomFromAST(const ASTPtr & node, ContextPtr context, Block & block_with_constants, RPNElement & out) { - /** Functions < > = != <= >= in `notIn`, where one argument is a constant, and the other is one of columns of key, + /** Functions < > = != <= >= in `notIn` isNull isNotNull, where one argument is a constant, and the other is one of columns of key, * or itself, wrapped in a chain of possibly-monotonic functions, * or constant expression - number. */ @@ -1192,8 +1252,8 @@ bool KeyCondition::tryParseAtomFromAST(const ASTPtr & node, ContextPtr context, /// If we use this key condition to prune partitions by single value, we cannot relax conditions for NOT. if (single_point - && (func_name == "notLike" || func_name == "notIn" || func_name == "globalNotIn" || func_name == "notEquals" - || func_name == "notEmpty")) + && (func_name == "notLike" || func_name == "notIn" || func_name == "globalNotIn" || func_name == "notNullIn" + || func_name == "globalNotNullIn" || func_name == "notEquals" || func_name == "notEmpty")) strict_condition = true; if (functionIsInOrGlobalInOperator(func_name)) @@ -1504,6 +1564,8 @@ KeyCondition::Description KeyCondition::getDescription() const else if ( element.function == RPNElement::FUNCTION_IN_RANGE || element.function == RPNElement::FUNCTION_NOT_IN_RANGE + || element.function == RPNElement::FUNCTION_IS_NULL + || element.function == RPNElement::FUNCTION_IS_NOT_NULL || element.function == RPNElement::FUNCTION_IN_SET || element.function == RPNElement::FUNCTION_NOT_IN_SET) { @@ -1668,11 +1730,13 @@ KeyCondition::Description KeyCondition::getDescription() const * over at least one hyperrectangle from which this range consists. */ +FieldRef negativeInfinity(NegativeInfinity{}), positiveInfinity(PositiveInfinity{}); + template static BoolMask forAnyHyperrectangle( size_t key_size, - const FieldRef * key_left, - const FieldRef * key_right, + const FieldRef * left_keys, + const FieldRef * right_keys, bool left_bounded, bool right_bounded, std::vector & hyperrectangle, @@ -1688,10 +1752,10 @@ static BoolMask forAnyHyperrectangle( /// Let's go through the matching elements of the key. while (prefix_size < key_size) { - if (key_left[prefix_size] == key_right[prefix_size]) + if (left_keys[prefix_size] == right_keys[prefix_size]) { /// Point ranges. 
- hyperrectangle[prefix_size] = Range(key_left[prefix_size]); + hyperrectangle[prefix_size] = Range(left_keys[prefix_size]); ++prefix_size; } else @@ -1705,11 +1769,11 @@ static BoolMask forAnyHyperrectangle( if (prefix_size + 1 == key_size) { if (left_bounded && right_bounded) - hyperrectangle[prefix_size] = Range(key_left[prefix_size], true, key_right[prefix_size], true); + hyperrectangle[prefix_size] = Range(left_keys[prefix_size], true, right_keys[prefix_size], true); else if (left_bounded) - hyperrectangle[prefix_size] = Range::createLeftBounded(key_left[prefix_size], true); + hyperrectangle[prefix_size] = Range::createLeftBounded(left_keys[prefix_size], true); else if (right_bounded) - hyperrectangle[prefix_size] = Range::createRightBounded(key_right[prefix_size], true); + hyperrectangle[prefix_size] = Range::createRightBounded(right_keys[prefix_size], true); return callback(hyperrectangle); } @@ -1717,11 +1781,11 @@ static BoolMask forAnyHyperrectangle( /// (x1 .. x2) x (-inf .. +inf) if (left_bounded && right_bounded) - hyperrectangle[prefix_size] = Range(key_left[prefix_size], false, key_right[prefix_size], false); + hyperrectangle[prefix_size] = Range(left_keys[prefix_size], false, right_keys[prefix_size], false); else if (left_bounded) - hyperrectangle[prefix_size] = Range::createLeftBounded(key_left[prefix_size], false); + hyperrectangle[prefix_size] = Range::createLeftBounded(left_keys[prefix_size], false); else if (right_bounded) - hyperrectangle[prefix_size] = Range::createRightBounded(key_right[prefix_size], false); + hyperrectangle[prefix_size] = Range::createRightBounded(right_keys[prefix_size], false); for (size_t i = prefix_size + 1; i < key_size; ++i) hyperrectangle[i] = Range(); @@ -1741,8 +1805,8 @@ static BoolMask forAnyHyperrectangle( if (left_bounded) { - hyperrectangle[prefix_size] = Range(key_left[prefix_size]); - result = result | forAnyHyperrectangle(key_size, key_left, key_right, true, false, hyperrectangle, prefix_size + 1, initial_mask, callback); + hyperrectangle[prefix_size] = Range(left_keys[prefix_size]); + result = result | forAnyHyperrectangle(key_size, left_keys, right_keys, true, false, hyperrectangle, prefix_size + 1, initial_mask, callback); if (result.isComplete()) return result; } @@ -1751,8 +1815,8 @@ static BoolMask forAnyHyperrectangle( if (right_bounded) { - hyperrectangle[prefix_size] = Range(key_right[prefix_size]); - result = result | forAnyHyperrectangle(key_size, key_left, key_right, false, true, hyperrectangle, prefix_size + 1, initial_mask, callback); + hyperrectangle[prefix_size] = Range(right_keys[prefix_size]); + result = result | forAnyHyperrectangle(key_size, left_keys, right_keys, false, true, hyperrectangle, prefix_size + 1, initial_mask, callback); if (result.isComplete()) return result; } @@ -1763,37 +1827,31 @@ static BoolMask forAnyHyperrectangle( BoolMask KeyCondition::checkInRange( size_t used_key_size, - const FieldRef * left_key, - const FieldRef * right_key, + const FieldRef * left_keys, + const FieldRef * right_keys, const DataTypes & data_types, - bool right_bounded, BoolMask initial_mask) const { std::vector key_ranges(used_key_size, Range()); -/* std::cerr << "Checking for: ["; - for (size_t i = 0; i != used_key_size; ++i) - std::cerr << (i != 0 ? ", " : "") << applyVisitor(FieldVisitorToString(), left_key[i]); - std::cerr << " ... "; + // std::cerr << "Checking for: ["; + // for (size_t i = 0; i != used_key_size; ++i) + // std::cerr << (i != 0 ? 
", " : "") << applyVisitor(FieldVisitorToString(), left_keys[i]); + // std::cerr << " ... "; - if (right_bounded) - { - for (size_t i = 0; i != used_key_size; ++i) - std::cerr << (i != 0 ? ", " : "") << applyVisitor(FieldVisitorToString(), right_key[i]); - std::cerr << "]\n"; - } - else - std::cerr << "+inf)\n";*/ + // for (size_t i = 0; i != used_key_size; ++i) + // std::cerr << (i != 0 ? ", " : "") << applyVisitor(FieldVisitorToString(), right_keys[i]); + // std::cerr << "]\n"; - return forAnyHyperrectangle(used_key_size, left_key, right_key, true, right_bounded, key_ranges, 0, initial_mask, + return forAnyHyperrectangle(used_key_size, left_keys, right_keys, true, true, key_ranges, 0, initial_mask, [&] (const std::vector & key_ranges_hyperrectangle) { auto res = checkInHyperrectangle(key_ranges_hyperrectangle, data_types); -/* std::cerr << "Hyperrectangle: "; - for (size_t i = 0, size = key_ranges.size(); i != size; ++i) - std::cerr << (i != 0 ? " x " : "") << key_ranges[i].toString(); - std::cerr << ": " << res.can_be_true << "\n";*/ + // std::cerr << "Hyperrectangle: "; + // for (size_t i = 0, size = key_ranges.size(); i != size; ++i) + // std::cerr << (i != 0 ? " x " : "") << key_ranges[i].toString(); + // std::cerr << ": " << res.can_be_true << "\n"; return res; }); @@ -1821,6 +1879,8 @@ std::optional KeyCondition::applyMonotonicFunctionsChainToRange( /// If we apply function to open interval, we can get empty intervals in result. /// E.g. for ('2020-01-03', '2020-01-20') after applying 'toYYYYMM' we will get ('202001', '202001'). /// To avoid this we make range left and right included. + /// Any function that treats NULL specially is not monotonic. + /// Thus we can safely use isNull() as an -Inf/+Inf indicator here. if (!key_range.left.isNull()) { key_range.left = applyFunction(func, current_type, key_range.left); @@ -1836,7 +1896,7 @@ std::optional KeyCondition::applyMonotonicFunctionsChainToRange( current_type = func->getResultType(); if (!monotonicity.is_positive) - key_range.swapLeftAndRight(); + key_range.invert(); } return key_range; } @@ -1961,6 +2021,17 @@ BoolMask KeyCondition::checkInHyperrectangle( if (element.function == RPNElement::FUNCTION_NOT_IN_RANGE) rpn_stack.back() = !rpn_stack.back(); } + else if ( + element.function == RPNElement::FUNCTION_IS_NULL + || element.function == RPNElement::FUNCTION_IS_NOT_NULL) + { + const Range * key_range = &hyperrectangle[element.key_column]; + + /// No need to apply monotonic functions as nulls are kept. 
+ bool intersects = element.range.intersectsRange(*key_range); + bool contains = element.range.containsRange(*key_range); + rpn_stack.emplace_back(intersects, !contains); + } else if ( element.function == RPNElement::FUNCTION_IN_SET || element.function == RPNElement::FUNCTION_NOT_IN_SET) @@ -2015,43 +2086,13 @@ BoolMask KeyCondition::checkInHyperrectangle( } -BoolMask KeyCondition::checkInRange( - size_t used_key_size, - const FieldRef * left_key, - const FieldRef * right_key, - const DataTypes & data_types, - BoolMask initial_mask) const -{ - return checkInRange(used_key_size, left_key, right_key, data_types, true, initial_mask); -} - - bool KeyCondition::mayBeTrueInRange( size_t used_key_size, - const FieldRef * left_key, - const FieldRef * right_key, + const FieldRef * left_keys, + const FieldRef * right_keys, const DataTypes & data_types) const { - return checkInRange(used_key_size, left_key, right_key, data_types, true, BoolMask::consider_only_can_be_true).can_be_true; -} - - -BoolMask KeyCondition::checkAfter( - size_t used_key_size, - const FieldRef * left_key, - const DataTypes & data_types, - BoolMask initial_mask) const -{ - return checkInRange(used_key_size, left_key, nullptr, data_types, false, initial_mask); -} - - -bool KeyCondition::mayBeTrueAfter( - size_t used_key_size, - const FieldRef * left_key, - const DataTypes & data_types) const -{ - return checkInRange(used_key_size, left_key, nullptr, data_types, false, BoolMask::consider_only_can_be_true).can_be_true; + return checkInRange(used_key_size, left_keys, right_keys, data_types, BoolMask::consider_only_can_be_true).can_be_true; } String KeyCondition::RPNElement::toString() const { return toString("column " + std::to_string(key_column), false); } @@ -2121,6 +2162,15 @@ String KeyCondition::RPNElement::toString(const std::string_view & column_name, buf << ")"; return buf.str(); } + case FUNCTION_IS_NULL: + case FUNCTION_IS_NOT_NULL: + { + buf << "("; + print_wrapped_column(buf); + buf << (function == FUNCTION_IS_NULL ? 
" isNull" : " isNotNull"); + buf << ")"; + return buf.str(); + } case ALWAYS_FALSE: return "false"; case ALWAYS_TRUE: @@ -2162,6 +2212,8 @@ bool KeyCondition::unknownOrAlwaysTrue(bool unknown_any) const || element.function == RPNElement::FUNCTION_IN_RANGE || element.function == RPNElement::FUNCTION_IN_SET || element.function == RPNElement::FUNCTION_NOT_IN_SET + || element.function == RPNElement::FUNCTION_IS_NULL + || element.function == RPNElement::FUNCTION_IS_NOT_NULL || element.function == RPNElement::ALWAYS_FALSE) { rpn_stack.push_back(false); @@ -2205,6 +2257,8 @@ size_t KeyCondition::getMaxKeyColumn() const { if (element.function == RPNElement::FUNCTION_NOT_IN_RANGE || element.function == RPNElement::FUNCTION_IN_RANGE + || element.function == RPNElement::FUNCTION_IS_NULL + || element.function == RPNElement::FUNCTION_IS_NOT_NULL || element.function == RPNElement::FUNCTION_IN_SET || element.function == RPNElement::FUNCTION_NOT_IN_SET) { diff --git a/src/Storages/MergeTree/KeyCondition.h b/src/Storages/MergeTree/KeyCondition.h index c957c65fc40..edae921bfda 100644 --- a/src/Storages/MergeTree/KeyCondition.h +++ b/src/Storages/MergeTree/KeyCondition.h @@ -55,25 +55,24 @@ private: static bool less(const Field & lhs, const Field & rhs); public: - FieldRef left; /// the left border, if any - FieldRef right; /// the right border, if any - bool left_bounded = false; /// bounded at the left - bool right_bounded = false; /// bounded at the right - bool left_included = false; /// includes the left border, if any - bool right_included = false; /// includes the right border, if any + FieldRef left = NegativeInfinity{}; /// the left border + FieldRef right = PositiveInfinity{}; /// the right border + bool left_included = false; /// includes the left border + bool right_included = false; /// includes the right border - /// The whole unversum. + /// The whole universe (not null). Range() {} /// One point. Range(const FieldRef & point) - : left(point), right(point), left_bounded(true), right_bounded(true), left_included(true), right_included(true) {} + : left(point), right(point), left_included(true), right_included(true) {} /// A bounded two-sided range. 
Range(const FieldRef & left_, bool left_included_, const FieldRef & right_, bool right_included_) - : left(left_), right(right_), - left_bounded(true), right_bounded(true), - left_included(left_included_), right_included(right_included_) + : left(left_) + , right(right_) + , left_included(left_included_) + , right_included(right_included_) { shrinkToIncludedIfPossible(); } @@ -82,9 +81,11 @@ public: { Range r; r.right = right_point; - r.right_bounded = true; r.right_included = right_included; r.shrinkToIncludedIfPossible(); + // Special case for [-Inf, -Inf] + if (r.right.isNegativeInfinity() && right_included) + r.left_included = true; return r; } @@ -92,9 +93,11 @@ public: { Range r; r.left = left_point; - r.left_bounded = true; r.left_included = left_included; r.shrinkToIncludedIfPossible(); + // Special case for [+Inf, +Inf] + if (r.left.isPositiveInfinity() && left_included) + r.right_included = true; return r; } @@ -104,7 +107,7 @@ public: */ void shrinkToIncludedIfPossible() { - if (left.isExplicit() && left_bounded && !left_included) + if (left.isExplicit() && !left_included) { if (left.getType() == Field::Types::UInt64 && left.get() != std::numeric_limits::max()) { @@ -117,7 +120,7 @@ public: left_included = true; } } - if (right.isExplicit() && right_bounded && !right_included) + if (right.isExplicit() && !right_included) { if (right.getType() == Field::Types::UInt64 && right.get() != std::numeric_limits::min()) { @@ -132,12 +135,7 @@ public: } } - bool empty() const - { - return left_bounded && right_bounded - && (less(right, left) - || ((!left_included || !right_included) && !less(left, right))); - } + bool empty() const { return less(right, left) || ((!left_included || !right_included) && !less(left, right)); } /// x contained in the range bool contains(const FieldRef & x) const @@ -148,35 +146,23 @@ public: /// x is to the left bool rightThan(const FieldRef & x) const { - return (left_bounded - ? !(less(left, x) || (left_included && equals(x, left))) - : false); + return less(left, x) || (left_included && equals(x, left)); } /// x is to the right bool leftThan(const FieldRef & x) const { - return (right_bounded - ? !(less(x, right) || (right_included && equals(x, right))) - : false); + return less(x, right) || (right_included && equals(x, right)); } bool intersectsRange(const Range & r) const { /// r to the left of me. - if (r.right_bounded - && left_bounded - && (less(r.right, left) - || ((!left_included || !r.right_included) - && equals(r.right, left)))) + if (less(r.right, left) || ((!left_included || !r.right_included) && equals(r.right, left))) return false; /// r to the right of me. - if (r.left_bounded - && right_bounded - && (less(right, r.left) /// ...} {... - || ((!right_included || !r.left_included) /// ...) [... or ...] (... - && equals(r.left, right)))) + if (less(right, r.left) || ((!right_included || !r.left_included) && equals(r.left, right))) return false; return true; @@ -185,30 +171,23 @@ public: bool containsRange(const Range & r) const { /// r starts to the left of me. - if (left_bounded - && (!r.left_bounded - || less(r.left, left) - || (r.left_included - && !left_included - && equals(r.left, left)))) + if (less(r.left, left) || (r.left_included && !left_included && equals(r.left, left))) return false; /// r ends right of me. 
- if (right_bounded - && (!r.right_bounded - || less(right, r.right) - || (r.right_included - && !right_included - && equals(r.right, right)))) + if (less(right, r.right) || (r.right_included && !right_included && equals(r.right, right))) return false; return true; } - void swapLeftAndRight() + void invert() { std::swap(left, right); - std::swap(left_bounded, right_bounded); + if (left.isPositiveInfinity()) + left = NegativeInfinity{}; + if (right.isNegativeInfinity()) + right = PositiveInfinity{}; std::swap(left_included, right_included); } @@ -247,16 +226,8 @@ public: /// one of the resulting mask components (see BoolMask::consider_only_can_be_XXX). BoolMask checkInRange( size_t used_key_size, - const FieldRef * left_key, - const FieldRef* right_key, - const DataTypes & data_types, - BoolMask initial_mask = BoolMask(false, false)) const; - - /// Are the condition and its negation valid in a semi-infinite (not limited to the right) key range. - /// left_key must contain all the fields in the sort_descr in the appropriate order. - BoolMask checkAfter( - size_t used_key_size, - const FieldRef * left_key, + const FieldRef * left_keys, + const FieldRef * right_keys, const DataTypes & data_types, BoolMask initial_mask = BoolMask(false, false)) const; @@ -264,15 +235,8 @@ public: /// This is more efficient than checkInRange(...).can_be_true. bool mayBeTrueInRange( size_t used_key_size, - const FieldRef * left_key, - const FieldRef * right_key, - const DataTypes & data_types) const; - - /// Same as checkAfter, but calculate only may_be_true component of a result. - /// This is more efficient than checkAfter(...).can_be_true. - bool mayBeTrueAfter( - size_t used_key_size, - const FieldRef * left_key, + const FieldRef * left_keys, + const FieldRef * right_keys, const DataTypes & data_types) const; /// Checks that the index can not be used @@ -338,6 +302,8 @@ private: FUNCTION_NOT_IN_RANGE, FUNCTION_IN_SET, FUNCTION_NOT_IN_SET, + FUNCTION_IS_NULL, + FUNCTION_IS_NOT_NULL, FUNCTION_UNKNOWN, /// Can take any value. /// Operators of the logical expression. 
FUNCTION_NOT, diff --git a/src/Storages/MergeTree/MergeTreeBaseSelectProcessor.cpp b/src/Storages/MergeTree/MergeTreeBaseSelectProcessor.cpp index 549a8a4f772..c91d60c5de7 100644 --- a/src/Storages/MergeTree/MergeTreeBaseSelectProcessor.cpp +++ b/src/Storages/MergeTree/MergeTreeBaseSelectProcessor.cpp @@ -465,6 +465,19 @@ Block MergeTreeBaseSelectProcessor::transformHeader( return block; } +std::unique_ptr MergeTreeBaseSelectProcessor::getSizePredictor( + const MergeTreeData::DataPartPtr & data_part, + const MergeTreeReadTaskColumns & task_columns, + const Block & sample_block) +{ + const auto & required_column_names = task_columns.columns.getNames(); + const auto & required_pre_column_names = task_columns.pre_columns.getNames(); + NameSet complete_column_names(required_column_names.begin(), required_column_names.end()); + complete_column_names.insert(required_pre_column_names.begin(), required_pre_column_names.end()); + + return std::make_unique( + data_part, Names(complete_column_names.begin(), complete_column_names.end()), sample_block); +} MergeTreeBaseSelectProcessor::~MergeTreeBaseSelectProcessor() = default; diff --git a/src/Storages/MergeTree/MergeTreeBaseSelectProcessor.h b/src/Storages/MergeTree/MergeTreeBaseSelectProcessor.h index 8da9b002e16..d102e4f07a4 100644 --- a/src/Storages/MergeTree/MergeTreeBaseSelectProcessor.h +++ b/src/Storages/MergeTree/MergeTreeBaseSelectProcessor.h @@ -1,12 +1,12 @@ #pragma once -#include #include #include #include #include + namespace DB { @@ -37,6 +37,11 @@ public: static Block transformHeader( Block block, const PrewhereInfoPtr & prewhere_info, const DataTypePtr & partition_value_type, const Names & virtual_columns); + static std::unique_ptr getSizePredictor( + const MergeTreeData::DataPartPtr & data_part, + const MergeTreeReadTaskColumns & task_columns, + const Block & sample_block); + protected: Chunk generate() final; diff --git a/src/Storages/MergeTree/MergeTreeBlockReadUtils.cpp b/src/Storages/MergeTree/MergeTreeBlockReadUtils.cpp index b8698ae3e01..93594dd4357 100644 --- a/src/Storages/MergeTree/MergeTreeBlockReadUtils.cpp +++ b/src/Storages/MergeTree/MergeTreeBlockReadUtils.cpp @@ -35,16 +35,16 @@ bool injectRequiredColumnsRecursively( /// stages. 
checkStackSize(); - if (storage_columns.hasPhysicalOrSubcolumn(column_name)) + auto column_in_storage = storage_columns.tryGetColumnOrSubcolumn(ColumnsDescription::AllPhysical, column_name); + if (column_in_storage) { - auto column_in_storage = storage_columns.getPhysicalOrSubcolumn(column_name); - auto column_name_in_part = column_in_storage.getNameInStorage(); + auto column_name_in_part = column_in_storage->getNameInStorage(); if (alter_conversions.isColumnRenamed(column_name_in_part)) column_name_in_part = alter_conversions.getColumnOldName(column_name_in_part); auto column_in_part = NameAndTypePair( - column_name_in_part, column_in_storage.getSubcolumnName(), - column_in_storage.getTypeInStorage(), column_in_storage.type); + column_name_in_part, column_in_storage->getSubcolumnName(), + column_in_storage->getTypeInStorage(), column_in_storage->type); /// column has files and hence does not require evaluation if (part->hasColumnFiles(column_in_part)) @@ -93,7 +93,7 @@ NameSet injectRequiredColumns(const MergeTreeData & storage, const StorageMetada for (size_t i = 0; i < columns.size(); ++i) { /// We are going to fetch only physical columns - if (!storage_columns.hasPhysicalOrSubcolumn(columns[i])) + if (!storage_columns.hasColumnOrSubcolumn(ColumnsDescription::AllPhysical, columns[i])) throw Exception("There is no physical column or subcolumn " + columns[i] + " in table.", ErrorCodes::NO_SUCH_COLUMN_IN_TABLE); have_at_least_one_physical_column |= injectRequiredColumnsRecursively( @@ -310,9 +310,9 @@ MergeTreeReadTaskColumns getReadTaskColumns( if (check_columns) { - const NamesAndTypesList & physical_columns = metadata_snapshot->getColumns().getAllWithSubcolumns(); - result.pre_columns = physical_columns.addTypes(pre_column_names); - result.columns = physical_columns.addTypes(column_names); + const auto & columns = metadata_snapshot->getColumns(); + result.pre_columns = columns.getByNames(ColumnsDescription::All, pre_column_names, true); + result.columns = columns.getByNames(ColumnsDescription::All, column_names, true); } else { diff --git a/src/Storages/MergeTree/MergeTreeBlockReadUtils.h b/src/Storages/MergeTree/MergeTreeBlockReadUtils.h index 31d609e4242..4c4081bd83b 100644 --- a/src/Storages/MergeTree/MergeTreeBlockReadUtils.h +++ b/src/Storages/MergeTree/MergeTreeBlockReadUtils.h @@ -70,7 +70,7 @@ struct MergeTreeReadTaskColumns /// column names to read during PREWHERE NamesAndTypesList pre_columns; /// resulting block may require reordering in accordance with `ordered_names` - bool should_reorder; + bool should_reorder = false; }; MergeTreeReadTaskColumns getReadTaskColumns( diff --git a/src/Storages/MergeTree/MergeTreeData.cpp b/src/Storages/MergeTree/MergeTreeData.cpp index 8f502652a45..60ff3d094b7 100644 --- a/src/Storages/MergeTree/MergeTreeData.cpp +++ b/src/Storages/MergeTree/MergeTreeData.cpp @@ -158,6 +158,16 @@ static void checkSampleExpression(const StorageInMemoryMetadata & metadata, bool ErrorCodes::ILLEGAL_TYPE_OF_COLUMN_FOR_FILTER); } +inline UInt64 time_in_microseconds(std::chrono::time_point timepoint) +{ + return std::chrono::duration_cast(timepoint.time_since_epoch()).count(); +} + +inline UInt64 time_in_seconds(std::chrono::time_point timepoint) +{ + return std::chrono::duration_cast(timepoint.time_since_epoch()).count(); +} + MergeTreeData::MergeTreeData( const StorageID & table_id_, const String & relative_data_path_, @@ -1246,7 +1256,11 @@ void MergeTreeData::removePartsFinally(const MergeTreeData::DataPartsVector & pa PartLogElement part_log_elem; 
part_log_elem.event_type = PartLogElement::REMOVE_PART; - part_log_elem.event_time = time(nullptr); + + const auto time_now = std::chrono::system_clock::now(); + part_log_elem.event_time = time_in_seconds(time_now); + part_log_elem.event_time_microseconds = time_in_microseconds(time_now); + part_log_elem.duration_ms = 0; //-V1048 part_log_elem.database_name = table_id.database_name; @@ -2381,7 +2395,7 @@ MergeTreeData::DataPartsVector MergeTreeData::removePartsInRangeFromWorkingSet(c /// It's a DROP PART and it's already executed by fetching some covering part bool is_drop_part = !drop_range.isFakeDropRangePart() && drop_range.min_block; - if (is_drop_part && (part->info.min_block != drop_range.min_block || part->info.max_block != drop_range.max_block)) + if (is_drop_part && (part->info.min_block != drop_range.min_block || part->info.max_block != drop_range.max_block || part->info.getDataVersion() != drop_range.getDataVersion())) { /// Why we check only min and max blocks here without checking merge /// level? It's a tricky situation which can happen on a stale @@ -2398,9 +2412,7 @@ MergeTreeData::DataPartsVector MergeTreeData::removePartsInRangeFromWorkingSet(c /// So here we just check that all_1_3_1 covers blocks from drop /// all_2_2_2. /// - /// NOTE: this helps only to avoid logical error during drop part. - /// We still get intersecting "parts" in queue. - bool is_covered_by_min_max_block = part->info.min_block <= drop_range.min_block && part->info.max_block >= drop_range.max_block; + bool is_covered_by_min_max_block = part->info.min_block <= drop_range.min_block && part->info.max_block >= drop_range.max_block && part->info.getDataVersion() >= drop_range.getDataVersion(); if (is_covered_by_min_max_block) { LOG_INFO(log, "Skipping drop range for part {} because covering part {} already exists", drop_range.getPartName(), part->name); @@ -3200,7 +3212,11 @@ String MergeTreeData::getPartitionIDFromQuery(const ASTPtr & ast, ContextPtr loc const auto & partition_ast = ast->as(); if (!partition_ast.value) + { + if (!MergeTreePartInfo::validatePartitionID(partition_ast.id, format_version)) + throw Exception("Invalid partition format: " + partition_ast.id, ErrorCodes::INVALID_PARTITION_VALUE); return partition_ast.id; + } if (format_version < MERGE_TREE_DATA_MIN_FORMAT_VERSION_WITH_CUSTOM_PARTITIONING) { @@ -3854,16 +3870,20 @@ bool MergeTreeData::mayBenefitFromIndexForIn( for (const auto & index : metadata_snapshot->getSecondaryIndices()) if (index_wrapper_factory.get(index)->mayBenefitFromIndexForIn(item)) return true; - if (metadata_snapshot->selected_projection - && metadata_snapshot->selected_projection->isPrimaryKeyColumnPossiblyWrappedInFunctions(item)) - return true; + for (const auto & projection : metadata_snapshot->getProjections()) + { + if (projection.isPrimaryKeyColumnPossiblyWrappedInFunctions(item)) + return true; + } } /// The tuple itself may be part of the primary key, so check that as a last resort. 
if (isPrimaryOrMinMaxKeyColumnPossiblyWrappedInFunctions(left_in_operand, metadata_snapshot)) return true; - if (metadata_snapshot->selected_projection - && metadata_snapshot->selected_projection->isPrimaryKeyColumnPossiblyWrappedInFunctions(left_in_operand)) - return true; + for (const auto & projection : metadata_snapshot->getProjections()) + { + if (projection.isPrimaryKeyColumnPossiblyWrappedInFunctions(left_in_operand)) + return true; + } return false; } else @@ -3872,10 +3892,11 @@ bool MergeTreeData::mayBenefitFromIndexForIn( if (index_wrapper_factory.get(index)->mayBenefitFromIndexForIn(left_in_operand)) return true; - if (metadata_snapshot->selected_projection - && metadata_snapshot->selected_projection->isPrimaryKeyColumnPossiblyWrappedInFunctions(left_in_operand)) - return true; - + for (const auto & projection : metadata_snapshot->getProjections()) + { + if (projection.isPrimaryKeyColumnPossiblyWrappedInFunctions(left_in_operand)) + return true; + } return isPrimaryOrMinMaxKeyColumnPossiblyWrappedInFunctions(left_in_operand, metadata_snapshot); } } @@ -3915,7 +3936,7 @@ static void selectBestProjection( candidate.required_columns, metadata_snapshot, candidate.desc->metadata, - query_info, // TODO syntax_analysis_result set in index + query_info, query_context, settings.max_threads, max_added_blocks); @@ -3933,7 +3954,7 @@ static void selectBestProjection( required_columns, metadata_snapshot, metadata_snapshot, - query_info, // TODO syntax_analysis_result set in index + query_info, query_context, settings.max_threads, max_added_blocks); @@ -4103,7 +4124,8 @@ bool MergeTreeData::getQueryProcessingStageWithAggregateProjection( candidate.before_aggregation = analysis_result.before_aggregation->clone(); auto required_columns = candidate.before_aggregation->foldActionsByProjection(keys, projection.sample_block_for_keys); - if (required_columns.empty() && !keys.empty()) + // TODO Let's find out the exact required_columns for keys. 
+ if (required_columns.empty() && (!keys.empty() && !candidate.before_aggregation->getRequiredColumns().empty())) continue; if (analysis_result.optimize_aggregation_in_order) @@ -4191,7 +4213,7 @@ bool MergeTreeData::getQueryProcessingStageWithAggregateProjection( analysis_result.required_columns, metadata_snapshot, metadata_snapshot, - query_info, // TODO syntax_analysis_result set in index + query_info, query_context, settings.max_threads, max_added_blocks); @@ -4574,17 +4596,6 @@ bool MergeTreeData::canReplacePartition(const DataPartPtr & src_part) const return true; } -inline UInt64 time_in_microseconds(std::chrono::time_point timepoint) -{ - return std::chrono::duration_cast(timepoint.time_since_epoch()).count(); -} - - -inline UInt64 time_in_seconds(std::chrono::time_point timepoint) -{ - return std::chrono::duration_cast(timepoint.time_since_epoch()).count(); -} - void MergeTreeData::writePartLog( PartLogElement::Type type, const ExecutionStatus & execution_status, diff --git a/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp b/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp index a777c244426..6279d2d7d6f 100644 --- a/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp +++ b/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp @@ -894,7 +894,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mergePartsToTempor { case MergeTreeData::MergingParams::Ordinary: merged_transform = std::make_unique( - header, pipes.size(), sort_description, merge_block_size, 0, rows_sources_write_buf.get(), true, blocks_are_granules_size); + header, pipes.size(), sort_description, merge_block_size, 0, false, rows_sources_write_buf.get(), true, blocks_are_granules_size); break; case MergeTreeData::MergingParams::Collapsing: diff --git a/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp b/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp index 49ec2a669e3..0b5351dcf01 100644 --- a/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp +++ b/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp @@ -762,7 +762,8 @@ RangesInDataParts MergeTreeDataSelectExecutor::filterPartsByPrimaryKeyAndSkipInd Poco::Logger * log, size_t num_streams, ReadFromMergeTree::IndexStats & index_stats, - bool use_skip_indexes) + bool use_skip_indexes, + bool check_limits) { RangesInDataParts parts_with_ranges(parts.size()); const Settings & settings = context->getSettingsRef(); @@ -890,7 +891,7 @@ RangesInDataParts MergeTreeDataSelectExecutor::filterPartsByPrimaryKeyAndSkipInd if (!ranges.ranges.empty()) { - if (limits.max_rows || leaf_limits.max_rows) + if (check_limits && (limits.max_rows || leaf_limits.max_rows)) { /// Fail fast if estimated number of rows to read exceeds the limit auto current_rows_estimate = ranges.getRowsCount(); @@ -1155,7 +1156,8 @@ size_t MergeTreeDataSelectExecutor::estimateNumMarksToRead( log, num_streams, index_stats, - false); + true /* use_skip_indexes */, + false /* check_limits */); return index_stats.back().num_granules_after; } @@ -1295,6 +1297,9 @@ MarkRanges MergeTreeDataSelectExecutor::markRangesFromPKRange( create_field_ref = [index_columns](size_t row, size_t column, FieldRef & field) { field = {index_columns.get(), row, column}; + // NULL_LAST + if (field.isNull()) + field = PositiveInfinity{}; }; } else @@ -1302,6 +1307,9 @@ MarkRanges MergeTreeDataSelectExecutor::markRangesFromPKRange( create_field_ref = [&index](size_t row, size_t column, FieldRef & field) { index[column]->get(row, field); + // NULL_LAST + if (field.isNull()) + field = 
PositiveInfinity{}; }; } @@ -1314,21 +1322,22 @@ MarkRanges MergeTreeDataSelectExecutor::markRangesFromPKRange( if (range.end == marks_count && !has_final_mark) { for (size_t i = 0; i < used_key_size; ++i) + { create_field_ref(range.begin, i, index_left[i]); - - return key_condition.mayBeTrueAfter( - used_key_size, index_left.data(), primary_key.data_types); + index_right[i] = PositiveInfinity{}; + } } - - if (has_final_mark && range.end == marks_count) - range.end -= 1; /// Remove final empty mark. It's useful only for primary key condition. - - for (size_t i = 0; i < used_key_size; ++i) + else { - create_field_ref(range.begin, i, index_left[i]); - create_field_ref(range.end, i, index_right[i]); - } + if (has_final_mark && range.end == marks_count) + range.end -= 1; /// Remove final empty mark. It's useful only for primary key condition. + for (size_t i = 0; i < used_key_size; ++i) + { + create_field_ref(range.begin, i, index_left[i]); + create_field_ref(range.end, i, index_right[i]); + } + } return key_condition.mayBeTrueInRange( used_key_size, index_left.data(), index_right.data(), primary_key.data_types); }; diff --git a/src/Storages/MergeTree/MergeTreeDataSelectExecutor.h b/src/Storages/MergeTree/MergeTreeDataSelectExecutor.h index bd2a79f0aee..de5ca1f0138 100644 --- a/src/Storages/MergeTree/MergeTreeDataSelectExecutor.h +++ b/src/Storages/MergeTree/MergeTreeDataSelectExecutor.h @@ -174,6 +174,7 @@ public: /// Filter parts using primary key and secondary indexes. /// For every part, select mark ranges to read. + /// If 'check_limits = true' it will throw exception if the amount of data exceed the limits from settings. static RangesInDataParts filterPartsByPrimaryKeyAndSkipIndexes( MergeTreeData::DataPartsVector && parts, StorageMetadataPtr metadata_snapshot, @@ -184,7 +185,8 @@ public: Poco::Logger * log, size_t num_streams, ReadFromMergeTree::IndexStats & index_stats, - bool use_skip_indexes); + bool use_skip_indexes, + bool check_limits); /// Create expression for sampling. /// Also, calculate _sample_factor if needed. diff --git a/src/Storages/MergeTree/MergeTreeInOrderSelectProcessor.cpp b/src/Storages/MergeTree/MergeTreeInOrderSelectProcessor.cpp new file mode 100644 index 00000000000..48a9d62d872 --- /dev/null +++ b/src/Storages/MergeTree/MergeTreeInOrderSelectProcessor.cpp @@ -0,0 +1,54 @@ +#include + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int MEMORY_LIMIT_EXCEEDED; +} + +bool MergeTreeInOrderSelectProcessor::getNewTask() +try +{ + if (all_mark_ranges.empty()) + { + finish(); + return false; + } + + if (!reader) + initializeReaders(); + + MarkRanges mark_ranges_for_task; + /// If we need to read few rows, set one range per task to reduce number of read data. + if (has_limit_below_one_block) + { + mark_ranges_for_task = { std::move(all_mark_ranges.front()) }; + all_mark_ranges.pop_front(); + } + else + { + mark_ranges_for_task = std::move(all_mark_ranges); + all_mark_ranges.clear(); + } + + auto size_predictor = (preferred_block_size_bytes == 0) ? nullptr + : getSizePredictor(data_part, task_columns, sample_block); + + task = std::make_unique( + data_part, mark_ranges_for_task, part_index_in_query, ordered_names, column_name_set, task_columns.columns, + task_columns.pre_columns, prewhere_info && prewhere_info->remove_prewhere_column, + task_columns.should_reorder, std::move(size_predictor)); + + return true; +} +catch (...) +{ + /// Suspicion of the broken part. A part is added to the queue for verification. 
+    if (getCurrentExceptionCode() != ErrorCodes::MEMORY_LIMIT_EXCEEDED)
+        storage.reportBrokenPart(data_part->name);
+    throw;
+}
+
+}
diff --git a/src/Storages/MergeTree/MergeTreeInOrderSelectProcessor.h b/src/Storages/MergeTree/MergeTreeInOrderSelectProcessor.h
new file mode 100644
index 00000000000..ecf648b0291
--- /dev/null
+++ b/src/Storages/MergeTree/MergeTreeInOrderSelectProcessor.h
@@ -0,0 +1,31 @@
+#pragma once
+#include <Storages/MergeTree/MergeTreeSelectProcessor.h>
+
+namespace DB
+{
+
+
+/// Used to read data from a single part with a select query in order of the primary key.
+/// Cares about PREWHERE, virtual columns, indexes etc.
+/// To read data from multiple parts, Storage (MergeTree) creates multiple such objects.
+class MergeTreeInOrderSelectProcessor final : public MergeTreeSelectProcessor
+{
+public:
+    template <typename... Args>
+    MergeTreeInOrderSelectProcessor(Args &&... args)
+        : MergeTreeSelectProcessor{std::forward<Args>(args)...}
+    {
+        LOG_DEBUG(log, "Reading {} ranges in order from part {}, approx. {} rows starting from {}",
+            all_mark_ranges.size(), data_part->name, total_rows,
+            data_part->index_granularity.getMarkStartingRow(all_mark_ranges.front().begin));
+    }
+
+    String getName() const override { return "MergeTreeInOrder"; }
+
+private:
+    bool getNewTask() override;
+
+    Poco::Logger * log = &Poco::Logger::get("MergeTreeInOrderSelectProcessor");
+};
+
+}
diff --git a/src/Storages/MergeTree/MergeTreeIndexMinMax.cpp b/src/Storages/MergeTree/MergeTreeIndexMinMax.cpp
index 099d561cf80..ebf553295be 100644
--- a/src/Storages/MergeTree/MergeTreeIndexMinMax.cpp
+++ b/src/Storages/MergeTree/MergeTreeIndexMinMax.cpp
@@ -5,6 +5,7 @@
 #include
 #include
+#include
 
 namespace DB
 {
@@ -46,6 +47,9 @@ void MergeTreeIndexGranuleMinMax::serializeBinary(WriteBuffer & ostr) const
         }
         else
         {
+            /// NOTE: this serialization differs from
+            /// IMergeTreeDataPart::MinMaxIndex::store() in order to preserve
+            /// backward compatibility.
             bool is_null = hyperrectangle[i].left.isNull() || hyperrectangle[i].right.isNull(); // one is enough
             writeBinary(is_null, ostr);
             if (!is_null)
@@ -63,7 +67,6 @@ void MergeTreeIndexGranuleMinMax::deserializeBinary(ReadBuffer & istr)
     Field min_val;
     Field max_val;
-
     for (size_t i = 0; i < index_sample_block.columns(); ++i)
     {
         const DataTypePtr & type = index_sample_block.getByPosition(i).type;
@@ -76,6 +79,9 @@
         }
         else
         {
+            /// NOTE: this serialization differs from
+            /// IMergeTreeDataPart::MinMaxIndex::load() in order to preserve
+            /// backward compatibility.
bool is_null; readBinary(is_null, istr); if (!is_null) @@ -117,8 +123,11 @@ void MergeTreeIndexAggregatorMinMax::update(const Block & block, size_t * pos, s for (size_t i = 0; i < index_sample_block.columns(); ++i) { auto index_column_name = index_sample_block.getByPosition(i).name; - const auto & column = block.getByName(index_column_name).column; - column->cut(*pos, rows_read)->getExtremes(field_min, field_max); + const auto & column = block.getByName(index_column_name).column->cut(*pos, rows_read); + if (const auto * column_nullable = typeid_cast(column.get())) + column_nullable->getExtremesNullLast(field_min, field_max); + else + column->getExtremes(field_min, field_max); if (hyperrectangle.size() <= i) { @@ -126,8 +135,10 @@ void MergeTreeIndexAggregatorMinMax::update(const Block & block, size_t * pos, s } else { - hyperrectangle[i].left = std::min(hyperrectangle[i].left, field_min); - hyperrectangle[i].right = std::max(hyperrectangle[i].right, field_max); + hyperrectangle[i].left + = applyVisitor(FieldVisitorAccurateLess(), hyperrectangle[i].left, field_min) ? hyperrectangle[i].left : field_min; + hyperrectangle[i].right + = applyVisitor(FieldVisitorAccurateLess(), hyperrectangle[i].right, field_max) ? field_max : hyperrectangle[i].right; } } @@ -156,9 +167,6 @@ bool MergeTreeIndexConditionMinMax::mayBeTrueOnGranule(MergeTreeIndexGranulePtr if (!granule) throw Exception( "Minmax index condition got a granule with the wrong type.", ErrorCodes::LOGICAL_ERROR); - for (const auto & range : granule->hyperrectangle) - if (range.left.isNull() || range.right.isNull()) - return true; return condition.checkInHyperrectangle(granule->hyperrectangle, index_data_types).can_be_true; } diff --git a/src/Storages/MergeTree/MergeTreePartInfo.cpp b/src/Storages/MergeTree/MergeTreePartInfo.cpp index 94430de422e..ccb26a0999e 100644 --- a/src/Storages/MergeTree/MergeTreePartInfo.cpp +++ b/src/Storages/MergeTree/MergeTreePartInfo.cpp @@ -21,6 +21,40 @@ MergeTreePartInfo MergeTreePartInfo::fromPartName(const String & part_name, Merg } +bool MergeTreePartInfo::validatePartitionID(const String & partition_id, MergeTreeDataFormatVersion format_version) +{ + if (partition_id.empty()) + return false; + + ReadBufferFromString in(partition_id); + + if (format_version < MERGE_TREE_DATA_MIN_FORMAT_VERSION_WITH_CUSTOM_PARTITIONING) + { + UInt32 min_yyyymmdd = 0; + UInt32 max_yyyymmdd = 0; + if (!tryReadIntText(min_yyyymmdd, in) + || !checkChar('_', in) + || !tryReadIntText(max_yyyymmdd, in) + || !checkChar('_', in)) + { + return false; + } + } + else + { + while (!in.eof()) + { + char c; + readChar(c, in); + + if (c == '_') + break; + } + } + + return in.eof(); +} + bool MergeTreePartInfo::tryParsePartName(const String & part_name, MergeTreePartInfo * part_info, MergeTreeDataFormatVersion format_version) { ReadBufferFromString in(part_name); @@ -213,13 +247,39 @@ String MergeTreePartInfo::getPartNameV0(DayNum left_date, DayNum right_date) con return wb.str(); } + +const std::vector DetachedPartInfo::DETACH_REASONS = + { + "broken", + "unexpected", + "noquorum", + "ignored", + "broken-on-start", + "clone", + "attaching", + "deleting", + "tmp-fetch", + }; + bool DetachedPartInfo::tryParseDetachedPartName(const String & dir_name, DetachedPartInfo & part_info, MergeTreeDataFormatVersion format_version) { part_info.dir_name = dir_name; - /// First, try to parse as . - // TODO what if tryParsePartName will parse prefix as partition_id? 
It can happen if dir_name doesn't contain mutation number at the end
+    /// First, try to find a known prefix and parse dir_name as <prefix>_<part_name>.
+    /// Arbitrary strings are not allowed for partition_id, so known_prefix cannot be confused with partition_id.
+    for (const auto & known_prefix : DETACH_REASONS)
+    {
+        if (dir_name.starts_with(known_prefix) && known_prefix.size() < dir_name.size() && dir_name[known_prefix.size()] == '_')
+        {
+            part_info.prefix = known_prefix;
+            String part_name = dir_name.substr(known_prefix.size() + 1);
+            bool parsed = MergeTreePartInfo::tryParsePartName(part_name, &part_info, format_version);
+            return part_info.valid_name = parsed;
+        }
+    }
+
+    /// Next, try to parse dir_name as <part_name>.
     if (MergeTreePartInfo::tryParsePartName(dir_name, &part_info, format_version))
         return part_info.valid_name = true;
 
@@ -229,7 +289,6 @@ bool DetachedPartInfo::tryParseDetachedPartName(const String & dir_name, Detache
     if (first_separator == String::npos)
         return part_info.valid_name = false;
 
-    // TODO what if <prefix> contains '_'?
     const auto part_name = dir_name.substr(first_separator + 1, dir_name.size() - first_separator - 1);
 
     if (!MergeTreePartInfo::tryParsePartName(part_name, &part_info, format_version))
diff --git a/src/Storages/MergeTree/MergeTreePartInfo.h b/src/Storages/MergeTree/MergeTreePartInfo.h
index 8b77442bf8b..87f96ed5038 100644
--- a/src/Storages/MergeTree/MergeTreePartInfo.h
+++ b/src/Storages/MergeTree/MergeTreePartInfo.h
@@ -2,6 +2,7 @@
 
 #include
 #include
+#include
 #include
 #include
 #include
@@ -86,6 +87,9 @@ struct MergeTreePartInfo
         return static_cast<UInt64>(max_block - min_block + 1);
     }
 
+    /// Simple sanity check for partition ID. Checks that it's not too long or too short and doesn't contain a lot of '_'.
+    static bool validatePartitionID(const String & partition_id, MergeTreeDataFormatVersion format_version);
+
     static MergeTreePartInfo fromPartName(const String & part_name, MergeTreeDataFormatVersion format_version); // -V1071
 
     static bool tryParsePartName(const String & part_name, MergeTreePartInfo * part_info, MergeTreeDataFormatVersion format_version);
@@ -112,6 +116,10 @@ struct DetachedPartInfo : public MergeTreePartInfo
     /// If false, MergeTreePartInfo is in invalid state (directory name was not successfully parsed).
     bool valid_name;
 
+    static const std::vector<String> DETACH_REASONS;
+
+    /// NOTE: It may parse part info incorrectly.
+    /// For example, if the prefix contains '_' or if DETACH_REASONS doesn't contain the prefix.
static bool tryParseDetachedPartName(const String & dir_name, DetachedPartInfo & part_info, MergeTreeDataFormatVersion format_version); }; diff --git a/src/Storages/MergeTree/MergeTreePartition.cpp b/src/Storages/MergeTree/MergeTreePartition.cpp index 8c027eb2089..0d457971dc6 100644 --- a/src/Storages/MergeTree/MergeTreePartition.cpp +++ b/src/Storages/MergeTree/MergeTreePartition.cpp @@ -43,6 +43,16 @@ namespace UInt8 type = Field::Types::Null; hash.update(type); } + void operator() (const NegativeInfinity &) const + { + UInt8 type = Field::Types::NegativeInfinity; + hash.update(type); + } + void operator() (const PositiveInfinity &) const + { + UInt8 type = Field::Types::PositiveInfinity; + hash.update(type); + } void operator() (const UInt64 & x) const { UInt8 type = Field::Types::UInt64; diff --git a/src/Storages/MergeTree/MergeTreeReadPool.cpp b/src/Storages/MergeTree/MergeTreeReadPool.cpp index f5ae5162676..b18f22c3ab1 100644 --- a/src/Storages/MergeTree/MergeTreeReadPool.cpp +++ b/src/Storages/MergeTree/MergeTreeReadPool.cpp @@ -228,29 +228,20 @@ std::vector MergeTreeReadPool::fillPerPartInfo( per_part_sum_marks.push_back(sum_marks); - auto [required_columns, required_pre_columns, should_reorder] = - getReadTaskColumns(data, metadata_snapshot, part.data_part, column_names, prewhere_info, check_columns); + auto task_columns = getReadTaskColumns(data, metadata_snapshot, part.data_part, column_names, prewhere_info, check_columns); - if (predict_block_size_bytes) - { - const auto & required_column_names = required_columns.getNames(); - const auto & required_pre_column_names = required_pre_columns.getNames(); - NameSet complete_column_names(required_column_names.begin(), required_column_names.end()); - complete_column_names.insert(required_pre_column_names.begin(), required_pre_column_names.end()); + auto size_predictor = !predict_block_size_bytes ? 
nullptr + : MergeTreeBaseSelectProcessor::getSizePredictor(part.data_part, task_columns, sample_block); - per_part_size_predictor.emplace_back(std::make_unique( - part.data_part, Names(complete_column_names.begin(), complete_column_names.end()), sample_block)); - } - else - per_part_size_predictor.emplace_back(nullptr); + per_part_size_predictor.emplace_back(std::move(size_predictor)); /// will be used to distinguish between PREWHERE and WHERE columns when applying filter - const auto & required_column_names = required_columns.getNames(); + const auto & required_column_names = task_columns.columns.getNames(); per_part_column_name_set.emplace_back(required_column_names.begin(), required_column_names.end()); - per_part_pre_columns.push_back(std::move(required_pre_columns)); - per_part_columns.push_back(std::move(required_columns)); - per_part_should_reorder.push_back(should_reorder); + per_part_pre_columns.push_back(std::move(task_columns.pre_columns)); + per_part_columns.push_back(std::move(task_columns.columns)); + per_part_should_reorder.push_back(task_columns.should_reorder); parts_with_idx.push_back({ part.data_part, part.part_index_in_query }); } diff --git a/src/Storages/MergeTree/MergeTreeReverseSelectProcessor.cpp b/src/Storages/MergeTree/MergeTreeReverseSelectProcessor.cpp index d546b2a95af..16ce9823ebb 100644 --- a/src/Storages/MergeTree/MergeTreeReverseSelectProcessor.cpp +++ b/src/Storages/MergeTree/MergeTreeReverseSelectProcessor.cpp @@ -1,8 +1,4 @@ #include -#include -#include -#include - namespace DB { @@ -12,74 +8,10 @@ namespace ErrorCodes extern const int MEMORY_LIMIT_EXCEEDED; } -MergeTreeReverseSelectProcessor::MergeTreeReverseSelectProcessor( - const MergeTreeData & storage_, - const StorageMetadataPtr & metadata_snapshot_, - const MergeTreeData::DataPartPtr & owned_data_part_, - UInt64 max_block_size_rows_, - size_t preferred_block_size_bytes_, - size_t preferred_max_column_in_block_size_bytes_, - Names required_columns_, - MarkRanges mark_ranges_, - bool use_uncompressed_cache_, - const PrewhereInfoPtr & prewhere_info_, - ExpressionActionsSettings actions_settings, - bool check_columns, - const MergeTreeReaderSettings & reader_settings_, - const Names & virt_column_names_, - size_t part_index_in_query_, - bool quiet) - : - MergeTreeBaseSelectProcessor{ - metadata_snapshot_->getSampleBlockForColumns(required_columns_, storage_.getVirtuals(), storage_.getStorageID()), - storage_, metadata_snapshot_, prewhere_info_, std::move(actions_settings), max_block_size_rows_, - preferred_block_size_bytes_, preferred_max_column_in_block_size_bytes_, - reader_settings_, use_uncompressed_cache_, virt_column_names_}, - required_columns{std::move(required_columns_)}, - data_part{owned_data_part_}, - all_mark_ranges(std::move(mark_ranges_)), - part_index_in_query(part_index_in_query_), - path(data_part->getFullRelativePath()) -{ - /// Let's estimate total number of rows for progress bar. - for (const auto & range : all_mark_ranges) - total_marks_count += range.end - range.begin; - - size_t total_rows = data_part->index_granularity.getRowsCountInRanges(all_mark_ranges); - - if (!quiet) - LOG_DEBUG(log, "Reading {} ranges in reverse order from part {}, approx. 
{} rows starting from {}", - all_mark_ranges.size(), data_part->name, total_rows, - data_part->index_granularity.getMarkStartingRow(all_mark_ranges.front().begin)); - - addTotalRowsApprox(total_rows); - - ordered_names = header_without_virtual_columns.getNames(); - - task_columns = getReadTaskColumns(storage, metadata_snapshot, data_part, required_columns, prewhere_info, check_columns); - - /// will be used to distinguish between PREWHERE and WHERE columns when applying filter - const auto & column_names = task_columns.columns.getNames(); - column_name_set = NameSet{column_names.begin(), column_names.end()}; - - if (use_uncompressed_cache) - owned_uncompressed_cache = storage.getContext()->getUncompressedCache(); - - owned_mark_cache = storage.getContext()->getMarkCache(); - - reader = data_part->getReader(task_columns.columns, metadata_snapshot, - all_mark_ranges, owned_uncompressed_cache.get(), - owned_mark_cache.get(), reader_settings); - - if (prewhere_info) - pre_reader = data_part->getReader(task_columns.pre_columns, metadata_snapshot, all_mark_ranges, - owned_uncompressed_cache.get(), owned_mark_cache.get(), reader_settings); -} - bool MergeTreeReverseSelectProcessor::getNewTask() try { - if ((chunks.empty() && all_mark_ranges.empty()) || total_marks_count == 0) + if (chunks.empty() && all_mark_ranges.empty()) { finish(); return false; @@ -90,21 +22,15 @@ try if (all_mark_ranges.empty()) return true; + if (!reader) + initializeReaders(); + /// Read ranges from right to left. MarkRanges mark_ranges_for_task = { all_mark_ranges.back() }; all_mark_ranges.pop_back(); - std::unique_ptr size_predictor; - if (preferred_block_size_bytes) - { - const auto & required_column_names = task_columns.columns.getNames(); - const auto & required_pre_column_names = task_columns.pre_columns.getNames(); - NameSet complete_column_names(required_column_names.begin(), required_column_names.end()); - complete_column_names.insert(required_pre_column_names.begin(), required_pre_column_names.end()); - - size_predictor = std::make_unique( - data_part, Names(complete_column_names.begin(), complete_column_names.end()), metadata_snapshot->getSampleBlock()); - } + auto size_predictor = (preferred_block_size_bytes == 0) ? nullptr + : getSizePredictor(data_part, task_columns, sample_block); task = std::make_unique( data_part, mark_ranges_for_task, part_index_in_query, ordered_names, column_name_set, @@ -150,17 +76,4 @@ Chunk MergeTreeReverseSelectProcessor::readFromPart() return res; } -void MergeTreeReverseSelectProcessor::finish() -{ - /** Close the files (before destroying the object). - * When many sources are created, but simultaneously reading only a few of them, - * buffers don't waste memory. - */ - reader.reset(); - pre_reader.reset(); - data_part.reset(); -} - -MergeTreeReverseSelectProcessor::~MergeTreeReverseSelectProcessor() = default; - } diff --git a/src/Storages/MergeTree/MergeTreeReverseSelectProcessor.h b/src/Storages/MergeTree/MergeTreeReverseSelectProcessor.h index b807c2d912c..18ab51c03a0 100644 --- a/src/Storages/MergeTree/MergeTreeReverseSelectProcessor.h +++ b/src/Storages/MergeTree/MergeTreeReverseSelectProcessor.h @@ -1,76 +1,33 @@ #pragma once -#include -#include -#include -#include -#include -#include +#include + namespace DB { - /// Used to read data from single part with select query +/// in reverse order of primary key. /// Cares about PREWHERE, virtual columns, indexes etc. /// To read data from multiple parts, Storage (MergeTree) creates multiple such objects. 
-class MergeTreeReverseSelectProcessor : public MergeTreeBaseSelectProcessor +class MergeTreeReverseSelectProcessor final : public MergeTreeSelectProcessor { public: - MergeTreeReverseSelectProcessor( - const MergeTreeData & storage, - const StorageMetadataPtr & metadata_snapshot, - const MergeTreeData::DataPartPtr & owned_data_part, - UInt64 max_block_size_rows, - size_t preferred_block_size_bytes, - size_t preferred_max_column_in_block_size_bytes, - Names required_columns_, - MarkRanges mark_ranges, - bool use_uncompressed_cache, - const PrewhereInfoPtr & prewhere_info, - ExpressionActionsSettings actions_settings, - bool check_columns, - const MergeTreeReaderSettings & reader_settings, - const Names & virt_column_names = {}, - size_t part_index_in_query = 0, - bool quiet = false); - - ~MergeTreeReverseSelectProcessor() override; + template + MergeTreeReverseSelectProcessor(Args &&... args) + : MergeTreeSelectProcessor{std::forward(args)...} + { + LOG_DEBUG(log, "Reading {} ranges in reverse order from part {}, approx. {} rows starting from {}", + all_mark_ranges.size(), data_part->name, total_rows, + data_part->index_granularity.getMarkStartingRow(all_mark_ranges.front().begin)); + } String getName() const override { return "MergeTreeReverse"; } - /// Closes readers and unlock part locks - void finish(); - -protected: - +private: bool getNewTask() override; Chunk readFromPart() override; -private: - Block header; - - /// Used by Task - Names required_columns; - /// Names from header. Used in order to order columns in read blocks. - Names ordered_names; - NameSet column_name_set; - - MergeTreeReadTaskColumns task_columns; - - /// Data part will not be removed if the pointer owns it - MergeTreeData::DataPartPtr data_part; - - /// Mark ranges we should read (in ascending order) - MarkRanges all_mark_ranges; - /// Total number of marks we should read - size_t total_marks_count = 0; - /// Value of _part_index virtual column (used only in SelectExecutor) - size_t part_index_in_query = 0; - - String path; - Chunks chunks; - Poco::Logger * log = &Poco::Logger::get("MergeTreeReverseSelectProcessor"); }; diff --git a/src/Storages/MergeTree/MergeTreeSelectProcessor.cpp b/src/Storages/MergeTree/MergeTreeSelectProcessor.cpp index 1e4b61e13d9..98077605f89 100644 --- a/src/Storages/MergeTree/MergeTreeSelectProcessor.cpp +++ b/src/Storages/MergeTree/MergeTreeSelectProcessor.cpp @@ -7,11 +7,6 @@ namespace DB { -namespace ErrorCodes -{ - extern const int MEMORY_LIMIT_EXCEEDED; -} - MergeTreeSelectProcessor::MergeTreeSelectProcessor( const MergeTreeData & storage_, const StorageMetadataPtr & metadata_snapshot_, @@ -28,96 +23,48 @@ MergeTreeSelectProcessor::MergeTreeSelectProcessor( const MergeTreeReaderSettings & reader_settings_, const Names & virt_column_names_, size_t part_index_in_query_, - bool quiet) - : - MergeTreeBaseSelectProcessor{ + bool has_limit_below_one_block_) + : MergeTreeBaseSelectProcessor{ metadata_snapshot_->getSampleBlockForColumns(required_columns_, storage_.getVirtuals(), storage_.getStorageID()), storage_, metadata_snapshot_, prewhere_info_, std::move(actions_settings), max_block_size_rows_, preferred_block_size_bytes_, preferred_max_column_in_block_size_bytes_, reader_settings_, use_uncompressed_cache_, virt_column_names_}, required_columns{std::move(required_columns_)}, data_part{owned_data_part_}, + sample_block(metadata_snapshot_->getSampleBlock()), all_mark_ranges(std::move(mark_ranges_)), part_index_in_query(part_index_in_query_), - check_columns(check_columns_) + 
has_limit_below_one_block(has_limit_below_one_block_), + check_columns(check_columns_), + total_rows(data_part->index_granularity.getRowsCountInRanges(all_mark_ranges)) { - /// Let's estimate total number of rows for progress bar. - for (const auto & range : all_mark_ranges) - total_marks_count += range.end - range.begin; - - size_t total_rows = data_part->index_granularity.getRowsCountInRanges(all_mark_ranges); - - if (!quiet) - LOG_DEBUG(log, "Reading {} ranges from part {}, approx. {} rows starting from {}", - all_mark_ranges.size(), data_part->name, total_rows, - data_part->index_granularity.getMarkStartingRow(all_mark_ranges.front().begin)); - addTotalRowsApprox(total_rows); ordered_names = header_without_virtual_columns.getNames(); } - -bool MergeTreeSelectProcessor::getNewTask() -try +void MergeTreeSelectProcessor::initializeReaders() { - /// Produce no more than one task - if (!is_first_task || total_marks_count == 0) - { - finish(); - return false; - } - is_first_task = false; - task_columns = getReadTaskColumns( storage, metadata_snapshot, data_part, required_columns, prewhere_info, check_columns); - std::unique_ptr size_predictor; - if (preferred_block_size_bytes) - { - const auto & required_column_names = task_columns.columns.getNames(); - const auto & required_pre_column_names = task_columns.pre_columns.getNames(); - NameSet complete_column_names(required_column_names.begin(), required_column_names.end()); - complete_column_names.insert(required_pre_column_names.begin(), required_pre_column_names.end()); - - size_predictor = std::make_unique( - data_part, Names(complete_column_names.begin(), complete_column_names.end()), metadata_snapshot->getSampleBlock()); - } - - /// will be used to distinguish between PREWHERE and WHERE columns when applying filter + /// Will be used to distinguish between PREWHERE and WHERE columns when applying filter const auto & column_names = task_columns.columns.getNames(); column_name_set = NameSet{column_names.begin(), column_names.end()}; - task = std::make_unique( - data_part, all_mark_ranges, part_index_in_query, ordered_names, column_name_set, task_columns.columns, - task_columns.pre_columns, prewhere_info && prewhere_info->remove_prewhere_column, - task_columns.should_reorder, std::move(size_predictor)); + if (use_uncompressed_cache) + owned_uncompressed_cache = storage.getContext()->getUncompressedCache(); - if (!reader) - { - if (use_uncompressed_cache) - owned_uncompressed_cache = storage.getContext()->getUncompressedCache(); + owned_mark_cache = storage.getContext()->getMarkCache(); - owned_mark_cache = storage.getContext()->getMarkCache(); + reader = data_part->getReader(task_columns.columns, metadata_snapshot, all_mark_ranges, + owned_uncompressed_cache.get(), owned_mark_cache.get(), reader_settings); - reader = data_part->getReader(task_columns.columns, metadata_snapshot, all_mark_ranges, + if (prewhere_info) + pre_reader = data_part->getReader(task_columns.pre_columns, metadata_snapshot, all_mark_ranges, owned_uncompressed_cache.get(), owned_mark_cache.get(), reader_settings); - if (prewhere_info) - pre_reader = data_part->getReader(task_columns.pre_columns, metadata_snapshot, all_mark_ranges, - owned_uncompressed_cache.get(), owned_mark_cache.get(), reader_settings); - } - - return true; } -catch (...) -{ - /// Suspicion of the broken part. A part is added to the queue for verification. 
- if (getCurrentExceptionCode() != ErrorCodes::MEMORY_LIMIT_EXCEEDED) - storage.reportBrokenPart(data_part->name); - throw; -} - void MergeTreeSelectProcessor::finish() { @@ -130,8 +77,6 @@ void MergeTreeSelectProcessor::finish() data_part.reset(); } - MergeTreeSelectProcessor::~MergeTreeSelectProcessor() = default; - } diff --git a/src/Storages/MergeTree/MergeTreeSelectProcessor.h b/src/Storages/MergeTree/MergeTreeSelectProcessor.h index b63107b6dbf..ea4cd349cba 100644 --- a/src/Storages/MergeTree/MergeTreeSelectProcessor.h +++ b/src/Storages/MergeTree/MergeTreeSelectProcessor.h @@ -1,11 +1,11 @@ #pragma once -#include -#include +#include #include #include #include #include + namespace DB { @@ -28,24 +28,21 @@ public: bool use_uncompressed_cache, const PrewhereInfoPtr & prewhere_info, ExpressionActionsSettings actions_settings, - bool check_columns, + bool check_columns_, const MergeTreeReaderSettings & reader_settings, const Names & virt_column_names = {}, - size_t part_index_in_query = 0, - bool quiet = false); + size_t part_index_in_query_ = 0, + bool has_limit_below_one_block_ = false); ~MergeTreeSelectProcessor() override; - String getName() const override { return "MergeTree"; } - /// Closes readers and unlock part locks void finish(); protected: - - bool getNewTask() override; - -private: + /// Defer initialization from constructor, because it may be heavy + /// and it's better to do it lazily in `getNewTask`, which is executing in parallel. + void initializeReaders(); /// Used by Task Names required_columns; @@ -58,17 +55,19 @@ private: /// Data part will not be removed if the pointer owns it MergeTreeData::DataPartPtr data_part; + /// Cache getSampleBlock call, which might be heavy. + Block sample_block; + /// Mark ranges we should read (in ascending order) MarkRanges all_mark_ranges; - /// Total number of marks we should read - size_t total_marks_count = 0; /// Value of _part_index virtual column (used only in SelectExecutor) size_t part_index_in_query = 0; + /// If true, every task will be created only with one range. + /// It reduces amount of read data for queries with small LIMIT. 
+ bool has_limit_below_one_block = false; bool check_columns; - bool is_first_task = true; - - Poco::Logger * log = &Poco::Logger::get("MergeTreeSelectProcessor"); + size_t total_rows = 0; }; } diff --git a/src/Storages/MergeTree/MergeTreeSequentialSource.cpp b/src/Storages/MergeTree/MergeTreeSequentialSource.cpp index 2a3c7ed00a1..c854ca4e305 100644 --- a/src/Storages/MergeTree/MergeTreeSequentialSource.cpp +++ b/src/Storages/MergeTree/MergeTreeSequentialSource.cpp @@ -43,8 +43,7 @@ MergeTreeSequentialSource::MergeTreeSequentialSource( NamesAndTypesList columns_for_reader; if (take_column_types_from_storage) { - const NamesAndTypesList & physical_columns = metadata_snapshot->getColumns().getAllPhysical(); - columns_for_reader = physical_columns.addTypes(columns_to_read); + columns_for_reader = metadata_snapshot->getColumns().getByNames(ColumnsDescription::AllPhysical, columns_to_read, false); } else { diff --git a/src/Storages/MergeTree/MergeTreeSettings.h b/src/Storages/MergeTree/MergeTreeSettings.h index b50a9935ea0..d018059c248 100644 --- a/src/Storages/MergeTree/MergeTreeSettings.h +++ b/src/Storages/MergeTree/MergeTreeSettings.h @@ -57,6 +57,7 @@ struct Settings; M(Bool, in_memory_parts_insert_sync, false, "If true insert of part with in-memory format will wait for fsync of WAL", 0) \ M(UInt64, non_replicated_deduplication_window, 0, "How many last blocks of hashes should be kept on disk (0 - disabled).", 0) \ M(UInt64, max_parts_to_merge_at_once, 100, "Max amount of parts which can be merged at once (0 - disabled). Doesn't affect OPTIMIZE FINAL query.", 0) \ + M(UInt64, merge_selecting_sleep_ms, 5000, "Sleep time for merge selecting when no part selected, a lower setting will trigger selecting tasks in background_schedule_pool frequently which result in large amount of requests to zookeeper in large-scale clusters", 0) \ \ /** Inserts settings. */ \ M(UInt64, parts_to_delay_insert, 150, "If table contains at least that many active parts in single partition, artificially slow down insert into table.", 0) \ diff --git a/src/Storages/MergeTree/MergeTreeBlockOutputStream.cpp b/src/Storages/MergeTree/MergeTreeSink.cpp similarity index 83% rename from src/Storages/MergeTree/MergeTreeBlockOutputStream.cpp rename to src/Storages/MergeTree/MergeTreeSink.cpp index 11feb905bb6..73c753386a4 100644 --- a/src/Storages/MergeTree/MergeTreeBlockOutputStream.cpp +++ b/src/Storages/MergeTree/MergeTreeSink.cpp @@ -1,4 +1,4 @@ -#include +#include #include #include #include @@ -7,13 +7,7 @@ namespace DB { -Block MergeTreeBlockOutputStream::getHeader() const -{ - return metadata_snapshot->getSampleBlock(); -} - - -void MergeTreeBlockOutputStream::writePrefix() +void MergeTreeSink::onStart() { /// Only check "too many parts" before write, /// because interrupting long-running INSERT query in the middle is not convenient for users. 
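/// Illustrative sketch, not part of this patch: what the has_limit_below_one_block flag declared in
/// MergeTreeSelectProcessor.h above changes about task creation. A self-contained toy model follows;
/// ToyMarkRange and nextTaskRanges are hypothetical names, not ClickHouse types. It only mirrors the
/// logic of MergeTreeInOrderSelectProcessor::getNewTask shown earlier in this diff.

#include <cstddef>
#include <deque>
#include <iostream>
#include <vector>

struct ToyMarkRange { size_t begin = 0; size_t end = 0; };
using ToyMarkRanges = std::deque<ToyMarkRange>;

/// Returns the mark ranges that form the next read task.
std::vector<ToyMarkRange> nextTaskRanges(ToyMarkRanges & all_mark_ranges, bool has_limit_below_one_block)
{
    std::vector<ToyMarkRange> task;
    if (all_mark_ranges.empty())
        return task;

    if (has_limit_below_one_block)
    {
        /// One range per task: for queries with a small LIMIT this lets reading stop
        /// after the first few granules instead of scheduling the whole part at once.
        task.push_back(all_mark_ranges.front());
        all_mark_ranges.pop_front();
    }
    else
    {
        /// Otherwise a single task takes everything that is left.
        task.assign(all_mark_ranges.begin(), all_mark_ranges.end());
        all_mark_ranges.clear();
    }
    return task;
}

int main()
{
    ToyMarkRanges ranges{{0, 8}, {8, 16}, {16, 24}};
    auto first_task = nextTaskRanges(ranges, /*has_limit_below_one_block=*/ true);
    /// Prints: 1 range(s) in the first task, 2 left
    std::cout << first_task.size() << " range(s) in the first task, " << ranges.size() << " left\n";
}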
@@ -21,8 +15,10 @@ void MergeTreeBlockOutputStream::writePrefix() } -void MergeTreeBlockOutputStream::write(const Block & block) +void MergeTreeSink::consume(Chunk chunk) { + auto block = getPort().getHeader().cloneWithColumns(chunk.detachColumns()); + auto part_blocks = storage.writer.splitBlockIntoParts(block, max_parts_per_block, metadata_snapshot, context); for (auto & current_block : part_blocks) { diff --git a/src/Storages/MergeTree/MergeTreeBlockOutputStream.h b/src/Storages/MergeTree/MergeTreeSink.h similarity index 63% rename from src/Storages/MergeTree/MergeTreeBlockOutputStream.h rename to src/Storages/MergeTree/MergeTreeSink.h index 32a5b8fccc2..60ac62c7592 100644 --- a/src/Storages/MergeTree/MergeTreeBlockOutputStream.h +++ b/src/Storages/MergeTree/MergeTreeSink.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include @@ -11,24 +11,25 @@ class Block; class StorageMergeTree; -class MergeTreeBlockOutputStream : public IBlockOutputStream +class MergeTreeSink : public SinkToStorage { public: - MergeTreeBlockOutputStream( + MergeTreeSink( StorageMergeTree & storage_, const StorageMetadataPtr metadata_snapshot_, size_t max_parts_per_block_, ContextPtr context_) - : storage(storage_) + : SinkToStorage(metadata_snapshot_->getSampleBlock()) + , storage(storage_) , metadata_snapshot(metadata_snapshot_) , max_parts_per_block(max_parts_per_block_) , context(context_) { } - Block getHeader() const override; - void write(const Block & block) override; - void writePrefix() override; + String getName() const override { return "MergeTreeSink"; } + void consume(Chunk chunk) override; + void onStart() override; private: StorageMergeTree & storage; diff --git a/src/Storages/MergeTree/MergeTreeThreadSelectBlockInputProcessor.cpp b/src/Storages/MergeTree/MergeTreeThreadSelectProcessor.cpp similarity index 93% rename from src/Storages/MergeTree/MergeTreeThreadSelectBlockInputProcessor.cpp rename to src/Storages/MergeTree/MergeTreeThreadSelectProcessor.cpp index daefb17038a..4eb6bc4b2e2 100644 --- a/src/Storages/MergeTree/MergeTreeThreadSelectBlockInputProcessor.cpp +++ b/src/Storages/MergeTree/MergeTreeThreadSelectProcessor.cpp @@ -1,6 +1,6 @@ #include #include -#include +#include #include @@ -8,7 +8,7 @@ namespace DB { -MergeTreeThreadSelectBlockInputProcessor::MergeTreeThreadSelectBlockInputProcessor( +MergeTreeThreadSelectProcessor::MergeTreeThreadSelectProcessor( const size_t thread_, const MergeTreeReadPoolPtr & pool_, const size_t min_marks_to_read_, @@ -46,7 +46,7 @@ MergeTreeThreadSelectBlockInputProcessor::MergeTreeThreadSelectBlockInputProcess } /// Requests read task from MergeTreeReadPool and signals whether it got one -bool MergeTreeThreadSelectBlockInputProcessor::getNewTask() +bool MergeTreeThreadSelectProcessor::getNewTask() { task = pool->getTask(min_marks_to_read, thread, ordered_names); @@ -107,6 +107,6 @@ bool MergeTreeThreadSelectBlockInputProcessor::getNewTask() } -MergeTreeThreadSelectBlockInputProcessor::~MergeTreeThreadSelectBlockInputProcessor() = default; +MergeTreeThreadSelectProcessor::~MergeTreeThreadSelectProcessor() = default; } diff --git a/src/Storages/MergeTree/MergeTreeThreadSelectBlockInputProcessor.h b/src/Storages/MergeTree/MergeTreeThreadSelectProcessor.h similarity index 88% rename from src/Storages/MergeTree/MergeTreeThreadSelectBlockInputProcessor.h rename to src/Storages/MergeTree/MergeTreeThreadSelectProcessor.h index 30c551eede0..d17b15c3635 100644 --- a/src/Storages/MergeTree/MergeTreeThreadSelectBlockInputProcessor.h +++ 
b/src/Storages/MergeTree/MergeTreeThreadSelectProcessor.h @@ -11,10 +11,10 @@ class MergeTreeReadPool; /** Used in conjunction with MergeTreeReadPool, asking it for more work to do and performing whatever reads it is asked * to perform. */ -class MergeTreeThreadSelectBlockInputProcessor : public MergeTreeBaseSelectProcessor +class MergeTreeThreadSelectProcessor : public MergeTreeBaseSelectProcessor { public: - MergeTreeThreadSelectBlockInputProcessor( + MergeTreeThreadSelectProcessor( const size_t thread_, const std::shared_ptr & pool_, const size_t min_marks_to_read_, @@ -32,7 +32,7 @@ public: String getName() const override { return "MergeTreeThread"; } - ~MergeTreeThreadSelectBlockInputProcessor() override; + ~MergeTreeThreadSelectProcessor() override; protected: /// Requests read task from MergeTreeReadPool and signals whether it got one diff --git a/src/Storages/MergeTree/ReplicatedMergeTreeQueue.cpp b/src/Storages/MergeTree/ReplicatedMergeTreeQueue.cpp index aaa76009d74..ea5f7cfc36a 100644 --- a/src/Storages/MergeTree/ReplicatedMergeTreeQueue.cpp +++ b/src/Storages/MergeTree/ReplicatedMergeTreeQueue.cpp @@ -7,8 +7,14 @@ #include #include #include +#include +namespace CurrentMetrics +{ + extern const Metric BackgroundPoolTask; +} + namespace DB { @@ -275,6 +281,8 @@ void ReplicatedMergeTreeQueue::updateStateOnQueueEntryRemoval( current_parts.remove(*drop_range_part_name); virtual_parts.remove(*drop_range_part_name); + + removeCoveredPartsFromMutations(*drop_range_part_name, /*remove_part = */ true, /*remove_covered_parts = */ false); } if (entry->type == LogEntry::DROP_RANGE) @@ -297,6 +305,9 @@ void ReplicatedMergeTreeQueue::updateStateOnQueueEntryRemoval( for (const String & virtual_part_name : entry->getVirtualPartNames(format_version)) { + /// This part will never appear, so remove it from virtual parts + virtual_parts.remove(virtual_part_name); + /// Because execution of the entry is unsuccessful, /// `virtual_part_name` will never appear so we won't need to mutate /// it. @@ -886,7 +897,6 @@ bool ReplicatedMergeTreeQueue::checkReplaceRangeCanBeRemoved(const MergeTreePart if (entry_ptr->replace_range_entry == current.replace_range_entry) /// same partition, don't want to drop ourselves return false; - if (!part_info.contains(MergeTreePartInfo::fromPartName(entry_ptr->replace_range_entry->drop_range_part_name, format_version))) return false; @@ -1140,16 +1150,18 @@ bool ReplicatedMergeTreeQueue::shouldExecuteLogEntry( if (!ignore_max_size && sum_parts_size_in_bytes > max_source_parts_size) { - const char * format_str = "Not executing log entry {} of type {} for part {}" - " because source parts size ({}) is greater than the current maximum ({})."; + size_t busy_threads_in_pool = CurrentMetrics::values[CurrentMetrics::BackgroundPoolTask].load(std::memory_order_relaxed); + size_t thread_pool_size = data.getContext()->getSettingsRef().background_pool_size; + size_t free_threads = thread_pool_size - busy_threads_in_pool; + size_t required_threads = data_settings->number_of_free_entries_in_pool_to_execute_mutation; + out_postpone_reason = fmt::format("Not executing log entry {} of type {} for part {}" + " because source parts size ({}) is greater than the current maximum ({})." 
+ " {} free of {} threads, required {} free threads.", + entry.znode_name, entry.typeToString(), entry.new_part_name, + ReadableSize(sum_parts_size_in_bytes), ReadableSize(max_source_parts_size), + free_threads, thread_pool_size, required_threads); - LOG_DEBUG(log, format_str, entry.znode_name, - entry.typeToString(), entry.new_part_name, - ReadableSize(sum_parts_size_in_bytes), ReadableSize(max_source_parts_size)); - - out_postpone_reason = fmt::format(format_str, entry.znode_name, - entry.typeToString(), entry.new_part_name, - ReadableSize(sum_parts_size_in_bytes), ReadableSize(max_source_parts_size)); + LOG_DEBUG(log, out_postpone_reason); return false; } diff --git a/src/Storages/MergeTree/ReplicatedMergeTreeRestartingThread.cpp b/src/Storages/MergeTree/ReplicatedMergeTreeRestartingThread.cpp index 1c9921aad1d..25f25480549 100644 --- a/src/Storages/MergeTree/ReplicatedMergeTreeRestartingThread.cpp +++ b/src/Storages/MergeTree/ReplicatedMergeTreeRestartingThread.cpp @@ -90,6 +90,8 @@ void ReplicatedMergeTreeRestartingThread::run() /// The exception when you try to zookeeper_init usually happens if DNS does not work. We will try to do it again. tryLogCurrentException(log, __PRETTY_FUNCTION__); + /// Here we're almost sure the table is already readonly, but it doesn't hurt to enforce it. + setReadonly(); if (first_time) storage.startup_event.set(); task->scheduleAfter(retry_period_ms); diff --git a/src/Storages/MergeTree/ReplicatedMergeTreeBlockOutputStream.cpp b/src/Storages/MergeTree/ReplicatedMergeTreeSink.cpp similarity index 97% rename from src/Storages/MergeTree/ReplicatedMergeTreeBlockOutputStream.cpp rename to src/Storages/MergeTree/ReplicatedMergeTreeSink.cpp index 4a73658e8a4..c81f587cbbc 100644 --- a/src/Storages/MergeTree/ReplicatedMergeTreeBlockOutputStream.cpp +++ b/src/Storages/MergeTree/ReplicatedMergeTreeSink.cpp @@ -1,6 +1,6 @@ #include #include -#include +#include #include #include #include @@ -33,7 +33,7 @@ namespace ErrorCodes } -ReplicatedMergeTreeBlockOutputStream::ReplicatedMergeTreeBlockOutputStream( +ReplicatedMergeTreeSink::ReplicatedMergeTreeSink( StorageReplicatedMergeTree & storage_, const StorageMetadataPtr & metadata_snapshot_, size_t quorum_, @@ -43,7 +43,8 @@ ReplicatedMergeTreeBlockOutputStream::ReplicatedMergeTreeBlockOutputStream( bool deduplicate_, ContextPtr context_, bool is_attach_) - : storage(storage_) + : SinkToStorage(metadata_snapshot_->getSampleBlock()) + , storage(storage_) , metadata_snapshot(metadata_snapshot_) , quorum(quorum_) , quorum_timeout_ms(quorum_timeout_ms_) @@ -60,12 +61,6 @@ ReplicatedMergeTreeBlockOutputStream::ReplicatedMergeTreeBlockOutputStream( } -Block ReplicatedMergeTreeBlockOutputStream::getHeader() const -{ - return metadata_snapshot->getSampleBlock(); -} - - /// Allow to verify that the session in ZooKeeper is still alive. 
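> The postpone reason built above is surfaced through the existing `system.replication_queue` table, so the new thread-pool details can be inspected per queue entry. A hedged sketch (the filter is illustrative):

```
SELECT type, new_part_name, postpone_reason
FROM system.replication_queue
WHERE postpone_reason != ''
LIMIT 10;
```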
static void assertSessionIsNotExpired(zkutil::ZooKeeperPtr & zookeeper) { @@ -77,7 +72,7 @@ static void assertSessionIsNotExpired(zkutil::ZooKeeperPtr & zookeeper) } -void ReplicatedMergeTreeBlockOutputStream::checkQuorumPrecondition(zkutil::ZooKeeperPtr & zookeeper) +void ReplicatedMergeTreeSink::checkQuorumPrecondition(zkutil::ZooKeeperPtr & zookeeper) { quorum_info.status_path = storage.zookeeper_path + "/quorum/status"; @@ -121,8 +116,10 @@ void ReplicatedMergeTreeBlockOutputStream::checkQuorumPrecondition(zkutil::ZooKe } -void ReplicatedMergeTreeBlockOutputStream::write(const Block & block) +void ReplicatedMergeTreeSink::consume(Chunk chunk) { + auto block = getPort().getHeader().cloneWithColumns(chunk.detachColumns()); + last_block_is_duplicate = false; auto zookeeper = storage.getZooKeeper(); @@ -183,7 +180,7 @@ void ReplicatedMergeTreeBlockOutputStream::write(const Block & block) } -void ReplicatedMergeTreeBlockOutputStream::writeExistingPart(MergeTreeData::MutableDataPartPtr & part) +void ReplicatedMergeTreeSink::writeExistingPart(MergeTreeData::MutableDataPartPtr & part) { last_block_is_duplicate = false; @@ -210,7 +207,7 @@ void ReplicatedMergeTreeBlockOutputStream::writeExistingPart(MergeTreeData::Muta } -void ReplicatedMergeTreeBlockOutputStream::commitPart( +void ReplicatedMergeTreeSink::commitPart( zkutil::ZooKeeperPtr & zookeeper, MergeTreeData::MutableDataPartPtr & part, const String & block_id) { metadata_snapshot->check(part->getColumns()); @@ -507,7 +504,7 @@ void ReplicatedMergeTreeBlockOutputStream::commitPart( } } -void ReplicatedMergeTreeBlockOutputStream::writePrefix() +void ReplicatedMergeTreeSink::onStart() { /// Only check "too many parts" before write, /// because interrupting long-running INSERT query in the middle is not convenient for users. @@ -515,7 +512,7 @@ void ReplicatedMergeTreeBlockOutputStream::writePrefix() } -void ReplicatedMergeTreeBlockOutputStream::waitForQuorum( +void ReplicatedMergeTreeSink::waitForQuorum( zkutil::ZooKeeperPtr & zookeeper, const std::string & part_name, const std::string & quorum_path, diff --git a/src/Storages/MergeTree/ReplicatedMergeTreeBlockOutputStream.h b/src/Storages/MergeTree/ReplicatedMergeTreeSink.h similarity index 88% rename from src/Storages/MergeTree/ReplicatedMergeTreeBlockOutputStream.h rename to src/Storages/MergeTree/ReplicatedMergeTreeSink.h index a3fce65a840..2a6702736df 100644 --- a/src/Storages/MergeTree/ReplicatedMergeTreeBlockOutputStream.h +++ b/src/Storages/MergeTree/ReplicatedMergeTreeSink.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include @@ -19,10 +19,10 @@ namespace DB class StorageReplicatedMergeTree; -class ReplicatedMergeTreeBlockOutputStream : public IBlockOutputStream +class ReplicatedMergeTreeSink : public SinkToStorage { public: - ReplicatedMergeTreeBlockOutputStream( + ReplicatedMergeTreeSink( StorageReplicatedMergeTree & storage_, const StorageMetadataPtr & metadata_snapshot_, size_t quorum_, @@ -35,9 +35,10 @@ public: // needed to set the special LogEntryType::ATTACH_PART bool is_attach_ = false); - Block getHeader() const override; - void writePrefix() override; - void write(const Block & block) override; + void onStart() override; + void consume(Chunk chunk) override; + + String getName() const override { return "ReplicatedMergeTreeSink"; } /// For ATTACHing existing data on filesystem. 
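> `ReplicatedMergeTreeSink` keeps the quorum handling (`checkQuorumPrecondition` / `waitForQuorum` above); it is driven by the pre-existing `insert_quorum` query setting rather than anything new in this patch. A sketch of exercising it (table name and values are illustrative):

```
SET insert_quorum = 2;  -- the INSERT is acknowledged only after 2 replicas have the part

INSERT INTO t_example (key, value) VALUES (1, 'one');
```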
void writeExistingPart(MergeTreeData::MutableDataPartPtr & part); diff --git a/src/Storages/MergeTree/registerStorageMergeTree.cpp b/src/Storages/MergeTree/registerStorageMergeTree.cpp index 539f7713320..910492d2467 100644 --- a/src/Storages/MergeTree/registerStorageMergeTree.cpp +++ b/src/Storages/MergeTree/registerStorageMergeTree.cpp @@ -480,7 +480,10 @@ static StoragePtr create(const StorageFactory::Arguments & args) "No replica name in config" + getMergeTreeVerboseHelp(is_extended_storage_def), ErrorCodes::NO_REPLICA_NAME_GIVEN); ++arg_num; } - else if (is_extended_storage_def && (arg_cnt == 0 || !engine_args[arg_num]->as() || (arg_cnt == 1 && merging_params.mode == MergeTreeData::MergingParams::Graphite))) + else if (is_extended_storage_def + && (arg_cnt == 0 + || !engine_args[arg_num]->as() + || (arg_cnt == 1 && merging_params.mode == MergeTreeData::MergingParams::Graphite))) { /// Try use default values if arguments are not specified. /// Note: {uuid} macro works for ON CLUSTER queries when database engine is Atomic. diff --git a/src/Storages/PostgreSQL/PostgreSQLReplicationHandler.cpp b/src/Storages/PostgreSQL/PostgreSQLReplicationHandler.cpp index 4c614d8fd5a..32fb87873ea 100644 --- a/src/Storages/PostgreSQL/PostgreSQLReplicationHandler.cpp +++ b/src/Storages/PostgreSQL/PostgreSQLReplicationHandler.cpp @@ -479,7 +479,7 @@ NameSet PostgreSQLReplicationHandler::fetchRequiredTables(postgres::Connection & "Publication {} already exists and tables list is empty. Assuming publication is correct.", publication_name); - result_tables = fetchPostgreSQLTablesList(tx); + result_tables = fetchPostgreSQLTablesList(tx, postgres_schema); } /// Check tables list from publication is the same as expected tables list. /// If not - drop publication and return expected tables list. @@ -521,7 +521,7 @@ NameSet PostgreSQLReplicationHandler::fetchRequiredTables(postgres::Connection & /// Fetch all tables list from database. Publication does not exist yet, which means /// that no replication took place. Publication will be created in /// startSynchronization method. 
- result_tables = fetchPostgreSQLTablesList(tx); + result_tables = fetchPostgreSQLTablesList(tx, postgres_schema); } } diff --git a/src/Storages/PostgreSQL/PostgreSQLReplicationHandler.h b/src/Storages/PostgreSQL/PostgreSQLReplicationHandler.h index 95ac12b3786..3a0bedc0852 100644 --- a/src/Storages/PostgreSQL/PostgreSQLReplicationHandler.h +++ b/src/Storages/PostgreSQL/PostgreSQLReplicationHandler.h @@ -124,6 +124,8 @@ private: MaterializedStorages materialized_storages; UInt64 milliseconds_to_wait; + + String postgres_schema; }; } diff --git a/src/Storages/PostgreSQL/StorageMaterializedPostgreSQL.cpp b/src/Storages/PostgreSQL/StorageMaterializedPostgreSQL.cpp index 70251a940cc..e24e252bf01 100644 --- a/src/Storages/PostgreSQL/StorageMaterializedPostgreSQL.cpp +++ b/src/Storages/PostgreSQL/StorageMaterializedPostgreSQL.cpp @@ -256,6 +256,12 @@ NamesAndTypesList StorageMaterializedPostgreSQL::getVirtuals() const } +bool StorageMaterializedPostgreSQL::needRewriteQueryWithFinal(const Names & column_names) const +{ + return needRewriteQueryWithFinalForStorage(column_names, getNested()); +} + + Pipe StorageMaterializedPostgreSQL::read( const Names & column_names, const StorageMetadataPtr & metadata_snapshot, @@ -327,6 +333,16 @@ ASTPtr StorageMaterializedPostgreSQL::getColumnDeclaration(const DataTypePtr & d return make_decimal_expression("Decimal256"); } + if (which.isDateTime64()) + { + auto ast_expression = std::make_shared(); + + ast_expression->name = "DateTime64"; + ast_expression->arguments = std::make_shared(); + ast_expression->arguments->children.emplace_back(std::make_shared(UInt32(6))); + return ast_expression; + } + return std::make_shared(data_type->getName()); } diff --git a/src/Storages/PostgreSQL/StorageMaterializedPostgreSQL.h b/src/Storages/PostgreSQL/StorageMaterializedPostgreSQL.h index 5d18a0b16b7..becb4f6ba10 100644 --- a/src/Storages/PostgreSQL/StorageMaterializedPostgreSQL.h +++ b/src/Storages/PostgreSQL/StorageMaterializedPostgreSQL.h @@ -82,6 +82,8 @@ public: NamesAndTypesList getVirtuals() const override; + bool needRewriteQueryWithFinal(const Names & column_names) const override; + Pipe read( const Names & column_names, const StorageMetadataPtr & metadata_snapshot, @@ -119,6 +121,8 @@ public: /// for current table, set has_nested = true. 
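> With `needRewriteQueryWithFinal` above (and the `supportsFinal` override just below), SELECTs from a MaterializedPostgreSQL table are transparently executed with FINAL whenever the internal version column is not requested. A sketch (database and table names are illustrative):

```
-- The version column is not selected, so the query is rewritten to read with FINAL
-- and returns deduplicated rows:
SELECT key, value FROM postgres_db.postgres_table;

-- Explicit FINAL is also accepted now that the storage reports supportsFinal():
SELECT key, value FROM postgres_db.postgres_table FINAL;
```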
StoragePtr prepare(); + bool supportsFinal() const override { return true; } + protected: StorageMaterializedPostgreSQL( const StorageID & table_id_, diff --git a/src/Storages/RabbitMQ/RabbitMQBlockOutputStream.h b/src/Storages/RabbitMQ/RabbitMQBlockOutputStream.h deleted file mode 100644 index 3941875ea86..00000000000 --- a/src/Storages/RabbitMQ/RabbitMQBlockOutputStream.h +++ /dev/null @@ -1,29 +0,0 @@ -#pragma once - -#include -#include - - -namespace DB -{ - -class RabbitMQBlockOutputStream : public IBlockOutputStream -{ - -public: - explicit RabbitMQBlockOutputStream(StorageRabbitMQ & storage_, const StorageMetadataPtr & metadata_snapshot_, ContextPtr context_); - - Block getHeader() const override; - - void writePrefix() override; - void write(const Block & block) override; - void writeSuffix() override; - -private: - StorageRabbitMQ & storage; - StorageMetadataPtr metadata_snapshot; - ContextPtr context; - ProducerBufferPtr buffer; - BlockOutputStreamPtr child; -}; -} diff --git a/src/Storages/RabbitMQ/RabbitMQHandler.cpp b/src/Storages/RabbitMQ/RabbitMQHandler.cpp index c994ab22494..85d8063a73f 100644 --- a/src/Storages/RabbitMQ/RabbitMQHandler.cpp +++ b/src/Storages/RabbitMQ/RabbitMQHandler.cpp @@ -57,11 +57,13 @@ void RabbitMQHandler::iterateLoop() /// initial RabbitMQ setup - at this point there is no background loop thread. void RabbitMQHandler::startBlockingLoop() { + LOG_DEBUG(log, "Started blocking loop."); uv_run(loop, UV_RUN_DEFAULT); } void RabbitMQHandler::stopLoop() { + LOG_DEBUG(log, "Implicit loop stop."); uv_stop(loop); } diff --git a/src/Storages/RabbitMQ/RabbitMQBlockOutputStream.cpp b/src/Storages/RabbitMQ/RabbitMQSink.cpp similarity index 68% rename from src/Storages/RabbitMQ/RabbitMQBlockOutputStream.cpp rename to src/Storages/RabbitMQ/RabbitMQSink.cpp index 3c837cb95b1..9c556ee0832 100644 --- a/src/Storages/RabbitMQ/RabbitMQBlockOutputStream.cpp +++ b/src/Storages/RabbitMQ/RabbitMQSink.cpp @@ -1,4 +1,4 @@ -#include +#include #include #include #include @@ -8,24 +8,19 @@ namespace DB { -RabbitMQBlockOutputStream::RabbitMQBlockOutputStream( +RabbitMQSink::RabbitMQSink( StorageRabbitMQ & storage_, const StorageMetadataPtr & metadata_snapshot_, ContextPtr context_) - : storage(storage_) + : SinkToStorage(metadata_snapshot_->getSampleBlockNonMaterialized()) + , storage(storage_) , metadata_snapshot(metadata_snapshot_) , context(context_) { } -Block RabbitMQBlockOutputStream::getHeader() const -{ - return metadata_snapshot->getSampleBlockNonMaterialized(); -} - - -void RabbitMQBlockOutputStream::writePrefix() +void RabbitMQSink::onStart() { if (!storage.exchangeRemoved()) storage.unbindExchange(); @@ -37,7 +32,7 @@ void RabbitMQBlockOutputStream::writePrefix() format_settings.protobuf.allow_multiple_rows_without_delimiter = true; child = FormatFactory::instance().getOutputStream(storage.getFormatName(), *buffer, - getHeader(), context, + getPort().getHeader(), context, [this](const Columns & /* columns */, size_t /* rows */) { buffer->countRow(); @@ -46,13 +41,13 @@ void RabbitMQBlockOutputStream::writePrefix() } -void RabbitMQBlockOutputStream::write(const Block & block) +void RabbitMQSink::consume(Chunk chunk) { - child->write(block); + child->write(getPort().getHeader().cloneWithColumns(chunk.detachColumns())); } -void RabbitMQBlockOutputStream::writeSuffix() +void RabbitMQSink::onFinish() { child->writeSuffix(); diff --git a/src/Storages/RabbitMQ/RabbitMQSink.h b/src/Storages/RabbitMQ/RabbitMQSink.h new file mode 100644 index 00000000000..6222ccdf2ac --- /dev/null 
+++ b/src/Storages/RabbitMQ/RabbitMQSink.h @@ -0,0 +1,29 @@ +#pragma once + +#include +#include + + +namespace DB +{ + +class RabbitMQSink : public SinkToStorage +{ + +public: + explicit RabbitMQSink(StorageRabbitMQ & storage_, const StorageMetadataPtr & metadata_snapshot_, ContextPtr context_); + + void onStart() override; + void consume(Chunk chunk) override; + void onFinish() override; + + String getName() const override { return "RabbitMQSink"; } + +private: + StorageRabbitMQ & storage; + StorageMetadataPtr metadata_snapshot; + ContextPtr context; + ProducerBufferPtr buffer; + BlockOutputStreamPtr child; +}; +} diff --git a/src/Storages/RabbitMQ/ReadBufferFromRabbitMQConsumer.h b/src/Storages/RabbitMQ/ReadBufferFromRabbitMQConsumer.h index 1a06c5ebf60..ccc8e56db5e 100644 --- a/src/Storages/RabbitMQ/ReadBufferFromRabbitMQConsumer.h +++ b/src/Storages/RabbitMQ/ReadBufferFromRabbitMQConsumer.h @@ -56,7 +56,11 @@ public: ChannelPtr & getChannel() { return consumer_channel; } void setupChannel(); bool needChannelUpdate(); - void closeChannel() { consumer_channel->close(); } + void closeChannel() + { + if (consumer_channel) + consumer_channel->close(); + } void updateQueues(std::vector & queues_) { queues = queues_; } size_t queuesCount() { return queues.size(); } diff --git a/src/Storages/RabbitMQ/StorageRabbitMQ.cpp b/src/Storages/RabbitMQ/StorageRabbitMQ.cpp index 369f4e9eca9..44622f106f4 100644 --- a/src/Storages/RabbitMQ/StorageRabbitMQ.cpp +++ b/src/Storages/RabbitMQ/StorageRabbitMQ.cpp @@ -1,5 +1,4 @@ #include -#include #include #include #include @@ -15,7 +14,7 @@ #include #include #include -#include +#include #include #include #include @@ -265,6 +264,9 @@ size_t StorageRabbitMQ::getMaxBlockSize() const void StorageRabbitMQ::initRabbitMQ() { + if (stream_cancelled) + return; + if (use_user_setup) { queues.emplace_back(queue_base); @@ -642,9 +644,9 @@ Pipe StorageRabbitMQ::read( } -BlockOutputStreamPtr StorageRabbitMQ::write(const ASTPtr &, const StorageMetadataPtr & metadata_snapshot, ContextPtr local_context) +SinkToStoragePtr StorageRabbitMQ::write(const ASTPtr &, const StorageMetadataPtr & metadata_snapshot, ContextPtr local_context) { - return std::make_shared(*this, metadata_snapshot, local_context); + return std::make_shared(*this, metadata_snapshot, local_context); } @@ -704,10 +706,6 @@ void StorageRabbitMQ::shutdown() while (!connection->closed() && cnt_retries++ != RETRIES_MAX) event_handler->iterateLoop(); - /// Should actually force closure, if not yet closed, but it generates distracting error logs - //if (!connection->closed()) - // connection->close(true); - for (size_t i = 0; i < num_created_consumers; ++i) popReadBuffer(); } @@ -720,6 +718,22 @@ void StorageRabbitMQ::cleanupRabbitMQ() const if (use_user_setup) return; + if (!event_handler->connectionRunning()) + { + String queue_names; + for (const auto & queue : queues) + { + if (!queue_names.empty()) + queue_names += ", "; + queue_names += queue; + } + LOG_WARNING(log, + "RabbitMQ clean up not done, because there is no connection in table's shutdown." + "There are {} queues ({}), which might need to be deleted manually. 
Exchanges will be auto-deleted", + queues.size(), queue_names); + return; + } + AMQP::TcpChannel rabbit_channel(connection.get()); for (const auto & queue : queues) { diff --git a/src/Storages/RabbitMQ/StorageRabbitMQ.h b/src/Storages/RabbitMQ/StorageRabbitMQ.h index 1935dcaee0e..1a2445f3690 100644 --- a/src/Storages/RabbitMQ/StorageRabbitMQ.h +++ b/src/Storages/RabbitMQ/StorageRabbitMQ.h @@ -50,7 +50,7 @@ public: size_t max_block_size, unsigned num_streams) override; - BlockOutputStreamPtr write( + SinkToStoragePtr write( const ASTPtr & query, const StorageMetadataPtr & metadata_snapshot, ContextPtr context) override; diff --git a/src/Storages/ReadFinalForExternalReplicaStorage.cpp b/src/Storages/ReadFinalForExternalReplicaStorage.cpp index fb96bb01936..36a40beca36 100644 --- a/src/Storages/ReadFinalForExternalReplicaStorage.cpp +++ b/src/Storages/ReadFinalForExternalReplicaStorage.cpp @@ -16,6 +16,14 @@ namespace DB { +bool needRewriteQueryWithFinalForStorage(const Names & column_names, const StoragePtr & storage) +{ + const StorageMetadataPtr & metadata = storage->getInMemoryMetadataPtr(); + Block header = metadata->getSampleBlock(); + ColumnWithTypeAndName & version_column = header.getByPosition(header.columns() - 1); + return std::find(column_names.begin(), column_names.end(), version_column.name) == column_names.end(); +} + Pipe readFinalFromNestedStorage( StoragePtr nested_storage, const Names & column_names, @@ -32,20 +40,6 @@ Pipe readFinalFromNestedStorage( Block nested_header = nested_metadata->getSampleBlock(); ColumnWithTypeAndName & sign_column = nested_header.getByPosition(nested_header.columns() - 2); - ColumnWithTypeAndName & version_column = nested_header.getByPosition(nested_header.columns() - 1); - - if (ASTSelectQuery * select_query = query_info.query->as(); select_query && !column_names_set.count(version_column.name)) - { - auto & tables_in_select_query = select_query->tables()->as(); - - if (!tables_in_select_query.children.empty()) - { - auto & tables_element = tables_in_select_query.children[0]->as(); - - if (tables_element.table_expression) - tables_element.table_expression->as().final = true; - } - } String filter_column_name; Names require_columns_name = column_names; @@ -59,9 +53,6 @@ Pipe readFinalFromNestedStorage( expressions->children.emplace_back(makeASTFunction("equals", sign_column_name, fetch_sign_value)); filter_column_name = expressions->children.back()->getColumnName(); - - for (const auto & column_name : column_names) - expressions->children.emplace_back(std::make_shared(column_name)); } Pipe pipe = nested_storage->read(require_columns_name, nested_metadata, query_info, context, processed_stage, max_block_size, num_streams); diff --git a/src/Storages/ReadFinalForExternalReplicaStorage.h b/src/Storages/ReadFinalForExternalReplicaStorage.h index b54592159ef..f09a115919d 100644 --- a/src/Storages/ReadFinalForExternalReplicaStorage.h +++ b/src/Storages/ReadFinalForExternalReplicaStorage.h @@ -13,6 +13,8 @@ namespace DB { +bool needRewriteQueryWithFinalForStorage(const Names & column_names, const StoragePtr & storage); + Pipe readFinalFromNestedStorage( StoragePtr nested_storage, const Names & column_names, diff --git a/src/Storages/ReadInOrderOptimizer.cpp b/src/Storages/ReadInOrderOptimizer.cpp index 87273330b34..912d284bfc0 100644 --- a/src/Storages/ReadInOrderOptimizer.cpp +++ b/src/Storages/ReadInOrderOptimizer.cpp @@ -37,7 +37,7 @@ ReadInOrderOptimizer::ReadInOrderOptimizer( array_join_result_to_source = syntax_result->array_join_result_to_source; 
} -InputOrderInfoPtr ReadInOrderOptimizer::getInputOrder(const StorageMetadataPtr & metadata_snapshot, ContextPtr context) const +InputOrderInfoPtr ReadInOrderOptimizer::getInputOrder(const StorageMetadataPtr & metadata_snapshot, ContextPtr context, UInt64 limit) const { Names sorting_key_columns = metadata_snapshot->getSortingKeyColumns(); if (!metadata_snapshot->hasSortingKey()) @@ -155,7 +155,8 @@ InputOrderInfoPtr ReadInOrderOptimizer::getInputOrder(const StorageMetadataPtr & if (order_key_prefix_descr.empty()) return {}; - return std::make_shared(std::move(order_key_prefix_descr), read_direction); + + return std::make_shared(std::move(order_key_prefix_descr), read_direction, limit); } } diff --git a/src/Storages/ReadInOrderOptimizer.h b/src/Storages/ReadInOrderOptimizer.h index 0abf2923a98..2686d081855 100644 --- a/src/Storages/ReadInOrderOptimizer.h +++ b/src/Storages/ReadInOrderOptimizer.h @@ -22,7 +22,7 @@ public: const SortDescription & required_sort_description, const TreeRewriterResultPtr & syntax_result); - InputOrderInfoPtr getInputOrder(const StorageMetadataPtr & metadata_snapshot, ContextPtr context) const; + InputOrderInfoPtr getInputOrder(const StorageMetadataPtr & metadata_snapshot, ContextPtr context, UInt64 limit = 0) const; private: /// Actions for every element of order expression to analyze functions for monotonicity diff --git a/src/Storages/RocksDB/EmbeddedRocksDBBlockOutputStream.cpp b/src/Storages/RocksDB/EmbeddedRocksDBSink.cpp similarity index 71% rename from src/Storages/RocksDB/EmbeddedRocksDBBlockOutputStream.cpp rename to src/Storages/RocksDB/EmbeddedRocksDBSink.cpp index d7b125cb41f..ddf839b6427 100644 --- a/src/Storages/RocksDB/EmbeddedRocksDBBlockOutputStream.cpp +++ b/src/Storages/RocksDB/EmbeddedRocksDBSink.cpp @@ -1,4 +1,4 @@ -#include +#include #include #include @@ -13,14 +13,14 @@ namespace ErrorCodes extern const int ROCKSDB_ERROR; } -EmbeddedRocksDBBlockOutputStream::EmbeddedRocksDBBlockOutputStream( +EmbeddedRocksDBSink::EmbeddedRocksDBSink( StorageEmbeddedRocksDB & storage_, const StorageMetadataPtr & metadata_snapshot_) - : storage(storage_) + : SinkToStorage(metadata_snapshot_->getSampleBlock()) + , storage(storage_) , metadata_snapshot(metadata_snapshot_) { - Block sample_block = metadata_snapshot->getSampleBlock(); - for (const auto & elem : sample_block) + for (const auto & elem : getPort().getHeader()) { if (elem.name == storage.primary_key) break; @@ -28,15 +28,10 @@ EmbeddedRocksDBBlockOutputStream::EmbeddedRocksDBBlockOutputStream( } } -Block EmbeddedRocksDBBlockOutputStream::getHeader() const +void EmbeddedRocksDBSink::consume(Chunk chunk) { - return metadata_snapshot->getSampleBlock(); -} - -void EmbeddedRocksDBBlockOutputStream::write(const Block & block) -{ - metadata_snapshot->check(block, true); - auto rows = block.rows(); + auto rows = chunk.getNumRows(); + auto block = getPort().getHeader().cloneWithColumns(chunk.detachColumns()); WriteBufferFromOwnString wb_key; WriteBufferFromOwnString wb_value; diff --git a/src/Storages/RocksDB/EmbeddedRocksDBBlockOutputStream.h b/src/Storages/RocksDB/EmbeddedRocksDBSink.h similarity index 63% rename from src/Storages/RocksDB/EmbeddedRocksDBBlockOutputStream.h rename to src/Storages/RocksDB/EmbeddedRocksDBSink.h index e6229782505..e9e98c7df50 100644 --- a/src/Storages/RocksDB/EmbeddedRocksDBBlockOutputStream.h +++ b/src/Storages/RocksDB/EmbeddedRocksDBSink.h @@ -1,6 +1,6 @@ #pragma once -#include +#include namespace DB @@ -10,15 +10,15 @@ class StorageEmbeddedRocksDB; struct 
StorageInMemoryMetadata; using StorageMetadataPtr = std::shared_ptr; -class EmbeddedRocksDBBlockOutputStream : public IBlockOutputStream +class EmbeddedRocksDBSink : public SinkToStorage { public: - EmbeddedRocksDBBlockOutputStream( + EmbeddedRocksDBSink( StorageEmbeddedRocksDB & storage_, const StorageMetadataPtr & metadata_snapshot_); - Block getHeader() const override; - void write(const Block & block) override; + void consume(Chunk chunk) override; + String getName() const override { return "EmbeddedRocksDBSink"; } private: StorageEmbeddedRocksDB & storage; diff --git a/src/Storages/RocksDB/StorageEmbeddedRocksDB.cpp b/src/Storages/RocksDB/StorageEmbeddedRocksDB.cpp index 70cc173e38a..459c0879cda 100644 --- a/src/Storages/RocksDB/StorageEmbeddedRocksDB.cpp +++ b/src/Storages/RocksDB/StorageEmbeddedRocksDB.cpp @@ -1,5 +1,5 @@ #include -#include +#include #include #include @@ -28,10 +28,12 @@ #include #include +#include #include #include #include +#include #include @@ -49,6 +51,24 @@ namespace ErrorCodes } using FieldVectorPtr = std::shared_ptr; +using RocksDBOptions = std::unordered_map; + + +static RocksDBOptions getOptionsFromConfig(const Poco::Util::AbstractConfiguration & config, const std::string & path) +{ + RocksDBOptions options; + + Poco::Util::AbstractConfiguration::Keys keys; + config.keys(path, keys); + + for (const auto & key : keys) + { + const String key_path = path + "." + key; + options[key] = config.getString(key_path); + } + + return options; +} // returns keys may be filter by condition @@ -250,7 +270,9 @@ StorageEmbeddedRocksDB::StorageEmbeddedRocksDB(const StorageID & table_id_, bool attach, ContextPtr context_, const String & primary_key_) - : IStorage(table_id_), primary_key{primary_key_} + : IStorage(table_id_) + , WithContext(context_->getGlobalContext()) + , primary_key{primary_key_} { setInMemoryMetadata(metadata_); rocksdb_dir = context_->getPath() + relative_data_path_; @@ -271,14 +293,86 @@ void StorageEmbeddedRocksDB::truncate(const ASTPtr &, const StorageMetadataPtr & void StorageEmbeddedRocksDB::initDb() { - rocksdb::Options options; + rocksdb::Status status; + rocksdb::Options base; rocksdb::DB * db; - options.create_if_missing = true; - options.compression = rocksdb::CompressionType::kZSTD; - rocksdb::Status status = rocksdb::DB::Open(options, rocksdb_dir, &db); - if (status != rocksdb::Status::OK()) - throw Exception("Fail to open rocksdb path at: " + rocksdb_dir + ": " + status.ToString(), ErrorCodes::ROCKSDB_ERROR); + base.create_if_missing = true; + base.compression = rocksdb::CompressionType::kZSTD; + base.statistics = rocksdb::CreateDBStatistics(); + /// It is too verbose by default, and in fact we don't care about rocksdb logs at all. 
+ base.info_log_level = rocksdb::ERROR_LEVEL; + + rocksdb::Options merged = base; + + const auto & config = getContext()->getConfigRef(); + if (config.has("rocksdb.options")) + { + auto config_options = getOptionsFromConfig(config, "rocksdb.options"); + status = rocksdb::GetDBOptionsFromMap(merged, config_options, &merged); + if (!status.ok()) + { + throw Exception(ErrorCodes::ROCKSDB_ERROR, "Fail to merge rocksdb options from 'rocksdb.options' at: {}: {}", + rocksdb_dir, status.ToString()); + } + } + if (config.has("rocksdb.column_family_options")) + { + auto column_family_options = getOptionsFromConfig(config, "rocksdb.column_family_options"); + status = rocksdb::GetColumnFamilyOptionsFromMap(merged, column_family_options, &merged); + if (!status.ok()) + { + throw Exception(ErrorCodes::ROCKSDB_ERROR, "Fail to merge rocksdb options from 'rocksdb.options' at: {}: {}", + rocksdb_dir, status.ToString()); + } + } + + if (config.has("rocksdb.tables")) + { + auto table_name = getStorageID().getTableName(); + + Poco::Util::AbstractConfiguration::Keys keys; + config.keys("rocksdb.tables", keys); + + for (const auto & key : keys) + { + const String key_prefix = "rocksdb.tables." + key; + if (config.getString(key_prefix + ".name") != table_name) + continue; + + String config_key = key_prefix + ".options"; + if (config.has(config_key)) + { + auto table_config_options = getOptionsFromConfig(config, config_key); + status = rocksdb::GetDBOptionsFromMap(merged, table_config_options, &merged); + if (!status.ok()) + { + throw Exception(ErrorCodes::ROCKSDB_ERROR, "Fail to merge rocksdb options from '{}' at: {}: {}", + config_key, rocksdb_dir, status.ToString()); + } + } + + config_key = key_prefix + ".column_family_options"; + if (config.has(config_key)) + { + auto table_column_family_options = getOptionsFromConfig(config, config_key); + status = rocksdb::GetColumnFamilyOptionsFromMap(merged, table_column_family_options, &merged); + if (!status.ok()) + { + throw Exception(ErrorCodes::ROCKSDB_ERROR, "Fail to merge rocksdb options from '{}' at: {}: {}", + config_key, rocksdb_dir, status.ToString()); + } + } + } + } + + status = rocksdb::DB::Open(merged, rocksdb_dir, &db); + + if (!status.ok()) + { + throw Exception(ErrorCodes::ROCKSDB_ERROR, "Fail to open rocksdb path at: {}: {}", + rocksdb_dir, status.ToString()); + } rocksdb_ptr = std::unique_ptr(db); } @@ -333,10 +427,10 @@ Pipe StorageEmbeddedRocksDB::read( } } -BlockOutputStreamPtr StorageEmbeddedRocksDB::write( +SinkToStoragePtr StorageEmbeddedRocksDB::write( const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr /*context*/) { - return std::make_shared(*this, metadata_snapshot); + return std::make_shared(*this, metadata_snapshot); } @@ -364,6 +458,10 @@ static StoragePtr create(const StorageFactory::Arguments & args) return StorageEmbeddedRocksDB::create(args.table_id, args.relative_data_path, metadata, args.attach, args.getContext(), primary_key_names[0]); } +std::shared_ptr StorageEmbeddedRocksDB::getRocksDBStatistics() const +{ + return rocksdb_ptr->GetOptions().statistics; +} void registerStorageEmbeddedRocksDB(StorageFactory & factory) { diff --git a/src/Storages/RocksDB/StorageEmbeddedRocksDB.h b/src/Storages/RocksDB/StorageEmbeddedRocksDB.h index aa81bc4d35f..3f1b3b49492 100644 --- a/src/Storages/RocksDB/StorageEmbeddedRocksDB.h +++ b/src/Storages/RocksDB/StorageEmbeddedRocksDB.h @@ -8,6 +8,7 @@ namespace rocksdb { class DB; + class Statistics; } @@ -16,11 +17,11 @@ namespace DB class Context; -class 
StorageEmbeddedRocksDB final : public shared_ptr_helper, public IStorage +class StorageEmbeddedRocksDB final : public shared_ptr_helper, public IStorage, WithContext { friend struct shared_ptr_helper; friend class EmbeddedRocksDBSource; - friend class EmbeddedRocksDBBlockOutputStream; + friend class EmbeddedRocksDBSink; friend class EmbeddedRocksDBBlockInputStream; public: std::string getName() const override { return "EmbeddedRocksDB"; } @@ -34,7 +35,7 @@ public: size_t max_block_size, unsigned num_streams) override; - BlockOutputStreamPtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; + SinkToStoragePtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; void truncate(const ASTPtr &, const StorageMetadataPtr & metadata_snapshot, ContextPtr, TableExclusiveLockHolder &) override; bool supportsParallelInsert() const override { return true; } @@ -48,6 +49,8 @@ public: bool storesDataOnDisk() const override { return true; } Strings getDataPaths() const override { return {rocksdb_dir}; } + std::shared_ptr getRocksDBStatistics() const; + protected: StorageEmbeddedRocksDB(const StorageID & table_id_, const String & relative_data_path_, diff --git a/src/Storages/RocksDB/StorageSystemRocksDB.cpp b/src/Storages/RocksDB/StorageSystemRocksDB.cpp new file mode 100644 index 00000000000..7d31d5ddc21 --- /dev/null +++ b/src/Storages/RocksDB/StorageSystemRocksDB.cpp @@ -0,0 +1,129 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int LOGICAL_ERROR; +} + +} + +namespace DB +{ + + +NamesAndTypesList StorageSystemRocksDB::getNamesAndTypes() +{ + return { + { "database", std::make_shared() }, + { "table", std::make_shared() }, + { "name", std::make_shared() }, + { "value", std::make_shared() }, + }; +} + + +void StorageSystemRocksDB::fillData(MutableColumns & res_columns, ContextPtr context, const SelectQueryInfo & query_info) const +{ + const auto access = context->getAccess(); + const bool check_access_for_databases = !access->isGranted(AccessType::SHOW_TABLES); + + std::map> tables; + for (const auto & db : DatabaseCatalog::instance().getDatabases()) + { + const bool check_access_for_tables = check_access_for_databases && !access->isGranted(AccessType::SHOW_TABLES, db.first); + + for (auto iterator = db.second->getTablesIterator(context); iterator->isValid(); iterator->next()) + { + StoragePtr table = iterator->table(); + if (!table) + continue; + + if (!dynamic_cast(table.get())) + continue; + if (check_access_for_tables && !access->isGranted(AccessType::SHOW_TABLES, db.first, iterator->name())) + continue; + tables[db.first][iterator->name()] = table; + } + } + + + MutableColumnPtr col_database_mut = ColumnString::create(); + MutableColumnPtr col_table_mut = ColumnString::create(); + + for (auto & db : tables) + { + for (auto & table : db.second) + { + col_database_mut->insert(db.first); + col_table_mut->insert(table.first); + } + } + + ColumnPtr col_database_to_filter = std::move(col_database_mut); + ColumnPtr col_table_to_filter = std::move(col_table_mut); + + /// Determine what tables are needed by the conditions in the query. 
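> The statistics object enabled above via `rocksdb::CreateDBStatistics()` is what the new `system.rocksdb` table reads its ticker counters from; the RocksDB options themselves can be overridden in the server config under the `rocksdb.options`, `rocksdb.column_family_options` and per-table `rocksdb.tables` sections handled above. A hedged query sketch (the table filter is illustrative):

```
SELECT name, value
FROM system.rocksdb
WHERE database = currentDatabase() AND table = 'rocksdb_table'
ORDER BY value DESC
LIMIT 10;
```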
+ { + Block filtered_block + { + { col_database_to_filter, std::make_shared(), "database" }, + { col_table_to_filter, std::make_shared(), "table" }, + }; + + VirtualColumnUtils::filterBlockWithQuery(query_info.query, filtered_block, context); + + if (!filtered_block.rows()) + return; + + col_database_to_filter = filtered_block.getByName("database").column; + col_table_to_filter = filtered_block.getByName("table").column; + } + + bool show_zeros = context->getSettingsRef().system_events_show_zero_values; + for (size_t i = 0, tables_size = col_database_to_filter->size(); i < tables_size; ++i) + { + String database = (*col_database_to_filter)[i].safeGet(); + String table = (*col_table_to_filter)[i].safeGet(); + + auto & rocksdb_table = dynamic_cast(*tables[database][table]); + auto statistics = rocksdb_table.getRocksDBStatistics(); + if (!statistics) + throw Exception(ErrorCodes::LOGICAL_ERROR, "rocksdb statistics is not enabled"); + + for (auto [tick, name] : rocksdb::TickersNameMap) + { + UInt64 value = statistics->getTickerCount(tick); + if (!value && !show_zeros) + continue; + + /// trim "rocksdb." + if (startsWith(name, "rocksdb.")) + name = name.substr(strlen("rocksdb.")); + + size_t col_num = 0; + res_columns[col_num++]->insert(database); + res_columns[col_num++]->insert(table); + + res_columns[col_num++]->insert(name); + res_columns[col_num++]->insert(value); + } + } +} + +} diff --git a/src/Storages/RocksDB/StorageSystemRocksDB.h b/src/Storages/RocksDB/StorageSystemRocksDB.h new file mode 100644 index 00000000000..e94bdb06aee --- /dev/null +++ b/src/Storages/RocksDB/StorageSystemRocksDB.h @@ -0,0 +1,29 @@ +#pragma once + +#include +#include + + +namespace DB +{ + +class Context; + + +/** Implements the `rocksdb` system table, which expose various rocksdb metrics. 
+ */ +class StorageSystemRocksDB final : public shared_ptr_helper, public IStorageSystemOneBlock +{ + friend struct shared_ptr_helper; +public: + std::string getName() const override { return "SystemRocksDB"; } + + static NamesAndTypesList getNamesAndTypes(); + +protected: + using IStorageSystemOneBlock::IStorageSystemOneBlock; + + void fillData(MutableColumns & res_columns, ContextPtr context, const SelectQueryInfo & query_info) const override; +}; + +} diff --git a/src/Storages/SelectQueryInfo.h b/src/Storages/SelectQueryInfo.h index cf2c4d72f59..3b3c0fa1258 100644 --- a/src/Storages/SelectQueryInfo.h +++ b/src/Storages/SelectQueryInfo.h @@ -83,9 +83,10 @@ struct InputOrderInfo { SortDescription order_key_prefix_descr; int direction; + UInt64 limit; - InputOrderInfo(const SortDescription & order_key_prefix_descr_, int direction_) - : order_key_prefix_descr(order_key_prefix_descr_), direction(direction_) {} + InputOrderInfo(const SortDescription & order_key_prefix_descr_, int direction_, UInt64 limit_) + : order_key_prefix_descr(order_key_prefix_descr_), direction(direction_), limit(limit_) {} bool operator ==(const InputOrderInfo & other) const { diff --git a/src/Storages/StorageBuffer.cpp b/src/Storages/StorageBuffer.cpp index a433cd248c7..e9136bb5d05 100644 --- a/src/Storages/StorageBuffer.cpp +++ b/src/Storages/StorageBuffer.cpp @@ -5,7 +5,6 @@ #include #include #include -#include #include #include #include @@ -26,12 +25,14 @@ #include #include #include +#include #include #include #include #include #include + namespace ProfileEvents { extern const Event StorageBufferFlush; @@ -137,7 +138,7 @@ public: BufferSource(const Names & column_names_, StorageBuffer::Buffer & buffer_, const StorageBuffer & storage, const StorageMetadataPtr & metadata_snapshot) : SourceWithProgress( metadata_snapshot->getSampleBlockForColumns(column_names_, storage.getVirtuals(), storage.getStorageID())) - , column_names_and_types(metadata_snapshot->getColumns().getAllWithSubcolumns().addTypes(column_names_)) + , column_names_and_types(metadata_snapshot->getColumns().getByNames(ColumnsDescription::All, column_names_, true)) , buffer(buffer_) {} String getName() const override { return "Buffer"; } @@ -242,8 +243,8 @@ void StorageBuffer::read( { const auto & dest_columns = destination_metadata_snapshot->getColumns(); const auto & our_columns = metadata_snapshot->getColumns(); - return dest_columns.hasPhysicalOrSubcolumn(column_name) && - dest_columns.getPhysicalOrSubcolumn(column_name).type->equals(*our_columns.getPhysicalOrSubcolumn(column_name).type); + auto dest_columm = dest_columns.tryGetColumnOrSubcolumn(ColumnsDescription::AllPhysical, column_name); + return dest_columm && dest_columm->type->equals(*our_columns.getColumnOrSubcolumn(ColumnsDescription::AllPhysical, column_name).type); }); if (dst_has_same_structure) @@ -513,30 +514,30 @@ static void appendBlock(const Block & from, Block & to) } -class BufferBlockOutputStream : public IBlockOutputStream +class BufferSink : public SinkToStorage { public: - explicit BufferBlockOutputStream( + explicit BufferSink( StorageBuffer & storage_, const StorageMetadataPtr & metadata_snapshot_) - : storage(storage_) + : SinkToStorage(metadata_snapshot_->getSampleBlock()) + , storage(storage_) , metadata_snapshot(metadata_snapshot_) - {} - - Block getHeader() const override { return metadata_snapshot->getSampleBlock(); } - - void write(const Block & block) override { - if (!block) - return; - // Check table structure. 
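> The `limit` field added to `InputOrderInfo` above lets reading in primary-key order stop early for ORDER BY ... LIMIT queries. A sketch of a query shape that benefits (table name is illustrative; `optimize_read_in_order` is the pre-existing setting and is on by default):

```
SELECT key, value
FROM t_example
ORDER BY key
LIMIT 10
SETTINGS optimize_read_in_order = 1;
```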
- metadata_snapshot->check(block, true); + metadata_snapshot->check(getPort().getHeader(), true); + } - size_t rows = block.rows(); + String getName() const override { return "BufferSink"; } + + void consume(Chunk chunk) override + { + size_t rows = chunk.getNumRows(); if (!rows) return; + auto block = getPort().getHeader().cloneWithColumns(chunk.getColumns()); + StoragePtr destination; if (storage.destination_id) { @@ -642,9 +643,9 @@ private: }; -BlockOutputStreamPtr StorageBuffer::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr /*context*/) +SinkToStoragePtr StorageBuffer::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr /*context*/) { - return std::make_shared(*this, metadata_snapshot); + return std::make_shared(*this, metadata_snapshot); } diff --git a/src/Storages/StorageBuffer.h b/src/Storages/StorageBuffer.h index b4fcdf6cbc9..8749cf47eb9 100644 --- a/src/Storages/StorageBuffer.h +++ b/src/Storages/StorageBuffer.h @@ -46,7 +46,7 @@ class StorageBuffer final : public shared_ptr_helper, public ISto { friend struct shared_ptr_helper; friend class BufferSource; -friend class BufferBlockOutputStream; +friend class BufferSink; public: struct Thresholds @@ -84,7 +84,7 @@ public: bool supportsSubcolumns() const override { return true; } - BlockOutputStreamPtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; + SinkToStoragePtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; void startup() override; /// Flush all buffers into the subordinate table and stop background thread. diff --git a/src/Storages/StorageDictionary.cpp b/src/Storages/StorageDictionary.cpp index 4c31f62b21f..30a9dad8d91 100644 --- a/src/Storages/StorageDictionary.cpp +++ b/src/Storages/StorageDictionary.cpp @@ -167,7 +167,8 @@ Pipe StorageDictionary::read( const size_t max_block_size, const unsigned /*threads*/) { - auto dictionary = getContext()->getExternalDictionariesLoader().getDictionary(dictionary_name, local_context); + auto registered_dictionary_name = location == Location::SameDatabaseAndNameAsDictionary ? getStorageID().getInternalDictionaryName() : dictionary_name; + auto dictionary = getContext()->getExternalDictionariesLoader().getDictionary(registered_dictionary_name, local_context); auto stream = dictionary->getBlockInputStream(column_names, max_block_size); /// TODO: update dictionary interface for processors. 
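> The change above makes `StorageDictionary::read` resolve the dictionary by the name it is actually registered under when the table was created together with the dictionary, which is the path taken when a dictionary is queried as a table. A sketch (all names are illustrative):

```
CREATE DICTIONARY db_example.dict_example
(
    id UInt64,
    value String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 USER 'default' DB 'db_example' TABLE 'dict_source'))
LAYOUT(FLAT())
LIFETIME(MIN 0 MAX 300);

-- Reading the dictionary through its table representation goes through StorageDictionary::read:
SELECT * FROM db_example.dict_example LIMIT 10;
```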
return Pipe(std::make_shared(stream)); diff --git a/src/Storages/StorageDistributed.cpp b/src/Storages/StorageDistributed.cpp index 21fa06e19f0..9aaa6692560 100644 --- a/src/Storages/StorageDistributed.cpp +++ b/src/Storages/StorageDistributed.cpp @@ -4,13 +4,13 @@ #include -#include +#include #include #include #include -#include +#include #include #include @@ -57,8 +57,9 @@ #include #include #include +#include #include -#include +#include #include #include @@ -290,26 +291,27 @@ void replaceConstantExpressions( /// - QueryProcessingStage::WithMergeableStateAfterAggregation /// - QueryProcessingStage::WithMergeableStateAfterAggregationAndLimit /// - none (in this case regular WithMergeableState should be used) -std::optional getOptimizedQueryProcessingStage(const SelectQueryInfo & query_info, bool extremes, const Block & sharding_key_block) +std::optional getOptimizedQueryProcessingStage(const SelectQueryInfo & query_info, bool extremes, const Names & sharding_key_columns) { const auto & select = query_info.query->as(); - auto sharding_block_has = [&](const auto & exprs, size_t limit = SIZE_MAX) -> bool + auto sharding_block_has = [&](const auto & exprs) -> bool { - size_t i = 0; + std::unordered_set expr_columns; for (auto & expr : exprs) { - ++i; - if (i > limit) - break; - auto id = expr->template as(); if (!id) - return false; - /// TODO: if GROUP BY contains multiIf()/if() it should contain only columns from sharding_key - if (!sharding_key_block.has(id->name())) + continue; + expr_columns.emplace(id->name()); + } + + for (const auto & column : sharding_key_columns) + { + if (!expr_columns.contains(column)) return false; } + return true; }; @@ -343,7 +345,7 @@ std::optional getOptimizedQueryProcessingStage(const } else { - if (!sharding_block_has(group_by->children, 1)) + if (!sharding_block_has(group_by->children)) return {}; } @@ -547,8 +549,7 @@ QueryProcessingStage::Enum StorageDistributed::getQueryProcessingStage( has_sharding_key && (settings.allow_nondeterministic_optimize_skip_unused_shards || sharding_key_is_deterministic)) { - Block sharding_key_block = sharding_key_expr->getSampleBlock(); - auto stage = getOptimizedQueryProcessingStage(query_info, settings.extremes, sharding_key_block); + auto stage = getOptimizedQueryProcessingStage(query_info, settings.extremes, sharding_key_expr->getRequiredColumns()); if (stage) { LOG_DEBUG(log, "Force processing stage to {}", QueryProcessingStage::toString(*stage)); @@ -602,25 +603,25 @@ void StorageDistributed::read( return; } - const Scalars & scalars = local_context->hasQueryContext() ? local_context->getQueryContext()->getScalars() : Scalars{}; - bool has_virtual_shard_num_column = std::find(column_names.begin(), column_names.end(), "_shard_num") != column_names.end(); if (has_virtual_shard_num_column && !isVirtualColumn("_shard_num", metadata_snapshot)) has_virtual_shard_num_column = false; - ClusterProxy::SelectStreamFactory select_stream_factory = remote_table_function_ptr - ? 
ClusterProxy::SelectStreamFactory( - header, processed_stage, remote_table_function_ptr, scalars, has_virtual_shard_num_column, local_context->getExternalTables()) - : ClusterProxy::SelectStreamFactory( + StorageID main_table = StorageID::createEmpty(); + if (!remote_table_function_ptr) + main_table = StorageID{remote_database, remote_table}; + + ClusterProxy::SelectStreamFactory select_stream_factory = + ClusterProxy::SelectStreamFactory( header, processed_stage, - StorageID{remote_database, remote_table}, - scalars, - has_virtual_shard_num_column, - local_context->getExternalTables()); + has_virtual_shard_num_column); - ClusterProxy::executeQuery(query_plan, select_stream_factory, log, - modified_query_ast, local_context, query_info, + ClusterProxy::executeQuery( + query_plan, header, processed_stage, + main_table, remote_table_function_ptr, + select_stream_factory, log, modified_query_ast, + local_context, query_info, sharding_key_expr, sharding_key_column_name, query_info.cluster); @@ -630,7 +631,7 @@ void StorageDistributed::read( } -BlockOutputStreamPtr StorageDistributed::write(const ASTPtr &, const StorageMetadataPtr & metadata_snapshot, ContextPtr local_context) +SinkToStoragePtr StorageDistributed::write(const ASTPtr &, const StorageMetadataPtr & metadata_snapshot, ContextPtr local_context) { auto cluster = getCluster(); const auto & settings = local_context->getSettingsRef(); @@ -668,7 +669,7 @@ BlockOutputStreamPtr StorageDistributed::write(const ASTPtr &, const StorageMeta sample_block = metadata_snapshot->getSampleBlock(); /// DistributedBlockOutputStream will not own cluster, but will own ConnectionPools of the cluster - return std::make_shared( + return std::make_shared( local_context, *this, metadata_snapshot, createInsertToRemoteTableQuery(remote_database, remote_table, sample_block), cluster, insert_sync, timeout, StorageID{remote_database, remote_table}); @@ -739,9 +740,10 @@ QueryPipelinePtr StorageDistributed::distributedWrite(const ASTInsertQuery & que "Expected exactly one connection for shard " + toString(shard_info.shard_num), ErrorCodes::LOGICAL_ERROR); /// INSERT SELECT query returns empty block - auto in_stream = std::make_shared(std::move(connections), new_query_str, Block{}, local_context); + auto remote_query_executor + = std::make_shared(shard_info.pool, std::move(connections), new_query_str, Block{}, local_context); pipelines.emplace_back(std::make_unique()); - pipelines.back()->init(Pipe(std::make_shared(std::move(in_stream)))); + pipelines.back()->init(Pipe(std::make_shared(remote_query_executor, false, settings.async_socket_for_remote))); pipelines.back()->setSinks([](const Block & header, QueryPipeline::StreamType) -> ProcessorPtr { return std::make_shared(header); @@ -1137,6 +1139,18 @@ ActionLock StorageDistributed::getActionLock(StorageActionBlockType type) return {}; } +void StorageDistributed::flush() +{ + try + { + flushClusterNodesAllData(getContext()); + } + catch (...) 
+ { + tryLogCurrentException(log, "Cannot flush"); + } +} + void StorageDistributed::flushClusterNodesAllData(ContextPtr local_context) { /// Sync SYSTEM FLUSH DISTRIBUTED with TRUNCATE @@ -1292,8 +1306,11 @@ void registerStorageDistributed(StorageFactory & factory) String cluster_name = getClusterNameAndMakeLiteral(engine_args[0]); - engine_args[1] = evaluateConstantExpressionOrIdentifierAsLiteral(engine_args[1], args.getLocalContext()); - engine_args[2] = evaluateConstantExpressionOrIdentifierAsLiteral(engine_args[2], args.getLocalContext()); + const ContextPtr & context = args.getContext(); + const ContextPtr & local_context = args.getLocalContext(); + + engine_args[1] = evaluateConstantExpressionOrIdentifierAsLiteral(engine_args[1], local_context); + engine_args[2] = evaluateConstantExpressionOrIdentifierAsLiteral(engine_args[2], local_context); String remote_database = engine_args[1]->as().value.safeGet(); String remote_table = engine_args[2]->as().value.safeGet(); @@ -1304,7 +1321,7 @@ void registerStorageDistributed(StorageFactory & factory) /// Check that sharding_key exists in the table and has numeric type. if (sharding_key) { - auto sharding_expr = buildShardingKeyExpression(sharding_key, args.getContext(), args.columns.getAllPhysical(), true); + auto sharding_expr = buildShardingKeyExpression(sharding_key, context, args.columns.getAllPhysical(), true); const Block & block = sharding_expr->getSampleBlock(); if (block.columns() != 1) @@ -1335,6 +1352,16 @@ void registerStorageDistributed(StorageFactory & factory) "bytes_to_throw_insert cannot be less or equal to bytes_to_delay_insert (since it is handled first)"); } + /// Set default values from the distributed_directory_monitor_* global context settings. + if (!distributed_settings.monitor_batch_inserts.changed) + distributed_settings.monitor_batch_inserts = context->getSettingsRef().distributed_directory_monitor_batch_inserts; + if (!distributed_settings.monitor_split_batch_on_failure.changed) + distributed_settings.monitor_split_batch_on_failure = context->getSettingsRef().distributed_directory_monitor_split_batch_on_failure; + if (!distributed_settings.monitor_sleep_time_ms.changed) + distributed_settings.monitor_sleep_time_ms = Poco::Timespan(context->getSettingsRef().distributed_directory_monitor_sleep_time_ms); + if (!distributed_settings.monitor_max_sleep_time_ms.changed) + distributed_settings.monitor_max_sleep_time_ms = Poco::Timespan(context->getSettingsRef().distributed_directory_monitor_max_sleep_time_ms); + return StorageDistributed::create( args.table_id, args.columns, @@ -1343,7 +1370,7 @@ void registerStorageDistributed(StorageFactory & factory) remote_database, remote_table, cluster_name, - args.getContext(), + context, sharding_key, storage_policy, args.relative_data_path, diff --git a/src/Storages/StorageDistributed.h b/src/Storages/StorageDistributed.h index c63abbc6aa4..4331817386e 100644 --- a/src/Storages/StorageDistributed.h +++ b/src/Storages/StorageDistributed.h @@ -39,7 +39,7 @@ using ExpressionActionsPtr = std::shared_ptr; class StorageDistributed final : public shared_ptr_helper, public IStorage, WithContext { friend struct shared_ptr_helper; - friend class DistributedBlockOutputStream; + friend class DistributedSink; friend class StorageDistributedDirectoryMonitor; friend class StorageSystemDistributionQueue; @@ -81,7 +81,7 @@ public: bool supportsParallelInsert() const override { return true; } std::optional totalBytes(const Settings &) const override; - BlockOutputStreamPtr write(const ASTPtr & 
query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; + SinkToStoragePtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; QueryPipelinePtr distributedWrite(const ASTInsertQuery & query, ContextPtr context) override; @@ -98,6 +98,7 @@ public: void startup() override; void shutdown() override; + void flush() override; void drop() override; bool storesDataOnDisk() const override { return true; } diff --git a/src/Storages/StorageFactory.cpp b/src/Storages/StorageFactory.cpp index 5ca423b449a..cfa50b95487 100644 --- a/src/Storages/StorageFactory.cpp +++ b/src/Storages/StorageFactory.cpp @@ -222,7 +222,7 @@ StoragePtr StorageFactory::get( storage_def->engine->arguments->children = empty_engine_args; } - if (local_context->hasQueryContext() && context->getSettingsRef().log_queries) + if (local_context->hasQueryContext() && local_context->getSettingsRef().log_queries) local_context->getQueryContext()->addQueryFactoriesInfo(Context::QueryLogFactories::Storage, name); return res; diff --git a/src/Storages/StorageFile.cpp b/src/Storages/StorageFile.cpp index efd59255c9e..cc8e397b668 100644 --- a/src/Storages/StorageFile.cpp +++ b/src/Storages/StorageFile.cpp @@ -9,15 +9,16 @@ #include #include +#include #include #include #include #include #include -#include #include -#include +#include +#include #include #include @@ -25,6 +26,7 @@ #include #include +#include #include #include @@ -35,6 +37,7 @@ #include #include #include +#include namespace fs = std::filesystem; @@ -46,7 +49,6 @@ namespace ErrorCodes { extern const int BAD_ARGUMENTS; extern const int NOT_IMPLEMENTED; - extern const int CANNOT_SEEK_THROUGH_FILE; extern const int CANNOT_TRUNCATE_FILE; extern const int DATABASE_ACCESS_DENIED; extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; @@ -55,6 +57,7 @@ namespace ErrorCodes extern const int FILE_DOESNT_EXIST; extern const int TIMEOUT_EXCEEDED; extern const int INCOMPATIBLE_COLUMNS; + extern const int CANNOT_STAT; } namespace @@ -169,10 +172,6 @@ StorageFile::StorageFile(int table_fd_, CommonArguments args) is_db_table = false; use_table_fd = true; table_fd = table_fd_; - - /// Save initial offset, it will be used for repeating SELECTs - /// If FD isn't seekable (lseek returns -1), then the second and subsequent SELECTs will fail. 
- table_fd_init_offset = lseek(table_fd, 0, SEEK_CUR); } StorageFile::StorageFile(const std::string & table_path_, const std::string & user_files_path, CommonArguments args) @@ -187,7 +186,7 @@ StorageFile::StorageFile(const std::string & table_path_, const std::string & us throw Exception("Cannot get table structure from file, because no files match specified name", ErrorCodes::INCORRECT_FILE_NAME); auto & first_path = paths[0]; - Block header = StorageDistributedDirectoryMonitor::createStreamFromFile(first_path)->getHeader(); + Block header = StorageDistributedDirectoryMonitor::createSourceFromFile(first_path)->getOutputs().front().getHeader(); StorageInMemoryMetadata storage_metadata; auto columns = ColumnsDescription(header.getNamesAndTypesList()); @@ -276,7 +275,8 @@ public: const FilesInfoPtr & files_info) { if (storage->isColumnOriented()) - return metadata_snapshot->getSampleBlockForColumns(columns_description.getNamesOfPhysical(), storage->getVirtuals(), storage->getStorageID()); + return metadata_snapshot->getSampleBlockForColumns( + columns_description.getNamesOfPhysical(), storage->getVirtuals(), storage->getStorageID()); else return getHeader(metadata_snapshot, files_info->need_path_column, files_info->need_file_column); } @@ -296,28 +296,7 @@ public: , context(context_) , max_block_size(max_block_size_) { - if (storage->use_table_fd) - { - unique_lock = std::unique_lock(storage->rwlock, getLockTimeout(context)); - if (!unique_lock) - throw Exception("Lock timeout exceeded", ErrorCodes::TIMEOUT_EXCEEDED); - - /// We could use common ReadBuffer and WriteBuffer in storage to leverage cache - /// and add ability to seek unseekable files, but cache sync isn't supported. - - if (storage->table_fd_was_used) /// We need seek to initial position - { - if (storage->table_fd_init_offset < 0) - throw Exception("File descriptor isn't seekable, inside " + storage->getName(), ErrorCodes::CANNOT_SEEK_THROUGH_FILE); - - /// ReadBuffer's seek() doesn't make sense, since cache is empty - if (lseek(storage->table_fd, storage->table_fd_init_offset, SEEK_SET) < 0) - throwFromErrno("Cannot seek file descriptor, inside " + storage->getName(), ErrorCodes::CANNOT_SEEK_THROUGH_FILE); - } - - storage->table_fd_was_used = true; - } - else + if (!storage->use_table_fd) { shared_lock = std::shared_lock(storage->rwlock, getLockTimeout(context)); if (!shared_lock) @@ -348,7 +327,9 @@ public: /// Special case for distributed format. Defaults are not needed here. if (storage->format_name == "Distributed") { - reader = StorageDistributedDirectoryMonitor::createStreamFromFile(current_path); + pipeline = std::make_unique(); + pipeline->init(Pipe(StorageDistributedDirectoryMonitor::createSourceFromFile(current_path))); + reader = std::make_unique(*pipeline); continue; } } @@ -356,14 +337,32 @@ public: std::unique_ptr nested_buffer; CompressionMethod method; + struct stat file_stat{}; + if (storage->use_table_fd) { - nested_buffer = std::make_unique(storage->table_fd); + /// Check if file descriptor allows random reads (and reading it twice). 
+ if (0 != fstat(storage->table_fd, &file_stat)) + throwFromErrno("Cannot stat table file descriptor, inside " + storage->getName(), ErrorCodes::CANNOT_STAT); + + if (S_ISREG(file_stat.st_mode)) + nested_buffer = std::make_unique(storage->table_fd); + else + nested_buffer = std::make_unique(storage->table_fd); + method = chooseCompressionMethod("", storage->compression_method); } else { - nested_buffer = std::make_unique(current_path); + /// Check if the file allows random reads (and reading it twice). + if (0 != stat(current_path.c_str(), &file_stat)) + throwFromErrno("Cannot stat file " + current_path, ErrorCodes::CANNOT_STAT); + + if (S_ISREG(file_stat.st_mode)) + nested_buffer = std::make_unique(current_path, context->getSettingsRef().max_read_buffer_size); + else + nested_buffer = std::make_unique(current_path, context->getSettingsRef().max_read_buffer_size); + method = chooseCompressionMethod(current_path, storage->compression_method); } @@ -386,24 +385,31 @@ public: auto format = FormatFactory::instance().getInput( storage->format_name, *read_buf, get_block_for_format(), context, max_block_size, storage->format_settings); - reader = std::make_shared(format); + pipeline = std::make_unique(); + pipeline->init(Pipe(format)); if (columns_description.hasDefaults()) - reader = std::make_shared(reader, columns_description, context); + { + pipeline->addSimpleTransform([&](const Block & header) + { + return std::make_shared(header, columns_description, *format, context); + }); + } - reader->readPrefix(); + reader = std::make_unique(*pipeline); } - if (auto res = reader->read()) + Chunk chunk; + if (reader->pull(chunk)) { - Columns columns = res.getColumns(); - UInt64 num_rows = res.rows(); + //Columns columns = res.getColumns(); + UInt64 num_rows = chunk.getNumRows(); /// Enrich with virtual columns. if (files_info->need_path_column) { auto column = DataTypeString().createColumnConst(num_rows, current_path); - columns.push_back(column->convertToFullColumnIfConst()); + chunk.addColumn(column->convertToFullColumnIfConst()); } if (files_info->need_file_column) @@ -412,10 +418,10 @@ public: auto file_name = current_path.substr(last_slash_pos + 1); auto column = DataTypeString().createColumnConst(num_rows, std::move(file_name)); - columns.push_back(column->convertToFullColumnIfConst()); + chunk.addColumn(column->convertToFullColumnIfConst()); } - return Chunk(std::move(columns), num_rows); + return chunk; } /// Read only once for file descriptor. @@ -423,8 +429,8 @@ public: finished_generate = true; /// Close file prematurely if stream was ended.
- reader->readSuffix(); reader.reset(); + pipeline.reset(); read_buf.reset(); } @@ -439,7 +445,8 @@ private: String current_path; Block sample_block; std::unique_ptr read_buf; - BlockInputStreamPtr reader; + std::unique_ptr pipeline; + std::unique_ptr reader; ColumnsDescription columns_description; @@ -449,9 +456,9 @@ private: bool finished_generate = false; std::shared_lock shared_lock; - std::unique_lock unique_lock; }; + Pipe StorageFile::read( const Names & column_names, const StorageMetadataPtr & metadata_snapshot, @@ -466,6 +473,7 @@ Pipe StorageFile::read( if (use_table_fd) /// need to call ctr BlockInputStream paths = {""}; /// when use fd, paths are empty else + { if (paths.size() == 1 && !fs::exists(paths[0])) { if (context->getSettingsRef().engine_file_empty_if_not_exists) @@ -473,7 +481,7 @@ Pipe StorageFile::read( else throw Exception("File " + paths[0] + " doesn't exist", ErrorCodes::FILE_DOESNT_EXIST); } - + } auto files_info = std::make_shared(); files_info->files = paths; @@ -509,6 +517,7 @@ Pipe StorageFile::read( else return metadata_snapshot->getColumns(); }; + pipes.emplace_back(std::make_shared( this_ptr, metadata_snapshot, context, max_block_size, files_info, get_columns_for_format())); } @@ -517,10 +526,10 @@ Pipe StorageFile::read( } -class StorageFileBlockOutputStream : public IBlockOutputStream +class StorageFileSink final : public SinkToStorage { public: - explicit StorageFileBlockOutputStream( + explicit StorageFileSink( StorageFile & storage_, const StorageMetadataPtr & metadata_snapshot_, std::unique_lock && lock_, @@ -528,7 +537,8 @@ public: ContextPtr context, const std::optional & format_settings, int & flags) - : storage(storage_) + : SinkToStorage(metadata_snapshot_->getSampleBlock()) + , storage(storage_) , metadata_snapshot(metadata_snapshot_) , lock(std::move(lock_)) { @@ -538,11 +548,6 @@ public: std::unique_ptr naked_buffer = nullptr; if (storage.use_table_fd) { - /** NOTE: Using real file bounded to FD may be misleading: - * SELECT *; INSERT insert_data; SELECT *; last SELECT returns initil_fd_data + insert_data - * INSERT data; SELECT *; last SELECT returns only insert_data - */ - storage.table_fd_was_used = true; naked_buffer = std::make_unique(storage.table_fd, DBMS_DEFAULT_BUFFER_SIZE); } else @@ -564,29 +569,29 @@ public: {}, format_settings); } - Block getHeader() const override { return metadata_snapshot->getSampleBlock(); } + String getName() const override { return "StorageFileSink"; } - void write(const Block & block) override - { - writer->write(block); - } - - void writePrefix() override + void onStart() override { if (!prefix_written) writer->writePrefix(); prefix_written = true; } - void writeSuffix() override + void consume(Chunk chunk) override + { + writer->write(getPort().getHeader().cloneWithColumns(chunk.detachColumns())); + } + + void onFinish() override { writer->writeSuffix(); } - void flush() override - { - writer->flush(); - } + // void flush() override + // { + // writer->flush(); + // } private: StorageFile & storage; @@ -597,7 +602,7 @@ private: bool prefix_written{false}; }; -BlockOutputStreamPtr StorageFile::write( +SinkToStoragePtr StorageFile::write( const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr context) @@ -617,7 +622,7 @@ BlockOutputStreamPtr StorageFile::write( fs::create_directories(fs::path(path).parent_path()); } - return std::make_shared( + return std::make_shared( *this, metadata_snapshot, std::unique_lock{rwlock, getLockTimeout(context)}, diff --git 
a/src/Storages/StorageFile.h b/src/Storages/StorageFile.h index 843cd405828..b80333e7ba8 100644 --- a/src/Storages/StorageFile.h +++ b/src/Storages/StorageFile.h @@ -29,7 +29,7 @@ public: size_t max_block_size, unsigned num_streams) override; - BlockOutputStreamPtr write( + SinkToStoragePtr write( const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; @@ -68,7 +68,7 @@ public: protected: friend class StorageFileSource; - friend class StorageFileBlockOutputStream; + friend class StorageFileSink; /// From file descriptor StorageFile(int table_fd_, CommonArguments args); @@ -95,10 +95,8 @@ private: std::string base_path; std::vector paths; - bool is_db_table = true; /// Table is stored in real database, not user's file - bool use_table_fd = false; /// Use table_fd instead of path - std::atomic table_fd_was_used{false}; /// To detect repeating reads from stdin - off_t table_fd_init_offset = -1; /// Initial position of fd, used for repeating reads + bool is_db_table = true; /// Table is stored in real database, not user's file + bool use_table_fd = false; /// Use table_fd instead of path mutable std::shared_timed_mutex rwlock; diff --git a/src/Storages/StorageInMemoryMetadata.cpp b/src/Storages/StorageInMemoryMetadata.cpp index 28574d6fdf1..dad83f64c70 100644 --- a/src/Storages/StorageInMemoryMetadata.cpp +++ b/src/Storages/StorageInMemoryMetadata.cpp @@ -320,23 +320,31 @@ Block StorageInMemoryMetadata::getSampleBlockForColumns( { Block res; - auto all_columns = getColumns().getAllWithSubcolumns(); - std::unordered_map columns_map; - columns_map.reserve(all_columns.size()); +#if !defined(ARCADIA_BUILD) + google::dense_hash_map virtuals_map; +#else + google::sparsehash::dense_hash_map virtuals_map; +#endif - for (const auto & elem : all_columns) - columns_map.emplace(elem.name, elem.type); + virtuals_map.set_empty_key(StringRef()); /// Virtual columns must be appended after ordinary, because user can /// override them. for (const auto & column : virtuals) - columns_map.emplace(column.name, column.type); + virtuals_map.emplace(column.name, &column.type); for (const auto & name : column_names) { - auto it = columns_map.find(name); - if (it != columns_map.end()) - res.insert({it->second->createColumn(), it->second, it->first}); + auto column = getColumns().tryGetColumnOrSubcolumn(ColumnsDescription::All, name); + if (column) + { + res.insert({column->type->createColumn(), column->type, column->name}); + } + else if (auto it = virtuals_map.find(name); it != virtuals_map.end()) + { + const auto & type = *it->second; + res.insert({type->createColumn(), type, name}); + } else throw Exception( "Column " + backQuote(name) + " not found in table " + (storage_id.empty() ? "" : storage_id.getNameForLogs()), @@ -508,26 +516,31 @@ namespace void StorageInMemoryMetadata::check(const Names & column_names, const NamesAndTypesList & virtuals, const StorageID & storage_id) const { - NamesAndTypesList available_columns = getColumns().getAllPhysicalWithSubcolumns(); - available_columns.insert(available_columns.end(), virtuals.begin(), virtuals.end()); - - const String list_of_columns = listOfColumns(available_columns); - if (column_names.empty()) - throw Exception("Empty list of columns queried. 
There are columns: " + list_of_columns, ErrorCodes::EMPTY_LIST_OF_COLUMNS_QUERIED); - - const auto columns_map = getColumnsMap(available_columns); + { + auto list_of_columns = listOfColumns(getColumns().getAllPhysicalWithSubcolumns()); + throw Exception(ErrorCodes::EMPTY_LIST_OF_COLUMNS_QUERIED, + "Empty list of columns queried. There are columns: {}", list_of_columns); + } + const auto virtuals_map = getColumnsMap(virtuals); auto unique_names = initUniqueStrings(); + for (const auto & name : column_names) { - if (columns_map.end() == columns_map.find(name)) - throw Exception( - "There is no column with name " + backQuote(name) + " in table " + storage_id.getNameForLogs() + ". There are columns: " + list_of_columns, - ErrorCodes::NO_SUCH_COLUMN_IN_TABLE); + bool has_column = getColumns().hasColumnOrSubcolumn(ColumnsDescription::AllPhysical, name) || virtuals_map.count(name); + + if (!has_column) + { + auto list_of_columns = listOfColumns(getColumns().getAllPhysicalWithSubcolumns()); + throw Exception(ErrorCodes::NO_SUCH_COLUMN_IN_TABLE, + "There is no column with name {} in table {}. There are columns: {}", + backQuote(name), storage_id.getNameForLogs(), list_of_columns); + } if (unique_names.end() != unique_names.find(name)) - throw Exception("Column " + name + " queried more than once", ErrorCodes::COLUMN_QUERIED_MORE_THAN_ONCE); + throw Exception(ErrorCodes::COLUMN_QUERIED_MORE_THAN_ONCE, "Column {} queried more than once", name); + unique_names.insert(name); } } diff --git a/src/Storages/StorageInMemoryMetadata.h b/src/Storages/StorageInMemoryMetadata.h index 861cb5866ee..d0d60f608d7 100644 --- a/src/Storages/StorageInMemoryMetadata.h +++ b/src/Storages/StorageInMemoryMetadata.h @@ -28,7 +28,6 @@ struct StorageInMemoryMetadata ConstraintsDescription constraints; /// Table projections. Currently supported for MergeTree only. ProjectionsDescription projections; - mutable const ProjectionDescription * selected_projection{}; /// PARTITION BY expression. Currently supported for MergeTree only. KeyDescription partition_key; /// PRIMARY KEY expression. If absent, than equal to order_by_ast. diff --git a/src/Storages/StorageInput.cpp b/src/Storages/StorageInput.cpp index 63b440aff08..d707d7a6cdf 100644 --- a/src/Storages/StorageInput.cpp +++ b/src/Storages/StorageInput.cpp @@ -3,7 +3,6 @@ #include -#include #include #include #include @@ -46,9 +45,9 @@ public: }; -void StorageInput::setInputStream(BlockInputStreamPtr input_stream_) +void StorageInput::setPipe(Pipe pipe_) { - input_stream = input_stream_; + pipe = std::move(pipe_); } @@ -71,10 +70,10 @@ Pipe StorageInput::read( return Pipe(std::make_shared(query_context, metadata_snapshot->getSampleBlock())); } - if (!input_stream) + if (pipe.empty()) throw Exception("Input stream is not initialized, input() must be used only in INSERT SELECT query", ErrorCodes::INVALID_USAGE_OF_INPUT); - return Pipe(std::make_shared(input_stream)); + return std::move(pipe); } } diff --git a/src/Storages/StorageInput.h b/src/Storages/StorageInput.h index 4b04907bea2..c3c40cab7a5 100644 --- a/src/Storages/StorageInput.h +++ b/src/Storages/StorageInput.h @@ -2,6 +2,7 @@ #include #include +#include namespace DB { @@ -15,7 +16,7 @@ public: String getName() const override { return "Input"; } /// A table will read from this stream. 
- void setInputStream(BlockInputStreamPtr input_stream_); + void setPipe(Pipe pipe_); Pipe read( const Names & column_names, @@ -27,7 +28,7 @@ public: unsigned num_streams) override; private: - BlockInputStreamPtr input_stream; + Pipe pipe; protected: StorageInput(const StorageID & table_id, const ColumnsDescription & columns_); diff --git a/src/Storages/StorageJoin.cpp b/src/Storages/StorageJoin.cpp index c3061ce9c51..5c5b12c7475 100644 --- a/src/Storages/StorageJoin.cpp +++ b/src/Storages/StorageJoin.cpp @@ -6,7 +6,6 @@ #include #include #include -#include #include #include #include @@ -68,7 +67,7 @@ StorageJoin::StorageJoin( restore(); } -BlockOutputStreamPtr StorageJoin::write(const ASTPtr & query, const StorageMetadataPtr & metadata_snapshot, ContextPtr context) +SinkToStoragePtr StorageJoin::write(const ASTPtr & query, const StorageMetadataPtr & metadata_snapshot, ContextPtr context) { std::lock_guard mutate_lock(mutate_mutex); return StorageSetOrJoinBase::write(query, metadata_snapshot, context); diff --git a/src/Storages/StorageJoin.h b/src/Storages/StorageJoin.h index cf28e9d4dd0..6a08773ecc8 100644 --- a/src/Storages/StorageJoin.h +++ b/src/Storages/StorageJoin.h @@ -45,7 +45,7 @@ public: /// (but not during processing whole query, it's safe for joinGet that doesn't involve `used_flags` from HashJoin) ColumnWithTypeAndName joinGet(const Block & block, const Block & block_with_columns_to_add) const; - BlockOutputStreamPtr write(const ASTPtr & query, const StorageMetadataPtr & metadata_snapshot, ContextPtr context) override; + SinkToStoragePtr write(const ASTPtr & query, const StorageMetadataPtr & metadata_snapshot, ContextPtr context) override; Pipe read( const Names & column_names, diff --git a/src/Storages/StorageLog.cpp b/src/Storages/StorageLog.cpp index b43cb6d71a0..0e156f24cc2 100644 --- a/src/Storages/StorageLog.cpp +++ b/src/Storages/StorageLog.cpp @@ -16,7 +16,6 @@ #include -#include #include #include @@ -26,6 +25,7 @@ #include "StorageLogSettings.h" #include #include +#include #include #include @@ -205,12 +205,13 @@ void LogSource::readData(const NameAndTypePair & name_and_type, ColumnPtr & colu } -class LogBlockOutputStream final : public IBlockOutputStream +class LogSink final : public SinkToStorage { public: - explicit LogBlockOutputStream( + explicit LogSink( StorageLog & storage_, const StorageMetadataPtr & metadata_snapshot_, std::unique_lock && lock_) - : storage(storage_) + : SinkToStorage(metadata_snapshot_->getSampleBlock()) + , storage(storage_) , metadata_snapshot(metadata_snapshot_) , lock(std::move(lock_)) , marks_stream( @@ -228,7 +229,9 @@ public: } } - ~LogBlockOutputStream() override + String getName() const override { return "LogSink"; } + + ~LogSink() override { try { @@ -245,9 +248,8 @@ public: } } - Block getHeader() const override { return metadata_snapshot->getSampleBlock(); } - void write(const Block & block) override; - void writeSuffix() override; + void consume(Chunk chunk) override; + void onFinish() override; private: StorageLog & storage; @@ -302,8 +304,9 @@ private: }; -void LogBlockOutputStream::write(const Block & block) +void LogSink::consume(Chunk chunk) { + auto block = getPort().getHeader().cloneWithColumns(chunk.detachColumns()); metadata_snapshot->check(block, true); /// The set of written offset columns so that you do not write shared offsets of columns for nested structures multiple times @@ -322,14 +325,14 @@ void LogBlockOutputStream::write(const Block & block) } -void LogBlockOutputStream::writeSuffix() +void 
LogSink::onFinish() { if (done) return; WrittenStreams written_streams; ISerialization::SerializeBinaryBulkSettings settings; - for (const auto & column : getHeader()) + for (const auto & column : getPort().getHeader()) { auto it = serialize_states.find(column.name); if (it != serialize_states.end()) @@ -366,7 +369,7 @@ void LogBlockOutputStream::writeSuffix() } -ISerialization::OutputStreamGetter LogBlockOutputStream::createStreamGetter(const NameAndTypePair & name_and_type, +ISerialization::OutputStreamGetter LogSink::createStreamGetter(const NameAndTypePair & name_and_type, WrittenStreams & written_streams) { return [&] (const ISerialization::SubstreamPath & path) -> WriteBuffer * @@ -384,7 +387,7 @@ ISerialization::OutputStreamGetter LogBlockOutputStream::createStreamGetter(cons } -void LogBlockOutputStream::writeData(const NameAndTypePair & name_and_type, const IColumn & column, +void LogSink::writeData(const NameAndTypePair & name_and_type, const IColumn & column, MarksForColumns & out_marks, WrittenStreams & written_streams) { ISerialization::SerializeBinaryBulkSettings settings; @@ -443,7 +446,7 @@ void LogBlockOutputStream::writeData(const NameAndTypePair & name_and_type, cons } -void LogBlockOutputStream::writeMarks(MarksForColumns && marks) +void LogSink::writeMarks(MarksForColumns && marks) { if (marks.size() != storage.file_count) throw Exception("Wrong number of marks generated from block. Makes no sense.", ErrorCodes::LOGICAL_ERROR); @@ -660,7 +663,7 @@ Pipe StorageLog::read( auto lock_timeout = getLockTimeout(context); loadMarks(lock_timeout); - auto all_columns = metadata_snapshot->getColumns().getAllWithSubcolumns().addTypes(column_names); + auto all_columns = metadata_snapshot->getColumns().getByNames(ColumnsDescription::All, column_names, true); all_columns = Nested::convertToSubcolumns(all_columns); std::shared_lock lock(rwlock, lock_timeout); @@ -698,7 +701,7 @@ Pipe StorageLog::read( return Pipe::unitePipes(std::move(pipes)); } -BlockOutputStreamPtr StorageLog::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr context) +SinkToStoragePtr StorageLog::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr context) { auto lock_timeout = getLockTimeout(context); loadMarks(lock_timeout); @@ -707,7 +710,7 @@ BlockOutputStreamPtr StorageLog::write(const ASTPtr & /*query*/, const StorageMe if (!lock) throw Exception("Lock timeout exceeded", ErrorCodes::TIMEOUT_EXCEEDED); - return std::make_shared(*this, metadata_snapshot, std::move(lock)); + return std::make_shared(*this, metadata_snapshot, std::move(lock)); } CheckResults StorageLog::checkData(const ASTPtr & /* query */, ContextPtr context) diff --git a/src/Storages/StorageLog.h b/src/Storages/StorageLog.h index 799bad26c7c..116bdc31520 100644 --- a/src/Storages/StorageLog.h +++ b/src/Storages/StorageLog.h @@ -19,7 +19,7 @@ namespace DB class StorageLog final : public shared_ptr_helper, public IStorage { friend class LogSource; - friend class LogBlockOutputStream; + friend class LogSink; friend struct shared_ptr_helper; public: @@ -34,7 +34,7 @@ public: size_t max_block_size, unsigned num_streams) override; - BlockOutputStreamPtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; + SinkToStoragePtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; void rename(const String & new_path_to_table_data, const StorageID & new_table_id) override; 
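
> The recurring change across these storage files is the migration from IBlockOutputStream (the *BlockOutputStream classes) to SinkToStorage (the *Sink classes): write() now returns a SinkToStoragePtr, the output header is passed to the SinkToStorage base instead of being exposed through getHeader(), write(Block) becomes consume(Chunk), and writePrefix()/writeSuffix() become onStart()/onFinish(). Below is a minimal sketch of that shape, based only on the pattern visible in the sinks above (LogSink, StorageFileSink, MemorySink); ExampleSink and the include paths are illustrative assumptions, not part of the patch.

```
/// Sketch only: ExampleSink is hypothetical and the include paths are assumed,
/// but the calls below (SinkToStorage base constructor, getPort(), consume(),
/// onStart(), onFinish()) mirror the sinks introduced in this patch.
#include <Processors/Sinks/SinkToStorage.h>
#include <Storages/StorageInMemoryMetadata.h>

namespace DB
{

class ExampleSink final : public SinkToStorage
{
public:
    explicit ExampleSink(const StorageMetadataPtr & metadata_snapshot_)
        : SinkToStorage(metadata_snapshot_->getSampleBlock()) /// Header goes to the base class, replacing getHeader().
        , metadata_snapshot(metadata_snapshot_)
    {
    }

    String getName() const override { return "ExampleSink"; }

    /// Replaces IBlockOutputStream::write(const Block &).
    void consume(Chunk chunk) override
    {
        auto block = getPort().getHeader().cloneWithColumns(chunk.detachColumns());
        metadata_snapshot->check(block, true);
        /// ... write `block` to the underlying storage ...
    }

    /// Replace writePrefix() / writeSuffix() from the old interface.
    void onStart() override {}
    void onFinish() override {}

private:
    StorageMetadataPtr metadata_snapshot;
};

}
```

> A storage's write() override would then return std::make_shared<ExampleSink>(metadata_snapshot) as a SinkToStoragePtr, which is the signature change applied to every write() declaration in the headers throughout this diff.
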
diff --git a/src/Storages/StorageMaterializeMySQL.cpp b/src/Storages/StorageMaterializedMySQL.cpp similarity index 77% rename from src/Storages/StorageMaterializeMySQL.cpp rename to src/Storages/StorageMaterializedMySQL.cpp index 5b371fe3fb8..52f53b9ceee 100644 --- a/src/Storages/StorageMaterializeMySQL.cpp +++ b/src/Storages/StorageMaterializedMySQL.cpp @@ -4,7 +4,7 @@ #if USE_MYSQL -#include +#include #include #include @@ -21,14 +21,14 @@ #include #include -#include +#include #include #include namespace DB { -StorageMaterializeMySQL::StorageMaterializeMySQL(const StoragePtr & nested_storage_, const IDatabase * database_) +StorageMaterializedMySQL::StorageMaterializedMySQL(const StoragePtr & nested_storage_, const IDatabase * database_) : StorageProxy(nested_storage_->getStorageID()), nested_storage(nested_storage_), database(database_) { StorageInMemoryMetadata in_memory_metadata; @@ -36,7 +36,12 @@ StorageMaterializeMySQL::StorageMaterializeMySQL(const StoragePtr & nested_stora setInMemoryMetadata(in_memory_metadata); } -Pipe StorageMaterializeMySQL::read( +bool StorageMaterializedMySQL::needRewriteQueryWithFinal(const Names & column_names) const +{ + return needRewriteQueryWithFinalForStorage(column_names, nested_storage); +} + +Pipe StorageMaterializedMySQL::read( const Names & column_names, const StorageMetadataPtr & metadata_snapshot, SelectQueryInfo & query_info, @@ -47,18 +52,19 @@ Pipe StorageMaterializeMySQL::read( { /// If the background synchronization thread has exception. rethrowSyncExceptionIfNeed(database); + return readFinalFromNestedStorage(nested_storage, column_names, metadata_snapshot, query_info, context, processed_stage, max_block_size, num_streams); } -NamesAndTypesList StorageMaterializeMySQL::getVirtuals() const +NamesAndTypesList StorageMaterializedMySQL::getVirtuals() const { /// If the background synchronization thread has exception. 
rethrowSyncExceptionIfNeed(database); return nested_storage->getVirtuals(); } -IStorage::ColumnSizeByName StorageMaterializeMySQL::getColumnSizes() const +IStorage::ColumnSizeByName StorageMaterializedMySQL::getColumnSizes() const { auto sizes = nested_storage->getColumnSizes(); auto nested_header = nested_storage->getInMemoryMetadataPtr()->getSampleBlock(); diff --git a/src/Storages/StorageMaterializeMySQL.h b/src/Storages/StorageMaterializedMySQL.h similarity index 61% rename from src/Storages/StorageMaterializeMySQL.h rename to src/Storages/StorageMaterializedMySQL.h index 45221ed5b76..95ef4ad97fa 100644 --- a/src/Storages/StorageMaterializeMySQL.h +++ b/src/Storages/StorageMaterializedMySQL.h @@ -16,19 +16,21 @@ namespace ErrorCodes extern const int NOT_IMPLEMENTED; } -class StorageMaterializeMySQL final : public shared_ptr_helper, public StorageProxy +class StorageMaterializedMySQL final : public shared_ptr_helper, public StorageProxy { - friend struct shared_ptr_helper; + friend struct shared_ptr_helper; public: - String getName() const override { return "MaterializeMySQL"; } + String getName() const override { return "MaterializedMySQL"; } - StorageMaterializeMySQL(const StoragePtr & nested_storage_, const IDatabase * database_); + StorageMaterializedMySQL(const StoragePtr & nested_storage_, const IDatabase * database_); + + bool needRewriteQueryWithFinal(const Names & column_names) const override; Pipe read( const Names & column_names, const StorageMetadataPtr & metadata_snapshot, SelectQueryInfo & query_info, ContextPtr context, QueryProcessingStage::Enum processed_stage, size_t max_block_size, unsigned num_streams) override; - BlockOutputStreamPtr write(const ASTPtr &, const StorageMetadataPtr &, ContextPtr) override { throwNotAllowed(); } + SinkToStoragePtr write(const ASTPtr &, const StorageMetadataPtr &, ContextPtr) override { throwNotAllowed(); } NamesAndTypesList getVirtuals() const override; ColumnSizeByName getColumnSizes() const override; @@ -40,7 +42,7 @@ public: private: [[noreturn]] void throwNotAllowed() const { - throw Exception("This method is not allowed for MaterializeMySQL", ErrorCodes::NOT_IMPLEMENTED); + throw Exception("This method is not allowed for MaterializedMySQL", ErrorCodes::NOT_IMPLEMENTED); } StoragePtr nested_storage; diff --git a/src/Storages/StorageMaterializedView.cpp b/src/Storages/StorageMaterializedView.cpp index 76fa4b8e20b..f72f6fee180 100644 --- a/src/Storages/StorageMaterializedView.cpp +++ b/src/Storages/StorageMaterializedView.cpp @@ -12,7 +12,6 @@ #include #include #include -#include #include #include @@ -27,6 +26,7 @@ #include #include #include +#include namespace DB { @@ -215,16 +215,16 @@ void StorageMaterializedView::read( } } -BlockOutputStreamPtr StorageMaterializedView::write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr local_context) +SinkToStoragePtr StorageMaterializedView::write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr local_context) { auto storage = getTargetTable(); auto lock = storage->lockForShare(local_context->getCurrentQueryId(), local_context->getSettingsRef().lock_acquire_timeout); auto metadata_snapshot = storage->getInMemoryMetadataPtr(); - auto stream = storage->write(query, metadata_snapshot, local_context); + auto sink = storage->write(query, metadata_snapshot, local_context); - stream->addTableLock(lock); - return stream; + sink->addTableLock(lock); + return sink; } diff --git a/src/Storages/StorageMaterializedView.h 
b/src/Storages/StorageMaterializedView.h index 1f1729b05f7..d282ece7f56 100644 --- a/src/Storages/StorageMaterializedView.h +++ b/src/Storages/StorageMaterializedView.h @@ -34,7 +34,7 @@ public: return target_table->mayBenefitFromIndexForIn(left_in_operand, query_context, metadata_snapshot); } - BlockOutputStreamPtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; + SinkToStoragePtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; void drop() override; void dropInnerTableIfAny(bool no_delay, ContextPtr local_context) override; diff --git a/src/Storages/StorageMemory.cpp b/src/Storages/StorageMemory.cpp index 9e1ae24fc75..6823f661984 100644 --- a/src/Storages/StorageMemory.cpp +++ b/src/Storages/StorageMemory.cpp @@ -1,8 +1,6 @@ #include #include -#include - #include #include #include @@ -11,6 +9,7 @@ #include #include #include +#include namespace DB @@ -35,7 +34,7 @@ public: std::shared_ptr> parallel_execution_index_, InitializerFunc initializer_func_ = {}) : SourceWithProgress(metadata_snapshot->getSampleBlockForColumns(column_names_, storage.getVirtuals(), storage.getStorageID())) - , column_names_and_types(metadata_snapshot->getColumns().getAllWithSubcolumns().addTypes(std::move(column_names_))) + , column_names_and_types(metadata_snapshot->getColumns().getByNames(ColumnsDescription::All, column_names_, true)) , data(data_) , parallel_execution_index(parallel_execution_index_) , initializer_func(std::move(initializer_func_)) @@ -100,21 +99,23 @@ private: }; -class MemoryBlockOutputStream : public IBlockOutputStream +class MemorySink : public SinkToStorage { public: - MemoryBlockOutputStream( + MemorySink( StorageMemory & storage_, const StorageMetadataPtr & metadata_snapshot_) - : storage(storage_) + : SinkToStorage(metadata_snapshot_->getSampleBlock()) + , storage(storage_) , metadata_snapshot(metadata_snapshot_) { } - Block getHeader() const override { return metadata_snapshot->getSampleBlock(); } + String getName() const override { return "MemorySink"; } - void write(const Block & block) override + void consume(Chunk chunk) override { + auto block = getPort().getHeader().cloneWithColumns(chunk.getColumns()); metadata_snapshot->check(block, true); if (storage.compress) @@ -131,7 +132,7 @@ public: } } - void writeSuffix() override + void onFinish() override { size_t inserted_bytes = 0; size_t inserted_rows = 0; @@ -228,9 +229,9 @@ Pipe StorageMemory::read( } -BlockOutputStreamPtr StorageMemory::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr /*context*/) +SinkToStoragePtr StorageMemory::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr /*context*/) { - return std::make_shared(*this, metadata_snapshot); + return std::make_shared(*this, metadata_snapshot); } diff --git a/src/Storages/StorageMemory.h b/src/Storages/StorageMemory.h index 47f9a942b82..88539d0e731 100644 --- a/src/Storages/StorageMemory.h +++ b/src/Storages/StorageMemory.h @@ -22,7 +22,7 @@ namespace DB */ class StorageMemory final : public shared_ptr_helper, public IStorage { -friend class MemoryBlockOutputStream; +friend class MemorySink; friend struct shared_ptr_helper; public: @@ -47,7 +47,7 @@ public: bool hasEvenlyDistributedRead() const override { return true; } - BlockOutputStreamPtr write(const ASTPtr & query, const StorageMetadataPtr & metadata_snapshot, ContextPtr context) override; + SinkToStoragePtr write(const ASTPtr 
& query, const StorageMetadataPtr & metadata_snapshot, ContextPtr context) override; void drop() override; diff --git a/src/Storages/StorageMerge.cpp b/src/Storages/StorageMerge.cpp index 2d5bbfc712d..243294351f3 100644 --- a/src/Storages/StorageMerge.cpp +++ b/src/Storages/StorageMerge.cpp @@ -388,6 +388,13 @@ Pipe StorageMerge::createSources( return pipe; } + if (!modified_select.final() && storage->needRewriteQueryWithFinal(real_column_names)) + { + /// NOTE: It may not work correctly in some cases, because query was analyzed without final. + /// However, it's needed for MaterializedMySQL and it's unlikely that someone will use it with Merge tables. + modified_select.setFinal(); + } + auto storage_stage = storage->getQueryProcessingStage(modified_context, QueryProcessingStage::Complete, metadata_snapshot, modified_query_info); if (processed_stage <= storage_stage) @@ -676,14 +683,16 @@ void StorageMerge::convertingSourceStream( auto convert_actions_dag = ActionsDAG::makeConvertingActions(pipe.getHeader().getColumnsWithTypeAndName(), header.getColumnsWithTypeAndName(), ActionsDAG::MatchColumnsMode::Name); - auto actions = std::make_shared(convert_actions_dag, ExpressionActionsSettings::fromContext(local_context, CompileExpressions::yes)); + auto actions = std::make_shared( + convert_actions_dag, + ExpressionActionsSettings::fromContext(local_context, CompileExpressions::yes)); + pipe.addSimpleTransform([&](const Block & stream_header) { return std::make_shared(stream_header, actions); }); } - auto where_expression = query->as()->where(); if (!where_expression) @@ -718,10 +727,10 @@ void StorageMerge::convertingSourceStream( IStorage::ColumnSizeByName StorageMerge::getColumnSizes() const { - auto first_materialize_mysql = getFirstTable([](const StoragePtr & table) { return table && table->getName() == "MaterializeMySQL"; }); - if (!first_materialize_mysql) + auto first_materialized_mysql = getFirstTable([](const StoragePtr & table) { return table && table->getName() == "MaterializedMySQL"; }); + if (!first_materialized_mysql) return {}; - return first_materialize_mysql->getColumnSizes(); + return first_materialized_mysql->getColumnSizes(); } void registerStorageMerge(StorageFactory & factory) diff --git a/src/Storages/StorageMergeTree.cpp b/src/Storages/StorageMergeTree.cpp index b670c1f909c..0763e2a25c4 100644 --- a/src/Storages/StorageMergeTree.cpp +++ b/src/Storages/StorageMergeTree.cpp @@ -18,7 +18,7 @@ #include #include #include -#include +#include #include #include #include @@ -224,11 +224,11 @@ std::optional StorageMergeTree::totalBytes(const Settings &) const return getTotalActiveSizeInBytes(); } -BlockOutputStreamPtr +SinkToStoragePtr StorageMergeTree::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr local_context) { const auto & settings = local_context->getSettingsRef(); - return std::make_shared( + return std::make_shared( *this, metadata_snapshot, settings.max_partitions_per_insert_block, local_context); } diff --git a/src/Storages/StorageMergeTree.h b/src/Storages/StorageMergeTree.h index f9552a49018..681475f7a49 100644 --- a/src/Storages/StorageMergeTree.h +++ b/src/Storages/StorageMergeTree.h @@ -61,7 +61,7 @@ public: std::optional totalRowsByPartitionPredicate(const SelectQueryInfo &, ContextPtr) const override; std::optional totalBytes(const Settings &) const override; - BlockOutputStreamPtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; + SinkToStoragePtr write(const 
ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; /** Perform the next step in combining the parts. */ @@ -241,7 +241,7 @@ private: std::unique_ptr getDefaultSettings() const override; friend class MergeTreeProjectionBlockOutputStream; - friend class MergeTreeBlockOutputStream; + friend class MergeTreeSink; friend class MergeTreeData; diff --git a/src/Storages/StorageMongoDB.cpp b/src/Storages/StorageMongoDB.cpp index e27d16ecc68..1fd58a293dc 100644 --- a/src/Storages/StorageMongoDB.cpp +++ b/src/Storages/StorageMongoDB.cpp @@ -1,4 +1,5 @@ #include "StorageMongoDB.h" +#include "StorageMongoDBSocketFactory.h" #include #include @@ -33,6 +34,7 @@ StorageMongoDB::StorageMongoDB( const std::string & collection_name_, const std::string & username_, const std::string & password_, + const std::string & options_, const ColumnsDescription & columns_, const ConstraintsDescription & constraints_, const String & comment) @@ -43,6 +45,8 @@ StorageMongoDB::StorageMongoDB( , collection_name(collection_name_) , username(username_) , password(password_) + , options(options_) + , uri("mongodb://" + host_ + ":" + std::to_string(port_) + "/" + database_name_ + "?" + options_) { StorageInMemoryMetadata storage_metadata; storage_metadata.setColumns(columns_); @@ -56,7 +60,10 @@ void StorageMongoDB::connectIfNotConnected() { std::lock_guard lock{connection_mutex}; if (!connection) - connection = std::make_shared(host, port); + { + StorageMongoDBSocketFactory factory; + connection = std::make_shared(uri, factory); + } if (!authenticated) { @@ -102,9 +109,9 @@ void registerStorageMongoDB(StorageFactory & factory) { ASTs & engine_args = args.engine_args; - if (engine_args.size() != 5) + if (engine_args.size() < 5 || engine_args.size() > 6) throw Exception( - "Storage MongoDB requires 5 parameters: MongoDB('host:port', database, collection, 'user', 'password').", + "Storage MongoDB requires from 5 to 6 parameters: MongoDB('host:port', database, collection, 'user', 'password' [, 'options']).", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); for (auto & engine_arg : engine_args) @@ -118,6 +125,11 @@ void registerStorageMongoDB(StorageFactory & factory) const String & username = engine_args[3]->as().value.safeGet(); const String & password = engine_args[4]->as().value.safeGet(); + String options; + + if (engine_args.size() >= 6) + options = engine_args[5]->as().value.safeGet(); + return StorageMongoDB::create( args.table_id, parsed_host_port.first, @@ -126,6 +138,7 @@ void registerStorageMongoDB(StorageFactory & factory) collection, username, password, + options, args.columns, args.constraints, args.comment); diff --git a/src/Storages/StorageMongoDB.h b/src/Storages/StorageMongoDB.h index 2553acdd40c..3014b88a9ca 100644 --- a/src/Storages/StorageMongoDB.h +++ b/src/Storages/StorageMongoDB.h @@ -26,6 +26,7 @@ public: const std::string & collection_name_, const std::string & username_, const std::string & password_, + const std::string & options_, const ColumnsDescription & columns_, const ConstraintsDescription & constraints_, const String & comment); @@ -50,6 +51,8 @@ private: const std::string collection_name; const std::string username; const std::string password; + const std::string options; + const std::string uri; std::shared_ptr connection; bool authenticated = false; diff --git a/src/Storages/StorageMongoDBSocketFactory.cpp b/src/Storages/StorageMongoDBSocketFactory.cpp new file mode 100644 index 00000000000..9a2b120e9ed --- /dev/null +++ 
b/src/Storages/StorageMongoDBSocketFactory.cpp @@ -0,0 +1,55 @@ +#include "StorageMongoDBSocketFactory.h" + +#include + +#if !defined(ARCADIA_BUILD) +# include +#endif + +#include +#include + +#if USE_SSL +# include +#endif + + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int FEATURE_IS_NOT_ENABLED_AT_BUILD_TIME; +} + +Poco::Net::StreamSocket StorageMongoDBSocketFactory::createSocket(const std::string & host, int port, Poco::Timespan connectTimeout, bool secure) +{ + return secure ? createSecureSocket(host, port, connectTimeout) : createPlainSocket(host, port, connectTimeout); +} + +Poco::Net::StreamSocket StorageMongoDBSocketFactory::createPlainSocket(const std::string & host, int port, Poco::Timespan connectTimeout) +{ + Poco::Net::SocketAddress address(host, port); + Poco::Net::StreamSocket socket; + + socket.connect(address, connectTimeout); + + return socket; +} + + +Poco::Net::StreamSocket StorageMongoDBSocketFactory::createSecureSocket(const std::string & host [[maybe_unused]], int port [[maybe_unused]], Poco::Timespan connectTimeout [[maybe_unused]]) +{ +#if USE_SSL + Poco::Net::SocketAddress address(host, port); + Poco::Net::SecureStreamSocket socket; + + socket.connect(address, connectTimeout); + + return socket; +#else + throw Exception("SSL is not enabled at build time.", ErrorCodes::FEATURE_IS_NOT_ENABLED_AT_BUILD_TIME); +#endif +} + +} diff --git a/src/Storages/StorageMongoDBSocketFactory.h b/src/Storages/StorageMongoDBSocketFactory.h new file mode 100644 index 00000000000..5fc423c63cb --- /dev/null +++ b/src/Storages/StorageMongoDBSocketFactory.h @@ -0,0 +1,19 @@ +#pragma once + +#include + + +namespace DB +{ + +class StorageMongoDBSocketFactory : public Poco::MongoDB::Connection::SocketFactory +{ +public: + virtual Poco::Net::StreamSocket createSocket(const std::string & host, int port, Poco::Timespan connectTimeout, bool secure) override; + +private: + static Poco::Net::StreamSocket createPlainSocket(const std::string & host, int port, Poco::Timespan connectTimeout); + static Poco::Net::StreamSocket createSecureSocket(const std::string & host, int port, Poco::Timespan connectTimeout); +}; + +} diff --git a/src/Storages/StorageMySQL.cpp b/src/Storages/StorageMySQL.cpp index 5d37806c9ba..99a930f37c4 100644 --- a/src/Storages/StorageMySQL.cpp +++ b/src/Storages/StorageMySQL.cpp @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -108,17 +109,18 @@ Pipe StorageMySQL::read( } -class StorageMySQLBlockOutputStream : public IBlockOutputStream +class StorageMySQLSink : public SinkToStorage { public: - explicit StorageMySQLBlockOutputStream( + explicit StorageMySQLSink( const StorageMySQL & storage_, const StorageMetadataPtr & metadata_snapshot_, const std::string & remote_database_name_, const std::string & remote_table_name_, const mysqlxx::PoolWithFailover::Entry & entry_, const size_t & mysql_max_rows_to_insert) - : storage{storage_} + : SinkToStorage(metadata_snapshot_->getSampleBlock()) + , storage{storage_} , metadata_snapshot{metadata_snapshot_} , remote_database_name{remote_database_name_} , remote_table_name{remote_table_name_} @@ -127,10 +129,11 @@ public: { } - Block getHeader() const override { return metadata_snapshot->getSampleBlock(); } + String getName() const override { return "StorageMySQLSink"; } - void write(const Block & block) override + void consume(Chunk chunk) override { + auto block = getPort().getHeader().cloneWithColumns(chunk.detachColumns()); auto blocks = splitBlocks(block, max_batch_rows); mysqlxx::Transaction 
trans(entry); try @@ -221,9 +224,9 @@ private: }; -BlockOutputStreamPtr StorageMySQL::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr local_context) +SinkToStoragePtr StorageMySQL::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr local_context) { - return std::make_shared( + return std::make_shared( *this, metadata_snapshot, remote_database_name, diff --git a/src/Storages/StorageMySQL.h b/src/Storages/StorageMySQL.h index 2bcc7af5f2b..70d7a4455b1 100644 --- a/src/Storages/StorageMySQL.h +++ b/src/Storages/StorageMySQL.h @@ -48,10 +48,10 @@ public: size_t max_block_size, unsigned num_streams) override; - BlockOutputStreamPtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; + SinkToStoragePtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; private: - friend class StorageMySQLBlockOutputStream; + friend class StorageMySQLSink; std::string remote_database_name; std::string remote_table_name; diff --git a/src/Storages/StorageNull.h b/src/Storages/StorageNull.h index c20937b2359..fda6faffb1e 100644 --- a/src/Storages/StorageNull.h +++ b/src/Storages/StorageNull.h @@ -6,6 +6,7 @@ #include #include #include +#include #include @@ -36,9 +37,9 @@ public: bool supportsParallelInsert() const override { return true; } - BlockOutputStreamPtr write(const ASTPtr &, const StorageMetadataPtr & metadata_snapshot, ContextPtr) override + SinkToStoragePtr write(const ASTPtr &, const StorageMetadataPtr & metadata_snapshot, ContextPtr) override { - return std::make_shared(metadata_snapshot->getSampleBlock()); + return std::make_shared(metadata_snapshot->getSampleBlock()); } void checkAlterIsPossible(const AlterCommands & commands, ContextPtr context) const override; diff --git a/src/Storages/StoragePostgreSQL.cpp b/src/Storages/StoragePostgreSQL.cpp index 211a626e8d4..6072412af35 100644 --- a/src/Storages/StoragePostgreSQL.cpp +++ b/src/Storages/StoragePostgreSQL.cpp @@ -27,6 +27,7 @@ #include #include #include +#include #include @@ -94,25 +95,27 @@ Pipe StoragePostgreSQL::read( } -class PostgreSQLBlockOutputStream : public IBlockOutputStream +class PostgreSQLSink : public SinkToStorage { public: - explicit PostgreSQLBlockOutputStream( + explicit PostgreSQLSink( const StorageMetadataPtr & metadata_snapshot_, postgres::ConnectionHolderPtr connection_holder_, const String & remote_table_name_, const String & remote_table_schema_) - : metadata_snapshot(metadata_snapshot_) + : SinkToStorage(metadata_snapshot_->getSampleBlock()) + , metadata_snapshot(metadata_snapshot_) , connection_holder(std::move(connection_holder_)) , remote_table_name(remote_table_name_) , remote_table_schema(remote_table_schema_) { } - Block getHeader() const override { return metadata_snapshot->getSampleBlock(); } + String getName() const override { return "PostgreSQLSink"; } - void write(const Block & block) override + void consume(Chunk chunk) override { + auto block = getPort().getHeader().cloneWithColumns(chunk.detachColumns()); if (!inserter) inserter = std::make_unique(connection_holder->get(), remote_table_schema.empty() ? 
pqxx::table_path({remote_table_name}) @@ -155,7 +158,7 @@ public: } } - void writeSuffix() override + void onFinish() override { if (inserter) inserter->complete(); @@ -234,6 +237,10 @@ public: else if (which.isFloat64()) nested_column = ColumnFloat64::create(); else if (which.isDate()) nested_column = ColumnUInt16::create(); else if (which.isDateTime()) nested_column = ColumnUInt32::create(); + else if (which.isDateTime64()) + { + nested_column = ColumnDecimal::create(0, 6); + } else if (which.isDecimal32()) { const auto & type = typeid_cast *>(nested.get()); @@ -291,10 +298,10 @@ private: }; -BlockOutputStreamPtr StoragePostgreSQL::write( +SinkToStoragePtr StoragePostgreSQL::write( const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr /* context */) { - return std::make_shared(metadata_snapshot, pool->get(), remote_table_name, remote_table_schema); + return std::make_shared(metadata_snapshot, pool->get(), remote_table_name, remote_table_schema); } diff --git a/src/Storages/StoragePostgreSQL.h b/src/Storages/StoragePostgreSQL.h index 5a8ecf5598f..064fa481f9d 100644 --- a/src/Storages/StoragePostgreSQL.h +++ b/src/Storages/StoragePostgreSQL.h @@ -41,7 +41,7 @@ public: size_t max_block_size, unsigned num_streams) override; - BlockOutputStreamPtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; + SinkToStoragePtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; private: friend class PostgreSQLBlockOutputStream; diff --git a/src/Storages/StorageProxy.h b/src/Storages/StorageProxy.h index 60e7bf24046..521a2b8d642 100644 --- a/src/Storages/StorageProxy.h +++ b/src/Storages/StorageProxy.h @@ -63,7 +63,7 @@ public: return getNested()->read(column_names, metadata_snapshot, query_info, context, processed_stage, max_block_size, num_streams); } - BlockOutputStreamPtr write(const ASTPtr & query, const StorageMetadataPtr & metadata_snapshot, ContextPtr context) override + SinkToStoragePtr write(const ASTPtr & query, const StorageMetadataPtr & metadata_snapshot, ContextPtr context) override { return getNested()->write(query, metadata_snapshot, context); } diff --git a/src/Storages/StorageReplicatedMergeTree.cpp b/src/Storages/StorageReplicatedMergeTree.cpp index 4129942eb92..150a71a09e5 100644 --- a/src/Storages/StorageReplicatedMergeTree.cpp +++ b/src/Storages/StorageReplicatedMergeTree.cpp @@ -21,7 +21,7 @@ #include #include #include -#include +#include #include #include #include @@ -155,7 +155,6 @@ namespace ActionLocks static const auto QUEUE_UPDATE_ERROR_SLEEP_MS = 1 * 1000; -static const auto MERGE_SELECTING_SLEEP_MS = 5 * 1000; static const auto MUTATIONS_FINALIZING_SLEEP_MS = 1 * 1000; static const auto MUTATIONS_FINALIZING_IDLE_SLEEP_MS = 5 * 1000; @@ -2782,6 +2781,16 @@ void StorageReplicatedMergeTree::cloneReplica(const String & source_replica, Coo } } + { + /// Check "is_lost" version after retrieving queue and parts. + /// If version has changed, then replica most likely has been dropped and parts set is inconsistent, + /// so throw exception and retry cloning. 
+ Coordination::Stat is_lost_stat_new; + zookeeper->get(fs::path(source_path) / "is_lost", &is_lost_stat_new); + if (is_lost_stat_new.version != source_is_lost_stat.version) + throw Exception(ErrorCodes::REPLICA_STATUS_CHANGED, "Cannot clone {}, because it suddenly became lost", source_replica); + } + tryRemovePartsFromZooKeeperWithRetries(parts_to_remove_from_zk); auto local_active_parts = getDataParts(); @@ -3346,7 +3355,7 @@ void StorageReplicatedMergeTree::mergeSelectingTask() if (create_result != CreateMergeEntryResult::Ok && create_result != CreateMergeEntryResult::LogUpdated) { - merge_selecting_task->scheduleAfter(MERGE_SELECTING_SLEEP_MS); + merge_selecting_task->scheduleAfter(storage_settings_ptr->merge_selecting_sleep_ms); } else { @@ -4368,12 +4377,6 @@ void StorageReplicatedMergeTree::shutdown() /// Wait for all of them std::unique_lock lock(data_parts_exchange_ptr->rwlock); } - - /// We clear all old parts after stopping all background operations. It's - /// important, because background operations can produce temporary parts - /// which will remove themselves in their destructors. If so, we may have - /// race condition between our remove call and background process. - clearOldPartsFromFilesystem(true); } @@ -4545,7 +4548,7 @@ void StorageReplicatedMergeTree::assertNotReadonly() const } -BlockOutputStreamPtr StorageReplicatedMergeTree::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr local_context) +SinkToStoragePtr StorageReplicatedMergeTree::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr local_context) { const auto storage_settings_ptr = getSettings(); assertNotReadonly(); @@ -4554,7 +4557,7 @@ BlockOutputStreamPtr StorageReplicatedMergeTree::write(const ASTPtr & /*query*/, bool deduplicate = storage_settings_ptr->replicated_deduplication_window != 0 && query_settings.insert_deduplicate; // TODO: should we also somehow pass list of columns to deduplicate on to the ReplicatedMergeTreeBlockOutputStream ? - return std::make_shared( + return std::make_shared( *this, metadata_snapshot, query_settings.insert_quorum, query_settings.insert_quorum_timeout.totalMilliseconds(), query_settings.max_partitions_per_insert_block, @@ -4969,6 +4972,8 @@ void StorageReplicatedMergeTree::alter( if (auto txn = query_context->getZooKeeperMetadataTransaction()) { + /// It would be better to clone ops instead of moving, so we could retry on ZBADVERSION, + /// but clone() is not implemented for Coordination::Request. txn->moveOpsTo(ops); /// NOTE: IDatabase::alterTable(...) is called when executing ALTER_METADATA queue entry without query context, /// so we have to update metadata of DatabaseReplicated here. @@ -5012,6 +5017,11 @@ void StorageReplicatedMergeTree::alter( throw Exception("Metadata on replica is not up to date with common metadata in Zookeeper. Cannot alter", ErrorCodes::CANNOT_ASSIGN_ALTER); + /// Cannot retry automatically, because some zookeeper ops were lost on the first attempt. Will retry on DDLWorker-level. + if (query_context->getZooKeeperMetadataTransaction()) + throw Exception("Cannot execute alter, because mutations version was suddenly changed due to concurrent alter", + ErrorCodes::CANNOT_ASSIGN_ALTER); + continue; } else @@ -5254,7 +5264,7 @@ PartitionCommandsResultInfo StorageReplicatedMergeTree::attachPartition( MutableDataPartsVector loaded_parts = tryLoadPartsToAttach(partition, attach_part, query_context, renamed_parts); /// TODO Allow to use quorum here.
- ReplicatedMergeTreeBlockOutputStream output(*this, metadata_snapshot, 0, 0, 0, false, false, query_context, + ReplicatedMergeTreeSink output(*this, metadata_snapshot, 0, 0, 0, false, false, query_context, /*is_attach*/true); for (size_t i = 0; i < loaded_parts.size(); ++i) @@ -5572,8 +5582,11 @@ void StorageReplicatedMergeTree::getStatus(Status & res, bool with_zk_fields) res.total_replicas = all_replicas.size(); for (const String & replica : all_replicas) - if (zookeeper->exists(fs::path(zookeeper_path) / "replicas" / replica / "is_active")) - ++res.active_replicas; + { + bool is_replica_active = zookeeper->exists(fs::path(zookeeper_path) / "replicas" / replica / "is_active"); + res.active_replicas += static_cast(is_replica_active); + res.replica_is_active.emplace(replica, is_replica_active); + } } catch (const Coordination::Exception &) { @@ -6004,6 +6017,10 @@ void StorageReplicatedMergeTree::mutate(const MutationCommands & commands, Conte } else if (rc == Coordination::Error::ZBADVERSION) { + /// Cannot retry automatically, because some zookeeper ops were lost on the first attempt. Will retry on DDLWorker-level. + if (query_context->getZooKeeperMetadataTransaction()) + throw Exception("Cannot execute alter, because mutations version was suddenly changed due to concurrent alter", + ErrorCodes::CANNOT_ASSIGN_ALTER); LOG_TRACE(log, "Version conflict when trying to create a mutation node, retrying..."); continue; } diff --git a/src/Storages/StorageReplicatedMergeTree.h b/src/Storages/StorageReplicatedMergeTree.h index 800f419cb76..3d2727d7bb9 100644 --- a/src/Storages/StorageReplicatedMergeTree.h +++ b/src/Storages/StorageReplicatedMergeTree.h @@ -116,7 +116,7 @@ public: std::optional totalRowsByPartitionPredicate(const SelectQueryInfo & query_info, ContextPtr context) const override; std::optional totalBytes(const Settings & settings) const override; - BlockOutputStreamPtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; + SinkToStoragePtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; bool optimize( const ASTPtr & query, @@ -176,6 +176,7 @@ public: UInt8 active_replicas; /// If the error has happened fetching the info from ZooKeeper, this field will be set. String zookeeper_exception; + std::unordered_map replica_is_active; }; /// Get the status of the table. If with_zk_fields = false - do not fill in the fields that require queries to ZK. @@ -269,7 +270,7 @@ private: /// Delete old parts from disk and from ZooKeeper. 
void clearOldPartsAndRemoveFromZK(); - friend class ReplicatedMergeTreeBlockOutputStream; + friend class ReplicatedMergeTreeSink; friend class ReplicatedMergeTreePartCheckThread; friend class ReplicatedMergeTreeCleanupThread; friend class ReplicatedMergeTreeAlterThread; diff --git a/src/Storages/StorageS3.cpp b/src/Storages/StorageS3.cpp index b4fec69e075..fc3ce3a10ed 100644 --- a/src/Storages/StorageS3.cpp +++ b/src/Storages/StorageS3.cpp @@ -19,9 +19,12 @@ #include #include -#include +#include #include +#include +#include + #include #include @@ -35,6 +38,7 @@ #include #include +#include #include #include #include @@ -204,12 +208,21 @@ bool StorageS3Source::initialize() file_path = fs::path(bucket) / current_key; read_buf = wrapReadBufferWithCompressionMethod( - std::make_unique(client, bucket, current_key, max_single_read_retries), chooseCompressionMethod(current_key, compression_hint)); + std::make_unique(client, bucket, current_key, max_single_read_retries, DBMS_DEFAULT_BUFFER_SIZE), + chooseCompressionMethod(current_key, compression_hint)); auto input_format = FormatFactory::instance().getInput(format, *read_buf, sample_block, getContext(), max_block_size); - reader = std::make_shared(input_format); + pipeline = std::make_unique(); + pipeline->init(Pipe(input_format)); if (columns_desc.hasDefaults()) - reader = std::make_shared(reader, columns_desc, getContext()); + { + pipeline->addSimpleTransform([&](const Block & header) + { + return std::make_shared(header, columns_desc, *input_format, getContext()); + }); + } + + reader = std::make_unique(*pipeline); initialized = false; return true; @@ -225,31 +238,25 @@ Chunk StorageS3Source::generate() if (!reader) return {}; - if (!initialized) + Chunk chunk; + if (reader->pull(chunk)) { - reader->readPrefix(); - initialized = true; - } - - if (auto block = reader->read()) - { - auto columns = block.getColumns(); - UInt64 num_rows = block.rows(); + UInt64 num_rows = chunk.getNumRows(); if (with_path_column) - columns.push_back(DataTypeString().createColumnConst(num_rows, file_path)->convertToFullColumnIfConst()); + chunk.addColumn(DataTypeString().createColumnConst(num_rows, file_path)->convertToFullColumnIfConst()); if (with_file_column) { size_t last_slash_pos = file_path.find_last_of('/'); - columns.push_back(DataTypeString().createColumnConst(num_rows, file_path.substr( + chunk.addColumn(DataTypeString().createColumnConst(num_rows, file_path.substr( last_slash_pos + 1))->convertToFullColumnIfConst()); } - return Chunk(std::move(columns), num_rows); + return chunk; } - reader->readSuffix(); reader.reset(); + pipeline.reset(); read_buf.reset(); if (!initialize()) @@ -259,10 +266,10 @@ Chunk StorageS3Source::generate() } -class StorageS3BlockOutputStream : public IBlockOutputStream +class StorageS3Sink : public SinkToStorage { public: - StorageS3BlockOutputStream( + StorageS3Sink( const String & format, const Block & sample_block_, ContextPtr context, @@ -272,34 +279,32 @@ public: const String & key, size_t min_upload_part_size, size_t max_single_part_upload_size) - : sample_block(sample_block_) + : SinkToStorage(sample_block_) + , sample_block(sample_block_) { write_buf = wrapWriteBufferWithCompressionMethod( std::make_unique(client, bucket, key, min_upload_part_size, max_single_part_upload_size), compression_method, 3); writer = FormatFactory::instance().getOutputStreamParallelIfPossible(format, *write_buf, sample_block, context); } - Block getHeader() const override + String getName() const override { return "StorageS3Sink"; } + + void 
consume(Chunk chunk) override { - return sample_block; + if (is_first_chunk) + { + writer->writePrefix(); + is_first_chunk = false; + } + writer->write(getPort().getHeader().cloneWithColumns(chunk.detachColumns())); } - void write(const Block & block) override - { - writer->write(block); - } + // void flush() override + // { + // writer->flush(); + // } - void writePrefix() override - { - writer->writePrefix(); - } - - void flush() override - { - writer->flush(); - } - - void writeSuffix() override + void onFinish() override { try { @@ -319,6 +324,7 @@ private: Block sample_block; std::unique_ptr write_buf; BlockOutputStreamPtr writer; + bool is_first_chunk = true; }; @@ -421,10 +427,10 @@ Pipe StorageS3::read( return pipe; } -BlockOutputStreamPtr StorageS3::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr local_context) +SinkToStoragePtr StorageS3::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr local_context) { updateClientAndAuthSettings(local_context, client_auth); - return std::make_shared( + return std::make_shared( format_name, metadata_snapshot->getSampleBlock(), local_context, diff --git a/src/Storages/StorageS3.h b/src/Storages/StorageS3.h index 37ada9e8eb7..a0089578947 100644 --- a/src/Storages/StorageS3.h +++ b/src/Storages/StorageS3.h @@ -27,6 +27,7 @@ namespace Aws::S3 namespace DB { +class PullingPipelineExecutor; class StorageS3SequentialSource; class StorageS3Source : public SourceWithProgress, WithContext { @@ -79,7 +80,8 @@ private: std::unique_ptr read_buf; - BlockInputStreamPtr reader; + std::unique_ptr pipeline; + std::unique_ptr reader; bool initialized = false; bool with_file_column = false; bool with_path_column = false; @@ -128,7 +130,7 @@ public: size_t max_block_size, unsigned num_streams) override; - BlockOutputStreamPtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; + SinkToStoragePtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; void truncate(const ASTPtr & query, const StorageMetadataPtr & metadata_snapshot, ContextPtr local_context, TableExclusiveLockHolder &) override; diff --git a/src/Storages/StorageS3Cluster.cpp b/src/Storages/StorageS3Cluster.cpp index 8a320190036..ba2ed4c55cd 100644 --- a/src/Storages/StorageS3Cluster.cpp +++ b/src/Storages/StorageS3Cluster.cpp @@ -26,7 +26,7 @@ #include #include #include -#include +#include #include #include #include @@ -106,7 +106,6 @@ Pipe StorageS3Cluster::read( const Scalars & scalars = context->hasQueryContext() ? context->getQueryContext()->getScalars() : Scalars{}; Pipes pipes; - connections.reserve(cluster->getShardCount()); const bool add_agg_info = processed_stage == QueryProcessingStage::WithMergeableState; @@ -115,19 +114,27 @@ Pipe StorageS3Cluster::read( /// There will be only one replica, because we consider each replica as a shard for (const auto & node : replicas) { - connections.emplace_back(std::make_shared( + auto connection = std::make_shared( node.host_name, node.port, context->getGlobalContext()->getCurrentDatabase(), node.user, node.password, node.cluster, node.cluster_secret, "S3ClusterInititiator", node.compression, node.secure - )); + ); + /// For unknown reason global context is passed to IStorage::read() method /// So, task_identifier is passed as constructor argument. It is more obvious. 
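Stepping back to the `StorageS3Source` changes above: reading now goes through a `QueryPipeline` driven by a pulling executor instead of a `BlockInputStream`. The angle-bracketed template arguments were lost in this copy of the diff, so the following is a hedged reconstruction of the read path; `PullingPipelineExecutor` is taken from the forward declaration added to StorageS3.h, while `QueryPipeline` and `AddingDefaultsTransform` are assumptions about the stripped names.

```cpp
// Sketch of StorageS3Source::initialize()/generate() with the stripped template
// arguments filled in (assumed, not verbatim from the patch).

pipeline = std::make_unique<QueryPipeline>();
pipeline->init(Pipe(input_format));                 // wrap the IInputFormat into a pipeline

if (columns_desc.hasDefaults())
{
    pipeline->addSimpleTransform([&](const Block & header)
    {
        // Fill in column defaults on the fly, as the hunk above does.
        return std::make_shared<AddingDefaultsTransform>(header, columns_desc, *input_format, getContext());
    });
}

reader = std::make_unique<PullingPipelineExecutor>(*pipeline);

// The pull loop that replaces readPrefix()/read()/readSuffix():
Chunk chunk;
if (reader->pull(chunk))
    return chunk;           // one block's worth of data

reader.reset();             // exhausted: drop executor, pipeline and buffer
pipeline.reset();
read_buf.reset();
```

The URL storage further down in this patch applies exactly the same change to its source.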
auto remote_query_executor = std::make_shared( - *connections.back(), queryToString(query_info.query), header, context, - /*throttler=*/nullptr, scalars, Tables(), processed_stage, callback); + connection, + queryToString(query_info.query), + header, + context, + /*throttler=*/nullptr, + scalars, + Tables(), + processed_stage, + callback); pipes.emplace_back(std::make_shared(remote_query_executor, add_agg_info, false)); } diff --git a/src/Storages/StorageS3Cluster.h b/src/Storages/StorageS3Cluster.h index 821765a3780..81d82052d4d 100644 --- a/src/Storages/StorageS3Cluster.h +++ b/src/Storages/StorageS3Cluster.h @@ -50,8 +50,6 @@ protected: const String & compression_method_); private: - /// Connections from initiator to other nodes - std::vector> connections; StorageS3::ClientAuthentication client_auth; String filename; diff --git a/src/Storages/StorageSQLite.cpp b/src/Storages/StorageSQLite.cpp index f03576e2895..ba66083fea5 100644 --- a/src/Storages/StorageSQLite.cpp +++ b/src/Storages/StorageSQLite.cpp @@ -3,6 +3,7 @@ #if USE_SQLITE #include #include +#include #include #include #include @@ -11,8 +12,10 @@ #include #include #include +#include #include #include +#include namespace DB @@ -27,6 +30,7 @@ namespace ErrorCodes StorageSQLite::StorageSQLite( const StorageID & table_id_, SQLitePtr sqlite_db_, + const String & database_path_, const String & remote_table_name_, const ColumnsDescription & columns_, const ConstraintsDescription & constraints_, @@ -34,6 +38,7 @@ StorageSQLite::StorageSQLite( : IStorage(table_id_) , WithContext(context_->getGlobalContext()) , remote_table_name(remote_table_name_) + , database_path(database_path_) , global_context(context_) , sqlite_db(sqlite_db_) { @@ -53,6 +58,9 @@ Pipe StorageSQLite::read( size_t max_block_size, unsigned int) { + if (!sqlite_db) + sqlite_db = openSQLiteDB(database_path, getContext(), /* throw_on_error */true); + metadata_snapshot->check(column_names, getVirtuals(), getStorageID()); String query = transformQueryForExternalDatabase( @@ -75,25 +83,27 @@ Pipe StorageSQLite::read( } -class SQLiteBlockOutputStream : public IBlockOutputStream +class SQLiteSink : public SinkToStorage { public: - explicit SQLiteBlockOutputStream( + explicit SQLiteSink( const StorageSQLite & storage_, const StorageMetadataPtr & metadata_snapshot_, StorageSQLite::SQLitePtr sqlite_db_, const String & remote_table_name_) - : storage{storage_} + : SinkToStorage(metadata_snapshot_->getSampleBlock()) + , storage{storage_} , metadata_snapshot(metadata_snapshot_) , sqlite_db(sqlite_db_) , remote_table_name(remote_table_name_) { } - Block getHeader() const override { return metadata_snapshot->getSampleBlock(); } + String getName() const override { return "SQLiteSink"; } - void write(const Block & block) override + void consume(Chunk chunk) override { + auto block = getPort().getHeader().cloneWithColumns(chunk.getColumns()); WriteBufferFromOwnString sqlbuf; sqlbuf << "INSERT INTO "; @@ -135,9 +145,11 @@ private: }; -BlockOutputStreamPtr StorageSQLite::write(const ASTPtr & /* query */, const StorageMetadataPtr & metadata_snapshot, ContextPtr) +SinkToStoragePtr StorageSQLite::write(const ASTPtr & /* query */, const StorageMetadataPtr & metadata_snapshot, ContextPtr) { - return std::make_shared(*this, metadata_snapshot, sqlite_db, remote_table_name); + if (!sqlite_db) + sqlite_db = openSQLiteDB(database_path, getContext(), /* throw_on_error */true); + return std::make_shared(*this, metadata_snapshot, sqlite_db, remote_table_name); } @@ -157,14 +169,9 @@ void 
registerStorageSQLite(StorageFactory & factory) const auto database_path = engine_args[0]->as().value.safeGet(); const auto table_name = engine_args[1]->as().value.safeGet(); - sqlite3 * tmp_sqlite_db = nullptr; - int status = sqlite3_open(database_path.c_str(), &tmp_sqlite_db); - if (status != SQLITE_OK) - throw Exception(ErrorCodes::SQLITE_ENGINE_ERROR, - "Failed to open sqlite database. Status: {}. Message: {}", - status, sqlite3_errstr(status)); + auto sqlite_db = openSQLiteDB(database_path, args.getContext(), /* throw_on_error */!args.attach); - return StorageSQLite::create(args.table_id, std::shared_ptr(tmp_sqlite_db, sqlite3_close), + return StorageSQLite::create(args.table_id, sqlite_db, database_path, table_name, args.columns, args.constraints, args.getContext()); }, { diff --git a/src/Storages/StorageSQLite.h b/src/Storages/StorageSQLite.h index 63b7a6fd415..ccb807a4e47 100644 --- a/src/Storages/StorageSQLite.h +++ b/src/Storages/StorageSQLite.h @@ -24,6 +24,7 @@ public: StorageSQLite( const StorageID & table_id_, SQLitePtr sqlite_db_, + const String & database_path_, const String & remote_table_name_, const ColumnsDescription & columns_, const ConstraintsDescription & constraints_, @@ -40,10 +41,11 @@ public: size_t max_block_size, unsigned num_streams) override; - BlockOutputStreamPtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; + SinkToStoragePtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; private: String remote_table_name; + String database_path; ContextPtr global_context; SQLitePtr sqlite_db; }; diff --git a/src/Storages/StorageSet.cpp b/src/Storages/StorageSet.cpp index f585a5747b8..67fd89f5098 100644 --- a/src/Storages/StorageSet.cpp +++ b/src/Storages/StorageSet.cpp @@ -13,6 +13,7 @@ #include #include #include +#include namespace DB @@ -30,17 +31,17 @@ namespace ErrorCodes } -class SetOrJoinBlockOutputStream : public IBlockOutputStream +class SetOrJoinSink : public SinkToStorage { public: - SetOrJoinBlockOutputStream( + SetOrJoinSink( StorageSetOrJoinBase & table_, const StorageMetadataPtr & metadata_snapshot_, const String & backup_path_, const String & backup_tmp_path_, const String & backup_file_name_, bool persistent_); - Block getHeader() const override { return metadata_snapshot->getSampleBlock(); } - void write(const Block & block) override; - void writeSuffix() override; + String getName() const override { return "SetOrJoinSink"; } + void consume(Chunk chunk) override; + void onFinish() override; private: StorageSetOrJoinBase & table; @@ -55,14 +56,15 @@ private: }; -SetOrJoinBlockOutputStream::SetOrJoinBlockOutputStream( +SetOrJoinSink::SetOrJoinSink( StorageSetOrJoinBase & table_, const StorageMetadataPtr & metadata_snapshot_, const String & backup_path_, const String & backup_tmp_path_, const String & backup_file_name_, bool persistent_) - : table(table_) + : SinkToStorage(metadata_snapshot_->getSampleBlock()) + , table(table_) , metadata_snapshot(metadata_snapshot_) , backup_path(backup_path_) , backup_tmp_path(backup_tmp_path_) @@ -74,17 +76,17 @@ SetOrJoinBlockOutputStream::SetOrJoinBlockOutputStream( { } -void SetOrJoinBlockOutputStream::write(const Block & block) +void SetOrJoinSink::consume(Chunk chunk) { /// Sort columns in the block. This is necessary, since Set and Join count on the same column order in different blocks. 
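The write path mirrors this migration: each `*BlockOutputStream` in the patch becomes a `SinkToStorage`, the header is fixed once in the constructor, `write(Block)` becomes `consume(Chunk)`, and `writeSuffix()` becomes `onFinish()`; the `SetOrJoinSink::consume` body that follows is one instance. A minimal skeleton of the pattern, with `ExampleSink` and its private helpers invented purely for illustration:

```cpp
// Minimal SinkToStorage skeleton following the pattern used by the sinks in this
// patch (SQLiteSink, SetOrJoinSink, StripeLogSink, TinyLogSink, ...).
class ExampleSink : public SinkToStorage
{
public:
    explicit ExampleSink(const Block & header)
        : SinkToStorage(header)            // header is fixed here, no getHeader() override
    {
    }

    String getName() const override { return "ExampleSink"; }

    void consume(Chunk chunk) override     // replaces write(const Block &)
    {
        // Re-attach the header to obtain a Block where block-based code is reused.
        Block block = getPort().getHeader().cloneWithColumns(chunk.detachColumns());
        writeBlockSomewhere(block);        // hypothetical
    }

    void onFinish() override               // replaces writeSuffix()
    {
        finalizeSomewhere();               // hypothetical: flush buffers, write marks, etc.
    }

private:
    void writeBlockSomewhere(const Block &) {}
    void finalizeSomewhere() {}
};
```

Sinks that previously overrode `getHeader()` now just pass the sample block to the base constructor and read it back via `getPort().getHeader()` when a `Block` is still needed.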
- Block sorted_block = block.sortColumns(); + Block sorted_block = getPort().getHeader().cloneWithColumns(chunk.detachColumns()).sortColumns(); table.insertBlock(sorted_block); if (persistent) backup_stream.write(sorted_block); } -void SetOrJoinBlockOutputStream::writeSuffix() +void SetOrJoinSink::onFinish() { table.finishInsert(); if (persistent) @@ -99,10 +101,10 @@ void SetOrJoinBlockOutputStream::writeSuffix() } -BlockOutputStreamPtr StorageSetOrJoinBase::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr /*context*/) +SinkToStoragePtr StorageSetOrJoinBase::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr /*context*/) { UInt64 id = ++increment; - return std::make_shared(*this, metadata_snapshot, path, path + "tmp/", toString(id) + ".bin", persistent); + return std::make_shared(*this, metadata_snapshot, path, path + "tmp/", toString(id) + ".bin", persistent); } diff --git a/src/Storages/StorageSet.h b/src/Storages/StorageSet.h index 90b4a00d915..1166557ec8e 100644 --- a/src/Storages/StorageSet.h +++ b/src/Storages/StorageSet.h @@ -18,12 +18,12 @@ using SetPtr = std::shared_ptr; */ class StorageSetOrJoinBase : public IStorage { - friend class SetOrJoinBlockOutputStream; + friend class SetOrJoinSink; public: void rename(const String & new_path_to_table_data, const StorageID & new_table_id) override; - BlockOutputStreamPtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; + SinkToStoragePtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; bool storesDataOnDisk() const override { return true; } Strings getDataPaths() const override { return {path}; } diff --git a/src/Storages/StorageStripeLog.cpp b/src/Storages/StorageStripeLog.cpp index 36b10dfd2bb..6bf91a145ed 100644 --- a/src/Storages/StorageStripeLog.cpp +++ b/src/Storages/StorageStripeLog.cpp @@ -14,7 +14,6 @@ #include #include -#include #include #include #include @@ -32,6 +31,7 @@ #include "StorageLogSettings.h" #include #include +#include #include #include @@ -154,12 +154,13 @@ private: }; -class StripeLogBlockOutputStream final : public IBlockOutputStream +class StripeLogSink final : public SinkToStorage { public: - explicit StripeLogBlockOutputStream( + explicit StripeLogSink( StorageStripeLog & storage_, const StorageMetadataPtr & metadata_snapshot_, std::unique_lock && lock_) - : storage(storage_) + : SinkToStorage(metadata_snapshot_->getSampleBlock()) + , storage(storage_) , metadata_snapshot(metadata_snapshot_) , lock(std::move(lock_)) , data_out_file(storage.table_path + "data.bin") @@ -182,7 +183,9 @@ public: } } - ~StripeLogBlockOutputStream() override + String getName() const override { return "StripeLogSink"; } + + ~StripeLogSink() override { try { @@ -203,14 +206,12 @@ public: } } - Block getHeader() const override { return metadata_snapshot->getSampleBlock(); } - - void write(const Block & block) override + void consume(Chunk chunk) override { - block_out.write(block); + block_out.write(getPort().getHeader().cloneWithColumns(chunk.detachColumns())); } - void writeSuffix() override + void onFinish() override { if (done) return; @@ -369,13 +370,13 @@ Pipe StorageStripeLog::read( } -BlockOutputStreamPtr StorageStripeLog::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr context) +SinkToStoragePtr StorageStripeLog::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, 
ContextPtr context) { std::unique_lock lock(rwlock, getLockTimeout(context)); if (!lock) throw Exception("Lock timeout exceeded", ErrorCodes::TIMEOUT_EXCEEDED); - return std::make_shared(*this, metadata_snapshot, std::move(lock)); + return std::make_shared(*this, metadata_snapshot, std::move(lock)); } diff --git a/src/Storages/StorageStripeLog.h b/src/Storages/StorageStripeLog.h index 0749273043f..73d46b1e617 100644 --- a/src/Storages/StorageStripeLog.h +++ b/src/Storages/StorageStripeLog.h @@ -19,7 +19,7 @@ namespace DB class StorageStripeLog final : public shared_ptr_helper, public IStorage { friend class StripeLogSource; - friend class StripeLogBlockOutputStream; + friend class StripeLogSink; friend struct shared_ptr_helper; public: @@ -34,7 +34,7 @@ public: size_t max_block_size, unsigned num_streams) override; - BlockOutputStreamPtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; + SinkToStoragePtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; void rename(const String & new_path_to_table_data, const StorageID & new_table_id) override; diff --git a/src/Storages/StorageTableFunction.h b/src/Storages/StorageTableFunction.h index 859735fec5b..75480db7ef3 100644 --- a/src/Storages/StorageTableFunction.h +++ b/src/Storages/StorageTableFunction.h @@ -113,7 +113,7 @@ public: return pipe; } - BlockOutputStreamPtr write( + SinkToStoragePtr write( const ASTPtr & query, const StorageMetadataPtr & metadata_snapshot, ContextPtr context) override diff --git a/src/Storages/StorageTinyLog.cpp b/src/Storages/StorageTinyLog.cpp index 342101d91cc..53ae74c4e00 100644 --- a/src/Storages/StorageTinyLog.cpp +++ b/src/Storages/StorageTinyLog.cpp @@ -23,9 +23,6 @@ #include -#include -#include - #include #include @@ -38,6 +35,7 @@ #include "StorageLogSettings.h" #include +#include #include #define DBMS_STORAGE_LOG_DATA_FILE_EXTENSION ".bin" @@ -193,14 +191,15 @@ void TinyLogSource::readData(const NameAndTypePair & name_and_type, } -class TinyLogBlockOutputStream final : public IBlockOutputStream +class TinyLogSink final : public SinkToStorage { public: - explicit TinyLogBlockOutputStream( + explicit TinyLogSink( StorageTinyLog & storage_, const StorageMetadataPtr & metadata_snapshot_, std::unique_lock && lock_) - : storage(storage_), metadata_snapshot(metadata_snapshot_), lock(std::move(lock_)) + : SinkToStorage(metadata_snapshot_->getSampleBlock()) + , storage(storage_), metadata_snapshot(metadata_snapshot_), lock(std::move(lock_)) { if (!lock) throw Exception("Lock timeout exceeded", ErrorCodes::TIMEOUT_EXCEEDED); @@ -214,7 +213,7 @@ public: } } - ~TinyLogBlockOutputStream() override + ~TinyLogSink() override { try { @@ -232,10 +231,10 @@ public: } } - Block getHeader() const override { return metadata_snapshot->getSampleBlock(); } + String getName() const override { return "TinyLogSink"; } - void write(const Block & block) override; - void writeSuffix() override; + void consume(Chunk chunk) override; + void onFinish() override; private: StorageTinyLog & storage; @@ -275,7 +274,7 @@ private: }; -ISerialization::OutputStreamGetter TinyLogBlockOutputStream::createStreamGetter( +ISerialization::OutputStreamGetter TinyLogSink::createStreamGetter( const NameAndTypePair & column, WrittenStreams & written_streams) { @@ -299,7 +298,7 @@ ISerialization::OutputStreamGetter TinyLogBlockOutputStream::createStreamGetter( } -void TinyLogBlockOutputStream::writeData(const NameAndTypePair & name_and_type, 
const IColumn & column, WrittenStreams & written_streams) +void TinyLogSink::writeData(const NameAndTypePair & name_and_type, const IColumn & column, WrittenStreams & written_streams) { ISerialization::SerializeBinaryBulkSettings settings; const auto & [name, type] = name_and_type; @@ -319,7 +318,7 @@ void TinyLogBlockOutputStream::writeData(const NameAndTypePair & name_and_type, } -void TinyLogBlockOutputStream::writeSuffix() +void TinyLogSink::onFinish() { if (done) return; @@ -333,7 +332,7 @@ void TinyLogBlockOutputStream::writeSuffix() WrittenStreams written_streams; ISerialization::SerializeBinaryBulkSettings settings; - for (const auto & column : getHeader()) + for (const auto & column : getPort().getHeader()) { auto it = serialize_states.find(column.name); if (it != serialize_states.end()) @@ -366,8 +365,9 @@ void TinyLogBlockOutputStream::writeSuffix() } -void TinyLogBlockOutputStream::write(const Block & block) +void TinyLogSink::consume(Chunk chunk) { + auto block = getPort().getHeader().cloneWithColumns(chunk.detachColumns()); metadata_snapshot->check(block, true); /// The set of written offset columns so that you do not write shared columns for nested structures multiple times @@ -489,7 +489,7 @@ Pipe StorageTinyLog::read( { metadata_snapshot->check(column_names, getVirtuals(), getStorageID()); - auto all_columns = metadata_snapshot->getColumns().getAllWithSubcolumns().addTypes(column_names); + auto all_columns = metadata_snapshot->getColumns().getByNames(ColumnsDescription::All, column_names, true); // When reading, we lock the entire storage, because we only have one file // per column and can't modify it concurrently. @@ -509,9 +509,9 @@ Pipe StorageTinyLog::read( } -BlockOutputStreamPtr StorageTinyLog::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr context) +SinkToStoragePtr StorageTinyLog::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr context) { - return std::make_shared(*this, metadata_snapshot, std::unique_lock{rwlock, getLockTimeout(context)}); + return std::make_shared(*this, metadata_snapshot, std::unique_lock{rwlock, getLockTimeout(context)}); } diff --git a/src/Storages/StorageTinyLog.h b/src/Storages/StorageTinyLog.h index 849b0731a47..ef45e9fb74d 100644 --- a/src/Storages/StorageTinyLog.h +++ b/src/Storages/StorageTinyLog.h @@ -18,7 +18,7 @@ namespace DB class StorageTinyLog final : public shared_ptr_helper, public IStorage { friend class TinyLogSource; - friend class TinyLogBlockOutputStream; + friend class TinyLogSink; friend struct shared_ptr_helper; public: @@ -33,7 +33,7 @@ public: size_t max_block_size, unsigned num_streams) override; - BlockOutputStreamPtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; + SinkToStoragePtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; void rename(const String & new_path_to_table_data, const StorageID & new_table_id) override; diff --git a/src/Storages/StorageURL.cpp b/src/Storages/StorageURL.cpp index fd9453e632c..1aa5ac7f236 100644 --- a/src/Storages/StorageURL.cpp +++ b/src/Storages/StorageURL.cpp @@ -16,11 +16,12 @@ #include #include -#include +#include #include #include -#include +#include +#include #include #include @@ -104,8 +105,15 @@ namespace compression_method); auto input_format = FormatFactory::instance().getInput(format, *read_buf, sample_block, context, max_block_size, format_settings); - reader = 
std::make_shared(input_format); - reader = std::make_shared(reader, columns, context); + pipeline = std::make_unique(); + pipeline->init(Pipe(input_format)); + + pipeline->addSimpleTransform([&](const Block & cur_header) + { + return std::make_shared(cur_header, columns, *input_format, context); + }); + + reader = std::make_unique(*pipeline); } String getName() const override @@ -118,15 +126,11 @@ namespace if (!reader) return {}; - if (!initialized) - reader->readPrefix(); + Chunk chunk; + if (reader->pull(chunk)) + return chunk; - initialized = true; - - if (auto block = reader->read()) - return Chunk(block.getColumns(), block.rows()); - - reader->readSuffix(); + pipeline->reset(); reader.reset(); return {}; @@ -135,19 +139,20 @@ namespace private: String name; std::unique_ptr read_buf; - BlockInputStreamPtr reader; - bool initialized = false; + std::unique_ptr pipeline; + std::unique_ptr reader; }; } -StorageURLBlockOutputStream::StorageURLBlockOutputStream(const Poco::URI & uri, - const String & format, - const std::optional & format_settings, - const Block & sample_block_, - ContextPtr context, - const ConnectionTimeouts & timeouts, - const CompressionMethod compression_method) - : sample_block(sample_block_) +StorageURLSink::StorageURLSink( + const Poco::URI & uri, + const String & format, + const std::optional & format_settings, + const Block & sample_block, + ContextPtr context, + const ConnectionTimeouts & timeouts, + const CompressionMethod compression_method) + : SinkToStorage(sample_block) { write_buf = wrapWriteBufferWithCompressionMethod( std::make_unique(uri, Poco::Net::HTTPRequest::HTTP_POST, timeouts), @@ -157,17 +162,18 @@ StorageURLBlockOutputStream::StorageURLBlockOutputStream(const Poco::URI & uri, } -void StorageURLBlockOutputStream::write(const Block & block) +void StorageURLSink::consume(Chunk chunk) { - writer->write(block); + if (is_first_chunk) + { + writer->writePrefix(); + is_first_chunk = false; + } + + writer->write(getPort().getHeader().cloneWithColumns(chunk.detachColumns())); } -void StorageURLBlockOutputStream::writePrefix() -{ - writer->writePrefix(); -} - -void StorageURLBlockOutputStream::writeSuffix() +void StorageURLSink::onFinish() { writer->writeSuffix(); writer->flush(); @@ -289,9 +295,9 @@ Pipe StorageURLWithFailover::read( } -BlockOutputStreamPtr IStorageURLBase::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr context) +SinkToStoragePtr IStorageURLBase::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr context) { - return std::make_shared(uri, format_name, + return std::make_shared(uri, format_name, format_settings, metadata_snapshot->getSampleBlock(), context, ConnectionTimeouts::getHTTPTimeouts(context), chooseCompressionMethod(uri.toString(), compression_method)); diff --git a/src/Storages/StorageURL.h b/src/Storages/StorageURL.h index 78306292044..7d61661b68d 100644 --- a/src/Storages/StorageURL.h +++ b/src/Storages/StorageURL.h @@ -3,7 +3,7 @@ #include #include #include -#include +#include #include #include #include @@ -32,7 +32,7 @@ public: size_t max_block_size, unsigned num_streams) override; - BlockOutputStreamPtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; + SinkToStoragePtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; protected: IStorageURLBase( @@ -77,31 +77,27 @@ private: virtual Block getHeaderBlock(const Names & column_names, 
const StorageMetadataPtr & metadata_snapshot) const = 0; }; -class StorageURLBlockOutputStream : public IBlockOutputStream +class StorageURLSink : public SinkToStorage { public: - StorageURLBlockOutputStream( + StorageURLSink( const Poco::URI & uri, const String & format, const std::optional & format_settings, - const Block & sample_block_, + const Block & sample_block, ContextPtr context, const ConnectionTimeouts & timeouts, CompressionMethod compression_method); - Block getHeader() const override - { - return sample_block; - } - - void write(const Block & block) override; - void writePrefix() override; - void writeSuffix() override; + std::string getName() const override { return "StorageURLSink"; } + void consume(Chunk chunk) override; + void onFinish() override; private: - Block sample_block; std::unique_ptr write_buf; BlockOutputStreamPtr writer; + + bool is_first_chunk = true; }; class StorageURL : public shared_ptr_helper, public IStorageURLBase diff --git a/src/Storages/StorageXDBC.cpp b/src/Storages/StorageXDBC.cpp index 9cffc32fda1..123e6cfdec1 100644 --- a/src/Storages/StorageXDBC.cpp +++ b/src/Storages/StorageXDBC.cpp @@ -114,7 +114,7 @@ Pipe StorageXDBC::read( return IStorageURLBase::read(column_names, metadata_snapshot, query_info, local_context, processed_stage, max_block_size, num_streams); } -BlockOutputStreamPtr StorageXDBC::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr local_context) +SinkToStoragePtr StorageXDBC::write(const ASTPtr & /*query*/, const StorageMetadataPtr & metadata_snapshot, ContextPtr local_context) { bridge_helper->startBridgeSync(); @@ -130,7 +130,7 @@ BlockOutputStreamPtr StorageXDBC::write(const ASTPtr & /*query*/, const StorageM request_uri.addQueryParameter("format_name", format_name); request_uri.addQueryParameter("sample_block", metadata_snapshot->getSampleBlock().getNamesAndTypesList().toString()); - return std::make_shared( + return std::make_shared( request_uri, format_name, getFormatSettings(local_context), diff --git a/src/Storages/StorageXDBC.h b/src/Storages/StorageXDBC.h index db0b506546d..1a50f0461e6 100644 --- a/src/Storages/StorageXDBC.h +++ b/src/Storages/StorageXDBC.h @@ -33,7 +33,7 @@ public: ContextPtr context_, BridgeHelperPtr bridge_helper_); - BlockOutputStreamPtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; + SinkToStoragePtr write(const ASTPtr & query, const StorageMetadataPtr & /*metadata_snapshot*/, ContextPtr context) override; std::string getName() const override; private: diff --git a/src/Storages/System/StorageSystemNumbers.cpp b/src/Storages/System/StorageSystemNumbers.cpp index 545f2c8be9a..3a88cc96639 100644 --- a/src/Storages/System/StorageSystemNumbers.cpp +++ b/src/Storages/System/StorageSystemNumbers.cpp @@ -1,7 +1,6 @@ #include #include #include -#include #include #include diff --git a/src/Storages/System/StorageSystemPartsBase.cpp b/src/Storages/System/StorageSystemPartsBase.cpp index 7243e5aa3ba..c730d5a95c9 100644 --- a/src/Storages/System/StorageSystemPartsBase.cpp +++ b/src/Storages/System/StorageSystemPartsBase.cpp @@ -7,7 +7,7 @@ #include #include #include -#include +#include #include #include #include @@ -124,7 +124,7 @@ StoragesInfoStream::StoragesInfoStream(const SelectQueryInfo & query_info, Conte String engine_name = storage->getName(); #if USE_MYSQL - if (auto * proxy = dynamic_cast(storage.get())) + if (auto * proxy = dynamic_cast(storage.get())) { auto nested = proxy->getNested(); 
storage.swap(nested); diff --git a/src/Storages/System/StorageSystemReplicas.cpp b/src/Storages/System/StorageSystemReplicas.cpp index fc33c6b421b..5c22d3c2fae 100644 --- a/src/Storages/System/StorageSystemReplicas.cpp +++ b/src/Storages/System/StorageSystemReplicas.cpp @@ -2,6 +2,7 @@ #include #include #include +#include #include #include #include @@ -51,6 +52,7 @@ StorageSystemReplicas::StorageSystemReplicas(const StorageID & table_id_) { "total_replicas", std::make_shared() }, { "active_replicas", std::make_shared() }, { "zookeeper_exception", std::make_shared() }, + { "replica_is_active", std::make_shared(std::make_shared(), std::make_shared()) } })); setInMemoryMetadata(storage_metadata); } @@ -101,7 +103,8 @@ Pipe StorageSystemReplicas::read( || column_name == "log_pointer" || column_name == "total_replicas" || column_name == "active_replicas" - || column_name == "zookeeper_exception") + || column_name == "zookeeper_exception" + || column_name == "replica_is_active") { with_zk_fields = true; break; @@ -184,6 +187,18 @@ Pipe StorageSystemReplicas::read( res_columns[col_num++]->insert(status.total_replicas); res_columns[col_num++]->insert(status.active_replicas); res_columns[col_num++]->insert(status.zookeeper_exception); + + Map replica_is_active_values; + for (const auto & [name, is_active] : status.replica_is_active) + { + Tuple is_replica_active_value; + is_replica_active_value.emplace_back(name); + is_replica_active_value.emplace_back(is_active); + + replica_is_active_values.emplace_back(std::move(is_replica_active_value)); + } + + res_columns[col_num++]->insert(std::move(replica_is_active_values)); } Block header = metadata_snapshot->getSampleBlock(); diff --git a/src/Storages/System/StorageSystemSettings.cpp b/src/Storages/System/StorageSystemSettings.cpp index 1aca7e45190..e1f1e4985b4 100644 --- a/src/Storages/System/StorageSystemSettings.cpp +++ b/src/Storages/System/StorageSystemSettings.cpp @@ -3,7 +3,7 @@ #include #include #include -#include +#include namespace DB @@ -29,7 +29,8 @@ NamesAndTypesList StorageSystemSettings::getNamesAndTypes() void StorageSystemSettings::fillData(MutableColumns & res_columns, ContextPtr context, const SelectQueryInfo &) const { const Settings & settings = context->getSettingsRef(); - auto settings_constraints = context->getSettingsConstraints(); + auto constraints_and_current_profiles = context->getSettingsConstraintsAndCurrentProfiles(); + const auto & constraints = constraints_and_current_profiles->constraints; for (const auto & setting : settings.all()) { const auto & setting_name = setting.getName(); @@ -40,8 +41,7 @@ void StorageSystemSettings::fillData(MutableColumns & res_columns, ContextPtr co Field min, max; bool read_only = false; - if (settings_constraints) - settings_constraints->get(setting_name, min, max, read_only); + constraints.get(setting_name, min, max, read_only); /// These two columns can accept strings only. 
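Returning to the `StorageSystemReplicas` change above: the type of the new `replica_is_active` column lost its template arguments in this copy of the diff. A hedged reconstruction, inferred from the (name, flag) tuples the fill loop builds; `DataTypeMap`, `DataTypeString` and `DataTypeUInt8` are assumptions, not verbatim from the patch.

```cpp
// Likely form of the column declaration: the new column is Map(String, UInt8),
// i.e. replica name -> 0/1 activity flag.
{ "replica_is_active", std::make_shared<DataTypeMap>(
      std::make_shared<DataTypeString>(),
      std::make_shared<DataTypeUInt8>()) }

// At insert time a Map field value is just an array of (key, value) tuples,
// which is exactly what the loop over status.replica_is_active assembles before
// calling res_columns[col_num++]->insert(std::move(replica_is_active_values)).
```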
if (!min.isNull()) diff --git a/src/Storages/System/StorageSystemUsers.cpp b/src/Storages/System/StorageSystemUsers.cpp index 90b0a914d58..a48e12a1476 100644 --- a/src/Storages/System/StorageSystemUsers.cpp +++ b/src/Storages/System/StorageSystemUsers.cpp @@ -50,6 +50,7 @@ NamesAndTypesList StorageSystemUsers::getNamesAndTypes() {"grantees_any", std::make_shared()}, {"grantees_list", std::make_shared(std::make_shared())}, {"grantees_except", std::make_shared(std::make_shared())}, + {"default_database", std::make_shared()}, }; return names_and_types; } @@ -85,6 +86,7 @@ void StorageSystemUsers::fillData(MutableColumns & res_columns, ContextPtr conte auto & column_grantees_list_offsets = assert_cast(*res_columns[column_index++]).getOffsets(); auto & column_grantees_except = assert_cast(assert_cast(*res_columns[column_index]).getData()); auto & column_grantees_except_offsets = assert_cast(*res_columns[column_index++]).getOffsets(); + auto & column_default_database = assert_cast(*res_columns[column_index++]); auto add_row = [&](const String & name, const UUID & id, @@ -92,7 +94,8 @@ void StorageSystemUsers::fillData(MutableColumns & res_columns, ContextPtr conte const Authentication & authentication, const AllowedClientHosts & allowed_hosts, const RolesOrUsersSet & default_roles, - const RolesOrUsersSet & grantees) + const RolesOrUsersSet & grantees, + const String default_database) { column_name.insertData(name.data(), name.length()); column_id.push_back(id.toUnderType()); @@ -180,6 +183,8 @@ void StorageSystemUsers::fillData(MutableColumns & res_columns, ContextPtr conte for (const auto & except_name : grantees_ast->except_names) column_grantees_except.insertData(except_name.data(), except_name.length()); column_grantees_except_offsets.push_back(column_grantees_except.size()); + + column_default_database.insertData(default_database.data(),default_database.length()); }; for (const auto & id : ids) @@ -192,7 +197,8 @@ void StorageSystemUsers::fillData(MutableColumns & res_columns, ContextPtr conte if (!storage) continue; - add_row(user->getName(), id, storage->getStorageName(), user->authentication, user->allowed_client_hosts, user->default_roles, user->grantees); + add_row(user->getName(), id, storage->getStorageName(), user->authentication, user->allowed_client_hosts, + user->default_roles, user->grantees, user->default_database); } } diff --git a/src/Storages/System/StorageSystemZooKeeper.cpp b/src/Storages/System/StorageSystemZooKeeper.cpp index 1a8aac3b277..d19aef47616 100644 --- a/src/Storages/System/StorageSystemZooKeeper.cpp +++ b/src/Storages/System/StorageSystemZooKeeper.cpp @@ -15,6 +15,7 @@ #include #include #include +#include namespace DB diff --git a/src/Storages/System/attachSystemTables.cpp b/src/Storages/System/attachSystemTables.cpp index b3cc254a392..95e86487073 100644 --- a/src/Storages/System/attachSystemTables.cpp +++ b/src/Storages/System/attachSystemTables.cpp @@ -1,3 +1,7 @@ +#if !defined(ARCADIA_BUILD) +# include "config_core.h" +#endif + #include #include #include @@ -74,6 +78,10 @@ #include #endif +#if USE_ROCKSDB +#include +#endif + namespace DB { @@ -126,6 +134,9 @@ void attachSystemTablesLocal(IDatabase & system_database) #ifdef OS_LINUX attach(system_database, "stack_trace"); #endif +#if USE_ROCKSDB + attach(system_database, "rocksdb"); +#endif } void attachSystemTablesServer(IDatabase & system_database, bool has_zookeeper) diff --git a/src/Storages/examples/get_abandonable_lock_in_all_partitions.cpp 
b/src/Storages/examples/get_abandonable_lock_in_all_partitions.cpp index 9027ba53708..0c20d2ed4c8 100644 --- a/src/Storages/examples/get_abandonable_lock_in_all_partitions.cpp +++ b/src/Storages/examples/get_abandonable_lock_in_all_partitions.cpp @@ -26,7 +26,7 @@ try auto config = processor.loadConfig().configuration; String root_path = argv[2]; - zkutil::ZooKeeper zk(*config, "zookeeper"); + zkutil::ZooKeeper zk(*config, "zookeeper", nullptr); String temp_path = root_path + "/temp"; String blocks_path = root_path + "/block_numbers"; diff --git a/src/Storages/examples/get_current_inserts_in_replicated.cpp b/src/Storages/examples/get_current_inserts_in_replicated.cpp index aa9e31792c1..94e63e49b19 100644 --- a/src/Storages/examples/get_current_inserts_in_replicated.cpp +++ b/src/Storages/examples/get_current_inserts_in_replicated.cpp @@ -29,7 +29,7 @@ try auto config = processor.loadConfig().configuration; String zookeeper_path = argv[2]; - auto zookeeper = std::make_shared(*config, "zookeeper"); + auto zookeeper = std::make_shared(*config, "zookeeper", nullptr); std::unordered_map> current_inserts; diff --git a/src/Storages/tests/gtest_storage_log.cpp b/src/Storages/tests/gtest_storage_log.cpp index e8057c54a0f..16902eafc98 100644 --- a/src/Storages/tests/gtest_storage_log.cpp +++ b/src/Storages/tests/gtest_storage_log.cpp @@ -1,7 +1,7 @@ #include #include -#include +#include #include #include #include @@ -100,7 +100,7 @@ std::string writeData(int rows, DB::StoragePtr & table, const DB::ContextPtr con block.insert(column); } - BlockOutputStreamPtr out = table->write({}, metadata_snapshot, context); + auto out = std::make_shared(table->write({}, metadata_snapshot, context)); out->write(block); out->writeSuffix(); diff --git a/src/Storages/ya.make b/src/Storages/ya.make index c001d933558..476449e8e6c 100644 --- a/src/Storages/ya.make +++ b/src/Storages/ya.make @@ -16,8 +16,8 @@ SRCS( ColumnsDescription.cpp ConstraintsDescription.cpp Distributed/DirectoryMonitor.cpp - Distributed/DistributedBlockOutputStream.cpp Distributed/DistributedSettings.cpp + Distributed/DistributedSink.cpp IStorage.cpp IndicesDescription.cpp JoinSettings.cpp @@ -41,7 +41,6 @@ SRCS( MergeTree/MergeAlgorithm.cpp MergeTree/MergeList.cpp MergeTree/MergeTreeBaseSelectProcessor.cpp - MergeTree/MergeTreeBlockOutputStream.cpp MergeTree/MergeTreeBlockReadUtils.cpp MergeTree/MergeTreeData.cpp MergeTree/MergeTreeDataMergerMutator.cpp @@ -59,6 +58,7 @@ SRCS( MergeTree/MergeTreeDataSelectExecutor.cpp MergeTree/MergeTreeDataWriter.cpp MergeTree/MergeTreeDeduplicationLog.cpp + MergeTree/MergeTreeInOrderSelectProcessor.cpp MergeTree/MergeTreeIndexAggregatorBloomFilter.cpp MergeTree/MergeTreeIndexBloomFilter.cpp MergeTree/MergeTreeIndexConditionBloomFilter.cpp @@ -87,7 +87,8 @@ SRCS( MergeTree/MergeTreeSelectProcessor.cpp MergeTree/MergeTreeSequentialSource.cpp MergeTree/MergeTreeSettings.cpp - MergeTree/MergeTreeThreadSelectBlockInputProcessor.cpp + MergeTree/MergeTreeSink.cpp + MergeTree/MergeTreeThreadSelectProcessor.cpp MergeTree/MergeTreeWhereOptimizer.cpp MergeTree/MergeTreeWriteAheadLog.cpp MergeTree/MergeType.cpp @@ -99,7 +100,6 @@ SRCS( MergeTree/ReplicatedFetchList.cpp MergeTree/ReplicatedMergeTreeAddress.cpp MergeTree/ReplicatedMergeTreeAltersSequence.cpp - MergeTree/ReplicatedMergeTreeBlockOutputStream.cpp MergeTree/ReplicatedMergeTreeCleanupThread.cpp MergeTree/ReplicatedMergeTreeLogEntry.cpp MergeTree/ReplicatedMergeTreeMergeStrategyPicker.cpp @@ -108,6 +108,7 @@ SRCS( MergeTree/ReplicatedMergeTreePartHeader.cpp 
MergeTree/ReplicatedMergeTreeQueue.cpp MergeTree/ReplicatedMergeTreeRestartingThread.cpp + MergeTree/ReplicatedMergeTreeSink.cpp MergeTree/ReplicatedMergeTreeTableMetadata.cpp MergeTree/SimpleMergeSelector.cpp MergeTree/TTLMergeSelector.cpp @@ -134,7 +135,7 @@ SRCS( StorageJoin.cpp StorageLog.cpp StorageLogSettings.cpp - StorageMaterializeMySQL.cpp + StorageMaterializedMySQL.cpp StorageMaterializedView.cpp StorageMemory.cpp StorageMerge.cpp diff --git a/src/TableFunctions/ITableFunctionFileLike.cpp b/src/TableFunctions/ITableFunctionFileLike.cpp index 3c4ab0edbab..1b96a0fe713 100644 --- a/src/TableFunctions/ITableFunctionFileLike.cpp +++ b/src/TableFunctions/ITableFunctionFileLike.cpp @@ -10,11 +10,13 @@ #include #include -#include #include #include +#include + + namespace DB { @@ -83,8 +85,8 @@ ColumnsDescription ITableFunctionFileLike::getActualTableStructure(ContextPtr co Strings paths = StorageFile::getPathsList(filename, context->getUserFilesPath(), context, total_bytes_to_read); if (paths.empty()) throw Exception("Cannot get table structure from file, because no files match specified name", ErrorCodes::INCORRECT_FILE_NAME); - auto read_stream = StorageDistributedDirectoryMonitor::createStreamFromFile(paths[0]); - return ColumnsDescription{read_stream->getHeader().getNamesAndTypesList()}; + auto read_stream = StorageDistributedDirectoryMonitor::createSourceFromFile(paths[0]); + return ColumnsDescription{read_stream->getOutputs().front().getHeader().getNamesAndTypesList()}; } return parseColumnsListFromString(structure, context); } diff --git a/src/TableFunctions/TableFunctionRemote.cpp b/src/TableFunctions/TableFunctionRemote.cpp index 40bfa2cbb6b..08f61a49fa5 100644 --- a/src/TableFunctions/TableFunctionRemote.cpp +++ b/src/TableFunctions/TableFunctionRemote.cpp @@ -12,6 +12,7 @@ #include #include #include +#include #include #include #include @@ -156,10 +157,11 @@ void TableFunctionRemote::parseArguments(const ASTPtr & ast_function, ContextPtr if (!cluster_name.empty()) { /// Use an existing cluster from the main config + String cluster_name_expanded = context->getMacros()->expand(cluster_name); if (name != "clusterAllReplicas") - cluster = context->getCluster(cluster_name); + cluster = context->getCluster(cluster_name_expanded); else - cluster = context->getCluster(cluster_name)->getClusterWithReplicasAsShards(context->getSettings()); + cluster = context->getCluster(cluster_name_expanded)->getClusterWithReplicasAsShards(context->getSettings()); } else { @@ -192,13 +194,16 @@ void TableFunctionRemote::parseArguments(const ASTPtr & ast_function, ContextPtr } } + bool treat_local_as_remote = false; + bool treat_local_port_as_remote = context->getApplicationType() == Context::ApplicationType::LOCAL; cluster = std::make_shared( context->getSettings(), names, username, password, (secure ? (maybe_secure_port ? 
*maybe_secure_port : DBMS_DEFAULT_SECURE_PORT) : context->getTCPPort()), - false, + treat_local_as_remote, + treat_local_port_as_remote, secure); } diff --git a/src/TableFunctions/TableFunctionSQLite.cpp b/src/TableFunctions/TableFunctionSQLite.cpp index 48bd350f851..a2038725d07 100644 --- a/src/TableFunctions/TableFunctionSQLite.cpp +++ b/src/TableFunctions/TableFunctionSQLite.cpp @@ -5,15 +5,18 @@ #include #include +#include +#include #include "registerTableFunctions.h" #include +#include + #include #include #include #include -#include namespace DB @@ -34,6 +37,7 @@ StoragePtr TableFunctionSQLite::executeImpl(const ASTPtr & /*ast_function*/, auto storage = StorageSQLite::create(StorageID(getDatabaseName(), table_name), sqlite_db, + database_path, remote_table_name, columns, ConstraintsDescription{}, context); @@ -72,14 +76,7 @@ void TableFunctionSQLite::parseArguments(const ASTPtr & ast_function, ContextPtr database_path = args[0]->as().value.safeGet(); remote_table_name = args[1]->as().value.safeGet(); - sqlite3 * tmp_sqlite_db = nullptr; - int status = sqlite3_open(database_path.c_str(), &tmp_sqlite_db); - if (status != SQLITE_OK) - throw Exception(ErrorCodes::SQLITE_ENGINE_ERROR, - "Failed to open sqlite database. Status: {}. Message: {}", - status, sqlite3_errstr(status)); - - sqlite_db = std::shared_ptr(tmp_sqlite_db, sqlite3_close); + sqlite_db = openSQLiteDB(database_path, context); } diff --git a/tests/.gitignore b/tests/.gitignore index ac05cdced53..6604360fe12 100644 --- a/tests/.gitignore +++ b/tests/.gitignore @@ -3,3 +3,6 @@ *.error *.dump test_data + +/queries/0_stateless/*.gen.sql +/queries/0_stateless/*.gen.reference diff --git a/tests/clickhouse-test b/tests/clickhouse-test index b4c8203878d..d83b3f08c42 100755 --- a/tests/clickhouse-test +++ b/tests/clickhouse-test @@ -27,8 +27,16 @@ except ImportError: import random import string import multiprocessing +import socket from contextlib import closing +USE_JINJA = True +try: + import jinja2 +except ImportError: + USE_JINJA = False + print('WARNING: jinja2 not installed! Template tests will be skipped.') + DISTRIBUTED_DDL_TIMEOUT_MSG = "is executing longer than distributed_ddl_task_timeout" MESSAGES_TO_RETRY = [ @@ -42,11 +50,14 @@ MESSAGES_TO_RETRY = [ "ConnectionPoolWithFailover: Connection failed at try", "DB::Exception: New table appeared in database being dropped or detached. Try again", "is already started to be removing by another replica right now", + "Shutdown is called for table", # It happens in SYSTEM SYNC REPLICA query if session with ZooKeeper is being reinitialized. 
DISTRIBUTED_DDL_TIMEOUT_MSG # FIXME ] MAX_RETRIES = 3 +TEST_FILE_EXTENSIONS = ['.sql', '.sql.j2', '.sh', '.py', '.expect'] + class Terminated(KeyboardInterrupt): pass @@ -165,7 +176,8 @@ def configure_testcase_args(args, case_file, suite_tmp_dir, stderr_file): database = 'test_{suffix}'.format(suffix=random_str()) with open(stderr_file, 'w') as stderr: - clickhouse_proc_create = Popen(shlex.split(testcase_args.testcase_client), stdin=PIPE, stdout=PIPE, stderr=stderr, universal_newlines=True) + client_cmd = testcase_args.testcase_client + " " + get_additional_client_options(args) + clickhouse_proc_create = Popen(shlex.split(client_cmd), stdin=PIPE, stdout=PIPE, stderr=stderr, universal_newlines=True) try: clickhouse_proc_create.communicate(("CREATE DATABASE " + database + get_db_engine(testcase_args, database)), timeout=testcase_args.timeout) except TimeoutExpired: @@ -256,6 +268,9 @@ def run_single_test(args, ext, server_logs_level, client_options, case_file, std os.system("LC_ALL=C sed -i -e 's|/auto_{{shard}}||g' {file}".format(file=stdout_file)) os.system("LC_ALL=C sed -i -e 's|auto_{{replica}}||g' {file}".format(file=stdout_file)) + # Normalize hostname in stdout file. + os.system("LC_ALL=C sed -i -e 's/{hostname}/localhost/g' {file}".format(hostname=socket.gethostname(), file=stdout_file)) + stdout = open(stdout_file, 'rb').read() if os.path.exists(stdout_file) else b'' stdout = str(stdout, errors='replace', encoding='utf-8') stderr = open(stderr_file, 'rb').read() if os.path.exists(stderr_file) else b'' @@ -264,8 +279,8 @@ def run_single_test(args, ext, server_logs_level, client_options, case_file, std return proc, stdout, stderr, total_time -def need_retry(stderr): - return any(msg in stderr for msg in MESSAGES_TO_RETRY) +def need_retry(stdout, stderr): + return any(msg in stdout for msg in MESSAGES_TO_RETRY) or any(msg in stderr for msg in MESSAGES_TO_RETRY) def get_processlist(args): @@ -407,13 +422,13 @@ def run_tests_array(all_tests_with_params): status = '' if not is_concurrent: sys.stdout.flush() - sys.stdout.write("{0:72}".format(name + ": ")) + sys.stdout.write("{0:72}".format(removesuffix(name, ".gen", ".sql") + ": ")) # This flush is needed so you can see the test name of the long # running test before it will finish. But don't do it in parallel # mode, so that the lines don't mix. sys.stdout.flush() else: - status = "{0:72}".format(name + ": ") + status = "{0:72}".format(removesuffix(name, ".gen", ".sql") + ": ") if args.skip and any(s in name for s in args.skip): status += MSG_SKIPPED + " - skip\n" @@ -434,6 +449,9 @@ def run_tests_array(all_tests_with_params): or 'race' in name): status += MSG_SKIPPED + " - no long\n" skipped_total += 1 + elif not USE_JINJA and ext.endswith("j2"): + status += MSG_SKIPPED + " - no jinja\n" + skipped_total += 1 else: disabled_file = os.path.join(suite_dir, name) + '.disabled' @@ -458,11 +476,10 @@ def run_tests_array(all_tests_with_params): break file_suffix = ('.' 
+ str(os.getpid())) if is_concurrent and args.test_runs > 1 else '' - reference_file = os.path.join(suite_dir, name) + '.reference' + reference_file = get_reference_file(suite_dir, name) stdout_file = os.path.join(suite_tmp_dir, name) + file_suffix + '.stdout' stderr_file = os.path.join(suite_tmp_dir, name) + file_suffix + '.stderr' - testcase_args = configure_testcase_args(args, case_file, suite_tmp_dir, stderr_file) proc, stdout, stderr, total_time = run_single_test(testcase_args, ext, server_logs_level, client_options, case_file, stdout_file, stderr_file) @@ -482,7 +499,7 @@ def run_tests_array(all_tests_with_params): status += 'Database: ' + testcase_args.testcase_database else: counter = 1 - while need_retry(stderr): + while need_retry(stdout, stderr): restarted_tests.append((case_file, stderr)) testcase_args = configure_testcase_args(args, case_file, suite_tmp_dir, stderr_file) proc, stdout, stderr, total_time = run_single_test(testcase_args, ext, server_logs_level, client_options, case_file, stdout_file, stderr_file) @@ -535,7 +552,7 @@ def run_tests_array(all_tests_with_params): status += " - having exception:\n{}\n".format( '\n'.join(stdout.split('\n')[:100])) status += 'Database: ' + testcase_args.testcase_database - elif not os.path.isfile(reference_file): + elif reference_file is None: status += MSG_UNKNOWN status += print_test_time(total_time) status += " - no reference file\n" @@ -760,6 +777,97 @@ def do_run_tests(jobs, suite, suite_dir, suite_tmp_dir, all_tests, parallel_test return num_tests +def is_test_from_dir(suite_dir, case): + case_file = os.path.join(suite_dir, case) + # We could also test for executable files (os.access(case_file, os.X_OK), + # but it interferes with 01610_client_spawn_editor.editor, which is invoked + # as a query editor in the test, and must be marked as executable. 
+ return os.path.isfile(case_file) and any(case_file.endswith(suppotred_ext) for suppotred_ext in TEST_FILE_EXTENSIONS) + + +def removesuffix(text, *suffixes): + """ + Added in python 3.9 + https://www.python.org/dev/peps/pep-0616/ + + This version can work with severtal possible suffixes + """ + for suffix in suffixes: + if suffix and text.endswith(suffix): + return text[:-len(suffix)] + return text + + +def render_test_template(j2env, suite_dir, test_name): + """ + Render template for test and reference file if needed + """ + + if j2env is None: + return test_name + + test_base_name = removesuffix(test_name, ".sql.j2", ".sql") + + reference_file_name = test_base_name + ".reference.j2" + reference_file_path = os.path.join(suite_dir, reference_file_name) + if os.path.isfile(reference_file_path): + tpl = j2env.get_template(reference_file_name) + tpl.stream().dump(os.path.join(suite_dir, test_base_name) + ".gen.reference") + + if test_name.endswith(".sql.j2"): + tpl = j2env.get_template(test_name) + generated_test_name = test_base_name + ".gen.sql" + tpl.stream().dump(os.path.join(suite_dir, generated_test_name)) + return generated_test_name + + return test_name + + +def get_selected_tests(suite_dir, patterns): + """ + Find all files with tests, filter, render templates + """ + + j2env = jinja2.Environment( + loader=jinja2.FileSystemLoader(suite_dir), + keep_trailing_newline=True, + ) if USE_JINJA else None + + for test_name in os.listdir(suite_dir): + if not is_test_from_dir(suite_dir, test_name): + continue + if patterns and not any(re.search(pattern, test_name) for pattern in patterns): + continue + if USE_JINJA and test_name.endswith(".gen.sql"): + continue + test_name = render_test_template(j2env, suite_dir, test_name) + yield test_name + + +def get_tests_list(suite_dir, patterns, test_runs, sort_key): + """ + Return list of tests file names to run + """ + + all_tests = list(get_selected_tests(suite_dir, patterns)) + all_tests = all_tests * test_runs + all_tests.sort(key=sort_key) + return all_tests + + +def get_reference_file(suite_dir, name): + """ + Returns reference file name for specified test + """ + + name = removesuffix(name, ".gen") + for ext in ['.reference', '.gen.reference']: + reference_file = os.path.join(suite_dir, name) + ext + if os.path.isfile(reference_file): + return reference_file + return None + + def main(args): global server_died global stop_time @@ -834,9 +942,10 @@ def main(args): def create_common_database(args, db_name): create_database_retries = 0 while create_database_retries < MAX_RETRIES: - clickhouse_proc_create = Popen(shlex.split(args.client), stdin=PIPE, stdout=PIPE, stderr=PIPE, universal_newlines=True) - (_, stderr) = clickhouse_proc_create.communicate(("CREATE DATABASE IF NOT EXISTS " + db_name + get_db_engine(args, db_name))) - if not need_retry(stderr): + client_cmd = args.client + " " + get_additional_client_options(args) + clickhouse_proc_create = Popen(shlex.split(client_cmd), stdin=PIPE, stdout=PIPE, stderr=PIPE, universal_newlines=True) + (stdout, stderr) = clickhouse_proc_create.communicate(("CREATE DATABASE IF NOT EXISTS " + db_name + get_db_engine(args, db_name))) + if not need_retry(stdout, stderr): break create_database_retries += 1 @@ -844,14 +953,6 @@ def main(args): create_common_database(args, args.database) create_common_database(args, "test") - def is_test_from_dir(suite_dir, case): - case_file = os.path.join(suite_dir, case) - (_, ext) = os.path.splitext(case) - # We could also test for executable files (os.access(case_file, 
os.X_OK), - # but it interferes with 01610_client_spawn_editor.editor, which is invoked - # as a query editor in the test, and must be marked as executable. - return os.path.isfile(case_file) and (ext in ['.sql', '.sh', '.py', '.expect']) - def sute_key_func(item): if args.order == 'random': return random.random() @@ -911,12 +1012,7 @@ def main(args): except ValueError: return 99997 - all_tests = os.listdir(suite_dir) - all_tests = [case for case in all_tests if is_test_from_dir(suite_dir, case)] - if args.test: - all_tests = [t for t in all_tests if any(re.search(r, t) for r in args.test)] - all_tests = all_tests * args.test_runs - all_tests.sort(key=key_func) + all_tests = get_tests_list(suite_dir, args.test, args.test_runs, key_func) jobs = args.jobs parallel_tests = [] diff --git a/tests/clickhouse-test-server b/tests/clickhouse-test-server deleted file mode 100755 index 4087468b597..00000000000 --- a/tests/clickhouse-test-server +++ /dev/null @@ -1,167 +0,0 @@ -#!/usr/bin/env bash - -set -x -set -o errexit -set -o pipefail - -CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) -ROOT_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && cd ../.. && pwd) -DATA_DIR=${DATA_DIR:=`mktemp -d /tmp/clickhouse.test..XXXXX`} -DATA_DIR_PATTERN=${DATA_DIR_PATTERN:=/tmp/clickhouse} # path from config file, will be replaced to temporary -LOG_DIR=${LOG_DIR:=$DATA_DIR/log} -export CLICKHOUSE_BINARY_NAME=${CLICKHOUSE_BINARY_NAME:="clickhouse"} -( [ -x "$ROOT_DIR/programs/${CLICKHOUSE_BINARY_NAME}-server" ] || [ -x "$ROOT_DIR/programs/${CLICKHOUSE_BINARY_NAME}" ] ) && BUILD_DIR=${BUILD_DIR:=$ROOT_DIR} # Build without separate build dir -[ -d "$ROOT_DIR/build${BUILD_TYPE}" ] && BUILD_DIR=${BUILD_DIR:=$ROOT_DIR/build${BUILD_TYPE}} -BUILD_DIR=${BUILD_DIR:=$ROOT_DIR} -[ -x ${CLICKHOUSE_BINARY_NAME}-server" ] && [ -x ${CLICKHOUSE_BINARY_NAME}-client" ] && BIN_DIR= # Allow run in /usr/bin -( [ -x "$BUILD_DIR/programs/${CLICKHOUSE_BINARY_NAME}" ] || [ -x "$BUILD_DIR/programs/${CLICKHOUSE_BINARY_NAME}-server" ] ) && BIN_DIR=${BIN_DIR:=$BUILD_DIR/programs/} -[ -x "$BIN_DIR/${CLICKHOUSE_BINARY_NAME}-server" ] && CLICKHOUSE_SERVER=${CLICKHOUSE_SERVER:=$BIN_DIR/${CLICKHOUSE_BINARY_NAME}-server} -[ -x "$BIN_DIR/${CLICKHOUSE_BINARY_NAME}" ] && CLICKHOUSE_SERVER=${CLICKHOUSE_SERVER:=$BIN_DIR/${CLICKHOUSE_BINARY_NAME} server} -[ -x "$BIN_DIR/${CLICKHOUSE_BINARY_NAME}-client" ] && CLICKHOUSE_CLIENT=${CLICKHOUSE_CLIENT:=$BIN_DIR/${CLICKHOUSE_BINARY_NAME}-client} -[ -x "$BIN_DIR/${CLICKHOUSE_BINARY_NAME}" ] && CLICKHOUSE_CLIENT=${CLICKHOUSE_CLIENT:=$BIN_DIR/${CLICKHOUSE_BINARY_NAME} client} -[ -x "$BIN_DIR/${CLICKHOUSE_BINARY_NAME}-extract-from-config" ] && CLICKHOUSE_EXTRACT=${CLICKHOUSE_EXTRACT:=$BIN_DIR/${CLICKHOUSE_BINARY_NAME}-extract-from-config} -[ -x "$BIN_DIR/${CLICKHOUSE_BINARY_NAME}" ] && CLICKHOUSE_EXTRACT=${CLICKHOUSE_EXTRACT:=$BIN_DIR/${CLICKHOUSE_BINARY_NAME} extract-from-config} - -[ -f "$CUR_DIR/server-test.xml" ] && CONFIG_DIR=${CONFIG_DIR=$CUR_DIR}/ -CONFIG_CLIENT_DIR=${CONFIG_CLIENT_DIR=$CONFIG_DIR} -CONFIG_SERVER_DIR=${CONFIG_SERVER_DIR=$CONFIG_DIR} -[ ! -f "${CONFIG_CLIENT_DIR}client-test.xml" ] && CONFIG_CLIENT_DIR=${CONFIG_CLIENT_DIR:=/etc/clickhouse-client/} -[ ! 
-f "${CONFIG_SERVER_DIR}server-test.xml" ] && CONFIG_SERVER_DIR=${CONFIG_SERVER_DIR:=/etc/clickhouse-server/} -export CLICKHOUSE_CONFIG_CLIENT=${CLICKHOUSE_CONFIG_CLIENT:=${CONFIG_CLIENT_DIR}client-test.xml} -export CLICKHOUSE_CONFIG=${CLICKHOUSE_CONFIG:=${CONFIG_SERVER_DIR}server-test.xml} -CLICKHOUSE_CONFIG_USERS=${CONFIG_SERVER_DIR}users.xml -[ ! -f "$CLICKHOUSE_CONFIG_USERS" ] && CLICKHOUSE_CONFIG_USERS=$CUR_DIR/../programs/server/users.xml -CLICKHOUSE_CONFIG_USERS_D=${CONFIG_SERVER_DIR}users.d -[ ! -d "$CLICKHOUSE_CONFIG_USERS_D" ] && CLICKHOUSE_CONFIG_USERS_D=$CUR_DIR/../programs/server/users.d -[ -x "$CUR_DIR/clickhouse-test" ] && TEST_DIR=${TEST_DIR=$CUR_DIR/} -[ -d "$CUR_DIR/queries" ] && QUERIES_DIR=${QUERIES_DIR=$CUR_DIR/queries} -[ ! -d "$QUERIES_DIR" ] && [ -d "/usr/local/share/clickhouse-test/queries" ] && QUERIES_DIR=${QUERIES_DIR=/usr/local/share/clickhouse-test/queries} -[ ! -d "$QUERIES_DIR" ] && [ -d "/usr/share/clickhouse-test/queries" ] && QUERIES_DIR=${QUERIES_DIR=/usr/share/clickhouse-test/queries} - -TEST_PORT_RANDOM=${TEST_PORT_RANDOM=1} -if [ "${TEST_PORT_RANDOM}" ]; then - CLICKHOUSE_PORT_BASE=${CLICKHOUSE_PORT_BASE:=$(( ( RANDOM % 50000 ) + 10000 ))} - CLICKHOUSE_PORT_TCP=${CLICKHOUSE_PORT_TCP:=$(($CLICKHOUSE_PORT_BASE + 1))} - CLICKHOUSE_PORT_HTTP=${CLICKHOUSE_PORT_HTTP:=$(($CLICKHOUSE_PORT_BASE + 2))} - CLICKHOUSE_PORT_INTERSERVER=${CLICKHOUSE_PORT_INTERSERVER:=$(($CLICKHOUSE_PORT_BASE + 3))} - CLICKHOUSE_PORT_TCP_SECURE=${CLICKHOUSE_PORT_TCP_SECURE:=$(($CLICKHOUSE_PORT_BASE + 4))} - CLICKHOUSE_PORT_HTTPS=${CLICKHOUSE_PORT_HTTPS:=$(($CLICKHOUSE_PORT_BASE + 5))} - CLICKHOUSE_PORT_ODBC_BRIDGE=${CLICKHOUSE_ODBC_BRIDGE:=$(($CLICKHOUSE_PORT_BASE + 6))} -fi - -rm -rf $DATA_DIR || true -mkdir -p $LOG_DIR $DATA_DIR/etc || true - -if [ "$DATA_DIR_PATTERN" != "$DATA_DIR" ]; then - cat $CLICKHOUSE_CONFIG | sed -e s!$DATA_DIR_PATTERN!$DATA_DIR! > $DATA_DIR/etc/server-config.xml - export CLICKHOUSE_CONFIG=$DATA_DIR/etc/server-config.xml - cp $CLICKHOUSE_CONFIG_USERS $DATA_DIR/etc - cp -R -L $CLICKHOUSE_CONFIG_USERS_D $DATA_DIR/etc - cat ${CONFIG_SERVER_DIR}/ints_dictionary.xml | sed -e s!9000!$CLICKHOUSE_PORT_TCP! > $DATA_DIR/etc/ints_dictionary.xml - cat ${CONFIG_SERVER_DIR}/strings_dictionary.xml | sed -e s!9000!$CLICKHOUSE_PORT_TCP! > $DATA_DIR/etc/strings_dictionary.xml - cat ${CONFIG_SERVER_DIR}/decimals_dictionary.xml | sed -e s!9000!$CLICKHOUSE_PORT_TCP! > $DATA_DIR/etc/decimals_dictionary.xml - cat ${CONFIG_SERVER_DIR}/executable_pool_dictionary.xml | sed -e s!9000!$CLICKHOUSE_PORT_TCP! 
> $DATA_DIR/etc/executable_pool_dictionary.xml -fi - -CLICKHOUSE_EXTRACT_CONFIG=${CLICKHOUSE_EXTRACT_CONFIG:="${CLICKHOUSE_EXTRACT} --config=$CLICKHOUSE_CONFIG"} -CLICKHOUSE_LOG=${CLICKHOUSE_LOG:=${LOG_DIR}clickhouse-server.log} -export CLICKHOUSE_PORT_TCP=${CLICKHOUSE_PORT_TCP:=`$CLICKHOUSE_EXTRACT_CONFIG --key=tcp_port || echo 9000`} -export CLICKHOUSE_PORT_HTTP=${CLICKHOUSE_PORT_HTTP:=`$CLICKHOUSE_EXTRACT_CONFIG --key=http_port || echo 8123`} -export CLICKHOUSE_PORT_INTERSERVER=${CLICKHOUSE_PORT_INTERSERVER:=`$CLICKHOUSE_EXTRACT_CONFIG --key=interserver_http_port || echo 9009`} -export CLICKHOUSE_PORT_TCP_SECURE=${CLICKHOUSE_PORT_TCP_SECURE:=`$CLICKHOUSE_EXTRACT_CONFIG --key=tcp_port_secure`} -export CLICKHOUSE_PORT_HTTPS=${CLICKHOUSE_PORT_HTTPS:=`$CLICKHOUSE_EXTRACT_CONFIG --key=https_port`} -export CLICKHOUSE_ODBC_BRIDGE=${CLICKHOUSE_ODBC_BRIDGE:=`$CLICKHOUSE_EXTRACT_CONFIG --key=odbc_bridge.port || echo 9018`} - -DHPARAM=`$CLICKHOUSE_EXTRACT_CONFIG --key=openSSL.server.dhParamsFile` -PRIVATEKEY=`$CLICKHOUSE_EXTRACT_CONFIG --key=openSSL.server.privateKeyFile` -CERT=`$CLICKHOUSE_EXTRACT_CONFIG --key=openSSL.server.certificateFile` -# Do not generate in case broken extract-config -[ -n "$DHPARAM" ] && openssl dhparam -out $DHPARAM 256 -[ -n "$PRIVATEKEY" ] && [ -n "$CERT" ] && openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout $PRIVATEKEY -out $CERT - -if [ "$TEST_GDB" ] || [ "$GDB" ]; then - echo -e "run \nset pagination off \nset logging file $LOG_DIR/server.gdb.log \nset logging on \nbacktrace \nthread apply all backtrace \nbacktrace \ndetach \nquit " > $DATA_DIR/gdb.cmd - GDB=${GDB:="gdb -x $DATA_DIR/gdb.cmd --args "} -fi - -# Start a local clickhouse server which will be used to run tests - -# TODO: fix change shard ports: -# --remote_servers.test_shard_localhost_secure.shard.replica.port=$CLICKHOUSE_PORT_TCP_SECURE \ -# --remote_servers.test_shard_localhost.shard.replica.port=$CLICKHOUSE_PORT_TCP \ - -VERSION=`$CLICKHOUSE_CLIENT --version-clean` -# If run from compile dir - use in-place compile binary and headers -[ -n "$BIN_DIR" ] && INTERNAL_COMPILER_PARAMS="--compiler_executable_root=${INTERNAL_COMPILER_BIN_ROOT:=$BUILD_DIR/programs/} --compiler_headers=$BUILD_DIR/programs/clang/headers/$VERSION/ --compiler_headers_root=$BUILD_DIR/programs/clang/headers/$VERSION/" - -$GDB $CLICKHOUSE_SERVER --config-file=$CLICKHOUSE_CONFIG --log=$CLICKHOUSE_LOG $TEST_SERVER_PARAMS -- \ - --http_port=$CLICKHOUSE_PORT_HTTP \ - --tcp_port=$CLICKHOUSE_PORT_TCP \ - --https_port=$CLICKHOUSE_PORT_HTTPS \ - --tcp_port_secure=$CLICKHOUSE_PORT_TCP_SECURE \ - --interserver_http_port=$CLICKHOUSE_PORT_INTERSERVER \ - --odbc_bridge.port=$CLICKHOUSE_ODBC_BRIDGE \ - $INTERNAL_COMPILER_PARAMS \ - $TEST_SERVER_CONFIG_PARAMS \ - 2>&1 > $LOG_DIR/server.stdout.log & -CH_PID=$! 
-sleep ${TEST_SERVER_STARTUP_WAIT:=5} - -if [ "$GDB" ]; then - # Long symbols read - sleep ${TEST_GDB_SLEEP:=60} -fi - -tail -n50 $LOG_DIR/*.log || true - -# Define needed stuff to kill test clickhouse server after tests completion -function finish { - kill $CH_PID || true - wait - tail -n 50 $LOG_DIR/*.log || true - if [ "$GDB" ]; then - cat $LOG_DIR/server.gdb.log || true - fi - rm -rf $DATA_DIR -} -trap finish EXIT SIGINT SIGQUIT SIGTERM - -# Do tests -if [ -n "$*" ]; then - $* -else - TEST_RUN=${TEST_RUN=1} - TEST_DICT=${TEST_DICT=1} - CLICKHOUSE_CLIENT_QUERY="${CLICKHOUSE_CLIENT} --config ${CLICKHOUSE_CONFIG_CLIENT} --port $CLICKHOUSE_PORT_TCP -m -n -q" - $CLICKHOUSE_CLIENT_QUERY 'SELECT * from system.build_options; SELECT * FROM system.clusters;' - CLICKHOUSE_TEST="env ${TEST_DIR}clickhouse-test --force-color --binary ${BIN_DIR}${CLICKHOUSE_BINARY_NAME} --configclient $CLICKHOUSE_CONFIG_CLIENT --configserver $CLICKHOUSE_CONFIG --tmp $DATA_DIR/tmp --queries $QUERIES_DIR $TEST_OPT0 $TEST_OPT" - if [ "${TEST_RUN_STRESS}" ]; then - # Running test in parallel will fail some results (tests can create/fill/drop same tables) - TEST_NPROC=${TEST_NPROC:=$(( `nproc || sysctl -n hw.ncpu || echo 2` * 2))} - for i in `seq 1 ${TEST_NPROC}`; do - $CLICKHOUSE_TEST --order=random --testname --tmp=$DATA_DIR/tmp/tmp${i} & - done - fi - - if [ "${TEST_RUN_PARALLEL}" ]; then - # Running test in parallel will fail some results (tests can create/fill/drop same tables) - TEST_NPROC=${TEST_NPROC:=$(( `nproc || sysctl -n hw.ncpu || echo 2` * 2))} - for i in `seq 1 ${TEST_NPROC}`; do - $CLICKHOUSE_TEST --testname --tmp=$DATA_DIR/tmp/tmp${i} --database=test${i} --parallel=${i}/${TEST_NPROC} & - done - for job in `jobs -p`; do - #echo wait $job - wait $job || let "FAIL+=1" - done - - #echo $FAIL - if [ "$FAIL" != "0" ]; then - return $FAIL - fi - else - ( [ "$TEST_RUN" ] && $CLICKHOUSE_TEST ) || ${TEST_TRUE:=false} - fi - - $CLICKHOUSE_CLIENT_QUERY "SELECT event, value FROM system.events; SELECT metric, value FROM system.metrics; SELECT metric, value FROM system.asynchronous_metrics;" - $CLICKHOUSE_CLIENT_QUERY "SELECT 'Still alive'" -fi \ No newline at end of file diff --git a/tests/config/config.d/encryption.xml b/tests/config/config.d/encryption.xml new file mode 100644 index 00000000000..216021ae744 --- /dev/null +++ b/tests/config/config.d/encryption.xml @@ -0,0 +1,6 @@ + + + + echo "U29tZSBmaXhlZCBrZXkgdGhhdCBpcyBhdCBsZWFzdCAxNiBieXRlcyBsb25n" + + diff --git a/tests/config/config.d/macros.xml b/tests/config/config.d/macros.xml index 4902b12bc81..b17eb40099d 100644 --- a/tests/config/config.d/macros.xml +++ b/tests/config/config.d/macros.xml @@ -1,6 +1,7 @@ Hello, world! + test_shard_localhost s1 r1 /clickhouse/tables/{database}/{shard}/ diff --git a/tests/config/config.d/secure_ports.xml b/tests/config/config.d/secure_ports.xml index d915daaf743..e832dce5526 100644 --- a/tests/config/config.d/secure_ports.xml +++ b/tests/config/config.d/secure_ports.xml @@ -5,7 +5,7 @@ - AcceptCertificateHandler + AcceptCertificateHandler diff --git a/tests/config/config.d/zookeeper_log.xml b/tests/config/config.d/zookeeper_log.xml new file mode 100644 index 00000000000..2ec1f2446ed --- /dev/null +++ b/tests/config/config.d/zookeeper_log.xml @@ -0,0 +1,7 @@ + + + system + zookeeper_log
+ 7500 +
+
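The new `tests/config/config.d/zookeeper_log.xml` above (wired into `install.sh` further below) enables the ZooKeeper request log: entries go to the `zookeeper_log` table in the `system` database, with 7500 presumably being the flush interval in milliseconds. A minimal sketch of how a test could verify that the log gets populated; `node` is assumed to be a started instance from `helpers.cluster`, as used elsewhere in this patch:

```python
# Sketch only, not part of the patch. Assumes `node` is a started
# ClickHouseCluster instance and that some ZooKeeper traffic has already
# happened (e.g. via a Replicated*MergeTree table).
def zookeeper_log_is_populated(node):
    # Make buffered log entries visible before querying the system table.
    node.query("SYSTEM FLUSH LOGS")
    count = int(node.query("SELECT count() FROM system.zookeeper_log").strip())
    return count > 0
```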
diff --git a/tests/config/executable_dictionary.xml b/tests/config/executable_dictionary.xml index c5a4a0947bc..6089f57a3d7 100644 --- a/tests/config/executable_dictionary.xml +++ b/tests/config/executable_dictionary.xml @@ -123,7 +123,7 @@ - echo "1\tValue" + printf "1\tValue\n" TabSeparated false @@ -197,7 +197,7 @@ - echo "1\tFirstKey\tValue" + printf "1\tFirstKey\tValue\n" TabSeparated false diff --git a/tests/config/install.sh b/tests/config/install.sh index 08add810cbf..571dff34018 100755 --- a/tests/config/install.sh +++ b/tests/config/install.sh @@ -34,6 +34,8 @@ ln -sf $SRC_PATH/config.d/logging_no_rotate.xml $DEST_SERVER_PATH/config.d/ ln -sf $SRC_PATH/config.d/tcp_with_proxy.xml $DEST_SERVER_PATH/config.d/ ln -sf $SRC_PATH/config.d/top_level_domains_lists.xml $DEST_SERVER_PATH/config.d/ ln -sf $SRC_PATH/config.d/top_level_domains_path.xml $DEST_SERVER_PATH/config.d/ +ln -sf $SRC_PATH/config.d/encryption.xml $DEST_SERVER_PATH/config.d/ +ln -sf $SRC_PATH/config.d/zookeeper_log.xml $DEST_SERVER_PATH/config.d/ ln -sf $SRC_PATH/users.d/log_queries.xml $DEST_SERVER_PATH/users.d/ ln -sf $SRC_PATH/users.d/readonly.xml $DEST_SERVER_PATH/users.d/ ln -sf $SRC_PATH/users.d/access_management.xml $DEST_SERVER_PATH/users.d/ diff --git a/tests/config/users.d/timeouts.xml b/tests/config/users.d/timeouts.xml index 583caca36a4..60b24cfdef8 100644 --- a/tests/config/users.d/timeouts.xml +++ b/tests/config/users.d/timeouts.xml @@ -4,6 +4,8 @@ 60 60 + + 60000
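A note on the `executable_dictionary.xml` hunk above: the source command is changed from `echo "1\tValue"` to `printf "1\tValue\n"` because the dictionary output is parsed as `TabSeparated`, which needs a real tab character and a trailing newline; `printf` interprets `\t`/`\n` consistently, while `echo`'s escape handling differs between shells. An illustrative Python equivalent of such an executable source (names are hypothetical, only the output format matters):

```python
# Illustrative sketch: an executable dictionary source for the TabSeparated
# format must write literal TAB-separated fields terminated by a newline.
import sys

def emit_row(*fields):
    sys.stdout.write("\t".join(str(f) for f in fields) + "\n")

if __name__ == "__main__":
    emit_row(1, "Value")                # same bytes as: printf "1\tValue\n"
    # emit_row(1, "FirstKey", "Value")  # the composite-key variant from the second hunk
```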
diff --git a/tests/integration/ci-runner.py b/tests/integration/ci-runner.py index 97d076f698e..ecd4cb8d4e7 100755 --- a/tests/integration/ci-runner.py +++ b/tests/integration/ci-runner.py @@ -261,12 +261,31 @@ class ClickhouseIntegrationTestsRunner: def _get_all_tests(self, repo_path): image_cmd = self._get_runner_image_cmd(repo_path) - cmd = "cd {}/tests/integration && ./runner --tmpfs {} ' --setup-plan' | grep '::' | sed 's/ (fixtures used:.*//g' | sed 's/^ *//g' | sed 's/ *$//g' | grep -v 'SKIPPED' | sort -u > all_tests.txt".format(repo_path, image_cmd) + out_file = "all_tests.txt" + out_file_full = "all_tests_full.txt" + cmd = "cd {repo_path}/tests/integration && " \ + "./runner --tmpfs {image_cmd} ' --setup-plan' " \ + "| tee {out_file_full} | grep '::' | sed 's/ (fixtures used:.*//g' | sed 's/^ *//g' | sed 's/ *$//g' " \ + "| grep -v 'SKIPPED' | sort -u > {out_file}".format( + repo_path=repo_path, image_cmd=image_cmd, out_file=out_file, out_file_full=out_file_full) + logging.info("Getting all tests with cmd '%s'", cmd) subprocess.check_call(cmd, shell=True) # STYLE_CHECK_ALLOW_SUBPROCESS_CHECK_CALL - all_tests_file_path = "{}/tests/integration/all_tests.txt".format(repo_path) + all_tests_file_path = "{repo_path}/tests/integration/{out_file}".format(repo_path=repo_path, out_file=out_file) if not os.path.isfile(all_tests_file_path) or os.path.getsize(all_tests_file_path) == 0: + all_tests_full_file_path = "{repo_path}/tests/integration/{out_file}".format(repo_path=repo_path, out_file=out_file_full) + if os.path.isfile(all_tests_full_file_path): + # log runner output + logging.info("runner output:") + with open(all_tests_full_file_path, 'r') as all_tests_full_file: + for line in all_tests_full_file: + line = line.rstrip() + if line: + logging.info("runner output: %s", line) + else: + logging.info("runner output '%s' is empty", all_tests_full_file_path) + raise Exception("There is something wrong with getting all tests list: file '{}' is empty or does not exist.".format(all_tests_file_path)) all_tests = [] @@ -376,7 +395,7 @@ class ClickhouseIntegrationTestsRunner: image_cmd = self._get_runner_image_cmd(repo_path) test_group_str = test_group.replace('/', '_').replace('.', '_') - + log_paths = [] test_data_dirs = {} diff --git a/tests/integration/conftest.py b/tests/integration/conftest.py index 993e7a6e973..bcd47899ca0 100644 --- a/tests/integration/conftest.py +++ b/tests/integration/conftest.py @@ -1,4 +1,3 @@ -import subprocess from helpers.cluster import run_and_check import pytest import logging diff --git a/tests/integration/helpers/cluster.py b/tests/integration/helpers/cluster.py index 5f7cfd9467b..6fe01b5df03 100644 --- a/tests/integration/helpers/cluster.py +++ b/tests/integration/helpers/cluster.py @@ -29,6 +29,8 @@ from dict2xml import dict2xml from kazoo.client import KazooClient from kazoo.exceptions import KazooException from minio import Minio +from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT + from helpers.test_tools import assert_eq_with_retry from helpers import pytest_xdist_logging_to_separate_files @@ -62,10 +64,10 @@ def run_and_check(args, env=None, shell=False, stdout=subprocess.PIPE, stderr=su out = res.stdout.decode('utf-8') err = res.stderr.decode('utf-8') # check_call(...) 
from subprocess does not print stderr, so we do it manually - if out: - logging.debug(f"Stdout:{out}") - if err: - logging.debug(f"Stderr:{err}") + for outline in out.splitlines(): + logging.debug(f"Stdout:{outline}") + for errline in err.splitlines(): + logging.debug(f"Stderr:{errline}") if res.returncode != 0: logging.debug(f"Exitcode:{res.returncode}") if env: @@ -108,6 +110,7 @@ def subprocess_check_call(args, detach=False, nothrow=False): #logging.info('run:' + ' '.join(args)) return run_and_check(args, detach=detach, nothrow=nothrow) + def get_odbc_bridge_path(): path = os.environ.get('CLICKHOUSE_TESTS_ODBC_BRIDGE_BIN_PATH') if path is None: @@ -259,6 +262,7 @@ class ClickHouseCluster: self.with_hdfs = False self.with_kerberized_hdfs = False self.with_mongo = False + self.with_mongo_secure = False self.with_net_trics = False self.with_redis = False self.with_cassandra = False @@ -332,12 +336,16 @@ class ClickHouseCluster: # available when with_postgres == True self.postgres_host = "postgres1" self.postgres_ip = None + self.postgres_conn = None self.postgres2_host = "postgres2" self.postgres2_ip = None + self.postgres2_conn = None self.postgres3_host = "postgres3" self.postgres3_ip = None + self.postgres3_conn = None self.postgres4_host = "postgres4" self.postgres4_ip = None + self.postgres4_conn = None self.postgres_port = 5432 self.postgres_dir = p.abspath(p.join(self.instances_dir, "postgres")) self.postgres_logs_dir = os.path.join(self.postgres_dir, "postgres1") @@ -386,6 +394,13 @@ class ClickHouseCluster: self.zookeeper_instance_dir_prefix = p.join(self.instances_dir, "zk") self.zookeeper_dirs_to_create = [] + # available when with_jdbc_bridge == True + self.jdbc_bridge_host = "bridge1" + self.jdbc_bridge_ip = None + self.jdbc_bridge_port = 9019 + self.jdbc_driver_dir = p.abspath(p.join(self.instances_dir, "jdbc_driver")) + self.jdbc_driver_logs_dir = os.path.join(self.jdbc_driver_dir, "logs") + self.docker_client = None self.is_up = False self.env = os.environ.copy() @@ -394,15 +409,20 @@ class ClickHouseCluster: def cleanup(self): # Just in case kill unstopped containers from previous launch try: + # docker-compose names containers using the following formula: + # container_name = project_name + '_' + instance_name + '_1' # We need to have "^/" and "$" in the "--filter name" option below to filter by exact name of the container, see # https://stackoverflow.com/questions/48767760/how-to-make-docker-container-ls-f-name-filter-by-exact-name - result = run_and_check(f'docker container list --all --filter name=^/{self.project_name}$ | wc -l', shell=True) - if int(result) > 1: - logging.debug(f"Trying to kill unstopped containers for project {self.project_name}...") - run_and_check(f'docker kill $(docker container list --all --quiet --filter name=^/{self.project_name}$)', shell=True) - run_and_check(f'docker rm $(docker container list --all --quiet --filter name=^/{self.project_name}$)', shell=True) + filter_name = f'^/{self.project_name}_.*_1$' + if int(run_and_check(f'docker container list --all --filter name={filter_name} | wc -l', shell=True)) > 1: + logging.debug(f"Trying to kill unstopped containers for project {self.project_name}:") + unstopped_containers = run_and_check(f'docker container list --all --filter name={filter_name}', shell=True) + unstopped_containers_ids = [line.split()[0] for line in unstopped_containers.splitlines()[1:]] + for id in unstopped_containers_ids: + run_and_check(f'docker kill {id}', shell=True, nothrow=True) + run_and_check(f'docker rm {id}', 
shell=True, nothrow=True) logging.debug("Unstopped containers killed") - run_and_check(['docker-compose', 'ps', '--services', '--all']) + run_and_check(f'docker container list --all --filter name={filter_name}', shell=True) else: logging.debug(f"No running containers for project: {self.project_name}") except: @@ -530,7 +550,6 @@ class ClickHouseCluster: return self.base_mysql_client_cmd - def setup_mysql_cmd(self, instance, env_variables, docker_compose_yml_dir): self.with_mysql = True env_variables['MYSQL_HOST'] = self.mysql_host @@ -662,6 +681,17 @@ class ClickHouseCluster: '--file', p.join(docker_compose_yml_dir, 'docker_compose_rabbitmq.yml')] return self.base_rabbitmq_cmd + def setup_mongo_secure_cmd(self, instance, env_variables, docker_compose_yml_dir): + self.with_mongo = self.with_mongo_secure = True + env_variables['MONGO_HOST'] = self.mongo_host + env_variables['MONGO_EXTERNAL_PORT'] = str(self.mongo_port) + env_variables['MONGO_INTERNAL_PORT'] = "27017" + env_variables['MONGO_CONFIG_PATH'] = HELPERS_DIR + self.base_cmd.extend(['--file', p.join(docker_compose_yml_dir, 'docker_compose_mongo_secure.yml')]) + self.base_mongo_cmd = ['docker-compose', '--env-file', instance.env_file, '--project-name', self.project_name, + '--file', p.join(docker_compose_yml_dir, 'docker_compose_mongo_secure.yml')] + return self.base_mongo_cmd + def setup_mongo_cmd(self, instance, env_variables, docker_compose_yml_dir): self.with_mongo = True env_variables['MONGO_HOST'] = self.mongo_host @@ -694,6 +724,8 @@ class ClickHouseCluster: def setup_jdbc_bridge_cmd(self, instance, env_variables, docker_compose_yml_dir): self.with_jdbc_bridge = True + env_variables['JDBC_DRIVER_LOGS'] = self.jdbc_driver_logs_dir + env_variables['JDBC_DRIVER_FS'] = "bind" self.base_cmd.extend(['--file', p.join(docker_compose_yml_dir, 'docker_compose_jdbc_bridge.yml')]) self.base_jdbc_bridge_cmd = ['docker-compose', '--env-file', instance.env_file, '--project-name', self.project_name, '--file', p.join(docker_compose_yml_dir, 'docker_compose_jdbc_bridge.yml')] @@ -703,7 +735,8 @@ class ClickHouseCluster: macros=None, with_zookeeper=False, with_zookeeper_secure=False, with_mysql_client=False, with_mysql=False, with_mysql8=False, with_mysql_cluster=False, with_kafka=False, with_kerberized_kafka=False, with_rabbitmq=False, clickhouse_path_dir=None, - with_odbc_drivers=False, with_postgres=False, with_postgres_cluster=False, with_hdfs=False, with_kerberized_hdfs=False, with_mongo=False, + with_odbc_drivers=False, with_postgres=False, with_postgres_cluster=False, with_hdfs=False, + with_kerberized_hdfs=False, with_mongo=False, with_mongo_secure=False, with_redis=False, with_minio=False, with_cassandra=False, with_jdbc_bridge=False, hostname=None, env_variables=None, image="yandex/clickhouse-integration-test", tag=None, stay_alive=False, ipv4_address=None, ipv6_address=None, with_installed_binary=False, tmpfs=None, @@ -756,7 +789,7 @@ class ClickHouseCluster: with_kerberized_kafka=with_kerberized_kafka, with_rabbitmq=with_rabbitmq, with_kerberized_hdfs=with_kerberized_hdfs, - with_mongo=with_mongo, + with_mongo=with_mongo or with_mongo_secure, with_redis=with_redis, with_minio=with_minio, with_cassandra=with_cassandra, @@ -841,8 +874,11 @@ class ClickHouseCluster: if with_kerberized_hdfs and not self.with_kerberized_hdfs: cmds.append(self.setup_kerberized_hdfs_cmd(instance, env_variables, docker_compose_yml_dir)) - if with_mongo and not self.with_mongo: - cmds.append(self.setup_mongo_cmd(instance, env_variables, 
docker_compose_yml_dir)) + if (with_mongo or with_mongo_secure) and not (self.with_mongo or self.with_mongo_secure): + if with_mongo_secure: + cmds.append(self.setup_mongo_secure_cmd(instance, env_variables, docker_compose_yml_dir)) + else: + cmds.append(self.setup_mongo_cmd(instance, env_variables, docker_compose_yml_dir)) if self.with_net_trics: for cmd in cmds: @@ -1077,8 +1113,9 @@ class ClickHouseCluster: start = time.time() while time.time() - start < timeout: try: - conn = psycopg2.connect(host=self.postgres_ip, port=self.postgres_port, user='postgres', password='mysecretpassword') - conn.close() + self.postgres_conn = psycopg2.connect(host=self.postgres_ip, port=self.postgres_port, database='postgres', user='postgres', password='mysecretpassword') + self.postgres_conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT) + self.postgres_conn.autocommit = True logging.debug("Postgres Started") return except Exception as ex: @@ -1092,20 +1129,40 @@ class ClickHouseCluster: self.postgres3_ip = self.get_instance_ip(self.postgres3_host) self.postgres4_ip = self.get_instance_ip(self.postgres4_host) start = time.time() - for ip in [self.postgres2_ip, self.postgres3_ip, self.postgres4_ip]: - while time.time() - start < timeout: - try: - conn = psycopg2.connect(host=ip, port=self.postgres_port, user='postgres', password='mysecretpassword') - conn.close() - logging.debug("Postgres Cluster Started") - return - except Exception as ex: - logging.debug("Can't connect to Postgres " + str(ex)) - time.sleep(0.5) + while time.time() - start < timeout: + try: + self.postgres2_conn = psycopg2.connect(host=self.postgres2_ip, port=self.postgres_port, database='postgres', user='postgres', password='mysecretpassword') + self.postgres2_conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT) + self.postgres2_conn.autocommit = True + logging.debug("Postgres Cluster host 2 started") + break + except Exception as ex: + logging.debug("Can't connect to Postgres host 2" + str(ex)) + time.sleep(0.5) + while time.time() - start < timeout: + try: + self.postgres3_conn = psycopg2.connect(host=self.postgres3_ip, port=self.postgres_port, database='postgres', user='postgres', password='mysecretpassword') + self.postgres3_conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT) + self.postgres3_conn.autocommit = True + logging.debug("Postgres Cluster host 3 started") + break + except Exception as ex: + logging.debug("Can't connect to Postgres host 3" + str(ex)) + time.sleep(0.5) + while time.time() - start < timeout: + try: + self.postgres4_conn = psycopg2.connect(host=self.postgres4_ip, port=self.postgres_port, database='postgres', user='postgres', password='mysecretpassword') + self.postgres4_conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT) + self.postgres4_conn.autocommit = True + logging.debug("Postgres Cluster host 4 started") + return + except Exception as ex: + logging.debug("Can't connect to Postgres host 4" + str(ex)) + time.sleep(0.5) raise Exception("Cannot wait Postgres container") - def wait_rabbitmq_to_start(self, timeout=180): + def wait_rabbitmq_to_start(self, timeout=180, throw=True): self.rabbitmq_ip = self.get_instance_ip(self.rabbitmq_host) start = time.time() @@ -1115,13 +1172,15 @@ class ClickHouseCluster: logging.debug("RabbitMQ is available") if enable_consistent_hash_plugin(self.rabbitmq_docker_id): logging.debug("RabbitMQ consistent hash plugin is available") - return + return True time.sleep(0.5) except Exception as ex: logging.debug("Can't connect to RabbitMQ " + str(ex)) time.sleep(0.5) - raise 
Exception("Cannot wait RabbitMQ container") + if throw: + raise Exception("Cannot wait RabbitMQ container") + return False def wait_zookeeper_secure_to_start(self, timeout=20): logging.debug("Wait ZooKeeper Secure to start") @@ -1191,7 +1250,6 @@ class ClickHouseCluster: logging.debug("Waiting for Kafka to start up") time.sleep(1) - def wait_hdfs_to_start(self, timeout=300, check_marker=False): start = time.time() while time.time() - start < timeout: @@ -1208,9 +1266,11 @@ class ClickHouseCluster: raise Exception("Can't wait HDFS to start") - def wait_mongo_to_start(self, timeout=180): + def wait_mongo_to_start(self, timeout=30, secure=False): connection_str = 'mongodb://{user}:{password}@{host}:{port}'.format( host='localhost', port=self.mongo_port, user='root', password='clickhouse') + if secure: + connection_str += '/?tls=true&tlsAllowInvalidCertificates=true' connection = pymongo.MongoClient(connection_str) start = time.time() while time.time() - start < timeout: @@ -1277,7 +1337,6 @@ class ClickHouseCluster: raise Exception("Can't wait Schema Registry to start") - def wait_cassandra_to_start(self, timeout=180): self.cassandra_ip = self.get_instance_ip(self.cassandra_host) cass_client = cassandra.cluster.Cluster([self.cassandra_ip], port=self.cassandra_port, load_balancing_policy=RoundRobinPolicy()) @@ -1435,9 +1494,13 @@ class ClickHouseCluster: logging.debug('Setup RabbitMQ') os.makedirs(self.rabbitmq_logs_dir) os.chmod(self.rabbitmq_logs_dir, stat.S_IRWXO) - subprocess_check_call(self.base_rabbitmq_cmd + common_opts + ['--renew-anon-volumes']) - self.rabbitmq_docker_id = self.get_instance_docker_id('rabbitmq1') - self.wait_rabbitmq_to_start() + + for i in range(5): + subprocess_check_call(self.base_rabbitmq_cmd + common_opts + ['--renew-anon-volumes']) + self.rabbitmq_docker_id = self.get_instance_docker_id('rabbitmq1') + logging.debug(f"RabbitMQ checking container try: {i}") + if self.wait_rabbitmq_to_start(throw=(i==4)): + break if self.with_hdfs and self.base_hdfs_cmd: logging.debug('Setup HDFS') @@ -1458,7 +1521,7 @@ class ClickHouseCluster: if self.with_mongo and self.base_mongo_cmd: logging.debug('Setup Mongo') run_and_check(self.base_mongo_cmd + common_opts) - self.wait_mongo_to_start(30) + self.wait_mongo_to_start(30, secure=self.with_mongo_secure) if self.with_redis and self.base_redis_cmd: logging.debug('Setup Redis') @@ -1485,8 +1548,12 @@ class ClickHouseCluster: self.wait_cassandra_to_start() if self.with_jdbc_bridge and self.base_jdbc_bridge_cmd: + os.makedirs(self.jdbc_driver_logs_dir) + os.chmod(self.jdbc_driver_logs_dir, stat.S_IRWXO) + subprocess_check_call(self.base_jdbc_bridge_cmd + ['up', '-d']) - self.wait_for_url("http://localhost:9020/ping") + self.jdbc_bridge_ip = self.get_instance_ip(self.jdbc_bridge_host) + self.wait_for_url(f"http://{self.jdbc_bridge_ip}:{self.jdbc_bridge_port}/ping") clickhouse_start_cmd = self.base_cmd + ['up', '-d', '--no-recreate'] logging.debug(("Trying to create ClickHouse instance by command %s", ' '.join(map(str, clickhouse_start_cmd)))) diff --git a/tests/integration/helpers/mongo_cert.pem b/tests/integration/helpers/mongo_cert.pem new file mode 100644 index 00000000000..9e18b1d4469 --- /dev/null +++ b/tests/integration/helpers/mongo_cert.pem @@ -0,0 +1,44 @@ +-----BEGIN RSA PRIVATE KEY----- +MIIEpAIBAAKCAQEAtz2fpa8hyUff8u8jYlh20HbkOO8hQi64Ke2Prack2Br0lhOr +1MI6I8nVk5iDrt+7ix2Cnt+2aZKb6HJv0CG1V25yWg+jgsXeIT1KHTJf8rTmYxhb +t+ye+S1Z0h/Rt+xqSd9XXfzOLPGHYfyx6ZQ4AumO/HoEFD4IH/qiREjwtOfRXuhz 
+CohqtUTyYR7pJmZqBSuGac461WVRisnjfKRxeVa3itc84/RgktgYej2x4PQBFk13 +xAXKrWmHkwdgWklTuuK8Gtoqz65Y4/J9CSl+Bd08QDdRnaVvq1u1eNTZg1BVyeRv +jFYBMSathKASrng5nK66Fdilw6tO/9khaP0SDQIDAQABAoIBAAm/5qGrKtIJ1/mW +Dbzq1g+Lc+MvngZmc/gPIsjrjsNM09y0WT0txGgpEgsTX1ZLoy/otw16+7qsSU1Z +4WcilAJ95umx0VJg8suz9iCNkJtaUrPNFPw5Q9AgQJo0hTUTCCi8EGr4y4OKqlhl +WJYEA+LryGbYmyT0k/wXmtClTOFjKS09mK4deQ1DqbBxayR9MUZgRJzEODA8eGXs +Rc6fJUenMVNMzIVLgpossRtKImoZNcf5UtCKL3HECunndQeMu4zuqLMU+EzL1F/o +iHKF7v3CVmsK0OxNJfOfT0abN3XaJttFwTJyghQjgP8OX1IKjlj3vo9xwEDfVUEf +GVIER0UCgYEA2j+kjaT3Dw2lNlBEsA8uvVlLJHBc9JBtMGduGYsIEgsL/iStqXv4 +xoA9N3CwkN/rrQpDfi/16DMXAAYdjGulPwiMNFBY22TCzhtPC2mAnfaSForxwZCs +lwc3KkIloo3N5XvN78AuZf8ewiS+bOEj+HHHqqSb1+u/csuaXO9neesCgYEA1u/I +Mlt/pxJkH+c3yOskwCh/CNhq9szi8G9LXROIQ58BT2ydJSEPpt7AhUTtQGimQQTW +KLiffJSkjuVaFckR1GjCoAmFGYw9wUb+TmFNScz5pJ2dXse8aBysAMIQfEIcRAEa +gKnkLBH6nw3+/Hm3xwoBc35t8Pa2ek7LsWDfbecCgYBhilQW4gVw+t49uf4Y2ZBA +G+pTbMx+mRXTrkYssFB5D+raOLZMqxVyUdoKLxkahpkkCxRDD1hN4JeE8Ta/jVSb +KUzQDKDJ3OybhOT86rgK4SpFXO/TXL9l+FmVT17WmZ3N1Fkjr7aM60pp5lYc/zo+ +TUu5XjwwcjJsMcbZhj2u5QKBgQCDNuUn4PYAP9kCJPzIWs0XxmEvPDeorZIJqFgA +3XC9n2+EVlFlHlbYz3oGofqY7Io6fUJkn7k1q+T+G4QwcozA+KeAXe90lkoJGVcc +8IfnewwYc+RjvVoG0SIsYE0CHrX0yhus2oqiYON4gGnfJkuMZk5WfKOPjH4AEuSF +SBd+lwKBgQCHG/DA6u2mYmezPF9hebWFoyAVSr2PDXDhu8cNNHCpx9GewJXhuK/P +tW8mazHzUuJKBvmaUXDIXFh4K6FNhjH16p5jR1w3hsPE7NEZhjfVRaUYPmBqaOYR +jp8H+Sh5g4Rwbtfp6Qhu6UAKi/y6Vozs5GkJtSiNrjNDVrD+sGGrXA== +-----END RSA PRIVATE KEY----- +-----BEGIN CERTIFICATE----- +MIICqDCCAZACFBdaMnuT0pWhmrh05UT3HXJ+kI0yMA0GCSqGSIb3DQEBCwUAMA0x +CzAJBgNVBAMMAmNhMB4XDTIxMDQwNjE3MDQxNVoXDTIyMDQwNjE3MDQxNVowFDES +MBAGA1UEAwwJbG9jYWxob3N0MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKC +AQEAtz2fpa8hyUff8u8jYlh20HbkOO8hQi64Ke2Prack2Br0lhOr1MI6I8nVk5iD +rt+7ix2Cnt+2aZKb6HJv0CG1V25yWg+jgsXeIT1KHTJf8rTmYxhbt+ye+S1Z0h/R +t+xqSd9XXfzOLPGHYfyx6ZQ4AumO/HoEFD4IH/qiREjwtOfRXuhzCohqtUTyYR7p +JmZqBSuGac461WVRisnjfKRxeVa3itc84/RgktgYej2x4PQBFk13xAXKrWmHkwdg +WklTuuK8Gtoqz65Y4/J9CSl+Bd08QDdRnaVvq1u1eNTZg1BVyeRvjFYBMSathKAS +rng5nK66Fdilw6tO/9khaP0SDQIDAQABMA0GCSqGSIb3DQEBCwUAA4IBAQAct2If +isMLHIqyL9GjY4b0xcxF4svFU/DUwNanStmoFMW1ifPf1cCqeMzyQOxBCDdMs0RT +hBbDYHW0BMXDqYIr3Ktbu38/3iVyr3pb56YOCKy8yHXpmKEaUBhCknSLcQyvNfeS +tM+DWsKFTZfyR5px+WwXbGKVMYwLaTON+/wcv1MeKMig3CxluaCpEJVYYwAiUc4K +sgvQNAunwGmPLPoXtUnpR2ZWiQA5R6yjS1oIe+8vpryFP6kjhWs0HR0jZEtLulV5 +WXUuxkqTXiBIvYpsmusoR44e9rptwLbV1wL/LUScRt9ttqFM3N5/Pof+2UwkSjGB +GAyPmw0Pkqtt+lva +-----END CERTIFICATE----- diff --git a/tests/integration/helpers/mongo_secure.conf b/tests/integration/helpers/mongo_secure.conf new file mode 100644 index 00000000000..1128b16b546 --- /dev/null +++ b/tests/integration/helpers/mongo_secure.conf @@ -0,0 +1,5 @@ +net: + ssl: + mode: requireSSL + PEMKeyFile: /mongo/mongo_cert.pem + allowConnectionsWithoutCertificates: true diff --git a/tests/integration/parallel.json b/tests/integration/parallel.json index 2879f258406..6a630bf251f 100644 --- a/tests/integration/parallel.json +++ b/tests/integration/parallel.json @@ -127,60 +127,60 @@ "test_keeper_multinode_simple/test.py::test_simple_replicated_table", "test_keeper_multinode_simple/test.py::test_watch_on_follower", "test_limited_replicated_fetches/test.py::test_limited_fetches", - "test_materialize_mysql_database/test.py::test_clickhouse_killed_while_insert_5_7[atomic]", - "test_materialize_mysql_database/test.py::test_clickhouse_killed_while_insert_5_7[ordinary]", - "test_materialize_mysql_database/test.py::test_clickhouse_killed_while_insert_8_0[atomic]", - 
"test_materialize_mysql_database/test.py::test_clickhouse_killed_while_insert_8_0[ordinary]", - "test_materialize_mysql_database/test.py::test_insert_with_modify_binlog_checksum_5_7[atomic]", - "test_materialize_mysql_database/test.py::test_insert_with_modify_binlog_checksum_5_7[ordinary]", - "test_materialize_mysql_database/test.py::test_insert_with_modify_binlog_checksum_8_0[atomic]", - "test_materialize_mysql_database/test.py::test_insert_with_modify_binlog_checksum_8_0[ordinary]", - "test_materialize_mysql_database/test.py::test_materialize_database_ddl_with_empty_transaction_5_7[atomic]", - "test_materialize_mysql_database/test.py::test_materialize_database_ddl_with_empty_transaction_5_7[ordinary]", - "test_materialize_mysql_database/test.py::test_materialize_database_ddl_with_empty_transaction_8_0[atomic]", - "test_materialize_mysql_database/test.py::test_materialize_database_ddl_with_empty_transaction_8_0[ordinary]", - "test_materialize_mysql_database/test.py::test_materialize_database_ddl_with_mysql_5_7[atomic]", - "test_materialize_mysql_database/test.py::test_materialize_database_ddl_with_mysql_5_7[ordinary]", - "test_materialize_mysql_database/test.py::test_materialize_database_ddl_with_mysql_8_0[atomic]", - "test_materialize_mysql_database/test.py::test_materialize_database_ddl_with_mysql_8_0[ordinary]", - "test_materialize_mysql_database/test.py::test_materialize_database_dml_with_mysql_5_7[atomic]", - "test_materialize_mysql_database/test.py::test_materialize_database_dml_with_mysql_5_7[ordinary]", - "test_materialize_mysql_database/test.py::test_materialize_database_dml_with_mysql_8_0[atomic]", - "test_materialize_mysql_database/test.py::test_materialize_database_dml_with_mysql_8_0[ordinary]", - "test_materialize_mysql_database/test.py::test_materialize_database_err_sync_user_privs_5_7[atomic]", - "test_materialize_mysql_database/test.py::test_materialize_database_err_sync_user_privs_5_7[ordinary]", - "test_materialize_mysql_database/test.py::test_materialize_database_err_sync_user_privs_8_0[atomic]", - "test_materialize_mysql_database/test.py::test_materialize_database_err_sync_user_privs_8_0[ordinary]", - "test_materialize_mysql_database/test.py::test_multi_table_update[clickhouse_node0]", - "test_materialize_mysql_database/test.py::test_multi_table_update[clickhouse_node1]", - "test_materialize_mysql_database/test.py::test_mysql_killed_while_insert_5_7[atomic]", - "test_materialize_mysql_database/test.py::test_mysql_killed_while_insert_5_7[ordinary]", - "test_materialize_mysql_database/test.py::test_mysql_killed_while_insert_8_0[atomic]", - "test_materialize_mysql_database/test.py::test_mysql_killed_while_insert_8_0[ordinary]", - "test_materialize_mysql_database/test.py::test_mysql_kill_sync_thread_restore_5_7[atomic]", - "test_materialize_mysql_database/test.py::test_mysql_kill_sync_thread_restore_5_7[ordinary]", - "test_materialize_mysql_database/test.py::test_mysql_kill_sync_thread_restore_8_0[atomic]", - "test_materialize_mysql_database/test.py::test_mysql_kill_sync_thread_restore_8_0[ordinary]", - "test_materialize_mysql_database/test.py::test_mysql_settings[clickhouse_node0]", - "test_materialize_mysql_database/test.py::test_mysql_settings[clickhouse_node1]", - "test_materialize_mysql_database/test.py::test_network_partition_5_7[atomic]", - "test_materialize_mysql_database/test.py::test_network_partition_5_7[ordinary]", - "test_materialize_mysql_database/test.py::test_network_partition_8_0[atomic]", - 
"test_materialize_mysql_database/test.py::test_network_partition_8_0[ordinary]", - "test_materialize_mysql_database/test.py::test_select_without_columns_5_7[atomic]", - "test_materialize_mysql_database/test.py::test_select_without_columns_5_7[ordinary]", - "test_materialize_mysql_database/test.py::test_select_without_columns_8_0[atomic]", - "test_materialize_mysql_database/test.py::test_select_without_columns_8_0[ordinary]", - "test_materialize_mysql_database/test.py::test_system_parts_table[clickhouse_node0]", - "test_materialize_mysql_database/test.py::test_system_parts_table[clickhouse_node1]", - "test_materialize_mysql_database/test.py::test_system_tables_table[clickhouse_node0]", - "test_materialize_mysql_database/test.py::test_system_tables_table[clickhouse_node1]", - "test_materialize_mysql_database/test.py::test_materialize_with_column_comments[clickhouse_node0]", - "test_materialize_mysql_database/test.py::test_materialize_with_column_comments[clickhouse_node1]", - "test_materialize_mysql_database/test.py::test_materialize_with_enum[clickhouse_node0]", - "test_materialize_mysql_database/test.py::test_materialize_with_enum[clickhouse_node1]", - "test_materialize_mysql_database/test.py::test_utf8mb4[clickhouse_node0]", - "test_materialize_mysql_database/test.py::test_utf8mb4[clickhouse_node1]", + "test_materialized_mysql_database/test.py::test_clickhouse_killed_while_insert_5_7[atomic]", + "test_materialized_mysql_database/test.py::test_clickhouse_killed_while_insert_5_7[ordinary]", + "test_materialized_mysql_database/test.py::test_clickhouse_killed_while_insert_8_0[atomic]", + "test_materialized_mysql_database/test.py::test_clickhouse_killed_while_insert_8_0[ordinary]", + "test_materialized_mysql_database/test.py::test_insert_with_modify_binlog_checksum_5_7[atomic]", + "test_materialized_mysql_database/test.py::test_insert_with_modify_binlog_checksum_5_7[ordinary]", + "test_materialized_mysql_database/test.py::test_insert_with_modify_binlog_checksum_8_0[atomic]", + "test_materialized_mysql_database/test.py::test_insert_with_modify_binlog_checksum_8_0[ordinary]", + "test_materialized_mysql_database/test.py::test_materialize_database_ddl_with_empty_transaction_5_7[atomic]", + "test_materialized_mysql_database/test.py::test_materialize_database_ddl_with_empty_transaction_5_7[ordinary]", + "test_materialized_mysql_database/test.py::test_materialize_database_ddl_with_empty_transaction_8_0[atomic]", + "test_materialized_mysql_database/test.py::test_materialize_database_ddl_with_empty_transaction_8_0[ordinary]", + "test_materialized_mysql_database/test.py::test_materialize_database_ddl_with_mysql_5_7[atomic]", + "test_materialized_mysql_database/test.py::test_materialize_database_ddl_with_mysql_5_7[ordinary]", + "test_materialized_mysql_database/test.py::test_materialize_database_ddl_with_mysql_8_0[atomic]", + "test_materialized_mysql_database/test.py::test_materialize_database_ddl_with_mysql_8_0[ordinary]", + "test_materialized_mysql_database/test.py::test_materialize_database_dml_with_mysql_5_7[atomic]", + "test_materialized_mysql_database/test.py::test_materialize_database_dml_with_mysql_5_7[ordinary]", + "test_materialized_mysql_database/test.py::test_materialize_database_dml_with_mysql_8_0[atomic]", + "test_materialized_mysql_database/test.py::test_materialize_database_dml_with_mysql_8_0[ordinary]", + "test_materialized_mysql_database/test.py::test_materialize_database_err_sync_user_privs_5_7[atomic]", + 
"test_materialized_mysql_database/test.py::test_materialize_database_err_sync_user_privs_5_7[ordinary]", + "test_materialized_mysql_database/test.py::test_materialize_database_err_sync_user_privs_8_0[atomic]", + "test_materialized_mysql_database/test.py::test_materialize_database_err_sync_user_privs_8_0[ordinary]", + "test_materialized_mysql_database/test.py::test_multi_table_update[clickhouse_node0]", + "test_materialized_mysql_database/test.py::test_multi_table_update[clickhouse_node1]", + "test_materialized_mysql_database/test.py::test_mysql_killed_while_insert_5_7[atomic]", + "test_materialized_mysql_database/test.py::test_mysql_killed_while_insert_5_7[ordinary]", + "test_materialized_mysql_database/test.py::test_mysql_killed_while_insert_8_0[atomic]", + "test_materialized_mysql_database/test.py::test_mysql_killed_while_insert_8_0[ordinary]", + "test_materialized_mysql_database/test.py::test_mysql_kill_sync_thread_restore_5_7[atomic]", + "test_materialized_mysql_database/test.py::test_mysql_kill_sync_thread_restore_5_7[ordinary]", + "test_materialized_mysql_database/test.py::test_mysql_kill_sync_thread_restore_8_0[atomic]", + "test_materialized_mysql_database/test.py::test_mysql_kill_sync_thread_restore_8_0[ordinary]", + "test_materialized_mysql_database/test.py::test_mysql_settings[clickhouse_node0]", + "test_materialized_mysql_database/test.py::test_mysql_settings[clickhouse_node1]", + "test_materialized_mysql_database/test.py::test_network_partition_5_7[atomic]", + "test_materialized_mysql_database/test.py::test_network_partition_5_7[ordinary]", + "test_materialized_mysql_database/test.py::test_network_partition_8_0[atomic]", + "test_materialized_mysql_database/test.py::test_network_partition_8_0[ordinary]", + "test_materialized_mysql_database/test.py::test_select_without_columns_5_7[atomic]", + "test_materialized_mysql_database/test.py::test_select_without_columns_5_7[ordinary]", + "test_materialized_mysql_database/test.py::test_select_without_columns_8_0[atomic]", + "test_materialized_mysql_database/test.py::test_select_without_columns_8_0[ordinary]", + "test_materialized_mysql_database/test.py::test_system_parts_table[clickhouse_node0]", + "test_materialized_mysql_database/test.py::test_system_parts_table[clickhouse_node1]", + "test_materialized_mysql_database/test.py::test_system_tables_table[clickhouse_node0]", + "test_materialized_mysql_database/test.py::test_system_tables_table[clickhouse_node1]", + "test_materialized_mysql_database/test.py::test_materialize_with_column_comments[clickhouse_node0]", + "test_materialized_mysql_database/test.py::test_materialize_with_column_comments[clickhouse_node1]", + "test_materialized_mysql_database/test.py::test_materialize_with_enum[clickhouse_node0]", + "test_materialized_mysql_database/test.py::test_materialize_with_enum[clickhouse_node1]", + "test_materialized_mysql_database/test.py::test_utf8mb4[clickhouse_node0]", + "test_materialized_mysql_database/test.py::test_utf8mb4[clickhouse_node1]", "test_parts_delete_zookeeper/test.py::test_merge_doesnt_work_without_zookeeper", "test_polymorphic_parts/test.py::test_compact_parts_only", "test_polymorphic_parts/test.py::test_different_part_types_on_replicas[polymorphic_table_compact-Compact]", diff --git a/tests/integration/parallel.readme b/tests/integration/parallel.readme index ac4f897482c..5533ab8b84d 100644 --- a/tests/integration/parallel.readme +++ b/tests/integration/parallel.readme @@ -3,4 +3,4 @@ # 1. 
Generate all tests list as in CI run ./runner ' --setup-plan' | grep '::' | sed 's/ (fixtures used:.*//g' | sed 's/^ *//g' | sed 's/ *$//g' | sort -u > all_tests.txt # 2. Filter known tests that are currently not run in parallel -cat all_tests.txt | grep '^test_replicated_database\|^test_disabled_mysql_server\|^test_distributed_ddl\|^test_distributed_ddl\|^test_quorum_inserts_parallel\|^test_ddl_worker_non_leader\|^test_consistent_parts_after_clone_replica\|^test_materialize_mysql_database\|^test_atomic_drop_table\|^test_distributed_respect_user_timeouts\|^test_storage_kafka\|^test_replace_partition\|^test_replicated_fetches_timeouts\|^test_system_clusters_actual_information\|^test_delayed_replica_failover\|^test_limited_replicated_fetches\|^test_hedged_requests\|^test_insert_into_distributed\|^test_insert_into_distributed_through_materialized_view\|^test_drop_replica\|^test_attach_without_fetching\|^test_system_replicated_fetches\|^test_cross_replication\|^test_dictionary_allow_read_expired_keys\|^test_dictionary_allow_read_expired_keys\|^test_dictionary_allow_read_expired_keys\|^test_insert_into_distributed_sync_async\|^test_hedged_requests_parallel\|^test_dictionaries_update_field\|^test_broken_part_during_merge\|^test_random_inserts\|^test_reload_clusters_config\|^test_parts_delete_zookeeper\|^test_polymorphic_parts\|^test_keeper_multinode_simple\|^test_https_replication\|^test_storage_kerberized_kafka\|^test_cleanup_dir_after_bad_zk_conn\|^test_system_metrics\|^test_keeper_multinode_blocade_leader' | awk '{$1=$1;print}' | jq -R -n '[inputs] | .' > parallel_skip.json +cat all_tests.txt | grep '^test_replicated_database\|^test_disabled_mysql_server\|^test_distributed_ddl\|^test_distributed_ddl\|^test_quorum_inserts_parallel\|^test_ddl_worker_non_leader\|^test_consistent_parts_after_clone_replica\|^test_materialized_mysql_database\|^test_atomic_drop_table\|^test_distributed_respect_user_timeouts\|^test_storage_kafka\|^test_replace_partition\|^test_replicated_fetches_timeouts\|^test_system_clusters_actual_information\|^test_delayed_replica_failover\|^test_limited_replicated_fetches\|^test_hedged_requests\|^test_insert_into_distributed\|^test_insert_into_distributed_through_materialized_view\|^test_drop_replica\|^test_attach_without_fetching\|^test_system_replicated_fetches\|^test_cross_replication\|^test_dictionary_allow_read_expired_keys\|^test_dictionary_allow_read_expired_keys\|^test_dictionary_allow_read_expired_keys\|^test_insert_into_distributed_sync_async\|^test_hedged_requests_parallel\|^test_dictionaries_update_field\|^test_broken_part_during_merge\|^test_random_inserts\|^test_reload_clusters_config\|^test_parts_delete_zookeeper\|^test_polymorphic_parts\|^test_keeper_multinode_simple\|^test_https_replication\|^test_storage_kerberized_kafka\|^test_cleanup_dir_after_bad_zk_conn\|^test_system_metrics\|^test_keeper_multinode_blocade_leader' | awk '{$1=$1;print}' | jq -R -n '[inputs] | .' 
> parallel_skip.json diff --git a/tests/integration/parallel_skip.json b/tests/integration/parallel_skip.json index 2c993691d78..b4f368abb8e 100644 --- a/tests/integration/parallel_skip.json +++ b/tests/integration/parallel_skip.json @@ -131,60 +131,60 @@ "test_keeper_multinode_simple/test.py::test_simple_replicated_table", "test_keeper_multinode_simple/test.py::test_watch_on_follower", "test_limited_replicated_fetches/test.py::test_limited_fetches", - "test_materialize_mysql_database/test.py::test_clickhouse_killed_while_insert_5_7[atomic]", - "test_materialize_mysql_database/test.py::test_clickhouse_killed_while_insert_5_7[ordinary]", - "test_materialize_mysql_database/test.py::test_clickhouse_killed_while_insert_8_0[atomic]", - "test_materialize_mysql_database/test.py::test_clickhouse_killed_while_insert_8_0[ordinary]", - "test_materialize_mysql_database/test.py::test_insert_with_modify_binlog_checksum_5_7[atomic]", - "test_materialize_mysql_database/test.py::test_insert_with_modify_binlog_checksum_5_7[ordinary]", - "test_materialize_mysql_database/test.py::test_insert_with_modify_binlog_checksum_8_0[atomic]", - "test_materialize_mysql_database/test.py::test_insert_with_modify_binlog_checksum_8_0[ordinary]", - "test_materialize_mysql_database/test.py::test_materialize_database_ddl_with_empty_transaction_5_7[atomic]", - "test_materialize_mysql_database/test.py::test_materialize_database_ddl_with_empty_transaction_5_7[ordinary]", - "test_materialize_mysql_database/test.py::test_materialize_database_ddl_with_empty_transaction_8_0[atomic]", - "test_materialize_mysql_database/test.py::test_materialize_database_ddl_with_empty_transaction_8_0[ordinary]", - "test_materialize_mysql_database/test.py::test_materialize_database_ddl_with_mysql_5_7[atomic]", - "test_materialize_mysql_database/test.py::test_materialize_database_ddl_with_mysql_5_7[ordinary]", - "test_materialize_mysql_database/test.py::test_materialize_database_ddl_with_mysql_8_0[atomic]", - "test_materialize_mysql_database/test.py::test_materialize_database_ddl_with_mysql_8_0[ordinary]", - "test_materialize_mysql_database/test.py::test_materialize_database_dml_with_mysql_5_7[atomic]", - "test_materialize_mysql_database/test.py::test_materialize_database_dml_with_mysql_5_7[ordinary]", - "test_materialize_mysql_database/test.py::test_materialize_database_dml_with_mysql_8_0[atomic]", - "test_materialize_mysql_database/test.py::test_materialize_database_dml_with_mysql_8_0[ordinary]", - "test_materialize_mysql_database/test.py::test_materialize_database_err_sync_user_privs_5_7[atomic]", - "test_materialize_mysql_database/test.py::test_materialize_database_err_sync_user_privs_5_7[ordinary]", - "test_materialize_mysql_database/test.py::test_materialize_database_err_sync_user_privs_8_0[atomic]", - "test_materialize_mysql_database/test.py::test_materialize_database_err_sync_user_privs_8_0[ordinary]", - "test_materialize_mysql_database/test.py::test_multi_table_update[clickhouse_node0]", - "test_materialize_mysql_database/test.py::test_multi_table_update[clickhouse_node1]", - "test_materialize_mysql_database/test.py::test_mysql_killed_while_insert_5_7[atomic]", - "test_materialize_mysql_database/test.py::test_mysql_killed_while_insert_5_7[ordinary]", - "test_materialize_mysql_database/test.py::test_mysql_killed_while_insert_8_0[atomic]", - "test_materialize_mysql_database/test.py::test_mysql_killed_while_insert_8_0[ordinary]", - "test_materialize_mysql_database/test.py::test_mysql_kill_sync_thread_restore_5_7[atomic]", - 
"test_materialize_mysql_database/test.py::test_mysql_kill_sync_thread_restore_5_7[ordinary]", - "test_materialize_mysql_database/test.py::test_mysql_kill_sync_thread_restore_8_0[atomic]", - "test_materialize_mysql_database/test.py::test_mysql_kill_sync_thread_restore_8_0[ordinary]", - "test_materialize_mysql_database/test.py::test_mysql_settings[clickhouse_node0]", - "test_materialize_mysql_database/test.py::test_mysql_settings[clickhouse_node1]", - "test_materialize_mysql_database/test.py::test_network_partition_5_7[atomic]", - "test_materialize_mysql_database/test.py::test_network_partition_5_7[ordinary]", - "test_materialize_mysql_database/test.py::test_network_partition_8_0[atomic]", - "test_materialize_mysql_database/test.py::test_network_partition_8_0[ordinary]", - "test_materialize_mysql_database/test.py::test_select_without_columns_5_7[atomic]", - "test_materialize_mysql_database/test.py::test_select_without_columns_5_7[ordinary]", - "test_materialize_mysql_database/test.py::test_select_without_columns_8_0[atomic]", - "test_materialize_mysql_database/test.py::test_select_without_columns_8_0[ordinary]", - "test_materialize_mysql_database/test.py::test_system_parts_table[clickhouse_node0]", - "test_materialize_mysql_database/test.py::test_system_parts_table[clickhouse_node1]", - "test_materialize_mysql_database/test.py::test_system_tables_table[clickhouse_node0]", - "test_materialize_mysql_database/test.py::test_system_tables_table[clickhouse_node1]", - "test_materialize_mysql_database/test.py::test_materialize_with_column_comments[clickhouse_node0]", - "test_materialize_mysql_database/test.py::test_materialize_with_column_comments[clickhouse_node1]", - "test_materialize_mysql_database/test.py::test_materialize_with_enum[clickhouse_node0]", - "test_materialize_mysql_database/test.py::test_materialize_with_enum[clickhouse_node1]", - "test_materialize_mysql_database/test.py::test_utf8mb4[clickhouse_node0]", - "test_materialize_mysql_database/test.py::test_utf8mb4[clickhouse_node1]", + "test_materialized_mysql_database/test.py::test_clickhouse_killed_while_insert_5_7[atomic]", + "test_materialized_mysql_database/test.py::test_clickhouse_killed_while_insert_5_7[ordinary]", + "test_materialized_mysql_database/test.py::test_clickhouse_killed_while_insert_8_0[atomic]", + "test_materialized_mysql_database/test.py::test_clickhouse_killed_while_insert_8_0[ordinary]", + "test_materialized_mysql_database/test.py::test_insert_with_modify_binlog_checksum_5_7[atomic]", + "test_materialized_mysql_database/test.py::test_insert_with_modify_binlog_checksum_5_7[ordinary]", + "test_materialized_mysql_database/test.py::test_insert_with_modify_binlog_checksum_8_0[atomic]", + "test_materialized_mysql_database/test.py::test_insert_with_modify_binlog_checksum_8_0[ordinary]", + "test_materialized_mysql_database/test.py::test_materialize_database_ddl_with_empty_transaction_5_7[atomic]", + "test_materialized_mysql_database/test.py::test_materialize_database_ddl_with_empty_transaction_5_7[ordinary]", + "test_materialized_mysql_database/test.py::test_materialize_database_ddl_with_empty_transaction_8_0[atomic]", + "test_materialized_mysql_database/test.py::test_materialize_database_ddl_with_empty_transaction_8_0[ordinary]", + "test_materialized_mysql_database/test.py::test_materialize_database_ddl_with_mysql_5_7[atomic]", + "test_materialized_mysql_database/test.py::test_materialize_database_ddl_with_mysql_5_7[ordinary]", + 
"test_materialized_mysql_database/test.py::test_materialize_database_ddl_with_mysql_8_0[atomic]", + "test_materialized_mysql_database/test.py::test_materialize_database_ddl_with_mysql_8_0[ordinary]", + "test_materialized_mysql_database/test.py::test_materialize_database_dml_with_mysql_5_7[atomic]", + "test_materialized_mysql_database/test.py::test_materialize_database_dml_with_mysql_5_7[ordinary]", + "test_materialized_mysql_database/test.py::test_materialize_database_dml_with_mysql_8_0[atomic]", + "test_materialized_mysql_database/test.py::test_materialize_database_dml_with_mysql_8_0[ordinary]", + "test_materialized_mysql_database/test.py::test_materialize_database_err_sync_user_privs_5_7[atomic]", + "test_materialized_mysql_database/test.py::test_materialize_database_err_sync_user_privs_5_7[ordinary]", + "test_materialized_mysql_database/test.py::test_materialize_database_err_sync_user_privs_8_0[atomic]", + "test_materialized_mysql_database/test.py::test_materialize_database_err_sync_user_privs_8_0[ordinary]", + "test_materialized_mysql_database/test.py::test_multi_table_update[clickhouse_node0]", + "test_materialized_mysql_database/test.py::test_multi_table_update[clickhouse_node1]", + "test_materialized_mysql_database/test.py::test_mysql_killed_while_insert_5_7[atomic]", + "test_materialized_mysql_database/test.py::test_mysql_killed_while_insert_5_7[ordinary]", + "test_materialized_mysql_database/test.py::test_mysql_killed_while_insert_8_0[atomic]", + "test_materialized_mysql_database/test.py::test_mysql_killed_while_insert_8_0[ordinary]", + "test_materialized_mysql_database/test.py::test_mysql_kill_sync_thread_restore_5_7[atomic]", + "test_materialized_mysql_database/test.py::test_mysql_kill_sync_thread_restore_5_7[ordinary]", + "test_materialized_mysql_database/test.py::test_mysql_kill_sync_thread_restore_8_0[atomic]", + "test_materialized_mysql_database/test.py::test_mysql_kill_sync_thread_restore_8_0[ordinary]", + "test_materialized_mysql_database/test.py::test_mysql_settings[clickhouse_node0]", + "test_materialized_mysql_database/test.py::test_mysql_settings[clickhouse_node1]", + "test_materialized_mysql_database/test.py::test_network_partition_5_7[atomic]", + "test_materialized_mysql_database/test.py::test_network_partition_5_7[ordinary]", + "test_materialized_mysql_database/test.py::test_network_partition_8_0[atomic]", + "test_materialized_mysql_database/test.py::test_network_partition_8_0[ordinary]", + "test_materialized_mysql_database/test.py::test_select_without_columns_5_7[atomic]", + "test_materialized_mysql_database/test.py::test_select_without_columns_5_7[ordinary]", + "test_materialized_mysql_database/test.py::test_select_without_columns_8_0[atomic]", + "test_materialized_mysql_database/test.py::test_select_without_columns_8_0[ordinary]", + "test_materialized_mysql_database/test.py::test_system_parts_table[clickhouse_node0]", + "test_materialized_mysql_database/test.py::test_system_parts_table[clickhouse_node1]", + "test_materialized_mysql_database/test.py::test_system_tables_table[clickhouse_node0]", + "test_materialized_mysql_database/test.py::test_system_tables_table[clickhouse_node1]", + "test_materialized_mysql_database/test.py::test_materialize_with_column_comments[clickhouse_node0]", + "test_materialized_mysql_database/test.py::test_materialize_with_column_comments[clickhouse_node1]", + "test_materialized_mysql_database/test.py::test_materialize_with_enum[clickhouse_node0]", + 
"test_materialized_mysql_database/test.py::test_materialize_with_enum[clickhouse_node1]", + "test_materialized_mysql_database/test.py::test_utf8mb4[clickhouse_node0]", + "test_materialized_mysql_database/test.py::test_utf8mb4[clickhouse_node1]", "test_parts_delete_zookeeper/test.py::test_merge_doesnt_work_without_zookeeper", "test_polymorphic_parts/test.py::test_compact_parts_only", "test_polymorphic_parts/test.py::test_different_part_types_on_replicas[polymorphic_table_compact-Compact]", diff --git a/tests/integration/runner b/tests/integration/runner index cfd98134ea3..36cb4f22f9a 100755 --- a/tests/integration/runner +++ b/tests/integration/runner @@ -177,6 +177,12 @@ if __name__ == "__main__": dest="tests_list", help="List of tests to run") + parser.add_argument( + "-k", "--keyword_expression", + action="store", + dest="keyword_expression", + help="pytest keyword expression") + parser.add_argument( "--tmpfs", action='store_true', @@ -262,6 +268,9 @@ if __name__ == "__main__": for old_log_path in glob.glob(args.cases_dir + "/pytest*.log"): os.remove(old_log_path) + if args.keyword_expression: + args.pytest_args += ['-k', args.keyword_expression] + cmd = "docker run {net} {tty} --rm --name {name} --privileged \ --volume={odbc_bridge_bin}:/clickhouse-odbc-bridge --volume={bin}:/clickhouse \ --volume={library_bridge_bin}:/clickhouse-library-bridge --volume={bin}:/clickhouse \ @@ -280,7 +289,7 @@ if __name__ == "__main__": env_tags=env_tags, env_cleanup=env_cleanup, parallel=parallel_args, - opts=' '.join(args.pytest_args), + opts=' '.join(args.pytest_args).replace('\'', '\\\''), tests_list=' '.join(args.tests_list), dockerd_internal_volume=dockerd_internal_volume, img=DIND_INTEGRATION_TESTS_IMAGE_NAME + ":" + args.docker_image_version, diff --git a/tests/integration/test_materialize_mysql_database/__init__.py b/tests/integration/test_async_drain_connection/__init__.py similarity index 100% rename from tests/integration/test_materialize_mysql_database/__init__.py rename to tests/integration/test_async_drain_connection/__init__.py diff --git a/tests/integration/test_async_drain_connection/configs/config.xml b/tests/integration/test_async_drain_connection/configs/config.xml new file mode 100644 index 00000000000..0c42ac84d31 --- /dev/null +++ b/tests/integration/test_async_drain_connection/configs/config.xml @@ -0,0 +1,4 @@ + + + 10000 + diff --git a/tests/integration/test_async_drain_connection/test.py b/tests/integration/test_async_drain_connection/test.py new file mode 100644 index 00000000000..21f9b142e7a --- /dev/null +++ b/tests/integration/test_async_drain_connection/test.py @@ -0,0 +1,36 @@ +import os +import sys +import time +from multiprocessing.dummy import Pool +import pytest +from helpers.cluster import ClickHouseCluster + +cluster = ClickHouseCluster(__file__) +node = cluster.add_instance("node", main_configs=["configs/config.xml"]) + + +@pytest.fixture(scope="module") +def started_cluster(): + try: + cluster.start() + node.query( + 'create table t (number UInt64) engine = Distributed(test_cluster_two_shards, system, numbers);' + ) + yield cluster + + finally: + cluster.shutdown() + + +def test_filled_async_drain_connection_pool(started_cluster): + busy_pool = Pool(10) + + def execute_query(i): + for _ in range(100): + node.query('select * from t where number = 0 limit 2;', + settings={ + "sleep_in_receive_cancel_ms": 10000000, + "max_execution_time": 5 + }) + + p = busy_pool.map(execute_query, range(10)) diff --git 
a/tests/integration/test_backward_compatibility/test_cte_distributed.py b/tests/integration/test_backward_compatibility/test_cte_distributed.py new file mode 100644 index 00000000000..3aec527524b --- /dev/null +++ b/tests/integration/test_backward_compatibility/test_cte_distributed.py @@ -0,0 +1,54 @@ +import pytest + +from helpers.cluster import ClickHouseCluster + +cluster = ClickHouseCluster(__file__, name="cte_distributed") +node1 = cluster.add_instance('node1', with_zookeeper=False) +node2 = cluster.add_instance('node2', + with_zookeeper=False, image='yandex/clickhouse-server', tag='21.7.3.14', stay_alive=True, + with_installed_binary=True) + + +@pytest.fixture(scope="module") +def start_cluster(): + try: + cluster.start() + yield cluster + + finally: + cluster.shutdown() + + + +def test_cte_distributed(start_cluster): + node2.query(""" +WITH + quantile(0.05)(cnt) as p05, + quantile(0.95)(cnt) as p95, + p95 - p05 as inter_percentile_range +SELECT + sum(cnt) as total_requests, + count() as data_points, + inter_percentile_range +FROM ( + SELECT + count() as cnt + FROM remote('node{1,2}', numbers(10)) + GROUP BY number +)""") + + node1.query(""" +WITH + quantile(0.05)(cnt) as p05, + quantile(0.95)(cnt) as p95, + p95 - p05 as inter_percentile_range +SELECT + sum(cnt) as total_requests, + count() as data_points, + inter_percentile_range +FROM ( + SELECT + count() as cnt + FROM remote('node{1,2}', numbers(10)) + GROUP BY number +)""") diff --git a/tests/integration/test_backward_compatibility/test_data_skipping_indices.py b/tests/integration/test_backward_compatibility/test_data_skipping_indices.py new file mode 100644 index 00000000000..45b85897798 --- /dev/null +++ b/tests/integration/test_backward_compatibility/test_data_skipping_indices.py @@ -0,0 +1,44 @@ +# pylint: disable=line-too-long +# pylint: disable=unused-argument +# pylint: disable=redefined-outer-name + +import pytest +from helpers.cluster import ClickHouseCluster + +cluster = ClickHouseCluster(__file__) +node = cluster.add_instance('node', image='yandex/clickhouse-server', tag='21.6', stay_alive=True, with_installed_binary=True) + + +@pytest.fixture(scope="module") +def start_cluster(): + try: + cluster.start() + yield cluster + + finally: + cluster.shutdown() + + +# TODO: cover other types too, but for this we need to add something like +# restart_with_tagged_version(), since right now it is not possible to +# switch to old tagged clickhouse version. 
+def test_index(start_cluster): + node.query(""" + CREATE TABLE data + ( + key Int, + value Nullable(Int), + INDEX value_index value TYPE minmax GRANULARITY 1 + ) + ENGINE = MergeTree + ORDER BY key; + + INSERT INTO data SELECT number, number FROM numbers(10000); + + SELECT * FROM data WHERE value = 20000 SETTINGS force_data_skipping_indices = 'value_index' SETTINGS force_data_skipping_indices = 'value_index', max_rows_to_read=1; + """) + node.restart_with_latest_version() + node.query(""" + SELECT * FROM data WHERE value = 20000 SETTINGS force_data_skipping_indices = 'value_index' SETTINGS force_data_skipping_indices = 'value_index', max_rows_to_read=1; + DROP TABLE data; + """) diff --git a/tests/integration/test_backward_compatibility/test_detach_part_wrong_partition_id.py b/tests/integration/test_backward_compatibility/test_detach_part_wrong_partition_id.py index a4f976cc62d..7c20b3c2476 100644 --- a/tests/integration/test_backward_compatibility/test_detach_part_wrong_partition_id.py +++ b/tests/integration/test_backward_compatibility/test_detach_part_wrong_partition_id.py @@ -2,7 +2,7 @@ import pytest from helpers.cluster import ClickHouseCluster -cluster = ClickHouseCluster(__file__) +cluster = ClickHouseCluster(__file__, name="detach") # Version 21.6.3.14 has incompatible partition id for tables with UUID in partition key. node_21_6 = cluster.add_instance('node_21_6', image='yandex/clickhouse-server', tag='21.6.3.14', stay_alive=True, with_installed_binary=True) diff --git a/tests/integration/test_cluster_copier/test.py b/tests/integration/test_cluster_copier/test.py index 7fe1d8c9d29..3d28295d40e 100644 --- a/tests/integration/test_cluster_copier/test.py +++ b/tests/integration/test_cluster_copier/test.py @@ -89,9 +89,9 @@ class Task1: instance = cluster.instances['s0_0_0'] for cluster_num in ["0", "1"]: - ddl_check_query(instance, "DROP DATABASE IF EXISTS default ON CLUSTER cluster{}".format(cluster_num)) + ddl_check_query(instance, "DROP DATABASE IF EXISTS default ON CLUSTER cluster{} SYNC".format(cluster_num)) ddl_check_query(instance, - "CREATE DATABASE IF NOT EXISTS default ON CLUSTER cluster{}".format( + "CREATE DATABASE default ON CLUSTER cluster{} ".format( cluster_num)) ddl_check_query(instance, "CREATE TABLE hits ON CLUSTER cluster0 (d UInt64, d1 UInt64 MATERIALIZED d+1) " + @@ -105,11 +105,11 @@ class Task1: settings={"insert_distributed_sync": 1}) def check(self): - assert TSV(self.cluster.instances['s0_0_0'].query("SELECT count() FROM hits_all")) == TSV("1002\n") - assert TSV(self.cluster.instances['s1_0_0'].query("SELECT count() FROM hits_all")) == TSV("1002\n") + assert self.cluster.instances['s0_0_0'].query("SELECT count() FROM hits_all").strip() == "1002" + assert self.cluster.instances['s1_0_0'].query("SELECT count() FROM hits_all").strip() == "1002" - assert TSV(self.cluster.instances['s1_0_0'].query("SELECT DISTINCT d % 2 FROM hits")) == TSV("1\n") - assert TSV(self.cluster.instances['s1_1_0'].query("SELECT DISTINCT d % 2 FROM hits")) == TSV("0\n") + assert self.cluster.instances['s1_0_0'].query("SELECT DISTINCT d % 2 FROM hits").strip() == "1" + assert self.cluster.instances['s1_1_0'].query("SELECT DISTINCT d % 2 FROM hits").strip() == "0" instance = self.cluster.instances['s0_0_0'] ddl_check_query(instance, "DROP TABLE hits_all ON CLUSTER cluster0") diff --git a/tests/integration/test_create_user_and_login/test.py b/tests/integration/test_create_user_and_login/test.py index 58a48bde95d..d0edde2233b 100644 --- 
a/tests/integration/test_create_user_and_login/test.py +++ b/tests/integration/test_create_user_and_login/test.py @@ -1,5 +1,8 @@ import pytest +import time +import logging from helpers.cluster import ClickHouseCluster +from helpers.test_tools import assert_eq_with_retry cluster = ClickHouseCluster(__file__) instance = cluster.add_instance('instance') @@ -38,3 +41,46 @@ def test_grant_create_user(): instance.query("GRANT CREATE USER ON *.* TO A") instance.query("CREATE USER B", user='A') assert instance.query("SELECT 1", user='B') == "1\n" + + +def test_login_as_dropped_user(): + for _ in range(0, 2): + instance.query("CREATE USER A") + assert instance.query("SELECT 1", user='A') == "1\n" + + instance.query("DROP USER A") + expected_error = "no user with such name" + assert expected_error in instance.query_and_get_error("SELECT 1", user='A') + + +def test_login_as_dropped_user_xml(): + for _ in range(0, 2): + instance.exec_in_container(["bash", "-c" , """ + cat > /etc/clickhouse-server/users.d/user_c.xml << EOF + + + + + + + + +EOF"""]) + + assert_eq_with_retry(instance, "SELECT name FROM system.users WHERE name='C'", "C") + + instance.exec_in_container(["bash", "-c" , "rm /etc/clickhouse-server/users.d/user_c.xml"]) + + expected_error = "no user with such name" + while True: + out, err = instance.query_and_get_answer_with_error("SELECT 1", user='C') + if expected_error in err: + logging.debug(f"Got error '{expected_error}' just as expected") + break + if out == "1\n": + logging.debug(f"Got output '1', retrying...") + time.sleep(0.5) + continue + raise Exception(f"Expected either output '1' or error '{expected_error}', got output={out} and error={err}") + + assert instance.query("SELECT name FROM system.users WHERE name='C'") == "" diff --git a/tests/integration/test_distributed_respect_user_timeouts/test.py b/tests/integration/test_distributed_respect_user_timeouts/test.py index 662bf7fa6de..d8eb92d96b5 100644 --- a/tests/integration/test_distributed_respect_user_timeouts/test.py +++ b/tests/integration/test_distributed_respect_user_timeouts/test.py @@ -33,7 +33,7 @@ SELECTS_SQL = { "ORDER BY node"), } -EXCEPTION_NETWORK = 'e.displayText() = DB::NetException: ' +EXCEPTION_NETWORK = 'DB::NetException: ' EXCEPTION_TIMEOUT = 'Timeout exceeded while reading from socket (' EXCEPTION_CONNECT = 'Timeout: connect timed out: ' @@ -57,7 +57,7 @@ TIMEOUT_DIFF_UPPER_BOUND = { }, 'ready_to_wait': { 'distributed': 3, - 'remote': 1.5, + 'remote': 2.0, }, } @@ -76,13 +76,13 @@ def _check_exception(exception, expected_tries=3): for i, line in enumerate(lines[3:3 + expected_tries]): expected_lines = ( - 'Code: 209, ' + EXCEPTION_NETWORK + EXCEPTION_TIMEOUT, - 'Code: 209, ' + EXCEPTION_NETWORK + EXCEPTION_CONNECT, + 'Code: 209. ' + EXCEPTION_NETWORK + EXCEPTION_TIMEOUT, + 'Code: 209. 
' + EXCEPTION_NETWORK + EXCEPTION_CONNECT, EXCEPTION_TIMEOUT, ) assert any(line.startswith(expected) for expected in expected_lines), \ - 'Unexpected exception at one of the connection attempts' + 'Unexpected exception "{}" at one of the connection attempts'.format(line) assert lines[3 + expected_tries] == '', 'Wrong number of connect attempts' diff --git a/tests/integration/test_encrypted_disk/configs/storage.xml b/tests/integration/test_encrypted_disk/configs/storage.xml index b0485178b13..6a5e016d501 100644 --- a/tests/integration/test_encrypted_disk/configs/storage.xml +++ b/tests/integration/test_encrypted_disk/configs/storage.xml @@ -25,8 +25,21 @@ encrypted disk_local encrypted/ - abcdefghijklmnop + 1234567812345678 + + encrypted + disk_local + encrypted2/ + 1234567812345678 + + + encrypted + disk_local + encrypted_key192b/ + aes_192_ctr + 109105c600c12066f82f1a4dbb41a08e4A4348C8387ADB6A +
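
Note on the hunk above: the new `disk_local_encrypted_key192b` disk pairs the `aes_192_ctr` algorithm with a 48-character hex key, i.e. 24 bytes / 192 bits. A minimal standalone sketch (not part of the patch) that sanity-checks this, using the key value copied from the config above:

```python
# Standalone sketch: confirm the configured hex key matches the AES-192 key size (24 bytes).
key_hex = "109105c600c12066f82f1a4dbb41a08e4A4348C8387ADB6A"  # copied from disk_local_encrypted_key192b above
key_bytes = bytes.fromhex(key_hex)
assert len(key_bytes) == 24, "aes_192_ctr expects a 192-bit (24-byte) key"
print(f"key length: {len(key_bytes)} bytes ({len(key_bytes) * 8} bits)")
```
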
@@ -36,6 +49,13 @@ + + +
+ disk_local_encrypted_key192b +
+
+
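
The hunk above appears to register a storage policy volume for the new `disk_local_encrypted_key192b` disk; the test file further below checks policy visibility through `system.storage_policies` in the same spirit. A brief hedged sketch of that kind of check (it assumes a started `node` instance as defined in the tests below):

```python
# Sketch only: confirm a policy declared in storage.xml is visible to the server.
policy = node.query(
    "SELECT policy_name FROM system.storage_policies "
    "WHERE policy_name = 'encrypted_policy_key192b'"
).strip()
assert policy == "encrypted_policy_key192b"
```
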
@@ -43,6 +63,8 @@
disk_local_encrypted + disk_local_encrypted2 + disk_local_encrypted_key192b
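
The storage-policy hunks above expose the new encrypted disks (`disk_local_encrypted2`, `disk_local_encrypted_key192b`) alongside the existing ones, which is what the updated `test_part_move` in the test file below relies on when it cycles a part through several differently-keyed encrypted disks under `local_policy`. As a rough illustration of how such a move can be observed, here is a hedged sketch (it assumes a running `node` and the `encrypted_test` table from the test below, and reads the standard `disk_name` column of `system.parts`):

```python
# Minimal sketch: move a part to an encrypted disk and confirm where it landed.
part, disk = "all_1_1_0", "disk_local_encrypted_key192b"
node.query(f"ALTER TABLE encrypted_test MOVE PART '{part}' TO DISK '{disk}'")
disk_name = node.query(
    "SELECT disk_name FROM system.parts "
    f"WHERE table = 'encrypted_test' AND name = '{part}' AND active"
).strip()
assert disk_name == disk  # the data stays readable no matter which encrypted disk holds it
```
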
diff --git a/tests/integration/test_encrypted_disk/test.py b/tests/integration/test_encrypted_disk/test.py index 64085991ade..542980bcaaa 100644 --- a/tests/integration/test_encrypted_disk/test.py +++ b/tests/integration/test_encrypted_disk/test.py @@ -1,27 +1,37 @@ import pytest from helpers.cluster import ClickHouseCluster from helpers.client import QueryRuntimeException +from helpers.test_tools import assert_eq_with_retry FIRST_PART_NAME = "all_1_1_0" -@pytest.fixture(scope="module") -def cluster(): +cluster = ClickHouseCluster(__file__) +node = cluster.add_instance("node", + main_configs=["configs/storage.xml"], + tmpfs=["/disk:size=100M"], + with_minio=True) + + +@pytest.fixture(scope="module", autouse=True) +def start_cluster(): try: - cluster = ClickHouseCluster(__file__) - node = cluster.add_instance("node", - main_configs=["configs/storage.xml"], - tmpfs=["/disk:size=100M"], - with_minio=True) cluster.start() yield cluster finally: cluster.shutdown() -@pytest.mark.parametrize("policy", ["encrypted_policy", "local_policy", "s3_policy"]) -def test_encrypted_disk(cluster, policy): - node = cluster.instances["node"] +@pytest.fixture(autouse=True) +def cleanup_after_test(): + try: + yield + finally: + node.query("DROP TABLE IF EXISTS encrypted_test NO DELAY") + + +@pytest.mark.parametrize("policy", ["encrypted_policy", "encrypted_policy_key192b", "local_policy", "s3_policy"]) +def test_encrypted_disk(policy): node.query( """ CREATE TABLE encrypted_test ( @@ -41,12 +51,9 @@ def test_encrypted_disk(cluster, policy): node.query("OPTIMIZE TABLE encrypted_test FINAL") assert node.query(select_query) == "(0,'data'),(1,'data'),(2,'data'),(3,'data')" - node.query("DROP TABLE IF EXISTS encrypted_test NO DELAY") - -@pytest.mark.parametrize("policy,disk,encrypted_disk", [("local_policy", "disk_local", "disk_local_encrypted"), ("s3_policy", "disk_s3", "disk_s3_encrypted")]) -def test_part_move(cluster, policy, disk, encrypted_disk): - node = cluster.instances["node"] +@pytest.mark.parametrize("policy, destination_disks", [("local_policy", ["disk_local_encrypted", "disk_local_encrypted2", "disk_local_encrypted_key192b", "disk_local"]), ("s3_policy", ["disk_s3_encrypted", "disk_s3"])]) +def test_part_move(policy, destination_disks): node.query( """ CREATE TABLE encrypted_test ( @@ -62,23 +69,18 @@ def test_part_move(cluster, policy, disk, encrypted_disk): select_query = "SELECT * FROM encrypted_test ORDER BY id FORMAT Values" assert node.query(select_query) == "(0,'data'),(1,'data')" - node.query("ALTER TABLE encrypted_test MOVE PART '{}' TO DISK '{}'".format(FIRST_PART_NAME, encrypted_disk)) + for destination_disk in destination_disks: + node.query("ALTER TABLE encrypted_test MOVE PART '{}' TO DISK '{}'".format(FIRST_PART_NAME, destination_disk)) + assert node.query(select_query) == "(0,'data'),(1,'data')" + with pytest.raises(QueryRuntimeException) as exc: + node.query("ALTER TABLE encrypted_test MOVE PART '{}' TO DISK '{}'".format(FIRST_PART_NAME, destination_disk)) + assert("Part '{}' is already on disk '{}'".format(FIRST_PART_NAME, destination_disk) in str(exc.value)) + assert node.query(select_query) == "(0,'data'),(1,'data')" - with pytest.raises(QueryRuntimeException) as exc: - node.query("ALTER TABLE encrypted_test MOVE PART '{}' TO DISK '{}'".format(FIRST_PART_NAME, encrypted_disk)) - - assert("Part '{}' is already on disk '{}'".format(FIRST_PART_NAME, encrypted_disk) in str(exc.value)) - - node.query("ALTER TABLE encrypted_test MOVE PART '{}' TO DISK '{}'".format(FIRST_PART_NAME, disk)) - 
assert node.query(select_query) == "(0,'data'),(1,'data')" - - node.query("DROP TABLE IF EXISTS encrypted_test NO DELAY") - @pytest.mark.parametrize("policy,encrypted_disk", [("local_policy", "disk_local_encrypted"), ("s3_policy", "disk_s3_encrypted")]) -def test_optimize_table(cluster, policy, encrypted_disk): - node = cluster.instances["node"] +def test_optimize_table(policy, encrypted_disk): node.query( """ CREATE TABLE encrypted_test ( @@ -107,4 +109,76 @@ def test_optimize_table(cluster, policy, encrypted_disk): assert node.query(select_query) == "(0,'data'),(1,'data'),(2,'data'),(3,'data')" - node.query("DROP TABLE IF EXISTS encrypted_test NO DELAY") + +# Test adding encryption key on the fly. +def test_add_key(): + def make_storage_policy_with_keys(policy_name, keys): + node.exec_in_container(["bash", "-c" , """cat > /etc/clickhouse-server/config.d/storage_policy_{policy_name}.xml << EOF + + + + + <{policy_name}_disk> + encrypted + disk_local + {policy_name}_dir/ + {keys} + + + + <{policy_name}> + +
+ {policy_name}_disk +
+
+ +
+
+
+EOF""".format(policy_name=policy_name, keys=keys)]) + node.query("SYSTEM RELOAD CONFIG") + + # Add some data to an encrypted disk. + node.query("SELECT policy_name FROM system.storage_policies") + make_storage_policy_with_keys("encrypted_policy_multikeys", "firstfirstfirstf") + assert_eq_with_retry(node, "SELECT policy_name FROM system.storage_policies WHERE policy_name='encrypted_policy_multikeys'", "encrypted_policy_multikeys") + + node.query(""" + CREATE TABLE encrypted_test ( + id Int64, + data String + ) ENGINE=MergeTree() + ORDER BY id + SETTINGS storage_policy='encrypted_policy_multikeys' + """) + + node.query("INSERT INTO encrypted_test VALUES (0,'data'),(1,'data')") + select_query = "SELECT * FROM encrypted_test ORDER BY id FORMAT Values" + assert node.query(select_query) == "(0,'data'),(1,'data')" + + # Add a second key and start using it. + make_storage_policy_with_keys("encrypted_policy_multikeys", """ + firstfirstfirstf + secondsecondseco + 1 + """) + node.query("INSERT INTO encrypted_test VALUES (2,'data'),(3,'data')") + + # Now "(0,'data'),(1,'data')" is encrypted with the first key and "(2,'data'),(3,'data')" is encrypted with the second key. + # All data are accessible. + assert node.query(select_query) == "(0,'data'),(1,'data'),(2,'data'),(3,'data')" + + # Try to replace the first key with something wrong, and check that "(0,'data'),(1,'data')" cannot be read. + make_storage_policy_with_keys("encrypted_policy_multikeys", """ + wrongwrongwrongw + secondsecondseco + 1 + """) + + expected_error = "Wrong key" + assert expected_error in node.query_and_get_error(select_query) + + # Detach the part encrypted with the wrong key and check that another part containing "(2,'data'),(3,'data')" still can be read. + node.query("ALTER TABLE encrypted_test DETACH PART '{}'".format(FIRST_PART_NAME)) + assert node.query(select_query) == "(2,'data'),(3,'data')" diff --git a/tests/queries/0_stateless/01293_client_interactive_vertical_multiline_long.reference b/tests/integration/test_explain_estimates/__init__.py similarity index 100% rename from tests/queries/0_stateless/01293_client_interactive_vertical_multiline_long.reference rename to tests/integration/test_explain_estimates/__init__.py diff --git a/tests/integration/test_explain_estimates/test.py b/tests/integration/test_explain_estimates/test.py new file mode 100644 index 00000000000..a2b65564dbc --- /dev/null +++ b/tests/integration/test_explain_estimates/test.py @@ -0,0 +1,24 @@ +import pytest + +from helpers.cluster import ClickHouseCluster + +cluster = ClickHouseCluster(__file__) +node1 = cluster.add_instance('instance') + +@pytest.fixture(scope="module") +def start_cluster(): + try: + cluster.start() + yield cluster + + finally: + cluster.shutdown() + + +def test_explain_estimates(start_cluster): + node1.query("CREATE TABLE test (i Int64) ENGINE = MergeTree() ORDER BY i SETTINGS index_granularity = 16, write_final_mark = 0") + node1.query("INSERT INTO test SELECT number FROM numbers(128)") + node1.query("OPTIMIZE TABLE test") + system_parts_result = node1.query("SELECT any(database), any(table), count() as parts, sum(rows) as rows, sum(marks) as marks FROM system.parts WHERE database = 'default' AND table = 'test' and active = 1 GROUP BY (database, table)") + explain_estimates_result = node1.query("EXPLAIN ESTIMATE SELECT * FROM test") + assert(system_parts_result == explain_estimates_result) diff --git a/tests/integration/test_jdbc_bridge/test.py b/tests/integration/test_jdbc_bridge/test.py index 5972cfd7a5e..b5304c4cb10 100644 
--- a/tests/integration/test_jdbc_bridge/test.py +++ b/tests/integration/test_jdbc_bridge/test.py @@ -1,7 +1,6 @@ -import contextlib +import logging import os.path as p import pytest -import time import uuid from helpers.cluster import ClickHouseCluster @@ -23,6 +22,14 @@ def started_cluster(): INSERT INTO test.ClickHouseTable(Num, Str) SELECT number, toString(number) FROM system.numbers LIMIT {}; '''.format(records)) + + while True: + datasources = instance.query("select * from jdbc('', 'show datasources')") + if 'self' in datasources: + logging.debug(f"JDBC Driver self datasource initialized.\n{datasources}") + break + else: + logging.debug(f"Waiting JDBC Driver to initialize 'self' datasource.\n{datasources}") yield cluster finally: cluster.shutdown() @@ -52,8 +59,9 @@ def test_jdbc_distributed_query(started_cluster): def test_jdbc_insert(started_cluster): """Test insert query using JDBC table function""" + instance.query('DROP TABLE IF EXISTS test.test_insert') instance.query(''' - CREATE TABLE test.test_insert engine = Memory AS + CREATE TABLE test.test_insert ENGINE = Memory AS SELECT * FROM test.ClickHouseTable; SELECT * FROM jdbc('{0}?mutation', 'INSERT INTO test.test_insert VALUES({1}, ''{1}'', ''{1}'')'); @@ -67,8 +75,9 @@ def test_jdbc_insert(started_cluster): def test_jdbc_update(started_cluster): """Test update query using JDBC table function""" secrets = str(uuid.uuid1()) + instance.query('DROP TABLE IF EXISTS test.test_update') instance.query(''' - CREATE TABLE test.test_update engine = Memory AS + CREATE TABLE test.test_update ENGINE = Memory AS SELECT * FROM test.ClickHouseTable; SELECT * FROM jdbc( @@ -85,8 +94,9 @@ def test_jdbc_update(started_cluster): def test_jdbc_delete(started_cluster): """Test delete query using JDBC table function""" + instance.query('DROP TABLE IF EXISTS test.test_delete') instance.query(''' - CREATE TABLE test.test_delete engine = Memory AS + CREATE TABLE test.test_delete ENGINE = Memory AS SELECT * FROM test.ClickHouseTable; SELECT * FROM jdbc( @@ -102,6 +112,7 @@ def test_jdbc_delete(started_cluster): def test_jdbc_table_engine(started_cluster): """Test query against a JDBC table""" + instance.query('DROP TABLE IF EXISTS test.jdbc_table') actual = instance.query(''' CREATE TABLE test.jdbc_table(Str String) ENGINE = JDBC('{}', 'test', 'ClickHouseTable'); diff --git a/tests/integration/test_library_bridge/configs/config.d/config.xml b/tests/integration/test_library_bridge/configs/config.d/config.xml index 9bea75fbb6f..7811c1e26b7 100644 --- a/tests/integration/test_library_bridge/configs/config.d/config.xml +++ b/tests/integration/test_library_bridge/configs/config.d/config.xml @@ -8,5 +8,9 @@ 10 /var/log/clickhouse-server/stderr.log /var/log/clickhouse-server/stdout.log + + /var/log/clickhouse-server/clickhouse-library-bridge.log + /var/log/clickhouse-server/clickhouse-library-bridge.err.log + trace diff --git a/tests/integration/test_library_bridge/test.py b/tests/integration/test_library_bridge/test.py index ba44918bd60..607afb6db5f 100644 --- a/tests/integration/test_library_bridge/test.py +++ b/tests/integration/test_library_bridge/test.py @@ -2,14 +2,30 @@ import os import os.path as p import pytest import time +import logging from helpers.cluster import ClickHouseCluster, run_and_check cluster = ClickHouseCluster(__file__) instance = cluster.add_instance('instance', - dictionaries=['configs/dictionaries/dict1.xml'], - main_configs=['configs/config.d/config.xml']) + dictionaries=['configs/dictionaries/dict1.xml'], 
main_configs=['configs/config.d/config.xml'], stay_alive=True) + + +def create_dict_simple(): + instance.query('DROP DICTIONARY IF EXISTS lib_dict_c') + instance.query(''' + CREATE DICTIONARY lib_dict_c (key UInt64, value1 UInt64, value2 UInt64, value3 UInt64) + PRIMARY KEY key SOURCE(library(PATH '/etc/clickhouse-server/config.d/dictionaries_lib/dict_lib.so')) + LAYOUT(CACHE( + SIZE_IN_CELLS 10000000 + BLOCK_SIZE 4096 + FILE_SIZE 16777216 + READ_BUFFER_SIZE 1048576 + MAX_STORED_KEYS 1048576)) + LIFETIME(2) ; + ''') + @pytest.fixture(scope="module") def ch_cluster(): @@ -98,6 +114,10 @@ def test_load_ids(ch_cluster): result = instance.query('''select dictGet(lib_dict_c, 'value1', toUInt64(0));''') assert(result.strip() == '100') + + # Just check bridge is ok with a large vector of random ids + instance.query('''select number, dictGet(lib_dict_c, 'value1', toUInt64(rand())) from numbers(1000);''') + result = instance.query('''select dictGet(lib_dict_c, 'value1', toUInt64(1));''') assert(result.strip() == '101') instance.query('DROP DICTIONARY lib_dict_c') @@ -160,6 +180,91 @@ def test_null_values(ch_cluster): assert(result == expected) +def test_recover_after_bridge_crash(ch_cluster): + if instance.is_built_with_memory_sanitizer(): + pytest.skip("Memory Sanitizer cannot work with third-party shared libraries") + + create_dict_simple() + + result = instance.query('''select dictGet(lib_dict_c, 'value1', toUInt64(0));''') + assert(result.strip() == '100') + result = instance.query('''select dictGet(lib_dict_c, 'value1', toUInt64(1));''') + assert(result.strip() == '101') + + instance.exec_in_container(['bash', '-c', 'kill -9 `pidof clickhouse-library-bridge`'], user='root') + instance.query('SYSTEM RELOAD DICTIONARY lib_dict_c') + + result = instance.query('''select dictGet(lib_dict_c, 'value1', toUInt64(0));''') + assert(result.strip() == '100') + result = instance.query('''select dictGet(lib_dict_c, 'value1', toUInt64(1));''') + assert(result.strip() == '101') + + instance.exec_in_container(['bash', '-c', 'kill -9 `pidof clickhouse-library-bridge`'], user='root') + instance.query('DROP DICTIONARY lib_dict_c') + + +def test_server_restart_bridge_might_be_stil_alive(ch_cluster): + if instance.is_built_with_memory_sanitizer(): + pytest.skip("Memory Sanitizer cannot work with third-party shared libraries") + + create_dict_simple() + + result = instance.query('''select dictGet(lib_dict_c, 'value1', toUInt64(1));''') + assert(result.strip() == '101') + + instance.restart_clickhouse() + + result = instance.query('''select dictGet(lib_dict_c, 'value1', toUInt64(1));''') + assert(result.strip() == '101') + + instance.exec_in_container(['bash', '-c', 'kill -9 `pidof clickhouse-library-bridge`'], user='root') + instance.restart_clickhouse() + + result = instance.query('''select dictGet(lib_dict_c, 'value1', toUInt64(1));''') + assert(result.strip() == '101') + + instance.query('DROP DICTIONARY lib_dict_c') + + +def test_bridge_dies_with_parent(ch_cluster): + if instance.is_built_with_memory_sanitizer(): + pytest.skip("Memory Sanitizer cannot work with third-party shared libraries") + if instance.is_built_with_address_sanitizer(): + pytest.skip("Leak sanitizer falsely reports about a leak of 16 bytes in clickhouse-odbc-bridge") + + create_dict_simple() + result = instance.query('''select dictGet(lib_dict_c, 'value1', toUInt64(1));''') + assert(result.strip() == '101') + + clickhouse_pid = instance.get_process_pid("clickhouse server") + bridge_pid = instance.get_process_pid("library-bridge") + assert 
clickhouse_pid is not None + assert bridge_pid is not None + + while clickhouse_pid is not None: + try: + instance.exec_in_container(["kill", str(clickhouse_pid)], privileged=True, user='root') + except: + pass + clickhouse_pid = instance.get_process_pid("clickhouse server") + time.sleep(1) + + for i in range(30): + time.sleep(1) + bridge_pid = instance.get_process_pid("library-bridge") + if bridge_pid is None: + break + + if bridge_pid: + out = instance.exec_in_container(["gdb", "-p", str(bridge_pid), "--ex", "thread apply all bt", "--ex", "q"], + privileged=True, user='root') + logging.debug(f"Bridge is running, gdb output:\n{out}") + + assert clickhouse_pid is None + assert bridge_pid is None + instance.start_clickhouse(20) + + if __name__ == '__main__': cluster.start() input("Cluster created, press any key to destroy...") diff --git a/tests/integration/test_materialized_mysql_database/__init__.py b/tests/integration/test_materialized_mysql_database/__init__.py new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/integration/test_materialize_mysql_database/configs/users.xml b/tests/integration/test_materialized_mysql_database/configs/users.xml similarity index 83% rename from tests/integration/test_materialize_mysql_database/configs/users.xml rename to tests/integration/test_materialized_mysql_database/configs/users.xml index 196ea5cfb99..19236f6081b 100644 --- a/tests/integration/test_materialize_mysql_database/configs/users.xml +++ b/tests/integration/test_materialized_mysql_database/configs/users.xml @@ -2,7 +2,7 @@ - 1 + 1 1 0 Ordinary diff --git a/tests/integration/test_materialize_mysql_database/configs/users_db_atomic.xml b/tests/integration/test_materialized_mysql_database/configs/users_db_atomic.xml similarity index 79% rename from tests/integration/test_materialize_mysql_database/configs/users_db_atomic.xml rename to tests/integration/test_materialized_mysql_database/configs/users_db_atomic.xml index 3add72ec554..e2981fd1eef 100644 --- a/tests/integration/test_materialize_mysql_database/configs/users_db_atomic.xml +++ b/tests/integration/test_materialized_mysql_database/configs/users_db_atomic.xml @@ -2,7 +2,7 @@ - 1 + 1 Atomic diff --git a/tests/integration/test_materialize_mysql_database/configs/users_disable_bytes_settings.xml b/tests/integration/test_materialized_mysql_database/configs/users_disable_bytes_settings.xml similarity index 84% rename from tests/integration/test_materialize_mysql_database/configs/users_disable_bytes_settings.xml rename to tests/integration/test_materialized_mysql_database/configs/users_disable_bytes_settings.xml index 4516cb80c17..0f656f0fbd7 100644 --- a/tests/integration/test_materialize_mysql_database/configs/users_disable_bytes_settings.xml +++ b/tests/integration/test_materialized_mysql_database/configs/users_disable_bytes_settings.xml @@ -2,7 +2,7 @@ - 1 + 1 Atomic 1 0 diff --git a/tests/integration/test_materialize_mysql_database/configs/users_disable_rows_settings.xml b/tests/integration/test_materialized_mysql_database/configs/users_disable_rows_settings.xml similarity index 84% rename from tests/integration/test_materialize_mysql_database/configs/users_disable_rows_settings.xml rename to tests/integration/test_materialized_mysql_database/configs/users_disable_rows_settings.xml index dea20eb9e12..d83ed9e9a58 100644 --- a/tests/integration/test_materialize_mysql_database/configs/users_disable_rows_settings.xml +++ b/tests/integration/test_materialized_mysql_database/configs/users_disable_rows_settings.xml @@ -2,7 +2,7 
@@ - 1 + 1 Atomic 0 1 diff --git a/tests/integration/test_materialize_mysql_database/materialize_with_ddl.py b/tests/integration/test_materialized_mysql_database/materialize_with_ddl.py similarity index 88% rename from tests/integration/test_materialize_mysql_database/materialize_with_ddl.py rename to tests/integration/test_materialized_mysql_database/materialize_with_ddl.py index 3fd1cb0ecae..23fa9894a84 100644 --- a/tests/integration/test_materialize_mysql_database/materialize_with_ddl.py +++ b/tests/integration/test_materialized_mysql_database/materialize_with_ddl.py @@ -30,7 +30,7 @@ def check_query(clickhouse_node, query, result_set, retry_count=10, interval_sec assert clickhouse_node.query(query) == result_set -def dml_with_materialize_mysql_database(clickhouse_node, mysql_node, service_name): +def dml_with_materialized_mysql_database(clickhouse_node, mysql_node, service_name): mysql_node.query("DROP DATABASE IF EXISTS test_database_dml") clickhouse_node.query("DROP DATABASE IF EXISTS test_database_dml") mysql_node.query("CREATE DATABASE test_database_dml DEFAULT CHARACTER SET 'utf8'") @@ -117,7 +117,7 @@ def dml_with_materialize_mysql_database(clickhouse_node, mysql_node, service_nam mysql_node.query("DROP DATABASE test_database_dml") -def materialize_mysql_database_with_views(clickhouse_node, mysql_node, service_name): +def materialized_mysql_database_with_views(clickhouse_node, mysql_node, service_name): mysql_node.query("DROP DATABASE IF EXISTS test_database") clickhouse_node.query("DROP DATABASE IF EXISTS test_database") mysql_node.query("CREATE DATABASE test_database DEFAULT CHARACTER SET 'utf8'") @@ -146,7 +146,7 @@ def materialize_mysql_database_with_views(clickhouse_node, mysql_node, service_n '2020-01-01', '2020-01-01 00:00:00', '2020-01-01 00:00:00', true); """) clickhouse_node.query( - "CREATE DATABASE test_database ENGINE = MaterializeMySQL('{}:3306', 'test_database', 'root', 'clickhouse')".format( + "CREATE DATABASE test_database ENGINE = MaterializedMySQL('{}:3306', 'test_database', 'root', 'clickhouse')".format( service_name)) assert "test_database" in clickhouse_node.query("SHOW DATABASES") @@ -156,7 +156,7 @@ def materialize_mysql_database_with_views(clickhouse_node, mysql_node, service_n mysql_node.query("DROP DATABASE test_database") -def materialize_mysql_database_with_datetime_and_decimal(clickhouse_node, mysql_node, service_name): +def materialized_mysql_database_with_datetime_and_decimal(clickhouse_node, mysql_node, service_name): mysql_node.query("DROP DATABASE IF EXISTS test_database_dt") clickhouse_node.query("DROP DATABASE IF EXISTS test_database_dt") mysql_node.query("CREATE DATABASE test_database_dt DEFAULT CHARACTER SET 'utf8'") @@ -166,7 +166,7 @@ def materialize_mysql_database_with_datetime_and_decimal(clickhouse_node, mysql_ mysql_node.query("INSERT INTO test_database_dt.test_table_1 VALUES(3, '2020-01-01 01:02:03.9999', '2020-01-01 01:02:03.99', -" + ('9' * 35) + "." + ('9' * 30) + ")") mysql_node.query("INSERT INTO test_database_dt.test_table_1 VALUES(4, '2020-01-01 01:02:03.9999', '2020-01-01 01:02:03.9999', -." 
+ ('0' * 29) + "1)") - clickhouse_node.query("CREATE DATABASE test_database_dt ENGINE = MaterializeMySQL('{}:3306', 'test_database_dt', 'root', 'clickhouse')".format(service_name)) + clickhouse_node.query("CREATE DATABASE test_database_dt ENGINE = MaterializedMySQL('{}:3306', 'test_database_dt', 'root', 'clickhouse')".format(service_name)) assert "test_database_dt" in clickhouse_node.query("SHOW DATABASES") check_query(clickhouse_node, "SELECT * FROM test_database_dt.test_table_1 ORDER BY key FORMAT TSV", @@ -190,7 +190,7 @@ def materialize_mysql_database_with_datetime_and_decimal(clickhouse_node, mysql_ mysql_node.query("DROP DATABASE test_database_dt") -def drop_table_with_materialize_mysql_database(clickhouse_node, mysql_node, service_name): +def drop_table_with_materialized_mysql_database(clickhouse_node, mysql_node, service_name): mysql_node.query("DROP DATABASE IF EXISTS test_database_drop") clickhouse_node.query("DROP DATABASE IF EXISTS test_database_drop") mysql_node.query("CREATE DATABASE test_database_drop DEFAULT CHARACTER SET 'utf8'") @@ -204,7 +204,7 @@ def drop_table_with_materialize_mysql_database(clickhouse_node, mysql_node, serv # create mapping clickhouse_node.query( - "CREATE DATABASE test_database_drop ENGINE = MaterializeMySQL('{}:3306', 'test_database_drop', 'root', 'clickhouse')".format( + "CREATE DATABASE test_database_drop ENGINE = MaterializedMySQL('{}:3306', 'test_database_drop', 'root', 'clickhouse')".format( service_name)) assert "test_database_drop" in clickhouse_node.query("SHOW DATABASES") @@ -225,7 +225,7 @@ def drop_table_with_materialize_mysql_database(clickhouse_node, mysql_node, serv mysql_node.query("DROP DATABASE test_database_drop") -def create_table_with_materialize_mysql_database(clickhouse_node, mysql_node, service_name): +def create_table_with_materialized_mysql_database(clickhouse_node, mysql_node, service_name): mysql_node.query("DROP DATABASE IF EXISTS test_database_create") clickhouse_node.query("DROP DATABASE IF EXISTS test_database_create") mysql_node.query("CREATE DATABASE test_database_create DEFAULT CHARACTER SET 'utf8'") @@ -236,7 +236,7 @@ def create_table_with_materialize_mysql_database(clickhouse_node, mysql_node, se # create mapping clickhouse_node.query( - "CREATE DATABASE test_database_create ENGINE = MaterializeMySQL('{}:3306', 'test_database_create', 'root', 'clickhouse')".format( + "CREATE DATABASE test_database_create ENGINE = MaterializedMySQL('{}:3306', 'test_database_create', 'root', 'clickhouse')".format( service_name)) # Check for pre-existing status @@ -253,7 +253,7 @@ def create_table_with_materialize_mysql_database(clickhouse_node, mysql_node, se mysql_node.query("DROP DATABASE test_database_create") -def rename_table_with_materialize_mysql_database(clickhouse_node, mysql_node, service_name): +def rename_table_with_materialized_mysql_database(clickhouse_node, mysql_node, service_name): mysql_node.query("DROP DATABASE IF EXISTS test_database_rename") clickhouse_node.query("DROP DATABASE IF EXISTS test_database_rename") mysql_node.query("CREATE DATABASE test_database_rename DEFAULT CHARACTER SET 'utf8'") @@ -263,7 +263,7 @@ def rename_table_with_materialize_mysql_database(clickhouse_node, mysql_node, se # create mapping clickhouse_node.query( - "CREATE DATABASE test_database_rename ENGINE = MaterializeMySQL('{}:3306', 'test_database_rename', 'root', 'clickhouse')".format( + "CREATE DATABASE test_database_rename ENGINE = MaterializedMySQL('{}:3306', 'test_database_rename', 'root', 'clickhouse')".format( service_name)) 
assert "test_database_rename" in clickhouse_node.query("SHOW DATABASES") @@ -275,7 +275,7 @@ def rename_table_with_materialize_mysql_database(clickhouse_node, mysql_node, se mysql_node.query("DROP DATABASE test_database_rename") -def alter_add_column_with_materialize_mysql_database(clickhouse_node, mysql_node, service_name): +def alter_add_column_with_materialized_mysql_database(clickhouse_node, mysql_node, service_name): mysql_node.query("DROP DATABASE IF EXISTS test_database_add") clickhouse_node.query("DROP DATABASE IF EXISTS test_database_add") mysql_node.query("CREATE DATABASE test_database_add DEFAULT CHARACTER SET 'utf8'") @@ -289,7 +289,7 @@ def alter_add_column_with_materialize_mysql_database(clickhouse_node, mysql_node # create mapping clickhouse_node.query( - "CREATE DATABASE test_database_add ENGINE = MaterializeMySQL('{}:3306', 'test_database_add', 'root', 'clickhouse')".format( + "CREATE DATABASE test_database_add ENGINE = MaterializedMySQL('{}:3306', 'test_database_add', 'root', 'clickhouse')".format( service_name)) assert "test_database_add" in clickhouse_node.query("SHOW DATABASES") @@ -317,7 +317,7 @@ def alter_add_column_with_materialize_mysql_database(clickhouse_node, mysql_node mysql_node.query("DROP DATABASE test_database_add") -def alter_drop_column_with_materialize_mysql_database(clickhouse_node, mysql_node, service_name): +def alter_drop_column_with_materialized_mysql_database(clickhouse_node, mysql_node, service_name): mysql_node.query("DROP DATABASE IF EXISTS test_database_alter_drop") clickhouse_node.query("DROP DATABASE IF EXISTS test_database_alter_drop") mysql_node.query("CREATE DATABASE test_database_alter_drop DEFAULT CHARACTER SET 'utf8'") @@ -328,7 +328,7 @@ def alter_drop_column_with_materialize_mysql_database(clickhouse_node, mysql_nod # create mapping clickhouse_node.query( - "CREATE DATABASE test_database_alter_drop ENGINE = MaterializeMySQL('{}:3306', 'test_database_alter_drop', 'root', 'clickhouse')".format( + "CREATE DATABASE test_database_alter_drop ENGINE = MaterializedMySQL('{}:3306', 'test_database_alter_drop', 'root', 'clickhouse')".format( service_name)) assert "test_database_alter_drop" in clickhouse_node.query("SHOW DATABASES") @@ -351,7 +351,7 @@ def alter_drop_column_with_materialize_mysql_database(clickhouse_node, mysql_nod mysql_node.query("DROP DATABASE test_database_alter_drop") -def alter_rename_column_with_materialize_mysql_database(clickhouse_node, mysql_node, service_name): +def alter_rename_column_with_materialized_mysql_database(clickhouse_node, mysql_node, service_name): mysql_node.query("DROP DATABASE IF EXISTS test_database_alter_rename") clickhouse_node.query("DROP DATABASE IF EXISTS test_database_alter_rename") mysql_node.query("CREATE DATABASE test_database_alter_rename DEFAULT CHARACTER SET 'utf8'") @@ -364,7 +364,7 @@ def alter_rename_column_with_materialize_mysql_database(clickhouse_node, mysql_n # create mapping clickhouse_node.query( - "CREATE DATABASE test_database_alter_rename ENGINE = MaterializeMySQL('{}:3306', 'test_database_alter_rename', 'root', 'clickhouse')".format( + "CREATE DATABASE test_database_alter_rename ENGINE = MaterializedMySQL('{}:3306', 'test_database_alter_rename', 'root', 'clickhouse')".format( service_name)) assert "test_database_alter_rename" in clickhouse_node.query("SHOW DATABASES") @@ -386,7 +386,7 @@ def alter_rename_column_with_materialize_mysql_database(clickhouse_node, mysql_n mysql_node.query("DROP DATABASE test_database_alter_rename") -def 
alter_modify_column_with_materialize_mysql_database(clickhouse_node, mysql_node, service_name): +def alter_modify_column_with_materialized_mysql_database(clickhouse_node, mysql_node, service_name): mysql_node.query("DROP DATABASE IF EXISTS test_database_alter_modify") clickhouse_node.query("DROP DATABASE IF EXISTS test_database_alter_modify") mysql_node.query("CREATE DATABASE test_database_alter_modify DEFAULT CHARACTER SET 'utf8'") @@ -399,7 +399,7 @@ def alter_modify_column_with_materialize_mysql_database(clickhouse_node, mysql_n # create mapping clickhouse_node.query( - "CREATE DATABASE test_database_alter_modify ENGINE = MaterializeMySQL('{}:3306', 'test_database_alter_modify', 'root', 'clickhouse')".format( + "CREATE DATABASE test_database_alter_modify ENGINE = MaterializedMySQL('{}:3306', 'test_database_alter_modify', 'root', 'clickhouse')".format( service_name)) assert "test_database_alter_modify" in clickhouse_node.query("SHOW DATABASES") @@ -429,10 +429,10 @@ def alter_modify_column_with_materialize_mysql_database(clickhouse_node, mysql_n # TODO: need ClickHouse support ALTER TABLE table_name ADD COLUMN column_name, RENAME COLUMN column_name TO new_column_name; -# def test_mysql_alter_change_column_for_materialize_mysql_database(started_cluster): +# def test_mysql_alter_change_column_for_materialized_mysql_database(started_cluster): # pass -def alter_rename_table_with_materialize_mysql_database(clickhouse_node, mysql_node, service_name): +def alter_rename_table_with_materialized_mysql_database(clickhouse_node, mysql_node, service_name): mysql_node.query("DROP DATABASE IF EXISTS test_database_rename_table") clickhouse_node.query("DROP DATABASE IF EXISTS test_database_rename_table") mysql_node.query("CREATE DATABASE test_database_rename_table DEFAULT CHARACTER SET 'utf8'") @@ -444,7 +444,7 @@ def alter_rename_table_with_materialize_mysql_database(clickhouse_node, mysql_no # create mapping clickhouse_node.query( - "CREATE DATABASE test_database_rename_table ENGINE = MaterializeMySQL('{}:3306', 'test_database_rename_table', 'root', 'clickhouse')".format( + "CREATE DATABASE test_database_rename_table ENGINE = MaterializedMySQL('{}:3306', 'test_database_rename_table', 'root', 'clickhouse')".format( service_name)) assert "test_database_rename_table" in clickhouse_node.query("SHOW DATABASES") @@ -479,7 +479,7 @@ def query_event_with_empty_transaction(clickhouse_node, mysql_node, service_name mysql_node.query("INSERT INTO test_database_event.t1(a) VALUES(1)") clickhouse_node.query( - "CREATE DATABASE test_database_event ENGINE = MaterializeMySQL('{}:3306', 'test_database_event', 'root', 'clickhouse')".format( + "CREATE DATABASE test_database_event ENGINE = MaterializedMySQL('{}:3306', 'test_database_event', 'root', 'clickhouse')".format( service_name)) # Reject one empty GTID QUERY event with 'BEGIN' and 'COMMIT' @@ -510,7 +510,7 @@ def select_without_columns(clickhouse_node, mysql_node, service_name): mysql_node.query("CREATE DATABASE db") mysql_node.query("CREATE TABLE db.t (a INT PRIMARY KEY, b INT)") clickhouse_node.query( - "CREATE DATABASE db ENGINE = MaterializeMySQL('{}:3306', 'db', 'root', 'clickhouse') SETTINGS max_flush_data_time = 100000".format(service_name)) + "CREATE DATABASE db ENGINE = MaterializedMySQL('{}:3306', 'db', 'root', 'clickhouse') SETTINGS max_flush_data_time = 100000".format(service_name)) check_query(clickhouse_node, "SHOW TABLES FROM db FORMAT TSV", "t\n") clickhouse_node.query("SYSTEM STOP MERGES db.t") clickhouse_node.query("CREATE VIEW v AS SELECT * FROM 
db.t") @@ -548,7 +548,7 @@ def select_without_columns(clickhouse_node, mysql_node, service_name): def insert_with_modify_binlog_checksum(clickhouse_node, mysql_node, service_name): mysql_node.query("CREATE DATABASE test_checksum") mysql_node.query("CREATE TABLE test_checksum.t (a INT PRIMARY KEY, b varchar(200))") - clickhouse_node.query("CREATE DATABASE test_checksum ENGINE = MaterializeMySQL('{}:3306', 'test_checksum', 'root', 'clickhouse')".format(service_name)) + clickhouse_node.query("CREATE DATABASE test_checksum ENGINE = MaterializedMySQL('{}:3306', 'test_checksum', 'root', 'clickhouse')".format(service_name)) check_query(clickhouse_node, "SHOW TABLES FROM test_checksum FORMAT TSV", "t\n") mysql_node.query("INSERT INTO test_checksum.t VALUES(1, '1111')") check_query(clickhouse_node, "SELECT * FROM test_checksum.t ORDER BY a FORMAT TSV", "1\t1111\n") @@ -565,7 +565,7 @@ def insert_with_modify_binlog_checksum(clickhouse_node, mysql_node, service_name mysql_node.query("DROP DATABASE test_checksum") -def err_sync_user_privs_with_materialize_mysql_database(clickhouse_node, mysql_node, service_name): +def err_sync_user_privs_with_materialized_mysql_database(clickhouse_node, mysql_node, service_name): clickhouse_node.query("DROP DATABASE IF EXISTS priv_err_db") mysql_node.query("DROP DATABASE IF EXISTS priv_err_db") mysql_node.query("CREATE DATABASE priv_err_db DEFAULT CHARACTER SET 'utf8'") @@ -575,7 +575,7 @@ def err_sync_user_privs_with_materialize_mysql_database(clickhouse_node, mysql_n mysql_node.result("SHOW GRANTS FOR 'test'@'%';") clickhouse_node.query( - "CREATE DATABASE priv_err_db ENGINE = MaterializeMySQL('{}:3306', 'priv_err_db', 'test', '123')".format( + "CREATE DATABASE priv_err_db ENGINE = MaterializedMySQL('{}:3306', 'priv_err_db', 'test', '123')".format( service_name)) check_query(clickhouse_node, "SELECT count() FROM priv_err_db.test_table_1 FORMAT TSV", "1\n", 30, 5) @@ -585,7 +585,7 @@ def err_sync_user_privs_with_materialize_mysql_database(clickhouse_node, mysql_n mysql_node.query("REVOKE REPLICATION SLAVE ON *.* FROM 'test'@'%'") clickhouse_node.query( - "CREATE DATABASE priv_err_db ENGINE = MaterializeMySQL('{}:3306', 'priv_err_db', 'test', '123')".format( + "CREATE DATABASE priv_err_db ENGINE = MaterializedMySQL('{}:3306', 'priv_err_db', 'test', '123')".format( service_name)) assert "priv_err_db" in clickhouse_node.query("SHOW DATABASES") assert "test_table_1" not in clickhouse_node.query("SHOW TABLES FROM priv_err_db") @@ -593,7 +593,7 @@ def err_sync_user_privs_with_materialize_mysql_database(clickhouse_node, mysql_n mysql_node.query("REVOKE REPLICATION CLIENT, RELOAD ON *.* FROM 'test'@'%'") clickhouse_node.query( - "CREATE DATABASE priv_err_db ENGINE = MaterializeMySQL('{}:3306', 'priv_err_db', 'test', '123')".format( + "CREATE DATABASE priv_err_db ENGINE = MaterializedMySQL('{}:3306', 'priv_err_db', 'test', '123')".format( service_name)) assert "priv_err_db" in clickhouse_node.query("SHOW DATABASES") assert "test_table_1" not in clickhouse_node.query("SHOW TABLES FROM priv_err_db") @@ -641,7 +641,7 @@ def network_partition_test(clickhouse_node, mysql_node, service_name): mysql_node.query("CREATE DATABASE test;") clickhouse_node.query( - "CREATE DATABASE test_database_network ENGINE = MaterializeMySQL('{}:3306', 'test_database_network', 'root', 'clickhouse')".format(service_name)) + "CREATE DATABASE test_database_network ENGINE = MaterializedMySQL('{}:3306', 'test_database_network', 'root', 'clickhouse')".format(service_name)) check_query(clickhouse_node, 
"SELECT * FROM test_database_network.test_table", '') with PartitionManager() as pm: @@ -651,7 +651,7 @@ def network_partition_test(clickhouse_node, mysql_node, service_name): with pytest.raises(QueryRuntimeException) as exception: clickhouse_node.query( - "CREATE DATABASE test ENGINE = MaterializeMySQL('{}:3306', 'test', 'root', 'clickhouse')".format(service_name)) + "CREATE DATABASE test ENGINE = MaterializedMySQL('{}:3306', 'test', 'root', 'clickhouse')".format(service_name)) assert "Can't connect to MySQL server" in str(exception.value) @@ -660,7 +660,7 @@ def network_partition_test(clickhouse_node, mysql_node, service_name): check_query(clickhouse_node, "SELECT * FROM test_database_network.test_table FORMAT TSV", '1\n') clickhouse_node.query( - "CREATE DATABASE test ENGINE = MaterializeMySQL('{}:3306', 'test', 'root', 'clickhouse')".format(service_name)) + "CREATE DATABASE test ENGINE = MaterializedMySQL('{}:3306', 'test', 'root', 'clickhouse')".format(service_name)) check_query(clickhouse_node, "SHOW TABLES FROM test_database_network FORMAT TSV", "test_table\n") mysql_node.query("CREATE TABLE test.test ( `id` int(11) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB;") @@ -686,8 +686,8 @@ def mysql_kill_sync_thread_restore_test(clickhouse_node, mysql_node, service_nam mysql_node.query("CREATE TABLE test_database_auto.test_table ( `id` int(11) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB;") mysql_node.query("INSERT INTO test_database_auto.test_table VALUES (11)") - clickhouse_node.query("CREATE DATABASE test_database ENGINE = MaterializeMySQL('{}:3306', 'test_database', 'root', 'clickhouse') SETTINGS max_wait_time_when_mysql_unavailable=-1".format(service_name)) - clickhouse_node.query("CREATE DATABASE test_database_auto ENGINE = MaterializeMySQL('{}:3306', 'test_database_auto', 'root', 'clickhouse')".format(service_name)) + clickhouse_node.query("CREATE DATABASE test_database ENGINE = MaterializedMySQL('{}:3306', 'test_database', 'root', 'clickhouse') SETTINGS max_wait_time_when_mysql_unavailable=-1".format(service_name)) + clickhouse_node.query("CREATE DATABASE test_database_auto ENGINE = MaterializedMySQL('{}:3306', 'test_database_auto', 'root', 'clickhouse')".format(service_name)) check_query(clickhouse_node, "SELECT * FROM test_database.test_table FORMAT TSV", '1\n') check_query(clickhouse_node, "SELECT * FROM test_database_auto.test_table FORMAT TSV", '11\n') @@ -701,19 +701,20 @@ def mysql_kill_sync_thread_restore_test(clickhouse_node, mysql_node, service_nam check_query(clickhouse_node, "SELECT * FROM test_database.test_table ORDER BY id FORMAT TSV", '1\n2\n') check_query(clickhouse_node, "SELECT * FROM test_database_auto.test_table ORDER BY id FORMAT TSV", '11\n22\n') - get_sync_id_query = "select id from information_schema.processlist where STATE='Master has sent all binlog to slave; waiting for more updates'" + get_sync_id_query = "SELECT id FROM information_schema.processlist WHERE state LIKE '% has sent all binlog to % waiting for more updates%';" result = mysql_node.query_and_get_data(get_sync_id_query) + assert len(result) > 0 for row in result: - row_result = {} query = "kill " + str(row[0]) + ";" mysql_node.query(query) - with pytest.raises(QueryRuntimeException) as exception: + with pytest.raises(QueryRuntimeException, match="Cannot read all data"): # https://dev.mysql.com/doc/refman/5.7/en/kill.html - # When you use KILL, a thread-specific kill flag is set for the thread. 
In most cases, it might take some time for the thread to die because the kill flag is checked only at specific intervals: - time.sleep(3) - clickhouse_node.query("SELECT * FROM test_database.test_table") - assert "Cannot read all data" in str(exception.value) + # When you use KILL, a thread-specific kill flag is set for the thread. + # In most cases, it might take some time for the thread to die because the kill flag is checked only at specific intervals. + for sleep_time in [1, 3, 5]: + time.sleep(sleep_time) + clickhouse_node.query("SELECT * FROM test_database.test_table") clickhouse_node.query("DETACH DATABASE test_database") clickhouse_node.query("ATTACH DATABASE test_database") @@ -736,7 +737,7 @@ def mysql_killed_while_insert(clickhouse_node, mysql_node, service_name): clickhouse_node.query("DROP DATABASE IF EXISTS kill_mysql_while_insert") mysql_node.query("CREATE DATABASE kill_mysql_while_insert") mysql_node.query("CREATE TABLE kill_mysql_while_insert.test ( `id` int(11) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB;") - clickhouse_node.query("CREATE DATABASE kill_mysql_while_insert ENGINE = MaterializeMySQL('{}:3306', 'kill_mysql_while_insert', 'root', 'clickhouse') SETTINGS max_wait_time_when_mysql_unavailable=-1".format(service_name)) + clickhouse_node.query("CREATE DATABASE kill_mysql_while_insert ENGINE = MaterializedMySQL('{}:3306', 'kill_mysql_while_insert', 'root', 'clickhouse') SETTINGS max_wait_time_when_mysql_unavailable=-1".format(service_name)) check_query(clickhouse_node, "SHOW TABLES FROM kill_mysql_while_insert FORMAT TSV", 'test\n') try: @@ -772,7 +773,7 @@ def clickhouse_killed_while_insert(clickhouse_node, mysql_node, service_name): mysql_node.query("DROP DATABASE IF EXISTS kill_clickhouse_while_insert") mysql_node.query("CREATE DATABASE kill_clickhouse_while_insert") mysql_node.query("CREATE TABLE kill_clickhouse_while_insert.test ( `id` int(11) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB;") - clickhouse_node.query("CREATE DATABASE kill_clickhouse_while_insert ENGINE = MaterializeMySQL('{}:3306', 'kill_clickhouse_while_insert', 'root', 'clickhouse')".format(service_name)) + clickhouse_node.query("CREATE DATABASE kill_clickhouse_while_insert ENGINE = MaterializedMySQL('{}:3306', 'kill_clickhouse_while_insert', 'root', 'clickhouse')".format(service_name)) check_query(clickhouse_node, "SHOW TABLES FROM kill_clickhouse_while_insert FORMAT TSV", 'test\n') def insert(num): @@ -801,7 +802,7 @@ def utf8mb4_test(clickhouse_node, mysql_node, service_name): mysql_node.query("CREATE DATABASE utf8mb4_test") mysql_node.query("CREATE TABLE utf8mb4_test.test (id INT(11) NOT NULL PRIMARY KEY, name VARCHAR(255)) ENGINE=InnoDB DEFAULT CHARACTER SET utf8mb4") mysql_node.query("INSERT INTO utf8mb4_test.test VALUES(1, '🦄'),(2, '\u2601')") - clickhouse_node.query("CREATE DATABASE utf8mb4_test ENGINE = MaterializeMySQL('{}:3306', 'utf8mb4_test', 'root', 'clickhouse')".format(service_name)) + clickhouse_node.query("CREATE DATABASE utf8mb4_test ENGINE = MaterializedMySQL('{}:3306', 'utf8mb4_test', 'root', 'clickhouse')".format(service_name)) check_query(clickhouse_node, "SHOW TABLES FROM utf8mb4_test FORMAT TSV", "test\n") check_query(clickhouse_node, "SELECT id, name FROM utf8mb4_test.test ORDER BY id", "1\t\U0001F984\n2\t\u2601\n") @@ -813,7 +814,7 @@ def system_parts_test(clickhouse_node, mysql_node, service_name): mysql_node.query("INSERT INTO system_parts_test.test VALUES(1),(2),(3)") def check_active_parts(num): check_query(clickhouse_node, "SELECT count() FROM system.parts WHERE 
database = 'system_parts_test' AND table = 'test' AND active = 1", "{}\n".format(num)) - clickhouse_node.query("CREATE DATABASE system_parts_test ENGINE = MaterializeMySQL('{}:3306', 'system_parts_test', 'root', 'clickhouse')".format(service_name)) + clickhouse_node.query("CREATE DATABASE system_parts_test ENGINE = MaterializedMySQL('{}:3306', 'system_parts_test', 'root', 'clickhouse')".format(service_name)) check_active_parts(1) mysql_node.query("INSERT INTO system_parts_test.test VALUES(4),(5),(6)") check_active_parts(2) @@ -828,7 +829,7 @@ def multi_table_update_test(clickhouse_node, mysql_node, service_name): mysql_node.query("CREATE TABLE multi_table_update.b (id INT(11) NOT NULL PRIMARY KEY, othervalue VARCHAR(255))") mysql_node.query("INSERT INTO multi_table_update.a VALUES(1, 'foo')") mysql_node.query("INSERT INTO multi_table_update.b VALUES(1, 'bar')") - clickhouse_node.query("CREATE DATABASE multi_table_update ENGINE = MaterializeMySQL('{}:3306', 'multi_table_update', 'root', 'clickhouse')".format(service_name)) + clickhouse_node.query("CREATE DATABASE multi_table_update ENGINE = MaterializedMySQL('{}:3306', 'multi_table_update', 'root', 'clickhouse')".format(service_name)) check_query(clickhouse_node, "SHOW TABLES FROM multi_table_update", "a\nb\n") mysql_node.query("UPDATE multi_table_update.a, multi_table_update.b SET value='baz', othervalue='quux' where a.id=b.id") @@ -840,7 +841,7 @@ def system_tables_test(clickhouse_node, mysql_node, service_name): clickhouse_node.query("DROP DATABASE IF EXISTS system_tables_test") mysql_node.query("CREATE DATABASE system_tables_test") mysql_node.query("CREATE TABLE system_tables_test.test (id int NOT NULL PRIMARY KEY) ENGINE=InnoDB") - clickhouse_node.query("CREATE DATABASE system_tables_test ENGINE = MaterializeMySQL('{}:3306', 'system_tables_test', 'root', 'clickhouse')".format(service_name)) + clickhouse_node.query("CREATE DATABASE system_tables_test ENGINE = MaterializedMySQL('{}:3306', 'system_tables_test', 'root', 'clickhouse')".format(service_name)) check_query(clickhouse_node, "SELECT partition_key, sorting_key, primary_key FROM system.tables WHERE database = 'system_tables_test' AND name = 'test'", "intDiv(id, 4294967)\tid\tid\n") def materialize_with_column_comments_test(clickhouse_node, mysql_node, service_name): @@ -848,7 +849,7 @@ def materialize_with_column_comments_test(clickhouse_node, mysql_node, service_n clickhouse_node.query("DROP DATABASE IF EXISTS materialize_with_column_comments_test") mysql_node.query("CREATE DATABASE materialize_with_column_comments_test") mysql_node.query("CREATE TABLE materialize_with_column_comments_test.test (id int NOT NULL PRIMARY KEY, value VARCHAR(255) COMMENT 'test comment') ENGINE=InnoDB") - clickhouse_node.query("CREATE DATABASE materialize_with_column_comments_test ENGINE = MaterializeMySQL('{}:3306', 'materialize_with_column_comments_test', 'root', 'clickhouse')".format(service_name)) + clickhouse_node.query("CREATE DATABASE materialize_with_column_comments_test ENGINE = MaterializedMySQL('{}:3306', 'materialize_with_column_comments_test', 'root', 'clickhouse')".format(service_name)) check_query(clickhouse_node, "DESCRIBE TABLE materialize_with_column_comments_test.test", "id\tInt32\t\t\t\t\t\nvalue\tNullable(String)\t\t\ttest comment\t\t\n_sign\tInt8\tMATERIALIZED\t1\t\t\t\n_version\tUInt64\tMATERIALIZED\t1\t\t\t\n") mysql_node.query("ALTER TABLE materialize_with_column_comments_test.test MODIFY value VARCHAR(255) COMMENT 'comment test'") check_query(clickhouse_node, "DESCRIBE TABLE 
materialize_with_column_comments_test.test", "id\tInt32\t\t\t\t\t\nvalue\tNullable(String)\t\t\tcomment test\t\t\n_sign\tInt8\tMATERIALIZED\t1\t\t\t\n_version\tUInt64\tMATERIALIZED\t1\t\t\t\n") @@ -871,7 +872,7 @@ def materialize_with_enum8_test(clickhouse_node, mysql_node, service_name): enum8_values_with_backslash += "\\\'" + str(enum8_values_count) +"\\\' = " + str(enum8_values_count) mysql_node.query("CREATE TABLE materialize_with_enum8_test.test (id int NOT NULL PRIMARY KEY, value ENUM(" + enum8_values + ")) ENGINE=InnoDB") mysql_node.query("INSERT INTO materialize_with_enum8_test.test (id, value) VALUES (1, '1'),(2, '2')") - clickhouse_node.query("CREATE DATABASE materialize_with_enum8_test ENGINE = MaterializeMySQL('{}:3306', 'materialize_with_enum8_test', 'root', 'clickhouse')".format(service_name)) + clickhouse_node.query("CREATE DATABASE materialize_with_enum8_test ENGINE = MaterializedMySQL('{}:3306', 'materialize_with_enum8_test', 'root', 'clickhouse')".format(service_name)) check_query(clickhouse_node, "SELECT value FROM materialize_with_enum8_test.test ORDER BY id", "1\n2\n") mysql_node.query("INSERT INTO materialize_with_enum8_test.test (id, value) VALUES (3, '127')") check_query(clickhouse_node, "SELECT value FROM materialize_with_enum8_test.test ORDER BY id", "1\n2\n127\n") @@ -893,7 +894,7 @@ def materialize_with_enum16_test(clickhouse_node, mysql_node, service_name): enum16_values_with_backslash += "\\\'" + str(enum16_values_count) +"\\\' = " + str(enum16_values_count) mysql_node.query("CREATE TABLE materialize_with_enum16_test.test (id int NOT NULL PRIMARY KEY, value ENUM(" + enum16_values + ")) ENGINE=InnoDB") mysql_node.query("INSERT INTO materialize_with_enum16_test.test (id, value) VALUES (1, '1'),(2, '2')") - clickhouse_node.query("CREATE DATABASE materialize_with_enum16_test ENGINE = MaterializeMySQL('{}:3306', 'materialize_with_enum16_test', 'root', 'clickhouse')".format(service_name)) + clickhouse_node.query("CREATE DATABASE materialize_with_enum16_test ENGINE = MaterializedMySQL('{}:3306', 'materialize_with_enum16_test', 'root', 'clickhouse')".format(service_name)) check_query(clickhouse_node, "SELECT value FROM materialize_with_enum16_test.test ORDER BY id", "1\n2\n") mysql_node.query("INSERT INTO materialize_with_enum16_test.test (id, value) VALUES (3, '500')") check_query(clickhouse_node, "SELECT value FROM materialize_with_enum16_test.test ORDER BY id", "1\n2\n500\n") @@ -905,7 +906,7 @@ def alter_enum8_to_enum16_test(clickhouse_node, mysql_node, service_name): mysql_node.query("DROP DATABASE IF EXISTS alter_enum8_to_enum16_test") clickhouse_node.query("DROP DATABASE IF EXISTS alter_enum8_to_enum16_test") mysql_node.query("CREATE DATABASE alter_enum8_to_enum16_test") - + enum8_values_count = 100 enum8_values = "" enum8_values_with_backslash = "" @@ -916,11 +917,11 @@ def alter_enum8_to_enum16_test(clickhouse_node, mysql_node, service_name): enum8_values_with_backslash += "\\\'" + str(enum8_values_count) +"\\\' = " + str(enum8_values_count) mysql_node.query("CREATE TABLE alter_enum8_to_enum16_test.test (id int NOT NULL PRIMARY KEY, value ENUM(" + enum8_values + ")) ENGINE=InnoDB") mysql_node.query("INSERT INTO alter_enum8_to_enum16_test.test (id, value) VALUES (1, '1'),(2, '2')") - clickhouse_node.query("CREATE DATABASE alter_enum8_to_enum16_test ENGINE = MaterializeMySQL('{}:3306', 'alter_enum8_to_enum16_test', 'root', 'clickhouse')".format(service_name)) + clickhouse_node.query("CREATE DATABASE alter_enum8_to_enum16_test ENGINE = 
MaterializedMySQL('{}:3306', 'alter_enum8_to_enum16_test', 'root', 'clickhouse')".format(service_name)) mysql_node.query("INSERT INTO alter_enum8_to_enum16_test.test (id, value) VALUES (3, '75')") check_query(clickhouse_node, "SELECT value FROM alter_enum8_to_enum16_test.test ORDER BY id", "1\n2\n75\n") check_query(clickhouse_node, "DESCRIBE TABLE alter_enum8_to_enum16_test.test", "id\tInt32\t\t\t\t\t\nvalue\tNullable(Enum8(" + enum8_values_with_backslash + "))\t\t\t\t\t\n_sign\tInt8\tMATERIALIZED\t1\t\t\t\n_version\tUInt64\tMATERIALIZED\t1\t\t\t\n") - + enum16_values_count = 600 enum16_values = "" enum16_values_with_backslash = "" @@ -933,7 +934,7 @@ def alter_enum8_to_enum16_test(clickhouse_node, mysql_node, service_name): check_query(clickhouse_node, "DESCRIBE TABLE alter_enum8_to_enum16_test.test", "id\tInt32\t\t\t\t\t\nvalue\tNullable(Enum16(" + enum16_values_with_backslash + "))\t\t\t\t\t\n_sign\tInt8\tMATERIALIZED\t1\t\t\t\n_version\tUInt64\tMATERIALIZED\t1\t\t\t\n") mysql_node.query("INSERT INTO alter_enum8_to_enum16_test.test (id, value) VALUES (4, '500')") check_query(clickhouse_node, "SELECT value FROM alter_enum8_to_enum16_test.test ORDER BY id", "1\n2\n75\n500\n") - + clickhouse_node.query("DROP DATABASE alter_enum8_to_enum16_test") mysql_node.query("DROP DATABASE alter_enum8_to_enum16_test") @@ -941,10 +942,25 @@ def move_to_prewhere_and_column_filtering(clickhouse_node, mysql_node, service_n clickhouse_node.query("DROP DATABASE IF EXISTS cond_on_key_col") mysql_node.query("DROP DATABASE IF EXISTS cond_on_key_col") mysql_node.query("CREATE DATABASE cond_on_key_col") - clickhouse_node.query("CREATE DATABASE cond_on_key_col ENGINE = MaterializeMySQL('{}:3306', 'cond_on_key_col', 'root', 'clickhouse')".format(service_name)) + clickhouse_node.query("CREATE DATABASE cond_on_key_col ENGINE = MaterializedMySQL('{}:3306', 'cond_on_key_col', 'root', 'clickhouse')".format(service_name)) mysql_node.query("create table cond_on_key_col.products (id int primary key, product_id int not null, catalog_id int not null, brand_id int not null, name text)") mysql_node.query("insert into cond_on_key_col.products (id, name, catalog_id, brand_id, product_id) values (915, 'ertyui', 5287, 15837, 0), (990, 'wer', 1053, 24390, 1), (781, 'qwerty', 1041, 1176, 2);") + mysql_node.query("create table cond_on_key_col.test (id int(11) NOT NULL AUTO_INCREMENT, a int(11) DEFAULT NULL, b int(11) DEFAULT NULL, PRIMARY KEY (id)) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8mb4;") + mysql_node.query("insert into cond_on_key_col.test values (42, 123, 1);") + mysql_node.query("CREATE TABLE cond_on_key_col.balance_change_record (id bigint(20) NOT NULL AUTO_INCREMENT, type tinyint(4) DEFAULT NULL, value decimal(10,4) DEFAULT NULL, time timestamp NULL DEFAULT NULL, " + "initiative_id varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL, passivity_id varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL, " + "person_id varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL, tenant_code varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL, " + "created_time timestamp NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间', updated_time timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, " + "value_snapshot decimal(10,4) DEFAULT NULL, PRIMARY KEY (id), KEY balance_change_record_initiative_id (person_id) USING BTREE, " + "KEY type (type) USING BTREE, KEY balance_change_record_type (time) USING BTREE, KEY initiative_id (initiative_id) USING BTREE, " + "KEY 
balance_change_record_tenant_code (passivity_id) USING BTREE, KEY tenant_code (tenant_code) USING BTREE) ENGINE=InnoDB AUTO_INCREMENT=1691049 DEFAULT CHARSET=utf8") + mysql_node.query("insert into cond_on_key_col.balance_change_record values (123, 1, 3.14, null, 'qwe', 'asd', 'zxc', 'rty', null, null, 2.7);") + mysql_node.query("CREATE TABLE cond_on_key_col.test1 (id int(11) NOT NULL AUTO_INCREMENT, c1 varchar(32) NOT NULL, c2 varchar(32), PRIMARY KEY (id)) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4") + mysql_node.query("insert into cond_on_key_col.test1(c1,c2) values ('a','b'), ('c', null);") check_query(clickhouse_node, "SELECT DISTINCT P.id, P.name, P.catalog_id FROM cond_on_key_col.products P WHERE P.name ILIKE '%e%' and P.catalog_id=5287", '915\tertyui\t5287\n') + check_query(clickhouse_node, "select count(a) from cond_on_key_col.test where b = 1;", "1\n") + check_query(clickhouse_node, "select id from cond_on_key_col.balance_change_record where type=1;", "123\n") + check_query(clickhouse_node, "select count(c1) from cond_on_key_col.test1 where c2='b';", "1\n") clickhouse_node.query("DROP DATABASE cond_on_key_col") mysql_node.query("DROP DATABASE cond_on_key_col") @@ -956,7 +972,7 @@ def mysql_settings_test(clickhouse_node, mysql_node, service_name): mysql_node.query("INSERT INTO test_database.a VALUES(1, 'foo')") mysql_node.query("INSERT INTO test_database.a VALUES(2, 'bar')") - clickhouse_node.query("CREATE DATABASE test_database ENGINE = MaterializeMySQL('{}:3306', 'test_database', 'root', 'clickhouse')".format(service_name)) + clickhouse_node.query("CREATE DATABASE test_database ENGINE = MaterializedMySQL('{}:3306', 'test_database', 'root', 'clickhouse')".format(service_name)) check_query(clickhouse_node, "SELECT COUNT() FROM test_database.a FORMAT TSV", "2\n") assert clickhouse_node.query("SELECT COUNT(DISTINCT blockNumber()) FROM test_database.a FORMAT TSV") == "2\n" diff --git a/tests/integration/test_materialize_mysql_database/test.py b/tests/integration/test_materialized_mysql_database/test.py similarity index 84% rename from tests/integration/test_materialize_mysql_database/test.py rename to tests/integration/test_materialized_mysql_database/test.py index 252cf551d2d..18cb5b3b87c 100644 --- a/tests/integration/test_materialize_mysql_database/test.py +++ b/tests/integration/test_materialized_mysql_database/test.py @@ -94,40 +94,40 @@ def started_mysql_8_0(): @pytest.mark.parametrize(('clickhouse_node'), [pytest.param(node_db_ordinary, id="ordinary"), pytest.param(node_db_atomic, id="atomic")]) def test_materialize_database_dml_with_mysql_5_7(started_cluster, started_mysql_5_7, clickhouse_node): - materialize_with_ddl.dml_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql57") - materialize_with_ddl.materialize_mysql_database_with_views(clickhouse_node, started_mysql_5_7, "mysql57") - materialize_with_ddl.materialize_mysql_database_with_datetime_and_decimal(clickhouse_node, started_mysql_5_7, "mysql57") + materialize_with_ddl.dml_with_materialized_mysql_database(clickhouse_node, started_mysql_5_7, "mysql57") + materialize_with_ddl.materialized_mysql_database_with_views(clickhouse_node, started_mysql_5_7, "mysql57") + materialize_with_ddl.materialized_mysql_database_with_datetime_and_decimal(clickhouse_node, started_mysql_5_7, "mysql57") materialize_with_ddl.move_to_prewhere_and_column_filtering(clickhouse_node, started_mysql_5_7, "mysql57") @pytest.mark.parametrize(('clickhouse_node'), [pytest.param(node_db_ordinary, id="ordinary"), 
pytest.param(node_db_atomic, id="atomic")]) def test_materialize_database_dml_with_mysql_8_0(started_cluster, started_mysql_8_0, clickhouse_node): - materialize_with_ddl.dml_with_materialize_mysql_database(clickhouse_node, started_mysql_8_0, "mysql80") - materialize_with_ddl.materialize_mysql_database_with_views(clickhouse_node, started_mysql_8_0, "mysql80") - materialize_with_ddl.materialize_mysql_database_with_datetime_and_decimal(clickhouse_node, started_mysql_8_0, "mysql80") + materialize_with_ddl.dml_with_materialized_mysql_database(clickhouse_node, started_mysql_8_0, "mysql80") + materialize_with_ddl.materialized_mysql_database_with_views(clickhouse_node, started_mysql_8_0, "mysql80") + materialize_with_ddl.materialized_mysql_database_with_datetime_and_decimal(clickhouse_node, started_mysql_8_0, "mysql80") materialize_with_ddl.move_to_prewhere_and_column_filtering(clickhouse_node, started_mysql_8_0, "mysql80") @pytest.mark.parametrize(('clickhouse_node'), [pytest.param(node_db_ordinary, id="ordinary"), pytest.param(node_db_atomic, id="atomic")]) def test_materialize_database_ddl_with_mysql_5_7(started_cluster, started_mysql_5_7, clickhouse_node): - materialize_with_ddl.drop_table_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql57") - materialize_with_ddl.create_table_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql57") - materialize_with_ddl.rename_table_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql57") - materialize_with_ddl.alter_add_column_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql57") - materialize_with_ddl.alter_drop_column_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql57") + materialize_with_ddl.drop_table_with_materialized_mysql_database(clickhouse_node, started_mysql_5_7, "mysql57") + materialize_with_ddl.create_table_with_materialized_mysql_database(clickhouse_node, started_mysql_5_7, "mysql57") + materialize_with_ddl.rename_table_with_materialized_mysql_database(clickhouse_node, started_mysql_5_7, "mysql57") + materialize_with_ddl.alter_add_column_with_materialized_mysql_database(clickhouse_node, started_mysql_5_7, "mysql57") + materialize_with_ddl.alter_drop_column_with_materialized_mysql_database(clickhouse_node, started_mysql_5_7, "mysql57") # mysql 5.7 cannot support alter rename column - # materialize_with_ddl.alter_rename_column_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql57") - materialize_with_ddl.alter_rename_table_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql57") - materialize_with_ddl.alter_modify_column_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql57") + # materialize_with_ddl.alter_rename_column_with_materialized_mysql_database(clickhouse_node, started_mysql_5_7, "mysql57") + materialize_with_ddl.alter_rename_table_with_materialized_mysql_database(clickhouse_node, started_mysql_5_7, "mysql57") + materialize_with_ddl.alter_modify_column_with_materialized_mysql_database(clickhouse_node, started_mysql_5_7, "mysql57") @pytest.mark.parametrize(('clickhouse_node'), [pytest.param(node_db_ordinary, id="ordinary"), pytest.param(node_db_atomic, id="atomic")]) def test_materialize_database_ddl_with_mysql_8_0(started_cluster, started_mysql_8_0, clickhouse_node): - materialize_with_ddl.drop_table_with_materialize_mysql_database(clickhouse_node, started_mysql_8_0, "mysql80") - 
materialize_with_ddl.create_table_with_materialize_mysql_database(clickhouse_node, started_mysql_8_0, "mysql80") - materialize_with_ddl.rename_table_with_materialize_mysql_database(clickhouse_node, started_mysql_8_0, "mysql80") - materialize_with_ddl.alter_add_column_with_materialize_mysql_database(clickhouse_node, started_mysql_8_0, "mysql80") - materialize_with_ddl.alter_drop_column_with_materialize_mysql_database(clickhouse_node, started_mysql_8_0, "mysql80") - materialize_with_ddl.alter_rename_table_with_materialize_mysql_database(clickhouse_node, started_mysql_8_0, "mysql80") - materialize_with_ddl.alter_rename_column_with_materialize_mysql_database(clickhouse_node, started_mysql_8_0, "mysql80") - materialize_with_ddl.alter_modify_column_with_materialize_mysql_database(clickhouse_node, started_mysql_8_0, "mysql80") + materialize_with_ddl.drop_table_with_materialized_mysql_database(clickhouse_node, started_mysql_8_0, "mysql80") + materialize_with_ddl.create_table_with_materialized_mysql_database(clickhouse_node, started_mysql_8_0, "mysql80") + materialize_with_ddl.rename_table_with_materialized_mysql_database(clickhouse_node, started_mysql_8_0, "mysql80") + materialize_with_ddl.alter_add_column_with_materialized_mysql_database(clickhouse_node, started_mysql_8_0, "mysql80") + materialize_with_ddl.alter_drop_column_with_materialized_mysql_database(clickhouse_node, started_mysql_8_0, "mysql80") + materialize_with_ddl.alter_rename_table_with_materialized_mysql_database(clickhouse_node, started_mysql_8_0, "mysql80") + materialize_with_ddl.alter_rename_column_with_materialized_mysql_database(clickhouse_node, started_mysql_8_0, "mysql80") + materialize_with_ddl.alter_modify_column_with_materialized_mysql_database(clickhouse_node, started_mysql_8_0, "mysql80") @pytest.mark.parametrize(('clickhouse_node'), [pytest.param(node_db_ordinary, id="ordinary"), pytest.param(node_db_atomic, id="atomic")]) def test_materialize_database_ddl_with_empty_transaction_5_7(started_cluster, started_mysql_5_7, clickhouse_node): @@ -160,12 +160,12 @@ def test_insert_with_modify_binlog_checksum_8_0(started_cluster, started_mysql_8 @pytest.mark.parametrize(('clickhouse_node'), [pytest.param(node_db_ordinary, id="ordinary"), pytest.param(node_db_atomic, id="atomic")]) def test_materialize_database_err_sync_user_privs_5_7(started_cluster, started_mysql_5_7, clickhouse_node): - materialize_with_ddl.err_sync_user_privs_with_materialize_mysql_database(clickhouse_node, started_mysql_5_7, "mysql57") + materialize_with_ddl.err_sync_user_privs_with_materialized_mysql_database(clickhouse_node, started_mysql_5_7, "mysql57") @pytest.mark.parametrize(('clickhouse_node'), [pytest.param(node_db_ordinary, id="ordinary"), pytest.param(node_db_atomic, id="atomic")]) def test_materialize_database_err_sync_user_privs_8_0(started_cluster, started_mysql_8_0, clickhouse_node): - materialize_with_ddl.err_sync_user_privs_with_materialize_mysql_database(clickhouse_node, started_mysql_8_0, "mysql80") + materialize_with_ddl.err_sync_user_privs_with_materialized_mysql_database(clickhouse_node, started_mysql_8_0, "mysql80") @pytest.mark.parametrize(('clickhouse_node'), [pytest.param(node_db_ordinary, id="ordinary"), pytest.param(node_db_atomic, id="atomic")]) def test_network_partition_5_7(started_cluster, started_mysql_5_7, clickhouse_node): diff --git a/tests/integration/test_max_http_connections_for_replication/test.py b/tests/integration/test_max_http_connections_for_replication/test.py index 3921cbfd1ae..67b3c5b53aa 100644 --- 
a/tests/integration/test_max_http_connections_for_replication/test.py +++ b/tests/integration/test_max_http_connections_for_replication/test.py @@ -11,7 +11,7 @@ def _fill_nodes(nodes, shard, connections_count): node.query( ''' CREATE DATABASE test; - + CREATE TABLE test_table(date Date, id UInt32, dummy UInt32) ENGINE = ReplicatedMergeTree('/clickhouse/tables/test{shard}/replicated', '{replica}') PARTITION BY date @@ -114,5 +114,5 @@ def test_multiple_endpoint_connections_count(start_big_cluster): assert_eq_with_retry(node4, "select count() from test_table", "100") assert_eq_with_retry(node5, "select count() from test_table", "100") - # two per each host - assert node5.query("SELECT value FROM system.events where event='CreatedHTTPConnections'") == '4\n' + # Two per each host or sometimes less, if fetches are not performed in parallel. But not more. + assert node5.query("SELECT value FROM system.events where event='CreatedHTTPConnections'") <= '4\n' diff --git a/tests/integration/test_mysql_protocol/test.py b/tests/integration/test_mysql_protocol/test.py index 6533a6a23f9..070aa9967fc 100644 --- a/tests/integration/test_mysql_protocol/test.py +++ b/tests/integration/test_mysql_protocol/test.py @@ -95,8 +95,11 @@ def test_mysql_client(started_cluster): '''.format(host=started_cluster.get_instance_ip('node'), port=server_port), demux=True) assert stdout.decode() == 'count()\n1\n' - assert stderr[0:182].decode() == "mysql: [Warning] Using a password on the command line interface can be insecure.\n" \ - "ERROR 81 (00000) at line 1: Code: 81, e.displayText() = DB::Exception: Database system2 doesn't exist" + expected_msg = '\n'.join([ + "mysql: [Warning] Using a password on the command line interface can be insecure.", + "ERROR 81 (00000) at line 1: Code: 81. DB::Exception: Database system2 doesn't exist", + ]) + assert stderr[:len(expected_msg)].decode() == expected_msg code, (stdout, stderr) = started_cluster.mysql_client_container.exec_run(''' mysql --protocol tcp -h {host} -P {port} default -u default --password=123 @@ -122,8 +125,11 @@ def test_mysql_client_exception(started_cluster): -e "CREATE TABLE default.t1_remote_mysql AS mysql('127.0.0.1:10086','default','t1_local','default','');" '''.format(host=started_cluster.get_instance_ip('node'), port=server_port), demux=True) - assert stderr[0:258].decode() == "mysql: [Warning] Using a password on the command line interface can be insecure.\n" \ - "ERROR 1000 (00000) at line 1: Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = Exception: Connections to all replicas failed: default@127.0.0.1:10086 as user default" + expected_msg = '\n'.join([ + "mysql: [Warning] Using a password on the command line interface can be insecure.", + "ERROR 1000 (00000) at line 1: Poco::Exception. Code: 1000, e.code() = 0, Exception: Connections to all replicas failed: default@127.0.0.1:10086 as user default", + ]) + assert stderr[:len(expected_msg)].decode() == expected_msg def test_mysql_affected_rows(started_cluster): @@ -328,8 +334,7 @@ def test_python_client(started_cluster): with pytest.raises(pymysql.InternalError) as exc_info: client.query('select name from tables') - assert exc_info.value.args[1][ - 0:77] == "Code: 60, e.displayText() = DB::Exception: Table default.tables doesn't exist" + assert exc_info.value.args[1].startswith("Code: 60. 
DB::Exception: Table default.tables doesn't exist"), exc_info.value.args[1] cursor = client.cursor(pymysql.cursors.DictCursor) cursor.execute("select 1 as a, 'тест' as b") @@ -348,8 +353,7 @@ def test_python_client(started_cluster): with pytest.raises(pymysql.InternalError) as exc_info: client.query('select name from tables') - assert exc_info.value.args[1][ - 0:77] == "Code: 60, e.displayText() = DB::Exception: Table default.tables doesn't exist" + assert exc_info.value.args[1].startswith("Code: 60. DB::Exception: Table default.tables doesn't exist"), exc_info.value.args[1] cursor = client.cursor(pymysql.cursors.DictCursor) cursor.execute("select 1 as a, 'тест' as b") @@ -360,7 +364,7 @@ def test_python_client(started_cluster): with pytest.raises(pymysql.InternalError) as exc_info: client.select_db('system2') - assert exc_info.value.args[1][0:73] == "Code: 81, e.displayText() = DB::Exception: Database system2 doesn't exist" + assert exc_info.value.args[1].startswith("Code: 81. DB::Exception: Database system2 doesn't exist"), exc_info.value.args[1] cursor = client.cursor(pymysql.cursors.DictCursor) cursor.execute('CREATE DATABASE x') diff --git a/tests/integration/test_nlp/__init__.py b/tests/integration/test_nlp/__init__.py new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/integration/test_nlp/configs/dicts_config.xml b/tests/integration/test_nlp/configs/dicts_config.xml new file mode 100644 index 00000000000..435507ce1d8 --- /dev/null +++ b/tests/integration/test_nlp/configs/dicts_config.xml @@ -0,0 +1,22 @@ + + + + + en + plain + /etc/clickhouse-server/dictionaries/ext-en.txt + + + ru + plain + /etc/clickhouse-server/dictionaries/ext-ru.txt + + + + + + en + /etc/clickhouse-server/dictionaries/lem-en.bin + + + diff --git a/tests/integration/test_nlp/dictionaries/ext-en.txt b/tests/integration/test_nlp/dictionaries/ext-en.txt new file mode 100644 index 00000000000..beb508e437d --- /dev/null +++ b/tests/integration/test_nlp/dictionaries/ext-en.txt @@ -0,0 +1,4 @@ +important big critical crucial essential +happy cheerful delighted ecstatic +however nonetheless but yet +quiz query check exam diff --git a/tests/integration/test_nlp/dictionaries/ext-ru.txt b/tests/integration/test_nlp/dictionaries/ext-ru.txt new file mode 100644 index 00000000000..5466354b264 --- /dev/null +++ b/tests/integration/test_nlp/dictionaries/ext-ru.txt @@ -0,0 +1,4 @@ +важный большой высокий хороший главный +веселый счастливый живой яркий смешной +хотя однако но правда +экзамен испытание проверка \ No newline at end of file diff --git a/tests/integration/test_nlp/dictionaries/lem-en.bin b/tests/integration/test_nlp/dictionaries/lem-en.bin new file mode 100644 index 00000000000..8981bc1ead0 Binary files /dev/null and b/tests/integration/test_nlp/dictionaries/lem-en.bin differ diff --git a/tests/integration/test_nlp/test.py b/tests/integration/test_nlp/test.py new file mode 100644 index 00000000000..24935153608 --- /dev/null +++ b/tests/integration/test_nlp/test.py @@ -0,0 +1,47 @@ +import os +import sys + +import pytest + +sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) +SCRIPT_DIR = os.path.dirname(os.path.realpath(__file__)) + +from helpers.cluster import ClickHouseCluster + + +cluster = ClickHouseCluster(__file__) +instance = cluster.add_instance('instance', main_configs=['configs/dicts_config.xml']) + +def copy_file_to_container(local_path, dist_path, container_id): + os.system("docker cp {local} {cont_id}:{dist}".format(local=local_path,
cont_id=container_id, dist=dist_path)) + +@pytest.fixture(scope="module") +def start_cluster(): + try: + cluster.start() + + copy_file_to_container(os.path.join(SCRIPT_DIR, 'dictionaries/.'), '/etc/clickhouse-server/dictionaries', instance.docker_id) + + yield cluster + finally: + cluster.shutdown() + +def test_lemmatize(start_cluster): + assert instance.query("SELECT lemmatize('en', 'wolves')", settings={"allow_experimental_nlp_functions": 1}) == "wolf\n" + assert instance.query("SELECT lemmatize('en', 'dogs')", settings={"allow_experimental_nlp_functions": 1}) == "dog\n" + assert instance.query("SELECT lemmatize('en', 'looking')", settings={"allow_experimental_nlp_functions": 1}) == "look\n" + assert instance.query("SELECT lemmatize('en', 'took')", settings={"allow_experimental_nlp_functions": 1}) == "take\n" + assert instance.query("SELECT lemmatize('en', 'imported')", settings={"allow_experimental_nlp_functions": 1}) == "import\n" + assert instance.query("SELECT lemmatize('en', 'tokenized')", settings={"allow_experimental_nlp_functions": 1}) == "tokenize\n" + assert instance.query("SELECT lemmatize('en', 'flown')", settings={"allow_experimental_nlp_functions": 1}) == "fly\n" + +def test_synonyms_extensions(start_cluster): + assert instance.query("SELECT synonyms('en', 'crucial')", settings={"allow_experimental_nlp_functions": 1}) == "['important','big','critical','crucial','essential']\n" + assert instance.query("SELECT synonyms('en', 'cheerful')", settings={"allow_experimental_nlp_functions": 1}) == "['happy','cheerful','delighted','ecstatic']\n" + assert instance.query("SELECT synonyms('en', 'yet')", settings={"allow_experimental_nlp_functions": 1}) == "['however','nonetheless','but','yet']\n" + assert instance.query("SELECT synonyms('en', 'quiz')", settings={"allow_experimental_nlp_functions": 1}) == "['quiz','query','check','exam']\n" + + assert instance.query("SELECT synonyms('ru', 'главный')", settings={"allow_experimental_nlp_functions": 1}) == "['важный','большой','высокий','хороший','главный']\n" + assert instance.query("SELECT synonyms('ru', 'веселый')", settings={"allow_experimental_nlp_functions": 1}) == "['веселый','счастливый','живой','яркий','смешной']\n" + assert instance.query("SELECT synonyms('ru', 'правда')", settings={"allow_experimental_nlp_functions": 1}) == "['хотя','однако','но','правда']\n" + assert instance.query("SELECT synonyms('ru', 'экзамен')", settings={"allow_experimental_nlp_functions": 1}) == "['экзамен','испытание','проверка']\n" diff --git a/tests/integration/test_partition/test.py b/tests/integration/test_partition/test.py index baac5367c00..b43c85a4d48 100644 --- a/tests/integration/test_partition/test.py +++ b/tests/integration/test_partition/test.py @@ -236,3 +236,82 @@ def test_drop_detached_parts(drop_detached_parts_table): q("ALTER TABLE test.drop_detached DROP DETACHED PARTITION 1", settings=s) detached = q("SElECT name FROM system.detached_parts WHERE table='drop_detached' AND database='test' ORDER BY name") assert TSV(detached) == TSV('0_3_3_0\nattaching_0_6_6_0\ndeleting_0_7_7_0') + +def test_system_detached_parts(drop_detached_parts_table): + q("create table sdp_0 (n int, x int) engine=MergeTree order by n") + q("create table sdp_1 (n int, x int) engine=MergeTree order by n partition by x") + q("create table sdp_2 (n int, x String) engine=MergeTree order by n partition by x") + q("create table sdp_3 (n int, x Enum('broken' = 0, 'all' = 1)) engine=MergeTree order by n partition by x") + + for i in range(0, 4): + q("system stop merges
sdp_{}".format(i)) + q("insert into sdp_{} values (0, 0)".format(i)) + q("insert into sdp_{} values (1, 1)".format(i)) + for p in q("select distinct partition_id from system.parts where table='sdp_{}'".format(i))[:-1].split('\n'): + q("alter table sdp_{} detach partition id '{}'".format(i, p)) + + path_to_detached = path_to_data + 'data/default/sdp_{}/detached/{}' + for i in range(0, 4): + instance.exec_in_container(['mkdir', path_to_detached.format(i, 'attaching_0_6_6_0')]) + instance.exec_in_container(['mkdir', path_to_detached.format(i, 'deleting_0_7_7_0')]) + instance.exec_in_container(['mkdir', path_to_detached.format(i, 'any_other_name')]) + instance.exec_in_container(['mkdir', path_to_detached.format(i, 'prefix_1_2_2_0_0')]) + + instance.exec_in_container(['mkdir', path_to_detached.format(i, 'ignored_202107_714380_714380_0')]) + instance.exec_in_container(['mkdir', path_to_detached.format(i, 'broken_202107_714380_714380_123')]) + instance.exec_in_container(['mkdir', path_to_detached.format(i, 'clone_all_714380_714380_42')]) + instance.exec_in_container(['mkdir', path_to_detached.format(i, 'clone_all_714380_714380_42_123')]) + instance.exec_in_container(['mkdir', path_to_detached.format(i, 'broken-on-start_6711e2b2592d86d18fc0f260cf33ef2b_714380_714380_42_123')]) + + res = q("select * from system.detached_parts where table like 'sdp_%' order by table, name") + assert res == \ + "default\tsdp_0\tall\tall_1_1_0\tdefault\t\t1\t1\t0\n" \ + "default\tsdp_0\tall\tall_2_2_0\tdefault\t\t2\t2\t0\n" \ + "default\tsdp_0\t\\N\tany_other_name\tdefault\t\\N\t\\N\t\\N\t\\N\n" \ + "default\tsdp_0\t0\tattaching_0_6_6_0\tdefault\tattaching\t6\t6\t0\n" \ + "default\tsdp_0\t6711e2b2592d86d18fc0f260cf33ef2b\tbroken-on-start_6711e2b2592d86d18fc0f260cf33ef2b_714380_714380_42_123\tdefault\tbroken-on-start\t714380\t714380\t42\n" \ + "default\tsdp_0\t202107\tbroken_202107_714380_714380_123\tdefault\tbroken\t714380\t714380\t123\n" \ + "default\tsdp_0\tall\tclone_all_714380_714380_42\tdefault\tclone\t714380\t714380\t42\n" \ + "default\tsdp_0\tall\tclone_all_714380_714380_42_123\tdefault\tclone\t714380\t714380\t42\n" \ + "default\tsdp_0\t0\tdeleting_0_7_7_0\tdefault\tdeleting\t7\t7\t0\n" \ + "default\tsdp_0\t202107\tignored_202107_714380_714380_0\tdefault\tignored\t714380\t714380\t0\n" \ + "default\tsdp_0\t1\tprefix_1_2_2_0_0\tdefault\tprefix\t2\t2\t0\n" \ + "default\tsdp_1\t0\t0_1_1_0\tdefault\t\t1\t1\t0\n" \ + "default\tsdp_1\t1\t1_2_2_0\tdefault\t\t2\t2\t0\n" \ + "default\tsdp_1\t\\N\tany_other_name\tdefault\t\\N\t\\N\t\\N\t\\N\n" \ + "default\tsdp_1\t0\tattaching_0_6_6_0\tdefault\tattaching\t6\t6\t0\n" \ + "default\tsdp_1\t6711e2b2592d86d18fc0f260cf33ef2b\tbroken-on-start_6711e2b2592d86d18fc0f260cf33ef2b_714380_714380_42_123\tdefault\tbroken-on-start\t714380\t714380\t42\n" \ + "default\tsdp_1\t202107\tbroken_202107_714380_714380_123\tdefault\tbroken\t714380\t714380\t123\n" \ + "default\tsdp_1\tall\tclone_all_714380_714380_42\tdefault\tclone\t714380\t714380\t42\n" \ + "default\tsdp_1\tall\tclone_all_714380_714380_42_123\tdefault\tclone\t714380\t714380\t42\n" \ + "default\tsdp_1\t0\tdeleting_0_7_7_0\tdefault\tdeleting\t7\t7\t0\n" \ + "default\tsdp_1\t202107\tignored_202107_714380_714380_0\tdefault\tignored\t714380\t714380\t0\n" \ + "default\tsdp_1\t1\tprefix_1_2_2_0_0\tdefault\tprefix\t2\t2\t0\n" \ + "default\tsdp_2\t58ed7160db50ea45e1c6aa694c8cbfd1\t58ed7160db50ea45e1c6aa694c8cbfd1_1_1_0\tdefault\t\t1\t1\t0\n" \ + 
"default\tsdp_2\t6711e2b2592d86d18fc0f260cf33ef2b\t6711e2b2592d86d18fc0f260cf33ef2b_2_2_0\tdefault\t\t2\t2\t0\n" \ + "default\tsdp_2\t\\N\tany_other_name\tdefault\t\\N\t\\N\t\\N\t\\N\n" \ + "default\tsdp_2\t0\tattaching_0_6_6_0\tdefault\tattaching\t6\t6\t0\n" \ + "default\tsdp_2\t6711e2b2592d86d18fc0f260cf33ef2b\tbroken-on-start_6711e2b2592d86d18fc0f260cf33ef2b_714380_714380_42_123\tdefault\tbroken-on-start\t714380\t714380\t42\n" \ + "default\tsdp_2\t202107\tbroken_202107_714380_714380_123\tdefault\tbroken\t714380\t714380\t123\n" \ + "default\tsdp_2\tall\tclone_all_714380_714380_42\tdefault\tclone\t714380\t714380\t42\n" \ + "default\tsdp_2\tall\tclone_all_714380_714380_42_123\tdefault\tclone\t714380\t714380\t42\n" \ + "default\tsdp_2\t0\tdeleting_0_7_7_0\tdefault\tdeleting\t7\t7\t0\n" \ + "default\tsdp_2\t202107\tignored_202107_714380_714380_0\tdefault\tignored\t714380\t714380\t0\n" \ + "default\tsdp_2\t1\tprefix_1_2_2_0_0\tdefault\tprefix\t2\t2\t0\n" \ + "default\tsdp_3\t0\t0_1_1_0\tdefault\t\t1\t1\t0\n" \ + "default\tsdp_3\t1\t1_2_2_0\tdefault\t\t2\t2\t0\n" \ + "default\tsdp_3\t\\N\tany_other_name\tdefault\t\\N\t\\N\t\\N\t\\N\n" \ + "default\tsdp_3\t0\tattaching_0_6_6_0\tdefault\tattaching\t6\t6\t0\n" \ + "default\tsdp_3\t6711e2b2592d86d18fc0f260cf33ef2b\tbroken-on-start_6711e2b2592d86d18fc0f260cf33ef2b_714380_714380_42_123\tdefault\tbroken-on-start\t714380\t714380\t42\n" \ + "default\tsdp_3\t202107\tbroken_202107_714380_714380_123\tdefault\tbroken\t714380\t714380\t123\n" \ + "default\tsdp_3\tall\tclone_all_714380_714380_42\tdefault\tclone\t714380\t714380\t42\n" \ + "default\tsdp_3\tall\tclone_all_714380_714380_42_123\tdefault\tclone\t714380\t714380\t42\n" \ + "default\tsdp_3\t0\tdeleting_0_7_7_0\tdefault\tdeleting\t7\t7\t0\n" \ + "default\tsdp_3\t202107\tignored_202107_714380_714380_0\tdefault\tignored\t714380\t714380\t0\n" \ + "default\tsdp_3\t1\tprefix_1_2_2_0_0\tdefault\tprefix\t2\t2\t0\n" + + for i in range(0, 4): + for p in q("select distinct partition_id from system.detached_parts where table='sdp_{}' and partition_id is not null".format(i))[:-1].split('\n'): + q("alter table sdp_{} attach partition id '{}'".format(i, p)) + + assert q("select n, x, count() from merge('default', 'sdp_') group by n, x") == "0\t0\t4\n1\t1\t4\n" diff --git a/tests/integration/test_postgresql_database_engine/test.py b/tests/integration/test_postgresql_database_engine/test.py index e89f1109c3a..8768c4037a1 100644 --- a/tests/integration/test_postgresql_database_engine/test.py +++ b/tests/integration/test_postgresql_database_engine/test.py @@ -151,7 +151,7 @@ def test_postgresql_database_engine_table_cache(started_cluster): cursor = conn.cursor() node1.query( - "CREATE DATABASE test_database ENGINE = PostgreSQL('postgres1:5432', 'test_database', 'postgres', 'mysecretpassword', 1)") + "CREATE DATABASE test_database ENGINE = PostgreSQL('postgres1:5432', 'test_database', 'postgres', 'mysecretpassword', '', 1)") create_postgres_table(cursor, 'test_table') assert node1.query('DESCRIBE TABLE test_database.test_table').rstrip() == 'id\tInt32\t\t\t\t\t\nvalue\tNullable(Int32)' @@ -183,6 +183,31 @@ def test_postgresql_database_engine_table_cache(started_cluster): assert 'test_database' not in node1.query('SHOW DATABASES') +def test_postgresql_database_with_schema(started_cluster): + conn = get_postgres_conn(started_cluster, True) + cursor = conn.cursor() + + cursor.execute('DROP SCHEMA IF EXISTS test_schema CASCADE') + cursor.execute('DROP SCHEMA IF EXISTS "test.nice.schema" CASCADE') + + cursor.execute('CREATE 
SCHEMA test_schema') + cursor.execute('CREATE TABLE test_schema.table1 (a integer)') + cursor.execute('CREATE TABLE test_schema.table2 (a integer)') + cursor.execute('CREATE TABLE table3 (a integer)') + + node1.query( + "CREATE DATABASE test_database ENGINE = PostgreSQL('postgres1:5432', 'test_database', 'postgres', 'mysecretpassword', 'test_schema')") + + assert(node1.query('SHOW TABLES FROM test_database') == 'table1\ntable2\n') + + node1.query("INSERT INTO test_database.table1 SELECT number from numbers(10000)") + assert node1.query("SELECT count() FROM test_database.table1").rstrip() == '10000' + node1.query("DETACH TABLE test_database.table1") + node1.query("ATTACH TABLE test_database.table1") + assert node1.query("SELECT count() FROM test_database.table1").rstrip() == '10000' + node1.query("DROP DATABASE test_database") + + if __name__ == '__main__': cluster.start() input("Cluster created, press any key to destroy...") diff --git a/tests/integration/test_postgresql_replica_database_engine/test.py b/tests/integration/test_postgresql_replica_database_engine/test.py index 97fd461e640..ed26ab82bc7 100644 --- a/tests/integration/test_postgresql_replica_database_engine/test.py +++ b/tests/integration/test_postgresql_replica_database_engine/test.py @@ -236,7 +236,7 @@ def test_different_data_types(started_cluster): ( key Integer NOT NULL PRIMARY KEY, a Date[] NOT NULL, -- Date - b Timestamp[] NOT NULL, -- DateTime + b Timestamp[] NOT NULL, -- DateTime64(6) c real[][] NOT NULL, -- Float32 d double precision[][] NOT NULL, -- Float64 e decimal(5, 5)[][][] NOT NULL, -- Decimal32 @@ -253,11 +253,11 @@ def test_different_data_types(started_cluster): for i in range(10): instance.query(''' INSERT INTO postgres_database.test_data_types VALUES - ({}, -32768, -2147483648, -9223372036854775808, 1.12345, 1.1234567890, 2147483647, 9223372036854775807, '2000-05-12 12:12:12', '2000-05-12', 0.2, 0.2)'''.format(i)) + ({}, -32768, -2147483648, -9223372036854775808, 1.12345, 1.1234567890, 2147483647, 9223372036854775807, '2000-05-12 12:12:12.012345', '2000-05-12', 0.2, 0.2)'''.format(i)) check_tables_are_synchronized('test_data_types', 'id'); result = instance.query('SELECT * FROM test_database.test_data_types ORDER BY id LIMIT 1;') - assert(result == '0\t-32768\t-2147483648\t-9223372036854775808\t1.12345\t1.123456789\t2147483647\t9223372036854775807\t2000-05-12 12:12:12\t2000-05-12\t0.20000\t0.20000\n') + assert(result == '0\t-32768\t-2147483648\t-9223372036854775808\t1.12345\t1.123456789\t2147483647\t9223372036854775807\t2000-05-12 12:12:12.012345\t2000-05-12\t0.20000\t0.20000\n') for i in range(10): col = random.choice(['a', 'b', 'c']) @@ -270,7 +270,7 @@ def test_different_data_types(started_cluster): "VALUES (" "0, " "['2000-05-12', '2000-05-12'], " - "['2000-05-12 12:12:12', '2000-05-12 12:12:12'], " + "['2000-05-12 12:12:12.012345', '2000-05-12 12:12:12.012345'], " "[[1.12345], [1.12345], [1.12345]], " "[[1.1234567891], [1.1234567891], [1.1234567891]], " "[[[0.11111, 0.11111]], [[0.22222, 0.22222]], [[0.33333, 0.33333]]], " @@ -284,7 +284,7 @@ def test_different_data_types(started_cluster): expected = ( "0\t" + "['2000-05-12','2000-05-12']\t" + - "['2000-05-12 12:12:12','2000-05-12 12:12:12']\t" + + "['2000-05-12 12:12:12.012345','2000-05-12 12:12:12.012345']\t" + "[[1.12345],[1.12345],[1.12345]]\t" + "[[1.1234567891],[1.1234567891],[1.1234567891]]\t" + "[[[0.11111,0.11111]],[[0.22222,0.22222]],[[0.33333,0.33333]]]\t" @@ -622,7 +622,7 @@ def test_virtual_columns(started_cluster): instance.query("INSERT 
INTO postgres_database.postgresql_replica_0 SELECT number, number from numbers(10)") check_tables_are_synchronized('postgresql_replica_0'); - # just check that it works, no check with `expected` becuase _version is taken as LSN, which will be different each time. + # just check that it works, no check with `expected` because _version is taken as LSN, which will be different each time. result = instance.query('SELECT key, value, _sign, _version FROM test_database.postgresql_replica_0;') print(result) diff --git a/tests/integration/test_reloading_storage_configuration/test.py b/tests/integration/test_reloading_storage_configuration/test.py index edcba4e8a60..7fa37d0909c 100644 --- a/tests/integration/test_reloading_storage_configuration/test.py +++ b/tests/integration/test_reloading_storage_configuration/test.py @@ -7,8 +7,10 @@ import xml.etree.ElementTree as ET import helpers.client import helpers.cluster +from helpers.test_tools import TSV import pytest + cluster = helpers.cluster.ClickHouseCluster(__file__) node1 = cluster.add_instance('node1', @@ -76,6 +78,37 @@ def add_disk(node, name, path, separate_file=False): else: tree.write(os.path.join(node.config_d_dir, "storage_configuration.xml")) +def update_disk(node, name, path, keep_free_space_bytes, separate_file=False): + separate_configuration_path = os.path.join(node.config_d_dir, + "separate_configuration.xml") + + try: + if separate_file: + tree = ET.parse(separate_configuration_path) + else: + tree = ET.parse( + os.path.join(node.config_d_dir, "storage_configuration.xml")) + except: + tree = ET.ElementTree( + ET.fromstring('')) + + root = tree.getroot() + disk = root.find("storage_configuration").find("disks").find(name) + assert disk is not None + + new_path = disk.find("path") + assert new_path is not None + new_path.text = path + + new_keep_free_space_bytes = disk.find("keep_free_space_bytes") + assert new_keep_free_space_bytes is not None + new_keep_free_space_bytes.text = keep_free_space_bytes + + if separate_file: + tree.write(separate_configuration_path) + else: + tree.write(os.path.join(node.config_d_dir, "storage_configuration.xml")) + def add_policy(node, name, volumes): tree = ET.parse(os.path.join(node.config_d_dir, "storage_configuration.xml")) @@ -123,6 +156,36 @@ def test_add_disk(started_cluster): except: """""" +def test_update_disk(started_cluster): + try: + name = "test_update_disk" + engine = "MergeTree()" + + start_over() + node1.restart_clickhouse(kill=True) + time.sleep(2) + + node1.query(""" + CREATE TABLE {name} ( + d UInt64 + ) ENGINE = {engine} + ORDER BY d + SETTINGS storage_policy='jbods_with_external' + """.format(name=name, engine=engine)) + + assert node1.query("SELECT path, keep_free_space FROM system.disks where name = 'jbod2'") == TSV([ + ["/jbod2/", "10485760"]]) + + update_disk(node1, "jbod2", "/jbod2/", "20971520") + node1.query("SYSTEM RELOAD CONFIG") + + assert node1.query("SELECT path, keep_free_space FROM system.disks where name = 'jbod2'") == TSV([ + ["/jbod2/", "20971520"]]) + finally: + try: + node1.query("DROP TABLE IF EXISTS {}".format(name)) + except: + """""" def test_add_disk_to_separate_config(started_cluster): try: diff --git a/tests/integration/test_rename_column/test.py b/tests/integration/test_rename_column/test.py index 3a818303f40..e3e776a0791 100644 --- a/tests/integration/test_rename_column/test.py +++ b/tests/integration/test_rename_column/test.py @@ -99,8 +99,8 @@ def create_distributed_table(node, table_name): def drop_distributed_table(node, table_name): - 
node.query("DROP TABLE IF EXISTS {} ON CLUSTER test_cluster".format(table_name)) - node.query("DROP TABLE IF EXISTS {}_replicated ON CLUSTER test_cluster".format(table_name)) + node.query("DROP TABLE IF EXISTS {} ON CLUSTER test_cluster SYNC".format(table_name)) + node.query("DROP TABLE IF EXISTS {}_replicated ON CLUSTER test_cluster SYNC".format(table_name)) time.sleep(1) diff --git a/tests/integration/test_replica_is_active/__init__.py b/tests/integration/test_replica_is_active/__init__.py new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/integration/test_replica_is_active/test.py b/tests/integration/test_replica_is_active/test.py new file mode 100644 index 00000000000..14046ea7f7d --- /dev/null +++ b/tests/integration/test_replica_is_active/test.py @@ -0,0 +1,41 @@ +import pytest +from helpers.client import QueryRuntimeException +from helpers.cluster import ClickHouseCluster + +cluster = ClickHouseCluster(__file__) +node1 = cluster.add_instance('node1', with_zookeeper=True) +node2 = cluster.add_instance('node2', with_zookeeper=True) +node3 = cluster.add_instance('node3', with_zookeeper=True) + +@pytest.fixture(scope="module") +def start_cluster(): + try: + cluster.start() + + for i, node in enumerate((node1, node2, node3)): + node_name = 'node' + str(i + 1) + node.query( + ''' + CREATE TABLE test_table(date Date, id UInt32, dummy UInt32) + ENGINE = ReplicatedMergeTree('/clickhouse/tables/test_table', '{}') + PARTITION BY date ORDER BY id + '''.format(node_name) + ) + + yield cluster + + finally: + cluster.shutdown() + + +def test_replica_is_active(start_cluster): + query_result = node1.query("select replica_is_active from system.replicas where table = 'test_table'") + assert query_result == '{\'node1\':1,\'node2\':1,\'node3\':1}\n' + + node3.stop() + query_result = node1.query("select replica_is_active from system.replicas where table = 'test_table'") + assert query_result == '{\'node1\':1,\'node2\':1,\'node3\':0}\n' + + node2.stop() + query_result = node1.query("select replica_is_active from system.replicas where table = 'test_table'") + assert query_result == '{\'node1\':1,\'node2\':0,\'node3\':0}\n' diff --git a/tests/integration/test_replicated_fetches_timeouts/test.py b/tests/integration/test_replicated_fetches_timeouts/test.py index 963ec2487fd..88763265270 100644 --- a/tests/integration/test_replicated_fetches_timeouts/test.py +++ b/tests/integration/test_replicated_fetches_timeouts/test.py @@ -78,7 +78,7 @@ def test_no_stall(started_cluster): """ SELECT count() FROM system.replication_queue - WHERE last_exception LIKE '%e.displayText() = Timeout%' + WHERE last_exception LIKE '%Timeout%' AND last_exception NOT LIKE '%connect timed out%' """).strip()) diff --git a/tests/integration/test_replicated_mutations/test.py b/tests/integration/test_replicated_mutations/test.py index 12a49ec22d8..9c779011c83 100644 --- a/tests/integration/test_replicated_mutations/test.py +++ b/tests/integration/test_replicated_mutations/test.py @@ -23,22 +23,27 @@ node5 = cluster.add_instance('node5', macros={'cluster': 'test3'}, main_configs= all_nodes = [node1, node2, node3, node4, node5] +def prepare_cluster(): + for node in all_nodes: + node.query("DROP TABLE IF EXISTS test_mutations SYNC") + + for node in [node1, node2, node3, node4]: + node.query(""" + CREATE TABLE test_mutations(d Date, x UInt32, i UInt32) + ENGINE ReplicatedMergeTree('/clickhouse/{cluster}/tables/test/test_mutations', '{instance}') + ORDER BY x + PARTITION BY toYYYYMM(d) + SETTINGS 
number_of_free_entries_in_pool_to_execute_mutation=0 + """) + + node5.query( + "CREATE TABLE test_mutations(d Date, x UInt32, i UInt32) ENGINE MergeTree() ORDER BY x PARTITION BY toYYYYMM(d)") + @pytest.fixture(scope="module") def started_cluster(): try: cluster.start() - - for node in all_nodes: - node.query("DROP TABLE IF EXISTS test_mutations") - - for node in [node1, node2, node3, node4]: - node.query( - "CREATE TABLE test_mutations(d Date, x UInt32, i UInt32) ENGINE ReplicatedMergeTree('/clickhouse/{cluster}/tables/test/test_mutations', '{instance}') ORDER BY x PARTITION BY toYYYYMM(d)") - - node5.query( - "CREATE TABLE test_mutations(d Date, x UInt32, i UInt32) ENGINE MergeTree() ORDER BY x PARTITION BY toYYYYMM(d)") - yield cluster finally: @@ -160,6 +165,8 @@ def wait_for_mutations(nodes, number_of_mutations): def test_mutations(started_cluster): + prepare_cluster() + DURATION_SECONDS = 30 nodes = [node1, node2] @@ -207,6 +214,8 @@ def test_mutations(started_cluster): ] ) def test_mutations_dont_prevent_merges(started_cluster, nodes): + prepare_cluster() + for year in range(2000, 2016): rows = '' date_str = '{}-01-{}'.format(year, random.randint(1, 10)) diff --git a/tests/integration/test_rocksdb_options/__init__.py b/tests/integration/test_rocksdb_options/__init__.py new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/integration/test_rocksdb_options/configs/rocksdb.yml b/tests/integration/test_rocksdb_options/configs/rocksdb.yml new file mode 100644 index 00000000000..363ead6f318 --- /dev/null +++ b/tests/integration/test_rocksdb_options/configs/rocksdb.yml @@ -0,0 +1,13 @@ +--- +rocksdb: + options: + max_background_jobs: 8 + column_family_options: + num_levels: 2 + tables: + - table: + name: test + options: + max_open_files: 10000 + column_family_options: + max_bytes_for_level_base: 14 diff --git a/tests/integration/test_rocksdb_options/test.py b/tests/integration/test_rocksdb_options/test.py new file mode 100644 index 00000000000..286528107b8 --- /dev/null +++ b/tests/integration/test_rocksdb_options/test.py @@ -0,0 +1,85 @@ +# pylint: disable=unused-argument +# pylint: disable=redefined-outer-name +# pylint: disable=line-too-long + +import pytest + +from helpers.client import QueryRuntimeException +from helpers.cluster import ClickHouseCluster + +cluster = ClickHouseCluster(__file__) + +node = cluster.add_instance('node', main_configs=['configs/rocksdb.yml'], stay_alive=True) + + +@pytest.fixture(scope='module', autouse=True) +def start_cluster(): + try: + cluster.start() + yield cluster + finally: + cluster.shutdown() + +def test_valid_options(): + node.query(""" + CREATE TABLE test (key UInt64, value String) Engine=EmbeddedRocksDB PRIMARY KEY(key); + DROP TABLE test; + """) + +def test_invalid_options(): + node.exec_in_container(['bash', '-c', "sed -i 's/max_background_jobs/no_such_option/' /etc/clickhouse-server/config.d/rocksdb.yml"]) + node.restart_clickhouse() + with pytest.raises(QueryRuntimeException): + node.query(""" + CREATE TABLE test (key UInt64, value String) Engine=EmbeddedRocksDB PRIMARY KEY(key); + """) + node.exec_in_container(['bash', '-c', "sed -i 's/no_such_option/max_background_jobs/' /etc/clickhouse-server/config.d/rocksdb.yml"]) + node.restart_clickhouse() + +def test_table_valid_options(): + node.query(""" + CREATE TABLE test (key UInt64, value String) Engine=EmbeddedRocksDB PRIMARY KEY(key); + DROP TABLE test; + """) + +def test_table_invalid_options(): + node.exec_in_container(['bash', '-c', "sed -i 
's/max_open_files/no_such_table_option/' /etc/clickhouse-server/config.d/rocksdb.yml"]) + node.restart_clickhouse() + with pytest.raises(QueryRuntimeException): + node.query(""" + CREATE TABLE test (key UInt64, value String) Engine=EmbeddedRocksDB PRIMARY KEY(key); + """) + node.exec_in_container(['bash', '-c', "sed -i 's/no_such_table_option/max_open_files/' /etc/clickhouse-server/config.d/rocksdb.yml"]) + node.restart_clickhouse() + +def test_valid_column_family_options(): + node.query(""" + CREATE TABLE test (key UInt64, value String) Engine=EmbeddedRocksDB PRIMARY KEY(key); + DROP TABLE test; + """) + +def test_invalid_column_family_options(): + node.exec_in_container(['bash', '-c', "sed -i 's/num_levels/no_such_column_family_option/' /etc/clickhouse-server/config.d/rocksdb.yml"]) + node.restart_clickhouse() + with pytest.raises(QueryRuntimeException): + node.query(""" + CREATE TABLE test (key UInt64, value String) Engine=EmbeddedRocksDB PRIMARY KEY(key); + """) + node.exec_in_container(['bash', '-c', "sed -i 's/no_such_column_family_option/num_levels/' /etc/clickhouse-server/config.d/rocksdb.yml"]) + node.restart_clickhouse() + +def test_table_valid_column_family_options(): + node.query(""" + CREATE TABLE test (key UInt64, value String) Engine=EmbeddedRocksDB PRIMARY KEY(key); + DROP TABLE test; + """) + +def test_table_invalid_column_family_options(): + node.exec_in_container(['bash', '-c', "sed -i 's/max_bytes_for_level_base/no_such_table_column_family_option/' /etc/clickhouse-server/config.d/rocksdb.yml"]) + node.restart_clickhouse() + with pytest.raises(QueryRuntimeException): + node.query(""" + CREATE TABLE test (key UInt64, value String) Engine=EmbeddedRocksDB PRIMARY KEY(key); + """) + node.exec_in_container(['bash', '-c', "sed -i 's/no_such_table_column_family_option/max_bytes_for_level_base/' /etc/clickhouse-server/config.d/rocksdb.yml"]) + node.restart_clickhouse() diff --git a/tests/integration/test_role/test.py b/tests/integration/test_role/test.py index ccd3477ed72..1e253a93737 100644 --- a/tests/integration/test_role/test.py +++ b/tests/integration/test_role/test.py @@ -6,6 +6,13 @@ cluster = ClickHouseCluster(__file__) instance = cluster.add_instance('instance') +session_id_counter = 0 +def new_session_id(): + global session_id_counter + session_id_counter += 1 + return 'session #' + str(session_id_counter) + + @pytest.fixture(scope="module", autouse=True) def started_cluster(): try: @@ -26,7 +33,7 @@ def cleanup_after_test(): yield finally: instance.query("DROP USER IF EXISTS A, B") - instance.query("DROP ROLE IF EXISTS R1, R2") + instance.query("DROP ROLE IF EXISTS R1, R2, R3, R4") def test_create_role(): @@ -138,6 +145,41 @@ def test_revoke_requires_admin_option(): assert instance.query("SHOW GRANTS FOR B") == "" +def test_set_role(): + instance.query("CREATE USER A") + instance.query("CREATE ROLE R1, R2") + instance.query("GRANT R1, R2 TO A") + + session_id = new_session_id() + assert instance.http_query('SHOW CURRENT ROLES', user='A', params={'session_id':session_id}) == TSV([["R1", 0, 1], ["R2", 0, 1]]) + + instance.http_query('SET ROLE R1', user='A', params={'session_id':session_id}) + assert instance.http_query('SHOW CURRENT ROLES', user='A', params={'session_id':session_id}) == TSV([["R1", 0, 1]]) + + instance.http_query('SET ROLE R2', user='A', params={'session_id':session_id}) + assert instance.http_query('SHOW CURRENT ROLES', user='A', params={'session_id':session_id}) == TSV([["R2", 0, 1]]) + + instance.http_query('SET ROLE NONE', user='A', 
params={'session_id':session_id}) + assert instance.http_query('SHOW CURRENT ROLES', user='A', params={'session_id':session_id}) == TSV([]) + + instance.http_query('SET ROLE DEFAULT', user='A', params={'session_id':session_id}) + assert instance.http_query('SHOW CURRENT ROLES', user='A', params={'session_id':session_id}) == TSV([["R1", 0, 1], ["R2", 0, 1]]) + + +def test_changing_default_roles_affects_new_sessions_only(): + instance.query("CREATE USER A") + instance.query("CREATE ROLE R1, R2") + instance.query("GRANT R1, R2 TO A") + + session_id = new_session_id() + assert instance.http_query('SHOW CURRENT ROLES', user='A', params={'session_id':session_id}) == TSV([["R1", 0, 1], ["R2", 0, 1]]) + instance.query('SET DEFAULT ROLE R2 TO A') + assert instance.http_query('SHOW CURRENT ROLES', user='A', params={'session_id':session_id}) == TSV([["R1", 0, 0], ["R2", 0, 1]]) + + other_session_id = new_session_id() + assert instance.http_query('SHOW CURRENT ROLES', user='A', params={'session_id':other_session_id}) == TSV([["R2", 0, 1]]) + + def test_introspection(): instance.query("CREATE USER A") instance.query("CREATE USER B") @@ -198,3 +240,37 @@ def test_introspection(): assert instance.query("SELECT * from system.current_roles ORDER BY role_name", user='B') == TSV([["R2", 1, 1]]) assert instance.query("SELECT * from system.enabled_roles ORDER BY role_name", user='A') == TSV([["R1", 0, 1, 1]]) assert instance.query("SELECT * from system.enabled_roles ORDER BY role_name", user='B') == TSV([["R2", 1, 1, 1]]) + + +def test_function_current_roles(): + instance.query("CREATE USER A") + instance.query('CREATE ROLE R1, R2, R3, R4') + instance.query('GRANT R4 TO R2') + instance.query('GRANT R1,R2,R3 TO A') + + session_id = new_session_id() + assert instance.http_query('SELECT defaultRoles(), currentRoles(), enabledRoles()', user='A', params={'session_id':session_id}) == "['R1','R2','R3']\t['R1','R2','R3']\t['R1','R2','R3','R4']\n" + + instance.http_query('SET ROLE R1', user='A', params={'session_id':session_id}) + assert instance.http_query('SELECT defaultRoles(), currentRoles(), enabledRoles()', user='A', params={'session_id':session_id}) == "['R1','R2','R3']\t['R1']\t['R1']\n" + + instance.http_query('SET ROLE R2', user='A', params={'session_id':session_id}) + assert instance.http_query('SELECT defaultRoles(), currentRoles(), enabledRoles()', user='A', params={'session_id':session_id}) == "['R1','R2','R3']\t['R2']\t['R2','R4']\n" + + instance.http_query('SET ROLE NONE', user='A', params={'session_id':session_id}) + assert instance.http_query('SELECT defaultRoles(), currentRoles(), enabledRoles()', user='A', params={'session_id':session_id}) == "['R1','R2','R3']\t[]\t[]\n" + + instance.http_query('SET ROLE DEFAULT', user='A', params={'session_id':session_id}) + assert instance.http_query('SELECT defaultRoles(), currentRoles(), enabledRoles()', user='A', params={'session_id':session_id}) == "['R1','R2','R3']\t['R1','R2','R3']\t['R1','R2','R3','R4']\n" + + instance.query('SET DEFAULT ROLE R2 TO A') + assert instance.http_query('SELECT defaultRoles(), currentRoles(), enabledRoles()', user='A', params={'session_id':session_id}) == "['R2']\t['R1','R2','R3']\t['R1','R2','R3','R4']\n" + + instance.query('REVOKE R3 FROM A') + assert instance.http_query('SELECT defaultRoles(), currentRoles(), enabledRoles()', user='A', params={'session_id':session_id}) == "['R2']\t['R1','R2']\t['R1','R2','R4']\n" + + instance.query('REVOKE R2 FROM A') + assert instance.http_query('SELECT defaultRoles(), currentRoles(), 
enabledRoles()', user='A', params={'session_id':session_id}) == "[]\t['R1']\t['R1']\n" + + instance.query('SET DEFAULT ROLE ALL TO A') + assert instance.http_query('SELECT defaultRoles(), currentRoles(), enabledRoles()', user='A', params={'session_id':session_id}) == "['R1']\t['R1']\t['R1']\n" diff --git a/tests/integration/test_s3_with_proxy/configs/config.d/storage_conf.xml b/tests/integration/test_s3_with_proxy/configs/config.d/storage_conf.xml index ccae67c7c09..a8d36a53bd5 100644 --- a/tests/integration/test_s3_with_proxy/configs/config.d/storage_conf.xml +++ b/tests/integration/test_s3_with_proxy/configs/config.d/storage_conf.xml @@ -26,6 +26,7 @@ http://resolver:8080/hostname http 80 + 10 diff --git a/tests/integration/test_settings_profile/test.py b/tests/integration/test_settings_profile/test.py index 1945875bf53..7be0b395764 100644 --- a/tests/integration/test_settings_profile/test.py +++ b/tests/integration/test_settings_profile/test.py @@ -22,6 +22,13 @@ def system_settings_profile_elements(profile_name=None, user_name=None, role_nam return TSV(instance.query("SELECT * FROM system.settings_profile_elements" + where)) +session_id_counter = 0 +def new_session_id(): + global session_id_counter + session_id_counter += 1 + return 'session #' + str(session_id_counter) + + @pytest.fixture(scope="module", autouse=True) def setup_nodes(): try: @@ -42,7 +49,7 @@ def reset_after_test(): finally: instance.query("CREATE USER OR REPLACE robin") instance.query("DROP ROLE IF EXISTS worker") - instance.query("DROP SETTINGS PROFILE IF EXISTS xyz, alpha") + instance.query("DROP SETTINGS PROFILE IF EXISTS xyz, alpha, P1, P2, P3, P4, P5, P6") def test_smoke(): @@ -206,6 +213,54 @@ def test_show_profiles(): assert expected_access in instance.query("SHOW ACCESS") +def test_set_profile(): + instance.query("CREATE SETTINGS PROFILE P1 SETTINGS max_memory_usage=10000000001 MAX 20000000002") + + session_id = new_session_id() + instance.http_query("SET profile='P1'", user='robin', params={'session_id':session_id}) + assert instance.http_query("SELECT getSetting('max_memory_usage')", user='robin', params={'session_id':session_id}) == "10000000001\n" + + expected_error = "max_memory_usage shouldn't be greater than 20000000002" + assert expected_error in instance.http_query_and_get_error("SET max_memory_usage=20000000003", user='robin', params={'session_id':session_id}) + + +def test_changing_default_profiles_affects_new_sessions_only(): + instance.query("CREATE SETTINGS PROFILE P1 SETTINGS max_memory_usage=10000000001") + instance.query("CREATE SETTINGS PROFILE P2 SETTINGS max_memory_usage=10000000002") + instance.query("ALTER USER robin SETTINGS PROFILE P1") + + session_id = new_session_id() + assert instance.http_query("SELECT getSetting('max_memory_usage')", user='robin', params={'session_id':session_id}) == "10000000001\n" + instance.query("ALTER USER robin SETTINGS PROFILE P2") + assert instance.http_query("SELECT getSetting('max_memory_usage')", user='robin', params={'session_id':session_id}) == "10000000001\n" + + other_session_id = new_session_id() + assert instance.http_query("SELECT getSetting('max_memory_usage')", user='robin', params={'session_id':other_session_id}) == "10000000002\n" + + +def test_function_current_profiles(): + instance.query("CREATE SETTINGS PROFILE P1, P2") + instance.query("ALTER USER robin SETTINGS PROFILE P1, P2") + instance.query("CREATE SETTINGS PROFILE P3 TO robin") + instance.query("CREATE SETTINGS PROFILE P4") + instance.query("CREATE SETTINGS PROFILE P5 SETTINGS 
INHERIT P4") + instance.query("CREATE ROLE worker SETTINGS PROFILE P5") + instance.query("GRANT worker TO robin") + instance.query("CREATE SETTINGS PROFILE P6") + + session_id = new_session_id() + assert instance.http_query('SELECT defaultProfiles(), currentProfiles(), enabledProfiles()', user='robin', params={'session_id':session_id}) == "['P1','P2']\t['P1','P2']\t['default','P3','P4','P5','P1','P2']\n" + + instance.http_query("SET profile='P6'", user='robin', params={'session_id':session_id}) + assert instance.http_query('SELECT defaultProfiles(), currentProfiles(), enabledProfiles()', user='robin', params={'session_id':session_id}) == "['P1','P2']\t['P6']\t['default','P3','P4','P5','P1','P2','P6']\n" + + instance.http_query("SET profile='P5'", user='robin', params={'session_id':session_id}) + assert instance.http_query('SELECT defaultProfiles(), currentProfiles(), enabledProfiles()', user='robin', params={'session_id':session_id}) == "['P1','P2']\t['P5']\t['default','P3','P1','P2','P6','P4','P5']\n" + + instance.query("ALTER USER robin SETTINGS PROFILE P2") + assert instance.http_query('SELECT defaultProfiles(), currentProfiles(), enabledProfiles()', user='robin', params={'session_id':session_id}) == "['P2']\t['P5']\t['default','P3','P1','P2','P6','P4','P5']\n" + + def test_allow_ddl(): assert "it's necessary to have grant" in instance.query_and_get_error("CREATE TABLE tbl(a Int32) ENGINE=Log", user="robin") assert "it's necessary to have grant" in instance.query_and_get_error("GRANT CREATE ON tbl TO robin", user="robin") diff --git a/tests/integration/test_shard_level_const_function/__init__.py b/tests/integration/test_shard_level_const_function/__init__.py new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/integration/test_shard_level_const_function/configs/remote_servers.xml b/tests/integration/test_shard_level_const_function/configs/remote_servers.xml new file mode 100644 index 00000000000..68dcfcc1460 --- /dev/null +++ b/tests/integration/test_shard_level_const_function/configs/remote_servers.xml @@ -0,0 +1,16 @@ + + + + + + node1 + 9000 + + + node2 + 9000 + + + + + diff --git a/tests/integration/test_shard_level_const_function/test.py b/tests/integration/test_shard_level_const_function/test.py new file mode 100644 index 00000000000..4561f3507cb --- /dev/null +++ b/tests/integration/test_shard_level_const_function/test.py @@ -0,0 +1,30 @@ +import pytest + +from helpers.cluster import ClickHouseCluster + +cluster = ClickHouseCluster(__file__) + +node1 = cluster.add_instance( + "node1", main_configs=["configs/remote_servers.xml"], with_zookeeper=True +) +node2 = cluster.add_instance( + "node2", main_configs=["configs/remote_servers.xml"], with_zookeeper=True +) + + +@pytest.fixture(scope="module") +def start_cluster(): + try: + cluster.start() + yield cluster + finally: + cluster.shutdown() + + +def test_remote(start_cluster): + assert ( + node1.query( + """select hostName() h, tcpPort() p, count() from clusterAllReplicas("two_shards", system.one) group by h, p order by h, p""" + ) + == "node1\t9000\t1\nnode2\t9000\t1\n" + ) diff --git a/tests/integration/test_storage_hdfs/test.py b/tests/integration/test_storage_hdfs/test.py index 731644b0987..f3c83166b46 100644 --- a/tests/integration/test_storage_hdfs/test.py +++ b/tests/integration/test_storage_hdfs/test.py @@ -17,7 +17,7 @@ def started_cluster(): def test_read_write_storage(started_cluster): hdfs_api = started_cluster.hdfs_api - + node1.query("drop table if exists SimpleHDFSStorage SYNC") node1.query( "create 
table SimpleHDFSStorage (id UInt32, name String, weight Float64) ENGINE = HDFS('hdfs://hdfs1:9000/simple_storage', 'TSV')") node1.query("insert into SimpleHDFSStorage values (1, 'Mark', 72.53)") diff --git a/tests/integration/test_storage_kafka/test.py b/tests/integration/test_storage_kafka/test.py index 51b2052baae..b9fc0b2272f 100644 --- a/tests/integration/test_storage_kafka/test.py +++ b/tests/integration/test_storage_kafka/test.py @@ -66,7 +66,7 @@ def get_kafka_producer(port, serializer, retries): except Exception as e: errors += [str(e)] time.sleep(1) - + raise Exception("Connection not establised, {}".format(errors)) def producer_serializer(x): @@ -1339,7 +1339,7 @@ def test_librdkafka_compression(kafka_cluster): Example of corruption: - 2020.12.10 09:59:56.831507 [ 20 ] {} void DB::StorageKafka::threadFunc(size_t): Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected '"' before: 'foo"}': (while reading the value of key value): (at row 1) + 2020.12.10 09:59:56.831507 [ 20 ] {} void DB::StorageKafka::threadFunc(size_t): Code: 27. DB::Exception: Cannot parse input: expected '"' before: 'foo"}': (while reading the value of key value): (at row 1) To trigger this regression there should duplicated messages diff --git a/tests/integration/test_storage_mongodb/configs_secure/config.d/ssl_conf.xml b/tests/integration/test_storage_mongodb/configs_secure/config.d/ssl_conf.xml new file mode 100644 index 00000000000..e14aac81e17 --- /dev/null +++ b/tests/integration/test_storage_mongodb/configs_secure/config.d/ssl_conf.xml @@ -0,0 +1,8 @@ + + + + + none + + + diff --git a/tests/integration/test_storage_mongodb/test.py b/tests/integration/test_storage_mongodb/test.py index 75af909faec..415f1c1cb33 100644 --- a/tests/integration/test_storage_mongodb/test.py +++ b/tests/integration/test_storage_mongodb/test.py @@ -5,24 +5,29 @@ from helpers.client import QueryRuntimeException from helpers.cluster import ClickHouseCluster -cluster = ClickHouseCluster(__file__) -node = cluster.add_instance('node', with_mongo=True) - @pytest.fixture(scope="module") -def started_cluster(): +def started_cluster(request): try: + cluster = ClickHouseCluster(__file__) + node = cluster.add_instance('node', + main_configs=["configs_secure/config.d/ssl_conf.xml"], + with_mongo=True, + with_mongo_secure=request.param) cluster.start() yield cluster finally: cluster.shutdown() -def get_mongo_connection(started_cluster): +def get_mongo_connection(started_cluster, secure=False): connection_str = 'mongodb://root:clickhouse@localhost:{}'.format(started_cluster.mongo_port) + if secure: + connection_str += '/?tls=true&tlsAllowInvalidCertificates=true' return pymongo.MongoClient(connection_str) +@pytest.mark.parametrize('started_cluster', [False], indirect=['started_cluster']) def test_simple_select(started_cluster): mongo_connection = get_mongo_connection(started_cluster) db = mongo_connection['test'] @@ -33,6 +38,7 @@ def test_simple_select(started_cluster): data.append({'key': i, 'data': hex(i * i)}) simple_mongo_table.insert_many(data) + node = started_cluster.instances['node'] node.query( "CREATE TABLE simple_mongo_table(key UInt64, data String) ENGINE = MongoDB('mongo1:27017', 'test', 'simple_table', 'root', 'clickhouse')") @@ -42,6 +48,7 @@ def test_simple_select(started_cluster): assert node.query("SELECT data from simple_mongo_table where key = 42") == hex(42 * 42) + '\n' +@pytest.mark.parametrize('started_cluster', [False], indirect=['started_cluster']) def test_complex_data_type(started_cluster): 
mongo_connection = get_mongo_connection(started_cluster) db = mongo_connection['test'] @@ -52,6 +59,7 @@ def test_complex_data_type(started_cluster): data.append({'key': i, 'data': hex(i * i), 'dict': {'a': i, 'b': str(i)}}) incomplete_mongo_table.insert_many(data) + node = started_cluster.instances['node'] node.query( "CREATE TABLE incomplete_mongo_table(key UInt64, data String) ENGINE = MongoDB('mongo1:27017', 'test', 'complex_table', 'root', 'clickhouse')") @@ -61,6 +69,7 @@ def test_complex_data_type(started_cluster): assert node.query("SELECT data from incomplete_mongo_table where key = 42") == hex(42 * 42) + '\n' +@pytest.mark.parametrize('started_cluster', [False], indirect=['started_cluster']) def test_incorrect_data_type(started_cluster): mongo_connection = get_mongo_connection(started_cluster) db = mongo_connection['test'] @@ -71,6 +80,7 @@ def test_incorrect_data_type(started_cluster): data.append({'key': i, 'data': hex(i * i), 'aaaa': 'Hello'}) strange_mongo_table.insert_many(data) + node = started_cluster.instances['node'] node.query( "CREATE TABLE strange_mongo_table(key String, data String) ENGINE = MongoDB('mongo1:27017', 'test', 'strange_table', 'root', 'clickhouse')") @@ -85,3 +95,24 @@ def test_incorrect_data_type(started_cluster): with pytest.raises(QueryRuntimeException): node.query("SELECT bbbb FROM strange_mongo_table2") + + +@pytest.mark.parametrize('started_cluster', [True], indirect=['started_cluster']) +def test_secure_connection(started_cluster): + mongo_connection = get_mongo_connection(started_cluster, secure=True) + db = mongo_connection['test'] + db.add_user('root', 'clickhouse') + simple_mongo_table = db['simple_table'] + data = [] + for i in range(0, 100): + data.append({'key': i, 'data': hex(i * i)}) + simple_mongo_table.insert_many(data) + + node = started_cluster.instances['node'] + node.query( + "CREATE TABLE simple_mongo_table(key UInt64, data String) ENGINE = MongoDB('mongo1:27017', 'test', 'simple_table', 'root', 'clickhouse', 'ssl=true')") + + assert node.query("SELECT COUNT() FROM simple_mongo_table") == '100\n' + assert node.query("SELECT sum(key) FROM simple_mongo_table") == str(sum(range(0, 100))) + '\n' + + assert node.query("SELECT data from simple_mongo_table where key = 42") == hex(42 * 42) + '\n' diff --git a/tests/integration/test_storage_postgresql/test.py b/tests/integration/test_storage_postgresql/test.py index 307879265df..28a76631c0f 100644 --- a/tests/integration/test_storage_postgresql/test.py +++ b/tests/integration/test_storage_postgresql/test.py @@ -1,55 +1,18 @@ -import time - +import logging import pytest -import psycopg2 from multiprocessing.dummy import Pool from helpers.cluster import ClickHouseCluster -from helpers.test_tools import assert_eq_with_retry -from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT cluster = ClickHouseCluster(__file__) node1 = cluster.add_instance('node1', with_postgres=True) node2 = cluster.add_instance('node2', with_postgres_cluster=True) -def get_postgres_conn(cluster, ip, database=False): - if database == True: - conn_string = f"host={ip} port='{cluster.postgres_port}' dbname='clickhouse' user='postgres' password='mysecretpassword'" - else: - conn_string = f"host={ip} port='{cluster.postgres_port}' user='postgres' password='mysecretpassword'" - - conn = psycopg2.connect(conn_string) - conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT) - conn.autocommit = True - return conn - -def create_postgres_db(conn, name): - cursor = conn.cursor() - cursor.execute("DROP DATABASE IF EXISTS 
{}".format(name)) - cursor.execute("CREATE DATABASE {}".format(name)) - @pytest.fixture(scope="module") def started_cluster(): try: cluster.start() - postgres_conn = get_postgres_conn(cluster, ip=cluster.postgres_ip) - print("postgres connected") - create_postgres_db(postgres_conn, 'clickhouse') - - postgres_conn = get_postgres_conn(cluster, ip=cluster.postgres2_ip) - print("postgres2 connected") - create_postgres_db(postgres_conn, 'clickhouse') - - postgres_conn = get_postgres_conn(cluster, ip=cluster.postgres3_ip) - print("postgres2 connected") - create_postgres_db(postgres_conn, 'clickhouse') - - postgres_conn = get_postgres_conn(cluster, ip=cluster.postgres4_ip) - print("postgres2 connected") - create_postgres_db(postgres_conn, 'clickhouse') - - print("postgres connected") yield cluster finally: @@ -57,50 +20,58 @@ def started_cluster(): def test_postgres_select_insert(started_cluster): - conn = get_postgres_conn(started_cluster, started_cluster.postgres_ip, True) - cursor = conn.cursor() + cursor = started_cluster.postgres_conn.cursor() table_name = 'test_many' - table = f'''postgresql('{started_cluster.postgres_ip}:{started_cluster.postgres_port}', 'clickhouse', '{table_name}', 'postgres', 'mysecretpassword')''' - cursor.execute('CREATE TABLE IF NOT EXISTS {} (a integer, b text, c integer)'.format(table_name)) + table = f'''postgresql('{started_cluster.postgres_ip}:{started_cluster.postgres_port}', 'postgres', '{table_name}', 'postgres', 'mysecretpassword')''' + cursor.execute(f'DROP TABLE IF EXISTS {table_name}') + cursor.execute(f'CREATE TABLE {table_name} (a integer, b text, c integer)') - result = node1.query(''' - INSERT INTO TABLE FUNCTION {} - SELECT number, concat('name_', toString(number)), 3 from numbers(10000)'''.format(table)) - check1 = "SELECT count() FROM {}".format(table) - check2 = "SELECT Sum(c) FROM {}".format(table) - check3 = "SELECT count(c) FROM {} WHERE a % 2 == 0".format(table) - check4 = "SELECT count() FROM {} WHERE b LIKE concat('name_', toString(1))".format(table) + result = node1.query(f''' + INSERT INTO TABLE FUNCTION {table} + SELECT number, concat('name_', toString(number)), 3 from numbers(10000)''') + check1 = f"SELECT count() FROM {table}" + check2 = f"SELECT Sum(c) FROM {table}" + check3 = f"SELECT count(c) FROM {table} WHERE a % 2 == 0" + check4 = f"SELECT count() FROM {table} WHERE b LIKE concat('name_', toString(1))" assert (node1.query(check1)).rstrip() == '10000' assert (node1.query(check2)).rstrip() == '30000' assert (node1.query(check3)).rstrip() == '5000' assert (node1.query(check4)).rstrip() == '1' + # Triggers issue https://github.com/ClickHouse/ClickHouse/issues/26088 + # for i in range(1, 1000): + # assert (node1.query(check1)).rstrip() == '10000', f"Failed on {i}" + + cursor.execute(f'DROP TABLE {table_name} ') + def test_postgres_conversions(started_cluster): - conn = get_postgres_conn(started_cluster, started_cluster.postgres_ip, True) - cursor = conn.cursor() + cursor = started_cluster.postgres_conn.cursor() + cursor.execute(f'DROP TABLE IF EXISTS test_types') + cursor.execute(f'DROP TABLE IF EXISTS test_array_dimensions') + cursor.execute( - '''CREATE TABLE IF NOT EXISTS test_types ( + '''CREATE TABLE test_types ( a smallint, b integer, c bigint, d real, e double precision, f serial, g bigserial, h timestamp, i date, j decimal(5, 3), k numeric, l boolean)''') node1.query(''' - INSERT INTO TABLE FUNCTION postgresql('postgres1:5432', 'clickhouse', 'test_types', 'postgres', 'mysecretpassword') VALUES - (-32768, -2147483648, 
-9223372036854775808, 1.12345, 1.1234567890, 2147483647, 9223372036854775807, '2000-05-12 12:12:12', '2000-05-12', 22.222, 22.222, 1)''') + INSERT INTO TABLE FUNCTION postgresql('postgres1:5432', 'postgres', 'test_types', 'postgres', 'mysecretpassword') VALUES + (-32768, -2147483648, -9223372036854775808, 1.12345, 1.1234567890, 2147483647, 9223372036854775807, '2000-05-12 12:12:12.012345', '2000-05-12', 22.222, 22.222, 1)''') result = node1.query(''' - SELECT a, b, c, d, e, f, g, h, i, j, toDecimal128(k, 3), l FROM postgresql('postgres1:5432', 'clickhouse', 'test_types', 'postgres', 'mysecretpassword')''') - assert(result == '-32768\t-2147483648\t-9223372036854775808\t1.12345\t1.123456789\t2147483647\t9223372036854775807\t2000-05-12 12:12:12\t2000-05-12\t22.222\t22.222\t1\n') + SELECT a, b, c, d, e, f, g, h, i, j, toDecimal128(k, 3), l FROM postgresql('postgres1:5432', 'postgres', 'test_types', 'postgres', 'mysecretpassword')''') + assert(result == '-32768\t-2147483648\t-9223372036854775808\t1.12345\t1.123456789\t2147483647\t9223372036854775807\t2000-05-12 12:12:12.012345\t2000-05-12\t22.222\t22.222\t1\n') cursor.execute("INSERT INTO test_types (l) VALUES (TRUE), (true), ('yes'), ('y'), ('1');") cursor.execute("INSERT INTO test_types (l) VALUES (FALSE), (false), ('no'), ('off'), ('0');") expected = "1\n1\n1\n1\n1\n1\n0\n0\n0\n0\n0\n" - result = node1.query('''SELECT l FROM postgresql('postgres1:5432', 'clickhouse', 'test_types', 'postgres', 'mysecretpassword')''') + result = node1.query('''SELECT l FROM postgresql('postgres1:5432', 'postgres', 'test_types', 'postgres', 'mysecretpassword')''') assert(result == expected) cursor.execute( '''CREATE TABLE IF NOT EXISTS test_array_dimensions ( a Date[] NOT NULL, -- Date - b Timestamp[] NOT NULL, -- DateTime + b Timestamp[] NOT NULL, -- DateTime64(6) c real[][] NOT NULL, -- Float32 d double precision[][] NOT NULL, -- Float64 e decimal(5, 5)[][][] NOT NULL, -- Decimal32 @@ -112,9 +83,9 @@ def test_postgres_conversions(started_cluster): )''') result = node1.query(''' - DESCRIBE TABLE postgresql('postgres1:5432', 'clickhouse', 'test_array_dimensions', 'postgres', 'mysecretpassword')''') + DESCRIBE TABLE postgresql('postgres1:5432', 'postgres', 'test_array_dimensions', 'postgres', 'mysecretpassword')''') expected = ('a\tArray(Date)\t\t\t\t\t\n' + - 'b\tArray(DateTime)\t\t\t\t\t\n' + + 'b\tArray(DateTime64(6))\t\t\t\t\t\n' + 'c\tArray(Array(Float32))\t\t\t\t\t\n' + 'd\tArray(Array(Float64))\t\t\t\t\t\n' + 'e\tArray(Array(Array(Decimal(5, 5))))\t\t\t\t\t\n' + @@ -126,10 +97,10 @@ def test_postgres_conversions(started_cluster): ) assert(result.rstrip() == expected) - node1.query("INSERT INTO TABLE FUNCTION postgresql('postgres1:5432', 'clickhouse', 'test_array_dimensions', 'postgres', 'mysecretpassword') " + node1.query("INSERT INTO TABLE FUNCTION postgresql('postgres1:5432', 'postgres', 'test_array_dimensions', 'postgres', 'mysecretpassword') " "VALUES (" "['2000-05-12', '2000-05-12'], " - "['2000-05-12 12:12:12', '2000-05-12 12:12:12'], " + "['2000-05-12 12:12:12.012345', '2000-05-12 12:12:12.012345'], " "[[1.12345], [1.12345], [1.12345]], " "[[1.1234567891], [1.1234567891], [1.1234567891]], " "[[[0.11111, 0.11111]], [[0.22222, 0.22222]], [[0.33333, 0.33333]]], " @@ -141,10 +112,10 @@ def test_postgres_conversions(started_cluster): ")") result = node1.query(''' - SELECT * FROM postgresql('postgres1:5432', 'clickhouse', 'test_array_dimensions', 'postgres', 'mysecretpassword')''') + SELECT * FROM postgresql('postgres1:5432', 'postgres', 
'test_array_dimensions', 'postgres', 'mysecretpassword')''') expected = ( "['2000-05-12','2000-05-12']\t" + - "['2000-05-12 12:12:12','2000-05-12 12:12:12']\t" + + "['2000-05-12 12:12:12.012345','2000-05-12 12:12:12.012345']\t" + "[[1.12345],[1.12345],[1.12345]]\t" + "[[1.1234567891],[1.1234567891],[1.1234567891]]\t" + "[[[0.11111,0.11111]],[[0.22222,0.22222]],[[0.33333,0.33333]]]\t" @@ -156,25 +127,33 @@ def test_postgres_conversions(started_cluster): ) assert(result == expected) + cursor.execute(f'DROP TABLE test_types') + cursor.execute(f'DROP TABLE test_array_dimensions') + def test_non_default_scema(started_cluster): - conn = get_postgres_conn(started_cluster, started_cluster.postgres_ip, True) - cursor = conn.cursor() + node1.query('DROP TABLE IF EXISTS test_pg_table_schema') + node1.query('DROP TABLE IF EXISTS test_pg_table_schema_with_dots') + + cursor = started_cluster.postgres_conn.cursor() + cursor.execute('DROP SCHEMA IF EXISTS test_schema CASCADE') + cursor.execute('DROP SCHEMA IF EXISTS "test.nice.schema" CASCADE') + cursor.execute('CREATE SCHEMA test_schema') cursor.execute('CREATE TABLE test_schema.test_table (a integer)') cursor.execute('INSERT INTO test_schema.test_table SELECT i FROM generate_series(0, 99) as t(i)') node1.query(''' CREATE TABLE test_pg_table_schema (a UInt32) - ENGINE PostgreSQL('postgres1:5432', 'clickhouse', 'test_table', 'postgres', 'mysecretpassword', 'test_schema'); + ENGINE PostgreSQL('postgres1:5432', 'postgres', 'test_table', 'postgres', 'mysecretpassword', 'test_schema'); ''') result = node1.query('SELECT * FROM test_pg_table_schema') expected = node1.query('SELECT number FROM numbers(100)') assert(result == expected) - table_function = '''postgresql('postgres1:5432', 'clickhouse', 'test_table', 'postgres', 'mysecretpassword', 'test_schema')''' - result = node1.query('SELECT * FROM {}'.format(table_function)) + table_function = '''postgresql('postgres1:5432', 'postgres', 'test_table', 'postgres', 'mysecretpassword', 'test_schema')''' + result = node1.query(f'SELECT * FROM {table_function}') assert(result == expected) cursor.execute('''CREATE SCHEMA "test.nice.schema"''') @@ -183,24 +162,28 @@ def test_non_default_scema(started_cluster): node1.query(''' CREATE TABLE test_pg_table_schema_with_dots (a UInt32) - ENGINE PostgreSQL('postgres1:5432', 'clickhouse', 'test.nice.table', 'postgres', 'mysecretpassword', 'test.nice.schema'); + ENGINE PostgreSQL('postgres1:5432', 'postgres', 'test.nice.table', 'postgres', 'mysecretpassword', 'test.nice.schema'); ''') result = node1.query('SELECT * FROM test_pg_table_schema_with_dots') assert(result == expected) cursor.execute('INSERT INTO "test_schema"."test_table" SELECT i FROM generate_series(100, 199) as t(i)') - result = node1.query('SELECT * FROM {}'.format(table_function)) + result = node1.query(f'SELECT * FROM {table_function}') expected = node1.query('SELECT number FROM numbers(200)') assert(result == expected) + cursor.execute('DROP SCHEMA test_schema CASCADE') + cursor.execute('DROP SCHEMA "test.nice.schema" CASCADE') + node1.query('DROP TABLE test_pg_table_schema') + node1.query('DROP TABLE test_pg_table_schema_with_dots') + def test_concurrent_queries(started_cluster): - conn = get_postgres_conn(started_cluster, started_cluster.postgres_ip, True) - cursor = conn.cursor() + cursor = started_cluster.postgres_conn.cursor() node1.query(''' CREATE TABLE test_table (key UInt32, value UInt32) - ENGINE = PostgreSQL('postgres1:5432', 'clickhouse', 'test_table', 'postgres', 'mysecretpassword')''') + ENGINE = 
PostgreSQL('postgres1:5432', 'postgres', 'test_table', 'postgres', 'mysecretpassword')''') cursor.execute('CREATE TABLE test_table (key integer, value integer)') @@ -212,7 +195,7 @@ def test_concurrent_queries(started_cluster): p = busy_pool.map_async(node_select, range(20)) p.wait() count = node1.count_in_log('New connection to postgres1:5432') - print(count, prev_count) + logging.debug(f'count {count}, prev_count {prev_count}') # 16 is default size for connection pool assert(int(count) <= int(prev_count) + 16) @@ -224,7 +207,7 @@ def test_concurrent_queries(started_cluster): p = busy_pool.map_async(node_insert, range(5)) p.wait() result = node1.query("SELECT count() FROM test_table", user='default') - print(result) + logging.debug(result) assert(int(result) == 5 * 5 * 1000) def node_insert_select(_): @@ -236,44 +219,41 @@ def test_concurrent_queries(started_cluster): p = busy_pool.map_async(node_insert_select, range(5)) p.wait() result = node1.query("SELECT count() FROM test_table", user='default') - print(result) + logging.debug(result) assert(int(result) == 5 * 5 * 1000 * 2) node1.query('DROP TABLE test_table;') cursor.execute('DROP TABLE test_table;') count = node1.count_in_log('New connection to postgres1:5432') - print(count, prev_count) + logging.debug(f'count {count}, prev_count {prev_count}') assert(int(count) <= int(prev_count) + 16) def test_postgres_distributed(started_cluster): - conn0 = get_postgres_conn(started_cluster, started_cluster.postgres_ip, database=True) - conn1 = get_postgres_conn(started_cluster, started_cluster.postgres2_ip, database=True) - conn2 = get_postgres_conn(started_cluster, started_cluster.postgres3_ip, database=True) - conn3 = get_postgres_conn(started_cluster, started_cluster.postgres4_ip, database=True) - - cursor0 = conn0.cursor() - cursor1 = conn1.cursor() - cursor2 = conn2.cursor() - cursor3 = conn3.cursor() + cursor0 = started_cluster.postgres_conn.cursor() + cursor1 = started_cluster.postgres2_conn.cursor() + cursor2 = started_cluster.postgres3_conn.cursor() + cursor3 = started_cluster.postgres4_conn.cursor() cursors = [cursor0, cursor1, cursor2, cursor3] for i in range(4): + cursors[i].execute('DROP TABLE IF EXISTS test_replicas') cursors[i].execute('CREATE TABLE test_replicas (id Integer, name Text)') - cursors[i].execute("""INSERT INTO test_replicas select i, 'host{}' from generate_series(0, 99) as t(i);""".format(i + 1)); + cursors[i].execute(f"""INSERT INTO test_replicas select i, 'host{i+1}' from generate_series(0, 99) as t(i);"""); # test multiple ports parsing - result = node2.query('''SELECT DISTINCT(name) FROM postgresql(`postgres{1|2|3}:5432`, 'clickhouse', 'test_replicas', 'postgres', 'mysecretpassword'); ''') + result = node2.query('''SELECT DISTINCT(name) FROM postgresql(`postgres{1|2|3}:5432`, 'postgres', 'test_replicas', 'postgres', 'mysecretpassword'); ''') assert(result == 'host1\n' or result == 'host2\n' or result == 'host3\n') - result = node2.query('''SELECT DISTINCT(name) FROM postgresql(`postgres2:5431|postgres3:5432`, 'clickhouse', 'test_replicas', 'postgres', 'mysecretpassword'); ''') + result = node2.query('''SELECT DISTINCT(name) FROM postgresql(`postgres2:5431|postgres3:5432`, 'postgres', 'test_replicas', 'postgres', 'mysecretpassword'); ''') assert(result == 'host3\n' or result == 'host2\n') # Create storage with with 3 replicas + node2.query('DROP TABLE IF EXISTS test_replicas') node2.query(''' CREATE TABLE test_replicas (id UInt32, name String) - ENGINE = PostgreSQL(`postgres{2|3|4}:5432`, 'clickhouse', 
'test_replicas', 'postgres', 'mysecretpassword'); ''') + ENGINE = PostgreSQL(`postgres{2|3|4}:5432`, 'postgres', 'test_replicas', 'postgres', 'mysecretpassword'); ''') # Check all replicas are traversed query = "SELECT name FROM (" @@ -284,10 +264,12 @@ def test_postgres_distributed(started_cluster): assert(result == 'host2\nhost3\nhost4\n') # Create storage with with two two shards, each has 2 replicas + node2.query('DROP TABLE IF EXISTS test_shards') + node2.query(''' CREATE TABLE test_shards (id UInt32, name String, age UInt32, money UInt32) - ENGINE = ExternalDistributed('PostgreSQL', `postgres{1|2}:5432,postgres{3|4}:5432`, 'clickhouse', 'test_replicas', 'postgres', 'mysecretpassword'); ''') + ENGINE = ExternalDistributed('PostgreSQL', `postgres{1|2}:5432,postgres{3|4}:5432`, 'postgres', 'test_replicas', 'postgres', 'mysecretpassword'); ''') # Check only one replica in each shard is used result = node2.query("SELECT DISTINCT(name) FROM test_shards ORDER BY name") @@ -306,26 +288,32 @@ def test_postgres_distributed(started_cluster): result = node2.query("SELECT DISTINCT(name) FROM test_shards ORDER BY name") started_cluster.unpause_container('postgres1') assert(result == 'host2\nhost4\n' or result == 'host3\nhost4\n') + node2.query('DROP TABLE test_shards') + node2.query('DROP TABLE test_replicas') def test_datetime_with_timezone(started_cluster): - conn = get_postgres_conn(started_cluster, started_cluster.postgres_ip, True) - cursor = conn.cursor() + cursor = started_cluster.postgres_conn.cursor() + cursor.execute("DROP TABLE IF EXISTS test_timezone") + node1.query("DROP TABLE IF EXISTS test_timezone") cursor.execute("CREATE TABLE test_timezone (ts timestamp without time zone, ts_z timestamp with time zone)") cursor.execute("insert into test_timezone select '2014-04-04 20:00:00', '2014-04-04 20:00:00'::timestamptz at time zone 'America/New_York';") cursor.execute("select * from test_timezone") result = cursor.fetchall()[0] - print(result[0], str(result[1])[:-6]) - node1.query("create table test_timezone ( ts DateTime, ts_z DateTime('America/New_York')) ENGINE PostgreSQL('postgres1:5432', 'clickhouse', 'test_timezone', 'postgres', 'mysecretpassword');") + logging.debug(f'{result[0]}, {str(result[1])[:-6]}') + node1.query("create table test_timezone ( ts DateTime, ts_z DateTime('America/New_York')) ENGINE PostgreSQL('postgres1:5432', 'postgres', 'test_timezone', 'postgres', 'mysecretpassword');") assert(node1.query("select ts from test_timezone").strip() == str(result[0])) # [:-6] because 2014-04-04 16:00:00+00:00 -> 2014-04-04 16:00:00 assert(node1.query("select ts_z from test_timezone").strip() == str(result[1])[:-6]) assert(node1.query("select * from test_timezone") == "2014-04-04 20:00:00\t2014-04-04 16:00:00\n") + cursor.execute("DROP TABLE test_timezone") + node1.query("DROP TABLE test_timezone") def test_postgres_ndim(started_cluster): - conn = get_postgres_conn(started_cluster, started_cluster.postgres_ip, True) - cursor = conn.cursor() + cursor = started_cluster.postgres_conn.cursor() + cursor.execute("DROP TABLE IF EXISTS arr1, arr2") + cursor.execute('CREATE TABLE arr1 (a Integer[])') cursor.execute("INSERT INTO arr1 SELECT '{{1}, {2}}'") @@ -335,8 +323,9 @@ def test_postgres_ndim(started_cluster): result = cursor.fetchall()[0] assert(int(result[0]) == 0) - result = node1.query('''SELECT toTypeName(a) FROM postgresql('postgres1:5432', 'clickhouse', 'arr2', 'postgres', 'mysecretpassword')''') + result = node1.query('''SELECT toTypeName(a) FROM postgresql('postgres1:5432', 
'postgres', 'arr2', 'postgres', 'mysecretpassword')''') assert(result.strip() == "Array(Array(Nullable(Int32)))") + cursor.execute("DROP TABLE arr1, arr2") if __name__ == '__main__': diff --git a/tests/integration/test_storage_rabbitmq/test.py b/tests/integration/test_storage_rabbitmq/test.py index 38c823cd52f..a8efea1c5d6 100644 --- a/tests/integration/test_storage_rabbitmq/test.py +++ b/tests/integration/test_storage_rabbitmq/test.py @@ -2032,6 +2032,20 @@ def test_rabbitmq_queue_consume(rabbitmq_cluster): instance.query('DROP TABLE test.rabbitmq_queue') +def test_rabbitmq_drop_table_with_unfinished_setup(rabbitmq_cluster): + rabbitmq_cluster.pause_container('rabbitmq1') + instance.query(''' + CREATE TABLE test.drop (key UInt64, value UInt64) + ENGINE = RabbitMQ + SETTINGS rabbitmq_host_port = 'rabbitmq1:5672', + rabbitmq_exchange_name = 'drop', + rabbitmq_format = 'JSONEachRow'; + ''') + time.sleep(5) + instance.query('DROP TABLE test.drop;') + rabbitmq_cluster.unpause_container('rabbitmq1') + + if __name__ == '__main__': cluster.start() input("Cluster created, press any key to destroy...") diff --git a/tests/integration/test_storage_s3/test.py b/tests/integration/test_storage_s3/test.py index 1ba29975202..5908def8297 100644 --- a/tests/integration/test_storage_s3/test.py +++ b/tests/integration/test_storage_s3/test.py @@ -198,12 +198,14 @@ def test_empty_put(started_cluster, auth): instance = started_cluster.instances["dummy"] # type: ClickHouseInstance table_format = "column1 UInt32, column2 UInt32, column3 UInt32" + drop_empty_table_query = "DROP TABLE IF EXISTS empty_table" create_empty_table_query = """ CREATE TABLE empty_table ( {} ) ENGINE = Null() """.format(table_format) + run_query(instance, drop_empty_table_query) run_query(instance, create_empty_table_query) filename = "empty_put_test.csv" @@ -305,22 +307,22 @@ def test_put_with_zero_redirect(started_cluster): def test_put_get_with_globs(started_cluster): # type: (ClickHouseCluster) -> None - + unique_prefix = random.randint(1,10000) bucket = started_cluster.minio_bucket instance = started_cluster.instances["dummy"] # type: ClickHouseInstance table_format = "column1 UInt32, column2 UInt32, column3 UInt32" max_path = "" for i in range(10): for j in range(10): - path = "{}_{}/{}.csv".format(i, random.choice(['a', 'b', 'c', 'd']), j) + path = "{}/{}_{}/{}.csv".format(unique_prefix, i, random.choice(['a', 'b', 'c', 'd']), j) max_path = max(path, max_path) values = "({},{},{})".format(i, j, i + j) query = "insert into table function s3('http://{}:{}/{}/{}', 'CSV', '{}') values {}".format( started_cluster.minio_ip, MINIO_INTERNAL_PORT, bucket, path, table_format, values) run_query(instance, query) - query = "select sum(column1), sum(column2), sum(column3), min(_file), max(_path) from s3('http://{}:{}/{}/*_{{a,b,c,d}}/%3f.csv', 'CSV', '{}')".format( - started_cluster.minio_redirect_host, started_cluster.minio_redirect_port, bucket, table_format) + query = "select sum(column1), sum(column2), sum(column3), min(_file), max(_path) from s3('http://{}:{}/{}/{}/*_{{a,b,c,d}}/%3f.csv', 'CSV', '{}')".format( + started_cluster.minio_redirect_host, started_cluster.minio_redirect_port, bucket, unique_prefix, table_format) assert run_query(instance, query).splitlines() == [ "450\t450\t900\t0.csv\t{bucket}/{max_path}".format(bucket=bucket, max_path=max_path)] @@ -479,6 +481,7 @@ def test_custom_auth_headers(started_cluster): result = run_query(instance, get_query) assert result == '1\t2\t3\n' + instance.query("DROP TABLE IF EXISTS test") 
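The `DROP TABLE IF EXISTS test` line added here follows the pattern used throughout these test changes: drop before create, drop again after the assertions, so a test can be re-run against a dirty instance. The same cleanup can be written once as a pytest fixture; a minimal sketch, assuming the `ClickHouseInstance`/`started_cluster` helpers used above (the fixture name and table name are hypothetical):

```
import pytest

@pytest.fixture
def clean_table(started_cluster):
    # "dummy" matches the S3 test instance above; the table name is illustrative.
    instance = started_cluster.instances["dummy"]
    name = "test"
    instance.query(f"DROP TABLE IF EXISTS {name}")  # start clean even if a previous run died mid-test
    yield name
    instance.query(f"DROP TABLE IF EXISTS {name}")  # tear down regardless of test outcome
```

A test that accepts `clean_table` then only contains the CREATE/INSERT/SELECT steps, without repeating the DROP statements inline.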
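The MongoDB changes earlier in this patch rely on the same idea in fixture form: `started_cluster` becomes an indirectly parametrized fixture, so one fixture builds either a plain or a TLS-enabled Mongo cluster depending on `request.param`. A stripped-down sketch of that pytest mechanism (the fixture body is a placeholder, not the real cluster setup):

```
import pytest

@pytest.fixture(scope="module")
def started_cluster(request):
    # request.param is supplied by parametrize(..., indirect=...) on the tests below.
    secure = request.param
    # ... build the cluster with or without TLS here ...
    yield {"secure": secure}

@pytest.mark.parametrize("started_cluster", [False], indirect=["started_cluster"])
def test_plain(started_cluster):
    assert started_cluster["secure"] is False

@pytest.mark.parametrize("started_cluster", [True], indirect=["started_cluster"])
def test_secure(started_cluster):
    assert started_cluster["secure"] is True
```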
instance.query( "CREATE TABLE test ({table_format}) ENGINE = S3('http://resolver:8080/{bucket}/{file}', 'CSV')".format( bucket=started_cluster.minio_restricted_bucket, @@ -494,6 +497,7 @@ def test_custom_auth_headers(started_cluster): replace_config("
Authorization: Bearer INVALID_TOKEN", "
Authorization: Bearer TOKEN") instance.query("SYSTEM RELOAD CONFIG") assert run_query(instance, "SELECT * FROM test") == '1\t2\t3\n' + instance.query("DROP TABLE test") def test_custom_auth_headers_exclusion(started_cluster): @@ -551,6 +555,8 @@ def test_storage_s3_get_gzip(started_cluster, extension, method): "Norman Ortega,33", "" ] + run_query(instance, f"DROP TABLE IF EXISTS {name}") + buf = io.BytesIO() compressed = gzip.GzipFile(fileobj=buf, mode="wb") compressed.write(("\n".join(data)).encode()) @@ -562,7 +568,8 @@ def test_storage_s3_get_gzip(started_cluster, extension, method): 'CSV', '{method}')""") - run_query(instance, "SELECT sum(id) FROM {}".format(name)).splitlines() == ["565"] + run_query(instance, f"SELECT sum(id) FROM {name}").splitlines() == ["565"] + run_query(instance, f"DROP TABLE {name}") def test_storage_s3_get_unstable(started_cluster): diff --git a/tests/integration/test_version_update_after_mutation/test.py b/tests/integration/test_version_update_after_mutation/test.py index 4f8a61a5bf0..8d38234ccdd 100644 --- a/tests/integration/test_version_update_after_mutation/test.py +++ b/tests/integration/test_version_update_after_mutation/test.py @@ -26,6 +26,7 @@ def start_cluster(): def test_mutate_and_upgrade(start_cluster): for node in [node1, node2]: + node.query("DROP TABLE IF EXISTS mt") node.query( "CREATE TABLE mt (EventDate Date, id UInt64) ENGINE ReplicatedMergeTree('/clickhouse/tables/t', '{}') ORDER BY tuple()".format( node.name)) @@ -67,8 +68,13 @@ def test_mutate_and_upgrade(start_cluster): assert node1.query("SELECT id FROM mt") == "1\n4\n" assert node2.query("SELECT id FROM mt") == "1\n4\n" + for node in [node1, node2]: + node.query("DROP TABLE mt") + def test_upgrade_while_mutation(start_cluster): + node3.query("DROP TABLE IF EXISTS mt1") + node3.query( "CREATE TABLE mt1 (EventDate Date, id UInt64) ENGINE ReplicatedMergeTree('/clickhouse/tables/t1', 'node3') ORDER BY tuple()") @@ -80,9 +86,11 @@ def test_upgrade_while_mutation(start_cluster): node3.restart_with_latest_version(signal=9) # checks for readonly - exec_query_with_retry(node3, "OPTIMIZE TABLE mt1", retry_count=60) + exec_query_with_retry(node3, "OPTIMIZE TABLE mt1", sleep_time=5, retry_count=60) node3.query("ALTER TABLE mt1 DELETE WHERE id > 100000", settings={"mutations_sync": "2"}) # will delete nothing, but previous async mutation will finish with this query assert_eq_with_retry(node3, "SELECT COUNT() from mt1", "50000\n") + + node3.query("DROP TABLE mt1") diff --git a/tests/performance/datetime_comparison.xml b/tests/performance/datetime_comparison.xml index 2d47ded0b1a..8d7b0c8c4de 100644 --- a/tests/performance/datetime_comparison.xml +++ b/tests/performance/datetime_comparison.xml @@ -2,4 +2,5 @@ SELECT count() FROM numbers(1000000000) WHERE materialize(now()) > toString(toDateTime('2020-09-30 00:00:00')) SELECT count() FROM numbers(1000000000) WHERE materialize(now()) > toUInt32(toDateTime('2020-09-30 00:00:00')) SELECT count() FROM numbers(1000000000) WHERE materialize(now()) > toDateTime('2020-09-30 00:00:00') + SELECT count() FROM numbers(1000000000) WHERE materialize(now()) > toDate('2020-09-30 00:00:00') diff --git a/tests/performance/lot_of_subcolumns.xml b/tests/performance/lot_of_subcolumns.xml new file mode 100644 index 00000000000..d33a7704d70 --- /dev/null +++ b/tests/performance/lot_of_subcolumns.xml @@ -0,0 +1,23 @@ + + + CREATE TABLE lot_of_arrays(id UInt64, + `nested.arr0` Array(UInt64), `nested.arr1` Array(UInt64), `nested.arr2` Array(UInt64), `nested.arr3` 
Array(UInt64), `nested.arr4` Array(UInt64), `nested.arr5` Array(UInt64), `nested.arr6` Array(UInt64), `nested.arr7` Array(UInt64), `nested.arr8` Array(UInt64), `nested.arr9` Array(UInt64), `nested.arr10` Array(UInt64), `nested.arr11` Array(UInt64), `nested.arr12` Array(UInt64), `nested.arr13` Array(UInt64), `nested.arr14` Array(UInt64), `nested.arr15` Array(UInt64), `nested.arr16` Array(UInt64), `nested.arr17` Array(UInt64), `nested.arr18` Array(UInt64), `nested.arr19` Array(UInt64), `nested.arr20` Array(UInt64), `nested.arr21` Array(UInt64), `nested.arr22` Array(UInt64), `nested.arr23` Array(UInt64), `nested.arr24` Array(UInt64), `nested.arr25` Array(UInt64), `nested.arr26` Array(UInt64), `nested.arr27` Array(UInt64), `nested.arr28` Array(UInt64), `nested.arr29` Array(UInt64), `nested.arr30` Array(UInt64), `nested.arr31` Array(UInt64), `nested.arr32` Array(UInt64), `nested.arr33` Array(UInt64), `nested.arr34` Array(UInt64), `nested.arr35` Array(UInt64), `nested.arr36` Array(UInt64), `nested.arr37` Array(UInt64), `nested.arr38` Array(UInt64), `nested.arr39` Array(UInt64), `nested.arr40` Array(UInt64), `nested.arr41` Array(UInt64), `nested.arr42` Array(UInt64), `nested.arr43` Array(UInt64), `nested.arr44` Array(UInt64), `nested.arr45` Array(UInt64), `nested.arr46` Array(UInt64), `nested.arr47` Array(UInt64), `nested.arr48` Array(UInt64), `nested.arr49` Array(UInt64), `nested.arr50` Array(UInt64), `nested.arr51` Array(UInt64), `nested.arr52` Array(UInt64), `nested.arr53` Array(UInt64), `nested.arr54` Array(UInt64), `nested.arr55` Array(UInt64), `nested.arr56` Array(UInt64), `nested.arr57` Array(UInt64), `nested.arr58` Array(UInt64), `nested.arr59` Array(UInt64), `nested.arr60` Array(UInt64), `nested.arr61` Array(UInt64), `nested.arr62` Array(UInt64), `nested.arr63` Array(UInt64), `nested.arr64` Array(UInt64), `nested.arr65` Array(UInt64), `nested.arr66` Array(UInt64), `nested.arr67` Array(UInt64), `nested.arr68` Array(UInt64), `nested.arr69` Array(UInt64), `nested.arr70` Array(UInt64), `nested.arr71` Array(UInt64), `nested.arr72` Array(UInt64), `nested.arr73` Array(UInt64), `nested.arr74` Array(UInt64), `nested.arr75` Array(UInt64), `nested.arr76` Array(UInt64), `nested.arr77` Array(UInt64), `nested.arr78` Array(UInt64), `nested.arr79` Array(UInt64), `nested.arr80` Array(UInt64), `nested.arr81` Array(UInt64), `nested.arr82` Array(UInt64), `nested.arr83` Array(UInt64), `nested.arr84` Array(UInt64), `nested.arr85` Array(UInt64), `nested.arr86` Array(UInt64), `nested.arr87` Array(UInt64), `nested.arr88` Array(UInt64), `nested.arr89` Array(UInt64), `nested.arr90` Array(UInt64), `nested.arr91` Array(UInt64), `nested.arr92` Array(UInt64), `nested.arr93` Array(UInt64), `nested.arr94` Array(UInt64), `nested.arr95` Array(UInt64), `nested.arr96` Array(UInt64), `nested.arr97` Array(UInt64), `nested.arr98` Array(UInt64), `nested.arr99` Array(UInt64), + `nested.arr100` Array(UInt64), `nested.arr101` Array(UInt64), `nested.arr102` Array(UInt64), `nested.arr103` Array(UInt64), `nested.arr104` Array(UInt64), `nested.arr105` Array(UInt64), `nested.arr106` Array(UInt64), `nested.arr107` Array(UInt64), `nested.arr108` Array(UInt64), `nested.arr109` Array(UInt64), `nested.arr110` Array(UInt64), `nested.arr111` Array(UInt64), `nested.arr112` Array(UInt64), `nested.arr113` Array(UInt64), `nested.arr114` Array(UInt64), `nested.arr115` Array(UInt64), `nested.arr116` Array(UInt64), `nested.arr117` Array(UInt64), `nested.arr118` Array(UInt64), `nested.arr119` Array(UInt64), `nested.arr120` Array(UInt64), 
`nested.arr121` Array(UInt64), `nested.arr122` Array(UInt64), `nested.arr123` Array(UInt64), `nested.arr124` Array(UInt64), `nested.arr125` Array(UInt64), `nested.arr126` Array(UInt64), `nested.arr127` Array(UInt64), `nested.arr128` Array(UInt64), `nested.arr129` Array(UInt64), `nested.arr130` Array(UInt64), `nested.arr131` Array(UInt64), `nested.arr132` Array(UInt64), `nested.arr133` Array(UInt64), `nested.arr134` Array(UInt64), `nested.arr135` Array(UInt64), `nested.arr136` Array(UInt64), `nested.arr137` Array(UInt64), `nested.arr138` Array(UInt64), `nested.arr139` Array(UInt64), `nested.arr140` Array(UInt64), `nested.arr141` Array(UInt64), `nested.arr142` Array(UInt64), `nested.arr143` Array(UInt64), `nested.arr144` Array(UInt64), `nested.arr145` Array(UInt64), `nested.arr146` Array(UInt64), `nested.arr147` Array(UInt64), `nested.arr148` Array(UInt64), `nested.arr149` Array(UInt64), `nested.arr150` Array(UInt64), `nested.arr151` Array(UInt64), `nested.arr152` Array(UInt64), `nested.arr153` Array(UInt64), `nested.arr154` Array(UInt64), `nested.arr155` Array(UInt64), `nested.arr156` Array(UInt64), `nested.arr157` Array(UInt64), `nested.arr158` Array(UInt64), `nested.arr159` Array(UInt64), `nested.arr160` Array(UInt64), `nested.arr161` Array(UInt64), `nested.arr162` Array(UInt64), `nested.arr163` Array(UInt64), `nested.arr164` Array(UInt64), `nested.arr165` Array(UInt64), `nested.arr166` Array(UInt64), `nested.arr167` Array(UInt64), `nested.arr168` Array(UInt64), `nested.arr169` Array(UInt64), `nested.arr170` Array(UInt64), `nested.arr171` Array(UInt64), `nested.arr172` Array(UInt64), `nested.arr173` Array(UInt64), `nested.arr174` Array(UInt64), `nested.arr175` Array(UInt64), `nested.arr176` Array(UInt64), `nested.arr177` Array(UInt64), `nested.arr178` Array(UInt64), `nested.arr179` Array(UInt64), `nested.arr180` Array(UInt64), `nested.arr181` Array(UInt64), `nested.arr182` Array(UInt64), `nested.arr183` Array(UInt64), `nested.arr184` Array(UInt64), `nested.arr185` Array(UInt64), `nested.arr186` Array(UInt64), `nested.arr187` Array(UInt64), `nested.arr188` Array(UInt64), `nested.arr189` Array(UInt64), `nested.arr190` Array(UInt64), `nested.arr191` Array(UInt64), `nested.arr192` Array(UInt64), `nested.arr193` Array(UInt64), `nested.arr194` Array(UInt64), `nested.arr195` Array(UInt64), `nested.arr196` Array(UInt64), `nested.arr197` Array(UInt64), `nested.arr198` Array(UInt64), `nested.arr199` Array(UInt64), + `nested.arr200` Array(UInt64), `nested.arr201` Array(UInt64), `nested.arr202` Array(UInt64), `nested.arr203` Array(UInt64), `nested.arr204` Array(UInt64), `nested.arr205` Array(UInt64), `nested.arr206` Array(UInt64), `nested.arr207` Array(UInt64), `nested.arr208` Array(UInt64), `nested.arr209` Array(UInt64), `nested.arr210` Array(UInt64), `nested.arr211` Array(UInt64), `nested.arr212` Array(UInt64), `nested.arr213` Array(UInt64), `nested.arr214` Array(UInt64), `nested.arr215` Array(UInt64), `nested.arr216` Array(UInt64), `nested.arr217` Array(UInt64), `nested.arr218` Array(UInt64), `nested.arr219` Array(UInt64), `nested.arr220` Array(UInt64), `nested.arr221` Array(UInt64), `nested.arr222` Array(UInt64), `nested.arr223` Array(UInt64), `nested.arr224` Array(UInt64), `nested.arr225` Array(UInt64), `nested.arr226` Array(UInt64), `nested.arr227` Array(UInt64), `nested.arr228` Array(UInt64), `nested.arr229` Array(UInt64), `nested.arr230` Array(UInt64), `nested.arr231` Array(UInt64), `nested.arr232` Array(UInt64), `nested.arr233` Array(UInt64), `nested.arr234` Array(UInt64), `nested.arr235` 
Array(UInt64), `nested.arr236` Array(UInt64), `nested.arr237` Array(UInt64), `nested.arr238` Array(UInt64), `nested.arr239` Array(UInt64), `nested.arr240` Array(UInt64), `nested.arr241` Array(UInt64), `nested.arr242` Array(UInt64), `nested.arr243` Array(UInt64), `nested.arr244` Array(UInt64), `nested.arr245` Array(UInt64), `nested.arr246` Array(UInt64), `nested.arr247` Array(UInt64), `nested.arr248` Array(UInt64), `nested.arr249` Array(UInt64), `nested.arr250` Array(UInt64), `nested.arr251` Array(UInt64), `nested.arr252` Array(UInt64), `nested.arr253` Array(UInt64), `nested.arr254` Array(UInt64), `nested.arr255` Array(UInt64), `nested.arr256` Array(UInt64), `nested.arr257` Array(UInt64), `nested.arr258` Array(UInt64), `nested.arr259` Array(UInt64), `nested.arr260` Array(UInt64), `nested.arr261` Array(UInt64), `nested.arr262` Array(UInt64), `nested.arr263` Array(UInt64), `nested.arr264` Array(UInt64), `nested.arr265` Array(UInt64), `nested.arr266` Array(UInt64), `nested.arr267` Array(UInt64), `nested.arr268` Array(UInt64), `nested.arr269` Array(UInt64), `nested.arr270` Array(UInt64), `nested.arr271` Array(UInt64), `nested.arr272` Array(UInt64), `nested.arr273` Array(UInt64), `nested.arr274` Array(UInt64), `nested.arr275` Array(UInt64), `nested.arr276` Array(UInt64), `nested.arr277` Array(UInt64), `nested.arr278` Array(UInt64), `nested.arr279` Array(UInt64), `nested.arr280` Array(UInt64), `nested.arr281` Array(UInt64), `nested.arr282` Array(UInt64), `nested.arr283` Array(UInt64), `nested.arr284` Array(UInt64), `nested.arr285` Array(UInt64), `nested.arr286` Array(UInt64), `nested.arr287` Array(UInt64), `nested.arr288` Array(UInt64), `nested.arr289` Array(UInt64), `nested.arr290` Array(UInt64), `nested.arr291` Array(UInt64), `nested.arr292` Array(UInt64), `nested.arr293` Array(UInt64), `nested.arr294` Array(UInt64), `nested.arr295` Array(UInt64), `nested.arr296` Array(UInt64), `nested.arr297` Array(UInt64), `nested.arr298` Array(UInt64), `nested.arr299` Array(UInt64), + `nested.arr300` Array(UInt64), `nested.arr301` Array(UInt64), `nested.arr302` Array(UInt64), `nested.arr303` Array(UInt64), `nested.arr304` Array(UInt64), `nested.arr305` Array(UInt64), `nested.arr306` Array(UInt64), `nested.arr307` Array(UInt64), `nested.arr308` Array(UInt64), `nested.arr309` Array(UInt64), `nested.arr310` Array(UInt64), `nested.arr311` Array(UInt64), `nested.arr312` Array(UInt64), `nested.arr313` Array(UInt64), `nested.arr314` Array(UInt64), `nested.arr315` Array(UInt64), `nested.arr316` Array(UInt64), `nested.arr317` Array(UInt64), `nested.arr318` Array(UInt64), `nested.arr319` Array(UInt64), `nested.arr320` Array(UInt64), `nested.arr321` Array(UInt64), `nested.arr322` Array(UInt64), `nested.arr323` Array(UInt64), `nested.arr324` Array(UInt64), `nested.arr325` Array(UInt64), `nested.arr326` Array(UInt64), `nested.arr327` Array(UInt64), `nested.arr328` Array(UInt64), `nested.arr329` Array(UInt64), `nested.arr330` Array(UInt64), `nested.arr331` Array(UInt64), `nested.arr332` Array(UInt64), `nested.arr333` Array(UInt64), `nested.arr334` Array(UInt64), `nested.arr335` Array(UInt64), `nested.arr336` Array(UInt64), `nested.arr337` Array(UInt64), `nested.arr338` Array(UInt64), `nested.arr339` Array(UInt64), `nested.arr340` Array(UInt64), `nested.arr341` Array(UInt64), `nested.arr342` Array(UInt64), `nested.arr343` Array(UInt64), `nested.arr344` Array(UInt64), `nested.arr345` Array(UInt64), `nested.arr346` Array(UInt64), `nested.arr347` Array(UInt64), `nested.arr348` Array(UInt64), `nested.arr349` Array(UInt64), 
`nested.arr350` Array(UInt64), `nested.arr351` Array(UInt64), `nested.arr352` Array(UInt64), `nested.arr353` Array(UInt64), `nested.arr354` Array(UInt64), `nested.arr355` Array(UInt64), `nested.arr356` Array(UInt64), `nested.arr357` Array(UInt64), `nested.arr358` Array(UInt64), `nested.arr359` Array(UInt64), `nested.arr360` Array(UInt64), `nested.arr361` Array(UInt64), `nested.arr362` Array(UInt64), `nested.arr363` Array(UInt64), `nested.arr364` Array(UInt64), `nested.arr365` Array(UInt64), `nested.arr366` Array(UInt64), `nested.arr367` Array(UInt64), `nested.arr368` Array(UInt64), `nested.arr369` Array(UInt64), `nested.arr370` Array(UInt64), `nested.arr371` Array(UInt64), `nested.arr372` Array(UInt64), `nested.arr373` Array(UInt64), `nested.arr374` Array(UInt64), `nested.arr375` Array(UInt64), `nested.arr376` Array(UInt64), `nested.arr377` Array(UInt64), `nested.arr378` Array(UInt64), `nested.arr379` Array(UInt64), `nested.arr380` Array(UInt64), `nested.arr381` Array(UInt64), `nested.arr382` Array(UInt64), `nested.arr383` Array(UInt64), `nested.arr384` Array(UInt64), `nested.arr385` Array(UInt64), `nested.arr386` Array(UInt64), `nested.arr387` Array(UInt64), `nested.arr388` Array(UInt64), `nested.arr389` Array(UInt64), `nested.arr390` Array(UInt64), `nested.arr391` Array(UInt64), `nested.arr392` Array(UInt64), `nested.arr393` Array(UInt64), `nested.arr394` Array(UInt64), `nested.arr395` Array(UInt64), `nested.arr396` Array(UInt64), `nested.arr397` Array(UInt64), `nested.arr398` Array(UInt64), `nested.arr399` Array(UInt64), + `nested.arr400` Array(UInt64), `nested.arr401` Array(UInt64), `nested.arr402` Array(UInt64), `nested.arr403` Array(UInt64), `nested.arr404` Array(UInt64), `nested.arr405` Array(UInt64), `nested.arr406` Array(UInt64), `nested.arr407` Array(UInt64), `nested.arr408` Array(UInt64), `nested.arr409` Array(UInt64), `nested.arr410` Array(UInt64), `nested.arr411` Array(UInt64), `nested.arr412` Array(UInt64), `nested.arr413` Array(UInt64), `nested.arr414` Array(UInt64), `nested.arr415` Array(UInt64), `nested.arr416` Array(UInt64), `nested.arr417` Array(UInt64), `nested.arr418` Array(UInt64), `nested.arr419` Array(UInt64), `nested.arr420` Array(UInt64), `nested.arr421` Array(UInt64), `nested.arr422` Array(UInt64), `nested.arr423` Array(UInt64), `nested.arr424` Array(UInt64), `nested.arr425` Array(UInt64), `nested.arr426` Array(UInt64), `nested.arr427` Array(UInt64), `nested.arr428` Array(UInt64), `nested.arr429` Array(UInt64), `nested.arr430` Array(UInt64), `nested.arr431` Array(UInt64), `nested.arr432` Array(UInt64), `nested.arr433` Array(UInt64), `nested.arr434` Array(UInt64), `nested.arr435` Array(UInt64), `nested.arr436` Array(UInt64), `nested.arr437` Array(UInt64), `nested.arr438` Array(UInt64), `nested.arr439` Array(UInt64), `nested.arr440` Array(UInt64), `nested.arr441` Array(UInt64), `nested.arr442` Array(UInt64), `nested.arr443` Array(UInt64), `nested.arr444` Array(UInt64), `nested.arr445` Array(UInt64), `nested.arr446` Array(UInt64), `nested.arr447` Array(UInt64), `nested.arr448` Array(UInt64), `nested.arr449` Array(UInt64), `nested.arr450` Array(UInt64), `nested.arr451` Array(UInt64), `nested.arr452` Array(UInt64), `nested.arr453` Array(UInt64), `nested.arr454` Array(UInt64), `nested.arr455` Array(UInt64), `nested.arr456` Array(UInt64), `nested.arr457` Array(UInt64), `nested.arr458` Array(UInt64), `nested.arr459` Array(UInt64), `nested.arr460` Array(UInt64), `nested.arr461` Array(UInt64), `nested.arr462` Array(UInt64), `nested.arr463` Array(UInt64), `nested.arr464` 
Array(UInt64), `nested.arr465` Array(UInt64), `nested.arr466` Array(UInt64), `nested.arr467` Array(UInt64), `nested.arr468` Array(UInt64), `nested.arr469` Array(UInt64), `nested.arr470` Array(UInt64), `nested.arr471` Array(UInt64), `nested.arr472` Array(UInt64), `nested.arr473` Array(UInt64), `nested.arr474` Array(UInt64), `nested.arr475` Array(UInt64), `nested.arr476` Array(UInt64), `nested.arr477` Array(UInt64), `nested.arr478` Array(UInt64), `nested.arr479` Array(UInt64), `nested.arr480` Array(UInt64), `nested.arr481` Array(UInt64), `nested.arr482` Array(UInt64), `nested.arr483` Array(UInt64), `nested.arr484` Array(UInt64), `nested.arr485` Array(UInt64), `nested.arr486` Array(UInt64), `nested.arr487` Array(UInt64), `nested.arr488` Array(UInt64), `nested.arr489` Array(UInt64), `nested.arr490` Array(UInt64), `nested.arr491` Array(UInt64), `nested.arr492` Array(UInt64), `nested.arr493` Array(UInt64), `nested.arr494` Array(UInt64), `nested.arr495` Array(UInt64), `nested.arr496` Array(UInt64), `nested.arr497` Array(UInt64), `nested.arr498` Array(UInt64), `nested.arr499` Array(UInt64), + arr500 Array(Array(Nullable(UInt64))), arr501 Array(Array(Nullable(UInt64))), arr502 Array(Array(Nullable(UInt64))), arr503 Array(Array(Nullable(UInt64))), arr504 Array(Array(Nullable(UInt64))), arr505 Array(Array(Nullable(UInt64))), arr506 Array(Array(Nullable(UInt64))), arr507 Array(Array(Nullable(UInt64))), arr508 Array(Array(Nullable(UInt64))), arr509 Array(Array(Nullable(UInt64))), arr510 Array(Array(Nullable(UInt64))), arr511 Array(Array(Nullable(UInt64))), arr512 Array(Array(Nullable(UInt64))), arr513 Array(Array(Nullable(UInt64))), arr514 Array(Array(Nullable(UInt64))), arr515 Array(Array(Nullable(UInt64))), arr516 Array(Array(Nullable(UInt64))), arr517 Array(Array(Nullable(UInt64))), arr518 Array(Array(Nullable(UInt64))), arr519 Array(Array(Nullable(UInt64))), arr520 Array(Array(Nullable(UInt64))), arr521 Array(Array(Nullable(UInt64))), arr522 Array(Array(Nullable(UInt64))), arr523 Array(Array(Nullable(UInt64))), arr524 Array(Array(Nullable(UInt64))), arr525 Array(Array(Nullable(UInt64))), arr526 Array(Array(Nullable(UInt64))), arr527 Array(Array(Nullable(UInt64))), arr528 Array(Array(Nullable(UInt64))), arr529 Array(Array(Nullable(UInt64))), arr530 Array(Array(Nullable(UInt64))), arr531 Array(Array(Nullable(UInt64))), arr532 Array(Array(Nullable(UInt64))), arr533 Array(Array(Nullable(UInt64))), arr534 Array(Array(Nullable(UInt64))), arr535 Array(Array(Nullable(UInt64))), arr536 Array(Array(Nullable(UInt64))), arr537 Array(Array(Nullable(UInt64))), arr538 Array(Array(Nullable(UInt64))), arr539 Array(Array(Nullable(UInt64))), arr540 Array(Array(Nullable(UInt64))), arr541 Array(Array(Nullable(UInt64))), arr542 Array(Array(Nullable(UInt64))), arr543 Array(Array(Nullable(UInt64))), arr544 Array(Array(Nullable(UInt64))), arr545 Array(Array(Nullable(UInt64))), arr546 Array(Array(Nullable(UInt64))), arr547 Array(Array(Nullable(UInt64))), arr548 Array(Array(Nullable(UInt64))), arr549 Array(Array(Nullable(UInt64))), arr550 Array(Array(Nullable(UInt64))), arr551 Array(Array(Nullable(UInt64))), arr552 Array(Array(Nullable(UInt64))), arr553 Array(Array(Nullable(UInt64))), arr554 Array(Array(Nullable(UInt64))), arr555 Array(Array(Nullable(UInt64))), arr556 Array(Array(Nullable(UInt64))), arr557 Array(Array(Nullable(UInt64))), arr558 Array(Array(Nullable(UInt64))), arr559 Array(Array(Nullable(UInt64))), arr560 Array(Array(Nullable(UInt64))), arr561 Array(Array(Nullable(UInt64))), arr562 
Array(Array(Nullable(UInt64))), arr563 Array(Array(Nullable(UInt64))), arr564 Array(Array(Nullable(UInt64))), arr565 Array(Array(Nullable(UInt64))), arr566 Array(Array(Nullable(UInt64))), arr567 Array(Array(Nullable(UInt64))), arr568 Array(Array(Nullable(UInt64))), arr569 Array(Array(Nullable(UInt64))), arr570 Array(Array(Nullable(UInt64))), arr571 Array(Array(Nullable(UInt64))), arr572 Array(Array(Nullable(UInt64))), arr573 Array(Array(Nullable(UInt64))), arr574 Array(Array(Nullable(UInt64))), arr575 Array(Array(Nullable(UInt64))), arr576 Array(Array(Nullable(UInt64))), arr577 Array(Array(Nullable(UInt64))), arr578 Array(Array(Nullable(UInt64))), arr579 Array(Array(Nullable(UInt64))), arr580 Array(Array(Nullable(UInt64))), arr581 Array(Array(Nullable(UInt64))), arr582 Array(Array(Nullable(UInt64))), arr583 Array(Array(Nullable(UInt64))), arr584 Array(Array(Nullable(UInt64))), arr585 Array(Array(Nullable(UInt64))), arr586 Array(Array(Nullable(UInt64))), arr587 Array(Array(Nullable(UInt64))), arr588 Array(Array(Nullable(UInt64))), arr589 Array(Array(Nullable(UInt64))), arr590 Array(Array(Nullable(UInt64))), arr591 Array(Array(Nullable(UInt64))), arr592 Array(Array(Nullable(UInt64))), arr593 Array(Array(Nullable(UInt64))), arr594 Array(Array(Nullable(UInt64))), arr595 Array(Array(Nullable(UInt64))), arr596 Array(Array(Nullable(UInt64))), arr597 Array(Array(Nullable(UInt64))), arr598 Array(Array(Nullable(UInt64))), arr599 Array(Array(Nullable(UInt64))), + arr600 Array(Array(Nullable(UInt64))), arr601 Array(Array(Nullable(UInt64))), arr602 Array(Array(Nullable(UInt64))), arr603 Array(Array(Nullable(UInt64))), arr604 Array(Array(Nullable(UInt64))), arr605 Array(Array(Nullable(UInt64))), arr606 Array(Array(Nullable(UInt64))), arr607 Array(Array(Nullable(UInt64))), arr608 Array(Array(Nullable(UInt64))), arr609 Array(Array(Nullable(UInt64))), arr610 Array(Array(Nullable(UInt64))), arr611 Array(Array(Nullable(UInt64))), arr612 Array(Array(Nullable(UInt64))), arr613 Array(Array(Nullable(UInt64))), arr614 Array(Array(Nullable(UInt64))), arr615 Array(Array(Nullable(UInt64))), arr616 Array(Array(Nullable(UInt64))), arr617 Array(Array(Nullable(UInt64))), arr618 Array(Array(Nullable(UInt64))), arr619 Array(Array(Nullable(UInt64))), arr620 Array(Array(Nullable(UInt64))), arr621 Array(Array(Nullable(UInt64))), arr622 Array(Array(Nullable(UInt64))), arr623 Array(Array(Nullable(UInt64))), arr624 Array(Array(Nullable(UInt64))), arr625 Array(Array(Nullable(UInt64))), arr626 Array(Array(Nullable(UInt64))), arr627 Array(Array(Nullable(UInt64))), arr628 Array(Array(Nullable(UInt64))), arr629 Array(Array(Nullable(UInt64))), arr630 Array(Array(Nullable(UInt64))), arr631 Array(Array(Nullable(UInt64))), arr632 Array(Array(Nullable(UInt64))), arr633 Array(Array(Nullable(UInt64))), arr634 Array(Array(Nullable(UInt64))), arr635 Array(Array(Nullable(UInt64))), arr636 Array(Array(Nullable(UInt64))), arr637 Array(Array(Nullable(UInt64))), arr638 Array(Array(Nullable(UInt64))), arr639 Array(Array(Nullable(UInt64))), arr640 Array(Array(Nullable(UInt64))), arr641 Array(Array(Nullable(UInt64))), arr642 Array(Array(Nullable(UInt64))), arr643 Array(Array(Nullable(UInt64))), arr644 Array(Array(Nullable(UInt64))), arr645 Array(Array(Nullable(UInt64))), arr646 Array(Array(Nullable(UInt64))), arr647 Array(Array(Nullable(UInt64))), arr648 Array(Array(Nullable(UInt64))), arr649 Array(Array(Nullable(UInt64))), arr650 Array(Array(Nullable(UInt64))), arr651 Array(Array(Nullable(UInt64))), arr652 Array(Array(Nullable(UInt64))), arr653 
Array(Array(Nullable(UInt64))), arr654 Array(Array(Nullable(UInt64))), arr655 Array(Array(Nullable(UInt64))), arr656 Array(Array(Nullable(UInt64))), arr657 Array(Array(Nullable(UInt64))), arr658 Array(Array(Nullable(UInt64))), arr659 Array(Array(Nullable(UInt64))), arr660 Array(Array(Nullable(UInt64))), arr661 Array(Array(Nullable(UInt64))), arr662 Array(Array(Nullable(UInt64))), arr663 Array(Array(Nullable(UInt64))), arr664 Array(Array(Nullable(UInt64))), arr665 Array(Array(Nullable(UInt64))), arr666 Array(Array(Nullable(UInt64))), arr667 Array(Array(Nullable(UInt64))), arr668 Array(Array(Nullable(UInt64))), arr669 Array(Array(Nullable(UInt64))), arr670 Array(Array(Nullable(UInt64))), arr671 Array(Array(Nullable(UInt64))), arr672 Array(Array(Nullable(UInt64))), arr673 Array(Array(Nullable(UInt64))), arr674 Array(Array(Nullable(UInt64))), arr675 Array(Array(Nullable(UInt64))), arr676 Array(Array(Nullable(UInt64))), arr677 Array(Array(Nullable(UInt64))), arr678 Array(Array(Nullable(UInt64))), arr679 Array(Array(Nullable(UInt64))), arr680 Array(Array(Nullable(UInt64))), arr681 Array(Array(Nullable(UInt64))), arr682 Array(Array(Nullable(UInt64))), arr683 Array(Array(Nullable(UInt64))), arr684 Array(Array(Nullable(UInt64))), arr685 Array(Array(Nullable(UInt64))), arr686 Array(Array(Nullable(UInt64))), arr687 Array(Array(Nullable(UInt64))), arr688 Array(Array(Nullable(UInt64))), arr689 Array(Array(Nullable(UInt64))), arr690 Array(Array(Nullable(UInt64))), arr691 Array(Array(Nullable(UInt64))), arr692 Array(Array(Nullable(UInt64))), arr693 Array(Array(Nullable(UInt64))), arr694 Array(Array(Nullable(UInt64))), arr695 Array(Array(Nullable(UInt64))), arr696 Array(Array(Nullable(UInt64))), arr697 Array(Array(Nullable(UInt64))), arr698 Array(Array(Nullable(UInt64))), arr699 Array(Array(Nullable(UInt64))), + arr700 Array(Array(Nullable(UInt64))), arr701 Array(Array(Nullable(UInt64))), arr702 Array(Array(Nullable(UInt64))), arr703 Array(Array(Nullable(UInt64))), arr704 Array(Array(Nullable(UInt64))), arr705 Array(Array(Nullable(UInt64))), arr706 Array(Array(Nullable(UInt64))), arr707 Array(Array(Nullable(UInt64))), arr708 Array(Array(Nullable(UInt64))), arr709 Array(Array(Nullable(UInt64))), arr710 Array(Array(Nullable(UInt64))), arr711 Array(Array(Nullable(UInt64))), arr712 Array(Array(Nullable(UInt64))), arr713 Array(Array(Nullable(UInt64))), arr714 Array(Array(Nullable(UInt64))), arr715 Array(Array(Nullable(UInt64))), arr716 Array(Array(Nullable(UInt64))), arr717 Array(Array(Nullable(UInt64))), arr718 Array(Array(Nullable(UInt64))), arr719 Array(Array(Nullable(UInt64))), arr720 Array(Array(Nullable(UInt64))), arr721 Array(Array(Nullable(UInt64))), arr722 Array(Array(Nullable(UInt64))), arr723 Array(Array(Nullable(UInt64))), arr724 Array(Array(Nullable(UInt64))), arr725 Array(Array(Nullable(UInt64))), arr726 Array(Array(Nullable(UInt64))), arr727 Array(Array(Nullable(UInt64))), arr728 Array(Array(Nullable(UInt64))), arr729 Array(Array(Nullable(UInt64))), arr730 Array(Array(Nullable(UInt64))), arr731 Array(Array(Nullable(UInt64))), arr732 Array(Array(Nullable(UInt64))), arr733 Array(Array(Nullable(UInt64))), arr734 Array(Array(Nullable(UInt64))), arr735 Array(Array(Nullable(UInt64))), arr736 Array(Array(Nullable(UInt64))), arr737 Array(Array(Nullable(UInt64))), arr738 Array(Array(Nullable(UInt64))), arr739 Array(Array(Nullable(UInt64))), arr740 Array(Array(Nullable(UInt64))), arr741 Array(Array(Nullable(UInt64))), arr742 Array(Array(Nullable(UInt64))), arr743 Array(Array(Nullable(UInt64))), arr744 
Array(Array(Nullable(UInt64))), arr745 Array(Array(Nullable(UInt64))), arr746 Array(Array(Nullable(UInt64))), arr747 Array(Array(Nullable(UInt64))), arr748 Array(Array(Nullable(UInt64))), arr749 Array(Array(Nullable(UInt64))), arr750 Array(Array(Nullable(UInt64))), arr751 Array(Array(Nullable(UInt64))), arr752 Array(Array(Nullable(UInt64))), arr753 Array(Array(Nullable(UInt64))), arr754 Array(Array(Nullable(UInt64))), arr755 Array(Array(Nullable(UInt64))), arr756 Array(Array(Nullable(UInt64))), arr757 Array(Array(Nullable(UInt64))), arr758 Array(Array(Nullable(UInt64))), arr759 Array(Array(Nullable(UInt64))), arr760 Array(Array(Nullable(UInt64))), arr761 Array(Array(Nullable(UInt64))), arr762 Array(Array(Nullable(UInt64))), arr763 Array(Array(Nullable(UInt64))), arr764 Array(Array(Nullable(UInt64))), arr765 Array(Array(Nullable(UInt64))), arr766 Array(Array(Nullable(UInt64))), arr767 Array(Array(Nullable(UInt64))), arr768 Array(Array(Nullable(UInt64))), arr769 Array(Array(Nullable(UInt64))), arr770 Array(Array(Nullable(UInt64))), arr771 Array(Array(Nullable(UInt64))), arr772 Array(Array(Nullable(UInt64))), arr773 Array(Array(Nullable(UInt64))), arr774 Array(Array(Nullable(UInt64))), arr775 Array(Array(Nullable(UInt64))), arr776 Array(Array(Nullable(UInt64))), arr777 Array(Array(Nullable(UInt64))), arr778 Array(Array(Nullable(UInt64))), arr779 Array(Array(Nullable(UInt64))), arr780 Array(Array(Nullable(UInt64))), arr781 Array(Array(Nullable(UInt64))), arr782 Array(Array(Nullable(UInt64))), arr783 Array(Array(Nullable(UInt64))), arr784 Array(Array(Nullable(UInt64))), arr785 Array(Array(Nullable(UInt64))), arr786 Array(Array(Nullable(UInt64))), arr787 Array(Array(Nullable(UInt64))), arr788 Array(Array(Nullable(UInt64))), arr789 Array(Array(Nullable(UInt64))), arr790 Array(Array(Nullable(UInt64))), arr791 Array(Array(Nullable(UInt64))), arr792 Array(Array(Nullable(UInt64))), arr793 Array(Array(Nullable(UInt64))), arr794 Array(Array(Nullable(UInt64))), arr795 Array(Array(Nullable(UInt64))), arr796 Array(Array(Nullable(UInt64))), arr797 Array(Array(Nullable(UInt64))), arr798 Array(Array(Nullable(UInt64))), arr799 Array(Array(Nullable(UInt64))), + arr800 Array(Array(Nullable(UInt64))), arr801 Array(Array(Nullable(UInt64))), arr802 Array(Array(Nullable(UInt64))), arr803 Array(Array(Nullable(UInt64))), arr804 Array(Array(Nullable(UInt64))), arr805 Array(Array(Nullable(UInt64))), arr806 Array(Array(Nullable(UInt64))), arr807 Array(Array(Nullable(UInt64))), arr808 Array(Array(Nullable(UInt64))), arr809 Array(Array(Nullable(UInt64))), arr810 Array(Array(Nullable(UInt64))), arr811 Array(Array(Nullable(UInt64))), arr812 Array(Array(Nullable(UInt64))), arr813 Array(Array(Nullable(UInt64))), arr814 Array(Array(Nullable(UInt64))), arr815 Array(Array(Nullable(UInt64))), arr816 Array(Array(Nullable(UInt64))), arr817 Array(Array(Nullable(UInt64))), arr818 Array(Array(Nullable(UInt64))), arr819 Array(Array(Nullable(UInt64))), arr820 Array(Array(Nullable(UInt64))), arr821 Array(Array(Nullable(UInt64))), arr822 Array(Array(Nullable(UInt64))), arr823 Array(Array(Nullable(UInt64))), arr824 Array(Array(Nullable(UInt64))), arr825 Array(Array(Nullable(UInt64))), arr826 Array(Array(Nullable(UInt64))), arr827 Array(Array(Nullable(UInt64))), arr828 Array(Array(Nullable(UInt64))), arr829 Array(Array(Nullable(UInt64))), arr830 Array(Array(Nullable(UInt64))), arr831 Array(Array(Nullable(UInt64))), arr832 Array(Array(Nullable(UInt64))), arr833 Array(Array(Nullable(UInt64))), arr834 Array(Array(Nullable(UInt64))), arr835 
Array(Array(Nullable(UInt64))), arr836 Array(Array(Nullable(UInt64))), arr837 Array(Array(Nullable(UInt64))), arr838 Array(Array(Nullable(UInt64))), arr839 Array(Array(Nullable(UInt64))), arr840 Array(Array(Nullable(UInt64))), arr841 Array(Array(Nullable(UInt64))), arr842 Array(Array(Nullable(UInt64))), arr843 Array(Array(Nullable(UInt64))), arr844 Array(Array(Nullable(UInt64))), arr845 Array(Array(Nullable(UInt64))), arr846 Array(Array(Nullable(UInt64))), arr847 Array(Array(Nullable(UInt64))), arr848 Array(Array(Nullable(UInt64))), arr849 Array(Array(Nullable(UInt64))), arr850 Array(Array(Nullable(UInt64))), arr851 Array(Array(Nullable(UInt64))), arr852 Array(Array(Nullable(UInt64))), arr853 Array(Array(Nullable(UInt64))), arr854 Array(Array(Nullable(UInt64))), arr855 Array(Array(Nullable(UInt64))), arr856 Array(Array(Nullable(UInt64))), arr857 Array(Array(Nullable(UInt64))), arr858 Array(Array(Nullable(UInt64))), arr859 Array(Array(Nullable(UInt64))), arr860 Array(Array(Nullable(UInt64))), arr861 Array(Array(Nullable(UInt64))), arr862 Array(Array(Nullable(UInt64))), arr863 Array(Array(Nullable(UInt64))), arr864 Array(Array(Nullable(UInt64))), arr865 Array(Array(Nullable(UInt64))), arr866 Array(Array(Nullable(UInt64))), arr867 Array(Array(Nullable(UInt64))), arr868 Array(Array(Nullable(UInt64))), arr869 Array(Array(Nullable(UInt64))), arr870 Array(Array(Nullable(UInt64))), arr871 Array(Array(Nullable(UInt64))), arr872 Array(Array(Nullable(UInt64))), arr873 Array(Array(Nullable(UInt64))), arr874 Array(Array(Nullable(UInt64))), arr875 Array(Array(Nullable(UInt64))), arr876 Array(Array(Nullable(UInt64))), arr877 Array(Array(Nullable(UInt64))), arr878 Array(Array(Nullable(UInt64))), arr879 Array(Array(Nullable(UInt64))), arr880 Array(Array(Nullable(UInt64))), arr881 Array(Array(Nullable(UInt64))), arr882 Array(Array(Nullable(UInt64))), arr883 Array(Array(Nullable(UInt64))), arr884 Array(Array(Nullable(UInt64))), arr885 Array(Array(Nullable(UInt64))), arr886 Array(Array(Nullable(UInt64))), arr887 Array(Array(Nullable(UInt64))), arr888 Array(Array(Nullable(UInt64))), arr889 Array(Array(Nullable(UInt64))), arr890 Array(Array(Nullable(UInt64))), arr891 Array(Array(Nullable(UInt64))), arr892 Array(Array(Nullable(UInt64))), arr893 Array(Array(Nullable(UInt64))), arr894 Array(Array(Nullable(UInt64))), arr895 Array(Array(Nullable(UInt64))), arr896 Array(Array(Nullable(UInt64))), arr897 Array(Array(Nullable(UInt64))), arr898 Array(Array(Nullable(UInt64))), arr899 Array(Array(Nullable(UInt64))), + arr900 Array(Array(Nullable(UInt64))), arr901 Array(Array(Nullable(UInt64))), arr902 Array(Array(Nullable(UInt64))), arr903 Array(Array(Nullable(UInt64))), arr904 Array(Array(Nullable(UInt64))), arr905 Array(Array(Nullable(UInt64))), arr906 Array(Array(Nullable(UInt64))), arr907 Array(Array(Nullable(UInt64))), arr908 Array(Array(Nullable(UInt64))), arr909 Array(Array(Nullable(UInt64))), arr910 Array(Array(Nullable(UInt64))), arr911 Array(Array(Nullable(UInt64))), arr912 Array(Array(Nullable(UInt64))), arr913 Array(Array(Nullable(UInt64))), arr914 Array(Array(Nullable(UInt64))), arr915 Array(Array(Nullable(UInt64))), arr916 Array(Array(Nullable(UInt64))), arr917 Array(Array(Nullable(UInt64))), arr918 Array(Array(Nullable(UInt64))), arr919 Array(Array(Nullable(UInt64))), arr920 Array(Array(Nullable(UInt64))), arr921 Array(Array(Nullable(UInt64))), arr922 Array(Array(Nullable(UInt64))), arr923 Array(Array(Nullable(UInt64))), arr924 Array(Array(Nullable(UInt64))), arr925 Array(Array(Nullable(UInt64))), arr926 
Array(Array(Nullable(UInt64))), arr927 Array(Array(Nullable(UInt64))), arr928 Array(Array(Nullable(UInt64))), arr929 Array(Array(Nullable(UInt64))), arr930 Array(Array(Nullable(UInt64))), arr931 Array(Array(Nullable(UInt64))), arr932 Array(Array(Nullable(UInt64))), arr933 Array(Array(Nullable(UInt64))), arr934 Array(Array(Nullable(UInt64))), arr935 Array(Array(Nullable(UInt64))), arr936 Array(Array(Nullable(UInt64))), arr937 Array(Array(Nullable(UInt64))), arr938 Array(Array(Nullable(UInt64))), arr939 Array(Array(Nullable(UInt64))), arr940 Array(Array(Nullable(UInt64))), arr941 Array(Array(Nullable(UInt64))), arr942 Array(Array(Nullable(UInt64))), arr943 Array(Array(Nullable(UInt64))), arr944 Array(Array(Nullable(UInt64))), arr945 Array(Array(Nullable(UInt64))), arr946 Array(Array(Nullable(UInt64))), arr947 Array(Array(Nullable(UInt64))), arr948 Array(Array(Nullable(UInt64))), arr949 Array(Array(Nullable(UInt64))), arr950 Array(Array(Nullable(UInt64))), arr951 Array(Array(Nullable(UInt64))), arr952 Array(Array(Nullable(UInt64))), arr953 Array(Array(Nullable(UInt64))), arr954 Array(Array(Nullable(UInt64))), arr955 Array(Array(Nullable(UInt64))), arr956 Array(Array(Nullable(UInt64))), arr957 Array(Array(Nullable(UInt64))), arr958 Array(Array(Nullable(UInt64))), arr959 Array(Array(Nullable(UInt64))), arr960 Array(Array(Nullable(UInt64))), arr961 Array(Array(Nullable(UInt64))), arr962 Array(Array(Nullable(UInt64))), arr963 Array(Array(Nullable(UInt64))), arr964 Array(Array(Nullable(UInt64))), arr965 Array(Array(Nullable(UInt64))), arr966 Array(Array(Nullable(UInt64))), arr967 Array(Array(Nullable(UInt64))), arr968 Array(Array(Nullable(UInt64))), arr969 Array(Array(Nullable(UInt64))), arr970 Array(Array(Nullable(UInt64))), arr971 Array(Array(Nullable(UInt64))), arr972 Array(Array(Nullable(UInt64))), arr973 Array(Array(Nullable(UInt64))), arr974 Array(Array(Nullable(UInt64))), arr975 Array(Array(Nullable(UInt64))), arr976 Array(Array(Nullable(UInt64))), arr977 Array(Array(Nullable(UInt64))), arr978 Array(Array(Nullable(UInt64))), arr979 Array(Array(Nullable(UInt64))), arr980 Array(Array(Nullable(UInt64))), arr981 Array(Array(Nullable(UInt64))), arr982 Array(Array(Nullable(UInt64))), arr983 Array(Array(Nullable(UInt64))), arr984 Array(Array(Nullable(UInt64))), arr985 Array(Array(Nullable(UInt64))), arr986 Array(Array(Nullable(UInt64))), arr987 Array(Array(Nullable(UInt64))), arr988 Array(Array(Nullable(UInt64))), arr989 Array(Array(Nullable(UInt64))), arr990 Array(Array(Nullable(UInt64))), arr991 Array(Array(Nullable(UInt64))), arr992 Array(Array(Nullable(UInt64))), arr993 Array(Array(Nullable(UInt64))), arr994 Array(Array(Nullable(UInt64))), arr995 Array(Array(Nullable(UInt64))), arr996 Array(Array(Nullable(UInt64))), arr997 Array(Array(Nullable(UInt64))), arr998 Array(Array(Nullable(UInt64))), arr999 Array(Array(Nullable(UInt64)))) + ENGINE = MergeTree ORDER BY id PARTITION BY id % 100 + + + INSERT INTO lot_of_arrays(id) SELECT number FROM numbers(1000) + OPTIMIZE TABLE lot_of_arrays FINAL + + SELECT nested.arr0 FROM lot_of_arrays WHERE id > 10 FORMAT Null + + DROP TABLE IF EXISTS lot_of_arrays + diff --git a/tests/performance/nlp.xml b/tests/performance/nlp.xml new file mode 100644 index 00000000000..07eda93c686 --- /dev/null +++ b/tests/performance/nlp.xml @@ -0,0 +1,20 @@ + + + 1 + + + + hits_100m_single + + + CREATE TABLE hits_100m_words (words Array(String), UserID UInt64) ENGINE Memory + INSERT INTO hits_100m_words SELECT splitByNonAlpha(SearchPhrase) AS words, UserID FROM 
hits_100m_single WHERE length(words) > 0 + + SELECT splitByNonAlpha(SearchPhrase) FROM hits_100m_single FORMAT Null + SELECT splitByWhitespace(SearchPhrase) FROM hits_100m_single FORMAT Null + + SELECT arrayMap(x -> stem('ru', x), words) FROM hits_100m_words FORMAT Null + + DROP TABLE IF EXISTS hits_100m_words + DROP TABLE IF EXISTS hits_100m_words_ws + diff --git a/tests/performance/window_functions.xml b/tests/performance/window_functions.xml index 6be3d59e2b0..e3d30d96ec3 100644 --- a/tests/performance/window_functions.xml +++ b/tests/performance/window_functions.xml @@ -3,10 +3,6 @@ hits_100m_single - - 1 - - ... SELECT sum(n) from rich_syntax; + +-- Clear cache to avoid future errors in the logs +SYSTEM DROP DNS CACHE diff --git a/tests/queries/0_stateless/01087_storage_generate.sql b/tests/queries/0_stateless/01087_storage_generate.sql index bc69e8abbac..a16ad55832c 100644 --- a/tests/queries/0_stateless/01087_storage_generate.sql +++ b/tests/queries/0_stateless/01087_storage_generate.sql @@ -7,7 +7,7 @@ DROP TABLE IF EXISTS test_table; SELECT '-'; DROP TABLE IF EXISTS test_table_2; -CREATE TABLE test_table_2(a Array(Int8), d Decimal32(4), c Tuple(DateTime64(3), UUID)) ENGINE=GenerateRandom(10, 5, 3); +CREATE TABLE test_table_2(a Array(Int8), d Decimal32(4), c Tuple(DateTime64(3, 'Europe/Moscow'), UUID)) ENGINE=GenerateRandom(10, 5, 3); SELECT * FROM test_table_2 LIMIT 100; diff --git a/tests/queries/0_stateless/01087_table_function_generate.reference b/tests/queries/0_stateless/01087_table_function_generate.reference index c04fa831328..bf301d34eb3 100644 --- a/tests/queries/0_stateless/01087_table_function_generate.reference +++ b/tests/queries/0_stateless/01087_table_function_generate.reference @@ -46,7 +46,7 @@ h \N o - -Date DateTime DateTime(\'Europe/Moscow\') +Date DateTime(\'Europe/Moscow\') DateTime(\'Europe/Moscow\') 2113-06-12 2050-12-17 02:46:35 2096-02-16 22:18:22 2141-08-09 2013-10-17 23:35:26 1976-01-24 12:52:48 2039-08-16 1974-11-17 23:22:46 1980-03-04 21:02:50 @@ -58,7 +58,7 @@ Date DateTime DateTime(\'Europe/Moscow\') 2008-03-16 2047-05-16 23:28:36 2103-02-11 16:44:39 2000-07-07 2105-07-19 19:29:06 1980-01-02 05:18:22 - -DateTime64(3) DateTime64(6) DateTime64(6, \'Europe/Moscow\') +DateTime64(3, \'Europe/Moscow\') DateTime64(6, \'Europe/Moscow\') DateTime64(6, \'Europe/Moscow\') 1978-06-07 23:50:57.320 2013-08-28 10:21:54.010758 1991-08-25 16:23:26.140215 1978-08-25 17:07:25.427 2034-05-02 20:49:42.148578 2015-08-26 15:26:31.783160 2037-04-04 10:50:56.898 2055-05-28 11:12:48.819271 2068-12-26 09:58:49.635722 diff --git a/tests/queries/0_stateless/01087_table_function_generate.sql b/tests/queries/0_stateless/01087_table_function_generate.sql index 05f03a5a4e6..9a0f7db24ec 100644 --- a/tests/queries/0_stateless/01087_table_function_generate.sql +++ b/tests/queries/0_stateless/01087_table_function_generate.sql @@ -42,20 +42,20 @@ LIMIT 10; SELECT '-'; SELECT toTypeName(d), toTypeName(dt), toTypeName(dtm) -FROM generateRandom('d Date, dt DateTime, dtm DateTime(\'Europe/Moscow\')') +FROM generateRandom('d Date, dt DateTime(\'Europe/Moscow\'), dtm DateTime(\'Europe/Moscow\')') LIMIT 1; SELECT d, dt, dtm -FROM generateRandom('d Date, dt DateTime, dtm DateTime(\'Europe/Moscow\')', 1, 10, 10) +FROM generateRandom('d Date, dt DateTime(\'Europe/Moscow\'), dtm DateTime(\'Europe/Moscow\')', 1, 10, 10) LIMIT 10; SELECT '-'; SELECT toTypeName(dt64), toTypeName(dts64), toTypeName(dtms64) -FROM generateRandom('dt64 DateTime64, dts64 DateTime64(6), dtms64 DateTime64(6 
,\'Europe/Moscow\')') +FROM generateRandom('dt64 DateTime64(3, \'Europe/Moscow\'), dts64 DateTime64(6, \'Europe/Moscow\'), dtms64 DateTime64(6 ,\'Europe/Moscow\')') LIMIT 1; SELECT dt64, dts64, dtms64 -FROM generateRandom('dt64 DateTime64, dts64 DateTime64(6), dtms64 DateTime64(6 ,\'Europe/Moscow\')', 1, 10, 10) +FROM generateRandom('dt64 DateTime64(3, \'Europe/Moscow\'), dts64 DateTime64(6, \'Europe/Moscow\'), dtms64 DateTime64(6 ,\'Europe/Moscow\')', 1, 10, 10) LIMIT 10; SELECT '-'; SELECT @@ -168,8 +168,8 @@ FROM generateRandom('i String', 1, 10, 10) LIMIT 10; SELECT '-'; DROP TABLE IF EXISTS test_table; -CREATE TABLE test_table(a Array(Int8), d Decimal32(4), c Tuple(DateTime64(3), UUID)) ENGINE=Memory; -INSERT INTO test_table SELECT * FROM generateRandom('a Array(Int8), d Decimal32(4), c Tuple(DateTime64(3), UUID)', 1, 10, 2) +CREATE TABLE test_table(a Array(Int8), d Decimal32(4), c Tuple(DateTime64(3, 'Europe/Moscow'), UUID)) ENGINE=Memory; +INSERT INTO test_table SELECT * FROM generateRandom('a Array(Int8), d Decimal32(4), c Tuple(DateTime64(3, \'Europe/Moscow\'), UUID)', 1, 10, 2) LIMIT 10; SELECT * FROM test_table ORDER BY a, d, c; @@ -179,8 +179,8 @@ DROP TABLE IF EXISTS test_table; SELECT '-'; DROP TABLE IF EXISTS test_table_2; -CREATE TABLE test_table_2(a Array(Int8), b UInt32, c Nullable(String), d Decimal32(4), e Nullable(Enum16('h' = 1, 'w' = 5 , 'o' = -200)), f Float64, g Tuple(Date, DateTime, DateTime64, UUID), h FixedString(2)) ENGINE=Memory; -INSERT INTO test_table_2 SELECT * FROM generateRandom('a Array(Int8), b UInt32, c Nullable(String), d Decimal32(4), e Nullable(Enum16(\'h\' = 1, \'w\' = 5 , \'o\' = -200)), f Float64, g Tuple(Date, DateTime, DateTime64, UUID), h FixedString(2)', 10, 5, 3) +CREATE TABLE test_table_2(a Array(Int8), b UInt32, c Nullable(String), d Decimal32(4), e Nullable(Enum16('h' = 1, 'w' = 5 , 'o' = -200)), f Float64, g Tuple(Date, DateTime('Europe/Moscow'), DateTime64(3, 'Europe/Moscow'), UUID), h FixedString(2)) ENGINE=Memory; +INSERT INTO test_table_2 SELECT * FROM generateRandom('a Array(Int8), b UInt32, c Nullable(String), d Decimal32(4), e Nullable(Enum16(\'h\' = 1, \'w\' = 5 , \'o\' = -200)), f Float64, g Tuple(Date, DateTime(\'Europe/Moscow\'), DateTime64(3, \'Europe/Moscow\'), UUID), h FixedString(2)', 10, 5, 3) LIMIT 10; SELECT a, b, c, d, e, f, g, hex(h) FROM test_table_2 ORDER BY a, b, c, d, e, f, g, h; diff --git a/tests/queries/0_stateless/01095_tpch_like_smoke.reference b/tests/queries/0_stateless/01095_tpch_like_smoke.reference index e47b402bf9f..8cdcc2b015f 100644 --- a/tests/queries/0_stateless/01095_tpch_like_smoke.reference +++ b/tests/queries/0_stateless/01095_tpch_like_smoke.reference @@ -11,7 +11,7 @@ 10 11 12 -13 fail: join predicates +13 14 0.000000 15 fail: correlated subquery diff --git a/tests/queries/0_stateless/01095_tpch_like_smoke.sql b/tests/queries/0_stateless/01095_tpch_like_smoke.sql index ffd2e21dc39..5971178ade5 100644 --- a/tests/queries/0_stateless/01095_tpch_like_smoke.sql +++ b/tests/queries/0_stateless/01095_tpch_like_smoke.sql @@ -476,7 +476,7 @@ group by order by l_shipmode; -select 13, 'fail: join predicates'; -- TODO: Invalid expression for JOIN ON +select 13; select c_count, count(*) as custdist @@ -484,7 +484,7 @@ from ( select c_custkey, - count(o_orderkey) + count(o_orderkey) as c_count from customer left outer join orders on c_custkey = o_custkey @@ -496,7 +496,7 @@ group by c_count order by custdist desc, - c_count desc; -- { serverError 403 } + c_count desc; select 14; select diff --git 
a/tests/queries/0_stateless/01098_msgpack_format.sh b/tests/queries/0_stateless/01098_msgpack_format.sh index c7a1a0cff42..3bc60b4a9cb 100755 --- a/tests/queries/0_stateless/01098_msgpack_format.sh +++ b/tests/queries/0_stateless/01098_msgpack_format.sh @@ -6,7 +6,7 @@ CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) $CLICKHOUSE_CLIENT --query="DROP TABLE IF EXISTS msgpack"; -$CLICKHOUSE_CLIENT --query="CREATE TABLE msgpack (uint8 UInt8, uint16 UInt16, uint32 UInt32, uint64 UInt64, int8 Int8, int16 Int16, int32 Int32, int64 Int64, float Float32, double Float64, string String, date Date, datetime DateTime, datetime64 DateTime64, array Array(UInt32)) ENGINE = Memory"; +$CLICKHOUSE_CLIENT --query="CREATE TABLE msgpack (uint8 UInt8, uint16 UInt16, uint32 UInt32, uint64 UInt64, int8 Int8, int16 Int16, int32 Int32, int64 Int64, float Float32, double Float64, string String, date Date, datetime DateTime('Europe/Moscow'), datetime64 DateTime64(3, 'Europe/Moscow'), array Array(UInt32)) ENGINE = Memory"; $CLICKHOUSE_CLIENT --query="INSERT INTO msgpack VALUES (255, 65535, 4294967295, 100000000000, -128, -32768, -2147483648, -100000000000, 2.02, 10000.0000001, 'String', 18980, 1639872000, 1639872000000, [1,2,3,4,5]), (4, 1234, 3244467295, 500000000000, -1, -256, -14741221, -7000000000, 100.1, 14321.032141201, 'Another string', 20000, 1839882000, 1639872891123, [5,4,3,2,1]), (42, 42, 42, 42, 42, 42, 42, 42, 42.42, 42.42, '42', 42, 42, 42, [42])"; diff --git a/tests/queries/0_stateless/01099_operators_date_and_timestamp.sql b/tests/queries/0_stateless/01099_operators_date_and_timestamp.sql index f52d2b774c1..c630e19490d 100644 --- a/tests/queries/0_stateless/01099_operators_date_and_timestamp.sql +++ b/tests/queries/0_stateless/01099_operators_date_and_timestamp.sql @@ -14,7 +14,7 @@ select timestamp '2001-09-28 01:00:00' + interval 23 hour; select timestamp '2001-09-28 23:00:00' - interval 23 hour; -- TODO: return interval -select (timestamp '2001-09-29 03:00:00' - timestamp '2001-09-27 12:00:00') x, toTypeName(x); -- interval '1 day 15:00:00' +select (timestamp '2001-12-29 03:00:00' - timestamp '2001-12-27 12:00:00') x, toTypeName(x); -- interval '1 day 15:00:00' -- select -interval 23 hour; -- interval '-23:00:00' -- select interval 1 day + interval 1 hour; -- interval '1 day 01:00:00' diff --git a/tests/queries/0_stateless/01107_join_right_table_totals.reference b/tests/queries/0_stateless/01107_join_right_table_totals.reference index f71d3b0d05f..daf503b776d 100644 --- a/tests/queries/0_stateless/01107_join_right_table_totals.reference +++ b/tests/queries/0_stateless/01107_join_right_table_totals.reference @@ -18,3 +18,31 @@ 0 0 0 0 +1 1 +1 1 + +0 0 +1 1 +1 1 + +0 0 +1 1 +1 1 + +0 0 +1 1 +1 1 + +0 0 +1 1 + +0 0 +1 foo 1 1 300 + +0 foo 1 0 300 +1 100 1970-01-01 1 100 1970-01-01 +1 100 1970-01-01 1 200 1970-01-02 +1 200 1970-01-02 1 100 1970-01-01 +1 200 1970-01-02 1 200 1970-01-02 + +0 0 1970-01-01 0 0 1970-01-01 diff --git a/tests/queries/0_stateless/01107_join_right_table_totals.sql b/tests/queries/0_stateless/01107_join_right_table_totals.sql index a4f284e5e2d..f894b6bf8bb 100644 --- a/tests/queries/0_stateless/01107_join_right_table_totals.sql +++ b/tests/queries/0_stateless/01107_join_right_table_totals.sql @@ -35,29 +35,66 @@ FULL JOIN ) rr USING (id); -SELECT id, yago +SELECT id, yago FROM ( SELECT item_id AS id FROM t GROUP BY id ) AS ll -FULL OUTER JOIN ( SELECT item_id AS id, arrayJoin([111, 222, 333, 444]), SUM(price_sold) AS yago FROM t GROUP BY id WITH TOTALS ) AS rr +FULL OUTER JOIN 
( SELECT item_id AS id, arrayJoin([111, 222, 333, 444]), SUM(price_sold) AS yago FROM t GROUP BY id WITH TOTALS ) AS rr USING (id); -SELECT id, yago +SELECT id, yago FROM ( SELECT item_id AS id, arrayJoin([111, 222, 333]) FROM t GROUP BY id WITH TOTALS ) AS ll -FULL OUTER JOIN ( SELECT item_id AS id, SUM(price_sold) AS yago FROM t GROUP BY id ) AS rr +FULL OUTER JOIN ( SELECT item_id AS id, SUM(price_sold) AS yago FROM t GROUP BY id ) AS rr USING (id); -SELECT id, yago +SELECT id, yago FROM ( SELECT item_id AS id, arrayJoin(emptyArrayInt32()) FROM t GROUP BY id WITH TOTALS ) AS ll -FULL OUTER JOIN ( SELECT item_id AS id, SUM(price_sold) AS yago FROM t GROUP BY id ) AS rr +FULL OUTER JOIN ( SELECT item_id AS id, SUM(price_sold) AS yago FROM t GROUP BY id ) AS rr USING (id); -SELECT id, yago +SELECT id, yago FROM ( SELECT item_id AS id FROM t GROUP BY id ) AS ll -FULL OUTER JOIN ( SELECT item_id AS id, arrayJoin(emptyArrayInt32()), SUM(price_sold) AS yago FROM t GROUP BY id WITH TOTALS ) AS rr +FULL OUTER JOIN ( SELECT item_id AS id, arrayJoin(emptyArrayInt32()), SUM(price_sold) AS yago FROM t GROUP BY id WITH TOTALS ) AS rr USING (id); -SELECT id, yago +SELECT id, yago FROM ( SELECT item_id AS id, arrayJoin([111, 222, 333]) FROM t GROUP BY id WITH TOTALS ) AS ll -FULL OUTER JOIN ( SELECT item_id AS id, arrayJoin([111, 222, 333, 444]), SUM(price_sold) AS yago FROM t GROUP BY id WITH TOTALS ) AS rr +FULL OUTER JOIN ( SELECT item_id AS id, arrayJoin([111, 222, 333, 444]), SUM(price_sold) AS yago FROM t GROUP BY id WITH TOTALS ) AS rr USING (id); +INSERT INTO t VALUES (1, 100, '1970-01-01'), (1, 200, '1970-01-02'); + +SELECT * +FROM (SELECT item_id FROM t GROUP BY item_id WITH TOTALS) l +LEFT JOIN (SELECT item_id FROM t ) r +ON l.item_id = r.item_id; + +SELECT * +FROM (SELECT item_id FROM t GROUP BY item_id WITH TOTALS) l +RIGHT JOIN (SELECT item_id FROM t ) r +ON l.item_id = r.item_id; + +SELECT * +FROM (SELECT item_id FROM t) l +LEFT JOIN (SELECT item_id FROM t GROUP BY item_id WITH TOTALS ) r +ON l.item_id = r.item_id; + +SELECT * +FROM (SELECT item_id FROM t) l +RIGHT JOIN (SELECT item_id FROM t GROUP BY item_id WITH TOTALS ) r +ON l.item_id = r.item_id; + +SELECT * +FROM (SELECT item_id FROM t GROUP BY item_id WITH TOTALS) l +LEFT JOIN (SELECT item_id FROM t GROUP BY item_id WITH TOTALS ) r +ON l.item_id = r.item_id; + +SELECT * +FROM (SELECT item_id, 'foo' AS key, 1 AS val FROM t GROUP BY item_id WITH TOTALS) l +LEFT JOIN (SELECT item_id, sum(price_sold) AS val FROM t GROUP BY item_id WITH TOTALS ) r +ON l.item_id = r.item_id; + +SELECT * +FROM (SELECT * FROM t GROUP BY item_id, price_sold, date WITH TOTALS) l +LEFT JOIN (SELECT * FROM t GROUP BY item_id, price_sold, date WITH TOTALS ) r +ON l.item_id = r.item_id; + DROP TABLE t; diff --git a/tests/queries/0_stateless/01149_zookeeper_mutation_stuck_after_replace_partition.reference b/tests/queries/0_stateless/01149_zookeeper_mutation_stuck_after_replace_partition.reference index edf8a7e391c..f60cf551ce9 100644 --- a/tests/queries/0_stateless/01149_zookeeper_mutation_stuck_after_replace_partition.reference +++ b/tests/queries/0_stateless/01149_zookeeper_mutation_stuck_after_replace_partition.reference @@ -4,8 +4,8 @@ 4 4 mt 0 0_1_1_0 2 rmt 0 0_0_0_0 2 -1 1 -2 2 +1 s1 +2 s2 mt 0 0_1_1_0 2 rmt 0 0_3_3_0 2 0000000000 UPDATE s = concat(\'s\', toString(n)) WHERE 1 [] 0 1 diff --git a/tests/queries/0_stateless/01149_zookeeper_mutation_stuck_after_replace_partition.sql 
b/tests/queries/0_stateless/01149_zookeeper_mutation_stuck_after_replace_partition.sql index fd3f1f3fcfe..ca8f70b3cf4 100644 --- a/tests/queries/0_stateless/01149_zookeeper_mutation_stuck_after_replace_partition.sql +++ b/tests/queries/0_stateless/01149_zookeeper_mutation_stuck_after_replace_partition.sql @@ -1,3 +1,4 @@ +set send_logs_level='error'; drop table if exists mt; drop table if exists rmt sync; @@ -5,12 +6,13 @@ create table mt (n UInt64, s String) engine = MergeTree partition by intDiv(n, 1 insert into mt values (3, '3'), (4, '4'); create table rmt (n UInt64, s String) engine = ReplicatedMergeTree('/clickhouse/test_01149_{database}/rmt', 'r1') partition by intDiv(n, 10) order by n; -insert into rmt values (1,'1'), (2, '2'); +insert into rmt values (1, '1'), (2, '2'); select * from rmt; select * from mt; select table, partition_id, name, rows from system.parts where database=currentDatabase() and table in ('mt', 'rmt') and active=1 order by table, name; +SET mutations_sync = 1; alter table rmt update s = 's'||toString(n) where 1; select * from rmt; diff --git a/tests/queries/0_stateless/01154_move_partition_long.sh b/tests/queries/0_stateless/01154_move_partition_long.sh index dd16b2dc63d..1b5985b9942 100755 --- a/tests/queries/0_stateless/01154_move_partition_long.sh +++ b/tests/queries/0_stateless/01154_move_partition_long.sh @@ -117,14 +117,15 @@ timeout $TIMEOUT bash -c drop_part_thread & wait for ((i=0; i<16; i++)) do - $CLICKHOUSE_CLIENT -q "SYSTEM SYNC REPLICA dst_$i" & - $CLICKHOUSE_CLIENT -q "SYSTEM SYNC REPLICA src_$i" 2>/dev/null & + # The size of log is big, so increase timeout. + $CLICKHOUSE_CLIENT --receive_timeout 600 -q "SYSTEM SYNC REPLICA dst_$i" & + $CLICKHOUSE_CLIENT --receive_timeout 600 -q "SYSTEM SYNC REPLICA src_$i" 2>/dev/null & done wait echo "Replication did not hang" for ((i=0; i<16; i++)) do - $CLICKHOUSE_CLIENT -q "DROP TABLE dst_$i" & - $CLICKHOUSE_CLIENT -q "DROP TABLE IF EXISTS src_$i" & + $CLICKHOUSE_CLIENT -q "DROP TABLE dst_$i" 2>&1| grep -Fv "is already started to be removing" & + $CLICKHOUSE_CLIENT -q "DROP TABLE IF EXISTS src_$i" 2>&1| grep -Fv "is already started to be removing" & done wait diff --git a/tests/queries/0_stateless/01156_pcg_deserialization.reference b/tests/queries/0_stateless/01156_pcg_deserialization.reference index e43b7ca3ceb..a41bc53d840 100644 --- a/tests/queries/0_stateless/01156_pcg_deserialization.reference +++ b/tests/queries/0_stateless/01156_pcg_deserialization.reference @@ -1,3 +1,6 @@ 5 5 5 5 5 5 +5 5 +5 5 +5 5 diff --git a/tests/queries/0_stateless/01156_pcg_deserialization.sh b/tests/queries/0_stateless/01156_pcg_deserialization.sh index 9c8ac29f32e..00ef86dce9c 100755 --- a/tests/queries/0_stateless/01156_pcg_deserialization.sh +++ b/tests/queries/0_stateless/01156_pcg_deserialization.sh @@ -4,16 +4,20 @@ CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) # shellcheck source=../shell_config.sh . 
"$CURDIR"/../shell_config.sh +declare -a functions=("groupArraySample" "groupUniqArray") declare -a engines=("Memory" "MergeTree order by n" "Log") -for engine in "${engines[@]}" +for func in "${functions[@]}" do - $CLICKHOUSE_CLIENT -q "drop table if exists t"; - $CLICKHOUSE_CLIENT -q "create table t (n UInt8, a1 AggregateFunction(groupArraySample(1), UInt8)) engine=$engine" - $CLICKHOUSE_CLIENT -q "insert into t select number % 5 as n, groupArraySampleState(1)(toUInt8(number)) from numbers(10) group by n" + for engine in "${engines[@]}" + do + $CLICKHOUSE_CLIENT -q "drop table if exists t"; + $CLICKHOUSE_CLIENT -q "create table t (n UInt8, a1 AggregateFunction($func(1), UInt8)) engine=$engine" + $CLICKHOUSE_CLIENT -q "insert into t select number % 5 as n, ${func}State(1)(toUInt8(number)) from numbers(10) group by n" - $CLICKHOUSE_CLIENT -q "select * from t format TSV" | $CLICKHOUSE_CLIENT -q "insert into t format TSV" - $CLICKHOUSE_CLIENT -q "select countDistinct(n), countDistinct(a1) from t" + $CLICKHOUSE_CLIENT -q "select * from t format TSV" | $CLICKHOUSE_CLIENT -q "insert into t format TSV" + $CLICKHOUSE_CLIENT -q "select countDistinct(n), countDistinct(a1) from t" - $CLICKHOUSE_CLIENT -q "drop table t"; + $CLICKHOUSE_CLIENT -q "drop table t"; + done done diff --git a/tests/queries/0_stateless/01157_replace_table.reference b/tests/queries/0_stateless/01157_replace_table.reference new file mode 100644 index 00000000000..9fddaf99847 --- /dev/null +++ b/tests/queries/0_stateless/01157_replace_table.reference @@ -0,0 +1,20 @@ +test flush on replace +1 s1 +2 s2 +3 s3 +exception on create and fill +0 +1 1 s1 +2 2 s2 +3 3 s3 +1 1 s1 +2 2 s2 +3 3 s3 +1 1 s1 +2 2 s2 +3 3 s3 +4 4 s4 +buf +dist +join +t diff --git a/tests/queries/0_stateless/01157_replace_table.sql b/tests/queries/0_stateless/01157_replace_table.sql new file mode 100644 index 00000000000..a29b381a522 --- /dev/null +++ b/tests/queries/0_stateless/01157_replace_table.sql @@ -0,0 +1,51 @@ +drop table if exists t; +drop table if exists dist; +drop table if exists buf; +drop table if exists join; + +select 'test flush on replace'; +create table t (n UInt64, s String default 's' || toString(n)) engine=Memory; +create table dist (n int) engine=Distributed(test_shard_localhost, currentDatabase(), t); +create table buf (n int) engine=Buffer(currentDatabase(), dist, 1, 10, 100, 10, 100, 1000, 1000); + +system stop distributed sends dist; +insert into buf values (1); +replace table buf (n int) engine=Distributed(test_shard_localhost, currentDatabase(), dist); +replace table dist (n int) engine=Buffer(currentDatabase(), t, 1, 10, 100, 10, 100, 1000, 1000); + +system stop distributed sends buf; +insert into buf values (2); +replace table buf (n int) engine=Buffer(currentDatabase(), dist, 1, 10, 100, 10, 100, 1000, 1000); +replace table dist (n int) engine=Distributed(test_shard_localhost, currentDatabase(), t); + +system stop distributed sends dist; +insert into buf values (3); +replace table buf (n int) engine=Null; +replace table dist (n int) engine=Null; + +select * from t order by n; + +select 'exception on create and fill'; +-- table is not created if select fails +create or replace table join engine=Join(ANY, INNER, n) as select * from t where throwIf(n); -- { serverError 395 } +select count() from system.tables where database=currentDatabase() and name='join'; + +-- table is created and filled +create or replace table join engine=Join(ANY, INNER, n) as select * from t; +select * from numbers(10) as t any join join on t.number=join.n 
order by n; + +-- table is not replaced if select fails +insert into t(n) values (4); +replace table join engine=Join(ANY, INNER, n) as select * from t where throwIf(n); -- { serverError 395 } +select * from numbers(10) as t any join join on t.number=join.n order by n; + +-- table is replaced +replace table join engine=Join(ANY, INNER, n) as select * from t; +select * from numbers(10) as t any join join on t.number=join.n order by n; + +select name from system.tables where database=currentDatabase() order by name; + +drop table t; +drop table dist; +drop table buf; +drop table join; diff --git a/tests/queries/0_stateless/01158_zookeeper_log.reference b/tests/queries/0_stateless/01158_zookeeper_log.reference new file mode 100644 index 00000000000..35a30ee04e3 --- /dev/null +++ b/tests/queries/0_stateless/01158_zookeeper_log.reference @@ -0,0 +1,40 @@ +log +Response 0 Watch /test/01158/default/rmt/log 0 0 \N 0 0 ZOK CHILD CONNECTED 0 0 0 0 +Request 0 Create /test/01158/default/rmt/log 0 0 \N 0 4 \N \N \N 0 0 0 0 +Response 0 Create /test/01158/default/rmt/log 0 0 \N 0 4 ZOK \N \N /test/01158/default/rmt/log 0 0 0 0 +Request 0 Create /test/01158/default/rmt/log/log- 0 1 \N 0 1 \N \N \N 0 0 0 0 +Response 0 Create /test/01158/default/rmt/log/log- 0 1 \N 0 1 ZOK \N \N /test/01158/default/rmt/log/log-0000000000 0 0 0 0 +parts +Request 0 Multi 0 0 \N 5 0 \N \N \N 0 0 0 0 +Request 0 Create /test/01158/default/rmt/log/log- 0 1 \N 0 1 \N \N \N 0 0 0 0 +Request 0 Remove /test/01158/default/rmt/block_numbers/all/block-0000000000 0 0 -1 0 2 \N \N \N 0 0 0 0 +Request 0 Remove /test/01158/default/rmt/temp/abandonable_lock-0000000000 0 0 -1 0 3 \N \N \N 0 0 0 0 +Request 0 Create /test/01158/default/rmt/blocks/all_6308706741995381342_2495791770474910886 0 0 \N 0 4 \N \N \N 0 0 0 0 +Request 0 Create /test/01158/default/rmt/replicas/1/parts/all_0_0_0 0 0 \N 0 5 \N \N \N 0 0 0 0 +Response 0 Multi 0 0 \N 5 0 ZOK \N \N 0 0 0 0 +Response 0 Create /test/01158/default/rmt/log/log- 0 1 \N 0 1 ZOK \N \N /test/01158/default/rmt/log/log-0000000000 0 0 0 0 +Response 0 Remove /test/01158/default/rmt/block_numbers/all/block-0000000000 0 0 -1 0 2 ZOK \N \N 0 0 0 0 +Response 0 Remove /test/01158/default/rmt/temp/abandonable_lock-0000000000 0 0 -1 0 3 ZOK \N \N 0 0 0 0 +Response 0 Create /test/01158/default/rmt/blocks/all_6308706741995381342_2495791770474910886 0 0 \N 0 4 ZOK \N \N /test/01158/default/rmt/blocks/all_6308706741995381342_2495791770474910886 0 0 0 0 +Response 0 Create /test/01158/default/rmt/replicas/1/parts/all_0_0_0 0 0 \N 0 5 ZOK \N \N /test/01158/default/rmt/replicas/1/parts/all_0_0_0 0 0 0 0 +Request 0 Exists /test/01158/default/rmt/replicas/1/parts/all_0_0_0 0 0 \N 0 0 \N \N \N 0 0 0 0 +Response 0 Exists /test/01158/default/rmt/replicas/1/parts/all_0_0_0 0 0 \N 0 0 ZOK \N \N 0 0 96 0 +blocks +Request 0 Multi 0 0 \N 3 0 \N \N \N 0 0 0 0 +Request 0 Create /test/01158/default/rmt/blocks/all_6308706741995381342_2495791770474910886 0 0 \N 0 1 \N \N \N 0 0 0 0 +Request 0 Remove /test/01158/default/rmt/blocks/all_6308706741995381342_2495791770474910886 0 0 -1 0 2 \N \N \N 0 0 0 0 +Request 0 Create /test/01158/default/rmt/temp/abandonable_lock- 1 1 \N 0 3 \N \N \N 0 0 0 0 +Response 0 Multi 0 0 \N 3 0 ZOK \N \N 0 0 0 0 +Response 0 Create /test/01158/default/rmt/blocks/all_6308706741995381342_2495791770474910886 0 0 \N 0 1 ZOK \N \N /test/01158/default/rmt/blocks/all_6308706741995381342_2495791770474910886 0 0 0 0 +Response 0 Remove /test/01158/default/rmt/blocks/all_6308706741995381342_2495791770474910886 0 0 -1 0 
2 ZOK \N \N 0 0 0 0 +Response 0 Create /test/01158/default/rmt/temp/abandonable_lock- 1 1 \N 0 3 ZOK \N \N /test/01158/default/rmt/temp/abandonable_lock-0000000000 0 0 0 0 +Request 0 Multi 0 0 \N 3 0 \N \N \N 0 0 0 0 +Request 0 Create /test/01158/default/rmt/blocks/all_6308706741995381342_2495791770474910886 0 0 \N 0 1 \N \N \N 0 0 0 0 +Request 0 Remove /test/01158/default/rmt/blocks/all_6308706741995381342_2495791770474910886 0 0 -1 0 2 \N \N \N 0 0 0 0 +Request 0 Create /test/01158/default/rmt/temp/abandonable_lock- 1 1 \N 0 3 \N \N \N 0 0 0 0 +Response 0 Multi 0 0 \N 3 0 ZNODEEXISTS \N \N 0 0 0 0 +Response 0 Error /test/01158/default/rmt/blocks/all_6308706741995381342_2495791770474910886 0 0 \N 0 1 ZNODEEXISTS \N \N 0 0 0 0 +Response 0 Error /test/01158/default/rmt/blocks/all_6308706741995381342_2495791770474910886 0 0 -1 0 2 ZRUNTIMEINCONSISTENCY \N \N 0 0 0 0 +Response 0 Error /test/01158/default/rmt/temp/abandonable_lock- 1 1 \N 0 3 ZRUNTIMEINCONSISTENCY \N \N 0 0 0 0 +Request 0 Get /test/01158/default/rmt/blocks/all_6308706741995381342_2495791770474910886 0 0 \N 0 0 \N \N \N 0 0 0 0 +Response 0 Get /test/01158/default/rmt/blocks/all_6308706741995381342_2495791770474910886 0 0 \N 0 0 ZOK \N \N 0 0 9 0 diff --git a/tests/queries/0_stateless/01158_zookeeper_log.sql b/tests/queries/0_stateless/01158_zookeeper_log.sql new file mode 100644 index 00000000000..f3f1980b5a2 --- /dev/null +++ b/tests/queries/0_stateless/01158_zookeeper_log.sql @@ -0,0 +1,28 @@ +drop table if exists rmt; +create table rmt (n int) engine=ReplicatedMergeTree('/test/01158/{database}/rmt', '1') order by n; +system sync replica rmt; +insert into rmt values (1); +insert into rmt values (1); +system flush logs; + +select 'log'; +select type, has_watch, op_num, path, is_ephemeral, is_sequential, version, requests_size, request_idx, error, watch_type, + watch_state, path_created, stat_version, stat_cversion, stat_dataLength, stat_numChildren +from system.zookeeper_log where path like '/test/01158/' || currentDatabase() || '/rmt/log%' and op_num not in (3, 4, 12) +order by xid, type, request_idx; + +select 'parts'; +select type, has_watch, op_num, path, is_ephemeral, is_sequential, version, requests_size, request_idx, error, watch_type, + watch_state, path_created, stat_version, stat_cversion, stat_dataLength, stat_numChildren +from system.zookeeper_log +where (session_id, xid) in (select session_id, xid from system.zookeeper_log where path='/test/01158/' || currentDatabase() || '/rmt/replicas/1/parts/all_0_0_0') +order by xid, type, request_idx; + +select 'blocks'; +select type, has_watch, op_num, path, is_ephemeral, is_sequential, version, requests_size, request_idx, error, watch_type, + watch_state, path_created, stat_version, stat_cversion, stat_dataLength, stat_numChildren +from system.zookeeper_log +where (session_id, xid) in (select session_id, xid from system.zookeeper_log where path like '/test/01158/' || currentDatabase() || '/rmt/blocks%' and op_num not in (1, 12)) +order by xid, type, request_idx; + +drop table rmt; diff --git a/tests/queries/0_stateless/01159_combinators_with_parameters.reference b/tests/queries/0_stateless/01159_combinators_with_parameters.reference new file mode 100644 index 00000000000..cc0cb604bf3 --- /dev/null +++ b/tests/queries/0_stateless/01159_combinators_with_parameters.reference @@ -0,0 +1,20 @@ +AggregateFunction(topKArray(10), Array(String)) +AggregateFunction(topKDistinct(10), String) +AggregateFunction(topKForEach(10), Array(String)) +AggregateFunction(topKIf(10), String, 
UInt8) +AggregateFunction(topK(10), String) +AggregateFunction(topKOrNull(10), String) +AggregateFunction(topKOrDefault(10), String) +AggregateFunction(topKResample(10, 1, 2, 42), String, UInt64) +AggregateFunction(topK(10), String) +AggregateFunction(topKArrayResampleOrDefaultIf(10, 1, 2, 42), Array(String), UInt64, UInt8) +10 +10 +[10] +11 +10 +10 +10 +[1] +10 +[1] diff --git a/tests/queries/0_stateless/01159_combinators_with_parameters.sql b/tests/queries/0_stateless/01159_combinators_with_parameters.sql new file mode 100644 index 00000000000..69508d8e304 --- /dev/null +++ b/tests/queries/0_stateless/01159_combinators_with_parameters.sql @@ -0,0 +1,43 @@ +SELECT toTypeName(topKArrayState(10)([toString(number)])) FROM numbers(100); +SELECT toTypeName(topKDistinctState(10)(toString(number))) FROM numbers(100); +SELECT toTypeName(topKForEachState(10)([toString(number)])) FROM numbers(100); +SELECT toTypeName(topKIfState(10)(toString(number), number % 2)) FROM numbers(100); +SELECT toTypeName(topKMergeState(10)(state)) FROM (SELECT topKState(10)(toString(number)) as state FROM numbers(100)); +SELECT toTypeName(topKOrNullState(10)(toString(number))) FROM numbers(100); +SELECT toTypeName(topKOrDefaultState(10)(toString(number))) FROM numbers(100); +SELECT toTypeName(topKResampleState(10, 1, 2, 42)(toString(number), number)) FROM numbers(100); +SELECT toTypeName(topKState(10)(toString(number))) FROM numbers(100); +SELECT toTypeName(topKArrayResampleOrDefaultIfState(10, 1, 2, 42)([toString(number)], number, number % 2)) FROM numbers(100); + +CREATE TEMPORARY TABLE t0 AS SELECT quantileArrayState(0.10)([number]) FROM numbers(100); +CREATE TEMPORARY TABLE t1 AS SELECT quantileDistinctState(0.10)(number) FROM numbers(100); +CREATE TEMPORARY TABLE t2 AS SELECT quantileForEachState(0.10)([number]) FROM numbers(100); +CREATE TEMPORARY TABLE t3 AS SELECT quantileIfState(0.10)(number, number % 2) FROM numbers(100); +CREATE TEMPORARY TABLE t4 AS SELECT quantileMergeState(0.10)(state) FROM (SELECT quantileState(0.10)(number) as state FROM numbers(100)); +CREATE TEMPORARY TABLE t5 AS SELECT quantileOrNullState(0.10)(number) FROM numbers(100); +CREATE TEMPORARY TABLE t6 AS SELECT quantileOrDefaultState(0.10)(number) FROM numbers(100); +CREATE TEMPORARY TABLE t7 AS SELECT quantileResampleState(0.10, 1, 2, 42)(number, number) FROM numbers(100); +CREATE TEMPORARY TABLE t8 AS SELECT quantileState(0.10)(number) FROM numbers(100); +CREATE TEMPORARY TABLE t9 AS SELECT quantileArrayResampleOrDefaultIfState(0.10, 1, 2, 42)([number], number, number % 2) FROM numbers(100); + +INSERT INTO t0 SELECT quantileArrayState(0.10)([number]) FROM numbers(100); +INSERT INTO t1 SELECT quantileDistinctState(0.10)(number) FROM numbers(100); +INSERT INTO t2 SELECT quantileForEachState(0.10)([number]) FROM numbers(100); +INSERT INTO t3 SELECT quantileIfState(0.10)(number, number % 2) FROM numbers(100); +INSERT INTO t4 SELECT quantileMergeState(0.10)(state) FROM (SELECT quantileState(0.10)(number) as state FROM numbers(100)); +INSERT INTO t5 SELECT quantileOrNullState(0.10)(number) FROM numbers(100); +INSERT INTO t6 SELECT quantileOrDefaultState(0.10)(number) FROM numbers(100); +INSERT INTO t7 SELECT quantileResampleState(0.10, 1, 2, 42)(number, number) FROM numbers(100); +INSERT INTO t8 SELECT quantileState(0.10)(number) FROM numbers(100); +INSERT INTO t9 SELECT quantileArrayResampleOrDefaultIfState(0.10, 1, 2, 42)([number], number, number % 2) FROM numbers(100); + +SELECT round(quantileArrayMerge(0.10)((*,).1)) FROM t0; +SELECT 
round(quantileDistinctMerge(0.10)((*,).1)) FROM t1; +SELECT arrayMap(x -> round(x), quantileForEachMerge(0.10)((*,).1)) FROM t2; +SELECT round(quantileIfMerge(0.10)((*,).1)) FROM t3; +SELECT round(quantileMerge(0.10)((*,).1)) FROM t4; +SELECT round(quantileOrNullMerge(0.10)((*,).1)) FROM t5; +SELECT round(quantileOrDefaultMerge(0.10)((*,).1)) FROM t6; +SELECT arrayMap(x -> round(x), quantileResampleMerge(0.10, 1, 2, 42)((*,).1)) FROM t7; +SELECT round(quantileMerge(0.10)((*,).1)) FROM t8; +SELECT arrayMap(x -> round(x), quantileArrayResampleOrDefaultIfMerge(0.10, 1, 2, 42)((*,).1)) FROM t9; diff --git a/tests/queries/0_stateless/01175_distributed_ddl_output_mode_long.reference b/tests/queries/0_stateless/01175_distributed_ddl_output_mode_long.reference index 0cc8c788fed..29f6f801044 100644 --- a/tests/queries/0_stateless/01175_distributed_ddl_output_mode_long.reference +++ b/tests/queries/0_stateless/01175_distributed_ddl_output_mode_long.reference @@ -1,25 +1,23 @@ none Received exception from server: -Code: 57. Error: Received from localhost:9000. Error: There was an error on [localhost:9000]: Code: 57, e.displayText() = Error: Table default.throw already exists +Code: 57. Error: Received from localhost:9000. Error: There was an error on [localhost:9000]: Code: 57. Error: Table default.none already exists. (TABLE_ALREADY_EXISTS) Received exception from server: -Code: 159. Error: Received from localhost:9000. Error: Watching task is executing longer than distributed_ddl_task_timeout (=8) seconds. There are 1 unfinished hosts (0 of them are currently active), they are going to execute the query in background. +Code: 159. Error: Received from localhost:9000. Error: Watching task is executing longer than distributed_ddl_task_timeout (=1) seconds. There are 1 unfinished hosts (0 of them are currently active), they are going to execute the query in background. (TIMEOUT_EXCEEDED) throw localhost 9000 0 0 0 -localhost 9000 57 Code: 57, e.displayText() = Error: Table default.throw already exists. 0 0 Received exception from server: -Code: 57. Error: Received from localhost:9000. Error: There was an error on [localhost:9000]: Code: 57, e.displayText() = Error: Table default.throw already exists +Code: 57. Error: Received from localhost:9000. Error: There was an error on [localhost:9000]: Code: 57. Error: Table default.throw already exists. (TABLE_ALREADY_EXISTS) localhost 9000 0 1 0 Received exception from server: -Code: 159. Error: Received from localhost:9000. Error: Watching task is executing longer than distributed_ddl_task_timeout (=8) seconds. There are 1 unfinished hosts (0 of them are currently active), they are going to execute the query in background. +Code: 159. Error: Received from localhost:9000. Error: Watching task is executing longer than distributed_ddl_task_timeout (=1) seconds. There are 1 unfinished hosts (0 of them are currently active), they are going to execute the query in background. (TIMEOUT_EXCEEDED) null_status_on_timeout localhost 9000 0 0 0 -localhost 9000 57 Code: 57, e.displayText() = Error: Table default.null_status already exists. 0 0 Received exception from server: -Code: 57. Error: Received from localhost:9000. Error: There was an error on [localhost:9000]: Code: 57, e.displayText() = Error: Table default.null_status already exists +Code: 57. Error: Received from localhost:9000. Error: There was an error on [localhost:9000]: Code: 57. Error: Table default.null_status already exists. 
(TABLE_ALREADY_EXISTS) localhost 9000 0 1 0 localhost 1 \N \N 1 0 never_throw localhost 9000 0 0 0 -localhost 9000 57 Code: 57, e.displayText() = Error: Table default.never_throw already exists. 0 0 +localhost 9000 57 Code: 57. Error: Table default.never_throw already exists. (TABLE_ALREADY_EXISTS) 0 0 localhost 9000 0 1 0 localhost 1 \N \N 1 0 diff --git a/tests/queries/0_stateless/01175_distributed_ddl_output_mode_long.sh b/tests/queries/0_stateless/01175_distributed_ddl_output_mode_long.sh index 66ceef21682..483979d00db 100755 --- a/tests/queries/0_stateless/01175_distributed_ddl_output_mode_long.sh +++ b/tests/queries/0_stateless/01175_distributed_ddl_output_mode_long.sh @@ -1,39 +1,68 @@ #!/usr/bin/env bash +CLICKHOUSE_CLIENT_SERVER_LOGS_LEVEL=fatal + CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) # shellcheck source=../shell_config.sh . "$CURDIR"/../shell_config.sh +# We execute a distributed DDL query with timeout 1 to check that one host is unavailable and will time out, while the other completes successfully. But sometimes one second is not enough even for a healthy host to succeed. Repeat the test in this case. +function run_until_out_contains() +{ + PATTERN=$1 + shift + + for _ in {1..20} + do + "$@" > "${CLICKHOUSE_TMP}/out" 2>&1 + if grep -q "$PATTERN" "${CLICKHOUSE_TMP}/out" + then + cat "${CLICKHOUSE_TMP}/out" + break; + fi + done +} + + +$CLICKHOUSE_CLIENT -q "drop table if exists none;" $CLICKHOUSE_CLIENT -q "drop table if exists throw;" $CLICKHOUSE_CLIENT -q "drop table if exists null_status;" $CLICKHOUSE_CLIENT -q "drop table if exists never_throw;" -CLICKHOUSE_CLIENT_OPT=$(echo ${CLICKHOUSE_CLIENT_OPT} | sed 's/'"--send_logs_level=${CLICKHOUSE_CLIENT_SERVER_LOGS_LEVEL}"'/--send_logs_level=fatal/g') - -CLIENT="$CLICKHOUSE_CLIENT_BINARY $CLICKHOUSE_CLIENT_OPT --distributed_ddl_task_timeout=8 --distributed_ddl_output_mode=none" -$CLIENT -q "select value from system.settings where name='distributed_ddl_output_mode';" +$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=600 --distributed_ddl_output_mode=none -q "select value from system.settings where name='distributed_ddl_output_mode';" # Ok -$CLIENT -q "create table throw on cluster test_shard_localhost (n int) engine=Memory;" +$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=600 --distributed_ddl_output_mode=none -q "create table none on cluster test_shard_localhost (n int) engine=Memory;" # Table exists -$CLIENT -q "create table throw on cluster test_shard_localhost (n int) engine=Memory;" 2>&1| grep -Fv "@ 0x" | sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" | sed "s/exists.. /exists/" +$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=600 --distributed_ddl_output_mode=none -q "create table none on cluster test_shard_localhost (n int) engine=Memory;" 2>&1 | sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" # Timeout -$CLIENT -q "drop table throw on cluster test_unavailable_shard;" 2>&1| grep -Fv "@ 0x" | sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" | sed "s/Watching task .* is executing longer/Watching task is executing longer/" | sed "s/background. 
/background./" -CLIENT="$CLICKHOUSE_CLIENT_BINARY $CLICKHOUSE_CLIENT_OPT --distributed_ddl_task_timeout=8 --distributed_ddl_output_mode=throw" -$CLIENT -q "select value from system.settings where name='distributed_ddl_output_mode';" -$CLIENT -q "create table throw on cluster test_shard_localhost (n int) engine=Memory;" -$CLIENT -q "create table throw on cluster test_shard_localhost (n int) engine=Memory;" 2>&1| grep -Fv "@ 0x" | sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" | sed "s/exists.. /exists/" -$CLIENT -q "drop table throw on cluster test_unavailable_shard;" 2>&1| grep -Fv "@ 0x" | sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" | sed "s/Watching task .* is executing longer/Watching task is executing longer/" | sed "s/background. /background./" +run_until_out_contains 'There are 1 unfinished hosts' $CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=1 --distributed_ddl_output_mode=none -q "drop table if exists none on cluster test_unavailable_shard;" 2>&1 | sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" | sed "s/Watching task .* is executing longer/Watching task is executing longer/" -CLIENT="$CLICKHOUSE_CLIENT_BINARY $CLICKHOUSE_CLIENT_OPT --distributed_ddl_task_timeout=8 --distributed_ddl_output_mode=null_status_on_timeout" -$CLIENT -q "select value from system.settings where name='distributed_ddl_output_mode';" -$CLIENT -q "create table null_status on cluster test_shard_localhost (n int) engine=Memory;" -$CLIENT -q "create table null_status on cluster test_shard_localhost (n int) engine=Memory;" 2>&1| grep -Fv "@ 0x" | sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" | sed "s/exists.. /exists/" -$CLIENT -q "drop table null_status on cluster test_unavailable_shard;" -CLIENT="$CLICKHOUSE_CLIENT_BINARY $CLICKHOUSE_CLIENT_OPT --distributed_ddl_task_timeout=8 --distributed_ddl_output_mode=never_throw" -$CLIENT -q "select value from system.settings where name='distributed_ddl_output_mode';" -$CLIENT -q "create table never_throw on cluster test_shard_localhost (n int) engine=Memory;" -$CLIENT -q "create table never_throw on cluster test_shard_localhost (n int) engine=Memory;" 2>&1| sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" -$CLIENT -q "drop table never_throw on cluster test_unavailable_shard;" +$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=600 --distributed_ddl_output_mode=throw -q "select value from system.settings where name='distributed_ddl_output_mode';" +$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=600 --distributed_ddl_output_mode=throw -q "create table throw on cluster test_shard_localhost (n int) engine=Memory;" +$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=600 --distributed_ddl_output_mode=throw -q "create table throw on cluster test_shard_localhost (n int) engine=Memory format Null;" 2>&1 | sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" + +run_until_out_contains 'There are 1 unfinished hosts' $CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=1 --distributed_ddl_output_mode=throw -q "drop table if exists throw on cluster test_unavailable_shard;" 2>&1 | sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" | sed "s/Watching task .* is executing longer/Watching task is executing longer/" + + +$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=600 
--distributed_ddl_output_mode=null_status_on_timeout -q "select value from system.settings where name='distributed_ddl_output_mode';" +$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=600 --distributed_ddl_output_mode=null_status_on_timeout -q "create table null_status on cluster test_shard_localhost (n int) engine=Memory;" +$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=600 --distributed_ddl_output_mode=null_status_on_timeout -q "create table null_status on cluster test_shard_localhost (n int) engine=Memory format Null;" 2>&1 | sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" + +run_until_out_contains '9000 0 ' $CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=1 --distributed_ddl_output_mode=null_status_on_timeout -q "drop table if exists null_status on cluster test_unavailable_shard;" + + +$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=600 --distributed_ddl_output_mode=never_throw -q "select value from system.settings where name='distributed_ddl_output_mode';" +$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=600 --distributed_ddl_output_mode=never_throw -q "create table never_throw on cluster test_shard_localhost (n int) engine=Memory;" +$CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=600 --distributed_ddl_output_mode=never_throw -q "create table never_throw on cluster test_shard_localhost (n int) engine=Memory;" 2>&1 | sed "s/DB::Exception/Error/g" | sed "s/ (version.*)//" + +run_until_out_contains '9000 0 ' $CLICKHOUSE_CLIENT --output_format_parallel_formatting=0 --distributed_ddl_task_timeout=1 --distributed_ddl_output_mode=never_throw -q "drop table if exists never_throw on cluster test_unavailable_shard;" + + +$CLICKHOUSE_CLIENT -q "drop table if exists none;" +$CLICKHOUSE_CLIENT -q "drop table if exists throw;" +$CLICKHOUSE_CLIENT -q "drop table if exists null_status;" +$CLICKHOUSE_CLIENT -q "drop table if exists never_throw;" diff --git a/tests/queries/0_stateless/01185_create_or_replace_table.reference b/tests/queries/0_stateless/01185_create_or_replace_table.reference index 84df5f0f5b5..be187d9dcd4 100644 --- a/tests/queries/0_stateless/01185_create_or_replace_table.reference +++ b/tests/queries/0_stateless/01185_create_or_replace_table.reference @@ -1,8 +1,8 @@ t1 -CREATE TABLE test_01185.t1\n(\n `n` UInt64,\n `s` String\n)\nENGINE = MergeTree\nORDER BY n\nSETTINGS index_granularity = 8192 +CREATE TABLE default.t1\n(\n `n` UInt64,\n `s` String\n)\nENGINE = MergeTree\nORDER BY n\nSETTINGS index_granularity = 8192 t1 -CREATE TABLE test_01185.t1\n(\n `n` UInt64,\n `s` Nullable(String)\n)\nENGINE = MergeTree\nORDER BY n\nSETTINGS index_granularity = 8192 +CREATE TABLE default.t1\n(\n `n` UInt64,\n `s` Nullable(String)\n)\nENGINE = MergeTree\nORDER BY n\nSETTINGS index_granularity = 8192 2 \N t1 -CREATE TABLE test_01185.t1\n(\n `n` UInt64\n)\nENGINE = MergeTree\nORDER BY n\nSETTINGS index_granularity = 8192 +CREATE TABLE default.t1\n(\n `n` UInt64\n)\nENGINE = MergeTree\nORDER BY n\nSETTINGS index_granularity = 8192 3 diff --git a/tests/queries/0_stateless/01185_create_or_replace_table.sql b/tests/queries/0_stateless/01185_create_or_replace_table.sql index fe408cc7ac6..45900329b2c 100644 --- a/tests/queries/0_stateless/01185_create_or_replace_table.sql +++ b/tests/queries/0_stateless/01185_create_or_replace_table.sql @@ -1,23 +1,22 @@ -drop 
database if exists test_01185; -create database test_01185 engine=Atomic; +drop table if exists t1; -replace table test_01185.t1 (n UInt64, s String) engine=MergeTree order by n; -- { serverError 60 } -show tables from test_01185; -create or replace table test_01185.t1 (n UInt64, s String) engine=MergeTree order by n; -show tables from test_01185; -show create table test_01185.t1; +replace table t1 (n UInt64, s String) engine=MergeTree order by n; -- { serverError 60 } +show tables; +create or replace table t1 (n UInt64, s String) engine=MergeTree order by n; +show tables; +show create table t1; -insert into test_01185.t1 values (1, 'test'); -create or replace table test_01185.t1 (n UInt64, s Nullable(String)) engine=MergeTree order by n; -insert into test_01185.t1 values (2, null); -show tables from test_01185; -show create table test_01185.t1; -select * from test_01185.t1; +insert into t1 values (1, 'test'); +create or replace table t1 (n UInt64, s Nullable(String)) engine=MergeTree order by n; +insert into t1 values (2, null); +show tables; +show create table t1; +select * from t1; -replace table test_01185.t1 (n UInt64) engine=MergeTree order by n; -insert into test_01185.t1 values (3); -show tables from test_01185; -show create table test_01185.t1; -select * from test_01185.t1; +replace table t1 (n UInt64) engine=MergeTree order by n; +insert into t1 values (3); +show tables; +show create table t1; +select * from t1; -drop database test_01185; +drop table t1; diff --git a/tests/queries/0_stateless/01186_conversion_to_nullable.sql b/tests/queries/0_stateless/01186_conversion_to_nullable.sql index bf7df6234d2..828d3cac05b 100644 --- a/tests/queries/0_stateless/01186_conversion_to_nullable.sql +++ b/tests/queries/0_stateless/01186_conversion_to_nullable.sql @@ -2,9 +2,9 @@ select toUInt8(x) from values('x Nullable(String)', '42', NULL, '0', '', '256'); select toInt64(x) from values('x Nullable(String)', '42', NULL, '0', '', '256'); select toDate(x) from values('x Nullable(String)', '2020-12-24', NULL, '0000-00-00', '', '9999-01-01'); -select toDateTime(x) from values('x Nullable(String)', '2020-12-24 01:02:03', NULL, '0000-00-00 00:00:00', ''); -select toDateTime64(x, 2) from values('x Nullable(String)', '2020-12-24 01:02:03', NULL, '0000-00-00 00:00:00', ''); -select toUnixTimestamp(x) from values ('x Nullable(String)', '2000-01-01 13:12:12', NULL, ''); +select toDateTime(x, 'Europe/Moscow') from values('x Nullable(String)', '2020-12-24 01:02:03', NULL, '0000-00-00 00:00:00', ''); +select toDateTime64(x, 2, 'Europe/Moscow') from values('x Nullable(String)', '2020-12-24 01:02:03', NULL, '0000-00-00 00:00:00', ''); +select toUnixTimestamp(x, 'Europe/Moscow') from values ('x Nullable(String)', '2000-01-01 13:12:12', NULL, ''); select toDecimal32(x, 2) from values ('x Nullable(String)', '42', NULL, '3.14159'); select toDecimal64(x, 8) from values ('x Nullable(String)', '42', NULL, '3.14159'); diff --git a/tests/queries/0_stateless/01213_alter_rename_with_default_zookeeper.sql b/tests/queries/0_stateless/01213_alter_rename_with_default_zookeeper.sql index d3f83c0cbb0..e5701077770 100644 --- a/tests/queries/0_stateless/01213_alter_rename_with_default_zookeeper.sql +++ b/tests/queries/0_stateless/01213_alter_rename_with_default_zookeeper.sql @@ -12,7 +12,7 @@ ENGINE = MergeTree() PARTITION BY date ORDER BY key; -INSERT INTO table_rename_with_default (date, key, value1) SELECT toDate('2019-10-01') + number % 3, number, toString(number) from numbers(9); +INSERT INTO table_rename_with_default 
(date, key, value1) SELECT toDateTime(toDate('2019-10-01') + number % 3, 'Europe/Moscow'), number, toString(number) from numbers(9); SELECT * FROM table_rename_with_default WHERE key = 1 FORMAT TSVWithNames; @@ -42,7 +42,7 @@ ENGINE = ReplicatedMergeTree('/clickhouse/test_01213/table_rename_with_ttl', '1' ORDER BY tuple() TTL date2 + INTERVAL 10000 MONTH; -INSERT INTO table_rename_with_ttl SELECT toDate('2019-10-01') + number % 3, toDate('2018-10-01') + number % 3, toString(number), toString(number) from numbers(9); +INSERT INTO table_rename_with_ttl SELECT toDateTime(toDate('2019-10-01') + number % 3, 'Europe/Moscow'), toDateTime(toDate('2018-10-01') + number % 3, 'Europe/Moscow'), toString(number), toString(number) from numbers(9); SELECT * FROM table_rename_with_ttl WHERE value1 = '1' FORMAT TSVWithNames; diff --git a/tests/queries/0_stateless/01236_graphite_mt.sql b/tests/queries/0_stateless/01236_graphite_mt.sql index f3f1905b901..88d2d0ccb63 100644 --- a/tests/queries/0_stateless/01236_graphite_mt.sql +++ b/tests/queries/0_stateless/01236_graphite_mt.sql @@ -1,5 +1,6 @@ drop table if exists test_graphite; -create table test_graphite (key UInt32, Path String, Time DateTime, Value Float64, Version UInt32, col UInt64) engine = GraphiteMergeTree('graphite_rollup') order by key settings index_granularity=10; +create table test_graphite (key UInt32, Path String, Time DateTime, Value Float64, Version UInt32, col UInt64) + engine = GraphiteMergeTree('graphite_rollup') order by key settings index_granularity=10; insert into test_graphite select 1, 'sum_1', toDateTime(today()) - number * 60 - 30, number, 1, number from numbers(300) union all @@ -21,7 +22,7 @@ select 2, 'max_1', toDateTime(today() - 3) - number * 60 - 30, number, 1, number select 1, 'max_2', toDateTime(today() - 3) - number * 60 - 30, number, 1, number from numbers(1200) union all select 2, 'max_2', toDateTime(today() - 3) - number * 60 - 30, number, 1, number from numbers(1200); -optimize table test_graphite; +optimize table test_graphite final; select key, Path, Value, Version, col from test_graphite order by key, Path, Time desc; diff --git a/tests/queries/0_stateless/01244_optimize_distributed_group_by_sharding_key.reference b/tests/queries/0_stateless/01244_optimize_distributed_group_by_sharding_key.reference index acaf6531101..4442b0b6b61 100644 --- a/tests/queries/0_stateless/01244_optimize_distributed_group_by_sharding_key.reference +++ b/tests/queries/0_stateless/01244_optimize_distributed_group_by_sharding_key.reference @@ -115,6 +115,7 @@ GROUP BY WITH TOTALS LIMIT 2 0 4 0 +GROUP BY (compound) GROUP BY sharding_key, ... 0 0 1 0 @@ -123,6 +124,15 @@ GROUP BY sharding_key, ... 
GROUP BY ..., sharding_key 0 0 1 0 +0 0 +1 0 +sharding_key (compound) +1 2 3 +1 2 3 +1 2 6 +1 2 +1 2 +2 window functions 0 0 1 0 diff --git a/tests/queries/0_stateless/01244_optimize_distributed_group_by_sharding_key.sql b/tests/queries/0_stateless/01244_optimize_distributed_group_by_sharding_key.sql index 6b6300a4871..1dcdd795bc1 100644 --- a/tests/queries/0_stateless/01244_optimize_distributed_group_by_sharding_key.sql +++ b/tests/queries/0_stateless/01244_optimize_distributed_group_by_sharding_key.sql @@ -97,6 +97,7 @@ select 'GROUP BY WITH TOTALS LIMIT'; select count(), * from dist_01247 group by number with totals limit 1; -- GROUP BY (compound) +select 'GROUP BY (compound)'; drop table if exists dist_01247; drop table if exists data_01247; create table data_01247 engine=Memory() as select number key, 0 value from numbers(2); @@ -106,9 +107,16 @@ select * from dist_01247 group by key, value; select 'GROUP BY ..., sharding_key'; select * from dist_01247 group by value, key; +-- sharding_key (compound) +select 'sharding_key (compound)'; +select k1, k2, sum(v) from remote('127.{1,2}', view(select 1 k1, 2 k2, 3 v), cityHash64(k1, k2)) group by k1, k2; -- optimization applied +select k1, any(k2), sum(v) from remote('127.{1,2}', view(select 1 k1, 2 k2, 3 v), cityHash64(k1, k2)) group by k1; -- optimization does not applied +select distinct k1, k2 from remote('127.{1,2}', view(select 1 k1, 2 k2, 3 v), cityHash64(k1, k2)); -- optimization applied +select distinct on (k1) k2 from remote('127.{1,2}', view(select 1 k1, 2 k2, 3 v), cityHash64(k1, k2)); -- optimization does not applied + -- window functions select 'window functions'; -select key, sum(sum(value)) over (rows unbounded preceding) from dist_01247 group by key settings allow_experimental_window_functions=1; +select key, sum(sum(value)) over (rows unbounded preceding) from dist_01247 group by key; drop table dist_01247; drop table data_01247; diff --git a/tests/queries/0_stateless/01246_extractAllGroupsHorizontal.reference b/tests/queries/0_stateless/01246_extractAllGroupsHorizontal.reference index 13e717485d8..2aed6b7a3c0 100644 --- a/tests/queries/0_stateless/01246_extractAllGroupsHorizontal.reference +++ b/tests/queries/0_stateless/01246_extractAllGroupsHorizontal.reference @@ -1,4 +1,3 @@ -0 groups, zero matches 1 group, multiple matches, String and FixedString [['hello','world']] [['hello','world']] diff --git a/tests/queries/0_stateless/01246_extractAllGroupsHorizontal.sql b/tests/queries/0_stateless/01246_extractAllGroupsHorizontal.sql index b7a71415a9d..d28402056d3 100644 --- a/tests/queries/0_stateless/01246_extractAllGroupsHorizontal.sql +++ b/tests/queries/0_stateless/01246_extractAllGroupsHorizontal.sql @@ -5,9 +5,11 @@ SELECT extractAllGroupsHorizontal('hello', 123); --{serverError 43} invalid arg SELECT extractAllGroupsHorizontal(123, 'world'); --{serverError 43} invalid argument type SELECT extractAllGroupsHorizontal('hello world', '((('); --{serverError 427} invalid re SELECT extractAllGroupsHorizontal('hello world', materialize('\\w+')); --{serverError 44} non-cons needle +SELECT extractAllGroupsHorizontal('hello world', '\\w+'); -- { serverError 36 } 0 groups +SELECT extractAllGroupsHorizontal('hello world', '(\\w+)') SETTINGS regexp_max_matches_per_row = 0; -- { serverError 128 } to many groups matched per row +SELECT extractAllGroupsHorizontal('hello world', '(\\w+)') SETTINGS regexp_max_matches_per_row = 1; -- { serverError 128 } to many groups matched per row -SELECT '0 groups, zero matches'; -SELECT 
extractAllGroupsHorizontal('hello world', '\\w+'); -- { serverError 36 } +SELECT extractAllGroupsHorizontal('hello world', '(\\w+)') SETTINGS regexp_max_matches_per_row = 1000000 FORMAT Null; -- users now can set limit bigger than previous 1000 matches per row SELECT '1 group, multiple matches, String and FixedString'; SELECT extractAllGroupsHorizontal('hello world', '(\\w+)'); diff --git a/tests/queries/0_stateless/01246_extractAllGroupsVertical.reference b/tests/queries/0_stateless/01246_extractAllGroupsVertical.reference index 983e3838ee5..80b0bf2884d 100644 --- a/tests/queries/0_stateless/01246_extractAllGroupsVertical.reference +++ b/tests/queries/0_stateless/01246_extractAllGroupsVertical.reference @@ -1,4 +1,3 @@ -0 groups, zero matches 1 group, multiple matches, String and FixedString [['hello'],['world']] [['hello'],['world']] diff --git a/tests/queries/0_stateless/01246_extractAllGroupsVertical.sql b/tests/queries/0_stateless/01246_extractAllGroupsVertical.sql index 8edc3f3e741..65ddbfe411b 100644 --- a/tests/queries/0_stateless/01246_extractAllGroupsVertical.sql +++ b/tests/queries/0_stateless/01246_extractAllGroupsVertical.sql @@ -5,9 +5,7 @@ SELECT extractAllGroupsVertical('hello', 123); --{serverError 43} invalid argum SELECT extractAllGroupsVertical(123, 'world'); --{serverError 43} invalid argument type SELECT extractAllGroupsVertical('hello world', '((('); --{serverError 427} invalid re SELECT extractAllGroupsVertical('hello world', materialize('\\w+')); --{serverError 44} non-const needle - -SELECT '0 groups, zero matches'; -SELECT extractAllGroupsVertical('hello world', '\\w+'); -- { serverError 36 } +SELECT extractAllGroupsVertical('hello world', '\\w+'); -- { serverError 36 } 0 groups SELECT '1 group, multiple matches, String and FixedString'; SELECT extractAllGroupsVertical('hello world', '(\\w+)'); diff --git a/tests/queries/0_stateless/01247_some_msan_crashs_from_22517.reference b/tests/queries/0_stateless/01247_some_msan_crashs_from_22517.reference new file mode 100644 index 00000000000..573541ac970 --- /dev/null +++ b/tests/queries/0_stateless/01247_some_msan_crashs_from_22517.reference @@ -0,0 +1 @@ +0 diff --git a/tests/queries/0_stateless/01247_some_msan_crashs_from_22517.sql b/tests/queries/0_stateless/01247_some_msan_crashs_from_22517.sql new file mode 100644 index 00000000000..8bcbbde63d6 --- /dev/null +++ b/tests/queries/0_stateless/01247_some_msan_crashs_from_22517.sql @@ -0,0 +1,3 @@ +SELECT a FROM (SELECT ignore((SELECT 1)) AS a, a AS b); + +SELECT x FROM (SELECT dummy AS x, plus(ignore(ignore(ignore(ignore('-922337203.6854775808', ignore(NULL)), ArrLen = 256, ignore(100, Arr.C3, ignore(NULL), (SELECT 10.000100135803223, count(*) FROM system.time_zones) > NULL)))), dummy, 65535) AS dummy ORDER BY ignore(-2) ASC, identity(x) DESC NULLS FIRST) FORMAT Null; -- { serverError 47 } diff --git a/tests/queries/0_stateless/01269_toStartOfSecond.sql b/tests/queries/0_stateless/01269_toStartOfSecond.sql index 5fe6aa9602f..b74eaabf351 100644 --- a/tests/queries/0_stateless/01269_toStartOfSecond.sql +++ b/tests/queries/0_stateless/01269_toStartOfSecond.sql @@ -4,10 +4,10 @@ SELECT toStartOfSecond(now()); -- {serverError 43} SELECT toStartOfSecond(); -- {serverError 42} SELECT toStartOfSecond(now64(), 123); -- {serverError 43} -WITH toDateTime64('2019-09-16 19:20:11', 3) AS dt64 SELECT toStartOfSecond(dt64, 'UTC') AS res, toTypeName(res); +WITH toDateTime64('2019-09-16 19:20:11', 3, 'Europe/Moscow') AS dt64 SELECT toStartOfSecond(dt64, 'UTC') AS res, toTypeName(res); 
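The timezone pinning running through these hunks exists because DateTime values are rendered in the server's default timezone, so an unpinned literal makes the expected output in the .reference file environment-dependent. A minimal sketch of the pattern, mirroring one of the 01269_toStartOfSecond cases (the literal value is illustrative):

```sql
-- With the timezone fixed to UTC, the rendered value and the reported type
-- are the same on every server, regardless of its default timezone setting.
WITH toDateTime64('2019-09-16 19:20:11.123', 3, 'UTC') AS dt64
SELECT toStartOfSecond(dt64) AS res, toTypeName(res);
-- expected: 2019-09-16 19:20:11.000    DateTime64(3, 'UTC')
```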
WITH toDateTime64('2019-09-16 19:20:11', 0, 'UTC') AS dt64 SELECT toStartOfSecond(dt64) AS res, toTypeName(res); WITH toDateTime64('2019-09-16 19:20:11.123', 3, 'UTC') AS dt64 SELECT toStartOfSecond(dt64) AS res, toTypeName(res); WITH toDateTime64('2019-09-16 19:20:11.123', 9, 'UTC') AS dt64 SELECT toStartOfSecond(dt64) AS res, toTypeName(res); SELECT 'non-const column'; -WITH toDateTime64('2019-09-16 19:20:11.123', 3, 'UTC') AS dt64 SELECT toStartOfSecond(materialize(dt64)) AS res, toTypeName(res); \ No newline at end of file +WITH toDateTime64('2019-09-16 19:20:11.123', 3, 'UTC') AS dt64 SELECT toStartOfSecond(materialize(dt64)) AS res, toTypeName(res); diff --git a/tests/queries/0_stateless/01273_arrow.reference b/tests/queries/0_stateless/01273_arrow.reference index 0dc503f65e4..89eca82f8ef 100644 --- a/tests/queries/0_stateless/01273_arrow.reference +++ b/tests/queries/0_stateless/01273_arrow.reference @@ -41,7 +41,7 @@ converted: 127 255 32767 65535 2147483647 4294967295 9223372036854775807 9223372036854775807 -1.032 -1.064 string-2 fixedstring-2\0\0 2004-06-07 2004-02-03 04:05:06 diff: dest: -79 81 82 83 84 85 86 87 88 89 str01\0\0\0\0\0\0\0\0\0\0 fstr1\0\0\0\0\0\0\0\0\0\0 2003-03-04 1970-01-01 06:29:04 +79 81 82 83 84 85 86 87 88 89 str01\0\0\0\0\0\0\0\0\0\0 fstr1\0\0\0\0\0\0\0\0\0\0 2003-03-04 2004-05-06 00:00:00 80 81 82 83 84 85 86 87 88 89 str02 fstr2\0\0\0\0\0\0\0\0\0\0 2005-03-04 2006-08-09 10:11:12 min: -128 0 0 0 0 0 0 0 -1 -1 string-1\0\0\0\0\0\0\0 fixedstring-1\0\0 2003-04-05 2003-02-03 @@ -49,10 +49,10 @@ min: 79 81 82 83 84 85 86 87 88 89 str01\0\0\0\0\0\0\0\0\0\0 fstr1\0\0\0\0\0\0\0\0\0\0 2003-03-04 2004-05-06 127 -1 -1 -1 -1 -1 -1 -1 -1 -1 string-2\0\0\0\0\0\0\0 fixedstring-2\0\0 2004-06-07 2004-02-03 max: --128 0 -32768 0 -2147483648 0 -9223372036854775808 0 -1 -1 string-1 fixedstring-1\0\0 1970-01-01 06:22:27 2003-02-03 04:05:06 --108 108 -1016 1116 -1032 1132 -1064 1164 -1 -1 string-0 fixedstring\0\0\0\0 1970-01-01 06:09:16 2002-02-03 04:05:06 +-128 0 -32768 0 -2147483648 0 -9223372036854775808 0 -1 -1 string-1 fixedstring-1\0\0 2003-04-05 00:00:00 2003-02-03 04:05:06 +-108 108 -1016 1116 -1032 1132 -1064 1164 -1 -1 string-0 fixedstring\0\0\0\0 2001-02-03 00:00:00 2002-02-03 04:05:06 80 81 82 83 84 85 86 87 88 89 str02 fstr2 2005-03-04 05:06:07 2006-08-09 10:11:12 -127 255 32767 65535 2147483647 4294967295 9223372036854775807 9223372036854775807 -1 -1 string-2 fixedstring-2\0\0 1970-01-01 06:29:36 2004-02-03 04:05:06 +127 255 32767 65535 2147483647 4294967295 9223372036854775807 9223372036854775807 -1 -1 string-2 fixedstring-2\0\0 2004-06-07 00:00:00 2004-02-03 04:05:06 dest from null: -128 0 -32768 0 -2147483648 0 -9223372036854775808 0 -1.032 -1.064 string-1 fixedstring-1\0\0 2003-04-05 2003-02-03 04:05:06 -108 108 -1016 1116 -1032 1132 -1064 1164 -1.032 -1.064 string-0 fixedstring\0\0\0\0 2001-02-03 2002-02-03 04:05:06 diff --git a/tests/queries/0_stateless/01273_arrow_load.sh b/tests/queries/0_stateless/01273_arrow_load.sh index b2ca0e32af1..bc5588a905b 100755 --- a/tests/queries/0_stateless/01273_arrow_load.sh +++ b/tests/queries/0_stateless/01273_arrow_load.sh @@ -11,7 +11,7 @@ CB_DIR=$(dirname "$CLICKHOUSE_CLIENT_BINARY") DATA_FILE=$CUR_DIR/data_arrow/test.arrow ${CLICKHOUSE_CLIENT} --query="DROP TABLE IF EXISTS arrow_load" -${CLICKHOUSE_CLIENT} --query="CREATE TABLE arrow_load (bool UInt8, int8 Int8, int16 Int16, int32 Int32, int64 Int64, uint8 UInt8, uint16 UInt16, uint32 UInt32, uint64 UInt64, halffloat Float32, float Float32, double Float64, string 
String, date32 Date, date64 DateTime, timestamp DateTime) ENGINE = Memory" +${CLICKHOUSE_CLIENT} --query="CREATE TABLE arrow_load (bool UInt8, int8 Int8, int16 Int16, int32 Int32, int64 Int64, uint8 UInt8, uint16 UInt16, uint32 UInt32, uint64 UInt64, halffloat Float32, float Float32, double Float64, string String, date32 Date, date64 DateTime('Europe/Moscow'), timestamp DateTime('Europe/Moscow')) ENGINE = Memory" cat "$DATA_FILE" | ${CLICKHOUSE_CLIENT} -q "insert into arrow_load format Arrow" ${CLICKHOUSE_CLIENT} --query="select * from arrow_load" diff --git a/tests/queries/0_stateless/01273_arrow_stream.reference b/tests/queries/0_stateless/01273_arrow_stream.reference index 0dc503f65e4..89eca82f8ef 100644 --- a/tests/queries/0_stateless/01273_arrow_stream.reference +++ b/tests/queries/0_stateless/01273_arrow_stream.reference @@ -41,7 +41,7 @@ converted: 127 255 32767 65535 2147483647 4294967295 9223372036854775807 9223372036854775807 -1.032 -1.064 string-2 fixedstring-2\0\0 2004-06-07 2004-02-03 04:05:06 diff: dest: -79 81 82 83 84 85 86 87 88 89 str01\0\0\0\0\0\0\0\0\0\0 fstr1\0\0\0\0\0\0\0\0\0\0 2003-03-04 1970-01-01 06:29:04 +79 81 82 83 84 85 86 87 88 89 str01\0\0\0\0\0\0\0\0\0\0 fstr1\0\0\0\0\0\0\0\0\0\0 2003-03-04 2004-05-06 00:00:00 80 81 82 83 84 85 86 87 88 89 str02 fstr2\0\0\0\0\0\0\0\0\0\0 2005-03-04 2006-08-09 10:11:12 min: -128 0 0 0 0 0 0 0 -1 -1 string-1\0\0\0\0\0\0\0 fixedstring-1\0\0 2003-04-05 2003-02-03 @@ -49,10 +49,10 @@ min: 79 81 82 83 84 85 86 87 88 89 str01\0\0\0\0\0\0\0\0\0\0 fstr1\0\0\0\0\0\0\0\0\0\0 2003-03-04 2004-05-06 127 -1 -1 -1 -1 -1 -1 -1 -1 -1 string-2\0\0\0\0\0\0\0 fixedstring-2\0\0 2004-06-07 2004-02-03 max: --128 0 -32768 0 -2147483648 0 -9223372036854775808 0 -1 -1 string-1 fixedstring-1\0\0 1970-01-01 06:22:27 2003-02-03 04:05:06 --108 108 -1016 1116 -1032 1132 -1064 1164 -1 -1 string-0 fixedstring\0\0\0\0 1970-01-01 06:09:16 2002-02-03 04:05:06 +-128 0 -32768 0 -2147483648 0 -9223372036854775808 0 -1 -1 string-1 fixedstring-1\0\0 2003-04-05 00:00:00 2003-02-03 04:05:06 +-108 108 -1016 1116 -1032 1132 -1064 1164 -1 -1 string-0 fixedstring\0\0\0\0 2001-02-03 00:00:00 2002-02-03 04:05:06 80 81 82 83 84 85 86 87 88 89 str02 fstr2 2005-03-04 05:06:07 2006-08-09 10:11:12 -127 255 32767 65535 2147483647 4294967295 9223372036854775807 9223372036854775807 -1 -1 string-2 fixedstring-2\0\0 1970-01-01 06:29:36 2004-02-03 04:05:06 +127 255 32767 65535 2147483647 4294967295 9223372036854775807 9223372036854775807 -1 -1 string-2 fixedstring-2\0\0 2004-06-07 00:00:00 2004-02-03 04:05:06 dest from null: -128 0 -32768 0 -2147483648 0 -9223372036854775808 0 -1.032 -1.064 string-1 fixedstring-1\0\0 2003-04-05 2003-02-03 04:05:06 -108 108 -1016 1116 -1032 1132 -1064 1164 -1.032 -1.064 string-0 fixedstring\0\0\0\0 2001-02-03 2002-02-03 04:05:06 diff --git a/tests/queries/0_stateless/01277_toUnixTimestamp64.sql b/tests/queries/0_stateless/01277_toUnixTimestamp64.sql index de2b132a2dc..eb3e8c612ed 100644 --- a/tests/queries/0_stateless/01277_toUnixTimestamp64.sql +++ b/tests/queries/0_stateless/01277_toUnixTimestamp64.sql @@ -12,22 +12,22 @@ SELECT toUnixTimestamp64Micro('abc', 123); -- {serverError 42} SELECT toUnixTimestamp64Nano('abc', 123); -- {serverError 42} SELECT 'const column'; -WITH toDateTime64('2019-09-16 19:20:12.345678910', 3) AS dt64 +WITH toDateTime64('2019-09-16 19:20:12.345678910', 3, 'Europe/Moscow') AS dt64 SELECT dt64, toUnixTimestamp64Milli(dt64), toUnixTimestamp64Micro(dt64), toUnixTimestamp64Nano(dt64); -WITH toDateTime64('2019-09-16 
19:20:12.345678910', 6) AS dt64 +WITH toDateTime64('2019-09-16 19:20:12.345678910', 6, 'Europe/Moscow') AS dt64 SELECT dt64, toUnixTimestamp64Milli(dt64), toUnixTimestamp64Micro(dt64), toUnixTimestamp64Nano(dt64); -WITH toDateTime64('2019-09-16 19:20:12.345678910', 9) AS dt64 +WITH toDateTime64('2019-09-16 19:20:12.345678910', 9, 'Europe/Moscow') AS dt64 SELECT dt64, toUnixTimestamp64Milli(dt64), toUnixTimestamp64Micro(dt64), toUnixTimestamp64Nano(dt64); SELECT 'non-const column'; -WITH toDateTime64('2019-09-16 19:20:12.345678910', 3) AS x +WITH toDateTime64('2019-09-16 19:20:12.345678910', 3, 'Europe/Moscow') AS x SELECT materialize(x) as dt64, toUnixTimestamp64Milli(dt64), toUnixTimestamp64Micro(dt64), toUnixTimestamp64Nano(dt64); -WITH toDateTime64('2019-09-16 19:20:12.345678910', 6) AS x +WITH toDateTime64('2019-09-16 19:20:12.345678910', 6, 'Europe/Moscow') AS x SELECT materialize(x) as dt64, toUnixTimestamp64Milli(dt64), toUnixTimestamp64Micro(dt64), toUnixTimestamp64Nano(dt64); -WITH toDateTime64('2019-09-16 19:20:12.345678910', 9) AS x +WITH toDateTime64('2019-09-16 19:20:12.345678910', 9, 'Europe/Moscow') AS x SELECT materialize(x) as dt64, toUnixTimestamp64Milli(dt64), toUnixTimestamp64Micro(dt64), toUnixTimestamp64Nano(dt64); diff --git a/tests/queries/0_stateless/01293_client_interactive_vertical_multiline_long.expect b/tests/queries/0_stateless/01293_client_interactive_vertical_multiline.expect similarity index 95% rename from tests/queries/0_stateless/01293_client_interactive_vertical_multiline_long.expect rename to tests/queries/0_stateless/01293_client_interactive_vertical_multiline.expect index 85eb97fb6f2..5e845754402 100755 --- a/tests/queries/0_stateless/01293_client_interactive_vertical_multiline_long.expect +++ b/tests/queries/0_stateless/01293_client_interactive_vertical_multiline.expect @@ -41,7 +41,7 @@ expect ":) " send -- "" expect eof -spawn bash -c "source $basedir/../shell_config.sh ; \$CLICKHOUSE_CLIENT_BINARY \$CLICKHOUSE_CLIENT_OPT" +spawn bash -c "source $basedir/../shell_config.sh ; \$CLICKHOUSE_CLIENT_BINARY \$CLICKHOUSE_CLIENT_OPT --disable_suggestion --multiline" expect ":) " send -- "SELECT 1;\r" diff --git a/tests/queries/0_stateless/01293_client_interactive_vertical_multiline.reference b/tests/queries/0_stateless/01293_client_interactive_vertical_multiline.reference new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/queries/0_stateless/01294_system_distributed_on_cluster.sql b/tests/queries/0_stateless/01294_system_distributed_on_cluster.sql index d56bddba3c6..525974e78ba 100644 --- a/tests/queries/0_stateless/01294_system_distributed_on_cluster.sql +++ b/tests/queries/0_stateless/01294_system_distributed_on_cluster.sql @@ -3,6 +3,7 @@ -- quirk for ON CLUSTER does not uses currentDatabase() drop database if exists db_01294; create database db_01294; +set distributed_ddl_output_mode='throw'; drop table if exists db_01294.dist_01294; create table db_01294.dist_01294 as system.one engine=Distributed(test_shard_localhost, system, one); diff --git a/tests/queries/0_stateless/01307_orc_output_format.sh b/tests/queries/0_stateless/01307_orc_output_format.sh index 26c7db5ad1b..b5000bfd3fc 100755 --- a/tests/queries/0_stateless/01307_orc_output_format.sh +++ b/tests/queries/0_stateless/01307_orc_output_format.sh @@ -6,7 +6,7 @@ CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) $CLICKHOUSE_CLIENT --query="DROP TABLE IF EXISTS orc"; -$CLICKHOUSE_CLIENT --query="CREATE TABLE orc (uint8 UInt8, uint16 UInt16, uint32 UInt32, uint64 UInt64, 
int8 Int8, int16 Int16, int32 Int32, int64 Int64, float Float32, double Float64, string String, fixed FixedString(4), date Date, datetime DateTime, decimal32 Decimal32(4), decimal64 Decimal64(10), decimal128 Decimal128(20), nullable Nullable(Int32)) ENGINE = Memory"; +$CLICKHOUSE_CLIENT --query="CREATE TABLE orc (uint8 UInt8, uint16 UInt16, uint32 UInt32, uint64 UInt64, int8 Int8, int16 Int16, int32 Int32, int64 Int64, float Float32, double Float64, string String, fixed FixedString(4), date Date, datetime DateTime('Europe/Moscow'), decimal32 Decimal32(4), decimal64 Decimal64(10), decimal128 Decimal128(20), nullable Nullable(Int32)) ENGINE = Memory"; $CLICKHOUSE_CLIENT --query="INSERT INTO orc VALUES (255, 65535, 4294967295, 100000000000, -128, -32768, -2147483648, -100000000000, 2.02, 10000.0000001, 'String', '2020', 18980, 1639872000, 1.0001, 1.00000001, 100000.00000000000001, 1), (4, 1234, 3244467295, 500000000000, -1, -256, -14741221, -7000000000, 100.1, 14321.032141201, 'Another string', '2000', 20000, 1839882000, 34.1234, 123123.123123123, 123123123.123123123123123, NULL), (42, 42, 42, 42, 42, 42, 42, 42, 42.42, 42.42, '42', '4242', 42, 42, 42.42, 42.42424242, 424242.42424242424242, 42)"; diff --git a/tests/queries/0_stateless/01372_remote_table_function_empty_table.sql b/tests/queries/0_stateless/01372_remote_table_function_empty_table.sql index 698c323d73f..4153dc632f3 100644 --- a/tests/queries/0_stateless/01372_remote_table_function_empty_table.sql +++ b/tests/queries/0_stateless/01372_remote_table_function_empty_table.sql @@ -1 +1,4 @@ SELECT * FROM remote('127..2', 'a.'); -- { serverError 36 } + +-- Clear cache to avoid future errors in the logs +SYSTEM DROP DNS CACHE diff --git a/tests/queries/0_stateless/01379_with_fill_several_columns.sql b/tests/queries/0_stateless/01379_with_fill_several_columns.sql index f98431b61b9..505b9e0f8e1 100644 --- a/tests/queries/0_stateless/01379_with_fill_several_columns.sql +++ b/tests/queries/0_stateless/01379_with_fill_several_columns.sql @@ -1,6 +1,6 @@ SELECT - toDate((number * 10) * 86400) AS d1, - toDate(number * 86400) AS d2, + toDate(toDateTime((number * 10) * 86400, 'Europe/Moscow')) AS d1, + toDate(toDateTime(number * 86400, 'Europe/Moscow')) AS d2, 'original' AS source FROM numbers(10) WHERE (number % 3) = 1 @@ -11,11 +11,11 @@ ORDER BY SELECT '==============='; SELECT - toDate((number * 10) * 86400) AS d1, - toDate(number * 86400) AS d2, + toDate(toDateTime((number * 10) * 86400, 'Europe/Moscow')) AS d1, + toDate(toDateTime(number * 86400, 'Europe/Moscow')) AS d2, 'original' AS source FROM numbers(10) WHERE (number % 3) = 1 ORDER BY d1 WITH FILL STEP 5, - d2 WITH FILL; \ No newline at end of file + d2 WITH FILL; diff --git a/tests/queries/0_stateless/01410_nullable_key.reference b/tests/queries/0_stateless/01410_nullable_key.reference deleted file mode 100644 index 75163f1bf41..00000000000 --- a/tests/queries/0_stateless/01410_nullable_key.reference +++ /dev/null @@ -1,35 +0,0 @@ -0 0 -2 3 -4 6 -6 9 -8 12 -10 15 -12 18 -14 21 -16 24 -18 27 -\N 0 -\N -1 -\N -2 -\N 0 -\N -1 -\N -2 -0 0 -2 3 -4 6 -6 9 -8 12 -10 15 -12 18 -14 21 -16 24 -18 27 -12 18 -14 21 -16 24 -18 27 -0 0 -2 3 -4 6 -6 9 -8 12 diff --git a/tests/queries/0_stateless/01410_nullable_key.sql b/tests/queries/0_stateless/01410_nullable_key.sql deleted file mode 100644 index 4a3701cf46d..00000000000 --- a/tests/queries/0_stateless/01410_nullable_key.sql +++ /dev/null @@ -1,13 +0,0 @@ -DROP TABLE IF EXISTS nullable_key; -CREATE TABLE nullable_key (k Nullable(int), v int) 
ENGINE MergeTree ORDER BY k SETTINGS allow_nullable_key = 1; - -INSERT INTO nullable_key SELECT number * 2, number * 3 FROM numbers(10); -INSERT INTO nullable_key SELECT NULL, -number FROM numbers(3); - -SELECT * FROM nullable_key ORDER BY k; -SELECT * FROM nullable_key WHERE k IS NULL; -SELECT * FROM nullable_key WHERE k IS NOT NULL; -SELECT * FROM nullable_key WHERE k > 10; -SELECT * FROM nullable_key WHERE k < 10; - -DROP TABLE nullable_key; diff --git a/tests/queries/0_stateless/01410_nullable_key_and_index.reference b/tests/queries/0_stateless/01410_nullable_key_and_index.reference new file mode 100644 index 00000000000..1fc2cf91e62 --- /dev/null +++ b/tests/queries/0_stateless/01410_nullable_key_and_index.reference @@ -0,0 +1,81 @@ +0 0 +2 3 +4 6 +6 9 +8 12 +10 15 +12 18 +14 21 +16 24 +18 27 +\N 0 +\N -1 +\N -2 +\N 0 +\N -1 +\N -2 +0 0 +2 3 +4 6 +6 9 +8 12 +10 15 +12 18 +14 21 +16 24 +18 27 +12 18 +14 21 +16 24 +18 27 +0 0 +2 3 +4 6 +6 9 +8 12 +\N 0 +\N -1 +\N -2 +0 0 +2 3 +4 6 +6 9 +8 12 +10 15 +12 18 +14 21 +16 24 +18 27 +10 15 +\N 0 +\N -1 +\N -2 +\N +123 +1 1 +1 3 +2 \N +2 2 +2 1 +2 7 +2 \N +3 \N +3 2 +3 4 +2 \N +2 \N +3 \N +1 3 +2 7 +3 4 +1 1 +2 2 +2 1 +3 2 +1 3 +2 7 +3 4 +1 1 +2 2 +2 1 +3 2 diff --git a/tests/queries/0_stateless/01410_nullable_key_and_index.sql b/tests/queries/0_stateless/01410_nullable_key_and_index.sql new file mode 100644 index 00000000000..24ddb226c16 --- /dev/null +++ b/tests/queries/0_stateless/01410_nullable_key_and_index.sql @@ -0,0 +1,65 @@ +DROP TABLE IF EXISTS nullable_key; +DROP TABLE IF EXISTS nullable_key_without_final_mark; +DROP TABLE IF EXISTS nullable_minmax_index; + +SET max_threads = 1; + +CREATE TABLE nullable_key (k Nullable(int), v int) ENGINE MergeTree ORDER BY k SETTINGS allow_nullable_key = 1, index_granularity = 1; + +INSERT INTO nullable_key SELECT number * 2, number * 3 FROM numbers(10); +INSERT INTO nullable_key SELECT NULL, -number FROM numbers(3); + +SELECT * FROM nullable_key ORDER BY k; + +SET force_primary_key = 1; +SET max_rows_to_read = 3; +SELECT * FROM nullable_key WHERE k IS NULL; +SET max_rows_to_read = 10; +SELECT * FROM nullable_key WHERE k IS NOT NULL; +SET max_rows_to_read = 5; +SELECT * FROM nullable_key WHERE k > 10; +SELECT * FROM nullable_key WHERE k < 10; + +OPTIMIZE TABLE nullable_key FINAL; + +SET max_rows_to_read = 4; -- one additional left mark needs to be read +SELECT * FROM nullable_key WHERE k IS NULL; +SET max_rows_to_read = 10; +SELECT * FROM nullable_key WHERE k IS NOT NULL; + +-- Nullable in set and with transform_null_in = 1 +SET max_rows_to_read = 3; +SELECT * FROM nullable_key WHERE k IN (10, 20) SETTINGS transform_null_in = 1; +SET max_rows_to_read = 5; +SELECT * FROM nullable_key WHERE k IN (3, NULL) SETTINGS transform_null_in = 1; + +CREATE TABLE nullable_key_without_final_mark (s Nullable(String)) ENGINE MergeTree ORDER BY s SETTINGS allow_nullable_key = 1, write_final_mark = 0; +INSERT INTO nullable_key_without_final_mark VALUES ('123'), (NULL); +SET max_rows_to_read = 0; +SELECT * FROM nullable_key_without_final_mark WHERE s IS NULL; +SELECT * FROM nullable_key_without_final_mark WHERE s IS NOT NULL; + +CREATE TABLE nullable_minmax_index (k int, v Nullable(int), INDEX v_minmax v TYPE minmax GRANULARITY 4) ENGINE MergeTree ORDER BY k SETTINGS index_granularity = 1; + +INSERT INTO nullable_minmax_index VALUES (1, 3), (2, 7), (3, 4), (2, NULL); -- [3, +Inf] +INSERT INTO nullable_minmax_index VALUES (1, 1), (2, 2), (3, 2), (2, 1); -- [1, 2] +INSERT INTO nullable_minmax_index VALUES (2, NULL), (3, 
NULL); -- [+Inf, +Inf] + +SET force_primary_key = 0; +SELECT * FROM nullable_minmax_index ORDER BY k; +SET max_rows_to_read = 6; +SELECT * FROM nullable_minmax_index WHERE v IS NULL; +-- NOTE: granuals with Null values cannot be filtred in data skipping indexes, +-- due to backward compatibility +SET max_rows_to_read = 0; +SELECT * FROM nullable_minmax_index WHERE v IS NOT NULL; +SET max_rows_to_read = 6; +SELECT * FROM nullable_minmax_index WHERE v > 2; +-- NOTE: granuals with Null values cannot be filtred in data skipping indexes, +-- due to backward compatibility +SET max_rows_to_read = 0; +SELECT * FROM nullable_minmax_index WHERE v <= 2; + +DROP TABLE nullable_key; +DROP TABLE nullable_key_without_final_mark; +DROP TABLE nullable_minmax_index; diff --git a/tests/queries/0_stateless/01415_sticking_mutations.sh b/tests/queries/0_stateless/01415_sticking_mutations.sh index 9bd0a6eeebf..2e86b6d972d 100755 --- a/tests/queries/0_stateless/01415_sticking_mutations.sh +++ b/tests/queries/0_stateless/01415_sticking_mutations.sh @@ -33,9 +33,10 @@ function check_sticky_mutations() query_result=$($CLICKHOUSE_CLIENT --query="$check_query" 2>&1) - while [ "$query_result" == "0" ] + for _ in {1..50} do query_result=$($CLICKHOUSE_CLIENT --query="$check_query" 2>&1) + if ! [ "$query_result" == "0" ]; then break; fi sleep 0.5 done ##### wait mutation to start ##### diff --git a/tests/queries/0_stateless/01429_join_on_error_messages.sql b/tests/queries/0_stateless/01429_join_on_error_messages.sql index f9e2647f2e3..6e792e90d42 100644 --- a/tests/queries/0_stateless/01429_join_on_error_messages.sql +++ b/tests/queries/0_stateless/01429_join_on_error_messages.sql @@ -4,8 +4,8 @@ SELECT 1 FROM (select 1 a) A JOIN (select 1 b) B ON (A.a = arrayJoin([1])); -- { SELECT 1 FROM (select 1 a) A JOIN (select 1 b) B ON equals(a); -- { serverError 62 } SELECT 1 FROM (select 1 a) A JOIN (select 1 b) B ON less(a); -- { serverError 62 } -SELECT 1 FROM (select 1 a) A JOIN (select 1 b) B ON a = b OR a = b; -- { serverError 48 } -SELECT 1 FROM (select 1 a) A JOIN (select 1 b) B ON a = b AND a > b; -- { serverError 48 } -SELECT 1 FROM (select 1 a) A JOIN (select 1 b) B ON a = b AND a < b; -- { serverError 48 } -SELECT 1 FROM (select 1 a) A JOIN (select 1 b) B ON a = b AND a >= b; -- { serverError 48 } -SELECT 1 FROM (select 1 a) A JOIN (select 1 b) B ON a = b AND a <= b; -- { serverError 48 } +SELECT 1 FROM (select 1 a) A JOIN (select 1 b) B ON a = b OR a = b; -- { serverError 403 } +SELECT 1 FROM (select 1 a) A JOIN (select 1 b) B ON a = b AND a > b; -- { serverError 403 } +SELECT 1 FROM (select 1 a) A JOIN (select 1 b) B ON a = b AND a < b; -- { serverError 403 } +SELECT 1 FROM (select 1 a) A JOIN (select 1 b) B ON a = b AND a >= b; -- { serverError 403 } +SELECT 1 FROM (select 1 a) A JOIN (select 1 b) B ON a = b AND a <= b; -- { serverError 403 } diff --git a/tests/queries/0_stateless/01440_to_date_monotonicity.sql b/tests/queries/0_stateless/01440_to_date_monotonicity.sql index e48911e954c..8843d7ffca6 100644 --- a/tests/queries/0_stateless/01440_to_date_monotonicity.sql +++ b/tests/queries/0_stateless/01440_to_date_monotonicity.sql @@ -1,14 +1,16 @@ DROP TABLE IF EXISTS tdm; DROP TABLE IF EXISTS tdm2; -CREATE TABLE tdm (x DateTime) ENGINE = MergeTree ORDER BY x SETTINGS write_final_mark = 0; +CREATE TABLE tdm (x DateTime('Europe/Moscow')) ENGINE = MergeTree ORDER BY x SETTINGS write_final_mark = 0; INSERT INTO tdm VALUES (now()); -SELECT count(x) FROM tdm WHERE toDate(x) < today() SETTINGS max_rows_to_read = 1; 
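In the 01440_to_date_monotonicity hunk here, `today()` is replaced by `toDate(now(), 'Europe/Moscow')` so that both sides of the comparison are computed in an explicit timezone. A self-contained sketch of the same shape of query, under the assumption of a UTC column (table name is illustrative, not from the patch):

```sql
-- Table mirroring the test: a DateTime column with a fixed timezone as the key.
CREATE TABLE tdm_demo (x DateTime('UTC')) ENGINE = MergeTree ORDER BY x;
INSERT INTO tdm_demo VALUES (now());

-- toDate() over a DateTime key is monotonic, so the condition can be checked
-- against the primary key; max_rows_to_read bounds how much may be scanned.
SELECT count(x) FROM tdm_demo
WHERE toDate(x) < toDate(now(), 'UTC')
SETTINGS max_rows_to_read = 1;

DROP TABLE tdm_demo;
```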
+SELECT count(x) FROM tdm WHERE toDate(x) < toDate(now(), 'Europe/Moscow') SETTINGS max_rows_to_read = 1; -SELECT toDate(-1), toDate(10000000000000), toDate(100), toDate(65536), toDate(65535); -SELECT toDateTime(-1), toDateTime(10000000000000), toDateTime(1000); +SELECT toDate(-1), toDate(10000000000000, 'Europe/Moscow'), toDate(100), toDate(65536, 'UTC'), toDate(65535, 'Europe/Moscow'); +SELECT toDateTime(-1, 'Europe/Moscow'), toDateTime(10000000000000, 'Europe/Moscow'), toDateTime(1000, 'Europe/Moscow'); CREATE TABLE tdm2 (timestamp UInt32) ENGINE = MergeTree ORDER BY timestamp SETTINGS index_granularity = 1; + INSERT INTO tdm2 VALUES (toUnixTimestamp('2000-01-01 13:12:12')), (toUnixTimestamp('2000-01-01 14:12:12')), (toUnixTimestamp('2000-01-01 15:12:12')); + SET max_rows_to_read = 1; SELECT toDateTime(timestamp) FROM tdm2 WHERE toHour(toDateTime(timestamp)) = 13; diff --git a/tests/queries/0_stateless/01451_normalize_query.reference b/tests/queries/0_stateless/01451_normalize_query.reference index 67aa3a97998..339ad34ea77 100644 --- a/tests/queries/0_stateless/01451_normalize_query.reference +++ b/tests/queries/0_stateless/01451_normalize_query.reference @@ -20,3 +20,6 @@ SELECT ? AS xyz11 SELECT ? xyz11 SELECT ?, xyz11 SELECT ?.. +SELECT ? xyz11 +SELECT ?, xyz11 +SELECT ?.. diff --git a/tests/queries/0_stateless/01451_normalize_query.sql b/tests/queries/0_stateless/01451_normalize_query.sql index d1e45dea967..3c01a975712 100644 --- a/tests/queries/0_stateless/01451_normalize_query.sql +++ b/tests/queries/0_stateless/01451_normalize_query.sql @@ -20,3 +20,7 @@ SELECT normalizeQuery('SELECT 1 AS xyz11'); SELECT normalizeQuery('SELECT 1 xyz11'); SELECT normalizeQuery('SELECT 1, xyz11'); SELECT normalizeQuery('SELECT 1, ''xyz11'''); +SELECT normalizeQuery('SELECT $doc$VALUE$doc$ xyz11'); +SELECT normalizeQuery('SELECT $doc$VALUE$doc$, xyz11'); +SELECT normalizeQuery('SELECT $doc$VALUE$doc$, ''xyz11'''); + diff --git a/tests/queries/0_stateless/01452_normalized_query_hash.reference b/tests/queries/0_stateless/01452_normalized_query_hash.reference index fcb49fa9945..bb0850568bb 100644 --- a/tests/queries/0_stateless/01452_normalized_query_hash.reference +++ b/tests/queries/0_stateless/01452_normalized_query_hash.reference @@ -5,3 +5,5 @@ 1 1 1 +1 +1 diff --git a/tests/queries/0_stateless/01452_normalized_query_hash.sql b/tests/queries/0_stateless/01452_normalized_query_hash.sql index a888d2b87b5..0ae95b5292c 100644 --- a/tests/queries/0_stateless/01452_normalized_query_hash.sql +++ b/tests/queries/0_stateless/01452_normalized_query_hash.sql @@ -5,3 +5,7 @@ SELECT normalizedQueryHash('[1, 2, 3]') = normalizedQueryHash('[1, ''x'']'); SELECT normalizedQueryHash('[1, 2, 3, x]') != normalizedQueryHash('[1, x]'); SELECT normalizedQueryHash('SELECT 1 AS `xyz`') != normalizedQueryHash('SELECT 1 AS `abc`'); SELECT normalizedQueryHash('SELECT 1 AS xyz111') = normalizedQueryHash('SELECT 2 AS xyz234'); +SELECT normalizedQueryHash('SELECT $doc$VALUE$doc$ AS `xyz`') != normalizedQueryHash('SELECT $doc$VALUE$doc$ AS `abc`'); +SELECT normalizedQueryHash('SELECT $doc$VALUE$doc$ AS xyz111') = normalizedQueryHash('SELECT $doc$VALUE$doc$ AS xyz234'); + + diff --git a/tests/queries/0_stateless/01455_shard_leaf_max_rows_bytes_to_read.sql b/tests/queries/0_stateless/01455_shard_leaf_max_rows_bytes_to_read.sql index fca5c4534f7..d21aa391890 100644 --- a/tests/queries/0_stateless/01455_shard_leaf_max_rows_bytes_to_read.sql +++ b/tests/queries/0_stateless/01455_shard_leaf_max_rows_bytes_to_read.sql @@ -1,3 +1,10 @@ 
+-- Leaf limits is unreliable w/ prefer_localhost_replica=1. +-- Since in this case initial query and the query on the local node (to the +-- underlying table) has the same counters, so if query on the remote node +-- will be finished before local, then local node will already have some rows +-- read, and leaf limit will fail. +SET prefer_localhost_replica=0; + SELECT count() FROM (SELECT * FROM remote('127.0.0.1', system.numbers) LIMIT 100) SETTINGS max_rows_to_read_leaf=1; -- { serverError 158 } SELECT count() FROM (SELECT * FROM remote('127.0.0.1', system.numbers) LIMIT 100) SETTINGS max_bytes_to_read_leaf=1; -- { serverError 307 } SELECT count() FROM (SELECT * FROM remote('127.0.0.1', system.numbers) LIMIT 100) SETTINGS max_rows_to_read_leaf=100; @@ -26,4 +33,4 @@ SELECT count() FROM (SELECT * FROM test_distributed) SETTINGS max_bytes_to_read SELECT count() FROM (SELECT * FROM test_distributed) SETTINGS max_bytes_to_read_leaf = 100000; DROP TABLE IF EXISTS test_local; -DROP TABLE IF EXISTS test_distributed; \ No newline at end of file +DROP TABLE IF EXISTS test_distributed; diff --git a/tests/queries/0_stateless/01460_DistributedFilesToInsert.sql b/tests/queries/0_stateless/01460_DistributedFilesToInsert.sql index 34c0d55d573..02e3d3ef73f 100644 --- a/tests/queries/0_stateless/01460_DistributedFilesToInsert.sql +++ b/tests/queries/0_stateless/01460_DistributedFilesToInsert.sql @@ -2,33 +2,31 @@ -- (i.e. no .bin files and hence no sending is required) set prefer_localhost_replica=0; -set distributed_directory_monitor_sleep_time_ms=50; - drop table if exists data_01460; drop table if exists dist_01460; create table data_01460 as system.one engine=Null(); -create table dist_01460 as data_01460 engine=Distributed(test_shard_localhost, currentDatabase(), data_01460); +create table dist_01460 as data_01460 engine=Distributed(test_shard_localhost, currentDatabase(), data_01460) settings monitor_sleep_time_ms=50; select 'INSERT'; select value from system.metrics where metric = 'DistributedFilesToInsert'; insert into dist_01460 select * from system.one; -select sleep(1) format Null; -- distributed_directory_monitor_sleep_time_ms +select sleep(1) format Null; -- monitor_sleep_time_ms select value from system.metrics where metric = 'DistributedFilesToInsert'; select 'STOP/START DISTRIBUTED SENDS'; system stop distributed sends dist_01460; insert into dist_01460 select * from system.one; -select sleep(1) format Null; -- distributed_directory_monitor_sleep_time_ms +select sleep(1) format Null; -- monitor_sleep_time_ms select value from system.metrics where metric = 'DistributedFilesToInsert'; system start distributed sends dist_01460; -select sleep(1) format Null; -- distributed_directory_monitor_sleep_time_ms +select sleep(1) format Null; -- monitor_sleep_time_ms select value from system.metrics where metric = 'DistributedFilesToInsert'; select 'FLUSH DISTRIBUTED'; system stop distributed sends dist_01460; insert into dist_01460 select * from system.one; -select sleep(1) format Null; -- distributed_directory_monitor_sleep_time_ms +select sleep(1) format Null; -- monitor_sleep_time_ms select value from system.metrics where metric = 'DistributedFilesToInsert'; system flush distributed dist_01460; select value from system.metrics where metric = 'DistributedFilesToInsert'; @@ -36,7 +34,7 @@ select value from system.metrics where metric = 'DistributedFilesToInsert'; select 'DROP TABLE'; system stop distributed sends dist_01460; insert into dist_01460 select * from system.one; -select sleep(1) format 
Null; -- distributed_directory_monitor_sleep_time_ms +select sleep(1) format Null; -- monitor_sleep_time_ms select value from system.metrics where metric = 'DistributedFilesToInsert'; drop table dist_01460; select value from system.metrics where metric = 'DistributedFilesToInsert'; diff --git a/tests/queries/0_stateless/01508_partition_pruning_long.queries b/tests/queries/0_stateless/01508_partition_pruning_long.queries index 3773e907c53..786240145a9 100644 --- a/tests/queries/0_stateless/01508_partition_pruning_long.queries +++ b/tests/queries/0_stateless/01508_partition_pruning_long.queries @@ -2,20 +2,20 @@ DROP TABLE IF EXISTS tMM; DROP TABLE IF EXISTS tDD; DROP TABLE IF EXISTS sDD; DROP TABLE IF EXISTS xMM; -CREATE TABLE tMM(d DateTime,a Int64) ENGINE = MergeTree PARTITION BY toYYYYMM(d) ORDER BY tuple() SETTINGS index_granularity = 8192; +CREATE TABLE tMM(d DateTime('Europe/Moscow'), a Int64) ENGINE = MergeTree PARTITION BY toYYYYMM(d) ORDER BY tuple() SETTINGS index_granularity = 8192; SYSTEM STOP MERGES tMM; -INSERT INTO tMM SELECT toDateTime('2020-08-16 00:00:00') + number*60, number FROM numbers(5000); -INSERT INTO tMM SELECT toDateTime('2020-08-16 00:00:00') + number*60, number FROM numbers(5000); -INSERT INTO tMM SELECT toDateTime('2020-09-01 00:00:00') + number*60, number FROM numbers(5000); -INSERT INTO tMM SELECT toDateTime('2020-09-01 00:00:00') + number*60, number FROM numbers(5000); -INSERT INTO tMM SELECT toDateTime('2020-10-01 00:00:00') + number*60, number FROM numbers(5000); -INSERT INTO tMM SELECT toDateTime('2020-10-15 00:00:00') + number*60, number FROM numbers(5000); +INSERT INTO tMM SELECT toDateTime('2020-08-16 00:00:00', 'Europe/Moscow') + number*60, number FROM numbers(5000); +INSERT INTO tMM SELECT toDateTime('2020-08-16 00:00:00', 'Europe/Moscow') + number*60, number FROM numbers(5000); +INSERT INTO tMM SELECT toDateTime('2020-09-01 00:00:00', 'Europe/Moscow') + number*60, number FROM numbers(5000); +INSERT INTO tMM SELECT toDateTime('2020-09-01 00:00:00', 'Europe/Moscow') + number*60, number FROM numbers(5000); +INSERT INTO tMM SELECT toDateTime('2020-10-01 00:00:00', 'Europe/Moscow') + number*60, number FROM numbers(5000); +INSERT INTO tMM SELECT toDateTime('2020-10-15 00:00:00', 'Europe/Moscow') + number*60, number FROM numbers(5000); -CREATE TABLE tDD(d DateTime,a Int) ENGINE = MergeTree PARTITION BY toYYYYMMDD(d) ORDER BY tuple() SETTINGS index_granularity = 8192; +CREATE TABLE tDD(d DateTime('Europe/Moscow'),a Int) ENGINE = MergeTree PARTITION BY toYYYYMMDD(d) ORDER BY tuple() SETTINGS index_granularity = 8192; SYSTEM STOP MERGES tDD; -insert into tDD select toDateTime(toDate('2020-09-23')), number from numbers(10000) UNION ALL select toDateTime(toDateTime('2020-09-23 11:00:00')), number from numbers(10000) UNION ALL select toDateTime(toDate('2020-09-24')), number from numbers(10000) UNION ALL select toDateTime(toDate('2020-09-25')), number from numbers(10000) UNION ALL select toDateTime(toDate('2020-08-15')), number from numbers(10000); +insert into tDD select toDateTime(toDate('2020-09-23'), 'Europe/Moscow'), number from numbers(10000) UNION ALL select toDateTime(toDateTime('2020-09-23 11:00:00', 'Europe/Moscow')), number from numbers(10000) UNION ALL select toDateTime(toDate('2020-09-24'), 'Europe/Moscow'), number from numbers(10000) UNION ALL select toDateTime(toDate('2020-09-25'), 'Europe/Moscow'), number from numbers(10000) UNION ALL select toDateTime(toDate('2020-08-15'), 'Europe/Moscow'), number from numbers(10000); -CREATE TABLE sDD(d UInt64,a 
Int) ENGINE = MergeTree PARTITION BY toYYYYMM(toDate(intDiv(d,1000))) ORDER BY tuple() SETTINGS index_granularity = 8192; +CREATE TABLE sDD(d UInt64,a Int) ENGINE = MergeTree PARTITION BY toYYYYMM(toDate(intDiv(d,1000), 'Europe/Moscow')) ORDER BY tuple() SETTINGS index_granularity = 8192; SYSTEM STOP MERGES sDD; insert into sDD select (1597536000+number*60)*1000, number from numbers(5000); insert into sDD select (1597536000+number*60)*1000, number from numbers(5000); @@ -24,14 +24,14 @@ insert into sDD select (1598918400+number*60)*1000, number from numbers(5000); insert into sDD select (1601510400+number*60)*1000, number from numbers(5000); insert into sDD select (1602720000+number*60)*1000, number from numbers(5000); -CREATE TABLE xMM(d DateTime,a Int64, f Int64) ENGINE = MergeTree PARTITION BY (toYYYYMM(d), a) ORDER BY tuple() SETTINGS index_granularity = 8192; +CREATE TABLE xMM(d DateTime('Europe/Moscow'),a Int64, f Int64) ENGINE = MergeTree PARTITION BY (toYYYYMM(d), a) ORDER BY tuple() SETTINGS index_granularity = 8192; SYSTEM STOP MERGES xMM; -INSERT INTO xMM SELECT toDateTime('2020-08-16 00:00:00') + number*60, 1, number FROM numbers(5000); -INSERT INTO xMM SELECT toDateTime('2020-08-16 00:00:00') + number*60, 2, number FROM numbers(5000); -INSERT INTO xMM SELECT toDateTime('2020-09-01 00:00:00') + number*60, 3, number FROM numbers(5000); -INSERT INTO xMM SELECT toDateTime('2020-09-01 00:00:00') + number*60, 2, number FROM numbers(5000); -INSERT INTO xMM SELECT toDateTime('2020-10-01 00:00:00') + number*60, 1, number FROM numbers(5000); -INSERT INTO xMM SELECT toDateTime('2020-10-15 00:00:00') + number*60, 1, number FROM numbers(5000); +INSERT INTO xMM SELECT toDateTime('2020-08-16 00:00:00', 'Europe/Moscow') + number*60, 1, number FROM numbers(5000); +INSERT INTO xMM SELECT toDateTime('2020-08-16 00:00:00', 'Europe/Moscow') + number*60, 2, number FROM numbers(5000); +INSERT INTO xMM SELECT toDateTime('2020-09-01 00:00:00', 'Europe/Moscow') + number*60, 3, number FROM numbers(5000); +INSERT INTO xMM SELECT toDateTime('2020-09-01 00:00:00', 'Europe/Moscow') + number*60, 2, number FROM numbers(5000); +INSERT INTO xMM SELECT toDateTime('2020-10-01 00:00:00', 'Europe/Moscow') + number*60, 1, number FROM numbers(5000); +INSERT INTO xMM SELECT toDateTime('2020-10-15 00:00:00', 'Europe/Moscow') + number*60, 1, number FROM numbers(5000); SELECT '--------- tMM ----------------------------'; @@ -44,8 +44,8 @@ select uniqExact(_part), count() from tMM where toYYYYMMDD(d)=20200816; select uniqExact(_part), count() from tMM where toYYYYMMDD(d)=20201015; select uniqExact(_part), count() from tMM where toDate(d)='2020-10-15'; select uniqExact(_part), count() from tMM where d >= '2020-09-01 00:00:00' and d<'2020-10-15 00:00:00'; -select uniqExact(_part), count() from tMM where d >= '2020-01-16 00:00:00' and d < toDateTime('2021-08-17 00:00:00'); -select uniqExact(_part), count() from tMM where d >= '2020-09-16 00:00:00' and d < toDateTime('2020-10-01 00:00:00'); +select uniqExact(_part), count() from tMM where d >= '2020-01-16 00:00:00' and d < toDateTime('2021-08-17 00:00:00', 'Europe/Moscow'); +select uniqExact(_part), count() from tMM where d >= '2020-09-16 00:00:00' and d < toDateTime('2020-10-01 00:00:00', 'Europe/Moscow'); select uniqExact(_part), count() from tMM where d >= '2020-09-12 00:00:00' and d < '2020-10-16 00:00:00'; select uniqExact(_part), count() from tMM where toStartOfDay(d) >= '2020-09-12 00:00:00'; select uniqExact(_part), count() from tMM where toStartOfDay(d) = '2020-09-01 
00:00:00'; diff --git a/tests/queries/0_stateless/01508_partition_pruning_long.reference b/tests/queries/0_stateless/01508_partition_pruning_long.reference index 334ecb63164..9cd208a336f 100644 --- a/tests/queries/0_stateless/01508_partition_pruning_long.reference +++ b/tests/queries/0_stateless/01508_partition_pruning_long.reference @@ -35,11 +35,11 @@ select uniqExact(_part), count() from tMM where d >= '2020-09-01 00:00:00' and d 3 15000 Selected 3/6 parts by partition key, 3 parts by primary key, 3/3 marks by primary key, 3 marks to read from 3 ranges -select uniqExact(_part), count() from tMM where d >= '2020-01-16 00:00:00' and d < toDateTime('2021-08-17 00:00:00'); +select uniqExact(_part), count() from tMM where d >= '2020-01-16 00:00:00' and d < toDateTime('2021-08-17 00:00:00', 'Europe/Moscow'); 6 30000 Selected 6/6 parts by partition key, 6 parts by primary key, 6/6 marks by primary key, 6 marks to read from 6 ranges -select uniqExact(_part), count() from tMM where d >= '2020-09-16 00:00:00' and d < toDateTime('2020-10-01 00:00:00'); +select uniqExact(_part), count() from tMM where d >= '2020-09-16 00:00:00' and d < toDateTime('2020-10-01 00:00:00', 'Europe/Moscow'); 0 0 Selected 0/6 parts by partition key, 0 parts by primary key, 0/0 marks by primary key, 0 marks to read from 0 ranges diff --git a/tests/queries/0_stateless/01508_partition_pruning_long.sh b/tests/queries/0_stateless/01508_partition_pruning_long.sh index 1b3c524ac77..745d08496a7 100755 --- a/tests/queries/0_stateless/01508_partition_pruning_long.sh +++ b/tests/queries/0_stateless/01508_partition_pruning_long.sh @@ -1,22 +1,15 @@ #!/usr/bin/env bash -#------------------------------------------------------------------------------------------- # Description of test result: -# Test the correctness of the partition -# pruning +# Test the correctness of the partition pruning # -# Script executes queries from a file 01508_partition_pruning_long.queries (1 line = 1 query) -# Queries are started with 'select' (but NOT with 'SELECT') are executed with log_level=debug -#------------------------------------------------------------------------------------------- +# Script executes queries from a file 01508_partition_pruning_long.queries (1 line = 1 query) +# Queries are started with 'select' (but NOT with 'SELECT') are executed with log_level=debug CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) # shellcheck source=../shell_config.sh . "$CURDIR"/../shell_config.sh -#export CLICKHOUSE_CLIENT="clickhouse-client --send_logs_level=none" -#export CLICKHOUSE_CLIENT_SERVER_LOGS_LEVEL=none -#export CURDIR=. 
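The 01508 queries above verify pruning by counting how many distinct parts a query actually reads. A trimmed-down sketch of that technique (table name, dates, and row counts are illustrative):

```sql
-- Partitioned by month; each INSERT produces one part in its own partition.
CREATE TABLE prune_demo (d DateTime('UTC'), a Int64)
ENGINE = MergeTree PARTITION BY toYYYYMM(d) ORDER BY tuple();
INSERT INTO prune_demo SELECT toDateTime('2020-08-16 00:00:00', 'UTC') + number * 60, number FROM numbers(100);
INSERT INTO prune_demo SELECT toDateTime('2020-09-01 00:00:00', 'UTC') + number * 60, number FROM numbers(100);

-- uniqExact(_part) reports how many parts were actually read: if partition
-- pruning works, only the September part is touched by this filter.
SELECT uniqExact(_part), count()
FROM prune_demo
WHERE d >= '2020-09-01 00:00:00';
-- expected: 1 part, 100 rows

DROP TABLE prune_demo;
```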
- queries="${CURDIR}/01508_partition_pruning_long.queries" while IFS= read -r sql diff --git a/tests/queries/0_stateless/01509_check_many_parallel_quorum_inserts_long.sh b/tests/queries/0_stateless/01509_check_many_parallel_quorum_inserts_long.sh index b71654e7e6c..187357b94e2 100755 --- a/tests/queries/0_stateless/01509_check_many_parallel_quorum_inserts_long.sh +++ b/tests/queries/0_stateless/01509_check_many_parallel_quorum_inserts_long.sh @@ -16,13 +16,20 @@ for i in $(seq 1 $NUM_REPLICAS); do done function thread { - $CLICKHOUSE_CLIENT --insert_quorum 5 --insert_quorum_parallel 1 --query "INSERT INTO r$1 SELECT $2" + while true + do + $CLICKHOUSE_CLIENT --insert_quorum 5 --insert_quorum_parallel 1 --query "INSERT INTO r$1 SELECT $2" && break + sleep 0.1 + done } for i in $(seq 1 $NUM_REPLICAS); do for j in {0..9}; do a=$((($i - 1) * 10 + $j)) - thread $i $a & + + # Note: making 100 connections simultaneously is a mini-DoS when server is build with sanitizers and CI environment is overloaded. + # That's why we repeat "socket timeout" errors. + thread $i $a 2>&1 | grep -v -P 'SOCKET_TIMEOUT|NETWORK_ERROR|^$' & done done diff --git a/tests/queries/0_stateless/01526_client_start_and_exit.expect-not-a-test-case b/tests/queries/0_stateless/01526_client_start_and_exit.expect-not-a-test-case index 585c8c369dd..00fb5c4e85b 100755 --- a/tests/queries/0_stateless/01526_client_start_and_exit.expect-not-a-test-case +++ b/tests/queries/0_stateless/01526_client_start_and_exit.expect-not-a-test-case @@ -4,7 +4,7 @@ log_user 1 set timeout 5 match_max 100000 -spawn bash -c "$env(CLICKHOUSE_CLIENT_BINARY) $env(CLICKHOUSE_CLIENT_OPT)" +spawn bash -c "$env(CLICKHOUSE_CLIENT_BINARY) --no-warnings $env(CLICKHOUSE_CLIENT_OPT)" expect ":) " send -- "\4" expect eof diff --git a/tests/queries/0_stateless/01551_mergetree_read_in_order_spread.reference b/tests/queries/0_stateless/01551_mergetree_read_in_order_spread.reference index 2843b305f0a..cdc595a3c57 100644 --- a/tests/queries/0_stateless/01551_mergetree_read_in_order_spread.reference +++ b/tests/queries/0_stateless/01551_mergetree_read_in_order_spread.reference @@ -9,9 +9,9 @@ ExpressionTransform (SettingQuotaAndLimits) (ReadFromMergeTree) ExpressionTransform × 4 - MergeTree 0 → 1 + MergeTreeInOrder 0 → 1 MergingSortedTransform 2 → 1 ExpressionTransform × 2 - MergeTree × 2 0 → 1 + MergeTreeInOrder × 2 0 → 1 ExpressionTransform - MergeTree 0 → 1 + MergeTreeInOrder 0 → 1 diff --git a/tests/queries/0_stateless/01568_window_functions_distributed.reference b/tests/queries/0_stateless/01568_window_functions_distributed.reference index 483e84a2bee..0b439ef759a 100644 --- a/tests/queries/0_stateless/01568_window_functions_distributed.reference +++ b/tests/queries/0_stateless/01568_window_functions_distributed.reference @@ -1,5 +1,4 @@ -- { echo } -set allow_experimental_window_functions = 1; select row_number() over (order by dummy) from (select * from remote('127.0.0.{1,2}', system, one)); 1 2 diff --git a/tests/queries/0_stateless/01568_window_functions_distributed.sql b/tests/queries/0_stateless/01568_window_functions_distributed.sql index 6f38597a7a3..5e20c57d23d 100644 --- a/tests/queries/0_stateless/01568_window_functions_distributed.sql +++ b/tests/queries/0_stateless/01568_window_functions_distributed.sql @@ -1,6 +1,4 @@ -- { echo } -set allow_experimental_window_functions = 1; - select row_number() over (order by dummy) from (select * from remote('127.0.0.{1,2}', system, one)); select row_number() over (order by dummy) from remote('127.0.0.{1,2}', system, 
one); diff --git a/tests/queries/0_stateless/01571_window_functions.reference b/tests/queries/0_stateless/01571_window_functions.reference index bbac8e5ac6d..420f7575a52 100644 --- a/tests/queries/0_stateless/01571_window_functions.reference +++ b/tests/queries/0_stateless/01571_window_functions.reference @@ -1,6 +1,37 @@ -- { echo } -- Another test for window functions because the other one is too long. -set allow_experimental_window_functions = 1; + +-- some craziness with a mix of materialized and unmaterialized const columns +-- after merging sorted transform, that used to break the peer group detection in +-- the window transform. +CREATE TABLE order_by_const +( + `a` UInt64, + `b` UInt64, + `c` UInt64, + `d` UInt64 +) +ENGINE = MergeTree +ORDER BY (a, b) +SETTINGS index_granularity = 8192; +truncate table order_by_const; +system stop merges order_by_const; +INSERT INTO order_by_const(a, b, c, d) VALUES (1, 1, 101, 1), (1, 2, 102, 1), (1, 3, 103, 1), (1, 4, 104, 1); +INSERT INTO order_by_const(a, b, c, d) VALUES (1, 5, 104, 1), (1, 6, 105, 1), (2, 1, 106, 2), (2, 1, 107, 2); +INSERT INTO order_by_const(a, b, c, d) VALUES (2, 2, 107, 2), (2, 3, 108, 2), (2, 4, 109, 2); +SELECT row_number() OVER (order by 1, a) FROM order_by_const; +1 +2 +3 +4 +5 +6 +7 +8 +9 +10 +11 +drop table order_by_const; -- expressions in window frame select count() over (rows between 1 + 1 preceding and 1 + 1 following) from numbers(10); 3 @@ -15,3 +46,26 @@ select count() over (rows between 1 + 1 preceding and 1 + 1 following) from numb 3 -- signed and unsigned in offset do not cause logical error select count() over (rows between 2 following and 1 + -1 following) FROM numbers(10); -- { serverError 36 } +-- default arguments of lagInFrame can be a subtype of the argument +select number, + lagInFrame(toNullable(number), 2, null) over w, + lagInFrame(number, 2, 1) over w +from numbers(10) +window w as (order by number) +; +0 \N 1 +1 \N 1 +2 0 0 +3 1 1 +4 2 2 +5 3 3 +6 4 4 +7 5 5 +8 6 6 +9 7 7 +-- the case when current_row goes past the partition end at the block end +select number, row_number() over (partition by number rows between unbounded preceding and 1 preceding) from numbers(4) settings max_block_size = 2; +0 1 +1 1 +2 1 +3 1 diff --git a/tests/queries/0_stateless/01571_window_functions.sql b/tests/queries/0_stateless/01571_window_functions.sql index c6479044b59..4aaba19100a 100644 --- a/tests/queries/0_stateless/01571_window_functions.sql +++ b/tests/queries/0_stateless/01571_window_functions.sql @@ -1,9 +1,42 @@ -- { echo } -- Another test for window functions because the other one is too long. -set allow_experimental_window_functions = 1; + +-- some craziness with a mix of materialized and unmaterialized const columns +-- after merging sorted transform, that used to break the peer group detection in +-- the window transform. 
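The `set allow_experimental_window_functions = 1;` lines are being dropped from these tests because window functions no longer sit behind the experimental flag; with default settings a query of the following shape is expected to run as-is (a sketch, not taken from the patch):

```sql
-- No experimental setting required: a plain OVER clause with an ORDER BY window.
SELECT
    number,
    row_number() OVER (ORDER BY number) AS rn,
    sum(number) OVER (ORDER BY number ROWS UNBOUNDED PRECEDING) AS running_sum
FROM numbers(5);
```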
+CREATE TABLE order_by_const +( + `a` UInt64, + `b` UInt64, + `c` UInt64, + `d` UInt64 +) +ENGINE = MergeTree +ORDER BY (a, b) +SETTINGS index_granularity = 8192; + +truncate table order_by_const; +system stop merges order_by_const; +INSERT INTO order_by_const(a, b, c, d) VALUES (1, 1, 101, 1), (1, 2, 102, 1), (1, 3, 103, 1), (1, 4, 104, 1); +INSERT INTO order_by_const(a, b, c, d) VALUES (1, 5, 104, 1), (1, 6, 105, 1), (2, 1, 106, 2), (2, 1, 107, 2); +INSERT INTO order_by_const(a, b, c, d) VALUES (2, 2, 107, 2), (2, 3, 108, 2), (2, 4, 109, 2); +SELECT row_number() OVER (order by 1, a) FROM order_by_const; + +drop table order_by_const; -- expressions in window frame select count() over (rows between 1 + 1 preceding and 1 + 1 following) from numbers(10); -- signed and unsigned in offset do not cause logical error select count() over (rows between 2 following and 1 + -1 following) FROM numbers(10); -- { serverError 36 } + +-- default arguments of lagInFrame can be a subtype of the argument +select number, + lagInFrame(toNullable(number), 2, null) over w, + lagInFrame(number, 2, 1) over w +from numbers(10) +window w as (order by number) +; + +-- the case when current_row goes past the partition end at the block end +select number, row_number() over (partition by number rows between unbounded preceding and 1 preceding) from numbers(4) settings max_block_size = 2; diff --git a/tests/queries/0_stateless/01576_alias_column_rewrite.reference b/tests/queries/0_stateless/01576_alias_column_rewrite.reference index ef598570b10..c9a4c04b352 100644 --- a/tests/queries/0_stateless/01576_alias_column_rewrite.reference +++ b/tests/queries/0_stateless/01576_alias_column_rewrite.reference @@ -61,3 +61,4 @@ second-index 1 1 1 +1 1 diff --git a/tests/queries/0_stateless/01576_alias_column_rewrite.sql b/tests/queries/0_stateless/01576_alias_column_rewrite.sql index cab32db0192..910c95afd64 100644 --- a/tests/queries/0_stateless/01576_alias_column_rewrite.sql +++ b/tests/queries/0_stateless/01576_alias_column_rewrite.sql @@ -127,3 +127,11 @@ select sum(i) from pd group by dt_m settings allow_experimental_projection_optim drop table pd; drop table pl; + +drop table if exists t; + +create temporary table t (x UInt64, y alias x); +insert into t values (1); +select sum(x), sum(y) from t; + +drop table t; diff --git a/tests/queries/0_stateless/01585_use_index_for_global_in_with_null.reference b/tests/queries/0_stateless/01585_use_index_for_global_in_with_null.reference new file mode 100644 index 00000000000..de0116f9eaa --- /dev/null +++ b/tests/queries/0_stateless/01585_use_index_for_global_in_with_null.reference @@ -0,0 +1,20 @@ +0 2 +1 3 +0 2 +1 3 +0 2 +1 3 +0 2 +1 3 +0 2 +1 3 +0 2 +1 3 +0 2 +1 3 +0 2 +1 3 +\N 100 +\N 100 +\N 100 +\N 100 diff --git a/tests/queries/0_stateless/01585_use_index_for_global_in_with_null.sql b/tests/queries/0_stateless/01585_use_index_for_global_in_with_null.sql new file mode 100644 index 00000000000..72f12ce435a --- /dev/null +++ b/tests/queries/0_stateless/01585_use_index_for_global_in_with_null.sql @@ -0,0 +1,30 @@ +drop table if exists xp; +drop table if exists xp_d; + +create table xp(i Nullable(UInt64), j UInt64) engine MergeTree order by i settings index_granularity = 1, allow_nullable_key = 1; +create table xp_d as xp engine Distributed(test_shard_localhost, currentDatabase(), xp); + +insert into xp select number, number + 2 from numbers(10); +insert into xp select null, 100; + +optimize table xp final; + +set max_rows_to_read = 2; +select * from xp where i in (select * from 
numbers(2)); +select * from xp where i global in (select * from numbers(2)); +select * from xp_d where i in (select * from numbers(2)); +select * from xp_d where i global in (select * from numbers(2)); + +set transform_null_in = 1; +select * from xp where i in (select * from numbers(2)); +select * from xp where i global in (select * from numbers(2)); +select * from xp_d where i in (select * from numbers(2)); +select * from xp_d where i global in (select * from numbers(2)); + +select * from xp where i in (null); +select * from xp where i global in (null); +select * from xp_d where i in (null); +select * from xp_d where i global in (null); + +drop table if exists xp; +drop table if exists xp_d; diff --git a/tests/queries/0_stateless/01591_window_functions.reference b/tests/queries/0_stateless/01591_window_functions.reference index fa909972678..26e9e500c3c 100644 --- a/tests/queries/0_stateless/01591_window_functions.reference +++ b/tests/queries/0_stateless/01591_window_functions.reference @@ -1,6 +1,5 @@ -- { echo } -set allow_experimental_window_functions = 1; -- just something basic select number, count() over (partition by intDiv(number, 3) order by number rows unbounded preceding) from numbers(10); 0 1 @@ -1056,10 +1055,26 @@ settings max_block_size = 3; 15 3 15 0 15 15 15 -- careful with auto-application of Null combinator select lagInFrame(toNullable(1)) over (); -0 +\N select lagInFrameOrNull(1) over (); -- { serverError 36 } +-- this is the same as `select max(Null::Nullable(Nothing))` select intDiv(1, NULL) x, toTypeName(x), max(x) over (); \N Nullable(Nothing) \N +-- to make lagInFrame return null for out-of-frame rows, cast the argument to +-- Nullable; otherwise, it returns default values. +SELECT + number, + lagInFrame(toNullable(number), 1) OVER w, + lagInFrame(toNullable(number), 2) OVER w, + lagInFrame(number, 1) OVER w, + lagInFrame(number, 2) OVER w +FROM numbers(4) +WINDOW w AS (ORDER BY number ASC) +; +0 \N \N 0 0 +1 0 \N 0 0 +2 1 0 1 0 +3 2 1 2 1 -- case-insensitive SQL-standard synonyms for any and anyLast select number, @@ -1079,6 +1094,62 @@ order by number 7 6 8 8 7 9 9 8 9 +-- nth_value without specific frame range given +select + number, + nth_value(number, 1) over w as firstValue, + nth_value(number, 2) over w as secondValue, + nth_value(number, 3) over w as thirdValue, + nth_value(number, 4) over w as fourthValue +from numbers(10) +window w as (order by number) +order by number +; +0 0 0 0 0 +1 0 1 0 0 +2 0 1 2 0 +3 0 1 2 3 +4 0 1 2 3 +5 0 1 2 3 +6 0 1 2 3 +7 0 1 2 3 +8 0 1 2 3 +9 0 1 2 3 +-- nth_value with frame range specified +select + number, + nth_value(number, 1) over w as firstValue, + nth_value(number, 2) over w as secondValue, + nth_value(number, 3) over w as thirdValue, + nth_value(number, 4) over w as fourthValue +from numbers(10) +window w as (order by number range between 1 preceding and 1 following) +order by number +; +0 0 1 0 0 +1 0 1 2 0 +2 1 2 3 0 +3 2 3 4 0 +4 3 4 5 0 +5 4 5 6 0 +6 5 6 7 0 +7 6 7 8 0 +8 7 8 9 0 +9 8 9 0 0 +-- to make nth_value return null for out-of-frame rows, cast the argument to +-- Nullable; otherwise, it returns default values. +SELECT + number, + nth_value(toNullable(number), 1) OVER w as firstValue, + nth_value(toNullable(number), 3) OVER w as thirdValue +FROM numbers(5) +WINDOW w AS (ORDER BY number ASC) +; +0 0 \N +1 0 \N +2 0 2 +3 0 2 +4 0 2 -- In this case, we had a problem with PartialSortingTransform returning zero-row -- chunks for input chunks w/o columns.
select count() over () from numbers(4) where number < 2; diff --git a/tests/queries/0_stateless/01591_window_functions.sql b/tests/queries/0_stateless/01591_window_functions.sql index 05f4cb49252..3075c1ddb46 100644 --- a/tests/queries/0_stateless/01591_window_functions.sql +++ b/tests/queries/0_stateless/01591_window_functions.sql @@ -1,7 +1,5 @@ -- { echo } -set allow_experimental_window_functions = 1; - -- just something basic select number, count() over (partition by intDiv(number, 3) order by number rows unbounded preceding) from numbers(10); @@ -379,7 +377,19 @@ settings max_block_size = 3; -- careful with auto-application of Null combinator select lagInFrame(toNullable(1)) over (); select lagInFrameOrNull(1) over (); -- { serverError 36 } +-- this is the same as `select max(Null::Nullable(Nothing))` select intDiv(1, NULL) x, toTypeName(x), max(x) over (); +-- to make lagInFrame return null for out-of-frame rows, cast the argument to +-- Nullable; otherwise, it returns default values. +SELECT + number, + lagInFrame(toNullable(number), 1) OVER w, + lagInFrame(toNullable(number), 2) OVER w, + lagInFrame(number, 1) OVER w, + lagInFrame(number, 2) OVER w +FROM numbers(4) +WINDOW w AS (ORDER BY number ASC) +; -- case-insensitive SQL-standard synonyms for any and anyLast select @@ -391,6 +401,40 @@ window w as (order by number range between 1 preceding and 1 following) order by number ; +-- nth_value without specific frame range given +select + number, + nth_value(number, 1) over w as firstValue, + nth_value(number, 2) over w as secondValue, + nth_value(number, 3) over w as thirdValue, + nth_value(number, 4) over w as fourthValue +from numbers(10) +window w as (order by number) +order by number +; + +-- nth_value with frame range specified +select + number, + nth_value(number, 1) over w as firstValue, + nth_value(number, 2) over w as secondValue, + nth_value(number, 3) over w as thirdValue, + nth_value(number, 4) over w as fourthValue +from numbers(10) +window w as (order by number range between 1 preceding and 1 following) +order by number +; + +-- to make nth_value return null for out-of-frame rows, cast the argument to +-- Nullable; otherwise, it returns default values. +SELECT + number, + nth_value(toNullable(number), 1) OVER w as firstValue, + nth_value(toNullable(number), 3) OVER w as thirdValue +FROM numbers(5) +WINDOW w AS (ORDER BY number ASC) +; + -- In this case, we had a problem with PartialSortingTransform returning zero-row -- chunks for input chunks w/o columns.
select count() over () from numbers(4) where number < 2; diff --git a/tests/queries/0_stateless/01592_long_window_functions1.sql b/tests/queries/0_stateless/01592_long_window_functions1.sql index bb0f77ff60a..14fe3affed3 100644 --- a/tests/queries/0_stateless/01592_long_window_functions1.sql +++ b/tests/queries/0_stateless/01592_long_window_functions1.sql @@ -1,6 +1,5 @@ drop table if exists stack; -set allow_experimental_window_functions = 1; set max_insert_threads = 4; create table stack(item_id Int64, brand_id Int64, rack_id Int64, dt DateTime, expiration_dt DateTime, quantity UInt64) diff --git a/tests/queries/0_stateless/01592_window_functions.sql b/tests/queries/0_stateless/01592_window_functions.sql index 8d5033fc821..b05b04628d2 100644 --- a/tests/queries/0_stateless/01592_window_functions.sql +++ b/tests/queries/0_stateless/01592_window_functions.sql @@ -1,5 +1,3 @@ -set allow_experimental_window_functions = 1; - drop table if exists product_groups; drop table if exists products; diff --git a/tests/queries/0_stateless/01600_log_queries_with_extensive_info.reference b/tests/queries/0_stateless/01600_log_queries_with_extensive_info.reference index 453827808f4..701e72b3b8e 100644 --- a/tests/queries/0_stateless/01600_log_queries_with_extensive_info.reference +++ b/tests/queries/0_stateless/01600_log_queries_with_extensive_info.reference @@ -6,6 +6,7 @@ create database test_log_queries 18329631544365042880 Create ['test_log_queries' create table test_log_queries.logtable(i int, j int, k int) engine MergeTree order by i 14473140110122260412 Create ['test_log_queries'] ['test_log_queries.logtable'] [] insert into test_log_queries.logtable values 10533878590475998223 Insert ['test_log_queries'] ['test_log_queries.logtable'] [] select k from test_log_queries.logtable where i = 4 10551554599491277990 Select ['test_log_queries'] ['test_log_queries.logtable'] ['test_log_queries.logtable.i','test_log_queries.logtable.k'] +select k from test_log_queries.logtable where i > \'\' 10861140285495151963 Select [] [] [] select k from test_log_queries.logtable where i = 1 10551554599491277990 Select ['test_log_queries'] ['test_log_queries.logtable'] ['test_log_queries.logtable.i','test_log_queries.logtable.k'] select * from test_log_queries.logtable where i = 1 2790142879136771124 Select ['test_log_queries'] ['test_log_queries.logtable'] ['test_log_queries.logtable.i','test_log_queries.logtable.j','test_log_queries.logtable.k'] create table test_log_queries.logtable2 as test_log_queries.logtable 16326833375045356331 Create ['test_log_queries'] ['test_log_queries.logtable','test_log_queries.logtable2'] [] diff --git a/tests/queries/0_stateless/01600_log_queries_with_extensive_info.sh b/tests/queries/0_stateless/01600_log_queries_with_extensive_info.sh index a6a8f221084..6f0f1c29208 100755 --- a/tests/queries/0_stateless/01600_log_queries_with_extensive_info.sh +++ b/tests/queries/0_stateless/01600_log_queries_with_extensive_info.sh @@ -10,6 +10,10 @@ ${CLICKHOUSE_CLIENT} -q "create database test_log_queries" "--query_id=01600_log ${CLICKHOUSE_CLIENT} -q "create table test_log_queries.logtable(i int, j int, k int) engine MergeTree order by i" "--query_id=01600_log_queries_with_extensive_info_002" ${CLICKHOUSE_CLIENT} -q "insert into test_log_queries.logtable values (1,2,3), (4,5,6)" "--query_id=01600_log_queries_with_extensive_info_003" ${CLICKHOUSE_CLIENT} -q "select k from test_log_queries.logtable where i = 4" "--query_id=01600_log_queries_with_extensive_info_004" + +# exception query should also 
contain query_kind +${CLICKHOUSE_CLIENT} -q "select k from test_log_queries.logtable where i > ''" "--query_id=01600_log_queries_with_extensive_info_004_err" 2> /dev/null || true + ${CLICKHOUSE_CLIENT} -q "select k from test_log_queries.logtable where i = 1" "--query_id=01600_log_queries_with_extensive_info_005" ${CLICKHOUSE_CLIENT} -q "select * from test_log_queries.logtable where i = 1" "--query_id=01600_log_queries_with_extensive_info_006" ${CLICKHOUSE_CLIENT} -q "create table test_log_queries.logtable2 as test_log_queries.logtable" "--query_id=01600_log_queries_with_extensive_info_007" @@ -26,4 +30,4 @@ ${CLICKHOUSE_CLIENT} -q "drop table if exists test_log_queries.logtable3" "--que ${CLICKHOUSE_CLIENT} -q "drop database if exists test_log_queries" "--query_id=01600_log_queries_with_extensive_info_018" ${CLICKHOUSE_CLIENT} -q "system flush logs" -${CLICKHOUSE_CLIENT} -q "select columns(query, normalized_query_hash, query_kind, databases, tables, columns) apply (any) from system.query_log where current_database = currentDatabase() AND type = 'QueryFinish' and query_id like '01600_log_queries_with_extensive_info%' group by query_id order by query_id" +${CLICKHOUSE_CLIENT} -q "select columns(query, normalized_query_hash, query_kind, databases, tables, columns) apply (any) from system.query_log where current_database = currentDatabase() AND type != 'QueryStart' and query_id like '01600_log_queries_with_extensive_info%' group by query_id order by query_id" diff --git a/tests/queries/0_stateless/01600_quota_by_forwarded_ip.sh b/tests/queries/0_stateless/01600_quota_by_forwarded_ip.sh index 323dd88efab..97e4da5f9e3 100755 --- a/tests/queries/0_stateless/01600_quota_by_forwarded_ip.sh +++ b/tests/queries/0_stateless/01600_quota_by_forwarded_ip.sh @@ -6,20 +6,14 @@ CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) $CLICKHOUSE_CLIENT -n --query " -DROP USER IF EXISTS quoted_by_ip; -DROP USER IF EXISTS quoted_by_forwarded_ip; +CREATE USER quoted_by_ip_${CLICKHOUSE_DATABASE}; +CREATE USER quoted_by_forwarded_ip_${CLICKHOUSE_DATABASE}; -DROP QUOTA IF EXISTS quota_by_ip; -DROP QUOTA IF EXISTS quota_by_forwarded_ip; +GRANT SELECT, CREATE ON *.* TO quoted_by_ip_${CLICKHOUSE_DATABASE}; +GRANT SELECT, CREATE ON *.* TO quoted_by_forwarded_ip_${CLICKHOUSE_DATABASE}; -CREATE USER quoted_by_ip; -CREATE USER quoted_by_forwarded_ip; - -GRANT SELECT, CREATE ON *.* TO quoted_by_ip; -GRANT SELECT, CREATE ON *.* TO quoted_by_forwarded_ip; - -CREATE QUOTA quota_by_ip KEYED BY ip_address FOR RANDOMIZED INTERVAL 1 YEAR MAX QUERIES = 1 TO quoted_by_ip; -CREATE QUOTA quota_by_forwarded_ip KEYED BY forwarded_ip_address FOR RANDOMIZED INTERVAL 1 YEAR MAX QUERIES = 1 TO quoted_by_forwarded_ip; +CREATE QUOTA quota_by_ip_${CLICKHOUSE_DATABASE} KEYED BY ip_address FOR RANDOMIZED INTERVAL 1 YEAR MAX QUERIES = 1 TO quoted_by_ip_${CLICKHOUSE_DATABASE}; +CREATE QUOTA quota_by_forwarded_ip_${CLICKHOUSE_DATABASE} KEYED BY forwarded_ip_address FOR RANDOMIZED INTERVAL 1 YEAR MAX QUERIES = 1 TO quoted_by_forwarded_ip_${CLICKHOUSE_DATABASE}; " # Note: the test can be flaky if the randomized interval will end while the loop is run. But with year long interval it's unlikely. 
@@ -28,39 +22,39 @@ CREATE QUOTA quota_by_forwarded_ip KEYED BY forwarded_ip_address FOR RANDOMIZED echo '--- Test with quota by immediate IP ---' while true; do - $CLICKHOUSE_CLIENT --user quoted_by_ip --query "SELECT count() FROM numbers(10)" 2>/dev/null || break + ${CLICKHOUSE_CURL} --fail -sS "${CLICKHOUSE_URL}&user=quoted_by_ip_${CLICKHOUSE_DATABASE}" -d "SELECT count() FROM numbers(10)" 2>/dev/null || break done | uniq -${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&user=quoted_by_ip" -d "SELECT count() FROM numbers(10)" | grep -oF 'exceeded' +${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&user=quoted_by_ip_${CLICKHOUSE_DATABASE}" -d "SELECT count() FROM numbers(10)" | grep -oF 'exceeded' # X-Forwarded-For is ignored for quota by immediate IP address -${CLICKHOUSE_CURL} -H 'X-Forwarded-For: 1.2.3.4' -sS "${CLICKHOUSE_URL}&user=quoted_by_ip" -d "SELECT count() FROM numbers(10)" | grep -oF 'exceeded' +${CLICKHOUSE_CURL} -H 'X-Forwarded-For: 1.2.3.4' -sS "${CLICKHOUSE_URL}&user=quoted_by_ip_${CLICKHOUSE_DATABASE}" -d "SELECT count() FROM numbers(10)" | grep -oF 'exceeded' echo '--- Test with quota by forwarded IP ---' while true; do - $CLICKHOUSE_CLIENT --user quoted_by_forwarded_ip --query "SELECT count() FROM numbers(10)" 2>/dev/null || break + ${CLICKHOUSE_CURL} --fail -sS "${CLICKHOUSE_URL}&user=quoted_by_forwarded_ip_${CLICKHOUSE_DATABASE}" -d "SELECT count() FROM numbers(10)" 2>/dev/null || break done | uniq -${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&user=quoted_by_forwarded_ip" -d "SELECT count() FROM numbers(10)" | grep -oF 'exceeded' +${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&user=quoted_by_forwarded_ip_${CLICKHOUSE_DATABASE}" -d "SELECT count() FROM numbers(10)" | grep -oF 'exceeded' # X-Forwarded-For is respected for quota by forwarded IP address while true; do - ${CLICKHOUSE_CURL} -H 'X-Forwarded-For: 1.2.3.4' -sS "${CLICKHOUSE_URL}&user=quoted_by_forwarded_ip" -d "SELECT count() FROM numbers(10)" | grep -oP '^10$' || break + ${CLICKHOUSE_CURL} -H 'X-Forwarded-For: 1.2.3.4' -sS "${CLICKHOUSE_URL}&user=quoted_by_forwarded_ip_${CLICKHOUSE_DATABASE}" -d "SELECT count() FROM numbers(10)" | grep -oP '^10$' || break done | uniq -${CLICKHOUSE_CURL} -H 'X-Forwarded-For: 1.2.3.4' -sS "${CLICKHOUSE_URL}&user=quoted_by_forwarded_ip" -d "SELECT count() FROM numbers(10)" | grep -oF 'exceeded' +${CLICKHOUSE_CURL} -H 'X-Forwarded-For: 1.2.3.4' -sS "${CLICKHOUSE_URL}&user=quoted_by_forwarded_ip_${CLICKHOUSE_DATABASE}" -d "SELECT count() FROM numbers(10)" | grep -oF 'exceeded' # Only the last IP address is trusted -${CLICKHOUSE_CURL} -H 'X-Forwarded-For: 5.6.7.8, 1.2.3.4' -sS "${CLICKHOUSE_URL}&user=quoted_by_forwarded_ip" -d "SELECT count() FROM numbers(10)" | grep -oF 'exceeded' +${CLICKHOUSE_CURL} -H 'X-Forwarded-For: 5.6.7.8, 1.2.3.4' -sS "${CLICKHOUSE_URL}&user=quoted_by_forwarded_ip_${CLICKHOUSE_DATABASE}" -d "SELECT count() FROM numbers(10)" | grep -oF 'exceeded' -${CLICKHOUSE_CURL} -H 'X-Forwarded-For: 1.2.3.4, 5.6.7.8' -sS "${CLICKHOUSE_URL}&user=quoted_by_forwarded_ip" -d "SELECT count() FROM numbers(10)" +${CLICKHOUSE_CURL} -H 'X-Forwarded-For: 1.2.3.4, 5.6.7.8' -sS "${CLICKHOUSE_URL}&user=quoted_by_forwarded_ip_${CLICKHOUSE_DATABASE}" -d "SELECT count() FROM numbers(10)" $CLICKHOUSE_CLIENT -n --query " -DROP QUOTA IF EXISTS quota_by_ip; +DROP QUOTA IF EXISTS quota_by_ip_${CLICKHOUSE_DATABASE}; -DROP QUOTA IF EXISTS quota_by_forwarded_ip; +DROP QUOTA IF EXISTS quota_by_forwarded_ip_${CLICKHOUSE_DATABASE}; -DROP USER IF EXISTS quoted_by_ip; -DROP USER IF EXISTS quoted_by_forwarded_ip; +DROP USER IF EXISTS quoted_by_ip_${CLICKHOUSE_DATABASE}; +DROP USER IF
EXISTS quoted_by_forwarded_ip_${CLICKHOUSE_DATABASE}; " diff --git a/tests/queries/0_stateless/01611_constant_folding_subqueries.reference b/tests/queries/0_stateless/01611_constant_folding_subqueries.reference index e46fd479413..6128cd109e2 100644 --- a/tests/queries/0_stateless/01611_constant_folding_subqueries.reference +++ b/tests/queries/0_stateless/01611_constant_folding_subqueries.reference @@ -5,7 +5,7 @@ SELECT (SELECT * FROM system.numbers LIMIT 1 OFFSET 1) AS n, toUInt64(10 / n) FO 1,10 EXPLAIN SYNTAX SELECT (SELECT * FROM system.numbers LIMIT 1 OFFSET 1) AS n, toUInt64(10 / n); SELECT - identity(CAST(0, \'UInt64\')) AS n, + identity(CAST(0, \'Nullable(UInt64)\')) AS n, toUInt64(10 / n) SELECT * FROM (WITH (SELECT * FROM system.numbers LIMIT 1 OFFSET 1) AS n, toUInt64(10 / n) as q SELECT * FROM system.one WHERE q > 0); 0 diff --git a/tests/queries/0_stateless/01622_defaults_for_url_engine.sh b/tests/queries/0_stateless/01622_defaults_for_url_engine.sh index 7afdbbc6b66..491a1bd8988 100755 --- a/tests/queries/0_stateless/01622_defaults_for_url_engine.sh +++ b/tests/queries/0_stateless/01622_defaults_for_url_engine.sh @@ -7,8 +7,6 @@ CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) PORT="$(($RANDOM%63000+2001))" -TEMP_FILE="${CLICKHOUSE_TMP}/01622_defaults_for_url_engine.tmp" - function thread1 { while true; do @@ -19,7 +17,7 @@ function thread1 function thread2 { while true; do - $CLICKHOUSE_CLIENT --input_format_defaults_for_omitted_fields=1 -q "SELECT * FROM url('http://127.0.0.1:$1/', JSONEachRow, 'a int, b int default 7, c default a + b') format Values" + $CLICKHOUSE_CLIENT --input_format_defaults_for_omitted_fields=1 -q "SELECT * FROM url('http://127.0.0.1:$1/', JSONEachRow, 'a int, b int default 7, c default a + b') format Values" | grep -F '(1,7,8)' && break done } @@ -27,11 +25,11 @@ function thread2 export -f thread1; export -f thread2; -TIMEOUT=5 +TIMEOUT=60 timeout $TIMEOUT bash -c "thread1 $PORT" > /dev/null 2>&1 & -timeout $TIMEOUT bash -c "thread2 $PORT" 2> /dev/null > $TEMP_FILE & +PID=$! 
-wait +bash -c "thread2 $PORT" 2> /dev/null | grep -q -F '(1,7,8)' && echo "Ok" && kill -9 $PID -grep -q '(1,7,8)' $TEMP_FILE && echo "Ok" +wait >/dev/null 2>&1 diff --git a/tests/queries/0_stateless/01632_group_array_msan.sql b/tests/queries/0_stateless/01632_group_array_msan.sql index 0000f158d4e..f67ff896c3f 100644 --- a/tests/queries/0_stateless/01632_group_array_msan.sql +++ b/tests/queries/0_stateless/01632_group_array_msan.sql @@ -1 +1,4 @@ -SELECT groupArrayMerge(1048577)(y * 1048576) FROM (SELECT groupArrayState(9223372036854775807)(x) AS y FROM (SELECT 1048576 AS x)) FORMAT Null; +SELECT groupArrayMerge(1048577)(y * 1048576) FROM (SELECT groupArrayState(9223372036854775807)(x) AS y FROM (SELECT 1048576 AS x)) FORMAT Null; -- { serverError 43 } +SELECT groupArrayMerge(1048577)(y * 1048576) FROM (SELECT groupArrayState(1048577)(x) AS y FROM (SELECT 1048576 AS x)) FORMAT Null; +SELECT groupArrayMerge(9223372036854775807)(y * 1048576) FROM (SELECT groupArrayState(9223372036854775807)(x) AS y FROM (SELECT 1048576 AS x)) FORMAT Null; +SELECT quantileResampleMerge(0.5, 257, 65536, 1)(tuple(*).1) FROM (SELECT quantileResampleState(0.10, 1, 2, 42)(number, number) FROM numbers(100)); -- { serverError 43 } diff --git a/tests/queries/0_stateless/01651_bugs_from_15889.reference b/tests/queries/0_stateless/01651_bugs_from_15889.reference index 77ac542d4fb..8b137891791 100644 --- a/tests/queries/0_stateless/01651_bugs_from_15889.reference +++ b/tests/queries/0_stateless/01651_bugs_from_15889.reference @@ -1,2 +1 @@ -0 diff --git a/tests/queries/0_stateless/01651_bugs_from_15889.sql b/tests/queries/0_stateless/01651_bugs_from_15889.sql index d0f1006da95..4717a8dcc0d 100644 --- a/tests/queries/0_stateless/01651_bugs_from_15889.sql +++ b/tests/queries/0_stateless/01651_bugs_from_15889.sql @@ -55,7 +55,7 @@ WHERE (query_id = WHERE current_database = currentDatabase() AND (query LIKE '%test cpu time query profiler%') AND (query NOT LIKE '%system%') ORDER BY event_time DESC LIMIT 1 -)) AND (symbol LIKE '%Source%'); +)) AND (symbol LIKE '%Source%'); -- { serverError 125 } WITH addressToSymbol(arrayJoin(trace)) AS symbol @@ -70,7 +70,7 @@ WHERE greaterOrEquals(event_date, ignore(ignore(ignore(NULL, '')), 256), yesterd WHERE current_database = currentDatabase() AND (event_date >= yesterday()) AND (query LIKE '%test memory profiler%') ORDER BY event_time DESC LIMIT 1 -)); -- { serverError 42 } +)); -- { serverError 125 } DROP TABLE IF EXISTS trace_log; diff --git a/tests/queries/0_stateless/01656_sequence_next_node_long.sql b/tests/queries/0_stateless/01656_sequence_next_node_long.sql index d0d01e989b8..9c181f5e491 100644 --- a/tests/queries/0_stateless/01656_sequence_next_node_long.sql +++ b/tests/queries/0_stateless/01656_sequence_next_node_long.sql @@ -4,30 +4,30 @@ DROP TABLE IF EXISTS test_sequenceNextNode_Nullable; CREATE TABLE IF NOT EXISTS test_sequenceNextNode_Nullable (dt DateTime, id int, action Nullable(String)) ENGINE = MergeTree() PARTITION BY dt ORDER BY id; -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:01',1,'A'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:02',1,'B'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:03',1,'C'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:04',1,'D'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:01',2,'B'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:02',2,'B'); -INSERT INTO test_sequenceNextNode_Nullable 
values ('1970-01-01 09:00:03',2,'D'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:04',2,'C'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:01',3,'A'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:02',3,'B'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:01',4,'A'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:02',4,'A'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:03',4,'A'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:04',4,'B'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:05',4,'C'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:01',5,'A'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:02',5,'B'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:03',5,'A'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:04',5,'C'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:01',6,'A'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:02',6,'B'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:03',6,'A'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:04',6,'B'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:05',6,'C'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:01',1,'A'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:02',1,'B'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:03',1,'C'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:04',1,'D'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:01',2,'B'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:02',2,'B'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:03',2,'D'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:04',2,'C'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:01',3,'A'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:02',3,'B'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:01',4,'A'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:02',4,'A'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:03',4,'A'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:04',4,'B'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:05',4,'C'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:01',5,'A'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:02',5,'B'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:03',5,'A'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:04',5,'C'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:01',6,'A'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:02',6,'B'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:03',6,'A'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:04',6,'B'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:05',6,'C'); SELECT '(forward, head, A)', id, sequenceNextNode('forward', 'head')(dt, action, 1, action = 'A') AS next_node FROM test_sequenceNextNode_Nullable GROUP 
BY id ORDER BY id; SELECT '(forward, head, B)', id, sequenceNextNode('forward', 'head')(dt, action, 1, action = 'B') AS next_node FROM test_sequenceNextNode_Nullable GROUP BY id ORDER BY id; @@ -50,11 +50,11 @@ SELECT '(forward, head, B->A->A)', id, sequenceNextNode('forward', 'head')(dt, a SELECT '(backward, tail, A->A->B)', id, sequenceNextNode('backward', 'tail')(dt, action, 1, action = 'A', action = 'A', action = 'B') AS next_node FROM test_sequenceNextNode_Nullable GROUP BY id ORDER BY id; SELECT '(backward, tail, B->A->A)', id, sequenceNextNode('backward', 'tail')(dt, action, 1, action = 'B', action = 'A', action = 'A') AS next_node FROM test_sequenceNextNode_Nullable GROUP BY id ORDER BY id; -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:01',10,'A'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:02',10,'B'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:02',10,NULL); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:03',10,'C'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:04',10,'D'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:01',10,'A'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:02',10,'B'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:02',10,NULL); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:03',10,'C'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:04',10,'D'); SELECT '(forward, head, A) id >= 10', id, sequenceNextNode('forward', 'head')(dt, action, 1, action = 'A') AS next_node FROM test_sequenceNextNode_Nullable WHERE id >= 10 GROUP BY id ORDER BY id; SELECT '(forward, head, A) id >= 10', id, sequenceNextNode('forward', 'head')(dt, action, 1, action = 'A', action = 'B') AS next_node FROM test_sequenceNextNode_Nullable WHERE id >= 10 GROUP BY id ORDER BY id; @@ -63,10 +63,10 @@ SELECT '(forward, head, A) id >= 10', id, sequenceNextNode('backward', 'tail')(d SELECT '(backward, tail, A) id >= 10', id, sequenceNextNode('backward', 'tail')(dt, action, 1, action = 'D', action = 'C') AS next_node FROM test_sequenceNextNode_Nullable WHERE id >= 10 GROUP BY id ORDER BY id; SELECT '(backward, tail, A) id >= 10', id, sequenceNextNode('backward', 'tail')(dt, action, 1, action = 'D', action = 'C', action = 'B') AS next_node FROM test_sequenceNextNode_Nullable WHERE id >= 10 GROUP BY id ORDER BY id; -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:01',11,'A'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:01',11,'B'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:01',11,'C'); -INSERT INTO test_sequenceNextNode_Nullable values ('1970-01-01 09:00:01',11,'D'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:01',11,'A'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:01',11,'B'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:01',11,'C'); +INSERT INTO test_sequenceNextNode_Nullable values ('2000-01-02 09:00:01',11,'D'); SELECT '(0, A) id = 11', count() FROM (SELECT id, sequenceNextNode('forward', 'head')(dt, action, 1, action = 'A') AS next_node FROM test_sequenceNextNode_Nullable WHERE id = 11 GROUP BY id HAVING next_node = 'B'); SELECT '(0, C) id = 11', count() FROM (SELECT id, sequenceNextNode('forward', 'head')(dt, action, 1, action = 'C') AS next_node FROM test_sequenceNextNode_Nullable WHERE id = 11 GROUP BY id 
HAVING next_node = 'D'); @@ -100,30 +100,30 @@ DROP TABLE IF EXISTS test_sequenceNextNode; CREATE TABLE IF NOT EXISTS test_sequenceNextNode (dt DateTime, id int, action String) ENGINE = MergeTree() PARTITION BY dt ORDER BY id; -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:01',1,'A'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:02',1,'B'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:03',1,'C'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:04',1,'D'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:01',2,'B'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:02',2,'B'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:03',2,'D'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:04',2,'C'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:01',3,'A'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:02',3,'B'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:01',4,'A'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:02',4,'A'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:03',4,'A'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:04',4,'B'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:05',4,'C'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:01',5,'A'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:02',5,'B'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:03',5,'A'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:04',5,'C'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:01',6,'A'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:02',6,'B'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:03',6,'A'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:04',6,'B'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:05',6,'C'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:01',1,'A'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:02',1,'B'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:03',1,'C'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:04',1,'D'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:01',2,'B'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:02',2,'B'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:03',2,'D'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:04',2,'C'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:01',3,'A'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:02',3,'B'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:01',4,'A'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:02',4,'A'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:03',4,'A'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:04',4,'B'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:05',4,'C'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:01',5,'A'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:02',5,'B'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:03',5,'A'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:04',5,'C'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:01',6,'A'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:02',6,'B'); +INSERT INTO test_sequenceNextNode values 
('2000-01-02 09:00:03',6,'A'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:04',6,'B'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:05',6,'C'); SELECT '(forward, head, A)', id, sequenceNextNode('forward', 'head')(dt, action, 1, action = 'A') AS next_node FROM test_sequenceNextNode GROUP BY id ORDER BY id; SELECT '(forward, head, B)', id, sequenceNextNode('forward', 'head')(dt, action, 1, action = 'B') AS next_node FROM test_sequenceNextNode GROUP BY id ORDER BY id; @@ -146,10 +146,10 @@ SELECT '(forward, head, B->A->A)', id, sequenceNextNode('forward', 'head')(dt, a SELECT '(backward, tail, A->A->B)', id, sequenceNextNode('backward', 'tail')(dt, action, 1, action = 'A', action = 'A', action = 'B') AS next_node FROM test_sequenceNextNode GROUP BY id ORDER BY id; SELECT '(backward, tail, B->A->A)', id, sequenceNextNode('backward', 'tail')(dt, action, 1, action = 'B', action = 'A', action = 'A') AS next_node FROM test_sequenceNextNode GROUP BY id ORDER BY id; -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:01',10,'A'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:02',10,'B'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:03',10,'C'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:04',10,'D'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:01',10,'A'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:02',10,'B'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:03',10,'C'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:04',10,'D'); SELECT '(forward, head, A) id >= 10', id, sequenceNextNode('forward', 'head')(dt, action, 1, action = 'A') AS next_node FROM test_sequenceNextNode WHERE id >= 10 GROUP BY id ORDER BY id; SELECT '(forward, head, A) id >= 10', id, sequenceNextNode('forward', 'head')(dt, action, 1, action = 'A', action = 'B') AS next_node FROM test_sequenceNextNode WHERE id >= 10 GROUP BY id ORDER BY id; @@ -158,10 +158,10 @@ SELECT '(forward, head, A) id >= 10', id, sequenceNextNode('backward', 'tail')(d SELECT '(backward, tail, A) id >= 10', id, sequenceNextNode('backward', 'tail')(dt, action, 1, action = 'D', action = 'C') AS next_node FROM test_sequenceNextNode WHERE id >= 10 GROUP BY id ORDER BY id; SELECT '(backward, tail, A) id >= 10', id, sequenceNextNode('backward', 'tail')(dt, action, 1, action = 'D', action = 'C', action = 'B') AS next_node FROM test_sequenceNextNode WHERE id >= 10 GROUP BY id ORDER BY id; -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:01',11,'A'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:01',11,'B'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:01',11,'C'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:01',11,'D'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:01',11,'A'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:01',11,'B'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:01',11,'C'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:01',11,'D'); SELECT '(0, A) id = 11', count() FROM (SELECT id, sequenceNextNode('forward', 'head')(dt, action, 1, action = 'A') AS next_node FROM test_sequenceNextNode WHERE id = 11 GROUP BY id HAVING next_node = 'B'); SELECT '(0, C) id = 11', count() FROM (SELECT id, sequenceNextNode('forward', 'head')(dt, action, 1, action = 'C') AS next_node FROM test_sequenceNextNode WHERE id = 11 GROUP BY id HAVING next_node = 'D'); @@ -189,8 +189,8 @@ SELECT '(backward, 
first_match, B->B)', id, sequenceNextNode('backward', 'first_ SELECT '(max_args)', id, sequenceNextNode('forward', 'head')(dt, action, 1, action = '0', action = '1', action = '2', action = '3', action = '4', action = '5', action = '6', action = '7', action = '8', action = '9', action = '10', action = '11', action = '12', action = '13', action = '14', action = '15', action = '16', action = '17', action = '18', action = '19', action = '20', action = '21', action = '22', action = '23', action = '24', action = '25', action = '26', action = '27', action = '28', action = '29', action = '30', action = '31', action = '32', action = '33', action = '34', action = '35', action = '36', action = '37', action = '38', action = '39', action = '40', action = '41', action = '42', action = '43', action = '44', action = '45', action = '46', action = '47', action = '48', action = '49', action = '50', action = '51', action = '52', action = '53', action = '54', action = '55', action = '56', action = '57', action = '58', action = '59', action = '60', action = '61', action = '62', action = '63') from test_sequenceNextNode GROUP BY id ORDER BY id; -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:01',12,'A'); -INSERT INTO test_sequenceNextNode values ('1970-01-01 09:00:01',12,'A'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:01',12,'A'); +INSERT INTO test_sequenceNextNode values ('2000-01-02 09:00:01',12,'A'); SELECT '(forward, head, A) id = 12', sequenceNextNode('forward', 'head')(dt, action, 1, action = 'A') AS next_node FROM test_sequenceNextNode WHERE id = 12; @@ -200,18 +200,18 @@ DROP TABLE IF EXISTS test_base_condition; CREATE TABLE IF NOT EXISTS test_base_condition (dt DateTime, id int, action String, referrer String) ENGINE = MergeTree() PARTITION BY dt ORDER BY id; -INSERT INTO test_base_condition values ('1970-01-01 09:00:01',1,'A','1'); -INSERT INTO test_base_condition values ('1970-01-01 09:00:02',1,'B','2'); -INSERT INTO test_base_condition values ('1970-01-01 09:00:03',1,'C','3'); -INSERT INTO test_base_condition values ('1970-01-01 09:00:04',1,'D','4'); -INSERT INTO test_base_condition values ('1970-01-01 09:00:01',2,'D','4'); -INSERT INTO test_base_condition values ('1970-01-01 09:00:02',2,'C','3'); -INSERT INTO test_base_condition values ('1970-01-01 09:00:03',2,'B','2'); -INSERT INTO test_base_condition values ('1970-01-01 09:00:04',2,'A','1'); -INSERT INTO test_base_condition values ('1970-01-01 09:00:01',3,'B','10'); -INSERT INTO test_base_condition values ('1970-01-01 09:00:02',3,'B','2'); -INSERT INTO test_base_condition values ('1970-01-01 09:00:03',3,'D','3'); -INSERT INTO test_base_condition values ('1970-01-01 09:00:04',3,'C','4'); +INSERT INTO test_base_condition values ('2000-01-02 09:00:01',1,'A','1'); +INSERT INTO test_base_condition values ('2000-01-02 09:00:02',1,'B','2'); +INSERT INTO test_base_condition values ('2000-01-02 09:00:03',1,'C','3'); +INSERT INTO test_base_condition values ('2000-01-02 09:00:04',1,'D','4'); +INSERT INTO test_base_condition values ('2000-01-02 09:00:01',2,'D','4'); +INSERT INTO test_base_condition values ('2000-01-02 09:00:02',2,'C','3'); +INSERT INTO test_base_condition values ('2000-01-02 09:00:03',2,'B','2'); +INSERT INTO test_base_condition values ('2000-01-02 09:00:04',2,'A','1'); +INSERT INTO test_base_condition values ('2000-01-02 09:00:01',3,'B','10'); +INSERT INTO test_base_condition values ('2000-01-02 09:00:02',3,'B','2'); +INSERT INTO test_base_condition values ('2000-01-02 09:00:03',3,'D','3'); +INSERT INTO 
test_base_condition values ('2000-01-02 09:00:04',3,'C','4'); SELECT '(forward, head, 1)', id, sequenceNextNode('forward', 'head')(dt, action, referrer = '1') AS next_node FROM test_base_condition GROUP BY id ORDER BY id; SELECT '(forward, head, 1, A)', id, sequenceNextNode('forward', 'head')(dt, action, referrer = '1', action = 'A') AS next_node FROM test_base_condition GROUP BY id ORDER BY id; diff --git a/tests/queries/0_stateless/01656_test_query_log_factories_info.reference b/tests/queries/0_stateless/01656_test_query_log_factories_info.reference index af7feae5a38..47b3133ceca 100644 --- a/tests/queries/0_stateless/01656_test_query_log_factories_info.reference +++ b/tests/queries/0_stateless/01656_test_query_log_factories_info.reference @@ -13,6 +13,9 @@ arraySort(used_table_functions) arraySort(used_functions) ['CAST','CRC32','addDays','array','arrayFlatten','modulo','plus','pow','round','substring','tanh','toDate','toDayOfYear','toTypeName','toWeek'] +used_functions +['repeat'] + arraySort(used_data_type_families) ['Array','Int32','Nullable','String'] diff --git a/tests/queries/0_stateless/01656_test_query_log_factories_info.sql b/tests/queries/0_stateless/01656_test_query_log_factories_info.sql index 3a890ce16f9..50d50155480 100644 --- a/tests/queries/0_stateless/01656_test_query_log_factories_info.sql +++ b/tests/queries/0_stateless/01656_test_query_log_factories_info.sql @@ -1,4 +1,5 @@ SET database_atomic_wait_for_drop_and_detach_synchronously=1; +SET log_queries=1; SELECT uniqArray([1, 1, 2]), SUBSTRING('Hello, world', 7, 5), @@ -13,9 +14,16 @@ SELECT uniqArray([1, 1, 2]), countIf(toDate('2000-12-05') + number as d, toDayOfYear(d) % 2) FROM numbers(100); + +SELECT repeat('a', number) +FROM numbers(10e3) +SETTINGS max_memory_usage=4e6, max_block_size=100 +FORMAT Null; -- { serverError 241 } + SELECT ''; SYSTEM FLUSH LOGS; + SELECT arraySort(used_aggregate_functions) FROM system.query_log WHERE current_database = currentDatabase() AND type = 'QueryFinish' AND (query LIKE '%toDate(\'2000-12-05\')%') ORDER BY query_start_time DESC LIMIT 1 FORMAT TabSeparatedWithNames; @@ -36,6 +44,11 @@ FROM system.query_log WHERE current_database = currentDatabase() AND type = 'Que ORDER BY query_start_time DESC LIMIT 1 FORMAT TabSeparatedWithNames; SELECT ''; +SELECT used_functions +FROM system.query_log WHERE current_database = currentDatabase() AND type != 'QueryStart' AND (query LIKE '%repeat%') +ORDER BY query_start_time DESC LIMIT 1 FORMAT TabSeparatedWithNames; +SELECT ''; + SELECT arraySort(used_data_type_families) FROM system.query_log WHERE current_database = currentDatabase() AND type = 'QueryFinish' AND (query LIKE '%toDate(\'2000-12-05\')%') ORDER BY query_start_time DESC LIMIT 1 FORMAT TabSeparatedWithNames; diff --git a/tests/queries/0_stateless/01658_read_file_to_stringcolumn.sh b/tests/queries/0_stateless/01658_read_file_to_stringcolumn.sh index 072e8d75f52..1bfcf863184 100755 --- a/tests/queries/0_stateless/01658_read_file_to_stringcolumn.sh +++ b/tests/queries/0_stateless/01658_read_file_to_stringcolumn.sh @@ -44,7 +44,7 @@ echo "clickhouse-client --query "'"select file('"'${user_files_path}/dir'), file echo "clickhouse-client --query "'"select file('"'/tmp/c.txt'), file('${user_files_path}/b.txt')"'";echo :$?' | bash 2>/dev/null # Test relative path consists of ".." whose absolute path is out of the user_files directory. -echo "clickhouse-client --query "'"select file('"'${user_files_path}/../../../../tmp/c.txt'), file('b.txt')"'";echo :$?' 
| bash 2>/dev/null +echo "clickhouse-client --query "'"select file('"'${user_files_path}/../../../../../../../../../../../../../../../../../../../tmp/c.txt'), file('b.txt')"'";echo :$?' | bash 2>/dev/null echo "clickhouse-client --query "'"select file('"'../../../../a.txt'), file('${user_files_path}/b.txt')"'";echo :$?' | bash 2>/dev/null diff --git a/tests/queries/0_stateless/01664_ntoa_aton_mysql_compatibility.reference b/tests/queries/0_stateless/01664_ntoa_aton_mysql_compatibility.reference new file mode 100644 index 00000000000..b8c6661bca7 --- /dev/null +++ b/tests/queries/0_stateless/01664_ntoa_aton_mysql_compatibility.reference @@ -0,0 +1,4 @@ +2a02:6b8::11 +2A0206B8000000000000000000000011 +0.0.5.57 +3232235521 diff --git a/tests/queries/0_stateless/01664_ntoa_aton_mysql_compatibility.sql b/tests/queries/0_stateless/01664_ntoa_aton_mysql_compatibility.sql new file mode 100644 index 00000000000..4f4aef09259 --- /dev/null +++ b/tests/queries/0_stateless/01664_ntoa_aton_mysql_compatibility.sql @@ -0,0 +1,4 @@ +SELECT INET6_NTOA(toFixedString(unhex('2A0206B8000000000000000000000011'), 16)); +SELECT hex(INET6_ATON('2a02:6b8::11')); +SELECT INET_NTOA(toUInt32(1337)); +SELECT INET_ATON('192.168.0.1'); diff --git a/tests/queries/0_stateless/01664_test_FunctionIPv6NumToString_mysql_compatibility.reference b/tests/queries/0_stateless/01664_test_FunctionIPv6NumToString_mysql_compatibility.reference deleted file mode 100644 index 18a9c3436e5..00000000000 --- a/tests/queries/0_stateless/01664_test_FunctionIPv6NumToString_mysql_compatibility.reference +++ /dev/null @@ -1 +0,0 @@ -2a02:6b8::11 diff --git a/tests/queries/0_stateless/01664_test_FunctionIPv6NumToString_mysql_compatibility.sql b/tests/queries/0_stateless/01664_test_FunctionIPv6NumToString_mysql_compatibility.sql deleted file mode 100644 index 85bf1f8c7f9..00000000000 --- a/tests/queries/0_stateless/01664_test_FunctionIPv6NumToString_mysql_compatibility.sql +++ /dev/null @@ -1 +0,0 @@ -SELECT INET6_NTOA(toFixedString(unhex('2A0206B8000000000000000000000011'), 16)); diff --git a/tests/queries/0_stateless/01665_test_FunctionIPv6StringToNum_mysql_compatibility.reference b/tests/queries/0_stateless/01665_test_FunctionIPv6StringToNum_mysql_compatibility.reference deleted file mode 100644 index 0b3192fc44c..00000000000 --- a/tests/queries/0_stateless/01665_test_FunctionIPv6StringToNum_mysql_compatibility.reference +++ /dev/null @@ -1 +0,0 @@ -*¸\0\0\0\0\0\0\0\0\0\0\0 diff --git a/tests/queries/0_stateless/01665_test_FunctionIPv6StringToNum_mysql_compatibility.sql b/tests/queries/0_stateless/01665_test_FunctionIPv6StringToNum_mysql_compatibility.sql deleted file mode 100644 index 2eff6cca793..00000000000 --- a/tests/queries/0_stateless/01665_test_FunctionIPv6StringToNum_mysql_compatibility.sql +++ /dev/null @@ -1 +0,0 @@ -SELECT INET6_ATON('2a02:6b8::11'); diff --git a/tests/queries/0_stateless/01666_test_FunctionIPv4NumToString_mysql_compatibility.reference b/tests/queries/0_stateless/01666_test_FunctionIPv4NumToString_mysql_compatibility.reference deleted file mode 100644 index 08674e64f67..00000000000 --- a/tests/queries/0_stateless/01666_test_FunctionIPv4NumToString_mysql_compatibility.reference +++ /dev/null @@ -1 +0,0 @@ -0.0.5.57 diff --git a/tests/queries/0_stateless/01666_test_FunctionIPv4NumToString_mysql_compatibility.sql b/tests/queries/0_stateless/01666_test_FunctionIPv4NumToString_mysql_compatibility.sql deleted file mode 100644 index 0c6608c6e74..00000000000 --- 
a/tests/queries/0_stateless/01666_test_FunctionIPv4NumToString_mysql_compatibility.sql +++ /dev/null @@ -1 +0,0 @@ -SELECT INET_NTOA(toUInt32(1337)); diff --git a/tests/queries/0_stateless/01667_test_FunctionIPv4StringToNum_mysql_compatibility.reference b/tests/queries/0_stateless/01667_test_FunctionIPv4StringToNum_mysql_compatibility.reference deleted file mode 100644 index c15798a747d..00000000000 --- a/tests/queries/0_stateless/01667_test_FunctionIPv4StringToNum_mysql_compatibility.reference +++ /dev/null @@ -1 +0,0 @@ -3232235521 diff --git a/tests/queries/0_stateless/01667_test_FunctionIPv4StringToNum_mysql_compatibility.sql b/tests/queries/0_stateless/01667_test_FunctionIPv4StringToNum_mysql_compatibility.sql deleted file mode 100644 index 6a91900370c..00000000000 --- a/tests/queries/0_stateless/01667_test_FunctionIPv4StringToNum_mysql_compatibility.sql +++ /dev/null @@ -1 +0,0 @@ -SELECT INET_ATON('192.168.0.1'); diff --git a/tests/queries/0_stateless/01676_reinterpret_as.sql b/tests/queries/0_stateless/01676_reinterpret_as.sql index 5eb94ed0a13..e8c2a0b1373 100644 --- a/tests/queries/0_stateless/01676_reinterpret_as.sql +++ b/tests/queries/0_stateless/01676_reinterpret_as.sql @@ -30,8 +30,8 @@ SELECT reinterpret(a, 'String'), reinterpretAsString(a), reinterpretAsUInt8('11' SELECT reinterpret(a, 'String'), reinterpretAsString(a), reinterpretAsUInt16('11') as a; SELECT 'Dates'; SELECT reinterpret(0, 'Date'), reinterpret('', 'Date'); -SELECT reinterpret(0, 'DateTime'), reinterpret('', 'DateTime'); -SELECT reinterpret(0, 'DateTime64'), reinterpret('', 'DateTime64'); +SELECT reinterpret(0, 'DateTime(''Europe/Moscow'')'), reinterpret('', 'DateTime(''Europe/Moscow'')'); +SELECT reinterpret(0, 'DateTime64(3, ''Europe/Moscow'')'), reinterpret('', 'DateTime64(3, ''Europe/Moscow'')'); SELECT 'Decimals'; SELECT reinterpret(toDecimal32(5, 2), 'Decimal32(2)'), reinterpret('1', 'Decimal32(2)'); SELECT reinterpret(toDecimal64(5, 2), 'Decimal64(2)'), reinterpret('1', 'Decimal64(2)');; diff --git a/tests/queries/0_stateless/01683_codec_encrypted.reference b/tests/queries/0_stateless/01683_codec_encrypted.reference new file mode 100644 index 00000000000..0d30be781e5 --- /dev/null +++ b/tests/queries/0_stateless/01683_codec_encrypted.reference @@ -0,0 +1 @@ +1 Some plaintext diff --git a/tests/queries/0_stateless/01683_codec_encrypted.sql b/tests/queries/0_stateless/01683_codec_encrypted.sql new file mode 100644 index 00000000000..ec90e6c3129 --- /dev/null +++ b/tests/queries/0_stateless/01683_codec_encrypted.sql @@ -0,0 +1,7 @@ +DROP TABLE IF EXISTS encryption_test; +CREATE TABLE encryption_test (i Int, s String Codec(Encrypted('AES-128-GCM-SIV'))) ENGINE = MergeTree ORDER BY i; + +INSERT INTO encryption_test VALUES (1, 'Some plaintext'); +SELECT * FROM encryption_test; + +DROP TABLE encryption_test; diff --git a/tests/queries/0_stateless/01686_event_time_microseconds_part_log.reference b/tests/queries/0_stateless/01686_event_time_microseconds_part_log.reference index 9766475a418..79ebd0860f4 100644 --- a/tests/queries/0_stateless/01686_event_time_microseconds_part_log.reference +++ b/tests/queries/0_stateless/01686_event_time_microseconds_part_log.reference @@ -1 +1,2 @@ ok +ok diff --git a/tests/queries/0_stateless/01686_event_time_microseconds_part_log.sql b/tests/queries/0_stateless/01686_event_time_microseconds_part_log.sql index 4a653379ef1..6063be4d1da 100644 --- a/tests/queries/0_stateless/01686_event_time_microseconds_part_log.sql +++ 
b/tests/queries/0_stateless/01686_event_time_microseconds_part_log.sql @@ -10,16 +10,27 @@ ORDER BY key; INSERT INTO table_with_single_pk SELECT number, toString(number % 10) FROM numbers(1000000); +-- Check NewPart SYSTEM FLUSH LOGS; - WITH ( SELECT (event_time, event_time_microseconds) FROM system.part_log - WHERE "table" = 'table_with_single_pk' - AND "database" = currentDatabase() + WHERE table = 'table_with_single_pk' AND database = currentDatabase() AND event_type = 'NewPart' ORDER BY event_time DESC LIMIT 1 ) AS time SELECT if(dateDiff('second', toDateTime(time.2), toDateTime(time.1)) = 0, 'ok', 'fail'); -DROP TABLE IF EXISTS table_with_single_pk; +-- Now let's check RemovePart +TRUNCATE TABLE table_with_single_pk; +SYSTEM FLUSH LOGS; +WITH ( + SELECT (event_time, event_time_microseconds) + FROM system.part_log + WHERE table = 'table_with_single_pk' AND database = currentDatabase() AND event_type = 'RemovePart' + ORDER BY event_time DESC + LIMIT 1 + ) AS time +SELECT if(dateDiff('second', toDateTime(time.2), toDateTime(time.1)) = 0, 'ok', 'fail'); + +DROP TABLE table_with_single_pk; diff --git a/tests/queries/0_stateless/01686_rocksdb.reference b/tests/queries/0_stateless/01686_rocksdb.reference index fa4e12d51ff..198771df112 100644 --- a/tests/queries/0_stateless/01686_rocksdb.reference +++ b/tests/queries/0_stateless/01686_rocksdb.reference @@ -1,3 +1,4 @@ +10000 123 Hello, world (123) -- -- diff --git a/tests/queries/0_stateless/01686_rocksdb.sql b/tests/queries/0_stateless/01686_rocksdb.sql index 9a8662453c1..23baf249d1f 100644 --- a/tests/queries/0_stateless/01686_rocksdb.sql +++ b/tests/queries/0_stateless/01686_rocksdb.sql @@ -2,7 +2,9 @@ DROP TABLE IF EXISTS 01686_test; CREATE TABLE 01686_test (key UInt64, value String) Engine=EmbeddedRocksDB PRIMARY KEY(key); +SELECT value FROM system.rocksdb WHERE database = currentDatabase() and table = '01686_test' and name = 'number.keys.written'; INSERT INTO 01686_test SELECT number, format('Hello, world ({})', toString(number)) FROM numbers(10000); +SELECT value FROM system.rocksdb WHERE database = currentDatabase() and table = '01686_test' and name = 'number.keys.written'; SELECT * FROM 01686_test WHERE key = 123; SELECT '--'; diff --git a/tests/queries/0_stateless/01691_DateTime64_clamp.reference b/tests/queries/0_stateless/01691_DateTime64_clamp.reference index 881ab4feff8..41a8d653a3f 100644 --- a/tests/queries/0_stateless/01691_DateTime64_clamp.reference +++ b/tests/queries/0_stateless/01691_DateTime64_clamp.reference @@ -17,11 +17,11 @@ SELECT toDateTime64(toFloat32(bitShiftLeft(toUInt64(1),33)), 2, 'Europe/Moscow') 2106-02-07 09:28:16.00 SELECT toDateTime64(toFloat64(bitShiftLeft(toUInt64(1),33)), 2, 'Europe/Moscow') FORMAT Null; -- These are outsize of extended range and hence clamped -SELECT toDateTime64(-1 * bitShiftLeft(toUInt64(1), 35), 2); +SELECT toDateTime64(-1 * bitShiftLeft(toUInt64(1), 35), 2, 'Europe/Moscow'); 1925-01-01 02:00:00.00 -SELECT CAST(-1 * bitShiftLeft(toUInt64(1), 35) AS DateTime64); +SELECT CAST(-1 * bitShiftLeft(toUInt64(1), 35) AS DateTime64(3, 'Europe/Moscow')); 1925-01-01 02:00:00.000 -SELECT CAST(bitShiftLeft(toUInt64(1), 35) AS DateTime64); +SELECT CAST(bitShiftLeft(toUInt64(1), 35) AS DateTime64(3, 'Europe/Moscow')); 2282-12-31 03:00:00.000 -SELECT toDateTime64(bitShiftLeft(toUInt64(1), 35), 2); +SELECT toDateTime64(bitShiftLeft(toUInt64(1), 35), 2, 'Europe/Moscow'); 2282-12-31 03:00:00.00 diff --git a/tests/queries/0_stateless/01691_DateTime64_clamp.sql 
b/tests/queries/0_stateless/01691_DateTime64_clamp.sql index c77a66febb3..2786d9c1c09 100644 --- a/tests/queries/0_stateless/01691_DateTime64_clamp.sql +++ b/tests/queries/0_stateless/01691_DateTime64_clamp.sql @@ -11,7 +11,7 @@ SELECT toDateTime64(toFloat32(bitShiftLeft(toUInt64(1),33)), 2, 'Europe/Moscow') SELECT toDateTime64(toFloat64(bitShiftLeft(toUInt64(1),33)), 2, 'Europe/Moscow') FORMAT Null; -- These are outsize of extended range and hence clamped -SELECT toDateTime64(-1 * bitShiftLeft(toUInt64(1), 35), 2); -SELECT CAST(-1 * bitShiftLeft(toUInt64(1), 35) AS DateTime64); -SELECT CAST(bitShiftLeft(toUInt64(1), 35) AS DateTime64); -SELECT toDateTime64(bitShiftLeft(toUInt64(1), 35), 2); +SELECT toDateTime64(-1 * bitShiftLeft(toUInt64(1), 35), 2, 'Europe/Moscow'); +SELECT CAST(-1 * bitShiftLeft(toUInt64(1), 35) AS DateTime64(3, 'Europe/Moscow')); +SELECT CAST(bitShiftLeft(toUInt64(1), 35) AS DateTime64(3, 'Europe/Moscow')); +SELECT toDateTime64(bitShiftLeft(toUInt64(1), 35), 2, 'Europe/Moscow'); diff --git a/tests/queries/0_stateless/01692_DateTime64_from_DateTime.reference b/tests/queries/0_stateless/01692_DateTime64_from_DateTime.reference index a0562e40027..3473b027c22 100644 --- a/tests/queries/0_stateless/01692_DateTime64_from_DateTime.reference +++ b/tests/queries/0_stateless/01692_DateTime64_from_DateTime.reference @@ -1,9 +1,5 @@ --- { echo } -select toDateTime64(toDateTime(1), 2); 1970-01-01 03:00:01.00 -select toDateTime64(toDate(1), 2); +1970-01-01 03:00:01.00 1970-01-02 00:00:00.00 -select toDateTime64(toDateTime(1), 2, 'GMT'); 1970-01-01 00:00:01.00 -select toDateTime64(toDate(1), 2, 'GMT'); 1970-01-02 00:00:00.00 diff --git a/tests/queries/0_stateless/01692_DateTime64_from_DateTime.sql b/tests/queries/0_stateless/01692_DateTime64_from_DateTime.sql index 60f76e9192c..fac0c341007 100644 --- a/tests/queries/0_stateless/01692_DateTime64_from_DateTime.sql +++ b/tests/queries/0_stateless/01692_DateTime64_from_DateTime.sql @@ -1,5 +1,7 @@ --- { echo } -select toDateTime64(toDateTime(1), 2); -select toDateTime64(toDate(1), 2); +select toDateTime64(toDateTime(1, 'Europe/Moscow'), 2); +select toDateTime64(toDate(1), 2) FORMAT Null; -- Unknown timezone +select toDateTime64(toDateTime(1), 2) FORMAT Null; -- Unknown timezone +select toDateTime64(toDateTime(1), 2, 'Europe/Moscow'); +select toDateTime64(toDate(1), 2, 'Europe/Moscow'); select toDateTime64(toDateTime(1), 2, 'GMT'); select toDateTime64(toDate(1), 2, 'GMT'); diff --git a/tests/queries/0_stateless/01702_toDateTime_from_string_clamping.reference b/tests/queries/0_stateless/01702_toDateTime_from_string_clamping.reference index 7e8307d66a6..c5e86963f22 100644 --- a/tests/queries/0_stateless/01702_toDateTime_from_string_clamping.reference +++ b/tests/queries/0_stateless/01702_toDateTime_from_string_clamping.reference @@ -1,9 +1,4 @@ --- { echo } -SELECT toString(toDateTime('-922337203.6854775808', 1)); 1940-10-09 22:13:17.6 -SELECT toString(toDateTime('9922337203.6854775808', 1)); 2283-11-11 23:46:43.6 -SELECT toDateTime64(CAST('10000000000.1' AS Decimal64(1)), 1); 2283-11-11 23:46:40.1 -SELECT toDateTime64(CAST('-10000000000.1' AS Decimal64(1)), 1); 1925-01-01 00:00:00.1 diff --git a/tests/queries/0_stateless/01702_toDateTime_from_string_clamping.sql b/tests/queries/0_stateless/01702_toDateTime_from_string_clamping.sql index d1f0416149a..f51a1bb2280 100644 --- a/tests/queries/0_stateless/01702_toDateTime_from_string_clamping.sql +++ b/tests/queries/0_stateless/01702_toDateTime_from_string_clamping.sql @@ -1,5 +1,4 @@ --- { echo 
} -SELECT toString(toDateTime('-922337203.6854775808', 1)); -SELECT toString(toDateTime('9922337203.6854775808', 1)); -SELECT toDateTime64(CAST('10000000000.1' AS Decimal64(1)), 1); -SELECT toDateTime64(CAST('-10000000000.1' AS Decimal64(1)), 1); +SELECT toString(toDateTime('-922337203.6854775808', 1, 'Europe/Moscow')); +SELECT toString(toDateTime('9922337203.6854775808', 1, 'Europe/Moscow')); +SELECT toDateTime64(CAST('10000000000.1' AS Decimal64(1)), 1, 'Europe/Moscow'); +SELECT toDateTime64(CAST('-10000000000.1' AS Decimal64(1)), 1, 'Europe/Moscow'); diff --git a/tests/queries/0_stateless/01710_projection_in_index.reference b/tests/queries/0_stateless/01710_projection_in_index.reference new file mode 100644 index 00000000000..73c1df53be4 --- /dev/null +++ b/tests/queries/0_stateless/01710_projection_in_index.reference @@ -0,0 +1,2 @@ +1 1 1 +2 2 2 diff --git a/tests/queries/0_stateless/01710_projection_in_index.sql b/tests/queries/0_stateless/01710_projection_in_index.sql new file mode 100644 index 00000000000..2669d69dc9f --- /dev/null +++ b/tests/queries/0_stateless/01710_projection_in_index.sql @@ -0,0 +1,11 @@ +drop table if exists t; + +create table t (i int, j int, k int, projection p (select * order by j)) engine MergeTree order by i settings index_granularity = 1; + +insert into t select number, number, number from numbers(10); + +set allow_experimental_projection_optimization = 1, max_rows_to_read = 3; + +select * from t where i < 5 and j in (1, 2); + +drop table t; diff --git a/tests/queries/0_stateless/01720_join_implicit_cast.reference b/tests/queries/0_stateless/01720_join_implicit_cast.reference deleted file mode 100644 index 3cca6a264fa..00000000000 --- a/tests/queries/0_stateless/01720_join_implicit_cast.reference +++ /dev/null @@ -1,102 +0,0 @@ -=== hash === -= full = -1 1 -2 2 --1 1 -1 \N -1 257 -1 -1 -= left = -1 1 -2 2 -= right = -1 1 --1 1 -1 \N -1 257 -1 -1 -= inner = -1 1 -= full = -1 1 1 1 -2 2 0 \N -0 0 -1 1 -0 0 1 \N -0 0 1 257 -0 0 1 -1 -= left = -1 1 1 1 -2 2 0 \N -= right = -1 1 1 1 -0 0 -1 1 -0 0 1 \N -0 0 1 257 -0 0 1 -1 -= inner = -1 1 1 1 -= agg = -5 260 -3 3 -3 258 -1 1 -5 260 -3 3 -3 258 -1 1 -= types = -1 -1 -1 -1 -=== partial_merge === -= full = -1 1 -2 2 --1 1 -1 \N -1 257 -1 -1 -= left = -1 1 -2 2 -= right = -1 1 --1 1 -1 \N -1 257 -1 -1 -= inner = -1 1 -= full = -1 1 1 1 -2 2 0 \N -0 0 -1 1 -0 0 1 \N -0 0 1 257 -0 0 1 -1 -= left = -1 1 1 1 -2 2 0 \N -= right = -1 1 1 1 -0 0 -1 1 -0 0 1 \N -0 0 1 257 -0 0 1 -1 -= inner = -1 1 1 1 -= agg = -5 260 -3 3 -3 258 -1 1 -5 260 -3 3 -3 258 -1 1 -= types = -1 -1 -1 -1 diff --git a/tests/queries/0_stateless/01720_join_implicit_cast.reference.j2 b/tests/queries/0_stateless/01720_join_implicit_cast.reference.j2 new file mode 100644 index 00000000000..807088d2d5d --- /dev/null +++ b/tests/queries/0_stateless/01720_join_implicit_cast.reference.j2 @@ -0,0 +1,53 @@ +{% for join_type in ['hash', 'partial_merge'] -%} +=== {{ join_type }} === += full = +1 1 +2 2 +-1 1 +1 \N +1 257 +1 -1 += left = +1 1 +2 2 += right = +1 1 +-1 1 +1 \N +1 257 +1 -1 += inner = +1 1 += full = +1 1 1 1 +2 2 0 \N +0 0 -1 1 +0 0 1 \N +0 0 1 257 +0 0 1 -1 += left = +1 1 1 1 +2 2 0 \N += right = +1 1 1 1 +0 0 -1 1 +0 0 1 \N +0 0 1 257 +0 0 1 -1 += inner = +1 1 1 1 += agg = +5 260 +3 3 +3 258 +1 1 +5 260 +3 3 +3 258 +1 1 += types = +1 +1 +1 +1 +{% endfor -%} diff --git a/tests/queries/0_stateless/01720_join_implicit_cast.sql b/tests/queries/0_stateless/01720_join_implicit_cast.sql.j2 similarity index 52% rename from 
tests/queries/0_stateless/01720_join_implicit_cast.sql rename to tests/queries/0_stateless/01720_join_implicit_cast.sql.j2 index cf4a3bdcef6..f7760c38163 100644 --- a/tests/queries/0_stateless/01720_join_implicit_cast.sql +++ b/tests/queries/0_stateless/01720_join_implicit_cast.sql.j2 @@ -6,9 +6,11 @@ CREATE TABLE t_ab2 (id Nullable(Int32), a Int16, b Nullable(Int64)) ENGINE = Tin INSERT INTO t_ab1 VALUES (0, 1, 1), (1, 2, 2); INSERT INTO t_ab2 VALUES (2, -1, 1), (3, 1, NULL), (4, 1, 257), (5, 1, -1), (6, 1, 1); -SELECT '=== hash ==='; +{% for join_type in ['hash', 'partial_merge'] -%} -SET join_algorithm = 'hash'; +SELECT '=== {{ join_type }} ==='; + +SET join_algorithm = '{{ join_type }}'; SELECT '= full ='; SELECT a, b FROM t_ab1 FULL JOIN t_ab2 USING (a, b) ORDER BY ifNull(t_ab1.id, t_ab2.id); @@ -49,48 +51,7 @@ SELECT any(toTypeName(a)) == 'Int32' AND any(toTypeName(b)) == 'Nullable(Int64)' SELECT * FROM ( SELECT a, b as "CAST(a, Int32)" FROM t_ab1 ) t_ab1 FULL JOIN t_ab2 ON (t_ab1.a == t_ab2.a); -- { serverError 44 } SELECT * FROM ( SELECT a, b as "CAST(a, Int32)" FROM t_ab1 ) t_ab1 FULL JOIN t_ab2 USING (a) FORMAT Null; -SELECT '=== partial_merge ==='; - -SET join_algorithm = 'partial_merge'; - -SELECT '= full ='; -SELECT a, b FROM t_ab1 FULL JOIN t_ab2 USING (a, b) ORDER BY ifNull(t_ab1.id, t_ab2.id); -SELECT '= left ='; -SELECT a, b FROM t_ab1 LEFT JOIN t_ab2 USING (a, b) ORDER BY ifNull(t_ab1.id, t_ab2.id); -SELECT '= right ='; -SELECT a, b FROM t_ab1 RIGHT JOIN t_ab2 USING (a, b) ORDER BY ifNull(t_ab1.id, t_ab2.id); -SELECT '= inner ='; -SELECT a, b FROM t_ab1 INNER JOIN t_ab2 USING (a, b) ORDER BY ifNull(t_ab1.id, t_ab2.id); - -SELECT '= full ='; -SELECT a, b, t_ab2.a, t_ab2.b FROM t_ab1 FULL JOIN t_ab2 ON (t_ab1.a == t_ab2.a AND t_ab1.b == t_ab2.b) ORDER BY ifNull(t_ab1.id, t_ab2.id); -SELECT '= left ='; -SELECT a, b, t_ab2.a, t_ab2.b FROM t_ab1 LEFT JOIN t_ab2 ON (t_ab1.a == t_ab2.a AND t_ab1.b == t_ab2.b) ORDER BY ifNull(t_ab1.id, t_ab2.id); -SELECT '= right ='; -SELECT a, b, t_ab2.a, t_ab2.b FROM t_ab1 RIGHT JOIN t_ab2 ON (t_ab1.a == t_ab2.a AND t_ab1.b == t_ab2.b) ORDER BY ifNull(t_ab1.id, t_ab2.id); -SELECT '= inner ='; -SELECT a, b, t_ab2.a, t_ab2.b FROM t_ab1 INNER JOIN t_ab2 ON (t_ab1.a == t_ab2.a AND t_ab1.b == t_ab2.b) ORDER BY ifNull(t_ab1.id, t_ab2.id); - -SELECT '= agg ='; -SELECT sum(a), sum(b) FROM t_ab1 FULL JOIN t_ab2 USING (a, b); -SELECT sum(a), sum(b) FROM t_ab1 LEFT JOIN t_ab2 USING (a, b); -SELECT sum(a), sum(b) FROM t_ab1 RIGHT JOIN t_ab2 USING (a, b); -SELECT sum(a), sum(b) FROM t_ab1 INNER JOIN t_ab2 USING (a, b); - -SELECT sum(a) + sum(t_ab2.a) - 1, sum(b) + sum(t_ab2.b) - 1 FROM t_ab1 FULL JOIN t_ab2 ON (t_ab1.a == t_ab2.a AND t_ab1.b == t_ab2.b); -SELECT sum(a) + sum(t_ab2.a) - 1, sum(b) + sum(t_ab2.b) - 1 FROM t_ab1 LEFT JOIN t_ab2 ON (t_ab1.a == t_ab2.a AND t_ab1.b == t_ab2.b); -SELECT sum(a) + sum(t_ab2.a) - 1, sum(b) + sum(t_ab2.b) - 1 FROM t_ab1 RIGHT JOIN t_ab2 ON (t_ab1.a == t_ab2.a AND t_ab1.b == t_ab2.b); -SELECT sum(a) + sum(t_ab2.a) - 1, sum(b) + sum(t_ab2.b) - 1 FROM t_ab1 INNER JOIN t_ab2 ON (t_ab1.a == t_ab2.a AND t_ab1.b == t_ab2.b); - -SELECT '= types ='; - -SELECT any(toTypeName(a)) == 'Int32' AND any(toTypeName(b)) == 'Nullable(Int64)' FROM t_ab1 FULL JOIN t_ab2 USING (a, b); -SELECT any(toTypeName(a)) == 'Int32' AND any(toTypeName(b)) == 'Nullable(Int64)' FROM t_ab1 LEFT JOIN t_ab2 USING (a, b); -SELECT any(toTypeName(a)) == 'Int32' AND any(toTypeName(b)) == 'Nullable(Int64)' FROM t_ab1 RIGHT JOIN t_ab2 USING (a, b); -SELECT 
any(toTypeName(a)) == 'Int32' AND any(toTypeName(b)) == 'Nullable(Int64)' FROM t_ab1 INNER JOIN t_ab2 USING (a, b); - -SELECT * FROM ( SELECT a, b as "CAST(a, Int32)" FROM t_ab1 ) t_ab1 FULL JOIN t_ab2 ON (t_ab1.a == t_ab2.a); -- { serverError 44 } -SELECT * FROM ( SELECT a, b as "CAST(a, Int32)" FROM t_ab1 ) t_ab1 FULL JOIN t_ab2 USING (a) FORMAT Null; +{% endfor %} DROP TABLE IF EXISTS t_ab1; DROP TABLE IF EXISTS t_ab2; diff --git a/tests/queries/0_stateless/01721_join_implicit_cast_long.reference b/tests/queries/0_stateless/01721_join_implicit_cast_long.reference index d78307175f9..51a20d9f524 100644 --- a/tests/queries/0_stateless/01721_join_implicit_cast_long.reference +++ b/tests/queries/0_stateless/01721_join_implicit_cast_long.reference @@ -400,7 +400,7 @@ 1 1 1 -=== switch === +=== auto === = full = -4 0 196 -3 0 197 diff --git a/tests/queries/0_stateless/01721_join_implicit_cast_long.sql b/tests/queries/0_stateless/01721_join_implicit_cast_long.sql.j2 similarity index 51% rename from tests/queries/0_stateless/01721_join_implicit_cast_long.sql rename to tests/queries/0_stateless/01721_join_implicit_cast_long.sql.j2 index a6b411fadde..4479f507046 100644 --- a/tests/queries/0_stateless/01721_join_implicit_cast_long.sql +++ b/tests/queries/0_stateless/01721_join_implicit_cast_long.sql.j2 @@ -7,159 +7,14 @@ CREATE TABLE t2 (a Int16, b Nullable(Int64)) ENGINE = TinyLog; INSERT INTO t1 SELECT number as a, 100 + number as b FROM system.numbers LIMIT 1, 10; INSERT INTO t2 SELECT number - 5 as a, 200 + number - 5 as b FROM system.numbers LIMIT 1, 10; -SELECT '=== hash ==='; -SET join_algorithm = 'hash'; +{% for join_type in ['hash', 'partial_merge', 'auto'] -%} -SELECT '= full ='; -SELECT a, b, t2.b FROM t1 FULL JOIN t2 USING (a) ORDER BY (a); -SELECT '= left ='; -SELECT a, b, t2.b FROM t1 LEFT JOIN t2 USING (a) ORDER BY (a); -SELECT '= right ='; -SELECT a, b, t2.b FROM t1 RIGHT JOIN t2 USING (a) ORDER BY (a); -SELECT '= inner ='; -SELECT a, b, t2.b FROM t1 INNER JOIN t2 USING (a) ORDER BY (a); +SELECT '=== {{ join_type }} ==='; +SET join_algorithm = '{{ join_type }}'; -SELECT '= full ='; -SELECT a, t1.a, t2.a FROM t1 FULL JOIN t2 USING (a) ORDER BY (t1.a, t2.a); -SELECT '= left ='; -SELECT a, t1.a, t2.a FROM t1 LEFT JOIN t2 USING (a) ORDER BY (t1.a, t2.a); -SELECT '= right ='; -SELECT a, t1.a, t2.a FROM t1 RIGHT JOIN t2 USING (a) ORDER BY (t1.a, t2.a); -SELECT '= inner ='; -SELECT a, t1.a, t2.a FROM t1 INNER JOIN t2 USING (a) ORDER BY (t1.a, t2.a); - -SELECT '= join on ='; -SELECT '= full ='; -SELECT a, b, t2.a, t2.b FROM t1 FULL JOIN t2 ON (t1.a == t2.a) ORDER BY (t1.a, t2.a); -SELECT '= left ='; -SELECT a, b, t2.a, t2.b FROM t1 LEFT JOIN t2 ON (t1.a == t2.a) ORDER BY (t1.a, t2.a); -SELECT '= right ='; -SELECT a, b, t2.a, t2.b FROM t1 RIGHT JOIN t2 ON (t1.a == t2.a) ORDER BY (t1.a, t2.a); -SELECT '= inner ='; -SELECT a, b, t2.a, t2.b FROM t1 INNER JOIN t2 ON (t1.a == t2.a) ORDER BY (t1.a, t2.a); - -SELECT '= full ='; -SELECT * FROM t1 FULL JOIN t2 ON (t1.a + t1.b = t2.a + t2.b - 100) ORDER BY (t1.a, t2.a); -SELECT '= left ='; -SELECT * FROM t1 LEFT JOIN t2 ON (t1.a + t1.b = t2.a + t2.b - 100) ORDER BY (t1.a, t2.a); -SELECT '= right ='; -SELECT * FROM t1 RIGHT JOIN t2 ON (t1.a + t1.b = t2.a + t2.b - 100) ORDER BY (t1.a, t2.a); -SELECT '= inner ='; -SELECT * FROM t1 INNER JOIN t2 ON (t1.a + t1.b = t2.a + t2.b - 100) ORDER BY (t1.a, t2.a); - --- Int64 and UInt64 has no supertype -SELECT * FROM t1 FULL JOIN t2 ON (t1.a + t1.b + 100 = t2.a + t2.b) ORDER BY (t1.a, t2.a); -- { serverError 53 } 
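The rename from `.sql` to `.sql.j2` (with a matching `.reference.j2`) lets the test harness expand a Jinja2 loop, so the block of join queries is written once and executed for every `join_algorithm` instead of being copy-pasted per algorithm. A minimal sketch of the pattern, using a hypothetical table pair t_left/t_right in place of the real t1/t2:

    {% for join_type in ['hash', 'partial_merge'] -%}
    SELECT '=== {{ join_type }} ===';
    SET join_algorithm = '{{ join_type }}';
    -- the queries under test go here, written once and repeated per algorithm
    SELECT a, b FROM t_left FULL JOIN t_right USING (a) ORDER BY a;
    {% endfor -%}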
-SELECT * FROM t1 LEFT JOIN t2 ON (t1.a + t1.b + 100 = t2.a + t2.b) ORDER BY (t1.a, t2.a); -- { serverError 53 } -SELECT * FROM t1 RIGHT JOIN t2 ON (t1.a + t1.b + 100 = t2.a + t2.b) ORDER BY (t1.a, t2.a); -- { serverError 53 } -SELECT * FROM t1 INNER JOIN t2 ON (t1.a + t1.b + 100 = t2.a + t2.b) ORDER BY (t1.a, t2.a); -- { serverError 53 } - -SELECT '= agg ='; -SELECT sum(a) == 7 FROM t1 FULL JOIN t2 USING (a) WHERE b > 102 AND t2.b <= 204; -SELECT sum(a) == 7 FROM t1 INNER JOIN t2 USING (a) WHERE b > 102 AND t2.b <= 204; - -SELECT sum(b) = 103 FROM t1 LEFT JOIN t2 USING (a) WHERE b > 102 AND t2.b < 204; -SELECT sum(t2.b) = 203 FROM t1 RIGHT JOIN t2 USING (a) WHERE b > 102 AND t2.b < 204; - -SELECT sum(a) == 2 + 3 + 4 FROM t1 FULL JOIN t2 ON (t1.a + t1.b = t2.a + t2.b - 100) WHERE t1.b < 105 AND t2.b > 201; -SELECT sum(a) == 55 FROM t1 FULL JOIN t2 ON (t1.a + t1.b = t2.a + t2.b - 100) WHERE 1; - -SELECT a > 0, sum(a), sum(b) FROM t1 FULL JOIN t2 USING (a) GROUP BY (a > 0) ORDER BY a > 0; -SELECT a > 0, sum(a), sum(t2.a), sum(b), sum(t2.b) FROM t1 FULL JOIN t2 ON (t1.a == t2.a) GROUP BY (a > 0) ORDER BY a > 0; - -SELECT '= types ='; -SELECT any(toTypeName(a)) == 'Int32' AND any(toTypeName(t2.a)) == 'Int32' FROM t1 FULL JOIN t2 USING (a); -SELECT any(toTypeName(a)) == 'Int32' AND any(toTypeName(t2.a)) == 'Int32' FROM t1 LEFT JOIN t2 USING (a); -SELECT any(toTypeName(a)) == 'Int32' AND any(toTypeName(t2.a)) == 'Int32' FROM t1 RIGHT JOIN t2 USING (a); -SELECT any(toTypeName(a)) == 'Int32' AND any(toTypeName(t2.a)) == 'Int32' FROM t1 INNER JOIN t2 USING (a); - -SELECT toTypeName(any(a)) == 'Int32' AND toTypeName(any(t2.a)) == 'Int32' FROM t1 FULL JOIN t2 USING (a); -SELECT min(toTypeName(a) == 'Int32' AND toTypeName(t2.a) == 'Int32') FROM t1 FULL JOIN t2 USING (a); - -SELECT any(toTypeName(a)) == 'UInt16' AND any(toTypeName(t2.a)) == 'Int16' FROM t1 FULL JOIN t2 ON (t1.a == t2.a); -SELECT any(toTypeName(a)) == 'UInt16' AND any(toTypeName(t2.a)) == 'Int16' FROM t1 LEFT JOIN t2 ON (t1.a == t2.a); -SELECT any(toTypeName(a)) == 'UInt16' AND any(toTypeName(t2.a)) == 'Int16' FROM t1 RIGHT JOIN t2 ON (t1.a == t2.a); -SELECT any(toTypeName(a)) == 'UInt16' AND any(toTypeName(t2.a)) == 'Int16' FROM t1 INNER JOIN t2 ON (t1.a == t2.a); -SELECT toTypeName(any(a)) == 'UInt16' AND toTypeName(any(t2.a)) == 'Int16' FROM t1 FULL JOIN t2 ON (t1.a == t2.a); - -SELECT '=== partial_merge ==='; - -SET join_algorithm = 'partial_merge'; - -SELECT '= full ='; -SELECT a, b, t2.b FROM t1 FULL JOIN t2 USING (a) ORDER BY (a); -SELECT '= left ='; -SELECT a, b, t2.b FROM t1 LEFT JOIN t2 USING (a) ORDER BY (a); -SELECT '= right ='; -SELECT a, b, t2.b FROM t1 RIGHT JOIN t2 USING (a) ORDER BY (a); -SELECT '= inner ='; -SELECT a, b, t2.b FROM t1 INNER JOIN t2 USING (a) ORDER BY (a); - -SELECT '= full ='; -SELECT a, t1.a, t2.a FROM t1 FULL JOIN t2 USING (a) ORDER BY (t1.a, t2.a); -SELECT '= left ='; -SELECT a, t1.a, t2.a FROM t1 LEFT JOIN t2 USING (a) ORDER BY (t1.a, t2.a); -SELECT '= right ='; -SELECT a, t1.a, t2.a FROM t1 RIGHT JOIN t2 USING (a) ORDER BY (t1.a, t2.a); -SELECT '= inner ='; -SELECT a, t1.a, t2.a FROM t1 INNER JOIN t2 USING (a) ORDER BY (t1.a, t2.a); - -SELECT '= join on ='; -SELECT '= full ='; -SELECT a, b, t2.a, t2.b FROM t1 FULL JOIN t2 ON (t1.a == t2.a) ORDER BY (t1.a, t2.a); -SELECT '= left ='; -SELECT a, b, t2.a, t2.b FROM t1 LEFT JOIN t2 ON (t1.a == t2.a) ORDER BY (t1.a, t2.a); -SELECT '= right ='; -SELECT a, b, t2.a, t2.b FROM t1 RIGHT JOIN t2 ON (t1.a == t2.a) ORDER BY (t1.a, t2.a); -SELECT '= inner ='; 
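These queries use the stateless-test error annotation: a trailing `-- { serverError N }` comment tells the runner that the statement is expected to fail with that server error code, and the test fails if the query succeeds or returns a different code. Per the file's own comment, Int64 and UInt64 have no common supertype, so the ON expression below cannot be unified and must raise error code 53:

    SELECT * FROM t1 FULL JOIN t2 ON (t1.a + t1.b + 100 = t2.a + t2.b); -- { serverError 53 }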
-SELECT a, b, t2.a, t2.b FROM t1 INNER JOIN t2 ON (t1.a == t2.a) ORDER BY (t1.a, t2.a); - -SELECT '= full ='; -SELECT * FROM t1 FULL JOIN t2 ON (t1.a + t1.b = t2.a + t2.b - 100) ORDER BY (t1.a, t2.a); -SELECT '= left ='; -SELECT * FROM t1 LEFT JOIN t2 ON (t1.a + t1.b = t2.a + t2.b - 100) ORDER BY (t1.a, t2.a); -SELECT '= right ='; -SELECT * FROM t1 RIGHT JOIN t2 ON (t1.a + t1.b = t2.a + t2.b - 100) ORDER BY (t1.a, t2.a); -SELECT '= inner ='; -SELECT * FROM t1 INNER JOIN t2 ON (t1.a + t1.b = t2.a + t2.b - 100) ORDER BY (t1.a, t2.a); - --- Int64 and UInt64 has no supertype -SELECT * FROM t1 FULL JOIN t2 ON (t1.a + t1.b + 100 = t2.a + t2.b) ORDER BY (t1.a, t2.a); -- { serverError 53 } -SELECT * FROM t1 LEFT JOIN t2 ON (t1.a + t1.b + 100 = t2.a + t2.b) ORDER BY (t1.a, t2.a); -- { serverError 53 } -SELECT * FROM t1 RIGHT JOIN t2 ON (t1.a + t1.b + 100 = t2.a + t2.b) ORDER BY (t1.a, t2.a); -- { serverError 53 } -SELECT * FROM t1 INNER JOIN t2 ON (t1.a + t1.b + 100 = t2.a + t2.b) ORDER BY (t1.a, t2.a); -- { serverError 53 } - -SELECT '= agg ='; -SELECT sum(a) == 7 FROM t1 FULL JOIN t2 USING (a) WHERE b > 102 AND t2.b <= 204; -SELECT sum(a) == 7 FROM t1 INNER JOIN t2 USING (a) WHERE b > 102 AND t2.b <= 204; - -SELECT sum(b) = 103 FROM t1 LEFT JOIN t2 USING (a) WHERE b > 102 AND t2.b < 204; -SELECT sum(t2.b) = 203 FROM t1 RIGHT JOIN t2 USING (a) WHERE b > 102 AND t2.b < 204; - -SELECT sum(a) == 2 + 3 + 4 FROM t1 FULL JOIN t2 ON (t1.a + t1.b = t2.a + t2.b - 100) WHERE t1.b < 105 AND t2.b > 201; -SELECT sum(a) == 55 FROM t1 FULL JOIN t2 ON (t1.a + t1.b = t2.a + t2.b - 100) WHERE 1; - -SELECT a > 0, sum(a), sum(b) FROM t1 FULL JOIN t2 USING (a) GROUP BY (a > 0) ORDER BY a > 0; -SELECT a > 0, sum(a), sum(t2.a), sum(b), sum(t2.b) FROM t1 FULL JOIN t2 ON (t1.a == t2.a) GROUP BY (a > 0) ORDER BY a > 0; - -SELECT '= types ='; -SELECT any(toTypeName(a)) == 'Int32' AND any(toTypeName(t2.a)) == 'Int32' FROM t1 FULL JOIN t2 USING (a); -SELECT any(toTypeName(a)) == 'Int32' AND any(toTypeName(t2.a)) == 'Int32' FROM t1 LEFT JOIN t2 USING (a); -SELECT any(toTypeName(a)) == 'Int32' AND any(toTypeName(t2.a)) == 'Int32' FROM t1 RIGHT JOIN t2 USING (a); -SELECT any(toTypeName(a)) == 'Int32' AND any(toTypeName(t2.a)) == 'Int32' FROM t1 INNER JOIN t2 USING (a); - -SELECT toTypeName(any(a)) == 'Int32' AND toTypeName(any(t2.a)) == 'Int32' FROM t1 FULL JOIN t2 USING (a); -SELECT min(toTypeName(a) == 'Int32' AND toTypeName(t2.a) == 'Int32') FROM t1 FULL JOIN t2 USING (a); - -SELECT any(toTypeName(a)) == 'UInt16' AND any(toTypeName(t2.a)) == 'Int16' FROM t1 FULL JOIN t2 ON (t1.a == t2.a); -SELECT any(toTypeName(a)) == 'UInt16' AND any(toTypeName(t2.a)) == 'Int16' FROM t1 LEFT JOIN t2 ON (t1.a == t2.a); -SELECT any(toTypeName(a)) == 'UInt16' AND any(toTypeName(t2.a)) == 'Int16' FROM t1 RIGHT JOIN t2 ON (t1.a == t2.a); -SELECT any(toTypeName(a)) == 'UInt16' AND any(toTypeName(t2.a)) == 'Int16' FROM t1 INNER JOIN t2 ON (t1.a == t2.a); -SELECT toTypeName(any(a)) == 'UInt16' AND toTypeName(any(t2.a)) == 'Int16' FROM t1 FULL JOIN t2 ON (t1.a == t2.a); - -SELECT '=== switch ==='; - -SET join_algorithm = 'auto'; +{% if join_type == 'auto' -%} SET max_bytes_in_join = 100; +{% endif -%} SELECT '= full ='; SELECT a, b, t2.b FROM t1 FULL JOIN t2 USING (a) ORDER BY (a); @@ -232,7 +87,11 @@ SELECT any(toTypeName(a)) == 'UInt16' AND any(toTypeName(t2.a)) == 'Int16' FROM SELECT any(toTypeName(a)) == 'UInt16' AND any(toTypeName(t2.a)) == 'Int16' FROM t1 INNER JOIN t2 ON (t1.a == t2.a); SELECT toTypeName(any(a)) == 'UInt16' AND 
toTypeName(any(t2.a)) == 'Int16' FROM t1 FULL JOIN t2 ON (t1.a == t2.a); +{% if join_type == 'auto' -%} SET max_bytes_in_join = 0; +{% endif -%} + +{% endfor -%} SELECT '=== join use nulls ==='; diff --git a/tests/queries/0_stateless/01732_more_consistent_datetime64_parsing.sql b/tests/queries/0_stateless/01732_more_consistent_datetime64_parsing.sql index dcd874f8c45..88859177a92 100644 --- a/tests/queries/0_stateless/01732_more_consistent_datetime64_parsing.sql +++ b/tests/queries/0_stateless/01732_more_consistent_datetime64_parsing.sql @@ -5,7 +5,7 @@ INSERT INTO t VALUES (3, '1111111111222'); INSERT INTO t VALUES (4, '1111111111.222'); SELECT * FROM t ORDER BY i; -SELECT toDateTime64(1111111111.222, 3); -SELECT toDateTime64('1111111111.222', 3); -SELECT toDateTime64('1111111111222', 3); -SELECT ignore(toDateTime64(1111111111222, 3)); -- This gives somewhat correct but unexpected result +SELECT toDateTime64(1111111111.222, 3, 'Europe/Moscow'); +SELECT toDateTime64('1111111111.222', 3, 'Europe/Moscow'); +SELECT toDateTime64('1111111111222', 3, 'Europe/Moscow'); +SELECT ignore(toDateTime64(1111111111222, 3, 'Europe/Moscow')); -- This gives somewhat correct but unexpected result diff --git a/tests/queries/0_stateless/01734_datetime64_from_float.reference b/tests/queries/0_stateless/01734_datetime64_from_float.reference index 32e7d2736c6..eb96016311d 100644 --- a/tests/queries/0_stateless/01734_datetime64_from_float.reference +++ b/tests/queries/0_stateless/01734_datetime64_from_float.reference @@ -1,7 +1,3 @@ --- { echo } -SELECT CAST(1111111111.222 AS DateTime64(3)); 2005-03-18 04:58:31.222 -SELECT toDateTime(1111111111.222, 3); 2005-03-18 04:58:31.222 -SELECT toDateTime64(1111111111.222, 3); 2005-03-18 04:58:31.222 diff --git a/tests/queries/0_stateless/01734_datetime64_from_float.sql b/tests/queries/0_stateless/01734_datetime64_from_float.sql index b6be65cb7c2..416638a4a73 100644 --- a/tests/queries/0_stateless/01734_datetime64_from_float.sql +++ b/tests/queries/0_stateless/01734_datetime64_from_float.sql @@ -1,4 +1,3 @@ --- { echo } -SELECT CAST(1111111111.222 AS DateTime64(3)); -SELECT toDateTime(1111111111.222, 3); -SELECT toDateTime64(1111111111.222, 3); +SELECT CAST(1111111111.222 AS DateTime64(3, 'Europe/Moscow')); +SELECT toDateTime(1111111111.222, 3, 'Europe/Moscow'); +SELECT toDateTime64(1111111111.222, 3, 'Europe/Moscow'); diff --git a/tests/queries/0_stateless/01756_optimize_skip_unused_shards_rewrite_in.reference b/tests/queries/0_stateless/01756_optimize_skip_unused_shards_rewrite_in.reference index 65b7bf54f7f..972f4c89bdf 100644 --- a/tests/queries/0_stateless/01756_optimize_skip_unused_shards_rewrite_in.reference +++ b/tests/queries/0_stateless/01756_optimize_skip_unused_shards_rewrite_in.reference @@ -1,17 +1,17 @@ (0, 2) 0 0 0 0 -WITH CAST(\'default\', \'String\') AS id_no SELECT one.dummy, ignore(id_no) FROM system.one WHERE dummy IN (0, 2) -WITH CAST(\'default\', \'String\') AS id_no SELECT one.dummy, ignore(id_no) FROM system.one WHERE dummy IN (0, 2) +WITH CAST(\'default\', \'Nullable(String)\') AS id_no SELECT one.dummy, ignore(id_no) FROM system.one WHERE dummy IN (0, 2) +WITH CAST(\'default\', \'Nullable(String)\') AS id_no SELECT one.dummy, ignore(id_no) FROM system.one WHERE dummy IN (0, 2) optimize_skip_unused_shards_rewrite_in(0, 2) 0 0 -WITH CAST(\'default\', \'String\') AS id_02 SELECT one.dummy, ignore(id_02) FROM system.one WHERE dummy IN tuple(0) -WITH CAST(\'default\', \'String\') AS id_02 SELECT one.dummy, ignore(id_02) FROM system.one WHERE dummy IN 
tuple(2) +WITH CAST(\'default\', \'Nullable(String)\') AS id_02 SELECT one.dummy, ignore(id_02) FROM system.one WHERE dummy IN tuple(0) +WITH CAST(\'default\', \'Nullable(String)\') AS id_02 SELECT one.dummy, ignore(id_02) FROM system.one WHERE dummy IN tuple(2) optimize_skip_unused_shards_rewrite_in(2,) -WITH CAST(\'default\', \'String\') AS id_2 SELECT one.dummy, ignore(id_2) FROM system.one WHERE dummy IN tuple(2) +WITH CAST(\'default\', \'Nullable(String)\') AS id_2 SELECT one.dummy, ignore(id_2) FROM system.one WHERE dummy IN tuple(2) optimize_skip_unused_shards_rewrite_in(0,) 0 0 -WITH CAST(\'default\', \'String\') AS id_0 SELECT one.dummy, ignore(id_0) FROM system.one WHERE dummy IN tuple(0) +WITH CAST(\'default\', \'Nullable(String)\') AS id_0 SELECT one.dummy, ignore(id_0) FROM system.one WHERE dummy IN tuple(0) 0 0 errors diff --git a/tests/queries/0_stateless/01786_explain_merge_tree.reference b/tests/queries/0_stateless/01786_explain_merge_tree.reference index 7a0a0af3e05..9b2df9773ea 100644 --- a/tests/queries/0_stateless/01786_explain_merge_tree.reference +++ b/tests/queries/0_stateless/01786_explain_merge_tree.reference @@ -3,21 +3,21 @@ MinMax Keys: y - Condition: (y in [1, +inf)) + Condition: (y in [1, +Inf)) Parts: 4/5 Granules: 11/12 Partition Keys: y bitAnd(z, 3) - Condition: and((bitAnd(z, 3) not in [1, 1]), and((y in [1, +inf)), (bitAnd(z, 3) not in [1, 1]))) + Condition: and((bitAnd(z, 3) not in [1, 1]), and((y in [1, +Inf)), (bitAnd(z, 3) not in [1, 1]))) Parts: 3/4 Granules: 10/11 PrimaryKey Keys: x y - Condition: and((x in [11, +inf)), (y in [1, +inf))) + Condition: and((x in [11, +Inf)), (y in [1, +Inf))) Parts: 2/3 Granules: 6/10 Skip @@ -36,7 +36,7 @@ { "Type": "MinMax", "Keys": ["y"], - "Condition": "(y in [1, +inf))", + "Condition": "(y in [1, +Inf))", "Initial Parts": 5, "Selected Parts": 4, "Initial Granules": 12, @@ -45,7 +45,7 @@ { "Type": "Partition", "Keys": ["y", "bitAnd(z, 3)"], - "Condition": "and((bitAnd(z, 3) not in [1, 1]), and((y in [1, +inf)), (bitAnd(z, 3) not in [1, 1])))", + "Condition": "and((bitAnd(z, 3) not in [1, 1]), and((y in [1, +Inf)), (bitAnd(z, 3) not in [1, 1])))", "Initial Parts": 4, "Selected Parts": 3, "Initial Granules": 11, @@ -54,7 +54,7 @@ { "Type": "PrimaryKey", "Keys": ["x", "y"], - "Condition": "and((x in [11, +inf)), (y in [1, +inf)))", + "Condition": "and((x in [11, +Inf)), (y in [1, +Inf)))", "Initial Parts": 3, "Selected Parts": 2, "Initial Granules": 10, @@ -104,6 +104,6 @@ Keys: x plus(x, y) - Condition: or((x in 2-element set), (plus(plus(x, y), 1) in (-inf, 2])) + Condition: or((x in 2-element set), (plus(plus(x, y), 1) in (-Inf, 2])) Parts: 1/1 Granules: 1/1 diff --git a/tests/queries/0_stateless/01802_rank_corr_mann_whitney_over_window.sql b/tests/queries/0_stateless/01802_rank_corr_mann_whitney_over_window.sql index 24ee9282ac0..4b8bf0844a3 100644 --- a/tests/queries/0_stateless/01802_rank_corr_mann_whitney_over_window.sql +++ b/tests/queries/0_stateless/01802_rank_corr_mann_whitney_over_window.sql @@ -1,7 +1,5 @@ DROP TABLE IF EXISTS 01802_empsalary; -SET allow_experimental_window_functions=1; - CREATE TABLE 01802_empsalary ( `depname` LowCardinality(String), diff --git a/tests/queries/0_stateless/01852_cast_operator.reference b/tests/queries/0_stateless/01852_cast_operator.reference index 8b4069ab2b8..dc522ae8076 100644 --- a/tests/queries/0_stateless/01852_cast_operator.reference +++ b/tests/queries/0_stateless/01852_cast_operator.reference @@ -13,7 +13,7 @@ SELECT CAST([1, 1 + 1, 1 + 2], \'Array(UInt32)\') 
AS c 2010-10-10 SELECT CAST(\'2010-10-10\', \'Date\') AS c 2010-10-10 00:00:00 -SELECT CAST(\'2010-10-10\', \'DateTime\') AS c +SELECT CAST(\'2010-10-10\', \'DateTime(\\\'UTC\\\')\') AS c ['2010-10-10','2010-10-10'] SELECT CAST(\'[\\\'2010-10-10\\\', \\\'2010-10-10\\\']\', \'Array(Date)\') 3 diff --git a/tests/queries/0_stateless/01852_cast_operator.sql b/tests/queries/0_stateless/01852_cast_operator.sql index 98ac7ee73e7..adb9f86539d 100644 --- a/tests/queries/0_stateless/01852_cast_operator.sql +++ b/tests/queries/0_stateless/01852_cast_operator.sql @@ -19,8 +19,8 @@ EXPLAIN SYNTAX SELECT [1, 1 + 1, 1 + 2]::Array(UInt32) AS c; SELECT '2010-10-10'::Date AS c; EXPLAIN SYNTAX SELECT '2010-10-10'::Date AS c; -SELECT '2010-10-10'::DateTime AS c; -EXPLAIN SYNTAX SELECT '2010-10-10'::DateTime AS c; +SELECT '2010-10-10'::DateTime('UTC') AS c; +EXPLAIN SYNTAX SELECT '2010-10-10'::DateTime('UTC') AS c; SELECT ['2010-10-10', '2010-10-10']::Array(Date) AS c; EXPLAIN SYNTAX SELECT ['2010-10-10', '2010-10-10']::Array(Date); diff --git a/tests/queries/0_stateless/01854_HTTP_dict_decompression.python b/tests/queries/0_stateless/01854_HTTP_dict_decompression.python index 98581a1e47c..216e1afa71d 100644 --- a/tests/queries/0_stateless/01854_HTTP_dict_decompression.python +++ b/tests/queries/0_stateless/01854_HTTP_dict_decompression.python @@ -28,20 +28,20 @@ CLICKHOUSE_PORT_HTTP = os.environ.get('CLICKHOUSE_PORT_HTTP', '8123') # accessible from clickhouse server. ##################################################################################### -# IP-address of this host accessible from outside world. -HTTP_SERVER_HOST = subprocess.check_output(['hostname', '-i']).decode('utf-8').strip() +# IP-address of this host accessible from the outside world. Get the first one +HTTP_SERVER_HOST = subprocess.check_output(['hostname', '-i']).decode('utf-8').strip().split()[0] HTTP_SERVER_PORT = get_local_port(HTTP_SERVER_HOST) # IP address and port of the HTTP server started from this script. 
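The `hostname -i` change above matters because the command can print several space-separated addresses (for example a loopback alias followed by the LAN address), and the old code handed the whole string to the socket layer. Restated as a minimal sketch of the new behaviour:

    import subprocess

    # `hostname -i` may return "addr1 addr2 ..."; keep only the first token
    out = subprocess.check_output(['hostname', '-i']).decode('utf-8').strip()
    HTTP_SERVER_HOST = out.split()[0]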
HTTP_SERVER_ADDRESS = (HTTP_SERVER_HOST, HTTP_SERVER_PORT) HTTP_SERVER_URL_STR = 'http://' + ':'.join(str(s) for s in HTTP_SERVER_ADDRESS) + "/" -# Because we need to check content of file.csv we can create this content and avoid reading csv +# Because we need to check the content of file.csv we can create this content and avoid reading csv CSV_DATA = "Hello, 1\nWorld, 2\nThis, 152\nis, 9283\ntesting, 2313213\ndata, 555\n" -# Choose compression method -# (Will change during test, need to check standart data sending, to make sure that nothing broke) +# Choose compression method +# (Will change during test, need to check standard data sending, to make sure that nothing broke) COMPRESS_METHOD = 'none' ADDING_ENDING = '' ENDINGS = ['.gz', '.xz'] @@ -88,7 +88,7 @@ class HttpProcessor(SimpleHTTPRequestHandler): def do_GET(self): self._set_headers() - + if COMPRESS_METHOD == 'none': self.wfile.write(CSV_DATA.encode()) else: @@ -120,7 +120,7 @@ def test_select(dict_name="", schema="word String, counter UInt32", requests=[], if i > 2: ADDING_ENDING = ENDINGS[i-3] SEND_ENCODING = False - + if dict_name: get_ch_answer("drop dictionary if exists {}".format(dict_name)) get_ch_answer('''CREATE DICTIONARY {} ({}) diff --git a/tests/queries/0_stateless/01860_Distributed__shard_num_GROUP_BY.sql b/tests/queries/0_stateless/01860_Distributed__shard_num_GROUP_BY.sql index 91215fd8ee6..d8a86b7799e 100644 --- a/tests/queries/0_stateless/01860_Distributed__shard_num_GROUP_BY.sql +++ b/tests/queries/0_stateless/01860_Distributed__shard_num_GROUP_BY.sql @@ -11,6 +11,4 @@ SELECT _shard_num + dummy s, count() FROM remote('127.0.0.{1,2}', system.one) GR SELECT _shard_num FROM remote('127.0.0.{1,2}', system.one) ORDER BY _shard_num; SELECT _shard_num s FROM remote('127.0.0.{1,2}', system.one) ORDER BY _shard_num; -SELECT _shard_num s, count() FROM remote('127.0.0.{1,2}', system.one) GROUP BY s order by s; - -select materialize(_shard_num), * from remote('127.{1,2}', system.one) limit 1 by dummy format Null; +SELECT _shard_num, count() FROM remote('127.0.0.{1,2}', system.one) GROUP BY _shard_num order by _shard_num; diff --git a/tests/queries/0_stateless/01861_explain_pipeline.reference b/tests/queries/0_stateless/01861_explain_pipeline.reference index 9d62fb9f6b8..63ba55f5a04 100644 --- a/tests/queries/0_stateless/01861_explain_pipeline.reference +++ b/tests/queries/0_stateless/01861_explain_pipeline.reference @@ -5,7 +5,7 @@ ExpressionTransform ExpressionTransform ReplacingSorted 2 → 1 ExpressionTransform × 2 - MergeTree × 2 0 → 1 + MergeTreeInOrder × 2 0 → 1 0 0 1 1 2 2 @@ -22,4 +22,4 @@ ExpressionTransform × 2 Copy × 2 1 → 2 AddingSelector × 2 ExpressionTransform × 2 - MergeTree × 2 0 → 1 + MergeTreeInOrder × 2 0 → 1 diff --git a/tests/queries/0_stateless/01867_support_datetime64_version_column.sql b/tests/queries/0_stateless/01867_support_datetime64_version_column.sql index f4427be635a..1aea0fb91f2 100644 --- a/tests/queries/0_stateless/01867_support_datetime64_version_column.sql +++ b/tests/queries/0_stateless/01867_support_datetime64_version_column.sql @@ -1,5 +1,5 @@ drop table if exists replacing; -create table replacing( `A` Int64, `D` DateTime64(9), `S` String) ENGINE = ReplacingMergeTree(D) ORDER BY A; +create table replacing( `A` Int64, `D` DateTime64(9, 'Europe/Moscow'), `S` String) ENGINE = ReplacingMergeTree(D) ORDER BY A; insert into replacing values (1,'1970-01-01 08:25:46.300800000','a'); insert into replacing values (2,'1970-01-01 08:25:46.300800002','b'); diff --git 
a/tests/queries/0_stateless/01881_join_on_conditions.reference b/tests/queries/0_stateless/01881_join_on_conditions.reference new file mode 100644 index 00000000000..e1fac0e7dc3 --- /dev/null +++ b/tests/queries/0_stateless/01881_join_on_conditions.reference @@ -0,0 +1,108 @@ +-- hash_join -- +-- +222 2 +222 222 +333 333 +-- +222 222 +333 333 +-- +222 +333 +-- +1 +1 +1 +1 +1 +1 +1 +1 +1 +-- +2 +2 +3 +2 +3 +2 +3 +2 +3 +2 +3 +2 +3 +2 +3 +2 +3 +-- +222 2 +333 3 +222 2 +333 3 +-- +0 2 AAA a +0 4 CCC CCC +1 111 111 0 +2 222 2 0 +2 222 222 2 AAA AAA +3 333 333 3 BBB BBB +-- +2 222 2 2 AAA a +2 222 222 2 AAA AAA +-- partial_merge -- +-- +222 2 +222 222 +333 333 +-- +222 222 +333 333 +-- +222 +333 +-- +1 +1 +1 +1 +1 +1 +1 +1 +1 +-- +2 +2 +3 +2 +3 +2 +3 +2 +3 +2 +3 +2 +3 +2 +3 +2 +3 +-- +222 2 +333 3 +222 2 +333 3 +-- +0 2 AAA a +0 4 CCC CCC +1 111 111 0 +2 222 2 0 +2 222 222 2 AAA AAA +3 333 333 3 BBB BBB +-- +2 222 2 2 AAA a +2 222 222 2 AAA AAA diff --git a/tests/queries/0_stateless/01881_join_on_conditions.sql b/tests/queries/0_stateless/01881_join_on_conditions.sql new file mode 100644 index 00000000000..a34c413845b --- /dev/null +++ b/tests/queries/0_stateless/01881_join_on_conditions.sql @@ -0,0 +1,141 @@ +DROP TABLE IF EXISTS t1; +DROP TABLE IF EXISTS t2; +DROP TABLE IF EXISTS t2_nullable; +DROP TABLE IF EXISTS t2_lc; + +CREATE TABLE t1 (`id` Int32, key String, key2 String) ENGINE = TinyLog; +CREATE TABLE t2 (`id` Int32, key String, key2 String) ENGINE = TinyLog; +CREATE TABLE t2_nullable (`id` Int32, key String, key2 Nullable(String)) ENGINE = TinyLog; +CREATE TABLE t2_lc (`id` Int32, key String, key2 LowCardinality(String)) ENGINE = TinyLog; + +INSERT INTO t1 VALUES (1, '111', '111'),(2, '222', '2'),(2, '222', '222'),(3, '333', '333'); +INSERT INTO t2 VALUES (2, 'AAA', 'AAA'),(2, 'AAA', 'a'),(3, 'BBB', 'BBB'),(4, 'CCC', 'CCC'); +INSERT INTO t2_nullable VALUES (2, 'AAA', 'AAA'),(2, 'AAA', 'a'),(3, 'BBB', NULL),(4, 'CCC', 'CCC'); +INSERT INTO t2_lc VALUES (2, 'AAA', 'AAA'),(2, 'AAA', 'a'),(3, 'BBB', 'BBB'),(4, 'CCC', 'CCC'); + +SELECT '-- hash_join --'; + +SELECT '--'; +SELECT t1.key, t1.key2 FROM t1 INNER ALL JOIN t2 ON t1.id == t2.id AND t2.key == t2.key2; +SELECT '--'; +SELECT t1.key, t1.key2 FROM t1 INNER ALL JOIN t2 ON t1.id == t2.id AND t2.key == t2.key2 AND t1.key == t1.key2; + +SELECT '--'; +SELECT t1.key FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND t2.key == t2.key2 AND t1.key == t1.key2; + +SELECT '--'; +SELECT '333' = t1.key FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND t2.key == t2.key2 AND t1.key == t1.key2 AND t2.id > 2; +SELECT '333' = t1.key FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND t2.key == t2.key2 AND t1.key == t1.key2 AND t2.id == 3; +SELECT '333' = t1.key FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND t2.key == t2.key2 AND t1.key == t1.key2 AND t2.key2 == 'BBB'; +SELECT '333' = t1.key FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND t2.key == t2.key2 AND t1.key == t1.key2 AND t1.key2 == '333'; +SELECT '333' = t1.key FROM t1 INNER ANY JOIN t2_nullable as t2 ON t1.id == t2.id AND (t2.key == t2.key2 OR isNull(t2.key2)) AND t1.key == t1.key2 AND t1.key2 == '333'; +SELECT '333' = t1.key FROM t1 INNER ANY JOIN t2_lc as t2 ON t1.id == t2.id AND t2.key == t2.key2 AND t1.key == t1.key2 AND t1.key2 == '333'; +SELECT '333' = t1.key FROM t1 INNER ANY JOIN t2_nullable as t2 ON t1.id == t2.id AND isNull(t2.key2); +SELECT '333' = t1.key FROM t1 INNER ANY JOIN t2_nullable as t2 ON t1.id == t2.id AND t1.key2 like '33%'; +SELECT '333' = t1.key FROM t1 INNER ANY JOIN t2 ON 
t1.id == t2.id AND t2.key == t2.key2 AND t1.key == t1.key2 AND t1.id >= length(t1.key); + +-- DISTINCT is used to remove the difference between 'hash' and 'merge' join: 'merge' doesn't support `any_join_distinct_right_table_keys` + +SELECT '--'; +SELECT DISTINCT t1.id FROM t1 INNER ANY JOIN t2_nullable as t2 ON t1.id == t2.id AND t2.key2 != ''; +SELECT DISTINCT t1.id FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND toNullable(t2.key2 != ''); +SELECT DISTINCT t1.id FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND toLowCardinality(t2.key2 != ''); +SELECT DISTINCT t1.id FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND toLowCardinality(toNullable(t2.key2 != '')); +SELECT DISTINCT t1.id FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND toNullable(toLowCardinality(t2.key2 != '')); +SELECT DISTINCT t1.id FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND toNullable(t1.key2 != ''); +SELECT DISTINCT t1.id FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND toLowCardinality(t1.key2 != ''); +SELECT DISTINCT t1.id FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND toLowCardinality(toNullable(t1.key2 != '')); +SELECT DISTINCT t1.id FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND toNullable(toLowCardinality(t1.key2 != '')); + +SELECT '--'; +SELECT DISTINCT t1.key, toUInt8(t1.id) as e FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND e; +-- `e + 1` is UInt16 +SELECT DISTINCT t1.key, toUInt8(t1.id) as e FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND e + 1; -- { serverError 403 } +SELECT DISTINCT t1.key, toUInt8(t1.id) as e FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND toUInt8(e + 1); + +SELECT '--'; +SELECT t1.id, t1.key, t1.key2, t2.id, t2.key, t2.key2 FROM t1 FULL JOIN t2 ON t1.id == t2.id AND t2.key == t2.key2 AND t1.key == t1.key2 ORDER BY t1.id, t2.id; + +SELECT * FROM t1 INNER ALL JOIN t2 ON t1.id == t2.id AND t1.id; -- { serverError 403 } +SELECT * FROM t1 INNER ALL JOIN t2 ON t1.id == t2.id AND t2.id; -- { serverError 403 } +SELECT * FROM t1 INNER ALL JOIN t2 ON t1.id == t2.id AND t1.id + 2; -- { serverError 403 } +SELECT * FROM t1 INNER ALL JOIN t2 ON t1.id == t2.id AND t2.id + 2; -- { serverError 403 } +SELECT * FROM t1 INNER ALL JOIN t2 ON t1.id == t2.id AND t1.key; -- { serverError 403 } +SELECT * FROM t1 INNER ALL JOIN t2 ON t1.id == t2.id AND t2.key; -- { serverError 403 } +SELECT * FROM t1 JOIN t2 ON t2.key == t2.key2 AND (t1.id == t2.id OR isNull(t2.key2)); -- { serverError 403 } +SELECT * FROM t1 JOIN t2 ON t2.key == t2.key2 OR t1.id == t2.id; -- { serverError 403 } +SELECT * FROM t1 JOIN t2 ON (t2.key == t2.key2 AND (t1.key == t1.key2 AND t1.key != 'XXX' OR t1.id == t2.id)) AND t1.id == t2.id; -- { serverError 403 } +-- non-equi condition containing columns from different tables doesn't supported yet +SELECT * FROM t1 INNER ALL JOIN t2 ON t1.id == t2.id AND t1.id >= t2.id; -- { serverError 403 } +SELECT * FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND t2.key == t2.key2 AND t1.key == t1.key2 AND t1.id >= length(t2.key); -- { serverError 403 } + +SELECT '--'; +-- length(t1.key2) == length(t2.key2) is expression for columns from both tables, it works because it part of joining key +SELECT t1.*, t2.* FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND length(t1.key2) == length(t2.key2) AND t1.key != '333'; + +SET join_algorithm = 'partial_merge'; + +SELECT '-- partial_merge --'; + +SELECT '--'; +SELECT t1.key, t1.key2 FROM t1 INNER ALL JOIN t2 ON t1.id == t2.id AND t2.key == t2.key2; +SELECT '--'; +SELECT t1.key, t1.key2 FROM t1 INNER ALL JOIN t2 ON t1.id == t2.id AND t2.key == t2.key2 AND t1.key == t1.key2; 
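The new 01881_join_on_conditions test exercises ON clauses that combine the equi-join key with extra conditions. As the file's own comments note, the supported shape is one equality between the two tables plus filters that each touch columns of a single side only; a reduced sketch of the two cases, reusing the t1/t2 tables defined above:

    -- supported: equi key plus per-table filter conditions
    SELECT t1.key FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND t2.key == t2.key2 AND t1.key == t1.key2;
    -- not supported yet: a non-equi condition mixing columns from both sides
    SELECT * FROM t1 INNER ALL JOIN t2 ON t1.id == t2.id AND t1.id >= t2.id; -- { serverError 403 }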
+ +SELECT '--'; +SELECT t1.key FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND t2.key == t2.key2 AND t1.key == t1.key2; + +SELECT '--'; +SELECT '333' = t1.key FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND t2.key == t2.key2 AND t1.key == t1.key2 AND t2.id > 2; +SELECT '333' = t1.key FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND t2.key == t2.key2 AND t1.key == t1.key2 AND t2.id == 3; +SELECT '333' = t1.key FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND t2.key == t2.key2 AND t1.key == t1.key2 AND t2.key2 == 'BBB'; +SELECT '333' = t1.key FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND t2.key == t2.key2 AND t1.key == t1.key2 AND t1.key2 == '333'; +SELECT '333' = t1.key FROM t1 INNER ANY JOIN t2_nullable as t2 ON t1.id == t2.id AND (t2.key == t2.key2 OR isNull(t2.key2)) AND t1.key == t1.key2 AND t1.key2 == '333'; +SELECT '333' = t1.key FROM t1 INNER ANY JOIN t2_lc as t2 ON t1.id == t2.id AND t2.key == t2.key2 AND t1.key == t1.key2 AND t1.key2 == '333'; +SELECT '333' = t1.key FROM t1 INNER ANY JOIN t2_nullable as t2 ON t1.id == t2.id AND isNull(t2.key2); +SELECT '333' = t1.key FROM t1 INNER ANY JOIN t2_nullable as t2 ON t1.id == t2.id AND t1.key2 like '33%'; +SELECT '333' = t1.key FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND t2.key == t2.key2 AND t1.key == t1.key2 AND t1.id >= length(t1.key); + +-- DISTINCT is used to remove the difference between 'hash' and 'merge' join: 'merge' doesn't support `any_join_distinct_right_table_keys` + +SELECT '--'; +SELECT DISTINCT t1.id FROM t1 INNER ANY JOIN t2_nullable as t2 ON t1.id == t2.id AND t2.key2 != ''; +SELECT DISTINCT t1.id FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND toNullable(t2.key2 != ''); +SELECT DISTINCT t1.id FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND toLowCardinality(t2.key2 != ''); +SELECT DISTINCT t1.id FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND toLowCardinality(toNullable(t2.key2 != '')); +SELECT DISTINCT t1.id FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND toNullable(toLowCardinality(t2.key2 != '')); +SELECT DISTINCT t1.id FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND toNullable(t1.key2 != ''); +SELECT DISTINCT t1.id FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND toLowCardinality(t1.key2 != ''); +SELECT DISTINCT t1.id FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND toLowCardinality(toNullable(t1.key2 != '')); +SELECT DISTINCT t1.id FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND toNullable(toLowCardinality(t1.key2 != '')); + +SELECT '--'; +SELECT DISTINCT t1.key, toUInt8(t1.id) as e FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND e; +-- `e + 1` is UInt16 +SELECT DISTINCT t1.key, toUInt8(t1.id) as e FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND e + 1; -- { serverError 403 } +SELECT DISTINCT t1.key, toUInt8(t1.id) as e FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND toUInt8(e + 1); + +SELECT '--'; +SELECT t1.id, t1.key, t1.key2, t2.id, t2.key, t2.key2 FROM t1 FULL JOIN t2 ON t1.id == t2.id AND t2.key == t2.key2 AND t1.key == t1.key2 ORDER BY t1.id, t2.id; + +SELECT * FROM t1 INNER ALL JOIN t2 ON t1.id == t2.id AND t1.id; -- { serverError 403 } +SELECT * FROM t1 INNER ALL JOIN t2 ON t1.id == t2.id AND t2.id; -- { serverError 403 } +SELECT * FROM t1 INNER ALL JOIN t2 ON t1.id == t2.id AND t1.id + 2; -- { serverError 403 } +SELECT * FROM t1 INNER ALL JOIN t2 ON t1.id == t2.id AND t2.id + 2; -- { serverError 403 } +SELECT * FROM t1 INNER ALL JOIN t2 ON t1.id == t2.id AND t1.key; -- { serverError 403 } +SELECT * FROM t1 INNER ALL JOIN t2 ON t1.id == t2.id AND t2.key; -- { serverError 403 } +SELECT * FROM t1 JOIN t2 ON 
t2.key == t2.key2 AND (t1.id == t2.id OR isNull(t2.key2)); -- { serverError 403 } +SELECT * FROM t1 JOIN t2 ON t2.key == t2.key2 OR t1.id == t2.id; -- { serverError 403 } +SELECT * FROM t1 JOIN t2 ON (t2.key == t2.key2 AND (t1.key == t1.key2 AND t1.key != 'XXX' OR t1.id == t2.id)) AND t1.id == t2.id; -- { serverError 403 } +-- non-equi condition containing columns from different tables doesn't supported yet +SELECT * FROM t1 INNER ALL JOIN t2 ON t1.id == t2.id AND t1.id >= t2.id; -- { serverError 403 } +SELECT * FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND t2.key == t2.key2 AND t1.key == t1.key2 AND t1.id >= length(t2.key); -- { serverError 403 } + +SELECT '--'; +-- length(t1.key2) == length(t2.key2) is expression for columns from both tables, it works because it part of joining key +SELECT t1.*, t2.* FROM t1 INNER ANY JOIN t2 ON t1.id == t2.id AND length(t1.key2) == length(t2.key2) AND t1.key != '333'; + +DROP TABLE IF EXISTS t1; +DROP TABLE IF EXISTS t2; +DROP TABLE IF EXISTS t2_nullable; +DROP TABLE IF EXISTS t2_lc; diff --git a/tests/queries/0_stateless/01882_total_rows_approx.reference b/tests/queries/0_stateless/01882_total_rows_approx.reference index 7f2070fc9cb..fd1fb9b7231 100644 --- a/tests/queries/0_stateless/01882_total_rows_approx.reference +++ b/tests/queries/0_stateless/01882_total_rows_approx.reference @@ -1,8 +1 @@ -Waiting for query to be started... -Query started. -Checking total_rows_approx. -10 -10 -10 -10 -10 +"total_rows_to_read":"10" diff --git a/tests/queries/0_stateless/01882_total_rows_approx.sh b/tests/queries/0_stateless/01882_total_rows_approx.sh index f51e95b15c0..26333f61692 100755 --- a/tests/queries/0_stateless/01882_total_rows_approx.sh +++ b/tests/queries/0_stateless/01882_total_rows_approx.sh @@ -1,23 +1,12 @@ #!/usr/bin/env bash -# Check that total_rows_approx (via system.processes) includes all rows from +# Check that total_rows_approx (via http headers) includes all rows from # all parts at the query start. # # At some point total_rows_approx was accounted only when the query starts # reading the part, and so total_rows_approx wasn't reliable, even for simple # SELECT FROM MergeTree() # It was fixed by take total_rows_approx into account as soon as possible. -# -# To check total_rows_approx this query starts the query in background, -# that sleep's 1 second for each part, and by using max_threads=1 the query -# reads parts sequentially and sleeps 1 second between parts. -# Also the test spawns background process to check total_rows_approx for this -# query. -# It checks multiple times since at first few iterations the query may not -# start yet (since there are 3 excessive sleep calls - 1 for primary key -# analysis and 2 for partition pruning), and get only last 5 total_rows_approx -# rows (one row is not enough since when the query finishes total_rows_approx -# will be set to 10 anyway, regardless proper accounting). CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) # shellcheck source=../shell_config.sh @@ -25,31 +14,14 @@ CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) $CLICKHOUSE_CLIENT -q "drop table if exists data_01882" $CLICKHOUSE_CLIENT -q "create table data_01882 (key Int) Engine=MergeTree() partition by key order by key as select * from numbers(10)" -QUERY_ID="$CLICKHOUSE_TEST_NAME-$(tr -cd '[:lower:]' < /dev/urandom | head -c10)" - -function check_background_query() -{ - echo "Waiting for query to be started..." 
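The rewritten 01882 test checks total_rows_approx through the HTTP interface instead of polling system.processes: with send_progress_in_http_headers=1 the server periodically emits progress headers whose JSON payload includes "total_rows_to_read", so the script only needs to grep for the expected value. A condensed sketch of the new check, assuming the usual CLICKHOUSE_CURL and CLICKHOUSE_URL helpers from shell_config.sh and the data_01882 table created above:

    # the progress headers carry {"read_rows":...,"total_rows_to_read":"10",...}
    ${CLICKHOUSE_CURL} -vsS "${CLICKHOUSE_URL}&max_threads=1&default_format=Null&send_progress_in_http_headers=1&http_headers_progress_interval_ms=1" \
        --data-binary 'select * from data_01882' |& grep -o -F '"total_rows_to_read":"10"'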
- while [[ $($CLICKHOUSE_CLIENT --param_query_id="$QUERY_ID" -q 'select count() from system.processes where query_id = {query_id:String}') != 1 ]]; do - sleep 0.01 - done - echo "Query started." - - echo "Checking total_rows_approx." - # check total_rows_approx multiple times - # (to make test more reliable to what it covers) - local i=0 - for ((i = 0; i < 20; ++i)); do - $CLICKHOUSE_CLIENT --param_query_id="$QUERY_ID" -q 'select total_rows_approx from system.processes where query_id = {query_id:String}' - (( ++i )) - sleep 1 - done | tail -n5 -} -check_background_query & - -# this query will sleep 10 seconds in total, 1 seconds for each part (10 parts). -$CLICKHOUSE_CLIENT -q "select *, sleepEachRow(1) from data_01882" --max_threads=1 --format Null --query_id="$QUERY_ID" --max_block_size=1 - -wait - -$CLICKHOUSE_CLIENT -q "drop table data_01882" +# send_progress_in_http_headers will periodically send the progress +# but this is not stable, i.e. it can be dumped on query end, +# thus check few times to be sure that this is not coincidence. +for _ in {1..30}; do + $CLICKHOUSE_CURL -vsS "${CLICKHOUSE_URL}&max_threads=1&default_format=Null&send_progress_in_http_headers=1&http_headers_progress_interval_ms=1" --data-binary @- <<< "select * from data_01882" |& { + grep -o -F '"total_rows_to_read":"10"' + } | { + # grep out final result + grep -v -F '"read_rows":"10"' + } +done | uniq diff --git a/tests/queries/0_stateless/01889_clickhouse_client_config_format.reference b/tests/queries/0_stateless/01889_clickhouse_client_config_format.reference index aa7748928f1..202e32a583e 100644 --- a/tests/queries/0_stateless/01889_clickhouse_client_config_format.reference +++ b/tests/queries/0_stateless/01889_clickhouse_client_config_format.reference @@ -13,4 +13,4 @@ yml yaml 2 ini -Code: 347. Unknown format of '/config_default.ini' config +Code: 347. Unknown format of '/config_default.ini' config. 
(CANNOT_LOAD_CONFIG) diff --git a/tests/queries/0_stateless/01889_sqlite_read_write.reference b/tests/queries/0_stateless/01889_sqlite_read_write.reference index 2388a8b16c5..e979b5816c5 100644 --- a/tests/queries/0_stateless/01889_sqlite_read_write.reference +++ b/tests/queries/0_stateless/01889_sqlite_read_write.reference @@ -3,14 +3,16 @@ show database tables: table1 table2 table3 +table4 +table5 +show creare table: +CREATE TABLE SQLite.table1\n(\n `col1` Nullable(String),\n `col2` Nullable(Int16)\n)\nENGINE = SQLite +CREATE TABLE SQLite.table2\n(\n `col1` Nullable(Int32),\n `col2` Nullable(String)\n)\nENGINE = SQLite describe table: col1 Nullable(String) col2 Nullable(Int16) col1 Nullable(Int32) col2 Nullable(String) -describe table: -CREATE TABLE SQLite.table1\n(\n `col1` Nullable(String),\n `col2` Nullable(Int16)\n)\nENGINE = SQLite -CREATE TABLE SQLite.table2\n(\n `col1` Nullable(Int32),\n `col2` Nullable(String)\n)\nENGINE = SQLite select *: line1 1 line2 2 @@ -18,28 +20,23 @@ line3 3 1 text1 2 text2 3 text3 -test NULLs: -\N 1 -not a null 2 -\N 3 - 4 -detach -line1 1 -line2 2 -line3 3 -1 text1 -2 text2 -3 text3 +test types +CREATE TABLE SQLite.table4\n(\n `a` Nullable(Int32),\n `b` Nullable(Int32),\n `c` Nullable(Int8),\n `d` Nullable(Int16),\n `e` Nullable(Int32),\n `bigint` Nullable(String),\n `int2` Nullable(String),\n `int8` Nullable(String)\n)\nENGINE = SQLite +CREATE TABLE SQLite.table5\n(\n `a` Nullable(String),\n `b` Nullable(String),\n `c` Nullable(Float64),\n `d` Nullable(Float64),\n `e` Nullable(Float64),\n `f` Nullable(Float32)\n)\nENGINE = SQLite create table engine with table3 CREATE TABLE default.sqlite_table3\n(\n `col1` String,\n `col2` Int32\n)\nENGINE = SQLite 1 not a null 2 3 4 -test types -CREATE TABLE SQLite.table4\n(\n `a` Nullable(Int32),\n `b` Nullable(Int32),\n `c` Nullable(Int8),\n `d` Nullable(Int16),\n `e` Nullable(Int32),\n `bigint` Nullable(String),\n `int2` Nullable(String),\n `int8` Nullable(String)\n)\nENGINE = SQLite -CREATE TABLE SQLite.table5\n(\n `a` Nullable(String),\n `b` Nullable(String),\n `c` Nullable(Float64),\n `d` Nullable(Float64),\n `e` Nullable(Float64),\n `f` Nullable(Float32)\n)\nENGINE = SQLite +line6 6 + 7 test table function line1 1 line2 2 line3 3 +line4 4 +test path in clickhouse-local +line1 1 +line2 2 +line3 3 diff --git a/tests/queries/0_stateless/01889_sqlite_read_write.sh b/tests/queries/0_stateless/01889_sqlite_read_write.sh index f78736b841a..73b106e9eb4 100755 --- a/tests/queries/0_stateless/01889_sqlite_read_write.sh +++ b/tests/queries/0_stateless/01889_sqlite_read_write.sh @@ -4,71 +4,86 @@ CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) # shellcheck source=../shell_config.sh . 
"$CUR_DIR"/../shell_config.sh -DATA_FILE1=$CUR_DIR/data_sqlite/db1 -DATA_FILE2=$CUR_DIR/db2 +# See 01658_read_file_to_string_column.sh +user_files_path=$(clickhouse-client --query "select _path,_file from file('nonexist.txt', 'CSV', 'val1 char')" 2>&1 | grep Exception | awk '{gsub("/nonexist.txt","",$9); print $9}') + +mkdir -p ${user_files_path}/ +chmod 777 ${user_files_path} +DB_PATH=${user_files_path}/db1 + + +sqlite3 ${DB_PATH} 'DROP TABLE IF EXISTS table1' +sqlite3 ${DB_PATH} 'DROP TABLE IF EXISTS table2' +sqlite3 ${DB_PATH} 'DROP TABLE IF EXISTS table3' +sqlite3 ${DB_PATH} 'DROP TABLE IF EXISTS table4' +sqlite3 ${DB_PATH} 'DROP TABLE IF EXISTS table5' + +sqlite3 ${DB_PATH} 'CREATE TABLE table1 (col1 text, col2 smallint);' +sqlite3 ${DB_PATH} 'CREATE TABLE table2 (col1 int, col2 text);' + +chmod ugo+w ${DB_PATH} + +sqlite3 ${DB_PATH} "INSERT INTO table1 VALUES ('line1', 1), ('line2', 2), ('line3', 3)" +sqlite3 ${DB_PATH} "INSERT INTO table2 VALUES (1, 'text1'), (2, 'text2'), (3, 'text3')" + +sqlite3 ${DB_PATH} 'CREATE TABLE table3 (col1 text, col2 int);' +sqlite3 ${DB_PATH} 'INSERT INTO table3 VALUES (NULL, 1)' +sqlite3 ${DB_PATH} "INSERT INTO table3 VALUES ('not a null', 2)" +sqlite3 ${DB_PATH} 'INSERT INTO table3 VALUES (NULL, 3)' +sqlite3 ${DB_PATH} "INSERT INTO table3 VALUES ('', 4)" + +sqlite3 ${DB_PATH} 'CREATE TABLE table4 (a int, b integer, c tinyint, d smallint, e mediumint, bigint, int2, int8)' +sqlite3 ${DB_PATH} 'CREATE TABLE table5 (a character(20), b varchar(10), c real, d double, e double precision, f float)' + ${CLICKHOUSE_CLIENT} --query='DROP DATABASE IF EXISTS sqlite_database' ${CLICKHOUSE_CLIENT} --query="select 'create database engine'"; -${CLICKHOUSE_CLIENT} --query="CREATE DATABASE sqlite_database ENGINE = SQLite('${DATA_FILE1}')" +${CLICKHOUSE_CLIENT} --query="CREATE DATABASE sqlite_database ENGINE = SQLite('${DB_PATH}')" ${CLICKHOUSE_CLIENT} --query="select 'show database tables:'"; ${CLICKHOUSE_CLIENT} --query='SHOW TABLES FROM sqlite_database;' +${CLICKHOUSE_CLIENT} --query="select 'show creare table:'"; +${CLICKHOUSE_CLIENT} --query='SHOW CREATE TABLE sqlite_database.table1;' | sed -r 's/(.*SQLite)(.*)/\1/' +${CLICKHOUSE_CLIENT} --query='SHOW CREATE TABLE sqlite_database.table2;' | sed -r 's/(.*SQLite)(.*)/\1/' + ${CLICKHOUSE_CLIENT} --query="select 'describe table:'"; ${CLICKHOUSE_CLIENT} --query='DESCRIBE TABLE sqlite_database.table1;' ${CLICKHOUSE_CLIENT} --query='DESCRIBE TABLE sqlite_database.table2;' -${CLICKHOUSE_CLIENT} --query="select 'describe table:'"; -${CLICKHOUSE_CLIENT} --query='SHOW CREATE TABLE sqlite_database.table1;' | sed -r 's/(.*SQLite)(.*)/\1/' -${CLICKHOUSE_CLIENT} --query='SHOW CREATE TABLE sqlite_database.table2;' | sed -r 's/(.*SQLite)(.*)/\1/' - ${CLICKHOUSE_CLIENT} --query="select 'select *:'"; ${CLICKHOUSE_CLIENT} --query='SELECT * FROM sqlite_database.table1 ORDER BY col2' ${CLICKHOUSE_CLIENT} --query='SELECT * FROM sqlite_database.table2 ORDER BY col1;' -sqlite3 $CUR_DIR/db2 'DROP TABLE IF EXISTS table3' -sqlite3 $CUR_DIR/db2 'CREATE TABLE table3 (col1 text, col2 int)' -sqlite3 $CUR_DIR/db2 'INSERT INTO table3 VALUES (NULL, 1)' -sqlite3 $CUR_DIR/db2 "INSERT INTO table3 VALUES ('not a null', 2)" -sqlite3 $CUR_DIR/db2 'INSERT INTO table3 VALUES (NULL, 3)' -sqlite3 $CUR_DIR/db2 "INSERT INTO table3 VALUES ('', 4)" +${CLICKHOUSE_CLIENT} --query="select 'test types'"; +${CLICKHOUSE_CLIENT} --query='SHOW CREATE TABLE sqlite_database.table4;' | sed -r 's/(.*SQLite)(.*)/\1/' +${CLICKHOUSE_CLIENT} --query='SHOW CREATE TABLE 
sqlite_database.table5;' | sed -r 's/(.*SQLite)(.*)/\1/' -${CLICKHOUSE_CLIENT} --query='DROP DATABASE IF EXISTS sqlite_database_2' -${CLICKHOUSE_CLIENT} --query="CREATE DATABASE sqlite_database_2 ENGINE = SQLite('${DATA_FILE2}')" -# Do not run these, bacuase requires permissions in ci for write access to the directory of the created file and chmod does not help. -# ${CLICKHOUSE_CLIENT} --query="INSERT INTO sqlite_database_2.table3 VALUES (NULL, 3);" -# ${CLICKHOUSE_CLIENT} --query="INSERT INTO sqlite_database_2.table3 VALUES (NULL, 4);" -# ${CLICKHOUSE_CLIENT} --query="INSERT INTO sqlite_database_2.table3 VALUES ('line5', 5);" -${CLICKHOUSE_CLIENT} --query="select 'test NULLs:'"; -${CLICKHOUSE_CLIENT} --query='SELECT * FROM sqlite_database_2.table3 ORDER BY col2;' +${CLICKHOUSE_CLIENT} --query='DROP DATABASE IF EXISTS sqlite_database' -${CLICKHOUSE_CLIENT} --query="select 'detach'"; -${CLICKHOUSE_CLIENT} --query='DETACH DATABASE sqlite_database;' -${CLICKHOUSE_CLIENT} --query='ATTACH DATABASE sqlite_database;' - -${CLICKHOUSE_CLIENT} --query='SELECT * FROM sqlite_database.table1 ORDER BY col2' -${CLICKHOUSE_CLIENT} --query='SELECT * FROM sqlite_database.table2 ORDER BY col1;' - -${CLICKHOUSE_CLIENT} --query='DROP DATABASE IF EXISTS sqlite_database;' ${CLICKHOUSE_CLIENT} --query="select 'create table engine with table3'"; ${CLICKHOUSE_CLIENT} --query='DROP TABLE IF EXISTS sqlite_table3' -${CLICKHOUSE_CLIENT} --query="CREATE TABLE sqlite_table3 (col1 String, col2 Int32) ENGINE = SQLite('${DATA_FILE2}', 'table3')" +${CLICKHOUSE_CLIENT} --query="CREATE TABLE sqlite_table3 (col1 String, col2 Int32) ENGINE = SQLite('${DB_PATH}', 'table3')" + ${CLICKHOUSE_CLIENT} --query='SHOW CREATE TABLE sqlite_table3;' | sed -r 's/(.*SQLite)(.*)/\1/' -# Do not run these, bacuase requires permissions in ci for write access to the directory of the created file and chmod does not help. 
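Moving the SQLite file out of the test source directory and into user_files_path is what makes the write path testable: that directory is writable by the server, so the INSERT statements that used to be commented out (just below) can now run for both the SQLite table engine and the sqlite() table function. A condensed sketch of the new flow, reusing the DB_PATH variable defined earlier in the script:

    # the database now lives under user_files_path, which the server may write to
    sqlite3 ${DB_PATH} 'CREATE TABLE table3 (col1 text, col2 int);'
    ${CLICKHOUSE_CLIENT} --query="CREATE TABLE sqlite_table3 (col1 String, col2 Int32) ENGINE = SQLite('${DB_PATH}', 'table3')"
    ${CLICKHOUSE_CLIENT} --query="INSERT INTO sqlite_table3 VALUES ('line6', 6)"
    ${CLICKHOUSE_CLIENT} --query="INSERT INTO TABLE FUNCTION sqlite('${DB_PATH}', 'table1') SELECT 'line4', 4"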
-# ${CLICKHOUSE_CLIENT} --query="INSERT INTO sqlite_table3 VALUES ('line6', 6);" -# ${CLICKHOUSE_CLIENT} --query="INSERT INTO sqlite_table3 VALUES (NULL, 7);" +${CLICKHOUSE_CLIENT} --query="INSERT INTO sqlite_table3 VALUES ('line6', 6);" +${CLICKHOUSE_CLIENT} --query="INSERT INTO sqlite_table3 VALUES (NULL, 7);" + ${CLICKHOUSE_CLIENT} --query='SELECT * FROM sqlite_table3 ORDER BY col2' -sqlite3 $CUR_DIR/db2 'DROP TABLE IF EXISTS table4' -sqlite3 $CUR_DIR/db2 'CREATE TABLE table4 (a int, b integer, c tinyint, d smallint, e mediumint, bigint, int2, int8)' -${CLICKHOUSE_CLIENT} --query="select 'test types'"; -${CLICKHOUSE_CLIENT} --query='SHOW CREATE TABLE sqlite_database_2.table4;' | sed -r 's/(.*SQLite)(.*)/\1/' -sqlite3 $CUR_DIR/db2 'CREATE TABLE table5 (a character(20), b varchar(10), c real, d double, e double precision, f float)' -${CLICKHOUSE_CLIENT} --query='SHOW CREATE TABLE sqlite_database_2.table5;' | sed -r 's/(.*SQLite)(.*)/\1/' ${CLICKHOUSE_CLIENT} --query="select 'test table function'"; -${CLICKHOUSE_CLIENT} --query="SELECT * FROM sqlite('${DATA_FILE1}', 'table1') ORDER BY col2" +${CLICKHOUSE_CLIENT} --query="INSERT INTO TABLE FUNCTION sqlite('${DB_PATH}', 'table1') SELECT 'line4', 4" +${CLICKHOUSE_CLIENT} --query="SELECT * FROM sqlite('${DB_PATH}', 'table1') ORDER BY col2" -rm ${DATA_FILE2} + +sqlite3 $CUR_DIR/db2 'DROP TABLE IF EXISTS table1' +sqlite3 $CUR_DIR/db2 'CREATE TABLE table1 (col1 text, col2 smallint);' +sqlite3 $CUR_DIR/db2 "INSERT INTO table1 VALUES ('line1', 1), ('line2', 2), ('line3', 3)" + +${CLICKHOUSE_CLIENT} --query="select 'test path in clickhouse-local'"; +${CLICKHOUSE_LOCAL} --query="SELECT * FROM sqlite('$CUR_DIR/db2', 'table1') ORDER BY col2" + +rm -r ${DB_PATH} diff --git a/tests/queries/0_stateless/01889_tokenize.reference b/tests/queries/0_stateless/01889_tokenize.reference new file mode 100644 index 00000000000..4dd6f323929 --- /dev/null +++ b/tests/queries/0_stateless/01889_tokenize.reference @@ -0,0 +1,8 @@ +['It','is','quite','a','wonderful','day','isn','t','it'] +['There','is','so','much','to','learn'] +['22','00','email','yandex','ru'] +['ТокенизациÑ','каких','либо','других','Ñзыков'] +['It','is','quite','a','wonderful','day,','isn\'t','it?'] +['There','is....','so','much','to','learn!'] +['22:00','email@yandex.ru'] +['ТокенизациÑ','каких-либо','других','Ñзыков?'] diff --git a/tests/queries/0_stateless/01889_tokenize.sql b/tests/queries/0_stateless/01889_tokenize.sql new file mode 100644 index 00000000000..c9d29a8632b --- /dev/null +++ b/tests/queries/0_stateless/01889_tokenize.sql @@ -0,0 +1,11 @@ +SET allow_experimental_nlp_functions = 1; + +SELECT splitByNonAlpha('It is quite a wonderful day, isn\'t it?'); +SELECT splitByNonAlpha('There is.... so much to learn!'); +SELECT splitByNonAlpha('22:00 email@yandex.ru'); +SELECT splitByNonAlpha('Ð¢Ð¾ÐºÐµÐ½Ð¸Ð·Ð°Ñ†Ð¸Ñ ÐºÐ°ÐºÐ¸Ñ…-либо других Ñзыков?'); + +SELECT splitByWhitespace('It is quite a wonderful day, isn\'t it?'); +SELECT splitByWhitespace('There is.... 
so much to learn!'); +SELECT splitByWhitespace('22:00 email@yandex.ru'); +SELECT splitByWhitespace('Токенизация каких-либо других языков?'); diff --git a/tests/queries/0_stateless/01890_stem.reference b/tests/queries/0_stateless/01890_stem.reference new file mode 100644 index 00000000000..33e18cd6775 --- /dev/null +++ b/tests/queries/0_stateless/01890_stem.reference @@ -0,0 +1,21 @@ +given +combinatori +collect +possibl +studi +commonplac +pack +комбинаторн +получ +огранич +конечн +максимальн +суммарн +стоимост +remplissag +valeur +maximis +dépass +intens +étudi +peuvent diff --git a/tests/queries/0_stateless/01890_stem.sql b/tests/queries/0_stateless/01890_stem.sql new file mode 100644 index 00000000000..472cfb54251 --- /dev/null +++ b/tests/queries/0_stateless/01890_stem.sql @@ -0,0 +1,25 @@ +SET allow_experimental_nlp_functions = 1; + +SELECT stem('en', 'given'); +SELECT stem('en', 'combinatorial'); +SELECT stem('en', 'collection'); +SELECT stem('en', 'possibility'); +SELECT stem('en', 'studied'); +SELECT stem('en', 'commonplace'); +SELECT stem('en', 'packing'); + +SELECT stem('ru', 'комбинаторной'); +SELECT stem('ru', 'получила'); +SELECT stem('ru', 'ограничена'); +SELECT stem('ru', 'конечной'); +SELECT stem('ru', 'максимальной'); +SELECT stem('ru', 'суммарный'); +SELECT stem('ru', 'стоимостью'); + +SELECT stem('fr', 'remplissage'); +SELECT stem('fr', 'valeur'); +SELECT stem('fr', 'maximiser'); +SELECT stem('fr', 'dépasser'); +SELECT stem('fr', 'intensivement'); +SELECT stem('fr', 'étudié'); +SELECT stem('fr', 'peuvent'); diff --git a/tests/queries/0_stateless/01891_partition_hash.sql b/tests/queries/0_stateless/01891_partition_hash.sql index 6e356e799ab..f401c7c2d07 100644 --- a/tests/queries/0_stateless/01891_partition_hash.sql +++ b/tests/queries/0_stateless/01891_partition_hash.sql @@ -1,5 +1,5 @@ drop table if exists tab; -create table tab (i8 Int8, i16 Int16, i32 Int32, i64 Int64, i128 Int128, i256 Int256, u8 UInt8, u16 UInt16, u32 UInt32, u64 UInt64, u128 UInt128, u256 UInt256, id UUID, s String, fs FixedString(33), a Array(UInt8), t Tuple(UInt16, UInt32), d Date, dt DateTime, dt64 DateTime64, dec128 Decimal128(3), dec256 Decimal256(4), lc LowCardinality(String)) engine = MergeTree PARTITION BY (i8, i16, i32, i64, i128, i256, u8, u16, u32, u64, u128, u256, id, s, fs, a, t, d, dt, dt64, dec128, dec256, lc) order by tuple(); +create table tab (i8 Int8, i16 Int16, i32 Int32, i64 Int64, i128 Int128, i256 Int256, u8 UInt8, u16 UInt16, u32 UInt32, u64 UInt64, u128 UInt128, u256 UInt256, id UUID, s String, fs FixedString(33), a Array(UInt8), t Tuple(UInt16, UInt32), d Date, dt DateTime('Europe/Moscow'), dt64 DateTime64(3, 'Europe/Moscow'), dec128 Decimal128(3), dec256 Decimal256(4), lc LowCardinality(String)) engine = MergeTree PARTITION BY (i8, i16, i32, i64, i128, i256, u8, u16, u32, u64, u128, u256, id, s, fs, a, t, d, dt, dt64, dec128, dec256, lc) order by tuple(); insert into tab values (-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, '61f0c404-5cb3-11e7-907b-a6006ad3dba0', 'a', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', [1, 2, 3], (-1, -2), '2020-01-01', '2020-01-01 01:01:01', '2020-01-01 01:01:01', '123.456', '78.9101', 'a'); -- Here we check that partition id did not change. -- Different result means Backward Incompatible Change. Old partitions will not be accepted by new server.
diff --git a/tests/queries/0_stateless/01891_partition_hash_no_long_int.sql b/tests/queries/0_stateless/01891_partition_hash_no_long_int.sql index bf5c2457923..0751ff2729f 100644 --- a/tests/queries/0_stateless/01891_partition_hash_no_long_int.sql +++ b/tests/queries/0_stateless/01891_partition_hash_no_long_int.sql @@ -1,5 +1,5 @@ drop table if exists tab; -create table tab (i8 Int8, i16 Int16, i32 Int32, i64 Int64, u8 UInt8, u16 UInt16, u32 UInt32, u64 UInt64, id UUID, s String, fs FixedString(33), a Array(UInt8), t Tuple(UInt16, UInt32), d Date, dt DateTime, dt64 DateTime64, dec128 Decimal128(3), lc LowCardinality(String)) engine = MergeTree PARTITION BY (i8, i16, i32, i64, u8, u16, u32, u64, id, s, fs, a, t, d, dt, dt64, dec128, lc) order by tuple(); +create table tab (i8 Int8, i16 Int16, i32 Int32, i64 Int64, u8 UInt8, u16 UInt16, u32 UInt32, u64 UInt64, id UUID, s String, fs FixedString(33), a Array(UInt8), t Tuple(UInt16, UInt32), d Date, dt DateTime('Europe/Moscow'), dt64 DateTime64(3, 'Europe/Moscow'), dec128 Decimal128(3), lc LowCardinality(String)) engine = MergeTree PARTITION BY (i8, i16, i32, i64, u8, u16, u32, u64, id, s, fs, a, t, d, dt, dt64, dec128, lc) order by tuple(); insert into tab values (-1, -1, -1, -1, -1, -1, -1, -1, '61f0c404-5cb3-11e7-907b-a6006ad3dba0', 'a', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', [1, 2, 3], (-1, -2), '2020-01-01', '2020-01-01 01:01:01', '2020-01-01 01:01:01', '123.456', 'a'); -- Here we check that partition id did not change. -- Different result means Backward Incompatible Change. Old partitions will not be accepted by new server. diff --git a/tests/queries/0_stateless/01905_to_json_string.sql b/tests/queries/0_stateless/01905_to_json_string.sql index fe8a2407f3d..80d9c2e2625 100644 --- a/tests/queries/0_stateless/01905_to_json_string.sql +++ b/tests/queries/0_stateless/01905_to_json_string.sql @@ -1,10 +1,17 @@ -drop table if exists t; - -create table t engine Memory as select * from generateRandom('a Array(Int8), b UInt32, c Nullable(String), d Decimal32(4), e Nullable(Enum16(\'h\' = 1, \'w\' = 5 , \'o\' = -200)), f Float64, g Tuple(Date, DateTime, DateTime64, UUID), h FixedString(2), i Array(Nullable(UUID))', 10, 5, 3) limit 2; +create temporary table t engine Memory as select * from generateRandom( +$$ + a Array(Int8), + b UInt32, + c Nullable(String), + d Decimal32(4), + e Nullable(Enum16('h' = 1, 'w' = 5 , 'o' = -200)), + f Float64, + g Tuple(Date, DateTime('Europe/Moscow'), DateTime64(3, 'Europe/Moscow'), UUID), + h FixedString(2), + i Array(Nullable(UUID)) +$$, 10, 5, 3) limit 2; select * apply toJSONString from t; -drop table t; - set allow_experimental_map_type = 1; select toJSONString(map('1234', '5678')); diff --git a/tests/queries/0_stateless/01915_create_or_replace_dictionary.sql b/tests/queries/0_stateless/01915_create_or_replace_dictionary.sql index c9df6114ec9..1520dd41973 100644 --- a/tests/queries/0_stateless/01915_create_or_replace_dictionary.sql +++ b/tests/queries/0_stateless/01915_create_or_replace_dictionary.sql @@ -1,51 +1,51 @@ -DROP DATABASE IF EXISTS 01915_db; -CREATE DATABASE 01915_db ENGINE=Atomic; +DROP DATABASE IF EXISTS test_01915_db; +CREATE DATABASE test_01915_db ENGINE=Atomic; -DROP TABLE IF EXISTS 01915_db.test_source_table_1; -CREATE TABLE 01915_db.test_source_table_1 +DROP TABLE IF EXISTS test_01915_db.test_source_table_1; +CREATE TABLE test_01915_db.test_source_table_1 ( id UInt64, value String ) ENGINE=TinyLog; -INSERT INTO 01915_db.test_source_table_1 VALUES (0, 'Value0'); +INSERT INTO 
test_01915_db.test_source_table_1 VALUES (0, 'Value0'); -DROP DICTIONARY IF EXISTS 01915_db.test_dictionary; -CREATE OR REPLACE DICTIONARY 01915_db.test_dictionary +DROP DICTIONARY IF EXISTS test_01915_db.test_dictionary; +CREATE OR REPLACE DICTIONARY test_01915_db.test_dictionary ( id UInt64, value String ) PRIMARY KEY id LAYOUT(DIRECT()) -SOURCE(CLICKHOUSE(DB '01915_db' TABLE 'test_source_table_1')); +SOURCE(CLICKHOUSE(DB 'test_01915_db' TABLE 'test_source_table_1')); -SELECT * FROM 01915_db.test_dictionary; +SELECT * FROM test_01915_db.test_dictionary; -DROP TABLE IF EXISTS 01915_db.test_source_table_2; -CREATE TABLE 01915_db.test_source_table_2 +DROP TABLE IF EXISTS test_01915_db.test_source_table_2; +CREATE TABLE test_01915_db.test_source_table_2 ( id UInt64, value_1 String ) ENGINE=TinyLog; -INSERT INTO 01915_db.test_source_table_2 VALUES (0, 'Value1'); +INSERT INTO test_01915_db.test_source_table_2 VALUES (0, 'Value1'); -CREATE OR REPLACE DICTIONARY 01915_db.test_dictionary +CREATE OR REPLACE DICTIONARY test_01915_db.test_dictionary ( id UInt64, value_1 String ) PRIMARY KEY id LAYOUT(HASHED()) -SOURCE(CLICKHOUSE(DB '01915_db' TABLE 'test_source_table_2')) +SOURCE(CLICKHOUSE(DB 'test_01915_db' TABLE 'test_source_table_2')) LIFETIME(0); -SELECT * FROM 01915_db.test_dictionary; +SELECT * FROM test_01915_db.test_dictionary; -DROP DICTIONARY 01915_db.test_dictionary; +DROP DICTIONARY test_01915_db.test_dictionary; -DROP TABLE 01915_db.test_source_table_1; -DROP TABLE 01915_db.test_source_table_2; +DROP TABLE test_01915_db.test_source_table_1; +DROP TABLE test_01915_db.test_source_table_2; -DROP DATABASE 01915_db; +DROP DATABASE test_01915_db; diff --git a/tests/queries/0_stateless/01920_async_drain_connections.reference b/tests/queries/0_stateless/01920_async_drain_connections.reference new file mode 100644 index 00000000000..aa47d0d46d4 --- /dev/null +++ b/tests/queries/0_stateless/01920_async_drain_connections.reference @@ -0,0 +1,2 @@ +0 +0 diff --git a/tests/queries/0_stateless/01920_async_drain_connections.sql b/tests/queries/0_stateless/01920_async_drain_connections.sql new file mode 100644 index 00000000000..827ca13fc1a --- /dev/null +++ b/tests/queries/0_stateless/01920_async_drain_connections.sql @@ -0,0 +1,6 @@ +drop table if exists t; + +create table t (number UInt64) engine = Distributed(test_cluster_two_shards, system, numbers); +select * from t where number = 0 limit 2 settings sleep_in_receive_cancel_ms = 10000, max_execution_time = 5; + +drop table t; diff --git a/tests/queries/0_stateless/01921_datatype_date32.reference b/tests/queries/0_stateless/01921_datatype_date32.reference index 3efe9079cc2..2114f6f6b1e 100644 --- a/tests/queries/0_stateless/01921_datatype_date32.reference +++ b/tests/queries/0_stateless/01921_datatype_date32.reference @@ -194,11 +194,11 @@ 2283-11-11 01:00:00.000 2021-06-22 01:00:00.000 -------addHours--------- -1925-01-01 12:00:00.000 -1925-01-01 12:00:00.000 -2282-12-31 12:00:00.000 -2283-11-11 12:00:00.000 -2021-06-22 12:00:00.000 +1925-01-01 01:00:00.000 +1925-01-01 01:00:00.000 +2282-12-31 01:00:00.000 +2283-11-11 01:00:00.000 +2021-06-22 01:00:00.000 -------addDays--------- 1925-01-08 1925-01-08 @@ -280,3 +280,6 @@ -------toDate32--------- 1925-01-01 2000-01-01 1925-01-01 1925-01-01 +1925-01-01 \N +1925-01-01 +\N diff --git a/tests/queries/0_stateless/01921_datatype_date32.sql b/tests/queries/0_stateless/01921_datatype_date32.sql index 5431736fab3..e01bdfeee8d 100644 --- a/tests/queries/0_stateless/01921_datatype_date32.sql +++ 
b/tests/queries/0_stateless/01921_datatype_date32.sql @@ -23,7 +23,7 @@ select toMinute(x1) from t1; -- { serverError 43 } select '-------toSecond---------'; select toSecond(x1) from t1; -- { serverError 43 } select '-------toStartOfDay---------'; -select toStartOfDay(x1) from t1; +select toStartOfDay(x1, 'Europe/Moscow') from t1; select '-------toMonday---------'; select toMonday(x1) from t1; select '-------toISOWeek---------'; @@ -57,21 +57,21 @@ select toStartOfHour(x1) from t1; -- { serverError 43 } select '-------toStartOfISOYear---------'; select toStartOfISOYear(x1) from t1; select '-------toRelativeYearNum---------'; -select toRelativeYearNum(x1) from t1; +select toRelativeYearNum(x1, 'Europe/Moscow') from t1; select '-------toRelativeQuarterNum---------'; -select toRelativeQuarterNum(x1) from t1; +select toRelativeQuarterNum(x1, 'Europe/Moscow') from t1; select '-------toRelativeMonthNum---------'; -select toRelativeMonthNum(x1) from t1; +select toRelativeMonthNum(x1, 'Europe/Moscow') from t1; select '-------toRelativeWeekNum---------'; -select toRelativeWeekNum(x1) from t1; +select toRelativeWeekNum(x1, 'Europe/Moscow') from t1; select '-------toRelativeDayNum---------'; -select toRelativeDayNum(x1) from t1; +select toRelativeDayNum(x1, 'Europe/Moscow') from t1; select '-------toRelativeHourNum---------'; -select toRelativeHourNum(x1) from t1; +select toRelativeHourNum(x1, 'Europe/Moscow') from t1; select '-------toRelativeMinuteNum---------'; -select toRelativeMinuteNum(x1) from t1; +select toRelativeMinuteNum(x1, 'Europe/Moscow') from t1; select '-------toRelativeSecondNum---------'; -select toRelativeSecondNum(x1) from t1; +select toRelativeSecondNum(x1, 'Europe/Moscow') from t1; select '-------toTime---------'; select toTime(x1) from t1; -- { serverError 43 } select '-------toYYYYMM---------'; @@ -85,7 +85,7 @@ select addSeconds(x1, 3600) from t1; select '-------addMinutes---------'; select addMinutes(x1, 60) from t1; select '-------addHours---------'; -select addHours(x1, 12) from t1; +select addHours(x1, 1) from t1; select '-------addDays---------'; select addDays(x1, 7) from t1; select '-------addWeeks---------'; @@ -115,4 +115,7 @@ select subtractYears(x1, 1) from t1; select '-------toDate32---------'; select toDate32('1925-01-01'), toDate32(toDate('2000-01-01')); select toDate32OrZero('1924-01-01'), toDate32OrNull('1924-01-01'); +select toDate32OrZero(''), toDate32OrNull(''); +select (select toDate32OrZero('')); +select (select toDate32OrNull('')); diff --git a/tests/queries/0_stateless/01925_broken_partition_id_zookeeper.reference b/tests/queries/0_stateless/01925_broken_partition_id_zookeeper.reference new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/queries/0_stateless/01925_broken_partition_id_zookeeper.sql b/tests/queries/0_stateless/01925_broken_partition_id_zookeeper.sql new file mode 100644 index 00000000000..baf6c1fbf8f --- /dev/null +++ b/tests/queries/0_stateless/01925_broken_partition_id_zookeeper.sql @@ -0,0 +1,16 @@ +DROP TABLE IF EXISTS broken_partition; + +CREATE TABLE broken_partition +( + date Date, + key UInt64 +) +ENGINE = ReplicatedMergeTree('/clickhouse/test_01925_{database}/rmt', 'r1') +ORDER BY tuple() +PARTITION BY date; + +ALTER TABLE broken_partition DROP PARTITION ID '20210325_0_13241_6_12747'; --{serverError 248} + +ALTER TABLE broken_partition DROP PARTITION ID '20210325_0_13241_6_12747'; --{serverError 248} + +DROP TABLE IF EXISTS broken_partition; diff --git 
a/tests/queries/0_stateless/01925_map_populate_series_on_map.reference b/tests/queries/0_stateless/01925_map_populate_series_on_map.reference new file mode 100644 index 00000000000..235a227f548 --- /dev/null +++ b/tests/queries/0_stateless/01925_map_populate_series_on_map.reference @@ -0,0 +1,67 @@ +-- { echo } +drop table if exists map_test; +set allow_experimental_map_type = 1; +create table map_test engine=TinyLog() as (select (number + 1) as n, map(1, 1, number,2) as m from numbers(1, 5)); +select mapPopulateSeries(m) from map_test; +{1:1} +{1:1,2:2} +{1:1,2:0,3:2} +{1:1,2:0,3:0,4:2} +{1:1,2:0,3:0,4:0,5:2} +select mapPopulateSeries(m, toUInt64(3)) from map_test; +{1:1,2:0,3:0} +{1:1,2:2,3:0} +{1:1,2:0,3:2} +{1:1,2:0,3:0} +{1:1,2:0,3:0} +select mapPopulateSeries(m, toUInt64(10)) from map_test; +{1:1,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0} +{1:1,2:2,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0} +{1:1,2:0,3:2,4:0,5:0,6:0,7:0,8:0,9:0,10:0} +{1:1,2:0,3:0,4:2,5:0,6:0,7:0,8:0,9:0,10:0} +{1:1,2:0,3:0,4:0,5:2,6:0,7:0,8:0,9:0,10:0} +select mapPopulateSeries(m, 1000) from map_test; -- { serverError 43 } +select mapPopulateSeries(m, n) from map_test; +{1:1,2:0} +{1:1,2:2,3:0} +{1:1,2:0,3:2,4:0} +{1:1,2:0,3:0,4:2,5:0} +{1:1,2:0,3:0,4:0,5:2,6:0} +drop table map_test; +select mapPopulateSeries(map(toUInt8(1), toUInt8(1), 2, 1)) as res, toTypeName(res); +{1:1,2:1} Map(UInt8,UInt8) +select mapPopulateSeries(map(toUInt16(1), toUInt16(1), 2, 1)) as res, toTypeName(res); +{1:1,2:1} Map(UInt16,UInt16) +select mapPopulateSeries(map(toUInt32(1), toUInt32(1), 2, 1)) as res, toTypeName(res); +{1:1,2:1} Map(UInt32,UInt32) +select mapPopulateSeries(map(toUInt64(1), toUInt64(1), 2, 1)) as res, toTypeName(res); +{1:1,2:1} Map(UInt64,UInt64) +select mapPopulateSeries(map(toUInt128(1), toUInt128(1), 2, 1)) as res, toTypeName(res); +{1:1,2:1} Map(UInt128,UInt128) +select mapPopulateSeries(map(toUInt256(1), toUInt256(1), 2, 1)) as res, toTypeName(res); +{1:1,2:1} Map(UInt256,UInt256) +select mapPopulateSeries(map(toInt8(1), toInt8(1), 2, 1)) as res, toTypeName(res); +{1:1,2:1} Map(Int16,Int16) +select mapPopulateSeries(map(toInt16(1), toInt16(1), 2, 1)) as res, toTypeName(res); +{1:1,2:1} Map(Int16,Int16) +select mapPopulateSeries(map(toInt32(1), toInt32(1), 2, 1)) as res, toTypeName(res); +{1:1,2:1} Map(Int32,Int32) +select mapPopulateSeries(map(toInt64(1), toInt64(1), 2, 1)) as res, toTypeName(res); +{1:1,2:1} Map(Int64,Int64) +select mapPopulateSeries(map(toInt128(1), toInt128(1), 2, 1)) as res, toTypeName(res); +{1:1,2:1} Map(Int128,Int128) +select mapPopulateSeries(map(toInt256(1), toInt256(1), 2, 1)) as res, toTypeName(res); +{1:1,2:1} Map(Int256,Int256) +select mapPopulateSeries(map(toInt8(-10), toInt8(1), 2, 1)) as res, toTypeName(res); +{-10:1,-9:0,-8:0,-7:0,-6:0,-5:0,-4:0,-3:0,-2:0,-1:0,0:0,1:0,2:1} Map(Int16,Int16) +select mapPopulateSeries(map(toInt16(-10), toInt16(1), 2, 1)) as res, toTypeName(res); +{-10:1,-9:0,-8:0,-7:0,-6:0,-5:0,-4:0,-3:0,-2:0,-1:0,0:0,1:0,2:1} Map(Int16,Int16) +select mapPopulateSeries(map(toInt32(-10), toInt32(1), 2, 1)) as res, toTypeName(res); +{-10:1,-9:0,-8:0,-7:0,-6:0,-5:0,-4:0,-3:0,-2:0,-1:0,0:0,1:0,2:1} Map(Int32,Int32) +select mapPopulateSeries(map(toInt64(-10), toInt64(1), 2, 1)) as res, toTypeName(res); +{-10:1,-9:0,-8:0,-7:0,-6:0,-5:0,-4:0,-3:0,-2:0,-1:0,0:0,1:0,2:1} Map(Int64,Int64) +select mapPopulateSeries(map(toInt64(-10), toInt64(1), 2, 1), toInt64(-5)) as res, toTypeName(res); +{-10:1,-9:0,-8:0,-7:0,-6:0,-5:0} Map(Int64,Int64) +select mapPopulateSeries(); -- { serverError 42 } 
+select mapPopulateSeries('asdf'); -- { serverError 43 } +select mapPopulateSeries(map('1', 1, '2', 1)) as res, toTypeName(res); -- { serverError 43 } diff --git a/tests/queries/0_stateless/01925_map_populate_series_on_map.sql b/tests/queries/0_stateless/01925_map_populate_series_on_map.sql new file mode 100644 index 00000000000..ac78280ec1d --- /dev/null +++ b/tests/queries/0_stateless/01925_map_populate_series_on_map.sql @@ -0,0 +1,36 @@ +-- { echo } +drop table if exists map_test; +set allow_experimental_map_type = 1; +create table map_test engine=TinyLog() as (select (number + 1) as n, map(1, 1, number,2) as m from numbers(1, 5)); + +select mapPopulateSeries(m) from map_test; +select mapPopulateSeries(m, toUInt64(3)) from map_test; +select mapPopulateSeries(m, toUInt64(10)) from map_test; +select mapPopulateSeries(m, 1000) from map_test; -- { serverError 43 } +select mapPopulateSeries(m, n) from map_test; + +drop table map_test; + +select mapPopulateSeries(map(toUInt8(1), toUInt8(1), 2, 1)) as res, toTypeName(res); +select mapPopulateSeries(map(toUInt16(1), toUInt16(1), 2, 1)) as res, toTypeName(res); +select mapPopulateSeries(map(toUInt32(1), toUInt32(1), 2, 1)) as res, toTypeName(res); +select mapPopulateSeries(map(toUInt64(1), toUInt64(1), 2, 1)) as res, toTypeName(res); +select mapPopulateSeries(map(toUInt128(1), toUInt128(1), 2, 1)) as res, toTypeName(res); +select mapPopulateSeries(map(toUInt256(1), toUInt256(1), 2, 1)) as res, toTypeName(res); + +select mapPopulateSeries(map(toInt8(1), toInt8(1), 2, 1)) as res, toTypeName(res); +select mapPopulateSeries(map(toInt16(1), toInt16(1), 2, 1)) as res, toTypeName(res); +select mapPopulateSeries(map(toInt32(1), toInt32(1), 2, 1)) as res, toTypeName(res); +select mapPopulateSeries(map(toInt64(1), toInt64(1), 2, 1)) as res, toTypeName(res); +select mapPopulateSeries(map(toInt128(1), toInt128(1), 2, 1)) as res, toTypeName(res); +select mapPopulateSeries(map(toInt256(1), toInt256(1), 2, 1)) as res, toTypeName(res); + +select mapPopulateSeries(map(toInt8(-10), toInt8(1), 2, 1)) as res, toTypeName(res); +select mapPopulateSeries(map(toInt16(-10), toInt16(1), 2, 1)) as res, toTypeName(res); +select mapPopulateSeries(map(toInt32(-10), toInt32(1), 2, 1)) as res, toTypeName(res); +select mapPopulateSeries(map(toInt64(-10), toInt64(1), 2, 1)) as res, toTypeName(res); +select mapPopulateSeries(map(toInt64(-10), toInt64(1), 2, 1), toInt64(-5)) as res, toTypeName(res); + +select mapPopulateSeries(); -- { serverError 42 } +select mapPopulateSeries('asdf'); -- { serverError 43 } +select mapPopulateSeries(map('1', 1, '2', 1)) as res, toTypeName(res); -- { serverError 43 } diff --git a/tests/queries/0_stateless/01926_date_date_time_supertype.sql b/tests/queries/0_stateless/01926_date_date_time_supertype.sql index 559cd465ebb..cce488a5cff 100644 --- a/tests/queries/0_stateless/01926_date_date_time_supertype.sql +++ b/tests/queries/0_stateless/01926_date_date_time_supertype.sql @@ -15,7 +15,7 @@ WITH toDate('2000-01-01') as a, toDateTime('2000-01-01', 'Europe/Moscow') as b SELECT if(value, b, a) as result, toTypeName(result) FROM predicate_table; -WITH toDateTime('2000-01-01') as a, toDateTime64('2000-01-01', 5, 'Europe/Moscow') as b +WITH toDateTime('2000-01-01', 'Europe/Moscow') as a, toDateTime64('2000-01-01', 5, 'Europe/Moscow') as b SELECT if(value, b, a) as result, toTypeName(result) FROM predicate_table; diff --git a/tests/queries/0_stateless/01926_order_by_desc_limit.reference b/tests/queries/0_stateless/01926_order_by_desc_limit.reference new file 
mode 100644 index 00000000000..6ed281c757a --- /dev/null +++ b/tests/queries/0_stateless/01926_order_by_desc_limit.reference @@ -0,0 +1,2 @@ +1 +1 diff --git a/tests/queries/0_stateless/01926_order_by_desc_limit.sql b/tests/queries/0_stateless/01926_order_by_desc_limit.sql new file mode 100644 index 00000000000..7ea102e11e9 --- /dev/null +++ b/tests/queries/0_stateless/01926_order_by_desc_limit.sql @@ -0,0 +1,21 @@ +DROP TABLE IF EXISTS order_by_desc; + +CREATE TABLE order_by_desc (u UInt32, s String) +ENGINE MergeTree ORDER BY u PARTITION BY u % 100 +SETTINGS index_granularity = 1024; + +INSERT INTO order_by_desc SELECT number, repeat('a', 1024) FROM numbers(1024 * 300); +OPTIMIZE TABLE order_by_desc FINAL; + +SELECT s FROM order_by_desc ORDER BY u DESC LIMIT 10 FORMAT Null +SETTINGS max_memory_usage = '400M'; + +SELECT s FROM order_by_desc ORDER BY u LIMIT 10 FORMAT Null +SETTINGS max_memory_usage = '400M'; + +SYSTEM FLUSH LOGS; + +SELECT read_rows < 110000 FROM system.query_log +WHERE type = 'QueryFinish' AND current_database = currentDatabase() +AND event_time > now() - INTERVAL 10 SECOND +AND lower(query) LIKE lower('SELECT s FROM order_by_desc ORDER BY u%'); diff --git a/tests/queries/0_stateless/01939_user_with_default_database.reference b/tests/queries/0_stateless/01939_user_with_default_database.reference new file mode 100644 index 00000000000..8c8ff7e3007 --- /dev/null +++ b/tests/queries/0_stateless/01939_user_with_default_database.reference @@ -0,0 +1,4 @@ +default +db_01939 +CREATE USER u_01939 +CREATE USER u_01939 DEFAULT DATABASE NONE diff --git a/tests/queries/0_stateless/01939_user_with_default_database.sh b/tests/queries/0_stateless/01939_user_with_default_database.sh new file mode 100755 index 00000000000..6dcd288797b --- /dev/null +++ b/tests/queries/0_stateless/01939_user_with_default_database.sh @@ -0,0 +1,25 @@ +#!/usr/bin/env bash + + +CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. 
"$CUR_DIR"/../shell_config.sh + +${CLICKHOUSE_CLIENT_BINARY} --query "create database if not exists db_01939" +${CLICKHOUSE_CLIENT_BINARY} --query "create database if not exists NONE" + +#create user by sql +${CLICKHOUSE_CLIENT_BINARY} --query "drop user if exists u_01939" +${CLICKHOUSE_CLIENT_BINARY} --query "create user u_01939 default database db_01939" + +${CLICKHOUSE_CLIENT_BINARY} --query "SELECT currentDatabase();" +${CLICKHOUSE_CLIENT_BINARY} --user=u_01939 --query "SELECT currentDatabase();" + +${CLICKHOUSE_CLIENT_BINARY} --query "alter user u_01939 default database NONE" +${CLICKHOUSE_CLIENT_BINARY} --query "show create user u_01939" +${CLICKHOUSE_CLIENT_BINARY} --query "alter user u_01939 default database \`NONE\`" +${CLICKHOUSE_CLIENT_BINARY} --query "show create user u_01939" + +${CLICKHOUSE_CLIENT_BINARY} --query "drop user u_01939 " +${CLICKHOUSE_CLIENT_BINARY} --query "drop database db_01939" +${CLICKHOUSE_CLIENT_BINARY} --query "drop database NONE" diff --git a/tests/queries/0_stateless/01940_totimezone_operator_monotonicity.reference b/tests/queries/0_stateless/01940_totimezone_operator_monotonicity.reference new file mode 100644 index 00000000000..d00491fd7e5 --- /dev/null +++ b/tests/queries/0_stateless/01940_totimezone_operator_monotonicity.reference @@ -0,0 +1 @@ +1 diff --git a/tests/queries/0_stateless/01940_totimezone_operator_monotonicity.sql b/tests/queries/0_stateless/01940_totimezone_operator_monotonicity.sql new file mode 100644 index 00000000000..b8065947ead --- /dev/null +++ b/tests/queries/0_stateless/01940_totimezone_operator_monotonicity.sql @@ -0,0 +1,6 @@ +DROP TABLE IF EXISTS totimezone_op_mono; +CREATE TABLE totimezone_op_mono(i int, tz String, create_time DateTime) ENGINE MergeTree PARTITION BY toDate(create_time) ORDER BY i; +INSERT INTO totimezone_op_mono VALUES (1, 'UTC', toDateTime('2020-09-01 00:00:00', 'UTC')), (2, 'UTC', toDateTime('2020-09-02 00:00:00', 'UTC')); +SET max_rows_to_read = 1; +SELECT count() FROM totimezone_op_mono WHERE toTimeZone(create_time, 'UTC') = '2020-09-01 00:00:00'; +DROP TABLE IF EXISTS totimezone_op_mono; diff --git a/tests/queries/0_stateless/01943_pmj_non_joined_stuck.reference.j2 b/tests/queries/0_stateless/01943_pmj_non_joined_stuck.reference.j2 new file mode 100644 index 00000000000..8e54cd28808 --- /dev/null +++ b/tests/queries/0_stateless/01943_pmj_non_joined_stuck.reference.j2 @@ -0,0 +1,3 @@ +{% for i in range(24) -%} +1 +{% endfor -%} diff --git a/tests/queries/0_stateless/01943_pmj_non_joined_stuck.sql.j2 b/tests/queries/0_stateless/01943_pmj_non_joined_stuck.sql.j2 new file mode 100644 index 00000000000..32838b66e83 --- /dev/null +++ b/tests/queries/0_stateless/01943_pmj_non_joined_stuck.sql.j2 @@ -0,0 +1,9 @@ +SET max_block_size = 6, join_algorithm = 'partial_merge'; + +{% for i in range(4, 16) -%} +SELECT count() == {{ i }} FROM (SELECT 100 AS s) AS js1 ALL RIGHT JOIN ( SELECT number AS s FROM numbers({{ i }}) ) AS js2 USING (s); +{% endfor -%} + +{% for i in range(4, 16) -%} +SELECT count() == {{ i + 1 }} FROM (SELECT 100 AS s) AS js1 ALL FULL JOIN ( SELECT number AS s FROM numbers({{ i }}) ) AS js2 USING (s); +{% endfor -%} diff --git a/tests/queries/0_stateless/01943_query_id_check.reference b/tests/queries/0_stateless/01943_query_id_check.reference new file mode 100644 index 00000000000..324d49f8907 --- /dev/null +++ b/tests/queries/0_stateless/01943_query_id_check.reference @@ -0,0 +1,7 @@ +CREATE TABLE tmp ENGINE = TinyLog AS SELECT queryID(); +CREATE TABLE tmp ENGINE = TinyLog AS SELECT 
initialQueryID(); +3 +3 +1 +1 +3 diff --git a/tests/queries/0_stateless/01943_query_id_check.sql b/tests/queries/0_stateless/01943_query_id_check.sql new file mode 100644 index 00000000000..6e3d281386d --- /dev/null +++ b/tests/queries/0_stateless/01943_query_id_check.sql @@ -0,0 +1,21 @@ +DROP TABLE IF EXISTS tmp; + +CREATE TABLE tmp ENGINE = TinyLog AS SELECT queryID(); +SYSTEM FLUSH LOGS; +SELECT query FROM system.query_log WHERE query_id = (SELECT * FROM tmp) AND current_database = currentDatabase() LIMIT 1; +DROP TABLE tmp; + +CREATE TABLE tmp ENGINE = TinyLog AS SELECT initialQueryID(); +SYSTEM FLUSH LOGS; +SELECT query FROM system.query_log WHERE initial_query_id = (SELECT * FROM tmp) AND current_database = currentDatabase() LIMIT 1; +DROP TABLE tmp; + +CREATE TABLE tmp (str String) ENGINE = Log; +INSERT INTO tmp (*) VALUES ('a') +SELECT count() FROM (SELECT initialQueryID() FROM remote('127.0.0.{1..3}', currentDatabase(), 'tmp') GROUP BY queryID()); +SELECT count() FROM (SELECT queryID() FROM remote('127.0.0.{1..3}', currentDatabase(), 'tmp') GROUP BY queryID()); +SELECT count() FROM (SELECT queryID() AS t FROM remote('127.0.0.{1..3}', currentDatabase(), 'tmp') GROUP BY queryID() HAVING t == initialQueryID()); +SELECT count(DISTINCT t) FROM (SELECT initialQueryID() AS t FROM remote('127.0.0.{1..3}', currentDatabase(), 'tmp') GROUP BY queryID()); +SELECT count(DISTINCT t) FROM (SELECT queryID() AS t FROM remote('127.0.0.{1..3}', currentDatabase(), 'tmp') GROUP BY queryID()); +DROP TABLE tmp; + diff --git a/tests/queries/0_stateless/01944_range_max_elements.reference b/tests/queries/0_stateless/01944_range_max_elements.reference new file mode 100644 index 00000000000..7763ac4ce96 --- /dev/null +++ b/tests/queries/0_stateless/01944_range_max_elements.reference @@ -0,0 +1,33 @@ +[] +[0] +[0,1] +[] +[0] +[0,1] +[] +[0] +[0,1] +[] +[] +[0] +[0,1] +[] +[0] +[0,1] +[] +[0] +[0,1] +[] +[0] +[] +[0] +[0,1] +[] +[0] +[0,1] +[] +[0] +[0,1] +[] +[0] +[0,1] diff --git a/tests/queries/0_stateless/01944_range_max_elements.sql b/tests/queries/0_stateless/01944_range_max_elements.sql new file mode 100644 index 00000000000..c18f61e3190 --- /dev/null +++ b/tests/queries/0_stateless/01944_range_max_elements.sql @@ -0,0 +1,7 @@ +SET function_range_max_elements_in_block = 10; +SELECT range(number % 3) FROM numbers(10); +SELECT range(number % 3) FROM numbers(11); +SELECT range(number % 3) FROM numbers(12); -- { serverError 69 } + +SET function_range_max_elements_in_block = 12; +SELECT range(number % 3) FROM numbers(12); diff --git a/tests/queries/0_stateless/01945_show_debug_warning.expect b/tests/queries/0_stateless/01945_show_debug_warning.expect new file mode 100755 index 00000000000..7f14fdfbc96 --- /dev/null +++ b/tests/queries/0_stateless/01945_show_debug_warning.expect @@ -0,0 +1,50 @@ +#!/usr/bin/expect -f + +# This is a test for system.warnings. 
Testing in interactive mode is necessary, +# as we want to see certain warnings from client + +log_user 0 +set timeout 60 +match_max 100000 + +# A default timeout action is to do nothing, change it to fail +expect_after { + timeout { + exit 1 + } +} + +set basedir [file dirname $argv0] +set Debug_type 0 + +spawn bash -c "source $basedir/../shell_config.sh ; \$CLICKHOUSE_CLIENT_BINARY \$CLICKHOUSE_CLIENT_OPT --disable_suggestion" +expect ":) " + +# Check debug type +send -- "SELECT value FROM system.build_options WHERE name='BUILD_TYPE'\r" +expect { +"Debug" { + set Debug_type 1 + expect ":) " + } +"RelWithDebInfo" +} + +send -- "q\r" +expect eof + +if { $Debug_type > 0} { + +spawn bash -c "source $basedir/../shell_config.sh ; \$CLICKHOUSE_CLIENT_BINARY \$CLICKHOUSE_CLIENT_OPT --disable_suggestion" +expect "Warnings:" +expect " * Server was built in debug mode. It will work slowly." +expect ":) " + +# Check debug message in system.warnings +send -- "SELECT message FROM system.warnings WHERE message='Server was built in debug mode. It will work slowly.'\r" +expect "Server was built in debug mode. It will work slowly." +expect ":) " + +send -- "q\r" +expect eof +} diff --git a/tests/queries/0_stateless/01945_show_debug_warning.reference b/tests/queries/0_stateless/01945_show_debug_warning.reference new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/queries/0_stateless/01945_system_warnings.expect b/tests/queries/0_stateless/01945_system_warnings.expect index 56d219e1040..01a314429f8 100755 --- a/tests/queries/0_stateless/01945_system_warnings.expect +++ b/tests/queries/0_stateless/01945_system_warnings.expect @@ -36,5 +36,5 @@ expect { } # Finish test -send -- "\4" +send -- "q\r" expect eof diff --git a/tests/queries/0_stateless/01946_profile_sleep.reference b/tests/queries/0_stateless/01946_profile_sleep.reference new file mode 100644 index 00000000000..cc2d9ab80f9 --- /dev/null +++ b/tests/queries/0_stateless/01946_profile_sleep.reference @@ -0,0 +1,6 @@ +{"'SLEEP #1 CHECK'":"SLEEP #1 CHECK","calls":"1","microseconds":"1000"} +{"'SLEEP #2 CHECK'":"SLEEP #2 CHECK","calls":"1","microseconds":"1000"} +{"'SLEEP #3 CHECK'":"SLEEP #3 CHECK","calls":"1","microseconds":"1000"} +{"'SLEEP #4 CHECK'":"SLEEP #4 CHECK","calls":"2","microseconds":"2000"} +{"'SLEEP #5 CHECK'":"SLEEP #5 CHECK","calls":"0","microseconds":"0"} +{"'SLEEP #6 CHECK'":"SLEEP #6 CHECK","calls":"10","microseconds":"10000"} diff --git a/tests/queries/0_stateless/01946_profile_sleep.sql b/tests/queries/0_stateless/01946_profile_sleep.sql new file mode 100644 index 00000000000..01c203fb73e --- /dev/null +++ b/tests/queries/0_stateless/01946_profile_sleep.sql @@ -0,0 +1,65 @@ +SET log_queries=1; +SET log_profile_events=true; + +SELECT 'SLEEP #1 TEST', sleep(0.001) FORMAT Null; +SYSTEM FLUSH LOGS; +SELECT 'SLEEP #1 CHECK', ProfileEvents['SleepFunctionCalls'] as calls, ProfileEvents['SleepFunctionMicroseconds'] as microseconds +FROM system.query_log +WHERE query like '%SELECT ''SLEEP #1 TEST''%' + AND type > 1 + AND current_database = currentDatabase() + AND event_date >= yesterday() + FORMAT JSONEachRow; + +SELECT 'SLEEP #2 TEST', sleep(0.001) FROM numbers(2) FORMAT Null; +SYSTEM FLUSH LOGS; +SELECT 'SLEEP #2 CHECK', ProfileEvents['SleepFunctionCalls'] as calls, ProfileEvents['SleepFunctionMicroseconds'] as microseconds +FROM system.query_log +WHERE query like '%SELECT ''SLEEP #2 TEST''%' + AND type > 1 + AND current_database = currentDatabase() + AND event_date >= yesterday() + FORMAT JSONEachRow; + +SELECT 'SLEEP 
#3 TEST', sleepEachRow(0.001) FORMAT Null; +SYSTEM FLUSH LOGS; +SELECT 'SLEEP #3 CHECK', ProfileEvents['SleepFunctionCalls'] as calls, ProfileEvents['SleepFunctionMicroseconds'] as microseconds +FROM system.query_log +WHERE query like '%SELECT ''SLEEP #3 TEST''%' + AND type > 1 + AND current_database = currentDatabase() + AND event_date >= yesterday() + FORMAT JSONEachRow; + +SELECT 'SLEEP #4 TEST', sleepEachRow(0.001) FROM numbers(2) FORMAT Null; +SYSTEM FLUSH LOGS; +SELECT 'SLEEP #4 CHECK', ProfileEvents['SleepFunctionCalls'] as calls, ProfileEvents['SleepFunctionMicroseconds'] as microseconds +FROM system.query_log +WHERE query like '%SELECT ''SLEEP #4 TEST''%' + AND type > 1 + AND current_database = currentDatabase() + AND event_date >= yesterday() + FORMAT JSONEachRow; + + +CREATE VIEW sleep_view AS SELECT sleepEachRow(0.001) FROM system.numbers; +SYSTEM FLUSH LOGS; +SELECT 'SLEEP #5 CHECK', ProfileEvents['SleepFunctionCalls'] as calls, ProfileEvents['SleepFunctionMicroseconds'] as microseconds +FROM system.query_log +WHERE query like '%CREATE VIEW sleep_view AS%' + AND type > 1 + AND current_database = currentDatabase() + AND event_date >= yesterday() + FORMAT JSONEachRow; + +SELECT 'SLEEP #6 TEST', sleepEachRow(0.001) FROM sleep_view LIMIT 10 FORMAT Null; +SYSTEM FLUSH LOGS; +SELECT 'SLEEP #6 CHECK', ProfileEvents['SleepFunctionCalls'] as calls, ProfileEvents['SleepFunctionMicroseconds'] as microseconds +FROM system.query_log +WHERE query like '%SELECT ''SLEEP #6 TEST''%' + AND type > 1 + AND current_database = currentDatabase() + AND event_date >= yesterday() + FORMAT JSONEachRow; + +DROP TABLE sleep_view; diff --git a/tests/queries/0_stateless/01946_test.reference b/tests/queries/0_stateless/01946_test.reference new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/queries/0_stateless/01946_test_wrong_host_name_access.reference b/tests/queries/0_stateless/01946_test_wrong_host_name_access.reference new file mode 100644 index 00000000000..1191247b6d9 --- /dev/null +++ b/tests/queries/0_stateless/01946_test_wrong_host_name_access.reference @@ -0,0 +1,2 @@ +1 +2 diff --git a/tests/queries/0_stateless/01946_test_wrong_host_name_access.sh b/tests/queries/0_stateless/01946_test_wrong_host_name_access.sh new file mode 100755 index 00000000000..86e1fdd768f --- /dev/null +++ b/tests/queries/0_stateless/01946_test_wrong_host_name_access.sh @@ -0,0 +1,20 @@ +#!/usr/bin/env bash + +MYHOSTNAME=$(hostname -f) + +CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. 
"$CUR_DIR"/../shell_config.sh + +${CLICKHOUSE_CLIENT} --multiquery --query " + DROP USER IF EXISTS dns_fail_1, dns_fail_2; + CREATE USER dns_fail_1 HOST NAME 'non.existing.host.name', '${MYHOSTNAME}'; + CREATE USER dns_fail_2 HOST NAME '${MYHOSTNAME}', 'non.existing.host.name';" + +${CLICKHOUSE_CLIENT} --query "SELECT 1" --user dns_fail_1 --host ${MYHOSTNAME} + +${CLICKHOUSE_CLIENT} --query "SELECT 2" --user dns_fail_2 --host ${MYHOSTNAME} + +${CLICKHOUSE_CLIENT} --query "DROP USER IF EXISTS dns_fail_1, dns_fail_2" + +${CLICKHOUSE_CLIENT} --query "SYSTEM DROP DNS CACHE" diff --git a/tests/queries/0_stateless/01946_test_zstd_decompression_with_escape_sequence_at_the_end_of_buffer.reference b/tests/queries/0_stateless/01946_test_zstd_decompression_with_escape_sequence_at_the_end_of_buffer.reference new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/queries/0_stateless/01946_test_zstd_decompression_with_escape_sequence_at_the_end_of_buffer.sh b/tests/queries/0_stateless/01946_test_zstd_decompression_with_escape_sequence_at_the_end_of_buffer.sh new file mode 100755 index 00000000000..abca5cdfa3b --- /dev/null +++ b/tests/queries/0_stateless/01946_test_zstd_decompression_with_escape_sequence_at_the_end_of_buffer.sh @@ -0,0 +1,20 @@ +#!/usr/bin/env bash + +CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. "$CUR_DIR"/../shell_config.sh + + +# See 01658_read_file_to_string_column.sh +user_files_path=$(clickhouse-client --query "select _path,_file from file('nonexist.txt', 'CSV', 'val1 char')" 2>&1 | grep Exception | awk '{gsub("/nonexist.txt","",$9); print $9}') +mkdir -p ${user_files_path}/ +cp $CUR_DIR/data_zstd/test_01946.zstd ${user_files_path}/ + +${CLICKHOUSE_CLIENT} --multiline --multiquery --query " +set max_read_buffer_size = 65536; +set input_format_parallel_parsing = 0; +select * from file('test_01946.zstd', 'JSONEachRow', 'foo String') limit 30 format Null; +set input_format_parallel_parsing = 1; +select * from file('test_01946.zstd', 'JSONEachRow', 'foo String') limit 30 format Null; +" + diff --git a/tests/queries/0_stateless/01946_tskv.reference b/tests/queries/0_stateless/01946_tskv.reference new file mode 100644 index 00000000000..5a3b19fa88f --- /dev/null +++ b/tests/queries/0_stateless/01946_tskv.reference @@ -0,0 +1 @@ +can contain = symbol diff --git a/tests/queries/0_stateless/01946_tskv.sh b/tests/queries/0_stateless/01946_tskv.sh new file mode 100755 index 00000000000..ecc18d205d2 --- /dev/null +++ b/tests/queries/0_stateless/01946_tskv.sh @@ -0,0 +1,15 @@ +#!/usr/bin/env bash + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. 
"$CURDIR"/../shell_config.sh + +$CLICKHOUSE_CLIENT --query="DROP TABLE IF EXISTS tskv"; +$CLICKHOUSE_CLIENT --query="CREATE TABLE tskv (text String) ENGINE = Memory"; + +# shellcheck disable=SC2028 +echo -n 'tskv text=can contain \= symbol +' | $CLICKHOUSE_CLIENT --query="INSERT INTO tskv FORMAT TSKV"; + +$CLICKHOUSE_CLIENT --query="SELECT * FROM tskv"; +$CLICKHOUSE_CLIENT --query="DROP TABLE tskv"; diff --git a/tests/queries/0_stateless/01947_multiple_pipe_read.reference b/tests/queries/0_stateless/01947_multiple_pipe_read.reference new file mode 100644 index 00000000000..de88ec2140e --- /dev/null +++ b/tests/queries/0_stateless/01947_multiple_pipe_read.reference @@ -0,0 +1,86 @@ +File generated: +0 BBB +1 BBB +2 BBB +3 BBB +4 AAA +5 BBB +6 AAA +****************** +Read twice from a regular file +0 BBB +1 BBB +2 BBB +3 BBB +4 AAA +5 BBB +6 AAA +0 BBB +1 BBB +2 BBB +3 BBB +4 AAA +5 BBB +6 AAA +--- +0 BBB +1 BBB +2 BBB +3 BBB +4 AAA +5 BBB +6 AAA +--- +0 BBB +1 BBB +2 BBB +3 BBB +4 AAA +5 BBB +6 AAA +0 BBB +1 BBB +2 BBB +3 BBB +4 AAA +5 BBB +6 AAA +****************** +Read twice from file descriptor that corresponds to a regular file +0 BBB +1 BBB +2 BBB +3 BBB +4 AAA +5 BBB +6 AAA +0 BBB +1 BBB +2 BBB +3 BBB +4 AAA +5 BBB +6 AAA +--- +0 BBB +1 BBB +2 BBB +3 BBB +4 AAA +5 BBB +6 AAA +--- +0 BBB +1 BBB +2 BBB +3 BBB +4 AAA +5 BBB +6 AAA +0 BBB +1 BBB +2 BBB +3 BBB +4 AAA +5 BBB +6 AAA diff --git a/tests/queries/0_stateless/01947_multiple_pipe_read.sh b/tests/queries/0_stateless/01947_multiple_pipe_read.sh new file mode 100755 index 00000000000..de9ca47f8cf --- /dev/null +++ b/tests/queries/0_stateless/01947_multiple_pipe_read.sh @@ -0,0 +1,28 @@ +#!/usr/bin/env bash +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. 
"$CURDIR"/../shell_config.sh + +SAMPLE_FILE=$(mktemp 01947_multiple_pipe_read_sample_data_XXXXXX.csv) + +echo 'File generated:' +${CLICKHOUSE_LOCAL} -q "SELECT number AS x, if(number in (4,6), 'AAA', 'BBB') AS s from numbers(7)" > "$SAMPLE_FILE" +cat "$SAMPLE_FILE" + +echo '******************' +echo 'Read twice from a regular file' +${CLICKHOUSE_LOCAL} --structure 'x UInt64, s String' -q 'select * from table; select * from table;' --file "$SAMPLE_FILE" +echo '---' +${CLICKHOUSE_LOCAL} --structure 'x UInt64, s String' -q 'select * from table WHERE x IN (select x from table);' --file "$SAMPLE_FILE" +echo '---' +${CLICKHOUSE_LOCAL} --structure 'x UInt64, s String' -q 'select * from table UNION ALL select * from table;' --file "$SAMPLE_FILE" + +echo '******************' +echo 'Read twice from file descriptor that corresponds to a regular file' +${CLICKHOUSE_LOCAL} --structure 'x UInt64, s String' -q 'select * from table; select * from table;' < "$SAMPLE_FILE" +echo '---' +${CLICKHOUSE_LOCAL} --structure 'x UInt64, s String' -q 'select * from table WHERE x IN (select x from table);' < "$SAMPLE_FILE" +echo '---' +${CLICKHOUSE_LOCAL} --structure 'x UInt64, s String' -q 'select * from table UNION ALL select * from table;' < "$SAMPLE_FILE" + +rm "$SAMPLE_FILE" diff --git a/tests/queries/0_stateless/01947_mv_subquery.reference b/tests/queries/0_stateless/01947_mv_subquery.reference new file mode 100644 index 00000000000..fe65b417907 --- /dev/null +++ b/tests/queries/0_stateless/01947_mv_subquery.reference @@ -0,0 +1,6 @@ +{"test":"1947 #1 CHECK - TRUE","sleep_calls":"0","sleep_microseconds":"0"} +{"test":"1947 #2 CHECK - TRUE","sleep_calls":"2","sleep_microseconds":"2000"} +{"test":"1947 #3 CHECK - TRUE","sleep_calls":"0","sleep_microseconds":"0"} +{"test":"1947 #1 CHECK - FALSE","sleep_calls":"0","sleep_microseconds":"0"} +{"test":"1947 #2 CHECK - FALSE","sleep_calls":"2","sleep_microseconds":"2000"} +{"test":"1947 #3 CHECK - FALSE","sleep_calls":"0","sleep_microseconds":"0"} diff --git a/tests/queries/0_stateless/01947_mv_subquery.sql b/tests/queries/0_stateless/01947_mv_subquery.sql new file mode 100644 index 00000000000..ae67e46e0ae --- /dev/null +++ b/tests/queries/0_stateless/01947_mv_subquery.sql @@ -0,0 +1,145 @@ +SET log_queries=1; +SET log_profile_events=true; + +CREATE TABLE src Engine=MergeTree ORDER BY id AS SELECT number as id, toInt32(1) as value FROM numbers(1); +CREATE TABLE dst (id UInt64, delta Int64) Engine=MergeTree ORDER BY id; + +-- First we try with default values (https://github.com/ClickHouse/ClickHouse/issues/9587) +SET use_index_for_in_with_subqueries = 1; + +CREATE MATERIALIZED VIEW src2dst_true TO dst AS +SELECT + id, + src.value - deltas_sum as delta +FROM src +LEFT JOIN +( + SELECT id, sum(delta) as deltas_sum FROM dst + WHERE id IN (SELECT id FROM src WHERE not sleepEachRow(0.001)) + GROUP BY id +) _a +USING (id); + +-- Inserting 2 numbers should require 2 calls to sleep +INSERT into src SELECT number + 100 as id, 1 FROM numbers(2); + +-- Describe should not need to call sleep +DESCRIBE ( SELECT '1947 #3 QUERY - TRUE', + id, + src.value - deltas_sum as delta + FROM src + LEFT JOIN + ( + SELECT id, sum(delta) as deltas_sum FROM dst + WHERE id IN (SELECT id FROM src WHERE not sleepEachRow(0.001)) + GROUP BY id + ) _a + USING (id) + ) FORMAT Null; + + +SYSTEM FLUSH LOGS; + +SELECT '1947 #1 CHECK - TRUE' as test, + ProfileEvents['SleepFunctionCalls'] as sleep_calls, + ProfileEvents['SleepFunctionMicroseconds'] as sleep_microseconds +FROM system.query_log +WHERE query 
like '%CREATE MATERIALIZED VIEW src2dst_true%' + AND type > 1 + AND current_database = currentDatabase() + AND event_date >= yesterday() + FORMAT JSONEachRow; + +SELECT '1947 #2 CHECK - TRUE' as test, + ProfileEvents['SleepFunctionCalls'] as sleep_calls, + ProfileEvents['SleepFunctionMicroseconds'] as sleep_microseconds +FROM system.query_log +WHERE query like '%INSERT into src SELECT number + 100 as id, 1 FROM numbers(2)%' + AND type > 1 + AND current_database = currentDatabase() + AND event_date >= yesterday() + FORMAT JSONEachRow; + +SELECT '1947 #3 CHECK - TRUE' as test, + ProfileEvents['SleepFunctionCalls'] as sleep_calls, + ProfileEvents['SleepFunctionMicroseconds'] as sleep_microseconds +FROM system.query_log +WHERE query like '%DESCRIBE ( SELECT ''1947 #3 QUERY - TRUE'',%' + AND type > 1 + AND current_database = currentDatabase() + AND event_date >= yesterday() + FORMAT JSONEachRow; + +DROP TABLE src2dst_true; + + +-- Retry the same but using use_index_for_in_with_subqueries = 0 + +SET use_index_for_in_with_subqueries = 0; + +CREATE MATERIALIZED VIEW src2dst_false TO dst AS +SELECT + id, + src.value - deltas_sum as delta +FROM src +LEFT JOIN +( + SELECT id, sum(delta) as deltas_sum FROM dst + WHERE id IN (SELECT id FROM src WHERE not sleepEachRow(0.001)) + GROUP BY id +) _a +USING (id); + +-- Inserting 2 numbers should require 2 calls to sleep +INSERT into src SELECT number + 200 as id, 1 FROM numbers(2); + +-- Describe should not need to call sleep +DESCRIBE ( SELECT '1947 #3 QUERY - FALSE', + id, + src.value - deltas_sum as delta + FROM src + LEFT JOIN + ( + SELECT id, sum(delta) as deltas_sum FROM dst + WHERE id IN (SELECT id FROM src WHERE not sleepEachRow(0.001)) + GROUP BY id + ) _a + USING (id) + ) FORMAT Null; + +SYSTEM FLUSH LOGS; + +SELECT '1947 #1 CHECK - FALSE' as test, + ProfileEvents['SleepFunctionCalls'] as sleep_calls, + ProfileEvents['SleepFunctionMicroseconds'] as sleep_microseconds +FROM system.query_log +WHERE query like '%CREATE MATERIALIZED VIEW src2dst_false%' + AND type > 1 + AND current_database = currentDatabase() + AND event_date >= yesterday() + FORMAT JSONEachRow; + +SELECT '1947 #2 CHECK - FALSE' as test, + ProfileEvents['SleepFunctionCalls'] as sleep_calls, + ProfileEvents['SleepFunctionMicroseconds'] as sleep_microseconds +FROM system.query_log +WHERE query like '%INSERT into src SELECT number + 200 as id, 1 FROM numbers(2)%' + AND type > 1 + AND current_database = currentDatabase() + AND event_date >= yesterday() + FORMAT JSONEachRow; + +SELECT '1947 #3 CHECK - FALSE' as test, + ProfileEvents['SleepFunctionCalls'] as sleep_calls, + ProfileEvents['SleepFunctionMicroseconds'] as sleep_microseconds +FROM system.query_log +WHERE query like '%DESCRIBE ( SELECT ''1947 #3 QUERY - FALSE'',%' + AND type > 1 + AND current_database = currentDatabase() + AND event_date >= yesterday() + FORMAT JSONEachRow; + +DROP TABLE src2dst_false; + +DROP TABLE src; +DROP TABLE dst; diff --git a/tests/queries/0_stateless/01948_dictionary_quoted_database_name.reference b/tests/queries/0_stateless/01948_dictionary_quoted_database_name.reference new file mode 100644 index 00000000000..6a9fb68a92e --- /dev/null +++ b/tests/queries/0_stateless/01948_dictionary_quoted_database_name.reference @@ -0,0 +1,2 @@ +0 Value +0 Value diff --git a/tests/queries/0_stateless/01948_dictionary_quoted_database_name.sql b/tests/queries/0_stateless/01948_dictionary_quoted_database_name.sql new file mode 100644 index 00000000000..21e8e07c724 --- /dev/null +++ 
b/tests/queries/0_stateless/01948_dictionary_quoted_database_name.sql @@ -0,0 +1,38 @@ +DROP DATABASE IF EXISTS `01945.db`; +CREATE DATABASE `01945.db`; + +CREATE TABLE `01945.db`.test_dictionary_values +( + id UInt64, + value String +) ENGINE=TinyLog; + +INSERT INTO `01945.db`.test_dictionary_values VALUES (0, 'Value'); + +CREATE DICTIONARY `01945.db`.test_dictionary +( + id UInt64, + value String +) +PRIMARY KEY id +LAYOUT(DIRECT()) +SOURCE(CLICKHOUSE(DB '01945.db' TABLE 'test_dictionary_values')); + +SELECT * FROM `01945.db`.test_dictionary; +DROP DICTIONARY `01945.db`.test_dictionary; + +CREATE DICTIONARY `01945.db`.`test_dictionary.test` +( + id UInt64, + value String +) +PRIMARY KEY id +LAYOUT(DIRECT()) +SOURCE(CLICKHOUSE(DB '01945.db' TABLE 'test_dictionary_values')); + +SELECT * FROM `01945.db`.`test_dictionary.test`; +DROP DICTIONARY `01945.db`.`test_dictionary.test`; + + +DROP TABLE `01945.db`.test_dictionary_values; +DROP DATABASE `01945.db`; diff --git a/tests/queries/0_stateless/01948_group_bitmap_and_or_xor_fix.reference b/tests/queries/0_stateless/01948_group_bitmap_and_or_xor_fix.reference new file mode 100644 index 00000000000..1cec1260860 --- /dev/null +++ b/tests/queries/0_stateless/01948_group_bitmap_and_or_xor_fix.reference @@ -0,0 +1 @@ +1 1 0 diff --git a/tests/queries/0_stateless/01948_group_bitmap_and_or_xor_fix.sql b/tests/queries/0_stateless/01948_group_bitmap_and_or_xor_fix.sql new file mode 100644 index 00000000000..7a7c603ffa5 --- /dev/null +++ b/tests/queries/0_stateless/01948_group_bitmap_and_or_xor_fix.sql @@ -0,0 +1 @@ +SELECT groupBitmapAnd(bitmapBuild([toInt32(1)])), groupBitmapOr(bitmapBuild([toInt32(1)])), groupBitmapXor(bitmapBuild([toInt32(1)])) FROM cluster(test_cluster_two_shards, numbers(10)); diff --git a/tests/queries/0_stateless/01948_heredoc.reference b/tests/queries/0_stateless/01948_heredoc.reference new file mode 100644 index 00000000000..2f40275e1d2 --- /dev/null +++ b/tests/queries/0_stateless/01948_heredoc.reference @@ -0,0 +1,14 @@ + +VALUE + +VALUE +\'VALUE\' +$do$ $ doc$ $doc $ $doco$ +$do$ $ doc$ $doc $ $doco$ $do$ $ doc$ $doc $ $doco$ +ТЕСТ +该类型的引擎 +VALUE +VALUE +\nvalue1\nvalue2\nvalue3\n +\'\\xc3\\x28\' +\'\\xc3\\x28\' diff --git a/tests/queries/0_stateless/01948_heredoc.sql b/tests/queries/0_stateless/01948_heredoc.sql new file mode 100644 index 00000000000..4a4ced004e3 --- /dev/null +++ b/tests/queries/0_stateless/01948_heredoc.sql @@ -0,0 +1,22 @@ +SELECT $$$$; +SELECT $$VALUE$$; +SELECT $doc$$doc$; +SELECT $doc$VALUE$doc$; +SELECT $doc$'VALUE'$doc$; +SELECT $doc$$do$ $ doc$ $doc $ $doco$$doc$; +SELECT $doc$$do$ $ doc$ $doc $ $doco$$doc$, $doc$$do$ $ doc$ $doc $ $doco$$doc$; + +SELECT $doc$ТЕСТ$doc$; +SELECT $doc$该类型的引擎$doc$; + +SELECT $РАЗДЕЛИТЕЛЬ$VALUE$РАЗДЕЛИТЕЛЬ$; +SELECT $该类型的引擎$VALUE$该类型的引擎$; + +SELECT $$ +value1 +value2 +value3 +$$; + +SELECT $doc$'\xc3\x28'$doc$; +SELECT $\xc3\x28$'\xc3\x28'$\xc3\x28$; diff --git a/tests/queries/0_stateless/01949_clickhouse_local_with_remote_localhost.reference b/tests/queries/0_stateless/01949_clickhouse_local_with_remote_localhost.reference new file mode 100644 index 00000000000..e342541fca6 --- /dev/null +++ b/tests/queries/0_stateless/01949_clickhouse_local_with_remote_localhost.reference @@ -0,0 +1,8 @@ +Code:81. +"test2",0 +"test2",1 +"test2",2 +Code:81.
+"test4",0 +"test4",1 +"test4",2 diff --git a/tests/queries/0_stateless/01949_clickhouse_local_with_remote_localhost.sh b/tests/queries/0_stateless/01949_clickhouse_local_with_remote_localhost.sh new file mode 100755 index 00000000000..efd49d47017 --- /dev/null +++ b/tests/queries/0_stateless/01949_clickhouse_local_with_remote_localhost.sh @@ -0,0 +1,36 @@ +#!/usr/bin/env bash + +CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. "$CUR_DIR"/../shell_config.sh + + +${CLICKHOUSE_CLIENT} --query "CREATE TABLE ${CLICKHOUSE_DATABASE}.remote_table (a Int64) ENGINE=TinyLog AS SELECT * FROM system.numbers limit 10;" + +if [ "$CLICKHOUSE_HOST" == "localhost" ]; then + # Connecting to 127.0.0.1 will connect to clickhouse-local itself, where the table doesn't exist + ${CLICKHOUSE_LOCAL} -q "SELECT 'test1', * FROM remote('127.0.0.1', '${CLICKHOUSE_DATABASE}.remote_table') LIMIT 3;" 2>&1 | awk '{print $1 $2}' + + # Now connecting to 127.0.0.1:9000 will connect to the database we are running tests against + ${CLICKHOUSE_LOCAL} -q "SELECT 'test2', * FROM remote('127.0.0.1:${CLICKHOUSE_PORT_TCP}', '${CLICKHOUSE_DATABASE}.remote_table') LIMIT 3 FORMAT CSV;" 2>&1 \ + | grep -av "ASan doesn't fully support makecontext/swapcontext functions" + + # Same test now against localhost + ${CLICKHOUSE_LOCAL} -q "SELECT 'test3', * FROM remote('localhost', '${CLICKHOUSE_DATABASE}.remote_table') LIMIT 3;" 2>&1 | awk '{print $1 $2}' + + ${CLICKHOUSE_LOCAL} -q "SELECT 'test4', * FROM remote('localhost:${CLICKHOUSE_PORT_TCP}', '${CLICKHOUSE_DATABASE}.remote_table') LIMIT 3 FORMAT CSV;" 2>&1 \ + | grep -av "ASan doesn't fully support makecontext/swapcontext functions" +else + # Can't test without localhost + echo Code:81. + echo \"test2\",0 + echo \"test2\",1 + echo \"test2\",2 + echo Code:81. + echo \"test4\",0 + echo \"test4\",1 + echo \"test4\",2 +fi + + +${CLICKHOUSE_CLIENT} --query "DROP TABLE ${CLICKHOUSE_DATABASE}.remote_table;" diff --git a/tests/queries/0_stateless/01949_heredoc_unfinished.reference b/tests/queries/0_stateless/01949_heredoc_unfinished.reference new file mode 100644 index 00000000000..234840fd3dd --- /dev/null +++ b/tests/queries/0_stateless/01949_heredoc_unfinished.reference @@ -0,0 +1,6 @@ +doc +abc +abc +1 +1 +1 diff --git a/tests/queries/0_stateless/01949_heredoc_unfinished.sh b/tests/queries/0_stateless/01949_heredoc_unfinished.sh new file mode 100755 index 00000000000..8ab9ffd6406 --- /dev/null +++ b/tests/queries/0_stateless/01949_heredoc_unfinished.sh @@ -0,0 +1,12 @@ +#!/usr/bin/env bash + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. 
"$CURDIR"/../shell_config.sh + +curl -sS "${CLICKHOUSE_URL}&query=SELECT%20%24%24doc%24%24VALUE%24%24doc%24%24"; +curl -sS "${CLICKHOUSE_URL}&query=SELECT%20%24%24abc%24%24"; +curl -sS "${CLICKHOUSE_URL}&query=SELECT%20%24%24abc%24%24d"; +curl -sS "${CLICKHOUSE_URL}&query=SELECT%20%24%24ab" | grep -c "DB::Exception"; +curl -sS "${CLICKHOUSE_URL}&query=SELECT%20%24%24" | grep -c "DB::Exception"; +curl -sS "${CLICKHOUSE_URL}&query=SELECT%20%24" | grep -c "DB::Exception"; diff --git a/tests/queries/0_stateless/01950_kill_large_group_by_query.reference b/tests/queries/0_stateless/01950_kill_large_group_by_query.reference new file mode 100644 index 00000000000..1602d6587ad --- /dev/null +++ b/tests/queries/0_stateless/01950_kill_large_group_by_query.reference @@ -0,0 +1,2 @@ +finished test_01948_tcp_default default SELECT * FROM\n (\n SELECT a.name as n\n FROM\n (\n SELECT \'Name\' as name, number FROM system.numbers LIMIT 2000000\n ) AS a,\n (\n SELECT \'Name\' as name, number FROM system.numbers LIMIT 2000000\n ) as b\n GROUP BY n\n )\n LIMIT 20\n FORMAT Null +finished test_01948_http_default default SELECT * FROM\n (\n SELECT a.name as n\n FROM\n (\n SELECT \'Name\' as name, number FROM system.numbers LIMIT 2000000\n ) AS a,\n (\n SELECT \'Name\' as name, number FROM system.numbers LIMIT 2000000\n ) as b\n GROUP BY n\n )\n LIMIT 20\n FORMAT Null diff --git a/tests/queries/0_stateless/01950_kill_large_group_by_query.sh b/tests/queries/0_stateless/01950_kill_large_group_by_query.sh new file mode 100755 index 00000000000..465b923187e --- /dev/null +++ b/tests/queries/0_stateless/01950_kill_large_group_by_query.sh @@ -0,0 +1,54 @@ +#!/usr/bin/env bash + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. "$CURDIR"/../shell_config.sh + +set -e -o pipefail + +function wait_for_query_to_start() +{ + while [[ $($CLICKHOUSE_CLIENT -q "SELECT count() FROM system.processes WHERE query_id = '$1'") == 0 ]]; do sleep 0.1; done +} + + +# TCP CLIENT + +$CLICKHOUSE_CLIENT --max_execution_time 10 --query_id "test_01948_tcp_$CLICKHOUSE_DATABASE" -q \ + "SELECT * FROM + ( + SELECT a.name as n + FROM + ( + SELECT 'Name' as name, number FROM system.numbers LIMIT 2000000 + ) AS a, + ( + SELECT 'Name' as name, number FROM system.numbers LIMIT 2000000 + ) as b + GROUP BY n + ) + LIMIT 20 + FORMAT Null" > /dev/null 2>&1 & +wait_for_query_to_start "test_01948_tcp_$CLICKHOUSE_DATABASE" +$CLICKHOUSE_CLIENT --max_execution_time 10 -q "KILL QUERY WHERE query_id = 'test_01948_tcp_$CLICKHOUSE_DATABASE' SYNC" + + +# HTTP CLIENT + +${CLICKHOUSE_CURL_COMMAND} -q --max-time 10 -sS "$CLICKHOUSE_URL&query_id=test_01948_http_$CLICKHOUSE_DATABASE" -d \ + "SELECT * FROM + ( + SELECT a.name as n + FROM + ( + SELECT 'Name' as name, number FROM system.numbers LIMIT 2000000 + ) AS a, + ( + SELECT 'Name' as name, number FROM system.numbers LIMIT 2000000 + ) as b + GROUP BY n + ) + LIMIT 20 + FORMAT Null" > /dev/null 2>&1 & +wait_for_query_to_start "test_01948_http_$CLICKHOUSE_DATABASE" +$CLICKHOUSE_CURL --max-time 10 -sS "$CLICKHOUSE_URL" -d "KILL QUERY WHERE query_id = 'test_01948_http_$CLICKHOUSE_DATABASE' SYNC" diff --git a/tests/queries/0_stateless/01954_clickhouse_benchmark_multiple_long.config.xml b/tests/queries/0_stateless/01954_clickhouse_benchmark_multiple_long.config.xml new file mode 100644 index 00000000000..e51779006ab --- /dev/null +++ b/tests/queries/0_stateless/01954_clickhouse_benchmark_multiple_long.config.xml @@ -0,0 +1,33 @@ + + + + trace + true + + + 9000 + + 0 + + + + + 
+ + ::/0 + + + default + default + 1 + + + + + + + + + + + diff --git a/tests/queries/0_stateless/01954_clickhouse_benchmark_multiple_long.reference b/tests/queries/0_stateless/01954_clickhouse_benchmark_multiple_long.reference new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/queries/0_stateless/01954_clickhouse_benchmark_multiple_long.sh b/tests/queries/0_stateless/01954_clickhouse_benchmark_multiple_long.sh new file mode 100755 index 00000000000..069695c16db --- /dev/null +++ b/tests/queries/0_stateless/01954_clickhouse_benchmark_multiple_long.sh @@ -0,0 +1,186 @@ +#!/usr/bin/env bash + +CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. "$CUR_DIR"/../shell_config.sh + +BASE="$CUR_DIR/$(basename "${BASH_SOURCE[0]}" .sh)" + +server_pids=() +paths=() + +function cleanup() +{ + local pid + for pid in "${server_pids[@]}"; do + kill -9 "$pid" + done + + echo "Test failed." >&2 + tail -n1000 "$BASE".clickhouse-server*.log "$BASE".clickhouse-benchmark*.log >&2 + rm -f "$BASE".clickhouse-server*.log "$BASE".clickhouse-benchmark*.log + + local path + for path in "${paths[@]}"; do + rm -fr "$path" + done + + exit 1 +} + +function start_server() +{ + local log=$1 && shift + + local server_opts=( + "--config-file=$BASE.config.xml" + "--" + # we will discover the real port later. + "--tcp_port=0" + "--shutdown_wait_unfinished=0" + "--listen_host=127.1" + ) + CLICKHOUSE_WATCHDOG_ENABLE=0 $CLICKHOUSE_SERVER_BINARY "${server_opts[@]}" "$@" >& "$log" & + local pid=$! + + echo "$pid" +} + +function get_server_port() +{ + local pid=$1 && shift + local port='' i=0 retries=300 + # wait until server will start to listen (max 30 seconds) + while [[ -z $port ]] && [[ $i -lt $retries ]]; do + port="$(lsof -n -a -P -i tcp -s tcp:LISTEN -p "$pid" 2>/dev/null | awk -F'[ :]' '/LISTEN/ { print $(NF-1) }')" + ((++i)) + sleep 0.1 + done + if [[ -z $port ]]; then + echo "Cannot wait for LISTEN socket" >&2 + exit 1 + fi + echo "$port" +} + +function wait_server_port() +{ + local port=$1 && shift + # wait for the server to start accepting tcp connections (max 30 seconds) + local i=0 retries=300 + while ! $CLICKHOUSE_CLIENT_BINARY --host 127.1 --port "$port" --format Null -q 'select 1' 2>/dev/null && [[ $i -lt $retries ]]; do + sleep 0.1 + done + if ! 
$CLICKHOUSE_CLIENT_BINARY --host 127.1 --port "$port" --format Null -q 'select 1'; then + echo "Cannot wait until server will start accepting connections on " >&2 + exit 1 + fi +} + +function execute_query() +{ + local port=$1 && shift + $CLICKHOUSE_CLIENT_BINARY --host 127.1 --port "$port" "$@" +} + +function make_server() +{ + local log=$1 && shift + + local pid + pid="$(start_server "$log" "$@")" + + local port + port="$(get_server_port "$pid")" + wait_server_port "$port" + + echo "$pid" "$port" +} + +function terminate_servers() +{ + local pid + for pid in "${server_pids[@]}"; do + kill -9 "$pid" + # NOTE: we cannot wait the server pid since it was created in a subshell + done + + rm -f "$BASE".clickhouse-server*.log "$BASE".clickhouse-benchmark*.log + + local path + for path in "${paths[@]}"; do + rm -fr "$path" + done +} + +function test_clickhouse_benchmark_multi_hosts() +{ + local benchmark_opts=( + --iterations 10000 + --host 127.1 --port "$port1" + --host 127.1 --port "$port2" + --query 'select 1' + --concurrency 10 + ) + clickhouse-benchmark "${benchmark_opts[@]}" >& "$(mktemp "$BASE.clickhouse-benchmark.XXXXXX.log")" + + local queries1 queries2 + queries1="$(execute_query "$port1" --query "select value from system.events where event = 'Query'")" + queries2="$(execute_query "$port2" --query "select value from system.events where event = 'Query'")" + + if [[ $queries1 -lt 4000 ]] || [[ $queries1 -gt 6000 ]]; then + echo "server1 (port=$port1) handled $queries1 queries" >&2 + fi + if [[ $queries2 -lt 4000 ]] || [[ $queries2 -gt 6000 ]]; then + echo "server1 (port=$port2) handled $queries2 queries" >&2 + fi +} +function test_clickhouse_benchmark_multi_hosts_roundrobin() +{ + local benchmark_opts=( + --iterations 10000 + --host 127.1 --port "$port1" + --host 127.1 --port "$port2" + --query 'select 1' + --concurrency 10 + --roundrobin + ) + clickhouse-benchmark "${benchmark_opts[@]}" >& "$(mktemp "$BASE.clickhouse-benchmark.XXXXXX.log")" + + local queries1 queries2 + queries1="$(execute_query "$port1" --query "select value from system.events where event = 'Query'")" + queries2="$(execute_query "$port2" --query "select value from system.events where event = 'Query'")" + + # NOTE: it should take into account test_clickhouse_benchmark_multi_hosts queries too. 
+ # that's why it is [9000, 11000] instead of [4000, 6000] + if [[ $queries1 -lt 9000 ]] || [[ $queries1 -gt 11000 ]]; then + echo "server1 (port=$port1) handled $queries1 queries (with --roundrobin)" >&2 + fi + if [[ $queries2 -lt 9000 ]] || [[ $queries2 -gt 11000 ]]; then + echo "server1 (port=$port2) handled $queries2 queries (with --roundrobin)" >&2 + fi +} + +function main() +{ + trap cleanup EXIT + + local path port1 port2 + + path="$(mktemp -d "$BASE.server1.XXXXXX")" + paths+=( "$path" ) + read -r pid1 port1 <<<"$(make_server "$(mktemp "$BASE.clickhouse-server-XXXXXX.log")" --path "$path")" + server_pids+=( "$pid1" ) + + path="$(mktemp -d "$BASE.server2.XXXXXX")" + paths+=( "$path" ) + read -r pid2 port2 <<<"$(make_server "$(mktemp "$BASE.clickhouse-server-XXXXXX.log")" --path "$path")" + server_pids+=( "$pid2" ) + + test_clickhouse_benchmark_multi_hosts + test_clickhouse_benchmark_multi_hosts_roundrobin + + terminate_servers + trap '' EXIT +} +main "$@" diff --git a/tests/queries/0_stateless/01955_clickhouse_benchmark_connection_hang.reference b/tests/queries/0_stateless/01955_clickhouse_benchmark_connection_hang.reference new file mode 100644 index 00000000000..9e388b62601 --- /dev/null +++ b/tests/queries/0_stateless/01955_clickhouse_benchmark_connection_hang.reference @@ -0,0 +1,3 @@ +Loaded 1 queries. +I/O error: Too many open files +70 diff --git a/tests/queries/0_stateless/01955_clickhouse_benchmark_connection_hang.sh b/tests/queries/0_stateless/01955_clickhouse_benchmark_connection_hang.sh new file mode 100755 index 00000000000..9085d646a28 --- /dev/null +++ b/tests/queries/0_stateless/01955_clickhouse_benchmark_connection_hang.sh @@ -0,0 +1,71 @@ +#!/usr/bin/env bash +# shellcheck disable=SC2086 + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. "$CURDIR"/../shell_config.sh + +# NOTE: Tests with limit for number of opened files cannot be run under UBsan. +# +# UBsan needs to create pipe each time it need to check the type: +# +# pipe() +# __sanitizer::IsAccessibleMemoryRange(unsigned long, unsigned long) +# __ubsan::checkDynamicType(void*, void*, unsigned long) + 271 +# HandleDynamicTypeCacheMiss(__ubsan::DynamicTypeCacheMissData*, unsigned long, unsigned long, __ubsan::ReportOptions) + 34 +# __ubsan_handle_dynamic_type_cache_miss_abort + 58 +# +# Obviously it will fail if RLIMIT_NOFILE exceeded (like in this test), and the UBsan will falsely report [1]: +# +# 01955_clickhouse_benchmark_connection_hang: [ FAIL ] 1.56 sec. - result differs with reference: +# --- /usr/share/clickhouse-test/queries/0_stateless/01955_clickhouse_benchmark_connection_hang.reference 2021-07-21 11:14:58.000000000 +0300 +# +++ /tmp/clickhouse-test/0_stateless/01955_clickhouse_benchmark_connection_hang.stdout 2021-07-21 11:53:45.684050372 +0300 +# @@ -1,3 +1,22 @@ +# Loaded 1 queries. 
+# -I/O error: Too many open files +# -70 +# +../contrib/libcxx/include/memory:3212:19: runtime error: member call on address 0x00002939d5c0 which does not point to an object of type 'std::__1::__shared_weak_count' +# +0x00002939d5c0: note: object has invalid vptr +# + +# +==558==WARNING: Can't create a socket pair to start external symbolizer (errno: 24) +# +==558==WARNING: Can't create a socket pair to start external symbolizer (errno: 24) +# +==558==WARNING: Can't create a socket pair to start external symbolizer (errno: 24) +# +==558==WARNING: Can't create a socket pair to start external symbolizer (errno: 24) +# +==558==WARNING: Can't create a socket pair to start external symbolizer (errno: 24) +# +==558==WARNING: Failed to use and restart external symbolizer! +# + #0 0xfe86b57 (/usr/bin/clickhouse+0xfe86b57) +# + #1 0xfe83fd7 (/usr/bin/clickhouse+0xfe83fd7) +# + #2 0xfe89af4 (/usr/bin/clickhouse+0xfe89af4) +# + #3 0xfe81fa9 (/usr/bin/clickhouse+0xfe81fa9) +# + #4 0x1f377609 (/usr/bin/clickhouse+0x1f377609) +# + #5 0xfe7e2a1 (/usr/bin/clickhouse+0xfe7e2a1) +# + #6 0xfce1003 (/usr/bin/clickhouse+0xfce1003) +# + #7 0x7f3345bd30b2 (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) +# + #8 0xfcbf0ed (/usr/bin/clickhouse+0xfcbf0ed) +# + +# +SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../contrib/libcxx/include/memory:3212:19 in +# +1 +# +# Stacktrace from lldb: +# +# thread #1, name = 'clickhouse-benc', stop reason = Dynamic type mismatch +# * frame #0: 0x000000000fffc070 clickhouse`__ubsan_on_report +# frame #1: 0x000000000fff6511 clickhouse`__ubsan::Diag::~Diag() + 209 +# frame #2: 0x000000000fffcb11 clickhouse`HandleDynamicTypeCacheMiss(__ubsan::DynamicTypeCacheMissData*, unsigned long, unsigned long, __ubsan::ReportOptions) + 609 +# frame #3: 0x000000000fffcf2a clickhouse`__ubsan_handle_dynamic_type_cache_miss_abort + 58 +# frame #4: 0x00000000101a33f8 clickhouse`std::__1::shared_ptr::PoolEntryHelper>::~shared_ptr(this=) + 152 at memory:3212 +# frame #5: 0x00000000101a267a clickhouse`PoolBase::Entry::~Entry(this=) + 26 at PoolBase.h:67 +# frame #6: 0x00000000101a0878 clickhouse`DB::ConnectionPool::get(this=, timeouts=0x00007fffffffc278, settings=, force_connected=true) + 664 at ConnectionPool.h:93 +# frame #7: 0x00000000101a6395 clickhouse`DB::Benchmark::runBenchmark(this=) + 981 at Benchmark.cpp:309 +# frame #8: 0x000000001019e84a clickhouse`DB::Benchmark::main(this=0x00007fffffffd8c8, (null)=) + 586 at Benchmark.cpp:128 +# frame #9: 0x000000001f5d028a clickhouse`Poco::Util::Application::run(this=0x00007fffffffd8c8) + 42 at Application.cpp:334 +# frame #10: 0x000000001019ab42 clickhouse`mainEntryClickHouseBenchmark(argc=, argv=) + 6978 at Benchmark.cpp:655 +# frame #11: 0x000000000fffdfc4 clickhouse`main(argc_=, argv_=) + 356 at main.cpp:366 +# frame #12: 0x00007ffff7de6d0a libc.so.6`__libc_start_main(main=(clickhouse`main at main.cpp:339), argc=7, argv=0x00007fffffffe1e8, init=, fini=, rtld_fini=, stack_end=0x00007fffffffe1d8) + 234 at libc-start.c:308 +# frame #13: 0x000000000ffdc0aa clickhouse`_start + 42 +# +# [1]: https://clickhouse-test-reports.s3.yandex.net/26656/f17ca450ac991603e6400c7caef49c493ac69739/functional_stateless_tests_(ubsan).html#fail1 + +# Limit number of files to 50, and we will get EMFILE for some of socket() +prlimit --nofile=50 $CLICKHOUSE_BENCHMARK --iterations 1 --concurrency 50 --query 'select 1' 2>&1 +echo $? 
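+
+# A generic sketch, assuming a Linux /proc filesystem (the test itself does not rely on it):
+# to see how close a process is to its RLIMIT_NOFILE, compare the soft limit with the number
+# of descriptors it currently holds:
+#
+#   ulimit -Sn                 # soft open-file limit of the current shell
+#   ls /proc/$$/fd | wc -l     # descriptors the shell has open right now
+#
+# With the limit forced down to 50 above, clickhouse-benchmark runs out of descriptors while
+# opening client sockets, so it reports "Too many open files" and exits with code 70 (see the
+# reference file) instead of hanging.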
diff --git a/tests/queries/0_stateless/01956_skip_unavailable_shards_excessive_attempts.reference b/tests/queries/0_stateless/01956_skip_unavailable_shards_excessive_attempts.reference new file mode 100644 index 00000000000..f2322e4ffc4 --- /dev/null +++ b/tests/queries/0_stateless/01956_skip_unavailable_shards_excessive_attempts.reference @@ -0,0 +1 @@ +Connection failed at try №1, diff --git a/tests/queries/0_stateless/01956_skip_unavailable_shards_excessive_attempts.sh b/tests/queries/0_stateless/01956_skip_unavailable_shards_excessive_attempts.sh new file mode 100755 index 00000000000..0e2f1857aed --- /dev/null +++ b/tests/queries/0_stateless/01956_skip_unavailable_shards_excessive_attempts.sh @@ -0,0 +1,13 @@ +#!/usr/bin/env bash + +CLICKHOUSE_CLIENT_SERVER_LOGS_LEVEL=trace + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. "$CURDIR"/../shell_config.sh + +opts=( + "--connections_with_failover_max_tries=1" + "--skip_unavailable_shards=1" +) +$CLICKHOUSE_CLIENT --query "select * from remote('255.255.255.255', system.one)" "${opts[@]}" 2>&1 | grep -o 'Connection failed at try.*,' diff --git a/tests/queries/0_stateless/01957_heredoc_more.reference b/tests/queries/0_stateless/01957_heredoc_more.reference new file mode 100644 index 00000000000..e092ff416d6 --- /dev/null +++ b/tests/queries/0_stateless/01957_heredoc_more.reference @@ -0,0 +1 @@ +FFFE80BB diff --git a/tests/queries/0_stateless/01957_heredoc_more.sql b/tests/queries/0_stateless/01957_heredoc_more.sql new file mode 100644 index 00000000000..61681ad39c0 --- /dev/null +++ b/tests/queries/0_stateless/01957_heredoc_more.sql @@ -0,0 +1 @@ +SELECT hex($$ÿþ€»$$); diff --git a/tests/queries/0_stateless/01960_lambda_precedence.reference b/tests/queries/0_stateless/01960_lambda_precedence.reference new file mode 100644 index 00000000000..96e36988183 --- /dev/null +++ b/tests/queries/0_stateless/01960_lambda_precedence.reference @@ -0,0 +1,3 @@ +1000 [2,3,4] 1010 +1 +1 diff --git a/tests/queries/0_stateless/01960_lambda_precedence.sql b/tests/queries/0_stateless/01960_lambda_precedence.sql new file mode 100644 index 00000000000..a3ff1424cf2 --- /dev/null +++ b/tests/queries/0_stateless/01960_lambda_precedence.sql @@ -0,0 +1,26 @@ +SELECT + 1000 AS a, + arrayMap(a -> (a + 1), [1, 2, 3]), + a + 10 as c; + + +-- https://github.com/ClickHouse/ClickHouse/issues/5046 +SELECT sum(c1) AS v +FROM + ( + SELECT + 1 AS c1, + ['v'] AS c2 + ) +WHERE arrayExists(v -> (v = 'v'), c2); + + +SELECT sum(c1) AS v +FROM + ( + SELECT + 1 AS c1, + ['v'] AS c2, + ['d'] AS d + ) +WHERE arrayExists(i -> (d = ['d']), c2); diff --git a/tests/queries/0_stateless/01999_grant_with_replace.reference b/tests/queries/0_stateless/01999_grant_with_replace.reference new file mode 100644 index 00000000000..9e089a05e52 --- /dev/null +++ b/tests/queries/0_stateless/01999_grant_with_replace.reference @@ -0,0 +1,36 @@ +CREATE USER test_user_01999 +A +B +GRANT SELECT ON db1.* TO test_user_01999 +GRANT SHOW TABLES, SHOW COLUMNS, SHOW DICTIONARIES ON db2.tb2 TO test_user_01999 +C +GRANT SELECT(col1) ON db3.table TO test_user_01999 +D +GRANT SELECT(col3) ON db3.table3 TO test_user_01999 +GRANT SELECT(col1, col2) ON db4.table4 TO test_user_01999 +E +GRANT SELECT(cola) ON db5.table TO test_user_01999 +GRANT INSERT(colb) ON db6.tb61 TO test_user_01999 +GRANT SHOW ON db7.* TO test_user_01999 +F +GRANT SELECT ON all.* TO test_user_01999 +G +H +GRANT SELECT ON db1.tb1 TO test_user_01999 +GRANT test_role_01999 TO test_user_01999 +I +GRANT
test_role_01999 TO test_user_01999 +J +GRANT SHOW ON db8.* TO test_user_01999 +GRANT test_role_01999 TO test_user_01999 +K +GRANT SHOW ON db8.* TO test_user_01999 +L +GRANT SELECT ON db9.tb3 TO test_user_01999 +M +GRANT SELECT ON db9.tb3 TO test_user_01999 +GRANT test_role_01999 TO test_user_01999 +N +GRANT SELECT ON db9.tb3 TO test_user_01999 +GRANT test_role_01999_1 TO test_user_01999 +O diff --git a/tests/queries/0_stateless/01999_grant_with_replace.sql b/tests/queries/0_stateless/01999_grant_with_replace.sql new file mode 100644 index 00000000000..31a9187c0d2 --- /dev/null +++ b/tests/queries/0_stateless/01999_grant_with_replace.sql @@ -0,0 +1,75 @@ +DROP USER IF EXISTS test_user_01999; + +CREATE USER test_user_01999; +SHOW CREATE USER test_user_01999; + +SELECT 'A'; +SHOW GRANTS FOR test_user_01999; + +GRANT SELECT ON db1.* TO test_user_01999; +GRANT SHOW ON db2.tb2 TO test_user_01999; + +SELECT 'B'; +SHOW GRANTS FOR test_user_01999; + +GRANT SELECT(col1) ON db3.table TO test_user_01999 WITH REPLACE OPTION; + +SELECT 'C'; +SHOW GRANTS FOR test_user_01999; + +GRANT SELECT(col3) ON db3.table3, SELECT(col1, col2) ON db4.table4 TO test_user_01999 WITH REPLACE OPTION; + +SELECT 'D'; +SHOW GRANTS FOR test_user_01999; + +GRANT SELECT(cola) ON db5.table, INSERT(colb) ON db6.tb61, SHOW ON db7.* TO test_user_01999 WITH REPLACE OPTION; + +SELECT 'E'; +SHOW GRANTS FOR test_user_01999; + +SELECT 'F'; +GRANT SELECT ON all.* TO test_user_01999 WITH REPLACE OPTION; +SHOW GRANTS FOR test_user_01999; + +SELECT 'G'; +GRANT USAGE ON *.* TO test_user_01999 WITH REPLACE OPTION; +SHOW GRANTS FOR test_user_01999; + +SELECT 'H'; +DROP ROLE IF EXISTS test_role_01999; +CREATE role test_role_01999; +GRANT test_role_01999 to test_user_01999; +GRANT SELECT ON db1.tb1 TO test_user_01999; +SHOW GRANTS FOR test_user_01999; + +SELECT 'I'; +GRANT NONE ON *.* TO test_user_01999 WITH REPLACE OPTION; +SHOW GRANTS FOR test_user_01999; + +SELECT 'J'; +GRANT SHOW ON db8.* TO test_user_01999; +SHOW GRANTS FOR test_user_01999; + +SELECT 'K'; +GRANT NONE TO test_user_01999 WITH REPLACE OPTION; +SHOW GRANTS FOR test_user_01999; + +SELECT 'L'; +GRANT NONE ON *.*, SELECT on db9.tb3 TO test_user_01999 WITH REPLACE OPTION; +SHOW GRANTS FOR test_user_01999; + +SELECT 'M'; +GRANT test_role_01999 to test_user_01999; +SHOW GRANTS FOR test_user_01999; + +SELECT 'N'; +DROP ROLE IF EXISTS test_role_01999_1; +CREATE role test_role_01999_1; +GRANT NONE, test_role_01999_1 TO test_user_01999 WITH REPLACE OPTION; +SHOW GRANTS FOR test_user_01999; + +DROP USER IF EXISTS test_user_01999; +DROP ROLE IF EXISTS test_role_01999; +DROP ROLE IF EXISTS test_role_01999_1; + +SELECT 'O'; diff --git a/tests/queries/0_stateless/02000_default_from_default_empty_column.reference b/tests/queries/0_stateless/02000_default_from_default_empty_column.reference new file mode 100644 index 00000000000..bb48d0eda85 --- /dev/null +++ b/tests/queries/0_stateless/02000_default_from_default_empty_column.reference @@ -0,0 +1 @@ +1 diff --git a/tests/queries/0_stateless/02000_default_from_default_empty_column.sql b/tests/queries/0_stateless/02000_default_from_default_empty_column.sql new file mode 100644 index 00000000000..5ca642628d4 --- /dev/null +++ b/tests/queries/0_stateless/02000_default_from_default_empty_column.sql @@ -0,0 +1,17 @@ +DROP TABLE IF EXISTS test; + +CREATE TABLE test (col Int8) ENGINE=MergeTree ORDER BY tuple() +SETTINGS vertical_merge_algorithm_min_rows_to_activate=1, + vertical_merge_algorithm_min_columns_to_activate=1, + min_bytes_for_wide_part = 0; 
+ + +INSERT INTO test VALUES (1); +ALTER TABLE test ADD COLUMN s1 String; +ALTER TABLE test ADD COLUMN s2 String DEFAULT s1; + +OPTIMIZE TABLE test FINAL; + +SELECT * FROM test; + +DROP TABLE IF EXISTS test; diff --git a/tests/queries/0_stateless/02000_table_function_cluster_macros.reference b/tests/queries/0_stateless/02000_table_function_cluster_macros.reference new file mode 100644 index 00000000000..6ed281c757a --- /dev/null +++ b/tests/queries/0_stateless/02000_table_function_cluster_macros.reference @@ -0,0 +1,2 @@ +1 +1 diff --git a/tests/queries/0_stateless/02000_table_function_cluster_macros.sql b/tests/queries/0_stateless/02000_table_function_cluster_macros.sql new file mode 100644 index 00000000000..f1bc1358b55 --- /dev/null +++ b/tests/queries/0_stateless/02000_table_function_cluster_macros.sql @@ -0,0 +1,2 @@ +SELECT _shard_num FROM cluster("{default_cluster_macro}", system.one); +SELECT _shard_num FROM clusterAllReplicas("{default_cluster_macro}", system.one); diff --git a/tests/queries/0_stateless/02001_add_default_database_to_system_users.reference b/tests/queries/0_stateless/02001_add_default_database_to_system_users.reference new file mode 100644 index 00000000000..bec3a35ee8b --- /dev/null +++ b/tests/queries/0_stateless/02001_add_default_database_to_system_users.reference @@ -0,0 +1 @@ +system diff --git a/tests/queries/0_stateless/02001_add_default_database_to_system_users.sql b/tests/queries/0_stateless/02001_add_default_database_to_system_users.sql new file mode 100644 index 00000000000..b006f9acb22 --- /dev/null +++ b/tests/queries/0_stateless/02001_add_default_database_to_system_users.sql @@ -0,0 +1,3 @@ +create user if not exists u_02001 default database system; +select default_database from system.users where name = 'u_02001'; +drop user if exists u_02001; diff --git a/tests/queries/0_stateless/02001_compress_output_file.reference b/tests/queries/0_stateless/02001_compress_output_file.reference new file mode 100644 index 00000000000..6f51dfc24e1 --- /dev/null +++ b/tests/queries/0_stateless/02001_compress_output_file.reference @@ -0,0 +1,2 @@ +Hello, World! From client. +Hello, World! From local. diff --git a/tests/queries/0_stateless/02001_compress_output_file.sh b/tests/queries/0_stateless/02001_compress_output_file.sh new file mode 100755 index 00000000000..11df227cc14 --- /dev/null +++ b/tests/queries/0_stateless/02001_compress_output_file.sh @@ -0,0 +1,23 @@ +#!/usr/bin/env bash + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. "$CURDIR"/../shell_config.sh + +set -e + +[ -e "${CLICKHOUSE_TMP}"/test_compression_of_output_file_from_client.gz ] && rm "${CLICKHOUSE_TMP}"/test_compression_of_output_file_from_client.gz + +${CLICKHOUSE_CLIENT} --query "SELECT * FROM (SELECT 'Hello, World! From client.') INTO OUTFILE '${CLICKHOUSE_TMP}/test_compression_of_output_file_from_client.gz'" +gunzip ${CLICKHOUSE_TMP}/test_compression_of_output_file_from_client.gz +cat ${CLICKHOUSE_TMP}/test_compression_of_output_file_from_client + +rm -f "${CLICKHOUSE_TMP}/test_compression_of_output_file_from_client" + +[ -e "${CLICKHOUSE_TMP}"/test_compression_of_output_file_from_local.gz ] && rm "${CLICKHOUSE_TMP}"/test_compression_of_output_file_from_local.gz + +${CLICKHOUSE_LOCAL} --query "SELECT * FROM (SELECT 'Hello, World! 
From local.') INTO OUTFILE '${CLICKHOUSE_TMP}/test_compression_of_output_file_from_local.gz'" +gunzip ${CLICKHOUSE_TMP}/test_compression_of_output_file_from_local.gz +cat ${CLICKHOUSE_TMP}/test_compression_of_output_file_from_local + +rm -f "${CLICKHOUSE_TMP}/test_compression_of_output_file_from_local" diff --git a/tests/queries/0_stateless/02001_hostname_test.reference b/tests/queries/0_stateless/02001_hostname_test.reference new file mode 100644 index 00000000000..da8a2d07eab --- /dev/null +++ b/tests/queries/0_stateless/02001_hostname_test.reference @@ -0,0 +1,2 @@ +localhost +localhost 2 diff --git a/tests/queries/0_stateless/02001_hostname_test.sql b/tests/queries/0_stateless/02001_hostname_test.sql new file mode 100644 index 00000000000..a8c7a8dab0c --- /dev/null +++ b/tests/queries/0_stateless/02001_hostname_test.sql @@ -0,0 +1,2 @@ +select hostname(); +select hostName() h, count() from cluster(test_cluster_two_shards, system.one) group by h; diff --git a/tests/queries/0_stateless/02001_shard_num_shard_count.reference b/tests/queries/0_stateless/02001_shard_num_shard_count.reference new file mode 100644 index 00000000000..34c5e7514f4 --- /dev/null +++ b/tests/queries/0_stateless/02001_shard_num_shard_count.reference @@ -0,0 +1,7 @@ +0 0 +1 3 +2 3 +3 3 +1 3 +2 3 +3 3 diff --git a/tests/queries/0_stateless/02001_shard_num_shard_count.sql b/tests/queries/0_stateless/02001_shard_num_shard_count.sql new file mode 100644 index 00000000000..daf1084a614 --- /dev/null +++ b/tests/queries/0_stateless/02001_shard_num_shard_count.sql @@ -0,0 +1,3 @@ +select shardNum() n, shardCount() c; +select shardNum() n, shardCount() c from remote('127.0.0.{1,2,3}', system.one) order by n settings prefer_localhost_replica = 0; +select shardNum() n, shardCount() c from remote('127.0.0.{1,2,3}', system.one) order by n settings prefer_localhost_replica = 1; diff --git a/tests/queries/0_stateless/02002_global_subqueries_subquery_or_table_name.reference b/tests/queries/0_stateless/02002_global_subqueries_subquery_or_table_name.reference new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/queries/0_stateless/02002_global_subqueries_subquery_or_table_name.sql b/tests/queries/0_stateless/02002_global_subqueries_subquery_or_table_name.sql new file mode 100644 index 00000000000..1b24617569c --- /dev/null +++ b/tests/queries/0_stateless/02002_global_subqueries_subquery_or_table_name.sql @@ -0,0 +1,5 @@ +SELECT + cityHash64(number GLOBAL IN (NULL, -2147483648, -9223372036854775808), nan, 1024, NULL, NULL, 1.000100016593933, NULL), + (NULL, cityHash64(inf, -2147483648, NULL, NULL, 10.000100135803223), cityHash64(1.1754943508222875e-38, NULL, NULL, NULL), 2147483647) +FROM cluster(test_cluster_two_shards_localhost, numbers((NULL, cityHash64(0., 65536, NULL, NULL, 10000000000., NULL), 0) GLOBAL IN (some_identifier), 65536)) +WHERE number GLOBAL IN [1025] --{serverError 284} diff --git a/tests/queries/0_stateless/arcadia_skip_list.txt b/tests/queries/0_stateless/arcadia_skip_list.txt index d7581cc4e07..606015c369f 100644 --- a/tests/queries/0_stateless/arcadia_skip_list.txt +++ b/tests/queries/0_stateless/arcadia_skip_list.txt @@ -93,6 +93,8 @@ 01138_join_on_distributed_and_tmp 01153_attach_mv_uuid 01155_rename_move_materialized_view +01157_replace_table +01185_create_or_replace_table 01191_rename_dictionary 01200_mutations_memory_consumption 01211_optimize_skip_unused_shards_type_mismatch diff --git a/tests/queries/0_stateless/data_parquet/alltypes_list.parquet.columns 
b/tests/queries/0_stateless/data_parquet/alltypes_list.parquet.columns index b633ca86bbf..794ee47d757 100644 --- a/tests/queries/0_stateless/data_parquet/alltypes_list.parquet.columns +++ b/tests/queries/0_stateless/data_parquet/alltypes_list.parquet.columns @@ -1 +1 @@ -`a1` Array(Int8), `a2` Array(UInt8), `a3` Array(Int16), `a4` Array(UInt16), `a5` Array(Int32), `a6` Array(UInt32), `a7` Array(Int64), `a8` Array(UInt64), `a9` Array(String), `a10` Array(FixedString(4)), `a11` Array(Float32), `a12` Array(Float64), `a13` Array(Date), `a14` Array(Datetime), `a15` Array(Decimal(4, 2)), `a16` Array(Decimal(10, 2)), `a17` Array(Decimal(25, 2)) +`a1` Array(Int8), `a2` Array(UInt8), `a3` Array(Int16), `a4` Array(UInt16), `a5` Array(Int32), `a6` Array(UInt32), `a7` Array(Int64), `a8` Array(UInt64), `a9` Array(String), `a10` Array(FixedString(4)), `a11` Array(Float32), `a12` Array(Float64), `a13` Array(Date), `a14` Array(Datetime('Europe/Moscow')), `a15` Array(Decimal(4, 2)), `a16` Array(Decimal(10, 2)), `a17` Array(Decimal(25, 2)) diff --git a/tests/queries/0_stateless/data_parquet/v0.7.1.column-metadata-handling.parquet.columns b/tests/queries/0_stateless/data_parquet/v0.7.1.column-metadata-handling.parquet.columns index 3d08da2522c..df35127ede8 100644 --- a/tests/queries/0_stateless/data_parquet/v0.7.1.column-metadata-handling.parquet.columns +++ b/tests/queries/0_stateless/data_parquet/v0.7.1.column-metadata-handling.parquet.columns @@ -1 +1 @@ -`a` Nullable(Int64), `b` Nullable(Float64), `c` Nullable(DateTime), `index` Nullable(String), `__index_level_1__` Nullable(DateTime) +`a` Nullable(Int64), `b` Nullable(Float64), `c` Nullable(DateTime('Europe/Moscow')), `index` Nullable(String), `__index_level_1__` Nullable(DateTime('Europe/Moscow')) diff --git a/tests/queries/0_stateless/data_sqlite/db1 b/tests/queries/0_stateless/data_sqlite/db1 deleted file mode 100644 index 776eff686fb..00000000000 Binary files a/tests/queries/0_stateless/data_sqlite/db1 and /dev/null differ diff --git a/tests/queries/0_stateless/data_zstd/test_01946.zstd b/tests/queries/0_stateless/data_zstd/test_01946.zstd new file mode 100644 index 00000000000..c021b112dad Binary files /dev/null and b/tests/queries/0_stateless/data_zstd/test_01946.zstd differ diff --git a/tests/queries/0_stateless/helpers/00900_parquet_create_table_columns.py b/tests/queries/0_stateless/helpers/00900_parquet_create_table_columns.py index 1a41da8c8b4..92606c9cb26 100755 --- a/tests/queries/0_stateless/helpers/00900_parquet_create_table_columns.py +++ b/tests/queries/0_stateless/helpers/00900_parquet_create_table_columns.py @@ -4,8 +4,8 @@ import json import sys TYPE_PARQUET_CONVERTED_TO_CLICKHOUSE = { - "TIMESTAMP_MICROS": "DateTime", - "TIMESTAMP_MILLIS": "DateTime", + "TIMESTAMP_MICROS": "DateTime('Europe/Moscow')", + "TIMESTAMP_MILLIS": "DateTime('Europe/Moscow')", "UTF8": "String", } diff --git a/tests/queries/1_stateful/00011_sorting.sql b/tests/queries/1_stateful/00011_sorting.sql index 8c6ae457566..381be7b7dd4 100644 --- a/tests/queries/1_stateful/00011_sorting.sql +++ b/tests/queries/1_stateful/00011_sorting.sql @@ -1 +1 @@ -SELECT EventTime FROM test.hits ORDER BY EventTime DESC LIMIT 10 +SELECT EventTime::DateTime('Europe/Moscow') FROM test.hits ORDER BY EventTime DESC LIMIT 10 diff --git a/tests/queries/1_stateful/00012_sorting_distributed.sql b/tests/queries/1_stateful/00012_sorting_distributed.sql index 51f249b3db8..b0cdb8bd8a2 100644 --- a/tests/queries/1_stateful/00012_sorting_distributed.sql +++ 
b/tests/queries/1_stateful/00012_sorting_distributed.sql @@ -1 +1 @@ -SELECT EventTime FROM remote('127.0.0.{1,2}', test, hits) ORDER BY EventTime DESC LIMIT 10 +SELECT EventTime::DateTime('Europe/Moscow') FROM remote('127.0.0.{1,2}', test, hits) ORDER BY EventTime DESC LIMIT 10 diff --git a/tests/queries/1_stateful/00066_sorting_distributed_many_replicas.sql b/tests/queries/1_stateful/00066_sorting_distributed_many_replicas.sql index d0636133186..4bc563712c0 100644 --- a/tests/queries/1_stateful/00066_sorting_distributed_many_replicas.sql +++ b/tests/queries/1_stateful/00066_sorting_distributed_many_replicas.sql @@ -1,2 +1,2 @@ SET max_parallel_replicas = 2; -SELECT EventTime FROM remote('127.0.0.{1|2}', test, hits) ORDER BY EventTime DESC LIMIT 10 +SELECT EventTime::DateTime('Europe/Moscow') FROM remote('127.0.0.{1|2}', test, hits) ORDER BY EventTime DESC LIMIT 10 diff --git a/tests/queries/1_stateful/00071_merge_tree_optimize_aio.sql b/tests/queries/1_stateful/00071_merge_tree_optimize_aio.sql index 1891cd63555..241f0f9b13b 100644 --- a/tests/queries/1_stateful/00071_merge_tree_optimize_aio.sql +++ b/tests/queries/1_stateful/00071_merge_tree_optimize_aio.sql @@ -1,6 +1,6 @@ DROP TABLE IF EXISTS test.hits_snippet; -CREATE TABLE test.hits_snippet(EventTime DateTime, EventDate Date, CounterID UInt32, UserID UInt64, URL String, Referer String) ENGINE = MergeTree(EventDate, intHash32(UserID), (CounterID, EventDate, intHash32(UserID), EventTime), 8192); +CREATE TABLE test.hits_snippet(EventTime DateTime('Europe/Moscow'), EventDate Date, CounterID UInt32, UserID UInt64, URL String, Referer String) ENGINE = MergeTree(EventDate, intHash32(UserID), (CounterID, EventDate, intHash32(UserID), EventTime), 8192); SET min_insert_block_size_rows = 0, min_insert_block_size_bytes = 0; SET max_block_size = 4096; diff --git a/tests/queries/1_stateful/00072_compare_date_and_string_index.sql b/tests/queries/1_stateful/00072_compare_date_and_string_index.sql index 90f1c875acd..af5d932fecb 100644 --- a/tests/queries/1_stateful/00072_compare_date_and_string_index.sql +++ b/tests/queries/1_stateful/00072_compare_date_and_string_index.sql @@ -15,7 +15,7 @@ SELECT count() FROM test.hits WHERE EventDate IN (toDate('2014-03-18'), toDate(' SELECT count() FROM test.hits WHERE EventDate = concat('2014-0', '3-18'); DROP TABLE IF EXISTS test.hits_indexed_by_time; -CREATE TABLE test.hits_indexed_by_time (EventDate Date, EventTime DateTime) ENGINE = MergeTree(EventDate, EventTime, 8192); +CREATE TABLE test.hits_indexed_by_time (EventDate Date, EventTime DateTime('Europe/Moscow')) ENGINE = MergeTree ORDER BY (EventDate, EventTime); INSERT INTO test.hits_indexed_by_time SELECT EventDate, EventTime FROM test.hits; SELECT count() FROM test.hits_indexed_by_time WHERE EventTime = '2014-03-18 01:02:03'; @@ -25,12 +25,12 @@ SELECT count() FROM test.hits_indexed_by_time WHERE EventTime <= '2014-03-18 01: SELECT count() FROM test.hits_indexed_by_time WHERE EventTime >= '2014-03-18 01:02:03'; SELECT count() FROM test.hits_indexed_by_time WHERE EventTime IN ('2014-03-18 01:02:03', '2014-03-19 04:05:06'); -SELECT count() FROM test.hits_indexed_by_time WHERE EventTime = toDateTime('2014-03-18 01:02:03'); -SELECT count() FROM test.hits_indexed_by_time WHERE EventTime < toDateTime('2014-03-18 01:02:03'); -SELECT count() FROM test.hits_indexed_by_time WHERE EventTime > toDateTime('2014-03-18 01:02:03'); -SELECT count() FROM test.hits_indexed_by_time WHERE EventTime <= toDateTime('2014-03-18 01:02:03'); -SELECT count() FROM 
test.hits_indexed_by_time WHERE EventTime >= toDateTime('2014-03-18 01:02:03'); -SELECT count() FROM test.hits_indexed_by_time WHERE EventTime IN (toDateTime('2014-03-18 01:02:03'), toDateTime('2014-03-19 04:05:06')); +SELECT count() FROM test.hits_indexed_by_time WHERE EventTime = toDateTime('2014-03-18 01:02:03', 'Europe/Moscow'); +SELECT count() FROM test.hits_indexed_by_time WHERE EventTime < toDateTime('2014-03-18 01:02:03', 'Europe/Moscow'); +SELECT count() FROM test.hits_indexed_by_time WHERE EventTime > toDateTime('2014-03-18 01:02:03', 'Europe/Moscow'); +SELECT count() FROM test.hits_indexed_by_time WHERE EventTime <= toDateTime('2014-03-18 01:02:03', 'Europe/Moscow'); +SELECT count() FROM test.hits_indexed_by_time WHERE EventTime >= toDateTime('2014-03-18 01:02:03', 'Europe/Moscow'); +SELECT count() FROM test.hits_indexed_by_time WHERE EventTime IN (toDateTime('2014-03-18 01:02:03', 'Europe/Moscow'), toDateTime('2014-03-19 04:05:06', 'Europe/Moscow')); SELECT count() FROM test.hits_indexed_by_time WHERE EventTime = concat('2014-03-18 ', '01:02:03'); diff --git a/tests/queries/1_stateful/00075_left_array_join.sql b/tests/queries/1_stateful/00075_left_array_join.sql index 424276cf036..52a48462b9d 100644 --- a/tests/queries/1_stateful/00075_left_array_join.sql +++ b/tests/queries/1_stateful/00075_left_array_join.sql @@ -1,2 +1,2 @@ -SELECT UserID, EventTime, pp.Key1, pp.Key2, ParsedParams.Key1 FROM test.hits ARRAY JOIN ParsedParams AS pp WHERE CounterID = 1704509 ORDER BY UserID, EventTime, pp.Key1, pp.Key2 LIMIT 100; -SELECT UserID, EventTime, pp.Key1, pp.Key2, ParsedParams.Key1 FROM test.hits LEFT ARRAY JOIN ParsedParams AS pp WHERE CounterID = 1704509 ORDER BY UserID, EventTime, pp.Key1, pp.Key2 LIMIT 100; +SELECT UserID, EventTime::DateTime('Europe/Moscow'), pp.Key1, pp.Key2, ParsedParams.Key1 FROM test.hits ARRAY JOIN ParsedParams AS pp WHERE CounterID = 1704509 ORDER BY UserID, EventTime, pp.Key1, pp.Key2 LIMIT 100; +SELECT UserID, EventTime::DateTime('Europe/Moscow'), pp.Key1, pp.Key2, ParsedParams.Key1 FROM test.hits LEFT ARRAY JOIN ParsedParams AS pp WHERE CounterID = 1704509 ORDER BY UserID, EventTime, pp.Key1, pp.Key2 LIMIT 100; diff --git a/tests/queries/1_stateful/00091_prewhere_two_conditions.sql b/tests/queries/1_stateful/00091_prewhere_two_conditions.sql index 201ff788006..c5952be83b6 100644 --- a/tests/queries/1_stateful/00091_prewhere_two_conditions.sql +++ b/tests/queries/1_stateful/00091_prewhere_two_conditions.sql @@ -2,12 +2,12 @@ SET max_bytes_to_read = 600000000; SET optimize_move_to_prewhere = 1; -SELECT uniq(URL) FROM test.hits WHERE EventTime >= '2014-03-20 00:00:00' AND EventTime < '2014-03-21 00:00:00'; -SELECT uniq(URL) FROM test.hits WHERE EventTime >= '2014-03-20 00:00:00' AND URL != '' AND EventTime < '2014-03-21 00:00:00'; -SELECT uniq(*) FROM test.hits WHERE EventTime >= '2014-03-20 00:00:00' AND EventTime < '2014-03-21 00:00:00' AND EventDate = '2014-03-21'; -WITH EventTime AS xyz SELECT uniq(*) FROM test.hits WHERE xyz >= '2014-03-20 00:00:00' AND xyz < '2014-03-21 00:00:00' AND EventDate = '2014-03-21'; +SELECT uniq(URL) FROM test.hits WHERE toTimeZone(EventTime, 'Europe/Moscow') >= '2014-03-20 00:00:00' AND toTimeZone(EventTime, 'Europe/Moscow') < '2014-03-21 00:00:00'; +SELECT uniq(URL) FROM test.hits WHERE toTimeZone(EventTime, 'Europe/Moscow') >= '2014-03-20 00:00:00' AND URL != '' AND toTimeZone(EventTime, 'Europe/Moscow') < '2014-03-21 00:00:00'; +SELECT uniq(*) FROM test.hits WHERE toTimeZone(EventTime, 'Europe/Moscow') >= '2014-03-20 
00:00:00' AND toTimeZone(EventTime, 'Europe/Moscow') < '2014-03-21 00:00:00' AND EventDate = '2014-03-21'; +WITH toTimeZone(EventTime, 'Europe/Moscow') AS xyz SELECT uniq(*) FROM test.hits WHERE xyz >= '2014-03-20 00:00:00' AND xyz < '2014-03-21 00:00:00' AND EventDate = '2014-03-21'; SET optimize_move_to_prewhere = 0; -SELECT uniq(URL) FROM test.hits WHERE EventTime >= '2014-03-20 00:00:00' AND EventTime < '2014-03-21 00:00:00'; -- { serverError 307 } -SELECT uniq(URL) FROM test.hits WHERE EventTime >= '2014-03-20 00:00:00' AND URL != '' AND EventTime < '2014-03-21 00:00:00'; -- { serverError 307 } +SELECT uniq(URL) FROM test.hits WHERE toTimeZone(EventTime, 'Europe/Moscow') >= '2014-03-20 00:00:00' AND toTimeZone(EventTime, 'Europe/Moscow') < '2014-03-21 00:00:00'; -- { serverError 307 } +SELECT uniq(URL) FROM test.hits WHERE toTimeZone(EventTime, 'Europe/Moscow') >= '2014-03-20 00:00:00' AND URL != '' AND toTimeZone(EventTime, 'Europe/Moscow') < '2014-03-21 00:00:00'; -- { serverError 307 } diff --git a/tests/queries/1_stateful/00159_parallel_formatting_csv_and_friends.sh b/tests/queries/1_stateful/00159_parallel_formatting_csv_and_friends.sh index dc14928afa6..a6b5620812d 100755 --- a/tests/queries/1_stateful/00159_parallel_formatting_csv_and_friends.sh +++ b/tests/queries/1_stateful/00159_parallel_formatting_csv_and_friends.sh @@ -10,10 +10,10 @@ for format in "${FORMATS[@]}" do echo "$format, false"; $CLICKHOUSE_CLIENT --output_format_parallel_formatting=false -q \ - "SELECT ClientEventTime as a, MobilePhoneModel as b, ClientIP6 as c FROM test.hits ORDER BY a, b, c Format $format" | md5sum + "SELECT ClientEventTime::DateTime('Europe/Moscow') as a, MobilePhoneModel as b, ClientIP6 as c FROM test.hits ORDER BY a, b, c Format $format" | md5sum echo "$format, true"; $CLICKHOUSE_CLIENT --output_format_parallel_formatting=true -q \ - "SELECT ClientEventTime as a, MobilePhoneModel as b, ClientIP6 as c FROM test.hits ORDER BY a, b, c Format $format" | md5sum + "SELECT ClientEventTime::DateTime('Europe/Moscow') as a, MobilePhoneModel as b, ClientIP6 as c FROM test.hits ORDER BY a, b, c Format $format" | md5sum done diff --git a/tests/queries/1_stateful/00159_parallel_formatting_http.sh b/tests/queries/1_stateful/00159_parallel_formatting_http.sh index a4e68de6a3f..1dcae50812e 100755 --- a/tests/queries/1_stateful/00159_parallel_formatting_http.sh +++ b/tests/queries/1_stateful/00159_parallel_formatting_http.sh @@ -10,8 +10,8 @@ FORMATS=('TSV' 'CSV' 'JSONCompactEachRow') for format in "${FORMATS[@]}" do echo "$format, false"; - ${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&query=SELECT+ClientEventTime+as+a,MobilePhoneModel+as+b,ClientIP6+as+c+FROM+test.hits+ORDER+BY+a,b,c+LIMIT+1000000+Format+$format&output_format_parallel_formatting=false" -d' ' | md5sum + ${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&query=SELECT+ClientEventTime::DateTime('Europe/Moscow')+as+a,MobilePhoneModel+as+b,ClientIP6+as+c+FROM+test.hits+ORDER+BY+a,b,c+LIMIT+1000000+Format+$format&output_format_parallel_formatting=false" -d' ' | md5sum echo "$format, true"; - ${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&query=SELECT+ClientEventTime+as+a,MobilePhoneModel+as+b,ClientIP6+as+c+FROM+test.hits+ORDER+BY+a,b,c+LIMIT+1000000+Format+$format&output_format_parallel_formatting=true" -d' ' | md5sum + ${CLICKHOUSE_CURL} -sS "${CLICKHOUSE_URL}&query=SELECT+ClientEventTime::DateTime('Europe/Moscow')+as+a,MobilePhoneModel+as+b,ClientIP6+as+c+FROM+test.hits+ORDER+BY+a,b,c+LIMIT+1000000+Format+$format&output_format_parallel_formatting=true" -d' ' | 
md5sum done diff --git a/tests/queries/1_stateful/00159_parallel_formatting_json_and_friends.reference b/tests/queries/1_stateful/00159_parallel_formatting_json_and_friends.reference index 96353f350ec..6d663c33057 100644 --- a/tests/queries/1_stateful/00159_parallel_formatting_json_and_friends.reference +++ b/tests/queries/1_stateful/00159_parallel_formatting_json_and_friends.reference @@ -7,6 +7,6 @@ ba1081a754a06ef6563840b2d8d4d327 - JSONCompactEachRow, true ba1081a754a06ef6563840b2d8d4d327 - JSONCompactStringsEachRowWithNamesAndTypes, false -902e53f621d5336aa7f702a5d6b64b42 - +31ded3cd9971b124450fb5a44a8bce63 - JSONCompactStringsEachRowWithNamesAndTypes, true -902e53f621d5336aa7f702a5d6b64b42 - +31ded3cd9971b124450fb5a44a8bce63 - diff --git a/tests/queries/1_stateful/00159_parallel_formatting_json_and_friends.sh b/tests/queries/1_stateful/00159_parallel_formatting_json_and_friends.sh index e02515d5c16..9f61b454d56 100755 --- a/tests/queries/1_stateful/00159_parallel_formatting_json_and_friends.sh +++ b/tests/queries/1_stateful/00159_parallel_formatting_json_and_friends.sh @@ -11,9 +11,9 @@ for format in "${FORMATS[@]}" do echo "$format, false"; $CLICKHOUSE_CLIENT --output_format_parallel_formatting=false -q \ - "SELECT ClientEventTime as a, MobilePhoneModel as b, ClientIP6 as c FROM test.hits ORDER BY a, b, c Format $format" | md5sum + "SELECT ClientEventTime::DateTime('Europe/Moscow') as a, MobilePhoneModel as b, ClientIP6 as c FROM test.hits ORDER BY a, b, c Format $format" | md5sum echo "$format, true"; $CLICKHOUSE_CLIENT --output_format_parallel_formatting=true -q \ - "SELECT ClientEventTime as a, MobilePhoneModel as b, ClientIP6 as c FROM test.hits ORDER BY a, b, c Format $format" | md5sum + "SELECT ClientEventTime::DateTime('Europe/Moscow') as a, MobilePhoneModel as b, ClientIP6 as c FROM test.hits ORDER BY a, b, c Format $format" | md5sum done diff --git a/tests/queries/1_stateful/00159_parallel_formatting_tsv_and_friends.sh b/tests/queries/1_stateful/00159_parallel_formatting_tsv_and_friends.sh index a81dfdc33b4..02d083c0498 100755 --- a/tests/queries/1_stateful/00159_parallel_formatting_tsv_and_friends.sh +++ b/tests/queries/1_stateful/00159_parallel_formatting_tsv_and_friends.sh @@ -11,9 +11,9 @@ for format in "${FORMATS[@]}" do echo "$format, false"; $CLICKHOUSE_CLIENT --output_format_parallel_formatting=false -q \ - "SELECT ClientEventTime as a, MobilePhoneModel as b, ClientIP6 as c FROM test.hits ORDER BY a, b, c Format $format" | md5sum + "SELECT ClientEventTime::DateTime('Europe/Moscow') as a, MobilePhoneModel as b, ClientIP6 as c FROM test.hits ORDER BY a, b, c Format $format" | md5sum echo "$format, true"; $CLICKHOUSE_CLIENT --output_format_parallel_formatting=true -q \ - "SELECT ClientEventTime as a, MobilePhoneModel as b, ClientIP6 as c FROM test.hits ORDER BY a, b, c Format $format" | md5sum + "SELECT ClientEventTime::DateTime('Europe/Moscow') as a, MobilePhoneModel as b, ClientIP6 as c FROM test.hits ORDER BY a, b, c Format $format" | md5sum done diff --git a/tests/queries/1_stateful/00161_parallel_parsing_with_names.sh b/tests/queries/1_stateful/00161_parallel_parsing_with_names.sh index ca9984900e1..777d95fa0af 100755 --- a/tests/queries/1_stateful/00161_parallel_parsing_with_names.sh +++ b/tests/queries/1_stateful/00161_parallel_parsing_with_names.sh @@ -10,23 +10,23 @@ $CLICKHOUSE_CLIENT -q "DROP TABLE IF EXISTS parsing_with_names" for format in "${FORMATS[@]}" do # Columns are permuted - $CLICKHOUSE_CLIENT -q "CREATE TABLE parsing_with_names(c FixedString(16), 
a DateTime, b String) ENGINE=Memory()" - + $CLICKHOUSE_CLIENT -q "CREATE TABLE parsing_with_names(c FixedString(16), a DateTime('Europe/Moscow'), b String) ENGINE=Memory()" + echo "$format, false"; $CLICKHOUSE_CLIENT --output_format_parallel_formatting=false -q \ - "SELECT URLRegions as d, ClientEventTime as a, MobilePhoneModel as b, ParamPrice as e, ClientIP6 as c FROM test.hits LIMIT 50000 Format $format" | \ + "SELECT URLRegions as d, toTimeZone(ClientEventTime, 'Europe/Moscow') as a, MobilePhoneModel as b, ParamPrice as e, ClientIP6 as c FROM test.hits LIMIT 50000 Format $format" | \ $CLICKHOUSE_CLIENT --input_format_skip_unknown_fields=1 --input_format_parallel_parsing=false -q "INSERT INTO parsing_with_names FORMAT $format" $CLICKHOUSE_CLIENT -q "SELECT * FROM parsing_with_names;" | md5sum $CLICKHOUSE_CLIENT -q "DROP TABLE IF EXISTS parsing_with_names" - - $CLICKHOUSE_CLIENT -q "CREATE TABLE parsing_with_names(c FixedString(16), a DateTime, b String) ENGINE=Memory()" + + $CLICKHOUSE_CLIENT -q "CREATE TABLE parsing_with_names(c FixedString(16), a DateTime('Europe/Moscow'), b String) ENGINE=Memory()" echo "$format, true"; $CLICKHOUSE_CLIENT --output_format_parallel_formatting=false -q \ - "SELECT URLRegions as d, ClientEventTime as a, MobilePhoneModel as b, ParamPrice as e, ClientIP6 as c FROM test.hits LIMIT 50000 Format $format" | \ + "SELECT URLRegions as d, toTimeZone(ClientEventTime, 'Europe/Moscow') as a, MobilePhoneModel as b, ParamPrice as e, ClientIP6 as c FROM test.hits LIMIT 50000 Format $format" | \ $CLICKHOUSE_CLIENT --input_format_skip_unknown_fields=1 --input_format_parallel_parsing=true -q "INSERT INTO parsing_with_names FORMAT $format" $CLICKHOUSE_CLIENT -q "SELECT * FROM parsing_with_names;" | md5sum $CLICKHOUSE_CLIENT -q "DROP TABLE IF EXISTS parsing_with_names" -done \ No newline at end of file +done diff --git a/tests/queries/1_stateful/00163_column_oriented_formats.sh b/tests/queries/1_stateful/00163_column_oriented_formats.sh index 1363ccf3c00..50ad20cbe92 100755 --- a/tests/queries/1_stateful/00163_column_oriented_formats.sh +++ b/tests/queries/1_stateful/00163_column_oriented_formats.sh @@ -11,7 +11,7 @@ for format in "${FORMATS[@]}" do echo $format $CLICKHOUSE_CLIENT -q "DROP TABLE IF EXISTS 00163_column_oriented SYNC" - $CLICKHOUSE_CLIENT -q "CREATE TABLE 00163_column_oriented(ClientEventTime DateTime, MobilePhoneModel String, ClientIP6 FixedString(16)) ENGINE=File($format)" + $CLICKHOUSE_CLIENT -q "CREATE TABLE 00163_column_oriented(ClientEventTime DateTime('Europe/Moscow'), MobilePhoneModel String, ClientIP6 FixedString(16)) ENGINE=File($format)" $CLICKHOUSE_CLIENT -q "INSERT INTO 00163_column_oriented SELECT ClientEventTime, MobilePhoneModel, ClientIP6 FROM test.hits ORDER BY ClientEventTime, MobilePhoneModel, ClientIP6 LIMIT 100" $CLICKHOUSE_CLIENT -q "SELECT ClientEventTime from 00163_column_oriented" | md5sum $CLICKHOUSE_CLIENT -q "SELECT MobilePhoneModel from 00163_column_oriented" | md5sum diff --git a/tests/queries/1_stateful/00166_explain_estimate.reference b/tests/queries/1_stateful/00166_explain_estimate.reference new file mode 100644 index 00000000000..71ddd681581 --- /dev/null +++ b/tests/queries/1_stateful/00166_explain_estimate.reference @@ -0,0 +1,5 @@ +test hits 1 57344 7 +test hits 1 8839168 1079 +test hits 1 835584 102 +test hits 1 8003584 977 +test hits 2 581632 71 diff --git a/tests/queries/1_stateful/00166_explain_estimate.sql b/tests/queries/1_stateful/00166_explain_estimate.sql new file mode 100644 index 00000000000..06725ff7f9f --- 
/dev/null +++ b/tests/queries/1_stateful/00166_explain_estimate.sql @@ -0,0 +1,5 @@ +EXPLAIN ESTIMATE SELECT count() FROM test.hits WHERE CounterID = 29103473; +EXPLAIN ESTIMATE SELECT count() FROM test.hits WHERE CounterID != 29103473; +EXPLAIN ESTIMATE SELECT count() FROM test.hits WHERE CounterID > 29103473; +EXPLAIN ESTIMATE SELECT count() FROM test.hits WHERE CounterID < 29103473; +EXPLAIN ESTIMATE SELECT count() FROM test.hits WHERE CounterID = 29103473 UNION ALL SELECT count() FROM test.hits WHERE CounterID = 1704509; diff --git a/tests/queries/query_test.py b/tests/queries/query_test.py index 952e93c09cd..5f7dd79cf3f 100644 --- a/tests/queries/query_test.py +++ b/tests/queries/query_test.py @@ -12,6 +12,7 @@ SKIP_LIST = [ # these couple of tests hangs everything "00600_replace_running_query", "00987_distributed_stack_overflow", + "01954_clickhouse_benchmark_multiple_long", # just fail "00133_long_shard_memory_tracker_and_exception_safety", diff --git a/tests/queries/shell_config.sh b/tests/queries/shell_config.sh index e768a773255..ae279c93527 100644 --- a/tests/queries/shell_config.sh +++ b/tests/queries/shell_config.sh @@ -73,7 +73,7 @@ export CLICKHOUSE_PORT_MYSQL=${CLICKHOUSE_PORT_MYSQL:="9004"} export CLICKHOUSE_PORT_POSTGRESQL=${CLICKHOUSE_PORT_POSTGRESQL:=$(${CLICKHOUSE_EXTRACT_CONFIG} --try --key=postgresql_port 2>/dev/null)} 2>/dev/null export CLICKHOUSE_PORT_POSTGRESQL=${CLICKHOUSE_PORT_POSTGRESQL:="9005"} -export CLICKHOUSE_CLIENT_SECURE=${CLICKHOUSE_CLIENT_SECURE:=$(echo ${CLICKHOUSE_CLIENT} | sed 's/'"--port=${CLICKHOUSE_PORT_TCP}"'/'"--secure --port=${CLICKHOUSE_PORT_TCP_SECURE}"'/g')} +export CLICKHOUSE_CLIENT_SECURE=${CLICKHOUSE_CLIENT_SECURE:=$(echo "${CLICKHOUSE_CLIENT}" | sed 's/'"--port=${CLICKHOUSE_PORT_TCP}"'//g; s/$/'"--secure --port=${CLICKHOUSE_PORT_TCP_SECURE}"'/g')} # Add database and log comment to url params if [ -v CLICKHOUSE_URL_PARAMS ] diff --git a/tests/queries/skip_list.json b/tests/queries/skip_list.json index fd800d3bc33..579a2636ad5 100644 --- a/tests/queries/skip_list.json +++ b/tests/queries/skip_list.json @@ -41,7 +41,8 @@ "01473_event_time_microseconds", "01526_max_untracked_memory", /// requires TraceCollector, does not available under sanitizers "01594_too_low_memory_limits", /// requires jemalloc to track small allocations - "01193_metadata_loading" + "01193_metadata_loading", + "01955_clickhouse_benchmark_connection_hang" /// Limits RLIMIT_NOFILE, see comment in the test ], "memory-sanitizer": [ "capnproto", @@ -109,6 +110,8 @@ "00510_materizlized_view_and_deduplication_zookeeper", "00738_lock_for_inner_table", "01153_attach_mv_uuid", + "01157_replace_table", + "01185_create_or_replace_table", /// Sometimes cannot lock file most likely due to concurrent or adjacent tests, but we don't care how it works in Ordinary database. 
"rocksdb", "01914_exchange_dictionaries" /// Requires Atomic database @@ -118,16 +121,14 @@ "memory_tracking", "memory_usage", "live_view", - "01181_db_atomic_drop_on_cluster", - "01175_distributed_ddl_output_mode", - "01415_sticking_mutations", - "00980_zookeeper_merge_tree_alter_settings", "01148_zookeeper_path_macros_unfolding", - "01294_system_distributed_on_cluster", "01269_create_with_null", "01451_replicated_detach_drop_and_quorum", "01188_attach_table_from_path", - "01149_zookeeper_mutation_stuck_after_replace_partition", + /// ON CLUSTER is not allowed + "01181_db_atomic_drop_on_cluster", + "01175_distributed_ddl_output_mode", + "01415_sticking_mutations", /// user_files "01721_engine_file_truncate_on_insert", /// Fails due to additional replicas or shards @@ -136,6 +137,7 @@ "01532_execute_merges_on_single_replica", "00652_replicated_mutations_default_database_zookeeper", "00620_optimize_on_nonleader_replica_zookeeper", + "01158_zookeeper_log", /// grep -c "01018_ddl_dictionaries_bad_queries", "00908_bloom_filter_index", @@ -156,6 +158,8 @@ "00152_insert_different_granularity", "00054_merge_tree_partitions", "01781_merge_tree_deduplication", + "00980_zookeeper_merge_tree_alter_settings", + "00980_merge_alter_settings", /// Old syntax is not allowed "01062_alter_on_mutataion_zookeeper", "00925_zookeeper_empty_replicated_merge_tree_optimize_final", @@ -173,13 +177,17 @@ /// Does not support renaming of multiple tables in single query "00634_rename_view", "00140_rename", + /// Different query_id + "01943_query_id_check", /// Requires investigation "00953_zookeeper_suetin_deduplication_bug", - "01783_http_chunk_size" + "01783_http_chunk_size", + "00166_explain_estimate" ], "polymorphic-parts": [ "01508_partition_pruning_long", /// bug, shoud be fixed - "01482_move_to_prewhere_and_cast" /// bug, shoud be fixed + "01482_move_to_prewhere_and_cast", /// bug, shoud be fixed + "01158_zookeeper_log" ], "parallel": [ @@ -491,6 +499,7 @@ "01684_ssd_cache_dictionary_simple_key", "01685_ssd_cache_dictionary_complex_key", "01737_clickhouse_server_wait_server_pool_long", // This test is fully compatible to run in parallel, however under ASAN processes are pretty heavy and may fail under flaky adress check. + "01954_clickhouse_benchmark_round_robin", // This test is fully compatible to run in parallel, however under ASAN processes are pretty heavy and may fail under flaky adress check. "01594_too_low_memory_limits", // This test is fully compatible to run in parallel, however under ASAN processes are pretty heavy and may fail under flaky adress check. "01760_system_dictionaries", "01760_polygon_dictionaries", @@ -514,6 +523,12 @@ "01915_create_or_replace_dictionary", "01925_test_storage_merge_aliases", "01933_client_replxx_convert_history", /// Uses non unique history file - "01902_table_function_merge_db_repr" + "01939_user_with_default_database", //create user and database + "01999_grant_with_replace", + "01902_table_function_merge_db_repr", + "01946_test_zstd_decompression_with_escape_sequence_at_the_end_of_buffer", + "01946_test_wrong_host_name_access", + "01213_alter_rename_with_default_zookeeper", /// Warning: Removing leftovers from table. 
+ "02001_add_default_database_to_system_users" ///create user ] } diff --git a/tests/server-test.xml b/tests/server-test.xml index dd21d55c78c..1f67317ad0a 100644 --- a/tests/server-test.xml +++ b/tests/server-test.xml @@ -1,5 +1,11 @@ - + trace diff --git a/tests/testflows/aes_encryption/regression.py b/tests/testflows/aes_encryption/regression.py index 60f42ece509..93add36cd1d 100755 --- a/tests/testflows/aes_encryption/regression.py +++ b/tests/testflows/aes_encryption/regression.py @@ -6,7 +6,6 @@ from testflows.core import * append_path(sys.path, "..") from helpers.cluster import Cluster -from helpers.common import Pool, join, run_scenario from helpers.argparser import argparser from aes_encryption.requirements import * @@ -52,19 +51,19 @@ xfails = { [(Fail, issue_18250)], "compatibility/mysql/:engine/encrypt/mysql_datatype='VARCHAR(100)'/:": [(Fail, issue_18250)], - # reinterpretAsFixedString for UUID stopped working + # reinterpretAsFixedString for UUID stopped working "decrypt/decryption/mode=:datatype=UUID:": - [(Fail, issue_24029)], + [(Fail, issue_24029)], "encrypt/:/mode=:datatype=UUID:": - [(Fail, issue_24029)], - "decrypt/invalid ciphertext/mode=:/invalid ciphertext=reinterpretAsFixedString(toUUID:": - [(Fail, issue_24029)], + [(Fail, issue_24029)], + "decrypt/invalid ciphertext/mode=:/invalid ciphertext=reinterpretAsFixedString(toUUID:": + [(Fail, issue_24029)], "encrypt_mysql/encryption/mode=:datatype=UUID:": - [(Fail, issue_24029)], + [(Fail, issue_24029)], "decrypt_mysql/decryption/mode=:datatype=UUID:": - [(Fail, issue_24029)], + [(Fail, issue_24029)], "decrypt_mysql/invalid ciphertext/mode=:/invalid ciphertext=reinterpretAsFixedString(toUUID:": - [(Fail, issue_24029)], + [(Fail, issue_24029)], } @TestFeature @@ -76,33 +75,29 @@ xfails = { RQ_SRS008_AES_Functions_DifferentModes("1.0") ) @XFails(xfails) -def regression(self, local, clickhouse_binary_path, stress=None, parallel=None): +def regression(self, local, clickhouse_binary_path, stress=None): """ClickHouse AES encryption functions regression module. 
""" - top().terminating = False nodes = { "clickhouse": ("clickhouse1", "clickhouse2", "clickhouse3"), } if stress is not None: self.context.stress = stress - if parallel is not None: - self.context.parallel = parallel with Cluster(local, clickhouse_binary_path, nodes=nodes, docker_compose_project_dir=os.path.join(current_dir(), "aes_encryption_env")) as cluster: self.context.cluster = cluster - tasks = [] with Pool(5) as pool: try: - run_scenario(pool, tasks, Feature(test=load("aes_encryption.tests.encrypt", "feature"), flags=TE)) - run_scenario(pool, tasks, Feature(test=load("aes_encryption.tests.decrypt", "feature"), flags=TE)) - run_scenario(pool, tasks, Feature(test=load("aes_encryption.tests.encrypt_mysql", "feature"), flags=TE)) - run_scenario(pool, tasks, Feature(test=load("aes_encryption.tests.decrypt_mysql", "feature"), flags=TE)) - run_scenario(pool, tasks, Feature(test=load("aes_encryption.tests.compatibility.feature", "feature"), flags=TE)) + Feature(run=load("aes_encryption.tests.encrypt", "feature"), flags=TE, parallel=True, executor=pool) + Feature(run=load("aes_encryption.tests.decrypt", "feature"), flags=TE, parallel=True, executor=pool) + Feature(run=load("aes_encryption.tests.encrypt_mysql", "feature"), flags=TE, parallel=True, executor=pool) + Feature(run=load("aes_encryption.tests.decrypt_mysql", "feature"), flags=TE, parallel=True, executor=pool) + Feature(run=load("aes_encryption.tests.compatibility.feature", "feature"), flags=TE, parallel=True, executor=pool) finally: - join(tasks) + join() if main(): regression() diff --git a/tests/testflows/datetime64_extended_range/regression.py b/tests/testflows/datetime64_extended_range/regression.py index f8db9c74c9e..062c36660ed 100755 --- a/tests/testflows/datetime64_extended_range/regression.py +++ b/tests/testflows/datetime64_extended_range/regression.py @@ -61,33 +61,29 @@ xfails = { RQ_SRS_010_DateTime64_ExtendedRange("1.0"), ) @XFails(xfails) -def regression(self, local, clickhouse_binary_path, parallel=False, stress=False): +def regression(self, local, clickhouse_binary_path, stress=False): """ClickHouse DateTime64 Extended Range regression module. 
""" - top().terminating = False nodes = { "clickhouse": ("clickhouse1", "clickhouse2", "clickhouse3"), } if stress is not None: self.context.stress = stress - if parallel is not None: - self.context.parallel = parallel with Cluster(local, clickhouse_binary_path, nodes=nodes, docker_compose_project_dir=os.path.join(current_dir(), "datetime64_extended_range_env")) as cluster: self.context.cluster = cluster - tasks = [] with Pool(2) as pool: try: - run_scenario(pool, tasks, Scenario(test=load("datetime64_extended_range.tests.generic", "generic"))) - run_scenario(pool, tasks, Scenario(test=load("datetime64_extended_range.tests.non_existent_time", "feature"))) - run_scenario(pool, tasks, Scenario(test=load("datetime64_extended_range.tests.reference_times", "reference_times"))) - run_scenario(pool, tasks, Scenario(test=load("datetime64_extended_range.tests.date_time_functions", "date_time_funcs"))) - run_scenario(pool, tasks, Scenario(test=load("datetime64_extended_range.tests.type_conversion", "type_conversion"))) + Scenario(run=load("datetime64_extended_range.tests.generic", "generic"), parallel=True, executor=pool) + Scenario(run=load("datetime64_extended_range.tests.non_existent_time", "feature"), parallel=True, executor=pool) + Scenario(run=load("datetime64_extended_range.tests.reference_times", "reference_times"), parallel=True, executor=pool) + Scenario(run=load("datetime64_extended_range.tests.date_time_functions", "date_time_funcs"), parallel=True, executor=pool) + Scenario(run=load("datetime64_extended_range.tests.type_conversion", "type_conversion"), parallel=True, executor=pool) finally: - join(tasks) + join() if main(): regression() diff --git a/tests/testflows/datetime64_extended_range/tests/common.py b/tests/testflows/datetime64_extended_range/tests/common.py index f30381d2f9e..c3bee076bf4 100644 --- a/tests/testflows/datetime64_extended_range/tests/common.py +++ b/tests/testflows/datetime64_extended_range/tests/common.py @@ -128,7 +128,6 @@ def walk_datetime_in_incrementing_steps(self, date, hrs_range=(0, 24), step=1, t stress = self.context.stress secs = f"00{'.' * (precision > 0)}{'0' * precision}" - tasks = [] with Pool(2) as pool: try: with When(f"I loop through datetime range {hrs_range} starting from {date} in {step}min increments"): @@ -138,11 +137,11 @@ def walk_datetime_in_incrementing_steps(self, date, hrs_range=(0, 24), step=1, t expected = datetime with When(f"time is {datetime}"): - run_scenario(pool, tasks, Test(name=f"{hrs}:{mins}:{secs}", test=select_check_datetime), - kwargs=dict(datetime=datetime, precision=precision, timezone=timezone, - expected=expected)) + Test(name=f"{hrs}:{mins}:{secs}", test=select_check_datetime, parallel=True, executor=pool)( + datetime=datetime, precision=precision, timezone=timezone, + expected=expected) finally: - join(tasks) + join() @TestStep @@ -159,7 +158,6 @@ def walk_datetime_in_decrementing_steps(self, date, hrs_range=(23, 0), step=1, t stress = self.context.stress secs = f"00{'.' 
* (precision > 0)}{'0' * precision}" - tasks = [] with Pool(2) as pool: try: with When(f"I loop through datetime range {hrs_range} starting from {date} in {step}min decrements"): @@ -169,8 +167,8 @@ def walk_datetime_in_decrementing_steps(self, date, hrs_range=(23, 0), step=1, t expected = datetime with When(f"time is {datetime}"): - run_scenario(pool, tasks, Test(name=f"{hrs}:{mins}:{secs}", test=select_check_datetime), - kwargs=dict(datetime=datetime, precision=precision, timezone=timezone, - expected=expected)) + Test(name=f"{hrs}:{mins}:{secs}", test=select_check_datetime, parallel=True, executor=pool)( + datetime=datetime, precision=precision, timezone=timezone, + expected=expected) finally: - join(tasks) + join() diff --git a/tests/testflows/datetime64_extended_range/tests/date_time_functions.py b/tests/testflows/datetime64_extended_range/tests/date_time_functions.py index b0b22f82452..7338f34668a 100644 --- a/tests/testflows/datetime64_extended_range/tests/date_time_functions.py +++ b/tests/testflows/datetime64_extended_range/tests/date_time_functions.py @@ -3,7 +3,7 @@ import pytz import itertools from testflows.core import * import dateutil.relativedelta as rd -from datetime import datetime, timedelta +from datetime import datetime from datetime64_extended_range.requirements.requirements import * from datetime64_extended_range.common import * @@ -1536,10 +1536,9 @@ def date_time_funcs(self, node="clickhouse1"): """ self.context.node = self.context.cluster.node(node) - tasks = [] with Pool(4) as pool: try: for scenario in loads(current_module(), Scenario): - run_scenario(pool, tasks, Scenario(test=scenario)) + Scenario(run=scenario, parallel=True, executor=pool) finally: - join(tasks) + join() diff --git a/tests/testflows/datetime64_extended_range/tests/generic.py b/tests/testflows/datetime64_extended_range/tests/generic.py index 8cb56b99545..6eb117553e0 100644 --- a/tests/testflows/datetime64_extended_range/tests/generic.py +++ b/tests/testflows/datetime64_extended_range/tests/generic.py @@ -5,7 +5,6 @@ from datetime64_extended_range.common import * from datetime64_extended_range.tests.common import * import pytz -import datetime import itertools @TestScenario diff --git a/tests/testflows/datetime64_extended_range/tests/reference_times.py b/tests/testflows/datetime64_extended_range/tests/reference_times.py index 4d4762cc756..cdec3eb260c 100644 --- a/tests/testflows/datetime64_extended_range/tests/reference_times.py +++ b/tests/testflows/datetime64_extended_range/tests/reference_times.py @@ -3,7 +3,6 @@ from datetime import datetime from testflows.core import * from datetime64_extended_range.common import * -from datetime64_extended_range.tests.common import select_check_datetime from datetime64_extended_range.requirements.requirements import * from datetime64_extended_range.tests.common import * diff --git a/tests/testflows/example/regression.py b/tests/testflows/example/regression.py index 4e7ed1025e4..7a0c94a7cd4 100755 --- a/tests/testflows/example/regression.py +++ b/tests/testflows/example/regression.py @@ -11,18 +11,15 @@ from helpers.argparser import argparser @TestFeature @Name("example") @ArgumentParser(argparser) -def regression(self, local, clickhouse_binary_path, stress=None, parallel=None): +def regression(self, local, clickhouse_binary_path, stress=None): """Simple example of how you can use TestFlows to test ClickHouse. 
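In `tests/common.py` the per-timestamp checks now pass their keyword arguments by calling the scheduled test directly, instead of through the old `kwargs=dict(...)` argument of `run_scenario`. A sketch of that call form, assuming `select_check_datetime` is the step from `datetime64_extended_range.tests.common` referenced above and using made-up argument values:

```python
from testflows.core import *
from datetime64_extended_range.tests.common import select_check_datetime  # step used in the hunks above

with Pool(2) as pool:
    try:
        # defining the test with parallel=True/executor=pool and then calling it
        # forwards the keyword arguments to the test once it runs on the pool
        Test(name="12:30:00", test=select_check_datetime, parallel=True, executor=pool)(
            datetime="2000-01-01 12:30:00", precision=0, timezone="UTC",
            expected="2000-01-01 12:30:00")
    finally:
        join()
```

The same idea is used in `date_time_functions.py`, which loops over `loads(current_module(), Scenario)` and schedules each scenario with `Scenario(run=scenario, parallel=True, executor=pool)`.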
""" - top().terminating = False nodes = { "clickhouse": ("clickhouse1",), } if stress is not None: self.context.stress = stress - if parallel is not None: - self.context.parallel = parallel with Cluster(local, clickhouse_binary_path, nodes=nodes, docker_compose_project_dir=os.path.join(current_dir(), "example_env")) as cluster: diff --git a/tests/testflows/extended_precision_data_types/regression.py b/tests/testflows/extended_precision_data_types/regression.py index 8fea6f68e5c..5572381d817 100755 --- a/tests/testflows/extended_precision_data_types/regression.py +++ b/tests/testflows/extended_precision_data_types/regression.py @@ -27,25 +27,20 @@ xflags = { @Requirements( RQ_SRS_020_ClickHouse_Extended_Precision("1.0"), ) -def regression(self, local, clickhouse_binary_path, stress=None, parallel=None): +def regression(self, local, clickhouse_binary_path, stress=None): """Extended precision data type regression. """ - - top().terminating = False - nodes = { "clickhouse": ("clickhouse1",) } + with Cluster(local, clickhouse_binary_path, nodes=nodes, docker_compose_project_dir=os.path.join(current_dir(), "extended-precision-data-type_env")) as cluster: self.context.cluster = cluster self.context.stress = stress - if parallel is not None: - self.context.parallel = parallel - Feature(run=load("extended_precision_data_types.tests.feature", "feature")) if main(): diff --git a/tests/testflows/extended_precision_data_types/snapshots/common.py.tests.snapshot b/tests/testflows/extended_precision_data_types/snapshots/common.py.tests.snapshot index 18b58b0cfdc..e0414393111 100644 --- a/tests/testflows/extended_precision_data_types/snapshots/common.py.tests.snapshot +++ b/tests/testflows/extended_precision_data_types/snapshots/common.py.tests.snapshot @@ -1045,6 +1045,7 @@ a mapPopulateSeries_with_Int128_on_a_table = r""" a +([1,2,3,4,5],[1,2,3,0,0]) """ mapContains_with_Int128_on_a_table = r""" @@ -1575,6 +1576,7 @@ a mapPopulateSeries_with_Int256_on_a_table = r""" a +([1,2,3,4,5],[1,2,3,0,0]) """ mapContains_with_Int256_on_a_table = r""" @@ -2105,6 +2107,7 @@ a mapPopulateSeries_with_UInt128_on_a_table = r""" a +([1,2,3,4,5],[1,2,3,0,0]) """ mapContains_with_UInt128_on_a_table = r""" @@ -2635,6 +2638,7 @@ a mapPopulateSeries_with_UInt256_on_a_table = r""" a +([1,2,3,4,5],[1,2,3,0,0]) """ mapContains_with_UInt256_on_a_table = r""" diff --git a/tests/testflows/extended_precision_data_types/tests/array_tuple_map.py b/tests/testflows/extended_precision_data_types/tests/array_tuple_map.py index 938beabfff4..c39574ba75e 100644 --- a/tests/testflows/extended_precision_data_types/tests/array_tuple_map.py +++ b/tests/testflows/extended_precision_data_types/tests/array_tuple_map.py @@ -357,7 +357,7 @@ def map_func(self, data_type, node=None): exitcode, message = 0, None if data_type.startswith("Decimal"): - exitcode, message = 43, "Exception:" + exitcode, message = 43, "Exception:" node.query(sql, exitcode=exitcode, message=message) execute_query(f"""SELECT * FROM {table_name} ORDER BY a ASC""") @@ -393,9 +393,13 @@ def map_func(self, data_type, node=None): execute_query(f"SELECT * FROM {table_name} ORDER BY a ASC") with Scenario(f"mapPopulateSeries with {data_type}"): - node.query(f"SELECT mapPopulateSeries([1,2,3], [{to_data_type(data_type,1)}," - f"{to_data_type(data_type,2)}, {to_data_type(data_type,3)}], 5)", - exitcode = 44, message='Exception:') + sql = (f"SELECT mapPopulateSeries([1,2,3], [{to_data_type(data_type,1)}," + f"{to_data_type(data_type,2)}, {to_data_type(data_type,3)}], 5)") + + exitcode, 
message = 0, None + if data_type.startswith("Decimal"): + exitcode, message = 44, "Exception:" + node.query(sql, exitcode=exitcode, message=message) with Scenario(f"mapPopulateSeries with {data_type} on a table"): table_name = get_table_name() @@ -403,9 +407,13 @@ def map_func(self, data_type, node=None): table(name = table_name, data_type = f'Tuple(Array({data_type}), Array({data_type}))') with When("I insert the output into a table"): - node.query(f"INSERT INTO {table_name} SELECT mapPopulateSeries([1,2,3]," - f"[{to_data_type(data_type,1)}, {to_data_type(data_type,2)}, {to_data_type(data_type,3)}], 5)", - exitcode = 44, message='Exception:') + sql = (f"INSERT INTO {table_name} SELECT mapPopulateSeries([1,2,3]," + f"[{to_data_type(data_type,1)}, {to_data_type(data_type,2)}, {to_data_type(data_type,3)}], 5)") + + exitcode, message = 0, None + if data_type.startswith("Decimal"): + exitcode, message = 44, "Exception:" + node.query(sql, exitcode=exitcode, message=message) execute_query(f"SELECT * FROM {table_name} ORDER BY a ASC") diff --git a/tests/testflows/helpers/argparser.py b/tests/testflows/helpers/argparser.py index 03014becb76..63012601e3b 100644 --- a/tests/testflows/helpers/argparser.py +++ b/tests/testflows/helpers/argparser.py @@ -1,12 +1,5 @@ import os -def onoff(v): - if v in ["yes", "1", "on"]: - return True - elif v in ["no", "0", "off"]: - return False - raise ValueError(f"invalid {v}") - def argparser(parser): """Default argument parser for regressions. """ @@ -21,6 +14,3 @@ def argparser(parser): parser.add_argument("--stress", action="store_true", default=False, help="enable stress testing (might take a long time)") - - parser.add_argument("--parallel", type=onoff, default=True, choices=["yes", "no", "on", "off", 0, 1], - help="enable parallelism for tests that support it") \ No newline at end of file diff --git a/tests/testflows/helpers/cluster.py b/tests/testflows/helpers/cluster.py index 0b704093ff8..5b987c1e376 100755 --- a/tests/testflows/helpers/cluster.py +++ b/tests/testflows/helpers/cluster.py @@ -295,7 +295,7 @@ class Cluster(object): self.docker_compose += f" --ansi never --project-directory \"{docker_compose_project_dir}\" --file \"{docker_compose_file_path}\"" self.lock = threading.Lock() - + @property def control_shell(self, timeout=300): """Must be called with self.lock.acquired. 
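The `array_tuple_map.py` hunks replace unconditional error expectations with ones that depend on the data type: only `Decimal*` arguments are still expected to fail, while the extended integer types now succeed, which is why the snapshot file gains `([1,2,3,4,5],[1,2,3,0,0])` rows for `mapPopulateSeries`. A sketch of the expectation pattern, reusing the names (`node`, `sql`, `data_type`) and exit code from the hunks above:

```python
# expectation pattern used in map_func above: only Decimal types still fail
exitcode, message = 0, None
if data_type.startswith("Decimal"):
    exitcode, message = 44, "Exception:"  # exit code taken from the hunk above
node.query(sql, exitcode=exitcode, message=message)

# for the integer extended-precision types the query now succeeds, so the
# follow-up SELECT returns the populated map, matching the new snapshot rows:
#   ([1,2,3,4,5],[1,2,3,0,0])
```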
@@ -339,7 +339,7 @@ class Cluster(object): if node is not None: with self.lock: container_id = self.node_container_id(node=node, timeout=timeout) - + time_start = time.time() while True: try: @@ -367,12 +367,6 @@ class Cluster(object): """ test = current() - if top().terminating: - if test and (test.cflags & MANDATORY and test.subtype is not TestSubType.Given): - pass - else: - raise InterruptedError("terminating") - current_thread = threading.current_thread() id = f"{current_thread.name}-{node}" diff --git a/tests/testflows/helpers/common.py b/tests/testflows/helpers/common.py index 2afcc591f98..6110074b137 100644 --- a/tests/testflows/helpers/common.py +++ b/tests/testflows/helpers/common.py @@ -1,15 +1,11 @@ import testflows.settings as settings - from testflows.core import * -from multiprocessing.dummy import Pool -from multiprocessing import TimeoutError as PoolTaskTimeoutError - @TestStep(Given) def instrument_clickhouse_server_log(self, node=None, test=None, clickhouse_server_log="/var/log/clickhouse-server/clickhouse-server.log"): """Instrument clickhouse-server.log for the current test (default) - by adding start and end messages that include test name to log + by adding start and end messages that include test name to log of the specified node. If we are in the debug mode and the test fails then dump the messages from the log for this test. """ @@ -29,7 +25,7 @@ def instrument_clickhouse_server_log(self, node=None, test=None, yield finally: - if top().terminating is True: + if test.terminating is True: return with Finally("adding test name end message to the clickhouse-server.log", flags=TE): @@ -44,65 +40,3 @@ def instrument_clickhouse_server_log(self, node=None, test=None, with Then("dumping clickhouse-server.log for this test"): node.command(f"tail -c +{start_logsize} {clickhouse_server_log}" f" | head -c {int(end_logsize) - int(start_logsize)}") - -def join(tasks, timeout=None, polling=5): - """Join all parallel tests. - """ - exc = None - - for task in tasks: - task._join_timeout = timeout - - while tasks: - try: - try: - tasks[0].get(timeout=polling) - tasks.pop(0) - - except PoolTaskTimeoutError as e: - task = tasks.pop(0) - if task._join_timeout is not None: - task._join_timeout -= polling - if task._join_timeout <= 0: - raise - tasks.append(task) - continue - - except KeyboardInterrupt as e: - top().terminating = True - raise - - except Exception as e: - tasks.pop(0) - if exc is None: - exc = e - top().terminating = True - - if exc is not None: - raise exc - -def start(pool, tasks, scenario, kwargs=None): - """Start parallel test. 
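With the global `top().terminating` flag removed, the log-instrumentation Given step in `helpers/common.py` now asks the current test itself whether it is terminating before running its teardown half. A minimal sketch of that step shape; the name `instrumented_given` and the setup/teardown bodies are placeholders, while the real step records log offsets as shown above:

```python
from testflows.core import *

@TestStep(Given)
def instrumented_given(self, test=None):
    """Placeholder Given step mirroring instrument_clickhouse_server_log above."""
    if test is None:
        test = current()
    try:
        # setup half: e.g. record the current clickhouse-server.log size
        yield
    finally:
        if test.terminating is True:
            # the owning test is shutting down; skip the teardown half
            return
        with Finally("teardown", flags=TE):
            pass  # end-of-test log marker, log dump on failure, etc.
```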
- """ - if kwargs is None: - kwargs = {} - - task = pool.apply_async(scenario, [], kwargs) - tasks.append(task) - - return task - -def run_scenario(pool, tasks, scenario, kwargs=None): - if kwargs is None: - kwargs = {} - - _top = top() - def _scenario_wrapper(**kwargs): - if _top.terminating: - return - return scenario(**kwargs) - - if current().context.parallel: - start(pool, tasks, _scenario_wrapper, kwargs) - else: - scenario(**kwargs) diff --git a/tests/testflows/kerberos/configs/clickhouse1/config.d/kerberos.xml b/tests/testflows/kerberos/configs/clickhouse1/config.d/kerberos.xml index ceaa497c561..e45c4519c73 100644 --- a/tests/testflows/kerberos/configs/clickhouse1/config.d/kerberos.xml +++ b/tests/testflows/kerberos/configs/clickhouse1/config.d/kerberos.xml @@ -1,5 +1,6 @@ + EXAMPLE.COM - \ No newline at end of file + diff --git a/tests/testflows/kerberos/configs/clickhouse3/config.d/kerberos.xml b/tests/testflows/kerberos/configs/clickhouse3/config.d/kerberos.xml new file mode 100644 index 00000000000..e45c4519c73 --- /dev/null +++ b/tests/testflows/kerberos/configs/clickhouse3/config.d/kerberos.xml @@ -0,0 +1,6 @@ + + + + EXAMPLE.COM + + diff --git a/tests/testflows/kerberos/configs/kerberos/etc/krb5.conf b/tests/testflows/kerberos/configs/kerberos/etc/krb5.conf index b963fc25daa..602ca76abbe 100644 --- a/tests/testflows/kerberos/configs/kerberos/etc/krb5.conf +++ b/tests/testflows/kerberos/configs/kerberos/etc/krb5.conf @@ -3,17 +3,14 @@ [libdefaults] default_realm = EXAMPLE.COM - ticket_lifetime = 24000 - dns_lookup_realm = false - dns_lookup_kdc = false - dns_fallback = false - rdns = false + ticket_lifetime = 36000 + dns_lookup_kdc = false [realms] - EXAMPLE.COM = { - kdc = kerberos - admin_server = kerberos - } + EXAMPLE.COM = { + kdc = kerberos_env_kerberos_1.krbnet + admin_server = kerberos_env_kerberos_1.krbnet + } OTHER.COM = { kdc = kerberos admin_server = kerberos @@ -22,6 +19,10 @@ [domain_realm] docker-compose_default = EXAMPLE.COM .docker-compose_default = EXAMPLE.COM + krbnet = EXAMPLE.COM + .krbnet = EXAMPLE.COM + kerberos_env_default = EXAMPLE.COM + .kerberos_env_default = EXAMPLE.COM [appdefaults] validate = false diff --git a/tests/testflows/kerberos/kerberos_env/docker-compose.yml b/tests/testflows/kerberos/kerberos_env/docker-compose.yml index d1a74662a83..e89d18a5299 100644 --- a/tests/testflows/kerberos/kerberos_env/docker-compose.yml +++ b/tests/testflows/kerberos/kerberos_env/docker-compose.yml @@ -73,3 +73,8 @@ services: condition: service_healthy kerberos: condition: service_healthy + +networks: + default: + name: krbnet + driver: bridge diff --git a/tests/testflows/kerberos/kerberos_env/kerberos-service.yml b/tests/testflows/kerberos/kerberos_env/kerberos-service.yml index 3f21e93e0b6..b34751258da 100644 --- a/tests/testflows/kerberos/kerberos_env/kerberos-service.yml +++ b/tests/testflows/kerberos/kerberos_env/kerberos-service.yml @@ -3,7 +3,6 @@ version: '2.3' services: kerberos: image: zvonand/docker-krb5-server:1.0.0 - restart: always expose: - "88" - "464" @@ -17,7 +16,7 @@ services: environment: KRB5_PASS: pwd KRB5_REALM: EXAMPLE.COM - KRB5_KDC: localhost + KRB5_KDC: 0.0.0.0 volumes: - "${CLICKHOUSE_TESTS_DIR}/configs/kerberos/etc/krb5kdc/kdc.conf:/etc/krb5kdc/kdc.conf" - "${CLICKHOUSE_TESTS_DIR}/_instances/kerberos/krb5kdc/log/kdc.log:/usr/local/var/krb5kdc/kdc.log" diff --git a/tests/testflows/kerberos/regression.py b/tests/testflows/kerberos/regression.py index ca174aaff08..d1b13acc1c9 100755 --- a/tests/testflows/kerberos/regression.py +++ 
b/tests/testflows/kerberos/regression.py @@ -10,6 +10,7 @@ from helpers.argparser import argparser from kerberos.requirements.requirements import * xfails = { + "config/principal and realm specified/:": [(Fail, "https://github.com/ClickHouse/ClickHouse/issues/26197")], } @@ -20,10 +21,9 @@ xfails = { RQ_SRS_016_Kerberos("1.0") ) @XFails(xfails) -def regression(self, local, clickhouse_binary_path, stress=None, parallel=None): +def regression(self, local, clickhouse_binary_path, stress=None): """ClickHouse Kerberos authentication test regression module. """ - top().terminating = False nodes = { "clickhouse": ("clickhouse1", "clickhouse2", "clickhouse3"), "kerberos": ("kerberos", ), @@ -31,8 +31,6 @@ def regression(self, local, clickhouse_binary_path, stress=None, parallel=None): if stress is not None: self.context.stress = stress - if parallel is not None: - self.context.parallel = parallel with Cluster(local, clickhouse_binary_path, nodes=nodes, docker_compose_project_dir=os.path.join(current_dir(), "kerberos_env")) as cluster: @@ -42,6 +40,5 @@ def regression(self, local, clickhouse_binary_path, stress=None, parallel=None): Feature(run=load("kerberos.tests.config", "config"), flags=TE) Feature(run=load("kerberos.tests.parallel", "parallel"), flags=TE) - if main(): regression() diff --git a/tests/testflows/kerberos/requirements/requirements.md b/tests/testflows/kerberos/requirements/requirements.md index 2121dd343b8..8f2b3b7e11e 100644 --- a/tests/testflows/kerberos/requirements/requirements.md +++ b/tests/testflows/kerberos/requirements/requirements.md @@ -9,38 +9,41 @@ * 4 [Requirements](#requirements) * 4.1 [Generic](#generic) * 4.1.1 [RQ.SRS-016.Kerberos](#rqsrs-016kerberos) - * 4.2 [Configuration](#configuration) - * 4.2.1 [RQ.SRS-016.Kerberos.Configuration.MultipleAuthMethods](#rqsrs-016kerberosconfigurationmultipleauthmethods) - * 4.2.2 [RQ.SRS-016.Kerberos.Configuration.KerberosNotEnabled](#rqsrs-016kerberosconfigurationkerberosnotenabled) - * 4.2.3 [RQ.SRS-016.Kerberos.Configuration.MultipleKerberosSections](#rqsrs-016kerberosconfigurationmultiplekerberossections) - * 4.2.4 [RQ.SRS-016.Kerberos.Configuration.WrongUserRealm](#rqsrs-016kerberosconfigurationwronguserrealm) - * 4.2.5 [RQ.SRS-016.Kerberos.Configuration.PrincipalAndRealmSpecified](#rqsrs-016kerberosconfigurationprincipalandrealmspecified) - * 4.2.6 [RQ.SRS-016.Kerberos.Configuration.MultiplePrincipalSections](#rqsrs-016kerberosconfigurationmultipleprincipalsections) - * 4.2.7 [RQ.SRS-016.Kerberos.Configuration.MultipleRealmSections](#rqsrs-016kerberosconfigurationmultiplerealmsections) - * 4.3 [Valid User](#valid-user) - * 4.3.1 [RQ.SRS-016.Kerberos.ValidUser.XMLConfiguredUser](#rqsrs-016kerberosvaliduserxmlconfigureduser) - * 4.3.2 [RQ.SRS-016.Kerberos.ValidUser.RBACConfiguredUser](#rqsrs-016kerberosvaliduserrbacconfigureduser) - * 4.3.3 [RQ.SRS-016.Kerberos.ValidUser.KerberosNotConfigured](#rqsrs-016kerberosvaliduserkerberosnotconfigured) - * 4.4 [Invalid User](#invalid-user) - * 4.4.1 [RQ.SRS-016.Kerberos.InvalidUser](#rqsrs-016kerberosinvaliduser) - * 4.4.2 [RQ.SRS-016.Kerberos.InvalidUser.UserDeleted](#rqsrs-016kerberosinvaliduseruserdeleted) - * 4.5 [Kerberos Not Available](#kerberos-not-available) - * 4.5.1 [RQ.SRS-016.Kerberos.KerberosNotAvailable.InvalidServerTicket](#rqsrs-016kerberoskerberosnotavailableinvalidserverticket) - * 4.5.2 [RQ.SRS-016.Kerberos.KerberosNotAvailable.InvalidClientTicket](#rqsrs-016kerberoskerberosnotavailableinvalidclientticket) - * 4.5.3 
[RQ.SRS-016.Kerberos.KerberosNotAvailable.ValidTickets](#rqsrs-016kerberoskerberosnotavailablevalidtickets) - * 4.6 [Kerberos Restarted](#kerberos-restarted) - * 4.6.1 [RQ.SRS-016.Kerberos.KerberosServerRestarted](#rqsrs-016kerberoskerberosserverrestarted) - * 4.7 [Performance](#performance) - * 4.7.1 [RQ.SRS-016.Kerberos.Performance](#rqsrs-016kerberosperformance) - * 4.8 [Parallel Requests processing](#parallel-requests-processing) - * 4.8.1 [RQ.SRS-016.Kerberos.Parallel](#rqsrs-016kerberosparallel) - * 4.8.2 [RQ.SRS-016.Kerberos.Parallel.ValidRequests.KerberosAndNonKerberos](#rqsrs-016kerberosparallelvalidrequestskerberosandnonkerberos) - * 4.8.3 [RQ.SRS-016.Kerberos.Parallel.ValidRequests.SameCredentials](#rqsrs-016kerberosparallelvalidrequestssamecredentials) - * 4.8.4 [RQ.SRS-016.Kerberos.Parallel.ValidRequests.DifferentCredentials](#rqsrs-016kerberosparallelvalidrequestsdifferentcredentials) - * 4.8.5 [RQ.SRS-016.Kerberos.Parallel.ValidInvalid](#rqsrs-016kerberosparallelvalidinvalid) - * 4.8.6 [RQ.SRS-016.Kerberos.Parallel.Deletion](#rqsrs-016kerberosparalleldeletion) + * 4.2 [Ping](#ping) + * 4.2.1 [RQ.SRS-016.Kerberos.Ping](#rqsrs-016kerberosping) + * 4.3 [Configuration](#configuration) + * 4.3.1 [RQ.SRS-016.Kerberos.Configuration.MultipleAuthMethods](#rqsrs-016kerberosconfigurationmultipleauthmethods) + * 4.3.2 [RQ.SRS-016.Kerberos.Configuration.KerberosNotEnabled](#rqsrs-016kerberosconfigurationkerberosnotenabled) + * 4.3.3 [RQ.SRS-016.Kerberos.Configuration.MultipleKerberosSections](#rqsrs-016kerberosconfigurationmultiplekerberossections) + * 4.3.4 [RQ.SRS-016.Kerberos.Configuration.WrongUserRealm](#rqsrs-016kerberosconfigurationwronguserrealm) + * 4.3.5 [RQ.SRS-016.Kerberos.Configuration.PrincipalAndRealmSpecified](#rqsrs-016kerberosconfigurationprincipalandrealmspecified) + * 4.3.6 [RQ.SRS-016.Kerberos.Configuration.MultiplePrincipalSections](#rqsrs-016kerberosconfigurationmultipleprincipalsections) + * 4.3.7 [RQ.SRS-016.Kerberos.Configuration.MultipleRealmSections](#rqsrs-016kerberosconfigurationmultiplerealmsections) + * 4.4 [Valid User](#valid-user) + * 4.4.1 [RQ.SRS-016.Kerberos.ValidUser.XMLConfiguredUser](#rqsrs-016kerberosvaliduserxmlconfigureduser) + * 4.4.2 [RQ.SRS-016.Kerberos.ValidUser.RBACConfiguredUser](#rqsrs-016kerberosvaliduserrbacconfigureduser) + * 4.4.3 [RQ.SRS-016.Kerberos.ValidUser.KerberosNotConfigured](#rqsrs-016kerberosvaliduserkerberosnotconfigured) + * 4.5 [Invalid User](#invalid-user) + * 4.5.1 [RQ.SRS-016.Kerberos.InvalidUser](#rqsrs-016kerberosinvaliduser) + * 4.5.2 [RQ.SRS-016.Kerberos.InvalidUser.UserDeleted](#rqsrs-016kerberosinvaliduseruserdeleted) + * 4.6 [Kerberos Not Available](#kerberos-not-available) + * 4.6.1 [RQ.SRS-016.Kerberos.KerberosNotAvailable.InvalidServerTicket](#rqsrs-016kerberoskerberosnotavailableinvalidserverticket) + * 4.6.2 [RQ.SRS-016.Kerberos.KerberosNotAvailable.InvalidClientTicket](#rqsrs-016kerberoskerberosnotavailableinvalidclientticket) + * 4.6.3 [RQ.SRS-016.Kerberos.KerberosNotAvailable.ValidTickets](#rqsrs-016kerberoskerberosnotavailablevalidtickets) + * 4.7 [Kerberos Restarted](#kerberos-restarted) + * 4.7.1 [RQ.SRS-016.Kerberos.KerberosServerRestarted](#rqsrs-016kerberoskerberosserverrestarted) + * 4.8 [Performance](#performance) + * 4.8.1 [RQ.SRS-016.Kerberos.Performance](#rqsrs-016kerberosperformance) + * 4.9 [Parallel Requests processing](#parallel-requests-processing) + * 4.9.1 [RQ.SRS-016.Kerberos.Parallel](#rqsrs-016kerberosparallel) + * 4.9.2 
[RQ.SRS-016.Kerberos.Parallel.ValidRequests.KerberosAndNonKerberos](#rqsrs-016kerberosparallelvalidrequestskerberosandnonkerberos) + * 4.9.3 [RQ.SRS-016.Kerberos.Parallel.ValidRequests.SameCredentials](#rqsrs-016kerberosparallelvalidrequestssamecredentials) + * 4.9.4 [RQ.SRS-016.Kerberos.Parallel.ValidRequests.DifferentCredentials](#rqsrs-016kerberosparallelvalidrequestsdifferentcredentials) + * 4.9.5 [RQ.SRS-016.Kerberos.Parallel.ValidInvalid](#rqsrs-016kerberosparallelvalidinvalid) + * 4.9.6 [RQ.SRS-016.Kerberos.Parallel.Deletion](#rqsrs-016kerberosparalleldeletion) * 5 [References](#references) + ## Revision History This document is stored in an electronic form using [Git] source control management software @@ -85,6 +88,13 @@ version: 1.0 [ClickHouse] SHALL support user authentication using [Kerberos] server. +### Ping + +#### RQ.SRS-016.Kerberos.Ping +version: 1.0 + +Docker containers SHALL be able to ping each other. + ### Configuration #### RQ.SRS-016.Kerberos.Configuration.MultipleAuthMethods @@ -278,4 +288,3 @@ version: 1.0 [Revision History]: https://github.com/ClickHouse/ClickHouse/commits/master/tests/testflows/kerberos/requirements/requirements.md [Git]: https://git-scm.com/ [Kerberos terminology]: https://web.mit.edu/kerberos/kfw-4.1/kfw-4.1/kfw-4.1-help/html/kerberos_terminology.htm - diff --git a/tests/testflows/kerberos/requirements/requirements.py b/tests/testflows/kerberos/requirements/requirements.py index 5c49e7d127f..418c51ca8b3 100644 --- a/tests/testflows/kerberos/requirements/requirements.py +++ b/tests/testflows/kerberos/requirements/requirements.py @@ -1,6 +1,6 @@ # These requirements were auto generated # from software requirements specification (SRS) -# document by TestFlows v1.6.201216.1172002. +# document by TestFlows v1.6.210312.1172513. # Do not edit by hand but re-generate instead # using 'tfs requirements generate' command. 
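The new `RQ.SRS-016.Kerberos.Ping` requirement added to the SRS above gets a matching `Requirement` object in the auto-generated `requirements.py` below, and is attached to a test the same way the other requirements in this suite are, through the `@Requirements` decorator. A stripped-down sketch of that linkage; the concrete `ping` scenario is the one added to `kerberos/tests/generic.py` later in this diff:

```python
from testflows.core import *
from kerberos.requirements.requirements import RQ_SRS_016_Kerberos_Ping

@TestScenario
@Requirements(RQ_SRS_016_Kerberos_Ping("1.0"))
def ping(self):
    """Containers SHALL be able to ping each other."""
    # reachability checks against the kerberos container go here; the concrete
    # implementation is the ping scenario added to kerberos/tests/generic.py below
```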
from testflows.core import Specification @@ -23,6 +23,21 @@ RQ_SRS_016_Kerberos = Requirement( level=3, num='4.1.1') +RQ_SRS_016_Kerberos_Ping = Requirement( + name='RQ.SRS-016.Kerberos.Ping', + version='1.0', + priority=None, + group=None, + type=None, + uid=None, + description=( + 'Docker containers SHALL be able to ping each other.\n' + '\n' + ), + link=None, + level=3, + num='4.2.1') + RQ_SRS_016_Kerberos_Configuration_MultipleAuthMethods = Requirement( name='RQ.SRS-016.Kerberos.Configuration.MultipleAuthMethods', version='1.0', @@ -36,7 +51,7 @@ RQ_SRS_016_Kerberos_Configuration_MultipleAuthMethods = Requirement( ), link=None, level=3, - num='4.2.1') + num='4.3.1') RQ_SRS_016_Kerberos_Configuration_KerberosNotEnabled = Requirement( name='RQ.SRS-016.Kerberos.Configuration.KerberosNotEnabled', @@ -74,7 +89,7 @@ RQ_SRS_016_Kerberos_Configuration_KerberosNotEnabled = Requirement( ), link=None, level=3, - num='4.2.2') + num='4.3.2') RQ_SRS_016_Kerberos_Configuration_MultipleKerberosSections = Requirement( name='RQ.SRS-016.Kerberos.Configuration.MultipleKerberosSections', @@ -89,7 +104,7 @@ RQ_SRS_016_Kerberos_Configuration_MultipleKerberosSections = Requirement( ), link=None, level=3, - num='4.2.3') + num='4.3.3') RQ_SRS_016_Kerberos_Configuration_WrongUserRealm = Requirement( name='RQ.SRS-016.Kerberos.Configuration.WrongUserRealm', @@ -104,7 +119,7 @@ RQ_SRS_016_Kerberos_Configuration_WrongUserRealm = Requirement( ), link=None, level=3, - num='4.2.4') + num='4.3.4') RQ_SRS_016_Kerberos_Configuration_PrincipalAndRealmSpecified = Requirement( name='RQ.SRS-016.Kerberos.Configuration.PrincipalAndRealmSpecified', @@ -119,7 +134,7 @@ RQ_SRS_016_Kerberos_Configuration_PrincipalAndRealmSpecified = Requirement( ), link=None, level=3, - num='4.2.5') + num='4.3.5') RQ_SRS_016_Kerberos_Configuration_MultiplePrincipalSections = Requirement( name='RQ.SRS-016.Kerberos.Configuration.MultiplePrincipalSections', @@ -134,7 +149,7 @@ RQ_SRS_016_Kerberos_Configuration_MultiplePrincipalSections = Requirement( ), link=None, level=3, - num='4.2.6') + num='4.3.6') RQ_SRS_016_Kerberos_Configuration_MultipleRealmSections = Requirement( name='RQ.SRS-016.Kerberos.Configuration.MultipleRealmSections', @@ -149,7 +164,7 @@ RQ_SRS_016_Kerberos_Configuration_MultipleRealmSections = Requirement( ), link=None, level=3, - num='4.2.7') + num='4.3.7') RQ_SRS_016_Kerberos_ValidUser_XMLConfiguredUser = Requirement( name='RQ.SRS-016.Kerberos.ValidUser.XMLConfiguredUser', @@ -179,7 +194,7 @@ RQ_SRS_016_Kerberos_ValidUser_XMLConfiguredUser = Requirement( ), link=None, level=3, - num='4.3.1') + num='4.4.1') RQ_SRS_016_Kerberos_ValidUser_RBACConfiguredUser = Requirement( name='RQ.SRS-016.Kerberos.ValidUser.RBACConfiguredUser', @@ -204,7 +219,7 @@ RQ_SRS_016_Kerberos_ValidUser_RBACConfiguredUser = Requirement( ), link=None, level=3, - num='4.3.2') + num='4.4.2') RQ_SRS_016_Kerberos_ValidUser_KerberosNotConfigured = Requirement( name='RQ.SRS-016.Kerberos.ValidUser.KerberosNotConfigured', @@ -219,7 +234,7 @@ RQ_SRS_016_Kerberos_ValidUser_KerberosNotConfigured = Requirement( ), link=None, level=3, - num='4.3.3') + num='4.4.3') RQ_SRS_016_Kerberos_InvalidUser = Requirement( name='RQ.SRS-016.Kerberos.InvalidUser', @@ -234,7 +249,7 @@ RQ_SRS_016_Kerberos_InvalidUser = Requirement( ), link=None, level=3, - num='4.4.1') + num='4.5.1') RQ_SRS_016_Kerberos_InvalidUser_UserDeleted = Requirement( name='RQ.SRS-016.Kerberos.InvalidUser.UserDeleted', @@ -249,7 +264,7 @@ RQ_SRS_016_Kerberos_InvalidUser_UserDeleted = Requirement( ), link=None, 
level=3, - num='4.4.2') + num='4.5.2') RQ_SRS_016_Kerberos_KerberosNotAvailable_InvalidServerTicket = Requirement( name='RQ.SRS-016.Kerberos.KerberosNotAvailable.InvalidServerTicket', @@ -264,7 +279,7 @@ RQ_SRS_016_Kerberos_KerberosNotAvailable_InvalidServerTicket = Requirement( ), link=None, level=3, - num='4.5.1') + num='4.6.1') RQ_SRS_016_Kerberos_KerberosNotAvailable_InvalidClientTicket = Requirement( name='RQ.SRS-016.Kerberos.KerberosNotAvailable.InvalidClientTicket', @@ -279,7 +294,7 @@ RQ_SRS_016_Kerberos_KerberosNotAvailable_InvalidClientTicket = Requirement( ), link=None, level=3, - num='4.5.2') + num='4.6.2') RQ_SRS_016_Kerberos_KerberosNotAvailable_ValidTickets = Requirement( name='RQ.SRS-016.Kerberos.KerberosNotAvailable.ValidTickets', @@ -294,7 +309,7 @@ RQ_SRS_016_Kerberos_KerberosNotAvailable_ValidTickets = Requirement( ), link=None, level=3, - num='4.5.3') + num='4.6.3') RQ_SRS_016_Kerberos_KerberosServerRestarted = Requirement( name='RQ.SRS-016.Kerberos.KerberosServerRestarted', @@ -309,7 +324,7 @@ RQ_SRS_016_Kerberos_KerberosServerRestarted = Requirement( ), link=None, level=3, - num='4.6.1') + num='4.7.1') RQ_SRS_016_Kerberos_Performance = Requirement( name='RQ.SRS-016.Kerberos.Performance', @@ -324,7 +339,7 @@ RQ_SRS_016_Kerberos_Performance = Requirement( ), link=None, level=3, - num='4.7.1') + num='4.8.1') RQ_SRS_016_Kerberos_Parallel = Requirement( name='RQ.SRS-016.Kerberos.Parallel', @@ -339,7 +354,7 @@ RQ_SRS_016_Kerberos_Parallel = Requirement( ), link=None, level=3, - num='4.8.1') + num='4.9.1') RQ_SRS_016_Kerberos_Parallel_ValidRequests_KerberosAndNonKerberos = Requirement( name='RQ.SRS-016.Kerberos.Parallel.ValidRequests.KerberosAndNonKerberos', @@ -354,7 +369,7 @@ RQ_SRS_016_Kerberos_Parallel_ValidRequests_KerberosAndNonKerberos = Requirement( ), link=None, level=3, - num='4.8.2') + num='4.9.2') RQ_SRS_016_Kerberos_Parallel_ValidRequests_SameCredentials = Requirement( name='RQ.SRS-016.Kerberos.Parallel.ValidRequests.SameCredentials', @@ -369,7 +384,7 @@ RQ_SRS_016_Kerberos_Parallel_ValidRequests_SameCredentials = Requirement( ), link=None, level=3, - num='4.8.3') + num='4.9.3') RQ_SRS_016_Kerberos_Parallel_ValidRequests_DifferentCredentials = Requirement( name='RQ.SRS-016.Kerberos.Parallel.ValidRequests.DifferentCredentials', @@ -384,7 +399,7 @@ RQ_SRS_016_Kerberos_Parallel_ValidRequests_DifferentCredentials = Requirement( ), link=None, level=3, - num='4.8.4') + num='4.9.4') RQ_SRS_016_Kerberos_Parallel_ValidInvalid = Requirement( name='RQ.SRS-016.Kerberos.Parallel.ValidInvalid', @@ -399,7 +414,7 @@ RQ_SRS_016_Kerberos_Parallel_ValidInvalid = Requirement( ), link=None, level=3, - num='4.8.5') + num='4.9.5') RQ_SRS_016_Kerberos_Parallel_Deletion = Requirement( name='RQ.SRS-016.Kerberos.Parallel.Deletion', @@ -414,17 +429,17 @@ RQ_SRS_016_Kerberos_Parallel_Deletion = Requirement( ), link=None, level=3, - num='4.8.6') + num='4.9.6') QA_SRS016_ClickHouse_Kerberos_Authentication = Specification( name='QA-SRS016 ClickHouse Kerberos Authentication', description=None, - author='Andrey Zvonov', - date='December 14, 2020', - status='-', - approved_by='-', - approved_date='-', - approved_version='-', + author=None, + date=None, + status=None, + approved_by=None, + approved_date=None, + approved_version=None, version=None, group=None, type=None, @@ -439,40 +454,43 @@ QA_SRS016_ClickHouse_Kerberos_Authentication = Specification( Heading(name='Requirements', level=1, num='4'), Heading(name='Generic', level=2, num='4.1'), Heading(name='RQ.SRS-016.Kerberos', level=3, 
num='4.1.1'), - Heading(name='Configuration', level=2, num='4.2'), - Heading(name='RQ.SRS-016.Kerberos.Configuration.MultipleAuthMethods', level=3, num='4.2.1'), - Heading(name='RQ.SRS-016.Kerberos.Configuration.KerberosNotEnabled', level=3, num='4.2.2'), - Heading(name='RQ.SRS-016.Kerberos.Configuration.MultipleKerberosSections', level=3, num='4.2.3'), - Heading(name='RQ.SRS-016.Kerberos.Configuration.WrongUserRealm', level=3, num='4.2.4'), - Heading(name='RQ.SRS-016.Kerberos.Configuration.PrincipalAndRealmSpecified', level=3, num='4.2.5'), - Heading(name='RQ.SRS-016.Kerberos.Configuration.MultiplePrincipalSections', level=3, num='4.2.6'), - Heading(name='RQ.SRS-016.Kerberos.Configuration.MultipleRealmSections', level=3, num='4.2.7'), - Heading(name='Valid User', level=2, num='4.3'), - Heading(name='RQ.SRS-016.Kerberos.ValidUser.XMLConfiguredUser', level=3, num='4.3.1'), - Heading(name='RQ.SRS-016.Kerberos.ValidUser.RBACConfiguredUser', level=3, num='4.3.2'), - Heading(name='RQ.SRS-016.Kerberos.ValidUser.KerberosNotConfigured', level=3, num='4.3.3'), - Heading(name='Invalid User', level=2, num='4.4'), - Heading(name='RQ.SRS-016.Kerberos.InvalidUser', level=3, num='4.4.1'), - Heading(name='RQ.SRS-016.Kerberos.InvalidUser.UserDeleted', level=3, num='4.4.2'), - Heading(name='Kerberos Not Available', level=2, num='4.5'), - Heading(name='RQ.SRS-016.Kerberos.KerberosNotAvailable.InvalidServerTicket', level=3, num='4.5.1'), - Heading(name='RQ.SRS-016.Kerberos.KerberosNotAvailable.InvalidClientTicket', level=3, num='4.5.2'), - Heading(name='RQ.SRS-016.Kerberos.KerberosNotAvailable.ValidTickets', level=3, num='4.5.3'), - Heading(name='Kerberos Restarted', level=2, num='4.6'), - Heading(name='RQ.SRS-016.Kerberos.KerberosServerRestarted', level=3, num='4.6.1'), - Heading(name='Performance', level=2, num='4.7'), - Heading(name='RQ.SRS-016.Kerberos.Performance', level=3, num='4.7.1'), - Heading(name='Parallel Requests processing', level=2, num='4.8'), - Heading(name='RQ.SRS-016.Kerberos.Parallel', level=3, num='4.8.1'), - Heading(name='RQ.SRS-016.Kerberos.Parallel.ValidRequests.KerberosAndNonKerberos', level=3, num='4.8.2'), - Heading(name='RQ.SRS-016.Kerberos.Parallel.ValidRequests.SameCredentials', level=3, num='4.8.3'), - Heading(name='RQ.SRS-016.Kerberos.Parallel.ValidRequests.DifferentCredentials', level=3, num='4.8.4'), - Heading(name='RQ.SRS-016.Kerberos.Parallel.ValidInvalid', level=3, num='4.8.5'), - Heading(name='RQ.SRS-016.Kerberos.Parallel.Deletion', level=3, num='4.8.6'), + Heading(name='Ping', level=2, num='4.2'), + Heading(name='RQ.SRS-016.Kerberos.Ping', level=3, num='4.2.1'), + Heading(name='Configuration', level=2, num='4.3'), + Heading(name='RQ.SRS-016.Kerberos.Configuration.MultipleAuthMethods', level=3, num='4.3.1'), + Heading(name='RQ.SRS-016.Kerberos.Configuration.KerberosNotEnabled', level=3, num='4.3.2'), + Heading(name='RQ.SRS-016.Kerberos.Configuration.MultipleKerberosSections', level=3, num='4.3.3'), + Heading(name='RQ.SRS-016.Kerberos.Configuration.WrongUserRealm', level=3, num='4.3.4'), + Heading(name='RQ.SRS-016.Kerberos.Configuration.PrincipalAndRealmSpecified', level=3, num='4.3.5'), + Heading(name='RQ.SRS-016.Kerberos.Configuration.MultiplePrincipalSections', level=3, num='4.3.6'), + Heading(name='RQ.SRS-016.Kerberos.Configuration.MultipleRealmSections', level=3, num='4.3.7'), + Heading(name='Valid User', level=2, num='4.4'), + Heading(name='RQ.SRS-016.Kerberos.ValidUser.XMLConfiguredUser', level=3, num='4.4.1'), + 
Heading(name='RQ.SRS-016.Kerberos.ValidUser.RBACConfiguredUser', level=3, num='4.4.2'), + Heading(name='RQ.SRS-016.Kerberos.ValidUser.KerberosNotConfigured', level=3, num='4.4.3'), + Heading(name='Invalid User', level=2, num='4.5'), + Heading(name='RQ.SRS-016.Kerberos.InvalidUser', level=3, num='4.5.1'), + Heading(name='RQ.SRS-016.Kerberos.InvalidUser.UserDeleted', level=3, num='4.5.2'), + Heading(name='Kerberos Not Available', level=2, num='4.6'), + Heading(name='RQ.SRS-016.Kerberos.KerberosNotAvailable.InvalidServerTicket', level=3, num='4.6.1'), + Heading(name='RQ.SRS-016.Kerberos.KerberosNotAvailable.InvalidClientTicket', level=3, num='4.6.2'), + Heading(name='RQ.SRS-016.Kerberos.KerberosNotAvailable.ValidTickets', level=3, num='4.6.3'), + Heading(name='Kerberos Restarted', level=2, num='4.7'), + Heading(name='RQ.SRS-016.Kerberos.KerberosServerRestarted', level=3, num='4.7.1'), + Heading(name='Performance', level=2, num='4.8'), + Heading(name='RQ.SRS-016.Kerberos.Performance', level=3, num='4.8.1'), + Heading(name='Parallel Requests processing', level=2, num='4.9'), + Heading(name='RQ.SRS-016.Kerberos.Parallel', level=3, num='4.9.1'), + Heading(name='RQ.SRS-016.Kerberos.Parallel.ValidRequests.KerberosAndNonKerberos', level=3, num='4.9.2'), + Heading(name='RQ.SRS-016.Kerberos.Parallel.ValidRequests.SameCredentials', level=3, num='4.9.3'), + Heading(name='RQ.SRS-016.Kerberos.Parallel.ValidRequests.DifferentCredentials', level=3, num='4.9.4'), + Heading(name='RQ.SRS-016.Kerberos.Parallel.ValidInvalid', level=3, num='4.9.5'), + Heading(name='RQ.SRS-016.Kerberos.Parallel.Deletion', level=3, num='4.9.6'), Heading(name='References', level=1, num='5'), ), requirements=( RQ_SRS_016_Kerberos, + RQ_SRS_016_Kerberos_Ping, RQ_SRS_016_Kerberos_Configuration_MultipleAuthMethods, RQ_SRS_016_Kerberos_Configuration_KerberosNotEnabled, RQ_SRS_016_Kerberos_Configuration_MultipleKerberosSections, @@ -501,25 +519,6 @@ QA_SRS016_ClickHouse_Kerberos_Authentication = Specification( # QA-SRS016 ClickHouse Kerberos Authentication # Software Requirements Specification -(c) 2020 Altinity LTD. All Rights Reserved. 
- -**Document status:** Confidential - -**Author:** Andrey Zvonov - -**Date:** December 14, 2020 - -## Approval - -**Status:** - - -**Version:** - - -**Approved by:** - - -**Date:** - - - ## Table of Contents * 1 [Revision History](#revision-history) @@ -528,47 +527,50 @@ QA_SRS016_ClickHouse_Kerberos_Authentication = Specification( * 4 [Requirements](#requirements) * 4.1 [Generic](#generic) * 4.1.1 [RQ.SRS-016.Kerberos](#rqsrs-016kerberos) - * 4.2 [Configuration](#configuration) - * 4.2.1 [RQ.SRS-016.Kerberos.Configuration.MultipleAuthMethods](#rqsrs-016kerberosconfigurationmultipleauthmethods) - * 4.2.2 [RQ.SRS-016.Kerberos.Configuration.KerberosNotEnabled](#rqsrs-016kerberosconfigurationkerberosnotenabled) - * 4.2.3 [RQ.SRS-016.Kerberos.Configuration.MultipleKerberosSections](#rqsrs-016kerberosconfigurationmultiplekerberossections) - * 4.2.4 [RQ.SRS-016.Kerberos.Configuration.WrongUserRealm](#rqsrs-016kerberosconfigurationwronguserrealm) - * 4.2.5 [RQ.SRS-016.Kerberos.Configuration.PrincipalAndRealmSpecified](#rqsrs-016kerberosconfigurationprincipalandrealmspecified) - * 4.2.6 [RQ.SRS-016.Kerberos.Configuration.MultiplePrincipalSections](#rqsrs-016kerberosconfigurationmultipleprincipalsections) - * 4.2.7 [RQ.SRS-016.Kerberos.Configuration.MultipleRealmSections](#rqsrs-016kerberosconfigurationmultiplerealmsections) - * 4.3 [Valid User](#valid-user) - * 4.3.1 [RQ.SRS-016.Kerberos.ValidUser.XMLConfiguredUser](#rqsrs-016kerberosvaliduserxmlconfigureduser) - * 4.3.2 [RQ.SRS-016.Kerberos.ValidUser.RBACConfiguredUser](#rqsrs-016kerberosvaliduserrbacconfigureduser) - * 4.3.3 [RQ.SRS-016.Kerberos.ValidUser.KerberosNotConfigured](#rqsrs-016kerberosvaliduserkerberosnotconfigured) - * 4.4 [Invalid User](#invalid-user) - * 4.4.1 [RQ.SRS-016.Kerberos.InvalidUser](#rqsrs-016kerberosinvaliduser) - * 4.4.2 [RQ.SRS-016.Kerberos.InvalidUser.UserDeleted](#rqsrs-016kerberosinvaliduseruserdeleted) - * 4.5 [Kerberos Not Available](#kerberos-not-available) - * 4.5.1 [RQ.SRS-016.Kerberos.KerberosNotAvailable.InvalidServerTicket](#rqsrs-016kerberoskerberosnotavailableinvalidserverticket) - * 4.5.2 [RQ.SRS-016.Kerberos.KerberosNotAvailable.InvalidClientTicket](#rqsrs-016kerberoskerberosnotavailableinvalidclientticket) - * 4.5.3 [RQ.SRS-016.Kerberos.KerberosNotAvailable.ValidTickets](#rqsrs-016kerberoskerberosnotavailablevalidtickets) - * 4.6 [Kerberos Restarted](#kerberos-restarted) - * 4.6.1 [RQ.SRS-016.Kerberos.KerberosServerRestarted](#rqsrs-016kerberoskerberosserverrestarted) - * 4.7 [Performance](#performance) - * 4.7.1 [RQ.SRS-016.Kerberos.Performance](#rqsrs-016kerberosperformance) - * 4.8 [Parallel Requests processing](#parallel-requests-processing) - * 4.8.1 [RQ.SRS-016.Kerberos.Parallel](#rqsrs-016kerberosparallel) - * 4.8.2 [RQ.SRS-016.Kerberos.Parallel.ValidRequests.KerberosAndNonKerberos](#rqsrs-016kerberosparallelvalidrequestskerberosandnonkerberos) - * 4.8.3 [RQ.SRS-016.Kerberos.Parallel.ValidRequests.SameCredentials](#rqsrs-016kerberosparallelvalidrequestssamecredentials) - * 4.8.4 [RQ.SRS-016.Kerberos.Parallel.ValidRequests.DifferentCredentials](#rqsrs-016kerberosparallelvalidrequestsdifferentcredentials) - * 4.8.5 [RQ.SRS-016.Kerberos.Parallel.ValidInvalid](#rqsrs-016kerberosparallelvalidinvalid) - * 4.8.6 [RQ.SRS-016.Kerberos.Parallel.Deletion](#rqsrs-016kerberosparalleldeletion) + * 4.2 [Ping](#ping) + * 4.2.1 [RQ.SRS-016.Kerberos.Ping](#rqsrs-016kerberosping) + * 4.3 [Configuration](#configuration) + * 4.3.1 
[RQ.SRS-016.Kerberos.Configuration.MultipleAuthMethods](#rqsrs-016kerberosconfigurationmultipleauthmethods) + * 4.3.2 [RQ.SRS-016.Kerberos.Configuration.KerberosNotEnabled](#rqsrs-016kerberosconfigurationkerberosnotenabled) + * 4.3.3 [RQ.SRS-016.Kerberos.Configuration.MultipleKerberosSections](#rqsrs-016kerberosconfigurationmultiplekerberossections) + * 4.3.4 [RQ.SRS-016.Kerberos.Configuration.WrongUserRealm](#rqsrs-016kerberosconfigurationwronguserrealm) + * 4.3.5 [RQ.SRS-016.Kerberos.Configuration.PrincipalAndRealmSpecified](#rqsrs-016kerberosconfigurationprincipalandrealmspecified) + * 4.3.6 [RQ.SRS-016.Kerberos.Configuration.MultiplePrincipalSections](#rqsrs-016kerberosconfigurationmultipleprincipalsections) + * 4.3.7 [RQ.SRS-016.Kerberos.Configuration.MultipleRealmSections](#rqsrs-016kerberosconfigurationmultiplerealmsections) + * 4.4 [Valid User](#valid-user) + * 4.4.1 [RQ.SRS-016.Kerberos.ValidUser.XMLConfiguredUser](#rqsrs-016kerberosvaliduserxmlconfigureduser) + * 4.4.2 [RQ.SRS-016.Kerberos.ValidUser.RBACConfiguredUser](#rqsrs-016kerberosvaliduserrbacconfigureduser) + * 4.4.3 [RQ.SRS-016.Kerberos.ValidUser.KerberosNotConfigured](#rqsrs-016kerberosvaliduserkerberosnotconfigured) + * 4.5 [Invalid User](#invalid-user) + * 4.5.1 [RQ.SRS-016.Kerberos.InvalidUser](#rqsrs-016kerberosinvaliduser) + * 4.5.2 [RQ.SRS-016.Kerberos.InvalidUser.UserDeleted](#rqsrs-016kerberosinvaliduseruserdeleted) + * 4.6 [Kerberos Not Available](#kerberos-not-available) + * 4.6.1 [RQ.SRS-016.Kerberos.KerberosNotAvailable.InvalidServerTicket](#rqsrs-016kerberoskerberosnotavailableinvalidserverticket) + * 4.6.2 [RQ.SRS-016.Kerberos.KerberosNotAvailable.InvalidClientTicket](#rqsrs-016kerberoskerberosnotavailableinvalidclientticket) + * 4.6.3 [RQ.SRS-016.Kerberos.KerberosNotAvailable.ValidTickets](#rqsrs-016kerberoskerberosnotavailablevalidtickets) + * 4.7 [Kerberos Restarted](#kerberos-restarted) + * 4.7.1 [RQ.SRS-016.Kerberos.KerberosServerRestarted](#rqsrs-016kerberoskerberosserverrestarted) + * 4.8 [Performance](#performance) + * 4.8.1 [RQ.SRS-016.Kerberos.Performance](#rqsrs-016kerberosperformance) + * 4.9 [Parallel Requests processing](#parallel-requests-processing) + * 4.9.1 [RQ.SRS-016.Kerberos.Parallel](#rqsrs-016kerberosparallel) + * 4.9.2 [RQ.SRS-016.Kerberos.Parallel.ValidRequests.KerberosAndNonKerberos](#rqsrs-016kerberosparallelvalidrequestskerberosandnonkerberos) + * 4.9.3 [RQ.SRS-016.Kerberos.Parallel.ValidRequests.SameCredentials](#rqsrs-016kerberosparallelvalidrequestssamecredentials) + * 4.9.4 [RQ.SRS-016.Kerberos.Parallel.ValidRequests.DifferentCredentials](#rqsrs-016kerberosparallelvalidrequestsdifferentcredentials) + * 4.9.5 [RQ.SRS-016.Kerberos.Parallel.ValidInvalid](#rqsrs-016kerberosparallelvalidinvalid) + * 4.9.6 [RQ.SRS-016.Kerberos.Parallel.Deletion](#rqsrs-016kerberosparalleldeletion) * 5 [References](#references) + ## Revision History This document is stored in an electronic form using [Git] source control management software -hosted in a [GitLab Repository]. +hosted in a [GitHub Repository]. All the updates are tracked using the [Git]'s [Revision History]. ## Introduction -This document specifies the behavior for authenticating existing users using [Kerberos] authentication protocol. +This document specifies the behavior for authenticating existing users via [Kerberos] authentication protocol. Existing [ClickHouse] users, that are properly configured, have an ability to authenticate using [Kerberos]. 
Kerberos authentication is only supported for HTTP requests, and users configured to authenticate via Kerberos cannot be authenticated by any other means of authentication. In order to use Kerberos authentication, Kerberos needs to be properly configured in the environment: Kerberos server must be present and user's and server's credentials must be set up. Configuring the Kerberos environment is outside the scope of this document. @@ -604,6 +606,13 @@ version: 1.0 [ClickHouse] SHALL support user authentication using [Kerberos] server. +### Ping + +#### RQ.SRS-016.Kerberos.Ping +version: 1.0 + +Docker containers SHALL be able to ping each other. + ### Configuration #### RQ.SRS-016.Kerberos.Configuration.MultipleAuthMethods @@ -784,17 +793,17 @@ version: 1.0 ## References * **ClickHouse:** https://clickhouse.tech -* **Gitlab Repository:** https://gitlab.com/altinity-qa/documents/qa-srs016-clickhouse-kerberos-authentication/-/blob/master/QA_SRS016_ClickHouse_Kerberos_Authentication.md -* **Revision History:** https://gitlab.com/altinity-qa/documents/qa-srs016-clickhouse-kerberos-authentication/-/commits/master/QA_SRS016_ClickHouse_Kerberos_Authentication.md +* **GitHub Repository:** https://github.com/ClickHouse/ClickHouse/blob/master/tests/testflows/kerberos/requirements/requirements.md +* **Revision History:** https://github.com/ClickHouse/ClickHouse/commits/master/tests/testflows/kerberos/requirements/requirements.md * **Git:** https://git-scm.com/ * **Kerberos terminology:** https://web.mit.edu/kerberos/kfw-4.1/kfw-4.1/kfw-4.1-help/html/kerberos_terminology.htm [Kerberos]: https://en.wikipedia.org/wiki/Kerberos_(protocol) [SPNEGO]: https://en.wikipedia.org/wiki/SPNEGO [ClickHouse]: https://clickhouse.tech -[GitLab]: https://gitlab.com -[GitLab Repository]: https://gitlab.com/altinity-qa/documents/qa-srs016-clickhouse-kerberos-authentication/-/blob/master/QA_SRS016_ClickHouse_Kerberos_Authentication.md -[Revision History]: https://gitlab.com/altinity-qa/documents/qa-srs016-clickhouse-kerberos-authentication/-/commits/master/QA_SRS016_ClickHouse_Kerberos_Authentication.md +[GitHub]: https://gitlab.com +[GitHub Repository]: https://github.com/ClickHouse/ClickHouse/blob/master/tests/testflows/kerberos/requirements/requirements.md +[Revision History]: https://github.com/ClickHouse/ClickHouse/commits/master/tests/testflows/kerberos/requirements/requirements.md [Git]: https://git-scm.com/ [Kerberos terminology]: https://web.mit.edu/kerberos/kfw-4.1/kfw-4.1/kfw-4.1-help/html/kerberos_terminology.htm ''') diff --git a/tests/testflows/kerberos/tests/common.py b/tests/testflows/kerberos/tests/common.py index e768a78cad5..8b72f1c2ffd 100644 --- a/tests/testflows/kerberos/tests/common.py +++ b/tests/testflows/kerberos/tests/common.py @@ -68,8 +68,8 @@ def create_server_principal(self, node): """ try: node.cmd("echo pwd | kinit admin/admin") - node.cmd(f"kadmin -w pwd -q \"add_principal -randkey HTTP/docker-compose_{node.name}_1.docker-compose_default\"") - node.cmd(f"kadmin -w pwd -q \"ktadd -k /etc/krb5.keytab HTTP/docker-compose_{node.name}_1.docker-compose_default\"") + node.cmd(f"kadmin -w pwd -q \"add_principal -randkey HTTP/kerberos_env_{node.name}_1.krbnet\"") + node.cmd(f"kadmin -w pwd -q \"ktadd -k /etc/krb5.keytab HTTP/kerberos_env_{node.name}_1.krbnet\"") yield finally: node.cmd("kdestroy") @@ -170,7 +170,7 @@ def check_wrong_config(self, node, client, config_path, modify_file, log_error=" config_contents = xmltree.tostring(root, encoding='utf8', method='xml').decode('utf-8') command = 
f"cat < {full_config_path}\n{config_contents}\nHEREDOC" node.command(command, steps=False, exitcode=0) - # time.sleep(1) + time.sleep(1) with Then(f"{preprocessed_name} should be updated", description=f"timeout {timeout}"): started = time.time() @@ -183,11 +183,14 @@ def check_wrong_config(self, node, client, config_path, modify_file, log_error=" assert exitcode == 0, error() with When("I restart ClickHouse to apply the config changes"): + node.cmd("kdestroy") + # time.sleep(1) if output: node.restart(safe=False, wait_healthy=True) else: node.restart(safe=False, wait_healthy=False) + if output != "": with Then(f"check {output} is in output"): time.sleep(5) @@ -201,7 +204,7 @@ def check_wrong_config(self, node, client, config_path, modify_file, log_error=" break time.sleep(1) else: - assert False, error() + assert output in r.output, error() finally: with Finally("I restore original config"): @@ -223,3 +226,19 @@ def check_wrong_config(self, node, client, config_path, modify_file, log_error=" assert exitcode == 0, error() +@TestStep(Given) +def instrument_clickhouse_server_log(self, clickhouse_server_log="/var/log/clickhouse-server/clickhouse-server.log"): + """Instrument clickhouse-server.log for the current test + by adding start and end messages that include + current test name to the clickhouse-server.log of the specified node and + if the test fails then dump the messages from + the clickhouse-server.log for this test. + """ + all_nodes = self.context.ch_nodes + [self.context.krb_server] + + for node in all_nodes: + if node.name != "kerberos": + with When(f"output stats for {node.repr()}"): + node.command(f"echo -e \"\\n-- {current().name} -- top --\\n\" && top -bn1") + node.command(f"echo -e \"\\n-- {current().name} -- df --\\n\" && df -h") + node.command(f"echo -e \"\\n-- {current().name} -- free --\\n\" && free -mh") diff --git a/tests/testflows/kerberos/tests/config.py b/tests/testflows/kerberos/tests/config.py index 3f4bf15deb5..85af0b3214e 100644 --- a/tests/testflows/kerberos/tests/config.py +++ b/tests/testflows/kerberos/tests/config.py @@ -145,12 +145,8 @@ def multiple_principal(self): log_error="Multiple principal sections are not allowed") - - - - - @TestFeature +@Name("config") def config(self): """Perform ClickHouse Kerberos authentication testing for incorrect configuration files """ diff --git a/tests/testflows/kerberos/tests/generic.py b/tests/testflows/kerberos/tests/generic.py index 3276fd5ec5f..642b99b4fc3 100644 --- a/tests/testflows/kerberos/tests/generic.py +++ b/tests/testflows/kerberos/tests/generic.py @@ -3,8 +3,22 @@ from kerberos.tests.common import * from kerberos.requirements.requirements import * import time -import datetime -import itertools + + +@TestScenario +@Requirements( + RQ_SRS_016_Kerberos_Ping("1.0") +) +def ping(self): + """Containers should be reachable + """ + ch_nodes = self.context.ch_nodes + + for i in range(3): + with When(f"curl ch_{i} kerberos"): + r = ch_nodes[i].command(f"curl kerberos -c 1") + with Then(f"return code should be 0"): + assert r.exitcode == 7, error() @TestScenario @@ -84,8 +98,10 @@ def invalid_server_ticket(self): ch_nodes[2].cmd("kdestroy") while True: kinit_no_keytab(node=ch_nodes[2]) + create_server_principal(node=ch_nodes[0]) if ch_nodes[2].cmd(test_select_query(node=ch_nodes[0])).output == "kerberos_user": break + debug(test_select_query(node=ch_nodes[0])) ch_nodes[2].cmd("kdestroy") with And("I expect the user to be default"): @@ -97,8 +113,8 @@ def invalid_server_ticket(self): 
RQ_SRS_016_Kerberos_KerberosNotAvailable_InvalidClientTicket("1.0") ) def invalid_client_ticket(self): - """ClickHouse SHALL reject Kerberos authentication no Kerberos server is reachable - and client has no valid ticket (or the existing ticket is outdated). + """ClickHouse SHALL reject Kerberos authentication in case client has + no valid ticket (or the existing ticket is outdated). """ ch_nodes = self.context.ch_nodes @@ -108,8 +124,8 @@ def invalid_client_ticket(self): with And("setting up server principal"): create_server_principal(node=ch_nodes[0]) - with And("I kill kerberos-server"): - self.context.krb_server.stop() + # with And("I kill kerberos-server"): + # self.context.krb_server.stop() with And("I wait until client ticket is expired"): time.sleep(10) @@ -120,17 +136,18 @@ def invalid_client_ticket(self): with Then("I expect the user to be default"): assert r.output == "default", error() - with Finally("I start kerberos server again"): - self.context.krb_server.start() - ch_nodes[2].cmd("kdestroy") + with Finally(""): + # self.context.krb_server.start() + time.sleep(1) + ch_nodes[2].cmd(f"echo pwd | kinit -l 10:00 kerberos_user") while True: - kinit_no_keytab(node=ch_nodes[2]) + time.sleep(1) if ch_nodes[2].cmd(test_select_query(node=ch_nodes[0])).output == "kerberos_user": break ch_nodes[2].cmd("kdestroy") -@TestScenario +@TestCase @Requirements( RQ_SRS_016_Kerberos_KerberosNotAvailable_ValidTickets("1.0") ) @@ -316,9 +333,6 @@ def authentication_performance(self): ch_nodes[0].query("DROP USER pwd_user") - - - @TestFeature def generic(self): """Perform ClickHouse Kerberos authentication testing @@ -329,4 +343,4 @@ def generic(self): self.context.clients = [self.context.cluster.node(f"krb-client{i}") for i in range(1, 6)] for scenario in loads(current_module(), Scenario, Suite): - Scenario(run=scenario, flags=TE) + Scenario(run=scenario, flags=TE) #, setup=instrument_clickhouse_server_log) diff --git a/tests/testflows/kerberos/tests/parallel.py b/tests/testflows/kerberos/tests/parallel.py index 694245e524c..5d352af7df4 100644 --- a/tests/testflows/kerberos/tests/parallel.py +++ b/tests/testflows/kerberos/tests/parallel.py @@ -1,12 +1,6 @@ from testflows.core import * from kerberos.tests.common import * from kerberos.requirements.requirements import * -from multiprocessing.dummy import Pool - -import time -import datetime -import itertools - @TestScenario @Requirements( @@ -28,17 +22,17 @@ def valid_requests_same_credentials(self): return cmd(test_select_query(node=ch_nodes[0])) for i in range(15): - pool = Pool(2) tasks = [] - with When("I try simultaneous authentication"): - tasks.append(pool.apply_async(helper, (ch_nodes[1].cmd, ))) - tasks.append(pool.apply_async(helper, (ch_nodes[2].cmd, ))) - tasks[0].wait(timeout=200) - tasks[1].wait(timeout=200) + with Pool(2) as pool: + with When("I try simultaneous authentication"): + tasks.append(pool.submit(helper, (ch_nodes[1].cmd, ))) + tasks.append(pool.submit(helper, (ch_nodes[2].cmd, ))) + tasks[0].result(timeout=200) + tasks[1].result(timeout=200) - with Then(f"I expect requests to success"): - assert tasks[0].get(timeout=300).output == "kerberos_user", error() - assert tasks[1].get(timeout=300).output == "kerberos_user", error() + with Then(f"I expect requests to success"): + assert tasks[0].result(timeout=300).output == "kerberos_user", error() + assert tasks[1].result(timeout=300).output == "kerberos_user", error() @TestScenario @@ -61,26 +55,26 @@ def valid_requests_different_credentials(self): return 
cmd(test_select_query(node=ch_nodes[0])) for i in range(15): - pool = Pool(2) + tasks = [] + with Pool(2) as pool: + with And("add 2 kerberos users via RBAC"): + ch_nodes[0].query("CREATE USER krb1 IDENTIFIED WITH kerberos REALM 'EXAMPLE.COM'") + ch_nodes[0].query("CREATE USER krb2 IDENTIFIED WITH kerberos REALM 'EXAMPLE.COM'") - with And("add 2 kerberos users via RBAC"): - ch_nodes[0].query("CREATE USER krb1 IDENTIFIED WITH kerberos REALM 'EXAMPLE.COM'") - ch_nodes[0].query("CREATE USER krb2 IDENTIFIED WITH kerberos REALM 'EXAMPLE.COM'") + with When("I try simultaneous authentication for valid and invalid"): + tasks.append(pool.submit(helper, (ch_nodes[1].cmd, ))) + tasks.append(pool.submit(helper, (ch_nodes[2].cmd, ))) + tasks[0].result(timeout=200) + tasks[1].result(timeout=200) - with When("I try simultaneous authentication for valid and invalid"): - tasks.append(pool.apply_async(helper, (ch_nodes[1].cmd, ))) - tasks.append(pool.apply_async(helper, (ch_nodes[2].cmd, ))) - tasks[0].wait(timeout=200) - tasks[1].wait(timeout=200) + with Then(f"I expect have auth failure"): + assert tasks[1].result(timeout=300).output == "krb2", error() + assert tasks[0].result(timeout=300).output == "krb1", error() - with Then(f"I expect have auth failure"): - assert tasks[1].get(timeout=300).output == "krb2", error() - assert tasks[0].get(timeout=300).output == "krb1", error() - - with Finally("I make sure both users are removed"): - ch_nodes[0].query("DROP USER krb1", no_checks=True) - ch_nodes[0].query("DROP USER krb2", no_checks=True) + with Finally("I make sure both users are removed"): + ch_nodes[0].query("DROP USER krb1", no_checks=True) + ch_nodes[0].query("DROP USER krb2", no_checks=True) @TestScenario @@ -103,15 +97,15 @@ def valid_invalid(self): return cmd(test_select_query(node=ch_nodes[0]), no_checks=True) for i in range(15): - pool = Pool(2) tasks = [] - with When("I try simultaneous authentication for valid and invalid"): - tasks.append(pool.apply_async(helper, (ch_nodes[1].cmd, ))) # invalid - tasks.append(pool.apply_async(helper, (ch_nodes[2].cmd, ))) # valid + with Pool(2) as pool: + with When("I try simultaneous authentication for valid and invalid"): + tasks.append(pool.submit(helper, (ch_nodes[1].cmd,))) # invalid + tasks.append(pool.submit(helper, (ch_nodes[2].cmd,))) # valid - with Then(f"I expect have auth failure"): - assert tasks[1].get(timeout=300).output == "kerberos_user", error() - assert tasks[0].get(timeout=300).output != "kerberos_user", error() + with Then(f"I expect have auth failure"): + assert tasks[1].result(timeout=300).output == "kerberos_user", error() + assert tasks[0].result(timeout=300).output != "kerberos_user", error() @TestScenario @@ -134,28 +128,27 @@ def deletion(self): return cmd(test_select_query(node=ch_nodes[0], req=f"DROP USER {todel}"), no_checks=True) for i in range(15): - pool = Pool(2) tasks = [] - - with And("add 2 kerberos users via RBAC"): - ch_nodes[0].query("CREATE USER krb1 IDENTIFIED WITH kerberos REALM 'EXAMPLE.COM'") - ch_nodes[0].query("CREATE USER krb2 IDENTIFIED WITH kerberos REALM 'EXAMPLE.COM'") - ch_nodes[0].query("GRANT ACCESS MANAGEMENT ON *.* TO krb1") - ch_nodes[0].query("GRANT ACCESS MANAGEMENT ON *.* TO krb2") + with Pool(2) as pool: + with And("add 2 kerberos users via RBAC"): + ch_nodes[0].query("CREATE USER krb1 IDENTIFIED WITH kerberos REALM 'EXAMPLE.COM'") + ch_nodes[0].query("CREATE USER krb2 IDENTIFIED WITH kerberos REALM 'EXAMPLE.COM'") + ch_nodes[0].query("GRANT ACCESS MANAGEMENT ON *.* TO krb1") + 
ch_nodes[0].query("GRANT ACCESS MANAGEMENT ON *.* TO krb2") - with When("I try simultaneous authentication for valid and invalid"): - tasks.append(pool.apply_async(helper, (ch_nodes[1].cmd, "krb2"))) - tasks.append(pool.apply_async(helper, (ch_nodes[2].cmd, "krb1"))) - tasks[0].wait(timeout=200) - tasks[1].wait(timeout=200) + with When("I try simultaneous authentication for valid and invalid"): + tasks.append(pool.submit(helper, (ch_nodes[1].cmd, "krb2"))) + tasks.append(pool.submit(helper, (ch_nodes[2].cmd, "krb1"))) + tasks[0].result(timeout=200) + tasks[1].result(timeout=200) - with Then(f"I check CH is alive"): - assert ch_nodes[0].query("SELECT 1").output == "1", error() + with Then(f"I check CH is alive"): + assert ch_nodes[0].query("SELECT 1").output == "1", error() - with Finally("I make sure both users are removed"): - ch_nodes[0].query("DROP USER krb1", no_checks=True) - ch_nodes[0].query("DROP USER krb2", no_checks=True) + with Finally("I make sure both users are removed"): + ch_nodes[0].query("DROP USER krb1", no_checks=True) + ch_nodes[0].query("DROP USER krb2", no_checks=True) @TestScenario @@ -177,15 +170,15 @@ def kerberos_and_nonkerberos(self): return cmd(test_select_query(node=ch_nodes[0], krb_auth=krb_auth), no_checks=True) for i in range(15): - pool = Pool(2) tasks = [] - with When("I try simultaneous authentication for valid and invalid"): - tasks.append(pool.apply_async(helper, (ch_nodes[1].cmd, False))) # non-kerberos - tasks.append(pool.apply_async(helper, (ch_nodes[2].cmd, True))) # kerberos + with Pool(2) as pool: + with When("I try simultaneous authentication for valid and invalid"): + tasks.append(pool.submit(helper, (ch_nodes[1].cmd, False))) # non-kerberos + tasks.append(pool.submit(helper, (ch_nodes[2].cmd, True))) # kerberos - with Then(f"I expect have auth failure"): - assert tasks[1].get(timeout=300).output == "kerberos_user", error() - assert tasks[0].get(timeout=300).output == "default", error() + with Then(f"I expect have auth failure"): + assert tasks[1].result(timeout=300).output == "kerberos_user", error() + assert tasks[0].result(timeout=300).output == "default", error() @TestFeature diff --git a/tests/testflows/map_type/regression.py b/tests/testflows/map_type/regression.py index 9f9c2b2b261..049585dea81 100755 --- a/tests/testflows/map_type/regression.py +++ b/tests/testflows/map_type/regression.py @@ -109,10 +109,9 @@ xflags = { @Specifications( SRS018_ClickHouse_Map_Data_Type ) -def regression(self, local, clickhouse_binary_path, stress=None, parallel=None): +def regression(self, local, clickhouse_binary_path, stress=None): """Map type regression. 
""" - top().terminating = False nodes = { "clickhouse": ("clickhouse1", "clickhouse2", "clickhouse3") @@ -120,8 +119,6 @@ def regression(self, local, clickhouse_binary_path, stress=None, parallel=None): if stress is not None: self.context.stress = stress - if parallel is not None: - self.context.parallel = parallel with Cluster(local, clickhouse_binary_path, nodes=nodes, docker_compose_project_dir=os.path.join(current_dir(), "map_type_env")) as cluster: diff --git a/tests/testflows/rbac/helper/tables.py b/tests/testflows/rbac/helper/tables.py index 5d14bb34a83..ee6289bcbb5 100755 --- a/tests/testflows/rbac/helper/tables.py +++ b/tests/testflows/rbac/helper/tables.py @@ -3,39 +3,39 @@ from collections import namedtuple table_tuple = namedtuple("table_tuple", "create_statement cluster") table_types = { - "MergeTree": table_tuple("CREATE TABLE {name} (d DATE, a String, b UInt8, x String, y Int8) ENGINE = MergeTree() PARTITION BY y ORDER BY d", None), - "ReplacingMergeTree": table_tuple("CREATE TABLE {name} (d DATE, a String, b UInt8, x String, y Int8) ENGINE = ReplacingMergeTree() PARTITION BY y ORDER BY d", None), - "SummingMergeTree": table_tuple("CREATE TABLE {name} (d DATE, a String, b UInt8 DEFAULT 1, x String, y Int8) ENGINE = SummingMergeTree() PARTITION BY y ORDER BY d", None), - "AggregatingMergeTree": table_tuple("CREATE TABLE {name} (d DATE, a String, b UInt8, x String, y Int8) ENGINE = AggregatingMergeTree() PARTITION BY y ORDER BY d", None), - "CollapsingMergeTree": table_tuple("CREATE TABLE {name} (d Date, a String, b UInt8, x String, y Int8, sign Int8 DEFAULT 1) ENGINE = CollapsingMergeTree(sign) PARTITION BY y ORDER BY d", None), - "VersionedCollapsingMergeTree": table_tuple("CREATE TABLE {name} (d Date, a String, b UInt8, x String, y Int8, version UInt64, sign Int8 DEFAULT 1) ENGINE = VersionedCollapsingMergeTree(sign, version) PARTITION BY y ORDER BY d", None), - "GraphiteMergeTree": table_tuple("CREATE TABLE {name} (d Date, a String, b UInt8, x String, y Int8, Path String, Time DateTime, Value Float64, col UInt64, Timestamp Int64) ENGINE = GraphiteMergeTree('graphite_rollup_example') PARTITION BY y ORDER by d", None), + "MergeTree": table_tuple("CREATE TABLE {name} (d DATE, a String, b UInt8, x String, y Int8) ENGINE = MergeTree() PARTITION BY y ORDER BY (b, d) PRIMARY KEY b", None), + "ReplacingMergeTree": table_tuple("CREATE TABLE {name} (d DATE, a String, b UInt8, x String, y Int8) ENGINE = ReplacingMergeTree() PARTITION BY y ORDER BY (b, d) PRIMARY KEY b", None), + "SummingMergeTree": table_tuple("CREATE TABLE {name} (d DATE, a String, b UInt8 DEFAULT 1, x String, y Int8) ENGINE = SummingMergeTree() PARTITION BY y ORDER BY (b, d) PRIMARY KEY b", None), + "AggregatingMergeTree": table_tuple("CREATE TABLE {name} (d DATE, a String, b UInt8, x String, y Int8) ENGINE = AggregatingMergeTree() PARTITION BY y ORDER BY (b, d) PRIMARY KEY b", None), + "CollapsingMergeTree": table_tuple("CREATE TABLE {name} (d Date, a String, b UInt8, x String, y Int8, sign Int8 DEFAULT 1) ENGINE = CollapsingMergeTree(sign) PARTITION BY y ORDER BY (b, d) PRIMARY KEY b", None), + "VersionedCollapsingMergeTree": table_tuple("CREATE TABLE {name} (d Date, a String, b UInt8, x String, y Int8, version UInt64, sign Int8 DEFAULT 1) ENGINE = VersionedCollapsingMergeTree(sign, version) PARTITION BY y ORDER BY (b, d) PRIMARY KEY b", None), + "GraphiteMergeTree": table_tuple("CREATE TABLE {name} (d Date, a String, b UInt8, x String, y Int8, Path String, Time DateTime, Value Float64, col UInt64, Timestamp 
Int64) ENGINE = GraphiteMergeTree('graphite_rollup_example') PARTITION BY y ORDER BY (b, d) PRIMARY KEY b", None), "ReplicatedMergeTree-sharded_cluster": table_tuple("CREATE TABLE {name} ON CLUSTER sharded_cluster (d DATE, a String, b UInt8, x String, y Int8) \ - ENGINE = ReplicatedMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}') PARTITION BY y ORDER BY d", "sharded_cluster"), + ENGINE = ReplicatedMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}') PARTITION BY y ORDER BY (b, d) PRIMARY KEY b", "sharded_cluster"), "ReplicatedMergeTree-one_shard_cluster": table_tuple("CREATE TABLE {name} ON CLUSTER one_shard_cluster (d DATE, a String, b UInt8, x String, y Int8) \ - ENGINE = ReplicatedMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}') PARTITION BY y ORDER BY d", "one_shard_cluster"), + ENGINE = ReplicatedMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}') PARTITION BY y ORDER BY (b, d) PRIMARY KEY b", "one_shard_cluster"), "ReplicatedReplacingMergeTree-sharded_cluster": table_tuple("CREATE TABLE {name} ON CLUSTER sharded_cluster (d DATE, a String, b UInt8, x String, y Int8) \ - ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}') PARTITION BY y ORDER BY d", "sharded_cluster"), + ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}') PARTITION BY y ORDER BY (b, d) PRIMARY KEY b", "sharded_cluster"), "ReplicatedReplacingMergeTree-one_shard_cluster": table_tuple("CREATE TABLE {name} ON CLUSTER one_shard_cluster (d DATE, a String, b UInt8, x String, y Int8) \ - ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}') PARTITION BY y ORDER BY d", "one_shard_cluster"), + ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}') PARTITION BY y ORDER BY (b, d) PRIMARY KEY b", "one_shard_cluster"), "ReplicatedSummingMergeTree-sharded_cluster": table_tuple("CREATE TABLE {name} ON CLUSTER sharded_cluster (d DATE, a String, b UInt8 DEFAULT 1, x String, y Int8) \ - ENGINE = ReplicatedSummingMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}') PARTITION BY y ORDER BY d", "sharded_cluster"), + ENGINE = ReplicatedSummingMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}') PARTITION BY y ORDER BY (b, d) PRIMARY KEY b", "sharded_cluster"), "ReplicatedSummingMergeTree-one_shard_cluster": table_tuple("CREATE TABLE {name} ON CLUSTER one_shard_cluster (d DATE, a String, b UInt8 DEFAULT 1, x String, y Int8) \ - ENGINE = ReplicatedSummingMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}') PARTITION BY y ORDER BY d", "one_shard_cluster"), + ENGINE = ReplicatedSummingMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}') PARTITION BY y ORDER BY (b, d) PRIMARY KEY b", "one_shard_cluster"), "ReplicatedAggregatingMergeTree-sharded_cluster": table_tuple("CREATE TABLE {name} ON CLUSTER sharded_cluster (d DATE, a String, b UInt8, x String, y Int8) \ - ENGINE = ReplicatedAggregatingMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}') PARTITION BY y ORDER BY d", "sharded_cluster"), + ENGINE = ReplicatedAggregatingMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}') PARTITION BY y ORDER BY (b, d) PRIMARY KEY b", "sharded_cluster"), "ReplicatedAggregatingMergeTree-one_shard_cluster": table_tuple("CREATE TABLE {name} ON CLUSTER one_shard_cluster (d DATE, a String, b UInt8, x String, y Int8) \ - ENGINE = ReplicatedAggregatingMergeTree('/clickhouse/tables/{{shard}}/{name}', 
'{{replica}}') PARTITION BY y ORDER BY d", "one_shard_cluster"), + ENGINE = ReplicatedAggregatingMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}') PARTITION BY y ORDER BY (b, d) PRIMARY KEY b", "one_shard_cluster"), "ReplicatedCollapsingMergeTree-sharded_cluster": table_tuple("CREATE TABLE {name} ON CLUSTER sharded_cluster (d Date, a String, b UInt8, x String, y Int8, sign Int8 DEFAULT 1) \ - ENGINE = ReplicatedCollapsingMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}', sign) PARTITION BY y ORDER BY d", "sharded_cluster"), + ENGINE = ReplicatedCollapsingMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}', sign) PARTITION BY y ORDER BY (b, d) PRIMARY KEY b", "sharded_cluster"), "ReplicatedCollapsingMergeTree-one_shard_cluster": table_tuple("CREATE TABLE {name} ON CLUSTER one_shard_cluster (d Date, a String, b UInt8, x String, y Int8, sign Int8 DEFAULT 1) \ - ENGINE = ReplicatedCollapsingMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}', sign) PARTITION BY y ORDER BY d", "one_shard_cluster"), + ENGINE = ReplicatedCollapsingMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}', sign) PARTITION BY y ORDER BY (b, d) PRIMARY KEY b", "one_shard_cluster"), "ReplicatedVersionedCollapsingMergeTree-sharded_cluster": table_tuple("CREATE TABLE {name} ON CLUSTER sharded_cluster (d Date, a String, b UInt8, x String, y Int8, version UInt64, sign Int8 DEFAULT 1) \ - ENGINE = ReplicatedVersionedCollapsingMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}', sign, version) PARTITION BY y ORDER BY d", "sharded_cluster"), + ENGINE = ReplicatedVersionedCollapsingMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}', sign, version) PARTITION BY y ORDER BY (b, d) PRIMARY KEY b", "sharded_cluster"), "ReplicatedVersionedCollapsingMergeTree-one_shard_cluster": table_tuple("CREATE TABLE {name} ON CLUSTER one_shard_cluster (d Date, a String, b UInt8, x String, y Int8, version UInt64, sign Int8 DEFAULT 1) \ - ENGINE = ReplicatedVersionedCollapsingMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}', sign, version) PARTITION BY y ORDER BY d", "one_shard_cluster"), + ENGINE = ReplicatedVersionedCollapsingMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}', sign, version) PARTITION BY y ORDER BY (b, d) PRIMARY KEY b", "one_shard_cluster"), "ReplicatedGraphiteMergeTree-sharded_cluster": table_tuple("CREATE TABLE {name} ON CLUSTER sharded_cluster (d Date, a String, b UInt8, x String, y Int8, Path String, Time DateTime, Value Float64, col UInt64, Timestamp Int64) \ - ENGINE = ReplicatedGraphiteMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}', 'graphite_rollup_example') PARTITION BY y ORDER BY d", "sharded_cluster"), + ENGINE = ReplicatedGraphiteMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}', 'graphite_rollup_example') PARTITION BY y ORDER BY (b, d) PRIMARY KEY b", "sharded_cluster"), "ReplicatedGraphiteMergeTree-one_shard_cluster": table_tuple("CREATE TABLE {name} ON CLUSTER one_shard_cluster (d Date, a String, b UInt8, x String, y Int8, Path String, Time DateTime, Value Float64, col UInt64, Timestamp Int64) \ - ENGINE = ReplicatedGraphiteMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}', 'graphite_rollup_example') PARTITION BY y ORDER BY d", "one_shard_cluster"), -} \ No newline at end of file + ENGINE = ReplicatedGraphiteMergeTree('/clickhouse/tables/{{shard}}/{name}', '{{replica}}', 'graphite_rollup_example') PARTITION BY y ORDER BY (b, d) PRIMARY KEY b", "one_shard_cluster"), +} 
diff --git a/tests/testflows/rbac/regression.py b/tests/testflows/rbac/regression.py index 145865b2fa9..590f384288b 100755 --- a/tests/testflows/rbac/regression.py +++ b/tests/testflows/rbac/regression.py @@ -31,6 +31,7 @@ issue_18206 = "https://github.com/ClickHouse/ClickHouse/issues/18206" issue_21083 = "https://github.com/ClickHouse/ClickHouse/issues/21083" issue_21084 = "https://github.com/ClickHouse/ClickHouse/issues/21084" issue_25413 = "https://github.com/ClickHouse/ClickHouse/issues/25413" +issue_26746 = "https://github.com/ClickHouse/ClickHouse/issues/26746" xfails = { "syntax/show create quota/I show create quota current": @@ -150,7 +151,15 @@ xfails = { "privileges/kill mutation/:/:/KILL ALTER : with revoked privilege": [(Fail, issue_25413)], "privileges/kill mutation/:/:/KILL ALTER : with revoked ALL privilege": - [(Fail, issue_25413)] + [(Fail, issue_25413)], + "privileges/create table/create with subquery privilege granted directly or via role/create with subquery, privilege granted directly": + [(Fail, issue_26746)], + "privileges/create table/create with subquery privilege granted directly or via role/create with subquery, privilege granted through a role": + [(Fail, issue_26746)], + "views/live view/create with join subquery privilege granted directly or via role/create with join subquery, privilege granted directly": + [(Fail, issue_26746)], + "views/live view/create with join subquery privilege granted directly or via role/create with join subquery, privilege granted through a role": + [(Fail, issue_26746)] } xflags = { diff --git a/tests/testflows/rbac/tests/privileges/alter/alter_index.py b/tests/testflows/rbac/tests/privileges/alter/alter_index.py index 379abd52d8c..78f7134a8b7 100755 --- a/tests/testflows/rbac/tests/privileges/alter/alter_index.py +++ b/tests/testflows/rbac/tests/privileges/alter/alter_index.py @@ -128,10 +128,10 @@ def check_order_by_when_privilege_is_granted(table, user, node): column = "order" with Given("I run sanity check"): - node.query(f"ALTER TABLE {table} MODIFY ORDER BY d", settings = [("user", user)]) + node.query(f"ALTER TABLE {table} MODIFY ORDER BY b", settings = [("user", user)]) with And("I add new column and modify order using that column"): - node.query(f"ALTER TABLE {table} ADD COLUMN {column} UInt32, MODIFY ORDER BY (d, {column})") + node.query(f"ALTER TABLE {table} ADD COLUMN {column} UInt32, MODIFY ORDER BY (b, {column})") with When(f"I insert random data into the ordered-by column {column}"): data = random.sample(range(1,1000),100) @@ -151,7 +151,7 @@ def check_order_by_when_privilege_is_granted(table, user, node): with And("I verify that the sorting key is present in the table"): output = json.loads(node.query(f"SHOW CREATE TABLE {table} FORMAT JSONEachRow").output) - assert f"ORDER BY (d, {column})" in output['statement'], error() + assert f"ORDER BY (b, {column})" in output['statement'], error() with But(f"I cannot drop the required column {column}"): exitcode, message = errors.missing_columns(column) @@ -163,21 +163,13 @@ def check_sample_by_when_privilege_is_granted(table, user, node): """ column = 'sample' - with Given(f"I add new column {column}"): - node.query(f"ALTER TABLE {table} ADD COLUMN {column} UInt32") - with When(f"I add sample by clause"): - node.query(f"ALTER TABLE {table} MODIFY SAMPLE BY (d, {column})", + node.query(f"ALTER TABLE {table} MODIFY SAMPLE BY b", settings = [("user", user)]) with Then("I verify that the sample is in the table"): output = json.loads(node.query(f"SHOW CREATE TABLE {table} FORMAT 
JSONEachRow").output) - assert f"SAMPLE BY (d, {column})" in output['statement'], error() - - with But(f"I cannot drop the required column {column}"): - exitcode, message = errors.missing_columns(column) - node.query(f"ALTER TABLE {table} DROP COLUMN {column}", - exitcode=exitcode, message=message) + assert f"SAMPLE BY b" in output['statement'], error() def check_add_index_when_privilege_is_granted(table, user, node): """Ensures ADD INDEX runs as expected when the privilege is granted to the specified user @@ -258,7 +250,7 @@ def check_order_by_when_privilege_is_not_granted(table, user, node): """ with When("I try to use privilege that has not been granted"): exitcode, message = errors.not_enough_privileges(user) - node.query(f"ALTER TABLE {table} MODIFY ORDER BY d", + node.query(f"ALTER TABLE {table} MODIFY ORDER BY b", settings = [("user", user)], exitcode=exitcode, message=message) def check_sample_by_when_privilege_is_not_granted(table, user, node): @@ -266,7 +258,7 @@ def check_sample_by_when_privilege_is_not_granted(table, user, node): """ with When("I try to use privilege that has not been granted"): exitcode, message = errors.not_enough_privileges(user) - node.query(f"ALTER TABLE {table} MODIFY SAMPLE BY d", + node.query(f"ALTER TABLE {table} MODIFY SAMPLE BY b", settings = [("user", user)], exitcode=exitcode, message=message) def check_add_index_when_privilege_is_not_granted(table, user, node): diff --git a/tests/testflows/regression.py b/tests/testflows/regression.py index 8932e6bcf8f..ba2ea3b111c 100755 --- a/tests/testflows/regression.py +++ b/tests/testflows/regression.py @@ -4,35 +4,31 @@ from testflows.core import * append_path(sys.path, ".") -from helpers.common import Pool, join, run_scenario from helpers.argparser import argparser @TestModule @Name("clickhouse") @ArgumentParser(argparser) -def regression(self, local, clickhouse_binary_path, stress=None, parallel=None): +def regression(self, local, clickhouse_binary_path, stress=None): """ClickHouse regression. 
""" - top().terminating = False - args = {"local": local, "clickhouse_binary_path": clickhouse_binary_path, "stress": stress, "parallel": parallel} + args = {"local": local, "clickhouse_binary_path": clickhouse_binary_path, "stress": stress} self.context.stress = stress - self.context.parallel = parallel - tasks = [] with Pool(8) as pool: try: - run_scenario(pool, tasks, Feature(test=load("example.regression", "regression")), args) - #run_scenario(pool, tasks, Feature(test=load("ldap.regression", "regression")), args) - run_scenario(pool, tasks, Feature(test=load("rbac.regression", "regression")), args) - run_scenario(pool, tasks, Feature(test=load("aes_encryption.regression", "regression")), args) - run_scenario(pool, tasks, Feature(test=load("map_type.regression", "regression")), args) - run_scenario(pool, tasks, Feature(test=load("window_functions.regression", "regression")), args) - run_scenario(pool, tasks, Feature(test=load("datetime64_extended_range.regression", "regression")), args) - #run_scenario(pool, tasks, Feature(test=load("kerberos.regression", "regression")), args) - run_scenario(pool, tasks, Feature(test=load("extended_precision_data_types.regression", "regression")), args) + Feature(test=load("example.regression", "regression"), parallel=True, executor=pool)(**args) + # run_scenario(pool, tasks, Feature(test=load("ldap.regression", "regression")), args) + # run_scenario(pool, tasks, Feature(test=load("rbac.regression", "regression")), args) + Feature(test=load("aes_encryption.regression", "regression"), parallel=True, executor=pool)(**args) + Feature(test=load("map_type.regression", "regression"), parallel=True, executor=pool)(**args) + Feature(test=load("window_functions.regression", "regression"), parallel=True, executor=pool)(**args) + Feature(test=load("datetime64_extended_range.regression", "regression"), parallel=True, executor=pool)(**args) + Feature(test=load("kerberos.regression", "regression"), parallel=True, executor=pool)(**args) + Feature(test=load("extended_precision_data_types.regression", "regression"), parallel=True, executor=pool)(**args) finally: - join(tasks) + join() if main(): regression() diff --git a/tests/testflows/window_functions/regression.py b/tests/testflows/window_functions/regression.py index 778a829082f..2c70fc1d075 100755 --- a/tests/testflows/window_functions/regression.py +++ b/tests/testflows/window_functions/regression.py @@ -91,10 +91,9 @@ xflags = { @Requirements( RQ_SRS_019_ClickHouse_WindowFunctions("1.0") ) -def regression(self, local, clickhouse_binary_path, stress=None, parallel=None): +def regression(self, local, clickhouse_binary_path, stress=None): """Window functions regression. 
""" - top().terminating = False nodes = { "clickhouse": ("clickhouse1", "clickhouse2", "clickhouse3") @@ -102,8 +101,6 @@ def regression(self, local, clickhouse_binary_path, stress=None, parallel=None): if stress is not None: self.context.stress = stress - if parallel is not None: - self.context.parallel = parallel with Cluster(local, clickhouse_binary_path, nodes=nodes, docker_compose_project_dir=os.path.join(current_dir(), "window_functions_env")) as cluster: diff --git a/tests/testflows/window_functions/tests/common.py b/tests/testflows/window_functions/tests/common.py index 3a6ac95bd9b..3ed4f794ada 100644 --- a/tests/testflows/window_functions/tests/common.py +++ b/tests/testflows/window_functions/tests/common.py @@ -386,23 +386,3 @@ def create_table(self, name, statement, on_cluster=False): node.query(f"DROP TABLE IF EXISTS {name} ON CLUSTER {on_cluster}") else: node.query(f"DROP TABLE IF EXISTS {name}") - -@TestStep(Given) -def allow_experimental_window_functions(self): - """Set allow_experimental_window_functions = 1 - """ - setting = ("allow_experimental_window_functions", 1) - default_query_settings = None - - try: - with By("adding allow_experimental_window_functions to the default query settings"): - default_query_settings = getsattr(current().context, "default_query_settings", []) - default_query_settings.append(setting) - yield - finally: - with Finally("I remove allow_experimental_window_functions from the default query settings"): - if default_query_settings: - try: - default_query_settings.pop(default_query_settings.index(setting)) - except ValueError: - pass diff --git a/tests/testflows/window_functions/tests/errors.py b/tests/testflows/window_functions/tests/errors.py index 0935c00885d..d7b80ed7cd8 100644 --- a/tests/testflows/window_functions/tests/errors.py +++ b/tests/testflows/window_functions/tests/errors.py @@ -44,8 +44,8 @@ def error_window_function_in_where(self): def error_window_function_in_join(self): """Check that trying to use window function in `JOIN` returns an error. """ - exitcode = 48 - message = "DB::Exception: JOIN ON inequalities are not supported. Unexpected 'row_number() OVER (ORDER BY salary ASC) < 10" + exitcode = 147 + message = "DB::Exception: Cannot get JOIN keys from JOIN ON section: row_number() OVER (ORDER BY salary ASC) < 10" sql = ("SELECT * FROM empsalary INNER JOIN tenk1 ON row_number() OVER (ORDER BY salary) < 10") diff --git a/tests/testflows/window_functions/tests/feature.py b/tests/testflows/window_functions/tests/feature.py index 124660e8802..f6c565d116b 100755 --- a/tests/testflows/window_functions/tests/feature.py +++ b/tests/testflows/window_functions/tests/feature.py @@ -17,10 +17,7 @@ def feature(self, distributed, node="clickhouse1"): self.context.distributed = distributed self.context.node = self.context.cluster.node(node) - with Given("I allow experimental window functions"): - allow_experimental_window_functions() - - with And("employee salary table"): + with Given("employee salary table"): empsalary_table(distributed=distributed) with And("tenk1 table"): diff --git a/utils/antlr/ClickHouseParser.g4 b/utils/antlr/ClickHouseParser.g4 index 28e5b1217ab..eb1908ed073 100644 --- a/utils/antlr/ClickHouseParser.g4 +++ b/utils/antlr/ClickHouseParser.g4 @@ -91,10 +91,10 @@ checkStmt: CHECK TABLE tableIdentifier partitionClause?; createStmt : (ATTACH | CREATE) DATABASE (IF NOT EXISTS)? databaseIdentifier clusterClause? engineExpr? # CreateDatabaseStmt - | (ATTACH | CREATE) DICTIONARY (IF NOT EXISTS)? tableIdentifier uuidClause? 
clusterClause? dictionarySchemaClause dictionaryEngineClause # CreateDictionaryStmt + | (ATTACH | CREATE (OR REPLACE)? | REPLACE) DICTIONARY (IF NOT EXISTS)? tableIdentifier uuidClause? clusterClause? dictionarySchemaClause dictionaryEngineClause # CreateDictionaryStmt | (ATTACH | CREATE) LIVE VIEW (IF NOT EXISTS)? tableIdentifier uuidClause? clusterClause? (WITH TIMEOUT DECIMAL_LITERAL?)? destinationClause? tableSchemaClause? subqueryClause # CreateLiveViewStmt | (ATTACH | CREATE) MATERIALIZED VIEW (IF NOT EXISTS)? tableIdentifier uuidClause? clusterClause? tableSchemaClause? (destinationClause | engineClause POPULATE?) subqueryClause # CreateMaterializedViewStmt - | (ATTACH | CREATE) TEMPORARY? TABLE (IF NOT EXISTS)? tableIdentifier uuidClause? clusterClause? tableSchemaClause? engineClause? subqueryClause? # CreateTableStmt + | (ATTACH | CREATE (OR REPLACE)? | REPLACE) TEMPORARY? TABLE (IF NOT EXISTS)? tableIdentifier uuidClause? clusterClause? tableSchemaClause? engineClause? subqueryClause? # CreateTableStmt | (ATTACH | CREATE) (OR REPLACE)? VIEW (IF NOT EXISTS)? tableIdentifier uuidClause? clusterClause? tableSchemaClause? subqueryClause # CreateViewStmt ; diff --git a/utils/changelog/README.md b/utils/changelog/README.md index ff3ac39f632..69a190fdedc 100644 --- a/utils/changelog/README.md +++ b/utils/changelog/README.md @@ -1,5 +1,14 @@ ## Generate changelog +Generate github token: +* https://github.com/settings/tokens - keep all checkboxes unchecked, no scopes need to be enabled. + +Dependencies: +``` + apt-get install git curl jq python3 python3-fuzzywuzzy +``` + + Usage example: ``` diff --git a/utils/keeper-bench/Runner.cpp b/utils/keeper-bench/Runner.cpp index d3f51fb2356..1c8deeca476 100644 --- a/utils/keeper-bench/Runner.cpp +++ b/utils/keeper-bench/Runner.cpp @@ -181,7 +181,8 @@ std::vector> Runner::getConnections() "", /*identity*/ Poco::Timespan(0, 30000 * 1000), Poco::Timespan(0, 1000 * 1000), - Poco::Timespan(0, 10000 * 1000))); + Poco::Timespan(0, 10000 * 1000), + nullptr)); } return zookeepers; diff --git a/utils/list-versions/version_date.tsv b/utils/list-versions/version_date.tsv index 28c2d7b1523..5a1d2f4c098 100644 --- a/utils/list-versions/version_date.tsv +++ b/utils/list-versions/version_date.tsv @@ -1,5 +1,9 @@ +v21.7.6.39-stable 2021-08-06 +v21.7.5.29-stable 2021-07-28 +v21.7.4.18-stable 2021-07-17 v21.7.3.14-stable 2021-07-13 v21.7.2.7-stable 2021-07-09 +v21.6.8.62-stable 2021-07-13 v21.6.7.57-stable 2021-07-09 v21.6.6.51-stable 2021-07-02 v21.6.5.37-stable 2021-06-19 diff --git a/website/benchmark/hardware/results/oracle.json b/website/benchmark/hardware/results/oracle.json new file mode 100644 index 00000000000..6470b70e109 --- /dev/null +++ b/website/benchmark/hardware/results/oracle.json @@ -0,0 +1,54 @@ +[ + { + "system": "Oracle Cloud ARM 4vCPU", + "system_full": "Oracle Cloud (free tier), Ampere Altra, 4 vCPU, 24 GiB RAM", + "time": "2021-07-28 00:00:00", + "kind": "cloud", + "result": + [ +[0.031, 0.008, 0.002], +[0.087, 0.044, 0.039], +[0.193, 0.068, 0.068], +[0.807, 0.110, 0.089], +[1.622, 0.230, 0.244], +[2.539, 0.765, 0.742], +[0.146, 0.087, 0.092], +[0.074, 0.044, 0.038], +[1.650, 0.973, 0.978], +[2.020, 1.139, 1.166], +[0.907, 0.530, 0.509], +[0.964, 0.618, 0.608], +[2.448, 1.500, 1.529], +[4.810, 1.994, 1.930], +[2.932, 1.814, 1.853], +[1.935, 1.577, 1.583], +[6.272, 4.697, 4.650], +[4.279, 2.832, 2.871], +[12.380, 9.137, 9.085], +[0.601, 0.167, 0.118], +[25.357, 1.873, 1.848], +[28.153, 2.274, 2.202], +[53.116, 4.946, 4.907], +[56.118, 2.229, 
2.192], +[5.749, 0.732, 0.696], +[1.829, 0.601, 0.592], +[5.860, 0.748, 0.709], +[24.439, 1.954, 1.949], +[20.452, 3.093, 3.042], +[1.539, 1.448, 1.437], +[4.704, 1.362, 1.430], +[12.698, 1.997, 1.940], +[12.854, 10.336, 10.454], +[26.098, 6.737, 6.771], +[26.259, 6.679, 6.677], +[2.602, 2.305, 2.278], +[0.283, 0.182, 0.181], +[0.130, 0.101, 0.085], +[0.174, 0.068, 0.073], +[0.557, 0.374, 0.377], +[0.066, 0.017, 0.017], +[0.049, 0.014, 0.014], +[0.033, 0.006, 0.004] + ] + } +] diff --git a/website/css/bootstrap.css b/website/css/bootstrap.css index b52c19c2b8a..0703d5d3b65 100644 --- a/website/css/bootstrap.css +++ b/website/css/bootstrap.css @@ -527,7 +527,6 @@ kbd kbd { pre { display: block; font-size: 87.5%; - color: #212529; } pre code {