Merge branch 'master' into less-header-1

This commit is contained in:
János Benjamin Antal 2024-03-18 10:25:42 +01:00 committed by GitHub
commit 4b4a03c0f1
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
1147 changed files with 24078 additions and 24848 deletions

View File

@ -5,128 +5,127 @@
# a) the new check is not controversial (this includes many checks in readability-* and google-*) or
# b) too noisy (checks with > 100 new warnings are considered noisy, this includes e.g. cppcoreguidelines-*).
# TODO: Once clang(-tidy) 17 is the minimum, we can convert this list to YAML
# See https://releases.llvm.org/17.0.1/tools/clang/tools/extra/docs/ReleaseNotes.html#improvements-to-clang-tidy
# TODO Let clang-tidy check headers in further directories
# --> HeaderFilterRegex: '^.*/(src|base|programs|utils)/.*(h|hpp)$'
HeaderFilterRegex: '^.*/(base|programs|utils)/.*(h|hpp)$'
Checks: '*,
-abseil-*,
Checks: [
'*',
-altera-*,
'-abseil-*',
-android-*,
'-altera-*',
-bugprone-assignment-in-if-condition,
-bugprone-branch-clone,
-bugprone-easily-swappable-parameters,
-bugprone-exception-escape,
-bugprone-implicit-widening-of-multiplication-result,
-bugprone-narrowing-conversions,
-bugprone-not-null-terminated-result,
-bugprone-reserved-identifier, # useful but too slow, TODO retry when https://reviews.llvm.org/rG1c282052624f9d0bd273bde0b47b30c96699c6c7 is merged
-bugprone-unchecked-optional-access,
'-android-*',
-cert-dcl16-c,
-cert-dcl37-c,
-cert-dcl51-cpp,
-cert-err58-cpp,
-cert-msc32-c,
-cert-msc51-cpp,
-cert-oop54-cpp,
-cert-oop57-cpp,
'-bugprone-assignment-in-if-condition',
'-bugprone-branch-clone',
'-bugprone-easily-swappable-parameters',
'-bugprone-exception-escape',
'-bugprone-implicit-widening-of-multiplication-result',
'-bugprone-narrowing-conversions',
'-bugprone-not-null-terminated-result',
'-bugprone-reserved-identifier', # useful but too slow, TODO retry when https://reviews.llvm.org/rG1c282052624f9d0bd273bde0b47b30c96699c6c7 is merged
'-bugprone-unchecked-optional-access',
-clang-analyzer-unix.Malloc,
'-cert-dcl16-c',
'-cert-dcl37-c',
'-cert-dcl51-cpp',
'-cert-err58-cpp',
'-cert-msc32-c',
'-cert-msc51-cpp',
'-cert-oop54-cpp',
'-cert-oop57-cpp',
-cppcoreguidelines-*, # impractical in a codebase as large as ClickHouse, also slow
'-clang-analyzer-unix.Malloc',
-darwin-*,
'-cppcoreguidelines-*', # impractical in a codebase as large as ClickHouse, also slow
-fuchsia-*,
'-darwin-*',
-google-build-using-namespace,
-google-readability-braces-around-statements,
-google-readability-casting,
-google-readability-function-size,
-google-readability-namespace-comments,
-google-readability-todo,
'-fuchsia-*',
-hicpp-avoid-c-arrays,
-hicpp-avoid-goto,
-hicpp-braces-around-statements,
-hicpp-explicit-conversions,
-hicpp-function-size,
-hicpp-member-init,
-hicpp-move-const-arg,
-hicpp-multiway-paths-covered,
-hicpp-named-parameter,
-hicpp-no-array-decay,
-hicpp-no-assembler,
-hicpp-no-malloc,
-hicpp-signed-bitwise,
-hicpp-special-member-functions,
-hicpp-uppercase-literal-suffix,
-hicpp-use-auto,
-hicpp-use-emplace,
-hicpp-vararg,
'-google-build-using-namespace',
'-google-readability-braces-around-statements',
'-google-readability-casting',
'-google-readability-function-size',
'-google-readability-namespace-comments',
'-google-readability-todo',
-linuxkernel-*,
'-hicpp-avoid-c-arrays',
'-hicpp-avoid-goto',
'-hicpp-braces-around-statements',
'-hicpp-explicit-conversions',
'-hicpp-function-size',
'-hicpp-member-init',
'-hicpp-move-const-arg',
'-hicpp-multiway-paths-covered',
'-hicpp-named-parameter',
'-hicpp-no-array-decay',
'-hicpp-no-assembler',
'-hicpp-no-malloc',
'-hicpp-signed-bitwise',
'-hicpp-special-member-functions',
'-hicpp-uppercase-literal-suffix',
'-hicpp-use-auto',
'-hicpp-use-emplace',
'-hicpp-vararg',
-llvm-*,
'-linuxkernel-*',
-llvmlibc-*,
'-llvm-*',
-openmp-*,
'-llvmlibc-*',
-misc-const-correctness,
-misc-include-cleaner, # useful but far too many occurrences
-misc-no-recursion,
-misc-non-private-member-variables-in-classes,
-misc-confusable-identifiers, # useful but slooow
-misc-use-anonymous-namespace,
'-openmp-*',
-modernize-avoid-c-arrays,
-modernize-concat-nested-namespaces,
-modernize-macro-to-enum,
-modernize-pass-by-value,
-modernize-return-braced-init-list,
-modernize-use-auto,
-modernize-use-default-member-init,
-modernize-use-emplace,
-modernize-use-nodiscard,
-modernize-use-override,
-modernize-use-trailing-return-type,
'-misc-const-correctness',
'-misc-include-cleaner', # useful but far too many occurrences
'-misc-no-recursion',
'-misc-non-private-member-variables-in-classes',
'-misc-confusable-identifiers', # useful but slooo
'-misc-use-anonymous-namespace',
-performance-inefficient-string-concatenation,
-performance-no-int-to-ptr,
-performance-avoid-endl,
-performance-unnecessary-value-param,
'-modernize-avoid-c-arrays',
'-modernize-concat-nested-namespaces',
'-modernize-macro-to-enum',
'-modernize-pass-by-value',
'-modernize-return-braced-init-list',
'-modernize-use-auto',
'-modernize-use-default-member-init',
'-modernize-use-emplace',
'-modernize-use-nodiscard',
'-modernize-use-override',
'-modernize-use-trailing-return-type',
-portability-simd-intrinsics,
'-performance-inefficient-string-concatenation',
'-performance-no-int-to-ptr',
'-performance-avoid-endl',
'-performance-unnecessary-value-param',
-readability-avoid-unconditional-preprocessor-if,
-readability-braces-around-statements,
-readability-convert-member-functions-to-static,
-readability-else-after-return,
-readability-function-cognitive-complexity,
-readability-function-size,
-readability-identifier-length,
-readability-identifier-naming, # useful but too slow
-readability-implicit-bool-conversion,
-readability-isolate-declaration,
-readability-magic-numbers,
-readability-named-parameter,
-readability-redundant-declaration,
-readability-simplify-boolean-expr,
-readability-static-accessed-through-instance,
-readability-suspicious-call-argument,
-readability-uppercase-literal-suffix,
-readability-use-anyofallof,
'-portability-simd-intrinsics',
-zircon-*,
'
'-readability-avoid-unconditional-preprocessor-if',
'-readability-braces-around-statements',
'-readability-convert-member-functions-to-static',
'-readability-else-after-return',
'-readability-function-cognitive-complexity',
'-readability-function-size',
'-readability-identifier-length',
'-readability-identifier-naming', # useful but too slow
'-readability-implicit-bool-conversion',
'-readability-isolate-declaration',
'-readability-magic-numbers',
'-readability-named-parameter',
'-readability-redundant-declaration',
'-readability-simplify-boolean-expr',
'-readability-static-accessed-through-instance',
'-readability-suspicious-call-argument',
'-readability-uppercase-literal-suffix',
'-readability-use-anyofallof',
'-zircon-*'
]
WarningsAsErrors: '*'

View File

@ -305,7 +305,7 @@ jobs:
runner_type: style-checker-aarch64
data: ${{ needs.RunConfig.outputs.data }}
MarkReleaseReady:
if: ${{ ! (contains(needs.*.result, 'skipped') || contains(needs.*.result, 'failure')) }}
if: ${{ !failure() && !cancelled() }}
needs:
- BuilderBinDarwin
- BuilderBinDarwinAarch64
@ -313,9 +313,25 @@ jobs:
- BuilderDebAarch64
runs-on: [self-hosted, style-checker]
steps:
- name: Debug
run: |
echo need with different filters
cat << 'EOF'
${{ toJSON(needs) }}
${{ toJSON(needs.*.result) }}
no failures ${{ !contains(needs.*.result, 'failure') }}
no skips ${{ !contains(needs.*.result, 'skipped') }}
no both ${{ !(contains(needs.*.result, 'skipped') || contains(needs.*.result, 'failure')) }}
EOF
- name: Not ready
# fail the job to be able restart it
if: ${{ contains(needs.*.result, 'skipped') || contains(needs.*.result, 'failure') }}
run: exit 1
- name: Check out repository code
if: ${{ ! (contains(needs.*.result, 'skipped') || contains(needs.*.result, 'failure')) }}
uses: ClickHouse/checkout@v1
- name: Mark Commit Release Ready
if: ${{ ! (contains(needs.*.result, 'skipped') || contains(needs.*.result, 'failure')) }}
run: |
cd "$GITHUB_WORKSPACE/tests/ci"
python3 mark_release_ready.py

View File

@ -45,62 +45,3 @@ jobs:
with:
data: "${{ needs.RunConfig.outputs.data }}"
set_latest: true
SonarCloud:
runs-on: [self-hosted, builder]
env:
SONAR_SCANNER_VERSION: 4.8.0.2856
SONAR_SERVER_URL: "https://sonarcloud.io"
BUILD_WRAPPER_OUT_DIR: build_wrapper_output_directory # Directory where build-wrapper output will be placed
CC: clang-17
CXX: clang++-17
steps:
- name: Check out repository code
uses: ClickHouse/checkout@v1
with:
clear-repository: true
fetch-depth: 0 # Shallow clones should be disabled for a better relevancy of analysis
filter: tree:0
submodules: true
- name: Set up JDK 11
uses: actions/setup-java@v1
with:
java-version: 11
- name: Download and set up sonar-scanner
env:
SONAR_SCANNER_DOWNLOAD_URL: https://binaries.sonarsource.com/Distribution/sonar-scanner-cli/sonar-scanner-cli-${{ env.SONAR_SCANNER_VERSION }}-linux.zip
run: |
mkdir -p "$HOME/.sonar"
curl -sSLo "$HOME/.sonar/sonar-scanner.zip" "${{ env.SONAR_SCANNER_DOWNLOAD_URL }}"
unzip -o "$HOME/.sonar/sonar-scanner.zip" -d "$HOME/.sonar/"
echo "$HOME/.sonar/sonar-scanner-${{ env.SONAR_SCANNER_VERSION }}-linux/bin" >> "$GITHUB_PATH"
- name: Download and set up build-wrapper
env:
BUILD_WRAPPER_DOWNLOAD_URL: ${{ env.SONAR_SERVER_URL }}/static/cpp/build-wrapper-linux-x86.zip
run: |
curl -sSLo "$HOME/.sonar/build-wrapper-linux-x86.zip" "${{ env.BUILD_WRAPPER_DOWNLOAD_URL }}"
unzip -o "$HOME/.sonar/build-wrapper-linux-x86.zip" -d "$HOME/.sonar/"
echo "$HOME/.sonar/build-wrapper-linux-x86" >> "$GITHUB_PATH"
- name: Set Up Build Tools
run: |
sudo apt-get update
sudo apt-get install -yq git cmake ccache ninja-build python3 yasm nasm
sudo bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"
- name: Run build-wrapper
run: |
mkdir build
cd build
cmake ..
cd ..
build-wrapper-linux-x86-64 --out-dir ${{ env.BUILD_WRAPPER_OUT_DIR }} cmake --build build/
- name: Run sonar-scanner
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
run: |
sonar-scanner \
--define sonar.host.url="${{ env.SONAR_SERVER_URL }}" \
--define sonar.cfamily.build-wrapper-output="${{ env.BUILD_WRAPPER_OUT_DIR }}" \
--define sonar.projectKey="ClickHouse_ClickHouse" \
--define sonar.organization="clickhouse-java" \
--define sonar.cfamily.cpp23.enabled=true \
--define sonar.exclusions="**/*.java,**/*.ts,**/*.js,**/*.css,**/*.sql"

View File

@ -172,6 +172,7 @@ jobs:
run: |
cd "$GITHUB_WORKSPACE/tests/ci"
python3 finish_check.py
python3 merge_pr.py --check-approved
#############################################################################################

View File

@ -206,7 +206,7 @@ jobs:
runner_type: style-checker-aarch64
data: ${{ needs.RunConfig.outputs.data }}
MarkReleaseReady:
if: ${{ ! (contains(needs.*.result, 'skipped') || contains(needs.*.result, 'failure')) }}
if: ${{ !failure() && !cancelled() }}
needs:
- BuilderBinDarwin
- BuilderBinDarwinAarch64
@ -214,9 +214,25 @@ jobs:
- BuilderDebAarch64
runs-on: [self-hosted, style-checker-aarch64]
steps:
- name: Debug
run: |
echo need with different filters
cat << 'EOF'
${{ toJSON(needs) }}
${{ toJSON(needs.*.result) }}
no failures ${{ !contains(needs.*.result, 'failure') }}
no skips ${{ !contains(needs.*.result, 'skipped') }}
no both ${{ !(contains(needs.*.result, 'skipped') || contains(needs.*.result, 'failure')) }}
EOF
- name: Not ready
# fail the job to be able restart it
if: ${{ contains(needs.*.result, 'skipped') || contains(needs.*.result, 'failure') }}
run: exit 1
- name: Check out repository code
if: ${{ ! (contains(needs.*.result, 'skipped') || contains(needs.*.result, 'failure')) }}
uses: ClickHouse/checkout@v1
- name: Mark Commit Release Ready
if: ${{ ! (contains(needs.*.result, 'skipped') || contains(needs.*.result, 'failure')) }}
run: |
cd "$GITHUB_WORKSPACE/tests/ci"
python3 mark_release_ready.py

2
.gitignore vendored
View File

@ -165,7 +165,7 @@ tests/queries/0_stateless/*.expect.history
tests/integration/**/_gen
# rust
/rust/**/target
/rust/**/target*
# It is autogenerated from *.in
/rust/**/.cargo/config.toml
/rust/**/vendor

View File

@ -56,13 +56,13 @@ option(ENABLE_CHECK_HEAVY_BUILDS "Don't allow C++ translation units to compile t
if (ENABLE_CHECK_HEAVY_BUILDS)
# set DATA (since RSS does not work since 2.6.x+) to 5G
set (RLIMIT_DATA 5000000000)
# set VIRT (RLIMIT_AS) to 10G (DATA*10)
# set VIRT (RLIMIT_AS) to 10G (DATA*2)
set (RLIMIT_AS 10000000000)
# set CPU time limit to 1000 seconds
set (RLIMIT_CPU 1000)
# -fsanitize=memory is too heavy
if (SANITIZE STREQUAL "memory")
# -fsanitize=memory and address are too heavy
if (SANITIZE OR SANITIZE_COVERAGE OR WITH_COVERAGE)
set (RLIMIT_DATA 10000000000) # 10G
endif()
@ -110,11 +110,6 @@ endif()
# - sanitize.cmake
add_library(global-libs INTERFACE)
# We don't want to instrument everything with fuzzer, but only specific targets (see below),
# also, since we build our own llvm, we specifically don't want to instrument
# libFuzzer library itself - it would result in infinite recursion
#include (cmake/fuzzer.cmake)
include (cmake/sanitize.cmake)
option(ENABLE_COLORED_BUILD "Enable colors in compiler output" ON)
@ -319,7 +314,8 @@ if (COMPILER_CLANG)
endif()
endif ()
set (COMPILER_FLAGS "${COMPILER_FLAGS}")
# Disable floating-point expression contraction in order to get consistent floating point calculation results across platforms
set (COMPILER_FLAGS "${COMPILER_FLAGS} -ffp-contract=off")
# Our built-in unwinder only supports DWARF version up to 4.
set (DEBUG_INFO_FLAGS "-g")
@ -553,7 +549,9 @@ if (ENABLE_RUST)
endif()
endif()
if (CMAKE_BUILD_TYPE_UC STREQUAL "RELWITHDEBINFO" AND NOT SANITIZE AND NOT SANITIZE_COVERAGE AND OS_LINUX AND (ARCH_AMD64 OR ARCH_AARCH64))
if (CMAKE_BUILD_TYPE_UC STREQUAL "RELWITHDEBINFO"
AND NOT SANITIZE AND NOT SANITIZE_COVERAGE AND NOT ENABLE_FUZZING
AND OS_LINUX AND (ARCH_AMD64 OR ARCH_AARCH64))
set(CHECK_LARGE_OBJECT_SIZES_DEFAULT ON)
else ()
set(CHECK_LARGE_OBJECT_SIZES_DEFAULT OFF)
@ -576,10 +574,7 @@ if (FUZZER)
if (NOT(target_type STREQUAL "INTERFACE_LIBRARY" OR target_type STREQUAL "UTILITY"))
target_compile_options(${target} PRIVATE "-fsanitize=fuzzer-no-link")
endif()
# clickhouse fuzzer isn't working correctly
# initial PR https://github.com/ClickHouse/ClickHouse/pull/27526
#if (target MATCHES ".+_fuzzer" OR target STREQUAL "clickhouse")
if (target_type STREQUAL "EXECUTABLE" AND target MATCHES ".+_fuzzer")
if (target_type STREQUAL "EXECUTABLE" AND (target MATCHES ".+_fuzzer" OR target STREQUAL "clickhouse"))
message(STATUS "${target} instrumented with fuzzer")
target_link_libraries(${target} PUBLIC ch_contrib::fuzzer)
# Add to fuzzers bundle

View File

@ -31,15 +31,30 @@ curl https://clickhouse.com/ | sh
* [Static Analysis (SonarCloud)](https://sonarcloud.io/project/issues?resolved=false&id=ClickHouse_ClickHouse) proposes C++ quality improvements.
* [Contacts](https://clickhouse.com/company/contact) can help to get your questions answered if there are any.
## Monthly Release & Community Call
Every month we get together with the community (users, contributors, customers, those interested in learning more about ClickHouse) to discuss what is coming in the latest release. If you are interested in sharing what you've built on ClickHouse, let us know.
* [v24.3 Community Call](https://clickhouse.com/company/events/v24-3-community-release-call) - Mar 26
* [v24.4 Community Call](https://clickhouse.com/company/events/v24-4-community-release-call) - Apr 30
## Upcoming Events
Keep an eye out for upcoming meetups around the world. Somewhere else you want us to be? Please feel free to reach out to tyler `<at>` clickhouse `<dot>` com.
Keep an eye out for upcoming meetups and eventsaround the world. Somewhere else you want us to be? Please feel free to reach out to tyler `<at>` clickhouse `<dot>` com. You can also peruse [ClickHouse Events](https://clickhouse.com/company/news-events) for a list of all upcoming trainings, meetups, speaking engagements, etc.
* [ClickHouse Meetup in Bellevue](https://www.meetup.com/clickhouse-seattle-user-group/events/298650371/) - Mar 11
* [ClickHouse Meetup at Ramp's Offices in NYC](https://www.meetup.com/clickhouse-new-york-user-group/events/298640542/) - Mar 19
* [ClickHouse Melbourne Meetup](https://www.meetup.com/clickhouse-australia-user-group/events/299479750/) - Mar 20
* [ClickHouse Meetup in Paris](https://www.meetup.com/clickhouse-france-user-group/events/298997115/) - Mar 21
* [ClickHouse Meetup in Bengaluru](https://www.meetup.com/clickhouse-bangalore-user-group/events/299479850/) - Mar 23
* [ClickHouse Meetup in Zurich](https://www.meetup.com/clickhouse-switzerland-meetup-group/events/299628922/) - Apr 16
* [ClickHouse Meetup in Copenhagen](https://www.meetup.com/clickhouse-denmark-meetup-group/events/299629133/) - Apr 23
* [ClickHouse Meetup in Dubai](https://www.meetup.com/clickhouse-dubai-meetup-group/events/299629189/) - May 28
## Recent Recordings
* **Recent Meetup Videos**: [Meetup Playlist](https://www.youtube.com/playlist?list=PL0Z2YDlm0b3iNDUzpY1S3L_iV4nARda_U) Whenever possible recordings of the ClickHouse Community Meetups are edited and presented as individual talks. Current featuring "Modern SQL in 2023", "Fast, Concurrent, and Consistent Asynchronous INSERTS in ClickHouse", and "Full-Text Indices: Design and Experiments"
* **Recording available**: [**v24.1 Release Webinar**](https://www.youtube.com/watch?v=pBF9g0wGAGs) All the features of 24.1, one convenient video! Watch it now!
* **All release webinar recordings**: [YouTube playlist](https://www.youtube.com/playlist?list=PL0Z2YDlm0b3jAlSy1JxyP8zluvXaN3nxU)
* **Recording available**: [**v24.2 Release Call**](https://www.youtube.com/watch?v=iN2y-TK8f3A) All the features of 24.2, one convenient video! Watch it now!
## Interested in joining ClickHouse and making it your full-time job?

View File

@ -13,6 +13,7 @@ set (SRCS
cgroupsv2.cpp
coverage.cpp
demangle.cpp
Decimal.cpp
getAvailableMemoryAmount.cpp
getFQDNOrHostName.cpp
getMemoryAmount.cpp

87
base/base/Decimal.cpp Normal file
View File

@ -0,0 +1,87 @@
#include <base/Decimal.h>
#include <base/extended_types.h>
namespace DB
{
/// Explicit template instantiations.
#define FOR_EACH_UNDERLYING_DECIMAL_TYPE(M) \
M(Int32) \
M(Int64) \
M(Int128) \
M(Int256)
#define FOR_EACH_UNDERLYING_DECIMAL_TYPE_PASS(M, X) \
M(Int32, X) \
M(Int64, X) \
M(Int128, X) \
M(Int256, X)
template <typename T> const Decimal<T> & Decimal<T>::operator += (const T & x) { value += x; return *this; }
template <typename T> const Decimal<T> & Decimal<T>::operator -= (const T & x) { value -= x; return *this; }
template <typename T> const Decimal<T> & Decimal<T>::operator *= (const T & x) { value *= x; return *this; }
template <typename T> const Decimal<T> & Decimal<T>::operator /= (const T & x) { value /= x; return *this; }
template <typename T> const Decimal<T> & Decimal<T>::operator %= (const T & x) { value %= x; return *this; }
template <typename T> void NO_SANITIZE_UNDEFINED Decimal<T>::addOverflow(const T & x) { value += x; }
/// Maybe this explicit instantiation affects performance since operators cannot be inlined.
template <typename T> template <typename U> const Decimal<T> & Decimal<T>::operator += (const Decimal<U> & x) { value += static_cast<T>(x.value); return *this; }
template <typename T> template <typename U> const Decimal<T> & Decimal<T>::operator -= (const Decimal<U> & x) { value -= static_cast<T>(x.value); return *this; }
template <typename T> template <typename U> const Decimal<T> & Decimal<T>::operator *= (const Decimal<U> & x) { value *= static_cast<T>(x.value); return *this; }
template <typename T> template <typename U> const Decimal<T> & Decimal<T>::operator /= (const Decimal<U> & x) { value /= static_cast<T>(x.value); return *this; }
template <typename T> template <typename U> const Decimal<T> & Decimal<T>::operator %= (const Decimal<U> & x) { value %= static_cast<T>(x.value); return *this; }
#define DISPATCH(TYPE_T, TYPE_U) \
template const Decimal<TYPE_T> & Decimal<TYPE_T>::operator += (const Decimal<TYPE_U> & x); \
template const Decimal<TYPE_T> & Decimal<TYPE_T>::operator -= (const Decimal<TYPE_U> & x); \
template const Decimal<TYPE_T> & Decimal<TYPE_T>::operator *= (const Decimal<TYPE_U> & x); \
template const Decimal<TYPE_T> & Decimal<TYPE_T>::operator /= (const Decimal<TYPE_U> & x); \
template const Decimal<TYPE_T> & Decimal<TYPE_T>::operator %= (const Decimal<TYPE_U> & x);
#define INVOKE(X) FOR_EACH_UNDERLYING_DECIMAL_TYPE_PASS(DISPATCH, X)
FOR_EACH_UNDERLYING_DECIMAL_TYPE(INVOKE);
#undef INVOKE
#undef DISPATCH
#define DISPATCH(TYPE) template struct Decimal<TYPE>;
FOR_EACH_UNDERLYING_DECIMAL_TYPE(DISPATCH)
#undef DISPATCH
template <typename T> bool operator< (const Decimal<T> & x, const Decimal<T> & y) { return x.value < y.value; }
template <typename T> bool operator> (const Decimal<T> & x, const Decimal<T> & y) { return x.value > y.value; }
template <typename T> bool operator<= (const Decimal<T> & x, const Decimal<T> & y) { return x.value <= y.value; }
template <typename T> bool operator>= (const Decimal<T> & x, const Decimal<T> & y) { return x.value >= y.value; }
template <typename T> bool operator== (const Decimal<T> & x, const Decimal<T> & y) { return x.value == y.value; }
template <typename T> bool operator!= (const Decimal<T> & x, const Decimal<T> & y) { return x.value != y.value; }
#define DISPATCH(TYPE) \
template bool operator< (const Decimal<TYPE> & x, const Decimal<TYPE> & y); \
template bool operator> (const Decimal<TYPE> & x, const Decimal<TYPE> & y); \
template bool operator<= (const Decimal<TYPE> & x, const Decimal<TYPE> & y); \
template bool operator>= (const Decimal<TYPE> & x, const Decimal<TYPE> & y); \
template bool operator== (const Decimal<TYPE> & x, const Decimal<TYPE> & y); \
template bool operator!= (const Decimal<TYPE> & x, const Decimal<TYPE> & y);
FOR_EACH_UNDERLYING_DECIMAL_TYPE(DISPATCH)
#undef DISPATCH
template <typename T> Decimal<T> operator+ (const Decimal<T> & x, const Decimal<T> & y) { return x.value + y.value; }
template <typename T> Decimal<T> operator- (const Decimal<T> & x, const Decimal<T> & y) { return x.value - y.value; }
template <typename T> Decimal<T> operator* (const Decimal<T> & x, const Decimal<T> & y) { return x.value * y.value; }
template <typename T> Decimal<T> operator/ (const Decimal<T> & x, const Decimal<T> & y) { return x.value / y.value; }
template <typename T> Decimal<T> operator- (const Decimal<T> & x) { return -x.value; }
#define DISPATCH(TYPE) \
template Decimal<TYPE> operator+ (const Decimal<TYPE> & x, const Decimal<TYPE> & y); \
template Decimal<TYPE> operator- (const Decimal<TYPE> & x, const Decimal<TYPE> & y); \
template Decimal<TYPE> operator* (const Decimal<TYPE> & x, const Decimal<TYPE> & y); \
template Decimal<TYPE> operator/ (const Decimal<TYPE> & x, const Decimal<TYPE> & y); \
template Decimal<TYPE> operator- (const Decimal<TYPE> & x);
FOR_EACH_UNDERLYING_DECIMAL_TYPE(DISPATCH)
#undef DISPATCH
#undef FOR_EACH_UNDERLYING_DECIMAL_TYPE_PASS
#undef FOR_EACH_UNDERLYING_DECIMAL_TYPE
}

View File

@ -2,6 +2,7 @@
#include <base/extended_types.h>
#include <base/Decimal_fwd.h>
#include <base/types.h>
#include <base/defines.h>
@ -10,6 +11,18 @@ namespace DB
template <class> struct Decimal;
class DateTime64;
#define FOR_EACH_UNDERLYING_DECIMAL_TYPE(M) \
M(Int32) \
M(Int64) \
M(Int128) \
M(Int256)
#define FOR_EACH_UNDERLYING_DECIMAL_TYPE_PASS(M, X) \
M(Int32, X) \
M(Int64, X) \
M(Int128, X) \
M(Int256, X)
using Decimal32 = Decimal<Int32>;
using Decimal64 = Decimal<Int64>;
using Decimal128 = Decimal<Int128>;
@ -50,36 +63,73 @@ struct Decimal
return static_cast<U>(value);
}
const Decimal<T> & operator += (const T & x) { value += x; return *this; }
const Decimal<T> & operator -= (const T & x) { value -= x; return *this; }
const Decimal<T> & operator *= (const T & x) { value *= x; return *this; }
const Decimal<T> & operator /= (const T & x) { value /= x; return *this; }
const Decimal<T> & operator %= (const T & x) { value %= x; return *this; }
const Decimal<T> & operator += (const T & x);
const Decimal<T> & operator -= (const T & x);
const Decimal<T> & operator *= (const T & x);
const Decimal<T> & operator /= (const T & x);
const Decimal<T> & operator %= (const T & x);
template <typename U> const Decimal<T> & operator += (const Decimal<U> & x) { value += x.value; return *this; }
template <typename U> const Decimal<T> & operator -= (const Decimal<U> & x) { value -= x.value; return *this; }
template <typename U> const Decimal<T> & operator *= (const Decimal<U> & x) { value *= x.value; return *this; }
template <typename U> const Decimal<T> & operator /= (const Decimal<U> & x) { value /= x.value; return *this; }
template <typename U> const Decimal<T> & operator %= (const Decimal<U> & x) { value %= x.value; return *this; }
template <typename U> const Decimal<T> & operator += (const Decimal<U> & x);
template <typename U> const Decimal<T> & operator -= (const Decimal<U> & x);
template <typename U> const Decimal<T> & operator *= (const Decimal<U> & x);
template <typename U> const Decimal<T> & operator /= (const Decimal<U> & x);
template <typename U> const Decimal<T> & operator %= (const Decimal<U> & x);
/// This is to avoid UB for sumWithOverflow()
void NO_SANITIZE_UNDEFINED addOverflow(const T & x) { value += x; }
void NO_SANITIZE_UNDEFINED addOverflow(const T & x);
T value;
};
template <typename T> inline bool operator< (const Decimal<T> & x, const Decimal<T> & y) { return x.value < y.value; }
template <typename T> inline bool operator> (const Decimal<T> & x, const Decimal<T> & y) { return x.value > y.value; }
template <typename T> inline bool operator<= (const Decimal<T> & x, const Decimal<T> & y) { return x.value <= y.value; }
template <typename T> inline bool operator>= (const Decimal<T> & x, const Decimal<T> & y) { return x.value >= y.value; }
template <typename T> inline bool operator== (const Decimal<T> & x, const Decimal<T> & y) { return x.value == y.value; }
template <typename T> inline bool operator!= (const Decimal<T> & x, const Decimal<T> & y) { return x.value != y.value; }
#define DISPATCH(TYPE) extern template struct Decimal<TYPE>;
FOR_EACH_UNDERLYING_DECIMAL_TYPE(DISPATCH)
#undef DISPATCH
template <typename T> inline Decimal<T> operator+ (const Decimal<T> & x, const Decimal<T> & y) { return x.value + y.value; }
template <typename T> inline Decimal<T> operator- (const Decimal<T> & x, const Decimal<T> & y) { return x.value - y.value; }
template <typename T> inline Decimal<T> operator* (const Decimal<T> & x, const Decimal<T> & y) { return x.value * y.value; }
template <typename T> inline Decimal<T> operator/ (const Decimal<T> & x, const Decimal<T> & y) { return x.value / y.value; }
template <typename T> inline Decimal<T> operator- (const Decimal<T> & x) { return -x.value; }
#define DISPATCH(TYPE_T, TYPE_U) \
extern template const Decimal<TYPE_T> & Decimal<TYPE_T>::operator += (const Decimal<TYPE_U> & x); \
extern template const Decimal<TYPE_T> & Decimal<TYPE_T>::operator -= (const Decimal<TYPE_U> & x); \
extern template const Decimal<TYPE_T> & Decimal<TYPE_T>::operator *= (const Decimal<TYPE_U> & x); \
extern template const Decimal<TYPE_T> & Decimal<TYPE_T>::operator /= (const Decimal<TYPE_U> & x); \
extern template const Decimal<TYPE_T> & Decimal<TYPE_T>::operator %= (const Decimal<TYPE_U> & x);
#define INVOKE(X) FOR_EACH_UNDERLYING_DECIMAL_TYPE_PASS(DISPATCH, X)
FOR_EACH_UNDERLYING_DECIMAL_TYPE(INVOKE);
#undef INVOKE
#undef DISPATCH
template <typename T> bool operator< (const Decimal<T> & x, const Decimal<T> & y);
template <typename T> bool operator> (const Decimal<T> & x, const Decimal<T> & y);
template <typename T> bool operator<= (const Decimal<T> & x, const Decimal<T> & y);
template <typename T> bool operator>= (const Decimal<T> & x, const Decimal<T> & y);
template <typename T> bool operator== (const Decimal<T> & x, const Decimal<T> & y);
template <typename T> bool operator!= (const Decimal<T> & x, const Decimal<T> & y);
#define DISPATCH(TYPE) \
extern template bool operator< (const Decimal<TYPE> & x, const Decimal<TYPE> & y); \
extern template bool operator> (const Decimal<TYPE> & x, const Decimal<TYPE> & y); \
extern template bool operator<= (const Decimal<TYPE> & x, const Decimal<TYPE> & y); \
extern template bool operator>= (const Decimal<TYPE> & x, const Decimal<TYPE> & y); \
extern template bool operator== (const Decimal<TYPE> & x, const Decimal<TYPE> & y); \
extern template bool operator!= (const Decimal<TYPE> & x, const Decimal<TYPE> & y);
FOR_EACH_UNDERLYING_DECIMAL_TYPE(DISPATCH)
#undef DISPATCH
template <typename T> Decimal<T> operator+ (const Decimal<T> & x, const Decimal<T> & y);
template <typename T> Decimal<T> operator- (const Decimal<T> & x, const Decimal<T> & y);
template <typename T> Decimal<T> operator* (const Decimal<T> & x, const Decimal<T> & y);
template <typename T> Decimal<T> operator/ (const Decimal<T> & x, const Decimal<T> & y);
template <typename T> Decimal<T> operator- (const Decimal<T> & x);
#define DISPATCH(TYPE) \
extern template Decimal<TYPE> operator+ (const Decimal<TYPE> & x, const Decimal<TYPE> & y); \
extern template Decimal<TYPE> operator- (const Decimal<TYPE> & x, const Decimal<TYPE> & y); \
extern template Decimal<TYPE> operator* (const Decimal<TYPE> & x, const Decimal<TYPE> & y); \
extern template Decimal<TYPE> operator/ (const Decimal<TYPE> & x, const Decimal<TYPE> & y); \
extern template Decimal<TYPE> operator- (const Decimal<TYPE> & x);
FOR_EACH_UNDERLYING_DECIMAL_TYPE(DISPATCH)
#undef DISPATCH
#undef FOR_EACH_UNDERLYING_DECIMAL_TYPE_PASS
#undef FOR_EACH_UNDERLYING_DECIMAL_TYPE
/// Distinguishable type to allow function resolution/deduction based on value type,
/// but also relatively easy to convert to/from Decimal64.

View File

@ -1,7 +1,7 @@
#include "coverage.h"
#include <sys/mman.h>
#pragma GCC diagnostic ignored "-Wreserved-identifier"
#pragma clang diagnostic ignored "-Wreserved-identifier"
/// WITH_COVERAGE enables the default implementation of code coverage,

View File

@ -108,16 +108,22 @@
{
[[noreturn]] void abortOnFailedAssertion(const String & description);
}
#define chassert(x) do { static_cast<bool>(x) ? void(0) : ::DB::abortOnFailedAssertion(#x); } while (0)
#define chassert_1(x, ...) do { static_cast<bool>(x) ? void(0) : ::DB::abortOnFailedAssertion(#x); } while (0)
#define chassert_2(x, comment, ...) do { static_cast<bool>(x) ? void(0) : ::DB::abortOnFailedAssertion(comment); } while (0)
#define UNREACHABLE() abort()
// clang-format off
#else
/// Here sizeof() trick is used to suppress unused warning for result,
/// since simple "(void)x" will evaluate the expression, while
/// "sizeof(!(x))" will not.
#define chassert(x) (void)sizeof(!(x))
#define chassert_1(x, ...) (void)sizeof(!(x))
#define chassert_2(x, comment, ...) (void)sizeof(!(x))
#define UNREACHABLE() __builtin_unreachable()
#endif
#define CHASSERT_DISPATCH(_1,_2, N,...) N(_1, _2)
#define CHASSERT_INVOKE(tuple) CHASSERT_DISPATCH tuple
#define chassert(...) CHASSERT_INVOKE((__VA_ARGS__, chassert_2, chassert_1))
#endif
/// Macros for Clang Thread Safety Analysis (TSA). They can be safely ignored by other compilers.

View File

@ -64,6 +64,44 @@ template <> struct is_arithmetic<UInt256> { static constexpr bool value = true;
template <typename T>
inline constexpr bool is_arithmetic_v = is_arithmetic<T>::value;
#define FOR_EACH_ARITHMETIC_TYPE(M) \
M(DataTypeDate) \
M(DataTypeDate32) \
M(DataTypeDateTime) \
M(DataTypeInt8) \
M(DataTypeUInt8) \
M(DataTypeInt16) \
M(DataTypeUInt16) \
M(DataTypeInt32) \
M(DataTypeUInt32) \
M(DataTypeInt64) \
M(DataTypeUInt64) \
M(DataTypeInt128) \
M(DataTypeUInt128) \
M(DataTypeInt256) \
M(DataTypeUInt256) \
M(DataTypeFloat32) \
M(DataTypeFloat64)
#define FOR_EACH_ARITHMETIC_TYPE_PASS(M, X) \
M(DataTypeDate, X) \
M(DataTypeDate32, X) \
M(DataTypeDateTime, X) \
M(DataTypeInt8, X) \
M(DataTypeUInt8, X) \
M(DataTypeInt16, X) \
M(DataTypeUInt16, X) \
M(DataTypeInt32, X) \
M(DataTypeUInt32, X) \
M(DataTypeInt64, X) \
M(DataTypeUInt64, X) \
M(DataTypeInt128, X) \
M(DataTypeUInt128, X) \
M(DataTypeInt256, X) \
M(DataTypeUInt256, X) \
M(DataTypeFloat32, X) \
M(DataTypeFloat64, X)
template <typename T>
struct make_unsigned // NOLINT(readability-identifier-naming)
{

View File

@ -59,8 +59,8 @@ using ComparatorWrapper = Comparator;
#endif
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wold-style-cast"
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wold-style-cast"
#include <miniselect/floyd_rivest_select.h>
@ -115,7 +115,7 @@ void partial_sort(RandomIt first, RandomIt middle, RandomIt last)
::partial_sort(first, middle, last, comparator());
}
#pragma GCC diagnostic pop
#pragma clang diagnostic pop
template <typename RandomIt, typename Compare>
void sort(RandomIt first, RandomIt last, Compare compare)

View File

@ -45,6 +45,8 @@ namespace Net
~HTTPChunkedStreamBuf();
void close();
bool isComplete() const { return _chunk == std::char_traits<char>::eof(); }
protected:
int readFromDevice(char * buffer, std::streamsize length);
int writeToDevice(const char * buffer, std::streamsize length);
@ -68,6 +70,8 @@ namespace Net
~HTTPChunkedIOS();
HTTPChunkedStreamBuf * rdbuf();
bool isComplete() const { return _buf.isComplete(); }
protected:
HTTPChunkedStreamBuf _buf;
};

View File

@ -210,7 +210,7 @@ namespace Net
void setKeepAliveTimeout(const Poco::Timespan & timeout);
/// Sets the connection timeout for HTTP connections.
const Poco::Timespan & getKeepAliveTimeout() const;
Poco::Timespan getKeepAliveTimeout() const;
/// Returns the connection timeout for HTTP connections.
virtual std::ostream & sendRequest(HTTPRequest & request);
@ -275,7 +275,7 @@ namespace Net
/// This method should only be called if the request contains
/// a "Expect: 100-continue" header.
void flushRequest();
virtual void flushRequest();
/// Flushes the request stream.
///
/// Normally this method does not need to be called.
@ -283,7 +283,7 @@ namespace Net
/// fully sent if receiveResponse() is not called, e.g.,
/// because the underlying socket will be detached.
void reset();
virtual void reset();
/// Resets the session and closes the socket.
///
/// The next request will initiate a new connection,
@ -303,6 +303,9 @@ namespace Net
/// Returns true if the proxy should be bypassed
/// for the current host.
const Poco::Timestamp & getLastRequest() const;
/// Returns time when connection has been used last time
protected:
enum
{
@ -338,6 +341,10 @@ namespace Net
/// Calls proxyConnect() and attaches the resulting StreamSocket
/// to the HTTPClientSession.
void setLastRequest(Poco::Timestamp time);
void assign(HTTPClientSession & session);
HTTPSessionFactory _proxySessionFactory;
/// Factory to create HTTPClientSession to proxy.
private:
@ -433,11 +440,20 @@ namespace Net
}
inline const Poco::Timespan & HTTPClientSession::getKeepAliveTimeout() const
inline Poco::Timespan HTTPClientSession::getKeepAliveTimeout() const
{
return _keepAliveTimeout;
}
inline const Poco::Timestamp & HTTPClientSession::getLastRequest() const
{
return _lastRequest;
}
inline void HTTPClientSession::setLastRequest(Poco::Timestamp time)
{
_lastRequest = time;
}
}
} // namespace Poco::Net

View File

@ -48,6 +48,8 @@ namespace Net
HTTPFixedLengthStreamBuf(HTTPSession & session, ContentLength length, openmode mode);
~HTTPFixedLengthStreamBuf();
bool isComplete() const;
protected:
int readFromDevice(char * buffer, std::streamsize length);
int writeToDevice(const char * buffer, std::streamsize length);
@ -67,6 +69,8 @@ namespace Net
~HTTPFixedLengthIOS();
HTTPFixedLengthStreamBuf * rdbuf();
bool isComplete() const { return _buf.isComplete(); }
protected:
HTTPFixedLengthStreamBuf _buf;
};

View File

@ -64,6 +64,15 @@ namespace Net
Poco::Timespan getTimeout() const;
/// Returns the timeout for the HTTP session.
Poco::Timespan getConnectionTimeout() const;
/// Returns connection timeout for the HTTP session.
Poco::Timespan getSendTimeout() const;
/// Returns send timeout for the HTTP session.
Poco::Timespan getReceiveTimeout() const;
/// Returns receive timeout for the HTTP session.
bool connected() const;
/// Returns true if the underlying socket is connected.
@ -217,12 +226,25 @@ namespace Net
return _keepAlive;
}
inline Poco::Timespan HTTPSession::getTimeout() const
{
return _receiveTimeout;
}
inline Poco::Timespan HTTPSession::getConnectionTimeout() const
{
return _connectionTimeout;
}
inline Poco::Timespan HTTPSession::getSendTimeout() const
{
return _sendTimeout;
}
inline Poco::Timespan HTTPSession::getReceiveTimeout() const
{
return _receiveTimeout;
}
inline StreamSocket & HTTPSession::socket()
{

View File

@ -63,6 +63,8 @@ namespace Net
~HTTPIOS();
HTTPStreamBuf * rdbuf();
bool isComplete() const { return false; }
protected:
HTTPStreamBuf _buf;
};

View File

@ -49,10 +49,12 @@ HTTPChunkedStreamBuf::~HTTPChunkedStreamBuf()
void HTTPChunkedStreamBuf::close()
{
if (_mode & std::ios::out)
if (_mode & std::ios::out && _chunk != std::char_traits<char>::eof())
{
sync();
_session.write("0\r\n\r\n", 5);
_chunk = std::char_traits<char>::eof();
}
}

View File

@ -227,7 +227,7 @@ void HTTPClientSession::setKeepAliveTimeout(const Poco::Timespan& timeout)
std::ostream& HTTPClientSession::sendRequest(HTTPRequest& request)
{
_pRequestStream = 0;
_pResponseStream = 0;
_pResponseStream = 0;
clearException();
_responseReceived = false;
@ -501,5 +501,26 @@ bool HTTPClientSession::bypassProxy() const
else return false;
}
void HTTPClientSession::assign(Poco::Net::HTTPClientSession & session)
{
poco_assert (this != &session);
if (session.buffered())
throw Poco::LogicException("assign a session with not empty buffered data");
if (buffered())
throw Poco::LogicException("assign to a session with not empty buffered data");
attachSocket(session.detachSocket());
setLastRequest(session.getLastRequest());
setResolvedHost(session.getResolvedHost());
setKeepAlive(session.getKeepAlive());
setTimeout(session.getConnectionTimeout(), session.getSendTimeout(), session.getReceiveTimeout());
setKeepAliveTimeout(session.getKeepAliveTimeout());
setProxyConfig(session.getProxyConfig());
session.reset();
}
} } // namespace Poco::Net

View File

@ -43,6 +43,12 @@ HTTPFixedLengthStreamBuf::~HTTPFixedLengthStreamBuf()
}
bool HTTPFixedLengthStreamBuf::isComplete() const
{
return _count == _length;
}
int HTTPFixedLengthStreamBuf::readFromDevice(char* buffer, std::streamsize length)
{
int n = 0;

View File

@ -93,7 +93,7 @@ void TCPServerDispatcher::release()
void TCPServerDispatcher::run()
{
AutoPtr<TCPServerDispatcher> guard(this, true); // ensure object stays alive
AutoPtr<TCPServerDispatcher> guard(this); // ensure object stays alive
int idleTime = (int) _pParams->getThreadIdleTime().totalMilliseconds();
@ -149,11 +149,13 @@ void TCPServerDispatcher::enqueue(const StreamSocket& socket)
{
try
{
this->duplicate();
_threadPool.startWithPriority(_pParams->getThreadPriority(), *this, threadName);
++_currentThreads;
}
catch (Poco::Exception& exc)
{
this->release();
++_refusedConnections;
std::cerr << "Got exception while starting thread for connection. Error code: "
<< exc.code() << ", message: '" << exc.displayText() << "'" << std::endl;

View File

@ -1,17 +0,0 @@
# see ./CMakeLists.txt for variable declaration
if (FUZZER)
if (FUZZER STREQUAL "libfuzzer")
# NOTE: Eldar Zaitov decided to name it "libfuzzer" instead of "fuzzer" to keep in mind another possible fuzzer backends.
# NOTE: no-link means that all the targets are built with instrumentation for fuzzer, but only some of them
# (tests) have entry point for fuzzer and it's not checked.
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${SAN_FLAGS} -fsanitize=fuzzer-no-link -DFUZZER=1")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${SAN_FLAGS} -fsanitize=fuzzer-no-link -DFUZZER=1")
# NOTE: oss-fuzz can change LIB_FUZZING_ENGINE variable
if (NOT LIB_FUZZING_ENGINE)
set (LIB_FUZZING_ENGINE "-fsanitize=fuzzer")
endif ()
else ()
message (FATAL_ERROR "Unknown fuzzer type: ${FUZZER}")
endif ()
endif()

2
contrib/curl vendored

@ -1 +1 @@
Subproject commit 5ce164e0e9290c96eb7d502173426c0a135ec008
Subproject commit 1a05e833f8f7140628b27882b10525fd9ec4b873

2
contrib/libhdfs3 vendored

@ -1 +1 @@
Subproject commit b9598e6016720a7c088bfe85ce1fa0410f9d2103
Subproject commit 0d04201c45359f0d0701fb1e8297d25eff7cfecf

View File

@ -17,6 +17,8 @@
#ifndef METROHASH_METROHASH_128_H
#define METROHASH_METROHASH_128_H
// NOLINTBEGIN(readability-avoid-const-params-in-decls)
#include <stdint.h>
class MetroHash128
@ -68,5 +70,6 @@ private:
void metrohash128_1(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
void metrohash128_2(const uint8_t * key, uint64_t len, uint32_t seed, uint8_t * out);
// NOLINTEND(readability-avoid-const-params-in-decls)
#endif // #ifndef METROHASH_METROHASH_128_H

View File

@ -1,8 +1,12 @@
{
"docker/packager/binary": {
"docker/packager/binary-builder": {
"name": "clickhouse/binary-builder",
"dependent": []
},
"docker/packager/cctools": {
"name": "clickhouse/cctools",
"dependent": []
},
"docker/test/compatibility/centos": {
"name": "clickhouse/test-old-centos",
"dependent": []
@ -30,7 +34,6 @@
"docker/test/util": {
"name": "clickhouse/test-util",
"dependent": [
"docker/packager/binary",
"docker/test/base",
"docker/test/fasttest"
]
@ -67,7 +70,9 @@
},
"docker/test/fasttest": {
"name": "clickhouse/fasttest",
"dependent": []
"dependent": [
"docker/packager/binary-builder"
]
},
"docker/test/style": {
"name": "clickhouse/style-test",

View File

@ -34,7 +34,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="24.2.1.2248"
ARG VERSION="24.2.2.71"
ARG PACKAGES="clickhouse-keeper"
ARG DIRECT_DOWNLOAD_URLS=""

View File

@ -28,7 +28,6 @@ lrwxrwxrwx 1 root root 10 clickhouse-benchmark -> clickhouse
lrwxrwxrwx 1 root root 10 clickhouse-clang -> clickhouse
lrwxrwxrwx 1 root root 10 clickhouse-client -> clickhouse
lrwxrwxrwx 1 root root 10 clickhouse-compressor -> clickhouse
lrwxrwxrwx 1 root root 10 clickhouse-copier -> clickhouse
lrwxrwxrwx 1 root root 10 clickhouse-extract-from-config -> clickhouse
lrwxrwxrwx 1 root root 10 clickhouse-format -> clickhouse
lrwxrwxrwx 1 root root 10 clickhouse-lld -> clickhouse

View File

@ -1,45 +1,11 @@
# docker build -t clickhouse/binary-builder .
ARG FROM_TAG=latest
FROM clickhouse/test-util:latest AS cctools
# The cctools are built always from the clickhouse/test-util:latest and cached inline
# Theoretically, it should improve rebuild speed significantly
FROM clickhouse/fasttest:$FROM_TAG
ENV CC=clang-${LLVM_VERSION}
ENV CXX=clang++-${LLVM_VERSION}
# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
# DO NOT PUT ANYTHING BEFORE THE NEXT TWO `RUN` DIRECTIVES
# THE MOST HEAVY OPERATION MUST BE THE FIRST IN THE CACHE
# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
# libtapi is required to support .tbh format from recent MacOS SDKs
RUN git clone https://github.com/tpoechtrager/apple-libtapi.git \
&& cd apple-libtapi \
&& git checkout 15dfc2a8c9a2a89d06ff227560a69f5265b692f9 \
&& INSTALLPREFIX=/cctools ./build.sh \
&& ./install.sh \
&& cd .. \
&& rm -rf apple-libtapi
# Build and install tools for cross-linking to Darwin (x86-64)
# Build and install tools for cross-linking to Darwin (aarch64)
RUN git clone https://github.com/tpoechtrager/cctools-port.git \
&& cd cctools-port/cctools \
&& git checkout 2a3e1c2a6ff54a30f898b70cfb9ba1692a55fad7 \
&& ./configure --prefix=/cctools --with-libtapi=/cctools \
--target=x86_64-apple-darwin \
&& make install -j$(nproc) \
&& make clean \
&& ./configure --prefix=/cctools --with-libtapi=/cctools \
--target=aarch64-apple-darwin \
&& make install -j$(nproc) \
&& cd ../.. \
&& rm -rf cctools-port
# !!!!!!!!!!!
# END COMPILE
# !!!!!!!!!!!
FROM clickhouse/test-util:$FROM_TAG
ENV CC=clang-${LLVM_VERSION}
ENV CXX=clang++-${LLVM_VERSION}
# If the cctools is updated, then first build it in the CI, then update here in a different commit
COPY --from=clickhouse/cctools:d9e3596e706b /cctools /cctools
# Rust toolchain and libraries
ENV RUSTUP_HOME=/rust/rustup
@ -110,8 +76,6 @@ RUN curl -Lo /usr/bin/clang-tidy-cache \
"https://raw.githubusercontent.com/matus-chochlik/ctcache/$CLANG_TIDY_SHA1/clang-tidy-cache" \
&& chmod +x /usr/bin/clang-tidy-cache
COPY --from=cctools /cctools /cctools
RUN mkdir /workdir && chmod 777 /workdir
WORKDIR /workdir

View File

@ -0,0 +1,34 @@
# This is a hack to significantly reduce the build time of the clickhouse/binary-builder
# It's based on the assumption that we don't care of the cctools version so much
# It event does not depend on the clickhouse/fasttest in the `docker/images.json`
ARG FROM_TAG=latest
FROM clickhouse/fasttest:$FROM_TAG as builder
ENV CC=clang-${LLVM_VERSION}
ENV CXX=clang++-${LLVM_VERSION}
RUN git clone https://github.com/tpoechtrager/apple-libtapi.git \
&& cd apple-libtapi \
&& git checkout 15dfc2a8c9a2a89d06ff227560a69f5265b692f9 \
&& INSTALLPREFIX=/cctools ./build.sh \
&& ./install.sh \
&& cd .. \
&& rm -rf apple-libtapi
# Build and install tools for cross-linking to Darwin (x86-64)
# Build and install tools for cross-linking to Darwin (aarch64)
RUN git clone https://github.com/tpoechtrager/cctools-port.git \
&& cd cctools-port/cctools \
&& git checkout 2a3e1c2a6ff54a30f898b70cfb9ba1692a55fad7 \
&& ./configure --prefix=/cctools --with-libtapi=/cctools \
--target=x86_64-apple-darwin \
&& make install -j$(nproc) \
&& make clean \
&& ./configure --prefix=/cctools --with-libtapi=/cctools \
--target=aarch64-apple-darwin \
&& make install -j$(nproc) \
&& cd ../.. \
&& rm -rf cctools-port
FROM scratch
COPY --from=builder /cctools /cctools

View File

@ -1,16 +1,16 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import subprocess
import os
import argparse
import logging
import os
import subprocess
import sys
from pathlib import Path
from typing import List, Optional
SCRIPT_PATH = Path(__file__).absolute()
IMAGE_TYPE = "binary"
IMAGE_NAME = f"clickhouse/{IMAGE_TYPE}-builder"
IMAGE_TYPE = "binary-builder"
IMAGE_NAME = f"clickhouse/{IMAGE_TYPE}"
class BuildException(Exception):

View File

@ -32,7 +32,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="24.2.1.2248"
ARG VERSION="24.2.2.71"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
ARG DIRECT_DOWNLOAD_URLS=""

View File

@ -27,7 +27,7 @@ RUN sed -i "s|http://archive.ubuntu.com|${apt_archive}|g" /etc/apt/sources.list
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb ${REPO_CHANNEL} main"
ARG VERSION="24.2.1.2248"
ARG VERSION="24.2.2.71"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
# set non-empty deb_location_url url to create a docker image

View File

@ -33,6 +33,9 @@ ENV TSAN_OPTIONS='halt_on_error=1 abort_on_error=1 history_size=7 memory_limit_m
ENV UBSAN_OPTIONS='print_stacktrace=1'
ENV MSAN_OPTIONS='abort_on_error=1 poison_in_dtor=1'
# for external_symbolizer_path
RUN ln -s /usr/bin/llvm-symbolizer-${LLVM_VERSION} /usr/bin/llvm-symbolizer
RUN echo "en_US.UTF-8 UTF-8" > /etc/locale.gen && locale-gen en_US.UTF-8
ENV LC_ALL en_US.UTF-8

View File

@ -6,9 +6,17 @@ FROM clickhouse/test-util:$FROM_TAG
RUN apt-get update \
&& apt-get install \
brotli \
clang-${LLVM_VERSION} \
clang-tidy-${LLVM_VERSION} \
cmake \
expect \
file \
libclang-${LLVM_VERSION}-dev \
libclang-rt-${LLVM_VERSION}-dev \
lld-${LLVM_VERSION} \
llvm-${LLVM_VERSION}-dev \
lsof \
ninja-build \
odbcinst \
psmisc \
python3 \
@ -26,14 +34,48 @@ RUN apt-get update \
RUN pip3 install numpy==1.26.3 scipy==1.12.0 pandas==1.5.3 Jinja2==3.1.3
ARG odbc_driver_url="https://github.com/ClickHouse/clickhouse-odbc/releases/download/v1.1.4.20200302/clickhouse-odbc-1.1.4-Linux.tar.gz"
# This symlink is required by gcc to find the lld linker
RUN ln -s /usr/bin/lld-${LLVM_VERSION} /usr/bin/ld.lld
# FIXME: workaround for "The imported target "merge-fdata" references the file" error
# https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/-/commit/992e52c0b156a5ba9c6a8a54f8c4857ddd3d371d
RUN sed -i '/_IMPORT_CHECK_FILES_FOR_\(mlir-\|llvm-bolt\|merge-fdata\|MLIR\)/ {s|^|#|}' /usr/lib/llvm-${LLVM_VERSION}/lib/cmake/llvm/LLVMExports-*.cmake
RUN mkdir -p /tmp/clickhouse-odbc-tmp \
&& wget -nv -O - ${odbc_driver_url} | tar --strip-components=1 -xz -C /tmp/clickhouse-odbc-tmp \
&& cp /tmp/clickhouse-odbc-tmp/lib64/*.so /usr/local/lib/ \
&& odbcinst -i -d -f /tmp/clickhouse-odbc-tmp/share/doc/clickhouse-odbc/config/odbcinst.ini.sample \
&& odbcinst -i -s -l -f /tmp/clickhouse-odbc-tmp/share/doc/clickhouse-odbc/config/odbc.ini.sample \
&& rm -rf /tmp/clickhouse-odbc-tmp
ARG CCACHE_VERSION=4.6.1
RUN mkdir /tmp/ccache \
&& cd /tmp/ccache \
&& curl -L \
-O https://github.com/ccache/ccache/releases/download/v$CCACHE_VERSION/ccache-$CCACHE_VERSION.tar.xz \
-O https://github.com/ccache/ccache/releases/download/v$CCACHE_VERSION/ccache-$CCACHE_VERSION.tar.xz.asc \
&& gpg --recv-keys --keyserver hkps://keyserver.ubuntu.com 5A939A71A46792CF57866A51996DDA075594ADB8 \
&& gpg --verify ccache-4.6.1.tar.xz.asc \
&& tar xf ccache-$CCACHE_VERSION.tar.xz \
&& cd /tmp/ccache/ccache-$CCACHE_VERSION \
&& cmake -DCMAKE_INSTALL_PREFIX=/usr \
-DCMAKE_BUILD_TYPE=None \
-DZSTD_FROM_INTERNET=ON \
-DREDIS_STORAGE_BACKEND=OFF \
-Wno-dev \
-B build \
-S . \
&& make VERBOSE=1 -C build \
&& make install -C build \
&& cd / \
&& rm -rf /tmp/ccache
ARG TARGETARCH
ARG SCCACHE_VERSION=v0.7.7
ENV SCCACHE_IGNORE_SERVER_IO_ERROR=1
# sccache requires a value for the region. So by default we use The Default Region
ENV SCCACHE_REGION=us-east-1
RUN arch=${TARGETARCH:-amd64} \
&& case $arch in \
amd64) rarch=x86_64 ;; \
arm64) rarch=aarch64 ;; \
esac \
&& curl -Ls "https://github.com/mozilla/sccache/releases/download/$SCCACHE_VERSION/sccache-$SCCACHE_VERSION-$rarch-unknown-linux-musl.tar.gz" | \
tar xz -C /tmp \
&& mv "/tmp/sccache-$SCCACHE_VERSION-$rarch-unknown-linux-musl/sccache" /usr/bin \
&& rm "/tmp/sccache-$SCCACHE_VERSION-$rarch-unknown-linux-musl" -r
# Give suid to gdb to grant it attach permissions
# chmod 777 to make the container user independent

View File

@ -343,10 +343,9 @@ quit
# which is confusing.
task_exit_code=$fuzzer_exit_code
echo "failure" > status.txt
{ rg -ao "Found error:.*" fuzzer.log \
|| rg -ao "Exception:.*" fuzzer.log \
|| echo "Fuzzer failed ($fuzzer_exit_code). See the logs." ; } \
| tail -1 > description.txt
echo "Let op!" > description.txt
echo "Fuzzer went wrong with error code: ($fuzzer_exit_code). Its process died somehow when the server stayed alive. The server log probably won't tell you much so try to find information in other files." >>description.txt
{ rg -ao "Found error:.*" fuzzer.log || rg -ao "Exception:.*" fuzzer.log; } | tail -1 >>description.txt
fi
if test -f core.*; then

View File

@ -24,17 +24,18 @@ RUN pip3 install \
deepdiff \
sqlglot
ARG odbc_repo="https://github.com/ClickHouse/clickhouse-odbc.git"
ARG odbc_driver_url="https://github.com/ClickHouse/clickhouse-odbc/releases/download/v1.1.6.20200320/clickhouse-odbc-1.1.6-Linux.tar.gz"
RUN mkdir -p /tmp/clickhouse-odbc-tmp \
&& cd /tmp/clickhouse-odbc-tmp \
&& curl -L ${odbc_driver_url} | tar --strip-components=1 -xz clickhouse-odbc-1.1.6-Linux \
&& mkdir /usr/local/lib64 -p \
&& cp /tmp/clickhouse-odbc-tmp/lib64/*.so /usr/local/lib64/ \
&& odbcinst -i -d -f /tmp/clickhouse-odbc-tmp/share/doc/clickhouse-odbc/config/odbcinst.ini.sample \
&& odbcinst -i -s -l -f /tmp/clickhouse-odbc-tmp/share/doc/clickhouse-odbc/config/odbc.ini.sample \
&& sed -i 's"=libclickhouseodbc"=/usr/local/lib64/libclickhouseodbc"' /etc/odbcinst.ini \
&& rm -rf /tmp/clickhouse-odbc-tmp
RUN git clone --recursive ${odbc_repo} \
&& mkdir -p /clickhouse-odbc/build \
&& cmake -S /clickhouse-odbc -B /clickhouse-odbc/build \
&& ls /clickhouse-odbc/build/driver \
&& make -j 10 -C /clickhouse-odbc/build \
&& ls /clickhouse-odbc/build/driver \
&& mkdir -p /usr/local/lib64/ && cp /clickhouse-odbc/build/driver/lib*.so /usr/local/lib64/ \
&& odbcinst -i -d -f /clickhouse-odbc/packaging/odbcinst.ini.sample \
&& odbcinst -i -s -l -f /clickhouse-odbc/packaging/odbc.ini.sample
ENV TZ=Europe/Amsterdam
ENV MAX_RUN_TIME=9000

1
docker/test/stateless/.gitignore vendored Normal file
View File

@ -0,0 +1 @@
/minio_data

View File

@ -3,7 +3,7 @@
ARG FROM_TAG=latest
FROM clickhouse/test-base:$FROM_TAG
ARG odbc_driver_url="https://github.com/ClickHouse/clickhouse-odbc/releases/download/v1.1.4.20200302/clickhouse-odbc-1.1.4-Linux.tar.gz"
ARG odbc_driver_url="https://github.com/ClickHouse/clickhouse-odbc/releases/download/v1.1.6.20200320/clickhouse-odbc-1.1.6-Linux.tar.gz"
# golang version 1.13 on Ubuntu 20 is enough for tests
RUN apt-get update -y \
@ -35,7 +35,6 @@ RUN apt-get update -y \
sudo \
tree \
unixodbc \
wget \
rustc \
cargo \
zstd \
@ -50,11 +49,14 @@ RUN apt-get update -y \
RUN pip3 install numpy==1.26.3 scipy==1.12.0 pandas==1.5.3 Jinja2==3.1.3 pyarrow==15.0.0
RUN mkdir -p /tmp/clickhouse-odbc-tmp \
&& wget -nv -O - ${odbc_driver_url} | tar --strip-components=1 -xz -C /tmp/clickhouse-odbc-tmp \
&& cp /tmp/clickhouse-odbc-tmp/lib64/*.so /usr/local/lib/ \
&& odbcinst -i -d -f /tmp/clickhouse-odbc-tmp/share/doc/clickhouse-odbc/config/odbcinst.ini.sample \
&& odbcinst -i -s -l -f /tmp/clickhouse-odbc-tmp/share/doc/clickhouse-odbc/config/odbc.ini.sample \
&& rm -rf /tmp/clickhouse-odbc-tmp
&& cd /tmp/clickhouse-odbc-tmp \
&& curl -L ${odbc_driver_url} | tar --strip-components=1 -xz clickhouse-odbc-1.1.6-Linux \
&& mkdir /usr/local/lib64 -p \
&& cp /tmp/clickhouse-odbc-tmp/lib64/*.so /usr/local/lib64/ \
&& odbcinst -i -d -f /tmp/clickhouse-odbc-tmp/share/doc/clickhouse-odbc/config/odbcinst.ini.sample \
&& odbcinst -i -s -l -f /tmp/clickhouse-odbc-tmp/share/doc/clickhouse-odbc/config/odbc.ini.sample \
&& sed -i 's"=libclickhouseodbc"=/usr/local/lib64/libclickhouseodbc"' /etc/odbcinst.ini \
&& rm -rf /tmp/clickhouse-odbc-tmp
ENV TZ=Europe/Amsterdam
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
@ -70,11 +72,11 @@ ARG TARGETARCH
# Download Minio-related binaries
RUN arch=${TARGETARCH:-amd64} \
&& wget "https://dl.min.io/server/minio/release/linux-${arch}/archive/minio.RELEASE.${MINIO_SERVER_VERSION}" -O ./minio \
&& wget "https://dl.min.io/client/mc/release/linux-${arch}/archive/mc.RELEASE.${MINIO_CLIENT_VERSION}" -O ./mc \
&& curl -L "https://dl.min.io/server/minio/release/linux-${arch}/archive/minio.RELEASE.${MINIO_SERVER_VERSION}" -o ./minio \
&& curl -L "https://dl.min.io/client/mc/release/linux-${arch}/archive/mc.RELEASE.${MINIO_CLIENT_VERSION}" -o ./mc \
&& chmod +x ./mc ./minio
RUN wget --no-verbose 'https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz' \
RUN curl -L --no-verbose -O 'https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz' \
&& tar -xvf hadoop-3.3.1.tar.gz \
&& rm -rf hadoop-3.3.1.tar.gz

View File

@ -60,5 +60,4 @@ RUN arch=${TARGETARCH:-amd64} \
COPY run.sh /
COPY process_style_check_result.py /
CMD ["/bin/bash", "/run.sh"]

View File

@ -26,6 +26,8 @@ RUN apt-get update \
&& export CODENAME="$(lsb_release --codename --short | tr 'A-Z' 'a-z')" \
&& echo "deb https://apt.llvm.org/${CODENAME}/ llvm-toolchain-${CODENAME}-${LLVM_VERSION} main" >> \
/etc/apt/sources.list \
&& apt-get update \
&& apt-get install --yes --no-install-recommends --verbose-versions llvm-${LLVM_VERSION} \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /var/cache/debconf /tmp/*
@ -41,20 +43,11 @@ RUN apt-get update \
bash \
bsdmainutils \
build-essential \
clang-${LLVM_VERSION} \
clang-tidy-${LLVM_VERSION} \
cmake \
gdb \
git \
gperf \
libclang-rt-${LLVM_VERSION}-dev \
lld-${LLVM_VERSION} \
llvm-${LLVM_VERSION} \
llvm-${LLVM_VERSION}-dev \
libclang-${LLVM_VERSION}-dev \
moreutils \
nasm \
ninja-build \
pigz \
rename \
software-properties-common \
@ -63,49 +56,4 @@ RUN apt-get update \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /var/cache/debconf /tmp/*
# This symlink is required by gcc to find the lld linker
RUN ln -s /usr/bin/lld-${LLVM_VERSION} /usr/bin/ld.lld
# for external_symbolizer_path
RUN ln -s /usr/bin/llvm-symbolizer-${LLVM_VERSION} /usr/bin/llvm-symbolizer
# FIXME: workaround for "The imported target "merge-fdata" references the file" error
# https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/-/commit/992e52c0b156a5ba9c6a8a54f8c4857ddd3d371d
RUN sed -i '/_IMPORT_CHECK_FILES_FOR_\(mlir-\|llvm-bolt\|merge-fdata\|MLIR\)/ {s|^|#|}' /usr/lib/llvm-${LLVM_VERSION}/lib/cmake/llvm/LLVMExports-*.cmake
ARG CCACHE_VERSION=4.6.1
RUN mkdir /tmp/ccache \
&& cd /tmp/ccache \
&& curl -L \
-O https://github.com/ccache/ccache/releases/download/v$CCACHE_VERSION/ccache-$CCACHE_VERSION.tar.xz \
-O https://github.com/ccache/ccache/releases/download/v$CCACHE_VERSION/ccache-$CCACHE_VERSION.tar.xz.asc \
&& gpg --recv-keys --keyserver hkps://keyserver.ubuntu.com 5A939A71A46792CF57866A51996DDA075594ADB8 \
&& gpg --verify ccache-4.6.1.tar.xz.asc \
&& tar xf ccache-$CCACHE_VERSION.tar.xz \
&& cd /tmp/ccache/ccache-$CCACHE_VERSION \
&& cmake -DCMAKE_INSTALL_PREFIX=/usr \
-DCMAKE_BUILD_TYPE=None \
-DZSTD_FROM_INTERNET=ON \
-DREDIS_STORAGE_BACKEND=OFF \
-Wno-dev \
-B build \
-S . \
&& make VERBOSE=1 -C build \
&& make install -C build \
&& cd / \
&& rm -rf /tmp/ccache
ARG TARGETARCH
ARG SCCACHE_VERSION=v0.5.4
ENV SCCACHE_IGNORE_SERVER_IO_ERROR=1
# sccache requires a value for the region. So by default we use The Default Region
ENV SCCACHE_REGION=us-east-1
RUN arch=${TARGETARCH:-amd64} \
&& case $arch in \
amd64) rarch=x86_64 ;; \
arm64) rarch=aarch64 ;; \
esac \
&& curl -Ls "https://github.com/mozilla/sccache/releases/download/$SCCACHE_VERSION/sccache-$SCCACHE_VERSION-$rarch-unknown-linux-musl.tar.gz" | \
tar xz -C /tmp \
&& mv "/tmp/sccache-$SCCACHE_VERSION-$rarch-unknown-linux-musl/sccache" /usr/bin \
&& rm "/tmp/sccache-$SCCACHE_VERSION-$rarch-unknown-linux-musl" -r
COPY process_functional_tests_result.py /

View File

@ -0,0 +1,64 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v23.12.5.81-stable (a0fbe3ae813) FIXME as compared to v23.12.4.15-stable (4233d111d20)
#### Improvement
* Backported in [#60290](https://github.com/ClickHouse/ClickHouse/issues/60290): Copy S3 file GCP fallback to buffer copy in case GCP returned `Internal Error` with `GATEWAY_TIMEOUT` HTTP error code. [#60164](https://github.com/ClickHouse/ClickHouse/pull/60164) ([Maksim Kita](https://github.com/kitaisreal)).
* Backported in [#60830](https://github.com/ClickHouse/ClickHouse/issues/60830): Update tzdata to 2024a. [#60768](https://github.com/ClickHouse/ClickHouse/pull/60768) ([Raúl Marín](https://github.com/Algunenano)).
#### Build/Testing/Packaging Improvement
* Backported in [#59883](https://github.com/ClickHouse/ClickHouse/issues/59883): If you want to run initdb scripts every time when ClickHouse container is starting you shoud initialize environment varible CLICKHOUSE_ALWAYS_RUN_INITDB_SCRIPTS. [#59808](https://github.com/ClickHouse/ClickHouse/pull/59808) ([Alexander Nikolaev](https://github.com/AlexNik)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix_kql_issue_found_by_wingfuzz [#59626](https://github.com/ClickHouse/ClickHouse/pull/59626) ([Yong Wang](https://github.com/kashwy)).
* Fix error "Read beyond last offset" for AsynchronousBoundedReadBuffer [#59630](https://github.com/ClickHouse/ClickHouse/pull/59630) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix query start time on non initial queries [#59662](https://github.com/ClickHouse/ClickHouse/pull/59662) ([Raúl Marín](https://github.com/Algunenano)).
* rabbitmq: fix having neither acked nor nacked messages [#59775](https://github.com/ClickHouse/ClickHouse/pull/59775) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix parsing of partition expressions surrounded by parens [#59901](https://github.com/ClickHouse/ClickHouse/pull/59901) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Fix optimize_uniq_to_count removing the column alias [#60026](https://github.com/ClickHouse/ClickHouse/pull/60026) ([Raúl Marín](https://github.com/Algunenano)).
* Fix cosineDistance crash with Nullable [#60150](https://github.com/ClickHouse/ClickHouse/pull/60150) ([Raúl Marín](https://github.com/Algunenano)).
* Hide sensitive info for s3queue [#60233](https://github.com/ClickHouse/ClickHouse/pull/60233) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix deadlock in parallel parsing when lots of rows are skipped due to errors [#60516](https://github.com/ClickHouse/ClickHouse/pull/60516) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix_max_query_size_for_kql_compound_operator: [#60534](https://github.com/ClickHouse/ClickHouse/pull/60534) ([Yong Wang](https://github.com/kashwy)).
* Reduce the number of read rows from `system.numbers` [#60546](https://github.com/ClickHouse/ClickHouse/pull/60546) ([JackyWoo](https://github.com/JackyWoo)).
* Fix buffer overflow in CompressionCodecMultiple [#60731](https://github.com/ClickHouse/ClickHouse/pull/60731) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove nonsense from SQL/JSON [#60738](https://github.com/ClickHouse/ClickHouse/pull/60738) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Prevent setting custom metadata headers on unsupported multipart upload operations [#60748](https://github.com/ClickHouse/ClickHouse/pull/60748) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)).
* Fix crash in arrayEnumerateRanked [#60764](https://github.com/ClickHouse/ClickHouse/pull/60764) ([Raúl Marín](https://github.com/Algunenano)).
* Fix crash when using input() in INSERT SELECT JOIN [#60765](https://github.com/ClickHouse/ClickHouse/pull/60765) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix crash with different allow_experimental_analyzer value in subqueries [#60770](https://github.com/ClickHouse/ClickHouse/pull/60770) ([Dmitry Novik](https://github.com/novikd)).
* Remove recursion when reading from S3 [#60849](https://github.com/ClickHouse/ClickHouse/pull/60849) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix Keeper reconfig for standalone binary [#61233](https://github.com/ClickHouse/ClickHouse/pull/61233) ([Antonio Andelic](https://github.com/antonio2368)).
#### CI Fix or Improvement (changelog entry is not required)
* Backported in [#60767](https://github.com/ClickHouse/ClickHouse/issues/60767): Decoupled changes from [#60408](https://github.com/ClickHouse/ClickHouse/issues/60408). [#60553](https://github.com/ClickHouse/ClickHouse/pull/60553) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Backported in [#60582](https://github.com/ClickHouse/ClickHouse/issues/60582): Arm and amd docker build jobs use similar job names and thus overwrite job reports - aarch64 and amd64 suffixes added to fix this. [#60554](https://github.com/ClickHouse/ClickHouse/pull/60554) ([Max K.](https://github.com/maxknv)).
* Backported in [#61041](https://github.com/ClickHouse/ClickHouse/issues/61041): Debug and fix markreleaseready. [#60611](https://github.com/ClickHouse/ClickHouse/pull/60611) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Backported in [#61030](https://github.com/ClickHouse/ClickHouse/issues/61030): ... [#61022](https://github.com/ClickHouse/ClickHouse/pull/61022) ([Max K.](https://github.com/maxknv)).
* Backported in [#61224](https://github.com/ClickHouse/ClickHouse/issues/61224): ... [#61183](https://github.com/ClickHouse/ClickHouse/pull/61183) ([Han Fei](https://github.com/hanfei1991)).
* Backported in [#61190](https://github.com/ClickHouse/ClickHouse/issues/61190): ... [#61185](https://github.com/ClickHouse/ClickHouse/pull/61185) ([Max K.](https://github.com/maxknv)).
#### NO CL ENTRY
* NO CL ENTRY: 'Revert "Backport [#59798](https://github.com/ClickHouse/ClickHouse/issues/59798) to 23.12: CI: do not reuse builds on release branches"'. [#59979](https://github.com/ClickHouse/ClickHouse/pull/59979) ([Max K.](https://github.com/maxknv)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* CI: move ci-specifics from job scripts to ci.py [#58516](https://github.com/ClickHouse/ClickHouse/pull/58516) ([Max K.](https://github.com/maxknv)).
* Make ZooKeeper actually sequentialy consistent [#59735](https://github.com/ClickHouse/ClickHouse/pull/59735) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix special build reports in release branches [#59797](https://github.com/ClickHouse/ClickHouse/pull/59797) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* CI: do not reuse builds on release branches [#59798](https://github.com/ClickHouse/ClickHouse/pull/59798) ([Max K.](https://github.com/maxknv)).
* Fix mark release ready [#59994](https://github.com/ClickHouse/ClickHouse/pull/59994) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Ability to detect undead ZooKeeper sessions [#60044](https://github.com/ClickHouse/ClickHouse/pull/60044) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Detect io_uring in tests [#60373](https://github.com/ClickHouse/ClickHouse/pull/60373) ([Azat Khuzhin](https://github.com/azat)).
* Cancel PipelineExecutor properly in case of exception in spawnThreads [#60499](https://github.com/ClickHouse/ClickHouse/pull/60499) ([Kruglov Pavel](https://github.com/Avogar)).
* Remove broken test while we fix it [#60547](https://github.com/ClickHouse/ClickHouse/pull/60547) ([Raúl Marín](https://github.com/Algunenano)).
* Speed up cctools building [#61011](https://github.com/ClickHouse/ClickHouse/pull/61011) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).

View File

@ -0,0 +1,24 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v23.3.21.26-lts (d9672a3731f) FIXME as compared to v23.3.20.27-lts (cc974ba4f81)
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix reading from sparse columns after restart [#49660](https://github.com/ClickHouse/ClickHouse/pull/49660) ([Anton Popov](https://github.com/CurtizJ)).
* Fix buffer overflow in CompressionCodecMultiple [#60731](https://github.com/ClickHouse/ClickHouse/pull/60731) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove nonsense from SQL/JSON [#60738](https://github.com/ClickHouse/ClickHouse/pull/60738) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix crash in arrayEnumerateRanked [#60764](https://github.com/ClickHouse/ClickHouse/pull/60764) ([Raúl Marín](https://github.com/Algunenano)).
* Fix crash when using input() in INSERT SELECT JOIN [#60765](https://github.com/ClickHouse/ClickHouse/pull/60765) ([Kruglov Pavel](https://github.com/Avogar)).
* Remove recursion when reading from S3 [#60849](https://github.com/ClickHouse/ClickHouse/pull/60849) ([Antonio Andelic](https://github.com/antonio2368)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Cancel PipelineExecutor properly in case of exception in spawnThreads [#57104](https://github.com/ClickHouse/ClickHouse/pull/57104) ([Kruglov Pavel](https://github.com/Avogar)).
* Detect io_uring in tests [#60373](https://github.com/ClickHouse/ClickHouse/pull/60373) ([Azat Khuzhin](https://github.com/azat)).
* Cancel PipelineExecutor properly in case of exception in spawnThreads [#60499](https://github.com/ClickHouse/ClickHouse/pull/60499) ([Kruglov Pavel](https://github.com/Avogar)).

View File

@ -0,0 +1,30 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v23.8.11.28-lts (31879d2ab4c) FIXME as compared to v23.8.10.43-lts (a278225bba9)
#### Improvement
* Backported in [#60828](https://github.com/ClickHouse/ClickHouse/issues/60828): Update tzdata to 2024a. [#60768](https://github.com/ClickHouse/ClickHouse/pull/60768) ([Raúl Marín](https://github.com/Algunenano)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix buffer overflow in CompressionCodecMultiple [#60731](https://github.com/ClickHouse/ClickHouse/pull/60731) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove nonsense from SQL/JSON [#60738](https://github.com/ClickHouse/ClickHouse/pull/60738) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix crash in arrayEnumerateRanked [#60764](https://github.com/ClickHouse/ClickHouse/pull/60764) ([Raúl Marín](https://github.com/Algunenano)).
* Fix crash when using input() in INSERT SELECT JOIN [#60765](https://github.com/ClickHouse/ClickHouse/pull/60765) ([Kruglov Pavel](https://github.com/Avogar)).
* Remove recursion when reading from S3 [#60849](https://github.com/ClickHouse/ClickHouse/pull/60849) ([Antonio Andelic](https://github.com/antonio2368)).
#### NO CL ENTRY
* NO CL ENTRY: 'Use the current branch test-utils to build cctools'. [#61276](https://github.com/ClickHouse/ClickHouse/pull/61276) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Cancel PipelineExecutor properly in case of exception in spawnThreads [#57104](https://github.com/ClickHouse/ClickHouse/pull/57104) ([Kruglov Pavel](https://github.com/Avogar)).
* Detect io_uring in tests [#60373](https://github.com/ClickHouse/ClickHouse/pull/60373) ([Azat Khuzhin](https://github.com/azat)).
* Cancel PipelineExecutor properly in case of exception in spawnThreads [#60499](https://github.com/ClickHouse/ClickHouse/pull/60499) ([Kruglov Pavel](https://github.com/Avogar)).

View File

@ -0,0 +1,26 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v24.1.7.18-stable (90925babd78) FIXME as compared to v24.1.6.52-stable (fa09f677bc9)
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix deadlock in parallel parsing when lots of rows are skipped due to errors [#60516](https://github.com/ClickHouse/ClickHouse/pull/60516) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix_max_query_size_for_kql_compound_operator: [#60534](https://github.com/ClickHouse/ClickHouse/pull/60534) ([Yong Wang](https://github.com/kashwy)).
* Fix crash with different allow_experimental_analyzer value in subqueries [#60770](https://github.com/ClickHouse/ClickHouse/pull/60770) ([Dmitry Novik](https://github.com/novikd)).
* Fix Keeper reconfig for standalone binary [#61233](https://github.com/ClickHouse/ClickHouse/pull/61233) ([Antonio Andelic](https://github.com/antonio2368)).
#### CI Fix or Improvement (changelog entry is not required)
* Backported in [#61043](https://github.com/ClickHouse/ClickHouse/issues/61043): Debug and fix markreleaseready. [#60611](https://github.com/ClickHouse/ClickHouse/pull/60611) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Backported in [#61168](https://github.com/ClickHouse/ClickHouse/issues/61168): Just a preparation for the merge queue support. [#61099](https://github.com/ClickHouse/ClickHouse/pull/61099) ([Max K.](https://github.com/maxknv)).
* Backported in [#61192](https://github.com/ClickHouse/ClickHouse/issues/61192): ... [#61185](https://github.com/ClickHouse/ClickHouse/pull/61185) ([Max K.](https://github.com/maxknv)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Cancel PipelineExecutor properly in case of exception in spawnThreads [#60499](https://github.com/ClickHouse/ClickHouse/pull/60499) ([Kruglov Pavel](https://github.com/Avogar)).

View File

@ -0,0 +1,55 @@
---
sidebar_position: 1
sidebar_label: 2024
---
# 2024 Changelog
### ClickHouse release v24.2.2.71-stable (9293d361e72) FIXME as compared to v24.2.1.2248-stable (891689a4150)
#### Improvement
* Backported in [#60834](https://github.com/ClickHouse/ClickHouse/issues/60834): Update tzdata to 2024a. [#60768](https://github.com/ClickHouse/ClickHouse/pull/60768) ([Raúl Marín](https://github.com/Algunenano)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* PartsSplitter invalid ranges for the same part [#60041](https://github.com/ClickHouse/ClickHouse/pull/60041) ([Maksim Kita](https://github.com/kitaisreal)).
* Try to avoid calculation of scalar subqueries for CREATE TABLE. [#60464](https://github.com/ClickHouse/ClickHouse/pull/60464) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix deadlock in parallel parsing when lots of rows are skipped due to errors [#60516](https://github.com/ClickHouse/ClickHouse/pull/60516) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix_max_query_size_for_kql_compound_operator: [#60534](https://github.com/ClickHouse/ClickHouse/pull/60534) ([Yong Wang](https://github.com/kashwy)).
* Reduce the number of read rows from `system.numbers` [#60546](https://github.com/ClickHouse/ClickHouse/pull/60546) ([JackyWoo](https://github.com/JackyWoo)).
* Don't output number tips for date types [#60577](https://github.com/ClickHouse/ClickHouse/pull/60577) ([Raúl Marín](https://github.com/Algunenano)).
* Fix buffer overflow in CompressionCodecMultiple [#60731](https://github.com/ClickHouse/ClickHouse/pull/60731) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove nonsense from SQL/JSON [#60738](https://github.com/ClickHouse/ClickHouse/pull/60738) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Prevent setting custom metadata headers on unsupported multipart upload operations [#60748](https://github.com/ClickHouse/ClickHouse/pull/60748) ([Francisco J. Jurado Moreno](https://github.com/Beetelbrox)).
* Fix crash in arrayEnumerateRanked [#60764](https://github.com/ClickHouse/ClickHouse/pull/60764) ([Raúl Marín](https://github.com/Algunenano)).
* Fix crash when using input() in INSERT SELECT JOIN [#60765](https://github.com/ClickHouse/ClickHouse/pull/60765) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix crash with different allow_experimental_analyzer value in subqueries [#60770](https://github.com/ClickHouse/ClickHouse/pull/60770) ([Dmitry Novik](https://github.com/novikd)).
* Remove recursion when reading from S3 [#60849](https://github.com/ClickHouse/ClickHouse/pull/60849) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix multiple bugs in groupArraySorted [#61203](https://github.com/ClickHouse/ClickHouse/pull/61203) ([Raúl Marín](https://github.com/Algunenano)).
* Fix Keeper reconfig for standalone binary [#61233](https://github.com/ClickHouse/ClickHouse/pull/61233) ([Antonio Andelic](https://github.com/antonio2368)).
#### CI Fix or Improvement (changelog entry is not required)
* Backported in [#60758](https://github.com/ClickHouse/ClickHouse/issues/60758): Decoupled changes from [#60408](https://github.com/ClickHouse/ClickHouse/issues/60408). [#60553](https://github.com/ClickHouse/ClickHouse/pull/60553) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Backported in [#60706](https://github.com/ClickHouse/ClickHouse/issues/60706): Eliminates the need to provide input args to docker server jobs to clean yml files. [#60602](https://github.com/ClickHouse/ClickHouse/pull/60602) ([Max K.](https://github.com/maxknv)).
* Backported in [#61045](https://github.com/ClickHouse/ClickHouse/issues/61045): Debug and fix markreleaseready. [#60611](https://github.com/ClickHouse/ClickHouse/pull/60611) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Backported in [#60721](https://github.com/ClickHouse/ClickHouse/issues/60721): Fix build_report job so that it's defined by ci_config only (not yml file). [#60613](https://github.com/ClickHouse/ClickHouse/pull/60613) ([Max K.](https://github.com/maxknv)).
* Backported in [#60668](https://github.com/ClickHouse/ClickHouse/issues/60668): Do not await ci pending jobs on release branches decrease wait timeout to fit into gh job timeout. [#60652](https://github.com/ClickHouse/ClickHouse/pull/60652) ([Max K.](https://github.com/maxknv)).
* Backported in [#60863](https://github.com/ClickHouse/ClickHouse/issues/60863): Set limited number of builds for "special build check" report in backports. [#60850](https://github.com/ClickHouse/ClickHouse/pull/60850) ([Max K.](https://github.com/maxknv)).
* Backported in [#60946](https://github.com/ClickHouse/ClickHouse/issues/60946): ... [#60935](https://github.com/ClickHouse/ClickHouse/pull/60935) ([Max K.](https://github.com/maxknv)).
* Backported in [#60972](https://github.com/ClickHouse/ClickHouse/issues/60972): ... [#60952](https://github.com/ClickHouse/ClickHouse/pull/60952) ([Max K.](https://github.com/maxknv)).
* Backported in [#60980](https://github.com/ClickHouse/ClickHouse/issues/60980): ... [#60958](https://github.com/ClickHouse/ClickHouse/pull/60958) ([Max K.](https://github.com/maxknv)).
* Backported in [#61170](https://github.com/ClickHouse/ClickHouse/issues/61170): Just a preparation for the merge queue support. [#61099](https://github.com/ClickHouse/ClickHouse/pull/61099) ([Max K.](https://github.com/maxknv)).
* Backported in [#61181](https://github.com/ClickHouse/ClickHouse/issues/61181): ... [#61172](https://github.com/ClickHouse/ClickHouse/pull/61172) ([Max K.](https://github.com/maxknv)).
* Backported in [#61228](https://github.com/ClickHouse/ClickHouse/issues/61228): ... [#61183](https://github.com/ClickHouse/ClickHouse/pull/61183) ([Han Fei](https://github.com/hanfei1991)).
* Backported in [#61194](https://github.com/ClickHouse/ClickHouse/issues/61194): ... [#61185](https://github.com/ClickHouse/ClickHouse/pull/61185) ([Max K.](https://github.com/maxknv)).
* Backported in [#61244](https://github.com/ClickHouse/ClickHouse/issues/61244): ... [#61214](https://github.com/ClickHouse/ClickHouse/pull/61214) ([Max K.](https://github.com/maxknv)).
* Backported in [#61388](https://github.com/ClickHouse/ClickHouse/issues/61388):. [#61373](https://github.com/ClickHouse/ClickHouse/pull/61373) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* CI: make workflow yml abstract [#60421](https://github.com/ClickHouse/ClickHouse/pull/60421) ([Max K.](https://github.com/maxknv)).
* Cancel PipelineExecutor properly in case of exception in spawnThreads [#60499](https://github.com/ClickHouse/ClickHouse/pull/60499) ([Kruglov Pavel](https://github.com/Avogar)).
* General sanity in function `seriesOutliersDetectTukey` [#60535](https://github.com/ClickHouse/ClickHouse/pull/60535) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Speed up cctools building [#61011](https://github.com/ClickHouse/ClickHouse/pull/61011) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).

View File

@ -18,8 +18,8 @@ This engine allows integrating ClickHouse with [RabbitMQ](https://www.rabbitmq.c
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
name1 [type1],
name2 [type2],
...
) ENGINE = RabbitMQ SETTINGS
rabbitmq_host_port = 'host:port' [or rabbitmq_address = 'amqp(s)://guest:guest@localhost/vhost'],
@ -198,6 +198,10 @@ Additional virtual columns when `kafka_handle_error_mode='stream'`:
Note: `_raw_message` and `_error` virtual columns are filled only in case of exception during parsing, they are always `NULL` when message was parsed successfully.
## Caveats {#caveats}
Even though you may specify [default column expressions](/docs/en/sql-reference/statements/create/table.md/#default_values) (such as `DEFAULT`, `MATERIALIZED`, `ALIAS`) in the table definition, these will be ignored. Instead, the columns will be filled with their respective default values for their types.
## Data formats support {#data-formats-support}
RabbitMQ engine supports all [formats](../../../interfaces/formats.md) supported in ClickHouse.

View File

@ -946,96 +946,6 @@ You could change storage policy after table creation with [ALTER TABLE ... MODIF
The number of threads performing background moves of data parts can be changed by [background_move_pool_size](/docs/en/operations/server-configuration-parameters/settings.md/#background_move_pool_size) setting.
### Dynamic Storage
This example query shows how to attach a table stored at a URL and configure the
remote storage within the query. The web storage is not configured in the ClickHouse
configuration files; all the settings are in the CREATE/ATTACH query.
:::note
The example uses `type=web`, but any disk type can be configured as dynamic, even Local disk. Local disks require a path argument to be inside the server config parameter `custom_local_disks_base_directory`, which has no default, so set that also when using local disk.
:::
#### Example dynamic web storage
:::tip
A [demo dataset](https://github.com/ClickHouse/web-tables-demo) is hosted in GitHub. To prepare your own tables for web storage see the tool [clickhouse-static-files-uploader](/docs/en/operations/storing-data.md/#storing-data-on-webserver)
:::
In this `ATTACH TABLE` query the `UUID` provided matches the directory name of the data, and the endpoint is the URL for the raw GitHub content.
```sql
# highlight-next-line
ATTACH TABLE uk_price_paid UUID 'cf712b4f-2ca8-435c-ac23-c4393efe52f7'
(
price UInt32,
date Date,
postcode1 LowCardinality(String),
postcode2 LowCardinality(String),
type Enum8('other' = 0, 'terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4),
is_new UInt8,
duration Enum8('unknown' = 0, 'freehold' = 1, 'leasehold' = 2),
addr1 String,
addr2 String,
street LowCardinality(String),
locality LowCardinality(String),
town LowCardinality(String),
district LowCardinality(String),
county LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (postcode1, postcode2, addr1, addr2)
# highlight-start
SETTINGS disk = disk(
type=web,
endpoint='https://raw.githubusercontent.com/ClickHouse/web-tables-demo/main/web/'
);
# highlight-end
```
### Nested Dynamic Storage
This example query builds on the above dynamic disk configuration and shows how to
use a local disk to cache data from a table stored at a URL. Neither the cache disk
nor the web storage is configured in the ClickHouse configuration files; both are
configured in the CREATE/ATTACH query settings.
In the settings highlighted below notice that the disk of `type=web` is nested within
the disk of `type=cache`.
```sql
ATTACH TABLE uk_price_paid UUID 'cf712b4f-2ca8-435c-ac23-c4393efe52f7'
(
price UInt32,
date Date,
postcode1 LowCardinality(String),
postcode2 LowCardinality(String),
type Enum8('other' = 0, 'terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4),
is_new UInt8,
duration Enum8('unknown' = 0, 'freehold' = 1, 'leasehold' = 2),
addr1 String,
addr2 String,
street LowCardinality(String),
locality LowCardinality(String),
town LowCardinality(String),
district LowCardinality(String),
county LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (postcode1, postcode2, addr1, addr2)
# highlight-start
SETTINGS disk = disk(
type=cache,
max_size='1Gi',
path='/var/lib/clickhouse/custom_disk_cache/',
disk=disk(
type=web,
endpoint='https://raw.githubusercontent.com/ClickHouse/web-tables-demo/main/web/'
)
);
# highlight-end
```
### Details {#details}
In the case of `MergeTree` tables, data is getting to disk in different ways:
@ -1064,13 +974,11 @@ During this time, they are not moved to other volumes or disks. Therefore, until
User can assign new big parts to different disks of a [JBOD](https://en.wikipedia.org/wiki/Non-RAID_drive_architectures) volume in a balanced way using the [min_bytes_to_rebalance_partition_over_jbod](/docs/en/operations/settings/merge-tree-settings.md/#min-bytes-to-rebalance-partition-over-jbod) setting.
## Using S3 for Data Storage {#table_engine-mergetree-s3}
## Using External Storage for Data Storage {#table_engine-mergetree-s3}
:::note
Google Cloud Storage (GCS) is also supported using the type `s3`. See [GCS backed MergeTree](/docs/en/integrations/gcs).
:::
[MergeTree](/docs/en/engines/table-engines/mergetree-family/mergetree.md) family table engines can store data to `S3`, `AzureBlobStorage`, `HDFS` using a disk with types `s3`, `azure_blob_storage`, `hdfs` accordingly. See [configuring external storage options](/docs/en/operations/storing-data.md/#configuring-external-storage) for more details.
`MergeTree` family table engines can store data to [S3](https://aws.amazon.com/s3/) using a disk with type `s3`.
Example for [S3](https://aws.amazon.com/s3/) as external storage using a disk with type `s3`.
Configuration markup:
``` xml
@ -1112,253 +1020,12 @@ Configuration markup:
</storage_configuration>
```
Also see [configuring external storage options](/docs/en/operations/storing-data.md/#configuring-external-storage).
:::note cache configuration
ClickHouse versions 22.3 through 22.7 use a different cache configuration, see [using local cache](/docs/en/operations/storing-data.md/#using-local-cache) if you are using one of those versions.
:::
### Configuring the S3 disk
Required parameters:
- `endpoint` — S3 endpoint URL in `path` or `virtual hosted` [styles](https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html). Endpoint URL should contain a bucket and root path to store data.
- `access_key_id` — S3 access key id.
- `secret_access_key` — S3 secret access key.
Optional parameters:
- `region` — S3 region name.
- `support_batch_delete` — This controls the check to see if batch deletes are supported. Set this to `false` when using Google Cloud Storage (GCS) as GCS does not support batch deletes and preventing the checks will prevent error messages in the logs.
- `use_environment_credentials` — Reads AWS credentials from the Environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN if they exist. Default value is `false`.
- `use_insecure_imds_request` — If set to `true`, S3 client will use insecure IMDS request while obtaining credentials from Amazon EC2 metadata. Default value is `false`.
- `expiration_window_seconds` — Grace period for checking if expiration-based credentials have expired. Optional, default value is `120`.
- `proxy` — Proxy configuration for S3 endpoint. Each `uri` element inside `proxy` block should contain a proxy URL.
- `connect_timeout_ms` — Socket connect timeout in milliseconds. Default value is `10 seconds`.
- `request_timeout_ms` — Request timeout in milliseconds. Default value is `5 seconds`.
- `retry_attempts` — Number of retry attempts in case of failed request. Default value is `10`.
- `single_read_retries` — Number of retry attempts in case of connection drop during read. Default value is `4`.
- `min_bytes_for_seek` — Minimal number of bytes to use seek operation instead of sequential read. Default value is `1 Mb`.
- `metadata_path` — Path on local FS to store metadata files for S3. Default value is `/var/lib/clickhouse/disks/<disk_name>/`.
- `skip_access_check` — If true, disk access checks will not be performed on disk start-up. Default value is `false`.
- `header` — Adds specified HTTP header to a request to given endpoint. Optional, can be specified multiple times.
- `server_side_encryption_customer_key_base64` — If specified, required headers for accessing S3 objects with SSE-C encryption will be set.
- `server_side_encryption_kms_key_id` - If specified, required headers for accessing S3 objects with [SSE-KMS encryption](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html) will be set. If an empty string is specified, the AWS managed S3 key will be used. Optional.
- `server_side_encryption_kms_encryption_context` - If specified alongside `server_side_encryption_kms_key_id`, the given encryption context header for SSE-KMS will be set. Optional.
- `server_side_encryption_kms_bucket_key_enabled` - If specified alongside `server_side_encryption_kms_key_id`, the header to enable S3 bucket keys for SSE-KMS will be set. Optional, can be `true` or `false`, defaults to nothing (matches the bucket-level setting).
- `s3_max_put_rps` — Maximum PUT requests per second rate before throttling. Default value is `0` (unlimited).
- `s3_max_put_burst` — Max number of requests that can be issued simultaneously before hitting request per second limit. By default (`0` value) equals to `s3_max_put_rps`.
- `s3_max_get_rps` — Maximum GET requests per second rate before throttling. Default value is `0` (unlimited).
- `s3_max_get_burst` — Max number of requests that can be issued simultaneously before hitting request per second limit. By default (`0` value) equals to `s3_max_get_rps`.
- `read_resource` — Resource name to be used for [scheduling](/docs/en/operations/workload-scheduling.md) of read requests to this disk. Default value is empty string (IO scheduling is not enabled for this disk).
- `write_resource` — Resource name to be used for [scheduling](/docs/en/operations/workload-scheduling.md) of write requests to this disk. Default value is empty string (IO scheduling is not enabled for this disk).
- `key_template` — Define the format with which the object keys are generated. By default, Clickhouse takes `root path` from `endpoint` option and adds random generated suffix. That suffix is a dir with 3 random symbols and a file name with 29 random symbols. With that option you have a full control how to the object keys are generated. Some usage scenarios require having random symbols in the prefix or in the middle of object key. For example: `[a-z]{3}-prefix-random/constant-part/random-middle-[a-z]{3}/random-suffix-[a-z]{29}`. The value is parsed with [`re2`](https://github.com/google/re2/wiki/Syntax). Only some subset of the syntax is supported. Check if your preferred format is supported before using that option. Disk isn't initialized if clickhouse is unable to generate a key by the value of `key_template`. It requires enabled feature flag [storage_metadata_write_full_object_key](/docs/en/operations/settings/settings#storage_metadata_write_full_object_key). It forbids declaring the `root path` in `endpoint` option. It requires definition of the option `key_compatibility_prefix`.
- `key_compatibility_prefix` — That option is required when option `key_template` is in use. In order to be able to read the objects keys which were stored in the metadata files with the metadata version lower that `VERSION_FULL_OBJECT_KEY`, the previous `root path` from the `endpoint` option should be set here.
### Configuring the cache
This is the cache configuration from above:
```xml
<s3_cache>
<type>cache</type>
<disk>s3</disk>
<path>/var/lib/clickhouse/disks/s3_cache/</path>
<max_size>10Gi</max_size>
</s3_cache>
```
These parameters define the cache layer:
- `type` — If a disk is of type `cache` it caches mark and index files in memory.
- `disk` — The name of the disk that will be cached.
Cache parameters:
- `path` — The path where metadata for the cache is stored.
- `max_size` — The size (amount of disk space) that the cache can grow to.
:::tip
There are several other cache parameters that you can use to tune your storage, see [using local cache](/docs/en/operations/storing-data.md/#using-local-cache) for the details.
:::
S3 disk can be configured as `main` or `cold` storage:
``` xml
<storage_configuration>
...
<disks>
<s3>
<type>s3</type>
<endpoint>https://clickhouse-public-datasets.s3.amazonaws.com/my-bucket/root-path/</endpoint>
<access_key_id>your_access_key_id</access_key_id>
<secret_access_key>your_secret_access_key</secret_access_key>
</s3>
</disks>
<policies>
<s3_main>
<volumes>
<main>
<disk>s3</disk>
</main>
</volumes>
</s3_main>
<s3_cold>
<volumes>
<main>
<disk>default</disk>
</main>
<external>
<disk>s3</disk>
</external>
</volumes>
<move_factor>0.2</move_factor>
</s3_cold>
</policies>
...
</storage_configuration>
```
In case of `cold` option a data can be moved to S3 if local disk free size will be smaller than `move_factor * disk_size` or by TTL move rule.
## Using Azure Blob Storage for Data Storage {#table_engine-mergetree-azure-blob-storage}
`MergeTree` family table engines can store data to [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) using a disk with type `azure_blob_storage`.
As of February 2022, this feature is still a fresh addition, so expect that some Azure Blob Storage functionalities might be unimplemented.
Configuration markup:
``` xml
<storage_configuration>
...
<disks>
<blob_storage_disk>
<type>azure_blob_storage</type>
<storage_account_url>http://account.blob.core.windows.net</storage_account_url>
<container_name>container</container_name>
<account_name>account</account_name>
<account_key>pass123</account_key>
<metadata_path>/var/lib/clickhouse/disks/blob_storage_disk/</metadata_path>
<cache_path>/var/lib/clickhouse/disks/blob_storage_disk/cache/</cache_path>
<skip_access_check>false</skip_access_check>
</blob_storage_disk>
</disks>
...
</storage_configuration>
```
Connection parameters:
* `endpoint` — AzureBlobStorage endpoint URL with container & prefix. Optionally can contain account_name if the authentication method used needs it. (`http://account.blob.core.windows.net:{port}/[account_name]{container_name}/{data_prefix}`) or these parameters can be provided separately using storage_account_url, account_name & container. For specifying prefix, endpoint should be used.
* `endpoint_contains_account_name` - This flag is used to specify if endpoint contains account_name as it is only needed for certain authentication methods. (Default : true)
* `storage_account_url` - Required if endpoint is not specified, Azure Blob Storage account URL, like `http://account.blob.core.windows.net` or `http://azurite1:10000/devstoreaccount1`.
* `container_name` - Target container name, defaults to `default-container`.
* `container_already_exists` - If set to `false`, a new container `container_name` is created in the storage account, if set to `true`, disk connects to the container directly, and if left unset, disk connects to the account, checks if the container `container_name` exists, and creates it if it doesn't exist yet.
Authentication parameters (the disk will try all available methods **and** Managed Identity Credential):
* `connection_string` - For authentication using a connection string.
* `account_name` and `account_key` - For authentication using Shared Key.
Limit parameters (mainly for internal usage):
* `s3_max_single_part_upload_size` - Limits the size of a single block upload to Blob Storage.
* `min_bytes_for_seek` - Limits the size of a seekable region.
* `max_single_read_retries` - Limits the number of attempts to read a chunk of data from Blob Storage.
* `max_single_download_retries` - Limits the number of attempts to download a readable buffer from Blob Storage.
* `thread_pool_size` - Limits the number of threads with which `IDiskRemote` is instantiated.
* `s3_max_inflight_parts_for_one_file` - Limits the number of put requests that can be run concurrently for one object.
Other parameters:
* `metadata_path` - Path on local FS to store metadata files for Blob Storage. Default value is `/var/lib/clickhouse/disks/<disk_name>/`.
* `skip_access_check` - If true, disk access checks will not be performed on disk start-up. Default value is `false`.
* `read_resource` — Resource name to be used for [scheduling](/docs/en/operations/workload-scheduling.md) of read requests to this disk. Default value is empty string (IO scheduling is not enabled for this disk).
* `write_resource` — Resource name to be used for [scheduling](/docs/en/operations/workload-scheduling.md) of write requests to this disk. Default value is empty string (IO scheduling is not enabled for this disk).
Examples of working configurations can be found in integration tests directory (see e.g. [test_merge_tree_azure_blob_storage](https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_merge_tree_azure_blob_storage/configs/config.d/storage_conf.xml) or [test_azure_blob_storage_zero_copy_replication](https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_azure_blob_storage_zero_copy_replication/configs/config.d/storage_conf.xml)).
:::note Zero-copy replication is not ready for production
Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher. This feature is not recommended for production use.
:::
## HDFS storage {#hdfs-storage}
In this sample configuration:
- the disk is of type `hdfs`
- the data is hosted at `hdfs://hdfs1:9000/clickhouse/`
```xml
<clickhouse>
<storage_configuration>
<disks>
<hdfs>
<type>hdfs</type>
<endpoint>hdfs://hdfs1:9000/clickhouse/</endpoint>
<skip_access_check>true</skip_access_check>
</hdfs>
<hdd>
<type>local</type>
<path>/</path>
</hdd>
</disks>
<policies>
<hdfs>
<volumes>
<main>
<disk>hdfs</disk>
</main>
<external>
<disk>hdd</disk>
</external>
</volumes>
</hdfs>
</policies>
</storage_configuration>
</clickhouse>
```
## Web storage (read-only) {#web-storage}
Web storage can be used for read-only purposes. An example use is for hosting sample
data, or for migrating data.
:::tip
Storage can also be configured temporarily within a query, if a web dataset is not expected
to be used routinely, see [dynamic storage](#dynamic-storage) and skip editing the
configuration file.
:::
In this sample configuration:
- the disk is of type `web`
- the data is hosted at `http://nginx:80/test1/`
- a cache on local storage is used
```xml
<clickhouse>
<storage_configuration>
<disks>
<web>
<type>web</type>
<endpoint>http://nginx:80/test1/</endpoint>
</web>
<cached_web>
<type>cache</type>
<disk>web</disk>
<path>cached_web_cache/</path>
<max_size>100000000</max_size>
</cached_web>
</disks>
<policies>
<web>
<volumes>
<main>
<disk>web</disk>
</main>
</volumes>
</web>
<cached_web>
<volumes>
<main>
<disk>cached_web</disk>
</main>
</volumes>
</cached_web>
</policies>
</storage_configuration>
</clickhouse>
```
## Virtual Columns {#virtual-columns}
- `_part` — Name of a part.

View File

@ -55,7 +55,7 @@ CREATE TABLE criteo_log (
) ENGINE = Log;
```
Download the data:
Insert the data:
``` bash
$ for i in {00..23}; do echo $i; zcat datasets/criteo/day_${i#0}.gz | sed -r 's/^/2000-01-'${i/00/24}'\t/' | clickhouse-client --host=example-perftest01j --query="INSERT INTO criteo_log FORMAT TabSeparated"; done

View File

@ -10,10 +10,14 @@ The embeddings and the metadata are stored in separate files in the raw data. A
converts them to CSV and imports them into ClickHouse. You can use the following `download.sh` script for that:
```bash
wget --tries=100 https://deploy.laion.ai/8f83b608504d46bb81708ec86e912220/embeddings/img_emb/img_emb_${1}.npy # download image embedding
wget --tries=100 https://deploy.laion.ai/8f83b608504d46bb81708ec86e912220/embeddings/text_emb/text_emb_${1}.npy # download text embedding
wget --tries=100 https://deploy.laion.ai/8f83b608504d46bb81708ec86e912220/embeddings/metadata/metadata_${1}.parquet # download metadata
python3 process.py ${1} # merge files and convert to CSV
number=${1}
if [[ $number == '' ]]; then
number=1
fi;
wget --tries=100 https://deploy.laion.ai/8f83b608504d46bb81708ec86e912220/embeddings/img_emb/img_emb_${number}.npy # download image embedding
wget --tries=100 https://deploy.laion.ai/8f83b608504d46bb81708ec86e912220/embeddings/text_emb/text_emb_${number}.npy # download text embedding
wget --tries=100 https://deploy.laion.ai/8f83b608504d46bb81708ec86e912220/embeddings/metadata/metadata_${number}.parquet # download metadata
python3 process.py $number # merge files and convert to CSV
```
Script `process.py` is defined as follows:

View File

@ -248,6 +248,9 @@ Some of the files might not download fully. Check the file sizes and re-download
``` bash
$ curl -O https://datasets.clickhouse.com/trips_mergetree/partitions/trips_mergetree.tar
# Validate the checksum
$ md5sum trips_mergetree.tar
# Checksum should be equal to: f3b8d469b41d9a82da064ded7245d12c
$ tar xvf trips_mergetree.tar -C /var/lib/clickhouse # path to ClickHouse data directory
$ # check permissions of unpacked data, fix if required
$ sudo service clickhouse-server restart

View File

@ -78,8 +78,8 @@ It is recommended to use official pre-compiled `deb` packages for Debian or Ubun
#### Setup the Debian repository
``` bash
sudo apt-get install -y apt-transport-https ca-certificates dirmngr
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/clickhouse-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 8919F6BD2B48D754
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg
curl -fsSL 'https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key' | sudo gpg --dearmor -o /usr/share/keyrings/clickhouse-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main" | sudo tee \
/etc/apt/sources.list.d/clickhouse.list

View File

@ -170,7 +170,7 @@ RESTORE TABLE test.table PARTITIONS '2', '3'
### Backups as tar archives
Backups can also be stored as tar archives. The functionality is the same as for zip, except that a password is not supported.
Backups can also be stored as tar archives. The functionality is the same as for zip, except that a password is not supported.
Write a backup as a tar:
```
@ -444,10 +444,6 @@ Often data that is ingested into ClickHouse is delivered through some sort of pe
Some local filesystems provide snapshot functionality (for example, [ZFS](https://en.wikipedia.org/wiki/ZFS)), but they might not be the best choice for serving live queries. A possible solution is to create additional replicas with this kind of filesystem and exclude them from the [Distributed](../engines/table-engines/special/distributed.md) tables that are used for `SELECT` queries. Snapshots on such replicas will be out of reach of any queries that modify data. As a bonus, these replicas might have special hardware configurations with more disks attached per server, which would be cost-effective.
### clickhouse-copier {#clickhouse-copier}
[clickhouse-copier](../operations/utilities/clickhouse-copier.md) is a versatile tool that was initially created to re-shard petabyte-sized tables. It can also be used for backup and restore purposes because it reliably copies data between ClickHouse tables and clusters.
For smaller volumes of data, a simple `INSERT INTO ... SELECT ...` to remote tables might work as well.
### Manipulations with Parts {#manipulations-with-parts}

View File

@ -933,9 +933,9 @@ Hard limit is configured via system tools
## database_atomic_delay_before_drop_table_sec {#database_atomic_delay_before_drop_table_sec}
Sets the delay before remove table data in seconds. If the query has `SYNC` modifier, this setting is ignored.
The delay before a table data is dropped in seconds. If the `DROP TABLE` query has a `SYNC` modifier, this setting is ignored.
Default value: `480` (8 minute).
Default value: `480` (8 minutes).
## database_catalog_unused_dir_hide_timeout_sec {#database_catalog_unused_dir_hide_timeout_sec}

View File

@ -3954,6 +3954,7 @@ Possible values:
- `none` — Is similar to throw, but distributed DDL query returns no result set.
- `null_status_on_timeout` — Returns `NULL` as execution status in some rows of result set instead of throwing `TIMEOUT_EXCEEDED` if query is not finished on the corresponding hosts.
- `never_throw` — Do not throw `TIMEOUT_EXCEEDED` and do not rethrow exceptions if query has failed on some hosts.
- `none_only_active` - similar to `none`, but doesn't wait for inactive replicas of the `Replicated` database. Note: with this mode it's impossible to figure out that the query was not executed on some replica and will be executed in background.
- `null_status_on_timeout_only_active` — similar to `null_status_on_timeout`, but doesn't wait for inactive replicas of the `Replicated` database
- `throw_only_active` — similar to `throw`, but doesn't wait for inactive replicas of the `Replicated` database

View File

@ -5,26 +5,416 @@ sidebar_label: "External Disks for Storing Data"
title: "External Disks for Storing Data"
---
Data, processed in ClickHouse, is usually stored in the local file system — on the same machine with the ClickHouse server. That requires large-capacity disks, which can be expensive enough. To avoid that you can store the data remotely — on [Amazon S3](https://aws.amazon.com/s3/) disks or in the Hadoop Distributed File System ([HDFS](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html)).
Data, processed in ClickHouse, is usually stored in the local file system — on the same machine with the ClickHouse server. That requires large-capacity disks, which can be expensive enough. To avoid that you can store the data remotely. Various storages are supported:
1. [Amazon S3](https://aws.amazon.com/s3/) object storage.
2. The Hadoop Distributed File System ([HDFS](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html))
3. [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs).
To work with data stored on `Amazon S3` disks use [S3](/docs/en/engines/table-engines/integrations/s3.md) table engine, and to work with data in the Hadoop Distributed File System — [HDFS](/docs/en/engines/table-engines/integrations/hdfs.md) table engine.
:::note ClickHouse also has support for external table engines, which are different from external storage option described on this page as they allow to read data stored in some general file format (like Parquet), while on this page we are describing storage configuration for ClickHouse `MergeTree` family or `Log` family tables.
1. to work with data stored on `Amazon S3` disks, use [S3](/docs/en/engines/table-engines/integrations/s3.md) table engine.
2. to work with data in the Hadoop Distributed File System — [HDFS](/docs/en/engines/table-engines/integrations/hdfs.md) table engine.
3. to work with data stored in Azure Blob Storage use [AzureBlobStorage](/docs/en/engines/table-engines/integrations/azureBlobStorage.md) table engine.
:::
To load data from a web server with static files use a disk with type [web](#storing-data-on-webserver).
## Configuring external storage {#configuring-external-storage}
## Configuring HDFS {#configuring-hdfs}
[MergeTree](/docs/en/engines/table-engines/mergetree-family/mergetree.md) and [Log](/docs/en/engines/table-engines/log-family/log.md) family table engines can store data to `S3`, `AzureBlobStorage`, `HDFS` using a disk with types `s3`, `azure_blob_storage`, `hdfs` accordingly.
[MergeTree](/docs/en/engines/table-engines/mergetree-family/mergetree.md) and [Log](/docs/en/engines/table-engines/log-family/log.md) family table engines can store data to HDFS using a disk with type `HDFS`.
Disk configuration requires:
1. `type` section, equal to one of `s3`, `azure_blob_storage`, `hdfs`, `local_blob_storage`, `web`.
2. Configuration of a specific external storage type.
Configuration markup:
Starting from 24.1 clickhouse version, it is possible to use a new configuration option.
It requires to specify:
1. `type` equal to `object_storage`
2. `object_storage_type`, equal to one of `s3`, `azure_blob_storage` (or just `azure` from `24.3`), `hdfs`, `local_blob_storage` (or just `local` from `24.3`), `web`.
Optionally, `metadata_type` can be specified (it is equal to `local` by default), but it can also be set to `plain`, `web`.
Usage of `plain` metadata type is described in [plain storage section](/docs/en/operations/storing-data.md/#storing-data-on-webserver), `web` metadata type can be used only with `web` object storage type, `local` metadata type stores metadata files locally (each metadata files contains mapping to files in object storage and some additional meta information about them).
E.g. configuration option
``` xml
<s3>
<type>s3</type>
<endpoint>https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/</endpoint>
<use_invironment_credentials>1</use_invironment_credentials>
</s3>
```
is equal to configuration (from `24.1`):
``` xml
<s3>
<type>object_storage</type>
<object_storage_type>s3</object_storage_type>
<metadata_type>local</metadata_type>
<endpoint>https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/</endpoint>
<use_invironment_credentials>1</use_invironment_credentials>
</s3>
```
Configuration
``` xml
<s3_plain>
<type>s3_plain</type>
<endpoint>https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/</endpoint>
<use_invironment_credentials>1</use_invironment_credentials>
</s3_plain>
```
is equal to
``` xml
<s3_plain>
<type>object_storage</type>
<object_storage_type>s3</object_storage_type>
<metadata_type>plain</metadata_type>
<endpoint>https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/</endpoint>
<use_invironment_credentials>1</use_invironment_credentials>
</s3_plain>
```
Example of full storage configuration will look like:
``` xml
<clickhouse>
<storage_configuration>
<disks>
<s3>
<type>s3</type>
<endpoint>https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/</endpoint>
<use_invironment_credentials>1</use_invironment_credentials>
</s3>
</disks>
<policies>
<s3>
<volumes>
<main>
<disk>s3</disk>
</main>
</volumes>
</s3>
</policies>
</storage_configuration>
</clickhouse>
```
Starting with 24.1 clickhouse version, it can also look like:
``` xml
<clickhouse>
<storage_configuration>
<disks>
<s3>
<type>object_storage</type>
<object_storage_type>s3</object_storage_type>
<metadata_type>local</metadata_type>
<endpoint>https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/</endpoint>
<use_invironment_credentials>1</use_invironment_credentials>
</s3>
</disks>
<policies>
<s3>
<volumes>
<main>
<disk>s3</disk>
</main>
</volumes>
</s3>
</policies>
</storage_configuration>
</clickhouse>
```
In order to make a specific kind of storage a default option for all `MergeTree` tables add the following section to configuration file:
``` xml
<clickhouse>
<merge_tree>
<storage_policy>s3</storage_policy>
</merge_tree>
</clickhouse>
```
If you want to configure a specific storage policy only to specific table, you can define it in settings while creating the table:
``` sql
CREATE TABLE test (a Int32, b String)
ENGINE = MergeTree() ORDER BY a
SETTINGS storage_policy = 's3';
```
You can also use `disk` instead of `storage_policy`. In this case it is not requires to have `storage_policy` section in configuration file, only `disk` section would be enough.
``` sql
CREATE TABLE test (a Int32, b String)
ENGINE = MergeTree() ORDER BY a
SETTINGS disk = 's3';
```
## Dynamic Configuration {#dynamic-configuration}
There is also a possibility to specify storage configuration without a predefined disk in configuration in a configuration file, but can be configured in the `CREATE`/`ATTACH` query settings.
The following example query builds on the above dynamic disk configuration and shows how to use a local disk to cache data from a table stored at a URL.
```sql
ATTACH TABLE uk_price_paid UUID 'cf712b4f-2ca8-435c-ac23-c4393efe52f7'
(
price UInt32,
date Date,
postcode1 LowCardinality(String),
postcode2 LowCardinality(String),
type Enum8('other' = 0, 'terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4),
is_new UInt8,
duration Enum8('unknown' = 0, 'freehold' = 1, 'leasehold' = 2),
addr1 String,
addr2 String,
street LowCardinality(String),
locality LowCardinality(String),
town LowCardinality(String),
district LowCardinality(String),
county LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (postcode1, postcode2, addr1, addr2)
# highlight-start
SETTINGS disk = disk(
type=web,
endpoint='https://raw.githubusercontent.com/ClickHouse/web-tables-demo/main/web/'
);
# highlight-end
```
The example below adds cache to external storage.
```sql
ATTACH TABLE uk_price_paid UUID 'cf712b4f-2ca8-435c-ac23-c4393efe52f7'
(
price UInt32,
date Date,
postcode1 LowCardinality(String),
postcode2 LowCardinality(String),
type Enum8('other' = 0, 'terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4),
is_new UInt8,
duration Enum8('unknown' = 0, 'freehold' = 1, 'leasehold' = 2),
addr1 String,
addr2 String,
street LowCardinality(String),
locality LowCardinality(String),
town LowCardinality(String),
district LowCardinality(String),
county LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (postcode1, postcode2, addr1, addr2)
# highlight-start
SETTINGS disk = disk(
type=cache,
max_size='1Gi',
path='/var/lib/clickhouse/custom_disk_cache/',
disk=disk(
type=web,
endpoint='https://raw.githubusercontent.com/ClickHouse/web-tables-demo/main/web/'
)
);
# highlight-end
```
In the settings highlighted below notice that the disk of `type=web` is nested within
the disk of `type=cache`.
:::note
The example uses `type=web`, but any disk type can be configured as dynamic, even Local disk. Local disks require a path argument to be inside the server config parameter `custom_local_disks_base_directory`, which has no default, so set that also when using local disk.
:::
A combination of config-based configuration and sql-defined configuration is also possible:
```sql
ATTACH TABLE uk_price_paid UUID 'cf712b4f-2ca8-435c-ac23-c4393efe52f7'
(
price UInt32,
date Date,
postcode1 LowCardinality(String),
postcode2 LowCardinality(String),
type Enum8('other' = 0, 'terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4),
is_new UInt8,
duration Enum8('unknown' = 0, 'freehold' = 1, 'leasehold' = 2),
addr1 String,
addr2 String,
street LowCardinality(String),
locality LowCardinality(String),
town LowCardinality(String),
district LowCardinality(String),
county LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (postcode1, postcode2, addr1, addr2)
# highlight-start
SETTINGS disk = disk(
type=cache,
max_size='1Gi',
path='/var/lib/clickhouse/custom_disk_cache/',
disk=disk(
type=web,
endpoint='https://raw.githubusercontent.com/ClickHouse/web-tables-demo/main/web/'
)
);
# highlight-end
```
where `web` is a from a server configuration file:
``` xml
<storage_configuration>
<disks>
<web>
<type>web</type>
<endpoint>'https://raw.githubusercontent.com/ClickHouse/web-tables-demo/main/web/'</endpoint>
</web>
</disks>
</storage_configuration>
```
### Using S3 Storage {#s3-storage}
Required parameters:
- `endpoint` — S3 endpoint URL in `path` or `virtual hosted` [styles](https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html). Endpoint URL should contain a bucket and root path to store data.
- `access_key_id` — S3 access key id.
- `secret_access_key` — S3 secret access key.
Optional parameters:
- `region` — S3 region name.
- `support_batch_delete` — This controls the check to see if batch deletes are supported. Set this to `false` when using Google Cloud Storage (GCS) as GCS does not support batch deletes and preventing the checks will prevent error messages in the logs.
- `use_environment_credentials` — Reads AWS credentials from the Environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN if they exist. Default value is `false`.
- `use_insecure_imds_request` — If set to `true`, S3 client will use insecure IMDS request while obtaining credentials from Amazon EC2 metadata. Default value is `false`.
- `expiration_window_seconds` — Grace period for checking if expiration-based credentials have expired. Optional, default value is `120`.
- `proxy` — Proxy configuration for S3 endpoint. Each `uri` element inside `proxy` block should contain a proxy URL.
- `connect_timeout_ms` — Socket connect timeout in milliseconds. Default value is `10 seconds`.
- `request_timeout_ms` — Request timeout in milliseconds. Default value is `5 seconds`.
- `retry_attempts` — Number of retry attempts in case of failed request. Default value is `10`.
- `single_read_retries` — Number of retry attempts in case of connection drop during read. Default value is `4`.
- `min_bytes_for_seek` — Minimal number of bytes to use seek operation instead of sequential read. Default value is `1 Mb`.
- `metadata_path` — Path on local FS to store metadata files for S3. Default value is `/var/lib/clickhouse/disks/<disk_name>/`.
- `skip_access_check` — If true, disk access checks will not be performed on disk start-up. Default value is `false`.
- `header` — Adds specified HTTP header to a request to given endpoint. Optional, can be specified multiple times.
- `server_side_encryption_customer_key_base64` — If specified, required headers for accessing S3 objects with SSE-C encryption will be set.
- `server_side_encryption_kms_key_id` - If specified, required headers for accessing S3 objects with [SSE-KMS encryption](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html) will be set. If an empty string is specified, the AWS managed S3 key will be used. Optional.
- `server_side_encryption_kms_encryption_context` - If specified alongside `server_side_encryption_kms_key_id`, the given encryption context header for SSE-KMS will be set. Optional.
- `server_side_encryption_kms_bucket_key_enabled` - If specified alongside `server_side_encryption_kms_key_id`, the header to enable S3 bucket keys for SSE-KMS will be set. Optional, can be `true` or `false`, defaults to nothing (matches the bucket-level setting).
- `s3_max_put_rps` — Maximum PUT requests per second rate before throttling. Default value is `0` (unlimited).
- `s3_max_put_burst` — Max number of requests that can be issued simultaneously before hitting request per second limit. By default (`0` value) equals to `s3_max_put_rps`.
- `s3_max_get_rps` — Maximum GET requests per second rate before throttling. Default value is `0` (unlimited).
- `s3_max_get_burst` — Max number of requests that can be issued simultaneously before hitting request per second limit. By default (`0` value) equals to `s3_max_get_rps`.
- `read_resource` — Resource name to be used for [scheduling](/docs/en/operations/workload-scheduling.md) of read requests to this disk. Default value is empty string (IO scheduling is not enabled for this disk).
- `write_resource` — Resource name to be used for [scheduling](/docs/en/operations/workload-scheduling.md) of write requests to this disk. Default value is empty string (IO scheduling is not enabled for this disk).
- `key_template` — Define the format with which the object keys are generated. By default, Clickhouse takes `root path` from `endpoint` option and adds random generated suffix. That suffix is a dir with 3 random symbols and a file name with 29 random symbols. With that option you have a full control how to the object keys are generated. Some usage scenarios require having random symbols in the prefix or in the middle of object key. For example: `[a-z]{3}-prefix-random/constant-part/random-middle-[a-z]{3}/random-suffix-[a-z]{29}`. The value is parsed with [`re2`](https://github.com/google/re2/wiki/Syntax). Only some subset of the syntax is supported. Check if your preferred format is supported before using that option. Disk isn't initialized if clickhouse is unable to generate a key by the value of `key_template`. It requires enabled feature flag [storage_metadata_write_full_object_key](/docs/en/operations/settings/settings#storage_metadata_write_full_object_key). It forbids declaring the `root path` in `endpoint` option. It requires definition of the option `key_compatibility_prefix`.
- `key_compatibility_prefix` — That option is required when option `key_template` is in use. In order to be able to read the objects keys which were stored in the metadata files with the metadata version lower that `VERSION_FULL_OBJECT_KEY`, the previous `root path` from the `endpoint` option should be set here.
:::note
Google Cloud Storage (GCS) is also supported using the type `s3`. See [GCS backed MergeTree](/docs/en/integrations/gcs).
:::
### Using Plain Storage {#plain-storage}
In `22.10` a new disk type `s3_plain` was introduced, which provides a write-once storage. Configuration parameters are the same as for `s3` disk type.
Unlike `s3` disk type, it stores data as is, e.g. instead of randomly-generated blob names, it uses normal file names (the same way as clickhouse stores files on local disk) and does not store any metadata locally, e.g. it is derived from data on `s3`.
This disk type allows to keep a static version of the table, as it does not allow executing merges on the existing data and does not allow inserting of new data.
A use case for this disk type is to create backups on it, which can be done via `BACKUP TABLE data TO Disk('plain_disk_name', 'backup_name')`. Afterwards you can do `RESTORE TABLE data AS data_restored FROM Disk('plain_disk_name', 'backup_name')` or using `ATTACH TABLE data (...) ENGINE = MergeTree() SETTINGS disk = 'plain_disk_name'`.
Configuration:
``` xml
<s3_plain>
<type>s3_plain</type>
<endpoint>https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/</endpoint>
<use_invironment_credentials>1</use_invironment_credentials>
</s3_plain>
```
Starting from `24.1` it is possible configure any object storage disk (`s3`, `azure`, `hdfs`, `local`) using `plain` metadata type.
Configuration:
``` xml
<s3_plain>
<type>object_storage</type>
<object_storage_type>azure</object_storage_type>
<metadata_type>plain</metadata_type>
<endpoint>https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/</endpoint>
<use_invironment_credentials>1</use_invironment_credentials>
</s3_plain>
```
### Using Azure Blob Storage {#azure-blob-storage}
`MergeTree` family table engines can store data to [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) using a disk with type `azure_blob_storage`.
As of February 2022, this feature is still a fresh addition, so expect that some Azure Blob Storage functionalities might be unimplemented.
Configuration markup:
``` xml
<storage_configuration>
...
<disks>
<blob_storage_disk>
<type>azure_blob_storage</type>
<storage_account_url>http://account.blob.core.windows.net</storage_account_url>
<container_name>container</container_name>
<account_name>account</account_name>
<account_key>pass123</account_key>
<metadata_path>/var/lib/clickhouse/disks/blob_storage_disk/</metadata_path>
<cache_path>/var/lib/clickhouse/disks/blob_storage_disk/cache/</cache_path>
<skip_access_check>false</skip_access_check>
</blob_storage_disk>
</disks>
...
</storage_configuration>
```
Connection parameters:
* `storage_account_url` - **Required**, Azure Blob Storage account URL, like `http://account.blob.core.windows.net` or `http://azurite1:10000/devstoreaccount1`.
* `container_name` - Target container name, defaults to `default-container`.
* `container_already_exists` - If set to `false`, a new container `container_name` is created in the storage account, if set to `true`, disk connects to the container directly, and if left unset, disk connects to the account, checks if the container `container_name` exists, and creates it if it doesn't exist yet.
Authentication parameters (the disk will try all available methods **and** Managed Identity Credential):
* `connection_string` - For authentication using a connection string.
* `account_name` and `account_key` - For authentication using Shared Key.
Limit parameters (mainly for internal usage):
* `s3_max_single_part_upload_size` - Limits the size of a single block upload to Blob Storage.
* `min_bytes_for_seek` - Limits the size of a seekable region.
* `max_single_read_retries` - Limits the number of attempts to read a chunk of data from Blob Storage.
* `max_single_download_retries` - Limits the number of attempts to download a readable buffer from Blob Storage.
* `thread_pool_size` - Limits the number of threads with which `IDiskRemote` is instantiated.
* `s3_max_inflight_parts_for_one_file` - Limits the number of put requests that can be run concurrently for one object.
Other parameters:
* `metadata_path` - Path on local FS to store metadata files for Blob Storage. Default value is `/var/lib/clickhouse/disks/<disk_name>/`.
* `skip_access_check` - If true, disk access checks will not be performed on disk start-up. Default value is `false`.
* `read_resource` — Resource name to be used for [scheduling](/docs/en/operations/workload-scheduling.md) of read requests to this disk. Default value is empty string (IO scheduling is not enabled for this disk).
* `write_resource` — Resource name to be used for [scheduling](/docs/en/operations/workload-scheduling.md) of write requests to this disk. Default value is empty string (IO scheduling is not enabled for this disk).
Examples of working configurations can be found in integration tests directory (see e.g. [test_merge_tree_azure_blob_storage](https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_merge_tree_azure_blob_storage/configs/config.d/storage_conf.xml) or [test_azure_blob_storage_zero_copy_replication](https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_azure_blob_storage_zero_copy_replication/configs/config.d/storage_conf.xml)).
:::note Zero-copy replication is not ready for production
Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher. This feature is not recommended for production use.
:::
## Using HDFS storage {#hdfs-storage}
In this sample configuration:
- the disk is of type `hdfs`
- the data is hosted at `hdfs://hdfs1:9000/clickhouse/`
```xml
<clickhouse>
<storage_configuration>
<disks>
<hdfs>
<type>hdfs</type>
<endpoint>hdfs://hdfs1:9000/clickhouse/</endpoint>
<skip_access_check>true</skip_access_check>
</hdfs>
<hdd>
<type>local</type>
<path>/</path>
</hdd>
</disks>
<policies>
<hdfs>
@ -32,26 +422,17 @@ Configuration markup:
<main>
<disk>hdfs</disk>
</main>
<external>
<disk>hdd</disk>
</external>
</volumes>
</hdfs>
</policies>
</storage_configuration>
<merge_tree>
<min_bytes_for_wide_part>0</min_bytes_for_wide_part>
</merge_tree>
</clickhouse>
```
Required parameters:
- `endpoint` — HDFS endpoint URL in `path` format. Endpoint URL should contain a root path to store data.
Optional parameters:
- `min_bytes_for_seek` — The minimal number of bytes to use seek operation instead of sequential read. Default value: `1 Mb`.
## Using Virtual File System for Data Encryption {#encrypted-virtual-file-system}
### Using Data Encryption {#encrypted-virtual-file-system}
You can encrypt the data stored on [S3](/docs/en/engines/table-engines/mergetree-family/mergetree.md/#table_engine-mergetree-s3), or [HDFS](#configuring-hdfs) external disks, or on a local disk. To turn on the encryption mode, in the configuration file you must define a disk with the type `encrypted` and choose a disk on which the data will be saved. An `encrypted` disk ciphers all written files on the fly, and when you read files from an `encrypted` disk it deciphers them automatically. So you can work with an `encrypted` disk like with a normal one.
@ -112,7 +493,7 @@ Example of disk configuration:
</clickhouse>
```
## Using local cache {#using-local-cache}
### Using local cache {#using-local-cache}
It is possible to configure local cache over disks in storage configuration starting from version 22.3.
For versions 22.3 - 22.7 cache is supported only for `s3` disk type. For versions >= 22.8 cache is supported for any disk type: S3, Azure, Local, Encrypted, etc.
@ -275,23 +656,92 @@ Cache profile events:
- `CachedWriteBufferCacheWriteBytes`, `CachedWriteBufferCacheWriteMicroseconds`
## Using in-memory cache (userspace page cache) {#userspace-page-cache}
The File Cache described above stores cached data in local files. Alternatively, object-store-based disks can be configured to use "Userspace Page Cache", which is RAM-only. Userspace page cache is recommended only if file cache can't be used for some reason, e.g. if the machine doesn't have a local disk at all. Note that file cache effectively uses RAM for caching too, since the OS caches contents of local files.
To enable userspace page cache for disks that don't use file cache, use setting `use_page_cache_for_disks_without_file_cache`.
By default, on Linux, the userspace page cache will use all available memory, similar to the OS page cache. In tools like `top` and `ps`, the clickhouse server process will typically show resident set size near 100% of the machine's RAM - this is normal, and most of this memory is actually reclaimable by the OS on memory pressure (`MADV_FREE`). This behavior can be disabled with server setting `page_cache_use_madv_free = 0`, making the userspace page cache just use a fixed amount of memory `page_cache_size` with no special interaction with the OS. On Mac OS, `page_cache_use_madv_free` is always disabled as it doesn't have lazy `MADV_FREE`.
Unfortunately, `page_cache_use_madv_free` makes it difficult to tell if the server is close to running out of memory, since the RSS metric becomes useless. Async metric `UnreclaimableRSS` shows the amount of physical memory used by the server, excluding the memory reclaimable by the OS: `select value from system.asynchronous_metrics where metric = 'UnreclaimableRSS'`. Use it for monitoring instead of RSS. This metric is only available if `page_cache_use_madv_free` is enabled.
## Storing Data on Web Server {#storing-data-on-webserver}
There is a tool `clickhouse-static-files-uploader`, which prepares a data directory for a given table (`SELECT data_paths FROM system.tables WHERE name = 'table_name'`). For each table you need, you get a directory of files. These files can be uploaded to, for example, a web server with static files. After this preparation, you can load this table into any ClickHouse server via `DiskWeb`.
### Using static Web storage (read-only) {#web-storage}
This is a read-only disk. Its data is only read and never modified. A new table is loaded to this disk via `ATTACH TABLE` query (see example below). Local disk is not actually used, each `SELECT` query will result in a `http` request to fetch required data. All modification of the table data will result in an exception, i.e. the following types of queries are not allowed: [CREATE TABLE](/docs/en/sql-reference/statements/create/table.md), [ALTER TABLE](/docs/en/sql-reference/statements/alter/index.md), [RENAME TABLE](/docs/en/sql-reference/statements/rename.md/#misc_operations-rename_table), [DETACH TABLE](/docs/en/sql-reference/statements/detach.md) and [TRUNCATE TABLE](/docs/en/sql-reference/statements/truncate.md).
Web storage can be used for read-only purposes. An example use is for hosting sample data, or for migrating data.
There is a tool `clickhouse-static-files-uploader`, which prepares a data directory for a given table (`SELECT data_paths FROM system.tables WHERE name = 'table_name'`). For each table you need, you get a directory of files. These files can be uploaded to, for example, a web server with static files. After this preparation, you can load this table into any ClickHouse server via `DiskWeb`.
Web server storage is supported only for the [MergeTree](/docs/en/engines/table-engines/mergetree-family/mergetree.md) and [Log](/docs/en/engines/table-engines/log-family/log.md) engine families. To access the data stored on a `web` disk, use the [storage_policy](/docs/en/engines/table-engines/mergetree-family/mergetree.md/#terms) setting when executing the query. For example, `ATTACH TABLE table_web UUID '{}' (id Int32) ENGINE = MergeTree() ORDER BY id SETTINGS storage_policy = 'web'`.
In this sample configuration:
- the disk is of type `web`
- the data is hosted at `http://nginx:80/test1/`
- a cache on local storage is used
```xml
<clickhouse>
<storage_configuration>
<disks>
<web>
<type>web</type>
<endpoint>http://nginx:80/test1/</endpoint>
</web>
<cached_web>
<type>cache</type>
<disk>web</disk>
<path>cached_web_cache/</path>
<max_size>100000000</max_size>
</cached_web>
</disks>
<policies>
<web>
<volumes>
<main>
<disk>web</disk>
</main>
</volumes>
</web>
<cached_web>
<volumes>
<main>
<disk>cached_web</disk>
</main>
</volumes>
</cached_web>
</policies>
</storage_configuration>
</clickhouse>
```
:::tip
Storage can also be configured temporarily within a query, if a web dataset is not expected
to be used routinely, see [dynamic configuration](#dynamic-configuration) and skip editing the
configuration file.
:::
:::tip
A [demo dataset](https://github.com/ClickHouse/web-tables-demo) is hosted in GitHub. To prepare your own tables for web storage see the tool [clickhouse-static-files-uploader](/docs/en/operations/storing-data.md/#storing-data-on-webserver)
:::
In this `ATTACH TABLE` query the `UUID` provided matches the directory name of the data, and the endpoint is the URL for the raw GitHub content.
```sql
# highlight-next-line
ATTACH TABLE uk_price_paid UUID 'cf712b4f-2ca8-435c-ac23-c4393efe52f7'
(
price UInt32,
date Date,
postcode1 LowCardinality(String),
postcode2 LowCardinality(String),
type Enum8('other' = 0, 'terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4),
is_new UInt8,
duration Enum8('unknown' = 0, 'freehold' = 1, 'leasehold' = 2),
addr1 String,
addr2 String,
street LowCardinality(String),
locality LowCardinality(String),
town LowCardinality(String),
district LowCardinality(String),
county LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (postcode1, postcode2, addr1, addr2)
# highlight-start
SETTINGS disk = disk(
type=web,
endpoint='https://raw.githubusercontent.com/ClickHouse/web-tables-demo/main/web/'
);
# highlight-end
```
A ready test case. You need to add this configuration to config:
@ -487,7 +937,7 @@ If URL is not reachable on disk load when the server is starting up tables, then
Use [http_max_single_read_retries](/docs/en/operations/settings/settings.md/#http-max-single-read-retries) setting to limit the maximum number of retries during a single HTTP read.
## Zero-copy Replication (not ready for production) {#zero-copy}
### Zero-copy Replication (not ready for production) {#zero-copy}
Zero-copy replication is possible, but not recommended, with `S3` and `HDFS` disks. Zero-copy replication means that if the data is stored remotely on several machines and needs to be synchronized, then only the metadata is replicated (paths to the data parts), but not the data itself.

View File

@ -513,10 +513,6 @@ Part was moved to another disk and should be deleted in own destructor.
Not active data part with identity refcounter, it is deleting right now by a cleaner.
### PartsInMemory
In-memory parts.
### PartsOutdated
Not active data part, but could be used by only current SELECTs, could be deleted after SELECTs finishes.

View File

@ -26,7 +26,9 @@ priority: 0
is_active: 0
active_children: 0
dequeued_requests: 67
canceled_requests: 0
dequeued_cost: 4692272
canceled_cost: 0
busy_periods: 63
vruntime: 938454.1999999989
system_vruntime: ᴺᵁᴸᴸ
@ -54,7 +56,9 @@ Columns:
- `is_active` (`UInt8`) - Whether this node is currently active - has resource requests to be dequeued and constraints satisfied.
- `active_children` (`UInt64`) - The number of children in active state.
- `dequeued_requests` (`UInt64`) - The total number of resource requests dequeued from this node.
- `canceled_requests` (`UInt64`) - The total number of resource requests canceled from this node.
- `dequeued_cost` (`UInt64`) - The sum of costs (e.g. size in bytes) of all requests dequeued from this node.
- `canceled_cost` (`UInt64`) - The sum of costs (e.g. size in bytes) of all requests canceled from this node.
- `busy_periods` (`UInt64`) - The total number of deactivations of this node.
- `vruntime` (`Nullable(Float64)`) - For children of `fair` nodes only. Virtual runtime of a node used by SFQ algorithm to select the next child to process in a max-min fair manner.
- `system_vruntime` (`Nullable(Float64)`) - For `fair` nodes only. Virtual runtime showing `vruntime` of the last processed resource request. Used during child activation as the new value of `vruntime`.

View File

@ -1,187 +0,0 @@
---
slug: /en/operations/utilities/clickhouse-copier
sidebar_position: 59
sidebar_label: clickhouse-copier
---
# clickhouse-copier
Copies data from the tables in one cluster to tables in another (or the same) cluster.
:::note
To get a consistent copy, the data in the source tables and partitions should not change during the entire process.
:::
You can run multiple `clickhouse-copier` instances on different servers to perform the same job. ClickHouse Keeper, or ZooKeeper, is used for syncing the processes.
After starting, `clickhouse-copier`:
- Connects to ClickHouse Keeper and receives:
- Copying jobs.
- The state of the copying jobs.
- It performs the jobs.
Each running process chooses the “closest” shard of the source cluster and copies the data into the destination cluster, resharding the data if necessary.
`clickhouse-copier` tracks the changes in ClickHouse Keeper and applies them on the fly.
To reduce network traffic, we recommend running `clickhouse-copier` on the same server where the source data is located.
## Running Clickhouse-copier {#running-clickhouse-copier}
The utility should be run manually:
``` bash
$ clickhouse-copier --daemon --config keeper.xml --task-path /task/path --base-dir /path/to/dir
```
Parameters:
- `daemon` — Starts `clickhouse-copier` in daemon mode.
- `config` — The path to the `keeper.xml` file with the parameters for the connection to ClickHouse Keeper.
- `task-path` — The path to the ClickHouse Keeper node. This node is used for syncing `clickhouse-copier` processes and storing tasks. Tasks are stored in `$task-path/description`.
- `task-file` — Optional path to file with task configuration for initial upload to ClickHouse Keeper.
- `task-upload-force` — Force upload `task-file` even if node already exists. Default is false.
- `base-dir` — The path to logs and auxiliary files. When it starts, `clickhouse-copier` creates `clickhouse-copier_YYYYMMHHSS_<PID>` subdirectories in `$base-dir`. If this parameter is omitted, the directories are created in the directory where `clickhouse-copier` was launched.
## Format of keeper.xml {#format-of-zookeeper-xml}
``` xml
<clickhouse>
<logger>
<level>trace</level>
<size>100M</size>
<count>3</count>
</logger>
<zookeeper>
<node index="1">
<host>127.0.0.1</host>
<port>2181</port>
</node>
</zookeeper>
</clickhouse>
```
## Configuration of Copying Tasks {#configuration-of-copying-tasks}
``` xml
<clickhouse>
<!-- Configuration of clusters as in an ordinary server config -->
<remote_servers>
<source_cluster>
<!--
source cluster & destination clusters accept exactly the same
parameters as parameters for the usual Distributed table
see https://clickhouse.com/docs/en/engines/table-engines/special/distributed/
-->
<shard>
<internal_replication>false</internal_replication>
<replica>
<host>127.0.0.1</host>
<port>9000</port>
<!--
<user>default</user>
<password>default</password>
<secure>1</secure>
-->
</replica>
</shard>
...
</source_cluster>
<destination_cluster>
...
</destination_cluster>
</remote_servers>
<!-- How many simultaneously active workers are possible. If you run more workers superfluous workers will sleep. -->
<max_workers>2</max_workers>
<!-- Setting used to fetch (pull) data from source cluster tables -->
<settings_pull>
<readonly>1</readonly>
</settings_pull>
<!-- Setting used to insert (push) data to destination cluster tables -->
<settings_push>
<readonly>0</readonly>
</settings_push>
<!-- Common setting for fetch (pull) and insert (push) operations. Also, copier process context uses it.
They are overlaid by <settings_pull/> and <settings_push/> respectively. -->
<settings>
<connect_timeout>3</connect_timeout>
<!-- Sync insert is set forcibly, leave it here just in case. -->
<distributed_foreground_insert>1</distributed_foreground_insert>
</settings>
<!-- Copying tasks description.
You could specify several table task in the same task description (in the same ZooKeeper node), they will be performed
sequentially.
-->
<tables>
<!-- A table task, copies one table. -->
<table_hits>
<!-- Source cluster name (from <remote_servers/> section) and tables in it that should be copied -->
<cluster_pull>source_cluster</cluster_pull>
<database_pull>test</database_pull>
<table_pull>hits</table_pull>
<!-- Destination cluster name and tables in which the data should be inserted -->
<cluster_push>destination_cluster</cluster_push>
<database_push>test</database_push>
<table_push>hits2</table_push>
<!-- Engine of destination tables.
If destination tables have not be created, workers create them using columns definition from source tables and engine
definition from here.
NOTE: If the first worker starts insert data and detects that destination partition is not empty then the partition will
be dropped and refilled, take it into account if you already have some data in destination tables. You could directly
specify partitions that should be copied in <enabled_partitions/>, they should be in quoted format like partition column of
system.parts table.
-->
<engine>
ENGINE=ReplicatedMergeTree('/clickhouse/tables/{cluster}/{shard}/hits2', '{replica}')
PARTITION BY toMonday(date)
ORDER BY (CounterID, EventDate)
</engine>
<!-- Sharding key used to insert data to destination cluster -->
<sharding_key>jumpConsistentHash(intHash64(UserID), 2)</sharding_key>
<!-- Optional expression that filter data while pull them from source servers -->
<where_condition>CounterID != 0</where_condition>
<!-- This section specifies partitions that should be copied, other partition will be ignored.
Partition names should have the same format as
partition column of system.parts table (i.e. a quoted text).
Since partition key of source and destination cluster could be different,
these partition names specify destination partitions.
NOTE: In spite of this section is optional (if it is not specified, all partitions will be copied),
it is strictly recommended to specify them explicitly.
If you already have some ready partitions on destination cluster they
will be removed at the start of the copying since they will be interpeted
as unfinished data from the previous copying!!!
-->
<enabled_partitions>
<partition>'2018-02-26'</partition>
<partition>'2018-03-05'</partition>
...
</enabled_partitions>
</table_hits>
<!-- Next table to copy. It is not copied until previous table is copying. -->
<table_visits>
...
</table_visits>
...
</tables>
</clickhouse>
```
`clickhouse-copier` tracks the changes in `/task/path/description` and applies them on the fly. For instance, if you change the value of `max_workers`, the number of processes running tasks will also change.

View File

@ -201,12 +201,12 @@ Arguments:
- `-S`, `--structure` — table structure for input data.
- `--input-format` — input format, `TSV` by default.
- `-f`, `--file` — path to data, `stdin` by default.
- `-F`, `--file` — path to data, `stdin` by default.
- `-q`, `--query` — queries to execute with `;` as delimiter. `--query` can be specified multiple times, e.g. `--query "SELECT 1" --query "SELECT 2"`. Cannot be used simultaneously with `--queries-file`.
- `--queries-file` - file path with queries to execute. `--queries-file` can be specified multiple times, e.g. `--query queries1.sql --query queries2.sql`. Cannot be used simultaneously with `--query`.
- `--multiquery, -n` If specified, multiple queries separated by semicolons can be listed after the `--query` option. For convenience, it is also possible to omit `--query` and pass the queries directly after `--multiquery`.
- `-N`, `--table` — table name where to put output data, `table` by default.
- `--format`, `--output-format` — output format, `TSV` by default.
- `-f`, `--format`, `--output-format` — output format, `TSV` by default.
- `-d`, `--database` — default database, `_local` by default.
- `--stacktrace` — whether to dump debug output in case of exception.
- `--echo` — print query before execution.

View File

@ -2,13 +2,11 @@
slug: /en/operations/utilities/
sidebar_position: 56
sidebar_label: List of tools and utilities
pagination_next: 'en/operations/utilities/clickhouse-copier'
---
# List of tools and utilities
- [clickhouse-local](../../operations/utilities/clickhouse-local.md) — Allows running SQL queries on data without starting the ClickHouse server, similar to how `awk` does this.
- [clickhouse-copier](../../operations/utilities/clickhouse-copier.md) — Copies (and reshards) data from one cluster to another cluster.
- [clickhouse-benchmark](../../operations/utilities/clickhouse-benchmark.md) — Loads server with the custom queries and settings.
- [clickhouse-format](../../operations/utilities/clickhouse-format.md) — Enables formatting input queries.
- [ClickHouse obfuscator](../../operations/utilities/clickhouse-obfuscator.md) — Obfuscates data.

View File

@ -16,7 +16,9 @@ ClickHouse also supports:
## NULL Processing
During aggregation, all `NULL`s are skipped. If the aggregation has several parameters it will ignore any row in which one or more of the parameters are NULL.
During aggregation, all `NULL` arguments are skipped. If the aggregation has several arguments it will ignore any row in which one or more of them are NULL.
There is an exception to this rule, which are the functions [`first_value`](../../sql-reference/aggregate-functions/reference/first_value.md), [`last_value`](../../sql-reference/aggregate-functions/reference/last_value.md) and their aliases when followed by the modifier `RESPECT NULLS`: `FIRST_VALUE(b) RESPECT NULLS`.
**Examples:**
@ -85,3 +87,50 @@ FROM t_null_big;
│ [2,2,3] │ [2,NULL,2,3,NULL] │
└───────────────┴───────────────────────────────────────┘
```
Note that aggregations are skipped when the columns are used as arguments to an aggregated function. For example [`count`](../../sql-reference/aggregate-functions/reference/count.md) without parameters (`count()`) or with constant ones (`count(1)`) will count all rows in the block (independently of the value of the GROUP BY column as it's not an argument), while `count(column)` will only return the number of rows where column is not NULL.
```sql
SELECT
v,
count(1),
count(v)
FROM
(
SELECT if(number < 10, NULL, number % 3) AS v
FROM numbers(15)
)
GROUP BY v
┌────v─┬─count()─┬─count(v)─┐
│ ᴺᵁᴸᴸ │ 10 │ 0 │
│ 0 │ 1 │ 1 │
│ 1 │ 2 │ 2 │
│ 2 │ 2 │ 2 │
└──────┴─────────┴──────────┘
```
And here is an example of of first_value with `RESPECT NULLS` where we can see that NULL inputs are respected and it will return the first value read, whether it's NULL or not:
```sql
SELECT
col || '_' || ((col + 1) * 5 - 1) as range,
first_value(odd_or_null) as first,
first_value(odd_or_null) IGNORE NULLS as first_ignore_null,
first_value(odd_or_null) RESPECT NULLS as first_respect_nulls
FROM
(
SELECT
intDiv(number, 5) AS col,
if(number % 2 == 0, NULL, number) as odd_or_null
FROM numbers(15)
)
GROUP BY col
ORDER BY col
┌─range─┬─first─┬─first_ignore_null─┬─first_respect_nulls─┐
│ 0_4 │ 1 │ 1 │ ᴺᵁᴸᴸ │
│ 1_9 │ 5 │ 5 │ 5 │
│ 2_14 │ 11 │ 11 │ ᴺᵁᴸᴸ │
└───────┴───────┴───────────────────┴─────────────────────┘
```

View File

@ -1,16 +1,99 @@
---
slug: /en/sql-reference/aggregate-functions/reference/varpop
title: "varPop"
slug: "/en/sql-reference/aggregate-functions/reference/varpop"
sidebar_position: 32
---
# varPop(x)
This page covers the `varPop` and `varPopStable` functions available in ClickHouse.
Calculates the amount `Σ((x - x̅)^2) / n`, where `n` is the sample size and `x̅`is the average value of `x`.
## varPop
In other words, dispersion for a set of values. Returns `Float64`.
Calculates the population covariance between two data columns. The population covariance measures the degree to which two variables vary together. Calculates the amount `Σ((x - x̅)^2) / n`, where `n` is the sample size and `x̅`is the average value of `x`.
Alias: `VAR_POP`.
**Syntax**
:::note
This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the `varPopStable` function. It works slower but provides a lower computational error.
:::
```sql
covarPop(x, y)
```
**Parameters**
- `x`: The first data column. [Numeric](../../../native-protocol/columns.md)
- `y`: The second data column. [Numeric](../../../native-protocol/columns.md)
**Returned value**
Returns an integer of type `Float64`.
**Implementation details**
This function uses a numerically unstable algorithm. If you need numerical stability in calculations, use the slower but more stable [`varPopStable` function](#varPopStable).
**Example**
```sql
DROP TABLE IF EXISTS test_data;
CREATE TABLE test_data
(
x Int32,
y Int32
)
ENGINE = Memory;
INSERT INTO test_data VALUES (1, 2), (2, 3), (3, 5), (4, 6), (5, 8);
SELECT
covarPop(x, y) AS covar_pop
FROM test_data;
```
```response
3
```
## varPopStable
Calculates population covariance between two data columns using a stable, numerically accurate method to calculate the variance. This function is designed to provide reliable results even with large datasets or values that might cause numerical instability in other implementations.
**Syntax**
```sql
covarPopStable(x, y)
```
**Parameters**
- `x`: The first data column. [String literal](../../syntax#syntax-string-literal)
- `y`: The second data column. [Expression](../../syntax#syntax-expressions)
**Returned value**
Returns an integer of type `Float64`.
**Implementation details**
Unlike [`varPop()`](#varPop), this function uses a stable, numerically accurate algorithm to calculate the population variance to avoid issues like catastrophic cancellation or loss of precision. This function also handles `NaN` and `Inf` values correctly, excluding them from calculations.
**Example**
Query:
```sql
DROP TABLE IF EXISTS test_data;
CREATE TABLE test_data
(
x Int32,
y Int32
)
ENGINE = Memory;
INSERT INTO test_data VALUES (1, 2), (2, 9), (9, 5), (4, 6), (5, 8);
SELECT
covarPopStable(x, y) AS covar_pop_stable
FROM test_data;
```
```response
0.5999999999999999
```

View File

@ -1,18 +1,128 @@
---
title: "varSamp"
slug: /en/sql-reference/aggregate-functions/reference/varsamp
sidebar_position: 33
---
# varSamp
This page contains information on the `varSamp` and `varSampStable` ClickHouse functions.
Calculates the amount `Σ((x - x̅)^2) / (n - 1)`, where `n` is the sample size and `x̅`is the average value of `x`.
## varSamp
It represents an unbiased estimate of the variance of a random variable if passed values from its sample.
Calculate the sample variance of a data set.
Returns `Float64`. When `n <= 1`, returns `+∞`.
**Syntax**
Alias: `VAR_SAMP`.
```sql
varSamp(expr)
```
:::note
This function uses a numerically unstable algorithm. If you need [numerical stability](https://en.wikipedia.org/wiki/Numerical_stability) in calculations, use the `varSampStable` function. It works slower but provides a lower computational error.
:::
**Parameters**
- `expr`: An expression representing the data set for which you want to calculate the sample variance. [Expression](../../syntax#syntax-expressions)
**Returned value**
Returns a Float64 value representing the sample variance of the input data set.
**Implementation details**
The `varSamp()` function calculates the sample variance using the following formula:
```plaintext
∑(x - mean(x))^2 / (n - 1)
```
Where:
- `x` is each individual data point in the data set.
- `mean(x)` is the arithmetic mean of the data set.
- `n` is the number of data points in the data set.
The function assumes that the input data set represents a sample from a larger population. If you want to calculate the variance of the entire population (when you have the complete data set), you should use the [`varPop()` function](./varpop#varpop) instead.
This function uses a numerically unstable algorithm. If you need numerical stability in calculations, use the slower but more stable [`varSampStable` function](#varSampStable).
**Example**
Query:
```sql
CREATE TABLE example_table
(
id UInt64,
value Float64
)
ENGINE = MergeTree
ORDER BY id;
INSERT INTO example_table VALUES (1, 10.5), (2, 12.3), (3, 9.8), (4, 11.2), (5, 10.7);
SELECT varSamp(value) FROM example_table;
```
Response:
```response
0.8650000000000091
```
## varSampStable
Calculate the sample variance of a data set using a numerically stable algorithm.
**Syntax**
```sql
varSampStable(expr)
```
**Parameters**
- `expr`: An expression representing the data set for which you want to calculate the sample variance. [Expression](../../syntax#syntax-expressions)
**Returned value**
The `varSampStable()` function returns a Float64 value representing the sample variance of the input data set.
**Implementation details**
The `varSampStable()` function calculates the sample variance using the same formula as the [`varSamp()`](#varSamp function):
```plaintext
∑(x - mean(x))^2 / (n - 1)
```
Where:
- `x` is each individual data point in the data set.
- `mean(x)` is the arithmetic mean of the data set.
- `n` is the number of data points in the data set.
The difference between `varSampStable()` and `varSamp()` is that `varSampStable()` is designed to provide a more deterministic and stable result when dealing with floating-point arithmetic. It uses an algorithm that minimizes the accumulation of rounding errors, which can be particularly important when dealing with large data sets or data with a wide range of values.
Like `varSamp()`, the `varSampStable()` function assumes that the input data set represents a sample from a larger population. If you want to calculate the variance of the entire population (when you have the complete data set), you should use the [`varPopStable()` function](./varpop#varpopstable) instead.
**Example**
Query:
```sql
CREATE TABLE example_table
(
id UInt64,
value Float64
)
ENGINE = MergeTree
ORDER BY id;
INSERT INTO example_table VALUES (1, 10.5), (2, 12.3), (3, 9.8), (4, 11.2), (5, 10.7);
SELECT varSampStable(value) FROM example_table;
```
Response:
```response
0.865
```
This query calculates the sample variance of the `value` column in the `example_table` using the `varSampStable()` function. The result shows that the sample variance of the values `[10.5, 12.3, 9.8, 11.2, 10.7]` is approximately 0.865, which may differ slightly from the result of `varSamp()` due to the more precise handling of floating-point arithmetic.

View File

@ -19,7 +19,7 @@ empty([x])
An array is considered empty if it does not contain any elements.
:::note
Can be optimized by enabling the [optimize_functions_to_subcolumns](../../operations/settings/settings.md#optimize-functions-to-subcolumns) setting. With `optimize_functions_to_subcolumns = 1` the function reads only [size0](../../sql-reference/data-types/array.md#array-size) subcolumn instead of reading and processing the whole array column. The query `SELECT empty(arr) FROM TABLE;` transforms to `SELECT arr.size0 = 0 FROM TABLE;`.
Can be optimized by enabling the [`optimize_functions_to_subcolumns` setting](../../operations/settings/settings.md#optimize-functions-to-subcolumns). With `optimize_functions_to_subcolumns = 1` the function reads only [size0](../../sql-reference/data-types/array.md#array-size) subcolumn instead of reading and processing the whole array column. The query `SELECT empty(arr) FROM TABLE;` transforms to `SELECT arr.size0 = 0 FROM TABLE;`.
:::
The function also works for [strings](string-functions.md#empty) or [UUID](uuid-functions.md#empty).
@ -104,17 +104,416 @@ Can be optimized by enabling the [optimize_functions_to_subcolumns](../../operat
Alias: `OCTET_LENGTH`
## emptyArrayUInt8, emptyArrayUInt16, emptyArrayUInt32, emptyArrayUInt64
## emptyArrayUInt8
## emptyArrayInt8, emptyArrayInt16, emptyArrayInt32, emptyArrayInt64
Returns an empty UInt8 array.
## emptyArrayFloat32, emptyArrayFloat64
**Syntax**
## emptyArrayDate, emptyArrayDateTime
```sql
emptyArrayUInt8()
```
**Arguments**
None.
**Returned value**
An empty array.
**Examples**
Query:
```sql
SELECT emptyArrayUInt8();
```
Result:
```response
[]
```
## emptyArrayUInt16
Returns an empty UInt16 array.
**Syntax**
```sql
emptyArrayUInt16()
```
**Arguments**
None.
**Returned value**
An empty array.
**Examples**
Query:
```sql
SELECT emptyArrayUInt16();
```
Result:
```response
[]
```
## emptyArrayUInt32
Returns an empty UInt32 array.
**Syntax**
```sql
emptyArrayUInt32()
```
**Arguments**
None.
**Returned value**
An empty array.
**Examples**
Query:
```sql
SELECT emptyArrayUInt32();
```
Result:
```response
[]
```
## emptyArrayUInt64
Returns an empty UInt64 array.
**Syntax**
```sql
emptyArrayUInt64()
```
**Arguments**
None.
**Returned value**
An empty array.
**Examples**
Query:
```sql
SELECT emptyArrayUInt64();
```
Result:
```response
[]
```
## emptyArrayInt8
Returns an empty Int8 array.
**Syntax**
```sql
emptyArrayInt8()
```
**Arguments**
None.
**Returned value**
An empty array.
**Examples**
Query:
```sql
SELECT emptyArrayInt8();
```
Result:
```response
[]
```
## emptyArrayInt16
Returns an empty Int16 array.
**Syntax**
```sql
emptyArrayInt16()
```
**Arguments**
None.
**Returned value**
An empty array.
**Examples**
Query:
```sql
SELECT emptyArrayInt16();
```
Result:
```response
[]
```
## emptyArrayInt32
Returns an empty Int32 array.
**Syntax**
```sql
emptyArrayInt32()
```
**Arguments**
None.
**Returned value**
An empty array.
**Examples**
Query:
```sql
SELECT emptyArrayInt32();
```
Result:
```response
[]
```
## emptyArrayInt64
Returns an empty Int64 array.
**Syntax**
```sql
emptyArrayInt64()
```
**Arguments**
None.
**Returned value**
An empty array.
**Examples**
Query:
```sql
SELECT emptyArrayInt64();
```
Result:
```response
[]
```
## emptyArrayFloat32
Returns an empty Float32 array.
**Syntax**
```sql
emptyArrayFloat32()
```
**Arguments**
None.
**Returned value**
An empty array.
**Examples**
Query:
```sql
SELECT emptyArrayFloat32();
```
Result:
```response
[]
```
## emptyArrayFloat64
Returns an empty Float64 array.
**Syntax**
```sql
emptyArrayFloat64()
```
**Arguments**
None.
**Returned value**
An empty array.
**Examples**
Query:
```sql
SELECT emptyArrayFloat64();
```
Result:
```response
[]
```
## emptyArrayDate
Returns an empty Date array.
**Syntax**
```sql
emptyArrayDate()
```
**Arguments**
None.
**Returned value**
An empty array.
**Examples**
Query:
```sql
SELECT emptyArrayDate();
```
## emptyArrayDateTime
Returns an empty DateTime array.
**Syntax**
```sql
[]
```
**Arguments**
None.
**Returned value**
An empty array.
**Examples**
Query:
```sql
SELECT emptyArrayDateTime();
```
Result:
```response
[]
```
## emptyArrayString
Accepts zero arguments and returns an empty array of the appropriate type.
Returns an empty String array.
**Syntax**
```sql
emptyArrayString()
```
**Arguments**
None.
**Returned value**
An empty array.
**Examples**
Query:
```sql
SELECT emptyArrayString();
```
Result:
```response
[]
```
## emptyArrayToSingle

View File

@ -394,8 +394,7 @@ Result:
## toYear
Converts a date or date with time to the year number (AD) as `UInt16` value.
Returns the year component (AD) of a date or date with time.
**Syntax**
@ -431,7 +430,7 @@ Result:
## toQuarter
Converts a date or date with time to the quarter number (1-4) as `UInt8` value.
Returns the quarter (1-4) of a date or date with time.
**Syntax**
@ -465,10 +464,9 @@ Result:
└──────────────────────────────────────────────┘
```
## toMonth
Converts a date or date with time to the month number (1-12) as `UInt8` value.
Returns the month component (1-12) of a date or date with time.
**Syntax**
@ -504,7 +502,7 @@ Result:
## toDayOfYear
Converts a date or date with time to the number of the day of the year (1-366) as `UInt16` value.
Returns the number of the day within the year (1-366) of a date or date with time.
**Syntax**
@ -540,7 +538,7 @@ Result:
## toDayOfMonth
Converts a date or date with time to the number of the day in the month (1-31) as `UInt8` value.
Returns the number of the day within the month (1-31) of a date or date with time.
**Syntax**
@ -576,7 +574,7 @@ Result:
## toDayOfWeek
Converts a date or date with time to the number of the day in the week as `UInt8` value.
Returns the number of the day within the week of a date or date with time.
The two-argument form of `toDayOfWeek()` enables you to specify whether the week starts on Monday or Sunday, and whether the return value should be in the range from 0 to 6 or 1 to 7. If the mode argument is omitted, the default mode is 0. The time zone of the date can be specified as the third argument.
@ -627,7 +625,7 @@ Result:
## toHour
Converts a date with time to the number of the hour in 24-hour time (0-23) as `UInt8` value.
Returns the hour component (0-24) of a date with time.
Assumes that if clocks are moved ahead, it is by one hour and occurs at 2 a.m., and if clocks are moved back, it is by one hour and occurs at 3 a.m. (which is not always exactly when it occurs - it depends on the timezone).
@ -641,7 +639,7 @@ Alias: `HOUR`
**Arguments**
- `value` - a [Date](../data-types/date.md), [Date32](../data-types/date32.md), [DateTime](../data-types/datetime.md) or [DateTime64](../data-types/datetime64.md)
- `value` - a [DateTime](../data-types/datetime.md) or [DateTime64](../data-types/datetime64.md)
**Returned value**
@ -665,7 +663,7 @@ Result:
## toMinute
Converts a date with time to the number of the minute of the hour (0-59) as `UInt8` value.
Returns the minute component (0-59) a date with time.
**Syntax**
@ -677,7 +675,7 @@ Alias: `MINUTE`
**Arguments**
- `value` - a [Date](../data-types/date.md), [Date32](../data-types/date32.md), [DateTime](../data-types/datetime.md) or [DateTime64](../data-types/datetime64.md)
- `value` - a [DateTime](../data-types/datetime.md) or [DateTime64](../data-types/datetime64.md)
**Returned value**
@ -701,7 +699,7 @@ Result:
## toSecond
Converts a date with time to the second in the minute (0-59) as `UInt8` value. Leap seconds are not considered.
Returns the second component (0-59) of a date with time. Leap seconds are not considered.
**Syntax**
@ -713,7 +711,7 @@ Alias: `SECOND`
**Arguments**
- `value` - a [Date](../data-types/date.md), [Date32](../data-types/date32.md), [DateTime](../data-types/datetime.md) or [DateTime64](../data-types/datetime64.md)
- `value` - a [DateTime](../data-types/datetime.md) or [DateTime64](../data-types/datetime64.md)
**Returned value**
@ -735,6 +733,40 @@ Result:
└─────────────────────────────────────────────┘
```
## toMillisecond
Returns the millisecond component (0-999) of a date with time.
**Syntax**
```sql
toMillisecond(value)
```
*Arguments**
- `value` - [DateTime](../data-types/datetime.md) or [DateTime64](../data-types/datetime64.md)
Alias: `MILLISECOND`
```sql
SELECT toMillisecond(toDateTime64('2023-04-21 10:20:30.456', 3))
```
Result:
```response
┌──toMillisecond(toDateTime64('2023-04-21 10:20:30.456', 3))─┐
│ 456 │
└────────────────────────────────────────────────────────────┘
```
**Returned value**
- The millisecond in the minute (0 - 59) of the given date/time
Type: `UInt16`
## toUnixTimestamp
Converts a string, a date or a date with time to the [Unix Timestamp](https://en.wikipedia.org/wiki/Unix_time) in `UInt32` representation.

View File

@ -433,3 +433,292 @@ Result:
│ [0,1,2,3,4,5,6,7] │
└───────────────────┘
```
## mortonEncode
Calculates the Morton encoding (ZCurve) for a list of unsigned integers.
The function has two modes of operation:
- Simple
- Expanded
### Simple mode
Accepts up to 8 unsigned integers as arguments and produces a UInt64 code.
**Syntax**
```sql
mortonEncode(args)
```
**Parameters**
- `args`: up to 8 [unsigned integers](../../sql-reference/data-types/int-uint.md) or columns of the aforementioned type.
**Returned value**
- A UInt64 code
Type: [UInt64](../../sql-reference/data-types/int-uint.md)
**Example**
Query:
```sql
SELECT mortonEncode(1, 2, 3);
```
Result:
```response
53
```
### Expanded mode
Accepts a range mask ([tuple](../../sql-reference/data-types/tuple.md)) as a first argument and up to 8 [unsigned integers](../../sql-reference/data-types/int-uint.md) as other arguments.
Each number in the mask configures the amount of range expansion:<br/>
1 - no expansion<br/>
2 - 2x expansion<br/>
3 - 3x expansion<br/>
...<br/>
Up to 8x expansion.<br/>
**Syntax**
```sql
mortonEncode(range_mask, args)
```
**Parameters**
- `range_mask`: 1-8.
- `args`: up to 8 [unsigned integers](../../sql-reference/data-types/int-uint.md) or columns of the aforementioned type.
Note: when using columns for `args` the provided `range_mask` tuple should still be a constant.
**Returned value**
- A UInt64 code
Type: [UInt64](../../sql-reference/data-types/int-uint.md)
**Example**
Range expansion can be beneficial when you need a similar distribution for arguments with wildly different ranges (or cardinality)
For example: 'IP Address' (0...FFFFFFFF) and 'Country code' (0...FF).
Query:
```sql
SELECT mortonEncode((1,2), 1024, 16);
```
Result:
```response
1572864
```
Note: tuple size must be equal to the number of the other arguments.
**Example**
Morton encoding for one argument is always the argument itself:
Query:
```sql
SELECT mortonEncode(1);
```
Result:
```response
1
```
**Example**
It is also possible to expand one argument too:
Query:
```sql
SELECT mortonEncode(tuple(2), 128);
```
Result:
```response
32768
```
**Example**
You can also use column names in the function.
Query:
First create the table and insert some data.
```sql
create table morton_numbers(
n1 UInt32,
n2 UInt32,
n3 UInt16,
n4 UInt16,
n5 UInt8,
n6 UInt8,
n7 UInt8,
n8 UInt8
)
Engine=MergeTree()
ORDER BY n1 SETTINGS index_granularity = 8192, index_granularity_bytes = '10Mi';
insert into morton_numbers (*) values(1,2,3,4,5,6,7,8);
```
Use column names instead of constants as function arguments to `mortonEncode`
Query:
```sql
SELECT mortonEncode(n1, n2, n3, n4, n5, n6, n7, n8) FROM morton_numbers;
```
Result:
```response
2155374165
```
**implementation details**
Please note that you can fit only so many bits of information into Morton code as [UInt64](../../sql-reference/data-types/int-uint.md) has. Two arguments will have a range of maximum 2^32 (64/2) each, three arguments a range of max 2^21 (64/3) each and so on. All overflow will be clamped to zero.
## mortonDecode
Decodes a Morton encoding (ZCurve) into the corresponding unsigned integer tuple.
As with the `mortonEncode` function, this function has two modes of operation:
- Simple
- Expanded
### Simple mode
Accepts a resulting tuple size as the first argument and the code as the second argument.
**Syntax**
```sql
mortonDecode(tuple_size, code)
```
**Parameters**
- `tuple_size`: integer value no more than 8.
- `code`: [UInt64](../../sql-reference/data-types/int-uint.md) code.
**Returned value**
- [tuple](../../sql-reference/data-types/tuple.md) of the specified size.
Type: [UInt64](../../sql-reference/data-types/int-uint.md)
**Example**
Query:
```sql
SELECT mortonDecode(3, 53);
```
Result:
```response
["1","2","3"]
```
### Expanded mode
Accepts a range mask (tuple) as a first argument and the code as the second argument.
Each number in the mask configures the amount of range shrink:<br/>
1 - no shrink<br/>
2 - 2x shrink<br/>
3 - 3x shrink<br/>
...<br/>
Up to 8x shrink.<br/>
Range expansion can be beneficial when you need a similar distribution for arguments with wildly different ranges (or cardinality)
For example: 'IP Address' (0...FFFFFFFF) and 'Country code' (0...FF).
As with the encode function, this is limited to 8 numbers at most.
**Example**
Query:
```sql
SELECT mortonDecode(1, 1);
```
Result:
```response
["1"]
```
**Example**
It is also possible to shrink one argument:
Query:
```sql
SELECT mortonDecode(tuple(2), 32768);
```
Result:
```response
["128"]
```
**Example**
You can also use column names in the function.
First create the table and insert some data.
Query:
```sql
create table morton_numbers(
n1 UInt32,
n2 UInt32,
n3 UInt16,
n4 UInt16,
n5 UInt8,
n6 UInt8,
n7 UInt8,
n8 UInt8
)
Engine=MergeTree()
ORDER BY n1 SETTINGS index_granularity = 8192, index_granularity_bytes = '10Mi';
insert into morton_numbers (*) values(1,2,3,4,5,6,7,8);
```
Use column names instead of constants as function arguments to `mortonDecode`
Query:
```sql
select untuple(mortonDecode(8, mortonEncode(n1, n2, n3, n4, n5, n6, n7, n8))) from morton_numbers;
```
Result:
```response
1 2 3 4 5 6 7 8
```

View File

@ -10,6 +10,8 @@ sidebar_label: Nullable
Returns whether the argument is [NULL](../../sql-reference/syntax.md#null).
See also operator [`IS NULL`](../operators/index.md#is_null).
``` sql
isNull(x)
```
@ -54,6 +56,8 @@ Result:
Returns whether the argument is not [NULL](../../sql-reference/syntax.md#null-literal).
See also operator [`IS NOT NULL`](../operators/index.md#is_not_null).
``` sql
isNotNull(x)
```

View File

@ -53,6 +53,62 @@ String starting with `POLYGON`
Polygon
## readWKTPoint
The `readWKTPoint` function in ClickHouse parses a Well-Known Text (WKT) representation of a Point geometry and returns a point in the internal ClickHouse format.
### Syntax
```sql
readWKTPoint(wkt_string)
```
### Arguments
- `wkt_string`: The input WKT string representing a Point geometry.
### Returned value
The function returns a ClickHouse internal representation of the Point geometry.
### Example
```sql
SELECT readWKTPoint('POINT (1.2 3.4)');
```
```response
(1.2,3.4)
```
## readWKTRing
Parses a Well-Known Text (WKT) representation of a Polygon geometry and returns a ring (closed linestring) in the internal ClickHouse format.
### Syntax
```sql
readWKTRing(wkt_string)
```
### Arguments
- `wkt_string`: The input WKT string representing a Polygon geometry.
### Returned value
The function returns a ClickHouse internal representation of the ring (closed linestring) geometry.
### Example
```sql
SELECT readWKTRing('LINESTRING (1 1, 2 2, 3 3, 1 1)');
```
```response
[(1,1),(2,2),(3,3),(1,1)]
```
## polygonsWithinSpherical
Returns true or false depending on whether or not one polygon lies completely inside another polygon. Reference https://www.boost.org/doc/libs/1_62_0/libs/geometry/doc/html/geometry/reference/algorithms/within/within_2.html

View File

@ -5,80 +5,372 @@ sidebar_label: JSON
---
There are two sets of functions to parse JSON.
- `visitParam*` (`simpleJSON*`) is made to parse a special very limited subset of a JSON, but these functions are extremely fast.
- `simpleJSON*` (`visitParam*`) is made to parse a special very limited subset of a JSON, but these functions are extremely fast.
- `JSONExtract*` is made to parse normal JSON.
# visitParam functions
# simpleJSON/visitParam functions
ClickHouse has special functions for working with simplified JSON. All these JSON functions are based on strong assumptions about what the JSON can be, but they try to do as little as possible to get the job done.
The following assumptions are made:
1. The field name (function argument) must be a constant.
2. The field name is somehow canonically encoded in JSON. For example: `visitParamHas('{"abc":"def"}', 'abc') = 1`, but `visitParamHas('{"\\u0061\\u0062\\u0063":"def"}', 'abc') = 0`
2. The field name is somehow canonically encoded in JSON. For example: `simpleJSONHas('{"abc":"def"}', 'abc') = 1`, but `simpleJSONHas('{"\\u0061\\u0062\\u0063":"def"}', 'abc') = 0`
3. Fields are searched for on any nesting level, indiscriminately. If there are multiple matching fields, the first occurrence is used.
4. The JSON does not have space characters outside of string literals.
## visitParamHas(params, name)
## simpleJSONHas
Checks whether there is a field with the `name` name.
Checks whether there is a field named `field_name`. The result is `UInt8`.
Alias: `simpleJSONHas`.
**Syntax**
## visitParamExtractUInt(params, name)
Parses UInt64 from the value of the field named `name`. If this is a string field, it tries to parse a number from the beginning of the string. If the field does not exist, or it exists but does not contain a number, it returns 0.
Alias: `simpleJSONExtractUInt`.
## visitParamExtractInt(params, name)
The same as for Int64.
Alias: `simpleJSONExtractInt`.
## visitParamExtractFloat(params, name)
The same as for Float64.
Alias: `simpleJSONExtractFloat`.
## visitParamExtractBool(params, name)
Parses a true/false value. The result is UInt8.
Alias: `simpleJSONExtractBool`.
## visitParamExtractRaw(params, name)
Returns the value of a field, including separators.
Alias: `simpleJSONExtractRaw`.
Examples:
``` sql
visitParamExtractRaw('{"abc":"\\n\\u0000"}', 'abc') = '"\\n\\u0000"';
visitParamExtractRaw('{"abc":{"def":[1,2,3]}}', 'abc') = '{"def":[1,2,3]}';
```sql
simpleJSONHas(json, field_name)
```
## visitParamExtractString(params, name)
**Parameters**
Parses the string in double quotes. The value is unescaped. If unescaping failed, it returns an empty string.
- `json`: The JSON in which the field is searched for. [String](../../sql-reference/data-types/string.md#string)
- `field_name`: The name of the field to search for. [String literal](../syntax#string)
Alias: `simpleJSONExtractString`.
**Returned value**
Examples:
It returns `1` if the field exists, `0` otherwise.
``` sql
visitParamExtractString('{"abc":"\\n\\u0000"}', 'abc') = '\n\0';
visitParamExtractString('{"abc":"\\u263a"}', 'abc') = '☺';
visitParamExtractString('{"abc":"\\u263"}', 'abc') = '';
visitParamExtractString('{"abc":"hello}', 'abc') = '';
**Example**
Query:
```sql
CREATE TABLE jsons
(
`json` String
)
ENGINE = Memory;
INSERT INTO jsons VALUES ('{"foo":"true","qux":1}');
SELECT simpleJSONHas(json, 'foo') FROM jsons;
SELECT simpleJSONHas(json, 'bar') FROM jsons;
```
```response
1
0
```
## simpleJSONExtractUInt
Parses `UInt64` from the value of the field named `field_name`. If this is a string field, it tries to parse a number from the beginning of the string. If the field does not exist, or it exists but does not contain a number, it returns `0`.
**Syntax**
```sql
simpleJSONExtractUInt(json, field_name)
```
**Parameters**
- `json`: The JSON in which the field is searched for. [String](../../sql-reference/data-types/string.md#string)
- `field_name`: The name of the field to search for. [String literal](../syntax#string)
**Returned value**
It returns the number parsed from the field if the field exists and contains a number, `0` otherwise.
**Example**
Query:
```sql
CREATE TABLE jsons
(
`json` String
)
ENGINE = Memory;
INSERT INTO jsons VALUES ('{"foo":"4e3"}');
INSERT INTO jsons VALUES ('{"foo":3.4}');
INSERT INTO jsons VALUES ('{"foo":5}');
INSERT INTO jsons VALUES ('{"foo":"not1number"}');
INSERT INTO jsons VALUES ('{"baz":2}');
SELECT simpleJSONExtractUInt(json, 'foo') FROM jsons ORDER BY json;
```
```response
0
4
0
3
5
```
## simpleJSONExtractInt
Parses `Int64` from the value of the field named `field_name`. If this is a string field, it tries to parse a number from the beginning of the string. If the field does not exist, or it exists but does not contain a number, it returns `0`.
**Syntax**
```sql
simpleJSONExtractInt(json, field_name)
```
**Parameters**
- `json`: The JSON in which the field is searched for. [String](../../sql-reference/data-types/string.md#string)
- `field_name`: The name of the field to search for. [String literal](../syntax#string)
**Returned value**
It returns the number parsed from the field if the field exists and contains a number, `0` otherwise.
**Example**
Query:
```sql
CREATE TABLE jsons
(
`json` String
)
ENGINE = Memory;
INSERT INTO jsons VALUES ('{"foo":"-4e3"}');
INSERT INTO jsons VALUES ('{"foo":-3.4}');
INSERT INTO jsons VALUES ('{"foo":5}');
INSERT INTO jsons VALUES ('{"foo":"not1number"}');
INSERT INTO jsons VALUES ('{"baz":2}');
SELECT simpleJSONExtractInt(json, 'foo') FROM jsons ORDER BY json;
```
```response
0
-4
0
-3
5
```
## simpleJSONExtractFloat
Parses `Float64` from the value of the field named `field_name`. If this is a string field, it tries to parse a number from the beginning of the string. If the field does not exist, or it exists but does not contain a number, it returns `0`.
**Syntax**
```sql
simpleJSONExtractFloat(json, field_name)
```
**Parameters**
- `json`: The JSON in which the field is searched for. [String](../../sql-reference/data-types/string.md#string)
- `field_name`: The name of the field to search for. [String literal](../syntax#string)
**Returned value**
It returns the number parsed from the field if the field exists and contains a number, `0` otherwise.
**Example**
Query:
```sql
CREATE TABLE jsons
(
`json` String
)
ENGINE = Memory;
INSERT INTO jsons VALUES ('{"foo":"-4e3"}');
INSERT INTO jsons VALUES ('{"foo":-3.4}');
INSERT INTO jsons VALUES ('{"foo":5}');
INSERT INTO jsons VALUES ('{"foo":"not1number"}');
INSERT INTO jsons VALUES ('{"baz":2}');
SELECT simpleJSONExtractFloat(json, 'foo') FROM jsons ORDER BY json;
```
```response
0
-4000
0
-3.4
5
```
## simpleJSONExtractBool
Parses a true/false value from the value of the field named `field_name`. The result is `UInt8`.
**Syntax**
```sql
simpleJSONExtractBool(json, field_name)
```
**Parameters**
- `json`: The JSON in which the field is searched for. [String](../../sql-reference/data-types/string.md#string)
- `field_name`: The name of the field to search for. [String literal](../syntax#string)
**Returned value**
It returns `1` if the value of the field is `true`, `0` otherwise. This means this function will return `0` including (and not only) in the following cases:
- If the field doesn't exists.
- If the field contains `true` as a string, e.g.: `{"field":"true"}`.
- If the field contains `1` as a numerical value.
**Example**
Query:
```sql
CREATE TABLE jsons
(
`json` String
)
ENGINE = Memory;
INSERT INTO jsons VALUES ('{"foo":false,"bar":true}');
INSERT INTO jsons VALUES ('{"foo":"true","qux":1}');
SELECT simpleJSONExtractBool(json, 'bar') FROM jsons ORDER BY json;
SELECT simpleJSONExtractBool(json, 'foo') FROM jsons ORDER BY json;
```
```response
0
1
0
0
```
## simpleJSONExtractRaw
Returns the value of the field named `field_name` as a `String`, including separators.
**Syntax**
```sql
simpleJSONExtractRaw(json, field_name)
```
**Parameters**
- `json`: The JSON in which the field is searched for. [String](../../sql-reference/data-types/string.md#string)
- `field_name`: The name of the field to search for. [String literal](../syntax#string)
**Returned value**
It returns the value of the field as a [`String`](../../sql-reference/data-types/string.md#string), including separators if the field exists, or an empty `String` otherwise.
**Example**
Query:
```sql
CREATE TABLE jsons
(
`json` String
)
ENGINE = Memory;
INSERT INTO jsons VALUES ('{"foo":"-4e3"}');
INSERT INTO jsons VALUES ('{"foo":-3.4}');
INSERT INTO jsons VALUES ('{"foo":5}');
INSERT INTO jsons VALUES ('{"foo":{"def":[1,2,3]}}');
INSERT INTO jsons VALUES ('{"baz":2}');
SELECT simpleJSONExtractRaw(json, 'foo') FROM jsons ORDER BY json;
```
```response
"-4e3"
-3.4
5
{"def":[1,2,3]}
```
## simpleJSONExtractString
Parses `String` in double quotes from the value of the field named `field_name`.
**Syntax**
```sql
simpleJSONExtractString(json, field_name)
```
**Parameters**
- `json`: The JSON in which the field is searched for. [String](../../sql-reference/data-types/string.md#string)
- `field_name`: The name of the field to search for. [String literal](../syntax#string)
**Returned value**
It returns the value of a field as a [`String`](../../sql-reference/data-types/string.md#string), including separators. The value is unescaped. It returns an empty `String`: if the field doesn't contain a double quoted string, if unescaping fails or if the field doesn't exist.
**Implementation details**
There is currently no support for code points in the format `\uXXXX\uYYYY` that are not from the basic multilingual plane (they are converted to CESU-8 instead of UTF-8).
**Example**
Query:
```sql
CREATE TABLE jsons
(
`json` String
)
ENGINE = Memory;
INSERT INTO jsons VALUES ('{"foo":"\\n\\u0000"}');
INSERT INTO jsons VALUES ('{"foo":"\\u263"}');
INSERT INTO jsons VALUES ('{"foo":"\\u263a"}');
INSERT INTO jsons VALUES ('{"foo":"hello}');
SELECT simpleJSONExtractString(json, 'foo') FROM jsons ORDER BY json;
```
```response
\n\0
```
## visitParamHas
This function is [an alias of `simpleJSONHas`](./json-functions#simplejsonhas).
## visitParamExtractUInt
This function is [an alias of `simpleJSONExtractUInt`](./json-functions#simplejsonextractuint).
## visitParamExtractInt
This function is [an alias of `simpleJSONExtractInt`](./json-functions#simplejsonextractint).
## visitParamExtractFloat
This function is [an alias of `simpleJSONExtractFloat`](./json-functions#simplejsonextractfloat).
## visitParamExtractBool
This function is [an alias of `simpleJSONExtractBool`](./json-functions#simplejsonextractbool).
## visitParamExtractRaw
This function is [an alias of `simpleJSONExtractRaw`](./json-functions#simplejsonextractraw).
## visitParamExtractString
This function is [an alias of `simpleJSONExtractString`](./json-functions#simplejsonextractstring).
# JSONExtract functions
The following functions are based on [simdjson](https://github.com/lemire/simdjson) designed for more complex JSON parsing requirements.

View File

@ -299,6 +299,18 @@ sin(x)
Type: [Float*](../../sql-reference/data-types/float.md).
**Example**
Query:
```sql
SELECT sin(1.23);
```
```response
0.9424888019316975
```
## cos
Returns the cosine of the argument.

File diff suppressed because it is too large Load Diff

View File

@ -11,79 +11,173 @@ elimination](../../sql-reference/functions/index.md#common-subexpression-elimina
function return different random values.
Related content
- Blog: [Generating random data in ClickHouse](https://clickhouse.com/blog/generating-random-test-distribution-data-for-clickhouse)
:::note
The random numbers are generated by non-cryptographic algorithms.
:::
## rand, rand32
## rand
Returns a random UInt32 number, evenly distributed across the range of all possible UInt32 numbers.
Returns a random UInt32 number with uniform distribution.
Uses a linear congruential generator.
Uses a linear congruential generator with an initial state obtained from the system, which means that while it appears random, it's not truly random and can be predictable if the initial state is known. For scenarios where true randomness is crucial, consider using alternative methods like system-level calls or integrating with external libraries.
### Syntax
```sql
rand()
```
Alias: `rand32`
### Arguments
None.
### Returned value
Returns a number of type UInt32.
### Example
```sql
SELECT rand();
```
```response
1569354847 -- Note: The actual output will be a random number, not the specific number shown in the example
```
## rand64
Returns a random UInt64 number, evenly distributed across the range of all possible UInt64 numbers.
Returns a random UInt64 integer (UInt64) number
Uses a linear congruential generator.
### Syntax
```sql
rand64()
```
### Arguments
None.
### Returned value
Returns a number UInt64 number with uniform distribution.
Uses a linear congruential generator with an initial state obtained from the system, which means that while it appears random, it's not truly random and can be predictable if the initial state is known. For scenarios where true randomness is crucial, consider using alternative methods like system-level calls or integrating with external libraries.
### Example
```sql
SELECT rand64();
```
```response
15030268859237645412 -- Note: The actual output will be a random number, not the specific number shown in the example.
```
## randCanonical
Returns a random Float64 value, evenly distributed in interval [0, 1).
Returns a random Float64 number.
### Syntax
```sql
randCanonical()
```
### Arguments
None.
### Returned value
Returns a Float64 value between 0 (inclusive) and 1 (exclusive).
### Example
```sql
SELECT randCanonical();
```
```response
0.3452178901234567 - Note: The actual output will be a random Float64 number between 0 and 1, not the specific number shown in the example.
```
## randConstant
Like `rand` but produces a constant column with a random value.
Generates a single constant column filled with a random value. Unlike `rand`, this function ensures the same random value appears in every row of the generated column, making it useful for scenarios requiring a consistent random seed across rows in a single query.
**Example**
### Syntax
``` sql
SELECT rand(), rand(1), rand(number), randConstant(), randConstant(1), randConstant(number)
FROM numbers(3)
```sql
randConstant([x]);
```
Result:
### Arguments
``` result
┌─────rand()─┬────rand(1)─┬─rand(number)─┬─randConstant()─┬─randConstant(1)─┬─randConstant(number)─┐
│ 3047369878 │ 4132449925 │ 4044508545 │ 2740811946 │ 4229401477 │ 1924032898 │
│ 2938880146 │ 1267722397 │ 4154983056 │ 2740811946 │ 4229401477 │ 1924032898 │
│ 956619638 │ 4238287282 │ 1104342490 │ 2740811946 │ 4229401477 │ 1924032898 │
└────────────┴────────────┴──────────────┴────────────────┴─────────────────┴──────────────────────┘
- **[x] (Optional):** An optional expression that influences the generated random value. Even if provided, the resulting value will still be constant within the same query execution. Different queries using the same expression will likely generate different constant values.
### Returned value
Returns a column of type UInt32 containing the same random value in each row.
### Implementation details
The actual output will be different for each query execution, even with the same optional expression. The optional parameter may not significantly change the generated value compared to using `randConstant` alone.
### Examples
```sql
SELECT randConstant() AS random_value;
```
```response
| random_value |
|--------------|
| 1234567890 |
```
```sql
SELECT randConstant(10) AS random_value;
```
```response
| random_value |
|--------------|
| 9876543210 |
```
## randUniform
Returns a random Float64 drawn uniformly from interval [`min`, `max`) ([continuous uniform distribution](https://en.wikipedia.org/wiki/Continuous_uniform_distribution)).
Returns a random Float64 drawn uniformly from interval [`min`, `max`].
**Syntax**
### Syntax
``` sql
```sql
randUniform(min, max)
```
**Arguments**
### Arguments
- `min` - `Float64` - left boundary of the range,
- `max` - `Float64` - right boundary of the range.
**Returned value**
### Returned value
- Random number.
A random number of type [Float64](/docs/en/sql-reference/data-types/float.md).
Type: [Float64](/docs/en/sql-reference/data-types/float.md).
### Example
**Example**
``` sql
```sql
SELECT randUniform(5.5, 10) FROM numbers(5)
```
Result:
``` result
```response
┌─randUniform(5.5, 10)─┐
│ 8.094978491443102 │
│ 7.3181248914450885 │
@ -99,7 +193,7 @@ Returns a random Float64 drawn from a [normal distribution](https://en.wikipedia
**Syntax**
``` sql
```sql
randNormal(mean, variance)
```
@ -116,13 +210,13 @@ Type: [Float64](/docs/en/sql-reference/data-types/float.md).
**Example**
``` sql
```sql
SELECT randNormal(10, 2) FROM numbers(5)
```
Result:
``` result
```result
┌──randNormal(10, 2)─┐
│ 13.389228911709653 │
│ 8.622949707401295 │
@ -138,7 +232,7 @@ Returns a random Float64 drawn from a [log-normal distribution](https://en.wikip
**Syntax**
``` sql
```sql
randLogNormal(mean, variance)
```
@ -155,13 +249,13 @@ Type: [Float64](/docs/en/sql-reference/data-types/float.md).
**Example**
``` sql
```sql
SELECT randLogNormal(100, 5) FROM numbers(5)
```
Result:
``` result
```result
┌─randLogNormal(100, 5)─┐
│ 1.295699673937363e48 │
│ 9.719869109186684e39 │
@ -177,7 +271,7 @@ Returns a random UInt64 drawn from a [binomial distribution](https://en.wikipedi
**Syntax**
``` sql
```sql
randBinomial(experiments, probability)
```
@ -194,13 +288,13 @@ Type: [UInt64](/docs/en/sql-reference/data-types/int-uint.md).
**Example**
``` sql
```sql
SELECT randBinomial(100, .75) FROM numbers(5)
```
Result:
``` result
```result
┌─randBinomial(100, 0.75)─┐
│ 74 │
│ 78 │
@ -216,7 +310,7 @@ Returns a random UInt64 drawn from a [negative binomial distribution](https://en
**Syntax**
``` sql
```sql
randNegativeBinomial(experiments, probability)
```
@ -233,13 +327,13 @@ Type: [UInt64](/docs/en/sql-reference/data-types/int-uint.md).
**Example**
``` sql
```sql
SELECT randNegativeBinomial(100, .75) FROM numbers(5)
```
Result:
``` result
```result
┌─randNegativeBinomial(100, 0.75)─┐
│ 33 │
│ 32 │
@ -255,7 +349,7 @@ Returns a random UInt64 drawn from a [Poisson distribution](https://en.wikipedia
**Syntax**
``` sql
```sql
randPoisson(n)
```
@ -271,13 +365,13 @@ Type: [UInt64](/docs/en/sql-reference/data-types/int-uint.md).
**Example**
``` sql
```sql
SELECT randPoisson(10) FROM numbers(5)
```
Result:
``` result
```result
┌─randPoisson(10)─┐
│ 8 │
│ 8 │
@ -293,7 +387,7 @@ Returns a random UInt64 drawn from a [Bernoulli distribution](https://en.wikiped
**Syntax**
``` sql
```sql
randBernoulli(probability)
```
@ -309,13 +403,13 @@ Type: [UInt64](/docs/en/sql-reference/data-types/int-uint.md).
**Example**
``` sql
```sql
SELECT randBernoulli(.75) FROM numbers(5)
```
Result:
``` result
```result
┌─randBernoulli(0.75)─┐
│ 1 │
│ 1 │
@ -331,7 +425,7 @@ Returns a random Float64 drawn from a [exponential distribution](https://en.wiki
**Syntax**
``` sql
```sql
randExponential(lambda)
```
@ -347,13 +441,13 @@ Type: [Float64](/docs/en/sql-reference/data-types/float.md).
**Example**
``` sql
```sql
SELECT randExponential(1/10) FROM numbers(5)
```
Result:
``` result
```result
┌─randExponential(divide(1, 10))─┐
│ 44.71628934340778 │
│ 4.211013337903262 │
@ -369,7 +463,7 @@ Returns a random Float64 drawn from a [Chi-square distribution](https://en.wikip
**Syntax**
``` sql
```sql
randChiSquared(degree_of_freedom)
```
@ -385,13 +479,13 @@ Type: [Float64](/docs/en/sql-reference/data-types/float.md).
**Example**
``` sql
```sql
SELECT randChiSquared(10) FROM numbers(5)
```
Result:
``` result
```result
┌─randChiSquared(10)─┐
│ 10.015463656521543 │
│ 9.621799919882768 │
@ -407,7 +501,7 @@ Returns a random Float64 drawn from a [Student's t-distribution](https://en.wiki
**Syntax**
``` sql
```sql
randStudentT(degree_of_freedom)
```
@ -423,13 +517,13 @@ Type: [Float64](/docs/en/sql-reference/data-types/float.md).
**Example**
``` sql
```sql
SELECT randStudentT(10) FROM numbers(5)
```
Result:
``` result
```result
┌─────randStudentT(10)─┐
│ 1.2217309938538725 │
│ 1.7941971681200541 │
@ -445,7 +539,7 @@ Returns a random Float64 drawn from a [F-distribution](https://en.wikipedia.org/
**Syntax**
``` sql
```sql
randFisherF(d1, d2)
```
@ -462,13 +556,13 @@ Type: [Float64](/docs/en/sql-reference/data-types/float.md).
**Example**
``` sql
```sql
SELECT randFisherF(10, 3) FROM numbers(5)
```
Result:
``` result
```result
┌──randFisherF(10, 3)─┐
│ 7.286287504216609 │
│ 0.26590779413050386 │
@ -484,7 +578,7 @@ Generates a string of the specified length filled with random bytes (including z
**Syntax**
``` sql
```sql
randomString(length)
```
@ -502,13 +596,13 @@ Type: [String](../../sql-reference/data-types/string.md).
Query:
``` sql
```sql
SELECT randomString(30) AS str, length(str) AS len FROM numbers(2) FORMAT Vertical;
```
Result:
``` text
```text
Row 1:
──────
str: 3 G : pT ?w тi k aV f6
@ -526,7 +620,7 @@ Generates a binary string of the specified length filled with random bytes (incl
**Syntax**
``` sql
```sql
randomFixedString(length);
```
@ -563,7 +657,7 @@ If you pass `length < 0`, the behavior of the function is undefined.
**Syntax**
``` sql
```sql
randomPrintableASCII(length)
```
@ -579,11 +673,11 @@ Type: [String](../../sql-reference/data-types/string.md)
**Example**
``` sql
```sql
SELECT number, randomPrintableASCII(30) as str, length(str) FROM system.numbers LIMIT 3
```
``` text
```text
┌─number─┬─str────────────────────────────┬─length(randomPrintableASCII(30))─┐
│ 0 │ SuiCOSTvC0csfABSw=UcSzp2.`rv8x │ 30 │
│ 1 │ 1Ag NlJ &RCN:*>HVPG;PE-nO"SUFD │ 30 │
@ -597,7 +691,7 @@ Generates a random string of a specified length. Result string contains valid UT
**Syntax**
``` sql
```sql
randomStringUTF8(length);
```
@ -635,11 +729,12 @@ Flips the bits of String or FixedString `s`, each with probability `prob`.
**Syntax**
``` sql
```sql
fuzzBits(s, prob)
```
**Arguments**
- `s` - `String` or `FixedString`,
- `prob` - constant `Float32/64` between 0.0 and 1.0.
@ -649,14 +744,14 @@ Fuzzed string with same type as `s`.
**Example**
``` sql
```sql
SELECT fuzzBits(materialize('abacaba'), 0.1)
FROM numbers(3)
```
Result:
``` result
```result
┌─fuzzBits(materialize('abacaba'), 0.1)─┐
│ abaaaja │
│ a*cjab+ │

View File

@ -588,8 +588,41 @@ Result:
## substringUTF8
Like `substring` but for Unicode code points. Assumes that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
Returns the substring of a string `s` which starts at the specified byte index `offset` for Unicode code points. Byte counting starts from `1`. If `offset` is `0`, an empty string is returned. If `offset` is negative, the substring starts `pos` characters from the end of the string, rather than from the beginning. An optional argument `length` specifies the maximum number of bytes the returned substring may have.
Assumes that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
**Syntax**
```sql
substringUTF8(s, offset[, length])
```
**Arguments**
- `s`: The string to calculate a substring from. [String](../../sql-reference/data-types/string.md), [FixedString](../../sql-reference/data-types/fixedstring.md) or [Enum](../../sql-reference/data-types/enum.md)
- `offset`: The starting position of the substring in `s` . [(U)Int*](../../sql-reference/data-types/int-uint.md).
- `length`: The maximum length of the substring. [(U)Int*](../../sql-reference/data-types/int-uint.md). Optional.
**Returned value**
A substring of `s` with `length` many bytes, starting at index `offset`.
**Implementation details**
Assumes that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
**Example**
```sql
SELECT 'Täglich grüßt das Murmeltier.' AS str,
substringUTF8(str, 9),
substringUTF8(str, 9, 5)
```
```response
Täglich grüßt das Murmeltier. grüßt das Murmeltier. grüßt
```
## substringIndex
@ -624,7 +657,39 @@ Result:
## substringIndexUTF8
Like `substringIndex` but for Unicode code points. Assumes that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
Returns the substring of `s` before `count` occurrences of the delimiter `delim`, specifically for Unicode code points.
Assumes that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
**Syntax**
```sql
substringIndexUTF8(s, delim, count)
```
**Arguments**
- `s`: The string to extract substring from. [String](../../sql-reference/data-types/string.md).
- `delim`: The character to split. [String](../../sql-reference/data-types/string.md).
- `count`: The number of occurrences of the delimiter to count before extracting the substring. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. [UInt or Int](../data-types/int-uint.md)
**Returned value**
A substring [String](../../sql-reference/data-types/string.md) of `s` before `count` occurrences of `delim`.
**Implementation details**
Assumes that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
**Example**
```sql
SELECT substringIndexUTF8('www.straßen-in-europa.de', '.', 2)
```
```response
www.straßen-in-europa
```
## appendTrailingCharIfAbsent

View File

@ -968,7 +968,7 @@ Converts a numeric value to String with the number of fractional digits in the o
toDecimalString(number, scale)
```
**Parameters**
**Arguments**
- `number` — Value to be represented as String, [Int, UInt](/docs/en/sql-reference/data-types/int-uint.md), [Float](/docs/en/sql-reference/data-types/float.md), [Decimal](/docs/en/sql-reference/data-types/decimal.md),
- `scale` — Number of fractional digits, [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
@ -1261,7 +1261,7 @@ Converts input value `x` to the specified data type `T`. Always returns [Nullabl
accurateCastOrNull(x, T)
```
**Parameters**
**Arguments**
- `x` — Input value.
- `T` — The name of the returned data type.
@ -1314,7 +1314,7 @@ Converts input value `x` to the specified data type `T`. Returns default type va
accurateCastOrDefault(x, T)
```
**Parameters**
**Arguments**
- `x` — Input value.
- `T` — The name of the returned data type.
@ -1675,7 +1675,7 @@ Same as [parseDateTimeBestEffort](#parsedatetimebesteffort) function but also pa
parseDateTime64BestEffort(time_string [, precision [, time_zone]])
```
**Parameters**
**Arguments**
- `time_string` — String containing a date or date with time to convert. [String](/docs/en/sql-reference/data-types/string.md).
- `precision` — Required precision. `3` — for milliseconds, `6` — for microseconds. Default — `3`. Optional. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
@ -1990,7 +1990,7 @@ Extracts the timestamp component of a [Snowflake ID](https://en.wikipedia.org/wi
snowflakeToDateTime(value[, time_zone])
```
**Parameters**
**Arguments**
- `value` — Snowflake ID. [Int64](/docs/en/sql-reference/data-types/int-uint.md).
- `time_zone` — [Timezone](/docs/en/operations/server-configuration-parameters/settings.md/#server_configuration_parameters-timezone). The function parses `time_string` according to the timezone. Optional. [String](/docs/en/sql-reference/data-types/string.md).
@ -2026,7 +2026,7 @@ Extracts the timestamp component of a [Snowflake ID](https://en.wikipedia.org/wi
snowflakeToDateTime64(value[, time_zone])
```
**Parameters**
**Arguments**
- `value` — Snowflake ID. [Int64](/docs/en/sql-reference/data-types/int-uint.md).
- `time_zone` — [Timezone](/docs/en/operations/server-configuration-parameters/settings.md/#server_configuration_parameters-timezone). The function parses `time_string` according to the timezone. Optional. [String](/docs/en/sql-reference/data-types/string.md).
@ -2062,7 +2062,7 @@ Converts a [DateTime](/docs/en/sql-reference/data-types/datetime.md) value to th
dateTimeToSnowflake(value)
```
**Parameters**
**Arguments**
- `value` — Date with time. [DateTime](/docs/en/sql-reference/data-types/datetime.md).
@ -2096,7 +2096,7 @@ Convert a [DateTime64](/docs/en/sql-reference/data-types/datetime64.md) to the f
dateTime64ToSnowflake(value)
```
**Parameters**
**Arguments**
- `value` — Date with time. [DateTime64](/docs/en/sql-reference/data-types/datetime64.md).

View File

@ -155,7 +155,7 @@ Configuration example:
cutToFirstSignificantSubdomain(URL, TLD)
```
**Parameters**
**Arguments**
- `URL` — URL. [String](../../sql-reference/data-types/string.md).
- `TLD` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).
@ -209,7 +209,7 @@ Configuration example:
cutToFirstSignificantSubdomainCustomWithWWW(URL, TLD)
```
**Parameters**
**Arguments**
- `URL` — URL. [String](../../sql-reference/data-types/string.md).
- `TLD` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).
@ -263,7 +263,7 @@ Configuration example:
firstSignificantSubdomainCustom(URL, TLD)
```
**Parameters**
**Arguments**
- `URL` — URL. [String](../../sql-reference/data-types/string.md).
- `TLD` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).

View File

@ -353,7 +353,7 @@ For efficiency, the `and` and `or` functions accept any number of arguments. The
ClickHouse supports the `IS NULL` and `IS NOT NULL` operators.
### IS NULL
### IS NULL {#is_null}
- For [Nullable](../../sql-reference/data-types/nullable.md) type values, the `IS NULL` operator returns:
- `1`, if the value is `NULL`.
@ -374,7 +374,7 @@ SELECT x+100 FROM t_null WHERE y IS NULL
└──────────────┘
```
### IS NOT NULL
### IS NOT NULL {#is_not_null}
- For [Nullable](../../sql-reference/data-types/nullable.md) type values, the `IS NOT NULL` operator returns:
- `0`, if the value is `NULL`.

View File

@ -335,7 +335,7 @@ The `ALTER` query lets you create and delete separate elements (columns) in nest
There is no support for deleting columns in the primary key or the sampling key (columns that are used in the `ENGINE` expression). Changing the type for columns that are included in the primary key is only possible if this change does not cause the data to be modified (for example, you are allowed to add values to an Enum or to change a type from `DateTime` to `UInt32`).
If the `ALTER` query is not sufficient to make the table changes you need, you can create a new table, copy the data to it using the [INSERT SELECT](/docs/en/sql-reference/statements/insert-into.md/#inserting-the-results-of-select) query, then switch the tables using the [RENAME](/docs/en/sql-reference/statements/rename.md/#rename-table) query and delete the old table. You can use the [clickhouse-copier](/docs/en/operations/utilities/clickhouse-copier.md) as an alternative to the `INSERT SELECT` query.
If the `ALTER` query is not sufficient to make the table changes you need, you can create a new table, copy the data to it using the [INSERT SELECT](/docs/en/sql-reference/statements/insert-into.md/#inserting-the-results-of-select) query, then switch the tables using the [RENAME](/docs/en/sql-reference/statements/rename.md/#rename-table) query and delete the old table.
The `ALTER` query blocks all reads and writes for the table. In other words, if a long `SELECT` is running at the time of the `ALTER` query, the `ALTER` query will wait for it to complete. At the same time, all new queries to the same table will wait while this `ALTER` is running.

View File

@ -202,6 +202,13 @@ Hierarchy of privileges:
- `S3`
- [dictGet](#grant-dictget)
- [displaySecretsInShowAndSelect](#grant-display-secrets)
- [NAMED COLLECTION ADMIN](#grant-named-collection-admin)
- `CREATE NAMED COLLECTION`
- `DROP NAMED COLLECTION`
- `ALTER NAMED COLLECTION`
- `SHOW NAMED COLLECTIONS`
- `SHOW NAMED COLLECTIONS SECRETS`
- `NAMED COLLECTION`
Examples of how this hierarchy is treated:
@ -498,6 +505,25 @@ and
[`format_display_secrets_in_show_and_select` format setting](../../operations/settings/formats#format_display_secrets_in_show_and_select)
are turned on.
### NAMED COLLECTION ADMIN
Allows a certain operation on a specified named collection. Before version 23.7 it was called NAMED COLLECTION CONTROL, and after 23.7 NAMED COLLECTION ADMIN was added and NAMED COLLECTION CONTROL is preserved as an alias.
- `NAMED COLLECTION ADMIN`. Level: `NAMED_COLLECTION`. Aliases: `NAMED COLLECTION CONTROL`
- `CREATE NAMED COLLECTION`. Level: `NAMED_COLLECTION`
- `DROP NAMED COLLECTION`. Level: `NAMED_COLLECTION`
- `ALTER NAMED COLLECTION`. Level: `NAMED_COLLECTION`
- `SHOW NAMED COLLECTIONS`. Level: `NAMED_COLLECTION`. Aliases: `SHOW NAMED COLLECTIONS`
- `SHOW NAMED COLLECTIONS SECRETS`. Level: `NAMED_COLLECTION`. Aliases: `SHOW NAMED COLLECTIONS SECRETS`
- `NAMED COLLECTION`. Level: `NAMED_COLLECTION`. Aliases: `NAMED COLLECTION USAGE, USE NAMED COLLECTION`
Unlike all other grants (CREATE, DROP, ALTER, SHOW) grant NAMED COLLECTION was added only in 23.7, while all others were added earlier - in 22.12.
**Examples**
Assuming a named collection is called abc, we grant privilege CREATE NAMED COLLECTION to user john.
- `GRANT CREATE NAMED COLLECTION ON abc TO john`
### ALL
Grants all the privileges on regulated entity to a user account or a role.

View File

@ -13,13 +13,6 @@ a system table called `system.dropped_tables`.
If you have a materialized view without a `TO` clause associated with the dropped table, then you will also have to UNDROP the inner table of that view.
:::note
UNDROP TABLE is experimental. To use it add this setting:
```sql
set allow_experimental_undrop_table_query = 1;
```
:::
:::tip
Also see [DROP TABLE](/docs/en/sql-reference/statements/drop.md)
:::
@ -32,60 +25,53 @@ UNDROP TABLE [db.]name [UUID '<uuid>'] [ON CLUSTER cluster]
**Example**
``` sql
set allow_experimental_undrop_table_query = 1;
```
```sql
CREATE TABLE undropMe
CREATE TABLE tab
(
`id` UInt8
)
ENGINE = MergeTree
ORDER BY id
```
ORDER BY id;
DROP TABLE tab;
```sql
DROP TABLE undropMe
```
```sql
SELECT *
FROM system.dropped_tables
FORMAT Vertical
FORMAT Vertical;
```
```response
Row 1:
──────
index: 0
database: default
table: undropMe
table: tab
uuid: aa696a1a-1d70-4e60-a841-4c80827706cc
engine: MergeTree
metadata_dropped_path: /var/lib/clickhouse/metadata_dropped/default.undropMe.aa696a1a-1d70-4e60-a841-4c80827706cc.sql
metadata_dropped_path: /var/lib/clickhouse/metadata_dropped/default.tab.aa696a1a-1d70-4e60-a841-4c80827706cc.sql
table_dropped_time: 2023-04-05 14:12:12
1 row in set. Elapsed: 0.001 sec.
```
```sql
UNDROP TABLE undropMe
```
```response
Ok.
```
```sql
UNDROP TABLE tab;
SELECT *
FROM system.dropped_tables
FORMAT Vertical
```
FORMAT Vertical;
```response
Ok.
0 rows in set. Elapsed: 0.001 sec.
```
```sql
DESCRIBE TABLE undropMe
FORMAT Vertical
DESCRIBE TABLE tab
FORMAT Vertical;
```
```response
Row 1:
──────

View File

@ -5,7 +5,7 @@ sidebar_label: cluster
title: "cluster, clusterAllReplicas"
---
Allows to access all shards in an existing cluster which configured in `remote_servers` section without creating a [Distributed](../../engines/table-engines/special/distributed.md) table. One replica of each shard is queried.
Allows to access all shards (configured in the `remote_servers` section) of a cluster without creating a [Distributed](../../engines/table-engines/special/distributed.md) table. Only one replica of each shard is queried.
`clusterAllReplicas` function — same as `cluster`, but all replicas are queried. Each replica in a cluster is used as a separate shard/connection.

View File

@ -5,7 +5,12 @@ sidebar_label: Window Functions
title: Window Functions
---
ClickHouse supports the standard grammar for defining windows and window functions. The following features are currently supported:
Windows functions let you perform calculations across a set of rows that are related to the current row.
Some of the calculations that you can do are similar to those that can be done with an aggregate function, but a window function doesn't cause rows to be grouped into a single output - the individual rows are still returned.
## Standard Window Functions
ClickHouse supports the standard grammar for defining windows and window functions. The table below indicates whether a feature is currently supported.
| Feature | Support or workaround |
|------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
@ -25,6 +30,8 @@ ClickHouse supports the standard grammar for defining windows and window functio
## ClickHouse-specific Window Functions
There are also the following window function that's specific to ClickHouse:
### nonNegativeDerivative(metric_column, timestamp_column[, INTERVAL X UNITS])
Finds non-negative derivative for given `metric_column` by `timestamp_column`.
@ -33,40 +40,6 @@ The computed value is the following for each row:
- `0` for 1st row,
- ${metric_i - metric_{i-1} \over timestamp_i - timestamp_{i-1}} * interval$ for $i_th$ row.
## References
### GitHub Issues
The roadmap for the initial support of window functions is [in this issue](https://github.com/ClickHouse/ClickHouse/issues/18097).
All GitHub issues related to window functions have the [comp-window-functions](https://github.com/ClickHouse/ClickHouse/labels/comp-window-functions) tag.
### Tests
These tests contain the examples of the currently supported grammar:
https://github.com/ClickHouse/ClickHouse/blob/master/tests/performance/window_functions.xml
https://github.com/ClickHouse/ClickHouse/blob/master/tests/queries/0_stateless/01591_window_functions.sql
### Postgres Docs
https://www.postgresql.org/docs/current/sql-select.html#SQL-WINDOW
https://www.postgresql.org/docs/devel/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS
https://www.postgresql.org/docs/devel/functions-window.html
https://www.postgresql.org/docs/devel/tutorial-window.html
### MySQL Docs
https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html
https://dev.mysql.com/doc/refman/8.0/en/window-functions-usage.html
https://dev.mysql.com/doc/refman/8.0/en/window-functions-frames.html
## Syntax
```text
@ -80,20 +53,7 @@ WINDOW window_name as ([[PARTITION BY grouping_column] [ORDER BY sorting_column]
- `PARTITION BY` - defines how to break a resultset into groups.
- `ORDER BY` - defines how to order rows inside the group during calculation aggregate_function.
- `ROWS or RANGE` - defines bounds of a frame, aggregate_function is calculated within a frame.
- `WINDOW` - allows to reuse a window definition with multiple expressions.
### Functions
These functions can be used only as a window function.
- `row_number()` - Number the current row within its partition starting from 1.
- `first_value(x)` - Return the first non-NULL value evaluated within its ordered frame.
- `last_value(x)` - Return the last non-NULL value evaluated within its ordered frame.
- `nth_value(x, offset)` - Return the first non-NULL value evaluated against the nth row (offset) in its ordered frame.
- `rank()` - Rank the current row within its partition with gaps.
- `dense_rank()` - Rank the current row within its partition without gaps.
- `lagInFrame(x)` - Return a value evaluated at the row that is at a specified physical offset row before the current row within the ordered frame.
- `leadInFrame(x)` - Return a value evaluated at the row that is offset rows after the current row within the ordered frame.
- `WINDOW` - allows multiple expressions to use the same window definition.
```text
PARTITION
@ -112,8 +72,23 @@ These functions can be used only as a window function.
└─────────────────┘ <--- UNBOUNDED FOLLOWING (END of the PARTITION)
```
### Functions
These functions can be used only as a window function.
- `row_number()` - Number the current row within its partition starting from 1.
- `first_value(x)` - Return the first non-NULL value evaluated within its ordered frame.
- `last_value(x)` - Return the last non-NULL value evaluated within its ordered frame.
- `nth_value(x, offset)` - Return the first non-NULL value evaluated against the nth row (offset) in its ordered frame.
- `rank()` - Rank the current row within its partition with gaps.
- `dense_rank()` - Rank the current row within its partition without gaps.
- `lagInFrame(x)` - Return a value evaluated at the row that is at a specified physical offset row before the current row within the ordered frame.
- `leadInFrame(x)` - Return a value evaluated at the row that is offset rows after the current row within the ordered frame.
## Examples
Let's have a look at some examples of how window functions can be used.
```sql
CREATE TABLE wf_partition
(
@ -589,6 +564,41 @@ ORDER BY
└──────────────┴─────────────────────┴───────┴─────────────────────────┘
```
## References
### GitHub Issues
The roadmap for the initial support of window functions is [in this issue](https://github.com/ClickHouse/ClickHouse/issues/18097).
All GitHub issues related to window functions have the [comp-window-functions](https://github.com/ClickHouse/ClickHouse/labels/comp-window-functions) tag.
### Tests
These tests contain the examples of the currently supported grammar:
https://github.com/ClickHouse/ClickHouse/blob/master/tests/performance/window_functions.xml
https://github.com/ClickHouse/ClickHouse/blob/master/tests/queries/0_stateless/01591_window_functions.sql
### Postgres Docs
https://www.postgresql.org/docs/current/sql-select.html#SQL-WINDOW
https://www.postgresql.org/docs/devel/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS
https://www.postgresql.org/docs/devel/functions-window.html
https://www.postgresql.org/docs/devel/tutorial-window.html
### MySQL Docs
https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html
https://dev.mysql.com/doc/refman/8.0/en/window-functions-usage.html
https://dev.mysql.com/doc/refman/8.0/en/window-functions-frames.html
## Related Content
- Blog: [Working with time series data in ClickHouse](https://clickhouse.com/blog/working-with-time-series-data-and-functions-ClickHouse)

View File

@ -585,10 +585,6 @@ ENGINE = Distributed(perftest_3shards_1replicas, tutorial, hits_local, rand());
INSERT INTO tutorial.hits_all SELECT * FROM tutorial.hits_v1;
```
:::danger Внимание!
Этот подход не годится для сегментирования больших таблиц. Есть инструмент [clickhouse-copier](../operations/utilities/clickhouse-copier.md), специально предназначенный для перераспределения любых больших таблиц.
:::
Как и следовало ожидать, вычислительно сложные запросы работают втрое быстрее, если они выполняются на трёх серверах, а не на одном.
В данном случае мы использовали кластер из трёх сегментов с одной репликой для каждого.

View File

@ -24,10 +24,6 @@ sidebar_label: "Резервное копирование данных"
Некоторые локальные файловые системы позволяют делать снимки (например, [ZFS](https://en.wikipedia.org/wiki/ZFS)), но они могут быть не лучшим выбором для обслуживания живых запросов. Возможным решением является создание дополнительных реплик с такой файловой системой и исключение их из [Distributed](../engines/table-engines/special/distributed.md) таблиц, используемых для запросов `SELECT`. Снимки на таких репликах будут недоступны для запросов, изменяющих данные. В качестве бонуса, эти реплики могут иметь особые конфигурации оборудования с большим количеством дисков, подключенных к серверу, что будет экономически эффективным.
## clickhouse-copier {#clickhouse-copier}
[clickhouse-copier](utilities/clickhouse-copier.md) — это универсальный инструмент, который изначально был создан для перешардирования таблиц с петабайтами данных. Его также можно использовать для резервного копирования и восстановления, поскольку он надёжно копирует данные между таблицами и кластерами ClickHouse.
Для небольших объёмов данных можно применять `INSERT INTO ... SELECT ...` в удалённые таблицы.
## Манипуляции с партициями {#manipuliatsii-s-partitsiiami}

View File

@ -1,183 +0,0 @@
---
slug: /ru/operations/utilities/clickhouse-copier
sidebar_position: 59
sidebar_label: clickhouse-copier
---
# clickhouse-copier {#clickhouse-copier}
Копирует данные из таблиц одного кластера в таблицы другого (или этого же) кластера.
Можно запустить несколько `clickhouse-copier` для разных серверах для выполнения одного и того же задания. Для синхронизации между процессами используется ZooKeeper.
После запуска, `clickhouse-copier`:
- Соединяется с ZooKeeper и получает:
- Задания на копирование.
- Состояние заданий на копирование.
- Выполняет задания.
Каждый запущенный процесс выбирает "ближайший" шард исходного кластера и копирует данные в кластер назначения, при необходимости перешардируя их.
`clickhouse-copier` отслеживает изменения в ZooKeeper и применяет их «на лету».
Для снижения сетевого трафика рекомендуем запускать `clickhouse-copier` на том же сервере, где находятся исходные данные.
## Запуск Clickhouse-copier {#zapusk-clickhouse-copier}
Утилиту следует запускать вручную следующим образом:
``` bash
$ clickhouse-copier --daemon --config zookeeper.xml --task-path /task/path --base-dir /path/to/dir
```
Параметры запуска:
- `daemon` - запускает `clickhouse-copier` в режиме демона.
- `config` - путь к файлу `zookeeper.xml` с параметрами соединения с ZooKeeper.
- `task-path` - путь к ноде ZooKeeper. Нода используется для синхронизации между процессами `clickhouse-copier` и для хранения заданий. Задания хранятся в `$task-path/description`.
- `task-file` - необязательный путь к файлу с описанием конфигурация заданий для загрузки в ZooKeeper.
- `task-upload-force` - Загрузить `task-file` в ZooKeeper даже если уже было загружено.
- `base-dir` - путь к логам и вспомогательным файлам. При запуске `clickhouse-copier` создает в `$base-dir` подкаталоги `clickhouse-copier_YYYYMMHHSS_<PID>`. Если параметр не указан, то каталоги будут создаваться в каталоге, где `clickhouse-copier` был запущен.
## Формат Zookeeper.xml {#format-zookeeper-xml}
``` xml
<clickhouse>
<logger>
<level>trace</level>
<size>100M</size>
<count>3</count>
</logger>
<zookeeper>
<node index="1">
<host>127.0.0.1</host>
<port>2181</port>
</node>
</zookeeper>
</clickhouse>
```
## Конфигурация заданий на копирование {#konfiguratsiia-zadanii-na-kopirovanie}
``` xml
<clickhouse>
<!-- Configuration of clusters as in an ordinary server config -->
<remote_servers>
<source_cluster>
<!--
source cluster & destination clusters accept exactly the same
parameters as parameters for the usual Distributed table
see https://clickhouse.com/docs/ru/engines/table-engines/special/distributed/
-->
<shard>
<internal_replication>false</internal_replication>
<replica>
<host>127.0.0.1</host>
<port>9000</port>
<!--
<user>default</user>
<password>default</password>
<secure>1</secure>
-->
</replica>
</shard>
...
</source_cluster>
<destination_cluster>
...
</destination_cluster>
</remote_servers>
<!-- How many simultaneously active workers are possible. If you run more workers superfluous workers will sleep. -->
<max_workers>2</max_workers>
<!-- Setting used to fetch (pull) data from source cluster tables -->
<settings_pull>
<readonly>1</readonly>
</settings_pull>
<!-- Setting used to insert (push) data to destination cluster tables -->
<settings_push>
<readonly>0</readonly>
</settings_push>
<!-- Common setting for fetch (pull) and insert (push) operations. Also, copier process context uses it.
They are overlaid by <settings_pull/> and <settings_push/> respectively. -->
<settings>
<connect_timeout>3</connect_timeout>
<!-- Sync insert is set forcibly, leave it here just in case. -->
<distributed_foreground_insert>1</distributed_foreground_insert>
</settings>
<!-- Copying tasks description.
You could specify several table task in the same task description (in the same ZooKeeper node), they will be performed
sequentially.
-->
<tables>
<!-- A table task, copies one table. -->
<table_hits>
<!-- Source cluster name (from <remote_servers/> section) and tables in it that should be copied -->
<cluster_pull>source_cluster</cluster_pull>
<database_pull>test</database_pull>
<table_pull>hits</table_pull>
<!-- Destination cluster name and tables in which the data should be inserted -->
<cluster_push>destination_cluster</cluster_push>
<database_push>test</database_push>
<table_push>hits2</table_push>
<!-- Engine of destination tables.
If destination tables have not be created, workers create them using columns definition from source tables and engine
definition from here.
NOTE: If the first worker starts insert data and detects that destination partition is not empty then the partition will
be dropped and refilled, take it into account if you already have some data in destination tables. You could directly
specify partitions that should be copied in <enabled_partitions/>, they should be in quoted format like partition column of
system.parts table.
-->
<engine>
ENGINE=ReplicatedMergeTree('/clickhouse/tables/{cluster}/{shard}/hits2', '{replica}')
PARTITION BY toMonday(date)
ORDER BY (CounterID, EventDate)
</engine>
<!-- Sharding key used to insert data to destination cluster -->
<sharding_key>jumpConsistentHash(intHash64(UserID), 2)</sharding_key>
<!-- Optional expression that filter data while pull them from source servers -->
<where_condition>CounterID != 0</where_condition>
<!-- This section specifies partitions that should be copied, other partition will be ignored.
Partition names should have the same format as
partition column of system.parts table (i.e. a quoted text).
Since partition key of source and destination cluster could be different,
these partition names specify destination partitions.
NOTE: In spite of this section is optional (if it is not specified, all partitions will be copied),
it is strictly recommended to specify them explicitly.
If you already have some ready partitions on destination cluster they
will be removed at the start of the copying since they will be interpeted
as unfinished data from the previous copying!!!
-->
<enabled_partitions>
<partition>'2018-02-26'</partition>
<partition>'2018-03-05'</partition>
...
</enabled_partitions>
</table_hits>
<!-- Next table to copy. It is not copied until previous table is copying. -->
<table_visits>
...
</table_visits>
...
</tables>
</clickhouse>
```
`clickhouse-copier` отслеживает изменения `/task/path/description` и применяет их «на лету». Если вы поменяете, например, значение `max_workers`, то количество процессов, выполняющих задания, также изменится.

View File

@ -7,7 +7,6 @@ sidebar_position: 56
# Утилиты ClickHouse {#utility-clickhouse}
- [clickhouse-local](clickhouse-local.md) - позволяет выполнять SQL-запросы над данными без остановки сервера ClickHouse, подобно утилите `awk`.
- [clickhouse-copier](clickhouse-copier.md) - копирует (и перешардирует) данные с одного кластера на другой.
- [clickhouse-benchmark](../../operations/utilities/clickhouse-benchmark.md) — устанавливает соединение с сервером ClickHouse и запускает циклическое выполнение указанных запросов.
- [clickhouse-format](../../operations/utilities/clickhouse-format.md) — позволяет форматировать входящие запросы.
- [ClickHouse obfuscator](../../operations/utilities/clickhouse-obfuscator.md) — обфусцирует данные.

View File

@ -94,7 +94,7 @@ RENAME COLUMN [IF EXISTS] name to new_name
Переименовывает столбец `name` в `new_name`. Если указано выражение `IF EXISTS`, то запрос не будет возвращать ошибку при условии, что столбец `name` не существует. Поскольку переименование не затрагивает физические данные колонки, запрос выполняется практически мгновенно.
**ЗАМЕЧЕНИЕ**: Столбцы, являющиеся частью основного ключа или ключа сортировки (заданные с помощью `ORDER BY` или `PRIMARY KEY`), не могут быть переименованы. Попытка переименовать эти слобцы приведет к `SQL Error [524]`.
**ЗАМЕЧЕНИЕ**: Столбцы, являющиеся частью основного ключа или ключа сортировки (заданные с помощью `ORDER BY` или `PRIMARY KEY`), не могут быть переименованы. Попытка переименовать эти слобцы приведет к `SQL Error [524]`.
Пример:
@ -254,7 +254,7 @@ SELECT groupArray(x), groupArray(s) FROM tmp;
Отсутствует возможность удалять столбцы, входящие в первичный ключ или ключ для сэмплирования (в общем, входящие в выражение `ENGINE`). Изменение типа у столбцов, входящих в первичный ключ возможно только в том случае, если это изменение не приводит к изменению данных (например, разрешено добавление значения в Enum или изменение типа с `DateTime` на `UInt32`).
Если возможностей запроса `ALTER` не хватает для нужного изменения таблицы, вы можете создать новую таблицу, скопировать туда данные с помощью запроса [INSERT SELECT](../insert-into.md#inserting-the-results-of-select), затем поменять таблицы местами с помощью запроса [RENAME](../rename.md#rename-table), и удалить старую таблицу. В качестве альтернативы для запроса `INSERT SELECT`, можно использовать инструмент [clickhouse-copier](../../../sql-reference/statements/alter/index.md).
Если возможностей запроса `ALTER` не хватает для нужного изменения таблицы, вы можете создать новую таблицу, скопировать туда данные с помощью запроса [INSERT SELECT](../insert-into.md#inserting-the-results-of-select), затем поменять таблицы местами с помощью запроса [RENAME](../rename.md#rename-table), и удалить старую таблицу.
Запрос `ALTER` блокирует все чтения и записи для таблицы. То есть если на момент запроса `ALTER` выполнялся долгий `SELECT`, то запрос `ALTER` сначала дождётся его выполнения. И в это время все новые запросы к той же таблице будут ждать, пока завершится этот `ALTER`.

View File

@ -582,8 +582,6 @@ ENGINE = Distributed(perftest_3shards_1replicas, tutorial, hits_local, rand());
INSERT INTO tutorial.hits_all SELECT * FROM tutorial.hits_v1;
```
!!! warning "注意:"
这种方法不适合大型表的分片。 有一个单独的工具 [clickhouse-copier](../operations/utilities/clickhouse-copier.md) 这可以重新分片任意大表。
正如您所期望的那样如果计算量大的查询使用3台服务器而不是一个则运行速度快N倍。

View File

@ -24,12 +24,6 @@ sidebar_label: "\u6570\u636E\u5907\u4EFD"
某些本地文件系统提供快照功能(例如, [ZFS](https://en.wikipedia.org/wiki/ZFS)),但它们可能不是提供实时查询的最佳选择。 一个可能的解决方案是使用这种文件系统创建额外的副本,并将它们与用于`SELECT` 查询的 [分布式](../engines/table-engines/special/distributed.md) 表分离。 任何修改数据的查询都无法访问此类副本上的快照。 作为回报,这些副本可能具有特殊的硬件配置,每个服务器附加更多的磁盘,这将是经济高效的。
## clickhouse-copier {#clickhouse-copier}
[clickhouse-copier](utilities/clickhouse-copier.md) 是一个多功能工具最初创建它是为了用于重新切分pb大小的表。 因为它能够在ClickHouse表和集群之间可靠地复制数据所以它也可用于备份和还原数据。
对于较小的数据量,一个简单的 `INSERT INTO ... SELECT ...` 到远程表也可以工作。
## part操作 {#manipulations-with-parts}
ClickHouse允许使用 `ALTER TABLE ... FREEZE PARTITION ...` 查询以创建表分区的本地副本。 这是利用硬链接(hardlink)到 `/var/lib/clickhouse/shadow/` 文件夹中实现的,所以它通常不会因为旧数据而占用额外的磁盘空间。 创建的文件副本不由ClickHouse服务器处理所以你可以把它们留在那里你将有一个简单的备份不需要任何额外的外部系统但它仍然容易出现硬件问题。 出于这个原因,最好将它们远程复制到另一个位置,然后删除本地副本。 分布式文件系统和对象存储仍然是一个不错的选择,但是具有足够大容量的正常附加文件服务器也可以工作(在这种情况下,传输将通过网络文件系统或者也许是 [rsync](https://en.wikipedia.org/wiki/Rsync) 来进行).

View File

@ -649,11 +649,22 @@ log_query_threads=1
## max_query_size {#settings-max_query_size}
查询的最大部分可以被带到RAM用于使用SQL解析器进行解析。
插入查询还包含由单独的流解析器消耗O(1)RAM处理的插入数据这些数据不包含在此限制中。
SQL 解析器解析的查询字符串的最大字节数。 INSERT 查询的 VALUES 子句中的数据由单独的流解析器(消耗 O(1) RAM处理并且不受此限制的影响。
默认值256KiB。
## max_parser_depth {#max_parser_depth}
限制递归下降解析器中的最大递归深度。允许控制堆栈大小。
可能的值:
- 正整数。
- 0 — 递归深度不受限制。
默认值1000。
## interactive_delay {#interactive-delay}
以微秒为单位的间隔,用于检查请求执行是否已被取消并发送进度。
@ -1064,6 +1075,28 @@ ClickHouse生成异常
默认值0。
## optimize_functions_to_subcolumns {#optimize_functions_to_subcolumns}
启用或禁用通过将某些函数转换为读取子列的优化。这减少了要读取的数据量。
这些函数可以转化为:
- [length](../../sql-reference/functions/array-functions.md/#array_functions-length) 读取 [size0](../../sql-reference/data-types/array.md/#array-size子列。
- [empty](../../sql-reference/functions/array-functions.md/#empty函数) 读取 [size0](../../sql-reference/data-types/array.md/#array-size子列。
- [notEmpty](../../sql-reference/functions/array-functions.md/#notempty函数) 读取 [size0](../../sql-reference/data-types/array.md/#array-size子列。
- [isNull](../../sql-reference/operators/index.md#operator-is-null) 读取 [null](../../sql-reference/data-types/nullable. md/#finding-null) 子列。
- [isNotNull](../../sql-reference/operators/index.md#is-not-null) 读取 [null](../../sql-reference/data-types/nullable. md/#finding-null) 子列。
- [count](../../sql-reference/aggregate-functions/reference/count.md) 读取 [null](../../sql-reference/data-types/nullable.md/#finding-null) 子列。
- [mapKeys](../../sql-reference/functions/tuple-map-functions.mdx/#mapkeys) 读取 [keys](../../sql-reference/data-types/map.md/#map-subcolumns) 子列。
- [mapValues](../../sql-reference/functions/tuple-map-functions.mdx/#mapvalues) 读取 [values](../../sql-reference/data-types/map.md/#map-subcolumns) 子列。
可能的值:
- 0 — 禁用优化。
- 1 — 优化已启用。
默认值:`0`。
## distributed_replica_error_half_life {#settings-distributed_replica_error_half_life}
- 类型:秒

View File

@ -1,172 +0,0 @@
---
slug: /zh/operations/utilities/clickhouse-copier
---
# clickhouse-copier {#clickhouse-copier}
将数据从一个群集中的表复制到另一个(或相同)群集中的表。
您可以运行多个 `clickhouse-copier` 不同服务器上的实例执行相同的作业。 ZooKeeper用于同步进程。
开始后, `clickhouse-copier`:
- 连接到ZooKeeper并且接收:
- 复制作业。
- 复制作业的状态。
- 它执行的工作。
每个正在运行的进程都会选择源集群的“最接近”分片,然后将数据复制到目标集群,并在必要时重新分片数据。
`clickhouse-copier` 跟踪ZooKeeper中的更改并实时应用它们。
为了减少网络流量,我们建议运行 `clickhouse-copier` 在源数据所在的同一服务器上。
## 运行Clickhouse-copier {#running-clickhouse-copier}
该实用程序应手动运行:
``` bash
clickhouse-copier --daemon --config zookeeper.xml --task-path /task/path --base-dir /path/to/dir
```
参数:
- `daemon` — 在守护进程模式下启动`clickhouse-copier`。
- `config``zookeeper.xml`文件的路径其中包含用于连接ZooKeeper的参数。
- `task-path` — ZooKeeper节点的路径。 该节点用于同步`clickhouse-copier`进程和存储任务。 任务存储在`$task-path/description`中。
- `task-file` — 可选的非必须参数, 指定一个包含任务配置的参数文件, 用于初始上传到ZooKeeper。
- `task-upload-force` — 即使节点已经存在,也强制上载`task-file`。
- `base-dir` — 日志和辅助文件的路径。 启动时,`clickhouse-copier`在`$base-dir`中创建`clickhouse-copier_YYYYMMHHSS_<PID>`子目录。 如果省略此参数,则会在启动`clickhouse-copier`的目录中创建目录。
## Zookeeper.xml格式 {#format-of-zookeeper-xml}
``` xml
<clickhouse>
<logger>
<level>trace</level>
<size>100M</size>
<count>3</count>
</logger>
<zookeeper>
<node index="1">
<host>127.0.0.1</host>
<port>2181</port>
</node>
</zookeeper>
</clickhouse>
```
## 复制任务的配置 {#configuration-of-copying-tasks}
``` xml
<clickhouse>
<!-- Configuration of clusters as in an ordinary server config -->
<remote_servers>
<source_cluster>
<shard>
<internal_replication>false</internal_replication>
<replica>
<host>127.0.0.1</host>
<port>9000</port>
</replica>
</shard>
...
</source_cluster>
<destination_cluster>
...
</destination_cluster>
</remote_servers>
<!-- How many simultaneously active workers are possible. If you run more workers superfluous workers will sleep. -->
<max_workers>2</max_workers>
<!-- Setting used to fetch (pull) data from source cluster tables -->
<settings_pull>
<readonly>1</readonly>
</settings_pull>
<!-- Setting used to insert (push) data to destination cluster tables -->
<settings_push>
<readonly>0</readonly>
</settings_push>
<!-- Common setting for fetch (pull) and insert (push) operations. Also, copier process context uses it.
They are overlaid by <settings_pull/> and <settings_push/> respectively. -->
<settings>
<connect_timeout>3</connect_timeout>
<!-- Sync insert is set forcibly, leave it here just in case. -->
<distributed_foreground_insert>1</distributed_foreground_insert>
</settings>
<!-- Copying tasks description.
You could specify several table task in the same task description (in the same ZooKeeper node), they will be performed
sequentially.
-->
<tables>
<!-- A table task, copies one table. -->
<table_hits>
<!-- Source cluster name (from <remote_servers/> section) and tables in it that should be copied -->
<cluster_pull>source_cluster</cluster_pull>
<database_pull>test</database_pull>
<table_pull>hits</table_pull>
<!-- Destination cluster name and tables in which the data should be inserted -->
<cluster_push>destination_cluster</cluster_push>
<database_push>test</database_push>
<table_push>hits2</table_push>
<!-- Engine of destination tables.
If destination tables have not be created, workers create them using columns definition from source tables and engine
definition from here.
NOTE: If the first worker starts insert data and detects that destination partition is not empty then the partition will
be dropped and refilled, take it into account if you already have some data in destination tables. You could directly
specify partitions that should be copied in <enabled_partitions/>, they should be in quoted format like partition column of
system.parts table.
-->
<engine>
ENGINE=ReplicatedMergeTree('/clickhouse/tables/{cluster}/{shard}/hits2', '{replica}')
PARTITION BY toMonday(date)
ORDER BY (CounterID, EventDate)
</engine>
<!-- Sharding key used to insert data to destination cluster -->
<sharding_key>jumpConsistentHash(intHash64(UserID), 2)</sharding_key>
<!-- Optional expression that filter data while pull them from source servers -->
<where_condition>CounterID != 0</where_condition>
<!-- This section specifies partitions that should be copied, other partition will be ignored.
Partition names should have the same format as
partition column of system.parts table (i.e. a quoted text).
Since partition key of source and destination cluster could be different,
these partition names specify destination partitions.
NOTE: In spite of this section is optional (if it is not specified, all partitions will be copied),
it is strictly recommended to specify them explicitly.
If you already have some ready partitions on destination cluster they
will be removed at the start of the copying since they will be interpeted
as unfinished data from the previous copying!!!
-->
<enabled_partitions>
<partition>'2018-02-26'</partition>
<partition>'2018-03-05'</partition>
...
</enabled_partitions>
</table_hits>
<!-- Next table to copy. It is not copied until previous table is copying. -->
<table_visits>
...
</table_visits>
...
</tables>
</clickhouse>
```
`clickhouse-copier` 跟踪更改 `/task/path/description` 并在飞行中应用它们。 例如,如果你改变的值 `max_workers`,运行任务的进程数也会发生变化。

View File

@ -4,5 +4,4 @@ slug: /zh/operations/utilities/
# 实用工具 {#clickhouse-utility}
- [本地查询](clickhouse-local.md) — 在不停止ClickHouse服务的情况下对数据执行查询操作(类似于 `awk` 命令)。
- [跨集群复制](clickhouse-copier.md) — 在不同集群间复制数据。
- [性能测试](clickhouse-benchmark.md) — 连接到Clickhouse服务器执行性能测试。

View File

@ -1,7 +1,7 @@
---
slug: /zh/sql-reference/data-types/array
---
# 阵列(T) {#data-type-array}
# 数组(T) {#data-type-array}
`T` 类型元素组成的数组。
@ -66,3 +66,27 @@ SELECT array(1, 'a')
Received exception from server (version 1.1.54388):
Code: 386. DB::Exception: Received from localhost:9000, 127.0.0.1. DB::Exception: There is no supertype for types UInt8, String because some of them are String/FixedString and some of them are not.
```
## 数组大小 {#array-size}
可以使用 `size0` 子列找到数组的大小,而无需读取整个列。对于多维数组,您可以使用 `sizeN-1`,其中 `N` 是所需的维度。
**例子**
SQL查询
```sql
CREATE TABLE t_arr (`arr` Array(Array(Array(UInt32)))) ENGINE = MergeTree ORDER BY tuple();
INSERT INTO t_arr VALUES ([[[12, 13, 0, 1],[12]]]);
SELECT arr.size0, arr.size1, arr.size2 FROM t_arr;
```
结果:
``` text
┌─arr.size0─┬─arr.size1─┬─arr.size2─┐
│ 1 │ [2] │ [[4,1]] │
└───────────┴───────────┴───────────┘
```

View File

@ -20,6 +20,33 @@ slug: /zh/sql-reference/data-types/nullable
掩码文件中的条目允许ClickHouse区分每个表行的对应数据类型的«NULL»和默认值由于有额外的文件«Nullable»列比普通列消耗更多的存储空间
## null子列 {#finding-null}
通过使用 `null` 子列可以在列中查找 `NULL` 值,而无需读取整个列。如果对应的值为 `NULL`,则返回 `1`,否则返回 `0`
**示例**
SQL查询:
``` sql
CREATE TABLE nullable (`n` Nullable(UInt32)) ENGINE = MergeTree ORDER BY tuple();
INSERT INTO nullable VALUES (1) (NULL) (2) (NULL);
SELECT n.null FROM nullable;
```
结果:
``` text
┌─n.null─┐
│ 0 │
│ 1 │
│ 0 │
│ 1 │
└────────┘
```
## 用法示例 {#yong-fa-shi-li}
``` sql

View File

@ -150,7 +150,7 @@ ALTER TABLE visits MODIFY COLUMN browser Array(String)
不支持对primary key或者sampling key中的列`ENGINE` 表达式中用到的列进行删除操作。改变包含在primary key中的列的类型时如果操作不会导致数据的变化例如往Enum中添加一个值或者将`DateTime` 类型改成 `UInt32`),那么这种操作是可行的。
如果 `ALTER` 操作不足以完成你想要的表变动操作,你可以创建一张新的表,通过 [INSERT SELECT](../../sql-reference/statements/insert-into.md#inserting-the-results-of-select)将数据拷贝进去,然后通过 [RENAME](../../sql-reference/statements/misc.md#misc_operations-rename)将新的表改成和原有表一样的名称,并删除原有的表。你可以使用 [clickhouse-copier](../../operations/utilities/clickhouse-copier.md) 代替 `INSERT SELECT`
如果 `ALTER` 操作不足以完成你想要的表变动操作,你可以创建一张新的表,通过 [INSERT SELECT](../../sql-reference/statements/insert-into.md#inserting-the-results-of-select)将数据拷贝进去,然后通过 [RENAME](../../sql-reference/statements/misc.md#misc_operations-rename)将新的表改成和原有表一样的名称,并删除原有的表。
`ALTER` 操作会阻塞对表的所有读写操作。换句话说,当一个大的 `SELECT` 语句和 `ALTER`同时执行时,`ALTER`会等待,直到 `SELECT` 执行结束。与此同时,当 `ALTER` 运行时,新的 sql 语句将会等待。

View File

@ -50,8 +50,6 @@ contents:
dst: /etc/init.d/clickhouse-server
- src: clickhouse-server.service
dst: /lib/systemd/system/clickhouse-server.service
- src: root/usr/bin/clickhouse-copier
dst: /usr/bin/clickhouse-copier
- src: root/usr/bin/clickhouse-server
dst: /usr/bin/clickhouse-server
# clickhouse-keeper part

Some files were not shown because too many files have changed in this diff Show More