Merge branch 'master' into agg-func-setting-null-for-empty

This commit is contained in:
alexey-milovidov 2020-11-04 13:24:39 +03:00 committed by GitHub
commit 6f004e4abd
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1559 changed files with 64534 additions and 30435 deletions

4
.gitmodules vendored
View File

@ -186,3 +186,7 @@
path = contrib/cyrus-sasl
url = https://github.com/cyrusimap/cyrus-sasl
branch = cyrus-sasl-2.1
[submodule "contrib/croaring"]
path = contrib/croaring
url = https://github.com/RoaringBitmap/CRoaring
branch = v0.2.66

View File

@ -409,7 +409,7 @@
## ClickHouse release 20.6
### ClickHouse release v20.6.3.28-stable
### ClickHouse release v20.6.3.28-stable
#### New Feature
@ -2362,7 +2362,7 @@ No changes compared to v20.4.3.16-stable.
* `Live View` table engine refactoring. [#8519](https://github.com/ClickHouse/ClickHouse/pull/8519) ([vzakaznikov](https://github.com/vzakaznikov))
* Add additional checks for external dictionaries created from DDL-queries. [#8127](https://github.com/ClickHouse/ClickHouse/pull/8127) ([alesapin](https://github.com/alesapin))
* Fix error `Column ... already exists` while using `FINAL` and `SAMPLE` together, e.g. `select count() from table final sample 1/2`. Fixes [#5186](https://github.com/ClickHouse/ClickHouse/issues/5186). [#7907](https://github.com/ClickHouse/ClickHouse/pull/7907) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Now table the first argument of `joinGet` function can be table indentifier. [#7707](https://github.com/ClickHouse/ClickHouse/pull/7707) ([Amos Bird](https://github.com/amosbird))
* Now table the first argument of `joinGet` function can be table identifier. [#7707](https://github.com/ClickHouse/ClickHouse/pull/7707) ([Amos Bird](https://github.com/amosbird))
* Allow using `MaterializedView` with subqueries above `Kafka` tables. [#8197](https://github.com/ClickHouse/ClickHouse/pull/8197) ([filimonov](https://github.com/filimonov))
* Now background moves between disks run it the seprate thread pool. [#7670](https://github.com/ClickHouse/ClickHouse/pull/7670) ([Vladimir Chebotarev](https://github.com/excitoon))
* `SYSTEM RELOAD DICTIONARY` now executes synchronously. [#8240](https://github.com/ClickHouse/ClickHouse/pull/8240) ([Vitaly Baranov](https://github.com/vitlibar))

View File

@ -59,25 +59,6 @@ set(CMAKE_DEBUG_POSTFIX "d" CACHE STRING "Generate debug library name with a pos
# For more info see https://cmake.org/cmake/help/latest/prop_gbl/USE_FOLDERS.html
set_property(GLOBAL PROPERTY USE_FOLDERS ON)
# cmake 3.9+ needed.
# Usually impractical.
# See also ${ENABLE_THINLTO}
option(ENABLE_IPO "Full link time optimization")
if(ENABLE_IPO)
cmake_policy(SET CMP0069 NEW)
include(CheckIPOSupported)
check_ipo_supported(RESULT IPO_SUPPORTED OUTPUT IPO_NOT_SUPPORTED)
if(IPO_SUPPORTED)
message(STATUS "IPO/LTO is supported, enabling")
set(CMAKE_INTERPROCEDURAL_OPTIMIZATION TRUE)
else()
message (${RECONFIGURE_MESSAGE_LEVEL} "IPO/LTO is not supported: <${IPO_NOT_SUPPORTED}>")
endif()
else()
message(STATUS "IPO/LTO not enabled.")
endif()
# Check that submodules are present only if source was downloaded with git
if (EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/.git" AND NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/boost/boost")
message (FATAL_ERROR "Submodules are not initialized. Run\n\tgit submodule update --init --recursive")

View File

@ -17,4 +17,6 @@ ClickHouse is an open-source column-oriented database management system that all
## Upcoming Events
* [ClickHouse virtual office hours](https://www.eventbrite.com/e/clickhouse-october-virtual-meetup-office-hours-tickets-123129500651) on October 22, 2020.
* [The Second ClickHouse Meetup East (online)](https://www.eventbrite.com/e/the-second-clickhouse-meetup-east-tickets-126787955187) on October 31, 2020.
* [ClickHouse for Enterprise Meetup (online in Russian)](https://arenadata-events.timepad.ru/event/1465249/) on November 10, 2020.

View File

@ -51,7 +51,7 @@ struct StringRef
};
/// Here constexpr doesn't implicate inline, see https://www.viva64.com/en/w/v1043/
/// nullptr can't be used because the StringRef values are used in SipHash's pointer arithmetics
/// nullptr can't be used because the StringRef values are used in SipHash's pointer arithmetic
/// and the UBSan thinks that something like nullptr + 8 is UB.
constexpr const inline char empty_string_ref_addr{};
constexpr const inline StringRef EMPTY_STRING_REF{&empty_string_ref_addr, 0};

View File

@ -0,0 +1,339 @@
/* origin: OpenBSD /usr/src/lib/libm/src/ld80/e_lgammal.c */
/*
* ====================================================
* Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
*
* Developed at SunPro, a Sun Microsystems, Inc. business.
* Permission to use, copy, modify, and distribute this
* software is freely granted, provided that this notice
* is preserved.
* ====================================================
*/
/*
* Copyright (c) 2008 Stephen L. Moshier <steve@moshier.net>
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
/* lgammal(x)
* Reentrant version of the logarithm of the Gamma function
* with user provide pointer for the sign of Gamma(x).
*
* Method:
* 1. Argument Reduction for 0 < x <= 8
* Since gamma(1+s)=s*gamma(s), for x in [0,8], we may
* reduce x to a number in [1.5,2.5] by
* lgamma(1+s) = log(s) + lgamma(s)
* for example,
* lgamma(7.3) = log(6.3) + lgamma(6.3)
* = log(6.3*5.3) + lgamma(5.3)
* = log(6.3*5.3*4.3*3.3*2.3) + lgamma(2.3)
* 2. Polynomial approximation of lgamma around its
* minimun ymin=1.461632144968362245 to maintain monotonicity.
* On [ymin-0.23, ymin+0.27] (i.e., [1.23164,1.73163]), use
* Let z = x-ymin;
* lgamma(x) = -1.214862905358496078218 + z^2*poly(z)
* 2. Rational approximation in the primary interval [2,3]
* We use the following approximation:
* s = x-2.0;
* lgamma(x) = 0.5*s + s*P(s)/Q(s)
* Our algorithms are based on the following observation
*
* zeta(2)-1 2 zeta(3)-1 3
* lgamma(2+s) = s*(1-Euler) + --------- * s - --------- * s + ...
* 2 3
*
* where Euler = 0.5771... is the Euler constant, which is very
* close to 0.5.
*
* 3. For x>=8, we have
* lgamma(x)~(x-0.5)log(x)-x+0.5*log(2pi)+1/(12x)-1/(360x**3)+....
* (better formula:
* lgamma(x)~(x-0.5)*(log(x)-1)-.5*(log(2pi)-1) + ...)
* Let z = 1/x, then we approximation
* f(z) = lgamma(x) - (x-0.5)(log(x)-1)
* by
* 3 5 11
* w = w0 + w1*z + w2*z + w3*z + ... + w6*z
*
* 4. For negative x, since (G is gamma function)
* -x*G(-x)*G(x) = pi/sin(pi*x),
* we have
* G(x) = pi/(sin(pi*x)*(-x)*G(-x))
* since G(-x) is positive, sign(G(x)) = sign(sin(pi*x)) for x<0
* Hence, for x<0, signgam = sign(sin(pi*x)) and
* lgamma(x) = log(|Gamma(x)|)
* = log(pi/(|x*sin(pi*x)|)) - lgamma(-x);
* Note: one should avoid compute pi*(-x) directly in the
* computation of sin(pi*(-x)).
*
* 5. Special Cases
* lgamma(2+s) ~ s*(1-Euler) for tiny s
* lgamma(1)=lgamma(2)=0
* lgamma(x) ~ -log(x) for tiny x
* lgamma(0) = lgamma(inf) = inf
* lgamma(-integer) = +-inf
*
*/
#include <stdint.h>
#include <math.h>
#include "libm.h"
#if LDBL_MANT_DIG == 53 && LDBL_MAX_EXP == 1024
double lgamma_r(double x, int *sg);
long double lgammal_r(long double x, int *sg)
{
return lgamma_r(x, sg);
}
#elif LDBL_MANT_DIG == 64 && LDBL_MAX_EXP == 16384
static const long double pi = 3.14159265358979323846264L,
/* lgam(1+x) = 0.5 x + x a(x)/b(x)
-0.268402099609375 <= x <= 0
peak relative error 6.6e-22 */
a0 = -6.343246574721079391729402781192128239938E2L,
a1 = 1.856560238672465796768677717168371401378E3L,
a2 = 2.404733102163746263689288466865843408429E3L,
a3 = 8.804188795790383497379532868917517596322E2L,
a4 = 1.135361354097447729740103745999661157426E2L,
a5 = 3.766956539107615557608581581190400021285E0L,
b0 = 8.214973713960928795704317259806842490498E3L,
b1 = 1.026343508841367384879065363925870888012E4L,
b2 = 4.553337477045763320522762343132210919277E3L,
b3 = 8.506975785032585797446253359230031874803E2L,
b4 = 6.042447899703295436820744186992189445813E1L,
/* b5 = 1.000000000000000000000000000000000000000E0 */
tc = 1.4616321449683623412626595423257213284682E0L,
tf = -1.2148629053584961146050602565082954242826E-1, /* double precision */
/* tt = (tail of tf), i.e. tf + tt has extended precision. */
tt = 3.3649914684731379602768989080467587736363E-18L,
/* lgam ( 1.4616321449683623412626595423257213284682E0 ) =
-1.2148629053584960809551455717769158215135617312999903886372437313313530E-1 */
/* lgam (x + tc) = tf + tt + x g(x)/h(x)
-0.230003726999612341262659542325721328468 <= x
<= 0.2699962730003876587373404576742786715318
peak relative error 2.1e-21 */
g0 = 3.645529916721223331888305293534095553827E-18L,
g1 = 5.126654642791082497002594216163574795690E3L,
g2 = 8.828603575854624811911631336122070070327E3L,
g3 = 5.464186426932117031234820886525701595203E3L,
g4 = 1.455427403530884193180776558102868592293E3L,
g5 = 1.541735456969245924860307497029155838446E2L,
g6 = 4.335498275274822298341872707453445815118E0L,
h0 = 1.059584930106085509696730443974495979641E4L,
h1 = 2.147921653490043010629481226937850618860E4L,
h2 = 1.643014770044524804175197151958100656728E4L,
h3 = 5.869021995186925517228323497501767586078E3L,
h4 = 9.764244777714344488787381271643502742293E2L,
h5 = 6.442485441570592541741092969581997002349E1L,
/* h6 = 1.000000000000000000000000000000000000000E0 */
/* lgam (x+1) = -0.5 x + x u(x)/v(x)
-0.100006103515625 <= x <= 0.231639862060546875
peak relative error 1.3e-21 */
u0 = -8.886217500092090678492242071879342025627E1L,
u1 = 6.840109978129177639438792958320783599310E2L,
u2 = 2.042626104514127267855588786511809932433E3L,
u3 = 1.911723903442667422201651063009856064275E3L,
u4 = 7.447065275665887457628865263491667767695E2L,
u5 = 1.132256494121790736268471016493103952637E2L,
u6 = 4.484398885516614191003094714505960972894E0L,
v0 = 1.150830924194461522996462401210374632929E3L,
v1 = 3.399692260848747447377972081399737098610E3L,
v2 = 3.786631705644460255229513563657226008015E3L,
v3 = 1.966450123004478374557778781564114347876E3L,
v4 = 4.741359068914069299837355438370682773122E2L,
v5 = 4.508989649747184050907206782117647852364E1L,
/* v6 = 1.000000000000000000000000000000000000000E0 */
/* lgam (x+2) = .5 x + x s(x)/r(x)
0 <= x <= 1
peak relative error 7.2e-22 */
s0 = 1.454726263410661942989109455292824853344E6L,
s1 = -3.901428390086348447890408306153378922752E6L,
s2 = -6.573568698209374121847873064292963089438E6L,
s3 = -3.319055881485044417245964508099095984643E6L,
s4 = -7.094891568758439227560184618114707107977E5L,
s5 = -6.263426646464505837422314539808112478303E4L,
s6 = -1.684926520999477529949915657519454051529E3L,
r0 = -1.883978160734303518163008696712983134698E7L,
r1 = -2.815206082812062064902202753264922306830E7L,
r2 = -1.600245495251915899081846093343626358398E7L,
r3 = -4.310526301881305003489257052083370058799E6L,
r4 = -5.563807682263923279438235987186184968542E5L,
r5 = -3.027734654434169996032905158145259713083E4L,
r6 = -4.501995652861105629217250715790764371267E2L,
/* r6 = 1.000000000000000000000000000000000000000E0 */
/* lgam(x) = ( x - 0.5 ) * log(x) - x + LS2PI + 1/x w(1/x^2)
x >= 8
Peak relative error 1.51e-21
w0 = LS2PI - 0.5 */
w0 = 4.189385332046727417803e-1L,
w1 = 8.333333333333331447505E-2L,
w2 = -2.777777777750349603440E-3L,
w3 = 7.936507795855070755671E-4L,
w4 = -5.952345851765688514613E-4L,
w5 = 8.412723297322498080632E-4L,
w6 = -1.880801938119376907179E-3L,
w7 = 4.885026142432270781165E-3L;
long double lgammal_r(long double x, int *sg) {
long double t, y, z, nadj, p, p1, p2, q, r, w;
union ldshape u = {x};
uint32_t ix = (u.i.se & 0x7fffU)<<16 | u.i.m>>48;
int sign = u.i.se >> 15;
int i;
*sg = 1;
/* purge off +-inf, NaN, +-0, tiny and negative arguments */
if (ix >= 0x7fff0000)
return x * x;
if (ix < 0x3fc08000) { /* |x|<2**-63, return -log(|x|) */
if (sign) {
*sg = -1;
x = -x;
}
return -logl(x);
}
if (sign) {
x = -x;
t = sin(pi * x);
if (t == 0.0)
return 1.0 / (x-x); /* -integer */
if (t > 0.0)
*sg = -1;
else
t = -t;
nadj = logl(pi / (t * x));
}
/* purge off 1 and 2 (so the sign is ok with downward rounding) */
if ((ix == 0x3fff8000 || ix == 0x40008000) && u.i.m == 0) {
r = 0;
} else if (ix < 0x40008000) { /* x < 2.0 */
if (ix <= 0x3ffee666) { /* 8.99993896484375e-1 */
/* lgamma(x) = lgamma(x+1) - log(x) */
r = -logl(x);
if (ix >= 0x3ffebb4a) { /* 7.31597900390625e-1 */
y = x - 1.0;
i = 0;
} else if (ix >= 0x3ffced33) { /* 2.31639862060546875e-1 */
y = x - (tc - 1.0);
i = 1;
} else { /* x < 0.23 */
y = x;
i = 2;
}
} else {
r = 0.0;
if (ix >= 0x3fffdda6) { /* 1.73162841796875 */
/* [1.7316,2] */
y = x - 2.0;
i = 0;
} else if (ix >= 0x3fff9da6) { /* 1.23162841796875 */
/* [1.23,1.73] */
y = x - tc;
i = 1;
} else {
/* [0.9, 1.23] */
y = x - 1.0;
i = 2;
}
}
switch (i) {
case 0:
p1 = a0 + y * (a1 + y * (a2 + y * (a3 + y * (a4 + y * a5))));
p2 = b0 + y * (b1 + y * (b2 + y * (b3 + y * (b4 + y))));
r += 0.5 * y + y * p1/p2;
break;
case 1:
p1 = g0 + y * (g1 + y * (g2 + y * (g3 + y * (g4 + y * (g5 + y * g6)))));
p2 = h0 + y * (h1 + y * (h2 + y * (h3 + y * (h4 + y * (h5 + y)))));
p = tt + y * p1/p2;
r += (tf + p);
break;
case 2:
p1 = y * (u0 + y * (u1 + y * (u2 + y * (u3 + y * (u4 + y * (u5 + y * u6))))));
p2 = v0 + y * (v1 + y * (v2 + y * (v3 + y * (v4 + y * (v5 + y)))));
r += (-0.5 * y + p1 / p2);
}
} else if (ix < 0x40028000) { /* 8.0 */
/* x < 8.0 */
i = (int)x;
y = x - (double)i;
p = y * (s0 + y * (s1 + y * (s2 + y * (s3 + y * (s4 + y * (s5 + y * s6))))));
q = r0 + y * (r1 + y * (r2 + y * (r3 + y * (r4 + y * (r5 + y * (r6 + y))))));
r = 0.5 * y + p / q;
z = 1.0;
/* lgamma(1+s) = log(s) + lgamma(s) */
switch (i) {
case 7:
z *= (y + 6.0); /* FALLTHRU */
case 6:
z *= (y + 5.0); /* FALLTHRU */
case 5:
z *= (y + 4.0); /* FALLTHRU */
case 4:
z *= (y + 3.0); /* FALLTHRU */
case 3:
z *= (y + 2.0); /* FALLTHRU */
r += logl(z);
break;
}
} else if (ix < 0x40418000) { /* 2^66 */
/* 8.0 <= x < 2**66 */
t = logl(x);
z = 1.0 / x;
y = z * z;
w = w0 + z * (w1 + y * (w2 + y * (w3 + y * (w4 + y * (w5 + y * (w6 + y * w7))))));
r = (x - 0.5) * (t - 1.0) + w;
} else /* 2**66 <= x <= inf */
r = x * (logl(x) - 1.0);
if (sign)
r = nadj - r;
return r;
}
#elif LDBL_MANT_DIG == 113 && LDBL_MAX_EXP == 16384
// TODO: broken implementation to make things compile
double lgamma_r(double x, int *sg);
long double lgammal_r(long double x, int *sg)
{
return lgamma_r(x, sg);
}
#endif
int signgam_lgammal;
long double lgammal(long double x)
{
return lgammal_r(x, &signgam_lgammal);
}

View File

@ -16,8 +16,4 @@ endif ()
if (CMAKE_SYSTEM_PROCESSOR MATCHES "^(ppc64le.*|PPC64LE.*)")
set (ARCH_PPC64LE 1)
# FIXME: move this check into tools.cmake
if (COMPILER_CLANG OR (COMPILER_GCC AND CMAKE_CXX_COMPILER_VERSION VERSION_LESS 8))
message(FATAL_ERROR "Only gcc-8 or higher is supported for powerpc architecture")
endif ()
endif ()

View File

@ -57,8 +57,8 @@ if (SANITIZE)
endif ()
elseif (SANITIZE STREQUAL "undefined")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${SAN_FLAGS} -fsanitize=undefined -fno-sanitize-recover=all -fno-sanitize=float-divide-by-zero")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${SAN_FLAGS} -fsanitize=undefined -fno-sanitize-recover=all -fno-sanitize=float-divide-by-zero")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${SAN_FLAGS} -fsanitize=undefined -fno-sanitize-recover=all -fno-sanitize=float-divide-by-zero -fsanitize-blacklist=${CMAKE_SOURCE_DIR}/tests/ubsan_suppressions.txt")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${SAN_FLAGS} -fsanitize=undefined -fno-sanitize-recover=all -fno-sanitize=float-divide-by-zero -fsanitize-blacklist=${CMAKE_SOURCE_DIR}/tests/ubsan_suppressions.txt")
if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -fsanitize=undefined")
endif()

View File

@ -15,6 +15,10 @@ if (COMPILER_GCC)
elseif (COMPILER_CLANG)
# Require minimum version of clang/apple-clang
if (CMAKE_CXX_COMPILER_ID MATCHES "AppleClang")
# If you are developer you can figure out what exact versions of AppleClang are Ok,
# remove the following line and commit changes below.
message (FATAL_ERROR "AppleClang is not supported, you should install clang from brew.")
# AppleClang 10.0.1 (Xcode 10.2) corresponds to LLVM/Clang upstream version 7.0.0
# AppleClang 11.0.0 (Xcode 11.0) corresponds to LLVM/Clang upstream version 8.0.0
set (XCODE_MINIMUM_VERSION 10.2)
@ -80,3 +84,9 @@ if (LINKER_NAME)
message(STATUS "Using custom linker by name: ${LINKER_NAME}")
endif ()
if (ARCH_PPC64LE)
if (COMPILER_CLANG OR (COMPILER_GCC AND CMAKE_CXX_COMPILER_VERSION VERSION_LESS 8))
message(FATAL_ERROR "Only gcc-8 or higher is supported for powerpc architecture")
endif ()
endif ()

View File

@ -11,11 +11,11 @@ CFLAGS (GLOBAL -DDBMS_VERSION_MAJOR=${VERSION_MAJOR})
CFLAGS (GLOBAL -DDBMS_VERSION_MINOR=${VERSION_MINOR})
CFLAGS (GLOBAL -DDBMS_VERSION_PATCH=${VERSION_PATCH})
CFLAGS (GLOBAL -DVERSION_FULL=\"\\\"${VERSION_FULL}\\\"\")
CFLAGS (GLOBAL -DVERSION_MAJOR=${VERSION_MAJOR})
CFLAGS (GLOBAL -DVERSION_MINOR=${VERSION_MINOR})
CFLAGS (GLOBAL -DVERSION_MAJOR=${VERSION_MAJOR})
CFLAGS (GLOBAL -DVERSION_MINOR=${VERSION_MINOR})
CFLAGS (GLOBAL -DVERSION_PATCH=${VERSION_PATCH})
# TODO: not supported yet, not sure if ya.make supports arithmetics.
# TODO: not supported yet, not sure if ya.make supports arithmetic.
CFLAGS (GLOBAL -DVERSION_INTEGER=0)
CFLAGS (GLOBAL -DVERSION_NAME=\"\\\"${VERSION_NAME}\\\"\")

View File

@ -20,7 +20,6 @@ add_subdirectory (boost-cmake)
add_subdirectory (cctz-cmake)
add_subdirectory (consistent-hashing-sumbur)
add_subdirectory (consistent-hashing)
add_subdirectory (croaring)
add_subdirectory (FastMemcpy)
add_subdirectory (hyperscan-cmake)
add_subdirectory (jemalloc-cmake)
@ -34,6 +33,7 @@ add_subdirectory (ryu-cmake)
add_subdirectory (unixodbc-cmake)
add_subdirectory (poco-cmake)
add_subdirectory (croaring-cmake)
# TODO: refactor the contrib libraries below this comment.

1
contrib/croaring vendored Submodule

@ -0,0 +1 @@
Subproject commit 5f20740ec0de5e153e8f4cb2ab91814e8b291a14

View File

@ -0,0 +1,25 @@
set(LIBRARY_DIR ${ClickHouse_SOURCE_DIR}/contrib/croaring)
set(SRCS
${LIBRARY_DIR}/src/array_util.c
${LIBRARY_DIR}/src/bitset_util.c
${LIBRARY_DIR}/src/containers/array.c
${LIBRARY_DIR}/src/containers/bitset.c
${LIBRARY_DIR}/src/containers/containers.c
${LIBRARY_DIR}/src/containers/convert.c
${LIBRARY_DIR}/src/containers/mixed_intersection.c
${LIBRARY_DIR}/src/containers/mixed_union.c
${LIBRARY_DIR}/src/containers/mixed_equal.c
${LIBRARY_DIR}/src/containers/mixed_subset.c
${LIBRARY_DIR}/src/containers/mixed_negation.c
${LIBRARY_DIR}/src/containers/mixed_xor.c
${LIBRARY_DIR}/src/containers/mixed_andnot.c
${LIBRARY_DIR}/src/containers/run.c
${LIBRARY_DIR}/src/roaring.c
${LIBRARY_DIR}/src/roaring_priority_queue.c
${LIBRARY_DIR}/src/roaring_array.c)
add_library(roaring ${SRCS})
target_include_directories(roaring PRIVATE ${LIBRARY_DIR}/include/roaring)
target_include_directories(roaring SYSTEM BEFORE PUBLIC ${LIBRARY_DIR}/include)

View File

@ -1,6 +0,0 @@
add_library(roaring
roaring.c
roaring/roaring.h
roaring/roaring.hh)
target_include_directories (roaring SYSTEM PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})

View File

@ -1,202 +0,0 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "{}"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2016 The CRoaring authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

View File

@ -1,2 +0,0 @@
download from https://github.com/RoaringBitmap/CRoaring/archive/v0.2.57.tar.gz
and use ./amalgamation.sh generate

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@ -192,7 +192,7 @@ set(SRCS
${HDFS3_SOURCE_DIR}/common/FileWrapper.h
)
# old kernels (< 3.17) doens't have SYS_getrandom. Always use POSIX implementation to have better compatibility
# old kernels (< 3.17) doesn't have SYS_getrandom. Always use POSIX implementation to have better compatibility
set_source_files_properties(${HDFS3_SOURCE_DIR}/rpc/RpcClient.cpp PROPERTIES COMPILE_FLAGS "-DBOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX=1")
# target

@ -1 +1 @@
Subproject commit f5638e954a79f50bac7c7a5deaa5a241e0ce8b5f
Subproject commit 1485b0de3eaa1508dfe49a5ba1e4aa2a71fd8335

View File

@ -153,82 +153,19 @@ initdb()
start()
{
[ -x $CLICKHOUSE_BINDIR/$PROGRAM ] || exit 0
local EXIT_STATUS
EXIT_STATUS=0
echo -n "Start $PROGRAM service: "
if is_running; then
echo -n "already running "
EXIT_STATUS=1
else
ulimit -n 262144
mkdir -p $CLICKHOUSE_PIDDIR
chown -R $CLICKHOUSE_USER:$CLICKHOUSE_GROUP $CLICKHOUSE_PIDDIR
initdb
if ! is_running; then
# Lock should not be held while running child process, so we release the lock. Note: obviously, there is race condition.
# But clickhouse-server has protection from simultaneous runs with same data directory.
su -s $SHELL ${CLICKHOUSE_USER} -c "$FLOCK -u 9; $CLICKHOUSE_PROGRAM_ENV exec -a \"$PROGRAM\" \"$CLICKHOUSE_BINDIR/$PROGRAM\" --daemon --pid-file=\"$CLICKHOUSE_PIDFILE\" --config-file=\"$CLICKHOUSE_CONFIG\""
EXIT_STATUS=$?
if [ $EXIT_STATUS -ne 0 ]; then
return $EXIT_STATUS
fi
fi
fi
if [ $EXIT_STATUS -eq 0 ]; then
attempts=0
while ! is_running && [ $attempts -le ${CLICKHOUSE_START_TIMEOUT:=10} ]; do
attempts=$(($attempts + 1))
sleep 1
done
if is_running; then
echo "DONE"
else
echo "UNKNOWN"
fi
else
echo "FAILED"
fi
return $EXIT_STATUS
${CLICKHOUSE_GENERIC_PROGRAM} start --user "${CLICKHOUSE_USER}" --pid-path "${CLICKHOUSE_PIDDIR}" --config-path "${CLICKHOUSE_CONFDIR}" --binary-path "${CLICKHOUSE_BINDIR}"
}
stop()
{
#local EXIT_STATUS
EXIT_STATUS=0
if [ -f $CLICKHOUSE_PIDFILE ]; then
echo -n "Stop $PROGRAM service: "
kill -TERM $(cat "$CLICKHOUSE_PIDFILE")
if ! wait_for_done ${CLICKHOUSE_STOP_TIMEOUT}; then
EXIT_STATUS=2
echo "TIMEOUT"
else
echo "DONE"
fi
fi
return $EXIT_STATUS
${CLICKHOUSE_GENERIC_PROGRAM} stop --pid-path "${CLICKHOUSE_PIDDIR}"
}
restart()
{
check_config
if stop; then
if start; then
return 0
fi
fi
return 1
${CLICKHOUSE_GENERIC_PROGRAM} restart --user "${CLICKHOUSE_USER}" --pid-path "${CLICKHOUSE_PIDDIR}" --config-path "${CLICKHOUSE_CONFDIR}" --binary-path "${CLICKHOUSE_BINDIR}"
}

View File

@ -2,6 +2,7 @@
set -e
# set -x
PROGRAM=clickhouse-server
CLICKHOUSE_USER=${CLICKHOUSE_USER:=clickhouse}
CLICKHOUSE_GROUP=${CLICKHOUSE_GROUP:=${CLICKHOUSE_USER}}
# Please note that we don't support paths with whitespaces. This is rather ignorant.
@ -12,6 +13,7 @@ CLICKHOUSE_BINDIR=${CLICKHOUSE_BINDIR:=/usr/bin}
CLICKHOUSE_GENERIC_PROGRAM=${CLICKHOUSE_GENERIC_PROGRAM:=clickhouse}
EXTRACT_FROM_CONFIG=${CLICKHOUSE_GENERIC_PROGRAM}-extract-from-config
CLICKHOUSE_CONFIG=$CLICKHOUSE_CONFDIR/config.xml
CLICKHOUSE_PIDDIR=/var/run/$PROGRAM
[ -f /usr/share/debconf/confmodule ] && . /usr/share/debconf/confmodule
[ -f /etc/default/clickhouse ] && . /etc/default/clickhouse
@ -41,105 +43,5 @@ if [ "$1" = configure ] || [ -n "$not_deb_os" ]; then
fi
fi
# Make sure the administrative user exists
if ! getent passwd ${CLICKHOUSE_USER} > /dev/null; then
if [ -n "$not_deb_os" ]; then
useradd -r -s /bin/false --home-dir /nonexistent ${CLICKHOUSE_USER} > /dev/null
else
adduser --system --disabled-login --no-create-home --home /nonexistent \
--shell /bin/false --group --gecos "ClickHouse server" ${CLICKHOUSE_USER} > /dev/null
fi
fi
# if the user was created manually, make sure the group is there as well
if ! getent group ${CLICKHOUSE_GROUP} > /dev/null; then
groupadd -r ${CLICKHOUSE_GROUP} > /dev/null
fi
# make sure user is in the correct group
if ! id -Gn ${CLICKHOUSE_USER} | grep -qw ${CLICKHOUSE_USER}; then
usermod -a -G ${CLICKHOUSE_GROUP} ${CLICKHOUSE_USER} > /dev/null
fi
# check validity of user and group
if [ "$(id -u ${CLICKHOUSE_USER})" -eq 0 ]; then
echo "The ${CLICKHOUSE_USER} system user must not have uid 0 (root).
Please fix this and reinstall this package." >&2
exit 1
fi
if [ "$(id -g ${CLICKHOUSE_GROUP})" -eq 0 ]; then
echo "The ${CLICKHOUSE_USER} system user must not have root as primary group.
Please fix this and reinstall this package." >&2
exit 1
fi
if [ -x "$CLICKHOUSE_BINDIR/$EXTRACT_FROM_CONFIG" ] && [ -f "$CLICKHOUSE_CONFIG" ]; then
if [ -z "$SHELL" ]; then
SHELL="/bin/sh"
fi
CLICKHOUSE_DATADIR_FROM_CONFIG=$(su -s $SHELL ${CLICKHOUSE_USER} -c "$CLICKHOUSE_BINDIR/$EXTRACT_FROM_CONFIG --config-file=\"$CLICKHOUSE_CONFIG\" --key=path") ||:
echo "Path to data directory in ${CLICKHOUSE_CONFIG}: ${CLICKHOUSE_DATADIR_FROM_CONFIG}"
fi
CLICKHOUSE_DATADIR_FROM_CONFIG=${CLICKHOUSE_DATADIR_FROM_CONFIG:=$CLICKHOUSE_DATADIR}
if [ ! -d ${CLICKHOUSE_DATADIR_FROM_CONFIG} ]; then
mkdir -p ${CLICKHOUSE_DATADIR_FROM_CONFIG}
chown ${CLICKHOUSE_USER}:${CLICKHOUSE_GROUP} ${CLICKHOUSE_DATADIR_FROM_CONFIG}
chmod 700 ${CLICKHOUSE_DATADIR_FROM_CONFIG}
fi
if [ -d ${CLICKHOUSE_CONFDIR} ]; then
mkdir -p ${CLICKHOUSE_CONFDIR}/users.d
mkdir -p ${CLICKHOUSE_CONFDIR}/config.d
rm -fv ${CLICKHOUSE_CONFDIR}/*-preprocessed.xml ||:
fi
[ -e ${CLICKHOUSE_CONFDIR}/preprocessed ] || ln -s ${CLICKHOUSE_DATADIR_FROM_CONFIG}/preprocessed_configs ${CLICKHOUSE_CONFDIR}/preprocessed ||:
if [ ! -d ${CLICKHOUSE_LOGDIR} ]; then
mkdir -p ${CLICKHOUSE_LOGDIR}
chown root:${CLICKHOUSE_GROUP} ${CLICKHOUSE_LOGDIR}
# Allow everyone to read logs, root and clickhouse to read-write
chmod 775 ${CLICKHOUSE_LOGDIR}
fi
# Set net_admin capabilities to support introspection of "taskstats" performance metrics from the kernel
# and ipc_lock capabilities to allow mlock of clickhouse binary.
# 1. Check that "setcap" tool exists.
# 2. Check that an arbitrary program with installed capabilities can run.
# 3. Set the capabilities.
# The second is important for Docker and systemd-nspawn.
# When the container has no capabilities,
# but the executable file inside the container has capabilities,
# then attempt to run this file will end up with a cryptic "Operation not permitted" message.
TMPFILE=/tmp/test_setcap.sh
command -v setcap >/dev/null \
&& echo > $TMPFILE && chmod a+x $TMPFILE && $TMPFILE && setcap "cap_net_admin,cap_ipc_lock,cap_sys_nice+ep" $TMPFILE && $TMPFILE && rm $TMPFILE \
&& setcap "cap_net_admin,cap_ipc_lock,cap_sys_nice+ep" "${CLICKHOUSE_BINDIR}/${CLICKHOUSE_GENERIC_PROGRAM}" \
|| echo "Cannot set 'net_admin' or 'ipc_lock' or 'sys_nice' capability for clickhouse binary. This is optional. Taskstats accounting will be disabled. To enable taskstats accounting you may add the required capability later manually."
# Clean old dynamic compilation results
if [ -d "${CLICKHOUSE_DATADIR_FROM_CONFIG}/build" ]; then
rm -f ${CLICKHOUSE_DATADIR_FROM_CONFIG}/build/*.cpp ${CLICKHOUSE_DATADIR_FROM_CONFIG}/build/*.so ||:
fi
if [ -f /usr/share/debconf/confmodule ]; then
db_get clickhouse-server/default-password
defaultpassword="$RET"
if [ -n "$defaultpassword" ]; then
echo "<yandex><users><default><password>$defaultpassword</password></default></users></yandex>" > ${CLICKHOUSE_CONFDIR}/users.d/default-password.xml
chown ${CLICKHOUSE_USER}:${CLICKHOUSE_GROUP} ${CLICKHOUSE_CONFDIR}/users.d/default-password.xml
chmod 600 ${CLICKHOUSE_CONFDIR}/users.d/default-password.xml
fi
# everything went well, so now let's reset the password
db_set clickhouse-server/default-password ""
# ... done with debconf here
db_stop
fi
${CLICKHOUSE_GENERIC_PROGRAM} install --user "${CLICKHOUSE_USER}" --group "${CLICKHOUSE_GROUP}" --pid-path "${CLICKHOUSE_PIDDIR}" --config-path "${CLICKHOUSE_CONFDIR}" --binary-path "${CLICKHOUSE_BINDIR}" --log-path "${CLICKHOUSE_LOGDIR}" --data-path "${CLICKHOUSE_DATADIR}"
fi

View File

@ -31,14 +31,6 @@ RUN curl -O https://clickhouse-builds.s3.yandex.net/utils/1/dpkg-deb \
&& chmod +x dpkg-deb \
&& cp dpkg-deb /usr/bin
ENV APACHE_PUBKEY_HASH="bba6987b63c63f710fd4ed476121c588bc3812e99659d27a855f8c4d312783ee66ad6adfce238765691b04d62fa3688f"
RUN export CODENAME="$(lsb_release --codename --short | tr 'A-Z' 'a-z')" \
&& wget -nv -O /tmp/arrow-keyring.deb "https://apache.bintray.com/arrow/ubuntu/apache-arrow-archive-keyring-latest-${CODENAME}.deb" \
&& echo "${APACHE_PUBKEY_HASH} /tmp/arrow-keyring.deb" | sha384sum -c \
&& dpkg -i /tmp/arrow-keyring.deb
# Libraries from OS are only needed to test the "unbundled" build (this is not used in production).
RUN apt-get update \
&& apt-get install \

View File

@ -1,6 +1,10 @@
# docker build -t yandex/clickhouse-unbundled-builder .
FROM yandex/clickhouse-deb-builder
RUN export CODENAME="$(lsb_release --codename --short | tr 'A-Z' 'a-z')" \
&& wget -nv -O /tmp/arrow-keyring.deb "https://apache.bintray.com/arrow/ubuntu/apache-arrow-archive-keyring-latest-${CODENAME}.deb" \
&& dpkg -i /tmp/arrow-keyring.deb
# Libraries from OS are only needed to test the "unbundled" build (that is not used in production).
RUN apt-get update \
&& apt-get install \

View File

@ -0,0 +1,8 @@
# post / preinstall scripts (not needed, we do it in Dockerfile)
alpine-root/install/*
# docs (looks useless)
alpine-root/usr/share/doc/*
# packages, etc. (used by prepare.sh)
alpine-root/tgz-packages/*

1
docker/server/.gitignore vendored Normal file
View File

@ -0,0 +1 @@
alpine-root/*

View File

@ -0,0 +1,26 @@
FROM alpine
ENV LANG=en_US.UTF-8 \
LANGUAGE=en_US:en \
LC_ALL=en_US.UTF-8 \
TZ=UTC \
CLICKHOUSE_CONFIG=/etc/clickhouse-server/config.xml
COPY alpine-root/ /
# from https://github.com/ClickHouse/ClickHouse/blob/master/debian/clickhouse-server.postinst
RUN addgroup clickhouse \
&& adduser -S -H -h /nonexistent -s /bin/false -G clickhouse -g "ClickHouse server" clickhouse \
&& chown clickhouse:clickhouse /var/lib/clickhouse \
&& chmod 700 /var/lib/clickhouse \
&& chown root:clickhouse /var/log/clickhouse-server \
&& chmod 775 /var/log/clickhouse-server \
&& chmod +x /entrypoint.sh \
&& apk add --no-cache su-exec
EXPOSE 9000 8123 9009
VOLUME /var/lib/clickhouse \
/var/log/clickhouse-server
ENTRYPOINT ["/entrypoint.sh"]

59
docker/server/alpine-build.sh Executable file
View File

@ -0,0 +1,59 @@
#!/bin/bash
set -x
REPO_CHANNEL="${REPO_CHANNEL:-stable}" # lts / testing / prestable / etc
REPO_URL="${REPO_URL:-"https://repo.yandex.ru/clickhouse/tgz/${REPO_CHANNEL}"}"
VERSION="${VERSION:-20.9.3.45}"
# where original files live
DOCKER_BUILD_FOLDER="${BASH_SOURCE%/*}"
# we will create root for our image here
CONTAINER_ROOT_FOLDER="${DOCKER_BUILD_FOLDER}/alpine-root"
# where to put downloaded tgz
TGZ_PACKAGES_FOLDER="${CONTAINER_ROOT_FOLDER}/tgz-packages"
# clean up the root from old runs
rm -rf "$CONTAINER_ROOT_FOLDER"
mkdir -p "$TGZ_PACKAGES_FOLDER"
PACKAGES=( "clickhouse-client" "clickhouse-server" "clickhouse-common-static" )
# download tars from the repo
for package in "${PACKAGES[@]}"
do
wget -q --show-progress "${REPO_URL}/${package}-${VERSION}.tgz" -O "${TGZ_PACKAGES_FOLDER}/${package}-${VERSION}.tgz"
done
# unpack tars
for package in "${PACKAGES[@]}"
do
tar xvzf "${TGZ_PACKAGES_FOLDER}/${package}-${VERSION}.tgz" --strip-components=2 -C "$CONTAINER_ROOT_FOLDER"
done
# prepare few more folders
mkdir -p "${CONTAINER_ROOT_FOLDER}/etc/clickhouse-server/users.d" \
"${CONTAINER_ROOT_FOLDER}/etc/clickhouse-server/config.d" \
"${CONTAINER_ROOT_FOLDER}/var/log/clickhouse-server" \
"${CONTAINER_ROOT_FOLDER}/var/lib/clickhouse" \
"${CONTAINER_ROOT_FOLDER}/docker-entrypoint-initdb.d" \
"${CONTAINER_ROOT_FOLDER}/lib64"
cp "${DOCKER_BUILD_FOLDER}/docker_related_config.xml" "${CONTAINER_ROOT_FOLDER}/etc/clickhouse-server/config.d/"
cp "${DOCKER_BUILD_FOLDER}/entrypoint.alpine.sh" "${CONTAINER_ROOT_FOLDER}/entrypoint.sh"
## get glibc components from ubuntu 20.04 and put them to expected place
docker pull ubuntu:20.04
ubuntu20image=$(docker create --rm ubuntu:20.04)
docker cp -L ${ubuntu20image}:/lib/x86_64-linux-gnu/libc.so.6 "${CONTAINER_ROOT_FOLDER}/lib"
docker cp -L ${ubuntu20image}:/lib/x86_64-linux-gnu/libdl.so.2 "${CONTAINER_ROOT_FOLDER}/lib"
docker cp -L ${ubuntu20image}:/lib/x86_64-linux-gnu/libm.so.6 "${CONTAINER_ROOT_FOLDER}/lib"
docker cp -L ${ubuntu20image}:/lib/x86_64-linux-gnu/libpthread.so.0 "${CONTAINER_ROOT_FOLDER}/lib"
docker cp -L ${ubuntu20image}:/lib/x86_64-linux-gnu/librt.so.1 "${CONTAINER_ROOT_FOLDER}/lib"
docker cp -L ${ubuntu20image}:/lib/x86_64-linux-gnu/libnss_dns.so.2 "${CONTAINER_ROOT_FOLDER}/lib"
docker cp -L ${ubuntu20image}:/lib/x86_64-linux-gnu/libresolv.so.2 "${CONTAINER_ROOT_FOLDER}/lib"
docker cp -L ${ubuntu20image}:/lib64/ld-linux-x86-64.so.2 "${CONTAINER_ROOT_FOLDER}/lib64"
docker build "$DOCKER_BUILD_FOLDER" -f Dockerfile.alpine -t "yandex/clickhouse-server:${VERSION}-alpine" --pull

View File

@ -0,0 +1,152 @@
#!/bin/sh
#set -x
DO_CHOWN=1
if [ "$CLICKHOUSE_DO_NOT_CHOWN" = 1 ]; then
DO_CHOWN=0
fi
CLICKHOUSE_UID="${CLICKHOUSE_UID:-"$(id -u clickhouse)"}"
CLICKHOUSE_GID="${CLICKHOUSE_GID:-"$(id -g clickhouse)"}"
# support --user
if [ "$(id -u)" = "0" ]; then
USER=$CLICKHOUSE_UID
GROUP=$CLICKHOUSE_GID
# busybox has setuidgid & chpst buildin
gosu="su-exec $USER:$GROUP"
else
USER="$(id -u)"
GROUP="$(id -g)"
gosu=""
DO_CHOWN=0
fi
# set some vars
CLICKHOUSE_CONFIG="${CLICKHOUSE_CONFIG:-/etc/clickhouse-server/config.xml}"
# port is needed to check if clickhouse-server is ready for connections
HTTP_PORT="$(clickhouse extract-from-config --config-file $CLICKHOUSE_CONFIG --key=http_port)"
# get CH directories locations
DATA_DIR="$(clickhouse extract-from-config --config-file $CLICKHOUSE_CONFIG --key=path || true)"
TMP_DIR="$(clickhouse extract-from-config --config-file $CLICKHOUSE_CONFIG --key=tmp_path || true)"
USER_PATH="$(clickhouse extract-from-config --config-file $CLICKHOUSE_CONFIG --key=user_files_path || true)"
LOG_PATH="$(clickhouse extract-from-config --config-file $CLICKHOUSE_CONFIG --key=logger.log || true)"
LOG_DIR="$(dirname $LOG_PATH || true)"
ERROR_LOG_PATH="$(clickhouse extract-from-config --config-file $CLICKHOUSE_CONFIG --key=logger.errorlog || true)"
ERROR_LOG_DIR="$(dirname $ERROR_LOG_PATH || true)"
FORMAT_SCHEMA_PATH="$(clickhouse extract-from-config --config-file $CLICKHOUSE_CONFIG --key=format_schema_path || true)"
CLICKHOUSE_USER="${CLICKHOUSE_USER:-default}"
CLICKHOUSE_PASSWORD="${CLICKHOUSE_PASSWORD:-}"
CLICKHOUSE_DB="${CLICKHOUSE_DB:-}"
for dir in "$DATA_DIR" \
"$ERROR_LOG_DIR" \
"$LOG_DIR" \
"$TMP_DIR" \
"$USER_PATH" \
"$FORMAT_SCHEMA_PATH"
do
# check if variable not empty
[ -z "$dir" ] && continue
# ensure directories exist
if ! mkdir -p "$dir"; then
echo "Couldn't create necessary directory: $dir"
exit 1
fi
if [ "$DO_CHOWN" = "1" ]; then
# ensure proper directories permissions
chown -R "$USER:$GROUP" "$dir"
elif [ "$(stat -c %u "$dir")" != "$USER" ]; then
echo "Necessary directory '$dir' isn't owned by user with id '$USER'"
exit 1
fi
done
# if clickhouse user is defined - create it (user "default" already exists out of box)
if [ -n "$CLICKHOUSE_USER" ] && [ "$CLICKHOUSE_USER" != "default" ] || [ -n "$CLICKHOUSE_PASSWORD" ]; then
echo "$0: create new user '$CLICKHOUSE_USER' instead 'default'"
cat <<EOT > /etc/clickhouse-server/users.d/default-user.xml
<yandex>
<!-- Docs: <https://clickhouse.tech/docs/en/operations/settings/settings_users/> -->
<users>
<!-- Remove default user -->
<default remove="remove">
</default>
<${CLICKHOUSE_USER}>
<profile>default</profile>
<networks>
<ip>::/0</ip>
</networks>
<password>${CLICKHOUSE_PASSWORD}</password>
<quota>default</quota>
</${CLICKHOUSE_USER}>
</users>
</yandex>
EOT
fi
if [ -n "$(ls /docker-entrypoint-initdb.d/)" ] || [ -n "$CLICKHOUSE_DB" ]; then
# Listen only on localhost until the initialization is done
$gosu /usr/bin/clickhouse-server --config-file=$CLICKHOUSE_CONFIG -- --listen_host=127.0.0.1 &
pid="$!"
# check if clickhouse is ready to accept connections
# will try to send ping clickhouse via http_port (max 6 retries, with 1 sec timeout and 1 sec delay between retries)
tries=6
while ! wget --spider -T 1 -q "http://localhost:$HTTP_PORT/ping" 2>/dev/null; do
if [ "$tries" -le "0" ]; then
echo >&2 'ClickHouse init process failed.'
exit 1
fi
tries=$(( tries-1 ))
sleep 1
done
if [ ! -z "$CLICKHOUSE_PASSWORD" ]; then
printf -v WITH_PASSWORD '%s %q' "--password" "$CLICKHOUSE_PASSWORD"
fi
clickhouseclient="clickhouse-client --multiquery -u $CLICKHOUSE_USER $WITH_PASSWORD "
# create default database, if defined
if [ -n "$CLICKHOUSE_DB" ]; then
echo "$0: create database '$CLICKHOUSE_DB'"
"$clickhouseclient" -q "CREATE DATABASE IF NOT EXISTS $CLICKHOUSE_DB";
fi
for f in /docker-entrypoint-initdb.d/*; do
case "$f" in
*.sh)
if [ -x "$f" ]; then
echo "$0: running $f"
"$f"
else
echo "$0: sourcing $f"
. "$f"
fi
;;
*.sql) echo "$0: running $f"; cat "$f" | "$clickhouseclient" ; echo ;;
*.sql.gz) echo "$0: running $f"; gunzip -c "$f" | "$clickhouseclient"; echo ;;
*) echo "$0: ignoring $f" ;;
esac
echo
done
if ! kill -s TERM "$pid" || ! wait "$pid"; then
echo >&2 'Finishing of ClickHouse init process failed.'
exit 1
fi
fi
# if no args passed to `docker run` or first argument start with `--`, then the user is passing clickhouse-server arguments
if [[ $# -lt 1 ]] || [[ "$1" == "--"* ]]; then
exec $gosu /usr/bin/clickhouse-server --config-file=$CLICKHOUSE_CONFIG "$@"
fi
# Otherwise, we assume the user want to run his own process, for example a `bash` shell to explore this image
exec "$@"

View File

@ -82,6 +82,7 @@ RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
ENV COMMIT_SHA=''
ENV PULL_REQUEST_NUMBER=''
ENV COPY_CLICKHOUSE_BINARY_TO_OUTPUT=0
COPY run.sh /
CMD ["/bin/bash", "/run.sh"]

View File

@ -20,6 +20,7 @@ FASTTEST_SOURCE=$(readlink -f "${FASTTEST_SOURCE:-$FASTTEST_WORKSPACE/ch}")
FASTTEST_BUILD=$(readlink -f "${FASTTEST_BUILD:-${BUILD:-$FASTTEST_WORKSPACE/build}}")
FASTTEST_DATA=$(readlink -f "${FASTTEST_DATA:-$FASTTEST_WORKSPACE/db-fasttest}")
FASTTEST_OUTPUT=$(readlink -f "${FASTTEST_OUTPUT:-$FASTTEST_WORKSPACE}")
PATH="$FASTTEST_BUILD/programs:$FASTTEST_SOURCE/tests:$PATH"
# Export these variables, so that all subsequent invocations of the script
# use them, and not try to guess them anew, which leads to weird effects.
@ -28,6 +29,7 @@ export FASTTEST_SOURCE
export FASTTEST_BUILD
export FASTTEST_DATA
export FASTTEST_OUT
export PATH
server_pid=none
@ -125,7 +127,7 @@ function clone_submodules
(
cd "$FASTTEST_SOURCE"
SUBMODULES_TO_UPDATE=(contrib/boost contrib/zlib-ng contrib/libxml2 contrib/poco contrib/libunwind contrib/ryu contrib/fmtlib contrib/base64 contrib/cctz contrib/libcpuid contrib/double-conversion contrib/libcxx contrib/libcxxabi contrib/libc-headers contrib/lz4 contrib/zstd contrib/fastops contrib/rapidjson contrib/re2 contrib/sparsehash-c11)
SUBMODULES_TO_UPDATE=(contrib/boost contrib/zlib-ng contrib/libxml2 contrib/poco contrib/libunwind contrib/ryu contrib/fmtlib contrib/base64 contrib/cctz contrib/libcpuid contrib/double-conversion contrib/libcxx contrib/libcxxabi contrib/libc-headers contrib/lz4 contrib/zstd contrib/fastops contrib/rapidjson contrib/re2 contrib/sparsehash-c11 contrib/croaring)
git submodule sync
git submodule update --init --recursive "${SUBMODULES_TO_UPDATE[@]}"
@ -137,7 +139,14 @@ git submodule foreach git clean -xfd
function run_cmake
{
CMAKE_LIBS_CONFIG=("-DENABLE_LIBRARIES=0" "-DENABLE_TESTS=0" "-DENABLE_UTILS=0" "-DENABLE_EMBEDDED_COMPILER=0" "-DENABLE_THINLTO=0" "-DUSE_UNWIND=1")
CMAKE_LIBS_CONFIG=(
"-DENABLE_LIBRARIES=0"
"-DENABLE_TESTS=0"
"-DENABLE_UTILS=0"
"-DENABLE_EMBEDDED_COMPILER=0"
"-DENABLE_THINLTO=0"
"-DUSE_UNWIND=1"
)
# TODO remove this? we don't use ccache anyway. An option would be to download it
# from S3 simultaneously with cloning.
@ -163,6 +172,9 @@ function build
(
cd "$FASTTEST_BUILD"
time ninja clickhouse-bundle | ts '%Y-%m-%d %H:%M:%S' | tee "$FASTTEST_OUTPUT/build_log.txt"
if [ "$COPY_CLICKHOUSE_BINARY_TO_OUTPUT" -eq "1" ]; then
cp programs/clickhouse "$FASTTEST_OUTPUT/clickhouse"
fi
ccache --show-stats ||:
)
}
@ -219,6 +231,8 @@ TESTS_TO_SKIP=(
01268_dictionary_direct_layout
01280_ssd_complex_key_dictionary
01281_group_by_limit_memory_tracking # max_memory_usage_for_user can interfere another queries running concurrently
01318_encrypt # Depends on OpenSSL
01318_decrypt # Depends on OpenSSL
01281_unsucceeded_insert_select_queries_counter
01292_create_user
01294_lazy_database_concurrent
@ -258,6 +272,13 @@ TESTS_TO_SKIP=(
# Look at DistributedFilesToInsert, so cannot run in parallel.
01460_DistributedFilesToInsert
01541_max_memory_usage_for_user
# Require python libraries like scipy, pandas and numpy
01322_ttest_scipy
01545_system_errors
)
time clickhouse-test -j 8 --order=random --no-long --testname --shard --zookeeper --skip "${TESTS_TO_SKIP[@]}" 2>&1 | ts '%Y-%m-%d %H:%M:%S' | tee "$FASTTEST_OUTPUT/test_log.txt"
@ -327,8 +348,6 @@ case "$stage" in
;&
"build")
build
PATH="$FASTTEST_BUILD/programs:$FASTTEST_SOURCE/tests:$PATH"
export PATH
;&
"configure")
# The `install_log.txt` is also needed for compatibility with old CI task --

View File

@ -164,7 +164,7 @@ case "$stage" in
# Lost connection to the server. This probably means that the server died
# with abort.
echo "failure" > status.txt
if ! grep -ao "Received signal.*\|Logical error.*\|Assertion.*failed" server.log > description.txt
if ! grep -ao "Received signal.*\|Logical error.*\|Assertion.*failed\|Failed assertion.*" server.log > description.txt
then
echo "Lost connection to server. See the logs" > description.txt
fi

View File

@ -17,7 +17,8 @@ RUN apt-get update \
sqlite3 \
curl \
tar \
krb5-user
krb5-user \
iproute2
RUN rm -rf \
/var/lib/apt/lists/* \
/var/cache/debconf \

View File

@ -37,7 +37,28 @@ RUN apt-get update \
ENV TZ=Europe/Moscow
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN python3 -m pip install urllib3==1.23 pytest docker-compose==1.22.0 docker dicttoxml kazoo PyMySQL psycopg2==2.7.5 pymongo tzlocal kafka-python protobuf redis aerospike pytest-timeout minio grpcio grpcio-tools cassandra-driver confluent-kafka avro
RUN python3 -m pip install \
PyMySQL \
aerospike \
avro \
cassandra-driver \
confluent-kafka \
dicttoxml \
docker \
docker-compose==1.22.0 \
grpcio \
grpcio-tools \
kafka-python \
kazoo \
minio \
protobuf \
psycopg2-binary==2.7.5 \
pymongo \
pytest \
pytest-timeout \
redis \
tzlocal \
urllib3
ENV DOCKER_CHANNEL stable
ENV DOCKER_VERSION 17.09.1-ce

View File

@ -9,6 +9,7 @@ RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install --yes --no-install-recommends \
bash \
curl \
dmidecode \
g++ \
gdb \
git \
@ -37,7 +38,18 @@ RUN apt-get update \
COPY * /
CMD /entrypoint.sh
# Bind everything to one NUMA node, if there's more than one. Theoretically the
# node #0 should be less stable because of system interruptions. We bind
# randomly to node 1 or 0 to gather some statistics on that. We have to bind
# both servers and the tmpfs on which the database is stored. How to do it
# through Yandex Sandbox API is unclear, but by default tmpfs uses
# 'process allocation policy', not sure which process but hopefully the one that
# writes to it, so just bind the downloader script as well. We could also try to
# remount it with proper options in Sandbox task.
# https://www.kernel.org/doc/Documentation/filesystems/tmpfs.txt
# Double-escaped backslashes are a tribute to the engineering wonder of docker --
# it gives '/bin/sh: 1: [bash,: not found' otherwise.
CMD ["bash", "-c", "node=$((RANDOM % $(numactl --hardware | sed -n 's/^.*available:\\(.*\\)nodes.*$/\\1/p'))); echo Will bind to NUMA node $node; numactl --cpunodebind=$node --membind=$node /entrypoint.sh"]
# docker run --network=host --volume <workspace>:/workspace --volume=<output>:/output -e PR_TO_TEST=<> -e SHA_TO_TEST=<> yandex/clickhouse-performance-comparison

View File

@ -48,12 +48,13 @@ This table shows queries that take significantly longer to process on the client
#### Unexpected Query Duration
Action required for every item -- these are errors that must be fixed.
Queries that have "short" duration (on the order of 0.1 s) can't be reliably tested in a normal way, where we perform a small (about ten) measurements for each server, because the signal-to-noise ratio is much smaller. There is a special mode for such queries that instead runs them for a fixed amount of time, normally with much higher number of measurements (up to thousands). This mode must be explicitly enabled by the test author to avoid accidental errors. It must be used only for queries that are meant to complete "immediately", such as `select count(*)`. If your query is not supposed to be "immediate", try to make it run longer, by e.g. processing more data.
A query is supposed to run longer than 0.1 second. If your query runs faster, increase the amount of processed data to bring the run time above this threshold. You can use a bigger table (e.g. `hits_100m` instead of `hits_10m`), increase a `LIMIT`, make a query single-threaded, and so on. Queries that are too fast suffer from poor stability and precision.
This table shows queries for which the "short" marking is not consistent with the actual query run time -- i.e., a query runs for a long time but is marked as short, or it runs very fast but is not marked as short.
Sometimes you want to test a query that is supposed to complete "instantaneously", i.e. in sublinear time. This might be `count(*)`, or parsing a complicated tuple. It might not be practical or even possible to increase the run time of such queries by adding more data. For such queries there is a specal comparison mode which runs them for a fixed amount of time, instead of a fixed number of iterations like we do normally. This mode is inferior to the normal mode, because the influence of noise and overhead is higher, which leads to less precise and stable results.
If your query is really supposed to complete "immediately" and can't be made to run longer, you have to mark it as "short". To do so, write `<query short="1">...` in the test file. The value of "short" attribute is evaluated as a python expression, and substitutions are performed, so you can write something like `<query short="{column1} = {column2}">select count(*) from table where {column1} > {column2}</query>`, to mark only a particular combination of variables as short.
If it is impossible to increase the run time of a query and it is supposed to complete "immediately", you have to explicitly mark this in the test. To do so, add a `short` attribute to the query tag in the test file: `<query short="1">...`. The value of the `short` attribute is evaluated as a python expression, and substitutions are performed, so you can write something like `<query short="{column1} = {column2}">select count(*) from table where {column1} > {column2}</query>`, to mark only a particular combination of variables as short.
This table shows queries for which the `short` marking is not consistent with the actual query run time -- i.e., a query runs for a normal time but is marked as `short`, or it runs faster than normal but is not marked as `short`.
#### Partial Queries
Action required for the cells marked in red.

View File

@ -63,7 +63,7 @@ function configure
# Make copies of the original db for both servers. Use hardlinks instead
# of copying to save space. Before that, remove preprocessed configs and
# system tables, because sharing them between servers with hardlinks may
# lead to weird effects.
# lead to weird effects.
rm -r left/db ||:
rm -r right/db ||:
rm -r db0/preprocessed_configs ||:
@ -77,20 +77,30 @@ function restart
while killall clickhouse-server; do echo . ; sleep 1 ; done
echo all killed
# Change the jemalloc settings here.
# https://github.com/jemalloc/jemalloc/wiki/Getting-Started
export MALLOC_CONF="confirm_conf:true"
set -m # Spawn servers in their own process groups
left/clickhouse-server --config-file=left/config/config.xml -- --path left/db --user_files_path left/db/user_files &>> left-server-log.log &
left/clickhouse-server --config-file=left/config/config.xml \
-- --path left/db --user_files_path left/db/user_files \
&>> left-server-log.log &
left_pid=$!
kill -0 $left_pid
disown $left_pid
right/clickhouse-server --config-file=right/config/config.xml -- --path right/db --user_files_path right/db/user_files &>> right-server-log.log &
right/clickhouse-server --config-file=right/config/config.xml \
-- --path right/db --user_files_path right/db/user_files \
&>> right-server-log.log &
right_pid=$!
kill -0 $right_pid
disown $right_pid
set +m
unset MALLOC_CONF
wait_for_server 9001 $left_pid
echo left ok
@ -198,7 +208,7 @@ function run_tests
echo test "$test_name"
# Don't profile if we're past the time limit.
# Use awk because bash doesn't support floating point arithmetics.
# Use awk because bash doesn't support floating point arithmetic.
profile_seconds=$(awk "BEGIN { print ($profile_seconds_left > 0 ? 10 : 0) }")
TIMEFORMAT=$(printf "$test_name\t%%3R\t%%3U\t%%3S\n")
@ -449,7 +459,12 @@ wait
unset IFS
)
parallel --joblog analyze/parallel-log.txt --null < analyze/commands.txt 2>> analyze/errors.log
# The comparison script might be bound to one NUMA node for better test
# stability, and the calculation runs out of memory because of this. Use
# all nodes.
numactl --show
numactl --cpunodebind=all --membind=all numactl --show
numactl --cpunodebind=all --membind=all parallel --joblog analyze/parallel-log.txt --null < analyze/commands.txt 2>> analyze/errors.log
clickhouse-local --query "
-- Join the metric names back to the metric statistics we've calculated, and make
@ -526,10 +541,10 @@ create table queries engine File(TSVWithNamesAndTypes, 'report/queries.tsv')
as select
abs(diff) > report_threshold and abs(diff) > stat_threshold as changed_fail,
abs(diff) > report_threshold - 0.05 and abs(diff) > stat_threshold as changed_show,
not changed_fail and stat_threshold > report_threshold + 0.10 as unstable_fail,
not changed_show and stat_threshold > report_threshold - 0.05 as unstable_show,
left, right, diff, stat_threshold,
if(report_threshold > 0, report_threshold, 0.10) as report_threshold,
query_metric_stats.test test, query_metric_stats.query_index query_index,
@ -752,7 +767,7 @@ create table all_tests_report engine File(TSV, 'report/all-queries.tsv') as
-- The threshold for 2) is significantly larger than the threshold for 1), to
-- avoid jitter.
create view shortness
as select
as select
(test, query_index) in
(select * from file('analyze/marked-short-queries.tsv', TSV,
'test text, query_index int'))
@ -1070,8 +1085,10 @@ case "$stage" in
time configure
;&
"restart")
numactl --show ||:
numactl --hardware ||:
lscpu ||:
dmidecode -t 4 ||:
time restart
;&
"run_tests")

View File

@ -14,6 +14,9 @@
we might also add time check to perf.py script.
-->
<max_execution_time>300</max_execution_time>
<!-- One NUMA node w/o hyperthreading -->
<max_threads>20</max_threads>
</default>
</profiles>
</yandex>

View File

@ -468,14 +468,14 @@ if args.report == 'main':
return
columns = [
'Test', #0
'Wall clock time,&nbsp;s', #1
'Total client time,&nbsp;s', #2
'Total queries', #3
'Longest query<br>(sum for all runs),&nbsp;s', #4
'Avg wall clock time<br>(sum for all runs),&nbsp;s', #5
'Shortest query<br>(sum for all runs),&nbsp;s', #6
'', # Runs #7
'Test', #0
'Wall clock time, entire test,&nbsp;s', #1
'Total client time for measured query runs,&nbsp;s', #2
'Queries', #3
'Longest query, total for measured runs,&nbsp;s', #4
'Wall clock time per query,&nbsp;s', #5
'Shortest query, total for measured runs,&nbsp;s', #6
'', # Runs #7
]
attrs = ['' for c in columns]
attrs[7] = None

View File

@ -16,6 +16,7 @@ RUN apt-get update -y \
python3-lxml \
python3-requests \
python3-termcolor \
python3-pip \
qemu-user-static \
sudo \
telnet \
@ -23,6 +24,8 @@ RUN apt-get update -y \
unixodbc \
wget
RUN pip3 install numpy scipy pandas
RUN mkdir -p /tmp/clickhouse-odbc-tmp \
&& wget -nv -O - ${odbc_driver_url} | tar --strip-components=1 -xz -C /tmp/clickhouse-odbc-tmp \
&& cp /tmp/clickhouse-odbc-tmp/lib64/*.so /usr/local/lib/ \
@ -33,5 +36,8 @@ RUN mkdir -p /tmp/clickhouse-odbc-tmp \
ENV TZ=Europe/Moscow
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
ENV NUM_TRIES=1
ENV MAX_RUN_TIME=0
COPY run.sh /
CMD ["/bin/bash", "/run.sh"]

View File

@ -1,6 +1,7 @@
#!/bin/bash
set -e -x
# fail on errors, verbose and export all env variables
set -e -x -a
dpkg -i package_folder/clickhouse-common-static_*.deb
dpkg -i package_folder/clickhouse-common-static-dbg_*.deb
@ -17,8 +18,26 @@ if grep -q -- "--use-skip-list" /usr/bin/clickhouse-test; then
SKIP_LIST_OPT="--use-skip-list"
fi
# We can have several additional options so we path them as array because it's
# more idiologically correct.
read -ra ADDITIONAL_OPTIONS <<< "${ADDITIONAL_OPTIONS:-}"
function run_tests()
{
# We can have several additional options so we path them as array because it's
# more idiologically correct.
read -ra ADDITIONAL_OPTIONS <<< "${ADDITIONAL_OPTIONS:-}"
clickhouse-test --testname --shard --zookeeper --hung-check --print-time "$SKIP_LIST_OPT" "${ADDITIONAL_OPTIONS[@]}" "$SKIP_TESTS_OPTION" 2>&1 | ts '%Y-%m-%d %H:%M:%S' | tee test_output/test_result.txt
# Skip these tests, because they fail when we rerun them multiple times
if [ "$NUM_TRIES" -gt "1" ]; then
ADDITIONAL_OPTIONS+=('--skip')
ADDITIONAL_OPTIONS+=('00000_no_tests_to_skip')
fi
for i in $(seq 1 $NUM_TRIES); do
clickhouse-test --testname --shard --zookeeper --hung-check --print-time "$SKIP_LIST_OPT" "${ADDITIONAL_OPTIONS[@]}" 2>&1 | ts '%Y-%m-%d %H:%M:%S' | tee -a test_output/test_result.txt
if [ ${PIPESTATUS[0]} -ne "0" ]; then
break;
fi
done
}
export -f run_tests
timeout $MAX_RUN_TIME bash -c run_tests ||:

View File

@ -58,6 +58,7 @@ RUN apt-get --allow-unauthenticated update -y \
python3-lxml \
python3-requests \
python3-termcolor \
python3-pip \
qemu-user-static \
sudo \
telnet \
@ -68,6 +69,8 @@ RUN apt-get --allow-unauthenticated update -y \
wget \
zlib1g-dev
RUN pip3 install numpy scipy pandas
RUN mkdir -p /tmp/clickhouse-odbc-tmp \
&& wget -nv -O - ${odbc_driver_url} | tar --strip-components=1 -xz -C /tmp/clickhouse-odbc-tmp \
&& cp /tmp/clickhouse-odbc-tmp/lib64/*.so /usr/local/lib/ \

View File

@ -45,7 +45,7 @@ function start()
# for clickhouse-server (via service)
echo "ASAN_OPTIONS='malloc_context_size=10 verbosity=1 allocator_release_to_os_interval_ms=10000'" >> /etc/environment
# for clickhouse-client
export ASAN_OPTIONS='malloc_context_size=10 verbosity=1 allocator_release_to_os_interval_ms=10000'
export ASAN_OPTIONS='malloc_context_size=10 allocator_release_to_os_interval_ms=10000'
start

View File

@ -28,8 +28,18 @@ def get_options(i):
options = ""
if 0 < i:
options += " --order=random"
if i % 2 == 1:
options += " --db-engine=Ordinary"
# If database name is not specified, new database is created for each functional test.
# Run some threads with one database for all tests.
if i % 3 == 1:
options += " --database=test_{}".format(i)
if i == 13:
options += " --client-option='memory_tracker_fault_probability=0.00001'"
return options

View File

@ -35,7 +35,7 @@ RUN apt-get update \
ENV TZ=Europe/Moscow
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN pip3 install urllib3 testflows==1.6.48 docker-compose docker dicttoxml kazoo tzlocal
RUN pip3 install urllib3 testflows==1.6.59 docker-compose docker dicttoxml kazoo tzlocal
ENV DOCKER_CHANNEL stable
ENV DOCKER_VERSION 17.09.1-ce
@ -72,5 +72,5 @@ RUN set -x \
VOLUME /var/lib/docker
EXPOSE 2375
ENTRYPOINT ["dockerd-entrypoint.sh"]
CMD ["sh", "-c", "python3 regression.py --no-color --local --clickhouse-binary-path ${CLICKHOUSE_TESTS_SERVER_BIN_PATH} --log test.log ${TESTFLOWS_OPTS}; cat test.log | tfs report results --format json > results.json"]
CMD ["sh", "-c", "python3 regression.py --no-color -o classic --local --clickhouse-binary-path ${CLICKHOUSE_TESTS_SERVER_BIN_PATH} --log test.log ${TESTFLOWS_OPTS}; cat test.log | tfs report results --format json > results.json"]

View File

@ -195,7 +195,7 @@ Templates:
- [Function](_description_templates/template-function.md)
- [Setting](_description_templates/template-setting.md)
- [Table engine](_description_templates/template-table-engine.md)
- [Database or Table engine](_description_templates/template-engine.md)
- [System table](_description_templates/template-system-table.md)

View File

@ -1,8 +1,14 @@
# EngineName {#enginename}
- What the engine does.
- What the Database/Table engine does.
- Relations with other engines if they exist.
## Creating a Database {#creating-a-database}
``` sql
CREATE DATABASE ...
```
or
## Creating a Table {#creating-a-table}
``` sql
CREATE TABLE ...
@ -10,12 +16,19 @@
**Engine Parameters**
**Query Clauses**
**Query Clauses** (for Table engines only)
## Virtual columns {#virtual-columns}
## Virtual columns {#virtual-columns} (for Table engines only)
List and virtual columns with description, if they exist.
## Data Types Support {#data_types-support} (for Database engines only)
| EngineName | ClickHouse |
|-----------------------|------------------------------------|
| NativeDataTypeName | [ClickHouseDataTypeName](link#) |
## Specifics and recommendations {#specifics-and-recommendations}
Algorithms

View File

@ -18,4 +18,14 @@ toc_title: Cloud
- Encryption and isolation
- Automated maintenance
## Altinity.Cloud {#altinity.cloud}
[Altinity.Cloud](https://altinity.com/cloud-database/) is a fully managed ClickHouse-as-a-Service for the Amazon public cloud.
- Fast deployment of ClickHouse clusters on Amazon resources
- Easy scale-out/scale-in as well as vertical scaling of nodes
- Isolated per-tenant VPCs with public endpoint or VPC peering
- Configurable storage types and volume configurations
- Cross-AZ scaling for performance and high availability
- Built-in monitoring and SQL query editor
{## [Original article](https://clickhouse.tech/docs/en/commercial/cloud/) ##}

View File

@ -189,7 +189,7 @@ Replication is implemented in the `ReplicatedMergeTree` storage engine. The path
Replication uses an asynchronous multi-master scheme. You can insert data into any replica that has a session with `ZooKeeper`, and data is replicated to all other replicas asynchronously. Because ClickHouse doesnt support UPDATEs, replication is conflict-free. As there is no quorum acknowledgment of inserts, just-inserted data might be lost if one node fails.
Metadata for replication is stored in ZooKeeper. There is a replication log that lists what actions to do. Actions are: get part; merge parts; drop a partition, and so on. Each replica copies the replication log to its queue and then executes the actions from the queue. For example, on insertion, the “get the part” action is created in the log, and every replica downloads that part. Merges are coordinated between replicas to get byte-identical results. All parts are merged in the same way on all replicas. It is achieved by electing one replica as the leader, and that replica initiates merges and writes “merge parts” actions to the log.
Metadata for replication is stored in ZooKeeper. There is a replication log that lists what actions to do. Actions are: get part; merge parts; drop a partition, and so on. Each replica copies the replication log to its queue and then executes the actions from the queue. For example, on insertion, the “get the part” action is created in the log, and every replica downloads that part. Merges are coordinated between replicas to get byte-identical results. All parts are merged in the same way on all replicas. One of the leaders initiates a new merge first and writes “merge parts” actions to the log. Multiple replicas (or all) can be leaders at the same time. A replica can be prevented from becoming a leader using the `merge_tree` setting `replicated_can_become_leader`. The leaders are responsible for scheduling background merges.
Replication is physical: only compressed parts are transferred between nodes, not queries. Merges are processed on each replica independently in most cases to lower the network costs by avoiding network amplification. Large merged parts are sent over the network only in cases of significant replication lag.

View File

@ -47,6 +47,8 @@ select x; -- { serverError 49 }
```
This test ensures that the server returns an error with code 49 about unknown column `x`. If there is no error, or the error is different, the test will fail. If you want to ensure that an error occurs on the client side, use `clientError` annotation instead.
Do not check for a particular wording of error message, it may change in the future, and the test will needlessly break. Check only the error code. If the existing error code is not precise enough for your needs, consider adding a new one.
### Testing a Distributed Query
If you want to use distributed queries in functional tests, you can leverage `remote` table function with `127.0.0.{1..2}` addresses for the server to query itself; or you can use predefined test clusters in server configuration file like `test_shard_localhost`. Remember to add the words `shard` or `distributed` to the test name, so that it is ran in CI in correct configurations, where the server is configured to support distributed queries.

View File

@ -51,7 +51,7 @@ Optional parameters:
- `rabbitmq_row_delimiter` Delimiter character, which ends the message.
- `rabbitmq_schema` Parameter that must be used if the format requires a schema definition. For example, [Capn Proto](https://capnproto.org/) requires the path to the schema file and the name of the root `schema.capnp:Message` object.
- `rabbitmq_num_consumers` The number of consumers per table. Default: `1`. Specify more consumers if the throughput of one consumer is insufficient.
- `rabbitmq_num_queues` The number of queues per consumer. Default: `1`. Specify more queues if the capacity of one queue per consumer is insufficient.
- `rabbitmq_num_queues` Total number of queues. Default: `1`. Increasing this number can significantly improve performance.
- `rabbitmq_queue_base` - Specify a hint for queue names. Use cases of this setting are described below.
- `rabbitmq_deadletter_exchange` - Specify name for a [dead letter exchange](https://www.rabbitmq.com/dlx.html). You can create another table with this exchange name and collect messages in cases when they are republished to dead letter exchange. By default dead letter exchange is not specified.
- `rabbitmq_persistent` - If set to 1 (true), in insert query delivery mode will be set to 2 (marks messages as 'persistent'). Default: `0`.
@ -148,4 +148,5 @@ Example:
- `_channel_id` - ChannelID, on which consumer, who received the message, was declared.
- `_delivery_tag` - DeliveryTag of the received message. Scoped per channel.
- `_redelivered` - `redelivered` flag of the message.
- `_message_id` - MessageID of the received message; non-empty if was set, when message was published.
- `_message_id` - messageID of the received message; non-empty if was set, when message was published.
- `_timestamp` - timestamp of the received message; non-empty if was set, when message was published.

View File

@ -88,7 +88,7 @@ For a description of parameters, see the [CREATE query description](../../../sql
- `index_granularity` — Maximum number of data rows between the marks of an index. Default value: 8192. See [Data Storage](#mergetree-data-storage).
- `index_granularity_bytes` — Maximum size of data granules in bytes. Default value: 10Mb. To restrict the granule size only by number of rows, set to 0 (not recommended). See [Data Storage](#mergetree-data-storage).
- `min_index_granularity_bytes` — Min allowed size of data granules in bytes. Default value: 1024b. To provide safeguard against accidentally creating tables with very low index_granularity_bytes. See [Data Storage](#mergetree-data-storage).
- `min_index_granularity_bytes` — Min allowed size of data granules in bytes. Default value: 1024b. To provide a safeguard against accidentally creating tables with very low index_granularity_bytes. See [Data Storage](#mergetree-data-storage).
- `enable_mixed_granularity_parts` — Enables or disables transitioning to control the granule size with the `index_granularity_bytes` setting. Before version 19.11, there was only the `index_granularity` setting for restricting granule size. The `index_granularity_bytes` setting improves ClickHouse performance when selecting data from tables with big rows (tens and hundreds of megabytes). If you have tables with big rows, you can enable this setting for the tables to improve the efficiency of `SELECT` queries.
- `use_minimalistic_part_header_in_zookeeper` — Storage method of the data parts headers in ZooKeeper. If `use_minimalistic_part_header_in_zookeeper=1`, then ZooKeeper stores less data. For more information, see the [setting description](../../../operations/server-configuration-parameters/settings.md#server-settings-use_minimalistic_part_header_in_zookeeper) in “Server configuration parameters”.
- `min_merge_bytes_to_use_direct_io` — The minimum data volume for merge operation that is required for using direct I/O access to the storage disk. When merging data parts, ClickHouse calculates the total storage volume of all the data to be merged. If the volume exceeds `min_merge_bytes_to_use_direct_io` bytes, ClickHouse reads and writes the data to the storage disk using the direct I/O interface (`O_DIRECT` option). If `min_merge_bytes_to_use_direct_io = 0`, then direct I/O is disabled. Default value: `10 * 1024 * 1024 * 1024` bytes.

View File

@ -117,7 +117,9 @@ CREATE TABLE table_name
</details>
As the example shows, these parameters can contain substitutions in curly brackets. The substituted values are taken from the macros section of the configuration file. Example:
As the example shows, these parameters can contain substitutions in curly brackets. The substituted values are taken from the «[macros](../../../operations/server-configuration-parameters/settings/#macros) section of the configuration file.
Example:
``` xml
<macros>
@ -137,12 +139,40 @@ In this case, the path consists of the following parts:
`table_name` is the name of the node for the table in ZooKeeper. It is a good idea to make it the same as the table name. It is defined explicitly, because in contrast to the table name, it doesnt change after a RENAME query.
*HINT*: you could add a database name in front of `table_name` as well. E.g. `db_name.table_name`
The two built-in substitutions `{database}` and `{table}` can be used, they expand into the table name and the database name respectively (unless these macros are defined in the `macros` section). So the zookeeper path can be specified as `'/clickhouse/tables/{layer}-{shard}/{database}/{table}'`.
Be careful with table renames when using these built-in substitutions. The path in Zookeeper cannot be changed, and when the table is renamed, the macros will expand into a different path, the table will refer to a path that does not exist in Zookeeper, and will go into read-only mode.
The replica name identifies different replicas of the same table. You can use the server name for this, as in the example. The name only needs to be unique within each shard.
You can define the parameters explicitly instead of using substitutions. This might be convenient for testing and for configuring small clusters. However, you cant use distributed DDL queries (`ON CLUSTER`) in this case.
When working with large clusters, we recommend using substitutions because they reduce the probability of error.
You can specify default arguments for `Replicated` table engine in the server configuration file. For instance:
```xml
<default_replica_path>/clickhouse/tables/{shard}/{database}/{table}</default_replica_path>
<default_replica_name>{replica}</default_replica_path>
```
In this case, you can omit arguments when creating tables:
``` sql
CREATE TABLE table_name (
x UInt32
) ENGINE = ReplicatedMergeTree
ORDER BY x;
```
It is equivalent to:
``` sql
CREATE TABLE table_name (
x UInt32
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/table_name', '{replica}')
ORDER BY x;
```
Run the `CREATE TABLE` query on each replica. This query creates a new replicated table, or adds a new replica to an existing one.
If you add a new replica after the table already contains some data on other replicas, the data will be copied from the other replicas to the new one after running the query. In other words, the new replica syncs itself with the others.

View File

@ -30,4 +30,4 @@ Instead of inserting data manually, you might consider to use one of [client lib
- `input_format_import_nested_json` allows to insert nested JSON objects into columns of [Nested](../../sql-reference/data-types/nested-data-structures/nested.md) type.
!!! note "Note"
Settings are specified as `GET` parameters for the HTTP interface or as additional command-line arguments prefixed with `--` for the CLI interface.
Settings are specified as `GET` parameters for the HTTP interface or as additional command-line arguments prefixed with `--` for the `CLI` interface.

View File

@ -1,5 +1,5 @@
---
toc_priority: 17
toc_priority: 19
toc_title: AMPLab Big Data Benchmark
---

View File

@ -1,5 +1,5 @@
---
toc_priority: 19
toc_priority: 18
toc_title: Terabyte Click Logs from Criteo
---

View File

@ -1,6 +1,6 @@
---
toc_folder_title: Example Datasets
toc_priority: 15
toc_priority: 14
toc_title: Introduction
---
@ -18,4 +18,4 @@ The list of documented datasets:
- [New York Taxi Data](../../getting-started/example-datasets/nyc-taxi.md)
- [OnTime](../../getting-started/example-datasets/ontime.md)
[Original article](https://clickhouse.tech/docs/en/getting_started/example_datasets) <!--hide-->
[Original article](https://clickhouse.tech/docs/en/getting_started/example_datasets) <!--hide-->

View File

@ -1,5 +1,5 @@
---
toc_priority: 14
toc_priority: 15
toc_title: Yandex.Metrica Data
---

View File

@ -1,5 +1,5 @@
---
toc_priority: 16
toc_priority: 20
toc_title: New York Taxi Data
---

View File

@ -1,5 +1,5 @@
---
toc_priority: 15
toc_priority: 21
toc_title: OnTime
---

View File

@ -1,5 +1,5 @@
---
toc_priority: 20
toc_priority: 16
toc_title: Star Schema Benchmark
---

View File

@ -1,5 +1,5 @@
---
toc_priority: 18
toc_priority: 17
toc_title: WikiStat
---

View File

@ -460,7 +460,7 @@ See also the [JSONEachRow](#jsoneachrow) format.
## JSONString {#jsonstring}
Differs from JSON only in that data fields are output in strings, not in typed json values.
Differs from JSON only in that data fields are output in strings, not in typed JSON values.
Example:
@ -596,7 +596,7 @@ When inserting the data, you should provide a separate JSON value for each row.
## JSONEachRowWithProgress {#jsoneachrowwithprogress}
## JSONStringEachRowWithProgress {#jsonstringeachrowwithprogress}
Differs from JSONEachRow/JSONStringEachRow in that ClickHouse will also yield progress information as JSON objects.
Differs from `JSONEachRow`/`JSONStringEachRow` in that ClickHouse will also yield progress information as JSON values.
```json
{"row":{"'hello'":"hello","multiply(42, number)":"0","range(5)":[0,1,2,3,4]}}
@ -608,7 +608,7 @@ Differs from JSONEachRow/JSONStringEachRow in that ClickHouse will also yield pr
## JSONCompactEachRowWithNamesAndTypes {#jsoncompacteachrowwithnamesandtypes}
## JSONCompactStringEachRowWithNamesAndTypes {#jsoncompactstringeachrowwithnamesandtypes}
Differs from JSONCompactEachRow/JSONCompactStringEachRow in that the column names and types are written as the first two rows.
Differs from `JSONCompactEachRow`/`JSONCompactStringEachRow` in that the column names and types are written as the first two rows.
```json
["'hello'", "multiply(42, number)", "range(5)"]

View File

@ -79,7 +79,7 @@ By default, data is returned in TabSeparated format (for more information, see t
You use the FORMAT clause of the query to request any other format.
Also, you can use the default_format URL parameter or X-ClickHouse-Format header to specify a default format other than TabSeparated.
Also, you can use the default_format URL parameter or the X-ClickHouse-Format header to specify a default format other than TabSeparated.
``` bash
$ echo 'SELECT 1 FORMAT Pretty' | curl 'http://localhost:8123/?' --data-binary @-
@ -170,7 +170,7 @@ $ echo "SELECT 1" | gzip -c | curl -sS --data-binary @- -H 'Content-Encoding: gz
!!! note "Note"
Some HTTP clients might decompress data from the server by default (with `gzip` and `deflate`) and you might get decompressed data even if you use the compression settings correctly.
You can use the database URL parameter or X-ClickHouse-Database header to specify the default database.
You can use the database URL parameter or the X-ClickHouse-Database header to specify the default database.
``` bash
$ echo 'SELECT number FROM numbers LIMIT 10' | curl 'http://localhost:8123/?database=system' --data-binary @-

View File

@ -6,7 +6,7 @@ toc_title: Client Libraries
# Client Libraries from Third-party Developers {#client-libraries-from-third-party-developers}
!!! warning "Disclaimer"
Yandex does **not** maintain the libraries listed below and havent done any extensive testing to ensure their quality.
Yandex does **not** maintain the libraries listed below and hasnt done any extensive testing to ensure their quality.
- Python
- [infi.clickhouse_orm](https://github.com/Infinidat/infi.clickhouse_orm)

View File

@ -77,6 +77,7 @@ toc_title: Adopters
| <a href="https://rambler.ru" class="favicon">Rambler</a> | Internet services | Analytics | — | — | [Talk in Russian, April 2018](https://medium.com/@ramblertop/разработка-api-clickhouse-для-рамблер-топ-100-f4c7e56f3141) |
| <a href="https://retell.cc/" class="favicon">Retell</a> | Speech synthesis | Analytics | — | — | [Blog Article, August 2020](https://vc.ru/services/153732-kak-sozdat-audiostati-na-vashem-sayte-i-zachem-eto-nuzhno) |
| <a href="https://rspamd.com/" class="favicon">Rspamd</a> | Antispam | Analytics | — | — | [Official Website](https://rspamd.com/doc/modules/clickhouse.html) |
| <a href="https://rusiem.com/en" class="favicon">RuSIEM</a> | SIEM | Main Product | — | — | [Official Website](https://rusiem.com/en/products/architecture) |
| <a href="https://www.s7.ru" class="favicon">S7 Airlines</a> | Airlines | Metrics, Logging | — | — | [Talk in Russian, March 2019](https://www.youtube.com/watch?v=nwG68klRpPg&t=15s) |
| <a href="https://www.scireum.de/" class="favicon">scireum GmbH</a> | e-Commerce | Main product | — | — | [Talk in German, February 2020](https://www.youtube.com/watch?v=7QWAn5RbyR4) |
| <a href="https://segment.com/" class="favicon">Segment</a> | Data processing | Main product | 9 * i3en.3xlarge nodes 7.5TB NVME SSDs, 96GB Memory, 12 vCPUs | — | [Slides, 2019](https://slides.com/abraithwaite/segment-clickhouse) |
@ -88,6 +89,8 @@ toc_title: Adopters
| <a href="https://smi2.ru/" class="favicon">SMI2</a> | News | Analytics | — | — | [Blog Post in Russian, November 2017](https://habr.com/ru/company/smi2/blog/314558/) |
| <a href="https://www.splunk.com/" class="favicon">Splunk</a> | Business Analytics | Main product | — | — | [Slides in English, January 2018](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup12/splunk.pdf) |
| <a href="https://www.spotify.com" class="favicon">Spotify</a> | Music | Experimentation | — | — | [Slides, July 2018](https://www.slideshare.net/glebus/using-clickhouse-for-experimentation-104247173) |
| <a href="https://www.staffcop.ru/" class="favicon">Staffcop</a> | Information Security | Main Product | — | — | [Official website, Documentation](https://www.staffcop.ru/sce43) |
| <a href="https://www.teralytics.net/" class="favicon">Teralytics</a> | Mobility | Analytics | — | — | [Tech blog](https://www.teralytics.net/knowledge-hub/visualizing-mobility-data-the-scalability-challenge) |
| <a href="https://www.tencent.com" class="favicon">Tencent</a> | Big Data | Data processing | — | — | [Slides in Chinese, October 2018](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup19/5.%20ClickHouse大数据集群应用_李俊飞腾讯网媒事业部.pdf) |
| <a href="https://www.tencent.com" class="favicon">Tencent</a> | Messaging | Logging | — | — | [Talk in Chinese, November 2019](https://youtu.be/T-iVQRuw-QY?t=5050) |
| <a href="https://trafficstars.com/" class="favicon">Traffic Stars</a> | AD network | — | — | — | [Slides in Russian, May 2018](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup15/lightning/ninja.pdf) |

View File

@ -0,0 +1,69 @@
---
toc_priority: 62
toc_title: OpenTelemetry Support
---
# [experimental] OpenTelemetry Support
[OpenTelemetry](https://opentelemetry.io/) is an open standard for collecting
traces and metrics from distributed application. ClickHouse has some support
for OpenTelemetry.
!!! warning "Warning"
This is an experimental feature that will change in backwards-incompatible ways in the future releases.
## Supplying Trace Context to ClickHouse
ClickHouse accepts trace context HTTP headers, as described by
the [W3C recommendation](https://www.w3.org/TR/trace-context/).
It also accepts trace context over native protocol that is used for
communication between ClickHouse servers or between the client and server.
For manual testing, trace context headers conforming to the Trace Context
recommendation can be supplied to `clickhouse-client` using
`--opentelemetry-traceparent` and `--opentelemetry-tracestate` flags.
If no parent trace context is supplied, ClickHouse can start a new trace, with
probability controlled by the `opentelemetry_start_trace_probability` setting.
## Propagating the Trace Context
The trace context is propagated to downstream services in the following cases:
* Queries to remote ClickHouse servers, such as when using `Distributed` table
engine.
* `URL` table function. Trace context information is sent in HTTP headers.
## Tracing the ClickHouse Itself
ClickHouse creates _trace spans_ for each query and some of the query execution
stages, such as query planning or distributed queries.
To be useful, the tracing information has to be exported to a monitoring system
that supports OpenTelemetry, such as Jaeger or Prometheus. ClickHouse avoids
a dependency on a particular monitoring system, instead only
providing the tracing data conforming to the standard. A natural way to do so
in an SQL RDBMS is a system table. OpenTelemetry trace span information
[required by the standard](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/overview.md#span)
is stored in the system table called `system.opentelemetry_span_log`.
The table must be enabled in the server configuration, see the `opentelemetry_span_log`
element in the default config file `config.xml`. It is enabled by default.
The table has the following columns:
- `trace_id`
- `span_id`
- `parent_span_id`
- `operation_name`
- `start_time`
- `finish_time`
- `finish_date`
- `attribute.name`
- `attribute.values`
The tags or attributes are saved as two parallel arrays, containing the keys
and values. Use `ARRAY JOIN` to work with them.

View File

@ -479,6 +479,26 @@ The maximum number of simultaneously processed requests.
<max_concurrent_queries>100</max_concurrent_queries>
```
## max_concurrent_queries_for_all_users {#max-concurrent-queries-for-all-users}
Throw exception if the value of this setting is less or equal than the current number of simultaneously processed queries.
Example: `max_concurrent_queries_for_all_users` can be set to 99 for all users and database administrator can set it to 100 for itself to run queries for investigation even when the server is overloaded.
Modifying the setting for one query or user does not affect other queries.
Default value: `0` that means no limit.
**Example**
``` xml
<max_concurrent_queries_for_all_users>99</max_concurrent_queries_for_all_users>
```
**See Also**
- [max_concurrent_queries](#max-concurrent-queries)
## max_connections {#max-connections}
The maximum number of inbound connections.

View File

@ -305,6 +305,10 @@ When enabled, replace empty input fields in TSV with default values. For complex
Disabled by default.
## input_format_tsv_enum_as_number {#settings-input_format_tsv_enum_as_number}
For TSV input format switches to parsing enum values as enum ids.
## input_format_null_as_default {#settings-input-format-null-as-default}
Enables or disables using default values if input data contain `NULL`, but the data type of the corresponding column in not `Nullable(T)` (for text input formats).
@ -676,6 +680,21 @@ Example:
log_queries=1
```
## log_queries_min_query_duration_ms {#settings-log-queries-min-query-duration-ms}
Minimal time for the query to run to get to the following tables:
- `system.query_log`
- `system.query_thread_log`
Only the queries with the following type will get to the log:
- `QUERY_FINISH`
- `EXCEPTION_WHILE_PROCESSING`
- Type: milliseconds
- Default value: 0 (any query)
## log_queries_min_type {#settings-log-queries-min-type}
`query_log` minimal type to log.
@ -1161,6 +1180,10 @@ The character is interpreted as a delimiter in the CSV data. By default, the del
For CSV input format enables or disables parsing of unquoted `NULL` as literal (synonym for `\N`).
## input_format_csv_enum_as_number {#settings-input_format_csv_enum_as_number}
For CSV input format switches to parsing enum values as enum ids.
## output_format_csv_crlf_end_of_line {#settings-output-format-csv-crlf-end-of-line}
Use DOS/Windows-style line separator (CRLF) in CSV instead of Unix style (LF).
@ -1398,6 +1421,17 @@ Possible values:
Default value: 0
## allow_nondeterministic_optimize_skip_unused_shards {#allow-nondeterministic-optimize-skip-unused-shards}
Allow nondeterministic (like `rand` or `dictGet`, since later has some caveats with updates) functions in sharding key.
Possible values:
- 0 — Disallowed.
- 1 — Allowed.
Default value: 0
## optimize_skip_unused_shards_nesting {#optimize-skip-unused-shards-nesting}
Controls [`optimize_skip_unused_shards`](#optimize-skip-unused-shards) (hence still requires [`optimize_skip_unused_shards`](#optimize-skip-unused-shards)) depends on the nesting level of the distributed query (case when you have `Distributed` table that look into another `Distributed` table).
@ -2129,7 +2163,34 @@ Result:
└───────────────┘
```
[Original article](https://clickhouse.tech/docs/en/operations/settings/settings/) <!-- hide -->
## output_format_pretty_row_numbers {#output_format_pretty_row_numbers}
Adds row numbers to output in the [Pretty](../../interfaces/formats.md#pretty) format.
Possible values:
- 0 — Output without row numbers.
- 1 — Output with row numbers.
Default value: `0`.
**Example**
Query:
```sql
SET output_format_pretty_row_numbers = 1;
SELECT TOP 3 name, value FROM system.settings;
```
Result:
```text
┌─name────────────────────┬─value───┐
1. │ min_compress_block_size │ 65536 │
2. │ max_compress_block_size │ 1048576 │
3. │ max_block_size │ 65505 │
└─────────────────────────┴─────────┘
```
## allow_experimental_bigint_types {#allow_experimental_bigint_types}
@ -2141,3 +2202,5 @@ Possible values:
- 0 — The bigint data type is disabled.
Default value: `0`.
[Original article](https://clickhouse.tech/docs/en/operations/settings/settings/) <!-- hide -->

View File

@ -1,6 +1,6 @@
## system.asynchronous_metric_log {#system-tables-async-log}
Contains the historical values for `system.asynchronous_metrics`, which are saved once per minute. This feature is enabled by default.
Contains the historical values for `system.asynchronous_metrics`, which are saved once per minute. Enabled by default.
Columns:
@ -33,7 +33,7 @@ SELECT * FROM system.asynchronous_metric_log LIMIT 10
**See Also**
- [system.asynchronous_metrics](../system-tables/asynchronous_metrics.md) — Contains metrics that are calculated periodically in the background.
- [system.asynchronous_metrics](../system-tables/asynchronous_metrics.md) — Contains metrics, calculated periodically in the background.
- [system.metric_log](../system-tables/metric_log.md) — Contains history of metrics values from tables `system.metrics` and `system.events`, periodically flushed to disk.
[Original article](https://clickhouse.tech/docs/en/operations/system_tables/asynchronous_metric_log) <!--hide-->

View File

@ -6,19 +6,21 @@ You can use this table to get information similar to the [DESCRIBE TABLE](../../
The `system.columns` table contains the following columns (the column type is shown in brackets):
- `database` (String) — Database name.
- `table` (String) — Table name.
- `name` (String) — Column name.
- `type` (String) — Column type.
- `default_kind` (String) — Expression type (`DEFAULT`, `MATERIALIZED`, `ALIAS`) for the default value, or an empty string if it is not defined.
- `default_expression` (String) — Expression for the default value, or an empty string if it is not defined.
- `data_compressed_bytes` (UInt64) — The size of compressed data, in bytes.
- `data_uncompressed_bytes` (UInt64) — The size of decompressed data, in bytes.
- `marks_bytes` (UInt64) — The size of marks, in bytes.
- `comment` (String) — Comment on the column, or an empty string if it is not defined.
- `is_in_partition_key` (UInt8) — Flag that indicates whether the column is in the partition expression.
- `is_in_sorting_key` (UInt8) — Flag that indicates whether the column is in the sorting key expression.
- `is_in_primary_key` (UInt8) — Flag that indicates whether the column is in the primary key expression.
- `is_in_sampling_key` (UInt8) — Flag that indicates whether the column is in the sampling key expression.
- `database` ([String](../../sql-reference/data-types/string.md)) — Database name.
- `table` ([String](../../sql-reference/data-types/string.md)) — Table name.
- `name` ([String](../../sql-reference/data-types/string.md)) — Column name.
- `type` ([String](../../sql-reference/data-types/string.md)) — Column type.
- `position` ([UInt64](../../sql-reference/data-types/int-uint.md)) — Ordinal position of a column in a table starting with 1.
- `default_kind` ([String](../../sql-reference/data-types/string.md)) — Expression type (`DEFAULT`, `MATERIALIZED`, `ALIAS`) for the default value, or an empty string if it is not defined.
- `default_expression` ([String](../../sql-reference/data-types/string.md)) — Expression for the default value, or an empty string if it is not defined.
- `data_compressed_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md)) — The size of compressed data, in bytes.
- `data_uncompressed_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md)) — The size of decompressed data, in bytes.
- `marks_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md)) — The size of marks, in bytes.
- `comment` ([String](../../sql-reference/data-types/string.md)) — Comment on the column, or an empty string if it is not defined.
- `is_in_partition_key` ([UInt8](../../sql-reference/data-types/int-uint.md)) — Flag that indicates whether the column is in the partition expression.
- `is_in_sorting_key` ([UInt8](../../sql-reference/data-types/int-uint.md)) — Flag that indicates whether the column is in the sorting key expression.
- `is_in_primary_key` ([UInt8](../../sql-reference/data-types/int-uint.md)) — Flag that indicates whether the column is in the primary key expression.
- `is_in_sampling_key` ([UInt8](../../sql-reference/data-types/int-uint.md)) — Flag that indicates whether the column is in the sampling key expression.
- `compression_codec` ([String](../../sql-reference/data-types/string.md)) — Compression codec name.
[Original article](https://clickhouse.tech/docs/en/operations/system_tables/columns) <!--hide-->

View File

@ -0,0 +1,48 @@
# system.crash_log {#system-tables_crash_log}
Contains information about stack traces for fatal errors. The table does not exist in the database by default, it is created only when fatal errors occur.
Columns:
- `event_date` ([Datetime](../../sql-reference/data-types/datetime.md)) — Date of the event.
- `event_time` ([Datetime](../../sql-reference/data-types/datetime.md)) — Time of the event.
- `timestamp_ns` ([UInt64](../../sql-reference/data-types/int-uint.md)) — Timestamp of the event with nanoseconds.
- `signal` ([Int32](../../sql-reference/data-types/int-uint.md)) — Signal number.
- `thread_id` ([UInt64](../../sql-reference/data-types/int-uint.md)) — Thread ID.
- `query_id` ([String](../../sql-reference/data-types/string.md)) — Query ID.
- `trace` ([Array](../../sql-reference/data-types/array.md)([UInt64](../../sql-reference/data-types/int-uint.md))) — Stack trace at the moment of crash. Each element is a virtual memory address inside ClickHouse server process.
- `trace_full` ([Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md))) — Stack trace at the moment of crash. Each element contains a called method inside ClickHouse server process.
- `version` ([String](../../sql-reference/data-types/string.md)) — ClickHouse server version.
- `revision` ([UInt32](../../sql-reference/data-types/int-uint.md)) — ClickHouse server revision.
- `build_id` ([String](../../sql-reference/data-types/string.md)) — BuildID that is generated by compiler.
**Example**
Query:
``` sql
SELECT * FROM system.crash_log ORDER BY event_time DESC LIMIT 1;
```
Result (not full):
``` text
Row 1:
──────
event_date: 2020-10-14
event_time: 2020-10-14 15:47:40
timestamp_ns: 1602679660271312710
signal: 11
thread_id: 23624
query_id: 428aab7c-8f5c-44e9-9607-d16b44467e69
trace: [188531193,...]
trace_full: ['3. DB::(anonymous namespace)::FunctionFormatReadableTimeDelta::executeImpl(std::__1::vector<DB::ColumnWithTypeAndName, std::__1::allocator<DB::ColumnWithTypeAndName> >&, std::__1::vector<unsigned long, std::__1::allocator<unsigned long> > const&, unsigned long, unsigned long) const @ 0xb3cc1f9 in /home/username/work/ClickHouse/build/programs/clickhouse',...]
version: ClickHouse 20.11.1.1
revision: 54442
build_id:
```
**See also**
- [trace_log](../../operations/system-tables/trace_log.md) system table
[Original article](https://clickhouse.tech/docs/en/operations/system-tables/crash-log)

View File

@ -0,0 +1,23 @@
# system.errors {#system_tables-errors}
Contains error codes with number of times they have been triggered.
Columns:
- `name` ([String](../../sql-reference/data-types/string.md)) — name of the error (`errorCodeToName`).
- `code` ([Int32](../../sql-reference/data-types/int-uint.md)) — code number of the error.
- `value` ([UInt64](../../sql-reference/data-types/int-uint.md)) - number of times this error has been happened.
**Example**
``` sql
SELECT *
FROM system.errors
WHERE value > 0
ORDER BY code ASC
LIMIT 1
┌─name─────────────┬─code─┬─value─┐
│ CANNOT_OPEN_FILE │ 76 │ 1 │
└──────────────────┴──────┴───────┘
```

View File

@ -1,6 +1,7 @@
# system.metric_log {#system_tables-metric_log}
Contains history of metrics values from tables `system.metrics` and `system.events`, periodically flushed to disk.
To turn on metrics history collection on `system.metric_log`, create `/etc/clickhouse-server/config.d/metric_log.xml` with following content:
``` xml
@ -14,6 +15,11 @@ To turn on metrics history collection on `system.metric_log`, create `/etc/click
</yandex>
```
Columns:
- `event_date` ([Date](../../sql-reference/data-types/date.md)) — Event date.
- `event_time` ([DateTime](../../sql-reference/data-types/datetime.md)) — Event time.
- `event_time_microseconds` ([DateTime64](../../sql-reference/data-types/datetime64.md)) — Event time with microseconds resolution.
**Example**
``` sql

View File

@ -0,0 +1,148 @@
# system.parts_columns {#system_tables-parts_columns}
Contains information about parts and columns of [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md) tables.
Each row describes one data part.
Columns:
- `partition` ([String](../../sql-reference/data-types/string.md)) — The partition name. To learn what a partition is, see the description of the [ALTER](../../sql-reference/statements/alter/index.md#query_language_queries_alter) query.
Formats:
- `YYYYMM` for automatic partitioning by month.
- `any_string` when partitioning manually.
- `name` ([String](../../sql-reference/data-types/string.md)) — Name of the data part.
- `part_type` ([String](../../sql-reference/data-types/string.md)) — The data part storing format.
Possible values:
- `Wide` — Each column is stored in a separate file in a filesystem.
- `Compact` — All columns are stored in one file in a filesystem.
Data storing format is controlled by the `min_bytes_for_wide_part` and `min_rows_for_wide_part` settings of the [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md) table.
- `active` ([UInt8](../../sql-reference/data-types/int-uint.md)) — Flag that indicates whether the data part is active. If a data part is active, its used in a table. Otherwise, its deleted. Inactive data parts remain after merging.
- `marks` ([UInt64](../../sql-reference/data-types/int-uint.md)) — The number of marks. To get the approximate number of rows in a data part, multiply `marks` by the index granularity (usually 8192) (this hint doesnt work for adaptive granularity).
- `rows` ([UInt64](../../sql-reference/data-types/int-uint.md)) — The number of rows.
- `bytes_on_disk` ([UInt64](../../sql-reference/data-types/int-uint.md)) — Total size of all the data part files in bytes.
- `data_compressed_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md)) — Total size of compressed data in the data part. All the auxiliary files (for example, files with marks) are not included.
- `data_uncompressed_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md)) — Total size of uncompressed data in the data part. All the auxiliary files (for example, files with marks) are not included.
- `marks_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md)) — The size of the file with marks.
- `modification_time` ([DateTime](../../sql-reference/data-types/datetime.md)) — The time the directory with the data part was modified. This usually corresponds to the time of data part creation.
- `remove_time` ([DateTime](../../sql-reference/data-types/datetime.md)) — The time when the data part became inactive.
- `refcount` ([UInt32](../../sql-reference/data-types/int-uint.md)) — The number of places where the data part is used. A value greater than 2 indicates that the data part is used in queries or merges.
- `min_date` ([Date](../../sql-reference/data-types/date.md)) — The minimum value of the date key in the data part.
- `max_date` ([Date](../../sql-reference/data-types/date.md)) — The maximum value of the date key in the data part.
- `partition_id` ([String](../../sql-reference/data-types/string.md)) — ID of the partition.
- `min_block_number` ([UInt64](../../sql-reference/data-types/int-uint.md)) — The minimum number of data parts that make up the current part after merging.
- `max_block_number` ([UInt64](../../sql-reference/data-types/int-uint.md)) — The maximum number of data parts that make up the current part after merging.
- `level` ([UInt32](../../sql-reference/data-types/int-uint.md)) — Depth of the merge tree. Zero means that the current part was created by insert rather than by merging other parts.
- `data_version` ([UInt64](../../sql-reference/data-types/int-uint.md)) — Number that is used to determine which mutations should be applied to the data part (mutations with a version higher than `data_version`).
- `primary_key_bytes_in_memory` ([UInt64](../../sql-reference/data-types/int-uint.md)) — The amount of memory (in bytes) used by primary key values.
- `primary_key_bytes_in_memory_allocated` ([UInt64](../../sql-reference/data-types/int-uint.md)) — The amount of memory (in bytes) reserved for primary key values.
- `database` ([String](../../sql-reference/data-types/string.md)) — Name of the database.
- `table` ([String](../../sql-reference/data-types/string.md)) — Name of the table.
- `engine` ([String](../../sql-reference/data-types/string.md)) — Name of the table engine without parameters.
- `disk_name` ([String](../../sql-reference/data-types/string.md)) — Name of a disk that stores the data part.
- `path` ([String](../../sql-reference/data-types/string.md)) — Absolute path to the folder with data part files.
- `column` ([String](../../sql-reference/data-types/string.md)) — Name of the column.
- `type` ([String](../../sql-reference/data-types/string.md)) — Column type.
- `column_position` ([UInt64](../../sql-reference/data-types/int-uint.md)) — Ordinal position of a column in a table starting with 1.
- `default_kind` ([String](../../sql-reference/data-types/string.md)) — Expression type (`DEFAULT`, `MATERIALIZED`, `ALIAS`) for the default value, or an empty string if it is not defined.
- `default_expression` ([String](../../sql-reference/data-types/string.md)) — Expression for the default value, or an empty string if it is not defined.
- `column_bytes_on_disk` ([UInt64](../../sql-reference/data-types/int-uint.md)) — Total size of the column in bytes.
- `column_data_compressed_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md)) — Total size of compressed data in the column, in bytes.
- `column_data_uncompressed_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md)) — Total size of the decompressed data in the column, in bytes.
- `column_marks_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md)) — The size of the column with marks, in bytes.
- `bytes` ([UInt64](../../sql-reference/data-types/int-uint.md)) — Alias for `bytes_on_disk`.
- `marks_size` ([UInt64](../../sql-reference/data-types/int-uint.md)) — Alias for `marks_bytes`.
**Example**
``` sql
SELECT * FROM system.parts_columns LIMIT 1 FORMAT Vertical;
```
``` text
Row 1:
──────
partition: tuple()
name: all_1_2_1
part_type: Wide
active: 1
marks: 2
rows: 2
bytes_on_disk: 155
data_compressed_bytes: 56
data_uncompressed_bytes: 4
marks_bytes: 96
modification_time: 2020-09-23 10:13:36
remove_time: 2106-02-07 06:28:15
refcount: 1
min_date: 1970-01-01
max_date: 1970-01-01
partition_id: all
min_block_number: 1
max_block_number: 2
level: 1
data_version: 1
primary_key_bytes_in_memory: 2
primary_key_bytes_in_memory_allocated: 64
database: default
table: 53r93yleapyears
engine: MergeTree
disk_name: default
path: /var/lib/clickhouse/data/default/53r93yleapyears/all_1_2_1/
column: id
type: Int8
column_position: 1
default_kind:
default_expression:
column_bytes_on_disk: 76
column_data_compressed_bytes: 28
column_data_uncompressed_bytes: 2
column_marks_bytes: 48
```
**See Also**
- [MergeTree family](../../engines/table-engines/mergetree-family/mergetree.md)
[Original article](https://clickhouse.tech/docs/en/operations/system_tables/parts_columns) <!--hide-->

View File

@ -20,8 +20,8 @@ The `system.query_log` table registers two kinds of queries:
Each query creates one or two rows in the `query_log` table, depending on the status (see the `type` column) of the query:
1. If the query execution was successful, two rows with the `QueryStart` and `QueryFinish` types are created .
2. If an error occurred during query processing, two events with the `QueryStart` and `ExceptionWhileProcessing` types are created .
1. If the query execution was successful, two rows with the `QueryStart` and `QueryFinish` types are created.
2. If an error occurred during query processing, two events with the `QueryStart` and `ExceptionWhileProcessing` types are created.
3. If an error occurred before launching the query, a single event with the `ExceptionBeforeStart` type is created.
Columns:
@ -37,8 +37,8 @@ Columns:
- `query_start_time` ([DateTime](../../sql-reference/data-types/datetime.md)) — Start time of query execution.
- `query_start_time_microseconds` ([DateTime64](../../sql-reference/data-types/datetime64.md)) — Start time of query execution with microsecond precision.
- `query_duration_ms` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Duration of query execution in milliseconds.
- `read_rows` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Total number or rows read from all tables and table functions participated in query. It includes usual subqueries, subqueries for `IN` and `JOIN`. For distributed queries `read_rows` includes the total number of rows read at all replicas. Each replica sends its `read_rows` value, and the server-initiator of the query summarize all received and local values. The cache volumes doesnt affect this value.
- `read_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Total number or bytes read from all tables and table functions participated in query. It includes usual subqueries, subqueries for `IN` and `JOIN`. For distributed queries `read_bytes` includes the total number of rows read at all replicas. Each replica sends its `read_bytes` value, and the server-initiator of the query summarize all received and local values. The cache volumes doesnt affect this value.
- `read_rows` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Total number of rows read from all tables and table functions participated in query. It includes usual subqueries, subqueries for `IN` and `JOIN`. For distributed queries `read_rows` includes the total number of rows read at all replicas. Each replica sends its `read_rows` value, and the server-initiator of the query summarizes all received and local values. The cache volumes dont affect this value.
- `read_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Total number of bytes read from all tables and table functions participated in query. It includes usual subqueries, subqueries for `IN` and `JOIN`. For distributed queries `read_bytes` includes the total number of rows read at all replicas. Each replica sends its `read_bytes` value, and the server-initiator of the query summarizes all received and local values. The cache volumes dont affect this value.
- `written_rows` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — For `INSERT` queries, the number of written rows. For other queries, the column value is 0.
- `written_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — For `INSERT` queries, the number of written bytes. For other queries, the column value is 0.
- `result_rows` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Number of rows in a result of the `SELECT` query, or a number of rows in the `INSERT` query.

View File

@ -1,6 +1,6 @@
# system.query_thread_log {#system_tables-query_thread_log}
Contains information about threads which execute queries, for example, thread name, thread start time, duration of query processing.
Contains information about threads that execute queries, for example, thread name, thread start time, duration of query processing.
To start logging:

View File

@ -53,9 +53,9 @@ Columns:
- `table` (`String`) - Table name
- `engine` (`String`) - Table engine name
- `is_leader` (`UInt8`) - Whether the replica is the leader.
Only one replica at a time can be the leader. The leader is responsible for selecting background merges to perform.
Multiple replicas can be leaders at the same time. A replica can be prevented from becoming a leader using the `merge_tree` setting `replicated_can_become_leader`. The leaders are responsible for scheduling background merges.
Note that writes can be performed to any replica that is available and has a session in ZK, regardless of whether it is a leader.
- `can_become_leader` (`UInt8`) - Whether the replica can be elected as a leader.
- `can_become_leader` (`UInt8`) - Whether the replica can be a leader.
- `is_readonly` (`UInt8`) - Whether the replica is in read-only mode.
This mode is turned on if the config doesnt have sections with ZooKeeper, if an unknown error occurred when reinitializing sessions in ZooKeeper, and during session reinitialization in ZooKeeper.
- `is_session_expired` (`UInt8`) - the session with ZooKeeper has expired. Basically the same as `is_readonly`.

View File

@ -1,6 +1,6 @@
# system.text_log {#system_tables-text_log}
Contains logging entries. Logging level which goes to this table can be limited with `text_log.level` server setting.
Contains logging entries. The logging level which goes to this table can be limited to the `text_log.level` server setting.
Columns:

View File

@ -18,7 +18,7 @@ Columns:
- `revision` ([UInt32](../../sql-reference/data-types/int-uint.md)) — ClickHouse server build revision.
When connecting to server by `clickhouse-client`, you see the string similar to `Connected to ClickHouse server version 19.18.1 revision 54429.`. This field contains the `revision`, but not the `version` of a server.
When connecting to the server by `clickhouse-client`, you see the string similar to `Connected to ClickHouse server version 19.18.1 revision 54429.`. This field contains the `revision`, but not the `version` of a server.
- `timer_type` ([Enum8](../../sql-reference/data-types/enum.md)) — Timer type:

View File

@ -16,7 +16,7 @@ By default `clickhouse-local` does not have access to data on the same host, but
!!! warning "Warning"
It is not recommended to load production server configuration into `clickhouse-local` because data can be damaged in case of human error.
For temporary data an unique temporary data directory is created by default. If you want to override this behavior the data directory can be explicitly specified with the `-- --path` option.
For temporary data, a unique temporary data directory is created by default. If you want to override this behavior, the data directory can be explicitly specified with the `-- --path` option.
## Usage {#usage}

View File

@ -80,4 +80,4 @@ Code: 43. DB::Exception: Received from localhost:9000. DB::Exception: Wrong argu
## See Also {#see-also}
- [INTERVAL](../../../sql-reference/operators/index.md#operator-interval) operator
- [toInterval](../../../sql-reference/functions/type-conversion-functions.md#function-tointerval) type convertion functions
- [toInterval](../../../sql-reference/functions/type-conversion-functions.md#function-tointerval) type conversion functions

View File

@ -59,7 +59,8 @@ LAYOUT(LAYOUT_TYPE(param value)) -- layout settings
- [range_hashed](#range-hashed)
- [complex_key_hashed](#complex-key-hashed)
- [complex_key_cache](#complex-key-cache)
- [ssd_complex_key_cache](#ssd-cache)
- [ssd_cache](#ssd-cache)
- [ssd_complex_key_cache](#complex-key-ssd-cache)
- [complex_key_direct](#complex-key-direct)
- [ip_trie](#ip-trie)

View File

@ -23,8 +23,6 @@ SELECT
└─────────────────────┴────────────┴────────────┴─────────────────────┘
```
Only time zones that differ from UTC by a whole number of hours are supported.
## toTimeZone {#totimezone}
Convert time or date and time to the specified time zone.

View File

@ -6,7 +6,7 @@ toc_title: Encoding
# Encoding Functions {#encoding-functions}
## char {#char}
Returns the string with the length as the number of passed arguments and each byte has the value of corresponding argument. Accepts multiple arguments of numeric types. If the value of argument is out of range of UInt8 data type, it is converted to UInt8 with possible rounding and overflow.
**Syntax**

View File

@ -551,7 +551,7 @@ formatReadableTimeDelta(column[, maximum_unit])
**Parameters**
- `column` — A column with numeric time delta.
- `maximum_unit` — Optional. Maximum unit to show. Acceptable values seconds, minutes, hours, days, months, years.
- `maximum_unit` — Optional. Maximum unit to show. Acceptable values seconds, minutes, hours, days, months, years.
Example:
@ -626,7 +626,12 @@ neighbor(column, offset[, default_value])
```
The result of the function depends on the affected data blocks and the order of data in the block.
If you make a subquery with ORDER BY and call the function from outside the subquery, you can get the expected result.
!!! warning "Warning"
It can reach the neighbor rows only inside the currently processed data block.
The rows order used during the calculation of `neighbor` can differ from the order of rows returned to the user.
To prevent that you can make a subquery with ORDER BY and call the function from outside the subquery.
**Parameters**
@ -731,8 +736,13 @@ Result:
Calculates the difference between successive row values in the data block.
Returns 0 for the first row and the difference from the previous row for each subsequent row.
!!! warning "Warning"
It can reach the previos row only inside the currently processed data block.
The result of the function depends on the affected data blocks and the order of data in the block.
If you make a subquery with ORDER BY and call the function from outside the subquery, you can get the expected result.
The rows order used during the calculation of `runningDifference` can differ from the order of rows returned to the user.
To prevent that you can make a subquery with ORDER BY and call the function from outside the subquery.
Example:
@ -1584,7 +1594,7 @@ isDecimalOverflow(d, [p])
**Parameters**
- `d` — value. [Decimal](../../sql-reference/data-types/decimal.md).
- `p` — precision. Optional. If omitted, the initial presicion of the first argument is used. Using of this paratemer could be helpful for data extraction to another DBMS or file. [UInt8](../../sql-reference/data-types/int-uint.md#uint-ranges).
- `p` — precision. Optional. If omitted, the initial precision of the first argument is used. Using of this paratemer could be helpful for data extraction to another DBMS or file. [UInt8](../../sql-reference/data-types/int-uint.md#uint-ranges).
**Returned values**
@ -1647,4 +1657,24 @@ Result:
10 10 19 19 39 39
```
## errorCodeToName {#error-code-to-name}
**Returned value**
- Variable name for the error code.
Type: [LowCardinality(String)](../../sql-reference/data-types/lowcardinality.md).
**Syntax**
``` sql
errorCodeToName(1)
```
Result:
``` text
UNSUPPORTED_METHOD
```
[Original article](https://clickhouse.tech/docs/en/query_language/functions/other_functions/) <!--hide-->

View File

@ -461,6 +461,66 @@ For other regular expressions, the code is the same as for the match funct
The same thing as like, but negative.
## ilike {#ilike}
Case insensitive variant of [like](https://clickhouse.tech/docs/en/sql-reference/functions/string-search-functions/#function-like) function. You can use `ILIKE` operator instead of the `ilike` function.
**Syntax**
``` sql
ilike(haystack, pattern)
```
**Parameters**
- `haystack` — Input string. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `pattern` — If `pattern` doesn't contain percent signs or underscores, then the `pattern` only represents the string itself. An underscore (`_`) in `pattern` stands for (matches) any single character. A percent sign (`%`) matches any sequence of zero or more characters.
Some `pattern` examples:
``` text
'abc' ILIKE 'abc' true
'abc' ILIKE 'a%' true
'abc' ILIKE '_b_' true
'abc' ILIKE 'c' false
```
**Returned values**
- True, if the string matches `pattern`.
- False, if the string doesn't match `pattern`.
**Example**
Input table:
``` text
┌─id─┬─name─────┬─days─┐
│ 1 │ January │ 31 │
│ 2 │ February │ 29 │
│ 3 │ March │ 31 │
│ 4 │ April │ 30 │
└────┴──────────┴──────┘
```
Query:
``` sql
SELECT * FROM Months WHERE ilike(name, '%j%')
```
Result:
``` text
┌─id─┬─name────┬─days─┐
│ 1 │ January │ 31 │
└────┴─────────┴──────┘
```
**See Also**
- [like](https://clickhouse.tech/docs/en/sql-reference/functions/string-search-functions/#function-like) <!--hide-->
## ngramDistance(haystack, needle) {#ngramdistancehaystack-needle}
Calculates the 4-gram distance between `haystack` and `needle`: counts the symmetric difference between two multisets of 4-grams and normalizes it by the sum of their cardinalities. Returns float number from 0 to 1 the closer to zero, the more strings are similar to each other. If the constant `needle` or `haystack` is more than 32Kb, throws an exception. If some of the non-constant `haystack` or `needle` strings are more than 32Kb, the distance is always one.

View File

@ -323,6 +323,10 @@ This function accepts a number or date or date with time, and returns a string c
This function accepts a number or date or date with time, and returns a FixedString containing bytes representing the corresponding value in host order (little endian). Null bytes are dropped from the end. For example, a UInt32 type value of 255 is a FixedString that is one byte long.
## reinterpretAsUUID {#reinterpretasuuid}
This function accepts FixedString, and returns UUID. Takes 16 bytes string. If the string isn't long enough, the functions work as if the string is padded with the necessary number of null bytes to the end. If the string longer than 16 bytes, the extra bytes at the end are ignored.
## CAST(x, T) {#type_conversion_function-cast}
Converts x to the t data type. The syntax CAST(x AS t) is also supported.
@ -780,4 +784,42 @@ Result:
└──────────────────────────────────┘
```
## formatRowNoNewline {#formatrownonewline}
Converts arbitrary expressions into a string via given format. The function trims the last `\n` if any.
**Syntax**
``` sql
formatRowNoNewline(format, x, y, ...)
```
**Parameters**
- `format` — Text format. For example, [CSV](../../interfaces/formats.md#csv), [TSV](../../interfaces/formats.md#tabseparated).
- `x`,`y`, ... — Expressions.
**Returned value**
- A formatted string.
**Example**
Query:
``` sql
SELECT formatRowNoNewline('CSV', number, 'good')
FROM numbers(3)
```
Result:
``` text
┌─formatRowNoNewline('CSV', number, 'good')─┐
│ 0,"good" │
│ 1,"good" │
│ 2,"good" │
└───────────────────────────────────────────┘
```
[Original article](https://clickhouse.tech/docs/en/query_language/functions/type_conversion_functions/) <!--hide-->

View File

@ -61,6 +61,54 @@ SELECT toUUID('61f0c404-5cb3-11e7-907b-a6006ad3dba0') AS uuid
└──────────────────────────────────────┘
```
## toUUIDOrNull (x) {#touuidornull-x}
It takes an argument of type String and tries to parse it into UUID. If failed, returns NULL.
``` sql
toUUIDOrNull(String)
```
**Returned value**
The Nullable(UUID) type value.
**Usage example**
``` sql
SELECT toUUIDOrNull('61f0c404-5cb3-11e7-907b-a6006ad3dba0T') AS uuid
```
``` text
┌─uuid─┐
│ ᴺᵁᴸᴸ │
└──────┘
```
## toUUIDOrZero (x) {#touuidorzero-x}
It takes an argument of type String and tries to parse it into UUID. If failed, returns zero UUID.
``` sql
toUUIDOrZero(String)
```
**Returned value**
The UUID type value.
**Usage example**
``` sql
SELECT toUUIDOrZero('61f0c404-5cb3-11e7-907b-a6006ad3dba0T') AS uuid
```
``` text
┌─────────────────────────────────uuid─┐
│ 00000000-0000-0000-0000-000000000000 │
└──────────────────────────────────────┘
```
## UUIDStringToNum {#uuidstringtonum}
Accepts a string containing 36 characters in the format `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx`, and returns it as a set of bytes in a [FixedString(16)](../../sql-reference/data-types/fixedstring.md).

View File

@ -1,5 +1,5 @@
---
toc_priority: 37
toc_priority: 38
toc_title: Operators
---
@ -53,6 +53,8 @@ ClickHouse transforms operators to their corresponding functions at the query pa
`a NOT LIKE s` The `notLike(a, b)` function.
`a ILIKE s` The `ilike(a, b)` function.
`a BETWEEN b AND c` The same as `a >= b AND a <= c`.
`a NOT BETWEEN b AND c` The same as `a < b OR a > c`.
@ -149,25 +151,47 @@ Types of intervals:
- `QUARTER`
- `YEAR`
You can also use a string literal when setting the `INTERVAL` value. For example, `INTERVAL 1 HOUR` is identical to the `INTERVAL '1 hour'` or `INTERVAL '1' hour`.
!!! warning "Warning"
Intervals with different types cant be combined. You cant use expressions like `INTERVAL 4 DAY 1 HOUR`. Specify intervals in units that are smaller or equal to the smallest unit of the interval, for example, `INTERVAL 25 HOUR`. You can use consecutive operations, like in the example below.
Example:
Examples:
``` sql
SELECT now() AS current_date_time, current_date_time + INTERVAL 4 DAY + INTERVAL 3 HOUR
SELECT now() AS current_date_time, current_date_time + INTERVAL 4 DAY + INTERVAL 3 HOUR;
```
``` text
┌───current_date_time─┬─plus(plus(now(), toIntervalDay(4)), toIntervalHour(3))─┐
│ 2019-10-23 11:16:28 │ 2019-10-27 14:16:28
│ 2020-11-03 22:09:50 │ 2020-11-08 01:09:50
└─────────────────────┴────────────────────────────────────────────────────────┘
```
``` sql
SELECT now() AS current_date_time, current_date_time + INTERVAL '4 day' + INTERVAL '3 hour';
```
``` text
┌───current_date_time─┬─plus(plus(now(), toIntervalDay(4)), toIntervalHour(3))─┐
│ 2020-11-03 22:12:10 │ 2020-11-08 01:12:10 │
└─────────────────────┴────────────────────────────────────────────────────────┘
```
``` sql
SELECT now() AS current_date_time, current_date_time + INTERVAL '4' day + INTERVAL '3' hour;
```
``` text
┌───current_date_time─┬─plus(plus(now(), toIntervalDay('4')), toIntervalHour('3'))─┐
│ 2020-11-03 22:33:19 │ 2020-11-08 01:33:19 │
└─────────────────────┴────────────────────────────────────────────────────────────┘
```
**See Also**
- [Interval](../../sql-reference/data-types/special-data-types/interval.md) data type
- [toInterval](../../sql-reference/functions/type-conversion-functions.md#function-tointerval) type convertion functions
- [toInterval](../../sql-reference/functions/type-conversion-functions.md#function-tointerval) type conversion functions
## Logical Negation Operator {#logical-negation-operator}

View File

@ -1,5 +1,5 @@
---
toc_priority: 36
toc_priority: 35
toc_title: ALTER
---

View File

@ -5,16 +5,16 @@ toc_title: SAMPLE BY
# Manipulating Sampling-Key Expressions {#manipulations-with-sampling-key-expressions}
Syntax:
``` sql
ALTER TABLE [db].name [ON CLUSTER cluster] MODIFY SAMPLE BY new_expression
```
The command changes the [sampling key](../../../engines/table-engines/mergetree-family/mergetree.md) of the table to `new_expression` (an expression or a tuple of expressions).
The command is lightweight in a sense that it only changes metadata. The primary key must contain the new sample key.
The command is lightweight in the sense that it only changes metadata. The primary key must contain the new sample key.
!!! note "Note"
It only works for tables in the [`MergeTree`](../../../engines/table-engines/mergetree-family/mergetree.md) family (including
[replicated](../../../engines/table-engines/mergetree-family/replication.md) tables).
It only works for tables in the [MergeTree](../../../engines/table-engines/mergetree-family/mergetree.md) family (including
[replicated](../../../engines/table-engines/mergetree-family/replication.md) tables).

View File

@ -1,5 +1,5 @@
---
toc_priority: 42
toc_priority: 40
toc_title: ATTACH
---

View File

@ -1,5 +1,5 @@
---
toc_priority: 43
toc_priority: 41
toc_title: CHECK
---

View File

@ -1,5 +1,5 @@
---
toc_priority: 1
toc_priority: 35
toc_title: DATABASE
---

View File

@ -1,5 +1,5 @@
---
toc_priority: 4
toc_priority: 38
toc_title: DICTIONARY
---

View File

@ -1,6 +1,6 @@
---
toc_folder_title: CREATE
toc_priority: 35
toc_priority: 34
toc_title: Overview
---

View File

@ -1,5 +1,5 @@
---
toc_priority: 8
toc_priority: 42
toc_title: QUOTA
---

View File

@ -1,5 +1,5 @@
---
toc_priority: 6
toc_priority: 40
toc_title: ROLE
---

Some files were not shown because too many files have changed in this diff Show More