Merge remote-tracking branch 'upstream/master' into HEAD

This commit is contained in:
Anton Popov 2020-07-09 20:51:36 +03:00
commit 9f72baad1f
565 changed files with 15872 additions and 12630 deletions

View File

@ -20,9 +20,12 @@
* Support writes in ODBC Table function [#10554](https://github.com/ClickHouse/ClickHouse/pull/10554) ([ageraab](https://github.com/ageraab)). [#10901](https://github.com/ClickHouse/ClickHouse/pull/10901) ([tavplubix](https://github.com/tavplubix)).
* Add query performance metrics based on Linux `perf_events` (these metrics are calculated with hardware CPU counters and OS counters). It is optional and requires `CAP_SYS_ADMIN` to be set on clickhouse binary. [#9545](https://github.com/ClickHouse/ClickHouse/pull/9545) [Andrey Skobtsov](https://github.com/And42). [#11226](https://github.com/ClickHouse/ClickHouse/pull/11226) ([Alexander Kuzmenkov](https://github.com/akuzm)).
* Now support `NULL` and `NOT NULL` modifiers for data types in `CREATE` query. [#11057](https://github.com/ClickHouse/ClickHouse/pull/11057) ([Павел Потемкин](https://github.com/Potya)).
* Add `ArrowStream` input and output format. [#11088](https://github.com/ClickHouse/ClickHouse/pull/11088) ([hcz](https://github.com/hczhcz)).
* Support Cassandra as external dictionary source. [#4978](https://github.com/ClickHouse/ClickHouse/pull/4978) ([favstovol](https://github.com/favstovol)).
* Added a new layout `direct` which loads all the data directly from the source for each query, without storing or caching data. [#10622](https://github.com/ClickHouse/ClickHouse/pull/10622) ([Artem Streltsov](https://github.com/kekekekule)).
* Added new `complex_key_direct` layout to dictionaries, that does not store anything locally during query execution. [#10850](https://github.com/ClickHouse/ClickHouse/pull/10850) ([Artem Streltsov](https://github.com/kekekekule)).
* Added support for MySQL style global variables syntax (stub). This is needed for compatibility of MySQL protocol. [#11832](https://github.com/ClickHouse/ClickHouse/pull/11832) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Added syntax highligting to `clickhouse-client` using `replxx`. [#11422](https://github.com/ClickHouse/ClickHouse/pull/11422) ([Tagir Kuskarov](https://github.com/kuskarov)).
* `minMap` and `maxMap` functions were added. [#11603](https://github.com/ClickHouse/ClickHouse/pull/11603) ([Ildus Kurbangaliev](https://github.com/ildus)).
* Add the `system.asynchronous_metric_log` table that logs historical metrics from `system.asynchronous_metrics`. [#11588](https://github.com/ClickHouse/ClickHouse/pull/11588) ([Alexander Kuzmenkov](https://github.com/akuzm)).
* Add functions `extractAllGroupsHorizontal(haystack, re)` and `extractAllGroupsVertical(haystack, re)`. [#11554](https://github.com/ClickHouse/ClickHouse/pull/11554) ([Vasily Nemkov](https://github.com/Enmk)).
@ -30,6 +33,7 @@
* Add `netloc` function for extracting network location, similar to `urlparse(url)`, `netloc` in python. [#11356](https://github.com/ClickHouse/ClickHouse/pull/11356) ([Guillaume Tassery](https://github.com/YiuRULE)).
* Add 2 more virtual columns for engine=Kafka to access message headers. [#11283](https://github.com/ClickHouse/ClickHouse/pull/11283) ([filimonov](https://github.com/filimonov)).
* Add `_timestamp_ms` virtual column for Kafka engine (type is `Nullable(DateTime64(3))`). [#11260](https://github.com/ClickHouse/ClickHouse/pull/11260) ([filimonov](https://github.com/filimonov)).
* Add function `randomFixedString`. [#10866](https://github.com/ClickHouse/ClickHouse/pull/10866) ([Andrei Nekrashevich](https://github.com/xolm)).
* Add function `fuzzBits` that randomly flips bits in a string with given probability. [#11237](https://github.com/ClickHouse/ClickHouse/pull/11237) ([Andrei Nekrashevich](https://github.com/xolm)).
* Allow comparison of numbers with constant string in comparison operators, IN and VALUES sections. [#11647](https://github.com/ClickHouse/ClickHouse/pull/11647) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Add `round_robin` load_balancing mode. [#11645](https://github.com/ClickHouse/ClickHouse/pull/11645) ([Azat Khuzhin](https://github.com/azat)).
@ -138,6 +142,7 @@
* Fixed parseDateTime64BestEffort argument resolution bugs. [#10925](https://github.com/ClickHouse/ClickHouse/issues/10925). [#11038](https://github.com/ClickHouse/ClickHouse/pull/11038) ([Vasily Nemkov](https://github.com/Enmk)).
* Now it's possible to `ADD/DROP` and `RENAME` the same one column in a single `ALTER` query. Exception message for simultaneous `MODIFY` and `RENAME` became more clear. Partially fixes [#10669](https://github.com/ClickHouse/ClickHouse/issues/10669). [#11037](https://github.com/ClickHouse/ClickHouse/pull/11037) ([alesapin](https://github.com/alesapin)).
* Fixed parsing of S3 URLs. [#11036](https://github.com/ClickHouse/ClickHouse/pull/11036) ([Vladimir Chebotarev](https://github.com/excitoon)).
* Fix memory tracking for two-level `GROUP BY` when there is a `LIMIT`. [#11022](https://github.com/ClickHouse/ClickHouse/pull/11022) ([Azat Khuzhin](https://github.com/azat)).
* Fix very rare potential use-after-free error in MergeTree if table was not created successfully. [#10986](https://github.com/ClickHouse/ClickHouse/pull/10986) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix metadata (relative path for rename) and data (relative path for symlink) handling for Atomic database. [#10980](https://github.com/ClickHouse/ClickHouse/pull/10980) ([Azat Khuzhin](https://github.com/azat)).
* Fix server crash on concurrent `ALTER` and `DROP DATABASE` queries with `Atomic` database engine. [#10968](https://github.com/ClickHouse/ClickHouse/pull/10968) ([tavplubix](https://github.com/tavplubix)).
@ -206,6 +211,7 @@
* Default user and database creation on docker image starting. [#10637](https://github.com/ClickHouse/ClickHouse/pull/10637) ([Paramtamtam](https://github.com/tarampampam)).
* When multiline query is printed to server log, the lines are joined. Make it to work correct in case of multiline string literals, identifiers and single-line comments. This fixes [#3853](https://github.com/ClickHouse/ClickHouse/issues/3853). [#11686](https://github.com/ClickHouse/ClickHouse/pull/11686) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Multiple names are now allowed in commands: CREATE USER, CREATE ROLE, ALTER USER, SHOW CREATE USER, SHOW GRANTS and so on. [#11670](https://github.com/ClickHouse/ClickHouse/pull/11670) ([Vitaly Baranov](https://github.com/vitlibar)).
* Add support for distributed DDL (`UPDATE/DELETE/DROP PARTITION`) on cross replication clusters. [#11508](https://github.com/ClickHouse/ClickHouse/pull/11508) ([frank lee](https://github.com/etah000)).
* Clear password from command line in `clickhouse-client` and `clickhouse-benchmark` if the user has specified it with explicit value. This prevents password exposure by `ps` and similar tools. [#11665](https://github.com/ClickHouse/ClickHouse/pull/11665) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Don't use debug info from ELF file if it doesn't correspond to the running binary. It is needed to avoid printing wrong function names and source locations in stack traces. This fixes [#7514](https://github.com/ClickHouse/ClickHouse/issues/7514). [#11657](https://github.com/ClickHouse/ClickHouse/pull/11657) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Return NULL/zero when value is not parsed completely in parseDateTimeBestEffortOrNull/Zero functions. This fixes [#7876](https://github.com/ClickHouse/ClickHouse/issues/7876). [#11653](https://github.com/ClickHouse/ClickHouse/pull/11653) ([alexey-milovidov](https://github.com/alexey-milovidov)).
@ -221,6 +227,7 @@
* Automatically update DNS cache, which is used to check if user is allowed to connect from an address. [#11487](https://github.com/ClickHouse/ClickHouse/pull/11487) ([tavplubix](https://github.com/tavplubix)).
* OPTIMIZE FINAL will force merge even if concurrent merges are performed. This closes [#11309](https://github.com/ClickHouse/ClickHouse/issues/11309) and closes [#11322](https://github.com/ClickHouse/ClickHouse/issues/11322). [#11346](https://github.com/ClickHouse/ClickHouse/pull/11346) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Suppress output of cancelled queries in clickhouse-client. In previous versions result may continue to print in terminal even after you press Ctrl+C to cancel query. This closes [#9473](https://github.com/ClickHouse/ClickHouse/issues/9473). [#11342](https://github.com/ClickHouse/ClickHouse/pull/11342) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Now history file is updated after each query and there is no race condition if multiple clients use one history file. This fixes [#9897](https://github.com/ClickHouse/ClickHouse/issues/9897). [#11453](https://github.com/ClickHouse/ClickHouse/pull/11453) ([Tagir Kuskarov](https://github.com/kuskarov)).
* Better log messages in while reloading configuration. [#11341](https://github.com/ClickHouse/ClickHouse/pull/11341) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Remove trailing whitespaces from formatted queries in `clickhouse-client` or `clickhouse-format` in some cases. [#11325](https://github.com/ClickHouse/ClickHouse/pull/11325) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Add setting "output_format_pretty_max_value_width". If value is longer, it will be cut to avoid output of too large values in terminal. This closes [#11140](https://github.com/ClickHouse/ClickHouse/issues/11140). [#11324](https://github.com/ClickHouse/ClickHouse/pull/11324) ([alexey-milovidov](https://github.com/alexey-milovidov)).
@ -229,6 +236,7 @@
* Support kafka_client_id parameter for Kafka tables. It also changes the default `client.id` used by ClickHouse when communicating with Kafka to be more verbose and usable. [#11252](https://github.com/ClickHouse/ClickHouse/pull/11252) ([filimonov](https://github.com/filimonov)).
* Keep the value of `DistributedFilesToInsert` metric on exceptions. In previous versions, the value was set when we are going to send some files, but it is zero, if there was an exception and some files are still pending. Now it corresponds to the number of pending files in filesystem. [#11220](https://github.com/ClickHouse/ClickHouse/pull/11220) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Add support for multi-word data type names (such as `DOUBLE PRECISION` and `CHAR VARYING`) for better SQL compatibility. [#11214](https://github.com/ClickHouse/ClickHouse/pull/11214) ([Павел Потемкин](https://github.com/Potya)).
* Provide synonyms for some data types. [#10856](https://github.com/ClickHouse/ClickHouse/pull/10856) ([Павел Потемкин](https://github.com/Potya)).
* The query log is now enabled by default. [#11184](https://github.com/ClickHouse/ClickHouse/pull/11184) ([Ivan Blinkov](https://github.com/blinkov)).
* Show authentication type in table system.users and while executing SHOW CREATE USER query. [#11080](https://github.com/ClickHouse/ClickHouse/pull/11080) ([Vitaly Baranov](https://github.com/vitlibar)).
* Remove data on explicit `DROP DATABASE` for `Memory` database engine. Fixes [#10557](https://github.com/ClickHouse/ClickHouse/issues/10557). [#11021](https://github.com/ClickHouse/ClickHouse/pull/11021) ([tavplubix](https://github.com/tavplubix)).
@ -255,6 +263,7 @@
* New optimization that takes all operations out of `any` function, enabled with `optimize_move_functions_out_of_any` [#11529](https://github.com/ClickHouse/ClickHouse/pull/11529) ([Ruslan](https://github.com/kamalov-ruslan)).
* Improve performance of `clickhouse-client` in interactive mode when Pretty formats are used. In previous versions, significant amount of time can be spent calculating visible width of UTF-8 string. This closes [#11323](https://github.com/ClickHouse/ClickHouse/issues/11323). [#11323](https://github.com/ClickHouse/ClickHouse/pull/11323) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Improved performance for queries with `ORDER BY` and small `LIMIT` (less, then `max_block_size`). [#11171](https://github.com/ClickHouse/ClickHouse/pull/11171) ([Albert Kidrachev](https://github.com/Provet)).
* Add runtime CPU detection to select and dispatch the best function implementation. Add support for codegeneration for multiple targets. This closes [#1017](https://github.com/ClickHouse/ClickHouse/issues/1017). [#10058](https://github.com/ClickHouse/ClickHouse/pull/10058) ([DimasKovas](https://github.com/DimasKovas)).
* Enable `mlock` of clickhouse binary by default. It will prevent clickhouse executable from being paged out under high IO load. [#11139](https://github.com/ClickHouse/ClickHouse/pull/11139) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Make queries with `sum` aggregate function and without GROUP BY keys to run multiple times faster. [#10992](https://github.com/ClickHouse/ClickHouse/pull/10992) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Improving radix sort (used in `ORDER BY` with simple keys) by removing some redundant data moves. [#10981](https://github.com/ClickHouse/ClickHouse/pull/10981) ([Arslan Gumerov](https://github.com/g-arslan)).

View File

@ -12,6 +12,20 @@ foreach(policy
endif()
endforeach()
# set default policy
foreach(default_policy_var_name
# make option() honor normal variables for BUILD_SHARED_LIBS:
# - re2
# - snappy
CMAKE_POLICY_DEFAULT_CMP0077
# Google Test from sources uses too old cmake, 2.6.x, and CMP0022 should
# set, to avoid using deprecated LINK_INTERFACE_LIBRARIES(_<CONFIG>)? over
# INTERFACE_LINK_LIBRARIES.
CMAKE_POLICY_DEFAULT_CMP0022
)
set(${default_policy_var_name} NEW)
endforeach()
project(ClickHouse)
include (cmake/arch.cmake)
@ -378,10 +392,6 @@ include (cmake/find/mysqlclient.cmake)
# When testing for memory leaks with Valgrind, don't link tcmalloc or jemalloc.
if (OS_LINUX AND NOT ENABLE_JEMALLOC)
message (WARNING "Non default allocator is disabled. This is not recommended for production Linux builds.")
endif ()
if (USE_OPENCL)
if (OS_DARWIN)
set(OPENCL_LINKER_FLAGS "-framework OpenCL")
@ -397,6 +407,10 @@ endif ()
add_subdirectory (contrib EXCLUDE_FROM_ALL)
if (NOT ENABLE_JEMALLOC)
message (WARNING "Non default allocator is disabled. This is not recommended for production builds.")
endif ()
macro (add_executable target)
# invoke built-in add_executable
# explicitly acquire and interpose malloc symbols by clickhouse_malloc
@ -421,3 +435,5 @@ add_subdirectory (tests)
add_subdirectory (utils)
include (cmake/print_include_directories.cmake)
include (cmake/sanitize_target_link_libraries.cmake)

View File

@ -16,4 +16,5 @@ ClickHouse is an open-source column-oriented database management system that all
## Upcoming Events
* [ClickHouse at Yandex Cloud Webinar (in Russian)](https://cloud.yandex.ru/events/144) on July 7, 2020.
* [ClickHouse for genetic data (in Russian)](https://cloud.yandex.ru/events/152) on July 14, 2020.
* [ClickHouse virtual office hours](https://www.eventbrite.com/e/clickhouse-july-virtual-meetup-tickets-111199787558) on July 15, 2020.

View File

@ -163,7 +163,8 @@ public:
enum Signals : int
{
StdTerminate = -1,
StopThread = -2
StopThread = -2,
SanitizerTrap = -3,
};
explicit SignalListener(BaseDaemon & daemon_)
@ -223,8 +224,12 @@ public:
std::string query_id;
DB::ThreadStatus * thread_ptr{};
DB::readPODBinary(info, in);
DB::readPODBinary(context, in);
if (sig != SanitizerTrap)
{
DB::readPODBinary(info, in);
DB::readPODBinary(context, in);
}
DB::readPODBinary(stack_trace, in);
DB::readBinary(thread_num, in);
DB::readBinary(query_id, in);
@ -279,7 +284,14 @@ private:
VERSION_STRING, VERSION_OFFICIAL, daemon.build_id_info, thread_num, query_id, strsignal(sig), sig);
}
LOG_FATAL(log, signalToErrorMessage(sig, info, context));
String error_message;
if (sig != SanitizerTrap)
error_message = signalToErrorMessage(sig, info, context);
else
error_message = "Sanitizer trap.";
LOG_FATAL(log, error_message);
if (stack_trace.getSize())
{
@ -305,12 +317,12 @@ private:
String build_id_hex{};
#endif
SentryWriter::onFault(sig, info, context, stack_trace, build_id_hex);
if (sig != SanitizerTrap)
SentryWriter::onFault(sig, error_message, stack_trace, build_id_hex);
/// When everything is done, we will try to send these error messages to client.
if (thread_ptr)
thread_ptr->onFatalError();
}
};
@ -320,35 +332,27 @@ extern "C" void __sanitizer_set_death_callback(void (*)());
static void sanitizerDeathCallback()
{
Poco::Logger * log = &Poco::Logger::get("BaseDaemon");
/// Also need to send data via pipe. Otherwise it may lead to deadlocks or failures in printing diagnostic info.
StringRef query_id = DB::CurrentThread::getQueryId(); /// This is signal safe.
char buf[signal_pipe_buf_size];
DB::WriteBufferFromFileDescriptorDiscardOnFailure out(signal_pipe.fds_rw[1], signal_pipe_buf_size, buf);
if (query_id.size == 0)
{
LOG_FATAL(log, "(version {}{}) (from thread {}) (no query) Sanitizer trap.",
VERSION_STRING, VERSION_OFFICIAL, getThreadId());
}
else
{
LOG_FATAL(log, "(version {}{}) (from thread {}) (query_id: {}) Sanitizer trap.",
VERSION_STRING, VERSION_OFFICIAL, getThreadId(), query_id);
}
const StackTrace stack_trace;
/// Just in case print our own stack trace. In case when llvm-symbolizer does not work.
StackTrace stack_trace;
if (stack_trace.getSize())
{
std::stringstream bare_stacktrace;
bare_stacktrace << "Stack trace:";
for (size_t i = stack_trace.getOffset(); i < stack_trace.getSize(); ++i)
bare_stacktrace << ' ' << stack_trace.getFramePointers()[i];
StringRef query_id = DB::CurrentThread::getQueryId();
query_id.size = std::min(query_id.size, max_query_id_size);
LOG_FATAL(log, bare_stacktrace.str());
}
int sig = SignalListener::SanitizerTrap;
DB::writeBinary(sig, out);
DB::writePODBinary(stack_trace, out);
DB::writeBinary(UInt32(getThreadId()), out);
DB::writeStringBinary(query_id, out);
DB::writePODBinary(DB::current_thread, out);
/// Write symbolized stack trace line by line for better grep-ability.
stack_trace.toStringEveryLine([&](const std::string & s) { LOG_FATAL(log, s); });
out.next();
/// The time that is usually enough for separate thread to print info into log.
sleepForSeconds(10);
}
#endif

View File

@ -2,23 +2,26 @@
#include <Poco/File.h>
#include <Poco/Util/Application.h>
#include <Poco/Util/LayeredConfiguration.h>
#include <common/defines.h>
#include <common/getFQDNOrHostName.h>
#include <common/logger_useful.h>
#include <Common/StackTrace.h>
#if !defined(ARCADIA_BUILD)
# include "Common/config_version.h"
# include <Common/config.h>
#endif
#if USE_SENTRY
# include <sentry.h> // Y_IGNORE
# include <stdio.h>
# include <filesystem>
#endif
#if USE_SENTRY
namespace
{
@ -76,12 +79,12 @@ void sentry_logger(sentry_level_t level, const char * message, va_list args)
}
}
}
}
#endif
void SentryWriter::initialize(Poco::Util::LayeredConfiguration & config)
{
#if USE_SENTRY
bool enabled = false;
bool debug = config.getBool("send_crash_reports.debug", false);
auto * logger = &Poco::Logger::get("SentryWriter");
@ -146,28 +149,19 @@ void SentryWriter::initialize(Poco::Util::LayeredConfiguration & config)
{
LOG_INFO(logger, "Sending crash reports is disabled");
}
#else
UNUSED(config);
#endif
}
void SentryWriter::shutdown()
{
#if USE_SENTRY
if (initialized)
{
sentry_shutdown();
}
#endif
}
void SentryWriter::onFault(int sig, const siginfo_t & info, const ucontext_t & context, const StackTrace & stack_trace, const String & build_id_hex)
void SentryWriter::onFault(int sig, const std::string & error_message, const StackTrace & stack_trace, const std::string & build_id_hex)
{
#if USE_SENTRY
auto * logger = &Poco::Logger::get("SentryWriter");
if (initialized)
{
const std::string & error_message = signalToErrorMessage(sig, info, context);
sentry_value_t event = sentry_value_new_message_event(SENTRY_LEVEL_FATAL, "fault", error_message.c_str());
sentry_set_tag("signal", strsignal(sig));
sentry_set_extra("signal_number", sentry_value_new_int32(sig));
@ -240,11 +234,12 @@ void SentryWriter::onFault(int sig, const siginfo_t & info, const ucontext_t & c
{
LOG_INFO(logger, "Not sending crash report");
}
#else
UNUSED(sig);
UNUSED(info);
UNUSED(context);
UNUSED(stack_trace);
UNUSED(build_id_hex);
#endif
}
#else
void SentryWriter::initialize(Poco::Util::LayeredConfiguration &) {}
void SentryWriter::shutdown() {}
void SentryWriter::onFault(int, const std::string &, const StackTrace &, const std::string &) {}
#endif

View File

@ -1,12 +1,12 @@
#pragma once
#include <common/types.h>
#include <Common/StackTrace.h>
#include <Poco/Util/LayeredConfiguration.h>
#include <string>
namespace Poco { namespace Util { class LayeredConfiguration; }}
class StackTrace;
/// \brief Sends crash reports to ClickHouse core developer team via https://sentry.io
///
/// This feature can enabled with "send_crash_reports.enabled" server setting,
@ -14,20 +14,16 @@
///
/// It is possible to send those reports to your own sentry account or account of consulting company you hired
/// by overriding "send_crash_reports.endpoint" setting. "send_crash_reports.debug" setting will allow to do that for
class SentryWriter
namespace SentryWriter
{
public:
SentryWriter() = delete;
static void initialize(Poco::Util::LayeredConfiguration & config);
static void shutdown();
void initialize(Poco::Util::LayeredConfiguration & config);
void shutdown();
/// Not signal safe and can't be called from a signal handler
static void onFault(
void onFault(
int sig,
const siginfo_t & info,
const ucontext_t & context,
const std::string & error_message,
const StackTrace & stack_trace,
const String & build_id_hex
const std::string & build_id_hex
);
};

View File

@ -1,4 +1,4 @@
SET(ENABLE_AMQPCPP 1)
SET(ENABLE_AMQPCPP ${ENABLE_LIBRARIES})
if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/AMQP-CPP/CMakeLists.txt")
message (WARNING "submodule contrib/AMQP-CPP is missing. to fix try run: \n git submodule update --init --recursive")
set (ENABLE_AMQPCPP 0)

View File

@ -0,0 +1,56 @@
# When you will try to link target with the directory (that exists), cmake will
# skip this without an error, only the following warning will be reported:
#
# target_link_libraries(main /tmp)
#
# WARNING: Target "main" requests linking to directory "/tmp". Targets may link only to libraries. CMake is dropping the item.
#
# And there is no cmake policy that controls this.
# (I guess the reason that it is allowed is because of FRAMEWORK for OSX).
#
# So to avoid error-prone cmake rules, this can be sanitized.
# There are the following ways:
# - overwrite target_link_libraries()/link_libraries() and check *before*
# calling real macro, but this requires duplicate all supported syntax
# -- too complex
# - overwrite target_link_libraries() and check LINK_LIBRARIES property, this
# works great
# -- but cannot be used with link_libraries()
# - use BUILDSYSTEM_TARGETS property to get list of all targets and sanitize
# -- this will work.
# https://stackoverflow.com/a/62311397/328260
function (get_all_targets var)
set (targets)
get_all_targets_recursive (targets ${CMAKE_CURRENT_SOURCE_DIR})
set (${var} ${targets} PARENT_SCOPE)
endfunction()
macro (get_all_targets_recursive targets dir)
get_property (subdirectories DIRECTORY ${dir} PROPERTY SUBDIRECTORIES)
foreach (subdir ${subdirectories})
get_all_targets_recursive (${targets} ${subdir})
endforeach ()
get_property (current_targets DIRECTORY ${dir} PROPERTY BUILDSYSTEM_TARGETS)
list (APPEND ${targets} ${current_targets})
endmacro ()
macro (sanitize_link_libraries target)
get_target_property(target_type ${target} TYPE)
if (${target_type} STREQUAL "INTERFACE_LIBRARY")
get_property(linked_libraries TARGET ${target} PROPERTY INTERFACE_LINK_LIBRARIES)
else()
get_property(linked_libraries TARGET ${target} PROPERTY LINK_LIBRARIES)
endif()
foreach (linked_library ${linked_libraries})
if (TARGET ${linked_library})
# just in case, skip if TARGET
elseif (IS_DIRECTORY ${linked_library})
message(FATAL_ERROR "${target} requested to link with directory: ${linked_library}")
endif()
endforeach()
endmacro()
get_all_targets (all_targets)
foreach (target ${all_targets})
sanitize_link_libraries(${target})
endforeach()

View File

@ -222,6 +222,7 @@ if (USE_INTERNAL_AVRO_LIBRARY)
endif()
if(USE_INTERNAL_GTEST_LIBRARY)
set(GOOGLETEST_VERSION 1.10.0) # master
# Google Test from sources
add_subdirectory(${ClickHouse_SOURCE_DIR}/contrib/googletest/googletest ${CMAKE_CURRENT_BINARY_DIR}/googletest)
# avoid problems with <regexp.h>
@ -290,12 +291,13 @@ if (USE_FASTOPS)
add_subdirectory (fastops-cmake)
endif()
if (USE_AMQPCPP OR USE_CASSANDRA)
add_subdirectory (libuv)
endif()
if (USE_AMQPCPP)
add_subdirectory (amqpcpp-cmake)
endif()
if (USE_CASSANDRA)
add_subdirectory (libuv)
add_subdirectory (cassandra)
endif()

2
contrib/arrow vendored

@ -1 +1 @@
Subproject commit b789226ccb2124285792107c758bb3b40b3d082a
Subproject commit 3cbcb7b62c2f2d02851bff837758637eb592a64b

View File

@ -1,5 +1,3 @@
include(ExternalProject)
set (CMAKE_CXX_STANDARD 17)
# === thrift
@ -77,14 +75,9 @@ add_custom_command(OUTPUT orc_proto.pb.h orc_proto.pb.cc
# === flatbuffers
##############################################################
# fbs - Step 1: build flatbuffers lib and flatc compiler
##############################################################
set(FLATBUFFERS_SRC_DIR ${ClickHouse_SOURCE_DIR}/contrib/flatbuffers)
set(FLATBUFFERS_BINARY_DIR ${ClickHouse_BINARY_DIR}/contrib/flatbuffers)
set(FLATBUFFERS_INCLUDE_DIR ${FLATBUFFERS_SRC_DIR}/include)
set(FLATBUFFERS_COMPILER "$<TARGET_FILE:flatc>")
# set flatbuffers CMake options
if (${USE_STATIC_LIBRARIES})
@ -94,57 +87,11 @@ else ()
set(FLATBUFFERS_BUILD_SHAREDLIB ON CACHE BOOL "Enable the build of the flatbuffers shared library")
set(FLATBUFFERS_BUILD_FLATLIB OFF CACHE BOOL "Disable the build of the flatbuffers library")
endif ()
set(FLATBUFFERS_BUILD_FLATC ON CACHE BOOL "Build flatbuffers compiler")
set(FLATBUFFERS_BUILD_TESTS OFF CACHE BOOL "Skip flatbuffers tests")
add_subdirectory(${FLATBUFFERS_SRC_DIR} "${FLATBUFFERS_BINARY_DIR}")
###################################
# fbs - Step 2: compile *.fbs files
###################################
set(ARROW_IPC_SRC_DIR ${ARROW_SRC_DIR}/arrow/ipc)
set(ARROW_FORMAT_SRC_DIR ${ARROW_SRC_DIR}/../../format)
set(ARROW_GENERATED_INCLUDE_DIR ${CMAKE_CURRENT_BINARY_DIR}/arrow_gen_headers)
set(FLATBUFFERS_COMPILED_OUT_DIR ${ARROW_GENERATED_INCLUDE_DIR}/arrow/ipc)
set(FBS_OUTPUT_FILES
"${FLATBUFFERS_COMPILED_OUT_DIR}/File_generated.h"
"${FLATBUFFERS_COMPILED_OUT_DIR}/Message_generated.h"
"${FLATBUFFERS_COMPILED_OUT_DIR}/feather_generated.h"
"${FLATBUFFERS_COMPILED_OUT_DIR}/Schema_generated.h"
"${FLATBUFFERS_COMPILED_OUT_DIR}/SparseTensor_generated.h"
"${FLATBUFFERS_COMPILED_OUT_DIR}/Tensor_generated.h")
set(FBS_SRC
${ARROW_FORMAT_SRC_DIR}/Message.fbs
${ARROW_FORMAT_SRC_DIR}/File.fbs
${ARROW_FORMAT_SRC_DIR}/Schema.fbs
${ARROW_FORMAT_SRC_DIR}/Tensor.fbs
${ARROW_FORMAT_SRC_DIR}/SparseTensor.fbs
${ARROW_IPC_SRC_DIR}/feather.fbs)
foreach (FIL ${FBS_SRC})
get_filename_component(ABS_FIL ${FIL} ABSOLUTE)
list(APPEND ABS_FBS_SRC ${ABS_FIL})
endforeach ()
message(STATUS "FLATBUFFERS_LIBRARY: ${FLATBUFFERS_LIBRARY}, FLATBUFFERS_COMPILER: ${FLATBUFFERS_COMPILER}")
message(STATUS "FLATBUFFERS_COMPILED_OUT_DIR: ${FLATBUFFERS_COMPILED_OUT_DIR}")
message(STATUS "flatc: ${FLATBUFFERS_COMPILER} -c -o ${FLATBUFFERS_COMPILED_OUT_DIR}/ ${ABS_FBS_SRC}")
add_custom_command(OUTPUT ${FBS_OUTPUT_FILES}
COMMAND ${FLATBUFFERS_COMPILER}
-c
-o
${FLATBUFFERS_COMPILED_OUT_DIR}/
${ABS_FBS_SRC}
DEPENDS flatc ${ABS_FBS_SRC}
COMMENT "Running flatc compiler on ${ABS_FBS_SRC}"
VERBATIM)
add_custom_target(metadata_fbs DEPENDS ${FBS_OUTPUT_FILES})
add_dependencies(metadata_fbs flatc)
message(STATUS "FLATBUFFERS_LIBRARY: ${FLATBUFFERS_LIBRARY}")
# arrow-cmake cmake file calling orc cmake subroutine which detects certain compiler features.
# Apple Clang compiler failed to compile this code without specifying c++11 standard.
@ -203,6 +150,7 @@ configure_file("${LIBRARY_DIR}/util/config.h.cmake" "${CMAKE_CURRENT_SOURCE_DIR}
set(ARROW_SRCS
${LIBRARY_DIR}/array.cc
${LIBRARY_DIR}/buffer.cc
${LIBRARY_DIR}/device.cc
${LIBRARY_DIR}/builder.cc
${LIBRARY_DIR}/compare.cc
${LIBRARY_DIR}/extension_type.cc
@ -219,6 +167,11 @@ set(ARROW_SRCS
${LIBRARY_DIR}/type.cc
${LIBRARY_DIR}/visitor.cc
${LIBRARY_DIR}/tensor/coo_converter.cc
${LIBRARY_DIR}/tensor/csc_converter.cc
${LIBRARY_DIR}/tensor/csf_converter.cc
${LIBRARY_DIR}/tensor/csr_converter.cc
${LIBRARY_DIR}/array/builder_adaptive.cc
${LIBRARY_DIR}/array/builder_base.cc
${LIBRARY_DIR}/array/builder_binary.cc
@ -230,6 +183,7 @@ set(ARROW_SRCS
${LIBRARY_DIR}/array/concatenate.cc
${LIBRARY_DIR}/array/dict_internal.cc
${LIBRARY_DIR}/array/diff.cc
${LIBRARY_DIR}/array/validate.cc
${LIBRARY_DIR}/csv/converter.cc
${LIBRARY_DIR}/csv/chunker.cc
@ -237,6 +191,7 @@ set(ARROW_SRCS
${LIBRARY_DIR}/csv/options.cc
${LIBRARY_DIR}/csv/parser.cc
${LIBRARY_DIR}/csv/reader.cc
${LIBRARY_DIR}/csv/column_decoder.cc
${LIBRARY_DIR}/ipc/dictionary.cc
${LIBRARY_DIR}/ipc/feather.cc
@ -251,7 +206,6 @@ set(ARROW_SRCS
${LIBRARY_DIR}/io/file.cc
${LIBRARY_DIR}/io/interfaces.cc
${LIBRARY_DIR}/io/memory.cc
${LIBRARY_DIR}/io/readahead.cc
${LIBRARY_DIR}/io/slow.cc
${LIBRARY_DIR}/util/basic_decimal.cc
@ -274,6 +228,12 @@ set(ARROW_SRCS
${LIBRARY_DIR}/util/thread_pool.cc
${LIBRARY_DIR}/util/trie.cc
${LIBRARY_DIR}/util/utf8.cc
${LIBRARY_DIR}/util/future.cc
${LIBRARY_DIR}/util/formatting.cc
${LIBRARY_DIR}/util/parsing.cc
${LIBRARY_DIR}/util/time.cc
${LIBRARY_DIR}/util/delimiting.cc
${LIBRARY_DIR}/util/iterator.cc
${LIBRARY_DIR}/vendored/base64.cpp
${ORC_SRCS}
@ -321,7 +281,7 @@ endif ()
add_library(${ARROW_LIBRARY} ${ARROW_SRCS})
# Arrow dependencies
add_dependencies(${ARROW_LIBRARY} ${FLATBUFFERS_LIBRARY} metadata_fbs)
add_dependencies(${ARROW_LIBRARY} ${FLATBUFFERS_LIBRARY})
target_link_libraries(${ARROW_LIBRARY} PRIVATE ${FLATBUFFERS_LIBRARY} boost::filesystem)
@ -352,17 +312,18 @@ target_include_directories(${ARROW_LIBRARY} PRIVATE SYSTEM ${ORC_BUILD_INCLUDE_D
target_include_directories(${ARROW_LIBRARY} PRIVATE SYSTEM ${ORC_ADDITION_SOURCE_DIR})
target_include_directories(${ARROW_LIBRARY} PRIVATE SYSTEM ${ARROW_SRC_DIR})
target_include_directories(${ARROW_LIBRARY} PRIVATE SYSTEM ${FLATBUFFERS_INCLUDE_DIR})
target_include_directories(${ARROW_LIBRARY} PRIVATE SYSTEM ${ARROW_GENERATED_INCLUDE_DIR})
# === parquet
set(LIBRARY_DIR ${ClickHouse_SOURCE_DIR}/contrib/arrow/cpp/src/parquet)
set(GEN_LIBRARY_DIR ${ClickHouse_SOURCE_DIR}/contrib/arrow/cpp/src/generated)
# arrow/cpp/src/parquet/CMakeLists.txt
set(PARQUET_SRCS
${LIBRARY_DIR}/arrow/reader.cc
${LIBRARY_DIR}/arrow/reader_internal.cc
${LIBRARY_DIR}/arrow/schema.cc
${LIBRARY_DIR}/arrow/writer.cc
${LIBRARY_DIR}/arrow/path_internal.cc
${LIBRARY_DIR}/bloom_filter.cc
${LIBRARY_DIR}/column_reader.cc
${LIBRARY_DIR}/column_scanner.cc
@ -379,16 +340,19 @@ set(PARQUET_SRCS
${LIBRARY_DIR}/schema.cc
${LIBRARY_DIR}/statistics.cc
${LIBRARY_DIR}/types.cc
${LIBRARY_DIR}/encryption.cc
${LIBRARY_DIR}/encryption_internal.cc
${LIBRARY_DIR}/internal_file_decryptor.cc
${LIBRARY_DIR}/internal_file_encryptor.cc
${GEN_LIBRARY_DIR}/parquet_constants.cpp
${GEN_LIBRARY_DIR}/parquet_types.cpp
)
#list(TRANSFORM PARQUET_SRCS PREPEND ${LIBRARY_DIR}/) # cmake 3.12
list(APPEND PARQUET_SRCS
${CMAKE_CURRENT_SOURCE_DIR}/cpp/src/parquet/parquet_constants.cpp
${CMAKE_CURRENT_SOURCE_DIR}/cpp/src/parquet/parquet_types.cpp
)
add_library(${PARQUET_LIBRARY} ${PARQUET_SRCS})
target_include_directories(${PARQUET_LIBRARY} SYSTEM PUBLIC ${ClickHouse_SOURCE_DIR}/contrib/arrow/cpp/src ${CMAKE_CURRENT_SOURCE_DIR}/cpp/src)
target_include_directories(${PARQUET_LIBRARY} SYSTEM PUBLIC ${ClickHouse_SOURCE_DIR}/contrib/arrow/cpp/src ${CMAKE_CURRENT_SOURCE_DIR}/cpp/src PRIVATE ${OPENSSL_INCLUDE_DIR})
include(${ClickHouse_SOURCE_DIR}/contrib/thrift/build/cmake/ConfigureChecks.cmake) # makes config.h
target_link_libraries(${PARQUET_LIBRARY} PUBLIC ${ARROW_LIBRARY} PRIVATE ${THRIFT_LIBRARY} boost::headers_only boost::regex)
target_link_libraries(${PARQUET_LIBRARY} PUBLIC ${ARROW_LIBRARY} PRIVATE ${THRIFT_LIBRARY} boost::headers_only boost::regex ${OPENSSL_LIBRARIES})
if (SANITIZE STREQUAL "undefined")
target_compile_options(${PARQUET_LIBRARY} PRIVATE -fno-sanitize=undefined)

View File

@ -20,5 +20,7 @@
#define ARROW_VERSION_PATCH
#define ARROW_VERSION ((ARROW_VERSION_MAJOR * 1000) + ARROW_VERSION_MINOR) * 1000 + ARROW_VERSION_PATCH
/* #undef DOUBLE_CONVERSION_HAS_CASE_INSENSIBILITY */
#define ARROW_SO_VERSION ""
#define ARROW_FULL_SO_VERSION ""
/* #undef GRPCPP_PP_INCLUDE */

View File

@ -1,17 +0,0 @@
/**
* Autogenerated by Thrift Compiler (0.12.0)
*
* DO NOT EDIT UNLESS YOU ARE SURE THAT YOU KNOW WHAT YOU ARE DOING
* @generated
*/
#include "parquet_constants.h"
namespace parquet { namespace format {
const parquetConstants g_parquet_constants;
parquetConstants::parquetConstants() {
}
}} // namespace

View File

@ -1,24 +0,0 @@
/**
* Autogenerated by Thrift Compiler (0.12.0)
*
* DO NOT EDIT UNLESS YOU ARE SURE THAT YOU KNOW WHAT YOU ARE DOING
* @generated
*/
#ifndef parquet_CONSTANTS_H
#define parquet_CONSTANTS_H
#include "parquet_types.h"
namespace parquet { namespace format {
class parquetConstants {
public:
parquetConstants();
};
extern const parquetConstants g_parquet_constants;
}} // namespace
#endif

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@ -18,6 +18,13 @@
#ifndef PARQUET_VERSION_H
#define PARQUET_VERSION_H
#define PARQUET_VERSION_MAJOR 1
#define PARQUET_VERSION_MINOR 5
#define PARQUET_VERSION_PATCH 1
#define PARQUET_SO_VERSION 0
#define PARQUET_FULL_SO_VERSION 0.17
// define the parquet created by version
#define CREATED_BY_VERSION "parquet-cpp version 1.5.1-SNAPSHOT"

2
contrib/flatbuffers vendored

@ -1 +1 @@
Subproject commit bf9eb67ab9371755c6bcece13cadc7693bcbf264
Subproject commit 6df40a2471737b27271bdd9b900ab5f3aec746c7

2
contrib/googletest vendored

@ -1 +1 @@
Subproject commit 703bd9caab50b139428cea1aaff9974ebee5742e
Subproject commit 356f2d264a485db2fcc50ec1c672e0d37b6cb39b

View File

@ -31,6 +31,10 @@
"name": "yandex/clickhouse-integration-test",
"dependent": []
},
"docker/test/fuzzer": {
"name": "yandex/clickhouse-fuzzer",
"dependent": []
},
"docker/test/performance-comparison": {
"name": "yandex/clickhouse-performance-comparison",
"dependent": []
@ -79,5 +83,9 @@
"docker/test/integration/runner": {
"name": "yandex/clickhouse-integration-tests-runner",
"dependent": []
},
"docker/test/testflows/runner": {
"name": "yandex/clickhouse-testflows-runner",
"dependent": []
}
}

View File

@ -18,7 +18,7 @@ ccache --zero-stats ||:
ln -s /usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0 /usr/lib/libOpenCL.so ||:
rm -f CMakeCache.txt
cmake --debug-trycompile --verbose=1 -DCMAKE_VERBOSE_MAKEFILE=1 -LA -DCMAKE_BUILD_TYPE=$BUILD_TYPE -DSANITIZE=$SANITIZER $CMAKE_FLAGS ..
ninja clickhouse-bundle
ninja $NINJA_FLAGS clickhouse-bundle
mv ./programs/clickhouse* /output
mv ./src/unit_tests_dbms /output
find . -name '*.so' -print -exec mv '{}' /output \;

View File

@ -149,6 +149,8 @@ def parse_env_variables(build_type, compiler, sanitizer, package_type, image_typ
if clang_tidy:
cmake_flags.append('-DENABLE_CLANG_TIDY=1')
# Don't stop on first error to find more clang-tidy errors in one run.
result.append('NINJA_FLAGS=-k0')
if with_coverage:
cmake_flags.append('-DWITH_COVERAGE=1')

View File

@ -0,0 +1,36 @@
# docker build -t yandex/clickhouse-fuzzer .
FROM ubuntu:18.04
ENV LANG=C.UTF-8
ENV TZ=Europe/Moscow
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install --yes --no-install-recommends \
bash \
ca-certificates \
curl \
gdb \
git \
libc6-dbg \
moreutils \
ncdu \
p7zip-full \
parallel \
psmisc \
rsync \
tree \
tzdata \
vim \
wget \
&& apt-get autoremove --yes \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
COPY * /
CMD cd /workspace \
&& /run-fuzzer.sh 2>&1 | ts "$(printf '%%Y-%%m-%%d %%H:%%M:%%S\t')" | tee main.log
# docker run --network=host --volume <workspace>:/workspace -e PR_TO_TEST=<> -e SHA_TO_TEST=<> yandex/clickhouse-fuzzer

View File

@ -0,0 +1,82 @@
#!/bin/bash
set -ex
set -o pipefail
trap "exit" INT TERM
trap 'kill $(jobs -pr) ||:' EXIT
stage=${stage:-}
script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
# Doesn't work for clone stage, but should work after that
repo_dir=${repo_dir:-$(readlink -f "$script_dir/../../..")}
function clone
{
(
rm -rf ch ||:
mkdir ch
cd ch
git init
git remote add origin https://github.com/ClickHouse/ClickHouse
git fetch --depth=1 origin "$SHA_TO_TEST"
# If not master, try to fetch pull/.../{head,merge}
if [ "$PR_TO_TEST" != "0" ]
then
git fetch --depth=1 origin "refs/pull/$PR_TO_TEST/*:refs/heads/pull/$PR_TO_TEST/*"
fi
git checkout "$SHA_TO_TEST"
)
}
function download
{
# wget -O- -nv -nd -c "https://clickhouse-builds.s3.yandex.net/$PR_TO_TEST/$SHA_TO_TEST/clickhouse_build_check/performance/performance.tgz" \
# | tar --strip-components=1 -zxv
wget -nv -nd -c "https://clickhouse-builds.s3.yandex.net/$PR_TO_TEST/$SHA_TO_TEST/clickhouse_build_check/clang-10_debug_none_bundled_unsplitted_disable_False_binary/clickhouse"
}
function configure
{
mkdir db ||:
cp -av "$repo_dir"/programs/server/config* db
cp -av "$repo_dir"/programs/server/user* db
cp -av "$repo_dir"/tests/config db/config.d
}
function fuzz
{
./clickhouse server --config-file db/config.xml -- --path db 2>&1 | tail -1000000 > server.log &
server_pid=$!
kill -0 $server_pid
while ! ./clickhouse client --query "select 1" && kill -0 $server_pid ; do echo . ; sleep 1 ; done
./clickhouse client --query "select 1"
echo Server started
for f in $(ls ch/tests/queries/0_stateless/*.sql | sort -R); do cat $f; echo ';'; done \
| ./clickhouse client --query-fuzzer-runs=10 2>&1 | tail -1000000 > fuzzer.log
}
case "$stage" in
"")
;&
"clone")
time clone
stage=download time ch/docker/test/fuzzer/run-fuzzer.sh
;;
"download")
time download
;&
"configure")
time configure
;&
"fuzz")
time fuzz
;&
"report")
;&
esac

View File

@ -1,6 +1,6 @@
## Docker containers for integration tests
- `base` container with required packages
- `runner` container with that runs integration tests in docker
- `compose` contains docker_compose YaML files that are used in tests
- `runnner/compose` contains docker\_compose YaML files that are used in tests
How to run integration tests is described in tests/integration/README.md
How to run integration tests is described in tests/integration/README.md

View File

@ -63,6 +63,7 @@ RUN set -eux; \
COPY modprobe.sh /usr/local/bin/modprobe
COPY dockerd-entrypoint.sh /usr/local/bin/
COPY compose/ /compose/
RUN set -x \
&& addgroup --system dockremap \

View File

@ -18,8 +18,58 @@ to have false positives.
### How to read the report
Should add inline comments there, because who reads the docs anyway. They must
be collapsible and I am afraid of Javascript, so I'm going to do it later.
The check status summarizes the report in a short text message like `1 faster, 10 unstable`:
* `1 faster` -- how many queries became faster,
* `1 slower` -- how many queries are slower,
* `1 too long` -- how many queries are taking too long to run,
* `1 unstable` -- how many queries have unstable results,
* `1 errors` -- how many errors there are in total. Action is required for every error, this number must be zero. The number of errors includes slower tests, tests that are too long, errors while running the tests and building reports, etc. Please look at the main report page to investigate these errors.
The report page itself constists of a several tables. Some of them always signify errors, e.g. "Run errors" -- the very presence of this table indicates that there were errors during the test, that are not normal and must be fixed. Some tables are mostly informational, e.g. "Test times" -- they reflect normal test results. But if a cell in such table is marked in red, this also means an error, e.g., a test is taking too long to run.
#### Tested commits
Informational, no action required. Log messages for the commits that are tested. Note that for the right commit, we show nominal tested commit `pull/*/head` and real tested commit `pull/*/merge`, which is generated by GitHub by merging latest master to the `pull/*/head` and which we actually build and test in CI.
#### Run errors
Action required for every item -- these are errors that must be fixed. The errors that ocurred when running some test queries. For more information about the error, download test output archive and see `test-name-err.log`. To reproduce, see 'How to run' below.
#### Slow on client
Action required for every item -- these are errors that must be fixed. This table shows queries that take significantly longer to process on the client than on the server. A possible reason might be sending too much data to the client, e.g., a forgotten `format Null`.
#### Partial queries
Action required for the cells marked in red. Shows the queries we are unable to run on an old server -- probably because they contain a new function. You should see this table when you add a new function and a performance test for it. Check that the run time and variance are acceptable (run time between 0.1 and 1 seconds, variance below 10%). If not, they will be highlighted in red.
#### Changes in performance
Action required for the cells marked in red, and some cheering is appropriate for the cells marked in green. These are the queries for which we observe a statistically significant change in performance. Note that there will always be some false positives -- we try to filter by p < 0.001, and have 2000 queries, so two false positives per run are expected. In practice we have more -- e.g. code layout changed because of some unknowable jitter in compiler internals, so the change we observe is real, but it is a 'false positive' in the sense that it is not directly caused by your changes. If, based on your knowledge of ClickHouse internals, you can decide that the observed test changes are not relevant to the changes made in the tested PR, you can ignore them.
You can find flame graphs for queries with performance changes in the test output archive, in files named as 'my_test_0_Cpu_SELECT 1 FROM....FORMAT Null.left.svg'. First goes the test name, then the query number in the test, then the trace type (same as in `system.trace_log`), and then the server version (left is old and right is new).
#### Unstable queries
Action required for the cells marked in red. These are queries for which we did not observe a statistically significant change in performance, but for which the variance in query performance is very high. This means that we are likely to observe big changes in performance even in the absence of real changes, e.g. when comparing the server to itself. Such queries are going to have bad sensitivity as performance tests -- if a query has, say, 50% expected variability, this means we are going to see changes in performance up to 50%, even when there were no real changes in the code. And because of this, we won't be able to detect changes less than 50% with such a query, which is pretty bad. The reasons for the high variability must be investigated and fixed; ideally, the variability should be brought under 5-10%.
The most frequent reason for instability is that the query is just too short -- e.g. below 0.1 seconds. Bringing query time to 0.2 seconds or above usually helps.
Other reasons may include:
* using a lot of memory which is allocated differently between servers, so the access time may vary. This may apply to your queries if you have a `Memory` engine table that is bigger than 1 GB. For example, this problem has plagued `arithmetic` and `logical_functions` tests for a long time.
* having some threshold behavior in the query, e.g. you insert to a Buffer table and it is flushed only on some query runs, so you get a much higher time for them.
Investigating the instablility is the hardest problem in performance testing, and we still have not been able to understand the reasons behind the instability of some queries. There are some data that can help you in the performance test output archive. Look for files named 'my_unstable_test_0_SELECT 1...FORMAT Null.{left,right}.metrics.rep'. They contain metrics from `system.query_log.ProfileEvents` and functions from stack traces from `system.trace_log`, that vary significantly between query runs. The second column is array of \[min, med, max] values for the metric. Say, if you see `PerfCacheMisses` there, it may mean that the code being tested has not-so-cache-local memory access pattern that is sensitive to memory layout.
#### Skipped tests
Informational, no action required. Shows the tests that were skipped, and the reason for it. Normally it is because the data set required for the test was not loaded, or the test is marked as 'long' -- both cases mean that the test is too big to be ran per-commit.
#### Test performance changes
Informational, no action required. This table summarizes the changes in performance of queries in each test -- how many queries have changed, how many are unstable, and what is the magnitude of the changes.
#### Test times
Action required for the cells marked in red. This table shows the run times for all the tests. You may have to fix two kinds of errors in this table:
1) Average query run time is too long -- probalby means that the preparatory steps such as creating the table and filling them with data are taking too long. Try to make them faster.
2) Longest query run time is too long -- some particular queries are taking too long, try to make them faster. The ideal query run time is between 0.1 and 1 s.
#### Concurrent benchmarks
No action required. This table shows the results of a concurrent behcmark where queries from `website` are ran in parallel using `clickhouse-benchmark`, and requests per second values are compared for old and new servers. It shows variability up to 20% for no apparent reason, so it's probably safe to disregard it. We have it for special cases like investigating concurrency effects in memory allocators, where it may be important.
#### Metric changes
No action required. These are changes in median values of metrics from `system.asynchronous_metrics_log`. Again, they are prone to unexplained variation and you can safely ignore this table unless it's interesting to you for some particular reason (e.g. you want to compare memory usage). There are also graphs of these metrics in the performance test output archive, in the `metrics` folder.
### How to run
Run the entire docker container, specifying PR number (0 for master)

View File

@ -6,8 +6,6 @@
<allow_introspection_functions>1</allow_introspection_functions>
<log_queries>1</log_queries>
<metrics_perf_events_enabled>1</metrics_perf_events_enabled>
<memory_profiler_sample_probability>1</memory_profiler_sample_probability>
<max_untracked_memory>1048576</max_untracked_memory> <!-- 1MB -->
</default>
</profiles>
</yandex>

View File

@ -11,46 +11,5 @@ COPY s3downloader /s3downloader
ENV DATASETS="hits visits"
CMD dpkg -i package_folder/clickhouse-common-static_*.deb; \
dpkg -i package_folder/clickhouse-common-static-dbg_*.deb; \
dpkg -i package_folder/clickhouse-server_*.deb; \
dpkg -i package_folder/clickhouse-client_*.deb; \
dpkg -i package_folder/clickhouse-test_*.deb; \
mkdir -p /etc/clickhouse-server/dict_examples; \
ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/dict_examples/; \
ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/dict_examples/; \
ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/dict_examples/; \
ln -s /usr/share/clickhouse-test/config/zookeeper.xml /etc/clickhouse-server/config.d/; \
ln -s /usr/share/clickhouse-test/config/listen.xml /etc/clickhouse-server/config.d/; \
ln -s /usr/share/clickhouse-test/config/part_log.xml /etc/clickhouse-server/config.d/; \
ln -s /usr/share/clickhouse-test/config/text_log.xml /etc/clickhouse-server/config.d/; \
ln -s /usr/share/clickhouse-test/config/metric_log.xml /etc/clickhouse-server/config.d/; \
ln -s /usr/share/clickhouse-test/config/log_queries.xml /etc/clickhouse-server/users.d/; \
ln -s /usr/share/clickhouse-test/config/readonly.xml /etc/clickhouse-server/users.d/; \
ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/; \
ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/; \
ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/; \
ln -s /usr/share/clickhouse-test/config/macros.xml /etc/clickhouse-server/config.d/; \
ln -s /usr/lib/llvm-9/bin/llvm-symbolizer /usr/bin/llvm-symbolizer; \
if [[ -n "$USE_DATABASE_ATOMIC" ]] && [[ "$USE_DATABASE_ATOMIC" -eq 1 ]]; then ln -s /usr/share/clickhouse-test/config/database_atomic_configd.xml /etc/clickhouse-server/config.d/; fi; \
if [[ -n "$USE_DATABASE_ATOMIC" ]] && [[ "$USE_DATABASE_ATOMIC" -eq 1 ]]; then ln -s /usr/share/clickhouse-test/config/database_atomic_usersd.xml /etc/clickhouse-server/users.d/; fi; \
echo "TSAN_OPTIONS='verbosity=1000 halt_on_error=1 history_size=7'" >> /etc/environment; \
echo "TSAN_SYMBOLIZER_PATH=/usr/lib/llvm-8/bin/llvm-symbolizer" >> /etc/environment; \
echo "UBSAN_OPTIONS='print_stacktrace=1'" >> /etc/environment; \
echo "ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-6.0/bin/llvm-symbolizer" >> /etc/environment; \
echo "UBSAN_SYMBOLIZER_PATH=/usr/lib/llvm-6.0/bin/llvm-symbolizer" >> /etc/environment; \
echo "LLVM_SYMBOLIZER_PATH=/usr/lib/llvm-6.0/bin/llvm-symbolizer" >> /etc/environment; \
service zookeeper start; sleep 5; \
service clickhouse-server start && sleep 5 \
&& /s3downloader --dataset-names $DATASETS \
&& chmod 777 -R /var/lib/clickhouse \
&& clickhouse-client --query "SHOW DATABASES" \
&& clickhouse-client --query "ATTACH DATABASE datasets ENGINE = Ordinary" \
&& clickhouse-client --query "CREATE DATABASE test" \
&& service clickhouse-server restart && sleep 5 \
&& clickhouse-client --query "SHOW TABLES FROM datasets" \
&& clickhouse-client --query "SHOW TABLES FROM test" \
&& clickhouse-client --query "RENAME TABLE datasets.hits_v1 TO test.hits" \
&& clickhouse-client --query "RENAME TABLE datasets.visits_v1 TO test.visits" \
&& clickhouse-client --query "SHOW TABLES FROM test" \
&& clickhouse-test --testname --shard --zookeeper --no-stateless --use-skip-list $ADDITIONAL_OPTIONS $SKIP_TESTS_OPTION 2>&1 | ts '%Y-%m-%d %H:%M:%S' | tee test_output/test_result.txt
COPY run.sh /
CMD ["/bin/bash", "/run.sh"]

59
docker/test/stateful/run.sh Executable file
View File

@ -0,0 +1,59 @@
#!/bin/bash
set -e -x
dpkg -i package_folder/clickhouse-common-static_*.deb;
dpkg -i package_folder/clickhouse-common-static-dbg_*.deb
dpkg -i package_folder/clickhouse-server_*.deb
dpkg -i package_folder/clickhouse-client_*.deb
dpkg -i package_folder/clickhouse-test_*.deb
mkdir -p /etc/clickhouse-server/dict_examples
ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/dict_examples/
ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/dict_examples/
ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/dict_examples/
ln -s /usr/share/clickhouse-test/config/zookeeper.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/listen.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/part_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/text_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/metric_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/log_queries.xml /etc/clickhouse-server/users.d/
ln -s /usr/share/clickhouse-test/config/readonly.xml /etc/clickhouse-server/users.d/
ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/macros.xml /etc/clickhouse-server/config.d/
if [[ -n "$USE_DATABASE_ATOMIC" ]] && [[ "$USE_DATABASE_ATOMIC" -eq 1 ]]; then
ln -s /usr/share/clickhouse-test/config/database_atomic_configd.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/database_atomic_usersd.xml /etc/clickhouse-server/users.d/
fi
echo "TSAN_OPTIONS='verbosity=1000 halt_on_error=1 history_size=7'" >> /etc/environment
echo "TSAN_SYMBOLIZER_PATH=/usr/lib/llvm-10/bin/llvm-symbolizer" >> /etc/environment
echo "UBSAN_OPTIONS='print_stacktrace=1'" >> /etc/environment
echo "ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-10/bin/llvm-symbolizer" >> /etc/environment
echo "UBSAN_SYMBOLIZER_PATH=/usr/lib/llvm-10/bin/llvm-symbolizer" >> /etc/environment
echo "LLVM_SYMBOLIZER_PATH=/usr/lib/llvm-10/bin/llvm-symbolizer" >> /etc/environment
service zookeeper start
sleep 5
service clickhouse-server start
sleep 5
/s3downloader --dataset-names $DATASETS
chmod 777 -R /var/lib/clickhouse
clickhouse-client --query "SHOW DATABASES"
clickhouse-client --query "ATTACH DATABASE datasets ENGINE = Ordinary"
clickhouse-client --query "CREATE DATABASE test"
service clickhouse-server restart && sleep 5
clickhouse-client --query "SHOW TABLES FROM datasets"
clickhouse-client --query "SHOW TABLES FROM test"
clickhouse-client --query "RENAME TABLE datasets.hits_v1 TO test.hits"
clickhouse-client --query "RENAME TABLE datasets.visits_v1 TO test.visits"
clickhouse-client --query "SHOW TABLES FROM test"
if cat /usr/bin/clickhouse-test | grep -q -- "--use-skip-list"; then
SKIP_LIST_OPT="--use-skip-list"
fi
clickhouse-test --testname --shard --zookeeper --no-stateless "$SKIP_LIST_OPT" $ADDITIONAL_OPTIONS $SKIP_TESTS_OPTION 2>&1 | ts '%Y-%m-%d %H:%M:%S' | tee test_output/test_result.txt

View File

@ -66,8 +66,7 @@ ln -s /usr/share/clickhouse-test/config/zookeeper.xml /etc/clickhouse-server/con
ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/; \
ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/; \
ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/; \
ln -s /usr/share/clickhouse-test/config/macros.xml /etc/clickhouse-server/config.d/; \
ln -s /usr/lib/llvm-8/bin/llvm-symbolizer /usr/bin/llvm-symbolizer
ln -s /usr/share/clickhouse-test/config/macros.xml /etc/clickhouse-server/config.d/;
service zookeeper start
@ -105,7 +104,12 @@ LLVM_PROFILE_FILE='client_%h_%p_%m.profraw' clickhouse-client --query "SHOW TABL
LLVM_PROFILE_FILE='client_%h_%p_%m.profraw' clickhouse-client --query "RENAME TABLE datasets.hits_v1 TO test.hits"
LLVM_PROFILE_FILE='client_%h_%p_%m.profraw' clickhouse-client --query "RENAME TABLE datasets.visits_v1 TO test.visits"
LLVM_PROFILE_FILE='client_%h_%p_%m.profraw' clickhouse-client --query "SHOW TABLES FROM test"
LLVM_PROFILE_FILE='client_%h_%p_%m.profraw' clickhouse-test --testname --shard --zookeeper --no-stateless --use-skip-list $ADDITIONAL_OPTIONS $SKIP_TESTS_OPTION 2>&1 | ts '%Y-%m-%d %H:%M:%S' | tee test_output/test_result.txt
if cat /usr/bin/clickhouse-test | grep -q -- "--use-skip-list"; then
SKIP_LIST_OPT="--use-skip-list"
fi
LLVM_PROFILE_FILE='client_%h_%p_%m.profraw' clickhouse-test --testname --shard --zookeeper --no-stateless "$SKIP_LIST_OPT" $ADDITIONAL_OPTIONS $SKIP_TESTS_OPTION 2>&1 | ts '%Y-%m-%d %H:%M:%S' | tee test_output/test_result.txt
kill_clickhouse

View File

@ -47,40 +47,7 @@ RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN echo "TSAN_OPTIONS='verbosity=1000 halt_on_error=1 history_size=7'" >> /etc/environment; \
echo "UBSAN_OPTIONS='print_stacktrace=1'" >> /etc/environment; \
echo "MSAN_OPTIONS='abort_on_error=1'" >> /etc/environment; \
ln -s /usr/lib/llvm-9/bin/llvm-symbolizer /usr/bin/llvm-symbolizer;
ln -s /usr/lib/llvm-10/bin/llvm-symbolizer /usr/bin/llvm-symbolizer;
CMD dpkg -i package_folder/clickhouse-common-static_*.deb; \
dpkg -i package_folder/clickhouse-common-static-dbg_*.deb; \
dpkg -i package_folder/clickhouse-server_*.deb; \
dpkg -i package_folder/clickhouse-client_*.deb; \
dpkg -i package_folder/clickhouse-test_*.deb; \
mkdir -p /etc/clickhouse-server/dict_examples; \
ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/dict_examples/; \
ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/dict_examples/; \
ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/dict_examples/; \
ln -s /usr/share/clickhouse-test/config/zookeeper.xml /etc/clickhouse-server/config.d/; \
ln -s /usr/share/clickhouse-test/config/listen.xml /etc/clickhouse-server/config.d/; \
ln -s /usr/share/clickhouse-test/config/part_log.xml /etc/clickhouse-server/config.d/; \
ln -s /usr/share/clickhouse-test/config/text_log.xml /etc/clickhouse-server/config.d/; \
ln -s /usr/share/clickhouse-test/config/metric_log.xml /etc/clickhouse-server/config.d/; \
ln -s /usr/share/clickhouse-test/config/query_masking_rules.xml /etc/clickhouse-server/config.d/; \
ln -s /usr/share/clickhouse-test/config/log_queries.xml /etc/clickhouse-server/users.d/; \
ln -s /usr/share/clickhouse-test/config/readonly.xml /etc/clickhouse-server/users.d/; \
ln -s /usr/share/clickhouse-test/config/access_management.xml /etc/clickhouse-server/users.d/; \
ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/; \
ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/; \
ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/; \
ln -s /usr/share/clickhouse-test/config/macros.xml /etc/clickhouse-server/config.d/; \
ln -s /usr/share/clickhouse-test/config/disks.xml /etc/clickhouse-server/config.d/; \
ln -s /usr/share/clickhouse-test/config/secure_ports.xml /etc/clickhouse-server/config.d/; \
ln -s /usr/share/clickhouse-test/config/clusters.xml /etc/clickhouse-server/config.d/; \
ln -s /usr/share/clickhouse-test/config/graphite.xml /etc/clickhouse-server/config.d/; \
ln -s /usr/share/clickhouse-test/config/server.key /etc/clickhouse-server/; \
ln -s /usr/share/clickhouse-test/config/server.crt /etc/clickhouse-server/; \
ln -s /usr/share/clickhouse-test/config/dhparam.pem /etc/clickhouse-server/; \
if [[ -n "$USE_POLYMORPHIC_PARTS" ]] && [[ "$USE_POLYMORPHIC_PARTS" -eq 1 ]]; then ln -s /usr/share/clickhouse-test/config/polymorphic_parts.xml /etc/clickhouse-server/config.d/; fi; \
if [[ -n "$USE_DATABASE_ATOMIC" ]] && [[ "$USE_DATABASE_ATOMIC" -eq 1 ]]; then ln -s /usr/share/clickhouse-test/config/database_atomic_configd.xml /etc/clickhouse-server/config.d/; fi; \
if [[ -n "$USE_DATABASE_ATOMIC" ]] && [[ "$USE_DATABASE_ATOMIC" -eq 1 ]]; then ln -s /usr/share/clickhouse-test/config/database_atomic_usersd.xml /etc/clickhouse-server/users.d/; fi; \
ln -sf /usr/share/clickhouse-test/config/client_config.xml /etc/clickhouse-client/config.xml; \
service zookeeper start; sleep 5; \
service clickhouse-server start && sleep 5 && clickhouse-test --testname --shard --zookeeper --use-skip-list $ADDITIONAL_OPTIONS $SKIP_TESTS_OPTION 2>&1 | ts '%Y-%m-%d %H:%M:%S' | tee test_output/test_result.txt
COPY run.sh /
CMD ["/bin/bash", "/run.sh"]

61
docker/test/stateless/run.sh Executable file
View File

@ -0,0 +1,61 @@
#!/bin/bash
set -e -x
dpkg -i package_folder/clickhouse-common-static_*.deb
dpkg -i package_folder/clickhouse-common-static-dbg_*.deb
dpkg -i package_folder/clickhouse-server_*.deb
dpkg -i package_folder/clickhouse-client_*.deb
dpkg -i package_folder/clickhouse-test_*.deb
mkdir -p /etc/clickhouse-server/dict_examples
ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/dict_examples/
ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/dict_examples/
ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/dict_examples/
ln -s /usr/share/clickhouse-test/config/zookeeper.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/listen.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/part_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/text_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/metric_log.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/query_masking_rules.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/log_queries.xml /etc/clickhouse-server/users.d/
ln -s /usr/share/clickhouse-test/config/readonly.xml /etc/clickhouse-server/users.d/
ln -s /usr/share/clickhouse-test/config/access_management.xml /etc/clickhouse-server/users.d/
ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/macros.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/disks.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/secure_ports.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/clusters.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/graphite.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/server.key /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/server.crt /etc/clickhouse-server/
ln -s /usr/share/clickhouse-test/config/dhparam.pem /etc/clickhouse-server/
if [[ -n "$USE_POLYMORPHIC_PARTS" ]] && [[ "$USE_POLYMORPHIC_PARTS" -eq 1 ]]; then
ln -s /usr/share/clickhouse-test/config/polymorphic_parts.xml /etc/clickhouse-server/config.d/
fi
if [[ -n "$USE_DATABASE_ATOMIC" ]] && [[ "$USE_DATABASE_ATOMIC" -eq 1 ]]; then
ln -s /usr/share/clickhouse-test/config/database_atomic_configd.xml /etc/clickhouse-server/config.d/
ln -s /usr/share/clickhouse-test/config/database_atomic_usersd.xml /etc/clickhouse-server/users.d/
fi
ln -sf /usr/share/clickhouse-test/config/client_config.xml /etc/clickhouse-client/config.xml
echo "TSAN_OPTIONS='verbosity=1000 halt_on_error=1 history_size=7'" >> /etc/environment
echo "TSAN_SYMBOLIZER_PATH=/usr/lib/llvm-10/bin/llvm-symbolizer" >> /etc/environment
echo "UBSAN_OPTIONS='print_stacktrace=1'" >> /etc/environment
echo "ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-10/bin/llvm-symbolizer" >> /etc/environment
echo "UBSAN_SYMBOLIZER_PATH=/usr/lib/llvm-10/bin/llvm-symbolizer" >> /etc/environment
echo "LLVM_SYMBOLIZER_PATH=/usr/lib/llvm-10/bin/llvm-symbolizer" >> /etc/environment
service zookeeper start
sleep 5
service clickhouse-server start && sleep 5
if cat /usr/bin/clickhouse-test | grep -q -- "--use-skip-list"; then
SKIP_LIST_OPT="--use-skip-list"
fi
clickhouse-test --testname --shard --zookeeper "$SKIP_LIST_OPT" $ADDITIONAL_OPTIONS $SKIP_TESTS_OPTION 2>&1 | ts '%Y-%m-%d %H:%M:%S' | tee test_output/test_result.txt

View File

@ -44,4 +44,11 @@ ENV TZ=Europe/Moscow
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
COPY run.sh /run.sh
# Sanitizer options
RUN echo "TSAN_OPTIONS='verbosity=1000 halt_on_error=1 history_size=7'" >> /etc/environment; \
echo "UBSAN_OPTIONS='print_stacktrace=1'" >> /etc/environment; \
echo "MSAN_OPTIONS='abort_on_error=1'" >> /etc/environment; \
ln -s /usr/lib/llvm-10/bin/llvm-symbolizer /usr/bin/llvm-symbolizer;
CMD ["/bin/bash", "/run.sh"]

View File

@ -66,8 +66,7 @@ ln -s /usr/share/clickhouse-test/config/zookeeper.xml /etc/clickhouse-server/con
ln -s /usr/share/clickhouse-test/config/server.key /etc/clickhouse-server/; \
ln -s /usr/share/clickhouse-test/config/server.crt /etc/clickhouse-server/; \
ln -s /usr/share/clickhouse-test/config/dhparam.pem /etc/clickhouse-server/; \
ln -sf /usr/share/clickhouse-test/config/client_config.xml /etc/clickhouse-client/config.xml; \
ln -s /usr/lib/llvm-8/bin/llvm-symbolizer /usr/bin/llvm-symbolizer
ln -sf /usr/share/clickhouse-test/config/client_config.xml /etc/clickhouse-client/config.xml
service zookeeper start
sleep 5
@ -76,7 +75,12 @@ start_clickhouse
sleep 10
LLVM_PROFILE_FILE='client_coverage.profraw' clickhouse-test --testname --shard --zookeeper --use-skip-list $ADDITIONAL_OPTIONS $SKIP_TESTS_OPTION 2>&1 | ts '%Y-%m-%d %H:%M:%S' | tee test_output/test_result.txt
if cat /usr/bin/clickhouse-test | grep -q -- "--use-skip-list"; then
SKIP_LIST_OPT="--use-skip-list"
fi
LLVM_PROFILE_FILE='client_coverage.profraw' clickhouse-test --testname --shard --zookeeper "$SKIP_LIST_OPT" $ADDITIONAL_OPTIONS $SKIP_TESTS_OPTION 2>&1 | ts '%Y-%m-%d %H:%M:%S' | tee test_output/test_result.txt
kill_clickhouse

View File

@ -33,7 +33,6 @@ CMD dpkg -i package_folder/clickhouse-common-static_*.deb; \
dpkg -i package_folder/clickhouse-test_*.deb; \
ln -s /usr/share/clickhouse-test/config/log_queries.xml /etc/clickhouse-server/users.d/; \
ln -s /usr/share/clickhouse-test/config/part_log.xml /etc/clickhouse-server/config.d/; \
ln -s /usr/lib/llvm-9/bin/llvm-symbolizer /usr/bin/llvm-symbolizer; \
echo "TSAN_OPTIONS='halt_on_error=1 history_size=7 ignore_noninstrumented_modules=1 verbosity=1'" >> /etc/environment; \
echo "UBSAN_OPTIONS='print_stacktrace=1'" >> /etc/environment; \
echo "ASAN_OPTIONS='malloc_context_size=10 verbosity=1 allocator_release_to_os_interval_ms=10000'" >> /etc/environment; \

View File

@ -1,5 +1,5 @@
#!/usr/bin/env python
#-*- coding: utf-8 -*-
# -*- coding: utf-8 -*-
from multiprocessing import cpu_count
from subprocess import Popen, check_call
import os
@ -8,27 +8,39 @@ import argparse
import logging
import time
def get_skip_list_cmd(path):
with open(path, 'r') as f:
for line in f:
if '--use-skip-list' in line:
return '--use-skip-list'
return ''
def run_perf_test(cmd, xmls_path, output_folder):
output_path = os.path.join(output_folder, "perf_stress_run.txt")
f = open(output_path, 'w')
p = Popen("{} --skip-tags=long --recursive --input-files {}".format(cmd, xmls_path), shell=True, stdout=f, stderr=f)
return p
def run_func_test(cmd, output_prefix, num_processes, skip_tests_option):
skip_list_opt = get_skip_list_cmd(cmd)
output_paths = [os.path.join(output_prefix, "stress_test_run_{}.txt".format(i)) for i in range(num_processes)]
f = open(output_paths[0], 'w')
main_command = "{} --use-skip-list {}".format(cmd, skip_tests_option)
main_command = "{} {} {}".format(cmd, skip_list_opt, skip_tests_option)
logging.info("Run func tests main cmd '%s'", main_command)
pipes = [Popen(main_command, shell=True, stdout=f, stderr=f)]
for output_path in output_paths[1:]:
time.sleep(0.5)
f = open(output_path, 'w')
full_command = "{} --use-skip-list --order=random {}".format(cmd, skip_tests_option)
full_command = "{} {} --order=random {}".format(cmd, skip_list_opt, skip_tests_option)
logging.info("Run func tests '%s'", full_command)
p = Popen(full_command, shell=True, stdout=f, stderr=f)
pipes.append(p)
return pipes
def check_clickhouse_alive(cmd):
try:
logging.info("Checking ClickHouse still alive")
@ -37,10 +49,11 @@ def check_clickhouse_alive(cmd):
except:
return False
if __name__ == "__main__":
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
parser = argparse.ArgumentParser(description="ClickHouse script for running stresstest")
parser.add_argument("--test-cmd", default='clickhouse-test')
parser.add_argument("--test-cmd", default='/usr/bin/clickhouse-test')
parser.add_argument("--skip-func-tests", default='')
parser.add_argument("--client-cmd", default='clickhouse-client')
parser.add_argument("--perf-test-cmd", default='clickhouse-performance-test')

View File

@ -0,0 +1,76 @@
# docker build -t yandex/clickhouse-testflows-runner .
FROM ubuntu:20.04
RUN apt-get update \
&& env DEBIAN_FRONTEND=noninteractive apt-get install --yes \
ca-certificates \
bash \
btrfs-progs \
e2fsprogs \
iptables \
xfsprogs \
tar \
pigz \
wget \
git \
iproute2 \
cgroupfs-mount \
python3-pip \
tzdata \
libreadline-dev \
libicu-dev \
bsdutils \
curl \
liblua5.1-dev \
luajit \
libssl-dev \
libcurl4-openssl-dev \
gdb \
&& rm -rf \
/var/lib/apt/lists/* \
/var/cache/debconf \
/tmp/* \
&& apt-get clean
ENV TZ=Europe/Moscow
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN pip3 install urllib3 testflows==1.6.24 docker-compose docker dicttoxml kazoo tzlocal
ENV DOCKER_CHANNEL stable
ENV DOCKER_VERSION 17.09.1-ce
RUN set -eux; \
\
# this "case" statement is generated via "update.sh"
\
if ! wget -O docker.tgz "https://download.docker.com/linux/static/${DOCKER_CHANNEL}/x86_64/docker-${DOCKER_VERSION}.tgz"; then \
echo >&2 "error: failed to download 'docker-${DOCKER_VERSION}' from '${DOCKER_CHANNEL}' for '${x86_64}'"; \
exit 1; \
fi; \
\
tar --extract \
--file docker.tgz \
--strip-components 1 \
--directory /usr/local/bin/ \
; \
rm docker.tgz; \
\
dockerd --version; \
docker --version
COPY modprobe.sh /usr/local/bin/modprobe
COPY dockerd-entrypoint.sh /usr/local/bin/
RUN set -x \
&& addgroup --system dockremap \
&& adduser --system dockremap \
&& adduser dockremap dockremap \
&& echo 'dockremap:165536:65536' >> /etc/subuid \
&& echo 'dockremap:165536:65536' >> /etc/subgid
VOLUME /var/lib/docker
EXPOSE 2375
ENTRYPOINT ["dockerd-entrypoint.sh"]
CMD ["sh", "-c", "python3 regression.py --no-color --local --clickhouse-binary-path ${CLICKHOUSE_TESTS_SERVER_BIN_PATH} --log test.log ${TESTFLOWS_OPTS}; cat test.log | tfs report results --format json > results.json"]

View File

@ -0,0 +1,26 @@
#!/bin/bash
set -e
dockerd --host=unix:///var/run/docker.sock --host=tcp://0.0.0.0:2375 &>/var/log/somefile &
set +e
reties=0
while true; do
docker info &>/dev/null && break
reties=$[$reties+1]
if [[ $reties -ge 100 ]]; then # 10 sec max
echo "Can't start docker daemon, timeout exceeded." >&2
exit 1;
fi
sleep 0.1
done
set -e
echo "Start tests"
export CLICKHOUSE_TESTS_SERVER_BIN_PATH=/clickhouse
export CLICKHOUSE_TESTS_CLIENT_BIN_PATH=/clickhouse
export CLICKHOUSE_TESTS_BASE_CONFIG_DIR=/clickhouse-config
export CLICKHOUSE_ODBC_BRIDGE_BINARY_PATH=/clickhouse-odbc-bridge
cd /ClickHouse/tests/testflows
exec "$@"

View File

@ -0,0 +1,20 @@
#!/bin/sh
set -eu
# "modprobe" without modprobe
# https://twitter.com/lucabruno/status/902934379835662336
# this isn't 100% fool-proof, but it'll have a much higher success rate than simply using the "real" modprobe
# Docker often uses "modprobe -va foo bar baz"
# so we ignore modules that start with "-"
for module; do
if [ "${module#-}" = "$module" ]; then
ip link show "$module" || true
lsmod | grep "$module" || true
fi
done
# remove /usr/local/... from PATH so we can exec the real modprobe as a last resort
export PATH='/usr/sbin:/usr/bin:/sbin:/bin'
exec modprobe "$@"

View File

@ -1,7 +1,17 @@
---
toc_folder_title: Commercial
toc_priority: 70
toc_title: Commercial
toc_title: Introduction
---
# ClickHouse Commercial Services {#clickhouse-commercial-services}
This section is a directory of commercial service providers specializing in ClickHouse. They are independent companies not necessarily affiliated with Yandex.
Service categories:
- [Cloud](../commercial/cloud.md)
- [Support](../commercial/support.md)
!!! note "For service providers"
If you happen to represent one of them, feel free to open a pull request adding your company to the respective section (or even adding a new section if the service doesnt fit into existing categories). The easiest way to open a pull-request for documentation page is by using a “pencil” edit button in the top-right corner. If your service available in some local market, make sure to mention it in a localized documentation page as well (or at least point it out in a pull-request description).

View File

@ -5,7 +5,7 @@ toc_title: Architecture Overview
# Overview of ClickHouse Architecture {#overview-of-clickhouse-architecture}
ClickHouse is a true column-oriented DBMS. Data is stored by columns, and during the execution of arrays (vectors or chunks of columns). Whenever possible, operations are dispatched on arrays, rather than on individual values. It is called "vectorized query execution" and it helps lower the cost of actual data processing.
ClickHouse is a true column-oriented DBMS. Data is stored by columns, and during the execution of arrays (vectors or chunks of columns). Whenever possible, operations are dispatched on arrays, rather than on individual values. It is called “vectorized query execution” and it helps lower the cost of actual data processing.
> This idea is nothing new. It dates back to the `APL` (A programming language, 1957) and its descendants: `A +` (APL dialect), `J` (1990), `K` (1993), and `Q` (programming language from Kx Systems, 2003). Array programming is used in scientific data processing. Neither is this idea something new in relational databases: for example, it is used in the `VectorWise` system (also known as Actian Vector Analytic Database by Actian Corporation).
@ -21,11 +21,11 @@ Various `IColumn` implementations (`ColumnUInt8`, `ColumnString`, and so on) are
Nevertheless, it is possible to work with individual values as well. To represent an individual value, the `Field` is used. `Field` is just a discriminated union of `UInt64`, `Int64`, `Float64`, `String` and `Array`. `IColumn` has the `operator []` method to get the n-th value as a `Field`, and the `insert` method to append a `Field` to the end of a column. These methods are not very efficient, because they require dealing with temporary `Field` objects representing an individual value. There are more efficient methods, such as `insertFrom`, `insertRangeFrom`, and so on.
`Field` doesn't have enough information about a specific data type for a table. For example, `UInt8`, `UInt16`, `UInt32`, and `UInt64` are all represented as `UInt64` in a `Field`.
`Field` doesnt have enough information about a specific data type for a table. For example, `UInt8`, `UInt16`, `UInt32`, and `UInt64` are all represented as `UInt64` in a `Field`.
## Leaky Abstractions {#leaky-abstractions}
`IColumn` has methods for common relational transformations of data, but they dont meet all needs. For example, `ColumnUInt64` doesn't have a method to calculate the sum of two columns, and `ColumnString` doesn't have a method to run a substring search. These countless routines are implemented outside of `IColumn`.
`IColumn` has methods for common relational transformations of data, but they dont meet all needs. For example, `ColumnUInt64` doesnt have a method to calculate the sum of two columns, and `ColumnString` doesnt have a method to run a substring search. These countless routines are implemented outside of `IColumn`.
Various functions on columns can be implemented in a generic, non-efficient way using `IColumn` methods to extract `Field` values, or in a specialized way using knowledge of inner memory layout of data in a specific `IColumn` implementation. It is implemented by casting functions to a specific `IColumn` type and deal with internal representation directly. For example, `ColumnUInt64` has the `getData` method that returns a reference to an internal array, then a separate routine reads or fills that array directly. We have “leaky abstractions” to allow efficient specializations of various routines.
@ -35,7 +35,7 @@ Various functions on columns can be implemented in a generic, non-efficient way
`IDataType` and `IColumn` are only loosely related to each other. Different data types can be represented in memory by the same `IColumn` implementations. For example, `DataTypeUInt32` and `DataTypeDateTime` are both represented by `ColumnUInt32` or `ColumnConstUInt32`. In addition, the same data type can be represented by different `IColumn` implementations. For example, `DataTypeUInt8` can be represented by `ColumnUInt8` or `ColumnConstUInt8`.
`IDataType` only stores metadata. For instance, `DataTypeUInt8` doesn't store anything at all (except virtual pointer `vptr`) and `DataTypeFixedString` stores just `N` (the size of fixed-size strings).
`IDataType` only stores metadata. For instance, `DataTypeUInt8` doesnt store anything at all (except virtual pointer `vptr`) and `DataTypeFixedString` stores just `N` (the size of fixed-size strings).
`IDataType` has helper methods for various data formats. Examples are methods to serialize a value with possible quoting, to serialize a value for JSON, and to serialize a value as part of the XML format. There is no direct correspondence to data formats. For example, the different data formats `Pretty` and `TabSeparated` can use the same `serializeTextEscaped` helper method from the `IDataType` interface.
@ -120,9 +120,9 @@ There are ordinary functions and aggregate functions. For aggregate functions, s
Ordinary functions dont change the number of rows they work as if they are processing each row independently. In fact, functions are not called for individual rows, but for `Block`s of data to implement vectorized query execution.
There are some miscellaneous functions, like [blockSize](../sql-reference/functions/other-functions.md#function-blocksize), [rowNumberInBlock](../sql-reference/functions/other-functions.md#function-rownumberinblock), and [runningAccumulate](../sql-reference/functions/other-functions.md#runningaccumulatexploit block processing and violate the independence of rows.
There are some miscellaneous functions, like [blockSize](../sql-reference/functions/other-functions.md#function-blocksize), [rowNumberInBlock](../sql-reference/functions/other-functions.md#function-rownumberinblock), and \[runningAccumulate\](../sql-reference/functions/other-functions.md\#runningaccumulatexploit block processing and violate the independence of rows.
ClickHouse has strong typing, so theres no implicit type conversion. If a function doesn't support a specific combination of types, it throws an exception. But functions can work (be overloaded) for many different combinations of types. For example, the `plus` function (to implement the `+` operator) works for any combination of numeric types: `UInt8` + `Float32`, `UInt16` + `Int8`, and so on. Also, some variadic functions can accept any number of arguments, such as the `concat` function.
ClickHouse has strong typing, so theres no implicit type conversion. If a function doesnt support a specific combination of types, it throws an exception. But functions can work (be overloaded) for many different combinations of types. For example, the `plus` function (to implement the `+` operator) works for any combination of numeric types: `UInt8` + `Float32`, `UInt16` + `Int8`, and so on. Also, some variadic functions can accept any number of arguments, such as the `concat` function.
Implementing a function may be slightly inconvenient because a function explicitly dispatches supported data types and supported `IColumns`. For example, the `plus` function has code generated by instantiation of a C++ template for each combination of numeric types, and constant or non-constant left and right arguments.
@ -169,13 +169,13 @@ There is no global query plan for distributed query execution. Each node has its
`MergeTree` is a family of storage engines that supports indexing by primary key. The primary key can be an arbitrary tuple of columns or expressions. Data in a `MergeTree` table is stored in “parts”. Each part stores data in the primary key order, so data is ordered lexicographically by the primary key tuple. All the table columns are stored in separate `column.bin` files in these parts. The files consist of compressed blocks. Each block is usually from 64 KB to 1 MB of uncompressed data, depending on the average value size. The blocks consist of column values placed contiguously one after the other. Column values are in the same order for each column (the primary key defines the order), so when you iterate by many columns, you get values for the corresponding rows.
The primary key itself is “sparse”. It doesn't address every single row, but only some ranges of data. A separate `primary.idx` file has the value of the primary key for each N-th row, where N is called `index_granularity` (usually, N = 8192). Also, for each column, we have `column.mrk` files with “marks,” which are offsets to each N-th row in the data file. Each mark is a pair: the offset in the file to the beginning of the compressed block, and the offset in the decompressed block to the beginning of data. Usually, compressed blocks are aligned by marks, and the offset in the decompressed block is zero. Data for `primary.idx` always resides in memory, and data for `column.mrk` files is cached.
The primary key itself is “sparse”. It doesnt address every single row, but only some ranges of data. A separate `primary.idx` file has the value of the primary key for each N-th row, where N is called `index_granularity` (usually, N = 8192). Also, for each column, we have `column.mrk` files with “marks,” which are offsets to each N-th row in the data file. Each mark is a pair: the offset in the file to the beginning of the compressed block, and the offset in the decompressed block to the beginning of data. Usually, compressed blocks are aligned by marks, and the offset in the decompressed block is zero. Data for `primary.idx` always resides in memory, and data for `column.mrk` files is cached.
When we are going to read something from a part in `MergeTree`, we look at `primary.idx` data and locate ranges that could contain requested data, then look at `column.mrk` data and calculate offsets for where to start reading those ranges. Because of sparseness, excess data may be read. ClickHouse is not suitable for a high load of simple point queries, because the entire range with `index_granularity` rows must be read for each key, and the entire compressed block must be decompressed for each column. We made the index sparse because we must be able to maintain trillions of rows per single server without noticeable memory consumption for the index. Also, because the primary key is sparse, it is not unique: it cannot check the existence of the key in the table at INSERT time. You could have many rows with the same key in a table.
When you `INSERT` a bunch of data into `MergeTree`, that bunch is sorted by primary key order and forms a new part. There are background threads that periodically select some parts and merge them into a single sorted part to keep the number of parts relatively low. Thats why it is called `MergeTree`. Of course, merging leads to “write amplification”. All parts are immutable: they are only created and deleted, but not modified. When SELECT is executed, it holds a snapshot of the table (a set of parts). After merging, we also keep old parts for some time to make a recovery after failure easier, so if we see that some merged part is probably broken, we can replace it with its source parts.
`MergeTree` is not an LSM tree because it doesn't contain “memtable” and “log”: inserted data is written directly to the filesystem. This makes it suitable only to INSERT data in batches, not by individual row and not very frequently about once per second is ok, but a thousand times a second is not. We did it this way for simplicitys sake, and because we are already inserting data in batches in our applications.
`MergeTree` is not an LSM tree because it doesnt contain “memtable” and “log”: inserted data is written directly to the filesystem. This makes it suitable only to INSERT data in batches, not by individual row and not very frequently about once per second is ok, but a thousand times a second is not. We did it this way for simplicitys sake, and because we are already inserting data in batches in our applications.
> MergeTree tables can only have one (primary) index: there arent any secondary indices. It would be nice to allow multiple physical representations under one logical table, for example, to store data in more than one physical order or even to allow representations with pre-aggregated data along with original data.
@ -187,7 +187,7 @@ Replication in ClickHouse can be configured on a per-table basis. You could have
Replication is implemented in the `ReplicatedMergeTree` storage engine. The path in `ZooKeeper` is specified as a parameter for the storage engine. All tables with the same path in `ZooKeeper` become replicas of each other: they synchronize their data and maintain consistency. Replicas can be added and removed dynamically simply by creating or dropping a table.
Replication uses an asynchronous multi-master scheme. You can insert data into any replica that has a session with `ZooKeeper`, and data is replicated to all other replicas asynchronously. Because ClickHouse doesn't support UPDATEs, replication is conflict-free. As there is no quorum acknowledgment of inserts, just-inserted data might be lost if one node fails.
Replication uses an asynchronous multi-master scheme. You can insert data into any replica that has a session with `ZooKeeper`, and data is replicated to all other replicas asynchronously. Because ClickHouse doesnt support UPDATEs, replication is conflict-free. As there is no quorum acknowledgment of inserts, just-inserted data might be lost if one node fails.
Metadata for replication is stored in ZooKeeper. There is a replication log that lists what actions to do. Actions are: get part; merge parts; drop a partition, and so on. Each replica copies the replication log to its queue and then executes the actions from the queue. For example, on insertion, the “get the part” action is created in the log, and every replica downloads that part. Merges are coordinated between replicas to get byte-identical results. All parts are merged in the same way on all replicas. It is achieved by electing one replica as the leader, and that replica initiates merges and writes “merge parts” actions to the log.

View File

@ -5,11 +5,11 @@ toc_priority: 25
toc_title: hidden
---
# ClickHouse Engines
# ClickHouse Engines {#clickhouse-engines}
There are two key engine kinds in ClickHouse:
- [Table engines](table-engines/index.md)
- [Database engines](database-engines/index.md)
- [Table engines](../engines/table-engines/index.md)
- [Database engines](../engines/database-engines/index.md)
{## [Original article](https://clickhouse.tech/docs/en/engines/) ##}

View File

@ -3,14 +3,14 @@ toc_folder_title: Integrations
toc_priority: 30
---
# Table Engines for Integrations
# Table Engines for Integrations {#table-engines-for-integrations}
ClickHouse provides various means for integrating with external systems, including table engines. Like with all other table engines, the configuration is done using `CREATE TABLE` or `ALTER TABLE` queries. Then from a user perspective, the configured integration looks like a normal table, but queries to it are proxied to the external system. This transparent querying is one of the key advantages of this approach over alternative integration methods, like external dictionaries or table functions, which require to use custom query methods on each use.
List of supported integrations:
- [ODBC](odbc.md)
- [JDBC](jdbc.md)
- [MySQL](mysql.md)
- [HDFS](hdfs.md)
- [Kafka](kafka.md)
- [ODBC](../../../engines/table-engines/integrations/odbc.md)
- [JDBC](../../../engines/table-engines/integrations/jdbc.md)
- [MySQL](../../../engines/table-engines/integrations/mysql.md)
- [HDFS](../../../engines/table-engines/integrations/hdfs.md)
- [Kafka](../../../engines/table-engines/integrations/kafka.md)

View File

@ -18,7 +18,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
) ENGINE = MySQL('host:port', 'database', 'table', 'user', 'password'[, replace_query, 'on_duplicate_clause']);
```
See a detailed description of the [CREATE TABLE](../../../sql-reference/statements/create.md#create-table-query) query.
See a detailed description of the [CREATE TABLE](../../../sql-reference/statements/create/table.md#create-table-query) query.
The table structure can differ from the original MySQL table structure:

View File

@ -23,7 +23,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
ENGINE = ODBC(connection_settings, external_database, external_table)
```
See a detailed description of the [CREATE TABLE](../../../sql-reference/statements/create.md#create-table-query) query.
See a detailed description of the [CREATE TABLE](../../../sql-reference/statements/create/table.md#create-table-query) query.
The table structure can differ from the source table structure:

View File

@ -1,6 +1,45 @@
---
toc_folder_title: Log Family
toc_priority: 29
toc_title: Introduction
---
# Log Engine Family {#log-engine-family}
These engines were developed for scenarios when you need to quickly write many small tables (up to about 1 million rows) and read them later as a whole.
Engines of the family:
- [StripeLog](../../../engines/table-engines/log-family/stripelog.md)
- [Log](../../../engines/table-engines/log-family/log.md)
- [TinyLog](../../../engines/table-engines/log-family/tinylog.md)
## Common Properties {#common-properties}
Engines:
- Store data on a disk.
- Append data to the end of file when writing.
- Support locks for concurrent data access.
During `INSERT` queries, the table is locked, and other queries for reading and writing data both wait for the table to unlock. If there are no data writing queries, any number of data reading queries can be performed concurrently.
- Do not support [mutation](../../../sql-reference/statements/alter.md#alter-mutations) operations.
- Do not support indexes.
This means that `SELECT` queries for ranges of data are not efficient.
- Do not write data atomically.
You can get a table with corrupted data if something breaks the write operation, for example, abnormal server shutdown.
## Differences {#differences}
The `TinyLog` engine is the simplest in the family and provides the poorest functionality and lowest efficiency. The `TinyLog` engine doesnt support parallel data reading by several threads. It reads data slower than other engines in the family that support parallel reading and it uses almost as many descriptors as the `Log` engine because it stores each column in a separate file. Use it in simple low-load scenarios.
The `Log` and `StripeLog` engines support parallel data reading. When reading data, ClickHouse uses multiple threads. Each thread processes a separate data block. The `Log` engine uses a separate file for each column of the table. `StripeLog` stores all the data in one file. As a result, the `StripeLog` engine uses fewer descriptors in the operating system, but the `Log` engine provides higher efficiency when reading data.
[Original article](https://clickhouse.tech/docs/en/operations/table_engines/log_family/) <!--hide-->

View File

@ -1,44 +0,0 @@
---
toc_priority: 31
toc_title: Introduction
---
# Log Engine Family {#log-engine-family}
These engines were developed for scenarios when you need to quickly write many small tables (up to about 1 million rows) and read them later as a whole.
Engines of the family:
- [StripeLog](../../../engines/table-engines/log-family/stripelog.md)
- [Log](../../../engines/table-engines/log-family/log.md)
- [TinyLog](../../../engines/table-engines/log-family/tinylog.md)
## Common Properties {#common-properties}
Engines:
- Store data on a disk.
- Append data to the end of file when writing.
- Support locks for concurrent data access.
During `INSERT` queries, the table is locked, and other queries for reading and writing data both wait for the table to unlock. If there are no data writing queries, any number of data reading queries can be performed concurrently.
- Do not support [mutation](../../../sql-reference/statements/alter.md#alter-mutations) operations.
- Do not support indexes.
This means that `SELECT` queries for ranges of data are not efficient.
- Do not write data atomically.
You can get a table with corrupted data if something breaks the write operation, for example, abnormal server shutdown.
## Differences {#differences}
The `TinyLog` engine is the simplest in the family and provides the poorest functionality and lowest efficiency. The `TinyLog` engine doesnt support parallel data reading by several threads. It reads data slower than other engines in the family that support parallel reading and it uses almost as many descriptors as the `Log` engine because it stores each column in a separate file. Use it in simple low-load scenarios.
The `Log` and `StripeLog` engines support parallel data reading. When reading data, ClickHouse uses multiple threads. Each thread processes a separate data block. The `Log` engine uses a separate file for each column of the table. `StripeLog` stores all the data in one file. As a result, the `StripeLog` engine uses fewer descriptors in the operating system, but the `Log` engine provides higher efficiency when reading data.
[Original article](https://clickhouse.tech/docs/en/operations/table_engines/log_family/) <!--hide-->

View File

@ -5,7 +5,7 @@ toc_title: Log
# Log {#log}
Engine belongs to the family of log engines. See the common properties of log engines and their differences in the [Log Engine Family](../../../engines/table-engines/log-family/log-family.md) article.
Engine belongs to the family of log engines. See the common properties of log engines and their differences in the [Log Engine Family](../../../engines/table-engines/log-family/index.md) article.
Log differs from [TinyLog](../../../engines/table-engines/log-family/tinylog.md) in that a small file of “marks” resides with the column files. These marks are written on every data block and contain offsets that indicate where to start reading the file in order to skip the specified number of rows. This makes it possible to read table data in multiple threads.
For concurrent data access, the read operations can be performed simultaneously, while write operations block reads and each other.

View File

@ -5,7 +5,7 @@ toc_title: StripeLog
# Stripelog {#stripelog}
This engine belongs to the family of log engines. See the common properties of log engines and their differences in the [Log Engine Family](../../../engines/table-engines/log-family/log-family.md) article.
This engine belongs to the family of log engines. See the common properties of log engines and their differences in the [Log Engine Family](../../../engines/table-engines/log-family/index.md) article.
Use this engine in scenarios when you need to write many tables with a small amount of data (less than 1 million rows).
@ -20,7 +20,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
) ENGINE = StripeLog
```
See the detailed description of the [CREATE TABLE](../../../sql-reference/statements/create.md#create-table-query) query.
See the detailed description of the [CREATE TABLE](../../../sql-reference/statements/create/table.md#create-table-query) query.
## Writing the Data {#table_engines-stripelog-writing-the-data}

View File

@ -5,7 +5,7 @@ toc_title: TinyLog
# TinyLog {#tinylog}
The engine belongs to the log engine family. See [Log Engine Family](../../../engines/table-engines/log-family/log-family.md) for common properties of log engines and their differences.
The engine belongs to the log engine family. See [Log Engine Family](../../../engines/table-engines/log-family/index.md) for common properties of log engines and their differences.
This table engine is typically used with the write-once method: write data one time, then read it as many times as necessary. For example, you can use `TinyLog`-type tables for intermediary data that is processed in small batches. Note that storing data in a large number of small tables is inefficient.

View File

@ -32,7 +32,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
[SETTINGS name=value, ...]
```
For a description of request parameters, see [request description](../../../sql-reference/statements/create.md).
For a description of request parameters, see [request description](../../../sql-reference/statements/create/table.md).
**Query clauses**

View File

@ -26,7 +26,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
[SETTINGS name=value, ...]
```
For a description of query parameters, see [query description](../../../sql-reference/statements/create.md).
For a description of query parameters, see [query description](../../../sql-reference/statements/create/table.md).
**CollapsingMergeTree Parameters**

View File

@ -28,7 +28,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
[SETTINGS name=value, ...]
```
See a detailed description of the [CREATE TABLE](../../../sql-reference/statements/create.md#create-table-query) query.
See a detailed description of the [CREATE TABLE](../../../sql-reference/statements/create/table.md#create-table-query) query.
A table for the Graphite data should have the following columns for the following data:

View File

@ -1,6 +1,17 @@
---
toc_folder_title: MergeTree Family
toc_priority: 28
toc_title: Introduction
---
# MergeTree Engine Family
Table engines from the MergeTree family are the core of ClickHouse data storage capabilities. They provide most features for resilience and high-performance data retrieval: columnar storage, custom partitioning, sparse primary index, secondary data-skipping indexes, etc.
Base [MergeTree](mergetree.md) table engine can be considered the default table engine for single-node ClickHouse instances because it is versatile and practical for a wide range of use cases.
For production usage [ReplicatedMergeTree](replication.md) is the way to go, because it adds high-availability to all features of regular MergeTree engine. A bonus is automatic data deduplication on data ingestion, so the software can safely retry if there was some network issue during insert.
All other engines of MergeTree family add extra functionality for some specific use cases. Usually, it's implemented as additional data manipulation in background.
The main downside of MergeTree engines is that they are rather heavy-weight. So the typical pattern is to have not so many of them. If you need many small tables, for example for temporary data, consider [Log engine family](../../../engines/table-engines/log-family/index.md).

View File

@ -49,7 +49,7 @@ ORDER BY expr
[SETTINGS name=value, ...]
```
For a description of parameters, see the [CREATE query description](../../../sql-reference/statements/create.md).
For a description of parameters, see the [CREATE query description](../../../sql-reference/statements/create/table.md).
### Query Clauses {#mergetree-query-clauses}

View File

@ -27,7 +27,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
[SETTINGS name=value, ...]
```
For a description of request parameters, see [request description](../../../sql-reference/statements/create.md).
For a description of request parameters, see [statement description](../../../sql-reference/statements/create/table.md).
**ReplacingMergeTree Parameters**

View File

@ -24,7 +24,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
[SETTINGS name=value, ...]
```
For a description of request parameters, see [request description](../../../sql-reference/statements/create.md).
For a description of request parameters, see [request description](../../../sql-reference/statements/create/table.md).
**Parameters of SummingMergeTree**

View File

@ -29,7 +29,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
[SETTINGS name=value, ...]
```
For a description of query parameters, see the [query description](../../../sql-reference/statements/create.md).
For a description of query parameters, see the [query description](../../../sql-reference/statements/create/table.md).
**Engine Parameters**

View File

@ -3,7 +3,7 @@ toc_folder_title: Special
toc_priority: 31
---
# Special Table Engines
# Special Table Engines {#special-table-engines}
There are three main categories of table engines:

View File

@ -20,7 +20,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
) ENGINE = Join(join_strictness, join_type, k1[, k2, ...])
```
See the detailed description of the [CREATE TABLE](../../../sql-reference/statements/create.md#create-table-query) query.
See the detailed description of the [CREATE TABLE](../../../sql-reference/statements/create/table.md#create-table-query) query.
**Engine Parameters**

View File

@ -5,6 +5,6 @@ toc_title: MaterializedView
# MaterializedView Table Engine {#materializedview}
Used for implementing materialized views (for more information, see [CREATE TABLE](../../../sql-reference/statements/create.md)). For storing data, it uses a different engine that was specified when creating the view. When reading from a table, it just uses that engine.
Used for implementing materialized views (for more information, see [CREATE TABLE](../../../sql-reference/statements/create/table.md)). For storing data, it uses a different engine that was specified when creating the view. When reading from a table, it just uses that engine.
[Original article](https://clickhouse.tech/docs/en/operations/table_engines/materializedview/) <!--hide-->

View File

@ -17,7 +17,7 @@ Indexes
: ClickHouse keeps data structures in memory that allows reading not only used columns but only necessary row ranges of those columns.
Data compression
: Storing different values of the same column together often leads to better compression ratios (compared to row-oriented systems) because in real data column often has the same or not so many different values for neighboring rows. In addition to general-purpose compression, ClickHouse supports [specialized codecs](../../sql-reference/statements/create.md#create-query-specialized-codecs) that can make data even more compact.
: Storing different values of the same column together often leads to better compression ratios (compared to row-oriented systems) because in real data column often has the same or not so many different values for neighboring rows. In addition to general-purpose compression, ClickHouse supports [specialized codecs](../../sql-reference/statements/create/table.md#create-query-specialized-codecs) that can make data even more compact.
Vectorized query execution
: ClickHouse not only stores data in columns but also processes data in columns. It leads to better CPU cache utilization and allows for [SIMD](https://en.wikipedia.org/wiki/SIMD) CPU instructions usage.

View File

@ -8,8 +8,8 @@ toc_priority: 101
ClickHouse is a generic data storage solution for [OLAP](../../faq/general/olap.md) workloads, while there are many specialized time-series database management systems. Nevertheless, ClickHouses [focus on query execution speed](../../faq/general/why-clickhouse-is-so-fast.md) allows it to outperform specialized systems in many cases. There are many independent benchmarks on this topic out there ([example](https://medium.com/@AltinityDB/clickhouse-for-time-series-scalability-benchmarks-e181132a895b)), so were not going to conduct one here. Instead, lets focus on ClickHouse features that are important to use if thats your use case.
First of all, there are **[specialized codecs](../../sql-reference/statements/create.md#create-query-specialized-codecs)** which make typical time-series. Either common algorithms like `DoubleDelta` and `Gorilla` or specific to ClickHouse like `T64`.
First of all, there are **[specialized codecs](../../sql-reference/statements/create/table.md#create-query-specialized-codecs)** which make typical time-series. Either common algorithms like `DoubleDelta` and `Gorilla` or specific to ClickHouse like `T64`.
Second, time-series queries often hit only recent data, like one day or one week old. It makes sense to use servers that have both fast nVME/SSD drives and high-capacity HDD drives. ClickHouse [TTL](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-multiple-volumes) feature allows to configure keeping fresh hot data on fast drives and gradually move it to slower drives as it ages. Rollup or removal of even older data is also possible if your requirements demand it.
Even though its against ClickHouse philosophy of storing and processing raw data, you can use [materialized views](../../sql-reference/statements/create.md#create-view) to fit into even tighter latency or costs requirements.
Even though its against ClickHouse philosophy of storing and processing raw data, you can use [materialized views](../../sql-reference/statements/create/view.md) to fit into even tighter latency or costs requirements.

View File

@ -98,9 +98,9 @@ To run ClickHouse inside Docker follow the guide on [Docker Hub](https://hub.doc
For non-Linux operating systems and for AArch64 CPU arhitecture, ClickHouse builds are provided as a cross-compiled binary from the latest commit of the `master` branch (with a few hours delay).
- [macOS](https://builds.clickhouse.tech/master/macos/clickhouse) — `curl -O 'https://builds.clickhouse.tech/master/macos/clickhouse' && chmod a+x ./clickhouse`
- [FreeBSD](https://builds.clickhouse.tech/master/freebsd/clickhouse) — `curl -O 'https://builds.clickhouse.tech/master/freebsd/clickhouse' && chmod a+x ./clickhouse`
- [AArch64](https://builds.clickhouse.tech/master/aarch64/clickhouse) — `curl -O 'https://builds.clickhouse.tech/master/aarch64/clickhouse' && chmod a+x ./clickhouse`
- [macOS](https://builds.clickhouse.tech/master/macos/clickhouse) — `curl -O 'https://builds.clickhouse.tech/master/macos/clickhouse' && chmod a+x ./clickhouse`
- [FreeBSD](https://builds.clickhouse.tech/master/freebsd/clickhouse) — `curl -O 'https://builds.clickhouse.tech/master/freebsd/clickhouse' && chmod a+x ./clickhouse`
- [AArch64](https://builds.clickhouse.tech/master/aarch64/clickhouse) — `curl -O 'https://builds.clickhouse.tech/master/aarch64/clickhouse' && chmod a+x ./clickhouse`
After downloading, you can use the `clickhouse client` to connect to the server, or `clickhouse local` to process local data. To run `clickhouse server`, you have to additionally download [server](https://github.com/ClickHouse/ClickHouse/blob/master/programs/server/config.xml) and [users](https://github.com/ClickHouse/ClickHouse/blob/master/programs/server/users.xml) configuration files from GitHub.

View File

@ -99,7 +99,7 @@ As in most databases management systems, ClickHouse logically groups tables into
clickhouse-client --query "CREATE DATABASE IF NOT EXISTS tutorial"
```
Syntax for creating tables is way more complicated compared to databases (see [reference](../sql-reference/statements/create.md). In general `CREATE TABLE` statement has to specify three key things:
Syntax for creating tables is way more complicated compared to databases (see [reference](../sql-reference/statements/create/table.md). In general `CREATE TABLE` statement has to specify three key things:
1. Name of table to create.
2. Table schema, i.e. list of columns and their [data types](../sql-reference/data-types/index.md).

View File

@ -979,7 +979,7 @@ message MessageType {
}
```
are not applied; the [table defaults](../sql-reference/statements/create.md#create-default-values) are used instead of them.
are not applied; the [table defaults](../sql-reference/statements/create/table.md#create-default-values) are used instead of them.
ClickHouse inputs and outputs protobuf messages in the `length-delimited` format.
It means before every message should be written its length as a [varint](https://developers.google.com/protocol-buffers/docs/encoding#varints).

View File

@ -367,34 +367,34 @@ $ curl -v 'http://localhost:8123/predefined_query'
As you can see from the example if `http_handlers` is configured in the config.xml file and `http_handlers` can contain many `rules`. ClickHouse will match the HTTP requests received to the predefined type in `rule` and the first matched runs the handler. Then ClickHouse will execute the corresponding predefined query if the match is successful.
Now `rule` can configure `method`, `headers`, `url`, `handler`:
- `method` is responsible for matching the method part of the HTTP request. `method` fully conforms to the definition of [method](https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods) in the HTTP protocol. It is an optional configuration. If it is not defined in the configuration file, it does not match the method portion of the HTTP request.
- `method` is responsible for matching the method part of the HTTP request. `method` fully conforms to the definition of [method](https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods) in the HTTP protocol. It is an optional configuration. If it is not defined in the configuration file, it does not match the method portion of the HTTP request.
- `url` is responsible for matching the URL part of the HTTP request. It is compatible with [RE2](https://github.com/google/re2)s regular expressions. It is an optional configuration. If it is not defined in the configuration file, it does not match the URL portion of the HTTP request.
- `url` is responsible for matching the URL part of the HTTP request. It is compatible with [RE2](https://github.com/google/re2)s regular expressions. It is an optional configuration. If it is not defined in the configuration file, it does not match the URL portion of the HTTP request.
- `headers` are responsible for matching the header part of the HTTP request. It is compatible with RE2s regular expressions. It is an optional configuration. If it is not defined in the configuration file, it does not match the header portion of the HTTP request.
- `headers` are responsible for matching the header part of the HTTP request. It is compatible with RE2s regular expressions. It is an optional configuration. If it is not defined in the configuration file, it does not match the header portion of the HTTP request.
- `handler` contains the main processing part. Now `handler` can configure `type`, `status`, `content_type`, `response_content`, `query`, `query_param_name`.
`type` currently supports three types: [predefined\_query\_handler](#predefined_query_handler), [dynamic\_query\_handler](#dynamic_query_handler), [static](#static).
- `handler` contains the main processing part. Now `handler` can configure `type`, `status`, `content_type`, `response_content`, `query`, `query_param_name`.
`type` currently supports three types: [predefined_query_handler](#predefined_query_handler), [dynamic_query_handler](#dynamic_query_handler), [static](#static).
- `query` — use with `predefined_query_handler` type, executes query when the handler is called.
- `query_param_name` — use with `dynamic_query_handler` type, extracts and executes the value corresponding to the `query_param_name` value in HTTP request params.
- `status` — use with `static` type, response status code.
- `content_type` — use with `static` type, response [content-type](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type).
- `response_content` — use with `static` type, response content sent to client, when using the prefix file:// or config://, find the content from the file or configuration sends to client.
Next are the configuration methods for different `type`.
### predefined_query_handler {#predefined_query_handler}
### predefined\_query\_handler {#predefined_query_handler}
`predefined_query_handler` supports setting `Settings` and `query_params` values. You can configure `query` in the type of `predefined_query_handler`.
`query` value is a predefined query of `predefined_query_handler`, which is executed by ClickHouse when an HTTP request is matched and the result of the query is returned. It is a must configuration.
The following example defines the values of [max_threads](../operations/settings/settings.md#settings-max_threads) and `max_alter_threads` settings, then queries the system table to check whether these settings were set successfully.
The following example defines the values of [max\_threads](../operations/settings/settings.md#settings-max_threads) and `max_alter_threads` settings, then queries the system table to check whether these settings were set successfully.
Example:
@ -425,13 +425,13 @@ max_alter_threads 2
!!! note "caution"
In one `predefined_query_handler` only supports one `query` of an insert type.
### dynamic_query_handler {#dynamic_query_handler}
### dynamic\_query\_handler {#dynamic_query_handler}
In `dynamic_query_handler`, the query is written in the form of param of the HTTP request. The difference is that in `predefined_query_handler`, the query is written in the configuration file. You can configure `query_param_name` in `dynamic_query_handler`.
ClickHouse extracts and executes the value corresponding to the `query_param_name` value in the URL of the HTTP request. The default value of `query_param_name` is `/query` . It is an optional configuration. If there is no definition in the configuration file, the param is not passed in.
To experiment with this functionality, the example defines the values of [max_threads](../operations/settings/settings.md#settings-max_threads) and `max_alter_threads` and `queries` whether the settings were set successfully.
To experiment with this functionality, the example defines the values of [max\_threads](../operations/settings/settings.md#settings-max_threads) and `max_alter_threads` and `queries` whether the settings were set successfully.
Example:
@ -456,7 +456,7 @@ max_alter_threads 2
### static {#static}
`static` can return [content_type](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type), [status](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status) and `response_content`. `response_content` can return the specified content.
`static` can return [content\_type](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type), [status](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status) and `response_content`. `response_content` can return the specified content.
Example:

View File

@ -3,15 +3,14 @@ toc_folder_title: Third-Party
toc_priority: 24
---
# Third-Party Interfaces
# Third-Party Interfaces {#third-party-interfaces}
This is a collection of links to third-party tools that provide some sort of interface to ClickHouse. It can be either visual interface, command-line interface or an API:
- [Client libraries](client-libraries.md)
- [Integrations](integrations.md)
- [GUI](gui.md)
- [Proxies](proxy.md)
- [Client libraries](../../interfaces/third-party/client-libraries.md)
- [Integrations](../../interfaces/third-party/integrations.md)
- [GUI](../../interfaces/third-party/gui.md)
- [Proxies](../../interfaces/third-party/proxy.md)
!!! note "Note"
Generic tools that support common API like [ODBC](../../interfaces/odbc.md) or [JDBC](../../interfaces/jdbc.md) usually can work with ClickHouse as well, but are not listed here because there are way too many of them.

View File

@ -104,6 +104,5 @@ toc_title: Integrations
- [ActiveRecord](https://github.com/PNixx/clickhouse-activerecord)
- [GraphQL](https://github.com/graphql)
- [activecube-graphql](https://github.com/bitquery/activecube-graphql)
[Original article](https://clickhouse.tech/docs/en/interfaces/third-party/integrations/) <!--hide-->

View File

@ -50,7 +50,7 @@ toc_title: Adopters
| <a href="http://www.pragma-innovation.fr/" class="favicon">Pragma Innovation</a> | Telemetry and Big Data Analysis | Main product | — | — | [Slides in English, October 2018](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup18/4_pragma_innovation.pdf) |
| <a href="https://www.qingcloud.com/" class="favicon">QINGCLOUD</a> | Cloud services | Main product | — | — | [Slides in Chinese, October 2018](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup19/4.%20Cloud%20%2B%20TSDB%20for%20ClickHouse%20张健%20QingCloud.pdf) |
| <a href="https://qrator.net" class="favicon">Qrator</a> | DDoS protection | Main product | — | — | [Blog Post, March 2019](https://blog.qrator.net/en/clickhouse-ddos-mitigation_37/) |
| <a href="https://www.percent.cn/" class="favicon">Percent 百分点</a> | Analytics | Main Product | — | — | [Slides in Chinese, June 2019](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup24/4.%20ClickHouse万亿数据双中心的设计与实践%20.pdf) |
| <a href="https://www.percent.cn/" class="favicon">Percent 百分点</a> | Analytics | Main Product | — | — | [Slides in Chinese, June 2019](https://github.com/ClickHouse/clickhouse-presentations/blob/master/meetup24/4.%20ClickHouse万亿数据双中心的设计与实践%20.pdf) |
| <a href="https://plausible.io/" class="favicon">Plausible</a> | Analytics | Main Product | — | — | [Blog post, June 2020](https://twitter.com/PlausibleHQ/status/1273889629087969280) |
| <a href="https://rambler.ru" class="favicon">Rambler</a> | Internet services | Analytics | — | — | [Talk in Russian, April 2018](https://medium.com/@ramblertop/разработка-api-clickhouse-для-рамблер-топ-100-f4c7e56f3141) |
| <a href="https://www.tencent.com" class="favicon">Tencent</a> | Messaging | Logging | — | — | [Talk in Chinese, November 2019](https://youtu.be/T-iVQRuw-QY?t=5050) |

View File

@ -17,11 +17,11 @@ Its also worth noting that ClickHouse is a database management system, not a
Some column-oriented DBMSs do not use data compression. However, data compression does play a key role in achieving excellent performance.
In addition to efficient general-purpose compression codecs with different trade-offs between disk space and CPU consumption, ClickHouse provides [specialized codecs](../sql-reference/statements/create.md#create-query-specialized-codecs) for specific kinds of data, which allow ClickHouse to compete with and outperform more niche databases, like time-series ones.
In addition to efficient general-purpose compression codecs with different trade-offs between disk space and CPU consumption, ClickHouse provides [specialized codecs](../sql-reference/statements/create/table.md#create-query-specialized-codecs) for specific kinds of data, which allow ClickHouse to compete with and outperform more niche databases, like time-series ones.
## Disk Storage of Data {#disk-storage-of-data}
Keeping data physically sorted by primary key makes it possible to extract data for its specific values or value ranges with low latency, less than a few dozen milliseconds. Some column-oriented DBMSs (such as SAP HANA and Google PowerDrill) can only work in RAM. This approach encourages the allocation of a larger hardware budget than is necessary for real-time analysis.
Keeping data physically sorted by primary key makes it possible to extract data for its specific values or value ranges with low latency, less than a few dozen milliseconds. Some column-oriented DBMSs (such as SAP HANA and Google PowerDrill) can only work in RAM. This approach encourages the allocation of a larger hardware budget than is necessary for real-time analysis.
ClickHouse is designed to work on regular hard drives, which means the cost per GB of data storage is low, but SSD and additional RAM are also fully used if available.
@ -57,11 +57,11 @@ Having a data physically sorted by primary key makes it possible to extract data
## Secondary Indexes {#secondary-indexes}
Unlike other database management systems, secondary indexes in ClickHouse does not point to specific rows or row ranges. Instead, they allow the database to know in advance that all rows in some data parts wouldn't match the query filtering conditions and do not read them at all, thus they are called [data skipping indexes](../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-data_skipping-indexes).
Unlike other database management systems, secondary indexes in ClickHouse does not point to specific rows or row ranges. Instead, they allow the database to know in advance that all rows in some data parts wouldnt match the query filtering conditions and do not read them at all, thus they are called [data skipping indexes](../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-data_skipping-indexes).
## Suitable for Online Queries {#suitable-for-online-queries}
Most OLAP database management systems don't aim for online queries with sub-second latencies. In alternative systems, report building time of tens of seconds or even minutes is often considered acceptable. Sometimes it takes even more which forces to prepare reports offline (in advance or by responding with "come back later").
Most OLAP database management systems dont aim for online queries with sub-second latencies. In alternative systems, report building time of tens of seconds or even minutes is often considered acceptable. Sometimes it takes even more which forces to prepare reports offline (in advance or by responding with “come back later”).
In ClickHouse low latency means that queries can be processed without delay and without trying to prepare an answer in advance, right at the same moment while the user interface page is loading. In other words, online.
@ -73,9 +73,9 @@ ClickHouse provides various ways to trade accuracy for performance:
2. Running a query based on a part (sample) of data and getting an approximated result. In this case, proportionally less data is retrieved from the disk.
3. Running an aggregation for a limited number of random keys, instead of for all keys. Under certain conditions for key distribution in the data, this provides a reasonably accurate result while using fewer resources.
## Adaptive Join Algorithm
## Adaptive Join Algorithm {#adaptive-join-algorithm}
ClickHouse adaptively chooses how to [JOIN](../sql-reference/statements/select/join.md) multiple tables, by preferring hash-join algorithm and falling back to the merge-join algorithm if there's more than one large table.
ClickHouse adaptively chooses how to [JOIN](../sql-reference/statements/select/join.md) multiple tables, by preferring hash-join algorithm and falling back to the merge-join algorithm if theres more than one large table.
## Data Replication and Data Integrity Support {#data-replication-and-data-integrity-support}
@ -83,9 +83,9 @@ ClickHouse uses asynchronous multi-master replication. After being written to an
For more information, see the section [Data replication](../engines/table-engines/mergetree-family/replication.md).
## Role-Based Access Control
## Role-Based Access Control {#role-based-access-control}
ClickHouse implements user account management using SQL queries and allows for [role-based access control configuration](../operations/access-rights.md) similar to what can be found in ANSI SQL standard and popular relational database management systems.
ClickHouse implements user account management using SQL queries and allows for [role-based access control configuration](../operations/access-rights.md) similar to what can be found in ANSI SQL standard and popular relational database management systems.
## Features that Can Be Considered Disadvantages {#clickhouse-features-that-can-be-considered-disadvantages}

View File

@ -58,7 +58,7 @@ Privileges can be granted to a user account by the [GRANT](../sql-reference/stat
Management queries:
- [CREATE USER](../sql-reference/statements/create.md#create-user-statement)
- [CREATE USER](../sql-reference/statements/create/user.md)
- [ALTER USER](../sql-reference/statements/alter.md#alter-user-statement)
- [DROP USER](../sql-reference/statements/misc.md#drop-user-statement)
- [SHOW CREATE USER](../sql-reference/statements/show.md#show-create-user-statement)
@ -84,7 +84,7 @@ Role contains:
Management queries:
- [CREATE ROLE](../sql-reference/statements/create.md#create-role-statement)
- [CREATE ROLE](../sql-reference/statements/create/role.md)
- [ALTER ROLE](../sql-reference/statements/alter.md#alter-role-statement)
- [DROP ROLE](../sql-reference/statements/misc.md#drop-role-statement)
- [SET ROLE](../sql-reference/statements/misc.md#set-role-statement)
@ -99,7 +99,7 @@ Row policy is a filter that defines which of the rows are available to a user or
Management queries:
- [CREATE ROW POLICY](../sql-reference/statements/create.md#create-row-policy-statement)
- [CREATE ROW POLICY](../sql-reference/statements/create/row-policy.md)
- [ALTER ROW POLICY](../sql-reference/statements/alter.md#alter-row-policy-statement)
- [DROP ROW POLICY](../sql-reference/statements/misc.md#drop-row-policy-statement)
- [SHOW CREATE ROW POLICY](../sql-reference/statements/show.md#show-create-row-policy-statement)
@ -110,7 +110,7 @@ Settings profile is a collection of [settings](../operations/settings/index.md).
Management queries:
- [CREATE SETTINGS PROFILE](../sql-reference/statements/create.md#create-settings-profile-statement)
- [CREATE SETTINGS PROFILE](../sql-reference/statements/create/settings-profile.md#create-settings-profile-statement)
- [ALTER SETTINGS PROFILE](../sql-reference/statements/alter.md#alter-settings-profile-statement)
- [DROP SETTINGS PROFILE](../sql-reference/statements/misc.md#drop-settings-profile-statement)
- [SHOW CREATE SETTINGS PROFILE](../sql-reference/statements/show.md#show-create-settings-profile-statement)
@ -123,7 +123,7 @@ Quota contains a set of limits for some durations, as well as a list of roles an
Management queries:
- [CREATE QUOTA](../sql-reference/statements/create.md#create-quota-statement)
- [CREATE QUOTA](../sql-reference/statements/create/quota.md)
- [ALTER QUOTA](../sql-reference/statements/alter.md#alter-quota-statement)
- [DROP QUOTA](../sql-reference/statements/misc.md#drop-quota-statement)
- [SHOW CREATE QUOTA](../sql-reference/statements/show.md#show-create-quota-statement)

View File

@ -1,6 +1,9 @@
---
toc_folder_title: Optimizing Performance
toc_priority: 52
toc_hidden: true
---
# Optimizing Performance
- [Sampling query profiler](sampling-query-profiler.md)

View File

@ -348,7 +348,7 @@ Keys:
Default value: `LOG_USER` if `address` is specified, `LOG_DAEMON otherwise.`
- format Message format. Possible values: `bsd` and `syslog.`
## send_crash_reports {#server_configuration_parameters-logger}
## send\_crash\_reports {#server_configuration_parameters-logger}
Settings for opt-in sending crash reports to the ClickHouse core developers team via [Sentry](https://sentry.io).
Enabling it, especially in pre-production environments, is greatly appreciated.
@ -398,8 +398,7 @@ The cache is shared for the server and memory is allocated as needed. The cache
<mark_cache_size>5368709120</mark_cache_size>
```
## max_server_memory_usage {#max_server_memory_usage}
## max\_server\_memory\_usage {#max_server_memory_usage}
Limits total RAM usage by the ClickHouse server. You can specify it only for the default profile.
@ -416,8 +415,7 @@ On hosts with low RAM and swap, you possibly need setting `max_server_memory_usa
**See also**
- [max_memory_usage](../settings/query-complexity.md#settings_max_memory_usage)
- [max\_memory\_usage](../../operations/settings/query-complexity.md#settings_max_memory_usage)
## max\_concurrent\_queries {#max-concurrent-queries}

View File

@ -36,7 +36,7 @@ Memory usage is not monitored for the states of certain aggregate functions.
Memory usage is not fully tracked for states of the aggregate functions `min`, `max`, `any`, `anyLast`, `argMin`, `argMax` from `String` and `Array` arguments.
Memory consumption is also restricted by the parameters `max_memory_usage_for_user` and [max_server_memory_usage](../server-configuration-parameters/settings.md#max_server_memory_usage).
Memory consumption is also restricted by the parameters `max_memory_usage_for_user` and [max\_server\_memory\_usage](../../operations/server-configuration-parameters/settings.md#max_server_memory_usage).
## max\_memory\_usage\_for\_user {#max-memory-usage-for-user}
@ -46,7 +46,6 @@ Default values are defined in [Settings.h](https://github.com/ClickHouse/ClickHo
See also the description of [max\_memory\_usage](#settings_max_memory_usage).
## max\_rows\_to\_read {#max-rows-to-read}
The following restrictions can be checked on each block (instead of on each row). That is, the restrictions can be broken a little.

View File

@ -129,7 +129,7 @@ Default value: 0.
## max\_http\_get\_redirects {#setting-max_http_get_redirects}
Limits the maximum number of HTTP GET redirect hops for [URL](../../engines/table-engines/special/url.md)-engine tables. The setting applies to both types of tables: those created by the [CREATE TABLE](../../sql-reference/statements/create.md#create-table-query) query and by the [url](../../sql-reference/table-functions/url.md) table function.
Limits the maximum number of HTTP GET redirect hops for [URL](../../engines/table-engines/special/url.md)-engine tables. The setting applies to both types of tables: those created by the [CREATE TABLE](../../sql-reference/statements/create/table.md) query and by the [url](../../sql-reference/table-functions/url.md) table function.
Possible values:
@ -733,8 +733,8 @@ Limits maximum recursion depth in the recursive descent parser. Allows to contro
Possible values:
- Positive integer.
- 0 — Recursion depth is unlimited.
- Positive integer.
- 0 — Recursion depth is unlimited.
Default value: 1000.
@ -1427,20 +1427,20 @@ Possible values:
Default value: 16.
## always_fetch_merged_part {#always_fetch_merged_part}
## always\_fetch\_merged\_part {#always_fetch_merged_part}
Prohibits data parts merging in [Replicated*MergeTree](../../engines/table-engines/mergetree-family/replication.md)-engine tables.
Prohibits data parts merging in [Replicated\*MergeTree](../../engines/table-engines/mergetree-family/replication.md)-engine tables.
When merging is prohibited, the replica never merges parts and always downloads merged parts from other replicas. If there is no required data yet, the replica waits for it. CPU and disk load on the replica server decreases, but the network load on cluster increases. This setting can be useful on servers with relatively weak CPUs or slow disks, such as servers for backups storage.
Possible values:
- 0 — `Replicated*MergeTree`-engine tables merge data parts at the replica.
- 1 — `Replicated*MergeTree`-engine tables don't merge data parts at the replica. The tables download merged data parts from other replicas.
- 1 — `Replicated*MergeTree`-engine tables dont merge data parts at the replica. The tables download merged data parts from other replicas.
Default value: 0.
**See Also**
**See Also**
- [Data Replication](../../engines/table-engines/mergetree-family/replication.md)
@ -1454,11 +1454,11 @@ Possible values:
Default value: 16.
## transform_null_in {#transform_null_in}
## transform\_null\_in {#transform_null_in}
Enables equality of [NULL](../../sql-reference/syntax.md#null-literal) values for [IN](../../sql-reference/operators/in.md) operator.
By default, `NULL` values can't be compared because `NULL` means undefined value. Thus, comparison `expr = NULL` must always return `false`. With this setting `NULL = NULL` returns `true` for `IN` operator.
By default, `NULL` values cant be compared because `NULL` means undefined value. Thus, comparison `expr = NULL` must always return `false`. With this setting `NULL = NULL` returns `true` for `IN` operator.
Possible values:
@ -1467,11 +1467,11 @@ Possible values:
Default value: 0.
**Example**
**Example**
Consider the `null_in` table:
```text
``` text
┌──idx─┬─────i─┐
│ 1 │ 1 │
│ 2 │ NULL │
@ -1481,13 +1481,13 @@ Consider the `null_in` table:
Query:
```sql
``` sql
SELECT idx, i FROM null_in WHERE i IN (1, NULL) SETTINGS transform_null_in = 0;
```
Result:
```text
``` text
┌──idx─┬────i─┐
│ 1 │ 1 │
└──────┴──────┘
@ -1495,24 +1495,23 @@ Result:
Query:
```sql
``` sql
SELECT idx, i FROM null_in WHERE i IN (1, NULL) SETTINGS transform_null_in = 1;
```
Result:
```text
``` text
┌──idx─┬─────i─┐
│ 1 │ 1 │
│ 2 │ NULL │
└──────┴───────┘
```
**See Also**
**See Also**
- [NULL Processing in IN Operators](../../sql-reference/operators/in.md#in-null-processing)
## low\_cardinality\_max\_dictionary\_size {#low_cardinality_max_dictionary_size}
Sets a maximum size in rows of a shared global dictionary for the [LowCardinality](../../sql-reference/data-types/lowcardinality.md) data type that can be written to a storage file system. This setting prevents issues with RAM in case of unlimited dictionary growth. All the data that cant be encoded due to maximum dictionary size limitation ClickHouse writes in an ordinary method.
@ -1570,9 +1569,9 @@ Possible values:
Default value: 0.
## min_insert_block_size_rows_for_materialized_views {#min-insert-block-size-rows-for-materialized-views}
## min\_insert\_block\_size\_rows\_for\_materialized\_views {#min-insert-block-size-rows-for-materialized-views}
Sets minimum number of rows in block which can be inserted into a table by an `INSERT` query. Smaller-sized blocks are squashed into bigger ones. This setting is applied only for blocks inserted into [materialized view](../../sql-reference/statements/create.md#create-view). By adjusting this setting, you control blocks squashing while pushing to materialized view and avoid excessive memory usage.
Sets minimum number of rows in block which can be inserted into a table by an `INSERT` query. Smaller-sized blocks are squashed into bigger ones. This setting is applied only for blocks inserted into [materialized view](../../sql-reference/statements/create/view.md). By adjusting this setting, you control blocks squashing while pushing to materialized view and avoid excessive memory usage.
Possible values:
@ -1583,11 +1582,11 @@ Default value: 1048576.
**See Also**
- [min_insert_block_size_rows](#min-insert-block-size-rows)
- [min\_insert\_block\_size\_rows](#min-insert-block-size-rows)
## min_insert_block_size_bytes_for_materialized_views {#min-insert-block-size-bytes-for-materialized-views}
## min\_insert\_block\_size\_bytes\_for\_materialized\_views {#min-insert-block-size-bytes-for-materialized-views}
Sets minimum number of bytes in block which can be inserted into a table by an `INSERT` query. Smaller-sized blocks are squashed into bigger ones. This setting is applied only for blocks inserted into [materialized view](../../sql-reference/statements/create.md#create-view). By adjusting this setting, you control blocks squashing while pushing to materialized view and avoid excessive memory usage.
Sets minimum number of bytes in block which can be inserted into a table by an `INSERT` query. Smaller-sized blocks are squashed into bigger ones. This setting is applied only for blocks inserted into [materialized view](../../sql-reference/statements/create/view.md). By adjusting this setting, you control blocks squashing while pushing to materialized view and avoid excessive memory usage.
Possible values:
@ -1598,6 +1597,6 @@ Default value: 268435456.
**See also**
- [min_insert_block_size_bytes](#min-insert-block-size-bytes)
- [min\_insert\_block\_size\_bytes](#min-insert-block-size-bytes)
[Original article](https://clickhouse.tech/docs/en/operations/settings/settings/) <!-- hide -->

View File

@ -1,3 +1,3 @@
## system.asynchronous\_metric\_log {#system-tables-async-log}
Contains the historical values for `system.asynchronous_log` (see [system.asynchronous\_metrics](asynchronous_metrics.md#system_tables-asynchronous_metrics))
Contains the historical values for `system.asynchronous_log` (see [system.asynchronous\_metrics](../../operations/system-tables/asynchronous_metrics.md#system_tables-asynchronous_metrics))

View File

@ -31,6 +31,6 @@ SELECT * FROM system.asynchronous_metrics LIMIT 10
**See Also**
- [Monitoring](../../operations/monitoring.md) — Base concepts of ClickHouse monitoring.
- [system.metrics](metrics.md#system_tables-metrics) — Contains instantly calculated metrics.
- [system.events](events.md#system_tables-events) — Contains a number of events that have occurred.
- [system.metric\_log](metric_log.md#system_tables-metric_log) — Contains a history of metrics values from tables `system.metrics` и `system.events`.
- [system.metrics](../../operations/system-tables/metrics.md#system_tables-metrics) — Contains instantly calculated metrics.
- [system.events](../../operations/system-tables/events.md#system_tables-events) — Contains a number of events that have occurred.
- [system.metric\_log](../../operations/system-tables/metric_log.md#system_tables-metric_log) — Contains a history of metrics values from tables `system.metrics` и `system.events`.

View File

@ -4,6 +4,6 @@ Contains information about detached parts of [MergeTree](../../engines/table-eng
For user-detached parts, the reason is empty. Such parts can be attached with [ALTER TABLE ATTACH PARTITION\|PART](../../sql-reference/statements/alter.md#alter_attach-partition) command.
For the description of other columns, see [system.parts](parts.md#system_tables-parts).
For the description of other columns, see [system.parts](../../operations/system-tables/parts.md#system_tables-parts).
If part name is invalid, values of some columns may be `NULL`. Such parts can be deleted with [ALTER TABLE DROP DETACHED PART](../../sql-reference/statements/alter.md#alter_drop-detached).

View File

@ -26,7 +26,7 @@ SELECT * FROM system.events LIMIT 5
**See Also**
- [system.asynchronous\_metrics](asynchronous_metrics.md#system_tables-asynchronous_metrics) — Contains periodically calculated metrics.
- [system.metrics](metrics.md#system_tables-metrics) — Contains instantly calculated metrics.
- [system.metric\_log](metric_log.md#system_tables-metric_log) — Contains a history of metrics values from tables `system.metrics` и `system.events`.
- [system.asynchronous\_metrics](../../operations/system-tables/asynchronous_metrics.md#system_tables-asynchronous_metrics) — Contains periodically calculated metrics.
- [system.metrics](../../operations/system-tables/metrics.md#system_tables-metrics) — Contains instantly calculated metrics.
- [system.metric\_log](../../operations/system-tables/metric_log.md#system_tables-metric_log) — Contains a history of metrics values from tables `system.metrics` и `system.events`.
- [Monitoring](../../operations/monitoring.md) — Base concepts of ClickHouse monitoring.

View File

@ -20,7 +20,7 @@ System tables:
Most of system tables store their data in RAM. A ClickHouse server creates such system tables at the start.
Unlike other system tables, the system tables [metric\_log](metric_log.md#system_tables-metric_log), [query\_log](query_log.md#system_tables-query_log), [query\_thread\_log](query_thread_log.md#system_tables-query_thread_log), [trace\_log](trace_log.md#system_tables-trace_log) are served by [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md) table engine and store their data in a storage filesystem. If you remove a table from a filesystem, the ClickHouse server creates the empty one again at the time of the next data writing. If system table schema changed in a new release, then ClickHouse renames the current table and creates a new one.
Unlike other system tables, the system tables [metric\_log](../../operations/system-tables/metric_log.md#system_tables-metric_log), [query\_log](../../operations/system-tables/query_log.md#system_tables-query_log), [query\_thread\_log](../../operations/system-tables/query_thread_log.md#system_tables-query_thread_log), [trace\_log](../../operations/system-tables/trace_log.md#system_tables-trace_log) are served by [MergeTree](../../engines/table-engines/mergetree-family/mergetree.md) table engine and store their data in a storage filesystem. If you remove a table from a filesystem, the ClickHouse server creates the empty one again at the time of the next data writing. If system table schema changed in a new release, then ClickHouse renames the current table and creates a new one.
By default, table growth is unlimited. To control a size of a table, you can use [TTL](../../sql-reference/statements/alter.md#manipulations-with-table-ttl) settings for removing outdated log records. Also you can use the partitioning feature of `MergeTree`-engine tables.
@ -45,5 +45,4 @@ If procfs is supported and enabled on the system, ClickHouse server collects the
- `OSReadBytes`
- `OSWriteBytes`
[Original article](https://clickhouse.tech/docs/en/operations/system-tables/) <!--hide-->

View File

@ -49,7 +49,7 @@ CurrentMetric_ReplicatedChecks: 0
**See also**
- [system.asynchronous\_metrics](asynchronous_metrics.md#system_tables-asynchronous_metrics) — Contains periodically calculated metrics.
- [system.events](events.md#system_tables-events) — Contains a number of events that occurred.
- [system.metrics](metrics.md#system_tables-metrics) — Contains instantly calculated metrics.
- [system.asynchronous\_metrics](../../operations/system-tables/asynchronous_metrics.md#system_tables-asynchronous_metrics) — Contains periodically calculated metrics.
- [system.events](../../operations/system-tables/events.md#system_tables-events) — Contains a number of events that occurred.
- [system.metrics](../../operations/system-tables/metrics.md#system_tables-metrics) — Contains instantly calculated metrics.
- [Monitoring](../../operations/monitoring.md) — Base concepts of ClickHouse monitoring.

View File

@ -33,7 +33,7 @@ SELECT * FROM system.metrics LIMIT 10
**See Also**
- [system.asynchronous\_metrics](asynchronous_metrics.md#system_tables-asynchronous_metrics) — Contains periodically calculated metrics.
- [system.events](events.md#system_tables-events) — Contains a number of events that occurred.
- [system.metric\_log](metric_log.md#system_tables-metric_log) — Contains a history of metrics values from tables `system.metrics` и `system.events`.
- [system.asynchronous\_metrics](../../operations/system-tables/asynchronous_metrics.md#system_tables-asynchronous_metrics) — Contains periodically calculated metrics.
- [system.events](../../operations/system-tables/events.md#system_tables-events) — Contains a number of events that occurred.
- [system.metric\_log](../../operations/system-tables/metric_log.md#system_tables-metric_log) — Contains a history of metrics values from tables `system.metrics` и `system.events`.
- [Monitoring](../../operations/monitoring.md) — Base concepts of ClickHouse monitoring.

View File

@ -1,5 +1,5 @@
# system.numbers\_mt {#system-numbers-mt}
The same as [system.numbers](numbers.md) but reads are parallelized. The numbers can be returned in any order.
The same as [system.numbers](../../operations/system-tables/numbers.md) but reads are parallelized. The numbers can be returned in any order.
Used for tests.

View File

@ -11,7 +11,7 @@ You can disable queries logging by setting [log\_queries = 0](../../operations/s
The flushing period of data is set in `flush_interval_milliseconds` parameter of the [query\_log](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-query-log) server settings section. To force flushing, use the [SYSTEM FLUSH LOGS](../../sql-reference/statements/system.md#query_language-system-flush_logs) query.
ClickHouse doesnt delete data from the table automatically. See [Introduction](index.md#system-tables-introduction) for more details.
ClickHouse doesnt delete data from the table automatically. See [Introduction](../../operations/system-tables/index.md#system-tables-introduction) for more details.
The `system.query_log` table registers two kinds of queries:
@ -75,7 +75,7 @@ Columns:
- `quota_key` ([String](../../sql-reference/data-types/string.md)) — The “quota key” specified in the [quotas](../../operations/quotas.md) setting (see `keyed`).
- `revision` ([UInt32](../../sql-reference/data-types/int-uint.md)) — ClickHouse revision.
- `thread_numbers` ([Array(UInt32)](../../sql-reference/data-types/array.md)) — Number of threads that are participating in query execution.
- `ProfileEvents.Names` ([Array(String)](../../sql-reference/data-types/array.md)) — Counters that measure different metrics. The description of them could be found in the table [system.events](events.md#system_tables-events)
- `ProfileEvents.Names` ([Array(String)](../../sql-reference/data-types/array.md)) — Counters that measure different metrics. The description of them could be found in the table [system.events](../../operations/system-tables/events.md#system_tables-events)
- `ProfileEvents.Values` ([Array(UInt64)](../../sql-reference/data-types/array.md)) — Values of metrics that are listed in the `ProfileEvents.Names` column.
- `Settings.Names` ([Array(String)](../../sql-reference/data-types/array.md)) — Names of settings that were changed when the client ran the query. To enable logging changes to settings, set the `log_query_settings` parameter to 1.
- `Settings.Values` ([Array(String)](../../sql-reference/data-types/array.md)) — Values of settings that are listed in the `Settings.Names` column.
@ -135,4 +135,4 @@ Settings.Values: ['0','random','1','10000000000']
**See Also**
- [system.query\_thread\_log](query_thread_log.md#system_tables-query_thread_log) — This table contains information about each query execution thread.
- [system.query\_thread\_log](../../operations/system-tables/query_thread_log.md#system_tables-query_thread_log) — This table contains information about each query execution thread.

View File

@ -9,7 +9,7 @@ To start logging:
The flushing period of data is set in `flush_interval_milliseconds` parameter of the [query\_thread\_log](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-query_thread_log) server settings section. To force flushing, use the [SYSTEM FLUSH LOGS](../../sql-reference/statements/system.md#query_language-system-flush_logs) query.
ClickHouse doesnt delete data from the table automatically. See [Introduction](index.md#system-tables-introduction) for more details.
ClickHouse doesnt delete data from the table automatically. See [Introduction](../../operations/system-tables/index.md#system-tables-introduction) for more details.
Columns:
@ -110,4 +110,4 @@ ProfileEvents.Values: [1,97,81,5,81]
**See Also**
- [system.query\_log](query_log.md#system_tables-query_log) — Description of the `query_log` system table which contains common information about queries execution.
- [system.query\_log](../../operations/system-tables/query_log.md#system_tables-query_log) — Description of the `query_log` system table which contains common information about queries execution.

View File

@ -1,6 +1,7 @@
---
toc_priority: 103
---
# anyHeavy {#anyheavyx}
Selects a frequently occurring value using the [heavy hitters](http://www.cs.umd.edu/~samir/498/karp.pdf) algorithm. If there is a value that occurs more than in half the cases in each of the querys execution threads, this value is returned. Normally, the result is nondeterministic.

View File

@ -5,4 +5,4 @@ toc_priority: 104
## anyLast {#anylastx}
Selects the last value encountered.
The result is just as indeterminate as for the [any](any.md) function.
The result is just as indeterminate as for the [any](../../../sql-reference/aggregate-functions/reference/any.md) function.

Some files were not shown because too many files have changed in this diff Show More