Alexander Kozhikhov 2019-05-25 16:33:36 +03:00
commit 7b235a312b
122 changed files with 1981 additions and 473 deletions

View File

@ -1,8 +1,60 @@
## ClickHouse release 19.6.2.11, 2019-05-13
### New Features
* TTL expressions for columns and tables. [#4212](https://github.com/yandex/ClickHouse/pull/4212) ([Anton Popov](https://github.com/CurtizJ))
* Added support for `brotli` compression for HTTP responses (Accept-Encoding: br) [#4388](https://github.com/yandex/ClickHouse/pull/4388) ([Mikhail](https://github.com/fandyushin))
* Added a new aggregate function `simpleLinearRegression(x, y)` which performs linear regression on points (x, y) and returns the parameters of the line. (from #4668) [#4917](https://github.com/yandex/ClickHouse/pull/4917) ([hcz](https://github.com/hczhcz))
* Added a new function `isValidUTF8` for checking whether a set of bytes is valid UTF-8. [#4934](https://github.com/yandex/ClickHouse/pull/4934) ([Danila Kutenin](https://github.com/danlark1))
* Added a new load balancing policy `first_or_random`, which sends queries to the first specified host and, if it is inaccessible, to random hosts of the shard. Useful for cross-replication topology setups. [#5012](https://github.com/yandex/ClickHouse/pull/5012) ([nvartolomei](https://github.com/nvartolomei))
### Experimental Features
* Added the setting `index_granularity_bytes` (adaptive index granularity) for the MergeTree* table family. [#4826](https://github.com/yandex/ClickHouse/pull/4826) ([alesapin](https://github.com/alesapin))
### Improvements
* Added support for non-constant and negative size and length arguments for function `substringUTF8`. [#4989](https://github.com/yandex/ClickHouse/pull/4989) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Disable push-down to right table in left join, left table in right join, and both tables in full join. This fixes wrong JOIN results in some cases. [#4846](https://github.com/yandex/ClickHouse/pull/4846) ([Ivan](https://github.com/abyss7))
* `clickhouse-copier`: automatically upload the task configuration from the `--task-file` option [#4876](https://github.com/yandex/ClickHouse/pull/4876) ([proller](https://github.com/proller))
* Added typo suggestions for the storage factory and the table functions factory. [#4891](https://github.com/yandex/ClickHouse/pull/4891) ([Danila Kutenin](https://github.com/danlark1))
* Support asterisks and qualified asterisks for multiple joins without subqueries [#4898](https://github.com/yandex/ClickHouse/pull/4898) ([Artem Zuikov](https://github.com/4ertus2))
* Make missing column error message more user friendly. [#4915](https://github.com/yandex/ClickHouse/pull/4915) ([Artem Zuikov](https://github.com/4ertus2))
### Performance Improvements
* Significant speedup of ASOF JOIN [#4924](https://github.com/yandex/ClickHouse/pull/4924) ([Martijn Bakker](https://github.com/Gladdy))
### Backward Incompatible Changes
* HTTP header `Query-Id` was renamed to `X-ClickHouse-Query-Id` for consistency. [#4972](https://github.com/yandex/ClickHouse/pull/4972) ([Mikhail](https://github.com/fandyushin))
### Bug Fixes
* Fixed potential null pointer dereference in `clickhouse-copier`. [#4900](https://github.com/yandex/ClickHouse/pull/4900) ([proller](https://github.com/proller))
* Fixed an error in queries with JOIN + ARRAY JOIN [#4938](https://github.com/yandex/ClickHouse/pull/4938) ([Artem Zuikov](https://github.com/4ertus2))
* Fixed hanging on server start when a dictionary depends on another dictionary via a database with engine=Dictionary. [#4962](https://github.com/yandex/ClickHouse/pull/4962) ([Vitaly Baranov](https://github.com/vitlibar))
* Partially fixed `distributed_product_mode = local`. Columns of local tables are now allowed in where/having/order by/... via table aliases. An exception is thrown if the table does not have an alias. Access to the columns without table aliases is not possible yet. [#4986](https://github.com/yandex/ClickHouse/pull/4986) ([Artem Zuikov](https://github.com/4ertus2))
* Fix potentially wrong result for `SELECT DISTINCT` with `JOIN` [#5001](https://github.com/yandex/ClickHouse/pull/5001) ([Artem Zuikov](https://github.com/4ertus2))
### Build/Testing/Packaging Improvements
* Fixed test failures when running clickhouse-server on a different host [#4713](https://github.com/yandex/ClickHouse/pull/4713) ([Vasily Nemkov](https://github.com/Enmk))
* clickhouse-test: Disable color control sequences in a non-tty environment. [#4937](https://github.com/yandex/ClickHouse/pull/4937) ([alesapin](https://github.com/alesapin))
* clickhouse-test: Allow using any test database (remove the `test.` qualification where possible) [#5008](https://github.com/yandex/ClickHouse/pull/5008) ([proller](https://github.com/proller))
* Fixed UBSan errors [#5037](https://github.com/yandex/ClickHouse/pull/5037) ([Vitaly Baranov](https://github.com/vitlibar))
* Yandex LFAlloc was added to ClickHouse to allocate MarkCache and UncompressedCache data in different ways to catch segfaults more reliably [#4995](https://github.com/yandex/ClickHouse/pull/4995) ([Danila Kutenin](https://github.com/danlark1))
* Added a Python utility to help with backports and changelogs. [#4949](https://github.com/yandex/ClickHouse/pull/4949) ([Ivan](https://github.com/abyss7))
## ClickHouse release 19.5.4.22, 2019-05-13
### Bug fixes
* Fixed possible crash in bitmap* functions [#5220](https://github.com/yandex/ClickHouse/pull/5220) [#5228](https://github.com/yandex/ClickHouse/pull/5228) ([Andy Yang](https://github.com/andyyzh))
* Fixed a very rare data race condition that could happen when executing a query with UNION ALL involving at least two SELECTs from system.columns, system.tables, system.parts, system.parts_tables, or tables of the Merge family, while concurrently performing ALTER of columns of the related tables. [#5189](https://github.com/yandex/ClickHouse/pull/5189) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed the error `Set for IN is not created yet in case of using single LowCardinality column in the left part of IN`. This error happened if a LowCardinality column was part of the primary key. #5031 [#5154](https://github.com/yandex/ClickHouse/pull/5154) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Modified the `retention` function: previously, if a row satisfied both the first and the Nth condition, only the first satisfied condition was added to the data state. Now all conditions satisfied in a row of data are added to the data state. [#5119](https://github.com/yandex/ClickHouse/pull/5119) ([小路](https://github.com/nicelulu))
## ClickHouse release 19.5.3.8, 2019-04-18
### Bug fixes
* Fixed type of setting `max_partitions_per_insert_block` from boolean to UInt64. [#5028](https://github.com/yandex/ClickHouse/pull/5028) ([Mohammad Hossein Sekhavat](https://github.com/mhsekhavat))
## ClickHouse release 19.5.2.6, 2019-04-15
### New Features

View File

@ -1,3 +1,54 @@
## ClickHouse release 19.6.2.11, 2019-05-13
### New Features
* TTL expressions that let you configure the lifetime and automatic cleanup of data in a table or in individual columns. [#4212](https://github.com/yandex/ClickHouse/pull/4212) ([Anton Popov](https://github.com/CurtizJ))
* Added support for the `brotli` compression algorithm in HTTP responses (`Accept-Encoding: br`). For POST request bodies this was already supported. [#4388](https://github.com/yandex/ClickHouse/pull/4388) ([Mikhail](https://github.com/fandyushin))
* Added the aggregate function `simpleLinearRegression(x, y)`, which performs linear regression on points (x, y) and returns the parameters of the line. (from #4668) [#4917](https://github.com/yandex/ClickHouse/pull/4917) ([hcz](https://github.com/hczhcz))
* Added the function `isValidUTF8` for checking whether a string contains valid UTF-8 data. [#4934](https://github.com/yandex/ClickHouse/pull/4934) ([Danila Kutenin](https://github.com/danlark1))
* Added a new load balancing policy (`load_balancing`) `first_or_random`, which sends queries to the first specified host and, if it is unavailable, to random hosts of the shard. Useful for cross-replication topologies. [#5012](https://github.com/yandex/ClickHouse/pull/5012) ([nvartolomei](https://github.com/nvartolomei))
### Experimental Features
* Added the `index_granularity_bytes` setting (adaptive index granularity) for tables of the MergeTree* family. [#4826](https://github.com/yandex/ClickHouse/pull/4826) ([alesapin](https://github.com/alesapin))
### Improvements
* Added support for non-constant and negative offset and length arguments for the `substringUTF8` function. [#4989](https://github.com/yandex/ClickHouse/pull/4989) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Disabled push-down to the right table in LEFT JOIN, to the left table in RIGHT JOIN, and to both tables in FULL JOIN. This fixes wrong JOIN results in some cases. [#4846](https://github.com/yandex/ClickHouse/pull/4846) ([Ivan](https://github.com/abyss7))
* `clickhouse-copier`: automatic upload of the task configuration to ZooKeeper from the `--task-file` option [#4876](https://github.com/yandex/ClickHouse/pull/4876) ([proller](https://github.com/proller))
* Added typo-aware suggestions for table engine names and table function names. [#4891](https://github.com/yandex/ClickHouse/pull/4891) ([Danila Kutenin](https://github.com/danlark1))
* Support for `select *` and `select tablename.*` expressions for multiple joins without subqueries [#4898](https://github.com/yandex/ClickHouse/pull/4898) ([Artem Zuikov](https://github.com/4ertus2))
* Error messages about missing columns are now more user friendly. [#4915](https://github.com/yandex/ClickHouse/pull/4915) ([Artem Zuikov](https://github.com/4ertus2))
### Performance Improvements
* Significant speedup of ASOF JOIN [#4924](https://github.com/yandex/ClickHouse/pull/4924) ([Martijn Bakker](https://github.com/Gladdy))
### Backward Incompatible Changes
* The HTTP header `Query-Id` was renamed to `X-ClickHouse-Query-Id` for consistency. [#4972](https://github.com/yandex/ClickHouse/pull/4972) ([Mikhail](https://github.com/fandyushin))
### Bug Fixes
* Fixed possible null pointer dereferences in `clickhouse-copier`. [#4900](https://github.com/yandex/ClickHouse/pull/4900) ([proller](https://github.com/proller))
* Fixed errors in queries with JOIN + ARRAY JOIN [#4938](https://github.com/yandex/ClickHouse/pull/4938) ([Artem Zuikov](https://github.com/4ertus2))
* Fixed hanging on server start when an external dictionary depends on another dictionary through a table from a database with the `Dictionary` engine. [#4962](https://github.com/yandex/ClickHouse/pull/4962) ([Vitaly Baranov](https://github.com/vitlibar))
* With `distributed_product_mode = 'local'`, columns of local tables can now be used in where/having/order by/... via table aliases. An exception is thrown if the table has no alias. Access to the columns without table aliases is not possible yet. [#4986](https://github.com/yandex/ClickHouse/pull/4986) ([Artem Zuikov](https://github.com/4ertus2))
* Fixed a potentially incorrect result for `SELECT DISTINCT` with `JOIN` [#5001](https://github.com/yandex/ClickHouse/pull/5001) ([Artem Zuikov](https://github.com/4ertus2))
### Build/Testing/Packaging Improvements
* Fixed test failures when `clickhouse-server` is running on a remote host [#4713](https://github.com/yandex/ClickHouse/pull/4713) ([Vasily Nemkov](https://github.com/Enmk))
* `clickhouse-test`: Disabled colored output when the command is not run in a terminal. [#4937](https://github.com/yandex/ClickHouse/pull/4937) ([alesapin](https://github.com/alesapin))
* `clickhouse-test`: Allowed using databases other than the default test database [#5008](https://github.com/yandex/ClickHouse/pull/5008) ([proller](https://github.com/proller))
* Fixed errors when running tests under UBSan [#5037](https://github.com/yandex/ClickHouse/pull/5037) ([Vitaly Baranov](https://github.com/vitlibar))
* Added the Yandex LFAlloc allocator to allocate MarkCache and UncompressedCache data in different ways to catch out-of-bounds memory accesses more reliably [#4995](https://github.com/yandex/ClickHouse/pull/4995) ([Danila Kutenin](https://github.com/danlark1))
* A utility to simplify backporting changes to old releases and composing changelogs. [#4949](https://github.com/yandex/ClickHouse/pull/4949) ([Ivan](https://github.com/abyss7))
## ClickHouse release 19.5.4.22, 2019-05-13
### Bug Fixes
* Fixed possible crashes in bitmap* functions [#5220](https://github.com/yandex/ClickHouse/pull/5220) [#5228](https://github.com/yandex/ClickHouse/pull/5228) ([Andy Yang](https://github.com/andyyzh))
* Fixed a very rare data race condition that could happen when executing a query with UNION ALL involving at least two SELECTs from system.columns, system.tables, system.parts, system.parts_tables, or tables of the Merge family, while concurrently performing ALTER of columns of the related tables. [#5189](https://github.com/yandex/ClickHouse/pull/5189) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed the error `Set for IN is not created yet in case of using single LowCardinality column in the left part of IN`. This error happened if a LowCardinality column was part of the primary key. #5031 [#5154](https://github.com/yandex/ClickHouse/pull/5154) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Fixed the `retention` function: previously only the first satisfied condition was added to the data state. Now all conditions satisfied in a row of data are added to the data state. [#5119](https://github.com/yandex/ClickHouse/pull/5119) ([小路](https://github.com/nicelulu))
## ClickHouse release 19.5.3.8, 2019-04-18
### Bug Fixes

View File

@ -1,17 +1,16 @@
#include "InterserverIOHTTPHandler.h"
#include <Poco/Net/HTTPBasicCredentials.h>
#include <Poco/Net/HTTPServerRequest.h>
#include <Poco/Net/HTTPServerResponse.h>
#include <common/logger_useful.h>
#include <Common/HTMLForm.h>
#include <Common/setThreadName.h>
#include <Compression/CompressedWriteBuffer.h>
#include <IO/ReadBufferFromIStream.h>
#include <IO/WriteBufferFromHTTPServerResponse.h>
#include <Interpreters/InterserverIOHandler.h>
#include "InterserverIOHTTPHandler.h"
#include "IServer.h"
namespace DB
{
@ -50,7 +49,7 @@ std::pair<String, bool> InterserverIOHTTPHandler::checkAuthentication(Poco::Net:
return {"", true};
}
void InterserverIOHTTPHandler::processQuery(Poco::Net::HTTPServerRequest & request, Poco::Net::HTTPServerResponse & response)
void InterserverIOHTTPHandler::processQuery(Poco::Net::HTTPServerRequest & request, Poco::Net::HTTPServerResponse & response, Output & used_output)
{
HTMLForm params(request);
@ -61,24 +60,17 @@ void InterserverIOHTTPHandler::processQuery(Poco::Net::HTTPServerRequest & reque
ReadBufferFromIStream body(request.stream());
const auto & config = server.config();
unsigned keep_alive_timeout = config.getUInt("keep_alive_timeout", 10);
WriteBufferFromHTTPServerResponse out(request, response, keep_alive_timeout);
auto endpoint = server.context().getInterserverIOHandler().getEndpoint(endpoint_name);
if (compress)
{
CompressedWriteBuffer compressed_out(out);
CompressedWriteBuffer compressed_out(*used_output.out.get());
endpoint->processQuery(params, body, compressed_out, response);
}
else
{
endpoint->processQuery(params, body, out, response);
endpoint->processQuery(params, body, *used_output.out.get(), response);
}
out.finalize();
}
@ -90,30 +82,30 @@ void InterserverIOHTTPHandler::handleRequest(Poco::Net::HTTPServerRequest & requ
if (request.getVersion() == Poco::Net::HTTPServerRequest::HTTP_1_1)
response.setChunkedTransferEncoding(true);
Output used_output;
const auto & config = server.config();
unsigned keep_alive_timeout = config.getUInt("keep_alive_timeout", 10);
used_output.out = std::make_shared<WriteBufferFromHTTPServerResponse>(request, response, keep_alive_timeout);
try
{
if (auto [msg, success] = checkAuthentication(request); success)
if (auto [message, success] = checkAuthentication(request); success)
{
processQuery(request, response);
processQuery(request, response, used_output);
LOG_INFO(log, "Done processing query");
}
else
{
response.setStatusAndReason(Poco::Net::HTTPServerResponse::HTTP_UNAUTHORIZED);
if (!response.sent())
response.send() << msg << std::endl;
writeString(message, *used_output.out);
LOG_WARNING(log, "Query processing failed request: '" << request.getURI() << "' authentification failed");
}
}
catch (Exception & e)
{
if (e.code() == ErrorCodes::TOO_MANY_SIMULTANEOUS_QUERIES)
{
if (!response.sent())
response.send();
return;
}
response.setStatusAndReason(Poco::Net::HTTPResponse::HTTP_INTERNAL_SERVER_ERROR);
@ -122,7 +114,7 @@ void InterserverIOHTTPHandler::handleRequest(Poco::Net::HTTPServerRequest & requ
std::string message = getCurrentExceptionMessage(is_real_error);
if (!response.sent())
response.send() << message << std::endl;
writeString(message, *used_output.out);
if (is_real_error)
LOG_ERROR(log, message);
@ -134,7 +126,8 @@ void InterserverIOHTTPHandler::handleRequest(Poco::Net::HTTPServerRequest & requ
response.setStatusAndReason(Poco::Net::HTTPResponse::HTTP_INTERNAL_SERVER_ERROR);
std::string message = getCurrentExceptionMessage(false);
if (!response.sent())
response.send() << message << std::endl;
writeString(message, *used_output.out);
LOG_ERROR(log, message);
}
}
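The diff above moves the `WriteBufferFromHTTPServerResponse` out of `processQuery` into an `Output` struct created by `handleRequest`, so both the success path and the error paths write through the same buffer. Below is a minimal, generic C++ sketch of that pattern; the names (`Output`, `processQuery`, `handleRequest`) mirror the diff, but `std::ostringstream` stands in for the real HTTP response buffer, so this is an illustration rather than the ClickHouse implementation.

```cpp
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>

struct Output
{
    std::shared_ptr<std::ostringstream> out;   // stand-in for WriteBufferFromHTTPServerResponse
};

void processQuery(const std::string & query, Output & used_output)
{
    if (query.empty())
        throw std::runtime_error("empty query");
    *used_output.out << "result of " << query;
}

std::string handleRequest(const std::string & query)
{
    Output used_output;
    used_output.out = std::make_shared<std::ostringstream>();   // created up front, as in the diff
    try
    {
        processQuery(query, used_output);
    }
    catch (const std::exception & e)
    {
        *used_output.out << e.what();   // the error text goes through the same buffer
    }
    return used_output.out->str();
}
```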

View File

@ -1,12 +1,10 @@
#pragma once
#include <memory>
#include <Poco/Logger.h>
#include <Poco/Net/HTTPRequestHandler.h>
#include <Common/CurrentMetrics.h>
#include "IServer.h"
namespace CurrentMetrics
{
@ -16,6 +14,9 @@ namespace CurrentMetrics
namespace DB
{
class IServer;
class WriteBufferFromHTTPServerResponse;
class InterserverIOHTTPHandler : public Poco::Net::HTTPRequestHandler
{
public:
@ -28,12 +29,17 @@ public:
void handleRequest(Poco::Net::HTTPServerRequest & request, Poco::Net::HTTPServerResponse & response) override;
private:
struct Output
{
std::shared_ptr<WriteBufferFromHTTPServerResponse> out;
};
IServer & server;
Poco::Logger * log;
CurrentMetrics::Increment metric_increment{CurrentMetrics::InterserverConnection};
void processQuery(Poco::Net::HTTPServerRequest & request, Poco::Net::HTTPServerResponse & response);
void processQuery(Poco::Net::HTTPServerRequest & request, Poco::Net::HTTPServerResponse & response, Output & used_output);
std::pair<String, bool> checkAuthentication(Poco::Net::HTTPServerRequest & request) const;
};

View File

@ -34,7 +34,7 @@ namespace
for (size_t i = 0; i < argument_types.size(); ++i)
{
if (!isNumber(argument_types[i]))
if (!isNativeNumber(argument_types[i]))
throw Exception(
"Argument " + std::to_string(i) + " of type " + argument_types[i]->getName()
+ " must be numeric for aggregate function " + name,
@ -355,7 +355,7 @@ void LogisticRegression::predict(
for (size_t i = 1; i < arguments.size(); ++i)
{
const ColumnWithTypeAndName & cur_col = block.getByPosition(arguments[i]);
if (!isNumber(cur_col.type))
if (!isNativeNumber(cur_col.type))
{
throw Exception("Prediction arguments must have numeric type", ErrorCodes::BAD_ARGUMENTS);
}
@ -428,7 +428,7 @@ void LinearRegression::predict(
for (size_t i = 1; i < arguments.size(); ++i)
{
const ColumnWithTypeAndName & cur_col = block.getByPosition(arguments[i]);
if (!isNumber(cur_col.type))
if (!isNativeNumber(cur_col.type))
{
throw Exception("Prediction arguments must have numeric type", ErrorCodes::BAD_ARGUMENTS);
}

View File

@ -61,10 +61,10 @@ public:
AggregateFunctionIntersectionsMax(AggregateFunctionIntersectionsKind kind_, const DataTypes & arguments)
: IAggregateFunctionDataHelper<MaxIntersectionsData<PointType>, AggregateFunctionIntersectionsMax<PointType>>(arguments, {}), kind(kind_)
{
if (!isNumber(arguments[0]))
if (!isNativeNumber(arguments[0]))
throw Exception{getName() + ": first argument must be represented by integer", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};
if (!isNumber(arguments[1]))
if (!isNativeNumber(arguments[1]))
throw Exception{getName() + ": second argument must be represented by integer", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};
if (!arguments[0]->equals(*arguments[1]))

View File

@ -1,6 +1,12 @@
#include <AggregateFunctions/Helpers.h>
#include <AggregateFunctions/AggregateFunctionFactory.h>
#include <AggregateFunctions/AggregateFunctionSequenceMatch.h>
#include <DataTypes/DataTypeDate.h>
#include <DataTypes/DataTypeDateTime.h>
#include <ext/range.h>
namespace DB
{
@ -12,32 +18,58 @@ namespace ErrorCodes
namespace
{
AggregateFunctionPtr createAggregateFunctionSequenceCount(const std::string & name, const DataTypes & argument_types, const Array & params)
template <template <typename, typename> class AggregateFunction, template <typename> class Data>
AggregateFunctionPtr createAggregateFunctionSequenceBase(const std::string & name, const DataTypes & argument_types, const Array & params)
{
if (params.size() != 1)
throw Exception{"Aggregate function " + name + " requires exactly one parameter.",
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH};
String pattern = params.front().safeGet<std::string>();
return std::make_shared<AggregateFunctionSequenceCount>(argument_types, params, pattern);
}
const auto arg_count = argument_types.size();
AggregateFunctionPtr createAggregateFunctionSequenceMatch(const std::string & name, const DataTypes & argument_types, const Array & params)
{
if (params.size() != 1)
throw Exception{"Aggregate function " + name + " requires exactly one parameter.",
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH};
if (arg_count < 3)
throw Exception{"Aggregate function " + name + " requires at least 3 arguments.",
ErrorCodes::TOO_FEW_ARGUMENTS_FOR_FUNCTION};
if (arg_count - 1 > max_events)
throw Exception{"Aggregate function " + name + " supports up to "
+ toString(max_events) + " event arguments.",
ErrorCodes::TOO_MANY_ARGUMENTS_FOR_FUNCTION};
const auto time_arg = argument_types.front().get();
for (const auto i : ext::range(1, arg_count))
{
const auto cond_arg = argument_types[i].get();
if (!isUInt8(cond_arg))
throw Exception{"Illegal type " + cond_arg->getName() + " of argument " + toString(i + 1)
+ " of aggregate function " + name + ", must be UInt8",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};
}
String pattern = params.front().safeGet<std::string>();
return std::make_shared<AggregateFunctionSequenceMatch>(argument_types, params, pattern);
AggregateFunctionPtr res(createWithUnsignedIntegerType<AggregateFunction, Data>(*argument_types[0], argument_types, params, pattern));
if (res)
return res;
WhichDataType which(argument_types.front().get());
if (which.isDateTime())
return std::make_shared<AggregateFunction<DataTypeDateTime::FieldType, Data<DataTypeDateTime::FieldType>>>(argument_types, params, pattern);
else if (which.isDate())
return std::make_shared<AggregateFunction<DataTypeDate::FieldType, Data<DataTypeDate::FieldType>>>(argument_types, params, pattern);
throw Exception{"Illegal type " + time_arg->getName() + " of first argument of aggregate function "
+ name + ", must be DateTime",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};
}
}
void registerAggregateFunctionsSequenceMatch(AggregateFunctionFactory & factory)
{
factory.registerFunction("sequenceMatch", createAggregateFunctionSequenceMatch);
factory.registerFunction("sequenceCount", createAggregateFunctionSequenceCount);
factory.registerFunction("sequenceMatch", createAggregateFunctionSequenceBase<AggregateFunctionSequenceMatch, AggregateFunctionSequenceMatchData>);
factory.registerFunction("sequenceCount", createAggregateFunctionSequenceBase<AggregateFunctionSequenceCount, AggregateFunctionSequenceMatchData>);
}
}
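The new `createAggregateFunctionSequenceBase` maps the runtime type of the first (timestamp) argument to a concrete template instantiation of the aggregate function. Here is a generic, self-contained C++ sketch of that dispatch idiom; `TypeTag` and `SequenceAggregate` are hypothetical stand-ins, not the ClickHouse API.

```cpp
#include <cstdint>
#include <memory>
#include <stdexcept>

enum class TypeTag { UInt32, UInt64, Date, DateTime };

struct IAggregate { virtual ~IAggregate() = default; };

template <typename Timestamp>
struct SequenceAggregate : IAggregate {};   // plays the role of AggregateFunctionSequenceMatch<T, Data>

// The runtime type tag of the timestamp column selects the template instantiation.
std::shared_ptr<IAggregate> createForTimestampType(TypeTag tag)
{
    switch (tag)
    {
        case TypeTag::UInt32:   return std::make_shared<SequenceAggregate<std::uint32_t>>();
        case TypeTag::UInt64:   return std::make_shared<SequenceAggregate<std::uint64_t>>();
        case TypeTag::Date:     return std::make_shared<SequenceAggregate<std::uint16_t>>();  // Date is stored as UInt16
        case TypeTag::DateTime: return std::make_shared<SequenceAggregate<std::uint32_t>>();  // DateTime is stored as UInt32
    }
    throw std::invalid_argument("the timestamp argument must be an unsigned integer, Date or DateTime");
}
```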

View File

@ -36,11 +36,12 @@ struct ComparePairFirst final
}
};
static constexpr auto max_events = 32;
template <typename T>
struct AggregateFunctionSequenceMatchData final
{
static constexpr auto max_events = 32;
using Timestamp = std::uint32_t;
using Timestamp = T;
using Events = std::bitset<max_events>;
using TimestampEvents = std::pair<Timestamp, Events>;
using Comparator = ComparePairFirst<std::less>;
@ -61,6 +62,9 @@ struct AggregateFunctionSequenceMatchData final
void merge(const AggregateFunctionSequenceMatchData & other)
{
if (other.events_list.empty())
return;
const auto size = events_list.size();
events_list.insert(std::begin(other.events_list), std::end(other.events_list));
@ -119,7 +123,7 @@ struct AggregateFunctionSequenceMatchData final
for (size_t i = 0; i < size; ++i)
{
std::uint32_t timestamp;
Timestamp timestamp;
readBinary(timestamp, buf);
UInt64 events;
@ -135,48 +139,23 @@ struct AggregateFunctionSequenceMatchData final
constexpr auto sequence_match_max_iterations = 1000000;
template <typename Derived>
class AggregateFunctionSequenceBase : public IAggregateFunctionDataHelper<AggregateFunctionSequenceMatchData, Derived>
template <typename T, typename Data, typename Derived>
class AggregateFunctionSequenceBase : public IAggregateFunctionDataHelper<Data, Derived>
{
public:
AggregateFunctionSequenceBase(const DataTypes & arguments, const Array & params, const String & pattern)
: IAggregateFunctionDataHelper<AggregateFunctionSequenceMatchData, Derived>(arguments, params)
: IAggregateFunctionDataHelper<Data, Derived>(arguments, params)
, pattern(pattern)
{
arg_count = arguments.size();
if (!sufficientArgs(arg_count))
throw Exception{"Aggregate function " + derived().getName() + " requires at least 3 arguments.",
ErrorCodes::TOO_FEW_ARGUMENTS_FOR_FUNCTION};
if (arg_count - 1 > AggregateFunctionSequenceMatchData::max_events)
throw Exception{"Aggregate function " + derived().getName() + " supports up to " +
toString(AggregateFunctionSequenceMatchData::max_events) + " event arguments.",
ErrorCodes::TOO_MANY_ARGUMENTS_FOR_FUNCTION};
const auto time_arg = arguments.front().get();
if (!WhichDataType(time_arg).isDateTime())
throw Exception{"Illegal type " + time_arg->getName() + " of first argument of aggregate function "
+ derived().getName() + ", must be DateTime",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};
for (const auto i : ext::range(1, arg_count))
{
const auto cond_arg = arguments[i].get();
if (!isUInt8(cond_arg))
throw Exception{"Illegal type " + cond_arg->getName() + " of argument " + toString(i + 1) +
" of aggregate function " + derived().getName() + ", must be UInt8",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};
}
parsePattern();
}
void add(AggregateDataPtr place, const IColumn ** columns, const size_t row_num, Arena *) const override
{
const auto timestamp = static_cast<const ColumnUInt32 *>(columns[0])->getData()[row_num];
const auto timestamp = static_cast<const ColumnVector<T> *>(columns[0])->getData()[row_num];
AggregateFunctionSequenceMatchData::Events events;
typename Data::Events events;
for (const auto i : ext::range(1, arg_count))
{
const auto event = static_cast<const ColumnUInt8 *>(columns[i])->getData()[row_num];
@ -218,17 +197,15 @@ private:
struct PatternAction final
{
PatternActionType type;
std::uint32_t extra;
std::uint64_t extra;
PatternAction() = default;
PatternAction(const PatternActionType type, const std::uint32_t extra = 0) : type{type}, extra{extra} {}
PatternAction(const PatternActionType type, const std::uint64_t extra = 0) : type{type}, extra{extra} {}
};
static constexpr size_t bytes_on_stack = 64;
using PatternActions = PODArray<PatternAction, bytes_on_stack, AllocatorWithStackMemory<Allocator<false>, bytes_on_stack>>;
static bool sufficientArgs(const size_t arg_count) { return arg_count >= 3; }
Derived & derived() { return static_cast<Derived &>(*this); }
void parsePattern()
@ -340,8 +317,8 @@ protected:
/// This algorithm performs in O(mn) (with m the number of DFA states and N the number
/// of events) with a memory consumption and memory allocations in O(m). It means that
/// if n >>> m (which is expected to be the case), this algorithm can be considered linear.
template <typename T>
bool dfaMatch(T & events_it, const T events_end) const
template <typename EventEntry>
bool dfaMatch(EventEntry & events_it, const EventEntry events_end) const
{
using ActiveStates = std::vector<bool>;
@ -396,8 +373,8 @@ protected:
return active_states.back();
}
template <typename T>
bool backtrackingMatch(T & events_it, const T events_end) const
template <typename EventEntry>
bool backtrackingMatch(EventEntry & events_it, const EventEntry events_end) const
{
const auto action_begin = std::begin(actions);
const auto action_end = std::end(actions);
@ -407,7 +384,7 @@ protected:
auto base_it = events_it;
/// an iterator to action plus an iterator to row in events list plus timestamp at the start of sequence
using backtrack_info = std::tuple<decltype(action_it), T, T>;
using backtrack_info = std::tuple<decltype(action_it), EventEntry, EventEntry>;
std::stack<backtrack_info> back_stack;
/// backtrack if possible
@ -458,7 +435,7 @@ protected:
}
else if (action_it->type == PatternActionType::TimeLessOrEqual)
{
if (events_it->first - base_it->first <= action_it->extra)
if (events_it->first <= base_it->first + action_it->extra)
{
/// condition satisfied, move onto next action
back_stack.emplace(action_it, events_it, base_it);
@ -470,7 +447,7 @@ protected:
}
else if (action_it->type == PatternActionType::TimeLess)
{
if (events_it->first - base_it->first < action_it->extra)
if (events_it->first < base_it->first + action_it->extra)
{
back_stack.emplace(action_it, events_it, base_it);
base_it = events_it;
@ -481,7 +458,7 @@ protected:
}
else if (action_it->type == PatternActionType::TimeGreaterOrEqual)
{
if (events_it->first - base_it->first >= action_it->extra)
if (events_it->first >= base_it->first + action_it->extra)
{
back_stack.emplace(action_it, events_it, base_it);
base_it = events_it;
@ -492,7 +469,7 @@ protected:
}
else if (action_it->type == PatternActionType::TimeGreater)
{
if (events_it->first - base_it->first > action_it->extra)
if (events_it->first > base_it->first + action_it->extra)
{
back_stack.emplace(action_it, events_it, base_it);
base_it = events_it;
@ -575,14 +552,14 @@ private:
DFAStates dfa_states;
};
class AggregateFunctionSequenceMatch final : public AggregateFunctionSequenceBase<AggregateFunctionSequenceMatch>
template <typename T, typename Data>
class AggregateFunctionSequenceMatch final : public AggregateFunctionSequenceBase<T, Data, AggregateFunctionSequenceMatch<T, Data>>
{
public:
AggregateFunctionSequenceMatch(const DataTypes & arguments, const Array & params, const String & pattern)
: AggregateFunctionSequenceBase<AggregateFunctionSequenceMatch>(arguments, params, pattern) {}
: AggregateFunctionSequenceBase<T, Data, AggregateFunctionSequenceMatch<T, Data>>(arguments, params, pattern) {}
using AggregateFunctionSequenceBase<AggregateFunctionSequenceMatch>::AggregateFunctionSequenceBase;
using AggregateFunctionSequenceBase<T, Data, AggregateFunctionSequenceMatch<T, Data>>::AggregateFunctionSequenceBase;
String getName() const override { return "sequenceMatch"; }
@ -590,27 +567,27 @@ public:
void insertResultInto(ConstAggregateDataPtr place, IColumn & to) const override
{
const_cast<Data &>(data(place)).sort();
const_cast<Data &>(this->data(place)).sort();
const auto & data_ref = data(place);
const auto & data_ref = this->data(place);
const auto events_begin = std::begin(data_ref.events_list);
const auto events_end = std::end(data_ref.events_list);
auto events_it = events_begin;
bool match = pattern_has_time ? backtrackingMatch(events_it, events_end) : dfaMatch(events_it, events_end);
bool match = this->pattern_has_time ? this->backtrackingMatch(events_it, events_end) : this->dfaMatch(events_it, events_end);
static_cast<ColumnUInt8 &>(to).getData().push_back(match);
}
};
class AggregateFunctionSequenceCount final : public AggregateFunctionSequenceBase<AggregateFunctionSequenceCount>
template <typename T, typename Data>
class AggregateFunctionSequenceCount final : public AggregateFunctionSequenceBase<T, Data, AggregateFunctionSequenceCount<T, Data>>
{
public:
AggregateFunctionSequenceCount(const DataTypes & arguments, const Array & params, const String & pattern)
: AggregateFunctionSequenceBase<AggregateFunctionSequenceCount>(arguments, params, pattern) {}
: AggregateFunctionSequenceBase<T, Data, AggregateFunctionSequenceCount<T, Data>>(arguments, params, pattern) {}
using AggregateFunctionSequenceBase<AggregateFunctionSequenceCount>::AggregateFunctionSequenceBase;
using AggregateFunctionSequenceBase<T, Data, AggregateFunctionSequenceCount<T, Data>>::AggregateFunctionSequenceBase;
String getName() const override { return "sequenceCount"; }
@ -618,21 +595,21 @@ public:
void insertResultInto(ConstAggregateDataPtr place, IColumn & to) const override
{
const_cast<Data &>(data(place)).sort();
const_cast<Data &>(this->data(place)).sort();
static_cast<ColumnUInt64 &>(to).getData().push_back(count(place));
}
private:
UInt64 count(const ConstAggregateDataPtr & place) const
{
const auto & data_ref = data(place);
const auto & data_ref = this->data(place);
const auto events_begin = std::begin(data_ref.events_list);
const auto events_end = std::end(data_ref.events_list);
auto events_it = events_begin;
size_t count = 0;
while (events_it != events_end && backtrackingMatch(events_it, events_end))
while (events_it != events_end && this->backtrackingMatch(events_it, events_end))
++count;
return count;

View File

@ -10,6 +10,7 @@
#include <Functions/IFunction.h>
#include <IO/WriteBufferFromOStream.h>
#include <Interpreters/ExpressionAnalyzer.h>
#include <Interpreters/ExpressionActions.h>
#include <Parsers/IAST.h>
#include <Storages/IStorage.h>
#include <Common/COW.h>
@ -70,7 +71,7 @@ std::ostream & operator<<(std::ostream & stream, const Block & what)
std::ostream & operator<<(std::ostream & stream, const ColumnWithTypeAndName & what)
{
stream << "ColumnWithTypeAndName(name = " << what.name << ", type = " << what.type << ", column = ";
stream << "ColumnWithTypeAndName(name = " << what.name << ", type = " << *what.type << ", column = ";
return dumpValue(stream, what.column) << ")";
}
@ -109,4 +110,56 @@ std::ostream & operator<<(std::ostream & stream, const IAST & what)
return stream;
}
std::ostream & operator<<(std::ostream & stream, const ExpressionAction & what)
{
stream << "ExpressionAction(" << what.toString() << ")";
return stream;
}
std::ostream & operator<<(std::ostream & stream, const ExpressionActions & what)
{
stream << "ExpressionActions(" << what.dumpActions() << ")";
return stream;
}
std::ostream & operator<<(std::ostream & stream, const SyntaxAnalyzerResult & what)
{
stream << "SyntaxAnalyzerResult{";
stream << "storage=" << what.storage << "; ";
if (!what.source_columns.empty())
{
stream << "source_columns=";
dumpValue(stream, what.source_columns);
stream << "; ";
}
if (!what.aliases.empty())
{
stream << "aliases=";
dumpValue(stream, what.aliases);
stream << "; ";
}
if (!what.array_join_result_to_source.empty())
{
stream << "array_join_result_to_source=";
dumpValue(stream, what.array_join_result_to_source);
stream << "; ";
}
if (!what.array_join_alias_to_name.empty())
{
stream << "array_join_alias_to_name=";
dumpValue(stream, what.array_join_alias_to_name);
stream << "; ";
}
if (!what.array_join_name_to_alias.empty())
{
stream << "array_join_name_to_alias=";
dumpValue(stream, what.array_join_name_to_alias);
stream << "; ";
}
stream << "rewrite_subqueries=" << what.rewrite_subqueries << "; ";
stream << "}";
return stream;
}
}

View File

@ -41,6 +41,14 @@ std::ostream & operator<<(std::ostream & stream, const IAST & what);
std::ostream & operator<<(std::ostream & stream, const Connection::Packet & what);
struct ExpressionAction;
std::ostream & operator<<(std::ostream & stream, const ExpressionAction & what);
class ExpressionActions;
std::ostream & operator<<(std::ostream & stream, const ExpressionActions & what);
struct SyntaxAnalyzerResult;
std::ostream & operator<<(std::ostream & stream, const SyntaxAnalyzerResult & what);
}
/// some operator<< should be declared before operator<<(... std::shared_ptr<>)

View File

@ -821,7 +821,7 @@ MutableColumnUniquePtr DataTypeLowCardinality::createColumnUniqueImpl(const IDat
return creator(static_cast<ColumnVector<UInt16> *>(nullptr));
if (typeid_cast<const DataTypeDateTime *>(type))
return creator(static_cast<ColumnVector<UInt32> *>(nullptr));
if (isNumber(type))
if (isColumnedAsNumber(type))
{
MutableColumnUniquePtr column;
TypeListNumbers::forEach(CreateColumnVector(column, *type, creator));

View File

@ -581,11 +581,18 @@ inline bool isFloat(const T & data_type)
return which.isFloat();
}
template <typename T>
inline bool isNativeNumber(const T & data_type)
{
WhichDataType which(data_type);
return which.isNativeInt() || which.isNativeUInt() || which.isFloat();
}
template <typename T>
inline bool isNumber(const T & data_type)
{
WhichDataType which(data_type);
return which.isInt() || which.isUInt() || which.isFloat();
return which.isInt() || which.isUInt() || which.isFloat() || which.isDecimal();
}
template <typename T>
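The hunk above splits the old check into two helpers: `isNativeNumber` matches only machine-native numeric types, while `isNumber` now also accepts Decimal. A minimal sketch of the difference, assuming it is compiled inside the ClickHouse source tree (which provides the `DataTypes/` headers used here):

```cpp
#include <iostream>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypesDecimal.h>

int main()
{
    using namespace DB;

    DataTypePtr u32 = std::make_shared<DataTypeUInt32>();
    DataTypePtr dec = std::make_shared<DataTypeDecimal<Decimal64>>(18, 4);

    // UInt32 is a native machine number, so it passes both checks.
    std::cout << isNativeNumber(u32) << ' ' << isNumber(u32) << '\n';   // 1 1

    // Decimal counts as a number after this change, but not as a native one,
    // so callers that need raw machine arithmetic now use isNativeNumber.
    std::cout << isNativeNumber(dec) << ' ' << isNumber(dec) << '\n';   // 0 1
}
```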

View File

@ -6,6 +6,7 @@
#include "DictionaryStructure.h"
namespace DB
{
namespace ErrorCodes
@ -47,7 +48,6 @@ void registerDictionarySourceMysql(DictionarySourceFactory & factory)
# include <Formats/MySQLBlockInputStream.h>
# include "readInvalidateQuery.h"
namespace DB
{
static const UInt64 max_block_size = 8192;
@ -71,6 +71,7 @@ MySQLDictionarySource::MySQLDictionarySource(
, query_builder{dict_struct, db, table, where, IdentifierQuotingStyle::Backticks}
, load_all_query{query_builder.composeLoadAllQuery()}
, invalidate_query{config.getString(config_prefix + ".invalidate_query", "")}
, close_connection{config.getBool(config_prefix + ".close_connection", false)}
{
}
@ -91,6 +92,7 @@ MySQLDictionarySource::MySQLDictionarySource(const MySQLDictionarySource & other
, last_modification{other.last_modification}
, invalidate_query{other.invalidate_query}
, invalidate_query_response{other.invalidate_query_response}
, close_connection{other.close_connection}
{
}
@ -117,7 +119,7 @@ BlockInputStreamPtr MySQLDictionarySource::loadAll()
last_modification = getLastModification();
LOG_TRACE(log, load_all_query);
return std::make_shared<MySQLBlockInputStream>(pool.Get(), load_all_query, sample_block, max_block_size);
return std::make_shared<MySQLBlockInputStream>(pool.Get(), load_all_query, sample_block, max_block_size, close_connection);
}
BlockInputStreamPtr MySQLDictionarySource::loadUpdatedAll()
@ -126,7 +128,7 @@ BlockInputStreamPtr MySQLDictionarySource::loadUpdatedAll()
std::string load_update_query = getUpdateFieldAndDate();
LOG_TRACE(log, load_update_query);
return std::make_shared<MySQLBlockInputStream>(pool.Get(), load_update_query, sample_block, max_block_size);
return std::make_shared<MySQLBlockInputStream>(pool.Get(), load_update_query, sample_block, max_block_size, close_connection);
}
BlockInputStreamPtr MySQLDictionarySource::loadIds(const std::vector<UInt64> & ids)
@ -134,7 +136,7 @@ BlockInputStreamPtr MySQLDictionarySource::loadIds(const std::vector<UInt64> & i
/// We do not log in here and do not update the modification time, as the request can be large, and often called.
const auto query = query_builder.composeLoadIdsQuery(ids);
return std::make_shared<MySQLBlockInputStream>(pool.Get(), query, sample_block, max_block_size);
return std::make_shared<MySQLBlockInputStream>(pool.Get(), query, sample_block, max_block_size, close_connection);
}
BlockInputStreamPtr MySQLDictionarySource::loadKeys(const Columns & key_columns, const std::vector<size_t> & requested_rows)
@ -142,7 +144,7 @@ BlockInputStreamPtr MySQLDictionarySource::loadKeys(const Columns & key_columns,
/// We do not log in here and do not update the modification time, as the request can be large, and often called.
const auto query = query_builder.composeLoadKeysQuery(key_columns, requested_rows, ExternalQueryBuilder::AND_OR_CHAIN);
return std::make_shared<MySQLBlockInputStream>(pool.Get(), query, sample_block, max_block_size);
return std::make_shared<MySQLBlockInputStream>(pool.Get(), query, sample_block, max_block_size, close_connection);
}
bool MySQLDictionarySource::isModified() const
@ -253,7 +255,7 @@ std::string MySQLDictionarySource::doInvalidateQuery(const std::string & request
Block invalidate_sample_block;
ColumnPtr column(ColumnString::create());
invalidate_sample_block.insert(ColumnWithTypeAndName(column, std::make_shared<DataTypeString>(), "Sample Block"));
MySQLBlockInputStream block_input_stream(pool.Get(), request, invalidate_sample_block, 1);
MySQLBlockInputStream block_input_stream(pool.Get(), request, invalidate_sample_block, 1, close_connection);
return readInvalidateQuery(block_input_stream);
}

View File

@ -81,6 +81,7 @@ private:
LocalDateTime last_modification;
std::string invalidate_query;
mutable std::string invalidate_query_response;
const bool close_connection;
};
}

View File

@ -340,7 +340,7 @@ bool OPTIMIZE(1) CSVRowInputStream::parseRowAndPrintDiagnosticInfo(MutableColumn
if (curr_position < prev_position)
throw Exception("Logical error: parsing is non-deterministic.", ErrorCodes::LOGICAL_ERROR);
if (isNumber(current_column_type) || isDateOrDateTime(current_column_type))
if (isNativeNumber(current_column_type) || isDateOrDateTime(current_column_type))
{
/// An empty string instead of a value.
if (curr_position == prev_position)

View File

@ -20,8 +20,8 @@ namespace ErrorCodes
MySQLBlockInputStream::MySQLBlockInputStream(
const mysqlxx::PoolWithFailover::Entry & entry, const std::string & query_str, const Block & sample_block, const UInt64 max_block_size)
: entry{entry}, query{this->entry->query(query_str)}, result{query.use()}, max_block_size{max_block_size}
const mysqlxx::PoolWithFailover::Entry & entry, const std::string & query_str, const Block & sample_block, const UInt64 max_block_size, const bool auto_close)
: entry{entry}, query{this->entry->query(query_str)}, result{query.use()}, max_block_size{max_block_size}, auto_close{auto_close}
{
if (sample_block.columns() != result.getNumFields())
throw Exception{"mysqlxx::UseQueryResult contains " + toString(result.getNumFields()) + " columns while "
@ -93,7 +93,11 @@ Block MySQLBlockInputStream::readImpl()
{
auto row = result.fetch();
if (!row)
{
if (auto_close)
entry.disconnect();
return {};
}
MutableColumns columns(description.sample_block.columns());
for (const auto i : ext::range(0, columns.size()))
@ -126,7 +130,8 @@ Block MySQLBlockInputStream::readImpl()
row = result.fetch();
}
if (auto_close)
entry.disconnect();
return description.sample_block.cloneWithColumns(std::move(columns));
}

View File

@ -18,7 +18,8 @@ public:
const mysqlxx::PoolWithFailover::Entry & entry,
const std::string & query_str,
const Block & sample_block,
const UInt64 max_block_size);
const UInt64 max_block_size,
const bool auto_close = false);
String getName() const override { return "MySQL"; }
@ -31,6 +32,7 @@ private:
mysqlxx::Query query;
mysqlxx::UseQueryResult result;
const UInt64 max_block_size;
const bool auto_close;
ExternalResultDescription description;
};
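A minimal sketch of the new `auto_close` flag, assuming the headers of this repository and an already configured `mysqlxx::PoolWithFailover`; the query and table name are hypothetical. With `auto_close = true`, the stream disconnects the pooled connection once the result set is exhausted, which is what the MySQL dictionary source's new `close_connection` option turns on.

```cpp
#include <Columns/ColumnString.h>
#include <Core/Block.h>
#include <DataTypes/DataTypeString.h>
#include <Formats/MySQLBlockInputStream.h>
#include <mysqlxx/PoolWithFailover.h>

void readAllAndRelease(mysqlxx::PoolWithFailover & pool)
{
    // Sample block describing the expected result structure, built the same way
    // as in MySQLDictionarySource::doInvalidateQuery above.
    DB::Block sample;
    DB::ColumnPtr column(DB::ColumnString::create());
    sample.insert(DB::ColumnWithTypeAndName(column, std::make_shared<DB::DataTypeString>(), "name"));

    DB::MySQLBlockInputStream stream(pool.Get(), "SELECT name FROM some_table", sample, 8192, /* auto_close = */ true);
    while (DB::Block block = stream.read())
    {
        // Process the block. After the last block, the stream calls
        // entry.disconnect(), so the connection is not kept open in the pool.
    }
}
```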

View File

@ -308,7 +308,7 @@ bool OPTIMIZE(1) TabSeparatedRowInputStream::parseRowAndPrintDiagnosticInfo(
if (curr_position < prev_position)
throw Exception("Logical error: parsing is non-deterministic.", ErrorCodes::LOGICAL_ERROR);
if (isNumber(current_column_type) || isDateOrDateTime(current_column_type))
if (isNativeNumber(current_column_type) || isDateOrDateTime(current_column_type))
{
/// An empty string instead of a value.
if (curr_position == prev_position)

View File

@ -263,7 +263,7 @@ public:
+ toString(arguments.size()) + ", should be 2 or 3",
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
if (!isNumber(arguments[1].type))
if (!isNativeNumber(arguments[1].type))
throw Exception("Second argument for function " + getName() + " (delta) must be number",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);

View File

@ -59,7 +59,7 @@ private:
{
const auto check_argument_type = [this] (const IDataType * arg)
{
if (!isNumber(arg))
if (!isNativeNumber(arg))
throw Exception{"Illegal type " + arg->getName() + " of argument of function " + getName(),
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};
};

View File

@ -56,7 +56,7 @@ private:
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
const auto & arg = arguments.front();
if (!isNumber(arg) && !isDecimal(arg))
if (!isNumber(arg))
throw Exception{"Illegal type " + arg->getName() + " of argument of function " + getName(),
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};

View File

@ -37,7 +37,7 @@ public:
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
if (!isNumber(arguments.front()))
if (!isNativeNumber(arguments.front()))
throw Exception{"Argument for function " + getName() + " must be number", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};
return std::make_shared<DataTypeUInt8>();

View File

@ -369,6 +369,13 @@ public:
throw Exception(
"Second argument for function " + getName() + " must be an bitmap but it has type " + arguments[1]->getName() + ".",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
if (bitmap_type0->getArgumentsDataTypes()[0]->getTypeId() != bitmap_type1->getArgumentsDataTypes()[0]->getTypeId())
throw Exception(
"The nested type in bitmaps must be the same, but one is " + bitmap_type0->getArgumentsDataTypes()[0]->getName()
+ ", and the other is " + bitmap_type1->getArgumentsDataTypes()[0]->getName(),
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
return std::make_shared<DataTypeNumber<ToType>>();
}
@ -487,6 +494,13 @@ public:
throw Exception(
"Second argument for function " + getName() + " must be an bitmap but it has type " + arguments[1]->getName() + ".",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
if (bitmap_type0->getArgumentsDataTypes()[0]->getTypeId() != bitmap_type1->getArgumentsDataTypes()[0]->getTypeId())
throw Exception(
"The nested type in bitmaps must be the same, but one is " + bitmap_type0->getArgumentsDataTypes()[0]->getName()
+ ", and the other is " + bitmap_type1->getArgumentsDataTypes()[0]->getName(),
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
return arguments[0];
}

View File

@ -20,7 +20,7 @@ void throwExceptionForIncompletelyParsedValue(
else
message_buf << " at begin of string";
if (isNumber(to_type))
if (isNativeNumber(to_type))
message_buf << ". Note: there are to" << to_type.getName() << "OrZero and to" << to_type.getName() << "OrNull functions, which returns zero/NULL instead of throwing exception.";
throw Exception(message_buf.str(), ErrorCodes::CANNOT_PARSE_TEXT);

View File

@ -1785,7 +1785,7 @@ private:
return createStringToEnumWrapper<ColumnString, EnumType>();
else if (checkAndGetDataType<DataTypeFixedString>(from_type.get()))
return createStringToEnumWrapper<ColumnFixedString, EnumType>();
else if (isNumber(from_type) || isEnum(from_type))
else if (isNativeNumber(from_type) || isEnum(from_type))
{
auto function = Function::create(context);

View File

@ -111,7 +111,7 @@ public:
const auto type_x = arguments[0];
if (!isNumber(type_x))
if (!isNativeNumber(type_x))
throw Exception{"Unsupported type " + type_x->getName() + " of first argument of function " + getName() + " must be a numeric type",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};

View File

@ -143,7 +143,7 @@ public:
{
const IDataType & type = *arguments[0];
if (!isNumber(type))
if (!isNativeNumber(type))
throw Exception("Cannot format " + type.getName() + " as size in bytes", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
return std::make_shared<DataTypeString>();

View File

@ -138,7 +138,7 @@ public:
for (auto j : ext::range(0, elements.size()))
{
if (!isNumber(elements[j]))
if (!isNativeNumber(elements[j]))
{
throw Exception(getMsgPrefix(i) + " must contains numeric tuple at position " + toString(j + 1),
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);

View File

@ -309,8 +309,8 @@ public:
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
for (size_t i = 0; i < arguments.size(); ++i)
if (!(isNumber(arguments[i])
|| (Impl::specialImplementationForNulls() && (arguments[i]->onlyNull() || isNumber(removeNullable(arguments[i]))))))
if (!(isNativeNumber(arguments[i])
|| (Impl::specialImplementationForNulls() && (arguments[i]->onlyNull() || isNativeNumber(removeNullable(arguments[i]))))))
throw Exception("Illegal type ("
+ arguments[i]->getName()
+ ") of " + toString(i + 1) + " argument of function " + getName(),
@ -488,7 +488,7 @@ public:
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
if (!isNumber(arguments[0]))
if (!isNativeNumber(arguments[0]))
throw Exception("Illegal type ("
+ arguments[0]->getName()
+ ") of argument of function " + getName(),

View File

@ -500,7 +500,7 @@ public:
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
for (const auto & type : arguments)
if (!isNumber(type) && !isDecimal(type))
if (!isNumber(type))
throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(),
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
@ -588,7 +588,7 @@ public:
{
const DataTypePtr & type_x = arguments[0];
if (!(isNumber(type_x) || isDecimal(type_x)))
if (!isNumber(type_x))
throw Exception{"Unsupported type " + type_x->getName()
+ " of first argument of function " + getName()
+ ", must be numeric type.", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};
@ -601,7 +601,7 @@ public:
const auto type_arr_nested = type_arr->getNestedType();
if (!(isNumber(type_arr_nested) || isDecimal(type_arr_nested)))
if (!isNumber(type_arr_nested))
{
throw Exception{"Elements of array of second argument of function " + getName()
+ " must be numeric type.", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};

View File

@ -595,7 +595,7 @@ inline bool allowArrayIndex(const DataTypePtr & type0, const DataTypePtr & type1
DataTypePtr data_type0 = removeNullable(type0);
DataTypePtr data_type1 = removeNullable(type1);
return ((isNumber(data_type0) || isEnum(data_type0)) && isNumber(data_type1))
return ((isNativeNumber(data_type0) || isEnum(data_type0)) && isNativeNumber(data_type1))
|| data_type0->equals(*data_type1);
}

View File

@ -183,7 +183,7 @@ Columns FunctionArrayIntersect::castColumns(
auto & type_nested = type_array->getNestedType();
auto type_not_nullable_nested = removeNullable(type_nested);
const bool is_numeric_or_string = isNumber(type_not_nullable_nested)
const bool is_numeric_or_string = isNativeNumber(type_not_nullable_nested)
|| isDateOrDateTime(type_not_nullable_nested)
|| isStringOrFixedString(type_not_nullable_nested);

View File

@ -37,7 +37,7 @@ public:
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
if (!isNumber(arguments[0]))
if (!isNativeNumber(arguments[0]))
throw Exception("Illegal type " + arguments[0]->getName() +
" of argument of function " + getName() +
", expected Integer", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);

View File

@ -55,8 +55,8 @@ public:
+ ".",
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
if (!isNumber(arguments[0]) || !isNumber(arguments[1]) || !isNumber(arguments[2])
|| (arguments.size() == 4 && !isNumber(arguments[3])))
if (!isNativeNumber(arguments[0]) || !isNativeNumber(arguments[1]) || !isNativeNumber(arguments[2])
|| (arguments.size() == 4 && !isNativeNumber(arguments[3])))
throw Exception("All arguments for function " + getName() + " must be numeric.", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
return std::make_shared<DataTypeString>();

View File

@ -62,13 +62,13 @@ public:
if ((is_utf8 && !isString(arguments[0])) || !isStringOrFixedString(arguments[0]))
throw Exception("Illegal type " + arguments[0]->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
if (!isNumber(arguments[1]))
if (!isNativeNumber(arguments[1]))
throw Exception("Illegal type " + arguments[1]->getName()
+ " of second argument of function "
+ getName(),
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
if (number_of_arguments == 3 && !isNumber(arguments[2]))
if (number_of_arguments == 3 && !isNativeNumber(arguments[2]))
throw Exception("Illegal type " + arguments[2]->getName()
+ " of second argument of function "
+ getName(),

View File

@ -39,7 +39,7 @@ public:
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
if (!isNumber(arguments.front()))
if (!isNativeNumber(arguments.front()))
throw Exception{"Argument for function " + getName() + " must be number", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};
return std::make_shared<DataTypeUInt8>();

View File

@ -4,6 +4,7 @@
#include <Interpreters/CrossToInnerJoinVisitor.h>
#include <Interpreters/DatabaseAndTableWithAlias.h>
#include <Interpreters/IdentifierSemantic.h>
#include <Interpreters/QueryNormalizer.h> // for functionIsInOperator
#include <Parsers/ASTSelectQuery.h>
#include <Parsers/ASTTablesInSelectQuery.h>
#include <Parsers/ASTIdentifier.h>
@ -120,6 +121,12 @@ public:
{
/// leave other comparisons as is
}
else if (functionIsInOperator(node.name)) /// IN, NOT IN
{
if (auto ident = node.arguments->children.at(0)->as<ASTIdentifier>())
if (size_t min_table = checkIdentifier(*ident))
asts_to_join_on[min_table].push_back(ast);
}
else
{
ands_only = false;
@ -173,7 +180,7 @@ private:
/// @return table position to attach expression to or 0.
size_t checkIdentifiers(const ASTIdentifier & left, const ASTIdentifier & right)
{
/// {best_match, berst_table_pos}
/// {best_match, best_table_pos}
std::pair<size_t, size_t> left_best{0, 0};
std::pair<size_t, size_t> right_best{0, 0};
@ -202,6 +209,26 @@ private:
}
return 0;
}
size_t checkIdentifier(const ASTIdentifier & identifier)
{
size_t best_match = 0;
size_t best_table_pos = 0;
for (size_t i = 0; i < tables.size(); ++i)
{
size_t match = IdentifierSemantic::canReferColumnToTable(identifier, tables[i].table);
if (match > best_match)
{
best_match = match;
best_table_pos = i;
}
}
if (best_match && tables[best_table_pos].canAttachOnExpression())
return best_table_pos;
return 0;
}
};
using CheckExpressionMatcher = OneTypeMatcher<CheckExpressionVisitorData, false>;

View File

@ -849,8 +849,19 @@ bool ExpressionAnalyzer::appendLimitBy(ExpressionActionsChain & chain, bool only
getRootActions(select_query->limitBy(), only_types, step.actions);
NameSet aggregated_names;
for (const auto & column : aggregated_columns)
{
step.required_output.push_back(column.name);
aggregated_names.insert(column.name);
}
for (const auto & child : select_query->limitBy()->children)
step.required_output.push_back(child->getColumnName());
{
auto child_name = child->getColumnName();
if (!aggregated_names.count(child_name))
step.required_output.push_back(std::move(child_name));
}
return true;
}

View File

@ -851,7 +851,7 @@ static UInt64 getLimitUIntValue(const ASTPtr & node, const Context & context)
{
const auto & [field, type] = evaluateConstantExpression(node, context);
if (!isNumber(type))
if (!isNativeNumber(type))
throw Exception("Illegal type " + type->getName() + " of LIMIT expression, must be numeric type", ErrorCodes::INVALID_LIMIT_EXPRESSION);
Field converted = convertFieldToType(field, DataTypeUInt64());

View File

@ -1,12 +1,14 @@
#include "evaluateMissingDefaults.h"
#include <Core/Block.h>
#include <Storages/ColumnDefault.h>
#include <Interpreters/SyntaxAnalyzer.h>
#include <Interpreters/ExpressionAnalyzer.h>
#include <Interpreters/ExpressionActions.h>
#include <Interpreters/evaluateMissingDefaults.h>
#include <Parsers/ASTExpressionList.h>
#include <Parsers/ASTWithAlias.h>
#include <utility>
#include <DataTypes/DataTypesNumber.h>
namespace DB
@ -58,7 +60,29 @@ void evaluateMissingDefaults(Block & block,
Block copy_block{block};
auto syntax_result = SyntaxAnalyzer(context).analyze(default_expr_list, block.getNamesAndTypesList());
ExpressionAnalyzer{default_expr_list, syntax_result, context}.getActions(true)->execute(copy_block);
auto expression_analyzer = ExpressionAnalyzer{default_expr_list, syntax_result, context};
auto required_source_columns = expression_analyzer.getRequiredSourceColumns();
auto rows_was = copy_block.rows();
// Delete all not needed columns in DEFAULT expression.
// They can intersect with columns added in PREWHERE
// test 00950_default_prewhere
// CLICKHOUSE-4523
for (const auto & delete_column : copy_block.getNamesAndTypesList())
{
if (std::find(required_source_columns.begin(), required_source_columns.end(), delete_column.name) == required_source_columns.end())
{
copy_block.erase(delete_column.name);
}
}
if (copy_block.columns() == 0)
{
// Add column to indicate block size in execute()
copy_block.insert({DataTypeUInt8().createColumnConst(rows_was, 0u), std::make_shared<DataTypeUInt8>(), "__dummy"});
}
expression_analyzer.getActions(true)->execute(copy_block);
/// move evaluated columns to the original block, materializing them at the same time
size_t pos = 0;
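The change above keeps in the copied block only the columns that the DEFAULT expression actually needs, and inserts a constant `__dummy` column when nothing remains, so that `execute()` still knows the row count. A generic, self-contained C++ sketch of that pruning step; the `std::map`-based `Block` and the function name are stand-ins for illustration, not the ClickHouse types.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

using Column = std::vector<int>;                 // stands in for a real column
using Block  = std::map<std::string, Column>;    // column name -> column

void pruneToRequired(Block & block, const std::set<std::string> & required, std::size_t rows)
{
    for (auto it = block.begin(); it != block.end();)
    {
        if (!required.count(it->first))
            it = block.erase(it);                // not needed by the DEFAULT expression
        else
            ++it;
    }

    if (block.empty())
        block["__dummy"] = Column(rows, 0);      // preserves the row count, like the const UInt8 column
}
```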

View File

@ -126,8 +126,15 @@ void StorageKafka::startup()
for (size_t i = 0; i < num_consumers; ++i)
{
// Make buffer available
pushBuffer(createBuffer());
++num_created_consumers;
try
{
pushBuffer(createBuffer());
++num_created_consumers;
}
catch (const cppkafka::Exception &)
{
tryLogCurrentException(log);
}
}
// Start the reader thread

View File

@ -706,7 +706,7 @@ bool KeyCondition::atomFromAST(const ASTPtr & node, const Context & context, Blo
bool cast_not_needed =
is_set_const /// Set args are already casted inside Set::createFromAST
|| (isNumber(key_expr_type) && isNumber(const_type)); /// Numbers are accurately compared without cast.
|| (isNativeNumber(key_expr_type) && isNativeNumber(const_type)); /// Numbers are accurately compared without cast.
if (!cast_not_needed)
castValueToType(key_expr_type, const_value, const_type, node);

View File

@ -2568,6 +2568,39 @@ MergeTreeData::DataPartsVector MergeTreeData::getAllDataPartsVector(MergeTreeDat
return res;
}
std::vector<DetachedPartInfo>
MergeTreeData::getDetachedParts() const
{
std::vector<DetachedPartInfo> res;
for (Poco::DirectoryIterator it(full_path + "detached");
it != Poco::DirectoryIterator(); ++it)
{
auto dir_name = it.name();
res.emplace_back();
auto & part = res.back();
/// First, try to parse as <part_name>.
if (MergeTreePartInfo::tryParsePartName(dir_name, &part, format_version))
continue;
/// Next, as <prefix>_<partname>. Use entire name as prefix if it fails.
part.prefix = dir_name;
const auto first_separator = dir_name.find_first_of('_');
if (first_separator == String::npos)
continue;
const auto part_name = dir_name.substr(first_separator + 1,
dir_name.size() - first_separator - 1);
if (!MergeTreePartInfo::tryParsePartName(part_name, &part, format_version))
continue;
part.prefix = dir_name.substr(0, first_separator);
}
return res;
}
MergeTreeData::DataParts MergeTreeData::getDataParts(const DataPartStates & affordable_states) const
{
DataParts res;

View File

@ -413,6 +413,9 @@ public:
/// Returns absolutely all parts (and snapshot of their states)
DataPartsVector getAllDataPartsVector(DataPartStateVector * out_states = nullptr) const;
/// Returns all detached parts
std::vector<DetachedPartInfo> getDetachedParts() const;
/// Returns Committed parts
DataParts getDataParts() const;
DataPartsVector getDataPartsVector() const;

View File

@ -437,25 +437,26 @@ void MergeTreeDataPart::renameTo(const String & new_relative_path, bool remove_n
String MergeTreeDataPart::getRelativePathForDetachedPart(const String & prefix) const
{
/// Do not allow underscores in the prefix because they are used as separators.
assert(prefix.find_first_of('_') == String::npos);
String res;
unsigned try_no = 0;
auto dst_name = [&, this] { return "detached/" + prefix + name + (try_no ? "_try" + DB::toString(try_no) : ""); };
/** If you need to detach a part, and directory into which we want to rename it already exists,
* we will rename to the directory with the name to which the suffix is added in the form of "_tryN".
* This is done only in the case of `to_detached`, because it is assumed that in this case the exact name does not matter.
* No more than 10 attempts are made so that there are not too many junk directories left.
*/
while (try_no < 10)
for (int try_no = 0; try_no < 10; try_no++)
{
res = dst_name();
res = "detached/" + (prefix.empty() ? "" : prefix + "_")
+ name + (try_no ? "_try" + DB::toString(try_no) : "");
if (!Poco::File(storage.full_path + res).exists())
return res;
LOG_WARNING(storage.log, "Directory " << dst_name() << " (to detach to) is already exist."
LOG_WARNING(storage.log, "Directory " << res << " (to detach to) already exists."
" Will detach to directory with '_tryN' suffix.");
++try_no;
}
return res;

View File

@ -32,9 +32,21 @@ void MergeTreeMinMaxGranule::serializeBinary(WriteBuffer & ostr) const
for (size_t i = 0; i < index.columns.size(); ++i)
{
const DataTypePtr & type = index.data_types[i];
type->serializeBinary(parallelogram[i].left, ostr);
type->serializeBinary(parallelogram[i].right, ostr);
if (!type->isNullable())
{
type->serializeBinary(parallelogram[i].left, ostr);
type->serializeBinary(parallelogram[i].right, ostr);
}
else
{
bool is_null = parallelogram[i].left.isNull() || parallelogram[i].right.isNull(); // one is enough
writeBinary(is_null, ostr);
if (!is_null)
{
type->serializeBinary(parallelogram[i].left, ostr);
type->serializeBinary(parallelogram[i].right, ostr);
}
}
}
}
@ -46,9 +58,26 @@ void MergeTreeMinMaxGranule::deserializeBinary(ReadBuffer & istr)
for (size_t i = 0; i < index.columns.size(); ++i)
{
const DataTypePtr & type = index.data_types[i];
type->deserializeBinary(min_val, istr);
type->deserializeBinary(max_val, istr);
if (!type->isNullable())
{
type->deserializeBinary(min_val, istr);
type->deserializeBinary(max_val, istr);
}
else
{
bool is_null;
readBinary(is_null, istr);
if (!is_null)
{
type->deserializeBinary(min_val, istr);
type->deserializeBinary(max_val, istr);
}
else
{
min_val = Null();
max_val = Null();
}
}
parallelogram.emplace_back(min_val, true, max_val, true);
}
}
@ -111,6 +140,9 @@ bool MinMaxCondition::mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx_granule) c
if (!granule)
throw Exception(
"Minmax index condition got a granule with the wrong type.", ErrorCodes::LOGICAL_ERROR);
for (const auto & range : granule->parallelogram)
if (range.left.isNull() || range.right.isNull())
return true;
return condition.mayBeTrueInParallelogram(granule->parallelogram, index.data_types);
}

View File

@ -52,6 +52,12 @@ bool MergeTreePartInfo::tryParsePartName(const String & dir_name, MergeTreePartI
}
}
/// Sanity check
if (partition_id.empty())
{
return false;
}
Int64 min_block_num = 0;
Int64 max_block_num = 0;
UInt32 level = 0;
@ -66,6 +72,12 @@ bool MergeTreePartInfo::tryParsePartName(const String & dir_name, MergeTreePartI
return false;
}
/// Sanity check
if (min_block_num > max_block_num)
{
return false;
}
if (!in.eof())
{
if (!checkChar('_', in)

View File

@ -88,4 +88,11 @@ struct MergeTreePartInfo
static constexpr UInt32 MAX_BLOCK_NUMBER = 999999999;
};
/// Information about detached part, which includes its prefix in
/// addition to the above fields.
struct DetachedPartInfo : public MergeTreePartInfo
{
String prefix;
};
}

View File

@ -259,7 +259,7 @@ void ReplicatedMergeTreePartCheckThread::checkPart(const String & part_name)
storage.removePartAndEnqueueFetch(part_name);
/// Delete part locally.
storage.forgetPartAndMoveToDetached(part, "broken_");
storage.forgetPartAndMoveToDetached(part, "broken");
}
}
else if (part->modification_time + MAX_AGE_OF_LOCAL_PART_THAT_WASNT_ADDED_TO_ZOOKEEPER < time(nullptr))
@ -270,7 +270,7 @@ void ReplicatedMergeTreePartCheckThread::checkPart(const String & part_name)
ProfileEvents::increment(ProfileEvents::ReplicatedPartChecksFailed);
LOG_ERROR(log, "Unexpected part " << part_name << " in filesystem. Removing.");
storage.forgetPartAndMoveToDetached(part, "unexpected_");
storage.forgetPartAndMoveToDetached(part, "unexpected");
}
else
{

View File

@ -245,7 +245,7 @@ void ReplicatedMergeTreeRestartingThread::removeFailedQuorumParts()
if (part)
{
LOG_DEBUG(log, "Found part " << part_name << " with failed quorum. Moving to detached. This shouldn't happen often.");
storage.forgetPartAndMoveToDetached(part, "noquorum_");
storage.forgetPartAndMoveToDetached(part, "noquorum");
storage.queue.removeFromVirtualParts(part->info);
}
}

View File

@ -41,6 +41,7 @@ namespace ErrorCodes
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
extern const int NO_SUCH_COLUMN_IN_TABLE;
extern const int BLOCKS_HAVE_DIFFERENT_STRUCTURE;
extern const int SAMPLING_NOT_SUPPORTED;
}
@ -218,6 +219,7 @@ BlockInputStreams StorageMerge::read(
query_info.query, has_table_virtual_column, true, context.getCurrentQueryId());
if (selected_tables.empty())
/// FIXME: do we support sampling in this case?
return createSourceStreams(
query_info, processed_stage, max_block_size, header, {}, {}, real_column_names, modified_context, 0, has_table_virtual_column);
@ -234,6 +236,10 @@ BlockInputStreams StorageMerge::read(
StoragePtr storage = it->first;
TableStructureReadLockHolder struct_lock = it->second;
/// If sampling requested, then check that table supports it.
if (query_info.query->as<ASTSelectQuery>()->sample_size() && !storage->supportsSampling())
throw Exception("Illegal SAMPLE: table doesn't support sampling", ErrorCodes::SAMPLING_NOT_SUPPORTED);
BlockInputStreams source_streams;
if (current_streams)

View File

@ -689,7 +689,7 @@ void StorageReplicatedMergeTree::checkParts(bool skip_sanity_checks)
for (const DataPartPtr & part : unexpected_parts)
{
LOG_ERROR(log, "Renaming unexpected part " << part->name << " to ignored_" + part->name);
forgetPartAndMoveToDetached(part, "ignored_", true);
forgetPartAndMoveToDetached(part, "ignored", true);
}
}

View File

@ -0,0 +1,84 @@
#include <Storages/System/StorageSystemDetachedParts.h>
#include <DataTypes/DataTypeString.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataStreams/OneBlockInputStream.h>
#include <ext/shared_ptr_helper.h>
#include <Storages/IStorage.h>
#include <Storages/System/StorageSystemPartsBase.h>
namespace DB
{
/**
* Implements the system table 'detached_parts', which allows getting information
* about detached data parts for tables of the MergeTree family.
* We don't use StorageSystemPartsBase, because it introduces virtual _state
* column and column aliases which we don't need.
*/
class StorageSystemDetachedParts :
public ext::shared_ptr_helper<StorageSystemDetachedParts>,
public IStorage
{
public:
std::string getName() const override { return "SystemDetachedParts"; }
std::string getTableName() const override { return "detached_parts"; }
protected:
explicit StorageSystemDetachedParts()
{
setColumns(ColumnsDescription{{
{"database", std::make_shared<DataTypeString>()},
{"table", std::make_shared<DataTypeString>()},
{"partition_id", std::make_shared<DataTypeString>()},
{"name", std::make_shared<DataTypeString>()},
{"reason", std::make_shared<DataTypeString>()},
{"min_block_number", std::make_shared<DataTypeInt64>()},
{"max_block_number", std::make_shared<DataTypeInt64>()},
{"level", std::make_shared<DataTypeUInt32>()}
}});
}
BlockInputStreams read(
const Names & /* column_names */,
const SelectQueryInfo & query_info,
const Context & context,
QueryProcessingStage::Enum /*processed_stage*/,
const size_t /*max_block_size*/,
const unsigned /*num_streams*/) override
{
StoragesInfoStream stream(query_info, context);
/// Create the result.
Block block = getSampleBlock();
MutableColumns columns = block.cloneEmptyColumns();
while (StoragesInfo info = stream.next())
{
const auto parts = info.data->getDetachedParts();
for (auto & p : parts)
{
int i = 0;
columns[i++]->insert(info.database);
columns[i++]->insert(info.table);
columns[i++]->insert(p.partition_id);
columns[i++]->insert(p.getPartName());
columns[i++]->insert(p.prefix);
columns[i++]->insert(p.min_block);
columns[i++]->insert(p.max_block);
columns[i++]->insert(p.level);
}
}
return BlockInputStreams(1, std::make_shared<OneBlockInputStream>(
block.cloneWithColumns(std::move(columns))));
}
};
StoragePtr
createDetachedPartsTable()
{
return StorageSystemDetachedParts::create();
}
}
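Once attached to the `system` database (see the `attachSystemTablesServer` change later in this commit), the new table can be queried like any other system table; a minimal sketch:

```sql
-- List detached parts grouped by the reason prefix
-- (e.g. 'ignored', 'broken', 'unexpected').
SELECT database, table, reason, count() AS parts
FROM system.detached_parts
GROUP BY database, table, reason
ORDER BY parts DESC;
```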

View File

@ -0,0 +1,10 @@
#pragma once
#include <Storages/IStorage_fwd.h>
namespace DB
{
StoragePtr createDetachedPartsTable();
}

View File

@ -51,11 +51,16 @@ StorageSystemParts::StorageSystemParts(const std::string & name)
void StorageSystemParts::processNextStorage(MutableColumns & columns, const StoragesInfo & info, bool has_state_column)
{
using State = MergeTreeDataPart::State;
MergeTreeData::DataPartStateVector all_parts_state;
MergeTreeData::DataPartsVector all_parts;
for (size_t part_number = 0; part_number < info.all_parts.size(); ++part_number)
all_parts = info.getParts(all_parts_state, has_state_column);
for (size_t part_number = 0; part_number < all_parts.size(); ++part_number)
{
const auto & part = info.all_parts[part_number];
auto part_state = info.all_parts_state[part_number];
const auto & part = all_parts[part_number];
auto part_state = all_parts_state[part_number];
MergeTreeDataPart::ColumnSize columns_size = part->getTotalColumnsSize();
size_t i = 0;

View File

@ -17,7 +17,7 @@
namespace DB
{
bool StorageSystemPartsBase::hasStateColumn(const Names & column_names)
bool StorageSystemPartsBase::hasStateColumn(const Names & column_names) const
{
bool has_state_column = false;
Names real_column_names;
@ -37,192 +37,172 @@ bool StorageSystemPartsBase::hasStateColumn(const Names & column_names)
return has_state_column;
}
class StoragesInfoStream
MergeTreeData::DataPartsVector
StoragesInfo::getParts(MergeTreeData::DataPartStateVector & state, bool has_state_column) const
{
public:
StoragesInfoStream(const SelectQueryInfo & query_info, const Context & context, bool has_state_column)
: query_id(context.getCurrentQueryId())
, has_state_column(has_state_column)
using State = MergeTreeData::DataPartState;
if (need_inactive_parts)
{
/// Will apply WHERE to subset of columns and then add more columns.
/// This is kind of complicated, but we use WHERE to do less work.
/// If has_state_column is requested, return all states.
if (!has_state_column)
return data->getDataPartsVector({State::Committed, State::Outdated}, &state);
Block block_to_filter;
return data->getAllDataPartsVector(&state);
}
MutableColumnPtr table_column_mut = ColumnString::create();
MutableColumnPtr engine_column_mut = ColumnString::create();
MutableColumnPtr active_column_mut = ColumnUInt8::create();
return data->getDataPartsVector({State::Committed}, &state);
}
StoragesInfoStream::StoragesInfoStream(const SelectQueryInfo & query_info, const Context & context)
: query_id(context.getCurrentQueryId())
{
/// Will apply WHERE to subset of columns and then add more columns.
/// This is kind of complicated, but we use WHERE to do less work.
Block block_to_filter;
MutableColumnPtr table_column_mut = ColumnString::create();
MutableColumnPtr engine_column_mut = ColumnString::create();
MutableColumnPtr active_column_mut = ColumnUInt8::create();
{
Databases databases = context.getDatabases();
/// Add column 'database'.
MutableColumnPtr database_column_mut = ColumnString::create();
for (const auto & database : databases)
{
Databases databases = context.getDatabases();
/// Add column 'database'.
MutableColumnPtr database_column_mut = ColumnString::create();
for (const auto & database : databases)
{
if (context.hasDatabaseAccessRights(database.first))
database_column_mut->insert(database.first);
}
block_to_filter.insert(ColumnWithTypeAndName(
std::move(database_column_mut), std::make_shared<DataTypeString>(), "database"));
/// Filter block_to_filter with column 'database'.
VirtualColumnUtils::filterBlockWithQuery(query_info.query, block_to_filter, context);
rows = block_to_filter.rows();
/// Block contains new columns, update database_column.
ColumnPtr database_column_ = block_to_filter.getByName("database").column;
if (rows)
{
/// Add columns 'table', 'engine', 'active'
IColumn::Offsets offsets(rows);
for (size_t i = 0; i < rows; ++i)
{
String database_name = (*database_column_)[i].get<String>();
const DatabasePtr database = databases.at(database_name);
offsets[i] = i ? offsets[i - 1] : 0;
for (auto iterator = database->getIterator(context); iterator->isValid(); iterator->next())
{
String table_name = iterator->name();
StoragePtr storage = iterator->table();
String engine_name = storage->getName();
if (!dynamic_cast<MergeTreeData *>(storage.get()))
continue;
storages[std::make_pair(database_name, iterator->name())] = storage;
/// Add all combinations of flag 'active'.
for (UInt64 active : {0, 1})
{
table_column_mut->insert(table_name);
engine_column_mut->insert(engine_name);
active_column_mut->insert(active);
}
offsets[i] += 2;
}
}
for (size_t i = 0; i < block_to_filter.columns(); ++i)
{
ColumnPtr & column = block_to_filter.safeGetByPosition(i).column;
column = column->replicate(offsets);
}
}
if (context.hasDatabaseAccessRights(database.first))
database_column_mut->insert(database.first);
}
block_to_filter.insert(ColumnWithTypeAndName(
std::move(database_column_mut), std::make_shared<DataTypeString>(), "database"));
block_to_filter.insert(ColumnWithTypeAndName(std::move(table_column_mut), std::make_shared<DataTypeString>(), "table"));
block_to_filter.insert(ColumnWithTypeAndName(std::move(engine_column_mut), std::make_shared<DataTypeString>(), "engine"));
block_to_filter.insert(ColumnWithTypeAndName(std::move(active_column_mut), std::make_shared<DataTypeUInt8>(), "active"));
/// Filter block_to_filter with column 'database'.
VirtualColumnUtils::filterBlockWithQuery(query_info.query, block_to_filter, context);
rows = block_to_filter.rows();
/// Block contains new columns, update database_column.
ColumnPtr database_column_ = block_to_filter.getByName("database").column;
if (rows)
{
/// Filter block_to_filter with columns 'database', 'table', 'engine', 'active'.
VirtualColumnUtils::filterBlockWithQuery(query_info.query, block_to_filter, context);
rows = block_to_filter.rows();
/// Add columns 'table', 'engine', 'active'
IColumn::Offsets offsets(rows);
for (size_t i = 0; i < rows; ++i)
{
String database_name = (*database_column_)[i].get<String>();
const DatabasePtr database = databases.at(database_name);
offsets[i] = i ? offsets[i - 1] : 0;
for (auto iterator = database->getIterator(context); iterator->isValid(); iterator->next())
{
String table_name = iterator->name();
StoragePtr storage = iterator->table();
String engine_name = storage->getName();
if (!dynamic_cast<MergeTreeData *>(storage.get()))
continue;
storages[std::make_pair(database_name, iterator->name())] = storage;
/// Add all combinations of flag 'active'.
for (UInt64 active : {0, 1})
{
table_column_mut->insert(table_name);
engine_column_mut->insert(engine_name);
active_column_mut->insert(active);
}
offsets[i] += 2;
}
}
for (size_t i = 0; i < block_to_filter.columns(); ++i)
{
ColumnPtr & column = block_to_filter.safeGetByPosition(i).column;
column = column->replicate(offsets);
}
}
database_column = block_to_filter.getByName("database").column;
table_column = block_to_filter.getByName("table").column;
active_column = block_to_filter.getByName("active").column;
next_row = 0;
}
StorageSystemPartsBase::StoragesInfo next()
block_to_filter.insert(ColumnWithTypeAndName(std::move(table_column_mut), std::make_shared<DataTypeString>(), "table"));
block_to_filter.insert(ColumnWithTypeAndName(std::move(engine_column_mut), std::make_shared<DataTypeString>(), "engine"));
block_to_filter.insert(ColumnWithTypeAndName(std::move(active_column_mut), std::make_shared<DataTypeUInt8>(), "active"));
if (rows)
{
StorageSystemPartsBase::StoragesInfo info;
info.storage = nullptr;
while (next_row < rows)
{
info.database = (*database_column)[next_row].get<String>();
info.table = (*table_column)[next_row].get<String>();
auto isSameTable = [&info, this] (size_t row) -> bool
{
return (*database_column)[row].get<String>() == info.database &&
(*table_column)[row].get<String>() == info.table;
};
/// What 'active' value we need.
bool need[2]{}; /// [active]
for (; next_row < rows && isSameTable(next_row); ++next_row)
{
bool active = (*active_column)[next_row].get<UInt64>() != 0;
need[active] = true;
}
info.storage = storages.at(std::make_pair(info.database, info.table));
try
{
/// For table not to be dropped and set of columns to remain constant.
info.table_lock = info.storage->lockStructureForShare(false, query_id);
}
catch (const Exception & e)
{
/** There are case when IStorage::drop was called,
* but we still own the object.
* Then table will throw exception at attempt to lock it.
* Just skip the table.
*/
if (e.code() == ErrorCodes::TABLE_IS_DROPPED)
continue;
throw;
}
info.engine = info.storage->getName();
info.data = dynamic_cast<MergeTreeData *>(info.storage.get());
if (!info.data)
throw Exception("Unknown engine " + info.engine, ErrorCodes::LOGICAL_ERROR);
using State = MergeTreeDataPart::State;
auto & all_parts_state = info.all_parts_state;
auto & all_parts = info.all_parts;
if (need[0])
{
/// If has_state_column is requested, return all states.
if (!has_state_column)
all_parts = info.data->getDataPartsVector({State::Committed, State::Outdated}, &all_parts_state);
else
all_parts = info.data->getAllDataPartsVector(&all_parts_state);
}
else
all_parts = info.data->getDataPartsVector({State::Committed}, &all_parts_state);
break;
}
return info;
/// Filter block_to_filter with columns 'database', 'table', 'engine', 'active'.
VirtualColumnUtils::filterBlockWithQuery(query_info.query, block_to_filter, context);
rows = block_to_filter.rows();
}
private:
String query_id;
database_column = block_to_filter.getByName("database").column;
table_column = block_to_filter.getByName("table").column;
active_column = block_to_filter.getByName("active").column;
bool has_state_column;
next_row = 0;
}
ColumnPtr database_column;
ColumnPtr table_column;
ColumnPtr active_column;
StoragesInfo StoragesInfoStream::next()
{
StoragesInfo info;
size_t next_row;
size_t rows;
while (next_row < rows)
{
using StoragesMap = std::map<std::pair<String, String>, StoragePtr>;
StoragesMap storages;
};
info.database = (*database_column)[next_row].get<String>();
info.table = (*table_column)[next_row].get<String>();
auto isSameTable = [&info, this] (size_t row) -> bool
{
return (*database_column)[row].get<String>() == info.database &&
(*table_column)[row].get<String>() == info.table;
};
/// We may have two rows per table which differ in 'active' value.
/// If rows with 'active = 0' were not filtered out, this means we
/// must collect the inactive parts. Remember this fact in StoragesInfo.
for (; next_row < rows && isSameTable(next_row); ++next_row)
{
const auto active = (*active_column)[next_row].get<UInt64>();
if (active == 0)
info.need_inactive_parts = true;
}
info.storage = storages.at(std::make_pair(info.database, info.table));
try
{
/// For table not to be dropped and set of columns to remain constant.
info.table_lock = info.storage->lockStructureForShare(false, query_id);
}
catch (const Exception & e)
{
/** There are case when IStorage::drop was called,
* but we still own the object.
* Then table will throw exception at attempt to lock it.
* Just skip the table.
*/
if (e.code() == ErrorCodes::TABLE_IS_DROPPED)
continue;
throw;
}
info.engine = info.storage->getName();
info.data = dynamic_cast<MergeTreeData *>(info.storage.get());
if (!info.data)
throw Exception("Unknown engine " + info.engine, ErrorCodes::LOGICAL_ERROR);
break;
}
return info;
}
BlockInputStreams StorageSystemPartsBase::read(
const Names & column_names,
@ -234,7 +214,7 @@ BlockInputStreams StorageSystemPartsBase::read(
{
bool has_state_column = hasStateColumn(column_names);
StoragesInfoStream stream(query_info, context, has_state_column);
StoragesInfoStream stream(query_info, context);
/// Create the result.

View File

@ -11,6 +11,42 @@ namespace DB
class Context;
struct StoragesInfo
{
StoragePtr storage = nullptr;
TableStructureReadLockHolder table_lock;
String database;
String table;
String engine;
bool need_inactive_parts = false;
MergeTreeData * data = nullptr;
operator bool() const { return storage != nullptr; }
MergeTreeData::DataPartsVector getParts(MergeTreeData::DataPartStateVector & state, bool has_state_column) const;
};
/** A helper class that enumerates the storages that match given query. */
class StoragesInfoStream
{
public:
StoragesInfoStream(const SelectQueryInfo & query_info, const Context & context);
StoragesInfo next();
private:
String query_id;
ColumnPtr database_column;
ColumnPtr table_column;
ColumnPtr active_column;
size_t next_row;
size_t rows;
using StoragesMap = std::map<std::pair<String, String>, StoragePtr>;
StoragesMap storages;
};
/** Implements system table 'parts' which allows to get information about data parts for tables of MergeTree family.
*/
@ -31,26 +67,10 @@ public:
size_t max_block_size,
unsigned num_streams) override;
struct StoragesInfo
{
StoragePtr storage;
TableStructureReadLockHolder table_lock;
String database;
String table;
String engine;
MergeTreeData * data;
MergeTreeData::DataPartStateVector all_parts_state;
MergeTreeData::DataPartsVector all_parts;
operator bool() const { return storage != nullptr; }
};
private:
const std::string name;
bool hasStateColumn(const Names & column_names);
bool hasStateColumn(const Names & column_names) const;
protected:
const FormatSettings format_settings;

View File

@ -80,10 +80,13 @@ void StorageSystemPartsColumns::processNextStorage(MutableColumns & columns, con
}
/// Go through the list of parts.
for (size_t part_number = 0; part_number < info.all_parts.size(); ++part_number)
MergeTreeData::DataPartStateVector all_parts_state;
MergeTreeData::DataPartsVector all_parts;
all_parts = info.getParts(all_parts_state, has_state_column);
for (size_t part_number = 0; part_number < all_parts.size(); ++part_number)
{
const auto & part = info.all_parts[part_number];
auto part_state = info.all_parts_state[part_number];
const auto & part = all_parts[part_number];
auto part_state = all_parts_state[part_number];
auto columns_size = part->getTotalColumnsSize();
/// For convenience, in returned refcount, don't add references that was due to local variables in this method: all_parts, active_parts.

View File

@ -9,6 +9,7 @@
#include <Storages/System/StorageSystemColumns.h>
#include <Storages/System/StorageSystemDatabases.h>
#include <Storages/System/StorageSystemDataTypeFamilies.h>
#include <Storages/System/StorageSystemDetachedParts.h>
#include <Storages/System/StorageSystemDictionaries.h>
#include <Storages/System/StorageSystemEvents.h>
#include <Storages/System/StorageSystemFormats.h>
@ -64,6 +65,7 @@ void attachSystemTablesServer(IDatabase & system_database, bool has_zookeeper)
{
attachSystemTablesLocal(system_database);
system_database.attachTable("parts", StorageSystemParts::create("parts"));
system_database.attachTable("detached_parts", createDetachedPartsTable());
system_database.attachTable("parts_columns", StorageSystemPartsColumns::create("parts_columns"));
system_database.attachTable("processes", StorageSystemProcesses::create("processes"));
system_database.attachTable("metrics", StorageSystemMetrics::create("metrics"));

View File

@ -94,6 +94,14 @@ def colored(text, args, color=None, on_color=None, attrs=None):
else:
return text
def print_err(*args):
sys.stderr.write(' '.join(map(str,args)) + '\n')
def report_failure(name, msg):
print(msg)
# If stderr is not the same as stdout, duplicate the test name there.
if os.fstat(2) != os.fstat(1):
print_err(name, ":", msg)
SERVER_DIED = False
exit_code = 0
@ -140,7 +148,9 @@ def run_tests_array(all_tests_with_params):
elif not args.zookeeper and 'zookeeper' in name:
print(MSG_SKIPPED + " - no zookeeper")
skipped_total += 1
elif not args.shard and 'shard' in name:
elif not args.shard and ('shard' in name
or 'distributed' in name
or 'global' in name):
print(MSG_SKIPPED + " - no shard")
skipped_total += 1
elif not args.no_long and 'long' in name:
@ -171,7 +181,7 @@ def run_tests_array(all_tests_with_params):
raise
failures += 1
print("{0} - Timeout!".format(MSG_FAIL))
report_failure(name, "{0} - Timeout!".format(MSG_FAIL))
else:
counter = 1
while proc.returncode != 0 and need_retry(stderr):
@ -184,10 +194,10 @@ def run_tests_array(all_tests_with_params):
if proc.returncode != 0:
failures += 1
failures_chain += 1
print("{0} - return code {1}".format(MSG_FAIL, proc.returncode))
report_failure(name, "{0} - return code {1}".format(MSG_FAIL, proc.returncode))
if stderr:
print(stderr.encode('utf-8'))
print_err(stderr.encode('utf-8'))
if args.stop and ('Connection refused' in stderr or 'Attempt to read after eof' in stderr) and not 'Received exception from server' in stderr:
SERVER_DIED = True
@ -195,20 +205,20 @@ def run_tests_array(all_tests_with_params):
elif stderr:
failures += 1
failures_chain += 1
print("{0} - having stderror:\n{1}".format(MSG_FAIL, stderr.encode('utf-8')))
report_failure(name, "{0} - having stderror:\n{1}".format(MSG_FAIL, stderr.encode('utf-8')))
elif 'Exception' in stdout:
failures += 1
failures_chain += 1
print("{0} - having exception:\n{1}".format(MSG_FAIL, stdout.encode('utf-8')))
report_failure(name, "{0} - having exception:\n{1}".format(MSG_FAIL, stdout.encode('utf-8')))
elif not os.path.isfile(reference_file):
print("{0} - no reference file".format(MSG_UNKNOWN))
report_failure(name, "{0} - no reference file".format(MSG_UNKNOWN))
else:
result_is_different = subprocess.call(['diff', '-q', reference_file, stdout_file], stdout = PIPE)
if result_is_different:
diff = Popen(['diff', '--unified', reference_file, stdout_file], stdout = PIPE).communicate()[0]
failures += 1
print("{0} - result differs with reference:\n{1}".format(MSG_FAIL, diff))
report_failure(name, "{0} - result differs with reference:\n{1}".format(MSG_FAIL, diff))
else:
passed_total += 1
failures_chain = 0
@ -224,7 +234,7 @@ def run_tests_array(all_tests_with_params):
import traceback
exc_type, exc_value, tb = sys.exc_info()
failures += 1
print("{0} - Test internal error: {1}\n{2}\n{3}".format(MSG_FAIL, exc_type.__name__, exc_value, "\n".join(traceback.format_tb(tb, 10))))
print_err("{0} - Test internal error: {1}\n{2}\n{3}".format(MSG_FAIL, exc_type.__name__, exc_value, "\n".join(traceback.format_tb(tb, 10))))
if failures_chain >= 20:
break
@ -232,7 +242,7 @@ def run_tests_array(all_tests_with_params):
failures_total = failures_total + failures
if failures_total > 0:
print(colored("\nHaving {failures_total} errors! {passed_total} tests passed. {skipped_total} tests skipped.".format(passed_total = passed_total, skipped_total = skipped_total, failures_total = failures_total), args, "red", attrs=["bold"]))
print_err(colored("\nHaving {failures_total} errors! {passed_total} tests passed. {skipped_total} tests skipped.".format(passed_total = passed_total, skipped_total = skipped_total, failures_total = failures_total), args, "red", attrs=["bold"]))
exit_code = 1
else:
print(colored("\n{passed_total} tests passed. {skipped_total} tests skipped.".format(passed_total = passed_total, skipped_total = skipped_total), args, "green", attrs=["bold"]))
@ -388,11 +398,11 @@ def main(args):
processlist = get_processlist(args.client_with_database)
if processlist:
server_pid = get_server_pid(os.getenv("CLICKHOUSE_PORT_TCP", '9000'))
print(colored("\nFound hung queries in processlist:", args, "red", attrs=["bold"]))
print(processlist)
print_err(colored("\nFound hung queries in processlist:", args, "red", attrs=["bold"]))
print_err(processlist)
if server_pid:
print("\nStacktraces of all threads:")
print(get_stacktraces(server_pid))
print_err("\nStacktraces of all threads:")
print_err(get_stacktraces(server_pid))
exit_code = 1
else:
print(colored("\nNo queries hung.", args, "green", attrs=["bold"]))
@ -455,6 +465,9 @@ if __name__ == '__main__':
args.queries = '/usr/share/clickhouse-test/queries'
if args.tmp is None:
args.tmp = '/tmp/clickhouse-test'
if args.queries is None:
print_err("Failed to detect path to the queries directory. Please specify it with '--queries' option.")
exit(1)
if args.tmp is None:
args.tmp = args.queries
if args.client is None:

View File

@ -0,0 +1,32 @@
## ClickHouse performance tests
This directory contains `.xml` files with performance tests for the `clickhouse-performance-test` tool.
### How to write performance test
First of all, check that existing tests don't cover your case. If there are no such tests, then you should write your own.
There are two types of performance tests:
* The first is executed in a loop and has the tag `<type>loop</type>` in its config.
* The second is executed only once and has the tag `<type>once</type>` in its config.
The `once` type should be used only for endless queries. Even if your query is really long (10+ seconds), it is better to choose a `loop` test.
After you have chosen the type, you have to specify `preconditions`. It contains table names; only `hits_100m_single`, `hits_10m_single` and `test.hits` are available in CI.
The most important part of a test is `stop_conditions`. For `loop` tests you should always use the `min_time_not_changing_for_ms` stop condition. For `once` tests you can choose between `average_speed_not_changing_for_ms` and `max_speed_not_changing_for_ms`, but the first is preferable. You should also always specify the `total_time_ms` metric; endless tests will be ignored by CI.
The `metrics` and `main_metric` settings are not important and can be omitted, because `loop` tests are always compared by the `min_time` metric and `once` tests by `max_rows_per_second`.
You can use `substitutions`, `create`, `fill` and `drop` queries to prepare a test. You can find examples in this folder.
Take into account that these tests run in CI on machines with 56 cores and 512 GB of RAM, so queries will execute much faster than on a local laptop.
### How to run performance test
You have to run clickhouse-server, and then you can start testing:
```
$ clickhouse-performance-test --input-file my_lovely_test1.xml --input-file my_lovely_test2.xml
$ clickhouse-performance-test --input-file /my_lovely_test_dir/
```

View File

@ -1,6 +1,6 @@
<test>
<name>set_lookup_hits</name>
<type>once</type>
<type>loop</type>
<preconditions>
<table_exists>hits_100m_single</table_exists>
@ -8,10 +8,10 @@
<stop_conditions>
<all_of>
<total_time_ms>10000</total_time_ms>
<total_time_ms>8000</total_time_ms>
</all_of>
<any_of>
<average_speed_not_changing_for_ms>5000</average_speed_not_changing_for_ms>
<min_time_not_changing_for_ms>7000</min_time_not_changing_for_ms>
<total_time_ms>20000</total_time_ms>
</any_of>
</stop_conditions>

View File

@ -12,7 +12,7 @@
<total_time_ms>8000</total_time_ms>
</all_of>
<any_of>
<average_speed_not_changing_for_ms>5000</average_speed_not_changing_for_ms>
<min_time_not_changing_for_ms>5000</min_time_not_changing_for_ms>
<total_time_ms>20000</total_time_ms>
</any_of>
</stop_conditions>

View File

@ -4,48 +4,48 @@ create table sequence_test (time UInt32, data UInt8) engine=Memory;
insert into sequence_test values (0,0),(1,0),(2,0),(3,0),(4,1),(5,2),(6,0),(7,0),(8,0),(9,0),(10,1),(11,1);
select 1 = sequenceMatch('')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('.')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('.*')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?1)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?2)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?3)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 0 = sequenceMatch('(?4)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?1)(?1)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?1)(?1)(?1)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?1)(?1)(?1)(?1)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 0 = sequenceMatch('(?1)(?1)(?1)(?1)(?1)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?1)(?1)(?1)(?1)(?2)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?1)(?t>10)(?2)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 0 = sequenceMatch('(?1)(?t>11)(?2)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?1)(?t<11)(?2)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?1)(?t<3)(?3)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?1)(?t<=2)(?3)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 0 = sequenceMatch('(?1)(?t<2)(?3)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?2)(?t>=7)(?2)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 0 = sequenceMatch('(?2)(?t>7)(?2)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?2)(?3)(?1)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('.')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('.*')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?1)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?2)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?3)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 0 = sequenceMatch('(?4)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?1)(?1)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?1)(?1)(?1)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?1)(?1)(?1)(?1)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 0 = sequenceMatch('(?1)(?1)(?1)(?1)(?1)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?1)(?1)(?1)(?1)(?2)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?1)(?t>10)(?2)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 0 = sequenceMatch('(?1)(?t>11)(?2)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?1)(?t<11)(?2)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?1)(?t<3)(?3)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?1)(?t<=2)(?3)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 0 = sequenceMatch('(?1)(?t<2)(?3)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?2)(?t>=7)(?2)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 0 = sequenceMatch('(?2)(?t>7)(?2)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceMatch('(?2)(?3)(?1)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select count() = sequenceCount('')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select count() = sequenceCount('.')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select count() = sequenceCount('.*')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 8 = sequenceCount('(?1)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 3 = sequenceCount('(?2)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceCount('(?3)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 0 = sequenceCount('(?4)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 4 = sequenceCount('(?1)(?1)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 2 = sequenceCount('(?1)(?1)(?1)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 2 = sequenceCount('(?1)(?1)(?1)(?1)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 0 = sequenceCount('(?1)(?1)(?1)(?1)(?1)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 2 = sequenceCount('(?1)(?1)(?1)(?1)(?2)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceCount('(?1)(?t>10)(?2)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 0 = sequenceCount('(?1)(?t>11)(?2)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 2 = sequenceCount('(?1)(?t<11)(?2)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceCount('(?1)(?t<3)(?3)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceCount('(?1)(?t<=2)(?3)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 0 = sequenceCount('(?1)(?t<2)(?3)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceCount('(?2)(?t>=7)(?2)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 0 = sequenceCount('(?2)(?t>7)(?2)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceCount('(?2)(?3)(?1)')(toDateTime(time), data = 0, data = 1, data = 2, data = 3) from sequence_test;
select count() = sequenceCount('')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select count() = sequenceCount('.')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select count() = sequenceCount('.*')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 8 = sequenceCount('(?1)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 3 = sequenceCount('(?2)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceCount('(?3)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 0 = sequenceCount('(?4)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 4 = sequenceCount('(?1)(?1)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 2 = sequenceCount('(?1)(?1)(?1)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 2 = sequenceCount('(?1)(?1)(?1)(?1)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 0 = sequenceCount('(?1)(?1)(?1)(?1)(?1)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 2 = sequenceCount('(?1)(?1)(?1)(?1)(?2)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceCount('(?1)(?t>10)(?2)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 0 = sequenceCount('(?1)(?t>11)(?2)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 2 = sequenceCount('(?1)(?t<11)(?2)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceCount('(?1)(?t<3)(?3)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceCount('(?1)(?t<=2)(?3)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 0 = sequenceCount('(?1)(?t<2)(?3)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceCount('(?2)(?t>=7)(?2)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 0 = sequenceCount('(?2)(?t>7)(?2)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
select 1 = sequenceCount('(?2)(?3)(?1)')(time, data = 0, data = 1, data = 2, data = 3) from sequence_test;
drop table sequence_test;

View File

@ -24,3 +24,4 @@
101
101
102
1

View File

@ -32,4 +32,7 @@ SELECT 1 as one FROM remote('127.0.0.{2,3}', system.one) LIMIT 1 BY one;
-- Distributed LIMIT BY with LIMIT
SELECT toInt8(number / 5 + 100) AS x FROM remote('127.0.0.1', system.numbers) LIMIT 2 BY x LIMIT 5;
-- Distributed LIMIT BY with ORDER BY non-selected column
SELECT 1 AS x FROM remote('127.0.0.{2,3}', system.one) ORDER BY dummy LIMIT 1 BY x;
DROP TABLE IF EXISTS limit_by;

View File

@ -8,6 +8,8 @@ Sum before DETACH PARTITION:
15
Sum after DETACH PARTITION:
0
system.detached_parts after DETACH PARTITION:
test not_partitioned all all_1_2_1 1 2 1
*** Partitioned by week ***
Parts before OPTIMIZE:
1999-12-27 19991227_1_1_0

View File

@ -17,6 +17,8 @@ SELECT sum(x) FROM test.not_partitioned;
ALTER TABLE test.not_partitioned DETACH PARTITION ID 'all';
SELECT 'Sum after DETACH PARTITION:';
SELECT sum(x) FROM test.not_partitioned;
SELECT 'system.detached_parts after DETACH PARTITION:';
SELECT * FROM system.detached_parts WHERE table = 'not_partitioned';
DROP TABLE test.not_partitioned;

View File

@ -0,0 +1,20 @@
494
331
576
709
903
378
498
102
861
97
494
331
576
709
903
378
498
102
861
97

View File

@ -0,0 +1,18 @@
DROP TABLE IF EXISTS test.numbers1;
DROP TABLE IF EXISTS test.numbers2;
CREATE TABLE test.numbers1 ENGINE = Memory AS SELECT number FROM numbers(1000);
CREATE TABLE test.numbers2 ENGINE = Memory AS SELECT number FROM numbers(1000);
SELECT * FROM merge(test, '^numbers\\d+$') SAMPLE 0.1; -- { serverError 141 }
DROP TABLE test.numbers1;
DROP TABLE test.numbers2;
CREATE TABLE test.numbers1 ENGINE = MergeTree ORDER BY intHash32(number) SAMPLE BY intHash32(number) AS SELECT number FROM numbers(1000);
CREATE TABLE test.numbers2 ENGINE = MergeTree ORDER BY intHash32(number) SAMPLE BY intHash32(number) AS SELECT number FROM numbers(1000);
SELECT * FROM merge(test, '^numbers\\d+$') SAMPLE 0.01;
DROP TABLE test.numbers1;
DROP TABLE test.numbers2;

View File

@ -5,21 +5,32 @@ DROP TABLE IF EXISTS sequence;
CREATE TABLE sequence
(
userID UInt64,
eventType Enum8('A' = 1, 'B' = 2, 'C' = 3),
eventType Enum8('A' = 1, 'B' = 2, 'C' = 3, 'D' = 4),
EventTime UInt64
)
ENGINE = Memory;
INSERT INTO sequence SELECT 1, number = 0 ? 'A' : (number < 1000000 ? 'B' : 'C'), number FROM numbers(1000001);
INSERT INTO sequence SELECT 1, 'D', 1e14;
SELECT userID
SELECT 'ABC'
FROM sequence
GROUP BY userID
HAVING sequenceMatch('(?1).*(?2).*(?3)')(toDateTime(EventTime), eventType = 'A', eventType = 'B', eventType = 'C');
SELECT userID
SELECT 'ABA'
FROM sequence
GROUP BY userID
HAVING sequenceMatch('(?1).*(?2).*(?3)')(toDateTime(EventTime), eventType = 'A', eventType = 'B', eventType = 'A');
SELECT 'ABBC'
FROM sequence
GROUP BY userID
HAVING sequenceMatch('(?1).*(?2).*(?3).*(?4)')(EventTime, eventType = 'A', eventType = 'B', eventType = 'B',eventType = 'C');
SELECT 'CD'
FROM sequence
GROUP BY userID
HAVING sequenceMatch('(?1)(?t>=10000000000000)(?2)')(EventTime, eventType = 'C', eventType = 'D');
DROP TABLE sequence;

View File

@ -8,6 +8,8 @@
5
4
2
2
[100,200]
70
2019-01-01 50 [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50]
2019-01-02 60 [11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70]

View File

@ -9,6 +9,8 @@ SELECT bitmapAndCardinality(bitmapBuild([1,2,3]),bitmapBuild([3,4,5]));
SELECT bitmapOrCardinality(bitmapBuild([1,2,3]),bitmapBuild([3,4,5]));
SELECT bitmapXorCardinality(bitmapBuild([1,2,3]),bitmapBuild([3,4,5]));
SELECT bitmapAndnotCardinality(bitmapBuild([1,2,3]),bitmapBuild([3,4,5]));
SELECT bitmapAndCardinality(bitmapBuild([100, 200, 500]), bitmapBuild(CAST([100, 200], 'Array(UInt16)')));
SELECT bitmapToArray(bitmapAnd(bitmapBuild([100, 200, 500]), bitmapBuild(CAST([100, 200], 'Array(UInt16)'))));
DROP TABLE IF EXISTS bitmap_test;
CREATE TABLE bitmap_test(pickup_date Date, city_id UInt32, uid UInt32)ENGINE = Memory;
@ -86,11 +88,13 @@ ORDER BY t;
INSERT INTO bitmap_column_expr_test VALUES (now(), bitmapBuild(cast([3,19,47] as Array(UInt32))));
SELECT bitmapAndCardinality( bitmapBuild(cast([19,7] as Array(UInt32))), z) from bitmap_column_expr_test;
SELECT bitmapAndCardinality( z, bitmapBuild(cast([19,7] as Array(UInt32))) ) from bitmap_column_expr_test;
SELECT bitmapAndCardinality( bitmapBuild(cast([19,7] AS Array(UInt32))), z) FROM bitmap_column_expr_test;
SELECT bitmapAndCardinality( z, bitmapBuild(cast([19,7] AS Array(UInt32))) ) FROM bitmap_column_expr_test;
SELECT bitmapCardinality(bitmapAnd(bitmapBuild(cast([19,7] AS Array(UInt32))), z )) FROM bitmap_column_expr_test;
SELECT bitmapCardinality(bitmapAnd(z, bitmapBuild(cast([19,7] AS Array(UInt32))))) FROM bitmap_column_expr_test;
select bitmapCardinality(bitmapAnd(bitmapBuild(cast([19,7] as Array(UInt32))), z )) from bitmap_column_expr_test;
select bitmapCardinality(bitmapAnd(z, bitmapBuild(cast([19,7] as Array(UInt32))))) from bitmap_column_expr_test;
DROP TABLE IF EXISTS bitmap_test;
DROP TABLE IF EXISTS bitmap_state_test;

View File

@ -0,0 +1,3 @@
1
2
3

View File

@ -0,0 +1,21 @@
drop table if exists test1;
drop table if exists test2;
drop table if exists test3;
create table test1 (id UInt64, code String) engine = Memory;
create table test3 (id UInt64, code String) engine = Memory;
create table test2 (id UInt64, code String, test1_id UInt64, test3_id UInt64) engine = Memory;
insert into test1 (id, code) select number, toString(number) FROM numbers(100000);
insert into test3 (id, code) select number, toString(number) FROM numbers(100000);
insert into test2 (id, code, test1_id, test3_id) select number, toString(number), number, number FROM numbers(100000);
select test2.id
from test1, test2, test3
where test1.code in ('1', '2', '3')
and test2.test1_id = test1.id
and test2.test3_id = test3.id;
drop table test1;
drop table test2;
drop table test3;

View File

@ -0,0 +1 @@
< HTTP/1.1 500 Internal Server Error

View File

@ -0,0 +1,8 @@
#!/usr/bin/env bash
# Test fix for issue #5066
CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
. $CURDIR/../shell_config.sh
${CLICKHOUSE_CURL} -vvv "${CLICKHOUSE_URL_INTERSERVER}?endpoint=DataPartsExchange%3A%2Fclickhouse%2Ftables%2F01-01%2Fvisits%2Freplicas%2Fsome.server.com&part=0&compress=false" 2>&1 | grep -F "< HTTP/1.1 500 Internal Server Error"

View File

@ -0,0 +1,4 @@
0
2
4
0

View File

@ -0,0 +1,24 @@
DROP TABLE IF EXISTS min_max_with_nullable_string;
SET allow_experimental_data_skipping_indices = 1;
CREATE TABLE min_max_with_nullable_string (
t DateTime,
nullable_str Nullable(String),
INDEX nullable_str_min_max nullable_str TYPE minmax GRANULARITY 8192
) ENGINE = MergeTree ORDER BY (t);
INSERT INTO min_max_with_nullable_string(t) VALUES (now()) (now());
SELECT count() FROM min_max_with_nullable_string WHERE nullable_str = '.';
INSERT INTO min_max_with_nullable_string(t, nullable_str) VALUES (now(), '.') (now(), '.');
SELECT count() FROM min_max_with_nullable_string WHERE nullable_str = '.';
INSERT INTO min_max_with_nullable_string(t, nullable_str) VALUES (now(), NULL) (now(), '.') (now(), NULL) (now(), '.') (now(), NULL);
SELECT count() FROM min_max_with_nullable_string WHERE nullable_str = '.';
SELECT count() FROM min_max_with_nullable_string WHERE nullable_str = '';
DROP TABLE min_max_with_nullable_string;

View File

@ -0,0 +1,7 @@
42
42 42 42
42 42 43
43
43
43
42 42 43

View File

@ -0,0 +1,21 @@
DROP TABLE IF EXISTS test_generic_events_all;
CREATE TABLE test_generic_events_all (APIKey UInt8, SessionType UInt8) ENGINE = MergeTree() PARTITION BY APIKey ORDER BY tuple();
INSERT INTO test_generic_events_all VALUES( 42, 42 );
ALTER TABLE test_generic_events_all ADD COLUMN OperatingSystem UInt64 DEFAULT 42;
SELECT OperatingSystem FROM test_generic_events_all PREWHERE APIKey = 42 WHERE SessionType = 42;
SELECT * FROM test_generic_events_all PREWHERE APIKey = 42 WHERE SessionType = 42;
DROP TABLE IF EXISTS test_generic_events_all;
CREATE TABLE test_generic_events_all (APIKey UInt8, SessionType UInt8) ENGINE = MergeTree() PARTITION BY APIKey ORDER BY tuple();
INSERT INTO test_generic_events_all VALUES( 42, 42 );
ALTER TABLE test_generic_events_all ADD COLUMN OperatingSystem UInt64 DEFAULT SessionType+1;
SELECT * FROM test_generic_events_all WHERE APIKey = 42 AND SessionType = 42;
SELECT OperatingSystem FROM test_generic_events_all WHERE APIKey = 42;
SELECT OperatingSystem FROM test_generic_events_all WHERE APIKey = 42 AND SessionType = 42;
SELECT OperatingSystem FROM test_generic_events_all PREWHERE APIKey = 42 WHERE SessionType = 42;
SELECT * FROM test_generic_events_all PREWHERE APIKey = 42 WHERE SessionType = 42;
DROP TABLE IF EXISTS test_generic_events_all;

View File

@ -0,0 +1,12 @@
DROP TABLE IF EXISTS test_generic_events_all;
CREATE TABLE test_generic_events_all (APIKey UInt8, SessionType UInt8) ENGINE = MergeTree() PARTITION BY APIKey ORDER BY tuple();
INSERT INTO test_generic_events_all VALUES( 42, 42 );
ALTER TABLE test_generic_events_all ADD COLUMN OperatingSystem UInt64 DEFAULT APIKey+1;
SELECT * FROM test_generic_events_all WHERE APIKey = 42 AND SessionType = 42;
-- InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "APIKey = 42" moved to PREWHERE
SELECT OperatingSystem FROM test_generic_events_all WHERE APIKey = 42;
SELECT OperatingSystem FROM test_generic_events_all WHERE APIKey = 42 AND SessionType = 42;
SELECT OperatingSystem FROM test_generic_events_all PREWHERE APIKey = 42 WHERE SessionType = 42;
SELECT * FROM test_generic_events_all PREWHERE APIKey = 42 WHERE SessionType = 42;
DROP TABLE IF EXISTS test_generic_events_all;

View File

@ -41,6 +41,10 @@ export CLICKHOUSE_PORT_HTTP_PROTO=${CLICKHOUSE_PORT_HTTP_PROTO:="http"}
export CLICKHOUSE_URL=${CLICKHOUSE_URL:="${CLICKHOUSE_PORT_HTTP_PROTO}://${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT_HTTP}/"}
export CLICKHOUSE_URL_HTTPS=${CLICKHOUSE_URL_HTTPS:="https://${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT_HTTPS}/"}
export CLICKHOUSE_URL_PARAMS=${CLICKHOUSE_URL_PARAMS:="${CLICKHOUSE_URL}?database=${CLICKHOUSE_DATABASE}"}
export CLICKHOUSE_PORT_INTERSERVER=${CLICKHOUSE_PORT_INTERSERVER:=`${CLICKHOUSE_EXTRACT_CONFIG} --try --key=interserver_http_port 2>/dev/null`} 2>/dev/null
export CLICKHOUSE_PORT_INTERSERVER=${CLICKHOUSE_PORT_INTERSERVER:="9009"}
export CLICKHOUSE_URL_INTERSERVER=${CLICKHOUSE_URL_INTERSERVER:="${CLICKHOUSE_PORT_HTTP_PROTO}://${CLICKHOUSE_HOST}:${CLICKHOUSE_PORT_INTERSERVER}/"}
export CLICKHOUSE_CURL_COMMAND=${CLICKHOUSE_CURL_COMMAND:="curl"}
export CLICKHOUSE_CURL=${CLICKHOUSE_CURL:="${CLICKHOUSE_CURL_COMMAND} --max-time 10"}
export CLICKHOUSE_TMP=${CLICKHOUSE_TMP:="."}

View File

@ -21,7 +21,12 @@ Tests should use (create, drop, etc) only tables in `test` database that is assu
If you want to use distributed queries in functional tests, you can leverage `remote` table function with `127.0.0.{1..2}` addresses for the server to query itself; or you can use predefined test clusters in server configuration file like `test_shard_localhost`.
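For example, a minimal self-query of this kind might look like the following (using `system.one`, the same pattern some tests in this commit use):

```sql
-- The server queries itself via two loopback addresses, so the query is
-- executed as a distributed one without any extra cluster configuration.
SELECT count() FROM remote('127.0.0.{1,2}', system.one);
```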
Some tests are marked with `zookeeper`, `shard` or `long` in their names. `zookeeper` is for tests that are using ZooKeeper; `shard` is for tests that requires server to listen `127.0.0.*`; `long` is for tests that run slightly longer that one second.
Some tests are marked with `zookeeper`, `shard` or `long` in their names.
`zookeeper` is for tests that use ZooKeeper. `shard` is for tests that
require the server to listen on `127.0.0.*`; `distributed` or `global` have the same
meaning. `long` is for tests that run slightly longer than one second. You can
disable these groups of tests using the `--no-zookeeper`, `--no-shard` and
`--no-long` options, respectively.
## Known bugs

View File

@ -24,6 +24,7 @@ The table below lists supported formats and how they can be used in `INSERT` and
| [PrettyNoEscapes](#prettynoescapes) | ✗ | ✔ |
| [PrettySpace](#prettyspace) | ✗ | ✔ |
| [Protobuf](#protobuf) | ✔ | ✔ |
| [Parquet](#data-format-parquet) | ✔ | ✔ |
| [RowBinary](#rowbinary) | ✔ | ✔ |
| [Native](#native) | ✔ | ✔ |
| [Null](#null) | ✗ | ✔ |
@ -711,6 +712,55 @@ ClickHouse inputs and outputs protobuf messages in the `length-delimited` format
It means before every message should be written its length as a [varint](https://developers.google.com/protocol-buffers/docs/encoding#varints).
See also [how to read/write length-delimited protobuf messages in popular languages](https://cwiki.apache.org/confluence/display/GEODE/Delimiting+Protobuf+Messages).
## Parquet {#data-format-parquet}
[Apache Parquet](http://parquet.apache.org/) is a columnar storage format available to any project in the Hadoop ecosystem. ClickHouse supports read and write operations for this format.
### Data Types Matching
The table below shows supported data types and how they match ClickHouse [data types](../data_types/index.md) in `INSERT` and `SELECT` queries.
| Parquet data type (`INSERT`) | ClickHouse data type | Parquet data type (`SELECT`) |
| -------------------- | ------------------ | ---- |
| `UINT8`, `BOOL` | [UInt8](../data_types/int_uint.md) | `UINT8` |
| `INT8` | [Int8](../data_types/int_uint.md) | `INT8` |
| `UINT16` | [UInt16](../data_types/int_uint.md) | `UINT16` |
| `INT16` | [Int16](../data_types/int_uint.md) | `INT16` |
| `UINT32` | [UInt32](../data_types/int_uint.md) | `UINT32` |
| `INT32` | [Int32](../data_types/int_uint.md) | `INT32` |
| `UINT64` | [UInt64](../data_types/int_uint.md) | `UINT64` |
| `INT64` | [Int64](../data_types/int_uint.md) | `INT64` |
| `FLOAT`, `HALF_FLOAT` | [Float32](../data_types/float.md) | `FLOAT` |
| `DOUBLE` | [Float64](../data_types/float.md) | `DOUBLE` |
| `DATE32` | [Date](../data_types/date.md) | `UINT16` |
| `DATE64`, `TIMESTAMP` | [DateTime](../data_types/datetime.md) | `UINT32` |
| `STRING`, `BINARY` | [String](../data_types/string.md) | `STRING` |
| — | [FixedString](../data_types/fixedstring.md) | `STRING` |
| `DECIMAL` | [Decimal](../data_types/decimal.md) | `DECIMAL` |
ClickHouse supports configurable precision of `Decimal` type. The `INSERT` query treats the Parquet `DECIMAL` type as the ClickHouse `Decimal128` type.
Unsupported Parquet data types: `TIME32`, `FIXED_SIZE_BINARY`, `JSON`, `UUID`, `ENUM`.
Data types of ClickHouse table columns can differ from the corresponding fields of the inserted Parquet data. When inserting data, ClickHouse interprets data types according to the table above and then [casts](../query_language/functions/type_conversion_functions/#type_conversion_function-cast) the data to the data type set for the ClickHouse table column.
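For instance (a sketch with a hypothetical file layout): if a Parquet file stores `id` as `INT32`, the ClickHouse table can still declare a wider type, and the values are cast on insert.

```sql
-- Hypothetical target table for a Parquet file with INT32 `id` and STRING `name` fields.
-- `id` is declared as UInt64; ClickHouse casts the INT32 values during the INSERT.
CREATE TABLE parquet_demo
(
    id UInt64,
    name String
) ENGINE = MergeTree ORDER BY id
```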
### Inserting and Selecting Data
You can insert Parquet data from a file into a ClickHouse table with the following command:
```
cat {filename} | clickhouse-client --query="INSERT INTO {some_table} FORMAT Parquet"
```
You can select data from a ClickHouse table and save it to a file in the Parquet format with the following command:
```
clickhouse-client --query="SELECT * FROM {some_table} FORMAT Parquet" > {some_file.pq}
```
Also look at the `HDFS` and `URL` storage engines to process data from remote servers.
## Format Schema {#formatschema}
The file name containing the format schema is set by the setting `format_schema`.

View File

@ -683,9 +683,10 @@ This parameter section contains the following parameters:
</node>
```
The `index` attribute specifies an order of node, when trying to connect to ZooKeeper cluster.
The `index` attribute is not used in ClickHouse. The only reason for this attribute is to allow some other programs to use the same configuration.
- `session_timeout` — Maximum timeout for client session in milliseconds.
- `session_timeout_ms` — Maximum timeout for client session in milliseconds (default: 30000).
- `operation_timeout_ms` — Maximum timeout for operation in milliseconds (default: 10000).
- `root` — The ZNode that is used as the root for the znodes used by the ClickHouse server. Optional.
- `identity` — User and password required by ZooKeeper to grant access to the requested znodes. Optional.
@ -702,6 +703,7 @@ This parameter section contains the following parameters:
<port>2181</port>
</node>
<session_timeout_ms>30000</session_timeout_ms>
<operation_timeout_ms>10000</operation_timeout_ms>
<!-- Optional. Chroot suffix. Should exist. -->
<root>/path/to/zookeeper/node</root>
<!-- Optional. Zookeeper digest ACL string. -->

View File

@ -194,4 +194,24 @@ Maximum number of bytes (uncompressed data) that can be passed to a remote serve
What to do when the amount of data exceeds one of the limits: 'throw' or 'break'. By default, throw.
## max_partitions_per_insert_block
Limits the maximum number of partitions in a single inserted block.
Possible values:
- Positive integer.
- 0 — Unlimited number of partitions.
Default value: 100.
**Details**
When inserting data, ClickHouse calculates the number of partitions in the inserted block. If the number of partitions is more than `max_partitions_per_insert_block`, ClickHouse throws an exception with the following text:
"Too many partitions for single INSERT block (more than " + toString(max_parts) + "). The limit is controlled by 'max_partitions_per_insert_block' setting. Large number of partitions is a common misconception. It will lead to severe negative performance impact, including slow server startup, slow INSERT queries and slow SELECT queries. Recommended total number of partitions for a table is under 1000..10000. Please note, that partitioning is not intended to speed up SELECT queries (ORDER BY key is sufficient to make range queries fast). Partitions are intended for data manipulation (DROP PARTITION, etc)."
[Original article](https://clickhouse.yandex/docs/en/operations/settings/query_complexity/) <!--hide-->

View File

@ -497,16 +497,31 @@ The default value is 7500.
The smaller the value, the more often data is flushed into the table. Setting the value too low leads to poor performance.
## load_balancing
## load_balancing {#settings-load_balancing}
Which replicas (among healthy replicas) to preferably send a query to (on the first attempt) for distributed processing.
Specifies the algorithm of replica selection that is used for distributed query processing.
### random (default)
ClickHouse supports the following algorithms of choosing replicas:
- [Random](#load_balancing-random) (by default)
- [Nearest hostname](#load_balancing-nearest_hostname)
- [In order](#load_balancing-in_order)
- [First or random](#load_balancing-first_or_random)
### Random (by default) {#load_balancing-random}
```
load_balancing = random
```
The number of errors is counted for each replica. The query is sent to the replica with the fewest errors, and if there are several of these, to any one of them.
Disadvantages: Server proximity is not accounted for; if the replicas have different data, you will also get different data.
### nearest_hostname
### Nearest Hostname {#load_balancing-nearest_hostname}
```
load_balancing = nearest_hostname
```
The number of errors is counted for each replica. Every 5 minutes, the number of errors is integrally divided by 2. Thus, the number of errors is calculated for a recent time with exponential smoothing. If there is one replica with a minimal number of errors (i.e. errors occurred recently on the other replicas), the query is sent to it. If there are multiple replicas with the same minimal number of errors, the query is sent to the replica with a host name that is most similar to the server's host name in the config file (for the number of different characters in identical positions, up to the minimum length of both host names).
@ -516,11 +531,37 @@ This method might seem primitive, but it doesn't require external data about net
Thus, if there are equivalent replicas, the closest one by name is preferred.
We can also assume that when sending a query to the same server, in the absence of failures, a distributed query will also go to the same servers. So even if different data is placed on the replicas, the query will return mostly the same results.
### in_order
### In Order {#load_balancing-in_order}
Replicas are accessed in the same order as they are specified. The number of errors does not matter.
```
load_balancing = in_order
```
Replicas with the same number of errors are accessed in the same order as they are specified in configuration.
This method is appropriate when you know exactly which replica is preferable.
### First or Random {#load_balancing-first_or_random}
```
load_balancing = first_or_random
```
This algorithm chooses the first replica in the set, or a random replica if the first one is unavailable. It is effective in cross-replication topology setups, but useless in other configurations.
The `first_or_random` algorithm solves a problem of the `in_order` algorithm: if one replica goes down, the next one takes double the load while the remaining replicas handle the usual amount of traffic. When using the `first_or_random` algorithm, the load is spread evenly among the replicas that are still available.
## prefer_localhost_replica {#settings-prefer_localhost_replica}
Enables/disables preferential use of the localhost replica when processing distributed queries. A small usage sketch follows below.
Possible values:
- 1 — ClickHouse always sends a query to the localhost replica if it exists.
- 0 — ClickHouse uses the balancing strategy specified by the [load_balancing](#settings-load_balancing) setting.
Default value: 1.
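For example, a minimal sketch that disables the localhost preference so that the balancing policy is applied:

```sql
SET prefer_localhost_replica = 0;
SET load_balancing = 'nearest_hostname';
```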
## totals_mode
How to calculate TOTALS when HAVING is present, as well as when max_rows_to_group_by and group_by_overflow_mode = 'any' are present.

View File

@ -57,6 +57,14 @@ This table contains a single String column called 'name' the name of a datab
Each database that the server knows about has a corresponding entry in the table.
This system table is used for implementing the `SHOW DATABASES` query.
## system.detached_parts
Contains information about detached parts of
[MergeTree](table_engines/mergetree.md) tables. The `reason` column specifies
why the part was detached. For user-detached parts, the reason is empty. Such
parts can be attached with the [ALTER TABLE ATTACH PARTITION|PART](../query_language/query_language/alter/#alter_attach-partition)
command. For the description of other columns, see [system.parts](#system_tables-parts).
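For illustration, a query like the following lists user-detached parts (the empty `reason` filter is an assumption based on the description above):

```sql
SELECT database, table, name
FROM system.detached_parts
WHERE reason = ''
```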
## system.dictionaries
Contains information about external dictionaries.

View File

@ -1,8 +1,8 @@
# JDBC
# JDBC {#table_engine-jdbc}
Allows ClickHouse to connect to external databases via [JDBC](https://en.wikipedia.org/wiki/Java_Database_Connectivity).
To implement JDBC connection, ClickHouse uses the separate program [clickhouse-jdbc-bridge](https://github.com/alex-krash/clickhouse-jdbc-bridge). You should run it as a daemon.
To implement the JDBC connection, ClickHouse uses the third-party program [clickhouse-jdbc-bridge](https://github.com/alex-krash/clickhouse-jdbc-bridge). Installation instructions are in its documentation. You should run it as a daemon.
This engine supports the [Nullable](../../data_types/nullable.md) data type.
@ -78,3 +78,5 @@ FROM jdbc_table
## See Also
- [JDBC table function](../../query_language/table_functions/jdbc.md).
[Original article](https://clickhouse.yandex/docs/en/operations/table_engines/jdbc/) <!--hide-->

View File

@ -3,21 +3,35 @@
The MySQL engine allows you to perform `SELECT` queries on data that is stored on a remote MySQL server.
Call format:
## Creating a Table
```
MySQL('host:port', 'database', 'table', 'user', 'password'[, replace_query, 'on_duplicate_clause']);
```sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [TTL expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [TTL expr2],
...
INDEX index_name1 expr1 TYPE type1(...) GRANULARITY value1,
INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2
) ENGINE = MySQL('host:port', 'database', 'table', 'user', 'password'[, replace_query, 'on_duplicate_clause']);
```
**Call parameters**
See the detailed description of the [CREATE TABLE](../../query_language/create.md#create-table-query) query.
- `host:port` — Address of the MySQL server.
- `database` — Database name on the MySQL server.
- `table` — Name of the table.
- `user` — The MySQL User.
The table structure can differ from the original MySQL table structure:
- Column names should be the same as in the original MySQL table, but you can use just some of these columns, in any order.
- Types of columns may differ from the types in the original MySQL table. ClickHouse tries to [cast](../../query_language/functions/type_conversion_functions.md#type_conversion_function-cast) values into the ClickHouse data types.
**Engine Parameters**
- `host:port` — MySQL server address.
- `database` — Remote database name.
- `table` — Remote table name.
- `user` — MySQL user.
- `password` — User password.
- `replace_query` — Flag that converts `INSERT INTO` queries to `REPLACE INTO`. If `replace_query=1`, the query is substituted.
- `on_duplicate_clause` — Adds the `ON DUPLICATE KEY on_duplicate_clause` expression to the `INSERT` query.
- `on_duplicate_clause`The `ON DUPLICATE KEY on_duplicate_clause` expression that is added to the `INSERT` query.
Example: `INSERT INTO t (c1,c2) VALUES ('a', 2) ON DUPLICATE KEY UPDATE c2 = c2 + 1`, where `on_duplicate_clause` is `UPDATE c2 = c2 + 1`. See the MySQL documentation to find which `on_duplicate_clause` you can use with the `ON DUPLICATE KEY` clause.
@ -27,7 +41,53 @@ At this time, simple `WHERE` clauses such as ` =, !=, >, >=, <, <=` are executed
The rest of the conditions and the `LIMIT` sampling constraint are executed in ClickHouse only after the query to MySQL finishes.
The `MySQL` engine does not support the [Nullable](../../data_types/nullable.md) data type, so when reading data from MySQL tables, `NULL` is converted to default values for the specified column type (usually 0 or an empty string).
## Usage Example
Table in MySQL:
```
mysql> CREATE TABLE `test`.`test` (
-> `int_id` INT NOT NULL AUTO_INCREMENT,
-> `int_nullable` INT NULL DEFAULT NULL,
-> `float` FLOAT NOT NULL,
-> `float_nullable` FLOAT NULL DEFAULT NULL,
-> PRIMARY KEY (`int_id`));
Query OK, 0 rows affected (0,09 sec)
mysql> insert into test (`int_id`, `float`) VALUES (1,2);
Query OK, 1 row affected (0,00 sec)
mysql> select * from test;
+--------+--------------+-------+----------------+
| int_id | int_nullable | float | float_nullable |
+--------+--------------+-------+----------------+
| 1 | NULL | 2 | NULL |
+--------+--------------+-------+----------------+
1 row in set (0,00 sec)
```
Table in ClickHouse, getting data from the MySQL table:
```sql
CREATE TABLE mysql_table
(
`float_nullable` Nullable(Float32),
`int_id` Int32
)
ENGINE = MySQL('localhost:3306', 'test', 'test', 'bayonet', '123')
```
```sql
SELECT * FROM mysql_table
```
```text
┌─float_nullable─┬─int_id─┐
│ ᴺᵁᴸᴸ │ 1 │
└────────────────┴────────┘
```
## See Also
- [The 'mysql' table function](../../query_language/table_functions/mysql.md)
- [Using MySQL as a source of external dictionary](../../query_language/dicts/external_dicts_dict_sources.md#dicts-external_dicts_dict_sources-mysql)
[Original article](https://clickhouse.yandex/docs/en/operations/table_engines/mysql/) <!--hide-->

View File

@ -0,0 +1,120 @@
# ODBC {#table_engine-odbc}
Allows ClickHouse to connect to external databases via [ODBC](https://en.wikipedia.org/wiki/Open_Database_Connectivity).
To implement the ODBC connection safely, ClickHouse uses the separate program `clickhouse-odbc-bridge`. If the ODBC driver is loaded directly from the `clickhouse-server` program, problems in the driver can crash the ClickHouse server. ClickHouse starts the `clickhouse-odbc-bridge` program automatically when it is required. The ODBC bridge program is installed by the same package as `clickhouse-server`.
This engine supports the [Nullable](../../data_types/nullable.md) data type.
## Creating a Table
```
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1],
name2 [type2],
...
)
ENGINE = ODBC(connection_settings, external_database, external_table)
```
See the detailed description of the [CREATE TABLE](../../query_language/create.md#create-table-query) query.
The table structure can differ from the source table structure:
- Column names should be the same as in the source table, but you can use just some of these columns, in any order.
- Types of columns may differ from the types in the source table. ClickHouse tries to [cast](../../query_language/functions/type_conversion_functions.md#type_conversion_function-cast) values into the ClickHouse data types.
**Engine Parameters**
- `connection_settings` — Name of the section with connection settings in the `odbc.ini` file.
- `external_database` — Name of a database in an external DBMS.
- `external_table` — Name of a table in the `external_database`.
## Usage Example
**Getting data from the local MySQL installation via ODBC**
This example is for Ubuntu Linux 18.04 and MySQL server 5.7.
Ensure that unixODBC and MySQL Connector are installed.
By default (if installed from packages), ClickHouse starts as the user `clickhouse`. Thus, you need to create and configure this user on the MySQL server.
```
sudo mysql
mysql> CREATE USER 'clickhouse'@'localhost' IDENTIFIED BY 'clickhouse';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'clickhouse'@'localhost' WITH GRANT OPTION;
```
Then configure the connection in `/etc/odbc.ini`.
```
$ cat /etc/odbc.ini
[mysqlconn]
DRIVER = /usr/local/lib/libmyodbc5w.so
SERVER = 127.0.0.1
PORT = 3306
DATABASE = test
USERNAME = clickhouse
PASSWORD = clickhouse
```
You can check the connection using the `isql` utility from the unixODBC installation.
```
isql -v mysqlconn
+---------------------------------------+
| Connected! |
| |
...
```
Table in MySQL:
```
mysql> CREATE TABLE `test`.`test` (
-> `int_id` INT NOT NULL AUTO_INCREMENT,
-> `int_nullable` INT NULL DEFAULT NULL,
-> `float` FLOAT NOT NULL,
-> `float_nullable` FLOAT NULL DEFAULT NULL,
-> PRIMARY KEY (`int_id`));
Query OK, 0 rows affected (0,09 sec)
mysql> insert into test (`int_id`, `float`) VALUES (1,2);
Query OK, 1 row affected (0,00 sec)
mysql> select * from test;
+--------+--------------+-------+----------------+
| int_id | int_nullable | float | float_nullable |
+--------+--------------+-------+----------------+
| 1 | NULL | 2 | NULL |
+--------+--------------+-------+----------------+
1 row in set (0,00 sec)
```
Table in ClickHouse, getting data from the MySQL table:
```sql
CREATE TABLE odbc_t
(
`int_id` Int32,
`float_nullable` Nullable(Float32)
)
ENGINE = ODBC('DSN=mysqlconn', 'test', 'test')
```
```sql
SELECT * FROM odbc_t
```
```text
┌─int_id─┬─float_nullable─┐
│ 1 │ ᴺᵁᴸᴸ │
└────────┴────────────────┘
```
## See Also
- [ODBC external dictionaries](../../query_language/dicts/external_dicts_dict_sources.md#dicts-external_dicts_dict_sources-odbc)
- [ODBC table function](../../query_language/table_functions/odbc.md).
[Original article](https://clickhouse.yandex/docs/en/operations/table_engines/odbc/) <!--hide-->

View File

@ -85,7 +85,7 @@ CREATE TABLE table_name
EventDate DateTime,
CounterID UInt32,
UserID UInt32
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/hits', '{replica}')
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/table_name', '{replica}')
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID)
@ -99,7 +99,7 @@ CREATE TABLE table_name
EventDate DateTime,
CounterID UInt32,
UserID UInt32
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/hits', '{replica}', EventDate, intHash32(UserID), (CounterID, EventDate, intHash32(UserID), EventTime), 8192)
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/table_name', '{replica}', EventDate, intHash32(UserID), (CounterID, EventDate, intHash32(UserID), EventTime), 8192)
```
</details>
@ -121,7 +121,8 @@ In this case, the path consists of the following parts:
`{layer}-{shard}` is the shard identifier. In this example it consists of two parts, since the Yandex.Metrica cluster uses bi-level sharding. For most tasks, you can leave just the {shard} substitution, which will be expanded to the shard identifier.
`hits` is the name of the node for the table in ZooKeeper. It is a good idea to make it the same as the table name. It is defined explicitly, because in contrast to the table name, it doesn't change after a RENAME query.
`table_name` is the name of the node for the table in ZooKeeper. It is a good idea to make it the same as the table name. It is defined explicitly, because in contrast to the table name, it doesn't change after a RENAME query.
*HINT*: you can also add a database name in front of `table_name`, e.g. `db_name.table_name`.
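For example, a sketch of this hint (assuming the `{layer}`, `{shard}` and `{replica}` macros are configured and the `db_name` database exists):

```sql
CREATE TABLE db_name.table_name
(
    EventDate DateTime,
    CounterID UInt32,
    UserID UInt32
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/db_name.table_name', '{replica}')
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
```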
The replica name identifies different replicas of the same table. You can use the server name for this, as in the example. The name only needs to be unique within each shard.

View File

@ -8,7 +8,7 @@ Pattern matching for event chains.
`pattern` is a string containing a pattern to match. The pattern is similar to a regular expression.
`time` is the time of the event with the DateTime type.
`time` is the time of the event. Supported types: `Date`, `DateTime`, and other unsigned integer types.
`cond1`, `cond2` ... are from one to 32 arguments of type `UInt8` that indicate whether a certain condition was met for the event (a usage sketch follows these parameter descriptions).
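For illustration only, assuming a hypothetical `events` table with `user_id`, `event_time` and `event` columns, `sequenceMatch` could be used like this:

```sql
-- 1 if the user had a 'view' event followed at some point by a 'click' event.
SELECT
    user_id,
    sequenceMatch('(?1).*(?2)')(event_time, event = 'view', event = 'click') AS matched
FROM events
GROUP BY user_id
```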
@ -59,7 +59,7 @@ windowFunnel(window)(timestamp, cond1, cond2, cond3, ...)
**Parameters:**
- `window` — Length of the sliding window in seconds.
- `timestamp` — Name of the column containing the timestamp. Data type: [DateTime](../../data_types/datetime.md) or [UInt32](../../data_types/int_uint.md).
- `timestamp` — Name of the column containing the timestamp. Supported data types: `Date`, `DateTime`, and other unsigned integer types (note that even though `UInt64` is supported, its value cannot exceed the maximum of `Int64`, which is 2^63 - 1). A usage sketch follows this parameter list.
- `cond1`, `cond2`... — Conditions or data describing the chain of events. Data type: `UInt8`. Values can be 0 or 1.
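A sketch under the same hypothetical `events` table as in the `sequenceMatch` example above (column names are assumptions):

```sql
-- How many steps of the view -> click -> purchase chain each user
-- completed within a one-hour window (3600 seconds).
SELECT
    user_id,
    windowFunnel(3600)(event_time, event = 'view', event = 'click', event = 'purchase') AS level
FROM events
GROUP BY user_id
```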
**Algorithm**

View File

@ -580,5 +580,41 @@ Calculates the value of `Σ((x - x̅)(y - y̅)) / n`.
Calculates the Pearson correlation coefficient: `Σ((x - x̅)(y - y̅)) / sqrt(Σ((x - x̅)^2) * Σ((y - y̅)^2))`.
## simpleLinearRegression
Performs simple (unidimensional) linear regression.
```
simpleLinearRegression(x, y)
```
Parameters:
- `x` — Column with values of the explanatory (independent) variable.
- `y` — Column with values of the dependent variable.
Returned values:
Parameters `(a, b)` of the resulting line `y = a*x + b`.
**Examples**
```sql
SELECT arrayReduce('simpleLinearRegression', [0, 1, 2, 3], [0, 1, 2, 3])
```
```text
┌─arrayReduce('simpleLinearRegression', [0, 1, 2, 3], [0, 1, 2, 3])─┐
│ (1,0) │
└───────────────────────────────────────────────────────────────────┘
```
```sql
SELECT arrayReduce('simpleLinearRegression', [0, 1, 2, 3], [3, 4, 5, 6])
```
```text
┌─arrayReduce('simpleLinearRegression', [0, 1, 2, 3], [3, 4, 5, 6])─┐
│ (1,3) │
└───────────────────────────────────────────────────────────────────┘
```
[Original article](https://clickhouse.yandex/docs/en/query_language/agg_functions/reference/) <!--hide-->

View File

@ -842,6 +842,43 @@ DISTINCT is not supported if SELECT has at least one array column.
`DISTINCT` works with [NULL](syntax.md) as if `NULL` were a specific value, and `NULL=NULL`. In other words, in the `DISTINCT` results, different combinations with `NULL` only occur once.
ClickHouse supports using the `DISTINCT` and `ORDER BY` clauses for different columns in one query. The `DISTINCT` clause is executed before the `ORDER BY` clause.
The sample table:
```text
┌─a─┬─b─┐
│ 2 │ 1 │
│ 1 │ 2 │
│ 3 │ 3 │
│ 2 │ 4 │
└───┴───┘
```
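For reference, a table like this can be created as follows (a minimal sketch; the `Memory` engine is an arbitrary choice):

```sql
CREATE TABLE t1 (a UInt8, b UInt8) ENGINE = Memory;
INSERT INTO t1 VALUES (2, 1), (1, 2), (3, 3), (2, 4);
```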
When selecting data with the `SELECT DISTINCT a FROM t1 ORDER BY b ASC` query, we get the following result:
```text
┌─a─┐
│ 2 │
│ 1 │
│ 3 │
└───┘
```
If we change the direction of ordering, `SELECT DISTINCT a FROM t1 ORDER BY b DESC`, we get the following result:
```text
┌─a─┐
│ 3 │
│ 1 │
│ 2 │
└───┘
```
The row `2, 4` was cut before sorting, because `DISTINCT` keeps only one row with `a = 2` (here, the row `2, 1`).
Take this implementation specificity into account when writing queries.
### LIMIT Clause
`LIMIT m` allows you to select the first `m` rows from the result.

View File

@ -1,5 +1,4 @@
# jdbc
# jdbc {#table_function-jdbc}
`jdbc(jdbc_connection_uri, schema, table)` - returns a table that is connected via the JDBC driver.

View File

@ -0,0 +1,72 @@
# mysql
Allows performing `SELECT` queries on data that is stored on a remote MySQL server.
```
mysql('host:port', 'database', 'table', 'user', 'password'[, replace_query, 'on_duplicate_clause']);
```
**Parameters**
- `host:port` — MySQL server address.
- `database` — Remote database name.
- `table` — Remote table name.
- `user` — MySQL user.
- `password` — User password.
- `replace_query` — If `replace_query=1`, the `REPLACE` query will be performed instead of `INSERT`.
- `on_duplicate_clause` — The `ON DUPLICATE KEY on_duplicate_clause` expression that is added to the `INSERT` query.
Example: `INSERT INTO t (c1,c2) VALUES ('a', 2) ON DUPLICATE KEY UPDATE c2 = c2 + 1`, where `on_duplicate_clause` is `UPDATE c2 = c2 + 1`. See the MySQL documentation to find which `on_duplicate_clause` you can use with the `ON DUPLICATE KEY` clause.
To specify `on_duplicate_clause` you need to pass `0` to the `replace_query` parameter. If you simultaneously pass `replace_query = 1` and `on_duplicate_clause`, ClickHouse generates an exception.
At this time, simple `WHERE` clauses such as ` =, !=, >, >=, <, <=` are executed on the MySQL server.
The rest of the conditions and the `LIMIT` sampling constraint are executed in ClickHouse only after the query to MySQL finishes.
**Returned Value**
A table object with the same columns as the original MySQL table.
## Usage Example
Table in MySQL:
```
mysql> CREATE TABLE `test`.`test` (
-> `int_id` INT NOT NULL AUTO_INCREMENT,
-> `int_nullable` INT NULL DEFAULT NULL,
-> `float` FLOAT NOT NULL,
-> `float_nullable` FLOAT NULL DEFAULT NULL,
-> PRIMARY KEY (`int_id`));
Query OK, 0 rows affected (0,09 sec)
mysql> insert into test (`int_id`, `float`) VALUES (1,2);
Query OK, 1 row affected (0,00 sec)
mysql> select * from test;
+--------+--------------+-------+----------------+
| int_id | int_nullable | float | float_nullable |
+--------+--------------+-------+----------------+
| 1 | NULL | 2 | NULL |
+--------+--------------+-------+----------------+
1 row in set (0,00 sec)
```
Selection of the data from ClickHouse:
```sql
SELECT * FROM mysql('localhost:3306', 'test', 'test', 'bayonet', '123')
```
```text
┌─int_id─┬─int_nullable─┬─float─┬─float_nullable─┐
│ 1 │ ᴺᵁᴸᴸ │ 2 │ ᴺᵁᴸᴸ │
└────────┴──────────────┴───────┴────────────────┘
```
## See Also
- [The 'MySQL' table engine](../../operations/table_engines/mysql.md)
- [Using MySQL as a source of external dictionary](../dicts/external_dicts_dict_sources.md#dicts-external_dicts_dict_sources-mysql)
[Original article](https://clickhouse.yandex/docs/en/query_language/table_functions/mysql/) <!--hide-->

View File

@ -0,0 +1,97 @@
# odbc {#table_functions-odbc}
Returns a table that is connected via [ODBC](https://en.wikipedia.org/wiki/Open_Database_Connectivity).
```
odbc(connection_settings, external_database, external_table)
```
Parameters:
- `connection_settings` — Name of the section with connection settings in the `odbc.ini` file.
- `external_database` — Name of a database in an external DBMS.
- `external_table` — Name of a table in the `external_database`.
To implement the ODBC connection safely, ClickHouse uses the separate program `clickhouse-odbc-bridge`. If the ODBC driver is loaded directly from the `clickhouse-server` program, problems in the driver can crash the ClickHouse server. ClickHouse starts the `clickhouse-odbc-bridge` program automatically when it is required. The ODBC bridge program is installed by the same package as `clickhouse-server`.
Fields with `NULL` values from the external table are converted to the default values of the base data type. For example, if a remote MySQL table field has the `INT NULL` type, it is converted to 0 (the default value for the ClickHouse `Int32` data type).
## Usage example
**Getting data from the local MySQL installation via ODBC**
This example is for Ubuntu Linux 18.04 and MySQL server 5.7.
Ensure that unixODBC and MySQL Connector are installed.
By default (if installed from packages), ClickHouse starts as the user `clickhouse`. Thus, you need to create and configure this user on the MySQL server.
```
sudo mysql
mysql> CREATE USER 'clickhouse'@'localhost' IDENTIFIED BY 'clickhouse';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'clickhouse'@'localhost' WITH GRANT OPTION;
```
Then configure the connection in `/etc/odbc.ini`.
```
$ cat /etc/odbc.ini
[mysqlconn]
DRIVER = /usr/local/lib/libmyodbc5w.so
SERVER = 127.0.0.1
PORT = 3306
DATABASE = test
USERNAME = clickhouse
PASSWORD = clickhouse
```
You can check the connection using the `isql` utility from the unixODBC installation.
```
isql -v mysqlconn
+---------------------------------------+
| Connected! |
| |
...
```
Table in MySQL:
```
mysql> CREATE TABLE `test`.`test` (
-> `int_id` INT NOT NULL AUTO_INCREMENT,
-> `int_nullable` INT NULL DEFAULT NULL,
-> `float` FLOAT NOT NULL,
-> `float_nullable` FLOAT NULL DEFAULT NULL,
-> PRIMARY KEY (`int_id`));
Query OK, 0 rows affected (0,09 sec)
mysql> insert into test (`int_id`, `float`) VALUES (1,2);
Query OK, 1 row affected (0,00 sec)
mysql> select * from test;
+--------+--------------+-------+----------------+
| int_id | int_nullable | float | float_nullable |
+--------+--------------+-------+----------------+
| 1 | NULL | 2 | NULL |
+--------+--------------+-------+----------------+
1 row in set (0,00 sec)
```
Getting data from the MySQL table:
```sql
SELECT * FROM odbc('DSN=mysqlconn', 'test', 'test')
```
```text
┌─int_id─┬─int_nullable─┬─float─┬─float_nullable─┐
│ 1 │ 0 │ 2 │ 0 │
└────────┴──────────────┴───────┴────────────────┘
```
## See Also
- [ODBC external dictionaries](../../query_language/dicts/external_dicts_dict_sources.md#dicts-external_dicts_dict_sources-odbc)
- [ODBC table engine](../../operations/table_engines/odbc.md).
[Original article](https://clickhouse.yandex/docs/en/query_language/table_functions/odbc/) <!--hide-->

Some files were not shown because too many files have changed in this diff.