Merge branch 'master' into fix_watch_race_testkeeper

alesapin 2020-12-16 19:45:57 +03:00
commit 422467628a
56 changed files with 2315 additions and 309 deletions

.gitmodules (vendored, 2 changed lines)

@@ -183,7 +183,7 @@
url = https://github.com/kthohr/stats.git
[submodule "contrib/krb5"]
path = contrib/krb5
-url = https://github.com/krb5/krb5
+url = https://github.com/ClickHouse-Extras/krb5
[submodule "contrib/cyrus-sasl"]
path = contrib/cyrus-sasl
url = https://github.com/cyrusimap/cyrus-sasl

contrib/krb5 (vendored, 2 changed lines)

@@ -1 +1 @@
-Subproject commit 99f7ad2831a01f264c07eed42a0a3a9336b86184
+Subproject commit 90ff6f4f8c695d6bf1aaba78a9b8942be92141c2

contrib/libunwind (vendored, 2 changed lines)

@@ -1 +1 @@
-Subproject commit 51b84d9b6d2548f1cbdcafe622d5a753853b6149
+Subproject commit 8fe25d7dc70f2a4ea38c3e5a33fa9d4199b67a5a

docker/images.json

@@ -58,8 +58,7 @@
"docker/test/stateless": {
"name": "yandex/clickhouse-stateless-test",
"dependent": [
"docker/test/stateful",
"docker/test/stateful_with_coverage"
"docker/test/stateful"
]
},
"docker/test/stateless_pytest": {
@@ -68,7 +67,9 @@
},
"docker/test/stateless_with_coverage": {
"name": "yandex/clickhouse-stateless-test-with-coverage",
"dependent": []
"dependent": [
"docker/test/stateful_with_coverage"
]
},
"docker/test/unit": {
"name": "yandex/clickhouse-unit-test",

docker/test/base/Dockerfile

@@ -1,5 +1,5 @@
# docker build -t yandex/clickhouse-test-base .
-FROM ubuntu:19.10
+FROM ubuntu:20.04
ENV DEBIAN_FRONTEND=noninteractive LLVM_VERSION=11

docker/test/stateful_with_coverage/Dockerfile

@@ -1,8 +1,6 @@
# docker build -t yandex/clickhouse-stateful-test-with-coverage .
FROM yandex/clickhouse-stateless-test-with-coverage
-RUN echo "deb [trusted=yes] http://apt.llvm.org/bionic/ llvm-toolchain-bionic-9 main" >> /etc/apt/sources.list
RUN apt-get update -y \
&& env DEBIAN_FRONTEND=noninteractive \
apt-get install --yes --no-install-recommends \

docs/en/introduction/performance.md

@@ -25,6 +25,6 @@ Under the same conditions, ClickHouse can handle several hundred queries per sec
## Performance When Inserting Data {#performance-when-inserting-data}
-We recommend inserting data in packets of at least 1000 rows, or no more than a single request per second. When inserting to a MergeTree table from a tab-separated dump, the insertion speed can be from 50 to 200 MB/s. If the inserted rows are around 1 Kb in size, the speed will be from 50,000 to 200,000 rows per second. If the rows are small, the performance can be higher in rows per second (on Banner System data -`>` 500,000 rows per second; on Graphite data -`>` 1,000,000 rows per second). To improve performance, you can make multiple INSERT queries in parallel, which scales linearly.
+We recommend inserting data in packets of at least 1000 rows, or no more than a single request per second. When inserting to a MergeTree table from a tab-separated dump, the insertion speed can be from 50 to 200 MB/s. If the inserted rows are around 1 KB in size, the speed will be from 50,000 to 200,000 rows per second. If the rows are small, the performance can be higher in rows per second (on Banner System data -`>` 500,000 rows per second; on Graphite data -`>` 1,000,000 rows per second). To improve performance, you can make multiple INSERT queries in parallel, which scales linearly.
{## [Original article](https://clickhouse.tech/docs/en/introduction/performance/) ##}

docs/en/operations/tips.md

@@ -91,6 +91,23 @@ The Linux kernel prior to 3.2 had a multitude of problems with IPv6 implementati
Use at least a 10 GB network, if possible. 1 Gb will also work, but it will be much worse for patching replicas with tens of terabytes of data, or for processing distributed queries with a large amount of intermediate data.
## Hypervisor configuration
If you are using OpenStack, set
```
cpu_mode=host-passthrough
```
in nova.conf.
If you are using libvirt, set
```
<cpu mode='host-passthrough'/>
```
in XML configuration.
This is important for ClickHouse to be able to get correct information with the `cpuid` instruction.
Otherwise you may get `Illegal instruction` crashes when the hypervisor is run on old CPU models.
## ZooKeeper {#zookeeper}
You are probably already using ZooKeeper for other purposes. You can use the same installation of ZooKeeper, if it isn't already overloaded.

docs/ru/sql-reference/functions/date-time-functions.md

@@ -593,6 +593,18 @@ SELECT dateDiff('hour', toDateTime('2018-01-01 22:00:00'), toDateTime('2018-01-0
For example, `timeSlots(toDateTime('2012-01-01 12:20:00'), toUInt32(600)) = [toDateTime('2012-01-01 12:00:00'), toDateTime('2012-01-01 12:30:00')]`.
This is needed for finding hits that belong to the corresponding visit.
## toYYYYMM
Converts a date or date with time to a UInt32 number containing the year and month number (YYYY * 100 + MM).
## toYYYYMMDD
Converts a date or date with time to a UInt32 number containing the year, month, and day number (YYYY * 10000 + MM * 100 + DD).
## toYYYYMMDDhhmmss
Converts a date or date with time to a UInt64 number containing the year, month, day, and time (YYYY * 10000000000 + MM * 100000000 + DD * 1000000 + hh * 10000 + mm * 100 + ss).
## formatDateTime {#formatdatetime}
Formats a date-and-time value as a string according to the given template. Important: the template is a constant expression, so different templates cannot be used within a single column.

docs/zh/sql-reference/statements/select/limit.md

@@ -14,7 +14,7 @@ toc_title: LIMIT
## LIMIT … WITH TIES Modifier {#limit-with-ties}
-If `WITH TIES` is set for `LIMIT n[,m]` and `ORDER BY expr_list` is declared, you will get in result first `n` or `n,m` rows and all rows with same `ORDER BY` fields values equal to row at position `n` for `LIMIT n` and `m` for `LIMIT n,m`.
+When `WITH TIES` is set for `LIMIT n[,m]` and `ORDER BY expr_list` is declared, the query returns not only the unmodified result (the first `n` rows of a plain `LIMIT n`) but also every additional row whose sort-key fields equal those of row `n` (that is, if row n+1 has the same sort-key values as row n, it is returned as well).
This modifier can be combined with the [ORDER BY … WITH FILL modifier](../../../sql-reference/statements/select/order-by.md#orderby-with-fill).
@@ -38,7 +38,7 @@ SELECT * FROM (
└───┘
```
-After executing with the `WITH TIES` modifier:
+After adding the `WITH TIES` modifier:
``` sql
SELECT * FROM (
@@ -59,4 +59,8 @@ SELECT * FROM (
└───┘
```
-cause row number 6 have same value “2” for field `n` as row number 5
+Although `LIMIT 5` was specified, row 6 has the same value 2 in field `n` as row 5, so it is returned as well.
+In short, this modifier controls whether "tied" rows are included in the result.

src/Columns/ColumnMap.cpp (new file, 241 lines)

@@ -0,0 +1,241 @@
#include <Columns/ColumnMap.h>
#include <Columns/IColumnImpl.h>
#include <DataStreams/ColumnGathererStream.h>
#include <IO/WriteBufferFromString.h>
#include <IO/Operators.h>
#include <ext/map.h>
#include <ext/range.h>
#include <Common/typeid_cast.h>
#include <Common/assert_cast.h>
#include <Common/WeakHash.h>
#include <Core/Field.h>
namespace DB
{
namespace ErrorCodes
{
extern const int ILLEGAL_COLUMN;
extern const int NOT_IMPLEMENTED;
extern const int LOGICAL_ERROR;
}
std::string ColumnMap::getName() const
{
WriteBufferFromOwnString res;
const auto & nested_tuple = getNestedData();
res << "Map(" << nested_tuple.getColumn(0).getName()
<< ", " << nested_tuple.getColumn(1).getName() << ")";
return res.str();
}
ColumnMap::ColumnMap(MutableColumnPtr && nested_)
: nested(std::move(nested_))
{
const auto * column_array = typeid_cast<const ColumnArray *>(nested.get());
if (!column_array)
throw Exception(ErrorCodes::LOGICAL_ERROR, "ColumnMap can be created only from array of tuples");
const auto * column_tuple = typeid_cast<const ColumnTuple *>(column_array->getDataPtr().get());
if (!column_tuple)
throw Exception(ErrorCodes::LOGICAL_ERROR, "ColumnMap can be created only from array of tuples");
if (column_tuple->getColumns().size() != 2)
throw Exception(ErrorCodes::LOGICAL_ERROR, "ColumnMap should contain only 2 subcolumns: keys and values");
for (const auto & column : column_tuple->getColumns())
if (isColumnConst(*column))
throw Exception(ErrorCodes::ILLEGAL_COLUMN, "ColumnMap cannot have ColumnConst as its element");
}
MutableColumnPtr ColumnMap::cloneEmpty() const
{
return ColumnMap::create(nested->cloneEmpty());
}
MutableColumnPtr ColumnMap::cloneResized(size_t new_size) const
{
return ColumnMap::create(nested->cloneResized(new_size));
}
Field ColumnMap::operator[](size_t n) const
{
auto array = DB::get<Array>((*nested)[n]);
return Map(std::make_move_iterator(array.begin()), std::make_move_iterator(array.end()));
}
void ColumnMap::get(size_t n, Field & res) const
{
const auto & offsets = getNestedColumn().getOffsets();
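/// offsets is a PaddedPODArray, so offsets[-1] is a valid zero element and the
/// n == 0 case needs no special handling below.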
size_t offset = offsets[n - 1];
size_t size = offsets[n] - offsets[n - 1];
res = Map(size);
auto & map = DB::get<Map &>(res);
for (size_t i = 0; i < size; ++i)
getNestedData().get(offset + i, map[i]);
}
StringRef ColumnMap::getDataAt(size_t) const
{
throw Exception("Method getDataAt is not supported for " + getName(), ErrorCodes::NOT_IMPLEMENTED);
}
void ColumnMap::insertData(const char *, size_t)
{
throw Exception("Method insertData is not supported for " + getName(), ErrorCodes::NOT_IMPLEMENTED);
}
void ColumnMap::insert(const Field & x)
{
const auto & map = DB::get<const Map &>(x);
nested->insert(Array(map.begin(), map.end()));
}
void ColumnMap::insertDefault()
{
nested->insertDefault();
}
void ColumnMap::popBack(size_t n)
{
nested->popBack(n);
}
StringRef ColumnMap::serializeValueIntoArena(size_t n, Arena & arena, char const *& begin) const
{
return nested->serializeValueIntoArena(n, arena, begin);
}
const char * ColumnMap::deserializeAndInsertFromArena(const char * pos)
{
return nested->deserializeAndInsertFromArena(pos);
}
void ColumnMap::updateHashWithValue(size_t n, SipHash & hash) const
{
nested->updateHashWithValue(n, hash);
}
void ColumnMap::updateWeakHash32(WeakHash32 & hash) const
{
nested->updateWeakHash32(hash);
}
void ColumnMap::updateHashFast(SipHash & hash) const
{
nested->updateHashFast(hash);
}
void ColumnMap::insertRangeFrom(const IColumn & src, size_t start, size_t length)
{
nested->insertRangeFrom(
assert_cast<const ColumnMap &>(src).getNestedColumn(),
start, length);
}
ColumnPtr ColumnMap::filter(const Filter & filt, ssize_t result_size_hint) const
{
auto filtered = nested->filter(filt, result_size_hint);
return ColumnMap::create(filtered);
}
ColumnPtr ColumnMap::permute(const Permutation & perm, size_t limit) const
{
auto permuted = nested->permute(perm, limit);
return ColumnMap::create(std::move(permuted));
}
ColumnPtr ColumnMap::index(const IColumn & indexes, size_t limit) const
{
auto res = nested->index(indexes, limit);
return ColumnMap::create(std::move(res));
}
ColumnPtr ColumnMap::replicate(const Offsets & offsets) const
{
auto replicated = nested->replicate(offsets);
return ColumnMap::create(std::move(replicated));
}
MutableColumns ColumnMap::scatter(ColumnIndex num_columns, const Selector & selector) const
{
auto scattered_columns = nested->scatter(num_columns, selector);
MutableColumns res;
res.reserve(num_columns);
for (auto && scattered : scattered_columns)
res.push_back(ColumnMap::create(std::move(scattered)));
return res;
}
int ColumnMap::compareAt(size_t n, size_t m, const IColumn & rhs, int nan_direction_hint) const
{
const auto & rhs_map = assert_cast<const ColumnMap &>(rhs);
return nested->compareAt(n, m, rhs_map.getNestedColumn(), nan_direction_hint);
}
void ColumnMap::compareColumn(const IColumn & rhs, size_t rhs_row_num,
PaddedPODArray<UInt64> * row_indexes, PaddedPODArray<Int8> & compare_results,
int direction, int nan_direction_hint) const
{
return doCompareColumn<ColumnMap>(assert_cast<const ColumnMap &>(rhs), rhs_row_num, row_indexes,
compare_results, direction, nan_direction_hint);
}
void ColumnMap::getPermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res) const
{
nested->getPermutation(reverse, limit, nan_direction_hint, res);
}
void ColumnMap::updatePermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res, EqualRanges & equal_range) const
{
nested->updatePermutation(reverse, limit, nan_direction_hint, res, equal_range);
}
void ColumnMap::gather(ColumnGathererStream & gatherer)
{
gatherer.gather(*this);
}
void ColumnMap::reserve(size_t n)
{
nested->reserve(n);
}
size_t ColumnMap::byteSize() const
{
return nested->byteSize();
}
size_t ColumnMap::allocatedBytes() const
{
return nested->allocatedBytes();
}
void ColumnMap::protect()
{
nested->protect();
}
void ColumnMap::getExtremes(Field & min, Field & max) const
{
nested->getExtremes(min, max);
}
void ColumnMap::forEachSubcolumn(ColumnCallback callback)
{
nested->forEachSubcolumn(callback);
}
bool ColumnMap::structureEquals(const IColumn & rhs) const
{
if (const auto * rhs_map = typeid_cast<const ColumnMap *>(&rhs))
return nested->structureEquals(*rhs_map->nested);
return false;
}
}

src/Columns/ColumnMap.h (new file, 92 lines)

@@ -0,0 +1,92 @@
#pragma once
#include <Core/Block.h>
#include <Columns/ColumnArray.h>
#include <Columns/ColumnVector.h>
#include <Columns/ColumnTuple.h>
namespace DB
{
/** Column that stores a nested Array(Tuple(key, value)) column.
*/
class ColumnMap final : public COWHelper<IColumn, ColumnMap>
{
private:
friend class COWHelper<IColumn, ColumnMap>;
WrappedPtr nested;
explicit ColumnMap(MutableColumnPtr && nested_);
ColumnMap(const ColumnMap &) = default;
public:
/** Create immutable column using immutable arguments. These arguments may be shared with other columns.
* Use IColumn::mutate in order to make mutable column and mutate shared nested columns.
*/
using Base = COWHelper<IColumn, ColumnMap>;
static Ptr create(const ColumnPtr & keys, const ColumnPtr & values, const ColumnPtr & offsets)
{
auto nested_column = ColumnArray::create(ColumnTuple::create(Columns{keys, values}), offsets);
return ColumnMap::create(nested_column);
}
static Ptr create(const ColumnPtr & column) { return ColumnMap::create(column->assumeMutable()); }
static Ptr create(ColumnPtr && arg) { return create(arg); }
template <typename Arg, typename = typename std::enable_if<std::is_rvalue_reference<Arg &&>::value>::type>
static MutablePtr create(Arg && arg) { return Base::create(std::forward<Arg>(arg)); }
std::string getName() const override;
const char * getFamilyName() const override { return "Map"; }
TypeIndex getDataType() const override { return TypeIndex::Map; }
MutableColumnPtr cloneEmpty() const override;
MutableColumnPtr cloneResized(size_t size) const override;
size_t size() const override { return nested->size(); }
Field operator[](size_t n) const override;
void get(size_t n, Field & res) const override;
StringRef getDataAt(size_t n) const override;
void insertData(const char * pos, size_t length) override;
void insert(const Field & x) override;
void insertDefault() override;
void popBack(size_t n) override;
StringRef serializeValueIntoArena(size_t n, Arena & arena, char const *& begin) const override;
const char * deserializeAndInsertFromArena(const char * pos) override;
void updateHashWithValue(size_t n, SipHash & hash) const override;
void updateWeakHash32(WeakHash32 & hash) const override;
void updateHashFast(SipHash & hash) const override;
void insertRangeFrom(const IColumn & src, size_t start, size_t length) override;
ColumnPtr filter(const Filter & filt, ssize_t result_size_hint) const override;
ColumnPtr permute(const Permutation & perm, size_t limit) const override;
ColumnPtr index(const IColumn & indexes, size_t limit) const override;
ColumnPtr replicate(const Offsets & offsets) const override;
MutableColumns scatter(ColumnIndex num_columns, const Selector & selector) const override;
void gather(ColumnGathererStream & gatherer_stream) override;
int compareAt(size_t n, size_t m, const IColumn & rhs, int nan_direction_hint) const override;
void compareColumn(const IColumn & rhs, size_t rhs_row_num,
PaddedPODArray<UInt64> * row_indexes, PaddedPODArray<Int8> & compare_results,
int direction, int nan_direction_hint) const override;
void getExtremes(Field & min, Field & max) const override;
void getPermutation(bool reverse, size_t limit, int nan_direction_hint, Permutation & res) const override;
void updatePermutation(bool reverse, size_t limit, int nan_direction_hint, IColumn::Permutation & res, EqualRanges & equal_range) const override;
void reserve(size_t n) override;
size_t byteSize() const override;
size_t allocatedBytes() const override;
void protect() override;
void forEachSubcolumn(ColumnCallback callback) override;
bool structureEquals(const IColumn & rhs) const override;
const ColumnArray & getNestedColumn() const { return assert_cast<const ColumnArray &>(*nested); }
ColumnArray & getNestedColumn() { return assert_cast<ColumnArray &>(*nested); }
const ColumnTuple & getNestedData() const { return assert_cast<const ColumnTuple &>(getNestedColumn().getData()); }
ColumnTuple & getNestedData() { return assert_cast<ColumnTuple &>(getNestedColumn().getData()); }
};
}
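As a usage illustration of the `create()` overloads above, here is a minimal hypothetical sketch (not part of this commit) that builds a two-row `Map(String, UInt64)` column holding `{'a': 1}` and `{'b': 2, 'c': 3}`:

``` cpp
#include <Columns/ColumnArray.h>
#include <Columns/ColumnMap.h>
#include <Columns/ColumnString.h>
#include <Columns/ColumnsNumber.h>

using namespace DB;

ColumnPtr makeExampleMapColumn()
{
    /// Keys and values are stored flat; offsets mark where each row's map ends.
    auto keys = ColumnString::create();
    keys->insertData("a", 1);
    keys->insertData("b", 1);
    keys->insertData("c", 1);

    auto values = ColumnUInt64::create();
    values->insertValue(1);
    values->insertValue(2);
    values->insertValue(3);

    auto offsets = ColumnArray::ColumnOffsets::create();
    offsets->insertValue(1);    /// row 0 owns pairs [0, 1)
    offsets->insertValue(3);    /// row 1 owns pairs [1, 3)

    /// Internally this becomes Array(Tuple(keys, values)), which getNestedColumn() exposes.
    return ColumnMap::create(std::move(keys), std::move(values), std::move(offsets));
}
```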

src/Columns/ya.make

@@ -24,6 +24,7 @@ SRCS(
ColumnFixedString.cpp
ColumnFunction.cpp
ColumnLowCardinality.cpp
ColumnMap.cpp
ColumnNullable.cpp
ColumnString.cpp
ColumnTuple.cpp

src/Common/ErrorCodes.cpp

@@ -529,6 +529,7 @@
M(560, ZSTD_ENCODER_FAILED) \
M(561, ZSTD_DECODER_FAILED) \
M(562, TLD_LIST_NOT_FOUND) \
M(563, CANNOT_READ_MAP_FROM_TEXT) \
\
M(999, KEEPER_EXCEPTION) \
M(1000, POCO_EXCEPTION) \

src/Common/FieldVisitors.cpp

@@ -93,6 +93,22 @@ String FieldVisitorDump::operator() (const Tuple & x) const
return wb.str();
}
String FieldVisitorDump::operator() (const Map & x) const
{
WriteBufferFromOwnString wb;
wb << "Map_(";
for (auto it = x.begin(); it != x.end(); ++it)
{
if (it != x.begin())
wb << ", ";
wb << applyVisitor(*this, *it);
}
wb << ')';
return wb.str();
}
String FieldVisitorDump::operator() (const AggregateFunctionStateData & x) const
{
WriteBufferFromOwnString wb;
@@ -176,6 +192,82 @@ String FieldVisitorToString::operator() (const Tuple & x) const
return wb.str();
}
String FieldVisitorToString::operator() (const Map & x) const
{
WriteBufferFromOwnString wb;
wb << '(';
for (auto it = x.begin(); it != x.end(); ++it)
{
if (it != x.begin())
wb << ", ";
wb << applyVisitor(*this, *it);
}
wb << ')';
return wb.str();
}
void FieldVisitorWriteBinary::operator() (const Null &, WriteBuffer &) const { }
void FieldVisitorWriteBinary::operator() (const UInt64 & x, WriteBuffer & buf) const { DB::writeVarUInt(x, buf); }
void FieldVisitorWriteBinary::operator() (const Int64 & x, WriteBuffer & buf) const { DB::writeVarInt(x, buf); }
void FieldVisitorWriteBinary::operator() (const Float64 & x, WriteBuffer & buf) const { DB::writeFloatBinary(x, buf); }
void FieldVisitorWriteBinary::operator() (const String & x, WriteBuffer & buf) const { DB::writeStringBinary(x, buf); }
void FieldVisitorWriteBinary::operator() (const UInt128 & x, WriteBuffer & buf) const { DB::writeBinary(x, buf); }
void FieldVisitorWriteBinary::operator() (const Int128 & x, WriteBuffer & buf) const { DB::writeVarInt(x, buf); }
void FieldVisitorWriteBinary::operator() (const UInt256 & x, WriteBuffer & buf) const { DB::writeBinary(x, buf); }
void FieldVisitorWriteBinary::operator() (const Int256 & x, WriteBuffer & buf) const { DB::writeBinary(x, buf); }
void FieldVisitorWriteBinary::operator() (const DecimalField<Decimal32> & x, WriteBuffer & buf) const { DB::writeBinary(x.getValue(), buf); }
void FieldVisitorWriteBinary::operator() (const DecimalField<Decimal64> & x, WriteBuffer & buf) const { DB::writeBinary(x.getValue(), buf); }
void FieldVisitorWriteBinary::operator() (const DecimalField<Decimal128> & x, WriteBuffer & buf) const { DB::writeBinary(x.getValue(), buf); }
void FieldVisitorWriteBinary::operator() (const DecimalField<Decimal256> & x, WriteBuffer & buf) const { DB::writeBinary(x.getValue(), buf); }
void FieldVisitorWriteBinary::operator() (const AggregateFunctionStateData & x, WriteBuffer & buf) const
{
DB::writeStringBinary(x.name, buf);
DB::writeStringBinary(x.data, buf);
}
void FieldVisitorWriteBinary::operator() (const Array & x, WriteBuffer & buf) const
{
const size_t size = x.size();
DB::writeBinary(size, buf);
for (size_t i = 0; i < size; ++i)
{
const UInt8 type = x[i].getType();
DB::writeBinary(type, buf);
Field::dispatch([&buf] (const auto & value) { DB::FieldVisitorWriteBinary()(value, buf); }, x[i]);
}
}
void FieldVisitorWriteBinary::operator() (const Tuple & x, WriteBuffer & buf) const
{
const size_t size = x.size();
DB::writeBinary(size, buf);
for (size_t i = 0; i < size; ++i)
{
const UInt8 type = x[i].getType();
DB::writeBinary(type, buf);
Field::dispatch([&buf] (const auto & value) { DB::FieldVisitorWriteBinary()(value, buf); }, x[i]);
}
}
void FieldVisitorWriteBinary::operator() (const Map & x, WriteBuffer & buf) const
{
const size_t size = x.size();
DB::writeBinary(size, buf);
for (size_t i = 0; i < size; ++i)
{
const UInt8 type = x[i].getType();
writeBinary(type, buf);
Field::dispatch([&buf] (const auto & value) { DB::FieldVisitorWriteBinary()(value, buf); }, x[i]);
}
}
FieldVisitorHash::FieldVisitorHash(SipHash & hash_) : hash(hash_) {}
@@ -238,6 +330,16 @@ void FieldVisitorHash::operator() (const Tuple & x) const
applyVisitor(*this, elem);
}
void FieldVisitorHash::operator() (const Map & x) const
{
UInt8 type = Field::Types::Map;
hash.update(type);
hash.update(x.size());
for (const auto & elem : x)
applyVisitor(*this, elem);
}
void FieldVisitorHash::operator() (const Array & x) const
{
UInt8 type = Field::Types::Array;

src/Common/FieldVisitors.h

@@ -77,6 +77,7 @@ public:
String operator() (const String & x) const;
String operator() (const Array & x) const;
String operator() (const Tuple & x) const;
String operator() (const Map & x) const;
String operator() (const DecimalField<Decimal32> & x) const;
String operator() (const DecimalField<Decimal64> & x) const;
String operator() (const DecimalField<Decimal128> & x) const;
@@ -88,6 +89,30 @@ public:
};
class FieldVisitorWriteBinary
{
public:
void operator() (const Null & x, WriteBuffer & buf) const;
void operator() (const UInt64 & x, WriteBuffer & buf) const;
void operator() (const UInt128 & x, WriteBuffer & buf) const;
void operator() (const Int64 & x, WriteBuffer & buf) const;
void operator() (const Int128 & x, WriteBuffer & buf) const;
void operator() (const Float64 & x, WriteBuffer & buf) const;
void operator() (const String & x, WriteBuffer & buf) const;
void operator() (const Array & x, WriteBuffer & buf) const;
void operator() (const Tuple & x, WriteBuffer & buf) const;
void operator() (const Map & x, WriteBuffer & buf) const;
void operator() (const DecimalField<Decimal32> & x, WriteBuffer & buf) const;
void operator() (const DecimalField<Decimal64> & x, WriteBuffer & buf) const;
void operator() (const DecimalField<Decimal128> & x, WriteBuffer & buf) const;
void operator() (const DecimalField<Decimal256> & x, WriteBuffer & buf) const;
void operator() (const AggregateFunctionStateData & x, WriteBuffer & buf) const;
void operator() (const UInt256 & x, WriteBuffer & buf) const;
void operator() (const Int256 & x, WriteBuffer & buf) const;
};
/** Print readable and unique text dump of field type and value. */
class FieldVisitorDump : public StaticVisitor<String>
{
@@ -101,6 +126,7 @@ public:
String operator() (const String & x) const;
String operator() (const Array & x) const;
String operator() (const Tuple & x) const;
String operator() (const Map & x) const;
String operator() (const DecimalField<Decimal32> & x) const;
String operator() (const DecimalField<Decimal64> & x) const;
String operator() (const DecimalField<Decimal128> & x) const;
@@ -137,6 +163,11 @@ public:
throw Exception("Cannot convert Tuple to " + demangle(typeid(T).name()), ErrorCodes::CANNOT_CONVERT_TYPE);
}
T operator() (const Map &) const
{
throw Exception("Cannot convert Map to " + demangle(typeid(T).name()), ErrorCodes::CANNOT_CONVERT_TYPE);
}
T operator() (const UInt64 & x) const { return T(x); }
T operator() (const Int64 & x) const { return T(x); }
T operator() (const Int128 & x) const { return T(x); }
@@ -226,6 +257,7 @@ public:
void operator() (const String & x) const;
void operator() (const Array & x) const;
void operator() (const Tuple & x) const;
void operator() (const Map & x) const;
void operator() (const DecimalField<Decimal32> & x) const;
void operator() (const DecimalField<Decimal64> & x) const;
void operator() (const DecimalField<Decimal128> & x) const;
@@ -268,6 +300,7 @@ public:
bool operator() (String &) const { throw Exception("Cannot sum Strings", ErrorCodes::LOGICAL_ERROR); }
bool operator() (Array &) const { throw Exception("Cannot sum Arrays", ErrorCodes::LOGICAL_ERROR); }
bool operator() (Tuple &) const { throw Exception("Cannot sum Tuples", ErrorCodes::LOGICAL_ERROR); }
bool operator() (Map &) const { throw Exception("Cannot sum Maps", ErrorCodes::LOGICAL_ERROR); }
bool operator() (UInt128 &) const { throw Exception("Cannot sum UUIDs", ErrorCodes::LOGICAL_ERROR); }
bool operator() (AggregateFunctionStateData &) const { throw Exception("Cannot sum AggregateFunctionStates", ErrorCodes::LOGICAL_ERROR); }

src/Core/Field.cpp

@@ -17,6 +17,63 @@ namespace ErrorCodes
extern const int DECIMAL_OVERFLOW;
}
inline Field getBinaryValue(UInt8 type, ReadBuffer & buf)
{
switch (type)
{
case Field::Types::Null: {
return DB::Field();
}
case Field::Types::UInt64: {
UInt64 value;
DB::readVarUInt(value, buf);
return value;
}
case Field::Types::UInt128: {
UInt128 value;
DB::readBinary(value, buf);
return value;
}
case Field::Types::Int64: {
Int64 value;
DB::readVarInt(value, buf);
return value;
}
case Field::Types::Float64: {
Float64 value;
DB::readFloatBinary(value, buf);
return value;
}
case Field::Types::String: {
std::string value;
DB::readStringBinary(value, buf);
return value;
}
case Field::Types::Array: {
Array value;
DB::readBinary(value, buf);
return value;
}
case Field::Types::Tuple: {
Tuple value;
DB::readBinary(value, buf);
return value;
}
case Field::Types::Map: {
Map value;
DB::readBinary(value, buf);
return value;
}
case Field::Types::AggregateFunctionState: {
AggregateFunctionStateData value;
DB::readStringBinary(value.name, buf);
DB::readStringBinary(value.data, buf);
return value;
}
}
return DB::Field();
}
void readBinary(Array & x, ReadBuffer & buf)
{
size_t size;
@@ -25,73 +82,7 @@
DB::readBinary(size, buf);
for (size_t index = 0; index < size; ++index)
{
-switch (type)
-{
-case Field::Types::Null:
-{
-x.push_back(DB::Field());
-break;
-}
-case Field::Types::UInt64:
-{
-UInt64 value;
-DB::readVarUInt(value, buf);
-x.push_back(value);
-break;
-}
-case Field::Types::UInt128:
-{
-UInt128 value;
-DB::readBinary(value, buf);
-x.push_back(value);
-break;
-}
-case Field::Types::Int64:
-{
-Int64 value;
-DB::readVarInt(value, buf);
-x.push_back(value);
-break;
-}
-case Field::Types::Float64:
-{
-Float64 value;
-DB::readFloatBinary(value, buf);
-x.push_back(value);
-break;
-}
-case Field::Types::String:
-{
-std::string value;
-DB::readStringBinary(value, buf);
-x.push_back(value);
-break;
-}
-case Field::Types::Array:
-{
-Array value;
-DB::readBinary(value, buf);
-x.push_back(value);
-break;
-}
-case Field::Types::Tuple:
-{
-Tuple value;
-DB::readBinary(value, buf);
-x.push_back(value);
-break;
-}
-case Field::Types::AggregateFunctionState:
-{
-AggregateFunctionStateData value;
-DB::readStringBinary(value.name, buf);
-DB::readStringBinary(value.data, buf);
-x.push_back(value);
-break;
-}
-}
-}
+x.push_back(getBinaryValue(type, buf));
}
void writeBinary(const Array & x, WriteBuffer & buf)
@@ -104,53 +95,7 @@ void writeBinary(const Array & x, WriteBuffer & buf)
DB::writeBinary(size, buf);
for (const auto & elem : x)
{
-switch (type)
-{
-case Field::Types::Null: break;
-case Field::Types::UInt64:
-{
-DB::writeVarUInt(get<UInt64>(elem), buf);
-break;
-}
-case Field::Types::UInt128:
-{
-DB::writeBinary(get<UInt128>(elem), buf);
-break;
-}
-case Field::Types::Int64:
-{
-DB::writeVarInt(get<Int64>(elem), buf);
-break;
-}
-case Field::Types::Float64:
-{
-DB::writeFloatBinary(get<Float64>(elem), buf);
-break;
-}
-case Field::Types::String:
-{
-DB::writeStringBinary(get<std::string>(elem), buf);
-break;
-}
-case Field::Types::Array:
-{
-DB::writeBinary(get<Array>(elem), buf);
-break;
-}
-case Field::Types::Tuple:
-{
-DB::writeBinary(get<Tuple>(elem), buf);
-break;
-}
-case Field::Types::AggregateFunctionState:
-{
-DB::writeStringBinary(elem.get<AggregateFunctionStateData>().name, buf);
-DB::writeStringBinary(elem.get<AggregateFunctionStateData>().data, buf);
-break;
-}
-}
-}
+Field::dispatch([&buf] (const auto & value) { DB::FieldVisitorWriteBinary()(value, buf); }, elem);
}
void writeText(const Array & x, WriteBuffer & buf)
@@ -168,93 +113,7 @@ void readBinary(Tuple & x, ReadBuffer & buf)
{
UInt8 type;
DB::readBinary(type, buf);
-switch (type)
-{
-case Field::Types::Null:
-{
-x.push_back(DB::Field());
-break;
-}
-case Field::Types::UInt64:
-{
-UInt64 value;
-DB::readVarUInt(value, buf);
-x.push_back(value);
-break;
-}
-case Field::Types::UInt128:
-{
-UInt128 value;
-DB::readBinary(value, buf);
-x.push_back(value);
-break;
-}
-case Field::Types::Int64:
-{
-Int64 value;
-DB::readVarInt(value, buf);
-x.push_back(value);
-break;
-}
-case Field::Types::Int128:
-{
-Int64 value;
-DB::readVarInt(value, buf);
-x.push_back(value);
-break;
-}
-case Field::Types::Float64:
-{
-Float64 value;
-DB::readFloatBinary(value, buf);
-x.push_back(value);
-break;
-}
-case Field::Types::String:
-{
-std::string value;
-DB::readStringBinary(value, buf);
-x.push_back(value);
-break;
-}
-case Field::Types::UInt256:
-{
-UInt256 value;
-DB::readBinary(value, buf);
-x.push_back(value);
-break;
-}
-case Field::Types::Int256:
-{
-Int256 value;
-DB::readBinary(value, buf);
-x.push_back(value);
-break;
-}
-case Field::Types::Array:
-{
-Array value;
-DB::readBinary(value, buf);
-x.push_back(value);
-break;
-}
-case Field::Types::Tuple:
-{
-Tuple value;
-DB::readBinary(value, buf);
-x.push_back(value);
-break;
-}
-case Field::Types::AggregateFunctionState:
-{
-AggregateFunctionStateData value;
-DB::readStringBinary(value.name, buf);
-DB::readStringBinary(value.data, buf);
-x.push_back(value);
-break;
-}
-}
+x.push_back(getBinaryValue(type, buf));
}
}
@@ -267,67 +126,7 @@ void writeBinary(const Tuple & x, WriteBuffer & buf)
{
const UInt8 type = elem.getType();
DB::writeBinary(type, buf);
-switch (type)
-{
-case Field::Types::Null: break;
-case Field::Types::UInt64:
-{
-DB::writeVarUInt(get<UInt64>(elem), buf);
-break;
-}
-case Field::Types::UInt128:
-{
-DB::writeBinary(get<UInt128>(elem), buf);
-break;
-}
-case Field::Types::Int64:
-{
-DB::writeVarInt(get<Int64>(elem), buf);
-break;
-}
-case Field::Types::Int128:
-{
-DB::writeVarInt(get<Int64>(elem), buf);
-break;
-}
-case Field::Types::Float64:
-{
-DB::writeFloatBinary(get<Float64>(elem), buf);
-break;
-}
-case Field::Types::String:
-{
-DB::writeStringBinary(get<std::string>(elem), buf);
-break;
-}
-case Field::Types::UInt256:
-{
-DB::writeBinary(get<UInt256>(elem), buf);
-break;
-}
-case Field::Types::Int256:
-{
-DB::writeBinary(get<Int256>(elem), buf);
-break;
-}
-case Field::Types::Array:
-{
-DB::writeBinary(get<Array>(elem), buf);
-break;
-}
-case Field::Types::Tuple:
-{
-DB::writeBinary(get<Tuple>(elem), buf);
-break;
-}
-case Field::Types::AggregateFunctionState:
-{
-DB::writeStringBinary(elem.get<AggregateFunctionStateData>().name, buf);
-DB::writeStringBinary(elem.get<AggregateFunctionStateData>().data, buf);
-break;
-}
-}
+Field::dispatch([&buf] (const auto & value) { DB::FieldVisitorWriteBinary()(value, buf); }, elem);
}
}
@@ -336,6 +135,37 @@ void writeText(const Tuple & x, WriteBuffer & buf)
writeFieldText(DB::Field(x), buf);
}
void readBinary(Map & x, ReadBuffer & buf)
{
size_t size;
DB::readBinary(size, buf);
for (size_t index = 0; index < size; ++index)
{
UInt8 type;
DB::readBinary(type, buf);
x.push_back(getBinaryValue(type, buf));
}
}
void writeBinary(const Map & x, WriteBuffer & buf)
{
const size_t size = x.size();
DB::writeBinary(size, buf);
for (const auto & elem : x)
{
const UInt8 type = elem.getType();
DB::writeBinary(type, buf);
Field::dispatch([&buf] (const auto & value) { DB::FieldVisitorWriteBinary()(value, buf); }, elem);
}
}
void writeText(const Map & x, WriteBuffer & buf)
{
writeFieldText(DB::Field(x), buf);
}
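/// Hypothetical example of the binary layout produced by the Map functions above:
/// for a Map holding one element Tuple('key', 42), writeBinary() emits the element
/// count, then for each element a one-byte Field type tag (Types::Tuple) followed by
/// that element's own encoding (the tuple's size, then per-member tag and payload).
/// readBinary() reverses this, delegating each element to getBinaryValue().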
template <typename T>
void readQuoted(DecimalField<T> & x, ReadBuffer & buf)
{
@@ -530,6 +360,30 @@ Field Field::restoreFromDump(const std::string_view & dump_)
return tuple;
}
prefix = std::string_view{"Map_("};
if (dump.starts_with(prefix))
{
std::string_view tail = dump.substr(prefix.length());
trimLeft(tail);
Map map;
while (tail != ")")
{
size_t separator = tail.find_first_of(",)");
if (separator == std::string_view::npos)
show_error();
bool comma = (tail[separator] == ',');
std::string_view element = tail.substr(0, separator);
tail.remove_prefix(separator);
if (comma)
tail.remove_prefix(1);
trimLeft(tail);
if (!comma && tail != ")")
show_error();
map.push_back(Field::restoreFromDump(element));
}
return map;
}
prefix = std::string_view{"AggregateFunctionState_("};
if (dump.starts_with(prefix))
{

src/Core/Field.h

@@ -51,6 +51,9 @@ struct X : public FieldVector \
DEFINE_FIELD_VECTOR(Array);
DEFINE_FIELD_VECTOR(Tuple);
/// An array with the following structure: [(key1, value1), (key2, value2), ...]
DEFINE_FIELD_VECTOR(Map);
#undef DEFINE_FIELD_VECTOR
struct AggregateFunctionStateData
@@ -206,6 +209,7 @@ template <> struct NearestFieldTypeImpl<std::string_view> { using Type = String;
template <> struct NearestFieldTypeImpl<String> { using Type = String; };
template <> struct NearestFieldTypeImpl<Array> { using Type = Array; };
template <> struct NearestFieldTypeImpl<Tuple> { using Type = Tuple; };
template <> struct NearestFieldTypeImpl<Map> { using Type = Map; };
template <> struct NearestFieldTypeImpl<bool> { using Type = UInt64; };
template <> struct NearestFieldTypeImpl<Null> { using Type = Null; };
@@ -259,6 +263,7 @@ public:
Decimal256 = 23,
UInt256 = 24,
Int256 = 25,
Map = 26,
};
static const int MIN_NON_POD = 16;
@@ -276,6 +281,7 @@ public:
case String: return "String";
case Array: return "Array";
case Tuple: return "Tuple";
case Map: return "Map";
case Decimal32: return "Decimal32";
case Decimal64: return "Decimal64";
case Decimal128: return "Decimal128";
@@ -464,6 +470,7 @@ public:
case Types::String: return get<String>() < rhs.get<String>();
case Types::Array: return get<Array>() < rhs.get<Array>();
case Types::Tuple: return get<Tuple>() < rhs.get<Tuple>();
case Types::Map: return get<Map>() < rhs.get<Map>();
case Types::Decimal32: return get<DecimalField<Decimal32>>() < rhs.get<DecimalField<Decimal32>>();
case Types::Decimal64: return get<DecimalField<Decimal64>>() < rhs.get<DecimalField<Decimal64>>();
case Types::Decimal128: return get<DecimalField<Decimal128>>() < rhs.get<DecimalField<Decimal128>>();
@@ -499,6 +506,7 @@ public:
case Types::String: return get<String>() <= rhs.get<String>();
case Types::Array: return get<Array>() <= rhs.get<Array>();
case Types::Tuple: return get<Tuple>() <= rhs.get<Tuple>();
case Types::Map: return get<Map>() <= rhs.get<Map>();
case Types::Decimal32: return get<DecimalField<Decimal32>>() <= rhs.get<DecimalField<Decimal32>>();
case Types::Decimal64: return get<DecimalField<Decimal64>>() <= rhs.get<DecimalField<Decimal64>>();
case Types::Decimal128: return get<DecimalField<Decimal128>>() <= rhs.get<DecimalField<Decimal128>>();
@@ -536,6 +544,7 @@ public:
case Types::String: return get<String>() == rhs.get<String>();
case Types::Array: return get<Array>() == rhs.get<Array>();
case Types::Tuple: return get<Tuple>() == rhs.get<Tuple>();
case Types::Map: return get<Map>() == rhs.get<Map>();
case Types::UInt128: return get<UInt128>() == rhs.get<UInt128>();
case Types::Int128: return get<Int128>() == rhs.get<Int128>();
case Types::Decimal32: return get<DecimalField<Decimal32>>() == rhs.get<DecimalField<Decimal32>>();
@@ -575,6 +584,7 @@ public:
case Types::String: return f(field.template get<String>());
case Types::Array: return f(field.template get<Array>());
case Types::Tuple: return f(field.template get<Tuple>());
case Types::Map: return f(field.template get<Map>());
case Types::Decimal32: return f(field.template get<DecimalField<Decimal32>>());
case Types::Decimal64: return f(field.template get<DecimalField<Decimal64>>());
case Types::Decimal128: return f(field.template get<DecimalField<Decimal128>>());
@@ -600,7 +610,7 @@ public:
private:
std::aligned_union_t<DBMS_MIN_FIELD_SIZE - sizeof(Types::Which),
-Null, UInt64, UInt128, Int64, Int128, Float64, String, Array, Tuple,
+Null, UInt64, UInt128, Int64, Int128, Float64, String, Array, Tuple, Map,
DecimalField<Decimal32>, DecimalField<Decimal64>, DecimalField<Decimal128>, DecimalField<Decimal256>,
AggregateFunctionStateData,
UInt256, Int256
@@ -699,6 +709,9 @@ private:
case Types::Tuple:
destroy<Tuple>();
break;
case Types::Map:
destroy<Map>();
break;
case Types::AggregateFunctionState:
destroy<AggregateFunctionStateData>();
break;
@@ -729,6 +742,7 @@ template <> struct Field::TypeToEnum<Float64> { static const Types::Which value
template <> struct Field::TypeToEnum<String> { static const Types::Which value = Types::String; };
template <> struct Field::TypeToEnum<Array> { static const Types::Which value = Types::Array; };
template <> struct Field::TypeToEnum<Tuple> { static const Types::Which value = Types::Tuple; };
template <> struct Field::TypeToEnum<Map> { static const Types::Which value = Types::Map; };
template <> struct Field::TypeToEnum<DecimalField<Decimal32>>{ static const Types::Which value = Types::Decimal32; };
template <> struct Field::TypeToEnum<DecimalField<Decimal64>>{ static const Types::Which value = Types::Decimal64; };
template <> struct Field::TypeToEnum<DecimalField<Decimal128>>{ static const Types::Which value = Types::Decimal128; };
@@ -747,6 +761,7 @@ template <> struct Field::EnumToType<Field::Types::Float64> { using Type = Float
template <> struct Field::EnumToType<Field::Types::String> { using Type = String; };
template <> struct Field::EnumToType<Field::Types::Array> { using Type = Array; };
template <> struct Field::EnumToType<Field::Types::Tuple> { using Type = Tuple; };
template <> struct Field::EnumToType<Field::Types::Map> { using Type = Map; };
template <> struct Field::EnumToType<Field::Types::Decimal32> { using Type = DecimalField<Decimal32>; };
template <> struct Field::EnumToType<Field::Types::Decimal64> { using Type = DecimalField<Decimal64>; };
template <> struct Field::EnumToType<Field::Types::Decimal128> { using Type = DecimalField<Decimal128>; };
@@ -814,6 +829,7 @@ T safeGet(Field & field)
template <> struct TypeName<Array> { static std::string get() { return "Array"; } };
template <> struct TypeName<Tuple> { static std::string get() { return "Tuple"; } };
template <> struct TypeName<Map> { static std::string get() { return "Map"; } };
template <> struct TypeName<AggregateFunctionStateData> { static std::string get() { return "AggregateFunctionState"; } };
template <typename T>
@@ -900,6 +916,12 @@ void writeBinary(const Tuple & x, WriteBuffer & buf);
void writeText(const Tuple & x, WriteBuffer & buf);
void readBinary(Map & x, ReadBuffer & buf);
[[noreturn]] inline void readText(Map &, ReadBuffer &) { throw Exception("Cannot read Map.", ErrorCodes::NOT_IMPLEMENTED); }
[[noreturn]] inline void readQuoted(Map &, ReadBuffer &) { throw Exception("Cannot read Map.", ErrorCodes::NOT_IMPLEMENTED); }
void writeBinary(const Map & x, WriteBuffer & buf);
void writeText(const Map & x, WriteBuffer & buf);
[[noreturn]] inline void writeQuoted(const Map &, WriteBuffer &) { throw Exception("Cannot write Map quoted.", ErrorCodes::NOT_IMPLEMENTED); }
__attribute__ ((noreturn)) inline void writeText(const AggregateFunctionStateData &, WriteBuffer &)
{

src/Core/Settings.h

@@ -402,6 +402,7 @@ class IColumn;
M(Bool, aggregate_functions_null_for_empty, false, "Rewrite all aggregate functions in a query, adding -OrNull suffix to them", 0) \
M(Bool, optimize_skip_merged_partitions, false, "Skip partitions with one part with level > 0 in optimize final", 0) \
M(Bool, optimize_on_insert, true, "Do the same transformation for inserted block of data as if merge was done on this block.", 0) \
M(Bool, allow_experimental_map_type, false, "Allow data type Map", 0) \
\
M(Bool, use_antlr_parser, false, "Parse incoming queries using ANTLR-generated parser", 0) \
\

src/Core/Types.h

@@ -56,6 +56,7 @@ enum class TypeIndex
Function,
AggregateFunction,
LowCardinality,
Map,
};
#if !__clang__
#pragma GCC diagnostic pop
@@ -267,6 +268,7 @@ inline constexpr const char * getTypeName(TypeIndex idx)
case TypeIndex::Function: return "Function";
case TypeIndex::AggregateFunction: return "AggregateFunction";
case TypeIndex::LowCardinality: return "LowCardinality";
case TypeIndex::Map: return "Map";
}
__builtin_unreachable();

src/DataTypes/DataTypeFactory.cpp

@@ -180,6 +180,7 @@ DataTypeFactory::DataTypeFactory()
registerDataTypeDomainIPv4AndIPv6(*this);
registerDataTypeDomainSimpleAggregateFunction(*this);
registerDataTypeDomainGeo(*this);
registerDataTypeMap(*this);
}
DataTypeFactory & DataTypeFactory::instance()

src/DataTypes/DataTypeFactory.h

@@ -73,6 +73,7 @@ void registerDataTypeFixedString(DataTypeFactory & factory);
void registerDataTypeEnum(DataTypeFactory & factory);
void registerDataTypeArray(DataTypeFactory & factory);
void registerDataTypeTuple(DataTypeFactory & factory);
void registerDataTypeMap(DataTypeFactory & factory);
void registerDataTypeNullable(DataTypeFactory & factory);
void registerDataTypeNothing(DataTypeFactory & factory);
void registerDataTypeUUID(DataTypeFactory & factory);

src/DataTypes/DataTypeMap.cpp (new file, 375 lines)

@@ -0,0 +1,375 @@
#include <Common/StringUtils/StringUtils.h>
#include <Columns/ColumnMap.h>
#include <Columns/ColumnArray.h>
#include <Core/Field.h>
#include <Formats/FormatSettings.h>
#include <DataTypes/DataTypeMap.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeFactory.h>
#include <Parsers/IAST.h>
#include <Parsers/ASTNameTypePair.h>
#include <Common/typeid_cast.h>
#include <Common/assert_cast.h>
#include <Common/quoteString.h>
#include <IO/WriteHelpers.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteBufferFromString.h>
#include <IO/ReadBufferFromString.h>
#include <IO/Operators.h>
#include <ext/map.h>
#include <ext/enumerate.h>
namespace DB
{
namespace ErrorCodes
{
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
extern const int CANNOT_READ_MAP_FROM_TEXT;
}
DataTypeMap::DataTypeMap(const DataTypes & elems_)
{
assert(elems_.size() == 2);
key_type = elems_[0];
value_type = elems_[1];
nested = std::make_shared<DataTypeArray>(
std::make_shared<DataTypeTuple>(DataTypes{key_type, value_type}, Names{"keys", "values"}));
}
DataTypeMap::DataTypeMap(const DataTypePtr & key_type_, const DataTypePtr & value_type_)
: key_type(key_type_), value_type(value_type_)
, nested(std::make_shared<DataTypeArray>(
std::make_shared<DataTypeTuple>(DataTypes{key_type_, value_type_}, Names{"keys", "values"}))) {}
std::string DataTypeMap::doGetName() const
{
WriteBufferFromOwnString s;
s << "Map(" << key_type->getName() << "," << value_type->getName() << ")";
return s.str();
}
static const IColumn & extractNestedColumn(const IColumn & column)
{
return assert_cast<const ColumnMap &>(column).getNestedColumn();
}
static IColumn & extractNestedColumn(IColumn & column)
{
return assert_cast<ColumnMap &>(column).getNestedColumn();
}
void DataTypeMap::serializeBinary(const Field & field, WriteBuffer & ostr) const
{
const auto & map = get<const Map &>(field);
writeVarUInt(map.size(), ostr);
for (const auto & elem : map)
{
const auto & tuple = elem.safeGet<const Tuple>();
assert(tuple.size() == 2);
key_type->serializeBinary(tuple[0], ostr);
value_type->serializeBinary(tuple[1], ostr);
}
}
void DataTypeMap::deserializeBinary(Field & field, ReadBuffer & istr) const
{
size_t size;
readVarUInt(size, istr);
field = Map(size);
for (auto & elem : field.get<Map &>())
{
Tuple tuple(2);
key_type->deserializeBinary(tuple[0], istr);
value_type->deserializeBinary(tuple[1], istr);
elem = std::move(tuple);
}
}
void DataTypeMap::serializeBinary(const IColumn & column, size_t row_num, WriteBuffer & ostr) const
{
nested->serializeBinary(extractNestedColumn(column), row_num, ostr);
}
void DataTypeMap::deserializeBinary(IColumn & column, ReadBuffer & istr) const
{
nested->deserializeBinary(extractNestedColumn(column), istr);
}
template <typename Writer>
void DataTypeMap::serializeTextImpl(const IColumn & column, size_t row_num, WriteBuffer & ostr, Writer && writer) const
{
const auto & column_map = assert_cast<const ColumnMap &>(column);
const auto & nested_array = column_map.getNestedColumn();
const auto & nested_tuple = column_map.getNestedData();
const auto & offsets = nested_array.getOffsets();
size_t offset = offsets[row_num - 1];
size_t next_offset = offsets[row_num];
writeChar('{', ostr);
for (size_t i = offset; i < next_offset; ++i)
{
if (i != offset)
writeChar(',', ostr);
writer(key_type, nested_tuple.getColumn(0), i);
writeChar(':', ostr);
writer(value_type, nested_tuple.getColumn(1), i);
}
writeChar('}', ostr);
}
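/// Hypothetical illustration of the text format produced by serializeTextImpl():
/// a Map(String, UInt64) row with pairs ('a', 1) and ('b', 2) is written as
/// {'a':1,'b':2} - an opening '{', comma-separated key:value pairs (each side
/// rendered by the supplied writer), and a closing '}'.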
template <typename Reader>
void DataTypeMap::deserializeTextImpl(IColumn & column, ReadBuffer & istr, bool need_safe_get_int_key, Reader && reader) const
{
auto & column_map = assert_cast<ColumnMap &>(column);
auto & nested_array = column_map.getNestedColumn();
auto & nested_tuple = column_map.getNestedData();
auto & offsets = nested_array.getOffsets();
auto & key_column = nested_tuple.getColumn(0);
auto & value_column = nested_tuple.getColumn(1);
size_t size = 0;
assertChar('{', istr);
try
{
bool first = true;
while (!istr.eof() && *istr.position() != '}')
{
if (!first)
{
if (*istr.position() == ',')
++istr.position();
else
throw Exception("Cannot read Map from text", ErrorCodes::CANNOT_READ_MAP_FROM_TEXT);
}
first = false;
skipWhitespaceIfAny(istr);
if (*istr.position() == '}')
break;
if (need_safe_get_int_key)
{
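/// Scan ahead to the ':' separator (or a closing '}') and overwrite it with a
/// space, so that unsafe integer parsing of the key stops at the whitespace
/// instead of failing on the separator. Note that this mutates the buffer in place.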
ReadBuffer::Position tmp = istr.position();
while (*tmp != ':' && *tmp != '}')
++tmp;
*tmp = ' ';
reader(key_type, key_column);
}
else
{
reader(key_type, key_column);
skipWhitespaceIfAny(istr);
assertChar(':', istr);
}
++size;
skipWhitespaceIfAny(istr);
reader(value_type, value_column);
skipWhitespaceIfAny(istr);
}
offsets.push_back(offsets.back() + size);
assertChar('}', istr);
}
catch (...)
{
throw;
}
}
void DataTypeMap::serializeText(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const
{
serializeTextImpl(column, row_num, ostr,
[&](const DataTypePtr & subcolumn_type, const IColumn & subcolumn, size_t pos)
{
subcolumn_type->serializeAsTextQuoted(subcolumn, pos, ostr, settings);
});
}
void DataTypeMap::deserializeText(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const
{
// need_safe_get_int_key is set for integer keys to prevent readIntTextUnsafe from misparsing the key
bool need_safe_get_int_key = isInteger(key_type);
deserializeTextImpl(column, istr, need_safe_get_int_key,
[&](const DataTypePtr & subcolumn_type, IColumn & subcolumn)
{
subcolumn_type->deserializeAsTextQuoted(subcolumn, istr, settings);
});
}
void DataTypeMap::serializeTextJSON(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const
{
serializeTextImpl(column, row_num, ostr,
[&](const DataTypePtr & subcolumn_type, const IColumn & subcolumn, size_t pos)
{
subcolumn_type->serializeAsTextJSON(subcolumn, pos, ostr, settings);
});
}
void DataTypeMap::deserializeTextJSON(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const
{
// need_safe_get_int_key is set for integer keys to prevent readIntTextUnsafe from misparsing the key
bool need_safe_get_int_key = isInteger(key_type);
deserializeTextImpl(column, istr, need_safe_get_int_key,
[&](const DataTypePtr & subcolumn_type, IColumn & subcolumn)
{
subcolumn_type->deserializeAsTextJSON(subcolumn, istr, settings);
});
}
void DataTypeMap::serializeTextXML(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const
{
const auto & column_map = assert_cast<const ColumnMap &>(column);
const auto & offsets = column_map.getNestedColumn().getOffsets();
size_t offset = offsets[row_num - 1];
size_t next_offset = offsets[row_num];
const auto & nested_data = column_map.getNestedData();
writeCString("<map>", ostr);
for (size_t i = offset; i < next_offset; ++i)
{
writeCString("<elem>", ostr);
writeCString("<key>", ostr);
key_type->serializeAsTextXML(nested_data.getColumn(0), i, ostr, settings);
writeCString("</key>", ostr);
writeCString("<value>", ostr);
value_type->serializeAsTextXML(nested_data.getColumn(1), i, ostr, settings);
writeCString("</value>", ostr);
writeCString("</elem>", ostr);
}
writeCString("</map>", ostr);
}
void DataTypeMap::serializeTextCSV(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const
{
WriteBufferFromOwnString wb;
serializeText(column, row_num, wb, settings);
writeCSV(wb.str(), ostr);
}
void DataTypeMap::deserializeTextCSV(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const
{
String s;
readCSV(s, istr, settings.csv);
ReadBufferFromString rb(s);
deserializeText(column, rb, settings);
}
void DataTypeMap::enumerateStreams(const StreamCallback & callback, SubstreamPath & path) const
{
nested->enumerateStreams(callback, path);
}
void DataTypeMap::serializeBinaryBulkStatePrefix(
SerializeBinaryBulkSettings & settings,
SerializeBinaryBulkStatePtr & state) const
{
nested->serializeBinaryBulkStatePrefix(settings, state);
}
void DataTypeMap::serializeBinaryBulkStateSuffix(
SerializeBinaryBulkSettings & settings,
SerializeBinaryBulkStatePtr & state) const
{
nested->serializeBinaryBulkStateSuffix(settings, state);
}
void DataTypeMap::deserializeBinaryBulkStatePrefix(
DeserializeBinaryBulkSettings & settings,
DeserializeBinaryBulkStatePtr & state) const
{
nested->deserializeBinaryBulkStatePrefix(settings, state);
}
void DataTypeMap::serializeBinaryBulkWithMultipleStreams(
const IColumn & column,
size_t offset,
size_t limit,
SerializeBinaryBulkSettings & settings,
SerializeBinaryBulkStatePtr & state) const
{
nested->serializeBinaryBulkWithMultipleStreams(extractNestedColumn(column), offset, limit, settings, state);
}
void DataTypeMap::deserializeBinaryBulkWithMultipleStreams(
IColumn & column,
size_t limit,
DeserializeBinaryBulkSettings & settings,
DeserializeBinaryBulkStatePtr & state) const
{
nested->deserializeBinaryBulkWithMultipleStreams(extractNestedColumn(column), limit, settings, state);
}
void DataTypeMap::serializeProtobuf(const IColumn & column, size_t row_num, ProtobufWriter & protobuf, size_t & value_index) const
{
nested->serializeProtobuf(extractNestedColumn(column), row_num, protobuf, value_index);
}
void DataTypeMap::deserializeProtobuf(IColumn & column, ProtobufReader & protobuf, bool allow_add_row, bool & row_added) const
{
nested->deserializeProtobuf(extractNestedColumn(column), protobuf, allow_add_row, row_added);
}
MutableColumnPtr DataTypeMap::createColumn() const
{
return ColumnMap::create(nested->createColumn());
}
Field DataTypeMap::getDefault() const
{
return Map();
}
bool DataTypeMap::equals(const IDataType & rhs) const
{
if (typeid(rhs) != typeid(*this))
return false;
const DataTypeMap & rhs_map = static_cast<const DataTypeMap &>(rhs);
return nested->equals(*rhs_map.nested);
}
static DataTypePtr create(const ASTPtr & arguments)
{
if (!arguments || arguments->children.size() != 2)
throw Exception("Map data type family must have two arguments: key and value types", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
DataTypes nested_types;
nested_types.reserve(arguments->children.size());
for (const ASTPtr & child : arguments->children)
nested_types.emplace_back(DataTypeFactory::instance().get(child));
return std::make_shared<DataTypeMap>(nested_types);
}
void registerDataTypeMap(DataTypeFactory & factory)
{
factory.registerDataType("Map", create);
}
}

src/DataTypes/DataTypeMap.h (new file, 103 lines)

@@ -0,0 +1,103 @@
#pragma once
#include <DataTypes/DataTypeWithSimpleSerialization.h>
namespace DB
{
/** Map data type.
* Map is implemented as two arrays of keys and values.
* Serialization of type 'Map(K, V)' is similar to serialization
* of 'Array(Tuple(keys K, values V))' or, in other words, of 'Nested(keys K, values V)'.
*/
class DataTypeMap final : public DataTypeWithSimpleSerialization
{
private:
DataTypePtr key_type;
DataTypePtr value_type;
/// 'nested' is an Array(Tuple(key_type, value_type))
DataTypePtr nested;
public:
static constexpr bool is_parametric = true;
DataTypeMap(const DataTypes & elems);
DataTypeMap(const DataTypePtr & key_type_, const DataTypePtr & value_type_);
TypeIndex getTypeId() const override { return TypeIndex::Map; }
std::string doGetName() const override;
const char * getFamilyName() const override { return "Map"; }
bool canBeInsideNullable() const override { return false; }
void serializeBinary(const Field & field, WriteBuffer & ostr) const override;
void deserializeBinary(Field & field, ReadBuffer & istr) const override;
void serializeBinary(const IColumn & column, size_t row_num, WriteBuffer & ostr) const override;
void deserializeBinary(IColumn & column, ReadBuffer & istr) const override;
void serializeText(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings &) const override;
void deserializeText(IColumn & column, ReadBuffer & istr, const FormatSettings &) const override;
void serializeTextJSON(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings &) const override;
void deserializeTextJSON(IColumn & column, ReadBuffer & istr, const FormatSettings &) const override;
void serializeTextXML(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings &) const override;
void serializeTextCSV(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings &) const override;
void deserializeTextCSV(IColumn & column, ReadBuffer & istr, const FormatSettings &) const override;
/** Each sub-column in a map is serialized in a separate stream.
*/
void enumerateStreams(const StreamCallback & callback, SubstreamPath & path) const override;
void serializeBinaryBulkStatePrefix(
SerializeBinaryBulkSettings & settings,
SerializeBinaryBulkStatePtr & state) const override;
void serializeBinaryBulkStateSuffix(
SerializeBinaryBulkSettings & settings,
SerializeBinaryBulkStatePtr & state) const override;
void deserializeBinaryBulkStatePrefix(
DeserializeBinaryBulkSettings & settings,
DeserializeBinaryBulkStatePtr & state) const override;
void serializeBinaryBulkWithMultipleStreams(
const IColumn & column,
size_t offset,
size_t limit,
SerializeBinaryBulkSettings & settings,
SerializeBinaryBulkStatePtr & state) const override;
void deserializeBinaryBulkWithMultipleStreams(
IColumn & column,
size_t limit,
DeserializeBinaryBulkSettings & settings,
DeserializeBinaryBulkStatePtr & state) const override;
void serializeProtobuf(const IColumn & column, size_t row_num, ProtobufWriter & protobuf, size_t & value_index) const override;
void deserializeProtobuf(IColumn & column, ProtobufReader & protobuf, bool allow_add_row, bool & row_added) const override;
MutableColumnPtr createColumn() const override;
Field getDefault() const override;
bool equals(const IDataType & rhs) const override;
bool isComparable() const override { return key_type->isComparable() && value_type->isComparable(); }
bool isParametric() const override { return true; }
bool haveSubtypes() const override { return true; }
const DataTypePtr & getKeyType() const { return key_type; }
const DataTypePtr & getValueType() const { return value_type; }
DataTypes getKeyValueTypes() const { return {key_type, value_type}; }
private:
template <typename Writer>
void serializeTextImpl(const IColumn & column, size_t row_num, WriteBuffer & ostr, Writer && writer) const;
template <typename Reader>
void deserializeTextImpl(IColumn & column, ReadBuffer & istr, bool need_safe_get_int_key, Reader && reader) const;
};
}
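To show how the pieces above compose, here is a small hypothetical sketch (not part of this commit) that constructs the type `Map(String, UInt64)` programmatically and materializes a column of it:

``` cpp
#include <DataTypes/DataTypeMap.h>
#include <DataTypes/DataTypeString.h>
#include <DataTypes/DataTypesNumber.h>

using namespace DB;

void mapTypeExample()
{
    auto map_type = std::make_shared<DataTypeMap>(
        std::make_shared<DataTypeString>(),     /// key type
        std::make_shared<DataTypeUInt64>());    /// value type

    /// doGetName() composes the name from the key and value types: "Map(String,UInt64)".
    auto name = map_type->getName();

    /// createColumn() returns a ColumnMap backed by Array(Tuple(keys, values)).
    auto column = map_type->createColumn();
}
```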

src/DataTypes/FieldToDataType.cpp

@@ -1,6 +1,7 @@
#include <Common/FieldVisitors.h>
#include <DataTypes/FieldToDataType.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeMap.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypesDecimal.h>
#include <DataTypes/DataTypeString.h>
@@ -118,6 +119,24 @@ DataTypePtr FieldToDataType::operator() (const Tuple & tuple) const
return std::make_shared<DataTypeTuple>(element_types);
}
DataTypePtr FieldToDataType::operator() (const Map & map) const
{
DataTypes key_types;
DataTypes value_types;
key_types.reserve(map.size());
value_types.reserve(map.size());
for (const auto & elem : map)
{
const auto & tuple = elem.safeGet<const Tuple &>();
assert(tuple.size() == 2);
key_types.push_back(applyVisitor(FieldToDataType(), tuple[0]));
value_types.push_back(applyVisitor(FieldToDataType(), tuple[1]));
}
return std::make_shared<DataTypeMap>(getLeastSupertype(key_types), getLeastSupertype(value_types));
}
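/// Hypothetical illustration: a map literal with entries (1, 'a') and (1000, 'b')
/// collects key types [UInt64, UInt64] and value types [String, String], so the
/// inferred type is Map(UInt64, String); getLeastSupertype() also widens mixed
/// numeric key or value types to a common supertype.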
DataTypePtr FieldToDataType::operator() (const AggregateFunctionStateData & x) const
{
const auto & name = static_cast<const AggregateFunctionStateData &>(x).name;

src/DataTypes/FieldToDataType.h

@@ -26,6 +26,7 @@ public:
DataTypePtr operator() (const String & x) const;
DataTypePtr operator() (const Array & x) const;
DataTypePtr operator() (const Tuple & tuple) const;
DataTypePtr operator() (const Map & map) const;
DataTypePtr operator() (const DecimalField<Decimal32> & x) const;
DataTypePtr operator() (const DecimalField<Decimal64> & x) const;
DataTypePtr operator() (const DecimalField<Decimal128> & x) const;

src/DataTypes/IDataType.h

@@ -91,6 +91,8 @@ public:
TupleElement,
MapElement,
DictionaryKeys,
DictionaryIndexes,
};
@@ -517,6 +519,7 @@ struct WhichDataType
constexpr bool isUUID() const { return idx == TypeIndex::UUID; }
constexpr bool isArray() const { return idx == TypeIndex::Array; }
constexpr bool isTuple() const { return idx == TypeIndex::Tuple; }
constexpr bool isMap() const { return idx == TypeIndex::Map; }
constexpr bool isSet() const { return idx == TypeIndex::Set; }
constexpr bool isInterval() const { return idx == TypeIndex::Interval; }

View File

@ -27,6 +27,7 @@ SRCS(
DataTypeInterval.cpp
DataTypeLowCardinality.cpp
DataTypeLowCardinalityHelpers.cpp
DataTypeMap.cpp
DataTypeNothing.cpp
DataTypeNullable.cpp
DataTypeNumberBase.cpp

View File

@ -43,6 +43,7 @@ static Context createQueryContext(const Context & global_context)
{
Settings new_query_settings = global_context.getSettings();
new_query_settings.insert_allow_materialized_columns = true;
new_query_settings.optimize_on_insert = false;
Context query_context(global_context);
query_context.setSettings(new_query_settings);

View File

@ -20,6 +20,7 @@
#include <DataTypes/DataTypeEnum.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeMap.h>
#include <DataTypes/DataTypeNullable.h>
#include <DataTypes/DataTypeNothing.h>
#include <DataTypes/DataTypeUUID.h>
@ -32,6 +33,7 @@
#include <Columns/ColumnArray.h>
#include <Columns/ColumnNullable.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnMap.h>
#include <Columns/ColumnsCommon.h>
#include <Common/FieldVisitors.h>
#include <Common/assert_cast.h>
@ -2167,6 +2169,20 @@ private:
};
}
using ElementWrappers = std::vector<WrapperType>;
ElementWrappers getElementWrappers(const DataTypes & from_element_types, const DataTypes & to_element_types) const
{
ElementWrappers element_wrappers;
element_wrappers.reserve(from_element_types.size());
/// Create conversion wrapper for each element in tuple
for (const auto idx_type : ext::enumerate(from_element_types))
element_wrappers.push_back(prepareUnpackDictionaries(idx_type.second, to_element_types[idx_type.first]));
return element_wrappers;
}
WrapperType createTupleWrapper(const DataTypePtr & from_type_untyped, const DataTypeTuple * to_type) const
{
/// Conversion from String through parsing.
@ -2189,12 +2205,7 @@ private:
const auto & from_element_types = from_type->getElements();
const auto & to_element_types = to_type->getElements();
std::vector<WrapperType> element_wrappers;
element_wrappers.reserve(from_element_types.size());
/// Create conversion wrapper for each element in tuple
for (const auto idx_type : ext::enumerate(from_type->getElements()))
element_wrappers.push_back(prepareUnpackDictionaries(idx_type.second, to_element_types[idx_type.first]));
auto element_wrappers = getElementWrappers(from_element_types, to_element_types);
return [element_wrappers, from_element_types, to_element_types]
(ColumnsWithTypeAndName & arguments, const DataTypePtr &, const ColumnNullable * nullable_source, size_t input_rows_count) -> ColumnPtr
@ -2217,6 +2228,121 @@ private:
};
}
/// The case of: tuple([key1, key2, ..., key_n], [value1, value2, ..., value_n])
WrapperType createTupleToMapWrapper(const DataTypes & from_kv_types, const DataTypes & to_kv_types) const
{
return [element_wrappers = getElementWrappers(from_kv_types, to_kv_types), from_kv_types, to_kv_types]
(ColumnsWithTypeAndName & arguments, const DataTypePtr &, const ColumnNullable * nullable_source, size_t input_rows_count) -> ColumnPtr
{
const auto * col = arguments.front().column.get();
const auto & column_tuple = assert_cast<const ColumnTuple &>(*col);
if (column_tuple.getColumn(0).size() != column_tuple.getColumn(1).size())
throw Exception(ErrorCodes::TYPE_MISMATCH,
"CAST AS Map can only be performed from tuple of arrays with equal sizes."
" Size of keys: {}. Size of values: {}", column_tuple.getColumn(0).size(), column_tuple.getColumn(1).size());
ColumnPtr offsets;
Columns converted_columns(2);
for (size_t i = 0; i < 2; ++i)
{
const auto & column_array = assert_cast<const ColumnArray &>(column_tuple.getColumn(i));
ColumnsWithTypeAndName element = {{column_array.getDataPtr(), from_kv_types[i], ""}};
converted_columns[i] = element_wrappers[i](element, to_kv_types[i], nullable_source, input_rows_count);
if (!offsets)
offsets = column_array.getOffsetsPtr();
}
return ColumnMap::create(converted_columns[0], converted_columns[1], offsets);
};
}
WrapperType createMapToMapWrapper(const DataTypes & from_kv_types, const DataTypes & to_kv_types) const
{
return [element_wrappers = getElementWrappers(from_kv_types, to_kv_types), from_kv_types, to_kv_types]
(ColumnsWithTypeAndName & arguments, const DataTypePtr &, const ColumnNullable * nullable_source, size_t input_rows_count) -> ColumnPtr
{
const auto * col = arguments.front().column.get();
const auto & column_map = typeid_cast<const ColumnMap &>(*col);
const auto & nested_data = column_map.getNestedData();
Columns converted_columns(2);
for (size_t i = 0; i < 2; ++i)
{
ColumnsWithTypeAndName element = {{nested_data.getColumnPtr(i), from_kv_types[i], ""}};
converted_columns[i] = element_wrappers[i](element, to_kv_types[i], nullable_source, input_rows_count);
}
return ColumnMap::create(converted_columns[0], converted_columns[1], column_map.getNestedColumn().getOffsetsPtr());
};
}
/// The case of: [(key1, value1), (key2, value2), ...]
WrapperType createArrayToMapWrapper(const DataTypes & from_kv_types, const DataTypes & to_kv_types) const
{
return [element_wrappers = getElementWrappers(from_kv_types, to_kv_types), from_kv_types, to_kv_types]
(ColumnsWithTypeAndName & arguments, const DataTypePtr &, const ColumnNullable * nullable_source, size_t input_rows_count) -> ColumnPtr
{
const auto * col = arguments.front().column.get();
const auto & column_array = typeid_cast<const ColumnArray &>(*col);
const auto & nested_data = typeid_cast<const ColumnTuple &>(column_array.getData());
Columns converted_columns(2);
for (size_t i = 0; i < 2; ++i)
{
ColumnsWithTypeAndName element = {{nested_data.getColumnPtr(i), from_kv_types[i], ""}};
converted_columns[i] = element_wrappers[i](element, to_kv_types[i], nullable_source, input_rows_count);
}
return ColumnMap::create(converted_columns[0], converted_columns[1], column_array.getOffsetsPtr());
};
}
WrapperType createMapWrapper(const DataTypePtr & from_type_untyped, const DataTypeMap * to_type) const
{
if (const auto * from_tuple = checkAndGetDataType<DataTypeTuple>(from_type_untyped.get()))
{
if (from_tuple->getElements().size() != 2)
throw Exception{"CAST AS Map from tuple requeires 2 elements.\n"
"Left type: " + from_tuple->getName() + ", right type: " + to_type->getName(), ErrorCodes::TYPE_MISMATCH};
DataTypes from_kv_types;
const auto & to_kv_types = to_type->getKeyValueTypes();
for (const auto & elem : from_tuple->getElements())
{
const auto * type_array = checkAndGetDataType<DataTypeArray>(elem.get());
if (!type_array)
throw Exception(ErrorCodes::TYPE_MISMATCH,
"CAST AS Map can only be performed from tuples of array. Got: {}", from_tuple->getName());
from_kv_types.push_back(type_array->getNestedType());
}
return createTupleToMapWrapper(from_kv_types, to_kv_types);
}
else if (const auto * from_array = typeid_cast<const DataTypeArray *>(from_type_untyped.get()))
{
const auto * nested_tuple = typeid_cast<const DataTypeTuple *>(from_array->getNestedType().get());
if (!nested_tuple || nested_tuple->getElements().size() != 2)
throw Exception{"CAST AS Map from array requeires nested tuple of 2 elements.\n"
"Left type: " + from_array->getName() + ", right type: " + to_type->getName(), ErrorCodes::TYPE_MISMATCH};
return createArrayToMapWrapper(nested_tuple->getElements(), to_type->getKeyValueTypes());
}
else if (const auto * from_type = checkAndGetDataType<DataTypeMap>(from_type_untyped.get()))
{
return createMapToMapWrapper(from_type->getKeyValueTypes(), to_type->getKeyValueTypes());
}
else
{
throw Exception{"Unsupported types to CAST AS Map\n"
"Left type: " + from_type_untyped->getName() + ", right type: " + to_type->getName(), ErrorCodes::TYPE_MISMATCH};
}
}
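The wrapper dispatch above accepts three source shapes for CAST AS Map: a tuple of two equally sized arrays, an array of 2-element tuples, and another Map. A minimal sketch of the first two shapes, using std containers instead of ClickHouse columns (all names here are illustrative; note that unlike std::map, the engine stores the pairs positionally and allows duplicate keys):

```
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

/// Tuple-of-arrays shape: ([k1, k2, ...], [v1, v2, ...]).
template <typename K, typename V>
std::map<K, V> mapFromTupleOfArrays(const std::vector<K> & keys, const std::vector<V> & values)
{
    if (keys.size() != values.size())
        throw std::runtime_error("CAST AS Map requires key and value arrays of equal size");
    std::map<K, V> result;
    for (size_t i = 0; i < keys.size(); ++i)
        result.emplace(keys[i], values[i]);
    return result;
}

/// Array-of-tuples shape: [(k1, v1), (k2, v2), ...].
template <typename K, typename V>
std::map<K, V> mapFromArrayOfTuples(const std::vector<std::pair<K, V>> & pairs)
{
    return {pairs.begin(), pairs.end()};
}

int main()
{
    /// Mirrors the test query CAST(([1, 2, 3], ['1', '2', 'foo']), 'Map(UInt8, String)').
    auto m1 = mapFromTupleOfArrays<int, std::string>({1, 2, 3}, {"1", "2", "foo"});
    auto m2 = mapFromArrayOfTuples<std::string, int>({{"k1", 10}, {"k3", 30}});
    return (m1.at(3) == "foo" && m2.at("k3") == 30) ? 0 : 1;
}
```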
template <typename FieldType>
WrapperType createEnumWrapper(const DataTypePtr & from_type, const DataTypeEnum<FieldType> * to_type) const
{
@ -2584,6 +2710,8 @@ private:
return createArrayWrapper(from_type, checkAndGetDataType<DataTypeArray>(to_type.get()));
case TypeIndex::Tuple:
return createTupleWrapper(from_type, checkAndGetDataType<DataTypeTuple>(to_type.get()));
case TypeIndex::Map:
return createMapWrapper(from_type, checkAndGetDataType<DataTypeMap>(to_type.get()));
case TypeIndex::AggregateFunction:
return createAggregateFunctionWrapper(from_type, checkAndGetDataType<DataTypeAggregateFunction>(to_type.get()));

View File

@ -4,12 +4,16 @@
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeNullable.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeMap.h>
#include <DataTypes/DataTypesNumber.h>
#include <Core/ColumnNumbers.h>
#include <Columns/ColumnArray.h>
#include <Core/Field.h>
#include <Columns/ColumnNullable.h>
#include <Columns/ColumnsNumber.h>
#include <Columns/ColumnString.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnMap.h>
#include <Common/typeid_cast.h>
#include <Common/assert_cast.h>
@ -80,6 +84,38 @@ private:
/** For a tuple array, the function is evaluated component-wise for each element of the tuple.
*/
ColumnPtr executeTuple(const ColumnsWithTypeAndName & arguments, size_t input_rows_count) const;
/** For a map the function finds the matched value for a key.
* Currently implemented just as a linear search in the array.
* However, optimizations are possible.
*/
ColumnPtr executeMap(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count) const;
using Offsets = ColumnArray::Offsets;
static bool matchKeyToIndex(const IColumn & data, const Offsets & offsets,
const ColumnsWithTypeAndName & arguments, PaddedPODArray<UInt64> & matched_idxs);
static bool matchKeyToIndexConst(const IColumn & data, const Offsets & offsets,
const Field & index, PaddedPODArray<UInt64> & matched_idxs);
template <typename DataType>
static bool matchKeyToIndexNumber(const IColumn & data, const Offsets & offsets,
const ColumnsWithTypeAndName & arguments, PaddedPODArray<UInt64> & matched_idxs);
template <typename DataType>
static bool matchKeyToIndexNumberConst(const IColumn & data, const Offsets & offsets,
const Field & index, PaddedPODArray<UInt64> & matched_idxs);
static bool matchKeyToIndexString(const IColumn & data, const Offsets & offsets,
const ColumnsWithTypeAndName & arguments, PaddedPODArray<UInt64> & matched_idxs);
static bool matchKeyToIndexStringConst(const IColumn & data, const Offsets & offsets,
const Field & index, PaddedPODArray<UInt64> & matched_idxs);
template <typename Matcher>
static void executeMatchKeyToIndex(const Offsets & offsets,
PaddedPODArray<UInt64> & matched_idxs, const Matcher & matcher);
};
@ -626,10 +662,8 @@ ColumnPtr FunctionArrayElement::executeArgument(
const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, ArrayImpl::NullMapBuilder & builder, size_t input_rows_count) const
{
auto index = checkAndGetColumn<ColumnVector<IndexType>>(arguments[1].column.get());
if (!index)
return nullptr;
const auto & index_data = index->getData();
if (builder)
@ -704,6 +738,236 @@ ColumnPtr FunctionArrayElement::executeTuple(const ColumnsWithTypeAndName & argu
return ColumnTuple::create(result_tuple_columns);
}
namespace
{
struct MatcherString
{
const ColumnString & data;
const ColumnString & index;
bool match(size_t row_data, size_t row_index) const
{
auto data_ref = data.getDataAt(row_data);
auto index_ref = index.getDataAt(row_index);
return memequalSmallAllowOverflow15(index_ref.data, index_ref.size, data_ref.data, data_ref.size);
}
};
struct MatcherStringConst
{
const ColumnString & data;
const String & index;
bool match(size_t row_data, size_t /* row_index */) const
{
auto data_ref = data.getDataAt(row_data);
return index.size() == data_ref.size && memcmp(index.data(), data_ref.data, data_ref.size) == 0;
}
};
template <typename T>
struct MatcherNumber
{
const PaddedPODArray<T> & data;
const PaddedPODArray<T> & index;
bool match(size_t row_data, size_t row_index) const
{
return data[row_data] == index[row_index];
}
};
template <typename T>
struct MatcherNumberConst
{
const PaddedPODArray<T> & data;
T index;
bool match(size_t row_data, size_t /* row_index */) const
{
return data[row_data] == index;
}
};
}
template <typename Matcher>
void FunctionArrayElement::executeMatchKeyToIndex(
const Offsets & offsets, PaddedPODArray<UInt64> & matched_idxs, const Matcher & matcher)
{
size_t rows = offsets.size();
for (size_t i = 0; i < rows; ++i)
{
bool matched = false;
size_t begin = offsets[i - 1];
size_t end = offsets[i];
for (size_t j = begin; j < end; ++j)
{
if (matcher.match(j, i))
{
matched_idxs.push_back(j - begin + 1);
matched = true;
break;
}
}
if (!matched)
matched_idxs.push_back(0);
}
}
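The loop above is the heart of the map element lookup: for row i it scans the keys slice [offsets[i - 1], offsets[i]) and records a 1-based position within the row, with 0 meaning "key not found" (matching the arrayElement convention, where index 0 yields the default value). A minimal standalone sketch of the same loop with plain vectors (names are illustrative):

```
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

std::vector<uint64_t> matchKeyToIndexSketch(
    const std::vector<std::string> & flat_keys,   /// keys of all rows, flattened
    const std::vector<size_t> & offsets,          /// cumulative end of each row
    const std::string & wanted_key)
{
    std::vector<uint64_t> matched_idxs;
    size_t begin = 0;
    for (size_t end : offsets)
    {
        uint64_t found = 0;
        for (size_t j = begin; j < end; ++j)
        {
            if (flat_keys[j] == wanted_key)
            {
                found = j - begin + 1;   /// 1-based position within the row
                break;
            }
        }
        matched_idxs.push_back(found);   /// 0 means the row has no such key
        begin = end;
    }
    return matched_idxs;
}

int main()
{
    /// Two map rows, {'a':..,'b':..} and {'b':..,'c':..,'a':..}: key 'a' is at positions 1 and 3.
    std::vector<std::string> keys{"a", "b", "b", "c", "a"};
    std::vector<size_t> offsets{2, 5};
    for (uint64_t idx : matchKeyToIndexSketch(keys, offsets, "a"))
        std::cout << idx << '\n';
}
```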
bool FunctionArrayElement::matchKeyToIndexStringConst(
const IColumn & data, const Offsets & offsets,
const Field & index, PaddedPODArray<UInt64> & matched_idxs)
{
const auto * data_string = checkAndGetColumn<ColumnString>(&data);
if (!data_string)
return false;
if (index.getType() != Field::Types::String)
return false;
MatcherStringConst matcher{*data_string, get<const String &>(index)};
executeMatchKeyToIndex(offsets, matched_idxs, matcher);
return true;
}
bool FunctionArrayElement::matchKeyToIndexString(
const IColumn & data, const Offsets & offsets,
const ColumnsWithTypeAndName & arguments, PaddedPODArray<UInt64> & matched_idxs)
{
const auto * index_string = checkAndGetColumn<ColumnString>(arguments[1].column.get());
if (!index_string)
return false;
const auto * data_string = checkAndGetColumn<ColumnString>(&data);
if (!data_string)
return false;
MatcherString matcher{*data_string, *index_string};
executeMatchKeyToIndex(offsets, matched_idxs, matcher);
return true;
}
template <typename DataType>
bool FunctionArrayElement::matchKeyToIndexNumberConst(
const IColumn & data, const Offsets & offsets,
const Field & index, PaddedPODArray<UInt64> & matched_idxs)
{
const auto * data_numeric = checkAndGetColumn<ColumnVector<DataType>>(&data);
if (!data_numeric)
return false;
if (index.getType() != Field::Types::UInt64 && index.getType() != Field::Types::Int64)
return false;
MatcherNumberConst<DataType> matcher{data_numeric->getData(), get<DataType>(index)};
executeMatchKeyToIndex(offsets, matched_idxs, matcher);
return true;
}
template <typename DataType>
bool FunctionArrayElement::matchKeyToIndexNumber(
const IColumn & data, const Offsets & offsets,
const ColumnsWithTypeAndName & arguments, PaddedPODArray<UInt64> & matched_idxs)
{
const auto * index_numeric = checkAndGetColumn<ColumnVector<DataType>>(arguments[1].column.get());
if (!index_numeric)
return false;
const auto * data_numeric = checkAndGetColumn<ColumnVector<DataType>>(&data);
if (!data_numeric)
return false;
MatcherNumber<DataType> matcher{data_numeric->getData(), index_numeric->getData()};
executeMatchKeyToIndex(offsets, matched_idxs, matcher);
return true;
}
bool FunctionArrayElement::matchKeyToIndex(
const IColumn & data, const Offsets & offsets,
const ColumnsWithTypeAndName & arguments, PaddedPODArray<UInt64> & matched_idxs)
{
return matchKeyToIndexNumber<UInt8>(data, offsets, arguments, matched_idxs)
|| matchKeyToIndexNumber<UInt16>(data, offsets, arguments, matched_idxs)
|| matchKeyToIndexNumber<UInt32>(data, offsets, arguments, matched_idxs)
|| matchKeyToIndexNumber<UInt64>(data, offsets, arguments, matched_idxs)
|| matchKeyToIndexNumber<Int8>(data, offsets, arguments, matched_idxs)
|| matchKeyToIndexNumber<Int16>(data, offsets, arguments, matched_idxs)
|| matchKeyToIndexNumber<Int32>(data, offsets, arguments, matched_idxs)
|| matchKeyToIndexNumber<Int64>(data, offsets, arguments, matched_idxs)
|| matchKeyToIndexString(data, offsets, arguments, matched_idxs);
}
bool FunctionArrayElement::matchKeyToIndexConst(
const IColumn & data, const Offsets & offsets,
const Field & index, PaddedPODArray<UInt64> & matched_idxs)
{
return matchKeyToIndexNumberConst<UInt8>(data, offsets, index, matched_idxs)
|| matchKeyToIndexNumberConst<UInt16>(data, offsets, index, matched_idxs)
|| matchKeyToIndexNumberConst<UInt32>(data, offsets, index, matched_idxs)
|| matchKeyToIndexNumberConst<UInt64>(data, offsets, index, matched_idxs)
|| matchKeyToIndexNumberConst<Int8>(data, offsets, index, matched_idxs)
|| matchKeyToIndexNumberConst<Int16>(data, offsets, index, matched_idxs)
|| matchKeyToIndexNumberConst<Int32>(data, offsets, index, matched_idxs)
|| matchKeyToIndexNumberConst<Int64>(data, offsets, index, matched_idxs)
|| matchKeyToIndexStringConst(data, offsets, index, matched_idxs);
}
ColumnPtr FunctionArrayElement::executeMap(
const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count) const
{
const ColumnMap * col_map = typeid_cast<const ColumnMap *>(arguments[0].column.get());
if (!col_map)
return nullptr;
const auto & nested_column = col_map->getNestedColumn();
const auto & keys_data = col_map->getNestedData().getColumn(0);
const auto & values_data = col_map->getNestedData().getColumn(1);
const auto & offsets = nested_column.getOffsets();
/// At the first step, calculate indices in the array of values for the requested keys.
auto indices_column = DataTypeNumber<UInt64>().createColumn();
indices_column->reserve(input_rows_count);
auto & indices_data = assert_cast<ColumnVector<UInt64> &>(*indices_column).getData();
if (!isColumnConst(*arguments[1].column))
{
if (input_rows_count > 0 && !matchKeyToIndex(keys_data, offsets, arguments, indices_data))
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal types of arguments: {}, {} for function ",
arguments[0].type->getName(), arguments[1].type->getName(), getName());
}
else
{
Field index = (*arguments[1].column)[0];
// Get Matched key's value
if (input_rows_count > 0 && !matchKeyToIndexConst(keys_data, offsets, index, indices_data))
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Illegal types of arguments: {}, {} for function {}",
arguments[0].type->getName(), arguments[1].type->getName(), getName());
}
/// Prepare arguments to call arrayElement for array with values and calculated indices at previous step.
ColumnsWithTypeAndName new_arguments =
{
{
ColumnArray::create(values_data.getPtr(), nested_column.getOffsetsPtr()),
std::make_shared<DataTypeArray>(result_type),
""
},
{
std::move(indices_column),
std::make_shared<DataTypeNumber<UInt64>>(),
""
}
};
return executeImpl(new_arguments, result_type, input_rows_count);
}
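Once the per-row key positions are known, executeMap reduces m[key] to an ordinary arrayElement call over the values array. A minimal sketch of that second step under the same flattened layout as in the previous sketch (plain vectors, illustrative names; position 0 maps to the default value):

```
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

std::string mapElementSketch(
    const std::vector<std::string> & flat_values,   /// values of all rows, flattened
    const std::vector<size_t> & offsets,            /// cumulative end of each row
    const std::vector<uint64_t> & matched_idxs,     /// 1-based positions, 0 = miss
    size_t row)
{
    size_t begin = row == 0 ? 0 : offsets[row - 1];
    if (matched_idxs[row] == 0)
        return {};   /// a missing key yields the default value of the value type
    return flat_values[begin + matched_idxs[row] - 1];
}

int main()
{
    /// Same two rows as before; the values for key 'a' are "x" (row 0) and "z" (row 1).
    std::vector<std::string> values{"x", "y", "p", "q", "z"};
    std::vector<size_t> offsets{2, 5};
    std::vector<uint64_t> matched{1, 3};
    return (mapElementSketch(values, offsets, matched, 0) == "x"
        && mapElementSketch(values, offsets, matched, 1) == "z") ? 0 : 1;
}
```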
String FunctionArrayElement::getName() const
{
return name;
@ -711,7 +975,10 @@ String FunctionArrayElement::getName() const
DataTypePtr FunctionArrayElement::getReturnTypeImpl(const DataTypes & arguments) const
{
const DataTypeArray * array_type = checkAndGetDataType<DataTypeArray>(arguments[0].get());
if (const auto * map_type = checkAndGetDataType<DataTypeMap>(arguments[0].get()))
return map_type->getValueType();
const auto * array_type = checkAndGetDataType<DataTypeArray>(arguments[0].get());
if (!array_type)
{
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
@ -734,6 +1001,10 @@ ColumnPtr FunctionArrayElement::executeImpl(const ColumnsWithTypeAndName & argum
/// Check nullability.
bool is_array_of_nullable = false;
const ColumnMap * col_map = checkAndGetColumn<ColumnMap>(arguments[0].column.get());
if (col_map)
return executeMap(arguments, result_type, input_rows_count);
const ColumnArray * col_array = nullptr;
const ColumnArray * col_const_array = nullptr;
@ -826,7 +1097,7 @@ ColumnPtr FunctionArrayElement::perform(const ColumnsWithTypeAndName & arguments
|| (res = executeArgument<Int16>(arguments, result_type, builder, input_rows_count))
|| (res = executeArgument<Int32>(arguments, result_type, builder, input_rows_count))
|| (res = executeArgument<Int64>(arguments, result_type, builder, input_rows_count))))
throw Exception("Second argument for function " + getName() + " must must have UInt or Int type.",
throw Exception("Second argument for function " + getName() + " must have UInt or Int type.",
ErrorCodes::ILLEGAL_COLUMN);
}
else

View File

@ -304,10 +304,10 @@ void FunctionArrayEnumerateRankedExtended<Derived>::executeMethodImpl(
const size_t depth_to_look = arrays_depths.max_array_depth;
const auto & offsets = *offsets_by_depth[depth_to_look - 1];
using Map = ClearableHashMapWithStackMemory<UInt128, UInt32,
using Container = ClearableHashMapWithStackMemory<UInt128, UInt32,
UInt128TrivialHash, INITIAL_SIZE_DEGREE>;
Map indices;
Container indices;
std::vector<size_t> indices_by_depth(depth_to_look);
std::vector<size_t> current_offset_n_by_depth(depth_to_look);

View File

@ -438,22 +438,22 @@ ColumnPtr FunctionArrayIntersect::executeImpl(const ColumnsWithTypeAndName & arg
template <typename T, size_t>
void FunctionArrayIntersect::NumberExecutor::operator()()
{
using Map = ClearableHashMapWithStackMemory<T, size_t, DefaultHash<T>,
using Container = ClearableHashMapWithStackMemory<T, size_t, DefaultHash<T>,
INITIAL_SIZE_DEGREE>;
if (!result && typeid_cast<const DataTypeNumber<T> *>(data_type.get()))
result = execute<Map, ColumnVector<T>, true>(arrays, ColumnVector<T>::create());
result = execute<Container, ColumnVector<T>, true>(arrays, ColumnVector<T>::create());
}
template <typename T, size_t>
void FunctionArrayIntersect::DecimalExecutor::operator()()
{
using Map = ClearableHashMapWithStackMemory<T, size_t, DefaultHash<T>,
using Container = ClearableHashMapWithStackMemory<T, size_t, DefaultHash<T>,
INITIAL_SIZE_DEGREE>;
if (!result)
if (auto * decimal = typeid_cast<const DataTypeDecimal<T> *>(data_type.get()))
result = execute<Map, ColumnDecimal<T>, true>(arrays, ColumnDecimal<T>::create(0, decimal->getScale()));
result = execute<Container, ColumnDecimal<T>, true>(arrays, ColumnDecimal<T>::create(0, decimal->getScale()));
}
template <typename Map, typename ColumnType, bool is_numeric_column>

140
src/Functions/map.cpp Normal file
View File

@ -0,0 +1,140 @@
#include <Functions/IFunctionImpl.h>
#include <Functions/FunctionFactory.h>
#include <DataTypes/DataTypeMap.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypesNumber.h>
#include <Columns/ColumnMap.h>
#include <Columns/ColumnArray.h>
#include <DataTypes/getLeastSupertype.h>
#include <Interpreters/castColumn.h>
#include <memory>
namespace DB
{
namespace ErrorCodes
{
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
}
namespace
{
// map(x, y, ...) is a function that allows you to build a Map from key-value pairs
class FunctionMap : public IFunction
{
public:
static constexpr auto name = "map";
static FunctionPtr create(const Context &)
{
return std::make_shared<FunctionMap>();
}
String getName() const override
{
return name;
}
bool isVariadic() const override
{
return true;
}
size_t getNumberOfArguments() const override
{
return 0;
}
bool isInjective(const ColumnsWithTypeAndName &) const override
{
return true;
}
bool useDefaultImplementationForNulls() const override { return false; }
bool useDefaultImplementationForConstants() const override { return true; }
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
{
if (arguments.size() % 2 != 0)
throw Exception("Function " + getName() + " even number of arguments.", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
DataTypes keys, values;
for (size_t i = 0; i < arguments.size(); i += 2)
{
keys.emplace_back(arguments[i]);
values.emplace_back(arguments[i + 1]);
}
DataTypes tmp;
tmp.emplace_back(getLeastSupertype(keys));
tmp.emplace_back(getLeastSupertype(values));
return std::make_shared<DataTypeMap>(tmp);
}
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count) const override
{
size_t num_elements = arguments.size();
if (num_elements == 0)
return result_type->createColumnConstWithDefaultValue(input_rows_count);
const auto & result_type_map = static_cast<const DataTypeMap &>(*result_type);
const DataTypePtr & key_type = result_type_map.getKeyType();
const DataTypePtr & value_type = result_type_map.getValueType();
Columns columns_holder(num_elements);
ColumnRawPtrs column_ptrs(num_elements);
for (size_t i = 0; i < num_elements; ++i)
{
const auto & arg = arguments[i];
const auto to_type = i % 2 == 0 ? key_type : value_type;
ColumnPtr preprocessed_column = castColumn(arg, to_type);
preprocessed_column = preprocessed_column->convertToFullColumnIfConst();
columns_holder[i] = std::move(preprocessed_column);
column_ptrs[i] = columns_holder[i].get();
}
/// Create and fill the result map.
MutableColumnPtr keys_data = key_type->createColumn();
MutableColumnPtr values_data = value_type->createColumn();
MutableColumnPtr offsets = DataTypeNumber<IColumn::Offset>().createColumn();
size_t total_elements = input_rows_count * num_elements / 2;
keys_data->reserve(total_elements);
values_data->reserve(total_elements);
offsets->reserve(input_rows_count);
IColumn::Offset current_offset = 0;
for (size_t i = 0; i < input_rows_count; ++i)
{
for (size_t j = 0; j < num_elements; j += 2)
{
keys_data->insertFrom(*column_ptrs[j], i);
values_data->insertFrom(*column_ptrs[j + 1], i);
}
current_offset += num_elements / 2;
offsets->insert(current_offset);
}
auto nested_column = ColumnArray::create(
ColumnTuple::create(Columns{std::move(keys_data), std::move(values_data)}),
std::move(offsets));
return ColumnMap::create(nested_column);
}
};
}
void registerFunctionsMap(FunctionFactory & factory)
{
factory.registerFunction<FunctionMap>();
}
}
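executeImpl above flattens all key-value pairs across rows into two data columns plus a cumulative offsets column, which is exactly the ColumnArray-over-ColumnTuple layout that backs ColumnMap. A minimal sketch of that layout with plain vectors (illustrative names; in the real function every row contributes the same num_elements / 2 pairs, one per argument pair):

```
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct FlatMapColumnSketch
{
    std::vector<std::string> keys;   /// all keys, flattened across rows
    std::vector<int64_t> values;     /// all values, flattened across rows
    std::vector<uint64_t> offsets;   /// offsets[i] = total number of pairs after row i
};

FlatMapColumnSketch buildMapColumn(const std::vector<std::vector<std::pair<std::string, int64_t>>> & rows)
{
    FlatMapColumnSketch col;
    uint64_t current_offset = 0;
    for (const auto & row : rows)
    {
        for (const auto & [k, v] : row)
        {
            col.keys.push_back(k);
            col.values.push_back(v);
        }
        current_offset += row.size();
        col.offsets.push_back(current_offset);
    }
    return col;
}

int main()
{
    /// Rows map('k1', 1, 'k2', 2) and map('k3', 3) produce
    /// keys = [k1, k2, k3], values = [1, 2, 3], offsets = [2, 3].
    auto col = buildMapColumn({{{"k1", 1}, {"k2", 2}}, {{"k3", 3}}});
    return (col.offsets.size() == 2 && col.offsets.back() == 3) ? 0 : 1;
}
```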

View File

@ -10,6 +10,7 @@ namespace DB
void registerFunctionsArithmetic(FunctionFactory &);
void registerFunctionsArray(FunctionFactory &);
void registerFunctionsTuple(FunctionFactory &);
void registerFunctionsMap(FunctionFactory &);
void registerFunctionsBitmap(FunctionFactory &);
void registerFunctionsCoding(FunctionFactory &);
void registerFunctionsComparison(FunctionFactory &);
@ -64,6 +65,7 @@ void registerFunctions()
registerFunctionsArithmetic(factory);
registerFunctionsArray(factory);
registerFunctionsTuple(factory);
registerFunctionsMap(factory);
#if !defined(ARCADIA_BUILD)
registerFunctionsBitmap(factory);
#endif

View File

@ -325,6 +325,7 @@ SRCS(
lowCardinalityKeys.cpp
lower.cpp
lowerUTF8.cpp
map.cpp
match.cpp
materialize.cpp
minus.cpp

View File

@ -615,6 +615,23 @@ void InterpreterCreateQuery::validateTableStructure(const ASTCreateQuery & creat
}
}
}
if (!create.attach && !settings.allow_experimental_map_type)
{
for (const auto & name_and_type_pair : properties.columns.getAllPhysical())
{
WhichDataType which(*name_and_type_pair.type);
if (which.isMap())
{
const auto & type_name = name_and_type_pair.type->getName();
String message = "Cannot create table with column '" + name_and_type_pair.name + "' which type is '"
+ type_name + "' because experimental Map type is not allowed. "
+ "Set 'allow_experimental_map_type = 1' setting to enable";
throw Exception(message, ErrorCodes::ILLEGAL_COLUMN);
}
}
}
}
void InterpreterCreateQuery::setEngine(ASTCreateQuery & create) const

View File

@ -5,6 +5,7 @@
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeMap.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypesDecimal.h>
#include <DataTypes/DataTypeString.h>

View File

@ -3,6 +3,7 @@
#include <Parsers/ASTFunction.h>
#include <Parsers/ASTWithAlias.h>
#include <Parsers/ASTSubquery.h>
#include <Parsers/ASTExpressionList.h>
#include <IO/WriteHelpers.h>
#include <IO/WriteBufferFromString.h>
#include <Common/SipHash.h>
@ -395,6 +396,19 @@ void ASTFunction::formatImplWithoutAlias(const FormatSettings & settings, Format
settings.ostr << (settings.hilite ? hilite_operator : "") << ')' << (settings.hilite ? hilite_none : "");
written = true;
}
if (!written && 0 == strcmp(name.c_str(), "map"))
{
settings.ostr << (settings.hilite ? hilite_operator : "") << '{' << (settings.hilite ? hilite_none : "");
for (size_t i = 0; i < arguments->children.size(); ++i)
{
if (i != 0)
settings.ostr << ", ";
arguments->children[i]->formatImpl(settings, state, nested_dont_need_parens);
}
settings.ostr << (settings.hilite ? hilite_operator : "") << '}' << (settings.hilite ? hilite_none : "");
written = true;
}
}
if (!written)

View File

@ -1110,6 +1110,10 @@ bool ParserCollectionOfLiterals<Collection>::parseImpl(Pos & pos, ASTPtr & node,
{
++pos;
}
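/// In a map literal like {'k1':1, 'k2':2}, a colon is accepted as a separator only right after a key,
/// i.e. when an odd number of elements has been collected so far.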
else if (pos->type == TokenType::Colon && std::is_same_v<Collection, Map> && arr.size() % 2 == 1)
{
++pos;
}
else
{
expected.add(pos, "comma or closing bracket");
@ -1130,6 +1134,7 @@ bool ParserCollectionOfLiterals<Collection>::parseImpl(Pos & pos, ASTPtr & node,
template bool ParserCollectionOfLiterals<Array>::parseImpl(Pos & pos, ASTPtr & node, Expected & expected);
template bool ParserCollectionOfLiterals<Tuple>::parseImpl(Pos & pos, ASTPtr & node, Expected & expected);
template bool ParserCollectionOfLiterals<Map>::parseImpl(Pos & pos, ASTPtr & node, Expected & expected);
bool ParserLiteral::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
{
@ -1564,6 +1569,7 @@ bool ParserExpressionElement::parseImpl(Pos & pos, ASTPtr & node, Expected & exp
{
return ParserSubquery().parse(pos, node, expected)
|| ParserTupleOfLiterals().parse(pos, node, expected)
|| ParserMapOfLiterals().parse(pos, node, expected)
|| ParserParenthesisExpression().parse(pos, node, expected)
|| ParserArrayOfLiterals().parse(pos, node, expected)
|| ParserArray().parse(pos, node, expected)

View File

@ -270,6 +270,18 @@ protected:
}
};
class ParserMapOfLiterals : public IParserBase
{
public:
ParserCollectionOfLiterals<Map> map_parser{TokenType::OpeningCurlyBrace, TokenType::ClosingCurlyBrace};
protected:
const char * getName() const override { return "map"; }
bool parseImpl(Pos & pos, ASTPtr & node, Expected & expected) override
{
return map_parser.parse(pos, node, expected);
}
};
class ParserArrayOfLiterals : public IParserBase
{
public:

View File

@ -100,6 +100,7 @@ bool ParserList::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
auto list = std::make_shared<ASTExpressionList>(result_separator);
list->children = std::move(elements);
node = list;
return true;
}

View File

@ -1,11 +1,13 @@
#include <Columns/ColumnConst.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnMap.h>
#include <Columns/ColumnsNumber.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypeNullable.h>
#include <DataTypes/DataTypeString.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeMap.h>
#include <DataTypes/FieldToDataType.h>
#include <Processors/Formats/IRowInputFormat.h>
#include <Functions/FunctionFactory.h>
@ -46,6 +48,7 @@ struct SpecialParserType
bool is_nullable = false;
bool is_array = false;
bool is_tuple = false;
bool is_map = false;
/// Type and nullability
std::vector<std::pair<Field::Types::Which, bool>> nested_types;
@ -119,6 +122,10 @@ static void fillLiteralInfo(DataTypes & nested_types, LiteralInfo & info)
{
field_type = Field::Types::Tuple;
}
else if (type_info.isMap())
{
field_type = Field::Types::Map;
}
else
throw Exception("Unexpected literal type inside Array: " + nested_type->getName() + ". It's a bug",
ErrorCodes::LOGICAL_ERROR);
@ -196,6 +203,13 @@ private:
if (not_null == array.end())
return true;
}
else if (literal->value.getType() == Field::Types::Map)
{
const Map & map = literal->value.get<Map>();
if (map.size() % 2)
return false;
}
String column_name = "_dummy_" + std::to_string(replaced_literals.size());
replaced_literals.emplace_back(literal, column_name, force_nullable);
setDataType(replaced_literals.back());
@ -243,6 +257,15 @@ private:
fillLiteralInfo(nested_types, info);
info.type = std::make_shared<DataTypeTuple>(nested_types);
}
else if (field_type == Field::Types::Map)
{
info.special_parser.is_map = true;
info.type = applyVisitor(FieldToDataType(), info.literal->value);
auto nested_types = assert_cast<const DataTypeMap &>(*info.type).getKeyValueTypes();
fillLiteralInfo(nested_types, info);
info.type = std::make_shared<DataTypeMap>(nested_types);
}
else
throw Exception(String("Unexpected literal type ") + info.literal->value.getTypeName() + ". It's a bug",
ErrorCodes::LOGICAL_ERROR);
@ -453,17 +476,19 @@ bool ConstantExpressionTemplate::parseLiteralAndAssertType(ReadBuffer & istr, co
/// If literal does not fit entirely in the buffer, parsing error will happen.
/// However, it's possible to deduce new template (or use template from cache) after error like it was template mismatch.
if (type_info.is_array || type_info.is_tuple)
if (type_info.is_array || type_info.is_tuple || type_info.is_map)
{
/// TODO faster way to check types without using Parsers
ParserArrayOfLiterals parser_array;
ParserTupleOfLiterals parser_tuple;
ParserMapOfLiterals parser_map;
Tokens tokens_number(istr.position(), istr.buffer().end());
IParser::Pos iterator(tokens_number, settings.max_parser_depth);
Expected expected;
ASTPtr ast;
if (!parser_array.parse(iterator, ast, expected) && !parser_tuple.parse(iterator, ast, expected))
if (!parser_array.parse(iterator, ast, expected) && !parser_tuple.parse(iterator, ast, expected)
&& !parser_map.parse(iterator, ast, expected))
return false;
istr.position() = const_cast<char *>(iterator->begin);
@ -474,8 +499,10 @@ bool ConstantExpressionTemplate::parseLiteralAndAssertType(ReadBuffer & istr, co
DataTypes nested_types;
if (type_info.is_array)
nested_types = { assert_cast<const DataTypeArray &>(*collection_type).getNestedType() };
else
else if (type_info.is_tuple)
nested_types = assert_cast<const DataTypeTuple &>(*collection_type).getElements();
else
nested_types = assert_cast<const DataTypeMap &>(*collection_type).getKeyValueTypes();
for (size_t i = 0; i < nested_types.size(); ++i)
{

View File

@ -3,6 +3,83 @@
#include <Storages/MergeTree/MergeSelector.h>
/**
We have a set of data parts that is dynamically changing - new data parts are added and there is a background merging process.
The background merging process periodically selects a continuous range of data parts to merge.
It tries to optimize the following metrics:
1. Write amplification: total amount of data written on disk (source data + merges) relative to the amount of source data.
It can also be considered as the total amount of work done by merges.
2. The number of data parts in the set at a random moment of time (average + quantiles).
We also take into account the following considerations:
1. Older data parts should be merged less frequently than newer data parts.
2. Larger data parts should be merged less frequently than smaller data parts.
3. If no new parts arrive, we should continue to merge existing data parts to eventually optimize the table.
4. Never allow too many parts, because it will slow down SELECT queries significantly.
5. Multiple background merges can run concurrently but not too many.
It is not possible to optimize both metrics, because they contradict each other.
To lower the number of parts we can merge eagerly, but write amplification will increase.
So we need some balance between optimizing these two metrics.
But some optimizations may improve both metrics.
For example, we can look at the "merge tree" - the tree of data parts that were merged.
If the tree is perfectly balanced, then its depth is proportional to log(data size),
the total amount of work is proportional to data_size * log(data_size),
and the write amplification is proportional to log(data_size).
If it's not balanced (e.g. every new data part is always merged with existing data parts),
its depth is proportional to the data size, and the total amount of work is proportional to data_size^2.
We can also control the "base of the logarithm" - you can look it as the number of data parts
that are merged at once (the tree "arity"). But as the data parts are of different size, we should generalize it:
calculate the ratio between total size of merged parts to the size of the largest part participated in merge.
For example, if we merge 4 parts of size 5, 3, 1, 1 - then "base" will be 2 (10 / 5).
Base of the logarithm (simply called `base` in `SimpleMergeSelector`) is the main knob to control the write amplification.
The more it is, the less is write amplification but we will have more data parts on average.
To fit all the considerations, we also adjust `base` depending on total parts count,
parts size and parts age, with linear interpolation (then `base` is not a constant but a function of multiple variables,
looking like a section of a hyperplane).
Then we apply the algorithm to select the optimal range of data parts to merge.
There is a proof that this algorithm is optimal if we look only a single step into the future.
The best range of data parts is selected.
We also apply some tuning:
- there is a fixed cost of merging small parts (that is added to the size of a data part before all estimations);
- there are some heuristics to "stick" ranges to large data parts.
It is still unclear whether this algorithm is good or optimal at all, and whether it is using the optimal coefficients.
To test and optimize SimpleMergeSelector, we apply the following methods:
- insert/merge simulator: a model simulating parts insertion and merging;
the merge selecting algorithm is applied and the relevant metrics are calculated to allow tuning the algorithm;
- insert/merge simulator on real `system.part_log` from production - it gives realistic information about inserted data parts:
their sizes and the time intervals at which they are inserted;
There is a research thesis dedicated to optimization of merge algorithm:
https://presentations.clickhouse.tech/hse_2019/merge_algorithm.pptx
This work made an attempt to vary the coefficients in SimpleMergeSelector and to solve the optimization task:
maybe some change in coefficients would give a clear win on all metrics. Surprisingly enough, it has found
that our selection of coefficients is near optimal. It has found slightly better coefficients,
but I decided not to use them, because the representativeness of the test data is in question.
This work did not make any attempt to propose any other algorithm.
This work did not make any attempt to analyze the task with analytical methods.
That's why I still believe that there are many opportunities to optimize the merge selection algorithm.
Please do not confuse this task with the similar task in other LSM-based systems (like RocksDB).
Their problem statement is subtly different. Our set consists of data parts
that are completely independent in the data they store. Ranges of primary keys in data parts can intersect.
When doing a SELECT, we read from all data parts. INSERTed data parts come with unknown sizes...
*/
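A minimal sketch of the `base` formula described above (the ratio of the total size of the parts in a candidate range to the size of its largest part); this only illustrates the metric, not the actual SimpleMergeSelector scoring code:

```
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

double mergeBase(const std::vector<uint64_t> & part_sizes)
{
    /// Ratio of the total size of the range to the size of its largest part
    /// (assumes a non-empty candidate range).
    uint64_t total = std::accumulate(part_sizes.begin(), part_sizes.end(), uint64_t(0));
    uint64_t largest = *std::max_element(part_sizes.begin(), part_sizes.end());
    return static_cast<double>(total) / largest;
}

int main()
{
    /// The example from the comment: parts of size 5, 3, 1, 1 give base = 10 / 5 = 2.
    return mergeBase({5, 3, 1, 1}) == 2.0 ? 0 : 1;
}
```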
namespace DB
{

View File

@ -39,7 +39,8 @@ namespace DB
* - the structure of the table (/metadata, /columns)
* - action log with data (/log/log-...,/replicas/replica_name/queue/queue-...);
* - a replica list (/replicas), and replica activity tag (/replicas/replica_name/is_active), replica addresses (/replicas/replica_name/host);
* - select the leader replica (/leader_election) - this is the replica that assigns the merge;
* - select the leader replica (/leader_election) - these are the replicas that assign merges, mutations and partition manipulations
* (after ClickHouse version 20.5 we allow multiple leaders to act concurrently);
* - a set of parts of data on each replica (/replicas/replica_name/parts);
* - list of the last N blocks of data with checksum, for deduplication (/blocks);
* - the list of incremental block numbers (/block_numbers) that we are about to insert,

View File

@ -15,6 +15,7 @@ from subprocess import check_call
from subprocess import Popen
from subprocess import PIPE
from subprocess import CalledProcessError
from subprocess import TimeoutExpired
from datetime import datetime
from time import time, sleep
from errno import ESRCH
@ -114,6 +115,7 @@ def get_db_engine(args):
def run_single_test(args, ext, server_logs_level, client_options, case_file, stdout_file, stderr_file):
# print(client_options)
start_time = datetime.now()
if args.database:
database = args.database
os.environ.setdefault("CLICKHOUSE_DATABASE", database)
@ -129,7 +131,11 @@ def run_single_test(args, ext, server_logs_level, client_options, case_file, std
database = 'test_{suffix}'.format(suffix=random_str())
clickhouse_proc_create = Popen(shlex.split(args.client), stdin=PIPE, stdout=PIPE, stderr=PIPE, universal_newlines=True)
clickhouse_proc_create.communicate(("CREATE DATABASE " + database + get_db_engine(args)))
try:
clickhouse_proc_create.communicate(("CREATE DATABASE " + database + get_db_engine(args)), timeout=args.timeout)
except TimeoutExpired:
total_time = (datetime.now() - start_time).total_seconds()
return clickhouse_proc_create, "", "Timeout creating database {} before test".format(database), total_time
os.environ["CLICKHOUSE_DATABASE"] = database
@ -152,14 +158,24 @@ def run_single_test(args, ext, server_logs_level, client_options, case_file, std
# print(command)
proc = Popen(command, shell=True, env=os.environ)
start_time = datetime.now()
while (datetime.now() - start_time).total_seconds() < args.timeout and proc.poll() is None:
sleep(0.01)
if not args.database:
clickhouse_proc_create = Popen(shlex.split(args.client), stdin=PIPE, stdout=PIPE, stderr=PIPE, universal_newlines=True)
clickhouse_proc_create.communicate(("DROP DATABASE " + database))
seconds_left = max(args.timeout - (datetime.now() - start_time).total_seconds(), 10)
try:
clickhouse_proc_create.communicate(("DROP DATABASE " + database), timeout=seconds_left)
except TimeoutExpired:
# kill the test process because it can also hang
if proc.returncode is None:
try:
proc.kill()
except OSError as e:
if e.errno != ESRCH:
raise
return clickhouse_proc_create, "", "Timeout dropping database {} after test".format(database), total_time
total_time = (datetime.now() - start_time).total_seconds()
@ -305,7 +321,7 @@ def run_tests_array(all_tests_with_params):
if args.testname:
clickhouse_proc = Popen(shlex.split(args.client), stdin=PIPE, stdout=PIPE, stderr=PIPE, universal_newlines=True)
clickhouse_proc.communicate(("SELECT 'Running test {suite}/{case} from pid={pid}';".format(pid = os.getpid(), case = case, suite = suite)))
clickhouse_proc.communicate(("SELECT 'Running test {suite}/{case} from pid={pid}';".format(pid = os.getpid(), case = case, suite = suite)), timeout=10)
if clickhouse_proc.returncode != 0:
failures += 1
@ -330,6 +346,8 @@ def run_tests_array(all_tests_with_params):
print(MSG_FAIL, end='')
print_test_time(total_time)
print(" - Timeout!")
if stderr:
print(stderr)
else:
counter = 1
while proc.returncode != 0 and need_retry(stderr):
@ -562,6 +580,9 @@ def main(args):
else:
args.skip = tests_to_skip_from_list
if args.use_skip_list and not args.sequential:
args.sequential = collect_sequential_list(args.skip_list_path)
base_dir = os.path.abspath(args.queries)
tmp_dir = os.path.abspath(args.tmp)
@ -675,10 +696,19 @@ def main(args):
all_tests = [t for t in all_tests if any([re.search(r, t) for r in args.test])]
all_tests.sort(key=key_func)
parallel_tests = []
sequential_tests = []
for test in all_tests:
if any(s in test for s in args.sequential):
sequential_tests.append(test)
else:
parallel_tests.append(test)
print("Found", len(parallel_tests), "parallel tests and", len(sequential_tests), "sequential tests")
run_n, run_total = args.parallel.split('/')
run_n = float(run_n)
run_total = float(run_total)
tests_n = len(all_tests)
tests_n = len(parallel_tests)
if run_total > tests_n:
run_total = tests_n
if run_n > run_total:
@ -690,18 +720,20 @@ def main(args):
if jobs > run_total:
run_total = jobs
batch_size = len(all_tests) // jobs
all_tests_array = []
for i in range(0, len(all_tests), batch_size):
all_tests_array.append((all_tests[i:i+batch_size], suite, suite_dir, suite_tmp_dir, run_total))
batch_size = len(parallel_tests) // jobs
parallel_tests_array = []
for i in range(0, len(parallel_tests), batch_size):
parallel_tests_array.append((parallel_tests[i:i+batch_size], suite, suite_dir, suite_tmp_dir, run_total))
if jobs > 1:
with closing(multiprocessing.Pool(processes=jobs)) as pool:
pool.map(run_tests_array, all_tests_array)
else:
run_tests_array(all_tests_array[int(run_n)-1])
pool.map(run_tests_array, parallel_tests_array)
total_tests_run += tests_n
run_tests_array((sequential_tests, suite, suite_dir, suite_tmp_dir, run_total))
total_tests_run += len(sequential_tests) + len(parallel_tests)
else:
run_tests_array((all_tests, suite, suite_dir, suite_tmp_dir, run_total))
total_tests_run += len(all_tests)
if args.hung_check:
@ -792,6 +824,20 @@ def collect_tests_to_skip(skip_list_path, build_flags):
return result
def collect_sequential_list(skip_list_path):
if not os.path.exists(skip_list_path):
return set([])
with open(skip_list_path, 'r') as skip_list_file:
content = skip_list_file.read()
# allows to have comments in skip_list.json
skip_dict = json.loads(json_minify(content))
if 'parallel' in skip_dict:
return skip_dict['parallel']
return set([])
if __name__ == '__main__':
parser=ArgumentParser(description='ClickHouse functional tests')
parser.add_argument('-q', '--queries', help='Path to queries dir')
@ -828,6 +874,7 @@ if __name__ == '__main__':
parser.add_argument('--no-stateless', action='store_true', help='Disable all stateless tests')
parser.add_argument('--no-stateful', action='store_true', help='Disable all stateful tests')
parser.add_argument('--skip', nargs='+', help="Skip these tests")
parser.add_argument('--sequential', nargs='+', help="Run these tests sequentially even if --parallel specified")
parser.add_argument('--no-long', action='store_false', dest='no_long', help='Do not run long tests')
parser.add_argument('--client-option', nargs='+', help='Specify additional client argument')
parser.add_argument('--print-time', action='store_true', dest='print_time', help='Print test time')
@ -871,6 +918,9 @@ if __name__ == '__main__':
if args.skip_list_path is None:
args.skip_list_path = os.path.join(args.queries, 'skip_list.json')
if args.sequential is None:
args.sequential = set([])
if args.tmp is None:
args.tmp = args.queries
if args.client is None:

View File

@ -0,0 +1,4 @@
<!-- Ignore limit of concurrent queries in tests -->
<yandex>
<max_concurrent_queries>0</max_concurrent_queries>
</yandex>

View File

@ -27,6 +27,7 @@ ln -sf $SRC_PATH/config.d/secure_ports.xml $DEST_SERVER_PATH/config.d/
ln -sf $SRC_PATH/config.d/clusters.xml $DEST_SERVER_PATH/config.d/
ln -sf $SRC_PATH/config.d/graphite.xml $DEST_SERVER_PATH/config.d/
ln -sf $SRC_PATH/config.d/database_atomic.xml $DEST_SERVER_PATH/config.d/
ln -sf $SRC_PATH/config.d/max_concurrent_queries.xml $DEST_SERVER_PATH/config.d/
ln -sf $SRC_PATH/config.d/test_cluster_with_incorrect_pw.xml $DEST_SERVER_PATH/config.d/
ln -sf $SRC_PATH/config.d/test_keeper_port.xml $DEST_SERVER_PATH/config.d/
ln -sf $SRC_PATH/config.d/logging_no_rotate.xml $DEST_SERVER_PATH/config.d/

View File

@ -135,3 +135,13 @@ named `test.py` containing tests in it. All functions with names starting with `
To assert that two TSV files must be equal, wrap them in the `TSV` class and use the regular `assert`
statement. Example: `assert TSV(result) == TSV(reference)`. In case the assertion fails, `pytest`
will automagically detect the types of variables, and only a small diff of the two files is printed.
### Troubleshooting
If tests are failing for mysterious reasons, this may help:
```
sudo service docker stop
sudo bash -c 'rm -rf /var/lib/docker/*'
sudo service docker start
```

View File

@ -0,0 +1,29 @@
zhangsan
lisi
1111 2222
1112 2224
1113 2226
female
zhangsan
gender
1116
1117
1118
1119
[]
[]
[]
[]
[]
[]
[0,2,0]
[1,2,3]
[1,3,2]
[2,4,4]
[3,5,6]
[4,6,8]
[5,7,10]
[100,20,90]
{1:'1',2:'2',3:'foo'} 1
200000 560000 0
200000 560000 0

View File

@ -0,0 +1,65 @@
set allow_experimental_map_type = 1;
-- String type
drop table if exists table_map;
create table table_map (a Map(String, String)) engine = Memory;
insert into table_map values ({'name':'zhangsan', 'gender':'male'}), ({'name':'lisi', 'gender':'female'});
select a['name'] from table_map;
drop table if exists table_map;
drop table if exists table_map;
create table table_map (a Map(String, UInt64)) engine = MergeTree() order by a;
insert into table_map select map('key1', number, 'key2', number * 2) from numbers(1111, 3);
select a['key1'], a['key2'] from table_map;
drop table if exists table_map;
-- MergeTree Engine
drop table if exists table_map;
create table table_map (a Map(String, String), b String) engine = MergeTree() order by a;
insert into table_map values ({'name':'zhangsan', 'gender':'male'}, 'name'), ({'name':'lisi', 'gender':'female'}, 'gender');
select a[b] from table_map;
select b from table_map where a = map('name','lisi', 'gender', 'female');
drop table if exists table_map;
-- Int type
drop table if exists table_map;
create table table_map(a Map(UInt8, UInt64), b UInt8) Engine = MergeTree() order by b;
insert into table_map select map(number, number+5), number from numbers(1111,4);
select a[b] from table_map;
drop table if exists table_map;
-- Array Type
drop table if exists table_map;
create table table_map(a Map(String, Array(UInt8))) Engine = MergeTree() order by a;
insert into table_map values(map('k1', [1,2,3], 'k2', [4,5,6])), (map('k0', [], 'k1', [100,20,90]));
insert into table_map select map('k1', [number, number + 2, number * 2]) from numbers(6);
insert into table_map select map('k2', [number, number + 2, number * 2]) from numbers(6);
select a['k1'] as col1 from table_map order by col1;
drop table if exists table_map;
SELECT CAST(([1, 2, 3], ['1', '2', 'foo']), 'Map(UInt8, String)') AS map, map[1];
CREATE TABLE table_map (n UInt32, m Map(String, Int))
ENGINE = MergeTree ORDER BY n SETTINGS min_bytes_for_wide_part = 0;
-- conversion from Tuple(Array(K), Array(V))
INSERT INTO table_map SELECT number, (arrayMap(x -> toString(x), range(number % 10 + 2)), range(number % 10 + 2)) FROM numbers(100000);
-- conversion from Array(Tuple(K, V))
INSERT INTO table_map SELECT number, arrayMap(x -> (toString(x), x), range(number % 10 + 2)) FROM numbers(100000);
SELECT sum(m['1']), sum(m['7']), sum(m['100']) FROM table_map;
DROP TABLE IF EXISTS table_map;
CREATE TABLE table_map (n UInt32, m Map(String, Int))
ENGINE = MergeTree ORDER BY n;
-- conversion from Tuple(Array(K), Array(V))
INSERT INTO table_map SELECT number, (arrayMap(x -> toString(x), range(number % 10 + 2)), range(number % 10 + 2)) FROM numbers(100000);
-- conversion from Array(Tuple(K, V))
INSERT INTO table_map SELECT number, arrayMap(x -> (toString(x), x), range(number % 10 + 2)) FROM numbers(100000);
SELECT sum(m['1']), sum(m['7']), sum(m['100']) FROM table_map;
DROP TABLE IF EXISTS table_map;

View File

@ -0,0 +1,46 @@
JSON
{
"meta":
[
{
"name": "m",
"type": "Map(String,UInt32)"
},
{
"name": "m1",
"type": "Map(String,Date)"
},
{
"name": "m2",
"type": "Map(String,Array(UInt32))"
}
],
"data":
[
{
"m": {"k1":1,"k2":2,"k3":3},
"m1": {"k1":"2020-05-05"},
"m2": {"k1":[],"k2":[7,8]}
},
{
"m": {"k1":10,"k3":30},
"m1": {"k2":"2020-06-06"},
"m2": {}
}
],
"rows": 2
}
JSONEachRow
{"m":{"k1":1,"k2":2,"k3":3},"m1":{"k1":"2020-05-05"},"m2":{"k1":[],"k2":[7,8]}}
{"m":{"k1":10,"k3":30},"m1":{"k2":"2020-06-06"},"m2":{}}
CSV
"{'k1':1,'k2':2,'k3':3}","{'k1':'2020-05-05'}","{'k1':[],'k2':[7,8]}"
"{'k1':10,'k3':30}","{'k2':'2020-06-06'}","{}"
TSV
{'k1':1,'k2':2,'k3':3} {'k1':'2020-05-05'} {'k1':[],'k2':[7,8]}
{'k1':10,'k3':30} {'k2':'2020-06-06'} {}
TSKV
m={'k1':1,'k2':2,'k3':3} m1={'k1':'2020-05-05'} m2={'k1':[],'k2':[7,8]}
m={'k1':10,'k3':30} m1={'k2':'2020-06-06'} m2={}

View File

@ -0,0 +1,19 @@
SET allow_experimental_map_type = 1;
SET output_format_write_statistics = 0;
DROP TABLE IF EXISTS map_formats;
CREATE TABLE map_formats (m Map(String, UInt32), m1 Map(String, Date), m2 Map(String, Array(UInt32))) ENGINE = Log;
INSERT INTO map_formats VALUES(map('k1', 1, 'k2', 2, 'k3', 3), map('k1', toDate('2020-05-05')), map('k1', [], 'k2', [7, 8]));
INSERT INTO map_formats VALUES(map('k1', 10, 'k3', 30), map('k2', toDate('2020-06-06')), map());
SELECT 'JSON';
SELECT * FROM map_formats ORDER BY m['k1'] FORMAT JSON;
SELECT 'JSONEachRow';
SELECT * FROM map_formats ORDER BY m['k1'] FORMAT JSONEachRow;
SELECT 'CSV';
SELECT * FROM map_formats ORDER BY m['k1'] FORMAT CSV;
SELECT 'TSV';
SELECT * FROM map_formats ORDER BY m['k1'] FORMAT TSV;
SELECT 'TSKV';
SELECT * FROM map_formats ORDER BY m['k1'] FORMAT TSKV;

View File

@ -0,0 +1,5 @@
{'k1':1,'k2':2,'k3':3} {'k1':'2020-05-05'} {'k1':[],'k2':[7,8]}
{'k1':1,'k2':2,'k3':3} {'k1':'2020-05-05'} {'k1':[],'k2':[7,8]}
{'k1':1,'k2':2,'k3':3} {'k1':'2020-05-05'} {'k1':[],'k2':[7,8]}
{'k1':1,'k2':2,'k3':3} {'k1':'2020-05-05'} {'k1':[],'k2':[7,8]}
{'k1':1,'k2':2,'k3':3} {'k1':'2020-05-05'} {'k1':[],'k2':[7,8]}

View File

@ -0,0 +1,21 @@
#!/usr/bin/env bash
CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
. "$CURDIR"/../shell_config.sh
$CLICKHOUSE_CLIENT -q "DROP TABLE IF EXISTS map_formats_input"
$CLICKHOUSE_CLIENT -q "CREATE TABLE map_formats_input (m Map(String, UInt32), m1 Map(String, Date), m2 Map(String, Array(UInt32))) ENGINE = Log;" --allow_experimental_map_type 1
$CLICKHOUSE_CLIENT -q "INSERT INTO map_formats_input FORMAT JSONEachRow" <<< '{"m":{"k1":1,"k2":2,"k3":3},"m1":{"k1":"2020-05-05"},"m2":{"k1":[],"k2":[7,8]}}'
$CLICKHOUSE_CLIENT -q "SELECT * FROM map_formats_input"
$CLICKHOUSE_CLIENT -q "TRUNCATE TABLE map_formats_input"
$CLICKHOUSE_CLIENT -q "INSERT INTO map_formats_input FORMAT CSV" <<< "\"{'k1':1,'k2':2,'k3':3}\",\"{'k1':'2020-05-05'}\",\"{'k1':[],'k2':[7,8]}\""
$CLICKHOUSE_CLIENT -q "SELECT * FROM map_formats_input"
$CLICKHOUSE_CLIENT -q "TRUNCATE TABLE map_formats_input"
$CLICKHOUSE_CLIENT -q "INSERT INTO map_formats_input FORMAT TSV" <<< "{'k1':1,'k2':2,'k3':3} {'k1':'2020-05-05'} {'k1':[],'k2':[7,8]}"
$CLICKHOUSE_CLIENT -q "SELECT * FROM map_formats_input"
$CLICKHOUSE_CLIENT -q 'SELECT * FROM map_formats_input FORMAT Native' | $CLICKHOUSE_CLIENT -q "INSERT INTO map_formats_input FORMAT Native"
$CLICKHOUSE_CLIENT -q "SELECT * FROM map_formats_input"

View File

@ -106,5 +106,177 @@
"polymorphic-parts": [
"01508_partition_pruning", /// bug, shoud be fixed
"01482_move_to_prewhere_and_cast" /// bug, shoud be fixed
],
"parallel":
[
/// Pessimistic list of tests which work badly in parallel.
/// Probably they need better investigation.
"00062_replicated_merge_tree_alter_zookeeper",
"00109_shard_totals_after_having",
"00110_external_sort",
"00116_storage_set",
"00121_drop_column_zookeeper",
"00133_long_shard_memory_tracker_and_exception_safety",
"00180_attach_materialized_view",
"00226_zookeeper_deduplication_and_unexpected_parts",
"00236_replicated_drop_on_non_leader_zookeeper",
"00305_http_and_readonly",
"00311_array_primary_key",
"00417_kill_query",
"00423_storage_log_single_thread",
"00429_long_http_bufferization",
"00446_clear_column_in_partition_concurrent_zookeeper",
"00446_clear_column_in_partition_zookeeper",
"00463_long_sessions_in_http_interface",
"00505_shard_secure",
"00508_materialized_view_to",
"00516_deduplication_after_drop_partition_zookeeper",
"00534_functions_bad_arguments10",
"00552_or_nullable",
"00564_versioned_collapsing_merge_tree",
"00571_non_exist_database_when_create_materializ_view",
"00575_illegal_column_exception_when_drop_depen_column",
"00599_create_view_with_subquery",
"00612_http_max_query_size",
"00619_union_highlite",
"00620_optimize_on_nonleader_replica_zookeeper",
"00625_arrays_in_nested",
"00626_replace_partition_from_table",
"00626_replace_partition_from_table_zookeeper",
"00633_materialized_view_and_too_many_parts_zookeeper",
"00652_mergetree_mutations",
"00652_replicated_mutations_zookeeper",
"00682_empty_parts_merge",
"00688_low_cardinality_serialization",
"00693_max_block_size_system_tables_columns",
"00699_materialized_view_mutations",
"00701_rollup",
"00715_fetch_merged_or_mutated_part_zookeeper",
"00751_default_databasename_for_view",
"00753_alter_attach",
"00754_alter_modify_column_partitions",
"00754_alter_modify_order_by_replicated_zookeeper",
"00763_long_lock_buffer_alter_destination_table",
"00804_test_alter_compression_codecs",
"00804_test_custom_compression_codecs",
"00804_test_custom_compression_codes_log_storages",
"00804_test_delta_codec_compression",
"00834_cancel_http_readonly_queries_on_client_close",
"00834_kill_mutation",
"00834_kill_mutation_replicated_zookeeper",
"00840_long_concurrent_select_and_drop_deadlock",
"00899_long_attach_memory_limit",
"00910_zookeeper_custom_compression_codecs_replicated",
"00926_adaptive_index_granularity_merge_tree",
"00926_adaptive_index_granularity_pk",
"00926_adaptive_index_granularity_replacing_merge_tree",
"00926_zookeeper_adaptive_index_granularity_replicated_merge_tree",
"00933_alter_ttl",
"00933_reserved_word",
"00933_test_fix_extra_seek_on_compressed_cache",
"00933_ttl_replicated_zookeeper",
"00933_ttl_with_default",
"00955_test_final_mark",
"00976_ttl_with_old_parts",
"00980_merge_alter_settings",
"00980_zookeeper_merge_tree_alter_settings",
"00988_constraints_replication_zookeeper",
"00989_parallel_parts_loading",
"00993_system_parts_race_condition_drop_zookeeper",
"01013_sync_replica_timeout_zookeeper",
"01014_lazy_database_concurrent_recreate_reattach_and_show_tables",
"01015_attach_part",
"01018_ddl_dictionaries_concurrent_requrests",
"01018_ddl_dictionaries_create",
"01018_ddl_dictionaries_select",
"01021_only_tuple_columns",
"01031_mutations_interpreter_and_context",
"01033_dictionaries_lifetime",
"01035_concurrent_move_partition_from_table_zookeeper",
"01045_zookeeper_system_mutations_with_parts_names",
"01053_ssd_dictionary",
"01055_compact_parts_1",
"01060_avro",
"01060_shutdown_table_after_detach",
"01070_materialize_ttl",
"01070_modify_ttl",
"01070_mutations_with_dependencies",
"01071_live_view_detach_dependency",
"01071_prohibition_secondary_index_with_old_format_merge_tree",
"01073_attach_if_not_exists",
"01076_parallel_alter_replicated_zookeeper",
"01079_parallel_alter_add_drop_column_zookeeper",
"01079_parallel_alter_detach_table_zookeeper",
"01083_expressions_in_engine_arguments",
"01085_max_distributed_connections_http",
"01092_memory_profiler",
"01098_temporary_and_external_tables",
"01107_atomic_db_detach_attach",
"01108_restart_replicas_rename_deadlock_zookeeper",
"01110_dictionary_layout_without_arguments",
"01114_database_atomic",
"01127_month_partitioning_consistency_select",
"01130_in_memory_parts_partitons",
"01135_default_and_alter_zookeeper",
"01148_zookeeper_path_macros_unfolding",
"01190_full_attach_syntax",
"01193_metadata_loading",
"01200_mutations_memory_consumption",
"01238_http_memory_tracking",
"01249_bad_arguments_for_bloom_filter",
"01251_dict_is_in_infinite_loop",
"01254_dict_load_after_detach_attach",
"01259_dictionary_custom_settings_ddl",
"01267_alter_default_key_columns_zookeeper",
"01268_dictionary_direct_layout",
"01269_alias_type_differs",
"01272_suspicious_codecs",
"01277_alter_rename_column_constraint_zookeeper",
"01280_ssd_complex_key_dictionary",
"01280_ttl_where_group_by",
"01281_group_by_limit_memory_tracking",
"01281_unsucceeded_insert_select_queries_counter",
"01293_system_distribution_queue",
"01294_lazy_database_concurrent",
"01294_lazy_database_concurrent_recreate_reattach_and_show_tables",
"01305_replica_create_drop_zookeeper",
"01307_multiple_leaders_zookeeper",
"01318_long_unsuccessful_mutation_zookeeper",
"01319_manual_write_to_replicas",
"01338_long_select_and_alter",
"01338_long_select_and_alter_zookeeper",
"01355_alter_column_with_order",
"01355_ilike",
"01357_version_collapsing_attach_detach_zookeeper",
"01375_compact_parts_codecs",
"01378_alter_rename_with_ttl_zookeeper",
"01388_clear_all_columns",
"01396_inactive_replica_cleanup_nodes_zookeeper",
"01412_cache_dictionary_race",
"01414_mutations_and_errors_zookeeper",
"01415_inconsistent_merge_tree_settings",
"01415_sticking_mutations",
"01417_freeze_partition_verbose",
"01417_freeze_partition_verbose_zookeeper",
"01430_modify_sample_by_zookeeper",
"01454_storagememory_data_race_challenge",
"01456_modify_column_type_via_add_drop_update",
"01457_create_as_table_function_structure",
"01459_manual_write_to_replicas",
"01460_DistributedFilesToInsert",
"01465_ttl_recompression",
"01471_calculate_ttl_during_merge",
"01493_alter_remove_properties_zookeeper",
"01493_storage_set_persistency",
"01494_storage_join_persistency",
"01516_drop_table_stress",
"01541_max_memory_usage_for_user",
"attach",
"ddl_dictionaries",
"dictionary",
"limit_memory",
"live_view",
"memory_leak",
"memory_limit"
]
}