Merge branch 'master' into persistent_nukeeper_log_storage

alesapin 2021-02-16 11:42:54 +03:00
commit daee018ea8
49 changed files with 1070 additions and 107 deletions

View File

@ -0,0 +1,29 @@
---
toc_priority:
toc_title:
---
# data_type_name {#data_type-name}
Description.
**Parameters** (Optional)
- `x` — Description. [Type name](relative/path/to/type/dscr.md#type).
- `y` — Description. [Type name](relative/path/to/type/dscr.md#type).
**Examples**
```sql
```
## Additional Info {#additional-info} (Optional)
An additional section can have any name, for example, **Usage**.
**See Also** (Optional)
- [link](#)
[Original article](https://clickhouse.tech/docs/en/data_types/<data-type-name>/) <!--hide-->

View File

@ -136,8 +136,7 @@ The following settings can be specified in configuration file for given endpoint
- `access_key_id` and `secret_access_key` — Optional. Specifies credentials to use with given endpoint.
- `use_environment_credentials` — Optional, default value is `false`. If set to `true`, S3 client will try to obtain credentials from environment variables and Amazon EC2 metadata for given endpoint.
- `header` — Optional, can be specified multiple times. Adds the specified HTTP header to a request to the given endpoint.
This configuration also applies to S3 disks in `MergeTree` table engine family.
- `server_side_encryption_customer_key_base64` — Optional. If specified, required headers for accessing S3 objects with SSE-C encryption will be set.
Example:
@ -149,6 +148,7 @@ Example:
<!-- <secret_access_key>SECRET_ACCESS_KEY</secret_access_key> -->
<!-- <use_environment_credentials>false</use_environment_credentials> -->
<!-- <header>Authorization: Bearer SOME-TOKEN</header> -->
<!-- <server_side_encryption_customer_key_base64>BASE64-ENCODED-KEY</server_side_encryption_customer_key_base64> -->
</endpoint-name>
</s3>
```

View File

@ -715,6 +715,7 @@ Configuration markup:
<endpoint>https://storage.yandexcloud.net/my-bucket/root-path/</endpoint>
<access_key_id>your_access_key_id</access_key_id>
<secret_access_key>your_secret_access_key</secret_access_key>
<server_side_encryption_customer_key_base64>your_base64_encoded_customer_key</server_side_encryption_customer_key_base64>
<proxy>
<uri>http://proxy1</uri>
<uri>http://proxy2</uri>
@ -750,7 +751,8 @@ Optional parameters:
- `metadata_path` — Path on local FS to store metadata files for S3. Default value is `/var/lib/clickhouse/disks/<disk_name>/`.
- `cache_enabled` — Allows caching of mark and index files on the local FS. Default value is `true`.
- `cache_path` — Path on local FS where to store cached mark and index files. Default value is `/var/lib/clickhouse/disks/<disk_name>/cache/`.
- `skip_access_check` — If true disk access checks will not be performed on disk start-up. Default value is `false`.
- `skip_access_check` — If true, disk access checks will not be performed on disk start-up. Default value is `false`.
- `server_side_encryption_customer_key_base64` — If specified, required headers for accessing S3 objects with SSE-C encryption will be set.
S3 disk can be configured as `main` or `cold` storage:

View File

@ -2592,6 +2592,18 @@ Possible values:
Default value: `16`.
## opentelemetry_start_trace_probability {#opentelemetry-start-trace-probability}
Sets the probability that ClickHouse can start a trace for executed queries (if no parent [trace context](https://www.w3.org/TR/trace-context/) is supplied).
Possible values:
- 0 — The trace for all executed queries is disabled (if no parent trace context is supplied).
- Positive floating-point number in the range [0..1]. For example, if the setting value is `0.5`, ClickHouse starts a trace for about half of the queries on average.
- 1 — The trace for all executed queries is enabled.
Default value: `0`.
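As a minimal sketch (the statements below are illustrative, not taken from this page), the setting can be enabled per session so that roughly 10% of queries without a parent trace context are traced:
``` sql
-- Assumed sampling rate of 0.1, chosen only for illustration.
SET opentelemetry_start_trace_probability = 0.1;

-- Spans for the sampled queries are written to system.opentelemetry_span_log.
SELECT count() FROM system.numbers LIMIT 1000;
```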
## optimize_on_insert {#optimize-on-insert}
Enables or disables data transformation before insertion, as if a merge was performed on the inserted block (according to the table engine).
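A hedged sketch of the effect (the table name and values are hypothetical): with a `ReplacingMergeTree` table, enabling the setting collapses duplicate keys inside the inserted block, much as a background merge eventually would.
``` sql
-- Hypothetical table used only to illustrate optimize_on_insert.
CREATE TABLE test_optimize_on_insert (key UInt32) ENGINE = ReplacingMergeTree ORDER BY key;

SET optimize_on_insert = 1;
INSERT INTO test_optimize_on_insert VALUES (1), (1), (2), (2), (3);

-- With the setting enabled, only one row per key remains in the inserted part.
SELECT * FROM test_optimize_on_insert;
```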

View File

@ -0,0 +1,53 @@
# system.opentelemetry_span_log {#system_tables-opentelemetry_span_log}
Contains information about [trace spans](https://opentracing.io/docs/overview/spans/) for executed queries.
Columns:
- `trace_id` ([UUID](../../sql-reference/data-types/uuid.md)) — ID of the trace for the executed query.
- `span_id` ([UInt64](../../sql-reference/data-types/int-uint.md)) — ID of the `trace span`.
- `parent_span_id` ([UInt64](../../sql-reference/data-types/int-uint.md)) — ID of the parent `trace span`.
- `operation_name` ([String](../../sql-reference/data-types/string.md)) — The name of the operation.
- `start_time_us` ([UInt64](../../sql-reference/data-types/int-uint.md)) — The start time of the `trace span` (in microseconds).
- `finish_time_us` ([UInt64](../../sql-reference/data-types/int-uint.md)) — The finish time of the `trace span` (in microseconds).
- `finish_date` ([Date](../../sql-reference/data-types/date.md)) — The finish date of the `trace span`.
- `attribute.names` ([Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md))) — [Attribute](https://opentelemetry.io/docs/go/instrumentation/#attributes) names depending on the `trace span`. They are filled in according to the recommendations in the [OpenTelemetry](https://opentelemetry.io/) standard.
- `attribute.values` ([Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md))) — Attribute values depending on the `trace span`. They are filled in according to the recommendations in the `OpenTelemetry` standard.
**Example**
Query:
``` sql
SELECT * FROM system.opentelemetry_span_log LIMIT 1 FORMAT Vertical;
```
Result:
``` text
Row 1:
──────
trace_id: cdab0847-0d62-61d5-4d38-dd65b19a1914
span_id: 701487461015578150
parent_span_id: 2991972114672045096
operation_name: DB::Block DB::InterpreterSelectQuery::getSampleBlockImpl()
start_time_us: 1612374594529090
finish_time_us: 1612374594529108
finish_date: 2021-02-03
attribute.names: []
attribute.values: []
```
**See Also**
- [OpenTelemetry](../../operations/opentelemetry.md)
[Original article](https://clickhouse.tech/docs/en/operations/system_tables/opentelemetry_span_log) <!--hide-->

View File

@ -1,9 +1,9 @@
---
toc_priority: 47
toc_title: ClickHouse Update
toc_title: ClickHouse Upgrade
---
# ClickHouse Update {#clickhouse-update}
# ClickHouse Upgrade {#clickhouse-upgrade}
If ClickHouse was installed from `deb` packages, execute the following commands on the server:
@ -16,3 +16,19 @@ $ sudo service clickhouse-server restart
If you installed ClickHouse using something other than the recommended `deb` packages, use the appropriate update method.
ClickHouse does not support a distributed update. The operation should be performed consecutively on each separate server. Do not update all the servers on a cluster simultaneously, or the cluster will be unavailable for some time.
To upgrade an older version of ClickHouse to a specific version:
For example:
`xx.yy.a.b` is the current stable version. The latest stable version can be found [here](https://github.com/ClickHouse/ClickHouse/releases).
```bash
$ sudo apt-get update
$ sudo apt-get install clickhouse-server=xx.yy.a.b clickhouse-client=xx.yy.a.b clickhouse-common-static=xx.yy.a.b
$ sudo service clickhouse-server restart
```

View File

@ -0,0 +1,83 @@
---
toc_priority: 65
toc_title: Map(key, value)
---
# Map(key, value) {#data_type-map}
`Map(key, value)` data type stores `key:value` pairs.
**Parameters**
- `key` — The key part of the pair. [String](../../sql-reference/data-types/string.md) or [Integer](../../sql-reference/data-types/int-uint.md).
- `value` — The value part of the pair. [String](../../sql-reference/data-types/string.md), [Integer](../../sql-reference/data-types/int-uint.md) or [Array](../../sql-reference/data-types/array.md).
!!! warning "Warning"
The `Map` data type is currently an experimental feature. To work with it, set `allow_experimental_map_type = 1`.
To get the value from an `a Map('key', 'value')` column, use the `a['key']` syntax. This lookup currently works with linear complexity.
**Examples**
Consider the table:
``` sql
CREATE TABLE table_map (a Map(String, UInt64)) ENGINE=Memory;
INSERT INTO table_map VALUES ({'key1':1, 'key2':10}), ({'key1':2,'key2':20}), ({'key1':3,'key2':30});
```
Select all `key2` values:
```sql
SELECT a['key2'] FROM table_map;
```
Result:
```text
┌─arrayElement(a, 'key2')─┐
│ 10 │
│ 20 │
│ 30 │
└─────────────────────────┘
```
If the requested `key` is missing from the `Map()` column, the query returns the default value: zero for numeric values, an empty string, or an empty array.
```sql
INSERT INTO table_map VALUES ({'key3':100}), ({});
SELECT a['key3'] FROM table_map;
```
Result:
```text
┌─arrayElement(a, 'key3')─┐
│ 100 │
│ 0 │
└─────────────────────────┘
┌─arrayElement(a, 'key3')─┐
│ 0 │
│ 0 │
│ 0 │
└─────────────────────────┘
```
## Convert Tuple to Map Type {#map-and-tuple}
You can cast a `Tuple()` to a `Map()` using the [CAST](../../sql-reference/functions/type-conversion-functions.md#type_conversion_function-cast) function:
``` sql
SELECT CAST(([1, 2, 3], ['Ready', 'Steady', 'Go']), 'Map(UInt8, String)') AS map;
```
``` text
┌─map───────────────────────────┐
│ {1:'Ready',2:'Steady',3:'Go'} │
└───────────────────────────────┘
```
**See Also**
- [map()](../../sql-reference/functions/tuple-map-functions.md#function-map) function
- [CAST()](../../sql-reference/functions/type-conversion-functions.md#type_conversion_function-cast) function
[Original article](https://clickhouse.tech/docs/en/data-types/map/) <!--hide-->

View File

@ -909,6 +909,66 @@ WHERE diff != 1
Same as [runningDifference](../../sql-reference/functions/other-functions.md#other_functions-runningdifference), but it returns the value of the first row as-is, and each subsequent row returns the difference from the previous row.
## runningConcurrency {#runningconcurrency}
Given a series of event beginning and ending times, this function calculates the concurrency of events at each data point, that is, at each beginning time.
!!! warning "Warning"
Events spanning multiple data blocks will not be processed correctly. The function resets its state for each new data block.
The result of the function depends on the order of data in the block. It assumes that the beginning times are sorted in ascending order.
**Syntax**
``` sql
runningConcurrency(begin, end)
```
**Parameters**
- `begin` — A column for the beginning time of events (inclusive). [Date](../../sql-reference/data-types/date.md), [DateTime](../../sql-reference/data-types/datetime.md), or [DateTime64](../../sql-reference/data-types/datetime64.md).
- `end` — A column for the ending time of events (exclusive). [Date](../../sql-reference/data-types/date.md), [DateTime](../../sql-reference/data-types/datetime.md), or [DateTime64](../../sql-reference/data-types/datetime64.md).
Note that the `begin` and `end` columns must have the same type.
**Returned values**
- The concurrency of events at the data point.
Type: [UInt32](../../sql-reference/data-types/int-uint.md)
**Example**
Input table:
``` text
┌───────────────begin─┬─────────────────end─┐
│ 2020-12-01 00:00:00 │ 2020-12-01 00:59:59 │
│ 2020-12-01 00:30:00 │ 2020-12-01 00:59:59 │
│ 2020-12-01 00:40:00 │ 2020-12-01 01:30:30 │
│ 2020-12-01 01:10:00 │ 2020-12-01 01:30:30 │
│ 2020-12-01 01:50:00 │ 2020-12-01 01:59:59 │
└─────────────────────┴─────────────────────┘
```
Query:
``` sql
SELECT runningConcurrency(begin, end) FROM example
```
Result:
``` text
┌─runningConcurrency(begin, end)─┐
│ 1 │
│ 2 │
│ 3 │
│ 2 │
│ 1 │
└────────────────────────────────┘
```
## MACNumToString(num) {#macnumtostringnum}
Accepts a UInt64 number. Interprets it as a MAC address in big endian. Returns a string containing the corresponding MAC address in the format AA:BB:CC:DD:EE:FF (colon-separated numbers in hexadecimal form).
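A small worked example (the input value is chosen here purely for illustration): `0x010203040506` is `1108152157446` in decimal, so the function should return the colon-separated form of those six bytes.
``` sql
-- 1108152157446 == 0x010203040506, interpreted as a big-endian MAC address.
SELECT MACNumToString(1108152157446) AS mac;

-- Expected result: 01:02:03:04:05:06
```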

View File

@ -5,6 +5,68 @@ toc_title: Working with maps
# Functions for maps {#functions-for-working-with-tuple-maps}
## map {#function-map}
Arranges `key:value` pairs into [Map(key, value)](../../sql-reference/data-types/map.md) data type.
**Syntax**
``` sql
map(key1, value1[, key2, value2, ...])
```
**Parameters**
- `key` — The key part of the pair. [String](../../sql-reference/data-types/string.md) or [Integer](../../sql-reference/data-types/int-uint.md).
- `value` — The value part of the pair. [String](../../sql-reference/data-types/string.md), [Integer](../../sql-reference/data-types/int-uint.md) or [Array](../../sql-reference/data-types/array.md).
**Returned value**
- Data structure as `key:value` pairs.
Type: [Map(key, value)](../../sql-reference/data-types/map.md).
**Examples**
Query:
``` sql
SELECT map('key1', number, 'key2', number * 2) FROM numbers(3);
```
Result:
``` text
┌─map('key1', number, 'key2', multiply(number, 2))─┐
│ {'key1':0,'key2':0} │
│ {'key1':1,'key2':2} │
│ {'key1':2,'key2':4} │
└──────────────────────────────────────────────────┘
```
Query:
``` sql
CREATE TABLE table_map (a Map(String, UInt64)) ENGINE = MergeTree() ORDER BY a;
INSERT INTO table_map SELECT map('key1', number, 'key2', number * 2) FROM numbers(3);
SELECT a['key2'] FROM table_map;
```
Result:
``` text
┌─arrayElement(a, 'key2')─┐
│ 0 │
│ 2 │
│ 4 │
└─────────────────────────┘
```
**See Also**
- [Map(key, value)](../../sql-reference/data-types/map.md) data type
## mapAdd {#function-mapadd}
Collects all the keys and sums the corresponding values.
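The full syntax and examples for `mapAdd` sit outside this hunk; as a hedged sketch (assuming the tuple-of-arrays argument form), summing two maps with matching keys looks like this:
``` sql
-- Both arguments contain keys [1, 2], so their values are summed pairwise.
SELECT mapAdd(([toUInt8(1), 2], [1, 1]), ([toUInt8(1), 2], [1, 1])) AS res;

-- Expected result: ([1,2],[2,2])
```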
@ -112,4 +174,4 @@ Result:
└──────────────────────────────┴───────────────────────────────────┘
```
[Original article](https://clickhouse.tech/docs/en/query_language/functions/tuple-map-functions/) <!--hide-->
[Original article](https://clickhouse.tech/docs/en/sql-reference/functions/tuple-map-functions/) <!--hide-->

View File

@ -14,14 +14,16 @@ ClickHouse supports the standard grammar for defining windows and window functio
| Feature | Support or workaround |
| --------| ----------|
| ad hoc window specification (`count(*) over (partition by id order by time desc)`) | yes |
| `WINDOW` clause (`select ... from table window w as (partition by id)`) | yes |
| `ROWS` frame | yes |
| `RANGE` frame | yes, it is the default |
| `GROUPS` frame | no |
| ad hoc window specification (`count(*) over (partition by id order by time desc)`) | supported |
| expressions involving window functions, e.g. `(count(*) over ()) / 2` | not supported, wrap in a subquery ([feature request](https://github.com/ClickHouse/ClickHouse/issues/19857)) |
| `WINDOW` clause (`select ... from table window w as (partition by id)`) | supported |
| `ROWS` frame | supported |
| `RANGE` frame | supported, the default |
| `INTERVAL` syntax for `DateTime` `RANGE OFFSET` frame | not supported, specify the number of seconds instead |
| `GROUPS` frame | not supported |
| Calculating aggregate functions over a frame (`sum(value) over (order by time)`) | all aggregate functions are supported |
| `rank()`, `dense_rank()`, `row_number()` | yes |
| `lag/lead(value, offset)` | no, replace with `any(value) over (.... rows between <offset> preceding and <offset> preceding)`, or `following` for `lead`|
| `rank()`, `dense_rank()`, `row_number()` | supported |
| `lag/lead(value, offset)` | not supported, replace with `any(value) over (.... rows between <offset> preceding and <offset> preceding)`, or `following` for `lead`|
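A hedged sketch of the `lag` workaround from the last row of the table above (the input comes from `numbers(5)` purely for illustration; on this version the experimental flag may also need to be enabled first):
``` sql
SET allow_experimental_window_functions = 1;

-- Emulate lag(number, 1): read the single row exactly one row before the current one.
SELECT
    number,
    any(number) OVER (ORDER BY number ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS lag_1
FROM numbers(5);
```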
## References

View File

@ -2473,6 +2473,18 @@ SELECT SUM(-1), MAX(0) FROM system.one WHERE 0;
Default value: `16`.
## opentelemetry_start_trace_probability {#opentelemetry-start-trace-probability}
Sets the probability that ClickHouse can start a trace for executed queries (if no parent [trace context](https://www.w3.org/TR/trace-context/) is supplied).
Possible values:
- 0 — The trace for all executed queries is disabled (if no parent trace context is supplied).
- Positive floating-point number in the range [0..1]. For example, if the setting value is `0.5`, ClickHouse starts a trace for about half of the queries on average.
- 1 — The trace for all executed queries is enabled.
Default value: `0`.
## optimize_on_insert {#optimize-on-insert}
Enables or disables data transformation before insertion, as if a merge was performed on the inserted block (according to the table engine).

View File

@ -0,0 +1,49 @@
# system.opentelemetry_span_log {#system_tables-opentelemetry_span_log}
Contains information about [trace spans](https://opentracing.io/docs/overview/spans/) for executed queries.
Columns:
- `trace_id` ([UUID](../../sql-reference/data-types/uuid.md)) — ID of the trace for the executed query.
- `span_id` ([UInt64](../../sql-reference/data-types/int-uint.md)) — ID of the `trace span`.
- `parent_span_id` ([UInt64](../../sql-reference/data-types/int-uint.md)) — ID of the parent `trace span`.
- `operation_name` ([String](../../sql-reference/data-types/string.md)) — The name of the operation.
- `start_time_us` ([UInt64](../../sql-reference/data-types/int-uint.md)) — The start time of the `trace span` (in microseconds).
- `finish_time_us` ([UInt64](../../sql-reference/data-types/int-uint.md)) — The finish time of the `trace span` (in microseconds).
- `finish_date` ([Date](../../sql-reference/data-types/date.md)) — The finish date of the `trace span`.
- `attribute.names` ([Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md))) — [Attribute](https://opentelemetry.io/docs/go/instrumentation/#attributes) names depending on the `trace span`. They are filled in according to the recommendations in the [OpenTelemetry](https://opentelemetry.io/) standard.
- `attribute.values` ([Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md))) — Attribute values depending on the `trace span`. They are filled in according to the recommendations in the `OpenTelemetry` standard.
**Example**
Query:
``` sql
SELECT * FROM system.opentelemetry_span_log LIMIT 1 FORMAT Vertical;
```
Result:
``` text
Row 1:
──────
trace_id: cdab0847-0d62-61d5-4d38-dd65b19a1914
span_id: 701487461015578150
parent_span_id: 2991972114672045096
operation_name: DB::Block DB::InterpreterSelectQuery::getSampleBlockImpl()
start_time_us: 1612374594529090
finish_time_us: 1612374594529108
finish_date: 2021-02-03
attribute.names: []
attribute.values: []
```
[Original article](https://clickhouse.tech/docs/ru/operations/system_tables/opentelemetry_span_log) <!--hide-->

View File

@ -0,0 +1,69 @@
---
toc_priority: 65
toc_title: Map(key, value)
---
# Map(key, value) {#data_type-map}
The `Map(key, value)` data type stores `key:value` pairs.
**Parameters**
- `key` — The key part of the pair. [String](../../sql-reference/data-types/string.md) or [Integer](../../sql-reference/data-types/int-uint.md).
- `value` — The value part of the pair. [String](../../sql-reference/data-types/string.md), [Integer](../../sql-reference/data-types/int-uint.md) or [Array](../../sql-reference/data-types/array.md).
!!! warning "Warning"
The `Map` data type is currently an experimental feature. To work with it, set `allow_experimental_map_type = 1`.
To get the value from an `a Map('key', 'value')` column, use the `a['key']` syntax. This lookup currently works with linear complexity.
**Examples**
Consider the table:
``` sql
CREATE TABLE table_map (a Map(String, UInt64)) ENGINE=Memory;
INSERT INTO table_map VALUES ({'key1':1, 'key2':10}), ({'key1':2,'key2':20}), ({'key1':3,'key2':30});
```
Select all `key2` values:
```sql
SELECT a['key2'] FROM table_map;
```
Result:
```text
┌─arrayElement(a, 'key2')─┐
│ 10 │
│ 20 │
│ 30 │
└─────────────────────────┘
```
If the requested `key` is missing from the `Map()` column, the query returns the default value: zero for numeric values, an empty string, or an empty array.
```sql
INSERT INTO table_map VALUES ({'key3':100}), ({});
SELECT a['key3'] FROM table_map;
```
Result:
```text
┌─arrayElement(a, 'key3')─┐
│ 100 │
│ 0 │
└─────────────────────────┘
┌─arrayElement(a, 'key3')─┐
│ 0 │
│ 0 │
│ 0 │
└─────────────────────────┘
```
**See Also**
- [map()](../../sql-reference/functions/tuple-map-functions.md#function-map) function
- [CAST()](../../sql-reference/functions/type-conversion-functions.md#type_conversion_function-cast) function
[Original article](https://clickhouse.tech/docs/ru/data-types/map/) <!--hide-->

View File

@ -5,6 +5,66 @@ toc_title: Working with maps
# Functions for working with maps {#functions-for-working-with-tuple-maps}
## map {#function-map}
Arranges `key:value` pairs into [Map(key, value)](../../sql-reference/data-types/map.md) data type.
**Syntax**
``` sql
map(key1, value1[, key2, value2, ...])
```
**Parameters**
- `key` — The key part of the pair. [String](../../sql-reference/data-types/string.md) or [Integer](../../sql-reference/data-types/int-uint.md).
- `value` — The value part of the pair. [String](../../sql-reference/data-types/string.md), [Integer](../../sql-reference/data-types/int-uint.md) or [Array](../../sql-reference/data-types/array.md).
**Returned value**
- Data structure as `key:value` pairs.
Type: [Map(key, value)](../../sql-reference/data-types/map.md).
**Examples**
Query:
``` sql
SELECT map('key1', number, 'key2', number * 2) FROM numbers(3);
```
Result:
``` text
┌─map('key1', number, 'key2', multiply(number, 2))─┐
│ {'key1':0,'key2':0} │
│ {'key1':1,'key2':2} │
│ {'key1':2,'key2':4} │
└──────────────────────────────────────────────────┘
```
Query:
``` sql
CREATE TABLE table_map (a Map(String, UInt64)) ENGINE = MergeTree() ORDER BY a;
INSERT INTO table_map SELECT map('key1', number, 'key2', number * 2) FROM numbers(3);
SELECT a['key2'] FROM table_map;
```
Result:
``` text
┌─arrayElement(a, 'key2')─┐
│ 0 │
│ 2 │
│ 4 │
└─────────────────────────┘
```
**See Also**
- [Map(key, value)](../../sql-reference/data-types/map.md) data type
## mapAdd {#function-mapadd}
Collects all the keys and sums the corresponding values.

View File

@ -455,7 +455,14 @@ template <>
struct LowCardinalityKeys<false> {};
/// For the case when all keys are of fixed length, and they fit in N (for example, 128) bits.
template <typename Value, typename Key, typename Mapped, bool has_nullable_keys_ = false, bool has_low_cardinality_ = false, bool use_cache = true, bool need_offset = false>
template <
typename Value,
typename Key,
typename Mapped,
bool has_nullable_keys_ = false,
bool has_low_cardinality_ = false,
bool use_cache = true,
bool need_offset = false>
struct HashMethodKeysFixed
: private columns_hashing_impl::BaseStateKeysFixed<Key, has_nullable_keys_>
, public columns_hashing_impl::HashMethodBase<HashMethodKeysFixed<Value, Key, Mapped, has_nullable_keys_, has_low_cardinality_, use_cache, need_offset>, Value, Mapped, use_cache, need_offset>
@ -471,6 +478,12 @@ struct HashMethodKeysFixed
Sizes key_sizes;
size_t keys_size;
/// SSSE3 shuffle method can be used. Shuffle masks will be calculated and stored here.
#if defined(__SSSE3__) && !defined(MEMORY_SANITIZER)
std::unique_ptr<uint8_t[]> masks;
std::unique_ptr<const char*[]> columns_data;
#endif
HashMethodKeysFixed(const ColumnRawPtrs & key_columns, const Sizes & key_sizes_, const HashMethodContextPtr &)
: Base(key_columns), key_sizes(std::move(key_sizes_)), keys_size(key_columns.size())
{
@ -491,6 +504,58 @@ struct HashMethodKeysFixed
low_cardinality_keys.nested_columns[i] = key_columns[i];
}
}
#if defined(__SSSE3__) && !defined(MEMORY_SANITIZER)
if constexpr (!has_low_cardinality && !has_nullable_keys && sizeof(Key) <= 16)
{
/** The task is to "pack" multiple fixed-size fields into single larger Key.
* Example: pack UInt8, UInt32, UInt16, UInt64 into UInt128 key:
* [- ---- -- -------- -] - the resulting uint128 key
* ^ ^ ^ ^ ^
* u8 u32 u16 u64 zero
*
* We can do it with the help of SSSE3 shuffle instruction.
*
* There will be a mask for every GROUP BY element (keys_size masks in total).
* Every mask has 16 bytes, but only sizeof(Key) bytes are used (we don't care about the others).
*
* Every byte in the mask has the following meaning:
* - if it is 0..15, take the element at this index from source register and place here in the result;
* - if it is 0xFF, set the element in the result to zero.
*
* Example:
* We want to copy UInt32 to offset 1 in the destination and set other bytes in the destination as zero.
* The corresponding mask will be: FF, 0, 1, 2, 3, FF, FF, FF, FF, FF, FF, FF, FF, FF, FF, FF
*
* The max size of destination is 16 bytes, because we cannot process more with SSSE3.
*
* The method is disabled under MSan, because it's allowed
* to load into SSE register and process up to 15 bytes of uninitialized memory in columns padding.
* We don't use this uninitialized memory but MSan cannot look "into" the shuffle instruction.
*
* 16-byte masks can be placed overlapping; only the first sizeof(Key) bytes are relevant in each mask.
* We initialize them to 0xFF and then set the needed elements.
*/
size_t total_masks_size = sizeof(Key) * keys_size + (16 - sizeof(Key));
masks.reset(new uint8_t[total_masks_size]);
memset(masks.get(), 0xFF, total_masks_size);
size_t offset = 0;
for (size_t i = 0; i < keys_size; ++i)
{
for (size_t j = 0; j < key_sizes[i]; ++j)
{
masks[i * sizeof(Key) + offset] = j;
++offset;
}
}
columns_data.reset(new const char*[keys_size]);
for (size_t i = 0; i < keys_size; ++i)
columns_data[i] = Base::getActualColumns()[i]->getRawData().data;
}
#endif
}
ALWAYS_INLINE Key getKeyHolder(size_t row, Arena &) const
@ -506,6 +571,10 @@ struct HashMethodKeysFixed
return packFixed<Key, true>(row, keys_size, low_cardinality_keys.nested_columns, key_sizes,
&low_cardinality_keys.positions, &low_cardinality_keys.position_sizes);
#if defined(__SSSE3__) && !defined(MEMORY_SANITIZER)
if constexpr (!has_low_cardinality && !has_nullable_keys && sizeof(Key) <= 16)
return packFixedShuffle<Key>(columns_data.get(), keys_size, key_sizes.data(), row, masks.get());
#endif
return packFixed<Key>(row, keys_size, Base::getActualColumns(), key_sizes);
}
}

View File

@ -166,6 +166,8 @@ void NuKeeperStateMachine::create_snapshot(
}
}
LOG_DEBUG(log, "Created snapshot {}", s.get_last_log_idx());
nuraft::ptr<std::exception> except(nullptr);
bool ret = true;
when_done(ret, except);

View File

@ -7,6 +7,7 @@
#include "DiskS3.h"
#include "Disks/DiskCacheWrapper.h"
#include "Disks/DiskFactory.h"
#include "Storages/StorageS3Settings.h"
#include "ProxyConfiguration.h"
#include "ProxyListConfiguration.h"
#include "ProxyResolverConfiguration.h"
@ -137,6 +138,8 @@ void registerDiskS3(DiskFactory & factory)
uri.is_virtual_hosted_style,
config.getString(config_prefix + ".access_key_id", ""),
config.getString(config_prefix + ".secret_access_key", ""),
config.getString(config_prefix + ".server_side_encryption_customer_key_base64", ""),
{},
config.getBool(config_prefix + ".use_environment_credentials", config.getBool("s3.use_environment_credentials", false))
);

View File

@ -3,6 +3,10 @@
namespace DB
{
namespace ErrorCodes
{
extern const int INCORRECT_DATA;
}
std::pair<bool, size_t> fileSegmentationEngineJSONEachRowImpl(ReadBuffer & in, DB::Memory<> & memory, size_t min_chunk_size)
{
@ -15,6 +19,12 @@ std::pair<bool, size_t> fileSegmentationEngineJSONEachRowImpl(ReadBuffer & in, D
while (loadAtPosition(in, memory, pos) && (balance || memory.size() + static_cast<size_t>(pos - in.position()) < min_chunk_size))
{
const auto current_object_size = memory.size() + static_cast<size_t>(pos - in.position());
if (current_object_size > 10 * min_chunk_size)
throw ParsingException("Size of JSON object is extremely large. Expected not greater than " +
std::to_string(min_chunk_size) + " bytes, but current is " + std::to_string(current_object_size) +
" bytes per row. Increase the value setting 'min_chunk_bytes_for_parallel_parsing' or check your data manually, most likely JSON is malformed", ErrorCodes::INCORRECT_DATA);
if (quotes)
{
pos = find_first_symbols<'\\', '"'>(pos, in.buffer().end());

View File

@ -45,6 +45,7 @@ void registerFunctionTimeZone(FunctionFactory &);
void registerFunctionRunningAccumulate(FunctionFactory &);
void registerFunctionRunningDifference(FunctionFactory &);
void registerFunctionRunningDifferenceStartingWithFirstValue(FunctionFactory &);
void registerFunctionRunningConcurrency(FunctionFactory &);
void registerFunctionFinalizeAggregation(FunctionFactory &);
void registerFunctionToLowCardinality(FunctionFactory &);
void registerFunctionLowCardinalityIndices(FunctionFactory &);
@ -112,6 +113,7 @@ void registerFunctionsMiscellaneous(FunctionFactory & factory)
registerFunctionRunningAccumulate(factory);
registerFunctionRunningDifference(factory);
registerFunctionRunningDifferenceStartingWithFirstValue(factory);
registerFunctionRunningConcurrency(factory);
registerFunctionFinalizeAggregation(factory);
registerFunctionToLowCardinality(factory);
registerFunctionLowCardinalityIndices(factory);

View File

@ -0,0 +1,223 @@
#include <Columns/ColumnVector.h>
#include <Core/callOnTypeIndex.h>
#include <DataTypes/IDataType.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypeDate.h>
#include <DataTypes/DataTypeDateTime.h>
#include <DataTypes/DataTypeDateTime64.h>
#include <DataTypes/DataTypeNullable.h>
#include <Formats/FormatSettings.h>
#include <Functions/FunctionFactory.h>
#include <Functions/IFunctionImpl.h>
#include <IO/WriteBufferFromString.h>
#include <common/defines.h>
#include <set>
namespace DB
{
namespace ErrorCodes
{
extern const int ILLEGAL_COLUMN;
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int INCORRECT_DATA;
}
template <typename Name, typename ArgDataType, typename ConcurrencyDataType>
class ExecutableFunctionRunningConcurrency : public IExecutableFunctionImpl
{
public:
String getName() const override
{
return Name::name;
}
ColumnPtr execute(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
{
using ColVecArg = typename ArgDataType::ColumnType;
const ColVecArg * col_begin = checkAndGetColumn<ColVecArg>(arguments[0].column.get());
const ColVecArg * col_end = checkAndGetColumn<ColVecArg>(arguments[1].column.get());
if (!col_begin || !col_end)
throw Exception("Constant columns are not supported at the moment",
ErrorCodes::ILLEGAL_COLUMN);
const typename ColVecArg::Container & vec_begin = col_begin->getData();
const typename ColVecArg::Container & vec_end = col_end->getData();
using ColVecConc = typename ConcurrencyDataType::ColumnType;
typename ColVecConc::MutablePtr col_concurrency = ColVecConc::create(input_rows_count);
typename ColVecConc::Container & vec_concurrency = col_concurrency->getData();
std::multiset<typename ArgDataType::FieldType> ongoing_until;
for (size_t i = 0; i < input_rows_count; ++i)
{
const auto begin = vec_begin[i];
const auto end = vec_end[i];
if (unlikely(begin > end))
{
const FormatSettings default_format;
WriteBufferFromOwnString buf_begin, buf_end;
arguments[0].type->serializeAsTextQuoted(*(arguments[0].column), i, buf_begin, default_format);
arguments[1].type->serializeAsTextQuoted(*(arguments[1].column), i, buf_end, default_format);
throw Exception(
"Incorrect order of events: " + buf_begin.str() + " > " + buf_end.str(),
ErrorCodes::INCORRECT_DATA);
}
ongoing_until.insert(end);
// Erase all the elements from "ongoing_until" which
// are less than or equal to "begin", i.e. durations
// that have already ended. We consider "begin" to be
// inclusive, and "end" to be exclusive.
ongoing_until.erase(
ongoing_until.begin(), ongoing_until.upper_bound(begin));
vec_concurrency[i] = ongoing_until.size();
}
return col_concurrency;
}
bool useDefaultImplementationForConstants() const override
{
return true;
}
};
template <typename Name, typename ArgDataType, typename ConcurrencyDataType>
class FunctionBaseRunningConcurrency : public IFunctionBaseImpl
{
public:
explicit FunctionBaseRunningConcurrency(DataTypes argument_types_, DataTypePtr return_type_)
: argument_types(std::move(argument_types_))
, return_type(std::move(return_type_)) {}
String getName() const override
{
return Name::name;
}
const DataTypes & getArgumentTypes() const override
{
return argument_types;
}
const DataTypePtr & getResultType() const override
{
return return_type;
}
ExecutableFunctionImplPtr prepare(const ColumnsWithTypeAndName &) const override
{
return std::make_unique<ExecutableFunctionRunningConcurrency<Name, ArgDataType, ConcurrencyDataType>>();
}
bool isStateful() const override
{
return true;
}
private:
DataTypes argument_types;
DataTypePtr return_type;
};
template <typename Name, typename ConcurrencyDataType>
class RunningConcurrencyOverloadResolver : public IFunctionOverloadResolverImpl
{
template <typename T>
struct TypeTag
{
using Type = T;
};
/// Call a polymorphic lambda with a type tag of src_type.
template <typename F>
void dispatchForSourceType(const IDataType & src_type, F && f) const
{
WhichDataType which(src_type);
switch (which.idx)
{
case TypeIndex::Date: f(TypeTag<DataTypeDate>()); break;
case TypeIndex::DateTime: f(TypeTag<DataTypeDateTime>()); break;
case TypeIndex::DateTime64: f(TypeTag<DataTypeDateTime64>()); break;
default:
throw Exception(
"Arguments for function " + getName() + " must be Date, DateTime, or DateTime64.",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
}
}
public:
static constexpr auto name = Name::name;
static FunctionOverloadResolverImplPtr create(const Context &)
{
return std::make_unique<RunningConcurrencyOverloadResolver<Name, ConcurrencyDataType>>();
}
String getName() const override
{
return Name::name;
}
FunctionBaseImplPtr build(const ColumnsWithTypeAndName & arguments, const DataTypePtr & return_type) const override
{
// The type of the second argument must match with that of the first one.
if (unlikely(!arguments[1].type->equals(*(arguments[0].type))))
{
throw Exception(
"Function " + getName() + " must be called with two arguments having the same type.",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
}
DataTypes argument_types = { arguments[0].type, arguments[1].type };
FunctionBaseImplPtr base;
dispatchForSourceType(*(arguments[0].type), [&](auto arg_type_tag) // Throws when the type is inappropriate.
{
using Tag = decltype(arg_type_tag);
using ArgDataType = typename Tag::Type;
base = std::make_unique<FunctionBaseRunningConcurrency<Name, ArgDataType, ConcurrencyDataType>>(argument_types, return_type);
});
return base;
}
DataTypePtr getReturnType(const DataTypes &) const override
{
return std::make_shared<ConcurrencyDataType>();
}
size_t getNumberOfArguments() const override
{
return 2;
}
bool isInjective(const ColumnsWithTypeAndName &) const override
{
return false;
}
bool isStateful() const override
{
return true;
}
bool useDefaultImplementationForNulls() const override
{
return false;
}
};
struct NameRunningConcurrency
{
static constexpr auto name = "runningConcurrency";
};
void registerFunctionRunningConcurrency(FunctionFactory & factory)
{
factory.registerFunction<RunningConcurrencyOverloadResolver<NameRunningConcurrency, DataTypeUInt32>>();
}
}

View File

@ -423,6 +423,7 @@ SRCS(
rowNumberInAllBlocks.cpp
rowNumberInBlock.cpp
runningAccumulate.cpp
runningConcurrency.cpp
runningDifference.cpp
runningDifferenceStartingWithFirstValue.cpp
sigmoid.cpp

View File

@ -1104,9 +1104,9 @@ void saveUpToPosition(ReadBuffer & in, DB::Memory<> & memory, char * current)
assert(current >= in.position());
assert(current <= in.buffer().end());
const int old_bytes = memory.size();
const int additional_bytes = current - in.position();
const int new_bytes = old_bytes + additional_bytes;
const size_t old_bytes = memory.size();
const size_t additional_bytes = current - in.position();
const size_t new_bytes = old_bytes + additional_bytes;
/// There are no new bytes to add to memory.
/// No need to do extra stuff.
if (new_bytes == 0)

View File

@ -13,6 +13,7 @@
# include <aws/core/platform/Environment.h>
# include <aws/core/utils/logging/LogMacros.h>
# include <aws/core/utils/logging/LogSystemInterface.h>
# include <aws/core/utils/HashingUtils.h>
# include <aws/s3/S3Client.h>
# include <aws/core/http/HttpClientFactory.h>
# include <IO/S3/PocoHTTPClientFactory.h>
@ -273,56 +274,12 @@ namespace S3
return ret;
}
/// This method is not static because it requires ClientFactory to be initialized.
std::shared_ptr<Aws::S3::S3Client> ClientFactory::create( // NOLINT
const String & endpoint,
bool is_virtual_hosted_style,
const String & access_key_id,
const String & secret_access_key,
bool use_environment_credentials,
const RemoteHostFilter & remote_host_filter,
unsigned int s3_max_redirects)
{
PocoHTTPClientConfiguration client_configuration(remote_host_filter, s3_max_redirects);
if (!endpoint.empty())
client_configuration.endpointOverride = endpoint;
return create(client_configuration,
is_virtual_hosted_style,
access_key_id,
secret_access_key,
use_environment_credentials);
}
std::shared_ptr<Aws::S3::S3Client> ClientFactory::create( // NOLINT
const PocoHTTPClientConfiguration & cfg_,
bool is_virtual_hosted_style,
const String & access_key_id,
const String & secret_access_key,
bool use_environment_credentials)
{
Aws::Auth::AWSCredentials credentials(access_key_id, secret_access_key);
PocoHTTPClientConfiguration client_configuration = cfg_;
client_configuration.updateSchemeAndRegion();
return std::make_shared<Aws::S3::S3Client>(
std::make_shared<S3CredentialsProviderChain>(
client_configuration,
credentials,
use_environment_credentials), // AWS credentials provider.
std::move(client_configuration), // Client configuration.
Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy::Never, // Sign policy.
is_virtual_hosted_style || client_configuration.endpointOverride.empty() // Use virtual addressing if endpoint is not specified.
);
}
std::shared_ptr<Aws::S3::S3Client> ClientFactory::create( // NOLINT
const PocoHTTPClientConfiguration & cfg_,
bool is_virtual_hosted_style,
const String & access_key_id,
const String & secret_access_key,
const String & server_side_encryption_customer_key_base64,
HeaderCollection headers,
bool use_environment_credentials)
{
@ -331,7 +288,28 @@ namespace S3
Aws::Auth::AWSCredentials credentials(access_key_id, secret_access_key);
auto auth_signer = std::make_shared<S3AuthSigner>(client_configuration, std::move(credentials), std::move(headers), use_environment_credentials);
if (!server_side_encryption_customer_key_base64.empty())
{
/// See S3Client::GeneratePresignedUrlWithSSEC().
headers.push_back({Aws::S3::SSEHeaders::SERVER_SIDE_ENCRYPTION_CUSTOMER_ALGORITHM,
Aws::S3::Model::ServerSideEncryptionMapper::GetNameForServerSideEncryption(Aws::S3::Model::ServerSideEncryption::AES256)});
headers.push_back({Aws::S3::SSEHeaders::SERVER_SIDE_ENCRYPTION_CUSTOMER_KEY,
server_side_encryption_customer_key_base64});
Aws::Utils::ByteBuffer buffer = Aws::Utils::HashingUtils::Base64Decode(server_side_encryption_customer_key_base64);
String str_buffer(reinterpret_cast<char *>(buffer.GetUnderlyingData()), buffer.GetLength());
headers.push_back({Aws::S3::SSEHeaders::SERVER_SIDE_ENCRYPTION_CUSTOMER_KEY_MD5,
Aws::Utils::HashingUtils::Base64Encode(Aws::Utils::HashingUtils::CalculateMD5(str_buffer))});
}
auto auth_signer = std::make_shared<S3AuthSigner>(
client_configuration,
std::move(credentials),
std::move(headers),
use_environment_credentials);
return std::make_shared<Aws::S3::S3Client>(
std::move(auth_signer),
std::move(client_configuration), // Client configuration.

View File

@ -31,27 +31,12 @@ public:
static ClientFactory & instance();
std::shared_ptr<Aws::S3::S3Client> create(
const String & endpoint,
bool is_virtual_hosted_style,
const String & access_key_id,
const String & secret_access_key,
bool use_environment_credentials,
const RemoteHostFilter & remote_host_filter,
unsigned int s3_max_redirects);
std::shared_ptr<Aws::S3::S3Client> create(
const PocoHTTPClientConfiguration & cfg,
bool is_virtual_hosted_style,
const String & access_key_id,
const String & secret_access_key,
bool use_environment_credentials);
std::shared_ptr<Aws::S3::S3Client> create(
const PocoHTTPClientConfiguration & cfg,
bool is_virtual_hosted_style,
const String & access_key_id,
const String & secret_access_key,
const String & server_side_encryption_customer_key_base64,
HeaderCollection headers,
bool use_environment_credentials);

View File

@ -188,14 +188,14 @@ void WriteBufferFromHTTPServerResponse::onProgress(const Progress & progress)
void WriteBufferFromHTTPServerResponse::finalize()
{
if (offset())
next();
if (out)
{
next();
if (out)
out.reset();
out->next();
out.reset();
}
else
if (!offset())
{
/// If no remaining data, just send headers.
std::lock_guard lock(mutex);

View File

@ -15,6 +15,10 @@
#include <Columns/ColumnFixedString.h>
#include <Columns/ColumnLowCardinality.h>
#if defined(__SSSE3__) && !defined(MEMORY_SANITIZER)
#include <tmmintrin.h>
#endif
template <>
struct DefaultHash<StringRef> : public StringRefHash {};
@ -255,4 +259,32 @@ static inline StringRef ALWAYS_INLINE serializeKeysToPoolContiguous(
}
/** Pack elements with shuffle instruction.
* See the explanation in ColumnsHashing.h
*/
#if defined(__SSSE3__) && !defined(MEMORY_SANITIZER)
template <typename T>
static T inline packFixedShuffle(
const char * __restrict * __restrict srcs,
size_t num_srcs,
const size_t * __restrict elem_sizes,
size_t idx,
const uint8_t * __restrict masks)
{
__m128i res{};
for (size_t i = 0; i < num_srcs; ++i)
{
res = _mm_xor_si128(res,
_mm_shuffle_epi8(
_mm_loadu_si128(reinterpret_cast<const __m128i *>(srcs[i] + elem_sizes[i] * idx)),
_mm_loadu_si128(reinterpret_cast<const __m128i *>(&masks[i * sizeof(T)]))));
}
T out;
__builtin_memcpy(&out, &res, sizeof(T));
return out;
}
#endif
}

View File

@ -365,7 +365,13 @@ struct AggregationMethodKeysFixed
template <typename Other>
AggregationMethodKeysFixed(const Other & other) : data(other.data) {}
using State = ColumnsHashing::HashMethodKeysFixed<typename Data::value_type, Key, Mapped, has_nullable_keys, has_low_cardinality, use_cache>;
using State = ColumnsHashing::HashMethodKeysFixed<
typename Data::value_type,
Key,
Mapped,
has_nullable_keys,
has_low_cardinality,
use_cache>;
static const bool low_cardinality_optimization = false;

View File

@ -329,7 +329,7 @@ InterpreterSelectWithUnionQuery::buildCurrentChildInterpreter(const ASTPtr & ast
InterpreterSelectWithUnionQuery::~InterpreterSelectWithUnionQuery() = default;
Block InterpreterSelectWithUnionQuery::getSampleBlock(const ASTPtr & query_ptr_, const Context & context_)
Block InterpreterSelectWithUnionQuery::getSampleBlock(const ASTPtr & query_ptr_, const Context & context_, bool is_subquery)
{
auto & cache = context_.getSampleBlockCache();
/// Using query string because query_ptr changes for every internal SELECT
@ -339,7 +339,11 @@ Block InterpreterSelectWithUnionQuery::getSampleBlock(const ASTPtr & query_ptr_,
return cache[key];
}
return cache[key] = InterpreterSelectWithUnionQuery(query_ptr_, context_, SelectQueryOptions().analyze()).getSampleBlock();
if (is_subquery)
return cache[key]
= InterpreterSelectWithUnionQuery(query_ptr_, context_, SelectQueryOptions().subquery().analyze()).getSampleBlock();
else
return cache[key] = InterpreterSelectWithUnionQuery(query_ptr_, context_, SelectQueryOptions().analyze()).getSampleBlock();
}

View File

@ -35,7 +35,8 @@ public:
static Block getSampleBlock(
const ASTPtr & query_ptr_,
const Context & context_);
const Context & context_,
bool is_subquery = false);
virtual void ignoreWithTotals() override;

View File

@ -84,7 +84,7 @@ static NamesAndTypesList getColumnsFromTableExpression(
if (table_expression.subquery)
{
const auto & subquery = table_expression.subquery->children.at(0);
names_and_type_list = InterpreterSelectWithUnionQuery::getSampleBlock(subquery, context).getNamesAndTypesList();
names_and_type_list = InterpreterSelectWithUnionQuery::getSampleBlock(subquery, context, true).getNamesAndTypesList();
}
else if (table_expression.table_function)
{

View File

@ -715,7 +715,6 @@ void HTTPHandler::trySendExceptionToClient(const std::string & s, int exception_
writeChar('\n', *used_output.out_maybe_compressed);
used_output.out_maybe_compressed->next();
used_output.out->next();
used_output.out->finalize();
}
}
@ -775,6 +774,9 @@ void HTTPHandler::handleRequest(Poco::Net::HTTPServerRequest & request, Poco::Ne
trySendExceptionToClient(exception_message, exception_code, request, response, used_output);
}
if (used_output.out)
used_output.out->finalize();
}
DynamicQueryHandler::DynamicQueryHandler(IServer & server_, const std::string & param_name_)

View File

@ -234,6 +234,7 @@ StorageS3::StorageS3(
uri_.is_virtual_hosted_style,
credentials.GetAWSAccessKeyId(),
credentials.GetAWSSecretKey(),
settings.server_side_encryption_customer_key_base64,
std::move(settings.headers),
settings.use_environment_credentials.value_or(global_context.getConfigRef().getBool("s3.use_environment_credentials", false))
);

View File

@ -30,6 +30,7 @@ void StorageS3Settings::loadFromConfig(const String & config_elem, const Poco::U
auto endpoint = config.getString(config_elem + "." + key + ".endpoint");
auto access_key_id = config.getString(config_elem + "." + key + ".access_key_id", "");
auto secret_access_key = config.getString(config_elem + "." + key + ".secret_access_key", "");
auto server_side_encryption_customer_key_base64 = config.getString(config_elem + "." + key + ".server_side_encryption_customer_key_base64", "");
std::optional<bool> use_environment_credentials;
if (config.has(config_elem + "." + key + ".use_environment_credentials"))
{
@ -51,7 +52,7 @@ void StorageS3Settings::loadFromConfig(const String & config_elem, const Poco::U
}
}
settings.emplace(endpoint, S3AuthSettings{std::move(access_key_id), std::move(secret_access_key), std::move(headers), use_environment_credentials});
settings.emplace(endpoint, S3AuthSettings{std::move(access_key_id), std::move(secret_access_key), std::move(server_side_encryption_customer_key_base64), std::move(headers), use_environment_credentials});
}
}
}

View File

@ -27,6 +27,7 @@ struct S3AuthSettings
{
const String access_key_id;
const String secret_access_key;
const String server_side_encryption_customer_key_base64;
const HeaderCollection headers;

View File

@ -6,6 +6,8 @@
<coordination_settings>
<operation_timeout_ms>10000</operation_timeout_ms>
<session_timeout_ms>30000</session_timeout_ms>
<snapshot_distance>0</snapshot_distance>
<reserved_log_items>0</reserved_log_items>
</coordination_settings>
<raft_configuration>

View File

@ -29,8 +29,8 @@ def get_fake_zk():
def reset_last_zxid_listener(state):
print("Fake zk callback called for state", state)
global _fake_zk_instance
# reset last_zxid -- fake server doesn't support it
_fake_zk_instance.last_zxid = 0
if state != KazooState.CONNECTED:
_fake_zk_instance._reset()
_fake_zk_instance.add_listener(reset_last_zxid_listener)
_fake_zk_instance.start()

View File

@ -0,0 +1,7 @@
<test>
<query>WITH toUInt8(number) AS k, toUInt64(k) AS k1, k AS k2 SELECT k1, k2, count() FROM numbers(100000000) GROUP BY k1, k2</query>
<query>WITH toUInt8(number) AS k, toUInt16(k) AS k1, toUInt32(k) AS k2, k AS k3 SELECT k1, k2, k3, count() FROM numbers(100000000) GROUP BY k1, k2, k3</query>
<query>WITH toUInt8(number) AS k, k AS k1, k + 1 AS k2 SELECT k1, k2, count() FROM numbers(100000000) GROUP BY k1, k2</query>
<query>WITH toUInt8(number) AS k, k AS k1, k + 1 AS k2, k + 2 AS k3, k + 3 AS k4 SELECT k1, k2, k3, k4, count() FROM numbers(100000000) GROUP BY k1, k2, k3, k4</query>
<query>WITH toUInt8(number) AS k, toUInt64(k) AS k1, k1 + 1 AS k2 SELECT k1, k2, count() FROM numbers(100000000) GROUP BY k1, k2</query>
</test>

View File

@ -0,0 +1,19 @@
Invocation with Date columns
1
2
3
2
1
Invocation with DateTime
1
2
3
2
1
Invocation with DateTime64
1
2
3
2
1
Erroneous cases

View File

@ -0,0 +1,51 @@
--
SELECT 'Invocation with Date columns';
DROP TABLE IF EXISTS runningConcurrency_test;
CREATE TABLE runningConcurrency_test(begin Date, end Date) ENGINE = Memory;
INSERT INTO runningConcurrency_test VALUES ('2020-12-01', '2020-12-10'), ('2020-12-02', '2020-12-10'), ('2020-12-03', '2020-12-12'), ('2020-12-10', '2020-12-12'), ('2020-12-13', '2020-12-20');
SELECT runningConcurrency(begin, end) FROM runningConcurrency_test;
DROP TABLE runningConcurrency_test;
--
SELECT 'Invocation with DateTime';
DROP TABLE IF EXISTS runningConcurrency_test;
CREATE TABLE runningConcurrency_test(begin DateTime, end DateTime) ENGINE = Memory;
INSERT INTO runningConcurrency_test VALUES ('2020-12-01 00:00:00', '2020-12-01 00:59:59'), ('2020-12-01 00:30:00', '2020-12-01 00:59:59'), ('2020-12-01 00:40:00', '2020-12-01 01:30:30'), ('2020-12-01 01:10:00', '2020-12-01 01:30:30'), ('2020-12-01 01:50:00', '2020-12-01 01:59:59');
SELECT runningConcurrency(begin, end) FROM runningConcurrency_test;
DROP TABLE runningConcurrency_test;
--
SELECT 'Invocation with DateTime64';
DROP TABLE IF EXISTS runningConcurrency_test;
CREATE TABLE runningConcurrency_test(begin DateTime64(3), end DateTime64(3)) ENGINE = Memory;
INSERT INTO runningConcurrency_test VALUES ('2020-12-01 00:00:00.000', '2020-12-01 00:00:00.100'), ('2020-12-01 00:00:00.010', '2020-12-01 00:00:00.100'), ('2020-12-01 00:00:00.020', '2020-12-01 00:00:00.200'), ('2020-12-01 00:00:00.150', '2020-12-01 00:00:00.200'), ('2020-12-01 00:00:00.250', '2020-12-01 00:00:00.300');
SELECT runningConcurrency(begin, end) FROM runningConcurrency_test;
DROP TABLE runningConcurrency_test;
--
SELECT 'Erroneous cases';
-- Constant columns are currently not supported.
SELECT runningConcurrency(toDate(arrayJoin([1, 2])), toDate('2000-01-01')); -- { serverError 44 }
-- Unsupported data types
SELECT runningConcurrency('strings are', 'not supported'); -- { serverError 43 }
SELECT runningConcurrency(NULL, NULL); -- { serverError 43 }
SELECT runningConcurrency(CAST(NULL, 'Nullable(DateTime)'), CAST(NULL, 'Nullable(DateTime)')); -- { serverError 43 }
-- Mismatching data types
SELECT runningConcurrency(toDate('2000-01-01'), toDateTime('2000-01-01 00:00:00')); -- { serverError 43 }
-- begin > end
SELECT runningConcurrency(toDate('2000-01-02'), toDate('2000-01-01')); -- { serverError 117 }

View File

@ -0,0 +1,9 @@
#!/usr/bin/env bash
CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
# shellcheck source=../shell_config.sh
. "$CURDIR"/../shell_config.sh
${CLICKHOUSE_CLIENT} -q "create table insert_big_json(a String, b String) engine=MergeTree() order by tuple()";
python3 -c "[print('{{\"a\":\"{}\", \"b\":\"{}\"'.format('clickhouse'* 1000000, 'dbms' * 1000000)) for i in range(10)]; [print('{{\"a\":\"{}\", \"b\":\"{}\"}}'.format('clickhouse'* 100000, 'dbms' * 100000)) for i in range(10)]" 2>/dev/null | ${CLICKHOUSE_CLIENT} --input_format_parallel_parsing=1 --max_memory_usage=0 -q "insert into insert_big_json FORMAT JSONEachRow" 2>&1 | grep -q "min_chunk_bytes_for_parallel_parsing" && echo "Ok." || echo "FAIL" ||:

View File

@ -0,0 +1 @@
WITH (SELECT count(distinct colU) from tabA) AS withA, (SELECT count(distinct colU) from tabA) AS withB SELECT withA / withB AS ratio FROM (SELECT date AS period, colX FROM (SELECT date, if(colA IN (SELECT colB FROM tabC), 0, colA) AS colX FROM tabB) AS tempB GROUP BY period, colX) AS main; -- {serverError 60}

View File

@ -0,0 +1,2 @@
2021-02-14 23:59:59
10

View File

@ -0,0 +1,2 @@
SELECT subtractSeconds(toDate('2021-02-15'), 1);
SELECT subtractSeconds(today(), 1) - subtractSeconds(today(), 11);

View File

@ -0,0 +1,3 @@
2020-05-13 13:38:45 2020-05-13 16:38:45
2020-05-13 13:38:45 2020-05-13 16:38:45
2020-05-13 13:38:45 2020-05-13 16:38:45

View File

@ -0,0 +1,45 @@
DROP TABLE IF EXISTS test;
CREATE TABLE test (timestamp DateTime('UTC'), i UInt8) Engine=MergeTree() PARTITION BY toYYYYMM(timestamp) ORDER BY (i);
INSERT INTO test values ('2020-05-13 16:38:45', 1);
SELECT
toTimeZone(timestamp, 'America/Sao_Paulo') AS converted,
timestamp AS original
FROM test
LEFT JOIN (SELECT 2 AS x) AS anything ON x = i
WHERE timestamp >= toDateTime('2020-05-13T00:00:00', 'America/Sao_Paulo');
/* This was an incorrect result in previous ClickHouse versions:
converted             original
2020-05-13 16:38:45 2020-05-13 16:38:45 <-- toTimeZone is ignored.
*/
SELECT
toTimeZone(timestamp, 'America/Sao_Paulo') AS converted,
timestamp AS original
FROM test
-- LEFT JOIN (SELECT 2 AS x) AS anything ON x = i -- Removing the join fixes the issue.
WHERE timestamp >= toDateTime('2020-05-13T00:00:00', 'America/Sao_Paulo');
/*
converted             original
2020-05-13 13:38:45 2020-05-13 16:38:45 <-- toTimeZone works.
*/
SELECT
toTimeZone(timestamp, 'America/Sao_Paulo') AS converted,
timestamp AS original
FROM test
LEFT JOIN (SELECT 2 AS x) AS anything ON x = i
WHERE timestamp >= '2020-05-13T00:00:00'; -- Not using toDateTime in the WHERE also fixes the issue.
/*
converted             original
2020-05-13 13:38:45 2020-05-13 16:38:45 <-- toTimeZone works.
*/
DROP TABLE test;

View File

@ -10,7 +10,6 @@
"00152_insert_different_granularity",
"00151_replace_partition_with_different_granularity",
"00157_cache_dictionary",
"00992_system_parts_race_condition_zookeeper", /// TODO remove me (alesapin)
"01193_metadata_loading",
"01473_event_time_microseconds",
"01526_max_untracked_memory", /// requires TraceCollector, does not available under sanitizers
@ -26,7 +25,6 @@
"memory_profiler",
"odbc_roundtrip",
"01103_check_cpu_instructions_at_startup",
"00992_system_parts_race_condition_zookeeper", /// TODO remove me (alesapin)
"01473_event_time_microseconds",
"01526_max_untracked_memory", /// requires TraceCollector, does not available under sanitizers
"01193_metadata_loading"
@ -37,7 +35,6 @@
"memory_profiler",
"01103_check_cpu_instructions_at_startup",
"00900_orc_load",
"00992_system_parts_race_condition_zookeeper", /// TODO remove me (alesapin)
"01473_event_time_microseconds",
"01526_max_untracked_memory", /// requires TraceCollector, does not available under sanitizers
"01193_metadata_loading"
@ -49,7 +46,6 @@
"01103_check_cpu_instructions_at_startup",
"01086_odbc_roundtrip", /// can't pass because odbc libraries are not instrumented
"00877_memory_limit_for_new_delete", /// memory limits don't work correctly under msan because it replaces malloc/free
"00992_system_parts_race_condition_zookeeper", /// TODO remove me (alesapin)
"01473_event_time_microseconds",
"01526_max_untracked_memory", /// requires TraceCollector, does not available under sanitizers
"01193_metadata_loading"
@ -61,7 +57,6 @@
"00980_alter_settings_race",
"00834_kill_mutation_replicated_zookeeper",
"00834_kill_mutation",
"00992_system_parts_race_condition_zookeeper", /// TODO remove me (alesapin)
"01200_mutations_memory_consumption",
"01103_check_cpu_instructions_at_startup",
"01037_polygon_dicts_",
@ -87,7 +82,6 @@
"00505_secure",
"00505_shard_secure",
"odbc_roundtrip",
"00992_system_parts_race_condition_zookeeper", /// TODO remove me (alesapin)
"01103_check_cpu_instructions_at_startup",
"01114_mysql_database_engine_segfault",
"00834_cancel_http_readonly_queries_on_client_close",
@ -101,19 +95,16 @@
"01455_time_zones"
],
"release-build": [
"00992_system_parts_race_condition_zookeeper" /// TODO remove me (alesapin)
],
"database-ordinary": [
"00604_show_create_database",
"00609_mv_index_in_in",
"00510_materizlized_view_and_deduplication_zookeeper",
"00738_lock_for_inner_table",
"00992_system_parts_race_condition_zookeeper" /// TODO remove me (alesapin)
"00738_lock_for_inner_table"
],
"polymorphic-parts": [
"01508_partition_pruning_long", /// bug, shoud be fixed
"01482_move_to_prewhere_and_cast", /// bug, shoud be fixed
"00992_system_parts_race_condition_zookeeper" /// TODO remove me (alesapin)
"01482_move_to_prewhere_and_cast" /// bug, shoud be fixed
],
"antlr": [
"00186_very_long_arrays",
@ -153,7 +144,6 @@
"00982_array_enumerate_uniq_ranked",
"00984_materialized_view_to_columns",
"00988_constraints_replication_zookeeper",
"00992_system_parts_race_condition_zookeeper", /// TODO remove me (alesapin)
"00995_order_by_with_fill",
"01001_enums_in_in_section",
"01011_group_uniq_array_memsan",

View File

@ -97,7 +97,8 @@ void run(String part_path, String date_column, String dest_path)
Poco::File(new_tmp_part_path_str + "checksums.txt").setWriteable();
WriteBufferFromFile checksums_out(new_tmp_part_path_str + "checksums.txt", 4096);
checksums.write(checksums_out);
checksums.close();
checksums_in.close();
checksums_out.close();
Poco::File(new_tmp_part_path).renameTo(new_part_path.toString());
}