Merge remote-tracking branch 'upstream/master' into fix25

proller 2019-09-10 14:45:52 +03:00
commit 6d50a2eda0
15 changed files with 271 additions and 31 deletions

View File

@@ -13,7 +13,6 @@ ClickHouse is an open-source column-oriented database management system that all
* You can also [fill this form](https://forms.yandex.com/surveys/meet-yandex-clickhouse-team/) to meet Yandex ClickHouse team in person.
## Upcoming Events
* [ClickHouse Meetup in Moscow](https://yandex.ru/promo/clickhouse/moscow-2019) on September 5.
* [ClickHouse Meetup in Munich](https://www.meetup.com/ClickHouse-Meetup-Munich/events/264185199/) on September 17.
* [ClickHouse Meetup in Paris](https://www.eventbrite.com/e/clickhouse-paris-meetup-2019-registration-68493270215) on October 3.
* [ClickHouse Meetup in Hong Kong](https://www.meetup.com/Hong-Kong-Machine-Learning-Meetup/events/263580542/) on October 17.

View File

@@ -72,9 +72,7 @@ void ReadBufferFromKafkaConsumer::commit()
PrintOffsets("Polled offset", consumer->get_offsets_position(consumer->get_assignment()));
/// Since we can poll more messages than we have already processed, commit only the processed messages.
if (!messages.empty())
consumer->async_commit(*std::prev(current));
consumer->async_commit();
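/// Note: async_commit() with no arguments commits the offsets previously saved via store_offset().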
PrintOffsets("Committed offset", consumer->get_offsets_committed(consumer->get_assignment()));
@@ -186,6 +184,9 @@ bool ReadBufferFromKafkaConsumer::nextImpl()
auto new_position = reinterpret_cast<char *>(const_cast<unsigned char *>(current->get_payload().get_data()));
BufferBase::set(new_position, current->get_payload().get_size(), 0);
/// Since we can poll more messages than we have already processed, commit only the processed messages.
consumer->store_offset(*current);
++current;
return true;

View File

@@ -261,9 +261,10 @@ ConsumerBufferPtr StorageKafka::createReadBuffer()
conf.set("metadata.broker.list", brokers);
conf.set("group.id", group);
conf.set("client.id", VERSION_FULL);
conf.set("auto.offset.reset", "smallest"); // If no offset stored for this group, read all messages from the start
conf.set("enable.auto.commit", "false"); // We manually commit offsets after a stream successfully finished
conf.set("enable.partition.eof", "false"); // Ignore EOF messages
conf.set("auto.offset.reset", "smallest"); // If no offset stored for this group, read all messages from the start
conf.set("enable.auto.commit", "false"); // We manually commit offsets after a stream successfully finished
conf.set("enable.auto.offset.store", "false"); // Update offset automatically - to commit them all at once.
conf.set("enable.partition.eof", "false"); // Ignore EOF messages
updateConfiguration(conf);
// Create a consumer and subscribe to topics
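The brokers, topic list, and group id configured above come from the Kafka engine table definition; as a rough sketch (all names and the format are illustrative), such a table could be declared like this:
```sql
CREATE TABLE queue
(
    timestamp UInt64,
    message String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'localhost:9092',
         kafka_topic_list = 'topic1',
         kafka_group_name = 'group1',
         kafka_format = 'JSONEachRow';
```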

View File

@@ -29,6 +29,7 @@ The supported formats are:
| [Protobuf](#protobuf) | ✔ | ✔ |
| [Parquet](#data-format-parquet) | ✔ | ✔ |
| [RowBinary](#rowbinary) | ✔ | ✔ |
| [RowBinaryWithNamesAndTypes](#rowbinarywithnamesandtypes) | ✔ | ✔ |
| [Native](#native) | ✔ | ✔ |
| [Null](#null) | ✗ | ✔ |
| [XML](#xml) | ✗ | ✔ |
@@ -680,9 +681,10 @@ For [NULL](../query_language/syntax.md#null-literal) support, an additional byte
## RowBinaryWithNamesAndTypes {#rowbinarywithnamesandtypes}
Similar to [RowBinary](#rowbinary), but with an added header:
* [LEB128](https://en.wikipedia.org/wiki/LEB128)-encoded number of columns (N)
* N `String`s specifying column names
* N `String`s specifying column types
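A quick way to observe this header is to export a couple of typed columns in the format; a sketch (the query is illustrative, and the response itself is binary):
```sql
SELECT number AS x, toString(number) AS s
FROM system.numbers
LIMIT 3
FORMAT RowBinaryWithNamesAndTypes
```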
## Values {#data-format-values}
@@ -924,17 +926,17 @@ Data types of ClickHouse table columns can differ from the corresponding field
You can insert Parquet data from a file into a ClickHouse table with the following command:
```
```bash
cat {filename} | clickhouse-client --query="INSERT INTO {some_table} FORMAT Parquet"
```
You can select data from a ClickHouse table and save it to a file in the Parquet format with the following command:
```
```bash
clickhouse-client --query="SELECT * FROM {some_table} FORMAT Parquet" > {some_file.pq}
```
To exchange data with Hadoop, you can use the [`HDFS` table engine](../../operations/table_engines/hdfs.md).
To exchange data with Hadoop, you can use the [HDFS table engine](../operations/table_engines/hdfs.md).
## Format Schema {#formatschema}

View File

@@ -857,6 +857,18 @@ Possible values:
Default value: 0.
## optimize_throw_if_noop {#setting-optimize_throw_if_noop}
Enables or disables throwing an exception if the [OPTIMIZE](../../query_language/misc.md#misc_operations-optimize) query has not performed a merge.
By default, `OPTIMIZE` returns successfully even if it hasn't done anything. This setting lets you distinguish that situation and get the reason in the exception message.
Possible values:
- 1 — Throwing an exception is enabled.
- 0 — Throwing an exception is disabled.
Default value: 0.
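For example, a minimal sketch (the table name `my_table` is illustrative):
```sql
SET optimize_throw_if_noop = 1;
-- With the setting enabled, a no-op OPTIMIZE raises an exception with the reason
OPTIMIZE TABLE my_table;
```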
## distributed_replica_error_half_life {#settings-distributed_replica_error_half_life}
- Type: seconds

View File

@@ -64,9 +64,9 @@ Please note that `errors_count` is updated once per query to the cluster, but `e
**See also**
- [Table engine Distributed](../../operations/table_engines/distributed.md)
- [distributed_replica_error_cap setting](../settings/settings.md#settings-distributed_replica_error_cap)
- [distributed_replica_error_half_life setting](../settings/settings.md#settings-distributed_replica_error_half_life)
- [Table engine Distributed](table_engines/distributed.md)
- [distributed_replica_error_cap setting](settings/settings.md#settings-distributed_replica_error_cap)
- [distributed_replica_error_half_life setting](settings/settings.md#settings-distributed_replica_error_half_life)
## system.columns

View File

@@ -388,10 +388,66 @@ When the values in the column expire, ClickHouse replaces them with the default
The `TTL` clause can't be used for key columns.
Examples:
Creating a table with TTL
```sql
CREATE TABLE example_table
(
d DateTime,
a Int TTL d + INTERVAL 1 MONTH,
b Int TTL d + INTERVAL 1 MONTH,
c String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(d)
ORDER BY d;
```
Adding TTL to a column of an existing table
```sql
ALTER TABLE example_table
MODIFY COLUMN
c String TTL d + INTERVAL 1 DAY;
```
Altering TTL of the column
```sql
ALTER TABLE example_table
MODIFY COLUMN
c String TTL d + INTERVAL 1 MONTH;
```
**Table TTL**
When data in a table expires, ClickHouse deletes all corresponding rows.
Examples:
Creating a table with TTL
```sql
CREATE TABLE example_table
(
d DateTime,
a Int
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(d)
ORDER BY d
TTL d + INTERVAL 1 MONTH;
```
Altering TTL of the table
```sql
ALTER TABLE example_table
MODIFY TTL d + INTERVAL 1 DAY;
```
**Removing Data**
Data with an expired TTL is removed when ClickHouse merges data parts.
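Because removal happens only during merges, expired rows may persist until the next merge. As a sketch, an unscheduled merge can be forced on the `example_table` from the examples above:
```sql
-- Triggers a merge even if the data is already in one part, applying TTL expiration
OPTIMIZE TABLE example_table FINAL;
```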

View File

@@ -38,6 +38,7 @@ SELECT greatCircleDistance(55.755831, 37.617673, -55.755831, -37.617673)
## pointInEllipses
Checks whether the point belongs to at least one of the ellipses.
Coordinates are geometric in the Cartesian coordinate system.
```
pointInEllipses(x, y, x₀, y₀, a₀, b₀,...,xₙ, yₙ, aₙ, bₙ)
```
@@ -47,7 +48,7 @@ pointInEllipses(x, y, x₀, y₀, a₀, b₀,...,xₙ, yₙ, aₙ, bₙ)
- `x, y` — Coordinates of a point on the plane.
- `xᵢ, yᵢ` — Coordinates of the center of the `i`-th ellipsis.
- `aᵢ, bᵢ` — Axes of the `i`-th ellipsis in meters.
- `aᵢ, bᵢ` — Axes of the `i`-th ellipsis in units of x, y coordinates.
The number of input parameters must be `2+4⋅n`, where `n` is the number of ellipses.
@@ -58,13 +59,13 @@ The number of input parameters must be `2+4⋅n`, where `n` is the number of ellipses.
**Example**
```sql
SELECT pointInEllipses(55.755831, 37.617673, 55.755831, 37.617673, 1.0, 2.0)
SELECT pointInEllipses(10., 10., 10., 9.1, 1., 0.9999)
```
```
┌─pointInEllipses(55.755831, 37.617673, 55.755831, 37.617673, 1., 2.)─┐
│                                                                   1 │
└─────────────────────────────────────────────────────────────────────┘
┌─pointInEllipses(10., 10., 10., 9.1, 1., 0.9999)─┐
│ 1 │
└─────────────────────────────────────────────────┘
```
## pointInPolygon

View File

@@ -177,10 +177,13 @@ Changes already made by the mutation are not rolled back.
```sql
OPTIMIZE TABLE [db.]name [ON CLUSTER cluster] [PARTITION partition] [FINAL]
```
Asks the table engine to do something for optimization.
Supported only by `*MergeTree` engines, in which this query initializes a non-scheduled merge of data parts.
If you specify a `PARTITION`, only the specified partition will be optimized.
If you specify `FINAL`, optimization will be performed even when all the data is already in one part.
This query tries to initialize an unscheduled merge of data parts for tables with a table engine of [MergeTree](../operations/table_engines/mergetree.md) family. Other kinds of table engines are not supported.
When `OPTIMIZE` is used with [ReplicatedMergeTree](../operations/table_engines/replication.md) family of table engines, ClickHouse creates a task for merging and waits for execution on all nodes (if the `replication_alter_partitions_sync` setting is enabled).
- If `OPTIMIZE` doesn't perform a merge for any reason, it doesn't notify the client about it. To enable notifications, use the [optimize_throw_if_noop](../operations/settings/settings.md#setting-optimize_throw_if_noop) setting.
- If you specify a `PARTITION`, only the specified partition is optimized.
- If you specify `FINAL`, optimization is performed even when all the data is already in one part.
!!! warning
OPTIMIZE can't fix the "Too many parts" error.
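A typical invocation that combines these clauses might look as follows (a sketch; the database, table, and partition are illustrative):
```sql
-- Merge the parts of one partition even if the data is already in a single part
OPTIMIZE TABLE db.my_table PARTITION 201909 FINAL;
```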

View File

@@ -1,9 +1,58 @@
# SYSTEM Queries {#query_language-system}
- [RELOAD DICTIONARIES](#query_language-system-reload-dictionaries)
- [RELOAD DICTIONARY](#query_language-system-reload-dictionary)
- [DROP DNS CACHE](#query_language-system-drop-dns-cache)
- [DROP MARKS CACHE](#query_language-system-drop-marks-cache)
- [FLUSH LOGS](#query_language-system-flush_logs)
- [RELOAD CONFIG](#query_language-system-reload-config)
- [SHUTDOWN](#query_language-system-shutdown)
- [KILL](#query_language-system-kill)
- [STOP DISTRIBUTED SENDS](#query_language-system-stop-distributed-sends)
- [FLUSH DISTRIBUTED](#query_language-system-flush-distributed)
- [START DISTRIBUTED SENDS](#query_language-system-start-distributed-sends)
## RELOAD DICTIONARIES {#query_language-system-reload-dictionaries}
Reloads all dictionaries that have been successfully loaded before.
By default, dictionaries are loaded lazily (see [dictionaries_lazy_load](../operations/server_settings/settings.md#server_settings-dictionaries_lazy_load)), so instead of being loaded automatically at startup, they are initialized on first access through the dictGet function or a SELECT from tables with ENGINE = Dictionary. The `SYSTEM RELOAD DICTIONARIES` query reloads such (LOADED) dictionaries.
Always returns `Ok.` regardless of the result of the dictionary update.
## RELOAD DICTIONARY dictionary_name {#query_language-system-reload-dictionary}
Completely reloads the dictionary `dictionary_name`, regardless of the state of the dictionary (LOADED / NOT_LOADED / FAILED).
Always returns `Ok.` regardless of the result of updating the dictionary.
The status of the dictionary can be checked by querying the `system.dictionaries` table.
```sql
SELECT name, status FROM system.dictionaries;
```
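The reload itself is a single statement (a sketch; `my_dict` is an illustrative dictionary name):
```sql
SYSTEM RELOAD DICTIONARY my_dict;
```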
## DROP DNS CACHE {#query_language-system-drop-dns-cache}
Resets ClickHouse's internal DNS cache. Sometimes (for old ClickHouse versions) it is necessary to use this command when changing the infrastructure (changing the IP address of another ClickHouse server or the server used by dictionaries).
For more convenient (automatic) cache management, see the disable_internal_dns_cache and dns_cache_update_period parameters.
## DROP MARKS CACHE {#query_language-system-drop-marks-cache}
Resets the mark cache. Used in ClickHouse development and performance tests.
## FLUSH LOGS {#query_language-system-flush_logs}
Flushes buffered log messages to system tables (e.g. `system.query_log`), so you don't have to wait 7.5 seconds when debugging.
## RELOAD CONFIG {#query_language-system-reload-config}
Reloads the ClickHouse configuration. Used when the configuration is stored in ZooKeeper.
## SHUTDOWN {#query_language-system-shutdown}
Gracefully shuts down ClickHouse (like `service clickhouse-server stop` / `kill {$pid_clickhouse-server}`)
## KILL {#query_language-system-kill}
Aborts the ClickHouse process (like `kill -9 {$pid_clickhouse-server}`)
## Managing Distributed Tables {#query_language-system-distributed}

View File

@@ -28,6 +28,7 @@ ClickHouse can accept (`INSERT`) and return (`SELECT
| [Protobuf](#protobuf) | ✔ | ✔ |
| [Parquet](#data-format-parquet) | ✔ | ✔ |
| [RowBinary](#rowbinary) | ✔ | ✔ |
| [RowBinaryWithNamesAndTypes](#rowbinarywithnamesandtypes) | ✔ | ✔ |
| [Native](#native) | ✔ | ✔ |
| [Null](#null) | ✗ | ✔ |
| [XML](#xml) | ✗ | ✔ |
@@ -673,7 +674,15 @@ FixedString is represented simply as a sequence
Array is represented as a varint-encoded length (unsigned [LEB128](https://en.wikipedia.org/wiki/LEB128)), followed by the array elements one after another.
For [NULL](../query_language/syntax.md#null-literal) support, before each value of the [Nullable](../data_types/nullable.md
For [NULL](../query_language/syntax.md#null-literal) support, a byte containing 1 or 0 is added before each value of the [Nullable](../data_types/nullable.md) type. If the byte is 1, the value is NULL and this byte is interpreted as a separate value (that is, it is followed by the value of the next field). If the byte is 0, the value after the byte is the field value (not NULL).
## RowBinaryWithNamesAndTypes {#rowbinarywithnamesandtypes}
Same as [RowBinary](#rowbinary), but with an added header:
* The number of columns N, encoded as [LEB128](https://en.wikipedia.org/wiki/LEB128),
* N strings (`String`) with column names,
* N strings (`String`) with column types.
## Values {#data-format-values}

View File

@@ -327,10 +327,64 @@ TTL date_time + INTERVAL 15 HOUR
The `TTL` clause can't be used for key columns.
Examples:
Creating a table with TTL
```sql
CREATE TABLE example_table
(
d DateTime,
a Int TTL d + INTERVAL 1 MONTH,
b Int TTL d + INTERVAL 1 MONTH,
c String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(d)
ORDER BY d;
```
Adding TTL to a column of an existing table
```sql
ALTER TABLE example_table
MODIFY COLUMN
c String TTL d + INTERVAL 1 DAY;
```
Altering TTL of the column
```sql
ALTER TABLE example_table
MODIFY COLUMN
c String TTL d + INTERVAL 1 MONTH;
```
**Table TTL**
When data in a table expires, ClickHouse deletes all the corresponding rows.
Examples:
```sql
CREATE TABLE example_table
(
d DateTime,
a Int
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(d)
ORDER BY d
TTL d + INTERVAL 1 MONTH;
```
Altering TTL of the table
```sql
ALTER TABLE example_table
MODIFY TTL d + INTERVAL 1 DAY;
```
**Removing Data**
Data with an expired TTL is removed when ClickHouse merges data parts.

View File

@@ -38,6 +38,7 @@ SELECT greatCircleDistance(55.755831, 37.617673, -55.755831, -37.617673)
## pointInEllipses
Checks whether the point belongs to at least one of the ellipses.
Coordinates are geometric in the Cartesian coordinate system.
```
pointInEllipses(x, y, x₀, y₀, a₀, b₀,...,xₙ, yₙ, aₙ, bₙ)
```
@@ -47,7 +48,7 @@ pointInEllipses(x, y, x₀, y₀, a₀, b₀,...,xₙ, yₙ, aₙ, bₙ)
- `x, y` — Coordinates of a point on the plane.
- `xᵢ, yᵢ` — Coordinates of the center of the `i`-th ellipse.
- `aᵢ, bᵢ` — Semi-axes of the `i`-th ellipse in meters.
- `aᵢ, bᵢ` — Semi-axes of the `i`-th ellipse (in units of the x, y coordinates).
The number of input parameters must be `2+4⋅n`, where `n` is the number of ellipses.
@@ -58,13 +59,13 @@ pointInEllipses(x, y, x₀, y₀, a₀, b₀,...,xₙ, yₙ, aₙ, bₙ)
**Example**
```sql
SELECT pointInEllipses(55.755831, 37.617673, 55.755831, 37.617673, 1.0, 2.0)
SELECT pointInEllipses(10., 10., 10., 9.1, 1., 0.9999)
```
```
┌─pointInEllipses(55.755831, 37.617673, 55.755831, 37.617673, 1., 2.)─┐
│                                                                   1 │
└─────────────────────────────────────────────────────────────────────┘
┌─pointInEllipses(10., 10., 10., 9.1, 1., 0.9999)─┐
│ 1 │
└─────────────────────────────────────────────────┘
```
## pointInPolygon

View File

@@ -1,9 +1,59 @@
# SYSTEM Queries {#query_language-system}
- [RELOAD DICTIONARIES](#query_language-system-reload-dictionaries)
- [RELOAD DICTIONARY](#query_language-system-reload-dictionary)
- [DROP DNS CACHE](#query_language-system-drop-dns-cache)
- [DROP MARKS CACHE](#query_language-system-drop-marks-cache)
- [FLUSH LOGS](#query_language-system-flush_logs)
- [RELOAD CONFIG](#query_language-system-reload-config)
- [SHUTDOWN](#query_language-system-shutdown)
- [KILL](#query_language-system-kill)
- [STOP DISTRIBUTED SENDS](#query_language-system-stop-distributed-sends)
- [FLUSH DISTRIBUTED](#query_language-system-flush-distributed)
- [START DISTRIBUTED SENDS](#query_language-system-start-distributed-sends)
## RELOAD DICTIONARIES {#query_language-system-reload-dictionaries}
Reloads all dictionaries that were previously loaded successfully.
By default, lazy loading is enabled ([dictionaries_lazy_load](../operations/server_settings/settings.md#dictionaries-lazy-load)), so dictionaries are not loaded automatically at startup but are initialized on first access through dictGet or a SELECT from tables with ENGINE = Dictionary. After that, such (LOADED) dictionaries are reloaded by the `SYSTEM RELOAD DICTIONARIES` query.
Always returns `Ok.`, regardless of the result of the dictionary update.
## RELOAD DICTIONARY dictionary_name {#query_language-system-reload-dictionary}
Completely reloads the dictionary `dictionary_name`, regardless of the dictionary's state (LOADED / NOT_LOADED / FAILED).
Always returns `Ok.`, regardless of the result of updating the dictionary.
The status of the dictionary can be checked by querying the `system.dictionaries` table.
```sql
SELECT name, status FROM system.dictionaries;
```
## DROP DNS CACHE {#query_language-system-drop-dns-cache}
Resets ClickHouse's internal DNS cache. Sometimes (for old ClickHouse versions) it is necessary to use this command when changing the infrastructure (changing the IP address of another ClickHouse server or a server used by dictionaries).
For more convenient (automatic) cache management, see the disable_internal_dns_cache and dns_cache_update_period parameters.
## DROP MARKS CACHE {#query_language-system-drop-marks-cache}
Resets the mark cache. Used in ClickHouse development and performance tests.
## FLUSH LOGS {#query_language-system-flush_logs}
Flushes log buffers to system tables (e.g. `system.query_log`), so you don't have to wait 7.5 seconds when debugging.
## RELOAD CONFIG {#query_language-system-reload-config}
Reloads the ClickHouse configuration. Used when the configuration is stored in ZooKeeper.
## SHUTDOWN {#query_language-system-shutdown}
Gracefully shuts down ClickHouse (like `service clickhouse-server stop` / `kill {$pid_clickhouse-server}`)
## KILL {#query_language-system-kill}
Aborts the ClickHouse process (like `kill -9 {$pid_clickhouse-server}`)
## Managing Distributed Tables {#query_language-system-distributed}
ClickHouse can work with [distributed](../operations/table_engines/distributed.md) tables. When a user inserts data into such a table, ClickHouse first queues the data to be sent to the cluster nodes and then sends the prepared data asynchronously. You can manage the queue with the [STOP DISTRIBUTED SENDS](#query_language-system-stop-distributed-sends), [START DISTRIBUTED SENDS](#query_language-system-start-distributed-sends), and [FLUSH DISTRIBUTED](#query_language-system-flush-distributed) queries, as shown below. You can also insert distributed data synchronously using the `insert_distributed_sync` setting.
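For example (a sketch; `db.distributed_table` is an illustrative name):
```sql
-- Pause background sends, push everything queued synchronously, then resume
SYSTEM STOP DISTRIBUTED SENDS db.distributed_table;
SYSTEM FLUSH DISTRIBUTED db.distributed_table;
SYSTEM START DISTRIBUTED SENDS db.distributed_table;
```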

View File

@@ -87,6 +87,7 @@ nav:
- 'MySQL': 'operations/table_engines/mysql.md'
- 'JDBC': 'operations/table_engines/jdbc.md'
- 'ODBC': 'operations/table_engines/odbc.md'
- 'HDFS': 'operations/table_engines/hdfs.md'
- 'Special':
- 'Distributed': 'operations/table_engines/distributed.md'
- 'External data': 'operations/table_engines/external_data.md'
@@ -158,6 +159,7 @@ nav:
- 'mysql': 'query_language/table_functions/mysql.md'
- 'jdbc': 'query_language/table_functions/jdbc.md'
- 'odbc': 'query_language/table_functions/odbc.md'
- 'hdfs': 'query_language/table_functions/hdfs.md'
- 'input': 'query_language/table_functions/input.md'
- 'Dictionaries':
- 'Introduction': 'query_language/dicts/index.md'