External editions are revised. English translation is actualised from 02.03.2018 version up to 26.03.2018.

BayoNet 2018-03-26 16:16:59 +03:00 committed by Alexey Milovidov
parent 474fc1dbb6
commit 6881d84bc2
12 changed files with 59 additions and 57 deletions

View File

@ -27,8 +27,7 @@ The dictionary configuration has the following structure:
```
- name The identifier that can be used to access the dictionary. Use the characters `[a-zA-Z0-9_\-]`.
- [source](external_dicts_dict_sources.html/#dicts-external_dicts_dict_sources) — Source of the dictionary .
- [source](external_dicts_dict_sources.md/#dicts-external_dicts_dict_sources) — Source of the dictionary .
- [layout](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout) — Dictionary layout in memory.
- [source](external_dicts_dict_sources.html/#dicts-external_dicts_dict_sources) — Structure of the dictionary . A key and attributes that can be retrieved by this key.
- [structure](external_dicts_dict_structure.md#dicts-external_dicts_dict_structure) — Structure of the dictionary . A key and attributes that can be retrieved by this key.
- [lifetime](external_dicts_dict_lifetime.md#dicts-external_dicts_dict_lifetime) — Frequency of dictionary updates.

View File

@ -2,11 +2,11 @@
# Storing dictionaries in memory
There are a [variety of ways](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout-manner) to store dictionaries in memory.
There are [many different ways](external_dicts_dict_layout#dicts-external_dicts_dict_layout-manner) to store dictionaries in memory.
We recommend [flat](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout-flat), [hashed](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout-hashed)and[complex_key_hashed](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout-complex_key_hashed), which provide optimal processing speed.
We recommend [flat](external_dicts_dict_layout#dicts-external_dicts_dict_layout-flat), [hashed](external_dicts_dict_layout#dicts-external_dicts_dict_layout-hashed), and [complex_key_hashed](external_dicts_dict_layout#dicts-external_dicts_dict_layout-complex_key_hashed), which provide optimal processing speed.
Caching is not recommended because of potentially poor performance and difficulties in selecting optimal parameters. Read more in the section " [cache](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout-cache)".
Caching is not recommended because of potentially poor performance and difficulties in selecting optimal parameters. Read more about this in the "[cache](external_dicts_dict_layout#dicts-external_dicts_dict_layout-cache)" section.
There are several ways to improve dictionary performance:
@ -88,7 +88,7 @@ Configuration example:
### complex_key_hashed
This type is for use with composite [keys](external_dicts_dict_structure.md/#dicts-external_dicts_dict_structure). Similar to `hashed`.
This type of storage is designed for use with compound [keys](external_dicts_dict_structure#dicts-external_dicts_dict_structure). It is similar to `hashed`.
Configuration example:
@ -109,18 +109,18 @@ This storage method works the same way as hashed and allows using date/time rang
Example: The table contains discounts for each advertiser in the format:
```
+---------------+---------------------+-------------------+--------+
| advertiser id | discount start date | discount end date | amount |
+===============+=====================+===================+========+
| 123           | 2015-01-01          | 2015-01-15        | 0.15   |
+---------------+---------------------+-------------------+--------+
| 123           | 2015-01-16          | 2015-01-31        | 0.25   |
+---------------+---------------------+-------------------+--------+
| 456           | 2015-01-01          | 2015-01-15        | 0.05   |
+---------------+---------------------+-------------------+--------+
+---------------+---------------------+-------------------+--------+
| advertiser id | discount start date | discount end date | amount |
+===============+=====================+===================+========+
| 123           | 2015-01-01          | 2015-01-15        | 0.15   |
+---------------+---------------------+-------------------+--------+
| 123           | 2015-01-16          | 2015-01-31        | 0.25   |
+---------------+---------------------+-------------------+--------+
| 456           | 2015-01-01          | 2015-01-15        | 0.05   |
+---------------+---------------------+-------------------+--------+
```
To select by date range, define the `range_min` and `range_max` elements in the [structure](external_dicts_dict_structure.md#dicts-external_dicts_dict_structure).
To select by date range, define `range_min` and `range_max` in [structure](external_dicts_dict_structure#dicts-external_dicts_dict_structure).
Example:
@ -197,15 +197,15 @@ This is the least effective of all the ways to store dictionaries. The speed of
To improve cache performance, use a subquery with `LIMIT`, and call the function with the dictionary in the outer query.
Supported [sources](external_dicts_dict_sources.md#dicts-external_dicts_dict_sources): MySQL, ClickHouse, executable, HTTP.
Supported [sources](external_dicts_dict_sources#dicts-external_dicts_dict_sources): MySQL, ClickHouse, executable, HTTP.
Example of settings:
```xml
<layout>
<cache>
<!-- The size of the cache, in number of cells. Rounded up to a power of two. -->
<size_in_cells>1000000000</size_in_cells>
<!-- The size of the cache, in number of cells. Rounded up to a power of two. -->
<size_in_cells>1000000000</size_in_cells>
</cache>
</layout>
```
@ -227,7 +227,7 @@ Do not use ClickHouse as a source, because it is slow to process queries with ra
### complex_key_cache
This type of storage is for use with composite [keys](external_dicts_dict_structure.md#dicts-external_dicts_dict_structure). Similar to `cache`.
This type of storage is designed for use with compound [keys](external_dicts_dict_structure#dicts-external_dicts_dict_structure). Similar to `cache`.
<a name="dicts-external_dicts_dict_layout-ip_trie"></a>

View File

@ -66,7 +66,7 @@ Configuration fields:
The key can be a `tuple` of fields of any type. The [layout](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout) in this case must be `complex_key_hashed` or `complex_key_cache`.
<div class="admonition tip">
A composite key can consist of a single element. This makes it possible to use a string as the key, for instance.
A composite key can consist of a single element. This makes it possible to use a string as the key, for instance.
</div>
The key structure is set in the element `<key>`. Key fields are specified in the same format as the dictionary [attributes](external_dicts_dict_structure.md#dicts-external_dicts_dict_structure-attributes). Example:

View File

@ -39,7 +39,7 @@ Accepts an empty array and returns a one-element array that is equal to the defa
Returns an array of numbers from 0 to N-1.
As a safeguard, an exception is thrown if arrays with a total length of more than 100,000,000 elements are created in a data block.
## array(x1, ...), оператор \[x1, ...\]
## array(x1, ...), operator \[x1, ...\]
Creates an array from the function arguments.
The arguments must be constants and have types that have the smallest common type. At least one argument must be passed, because otherwise it isn't clear which type of array to create. That is, you can't use this function to create an empty array (to do that, use the 'emptyArray\*' function described above).

View File

@ -5,7 +5,7 @@ In Yandex.Metrica, JSON is transmitted by users as session parameters. There are
The following assumptions are made:
1. The field name (function argument) must be a constant.
2. The field name is somehow canonically encoded in JSON. For example: `visitParamHas('{"abc":"def"}', 'abc') = 1`, но `visitParamHas('{"\\u0061\\u0062\\u0063":"def"}', 'abc') = 0`
2. The field name is somehow canonically encoded in JSON. For example: `visitParamHas('{"abc":"def"}', 'abc') = 1`, but `visitParamHas('{"\\u0061\\u0062\\u0063":"def"}', 'abc') = 0`
3. Fields are searched for on any nesting level, indiscriminately. If there are multiple matching fields, the first occurrence is used.
4. The JSON doesn't have space characters outside of string literals.

View File

@ -16,15 +16,14 @@ The terminal must use UTF-8 encoding (the default in Ubuntu).
For testing and development, the system can be installed on a single server or on a desktop computer.
### Installing from packages
### Installing from packages Debian/Ubuntu
In `/etc/apt/sources.list` (or in a separate `/etc/apt/sources.list.d/clickhouse.list` file), add the repository:
```text
deb http://repo.yandex.ru/clickhouse/trusty stable main
deb http://repo.yandex.ru/clickhouse/deb/stable/ main/
```
On other versions of Ubuntu, replace `trusty` with `xenial` or `precise`.
If you want to use the most recent test version, replace 'stable' with 'testing'.
Then run:
@ -36,9 +35,7 @@ sudo apt-get install clickhouse-client clickhouse-server-common
```
You can also download and install packages manually from here:
<http://repo.yandex.ru/clickhouse/trusty/pool/main/c/clickhouse/>
<http://repo.yandex.ru/clickhouse/xenial/pool/main/c/clickhouse/>
<http://repo.yandex.ru/clickhouse/precise/pool/main/c/clickhouse/>
<https://repo.yandex.ru/clickhouse/deb/stable/main/>
ClickHouse contains access restriction settings. They are located in the 'users.xml' file (next to 'config.xml').
By default, access is allowed from anywhere for the 'default' user, without a password. See 'user/default/networks'.
@ -137,4 +134,3 @@ SELECT 1
**Congratulations, the system works!**
To continue experimenting, you can download one of the test data sets.

View File

@ -37,8 +37,7 @@ Date: Fri, 16 Nov 2012 19:21:50 GMT
1
```
As you can see, curl is somewhat inconvenient in that spaces must be URL escaped.
Although wget escapes everything itself, we don't recommend using it because it doesn't work well over HTTP 1.1 when using keep-alive and Transfer-Encoding: chunked.
As you can see, curl is somewhat inconvenient in that spaces must be URL escaped. Although wget escapes everything itself, we don't recommend using it because it doesn't work well over HTTP 1.1 when using keep-alive and Transfer-Encoding: chunked.
```bash
$ echo 'SELECT 1' | curl 'http://localhost:8123/' --data-binary @-
@ -131,11 +130,15 @@ POST 'http://localhost:8123/?query=DROP TABLE t'
For successful requests that don't return a data table, an empty response body is returned.
You can use compression when transmitting data. The compressed data has a non-standard format, and you will need to use the special compressor program to work with it (sudo apt-get install compressor-metrika-yandex).
You can use compression when transmitting data.
To use the ClickHouse internal compression format, you will need the special compressor program (`sudo apt-get install compressor-metrika-yandex`).
If you specified 'compress=1' in the URL, the server will compress the data it sends you.
If you specified 'decompress=1' in the URL, the server will decompress the same data that you pass in the POST method.
Standard gzip-based HTTP compression can also be used. To send gzip-compressed POST data, add `Content-Encoding: gzip` to the request headers and gzip the POST body.
To get a compressed response, add `Accept-Encoding: gzip` to the request headers and turn on the ClickHouse setting `enable_http_compression`.
You can use this to reduce network traffic when transmitting a large amount of data, or for creating dumps that are immediately compressed.
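The gzip-based variant can be sketched with `curl` as follows. This is a minimal sketch, not a definitive recipe: the helper names (`gzip_query`, `ch_post_gzip`, `ch_get_gzip`) and the `CH_URL` variable are illustrative assumptions, the server address is the one used in the other examples, and `enable_http_compression` is assumed to be switched on through a URL parameter.

```bash
# Assumed server address, as in the other examples; override with CH_URL.
CH_URL="${CH_URL:-http://localhost:8123/}"

# Hypothetical helper: write the gzip-compressed query text to stdout.
gzip_query() {
    printf '%s' "$1" | gzip -c
}

# Hypothetical helper: POST a gzip-compressed request body.
# Content-Encoding: gzip tells the server that the body is compressed.
ch_post_gzip() {
    gzip_query "$1" | curl -sS -H 'Content-Encoding: gzip' --data-binary @- "$CH_URL"
}

# Hypothetical helper: request a gzip-compressed response and unpack it.
# Accept-Encoding: gzip only takes effect when enable_http_compression is on.
ch_get_gzip() {
    printf '%s' "$1" | curl -sS -H 'Accept-Encoding: gzip' --data-binary @- \
        "${CH_URL}?enable_http_compression=1" | gunzip -c
}
```

With a server running, `ch_post_gzip 'SELECT 1'` would send the compressed query, and `ch_get_gzip 'SELECT 1'` would fetch and decompress the result.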
You can use the 'database' URL parameter to specify the default database.
@ -191,7 +194,11 @@ $ echo 'SELECT number FROM system.numbers LIMIT 10' | curl 'http://localhost:812
For information about other parameters, see the section "SET".
In contrast to the native interface, the HTTP interface does not support the concept of sessions or session settings, does not allow aborting a query (to be exact, it allows this in only a few cases), and does not show the progress of query processing. Parsing and data formatting are performed on the server side, and using the network might be ineffective.
You can use ClickHouse sessions in the HTTP protocol. To do this, specify the `session_id` GET parameter in the HTTP request. You can use any alphanumeric string as the `session_id`. By default, a session times out after 60 seconds of inactivity. You can change this by setting `default_session_timeout` in the server config file, or by adding the `session_timeout` GET parameter. You can also check the status of the session with the `session_check=1` GET parameter. When using sessions, you can't run two queries with the same `session_id` simultaneously.
You can get the progress of query execution in `X-ClickHouse-Progress` headers by enabling the `send_progress_in_http_headers` setting.
Running queries are not aborted automatically after the HTTP connection is closed. Parsing and data formatting are performed on the server side, and using the network might be ineffective.
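As a rough illustration of the session parameters, the sketch below builds a request URL that carries `session_id` and, optionally, `session_timeout`. The helper names (`session_url`, `session_query`) and the `CH_URL` variable are assumptions made for this example, and the timeout is assumed to be given in seconds.

```bash
# Assumed server address, as in the other examples; override with CH_URL.
CH_URL="${CH_URL:-http://localhost:8123/}"

# Hypothetical helper: build a URL bound to a session.
# $1 = session_id (any alphanumeric string)
# $2 = optional session_timeout (assumed to be in seconds)
session_url() {
    if [ -n "${2:-}" ]; then
        printf '%s?session_id=%s&session_timeout=%s\n' "$CH_URL" "$1" "$2"
    else
        printf '%s?session_id=%s\n' "$CH_URL" "$1"
    fi
}

# Hypothetical helper: run one query inside the session.
# Queries sharing a session_id must be sent one after another, not in parallel.
session_query() {
    printf '%s' "$2" | curl -sS --data-binary @- "$(session_url "$1")"
}
```

For example, `session_query mysession1 'SELECT 1'` followed by another `session_query mysession1 '...'` call would reuse the same session, provided the second request arrives before the timeout.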
The optional 'query_id' parameter can be passed as the query ID (any string). For more information, see the section "Settings, replace_running_query".
The optional 'quota_key' parameter can be passed as the quota key (any string). For more information, see the section "Quotas".
@ -213,4 +220,3 @@ curl -sS 'http://localhost:8123/?max_result_bytes=4000000&buffer_size=3000000&wa
```
Use buffering to avoid situations where a query processing error occurred after the response code and HTTP headers were sent to the client. In this situation, an error message is written at the end of the response body, and on the client side, the error can only be detected at the parsing stage.

View File

@ -440,14 +440,14 @@ For more information, see the MergeTreeSettings.h header file.
SSL client/server configuration.
Support for SSL is provided by the `` libpoco`` library. The interface is described in the file [SSLManager.h](https://github.com/yandex/ClickHouse/blob/master/contrib/libpoco/NetSSL_OpenSSL/include/Poco/Net/SSLManager.h)
Support for SSL is provided by the `` libpoco`` library. The interface is described in the file [SSLManager.h](https://github.com/ClickHouse-Extras/poco/blob/master/NetSSL_OpenSSL/include/Poco/Net/SSLManager.h)
Keys for server/client settings:
- privateKeyFile - The path to the file with the secret key of the PEM certificate. The file may contain a key and certificate at the same time.
- certificateFile - The path to the client/server certificate file in PEM format. You can omit it if `privateKeyFile` contains the certificate.
- caConfig - The path to the file or directory that contains trusted root certificates.
- verificationMode - The method for checking the node's certificates. Details are in the description of the [Context](https://github.com/yandex/ClickHouse/blob/master/contrib/libpoco/NetSSL_OpenSSL/include/Poco/Net/Context.h) class. Possible values: `none`, `relaxed`, `strict`, `once`.
- verificationMode - The method for checking the node's certificates. Details are in the description of the [Context](https://github.com/ClickHouse-Extras/poco/blob/master/NetSSL_OpenSSL/include/Poco/Net/Context.h) class. Possible values: `none`, `relaxed`, `strict`, `once`.
- verificationDepth - The maximum length of the verification chain. Verification will fail if the certificate chain length exceeds the set value.
- loadDefaultCAFile - Indicates that built-in CA certificates for OpenSSL will be used. Acceptable values: `true`, `false`.
- cipherList - Supported OpenSSL ciphers. For example: `ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH`.

View File

@ -1434,7 +1434,7 @@ and the result will be put in a temporary table in RAM. Then the request will be
SELECT uniq(UserID) FROM local_table WHERE CounterID = 101500 AND UserID GLOBAL IN _data1
```
and the temporary table '_data1' will be sent to every remote server together with the query (the name of the temporary table is implementation-defined).
and the temporary table `_data1` will be sent to every remote server together with the query (the name of the temporary table is implementation-defined).
This is more efficient than using the normal IN. However, keep the following points in mind:
@ -1476,28 +1476,29 @@ In all other cases, we don't recommend using the asterisk, since it only gives y
## KILL QUERY
```sql
KILL QUERY WHERE <where expression to SELECT FROM system.processes query> [SYNC|ASYNC|TEST] [FORMAT format]
KILL QUERY
WHERE <where expression to SELECT FROM system.processes query>
[SYNC|ASYNC|TEST]
[FORMAT format]
```
Attempts to terminate queries currently running.
The queries to terminate are selected from the system.processes table for which expression_for_system.processes is true.
The queries to terminate are selected from the `system.processes` table: those for which the `WHERE` expression is true.
Examples:
```sql
-- Terminates all queries with the specified query_id.
KILL QUERY WHERE query_id='2-857d-4a57-9ee0-327da5d60a90'
```
Terminates all queries with the specified query_id.
```sql
-- Synchronously terminates all queries run by `username`.
KILL QUERY WHERE user='username' SYNC
```
Synchronously terminates all queries run by `username`.
Read-only users can only terminate their own queries.
By default, the asynchronous version (`ASYNC`) is used, which returns without waiting for the queries to stop.
The synchronous version (`SYNC`) waits for all the queries to stop and displays information about each process as it terminates.
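From a shell, the choice between the two modes could be wrapped as in the sketch below. The helper names (`kill_query_sql`, `kill_query`) are assumptions made for illustration, and the `query_id` is assumed to contain no single quotes; the statement is passed through `clickhouse-client --query`.

```bash
# Hypothetical helper: build a KILL QUERY statement for one query_id.
# $1 = query_id (assumed to contain no single quotes)
# $2 = SYNC, ASYNC or TEST; ASYNC is used when omitted, matching the default.
kill_query_sql() {
    printf "KILL QUERY WHERE query_id='%s' %s\n" "$1" "${2:-ASYNC}"
}

# Hypothetical helper: send the statement through clickhouse-client.
kill_query() {
    clickhouse-client --query="$(kill_query_sql "$@")"
}
```

Calling `kill_query '2-857d-4a57-9ee0-327da5d60a90' SYNC` would then wait for the matching queries to stop before returning.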
The response contains the `kill_status` column, which can take the following values:

View File

@ -22,7 +22,7 @@
If a `SELECT` query has a `GROUP BY` clause or at least one aggregate function, ClickHouse (in contrast to MySQL, for example) requires that all expressions in the `SELECT`, `HAVING`, and `ORDER BY` clauses be calculated from keys or from aggregate functions. In other words, each column selected from the table must be used either in keys or inside aggregate functions. To get behavior like in MySQL, you can put the other columns in the `any` aggregate function.
## anyHeavy
## anyHeavy(x)
Selects a frequently occurring value using the "[heavy hitters](http://www.cs.umd.edu/~samir/498/karp.pdf)" algorithm. If there is a value that occurs more than half the time in each of the query's execution threads, this value is returned. In general, the result is nondeterministic.
@ -185,7 +185,7 @@ GROUP BY timeslot
<a name="agg_functions_groupArrayInsertAt"></a>
## groupArrayInsertAt
## groupArrayInsertAt(x)
Inserts a value into the array at the specified position.
@ -281,7 +281,7 @@ GROUP BY timeslot
The result depends on the order in which the query is executed and is nondeterministic.
## median
## median(x)
All the quantile functions have corresponding median functions: `median`, `medianDeterministic`, `medianTiming`, `medianTimingWeighted`, `medianExact`, `medianExactWeighted`, `medianTDigest`. They are synonyms and their behavior is identical.
@ -315,7 +315,7 @@ GROUP BY timeslot
The result is equal to the square root of `varPop(x)`.
## topK
## topK(N)(column)
Returns an array of the most frequent values in the specified column. The resulting array is sorted in descending order of frequency of values (not by the values themselves).

View File

@ -33,7 +33,7 @@
The maximum amount of RAM to use for running a query on a single server.
В конфигурационном файле по-умолчанию, ограничение равно 10 ГБ.
В конфигурационном файле по умолчанию, ограничение равно 10 ГБ.
The setting doesn't consider the volume of available memory or the total volume of memory on the machine.
The restriction applies to a single query within a single server.

View File

@ -180,7 +180,7 @@ DROP DATABASE [IF EXISTS] db [ON CLUSTER cluster]
If `IF EXISTS` is specified, no error is returned if the database doesn't exist.
```sql
DROP TABLE [IF EXISTS] [db.]name [ON CLUSTER cluster]
DROP [TEMPORARY] TABLE [IF EXISTS] [db.]name [ON CLUSTER cluster]
```
Deletes the table.
@ -444,7 +444,7 @@ SHOW DATABASES [INTO OUTFILE filename] [FORMAT format]
## SHOW TABLES
```sql
SHOW TABLES [FROM db] [LIKE 'pattern'] [INTO OUTFILE filename] [FORMAT format]
SHOW [TEMPORARY] TABLES [FROM db] [LIKE 'pattern'] [INTO OUTFILE filename] [FORMAT format]
```
Displays a list of tables
@ -491,7 +491,7 @@ watch -n1 "clickhouse-client --query='SHOW PROCESSLIST'"
## SHOW CREATE TABLE
```sql
SHOW CREATE TABLE [db.]table [INTO OUTFILE filename] [FORMAT format]
SHOW CREATE [TEMPORARY] TABLE [db.]table [INTO OUTFILE filename] [FORMAT format]
```
Returns a single `String`-type column named statement containing a single value: the `CREATE` query used to create the specified table.
@ -509,7 +509,7 @@ DESC|DESCRIBE TABLE [db.]table [INTO OUTFILE filename] [FORMAT format]
## EXISTS
```sql
EXISTS TABLE [db.]name [INTO OUTFILE filename] [FORMAT format]
EXISTS [TEMPORARY] TABLE [db.]name [INTO OUTFILE filename] [FORMAT format]
```
Returns a single `UInt8`-type column containing a single value: `0` if the table or database doesn't exist, or `1` if the table exists in the specified database.
@ -1430,7 +1430,7 @@ SELECT UserID FROM distributed_table WHERE CounterID = 34
SELECT uniq(UserID) FROM local_table WHERE CounterID = 101500 AND UserID GLOBAL IN _data1
```
and the temporary table _data1 will be sent to every remote server together with the query (the name of the temporary table is implementation-defined).
and the temporary table `_data1` will be sent to every remote server together with the query (the name of the temporary table is implementation-defined).
This is much more efficient than using the normal IN. However, keep the following points in mind: