-
-### CPU
+
+
+### By calculation
Since executing a query requires processing a large number of rows, it helps to dispatch all operations for entire vectors instead of for separate rows, or to implement the query engine so that there is almost no dispatching cost. If you don't do this, with any half-decent disk subsystem, the query interpreter inevitably stalls the CPU.
It makes sense to both store data in columns and process it, when possible, by columns.
@@ -140,4 +138,3 @@ There are two ways to do this:
This is not done in "normal" databases, because it doesn't make sense when running simple queries. However, there are exceptions. For example, MemSQL uses code generation to reduce latency when processing SQL queries. (For comparison, analytical DBMSs require optimization of throughput, not latency.)
Note that for CPU efficiency, the query language must be declarative (SQL or MDX), or at least vector-based (J, K). The query should only contain implicit loops, allowing for optimization.
-
diff --git a/docs/en/interfaces/cli.md b/docs/en/interfaces/cli.md
index 294052419b1..2123842ce62 100644
--- a/docs/en/interfaces/cli.md
+++ b/docs/en/interfaces/cli.md
@@ -1,6 +1,6 @@
# Command-line Client
-To work from the command line, you can use ` clickhouse-client`:
+To work from the command line, you can use `clickhouse-client`:
```bash
$ clickhouse-client
@@ -31,6 +31,7 @@ _EOF
cat file.csv | clickhouse-client --database=test --query="INSERT INTO test FORMAT CSV";
```
+
In batch mode, the default data format is TabSeparated. You can set the format in the FORMAT clause of the query.
By default, you can only process a single query in batch mode. To make multiple queries from a "script," use the --multiquery parameter. This works for all queries except INSERT. Query results are output consecutively without additional separators.
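+
+For example, a quick sketch of `--multiquery` (the queries here are illustrative):
+
+```bash
+clickhouse-client --multiquery --query="SELECT 1; SELECT 2;"
+```
+
+The two results are printed one after the other, without additional separators.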
@@ -51,7 +52,8 @@ The history is written to `~/.clickhouse-client-history`.
By default, the format used is PrettyCompact. You can change the format in the FORMAT clause of the query, or by specifying `\G` at the end of the query, using the `--format` or `--vertical` argument in the command line, or using the client configuration file.
-To exit the client, press Ctrl+D (or Ctrl+C), or enter one of the following instead of a query: "exit", "quit", "logout", "exit;", "quit;", "logout;", "q", "Q", ":q"
+To exit the client, press Ctrl+D (or Ctrl+C), or enter one of the following instead of a query:
+"exit", "quit", "logout", "exit;", "quit;", "logout;", "q", "Q", ":q"
When processing a query, the client shows:
diff --git a/docs/en/interfaces/formats.md b/docs/en/interfaces/formats.md
index 6d46cedbd5b..8402f660d37 100644
--- a/docs/en/interfaces/formats.md
+++ b/docs/en/interfaces/formats.md
@@ -1,36 +1,36 @@
-# Input and Output Formats
+# Formats for input and output data
-The format determines how data is returned to you after SELECTs (how it is written and formatted by the server), and how it is accepted for INSERTs (how it is read and parsed by the server).
+ClickHouse can accept (`INSERT`) and return (`SELECT`) data in various formats.
-See the table below for the list of supported formats for either kinds of queries.
+The table below lists supported formats and how they can be used in `INSERT` and `SELECT` queries.
-Format | INSERT | SELECT
--------|--------|--------
-[TabSeparated](formats.md#tabseparated) | ✔ | ✔ |
-[TabSeparatedRaw](formats.md#tabseparatedraw) | ✗ | ✔ |
-[TabSeparatedWithNames](formats.md#tabseparatedwithnames) | ✔ | ✔ |
-[TabSeparatedWithNamesAndTypes](formats.md#tabseparatedwithnamesandtypes) | ✔ | ✔ |
-[CSV](formats.md#csv) | ✔ | ✔ |
-[CSVWithNames](formats.md#csvwithnames) | ✔ | ✔ |
-[Values](formats.md#values) | ✔ | ✔ |
-[Vertical](formats.md#vertical) | ✗ | ✔ |
-[VerticalRaw](formats.md#verticalraw) | ✗ | ✔ |
-[JSON](formats.md#json) | ✗ | ✔ |
-[JSONCompact](formats.md#jsoncompact) | ✗ | ✔ |
-[JSONEachRow](formats.md#jsoneachrow) | ✔ | ✔ |
-[TSKV](formats.md#tskv) | ✔ | ✔ |
-[Pretty](formats.md#pretty) | ✗ | ✔ |
-[PrettyCompact](formats.md#prettycompact) | ✗ | ✔ |
-[PrettyCompactMonoBlock](formats.md#prettycompactmonoblock) | ✗ | ✔ |
-[PrettyNoEscapes](formats.md#prettynoescapes) | ✗ | ✔ |
-[PrettySpace](formats.md#prettyspace) | ✗ | ✔ |
-[RowBinary](formats.md#rowbinary) | ✔ | ✔ |
-[Native](formats.md#native) | ✔ | ✔ |
-[Null](formats.md#null) | ✗ | ✔ |
-[XML](formats.md#xml) | ✗ | ✔ |
-[CapnProto](formats.md#capnproto) | ✔ | ✔ |
+| Format | INSERT | SELECT |
+| ------- | -------- | -------- |
+| [TabSeparated](#tabseparated) | ✔ | ✔ |
+| [TabSeparatedRaw](#tabseparatedraw) | ✗ | ✔ |
+| [TabSeparatedWithNames](#tabseparatedwithnames) | ✔ | ✔ |
+| [TabSeparatedWithNamesAndTypes](#tabseparatedwithnamesandtypes) | ✔ | ✔ |
+| [CSV](#csv) | ✔ | ✔ |
+| [CSVWithNames](#csvwithnames) | ✔ | ✔ |
+| [Values](#values) | ✔ | ✔ |
+| [Vertical](#vertical) | ✗ | ✔ |
+| [VerticalRaw](#verticalraw) | ✗ | ✔ |
+| [JSON](#json) | ✗ | ✔ |
+| [JSONCompact](#jsoncompact) | ✗ | ✔ |
+| [JSONEachRow](#jsoneachrow) | ✔ | ✔ |
+| [TSKV](#tskv) | ✔ | ✔ |
+| [Pretty](#pretty) | ✗ | ✔ |
+| [PrettyCompact](#prettycompact) | ✗ | ✔ |
+| [PrettyCompactMonoBlock](#prettycompactmonoblock) | ✗ | ✔ |
+| [PrettyNoEscapes](#prettynoescapes) | ✗ | ✔ |
+| [PrettySpace](#prettyspace) | ✗ | ✔ |
+| [RowBinary](#rowbinary) | ✔ | ✔ |
+| [Native](#native) | ✔ | ✔ |
+| [Null](#null) | ✗ | ✔ |
+| [XML](#xml) | ✗ | ✔ |
+| [CapnProto](#capnproto) | ✔ | ✔ |
@@ -57,26 +57,30 @@ struct Message {
Schema files are located in the directory specified by the [format_schema_path](../operations/server_settings/settings.md#server_settings-format_schema_path) setting in the server configuration.
Deserialization is efficient and usually doesn't increase the system load.
+
## CSV
Comma Separated Values format ([RFC](https://tools.ietf.org/html/rfc4180)).
-When formatting, rows are enclosed in double quotes. A double quote inside a string is output as two double quotes in a row. There are no other rules for escaping characters. Date and date-time are enclosed in double quotes. Numbers are output without quotes. Values are separated by a delimiter*. Rows are separated using the Unix line feed (LF). Arrays are serialized in CSV as follows: first the array is serialized to a string as in TabSeparated format, and then the resulting string is output to CSV in double quotes. Tuples in CSV format are serialized as separate columns (that is, their nesting in the tuple is lost).
+When formatting, rows are enclosed in double quotes. A double quote inside a string is output as two double quotes in a row. There are no other rules for escaping characters. Date and date-time are enclosed in double quotes. Numbers are output without quotes. Values are separated by a delimiter character, which is `,` by default. The delimiter character is defined in the setting [format_csv_delimiter](../operations/settings/settings.md#format_csv_delimiter). Rows are separated using the Unix line feed (LF). Arrays are serialized in CSV as follows: first the array is serialized to a string as in TabSeparated format, and then the resulting string is output to CSV in double quotes. Tuples in CSV format are serialized as separate columns (that is, their nesting in the tuple is lost).
```
clickhouse-client --format_csv_delimiter="|" --query="INSERT INTO test.csv FORMAT CSV" < data.csv
```
-*By default — `,`. See a [format_csv_delimiter](/operations/settings/settings/#format_csv_delimiter) setting for additional info.
-When parsing, all values can be parsed either with or without quotes. Both double and single quotes are supported. Rows can also be arranged without quotes. In this case, they are parsed up to a delimiter or line feed (CR or LF). In violation of the RFC, when parsing rows without quotes, the leading and trailing spaces and tabs are ignored. For the line feed, Unix (LF), Windows (CR LF) and Mac OS Classic (CR LF) are all supported.
+When parsing, all values can be parsed either with or without quotes. Both double and single quotes are supported. Rows can also be arranged without quotes. In this case, they are parsed up to the delimiter character or line feed (CR or LF). In violation of the RFC, when parsing rows without quotes, the leading and trailing spaces and tabs are ignored. All three line feed types are supported: Unix (LF), Windows (CR LF), and Mac OS Classic (CR).
+
+`NULL` is formatted as `\N`.
The CSV format supports the output of totals and extremes the same way as `TabSeparated`.
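+
+As a sketch, inserting CSV rows with a `NULL` into the `t_null` table used in later examples (assuming it has columns `x` and `y`):
+
+```bash
+printf '1,\\N\n2,3\n' | clickhouse-client --query="INSERT INTO t_null FORMAT CSV"
+```
+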
## CSVWithNames
Also prints the header row, similar to `TabSeparatedWithNames`.
+
## JSON
@@ -150,7 +154,7 @@ SELECT SearchPhrase, count() AS c FROM test.hits GROUP BY SearchPhrase WITH TOTA
}
```
-The JSON is compatible with JavaScript. To ensure this, some characters are additionally escaped: the slash ` /` is escaped as ` \/`; alternative line breaks ` U+2028` and ` U+2029`, which break some browsers, are escaped as ` \uXXXX`. ASCII control characters are escaped: backspace, form feed, line feed, carriage return, and horizontal tab are replaced with `\b`, `\f`, `\n`, `\r`, `\t` , as well as the remaining bytes in the 00-1F range using `\uXXXX` sequences. Invalid UTF-8 sequences are changed to the replacement character � so the output text will consist of valid UTF-8 sequences. For compatibility with JavaScript, Int64 and UInt64 integers are enclosed in double quotes by default. To remove the quotes, you can set the configuration parameter output_format_json_quote_64bit_integers to 0.
+The JSON is compatible with JavaScript. To ensure this, some characters are additionally escaped: the slash `/` is escaped as `\/`; alternative line breaks `U+2028` and `U+2029`, which break some browsers, are escaped as `\uXXXX`. ASCII control characters are escaped: backspace, form feed, line feed, carriage return, and horizontal tab are replaced with `\b`, `\f`, `\n`, `\r`, `\t`, as well as the remaining bytes in the 00-1F range using `\uXXXX` sequences. Invalid UTF-8 sequences are changed to the replacement character � so the output text will consist of valid UTF-8 sequences. For compatibility with JavaScript, Int64 and UInt64 integers are enclosed in double quotes by default. To remove the quotes, you can set the configuration parameter `output_format_json_quote_64bit_integers` to 0.
`rows` – The total number of output rows.
@@ -162,7 +166,13 @@ If the query contains GROUP BY, rows_before_limit_at_least is the exact number o
`extremes` – Extreme values (when extremes is set to 1).
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
+
+ClickHouse supports [NULL](../query_language/syntax.md#null-literal), which is displayed as `null` in the JSON output.
+
See also the JSONEachRow format.
+
+
+
## JSONCompact
Differs from JSON only in that data rows are output in arrays, not in objects.
@@ -188,8 +198,8 @@ Example:
["", "8267016"],
["bathroom interior design", "2166"],
["yandex", "1655"],
- ["spring 2014 fashion", "1549"],
- ["freeform photos", "1480"]
+ ["fashion trends spring 2014", "1549"],
+ ["freeform photo", "1480"]
],
"totals": ["","8873898"],
@@ -208,6 +218,7 @@ Example:
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
See also the `JSONEachRow` format.
+
## JSONEachRow
@@ -215,37 +226,53 @@ Outputs data as separate JSON objects for each row (newline delimited JSON).
```json
{"SearchPhrase":"","count()":"8267016"}
-{"SearchPhrase":"bathroom interior design","count()":"2166"}
+{"SearchPhrase": "bathroom interior design","count()": "2166"}
{"SearchPhrase":"yandex","count()":"1655"}
-{"SearchPhrase":"spring 2014 fashion","count()":"1549"}
+{"SearchPhrase":"2014 spring fashion","count()":"1549"}
{"SearchPhrase":"freeform photo","count()":"1480"}
{"SearchPhrase":"angelina jolie","count()":"1245"}
{"SearchPhrase":"omsk","count()":"1112"}
{"SearchPhrase":"photos of dog breeds","count()":"1091"}
-{"SearchPhrase":"curtain design","count()":"1064"}
+{"SearchPhrase":"curtain designs","count()":"1064"}
{"SearchPhrase":"baku","count()":"1000"}
```
Unlike the JSON format, there is no substitution of invalid UTF-8 sequences. Any set of bytes can be output in the rows. This is necessary so that data can be formatted without losing any information. Values are escaped in the same way as for JSON.
For parsing, any order is supported for the values of different columns. It is acceptable for some values to be omitted – they are treated as equal to their default values. In this case, zeros and blank rows are used as default values. Complex values that could be specified in the table are not supported as defaults. Whitespace between elements is ignored. If a comma is placed after the objects, it is ignored. Objects don't necessarily have to be separated by new lines.
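+
+A minimal sketch of these parsing rules, reusing the `t_null` table from the examples below (column order differs between the two objects, and `y` is omitted in the second one, so it receives its default value):
+
+```bash
+echo '{"y": null, "x": 1} {"x": 2}' | clickhouse-client --query="INSERT INTO t_null FORMAT JSONEachRow"
+```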
+
## Native
The most efficient format. Data is written and read by blocks in binary format. For each block, the number of rows, number of columns, column names and types, and parts of columns in this block are recorded one after another. In other words, this format is "columnar" – it doesn't convert columns to rows. This is the format used in the native interface for interaction between servers, for using the command-line client, and for C++ clients.
You can use this format to quickly generate dumps that can only be read by the ClickHouse DBMS. It doesn't make sense to work with this format yourself.
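+
+For example, a dump-and-restore sketch (the table names are illustrative):
+
+```bash
+clickhouse-client --query="SELECT * FROM test.hits FORMAT Native" > hits.native
+clickhouse-client --query="INSERT INTO test.hits_copy FORMAT Native" < hits.native
+```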
+
## Null
Nothing is output. However, the query is processed, and when using the command-line client, data is transmitted to the client. This is used for tests, including performance testing.
Obviously, this format is only appropriate for output, not for parsing.
+
## Pretty
Outputs data as Unicode-art tables, also using ANSI-escape sequences for setting colors in the terminal.
A full grid of the table is drawn, and each row occupies two lines in the terminal.
Each result block is output as a separate table. This is necessary so that blocks can be output without buffering results (buffering would be necessary in order to pre-calculate the visible width of all the values).
+
+[NULL](../query_language/syntax.md#null-literal) is output as `ᴺᵁᴸᴸ`.
+
+```sql
+SELECT * FROM t_null
+```
+
+```
+┌─x─┬────y─┐
+│ 1 │ ᴺᵁᴸᴸ │
+└───┴──────┘
+```
+
To avoid dumping too much data to the terminal, only the first 10,000 rows are printed. If the number of rows is greater than or equal to 10,000, the message "Showed first 10 000" is printed.
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
@@ -278,14 +305,18 @@ Extremes:
└────────────┴─────────┘
```
+
+
## PrettyCompact
Differs from `Pretty` in that the grid is drawn between rows and the result is more compact.
This format is used by default in the command-line client in interactive mode.
+
## PrettyCompactMonoBlock
-Differs from `PrettyCompact` in that up to 10,000 rows are buffered, then output as a single table, not by blocks.
+Differs from [PrettyCompact](#prettycompact) in that up to 10,000 rows are buffered, then output as a single table, not by blocks.
+
## PrettyNoEscapes
@@ -306,10 +337,12 @@ The same as the previous setting.
### PrettySpaceNoEscapes
The same as the previous setting.
+
## PrettySpace
-Differs from `PrettyCompact` in that whitespace (space characters) is used instead of the grid.
+Differs from [PrettyCompact](#prettycompact) in that whitespace (space characters) is used instead of the grid.
+
## RowBinary
@@ -324,10 +357,41 @@ FixedString is represented simply as a sequence of bytes.
Array is represented as a varint length (unsigned [LEB128](https://en.wikipedia.org/wiki/LEB128)), followed by successive elements of the array.
+For [NULL](../query_language/syntax.md#null-literal) support, an additional byte containing 1 or 0 is added before each [Nullable](../data_types/nullable.md#data_type-nullable) value. If 1, then the value is `NULL` and this byte is interpreted as a separate value. If 0, the value after the byte is not `NULL`.
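+
+As an illustration, the marker byte can be seen by dumping a `Nullable` column and inspecting the raw output (a sketch using the `t_null` table from the examples above):
+
+```bash
+clickhouse-client --query="SELECT y FROM t_null FORMAT RowBinary" | xxd
+```
+
+For the `NULL` value in `y`, only the marker byte `01` is written.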
+
+
+
## TabSeparated
In TabSeparated format, data is written by row. Each row contains values separated by tabs. Each value is followed by a tab, except the last value in the row, which is followed by a line feed. Strictly Unix line feeds are assumed everywhere. The last row must also contain a line feed at the end. Values are written in text format, without enclosing quotation marks, and with special characters escaped.
+This format is also available under the name `TSV`.
+
+The `TabSeparated` format is convenient for processing data using custom programs and scripts. It is used by default in the HTTP interface, and in the command-line client's batch mode. This format also allows transferring data between different DBMSs. For example, you can get a dump from MySQL and upload it to ClickHouse, or vice versa.
+
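+For instance, a sketch of moving a MySQL dump into ClickHouse (the table and query are hypothetical):
+
+```bash
+mysql --batch --skip-column-names -e "SELECT EventDate, UserID FROM hits" > dump.tsv
+clickhouse-client --query="INSERT INTO test.hits FORMAT TabSeparated" < dump.tsv
+```
+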
+The `TabSeparated` format supports outputting total values (when using WITH TOTALS) and extreme values (when 'extremes' is set to 1). In these cases, the total values and extremes are output after the main data. The main result, total values, and extremes are separated from each other by an empty line. Example:
+
+```sql
+SELECT EventDate, count() AS c FROM test.hits GROUP BY EventDate WITH TOTALS ORDER BY EventDate FORMAT TabSeparated
+```
+
+```text
+2014-03-17 1406958
+2014-03-18 1383658
+2014-03-19 1405797
+2014-03-20 1353623
+2014-03-21 1245779
+2014-03-22 1031592
+2014-03-23 1046491
+
+0000-00-00 8873898
+
+2014-03-17 1031592
+2014-03-23 1406958
+```
+
+### Data formatting
+
Integer numbers are written in decimal form. Numbers can contain an extra "+" character at the beginning (ignored when parsing, and not recorded when formatting). Non-negative numbers can't contain the negative sign. When reading, it is allowed to parse an empty string as a zero, or (for signed types) a string consisting of just a minus sign as a zero. Numbers that do not fit into the corresponding data type may be parsed as a different number, without an error message.
Floating-point numbers are written in decimal form. The dot is used as the decimal separator. Exponential entries are supported, as are 'inf', '+inf', '-inf', and 'nan'. An entry of floating-point numbers may begin or end with a decimal point.
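+
+For example, a sketch of these parsing rules in action (the `test.nums` table with columns `x Int32, y Float64` is hypothetical): an empty first field parses as zero, and `inf` is accepted for the float column.
+
+```bash
+printf '\tinf\n+5\t-0.5\n' | clickhouse-client --query="INSERT INTO test.nums FORMAT TabSeparated"
+```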
@@ -358,37 +422,17 @@ Only a small set of symbols are escaped. You can easily stumble onto a string va
Arrays are written as a list of comma-separated values in square brackets. Number items in the array are formatted as normal, but dates, dates with times, and strings are written in single quotes with the same escaping rules as above.
-The TabSeparated format is convenient for processing data using custom programs and scripts. It is used by default in the HTTP interface, and in the command-line client's batch mode. This format also allows transferring data between different DBMSs. For example, you can get a dump from MySQL and upload it to ClickHouse, or vice versa.
+[NULL](../query_language/syntax.md#null-literal) is formatted as `\N`.
-The TabSeparated format supports outputting total values (when using WITH TOTALS) and extreme values (when 'extremes' is set to 1). In these cases, the total values and extremes are output after the main data. The main result, total values, and extremes are separated from each other by an empty line. Example:
-
-```sql
-SELECT EventDate, count() AS c FROM test.hits GROUP BY EventDate WITH TOTALS ORDER BY EventDate FORMAT TabSeparated``
-```
-
-```text
-2014-03-17 1406958
-2014-03-18 1383658
-2014-03-19 1405797
-2014-03-20 1353623
-2014-03-21 1245779
-2014-03-22 1031592
-2014-03-23 1046491
-
-0000-00-00 8873898
-
-2014-03-17 1031592
-2014-03-23 1406958
-```
-
-This format is also available under the name `TSV`.
+
## TabSeparatedRaw
Differs from `TabSeparated` format in that the rows are written without escaping.
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
-This format is also available under the name ` TSVRaw`.
+This format is also available under the name `TSVRaw`.
+
## TabSeparatedWithNames
@@ -396,14 +440,16 @@ Differs from the `TabSeparated` format in that the column names are written in t
During parsing, the first row is completely ignored. You can't use column names to determine their position or to check their correctness.
(Support for parsing the header row may be added in the future.)
-This format is also available under the name ` TSVWithNames`.
+This format is also available under the name `TSVWithNames`.
+
## TabSeparatedWithNamesAndTypes
Differs from the `TabSeparated` format in that the column names are written to the first row, while the column types are in the second row.
During parsing, the first and second rows are completely ignored.
-This format is also available under the name ` TSVWithNamesAndTypes`.
+This format is also available under the name `TSVWithNamesAndTypes`.
+
## TSKV
@@ -413,15 +459,25 @@ Similar to TabSeparated, but outputs a value in name=value format. Names are esc
SearchPhrase= count()=8267016
SearchPhrase=bathroom interior design count()=2166
SearchPhrase=yandex count()=1655
-SearchPhrase=spring 2014 fashion count()=1549
+SearchPhrase=2014 spring fashion count()=1549
SearchPhrase=freeform photos count()=1480
-SearchPhrase=angelina jolia count()=1245
+SearchPhrase=angelina jolie count()=1245
SearchPhrase=omsk count()=1112
SearchPhrase=photos of dog breeds count()=1091
-SearchPhrase=curtain design count()=1064
+SearchPhrase=curtain designs count()=1064
SearchPhrase=baku count()=1000
```
+[NULL](../query_language/syntax.md#null-literal) is formatted as `\N`.
+
+```sql
+SELECT * FROM t_null FORMAT TSKV
+```
+
+```
+x=1 y=\N
+```
+
When there is a large number of small columns, this format is inefficient, and there is generally no reason to use it. It is used in some departments of Yandex.
Both data output and parsing are supported in this format. For parsing, any order is supported for the values of different columns. It is acceptable for some values to be omitted – they are treated as equal to their default values. In this case, zeros and blank rows are used as default values. Complex values that could be specified in the table are not supported as defaults.
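+
+A small sketch of parsing with an omitted column, reusing the `t_null` table from the example above (`y` is omitted, so it receives its default value):
+
+```bash
+echo 'x=1' | clickhouse-client --query="INSERT INTO t_null FORMAT TSKV"
+```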
@@ -430,17 +486,37 @@ Parsing allows the presence of the additional field `tskv` without the equal sig
## Values
-Prints every row in brackets. Rows are separated by commas. There is no comma after the last row. The values inside the brackets are also comma-separated. Numbers are output in decimal format without quotes. Arrays are output in square brackets. Strings, dates, and dates with times are output in quotes. Escaping rules and parsing are similar to the TabSeparated format. During formatting, extra spaces aren't inserted, but during parsing, they are allowed and skipped (except for spaces inside array values, which are not allowed).
+Prints every row in brackets. Rows are separated by commas. There is no comma after the last row. The values inside the brackets are also comma-separated. Numbers are output in decimal format without quotes. Arrays are output in square brackets. Strings, dates, and dates with times are output in quotes. Escaping rules and parsing are similar to the [TabSeparated](#tabseparated) format. During formatting, extra spaces aren't inserted, but during parsing, they are allowed and skipped (except for spaces inside array values, which are not allowed). [NULL](../query_language/syntax.md#null-literal) is represented as `NULL`.
The minimum set of characters that you need to escape when passing data in Values format: single quotes and backslashes.
This is the format that is used in `INSERT INTO t VALUES ...`, but you can also use it for formatting query results.
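+
+For example, a sketch of the minimal escaping (a `test.strings` table with a single `String` column is assumed); note that the backslashes are doubled once more for the shell:
+
+```bash
+echo "INSERT INTO test.strings VALUES ('it\'s a backslash: \\\\')" | clickhouse-client
+```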
+
+
## Vertical
Prints each value on a separate line with the column name specified. This format is convenient for printing just one or a few rows, if each row consists of a large number of columns.
+
+[NULL](../query_language/syntax.md#null-literal) is output as `ᴺᵁᴸᴸ`.
+
+Example:
+
+```sql
+SELECT * FROM t_null FORMAT Vertical
+```
+
+```
+Row 1:
+──────
+x: 1
+y: ᴺᵁᴸᴸ
+```
+
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
+
+
## VerticalRaw
Differs from `Vertical` format in that the rows are not escaped.
@@ -469,6 +545,9 @@ Row 1:
──────
test: string with \'quotes\' and \t with some special \n characters
```
+
+
+
## XML
XML format is suitable only for output, not for parsing. Example:
@@ -502,7 +581,7 @@ XML format is suitable only for output, not for parsing. Example:
<field>1655</field>
-<SearchPhrase>spring 2014 fashion</SearchPhrase>
+<SearchPhrase>2014 spring fashion</SearchPhrase>
<field>1549</field>
@@ -522,7 +601,7 @@ XML format is suitable only for output, not for parsing. Example:
<field>1091</field>
-<SearchPhrase>curtain design</SearchPhrase>
+<SearchPhrase>curtain designs</SearchPhrase>
<field>1064</field>
@@ -540,6 +619,4 @@ Just as for JSON, invalid UTF-8 sequences are changed to the replacement charact
In string values, the characters `<` and `&` are escaped as `&lt;` and `&amp;`.
-Arrays are output as `<array><elem>Hello</elem><elem>World</elem>...</array>`,
-and tuples as `<tuple><elem>Hello</elem><elem>World</elem>...</tuple>`.
-
+Arrays are output as `<array><elem>Hello</elem><elem>World</elem>...</array>`, and tuples as `<tuple><elem>Hello</elem><elem>World</elem>...</tuple>`.
diff --git a/docs/en/interfaces/http_interface.md b/docs/en/interfaces/http_interface.md
index 7e16fd854ba..4b20f2a0b65 100644
--- a/docs/en/interfaces/http_interface.md
+++ b/docs/en/interfaces/http_interface.md
@@ -34,7 +34,8 @@ Date: Fri, 16 Nov 2012 19:21:50 GMT
1
```
-As you can see, curl is somewhat inconvenient in that spaces must be URL escaped.Although wget escapes everything itself, we don't recommend using it because it doesn't work well over HTTP 1.1 when using keep-alive and Transfer-Encoding: chunked.
+As you can see, curl is somewhat inconvenient in that spaces must be URL escaped.
+Although wget escapes everything itself, we don't recommend using it because it doesn't work well over HTTP 1.1 when using keep-alive and Transfer-Encoding: chunked.
```bash
$ echo 'SELECT 1' | curl 'http://localhost:8123/' --data-binary @-
@@ -170,8 +171,7 @@ echo 'SELECT 1' | curl 'http://localhost:8123/?user=user&password=password' -d @
```
If the user name is not indicated, the username 'default' is used. If the password is not indicated, an empty password is used.
-You can also use the URL parameters to specify any settings for processing a single query, or entire profiles of settings. Example:
-http://localhost:8123/?profile=web&max_rows_to_read=1000000000&query=SELECT+1
+You can also use the URL parameters to specify any settings for processing a single query, or entire profiles of settings. Example: `http://localhost:8123/?profile=web&max_rows_to_read=1000000000&query=SELECT+1`
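+
+For example, passing a setting and a query with curl:
+
+```bash
+curl 'http://localhost:8123/?max_rows_to_read=1000000000&query=SELECT+1'
+```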
For more information, see the section "Settings".
diff --git a/docs/en/interfaces/jdbc.md b/docs/en/interfaces/jdbc.md
index 9d4c4f7d37e..a8808770fa9 100644
--- a/docs/en/interfaces/jdbc.md
+++ b/docs/en/interfaces/jdbc.md
@@ -1,7 +1,5 @@
# JDBC Driver
-There is an official JDBC driver for ClickHouse. See [here](https://github.com/yandex/clickhouse-jdbc) .
+- [Official driver](https://github.com/yandex/clickhouse-jdbc).
+- Third-party driver from [ClickHouse-Native-JDBC](https://github.com/housepower/ClickHouse-Native-JDBC).
-JDBC drivers implemented by other organizations:
-
-- [ClickHouse-Native-JDBC](https://github.com/housepower/ClickHouse-Native-JDBC)
diff --git a/docs/en/interfaces/third-party_client_libraries.md b/docs/en/interfaces/third-party_client_libraries.md
index 311c03bb2e6..a46d9efc696 100644
--- a/docs/en/interfaces/third-party_client_libraries.md
+++ b/docs/en/interfaces/third-party_client_libraries.md
@@ -1,6 +1,6 @@
# Libraries from Third-party Developers
-There are libraries for working with ClickHouse for:
+We have not tested the libraries listed below.
- Python
- [infi.clickhouse_orm](https://github.com/Infinidat/infi.clickhouse_orm)
@@ -45,4 +45,3 @@ There are libraries for working with ClickHouse for:
- Nim
- [nim-clickhouse](https://github.com/leonardoce/nim-clickhouse)
-We have not tested these libraries. They are listed in random order.
diff --git a/docs/en/interfaces/third-party_gui.md b/docs/en/interfaces/third-party_gui.md
index 3bd64ee7bbb..eee127f4f0d 100644
--- a/docs/en/interfaces/third-party_gui.md
+++ b/docs/en/interfaces/third-party_gui.md
@@ -4,7 +4,8 @@
Web interface for ClickHouse in the [Tabix](https://github.com/tabixio/tabix) project.
-### Features:
+Main features:
+
- Works with ClickHouse directly from the browser, without the need to install additional software.
- Query editor with syntax highlighting.
- Auto-completion of commands.
@@ -13,22 +14,25 @@ Web interface for ClickHouse in the [Tabix](https://github.com/tabixio/tabix) pr
[Tabix documentation](https://tabix.io/doc/).
-
## HouseOps
-[HouseOps](https://github.com/HouseOps/HouseOps) is a unique Desktop ClickHouse Ops UI / IDE for OSX, Linux and Windows.
+[HouseOps](https://github.com/HouseOps/HouseOps) is a UI/IDE for OSX, Linux and Windows.
+
+Main features:
+
+- Query builder with syntax highlighting. View the response in a table or JSON view.
+- Export query results as CSV or JSON.
+- List of processes with descriptions. Write mode. Ability to stop (`KILL`) a process.
+- Database graph. Shows all tables and their columns with additional information.
+- Quick view of the column size.
+- Server configuration.
+
+The following features are planned for development:
+
+- Database management.
+- User management.
+- Real-time data analysis.
+- Cluster monitoring.
+- Cluster management.
+- Monitoring replicated and Kafka tables.
-### Features:
-- Query builder with syntax highlighting, response viewed in Table and JSON Object.
-- Export results in csv and JSON object.
-- Processes List with description, Record mode and Kill processes feature.
-- Database Graph with all tables and columns with extra informations.
-- Easy view your columns size.
-- Server settings.
-- Database manangement (soon);
-- Users manangement (soon);
-- Real-Time Data Analytics (soon);
-- Cluster/Infra monitoring (soon);
-- Cluster manangement (soon);
-- Kafka and Replicated tables monitoring (soon);
-- And a lot of others features for you take a beautiful implementation of ClickHouse.
diff --git a/docs/en/introduction/distinctive_features.md b/docs/en/introduction/distinctive_features.md
index b8660694fe5..382049cd848 100644
--- a/docs/en/introduction/distinctive_features.md
+++ b/docs/en/introduction/distinctive_features.md
@@ -2,68 +2,61 @@
## True Column-oriented DBMS
-In a true column-oriented DBMS, there is no excessive data stored with the values. For example, this means that constant-length values must be supported, to avoid storing their length as additional integer next to the values. In this case, a billion UInt8 values should actually consume around 1 GB uncompressed, or this will strongly affect the CPU use. It is very important to store data compactly even when uncompressed, since the speed of decompression (CPU usage) depends mainly on the volume of uncompressed data.
+In a true column-oriented DBMS, no extra data is stored with the values. Among other things, this means that constant-length values must be supported, to avoid storing their length as a separate "number" next to the values. As an example, a billion UInt8-type values should actually consume around 1 GB uncompressed, or this will strongly affect the CPU use. It is very important to store data compactly (without any "garbage") even when uncompressed, since the speed of decompression (CPU usage) depends mainly on the volume of uncompressed data.
-This is worth noting because there are systems that can store values of different columns separately, but that can't effectively process analytical queries due to their optimization for other scenarios. Examples are HBase, BigTable, Cassandra, and HyperTable. In these systems, you will get throughput around a hundred thousand rows per second, but not hundreds of millions of rows per second.
+This is worth noting because there are systems that can store values of individual columns separately, but that can't effectively process analytical queries due to their optimization for other scenarios. Examples are HBase, BigTable, Cassandra, and HyperTable. In these systems, you will get throughput around a hundred thousand rows per second, but not hundreds of millions of rows per second.
-Also note that ClickHouse is a database management system, not a single database. ClickHouse allows creating tables and databases in runtime, loading data, and running queries without reconfiguring and restarting the server.
+It's also worth noting that ClickHouse is a database management system, not a single database. ClickHouse allows creating tables and databases in runtime, loading data, and running queries without reconfiguring and restarting the server.
## Data Compression
-Some column-oriented DBMSs (InfiniDB CE and MonetDB) do not use data compression. However, data compression is crucial to achieve excellent performance.
+Some column-oriented DBMSs (InfiniDB CE and MonetDB) do not use data compression. However, data compression does play a key role in achieving excellent performance.
## Disk Storage of Data
-Many column-oriented DBMSs (such as SAP HANA and Google PowerDrill) can only work in RAM. This approach stimulates the allocation of a larger hardware budget than is actually necessary for real-time analysis. ClickHouse is designed to work on regular hard drives, which ensures low cost of ownership per gigabyte of data, but SSD and additional RAM are also utilized fully if available.
+Many column-oriented DBMSs (such as SAP HANA and Google PowerDrill) can only work in RAM. This approach encourages more budgeting for hardware than is actually needed for real-time analysis. ClickHouse is designed to work on normal hard drives, which means the cost per GB of data storage is low, but SSDs and additional RAM are also fully used if available.
## Parallel Processing on Multiple Cores
-Large queries are parallelized in a natural way, utilizing all necessary resources that are available on the current server.
+Large queries are parallelized in a natural way, taking all the necessary resources from what is available on the server.
## Distributed Processing on Multiple Servers
-Almost none of the columnar DBMSs mentioned above have support for distributed query processing.
+Almost none of the columnar DBMSs listed above have support for distributed processing.
In ClickHouse, data can reside on different shards. Each shard can be a group of replicas that are used for fault tolerance. The query is processed on all the shards in parallel. This is transparent for the user.
## SQL Support
-If you are familiar with standard SQL, we can't really talk about SQL support.
-All the functions have different names.
-However, this is a declarative query language based on SQL that can't be differentiated from SQL in many instances.
-JOINs are supported. Subqueries are supported in FROM, IN, and JOIN clauses, as well as scalar subqueries.
-Dependent subqueries are not supported.
-
-ClickHouse supports declarative query language that is based on SQL and complies to SQL standard in many cases.
-GROUP BY, ORDER BY, scalar subqueries and subqueries in FROM, IN and JOIN clauses are supported.
-Correlated subqueries and window functions are not supported.
+ClickHouse supports a declarative query language based on SQL that is identical to the SQL standard in many cases.
+Supported queries include GROUP BY, ORDER BY, subqueries in FROM, IN, and JOIN clauses, and scalar subqueries.
+Dependent subqueries and window functions are not supported.
## Vector Engine
-Data is not only stored by columns, but is also processed by vectors (parts of columns). This allows to achieve high CPU efficiency.
+Data is not only stored by columns, but is processed by vectors (parts of columns). This allows us to achieve high CPU performance.
## Real-time Data Updates
-ClickHouse supports tables with a primary key. In order to quickly perform queries on the range of the primary key, the data is sorted incrementally using the merge tree. Due to this, data can continually be added to the table. No locks are taken when new data is ingested.
+ClickHouse supports tables with a primary key. In order to quickly perform queries on the range of the primary key, the data is sorted incrementally using the merge tree. Due to this, data can continually be added to the table. There is no locking when adding data.
## Index
-Having a data physically sorted by primary key makes it possible to extract data for it's specific values or value ranges with low latency, less than few dozen milliseconds.
+Physical sorting of data by primary key allows you to get data for specific key values or ranges of values with low latency of less than several dozen milliseconds.
## Suitable for Online Queries
-Low latency means that queries can be processed without delay and without trying to prepare answer in advance, right at the same moment while user interface page is loading. In other words, online.
+Low latency means queries can be processed without delay and without preparing the response ahead of time, so a query can be processed while the user interface page is loading. In other words, in online mode.
## Support for Approximated Calculations
-ClickHouse provides various ways to trade accuracy for performance:
+ClickHouse provides various ways to change the precision of calculations for improved performance:
-1. Aggregate functions for approximated calculation of the number of distinct values, medians, and quantiles.
-2. Running a query based on a part (sample) of data and getting an approximated result. In this case, proportionally less data is retrieved from the disk.
-3. Running an aggregation for a limited number of random keys, instead of for all keys. Under certain conditions for key distribution in the data, this provides a reasonably accurate result while using fewer resources.
+1. The system contains aggregate functions for approximated calculation of the number of distinct values, medians, and quantiles.
+2. Supports running a query based on a part (sample) of data and getting an approximated result. In this case, proportionally less data is retrieved from the disk.
+3. Supports running an aggregation for a limited number of random keys, instead of for all keys. Under certain conditions for key distribution in the data, this provides a reasonably accurate result while using fewer resources.
-## Data Replication and Integrity
+## Data replication and data integrity support
-ClickHouse uses asynchronous multimaster replication. After being written to any available replica, data is distributed to all the other replicas in background. The system maintains identical data on different replicas. Data is restored automatically after most failures, or semiautomatically in complicated cases.
-
-For more information, see the [Data replication](../operations/table_engines/replication.md#table_engines-replication) section.
+ClickHouse uses asynchronous multimaster replication. After being written to any available replica, data is distributed to all the remaining replicas in the background. The system maintains identical data on different replicas. Recovery after most failures is performed automatically, and semi-automatically in complex cases.
+For more information, see the section [Data replication](../operations/table_engines/replication.md#table_engines-replication).
diff --git a/docs/en/introduction/features_considered_disadvantages.md b/docs/en/introduction/features_considered_disadvantages.md
index bc00374b88e..9e69486ada8 100644
--- a/docs/en/introduction/features_considered_disadvantages.md
+++ b/docs/en/introduction/features_considered_disadvantages.md
@@ -1,6 +1,7 @@
# ClickHouse Features that Can be Considered Disadvantages
-1. No full-fledged transactions.
-2. Lack of ability to modify or delete already inserted data with high rate and low latency. There are batch deletes available to clean up data that is not needed anymore or to comply with [GDPR](https://gdpr-info.eu). Batch updates are currently in development as of July 2018.
-3. Sparse index makes ClickHouse not really suitable for point queries retrieving single rows by their keys.
+1. Lack of full transactions.
+2. Previously recorded data can't be changed or deleted with low latency and high query frequency. Batch deletions are available for clearing data that is no longer relevant or falls under [GDPR](https://gdpr-info.eu) regulations. Batch updates are in development (as of July 2018).
+3. The sparse index makes ClickHouse ill-suited for point queries that retrieve single rows by their keys.
diff --git a/docs/en/introduction/performance.md b/docs/en/introduction/performance.md
index 80774c8c395..d9796d26388 100644
--- a/docs/en/introduction/performance.md
+++ b/docs/en/introduction/performance.md
@@ -1,24 +1,23 @@
# Performance
-According to internal testing results by Yandex, ClickHouse shows the best performance for comparable operating scenarios among systems of its class that were available for testing. This includes the highest throughput for long queries, and the lowest latency on short queries. Testing results are shown on a [separate page](https://clickhouse.yandex/benchmark.html).
+According to internal testing results at Yandex, ClickHouse shows the best performance (both the highest throughput for long queries and the lowest latency on short queries) for comparable operating scenarios among systems of its class that were available for testing. You can view the test results on a [separate page](https://clickhouse.yandex/benchmark.html).
-There are a lot of independent benchmarks that confirm this as well. You can look it up on your own or here is the small [collection of independent benchmark links](https://clickhouse.yandex/#independent-benchmarks).
+This has also been confirmed by numerous independent benchmarks. They are not difficult to find using an internet search, or you can see [our small collection of related links](https://clickhouse.yandex/#independent-benchmarks).
## Throughput for a Single Large Query
-Throughput can be measured in rows per second or in megabytes per second. If the data is placed in the page cache, a query that is not too complex is processed on modern hardware at a speed of approximately 2-10 GB/s of uncompressed data on a single server (for the simplest cases, the speed may reach 30 GB/s). If data is not placed in the page cache, the speed is bound by the disk subsystem and how well the data has been compressed. For example, if the disk subsystem allows reading data at 400 MB/s, and the data compression rate is 3, the speed will be around 1.2 GB/s. To get the speed in rows per second, divide the speed in bytes per second by the total size of the columns used in the query. For example, if 10 bytes of columns are extracted, the speed will be around 100-200 million rows per second.
+Throughput can be measured in rows per second or in megabytes per second. If the data is placed in the page cache, a query that is not too complex is processed on modern hardware at a speed of approximately 2-10 GB/s of uncompressed data on a single server (for the simplest cases, the speed may reach 30 GB/s). If data is not placed in the page cache, the speed depends on the disk subsystem and the data compression rate. For example, if the disk subsystem allows reading data at 400 MB/s, and the data compression rate is 3, the speed will be around 1.2 GB/s. To get the speed in rows per second, divide the speed in bytes per second by the total size of the columns used in the query. For example, if 10 bytes of columns are extracted, the speed will be around 100-200 million rows per second.
The processing speed increases almost linearly for distributed processing, but only if the number of rows resulting from aggregation or sorting is not too large.
## Latency When Processing Short Queries
-If a query uses a primary key and does not select too many rows to process (hundreds of thousands), and does not use too many columns, we can expect less than 50 milliseconds of latency (single digits of milliseconds in the best case) if data is placed in the page cache. Otherwise, latency is calculated from the number of seeks. If you use rotating drives, for a system that is not overloaded, the approximate latency can be calculated by this formula: seek time (10 ms) \* number of columns queried \* number of data parts.
+If a query uses a primary key and does not select too many rows to process (hundreds of thousands), and does not use too many columns, we can expect less than 50 milliseconds of latency (single digits of milliseconds in the best case) if data is placed in the page cache. Otherwise, latency is calculated from the number of seeks. If you use rotating drives, for a system that is not overloaded, the latency is calculated by this formula: seek time (10 ms) \* number of columns queried \* number of data parts.
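+
+For example, under these assumptions, a hypothetical query reading 5 columns from a table with 4 data parts would have a latency of roughly 10 ms * 5 * 4 = 200 ms on cold data.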
## Throughput When Processing a Large Quantity of Short Queries
-Under the same circumstances, ClickHouse can handle several hundred queries per second on a single server (up to several thousands in the best case). Since this scenario is not typical for analytical DBMSs, it is better to expect a maximum of hundreds of queries per second.
+Under the same conditions, ClickHouse can handle several hundred queries per second on a single server (up to several thousand in the best case). Since this scenario is not typical for analytical DBMSs, we recommend expecting a maximum of 100 queries per second.
## Performance When Inserting Data
-It is recommended to insert data in batches of at least 1000 rows, or no more than a single request per second. When inserting to a MergeTree table from a tab-separated dump, the insertion speed will be from 50 to 200 MB/s. If the inserted rows are around 1 Kb in size, the speed will be from 50,000 to 200,000 rows per second. If the rows are small, the performance will be higher in rows per second (on Banner System data -`>` 500,000 rows per second; on Graphite data -`>` 1,000,000 rows per second). To improve performance, you can make multiple INSERT queries in parallel, and performance will increase linearly.
-
+We recommend inserting data in batches of at least 1000 rows, or no more than a single request per second. When inserting to a MergeTree table from a tab-separated dump, the insertion speed will be from 50 to 200 MB/s. If the inserted rows are around 1 Kb in size, the speed will be from 50,000 to 200,000 rows per second. If the rows are small, the performance will be higher in rows per second (more than 500,000 rows per second on Banner System data, and more than 1,000,000 rows per second on Graphite data). To improve performance, you can make multiple INSERT queries in parallel, and performance will increase linearly.
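+
+A sketch of parallel insertion (the file names and table are hypothetical):
+
+```bash
+for f in chunk_*.tsv; do
+  clickhouse-client --query="INSERT INTO test.hits FORMAT TabSeparated" < "$f" &
+done
+wait
+```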
diff --git a/docs/en/introduction/ya_metrika_task.md b/docs/en/introduction/ya_metrika_task.md
index b573e5fef72..db173e17817 100644
--- a/docs/en/introduction/ya_metrika_task.md
+++ b/docs/en/introduction/ya_metrika_task.md
@@ -1,10 +1,10 @@
# Yandex.Metrica Use Case
-ClickHouse has been initially developed to power [Yandex.Metrica](https://metrica.yandex.com/), [the second largest web analytics platform in the world](http://w3techs.com/technologies/overview/traffic_analysis/all), and continues to be it's core component. With more than 13 trillion records in the database and more than 20 billion events daily, ClickHouse allows generating custom reports on the fly directly from non-aggregated data. This article gives a historical background on what was the main goal of ClickHouse before it became an opensource product.
+ClickHouse was originally developed to power [Yandex.Metrica](https://metrica.yandex.com/), [the second largest web analytics platform in the world](http://w3techs.com/technologies/overview/traffic_analysis/all), and continues to be the core component of this system. With more than 13 trillion records in the database and more than 20 billion events daily, ClickHouse allows generating custom reports on the fly directly from non-aggregated data. This article briefly covers the goals of ClickHouse in the early stages of its development.
-Yandex.Metrica generates custom reports based on hits and sessions on the fly, with arbitrary segments and time periods chosen by the end user. Complex aggregates are often required, such as the number of unique visitors. New data for the reports arrives in real-time.
+Yandex.Metrica builds customized reports on the fly based on hits and sessions, with arbitrary segments defined by the user. This often requires building complex aggregates, such as the number of unique users. New data for building a report is received in real time.
-As of April 2014, Yandex.Metrica received approximately 12 billion events (page views and clicks) daily. All these events must be stored in order to build those custom reports. A single query may require scanning millions of rows in no more than a few hundred milliseconds, or hundreds of millions of rows over a few seconds.
+As of April 2014, Yandex.Metrica was tracking about 12 billion events (page views and clicks) daily. All these events must be stored in order to build custom reports. A single query may require scanning millions of rows within a few hundred milliseconds, or hundreds of millions of rows in just a few seconds.
## Usage in Yandex.Metrica and Other Yandex Services
diff --git a/docs/en/operations/access_rights.md b/docs/en/operations/access_rights.md
index a8ed711b1e4..3064da75108 100644
--- a/docs/en/operations/access_rights.md
+++ b/docs/en/operations/access_rights.md
@@ -11,28 +11,29 @@ Users are recorded in the `users` section. Here is a fragment of the `users.xml`
+
+
@@ -84,13 +86,13 @@ Quotas can use the "quota key" feature in order to report on resources for multi
```xml
-
diff --git a/docs/en/operations/server_settings/index.md b/docs/en/operations/server_settings/index.md
index 208deec710c..5631d131a43 100644
--- a/docs/en/operations/server_settings/index.md
+++ b/docs/en/operations/server_settings/index.md
@@ -4,7 +4,7 @@
This section contains descriptions of server settings that cannot be changed at the session or query level.
-These settings are stored in the ` config.xml` file on the ClickHouse server.
+These settings are stored in the `config.xml` file on the ClickHouse server.
Other settings are described in the "[Settings](../settings/index.md#settings)" section.
diff --git a/docs/en/operations/server_settings/settings.md b/docs/en/operations/server_settings/settings.md
index b93fdd15e62..ca64c1b00a1 100644
--- a/docs/en/operations/server_settings/settings.md
+++ b/docs/en/operations/server_settings/settings.md
@@ -22,8 +22,8 @@ Default value: 3600.
Data compression settings.
-!!! warning "Warning"
- Don't use it if you have just started using ClickHouse.
+!!! warning "Important"
+    Don't use it if you have just started using ClickHouse.
The configuration looks like this:
@@ -44,7 +44,7 @@ Block field ``:
- ``min_part_size_ratio`` – The ratio of the minimum size of a table part to the full size of the table.
- ``method`` – Compression method. Acceptable values: ``lz4`` or ``zstd`` (experimental).
-ClickHouse checks ` min_part_size` and ` min_part_size_ratio` and processes the ` case` blocks that match these conditions. If none of the `` matches, ClickHouse applies the `lz4` compression algorithm.
+ClickHouse checks `min_part_size` and `min_part_size_ratio` and processes the `case` blocks that match these conditions. If none of the `<case>` blocks matches, ClickHouse applies the `lz4` compression algorithm.
**Example**
@@ -64,7 +64,7 @@ ClickHouse checks ` min_part_size` and ` min_part_size_ratio` and processes th
The default database.
-To get a list of databases, use the [SHOW DATABASES](../../query_language/misc.md#query_language_queries_show_databases).
+To get a list of databases, use the [SHOW DATABASES](../../query_language/misc.md#query_language_queries_show_databases) query.
**Example**
@@ -111,11 +111,11 @@ See also "[External dictionaries](../../query_language/dicts/external_dicts.md#d
Lazy loading of dictionaries.
-If ` true`, then each dictionary is created on first use. If dictionary creation failed, the function that was using the dictionary throws an exception.
+If `true`, then each dictionary is created on first use. If dictionary creation failed, the function that was using the dictionary throws an exception.
If `false`, all dictionaries are created when the server starts, and if there is an error, the server shuts down.
-The default is ` true`.
+The default is `true`.
**Example**
@@ -176,7 +176,7 @@ You can configure multiple `` clauses. For instance, you can use this
Settings for thinning data for Graphite.
-For more information, see [GraphiteMergeTree](../../operations/table_engines/graphitemergetree.md#table_engines-graphitemergetree).
+For more details, see [GraphiteMergeTree](../../operations/table_engines/graphitemergetree.md#table_engines-graphitemergetree).
**Example**
@@ -264,7 +264,7 @@ Port for exchanging data between ClickHouse servers.
The host name that can be used by other servers to access this server.
-If omitted, it is defined in the same way as the ` hostname-f` command.
+If omitted, it is defined in the same way as the `hostname -f` command.
Useful for breaking away from a specific network interface.
@@ -308,7 +308,7 @@ Logging settings.
Keys:
- level – Logging level. Acceptable values: ``trace``, ``debug``, ``information``, ``warning``, ``error``.
-- log – The log file. Contains all the entries according to `` level``.
+- log – The log file. Contains all the entries according to `level`.
- errorlog – Error log file.
- size – Size of the file. Applies to ``log`` and ``errorlog``. Once the file reaches ``size``, ClickHouse archives and renames it, and creates a new log file in its place.
- count – The number of archived log files that ClickHouse stores.
@@ -325,7 +325,8 @@ Keys:
```
-Also, logging to syslog is possible. Configuration example:
+Writing to the syslog is also supported. Config example:
+
```xml
1
@@ -339,13 +340,14 @@ Also, logging to syslog is possible. Configuration example:
```
Keys:
-- user_syslog - activation key, turning on syslog logging.
-- address - host[:port] of syslogd. If not specified, local one would be used.
-- hostname - optional, source host of logs
-- facility - [syslog facility](https://en.wikipedia.org/wiki/Syslog#Facility),
-in uppercase, prefixed with "LOG_": (``LOG_USER``, ``LOG_DAEMON``, ``LOG_LOCAL3`` etc.).
-Default values: when ``address`` is specified, then ``LOG_USER``, otherwise - ``LOG_DAEMON``
-- format - message format. Possible values are - ``bsd`` and ``syslog``
+
+- user_syslog — Required setting if you want to write to the syslog.
+- address — The host[:port] of syslogd. If omitted, the local daemon is used.
+- hostname — Optional. The name of the host that logs are sent from.
+- facility — [The syslog facility keyword](https://en.wikipedia.org/wiki/Syslog#Facility)
+in uppercase letters with the "LOG_" prefix: (``LOG_USER``, ``LOG_DAEMON``, ``LOG_LOCAL3``, and so on).
+Default value: ``LOG_USER`` if ``address`` is specified, ``LOG_DAEMON`` otherwise.
+- format – Message format. Possible values: ``bsd`` and ``syslog``.
@@ -367,7 +369,7 @@ For more information, see the section "[Creating replicated tables](../../operat
## mark_cache_size
-Approximate size (in bytes) of the cache of "marks" used by [MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree) engines.
+Approximate size (in bytes) of the cache of "marks" used by table engines of the [MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree) family.
The cache is shared for the server and memory is allocated as needed. The cache size must be at least 5368709120.
@@ -409,7 +411,7 @@ The maximum number of open files.
By default: `maximum`.
-We recommend using this option in Mac OS X, since the ` getrlimit()` function returns an incorrect value.
+We recommend using this option in Mac OS X, since the `getrlimit()` function returns an incorrect value.
**Example**
@@ -423,9 +425,9 @@ We recommend using this option in Mac OS X, since the ` getrlimit()` function re
Restriction on deleting tables.
-If the size of a [MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree) type table exceeds `max_table_size_to_drop` (in bytes), you can't delete it using a DROP query.
+If the size of a [MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree) table exceeds `max_table_size_to_drop` (in bytes), you can't delete it using a DROP query.
-If you still need to delete the table without restarting the ClickHouse server, create the ` /flags/force_drop_table` file and run the DROP query.
+If you still need to delete the table without restarting the ClickHouse server, create the `/flags/force_drop_table` file and run the DROP query.
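+
+For example (a sketch assuming the default ClickHouse data path `/var/lib/clickhouse/`):
+
+```bash
+sudo touch '/var/lib/clickhouse/flags/force_drop_table' && sudo chmod 666 '/var/lib/clickhouse/flags/force_drop_table'
+```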
Default value: 50 GB.
@@ -441,7 +443,7 @@ The value 0 means that you can delete all tables without any restrictions.
## merge_tree
-Fine tuning for tables in the [ MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree) family.
+Fine tuning for tables in the [MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree) family.
For more information, see the MergeTreeSettings.h header file.
@@ -459,25 +461,25 @@ For more information, see the MergeTreeSettings.h header file.
SSL client/server configuration.
-Support for SSL is provided by the `` libpoco`` library. The interface is described in the file [SSLManager.h](https://github.com/ClickHouse-Extras/poco/blob/master/NetSSL_OpenSSL/include/Poco/Net/SSLManager.h)
+Support for SSL is provided by the `libpoco` library. The interface is described in the file [SSLManager.h](https://github.com/ClickHouse-Extras/poco/blob/master/NetSSL_OpenSSL/include/Poco/Net/SSLManager.h)
Keys for server/client settings:
- privateKeyFile – The path to the file with the secret key of the PEM certificate. The file may contain a key and certificate at the same time.
-- certificateFile – The path to the client/server certificate file in PEM format. You can omit it if `` privateKeyFile`` contains the certificate.
+- certificateFile – The path to the client/server certificate file in PEM format. You can omit it if `privateKeyFile` contains the certificate.
- caConfig – The path to the file or directory that contains trusted root certificates.
- verificationMode – The method for checking the node's certificates. Details are in the description of the [Context](https://github.com/ClickHouse-Extras/poco/blob/master/NetSSL_OpenSSL/include/Poco/Net/Context.h) class. Possible values: ``none``, ``relaxed``, ``strict``, ``once``.
- verificationDepth – The maximum length of the verification chain. Verification will fail if the certificate chain length exceeds the set value.
-- loadDefaultCAFile – Indicates that built-in CA certificates for OpenSSL will be used. Acceptable values: `` true``, `` false``. |
-- cipherList – Supported OpenSSL encryptions. For example: `` ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH``.
-- cacheSessions – Enables or disables caching sessions. Must be used in combination with ``sessionIdContext``. Acceptable values: `` true``, `` false``.
+- loadDefaultCAFile – Indicates that built-in CA certificates for OpenSSL will be used. Acceptable values: `true`, `false`.
+- cipherList – Supported OpenSSL encryptions. For example: `ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH`.
+- cacheSessions – Enables or disables caching sessions. Must be used in combination with ``sessionIdContext``. Acceptable values: `true`, `false`.
- sessionIdContext – A unique set of random characters that the server appends to each generated identifier. The length of the string must not exceed ``SSL_MAX_SSL_SESSION_ID_LENGTH``. This parameter is always recommended, since it helps avoid problems both if the server caches the session and if the client requested caching. Default value: ``${application.name}``.
- sessionCacheSize – The maximum number of sessions that the server caches. Default value: 1024\*20. 0 – Unlimited sessions.
- sessionTimeout – Time for caching the session on the server.
-- extendedVerification – Automatically extended verification of certificates after the session ends. Acceptable values: `` true``, `` false``.
-- requireTLSv1 – Require a TLSv1 connection. Acceptable values: `` true``, `` false``.
-- requireTLSv1_1 – Require a TLSv1.1 connection. Acceptable values: `` true``, `` false``.
-- requireTLSv1 – Require a TLSv1.2 connection. Acceptable values: `` true``, `` false``.
+- extendedVerification – Automatically extended verification of certificates after the session ends. Acceptable values: `true`, `false`.
+- requireTLSv1 – Require a TLSv1 connection. Acceptable values: `true`, `false`.
+- requireTLSv1_1 – Require a TLSv1.1 connection. Acceptable values: `true`, `false`.
+- requireTLSv1_2 – Require a TLSv1.2 connection. Acceptable values: `true`, `false`.
- fips – Activates OpenSSL FIPS mode. Supported if the library's OpenSSL version supports FIPS.
- privateKeyPassphraseHandler – Class (PrivateKeyPassphraseHandler subclass) that requests the passphrase for accessing the private key. For example: ``<privateKeyPassphraseHandler>``, ``<name>KeyFileHandler</name>``, ``<options><password>test</password></options>``, ``</privateKeyPassphraseHandler>``.
- invalidCertificateHandler – Class (subclass of CertificateHandler) for verifying invalid certificates. For example: ``<invalidCertificateHandler> <name>ConsoleCertificateHandler</name> </invalidCertificateHandler>``.
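+
+A minimal server-side sketch assembled from the keys above (the file paths are hypothetical, and a real configuration will usually set more of the listed keys):
+
+```xml
+<openSSL>
+    <server>
+        <certificateFile>/etc/clickhouse-server/server.crt</certificateFile>
+        <privateKeyFile>/etc/clickhouse-server/server.key</privateKeyFile>
+        <verificationMode>none</verificationMode>
+        <loadDefaultCAFile>true</loadDefaultCAFile>
+        <cacheSessions>true</cacheSessions>
+    </server>
+</openSSL>
+```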
@@ -518,7 +520,7 @@ Keys for server/client settings:
## part_log
-Logging events that are associated with [MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree) data. For instance, adding or merging data. You can use the log to simulate merge algorithms and compare their characteristics. You can visualize the merge process.
+Logging events that are associated with [MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree) data. For instance, adding or merging data. You can use the log to simulate merge algorithms and compare their characteristics. You can visualize the merge process.
Queries are logged in the ClickHouse table, not in a separate file.
@@ -558,9 +560,8 @@ Use the following parameters to configure logging:
The path to the directory containing data.
-!!! warning "Attention"
- The trailing slash is mandatory.
-
+!!! warning "Attention"
+    The trailing slash is mandatory.
**Example**
@@ -646,8 +647,8 @@ Port for communicating with clients over the TCP protocol.
Path to temporary data for processing large queries.
-!!! warning "Attention"
- The trailing slash is mandatory.
+!!! warning "Attention"
+    The trailing slash is mandatory.
**Example**
@@ -659,7 +660,7 @@ Path to temporary data for processing large queries.
## uncompressed_cache_size
-Cache size (in bytes) for uncompressed data used by table engines from the [MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree) family.
+Cache size (in bytes) for uncompressed data used by table engines from the [MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree) family.
There is one shared cache for the server. Memory is allocated on demand. The cache is used if the option [use_uncompressed_cache](../settings/settings.md#settings-use_uncompressed_cache) is enabled.
@@ -673,7 +674,7 @@ The uncompressed cache is advantageous for very short queries in individual case
## user_files_path
-A catalog with user files. Used in a [file()](../../query_language/table_functions/file.md#table_functions-file) table function.
+The directory with user files. Used in the table function [file()](../../query_language/table_functions/file.md#table_functions-file).
**Example**
@@ -715,3 +716,4 @@ For more information, see the section "[Replication](../../operations/table_engi
```xml
```
+
diff --git a/docs/en/operations/settings/index.md b/docs/en/operations/settings/index.md
index 0a72ebac128..1c03340670f 100644
--- a/docs/en/operations/settings/index.md
+++ b/docs/en/operations/settings/index.md
@@ -7,17 +7,18 @@ Settings are configured in layers, so each subsequent layer redefines the previo
Ways to configure settings, in order of priority:
-- Settings in the server config file `users.xml`.
+- Settings in the `users.xml` server configuration file.
- Set it in user profile in `` element.
+    Set in the `<profiles>` element.
- Session settings.
- Send ` SET setting=value` from the ClickHouse console client in interactive mode.
+    Send `SET setting=value` from the ClickHouse console client in interactive mode.
Similarly, you can use ClickHouse sessions in the HTTP protocol. To do this, you need to specify the `session_id` HTTP parameter.
-- For a query.
- - When starting the ClickHouse console client in non-interactive mode, set the startup parameter `--setting=value`.
- - When using the HTTP API, pass CGI parameters (`URL?setting_1=value&setting_2=value...`).
+- Query settings.
+ - When starting the ClickHouse console client in non-interactive mode, set the startup parameter `--setting=value`.
+ - When using the HTTP API, pass CGI parameters (`URL?setting_1=value&setting_2=value...`).
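+
+For example, the last two methods could look like this (hedged sketches; `max_threads` is just an illustrative setting):
+
+```bash
+clickhouse-client --max_threads=2 --query="SELECT 1"
+curl 'http://localhost:8123/?max_threads=2' --data-binary 'SELECT 1'
+```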
Settings that can only be made in the server config file are not covered in this section.
+
diff --git a/docs/en/operations/settings/settings.md b/docs/en/operations/settings/settings.md
index 25ed20fd5a3..8fa31d95bba 100644
--- a/docs/en/operations/settings/settings.md
+++ b/docs/en/operations/settings/settings.md
@@ -4,18 +4,24 @@
## distributed_product_mode
-Changes the behavior of [distributed subqueries](../../query_language/select.md#queries-distributed-subrequests), i.e. in cases when the query contains the product of distributed tables.
+Changes the behavior of [distributed subqueries](../../query_language/select.md#queries-distributed-subrequests).
-ClickHouse applies the configuration if the subqueries on any level have a distributed table that exists on the local server and has more than one shard.
+ClickHouse applies this setting when the query contains the product of distributed tables, i.e. when the query for a distributed table contains a non-GLOBAL subquery for the distributed table.
Restrictions:
- Only applied for IN and JOIN subqueries.
-- Used only if a distributed table is used in the FROM clause.
-- Not used for a table-valued [ remote](../../query_language/table_functions/remote.md#table_functions-remote) function.
+- Only if the FROM section uses a distributed table containing more than one shard.
+- If the subquery concerns a distributed table containing more than one shard.
+- Not used for the table-valued [remote](../../query_language/table_functions/remote.md#table_functions-remote) function.
The possible values are:
+- `deny` — Default value. Prohibits using these types of subqueries (returns the "Double-distributed IN/JOIN subqueries is denied" exception).
+- `local` — Replaces the database and table in the subquery with local ones for the destination server (shard), leaving the normal `IN` / `JOIN`.
+- `global` — Replaces the `IN` / `JOIN` query with `GLOBAL IN` / `GLOBAL JOIN`.
+- `allow` — Allows the use of these types of subqueries.
+
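+A sketch of the query shape this setting governs (`distributed_hits` and `distributed_visits` are hypothetical distributed tables):
+
+```sql
+SET distributed_product_mode = 'local';
+
+-- With the default 'deny', this query raises "Double-distributed IN/JOIN subqueries is denied".
+SELECT uniq(UserID)
+FROM distributed_hits
+WHERE UserID IN (SELECT UserID FROM distributed_visits)
+```
+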
## fallback_to_stale_replicas_for_distributed_queries
@@ -24,7 +30,7 @@ Forces a query to an out-of-date replica if updated data is not available. See "
ClickHouse selects the most up-to-date of the table's outdated replicas.
-Used when performing ` SELECT` from a distributed table that points to replicated tables.
+Used when performing `SELECT` from a distributed table that points to replicated tables.
By default, 1 (enabled).
@@ -131,7 +137,7 @@ Sets the time in seconds. If a replica lags more than the set value, this replic
Default value: 0 (off).
-Used when performing ` SELECT` from a distributed table that points to replicated tables.
+Used when performing `SELECT` from a distributed table that points to replicated tables.
## max_threads
@@ -158,7 +164,7 @@ Don't confuse blocks for compression (a chunk of memory consisting of bytes) and
## min_compress_block_size
-For [MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree)" tables. In order to reduce latency when processing queries, a block is compressed when writing the next mark if its size is at least 'min_compress_block_size'. By default, 65,536.
+For [MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree) tables. In order to reduce latency when processing queries, a block is compressed when writing the next mark if its size is at least 'min_compress_block_size'. By default, 65,536.
The actual size of the block, if the uncompressed data is less than 'max_compress_block_size', is no less than this value and no less than the volume of data for one mark.
@@ -343,4 +349,12 @@ If the value is true, integers appear in quotes when using JSON\* Int64 and UInt
## format_csv_delimiter
-The character to be considered as a delimiter in CSV data. By default, `,`.
+The character interpreted as a delimiter in the CSV data. By default, the delimiter is `,`.
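+
+For example, to parse semicolon-separated data (a minimal sketch):
+
+```sql
+SET format_csv_delimiter = ';'
+```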
+
+
+
+## join_use_nulls {: #settings-join_use_nulls}
+
+Affects the behavior of [JOIN](../../query_language/select.md#query_language-join).
+
+With `join_use_nulls=1`, `JOIN` behaves like in standard SQL, i.e. if empty cells appear when merging, the type of the corresponding field is converted to [Nullable](../../data_types/nullable.md#data_type-nullable), and empty cells are filled with [NULL](../../query_language/syntax.md#null-literal).
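+
+A minimal sketch with hypothetical tables `t1` and `t2` joined on column `a`:
+
+```sql
+SET join_use_nulls = 1;
+
+-- For rows of t1 with no match in t2, the column b is NULL instead of the default value for its type.
+SELECT a, b FROM t1 ANY LEFT JOIN t2 USING (a)
+```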
diff --git a/docs/en/operations/settings/settings_profiles.md b/docs/en/operations/settings/settings_profiles.md
index b0f2e0c3e35..3e3175bc9fb 100644
--- a/docs/en/operations/settings/settings_profiles.md
+++ b/docs/en/operations/settings/settings_profiles.md
@@ -1,11 +1,13 @@
+
+
# Settings profiles
A settings profile is a collection of settings grouped under the same name. Each ClickHouse user has a profile.
-To apply all the settings in a profile, set `profile`.
+To apply all the settings in a profile, set the `profile` setting.
Example:
-Setting `web` profile.
+Setting the `web` profile:
```sql
SET profile = 'web'
@@ -57,6 +59,7 @@ Example:
```
-The example specifies two profiles: `default` and `web`. The `default` profile has a special purpose: it must always be present and is applied when starting the server. In other words, the `default` profile contains default settings. The `web` profile is a regular profile that can be set using the `SET` query or using a URL parameter in an HTTP query.
+The example specifies two profiles: `default` and `web`. The `default` profile has a special purpose: it must always be present and is applied when starting the server. In other words, the `default` profile contains default settings. The `web` profile is a regular profile that can be set using the `SET` query or using a URL parameter in an HTTP query.
+
+Settings profiles can inherit from each other. To use inheritance, indicate the `profile` setting before the other settings that are listed in the profile.
-Settings profiles can inherit from each other. To use inheritance, indicate the `profile` setting before the other settings that are listed in the profile.
diff --git a/docs/en/operations/system_tables.md b/docs/en/operations/system_tables.md
index 5659700a0b6..abd6e819373 100644
--- a/docs/en/operations/system_tables.md
+++ b/docs/en/operations/system_tables.md
@@ -5,7 +5,6 @@ You can't delete a system table (but you can perform DETACH).
System tables don't have files with data on the disk or files with metadata. The server creates all the system tables when it starts.
System tables are read-only.
They are located in the 'system' database.
-
## system.asynchronous_metrics
@@ -20,27 +19,28 @@ Contains information about clusters available in the config file and the servers
Columns:
```text
-cluster String – Cluster name.
-shard_num UInt32 – Number of a shard in the cluster, starting from 1.
-shard_weight UInt32 – Relative weight of a shard when writing data.
-replica_num UInt32 – Number of a replica in the shard, starting from 1.
-host_name String – Host name as specified in the config.
-host_address String – Host's IP address obtained from DNS.
-port UInt16 – The port used to access the server.
-user String – The username to use for connecting to the server.
+cluster String — The cluster name.
+shard_num UInt32 — The shard number in the cluster, starting from 1.
+shard_weight UInt32 — The relative weight of the shard when writing data.
+replica_num UInt32 — The replica number in the shard, starting from 1.
+host_name String — The host name, as specified in the config.
+host_address String — The host IP address obtained from DNS.
+port UInt16 — The port to use for connecting to the server.
+user String — The name of the user for connecting to the server.
```
+
## system.columns
Contains information about the columns in all tables.
You can use this table to get information similar to `DESCRIBE TABLE`, but for multiple tables at once.
```text
-database String - Name of the database the table is located in.
-table String - Table name.
-name String - Column name.
-type String - Column type.
-default_type String - Expression type (DEFAULT, MATERIALIZED, ALIAS) for the default value, or an empty string if it is not defined.
-default_expression String - Expression for the default value, or an empty string if it is not defined.
+database String — The name of the database the table is in.
+table String — Table name.
+name String — Column name.
+type String — Column type.
+default_type String — Expression type (DEFAULT, MATERIALIZED, ALIAS) for the default value, or an empty string if it is not defined.
+default_expression String — Expression for the default value, or an empty string if it is not defined.
```
## system.databases
@@ -55,19 +55,19 @@ Contains information about external dictionaries.
Columns:
-- `name String` – Dictionary name.
-- `type String` – Dictionary type: Flat, Hashed, Cache.
-- `origin String` – Path to the config file where the dictionary is described.
-- `attribute.names Array(String)` – Array of attribute names provided by the dictionary.
-- `attribute.types Array(String)` – Corresponding array of attribute types provided by the dictionary.
-- `has_hierarchy UInt8` – Whether the dictionary is hierarchical.
-- `bytes_allocated UInt64` – The amount of RAM used by the dictionary.
-- `hit_rate Float64` – For cache dictionaries, the percent of usage for which the value was in the cache.
-- `element_count UInt64` – The number of items stored in the dictionary.
-- `load_factor Float64` – The filled percentage of the dictionary (for a hashed dictionary, it is the filled percentage of the hash table).
-- `creation_time DateTime` – Time spent for the creation or last successful reload of the dictionary.
-- `last_exception String` – Text of an error that occurred when creating or reloading the dictionary, if the dictionary couldn't be created.
-- `source String` – Text describing the data source for the dictionary.
+- `name String` — Dictionary name.
+- `type String` — Dictionary type: Flat, Hashed, Cache.
+- `origin String` — Path to the configuration file that describes the dictionary.
+- `attribute.names Array(String)` — Array of attribute names provided by the dictionary.
+- `attribute.types Array(String)` — Corresponding array of attribute types that are provided by the dictionary.
+- `has_hierarchy UInt8` — Whether the dictionary is hierarchical.
+- `bytes_allocated UInt64` — The amount of RAM the dictionary uses.
+- `hit_rate Float64` — For cache dictionaries, the percentage of uses for which the value was in the cache.
+- `element_count UInt64` — The number of items stored in the dictionary.
+- `load_factor Float64` — How full the dictionary is (for a hashed dictionary, the percentage of the hash table that is filled).
+- `creation_time DateTime` — The time when the dictionary was created or last successfully reloaded.
+- `last_exception String` — Text of the error that occurs when creating or reloading the dictionary if the dictionary couldn't be created.
+- `source String` — Text describing the data source for the dictionary.
Note that the amount of memory used by the dictionary is not proportional to the number of items stored in it. So for flat and cached dictionaries, all the memory cells are pre-assigned, regardless of how full the dictionary actually is.
@@ -84,26 +84,27 @@ Contains information about normal and aggregate functions.
Columns:
-- `name` (`String`) – Function name.
-- `is_aggregate` (`UInt8`) – Whether it is an aggregate function.
+- `name` (`String`) — The name of the function.
+- `is_aggregate` (`UInt8`) — Whether the function is aggregate.
+
## system.merges
Contains information about merges currently in process for tables in the MergeTree family.
Columns:
-- `database String` — Name of the database the table is located in.
-- `table String` — Name of the table.
-- `elapsed Float64` — Time in seconds since the merge started.
-- `progress Float64` — Percent of progress made, from 0 to 1.
-- `num_parts UInt64` — Number of parts to merge.
-- `result_part_name String` — Name of the part that will be formed as the result of the merge.
-- `total_size_bytes_compressed UInt64` — Total size of compressed data in the parts being merged.
-- `total_size_marks UInt64` — Total number of marks in the parts being merged.
-- `bytes_read_uncompressed UInt64` — Amount of bytes read, decompressed.
+- `database String` — The name of the database the table is in.
+- `table String` — Table name.
+- `elapsed Float64` — The time elapsed (in seconds) since the merge started.
+- `progress Float64` — The percentage of completed work from 0 to 1.
+- `num_parts UInt64` — The number of parts to be merged.
+- `result_part_name String` — The name of the part that will be formed as the result of merging.
+- `total_size_bytes_compressed UInt64` — The total size of the compressed data in the parts being merged.
+- `total_size_marks UInt64` — The total number of marks in the parts being merged.
+- `bytes_read_uncompressed UInt64` — Number of bytes read, uncompressed.
- `rows_read UInt64` — Number of rows read.
-- `bytes_written_uncompressed UInt64` — Amount of bytes written, uncompressed.
-- `rows_written UInt64` — Number of rows written.
+- `bytes_written_uncompressed UInt64` — Number of bytes written, uncompressed.
+- `rows_written UInt64` — Number of rows written.
## system.metrics
@@ -127,31 +128,54 @@ This is similar to the DUAL table found in other DBMSs.
## system.parts
-Contains information about parts of a table in the [MergeTree](../operations/table_engines/mergetree.md#table_engines-mergetree) family.
+Contains information about parts of [MergeTree](table_engines/mergetree.md#table_engines-mergetree) tables.
Each row describes one part of the data.
Columns:
-- partition (String) – The partition name. It's in YYYYMM format in case of old-style partitioning and is arbitary serialized value in case of custom partitioning. To learn what a partition is, see the description of the [ALTER](../query_language/alter.md#query_language_queries_alter) query.
+- partition (String) – The partition name. To learn what a partition is, see the description of the [ALTER](../query_language/alter.md#query_language_queries_alter) query.
+
+Formats:
+- `YYYYMM` for automatic partitioning by month.
+- `any_string` when partitioning manually.
+
- name (String) – Name of the data part.
+
- active (UInt8) – Indicates whether the part is active. If a part is active, it is used in a table; otherwise, it will be deleted. Inactive data parts remain after merging.
+
- marks (UInt64) – The number of marks. To get the approximate number of rows in a data part, multiply ``marks`` by the index granularity (usually 8192).
+
- marks_size (UInt64) – The size of the file with marks.
+
- rows (UInt64) – The number of rows.
+
- bytes (UInt64) – The number of bytes when compressed.
+
- modification_time (DateTime) – The modification time of the directory with the data part. This usually corresponds to the time of data part creation.
+
- remove_time (DateTime) – The time when the data part became inactive.
+
- refcount (UInt32) – The number of places where the data part is used. A value greater than 2 indicates that the data part is used in queries or merges.
+
- min_date (Date) – The minimum value of the date key in the data part.
+
- max_date (Date) – The maximum value of the date key in the data part.
+
- min_block_number (UInt64) – The minimum number of data parts that make up the current part after merging.
+
- max_block_number (UInt64) – The maximum number of data parts that make up the current part after merging.
+
- level (UInt32) – Depth of the merge tree. If a merge was not performed, ``level=0``.
+
- primary_key_bytes_in_memory (UInt64) – The amount of memory (in bytes) used by primary key values.
+
- primary_key_bytes_in_memory_allocated (UInt64) – The amount of memory (in bytes) reserved for primary key values.
+
- database (String) – Name of the database.
+
- table (String) – Name of the table.
+
- engine (String) – Name of the table engine without parameters.
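+
+For example, this sketch lists the active parts of a hypothetical table named `hits`:
+
+```sql
+SELECT name, partition, rows, bytes
+FROM system.parts
+WHERE table = 'hits' AND active
+```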
## system.processes
@@ -162,21 +186,21 @@ Columns:
```text
user String – Name of the user who made the request. For distributed query processing, this is the user who helped the requestor server send the query to this server, not the user who made the distributed request on the requestor server.
-address String – The IP address that the query was made from. The same is true for distributed query processing.
+address String — The IP address the request was made from. The same is true for distributed query processing.
-elapsed Float64 – The time in seconds since request execution started.
+elapsed Float64 — The time in seconds since request execution started.
-rows_read UInt64 – The number of rows read from the table. For distributed processing, on the requestor server, this is the total for all remote servers.
+rows_read UInt64 — The number of rows read from the table. For distributed processing, on the requestor server, this is the total for all remote servers.
-bytes_read UInt64 – The number of uncompressed bytes read from the table. For distributed processing, on the requestor server, this is the total for all remote servers.
+bytes_read UInt64 — The number of uncompressed bytes read from the table. For distributed processing, on the requestor server, this is the total for all remote servers.
-UInt64 total_rows_approx – The approximate total number of rows that must be read. For distributed processing, on the requestor server, this is the total for all remote servers. It can be updated during request processing, when new sources to process become known.
+total_rows_approx UInt64 — The approximate total number of rows that should be read. For distributed processing, on the requestor server, this is the total for all remote servers. It can be updated during request processing, when new sources to process become known.
-memory_usage UInt64 – Memory consumption by the query. It might not include some types of dedicated memory.
+memory_usage UInt64 — Memory consumption by the query. It might not include some types of dedicated memory.
-query String – The query text. For INSERT, it doesn't include the data to insert.
+query String — The query text. For INSERT, it doesn't include the data to insert.
-query_id – Query ID, if defined.
+query_id String — Query ID, if defined.
```
## system.replicas
@@ -220,54 +244,54 @@ active_replicas: 2
Columns:
```text
-database: database name
-table: table name
-engine: table engine name
+database: Database name
+table: Table name
+engine: Table engine name
-is_leader: whether the replica is the leader
+is_leader: Whether the replica is the leader.
Only one replica at a time can be the leader. The leader is responsible for selecting background merges to perform.
Note that writes can be performed to any replica that is available and has a session in ZK, regardless of whether it is a leader.
is_readonly: Whether the replica is in read-only mode.
-This mode is turned on if the config doesn't have sections with ZK, if an unknown error occurred when reinitializing sessions in ZK, and during session reinitialization in ZK.
+This mode is turned on if the config doesn't have sections with ZooKeeper, if an unknown error occurred when reinitializing sessions in ZooKeeper, and during session reinitialization in ZooKeeper.
-is_session_expired: Whether the ZK session expired.
-Basically, the same thing as is_readonly.
+is_session_expired: Whether the session with ZooKeeper has expired.
+Basically the same as 'is_readonly'.
-future_parts: The number of data parts that will appear as the result of INSERTs or merges that haven't been done yet.
+future_parts: The number of data parts that will appear as the result of INSERTs or merges that haven't been done yet.
-parts_to_check: The number of data parts in the queue for verification.
+parts_to_check: The number of data parts in the queue for verification.
A part is put in the verification queue if there is suspicion that it might be damaged.
-zookeeper_path: The path to the table data in ZK.
-replica_name: Name of the replica in ZK. Different replicas of the same table have different names.
-replica_path: The path to the replica data in ZK. The same as concatenating zookeeper_path/replicas/replica_path.
+zookeeper_path: Path to table data in ZooKeeper.
+replica_name: Replica name in ZooKeeper. Different replicas of the same table have different names.
+replica_path: Path to replica data in ZooKeeper. The same as concatenating 'zookeeper_path/replicas/replica_path'.
-columns_version: Version number of the table structure.
+columns_version: Version number of the table structure.
Indicates how many times ALTER was performed. If replicas have different versions, it means some replicas haven't made all of the ALTERs yet.
queue_size: Size of the queue for operations waiting to be performed.
Operations include inserting blocks of data, merges, and certain other actions.
-Normally coincides with future_parts.
+It usually coincides with 'future_parts'.
-inserts_in_queue: Number of inserts of blocks of data that need to be made.
-Insertions are usually replicated fairly quickly. If the number is high, something is wrong.
+inserts_in_queue: Number of inserts of blocks of data that need to be made.
+Insertions are usually replicated fairly quickly. If this number is large, it means something is wrong.
-merges_in_queue: The number of merges waiting to be made.
+merges_in_queue: The number of merges waiting to be made.
Sometimes merges are lengthy, so this value may be greater than zero for a long time.
-The next 4 columns have a non-null value only if the ZK session is active.
+The next 4 columns have a non-zero value only if there is an active session with ZooKeeper.
-log_max_index: Maximum entry number in the log of general activity.
+log_max_index: Maximum entry number in the log of general activity.
log_pointer: Maximum entry number in the log of general activity that the replica copied to its execution queue, plus one.
If log_pointer is much smaller than log_max_index, something is wrong.
-total_replicas: Total number of known replicas of this table.
-active_replicas: Number of replicas of this table that have a ZK session (the number of active replicas).
+total_replicas: The total number of known replicas of this table.
+active_replicas: The number of replicas of this table that have a session in ZooKeeper (i.e., the number of functioning replicas).
```
-If you request all the columns, the table may work a bit slowly, since several reads from ZK are made for each row.
+If you request all the columns, the table may work a bit slowly, since several reads from ZooKeeper are made for each row.
If you don't request the last 4 columns (log_max_index, log_pointer, total_replicas, active_replicas), the table works quickly.
For example, you can check that everything is working correctly like this:
@@ -307,14 +331,14 @@ If this query doesn't return anything, it means that everything is fine.
## system.settings
Contains information about settings that are currently in use.
-I.e. used for executing the query you are using to read from the system.settings table).
+That is, the settings used for executing the query that you are using to read from the system.settings table.
Columns:
```text
-name String – Setting name.
-value String – Setting value.
-changed UInt8 - Whether the setting was explicitly defined in the config or explicitly changed.
+name String — Setting name.
+value String — Setting value.
+changed UInt8 — Whether the setting was explicitly defined in the config or explicitly changed.
```
Example:
@@ -343,7 +367,7 @@ This system table is used for implementing SHOW TABLES queries.
## system.zookeeper
-This table presents when ZooKeeper is configured. It allows reading data from the ZooKeeper cluster defined in the config.
+The table does not exist if ZooKeeper is not configured. Allows reading data from the ZooKeeper cluster defined in the config.
The query must have a 'path' equality condition in the WHERE clause. This is the path in ZooKeeper for the children that you want to get data for.
The query `SELECT * FROM system.zookeeper WHERE path = '/clickhouse'` outputs data for all children on the `/clickhouse` node.
@@ -352,21 +376,20 @@ If the path specified in 'path' doesn't exist, an exception will be thrown.
Columns:
-- `name String` — Name of the node.
-- `path String` — Path to the node.
-- `value String` — Value of the node.
+- `name String` — The name of the node.
+- `path String` — The path to the node.
+- `value String` — Node value.
- `dataLength Int32` — Size of the value.
-- `numChildren Int32` — Number of children.
+- `numChildren Int32` — Number of children.
- `czxid Int64` — ID of the transaction that created the node.
- `mzxid Int64` — ID of the transaction that last changed the node.
-- `pzxid Int64` — ID of the transaction that last added or removed children.
+- `pzxid Int64` — ID of the transaction that last deleted or added children.
- `ctime DateTime` — Time of node creation.
-- `mtime DateTime` — Time of the last node modification.
-- `version Int32` — Node version - the number of times the node was changed.
-- `cversion Int32` — Number of added or removed children.
-- `aversion Int32` — Number of changes to ACL.
-- `ephemeralOwner Int64` — For ephemeral nodes, the ID of the session that owns this node.
-
+- `mtime DateTime` — Time of the last modification of the node.
+- `version Int32` — Node version: the number of times the node was changed.
+- `cversion Int32` — Number of added or removed children.
+- `aversion Int32` — Number of changes to the ACL.
+- `ephemeralOwner Int64` — For ephemeral nodes, the ID of the session that owns this node.
Example:
diff --git a/docs/en/operations/table_engines/aggregatingmergetree.md b/docs/en/operations/table_engines/aggregatingmergetree.md
index ec14eb66130..0f99da076b6 100644
--- a/docs/en/operations/table_engines/aggregatingmergetree.md
+++ b/docs/en/operations/table_engines/aggregatingmergetree.md
@@ -21,15 +21,14 @@ This type of column stores the state of an aggregate function.
To get this type of value, use aggregate functions with the `State` suffix.
-Example:
-`uniqState(UserID), quantilesState(0.5, 0.9)(SendTiming)`
+Example: `uniqState(UserID), quantilesState(0.5, 0.9)(SendTiming)`
In contrast to the corresponding `uniq` and `quantiles` functions, these functions return the state, rather than the prepared value. In other words, they return an `AggregateFunction` type value.
An `AggregateFunction` type value can't be output in Pretty formats. In other formats, these types of values are output as implementation-specific binary data. The `AggregateFunction` type values are not intended for output or saving in a dump.
-The only useful thing you can do with `AggregateFunction` type values is combine the states and get a result, which essentially means to finish aggregation. Aggregate functions with the 'Merge' suffix are used for this purpose.
-Example: `uniqMerge(UserIDState), where UserIDState has the AggregateFunction` type.
+The only useful thing you can do with `AggregateFunction` type values is to combine the states and get a result, which essentially means to finish aggregation. Aggregate functions with the 'Merge' suffix are used for this purpose.
+Example: `uniqMerge(UserIDState)`, where `UserIDState` has the `AggregateFunction` type.
In other words, an aggregate function with the 'Merge' suffix takes a set of states, combines them, and returns the result.
As an example, these two queries return the same result:
diff --git a/docs/en/operations/table_engines/custom_partitioning_key.md b/docs/en/operations/table_engines/custom_partitioning_key.md
index 631484afbf8..bcfe8c8c410 100644
--- a/docs/en/operations/table_engines/custom_partitioning_key.md
+++ b/docs/en/operations/table_engines/custom_partitioning_key.md
@@ -37,7 +37,7 @@ Note: For old-style tables, the partition can be specified either as a number `2
In the `system.parts` table, the `partition` column specifies the value of the partition expression to use in ALTER queries (with the quotes removed). The `name` column should specify the name of the data part that has a new format.
-Was: `20140317_20140323_2_2_0` (minimum date - maximum date - minimum block number - maximum block number - level).
+Old: `20140317_20140323_2_2_0` (minimum date - maximum date - minimum block number - maximum block number - level).
Now: `201403_2_2_0` (partition ID - minimum block number - maximum block number - level).
diff --git a/docs/en/operations/table_engines/dictionary.md b/docs/en/operations/table_engines/dictionary.md
index 7029abf41ec..6bf606ffa12 100644
--- a/docs/en/operations/table_engines/dictionary.md
+++ b/docs/en/operations/table_engines/dictionary.md
@@ -83,6 +83,7 @@ CREATE TABLE products
)
ENGINE = Dictionary(products)
```
+
```
Ok.
@@ -106,3 +107,4 @@ LIMIT 1
1 rows in set. Elapsed: 0.006 sec.
```
+
diff --git a/docs/en/operations/table_engines/file.md b/docs/en/operations/table_engines/file.md
index 8e31e346031..ae081365daa 100644
--- a/docs/en/operations/table_engines/file.md
+++ b/docs/en/operations/table_engines/file.md
@@ -1,51 +1,51 @@
-# File(InputFormat)
+# File(Format)
-The data source is a file that stores data in one of the supported input formats (TabSeparated, Native, etc.).
+Manages data in a single file on disk in the specified format.
Usage examples:
-- Data export from ClickHouse to file.
-- Convert data from one format to another.
-- Updating data in ClickHouse via editing a file on a disk.
+- Exporting data from ClickHouse to a file.
+- Converting data from one format to another.
+- Updating data in ClickHouse by editing the file on disk.
-## Usage in ClickHouse Server
+## Using the engine in the ClickHouse server
```
File(Format)
```
-`Format` should be supported for either `INSERT` and `SELECT`. For the full list of supported formats see [Formats](../../interfaces/formats.md#formats).
+The `Format` must be one that ClickHouse can use both in `INSERT` queries and in `SELECT` queries. For the full list of supported formats, see [Formats](../../interfaces/formats.md#formats).
-ClickHouse does not allow to specify filesystem path for`File`. It will use folder defined by [path](../server_settings/settings.md#server_settings-path) setting in server configuration.
+The ClickHouse server does not allow you to specify the path to the file that `File` will work with. It uses the path to the storage that is specified by the [path](../server_settings/settings.md#server_settings-path) parameter in the server configuration.
-When creating table using `File(Format)` it creates empty subdirectory in that folder. When data is written to that table, it's put into `data.Format` file in that subdirectory.
+When creating a table using `File(Format)`, the ClickHouse server creates a directory with the name of the table in the storage, and puts the `data.Format` file in it once data is inserted into the table.
-You may manually create this subfolder and file in server filesystem and then [ATTACH](../../query_language/misc.md#queries-attach) it to table information with matching name, so you can query data from that file.
+You can manually create the table's directory in storage, put the file there, and then use [ATTACH](../../query_language/misc.md#queries-attach) to tell the ClickHouse server about the table corresponding to the directory name, so you can read data from the file.
-!!! warning
- Be careful with this funcionality, because ClickHouse does not keep track of external changes to such files. The result of simultaneous writes via ClickHouse and outside of ClickHouse is undefined.
+!!! warning
+    Be careful with this functionality, because the ClickHouse server does not track external data changes. If the file is written to simultaneously from the ClickHouse server and from an external source, the result is unpredictable.
**Example:**
-**1.** Set up the `file_engine_table` table:
+**1.** Create a `file_engine_table` table on the server:
```sql
CREATE TABLE file_engine_table (name String, value UInt32) ENGINE=File(TabSeparated)
```
-By default ClickHouse will create folder `/var/lib/clickhouse/data/default/file_engine_table`.
+In the default configuration, the ClickHouse server creates the directory `/var/lib/clickhouse/data/default/file_engine_table`.
-**2.** Manually create `/var/lib/clickhouse/data/default/file_engine_table/data.TabSeparated` containing:
+**2.** Manually create the file `/var/lib/clickhouse/data/default/file_engine_table/data.TabSeparated` with the contents:
```bash
-$ cat data.TabSeparated
+$ cat data.TabSeparated
one 1
two 2
```
-**3.** Query the data:
+**3.** Query the data:
```sql
SELECT * FROM file_engine_table
@@ -58,9 +58,9 @@ SELECT * FROM file_engine_table
└──────┴───────┘
```
-## Usage in Clickhouse-local
+## Using the engine in clickhouse-local
-In [clickhouse-local](../utils/clickhouse-local.md#utils-clickhouse-local) File engine accepts file path in addition to `Format`. Default input/output streams can be specified using numeric or human-readable names like `0` or `stdin`, `1` or `stdout`.
+In [clickhouse-local](../utils/clickhouse-local.md#utils-clickhouse-local) the engine takes the file path as a parameter, as well as the format. The standard input/output streams can be specified using numeric or human-readable names like `0` or `stdin`, `1` or `stdout`.
**Example:**
@@ -70,9 +70,8 @@ $ echo -e "1,2\n3,4" | clickhouse-local -q "CREATE TABLE table (a Int64, b Int64
## Details of Implementation
-- Reads can be parallel, but not writes
+- Multi-stream reading and single-stream writing are supported.
- Not supported:
- - `ALTER`
- - `SELECT ... SAMPLE`
- - Indices
- - Replication
+ - `ALTER` and `SELECT...SAMPLE` operations.
+ - Indexes.
+ - Replication.
diff --git a/docs/en/operations/table_engines/graphitemergetree.md b/docs/en/operations/table_engines/graphitemergetree.md
index f7d43c3bb23..d1aaeca6aed 100644
--- a/docs/en/operations/table_engines/graphitemergetree.md
+++ b/docs/en/operations/table_engines/graphitemergetree.md
@@ -8,11 +8,11 @@ Graphite stores full data in ClickHouse, and data can be retrieved in the follow
- Without thinning.
- Uses the [MergeTree](mergetree.md#table_engines-mergetree) engine.
+ Uses the [MergeTree](mergetree.md#table_engines-mergetree) engine.
- With thinning.
- Using the `GraphiteMergeTree` engine.
+ Using the `GraphiteMergeTree` engine.
The engine inherits properties from MergeTree. The settings for thinning data are defined by the [graphite_rollup](../server_settings/settings.md#server_settings-graphite_rollup) parameter in the server configuration.
diff --git a/docs/en/operations/table_engines/index.md b/docs/en/operations/table_engines/index.md
index 1a72e030b77..eba6fffe37d 100644
--- a/docs/en/operations/table_engines/index.md
+++ b/docs/en/operations/table_engines/index.md
@@ -1,14 +1,16 @@
-# Table Engines
+
+
+# Table engines
The table engine (type of table) determines:
-- How and where data is stored: where to write it to, and where to read it from.
+- How and where data is stored: where to write it to, and where to read it from.
- Which queries are supported, and how.
- Concurrent data access.
- Use of indexes, if present.
- Whether multithreaded request execution is possible.
-- Data replication.
+- Data replication parameters.
-When reading data, the engine is only required to extract the necessary set of columns. However, in some cases, the query may be partially processed inside the table engine.
+When reading, the engine is only required to output the requested columns, but in some cases the engine can partially process data when responding to the request.
-Note that for most serious tasks, you should use engines from the `MergeTree` family.
+For most serious tasks, you should use engines from the `MergeTree` family.
diff --git a/docs/en/operations/table_engines/kafka.md b/docs/en/operations/table_engines/kafka.md
index f04c234dcd5..6fd1a90a503 100644
--- a/docs/en/operations/table_engines/kafka.md
+++ b/docs/en/operations/table_engines/kafka.md
@@ -115,11 +115,11 @@ To stop receiving topic data or to change the conversion logic, detach the mater
ATTACH MATERIALIZED VIEW consumer;
```
-If you want to change the target table by using ` ALTER`materialized view, we recommend disabling the material view to avoid discrepancies between the target table and the data from the view.
+If you want to change the target table by using `ALTER`, we recommend disabling the materialized view to avoid discrepancies between the target table and the data from the view.
## Configuration
-Similar to GraphiteMergeTree, the Kafka engine supports extended configuration using the ClickHouse config file. There are two configuration keys that you can use: global (`kafka`) and topic-level (`kafka_topic_*`). The global configuration is applied first, and the topic-level configuration is second (if it exists).
+Similar to GraphiteMergeTree, the Kafka engine supports extended configuration using the ClickHouse config file. There are two configuration keys that you can use: global (`kafka`) and topic-level (`kafka_topic_*`). The global configuration is applied first, and then the topic-level configuration is applied (if it exists).
```xml
<kafka>
    <debug>cgrp</debug>
    <auto_offset_reset>smallest</auto_offset_reset>
</kafka>

<kafka_topic_logs>
    <retry_backoff_ms>250</retry_backoff_ms>
    <fetch_min_bytes>100000</fetch_min_bytes>
</kafka_topic_logs>
```

@@ -136,4 +136,3 @@ Similar to GraphiteMergeTree, the Kafka engine supports extended configuration u
For a list of possible configuration options, see the [librdkafka configuration reference](https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md). Use the underscore (`_`) instead of a dot in the ClickHouse configuration. For example, `check.crcs=true` will be `<check_crcs>true</check_crcs>`.
-
diff --git a/docs/en/operations/table_engines/materializedview.md b/docs/en/operations/table_engines/materializedview.md
index 5f46f5fe528..e2eb857aca7 100644
--- a/docs/en/operations/table_engines/materializedview.md
+++ b/docs/en/operations/table_engines/materializedview.md
@@ -1,4 +1,4 @@
# MaterializedView
-Used for implementing materialized views (for more information, see the [CREATE TABLE](../../query_language/create.md#query_language-queries-create_table)) query. For storing data, it uses a different engine that was specified when creating the view. When reading from a table, it just uses this engine.
+Used for implementing materialized views (for more information, see [CREATE TABLE](../../query_language/create.md#query_language-queries-create_table)). For storing data, it uses a different engine that was specified when creating the view. When reading from a table, it just uses this engine.
diff --git a/docs/en/operations/table_engines/merge.md b/docs/en/operations/table_engines/merge.md
index f50f20ccd9a..4694b0d78f7 100644
--- a/docs/en/operations/table_engines/merge.md
+++ b/docs/en/operations/table_engines/merge.md
@@ -1,8 +1,8 @@
# Merge
-The Merge engine (not to be confused with `MergeTree`) does not store data itself, but allows reading from any number of other tables simultaneously.
+The `Merge` engine (not to be confused with `MergeTree`) does not store data itself, but allows reading from any number of other tables simultaneously.
Reading is automatically parallelized. Writing to a table is not supported. When reading, the indexes of tables that are actually being read are used, if they exist.
-The Merge engine accepts parameters: the database name and a regular expression for tables.
+The `Merge` engine accepts parameters: the database name and a regular expression for tables.
Example:
@@ -10,30 +10,31 @@ Example:
Merge(hits, '^WatchLog')
```
-Data will be read from the tables in the 'hits' database that have names that match the regular expression '`^WatchLog`'.
+Data will be read from the tables in the `hits` database that have names that match the regular expression '`^WatchLog`'.
Instead of the database name, you can use a constant expression that returns a string. For example, `currentDatabase()`.
Regular expressions — [re2](https://github.com/google/re2) (supports a subset of PCRE), case-sensitive.
See the notes about escaping symbols in regular expressions in the "match" section.
-When selecting tables to read, the Merge table itself will not be selected, even if it matches the regex. This is to avoid loops.
-It is possible to create two Merge tables that will endlessly try to read each others' data, but this is not a good idea.
+When selecting tables to read, the `Merge` table itself will not be selected, even if it matches the regex. This is to avoid loops.
+It is possible to create two `Merge` tables that will endlessly try to read each other's data, but this is not a good idea.
-The typical way to use the Merge engine is for working with a large number of TinyLog tables as if with a single table.
+The typical way to use the `Merge` engine is for working with a large number of `TinyLog` tables as if with a single table.
## Virtual Columns
-Virtual columns are columns that are provided by the table engine, regardless of the table definition. In other words, these columns are not specified in CREATE TABLE, but they are accessible for SELECT.
+Virtual columns are columns that are provided by the table engine, regardless of the table definition. In other words, these columns are not specified in `CREATE TABLE`, but they are accessible for `SELECT`.
Virtual columns differ from normal columns in the following ways:
- They are not specified in table definitions.
-- Data can't be added to them with INSERT.
-- When using INSERT without specifying the list of columns, virtual columns are ignored.
+- Data can't be added to them with `INSERT`.
+- When using `INSERT` without specifying the list of columns, virtual columns are ignored.
- They are not selected when using the asterisk (`SELECT *`).
- Virtual columns are not shown in `SHOW CREATE TABLE` and `DESC TABLE` queries.
-A Merge type table contains a virtual _table column with the String type. (If the table already has a _table column, the virtual column is named _table1, and if it already has _table1, it is named _table2, and so on.) It contains the name of the table that data was read from.
+The `Merge` type table contains a virtual `_table` column of the `String` type. (If the table already has a `_table` column, the virtual column is called `_table1`; if you already have `_table1`, it's called `_table2`, and so on.) It contains the name of the table that data was read from.
+
+If the `WHERE/PREWHERE` clause contains conditions for the `_table` column that do not depend on other table columns (as one of the conjunction elements, or as an entire expression), these conditions are used as an index. The conditions are performed on a data set of table names to read data from, and the read operation will be performed from only those tables that the condition was triggered on.
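+
+For example (`watchlog_merged` is a hypothetical `Merge(hits, '^WatchLog')` table), only the `WatchLog_2018_01` table would actually be read here:
+
+```sql
+SELECT _table, count()
+FROM watchlog_merged
+WHERE _table = 'WatchLog_2018_01'
+GROUP BY _table
+```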
-If the WHERE or PREWHERE clause contains conditions for the '_table' column that do not depend on other table columns (as one of the conjunction elements, or as an entire expression), these conditions are used as an index. The conditions are performed on a data set of table names to read data from, and the read operation will be performed from only those tables that the condition was triggered on.
diff --git a/docs/en/operations/table_engines/mergetree.md b/docs/en/operations/table_engines/mergetree.md
index d89b2dfdfa3..8ff01c112da 100644
--- a/docs/en/operations/table_engines/mergetree.md
+++ b/docs/en/operations/table_engines/mergetree.md
@@ -56,7 +56,7 @@ In this example, the index can't be used.
SELECT count() FROM table WHERE CounterID = 34 OR URL LIKE '%upyachka%'
```
-To check whether ClickHouse can use the index when executing the query, use the settings [force_index_by_date](../settings/settings.md#settings-settings-force_index_by_date)and[force_primary_key](../settings/settings.md#settings-settings-force_primary_key).
+To check whether ClickHouse can use the index when running a query, use the settings [force_index_by_date](../settings/settings.md#settings-settings-force_index_by_date) and [force_primary_key](../settings/settings.md#settings-settings-force_primary_key).
The index by date only allows reading those parts that contain dates from the desired range. However, a data part may contain data for many dates (up to an entire month), while within a single part the data is ordered by the primary key, which might not contain the date as the first column. Because of this, using a query with only a date condition that does not specify the primary key prefix will cause more data to be read than for a single date.
@@ -69,4 +69,3 @@ The `OPTIMIZE` query is supported, which calls an extra merge step.
You can use a single large table and continually add data to it in small chunks – this is what MergeTree is intended for.
Data replication is possible for all types of tables in the MergeTree family (see the section "Data replication").
-
diff --git a/docs/en/operations/table_engines/mysql.md b/docs/en/operations/table_engines/mysql.md
index c9b90d2e253..06c0fd6b622 100644
--- a/docs/en/operations/table_engines/mysql.md
+++ b/docs/en/operations/table_engines/mysql.md
@@ -2,18 +2,27 @@
# MySQL
-The MySQL engine allows you to perform SELECT queries on data that is stored on a remote MySQL server.
+The MySQL engine allows you to perform `SELECT` queries on data that is stored on a remote MySQL server.
-The engine takes 5-7 parameters: the server address (host and port); the name of the database; the name of the table; the user's name; the user's password; whether to use replace query; the on duplcate clause. Example:
+Call format:
-```text
+```
MySQL('host:port', 'database', 'table', 'user', 'password'[, replace_query, 'on_duplicate_clause']);
```
-At this time, simple WHERE clauses such as ```=, !=, >, >=, <, <=``` are executed on the MySQL server.
+**Call parameters**
-The rest of the conditions and the LIMIT sampling constraint are executed in ClickHouse only after the query to MySQL finishes.
+- `host:port` — Address of the MySQL server.
+- `database` — Database name on the MySQL server.
+- `table` — Name of the table.
+- `user` — The MySQL user.
+- `password` — User password.
+- `replace_query` — Flag that converts `INSERT INTO` queries to `REPLACE INTO`. If `replace_query=1`, the query is substituted.
+- `'on_duplicate_clause'` — Adds the `ON DUPLICATE KEY UPDATE 'on_duplicate_clause'` expression to the `INSERT` query. For example: `impression = VALUES(impression) + impression`. To specify `'on_duplicate_clause'` you need to pass `0` to the `replace_query` parameter. If you simultaneously pass `replace_query = 1` and `'on_duplicate_clause'`, ClickHouse generates an exception.
+
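+A hedged example of attaching a remote MySQL table (the host, database, and credentials are hypothetical):
+
+```sql
+CREATE TABLE mysql_users (id UInt32, name String)
+ENGINE = MySQL('mysql-host:3306', 'shop', 'users', 'reader', 'secret')
+```
+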
+At this time, simple `WHERE` clauses such as `=, !=, >, >=, <, <=` are executed on the MySQL server.
+
+The rest of the conditions and the `LIMIT` sampling constraint are executed in ClickHouse only after the query to MySQL finishes.
+
+The `MySQL` engine does not support the [Nullable](../../data_types/nullable.md#data_type-nullable) data type, so when reading data from MySQL tables, `NULL` is converted to default values for the specified column type (usually 0 or an empty string).
-If `replace_query` is specified to 1, then `INSERT INTO` query to this table would be transformed to `REPLACE INTO`.
-If `on_duplicate_clause` is specified, eg `update impression = values(impression) + impression`, it would add `on_duplicate_clause` to the end of the MySQL insert sql.
-Notice that only one of 'replace_query' and 'on_duplicate_clause' can be specified, or none of them.
diff --git a/docs/en/operations/table_engines/replication.md b/docs/en/operations/table_engines/replication.md
index dc83d7e70e5..d391b8bcd0e 100644
--- a/docs/en/operations/table_engines/replication.md
+++ b/docs/en/operations/table_engines/replication.md
@@ -15,7 +15,7 @@ Replication works at the level of an individual table, not the entire server. A
Replication does not depend on sharding. Each shard has its own independent replication.
-Compressed data is replicated for `INSERT` and `ALTER` queries (see the description of the [ALTER](../../query_language/alter.md#query_language_queries_alter) query).
+Compressed data for `INSERT` and `ALTER` queries is replicated (for more information, see the documentation for [ALTER](../../query_language/alter.md#query_language_queries_alter)).
`CREATE`, `DROP`, `ATTACH`, `DETACH` and `RENAME` queries are executed on a single server and are not replicated:
@@ -154,7 +154,7 @@ There is no restriction on network bandwidth during recovery. Keep this in mind
## Converting from MergeTree to ReplicatedMergeTree
-We use the term `MergeTree` to refer to all table engines in the ` MergeTree family`, the same as for ` ReplicatedMergeTree`.
+We use the term `MergeTree` to refer to all table engines in the `MergeTree` family, the same as for `ReplicatedMergeTree`.
If you had a `MergeTree` table that was manually replicated, you can convert it to a replicatable table. You might need to do this if you have already collected a large amount of data in a `MergeTree` table and now you want to enable replication.
@@ -178,6 +178,3 @@ After this, you can launch the server, create a `MergeTree` table, move the data
## Recovery When Metadata in The ZooKeeper Cluster is Lost or Damaged
If the data in ZooKeeper was lost or damaged, you can save data by moving it to an unreplicated table as described above.
-
-If exactly the same parts exist on the other replicas, they are added to the working set on them. If not, the parts are downloaded from the replica that has them.
-
diff --git a/docs/en/operations/table_engines/summingmergetree.md b/docs/en/operations/table_engines/summingmergetree.md
index f19f156f9e5..1248ac85d14 100644
--- a/docs/en/operations/table_engines/summingmergetree.md
+++ b/docs/en/operations/table_engines/summingmergetree.md
@@ -16,7 +16,7 @@ The columns to total are set explicitly (the last parameter – Shows, Clicks, C
If the values were zero in all of these columns, the row is deleted.
-For the other columns that are not part of the primary key, the first value that occurs is selected when merging. But if a column is of AggregateFunction type, then it is merged according to that function, which effectively makes this engine behave like `AggregatingMergeTree`.
+For the other columns that are not part of the primary key, the first value that occurs is selected when merging. But for columns of the AggregateFunction type, aggregation is performed according to the corresponding function, so this engine actually behaves like `AggregatingMergeTree`.
Summation is not performed for a read operation. If it is necessary, write the appropriate GROUP BY.
diff --git a/docs/en/operations/table_engines/url.md b/docs/en/operations/table_engines/url.md
index ae62cadfc21..66f53867895 100644
--- a/docs/en/operations/table_engines/url.md
+++ b/docs/en/operations/table_engines/url.md
@@ -2,36 +2,34 @@
# URL(URL, Format)
-This data source operates with data on remote HTTP/HTTPS server. The engine is
-similar to [`File`](./file.md#).
+Manages data on a remote HTTP/HTTPS server. This engine is similar
+to the [`File`](./file.md#) engine.
-## Usage in ClickHouse Server
+## Using the engine in the ClickHouse server
-```
-URL(URL, Format)
-```
+The `Format` must be one that ClickHouse can use in
+`SELECT` queries and, if necessary, in `INSERT` queries. For the full list of supported formats, see
+[Formats](../../interfaces/formats.md#formats).
-`Format` should be supported for `SELECT` and/or `INSERT`. For the full list of
-supported formats see [Formats](../../interfaces/formats.md#formats).
+The `URL` must conform to the structure of a Uniform Resource Locator. The specified URL must point to a server
+that uses HTTP or HTTPS. The server must not require any
+additional headers for getting a response.
-`URL` must match the format of Uniform Resource Locator. The specified
-URL must address a server working with HTTP or HTTPS. The server shouldn't
-require any additional HTTP-headers.
-
-`INSERT` and `SELECT` queries are transformed into `POST` and `GET` requests
-respectively. For correct `POST`-requests handling the remote server should support
-[Chunked transfer encoding](https://ru.wikipedia.org/wiki/Chunked_transfer_encoding).
+`INSERT` and `SELECT` queries are transformed to `POST` and `GET` requests,
+respectively. For processing `POST` requests, the remote server must support
+[Chunked transfer encoding](https://en.wikipedia.org/wiki/Chunked_transfer_encoding).
**Example:**
-**1.** Create the `url_engine_table` table:
+**1.** Create a `url_engine_table` table on the server:
```sql
CREATE TABLE url_engine_table (word String, value UInt64)
ENGINE=URL('http://127.0.0.1:12345/', CSV)
```
-**2.** Implement simple http-server using python3:
+**2.** Create a basic HTTP server using the standard Python 3 tools and
+start it:
```python3
from http.server import BaseHTTPRequestHandler, HTTPServer
@@ -53,7 +51,7 @@ if __name__ == "__main__":
python3 server.py
```
-**3.** Query the data:
+**3.** Request data:
```sql
SELECT * FROM url_engine_table
@@ -66,12 +64,16 @@ SELECT * FROM url_engine_table
└───────┴───────┘
```
-## Details of Implementation
-
-- Reads and writes can be parallel
+## Usage
+
+- Multi-stream reading and writing are supported.
- Not supported:
- - `ALTER`
- - `SELECT ... SAMPLE`
- - Indices
- - Replication
+ - `ALTER` and `SELECT ... SAMPLE` operations.
+ - Indexes.
+ - Replication.
diff --git a/docs/en/operations/tips.md b/docs/en/operations/tips.md
index ec59e590573..ca069304a10 100644
--- a/docs/en/operations/tips.md
+++ b/docs/en/operations/tips.md
@@ -107,6 +107,10 @@ You are probably already using ZooKeeper for other purposes. You can use the sam
It's best to use a fresh version of ZooKeeper – 3.4.9 or later. The version in stable Linux distributions may be outdated.
+You should never use manually written scripts to transfer data between different ZooKeeper clusters, because the result will be incorrect for sequential nodes. Never use the "zkcopy" utility for the same reason: https://github.com/ksprojects/zkcopy/issues/15
+
+If you want to divide an existing ZooKeeper cluster into two, the correct way is to increase the number of its replicas and then reconfigure it as two independent clusters.
+
Do not run ZooKeeper on the same servers as ClickHouse, because ZooKeeper is very sensitive to latency and ClickHouse may utilize all available system resources.
With the default settings, ZooKeeper is a time bomb:
@@ -115,10 +119,6 @@ With the default settings, ZooKeeper is a time bomb:
This bomb must be defused.
-If you want to move data between different ZooKeeper clusters, never move it by hand-written script, because it will produce wrong data for sequential nodes. Never use "zkcopy" tool, by the same reason: https://github.com/ksprojects/zkcopy/issues/15
-
-If you want to split ZooKeeper cluster, proper way is to increase number of replicas and then reconfigure it as two independent clusters.
-
The ZooKeeper (3.5.1) configuration below is used in the Yandex.Metrica production environment as of May 20, 2017:
zoo.cfg:
diff --git a/docs/en/operations/utils/clickhouse-copier.md b/docs/en/operations/utils/clickhouse-copier.md
index eeb5e077d6a..3109044724f 100644
--- a/docs/en/operations/utils/clickhouse-copier.md
+++ b/docs/en/operations/utils/clickhouse-copier.md
@@ -9,12 +9,12 @@ You can run multiple `clickhouse-copier` instances on different servers to perfo
After starting, `clickhouse-copier`:
- Connects to ZooKeeper and receives:
- - Copying jobs.
- - The state of the copying jobs.
+ - Copying jobs.
+ - The state of the copying jobs.
- It performs the jobs.
- Each running process chooses the "closest" shard of the source cluster and copies the data into the destination cluster, resharding the data if necessary.
+ Each running process chooses the "closest" shard of the source cluster and copies the data into the destination cluster, resharding the data if necessary.
`clickhouse-copier` tracks the changes in ZooKeeper and applies them on the fly.
@@ -83,7 +83,7 @@ Parameters:
0
-
3
@@ -91,13 +91,14 @@ Parameters:
1
-
-
+
-
+
source_clustertesthits
@@ -108,11 +109,12 @@ Parameters:
hits2
@@ -133,11 +135,11 @@ Parameters:
Since partition key of source and destination cluster could be different,
these partition names specify destination partitions.
- Note: Although this section is optional (if it omitted, all partitions will be copied),
- it is strongly recommended to specify the partitions explicitly.
- If you already have some partitions ready on the destination cluster, they
- will be removed at the start of the copying, because they will be interpreted
- as unfinished data from the previous copying.
+ NOTE: Although this section is optional (if it is omitted, all partitions will be copied),
+ it is strongly recommended to specify the partitions explicitly.
+ If you already have some partitions ready on the destination cluster, they
+ will be removed at the start of the copying, because they will be interpreted
+ as unfinished data from the previous copying.
-->
'2018-02-26'
@@ -146,7 +148,7 @@ Parameters:
-
+
...
diff --git a/docs/en/operations/utils/clickhouse-local.md b/docs/en/operations/utils/clickhouse-local.md
index bfa612569f3..4251de12377 100644
--- a/docs/en/operations/utils/clickhouse-local.md
+++ b/docs/en/operations/utils/clickhouse-local.md
@@ -2,64 +2,58 @@
# clickhouse-local
-The `clickhouse-local` program enables you to perform fast processing on local files, without having to deploy and configure the ClickHouse server.
+Accepts data that can be represented as a table and performs the operations that are specified in the ClickHouse [query language](../../query_language/index.md#queries).
-Accepts data that represent tables and queries them using [ClickHouse SQL dialect](../../query_language/index.md#queries).
+`clickhouse-local` uses the same core as the ClickHouse server, so it supports all data formats and table engines that ClickHouse works with, and operations do not require a running server.
-`clickhouse-local` uses the same core as ClickHouse server, so it supports most of the features and the same set of formats and table engines.
+When `clickhouse-local` is configured by default, it does not have access to data managed by the ClickHouse server that is installed on the same host, but you can use the `--config-file` option to load the server configuration.
-By default `clickhouse-local` does not have access to data on the same host, but it supports loading server configuration using `--config-file` argument.
+!!! warning
+    We do not recommend loading the server configuration into `clickhouse-local`, since data can easily be damaged by accident.
-!!! warning
- It is not recommended to load production server configuration into `clickhouse-local` because data can be damaged in case of human error.
+## Invoking the program
+
+Basic format of the call:
-## Usage
-
-Basic usage:
-
-``` bash
+```bash
clickhouse-local --structure "table_structure" --input-format "format_of_incoming_data" -q "query"
```
-Arguments:
-
-- `-S`, `--structure` — table structure for input data.
-- `-if`, `--input-format` — input format, `TSV` by default.
-- `-f`, `--file` — path to data, `stdin` by default.
-- `-q` `--query` — queries to execute with `;` as delimeter.
-- `-N`, `--table` — table name where to put output data, `table` by default.
-- `-of`, `--format`, `--output-format` — output format, `TSV` by default.
-- `--stacktrace` — whether to dump debug output in case of exception.
-- `--verbose` — more details on query execution.
-- `-s` — disables `stderr` logging.
-- `--config-file` — path to configuration file in same format as for ClickHouse server, by default the configuration empty.
-- `--help` — arguments references for `clickhouse-local`.
-
-Also there are arguments for each ClickHouse configuration variable which are more commonly used instead of `--config-file`.
+Command keys:
+
+- `-S`, `--structure` — The structure of the table where the input data will be placed.
+- `-if`, `--input-format` — The input data format. By default, it is `TSV`.
+- `-f`, `--file` — The path to the data file. By default, it is `stdin`.
+- `-q`, `--query` — Queries to run. The query separator is `;`.
+- `-N`, `--table` — The name of the table where the input data will be placed. By default, it is `table`.
+- `-of`, `--format`, `--output-format` — The output data format. By default, it is `TSV`.
+- `--stacktrace` — Output debugging information for exceptions.
+- `--verbose` — Verbose output when running a query.
+- `-s` — Suppresses displaying the system log in `stderr`.
+- `--config-file` — The path to the configuration file. By default, `clickhouse-local` starts with an empty configuration. The configuration file has the same format as for the ClickHouse server and can use all the server configuration parameters. Typically, you don't need to load a whole configuration; if you want to set an individual parameter, you can use a command-line option with the parameter name.
+- `--help` — Output reference information about `clickhouse-local`.
## Examples
-``` bash
+```bash
echo -e "1,2\n3,4" | clickhouse-local -S "a Int64, b Int64" -if "CSV" -q "SELECT * FROM table"
Read 2 rows, 32.00 B in 0.000 sec., 5182 rows/sec., 80.97 KiB/sec.
1 2
3 4
```
-Previous example is the same as:
+The above command is equivalent to the following:
-``` bash
+```bash
$ echo -e "1,2\n3,4" | clickhouse-local -q "CREATE TABLE table (a Int64, b Int64) ENGINE = File(CSV, stdin); SELECT a, b FROM table; DROP TABLE table"
Read 2 rows, 32.00 B in 0.000 sec., 4987 rows/sec., 77.93 KiB/sec.
1 2
3 4
```
-Now let's output memory user for each Unix user:
+Now let's display the amount of RAM used by each Unix user:
-``` bash
+```bash
$ ps aux | tail -n +2 | awk '{ printf("%s\t%s\n", $1, $4) }' | clickhouse-local -S "user String, mem Float64" -q "SELECT user, round(sum(mem), 2) as memTotal FROM table GROUP BY user ORDER BY memTotal DESC FORMAT Pretty"
Read 186 rows, 4.15 KiB in 0.035 sec., 5302 rows/sec., 118.34 KiB/sec.
┏━━━━━━━━━━┳━━━━━━━━━━┓
diff --git a/docs/en/query_language/agg_functions/combinators.md b/docs/en/query_language/agg_functions/combinators.md
index ca5c4674172..3b7e372b324 100644
--- a/docs/en/query_language/agg_functions/combinators.md
+++ b/docs/en/query_language/agg_functions/combinators.md
@@ -24,7 +24,7 @@ Example 2: `uniqArray(arr)` – Count the number of unique elements in all 'arr'
## -State
-If you apply this combinator, the aggregate function doesn't return the resulting value (such as the number of unique values for the 'uniq' function), but an intermediate state of the aggregation (for ` uniq`, this is the hash table for calculating the number of unique values). This is an AggregateFunction(...) that can be used for further processing or stored in a table to finish aggregating later. See the sections "AggregatingMergeTree" and "Functions for working with intermediate aggregation states".
+If you apply this combinator, the aggregate function doesn't return the resulting value (such as the number of unique values for the 'uniq' function), but an intermediate state of the aggregation (for `uniq`, this is the hash table for calculating the number of unique values). This is an AggregateFunction(...) that can be used for further processing or stored in a table to finish aggregating later. See the sections "AggregatingMergeTree" and "Functions for working with intermediate aggregation states".
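+
+As a minimal sketch using the `system.numbers` system table, an intermediate `uniq` state can be produced with `uniqState` and finalized later with `uniqMerge` (described below); the result here is `100`:
+
+```sql
+SELECT uniqMerge(state) AS result
+FROM
+(
+    SELECT uniqState(number) AS state
+    FROM system.numbers
+    LIMIT 100
+)
+```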
## -Merge
diff --git a/docs/en/query_language/agg_functions/index.md b/docs/en/query_language/agg_functions/index.md
index 3864f7271c4..0f9cefc7b95 100644
--- a/docs/en/query_language/agg_functions/index.md
+++ b/docs/en/query_language/agg_functions/index.md
@@ -9,3 +9,55 @@ ClickHouse also supports:
- [Parametric aggregate functions](parametric_functions.md#aggregate_functions_parametric), which accept other parameters in addition to columns.
- [Combinators](combinators.md#aggregate_functions_combinators), which change the behavior of aggregate functions.
+## NULL processing
+
+During aggregation, all `NULL`s are skipped.
+
+**Examples:**
+
+Consider this table:
+
+```
+┌─x─┬────y─┐
+│ 1 │ 2 │
+│ 2 │ ᴺᵁᴸᴸ │
+│ 3 │ 2 │
+│ 3 │ 3 │
+│ 3 │ ᴺᵁᴸᴸ │
+└───┴──────┘
+```
+
+Let's say you need to total the values in the `y` column:
+
+```
+:) SELECT sum(y) FROM t_null_big
+
+SELECT sum(y)
+FROM t_null_big
+
+┌─sum(y)─┐
+│ 7 │
+└────────┘
+
+1 rows in set. Elapsed: 0.002 sec.
+```
+
+The `sum` function interprets `NULL` as `0`. In particular, this means that if the function receives input of a selection where all the values are `NULL`, then the result will be `0`, not `NULL`.
+
+Now you can use the `groupArray` function to create an array from the `y` column:
+
+```
+:) SELECT groupArray(y) FROM t_null_big
+
+SELECT groupArray(y)
+FROM t_null_big
+
+┌─groupArray(y)─┐
+│ [2,2,3] │
+└───────────────┘
+
+1 rows in set. Elapsed: 0.002 sec.
+```
+
+`groupArray` does not include `NULL` in the resulting array.
+
diff --git a/docs/en/query_language/agg_functions/parametric_functions.md b/docs/en/query_language/agg_functions/parametric_functions.md
index 5881bed4c90..18f519035b0 100644
--- a/docs/en/query_language/agg_functions/parametric_functions.md
+++ b/docs/en/query_language/agg_functions/parametric_functions.md
@@ -52,19 +52,39 @@ Chains are searched for without overlapping. In other words, the next chain can
## windowFunnel(window)(timestamp, cond1, cond2, cond3, ...)
-Window funnel matching for event chains, calculates the max event level in a sliding window.
+Searches for event chains in a sliding time window and calculates the maximum number of events from the chain that occurred in order.
-`window` is the timestamp window value, such as 3600.
+```
+windowFunnel(window)(timestamp, cond1, cond2, cond3, ...)
+```
-`timestamp` is the time of the event with the DateTime type or UInt32 type.
+**Parameters:**
-`cond1`, `cond2` ... is from one to 32 arguments of type UInt8 that indicate whether a certain condition was met for the event
+- `window` — Length of the sliding window in seconds.
+- `timestamp` — Name of the column containing the timestamp. Data type: [DateTime](../../data_types/datetime.md#data_type-datetime) or [UInt32](../../data_types/int_uint.md#data_type-int).
+- `cond1`, `cond2`... — Conditions or data describing the chain of events. Data type: `UInt8`. Values can be 0 or 1.
-Example:
+**Algorithm**
-Consider you are doing a website analytics, intend to find out the user counts clicked login button (event = 1001), then the user counts followed by searched the phones( event = 1003 and product = 'phone'), then the user counts followed by made an order (event = 1009). And all event chains must be in a 3600 seconds sliding window.
+- The function searches for data that triggers the first condition in the chain and sets the event counter to 1. This is the moment when the sliding window starts.
+- If events from the chain occur sequentially within the window, the counter is incremented. If the sequence of events is disrupted, the counter isn't incremented.
+- If the data has multiple event chains at varying points of completion, the function will only output the size of the longest chain.
-This could be easily calculate by `windowFunnel`
+**Returned value**
+
+- Integer. The maximum number of consecutive triggered conditions from the chain within the sliding time window. All the chains in the selection are analyzed.
+
+**Example**
+
+Determine if one hour is enough for the user to select a phone and purchase it in the online store.
+
+Set the following chain of events:
+
+1. The user logged in to their account on the store (`eventID=1001`).
+2. The user searched for a phone (`eventID = 1003, product = 'phone'`).
+3. The user placed an order (`eventID = 1009`).
+
+To find out how far the user `user_id` could get through the chain in an hour in January of 2017, make the query:
```
SELECT
@@ -74,7 +94,7 @@ FROM
(
SELECT
user_id,
- windowFunnel(3600)(timestamp, event_id = 1001, event_id = 1003 AND product = 'phone', event_id = 1009) AS level
+ windowFunnel(3600)(timestamp, eventID = 1001, eventID = 1003 AND product = 'phone', eventID = 1009) AS level
FROM trend_event
WHERE (event_date >= '2017-01-01') AND (event_date <= '2017-01-31')
GROUP BY user_id
@@ -91,26 +111,26 @@ Retention refers to the ability of a company or product to retain its customers
`cond1`, `cond2` ... is from one to 32 arguments of type UInt8 that indicate whether a certain condition was met for the event
-Example:
+Example:
Suppose you are doing website analytics and want to calculate customer retention.
This can be easily calculated with `retention`:
```
-SELECT
- sum(r[1]) AS r1,
- sum(r[2]) AS r2,
+SELECT
+ sum(r[1]) AS r1,
+ sum(r[2]) AS r2,
sum(r[3]) AS r3
-FROM
+FROM
(
- SELECT
+ SELECT
uid,
retention(date = '2018-08-10', date = '2018-08-11', date = '2018-08-12') AS r
- FROM events
+ FROM events
WHERE date IN ('2018-08-10', '2018-08-11', '2018-08-12')
GROUP BY uid
-)
+)
```
Here, `r1` is the number of unique visitors who met the `cond1` condition, `r2` is the number of unique visitors who met the `cond1` and `cond2` conditions, and `r3` is the number of unique visitors who met the `cond1` and `cond3` conditions.
@@ -135,4 +155,3 @@ Usage example:
Problem: Generate a report that shows only keywords that produced at least 5 unique users.
Solution: Write in the GROUP BY query SearchPhrase HAVING uniqUpTo(4)(UserID) >= 5
```
-
diff --git a/docs/en/query_language/agg_functions/reference.md b/docs/en/query_language/agg_functions/reference.md
index 8cdbd36c2d9..357f3d622a2 100644
--- a/docs/en/query_language/agg_functions/reference.md
+++ b/docs/en/query_language/agg_functions/reference.md
@@ -28,6 +28,7 @@ anyHeavy(column)
```
**Arguments**
+
- `column` – The column name.
**Example**
@@ -252,7 +253,7 @@ A hash table is used as the algorithm. Because of this, if the passed values
Approximates the quantile level using the [t-digest](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf) algorithm. The maximum error is 1%. Memory consumption by State is proportional to the logarithm of the number of passed values.
-The performance of the function is lower than for ` quantile`, ` quantileTiming`. In terms of the ratio of State size to precision, this function is much better than `quantile`.
+The performance of the function is lower than for `quantile`, `quantileTiming`. In terms of the ratio of State size to precision, this function is much better than `quantile`.
The result depends on the order of running the query, and is nondeterministic.
@@ -290,7 +291,7 @@ The result is equal to the square root of `varPop(x)`.
Returns an array of the most frequent values in the specified column. The resulting array is sorted in descending order of frequency of values (not by the values themselves).
-Implements the [Filtered Space-Saving](http://www.l2f.inesc-id.pt/~fmmb/wiki/uploads/Work/misnis.ref0a.pdf) algorithm for analyzing TopK, based on the reduce-and-combine algorithm from [Parallel Space Saving](https://arxiv.org/pdf/1401.0702.pdf).
+Implements the [Filtered Space-Saving](http://www.l2f.inesc-id.pt/~fmmb/wiki/uploads/Work/misnis.ref0a.pdf) algorithm for analyzing TopK, based on the reduce-and-combine algorithm from [Parallel Space Saving](https://arxiv.org/pdf/1401.0702.pdf).
```
topK(N)(column)
@@ -301,6 +302,7 @@ This function doesn't provide a guaranteed result. In certain situations, errors
We recommend using the `N < 10` value; performance is reduced with large `N` values. The maximum value is `N = 65536`.
**Arguments**
+
- 'N' – The number of values.
- 'x' – The column.
@@ -312,6 +314,7 @@ Take the [OnTime](../../getting_started/example_datasets/ontime.md#example_datas
SELECT topK(3)(AirlineID) AS res
FROM ontime
```
+
```
┌─res─────────────────┐
│ [19393,19790,19805] │
diff --git a/docs/en/query_language/alter.md b/docs/en/query_language/alter.md
index d5db2ae5abb..e1ce7a0296d 100644
--- a/docs/en/query_language/alter.md
+++ b/docs/en/query_language/alter.md
@@ -104,7 +104,7 @@ drwxrwxrwx 2 clickhouse clickhouse 4096 May 5 02:55 detached
-rw-rw-rw- 1 clickhouse clickhouse 2 May 5 02:58 increment.txt
```
-Here, `20140317_20140323_2_2_0` and ` 20140317_20140323_4_4_0` are the directories of data parts.
+Here, `20140317_20140323_2_2_0` and `20140317_20140323_4_4_0` are the directories of data parts.
Let's break down the name of the first part: `20140317_20140323_2_2_0`.
@@ -208,7 +208,7 @@ Although the query is called `ALTER TABLE`, it does not change the table structu
Data is placed in the `detached` directory. You can use the `ALTER TABLE ... ATTACH` query to attach the data.
-The ` FROM` clause specifies the path in ` ZooKeeper`. For example, `/clickhouse/tables/01-01/visits`.
+The `FROM` clause specifies the path in `ZooKeeper`. For example, `/clickhouse/tables/01-01/visits`.
Before downloading, the system checks that the partition exists and the table structure matches. The most appropriate replica is selected automatically from the healthy replicas.
The `ALTER ... FETCH PARTITION` query is not replicated. The partition will be downloaded to the 'detached' directory only on the local server. Note that if after this you use the `ALTER TABLE ... ATTACH` query to add data to the table, the data will be added on all replicas (on one of the replicas it will be added from the 'detached' directory, and on the rest it will be loaded from neighboring replicas).
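+
+For example (the table name, partition, and ZooKeeper path here are illustrative):
+
+```sql
+ALTER TABLE visits FETCH PARTITION 201403 FROM '/clickhouse/tables/01-01/visits';
+ALTER TABLE visits ATTACH PARTITION 201403;
+```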
@@ -224,45 +224,45 @@ Possible values: `0` – do not wait; `1` – only wait for own execution (defau
### Mutations
-Mutations are an ALTER query variant that allows changing or deleting rows in a table. In contrast to standard `UPDATE` and `DELETE` queries that are intended for point data changes, mutations are intended for heavy operations that change a lot of rows in a table.
+A mutation is a type of ALTER query that lets you change or delete data in a table. In contrast to standard `DELETE` and `UPDATE` queries, which are intended for point data changes, mutations are used for broad changes that affect many rows in a table.
-The functionality is in beta stage and is available starting with the 1.1.54388 version. Currently *MergeTree table engines are supported (both replicated and unreplicated).
+The functionality is in beta testing and is available starting from version 1.1.54388. Support is implemented for \*MergeTree tables (both replicated and unreplicated).
-Existing tables are ready for mutations as-is (no conversion necessary), but after the first mutation is applied to a table, its metadata format becomes incompatible with previous server versions and falling back to a previous version becomes impossible.
+You don't need to convert existing tables to work with mutations. However, after the first mutation is applied, the table data format becomes incompatible with previous versions and it is not possible to roll back to the previous version.
-At the moment the `ALTER DELETE` command is available:
+The `ALTER DELETE` command is currently available:
```sql
ALTER TABLE [db.]table DELETE WHERE expr
```
-The expression `expr` must be of UInt8 type. The query deletes rows for which this expression evaluates to a non-zero value.
+The `expr` expression must be of the `UInt8` type. The query deletes rows in the table for which this expression takes a non-zero value.
-One query can contain several commands separated by commas.
+A single query can specify multiple comma-separated commands.
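+
+For example, the following sketch (the table and column names are illustrative) combines two comma-separated `DELETE` commands in one query:
+
+```sql
+ALTER TABLE visits DELETE WHERE UserID = 123, DELETE WHERE EventDate < '2018-01-01'
+```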
-For *MergeTree tables mutations execute by rewriting whole data parts. There is no atomicity - parts are substituted for mutated parts as soon as they are ready and a `SELECT` query that started executing during a mutation will see data from parts that have already been mutated along with data from parts that have not been mutated yet.
+For \*MergeTree tables, mutations are applied by rewriting whole data parts. However, there is no atomicity: parts are replaced with mutated parts as soon as they are ready, and a `SELECT` query that starts executing during a mutation will see data from parts that have already been mutated along with data from parts that have not been mutated yet.
-Mutations are totally ordered by their creation order and are applied to each part in that order. Mutations are also partially ordered with INSERTs - data that was inserted into the table before the mutation was submitted will be mutated and data that was inserted after that will not be mutated. Note that mutations do not block INSERTs in any way.
+Mutations are linearly ordered and are applied to each part in the order they were added. Mutations are also ordered with inserts: data inserted into the table before the mutation was submitted will be mutated, and data inserted after that will not be. Mutations do not block inserts in any way.
-A mutation query returns immediately after the mutation entry is added (in case of replicated tables to ZooKeeper, for nonreplicated tables - to the filesystem). The mutation itself executes asynchronously using the system profile settings. To track the progress of mutations you can use the `system.mutations` table. A mutation that was successfully submitted will continue to execute even if ClickHouse servers are restarted. There is no way to roll back the mutation once it is submitted.
+The query returns immediately after the mutation entry is added (for replicated tables, in ZooKeeper; for non-replicated tables, in the file system). The mutation itself executes asynchronously using the system profile settings. You can monitor the progress of mutations in the `system.mutations` table. A mutation that was added will continue to execute even if ClickHouse is restarted. There is no way to roll back a mutation after it has been added.
Entries for finished mutations are not deleted right away (the number of preserved entries is determined by the `finished_mutations_to_keep` storage engine parameter). Older mutation entries are deleted.
#### system.mutations Table
-The table contains information about mutations of MergeTree tables and their progress. Each mutation command is represented by a single row. The table has the following columns:
+This table contains information about the progress of mutations on MergeTree tables. Each mutation command corresponds to a single row. The table has the following columns:
-**database**, **table** - The name of the database and table to which the mutation was applied.
+**database**, **table** — The name of the database and the table that the mutation was applied to.
-**mutation_id** - The ID of the mutation. For replicated tables these IDs correspond to znode names in the `/mutations/` directory in ZooKeeper. For unreplicated tables the IDs correspond to file names in the data directory of the table.
+**mutation_id** — The ID of the mutation. For replicated tables, these IDs correspond to the names of entries in the `/mutations/` directory in ZooKeeper. For unreplicated tables, they are the file names in the directory with the table data.
-**command** - The mutation command string (the part of the query after `ALTER TABLE [db.]table`).
+**command** — The mutation command (the part of the query after `ALTER TABLE [db.]table`).
-**create_time** - When this mutation command was submitted for execution.
+**create_time** — The time when the mutation command was submitted for execution.
-**block_numbers.partition_id**, **block_numbers.number** - A Nested column. For mutations of replicated tables contains one record for each partition: the partition ID and the block number that was acquired by the mutation (in each partition only parts that contain blocks with numbers less than the block number acquired by the mutation in that partition will be mutated). Because in non-replicated tables blocks numbers in all partitions form a single sequence, for mutatations of non-replicated tables the column will contain one record with a single block number acquired by the mutation.
+**block_numbers.partition_id**, **block_numbers.number** — A Nested column. For mutations of replicated tables, it contains one record for each partition: the partition ID and the block number acquired by the mutation (in each partition, only parts that contain blocks with numbers less than the number acquired by the mutation in that partition will be mutated). For non-replicated tables, block numbering runs through all partitions, so the column contains a single record with the single block number acquired by the mutation.
-**parts_to_do** - The number of data parts that need to be mutated for the mutation to finish.
+**parts_to_do** — The number of parts of the table that still need to be changed.
-**is_done** - Is the mutation done? Note that even if `parts_to_do = 0` it is possible that a mutation of a replicated table is not done yet because of a long-running INSERT that will create a new data part that will need to be mutated.
+**is_done** — Whether the mutation is complete. Note that even if `parts_to_do = 0`, a mutation of a replicated table may not be complete yet because of a long-running insert that adds data that will need to be mutated.
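+
+For example, mutations that have not finished yet can be listed with a query like this:
+
+```sql
+SELECT database, table, mutation_id, command, parts_to_do, is_done
+FROM system.mutations
+WHERE is_done = 0
+```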
diff --git a/docs/en/query_language/create.md b/docs/en/query_language/create.md
index 5489dad2b6d..9369c3a5536 100644
--- a/docs/en/query_language/create.md
+++ b/docs/en/query_language/create.md
@@ -103,7 +103,7 @@ CREATE TABLE IF NOT EXISTS all_hits ON CLUSTER cluster (p Date, i Int32) ENGINE
In order to run these queries correctly, each host must have the same cluster definition (to simplify syncing configs, you can use substitutions from ZooKeeper). They must also connect to the ZooKeeper servers.
The local version of the query will eventually be implemented on each host in the cluster, even if some hosts are currently not available. The order for executing queries within a single host is guaranteed.
-` ALTER` queries are not yet supported for replicated tables.
+`ALTER` queries are not yet supported for replicated tables.
## CREATE VIEW
@@ -152,4 +152,3 @@ The execution of `ALTER` queries on materialized views has not been fully develo
Views look the same as normal tables. For example, they are listed in the result of the `SHOW TABLES` query.
There isn't a separate query for deleting views. To delete a view, use `DROP TABLE`.
-
diff --git a/docs/en/query_language/dicts/external_dicts.md b/docs/en/query_language/dicts/external_dicts.md
index 7f1063a04e6..3d7217249f0 100644
--- a/docs/en/query_language/dicts/external_dicts.md
+++ b/docs/en/query_language/dicts/external_dicts.md
@@ -2,11 +2,11 @@
# External Dictionaries
-You can add your own dictionaries from various data sources. The data source for a dictionary can be a local text or executable file, an HTTP(s) resource, or another DBMS. For more information, see "[Sources for external dictionaries](external_dicts_dict_sources.md#dicts-external_dicts_dict_sources)".
+You can add your own dictionaries from various data sources. The data source for a dictionary can be a local text or executable file, an HTTP(s) resource, or another DBMS. For more information, see "[Sources of external dictionaries](external_dicts_dict_sources.md#dicts-external_dicts_dict_sources)".
ClickHouse:
-> - Fully or partially stores dictionaries in RAM.
+- Fully or partially stores dictionaries in RAM.
- Periodically updates dictionaries and dynamically loads missing values. In other words, dictionaries can be loaded dynamically.
The configuration of external dictionaries is located in one or more files. The path to the configuration is specified in the [dictionaries_config](../../operations/server_settings/settings.md#server_settings-dictionaries_config) parameter.
@@ -37,7 +37,14 @@ The dictionary config file has the following format:
You can [configure](external_dicts_dict.md#dicts-external_dicts_dict) any number of dictionaries in the same file. The file format is preserved even if there is only one dictionary (i.e. `
-
+
+
+
DSN=myconnection
postgresql_table
@@ -203,7 +203,7 @@ Installing the driver: :
Configuring the driver: :
```
- $ cat /etc/freetds/freetds.conf
+ $ cat /etc/freetds/freetds.conf
...
[MSSQL]
@@ -212,7 +212,7 @@ Configuring the driver: :
tds version = 7.0
client charset = UTF-8
- $ cat /etc/odbcinst.ini
+ $ cat /etc/odbcinst.ini
...
[FreeTDS]
@@ -222,7 +222,7 @@ Configuring the driver: :
FileUsage = 1
UsageCount = 5
- $ cat ~/.odbc.ini
+ $ cat ~/.odbc.ini
...
[MSSQL]
@@ -310,9 +310,9 @@ Setting fields:
- `password` – Password of the MySQL user. You can specify it for all replicas, or for each one individually (inside ``).
- `replica` – Section of replica configurations. There can be multiple sections.
- - `replica/host` – The MySQL host.
+ - `replica/host` – The MySQL host.
- \* `replica/priority` – The replica priority. When attempting to connect, ClickHouse traverses the replicas in order of priority. The lower the number, the higher the priority.
+ \* `replica/priority` – The replica priority. When attempting to connect, ClickHouse traverses the replicas in order of priority. The lower the number, the higher the priority.
- `db` – Name of the database.
@@ -398,4 +398,3 @@ Setting fields:
- `password` – Password of the MongoDB user.
- `db` – Name of the database.
- `collection` – Name of the collection.
-
diff --git a/docs/en/query_language/dicts/external_dicts_dict_structure.md b/docs/en/query_language/dicts/external_dicts_dict_structure.md
index 06ab8e812ce..176bf0607ee 100644
--- a/docs/en/query_language/dicts/external_dicts_dict_structure.md
+++ b/docs/en/query_language/dicts/external_dicts_dict_structure.md
@@ -39,8 +39,8 @@ ClickHouse supports the following types of keys:
A structure can contain either `` or `` .
-!!! warning
- The key doesn't need to be defined separately in attributes.
+!!! note
+    The key doesn't need to be defined separately in attributes.
### Numeric Key
@@ -62,8 +62,8 @@ Configuration fields:
The key can be a `tuple` from any types of fields. The [layout](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout) in this case must be `complex_key_hashed` or `complex_key_cache`.
-!!! tip
- A composite key can consist of a single element. This makes it possible to use a string as the key, for instance.
+!!! tip
+    A composite key can consist of a single element. This makes it possible to use a string as the key, for instance.
The key structure is set in the element ``. Key fields are specified in the same format as the dictionary [attributes](external_dicts_dict_structure.md#dicts-external_dicts_dict_structure-attributes). Example:
@@ -112,6 +112,7 @@ Configuration fields:
- `type` – The column type. Sets the method for interpreting data in the source. For example, for MySQL, the field might be `TEXT`, `VARCHAR`, or `BLOB` in the source table, but it can be uploaded as `String`.
- `null_value` – The default value for a non-existing element. In the example, it is an empty string.
- `expression` – The attribute can be an expression. The tag is not required.
-- `hierarchical` – Hierarchical support. Mirrored to the parent identifier. By default, ` false`.
-- `injective` – Whether the `id -> attribute` image is injective. If ` true`, then you can optimize the ` GROUP BY` clause. By default, `false`.
+- `hierarchical` – Hierarchical support. Mirrored to the parent identifier. By default, `false`.
+- `injective` – Whether the `id -> attribute` image is injective. If `true`, then you can optimize the `GROUP BY` clause. By default, `false`.
- `is_object_id` – Whether the query is executed for a MongoDB document by `ObjectID`.
+
diff --git a/docs/en/query_language/dicts/index.md b/docs/en/query_language/dicts/index.md
index de89ef197f7..862dd686f2c 100644
--- a/docs/en/query_language/dicts/index.md
+++ b/docs/en/query_language/dicts/index.md
@@ -1,7 +1,13 @@
# Dictionaries
-`A dictionary` is a mapping (key `->` attributes) that can be used in a query as functions.
-You can think of this as a more convenient and efficient type of JOIN with dimension tables.
+A dictionary is a mapping (`key -> attributes`) that is convenient for various types of reference lists.
-There are built-in (internal) and add-on (external) dictionaries.
+ClickHouse supports special functions for working with dictionaries that can be used in queries. It is easier and more efficient to use dictionaries with functions than a `JOIN` with reference tables.
+
+[NULL](../syntax.md#null-literal) values can't be stored in a dictionary.
+
+ClickHouse supports:
+
+- [Built-in dictionaries](internal_dicts.md#internal_dicts) with a specific [set of functions](../functions/ym_dict_functions.md#ym_dict_functions).
+- [Plug-in (external) dictionaries](external_dicts.md#dicts-external_dicts) with a [set of functions](../functions/ext_dict_functions.md#ext_dict_functions).
diff --git a/docs/en/query_language/dicts/internal_dicts.md b/docs/en/query_language/dicts/internal_dicts.md
index ff368490d85..47833b1e57a 100644
--- a/docs/en/query_language/dicts/internal_dicts.md
+++ b/docs/en/query_language/dicts/internal_dicts.md
@@ -1,4 +1,6 @@
-# Internal Dictionaries
+
+
+# Internal dictionaries
ClickHouse contains a built-in feature for working with a geobase.
@@ -15,35 +17,33 @@ The internal dictionaries are disabled in the default package.
To enable them, uncomment the parameters `path_to_regions_hierarchy_file` and `path_to_regions_names_files` in the server configuration file.
The geobase is loaded from text files.
-If you work at Yandex, you can follow these instructions to create them:
-
+If you work at Yandex, you can create them by following [the instructions](https://github.yandex-team.ru/raw/Metrika/ClickHouse_private/master/doc/create_embedded_geobase_dictionaries.txt).
-Put the regions_hierarchy\*.txt files in the path_to_regions_hierarchy_file directory. This configuration parameter must contain the path to the regions_hierarchy.txt file (the default regional hierarchy), and the other files (regions_hierarchy_ua.txt) must be located in the same directory.
+Place the `regions_hierarchy*.txt` files into the `path_to_regions_hierarchy_file` directory. This configuration parameter must contain the path to the `regions_hierarchy.txt` file (the default regional hierarchy), and the other files (`regions_hierarchy_ua.txt`) must be located in the same directory.
-Put the `regions_names_*.txt` files in the path_to_regions_names_files directory.
+Put the `regions_names_*.txt` files in the `path_to_regions_names_files` directory.
You can also create these files yourself. The file format is as follows:
`regions_hierarchy*.txt`: TabSeparated (no header), columns:
-- Region ID (UInt32)
-- Parent region ID (UInt32)
-- Region type (UInt8): 1 - continent, 3 - country, 4 - federal district, 5 - region, 6 - city; other types don't have values.
-- Population (UInt32) - Optional column.
+- region ID (`UInt32`)
+- parent region ID (`UInt32`)
+- region type (`UInt8`): 1 - continent, 3 - country, 4 - federal district, 5 - region, 6 - city; other types don't have values
+- population (`UInt32`) — optional column
`regions_names_*.txt`: TabSeparated (no header), columns:
-- Region ID (UInt32)
-- Region name (String) - Can't contain tabs or line feeds, even escaped ones.
+- region ID (`UInt32`)
+- region name (`String`) — Can't contain tabs or line feeds, even escaped ones.
A flat array is used for storing in RAM. For this reason, IDs shouldn't be more than a million.
Dictionaries can be updated without restarting the server. However, the set of available dictionaries is not updated.
For updates, the file modification times are checked. If a file has changed, the dictionary is updated.
-The interval to check for changes is configured in the 'builtin_dictionaries_reload_interval' parameter.
+The interval to check for changes is configured in the `builtin_dictionaries_reload_interval` parameter.
Dictionary updates (other than loading at first use) do not block queries. During updates, queries use the old versions of dictionaries. If an error occurs during an update, the error is written to the server log, and queries continue using the old version of dictionaries.
We recommend periodically updating the dictionaries with the geobase. During an update, generate new files and write them to a separate location. When everything is ready, rename them to the files used by the server.
There are also functions for working with OS identifiers and Yandex.Metrica search engines, but they shouldn't be used.
-
diff --git a/docs/en/query_language/functions/array_functions.md b/docs/en/query_language/functions/array_functions.md
index 3c9b02041ea..e250479e5b8 100644
--- a/docs/en/query_language/functions/array_functions.md
+++ b/docs/en/query_language/functions/array_functions.md
@@ -62,6 +62,7 @@ arrayConcat(arrays)
```sql
SELECT arrayConcat([1, 2], [3, 4], [5, 6]) AS res
```
+
```
┌─res───────────┐
│ [1,2,3,4,5,6] │
@@ -81,13 +82,49 @@ If the index falls outside of the bounds of an array, it returns some default va
Checks whether the 'arr' array has the 'elem' element.
Returns 0 if the the element is not in the array, or 1 if it is.
+`NULL` is processed as a value.
+
+```
+SELECT has([1, 2, NULL], NULL)
+
+┌─has([1, 2, NULL], NULL)─┐
+│ 1 │
+└─────────────────────────┘
+```
+
## indexOf(arr, x)
-Returns the index of the 'x' element (starting from 1) if it is in the array, or 0 if it is not.
+Returns the index of the first 'x' element (starting from 1) if it is in the array, or 0 if it is not.
+
+Example:
+
+```
+:) select indexOf([1,3,NULL,NULL],NULL)
+
+SELECT indexOf([1, 3, NULL, NULL], NULL)
+
+┌─indexOf([1, 3, NULL, NULL], NULL)─┐
+│ 3 │
+└───────────────────────────────────┘
+```
+
+Elements set to `NULL` are handled as normal values.
## countEqual(arr, x)
-Returns the number of elements in the array equal to x. Equivalent to arrayCount (elem-> elem = x, arr).
+Returns the number of elements in the array equal to x. Equivalent to arrayCount(elem -> elem = x, arr).
+
+`NULL` elements are handled as separate values.
+
+Example:
+
+```
+SELECT countEqual([1, 2, NULL, NULL], NULL)
+
+┌─countEqual([1, 2, NULL, NULL], NULL)─┐
+│ 2 │
+└──────────────────────────────────────┘
+```
## arrayEnumerate(arr)
@@ -202,6 +239,7 @@ arrayPopBack(array)
```sql
SELECT arrayPopBack([1, 2, 3]) AS res
```
+
```
┌─res───┐
│ [1,2] │
@@ -225,6 +263,7 @@ arrayPopFront(array)
```sql
SELECT arrayPopFront([1, 2, 3]) AS res
```
+
```
┌─res───┐
│ [2,3] │
@@ -242,7 +281,7 @@ arrayPushBack(array, single_value)
**Arguments**
- `array` – Array.
-- `single_value` – A single value. Only numbers can be added to an array with numbers, and only strings can be added to an array of strings. When adding numbers, ClickHouse automatically sets the `single_value` type for the data type of the array. For more information about ClickHouse data types, read the section "[Data types](../../data_types/index.md#data_types)".
+- `single_value` – A single value. Only numbers can be added to an array with numbers, and only strings can be added to an array of strings. When adding numbers, ClickHouse automatically sets the `single_value` type for the data type of the array. For more information about the types of data in ClickHouse, see "[Data types](../../data_types/index.md#data_types)". Can be `NULL`. The function adds a `NULL` element to an array, and the type of array elements converts to `Nullable`.
**Example**
@@ -267,19 +306,58 @@ arrayPushFront(array, single_value)
**Arguments**
- `array` – Array.
-- `single_value` – A single value. Only numbers can be added to an array with numbers, and only strings can be added to an array of strings. When adding numbers, ClickHouse automatically sets the `single_value` type for the data type of the array. For more information about ClickHouse data types, read the section "[Data types](../../data_types/index.md#data_types)".
+- `single_value` – A single value. Only numbers can be added to an array with numbers, and only strings can be added to an array of strings. When adding numbers, ClickHouse automatically sets the `single_value` type for the data type of the array. For more information about the types of data in ClickHouse, see "[Data types](../../data_types/index.md#data_types)". Can be `NULL`. The function adds a `NULL` element to an array, and the type of array elements converts to `Nullable`.
**Example**
```sql
SELECT arrayPushFront(['b'], 'a') AS res
```
+
```
┌─res───────┐
│ ['a','b'] │
└───────────┘
```
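+
+As noted above, pushing `NULL` converts the type of the array elements to `Nullable`. A minimal sketch (the result is `[NULL,1]` with the type `Array(Nullable(UInt8))`):
+
+```sql
+SELECT arrayPushFront([1], NULL) AS res, toTypeName(res)
+```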
+## arrayResize
+
+Changes the length of the array.
+
+```
+arrayResize(array, size[, extender])
+```
+
+**Parameters:**
+
+- `array` — Array.
+- `size` — Required length of the array.
+    - If `size` is less than the original size of the array, the array is truncated from the right.
+    - If `size` is larger than the initial size of the array, the array is extended to the right with `extender` values or default values for the data type of the array items.
+- `extender` — Value for extending an array. Can be `NULL`.
+
+**Returned value:**
+
+An array of length `size`.
+
+**Examples of calls**
+
+```
+SELECT arrayResize([1], 3)
+
+┌─arrayResize([1], 3)─┐
+│ [1,0,0] │
+└─────────────────────┘
+```
+
+```
+SELECT arrayResize([1], 3, NULL)
+
+┌─arrayResize([1], 3, NULL)─┐
+│ [1,NULL,NULL] │
+└───────────────────────────┘
+```
+
## arraySlice
Returns a slice of the array.
@@ -297,14 +375,17 @@ arraySlice(array, offset[, length])
**Example**
```sql
-SELECT arraySlice([1, 2, 3, 4, 5], 2, 3) AS res
+SELECT arraySlice([1, 2, NULL, 4, 5], 2, 3) AS res
```
+
```
-┌─res─────┐
-│ [2,3,4] │
-└─────────┘
+┌─res────────┐
+│ [2,NULL,4] │
+└────────────┘
```
+Array elements set to `NULL` are handled as normal values.
+
## arrayUniq(arr, ...)
If one argument is passed, it counts the number of different elements in the array.
@@ -315,4 +396,3 @@ If you want to get a list of unique items in an array, you can use arrayReduce('
## arrayJoin(arr)
A special function. See the section ["ArrayJoin function"](array_join.md#functions_arrayjoin).
-
diff --git a/docs/en/query_language/functions/conditional_functions.md b/docs/en/query_language/functions/conditional_functions.md
index c658b35bcbc..abd8c96e498 100644
--- a/docs/en/query_language/functions/conditional_functions.md
+++ b/docs/en/query_language/functions/conditional_functions.md
@@ -2,5 +2,47 @@
## if(cond, then, else), cond ? operator then : else
-Returns 'then' if cond !or 'else' if cond = 0.'cond' must be UInt 8, and 'then' and 'else' must be a type that has the smallest common type.
+Returns `then` if `cond != 0`, or `else` if `cond = 0`.
+`cond` must be of type `UInt8`, and `then` and `else` must have the lowest common type.
+`then` and `else` can be `NULL`.
+
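+A quick illustration:
+
+```
+SELECT if(1, 2, 3), if(0, 2, 3)
+
+┌─if(1, 2, 3)─┬─if(0, 2, 3)─┐
+│           2 │           3 │
+└─────────────┴─────────────┘
+```
+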
+## multiIf
+
+Allows you to write the [CASE](../operators.md#operator_case) operator more compactly in the query.
+
+```
+multiIf(cond_1, then_1, cond_2, then_2...else)
+```
+
+**Parameters:**
+
+- `cond_N` — The condition for the function to return `then_N`.
+- `then_N` — The result of the function when executed.
+- `else` — The result of the function if none of the conditions is met.
+
+The function accepts `2N+1` parameters.
+
+**Returned values**
+
+The function returns one of the values `then_N` or `else`, depending on the conditions `cond_N`.
+
+**Example**
+
+Take the table
+
+```
+┌─x─┬────y─┐
+│ 1 │ ᴺᵁᴸᴸ │
+│ 2 │ 3 │
+└───┴──────┘
+```
+
+Run the query `SELECT multiIf(isNull(y), x, y < 3, y, NULL) FROM t_null`. Result:
+
+```
+┌─multiIf(isNull(y), x, less(y, 3), y, NULL)─┐
+│ 1 │
+│ ᴺᵁᴸᴸ │
+└────────────────────────────────────────────┘
+```
diff --git a/docs/en/query_language/functions/ext_dict_functions.md b/docs/en/query_language/functions/ext_dict_functions.md
index 5d5e4461396..68f5eaab7aa 100644
--- a/docs/en/query_language/functions/ext_dict_functions.md
+++ b/docs/en/query_language/functions/ext_dict_functions.md
@@ -15,6 +15,7 @@ For information on connecting and configuring external dictionaries, see "[Exter
## dictGetUUID
## dictGetString
+
`dictGetT('dict_name', 'attr_name', id)`
- Get the value of the attr_name attribute from the dict_name dictionary using the 'id' key. `dict_name` and `attr_name` are constant strings. `id` must be UInt64.
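+
+For example, for a hypothetical external dictionary `countries` with a String attribute `name`, the call looks like this:
+
+```sql
+SELECT dictGetString('countries', 'name', toUInt64(15)) AS country_name
+```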
@@ -28,7 +29,7 @@ The same as the `dictGetT` functions, but the default value is taken from the fu
## dictIsIn
-`dictIsIn('dict_name', child_id, ancestor_id)`
+`dictIsIn('dict_name', child_id, ancestor_id)`
- For the 'dict_name' hierarchical dictionary, finds out whether the 'child_id' key is located inside 'ancestor_id' (or matches 'ancestor_id'). Returns UInt8.
diff --git a/docs/en/query_language/functions/functions_for_nulls.md b/docs/en/query_language/functions/functions_for_nulls.md
new file mode 100644
index 00000000000..d52d0c840ec
--- /dev/null
+++ b/docs/en/query_language/functions/functions_for_nulls.md
@@ -0,0 +1,295 @@
+# Functions for working with Nullable arguments
+
+## isNull
+
+Checks whether the argument is [NULL](../syntax.md#null-literal).
+
+```
+isNull(x)
+```
+
+**Parameters:**
+
+- `x` — A value with a non-compound data type.
+
+**Returned value**
+
+- `1` if `x` is `NULL`.
+- `0` if `x` is not `NULL`.
+
+**Example**
+
+Input table
+
+```
+┌─x─┬────y─┐
+│ 1 │ ᴺᵁᴸᴸ │
+│ 2 │ 3 │
+└───┴──────┘
+```
+
+Query
+
+```
+:) SELECT x FROM t_null WHERE isNull(y)
+
+SELECT x
+FROM t_null
+WHERE isNull(y)
+
+┌─x─┐
+│ 1 │
+└───┘
+
+1 rows in set. Elapsed: 0.010 sec.
+```
+
+## isNotNull
+
+Checks whether the argument is not [NULL](../syntax.md#null-literal).
+
+```
+isNotNull(x)
+```
+
+**Parameters:**
+
+- `x` — A value with a non-compound data type.
+
+**Returned value**
+
+- `0` if `x` is `NULL`.
+- `1` if `x` is not `NULL`.
+
+**Example**
+
+Input table
+
+```
+┌─x─┬────y─┐
+│ 1 │ ᴺᵁᴸᴸ │
+│ 2 │ 3 │
+└───┴──────┘
+```
+
+Query
+
+```
+:) SELECT x FROM t_null WHERE isNotNull(y)
+
+SELECT x
+FROM t_null
+WHERE isNotNull(y)
+
+┌─x─┐
+│ 2 │
+└───┘
+
+1 rows in set. Elapsed: 0.010 sec.
+```
+
+## coalesce
+
+Checks from left to right whether `NULL` arguments were passed and returns the first non-`NULL` argument.
+
+```
+coalesce(x,...)
+```
+
+**Parameters:**
+
+- Any number of parameters of a non-compound type. All parameters must be compatible by data type.
+
+**Returned values**
+
+- The first non-`NULL` argument.
+- `NULL`, if all arguments are `NULL`.
+
+**Example**
+
+Consider a list of contacts that may specify multiple ways to contact a customer.
+
+```
+┌─name─────┬─mail─┬─phone─────┬──icq─┐
+│ client 1 │ ᴺᵁᴸᴸ │ 123-45-67 │ 123 │
+│ client 2 │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │
+└──────────┴──────┴───────────┴──────┘
+```
+
+The `mail` and `phone` fields are of type String, but the `icq` field is `UInt32`, so it needs to be converted to `String`.
+
+Get the first available contact method for the customer from the contact list:
+
+```
+:) SELECT coalesce(mail, phone, CAST(icq,'Nullable(String)')) FROM aBook
+
+SELECT coalesce(mail, phone, CAST(icq, 'Nullable(String)'))
+FROM aBook
+
+┌─name─────┬─coalesce(mail, phone, CAST(icq, 'Nullable(String)'))─┐
+│ client 1 │ 123-45-67 │
+│ client 2 │ ᴺᵁᴸᴸ │
+└──────────┴──────────────────────────────────────────────────────┘
+
+2 rows in set. Elapsed: 0.006 sec.
+```
+
+## ifNull
+
+Returns an alternative value if the main argument is `NULL`.
+
+```
+ifNull(x,alt)
+```
+
+**Parameters:**
+
+- `x` — The value to check for `NULL`.
+- `alt` — The value that the function returns if `x` is `NULL`.
+
+**Returned values**
+
+- The value `x`, if `x` is not `NULL`.
+- The value `alt`, if `x` is `NULL`.
+
+**Example**
+
+```
+SELECT ifNull('a', 'b')
+
+┌─ifNull('a', 'b')─┐
+│ a │
+└──────────────────┘
+```
+
+```
+SELECT ifNull(NULL, 'b')
+
+┌─ifNull(NULL, 'b')─┐
+│ b │
+└───────────────────┘
+```
+
+## nullIf
+
+Returns `NULL` if the arguments are equal.
+
+```
+nullIf(x, y)
+```
+
+**Parameters:**
+
+`x`, `y` — Values for comparison. They must be compatible types, or ClickHouse will generate an exception.
+
+**Returned values**
+
+- `NULL`, if the arguments are equal.
+- The `x` value, if the arguments are not equal.
+
+**Example**
+
+```
+SELECT nullIf(1, 1)
+
+┌─nullIf(1, 1)─┐
+│ ᴺᵁᴸᴸ │
+└──────────────┘
+```
+
+```
+SELECT nullIf(1, 2)
+
+┌─nullIf(1, 2)─┐
+│ 1 │
+└──────────────┘
+```
+
+## assumeNotNull
+
+Results in a value of the corresponding non-`Nullable` type for a value of the [Nullable](../../data_types/nullable.md#data_type-nullable) type.
+
+```
+assumeNotNull(x)
+```
+
+**Parameters:**
+
+- `x` — The original value.
+
+**Returned values**
+
+- The original value from the non-`Nullable` type, if it is not `NULL`.
+- The default value for the non-`Nullable` type if the original value was `NULL`.
+
+**Example**
+
+Consider the `t_null` table.
+
+```
+SHOW CREATE TABLE t_null
+
+┌─statement─────────────────────────────────────────────────────────────────┐
+│ CREATE TABLE default.t_null ( x Int8, y Nullable(Int8)) ENGINE = TinyLog │
+└───────────────────────────────────────────────────────────────────────────┘
+```
+
+```
+┌─x─┬────y─┐
+│ 1 │ ᴺᵁᴸᴸ │
+│ 2 │ 3 │
+└───┴──────┘
+```
+
+Apply the `assumeNotNull` function to the `y` column.
+
+```
+SELECT assumeNotNull(y) FROM t_null
+
+┌─assumeNotNull(y)─┐
+│ 0 │
+│ 3 │
+└──────────────────┘
+```
+
+```
+SELECT toTypeName(assumeNotNull(y)) FROM t_null
+
+┌─toTypeName(assumeNotNull(y))─┐
+│ Int8 │
+│ Int8 │
+└──────────────────────────────┘
+```
+
+## toNullable
+
+Converts the argument type to `Nullable`.
+
+```
+toNullable(x)
+```
+
+**Parameters:**
+
+- `x` — The value of any non-compound type.
+
+**Returned value**
+
+- The input value converted to the `Nullable` type.
+
+**Example**
+
+```
+SELECT toTypeName(10)
+
+┌─toTypeName(10)─┐
+│ UInt8 │
+└────────────────┘
+
+SELECT toTypeName(toNullable(10))
+
+┌─toTypeName(toNullable(10))─┐
+│ Nullable(UInt8) │
+└────────────────────────────┘
+```
+
diff --git a/docs/en/query_language/functions/geo.md b/docs/en/query_language/functions/geo.md
new file mode 100644
index 00000000000..7d7cf6e440b
--- /dev/null
+++ b/docs/en/query_language/functions/geo.md
@@ -0,0 +1,70 @@
+# Functions for working with geographical coordinates
+
+## greatCircleDistance
+
+Calculate the distance between two points on the Earth's surface using [the great-circle formula](https://en.wikipedia.org/wiki/Great-circle_distance).
+
+```
+greatCircleDistance(lon1Deg, lat1Deg, lon2Deg, lat2Deg)
+```
+
+**Input parameters**
+
+- `lon1Deg` — Longitude of the first point in degrees. Range: `[-180°, 180°]`.
+- `lat1Deg` — Latitude of the first point in degrees. Range: `[-90°, 90°]`.
+- `lon2Deg` — Longitude of the second point in degrees. Range: `[-180°, 180°]`.
+- `lat2Deg` — Latitude of the second point in degrees. Range: `[-90°, 90°]`.
+
+Positive values correspond to North latitude and East longitude, and negative values correspond to South latitude and West longitude.
+
+**Returned value**
+
+The distance between two points on the Earth's surface, in meters.
+
+Generates an exception when the input parameter values fall outside of the range.
+
+**Example**
+
+```sql
+SELECT greatCircleDistance(55.755831, 37.617673, -55.755831, -37.617673)
+```
+
+```text
+┌─greatCircleDistance(55.755831, 37.617673, -55.755831, -37.617673)─┐
+│ 14132374.194975413 │
+└───────────────────────────────────────────────────────────────────┘
+```
+
+## pointInEllipses
+
+Checks whether the point belongs to at least one of the ellipses.
+
+```
+pointInEllipses(x, y, x₀, y₀, a₀, b₀,...,xₙ, yₙ, aₙ, bₙ)
+```
+
+**Input parameters**
+
+- `x` — Latitude of the point.
+- `y` — Longitude of the point.
+- `xᵢ, yᵢ` — Coordinates of the center of the `i`-th ellipse.
+- `aᵢ, bᵢ` — Axes of the `i`-th ellipse in meters.
+
+The number of input parameters must be `2+4⋅n`, where `n` is the number of ellipses.
+
+**Returned values**
+
+`1` if the point is inside at least one of the ellipses; `0` if it is not.
+
+**Examples:**
+
+```sql
+SELECT pointInEllipses(55.755831, 37.617673, 55.755831, 37.617673, 1.0, 2.0)
+```
+
+```text
+┌─pointInEllipses(55.755831, 37.617673, 55.755831, 37.617673, 1., 2.)─┐
+│ 1 │
+└─────────────────────────────────────────────────────────────────────┘
+```
+
diff --git a/docs/en/query_language/functions/higher_order_functions.md b/docs/en/query_language/functions/higher_order_functions.md
index a80d5708e11..e7e84296a0d 100644
--- a/docs/en/query_language/functions/higher_order_functions.md
+++ b/docs/en/query_language/functions/higher_order_functions.md
@@ -1,3 +1,5 @@
+
+
# Higher-order functions
## `->` operator, lambda(params, expr) function
@@ -89,9 +91,9 @@ SELECT arrayCumSum([1, 1, 1, 1]) AS res
### arraySort(\[func,\] arr1, ...)
-Returns an array as result of sorting the elements of `arr1` in ascending order. If the `func` function is specified, sorting order is determined by the result of the function `func` applied to the elements of array (arrays)
+Returns the `arr1` array sorted in ascending order. If `func` is set, the sort order is determined by the result of the `func` function on the elements of the array or arrays.
-The [Schwartzian transform](https://en.wikipedia.org/wiki/Schwartzian_transform) is used to impove sorting efficiency.
+To improve sorting efficiency, the [Schwartzian transform](https://en.wikipedia.org/wiki/Schwartzian_transform) is used.
Example:
@@ -107,9 +109,5 @@ SELECT arraySort((x, y) -> y, ['hello', 'world'], [2, 1]);
### arrayReverseSort(\[func,\] arr1, ...)
-Returns an array as result of sorting the elements of `arr1` in descending order. If the `func` function is specified, sorting order is determined by the result of the function `func` applied to the elements of array (arrays)
+Returns the `arr1` array sorted in descending order. If `func` is set, the sort order is determined by the result of the `func` function on the elements of the array or arrays.
-
-
-
-
diff --git a/docs/en/query_language/functions/index.md b/docs/en/query_language/functions/index.md
index 15e1061d093..3700a277975 100644
--- a/docs/en/query_language/functions/index.md
+++ b/docs/en/query_language/functions/index.md
@@ -27,6 +27,13 @@ A constant expression is also considered a constant (for example, the right half
Functions can be implemented in different ways for constant and non-constant arguments (different code is executed). But the results for a constant and for a true column containing only the same value should match each other.
+## NULL processing
+
+Functions have the following behaviors:
+
+- If at least one of the arguments of the function is `NULL`, the function result is also `NULL`.
+- Special behavior that is specified individually in the description of each function. In the ClickHouse source code, these functions have `UseDefaultImplementationForNulls=false`.
+
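+A quick sketch of the default behavior (output shown as expected):
+
+```
+SELECT 1 + NULL AS res
+
+┌──res─┐
+│ ᴺᵁᴸᴸ │
+└──────┘
+```
+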
## Constancy
Functions can't change the values of their arguments – any changes are returned as the result. Thus, the result of calculating separate functions does not depend on the order in which the functions are written in the query.
diff --git a/docs/en/query_language/functions/math_functions.md b/docs/en/query_language/functions/math_functions.md
index 5ef43170e0d..0ae7eb8427c 100644
--- a/docs/en/query_language/functions/math_functions.md
+++ b/docs/en/query_language/functions/math_functions.md
@@ -4,11 +4,11 @@ All the functions return a Float64 number. The accuracy of the result is close t
## e()
-Returns a Float64 number close to the e number.
+Returns a Float64 number that is close to the number e.
## pi()
-Returns a Float64 number close to π.
+Returns a Float64 number that is close to the number π.
## exp(x)
@@ -20,7 +20,7 @@ Accepts a numeric argument and returns a Float64 number close to the natural log
## exp2(x)
-Accepts a numeric argument and returns a Float64 number close to 2^x.
+Accepts a numeric argument and returns a Float64 number close to 2 to the power of x.
## log2(x)
@@ -28,7 +28,7 @@ Accepts a numeric argument and returns a Float64 number close to the binary loga
## exp10(x)
-Accepts a numeric argument and returns a Float64 number close to 10^x.
+Accepts a numeric argument and returns a Float64 number close to 10 to the power of x.
## log10(x)
@@ -96,4 +96,4 @@ The arc tangent.
## pow(x, y)
-Accepts two numeric arguments and returns a Float64 number close to x^y.
+Takes two numeric arguments x and y. Returns a Float64 number close to x to the power of y.
diff --git a/docs/en/query_language/functions/other_functions.md b/docs/en/query_language/functions/other_functions.md
index a8d2a54fa6a..f3299bd5796 100644
--- a/docs/en/query_language/functions/other_functions.md
+++ b/docs/en/query_language/functions/other_functions.md
@@ -9,10 +9,22 @@ Returns a string with the name of the host that this function was performed on.
Calculates the approximate width when outputting values to the console in text format (tab-separated).
This function is used by the system for implementing Pretty formats.
+For `NULL`, the function returns the width of the string that represents `NULL` in `Pretty` formats.
+
+```
+SELECT visibleWidth(NULL)
+
+┌─visibleWidth(NULL)─┐
+│ 4 │
+└────────────────────┘
+```
+
## toTypeName(x)
Returns a string containing the type name of the passed argument.
+If `NULL` is passed to the function as input, then it returns the `Nullable(Nothing)` type, which corresponds to an internal `NULL` representation in ClickHouse.
+
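+For example (output shown as expected):
+
+```
+SELECT toTypeName(NULL)
+
+┌─toTypeName(NULL)──┐
+│ Nullable(Nothing) │
+└───────────────────┘
+```
+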
## blockSize()
Gets the size of the block.
@@ -25,7 +37,7 @@ In ClickHouse, full columns and constants are represented differently in memory.
## ignore(...)
-Accepts any arguments and always returns 0.
+Accepts any arguments, including `NULL`. Always returns 0.
However, the argument is still evaluated. This can be used for benchmarks.
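+
+A sketch of such a benchmark (the `numbers` table function and `rand` are used here only for illustration): `ignore` returns `0` for every row, so `NOT ignore(rand())` is always true and `rand()` is evaluated for each row.
+
+```
+SELECT count() FROM numbers(1000000) WHERE NOT ignore(rand())
+```
+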
## sleep(seconds)
@@ -59,13 +71,13 @@ For elements in a nested data structure, the function checks for the existence o
Allows building a unicode-art diagram.
-`bar (x, min, max, width)` draws a band with a width proportional to `(x - min)` and equal to `width` characters when `x = max`.
+`bar(x, min, max, width)` draws a band with a width proportional to `(x - min)` and equal to `width` characters when `x = max`.
Parameters:
-- `x` – Value to display.
-- `min, max` – Integer constants. The value must fit in Int64.
-- `width` – Constant, positive number, may be a fraction.
+- `x` — Value to display.
+- `min, max` — Integer constants. The value must fit in `Int64`.
+- `width` — Constant, positive number; it can be fractional.
The band is drawn with accuracy to one eighth of a symbol.
@@ -283,3 +295,267 @@ The inverse function of MACNumToString. If the MAC address has an invalid format
## MACStringToOUI(s)
Accepts a MAC address in the format AA:BB:CC:DD:EE:FF (colon-separated numbers in hexadecimal form). Returns the first three octets as a UInt64 number. If the MAC address has an invalid format, it returns 0.
+
+## getSizeOfEnumType
+
+Returns the number of fields in [Enum](../../data_types/enum.md#data_type-enum).
+
+```
+getSizeOfEnumType(value)
+```
+
+**Parameters:**
+
+- `value` — Value of type `Enum`.
+
+**Returned values**
+
+- The number of fields in the `Enum` input value.
+- An exception is thrown if the type is not `Enum`.
+
+**Example**
+
+```
+SELECT getSizeOfEnumType( CAST('a' AS Enum8('a' = 1, 'b' = 2) ) ) AS x
+
+┌─x─┐
+│ 2 │
+└───┘
+```
+
+## toColumnTypeName
+
+Returns the name of the class that represents the data type of the column in RAM.
+
+```
+toColumnTypeName(value)
+```
+
+**Parameters:**
+
+- `value` — Any type of value.
+
+**Returned values**
+
+- A string with the name of the class that is used for representing the `value` data type in RAM.
+
+**Example of the difference between `toTypeName` and `toColumnTypeName`**
+
+```
+:) select toTypeName(cast('2018-01-01 01:02:03' AS DateTime))
+
+SELECT toTypeName(CAST('2018-01-01 01:02:03', 'DateTime'))
+
+┌─toTypeName(CAST('2018-01-01 01:02:03', 'DateTime'))─┐
+│ DateTime │
+└─────────────────────────────────────────────────────┘
+
+1 rows in set. Elapsed: 0.008 sec.
+
+:) select toColumnTypeName(cast('2018-01-01 01:02:03' AS DateTime))
+
+SELECT toColumnTypeName(CAST('2018-01-01 01:02:03', 'DateTime'))
+
+┌─toColumnTypeName(CAST('2018-01-01 01:02:03', 'DateTime'))─┐
+│ Const(UInt32) │
+└───────────────────────────────────────────────────────────┘
+```
+
+The example shows that the `DateTime` data type is stored in memory as `Const(UInt32)`.
+
+## dumpColumnStructure
+
+Outputs a detailed description of data structures in RAM.
+
+```
+dumpColumnStructure(value)
+```
+
+**Parameters:**
+
+- `value` — Any type of value.
+
+**Returned values**
+
+- A string describing the structure that is used for representing the `value` data type in RAM.
+
+**Example**
+
+```
+SELECT dumpColumnStructure(CAST('2018-01-01 01:02:03', 'DateTime'))
+
+┌─dumpColumnStructure(CAST('2018-01-01 01:02:03', 'DateTime'))─┐
+│ DateTime, Const(size = 1, UInt32(size = 1)) │
+└──────────────────────────────────────────────────────────────┘
+```
+
+## defaultValueOfArgumentType
+
+Outputs the default value for the data type.
+
+Does not include default values for custom columns set by the user.
+
+```
+defaultValueOfArgumentType(expression)
+```
+
+**Parameters:**
+
+- `expression` — Arbitrary type of value or an expression that results in a value of an arbitrary type.
+
+**Returned values**
+
+- `0` for numbers.
+- Empty string for strings.
+- `ᴺᵁᴸᴸ` for [Nullable](../../data_types/nullable.md#data_type-nullable).
+
+**Example**
+
+```
+:) SELECT defaultValueOfArgumentType( CAST(1 AS Int8) )
+
+SELECT defaultValueOfArgumentType(CAST(1, 'Int8'))
+
+┌─defaultValueOfArgumentType(CAST(1, 'Int8'))─┐
+│ 0 │
+└─────────────────────────────────────────────┘
+
+1 rows in set. Elapsed: 0.002 sec.
+
+:) SELECT defaultValueOfArgumentType( CAST(1 AS Nullable(Int8) ) )
+
+SELECT defaultValueOfArgumentType(CAST(1, 'Nullable(Int8)'))
+
+┌─defaultValueOfArgumentType(CAST(1, 'Nullable(Int8)'))─┐
+│ ᴺᵁᴸᴸ │
+└───────────────────────────────────────────────────────┘
+
+1 rows in set. Elapsed: 0.002 sec.
+```
+
+## indexHint
+
+Outputs data in the range selected by the index without filtering by the expression specified as an argument.
+
+The expression passed to the function is not calculated, but ClickHouse applies the index to this expression in the same way as if the expression was in the query without `indexHint`.
+
+**Returned value**
+
+- 1.
+
+**Example**
+
+Here is a table with the test data for [ontime](../../getting_started/example_datasets/ontime.md#example_datasets-ontime).
+
+```
+SELECT count() FROM ontime
+
+┌─count()─┐
+│ 4276457 │
+└─────────┘
+```
+
+The table has indexes for the fields `(FlightDate, (Year, FlightDate))`.
+
+Create a selection by date like this:
+
+```
+:) SELECT FlightDate AS k, count() FROM ontime GROUP BY k ORDER BY k
+
+SELECT
+ FlightDate AS k,
+ count()
+FROM ontime
+GROUP BY k
+ORDER BY k ASC
+
+┌──────────k─┬─count()─┐
+│ 2017-01-01 │ 13970 │
+│ 2017-01-02 │ 15882 │
+........................
+│ 2017-09-28 │ 16411 │
+│ 2017-09-29 │ 16384 │
+│ 2017-09-30 │ 12520 │
+└────────────┴─────────┘
+
+273 rows in set. Elapsed: 0.072 sec. Processed 4.28 million rows, 8.55 MB (59.00 million rows/s., 118.01 MB/s.)
+```
+
+In this selection, the index is not used and ClickHouse processed the entire table (`Processed 4.28 million rows`). To apply the index, select a specific date and run the following query:
+
+```
+:) SELECT FlightDate AS k, count() FROM ontime WHERE k = '2017-09-15' GROUP BY k ORDER BY k
+
+SELECT
+ FlightDate AS k,
+ count()
+FROM ontime
+WHERE k = '2017-09-15'
+GROUP BY k
+ORDER BY k ASC
+
+┌──────────k─┬─count()─┐
+│ 2017-09-15 │ 16428 │
+└────────────┴─────────┘
+
+1 rows in set. Elapsed: 0.014 sec. Processed 32.74 thousand rows, 65.49 KB (2.31 million rows/s., 4.63 MB/s.)
+```
+
+The last line of output shows that by using the index, ClickHouse processed a significantly smaller number of rows (`Processed 32.74 thousand rows`).
+
+Now pass the expression `k = '2017-09-15'` to the `indexHint` function:
+
+```
+:) SELECT FlightDate AS k, count() FROM ontime WHERE indexHint(k = '2017-09-15') GROUP BY k ORDER BY k
+
+SELECT
+ FlightDate AS k,
+ count()
+FROM ontime
+WHERE indexHint(k = '2017-09-15')
+GROUP BY k
+ORDER BY k ASC
+
+┌──────────k─┬─count()─┐
+│ 2017-09-14 │ 7071 │
+│ 2017-09-15 │ 16428 │
+│ 2017-09-16 │ 1077 │
+│ 2017-09-30 │ 8167 │
+└────────────┴─────────┘
+
+4 rows in set. Elapsed: 0.004 sec. Processed 32.74 thousand rows, 65.49 KB (8.97 million rows/s., 17.94 MB/s.)
+```
+
+The query response shows that ClickHouse applied the index in the same way as the previous time (`Processed 32.74 thousand rows`). However, the resulting set of rows shows that the expression `k = '2017-09-15'` was not used when generating the result.
+
+Because the index is sparse in ClickHouse, "extra" data ends up in the response when reading a range (in this case, the adjacent dates). Use the `indexHint` function to see it.
+
+## replicate
+
+Creates an array filled with a single value.
+
+Used for internal implementation of [arrayJoin](array_join.md#functions_arrayjoin).
+
+```
+replicate(x, arr)
+```
+
+**Parameters:**
+
+- `x` — The value that the resulting array will be filled with.
+- `arr` — The original array. ClickHouse creates a new array of the same length as the original and fills it with the value `x`.
+
+**Output value**
+
+- An array filled with the value `x`.
+
+**Example**
+
+```
+SELECT replicate(1, ['a', 'b', 'c'])
+
+┌─replicate(1, ['a', 'b', 'c'])─┐
+│ [1,1,1] │
+└───────────────────────────────┘
+```
+
diff --git a/docs/en/query_language/functions/string_search_functions.md b/docs/en/query_language/functions/string_search_functions.md
index 56644f00ba3..a038162c023 100644
--- a/docs/en/query_language/functions/string_search_functions.md
+++ b/docs/en/query_language/functions/string_search_functions.md
@@ -5,16 +5,16 @@ The search substring or regular expression must be a constant in all these funct
## position(haystack, needle)
-Search for the `needle` substring in the `haystack` string.
+Search for the substring `needle` in the string `haystack`.
Returns the position (in bytes) of the found substring, starting from 1, or returns 0 if the substring was not found.
-For case-insensitive search use `positionCaseInsensitive` function.
+For a case-insensitive search, use the function `positionCaseInsensitive`.
## positionUTF8(haystack, needle)
The same as `position`, but the position is returned in Unicode code points. Works under the assumption that the string contains a set of bytes representing a UTF-8 encoded text. If this assumption is not met, it returns some result (it doesn't throw an exception).
-For case-insensitive search use `positionCaseInsensitiveUTF8` function.
+For a case-insensitive search, use the function `positionCaseInsensitiveUTF8`.
## match(haystack, pattern)
@@ -51,3 +51,4 @@ For other regular expressions, the code is the same as for the 'match' function.
## notLike(haystack, pattern), haystack NOT LIKE pattern operator
The same thing as 'like', but negative.
+
diff --git a/docs/en/query_language/functions/type_conversion_functions.md b/docs/en/query_language/functions/type_conversion_functions.md
index 903f370e2b2..02014987e80 100644
--- a/docs/en/query_language/functions/type_conversion_functions.md
+++ b/docs/en/query_language/functions/type_conversion_functions.md
@@ -117,5 +117,23 @@ SELECT
└─────────────────────┴─────────────────────┴────────────┴─────────────────────┴───────────────────────────┘
```
-Conversion to FixedString (N) only works for arguments of type String or FixedString (N).
+Conversion to FixedString(N) only works for arguments of type String or FixedString(N).
+
+Type conversion to [Nullable](../../data_types/nullable.md#data_type-nullable) and back is supported. Example:
+
+```
+SELECT toTypeName(x) FROM t_null
+
+┌─toTypeName(x)─┐
+│ Int8 │
+│ Int8 │
+└───────────────┘
+
+SELECT toTypeName(CAST(x, 'Nullable(UInt16)')) FROM t_null
+
+┌─toTypeName(CAST(x, 'Nullable(UInt16)'))─┐
+│ Nullable(UInt16) │
+│ Nullable(UInt16) │
+└─────────────────────────────────────────┘
+```
diff --git a/docs/en/query_language/functions/ym_dict_functions.md b/docs/en/query_language/functions/ym_dict_functions.md
index 7ba7e7012cf..60ddc55947a 100644
--- a/docs/en/query_language/functions/ym_dict_functions.md
+++ b/docs/en/query_language/functions/ym_dict_functions.md
@@ -1,3 +1,5 @@
+
+
# Functions for working with Yandex.Metrica dictionaries
In order for the functions below to work, the server config must specify the paths and addresses for getting all the Yandex.Metrica dictionaries. The dictionaries are loaded at the first call of any of these functions. If the reference lists can't be loaded, an exception is thrown.
@@ -21,9 +23,9 @@ All functions for working with regions have an optional argument at the end –
Example:
```text
-regionToCountry(RegionID) – Uses the default dictionary: /opt/geo/regions_hierarchy.txt
-regionToCountry(RegionID, '') – Uses the default dictionary: /opt/geo/regions_hierarchy.txt
-regionToCountry(RegionID, 'ua') – Uses the dictionary for the 'ua' key: /opt/geo/regions_hierarchy_ua.txt
+regionToCountry(RegionID) — Uses the default dictionary: /opt/geo/regions_hierarchy.txt.
+regionToCountry(RegionID, '') — Uses the default dictionary: /opt/geo/regions_hierarchy.txt.
+regionToCountry(RegionID, 'ua') — Uses the dictionary for the 'ua' key: /opt/geo/regions_hierarchy_ua.txt.
```
### regionToCity(id[, geobase])
@@ -43,20 +45,20 @@ LIMIT 15
```text
┌─regionToName(regionToArea(toUInt32(number), \'ua\'))─┐
│ │
-│ Moscow and Moscow region │
-│ St. Petersburg and Leningrad region │
-│ Belgorod region │
-│ Ivanovsk region │
-│ Kaluga region │
-│ Kostroma region │
-│ Kursk region │
-│ Lipetsk region │
-│ Orlov region │
-│ Ryazan region │
-│ Smolensk region │
-│ Tambov region │
-│ Tver region │
-│ Tula region │
+│ Moscow and Moscow region │
+│ St. Petersburg and Leningrad region │
+│ Belogorod region │
+│ Ivanovo region │
+│ Kaluga region │
+│ Kostroma region │
+│ Kursk region │
+│ Lipetsk region │
+│ Oryol region │
+│ Ryazan region │
+│ Smolensk region │
+│ Tambov region │
+│ Tver region │
+│ Tula region │
└──────────────────────────────────────────────────────┘
```
@@ -73,20 +75,20 @@ LIMIT 15
```text
┌─regionToName(regionToDistrict(toUInt32(number), \'ua\'))─┐
│ │
-│ Central federal district │
-│ Northwest federal district │
-│ South federal district │
-│ North Caucases federal district │
-│ Privolga federal district │
-│ Ural federal district │
-│ Siberian federal district │
-│ Far East federal district │
-│ Scotland │
-│ Faroe Islands │
-│ Flemish region │
-│ Brussels capital region │
-│ Wallonia │
-│ Federation of Bosnia and Herzegovina │
+│ Central Federal District │
+│ Northwest Federal District │
+│ Southern Federal District │
+│ North Caucasian Federal District │
+│ Privolzhsky Federal District │
+│ Ural Federal District │
+│ Siberian Federal District │
+│ Far East Federal District │
+│ Scotland │
+│ Faroe Islands │
+│ Flemish region │
+│ Brussels capital region │
+│ Walloon │
+│ Federation of Bosnia and Herzegovina │
└──────────────────────────────────────────────────────────┘
```
diff --git a/docs/en/query_language/index.md b/docs/en/query_language/index.md
index 144303f30f1..dcb59cbe6ad 100644
--- a/docs/en/query_language/index.md
+++ b/docs/en/query_language/index.md
@@ -4,4 +4,5 @@
* [INSERT INTO](insert_into.md#queries-insert)
* [CREATE](create.md#create-database)
* [ALTER](alter.md#query_language_queries_alter)
-* [Other kinds of queries](misc.md#miscellaneous-queries)
+* [Other types of queries](misc.md#miscellaneous-queries)
+
diff --git a/docs/en/query_language/misc.md b/docs/en/query_language/misc.md
index 253c1d59ef4..b6d76cf48c2 100644
--- a/docs/en/query_language/misc.md
+++ b/docs/en/query_language/misc.md
@@ -6,9 +6,10 @@ This query is exactly the same as `CREATE`, but
- instead of the word `CREATE` it uses the word `ATTACH`.
- The query doesn't create data on the disk, but assumes that data is already in the appropriate places, and just adds information about the table to the server.
-After executing an ATTACH query, the server will know about the existence of the table.
-If the table was previously detached (``DETACH``), meaning that its structure is known, you can use shorthand without defining the structure.
+After executing an `ATTACH` query, the server will know about the existence of the table.
+
+If the table was previously detached (`DETACH`), meaning that its structure is known, you can use shorthand without defining the structure.
```sql
ATTACH TABLE [IF NOT EXISTS] [db.]name [ON CLUSTER cluster]
@@ -175,8 +176,8 @@ Supported only by `*MergeTree` engines, in which this query initializes a non-sc
If you specify a `PARTITION`, only the specified partition will be optimized.
If you specify `FINAL`, optimization will be performed even when all the data is already in one part.
-!!! warning
- OPTIMIZE can't fix the "Too many parts" error.
+!!! important
+    The `OPTIMIZE` query can't fix the cause of the "Too many parts" error.
## KILL QUERY
@@ -193,7 +194,7 @@ The queries to terminate are selected from the system.processes table using the
Examples:
```sql
--- Forcibly terminates all queries with the specified query_id:
+-- Terminates all queries with the specified query_id:
KILL QUERY WHERE query_id='2-857d-4a57-9ee0-327da5d60a90'
-- Synchronously terminates all queries run by 'username':
diff --git a/docs/en/query_language/operators.md b/docs/en/query_language/operators.md
index 36422455868..5d3b6df4ca6 100644
--- a/docs/en/query_language/operators.md
+++ b/docs/en/query_language/operators.md
@@ -81,7 +81,9 @@ Groups of operators are listed in order of priority (the higher it is in the lis
Note:
-The conditional operator calculates the values of b and c, then checks whether condition a is met, and then returns the corresponding value. If "b" or "c" is an arrayJoin() function, each row will be replicated regardless of the "a" condition.
+The conditional operator calculates the values of `b` and `c`, then checks whether condition `a` is met, and then returns the corresponding value. If `b` or `c` is an [arrayJoin()](functions/array_join.md#functions_arrayjoin) function, each row will be replicated regardless of the `a` condition.
+
+
## Conditional Expression
@@ -120,3 +122,52 @@ Sometimes this doesn't work the way you expect. For example, ` SELECT 4 > 2 > 3`
For efficiency, the `and` and `or` functions accept any number of arguments. The corresponding chains of `AND` and `OR` operators are transformed to a single call of these functions.
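+
+A small sketch of this equivalence (output shown as expected):
+
+```
+SELECT 1 AND 1 AND 0 AS via_operators, and(1, 1, 0) AS via_function
+
+┌─via_operators─┬─via_function─┐
+│             0 │            0 │
+└───────────────┴──────────────┘
+```
+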
+## Checking for `NULL`
+
+ClickHouse supports the `IS NULL` and `IS NOT NULL` operators.
+
+
+
+### IS NULL
+
+- For [Nullable](../data_types/nullable.md#data_type-nullable) type values, the `IS NULL` operator returns:
+ - `1`, if the value is `NULL`.
+ - `0` otherwise.
+- For other values, the `IS NULL` operator always returns `0`.
+
+```
+:) SELECT x+100 FROM t_null WHERE y IS NULL
+
+SELECT x + 100
+FROM t_null
+WHERE isNull(y)
+
+┌─plus(x, 100)─┐
+│ 101 │
+└──────────────┘
+
+1 rows in set. Elapsed: 0.002 sec.
+```
+
+
+
+### IS NOT NULL
+
+- For [Nullable](../data_types/nullable.md#data_type-nullable) type values, the `IS NOT NULL` operator returns:
+ - `0`, if the value is `NULL`.
+ - `1` otherwise.
+- For other values, the `IS NOT NULL` operator always returns `1`.
+
+```
+:) SELECT * FROM t_null WHERE y IS NOT NULL
+
+SELECT *
+FROM t_null
+WHERE isNotNull(y)
+
+┌─x─┬─y─┐
+│ 2 │ 3 │
+└───┴───┘
+
+1 rows in set. Elapsed: 0.002 sec.
+```
diff --git a/docs/en/query_language/select.md b/docs/en/query_language/select.md
index ad82ee0eee0..d00b8250e79 100644
--- a/docs/en/query_language/select.md
+++ b/docs/en/query_language/select.md
@@ -1,6 +1,6 @@
# SELECT Queries Syntax
-`SELECT` performs data retrieval.
+`SELECT` retrieves data from tables.
```sql
SELECT [DISTINCT] expr_list
@@ -26,7 +26,9 @@ The clauses below are described in almost the same order as in the query executi
If the query omits the `DISTINCT`, `GROUP BY` and `ORDER BY` clauses and the `IN` and `JOIN` subqueries, the query will be completely stream processed, using O(1) amount of RAM.
Otherwise, the query might consume a lot of RAM if the appropriate restrictions are not specified: `max_memory_usage`, `max_rows_to_group_by`, `max_rows_to_sort`, `max_rows_in_distinct`, `max_bytes_in_distinct`, `max_rows_in_set`, `max_bytes_in_set`, `max_rows_in_join`, `max_bytes_in_join`, `max_bytes_before_external_sort`, `max_bytes_before_external_group_by`. For more information, see the section "Settings". It is possible to use external sorting (saving temporary tables to a disk) and external aggregation. `The system does not have "merge join"`.
-### FROM Clause
+
+
+### FROM clause
If the FROM clause is omitted, data will be read from the `system.one` table.
The 'system.one' table contains exactly one row (this table fulfills the same purpose as the DUAL table found in other DBMSs).
@@ -332,7 +334,9 @@ The query can only specify a single ARRAY JOIN clause.
The corresponding conversion can be performed before the WHERE/PREWHERE clause (if its result is needed in this clause), or after completing WHERE/PREWHERE (to reduce the volume of calculations).
-### JOIN Clause
+
+
+### JOIN clause
The normal JOIN, which is not related to ARRAY JOIN described above.
@@ -426,29 +430,58 @@ Among the various types of JOINs, the most efficient is ANY LEFT JOIN, then ANY
If you need a JOIN for joining with dimension tables (these are relatively small tables that contain dimension properties, such as names for advertising campaigns), a JOIN might not be very convenient due to the bulky syntax and the fact that the right table is re-accessed for every query. For such cases, there is an "external dictionaries" feature that you should use instead of JOIN. For more information, see the section "External dictionaries".
-### WHERE Clause
+#### NULL processing
-If there is a WHERE clause, it must contain an expression with the UInt8 type. This is usually an expression with comparison and logical operators.
-This expression will be used for filtering data before all other transformations.
+The `JOIN` behavior is affected by the [join_use_nulls](../operations/settings/settings.md#settings-join_use_nulls) setting. With `join_use_nulls=1`, `JOIN` works as in standard SQL.
-If indexes are supported by the database table engine, the expression is evaluated on the ability to use indexes.
+If the JOIN keys are [Nullable](../data_types/nullable.md#data_types-nullable) fields, the rows where at least one of the keys has the value [NULL](syntax.md#null-literal) are not joined.
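+
+A hypothetical illustration (the table names and data here are invented for this sketch; with the default `join_use_nulls=0`, the unmatched left row receives default values):
+
+```sql
+CREATE TABLE t1 (k Nullable(Int8), v String) ENGINE = Memory;
+CREATE TABLE t2 (k Nullable(Int8), w String) ENGINE = Memory;
+INSERT INTO t1 VALUES (1, 'a'), (NULL, 'b');
+INSERT INTO t2 VALUES (1, 'x'), (NULL, 'y');
+
+SELECT * FROM t1 ANY LEFT JOIN t2 USING (k)
+```
+
+The `NULL` keys do not match each other, so the row `(NULL, 'b')` is not joined with `(NULL, 'y')`:
+
+```text
+┌────k─┬─v─┬─w─┐
+│    1 │ a │ x │
+│ ᴺᵁᴸᴸ │ b │   │
+└──────┴───┴───┘
+```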
+
+
+
+### WHERE clause
+
+Allows you to set an expression that ClickHouse uses to filter data before all other actions in the query, other than the expressions contained in the [PREWHERE](#query_language-queries-prewhere) clause. This is usually an expression with logical operators.
+
+The result of the expression must be of type `UInt8`.
+
+ClickHouse uses indexes in the expression if this is allowed by the [table engine](../operations/table_engines/index.md#table_engines).
+
+To check for [NULL](syntax.md#null-literal) in the clause, use the [IS NULL](operators.md#operator-is-null) and [IS NOT NULL](operators.md#operator-is-not-null) operators or the related `isNull` and `isNotNull` functions. Otherwise, a condition that evaluates to `NULL` never passes the filter.
+
+Example of checking for `NULL`:
+
+```
+:) SELECT * FROM t_null WHERE y IS NULL
+
+SELECT *
+FROM t_null
+WHERE isNull(y)
+
+┌─x─┬────y─┐
+│ 1 │ ᴺᵁᴸᴸ │
+└───┴──────┘
+
+1 rows in set. Elapsed: 0.002 sec.
+```
+
+
### PREWHERE Clause
-This clause has the same meaning as the WHERE clause. The difference is in which data is read from the table.
-When using PREWHERE, first only the columns necessary for executing PREWHERE are read. Then the other columns are read that are needed for running the query, but only those blocks where the PREWHERE expression is true.
+It has the same purpose as the [WHERE](#query_language-queries-where) clause. The difference is in which data is read from the table.
+When using `PREWHERE`, first only the columns necessary for executing `PREWHERE` are read. Then the other columns are read that are needed for running the query, but only those blocks where the `PREWHERE` expression is true.
-It makes sense to use PREWHERE if there are filtration conditions that are not suitable for indexes that are used by a minority of the columns in the query, but that provide strong data filtration. This reduces the volume of data to read.
+`PREWHERE` makes sense if there are filtration conditions that are not suitable for indexes that are used by a minority of the columns in the query, but that provide strong data filtration. This reduces the volume of data to read.
-For example, it is useful to write PREWHERE for queries that extract a large number of columns, but that only have filtration for a few columns.
+For example, it is useful to write `PREWHERE` for queries that extract a large number of columns, but that only have filtration for a few columns.
-PREWHERE is only supported by tables from the `*MergeTree` family.
+`PREWHERE` is only supported by tables from the `*MergeTree` family.
-A query may simultaneously specify PREWHERE and WHERE. In this case, PREWHERE precedes WHERE.
+A query may simultaneously specify `PREWHERE` and `WHERE`. In this case, `PREWHERE` goes before `WHERE`.
-Keep in mind that it does not make much sense for PREWHERE to only specify those columns that have an index, because when using an index, only the data blocks that match the index are read.
+Keep in mind that it does not make much sense for `PREWHERE` to only specify those columns that have an index, because when using an index, only the data blocks that match the index are read.
-If the 'optimize_move_to_prewhere' setting is set to 1 and PREWHERE is omitted, the system uses heuristics to automatically move parts of expressions from WHERE to PREWHERE.
+If the setting `optimize_move_to_prewhere` is set to `1`, in the absence of `PREWHERE`, the system will automatically move parts of expressions from `WHERE` to `PREWHERE` according to heuristic analysis.
### GROUP BY Clause
@@ -490,7 +523,39 @@ GROUP BY is not supported for array columns.
A constant can't be specified as arguments for aggregate functions. Example: sum(1). Instead of this, you can get rid of the constant. Example: `count()`.
-#### WITH TOTALS Modifier
+#### NULL processing
+
+For grouping, ClickHouse interprets [NULL](syntax.md#null-literal) as a value, and `NULL=NULL`.
+
+Here's an example to show what this means.
+
+Assume you have this table:
+
+```
+┌─x─┬────y─┐
+│ 1 │ 2 │
+│ 2 │ ᴺᵁᴸᴸ │
+│ 3 │ 2 │
+│ 3 │ 3 │
+│ 3 │ ᴺᵁᴸᴸ │
+└───┴──────┘
+```
+
+The query `SELECT sum(x), y FROM t_null_big GROUP BY y` results in:
+
+```
+┌─sum(x)─┬────y─┐
+│ 4 │ 2 │
+│ 3 │ 3 │
+│ 5 │ ᴺᵁᴸᴸ │
+└────────┴──────┘
+```
+
+You can see that `GROUP BY` for `y = NULL` summed up `x`, as if `NULL` were a specific value.
+
+If you pass several keys to `GROUP BY`, the result will give you all the combinations of the selection, as if `NULL` were a specific value.
+
+#### WITH TOTALS modifier
If the WITH TOTALS modifier is specified, another row will be calculated. This row will have key columns containing default values (zeros or empty lines), and columns of aggregate functions with the values calculated across all the rows (the "total" values).
@@ -522,20 +587,20 @@ The `max_bytes_before_external_group_by` setting determines the threshold RAM co
When using `max_bytes_before_external_group_by`, we recommend that you set max_memory_usage about twice as high. This is necessary because there are two stages to aggregation: reading the data and forming intermediate data (1) and merging the intermediate data (2). Dumping data to the file system can only occur during stage 1. If the temporary data wasn't dumped, then stage 2 might require up to the same amount of memory as in stage 1.
-For example, if `max_memory_usage` was set to 10000000000 and you want to use external aggregation, it makes sense to set `max_bytes_before_external_group_by` to 10000000000, and max_memory_usage to 20000000000. When external aggregation is triggered (if there was at least one dump of temporary data), maximum consumption of RAM is only slightly more than ` max_bytes_before_external_group_by`.
+For example, if `max_memory_usage` was set to 10000000000 and you want to use external aggregation, it makes sense to set `max_bytes_before_external_group_by` to 10000000000, and max_memory_usage to 20000000000. When external aggregation is triggered (if there was at least one dump of temporary data), maximum consumption of RAM is only slightly more than `max_bytes_before_external_group_by`.
-With distributed query processing, external aggregation is performed on remote servers. In order for the requestor server to use only a small amount of RAM, set ` distributed_aggregation_memory_efficient` to 1.
+With distributed query processing, external aggregation is performed on remote servers. In order for the requestor server to use only a small amount of RAM, set `distributed_aggregation_memory_efficient` to 1.
-When merging data flushed to the disk, as well as when merging results from remote servers when the ` distributed_aggregation_memory_efficient` setting is enabled, consumes up to 1/256 \* the number of threads from the total amount of RAM.
+When merging data flushed to the disk, and when merging results from remote servers with the `distributed_aggregation_memory_efficient` setting enabled, aggregation consumes up to 1/256 \* the number of threads from the total amount of RAM.
-When external aggregation is enabled, if there was less than ` max_bytes_before_external_group_by` of data (i.e. data was not flushed), the query runs just as fast as without external aggregation. If any temporary data was flushed, the run time will be several times longer (approximately three times).
+When external aggregation is enabled, if there was less than `max_bytes_before_external_group_by` of data (i.e. data was not flushed), the query runs just as fast as without external aggregation. If any temporary data was flushed, the run time will be several times longer (approximately three times).
If you have an ORDER BY with a small LIMIT after GROUP BY, then the ORDER BY CLAUSE will not use significant amounts of RAM.
But if the ORDER BY doesn't have LIMIT, don't forget to enable external sorting (`max_bytes_before_external_sort`).
### LIMIT N BY Clause
-LIMIT N BY COLUMNS selects the top N rows for each group of COLUMNS. LIMIT N BY is not related to LIMIT; they can both be used in the same query. The key for LIMIT N BY can contain any number of columns or expressions.
+`LIMIT N BY COLUMNS` selects the top `N` rows for each group of `COLUMNS`. `LIMIT N BY` is not related to `LIMIT`; they can both be used in the same query. The key for `LIMIT N BY` can contain any number of columns or expressions.
Example:
@@ -554,7 +619,9 @@ LIMIT 100
The query will select the top 5 referrers for each `domain, device_type` pair, but not more than 100 rows (`LIMIT n BY + LIMIT`).
-### HAVING Clause
+`LIMIT n BY` works with [NULL](syntax.md#null-literal) as if it were a specific value. This means that as the result of the query, the user will get all the combinations of fields specified in `BY`.
+
+### HAVING clause
Allows filtering the result received after GROUP BY, similar to the WHERE clause.
WHERE and HAVING differ in that WHERE is performed before aggregation (GROUP BY), while HAVING is performed after it.
@@ -573,7 +640,47 @@ We only recommend using COLLATE for final sorting of a small number of rows, sin
Rows that have identical values for the list of sorting expressions are output in an arbitrary order, which can also be nondeterministic (different each time).
If the ORDER BY clause is omitted, the order of the rows is also undefined, and may be nondeterministic as well.
-When floating point numbers are sorted, NaNs are separate from the other values. Regardless of the sorting order, NaNs come at the end. In other words, for ascending sorting they are placed as if they are larger than all the other numbers, while for descending sorting they are placed as if they are smaller than the rest.
+`NaN` and `NULL` sorting order:
+
+- With the modifier `NULLS FIRST` — First `NULL`, then `NaN`, then other values.
+- With the modifier `NULLS LAST` — First the values, then `NaN`, then `NULL`.
+- Default — The same as with the `NULLS LAST` modifier.
+
+Example:
+
+For the table
+
+```
+┌─x─┬────y─┐
+│ 1 │ ᴺᵁᴸᴸ │
+│ 2 │ 2 │
+│ 1 │ nan │
+│ 2 │ 2 │
+│ 3 │ 4 │
+│ 5 │ 6 │
+│ 6 │ nan │
+│ 7 │ ᴺᵁᴸᴸ │
+│ 6 │ 7 │
+│ 8 │ 9 │
+└───┴──────┘
+```
+
+Run the query `SELECT * FROM t_null_nan ORDER BY y NULLS FIRST` to get:
+
+```
+┌─x─┬────y─┐
+│ 1 │ ᴺᵁᴸᴸ │
+│ 7 │ ᴺᵁᴸᴸ │
+│ 1 │ nan │
+│ 6 │ nan │
+│ 2 │ 2 │
+│ 2 │ 2 │
+│ 3 │ 4 │
+│ 5 │ 6 │
+│ 6 │ 7 │
+│ 8 │ 9 │
+└───┴──────┘
+```
Less RAM is used if a small enough LIMIT is specified in addition to ORDER BY. Otherwise, the amount of memory spent is proportional to the volume of data for sorting. For distributed query processing, if GROUP BY is omitted, sorting is partially done on remote servers, and the results are merged on the requestor server. This means that for distributed sorting, the volume of data to sort can be greater than the amount of memory on a single server.
@@ -592,14 +699,16 @@ These expressions work as if they are applied to separate rows in the result.
### DISTINCT Clause
-If DISTINCT is specified, only a single row will remain out of all the sets of fully matching rows in the result.
-The result will be the same as if GROUP BY were specified across all the fields specified in SELECT without aggregate functions. But there are several differences from GROUP BY:
+If `DISTINCT` is specified, only a single row will remain out of all the sets of fully matching rows in the result.
+The result will be the same as if `GROUP BY` were specified across all the fields specified in `SELECT` without aggregate functions. But there are several differences from `GROUP BY`:
-- DISTINCT can be applied together with GROUP BY.
-- When ORDER BY is omitted and LIMIT is defined, the query stops running immediately after the required number of different rows has been read.
+- `DISTINCT` can be applied together with `GROUP BY`.
+- When `ORDER BY` is omitted and `LIMIT` is defined, the query stops running immediately after the required number of different rows has been read.
- Data blocks are output as they are processed, without waiting for the entire query to finish running.
-DISTINCT is not supported if SELECT has at least one array column.
+`DISTINCT` is not supported if `SELECT` has at least one array column.
+
+`DISTINCT` works with [NULL](syntax.md#null-literal) as if `NULL` were a specific value, and `NULL=NULL`. In other words, in the `DISTINCT` results, different combinations with `NULL` only occur once.
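+
+For example, for the `t_null_big` table from the `GROUP BY` example above, the expected result (row order may vary) is:
+
+```
+SELECT DISTINCT y FROM t_null_big
+
+┌────y─┐
+│    2 │
+│ ᴺᵁᴸᴸ │
+│    3 │
+└──────┘
+```
+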
### LIMIT Clause
@@ -610,9 +719,11 @@ LIMIT n, m allows you to select the first 'm' rows from the result after skippin
If there isn't an ORDER BY clause that explicitly sorts results, the result may be arbitrary and nondeterministic.
-### UNION ALL Clause
-You can use UNION ALL to combine any number of queries. Example:
+### UNION ALL clause
+
+You can use `UNION ALL` to combine any number of queries. Example:
```sql
SELECT CounterID, 1 AS table, toInt64(count()) AS c
@@ -627,13 +738,13 @@ SELECT CounterID, 2 AS table, sum(Sign) AS c
HAVING c > 0
```
-Only UNION ALL is supported. The regular UNION (UNION DISTINCT) is not supported. If you need UNION DISTINCT, you can write SELECT DISTINCT from a subquery containing UNION ALL.
+Only `UNION ALL` is supported. The normal `UNION` (`UNION DISTINCT`) is not supported. If you need `UNION DISTINCT`, you can write `SELECT DISTINCT` from a subquery containing `UNION ALL`.
-Queries that are parts of UNION ALL can be run simultaneously, and their results can be mixed together.
+Queries that are part of a `UNION ALL` can be run in parallel and their results might be mixed together when returned.
-The structure of results (the number and type of columns) must match for the queries. But the column names can differ. In this case, the column names for the final result will be taken from the first query.
+The structure of results (the number and type of columns) must match for the queries. But the column names can differ. In this case, the column names for the final result will be taken from the first query. Type casting is performed for unions: if two queries being combined have the same field with compatible `Nullable` and non-`Nullable` types, the resulting `UNION ALL` field has a `Nullable` type (see the sketch below).
-Queries that are parts of UNION ALL can't be enclosed in brackets. ORDER BY and LIMIT are applied to separate queries, not to the final result. If you need to apply a conversion to the final result, you can put all the queries with UNION ALL in a subquery in the FROM clause.
+Queries that are parts of `UNION ALL` can't be enclosed in brackets. `ORDER BY` and `LIMIT` are applied to separate queries, not to the final result. If you need to apply a conversion to the final result, you can put all the queries with `UNION ALL` in a subquery in the `FROM` clause.
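+
+A sketch of the `Nullable` type promotion mentioned above (output shown as expected):
+
+```
+SELECT toTypeName(x) FROM (SELECT 1 AS x UNION ALL SELECT NULL AS x)
+
+┌─toTypeName(x)───┐
+│ Nullable(UInt8) │
+│ Nullable(UInt8) │
+└─────────────────┘
+```
+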
### INTO OUTFILE Clause
@@ -652,7 +763,9 @@ If the FORMAT clause is omitted, the default format is used, which depends on bo
When using the command-line client, data is passed to the client in an internal efficient format. The client independently interprets the FORMAT clause of the query and formats the data itself (thus relieving the network and the server from the load).
-### IN Operators
+
+
+### IN operators
The `IN`, `NOT IN`, `GLOBAL IN`, and `GLOBAL NOT IN` operators are covered separately, since their functionality is quite rich.
@@ -716,14 +829,47 @@ ORDER BY EventDate ASC
For each day after March 17th, count the percentage of pageviews made by users who visited the site on March 17th.
A subquery in the IN clause is always run just one time on a single server. There are no dependent subqueries.
+#### NULL processing
+
+During query processing, the `IN` operator assumes that the result of an operation with [NULL](syntax.md#null-literal) is always equal to `0`, regardless of whether `NULL` is on the right or left side of the operator. `NULL` values are not included in any dataset, do not correspond to each other, and cannot be compared.
+
+Here is an example with the `t_null` table:
+
+```
+┌─x─┬────y─┐
+│ 1 │ ᴺᵁᴸᴸ │
+│ 2 │ 3 │
+└───┴──────┘
+```
+
+Running the query `SELECT x FROM t_null WHERE y IN (NULL,3)` gives you the following result:
+
+```
+┌─x─┐
+│ 2 │
+└───┘
+```
+
+You can see that the row in which `y = NULL` is thrown out of the query results. This is because ClickHouse can't decide whether `NULL` is included in the `(NULL,3)` set, returns `0` as the result of the operation, and `SELECT` excludes this row from the final output.
+
+```
+SELECT y IN (NULL, 3)
+FROM t_null
+
+┌─in(y, tuple(NULL, 3))─┐
+│ 0 │
+│ 1 │
+└───────────────────────┘
+```
+
#### Distributed Subqueries
-There are two options for IN-s with subqueries (similar to JOINs): normal `IN` / ` OIN` and `IN GLOBAL` / `GLOBAL JOIN`. They differ in how they are run for distributed query processing.
+There are two options for `IN` operators with subqueries (similar to `JOIN`s): normal `IN` / `JOIN` and `GLOBAL IN` / `GLOBAL JOIN`. They differ in how they are run for distributed query processing.
-!!! attention
- Remember that the algorithms described below may work differently depending on the [settings](../operations/settings/settings.md#settings-distributed_product_mode) `distributed_product_mode` setting.
+!!! attention
+    Remember that the algorithms described below may work differently depending on the [distributed_product_mode](../operations/settings/settings.md#settings-distributed_product_mode) setting.
When using the regular IN, the query is sent to remote servers, and each of them runs the subqueries in the `IN` or `JOIN` clause.
diff --git a/docs/en/query_language/syntax.md b/docs/en/query_language/syntax.md
index 5d7409ec502..cb2847f190e 100644
--- a/docs/en/query_language/syntax.md
+++ b/docs/en/query_language/syntax.md
@@ -35,34 +35,32 @@ Keywords (such as `SELECT`) are not case-sensitive. Everything else (column name
Identifiers (column names, functions, and data types) can be quoted or non-quoted.
Non-quoted identifiers start with a Latin letter or underscore, and continue with a Latin letter, underscore, or number. In other words, they must match the regex `^[a-zA-Z_][0-9a-zA-Z_]*$`. Examples: `x, _1, X_y__Z123_.`
-Quoted identifiers are placed in reversed quotation marks `` `id` `` (the same as in MySQL), and can indicate any set of bytes (non-empty). In addition, symbols (for example, the reverse quotation mark) inside this type of identifier can be backslash-escaped. Escaping rules are the same as for string literals (see below).
+Quoted identifiers are placed in reversed quotation marks `` `id` `` (the same as in MySQL), and can indicate any set of bytes (non-empty). In addition, symbols (for example, the reverse quotation mark) inside this type of identifier can be backslash-escaped. Escaping rules are the same as for string literals (see below).
We recommend using identifiers that do not need to be quoted.
## Literals
-There are numeric literals, string literals, and compound literals.
-
-### Numeric Literals
+### Numeric
A numeric literal is parsed as follows:
-- First as a 64-bit signed number, using the 'strtoull' function.
-- If unsuccessful, as a 64-bit unsigned number, using the 'strtoll' function.
-- If unsuccessful, as a floating-point number using the 'strtod' function.
+- First as a 64-bit signed number, using the `strtoll` function.
+- If unsuccessful, as a 64-bit unsigned number, using the `strtoull` function.
+- If unsuccessful, as a floating-point number using the `strtod` function.
- Otherwise, an error is returned.
The corresponding value will have the smallest type that the value fits in.
-For example, 1 is parsed as UInt8, but 256 is parsed as UInt16. For more information, see "Data types".
+For example, 1 is parsed as `UInt8`, and 256 is parsed as `UInt16`. For more information, see the section [Data types](../data_types/index.md#data_types).
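+
+A quick check (output shown as expected):
+
+```
+SELECT toTypeName(1) AS a, toTypeName(256) AS b
+
+┌─a─────┬─b──────┐
+│ UInt8 │ UInt16 │
+└───────┴────────┘
+```
+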
Examples: `1`, `18446744073709551615`, `0xDEADBEEF`, `01`, `0.1`, `1e100`, `-1e-100`, `inf`, `nan`.
-### String Literals
+### String
-Only string literals in single quotes are supported. The enclosed characters can be backslash-escaped. The following escape sequences have a corresponding special value: `\b`, `\f`, `\r`, `\n`, `\t`, `\0`, `\a`, `\v`, `\xHH`. In all other cases, escape sequences in the format `\c`, where "c" is any character, are converted to "c". This means that you can use the sequences `\'`and`\\`. The value will have the String type.
+Only string literals in single quotes are supported. The enclosed characters can be backslash-escaped. The following escape sequences have a corresponding special value: `\b`, `\f`, `\r`, `\n`, `\t`, `\0`, `\a`, `\v`, `\xHH`. In all other cases, escape sequences in the format `\c`, where "c" is any character, are converted to "c". This means that you can use the sequences `\'`and`\\`. The value will be of type [String](../data_types/string.md#data_types-string).
The minimum set of characters that you need to escape in string literals: `'` and `\`.
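+
+For instance (output shown as expected):
+
+```
+SELECT 'It\'s a backslash: \\' AS s
+
+┌─s───────────────────┐
+│ It's a backslash: \ │
+└─────────────────────┘
+```
+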
-### Compound Literals
+### Compound
Constructions are supported for arrays: `[1, 2, 3]` and tuples: `(1, 'Hello, world!', 2)`.
Actually, these are not literals, but expressions with the array creation operator and the tuple creation operator, respectively.
@@ -70,6 +68,20 @@ For more information, see the section "Operators2".
An array must consist of at least one item, and a tuple must have at least two items.
Tuples have a special purpose for use in the IN clause of a SELECT query. Tuples can be obtained as the result of a query, but they can't be saved to a database (with the exception of Memory-type tables).
+
+
+### NULL
+
+Indicates that the value is missing.
+
+In order to store `NULL` in a table field, it must be of the [Nullable](../data_types/nullable.md#data_type-nullable) type.
+
+Depending on the data format (input or output), `NULL` may have a different representation. For more information, see the documentation for [data formats](../interfaces/formats.md#formats).
+
+There are many nuances to processing `NULL`. For example, if at least one of the arguments of a comparison operation is `NULL`, the result of this operation will also be `NULL`. The same is true for multiplication, addition, and other operations. For more information, read the documentation for each operation.
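+
+For example (output shown as expected):
+
+```
+SELECT NULL = 1 AS eq, NULL + 1 AS sum
+
+┌───eq─┬──sum─┐
+│ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │
+└──────┴──────┘
+```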
+
+In queries, you can check `NULL` using the [IS NULL](operators.md#operator-is-null) and [IS NOT NULL](operators.md#operator-is-not-null) operators and the related functions `isNull` and `isNotNull`.
+
## Functions
Functions are written like an identifier with a list of arguments (possibly empty) in brackets. In contrast to standard SQL, the brackets are required, even for an empty arguments list. Example: `now()`.
@@ -104,4 +116,3 @@ In a `SELECT` query, an asterisk can replace the expression. For more informatio
An expression is a function, identifier, literal, application of an operator, expression in brackets, subquery, or asterisk. It can also contain a synonym.
A list of expressions is one or more expressions separated by commas.
Functions and operators, in turn, can have expressions as arguments.
-
diff --git a/docs/en/query_language/table_functions/file.md b/docs/en/query_language/table_functions/file.md
index 67eb5742988..a8e299f36c7 100644
--- a/docs/en/query_language/table_functions/file.md
+++ b/docs/en/query_language/table_functions/file.md
@@ -2,17 +2,18 @@
# file
-`file(path, format, structure)` - returns a table created from a path file with a format type, with columns specified in structure.
+`file(path, format, structure)` — Returns a table with the columns specified in `structure`, created from the file at `path` in the specified `format`.
-path - a relative path to a file from [user_files_path](../../operations/server_settings/settings.md#user_files_path).
+path — The relative path to the file from [user_files_path](../../operations/server_settings/settings.md#user_files_path).
-format - file [format](../../interfaces/formats.md#formats).
+format — The file data [format](../../interfaces/formats.md#formats).
-structure - table structure in 'UserID UInt64, URL String' format. Determines column names and types.
+structure — The structure of the table in the format 'UserID UInt64, URL String'. Defines the column names and types.
**Example**
```sql
--- getting the first 10 lines of a table that contains 3 columns of UInt32 type from a CSV file
+-- Get the first 10 rows of a table consisting of three UInt32 columns from a CSV file.
SELECT * FROM file('test.csv', 'CSV', 'column1 UInt32, column2 UInt32, column3 UInt32') LIMIT 10
```
+
diff --git a/docs/en/query_language/table_functions/numbers.md b/docs/en/query_language/table_functions/numbers.md
index 4486fece3d1..15e62b2abf1 100644
--- a/docs/en/query_language/table_functions/numbers.md
+++ b/docs/en/query_language/table_functions/numbers.md
@@ -1,22 +1,22 @@
# numbers
-`numbers(N)` – Returns a table with the single 'number' column (UInt64) that contains integers from 0 to N-1.
-`numbers(N, M)` - Returns a table with the single 'number' column (UInt64) that contains integers from N to (N + M - 1).
+`numbers(N)` — Returns a table with the single `number` column (UInt64) that contains integers from `0` to `N-1`.
+`numbers(N, M)` — Returns a table with the single `number` column (UInt64) that contains integers from `N` to `(N + M - 1)`.
-Similar to the `system.numbers` table, it can be used for testing and generating successive values, `numbers(N, M)` more efficient than `system.numbers`.
+Similar to the `system.numbers` table, it can be used for testing and generating successive values. The `numbers(N, M)` function is more efficient than a selection from `system.numbers`.
The following queries are equivalent:
```sql
SELECT * FROM numbers(10);
-SELECT * FROM numbers(0, 10);
+SELECT * FROM numbers(0, 10);
SELECT * FROM system.numbers LIMIT 10;
```
Examples:
```sql
--- Generate a sequence of dates from 2010-01-01 to 2010-12-31
+-- generate a sequence of all dates from 2010-01-01 to 2010-12-31
select toDate('2010-01-01') + number as d FROM numbers(365);
```
diff --git a/docs/en/query_language/table_functions/remote.md b/docs/en/query_language/table_functions/remote.md
index 425c6f81a7d..c929176fee1 100644
--- a/docs/en/query_language/table_functions/remote.md
+++ b/docs/en/query_language/table_functions/remote.md
@@ -13,8 +13,8 @@ remote('addresses_expr', db.table[, 'user'[, 'password']])
`addresses_expr` – An expression that generates addresses of remote servers. This may be just one server address. The server address is `host:port`, or just `host`. The host can be specified as the server name, or as the IPv4 or IPv6 address. An IPv6 address is specified in square brackets. The port is the TCP port on the remote server. If the port is omitted, it uses `tcp_port` from the server's config file (by default, 9000).
-!!! important
- The port is required for an IPv6 address.
+!!! important
+    The port is required for an IPv6 address.
Examples:
@@ -49,7 +49,7 @@ example01-{01..02}-1
If you have multiple pairs of curly brackets, it generates the direct product of the corresponding sets.
-Addresses and parts of addresses in curly brackets can be separated by the pipe symbol (|). In this case, the corresponding sets of addresses are interpreted as replicas, and the query will be sent to the first healthy replica. However, the replicas are iterated in the order currently set in the [load_balancing](../../operations/settings/settings.md#settings-load_balancing) setting.
+Addresses and parts of addresses in curly brackets can be separated by the pipe symbol (|). In this case, the corresponding sets of addresses are interpreted as replicas, and the query will be sent to the first healthy replica. The replicas are evaluated in the order currently set in the [load_balancing](../../operations/settings/settings.md#settings-load_balancing) setting.
Example:
@@ -72,4 +72,3 @@ The `remote` table function can be useful in the following cases:
If the user is not specified, `default` is used.
If the password is not specified, an empty password is used.
-
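A minimal sketch of the shard/replica patterns this hunk documents; the host names and the `merge.hits` table are placeholders, not part of the patch:

```sql
-- Two shards (example01-01, example01-02), each with replicas -1 and -2.
-- Within each shard, the query goes to the first healthy replica.
SELECT count() FROM remote('example01-{01..02}-{1|2}', merge.hits)
```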
diff --git a/docs/en/query_language/table_functions/url.md b/docs/en/query_language/table_functions/url.md
index 7e30936bd45..3f2a706e208 100644
--- a/docs/en/query_language/table_functions/url.md
+++ b/docs/en/query_language/table_functions/url.md
@@ -2,18 +2,20 @@
# url
-`url(URL, format, structure)` - returns a table created from the `URL` with given
-`format` and `structure`.
+`url(URL, format, structure)` — Returns a table with the columns specified in
+`structure`, created from data located at `URL` in the specified `format`.
-URL - HTTP or HTTPS server address, which can accept `GET` and/or `POST` requests.
+URL — The address of a server that accepts `GET` and/or `POST` requests
+over HTTP or HTTPS.
-format - [format](../../interfaces/formats.md#formats) of the data.
+format — The data [format](../../interfaces/formats.md#formats).
-structure - table structure in `'UserID UInt64, Name String'` format. Determines column names and types.
+structure — The structure of the table in the format `'UserID UInt64, Name String'`. Defines the column names and types.
**Example**
```sql
--- getting the first 3 lines of a table that contains columns of String and UInt32 type from HTTP-server which answers in CSV format.
+-- Get the first 3 rows of a table consisting of two columns (String and UInt32) from a server that returns data in CSV format.
SELECT * FROM url('http://127.0.0.1:12345/', CSV, 'column1 String, column2 UInt32') LIMIT 3
```
+
diff --git a/docs/en/roadmap.md b/docs/en/roadmap.md
index 51b6df94107..21520f92693 100644
--- a/docs/en/roadmap.md
+++ b/docs/en/roadmap.md
@@ -2,19 +2,20 @@
## Q3 2018
-- `ALTER UPDATE` for batch changing the data with approach similar to `ALTER DELETE`
-- Protobuf and Parquet input and output formats
-- Improved compatibility with Tableau and other BI tools
+- `ALTER UPDATE` for bulk changes to data, using an approach similar to `ALTER DELETE`
+- Adding Protobuf and Parquet as supported input/output formats
+- Improved compatibility with Tableau and other business intelligence tools
## Q4 2018
-- JOIN syntax compatible with SQL standard:
- - Mutliple `JOIN`s in single `SELECT`
- - Connecting tables with `ON`
- - Support table reference instead of subquery
+- JOIN syntax that conforms to the SQL standard:
+ - Multiple `JOIN`s in a single `SELECT`
+ - Setting the relationship between tables via `ON`
+ - Ability to refer to the table name instead of using a subquery
-- JOIN execution improvements:
- - Distributed join not limited by memory
- - Predicate pushdown through join
+- Improvements in the performance of JOIN:
+ - Distributed JOIN not limited by RAM
+ - Predicate pushdown through JOIN (for predicates that depend on only one side)
+
+- Resource pools for more accurate distribution of cluster capacity between its users
-- Resource pools for more precise distribution of cluster capacity between users
diff --git a/docs/ru/operations/system_tables.md b/docs/ru/operations/system_tables.md
index 84fcb14092b..ee03d00ac34 100644
--- a/docs/ru/operations/system_tables.md
+++ b/docs/ru/operations/system_tables.md
@@ -122,7 +122,7 @@ default_expression String - выражение для значения по ум
## system.parts
-Содержит информацию о кусках таблиц семейства [MergeTree](../operations/table_engines/mergetree.md#table_engines-mergetree).
+Содержит информацию о кусках таблиц семейства [MergeTree](table_engines/mergetree.md#table_engines-mergetree).
Каждая строка описывает один кусок данных.
@@ -134,7 +134,7 @@ default_expression String - выражение для значения по ум
- `YYYYMM` для автоматической схемы партиционирования по месяцам.
- `any_string` при партиционировании вручную.
-
+
- name (String) - Имя куска.
- active (UInt8) - Признак активности. Если кусок активен, то он используется таблице, в противном случает он будет удален. Неактивные куски остаются после слияний.
- marks (UInt64) - Количество засечек. Чтобы получить примерное количество строк в куске, умножьте ``marks`` на гранулированность индекса (обычно 8192).
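For context, a hypothetical query against the `system.parts` columns described above (the database and table names are invented):

```sql
-- Approximate row count of active parts: marks times the index granularity (8192 by default).
SELECT name, marks * 8192 AS approx_rows
FROM system.parts
WHERE database = 'test' AND table = 'hits' AND active
```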
diff --git a/docs/ru/operations/table_engines/file.md b/docs/ru/operations/table_engines/file.md
index a4672929d72..f6573676822 100644
--- a/docs/ru/operations/table_engines/file.md
+++ b/docs/ru/operations/table_engines/file.md
@@ -68,7 +68,7 @@ SELECT * FROM file_engine_table
$ echo -e "1,2\n3,4" | clickhouse-local -q "CREATE TABLE table (a Int64, b Int64) ENGINE = File(CSV, stdin); SELECT a, b FROM table; DROP TABLE table"
```
-## Особенности использования
+## Детали реализации
- Поддерживается многопоточное чтение и однопоточная запись.
- Не поддерживается:
diff --git a/docs/ru/operations/utils/clickhouse-local.md b/docs/ru/operations/utils/clickhouse-local.md
index 60ab2b0a8e8..ddaa64e0a21 100644
--- a/docs/ru/operations/utils/clickhouse-local.md
+++ b/docs/ru/operations/utils/clickhouse-local.md
@@ -24,7 +24,7 @@ clickhouse-local --structure "table_structure" --input-format "format_of_incomin
- `-S`, `--structure` — структура таблицы, в которую будут помещены входящие данные.
- `-if`, `--input-format` — формат входящих данных. По умолчанию — `TSV`.
- `-f`, `--file` — путь к файлу с данными. По умолчанию — `stdin`.
-- `-q` `--query` — запросы на выполнение. Разделитель запросов — `;`.
+- `-q`, `--query` — запросы на выполнение. Разделитель запросов — `;`.
- `-N`, `--table` — имя таблицы, в которую будут помещены входящие данные. По умолчанию - `table`.
- `-of`, `--format`, `--output-format` — формат выходных данных. По умолчанию — `TSV`.
- `--stacktrace` — вывод отладочной информации при исключениях.
diff --git a/docs/ru/query_language/syntax.md b/docs/ru/query_language/syntax.md
index 192b314cfbf..95609b37109 100644
--- a/docs/ru/query_language/syntax.md
+++ b/docs/ru/query_language/syntax.md
@@ -35,7 +35,7 @@ INSERT INTO t VALUES (1, 'Hello, world'), (2, 'abc'), (3, 'def')
Идентификаторы (имена столбцов, функций, типов данных) могут быть квотированными или не квотированными.
Не квотированные идентификаторы начинаются на букву латинского алфавита или подчёркивание; продолжаются на букву латинского алфавита или подчёркивание или цифру. Короче говоря, должны соответствовать регулярному выражению `^[a-zA-Z_][0-9a-zA-Z_]*$`. Примеры: `x, _1, X_y__Z123_.`
Квотированные идентификаторы расположены в обратных кавычках `` `id` `` (также, как в MySQL), и могут обозначать произвольный (непустой) набор байт. При этом, внутри записи такого идентификатора, символы (например, символ обратной кавычки) могут экранироваться с помощью обратного слеша. Правила экранирования такие же, как в строковых литералах (см. ниже).
Рекомендуется использовать идентификаторы, которые не нужно квотировать.
## Литералы
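A short sketch of the quoted-identifier rules this hunk touches (the table and column names are invented for illustration):

```sql
-- Unquoted identifiers must match ^[a-zA-Z_][0-9a-zA-Z_]*$; anything else needs backticks.
CREATE TABLE test.`1 strange name` (`x-y` UInt8) ENGINE = Memory;
SELECT `x-y` FROM test.`1 strange name`;
```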
diff --git a/docs/toc_en.yml b/docs/toc_en.yml
index 7c4f99921d7..99e9fb3076f 100644
--- a/docs/toc_en.yml
+++ b/docs/toc_en.yml
@@ -40,6 +40,7 @@ nav:
- 'Array(T)': 'data_types/array.md'
- 'AggregateFunction(name, types_of_arguments...)': 'data_types/nested_data_structures/aggregatefunction.md'
- 'Tuple(T1, T2, ...)': 'data_types/tuple.md'
+ - 'Nullable': 'data_types/nullable.md'
- 'Nested data structures':
- 'hidden': 'data_types/nested_data_structures/index.md'
- 'Nested(Name1 Type1, Name2 Type2, ...)': 'data_types/nested_data_structures/nested.md'
@@ -47,6 +48,7 @@ nav:
- 'hidden': 'data_types/special_data_types/index.md'
- 'Expression': 'data_types/special_data_types/expression.md'
- 'Set': 'data_types/special_data_types/set.md'
+ - 'Nothing': 'data_types/special_data_types/nothing.md'
- 'SQL reference':
- 'hidden': 'query_language/index.md'
@@ -83,6 +85,8 @@ nav:
- 'Functions for working with Yandex.Metrica dictionaries': 'query_language/functions/ym_dict_functions.md'
- 'Functions for implementing the IN operator': 'query_language/functions/in_functions.md'
- 'arrayJoin function': 'query_language/functions/array_join.md'
+ - 'Functions for working with geographical coordinates': 'query_language/functions/geo.md'
+ - 'Functions for working with Nullable arguments': 'query_language/functions/functions_for_nulls.md'
- 'Aggregate functions':
- 'Introduction': 'query_language/agg_functions/index.md'
- 'Function reference': 'query_language/agg_functions/reference.md'
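Since this hunk adds the `Nullable` page to the TOC, a one-line sanity check of the type, assuming the `toNullable` and `toTypeName` functions:

```sql
SELECT toNullable(42) AS x, toTypeName(x), isNull(x);  -- 42, Nullable(UInt8), 0
```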
From 1ea6bc969103aeba26e345b7108b76d499fc72c6 Mon Sep 17 00:00:00 2001
From: chertus
Date: Tue, 4 Sep 2018 18:04:23 +0300
Subject: [PATCH 17/25] decimal_check_overflow [CLICKHOUSE-3765]
---
dbms/src/DataTypes/DataTypesDecimal.cpp | 4 ++--
dbms/src/Interpreters/Settings.h | 3 +--
dbms/tests/queries/0_stateless/00700_decimal_arithm.sql | 2 +-
3 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/dbms/src/DataTypes/DataTypesDecimal.cpp b/dbms/src/DataTypes/DataTypesDecimal.cpp
index 5e78c980848..9f81107eb68 100644
--- a/dbms/src/DataTypes/DataTypesDecimal.cpp
+++ b/dbms/src/DataTypes/DataTypesDecimal.cpp
@@ -20,8 +20,8 @@ namespace ErrorCodes
}
-bool decimalCheckComparisonOverflow(const Context & context) { return context.getSettingsRef().decimal_check_comparison_overflow; }
-bool decimalCheckArithmeticOverflow(const Context & context) { return context.getSettingsRef().decimal_check_arithmetic_overflow; }
+bool decimalCheckComparisonOverflow(const Context & context) { return context.getSettingsRef().decimal_check_overflow; }
+bool decimalCheckArithmeticOverflow(const Context & context) { return context.getSettingsRef().decimal_check_overflow; }
//
diff --git a/dbms/src/Interpreters/Settings.h b/dbms/src/Interpreters/Settings.h
index d2afd5bb036..da3b7364d36 100644
--- a/dbms/src/Interpreters/Settings.h
+++ b/dbms/src/Interpreters/Settings.h
@@ -281,8 +281,7 @@ struct Settings
M(SettingBool, low_cardinality_use_single_dictionary_for_part, false, "LowCardinality type serialization setting. If is true, than will use additional keys when global dictionary overflows. Otherwise, will create several shared dictionaries.") \
M(SettingBool, allow_experimental_low_cardinality_type, false, "Allows to create table with LowCardinality types.") \
M(SettingBool, allow_experimental_decimal_type, false, "Enables Decimal data type.") \
- M(SettingBool, decimal_check_comparison_overflow, true, "Check overflow of decimal comparison operations") \
- M(SettingBool, decimal_check_arithmetic_overflow, true, "Check overflow of decimal arithmetic operations") \
+ M(SettingBool, decimal_check_overflow, true, "Check overflow of decimal arithmetic/comparison operations") \
\
M(SettingBool, prefer_localhost_replica, 1, "1 - always send query to local replica, if it exists. 0 - choose replica to send query between local and remote ones according to load_balancing") \
M(SettingUInt64, max_fetch_partition_retries_count, 5, "Amount of retries while fetching partition from another host.") \
diff --git a/dbms/tests/queries/0_stateless/00700_decimal_arithm.sql b/dbms/tests/queries/0_stateless/00700_decimal_arithm.sql
index f0802b7e83a..7ebd189b690 100644
--- a/dbms/tests/queries/0_stateless/00700_decimal_arithm.sql
+++ b/dbms/tests/queries/0_stateless/00700_decimal_arithm.sql
@@ -58,7 +58,7 @@ SELECT 21 + j, 21 - j, 84 - j, 21 * j, -21 * j, 21 / j, 84 / j FROM test.decimal
SELECT a, -a, -b, -c, -d, -e, -f, -g, -h, -j from test.decimal ORDER BY a;
SELECT abs(a), abs(b), abs(c), abs(d), abs(e), abs(f), abs(g), abs(h), abs(j) from test.decimal ORDER BY a;
-SET decimal_check_arithmetic_overflow = 0;
+SET decimal_check_overflow = 0;
SELECT (h * h) != 0, (h / h) != 1 FROM test.decimal WHERE h > 0;
SELECT (i * i) != 0, (i / i) = 1 FROM test.decimal WHERE i > 0;
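A hedged sketch of the merged setting in use; the operands are chosen only to force a Decimal32 overflow:

```sql
SET allow_experimental_decimal_type = 1;
-- One flag now guards both arithmetic and comparison overflow checks.
SET decimal_check_overflow = 0;
-- Scale 5 times scale 5 yields scale 10, which does not fit into Decimal32;
-- with the check disabled, the result may silently wrap instead of throwing.
SELECT toDecimal32(1.1, 5) * toDecimal32(2.2, 5);
```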
From e11f3ea5bc50ef663db1716205ceb50145be967e Mon Sep 17 00:00:00 2001
From: chertus
Date: Tue, 4 Sep 2018 21:51:44 +0300
Subject: [PATCH 18/25] enable nullable for Decimal [CLICKHOUSE-3765]
---
dbms/src/Columns/ColumnDecimal.h | 2 +-
dbms/src/DataTypes/DataTypesDecimal.h | 2 +-
.../0_stateless/00700_decimal_casts.sql | 8 ---
.../0_stateless/00700_decimal_null.reference | 39 +++++++++++
.../0_stateless/00700_decimal_null.sql | 65 +++++++++++++++++++
5 files changed, 106 insertions(+), 10 deletions(-)
create mode 100644 dbms/tests/queries/0_stateless/00700_decimal_null.reference
create mode 100644 dbms/tests/queries/0_stateless/00700_decimal_null.sql
diff --git a/dbms/src/Columns/ColumnDecimal.h b/dbms/src/Columns/ColumnDecimal.h
index 04a1b7a1a56..21de35ce845 100644
--- a/dbms/src/Columns/ColumnDecimal.h
+++ b/dbms/src/Columns/ColumnDecimal.h
@@ -77,7 +77,7 @@ public:
const char * getFamilyName() const override { return TypeName<T>::get(); }
bool isNumeric() const override { return false; }
- bool canBeInsideNullable() const override { return false; }
+ bool canBeInsideNullable() const override { return true; }
bool isFixedAndContiguous() const override { return true; }
size_t sizeOfValueIfFixed() const override { return sizeof(T); }
diff --git a/dbms/src/DataTypes/DataTypesDecimal.h b/dbms/src/DataTypes/DataTypesDecimal.h
index cb47ab57e11..f9e5fc01f6f 100644
--- a/dbms/src/DataTypes/DataTypesDecimal.h
+++ b/dbms/src/DataTypes/DataTypesDecimal.h
@@ -148,7 +148,7 @@ public:
bool canBeUsedInBooleanContext() const override { return true; }
bool isNumber() const override { return true; }
bool isInteger() const override { return false; }
- bool canBeInsideNullable() const override { return false; }
+ bool canBeInsideNullable() const override { return true; }
/// Decimal specific
diff --git a/dbms/tests/queries/0_stateless/00700_decimal_casts.sql b/dbms/tests/queries/0_stateless/00700_decimal_casts.sql
index 21350861343..f2d0d63ffc2 100644
--- a/dbms/tests/queries/0_stateless/00700_decimal_casts.sql
+++ b/dbms/tests/queries/0_stateless/00700_decimal_casts.sql
@@ -1,14 +1,6 @@
SET allow_experimental_decimal_type = 1;
SET send_logs_level = 'none';
-CREATE TABLE IF NOT EXISTS test.x (a Nullable(Decimal(9, 2))) ENGINE = Memory; -- { serverError 43 }
-CREATE TABLE IF NOT EXISTS test.x (a Nullable(Decimal(18, 2))) ENGINE = Memory; -- { serverError 43 }
-CREATE TABLE IF NOT EXISTS test.x (a Nullable(Decimal(38, 2))) ENGINE = Memory; -- { serverError 43 }
-
-SELECT toNullable(toDecimal32(0, 0)); -- { serverError 43 }
-SELECT toNullable(toDecimal64(0, 0)); -- { serverError 43 }
-SELECT toNullable(toDecimal128(0, 0)); -- { serverError 43 }
-
SELECT toDecimal32('1.1', 1), toDecimal32('1.1', 2), toDecimal32('1.1', 8);
SELECT toDecimal32('1.1', 0); -- { serverError 69 }
SELECT toDecimal32(1.1, 0), toDecimal32(1.1, 1), toDecimal32(1.1, 2), toDecimal32(1.1, 8);
diff --git a/dbms/tests/queries/0_stateless/00700_decimal_null.reference b/dbms/tests/queries/0_stateless/00700_decimal_null.reference
new file mode 100644
index 00000000000..250a437a883
--- /dev/null
+++ b/dbms/tests/queries/0_stateless/00700_decimal_null.reference
@@ -0,0 +1,39 @@
+32 32
+64 64
+128 128
+1 1 1
+2 2 2
+3 3 3
+4 4 4
+5 5 5
+6 6 6
+7 7 7
+8 8 8
+\N \N
+\N \N
+1 1
+1 1
+\N
+\N
+1
+1
+1.10 1.10000 1.10000 1.1000 1.10000000 1.10000000
+2.20 2.20000 2.20000 2.2000 \N \N
+3.30 3.30000 3.30000 \N 3.30000000 \N
+4.40 4.40000 4.40000 \N \N 4.40000000
+5.50 5.50000 5.50000 \N \N \N
+0 1
+0 1
+0 1
+1 0
+1 0
+1 0
+5
+5
+5
+3
+3
+3
+2
+2
+2
diff --git a/dbms/tests/queries/0_stateless/00700_decimal_null.sql b/dbms/tests/queries/0_stateless/00700_decimal_null.sql
new file mode 100644
index 00000000000..abc4ef86890
--- /dev/null
+++ b/dbms/tests/queries/0_stateless/00700_decimal_null.sql
@@ -0,0 +1,65 @@
+SET send_logs_level = 'none';
+SET allow_experimental_decimal_type = 1;
+
+CREATE DATABASE IF NOT EXISTS test;
+DROP TABLE IF EXISTS test.decimal;
+
+CREATE TABLE IF NOT EXISTS test.decimal
+(
+ a DEC(9, 2),
+ b DEC(18, 5),
+ c DEC(38, 5),
+ d Nullable(DEC(9, 4)),
+ e Nullable(DEC(18, 8)),
+ f Nullable(DEC(38, 8))
+) ENGINE = Memory;
+
+SELECT toNullable(toDecimal32(32, 0)) AS x, assumeNotNull(x);
+SELECT toNullable(toDecimal64(64, 0)) AS x, assumeNotNull(x);
+SELECT toNullable(toDecimal128(128, 0)) AS x, assumeNotNull(x);
+SELECT NULL AS x, assumeNotNull(x); -- { serverError 48 }
+
+SELECT ifNull(toDecimal32(1, 0), NULL), ifNull(toDecimal64(1, 0), NULL), ifNull(toDecimal128(1, 0), NULL);
+SELECT ifNull(toNullable(toDecimal32(2, 0)), NULL), ifNull(toNullable(toDecimal64(2, 0)), NULL), ifNull(toNullable(toDecimal128(2, 0)), NULL);
+SELECT ifNull(NULL, toDecimal32(3, 0)), ifNull(NULL, toDecimal64(3, 0)), ifNull(NULL, toDecimal128(3, 0));
+SELECT ifNull(NULL, toNullable(toDecimal32(4, 0))), ifNull(NULL, toNullable(toDecimal64(4, 0))), ifNull(NULL, toNullable(toDecimal128(4, 0)));
+
+SELECT coalesce(toDecimal32(5, 0), NULL), coalesce(toDecimal64(5, 0), NULL), coalesce(toDecimal128(5, 0), NULL);
+SELECT coalesce(NULL, toDecimal32(6, 0)), coalesce(NULL, toDecimal64(6, 0)), coalesce(NULL, toDecimal128(6, 0));
+
+SELECT coalesce(toNullable(toDecimal32(7, 0)), NULL), coalesce(toNullable(toDecimal64(7, 0)), NULL), coalesce(toNullable(toDecimal128(7, 0)), NULL);
+SELECT coalesce(NULL, toNullable(toDecimal32(8, 0))), coalesce(NULL, toNullable(toDecimal64(8, 0))), coalesce(NULL, toNullable(toDecimal128(8, 0)));
+
+SELECT nullIf(toNullable(toDecimal32(1, 0)), toDecimal32(1, 0)), nullIf(toNullable(toDecimal64(1, 0)), toDecimal64(1, 0));
+SELECT nullIf(toDecimal32(1, 0), toNullable(toDecimal32(1, 0))), nullIf(toDecimal64(1, 0), toNullable(toDecimal64(1, 0)));
+SELECT nullIf(toNullable(toDecimal32(1, 0)), toDecimal32(2, 0)), nullIf(toNullable(toDecimal64(1, 0)), toDecimal64(2, 0));
+SELECT nullIf(toDecimal32(1, 0), toNullable(toDecimal32(2, 0))), nullIf(toDecimal64(1, 0), toNullable(toDecimal64(2, 0)));
+SELECT nullIf(toNullable(toDecimal128(1, 0)), toDecimal128(1, 0));
+SELECT nullIf(toDecimal128(1, 0), toNullable(toDecimal128(1, 0)));
+SELECT nullIf(toNullable(toDecimal128(1, 0)), toDecimal128(2, 0));
+SELECT nullIf(toDecimal128(1, 0), toNullable(toDecimal128(2, 0)));
+
+INSERT INTO test.decimal (a, b, c, d, e, f) VALUES (1.1, 1.1, 1.1, 1.1, 1.1, 1.1);
+INSERT INTO test.decimal (a, b, c, d) VALUES (2.2, 2.2, 2.2, 2.2);
+INSERT INTO test.decimal (a, b, c, e) VALUES (3.3, 3.3, 3.3, 3.3);
+INSERT INTO test.decimal (a, b, c, f) VALUES (4.4, 4.4, 4.4, 4.4);
+INSERT INTO test.decimal (a, b, c) VALUES (5.5, 5.5, 5.5);
+
+SELECT * FROM test.decimal ORDER BY d, e, f;
+SELECT isNull(a), isNotNull(a) FROM test.decimal WHERE a = toDecimal32(5.5, 1);
+SELECT isNull(b), isNotNull(b) FROM test.decimal WHERE a = toDecimal32(5.5, 1);
+SELECT isNull(c), isNotNull(c) FROM test.decimal WHERE a = toDecimal32(5.5, 1);
+SELECT isNull(d), isNotNull(d) FROM test.decimal WHERE a = toDecimal32(5.5, 1);
+SELECT isNull(e), isNotNull(e) FROM test.decimal WHERE a = toDecimal32(5.5, 1);
+SELECT isNull(f), isNotNull(f) FROM test.decimal WHERE a = toDecimal32(5.5, 1);
+SELECT count() FROM test.decimal WHERE a IS NOT NULL;
+SELECT count() FROM test.decimal WHERE b IS NOT NULL;
+SELECT count() FROM test.decimal WHERE c IS NOT NULL;
+SELECT count() FROM test.decimal WHERE d IS NULL;
+SELECT count() FROM test.decimal WHERE e IS NULL;
+SELECT count() FROM test.decimal WHERE f IS NULL;
+SELECT count() FROM test.decimal WHERE d IS NULL AND e IS NULL;
+SELECT count() FROM test.decimal WHERE d IS NULL AND f IS NULL;
+SELECT count() FROM test.decimal WHERE e IS NULL AND f IS NULL;
+
+DROP TABLE IF EXISTS test.decimal;
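The gist of the new test in three lines, as a minimal sketch rather than extra coverage:

```sql
SET allow_experimental_decimal_type = 1;
SELECT toNullable(toDecimal32(32, 0)) AS x, toTypeName(x), assumeNotNull(x);
-- Before this patch, the toNullable call failed (canBeInsideNullable was false).
```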
From d10f9200b9f1d72b12be3541caa936e1e5e839ad Mon Sep 17 00:00:00 2001
From: Alexey Milovidov
Date: Tue, 4 Sep 2018 22:24:45 +0300
Subject: [PATCH 19/25] Consistency of FileOpen event [#CLICKHOUSE-3943]
---
dbms/src/Common/CounterInFile.h | 4 ++--
dbms/src/Common/StatusFile.cpp | 2 +-
dbms/src/IO/MMapReadBufferFromFile.cpp | 1 -
dbms/src/IO/ReadBufferFromFile.cpp | 2 +-
dbms/src/IO/WriteBufferAIO.cpp | 2 ++
dbms/src/IO/WriteBufferFromFile.cpp | 2 +-
6 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/dbms/src/Common/CounterInFile.h b/dbms/src/Common/CounterInFile.h
index 2e7afaa79de..6ea34362a59 100644
--- a/dbms/src/Common/CounterInFile.h
+++ b/dbms/src/Common/CounterInFile.h
@@ -54,7 +54,7 @@ public:
"You must create it manulally with appropriate value or 0 for first start.");
}
- int fd = open(path.c_str(), O_RDWR | O_CREAT, 0666);
+ int fd = ::open(path.c_str(), O_RDWR | O_CREAT, 0666);
if (-1 == fd)
DB::throwFromErrno("Cannot open file " + path);
@@ -128,7 +128,7 @@ public:
{
bool file_exists = Poco::File(path).exists();
- int fd = open(path.c_str(), O_RDWR | O_CREAT, 0666);
+ int fd = ::open(path.c_str(), O_RDWR | O_CREAT, 0666);
if (-1 == fd)
DB::throwFromErrno("Cannot open file " + path);
diff --git a/dbms/src/Common/StatusFile.cpp b/dbms/src/Common/StatusFile.cpp
index 4da9ea48aa7..c34ef553f89 100644
--- a/dbms/src/Common/StatusFile.cpp
+++ b/dbms/src/Common/StatusFile.cpp
@@ -40,7 +40,7 @@ StatusFile::StatusFile(const std::string & path_)
LOG_INFO(&Logger::get("StatusFile"), "Status file " << path << " already exists and is empty - probably unclean hardware restart.");
}
- fd = open(path.c_str(), O_WRONLY | O_CREAT, 0666);
+ fd = ::open(path.c_str(), O_WRONLY | O_CREAT, 0666);
if (-1 == fd)
throwFromErrno("Cannot open file " + path);
diff --git a/dbms/src/IO/MMapReadBufferFromFile.cpp b/dbms/src/IO/MMapReadBufferFromFile.cpp
index d7c4229fb56..74c07c40782 100644
--- a/dbms/src/IO/MMapReadBufferFromFile.cpp
+++ b/dbms/src/IO/MMapReadBufferFromFile.cpp
@@ -9,7 +9,6 @@
namespace ProfileEvents
{
extern const Event FileOpen;
- extern const Event FileOpenFailed;
}
namespace DB
diff --git a/dbms/src/IO/ReadBufferFromFile.cpp b/dbms/src/IO/ReadBufferFromFile.cpp
index 827970fdda4..f3b4ddb7d23 100644
--- a/dbms/src/IO/ReadBufferFromFile.cpp
+++ b/dbms/src/IO/ReadBufferFromFile.cpp
@@ -38,7 +38,7 @@ ReadBufferFromFile::ReadBufferFromFile(
if (o_direct)
flags = flags & ~O_DIRECT;
#endif
- fd = open(file_name.c_str(), flags == -1 ? O_RDONLY : flags);
+ fd = ::open(file_name.c_str(), flags == -1 ? O_RDONLY : flags);
if (-1 == fd)
throwFromErrno("Cannot open file " + file_name, errno == ENOENT ? ErrorCodes::FILE_DOESNT_EXIST : ErrorCodes::CANNOT_OPEN_FILE);
diff --git a/dbms/src/IO/WriteBufferAIO.cpp b/dbms/src/IO/WriteBufferAIO.cpp
index 860383ca02e..9d46567ffd3 100644
--- a/dbms/src/IO/WriteBufferAIO.cpp
+++ b/dbms/src/IO/WriteBufferAIO.cpp
@@ -47,6 +47,8 @@ WriteBufferAIO::WriteBufferAIO(const std::string & filename_, size_t buffer_size
flush_buffer(BufferWithOwnMemory<WriteBuffer>(this->memory.size(), nullptr, DEFAULT_AIO_FILE_BLOCK_SIZE)),
filename(filename_)
{
+ ProfileEvents::increment(ProfileEvents::FileOpen);
+
/// Correct the buffer size information so that additional pages do not touch the base class `BufferBase`.
this->buffer().resize(this->buffer().size() - DEFAULT_AIO_FILE_BLOCK_SIZE);
this->internalBuffer().resize(this->internalBuffer().size() - DEFAULT_AIO_FILE_BLOCK_SIZE);
diff --git a/dbms/src/IO/WriteBufferFromFile.cpp b/dbms/src/IO/WriteBufferFromFile.cpp
index 6f8ec2c5d56..7299d651acc 100644
--- a/dbms/src/IO/WriteBufferFromFile.cpp
+++ b/dbms/src/IO/WriteBufferFromFile.cpp
@@ -41,7 +41,7 @@ WriteBufferFromFile::WriteBufferFromFile(
flags = flags & ~O_DIRECT;
#endif
- fd = open(file_name.c_str(), flags == -1 ? O_WRONLY | O_TRUNC | O_CREAT : flags, mode);
+ fd = ::open(file_name.c_str(), flags == -1 ? O_WRONLY | O_TRUNC | O_CREAT : flags, mode);
if (-1 == fd)
throwFromErrno("Cannot open file " + file_name, errno == ENOENT ? ErrorCodes::FILE_DOESNT_EXIST : ErrorCodes::CANNOT_OPEN_FILE);
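One way to observe the counter this patch makes consistent, assuming access to the `system.events` table:

```sql
-- FileOpen should now tick for AIO writes as well as for ordinary opens.
SELECT event, value FROM system.events WHERE event = 'FileOpen';
```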
From 2332bf1a50454abce3d491b855e3918fe3697e74 Mon Sep 17 00:00:00 2001
From: Alexey Milovidov
Date: Tue, 4 Sep 2018 22:34:34 +0300
Subject: [PATCH 20/25] Code consistency [#CLICKHOUSE-3943]
---
dbms/src/IO/CachedCompressedReadBuffer.cpp | 1 +
dbms/src/IO/CachedCompressedReadBuffer.h | 2 +-
dbms/src/IO/createReadBufferFromFileBase.cpp | 2 +-
dbms/src/IO/createReadBufferFromFileBase.h | 15 ++++++++-------
dbms/src/IO/createWriteBufferFromFileBase.cpp | 8 ++++----
dbms/src/IO/createWriteBufferFromFileBase.h | 19 +++++++++++--------
6 files changed, 26 insertions(+), 21 deletions(-)
diff --git a/dbms/src/IO/CachedCompressedReadBuffer.cpp b/dbms/src/IO/CachedCompressedReadBuffer.cpp
index a9f6a5d778c..6f6836718f4 100644
--- a/dbms/src/IO/CachedCompressedReadBuffer.cpp
+++ b/dbms/src/IO/CachedCompressedReadBuffer.cpp
@@ -1,3 +1,4 @@
+#include
#include
#include
#include
diff --git a/dbms/src/IO/CachedCompressedReadBuffer.h b/dbms/src/IO/CachedCompressedReadBuffer.h
index 9be9dd01b1f..1b5e41972f3 100644
--- a/dbms/src/IO/CachedCompressedReadBuffer.h
+++ b/dbms/src/IO/CachedCompressedReadBuffer.h
@@ -2,7 +2,7 @@
#include
#include
-#include
+#include
#include
#include
#include
diff --git a/dbms/src/IO/createReadBufferFromFileBase.cpp b/dbms/src/IO/createReadBufferFromFileBase.cpp
index b16189c9e5d..7db36924201 100644
--- a/dbms/src/IO/createReadBufferFromFileBase.cpp
+++ b/dbms/src/IO/createReadBufferFromFileBase.cpp
@@ -35,7 +35,7 @@ std::unique_ptr<ReadBufferFromFileBase> createReadBufferFromFileBase(const std::
ProfileEvents::increment(ProfileEvents::CreatedReadBufferAIO);
return std::make_unique<ReadBufferAIO>(filename_, buffer_size_, flags_, existing_memory_);
#else
- throw Exception("AIO is not implemented yet on MacOS X", ErrorCodes::NOT_IMPLEMENTED);
+ throw Exception("AIO is not implemented yet on non-Linux OS", ErrorCodes::NOT_IMPLEMENTED);
#endif
}
}
diff --git a/dbms/src/IO/createReadBufferFromFileBase.h b/dbms/src/IO/createReadBufferFromFileBase.h
index 8b5e94c03e5..fa98e536a46 100644
--- a/dbms/src/IO/createReadBufferFromFileBase.h
+++ b/dbms/src/IO/createReadBufferFromFileBase.h
@@ -15,12 +15,13 @@ namespace DB
* If aio_threshold = 0 or estimated_size < aio_threshold, read operations are executed synchronously.
* Otherwise, the read operations are performed asynchronously.
*/
-std::unique_ptr<ReadBufferFromFileBase> createReadBufferFromFileBase(const std::string & filename_,
- size_t estimated_size,
- size_t aio_threshold,
- size_t buffer_size_ = DBMS_DEFAULT_BUFFER_SIZE,
- int flags_ = -1,
- char * existing_memory_ = nullptr,
- size_t alignment = 0);
+std::unique_ptr<ReadBufferFromFileBase> createReadBufferFromFileBase(
+ const std::string & filename_,
+ size_t estimated_size,
+ size_t aio_threshold,
+ size_t buffer_size_ = DBMS_DEFAULT_BUFFER_SIZE,
+ int flags_ = -1,
+ char * existing_memory_ = nullptr,
+ size_t alignment = 0);
}
diff --git a/dbms/src/IO/createWriteBufferFromFileBase.cpp b/dbms/src/IO/createWriteBufferFromFileBase.cpp
index b5670b0b16b..1fa26d21c6a 100644
--- a/dbms/src/IO/createWriteBufferFromFileBase.cpp
+++ b/dbms/src/IO/createWriteBufferFromFileBase.cpp
@@ -22,22 +22,22 @@ namespace ErrorCodes
}
#endif
-WriteBufferFromFileBase * createWriteBufferFromFileBase(const std::string & filename_, size_t estimated_size,
+std::unique_ptr<WriteBufferFromFileBase> createWriteBufferFromFileBase(const std::string & filename_, size_t estimated_size,
size_t aio_threshold, size_t buffer_size_, int flags_, mode_t mode, char * existing_memory_,
size_t alignment)
{
if ((aio_threshold == 0) || (estimated_size < aio_threshold))
{
ProfileEvents::increment(ProfileEvents::CreatedWriteBufferOrdinary);
- return new WriteBufferFromFile(filename_, buffer_size_, flags_, mode, existing_memory_, alignment);
+ return std::make_unique<WriteBufferFromFile>(filename_, buffer_size_, flags_, mode, existing_memory_, alignment);
}
else
{
#if defined(__linux__)
ProfileEvents::increment(ProfileEvents::CreatedWriteBufferAIO);
- return new WriteBufferAIO(filename_, buffer_size_, flags_, mode, existing_memory_);
+ return std::make_unique<WriteBufferAIO>(filename_, buffer_size_, flags_, mode, existing_memory_);
#else
- throw Exception("AIO is not implemented yet on MacOS X", ErrorCodes::NOT_IMPLEMENTED);
+ throw Exception("AIO is not implemented yet on non-Linux OS", ErrorCodes::NOT_IMPLEMENTED);
#endif
}
}
diff --git a/dbms/src/IO/createWriteBufferFromFileBase.h b/dbms/src/IO/createWriteBufferFromFileBase.h
index 3b4154fcfb0..42cad88303b 100644
--- a/dbms/src/IO/createWriteBufferFromFileBase.h
+++ b/dbms/src/IO/createWriteBufferFromFileBase.h
@@ -2,6 +2,8 @@
#include
#include
+#include <memory>
+
namespace DB
{
@@ -13,13 +15,14 @@ namespace DB
* If aio_threshold = 0 or estimated_size < aio_threshold, the write operations are executed synchronously.
* Otherwise, write operations are performed asynchronously.
*/
-WriteBufferFromFileBase * createWriteBufferFromFileBase(const std::string & filename_,
- size_t estimated_size,
- size_t aio_threshold,
- size_t buffer_size_ = DBMS_DEFAULT_BUFFER_SIZE,
- int flags_ = -1,
- mode_t mode = 0666,
- char * existing_memory_ = nullptr,
- size_t alignment = 0);
+std::unique_ptr<WriteBufferFromFileBase> createWriteBufferFromFileBase(
+ const std::string & filename_,
+ size_t estimated_size,
+ size_t aio_threshold,
+ size_t buffer_size_ = DBMS_DEFAULT_BUFFER_SIZE,
+ int flags_ = -1,
+ mode_t mode = 0666,
+ char * existing_memory_ = nullptr,
+ size_t alignment = 0);
}
From 035c07d01e5d5d9476ecfae19f0158caf9a57e2c Mon Sep 17 00:00:00 2001
From: Alexey Milovidov
Date: Tue, 4 Sep 2018 23:56:09 +0300
Subject: [PATCH 21/25] Fixed error with locking in InterpreterDropQuery
[#CLICKHOUSE-3959]
---
.../src/Interpreters/InterpreterDropQuery.cpp | 8 +++---
dbms/src/Interpreters/InterpreterDropQuery.h | 4 ++-
dbms/src/Storages/IStorage.h | 2 +-
...00704_drop_truncate_memory_table.reference | 1 +
.../00704_drop_truncate_memory_table.sh | 27 +++++++++++++++++++
5 files changed, 36 insertions(+), 6 deletions(-)
create mode 100644 dbms/tests/queries/0_stateless/00704_drop_truncate_memory_table.reference
create mode 100755 dbms/tests/queries/0_stateless/00704_drop_truncate_memory_table.sh
diff --git a/dbms/src/Interpreters/InterpreterDropQuery.cpp b/dbms/src/Interpreters/InterpreterDropQuery.cpp
index fb6fe2c8c38..ff0ece66dfd 100644
--- a/dbms/src/Interpreters/InterpreterDropQuery.cpp
+++ b/dbms/src/Interpreters/InterpreterDropQuery.cpp
@@ -69,7 +69,7 @@ BlockIO InterpreterDropQuery::executeToTable(String & database_name_, String & t
{
database_and_table.second->shutdown();
/// If table was already dropped by anyone, an exception will be thrown
- auto table_lock = database_and_table.second->lockDataForAlter(__PRETTY_FUNCTION__);
+ auto table_lock = database_and_table.second->lockForAlter(__PRETTY_FUNCTION__);
/// Drop table from memory, don't touch data and metadata
database_and_table.first->detachTable(database_and_table.second->getTableName());
}
@@ -78,7 +78,7 @@ BlockIO InterpreterDropQuery::executeToTable(String & database_name_, String & t
database_and_table.second->checkTableCanBeDropped();
/// If table was already dropped by anyone, an exception will be thrown
- auto table_lock = database_and_table.second->lockDataForAlter(__PRETTY_FUNCTION__);
+ auto table_lock = database_and_table.second->lockForAlter(__PRETTY_FUNCTION__);
/// Drop table data, don't touch metadata
database_and_table.second->truncate(query_ptr);
}
@@ -88,7 +88,7 @@ BlockIO InterpreterDropQuery::executeToTable(String & database_name_, String & t
database_and_table.second->shutdown();
/// If table was already dropped by anyone, an exception will be thrown
- auto table_lock = database_and_table.second->lockDataForAlter(__PRETTY_FUNCTION__);
+ auto table_lock = database_and_table.second->lockForAlter(__PRETTY_FUNCTION__);
/// Delete table metdata and table itself from memory
database_and_table.first->removeTable(context, database_and_table.second->getTableName());
/// Delete table data
@@ -124,7 +124,7 @@ BlockIO InterpreterDropQuery::executeToTemporaryTable(String & table_name, ASTDr
if (kind == ASTDropQuery::Kind::Truncate)
{
/// If table was already dropped by anyone, an exception will be thrown
- auto table_lock = table->lockDataForAlter(__PRETTY_FUNCTION__);
+ auto table_lock = table->lockForAlter(__PRETTY_FUNCTION__);
/// Drop table data, don't touch metadata
table->truncate(query_ptr);
}
diff --git a/dbms/src/Interpreters/InterpreterDropQuery.h b/dbms/src/Interpreters/InterpreterDropQuery.h
index 658c150de27..2bea9f85c99 100644
--- a/dbms/src/Interpreters/InterpreterDropQuery.h
+++ b/dbms/src/Interpreters/InterpreterDropQuery.h
@@ -13,7 +13,9 @@ class IAST;
using ASTPtr = std::shared_ptr<IAST>;
using DatabaseAndTable = std::pair<DatabasePtr, StoragePtr>;
-/** Allow to either drop table with all its data (DROP), or remove information about table (just forget) from server (DETACH).
+/** Allows to either drop a table with all its data (DROP),
+ * or remove information about the table from the server, i.e. just forget it (DETACH),
+ * or just clear all data in the table (TRUNCATE).
*/
class InterpreterDropQuery : public IInterpreter
{
diff --git a/dbms/src/Storages/IStorage.h b/dbms/src/Storages/IStorage.h
index 2d1092d04b8..cbf69a18a77 100644
--- a/dbms/src/Storages/IStorage.h
+++ b/dbms/src/Storages/IStorage.h
@@ -126,7 +126,7 @@ public:
return res;
}
- /** Does not allow reading the table structure. It is taken for ALTER, RENAME and DROP.
+ /** Does not allow reading the table structure. It is taken for ALTER, RENAME, DROP and TRUNCATE.
*/
TableFullWriteLock lockForAlter(const std::string & who = "Alter")
{
diff --git a/dbms/tests/queries/0_stateless/00704_drop_truncate_memory_table.reference b/dbms/tests/queries/0_stateless/00704_drop_truncate_memory_table.reference
new file mode 100644
index 00000000000..83b33d238da
--- /dev/null
+++ b/dbms/tests/queries/0_stateless/00704_drop_truncate_memory_table.reference
@@ -0,0 +1 @@
+1000
diff --git a/dbms/tests/queries/0_stateless/00704_drop_truncate_memory_table.sh b/dbms/tests/queries/0_stateless/00704_drop_truncate_memory_table.sh
new file mode 100755
index 00000000000..f805c7aa03e
--- /dev/null
+++ b/dbms/tests/queries/0_stateless/00704_drop_truncate_memory_table.sh
@@ -0,0 +1,27 @@
+#!/usr/bin/env bash
+set -e
+
+CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
+. $CURDIR/../shell_config.sh
+
+${CLICKHOUSE_CLIENT} --multiquery --query="
+DROP TABLE IF EXISTS test.memory;
+CREATE TABLE test.memory (x UInt64) ENGINE = Memory;
+
+SET max_block_size = 1, min_insert_block_size_rows = 0, min_insert_block_size_bytes = 0;
+
+INSERT INTO test.memory SELECT * FROM numbers(1000);"
+
+
+${CLICKHOUSE_CLIENT} --multiquery --query="
+SET max_threads = 1;
+SELECT count() FROM test.memory WHERE NOT ignore(sleep(0.0001));" &
+
+sleep 0.05;
+
+${CLICKHOUSE_CLIENT} --multiquery --query="
+TRUNCATE TABLE test.memory;
+DROP TABLE test.memory;
+"
+
+wait
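For reference, the TRUNCATE semantics the new test exercises, reduced to a minimal sketch (the table name is invented):

```sql
CREATE TABLE test.t (x UInt64) ENGINE = Memory;
INSERT INTO test.t SELECT * FROM numbers(1000);
TRUNCATE TABLE test.t;       -- removes the data but keeps the table and its metadata
SELECT count() FROM test.t;  -- 0
DROP TABLE test.t;
```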
From 477a1afd55ae6a2801d023400305f2e76d7bc6ca Mon Sep 17 00:00:00 2001
From: robot-clickhouse
Date: Wed, 5 Sep 2018 00:28:49 +0300
Subject: [PATCH 22/25] Auto version update to [18.12.2] [54407]
---
dbms/cmake/version.cmake | 8 ++++----
debian/changelog | 4 ++--
docker/client/Dockerfile | 2 +-
docker/server/Dockerfile | 2 +-
docker/test/Dockerfile | 2 +-
5 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/dbms/cmake/version.cmake b/dbms/cmake/version.cmake
index 4d02450a0e6..217e5bd7d0b 100644
--- a/dbms/cmake/version.cmake
+++ b/dbms/cmake/version.cmake
@@ -2,10 +2,10 @@
set(VERSION_REVISION 54407 CACHE STRING "")
set(VERSION_MAJOR 18 CACHE STRING "")
set(VERSION_MINOR 12 CACHE STRING "")
-set(VERSION_PATCH 1 CACHE STRING "")
-set(VERSION_GITHASH 76eaacf1be15102a732a90949739b6605d8596a1 CACHE STRING "")
-set(VERSION_DESCRIBE v18.12.1-testing CACHE STRING "")
-set(VERSION_STRING 18.12.1 CACHE STRING "")
+set(VERSION_PATCH 2 CACHE STRING "")
+set(VERSION_GITHASH d12c1b02bc50119d67db2690c6bc7aeeae9d55ef CACHE STRING "")
+set(VERSION_DESCRIBE v18.12.2-testing CACHE STRING "")
+set(VERSION_STRING 18.12.2 CACHE STRING "")
# end of autochange
set(VERSION_EXTRA "" CACHE STRING "")
diff --git a/debian/changelog b/debian/changelog
index 4457725fcb6..17cfdd1c287 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,5 +1,5 @@
-clickhouse (18.12.1) unstable; urgency=low
+clickhouse (18.12.2) unstable; urgency=low
* Modified source code
- -- Thu, 30 Aug 2018 22:28:33 +0300
+ -- Wed, 05 Sep 2018 00:28:49 +0300
diff --git a/docker/client/Dockerfile b/docker/client/Dockerfile
index fd64c994ec7..8655329065c 100644
--- a/docker/client/Dockerfile
+++ b/docker/client/Dockerfile
@@ -1,7 +1,7 @@
FROM ubuntu:18.04
ARG repository="deb http://repo.yandex.ru/clickhouse/deb/stable/ main/"
-ARG version=18.12.1
+ARG version=18.12.2
RUN apt-get update && \
apt-get install -y apt-transport-https dirmngr && \
diff --git a/docker/server/Dockerfile b/docker/server/Dockerfile
index 37ca5e839ff..953eab83e33 100644
--- a/docker/server/Dockerfile
+++ b/docker/server/Dockerfile
@@ -1,7 +1,7 @@
FROM ubuntu:18.04
ARG repository="deb http://repo.yandex.ru/clickhouse/deb/stable/ main/"
-ARG version=18.12.1
+ARG version=18.12.2
RUN apt-get update && \
apt-get install -y apt-transport-https dirmngr && \
diff --git a/docker/test/Dockerfile b/docker/test/Dockerfile
index 8fcc1d9b8d7..f1b5dda8e2a 100644
--- a/docker/test/Dockerfile
+++ b/docker/test/Dockerfile
@@ -1,7 +1,7 @@
FROM ubuntu:18.04
ARG repository="deb http://repo.yandex.ru/clickhouse/deb/stable/ main/"
-ARG version=18.12.1
+ARG version=18.12.2
RUN apt-get update && \
apt-get install -y apt-transport-https dirmngr && \
From 94997889c514198277f8db163c7eeb20d5ddab68 Mon Sep 17 00:00:00 2001
From: chertus
Date: Wed, 5 Sep 2018 01:17:01 +0300
Subject: [PATCH 23/25] fix test
---
dbms/tests/queries/0_stateless/00700_decimal_null.sql | 1 -
1 file changed, 1 deletion(-)
diff --git a/dbms/tests/queries/0_stateless/00700_decimal_null.sql b/dbms/tests/queries/0_stateless/00700_decimal_null.sql
index abc4ef86890..2ec15dd3775 100644
--- a/dbms/tests/queries/0_stateless/00700_decimal_null.sql
+++ b/dbms/tests/queries/0_stateless/00700_decimal_null.sql
@@ -17,7 +17,6 @@ CREATE TABLE IF NOT EXISTS test.decimal
SELECT toNullable(toDecimal32(32, 0)) AS x, assumeNotNull(x);
SELECT toNullable(toDecimal64(64, 0)) AS x, assumeNotNull(x);
SELECT toNullable(toDecimal128(128, 0)) AS x, assumeNotNull(x);
-SELECT NULL AS x, assumeNotNull(x); -- { serverError 48 }
SELECT ifNull(toDecimal32(1, 0), NULL), ifNull(toDecimal64(1, 0), NULL), ifNull(toDecimal128(1, 0), NULL);
SELECT ifNull(toNullable(toDecimal32(2, 0)), NULL), ifNull(toNullable(toDecimal64(2, 0)), NULL), ifNull(toNullable(toDecimal128(2, 0)), NULL);
From 7795a6f1bc9de006995b1ae49aac7493c70136f5 Mon Sep 17 00:00:00 2001
From: Pavel Patrin
Date: Wed, 5 Sep 2018 15:33:28 +0300
Subject: [PATCH 24/25] Update tinylog.md (#3042)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Пишите, Карл!
---
docs/ru/operations/table_engines/tinylog.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/ru/operations/table_engines/tinylog.md b/docs/ru/operations/table_engines/tinylog.md
index ce4ddac68f5..813eaa56890 100644
--- a/docs/ru/operations/table_engines/tinylog.md
+++ b/docs/ru/operations/table_engines/tinylog.md
@@ -7,7 +7,7 @@
Конкуррентный доступ к данным никак не ограничивается:
- если вы одновременно читаете из таблицы и в другом запросе пишете в неё, то чтение будет завершено с ошибкой;
-- если вы одновременно пишите в таблицу в нескольких запросах, то данные будут битыми.
+- если вы одновременно пишете в таблицу в нескольких запросах, то данные будут битыми.
Типичный способ использования этой таблицы - это write-once: сначала один раз только пишем данные, а потом сколько угодно читаем.
Запросы выполняются в один поток. То есть, этот движок предназначен для сравнительно маленьких таблиц (рекомендуется до 1 000 000 строк).
From 3c10fbfabaa98255ed52d026c632c7a6f4b8f819 Mon Sep 17 00:00:00 2001
From: Sergey Zaikin
Date: Wed, 5 Sep 2018 15:37:49 +0300
Subject: [PATCH 25/25] Update external_dicts_dict_sources.md (#3043)
Set the PostgreSQL section level equal to the MS SQL section level
---
docs/ru/query_language/dicts/external_dicts_dict_sources.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/ru/query_language/dicts/external_dicts_dict_sources.md b/docs/ru/query_language/dicts/external_dicts_dict_sources.md
index 2cb4754b934..c2b9c27c9ab 100644
--- a/docs/ru/query_language/dicts/external_dicts_dict_sources.md
+++ b/docs/ru/query_language/dicts/external_dicts_dict_sources.md
@@ -124,7 +124,7 @@
- `connection_string` - строка соединения.
- `invalidate_query` - запрос для проверки статуса словаря. Необязательный параметр. Читайте подробнее в разделе [Обновление словарей](external_dicts_dict_lifetime.md#dicts-external_dicts_dict_lifetime).
-## Пример подключения PostgreSQL
+### Пример подключения PostgreSQL
ОС Ubuntu.