diff --git a/docs/en/agg_functions/combinators.md b/docs/en/agg_functions/combinators.md
old mode 100755
new mode 100644
diff --git a/docs/en/agg_functions/index.md b/docs/en/agg_functions/index.md
old mode 100755
new mode 100644
index 3864f7271c4..e87bf4ff833
--- a/docs/en/agg_functions/index.md
+++ b/docs/en/agg_functions/index.md
@@ -8,4 +8,3 @@ ClickHouse also supports:
- [Parametric aggregate functions](parametric_functions.md#aggregate_functions_parametric), which accept other parameters in addition to columns.
- [Combinators](combinators.md#aggregate_functions_combinators), which change the behavior of aggregate functions.
-
diff --git a/docs/en/agg_functions/parametric_functions.md b/docs/en/agg_functions/parametric_functions.md
old mode 100755
new mode 100644
diff --git a/docs/en/agg_functions/reference.md b/docs/en/agg_functions/reference.md
old mode 100755
new mode 100644
index 0eb896e4664..90ff9da58e7
--- a/docs/en/agg_functions/reference.md
+++ b/docs/en/agg_functions/reference.md
@@ -19,7 +19,7 @@ In some cases, you can rely on the order of execution. This applies to cases whe
When a `SELECT` query has the `GROUP BY` clause or at least one aggregate function, ClickHouse (in contrast to MySQL) requires that all expressions in the `SELECT`, `HAVING`, and `ORDER BY` clauses be calculated from keys or from aggregate functions. In other words, each column selected from the table must be used either in keys or inside aggregate functions. To get behavior like in MySQL, you can put the other columns in the `any` aggregate function.
-## anyHeavy
+## anyHeavy(x)
Selects a frequently occurring value using the [heavy hitters](http://www.cs.umd.edu/~samir/498/karp.pdf) algorithm. If there is a value that occurs more than in half the cases in each of the query's execution threads, this value is returned. Normally, the result is nondeterministic.
@@ -39,6 +39,7 @@ Take the [OnTime](../getting_started/example_datasets/ontime.md#example_datasets
SELECT anyHeavy(AirlineID) AS res
FROM ontime
```
+
```
┌───res─┐
│ 19690 │
@@ -124,11 +125,11 @@ The result is always Float64.
Calculates the approximate number of different values of the argument. Works for numbers, strings, dates, date-with-time, and for multiple arguments and tuple arguments.
Uses an adaptive sampling algorithm: for the calculation state, it uses a sample of element hash values with a size up to 65536.
-This algorithm is also very accurate for data sets with low cardinality (up to 65536) and very efficient on CPU (when computing not too many of these functions, using `uniq` is almost as fast as using other aggregate functions).
+This algorithm is also very accurate for data sets with small cardinality (up to 65536) and very efficient on CPU (when computing not too many of these functions, using `uniq` is almost as fast as using other aggregate functions).
The result is determinate (it doesn't depend on the order of query processing).
-This function provides excellent accuracy even for data sets with extremely high cardinality (over 10 billion elements). It is recommended for default use.
+This function provides excellent accuracy even for data sets with huge cardinality (10B+ elements) and is recommended for use by default.
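
For illustration, a minimal sketch reusing the `ontime` example data set referenced elsewhere on this page:

```sql
SELECT uniq(AirlineID) AS airlines
FROM ontime
```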
## uniqCombined(x)
@@ -138,16 +139,16 @@ A combination of three algorithms is used: array, hash table and [HyperLogLog](h
The result is determinate (it doesn't depend on the order of query processing).
-The `uniqCombined` function is a good default choice for calculating the number of different values, but keep in mind that the estimation error will increase for high-cardinality data sets (200M+ elements), and the function will return very inaccurate results for data sets with extremely high cardinality (1B+ elements).
+The `uniqCombined` function is a good default choice for calculating the number of different values, but keep the following in mind: for data sets with large cardinality (200M+ elements) the estimation error grows, and for data sets with huge cardinality (1B+ elements) it returns results with high inaccuracy.
## uniqHLL12(x)
Uses the [HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog) algorithm to approximate the number of different values of the argument.
-212 5-bit cells are used. The size of the state is slightly more than 2.5 KB. The result is not very accurate (up to ~10% error) for small data sets (<10K elements). However, the result is fairly accurate for high-cardinality data sets (10K-100M), with a maximum error of ~1.6%. Starting from 100M, the estimation error increases, and the function will return very inaccurate results for data sets with extremely high cardinality (1B+ elements).
+2^12 5-bit cells are used. The size of the state is slightly more than 2.5 KB. The result is not very accurate (error up to ~10%) for data sets of small cardinality (<10K elements), but for data sets with large cardinality (10K - 100M) the result is quite accurate (error up to ~1.6%). Beyond that, the estimation error grows, and for data sets with huge cardinality (1B+ elements) it returns results with high inaccuracy.
The result is determinate (it doesn't depend on the order of query processing).
-We don't recommend using this function. In most cases, use the `uniq` or `uniqCombined` function.
+This function is not recommended for use; in most cases, use the `uniq` or `uniqCombined` function instead.
## uniqExact(x)
@@ -169,7 +170,7 @@ In some cases, you can still rely on the order of execution. This applies to cas
-## groupArrayInsertAt
+## groupArrayInsertAt(x)
Inserts a value into the array in the specified position.
@@ -235,8 +236,8 @@ For its purpose (calculating quantiles of page loading times), using this functi
## quantileTimingWeighted(level)(x, weight)
-Differs from the `quantileTiming` function in that it has a second argument, "weights". Weight is a non-negative integer.
-The result is calculated as if the `x` value were passed `weight` number of times to the `quantileTiming` function.
+Differs from the 'quantileTiming' function in that it has a second argument, "weights". Weight is a non-negative integer.
+The result is calculated as if the 'x' value were passed 'weight' number of times to the 'quantileTiming' function.
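
As a sketch of the weighted form (the table and column names here are hypothetical):

```sql
SELECT quantileTimingWeighted(0.95)(page_load_time_ms, views) AS p95
FROM page_stats
```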
## quantileExact(level)(x)
@@ -256,7 +257,7 @@ The performance of the function is lower than for ` quantile`, ` quantileTiming`
The result depends on the order of running the query, and is nondeterministic.
-## median
+## median(x)
All the quantile functions have corresponding median functions: `median`, `medianDeterministic`, `medianTiming`, `medianTimingWeighted`, `medianExact`, `medianExactWeighted`, `medianTDigest`. They are synonyms and their behavior is identical.
@@ -274,7 +275,7 @@ Returns `Float64`. When `n <= 1`, returns `+∞`.
## varPop(x)
-Calculates the amount `Σ((x - x̅)^2) / (n - 1)`, where `n` is the sample size and `x̅`is the average value of `x`.
+Calculates the value `Σ((x - x̅)^2) / n`, where `n` is the sample size and `x̅` is the average value of `x`.
In other words, dispersion for a set of values. Returns `Float64`.
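
As a quick illustration of the difference from `varSamp` (population versus sample variance), the following sketch should return 1.25 and roughly 1.667:

```sql
SELECT
    varPop(x) AS population_variance,
    varSamp(x) AS sample_variance
FROM
(
    SELECT arrayJoin([1, 2, 3, 4]) AS x
)
```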
@@ -286,33 +287,30 @@ The result is equal to the square root of `varSamp(x)`.
The result is equal to the square root of `varPop(x)`.
-## topK
+## topK(N)(column)
Returns an array of the most frequent values in the specified column. The resulting array is sorted in descending order of frequency of values (not by the values themselves).
Implements the [ Filtered Space-Saving](http://www.l2f.inesc-id.pt/~fmmb/wiki/uploads/Work/misnis.ref0a.pdf) algorithm for analyzing TopK, based on the reduce-and-combine algorithm from [Parallel Space Saving](https://arxiv.org/pdf/1401.0702.pdf).
-```
-topK(N)(column)
-```
-
This function doesn't provide a guaranteed result. In certain situations, errors might occur and it might return frequent values that aren't the most frequent values.
We recommend using the `N < 10 ` value; performance is reduced with large `N` values. Maximum value of ` N = 65536`.
**Arguments**
-- 'N' is the number of values.
+- 'N' – The number of values.
- ' x ' – The column.
**Example**
-Take the [OnTime](../getting_started/example_datasets/ontime.md#example_datasets-ontime) data set and select the three most frequently occurring values in the `AirlineID` column.
+Take the [OnTime](../getting_started/example_datasets/ontime.md#example_datasets-ontime) data set and select the three most frequently occurring values in the `AirlineID` column.
```sql
SELECT topK(3)(AirlineID) AS res
FROM ontime
```
+
```
┌─res─────────────────┐
│ [19393,19790,19805] │
diff --git a/docs/en/data_types/array.md b/docs/en/data_types/array.md
old mode 100755
new mode 100644
diff --git a/docs/en/data_types/boolean.md b/docs/en/data_types/boolean.md
old mode 100755
new mode 100644
diff --git a/docs/en/data_types/date.md b/docs/en/data_types/date.md
old mode 100755
new mode 100644
index cb179c0d8c4..355e5555bfa
--- a/docs/en/data_types/date.md
+++ b/docs/en/data_types/date.md
@@ -1,6 +1,6 @@
# Date
-A date. Stored in two bytes as the number of days since 1970-01-01 (unsigned). Allows storing values from just after the beginning of the Unix Epoch to the upper threshold defined by a constant at the compilation stage (currently, this is until the year 2106, but the final fully-supported year is 2105).
+Date. Stored in two bytes as the number of days since 1970-01-01 (unsigned). Allows storing values from just after the beginning of the Unix Epoch to the upper threshold defined by a constant at the compilation stage (currently, this is until the year 2106, but the final fully-supported year is 2105).
The minimum value is output as 0000-00-00.
The date is stored without the time zone.
diff --git a/docs/en/data_types/datetime.md b/docs/en/data_types/datetime.md
old mode 100755
new mode 100644
diff --git a/docs/en/data_types/enum.md b/docs/en/data_types/enum.md
old mode 100755
new mode 100644
diff --git a/docs/en/data_types/fixedstring.md b/docs/en/data_types/fixedstring.md
old mode 100755
new mode 100644
diff --git a/docs/en/data_types/float.md b/docs/en/data_types/float.md
old mode 100755
new mode 100644
index 9d5cc2c01bb..c6d12999604
--- a/docs/en/data_types/float.md
+++ b/docs/en/data_types/float.md
@@ -4,8 +4,8 @@
Types are equivalent to types of C:
-- `Float32` - `float`
-- `Float64` - ` double`
+- `Float32` - `float`;
+- `Float64` - ` double`.
We recommend that you store data in integer form whenever possible. For example, convert fixed precision numbers to integer values, such as monetary amounts or page load times in milliseconds.
@@ -24,7 +24,9 @@ SELECT 1 - 0.9
```
- The result of the calculation depends on the calculation method (the processor type and architecture of the computer system).
+
- Floating-point calculations might result in numbers such as infinity (`Inf`) and "not-a-number" (`NaN`). This should be taken into account when processing the results of calculations.
+
- When reading floating point numbers from rows, the result might not be the nearest machine-representable number.
## NaN and Inf
@@ -42,7 +44,6 @@ SELECT 0.5 / 0
│ inf │
└────────────────┘
```
-
- `-Inf` – Negative infinity.
```sql
@@ -54,7 +55,6 @@ SELECT -0.5 / 0
│ -inf │
└─────────────────┘
```
-
- `NaN` – Not a number.
```
@@ -67,5 +67,5 @@ SELECT 0 / 0
└──────────────┘
```
-See the rules for ` NaN` sorting in the section [ORDER BY clause](../query_language/queries.md#query_language-queries-order_by).
+See the rules for `NaN` sorting in the section [ORDER BY clause](../query_language/queries.md#query_language-queries-order_by).
diff --git a/docs/en/data_types/index.md b/docs/en/data_types/index.md
old mode 100755
new mode 100644
index 4008a872161..c17b51c08a2
--- a/docs/en/data_types/index.md
+++ b/docs/en/data_types/index.md
@@ -2,7 +2,6 @@
# Data types
-ClickHouse can store various types of data in table cells.
-
-This section describes the supported data types and special considerations when using and/or implementing them, if any.
+ClickHouse table fields can contain data of different types.
+This section describes the supported data types and any special considerations for their use and implementation.
\ No newline at end of file
diff --git a/docs/en/data_types/int_uint.md b/docs/en/data_types/int_uint.md
old mode 100755
new mode 100644
diff --git a/docs/en/data_types/nested_data_structures/aggregatefunction.md b/docs/en/data_types/nested_data_structures/aggregatefunction.md
old mode 100755
new mode 100644
diff --git a/docs/en/data_types/nested_data_structures/index.md b/docs/en/data_types/nested_data_structures/index.md
old mode 100755
new mode 100644
diff --git a/docs/en/data_types/nested_data_structures/nested.md b/docs/en/data_types/nested_data_structures/nested.md
old mode 100755
new mode 100644
diff --git a/docs/en/data_types/special_data_types/expression.md b/docs/en/data_types/special_data_types/expression.md
old mode 100755
new mode 100644
diff --git a/docs/en/data_types/special_data_types/index.md b/docs/en/data_types/special_data_types/index.md
old mode 100755
new mode 100644
diff --git a/docs/en/data_types/special_data_types/set.md b/docs/en/data_types/special_data_types/set.md
old mode 100755
new mode 100644
diff --git a/docs/en/data_types/string.md b/docs/en/data_types/string.md
old mode 100755
new mode 100644
diff --git a/docs/en/data_types/tuple.md b/docs/en/data_types/tuple.md
old mode 100755
new mode 100644
diff --git a/docs/en/development/style.md b/docs/en/development/style.md
old mode 100755
new mode 100644
index 700fede5373..ef6490187c8
--- a/docs/en/development/style.md
+++ b/docs/en/development/style.md
@@ -4,7 +4,7 @@
1. The following are recommendations, not requirements.
2. If you are editing code, it makes sense to follow the formatting of the existing code.
-3. Code style is needed for consistency. Consistency makes it easier to read the code, and it also makes it easier to search the code.
+3. Code style is needed for consistency. Consistency makes it easier to read the code, and it also makes it easier to search the code.
4. Many of the rules do not have logical reasons; they are dictated by established practices.
## Formatting
@@ -93,25 +93,25 @@
14. In classes and structures, public, private, and protected are written on the same level as the class/struct, but all other internal elements should be deeper.
```cpp
- template
-class MultiVersion
-{
-public:
- /// Version of object for usage. shared_ptr manage lifetime of version.
- using Version = std::shared_ptr;
- ...
-}
+ template >
+ class MultiVersion
+ {
+ public:
+ /// The specific version of the object to use.
+ using Version = Ptr;
+ ...
+ }
```
15. If the same namespace is used for the entire file, and there isn't anything else significant, an offset is not necessary inside namespace.
16. If the block for if, for, while... expressions consists of a single statement, you don't need to use curly brackets. Place the statement on a separate line, instead. The same is true for a nested if, for, while... statement. But if the inner statement contains curly brackets or else, the external block should be written in curly brackets.
- ```cpp
- /// Finish write.
-for (auto & stream : streams)
- stream.second->finalize();
- ```
+ ```cpp
+ /// Finish write.
+ for (auto & stream : streams)
+ stream.second->finalize();
+ ```
17. There shouldn't be any spaces at the ends of lines.
@@ -218,11 +218,11 @@ for (auto & stream : streams)
*/
void executeQuery(
ReadBuffer & istr, /// Where to read the query from (and data for INSERT, if applicable)
- WriteBuffer & ostr, /// Where to write the result
- Context & context, /// DB, tables, data types, engines, functions, aggregate functions...
- BlockInputStreamPtr & query_plan, /// A description of query processing can be included here
- QueryProcessingStage::Enum stage = QueryProcessingStage::Complete /// The last stage to process the SELECT query to
- )
+ WriteBuffer & ostr, /// Where to write the result
+ Context & context, /// DB, tables, data types, engines, functions, aggregate functions...
+ BlockInputStreamPtr & query_plan, /// A description of query processing can be included here
+ QueryProcessingStage::Enum stage = QueryProcessingStage::Complete /// The last stage to process the SELECT query to
+ )
```
4. Comments should be written in English only.
@@ -252,7 +252,7 @@ for (auto & stream : streams)
*/
```
- (the example is borrowed from the resource [http://home.tamk.fi/~jaalto/course/coding-style/doc/unmaintainable-code/](http://home.tamk.fi/~jaalto/course/coding-style/doc/unmaintainable-code/)
+ (Example taken from [http://home.tamk.fi/~jaalto/course/coding-style/doc/unmaintainable-code/](http://home.tamk.fi/~jaalto/course/coding-style/doc/unmaintainable-code/).)
7. Do not write garbage comments (author, creation date ..) at the beginning of each file.
@@ -497,15 +497,7 @@ This is not recommended, but it is allowed.
You can create a separate code block inside a single function in order to make certain variables local, so that the destructors are called when exiting the block.
```cpp
- Block block = data.in->read();
-
- {
- std::lock_guard lock(mutex);
- data.ready = true;
- data.block = block;
- }
-
- ready_any.set();
+ Block block = data.in->read();
+
+ {
+     std::lock_guard lock(mutex);
+     data.ready = true;
+     data.block = block;
+ }
+
+ ready_any.set();
```
7. Multithreading.
@@ -568,12 +560,13 @@ This is not recommended, but it is allowed.
```cpp
using AggregateFunctionPtr = std::shared_ptr;
-
- /** Creates an aggregate function by name. */
+
+ /** Creates an aggregate function by name.
+ */
class AggregateFunctionFactory
{
public:
- AggregateFunctionFactory();
+ AggregateFunctionFactory();
AggregateFunctionPtr get(const String & name, const DataTypes & argument_types) const;
```
@@ -598,10 +591,10 @@ This is not recommended, but it is allowed.
If later you’ll need to delay initialization, you can add a default constructor that will create an invalid object. Or, for a small number of objects, you can use shared_ptr/unique_ptr.
```cpp
- Loader(DB::Connection * connection_, const std::string & query, size_t max_block_size_);
-
- /// For delayed initialization
- Loader() {}
+ Loader(DB::Connection * connection_, const std::string & query, size_t max_block_size_);
+
+ /// For delayed initialization
+ Loader() {}
```
17. Virtual functions.
diff --git a/docs/en/dicts/external_dicts.md b/docs/en/dicts/external_dicts.md
old mode 100755
new mode 100644
index a6af84a313f..b99b02bbf57
--- a/docs/en/dicts/external_dicts.md
+++ b/docs/en/dicts/external_dicts.md
@@ -21,11 +21,12 @@ The dictionary config file has the following format:
/etc/metrika.xml
+
-
+
-
+
...
diff --git a/docs/en/dicts/external_dicts_dict.md b/docs/en/dicts/external_dicts_dict.md
old mode 100755
new mode 100644
index 6d2f4128704..4133b036e1f
--- a/docs/en/dicts/external_dicts_dict.md
+++ b/docs/en/dicts/external_dicts_dict.md
@@ -27,8 +27,7 @@ The dictionary configuration has the following structure:
```
- name – The identifier that can be used to access the dictionary. Use the characters `[a-zA-Z0-9_\-]`.
-- [source](external_dicts_dict_sources.html/#dicts-external_dicts_dict_sources) — Source of the dictionary .
-- [layout](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout) — Dictionary layout in memory.
-- [source](external_dicts_dict_sources.html/#dicts-external_dicts_dict_sources) — Structure of the dictionary . A key and attributes that can be retrieved by this key.
-- [lifetime](external_dicts_dict_lifetime.md#dicts-external_dicts_dict_lifetime) — Frequency of dictionary updates.
-
+- [source](external_dicts_dict_sources.md#dicts-external_dicts_dict_sources) – Source of the dictionary.
+- [layout](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout) – Location of the dictionary in memory.
+- [structure](external_dicts_dict_structure.md#dicts-external_dicts_dict_structure) – Structure of the dictionary. A key and attributes that can be retrieved by this key.
+- [lifetime](external_dicts_dict_lifetime.md#dicts-external_dicts_dict_lifetime) – How frequently to update dictionaries.
diff --git a/docs/en/dicts/external_dicts_dict_layout.md b/docs/en/dicts/external_dicts_dict_layout.md
old mode 100755
new mode 100644
index 8b7cad24b65..ad635db94f5
--- a/docs/en/dicts/external_dicts_dict_layout.md
+++ b/docs/en/dicts/external_dicts_dict_layout.md
@@ -2,11 +2,11 @@
# Storing dictionaries in memory
-There are a [variety of ways](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout-manner) to store dictionaries in memory.
+There are [many different ways](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout-manner) to store dictionaries in memory.
-We recommend [flat](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout-flat), [hashed](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout-hashed)and[complex_key_hashed](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout-complex_key_hashed). which provide optimal processing speed.
+We recommend [flat](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout-flat), [hashed](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout-hashed), and [complex_key_hashed](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout-complex_key_hashed), which provide optimal processing speed.
-Caching is not recommended because of potentially poor performance and difficulties in selecting optimal parameters. Read more in the section " [cache](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout-cache)".
+Caching is not recommended because of potentially poor performance and difficulties in selecting optimal parameters. Read more about this in the "[cache](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout-cache)" section.
There are several ways to improve dictionary performance:
@@ -46,6 +46,7 @@ The configuration looks like this:
- [range_hashed](#dicts-external_dicts_dict_layout-range_hashed)
- [complex_key_hashed](#dicts-external_dicts_dict_layout-complex_key_hashed)
- [complex_key_cache](#dicts-external_dicts_dict_layout-complex_key_cache)
+- [ip_trie](#dicts-external_dicts_dict_layout-ip_trie)
@@ -87,7 +88,7 @@ Configuration example:
### complex_key_hashed
-This type is for use with composite [keys](external_dicts_dict_structure.md/#dicts-external_dicts_dict_structure). Similar to `hashed`.
+This type of storage is designed for use with composite [keys](external_dicts_dict_structure.md#dicts-external_dicts_dict_structure). It is similar to `hashed`.
Configuration example:
@@ -108,18 +109,18 @@ This storage method works the same way as hashed and allows using date/time rang
Example: The table contains discounts for each advertiser in the format:
```
-+---------------+---------------------+-------------------+--------+
-| advertiser id | discount start date | discount end date | amount |
-+===============+=====================+===================+========+
-| 123 | 2015-01-01 | 2015-01-15 | 0.15 |
-+---------------+---------------------+-------------------+--------+
-| 123 | 2015-01-16 | 2015-01-31 | 0.25 |
-+---------------+---------------------+-------------------+--------+
-| 456 | 2015-01-01 | 2015-01-15 | 0.05 |
-+---------------+---------------------+-------------------+--------+
+ +---------------+---------------------+-------------------+--------+
+ | advertiser id | discount start date | discount end date | amount |
+ +===============+=====================+===================+========+
+ | 123 | 2015-01-01 | 2015-01-15 | 0.15 |
+ +---------------+---------------------+-------------------+--------+
+ | 123 | 2015-01-16 | 2015-01-31 | 0.25 |
+ +---------------+---------------------+-------------------+--------+
+ | 456 | 2015-01-01 | 2015-01-15 | 0.05 |
+ +---------------+---------------------+-------------------+--------+
```
-To use a sample for date ranges, define the `range_min` and `range_max` elements in the [structure](external_dicts_dict_structure.md#dicts-external_dicts_dict_structure).
+To use a sample for date ranges, define `range_min` and `range_max` in the [structure](external_dicts_dict_structure.md#dicts-external_dicts_dict_structure).
Example:
@@ -196,15 +197,15 @@ This is the least effective of all the ways to store dictionaries. The speed of
To improve cache performance, use a subquery with ` LIMIT`, and call the function with the dictionary externally.
-Supported [sources](external_dicts_dict_sources.md#dicts-external_dicts_dict_sources): MySQL, ClickHouse, executable, HTTP.
+Supported [sources](external_dicts_dict_sources.md#dicts-external_dicts_dict_sources): MySQL, ClickHouse, executable, HTTP.
Example of settings:
```xml
-
- 1000000000
+
+ 1000000000
```
@@ -226,4 +227,66 @@ Do not use ClickHouse as a source, because it is slow to process queries with ra
### complex_key_cache
-This type of storage is for use with composite [keys](external_dicts_dict_structure.md#dicts-external_dicts_dict_structure). Similar to `cache`.
+This type of storage is designed for use with composite [keys](external_dicts_dict_structure.md#dicts-external_dicts_dict_structure). Similar to `cache`.
+
+
+
+### ip_trie
+
+
+The table stores IP prefixes for each key (IP address), which makes it possible to map IP addresses to metadata such as ASN or threat score.
+
+Example: the table contains prefixes and their corresponding AS number and country code:
+
+```
+ +-----------------+-------+--------+
+ | prefix | asn | cca2 |
+ +=================+=======+========+
+ | 202.79.32.0/20 | 17501 | NP |
+ +-----------------+-------+--------+
+ | 2620:0:870::/48 | 3856 | US |
+ +-----------------+-------+--------+
+ | 2a02:6b8:1::/48 | 13238 | RU |
+ +-----------------+-------+--------+
+ | 2001:db8::/32 | 65536 | ZZ |
+ +-----------------+-------+--------+
+```
+
+When using such a layout, the structure should have the "key" element.
+
+Example:
+
+```xml
+    <structure>
+        <key>
+            <attribute>
+                <name>prefix</name>
+                <type>String</type>
+            </attribute>
+        </key>
+        <attribute>
+            <name>asn</name>
+            <type>UInt32</type>
+            <null_value />
+        </attribute>
+        <attribute>
+            <name>cca2</name>
+            <type>String</type>
+            <null_value>??</null_value>
+        </attribute>
+    ...
+```
+
+The key must have only one attribute of type String, containing a valid IP prefix. Other types are not yet supported.
+
+For querying, the same functions (dictGetT with a tuple) as for dictionaries with composite keys must be used:
+
+ dictGetT('dict_name', 'attr_name', tuple(ip))
+
+The function accepts either UInt32 for IPv4 address or FixedString(16) for IPv6 address in wire format:
+
+ dictGetString('prefix', 'asn', tuple(IPv6StringToNum('2001:db8::1')))
+
+No other type is supported. The function returns the attribute for the prefix that matches the given IP address. If there are overlapping prefixes, the most specific one is returned.
+
+The data is currently stored in a bitwise trie; it must fit entirely in memory.
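+
+For example, an IPv4 lookup against a hypothetical dictionary named `ip_metadata` might look like this (the UInt32 form can be produced with `IPv4StringToNum`):
+
+    dictGetUInt32('ip_metadata', 'asn', tuple(IPv4StringToNum('202.79.40.1')))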
diff --git a/docs/en/dicts/external_dicts_dict_lifetime.md b/docs/en/dicts/external_dicts_dict_lifetime.md
old mode 100755
new mode 100644
index 52ee7a4aa78..6431fb3de48
--- a/docs/en/dicts/external_dicts_dict_lifetime.md
+++ b/docs/en/dicts/external_dicts_dict_lifetime.md
@@ -36,13 +36,13 @@ Example of settings:
When upgrading the dictionaries, the ClickHouse server applies different logic depending on the type of [ source](external_dicts_dict_sources.md#dicts-external_dicts_dict_sources):
> - For a text file, it checks the time of modification. If the time differs from the previously recorded time, the dictionary is updated.
-- For MyISAM tables, the time of modification is checked using a `SHOW TABLE STATUS` query.
-- Dictionaries from other sources are updated every time by default.
+> - For MyISAM tables, the time of modification is checked using a `SHOW TABLE STATUS` query.
+> - Dictionaries from other sources are updated every time by default.
For MySQL (InnoDB) and ODBC sources, you can set up a query that will update the dictionaries only if they really changed, rather than each time. To do this, follow these steps:
> - The dictionary table must have a field that always changes when the source data is updated.
-- The settings of the source must specify a query that retrieves the changing field. The ClickHouse server interprets the query result as a row, and if this row has changed relative to its previous state, the dictionary is updated. Specify the query in the `` field in the settings for the [source](external_dicts_dict_sources.md#dicts-external_dicts_dict_sources).
+> - The settings of the source must specify a query that retrieves the changing field. The ClickHouse server interprets the query result as a row, and if this row has changed relative to its previous state, the dictionary is updated. The query must be specified in the `<invalidate_query>` field in the [source](external_dicts_dict_sources.md#dicts-external_dicts_dict_sources) settings.
Example of settings:
diff --git a/docs/en/dicts/external_dicts_dict_sources.md b/docs/en/dicts/external_dicts_dict_sources.md
old mode 100755
new mode 100644
index 6cb4e0ea44d..721302cd556
--- a/docs/en/dicts/external_dicts_dict_sources.md
+++ b/docs/en/dicts/external_dicts_dict_sources.md
@@ -80,7 +80,7 @@ Setting fields:
## HTTP(s)
-Working with an HTTP(s) server depends on [how the dictionary is stored in memory](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout). If the dictionary is stored using `cache` and `complex_key_cache`, ClickHouse requests the necessary keys by sending a request via the `POST` method.
+Working with an HTTP(s) server depends on [how the dictionary is stored in memory](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout). If the dictionary is stored using `cache` and `complex_key_cache`, ClickHouse requests the necessary keys by sending a request via the `POST` method.
Example of settings:
@@ -135,9 +135,9 @@ Installing unixODBC and the ODBC driver for PostgreSQL:
Configuring `/etc/odbc.ini` (or `~/.odbc.ini`):
```
-[DEFAULT]
+ [DEFAULT]
Driver = myconnection
-
+
[myconnection]
Description = PostgreSQL connection to my_db
Driver = PostgreSQL Unicode
diff --git a/docs/en/dicts/external_dicts_dict_structure.md b/docs/en/dicts/external_dicts_dict_structure.md
old mode 100755
new mode 100644
index 2542af00ec6..5a6d349b350
--- a/docs/en/dicts/external_dicts_dict_structure.md
+++ b/docs/en/dicts/external_dicts_dict_structure.md
@@ -25,8 +25,8 @@ Overall structure:
Columns are described in the structure:
-- `` - [key column](external_dicts_dict_structure.md#dicts-external_dicts_dict_structure-key).
-- `` - [data column](external_dicts_dict_structure.md#dicts-external_dicts_dict_structure-attributes). There can be a large number of columns.
+- `` – [Key column](external_dicts_dict_structure.md#dicts-external_dicts_dict_structure-key).
+- `` – [Data column](external_dicts_dict_structure.md#dicts-external_dicts_dict_structure-attributes). There can be a large number of columns.
@@ -63,10 +63,12 @@ Configuration fields:
### Composite key
-The key can be a `tuple` from any types of fields. The [layout](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout) in this case must be `complex_key_hashed` or `complex_key_cache`.
+The key can be a `tuple` of fields of any type. The [layout](external_dicts_dict_layout.md#dicts-external_dicts_dict_layout) in this case must be `complex_key_hashed` or `complex_key_cache`.
-A composite key can consist of a single element.
This makes it possible to use a string as the key, for instance.
+
+A composite key can also consist of a single element, which makes it possible to use a string as the key, for instance.
+
The key structure is set in the element ``. Key fields are specified in the same format as the dictionary [attributes](external_dicts_dict_structure.md#dicts-external_dicts_dict_structure-attributes). Example:
@@ -117,6 +119,6 @@ Configuration fields:
- `null_value` – The default value for a non-existing element. In the example, it is an empty string.
- `expression` – The attribute can be an expression. The tag is not required.
- `hierarchical` – Hierarchical support. Mirrored to the parent identifier. By default, ` false`.
-- `injective` – Whether the `id -> attribute` image is injective. If ` true`, then you can optimize the ` GROUP BY` clause. By default, `false`.
-- `is_object_id` – Whether the query is executed for a MongoDB document by `ObjectID`.
+- `injective` – Whether the `id -> attribute` image is injective. If `true`, then you can optimize the `GROUP BY` clause. By default, `false`.
+- `is_object_id` – Whether the query is executed for a MongoDB document by `ObjectId`.
diff --git a/docs/en/dicts/index.md b/docs/en/dicts/index.md
old mode 100755
new mode 100644
diff --git a/docs/en/dicts/internal_dicts.md b/docs/en/dicts/internal_dicts.md
old mode 100755
new mode 100644
diff --git a/docs/en/formats/capnproto.md b/docs/en/formats/capnproto.md
old mode 100755
new mode 100644
index 918197b2bd9..0d482e20887
--- a/docs/en/formats/capnproto.md
+++ b/docs/en/formats/capnproto.md
@@ -1,26 +1,26 @@
-
-
-# CapnProto
-
-Cap'n Proto is a binary message format similar to Protocol Buffers and Thrift, but not like JSON or MessagePack.
-
-Cap'n Proto messages are strictly typed and not self-describing, meaning they need an external schema description. The schema is applied on the fly and cached for each query.
-
-```sql
-SELECT SearchPhrase, count() AS c FROM test.hits
- GROUP BY SearchPhrase FORMAT CapnProto SETTINGS schema = 'schema:Message'
-```
-
-Where `schema.capnp` looks like this:
-
-```
-struct Message {
- SearchPhrase @0 :Text;
- c @1 :Uint64;
-}
-```
-
-Schema files are in the file that is located in the directory specified in [ format_schema_path](../operations/server_settings/settings.md#server_settings-format_schema_path) in the server configuration.
-
-Deserialization is effective and usually doesn't increase the system load.
-
+
+
+# CapnProto
+
+Cap'n Proto is a binary message format similar to Protocol Buffers and Thrift, but not like JSON or MessagePack.
+
+Cap'n Proto messages are strictly typed and not self-describing, meaning they need an external schema description. The schema is applied on the fly and cached for each query.
+
+```sql
+SELECT SearchPhrase, count() AS c FROM test.hits
+ GROUP BY SearchPhrase FORMAT CapnProto SETTINGS schema = 'schema:Message'
+```
+
+Where `schema.capnp` looks like this:
+
+```
+struct Message {
+ SearchPhrase @0 :Text;
+ c @1 :UInt64;
+}
+```
+
+The schema file is located in the directory specified by [format_schema_path](../operations/server_settings/settings.md#server_settings-format_schema_path) in the server configuration.
+
+Deserialization is efficient and usually doesn't increase the system load.
+
diff --git a/docs/en/formats/csv.md b/docs/en/formats/csv.md
old mode 100755
new mode 100644
diff --git a/docs/en/formats/csvwithnames.md b/docs/en/formats/csvwithnames.md
old mode 100755
new mode 100644
diff --git a/docs/en/formats/index.md b/docs/en/formats/index.md
old mode 100755
new mode 100644
index 112a13ff5e5..815a2d060cb
--- a/docs/en/formats/index.md
+++ b/docs/en/formats/index.md
@@ -3,4 +3,3 @@
# Formats
The format determines how data is returned to you after SELECTs (how it is written and formatted by the server), and how it is accepted for INSERTs (how it is read and parsed by the server).
-
diff --git a/docs/en/formats/json.md b/docs/en/formats/json.md
old mode 100755
new mode 100644
diff --git a/docs/en/formats/jsoncompact.md b/docs/en/formats/jsoncompact.md
old mode 100755
new mode 100644
index d870b6dff08..e4ce0867bc2
--- a/docs/en/formats/jsoncompact.md
+++ b/docs/en/formats/jsoncompact.md
@@ -24,7 +24,7 @@ Example:
["bathroom interior design", "2166"],
["yandex", "1655"],
["spring 2014 fashion", "1549"],
- ["freeform photo", "1480"]
+ ["freeform photos", "1480"]
],
"totals": ["","8873898"],
diff --git a/docs/en/formats/jsoneachrow.md b/docs/en/formats/jsoneachrow.md
old mode 100755
new mode 100644
diff --git a/docs/en/formats/native.md b/docs/en/formats/native.md
old mode 100755
new mode 100644
diff --git a/docs/en/formats/null.md b/docs/en/formats/null.md
old mode 100755
new mode 100644
diff --git a/docs/en/formats/pretty.md b/docs/en/formats/pretty.md
old mode 100755
new mode 100644
diff --git a/docs/en/formats/prettycompact.md b/docs/en/formats/prettycompact.md
old mode 100755
new mode 100644
diff --git a/docs/en/formats/prettycompactmonoblock.md b/docs/en/formats/prettycompactmonoblock.md
old mode 100755
new mode 100644
diff --git a/docs/en/formats/prettynoescapes.md b/docs/en/formats/prettynoescapes.md
old mode 100755
new mode 100644
diff --git a/docs/en/formats/prettyspace.md b/docs/en/formats/prettyspace.md
old mode 100755
new mode 100644
diff --git a/docs/en/formats/rowbinary.md b/docs/en/formats/rowbinary.md
old mode 100755
new mode 100644
index aeb3df4c8a8..bc8479332ba
--- a/docs/en/formats/rowbinary.md
+++ b/docs/en/formats/rowbinary.md
@@ -9,5 +9,5 @@ Date is represented as a UInt16 object that contains the number of days since 19
String is represented as a varint length (unsigned [LEB128](https://en.wikipedia.org/wiki/LEB128)), followed by the bytes of the string.
FixedString is represented simply as a sequence of bytes.
-Array is represented as a varint length (unsigned [LEB128](https://en.wikipedia.org/wiki/LEB128)), followed by successive elements of the array.
+Arrays are represented as a varint length (unsigned [LEB128](https://en.wikipedia.org/wiki/LEB128)), followed by the array elements in order.
diff --git a/docs/en/formats/tabseparated.md b/docs/en/formats/tabseparated.md
old mode 100755
new mode 100644
diff --git a/docs/en/formats/tabseparatedraw.md b/docs/en/formats/tabseparatedraw.md
old mode 100755
new mode 100644
diff --git a/docs/en/formats/tabseparatedwithnames.md b/docs/en/formats/tabseparatedwithnames.md
old mode 100755
new mode 100644
diff --git a/docs/en/formats/tabseparatedwithnamesandtypes.md b/docs/en/formats/tabseparatedwithnamesandtypes.md
old mode 100755
new mode 100644
diff --git a/docs/en/formats/tskv.md b/docs/en/formats/tskv.md
old mode 100755
new mode 100644
diff --git a/docs/en/formats/values.md b/docs/en/formats/values.md
old mode 100755
new mode 100644
index a672723f33d..2e929369848
--- a/docs/en/formats/values.md
+++ b/docs/en/formats/values.md
@@ -4,5 +4,5 @@ Prints every row in brackets. Rows are separated by commas. There is no comma af
The minimum set of characters that you need to escape when passing data in Values format: single quotes and backslashes.
-This is the format that is used in `INSERT INTO t VALUES ...`, but you can also use it for formatting query results.
+This is the format that is used in `INSERT INTO t VALUES ...`, but you can also use it for formatting query results.
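
For illustration (the table `t` here is hypothetical), the same format appears in both directions:

```sql
INSERT INTO t VALUES (1, 'hello'), (2, 'world')

SELECT * FROM t FORMAT Values
```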
diff --git a/docs/en/formats/vertical.md b/docs/en/formats/vertical.md
old mode 100755
new mode 100644
diff --git a/docs/en/formats/xml.md b/docs/en/formats/xml.md
old mode 100755
new mode 100644
index 0da55875cc3..f91adec9356
--- a/docs/en/formats/xml.md
+++ b/docs/en/formats/xml.md
@@ -35,7 +35,7 @@ XML format is suitable only for output, not for parsing. Example:
1549
- freeform photo
+ freeform photos
+ 1480
diff --git a/docs/en/functions/arithmetic_functions.md b/docs/en/functions/arithmetic_functions.md
old mode 100755
new mode 100644
diff --git a/docs/en/functions/array_functions.md b/docs/en/functions/array_functions.md
old mode 100755
new mode 100644
index 232f6a20427..493a465ac82
--- a/docs/en/functions/array_functions.md
+++ b/docs/en/functions/array_functions.md
@@ -39,7 +39,7 @@ Accepts an empty array and returns a one-element array that is equal to the defa
Returns an array of numbers from 0 to N-1.
Just in case, an exception is thrown if arrays with a total length of more than 100,000,000 elements are created in a data block.
-## array(x1, ...), оператор \[x1, ...\]
+## array(x1, ...), operator \[x1, ...\]
Creates an array from the function arguments.
The arguments must be constants and have types that have the smallest common type. At least one argument must be passed, because otherwise it isn't clear which type of array to create. That is, you can't use this function to create an empty array (to do that, use the 'emptyArray\*' function described above).
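
A minimal sketch of the two equivalent forms:

```sql
SELECT array(1, 2, 3) AS a, [10, 20] AS b
```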
@@ -62,6 +62,7 @@ arrayConcat(arrays)
```sql
SELECT arrayConcat([1, 2], [3, 4], [5, 6]) AS res
```
+
```
┌─res───────────┐
│ [1,2,3,4,5,6] │
@@ -202,6 +203,7 @@ arrayPopBack(array)
```sql
SELECT arrayPopBack([1, 2, 3]) AS res
```
+
```
┌─res───┐
│ [1,2] │
@@ -243,13 +245,14 @@ arrayPushBack(array, single_value)
**Arguments**
- `array` – Array.
-- `single_value` – A single value. Only numbers can be added to an array with numbers, and only strings can be added to an array of strings. When adding numbers, ClickHouse automatically sets the `single_value` type for the data type of the array. For more information about ClickHouse data types, read the section "[Data types](../data_types/index.md#data_types)".
+- `single_value` – A single value. Only numbers can be added to an array with numbers, and only strings can be added to an array of strings. When adding numbers, ClickHouse automatically sets the `single_value` type for the data type of the array. For more information about the types of data in ClickHouse, see "[Data types](../data_types/index.md#data_types)".
**Example**
```sql
SELECT arrayPushBack(['a'], 'b') AS res
```
+
```
┌─res───────┐
│ ['a','b'] │
@@ -267,7 +270,7 @@ arrayPushFront(array, single_value)
**Arguments**
- `array` – Array.
-- `single_value` – A single value. Only numbers can be added to an array with numbers, and only strings can be added to an array of strings. When adding numbers, ClickHouse automatically sets the `single_value` type for the data type of the array. For more information about ClickHouse data types, read the section "[Data types](../data_types/index.md#data_types)".
+- `single_value` – A single value. Only numbers can be added to an array with numbers, and only strings can be added to an array of strings. When adding numbers, ClickHouse automatically sets the `single_value` type for the data type of the array. For more information about the types of data in ClickHouse, see "[Data types](../data_types/index.md#data_types)".
**Example**
@@ -292,7 +295,7 @@ arraySlice(array, offset[, length])
**Arguments**
- `array` – Array of data.
-- `offset` – Indent from the edge of the array. A positive value indicates an offset on the left, and a negative value is an indent on the right. Numbering of the array items begins with 1.
+- `offset` – Offset from the edge of the array. A positive value indicates an offset on the left, and a negative value is an offset on the right. Numbering of the array items begins with 1.
- `length` - The length of the required slice. If you specify a negative value, the function returns an open slice `[offset, array_length - length)`. If you omit the value, the function returns the slice `[offset, the_end_of_array]`.
**Example**
@@ -300,6 +303,7 @@ arraySlice(array, offset[, length])
```sql
SELECT arraySlice([1, 2, 3, 4, 5], 2, 3) AS res
```
+
```
┌─res─────┐
│ [2,3,4] │
diff --git a/docs/en/functions/array_join.md b/docs/en/functions/array_join.md
old mode 100755
new mode 100644
index f94b2707f52..6e18f8203c0
--- a/docs/en/functions/array_join.md
+++ b/docs/en/functions/array_join.md
@@ -28,3 +28,4 @@ SELECT arrayJoin([1, 2, 3] AS src) AS dst, 'Hello', src
│ 3 │ Hello │ [1,2,3] │
└─────┴───────────┴─────────┘
```
+
diff --git a/docs/en/functions/bit_functions.md b/docs/en/functions/bit_functions.md
old mode 100755
new mode 100644
index c5a032aa5d6..523413f200a
--- a/docs/en/functions/bit_functions.md
+++ b/docs/en/functions/bit_functions.md
@@ -15,3 +15,4 @@ The result type is an integer with bits equal to the maximum bits of its argumen
## bitShiftLeft(a, b)
## bitShiftRight(a, b)
+
diff --git a/docs/en/functions/comparison_functions.md b/docs/en/functions/comparison_functions.md
old mode 100755
new mode 100644
index 9b95966ba84..e37642d42ed
--- a/docs/en/functions/comparison_functions.md
+++ b/docs/en/functions/comparison_functions.md
@@ -15,7 +15,7 @@ For example, you can't compare a date with a string. You have to use a function
Strings are compared by bytes. A shorter string is smaller than all strings that start with it and that contain at least one more character.
-Note. Up until version 1.1.54134, signed and unsigned numbers were compared the same way as in C++. In other words, you could get an incorrect result in cases like SELECT 9223372036854775807 > -1. This behavior changed in version 1.1.54134 and is now mathematically correct.
+Note: Up until version 1.1.54134, signed and unsigned numbers were compared the same way as in C++. In other words, you could get an incorrect result in cases like SELECT 9223372036854775807 > -1. This behavior changed in version 1.1.54134 and is now mathematically correct.
## equals, a = b and a == b operator
diff --git a/docs/en/functions/conditional_functions.md b/docs/en/functions/conditional_functions.md
old mode 100755
new mode 100644
diff --git a/docs/en/functions/date_time_functions.md b/docs/en/functions/date_time_functions.md
old mode 100755
new mode 100644
index a7529e5f0e1..da6ad9b4a7c
--- a/docs/en/functions/date_time_functions.md
+++ b/docs/en/functions/date_time_functions.md
@@ -79,10 +79,6 @@ Rounds down a date with time to the start of the minute.
Rounds down a date with time to the start of the hour.
-## toStartOfFifteenMinutes
-
-Rounds down the date with time to the start of the fifteen-minute interval.
-
Note: If you need to round a date with time to any other number of seconds, minutes, or hours, you can convert it into a number by using the toUInt32 function, then round the number using intDiv and multiplication, and convert it back using the toDateTime function.
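
As a sketch of that approach, rounding the current time down to a 10-minute boundary might look like:

```sql
SELECT toDateTime(intDiv(toUInt32(now()), 600) * 600) AS rounded_to_10_minutes
```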
## toStartOfHour
diff --git a/docs/en/functions/encoding_functions.md b/docs/en/functions/encoding_functions.md
old mode 100755
new mode 100644
diff --git a/docs/en/functions/ext_dict_functions.md b/docs/en/functions/ext_dict_functions.md
old mode 100755
new mode 100644
index 002e2f55845..949e805d9ab
--- a/docs/en/functions/ext_dict_functions.md
+++ b/docs/en/functions/ext_dict_functions.md
@@ -18,20 +18,18 @@ For information on connecting and configuring external dictionaries, see "[Exter
`dictGetT('dict_name', 'attr_name', id)`
-- Get the value of the attr_name attribute from the dict_name dictionary using the 'id' key.
-`dict_name` and `attr_name` are constant strings.
-`id`must be UInt64.
+- Get the value of the attr_name attribute from the dict_name dictionary using the 'id' key. `dict_name` and `attr_name` are constant strings. `id` must be UInt64.
If there is no `id` key in the dictionary, it returns the default value specified in the dictionary description.
## dictGetTOrDefault
`dictGetT('dict_name', 'attr_name', id, default)`
-The same as the `dictGetT` functions, but the default value is taken from the function's last argument.
+Similar to the `dictGetT` functions, but the default value is taken from the function's last argument.
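
A sketch of both forms (the dictionary name `countries` and attribute `name` here are hypothetical):

```sql
SELECT
    dictGetString('countries', 'name', toUInt64(42)) AS name,
    dictGetStringOrDefault('countries', 'name', toUInt64(42), 'unknown') AS name_or_default
```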
## dictIsIn
-`dictIsIn('dict_name', child_id, ancestor_id)`
+`dictIsIn ('dict_name', child_id, ancestor_id)`
- For the 'dict_name' hierarchical dictionary, finds out whether the 'child_id' key is located inside 'ancestor_id' (or matches 'ancestor_id'). Returns UInt8.
diff --git a/docs/en/functions/hash_functions.md b/docs/en/functions/hash_functions.md
old mode 100755
new mode 100644
diff --git a/docs/en/functions/higher_order_functions.md b/docs/en/functions/higher_order_functions.md
old mode 100755
new mode 100644
index 15cc40dead1..ab9bdc50786
--- a/docs/en/functions/higher_order_functions.md
+++ b/docs/en/functions/higher_order_functions.md
@@ -73,7 +73,7 @@ Returns the index of the first element in the 'arr1' array for which 'func' retu
### arrayCumSum(\[func,\] arr1, ...)
-Returns an array of partial sums of elements in the source array (a running sum). If the `func` function is specified, then the values of the array elements are converted by this function before summing.
+Returns an array of the partial (cumulative) sums of the elements of the source array. If the 'func' function is specified, the array elements are converted by this function before summing.
Example:
@@ -86,3 +86,4 @@ SELECT arrayCumSum([1, 1, 1, 1]) AS res
│ [1, 2, 3, 4] │
└──────────────┘
```
+
diff --git a/docs/en/functions/in_functions.md b/docs/en/functions/in_functions.md
old mode 100755
new mode 100644
diff --git a/docs/en/functions/index.md b/docs/en/functions/index.md
old mode 100755
new mode 100644
diff --git a/docs/en/functions/ip_address_functions.md b/docs/en/functions/ip_address_functions.md
old mode 100755
new mode 100644
diff --git a/docs/en/functions/json_functions.md b/docs/en/functions/json_functions.md
old mode 100755
new mode 100644
index 90a2ddc47dd..1bf10e9cf0c
--- a/docs/en/functions/json_functions.md
+++ b/docs/en/functions/json_functions.md
@@ -1,11 +1,11 @@
-# Functions for working with JSON
+# Functions for working with JSON
In Yandex.Metrica, JSON is transmitted by users as session parameters. There are some special functions for working with this JSON. (Although in most of the cases, the JSONs are additionally pre-processed, and the resulting values are put in separate columns in their processed format.) All these functions are based on strong assumptions about what the JSON can be, but they try to do as little as possible to get the job done.
The following assumptions are made:
1. The field name (function argument) must be a constant.
-2. The field name is somehow canonically encoded in JSON. For example: `visitParamHas('{"abc":"def"}', 'abc') = 1`, но `visitParamHas('{"\\u0061\\u0062\\u0063":"def"}', 'abc') = 0`
+2. The field name is somehow canonically encoded in JSON. For example: `visitParamHas('{"abc":"def"}', 'abc') = 1`, but `visitParamHas('{"\\u0061\\u0062\\u0063":"def"}', 'abc') = 0`
3. Fields are searched for on any nesting level, indiscriminately. If there are multiple matching fields, the first occurrence is used.
4. The JSON doesn't have space characters outside of string literals.
@@ -47,10 +47,7 @@ Parses the string in double quotes. The value is unescaped. If unescaping failed
Examples:
```text
-visitParamExtractString('{"abc":"\\n\\u0000"}', 'abc') = '\n\0'
-visitParamExtractString('{"abc":"\\u263a"}', 'abc') = '☺'
-visitParamExtractString('{"abc":"\\u263"}', 'abc') = ''
-visitParamExtractString('{"abc":"hello}', 'abc') = ''
+visitParamExtractString('{"abc":"\\n\\u0000"}', 'abc') = '\n\0'visitParamExtractString('{"abc":"\\u263a"}', 'abc') = '☺'visitParamExtractString('{"abc":"\\u263"}', 'abc') = ''visitParamExtractString('{"abc":"hello}', 'abc') = ''
```
There is currently no support for code points in the format `\uXXXX\uYYYY` that are not from the basic multilingual plane (they are converted to CESU-8 instead of UTF-8).
diff --git a/docs/en/functions/logical_functions.md b/docs/en/functions/logical_functions.md
old mode 100755
new mode 100644
index d396640a49d..4ef0fe5fd32
--- a/docs/en/functions/logical_functions.md
+++ b/docs/en/functions/logical_functions.md
@@ -11,3 +11,4 @@ Zero as an argument is considered "false," while any non-zero value is considere
## not, NOT operator
## xor
+
diff --git a/docs/en/functions/math_functions.md b/docs/en/functions/math_functions.md
old mode 100755
new mode 100644
index 42e3f3e8018..d606c87a509
--- a/docs/en/functions/math_functions.md
+++ b/docs/en/functions/math_functions.md
@@ -97,3 +97,4 @@ The arc tangent.
## pow(x, y)
xy.
+
diff --git a/docs/en/functions/other_functions.md b/docs/en/functions/other_functions.md
old mode 100755
new mode 100644
index 8a0063750fe..befd94ecd4e
--- a/docs/en/functions/other_functions.md
+++ b/docs/en/functions/other_functions.md
@@ -59,8 +59,7 @@ For elements in a nested data structure, the function checks for the existence o
Allows building a unicode-art diagram.
-`bar (x, min, max, width)` – Draws a band with a width proportional to (x - min) and equal to 'width' characters when x == max.
-`min, max` – Integer constants. The value must fit in Int64.`width` – Constant, positive number, may be a fraction.
+`bar(x, min, max, width)` – Draws a band with a width proportional to (x - min) and equal to 'width' characters when x == max.
+`min, max` – Integer constants. The value must fit in Int64.
+`width` – Constant, positive number, may be a fraction.
The band is drawn with accuracy to one eighth of a symbol.
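
For illustration, a minimal sketch using the `system.numbers` table:

```sql
SELECT
    number AS x,
    bar(x, 0, 9, 10) AS chart
FROM system.numbers
LIMIT 10
```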
@@ -138,7 +137,7 @@ Example:
```sql
SELECT
- transform(SearchEngineID, [2, 3], ['Yandex', 'Google'], 'Other') AS title,
+ transform(SearchEngineID, [2, 3], ['Yandex', 'Google'], 'Other') AS title,
count() AS c
FROM test.hits
WHERE SearchEngineID != 0
diff --git a/docs/en/functions/random_functions.md b/docs/en/functions/random_functions.md
old mode 100755
new mode 100644
diff --git a/docs/en/functions/rounding_functions.md b/docs/en/functions/rounding_functions.md
old mode 100755
new mode 100644
diff --git a/docs/en/functions/splitting_merging_functions.md b/docs/en/functions/splitting_merging_functions.md
old mode 100755
new mode 100644
diff --git a/docs/en/functions/string_functions.md b/docs/en/functions/string_functions.md
old mode 100755
new mode 100644
diff --git a/docs/en/functions/string_replace_functions.md b/docs/en/functions/string_replace_functions.md
old mode 100755
new mode 100644
index d70d8f404de..d3773504278
--- a/docs/en/functions/string_replace_functions.md
+++ b/docs/en/functions/string_replace_functions.md
@@ -76,3 +76,4 @@ SELECT replaceRegexpAll('Hello, World!', '^', 'here: ') AS res
│ here: Hello, World! │
└─────────────────────┘
```
+
diff --git a/docs/en/functions/string_search_functions.md b/docs/en/functions/string_search_functions.md
old mode 100755
new mode 100644
diff --git a/docs/en/functions/type_conversion_functions.md b/docs/en/functions/type_conversion_functions.md
old mode 100755
new mode 100644
diff --git a/docs/en/functions/url_functions.md b/docs/en/functions/url_functions.md
old mode 100755
new mode 100644
diff --git a/docs/en/functions/ym_dict_functions.md b/docs/en/functions/ym_dict_functions.md
old mode 100755
new mode 100644
index 7ba7e7012cf..540b5dd601a
--- a/docs/en/functions/ym_dict_functions.md
+++ b/docs/en/functions/ym_dict_functions.md
@@ -21,9 +21,7 @@ All functions for working with regions have an optional argument at the end –
Example:
```text
-regionToCountry(RegionID) – Uses the default dictionary: /opt/geo/regions_hierarchy.txt
-regionToCountry(RegionID, '') – Uses the default dictionary: /opt/geo/regions_hierarchy.txt
-regionToCountry(RegionID, 'ua') – Uses the dictionary for the 'ua' key: /opt/geo/regions_hierarchy_ua.txt
+regionToCountry(RegionID) – Uses the default dictionary: /opt/geo/regions_hierarchy.txt
+regionToCountry(RegionID, '') – Uses the default dictionary: /opt/geo/regions_hierarchy.txt
+regionToCountry(RegionID, 'ua') – Uses the dictionary for the 'ua' key: /opt/geo/regions_hierarchy_ua.txt
```
### regionToCity(id[, geobase])
@@ -35,9 +33,7 @@ Accepts a UInt32 number – the region ID from the Yandex geobase. If this regio
Converts a region to an area (type 5 in the geobase). In every other way, this function is the same as 'regionToCity'.
```sql
-SELECT DISTINCT regionToName(regionToArea(toUInt32(number), 'ua'))
-FROM system.numbers
-LIMIT 15
+SELECT DISTINCT regionToName(regionToArea(toUInt32(number), 'ua'))
+FROM system.numbers
+LIMIT 15
```
```text
@@ -65,9 +61,7 @@ LIMIT 15
Converts a region to a federal district (type 4 in the geobase). In every other way, this function is the same as 'regionToCity'.
```sql
-SELECT DISTINCT regionToName(regionToDistrict(toUInt32(number), 'ua'))
-FROM system.numbers
-LIMIT 15
+SELECT DISTINCT regionToName(regionToDistrict(toUInt32(number), 'ua'))
+FROM system.numbers
+LIMIT 15
```
```text
diff --git a/docs/en/getting_started/example_datasets/amplab_benchmark.md b/docs/en/getting_started/example_datasets/amplab_benchmark.md
old mode 100755
new mode 100644
diff --git a/docs/en/getting_started/example_datasets/criteo.md b/docs/en/getting_started/example_datasets/criteo.md
old mode 100755
new mode 100644
index 3c60a68f430..9b59d6e5f3d
--- a/docs/en/getting_started/example_datasets/criteo.md
+++ b/docs/en/getting_started/example_datasets/criteo.md
@@ -66,8 +66,6 @@ CREATE TABLE criteo
Transform data from the raw log and put it in the second table:
```sql
-INSERT INTO criteo SELECT date, clicked, int1, int2, int3, int4, int5, int6, int7, int8, int9, int10, int11, int12, int13, reinterpretAsUInt32(unhex(cat1)) AS icat1, reinterpretAsUInt32(unhex(cat2)) AS icat2, reinterpretAsUInt32(unhex(cat3)) AS icat3, reinterpretAsUInt32(unhex(cat4)) AS icat4, reinterpretAsUInt32(unhex(cat5)) AS icat5, reinterpretAsUInt32(unhex(cat6)) AS icat6, reinterpretAsUInt32(unhex(cat7)) AS icat7, reinterpretAsUInt32(unhex(cat8)) AS icat8, reinterpretAsUInt32(unhex(cat9)) AS icat9, reinterpretAsUInt32(unhex(cat10)) AS icat10, reinterpretAsUInt32(unhex(cat11)) AS icat11, reinterpretAsUInt32(unhex(cat12)) AS icat12, reinterpretAsUInt32(unhex(cat13)) AS icat13, reinterpretAsUInt32(unhex(cat14)) AS icat14, reinterpretAsUInt32(unhex(cat15)) AS icat15, reinterpretAsUInt32(unhex(cat16)) AS icat16, reinterpretAsUInt32(unhex(cat17)) AS icat17, reinterpretAsUInt32(unhex(cat18)) AS icat18, reinterpretAsUInt32(unhex(cat19)) AS icat19, reinterpretAsUInt32(unhex(cat20)) AS icat20, reinterpretAsUInt32(unhex(cat21)) AS icat21, reinterpretAsUInt32(unhex(cat22)) AS icat22, reinterpretAsUInt32(unhex(cat23)) AS icat23, reinterpretAsUInt32(unhex(cat24)) AS icat24, reinterpretAsUInt32(unhex(cat25)) AS icat25, reinterpretAsUInt32(unhex(cat26)) AS icat26 FROM criteo_log;
-
-DROP TABLE criteo_log;
+INSERT INTO criteo SELECT date, clicked, int1, int2, int3, int4, int5, int6, int7, int8, int9, int10, int11, int12, int13, reinterpretAsUInt32(unhex(cat1)) AS icat1, reinterpretAsUInt32(unhex(cat2)) AS icat2, reinterpretAsUInt32(unhex(cat3)) AS icat3, reinterpretAsUInt32(unhex(cat4)) AS icat4, reinterpretAsUInt32(unhex(cat5)) AS icat5, reinterpretAsUInt32(unhex(cat6)) AS icat6, reinterpretAsUInt32(unhex(cat7)) AS icat7, reinterpretAsUInt32(unhex(cat8)) AS icat8, reinterpretAsUInt32(unhex(cat9)) AS icat9, reinterpretAsUInt32(unhex(cat10)) AS icat10, reinterpretAsUInt32(unhex(cat11)) AS icat11, reinterpretAsUInt32(unhex(cat12)) AS icat12, reinterpretAsUInt32(unhex(cat13)) AS icat13, reinterpretAsUInt32(unhex(cat14)) AS icat14, reinterpretAsUInt32(unhex(cat15)) AS icat15, reinterpretAsUInt32(unhex(cat16)) AS icat16, reinterpretAsUInt32(unhex(cat17)) AS icat17, reinterpretAsUInt32(unhex(cat18)) AS icat18, reinterpretAsUInt32(unhex(cat19)) AS icat19, reinterpretAsUInt32(unhex(cat20)) AS icat20, reinterpretAsUInt32(unhex(cat21)) AS icat21, reinterpretAsUInt32(unhex(cat22)) AS icat22, reinterpretAsUInt32(unhex(cat23)) AS icat23, reinterpretAsUInt32(unhex(cat24)) AS icat24, reinterpretAsUInt32(unhex(cat25)) AS icat25, reinterpretAsUInt32(unhex(cat26)) AS icat26 FROM criteo_log;
+
+DROP TABLE criteo_log;
```
diff --git a/docs/en/getting_started/example_datasets/nyc_taxi.md b/docs/en/getting_started/example_datasets/nyc_taxi.md
old mode 100755
new mode 100644
index a9f04f595d1..11ed81d1a43
--- a/docs/en/getting_started/example_datasets/nyc_taxi.md
+++ b/docs/en/getting_started/example_datasets/nyc_taxi.md
@@ -1,8 +1,8 @@
-# New York Taxi data
+# New York Taxi data
-## How to import the raw data
+## How to import raw data
-See and for the description of the dataset and instructions for downloading.
+See and for description of the dataset and loading instructions.
Downloading will result in about 227 GB of uncompressed data in CSV files. The download takes about an hour over a 1 Gbit connection (parallel downloading from s3.amazonaws.com recovers at least half of a 1 Gbit channel).
Some of the files might not download fully. Check the file sizes and re-download any that seem doubtful.
@@ -301,14 +301,19 @@ SELECT passenger_count, toYear(pickup_date) AS year, count(*) FROM trips_mergetr
Q4:
```sql
-SELECT passenger_count, toYear(pickup_date) AS year, round(trip_distance) AS distance, count(*)FROM trips_mergetreeGROUP BY passenger_count, year, distanceORDER BY year, count(*) DESC
+SELECT passenger_count, toYear(pickup_date) AS year, round(trip_distance) AS distance, count(*)
+FROM trips_mergetree
+GROUP BY passenger_count, year, distance
+ORDER BY year, count(*) DESC
```
3.593 seconds.
The following server was used:
-Two Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz, 16 physical kernels total,128 GiB RAM,8x6 TB HD on hardware RAID-5
+Two Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz, 16 physical cores total,
+128 GiB RAM,
+8x6 TB HD on hardware RAID-5
Execution time is the best of three runs. But starting from the second run, queries read data from the file system cache. No further caching occurs: the data is read out and processed in each run.
diff --git a/docs/en/getting_started/example_datasets/ontime.md b/docs/en/getting_started/example_datasets/ontime.md
old mode 100755
new mode 100644
diff --git a/docs/en/getting_started/example_datasets/star_schema.md b/docs/en/getting_started/example_datasets/star_schema.md
old mode 100755
new mode 100644
index 664ba59f48c..8807de3e670
--- a/docs/en/getting_started/example_datasets/star_schema.md
+++ b/docs/en/getting_started/example_datasets/star_schema.md
@@ -1,4 +1,4 @@
-# Star Schema Benchmark
+# Star Schema Benchmark
Compiling dbgen:
@@ -82,4 +82,3 @@ Downloading data (change 'customer' to 'customerd' in the distributed version):
cat customer.tbl | sed 's/$/2000-01-01/' | clickhouse-client --query "INSERT INTO customer FORMAT CSV"
cat lineorder.tbl | clickhouse-client --query "INSERT INTO lineorder FORMAT CSV"
```
-
diff --git a/docs/en/getting_started/example_datasets/wikistat.md b/docs/en/getting_started/example_datasets/wikistat.md
old mode 100755
new mode 100644
index fee0a56b52c..6cbc3b15561
--- a/docs/en/getting_started/example_datasets/wikistat.md
+++ b/docs/en/getting_started/example_datasets/wikistat.md
@@ -20,7 +20,7 @@ CREATE TABLE wikistat
Loading data:
```bash
-for i in {2007..2016}; do for j in {01..12}; do echo $i-$j >&2; curl -sSL "http://dumps.wikimedia.org/other/pagecounts-raw/$i/$i-$j/" | grep -oE 'pagecounts-[0-9]+-[0-9]+\.gz'; done; done | sort | uniq | tee links.txt
+for i in {2007..2016}; do for j in {01..12}; do echo $i-$j >&2; curl -sS "http://dumps.wikimedia.org/other/pagecounts-raw/$i/$i-$j/" | grep -oE 'pagecounts-[0-9]+-[0-9]+\.gz'; done; done | sort | uniq | tee links.txt
cat links.txt | while read link; do wget http://dumps.wikimedia.org/other/pagecounts-raw/$(echo $link | sed -r 's/pagecounts-([0-9]{4})([0-9]{2})[0-9]{2}-[0-9]+\.gz/\1/')/$(echo $link | sed -r 's/pagecounts-([0-9]{4})([0-9]{2})[0-9]{2}-[0-9]+\.gz/\1-\2/')/$link; done
ls -1 /opt/wikistat/ | grep gz | while read i; do echo $i; gzip -cd /opt/wikistat/$i | ./wikistat-loader --time="$(echo -n $i | sed -r 's/pagecounts-([0-9]{4})([0-9]{2})([0-9]{2})-([0-9]{2})([0-9]{2})([0-9]{2})\.gz/\1-\2-\3 \4-00-00/')" | clickhouse-client --query="INSERT INTO wikistat FORMAT TabSeparated"; done
```
diff --git a/docs/en/getting_started/index.md b/docs/en/getting_started/index.md
old mode 100755
new mode 100644
index 07d0d91a224..42fc1c75551
--- a/docs/en/getting_started/index.md
+++ b/docs/en/getting_started/index.md
@@ -16,15 +16,14 @@ The terminal must use UTF-8 encoding (the default in Ubuntu).
For testing and development, the system can be installed on a single server or on a desktop computer.
-### Installing from packages
+### Installing from Debian/Ubuntu packages
In `/etc/apt/sources.list` (or in a separate `/etc/apt/sources.list.d/clickhouse.list` file), add the repository:
```text
-deb http://repo.yandex.ru/clickhouse/trusty stable main
+deb http://repo.yandex.ru/clickhouse/deb/stable/ main/
```
-On other versions of Ubuntu, replace `trusty` with `xenial` or `precise`.
If you want to use the most recent test version, replace 'stable' with 'testing'.
Then run:
@@ -35,10 +34,7 @@ sudo apt-get update
sudo apt-get install clickhouse-client clickhouse-server-common
```
-You can also download and install packages manually from here:
-
-
-
+You can also download and install packages manually from here:
ClickHouse contains access restriction settings. They are located in the 'users.xml' file (next to 'config.xml').
By default, access is allowed from anywhere for the 'default' user, without a password. See 'user/default/networks'.
@@ -104,8 +100,7 @@ clickhouse-client
```
The default parameters indicate connecting with localhost:9000 on behalf of the user 'default' without a password.
-The client can be used for connecting to a remote server.
-Example:
+The client can be used for connecting to a remote server. Example:
```bash
clickhouse-client --host=example.com
@@ -137,4 +132,3 @@ SELECT 1
**Congratulations, the system works!**
To continue experimenting, you can try to download from the test data sets.
-
diff --git a/docs/en/index.md b/docs/en/index.md
old mode 100755
new mode 100644
index 72efa70802b..586c18297a8
--- a/docs/en/index.md
+++ b/docs/en/index.md
@@ -39,7 +39,7 @@ We'll say that the following is true for the OLAP (online analytical processing)
- Data is updated in fairly large batches (> 1000 rows), not by single rows; or it is not updated at all.
- Data is added to the DB but is not modified.
- For reads, quite a large number of rows are extracted from the DB, but only a small subset of columns.
-- Tables are "wide," meaning they contain a large number of columns.
+- Tables are "wide", meaning they contain a large number of columns.
- Queries are relatively rare (usually hundreds of queries per server or less per second).
- For simple queries, latencies around 50 ms are allowed.
- Column values are fairly small: numbers and short strings (for example, 60 bytes per URL).
diff --git a/docs/en/interfaces/cli.md b/docs/en/interfaces/cli.md
old mode 100755
new mode 100644
index 76549b46b36..4fd998fed66
--- a/docs/en/interfaces/cli.md
+++ b/docs/en/interfaces/cli.md
@@ -6,9 +6,7 @@ To work from the command line, you can use ` clickhouse-client`:
$ clickhouse-client
ClickHouse client version 0.0.26176.
Connecting to localhost:9000.
-Connected to ClickHouse server version 0.0.26176.
-
-:)
+Connected to ClickHouse server version 0.0.26176.
+
+:)
```
The client supports command-line options and configuration files. For more information, see "[Configuring](#interfaces_cli_configuration)".
@@ -31,6 +29,7 @@ _EOF
cat file.csv | clickhouse-client --database=test --query="INSERT INTO test FORMAT CSV";
```
+In batch mode, the default data format is TabSeparated. You can set the format in the FORMAT clause of the query (see the short example below).
By default, you can only process a single query in batch mode. To make multiple queries from a "script," use the --multiquery parameter. This works for all queries except INSERT. Query results are output consecutively without additional separators.
Similarly, to process a large number of queries, you can run 'clickhouse-client' for each query. Note that it may take tens of milliseconds to launch the 'clickhouse-client' program.
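A minimal sketch of the batch-mode note above, assuming a ClickHouse server is running on localhost with default credentials; the query and column names are made up for the example:

```bash
# Without a FORMAT clause, batch-mode output is TabSeparated:
echo 'SELECT 1 AS x, 2 AS y' | clickhouse-client
# The FORMAT clause in the query selects a different output format:
echo 'SELECT 1 AS x, 2 AS y FORMAT CSVWithNames' | clickhouse-client
```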
@@ -65,7 +64,7 @@ The command-line client allows passing external data (external temporary tables)
-## Configuring
+## Configuring
You can pass parameters to `clickhouse-client` (all parameters have a default value) using:
diff --git a/docs/en/interfaces/http_interface.md b/docs/en/interfaces/http_interface.md
old mode 100755
new mode 100644
index 38a70feef46..5c989a59d65
--- a/docs/en/interfaces/http_interface.md
+++ b/docs/en/interfaces/http_interface.md
@@ -37,8 +37,7 @@ Date: Fri, 16 Nov 2012 19:21:50 GMT
1
```
-As you can see, curl is somewhat inconvenient in that spaces must be URL escaped.
-Although wget escapes everything itself, we don't recommend using it because it doesn't work well over HTTP 1.1 when using keep-alive and Transfer-Encoding: chunked.
+As you can see, curl is somewhat inconvenient in that spaces must be URL escaped.
+Although wget escapes everything itself, we don't recommend using it because it doesn't work well over HTTP 1.1 when using keep-alive and Transfer-Encoding: chunked.
```bash
$ echo 'SELECT 1' | curl 'http://localhost:8123/' --data-binary @-
@@ -131,11 +130,15 @@ POST 'http://localhost:8123/?query=DROP TABLE t'
For successful requests that don't return a data table, an empty response body is returned.
-You can use compression when transmitting data. The compressed data has a non-standard format, and you will need to use the special compressor program to work with it (sudo apt-get install compressor-metrika-yandex).
+You can use compression when transmitting data.
+To use the internal ClickHouse compression format, you will need the special clickhouse-compressor program to work with it (installed as part of the clickhouse-client package).
If you specified 'compress=1' in the URL, the server will compress the data it sends you.
If you specified 'decompress=1' in the URL, the server will decompress the same data that you pass in the POST method.
+Standard gzip-based HTTP compression can also be used. To send gzip-compressed POST data, add the `Content-Encoding: gzip` header to the request and gzip the POST body.
+To get a compressed response, add the `Accept-Encoding: gzip` header to the request and enable the ClickHouse setting `enable_http_compression` (see the curl sketch below).
+
You can use this to reduce network traffic when transmitting a large amount of data, or for creating dumps that are immediately compressed.
You can use the 'database' URL parameter to specify the default database.
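A minimal curl sketch of the gzip-based HTTP compression described above, assuming a server on localhost:8123 with default settings; passing `enable_http_compression` as a URL parameter is just one way to enable it for a single query:

```bash
# Send a gzip-compressed POST body (note the Content-Encoding header):
echo 'SELECT 1' | gzip | curl -sS -H 'Content-Encoding: gzip' \
    --data-binary @- 'http://localhost:8123/'

# Ask the server for a gzip-compressed response and decompress it locally:
curl -sS -H 'Accept-Encoding: gzip' \
    --data-binary 'SELECT number FROM system.numbers LIMIT 3' \
    'http://localhost:8123/?enable_http_compression=1' | gunzip
```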
@@ -191,7 +194,11 @@ $ echo 'SELECT number FROM system.numbers LIMIT 10' | curl 'http://localhost:812
For information about other parameters, see the section "SET".
-In contrast to the native interface, the HTTP interface does not support the concept of sessions or session settings, does not allow aborting a query (to be exact, it allows this in only a few cases), and does not show the progress of query processing. Parsing and data formatting are performed on the server side, and using the network might be ineffective.
+You can use ClickHouse sessions in the HTTP protocol. To do this, specify the `session_id` GET parameter in the HTTP request. You can use any alphanumeric string as the session_id. By default, a session is terminated after 60 seconds of inactivity. You can change this timeout with `default_session_timeout` in the server config file, or with the `session_timeout` GET parameter. You can also check the session status with the `session_check=1` GET parameter. When using sessions, you can't run two queries with the same session_id simultaneously (see the sketch below).
+
+You can receive information about query execution progress in `X-ClickHouse-Progress` response headers by enabling the `send_progress_in_http_headers` setting.
+
+Running queries are not aborted automatically when the HTTP connection is closed. Parsing and data formatting are performed on the server side, and using the network might be ineffective.
The optional 'query_id' parameter can be passed as the query ID (any string). For more information, see the section "Settings, replace_running_query".
The optional 'quota_key' parameter can be passed as the quota key (any string). For more information, see the section "Quotas".
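A rough sketch of the session and progress-header features described above; `my_session` is an arbitrary identifier and the queries are only illustrative:

```bash
# Requests sharing one session, so the temporary table from the first request
# is visible in the later ones:
curl -sS 'http://localhost:8123/?session_id=my_session' \
    --data-binary 'CREATE TEMPORARY TABLE t (x UInt8)'
curl -sS 'http://localhost:8123/?session_id=my_session' \
    --data-binary 'INSERT INTO t VALUES (1)'
curl -sS 'http://localhost:8123/?session_id=my_session&session_check=1' \
    --data-binary 'SELECT count() FROM t'

# Print only the response headers; X-ClickHouse-Progress headers appear for
# queries that run long enough:
curl -sS -o /dev/null -D - \
    'http://localhost:8123/?send_progress_in_http_headers=1' \
    --data-binary 'SELECT count() FROM system.numbers LIMIT 10000000'
```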
@@ -213,4 +220,3 @@ curl -sS 'http://localhost:8123/?max_result_bytes=4000000&buffer_size=3000000&wa
```
Use buffering to avoid situations where a query processing error occurred after the response code and HTTP headers were sent to the client. In this situation, an error message is written at the end of the response body, and on the client side, the error can only be detected at the parsing stage.
-
diff --git a/docs/en/interfaces/index.md b/docs/en/interfaces/index.md
old mode 100755
new mode 100644
index e43f4474271..3e3e3df4853
--- a/docs/en/interfaces/index.md
+++ b/docs/en/interfaces/index.md
@@ -2,5 +2,4 @@
# Interfaces
-To explore the system's capabilities, download data to tables, or make manual queries, use the clickhouse-client program.
-
+To explore the system's capabilities, download data to tables, or make manual queries, use the clickhouse-client program.
\ No newline at end of file
diff --git a/docs/en/interfaces/jdbc.md b/docs/en/interfaces/jdbc.md
old mode 100755
new mode 100644
diff --git a/docs/en/interfaces/tcp.md b/docs/en/interfaces/tcp.md
old mode 100755
new mode 100644
diff --git a/docs/en/interfaces/third-party_client_libraries.md b/docs/en/interfaces/third-party_client_libraries.md
old mode 100755
new mode 100644
index 8437be23b99..cc8ff1f4307
--- a/docs/en/interfaces/third-party_client_libraries.md
+++ b/docs/en/interfaces/third-party_client_libraries.md
@@ -2,7 +2,7 @@
There are libraries for working with ClickHouse for:
-- Python
+- Python:
- [infi.clickhouse_orm](https://github.com/Infinidat/infi.clickhouse_orm)
- [sqlalchemy-clickhouse](https://github.com/cloudflare/sqlalchemy-clickhouse)
- [clickhouse-driver](https://github.com/mymarilyn/clickhouse-driver)
diff --git a/docs/en/interfaces/third-party_gui.md b/docs/en/interfaces/third-party_gui.md
old mode 100755
new mode 100644
diff --git a/docs/en/introduction/distinctive_features.md b/docs/en/introduction/distinctive_features.md
old mode 100755
new mode 100644
index 59853b8e202..3927405579f
--- a/docs/en/introduction/distinctive_features.md
+++ b/docs/en/introduction/distinctive_features.md
@@ -1,10 +1,10 @@
# Distinctive features of ClickHouse
-## True column-oriented DBMS
+## True column-oriented DBMS
-In a true column-oriented DBMS, there isn't any "garbage" stored with the values. Among other things, this means that constant-length values must be supported, to avoid storing their length "number" next to the values. As an example, a billion UInt8-type values should actually consume around 1 GB uncompressed, or this will strongly affect the CPU use. It is very important to store data compactly (without any "garbage") even when uncompressed, since the speed of decompression (CPU usage) depends mainly on the volume of uncompressed data.
+In a true column-oriented DBMS, there isn't any "garbage" stored with the values. For example, constant-length values must be supported, to avoid storing their length "number" next to the values. As an example, a billion UInt8-type values should actually consume around 1 GB uncompressed, or this will strongly affect the CPU use. It is very important to store data compactly (without any "garbage") even when uncompressed, since the speed of decompression (CPU usage) depends mainly on the volume of uncompressed data.
-This is worth noting because there are systems that can store values of separate columns separately, but that can't effectively process analytical queries due to their optimization for other scenarios. Examples are HBase, BigTable, Cassandra, and HyperTable. In these systems, you will get throughput around a hundred thousand rows per second, but not hundreds of millions of rows per second.
+This is worth noting because there are systems that can store values of separate columns separately, but that can't effectively process analytical queries due to their optimization for other scenarios. Examples are HBase, BigTable, Cassandra, and HyperTable. In these systems, you will get throughput around a hundred thousand rows per second, but not hundreds of millions of rows per second.
Also note that ClickHouse is a DBMS, not a single database. ClickHouse allows creating tables and databases in runtime, loading data, and running queries without reconfiguring and restarting the server.
@@ -12,15 +12,15 @@ Also note that ClickHouse is a DBMS, not a single database. ClickHouse allows cr
Some column-oriented DBMSs (InfiniDB CE and MonetDB) do not use data compression. However, data compression really improves performance.
-## Disk storage of data
+## Disk storage of data
Many column-oriented DBMSs (such as SAP HANA and Google PowerDrill) can only work in RAM. But even on thousands of servers, the RAM is too small for storing all the pageviews and sessions in Yandex.Metrica.
-## Parallel processing on multiple cores
+## Parallel processing on multiple cores
Large queries are parallelized in a natural way.
-## Distributed processing on multiple servers
+## Distributed processing on multiple servers
Almost none of the columnar DBMSs listed above have support for distributed processing.
In ClickHouse, data can reside on different shards. Each shard can be a group of replicas that are used for fault tolerance. The query is processed on all the shards in parallel. This is transparent for the user.
@@ -30,12 +30,12 @@ In ClickHouse, data can reside on different shards. Each shard can be a group of
If you are familiar with standard SQL, we can't really talk about SQL support.
All the functions have different names.
However, this is a declarative query language based on SQL that can't be differentiated from SQL in many instances.
-JOINs are supported. Subqueries are supported in FROM, IN, and JOIN clauses, as well as scalar subqueries.
+JOINs are supported. Subqueries are supported in FROM, IN, and JOIN clauses, as well as scalar subqueries.
Dependent subqueries are not supported.
## Vector engine
-Data is not only stored by columns, but is processed by vectors (parts of columns). This allows us to achieve high CPU performance.
+Data is not only stored by columns, but is processed by vectors – parts of columns. This allows us to achieve high CPU performance.
## Real-time data updates
@@ -43,13 +43,13 @@ ClickHouse supports primary key tables. In order to quickly perform queries on t
## Indexes
-Having a primary key makes it possible to extract data for specific clients (for instance, Yandex.Metrica tracking tags) for a specific time range, with low latency less than several dozen milliseconds.
+Having a primary key allows, for example, extracting data for specific clients (Metrica counters) for a specific time range with low latency of less than a few dozen milliseconds.
## Suitable for online queries
This lets us use the system as the back-end for a web interface. Low latency means queries can be processed without delay, while the Yandex.Metrica interface page is loading. In other words, in online mode.
-## Support for approximated calculations
+## Support for approximated calculations
1. The system contains aggregate functions for approximated calculation of the number of various values, medians, and quantiles.
2. Supports running a query based on a part (sample) of data and getting an approximated result. In this case, proportionally less data is retrieved from the disk.
diff --git a/docs/en/introduction/features_considered_disadvantages.md b/docs/en/introduction/features_considered_disadvantages.md
old mode 100755
new mode 100644
diff --git a/docs/en/introduction/index.md b/docs/en/introduction/index.md
old mode 100755
new mode 100644
index 3d07efe555d..e10b99d0138
--- a/docs/en/introduction/index.md
+++ b/docs/en/introduction/index.md
@@ -1,2 +1 @@
# Introduction
-
diff --git a/docs/en/introduction/performance.md b/docs/en/introduction/performance.md
old mode 100755
new mode 100644
diff --git a/docs/en/introduction/possible_silly_questions.md b/docs/en/introduction/possible_silly_questions.md
old mode 100755
new mode 100644
index cf7b2c48032..36363ebe247
--- a/docs/en/introduction/possible_silly_questions.md
+++ b/docs/en/introduction/possible_silly_questions.md
@@ -1,8 +1,8 @@
-# Questions you were afraid to ask
+# Everything you were afraid to ask
## Why not use something like MapReduce?
-We can refer to systems like map-reduce as distributed computing systems in which the reduce operation is based on distributed sorting. In this sense, they include Hadoop, and YT (YT is developed at Yandex for internal use).
+We can refer to systems like map-reduce as distributed computing systems in which the reduce operation is based on distributed sorting. In this sense, they include Hadoop and YT (Yandex proprietary technology).
These systems aren't appropriate for online queries due to their high latency. In other words, they can't be used as the back-end for a web interface.
These types of systems aren't useful for real-time data updates.
diff --git a/docs/en/introduction/ya_metrika_task.md b/docs/en/introduction/ya_metrika_task.md
old mode 100755
new mode 100644
index 10f45f061d6..6a488be9b5f
--- a/docs/en/introduction/ya_metrika_task.md
+++ b/docs/en/introduction/ya_metrika_task.md
@@ -1,6 +1,6 @@
-# Yandex.Metrica use case
+# The Yandex.Metrica task
-ClickHouse currently powers [Yandex.Metrica](https://metrika.yandex.ru/), [the second largest web analytics platform in the world](http://w3techs.com/technologies/overview/traffic_analysis/all). With more than 13 trillion records in the database and more than 20 billion events daily, ClickHouse allows you generating custom reports on the fly directly from non-aggregated data.
+ClickHouse currently powers [Yandex.Metrica](https://metrika.yandex.ru/), [the second largest web analytics platform in the world](http://w3techs.com/technologies/overview/traffic_analysis/all). With more than 13 trillion records in the database and more than 20 billion events daily, ClickHouse allows generating custom reports on the fly directly from non-aggregated data.
We need to get custom reports based on hits and sessions, with custom segments set by the user. Data for the reports is updated in real-time. Queries must be run immediately (in online mode). We must be able to build reports for any time period. Complex aggregates must be calculated, such as the number of unique visitors.
At this time (April 2014), Yandex.Metrica receives approximately 12 billion events (pageviews and mouse clicks) daily. All these events must be stored in order to build custom reports. A single query may require scanning hundreds of millions of rows over a few seconds, or millions of rows in no more than a few hundred milliseconds.
diff --git a/docs/en/operations/access_rights.md b/docs/en/operations/access_rights.md
old mode 100755
new mode 100644
index 1c72bf13b3e..9879dab9a99
--- a/docs/en/operations/access_rights.md
+++ b/docs/en/operations/access_rights.md
@@ -2,14 +2,14 @@
Users and access rights are set up in the user config. This is usually `users.xml`.
-Users are recorded in the 'users' section. Here is a fragment of the `users.xml` file:
+Users are recorded in the 'users' section. We'll look at a fragment of the `users.xml` file:
```xml
-
-
-
-
+ How to generate a decent password:
+ Execute: PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha256sum | tr -d '-'
+ The first line has the password and the second line has the corresponding SHA256 (a shell example follows at the end of this fragment).
+ -->
+
-
-
-
- default
-
+
+ ::/0
+
+ -->
+
+
+
+ default
+
default
-
-
-
-
-
-
- web
- default
-
- test
+
+
+
+
+
+
+ web
+ default
+
+ test
```
You can see declarations of two users: `default` and `web`. We added the `web` user separately.
-The `default` user is chosen in cases when the username is not passed. The `default` user is also used for distributed query processing, if the configuration of the server or cluster doesn't specify the `user` and `password` (see the section on the [Distributed](../table_engines/distributed.md#table_engines-distributed) engine).
+The `default` user is chosen in cases when the username is not passed. The `default` user is also used for distributed query processing, if the configuration of the server or cluster doesn't specify `user` and `password` (see the section on the [Distributed](../table_engines/distributed.md#distributed_distributed) engine).
The user that is used for exchanging information between servers combined in a cluster must not have substantial restrictions or quotas – otherwise, distributed queries will fail.
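The commented-out hint in the config fragment above amounts to the following shell steps. The `password_sha256_hex` element name is an assumption based on the usual `users.xml` layout, since the XML tags were lost in this fragment:

```bash
# Generate a random password and the SHA256 digest to store in users.xml.
PASSWORD=$(base64 < /dev/urandom | head -c8)
echo "$PASSWORD"                              # the plain-text password for the user
echo -n "$PASSWORD" | sha256sum | tr -d '-'   # the hex digest for password_sha256_hex
```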
diff --git a/docs/en/operations/configuration_files.md b/docs/en/operations/configuration_files.md
old mode 100755
new mode 100644
index 52e9e10ffea..c3122617bf1
--- a/docs/en/operations/configuration_files.md
+++ b/docs/en/operations/configuration_files.md
@@ -14,7 +14,7 @@ If `replace` is specified, it replaces the entire element with the specified one
If ` remove` is specified, it deletes the element.
-The config can also define "substitutions". If an element has the `incl` attribute, the corresponding substitution from the file will be used as the value. By default, the path to the file with substitutions is `/etc/metrika.xml`. This can be changed in the [include_from](server_settings/settings.md#server_settings-include_from) element in the server config. The substitution values are specified in `/yandex/substitution_name` elements in this file. If a substitution specified in ` incl` does not exist, it is recorded in the log. To prevent ClickHouse from logging missing substitutions, specify the `optional="true"` attribute (for example, settings for [macros]()server_settings/settings.md#server_settings-macros)).
+The config can also define "substitutions". If an element has the `incl` attribute, the corresponding substitution from the file will be used as the value. By default, the path to the file with substitutions is `/etc/metrika.xml`. This can be changed in the config in the [include_from](server_settings/settings.md#server_settings-include_from) element. The substitution values are specified in `/yandex/substitution_name` elements in this file. If a substitution specified in ` incl` does not exist, it is recorded in the log. To prevent ClickHouse from logging missing substitutions, specify the `optional = "true"` attribute (for example, settings for [ macros](server_settings/settings.md#server_settings-macros)).
Substitutions can also be performed from ZooKeeper. To do this, specify the attribute `from_zk = "/path/to/node"`. The element value is replaced with the contents of the node at ` /path/to/node` in ZooKeeper. You can also put an entire XML subtree on the ZooKeeper node and it will be fully inserted into the source element.
diff --git a/docs/en/operations/index.md b/docs/en/operations/index.md
old mode 100755
new mode 100644
index 0ff38af8086..eb90f937cff
--- a/docs/en/operations/index.md
+++ b/docs/en/operations/index.md
@@ -1,2 +1 @@
-# Usage
-
+# Operation
diff --git a/docs/en/operations/quotas.md b/docs/en/operations/quotas.md
old mode 100755
new mode 100644
index fb1238b257d..d7b1a61ce7f
--- a/docs/en/operations/quotas.md
+++ b/docs/en/operations/quotas.md
@@ -18,10 +18,10 @@ Let's look at the section of the 'users.xml' file that defines quotas.
-
+
- 3600
-
+ 3600
+
00
@@ -39,19 +39,21 @@ The resource consumption calculated for each interval is output to the server lo
-
- 3600
- 1000
- 100
- 1000000000
- 100000000000
- 900
-
+
+ 3600
-
- 86400
- 10000
- 1000
+ 1000
+ 100
+ 1000000000
+ 100000000000
+ 900
+
+
+
+ 86400
+
+ 10000
+ 1000
+ 5000000000
+ 500000000000
+ 7200
@@ -87,7 +89,7 @@ Quotas can use the "quota key" feature in order to report on resources for multi
Using keys makes sense only if quota_key is transmitted by the program, not by a user.
You can also write it so the IP address is used as the quota key. (But keep in mind that users can change the IPv6 address fairly easily.)
- -->
+ -->
```
@@ -96,3 +98,4 @@ The quota is assigned to users in the 'users' section of the config. See the sec
For distributed query processing, the accumulated amounts are stored on the requestor server. So if the user goes to another server, the quota there will "start over".
When the server is restarted, quotas are reset.
+
diff --git a/docs/en/operations/server_settings/index.md b/docs/en/operations/server_settings/index.md
old mode 100755
new mode 100644
index 208deec710c..2293e86f5c7
--- a/docs/en/operations/server_settings/index.md
+++ b/docs/en/operations/server_settings/index.md
@@ -9,4 +9,3 @@ These settings are stored in the ` config.xml` file on the ClickHouse server.
Other settings are described in the "[Settings](../settings/index.md#settings)" section.
Before studying the settings, read the [Configuration files](../configuration_files.md#configuration_files) section and note the use of substitutions (the `incl` and `optional` attributes).
-
diff --git a/docs/en/operations/server_settings/settings.md b/docs/en/operations/server_settings/settings.md
old mode 100755
new mode 100644
index e1575df2f88..8818f8ec932
--- a/docs/en/operations/server_settings/settings.md
+++ b/docs/en/operations/server_settings/settings.md
@@ -67,7 +67,7 @@ ClickHouse checks ` min_part_size` and ` min_part_size_ratio` and processes th
The default database.
-To get a list of databases, use the [ SHOW DATABASES]( query./../query_language/queries.md#query_language_queries_show_databases).
+Use a [ SHOW DATABASES](../../query_language/queries.md#query_language_queries_show_databases) query to get a list of databases.
**Example**
@@ -81,7 +81,7 @@ To get a list of databases, use the [ SHOW DATABASES]( query./../query_language/
Default settings profile.
-Settings profiles are located in the file specified in the parameter [user_config](#server_settings-users_config).
+Settings profiles are located in the file specified in the [user_config](#server_settings-users_config) parameter.
**Example**
@@ -100,7 +100,7 @@ Path:
- Specify the absolute path or the path relative to the server config file.
- The path can contain wildcards \* and ?.
-See also "[External dictionaries]("./../dicts/external_dicts.md#dicts-external_dicts)".
+See also "[External dictionaries](../../dicts/external_dicts.md#dicts-external_dicts)".
**Example**
@@ -130,12 +130,12 @@ The default is ` true`.
## format_schema_path
-The path to the directory with the schemes for the input data, such as schemas for the [CapnProto](../../formats/capnproto.md#format_capnproto) format.
+The path to the directory with the schemas for the input data, such as schemas for the [ CapnProto](../../formats/capnproto.md#format_capnproto) format.
**Example**
```xml
-
+
format_schemas/
```
@@ -179,7 +179,7 @@ You can configure multiple `` clauses. For instance, you can use this
Settings for thinning data for Graphite.
-For more information, see [GraphiteMergeTree](../../table_engines/graphitemergetree.md#table_engines-graphitemergetree).
+For more details, see [ GraphiteMergeTree](../../table_engines/graphitemergetree.md#table_engines-graphitemergetree).
**Example**
@@ -241,7 +241,7 @@ Opens `https://tabix.io/` when accessing ` http://localhost: http_port`.
The path to the file with substitutions.
-For more information, see the section "[Configuration files](../configuration_files.md#configuration_files)".
+For details, see the section "[Configuration files](../configuration_files.md#configuration_files)".
**Example**
@@ -298,8 +298,7 @@ Restriction on hosts that requests can come from. If you want the server to answ
Examples:
```xml
-::1
-127.0.0.1
+::1
+127.0.0.1
```
@@ -348,7 +347,7 @@ For more information, see the section "[Creating replicated tables](../../table_
## mark_cache_size
-Approximate size (in bytes) of the cache of "marks" used by [MergeTree](../../table_engines/mergetree.md#table_engines-mergetree) family.
+Approximate size (in bytes) of the cache of "marks" used by [ MergeTree](../../table_engines/mergetree.md#table_engines-mergetree) engines.
The cache is shared for the server and memory is allocated as needed. The cache size must be at least 5368709120.
@@ -404,7 +403,7 @@ We recommend using this option in Mac OS X, since the ` getrlimit()` function re
Restriction on deleting tables.
-If the size of a [MergeTree](../../table_engines/mergetree.md#table_engines-mergetree) type table exceeds `max_table_size_to_drop` (in bytes), you can't delete it using a DROP query.
+If the size of a [ MergeTree](../../table_engines/mergetree.md#table_engines-mergetree) type table exceeds `max_table_size_to_drop` (in bytes), you can't delete it using a DROP query.
If you still need to delete the table without restarting the ClickHouse server, create the ` /flags/force_drop_table` file and run the DROP query.
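A hedged sketch of the flag-file workaround above, assuming the default data path `/var/lib/clickhouse/`; `db.big_table` is a placeholder name:

```bash
# Create the flag file, then run the otherwise-blocked DROP query:
sudo touch /var/lib/clickhouse/flags/force_drop_table
sudo chmod 666 /var/lib/clickhouse/flags/force_drop_table   # let the server remove it afterwards
clickhouse-client --query="DROP TABLE db.big_table"
```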
@@ -440,17 +439,17 @@ For more information, see the MergeTreeSettings.h header file.
SSL client/server configuration.
-Support for SSL is provided by the `` libpoco`` library. The interface is described in the file [SSLManager.h](https://github.com/yandex/ClickHouse/blob/master/contrib/libpoco/NetSSL_OpenSSL/include/Poco/Net/SSLManager.h)
+Support for SSL is provided by the ``libpoco`` library. The interface is described in the [SSLManager.h](https://github.com/ClickHouse-Extras/poco/blob/master/NetSSL_OpenSSL/include/Poco/Net/SSLManager.h) file.
Keys for server/client settings:
- privateKeyFile – The path to the file with the secret key of the PEM certificate. The file may contain a key and certificate at the same time.
- certificateFile – The path to the client/server certificate file in PEM format. You can omit it if `` privateKeyFile`` contains the certificate.
- caConfig – The path to the file or directory that contains trusted root certificates.
-- verificationMode – The method for checking the node's certificates. Details are in the description of the [Context](https://github.com/yandex/ClickHouse/blob/master/contrib/libpoco/NetSSL_OpenSSL/include/Poco/Net/Context.h) class. Possible values: ``none``, ``relaxed``, ``strict``, ``once``.
+- verificationMode – The method for checking the node's certificates. Details are in the description of the [Context](https://github.com/ClickHouse-Extras/poco/blob/master/NetSSL_OpenSSL/include/Poco/Net/Context.h) class. Acceptable values: ``none``, ``relaxed``, ``strict``, ``once``.
- verificationDepth – The maximum length of the verification chain. Verification will fail if the certificate chain length exceeds the set value.
- loadDefaultCAFile – Indicates that built-in CA certificates for OpenSSL will be used. Acceptable values: `` true``, `` false``. |
-- cipherList - Поддерживаемые OpenSSL-шифры. For example: `` ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH``.
+- cipherList – Supported OpenSSL ciphers. For example: `` ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH``.
- cacheSessions – Enables or disables caching sessions. Must be used in combination with ``sessionIdContext``. Acceptable values: `` true``, `` false``.
- sessionIdContext – A unique set of random characters that the server appends to each generated identifier. The length of the string must not exceed ``SSL_MAX_SSL_SESSION_ID_LENGTH``. This parameter is always recommended, since it helps avoid problems both if the server caches the session and if the client requested caching. Default value: ``${application.name}``.
- sessionCacheSize – The maximum number of sessions that the server caches. Default value: 1024\*20. 0 – Unlimited sessions.
@@ -499,7 +498,7 @@ Keys for server/client settings:
## part_log
-Logging events that are associated with [MergeTree](../../table_engines/mergetree.md#table_engines-mergetree) data. For instance, adding or merging data. You can use the log to simulate merge algorithms and compare their characteristics. You can visualize the merge process.
+Logging events that are associated with the [MergeTree](../../table_engines/mergetree.md#table_engines-mergetree) family data. For instance, adding or merging data. You can use the log to simulate merge algorithms and compare their characteristics. You can visualize the merge process.
Queries are logged in the ClickHouse table, not in a separate file.
@@ -519,7 +518,7 @@ Use the following parameters to configure logging:
- database – Name of the database.
- table – Name of the table.
-- partition_by – Sets a [custom partitioning key](../../table_engines/custom_partitioning_key.md#custom-partitioning-key).
+- partition_by - Sets the [custom partition key](../../table_engines/custom_partitioning_key.md#custom-partitioning-key).
- flush_interval_milliseconds – Interval for flushing data from memory to the disk.
**Example**
@@ -563,7 +562,7 @@ Use the following parameters to configure logging:
- database – Name of the database.
- table – Name of the table.
-- partition_by – Sets a [custom partitioning key](../../table_engines/custom_partitioning_key.md#custom-partitioning-key).
+- partition_by - Sets the [custom partition key](../../table_engines/custom_partitioning_key.md#custom-partitioning-key).
- flush_interval_milliseconds – Interval for flushing data from memory to the disk.
If the table doesn't exist, ClickHouse will create it. If the structure of the query log changed when the ClickHouse server was updated, the table with the old structure is renamed, and a new table is created automatically.
@@ -585,7 +584,7 @@ If the table doesn't exist, ClickHouse will create it. If the structure of the q
Configuration of clusters used by the Distributed table engine.
-For more information, see the section "[Table engines/Distributed](../../table_engines/distributed.md#table_engines-distributed)".
+For more information, see the section "[Duplicated table engine](../../table_engines/distributed.md#table_engines-distributed)".
**Example**
@@ -645,7 +644,7 @@ The end slash is mandatory.
## uncompressed_cache_size
-Cache size (in bytes) for uncompressed data used by table engines from the [MergeTree](../../table_engines/mergetree.md#table_engines-mergetree) family.
+Cache size (in bytes) for uncompressed data used by table engines from the [ MergeTree](../../table_engines/mergetree.md#table_engines-mergetree) family.
There is one shared cache for the server. Memory is allocated on demand. The cache is used if the option [use_uncompressed_cache](../settings/settings.md#settings-use_uncompressed_cache) is enabled.
diff --git a/docs/en/operations/settings/index.md b/docs/en/operations/settings/index.md
old mode 100755
new mode 100644
index 0c5ca5d5171..0e967a4c081
--- a/docs/en/operations/settings/index.md
+++ b/docs/en/operations/settings/index.md
@@ -9,9 +9,9 @@ Ways to configure settings, in order of priority:
- Settings in the server config file.
- Settings from user profiles.
+ Set via user profiles.
-- Session settings.
+- Session settings.
Send `SET setting=value` from the ClickHouse console client in interactive mode (see the sketch below).
Similarly, you can use ClickHouse sessions in the HTTP protocol. To do this, you need to specify the `session_id` HTTP parameter.
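For illustration, two ways of applying a setting per query or per session, assuming a local server; `max_threads` is used here only as an example setting:

```bash
# Over the HTTP interface, a setting can be passed as a URL parameter:
curl -sS 'http://localhost:8123/?max_threads=2' --data-binary 'SELECT 1'
# In the interactive console client, SET applies for the rest of the session:
#   SET max_threads = 2
```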
diff --git a/docs/en/operations/settings/query_complexity.md b/docs/en/operations/settings/query_complexity.md
old mode 100755
new mode 100644
diff --git a/docs/en/operations/settings/settings.md b/docs/en/operations/settings/settings.md
old mode 100755
new mode 100644
index 25c804b0035..e006f302c68
--- a/docs/en/operations/settings/settings.md
+++ b/docs/en/operations/settings/settings.md
@@ -4,7 +4,7 @@
## distributed_product_mode
-Changes the behavior of [distributed subqueries](../../query_language/queries.md#queries-distributed-subrequests), i.e. in cases when the query contains the product of distributed tables.
+Alters the behavior of [distributed subqueries](../../query_language/queries.md#queries-distributed-subrequests), i.e. in cases when the query contains the product of distributed tables.
ClickHouse applies the configuration if the subqueries on any level have a distributed table that exists on the local server and has more than one shard.
@@ -12,9 +12,9 @@ Restrictions:
- Only applied for IN and JOIN subqueries.
- Used only if a distributed table is used in the FROM clause.
-- Not used for a table-valued [ remote](../../table_functions/remote.md#table_functions-remote) function.
+- Not used for a table-valued [remote](../../table_functions/remote.md#table_functions-remote) function.
-The possible values are:
+Possible values:
@@ -36,7 +36,7 @@ Disables query execution if the index can't be used by date.
Works with tables in the MergeTree family.
-If `force_index_by_date=1`, ClickHouse checks whether the query has a date key condition that can be used for restricting data ranges. If there is no suitable condition, it throws an exception. However, it does not check whether the condition actually reduces the amount of data to read. For example, the condition `Date != ' 2000-01-01 '` is acceptable even when it matches all the data in the table (i.e., running the query requires a full scan). For more information about ranges of data in MergeTree tables, see "[MergeTree](../../table_engines/mergetree.md#table_engines-mergetree)".
+If `force_index_by_date=1`, ClickHouse checks whether the query has a date key condition that can be used for restricting data ranges. If there is no suitable condition, it throws an exception. However, it does not check whether the condition actually reduces the amount of data to read. For example, the condition `Date != ' 2000-01-01 '` is acceptable even when it matches all the data in the table (i.e., running the query requires a full scan). For more information about ranges of data in MergeTree tables, see "[MergeTree](../../table_engines/mergetree.md#table_engines-mergetree)".
@@ -46,7 +46,7 @@ Disables query execution if indexing by the primary key is not possible.
Works with tables in the MergeTree family.
-If `force_primary_key=1`, ClickHouse checks to see if the query has a primary key condition that can be used for restricting data ranges. If there is no suitable condition, it throws an exception. However, it does not check whether the condition actually reduces the amount of data to read. For more information about data ranges in MergeTree tables, see "[MergeTree](../../table_engines/mergetree.md#table_engines-mergetree)".
+If `force_primary_key=1`, ClickHouse checks to see if the query has a primary key condition that can be used for restricting data ranges. If there is no suitable condition, it throws an exception. However, it does not check whether the condition actually reduces the amount of data to read. For more information about ranges of data in MergeTree tables, see "[MergeTree](../../table_engines/mergetree.md#table_engines-mergetree)".
@@ -158,7 +158,7 @@ Don't confuse blocks for compression (a chunk of memory consisting of bytes) and
## min_compress_block_size
-For [MergeTree](../../table_engines/mergetree.md#table_engines-mergetree)" tables. In order to reduce latency when processing queries, a block is compressed when writing the next mark if its size is at least 'min_compress_block_size'. By default, 65,536.
+For [MergeTree](../../table_engines/mergetree.md#table_engines-mergetree) tables. In order to reduce latency when processing queries, a block is compressed when writing the next mark if its size is at least 'min_compress_block_size'. By default, 65,536.
The actual size of the block, if the uncompressed data is less than 'max_compress_block_size', is no less than this value and no less than the volume of data for one mark.
@@ -253,13 +253,13 @@ Yandex.Metrica uses this parameter set to 1 for implementing suggestions for seg
## schema
-This parameter is useful when you are using formats that require a schema definition, such as [Cap'n Proto](https://capnproto.org/). The value depends on the format.
+This parameter is useful when you are using formats that require a schema definition, such as [ Cap'n Proto](https://capnproto.org/). The value depends on the format.
## stream_flush_interval_ms
-Works for tables with streaming in the case of a timeout, or when a thread generates[max_insert_block_size](#settings-settings-max_insert_block_size) rows.
+Works for tables with streaming in the case of a timeout, or when a thread generates [ max_insert_block_size](#settings-settings-max_insert_block_size) rows.
The default value is 7500.
diff --git a/docs/en/operations/settings/settings_profiles.md b/docs/en/operations/settings/settings_profiles.md
old mode 100755
new mode 100644
index f1fce41ba75..c978c599bd5
--- a/docs/en/operations/settings/settings_profiles.md
+++ b/docs/en/operations/settings/settings_profiles.md
@@ -17,7 +17,7 @@ Example:
-
+
8
diff --git a/docs/en/operations/tips.md b/docs/en/operations/tips.md
old mode 100755
new mode 100644
index 652698fe24c..11fc8f6da11
--- a/docs/en/operations/tips.md
+++ b/docs/en/operations/tips.md
@@ -105,7 +105,7 @@ Use at least a 10 GB network, if possible. 1 Gb will also work, but it will be m
You are probably already using ZooKeeper for other purposes. You can use the same installation of ZooKeeper, if it isn't already overloaded.
-It's best to use a fresh version of ZooKeeper – 3.4.9 or later. The version in stable Linux distributions may be outdated.
+It's best to use a fresh version of ZooKeeper – 3.5 or later. The version in stable Linux distributions may be outdated.
With the default settings, ZooKeeper is a time bomb:
@@ -174,7 +174,8 @@ dynamicConfigFile=/etc/zookeeper-{{ cluster['name'] }}/conf/zoo.cfg.dynamic
Java version:
```text
-Java(TM) SE Runtime Environment (build 1.8.0_25-b17)Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
+Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
+Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
```
JVM parameters:
diff --git a/docs/en/operators/index.md b/docs/en/operators/index.md
old mode 100755
new mode 100644
index 411cde34b50..779f3bf3843
--- a/docs/en/operators/index.md
+++ b/docs/en/operators/index.md
@@ -67,11 +67,11 @@ Groups of operators are listed in order of priority (the higher it is in the lis
`NOT a` The `not(a) function.`
-## Logical AND operator
+## Logical 'AND' operator
`a AND b` – The`and(a, b) function.`
-## Logical OR operator
+## Logical 'OR' operator
`a OR b` – The `or(a, b) function.`
diff --git a/docs/en/query_language/index.md b/docs/en/query_language/index.md
old mode 100755
new mode 100644
index 769d94eb4fd..247d76fc6ed
--- a/docs/en/query_language/index.md
+++ b/docs/en/query_language/index.md
@@ -1,2 +1 @@
# Query language
-
diff --git a/docs/en/query_language/queries.md b/docs/en/query_language/queries.md
old mode 100755
new mode 100644
index d235945a646..743706a551f
--- a/docs/en/query_language/queries.md
+++ b/docs/en/query_language/queries.md
@@ -11,6 +11,7 @@ CREATE DATABASE [IF NOT EXISTS] db_name
`A database` is just a directory for tables.
If `IF NOT EXISTS` is included, the query won't return an error if the database already exists.
+
## CREATE TABLE
@@ -183,7 +184,7 @@ Deletes all tables inside the 'db' database, then deletes the 'db' database itse
If `IF EXISTS` is specified, it doesn't return an error if the database doesn't exist.
```sql
-DROP TABLE [IF EXISTS] [db.]name [ON CLUSTER cluster]
+DROP [TEMPORARY] TABLE [IF EXISTS] [db.]name [ON CLUSTER cluster]
```
Deletes the table.
@@ -307,7 +308,8 @@ SELECT * FROM system.parts WHERE active
`active` – Only count active parts. Inactive parts are, for example, source parts remaining after merging to a larger part – these parts are deleted approximately 10 minutes after merging.
Another way to view a set of parts and partitions is to go into the directory with table data.
-Data directory: `/var/lib/clickhouse/data/database/table/`,where `/var/lib/clickhouse/` is the path to the ClickHouse data, 'database' is the database name, and 'table' is the table name. Example:
+Data directory: `/var/lib/clickhouse/data/database/table/`,
+where `/var/lib/clickhouse/` is the path to the ClickHouse data, 'database' is the database name, and 'table' is the table name. Example:
```bash
$ ls -l /var/lib/clickhouse/data/test/visits/
@@ -323,7 +325,7 @@ Here, `20140317_20140323_2_2_0` and ` 20140317_20140323_4_4_0` are the directori
Let's break down the name of the first part: `20140317_20140323_2_2_0`.
- `20140317` is the minimum date of the data in the chunk.
-- `20140323` is the maximum data of the data in the chunk.
+- `20140323` is the maximum date of the data in the chunk.
- `2` is the minimum number of the data block.
- `2` is the maximum number of the data block.
- `0` is the chunk level (the depth of the merge tree it is formed from).
@@ -450,7 +452,7 @@ See also the section "Formats".
## SHOW TABLES
```sql
-SHOW TABLES [FROM db] [LIKE 'pattern'] [INTO OUTFILE filename] [FORMAT format]
+SHOW [TEMPORARY] TABLES [FROM db] [LIKE 'pattern'] [INTO OUTFILE filename] [FORMAT format]
```
Displays a list of tables
@@ -460,7 +462,7 @@ Displays a list of tables
This query is identical to: `SELECT name FROM system.tables WHERE database = 'db' [AND name LIKE 'pattern'] [INTO OUTFILE filename] [FORMAT format]`.
-See also the section "LIKE operator".
+See the section "LIKE operator" also.
## SHOW PROCESSLIST
@@ -484,7 +486,7 @@ Prints a table containing the columns:
**query** – The query itself. In INSERT queries, the data for insertion is not output.
-**query_id** – The query identifier. Non-empty only if it was explicitly defined by the user. For distributed processing, the query ID is not passed to remote servers.
+**query_id** - The query identifier. Non-empty only if it was explicitly defined by the user. For distributed processing, the query ID is not passed to remote servers.
This query is identical to: `SELECT * FROM system.processes [INTO OUTFILE filename] [FORMAT format]`.
@@ -497,7 +499,7 @@ watch -n1 "clickhouse-client --query='SHOW PROCESSLIST'"
## SHOW CREATE TABLE
```sql
-SHOW CREATE TABLE [db.]table [INTO OUTFILE filename] [FORMAT format]
+SHOW CREATE [TEMPORARY] TABLE [db.]table [INTO OUTFILE filename] [FORMAT format]
```
Returns a single `String`-type 'statement' column, which contains a single value – the `CREATE` query used for creating the specified table.
@@ -515,7 +517,7 @@ Nested data structures are output in "expanded" format. Each column is shown sep
## EXISTS
```sql
-EXISTS TABLE [db.]name [INTO OUTFILE filename] [FORMAT format]
+EXISTS [TEMPORARY] TABLE [db.]name [INTO OUTFILE filename] [FORMAT format]
```
Returns a single `UInt8`-type column, which contains the single value `0` if the table or database doesn't exist, or `1` if the table exists in the specified database.
@@ -571,9 +573,9 @@ The query can specify a list of columns to insert `[(c1, c2, c3)]`. In this case
- The values calculated from the `DEFAULT` expressions specified in the table definition.
- Zeros and empty strings, if `DEFAULT` expressions are not defined.
-If [strict_insert_defaults=1](../operations/settings/settings.md#settings-strict_insert_defaults), columns that do not have `DEFAULT` defined must be listed in the query.
+If [strict_insert_defaults=1](../operations/settings/settings.md#settings-strict_insert_defaults), columns that do not have ` DEFAULT` defined must be listed in the query.
-Data can be passed to the INSERT in any [format](../formats/index.md#formats) supported by ClickHouse. The format must be specified explicitly in the query:
+You can pass data to INSERT in any [format](../formats/index.md#formats) supported by ClickHouse. The format must be specified explicitly in the query:
```sql
INSERT INTO [db.]table [(c1, c2, c3)] FORMAT format_name data_set
@@ -972,7 +974,8 @@ All columns that are not needed for the JOIN are deleted from the subquery.
There are several types of JOINs:
-`INNER` or `LEFT` type:If INNER is specified, the result will contain only those rows that have a matching row in the right table.
+`INNER` or `LEFT` type:
+If INNER is specified, the result will contain only those rows that have a matching row in the right table.
If LEFT is specified, any rows in the left table that don't have matching rows in the right table will be assigned the default value - zeros or empty rows. LEFT OUTER may be written instead of LEFT; the word OUTER does not affect anything.
`ANY` or `ALL` stringency:If `ANY` is specified and the right table has several matching rows, only the first one found is joined.
@@ -1103,7 +1106,7 @@ Example:
SELECT
domainWithoutWWW(URL) AS domain,
count(),
- any(Title) AS title -- getting the first occurring page header for each domain.
+ any(Title) AS title -- getting the first occurring page header for each domain.
FROM hits
GROUP BY domain
```
@@ -1348,7 +1351,7 @@ There are two options for IN-s with subqueries (similar to JOINs): normal `IN`
-Remember that the algorithms described below may work differently depending on the [settings](../operations/settings/settings.md#settings-distributed_product_mode) `distributed_product_mode` setting.
+Remember that the algorithms described below may work differently depending on the [distributed_product_mode](../operations/settings/settings.md#settings-distributed_product_mode) setting.
@@ -1476,34 +1479,34 @@ In all other cases, we don't recommend using the asterisk, since it only gives y
## KILL QUERY
```sql
-KILL QUERY WHERE [SYNC|ASYNC|TEST] [FORMAT format]
+KILL QUERY
+ WHERE
+ [SYNC|ASYNC|TEST]
+ [FORMAT format]
```
Attempts to terminate queries currently running.
-The queries to terminate are selected from the system.processes table for which expression_for_system.processes is true.
+The queries to terminate are selected from the system.processes table for which the `WHERE` expression is true.
Examples:
```sql
+-- Terminates all queries with the specified query_id.
KILL QUERY WHERE query_id='2-857d-4a57-9ee0-327da5d60a90'
-```
-Terminates all queries with the specified query_id.
-
-```sql
+-- Synchronously terminates all queries run by `username`.
KILL QUERY WHERE user='username' SYNC
```
-Synchronously terminates all queries run by `username`.
-
Readonly-users can only terminate their own requests.
-By default, the asynchronous version of queries is used (`ASYNC`), which terminates without waiting for queries to complete.
-The synchronous version (`SYNC`) waits for all queries to be completed and displays information about each process as it terminates.
+
+By default, the asynchronous version of queries is used (`ASYNC`), which doesn't wait for query termination.
+
+The synchronous version (`SYNC`) waits for all queries to be killed and displays information about each process as it terminates.
The response contains the `kill_status` column, which can take the following values:
-1. 'finished' – The query completed successfully.
+1. 'finished' – The query terminated successfully.
2. 'waiting' – Waiting for the query to finish after sending it a signal to terminate.
3. The other values explain why the query can't be terminated.
A test query (`TEST`) only checks the user's rights and displays a list of queries to terminate.
-
diff --git a/docs/en/query_language/syntax.md b/docs/en/query_language/syntax.md
old mode 100755
new mode 100644
index 4928f2d4a12..e151d2ee3d9
--- a/docs/en/query_language/syntax.md
+++ b/docs/en/query_language/syntax.md
@@ -46,7 +46,7 @@ There are numeric literals, string literals, and compound literals.
A numeric literal tries to be parsed:
-- First as a 64-bit signed number, using the 'strtoull' function.
+- First as a 64-bit signed number, using the 'strtoull' function.
- If unsuccessful, as a 64-bit unsigned number, using the 'strtoll' function.
- If unsuccessful, as a floating-point number using the 'strtod' function.
- Otherwise, an error is returned.
diff --git a/docs/en/roadmap.md b/docs/en/roadmap.md
old mode 100755
new mode 100644
index 8241b0a65ae..46b08a89607
--- a/docs/en/roadmap.md
+++ b/docs/en/roadmap.md
@@ -3,13 +3,11 @@
## Q1 2018
### New functionality
-
-- Support for `UPDATE` and `DELETE`.
-
-- Multidimensional and nested arrays.
-
- It can look something like this:
-
+- Initial support for `UPDATE` and `DELETE`.
+- Multi-dimensional and nested arrays.
+
+ It may look like this:
+
```sql
CREATE TABLE t
(
@@ -23,7 +21,7 @@ ENGINE = MergeTree ORDER BY x
- External MySQL and ODBC tables.
- External tables can be integrated into ClickHouse using external dictionaries. This new functionality is a convenient alternative to connecting external tables.
+ External tables can be integrated into ClickHouse using external dictionaries. This will be a more convenient alternative to connecting external tables directly.
```sql
SELECT ...
@@ -32,68 +30,66 @@ FROM mysql('host:port', 'db', 'table', 'user', 'password')`
### Improvements
-- Effective data copying between ClickHouse clusters.
+- Efficient data copying between ClickHouse clusters.
- Now you can copy data with the remote() function. For example: `
+ Currently, it is possible to copy data using the remote() function, for example: `
INSERT INTO t SELECT * FROM remote(...) `.
- This operation will have improved performance.
+ The performance of this will be improved by proper distributed execution.
- O_DIRECT for merges.
- This will improve the performance of the OS cache and "hot" queries.
+ This should improve OS cache performance and, correspondingly, the performance of 'hot' queries.
+
## Q2 2018
### New functionality
-- UPDATE/DELETE conform to the EU GDPR.
-- Protobuf and Parquet input and output formats.
-- Creating dictionaries using DDL queries.
+- UPDATE/DELETE in order to comply with the EU GDPR.
+- Protobuf and Parquet input/output formats.
+- Creating dictionaries with DDL queries.
- Currently, dictionaries that are part of the database schema are defined in external XML files. This is inconvenient and counter-intuitive. The new approach should fix it.
-
-- Integration with LDAP.
+ Currently, it is inconvenient and confusing that dictionaries are defined in external XML files while being a part of the DB schema. The new approach will fix that.
+- LDAP integration.
- WITH ROLLUP and WITH CUBE for GROUP BY.
+- Custom encoding/compression for columns.
-- Custom encoding and compression for each column individually.
+ Currently, ClickHouse supports LZ4 and ZSTD compression for columns, and compression settings are global (see our article [Compression in ClickHouse](https://www.altinity.com/blog/2017/11/21/compression-in-clickhouse) for more details). Column-level encoding (e.g. delta encoding) and compression will allow more efficient data storage and therefore faster queries.
- As of now, ClickHouse supports LZ4 and ZSTD compression of columns, and compression settings are global (see the article [Compression in ClickHouse](https://www.altinity.com/blog/2017/11/21/compression-in-clickhouse)). Per-column compression and encoding will provide more efficient data storage, which in turn will speed up queries.
+- Storing data on multiple disk volumes of a single server.
-- Storing data on multiple disks on the same server.
-
- This functionality will make it easier to extend the disk space, since different disk systems can be used for different databases or tables. Currently, users are forced to use symbolic links if the databases and tables must be stored on a different disk.
+ That will make it easier to extend the disk subsystem, as well as to use different disk systems for different databases or tables. Currently, users have to use symlinks if a database or table needs to be stored on another volume.
### Improvements
-Many improvements and fixes are planned for the query execution system. For example:
+A lot of enhancements and fixes are planned for query execution. In particular:
-- Using an index for `in (subquery)`.
+- Using the index for `in (subquery)`.
- The index is not used right now, which reduces performance.
+ Currently, the index is not used for such queries, resulting in lower performance.
-- Passing predicates from `where` to subqueries, and passing predicates to views.
+- Predicate pushdown from `where` into subqueries, and predicate pushdown for views.
- The predicates must be passed, since the view is changed by the subquery. Performance is still low for view filters, and views can't use the primary key of the original table, which makes views useless for large tables.
+ These two are related, since a view is replaced by a subquery. Currently, the performance of filter conditions for views is significantly degraded, and views cannot use the primary key of the underlying table, which makes views on big tables pretty much useless.
-- Optimizing branching operations (ternary operator, if, multiIf).
+- Short-circuit expression evaluation (ternary operator, if, multiIf).
- ClickHouse currently performs all branches, even if they aren't necessary.
+ Currently, ClickHouse evaluates all branches, even if only one of them is needed given the condition's result.
-- Using a primary key for GROUP BY and ORDER BY.
+- Using the primary key for GROUP BY and ORDER BY.
- This will speed up certain types of queries with partially sorted data.
+ This may speed up certain types of queries, since the data is already partially sorted.
## Q3-Q4 2018
-We don't have any set plans yet, but the main projects will be:
+Longer-term plans are not yet finalized. There are two major projects on the list so far.
-- Resource pools for executing queries.
+- Resource pools for query execution.
- This will make load management more efficient.
+ That will allow managing workloads more efficiently.
- ANSI SQL JOIN syntax.
- Improve ClickHouse compatibility with many SQL tools.
-
+ That will make ClickHouse friendlier to numerous SQL tools.
diff --git a/docs/en/system_tables/index.md b/docs/en/system_tables/index.md
old mode 100755
new mode 100644
index 240105a684b..614ce4020ec
--- a/docs/en/system_tables/index.md
+++ b/docs/en/system_tables/index.md
@@ -4,5 +4,5 @@ System tables are used for implementing part of the system's functionality, and
You can't delete a system table (but you can perform DETACH).
System tables don't have files with data on the disk or files with metadata. The server creates all the system tables when it starts.
System tables are read-only.
-They are located in the 'system' database.
+System tables are located in the 'system' database.
diff --git a/docs/en/system_tables/system.asynchronous_metrics.md b/docs/en/system_tables/system.asynchronous_metrics.md
old mode 100755
new mode 100644
diff --git a/docs/en/system_tables/system.clusters.md b/docs/en/system_tables/system.clusters.md
old mode 100755
new mode 100644
index bc8dab86b3c..c0bc3dd13fa
--- a/docs/en/system_tables/system.clusters.md
+++ b/docs/en/system_tables/system.clusters.md
@@ -4,12 +4,13 @@ Contains information about clusters available in the config file and the servers
Columns:
```text
-cluster String - Cluster name.
-shard_num UInt32 - Number of a shard in the cluster, starting from 1.
-shard_weight UInt32 - Relative weight of a shard when writing data.
-replica_num UInt32 - Number of a replica in the shard, starting from 1.
-host_name String - Host name as specified in the config.
-host_address String - Host's IP address obtained from DNS.
-port UInt16 - The port used to access the server.
-user String - The username to use for connecting to the server.
+cluster String – Cluster name.
+shard_num UInt32 – Number of a shard in the cluster, starting from 1.
+shard_weight UInt32 – Relative weight of a shard when writing data.
+replica_num UInt32 – Number of a replica in the shard, starting from 1.
+host_name String – Host name as specified in the config.
+host_address String – Host's IP address obtained from DNS.
+port UInt16 – The port used to access the server.
+user String – The username to use for connecting to the server.
```
+
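For instance, a quick look at the cluster topology can be obtained with a query along these lines (a sketch using the columns listed above):

```sql
SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
```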
diff --git a/docs/en/system_tables/system.columns.md b/docs/en/system_tables/system.columns.md
old mode 100755
new mode 100644
index bf05616fbef..975b84fe9d4
--- a/docs/en/system_tables/system.columns.md
+++ b/docs/en/system_tables/system.columns.md
@@ -11,3 +11,4 @@ type String - Column type.
default_type String - Expression type (DEFAULT, MATERIALIZED, ALIAS) for the default value, or an empty string if it is not defined.
default_expression String - Expression for the default value, or an empty string if it is not defined.
```
+
diff --git a/docs/en/system_tables/system.databases.md b/docs/en/system_tables/system.databases.md
old mode 100755
new mode 100644
diff --git a/docs/en/system_tables/system.dictionaries.md b/docs/en/system_tables/system.dictionaries.md
old mode 100755
new mode 100644
index 4ef0d7707b8..f3c9929d38e
--- a/docs/en/system_tables/system.dictionaries.md
+++ b/docs/en/system_tables/system.dictionaries.md
@@ -5,19 +5,19 @@ Contains information about external dictionaries.
Columns:
```text
-name String - Dictionary name.
-type String - Dictionary type: Flat, Hashed, Cache.
-origin String - Path to the config file where the dictionary is described.
-attribute.names Array(String) - Array of attribute names provided by the dictionary.
-attribute.types Array(String) - Corresponding array of attribute types provided by the dictionary.
-has_hierarchy UInt8 - Whether the dictionary is hierarchical.
-bytes_allocated UInt64 - The amount of RAM used by the dictionary.
-hit_rate Float64 - For cache dictionaries, the percent of usage for which the value was in the cache.
-element_count UInt64 - The number of items stored in the dictionary.
-load_factor Float64 - The filled percentage of the dictionary (for a hashed dictionary, it is the filled percentage of the hash table).
-creation_time DateTime - Time spent for the creation or last successful reload of the dictionary.
-last_exception String - Text of an error that occurred when creating or reloading the dictionary, if the dictionary couldn't be created.
-source String - Text describing the data source for the dictionary.
+name String – Dictionary name.
+type String – Dictionary type: Flat, Hashed, Cache.
+origin String – Path to the config file where the dictionary is described.
+attribute.names Array(String) – Array of attribute names provided by the dictionary.
+attribute.types Array(String) – Corresponding array of attribute types provided by the dictionary.
+has_hierarchy UInt8 – Whether the dictionary is hierarchical.
+bytes_allocated UInt64 – The amount of RAM used by the dictionary.
+hit_rate Float64 – For cache dictionaries, the percent of usage for which the value was in the cache.
+element_count UInt64 – The number of items stored in the dictionary.
+load_factor Float64 – The filled percentage of the dictionary (for a hashed dictionary, it is the filled percentage of the hash table).
+creation_time DateTime – Time spent for the creation or last successful reload of the dictionary.
+last_exception String – Text of an error that occurred when creating or reloading the dictionary, if the dictionary couldn't be created.
+source String – Text describing the data source for the dictionary.
```
Note that the amount of memory used by the dictionary is not proportional to the number of items stored in it. So for flat and cached dictionaries, all the memory cells are pre-assigned, regardless of how full the dictionary actually is.
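For example, memory usage and fill level can be checked with a query along these lines (a sketch using the columns listed above):

```sql
SELECT name, type, element_count, bytes_allocated, load_factor
FROM system.dictionaries
```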
diff --git a/docs/en/system_tables/system.events.md b/docs/en/system_tables/system.events.md
old mode 100755
new mode 100644
diff --git a/docs/en/system_tables/system.functions.md b/docs/en/system_tables/system.functions.md
old mode 100755
new mode 100644
index a1022a5e557..ac550acc14b
--- a/docs/en/system_tables/system.functions.md
+++ b/docs/en/system_tables/system.functions.md
@@ -6,6 +6,6 @@ Columns:
```text
name String – Function name.
-is_aggregate UInt8 – Whether it is an aggregate function.
+is_aggregate UInt8 – Whether it is an aggregate function.
```
diff --git a/docs/en/system_tables/system.merges.md b/docs/en/system_tables/system.merges.md
old mode 100755
new mode 100644
index 59870922ea5..0a10e4a5a8c
--- a/docs/en/system_tables/system.merges.md
+++ b/docs/en/system_tables/system.merges.md
@@ -18,3 +18,4 @@ rows_read UInt64 - Number of rows read.
bytes_written_uncompressed UInt64 - Amount of bytes written, uncompressed.
rows_written UInt64 - Number of rows written.
```
+
diff --git a/docs/en/system_tables/system.numbers.md b/docs/en/system_tables/system.numbers.md
old mode 100755
new mode 100644
diff --git a/docs/en/system_tables/system.numbers_mt.md b/docs/en/system_tables/system.numbers_mt.md
old mode 100755
new mode 100644
diff --git a/docs/en/system_tables/system.one.md b/docs/en/system_tables/system.one.md
old mode 100755
new mode 100644
diff --git a/docs/en/system_tables/system.parts.md b/docs/en/system_tables/system.parts.md
old mode 100755
new mode 100644
diff --git a/docs/en/system_tables/system.processes.md b/docs/en/system_tables/system.processes.md
old mode 100755
new mode 100644
index 0802e555648..b9ad6a44e81
--- a/docs/en/system_tables/system.processes.md
+++ b/docs/en/system_tables/system.processes.md
@@ -14,12 +14,12 @@ rows_read UInt64 – The number of rows read from the table. For distribu
bytes_read UInt64 – The number of uncompressed bytes read from the table. For distributed processing, on the requestor server, this is the total for all remote servers.
-UInt64 total_rows_approx – The approximate total number of rows that must be read. For distributed processing, on the requestor server, this is the total for all remote servers. It can be updated during request processing, when new sources to process become known.
+total_rows_approx UInt64 – The approximate total number of rows that must be read. For distributed processing, on the requestor server, this is the total for all remote servers. It can be updated during request processing, when new sources to process become known.
memory_usage UInt64 – Memory consumption by the query. It might not include some types of dedicated memory.
-Query String – The query text. For INSERT, it doesn't include the data to insert.
+query String – The query text. For INSERT, it doesn't include the data to insert.
-query_id – Query ID, if defined.
+query_id String – The query ID, if defined.
```
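As an illustration, currently running queries can be inspected with a query along these lines (a sketch; the `user` and `elapsed` columns are assumed to be among those not shown in the excerpt above):

```sql
SELECT query_id, user, elapsed, memory_usage, query
FROM system.processes
```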
diff --git a/docs/en/system_tables/system.replicas.md b/docs/en/system_tables/system.replicas.md
old mode 100755
new mode 100644
index 75cd8e34340..ec1341198dc
--- a/docs/en/system_tables/system.replicas.md
+++ b/docs/en/system_tables/system.replicas.md
@@ -42,41 +42,37 @@ Columns:
database: database name
table: table name
engine: table engine name
-
is_leader: whether the replica is the leader
Only one replica at a time can be the leader. The leader is responsible for selecting background merges to perform.
Note that writes can be performed to any replica that is available and has a session in ZK, regardless of whether it is a leader.
-is_readonly: Whether the replica is in read-only mode.
-This mode is turned on if the config doesn't have sections with ZK, if an unknown error occurred when reinitializing sessions in ZK, and during session reinitialization in ZK.
+is_readonly: Whether the replica is in read-only mode. This mode is turned on if the config doesn't have sections with ZK, if an unknown error occurred when reinitializing sessions in ZK, and during session reinitialization in ZK.
is_session_expired: Whether the ZK session expired.
Basically, the same thing as is_readonly.
-future_parts: The number of data parts that will appear as the result of INSERTs or merges that haven't been done yet.
+future_parts: The number of data parts that will appear as the result of INSERTs or merges that haven't been done yet.
-parts_to_check: The number of data parts in the queue for verification.
-A part is put in the verification queue if there is suspicion that it might be damaged.
+parts_to_check: The number of data parts in the queue for verification. A part is put in the verification queue if there is suspicion that it might be damaged.
-zookeeper_path: The path to the table data in ZK.
+zookeeper_path: The path to the table data in ZK.
replica_name: Name of the replica in ZK. Different replicas of the same table have different names.
-replica_path: The path to the replica data in ZK. The same as concatenating zookeeper_path/replicas/replica_path.
+replica_path: The path to the replica data in ZK. The same as concatenating zookeeper_path/replicas/replica_path.
-columns_version: Version number of the table structure. Indicates how many times ALTER was performed. If replicas have different versions, it means some replicas haven't made all of the ALTERs yet.
+columns_version: Version number of the table structure. Indicates how many times ALTER was performed. If replicas have different versions, it means some replicas haven't made all of the ALTERs yet.
-queue_size: Size of the queue for operations waiting to be performed.
-Operations include inserting blocks of data, merges, and certain other actions.
+queue_size: Size of the queue for operations waiting to be performed. Operations include inserting blocks of data, merges, and certain other actions.
Normally coincides with future_parts.
-inserts_in_queue: Number of inserts of blocks of data that need to be made. Insertions are usually replicated fairly quickly. If the number is high, something is wrong.
+inserts_in_queue: Number of inserts of blocks of data that need to be made. Insertions are usually replicated fairly quickly. If the number is high, something is wrong.
-merges_in_queue: The number of merges waiting to be made. Sometimes merges are lengthy, so this value may be greater than zero for a long time.
+merges_in_queue: The number of merges waiting to be made. Sometimes merges are lengthy, so this value may be greater than zero for a long time.
The next 4 columns have a non-null value only if the ZK session is active.
-log_max_index: Maximum entry number in the log of general activity. log_pointer: Maximum entry number in the log of general activity that the replica copied to its execution queue, plus one.
-If log_pointer is much smaller than log_max_index, something is wrong.
+log_max_index: Maximum entry number in the log of general activity.
+log_pointer: Maximum entry number in the log of general activity that the replica copied to its execution queue, plus one. If log_pointer is much smaller than log_max_index, something is wrong.
total_replicas: Total number of known replicas of this table.
active_replicas: Number of replicas of this table that have a ZK session (the number of active replicas).
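As an illustration, a replication health check built from these columns might look like this (a sketch; the thresholds are arbitrary):

```sql
SELECT
    database, table,
    is_readonly, is_session_expired,
    future_parts, queue_size, inserts_in_queue,
    log_max_index - log_pointer AS log_delay
FROM system.replicas
WHERE is_readonly OR is_session_expired OR future_parts > 20 OR queue_size > 20
```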
diff --git a/docs/en/system_tables/system.settings.md b/docs/en/system_tables/system.settings.md
old mode 100755
new mode 100644
index 90a392bcc24..a055135ebcf
--- a/docs/en/system_tables/system.settings.md
+++ b/docs/en/system_tables/system.settings.md
@@ -6,9 +6,9 @@ I.e. used for executing the query you are using to read from the system.settings
Columns:
```text
-name String – Setting name.
+name String – Setting name.
value String – Setting value.
-changed UInt8 - Whether the setting was explicitly defined in the config or explicitly changed.
+changed UInt8 – Whether the setting was explicitly defined in the config or explicitly changed.
```
Example:
diff --git a/docs/en/system_tables/system.tables.md b/docs/en/system_tables/system.tables.md
old mode 100755
new mode 100644
index 5757a8ac3da..fabddf4dbb1
--- a/docs/en/system_tables/system.tables.md
+++ b/docs/en/system_tables/system.tables.md
@@ -1,7 +1,6 @@
# system.tables
This table contains the String columns 'database', 'name', and 'engine'.
-The table also contains three virtual columns: metadata_modification_time (DateTime type), create_table_query, and engine_full (String type).
+Also, the table has three virtual columns: metadata_modification_time of type DateTime, and create_table_query and engine_full of type String.
Each table that the server knows about is entered in the 'system.tables' table.
This system table is used for implementing SHOW TABLES queries.
-
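For example, a rough equivalent of `SHOW TABLES FROM system` (a sketch):

```sql
SELECT database, name, engine
FROM system.tables
WHERE database = 'system'
```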
diff --git a/docs/en/system_tables/system.zookeeper.md b/docs/en/system_tables/system.zookeeper.md
old mode 100755
new mode 100644
index 46b40e7a08f..c456e7d9207
--- a/docs/en/system_tables/system.zookeeper.md
+++ b/docs/en/system_tables/system.zookeeper.md
@@ -70,3 +70,4 @@ numChildren: 7
pzxid: 987021252247
path: /clickhouse/tables/01-08/visits/replicas
```
+
diff --git a/docs/en/table_engines/aggregatingmergetree.md b/docs/en/table_engines/aggregatingmergetree.md
old mode 100755
new mode 100644
index 987c102508b..d75d8353e6d
--- a/docs/en/table_engines/aggregatingmergetree.md
+++ b/docs/en/table_engines/aggregatingmergetree.md
@@ -1,11 +1,10 @@
# AggregatingMergeTree
-This engine differs from `MergeTree` in that the merge combines the states of aggregate functions stored in the table for rows with the same primary key value.
+This engine differs from MergeTree in that the merge combines the states of aggregate functions stored in the table for rows with the same primary key value.
-For this to work, it uses the `AggregateFunction` data type, as well as `-State` and `-Merge` modifiers for aggregate functions. Let's examine it more closely.
-
-There is an `AggregateFunction` data type. It is a parametric data type. As parameters, the name of the aggregate function is passed, then the types of its arguments.
+In order for this to work, it uses the AggregateFunction data type and the -State and -Merge modifiers for aggregate functions. Let's examine it more closely.
+There is an AggregateFunction data type. It is a parametric data type. As parameters, the name of the aggregate function is passed, then the types of its arguments.
Examples:
```sql
@@ -20,16 +19,12 @@ CREATE TABLE t
This type of column stores the state of an aggregate function.
To get this type of value, use aggregate functions with the `State` suffix.
+Example: `uniqState(UserID), quantilesState(0.5, 0.9)(SendTiming)` – in contrast to the corresponding 'uniq' and 'quantiles' functions, these functions return the state, rather than the prepared value. In other words, they return an AggregateFunction type value.
-Example:
-`uniqState(UserID), quantilesState(0.5, 0.9)(SendTiming)`
+An AggregateFunction type value can't be output in Pretty formats. In other formats, these types of values are output as implementation-specific binary data. The AggregateFunction type values are not intended for output or saving in a dump.
-In contrast to the corresponding `uniq` and `quantiles` functions, these functions return the state, rather than the prepared value. In other words, they return an `AggregateFunction` type value.
-
-An `AggregateFunction` type value can't be output in Pretty formats. In other formats, these types of values are output as implementation-specific binary data. The `AggregateFunction` type values are not intended for output or saving in a dump.
-
-The only useful thing you can do with `AggregateFunction` type values is combine the states and get a result, which essentially means to finish aggregation. Aggregate functions with the 'Merge' suffix are used for this purpose.
-Example: `uniqMerge(UserIDState), where UserIDState has the AggregateFunction` type.
+The only useful thing you can do with AggregateFunction type values is combine the states and get a result, which essentially means to finish aggregation. Aggregate functions with the 'Merge' suffix are used for this purpose.
+Example: `uniqMerge(UserIDState)`, where UserIDState has the AggregateFunction type.
In other words, an aggregate function with the 'Merge' suffix takes a set of states, combines them, and returns the result.
As an example, these two queries return the same result:
@@ -42,15 +37,15 @@ SELECT uniqMerge(state) FROM (SELECT uniqState(UserID) AS state FROM table GROUP
There is an `AggregatingMergeTree` engine. Its job during a merge is to combine the states of aggregate functions from different table rows with the same primary key value.
-You can't use a normal INSERT to insert a row in a table containing `AggregateFunction` columns, because you can't explicitly define the `AggregateFunction` value. Instead, use `INSERT SELECT` with `-State` aggregate functions for inserting data.
+You can't use a normal INSERT to insert a row in a table containing AggregateFunction columns, because you can't explicitly define the AggregateFunction value. Instead, use INSERT SELECT with '-State' aggregate functions for inserting data.
-With SELECT from an `AggregatingMergeTree` table, use GROUP BY and aggregate functions with the '-Merge' modifier in order to complete data aggregation.
+With SELECT from an AggregatingMergeTree table, use GROUP BY and aggregate functions with the '-Merge' modifier in order to complete data aggregation.
-You can use `AggregatingMergeTree` tables for incremental data aggregation, including for aggregated materialized views.
+You can use AggregatingMergeTree tables for incremental data aggregation, including for aggregated materialized views.
Example:
-Create an `AggregatingMergeTree` materialized view that watches the `test.visits` table:
+Creating a materialized AggregatingMergeTree view that tracks the 'test.visits' table:
```sql
CREATE MATERIALIZED VIEW test.basic
@@ -64,13 +59,13 @@ FROM test.visits
GROUP BY CounterID, StartDate;
```
-Insert data in the `test.visits` table. Data will also be inserted in the view, where it will be aggregated:
+Inserting data in the 'test.visits' table. Data will also be inserted in the view, where it will be aggregated:
```sql
INSERT INTO test.visits ...
```
-Perform `SELECT` from the view using `GROUP BY` in order to complete data aggregation:
+Performing SELECT from the view using GROUP BY to finish data aggregation:
```sql
SELECT
diff --git a/docs/en/table_engines/buffer.md b/docs/en/table_engines/buffer.md
old mode 100755
new mode 100644
diff --git a/docs/en/table_engines/collapsingmergetree.md b/docs/en/table_engines/collapsingmergetree.md
old mode 100755
new mode 100644
diff --git a/docs/en/table_engines/custom_partitioning_key.md b/docs/en/table_engines/custom_partitioning_key.md
old mode 100755
new mode 100644
index bfcb3c2c545..6a468a1137d
--- a/docs/en/table_engines/custom_partitioning_key.md
+++ b/docs/en/table_engines/custom_partitioning_key.md
@@ -2,7 +2,7 @@
# Custom partitioning key
-Starting with version 1.1.54310, you can create tables in the MergeTree family with any partitioning expression (not only partitioning by month).
+Starting with version 1.1.54310, you can create tables in the MergeTree family with any partition expression (not only partitioning by month).
The partition key can be an expression from the table columns, or a tuple of such expressions (similar to the primary key). The partition key can be omitted. When creating a table, specify the partition key in the ENGINE description with the new syntax:
@@ -10,7 +10,7 @@ The partition key can be an expression from the table columns, or a tuple of suc
ENGINE [=] Name(...) [PARTITION BY expr] [ORDER BY expr] [SAMPLE BY expr] [SETTINGS name=value, ...]
```
-For MergeTree tables, the partition expression is specified after `PARTITION BY`, the primary key after `ORDER BY`, the sampling key after `SAMPLE BY`, and `SETTINGS` can specify `index_granularity` (optional; the default value is 8192), as well as other settings from [MergeTreeSettings.h](https://github.com/yandex/ClickHouse/blob/master/dbms/src/Storages/MergeTree/MergeTreeSettings.h). The other engine parameters are specified in parentheses after the engine name, as previously. Example:
+For MergeTree tables, the partition expression is specified after `PARTITION BY`, the primary key after `ORDER BY`, the sampling key after `SAMPLE BY`, and `SETTINGS` can specify `index_granularity` (optional; the default value is 8192), as well as other settings from [MergeTreeSettings.h](https://github.com/yandex/ClickHouse/blob/master/dbms/src/Storages/MergeTree/MergeTreeSettings.h). Example:
```sql
ENGINE = ReplicatedCollapsingMergeTree('/clickhouse/tables/name', 'replica1', Sign)
@@ -23,7 +23,7 @@ The traditional partitioning by month is expressed as `toYYYYMM(date_column)`.
You can't convert an old-style table to a table with custom partitions (only via INSERT SELECT).
-After this table is created, merge will only work for data parts that have the same value for the partitioning expression. Note: This means that you shouldn't make overly granular partitions (more than about a thousand partitions), or SELECT will perform poorly.
+After this table is created, merge will only work for data parts that have the same value for the partition expression. Note: This means that you shouldn't make overly granular partitions (more than about a thousand partitions), or SELECT will perform poorly.
To specify a partition in ALTER PARTITION commands, specify the value of the partition expression (or a tuple). Constants and constant expressions are supported. Example:
@@ -35,13 +35,13 @@ Deletes the partition for the current week with event type 1. The same is true f
Note: For old-style tables, the partition can be specified either as a number `201710` or a string `'201710'`. The syntax for the new style of tables is stricter with types (similar to the parser for the VALUES input format). In addition, ALTER TABLE FREEZE PARTITION uses exact match for new-style tables (not prefix match).
-In the `system.parts` table, the `partition` column specifies the value of the partition expression to use in ALTER queries (if quotas are removed). The `name` column should specify the name of the data part that has a new format.
+In the `system.parts` table, the `partition` column specifies the value of the partition expression to use in ALTER queries (if you remove the quotes). The `name` column specifies the name of the data part that has a new format.
-Was: `20140317_20140323_2_2_0` (minimum date - maximum date - minimum block number - maximum block number - level).
+Before: `20140317_20140323_2_2_0` (minimum date - maximum date - minimum block number - maximum block number - level).
-Now: `201403_2_2_0` (partition ID - minimum block number - maximum block number - level).
+After: `201403_2_2_0` (partition ID - minimum block number - maximum block number - level).
The partition ID is its string identifier (human-readable, if possible) that is used for the names of data parts in the file system and in ZooKeeper. You can specify it in ALTER queries in place of the partition key. Example: Partition key `toYYYYMM(EventDate)`; ALTER can specify either `PARTITION 201710` or `PARTITION ID '201710'`.
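For instance, with the `toYYYYMM(EventDate)` key, both of the following refer to the same partition (a sketch; the `visits` table name is illustrative):

```sql
ALTER TABLE visits DETACH PARTITION 201710
ALTER TABLE visits DETACH PARTITION ID '201710'
```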
-For more examples, see the tests [`00502_custom_partitioning_local`](https://github.com/yandex/ClickHouse/blob/master/dbms/tests/queries/0_stateless/00502_custom_partitioning_local.sql) and [`00502_custom_partitioning_replicated_zookeeper`](https://github.com/yandex/ClickHouse/blob/master/dbms/tests/queries/0_stateless/00502_custom_partitioning_replicated_zookeeper.sql).
+There are more examples in the tests [`00502_custom_partitioning_local`](https://github.com/yandex/ClickHouse/blob/master/dbms/tests/queries/0_stateless/00502_custom_partitioning_local.sql) and [`00502_custom_partitioning_replicated_zookeeper`](https://github.com/yandex/ClickHouse/blob/master/dbms/tests/queries/0_stateless/00502_custom_partitioning_replicated_zookeeper.sql).
diff --git a/docs/en/table_engines/dictionary.md b/docs/en/table_engines/dictionary.md
deleted file mode 100755
index ae8cca90d7c..00000000000
--- a/docs/en/table_engines/dictionary.md
+++ /dev/null
@@ -1,106 +0,0 @@
-
-
-# Dictionary
-
-The `Dictionary` engine displays the dictionary data as a ClickHouse table.
-
-As an example, consider a dictionary of `products` with the following configuration:
-
-```xml
-
-
- products
-
-
- 300
- 360
-
-
-
-
-
-
- product_id
-
-
- title
- String
-
-
-
-
-
-```
-
-Query the dictionary data:
-
-```sql
-select name, type, key, attribute.names, attribute.types, bytes_allocated, element_count,source from system.dictionaries where name = 'products';
-
-SELECT
- name,
- type,
- key,
- attribute.names,
- attribute.types,
- bytes_allocated,
- element_count,
- source
-FROM system.dictionaries
-WHERE name = 'products'
-```
-```
-┌─name─────┬─type─┬─key────┬─attribute.names─┬─attribute.types─┬─bytes_allocated─┬─element_count─┬─source──────────┐
-│ products │ Flat │ UInt64 │ ['title'] │ ['String'] │ 23065376 │ 175032 │ ODBC: .products │
-└──────────┴──────┴────────┴─────────────────┴─────────────────┴─────────────────┴───────────────┴─────────────────┘
-```
-
-You can use the [dictGet*](../functions/ext_dict_functions.md#ext_dict_functions) function to get the dictionary data in this format.
-
-This view isn't helpful when you need to get raw data, or when performing a `JOIN` operation. For these cases, you can use the `Dictionary` engine, which displays the dictionary data in a table.
-
-Syntax:
-
-```
-CREATE TABLE %table_name% (%fields%) engine = Dictionary(%dictionary_name%)`
-```
-
-Usage example:
-
-```sql
-create table products (product_id UInt64, title String) Engine = Dictionary(products);
-
-CREATE TABLE products
-(
- product_id UInt64,
- title String,
-)
-ENGINE = Dictionary(products)
-```
-```
-Ok.
-
-0 rows in set. Elapsed: 0.004 sec.
-```
-
-Take a look at what's in the table.
-
-```sql
-select * from products limit 1;
-
-SELECT *
-FROM products
-LIMIT 1
-```
-```
-┌────product_id─┬─title───────────┐
-│ 152689 │ Некоторый товар │
-└───────────────┴─────────────────┘
-
-1 rows in set. Elapsed: 0.006 sec.
-```
diff --git a/docs/en/table_engines/distributed.md b/docs/en/table_engines/distributed.md
old mode 100755
new mode 100644
index dd2ffe27fe5..b8643461fbb
--- a/docs/en/table_engines/distributed.md
+++ b/docs/en/table_engines/distributed.md
@@ -25,29 +25,29 @@ Clusters are set like this:
+        <shard>
+            <weight>1</weight>
+            <internal_replication>false</internal_replication>
+            <replica>
+                <host>example01-01-1</host>
+                <port>9000</port>
+            </replica>
+            <replica>
+                <host>example01-01-2</host>
+                <port>9000</port>
+            </replica>
+        </shard>
+        <shard>
+            <weight>2</weight>
+            <internal_replication>false</internal_replication>
+            <replica>
+                <host>example01-02-1</host>
+                <port>9000</port>
+            </replica>
+            <replica>
+                <host>example01-02-2</host>
+                <port>9000</port>
+            </replica>
+        </shard>
diff --git a/docs/en/table_engines/external_data.md b/docs/en/table_engines/external_data.md
old mode 100755
new mode 100644
diff --git a/docs/en/table_engines/file.md b/docs/en/table_engines/file.md
old mode 100755
new mode 100644
diff --git a/docs/en/table_engines/graphitemergetree.md b/docs/en/table_engines/graphitemergetree.md
old mode 100755
new mode 100644
index a4b62424954..6452377ac15
--- a/docs/en/table_engines/graphitemergetree.md
+++ b/docs/en/table_engines/graphitemergetree.md
@@ -2,13 +2,13 @@
# GraphiteMergeTree
-This engine is designed for rollup (thinning and aggregating/averaging) [Graphite](http://graphite.readthedocs.io/en/latest/index.html) data. It may be helpful to developers who want to use ClickHouse as a data store for Graphite.
+This engine is designed for rollup (thinning and aggregating/averaging) [Graphite](http://graphite.readthedocs.io/en/latest/index.html) data. It may be helpful to developers who want to use ClickHouse as a data store for Graphite.
Graphite stores full data in ClickHouse, and data can be retrieved in the following ways:
- Without thinning.
- Uses the [MergeTree](mergetree.md#table_engines-mergetree) engine.
+ Using the [MergeTree](mergetree.md#table_engines-mergetree) engine.
- With thinning.
@@ -83,3 +83,4 @@ Example of settings:
```
+
diff --git a/docs/en/table_engines/index.md b/docs/en/table_engines/index.md
old mode 100755
new mode 100644
diff --git a/docs/en/table_engines/join.md b/docs/en/table_engines/join.md
old mode 100755
new mode 100644
diff --git a/docs/en/table_engines/kafka.md b/docs/en/table_engines/kafka.md
old mode 100755
new mode 100644
index 4f10e55d029..85943e44fc5
--- a/docs/en/table_engines/kafka.md
+++ b/docs/en/table_engines/kafka.md
@@ -59,7 +59,7 @@ Example:
level String,
total UInt64
) ENGINE = SummingMergeTree(day, (day, level), 8192);
-
+
CREATE MATERIALIZED VIEW consumer TO daily
AS SELECT toDate(toDateTime(timestamp)) AS day, level, count() as total
FROM queue GROUP BY day, level;
diff --git a/docs/en/table_engines/log.md b/docs/en/table_engines/log.md
old mode 100755
new mode 100644
diff --git a/docs/en/table_engines/materializedview.md b/docs/en/table_engines/materializedview.md
old mode 100755
new mode 100644
index 5e2741c6aa1..00f70bd72bd
--- a/docs/en/table_engines/materializedview.md
+++ b/docs/en/table_engines/materializedview.md
@@ -1,4 +1,4 @@
# MaterializedView
-Used for implementing materialized views (for more information, see the [CREATE TABLE](../query_language/queries.md#query_language-queries-create_table)) query. For storing data, it uses a different engine that was specified when creating the view. When reading from a table, it just uses this engine.
+Used for implementing materialized views (for more information, see [CREATE TABLE](../query_language/queries.md#query_language-queries-create_table)). For storing data, it uses a different engine that was specified when the view was created. When reading from a table, it just uses this engine.
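A minimal sketch of creating such a view (the `hits` source table and the old-style `SummingMergeTree` parameters are illustrative assumptions); the inner engine named after `ENGINE` is what actually stores the data:

```sql
CREATE MATERIALIZED VIEW daily_hits
ENGINE = SummingMergeTree(EventDate, (EventDate, CounterID), 8192)
AS SELECT EventDate, CounterID, count() AS hits
FROM hits
GROUP BY EventDate, CounterID
```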
diff --git a/docs/en/table_engines/memory.md b/docs/en/table_engines/memory.md
old mode 100755
new mode 100644
diff --git a/docs/en/table_engines/merge.md b/docs/en/table_engines/merge.md
old mode 100755
new mode 100644
diff --git a/docs/en/table_engines/mergetree.md b/docs/en/table_engines/mergetree.md
old mode 100755
new mode 100644
index fea02e01d72..71197f21b34
--- a/docs/en/table_engines/mergetree.md
+++ b/docs/en/table_engines/mergetree.md
@@ -56,7 +56,7 @@ In this example, the index can't be used:
SELECT count() FROM table WHERE CounterID = 34 OR URL LIKE '%upyachka%'
```
-To check whether ClickHouse can use the index when executing the query, use the settings [force_index_by_date](../operations/settings/settings.md#settings-settings-force_index_by_date)and[force_primary_key](../operations/settings/settings.md#settings-settings-force_primary_key).
+To check whether ClickHouse can use the index when executing the query, use the settings [force_index_by_date](../operations/settings/settings.md#settings-settings-force_index_by_date) and [force_primary_key](../operations/settings/settings.md#settings-settings-force_primary_key).
The index by date only allows reading those parts that contain dates from the desired range. However, a data part may contain data for many dates (up to an entire month), while within a single part the data is ordered by the primary key, which might not contain the date as the first column. Because of this, using a query with only a date condition that does not specify the primary key prefix will cause more data to be read than for a single date.
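A hedged sketch of how the date index and the `force_index_by_date` setting interact, assuming a MergeTree table whose date column is EventDate:

```sql
SET force_index_by_date = 1

-- Accepted: the condition on EventDate lets the index restrict the date range.
SELECT count() FROM table WHERE EventDate = toDate('2014-03-18') AND CounterID = 34

-- Rejected with an exception under this setting: there is no condition on the date column.
SELECT count() FROM table WHERE CounterID = 34
```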
diff --git a/docs/en/table_engines/null.md b/docs/en/table_engines/null.md
old mode 100755
new mode 100644
diff --git a/docs/en/table_engines/replacingmergetree.md b/docs/en/table_engines/replacingmergetree.md
old mode 100755
new mode 100644
diff --git a/docs/en/table_engines/replication.md b/docs/en/table_engines/replication.md
old mode 100755
new mode 100644
index 20dd17e444f..1e58878c34e
--- a/docs/en/table_engines/replication.md
+++ b/docs/en/table_engines/replication.md
@@ -46,7 +46,7 @@ You can specify any existing ZooKeeper cluster and the system will use a directo
If ZooKeeper isn't set in the config file, you can't create replicated tables, and any existing replicated tables will be read-only.
-ZooKeeper isn't used for SELECT queries. In other words, replication doesn't affect the productivity of SELECT queries – they work just as fast as for non-replicated tables. When querying distributed replicated tables, ClickHouse behavior is controlled by the settings [max_replica_delay_for_distributed_queries](../operations/settings/settings.md#settings_settings_max_replica_delay_for_distributed_queries) and [fallback_to_stale_replicas_for_distributed_queries](../operations/settings/settings.md#settings-settings-fallback_to_stale_replicas_for_distributed_queries).
+ZooKeeper isn't used for SELECT queries. In other words, replication doesn't affect the productivity of SELECT queries – they work just as fast as for non-replicated tables. When querying distributed replicated tables, ClickHouse behavior is controlled by the settings [max_replica_delay_for_distributed_queries](../operations/settings/settings.md#settings_settings_max_replica_delay_for_distributed_queries) and [fallback_to_stale_replicas_for_distributed_queries](../operations/settings/settings.md#settings-settings-fallback_to_stale_replicas_for_distributed_queries).
For each INSERT query (more precisely, for each inserted block of data; the INSERT query contains a single block, or per block for every max_insert_block_size = 1048576 rows), approximately ten entries are made in ZooKeeper in several transactions. This leads to slightly longer latencies for INSERT compared to non-replicated tables. But if you follow the recommendations to insert data in batches of no more than one INSERT per second, it doesn't create any problems. The entire ClickHouse cluster used for coordinating one ZooKeeper cluster has a total of several hundred INSERTs per second. The throughput on data inserts (the number of rows per second) is just as high as for non-replicated data.
diff --git a/docs/en/table_engines/set.md b/docs/en/table_engines/set.md
old mode 100755
new mode 100644
diff --git a/docs/en/table_engines/summingmergetree.md b/docs/en/table_engines/summingmergetree.md
old mode 100755
new mode 100644
diff --git a/docs/en/table_engines/tinylog.md b/docs/en/table_engines/tinylog.md
old mode 100755
new mode 100644
diff --git a/docs/en/table_engines/view.md b/docs/en/table_engines/view.md
old mode 100755
new mode 100644
diff --git a/docs/en/table_functions/index.md b/docs/en/table_functions/index.md
old mode 100755
new mode 100644
diff --git a/docs/en/table_functions/merge.md b/docs/en/table_functions/merge.md
old mode 100755
new mode 100644
diff --git a/docs/en/table_functions/remote.md b/docs/en/table_functions/remote.md
old mode 100755
new mode 100644
index e26e245207b..99b0c7bb116
--- a/docs/en/table_functions/remote.md
+++ b/docs/en/table_functions/remote.md
@@ -52,7 +52,7 @@ example01-{01..02}-1
If you have multiple pairs of curly brackets, it generates the direct product of the corresponding sets.
-Addresses and parts of addresses in curly brackets can be separated by the pipe symbol (|). In this case, the corresponding sets of addresses are interpreted as replicas, and the query will be sent to the first healthy replica. However, the replicas are iterated in the order currently set in the [load_balancing](../operations/settings/settings.md#settings-load_balancing) setting.
+Addresses and parts of addresses in curly brackets can be separated by the pipe symbol (|). In this case, the corresponding sets of addresses are interpreted as replicas, and the query will be sent to the first healthy replica. The replicas are evaluated in the order currently set in the [load_balancing](../operations/settings/settings.md#settings-load_balancing) setting.
Example:
diff --git a/docs/en/utils/clickhouse-copier.md b/docs/en/utils/clickhouse-copier.md
old mode 100755
new mode 100644
index 9d15053fe06..25d22f19222
--- a/docs/en/utils/clickhouse-copier.md
+++ b/docs/en/utils/clickhouse-copier.md
@@ -1,54 +1,40 @@
-
+# clickhouse-copier utility
-# clickhouse-copier
+The utility copies table data from one cluster to new tables in another (possibly the same) cluster in a distributed and fault-tolerant manner.
-Copies data from the tables in one cluster to tables in another (or the same) cluster.
+The configuration of copying tasks is set in a special ZooKeeper node (called the `/description` node).
+A ZooKeeper path to the description node is specified via the `--task-path` parameter.
+So, the node `/task/path/description` should contain special XML content describing the copying tasks.
-You can run multiple `clickhouse-copier` instances on different servers to perform the same job. ZooKeeper is used for syncing the processes.
+Many `clickhouse-copier` processes located on any servers can execute the same task simultaneously.
+The ZooKeeper node `/task/path/` is used by the processes to coordinate their work.
+You must not add additional child nodes to `/task/path/`.
-After starting, `clickhouse-copier`:
+Currently you are responsible for manually launching all `clickhouse-copier` processes.
+You can launch as many processes as you want, whenever and wherever you want.
+Each process tries to select the nearest available shard of the source cluster and copy some part of the data (a partition) from it to the whole
+destination cluster (with resharding).
+Therefore it makes sense to launch `clickhouse-copier` processes on the source cluster nodes to reduce network usage.
-- Connects to ZooKeeper and receives:
- - Copying jobs.
- - The state of the copying jobs.
-
-- It performs the jobs.
-
- Each running process chooses the "closest" shard of the source cluster and copies the data into the destination cluster, resharding the data if necessary.
-
-`clickhouse-copier` tracks the changes in ZooKeeper and applies them on the fly.
-
-To reduce network traffic, we recommend running `clickhouse-copier` on the same server where the source data is located.
-
-## Running clickhouse-copier
-
-The utility should be run manually:
-
-```bash
-clickhouse-copier copier --daemon --config zookeeper.xml --task-path /task/path --base-dir /path/to/dir
-```
-
-Parameters:
-
-- `daemon` — Starts `clickhouse-copier` in daemon mode.
-- `config` — The path to the `zookeeper.xml` file with the parameters for the connection to ZooKeeper.
-- `task-path` — The path to the ZooKeeper node. This node is used for syncing `clickhouse-copier` processes and storing tasks. Tasks are stored in `$task-path/description`.
-- `base-dir` — The path to logs and auxiliary files. When it starts, `clickhouse-copier` creates `clickhouse-copier_YYYYMMHHSS_` subdirectories in `$base-dir`. If this parameter is omitted, the directories are created in the directory where `clickhouse-copier` was launched.
-
-## Format of zookeeper.xml
+Since the workers coordinate their work via ZooKeeper, in addition to `--task-path` you have to specify the ZooKeeper
+cluster configuration via the `--config-file` parameter. Example of `zookeeper.xml`:
```xml
+<yandex>
+    <zookeeper>
+        <node index="1"><host>127.0.0.1</host><port>2181</port></node>
+    </zookeeper>
+</yandex>
```
-## Configuration of copying tasks
+When you run `clickhouse-copier` with `--config-file` and `--task-path`, the process connects to the ZooKeeper cluster, reads the tasks config from `/task/path/description`, and executes it.
+
+## Format of task config
+
+Here is an example of `/task/path/description` content:
```xml
@@ -83,62 +69,62 @@ Parameters:
0
-
3
-
+
1
-
-
+
-
+
source_clustertesthits
-
+
destination_clustertesthits2
-
-
- ENGINE=ReplicatedMergeTree('/clickhouse/tables/{cluster}/{shard}/hits2', '{replica}')
+ ENGINE=ReplicatedMergeTree('/clickhouse/tables/{cluster}/{shard}/hits2', '{replica}')
PARTITION BY toMonday(date)
ORDER BY (CounterID, EventDate)
-
+
jumpConsistentHash(intHash64(UserID), 2)
-
+
CounterID != 0
-
'2018-02-26'
@@ -147,7 +133,7 @@ Parameters:
-
+
...
@@ -156,5 +142,15 @@ Parameters:
```
-`clickhouse-copier` tracks the changes in `/task/path/description` and applies them on the fly. For instance, if you change the value of `max_workers`, the number of processes running tasks will also change.
+`clickhouse-copier` processes watch for updates of the `/task/path/description` node.
+So, if you modify the config settings or the `max_workers` param, they will be picked up on the fly.
+## Example
+
+```bash
+clickhouse-copier copier --daemon --config /path/to/copier/zookeeper.xml --task-path /clickhouse-copier/cluster1_tables_hits --base-dir /path/to/copier_logs
+```
+
+`--base-dir /path/to/copier_logs` specifies where auxiliary and log files of the copier process will be saved.
+In this case it will create a `/path/to/copier_logs/clickhouse-copier_YYYYMMHHSS_/` directory with log and status files.
+If it is not specified, it will use the current directory (`/clickhouse-copier_YYYYMMHHSS_/` if it is run as a `--daemon`).
diff --git a/docs/en/utils/clickhouse-local.md b/docs/en/utils/clickhouse-local.md
old mode 100755
new mode 100644
index d5fba56271f..d18cc200320
--- a/docs/en/utils/clickhouse-local.md
+++ b/docs/en/utils/clickhouse-local.md
@@ -1,6 +1,4 @@
-
+# The clickhouse-local program
-#clickhouse-local
-
-The `clickhouse-local` program enables you to perform fast processing on local files that store tables, without having to deploy and configure the ClickHouse server.
+The `clickhouse-local` program enables you to perform fast processing on local files that store tables, without having to deploy and configure clickhouse-server.
diff --git a/docs/en/utils/index.md b/docs/en/utils/index.md
old mode 100755
new mode 100644
index cf541cda895..7a8c5ee5138
--- a/docs/en/utils/index.md
+++ b/docs/en/utils/index.md
@@ -1,5 +1,6 @@
-# ClickHouse utility
+# ClickHouse utilities
-* [clickhouse-local](clickhouse-local.md#utils-clickhouse-local) — Allows running SQL queries on data without stopping the ClickHouse server, similar to how `awk` does this.
-* [clickhouse-copier](clickhouse-copier.md#utils-clickhouse-copier) — Copies (and reshards) data from one cluster to another cluster.
+There are several ClickHouse utilities that are separate executable files:
+* `clickhouse-local` allows executing SQL queries on local data, similar to how `awk` does it
+* `clickhouse-copier` copies (and reshards) immutable data from one cluster to another in a fault-tolerant manner.
diff --git a/docs/mkdocs_en.yml b/docs/mkdocs_en.yml
index eeedc71a79b..d5fadb3f1e1 100644
--- a/docs/mkdocs_en.yml
+++ b/docs/mkdocs_en.yml
@@ -86,7 +86,6 @@ pages:
- 'GraphiteMergeTree': 'table_engines/graphitemergetree.md'
- 'Data replication': 'table_engines/replication.md'
- 'Distributed': 'table_engines/distributed.md'
- - 'Dictionary': 'table_engines/dictionary.md'
- 'Merge': 'table_engines/merge.md'
- 'Buffer': 'table_engines/buffer.md'
- 'File': 'table_engines/file.md'
diff --git a/docs/ru/dicts/external_dicts_dict_layout.md b/docs/ru/dicts/external_dicts_dict_layout.md
index defb0605c0f..ff1b9c0cdd5 100644
--- a/docs/ru/dicts/external_dicts_dict_layout.md
+++ b/docs/ru/dicts/external_dicts_dict_layout.md
@@ -109,15 +109,15 @@
Пример: таблица содержит скидки для каждого рекламодателя в виде:
```
-+---------------+---------------------+-------------------+--------+
-| advertiser id | discount start date | discount end date | amount |
-+===============+=====================+===================+========+
-| 123 | 2015-01-01 | 2015-01-15 | 0.15 |
-+---------------+---------------------+-------------------+--------+
-| 123 | 2015-01-16 | 2015-01-31 | 0.25 |
-+---------------+---------------------+-------------------+--------+
-| 456 | 2015-01-01 | 2015-01-15 | 0.05 |
-+---------------+---------------------+-------------------+--------+
+ +------------------+-----------------------------+------------+----------+
+ | id рекламодателя | дата начала действия скидки | дата конца | величина |
+ +==================+=============================+============+==========+
+ | 123 | 2015-01-01 | 2015-01-15 | 0.15 |
+ +------------------+-----------------------------+------------+----------+
+ | 123 | 2015-01-16 | 2015-01-31 | 0.25 |
+ +------------------+-----------------------------+------------+----------+
+ | 456 | 2015-01-01 | 2015-01-15 | 0.05 |
+ +------------------+-----------------------------+------------+----------+
```
Чтобы использовать выборку по диапазонам дат, необходимо в [structure](external_dicts_dict_structure#dicts-external_dicts_dict_structure) определить элементы `range_min`, `range_max`.
diff --git a/docs/ru/introduction/ya_metrika_task.md b/docs/ru/introduction/ya_metrika_task.md
index 765c0450890..24e595b2c49 100644
--- a/docs/ru/introduction/ya_metrika_task.md
+++ b/docs/ru/introduction/ya_metrika_task.md
@@ -1,6 +1,6 @@
# Постановка задачи в Яндекс.Метрике
-ClickHouse на данный момент обеспечивает работу [Яндекс.Метрики](https://metrika.yandex.ru/), [второй крупнейшей в мире](http://w3techs.com/technologies/overview/traffic_analysis/all) платформы для веб аналитики. При более 13 триллионах записей в базе данных и более 20 миллиардах событий в сутки, ClickHouse позволяет генерировать индивидуально настроенные отчёты на лету напрямую из неагрегированных данных.
+ClickHouse на данный момент обеспечивает работу [Яндекс.Метрики](https://metrika.yandex.ru/), [второй крупнейшей в мире](http://w3techs.com/technologies/overview/traffic_analysis/all) платформы для веб аналитики. При более 13 триллионах записей в базе данных и более 20 миллиардах событий в сутки, ClickHouse позволяет генерировать индивидуально настроенные отчёты на лету напрямую из неагрегированных данных.
Нужно получать произвольные отчёты на основе хитов и визитов, с произвольными сегментами, задаваемыми пользователем. Данные для отчётов обновляются в реальном времени. Запросы должны выполняться сразу (в режиме онлайн). Отчёты должно быть возможно строить за произвольный период. Требуется вычислять сложные агрегаты типа количества уникальных посетителей.
На данный момент (апрель 2014), каждый день в Яндекс.Метрику поступает около 12 миллиардов событий (хитов и кликов мыши). Все эти события должны быть сохранены для возможности строить произвольные отчёты. Один запрос может потребовать просканировать сотни миллионов строк за время не более нескольких секунд, или миллионы строк за время не более нескольких сотен миллисекунд.
diff --git a/docs/ru/roadmap.md b/docs/ru/roadmap.md
index bbdc740bf01..13c2b60c094 100644
--- a/docs/ru/roadmap.md
+++ b/docs/ru/roadmap.md
@@ -5,15 +5,15 @@
### Новая функциональность
- Поддержка `UPDATE` и `DELETE`.
- Многомерные и вложенные массивы.
-
+
Это может выглядеть например так:
-
+
```sql
CREATE TABLE t
(
- x Array(Array(String)),
+ x Array(Array(String)),
z Nested(
- x Array(String),
+ x Array(String),
y Nested(...))
)
ENGINE = MergeTree ORDER BY x
@@ -24,7 +24,7 @@ ENGINE = MergeTree ORDER BY x
Внешние таблицы можно интегрировать в ClickHouse с помощью внешних словарей. Новая функциональность станет более удобной альтернативой для подключения внешних таблиц.
```sql
-SELECT ...
+SELECT ...
FROM mysql('host:port', 'db', 'table', 'user', 'password')`
```
@@ -40,7 +40,7 @@ INSERT INTO t SELECT * FROM remote(...) `.
- O_DIRECT for merges.
Улучшит производительность кэша операционной системы, а также производительность 'горячих' запросов.
-
+
## Q2 2018
### Новая функциональность
@@ -56,23 +56,23 @@ INSERT INTO t SELECT * FROM remote(...) `.
- Настраиваемые кодировки и сжатие для каждого столбца в отдельности.
Сейчас, ClickHouse поддерживает сжатие столбцов с помощью LZ4 и ZSTD, и настройки сжатия глобальные (смотрите статью [Compression in ClickHouse](https://www.altinity.com/blog/2017/11/21/compression-in-clickhouse)). Поколоночное сжатие и кодирование обеспечит более эффективное хранение данных, что в свою очередь ускорит выполнение запросов.
-
+
- Хранение данных на нескольких дисках на одном сервере.
Реализация это функциональности упростит расширение дискового пространства, поскольку можно будет использовать различные дисковые системы для разных баз данных или таблиц. Сейчас, пользователи вынуждены использовать символические ссылки, если базы данных и таблицы должны храниться на другом диске.
-
+
### Улучшения
Планируется множество улучшений и исправлений в системе выполнения запросов. Например:
- Использование индекса для `in (subquery)`.
- Сейчас, индекс не используется, что приводит к снижению производительности.
-
+ Сейчас, индекс не используется, что приводит к снижению производительности.
+
- Передача предикатов из `where` в подзапросы, а также передача предикатов в представления.
Передача предикатов необходима, поскольку представление заменяется подзапросом. Сейчас производительность фильтров для представлений низкая, представления не могут использовать первичный ключ оригинальной таблицы, что делает представления для больших таблиц бесполезными.
-
+
- Оптимизация операций с ветвлением (тернарный оператор, if, multiIf).
Сейчас, ClickHouse выполняет все ветви, даже если в этом нет необходимости.
@@ -88,7 +88,7 @@ INSERT INTO t SELECT * FROM remote(...) `.
- Пулы ресурсов для выполнения запросов.
Позволят более эффективно управлять нагрузкой.
-
+
- Синтаксис ANSI SQL JOIN.
Улучшит совместимость ClickHouse со множеством SQL-инструментов.