CLICKHOUSE-2720: progress on website (#865)

* update presentations

* CLICKHOUSE-2936: redirect from clickhouse.yandex.ru and clickhouse.yandex.com

* update submodule

* lost files

* CLICKHOUSE-2981: prefer sphinx docs over original reference

* CLICKHOUSE-2981: docs styles more similar to main website + add flags to switch language links

* update presentations

* Less confusing directory structure (docs -> doc/reference/)

* Minify sphinx docs too

* Website release script: fail fast + pass docker hash on deploy

* Do not underline links in docs

* shorter

* cleanup docker images

* tune nginx config

* CLICKHOUSE-3043: get rid of habrastorage links

* Lost translation

* CLICKHOUSE-2936: temporary client-side redirect

* behaves weird in test

* put redirect back

* CLICKHOUSE-3047: copy docs txts to public too

* move to proper file

* remove old pages to avoid confusion

* Remove reference redirect warning for now

* Refresh README.md

* Yellow buttons in docs

* Use svg flags instead of unicode ones in docs

* fix test website instance

* Put flags to separate files

* wrong flag

* Copy Yandex.Metrica introduction from main page to docs

* Yet another home page structure change, couple new blocks (CLICKHOUSE-3045)

* Update Contacts section

* CLICKHOUSE-2849: more detailed legal information

* CLICKHOUSE-2978 preparation - split by files

* More changes in Contacts block

* Tune texts on index page

* update presentations

* One more benchmark

* Add usage sections to index page, adapted from slides

* Get the roadmap started, based on slides from last ClickHouse Meetup

* CLICKHOUSE-2977: some rendering tuning

* Get rid of excessive section in the end of getting started

* Make headers linkable

* CLICKHOUSE-2981: links to editing reference - https://github.com/yandex/ClickHouse/issues/849

* CLICKHOUSE-2981: fix mobile styles in docs

* Ban crawling of duplicating docs

* Open some external links in new tab

* Ban old docs too

* Lots of trivial fixes in english docs

* Lots of trivial fixes in russian docs

* Remove getting started copies in markdown

* Add Yandex.Webmaster

* Fix some sphinx warnings

* More warnings fixed in english docs

* More sphinx warnings fixed

* Add code-block:: text

* More code-block:: text

* These headers don't look that good

* Better switch between documentation languages

* merge use_case.rst into ya_metrika_task.rst

* Edit the agg_functions.rst texts

* Add lost empty lines
Ivan Blinkov 2017-06-13 07:15:47 +03:00 committed by alexey-milovidov
parent 49eab81d9a
commit e8a5804e76
190 changed files with 1875 additions and 1732 deletions

@ -1 +1 @@
Subproject commit 20b95b4022d0d8749ba9e07385bbbf14c7e0bb8b
Subproject commit 7ddc53e4d70bc5f6e63d044b99ec81f3b0d0bd5a

View File

@ -102,3 +102,14 @@ input[type="submit"] {
font-style: normal;
font-stretch: normal
}
@media screen and (max-width: 870px) {
div.sphinxsidebar a {
color: #fff;
}
div.document, div.footer {
width: auto;
}
}

View File

@ -0,0 +1,38 @@
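// Sidebar helper links: "#edit" opens the matching source file on GitHub
// for editing, while "#en"/"#ru" switch the documentation language by
// rewriting the current URL.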
$(function() {
$('a[href="#edit"]').on('click', function(e) {
e.preventDefault();
var pathname = window.location.pathname;
var url;
if (pathname.indexOf('html') >= 0) {
url = pathname.replace('/docs/', 'https://github.com/yandex/ClickHouse/edit/master/doc/reference/').replace('html', 'rst');
} else {
if (pathname.indexOf('/single/') >= 0) {
if (pathname.indexOf('ru') >= 0) {
url = 'https://github.com/yandex/ClickHouse/tree/master/doc/reference/ru';
} else {
url = 'https://github.com/yandex/ClickHouse/tree/master/doc/reference/en';
}
} else {
if (pathname.indexOf('ru') >= 0) {
url = 'https://github.com/yandex/ClickHouse/edit/master/doc/reference/ru/index.rst';
} else {
url = 'https://github.com/yandex/ClickHouse/edit/master/doc/reference/en/index.rst';
}
}
}
if (url) {
var win = window.open(url, '_blank');
win.focus();
}
});
$('a[href="#en"]').on('click', function(e) {
e.preventDefault();
window.location = window.location.toString().replace('/ru/', '/en/');
});
$('a[href="#ru"]').on('click', function(e) {
e.preventDefault();
window.location = window.location.toString().replace('/en/', '/ru/');
});
});

View File

@ -1,22 +1,22 @@
Aggregate functions
==================
===================
count()
-------
Counts the number of rows. Accepts zero arguments and returns UInt64.
The syntax COUNT(DISTINCT x) is not supported. The separate 'uniq' aggregate function exists for this purpose.
The syntax ``COUNT(DISTINCT x)`` is not supported. The separate ``uniq`` aggregate function exists for this purpose.
A 'SELECT count() FROM table' query is not optimized, because the number of entries in the table is not stored separately. It will select some small column from the table and count the number of values in it.
A ``SELECT count() FROM table`` query is not optimized, because the number of entries in the table is not stored separately. It will select some small column from the table and count the number of values in it.
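For illustration, a sketch of the recommended pattern (table and column names here are hypothetical):

.. code-block:: sql

    SELECT uniq(UserID) FROM hits

This gives the approximate distinct count that ``COUNT(DISTINCT UserID)`` would express in other DBMSs.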
any(x)
------
Selects the first encountered value.
The query can be executed in any order and even in a different order each time, so the result of this function is indeterminate.
To get a determinate result, you can use the 'min' or 'max' function instead of 'any'.
To get a determinate result, you can use the ``min`` or ``max`` function instead of ``any``.
In some cases, you can rely on the order of execution. This applies to cases when SELECT comes from a subquery that uses ORDER BY.
In some cases, you can rely on the order of execution. This applies to cases when ``SELECT`` comes from a subquery that uses ``ORDER BY``.
When a SELECT query has the GROUP BY clause or at least one aggregate function, ClickHouse (in contrast to MySQL) requires that all expressions in the SELECT, HAVING, and ORDER BY clauses be calculated from keys or from aggregate functions. That is, each column selected from the table must be used either in keys, or inside aggregate functions. To get behavior like in MySQL, you can put the other columns in the 'any' aggregate function.
When a SELECT query has the GROUP BY clause or at least one aggregate function, ClickHouse (in contrast to for example MySQL) requires that all expressions in the ``SELECT``, ``HAVING`` and ``ORDER BY`` clauses be calculated from keys or from aggregate functions. That is, each column selected from the table must be used either in keys, or inside aggregate functions. To get behavior like in MySQL, you can put the other columns in the ``any`` aggregate function.
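As an illustrative sketch (hypothetical table and columns), a MySQL-style query would be written as:

.. code-block:: sql

    SELECT CounterID, any(Title) AS title, count() AS hits
    FROM hits
    GROUP BY CounterID

``Title`` is not a grouping key, so it has to be wrapped in an aggregate function such as ``any``.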
anyLast(x)
----------
@ -28,7 +28,7 @@ min(x)
Calculates the minimum.
max(x)
-----
------
Calculates the maximum.
argMin(arg, val)
@ -36,11 +36,11 @@ argMin(arg, val)
Calculates the 'arg' value for a minimal 'val' value. If there are several different values of 'arg' for minimal values of 'val', the first of these values encountered is output.
argMax(arg, val)
---------------
----------------
Calculates the 'arg' value for a maximum 'val' value. If there are several different values of 'arg' for maximum values of 'val', the first of these values encountered is output.
sum(x)
-------
------
Calculates the sum.
Only works for numbers.
@ -51,88 +51,88 @@ Only works for numbers.
The result is always Float64.
uniq(x)
--------
-------
Calculates the approximate number of different values of the argument. Works for numbers, strings, dates, and dates with times.
Uses an adaptive sampling algorithm: for the calculation state, it uses a sample of element hash values with a size up to 65535.
Compared with the widely known HyperLogLog algorithm, this algorithm is less effective in terms of accuracy and memory consumption (even up to proportionality), but it is adaptive. This means that with fairly high accuracy, it consumes less memory during simultaneous computation of cardinality for a large number of data sets whose cardinality has power law distribution (i.e. in cases when most of the data sets are small). This algorithm is also very accurate for data sets with small cardinality (up to 65536) and very efficient on CPU (when computing not too many of these functions, using 'uniq' is almost as fast as using other aggregate functions).
Compared with the widely known `HyperLogLog <https://en.wikipedia.org/wiki/HyperLogLog>`_ algorithm, this algorithm is less effective in terms of accuracy and memory consumption (even up to proportionality), but it is adaptive. This means that with fairly high accuracy, it consumes less memory during simultaneous computation of cardinality for a large number of data sets whose cardinality has power law distribution (i.e. in cases when most of the data sets are small). This algorithm is also very accurate for data sets with small cardinality (up to 65536) and very efficient on CPU (when computing not too many of these functions, using ``uniq`` is almost as fast as using other aggregate functions).
There is no compensation for the bias of an estimate, so for large data sets the results are systematically deflated. This function is normally used for computing the number of unique visitors in Yandex.Metrica, so this bias does not play a role.
The result is determinate (it doesn't depend on the order of query execution).
The result is deterministic (it does not depend on the order of query execution).
uniqCombined(x)
--------------
---------------
Approximately computes the number of different values of the argument. Works for numbers, strings, dates, dates-with-times, for several arguments, and for tuples of arguments.
A combination of three algorithms is used: an array, a hash table and HyperLogLog with an error correction table. The memory consumption is several times smaller than the uniq function, and the accuracy is several times higher. The speed of operation is slightly lower than that of the uniq function, but sometimes it can be even higher - in the case of distributed requests, in which a large number of aggregation states are transmitted over the network. The maximum state size is 96 KiB (HyperLogLog of 2^17 6-bit cells).
A combination of three algorithms is used: an array, a hash table and `HyperLogLog <https://en.wikipedia.org/wiki/HyperLogLog>`_ with an error correction table. The memory consumption is several times smaller than the ``uniq`` function, and the accuracy is several times higher. The speed of operation is slightly lower than that of the ``uniq`` function, but sometimes it can be even higher - in the case of distributed requests, in which a large number of aggregation states are transmitted over the network. The maximum state size is 96 KiB (HyperLogLog of 2^17 6-bit cells).
The result is deterministic (it does not depend on the order of query execution).
The uniqCombined function is a good default choice for calculating the number of different values.
The ``uniqCombined`` function is a good default choice for calculating the number of different values.
uniqHLL12(x)
------------
Uses the HyperLogLog algorithm to approximate the number of different values of the argument. It uses 2^12 5-bit cells. The size of the state is slightly more than 2.5 KB.
Uses the `HyperLogLog <https://en.wikipedia.org/wiki/HyperLogLog>`_ algorithm to approximate the number of different values of the argument. It uses 2^12 5-bit cells. The size of the state is slightly more than 2.5 KB.
The result is determinate (it doesn't depend on the order of query execution).
The result is deterministic (it does not depend on the order of query execution).
In most cases, use the 'uniq' function. You should only use this function if you understand its advantages well.
uniqExact(x)
------------
Calculates the number of different values of the argument, exactly.
There is no reason to fear approximations, so it's better to use the 'uniq' function.
You should use the 'uniqExact' function if you definitely need an exact result.
There is no reason to fear approximations, so it's better to use the ``uniq`` function.
You should use the ``uniqExact`` function if you definitely need an exact result.
The 'uniqExact' function uses more memory than the 'uniq' function, because the size of the state has unbounded growth as the number of different values increases.
The ``uniqExact`` function uses more memory than the ``uniq`` function, because the size of the state has unbounded growth as the number of different values increases.
groupArray(x)
------------
-------------
Creates an array of argument values.
Values can be added to the array in any (indeterminate) order.
In some cases, you can rely on the order of execution. This applies to cases when SELECT comes from a subquery that uses ORDER BY.
In some cases, you can rely on the order of execution. This applies to cases when ``SELECT`` comes from a subquery that uses ``ORDER BY``.
groupUniqArray(x)
-----------------
Creates an array from different argument values. Memory consumption is the same as for the 'uniqExact' function.
Creates an array from different argument values. Memory consumption is the same as for the ``uniqExact`` function.
quantile(level)(x)
------------------
Approximates the 'level' quantile. 'level' is a constant, a floating-point number from 0 to 1. We recommend using a 'level' value in the range of 0.01 .. 0.99.
Approximates the 'level' quantile. 'level' is a constant, a floating-point number from 0 to 1. We recommend using a 'level' value in the range of 0.01..0.99.
Don't use a 'level' value equal to 0 or 1 - use the 'min' and 'max' functions for these cases.
The algorithm is the same as for the 'median' function. Actually, 'quantile' and 'median' are internally the same function. You can use the 'quantile' function without parameters - in this case, it calculates the median, and you can use the 'median' function with parameters - in this case, it calculates the quantile of the set level.
The algorithm is the same as for the ``median`` function. Actually, ``quantile`` and ``median`` are internally the same function. You can use the ``quantile`` function without parameters - in this case, it calculates the median, and you can use the ``median`` function with parameters - in this case, it calculates the quantile of the set level.
When using multiple 'quantile' and 'median' functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the 'quantiles' function.
When using multiple ``quantile`` and ``median`` functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the ``quantiles`` function.
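For illustration, a sketch that computes three levels in a single pass over a hypothetical column:

.. code-block:: sql

    SELECT quantiles(0.5, 0.9, 0.99)(SendTiming) FROM hits

The result is an array with one value per requested level.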
quantileDeterministic(level)(x, determinator)
--------------
Calculates the quantile of 'level' using the same algorithm as the 'medianDeterministic' function.
---------------------------------------------
Calculates the quantile of 'level' using the same algorithm as the ``medianDeterministic`` function.
quantileTiming(level)(x)
---------------
Calculates the quantile of 'level' using the same algorithm as the 'medianTiming' function.
------------------------
Calculates the quantile of 'level' using the same algorithm as the ``medianTiming`` function.
quantileTimingWeighted(level)(x, weight)
---------------
Calculates the quantile of 'level' using the same algorithm as the 'medianTimingWeighted' function.
----------------------------------------
Calculates the quantile of 'level' using the same algorithm as the ``medianTimingWeighted`` function.
quantileExact(level)(x)
------------
Computes the level quantile exactly. To do this, all transferred values are added to an array, which is then partially sorted. Therefore, the function consumes O (n) memory, where n is the number of transferred values. However, for a small number of values, the function is very effective.
-----------------------
Computes the level quantile exactly. To do this, all transferred values are added to an array, which is then partially sorted. Therefore, the function consumes O(n) memory, where n is the number of transferred values. However, for a small number of values, the function is very efficient.
quantileExactWeighted(level)(x, weight)
----------------
---------------------------------------
Computes the level quantile exactly. In this case, each value is taken into account with its weight, as if it were present 'weight' times. The arguments of the function can be considered as histograms, where the value 'x' corresponds to a histogram 'column' of height 'weight', and the function itself can be considered as a summation of histograms.
The algorithm is a hash table. Because of this, if the transmitted values are frequently repeated, the function consumes less RAM than quantileExact. You can use this function instead of quantileExact by specifying the number 1 as the weight.
quantileTDigest(level)(x)
-------------
Computes the level quantile approximatively, using the t-digest algorithm. The maximum error is 1%. The memory consumption per state is proportional to the logarithm of the number of transmitted values.
-------------------------
Computes the level quantile approximately, using the `t-digest <https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf>`_ algorithm. The maximum error is 1%. The memory consumption per state is proportional to the logarithm of the number of transmitted values.
The performance of the function is lower than that of ``quantile`` or ``quantileTiming``. In terms of the ratio of state size to accuracy, the function is much better than ``quantile``.
@ -140,24 +140,24 @@ The result depends on the order in which the query is executed, and is nondeterm
median
------
Approximates the median. Also see the similar 'quantile' function.
Approximates the median. Also see the similar ``quantile`` function.
Works for numbers, dates, and dates with times.
For numbers it returns Float64, for dates - a date, and for dates with times - a date with time.
Uses reservoir sampling with a reservoir size up to 8192.
Uses `reservoir sampling <https://en.wikipedia.org/wiki/Reservoir_sampling>`_ with a reservoir size up to 8192.
If necessary, the result is output with linear approximation from the two neighboring values.
This algorithm proved to be more practical than another well-known algorithm - QDigest.
The result depends on the order of running the query, and is nondeterministic.
quantiles(level1, level2, ...)(x)
---------------
---------------------------------
Approximates quantiles of all specified levels.
The result is an array containing the corresponding number of values.
varSamp(x)
--------
Calculates the amount Σ((x - x̅)^2) / (n - 1), where 'n' is the sample size and 'x̅' is the average value of 'x'.
----------
Calculates the amount ``Σ((x - x̅)^2) / (n - 1)``, where 'n' is the sample size and 'x̅' is the average value of 'x'.
It represents an unbiased estimate of the variance of a random variable, if the values passed to the function are a sample of this random amount.
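A small worked example of this formula, as a sketch:

.. code-block:: sql

    SELECT varSamp(x) FROM (SELECT arrayJoin([1, 2, 3, 4]) AS x)

For this sample, x̅ = 2.5 and Σ((x - x̅)^2) = 5, so the result is 5 / 3 ≈ 1.667.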
@ -165,40 +165,40 @@ Returns Float64. If n <= 1, it returns +∞.
varPop(x)
---------
Calculates the amount Σ((x - x̅)^2) / n, where 'n' is the sample size and 'x̅' is the average value of 'x'.
Calculates the amount ``Σ((x - x̅)^2) / n``, where 'n' is the sample size and 'x̅' is the average value of 'x'.
In other words, dispersion for a set of values. Returns Float64.
stddevSamp(x)
-----------
The result is equal to the square root of 'varSamp(x)'.
-------------
The result is equal to the square root of ``varSamp(x)``.
stddevPop(x)
---------
The result is equal to the square root of 'varPop(x)'.
------------
The result is equal to the square root of ``varPop(x)``.
covarSamp(x, y)
----------
Calculates the value of Σ((x - x̅)(y - y̅)) / (n - 1).
---------------
Calculates the value of ``Σ((x - x̅)(y - y̅)) / (n - 1)``.
Returns Float64. If n <= 1, it returns +∞.
covarPop(x, y)
----------
Calculates the value of Σ((x - x̅)(y - y̅)) / n.
--------------
Calculates the value of ``Σ((x - x̅)(y - y̅)) / n``.
corr(x, y)
---------
Calculates the Pearson correlation coefficient: Σ((x - x̅)(y - y̅)) / sqrt(Σ((x - x̅)^2) * Σ((y - y̅)^2)).
----------
Calculates the Pearson correlation coefficient: ``Σ((x - x̅)(y - y̅)) / sqrt(Σ((x - x̅)^2) * Σ((y - y̅)^2))``.
Parametric aggregate functions
================
==============================
Some aggregate functions can accept not only argument columns (used for compression), but a set of parameters - constants for initialization. The syntax is two pairs of brackets instead of one. The first is for parameters, and the second is for arguments.
sequenceMatch(pattern)(time, cond1, cond2, ...)
------------
-----------------------------------------------
Pattern matching for event chains.
'pattern' is a string containing a pattern to match. The pattern is similar to a regular expression.
@ -208,11 +208,15 @@ Pattern matching for event chains.
The function collects a sequence of events in RAM. Then it checks whether this sequence matches the pattern.
It returns UInt8 - 0 if the pattern isn't matched, or 1 if it matches.
Example: sequenceMatch('(?1).*(?2)')(EventTime, URL LIKE '%company%', URL LIKE '%cart%')
Example: ``sequenceMatch('(?1).*(?2)')(EventTime, URL LIKE '%company%', URL LIKE '%cart%')``
- whether there was a chain of events in which pages whose address contains 'company' were visited earlier than pages whose address contains 'cart'.
This is a degenerate example. You could write it using other aggregate functions:
minIf(EventTime, URL LIKE '%company%') < maxIf(EventTime, URL LIKE '%cart%').
This is a simple example. You could write it using other aggregate functions:
.. code-block:: sql
    minIf(EventTime, URL LIKE '%company%') < maxIf(EventTime, URL LIKE '%cart%')
However, there is no such solution for more complex situations.
Pattern syntax:
@ -226,12 +230,12 @@ Any number may be specified in place of 1800.
Events that occur during the same second may be put in the chain in any order. This may affect the result of the function.
sequenceCount(pattern)(time, cond1, cond2, ...)
------------------
-----------------------------------------------
Similar to the sequenceMatch function, but instead of returning whether the chain of events occurred, it returns UInt64 - the number of event chains found.
Chains are searched without overlapping. That is, the following chain can start only after the end of the previous one.
uniqUpTo(N)(x)
-------------
--------------
Calculates the number of different argument values, if it is less than or equal to N.
If the number of different argument values is greater than N, it returns N + 1.
@ -247,12 +251,12 @@ Problem: Generate a report that shows only keywords that produced at least 5 uni
Solution: Write in the query ``GROUP BY SearchPhrase HAVING uniqUpTo(4)(UserID) >= 5``
Aggregate function combinators
=======================
==============================
The name of an aggregate function can have a suffix appended to it. This changes the way the aggregate function works.
There are ``If`` and ``Array`` combinators. See the sections below.
-If combinator. Conditional aggregate functions
---------------------
If combinator. Conditional aggregate functions
----------------------------------------------
The suffix ``-If`` can be appended to the name of any aggregate function. In this case, the aggregate function accepts an extra argument - a condition (UInt8 type). The aggregate function processes only the rows that trigger the condition. If the condition was not triggered even once, it returns a default value (usually zeros or empty strings).
Examples: ``sumIf(column, cond)``, ``countIf(cond)``, ``avgIf(x, cond)``, ``quantilesTimingIf(level1, level2)(x, cond)``, ``argMinIf(arg, val, cond)`` and so on.
@ -260,8 +264,8 @@ Examples: ``sumIf(column, cond)``, ``countIf(cond)``, ``avgIf(x, cond)``, ``quan
You can use aggregate functions to calculate aggregates for multiple conditions at once, without using subqueries and JOINs.
For example, in Yandex.Metrica, we use conditional aggregate functions for implementing segment comparison functionality.
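A sketch of this pattern (the ``TrafficSource`` column is hypothetical):

.. code-block:: sql

    SELECT
        countIf(TrafficSource = 'search') AS search_visits,
        countIf(TrafficSource = 'direct') AS direct_visits
    FROM visits

Both counters are computed in a single scan of the table, without subqueries or JOINs.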
-Array combinator. Aggregate functions for array arguments
-----------------
Array combinator. Aggregate functions for array arguments
---------------------------------------------------------
The -Array suffix can be appended to any aggregate function. In this case, the aggregate function takes arguments of the 'Array(T)' type (arrays) instead of 'T' type arguments. If the aggregate function accepts multiple arguments, these must be arrays of equal length. When processing arrays, the aggregate function works like the original aggregate function across all array elements.
Example 1: ``sumArray(arr)`` - Totals all the elements of all 'arr' arrays. In this example, it could have been written more simply: sum(arraySum(arr)).
@ -271,14 +275,14 @@ Example 2: ``uniqArray(arr)`` - Count the number of unique elements in all 'arr'
The ``-If`` and ``-Array`` combinators can be used together. However, 'Array' must come first, then 'If'.
Examples: ``uniqArrayIf(arr, cond)``, ``quantilesTimingArrayIf(level1, level2)(arr, cond)``. Due to this order, the 'cond' argument can't be an array.
-State combinator
------------
State combinator
----------------
If this combinator is used, the aggregate function returns a non-completed/non-finished value (for example, in the case of the ``uniq`` function, the number of unique values), and the intermediate aggregation state (for example, in the case of the ``uniq`` function, a hash table for calculating the number of unique values), which has type of ``AggregateFunction(...)`` and can be used for further processing or can be saved to a table for subsequent pre-aggregation - see the sections "AggregatingMergeTree" and "functions for working with intermediate aggregation states".
-Merge combinator
------------
Merge combinator
----------------
In the case of using this combinator, the aggregate function will take as an argument the intermediate state of an aggregation, pre-aggregate (combine together) these states, and return the finished/complete value.
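A sketch of how the two combinators pair up (hypothetical table and columns):

.. code-block:: sql

    SELECT uniqMerge(u) AS total_unique_users
    FROM
    (
        SELECT EventDate, uniqState(UserID) AS u
        FROM hits
        GROUP BY EventDate
    )

The inner query produces a per-day intermediate state; ``uniqMerge`` combines these states into the final value.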
-MergeState combinator
----------------
MergeState combinator
---------------------
Merges the intermediate aggregation states, similar to the -Merge combinator, but returns not a completed value but an intermediate aggregation state, similar to the -State combinator.

View File

@ -125,10 +125,11 @@ html_theme_options = {
'link': '#08f',
'link_hover': 'red',
'extra_nav_links': collections.OrderedDict([
('Switch to Russian <img id="svg-flag" src="/docs/en/_static/ru.svg" width="20" height="12" />', '/docs/ru/'),
('Switch to Russian <img id="svg-flag" src="/docs/en/_static/ru.svg" width="20" height="12" />', '#ru'),
('Single page documentation', '/docs/en/single/'),
('Website home', '/'),
('GitHub', 'https://github.com/yandex/ClickHouse'),
('ClickHouse repository', 'https://github.com/yandex/ClickHouse'),
('Edit this page', '#edit'),
])
}
@ -290,3 +291,6 @@ texinfo_documents = [
# If true, do not generate a @detailmenu in the "Top" node's menu.
#texinfo_no_detailmenu = False
def setup(app):
app.add_javascript('custom.js')

View File

@ -1,5 +1,5 @@
Configuration files
======================
===================
The main server config file is ``config.xml``. It resides in the ``/etc/clickhouse-server/`` directory.

View File

@ -2,4 +2,5 @@ Array(T)
--------
Array of T-type items. The T type can be any type, including an array.
We don't recommend using multidimensional arrays, because they are not well supported (for example, you can't store multidimensional arrays in tables with engines from the MergeTree family).

View File

@ -1,4 +1,4 @@
Boolean
---------------
-------
There is no separate type for boolean values. For them, the type UInt8 is used, in which only the values 0 and 1 are used.

View File

@ -5,7 +5,7 @@ Date with time. Stored in four bytes as a Unix timestamp (unsigned). Allows stor
Time zones
~~~~~~~~~~~~~
~~~~~~~~~~
The date with time is converted from text (divided into component parts) to binary and back, using the system's time zone at the time the client or server starts. In text format, information about daylight saving time is lost.

View File

@ -5,9 +5,11 @@ Enum8 or Enum16. A set of enumerated string values that are stored as Int8 or In
Example:
::
.. code-block:: sql
Enum8('hello' = 1, 'world' = 2)
- This data type has two possible values - 'hello' and 'world'.
This data type has two possible values - 'hello' and 'world'.
The numeric values must be within -128..127 for ``Enum8`` and -32768..32767 for ``Enum16``. Every member of the enum must also have different numbers. The empty string is a valid value. The numbers do not need to be sequential and can be in any order. The order does not matter.
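For instance, a minimal sketch of declaring and using such a column (the table name is hypothetical):

.. code-block:: sql

    CREATE TABLE enum_test (x Enum8('hello' = 1, 'world' = 2)) ENGINE = Memory

    INSERT INTO enum_test VALUES ('hello'), ('world')

Values are written and read as strings, while only the underlying Int8 numbers are stored.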

View File

@ -1,5 +1,5 @@
Data types
===========
==========
.. toctree::
:glob:

View File

@ -4,7 +4,7 @@ UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64
Fixed-length integers, with or without a sign.
Int ranges
"""""""""""""
""""""""""
.. table::
@ -23,7 +23,7 @@ Int ranges
Uint ranges
""""""""""""""
"""""""""""
.. table::

View File

@ -1,4 +1,4 @@
AggregateFunction(name, types_of_arguments...)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The intermediate state of an aggregate function. To get it, use aggregate functions with the '-State' suffix. For more information, see "AggregatingMergeTree".

View File

@ -1,5 +1,5 @@
Nested data structures
--------------------------
----------------------
.. toctree::
:glob:

View File

@ -45,6 +45,8 @@ Example:
WHERE CounterID = 101500 AND length(Goals.ID) < 5
LIMIT 10
.. code-block:: text
┌─Goals.ID───────────────────────┬─Goals.EventTime───────────────────────────────────────────────────────────────────────────┐
│ [1073752,591325,591325] │ ['2014-03-17 16:38:10','2014-03-17 16:38:48','2014-03-17 16:42:27'] │
│ [1073752] │ ['2014-03-17 00:28:25'] │
@ -72,6 +74,8 @@ The only place where a SELECT query can specify the name of an entire nested dat
WHERE CounterID = 101500 AND length(Goals.ID) < 5
LIMIT 10
.. code-block:: text
┌─Goal.ID─┬──────Goal.EventTime─┐
│ 1073752 │ 2014-03-17 16:38:10 │
│ 591325 │ 2014-03-17 16:38:48 │
@ -85,6 +89,7 @@ The only place where a SELECT query can specify the name of an entire nested dat
│ 1073752 │ 2014-03-17 11:37:06 │
└─────────┴─────────────────────┘
You can't perform SELECT for an entire nested data structure. You can only explicitly list individual columns that are part of it.
For an INSERT query, you should pass all the component column arrays of a nested data structure separately (as if they were individual column arrays). During insertion, the system checks that they have the same length.

View File

@ -1,5 +1,5 @@
Special data types
----------------------
------------------
Special data type values can't be saved to a table or output in results, but are used as the intermediate result of running a query.

View File

@ -5,7 +5,7 @@ Strings of an arbitrary length. The length is not limited. The value can contain
The String type replaces the types VARCHAR, BLOB, CLOB, and others from other DBMSs.
Кодировки
Encodings
~~~~~~~~~
ClickHouse doesn't have the concept of encodings. Strings can contain an arbitrary set of bytes, which are stored and output as-is.

View File

@ -1,15 +1,15 @@
External dictionaries
===============
=====================
It is possible to add your own dictionaries from various data sources. The data source for a dictionary can be a file in the local file system, the ClickHouse server, or a MySQL server.
A dictionary can be stored completely in RAM and updated regularly, or it can be partially cached in RAM and dynamically load missing values.
The configuration of external dictionaries is in a separate file or files specified in the 'dictionaries_config' configuration parameter.
The configuration of external dictionaries is in a separate file or files specified in the ``dictionaries_config`` configuration parameter.
This parameter contains the absolute or relative path to the file with the dictionary configuration. A relative path is relative to the directory with the server config file. The path can contain wildcards * and ?, in which case all matching files are found. Example: dictionaries/*.xml.
The dictionary configuration, as well as the set of files with the configuration, can be updated without restarting the server. The server checks updates every 5 seconds. This means that dictionaries can be enabled dynamically.
Dictionaries can be created when starting the server, or at first use. This is defined by the 'dictionaries_lazy_load' parameter in the main server config file. This parameter is optional, 'true' by default. If set to 'true', each dictionary is created at first use. If dictionary creation failed, the function that was using the dictionary throws an exception. If 'false', all dictionaries are created when the server starts, and if there is an error, the server shuts down.
Dictionaries can be created when starting the server, or at first use. This is defined by the ``dictionaries_lazy_load`` parameter in the main server config file. This parameter is optional, 'true' by default. If set to 'true', each dictionary is created at first use. If dictionary creation failed, the function that was using the dictionary throws an exception. If 'false', all dictionaries are created when the server starts, and if there is an error, the server shuts down.
The dictionary config file has the following format:
@ -142,25 +142,27 @@ The dictionary identifier (key attribute) should be a number that fits into UInt
There are six ways to store dictionaries in memory.
flat
-----
----
This is the most efficient method. It works if all keys are smaller than ``500,000``. If a larger key is discovered when creating the dictionary, an exception is thrown and the dictionary is not created. The dictionary is loaded into RAM in its entirety. The dictionary uses an amount of memory proportional to the maximum key value. With the limit of 500,000, memory consumption is unlikely to be high. All types of sources are supported. When updating, data (from a file or from a table) is read in its entirety.
hashed
-------
------
This method is slightly less efficient than the first one. The dictionary is also loaded into RAM in its entirety, and can contain any number of items with any identifiers. In practice, it makes sense to use up to tens of millions of items, while there is enough RAM.
All types of sources are supported. When updating, data (from a file or from a table) is read in its entirety.
cache
-------
-----
This is the least efficient method. It is appropriate if the dictionary doesn't fit in RAM. It is a cache of a fixed number of cells, where frequently-used data can be located. MySQL, ClickHouse, executable, and http sources are supported, but file sources are not.
When searching a dictionary, the cache is searched first. For each data block, all keys not found in the cache (or expired keys) are collected in a package, which is sent to the source with the query ``SELECT attrs... FROM db.table WHERE id IN (k1, k2, ...)``. The received data is then written to the cache.
range_hashed
--------
------------
The table stores data for date ranges for each key, making it possible to extract this data for a given key on a given date.
Example: in the table there are discounts for each advertiser in the form:
::
.. code-block:: text
advertiser id discount start date end date value
123 2015-01-01 2015-01-15 0.15
123 2015-01-16 2015-01-31 0.25
@ -251,7 +253,8 @@ ip_trie
The table stores IP prefixes for each key (IP address), which makes it possible to map IP addresses to metadata such as ASN or threat score.
Example: the table contains prefixes mapped to AS numbers and country codes:
::
.. code-block:: text
prefix asn cca2
202.79.32.0/20 17501 NP
2620:0:870::/48 3856 US
@ -299,17 +302,17 @@ No other type is supported. The function returns attribute for a prefix matching
The data is currently stored in a bitwise trie and has to fit in memory.
complex_key_hashed
----------------
------------------
The same as ``hashed``, but for complex keys.
complex_key_cache
----------
-----------------
The same as ``cache``, but for complex keys.
Notes
----------
-----
We recommend using the ``flat`` method when possible, or ``hashed``. The speed of the dictionaries is impeccable with this type of memory storage.
@ -335,7 +338,7 @@ To use external dictionaries, see the section "Functions for working with extern
Note that you can convert values for a small dictionary by specifying all the contents of the dictionary directly in a ``SELECT`` query (see the section "transform function"). This functionality is not related to external dictionaries.
Dictionaries with complex keys
----------------------------
------------------------------
You can use tuples consisting of fields of arbitrary types as keys. Configure your dictionary with ``complex_key_hashed`` or ``complex_key_cache`` layout in this case.

View File

@ -1,5 +1,5 @@
Dictionaries
=======
============
A dictionary is a mapping (key -> attributes) that can be used in a query as functions. You can think of this as a more convenient and efficient type of JOIN with dimension tables.

View File

@ -1,5 +1,5 @@
Internal dictionaries
------------------
---------------------
ClickHouse contains a built-in feature for working with a geobase.
@ -9,7 +9,7 @@ This allows you to:
* Check whether a region is part of another region.
* Get a chain of parent regions.
All the functions support "translocality," the ability to simultaneously use different perspectives on region ownership. For more information, see the section "Functions for working with Yandex.Metrica dictionaries".
All the functions support "translocality", the ability to simultaneously use different perspectives on region ownership. For more information, see the section "Functions for working with Yandex.Metrica dictionaries".
The internal dictionaries are disabled in the default package.
To enable them, uncomment the parameters ``path_to_regions_hierarchy_file`` and ``path_to_regions_names_files`` in the server config file.

View File

@ -1,5 +1,5 @@
External data for query processing
====================================
==================================
ClickHouse allows sending a server the data that is needed for processing a query, together with a SELECT query. This data is put in a temporary table (see the section "Temporary tables") and can be used in the query (for example, in IN operators).
@ -10,7 +10,8 @@ If you need to run more than one query with a large volume of external data, don
External data can be uploaded using the command-line client (in non-interactive mode), or using the HTTP interface.
In the command-line client, you can specify a parameters section in the format
::
.. code-block:: bash
--external --file=... [--name=...] [--format=...] [--types=...|--structure=...]
You may have multiple sections like this, for the number of tables being transmitted.
@ -30,7 +31,8 @@ One of the following parameters is required:
The files specified in ``file`` will be parsed by the format specified in ``format``, using the data types specified in ``types`` or ``structure``. The table will be uploaded to the server and accessible there as a temporary table with the name ``name``.
Examples:
::
.. code-block:: bash
echo -ne "1\n2\n3\n" | clickhouse-client --query="SELECT count() FROM test.visits WHERE TraficSourceID IN _data" --external --file=- --types=Int8
849897
cat /etc/passwd | sed 's/:/\t/g' | clickhouse-client --query="SELECT shell, count() AS c FROM passwd GROUP BY shell ORDER BY c DESC" --external --file=- --name=passwd --structure='login String, unused String, uid UInt16, gid UInt16, comment String, home String, shell String'
@ -43,7 +45,8 @@ Examples:
When using the HTTP interface, external data is passed in the multipart/form-data format. Each table is transmitted as a separate file. The table name is taken from the file name. The 'query_string' passes the parameters 'name_format', 'name_types', and 'name_structure', where name is the name of the table that these parameters correspond to. The meaning of the parameters is the same as when using the command-line client.
Example:
::
.. code-block:: bash
cat /etc/passwd | sed 's/:/\t/g' > passwd.tsv
curl -F 'passwd=@passwd.tsv;' 'http://localhost:8123/?query=SELECT+shell,+count()+AS+c+FROM+passwd+GROUP+BY+shell+ORDER+BY+c+DESC&passwd_structure=login+String,+unused+String,+uid+UInt16,+gid+UInt16,+comment+String,+home+String,+shell+String'

View File

@ -1,5 +1,5 @@
CSV
----
---
Comma separated values (`RFC <https://tools.ietf.org/html/rfc4180>`_).

View File

@ -1,5 +1,5 @@
JSON
-----
----
Outputs data in JSON format. Besides data tables, it also outputs column names and types, along with some additional information - the total number of output rows, and the number of rows that could have been output if there weren't a LIMIT. Example:

View File

@ -4,7 +4,9 @@ JSONCompact
Differs from ``JSON`` only in that data rows are output in arrays, not in objects.
Example:
::
.. code-block:: text
{
"meta":
[

View File

@ -3,7 +3,9 @@ JSONEachRow
If put in a SELECT query, displays data in newline-delimited JSON (JSON objects separated by the \\n character) format.
If put in an INSERT query, expects this kind of data as input.
::
.. code-block:: text
{"SearchPhrase":"","count()":"8267016"}
{"SearchPhrase":"bathroom interior","count()":"2166"}
{"SearchPhrase":"yandex","count()":"1655"}

View File

@ -4,7 +4,9 @@ PrettyNoEscapes
Differs from Pretty in that ANSI-escape sequences aren't used. This is necessary for displaying this format in a browser, as well as for using the 'watch' command-line utility.
Example:
::
.. code-block:: text
watch -n1 "clickhouse-client --query='SELECT * FROM system.events FORMAT PrettyCompactNoEscapes'"
You can use the HTTP interface for displaying in the browser.

View File

@ -17,7 +17,9 @@ During a parsing operation, incorrect dates and dates with times can be parsed w
As an exception, parsing DateTime is also supported in Unix timestamp format, if it consists of exactly 10 decimal digits. The result is not time zone-dependent. The formats ``YYYY-MM-DD hh:mm:ss`` and ``NNNNNNNNNN`` are differentiated automatically.
Strings are parsed and formatted with backslash-escaped special characters. The following escape sequences are used while formatting: ``\b``, ``\f``, ``\r``, ``\n``, ``\t``, ``\0``, ``\'``, and ``\\``. For parsing, the sequences ``\a``, ``\v``, and ``\xHH`` (hex escape sequences) are also supported, along with any sequence of the form ``\c`` where ``c`` is any character (such sequences are converted to ``c``). This means that parsing supports formats where a line break can be written either as ``\n`` or as a backslash followed by a line break. For example, the string 'Hello world' with a line break between the words instead of a space can be retrieved in any of the following variations:
::
.. code-block:: text
Hello\nworld
Hello\
@ -35,10 +37,12 @@ The TabSeparated format is convenient for processing data using custom programs
The TabSeparated format supports outputting total values (when using WITH TOTALS) and extreme values (when 'extremes' is set to 1). In these cases, the total values and extremes are output after the main data. The main result, total values, and extremes are separated from each other by an empty line. Example:
``SELECT EventDate, count() AS c FROM test.hits GROUP BY EventDate WITH TOTALS ORDER BY EventDate FORMAT TabSeparated``
.. code-block:: sql
SELECT EventDate, count() AS c FROM test.hits GROUP BY EventDate WITH TOTALS ORDER BY EventDate FORMAT TabSeparated
.. code-block:: text
2014-03-17 1406958
2014-03-18 1383658
2014-03-19 1405797

View File

@ -1,8 +1,9 @@
TSKV
-----
----
Similar to TabSeparated, but outputs values in name=value format. Names are escaped the same way as in TabSeparated, and the ``=`` symbol is also escaped.
::
.. code-block:: text
SearchPhrase= count()=8267016
SearchPhrase=bathroom interior count()=2166
SearchPhrase=yandex count()=1655

View File

@ -5,5 +5,5 @@ Prints every row in parentheses. Rows are separated by commas. There is no comma
The minimum set of symbols that you must escape in the Values format is the single quote and the backslash.
This is the format that is used in ``INSERT INTO t VALUES`` ...
This is the format that is used in ``INSERT INTO t VALUES ...``
But you can also use it for query result.

View File

@ -1,5 +1,5 @@
XML
----
---
XML format is supported only for displaying data, not for INSERTS. Example:

View File

@ -1,13 +1,15 @@
Arithmetic functions
======================
====================
For all arithmetic functions, the result type is calculated as the smallest number type that the result fits in, if there is such a type. The minimum is taken simultaneously based on the number of bits, whether it is signed, and whether it is floating-point. If there are not enough bits, the highest-bit type is taken.
Example
Example:
.. code-block:: sql
:) SELECT toTypeName(0), toTypeName(0 + 0), toTypeName(0 + 0 + 0), toTypeName(0 + 0 + 0 + 0)
SELECT toTypeName(0), toTypeName(0 + 0), toTypeName(0 + 0 + 0), toTypeName(0 + 0 + 0 + 0)
.. code-block:: text
┌─toTypeName(0)─┬─toTypeName(plus(0, 0))─┬─toTypeName(plus(plus(0, 0), 0))─┬─toTypeName(plus(plus(plus(0, 0), 0), 0))─┐
│ UInt8 │ UInt16 │ UInt32 │ UInt64 │
@ -34,7 +36,7 @@ multiply(a, b), a * b operator
Calculates the product of the numbers.
divide(a, b), a / b operator
-----------------------------
----------------------------
Calculates the quotient of the numbers. The result type is always a floating-point type.
It is not integer division. For integer division, use the 'intDiv' function.
When dividing by zero you get 'inf', '-inf', or 'nan'.

View File

@ -91,6 +91,8 @@ This function is normally used together with ARRAY JOIN. It allows counting some
WHERE CounterID = 160656
LIMIT 10
.. code-block:: text
┌─Reaches─┬──Hits─┐
│ 95606 │ 31406 │
└─────────┴───────┘
@ -105,6 +107,8 @@ In this example, Reaches is the number of conversions (the strings received afte
FROM test.hits
WHERE (CounterID = 160656) AND notEmpty(GoalsReached)
.. code-block:: text
┌─Reaches─┬──Hits─┐
│ 95606 │ 31406 │
└─────────┴───────┘
@ -133,6 +137,8 @@ This function is useful when using ARRAY JOIN and aggregation of array elements.
ORDER BY Reaches DESC
LIMIT 10
.. code-block:: text
┌──GoalID─┬─Reaches─┬─Visits─┐
│ 53225 │ 3214 │ 1097 │
│ 2825062 │ 3188 │ 1097 │
@ -156,6 +162,8 @@ The arrayEnumerateUniq function can take multiple arrays of the same size as arg
SELECT arrayEnumerateUniq([1, 1, 1, 2, 2, 2], [1, 1, 2, 1, 1, 2]) AS res
.. code-block:: text
┌─res───────────┐
│ [1,2,1,1,2,1] │
└───────────────┘

View File

@ -1,5 +1,5 @@
arrayJoin function
---------------
------------------
This is a very unusual function.
Normal functions don't change a set of rows, but just change the values in each row (map). Aggregate functions compress a set of rows (fold or reduce).
@ -23,6 +23,7 @@ Example:
'Hello',
src
.. code-block:: text
┌─dst─┬─\'Hello\'─┬─src─────┐
│ 1 │ Hello │ [1,2,3] │
│ 2 │ Hello │ [1,2,3] │

View File

@ -1,5 +1,5 @@
Bit functions
---------------
-------------
Bit functions work for any pair of types from UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Float32, or Float64.
@ -21,4 +21,4 @@ bitShiftLeft(a, b)
~~~~~~~~~~~~~~~~~~
bitShiftRight(a, b)
~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~

View File

@ -1,5 +1,5 @@
Comparison functions
------------------
--------------------
Comparison functions always return 0 or 1 (UInt8).
@ -18,19 +18,19 @@ Strings are compared by bytes. A shorter string is smaller than all strings that
Note: before version 1.1.54134, signed and unsigned numbers were compared the same way as in C++. That is, you could get an incorrect result in cases like SELECT 9223372036854775807 > -1. Since version 1.1.54134, the behavior has changed and numbers are compared in the mathematically correct way.
equals, a = b and a == b operator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
notEquals, a != b and a <> b operator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
less, < operator
~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~
greater, > operator
~~~~~~~~~~~~~~~~~~~
lessOrEquals, <= operator
~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~
greaterOrEquals, >= operator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View File

@ -1,7 +1,7 @@
Conditional functions
-------------
---------------------
if(cond, then, else), оператор cond ? then : else
~~~~~~~~~~~~~~~~~
if(cond, then, else), ternary operator cond ? then : else
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Returns 'then' if 'cond != 0', or 'else' if 'cond = 0'.
'cond' must be UInt8, and 'then' and 'else' must be types that have a smallest common type.
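For illustration:

.. code-block:: sql

    SELECT if(1 > 2, 'then', 'else')  -- returns 'else'

The operator form ``1 > 2 ? 'then' : 'else'`` is equivalent.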

View File

@ -1,9 +1,10 @@
Functions for working with dates and times
--------------------------------------
------------------------------------------
Time Zone Support
All functions for working with dates and times for which it makes sense can take a second, optional argument: the time zone name, for example 'Asia/Yekaterinburg'. In this case, they use the specified time zone instead of the local (default) one.
.. code-block:: sql
SELECT
toDateTime('2016-06-15 23:00:00') AS time,
@ -11,13 +12,16 @@ All functions for working with the date and time for which this makes sense, can
toDate(time, 'Asia/Yekaterinburg') AS date_yekat,
toString(time, 'US/Samoa') AS time_samoa
.. code-block:: text
┌────────────────time─┬─date_local─┬─date_yekat─┬─time_samoa──────────┐
│ 2016-06-15 23:00:00 │ 2016-06-15 │ 2016-06-16 │ 2016-06-15 09:00:00 │
└─────────────────────┴────────────┴────────────┴─────────────────────┘
Only time zones that differ from UTC by a whole number of hours are supported.
toYear
~~~~~~~
~~~~~~
Converts a date or date with time to a UInt16 number containing the year number (AD).
toMonth
@ -25,117 +29,117 @@ toMonth
Converts a date or date with time to a UInt8 number containing the month number (1-12).
toDayOfMonth
~~~~~~~
~~~~~~~~~~~~
Converts a date or date with time to a UInt8 number containing the number of the day of the month (1-31).
toDayOfWeek
~~~~~~~
~~~~~~~~~~~
Converts a date or date with time to a UInt8 number containing the number of the day of the week (Monday is 1, and Sunday is 7).
toHour
~~~~~~~
~~~~~~
Converts a date with time to a UInt8 number containing the number of the hour in 24-hour time (0-23).
This function assumes that if clocks are moved ahead, it is by one hour and occurs at 2 a.m., and if clocks are moved back, it is by one hour and occurs at 3 a.m. (which is not always true - even in Moscow the clocks were once changed at a different time).
toMinute
~~~~~~~
~~~~~~~~
Converts a date with time to a UInt8 number containing the number of the minute of the hour (0-59).
toSecond
~~~~~~~
~~~~~~~~
Converts a date with time to a UInt8 number containing the number of the second in the minute (0-59).
Leap seconds are not accounted for.
toStartOfDay
~~~~~~~
~~~~~~~~~~~~
Rounds down a date with time to the start of the day.
toMonday
~~~~~~~
~~~~~~~~
Rounds down a date or date with time to the nearest Monday.
Returns the date.
toStartOfMonth
~~~~~~~
~~~~~~~~~~~~~~
Rounds down a date or date with time to the first day of the month.
Returns the date.
toStartOfQuarter
~~~~~~~
~~~~~~~~~~~~~~~~
Rounds down a date or date with time to the first day of the quarter.
The first day of the quarter is either 1 January, 1 April, 1 July, or 1 October. Returns the date.
toStartOfYear
~~~~~~~
~~~~~~~~~~~~~
Rounds down a date or date with time to the first day of the year.
Returns the date.
toStartOfMinute
~~~~~~~
~~~~~~~~~~~~~~~
Rounds down a date with time to the start of the minute.
toStartOfFiveMinute
~~~~~~~
~~~~~~~~~~~~~~~~~~~
Rounds down a date with time to the start of the five-minute interval (00:00, 00:05, 00:10...).
toStartOfHour
~~~~~~~
~~~~~~~~~~~~~
Rounds down a date with time to the start of the hour.
toTime
~~~~~~~
~~~~~~
Converts a date with time to some fixed date, while preserving the time.
toRelativeYearNum
~~~~~~~
~~~~~~~~~~~~~~~~~
Converts a date with time or date to the number of the year, starting from a certain fixed point in the past.
toRelativeMonthNum
~~~~~~~
~~~~~~~~~~~~~~~~~~
Converts a date with time or date to the number of the month, starting from a certain fixed point in the past.
toRelativeWeekNum
~~~~~~~
~~~~~~~~~~~~~~~~~
Converts a date with time or date to the number of the week, starting from a certain fixed point in the past.
toRelativeDayNum
~~~~~~~
~~~~~~~~~~~~~~~~
Converts a date with time or date to the number of the day, starting from a certain fixed point in the past.
toRelativeHourNum
~~~~~~~
~~~~~~~~~~~~~~~~~
Converts a date with time or date to the number of the hour, starting from a certain fixed point in the past.
toRelativeMinuteNum
~~~~~~~
~~~~~~~~~~~~~~~~~~~
Converts a date with time or date to the number of the minute, starting from a certain fixed point in the past.
toRelativeSecondNum
~~~~~~~
~~~~~~~~~~~~~~~~~~~
Converts a date with time or date to the number of the second, starting from a certain fixed point in the past.
now
~~~~~~~
~~~
Accepts zero arguments and returns the current time at one of the moments of request execution.
This function returns a constant, even if the request took a long time to complete.
today
~~~~~~~
~~~~~
Accepts zero arguments and returns the current date at one of the moments of request execution.
The same as 'toDate(now())'.
yesterday
~~~~~~~
~~~~~~~~~
Accepts zero arguments and returns yesterday's date at one of the moments of request execution.
The same as 'today() - 1'.
timeSlot
~~~~~~~
~~~~~~~~
Rounds the time to the half hour.
This function is specific to Yandex.Metrica, since half an hour is the minimum amount of time for breaking a session into two sessions if a counter shows a single user's consecutive pageviews that differ in time by strictly more than this amount. This means that tuples (the counter number, user ID, and time slot) can be used to search for pageviews that are included in the corresponding session.
timeSlots(StartTime, Duration)
~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For a time interval starting at 'StartTime' and continuing for 'Duration' seconds, it returns an array of moments in time, consisting of points from this interval rounded down to the half hour.
For example, timeSlots(toDateTime('2012-01-01 12:20:00'), toUInt32(600)) = [toDateTime('2012-01-01 12:00:00'), toDateTime('2012-01-01 12:30:00')].
This is necessary for searching for pageviews in the corresponding session.
For example, ``timeSlots(toDateTime('2012-01-01 12:20:00'), toUInt32(600)) = [toDateTime('2012-01-01 12:00:00'), toDateTime('2012-01-01 12:30:00')]``.
This is necessary for searching for page views in the corresponding session.

View File

@ -1,8 +1,8 @@
Encoding functions
--------
------------------
hex
~~~~~
~~~
Accepts a string, number, date, or date with time. Returns a string containing the argument's hexadecimal representation. Uses uppercase letters A-F.
Doesn't use ``0x`` prefixes or ``h`` suffixes.
For strings, all bytes are simply encoded as two hexadecimal numbers. Numbers are converted to big endian ("human readable") format.
@ -10,22 +10,22 @@ For numbers, leading zeros are trimmed, but only in whole bytes.
For example, ``hex(1) = '01'``. Dates are encoded as the number of days since the beginning of the Unix Epoch. Dates with times are encoded as the number of seconds since the beginning of the Unix Epoch.
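For illustration:

.. code-block:: sql

    SELECT hex(1), hex('abc')  -- returns '01' and '616263'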
unhex(str)
~~~~~~~
~~~~~~~~~~
Accepts a string containing any number of hexadecimal digits, and returns a string containing the corresponding bytes. Supports both uppercase and lowercase letters A-F. The number of hexadecimal digits doesn't have to be even. If it is odd, the last digit is interpreted as the low-order half of the 00-0F byte. If the argument string contains anything other than hexadecimal digits, some implementation-defined result is returned (an exception isn't thrown).
If you want to convert the result to a number, you can use the 'reverse' and 'reinterpretAsType' functions.
UUIDStringToNum(str)
~~~~~~~
~~~~~~~~~~~~~~~~~~~~
Accepts a string containing the UUID in the text format (``123e4567-e89b-12d3-a456-426655440000``). Returns a binary representation of the UUID in ``FixedString(16)``.
UUIDNumToString(str)
~~~~~~~~
~~~~~~~~~~~~~~~~~~~~
Accepts a FixedString(16) value containing the UUID in the binary format. Returns a readable string containing the UUID in the text format.
bitmaskToList(num)
~~~~~~~
~~~~~~~~~~~~~~~~~~
Accepts an integer. Returns a string containing the list of powers of two that total the source number when summed. They are comma-separated without spaces in text format, in ascending order.
bitmaskToArray(num)
~~~~~~~~~
~~~~~~~~~~~~~~~~~~~
Accepts an integer. Returns an array of UInt64 numbers containing the list of powers of two that total the source number when summed. Numbers in the array are in ascending order.
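For illustration, both functions applied to the same number:

.. code-block:: sql

    SELECT bitmaskToList(50), bitmaskToArray(50)  -- '2,16,32' and [2,16,32]

Here 50 = 2 + 16 + 32, so those are the powers of two returned.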
@ -1,43 +1,43 @@
Functions for working with external dictionaries
-------
------------------------------------------------
For more information, see the section "External dictionaries".
dictGetUInt8, dictGetUInt16, dictGetUInt32, dictGetUInt64
~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
dictGetInt8, dictGetInt16, dictGetInt32, dictGetInt64
~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
dictGetFloat32, dictGetFloat64
~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
dictGetDate, dictGetDateTime
~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
dictGetString
~~~~~~
~~~~~~~~~~~~~
``dictGetT('dict_name', 'attr_name', id)``
- Gets the value of the 'attr_name' attribute from the 'dict_name' dictionary by the 'id' key.
Gets the value of the 'attr_name' attribute from the 'dict_name' dictionary by the 'id' key.
'dict_name' and 'attr_name' are constant strings.
'id' must be UInt64.
If the 'id' key is not in the dictionary, it returns the default value set in the dictionary definition.
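As a purely illustrative sketch: assuming a dictionary named 'regions' with a String attribute 'name' has been configured (both names are hypothetical, not from the original reference), a lookup could look like this:

.. code-block:: sql

    -- 'regions' and 'name' are hypothetical; substitute a configured dictionary
    SELECT
        dictGetString('regions', 'name', toUInt64(213)) AS region_name,
        dictGetStringOrDefault('regions', 'name', toUInt64(0), 'unknown') AS with_default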
dictGetTOrDefault
~~~~~~~~
~~~~~~~~~~~~~~~~~
``dictGetT('dict_name', 'attr_name', id, default)``
Similar to the functions dictGetT, but the default value is taken from the last argument of the function.
dictIsIn
~~~~~~
~~~~~~~~
``dictIsIn('dict_name', child_id, ancestor_id)``
- For the 'dict_name' hierarchical dictionary, finds out whether the 'child_id' key is located inside 'ancestor_id' (or matches 'ancestor_id'). Returns UInt8.
For the 'dict_name' hierarchical dictionary, finds out whether the 'child_id' key is located inside 'ancestor_id' (or matches 'ancestor_id'). Returns UInt8.
dictGetHierarchy
~~~~~~~~
~~~~~~~~~~~~~~~~
``dictGetHierarchy('dict_name', id)``
- For the 'dict_name' hierarchical dictionary, returns an array of dictionary keys starting from 'id' and continuing along the chain of parent elements. Returns Array(UInt64).
For the 'dict_name' hierarchical dictionary, returns an array of dictionary keys starting from 'id' and continuing along the chain of parent elements. Returns Array(UInt64).
dictHas
~~~~~~
~~~~~~~
``dictHas('dict_name', id)``
- check the presence of a key in the dictionary. Returns a value of type UInt8, equal to 0, if there is no key and 1 if there is a key.
check the presence of a key in the dictionary. Returns a value of type UInt8, equal to 0, if there is no key and 1 if there is a key.
@ -1,10 +1,10 @@
Hash functions
-------------
--------------
Hash functions can be used for deterministic pseudo-random shuffling of elements.
halfMD5
~~~~~~
~~~~~~~
Calculates the MD5 from a string. Then it takes the first 8 bytes of the hash and interprets them as UInt64 in big endian.
Accepts a String-type argument. Returns UInt64.
This function works fairly slowly (5 million short strings per second per processor core).
@ -17,19 +17,19 @@ If you don't need MD5 in particular, but you need a decent cryptographic 128-bit
If you need the same result as gives 'md5sum' utility, write ``lower(hex(MD5(s)))``.
sipHash64
~~~~~~~
~~~~~~~~~
Calculates SipHash from a string.
Accepts a String-type argument. Returns UInt64.
SipHash is a cryptographic hash function. It works at least three times faster than MD5. For more information, see https://131002.net/siphash/
sipHash128
~~~~~
~~~~~~~~~~
Calculates SipHash from a string.
Accepts a String-type argument. Returns FixedString(16).
Differs from sipHash64 in that the final xor-folding state is only done up to 128 bits.
cityHash64
~~~~~
~~~~~~~~~~
Calculates CityHash64 from a string or a similar hash function for any number of any type of arguments.
For String-type arguments, CityHash is used. This is a fast non-cryptographic hash function for strings with decent quality.
For other types of arguments, a decent implementation-specific fast non-cryptographic hash function is used.
@ -37,12 +37,12 @@ If multiple arguments are passed, the function is calculated using the same rule
For example, you can compute the checksum of an entire table with accuracy up to the row order: ``SELECT sum(cityHash64(*)) FROM table``.
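For illustration, a sketch of calling several of these functions (the hash values themselves are implementation-defined, so none are shown):

.. code-block:: sql

    SELECT
        halfMD5('test') AS h1,               -- UInt64
        sipHash64('test') AS h2,             -- UInt64
        cityHash64('test', 42, now()) AS h3  -- UInt64, mixed argument types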
intHash32
~~~~~
~~~~~~~~~
Calculates a 32-bit hash code from any type of integer.
This is a relatively fast non-cryptographic hash function of average quality for numbers.
intHash64
~~~~~
~~~~~~~~~
Calculates a 64-bit hash code from any type of integer.
It works faster than intHash32. Average quality.
@ -50,17 +50,17 @@ SHA1
~~~~
SHA224
~~~~~
~~~~~~
SHA256
~~~~~
~~~~~~
Calculates SHA-1, SHA-224, or SHA-256 from a string and returns the resulting set of bytes as FixedString(20), FixedString(28), or FixedString(32).
The function works fairly slowly (SHA-1 processes about 5 million short strings per second per processor core, while SHA-224 and SHA-256 process about 2.2 million).
We recommend using this function only in cases when you need a specific hash function and you can't select it.
Even in these cases, we recommend applying the function offline and pre-calculating values when inserting them into the table, instead of applying it in SELECTS.
URLHash(url[, N])
~~~~~~~~
~~~~~~~~~~~~~~~~~
A fast, decent-quality non-cryptographic hash function for a string obtained from a URL using some type of normalization.
``URLHash(s)`` - Calculates a hash from a string without one of the trailing symbols ``/``,``?`` or ``#`` at the end, if present.
@ -1,8 +1,8 @@
Higher-order functions
-----------------------
----------------------
-> operator, lambda(params, expr) function
~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Allows describing a lambda function for passing to a higher-order function. The left side of the arrow has a formal parameter - any ID, or multiple formal parameters - any IDs in a tuple. The right side of the arrow has an expression that can use these formal parameters, as well as any table columns.
Examples: ``x -> 2 * x, str -> str != Referer.``
@ -14,11 +14,11 @@ A lambda function that accepts multiple arguments can be passed to a higher-orde
For all functions other than 'arrayMap' and 'arrayFilter', the first argument (the lambda function) can be omitted. In this case, identical mapping is assumed.
arrayMap(func, arr1, ...)
~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~
Returns an array obtained from the original application of the 'func' function to each element in the 'arr' array.
arrayFilter(func, arr1, ...)
~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Returns an array containing only the elements in 'arr1' for which 'func' returns something other than 0.
Examples:
@ -27,10 +27,14 @@ Examples:
SELECT arrayFilter(x -> x LIKE '%World%', ['Hello', 'abc World']) AS res
.. code-block:: text
┌─res───────────┐
│ ['abc World'] │
└───────────────┘
.. code-block:: sql
SELECT
arrayFilter(
(i, x) -> x LIKE '%World%',
@ -38,30 +42,32 @@ Examples:
['Hello', 'abc World'] AS arr)
AS res
.. code-block:: text
┌─res─┐
│ [2] │
└─────┘
arrayCount([func,] arr1, ...)
~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Returns the number of elements in 'arr' for which 'func' returns something other than 0. If 'func' is not specified, it returns the number of non-zero items in the array.
arrayExists([func,] arr1, ...)
~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Returns 1 if there is at least one element in 'arr' for which 'func' returns something other than 0. Otherwise, it returns 0.
arrayAll([func,] arr1, ...)
~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Returns 1 if 'func' returns something other than 0 for all the elements in 'arr'. Otherwise, it returns 0.
arraySum([func,] arr1, ...)
~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Returns the sum of the 'func' values. If the function is omitted, it just returns the sum of the array elements.
arrayFirst(func, arr1, ...)
~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Returns the first element in the 'arr1' array for which 'func' returns something other than 0.
arrayFirstIndex(func, arr1, ...)
~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Returns the index of the first element in the 'arr1' array for which 'func' returns something other than 0.
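A combined sketch of these functions (the commented results follow the definitions above):

.. code-block:: sql

    SELECT
        arrayMap(x -> x * 2, [1, 2, 3]) AS doubled,          -- [2, 4, 6]
        arrayCount(x -> x % 2 = 1, [1, 2, 3, 4, 5]) AS odds, -- 3
        arraySum(x -> x * x, [1, 2, 3]) AS sum_squares,      -- 14
        arrayFirst(x -> x > 1, [1, 2, 3]) AS first_gt_one,   -- 2
        arrayFirstIndex(x -> x > 1, [1, 2, 3]) AS its_index  -- 2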
@ -1,18 +1,18 @@
Functions for implementing the IN operator
---------------
------------------------------------------
in, notIn, globalIn, globalNotIn
~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
See the section "IN operators".
tuple(x, y, ...), оператор (x, y, ...)
~~~~~~~~~~~~~
tuple(x, y, ...), operator (x, y, ...)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A function that allows grouping multiple columns.
For columns with the types T1, T2, ..., it returns a Tuple(T1, T2, ...) type tuple containing these columns. There is no cost to execute the function.
Tuples are normally used as intermediate values for an argument of IN operators, or for creating a list of formal parameters of lambda functions. Tuples can't be written to a table.
tupleElement(tuple, n), оператор x.N
~~~~~~~~~~~
tupleElement(tuple, n), operator x.N
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A function that allows getting columns from a tuple.
'N' is the column index, starting from 1. 'N' must be a constant. 'N' must be a strictly positive integer no greater than the size of the tuple.
There is no cost to execute the function.
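A minimal sketch (not from the original reference):

.. code-block:: sql

    SELECT
        tuple(1, 'hello') AS t,        -- Tuple(UInt8, String)
        tupleElement(t, 2) AS second,  -- 'hello'
        t.1 AS first                   -- 1, via the x.N operator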
@ -1,5 +1,5 @@
Functions
=======
=========
There are at least* two types of functions - regular functions (they are just called "functions") and aggregate functions. These are completely different concepts. Regular functions work as if they are applied to each row separately (for each row, the result of the function doesn't depend on the other rows). Aggregate functions accumulate a set of values from various rows (i.e. they depend on the entire set of rows).
@ -7,7 +7,6 @@ In this section we discuss regular functions. For aggregate functions, see the s
* - There is a third type of function that the 'arrayJoin' function belongs to; table functions can also be mentioned separately.
.. toctree::
:glob:
@ -16,15 +15,15 @@ In this section we discuss regular functions. For aggregate functions, see the s
Strong typing
~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~
In contrast to standard SQL, ClickHouse has strong typing. In other words, it doesn't make implicit conversions between types. Each function works for a specific set of types. This means that sometimes you need to use type conversion functions.
Common subexpression elimination
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All expressions in a query that have the same AST (the same record or same result of syntactic parsing) are considered to have identical values. Such expressions are concatenated and executed once. Identical subqueries are also eliminated this way.
Types of results
~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~
All functions return a single value as the result (not several values, and not zero values). The type of the result is usually defined only by the types of the arguments, not by the values. Exceptions are the tupleElement function (the a.N operator) and the toFixedString function.
Constants
@ -37,25 +36,25 @@ A constant expression is also considered a constant (for example, the right half
Functions can be implemented in different ways for constant and non-constant arguments (different code is executed). But the results for a constant and for a true column containing only the same value should match each other.
Immutability
~~~~~~~~~~~~~~
~~~~~~~~~~~~
Functions can't change the values of their arguments - any changes are returned as the result. Thus, the result of calculating separate functions does not depend on the order in which the functions are written in the query.
Error handling
~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~
Some functions might throw an exception if the data is invalid. In this case, the query is canceled and an error text is returned to the client. For distributed processing, when an exception occurs on one of the servers, the other servers also attempt to abort the query.
Evaluation of argument expressions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In almost all programming languages, one of the arguments might not be evaluated for certain operators. This is usually for the operators ``&&``, ``||``, ``?:``.
But in ClickHouse, arguments of functions (operators) are always evaluated. This is because entire parts of columns are evaluated at once, instead of calculating each row separately.
Performing functions for distributed query processing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For distributed query processing, as many stages of query processing as possible are performed on remote servers, and the rest of the stages (merging intermediate results and everything after that) are performed on the requestor server.
@ -1,16 +1,16 @@
Functions for working with IP addresses
-------------------------
---------------------------------------
IPv4NumToString(num)
~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~
Takes a UInt32 number. Interprets it as an IPv4 address in big endian. Returns a string containing the corresponding IPv4 address in the format A.B.C.d (dot-separated numbers in decimal form).
IPv4StringToNum(s)
~~~~~~~~
~~~~~~~~~~~~~~~~~~
The reverse function of IPv4NumToString. If the IPv4 address has an invalid format, it returns 0.
IPv4NumToStringClassC(num)
~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~
Similar to IPv4NumToString, but using ``xxx`` instead of the last octet.
Example:
@ -41,7 +41,7 @@ Example:
Since using ``'xxx'`` is highly unusual, this may be changed in the future. We recommend that you don't rely on the exact format of this fragment.
IPv6NumToString(x)
~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~
Accepts a FixedString(16) value containing the IPv6 address in binary format. Returns a string containing this address in text format.
IPv6-mapped IPv4 addresses are output in the format ``::ffff:111.222.33.44``. Examples:
@ -96,6 +96,6 @@ IPv6-mapped IPv4 addresses are output in the format ``::ffff:111.222.33.44``. Ex
└────────────────────────────┴────────┘
IPv6StringToNum(s)
~~~~~~~~
~~~~~~~~~~~~~~~~~~
The reverse function of IPv6NumToString. If the IPv6 address has an invalid format, it returns a string of null bytes.
HEX can be uppercase or lowercase.
@ -1,5 +1,5 @@
Functions for working with JSON.
-------------------
--------------------------------
In Yandex.Metrica, JSON is passed by users as session parameters. There are several functions for working with this JSON. (Although in most of the cases, the JSONs are additionally pre-processed, and the resulting values are put in separate columns in their processed format.) All these functions are based on strong assumptions about what the JSON can be, but they try to do as little as possible to get the job done.
The following assumptions are made:
@ -9,40 +9,42 @@ The following assumptions are made:
#. JSON doesn't have space characters outside of string literals.
visitParamHas(params, name)
~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Checks whether there is a field with the 'name' name.
visitParamExtractUInt(params, name)
~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Parses UInt64 from the value of the field named 'name'. If this is a string field, it tries to parse a number from the beginning of the string. If the field doesn't exist, or it exists but doesn't contain a number, it returns 0.
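An illustrative sketch (the commented results follow the rules above):

.. code-block:: sql

    SELECT
        visitParamHas('{"abc":123}', 'abc') AS has_field,         -- 1
        visitParamExtractUInt('{"abc":123}', 'abc') AS uint_val,  -- 123
        visitParamExtractUInt('{"abc":"xyz"}', 'abc') AS no_num   -- 0, value is not a number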
visitParamExtractInt(params, name)
~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The same as for Int64.
visitParamExtractFloat(params, name)
~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The same as for Float64.
visitParamExtractBool(params, name)
~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Parses a true/false value. The result is UInt8.
visitParamExtractRaw(params, name)
~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Returns the value of a field, including separators.
Examples:
::
.. code-block:: text
visitParamExtractRaw('{"abc":"\\n\\u0000"}', 'abc') = '"\\n\\u0000"'
visitParamExtractRaw('{"abc":{"def":[1,2,3]}}', 'abc') = '{"def":[1,2,3]}'
visitParamExtractString(params, name)
~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Parses the string in double quotes. The value is unescaped. If unescaping failed, it returns an empty string.
Examples:
::
.. code-block:: text
visitParamExtractString('{"abc":"\\n\\u0000"}', 'abc') = '\n\0'
visitParamExtractString('{"abc":"\\u263a"}', 'abc') = '☺'
visitParamExtractString('{"abc":"\\u263"}', 'abc') = ''
@ -1,5 +1,5 @@
Logical functions
------------------
-----------------
Logical functions accept any numeric types, but return a UInt8 number equal to 0 or 1.
@ -12,8 +12,8 @@ or, OR operator
~~~~~~~~~~~~~~~
not, NOT operator
~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~
xor
~~~~~~~~~~~~~~~
~~~
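For illustration, a sketch of the functional forms (the commented results follow ordinary boolean semantics):

.. code-block:: sql

    SELECT
        and(1, 0) AS a,  -- 0
        or(1, 0) AS o,   -- 1
        not(1) AS n,     -- 0
        xor(1, 0) AS x   -- 1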
@ -1,9 +1,9 @@
Mathematical functions
---------------
----------------------
All the functions return a Float64 number. The accuracy of the result is close to the maximum precision possible, but the result might not coincide with the machine representable number nearest to the corresponding real number.
e()
~~~~
~~~
Accepts zero arguments and returns a Float64 number close to the e number.
pi()
@ -11,7 +11,7 @@ pi()
Accepts zero arguments and returns a Float64 number close to π.
exp(x)
~~~~~
~~~~~~
Accepts a numeric argument and returns a Float64 number close to the exponent of the argument.
log(x)
@ -23,19 +23,19 @@ exp2(x)
Accepts a numeric argument and returns a Float64 number close to 2^x.
log2(x)
~~~~~
~~~~~~~
Accepts a numeric argument and returns a Float64 number close to the binary logarithm of the argument.
exp10(x)
~~~~~~~
~~~~~~~~
Accepts a numeric argument and returns a Float64 number close to 10^x.
log10(x)
~~~~~~~
~~~~~~~~
Accepts a numeric argument and returns a Float64 number close to the decimal logarithm of the argument.
sqrt(x)
~~~~~~~~
~~~~~~~
Accepts a numeric argument and returns a Float64 number close to the square root of the argument.
cbrt(x)
@ -43,9 +43,9 @@ cbrt(x)
Accepts a numeric argument and returns a Float64 number close to the cubic root of the argument.
erf(x)
~~~~~~~
~~~~~~
If 'x' is non-negative, then erf(x / σ√2) - is the probability that a random variable having a normal distribution with standard deviation 'σ' takes the value that is separated from the expected value by more than 'x'.
If 'x' is non-negative, then ``erf(x / σ√2)`` is the probability that a random variable having a normal distribution with standard deviation 'σ' takes the value that is separated from the expected value by more than 'x'.
Example (three sigma rule):
@ -53,28 +53,30 @@ Example (three sigma rule):
SELECT erf(3 / sqrt(2))
.. code-block:: text
┌─erf(divide(3, sqrt(2)))─┐
│ 0.9973002039367398 │
└─────────────────────────┘
erfc(x)
~~~~~~
~~~~~~~
Accepts a numeric argument and returns a Float64 number close to 1 - erf(x), but without loss of precision for large 'x' values.
lgamma(x)
~~~~~~~
~~~~~~~~~
The logarithm of the gamma function.
tgamma(x)
~~~~~~
~~~~~~~~~
Gamma function.
sin(x)
~~~~~
~~~~~~
The sine.
cos(x)
~~~~~
~~~~~~
The cosine.
tan(x)
@ -82,17 +84,17 @@ tan(x)
The tangent.
asin(x)
~~~~~~
~~~~~~~
The arc sine.
acos(x)
~~~~~~
~~~~~~~
The arc cosine.
atan(x)
~~~~~
~~~~~~~
The arc tangent.
pow(x, y)
~~~~~~~
xy.
~~~~~~~~~
x to the power y.
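A quick sketch (all results are Float64 values close to the exact ones):

.. code-block:: sql

    SELECT
        pow(2, 10) AS p,  -- close to 1024
        exp2(10) AS e,    -- close to 1024
        sqrt(2) AS s      -- close to 1.4142...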
@ -1,64 +1,64 @@
Other functions
-------------
---------------
hostName()
~~~~~~~
~~~~~~~~~~
Returns a string with the name of the host that this function was performed on. For distributed processing, this is the name of the remote server host, if the function is performed on a remote server.
visibleWidth(x)
~~~~~~~~~
~~~~~~~~~~~~~~~
Calculates the approximate width when outputting values to the console in text format (tab-separated). This function is used by the system for implementing Pretty formats.
toTypeName(x)
~~~~~~~~
~~~~~~~~~~~~~
Gets the type name. Returns a string containing the type name of the passed argument.
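A minimal sketch (the commented types assume ClickHouse's usual typing of literals):

.. code-block:: sql

    SELECT
        toTypeName(0) AS t1,       -- UInt8
        toTypeName(-1) AS t2,      -- Int8
        toTypeName(0.5) AS t3,     -- Float64
        toTypeName('hello') AS t4  -- String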
blockSize()
~~~~~~~~
~~~~~~~~~~~
Gets the size of the block.
In ClickHouse, queries are always run on blocks (sets of column parts). This function allows getting the size of the block that you called it for.
materialize(x)
~~~~~~~~
~~~~~~~~~~~~~~
Turns a constant into a full column containing just one value.
In ClickHouse, full columns and constants are represented differently in memory. Functions work differently for constant arguments and normal arguments (different code is executed), although the result is almost always the same. This function is for debugging this behavior.
ignore(...)
~~~~~~~
~~~~~~~~~~~
A function that accepts any arguments and always returns 0.
However, the argument is still calculated. This can be used for benchmarks.
sleep(seconds)
~~~~~~~~~
~~~~~~~~~~~~~~
Sleeps 'seconds' seconds on each data block. You can specify an integer or a floating-point number.
currentDatabase()
~~~~~~~~~~
~~~~~~~~~~~~~~~~~
Returns the name of the current database.
You can use this function in table engine parameters in a CREATE TABLE query where you need to specify the database.
isFinite(x)
~~~~~~~
~~~~~~~~~~~
Accepts Float32 and Float64 and returns UInt8 equal to 1 if the argument is not infinite and not a NaN, otherwise 0.
isInfinite(x)
~~~~~~~
~~~~~~~~~~~~~
Accepts Float32 and Float64 and returns UInt8 equal to 1 if the argument is infinite, otherwise 0.
Note that 0 is returned for a NaN.
isNaN(x)
~~~~~
~~~~~~~~
Accepts Float32 and Float64 and returns UInt8 equal to 1 if the argument is a NaN, otherwise 0.
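An illustrative sketch; it assumes that ``/`` is floating-point division, so division by zero yields infinity or NaN rather than an error:

.. code-block:: sql

    SELECT
        isFinite(1 / 0) AS fin,    -- 0 (1 / 0 is +inf)
        isInfinite(1 / 0) AS inf,  -- 1
        isNaN(0 / 0) AS nan        -- 1 (0 / 0 is nan)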
hasColumnInTable('database', 'table', 'column')
~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Accepts constant String columns: database name, table name, and column name. Returns a constant UInt8 value equal to 1 if the column exists,
otherwise 0.
If the table doesn't exist, an exception is thrown.
For elements of a nested data structure, the function checks the existence of the column. For the nested data structure itself, it returns 0.
bar
~~~~~
~~~
Allows building a unicode-art diagram.
``bar(x, min, max, width)`` - Draws a band with a width proportional to (x - min) and equal to 'width' characters when x == max.
@ -77,6 +77,8 @@ The band is drawn with accuracy to one eighth of a symbol. Example:
GROUP BY h
ORDER BY h ASC
.. code-block:: text
┌──h─┬──────c─┬─bar────────────────┐
│ 0 │ 292907 │ █████████▋ │
│ 1 │ 180563 │ ██████ │
@ -105,7 +107,7 @@ The band is drawn with accuracy to one eighth of a symbol. Example:
└────┴────────┴────────────────────┘
transform
~~~~~~~
~~~~~~~~~
Transforms a value according to the explicitly defined mapping of some elements to other ones.
There are two variations of this function:
@ -143,6 +145,8 @@ Example:
GROUP BY title
ORDER BY c DESC
.. code-block:: text
┌─title─────┬──────c─┐
│ Яндекс │ 498635 │
│ Google │ 229872 │
@ -171,6 +175,8 @@ Example:
ORDER BY count() DESC
LIMIT 10
.. code-block:: text
┌─s──────────────┬───────c─┐
│ │ 2906259 │
│ www.yandex │ 867767 │
@ -185,7 +191,7 @@ Example:
└────────────────┴─────────┘
formatReadableSize(x)
~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~
Gets a size (number of bytes). Returns a string that contains a rounded size with a suffix (KiB, MiB, etc.).
Example:
@ -196,6 +202,8 @@ Example:
arrayJoin([1, 1024, 1024*1024, 192851925]) AS filesize_bytes,
formatReadableSize(filesize_bytes) AS filesize
.. code-block:: text
┌─filesize_bytes─┬─filesize───┐
│ 1 │ 1.00 B │
│ 1024 │ 1.00 KiB │
@ -204,32 +212,32 @@ Example:
└────────────────┴────────────┘
least(a, b)
~~~~~~
~~~~~~~~~~~
Returns the least element of a and b.
greatest(a, b)
~~~~~~~~
~~~~~~~~~~~~~~
Returns the greatest element of a and b.
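A trivial sketch:

.. code-block:: sql

    SELECT least(3, 7) AS lo, greatest(3, 7) AS hi  -- 3, 7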
uptime()
~~~~~~
~~~~~~~~
Returns server's uptime in seconds.
version()
~~~~~~~
~~~~~~~~~
Returns server's version as a string.
rowNumberInAllBlocks()
~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~
Returns an incremental row number within all blocks that were processed by this function.
runningDifference(x)
~~~~~~~~
~~~~~~~~~~~~~~~~~~~~
Calculates the difference between consecutive values in the data block.
Result of the function depends on the order of the data in the blocks.
It works only inside each processed block of data. Data splitting into blocks is not explicitly controlled by the user.
If you specify ORDER BY in subquery and call runningDifference outside of it, you could get an expected result.
If you specify ``ORDER BY`` in a subquery and call runningDifference outside of it, you can get the expected result.
Example:
@ -250,6 +258,8 @@ Example:
LIMIT 5
)
.. code-block:: text
┌─EventID─┬───────────EventTime─┬─delta─┐
│ 1106 │ 2016-11-24 00:00:04 │ 0 │
│ 1107 │ 2016-11-24 00:00:05 │ 1 │
@ -269,4 +279,3 @@ The reverse function of MACNumToString. If the MAC address has an invalid format
MACStringToOUI(s)
~~~~~~~~~~~~~~~~~
Takes MAC address in the format AA:BB:CC:DD:EE:FF (colon-separated numbers in hexadecimal form). Returns first three octets as UInt64 number. If the MAC address has an invalid format, it returns 0.
@ -1,5 +1,5 @@
Functions for generating pseudo-random numbers
----------------------
----------------------------------------------
Non-cryptographic generators of pseudo-random numbers are used.
All the functions accept zero arguments or one argument.
@ -12,6 +12,6 @@ Returns a pseudo-random UInt32 number, evenly distributed among all UInt32-type
Uses a linear congruential generator.
rand64
~~~~
~~~~~~
Returns a pseudo-random UInt64 number, evenly distributed among all UInt64-type numbers.
Uses a linear congruential generator.
@ -1,8 +1,8 @@
Rounding functions
----------------
------------------
floor(x[, N])
~~~~~~~
~~~~~~~~~~~~~
Returns a round number that is less than or equal to 'x'.
A round number is a multiple of ``1 / 10^N``, or the nearest number of the appropriate data type if ``1 / 10^N`` isn't exact.
'N' is an integer constant, optional parameter. By default it is zero, which means to round to an integer.
@ -15,24 +15,24 @@ For integer arguments, it makes sense to round with a negative 'N' value (for no
If rounding causes overflow (for example, ``floor(-128, -1)``), an implementation-specific result is returned.
ceil(x[, N])
~~~~~~
~~~~~~~~~~~~
Returns the smallest round number that is greater than or equal to 'x'. In every other way, it is the same as the 'floor' function (see above).
round(x[, N])
~~~~~~~
~~~~~~~~~~~~~
Returns the round number nearest to 'x', which may be less than, greater than, or equal to 'x'.
If 'x' is exactly in the middle between the nearest round numbers, one of them is returned (implementation-specific).
The number '-0.' may or may not be considered round (implementation-specific).
In every other way, this function is the same as 'floor' and 'ceil' described above.
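A combined sketch (the commented results follow the definitions above):

.. code-block:: sql

    SELECT
        floor(123.456, 2) AS f,      -- 123.45
        ceil(123.456, 2) AS c,       -- 123.46
        round(123.456, 2) AS r,      -- 123.46
        floor(123.456, -1) AS f_neg  -- 120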
roundToExp2(num)
~~~~~~~~
~~~~~~~~~~~~~~~~
Accepts a number. If the number is less than one, it returns 0. Otherwise, it rounds the number down to the nearest (whole non-negative) power of two.
roundDuration(num)
~~~~~~~~
~~~~~~~~~~~~~~~~~~
Accepts a number. If the number is less than one, it returns 0. Otherwise, it rounds the number down to numbers from the set: 1, 10, 30, 60, 120, 180, 240, 300, 600, 1200, 1800, 3600, 7200, 18000, 36000. This function is specific to Yandex.Metrica and used for implementing the report on session length.
roundAge(num)
~~~~~~~
~~~~~~~~~~~~~
Accepts a number. If the number is less than 18, it returns 0. Otherwise, it rounds the number down to numbers from the set: 18, 25, 35, 45, 55. This function is specific to Yandex.Metrica and used for implementing the report on user age.
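An illustrative sketch (the commented results follow the sets above):

.. code-block:: sql

    SELECT
        roundToExp2(31) AS power_of_two,  -- 16
        roundDuration(230) AS duration,   -- 180
        roundAge(32) AS age               -- 25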
@ -1,23 +1,23 @@
Functions for splitting and merging strings and arrays
----------------
------------------------------------------------------
splitByChar(separator, s)
~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~
Splits a string into substrings, using 'separator' as the separator.
'separator' must be a string constant consisting of exactly one character.
Returns an array of selected substrings. Empty substrings may be selected if the separator occurs at the beginning or end of the string, or if there are multiple consecutive separators.
splitByString(separator, s)
~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~
The same as above, but it uses a string of multiple characters as the separator. The string must be non-empty.
arrayStringConcat(arr[, separator])
~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Concatenates strings from the array elements, using 'separator' as the separator.
'separator' is a string constant, an optional parameter. By default it is an empty string.
Returns a string.
alphaTokens(s)
~~~~~~~~~~
~~~~~~~~~~~~~~
Selects substrings of consecutive bytes from the ranges a-z and A-Z.
Returns an array of selected substrings.
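A combined sketch of these functions:

.. code-block:: sql

    SELECT
        splitByChar(',', 'a,b,c') AS pieces,                -- ['a','b','c']
        arrayStringConcat(['a', 'b', 'c'], '-') AS joined,  -- 'a-b-c'
        alphaTokens('abc123def') AS tokens                  -- ['abc','def']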
@ -1,5 +1,5 @@
Functions for working with strings
------------------------------
----------------------------------
empty
~~~~~
@ -26,7 +26,7 @@ Returns the length of a string in Unicode code points (not in characters), assum
The result type is UInt64.
lower
~~~~~~
~~~~~
Converts ASCII Latin symbols in a string to lowercase.
upper
@ -1,5 +1,5 @@
Functions for searching and replacing in strings
---------------------------------
------------------------------------------------
replaceOne(haystack, pattern, replacement)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -30,6 +30,8 @@ Example 1. Converting the date to American format:
LIMIT 7
FORMAT TabSeparated
.. code-block:: text
2014-03-17 03/17/2014
2014-03-18 03/18/2014
2014-03-19 03/19/2014
@ -44,6 +46,8 @@ Example 2. Copy the string ten times:
SELECT replaceRegexpOne('Hello, World!', '.*', '\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0') AS res
.. code-block:: text
┌─res────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Hello, World!Hello, World!Hello, World!Hello, World!Hello, World!Hello, World!Hello, World!Hello, World!Hello, World!Hello, World! │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
@ -53,8 +57,11 @@ replaceRegexpAll(haystack, pattern, replacement)
This does the same thing, but replaces all the occurrences. Example:
.. code-block:: sql
SELECT replaceRegexpAll('Hello, World!', '.', '\\0\\0') AS res
.. code-block:: text
┌─res────────────────────────┐
│ HHeelllloo,, WWoorrlldd!! │
└────────────────────────────┘
@ -63,8 +70,11 @@ As an exception, if a regular expression worked on an empty substring, the repla
Example:
.. code-block:: sql
SELECT replaceRegexpAll('Hello, World!', '^', 'here: ') AS res
.. code-block:: text
┌─res─────────────────┐
│ here: Hello, World! │
└─────────────────────┘
@ -1,5 +1,5 @@
Functions for searching strings
------------------------
-------------------------------
The search is case-sensitive in all these functions.
The search substring or regular expression must be a constant in all these functions.
@ -1,5 +1,5 @@
Type conversion functions
----------------------------
-------------------------
toUInt8, toUInt16, toUInt32, toUInt64
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -26,7 +26,9 @@ When converting dates to numbers or vice versa, the date corresponds to the numb
When converting dates with times to numbers or vice versa, the date with time corresponds to the number of seconds since the beginning of the Unix epoch.
Formats of date and date with time for toDate/toDateTime functions are defined as follows:
::
.. code-block:: text
YYYY-MM-DD
YYYY-MM-DD hh:mm:ss
@ -44,20 +46,24 @@ To do transformations on DateTime in given time zone, pass second argument with
now() AS now_local,
toString(now(), 'Asia/Yekaterinburg') AS now_yekat
.. code-block:: text
┌───────────now_local─┬─now_yekat───────────┐
│ 2016-06-15 00:11:21 │ 2016-06-15 02:11:21 │
└─────────────────────┴─────────────────────┘
To format a DateTime in a given time zone:
::
.. code-block:: text
toString(now(), 'Asia/Yekaterinburg')
To get a unix timestamp for a string with a datetime in a specified time zone:
::
.. code-block:: text
toUnixTimestamp('2000-01-01 00:00:00', 'Asia/Yekaterinburg')
toFixedString(s, N)
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~
Converts a String type argument to a FixedString(N) type (a string with fixed length N). N must be a constant. If the string has fewer bytes than N, it is padded with null bytes on the right. If the string has more bytes than N, an exception is thrown.
toStringCutToZero(s)
@ -68,13 +74,13 @@ Example:
.. code-block:: sql
:) SELECT toFixedString('foo', 8) AS s, toStringCutToZero(s) AS s_cut
SELECT toFixedString('foo', 8) AS s, toStringCutToZero(s) AS s_cut
┌─s─────────────┬─s_cut─┐
│ foo\0\0\0\0\0 │ foo │
└───────────────┴───────┘
:) SELECT toFixedString('foo\0bar', 8) AS s, toStringCutToZero(s) AS s_cut
SELECT toFixedString('foo\0bar', 8) AS s, toStringCutToZero(s) AS s_cut
┌─s──────────┬─s_cut─┐
│ foo\0bar\0 │ foo │
@ -113,6 +119,8 @@ Example:
CAST(timestamp, 'String') AS string,
CAST(timestamp, 'FixedString(22)') AS fixed_string
.. code-block:: text
┌─timestamp───────────┬────────────datetime─┬───────date─┬─string──────────────┬─fixed_string──────────────┐
│ 2016-06-15 23:00:00 │ 2016-06-15 23:00:00 │ 2016-06-15 │ 2016-06-15 23:00:00 │ 2016-06-15 23:00:00\0\0\0 │
└─────────────────────┴─────────────────────┴────────────┴─────────────────────┴───────────────────────────┘
@ -1,79 +1,80 @@
Functions for working with URLs
------------------------
-------------------------------
All these functions don't follow the RFC. They are maximally simplified for improved performance.
Функции, извлекающие часть URL-а.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Functions that extract part of the URL
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If there isn't anything similar in a URL, an empty string is returned.
protocol
""""""""
- Selects the protocol. Examples: http, ftp, mailto, magnet...
Selects the protocol. Examples: http, ftp, mailto, magnet...
domain
"""""""
- Selects the domain.
""""""
Selects the domain.
domainWithoutWWW
""""""""""""
- Selects the domain and removes no more than one 'www.' from the beginning of it, if present.
""""""""""""""""
Selects the domain and removes no more than one 'www.' from the beginning of it, if present.
topLevelDomain
"""""""""""
- Selects the top-level domain. Example: .ru.
""""""""""""""
Selects the top-level domain. Example: .ru.
firstSignificantSubdomain
""""""""""""""
- Selects the "first significant subdomain". This is a non-standard concept specific to Yandex.Metrica. The first significant subdomain is a second-level domain if it is 'com', 'net', 'org', or 'co'. Otherwise, it is a third-level domain. For example, firstSignificantSubdomain('https://news.yandex.ru/') = 'yandex', firstSignificantSubdomain('https://news.yandex.com.tr/') = 'yandex'. The list of "insignificant" second-level domains and other implementation details may change in the future.
"""""""""""""""""""""""""
Selects the "first significant subdomain". This is a non-standard concept specific to Yandex.Metrica. The first significant subdomain is a second-level domain if it is 'com', 'net', 'org', or 'co'. Otherwise, it is a third-level domain. For example, firstSignificantSubdomain('https://news.yandex.ru/') = 'yandex', firstSignificantSubdomain('https://news.yandex.com.tr/') = 'yandex'. The list of "insignificant" second-level domains and other implementation details may change in the future.
cutToFirstSignificantSubdomain
""""""""""""""""
- Selects the part of the domain that includes top-level subdomains up to the "first significant subdomain" (see the explanation above).
""""""""""""""""""""""""""""""
Selects the part of the domain that includes top-level subdomains up to the "first significant subdomain" (see the explanation above).
For example, ``cutToFirstSignificantSubdomain('https://news.yandex.com.tr/') = 'yandex.com.tr'``.
path
""""
- Selects the path. Example: /top/news.html The path does not include the query-string.
Selects the path. Example: ``/top/news.html``. The path does not include the query-string.
pathFull
"""""""
- The same as above, but including query-string and fragment. Example: /top/news.html?page=2#comments
""""""""
The same as above, but including query-string and fragment. Example: /top/news.html?page=2#comments
queryString
"""""""""
- Selects the query-string. Example: page=1&lr=213. query-string does not include the first question mark, or # and everything that comes after #.
"""""""""""
Selects the query-string. Example: page=1&lr=213. query-string does not include the first question mark, or # and everything that comes after #.
fragment
""""""
- Selects the fragment identifier. fragment does not include the first number sign (#).
""""""""
Selects the fragment identifier. fragment does not include the first number sign (#).
queryStringAndFragment
"""""""""
- Selects the query-string and fragment identifier. Example: page=1#29390.
""""""""""""""""""""""
Selects the query-string and fragment identifier. Example: page=1#29390.
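A sketch pulling the pieces out of one example URL (the commented results follow the definitions above):

.. code-block:: sql

    SELECT
        protocol(url) AS proto,  -- 'https'
        domain(url) AS dom,      -- 'example.com'
        path(url) AS p,          -- '/top/news.html'
        queryString(url) AS qs,  -- 'page=2'
        fragment(url) AS frag    -- 'comments'
    FROM (SELECT 'https://example.com/top/news.html?page=2#comments' AS url)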
extractURLParameter(URL, name)
"""""""""
- Selects the value of the 'name' parameter in the URL, if present. Otherwise, selects an empty string. If there are many parameters with this name, it returns the first occurrence. This function works under the assumption that the parameter name is encoded in the URL in exactly the same way as in the argument passed.
""""""""""""""""""""""""""""""
Selects the value of the 'name' parameter in the URL, if present. Otherwise, selects an empty string. If there are many parameters with this name, it returns the first occurrence. This function works under the assumption that the parameter name is encoded in the URL in exactly the same way as in the argument passed.
extractURLParameters(URL)
""""""""""
- Gets an array of name=value strings corresponding to the URL parameters. The values are not decoded in any way.
"""""""""""""""""""""""""
Gets an array of name=value strings corresponding to the URL parameters. The values are not decoded in any way.
extractURLParameterNames(URL)
""""""""
- Gets an array of name=value strings corresponding to the names of URL parameters. The values are not decoded in any way.
"""""""""""""""""""""""""""""
Gets an array of name strings corresponding to the names of URL parameters. The values are not decoded in any way.
URLHierarchy(URL)
"""""""""
- Gets an array containing the URL trimmed to the ``/``, ``?`` characters in the path and query-string. Consecutive separator characters are counted as one. The cut is made in the position after all the consecutive separator characters. Example:
"""""""""""""""""
Gets an array containing the URL trimmed to the ``/``, ``?`` characters in the path and query-string. Consecutive separator characters are counted as one. The cut is made in the position after all the consecutive separator characters. Example:
URLPathHierarchy(URL)
""""""""
- The same thing, but without the protocol and host in the result. The / element (root) is not included. Example:
"""""""""""""""""""""
The same thing, but without the protocol and host in the result. The / element (root) is not included. Example:
This function is used for implementing tree-view reports by URL in Yandex.Metrica.
::
.. code-block:: text
URLPathHierarchy('https://example.com/browse/CONV-6788') =
[
'/browse/',
@ -81,38 +82,39 @@ This function is used for implementing tree-view reports by URL in Yandex.Metric
]
decodeURLComponent(URL)
"""""""""""
"""""""""""""""""""""""
Returns a URL-decoded URL.
Example:
.. code-block:: sql
:) SELECT decodeURLComponent('http://127.0.0.1:8123/?query=SELECT%201%3B') AS DecodedURL;
SELECT decodeURLComponent('http://127.0.0.1:8123/?query=SELECT%201%3B') AS DecodedURL;
┌─DecodedURL─────────────────────────────┐
│ http://127.0.0.1:8123/?query=SELECT 1; │
└────────────────────────────────────────┘
Functions that remove part of a URL.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If the URL doesn't have anything similar, the URL remains unchanged.
cutWWW
"""""
""""""
Removes no more than one 'www.' from the beginning of the URL's domain, if present.
cutQueryString
""""""
""""""""""""""
Removes the query-string. The question mark is also removed.
cutFragment
""""""""
"""""""""""
Removes the fragment identifier. The number sign is also removed.
cutQueryStringAndFragment
""""""""""
"""""""""""""""""""""""""
Removes the query-string and fragment identifier. The question mark and number sign are also removed.
cutURLParameter(URL, name)
""""""""""
""""""""""""""""""""""""""
Removes the URL parameter named 'name', if present. This function works under the assumption that the parameter name is encoded in the URL exactly the same way as in the passed argument.
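A combined sketch (the commented results follow the definitions above):

.. code-block:: sql

    SELECT
        cutWWW('http://www.example.com/') AS no_www,                          -- 'http://example.com/'
        cutQueryString('http://example.com/?page=1') AS no_qs,                -- 'http://example.com/'
        cutQueryStringAndFragment('http://example.com/?page=1#top') AS bare,  -- 'http://example.com/'
        cutURLParameter('http://example.com/?a=1&b=2', 'a') AS no_a           -- 'http://example.com/?b=2'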
@ -1,11 +1,11 @@
Functions for working with Yandex.Metrica dictionaries
----------------
------------------------------------------------------
In order for the functions below to work, the server config must specify the paths and addresses for getting all the Yandex.Metrica dictionaries. The dictionaries are loaded at the first call of any of these functions. If the reference lists can't be loaded, an exception is thrown.
For information about creating reference lists, see the section "Dictionaries".
Multiple geobases
~~~~~~~~~
~~~~~~~~~~~~~~~~~
ClickHouse supports working with multiple alternative geobases (regional hierarchies) simultaneously, in order to support various perspectives on which countries certain regions belong to.
The 'clickhouse-server' config specifies the file with the regional hierarchy:
@ -20,17 +20,19 @@ All the dictionaries are re-loaded in runtime (once every certain number of seco
All functions for working with regions have an optional argument at the end - the dictionary key. It is indicated as the geobase.
Example:
::
.. code-block:: text
regionToCountry(RegionID) - Uses the default dictionary: /opt/geo/regions_hierarchy.txt
regionToCountry(RegionID, '') - Uses the default dictionary: /opt/geo/regions_hierarchy.txt
regionToCountry(RegionID, 'ua') - Uses the dictionary for the 'ua' key: /opt/geo/regions_hierarchy_ua.txt
regionToCity(id[, geobase])
~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Accepts a UInt32 number - the region ID from the Yandex geobase. If this region is a city or part of a city, it returns the region ID for the appropriate city. Otherwise, returns 0.
regionToArea(id[, geobase])
~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Converts a region to an area (type 5 in the geobase). In every other way, this function is the same as 'regionToCity'.
.. code-block:: sql
@ -39,6 +41,8 @@ Converts a region to an area (type 5 in the geobase). In every other way, this f
FROM system.numbers
LIMIT 15
.. code-block:: text
┌─regionToName(regionToArea(toUInt32(number), \'ua\'), \'en\')─┐
│ │
│ Moscow and Moscow region │
@ -58,7 +62,7 @@ Converts a region to an area (type 5 in the geobase). In every other way, this f
└──────────────────────────────────────────────────────────────┘
regionToDistrict(id[, geobase])
~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Converts a region to a federal district (type 4 in the geobase). In every other way, this function is the same as 'regionToCity'.
.. code-block:: sql
@ -67,6 +71,8 @@ Converts a region to a federal district (type 4 in the geobase). In every other
FROM system.numbers
LIMIT 15
.. code-block:: text
┌─regionToName(regionToDistrict(toUInt32(number), \'ua\'), \'en\')─┐
│ │
│ Central │
@ -86,34 +92,34 @@ Converts a region to a federal district (type 4 in the geobase). In every other
└──────────────────────────────────────────────────────────────────┘
regionToCountry(id[, geobase])
~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Converts a region to a country. In every other way, this function is the same as 'regionToCity'.
Example: ``regionToCountry(toUInt32(213)) = 225`` converts ``Moscow (213)`` to ``Russia (225)``.
regionToContinent(id[, geobase])
~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Converts a region to a continent. In every other way, this function is the same as 'regionToCity'.
Example: ``regionToContinent(toUInt32(213)) = 10001`` converts Moscow (213) to Eurasia (10001).
regionToPopulation(id[, geobase])
~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Gets the population for a region.
The population can be recorded in files with the geobase. See the section "External dictionaries".
If the population is not recorded for the region, it returns 0.
In the Yandex geobase, the population might be recorded for child regions, but not for parent regions.
regionIn(lhs, rhs[, geobase])
~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Checks whether a 'lhs' region belongs to a 'rhs' region. Returns a UInt8 number equal to 1 if it belongs, or 0 if it doesn't belong.
The relationship is reflexive - any region also belongs to itself.
regionHierarchy(id[, geobase])
~~~~~~~~~
ПAccepts a UInt32 number - the region ID from the Yandex geobase. Returns an array of region IDs consisting of the passed region and all parents along the chain.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Accepts a UInt32 number - the region ID from the Yandex geobase. Returns an array of region IDs consisting of the passed region and all parents along the chain.
Example: ``regionHierarchy(toUInt32(213)) = [213,1,3,225,10001,10000]``.
regionToName(id[, lang])
~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~
Accepts a UInt32 number - the region ID from the Yandex geobase. A string with the name of the language can be passed as a second argument. Supported languages are: ru, en, ua, uk, by, kz, tr. If the second argument is omitted, the language 'ru' is used. If the language is not supported, an exception is thrown. Returns a string - the name of the region in the corresponding language. If the region with the specified ID doesn't exist, an empty string is returned.
``ua`` and ``uk`` mean the same thing - Ukrainian.
@ -1,133 +0,0 @@
Getting started
===============
System requirements
-------------------
This is not a cross-platform system. It requires Linux Ubuntu no older than Precise (12.04); x86_64 architecture with support for the SSE 4.2 instruction set.
To check for SSE 4.2 support, run:
::
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
We recommend using Ubuntu Trusty, Ubuntu Xenial, or Ubuntu Precise.
The terminal must use UTF-8 encoding (the default in Ubuntu).
Installation
------------
For testing and development, the system can be installed on a single server or on a desktop computer.
Installing from packages
~~~~~~~~~~~~~~~~~~~~~~~~
In `/etc/apt/sources.list` (or in a separate `/etc/apt/sources.list.d/clickhouse.list` file), add the repository:
::
deb http://repo.yandex.ru/clickhouse/trusty stable main
On other Ubuntu versions, replace `trusty` with `xenial` or `precise`.
Then run:
::
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4 # optional
sudo apt-get update
sudo apt-get install clickhouse-client clickhouse-server-common
You can also download and install the packages manually from here:
http://repo.yandex.ru/clickhouse/trusty/pool/main/c/clickhouse/,
http://repo.yandex.ru/clickhouse/xenial/pool/main/c/clickhouse/,
http://repo.yandex.ru/clickhouse/precise/pool/main/c/clickhouse/.
ClickHouse contains access restriction settings. They are located in the users.xml file (next to config.xml).
By default, access is allowed from everywhere for the 'default' user without a password. See the users/default/networks section.
For more information, see the section "Configuration files".
Installing from source
~~~~~~~~~~~~~~~~~~~~~~
To build, follow the instructions in build.md.
You can compile packages and install them.
You can also use the programs without installing packages.
Client: dbms/src/Client/
Server: dbms/src/Server/
For the server, create directories for data, for example:
::
/opt/clickhouse/data/default/
/opt/clickhouse/metadata/default/
(They are configured in the server config.)
Run 'chown' for the desired user.
Note the path to logs in the server config (src/dbms/src/Server/config.xml).
Other methods of installation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Docker image: https://hub.docker.com/r/yandex/clickhouse-server/
Gentoo overlay: https://github.com/kmeaw/clickhouse-overlay
Launch
------
To start the server (as a daemon), run:
::
sudo service clickhouse-server start
View the logs in the `/var/log/clickhouse-server/` directory.
If the server doesn't start, check the configuration in the file `/etc/clickhouse-server/config.xml`.
You can also launch the server from the console:
::
clickhouse-server --config-file=/etc/clickhouse-server/config.xml
In this case, the log is printed to the console, which is convenient during development.
If the configuration file is in the current directory, you don't need to specify the --config-file parameter; by default, the file ./config.xml is used.
You can connect to the server with the command-line client:
::
clickhouse-client
The default parameters indicate connecting to localhost:9000 on behalf of the user 'default' without a password.
The client can be used for connecting to a remote server. Example:
::
clickhouse-client --host=example.com
For more information, see the section "Command-line client".
Checking that the system works:
::
milovidov@milovidov-Latitude-E6320:~/work/metrica/src/dbms/src/Client$ ./clickhouse-client
ClickHouse client version 0.0.18749.
Connecting to localhost:9000.
Connected to ClickHouse server version 0.0.18749.
:) SELECT 1
SELECT 1
┌─1─┐
│ 1 │
└───┘
1 rows in set. Elapsed: 0.003 sec.
:)
Congratulations, the system works!
Test data
---------
If you are a Yandex employee, you can use Yandex.Metrica test data to explore the system's capabilities.
How to load the test data is described here.
If you are an external user, you can use publicly available data; ways to load it are described here.
If you have questions
---------------------
If you are a Yandex employee, use the internal ClickHouse mailing list.
You can subscribe to it to receive announcements, stay informed about new developments, and see questions that other users have.
Otherwise, you can ask questions on Stack Overflow, participate in discussions on Google Groups, or send a private message to the developers at clickhouse-feedback@yandex-team.com.
@ -1,33 +1,37 @@
Getting started
=============
===============
System requirements
-----------------
-------------------
This is not a cross-platform system. It requires Linux Ubuntu Precise (12.04) or newer, x86_64 architecture with support for the SSE 4.2 instruction set.
To test for SSE 4.2 support, do:
::
.. code-block:: text
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
We recommend using Ubuntu Trusty or Ubuntu Xenial or Ubuntu Precise.
The terminal must use UTF-8 encoding (the default in Ubuntu).
Installation
-----------------
------------
For testing and development, the system can be installed on a single server or on a desktop computer.
Installing from packages
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~
In `/etc/apt/sources.list` (or in a separate `/etc/apt/sources.list.d/clickhouse.list` file), add the repository:
::
.. code-block:: text
deb http://repo.yandex.ru/clickhouse/trusty stable main
For other Ubuntu versions, replace `trusty` with `xenial` or `precise`.
Then run:
::
.. code-block:: bash
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4 # optional
sudo apt-get update
sudo apt-get install clickhouse-client clickhouse-server-common
@ -41,16 +45,18 @@ ClickHouse contains access restriction settings. They are located in the 'users.
By default, access is allowed from everywhere for the default user without a password. See 'user/default/networks'. For more information, see the section "Configuration files".
Installing from source
~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~
To build, follow the instructions in build.md (for Linux) or in build_osx.md (for Mac OS X).
You can compile packages and install them. You can also use programs without installing packages.
::
.. code-block:: text
Client: dbms/src/Client/
Server: dbms/src/Server/
For the server, create directories for data, such as:
::
.. code-block:: text
/opt/clickhouse/data/default/
/opt/clickhouse/metadata/default/
@ -60,17 +66,18 @@ Run 'chown' for the desired user.
Note the path to logs in the server config (src/dbms/src/Server/config.xml).
Other methods of installation
~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Docker image is located here: https://hub.docker.com/r/yandex/clickhouse-server/
There is Gentoo overlay located here: https://github.com/kmeaw/clickhouse-overlay
Launch
-------
------
To start the server (as a daemon), run:
::
.. code-block:: bash
sudo service clickhouse-server start
View the logs in the directory `/var/log/clickhouse-server/`
@ -78,25 +85,29 @@ View the logs in the catalog `/var/log/clickhouse-server/`
If the server doesn't start, check the configurations in the file `/etc/clickhouse-server/config.xml`
You can also launch the server from the console:
::
.. code-block:: bash
clickhouse-server --config-file=/etc/clickhouse-server/config.xml
In this case, the log will be printed to the console, which is convenient during development. If the configuration file is in the current directory, you don't need to specify the '--config-file' parameter. By default, it uses './config.xml'.
You can use the command-line client to connect to the server:
::
.. code-block:: bash
clickhouse-client
The default parameters indicate connecting with localhost:9000 on behalf of the user 'default' without a password.
The client can be used for connecting to a remote server. For example:
::
.. code-block:: bash
clickhouse-client --host=example.com
For more information, see the section "Command-line client".
Checking the system:
::
milovidov@milovidov-Latitude-E6320:~/work/metrica/src/dbms/src/Client$ ./clickhouse-client
.. code-block:: bash
milovidov@hostname:~/work/metrica/src/dbms/src/Client$ ./clickhouse-client
ClickHouse client version 0.0.18749.
Connecting to localhost:9000.
Connected to ClickHouse server version 0.0.18749.
@ -114,17 +125,3 @@ Checking the system:
:)
Congratulations, it works!
Test data
---------------
If you are a Yandex employee, you can use Yandex.Metrica test data to explore the system's capabilities. You can find instructions for using the test data here.
Otherwise, you could use one of the available public datasets, described here.
If you have questions
---------------------
If you are a Yandex employee, use the internal ClickHouse mailing list.
You can subscribe to this list to get announcements, information on new developments, and questions that other users have.
Otherwise, you could ask questions on Stack Overflow, discuss in Google Groups, or send a private message to the developers at clickhouse-feedback@yandex-team.com.
@ -1,6 +1,6 @@
Documentation
-----------------
-------------
.. toctree::
:maxdepth: 6
@ -23,3 +23,4 @@ Documentation
configuration_files
access_rights
quotas
roadmap
@ -1,7 +1,8 @@
Command-line client
-----------------------
-------------------
To work from the command line, you can use ``clickhouse-client``:
::
.. code-block:: bash
$ clickhouse-client
ClickHouse client version 0.0.26176.
Connecting to localhost:9000.
@ -37,9 +38,12 @@ Only works in non-interactive mode.
``--stacktrace`` - If specified, also prints the stack trace if an exception occurs.
``--config-file`` - Name of the configuration file that has additional settings or changed defaults for the settings listed above.
By default, files are searched for in this order:
./clickhouse-client.xml
~/./clickhouse-client/config.xml
/etc/clickhouse-client/config.xml
.. code-block:: text
./clickhouse-client.xml
~/./clickhouse-client/config.xml
/etc/clickhouse-client/config.xml
Settings are only taken from the first file found.
You can also specify any settings that will be used for processing queries. For example, ``clickhouse-client --max_threads=1``. For more information, see the section "Settings".
@ -49,7 +53,8 @@ To use batch mode, specify the 'query' parameter, or send data to 'stdin' (it ve
Similar to the HTTP interface, when using the 'query' parameter and sending data to 'stdin', the request is a concatenation of the 'query' parameter, a line break, and the data in 'stdin'. This is convenient for large INSERT queries.
Examples of inserting data via clickhouse-client:
::
.. code-block:: bash
echo -ne "1, 'some text', '2016-08-14 00:00:00'\n2, 'some more text', '2016-08-14 00:00:01'" | clickhouse-client --database=test --query="INSERT INTO test FORMAT CSV";
cat <<_EOF | clickhouse-client --database=test --query="INSERT INTO test FORMAT CSV";
@ -160,15 +160,18 @@ By default, the database that is registered in the server settings is used as th
The username and password can be indicated in one of two ways:
1. Using HTTP Basic Authentication. Example: ::
1. Using HTTP Basic Authentication. Example:
.. code-block:: bash
echo 'SELECT 1' | curl 'http://user:password@localhost:8123/' -d @-
2. In the 'user' and 'password' URL parameters. Example: ::
2. In the 'user' and 'password' URL parameters. Example:
.. code-block:: bash
echo 'SELECT 1' | curl 'http://localhost:8123/?user=user&password=password' -d @-
3. Using 'X-ClickHouse-User' and 'X-ClickHouse-Key' headers. Example:
.. code-block:: bash
echo 'SELECT 1' | curl -H "X-ClickHouse-User: user" -H "X-ClickHouse-Key: password" 'http://localhost:8123/' -d @-
@ -1,4 +1,4 @@
JDBC driver
-----------
There is an official JDBC driver for ClickHouse. See `here <https://github.com/yandex/clickhouse-jdbc>`_ .
@ -1,5 +1,5 @@
Third-party client libraries
----------------------------
There exist third-party client libraries for ClickHouse:
@ -11,17 +11,17 @@ There exist third-party client libraries for ClickHouse:
- `PhpClickHouseClient <https://github.com/SevaCode/PhpClickHouseClient>`_
- `phpClickHouse <https://github.com/smi2/phpClickHouse>`_
* Go
- `clickhouse (Go) <https://github.com/kshvakov/clickhouse/>`_
- `go-clickhouse <https://github.com/roistat/go-clickhouse>`_
* NodeJs
- `clickhouse (NodeJs) <https://github.com/TimonKK/clickhouse>`_
- `node-clickhouse <https://github.com/apla/node-clickhouse>`_
* Perl
- `perl-DBD-ClickHouse <https://github.com/elcamlost/perl-DBD-ClickHouse>`_
- `HTTP-ClickHouse <https://metacpan.org/release/HTTP-ClickHouse>`_
- `AnyEvent-ClickHouse <https://metacpan.org/release/AnyEvent-ClickHouse>`_
* Ruby
- `clickhouse (Ruby) <https://github.com/archan937/clickhouse>`_
* R
- `clickhouse-r <https://github.com/hannesmuehleisen/clickhouse-r>`_
* .NET
@ -1,5 +1,5 @@
Third-party GUI
---------------
There is an `open source project Tabix <https://github.com/smi2/tabix.ui>`_ by the SMI2 company, which implements a graphical web interface for ClickHouse.
@ -1,8 +1,8 @@
Distinctive features of ClickHouse
==================================
1. True column-oriented DBMS.
-----------------------------
In a true column-oriented DBMS, there isn't any "garbage" stored with the values. For example, constant-length values must be supported, to avoid storing their length "number" next to the values. As an example, a billion UInt8-type values should actually consume around 1 GB uncompressed, or this will strongly affect the CPU use. It is very important to store data compactly (without any "garbage") even when uncompressed, since the speed of decompression (CPU usage) depends mainly on the volume of uncompressed data.
This is worth noting because there are systems that can store values of separate columns separately, but that can't effectively process analytical queries due to their optimization for other scenarios. Examples are HBase, BigTable, Cassandra, and HyperTable. In these systems, you will get throughput around a hundred thousand rows per second, but not hundreds of millions of rows per second.
@ -10,19 +10,19 @@ This is worth noting because there are systems that can store values of separate
Also note that ClickHouse is a DBMS, not a single database. ClickHouse allows creating tables and databases in runtime, loading data, and running queries without reconfiguring and restarting the server.
2. Data compression.
--------------------
Some column-oriented DBMSs (InfiniDB CE and MonetDB) do not use data compression. However, data compression really improves performance.
3. Disk storage of data.
------------------------
Many column-oriented DBMSs (SAP HANA, and Google PowerDrill) can only work in RAM. But even on thousands of servers, the RAM is too small for storing all the pageviews and sessions in Yandex.Metrica.
4. Parallel processing on multiple cores.
-----------------------------------------
Large queries are parallelized in a natural way.
5. Distributed processing on multiple servers.
----------------------------------------------
Almost none of the columnar DBMSs listed above have support for distributed processing.
In ClickHouse, data can reside on different shards. Each shard can be a group of replicas that are used for fault tolerance. The query is processed on all the shards in parallel. This is transparent for the user.
@ -38,25 +38,25 @@ Correlated subqueries are not supported.
Data is not only stored by columns, but is processed by vectors - parts of columns. This allows us to achieve high CPU performance.
8. Real-time data updates.
--------------------------
ClickHouse supports primary key tables. In order to quickly perform queries on the range of the primary key, the data is sorted incrementally using the merge tree. Due to this, data can continually be added to the table. There is no locking when adding data.
9. Indexes.
-----------
Having a primary key makes it possible, for example, to extract data for specific clients (Metrica counters) for a specific time range, with latency of less than several dozen milliseconds.
10. Suitable for online queries.
--------------------------------
This lets us use the system as the back-end for a web interface. Low latency means queries can be processed without delay, while the Yandex.Metrica interface page is loading (in online mode).
11. Support for approximated calculations.
------------------------------------------
#. The system contains aggregate functions for approximated calculation of the number of various values, medians, and quantiles.
#. Supports running a query based on a part (sample) of data and getting an approximated result. In this case, proportionally less data is retrieved from the disk.
#. Supports running an aggregation for a limited number of random keys, instead of for all keys. Under certain conditions for key distribution in the data, this provides a reasonably accurate result while using fewer resources.
14. Data replication and support for data integrity on replicas.
----------------------------------------------------------------
Uses asynchronous multi-master replication. After being written to any available replica, data is distributed to all the remaining replicas. The system maintains identical data on different replicas. Data is restored automatically after a failure, or using a "button" for complex cases.
For more information, see the section "Data replication".
@ -1,5 +1,5 @@
ClickHouse features that can be considered disadvantages
--------------------------------------------------------
#. No transactions.
#. For aggregation, query results must fit in the RAM on a single server. However, the volume of source data for a query may be indefinitely large.
@ -1,5 +1,5 @@
Introduction
============
.. toctree::
:glob:
@ -1,21 +1,21 @@
Performance
===========
According to internal testing results, ClickHouse shows the best performance for comparable operating scenarios among systems of its class that were available for testing. This includes the highest throughput for long queries, and the lowest latency on short queries. Testing results are shown on this page.
Throughput for a single large query
-----------------------------------
Throughput can be measured in rows per second or in megabytes per second. If the data is placed in the page cache, a query that is not too complex is processed on modern hardware at a speed of approximately 2-10 GB/s of uncompressed data on a single server (for the simplest cases, the speed may reach 30 GB/s). If data is not placed in the page cache, the speed depends on the disk subsystem and the data compression rate. For example, if the disk subsystem allows reading data at 400 MB/s, and the data compression rate is 3, the speed will be around 1.2 GB/s. To get the speed in rows per second, divide the speed in bytes per second by the total size of the columns used in the query. For example, if 10 bytes of columns are extracted, the speed will be around 100-200 million rows per second.
The processing speed increases almost linearly for distributed processing, but only if the number of rows resulting from aggregation or sorting is not too large.
Latency when processing short queries.
--------------------------------------
If a query uses a primary key and does not select too many rows to process (hundreds of thousands), and does not use too many columns, we can expect less than 50 milliseconds of latency (single digits of milliseconds in the best case) if data is placed in the page cache. Otherwise, latency is calculated from the number of seeks. If you use rotating drives, for a system that is not overloaded, the latency is calculated by this formula: seek time (10 ms) * number of columns queried * number of data parts.
Throughput when processing a large quantity of short queries.
-------------------------------------------------------------
Under the same conditions, ClickHouse can handle several hundred queries per second on a single server (up to several thousand in the best case). Since this scenario is not typical for analytical DBMSs, we recommend expecting a maximum of 100 queries per second.
Performance on data insertion.
------------------------------
We recommend inserting data in packets of at least 1000 rows, or no more than a single request per second. When inserting to a MergeTree table from a tab-separated dump, the insertion speed will be from 50 to 200 MB/s. If the inserted rows are around 1 Kb in size, the speed will be from 50,000 to 200,000 rows per second. If the rows are small, the performance will be higher in rows per second (on Yandex Banner System data -> 500,000 rows per second, on Graphite data -> 1,000,000 rows per second). To improve performance, you can make multiple INSERT queries in parallel, and performance will increase linearly.
@ -1,8 +1,8 @@
Possible silly questions
------------------------
1. Why not use systems like map-reduce?
"""""""""""""""""""
""""""""""""""""""""""""""""""""""""""""""
Systems like map-reduce are distributed computing systems, where the reduce phase is performed using distributed sorting.
Regarding this aspect, map-reduce is similar to other systems like YAMR, Hadoop, YT.
@ -1,13 +0,0 @@
Usage in Yandex.Metrica and other Yandex services
------------------------------------------
ClickHouse is used for multiple purposes in Yandex.Metrica. Its main task is to build reports in online mode using non-aggregated data. It uses a cluster of 374 servers, which store over 20.3 trillion rows in the database. The volume of compressed data, without counting duplication and replication, is about 2 PB. The volume of uncompressed data (in TSV format) would be approximately 17 PB.
ClickHouse is also used for:
* Storing WebVisor data.
* Processing intermediate data.
* Building global reports with Analytics.
* Running queries for debugging the Metrica engine.
* Analyzing logs from the API and the user interface.
ClickHouse has at least a dozen installations in other Yandex services: in search verticals, Market, Direct, business analytics, mobile development, AdFox, personal services, and others.
@ -1,18 +1,21 @@
What is ClickHouse?
===================
ClickHouse is a columnar DBMS for OLAP.
In a "normal" row-oriented DBMS, data is stored in this order:
.. code-block:: text
5123456789123456789 1 Eurobasket - Greece - Bosnia and Herzegovina - example.com 1 2011-09-01 01:03:02 6274717 1294101174 11409 612345678912345678 0 33 6 http://www.example.com/basketball/team/123/match/456789.html http://www.example.com/basketball/team/123/match/987654.html 0 1366 768 32 10 3183 0 0 13 0\0 1 1 0 0 2011142 -1 0 0 01321 613 660 2011-09-01 08:01:17 0 0 0 0 utf-8 1466 0 0 0 5678901234567890123 277789954 0 0 0 0 0
5234985259563631958 0 Consulting, Tax assessment, Accounting, Law 1 2011-09-01 01:03:02 6320881 2111222333 213 6458937489576391093 0 3 2 http://www.example.ru/ 0 800 600 16 10 2 153.1 0 0 10 63 1 1 0 0 2111678 000 0 588 368 240 2011-09-01 01:03:17 4 0 60310 0 windows-1251 1466 0 000 778899001 0 0 0 0 0
...
In other words, all the values related to a row are stored next to each other. Examples of a row-oriented DBMS are MySQL, Postgres, MS SQL Server, and others.
In a column-oriented DBMS, data is stored like this:
.. code-block:: text
WatchID: 5385521489354350662 5385521490329509958 5385521489953706054 5385521490476781638 5385521490583269446 5385521490218868806 5385521491437850694 5385521491090174022 5385521490792669254 5385521490420695110 5385521491532181574 5385521491559694406 5385521491459625030 5385521492275175494 5385521492781318214 5385521492710027334 5385521492955615302 5385521493708759110 5385521494506434630 5385521493104611398
JavaEnable: 1 0 1 0 0 0 1 0 1 1 1 1 1 1 0 1 0 0 1 1
Title: Yandex Announcements - Investor Relations - Yandex Yandex — Contact us — Moscow Yandex — Mission Ru Yandex — History — History of Yandex Yandex Financial Releases - Investor Relations - Yandex Yandex — Locations Yandex Board of Directors - Corporate Governance - Yandex Yandex — Technologies
@ -49,15 +52,17 @@ It is easy to see that the OLAP scenario is very different from other popular sc
Column-oriented databases are better suited to OLAP scenarios (at least 100 times faster in processing most queries), for the following reasons:
1. For I/O.
#. For an analytical query, only a small number of table columns need to be read. In a column-oriented database, you can read just the data you need. For example, if you need 5 columns out of 100, you can expect a 20-fold reduction in I/O.
#. Since data is read in packets, it is easier to compress. Data in columns is also easier to compress. This further reduces the I/O volume.
#. Due to the reduced I/O, more data fits in the system cache.
For example, the query "count the number of records for each advertising platform" requires reading one "advertising platform ID" column, which takes up 1 byte uncompressed. If most of the traffic was not from advertising platforms, you can expect at least 10-fold compression of this column. When using a quick compression algorithm, data decompression is possible at a speed of at least several gigabytes of uncompressed data per second. In other words, this query can be processed at a speed of approximately several billion rows per second on a single server. This speed is actually achieved in practice.
Example:
.. code-block:: text
milovidov@hostname:~$ clickhouse-client
ClickHouse client version 0.0.52053.
Connecting to localhost:9000.
Connected to ClickHouse server version 0.0.52053.
@ -100,12 +105,13 @@ Example:
:)
2. For CPU.
Since executing a query requires processing a large number of rows, it helps to dispatch all operations for entire vectors instead of for separate rows, or to implement the query engine so that there is almost no dispatching cost. If you don't do this, with any half-decent disk subsystem, the query interpreter inevitably stalls the CPU.
It makes sense to both store data in columns and process it, when possible, by columns.
There are two ways to do this:
#. A vector engine. All operations are written for vectors, instead of for separate values. This means you don't need to call operations very often, and dispatching costs are negligible. Operation code contains an optimized internal cycle.
#. Code generation. The code generated for the query has all the indirect calls in it.
This is not done in "normal" databases, because it doesn't make sense when running simple queries. However, there are exceptions. For example, MemSQL uses code generation to reduce latency when processing SQL queries. (For comparison, analytical DBMSs require optimization of throughput, not latency.)
@ -1,11 +1,27 @@
The Yandex.Metrica task
-----------------------
ClickHouse currently powers `Yandex.Metrica <https://metrica.yandex.com/>`_, the world's `second largest <http://w3techs.com/technologies/overview/traffic_analysis/all>`_ web analytics platform, with over 13 trillion database records and over 20 billion events a day, generating customized reports on the fly directly from non-aggregated data.
We need to get custom reports based on hits and sessions, with custom segments set by the user. Data for the reports is updated in real-time. Queries must be run immediately (in online mode). We must be able to build reports for any time period. Complex aggregates must be calculated, such as the number of unique visitors.
At this time (April 2014), Yandex.Metrica receives approximately 12 billion events (pageviews and mouse clicks) daily. All these events must be stored in order to build custom reports. A single query may require scanning hundreds of millions of rows over a few seconds, or millions of rows in no more than a few hundred milliseconds.
Usage in Yandex.Metrica and other Yandex services
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ClickHouse is used for multiple purposes in Yandex.Metrica. Its main task is to build reports in online mode using non-aggregated data. It uses a cluster of 374 servers, which store over 20.3 trillion rows in the database. The volume of compressed data, without counting duplication and replication, is about 2 PB. The volume of uncompressed data (in TSV format) would be approximately 17 PB.
ClickHouse is also used for:
* Storing WebVisor data.
* Processing intermediate data.
* Building global reports with Analytics.
* Running queries for debugging the Metrica engine.
* Analyzing logs from the API and the user interface.
ClickHouse has at least a dozen installations in other Yandex services: in search verticals, Market, Direct, business analytics, mobile development, AdFox, personal services, and others.
Aggregated and non-aggregated data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There is a popular opinion that in order to effectively calculate statistics, you must aggregate data, since this reduces the volume of data.
But data aggregation is a very limited solution, for the following reasons:
@ -4,19 +4,19 @@ Operators
All operators are transformed to the corresponding functions at the query parsing stage, in accordance with their precedence and associativity.
Access operators
----------------
``a[N]`` - Access to an array element, arrayElement(a, N) function.
``a.N`` - Access to a tuple element, tupleElement(a, N) function.
Numeric negation operator
-------------------------
``-a`` - negate(a) function
Multiplication and division operators
-------------------------------------
``a * b`` - multiply(a, b) function
@ -25,14 +25,14 @@ Multiplication and division operators
``a % b`` - modulo(a, b) function
Addition and subtraction operators
----------------------------------
``a + b`` - plus(a, b) function
``a - b`` - minus(a, b) function
Comparison operators
--------------------
``a = b`` - equals(a, b) function
@ -58,7 +58,7 @@ Comparison operators
Operators for working with data sets
------------------------------------
*See the section "IN operators".*
@ -73,29 +73,29 @@ Operators for working with data sets
Logical negation operator
-------------------------
``NOT a`` - ``not(a)`` function
Logical "AND" operator
----------------------
``a AND b`` - function ``and(a, b)``
Logical "OR" operator
---------------------
``a OR b`` - function ``or(a, b)``
Conditional operator
--------------------
``a ? b : c`` - function ``if(a, b, c)``
Conditional expression
----------------------
.. code-block:: sql
@ -108,29 +108,29 @@ Conditional expression
If x is given, it becomes transform(x, [a, ...], [b, ...], c). Otherwise, it becomes multiIf(a, b, ..., c).
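As an illustration, here is a sketch of both forms using the ``system.numbers`` system table (the literal values are made up); the first form is rewritten to ``transform``, the second to ``multiIf``:

.. code-block:: sql

    -- With an operand: equivalent to transform(number, [0, 1], ['zero', 'one'], 'many')
    SELECT CASE number WHEN 0 THEN 'zero' WHEN 1 THEN 'one' ELSE 'many' END
    FROM system.numbers LIMIT 5

    -- Without an operand: equivalent to multiIf(number < 2, 'small', number < 4, 'medium', 'large')
    SELECT CASE WHEN number < 2 THEN 'small' WHEN number < 4 THEN 'medium' ELSE 'large' END
    FROM system.numbers LIMIT 5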
String concatenation operator
-----------------------------
``s1 || s2`` - concat(s1, s2) function
Lambda creation operator
------------------------
``x -> expr`` - lambda(x, expr) function
The following operators do not have a priority, since they are brackets:
Array creation operator
-----------------------
``[x1, ...]`` - array(x1, ...) function
Tuple creation operator
-----------------------
``(x1, x2, ...)`` - tuple(x1, x2, ...) function
Associativity
-------------
All binary operators have left associativity. For example, ``'1 + 2 + 3'`` is transformed to ``'plus(plus(1, 2), 3)'``.
Sometimes this doesn't work the way you expect. For example, ``'SELECT 4 > 3 > 2'`` results in ``0``.
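If a chained comparison was actually meant as a range check, it has to be written out explicitly:

.. code-block:: sql

    SELECT 4 > 3 > 2            -- parsed as greater(greater(4, 3), 2), i.e. greater(1, 2), which is 0
    SELECT (4 > 3) AND (3 > 2)  -- the intended check; returns 1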
@ -1,4 +1,4 @@
clickhouse-local
----------------
The ``clickhouse-local`` application can quickly process local files that store tables, without resorting to deploying and configuring clickhouse-server ...
@ -1,5 +1,5 @@
Query language
==============
.. toctree::
:glob:
@ -4,7 +4,8 @@ Queries
CREATE DATABASE
~~~~~~~~~~~~~~~
Creates the 'db_name' database.
.. code-block:: sql
CREATE DATABASE [IF NOT EXISTS] db_name
A database is just a directory for tables.
@ -13,7 +14,8 @@ If "IF NOT EXISTS" is included, the query won't return an error if the database
CREATE TABLE
~~~~~~~~~~~~
The ``CREATE TABLE`` query can have several forms.
.. code-block:: sql
CREATE [TEMPORARY] TABLE [IF NOT EXISTS] [db.]name
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
@ -25,11 +27,13 @@ Creates a table named 'name' in the 'db' database or the current database if 'db
A column description is ``name type`` in the simplest case. For example: ``RegionID UInt32``.
Expressions can also be defined for default values (see below).
.. code-block:: sql
CREATE [TEMPORARY] TABLE [IF NOT EXISTS] [db.]name AS [db2.]name2 [ENGINE = engine]
Creates a table with the same structure as another table. You can specify a different engine for the table. If the engine is not specified, the same engine will be used as for the 'db2.name2' table.
.. code-block:: sql
CREATE [TEMPORARY] TABLE [IF NOT EXISTS] [db.]name ENGINE = engine AS SELECT ...
Creates a table with a structure like the result of the ``SELECT`` query, with the 'engine' engine, and fills it with data from SELECT.
@ -37,7 +41,7 @@ Creates a table with a structure like the result of the ``SELECT`` query, with t
In all cases, if IF NOT EXISTS is specified, the query won't return an error if the table already exists. In this case, the query won't do anything.
Default values
"""""""""""""""""""""
""""""""""""""
The column description can specify an expression for a default value, in one of the following ways:
``DEFAULT expr``, ``MATERIALIZED expr``, ``ALIAS expr``.
Example: ``URLDomain String DEFAULT domain(URL)``.
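As a sketch, a hypothetical table combining all three kinds of default expressions might look like this (the table name, columns, and engine parameters are made up for the example):

.. code-block:: sql

    CREATE TABLE hits_example
    (
        EventDate Date,
        URL String,
        URLDomain String DEFAULT domain(URL),            -- computed on INSERT if omitted; stored on disk
        EventWeek Date MATERIALIZED toMonday(EventDate), -- always computed; can't be inserted explicitly
        URLLength UInt64 ALIAS length(URL)               -- not stored; computed on the fly when selected
    ) ENGINE = MergeTree(EventDate, (EventDate, URL), 8192)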
@ -73,7 +77,7 @@ If you add a new column to a table but later change its default expression, the
It is not possible to set default values for elements in nested data structures.
Temporary tables
"""""""""""""""""
""""""""""""""""
In all cases, if TEMPORARY is specified, a temporary table will be created. Temporary tables have the following characteristics:
- Temporary tables disappear when the session ends, including if the connection is lost.
- A temporary table is created with the Memory engine. The other table engines are not supported.
@ -84,7 +88,7 @@ In all cases, if TEMPORARY is specified, a temporary table will be created. Temp
In most cases, temporary tables are not created manually, but when using external data for a query, or for distributed (GLOBAL) IN. For more information, see the appropriate sections.
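A minimal sketch of creating a temporary table manually (the names, including the ``hits`` table, are illustrative):

.. code-block:: sql

    CREATE TEMPORARY TABLE interesting_ids (id UInt64)
    INSERT INTO interesting_ids VALUES (1), (42), (100)
    -- Visible only in the current session; the Memory engine is used implicitly.
    SELECT count() FROM hits WHERE UserID IN (SELECT id FROM interesting_ids)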
CREATE VIEW
~~~~~~~~~~~
``CREATE [MATERIALIZED] VIEW [IF NOT EXISTS] [db.]name [ENGINE = engine] [POPULATE] AS SELECT ...``
Creates a view. There are two types of views: normal and MATERIALIZED.
@ -92,14 +96,17 @@ Creates a view. There are two types of views: normal and MATERIALIZED.
Normal views don't store any data, but just perform a read from another table. In other words, a normal view is nothing more than a saved query. When reading from a view, this saved query is used as a subquery in the FROM clause.
As an example, assume you've created a view:
.. code-block:: sql
CREATE VIEW view AS SELECT ...
and written a query:
.. code-block:: sql
SELECT a, b, c FROM view
This query is fully equivalent to using the subquery:
.. code-block:: sql
SELECT a, b, c FROM (SELECT ...)
Materialized views store data transformed by the corresponding SELECT query.
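For example, a materialized view that maintains per-day hit counts as rows are inserted into a source table might be sketched like this (the table names and the engine choice are illustrative):

.. code-block:: sql

    CREATE MATERIALIZED VIEW hits_daily
    ENGINE = SummingMergeTree(EventDate, (EventDate), 8192)
    POPULATE
    AS SELECT EventDate, count() AS hits
    FROM hits
    GROUP BY EventDate

With POPULATE, the view is backfilled from the data already in the source table; without it, only rows inserted after the view is created are transformed.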
@ -130,21 +137,24 @@ This query is used when starting the server. The server stores table metadata as
DROP
~~~~
This query has two types: ``DROP DATABASE`` and ``DROP TABLE``.
.. code-block:: sql
DROP DATABASE [IF EXISTS] db
Deletes all tables inside the 'db' database, then deletes the 'db' database itself.
If IF EXISTS is specified, it doesn't return an error if the database doesn't exist.
.. code-block:: sql
DROP TABLE [IF EXISTS] [db.]name
Deletes the table.
If ``IF EXISTS`` is specified, it doesn't return an error if the table doesn't exist or the database doesn't exist.
DETACH
~~~~~~
Deletes information about the table from the server. The server stops knowing about the table's existence.
.. code-block:: sql
DETACH TABLE [IF EXISTS] [db.]name
This does not delete the table's data or metadata. On the next server launch, the server will read the metadata and find out about the table again. Similarly, a "detached" table can be re-attached using the ATTACH query (with the exception of system tables, which do not have metadata stored for them).
@ -154,7 +164,8 @@ There is no DETACH DATABASE query.
RENAME
~~~~~~
Renames one or more tables.
.. code-block:: sql
RENAME TABLE [db11.]name11 TO [db12.]name12, [db21.]name21 TO [db22.]name22, ...
All tables are renamed under global locking. Renaming tables is a light operation. If you indicated another database after TO, the table will be moved to this database. However, the directories with databases must reside in the same file system (otherwise, an error is returned).
@ -164,15 +175,17 @@ ALTER
The ALTER query is only supported for *MergeTree type tables, as well as for Merge and Distributed types. The query has several variations.
Column manipulations
""""""""""""""""""""""""
""""""""""""""""""""
Lets you change the table structure.
.. code-block:: sql
ALTER TABLE [db].name ADD|DROP|MODIFY COLUMN ...
In the query, specify a list of one or more comma-separated actions. Each action is an operation on a column.
The following actions are supported:
.. code-block:: sql
ADD COLUMN name [type] [default_expr] [AFTER name_after]
Adds a new column to the table with the specified name, type, and default expression (see the section "Default expressions"). If you specify 'AFTER name_after' (the name of another column), the column is added after the specified one in the list of table columns. Otherwise, the column is added to the end of the table. Note that there is no way to add a column to the beginning of a table. For a chain of actions, 'name_after' can be the name of a column that is added in one of the previous actions.
@ -223,7 +236,7 @@ For tables that don't store data themselves (Merge and Distributed), ALTER just
The ALTER query for changing columns is replicated. The instructions are saved in ZooKeeper, then each replica applies them. All ALTER queries are run in the same order. The query waits for the appropriate actions to be completed on the other replicas. However, a query to change columns in a replicated table can be interrupted, and all actions will be performed asynchronously.
Manipulations with partitions and parts
""""""""""""""""""""""""""""""""""
"""""""""""""""""""""""""""""""""""""""
Only works for tables in the MergeTree family. The following operations are available:
* ``DETACH PARTITION`` - Move a partition to the 'detached' directory and forget it.
@ -239,7 +252,9 @@ A partition in a table is data for a single calendar month. This is determined b
A "part" in the table is part of the data from a single partition, sorted by the primary key.
You can use the ``system.parts`` table to view the set of table parts and partitions:
.. code-block:: text
SELECT * FROM system.parts WHERE active
``active`` - Only count active parts. Inactive parts are, for example, source parts remaining after merging to a larger part - these parts are deleted approximately 10 minutes after merging.
@ -248,7 +263,8 @@ Another way to view a set of parts and partitions is to go into the directory wi
The directory with data is
/var/lib/clickhouse/data/database/table/,
where /var/lib/clickhouse/ is the path to ClickHouse data, 'database' is the database name, and 'table' is the table name. Example:
.. code-block:: bash
$ ls -l /var/lib/clickhouse/data/test/visits/
total 48
drwxrwxrwx 2 clickhouse clickhouse 20480 May 13 02:58 20140317_20140323_2_2_0
@ -271,8 +287,9 @@ Each part corresponds to a single partition and contains data for a single month
On an operating server, you can't manually change the set of parts or their data on the file system, since the server won't know about it. For non-replicated tables, you can do this when the server is stopped, but we don't recommended it. For replicated tables, the set of parts can't be changed in any case.
The 'detached' directory contains parts that are not used by the server - detached from the table using the ALTER ... DETACH query. Parts that are damaged are also moved to this directory, instead of deleting them. You can add, delete, or modify the data in the 'detached' directory at any time - the server won't know about this until you make the ALTER TABLE ... ATTACH query.
.. code-block:: sql
ALTER TABLE [db.]table DETACH PARTITION 'name'
Move all data for partitions named 'name' to the 'detached' directory and forget about them.
The partition name is specified in YYYYMM format. It can be indicated in single quotes or without them.
@ -280,11 +297,13 @@ The partition name is specified in YYYYMM format. It can be indicated in single
After the query is executed, you can do whatever you want with the data in the 'detached' directory — delete it from the file system, or just leave it.
The query is replicated - data will be moved to the 'detached' directory and forgotten on all replicas. The query can only be sent to a leader replica. To find out if a replica is a leader, perform SELECT to the 'system.replicas' system table. Alternatively, it is easier to make a query on all replicas, and all except one will throw an exception.
.. code-block:: sql
ALTER TABLE [db.]table DROP PARTITION 'name'
Similar to the DETACH operation. Deletes data from the table. Data parts will be tagged as inactive and will be completely deleted in approximately 10 minutes. The query is replicated - data will be deleted on all replicas.
.. code-block:: sql
ALTER TABLE [db.]table ATTACH PARTITION|PART 'name'
Adds data to the table from the 'detached' directory.
@ -294,7 +313,8 @@ It is possible to add data for an entire partition or a separate part. For a par
The query is replicated. Each replica checks whether there is data in the 'detached' directory. If there is data, it checks the integrity, verifies that it matches the data on the server that initiated the query, and then adds it if everything is correct. If not, it downloads data from the query requestor replica, or from another replica where the data has already been added.
So you can put data in the 'detached' directory on one replica, and use the ALTER ... ATTACH query to add it to the table on all replicas.
.. code-block:: sql
ALTER TABLE [db.]table FREEZE PARTITION 'name'
Creates a local backup of one or multiple partitions. The name can be the full name of the partition (for example, 201403), or its prefix (for example, 2014) - then the backup will be created for all the corresponding partitions.
@ -328,13 +348,14 @@ In this way, data from the backup will be added to the table.
Restoring from a backup doesn't require stopping the server.
Backups and replication
"""""""""""""""""""
"""""""""""""""""""""""
Replication provides protection from device failures. If all data disappeared on one of your replicas, follow the instructions in the "Restoration after failure" section to restore it.
For protection from device failures, you must use replication. For more information about replication, see the section "Data replication".
Backups protect against human error (accidentally deleting data, deleting the wrong data or in the wrong cluster, or corrupting data). For high-volume databases, it can be difficult to copy backups to remote servers. In such cases, to protect from human error, you can keep a backup on the same server (it will reside in /var/lib/clickhouse/shadow/).
.. code-block:: sql
ALTER TABLE [db.]table FETCH PARTITION 'name' FROM 'path-in-zookeeper'
This query only works for replicatable tables.
@ -351,7 +372,7 @@ Before downloading, the system checks that the partition exists and the table st
The ALTER ... FETCH PARTITION query is not replicated. The partition will be downloaded to the 'detached' directory only on the local server. Note that if after this you use the ALTER TABLE ... ATTACH query to add data to the table, the data will be added on all replicas (on one of the replicas it will be added from the 'detached' directory, and on the rest it will be loaded from neighboring replicas).
Synchronicity of ALTER queries
"""""""""""""""""""""""""""
""""""""""""""""""""""""""""""
For non-replicatable tables, all ALTER queries are performed synchronously. For replicatable tables, the query just adds instructions for the appropriate actions to ZooKeeper, and the actions themselves are performed as soon as possible. However, the query can wait for these actions to be completed on all the replicas.
For ``ALTER ... ATTACH|DETACH|DROP`` queries, you can use the ``'replication_alter_partitions_sync'`` setting to set up waiting.
@ -623,7 +644,7 @@ Allows executing JOIN with an array or nested data structure. The intent is simi
ARRAY JOIN is essentially INNER JOIN with an array. Example:
.. code-block:: text
:) CREATE TABLE arrays_test (s String, arr Array(UInt8)) ENGINE = Memory
@ -684,6 +705,8 @@ An alias can be specified for an array in the ARRAY JOIN clause. In this case, a
FROM arrays_test
ARRAY JOIN arr AS a
.. code-block:: text
┌─s─────┬─arr─────┬─a─┐
│ Hello │ [1,2] │ 1 │
│ Hello │ [1,2] │ 2 │
@ -697,7 +720,7 @@ An alias can be specified for an array in the ARRAY JOIN clause. In this case, a
Multiple arrays of the same size can be comma-separated in the ARRAY JOIN clause. In this case, JOIN is performed with them simultaneously (the direct sum, not the direct product).
Example:
.. code-block:: text
:) SELECT s, arr, a, num, mapped FROM arrays_test ARRAY JOIN arr AS a, arrayEnumerate(arr) AS num, arrayMap(x -> x + 1, arr) AS mapped
@ -733,7 +756,7 @@ Example:
ARRAY JOIN also works with nested data structures. Example:
.. code-block:: text
:) CREATE TABLE nested_test (s String, nest Nested(x UInt8, y UInt32)) ENGINE = Memory
@ -788,7 +811,7 @@ ARRAY JOIN also works with nested data structures. Example:
When specifying names of nested data structures in ARRAY JOIN, the meaning is the same as ARRAY JOIN with all the array elements that it consists of. Example:
.. code-block:: text
:) SELECT s, nest.x, nest.y FROM nested_test ARRAY JOIN nest.x, nest.y
@ -816,6 +839,8 @@ This variation also makes sense:
FROM nested_test
ARRAY JOIN `nest.x`
.. code-block:: text
┌─s─────┬─nest.x─┬─nest.y─────┐
│ Hello │ 1 │ [10,20] │
│ Hello │ 2 │ [10,20] │
@ -836,6 +861,8 @@ An alias may be used for a nested data structure, in order to select either the
FROM nested_test
ARRAY JOIN nest AS n
.. code-block:: text
┌─s─────┬─n.x─┬─n.y─┬─nest.x──┬─nest.y─────┐
│ Hello │ 1 │ 10 │ [1,2] │ [10,20] │
│ Hello │ 2 │ 20 │ [1,2] │ [10,20] │
@ -856,6 +883,8 @@ Example of using the arrayEnumerate function:
FROM nested_test
ARRAY JOIN nest AS n, arrayEnumerate(`nest.x`) AS num
.. code-block:: text
┌─s─────┬─n.x─┬─n.y─┬─nest.x──┬─nest.y─────┬─num─┐
│ Hello │ 1 │ 10 │ [1,2] │ [10,20] │ 1 │
│ Hello │ 2 │ 20 │ [1,2] │ [10,20] │ 2 │
@ -935,6 +964,8 @@ Example:
ORDER BY hits DESC
LIMIT 10
.. code-block:: text
┌─CounterID─┬───hits─┬─visits─┐
│ 1143050 │ 523264 │ 13665 │
│ 731962 │ 475698 │ 102716 │
@ -1030,7 +1061,7 @@ GROUP BY is not supported for array columns.
A constant can't be specified as an argument for an aggregate function. Example: ``sum(1)``. Instead, you can get rid of the constant. Example: ``count()``.
WITH TOTALS modifier
^^^^^^^^^^^^^^^^^^^^
If the WITH TOTALS modifier is specified, another row will be calculated. This row will have key columns containing default values (zeros or empty strings), and columns of aggregate functions with the values calculated across all the rows (the "total" values).
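A short sketch (the ``hits`` table and its columns are placeholders):

.. code-block:: sql

    SELECT CounterID, count() AS c
    FROM hits
    GROUP BY CounterID
    WITH TOTALS
    ORDER BY c DESC
    LIMIT 3

The totals row is output separately from the main result (for example, in JSON* formats it appears in a separate 'totals' field).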
@ -1056,7 +1087,7 @@ If 'max_rows_to_group_by' and 'group_by_overflow_mode = 'any'' are not used, all
You can use WITH TOTALS in subqueries, including subqueries in the JOIN clause. In this case, the respective total values are combined.
external memory GROUP BY
^^^^^^^^^^^^^^^^^^^^^^^^
It is possible to turn on spilling temporary data to disk to limit memory consumption during the execution of GROUP BY. The value of the ``max_bytes_before_external_group_by`` setting determines the maximum memory consumption before temporary data is dumped to the file system. If it is 0 (the default value), the feature is turned off.
@ -1071,7 +1102,7 @@ If external aggregation is turned on and total memory consumption was less than
If you have an ORDER BY clause with some small LIMIT after a GROUP BY, then ORDER BY will not consume significant amount of memory. But if no LIMIT is provided, don't forget to turn on external sorting (``max_bytes_before_external_sort``).
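A sketch of turning the feature on for one session (the threshold and the query are illustrative):

.. code-block:: sql

    -- Allow GROUP BY to spill temporary data to disk after roughly 10 GB of RAM.
    SET max_bytes_before_external_group_by = 10000000000
    SELECT UserID, count() FROM hits GROUP BY UserID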
LIMIT N BY modifier
^^^^^^^^^^^^^^^^^^^
LIMIT ``N`` BY ``COLUMNS`` allows you to restrict the result to the top ``N`` rows for each group of ``COLUMNS``. ``LIMIT N BY`` is unrelated to the ``LIMIT`` clause. The key for ``LIMIT N BY`` can contain an arbitrary number of columns or expressions.
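For example, to keep only the top 3 URLs for each counter (illustrative schema):

.. code-block:: sql

    SELECT CounterID, URL, count() AS c
    FROM hits
    GROUP BY CounterID, URL
    ORDER BY CounterID, c DESC
    LIMIT 3 BY CounterID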
@ -1241,6 +1272,8 @@ Example:
GROUP BY EventDate
ORDER BY EventDate ASC
.. code-block:: text
┌──EventDate─┬────ratio─┐
│ 2014-03-17 │ 1 │
│ 2014-03-18 │ 0.807696 │
@ -1255,7 +1288,7 @@ Example:
A subquery in the IN clause is always run just one time on a single server. There are no dependent subqueries.
Distributed subqueries
"""""""""""""""""""""""""
""""""""""""""""""""""
There are two versions of INs with subqueries (and for JOINs): the regular ``IN`` / ``JOIN``, and ``GLOBAL IN`` / ``GLOBAL JOIN``. They differ in how they are run for distributed query processing.
@ -1349,7 +1382,7 @@ This is more optimal than using the normal IN. However, keep the following point
It also makes sense to specify a local table in the GLOBAL IN clause, in case this local table is only available on the requestor server and you want to use data from it on remote servers.
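A sketch of the difference (``distributed_table`` and ``local_table`` are placeholders):

.. code-block:: sql

    -- Regular IN: the subquery is run on each remote server against its own local data.
    SELECT count() FROM distributed_table
    WHERE UserID IN (SELECT UserID FROM local_table)

    -- GLOBAL IN: the subquery is run once on the requestor server, and its result
    -- is sent to every remote server as a temporary table.
    SELECT count() FROM distributed_table
    WHERE UserID GLOBAL IN (SELECT UserID FROM local_table)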
Extreme values
""""""""""""""""""""""
""""""""""""""
In addition to results, you can also get minimum and maximum values for the results columns. To do this, set the 'extremes' setting to '1'. Minimums and maximums are calculated for numeric types, dates, and dates with times. For other columns, the default values are output.
@ -1360,7 +1393,7 @@ In JSON* formats, the extreme values are output in a separate 'extremes' field.
Extreme values are calculated for rows that have passed through LIMIT. However, when using 'LIMIT offset, size', the rows before 'offset' are included in 'extremes'. In stream requests, the result may also include a small number of rows that passed through LIMIT.
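For example (the table name is a placeholder; any small query will do):

.. code-block:: sql

    SET extremes = 1
    -- In JSON* formats, the output gains an 'extremes' field with the per-column
    -- minimums and maximums over the result rows.
    SELECT EventDate, count() AS c FROM hits GROUP BY EventDate FORMAT JSON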
Notes
"""""""""
"""""
The GROUP BY and ORDER BY clauses do not support positional arguments. This contradicts MySQL, but conforms to standard SQL.
For example, ``'GROUP BY 1, 2'`` will be interpreted as grouping by constants (i.e. aggregation of all rows into one).
@ -1,9 +1,10 @@
Syntax
------
There are two types of parsers in the system: a full SQL parser (a recursive descent parser), and a data format parser (a fast stream parser). In all cases except the INSERT query, only the full SQL parser is used.
The INSERT query uses both parsers:
.. code-block:: sql
INSERT INTO t VALUES (1, 'Hello, world'), (2, 'abc'), (3, 'def')
The ``INSERT INTO t VALUES`` fragment is parsed by the full parser, and the data ``(1, 'Hello, world'), (2, 'abc'), (3, 'def')`` is parsed by the fast stream parser.
@ -14,21 +15,21 @@ When using the Values format in an ``INSERT`` query, it may seem that data is pa
Next we will cover the full parser. For more information about format parsers, see the section "Formats".
Spaces
~~~~~~
There may be any number of space symbols between syntactical constructions (including the beginning and end of a query). Space symbols include the space, tab, line break, CR, and form feed.
Comments
~~~~~~~~
SQL-style and C-style comments are supported.
SQL-style comments: from ``--`` to the end of the line. The space after ``--`` can be omitted.
C-style comments: from ``/*`` to ``*/``. These comments can be multiline. Spaces are not required here, either.
Keywords
~~~~~~~~
Keywords (such as SELECT) are not case-sensitive. Everything else (column names, functions, and so on), in contrast to standard SQL, is case-sensitive. Keywords are not reserved (they are just parsed as keywords in the corresponding context).
Identifiers
~~~~~~~~~~~
Identifiers (column names, functions, and data types) can be quoted or non-quoted.
Non-quoted identifiers start with a Latin letter or underscore, and continue with a Latin letter, underscore, or number. In other words, they must match the regex ``^[a-zA-Z_][0-9a-zA-Z_]*$``. Examples: ``x``, ``_1``, ``X_y__Z123_``.
Quoted identifiers are placed in reversed quotation marks ```id``` (the same as in MySQL), and can indicate any set of bytes (non-empty). In addition, symbols (for example, the reverse quotation mark) inside this type of identifier can be backslash-escaped. Escaping rules are the same as for string literals (see below).
@ -39,7 +40,7 @@ Literals
There are numeric literals, string literals, and compound literals.
Numeric literals
"""""""""""""""""
""""""""""""""""
A numeric literal is parsed as follows:
- first as a 64-bit signed number, using the 'strtoll' function.
- if unsuccessful, as a 64-bit unsigned number, using the 'strtoull' function.
@ -52,20 +53,20 @@ For example, 1 is parsed as UInt8, but 256 is parsed as UInt16. For more informa
Examples: ``1``, ``18446744073709551615``, ``0xDEADBEEF``, ``01``, ``0.1``, ``1e100``, ``-1e-100``, ``inf``, ``nan``.
String literals
""""""""""""""""""
"""""""""""""""
Only string literals in single quotes are supported. The enclosed characters can be backslash-escaped. The following escape sequences have special meanings: ``\b``, ``\f``, ``\r``, ``\n``, ``\t``, ``\0``, ``\a``, ``\v``, ``\xHH``. In all other cases, escape sequences like \c, where c is any character, are transformed to c. This means that the sequences ``\'`` and ``\\`` can be used. The value will have the String type.
The minimum set of symbols that must be escaped in a string literal is ``'`` and ``\``.
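For example (the expected output is shown in a comment):

.. code-block:: sql

    SELECT 'It\'s a test', 'back\\slash', '\x41'
    -- returns: It's a test | back\slash | A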
Compound literals
""""""""""""""""""
"""""""""""""""""
Constructions are supported for arrays: ``[1, 2, 3]`` and tuples: ``(1, 'Hello, world!', 2)``.
Actually, these are not literals, but expressions with the array creation operator and the tuple creation operator, respectively. For more information, see the section "Operators".
An array must consist of at least one item, and a tuple must have at least two items.
Tuples have a special purpose for use in the IN clause of a SELECT query. Tuples can be obtained as the result of a query, but they can't be saved to a database (with the exception of Memory-type tables).
Functions
~~~~~~~~~
Functions are written like an identifier with a list of arguments (possibly empty) in brackets. In contrast to standard SQL, the brackets are required, even for an empty arguments list. Example: ``now()``.
There are regular and aggregate functions (see the section "Aggregate functions"). Some aggregate functions can contain two lists of arguments in brackets. Example: ``quantile(0.9)(x)``. These aggregate functions are called "parametric" functions, and the arguments in the first list are called "parameters". The syntax of aggregate functions without parameters is the same as for regular functions.
@ -76,23 +77,24 @@ For example, the expression ``1 + 2 * 3 + 4`` is transformed to ``plus(plus(1, m
For more information, see the section "Operators" below.
Data types and database table engines
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Data types and table engines in the ``CREATE`` query are written the same way as identifiers or functions. In other words, they may or may not contain an arguments list in brackets. For more information, see the sections "Data types," "Table engines," and "CREATE".
Synonyms
~~~~~~~~
In the SELECT query, expressions can specify synonyms using the AS keyword. Any expression is placed to the left of AS. The identifier name for the synonym is placed to the right of AS. As opposed to standard SQL, synonyms are not only declared on the top level of expressions:
.. code-block:: sql
SELECT (1 AS n) + 2, n
In contrast to standard SQL, synonyms can be used in all parts of a query, not just ``SELECT``.
Asterisk
~~~~~~~~
In a ``SELECT`` query, an asterisk can replace the expression. For more information, see the section "SELECT".
Expressions
~~~~~~~~~~~
An expression is a function, identifier, literal, application of an operator, expression in brackets, subquery, or asterisk. It can also contain a synonym.
A list of expressions is one or more expressions separated by commas.
Functions and operators, in turn, can have expressions as arguments.
@ -0,0 +1,20 @@
Roadmap
=======
Q3 2017
-------
* ``SYSTEM`` queries
* Limit on parallel replica downloads
* Finalize ``NULL`` support
* ``SELECT db.table.column``
Q4 2017
-------
* Arbitrary partitioning key for MergeTree engine family
* Better compliance of ``JOIN`` syntax with SQL standard
* Resource pools for queries (CPU, disk I/O, network bandwidth)
Q1 2018
-------
* Basic support for ``UPDATE`` and ``DELETE``
@ -1,5 +1,5 @@
Settings
========
In this section, we review settings that you can make using a SET query or in a config file. Remember that these settings can be set for a session or globally. Settings that can only be made in the server config file are not covered here.
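For example, a setting can be changed for the current session with a SET query (the value here is arbitrary):

.. code-block:: sql

    SET max_threads = 4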
@ -1,5 +1,5 @@
Restrictions on query complexity
================================
Restrictions on query complexity are part of the settings.
They are used in order to provide safer execution from the user interface.
Almost all the restrictions only apply to SELECTs.
@ -16,7 +16,7 @@ It can take one of two values: 'throw' or 'break'. Restrictions on aggregation (
``any`` (only for group_by_overflow_mode) - Continuing aggregation for the keys that got into the set, but don't add new keys to the set.
readonly
--------
If set to 0, any queries are allowed.
If set to 1, only queries that don't change data or settings (e.g. SELECT or SHOW) are allowed. INSERT and SET are forbidden.
If set to 2, queries that don't change data (SELECT, SHOW) are allowed, as well as changing settings (SET).
@ -26,7 +26,7 @@ After you set the read-only mode, you won't be able to disable it in the current
When using the GET method in the HTTP interface, 'readonly = 1' is set automatically. In other words, for queries that modify data, you can only use the POST method. You can send the query itself either in the POST body, or in the URL parameter.
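A sketch of the resulting behavior within one session:

.. code-block:: sql

    SET readonly = 1
    SELECT 1            -- still allowed
    SET readonly = 0    -- throws: with readonly = 1, settings can no longer be changed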
max_memory_usage
----------------
The maximum amount of memory consumption when running a query on a single server. By default, 10 GB.
The setting doesn't consider the volume of available memory or the total volume of memory on the machine.
@ -41,133 +41,133 @@ Certain cases of memory consumption are not tracked:
Memory consumption is not fully considered for aggregate function states ``min``, ``max``, ``any``, ``anyLast``, ``argMin``, and ``argMax`` from String and Array arguments.
max_rows_to_read
----------------
The following restrictions can be checked on each block (instead of on each row). That is, the restrictions can be broken a little.
When running a query in multiple threads, the following restrictions apply to each thread separately.
Maximum number of rows that can be read from a table when running a query.
max_bytes_to_read
-----------------
Maximum number of bytes (uncompressed data) that can be read from a table when running a query.
read_overflow_mode
------------------
What to do when the volume of data read exceeds one of the limits: ``throw`` or ``break``. By default, throw.
max_rows_to_group_by
--------------------
Maximum number of unique keys received from aggregation. This setting lets you limit memory consumption when aggregating.
group_by_overflow_mode
----------------------
What to do when the number of unique keys for aggregation exceeds the limit: ``throw``, ``break``, or ``any``. By default, throw.
Using the 'any' value lets you run an approximation of GROUP BY. The quality of this approximation depends on the statistical nature of the data.
max_rows_to_sort
----------------
Maximum number of rows before sorting. This allows you to limit memory consumption when sorting.
max_bytes_to_sort
-----------------
Maximum number of bytes before sorting.
sort_overflow_mode
------------------
What to do if the number of rows received before sorting exceeds one of the limits: ``throw`` or ``break``. By default, throw.
max_result_rows
---------------
Limit on the number of rows in the result. Also checked for subqueries, and on remote servers when running parts of a distributed query.
max_result_bytes
----------------
Limit on the number of bytes in the result. The same as the previous setting.
result_overflow_mode
--------------------
What to do if the volume of the result exceeds one of the limits: ``throw`` or ``break``. By default, throw.
Using ``break`` is similar to using ``LIMIT``.
max_execution_time
------------------
Maximum query execution time in seconds.
At this time, it is not checked for one of the sorting stages, or when merging and finalizing aggregate functions.
timeout_overflow_mode
---------------------
What to do if the query is run longer than ``max_execution_time``: ``throw`` or ``break``. By default, throw.
min_execution_speed
-------------------
Minimal execution speed in rows per second. Checked on every data block when ``timeout_before_checking_execution_speed`` expires. If the execution speed is lower, an exception is thrown.
timeout_before_checking_execution_speed
---------------------------------------
Checks that execution speed is not too slow (no less than ``min_execution_speed``), after the specified time in seconds has expired.
max_columns_to_read
-------------------
Maximum number of columns that can be read from a table in a single query. If a query requires reading a greater number of columns, it throws an exception.
max_temporary_columns
---------------------
Maximum number of temporary columns that must be kept in RAM at the same time when running a query, including constant columns. If there are more temporary columns than this, it throws an exception.
max_temporary_non_const_columns
-------------------------------
The same thing as 'max_temporary_columns', but without counting constant columns.
Note that constant columns are formed fairly often when running a query, but they require approximately zero computing resources.
max_subquery_depth
------------------
Maximum nesting depth of subqueries. If subqueries are deeper, an exception is thrown. By default, 100.
max_pipeline_depth
------------------
Maximum pipeline depth. Corresponds to the number of transformations that each data block goes through during query processing. Counted within the limits of a single server. If the pipeline depth is greater, an exception is thrown. By default, 1000.
max_ast_depth
-------------
Maximum nesting depth of a query syntactic tree. If exceeded, an exception is thrown. At this time, it isn't checked during parsing, but only after parsing the query. That is, a syntactic tree that is too deep can be created during parsing, but the query will fail. By default, 1000.
max_ast_elements
----------------
Maximum number of elements in a query syntactic tree. If exceeded, an exception is thrown.
In the same way as the previous setting, it is checked only after parsing the query. By default, 10,000.
max_rows_in_set
---------------
Maximum number of rows for a data set in the IN clause created from a subquery.
max_bytes_in_set
----------------
Maximum number of bytes (uncompressed data) used by a set in the IN clause created from a subquery.
set_overflow_mode
-----------------
What to do when the amount of data exceeds one of the limits: ``throw`` or ``break``. By default, throw.
max_rows_in_distinct
--------------------
Maximum number of different rows when using DISTINCT.
max_bytes_in_distinct
---------------------
Maximum number of bytes used by a hash table when using DISTINCT.
distinct_overflow_mode
----------------------
What to do when the amount of data exceeds one of the limits: ``throw`` or ``break``. By default, throw.
max_rows_to_transfer
--------------------
Maximum number of rows that can be passed to a remote server or saved in a temporary table when using GLOBAL IN.
max_bytes_to_transfer
---------------------
Maximum number of bytes (uncompressed data) that can be passed to a remote server or saved in a temporary table when using GLOBAL IN.
transfer_overflow_mode
----------------------
What to do when the amount of data exceeds one of the limits: ``throw`` or ``break``. By default, throw.
@ -7,7 +7,7 @@ By default, it is 65,536.
Blocks of 'max_block_size' rows are not always loaded from the table. If it is obvious that less data needs to be retrieved, a smaller block is processed.
max_insert_block_size
---------------------
The size of blocks to form for insertion into a table.
This setting only applies in cases when the server forms the blocks.
For example, for an INSERT via the HTTP interface, the server parses the data format and forms blocks of the specified size.
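As a sketch, the block size can be raised for a single session before a bulk INSERT (the value here is arbitrary, not a recommendation):

.. code-block:: sql

SET max_insert_block_size = 2097152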
@ -26,7 +26,7 @@ The maximum number of query processing threads
This parameter applies to threads that perform the same stages of the query execution pipeline in parallel.
For example, if reading from a table, evaluating expressions with functions, filtering with WHERE and pre-aggregating for GROUP BY can all be done in parallel using at least ``max_threads`` number of threads, then 'max_threads' are used.
By default, ``8``.
By default, 8.
If less than one SELECT query is normally run on a server at a time, set this parameter to a value slightly less than the actual number of processor cores.
@ -35,13 +35,13 @@ For queries that are completed quickly because of a LIMIT, you can set a lower `
The smaller the ``max_threads`` value, the less memory is consumed.
max_compress_block_size
-----------
-----------------------
The maximum size of blocks of uncompressed data before compressing for writing to a table. By default, ``1,048,576 (1 MiB)``. If the size is reduced, the compression rate is significantly reduced, the compression and decompression speed increases slightly due to cache locality, and memory consumption is reduced. There usually isn't any reason to change this setting.
Don't confuse blocks for compression (a chunk of memory consisting of bytes) and blocks for query processing (a set of rows from a table).
min_compress_block_size
--------------
-----------------------
For *MergeTree tables. In order to reduce latency when processing queries, a block is compressed when writing the next mark if its size is at least ``min_compress_block_size``. By default, 65,536.
The actual size of the block, if the uncompressed data is less than ``max_compress_block_size``, is no less than this value and no less than the volume of data for one mark.
@ -55,68 +55,71 @@ We are writing a URL column with the String type (average size of 60 bytes per v
There usually isn't any reason to change this setting.
max_query_size
-----------
--------------
The maximum part of a query that can be taken to RAM for parsing with the SQL parser.
The INSERT query also contains data for INSERT that is processed by a separate stream parser (that consumes O(1) RAM), which is not included in this restriction.
``By default, 256 KiB.``
By default, 256 KiB.
interactive_delay
-------------
-----------------
The interval in microseconds for checking whether request execution has been canceled and sending the progress.
By default, 100,000 (check for canceling and send progress ten times per second).
connect_timeout
-----------
---------------
receive_timeout
---------
---------------
send_timeout
---------
------------
Timeouts in seconds on the socket used for communicating with the client.
``By default, 10, 300, 300.``
By default, 10, 300, 300.
poll_interval
----------
-------------
Lock in a wait loop for the specified number of seconds.
``By default, 10``.
By default, 10.
max_distributed_connections
----------------
---------------------------
The maximum number of simultaneous connections with remote servers for distributed processing of a single query to a single Distributed table. We recommend setting a value no less than the number of servers in the cluster.
``By default, 100.``
By default, 100.
The following parameters are only used when creating Distributed tables (and when launching a server), so there is no reason to change them at runtime.
distributed_connections_pool_size
-------------------
---------------------------------
The maximum number of simultaneous connections with remote servers for distributed processing of all queries to a single Distributed table. We recommend setting a value no less than the number of servers in the cluster.
``By default, 128.``
By default, 128.
connect_timeout_with_failover_ms
----------------
--------------------------------
The timeout in milliseconds for connecting to a remote server for a Distributed table engine, if the 'shard' and 'replica' sections are used in the cluster definition.
If unsuccessful, several attempts are made to connect to various replicas.
``By default, 50.``
By default, 50.
connections_with_failover_max_tries
----------------
-----------------------------------
The maximum number of connection attempts with each replica, for the Distributed table engine.
``By default, 3.``
By default, 3.
extremes
-----
--------
Whether to count extreme values (the minimums and maximums in columns of a query result).
Accepts 0 or 1. By default, 0 (disabled).
For more information, see the section "Extreme values".
use_uncompressed_cache
----------
----------------------
Whether to use a cache of uncompressed blocks. Accepts 0 or 1. By default, 0 (disabled).
The uncompressed cache (only for tables in the MergeTree family) allows significantly reducing latency and increasing throughput when working with a large number of short queries. Enable this setting for users who send frequent short requests. Also pay attention to the ``uncompressed_cache_size`` configuration parameter (only set in the config file) - the size of uncompressed cache blocks.
By default, it is 8 GiB. The uncompressed cache is filled in as needed; the least-used data is automatically deleted.
@ -124,26 +127,26 @@ By default, it is 8 GiB. The uncompressed cache is filled in as needed; the leas
For queries that read at least a somewhat large volume of data (one million rows or more), the uncompressed cache is disabled automatically in order to save space for truly small queries. So you can keep the ``use_uncompressed_cache`` setting always set to 1.
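For example, to enable it for the current session (this assumes the server-side ``uncompressed_cache_size`` is left at its default):

.. code-block:: sql

SET use_uncompressed_cache = 1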
replace_running_query
-----------
---------------------
When using the HTTP interface, the 'query_id' parameter can be passed. This is any string that serves as the query identifier.
If a query from the same user with the same 'query_id' already exists at this time, the behavior depends on the 'replace_running_query' parameter.
``0 (default)`` - Throw an exception (don't allow the query to run if a query with the same 'query_id' is already running).
``0`` (default) - Throw an exception (don't allow the query to run if a query with the same 'query_id' is already running).
``1`` - Cancel the old query and start running the new one.
Yandex.Metrica uses this parameter set to 1 for implementing suggestions for segmentation conditions. After entering the next character, if the old query hasn't finished yet, it should be canceled.
load_balancing
-----------
--------------
Which replicas (among the healthy ones) a query is preferentially sent to (on the first attempt) for distributed query processing.
random (по умолчанию)
~~~~~~~~~~~~~~~~
random (by default)
~~~~~~~~~~~~~~~~~~~
The number of errors is counted for each replica. The query is sent to the replica with the fewest errors, and if there are several of these, to any one of them.
Disadvantages: Server proximity is not accounted for; if the replicas have different data, you will also get different data.
nearest_hostname
~~~~~~~~~
~~~~~~~~~~~~~~~~
The number of errors is counted for each replica. Every 5 minutes, the number of errors is integrally divided by 2. Thus, the number of errors is calculated for a recent time with exponential smoothing. If there is one replica with a minimal number of errors (i.e. errors occurred recently on the other replicas), the query is sent to it. If there are multiple replicas with the same minimal number of errors, the query is sent to the replica whose host name is most similar to the server's host name in the config file (counting the number of characters that differ at identical positions, up to the shorter of the two host names).
As an example, example01-01-1 and example01-01-2.yandex.ru are different in one position, while example01-01-1 and example01-02-2 differ in two places.
@ -153,7 +156,7 @@ Thus, if there are equivalent replicas, the closest one by name is preferred.
We can also assume that when sending a query to the same server, in the absence of failures, a distributed query will also go to the same servers. So even if different data is placed on the replicas, the query will return mostly the same results.
in_order
~~~~~~~
~~~~~~~~
Replicas are accessed in the same order as they are specified. The number of errors does not matter. This method is appropriate when you know exactly which replica is preferable.
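For example, to pin the current session to this strategy:

.. code-block:: sql

SET load_balancing = 'in_order'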
totals_mode
@ -162,19 +165,19 @@ How to calculate TOTALS when HAVING is present, as well as when max_rows_to_grou
See the section "WITH TOTALS modifier".
totals_auto_threshold
--------------
---------------------
The threshold for ``totals_mode = 'auto'``.
See the section "WITH TOTALS modifier".
default_sample
----------
--------------
A floating-point number from 0 to 1. By default, 1.
Allows setting a default sampling coefficient for all SELECT queries.
(For tables that don't support sampling, an exception will be thrown.)
If set to 1, default sampling is not performed.
max_parallel_replicas
---------------
---------------------
The maximum number of replicas of each shard used when the query is executed.
For consistency (to get different parts of the same data split), this option only works when a sampling key is set.
The lag of the replicas is not controlled.
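A minimal sketch for one session (the value 2 is arbitrary; remember that the option has no effect unless the table has a sampling key):

.. code-block:: sql

SET max_parallel_replicas = 2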
@ -187,7 +190,7 @@ Compilation is provided for only part of the request processing pipeline - for t
In the event that this part of the pipeline was compiled, the query can work faster due to the unrolling of short loops and the inlining of aggregate function calls. The maximum performance increase (up to four times in rare cases) is achieved on queries with several simple aggregate functions. Typically, the performance gain is negligible. In very rare cases, the query may be slowed down.
min_count_to_compile
---------------
--------------------
The number of times a compiled chunk of code could potentially be used before its compilation is actually performed. By default, 3.
If the value is zero, compilation is executed synchronously and the query waits for the compilation process to finish before continuing. This can be used for testing; otherwise, use values starting with 1. Compilation typically takes about 5-10 seconds.
If the value is 1 or more, compilation is performed asynchronously in a separate thread. When the result is ready, it is used immediately, including by queries that are already running at that moment.
@ -196,13 +199,13 @@ The compiled code is required for each different combination of aggregate functi
The compilation results are saved in the build directory as .so files. The number of compilation results is unlimited, since they do not take up much space. When the server is restarted, the old results will be used, except for the server update - then the old results are deleted.
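As a hedged example, synchronous compilation can be forced for testing as described above (this assumes the ``compile`` setting is available in your build):

.. code-block:: sql

SET compile = 1
SET min_count_to_compile = 0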
input_format_skip_unknown_fields
----------------
--------------------------------
If the parameter is true, the INSERT operation skips columns with unknown names in the input data.
Otherwise, an exception is generated; this is the default behavior.
The parameter works only for JSONEachRow and TSKV input formats.
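A minimal sketch (the ``test.events`` table and its single ``known`` column are hypothetical):

.. code-block:: sql

SET input_format_skip_unknown_fields = 1
INSERT INTO test.events FORMAT JSONEachRow {"known": 1, "unknown": 2}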
output_format_json_quote_64bit_integers
-----------------
---------------------------------------
If the parameter is true (default value), UInt64 and Int64 numbers are printed as quoted strings in all JSON output formats.
Such behavior is compatible with most JavaScript interpreters, which store all numbers as double-precision floating-point values.
Otherwise, they are printed as regular numbers.
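For example, with the setting disabled the value below is printed as a bare number; with the default it would be quoted (``toUInt64`` is a standard conversion function):

.. code-block:: sql

SET output_format_json_quote_64bit_integers = 0
SELECT toUInt64(1) AS x FORMAT JSON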
View File
@ -1,13 +1,15 @@
Settings profiles
================
=================
A settings profile is a collection of settings grouped under the same name. Each ClickHouse user has a profile.
To apply all the settings in a profile, set 'profile'. Example:
::
.. code-block:: sql
SET profile = 'web'
- Load the 'web' profile. That is, set all the options belonging to the 'web' profile.
Load the 'web' profile. That is, set all the options belonging to the 'web' profile.
Settings profiles are declared in the user config file. This is normally 'users.xml'.
Example:
.. code-block:: xml
View File
@ -1,5 +1,5 @@
System tables
==========
=============
System tables are used for implementing part of the system's functionality, and for providing access to information about how the system is working.
You can't delete a system table (but you can perform DETACH).
View File
@ -3,7 +3,8 @@ system.clusters
Contains information about clusters available in the config file and the servers in them.
Columns:
::
.. code-block:: text
cluster String - Cluster name.
shard_num UInt32 - Number of a shard in the cluster, starting from 1.
shard_weight UInt32 - Relative weight of a shard when writing data.
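For example, a quick way to inspect the configured topology, using only the columns listed above (the output depends entirely on your config file):

.. code-block:: sql

SELECT cluster, shard_num, shard_weight FROM system.clusters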
View File
@ -3,7 +3,8 @@ system.columns
Contains information about the columns in all tables.
You can use this table to get information similar to ``DESCRIBE TABLE``, but for multiple tables at once.
::
.. code-block:: text
database String - Name of the database the table is located in.
table String - Table name.
name String - Column name.
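For example (the 'default' database name is only an assumption):

.. code-block:: sql

SELECT table, name FROM system.columns WHERE database = 'default'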
View File
@ -4,7 +4,8 @@ system.dictionaries
Contains information about external dictionaries.
Columns:
::
.. code-block:: text
name String - Dictionary name.
type String - Dictionary type: Flat, Hashed, Cache.
origin String - Path to the config file where the dictionary is described.
View File
@ -3,7 +3,7 @@ system.functions
Contains information about normal and aggregate functions.
Columns:
.. code-block:: text
::
name String - Function name.
is_aggregate UInt8 - Whether it is an aggregate function.
View File
@ -3,7 +3,8 @@ system.merges
Contains information about merges currently in process for tables in the MergeTree family.
Columns:
::
.. code-block:: text
database String - Name of the database the table is located in.
table String - Name of the table.
elapsed Float64 - Time in seconds since the merge started.
View File
@ -3,7 +3,8 @@ system.parts
Contains information about parts of a table in the MergeTree family.
Columns:
::
.. code-block:: text
database String - Name of the database where the table that this part belongs to is located.
table String - Name of the table that this part belongs to.
engine String - Name of the table engine, without parameters.
View File
@ -3,7 +3,8 @@ system.processes
This system table is used for implementing the ``SHOW PROCESSLIST`` query.
Columns:
::
.. code-block:: text
user String - Name of the user who made the query. For distributed query processing, this is the user that the requestor server used to send the query to this server, not the user who made the distributed query on the requestor server.
address String - The IP address the request was made from. The same for distributed processing.
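A sketch of querying it directly, using only the columns listed above:

.. code-block:: sql

SELECT user, address FROM system.processes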
View File
@ -5,7 +5,7 @@ Contains information and status for replicated tables residing on the local serv
Example:
.. code-block:: sql
.. code-block:: text
SELECT *
FROM system.replicas
@ -34,8 +34,10 @@ Example:
total_replicas: 2
active_replicas: 2
Столбцы:
::
Columns:
.. code-block:: text
database: Database name.
table: Table name.
engine: Table engine name.
View File
@ -4,7 +4,8 @@ system.settings
Contains information about settings that are currently in use (i.e. used for executing the query you are using to read from the system.settings table).
Columns:
::
.. code-block:: text
name String - Setting name.
value String - Setting value.
changed UInt8 - Whether the setting was explicitly defined in the config or explicitly changed.
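For example, to list only the settings that differ from the defaults, the ``changed`` column described above can be used:

.. code-block:: sql

SELECT name, value FROM system.settings WHERE changed = 1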
View File
@ -9,7 +9,8 @@ To output data for all root nodes, write path = '/'.
If the path specified in 'path' doesn't exist, an exception will be thrown.
Columns:
::
.. code-block:: text
name String - Name of the node.
path String - Path to the node.
value String - Value of the node.
@ -27,7 +28,7 @@ Columns:
Example:
.. code-block:: sql
.. code-block:: text
SELECT *
FROM system.zookeeper
View File
@ -2,7 +2,8 @@ Buffer
------
Buffers the data to write in RAM, periodically flushing it to another table. During the read operation, data is read from the buffer and the other table simultaneously.
::
.. code-block:: text
Buffer(database, table, num_layers, min_time, max_time, min_rows, max_rows, min_bytes, max_bytes)
Engine parameters:
@ -20,7 +21,8 @@ During the write operation, data is inserted to a 'num_layers' number of random
The conditions for flushing the data are calculated separately for each of the 'num_layers' buffers. For example, if num_layers = 16 and max_bytes = 100000000, the maximum RAM consumption is 1.6 GB.
Example:
::
.. code-block:: sql
CREATE TABLE merge.hits_buffer AS merge.hits ENGINE = Buffer(merge, hits, 16, 10, 100, 10000, 1000000, 10000000, 100000000)
Creating a 'merge.hits_buffer' table with the same structure as 'merge.hits' and using the Buffer engine. When writing to this table, data is buffered in RAM and later written to the 'merge.hits' table. 16 buffers are created. The data in each of them is flushed if either 100 seconds have passed, or one million rows have been written, or 100 MB of data have been written; or if simultaneously 10 seconds have passed and 10,000 rows and 10 MB of data have been written. For example, if just one row has been written, after 100 seconds it will be flushed, no matter what. But if many rows have been written, the data will be flushed sooner.
View File
@ -1,11 +1,12 @@
Distributed
-----------
**The Distributed engine does not store data itself**, but allows distributed query processing on multiple servers.
**The Distributed engine by itself does not store data**, but allows distributed query processing on multiple servers.
Reading is automatically parallelized. During a read, the table indexes on remote servers are used, if there are any.
The Distributed engine accepts parameters: the cluster name in the server's config file, the name of a remote database, the name of a remote table, and (optionally) a sharding key.
Example:
::
.. code-block:: text
Distributed(logs, default, hits[, sharding_key])
- Data will be read from all servers in the 'logs' cluster, from the 'default.hits' table located on every server in the cluster.
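A hedged CREATE TABLE sketch for the same 'logs' cluster (the local table ``default.hits`` is assumed to exist on every server; ``rand()`` is one common choice of sharding key):

.. code-block:: sql

CREATE TABLE hits_all AS default.hits
ENGINE = Distributed(logs, default, hits, rand())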
View File
@ -1,4 +1,4 @@
File(InputFormat)
-----------------
The data source is a file that stores data in one of the supported input formats (TabSeparated, Native, и т. д.) ...
The data source is a file that stores data in one of the supported input formats (TabSeparated, Native, etc.) ...
View File
@ -2,7 +2,8 @@ Join
----
A prepared data structure for JOIN that is always located in RAM.
::
.. code-block:: text
Join(ANY|ALL, LEFT|INNER, k1[, k2, ...])
Engine parameters: ``ANY``|``ALL`` - strictness, and ``LEFT``|``INNER`` - the type. These parameters are set without quotes and must match the JOIN that the table will be used for. k1, k2, ... are the key columns from the USING clause that the join will be made on.
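A minimal sketch (the table name and column set are hypothetical; the strictness and type must match the JOIN the table will serve):

.. code-block:: sql

CREATE TABLE user_labels (UserID UInt64, Label String) ENGINE = Join(ANY, LEFT, UserID)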
View File
@ -1,5 +1,5 @@
Log
----
---
Log differs from TinyLog in that a small file of "marks" resides with the column files. These marks are written on every data block and contain offsets - where to start reading the file in order to skip the specified number of rows. This makes it possible to read table data in multiple threads. For concurrent data access, the read operations can be performed simultaneously, while write operations block reads and each other.
The Log engine does not support indexes. Similarly, if writing to a table failed, the table is broken, and reading from it returns an error. The Log engine is appropriate for temporary data, write-once tables, and for testing or demonstration purposes.
View File
@ -1,4 +1,4 @@
MaterializedView
-----------------
----------------
Used for implementing materialized views (for more information, see ``CREATE MATERIALIZED VIEW``). For storing data, it uses a different engine that was specified when creating the view. When reading from a table, it just uses this engine.
Some files were not shown because too many files have changed in this diff.