Docapi 4994 registry (#4214)

This commit is contained in:
BayoNet 2019-02-04 16:30:28 +03:00 committed by Ivan Blinkov
parent 7bd76c89de
commit 37b1d8369c
35 changed files with 300 additions and 1591 deletions

View File

@ -0,0 +1,32 @@
# Monitoring
You can monitor:
- Utilization of hardware resources.
- ClickHouse server metrics.
## Resource Utilization
ClickHouse does not monitor the state of hardware resources by itself.
It is highly recommended to set up monitoring for:
- Load and temperature on processors.
You can use [dmesg](https://en.wikipedia.org/wiki/Dmesg), [turbostat](https://www.linux.org/docs/man8/turbostat.html) or other tools.
- Utilization of the storage system, RAM, and network.
## ClickHouse Server Metrics
The ClickHouse server has embedded instruments for self-state monitoring.
To monitor server events, use the server logs. See the [logger](#server_settings-logger) section of the configuration file.
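For example, to follow the main server log in real time (a minimal sketch assuming the default log path; adjust it to your `logger` settings):
```bash
# Watch the ClickHouse server log as new entries arrive.
tail -f /var/log/clickhouse-server/clickhouse-server.log
```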
ClickHouse collects various metrics on computational resource usage and common statistics on query processing. You can find these metrics in the [system.metrics](#system_tables-metrics), [system.events](#system_tables-events), and [system.asynchronous_metrics](#system_tables-asynchronous_metrics) tables.
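For example, a minimal sketch of inspecting the current values with `clickhouse-client`:
```bash
# Peek at the built-in monitoring tables.
clickhouse-client --query "SELECT * FROM system.metrics LIMIT 10"
clickhouse-client --query "SELECT * FROM system.events LIMIT 10"
```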
You can configure ClickHouse to export metrics to [Graphite](https://github.com/graphite-project). See the [Graphite section](server_settings/settings.md#server_settings-graphite) of the ClickHouse server configuration file. Before configuring metrics export, set up Graphite by following the [official guide](https://graphite.readthedocs.io/en/latest/install.html).
You can also monitor server availability through the HTTP API. Send an `HTTP GET` request to `/`. If the server is available, it responds with `200 OK`.
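For example, a minimal availability check from the shell (assuming the default HTTP port 8123):
```bash
# A healthy server returns "HTTP/1.1 200 OK".
curl -sS -i 'http://localhost:8123/'
```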
To monitor servers in a cluster configuration, set the [max_replica_delay_for_distributed_queries](settings/settings.md#settings-max_replica_delay_for_distributed_queries) parameter and use the HTTP resource `/replicas-delay`. A request to `/replicas-delay` returns `200 OK` if the replica is available and is not lagging behind the others. If the replica is lagging, the request returns information about the gap.

View File

@ -0,0 +1,52 @@
# Requirements
## CPU
For installation from prebuilt deb packages, use a CPU with the x86_64 architecture and support for SSE 4.2 instructions. If you build ClickHouse from sources, you can use other processors.
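For example, you can check SSE 4.2 support on Linux like this:
```bash
# Prints "SSE 4.2 supported" if the CPU provides the required instruction set.
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
```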
ClickHouse implements parallel data processing and uses all the hardware resources available. When choosing a processor, take into account that ClickHouse works more efficiently on configurations with a large number of cores and a lower clock rate than on configurations with fewer cores and a higher clock rate. For example, 16 cores at 2600 MHz are preferable to 8 cores at 3600 MHz.
Using **Turbo Boost** and **hyper-threading** technologies is recommended. They significantly improve performance with a typical workload.
## RAM
We recommend using a minimum of 4 GB of RAM to be able to perform non-trivial queries. The ClickHouse server can run with a much smaller amount of RAM, but it still requires memory for query processing.
The required volume of RAM depends on:
- The complexity of queries.
- The amount of data processed in queries.
To calculate the required volume of RAM, estimate the size of temporary data for [GROUP BY](../query_language/select.md#select-group-by-clause), [DISTINCT](../query_language/select.md#select-distinct), [JOIN](../query_language/select.md#select-join), and other operations you use.
ClickHouse can use external memory for temporary data. See [GROUP BY in External Memory](../query_language/select.md#select-group-by-in-external-memory) for details.
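For example, a sketch of running a heavy aggregation with spilling enabled (the table and column names and the threshold below are only illustrative):
```bash
# Allow GROUP BY to spill temporary data to disk above ~10 GB of RAM usage.
clickhouse-client \
    --max_bytes_before_external_group_by=10000000000 \
    --query "SELECT UserID, count() FROM hits GROUP BY UserID FORMAT Null"
```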
## Swap File
Disable the swap file for production environments.
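For example, on Linux:
```bash
# Turn swap off immediately; also remove or comment out the swap entry in /etc/fstab to make it permanent.
sudo swapoff -a
```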
## Storage Subsystem
You need 2 GB of free disk space to install ClickHouse.
The volume of storage required for your data should be calculated separately. The assessment should include:
- Estimation of the data volume.
You can take a sample of the data and get the average size of a row from it. Then multiply the row size by the number of rows you plan to store.
- The data compression coefficient.
To estimate the data compression coefficient, load a sample of your data into ClickHouse and compare the actual size of the data with the size of the stored table (see the query sketch below). For example, clickstream data is typically compressed by a factor of 6-10.
To calculate the final volume of data to be stored, divide the estimated data volume by the compression coefficient.
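For example, a sketch of estimating the coefficient for a sample that is already loaded (the table name `hits_sample` is hypothetical):
```bash
# Compare uncompressed vs. compressed size of the active parts of one table.
clickhouse-client --query "
    SELECT sum(data_uncompressed_bytes) / sum(data_compressed_bytes) AS compression_ratio
    FROM system.parts
    WHERE table = 'hits_sample' AND active"
```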
## Network
If possible, use a 10G network.
Network bandwidth is critical for processing distributed queries with a large amount of intermediate data. Network speed also affects replication.
## Software
ClickHouse is developed for the Linux family of operating systems. The recommended Linux distribution is Ubuntu. The `tzdata` package must be installed on the system. The name and version of the operating system where ClickHouse can run depend on the installation method. See details in the [Getting started](../getting_started/index.md) section of the documentation.

View File

@ -130,7 +130,7 @@ The path to the directory with the schemes for the input data, such as schemas f
```
## graphite
## graphite {#server_settings-graphite}
Sending data to [Graphite](https://github.com/graphite-project).
@ -271,7 +271,7 @@ The number of seconds that ClickHouse waits for incoming requests before closing
```
## listen_host
## listen_host {#server_settings-listen_host}
Restriction on hosts that requests can come from. If you want the server to answer all of them, specify `::`.
@ -283,7 +283,7 @@ Examples:
```
## logger
## logger {#server_settings-logger}
Logging settings.
@ -599,7 +599,7 @@ The time zone is necessary for conversions between String and DateTime formats w
```
## tcp_port
## tcp_port {#server_settings-tcp_port}
Port for communicating with clients over the TCP protocol.

View File

@ -6,7 +6,7 @@ System tables don't have files with data on the disk or files with metadata. The
System tables are read-only.
They are located in the 'system' database.
## system.asynchronous_metrics
## system.asynchronous_metrics {#system_tables-asynchronous_metrics}
Contain metrics used for profiling and monitoring.
They usually reflect the number of events currently in the system, or the total resources consumed by the system.
@ -70,7 +70,7 @@ Columns:
Note that the amount of memory used by the dictionary is not proportional to the number of items stored in it. So for flat and cached dictionaries, all the memory cells are pre-assigned, regardless of how full the dictionary actually is.
## system.events
## system.events {#system_tables-events}
Contains information about the number of events that have occurred in the system. This is used for profiling and monitoring purposes.
Example: The number of processed SELECT queries.
@ -104,7 +104,7 @@ Columns:
- `bytes_written_uncompressed UInt64` — Number of bytes written, uncompressed.
- `rows_written UInt64` — Number of rows written.
## system.metrics
## system.metrics {#system_tables-metrics}
## system.numbers

View File

@ -1,21 +1,5 @@
# Usage Recommendations
## CPU
The SSE 4.2 instruction set must be supported. Modern processors (since 2008) support it.
When choosing a processor, prefer a large number of cores and slightly slower clock rate over fewer cores and a higher clock rate.
For example, 16 cores with 2600 MHz is better than 8 cores with 3600 MHz.
## Hyper-threading
Don't disable hyper-threading. It helps for some queries, but not for others.
## Turbo Boost
Turbo Boost is highly recommended. It significantly improves performance with a typical load.
You can use `turbostat` to view the CPU's actual clock rate under a load.
## CPU Scaling Governor
Always use the `performance` scaling governor. The `on-demand` scaling governor works much worse with constantly high demand.
@ -40,10 +24,6 @@ Do not disable overcommit. The value `cat /proc/sys/vm/overcommit_memory` should
echo 0 | sudo tee /proc/sys/vm/overcommit_memory
```
## Swap File
Always disable the swap file. The only reason for not doing this is if you are using ClickHouse on your personal laptop.
## Huge Pages
Always disable transparent huge pages. It interferes with memory allocators, which leads to significant performance degradation.

View File

@ -0,0 +1,142 @@
# Troubleshooting
Known issues:
- [Installation errors](#troubleshooting-installation-errors).
- [The server does not accept connections](#troubleshooting-accepts-no-connections).
- [ClickHouse does not process queries](#troubleshooting-does-not-process-queries).
- [ClickHouse processes queries too slowly](#troubleshooting-too-slow).
## Installation Errors {#troubleshooting-installation-errors}
### You Cannot Get Deb Packages from the ClickHouse Repository With apt-get
- Check firewall settings.
- If you cannot access the repository for any reason, download the packages as described in the [Getting started](../getting_started/index.md) article and install them manually with the `sudo dpkg -i <packages>` command. You also need the `tzdata` package.
## Server Does Not Accept Connections {#troubleshooting-accepts-no-connections}
Possible reasons:
- The server is not running.
- Unexpected or wrong configuration parameters.
### Server Is Not Running
**Check whether the server is running**
Command:
```
sudo service clickhouse-server status
```
If the server is not running, start it with the command:
```
sudo service clickhouse-server start
```
**Check logs**
The main log of `clickhouse-server` is in `/var/log/clickhouse-server/clickhouse-server.log` by default.
If the server started successfully, you should see the strings:
- `starting up` — The server started.
- `Ready for connections` — The server is running and ready for connections.
If `clickhouse-server` failed to start because of a configuration error, you should see the `<Error>` string with an error description. For example:
```
2019.01.11 15:23:25.549505 [ 45 ] {} <Error> ExternalDictionaries: Failed reloading 'event2id' external dictionary: Poco::Exception. Code: 1000, e.code() = 111, e.displayText() = Connection refused, e.what() = Connection refused
```
If you don't see an error at the end of the file, look through the entire file starting from the string:
```
<Information> Application: starting up.
```
If you try to start a second instance of `clickhouse-server` on the same server, you see the following log:
```
2019.01.11 15:25:11.151730 [ 1 ] {} <Information> : Starting ClickHouse 19.1.0 with revision 54413
2019.01.11 15:25:11.154578 [ 1 ] {} <Information> Application: starting up
2019.01.11 15:25:11.156361 [ 1 ] {} <Information> StatusFile: Status file ./status already exists - unclean restart. Contents:
PID: 8510
Started at: 2019-01-11 15:24:23
Revision: 54413
2019.01.11 15:25:11.156673 [ 1 ] {} <Error> Application: DB::Exception: Cannot lock file ./status. Another server instance in same directory is already running.
2019.01.11 15:25:11.156682 [ 1 ] {} <Information> Application: shutting down
2019.01.11 15:25:11.156686 [ 1 ] {} <Debug> Application: Uninitializing subsystem: Logging Subsystem
2019.01.11 15:25:11.156716 [ 2 ] {} <Information> BaseDaemon: Stop SignalListener thread
```
**Check systemd logs**
If there is no useful information in the `clickhouse-server` logs, or there are no logs at all, you can view the systemd logs with the command:
```
sudo journalctl -u clickhouse-server
```
**Start clickhouse-server in interactive mode**
```
sudo -u clickhouse /usr/bin/clickhouse-server --config-file /etc/clickhouse-server/config.xml
```
This command starts the server as an interactive application with the standard parameters of the autostart script. In this mode, `clickhouse-server` prints all event messages to the console.
### Configuration Parameters
Check:
- Docker settings.
If you run ClickHouse in Docker on an IPv6 network, make sure that `network=host` is set.
- Endpoint settings.
Check the [listen_host](server_settings/settings.md#server_settings-listen_host) and [tcp_port](server_settings/settings.md#server_settings-tcp_port) settings.
By default, the ClickHouse server accepts only localhost connections.
- HTTP protocol settings.
Check the protocol settings for the HTTP API.
- Secure connection settings.
Check:
- `tcp_port_secure` setting.
- Settings for SSL certificates.
Use the proper parameters when connecting. For example, use the `port_secure` parameter with `clickhouse-client`.
- User settings.
You might be using the wrong user name or password (see the connectivity check sketch after this list).
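For example, a quick connectivity check from the shell (host, port, and credentials below are placeholders for your own values):
```bash
# Check which addresses and ports the server is listening on.
sudo ss -tlnp | grep clickhouse
# Verify that the TCP port is reachable and the credentials are accepted.
clickhouse-client --host localhost --port 9000 --user default --password '' --query "SELECT 1"
```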
## ClickHouse Does Not Process Queries {#troubleshooting-does-not-process-queries}
If ClickHouse cannot process the query, it sends an error description to the client. In `clickhouse-client`, you get the error description in the console. If you use the HTTP interface, ClickHouse sends the error description in the response body. For example:
```bash
$ curl 'http://localhost:8123/' --data-binary "SELECT a"
Code: 47, e.displayText() = DB::Exception: Unknown identifier: a. Note that there is no tables (FROM clause) in your query, context: required_names: 'a' source_tables: table_aliases: private_aliases: column_aliases: public_columns: 'a' masked_columns: array_join_columns: source_columns: , e.what() = DB::Exception
```
If you start `clickhouse-client` with the `stack-trace` parameter, ClickHouse returns the server stack trace along with the error description.
You might see a message about a broken connection. In this case, you can repeat the query. If the connection breaks every time you perform the query, check the server logs for errors.
## ClickHouse Processes Queries Too Slowly {#troubleshooting-too-slow}
If you see that ClickHouse is working too slowly, you need to profile the load on server resources and the network for your queries.
You can use the `clickhouse-benchmark` utility to profile queries. It shows the number of queries processed per second, the number of rows processed per second, and percentiles of query processing times.
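For example, a minimal invocation sketch (the query here is only illustrative):
```bash
# Run one query repeatedly and report queries/s, rows/s, and latency percentiles.
echo "SELECT count() FROM system.numbers LIMIT 10000000" | clickhouse-benchmark -i 100
```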

View File

@ -123,7 +123,7 @@ SELECT
FROM
(
SELECT
uid,
uid,
retention(date = '2018-08-10', date = '2018-08-11', date = '2018-08-12') AS r
FROM events
WHERE date IN ('2018-08-10', '2018-08-11', '2018-08-12')
@ -159,4 +159,4 @@ Solution: Write in the GROUP BY query SearchPhrase HAVING uniqUpTo(4)(UserID) >=
## sumMapFiltered(keys_to_keep)(keys, values)
Same behavior as [sumMap](reference.md#sumMap) except that an array of keys is passed as a parameter. This can be especially useful when working with a high cardinality of keys.
Same behavior as [sumMap](reference.md#agg_functions-summap) except that an array of keys is passed as a parameter. This can be especially useful when working with a high cardinality of keys.

View File

@ -223,7 +223,7 @@ Computes the sum of the numbers, using the same data type for the result as for
Only works for numbers.
## sumMap(key, value)
## sumMap(key, value) {#agg_functions-summap}
Totals the 'value' array according to the keys specified in the 'key' array.
The number of elements in 'key' and 'value' must be the same for each row that is totaled.

View File

@ -70,7 +70,7 @@ Calculates FarmHash64 from a string.
Accepts a String-type argument. Returns UInt64.
For more information, see the link: [FarmHash64](https://github.com/google/farmhash)
## javaHash
## javaHash {#hash_functions-javahash}
Calculates JavaHash from a string.
Accepts a String-type argument. Returns Int32.
@ -80,7 +80,7 @@ For more information, see the link: [JavaHash](http://hg.openjdk.java.net/jdk8u/
Calculates HiveHash from a string.
Accepts a String-type argument. Returns Int32.
Same as for [JavaHash](./hash_functions.md#javaHash), except that the return value never has a negative number.
Same as for [JavaHash](#hash_functions-javahash), except that the return value never has a negative number.
## metroHash64

View File

@ -262,7 +262,7 @@ Returns the ordinal number of the row in the data block. Different data blocks a
Returns the ordinal number of the row in the data block. This function only considers the affected data blocks.
## runningDifference(x)
## runningDifference(x) {#other_functions-runningdifference}
Calculates the difference between successive row values in the data block.
Returns 0 for the first row and the difference from the previous row for each subsequent row.
@ -301,7 +301,7 @@ FROM
## runningDifferenceStartingWithFirstValue
Same as for [runningDifference](./other_functions.md#runningDifference), the difference is the value of the first row, returned the value of the first row, and each subsequent row returns the difference from the previous row.
Same as [runningDifference](./other_functions.md#other_functions-runningdifference), except that the first row returns the value of the first row, and each subsequent row returns the difference from the previous row.
## MACNumToString(num)

View File

@ -152,12 +152,12 @@ Converts a Number type argument to a Interval type (duration).
The interval type is actually very useful, you can use this type of data to perform arithmetic operations directly with Date or DateTime. At the same time, ClickHouse provides a more convenient syntax for declaring Interval type data. For example:
```sql
WITH
toDate('2019-01-01') AS date,
INTERVAL 1 WEEK AS interval_week,
WITH
toDate('2019-01-01') AS date,
INTERVAL 1 WEEK AS interval_week,
toIntervalWeek(1) AS interval_to_week
SELECT
date + interval_week,
SELECT
date + interval_week,
date + interval_to_week
```
@ -167,7 +167,7 @@ SELECT
└───────────────────────────┴──────────────────────────────┘
```
## parseDateTimeBestEffort
## parseDateTimeBestEffort {#type_conversion_functions-parsedatetimebesteffort}
Parses a string argument into a Date or DateTime type.
Unlike toDate and toDateTime, parseDateTimeBestEffort can handle more complex date formats.
@ -175,10 +175,10 @@ For more information, see the link: [Complex Date Format](https://xkcd.com/1179/
## parseDateTimeBestEffortOrNull
Same as for [parseDateTimeBestEffort](./type_conversion_functions.md#parseDateTimeBestEffort) except that it returns null when it encounters a date format that cannot be processed.
Same as for [parseDateTimeBestEffort](#type_conversion_functions-parsedatetimebesteffort) except that it returns null when it encounters a date format that cannot be processed.
## parseDateTimeBestEffortOrZero
Same as for [parseDateTimeBestEffort](./type_conversion_functions.md#parseDateTimeBestEffort) except that it returns zero date or zero date time when it encounters a date format that cannot be processed.
Same as for [parseDateTimeBestEffort](#type_conversion_functions-parsedatetimebesteffort) except that it returns zero date or zero date time when it encounters a date format that cannot be processed.
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/type_conversion_functions/) <!--hide-->

View File

@ -334,7 +334,7 @@ The query can only specify a single ARRAY JOIN clause.
The corresponding conversion can be performed before the WHERE/PREWHERE clause (if its result is needed in this clause), or after completing WHERE/PREWHERE (to reduce the volume of calculations).
### JOIN Clause
### JOIN Clause {#select-join}
Joins the data in the usual [SQL JOIN](https://en.wikipedia.org/wiki/Join_(SQL)) sense.
@ -469,7 +469,7 @@ A query may simultaneously specify PREWHERE and WHERE. In this case, PREWHERE pr
If the 'optimize_move_to_prewhere' setting is set to 1 and PREWHERE is omitted, the system uses heuristics to automatically move parts of expressions from WHERE to PREWHERE.
### GROUP BY Clause
### GROUP BY Clause {#select-group-by-clause}
This is one of the most important parts of a column-oriented DBMS.
@ -566,7 +566,7 @@ If `max_rows_to_group_by` and `group_by_overflow_mode = 'any'` are not used, all
You can use WITH TOTALS in subqueries, including subqueries in the JOIN clause (in this case, the respective total values are combined).
#### GROUP BY in External Memory
#### GROUP BY in External Memory {#select-group-by-in-external-memory}
You can enable dumping temporary data to the disk to restrict memory usage during GROUP BY.
The `max_bytes_before_external_group_by` setting determines the threshold RAM consumption for dumping GROUP BY temporary data to the file system. If set to 0 (the default), it is disabled.
@ -682,7 +682,7 @@ More specifically, expressions are analyzed that are above the aggregate functio
The aggregate functions and everything below them are calculated during aggregation (GROUP BY).
These expressions work as if they are applied to separate rows in the result.
### DISTINCT Clause
### DISTINCT Clause {#select-distinct}
If DISTINCT is specified, only a single row will remain out of all the sets of fully matching rows in the result.
The result will be the same as if GROUP BY were specified across all the fields specified in SELECT without aggregate functions. But there are several differences from GROUP BY:

View File

@ -0,0 +1 @@
../../en/operations/monitoring.md

View File

@ -0,0 +1 @@
../../en/operations/requirements.md

View File

@ -0,0 +1 @@
../../en/operations/troubleshooting.md

View File

@ -0,0 +1 @@
../../en/operations/monitoring.md

View File

@ -0,0 +1 @@
../../en/operations/requirements.md

View File

@ -131,7 +131,7 @@ ClickHouse проверит условия `min_part_size` и `min_part_size_rat
## graphite
## graphite {#server_settings-graphite}
Отправка данных в [Graphite](https://github.com/graphite-project).
@ -272,7 +272,7 @@ ClickHouse проверит условия `min_part_size` и `min_part_size_rat
```
## listen_host
## listen_host {#server_settings-listen_host}
Ограничение по хостам, с которых может прийти запрос. Если необходимо, чтобы сервер отвечал всем, то надо указать `::`.
@ -284,7 +284,7 @@ ClickHouse проверит условия `min_part_size` и `min_part_size_rat
```
## logger
## logger {#server_settings-logger}
Настройки логгирования.
@ -602,7 +602,7 @@ ClickHouse проверит условия `min_part_size` и `min_part_size_rat
```
## tcp_port
## tcp_port {#server_settings-tcp_port}
Порт для взаимодействия с клиентами по протоколу TCP.

View File

@ -6,7 +6,7 @@
В системные таблицы нельзя записывать данные - можно только читать.
Системные таблицы расположены в базе данных system.
## system.asynchronous_metrics
## system.asynchronous_metrics {#system_tables-asynchronous_metrics}
Содержат метрики, используемые для профилирования и мониторинга.
Обычно отражают количество событий, происходящих в данный момент в системе, или ресурсов, суммарно потребляемых системой.
@ -69,11 +69,12 @@ default_expression String - выражение для значения по ум
Заметим, что количество оперативной памяти, которое использует словарь, не является пропорциональным количеству элементов, хранящихся в словаре. Так, для flat и cached словарей, все ячейки памяти выделяются заранее, независимо от реальной заполненности словаря.
## system.events
## system.events {#system_tables-events}
Содержит информацию о количестве произошедших в системе событий, для профилирования и мониторинга.
Пример: количество обработанных запросов типа SELECT.
Столбцы: event String - имя события, value UInt64 - количество.
## system.functions
Содержит информацию об обычных и агрегатных функциях.
@ -101,7 +102,8 @@ default_expression String - выражение для значения по ум
- `bytes_written_uncompressed UInt64` — Количество записанных байт, несжатых.
- `rows_written UInt64` — Количество записанных строк.
## system.metrics
## system.metrics {#system_tables-metrics}
## system.numbers
Таблица содержит один столбец с именем number типа UInt64, содержащим почти все натуральные числа, начиная с нуля.

View File

@ -0,0 +1 @@
../../en/operations/troubleshooting.md

View File

@ -336,7 +336,7 @@ ARRAY JOIN nest AS n, arrayEnumerate(`nest.x`) AS num
### Секция JOIN
### Секция JOIN {#select-join}
Обычный JOIN, не имеет отношения к ARRAY JOIN, который описан выше.
@ -482,7 +482,7 @@ WHERE isNull(y)
Если настройка `optimize_move_to_prewhere` выставлена в `1`, то при отсутствии `PREWHERE`, система будет автоматически переносить части выражений из `WHERE` в `PREWHERE` согласно некоторой эвристике.
### Секция GROUP BY
### Секция GROUP BY {#select-group-by-clause}
Это одна из наиболее важных частей СУБД.
@ -579,7 +579,7 @@ GROUP BY вычисляет для каждого встретившегося
Вы можете использовать WITH TOTALS в подзапросах, включая подзапросы в секции JOIN (в этом случае соответствующие тотальные значения будут соединены).
#### GROUP BY во внешней памяти
#### GROUP BY во внешней памяти {#select-group-by-in-external-memory}
Существует возможность включить сброс временных данных на диск для ограничения потребления оперативной памяти при GROUP BY.
Настройка `max_bytes_before_external_group_by` - потребление оперативки, при котором временные данные GROUP BY сбрасываются в файловую систему. Если равно 0 (по умолчанию) - значит выключено.
@ -695,7 +695,7 @@ WHERE и HAVING отличаются тем, что WHERE выполняется
Сами агрегатные функции и то, что под ними, вычисляются при агрегации (GROUP BY).
Эти выражения работают так, как будто применяются к отдельным строкам результата.
### Секция DISTINCT
### Секция DISTINCT {#select-distinct}
Если указано `DISTINCT`, то из всех множеств полностью совпадающих строк результата, будет оставляться только одна строка.
Результат выполнения будет таким же, как если указано `GROUP BY` по всем указанным полям в `SELECT` и не указаны агрегатные функции. Но имеется несколько отличий от `GROUP BY`:

View File

@ -123,6 +123,10 @@ nav:
- 'Operations':
- 'hidden': 'operations/index.md'
- 'Requirements': 'operations/requirements.md'
- 'Monitoring': 'operations/monitoring.md'
- 'Troubleshooting': 'operations/troubleshooting.md'
- 'Usage Recommendations': 'operations/tips.md'
- 'Table Engines':
- 'Introduction': 'operations/table_engines/index.md'
- 'MergeTree Family':
@ -160,7 +164,6 @@ nav:
- 'Configuration Files': 'operations/configuration_files.md'
- 'Quotas': 'operations/quotas.md'
- 'System Tables': 'operations/system_tables.md'
- 'Usage Recommendations': 'operations/tips.md'
- 'Server Configuration Parameters':
- 'Introduction': 'operations/server_settings/index.md'
- 'Server Settings': 'operations/server_settings/settings.md'

View File

@ -119,6 +119,10 @@ nav:
- 'Operations':
- 'hidden': 'operations/index.md'
- 'Requirements': 'operations/requirements.md'
- 'Monitoring': 'operations/monitoring.md'
- 'Troubleshooting': 'operations/troubleshooting.md'
- 'Usage recommendations': 'operations/tips.md'
- 'Table engines':
- 'Introduction': 'operations/table_engines/index.md'
- 'MergeTree family':
@ -156,7 +160,6 @@ nav:
- 'Configuration files': 'operations/configuration_files.md'
- 'Quotas': 'operations/quotas.md'
- 'System tables': 'operations/system_tables.md'
- 'Usage recommendations': 'operations/tips.md'
- 'Server configuration parameters':
- 'Introduction': 'operations/server_settings/index.md'
- 'Server settings': 'operations/server_settings/settings.md'

View File

@ -121,6 +121,10 @@ nav:
- 'Эксплуатация':
- 'hidden': 'operations/index.md'
- 'Требования': 'operations/requirements.md'
- 'Мониторинг': 'operations/monitoring.md'
- 'Решение проблем': 'operations/troubleshooting.md'
- 'Советы по эксплуатации': 'operations/tips.md'
- 'Движки таблиц':
- 'Введение': 'operations/table_engines/index.md'
- 'Семейство MergeTree':
@ -158,7 +162,6 @@ nav:
- 'Конфигурационные файлы': 'operations/configuration_files.md'
- 'Квоты': 'operations/quotas.md'
- 'Системные таблицы': 'operations/system_tables.md'
- 'Советы по эксплуатации': 'operations/tips.md'
- 'Конфигурационные параметры сервера':
- 'Введение': 'operations/server_settings/index.md'
- 'Серверные настройки': 'operations/server_settings/settings.md'

View File

@ -120,6 +120,10 @@ nav:
- '运维':
- 'hidden': 'operations/index.md'
- 'Requirements': 'operations/requirements.md'
- 'Monitoring': 'operations/monitoring.md'
- 'Troubleshooting': 'operations/troubleshooting.md'
- 'Usage recommendations': 'operations/tips.md'
- 'Table engines':
- 'Introduction': 'operations/table_engines/index.md'
- 'MergeTree family':
@ -157,7 +161,6 @@ nav:
- 'Configuration files': 'operations/configuration_files.md'
- 'Quotas': 'operations/quotas.md'
- 'System tables': 'operations/system_tables.md'
- 'Usage recommendations': 'operations/tips.md'
- 'Server configuration parameters':
- 'Introduction': 'operations/server_settings/index.md'
- 'Server settings': 'operations/server_settings/settings.md'

View File

@ -159,7 +159,7 @@ x=1 y=\N
clickhouse-client --format_csv_delimiter="|" --query="INSERT INTO test.csv FORMAT CSV" < data.csv
```
&ast;默认情况下间隔符是 `,` ,在 [format_csv_delimiter](../operations/settings/settings.md#format_csv_delimiter) 中可以了解更多间隔符配置。
&ast;默认情况下间隔符是 `,` ,在 [format_csv_delimiter](../operations/settings/settings.md#settings-format_csv_delimiter) 中可以了解更多间隔符配置。
解析的时候,可以使用或不使用引号来解析所有值。支持双引号和单引号。行也可以不用引号排列。在这种情况下,它们被解析为逗号或换行符(CR 或 LF)。在解析不带引号的行时,若违反 RFC 规则,会忽略前导和尾随的空格和制表符。对于换行,全部支持 Unix(LF)、Windows(CR LF)和 Mac OS Classic(CR LF)。

View File

@ -0,0 +1 @@
../../en/operations/monitoring.md

View File

@ -0,0 +1 @@
../../en/operations/requirements.md

View File

@ -1,697 +0,0 @@
# Server settings
## builtin_dictionaries_reload_interval
The interval in seconds before reloading built-in dictionaries.
ClickHouse reloads built-in dictionaries every x seconds. This makes it possible to edit dictionaries "on the fly" without restarting the server.
Default value: 3600.
**Example**
```xml
<builtin_dictionaries_reload_interval>3600</builtin_dictionaries_reload_interval>
```
## compression
Data compression settings.
!!! warning
Don't use it if you have just started using ClickHouse.
The configuration looks like this:
```xml
<compression>
<case>
<parameters/>
</case>
...
</compression>
```
You can configure multiple sections `<case>`.
Block field `<case>`:
- ``min_part_size`` The minimum size of a table part.
- ``min_part_size_ratio`` The ratio of the minimum size of a table part to the full size of the table.
- ``method`` Compression method. Acceptable values: ``lz4`` or ``zstd`` (experimental).
ClickHouse checks `min_part_size` and `min_part_size_ratio` and processes the `case` blocks that match these conditions. If none of the `<case>` matches, ClickHouse applies the `lz4` compression algorithm.
**Example**
```xml
<compression incl="clickhouse_compression">
<case>
<min_part_size>10000000000</min_part_size>
<min_part_size_ratio>0.01</min_part_size_ratio>
<method>zstd</method>
</case>
</compression>
```
## default_database
The default database.
To get a list of databases, use the [SHOW DATABASES](../../query_language/misc.md#query_language_queries_show_databases) query.
**Example**
```xml
<default_database>default</default_database>
```
## default_profile
Default settings profile.
Settings profiles are located in the file specified in the `users_config` parameter.
**Example**
```xml
<default_profile>default</default_profile>
```
## dictionaries_config
The path to the config file for external dictionaries.
Path:
- Specify the absolute path or the path relative to the server config file.
- The path can contain wildcards \* and ?.
See also "[External dictionaries](../../query_language/dicts/external_dicts.md)".
**Example**
```xml
<dictionaries_config>*_dictionary.xml</dictionaries_config>
```
## dictionaries_lazy_load
Lazy loading of dictionaries.
If `true`, then each dictionary is created on first use. If dictionary creation failed, the function that was using the dictionary throws an exception.
If `false`, all dictionaries are created when the server starts, and if there is an error, the server shuts down.
The default is `true`.
**Example**
```xml
<dictionaries_lazy_load>true</dictionaries_lazy_load>
```
## format_schema_path
The path to the directory with the schemes for the input data, such as schemas for the [CapnProto](../../interfaces/formats.md#capnproto) format.
**Example**
```xml
<!-- Directory containing schema files for various input formats. -->
<format_schema_path>format_schemas/</format_schema_path>
```
## graphite
Sending data to [Graphite](https://github.com/graphite-project).
Settings:
- host The Graphite server.
- port The port on the Graphite server.
- interval The interval for sending, in seconds.
- timeout The timeout for sending data, in seconds.
- root_path Prefix for keys.
- metrics Sending data from the system.metrics table.
- events Sending data from the system.events table.
- asynchronous_metrics Sending data from the system.asynchronous_metrics table.
You can configure multiple `<graphite>` clauses. For instance, you can use this for sending different data at different intervals.
**Example**
```xml
<graphite>
<host>localhost</host>
<port>42000</port>
<timeout>0.1</timeout>
<interval>60</interval>
<root_path>one_min</root_path>
<metrics>true</metrics>
<events>true</events>
<asynchronous_metrics>true</asynchronous_metrics>
</graphite>
```
## graphite_rollup
Settings for thinning data for Graphite.
For more details, see [GraphiteMergeTree](../../operations/table_engines/graphitemergetree.md).
**Example**
```xml
<graphite_rollup_example>
<default>
<function>max</function>
<retention>
<age>0</age>
<precision>60</precision>
</retention>
<retention>
<age>3600</age>
<precision>300</precision>
</retention>
<retention>
<age>86400</age>
<precision>3600</precision>
</retention>
</default>
</graphite_rollup_example>
```
## http_port/https_port
The port for connecting to the server over HTTP(s).
If `https_port` is specified, [openSSL](#openssl) must be configured.
If `http_port` is specified, the openSSL configuration is ignored even if it is set.
**Example**
```xml
<https>0000</https>
```
## http_server_default_response
The page that is shown by default when you access the ClickHouse HTTP(s) server.
**Example**
Opens `https://tabix.io/` when accessing `http://localhost:http_port`.
```xml
<http_server_default_response>
<![CDATA[<html ng-app="SMI2"><head><base href="http://ui.tabix.io/"></head><body><div ui-view="" class="content-ui"></div><script src="http://loader.tabix.io/master.js"></script></body></html>]]>
</http_server_default_response>
```
## include_from {#server_settings-include_from}
The path to the file with substitutions.
For more information, see the section "[Configuration files](../configuration_files.md#configuration_files)".
**Example**
```xml
<include_from>/etc/metrica.xml</include_from>
```
## interserver_http_port
Port for exchanging data between ClickHouse servers.
**Example**
```xml
<interserver_http_port>9009</interserver_http_port>
```
## interserver_http_host
The host name that can be used by other servers to access this server.
If omitted, it is defined in the same way as the `hostname -f` command.
Useful for breaking away from a specific network interface.
**Example**
```xml
<interserver_http_host>example.yandex.ru</interserver_http_host>
```
## keep_alive_timeout
The number of seconds that ClickHouse waits for incoming requests before closing the connection. Defaults to 3 seconds.
**Example**
```xml
<keep_alive_timeout>3</keep_alive_timeout>
```
## listen_host
Restriction on hosts that requests can come from. If you want the server to answer all of them, specify `::`.
Examples:
```xml
<listen_host>::1</listen_host>
<listen_host>127.0.0.1</listen_host>
```
## logger
Logging settings.
Keys:
- level Logging level. Acceptable values: ``trace``, ``debug``, ``information``, ``warning``, ``error``.
- log The log file. Contains all the entries according to `level`.
- errorlog Error log file.
- size Size of the file. Applies to ``log`` and ``errorlog``. Once the file reaches ``size``, ClickHouse archives and renames it, and creates a new log file in its place.
- count The number of archived log files that ClickHouse stores.
**Example**
```xml
<logger>
<level>trace</level>
<log>/var/log/clickhouse-server/clickhouse-server.log</log>
<errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
<size>1000M</size>
<count>10</count>
</logger>
```
Writing to the syslog is also supported. Config example:
```xml
<logger>
<use_syslog>1</use_syslog>
<syslog>
<address>syslog.remote:10514</address>
<hostname>myhost.local</hostname>
<facility>LOG_LOCAL6</facility>
<format>syslog</format>
</syslog>
</logger>
```
Keys:
- user_syslog — Required setting if you want to write to the syslog.
- address — The host[:port] of syslogd. If omitted, the local daemon is used.
- hostname — Optional. The name of the host that logs are sent from.
- facility — [The syslog facility keyword](https://en.wikipedia.org/wiki/Syslog#Facility)
in uppercase letters with the "LOG_" prefix: (``LOG_USER``, ``LOG_DAEMON``, ``LOG_LOCAL3``, and so on).
Default value: ``LOG_USER`` if ``address`` is specified, ``LOG_DAEMON`` otherwise.
- format Message format. Possible values: ``bsd`` and ``syslog``.
## macros
Parameter substitutions for replicated tables.
Can be omitted if replicated tables are not used.
For more information, see the section "[Creating replicated tables](../../operations/table_engines/replication.md)".
**Example**
```xml
<macros incl="macros" optional="true" />
```
## mark_cache_size
Approximate size (in bytes) of the cache of "marks" used by [MergeTree](../../operations/table_engines/mergetree.md).
The cache is shared for the server and memory is allocated as needed. The cache size must be at least 5368709120.
**Example**
```xml
<mark_cache_size>5368709120</mark_cache_size>
```
## max_concurrent_queries
The maximum number of simultaneously processed requests.
**Example**
```xml
<max_concurrent_queries>100</max_concurrent_queries>
```
## max_connections
The maximum number of inbound connections.
**Example**
```xml
<max_connections>4096</max_connections>
```
## max_open_files
The maximum number of open files.
By default: `maximum`.
We recommend using this option in Mac OS X, since the `getrlimit()` function returns an incorrect value.
**Example**
```xml
<max_open_files>262144</max_open_files>
```
## max_table_size_to_drop
Restriction on deleting tables.
If the size of a [MergeTree](../../operations/table_engines/mergetree.md) table exceeds `max_table_size_to_drop` (in bytes), you can't delete it using a DROP query.
If you still need to delete the table without restarting the ClickHouse server, create the `<clickhouse-path>/flags/force_drop_table` file and run the DROP query.
Default value: 50 GB.
The value 0 means that you can delete all tables without any restrictions.
**Example**
```xml
<max_table_size_to_drop>0</max_table_size_to_drop>
```
## merge_tree
Fine tuning for tables in the [MergeTree](../../operations/table_engines/mergetree.md) family.
For more information, see the MergeTreeSettings.h header file.
**Example**
```xml
<merge_tree>
<max_suspicious_broken_parts>5</max_suspicious_broken_parts>
</merge_tree>
```
## openSSL
SSL client/server configuration.
Support for SSL is provided by the `libpoco` library. The interface is described in the file [SSLManager.h](https://github.com/ClickHouse-Extras/poco/blob/master/NetSSL_OpenSSL/include/Poco/Net/SSLManager.h)
Keys for server/client settings:
- privateKeyFile The path to the file with the secret key of the PEM certificate. The file may contain a key and certificate at the same time.
- certificateFile The path to the client/server certificate file in PEM format. You can omit it if `privateKeyFile` contains the certificate.
- caConfig The path to the file or directory that contains trusted root certificates.
- verificationMode The method for checking the node's certificates. Details are in the description of the [Context](https://github.com/ClickHouse-Extras/poco/blob/master/NetSSL_OpenSSL/include/Poco/Net/Context.h) class. Possible values: ``none``, ``relaxed``, ``strict``, ``once``.
- verificationDepth The maximum length of the verification chain. Verification will fail if the certificate chain length exceeds the set value.
- loadDefaultCAFile Indicates that built-in CA certificates for OpenSSL will be used. Acceptable values: `true`, `false`.
- cipherList Supported OpenSSL encryptions. For example: `ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH`.
- cacheSessions Enables or disables caching sessions. Must be used in combination with ``sessionIdContext``. Acceptable values: `true`, `false`.
- sessionIdContext A unique set of random characters that the server appends to each generated identifier. The length of the string must not exceed ``SSL_MAX_SSL_SESSION_ID_LENGTH``. This parameter is always recommended, since it helps avoid problems both if the server caches the session and if the client requested caching. Default value: ``${application.name}``.
- sessionCacheSize The maximum number of sessions that the server caches. Default value: 1024\*20. 0 Unlimited sessions.
- sessionTimeout Time for caching the session on the server.
- extendedVerification Automatically extended verification of certificates after the session ends. Acceptable values: `true`, `false`.
- requireTLSv1 Require a TLSv1 connection. Acceptable values: `true`, `false`.
- requireTLSv1_1 Require a TLSv1.1 connection. Acceptable values: `true`, `false`.
- requireTLSv1_2 Require a TLSv1.2 connection. Acceptable values: `true`, `false`.
- fips Activates OpenSSL FIPS mode. Supported if the library's OpenSSL version supports FIPS.
- privateKeyPassphraseHandler Class (PrivateKeyPassphraseHandler subclass) that requests the passphrase for accessing the private key. For example: ``<privateKeyPassphraseHandler>``, ``<name>KeyFileHandler</name>``, ``<options><password>test</password></options>``, ``</privateKeyPassphraseHandler>``.
- invalidCertificateHandler Class (subclass of CertificateHandler) for verifying invalid certificates. For example: `` <invalidCertificateHandler> <name>ConsoleCertificateHandler</name> </invalidCertificateHandler>`` .
- disableProtocols Protocols that are not allowed to use.
- preferServerCiphers Preferred server ciphers on the client.
**Example of settings:**
```xml
<openSSL>
<server>
<!-- openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout /etc/clickhouse-server/server.key -out /etc/clickhouse-server/server.crt -->
<certificateFile>/etc/clickhouse-server/server.crt</certificateFile>
<privateKeyFile>/etc/clickhouse-server/server.key</privateKeyFile>
<!-- openssl dhparam -out /etc/clickhouse-server/dhparam.pem 4096 -->
<dhParamsFile>/etc/clickhouse-server/dhparam.pem</dhParamsFile>
<verificationMode>none</verificationMode>
<loadDefaultCAFile>true</loadDefaultCAFile>
<cacheSessions>true</cacheSessions>
<disableProtocols>sslv2,sslv3</disableProtocols>
<preferServerCiphers>true</preferServerCiphers>
</server>
<client>
<loadDefaultCAFile>true</loadDefaultCAFile>
<cacheSessions>true</cacheSessions>
<disableProtocols>sslv2,sslv3</disableProtocols>
<preferServerCiphers>true</preferServerCiphers>
<!-- Use for self-signed: <verificationMode>none</verificationMode> -->
<invalidCertificateHandler>
<!-- Use for self-signed: <name>AcceptCertificateHandler</name> -->
<name>RejectCertificateHandler</name>
</invalidCertificateHandler>
</client>
</openSSL>
```
## part_log
Logging events that are associated with [MergeTree](../../operations/table_engines/mergetree.md). For instance, adding or merging data. You can use the log to simulate merge algorithms and compare their characteristics. You can visualize the merge process.
Queries are logged in the ClickHouse table, not in a separate file.
Columns in the log:
- event_time Date of the event.
- duration_ms Duration of the event.
- event_type Type of event. 1 new data part; 2 merge result; 3 data part downloaded from replica; 4 data part deleted.
- database_name The name of the database.
- table_name Name of the table.
- part_name Name of the data part.
- size_in_bytes Size of the data part in bytes.
- merged_from An array of names of data parts that make up the merge (also used when downloading a merged part).
- merge_time_ms Time spent on the merge.
Use the following parameters to configure logging:
- database Name of the database.
- table Name of the table.
- partition_by Sets a [custom partitioning key](../../operations/table_engines/custom_partitioning_key.md).
- flush_interval_milliseconds Interval for flushing data from memory to the disk.
**Example**
```xml
<part_log>
<database>system</database>
<table>part_log</table>
<partition_by>toMonday(event_date)</partition_by>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</part_log>
```
## path
The path to the directory containing data.
!!! note
The trailing slash is mandatory.
**Example**
```xml
<path>/var/lib/clickhouse/</path>
```
## query_log
Setting for logging queries received with the [log_queries=1](../settings/settings.md) setting.
Queries are logged in the ClickHouse table, not in a separate file.
Use the following parameters to configure logging:
- database Name of the database.
- table Name of the table.
- partition_by Sets a [custom partitioning key](../../operations/table_engines/custom_partitioning_key.md).
- flush_interval_milliseconds Interval for flushing data from memory to the disk.
If the table doesn't exist, ClickHouse will create it. If the structure of the query log changed when the ClickHouse server was updated, the table with the old structure is renamed, and a new table is created automatically.
**Example**
```xml
<query_log>
<database>system</database>
<table>query_log</table>
<partition_by>toMonday(event_date)</partition_by>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</query_log>
```
## remote_servers
Configuration of clusters used by the Distributed table engine.
For more information, see the section "[Table engines/Distributed](../../operations/table_engines/distributed.md)".
**Example**
```xml
<remote_servers incl="clickhouse_remote_servers" />
```
For the value of the `incl` attribute, see the section "[Configuration files](../configuration_files.md#configuration_files)".
## timezone
The server's time zone.
Specified as an IANA identifier for the UTC time zone or geographic location (for example, Africa/Abidjan).
The time zone is necessary for conversions between String and DateTime formats when DateTime fields are output to text format (printed on the screen or in a file), and when getting DateTime from a string. In addition, the time zone is used in functions that work with the time and date if they didn't receive the time zone in the input parameters.
**Example**
```xml
<timezone>Europe/Moscow</timezone>
```
## tcp_port
Port for communicating with clients over the TCP protocol.
**Example**
```xml
<tcp_port>9000</tcp_port>
```
## tmp_path
Path to temporary data for processing large queries.
!!! note
The trailing slash is mandatory.
**Example**
```xml
<tmp_path>/var/lib/clickhouse/tmp/</tmp_path>
```
## uncompressed_cache_size
Cache size (in bytes) for uncompressed data used by table engines from the [MergeTree](../../operations/table_engines/mergetree.md).
There is one shared cache for the server. Memory is allocated on demand. The cache is used if the option [use_uncompressed_cache](../settings/settings.md) is enabled.
The uncompressed cache is advantageous for very short queries in individual cases.
**Example**
```xml
<uncompressed_cache_size>8589934592</uncompressed_cache_size>
```
## user_files_path {#server_settings-user_files_path}
The directory with user files. Used in the table function [file()](../../query_language/table_functions/file.md).
**Example**
```xml
<user_files_path>/var/lib/clickhouse/user_files/</user_files_path>
```
## users_config
Path to the file that contains:
- User configurations.
- Access rights.
- Settings profiles.
- Quota settings.
**Example**
```xml
<users_config>users.xml</users_config>
```
## zookeeper
Configuration of ZooKeeper servers.
ClickHouse uses ZooKeeper for storing replica metadata when using replicated tables.
This parameter can be omitted if replicated tables are not used.
For more information, see the section "[Replication](../../operations/table_engines/replication.md)".
**Example**
```xml
<zookeeper>
<node index="1">
<host>example1</host>
<port>2181</port>
</node>
<node index="2">
<host>example2</host>
<port>2181</port>
</node>
<node index="3">
<host>example3</host>
<port>2181</port>
</node>
</zookeeper>
```
[Original article](https://clickhouse.yandex/docs/en/operations/server_settings/settings/) <!--hide-->

View File

@ -0,0 +1 @@
../../../en/operations/server_settings/settings.md

View File

@ -1,390 +0,0 @@
# Settings
## distributed_product_mode
Changes the behavior of [distributed subqueries](../../query_language/select.md).
ClickHouse applies this setting when the query contains the product of distributed tables, i.e. when the query for a distributed table contains a non-GLOBAL subquery for the distributed table.
Restrictions:
- Only applied for IN and JOIN subqueries.
- Only if the FROM section uses a distributed table containing more than one shard.
- If the subquery concerns a distributed table containing more than one shard.
- Not used for a table-valued [remote](../../query_language/table_functions/remote.md) function.
The possible values are:
- `deny` — Default value. Prohibits using these types of subqueries (returns the "Double-distributed in/JOIN subqueries is denied" exception).
- `local` — Replaces the database and table in the subquery with local ones for the destination server (shard), leaving the normal `IN`/`JOIN`.
- `global` — Replaces the `IN`/`JOIN` query with `GLOBAL IN`/`GLOBAL JOIN`.
- `allow` — Allows the use of these types of subqueries.
## fallback_to_stale_replicas_for_distributed_queries
Forces a query to an out-of-date replica if updated data is not available. See "[Replication](../../operations/table_engines/replication.md)".
ClickHouse selects the most relevant from the outdated replicas of the table.
Used when performing `SELECT` from a distributed table that points to replicated tables.
By default, 1 (enabled).
## force_index_by_date {#settings-settings-force_index_by_date}
Disables query execution if the index can't be used by date.
Works with tables in the MergeTree family.
If `force_index_by_date=1`, ClickHouse checks whether the query has a date key condition that can be used for restricting data ranges. If there is no suitable condition, it throws an exception. However, it does not check whether the condition actually reduces the amount of data to read. For example, the condition `Date != ' 2000-01-01 '` is acceptable even when it matches all the data in the table (i.e., running the query requires a full scan). For more information about ranges of data in MergeTree tables, see "[MergeTree](../../operations/table_engines/mergetree.md)".
## force_primary_key
Disables query execution if indexing by the primary key is not possible.
Works with tables in the MergeTree family.
If `force_primary_key=1`, ClickHouse checks to see if the query has a primary key condition that can be used for restricting data ranges. If there is no suitable condition, it throws an exception. However, it does not check whether the condition actually reduces the amount of data to read. For more information about data ranges in MergeTree tables, see "[MergeTree](../../operations/table_engines/mergetree.md)".
## fsync_metadata
Enable or disable fsync when writing .sql files. Enabled by default.
It makes sense to disable it if the server has millions of tiny table chunks that are constantly being created and destroyed.
## input_format_allow_errors_num
Sets the maximum number of acceptable errors when reading from text formats (CSV, TSV, etc.).
The default value is 0.
Always pair it with `input_format_allow_errors_ratio`. To skip errors, both settings must be greater than 0.
If an error occurred while reading rows but the error counter is still less than `input_format_allow_errors_num`, ClickHouse ignores the row and moves on to the next one.
If `input_format_allow_errors_num`is exceeded, ClickHouse throws an exception.
## input_format_allow_errors_ratio
Sets the maximum percentage of errors allowed when reading from text formats (CSV, TSV, etc.).
The percentage of errors is set as a floating-point number between 0 and 1.
The default value is 0.
Always pair it with `input_format_allow_errors_num`. To skip errors, both settings must be greater than 0.
If an error occurred while reading rows but the error counter is still less than `input_format_allow_errors_ratio`, ClickHouse ignores the row and moves on to the next one.
If `input_format_allow_errors_ratio` is exceeded, ClickHouse throws an exception.
## max_block_size
In ClickHouse, data is processed by blocks (sets of column parts). The internal processing cycles for a single block are efficient enough, but there are noticeable expenditures on each block. `max_block_size` is a recommendation for what size of block (in number of rows) to load from tables. The block size shouldn't be too small, so that the expenditures on each block are still noticeable, but not too large, so that the query with LIMIT that is completed after the first block is processed quickly, so that too much memory isn't consumed when extracting a large number of columns in multiple threads, and so that at least some cache locality is preserved.
By default, 65,536.
Blocks the size of `max_block_size` are not always loaded from the table. If it is obvious that less data needs to be retrieved, a smaller block is processed.
## preferred_block_size_bytes
Used for the same purpose as `max_block_size`, but it sets the recommended block size in bytes by adapting it to the number of rows in the block.
However, the block size cannot be more than `max_block_size` rows.
By default: 1,000,000. It only works when reading from MergeTree engines.
## log_queries
Setting up query logging.
Queries sent to ClickHouse with this setup are logged according to the rules in the [query_log](../server_settings/settings.md) server configuration parameter.
**Example**:
log_queries=1
## max_insert_block_size {#settings-max_insert_block_size}
The size of blocks to form for insertion into a table.
This setting only applies in cases when the server forms the blocks.
For example, for an INSERT via the HTTP interface, the server parses the data format and forms blocks of the specified size.
But when using clickhouse-client, the client parses the data itself, and the 'max_insert_block_size' setting on the server doesn't affect the size of the inserted blocks.
The setting also doesn't have a purpose when using INSERT SELECT, since data is inserted using the same blocks that are formed after SELECT.
By default, it is 1,048,576.
This is slightly more than `max_block_size`. The reason for this is because certain table engines (`*MergeTree`) form a data part on the disk for each inserted block, which is a fairly large entity. Similarly, `*MergeTree` tables sort data during insertion, and a large enough block size allows sorting more data in RAM.
## max_replica_delay_for_distributed_queries {#settings_settings_max_replica_delay_for_distributed_queries}
Disables lagging replicas for distributed queries. See "[Replication](../../operations/table_engines/replication.md)".
Sets the time in seconds. If a replica lags more than the set value, this replica is not used.
Default value: 300.
Used when performing `SELECT` from a distributed table that points to replicated tables.
## max_threads {#settings-max_threads}
The maximum number of query processing threads
- excluding threads for retrieving data from remote servers (see the 'max_distributed_connections' parameter).
This parameter applies to threads that perform the same stages of the query processing pipeline in parallel.
For example, if reading from a table, evaluating expressions with functions, filtering with WHERE and pre-aggregating for GROUP BY can all be done in parallel using at least 'max_threads' number of threads, then 'max_threads' are used.
By default, 2.
If less than one SELECT query is normally run on a server at a time, set this parameter to a value slightly less than the actual number of processor cores.
For queries that are completed quickly because of a LIMIT, you can set a lower 'max_threads'. For example, if the necessary number of entries are located in every block and max_threads = 8, 8 blocks are retrieved, although it would have been enough to read just one.
The smaller the `max_threads` value, the less memory is consumed.
## max_compress_block_size
The maximum size of blocks of uncompressed data before compressing for writing to a table. By default, 1,048,576 (1 MiB). If the size is reduced, the compression rate is significantly reduced, the compression and decompression speed increases slightly due to cache locality, and memory consumption is reduced. There usually isn't any reason to change this setting.
Don't confuse blocks for compression (a chunk of memory consisting of bytes) and blocks for query processing (a set of rows from a table).
## min_compress_block_size
For [MergeTree](../../operations/table_engines/mergetree.md) tables. In order to reduce latency when processing queries, a block is compressed when writing the next mark if its size is at least 'min_compress_block_size'. By default, 65,536.
The actual size of the block, if the uncompressed data is less than 'max_compress_block_size', is no less than this value and no less than the volume of data for one mark.
Let's look at an example. Assume that 'index_granularity' was set to 8192 during table creation.
We are writing a UInt32-type column (4 bytes per value). When writing 8192 rows, the total will be 32 KB of data. Since min_compress_block_size = 65,536, a compressed block will be formed for every two marks.
We are writing a URL column with the String type (average size of 60 bytes per value). When writing 8192 rows, the average will be slightly less than 500 KB of data. Since this is more than 65,536, a compressed block will be formed for each mark. In this case, when reading data from the disk in the range of a single mark, extra data won't be decompressed.
There usually isn't any reason to change this setting.
## max_query_size
The maximum part of a query that can be taken to RAM for parsing with the SQL parser.
The INSERT query also contains data for INSERT that is processed by a separate stream parser (that consumes O(1) RAM), which is not included in this restriction.
The default is 256 KiB.
## interactive_delay
The interval in microseconds for checking whether request execution has been canceled and sending the progress.
By default, 100,000 (check for canceling and send progress ten times per second).
## connect_timeout, receive_timeout, send_timeout
Timeouts in seconds on the socket used for communicating with the client.
By default, 10, 300, 300.
## poll_interval
Lock in a wait loop for the specified number of seconds.
By default, 10.
## max_distributed_connections
The maximum number of simultaneous connections with remote servers for distributed processing of a single query to a single Distributed table. We recommend setting a value no less than the number of servers in the cluster.
By default, 1024.
The following parameters are only used when creating Distributed tables (and when launching a server), so there is no reason to change them at runtime.
## distributed_connections_pool_size
The maximum number of simultaneous connections with remote servers for distributed processing of all queries to a single Distributed table. We recommend setting a value no less than the number of servers in the cluster.
By default, 1024.
## connect_timeout_with_failover_ms
The timeout in milliseconds for connecting to a remote server for a Distributed table engine, if the 'shard' and 'replica' sections are used in the cluster definition.
If unsuccessful, several attempts are made to connect to various replicas.
By default, 50.
## connections_with_failover_max_tries
The maximum number of connection attempts with each replica, for the Distributed table engine.
By default, 3.
## extremes
Whether to count extreme values (the minimums and maximums in columns of a query result). Accepts 0 or 1. By default, 0 (disabled).
For more information, see the section "Extreme values".
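A sketch of enabling the setting for a session; in JSON* and similar formats the result then contains an additional `extremes` section with per-column minimums and maximums:

``` sql
SET extremes = 1;

SELECT number FROM system.numbers LIMIT 10 FORMAT JSONCompact
```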
## use_uncompressed_cache
Whether to use a cache of uncompressed blocks. Accepts 0 or 1. By default, 1 (enabled).
The uncompressed cache (only for tables in the MergeTree family) allows significantly reducing latency and increasing throughput when working with a large number of short queries. Enable this setting for users who send frequent short requests. Also pay attention to the 'uncompressed_cache_size' configuration parameter (set only in the server config file), which defines the size of the uncompressed cache. By default, it is 8 GiB. The uncompressed cache is filled as needed; the least-used data is automatically deleted.
For queries that read at least a somewhat large volume of data (one million rows or more), the uncompressed cache is disabled automatically in order to save space for truly small queries. So you can keep the 'use_uncompressed_cache' setting always set to 1.
## replace_running_query
When using the HTTP interface, the 'query_id' parameter can be passed. This is any string that serves as the query identifier.
If a query from the same user with the same 'query_id' already exists at this time, the behavior depends on the 'replace_running_query' parameter.
- `0` (default) — Throw an exception (don't allow the query to run if a query with the same 'query_id' is already running).
- `1` — Cancel the old query and start running the new one.
Yandex.Metrica uses this parameter set to 1 for implementing suggestions for segmentation conditions. After entering the next character, if the old query hasn't finished yet, it should be canceled.
## schema
This parameter is useful when you are using formats that require a schema definition, such as [Cap'n Proto](https://capnproto.org/). The value depends on the format.
## stream_flush_interval_ms
Works for tables with streaming input. Data is flushed to the table either when the timeout expires or after a thread generates [max_insert_block_size](#settings-max_insert_block_size) rows.
The default value is 7500.
The smaller the value, the more often data is flushed into the table. Setting the value too low leads to poor performance.
## load_balancing
Which replicas (among healthy replicas) to preferably send a query to (on the first attempt) for distributed processing.
### random (default)
The number of errors is counted for each replica. The query is sent to the replica with the fewest errors, and if there are several of these, to any one of them.
Disadvantages: Server proximity is not accounted for; if the replicas have different data, you will also get different data.
### nearest_hostname
The number of errors is counted for each replica. Every 5 minutes, the number of errors is divided by 2 using integer division. Thus, the number of errors is calculated for a recent time with exponential smoothing. If there is one replica with a minimal number of errors (i.e. errors occurred recently on the other replicas), the query is sent to it. If there are multiple replicas with the same minimal number of errors, the query is sent to the replica with a host name that is most similar to the server's host name in the config file (for the number of different characters in identical positions, up to the minimum length of both host names).
For instance, example01-01-1 and example01-01-2.yandex.ru are different in one position, while example01-01-1 and example01-02-2 differ in two places.
This method might seem a little stupid, but it doesn't use external data about network topology, and it doesn't compare IP addresses, which would be complicated for our IPv6 addresses.
Thus, if there are equivalent replicas, the closest one by name is preferred.
We can also assume that when sending a query to the same server, in the absence of failures, a distributed query will also go to the same servers. So even if different data is placed on the replicas, the query will return mostly the same results.
### in_order
Replicas are accessed in the same order as they are specified. The number of errors does not matter.
This method is appropriate when you know exactly which replica is preferable.
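A sketch of choosing the policy for the current session:

``` sql
-- Accepted values include 'random', 'nearest_hostname' and 'in_order'.
SET load_balancing = 'nearest_hostname'
```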
## totals_mode
How to calculate TOTALS when HAVING is present, as well as when max_rows_to_group_by and group_by_overflow_mode = 'any' are present.
See the section "WITH TOTALS modifier".
## totals_auto_threshold
The threshold for `totals_mode = 'auto'`.
See the section "WITH TOTALS modifier".
## max_parallel_replicas
The maximum number of replicas for each shard when executing a query.
For consistency (to get different parts of the same data split), this option only works when the sampling key is set.
Replica lag is not controlled.
## compile
Enable compilation of queries. By default, 0 (disabled).
Compilation is only used for part of the query-processing pipeline: for the first stage of aggregation (GROUP BY).
If this portion of the pipeline was compiled, the query may run faster due to the unrolling of short loops and inlining of aggregate function calls. The maximum performance improvement (up to four times faster in rare cases) is seen for queries with multiple simple aggregate functions. Typically, the performance gain is insignificant. In very rare cases, it may slow down query execution.
## min_count_to_compile
How many times to potentially use a compiled chunk of code before running compilation. By default, 3.
If the value is zero, then compilation runs synchronously and the query waits for the end of the compilation process before continuing execution. This can be used for testing; otherwise, use values starting with 1. Compilation normally takes about 5-10 seconds.
If the value is 1 or more, compilation occurs asynchronously in a separate thread. The result will be used as soon as it is ready, including by queries that are currently running.
Compiled code is required for each different combination of aggregate functions used in the query and the type of keys in the GROUP BY clause.
The results of compilation are saved in the build directory in the form of .so files. There is no restriction on the number of compilation results, since they don't use very much space. Old results will be used after server restarts, except in the case of a server upgrade; in this case, the old results are deleted.
## input_format_skip_unknown_fields
If the value is true, running INSERT skips input data from columns with unknown names. Otherwise, this situation will generate an exception.
It works for JSONEachRow and TSKV formats.
## output_format_json_quote_64bit_integers
If the value is true, Int64 and UInt64 integers are output in quotes in JSON\* formats (for compatibility with most JavaScript implementations); otherwise, integers are output without quotes.
## format_csv_delimiter {#format_csv_delimiter}
The character interpreted as a delimiter in the CSV data. By default, the delimiter is `,`.
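For example, to work with semicolon-separated data, the delimiter can be changed before reading or writing CSV (a sketch; the exact quoting of string values follows the CSV format rules):

``` sql
SET format_csv_delimiter = ';';

SELECT 1 AS id, 'hello' AS s FORMAT CSV
-- Output: 1;"hello"
```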
## join_use_nulls
Affects the behavior of [JOIN](../../query_language/select.md).
With `join_use_nulls=1`, `JOIN` behaves like in standard SQL, i.e. if empty cells appear when merging, the type of the corresponding field is converted to [Nullable](../../data_types/nullable.md#data_type-nullable), and empty cells are filled with [NULL](../../query_language/syntax.md).
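A minimal sketch of the behavior: with the setting enabled, the unmatched cell in the right table is returned as NULL instead of the default value for its type:

``` sql
SET join_use_nulls = 1;

SELECT x, y
FROM (SELECT 1 AS x UNION ALL SELECT 2 AS x) AS a
ALL LEFT JOIN (SELECT 1 AS x, 'matched' AS y) AS b USING (x)
-- The row with x = 2 gets y = NULL rather than an empty string.
```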
## insert_quorum
Enables quorum writes.
- If `insert_quorum < 2`, the quorum writes are disabled.
- If `insert_quorum >= 2`, the quorum writes are enabled.
The default value is 0.
**Quorum writes**
`INSERT` succeeds only when ClickHouse manages to correctly write data to the `insert_quorum` of replicas during the `insert_quorum_timeout`. If for any reason the number of replicas with successful writes does not reach the `insert_quorum`, the write is considered failed and ClickHouse will delete the inserted block from all the replicas where data has already been written.
All the replicas in the quorum are consistent, i.e., they contain data from all previous `INSERT` queries. The `INSERT` sequence is linearized.
When reading the data written from the `insert_quorum`, you can use the [select_sequential_consistency](#select-sequential-consistency) option.
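A sketch under the assumption that `hits_replicated` is a hypothetical table on the ReplicatedMergeTree engine with at least two replicas:

``` sql
SET insert_quorum = 2, insert_quorum_timeout = 60;

-- The INSERT succeeds only after two replicas confirm the write within the timeout.
INSERT INTO hits_replicated VALUES (1, '2019-02-04')
```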
**ClickHouse generates an exception**
- If the number of available replicas at the time of the query is less than the `insert_quorum`.
- At an attempt to write data when the previous block has not yet been inserted in the `insert_quorum` of replicas. This situation may occur if the user tries to perform an `INSERT` before the previous one with the `insert_quorum` is completed.
**See also the following parameters:**
- [insert_quorum_timeout](#insert-quorum-timeout)
- [select_sequential_consistency](#select-sequential-consistency)
## insert_quorum_timeout
Quorum write timeout in seconds. If the timeout has passed and no write has taken place yet, ClickHouse will generate an exception and the client must repeat the query to write the same block to the same or any other replica.
By default, 60 seconds.
**See also the following parameters:**
- [insert_quorum](#insert-quorum)
- [select_sequential_consistency](#select-sequential-consistency)
## select_sequential_consistency
Enables/disables sequential consistency for `SELECT` queries:
- 0 — disabled. The default value is 0.
- 1 — enabled.
When sequential consistency is enabled, ClickHouse allows the client to execute the `SELECT` query only for those replicas that contain data from all previous `INSERT` queries executed with `insert_quorum`. If the client refers to a partial replica, ClickHouse will generate an exception. The SELECT query will not include data that has not yet been written to the quorum of replicas.
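Continuing the hypothetical `hits_replicated` example from the `insert_quorum` section:

``` sql
SET select_sequential_consistency = 1;

-- Throws an exception if this replica has not yet received all of the quorum inserts.
SELECT count() FROM hits_replicated
```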
See also the following parameters:
- [insert_quorum](#insert-quorum)
- [insert_quorum_timeout](#insert-quorum-timeout)
[Original article](https://clickhouse.yandex/docs/en/operations/settings/settings/) <!--hide-->

View File

@ -0,0 +1 @@
../../../en/operations/settings/settings.md

View File

@ -1,436 +0,0 @@
# System tables
System tables are used for implementing part of the system's functionality, and for providing access to information about how the system is working.
You can't delete a system table (but you can perform DETACH).
System tables don't have files with data on the disk or files with metadata. The server creates all the system tables when it starts.
System tables are read-only.
They are located in the 'system' database.
## system.asynchronous_metrics
Contain metrics used for profiling and monitoring.
They usually reflect the number of events currently in the system, or the total resources consumed by the system.
Example: The number of SELECT queries currently running; the amount of memory in use.
`system.asynchronous_metrics` and `system.metrics` differ in their sets of metrics and how they are calculated.
## system.clusters
Contains information about clusters available in the config file and the servers in them.
Columns:
```
cluster String — The cluster name.
shard_num UInt32 — The shard number in the cluster, starting from 1.
shard_weight UInt32 — The relative weight of the shard when writing data.
replica_num UInt32 — The replica number in the shard, starting from 1.
host_name String — The host name, as specified in the config.
host_address String — The host IP address obtained from DNS.
port UInt16 — The port to use for connecting to the server.
user String — The name of the user for connecting to the server.
```
## system.columns
Contains information about the columns in all tables.
You can use this table to get information similar to `DESCRIBE TABLE`, but for multiple tables at once.
```
database String — The name of the database the table is in.
table String — Table name.
name String — Column name.
type String — Column type.
default_type String — Expression type (DEFAULT, MATERIALIZED, ALIAS) for the default value, or an empty string if it is not defined.
default_expression String — Expression for the default value, or an empty string if it is not defined.
```
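For instance, to list the columns of every table in one database (the database name `tutorial` is used here only for illustration):

``` sql
SELECT table, name, type
FROM system.columns
WHERE database = 'tutorial'
ORDER BY table, name
```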
## system.databases
This table contains a single String column called 'name', which contains the name of a database.
Each database that the server knows about has a corresponding entry in the table.
This system table is used for implementing the `SHOW DATABASES` query.
## system.dictionaries
Contains information about external dictionaries.
Columns:
- `name String` — Dictionary name.
- `type String` — Dictionary type: Flat, Hashed, Cache.
- `origin String` — Path to the configuration file that describes the dictionary.
- `attribute.names Array(String)` — Array of attribute names provided by the dictionary.
- `attribute.types Array(String)` — Corresponding array of attribute types that are provided by the dictionary.
- `has_hierarchy UInt8` — Whether the dictionary is hierarchical.
- `bytes_allocated UInt64` — The amount of RAM the dictionary uses.
- `hit_rate Float64` — For cache dictionaries, the percentage of uses for which the value was in the cache.
- `element_count UInt64` — The number of items stored in the dictionary.
- `load_factor Float64` — The percentage full of the dictionary (for a hashed dictionary, the percentage filled in the hash table).
- `creation_time DateTime` — The time when the dictionary was created or last successfully reloaded.
- `last_exception String` — Text of the error that occurs when creating or reloading the dictionary if the dictionary couldn't be created.
- `source String` — Text describing the data source for the dictionary.
Note that the amount of memory used by the dictionary is not proportional to the number of items stored in it. So for flat and cached dictionaries, all the memory cells are pre-assigned, regardless of how full the dictionary actually is.
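A sketch of a query that shows the size and fill level of the loaded dictionaries:

``` sql
SELECT name, type, element_count, bytes_allocated, load_factor
FROM system.dictionaries
```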
## system.events
Contains information about the number of events that have occurred in the system. This is used for profiling and monitoring purposes.
Example: The number of processed SELECT queries.
Columns: 'event String' (the event name) and 'value UInt64' (the quantity).
## system.functions
Contains information about normal and aggregate functions.
Columns:
- `name` (`String`) — The name of the function.
- `is_aggregate` (`UInt8`) — Whether the function is aggregate.
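For example, to list a few of the available aggregate functions:

``` sql
SELECT name
FROM system.functions
WHERE is_aggregate
ORDER BY name
LIMIT 10
```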
## system.merges
Contains information about merges currently in process for tables in the MergeTree family.
Columns:
- `database String` — The name of the database the table is in.
- `table String` — Table name.
- `elapsed Float64` — The time elapsed (in seconds) since the merge started.
- `progress Float64` — The percentage of completed work from 0 to 1.
- `num_parts UInt64` — The number of parts to be merged.
- `result_part_name String` — The name of the part that will be formed as the result of merging.
- `total_size_bytes_compressed UInt64` — The total size of the compressed data in the merged chunks.
- `total_size_marks UInt64` — The total number of marks in the merged parts.
- `bytes_read_uncompressed UInt64` — Number of bytes read, uncompressed.
- `rows_read UInt64` — Number of rows read.
- `bytes_written_uncompressed UInt64` — Number of bytes written, uncompressed.
- `rows_written UInt64` — Number of rows written.
## system.metrics
## system.numbers
This table contains a single UInt64 column named 'number' that contains almost all the natural numbers starting from zero.
You can use this table for tests, or if you need to do a brute force search.
Reads from this table are not parallelized.
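For example, the table can be used to generate a short sequence of test values:

``` sql
-- The first ten even numbers.
SELECT number * 2 AS even FROM system.numbers LIMIT 10
```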
## system.numbers_mt
The same as 'system.numbers' but reads are parallelized. The numbers can be returned in any order.
Used for tests.
## system.one
This table contains a single row with a single 'dummy' UInt8 column containing the value 0.
This table is used if a SELECT query doesn't specify the FROM clause.
This is similar to the DUAL table found in other DBMSs.
## system.parts
Contains information about parts of [MergeTree](table_engines/mergetree.md) tables.
Each row describes one part of the data.
Columns:
- partition (String) The partition name. To learn what a partition is, see the description of the [ALTER](../query_language/alter.md#query_language_queries_alter) query.
Formats:
- `YYYYMM` for automatic partitioning by month.
- `any_string` when partitioning manually.
- name (String) Name of the data part.
- active (UInt8) Indicates whether the part is active. If a part is active, it is used in a table; otherwise, it will be deleted. Inactive data parts remain after merging.
- marks (UInt64) The number of marks. To get the approximate number of rows in a data part, multiply ``marks`` by the index granularity (usually 8192).
- marks_size (UInt64) The size of the file with marks.
- rows (UInt64) The number of rows.
- bytes (UInt64) The number of bytes when compressed.
- modification_time (DateTime) The modification time of the directory with the data part. This usually corresponds to the time of data part creation.
- remove_time (DateTime) The time when the data part became inactive.
- refcount (UInt32) The number of places where the data part is used. A value greater than 2 indicates that the data part is used in queries or merges.
- min_date (Date) The minimum value of the date key in the data part.
- max_date (Date) The maximum value of the date key in the data part.
- min_block_number (UInt64) The minimum number of data parts that make up the current part after merging.
- max_block_number (UInt64) The maximum number of data parts that make up the current part after merging.
- level (UInt32) Depth of the merge tree. If a merge was not performed, ``level=0``.
- primary_key_bytes_in_memory (UInt64) The amount of memory (in bytes) used by primary key values.
- primary_key_bytes_in_memory_allocated (UInt64) The amount of memory (in bytes) reserved for primary key values.
- database (String) Name of the database.
- table (String) Name of the table.
- engine (String) Name of the table engine without parameters.
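A sketch of a query that summarizes the active parts per table:

``` sql
SELECT
    database,
    table,
    count() AS part_count,
    sum(rows) AS total_rows,
    sum(bytes) AS compressed_bytes
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY compressed_bytes DESC
```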
## system.processes
This system table is used for implementing the `SHOW PROCESSLIST` query.
Columns:
```
user String - Name of the user who made the request. For distributed query processing, this is the user who helped the requestor server send the query to this server, not the user who made the distributed request on the requestor server.
address String - The IP address the request was made from. The same for distributed processing.
elapsed Float64 - The time in seconds since request execution started.
rows_read UInt64 - The number of rows read from the table. For distributed processing, on the requestor server, this is the total for all remote servers.
bytes_read UInt64 - The number of uncompressed bytes read from the table. For distributed processing, on the requestor server, this is the total for all remote servers.
total_rows_approx UInt64 - The approximation of the total number of rows that should be read. For distributed processing, on the requestor server, this is the total for all remote servers. It can be updated during request processing, when new sources to process become known.
memory_usage UInt64 - How much memory the request uses. It might not include some types of dedicated memory.
query String - The query text. For INSERT, it doesn't include the data to insert.
query_id String - Query ID, if defined.
```
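A sketch of a query that shows the longest-running queries currently being processed:

``` sql
SELECT user, elapsed, formatReadableSize(memory_usage) AS memory, query
FROM system.processes
ORDER BY elapsed DESC
```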
## system.replicas
Contains information and status for replicated tables residing on the local server.
This table can be used for monitoring. The table contains a row for every Replicated\* table.
Example:
``` sql
SELECT *
FROM system.replicas
WHERE table = 'visits'
FORMAT Vertical
```
```
Row 1:
──────
database: merge
table: visits
engine: ReplicatedCollapsingMergeTree
is_leader: 1
is_readonly: 0
is_session_expired: 0
future_parts: 1
parts_to_check: 0
zookeeper_path: /clickhouse/tables/01-06/visits
replica_name: example01-06-1.yandex.ru
replica_path: /clickhouse/tables/01-06/visits/replicas/example01-06-1.yandex.ru
columns_version: 9
queue_size: 1
inserts_in_queue: 0
merges_in_queue: 1
log_max_index: 596273
log_pointer: 596274
total_replicas: 2
active_replicas: 2
```
Columns:
```
database: Database name
table: Table name
engine: Table engine name
is_leader: Whether the replica is the leader.
Only one replica at a time can be the leader. The leader is responsible for selecting background merges to perform.
Note that writes can be performed to any replica that is available and has a session in ZK, regardless of whether it is a leader.
is_readonly: Whether the replica is in read-only mode.
This mode is turned on if the config doesn't have sections with ZooKeeper, if an unknown error occurred when reinitializing sessions in ZooKeeper, and during session reinitialization in ZooKeeper.
is_session_expired: Whether the session with ZooKeeper has expired.
Basically the same as 'is_readonly'.
future_parts: The number of data parts that will appear as the result of INSERTs or merges that haven't been done yet.
parts_to_check: The number of data parts in the queue for verification.
A part is put in the verification queue if there is suspicion that it might be damaged.
zookeeper_path: Path to table data in ZooKeeper.
replica_name: Replica name in ZooKeeper. Different replicas of the same table have different names.
replica_path: Path to replica data in ZooKeeper. The same as concatenating 'zookeeper_path/replicas/replica_path'.
columns_version: Version number of the table structure.
Indicates how many times ALTER was performed. If replicas have different versions, it means some replicas haven't made all of the ALTERs yet.
queue_size: Size of the queue for operations waiting to be performed.
Operations include inserting blocks of data, merges, and certain other actions.
It usually coincides with 'future_parts'.
inserts_in_queue: Number of inserts of blocks of data that need to be made.
Insertions are usually replicated fairly quickly. If this number is large, it means something is wrong.
merges_in_queue: The number of merges waiting to be made.
Sometimes merges are lengthy, so this value may be greater than zero for a long time.
The next 4 columns have a non-zero value only where there is an active session with ZK.
log_max_index: Maximum entry number in the log of general activity.
log_pointer: Maximum entry number in the log of general activity that the replica copied to its execution queue, plus one.
If log_pointer is much smaller than log_max_index, something is wrong.
total_replicas: The total number of known replicas of this table.
active_replicas: The number of replicas of this table that have a session in ZooKeeper (i.e., the number of functioning replicas).
```
If you request all the columns, the table may work a bit slowly, since several reads from ZooKeeper are made for each row.
If you don't request the last 4 columns (log_max_index, log_pointer, total_replicas, active_replicas), the table works quickly.
For example, you can check that everything is working correctly like this:
``` sql
SELECT
database,
table,
is_leader,
is_readonly,
is_session_expired,
future_parts,
parts_to_check,
columns_version,
queue_size,
inserts_in_queue,
merges_in_queue,
log_max_index,
log_pointer,
total_replicas,
active_replicas
FROM system.replicas
WHERE
is_readonly
OR is_session_expired
OR future_parts > 20
OR parts_to_check > 10
OR queue_size > 20
OR inserts_in_queue > 10
OR log_max_index - log_pointer > 10
OR total_replicas < 2
OR active_replicas < total_replicas
```
If this query doesn't return anything, it means that everything is fine.
## system.settings
Contains information about settings that are currently in use.
That is, the settings used for executing the query you are using to read from the system.settings table.
Columns:
```
name String — Setting name.
value String — Setting value.
changed UInt8 — Whether the setting was explicitly defined in the config or explicitly changed.
```
Example:
``` sql
SELECT *
FROM system.settings
WHERE changed
```
```
┌─name───────────────────┬─value───────┬─changed─┐
│ max_threads │ 8 │ 1 │
│ use_uncompressed_cache │ 0 │ 1 │
│ load_balancing │ random │ 1 │
│ max_memory_usage │ 10000000000 │ 1 │
└────────────────────────┴─────────────┴─────────┘
```
## system.tables
This table contains the String columns 'database', 'name', and 'engine'.
The table also contains three virtual columns: metadata_modification_time (DateTime type), create_table_query, and engine_full (String type).
Each table that the server knows about is entered in the 'system.tables' table.
This system table is used for implementing SHOW TABLES queries.
## system.zookeeper
The table does not exist if ZooKeeper is not configured. Allows reading data from the ZooKeeper cluster defined in the config.
The query must have a 'path' equality condition in the WHERE clause. This is the path in ZooKeeper for the children that you want to get data for.
The query `SELECT * FROM system.zookeeper WHERE path = '/clickhouse'` outputs data for all children on the `/clickhouse` node.
To output data for all root nodes, write path = '/'.
If the path specified in 'path' doesn't exist, an exception will be thrown.
Columns:
- `name String` — The name of the node.
- `path String` — The path to the node.
- `value String` — Node value.
- `dataLength Int32` — Size of the value.
- `numChildren Int32` — Number of descendants.
- `czxid Int64` — ID of the transaction that created the node.
- `mzxid Int64` — ID of the transaction that last changed the node.
- `pzxid Int64` — ID of the transaction that last deleted or added descendants.
- `ctime DateTime` — Time of node creation.
- `mtime DateTime` — Time of the last modification of the node.
- `version Int32` — Node version: the number of times the node was changed.
- `cversion Int32` — Number of added or removed descendants.
- `aversion Int32` — Number of changes to the ACL.
- `ephemeralOwner Int64` — For ephemeral nodes, the ID of the session that owns this node.
Example:
``` sql
SELECT *
FROM system.zookeeper
WHERE path = '/clickhouse/tables/01-08/visits/replicas'
FORMAT Vertical
```
```
Row 1:
──────
name: example01-08-1.yandex.ru
value:
czxid: 932998691229
mzxid: 932998691229
ctime: 2015-03-27 16:49:51
mtime: 2015-03-27 16:49:51
version: 0
cversion: 47
aversion: 0
ephemeralOwner: 0
dataLength: 0
numChildren: 7
pzxid: 987021031383
path: /clickhouse/tables/01-08/visits/replicas
Row 2:
──────
name: example01-08-2.yandex.ru
value:
czxid: 933002738135
mzxid: 933002738135
ctime: 2015-03-27 16:57:01
mtime: 2015-03-27 16:57:01
version: 0
cversion: 37
aversion: 0
ephemeralOwner: 0
dataLength: 0
numChildren: 7
pzxid: 987021252247
path: /clickhouse/tables/01-08/visits/replicas
```
[Original article](https://clickhouse.yandex/docs/en/operations/system_tables/) <!--hide-->

View File

@ -0,0 +1 @@
../../en/operations/system_tables.md

View File

@ -221,7 +221,7 @@ In the example below, the index can't be used.
SELECT count() FROM table WHERE CounterID = 34 OR URL LIKE '%upyachka%'
```
To check whether ClickHouse can use the index when running a query, use the settings [force_index_by_date](../settings/settings.md#settings-settings-force_index_by_date) and [force_primary_key](../settings/settings.md).
To check whether ClickHouse can use the index when running a query, use the settings [force_index_by_date](../settings/settings.md#settings-force_index_by_date) and [force_primary_key](../settings/settings.md).
The key for partitioning by month allows reading only those data blocks which contain dates from the proper range. In this case, the data block may contain data for many dates (up to an entire month). Within a block, data is sorted by primary key, which might not contain the date as the first column. Because of this, using a query with only a date condition that does not specify the primary key prefix will cause more data to be read than for a single date.

View File

@ -46,7 +46,7 @@ You can specify any existing ZooKeeper cluster and the system will use a directo
If ZooKeeper isn't set in the config file, you can't create replicated tables, and any existing replicated tables will be read-only.
ZooKeeper is not used in `SELECT` queries because replication does not affect the performance of `SELECT` and queries run just as fast as they do for non-replicated tables. When querying distributed replicated tables, ClickHouse behavior is controlled by the settings [max_replica_delay_for_distributed_queries](../settings/settings.md#settings_settings_max_replica_delay_for_distributed_queries) and [fallback_to_stale_replicas_for_distributed_queries](../settings/settings.md).
ZooKeeper is not used in `SELECT` queries because replication does not affect the performance of `SELECT` and queries run just as fast as they do for non-replicated tables. When querying distributed replicated tables, ClickHouse behavior is controlled by the settings [max_replica_delay_for_distributed_queries](../settings/settings.md#settings-max_replica_delay_for_distributed_queries) and [fallback_to_stale_replicas_for_distributed_queries](../settings/settings.md).
For each `INSERT` query, approximately ten entries are added to ZooKeeper through several transactions. (To be more precise, this is for each inserted block of data; an INSERT query contains one block or one block per `max_insert_block_size = 1048576` rows.) This leads to slightly longer latencies for `INSERT` compared to non-replicated tables. But if you follow the recommendations to insert data in batches of no more than one `INSERT` per second, it doesn't create any problems. The entire ClickHouse cluster used for coordinating one ZooKeeper cluster has a total of several hundred `INSERTs` per second. The throughput on data inserts (the number of rows per second) is just as high as for non-replicated data.

View File

@ -0,0 +1 @@
../../en/operations/troubleshooting.md

View File

@ -334,7 +334,7 @@ ARRAY JOIN nest AS n, arrayEnumerate(`nest.x`) AS num
If the result of the ARRAY JOIN clause is used in the WHERE/PREWHERE clause, it is executed before WHERE/PREWHERE; otherwise, it is executed after WHERE/PREWHERE in order to reduce the amount of computation.
### JOIN Clause
### JOIN Clause {#select-join}
The JOIN clause joins data; it behaves the same as the standard [SQL JOIN](https://en.wikipedia.org/wiki/Join_(SQL)).
@ -469,7 +469,7 @@ PREWHERE is only supported by the `*MergeTree` family of engines.
If 'optimize_move_to_prewhere' is set to 1 and the query does not contain PREWHERE, the system automatically moves the parts of the expression suitable for PREWHERE from WHERE to PREWHERE.
### GROUP BY Clause
### GROUP BY Clause {#select-group-by-clause}
This is one of the most important parts of a column-oriented DBMS.
@ -566,7 +566,7 @@ The GROUP BY clause calculates a set of aggregate function values for each distinct key it encounters.
You can use WITH TOTALS in subqueries, including subqueries in the JOIN clause; in this case, the respective total values are combined.
#### GROUP BY in External Memory
#### GROUP BY in External Memory {#select-group-by-in-external-memory}
You can enable dumping temporary data to disk during GROUP BY to limit memory usage.
The `max_bytes_before_external_group_by` setting determines the memory threshold at which GROUP BY starts dumping temporary data to disk. If it is set to 0 (the default), this feature is disabled.
@ -682,7 +682,7 @@ WHERE differs from HAVING in that WHERE is executed before aggregation (GROUP BY), while HAVING is executed after it.
Aggregate functions, and the expressions preceding aggregate functions, are all calculated during aggregation (GROUP BY), as if they were already present in the result.
### DISTINCT Clause
### DISTINCT Clause {#select-distinct}
If the DISTINCT clause is present, fully identical rows are deduplicated in the result.
This is the same as applying GROUP BY without aggregate functions across all the columns specified in SELECT. However, this clause differs from GROUP BY in the following ways: