# Distributed

**The Distributed engine does not store data itself**, but allows distributed query processing on multiple servers.
Reading is automatically parallelized. During a read, the table indexes on remote servers are used, if there are any.

The Distributed engine accepts parameters:

- the cluster name in the server's config file
- the name of a remote database
- the name of a remote table
- (optionally) sharding key
- (optionally) policy name; it is used to store temporary files for asynchronous sending

  See also:
  - `insert_distributed_sync` setting
  - [MergeTree](mergetree.md#table_engine-mergetree-multiple-volumes) for the examples

Example:

```sql
Distributed(logs, default, hits[, sharding_key[, policy_name]])
```

Data will be read from all servers in the 'logs' cluster, from the default.hits table located on every server in the cluster.
Data is not only read, but is partially processed on the remote servers (to the extent that this is possible).
For example, for a query with GROUP BY, data will be aggregated on remote servers, and the intermediate states of aggregate functions will be sent to the requestor server. Then data will be further aggregated.
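The two-stage aggregation described above can be sketched in plain Python for an average per key. This is only an illustration, not ClickHouse code: `partial_avg_state` and `merge_avg_states` are hypothetical names, and the intermediate state is assumed to be a `(sum, count)` pair.

```python
from collections import defaultdict

def partial_avg_state(rows):
    """Runs on one shard: aggregate local rows into per-key (sum, count) states."""
    states = defaultdict(lambda: [0, 0])
    for key, value in rows:
        states[key][0] += value
        states[key][1] += 1
    return {key: tuple(state) for key, state in states.items()}

def merge_avg_states(shard_states):
    """Runs on the requestor: merge the states from all shards, then finalize."""
    merged = defaultdict(lambda: [0, 0])
    for states in shard_states:
        for key, (total, count) in states.items():
            merged[key][0] += total
            merged[key][1] += count
    return {key: total / count for key, (total, count) in merged.items()}

# Two shards holding different rows for the same keys.
shard_1 = [("a", 10), ("b", 4)]
shard_2 = [("a", 20), ("a", 30)]
result = merge_avg_states([partial_avg_state(shard_1), partial_avg_state(shard_2)])
```

Sending states rather than final values matters here: averaging the per-shard averages of key `a` (10 and 25) would give 17.5, while merging the states gives the correct 20.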

Instead of the database name, you can use a constant expression that returns a string. For example: currentDatabase().

logs – The cluster name in the server's config file.

Clusters are set like this:

```xml
<remote_servers>
    <logs>
        <shard>
            <!-- Optional. Shard weight when writing data. Default: 1. -->
            <weight>1</weight>
            <!-- Optional. Whether to write data to just one of the replicas. Default: false (write data to all replicas). -->
            <internal_replication>false</internal_replication>
            <replica>
                <host>example01-01-1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>example01-01-2</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard>
            <weight>2</weight>
            <internal_replication>false</internal_replication>
            <replica>
                <host>example01-02-1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>example01-02-2</host>
                <secure>1</secure>
                <port>9440</port>
            </replica>
        </shard>
    </logs>
</remote_servers>
```

Here a cluster is defined with the name 'logs' that consists of two shards, each of which contains two replicas.
Shards refer to the servers that contain different parts of the data (in order to read all the data, you must access all the shards).
Replicas are duplicating servers (in order to read all the data, you can access the data on any one of the replicas).

Cluster names must not contain dots.

The parameters `host`, `port`, and optionally `user`, `password`, `secure`, `compression` are specified for each server:

-   `host` – The address of the remote server. You can use either a domain name or an IPv4 or IPv6 address. If you specify a domain name, the server makes a DNS request when it starts, and the result is stored as long as the server is running. If the DNS request fails, the server doesn't start. If you change the DNS record, restart the server.
-   `port` – The TCP port of the native protocol (`tcp_port` in the config, usually set to 9000). Do not confuse it with `http_port`.
-   `user` – Name of the user for connecting to a remote server. Default value: default. This user must have access to connect to the specified server. Access is configured in the users.xml file. For more information, see the section [Access rights](../../operations/access_rights.md).
-   `password` – The password for connecting to a remote server (not masked). Default value: empty string.
-   `secure` – Use SSL for the connection. Usually you should also set `port` to 9440. The server should listen on `<tcp_port_secure>9440</tcp_port_secure>` and have correct certificates.
-   `compression` – Use data compression. Default value: true.

When specifying replicas, one of the available replicas will be selected for each of the shards when reading. You can configure the algorithm for load balancing (the preference for which replica to access) – see the [load_balancing](../settings/settings.md#settings-load_balancing) setting.
If a connection to the server is not established, a connection attempt is made with a short timeout. If it fails, the next replica is selected, and so on for all the replicas. If the connection attempt fails for all the replicas, the attempts are repeated the same way several times.
This works in favor of resiliency, but does not provide complete fault tolerance: a remote server might accept the connection, but might not work, or work poorly.
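The retry behaviour can be pictured with the following minimal Python sketch. It is not the server's implementation: `connect_with_failover` and `try_connect` are hypothetical names, and the number of rounds is an assumed value, since the text only says "several times".

```python
def connect_with_failover(replicas, try_connect, rounds=3):
    """Return the first replica that accepts a connection.

    replicas    -- replica addresses, already ordered by the load-balancing policy
    try_connect -- callable(replica) -> bool; assumed to use a short timeout
    rounds      -- number of passes over the replica list (assumed value)
    """
    for _ in range(rounds):
        for replica in replicas:
            if try_connect(replica):
                return replica
    raise ConnectionError("all replicas failed in every round")

# Simulated shard: the first replica is down, the second one answers.
attempts = []

def fake_connect(replica):
    attempts.append(replica)
    return replica == "example01-01-2"

chosen = connect_with_failover(["example01-01-1", "example01-01-2"], fake_connect)
```

Note that a successful connection is the only check made here, which mirrors the caveat above: a replica that accepts the connection but works poorly is still selected.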

You can specify just one of the shards (in this case, query processing should be called remote, rather than distributed) or up to any number of shards. In each shard, you can specify from one to any number of replicas. You can specify a different number of replicas for each shard.

You can specify as many clusters as you wish in the configuration.

To view your clusters, use the 'system.clusters' table.

The Distributed engine allows working with a cluster like a local server. However, the cluster is inextensible: you must write its configuration in the server config file (preferably on every server in the cluster).

There is no support for Distributed tables that look at other Distributed tables (except in cases when a Distributed table only has one shard). As an alternative, make the Distributed table look at the "final" tables.

The Distributed engine requires writing clusters to the config file. Clusters from the config file are updated on the fly, without restarting the server. If you need to send a query to an unknown set of shards and replicas each time, you don't need to create a Distributed table – use the 'remote' table function instead. See the section [Table functions](../../query_language/table_functions/index.md).

There are two methods for writing data to a cluster:

First, you can define which servers to write which data to and perform the write directly on each shard. In other words, perform INSERT in the tables that the Distributed table "looks at". This is the most flexible solution, as you can use any sharding scheme, which could be non-trivial due to the requirements of the subject area. This is also the most efficient solution, since data can be written to different shards completely independently.

Second, you can perform INSERT in a Distributed table. In this case, the table will distribute the inserted data across servers itself. In order to write to a Distributed table, it must have a sharding key set (the fourth parameter). However, if there is only one shard, the write operation works without specifying the sharding key, since it doesn't mean anything in this case.

Each shard can have a weight defined in the config file. By default, the weight is equal to one. Data is distributed across shards in the amount proportional to the shard weight. For example, if there are two shards and the first has a weight of 9 while the second has a weight of 10, the first will be sent 9 / 19 parts of the rows, and the second will be sent 10 / 19.

Each shard can have the 'internal_replication' parameter defined in the config file.

If this parameter is set to 'true', the write operation selects the first healthy replica and writes data to it. Use this alternative if the Distributed table "looks at" replicated tables. In other words, if the table where data will be written is going to replicate them itself.

If it is set to 'false' (the default), data is written to all replicas. In essence, this means that the Distributed table replicates data itself. This is worse than using replicated tables, because the consistency of replicas is not checked, and over time they will contain slightly different data.
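The effect of `internal_replication` on the write target can be summarized in a small Python sketch. The function name `replicas_to_write` and the `is_healthy` callback are hypothetical, and returning an empty list when no replica is healthy is an assumption of this sketch, since the text does not describe that case.

```python
def replicas_to_write(replicas, is_healthy, internal_replication):
    """Pick the replicas of one shard that receive an inserted block."""
    if internal_replication:
        # The underlying replicated table copies the data to the other replicas.
        for replica in replicas:
            if is_healthy(replica):
                return [replica]
        return []  # assumption: behaviour without a healthy replica is not specified
    # The Distributed table writes to every replica itself.
    return list(replicas)

shard = ["example01-01-1", "example01-01-2"]
healthy = {"example01-01-2"}
with_replication = replicas_to_write(shard, healthy.__contains__, True)
without_replication = replicas_to_write(shard, healthy.__contains__, False)
```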

To select the shard that a row of data is sent to, the sharding expression is evaluated, and the remainder from dividing its value by the total weight of the shards is taken. The row is sent to the shard that corresponds to the half-interval of remainders from 'prev_weights' to 'prev_weights + weight', where 'prev_weights' is the total weight of the shards with smaller numbers, and 'weight' is the weight of this shard. For example, if there are two shards, and the first has a weight of 9 while the second has a weight of 10, the row will be sent to the first shard for the remainders from the range \[0, 9), and to the second for the remainders from the range \[9, 19).
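The selection rule can be written out as a short Python sketch. `pick_shard` is a hypothetical helper, shards are numbered from 0 here, and the sharding value is assumed to be a non-negative integer.

```python
from itertools import accumulate

def pick_shard(sharding_value, weights):
    """Return the 0-based index of the shard that receives a row."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("total shard weight must be positive")
    remainder = sharding_value % total
    # Upper bounds of the half-intervals: weights [9, 10] -> [9, 19].
    for index, upper in enumerate(accumulate(weights)):
        if remainder < upper:
            return index
    raise AssertionError("unreachable for non-negative weights")

# Weights from the example above; count how 1900 consecutive values are spread.
weights = [9, 10]
counts = [0, 0]
for value in range(19 * 100):
    counts[pick_shard(value, weights)] += 1
```

With evenly spread sharding values, the counts come out proportional to the weights (9 / 19 and 10 / 19 of the rows), which matches the distribution described for shard weights.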

The sharding expression can be any expression from constants and table columns that returns an integer. For example, you can use the expression 'rand()' for random distribution of data, or 'UserID' for distribution by the remainder from dividing the user's ID (then the data of a single user will reside on a single shard, which simplifies running IN and JOIN by users). If one of the columns is not distributed evenly enough, you can wrap it in a hash function: intHash64(UserID).

A simple remainder from division is a limited solution for sharding and isn't always appropriate. It works for medium and large volumes of data (dozens of servers), but not for very large volumes of data (hundreds of servers or more). In the latter case, use the sharding scheme required by the subject area, rather than inserting into Distributed tables.

SELECT queries are sent to all the shards, and work regardless of how data is distributed across the shards (they can be distributed completely randomly). When you add a new shard, you don't have to transfer the old data to it. You can write new data with a heavier weight – the data will be distributed slightly unevenly, but queries will work correctly and efficiently.

You should be concerned about the sharding scheme in the following cases:

- Queries are used that require joining data (IN or JOIN) by a specific key. If data is sharded by this key, you can use local IN or JOIN instead of GLOBAL IN or GLOBAL JOIN, which is much more efficient.
- A large number of servers is used (hundreds or more) with a large number of small queries (queries of individual clients - websites, advertisers, or partners). In order for the small queries to not affect the entire cluster, it makes sense to locate data for a single client on a single shard. Alternatively, as we've done in Yandex.Metrica, you can set up bi-level sharding: divide the entire cluster into "layers", where a layer may consist of multiple shards. Data for a single client is located on a single layer, but shards can be added to a layer as necessary, and data is randomly distributed within them. Distributed tables are created for each layer, and a single shared distributed table is created for global queries.
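The bi-level scheme from the last item can be sketched as follows. This is a hypothetical illustration only: the layer layout, the shard names, and the use of a modulo to assign clients to layers are all assumptions, not a description of the Yandex.Metrica setup.

```python
import random

# Hypothetical layout: each layer is a group of shards.
LAYERS = {
    0: ["shard-0a", "shard-0b"],
    1: ["shard-1a", "shard-1b", "shard-1c"],
}

def layer_for_client(client_id):
    """A client always maps to the same layer (modulo is an assumption here)."""
    return client_id % len(LAYERS)

def shard_for_row(client_id, rng=random):
    """Within the client's layer, rows are spread across shards at random."""
    return rng.choice(LAYERS[layer_for_client(client_id)])

rng = random.Random(0)
shards_seen = {shard_for_row(42, rng) for _ in range(100)}
```

All rows for client 42 stay inside layer 0, so a query for that client only needs the shards of one layer, while a shard added to the layer starts receiving rows without any remapping of clients.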

Data is written asynchronously. When inserted in the table, the data block is just written to the local file system. The data is sent to the remote servers in the background as soon as possible. The period for sending data is managed by the [distributed_directory_monitor_sleep_time_ms](../settings/settings.md#distributed_directory_monitor_sleep_time_ms) and [distributed_directory_monitor_max_sleep_time_ms](../settings/settings.md#distributed_directory_monitor_max_sleep_time_ms) settings. The `Distributed` engine sends each file with inserted data separately, but you can enable batch sending of files with the [distributed_directory_monitor_batch_inserts](../settings/settings.md#distributed_directory_monitor_batch_inserts) setting. This setting improves cluster performance by better utilizing local server and network resources. You should check whether data is sent successfully by checking the list of files (data waiting to be sent) in the table directory: `/var/lib/clickhouse/data/database/table/`.

If the server ceased to exist or had a rough restart (for example, after a device failure) after an INSERT to a Distributed table, the inserted data might be lost. If a damaged data part is detected in the table directory, it is transferred to the 'broken' subdirectory and no longer used.
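A check of the table directory could look like the Python sketch below. `pending_files` is a hypothetical helper; it assumes that any file outside a `broken` subdirectory is still waiting to be sent, and the `.bin` file names in the demo are made up.

```python
import os
import tempfile

def pending_files(table_dir):
    """Split the files under a Distributed table directory into
    (pending, broken) lists of paths relative to table_dir."""
    pending, broken = [], []
    for root, _dirs, files in os.walk(table_dir):
        rel_root = os.path.relpath(root, table_dir)
        in_broken = "broken" in rel_root.split(os.sep)
        for name in files:
            path = os.path.normpath(os.path.join(rel_root, name))
            (broken if in_broken else pending).append(path)
    return sorted(pending), sorted(broken)

# Demo on a throwaway directory instead of /var/lib/clickhouse/data/database/table/.
with tempfile.TemporaryDirectory() as table_dir:
    os.makedirs(os.path.join(table_dir, "broken"))
    for name in ("1.bin", "2.bin", os.path.join("broken", "3.bin")):
        open(os.path.join(table_dir, name), "w").close()
    pending, broken = pending_files(table_dir)
```

A pending list that keeps growing suggests that background sending is not keeping up or that a remote server is unreachable.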

When the max_parallel_replicas option is enabled, query processing is parallelized across all replicas within a single shard. For more information, see the section [max_parallel_replicas](../settings/settings.md#settings-max_parallel_replicas).

## Virtual Columns

- `_shard_num` — Contains the `shard_num` (from `system.clusters`). Type: [UInt32](../../data_types/int_uint.md).

!!! note "Note"
    Since the [`remote`](../../query_language/table_functions/remote.md)/`cluster` table functions internally create a temporary instance of the same Distributed engine, `_shard_num` is available there too.

**See Also**

- [Virtual columns](index.md#table_engines-virtual_columns)


[Original article](https://clickhouse.tech/docs/en/operations/table_engines/distributed/) <!--hide-->
-												Initial commit if EN docs

											
										
										
											2017-04-03 19:49:50 +00:00
-												Update distributed.md
											
										
										
											2018-02-21 07:04:22 +00:00
+								# Distributed
-												Sources for english documentation switched to Markdown.
Edit page link is fixed too for both language versions of documentation.

											
										
										
											2017-12-28 15:13:23 +00:00
 								**The Distributed engine does not store data itself**, but allows distributed query processing on multiple servers.
-												Table engines

											
										
										
											2017-04-26 17:26:17 +00:00
+								Reading is automatically parallelized. During a read, the table indexes on remote servers are used, if there are any.
-												Add ability to use multiple disks/volumes for Distributed engine

Now Distributed() has gain the 5-th argument -- policy name (for storing
data to send):

  CREATE TABLE foo (key Int) Engine=Distributed(test_shard_localhost, currentDatabase(), some_table, key%2, 'default');

											
										
										
											2020-01-20 17:54:52 +00:00
 								The Distributed engine accepts parameters:
 								- the cluster name in the server's config file
 								- the name of a remote database
 								- the name of a remote table
 								- (optionally) sharding key
 								- (optionally) policy name, it will be used to store temporary files for async send
 								  See also:
 								  - `insert_distributed_sync` setting
-												Links fixed.

											
										
										
											2020-01-28 15:27:44 +00:00
+								  - [MergeTree](mergetree.md#table_engine-mergetree-multiple-volumes) for the examples
-												Add ability to use multiple disks/volumes for Distributed engine

Now Distributed() has gain the 5-th argument -- policy name (for storing
data to send):

  CREATE TABLE foo (key Int) Engine=Distributed(test_shard_localhost, currentDatabase(), some_table, key%2, 'default');

											
										
										
											2020-01-20 17:54:52 +00:00
-												Table engines

											
										
										
											2017-04-26 17:26:17 +00:00
+								Example:
-												Fixed newlines in .rst files before code blocks [#CLICKHOUSE-2].
for i in $(find . -name '*.rst'); do grep -F -q '.. code-block:: ' $i && cat $i | sed -r -e 's/$/<NEWLINE>/' | tr -d '\n' | sed -r -e 's/([^>])<NEWLINE>.. code-block::/\1<NEWLINE><NEWLINE>.. code-block::/g' | sed -r -e 's/<NEWLINE>/\n/g' > ${i}.tmp && mv ${i}.tmp ${i}; done

											
										
										
											2017-06-13 20:35:07 +00:00
-												DOCAPI-8530: Code blocks markup fix (#7060)

* Typo fix.

* Links fix.

* Fixed links in docs.

* More fixes.

* docs/en: cleaning some files

* docs/en: cleaning data_types

* docs/en: cleaning database_engines

* docs/en: cleaning development

* docs/en: cleaning getting_started

* docs/en: cleaning interfaces

* docs/en: cleaning operations

* docs/en: cleaning query_lamguage

* docs/en: cleaning en

* docs/ru: cleaning data_types

* docs/ru: cleaning index

* docs/ru: cleaning database_engines

* docs/ru: cleaning development

* docs/ru: cleaning general

* docs/ru: cleaning getting_started

* docs/ru: cleaning interfaces

* docs/ru: cleaning operations

* docs/ru: cleaning query_language

* docs: cleaning interfaces/http

* Update docs/en/data_types/array.md

decorated ```

Co-Authored-By: BayoNet <da-daos@yandex.ru>

* Update docs/en/getting_started/example_datasets/nyc_taxi.md

fixed typo

Co-Authored-By: BayoNet <da-daos@yandex.ru>

* Update docs/en/getting_started/example_datasets/ontime.md

fixed typo

Co-Authored-By: BayoNet <da-daos@yandex.ru>

* Update docs/en/interfaces/formats.md

fixed error

Co-Authored-By: BayoNet <da-daos@yandex.ru>

* Update docs/en/operations/table_engines/custom_partitioning_key.md

Co-Authored-By: BayoNet <da-daos@yandex.ru>

* Update docs/en/operations/utils/clickhouse-local.md

Co-Authored-By: BayoNet <da-daos@yandex.ru>

* Update docs/en/query_language/dicts/external_dicts_dict_sources.md

Co-Authored-By: BayoNet <da-daos@yandex.ru>

* Update docs/en/operations/utils/clickhouse-local.md

Co-Authored-By: BayoNet <da-daos@yandex.ru>

* Update docs/en/query_language/functions/json_functions.md

Co-Authored-By: BayoNet <da-daos@yandex.ru>

* Update docs/en/query_language/functions/json_functions.md

Co-Authored-By: BayoNet <da-daos@yandex.ru>

* Update docs/en/query_language/functions/other_functions.md

Co-Authored-By: BayoNet <da-daos@yandex.ru>

* Update docs/en/query_language/functions/other_functions.md

Co-Authored-By: BayoNet <da-daos@yandex.ru>

* Update docs/en/query_language/functions/date_time_functions.md

Co-Authored-By: BayoNet <da-daos@yandex.ru>

* Update docs/en/operations/table_engines/jdbc.md

Co-Authored-By: BayoNet <da-daos@yandex.ru>

* docs: fixed error

* docs: fixed error

											
										
										
											2019-09-23 15:31:46 +00:00
+								```sql
-												Add ability to use multiple disks/volumes for Distributed engine

Now Distributed() has gain the 5-th argument -- policy name (for storing
data to send):

  CREATE TABLE foo (key Int) Engine=Distributed(test_shard_localhost, currentDatabase(), some_table, key%2, 'default');

											
										
										
											2020-01-20 17:54:52 +00:00
+								Distributed(logs, default, hits[, sharding_key[, policy_name]])
-												Sources for english documentation switched to Markdown.
Edit page link is fixed too for both language versions of documentation.

											
										
										
											2017-12-28 15:13:23 +00:00
+								```
-												Initial commit if EN docs

											
										
										
											2017-04-03 19:49:50 +00:00
-												Sources for english documentation switched to Markdown.
Edit page link is fixed too for both language versions of documentation.

											
										
										
											2017-12-28 15:13:23 +00:00
+								Data will be read from all servers in the 'logs' cluster, from the default.hits table located on every server in the cluster.
 								Data is not only read, but is partially processed on the remote servers (to the extent that this is possible).
 								For example, for a query with GROUP BY, data will be aggregated on remote servers, and the intermediate states of aggregate functions will be sent to the requestor server. Then data will be further aggregated.
-												Initial commit if EN docs

											
										
										
											2017-04-03 19:49:50 +00:00
-												Sources for english documentation switched to Markdown.
Edit page link is fixed too for both language versions of documentation.

											
										
										
											2017-12-28 15:13:23 +00:00
+								Instead of the database name, you can use a constant expression that returns a string. For example: currentDatabase().
-												Initial commit if EN docs

											
										
										
											2017-04-03 19:49:50 +00:00
-												Sources for english documentation switched to Markdown.
Edit page link is fixed too for both language versions of documentation.

											
										
										
											2017-12-28 15:13:23 +00:00
+								logs – The cluster name in the server's config file.
-												Initial commit if EN docs

											
										
										
											2017-04-03 19:49:50 +00:00
-												Table engines

											
										
										
											2017-04-26 17:26:17 +00:00
+								Clusters are set like this:
-												Initial commit if EN docs

											
										
										
											2017-04-03 19:49:50 +00:00
-												Sources for english documentation switched to Markdown.
Edit page link is fixed too for both language versions of documentation.

											
										
										
											2017-12-28 15:13:23 +00:00
+								```xml
 								<remote_servers>
 								    <logs>
 								        <shard>
-												English translation update.

											
										
										
											2018-03-25 02:04:22 +00:00
+								            <!-- Optional. Shard weight when writing data. Default: 1. -->
-												English translation is updated.

											
										
										
											2018-04-23 06:20:21 +00:00
+								            <weight>1</weight>
 								            <!-- Optional. Whether to write data to just one of the replicas. Default: false (write data to all replicas). -->
 								            <internal_replication>false</internal_replication>
 								            <replica>
 								                <host>example01-01-1</host>
 								                <port>9000</port>
 								            </replica>
 								            <replica>
 								                <host>example01-01-2</host>
 								                <port>9000</port>
 								            </replica>
 								        </shard>
 								        <shard>
 								            <weight>2</weight>
 								            <internal_replication>false</internal_replication>
 								            <replica>
 								                <host>example01-02-1</host>
 								                <port>9000</port>
 								            </replica>
 								            <replica>
 								                <host>example01-02-2</host>
-												wip

											
										
										
											2018-09-18 15:59:14 +00:00
+								                <secure>1</secure>
 								                <port>9440</port>
-												Sources for english documentation switched to Markdown.
Edit page link is fixed too for both language versions of documentation.

											
										
										
											2017-12-28 15:13:23 +00:00
+								            </replica>
 								        </shard>
 								    </logs>
 								</remote_servers>
 								```
 								Here a cluster is defined with the name 'logs' that consists of two shards, each of which contains two replicas.
 								Shards refer to the servers that contain different parts of the data (in order to read all the data, you must access all the shards).
-												Table engines

											
										
										
											2017-04-26 17:26:17 +00:00
+								Replicas are duplicating servers (in order to read all the data, you can access the data on any one of the replicas).
-												Initial commit if EN docs

											
										
										
											2017-04-03 19:49:50 +00:00
-												add notice that dots in cluster names are forbidden [#CLICKHOUSE-3983]

											
										
										
											2018-09-17 17:56:24 +00:00
+								Cluster names must not contain dots.
-												wip

											
										
										
											2018-09-18 15:59:14 +00:00
+								The parameters `host`, `port`, and optionally `user`, `password`, `secure`, `compression` are specified for each server:
-												Sources for english documentation switched to Markdown.
Edit page link is fixed too for both language versions of documentation.

											
										
										
											2017-12-28 15:13:23 +00:00
 								:   -   `host` – The address of the remote server. You can use either the domain or the IPv4 or IPv6 address. If you specify the domain, the server makes a DNS request when it starts, and the result is stored as long as the server is running. If the DNS request fails, the server doesn't start. If you change the DNS record, restart the server.
-												translate docs/zh/operations/table_engines/distributed.md (#5004)

* translate docs/zh/operations/table_engines/distributed.md

* fix indent

* Update docs/zh/operations/table_engines/distributed.md

Co-Authored-By: neverlee <neverlea@foxmail.com>

* Update docs/zh/operations/table_engines/distributed.md

Co-Authored-By: neverlee <neverlea@foxmail.com>

* Update docs/zh/operations/table_engines/distributed.md

Co-Authored-By: neverlee <neverlea@foxmail.com>

* fix error for docs/zh/operations/table_engines/distributed.md

* optimize docs/zh/operations/table_engines/distributed.md

											
										
										
											2019-04-16 09:51:45 +00:00
+								-   `port` – The TCP port for messenger activity ('tcp_port' in the config, usually set to 9000). Do not confuse it with http_port.
-												Doc change. Added links to other doc articles (distributed.md) (#7144)


											
										
										
											2019-10-08 13:56:48 +00:00
+								-   `user` – Name of the user for connecting to a remote server. Default value: default. This user must have access to connect to the specified server. Access is configured in the users.xml file. For more information, see the section [Access rights](../../operations/access_rights.md).
-												wip

											
										
										
											2018-09-18 15:59:14 +00:00
+								-   `password` – The password for connecting to a remote server (not masked). Default value: empty string.
 								-   `secure` - Use ssl for connection, usually you also should define `port` = 9440. Server should listen on <tcp_port_secure>9440</tcp_port_secure> and have correct certificates.
 								-   `compression` - Use data compression. Default value: true.
-												Initial commit if EN docs

											
										
										
											2017-04-03 19:49:50 +00:00
-												Doc change. Added links to other doc articles (distributed.md) (#7144)


											
										
										
											2019-10-08 13:56:48 +00:00
+								When specifying replicas, one of the available replicas will be selected for each of the shards when reading. You can configure the algorithm for load balancing (the preference for which replica to access) – see the [load_balancing](../settings/settings.md#settings-load_balancing) setting.
-												Table engines

											
										
										
											2017-04-26 17:26:17 +00:00
+								If the connection with the server is not established, there will be an attempt to connect with a short timeout. If the connection failed, the next replica will be selected, and so on for all the replicas. If the connection attempt failed for all the replicas, the attempt will be repeated the same way, several times.
 								This works in favor of resiliency, but does not provide complete fault tolerance: a remote server might accept the connection, but might not work, or work poorly.
-												Initial commit if EN docs

											
										
										
											2017-04-03 19:49:50 +00:00
-												Table engines

											
										
										
											2017-04-26 17:26:17 +00:00
+								You can specify just one of the shards (in this case, query processing should be called remote, rather than distributed) or up to any number of shards. In each shard, you can specify from one to any number of replicas. You can specify a different number of replicas for each shard.
-												Initial commit if EN docs

											
										
										
											2017-04-03 19:49:50 +00:00
-												Table engines

											
										
										
											2017-04-26 17:26:17 +00:00
+								You can specify as many clusters as you wish in the configuration.
-												Initial commit if EN docs

											
										
										
											2017-04-03 19:49:50 +00:00
-												Table engines

											
										
										
											2017-04-26 17:26:17 +00:00
+								To view your clusters, use the 'system.clusters' table.
-												Initial commit if EN docs

											
										
										
											2017-04-03 19:49:50 +00:00
-												Table engines

											
										
										
											2017-04-26 17:26:17 +00:00
+								The Distributed engine allows working with a cluster like a local server. However, the cluster is inextensible: you must write its configuration in the server config file (even better, for all the cluster's servers).
-												Initial commit if EN docs

											
										
										
											2017-04-03 19:49:50 +00:00
-												Table engines

											
										
										
											2017-04-26 17:26:17 +00:00
+								There is no support for Distributed tables that look at other Distributed tables (except in cases when a Distributed table only has one shard). As an alternative, make the Distributed table look at the "final" tables.
-												Initial commit if EN docs

											
										
										
											2017-04-03 19:49:50 +00:00
-												Doc change. Added links to other doc articles (distributed.md) (#7144)


											
										
										
											2019-10-08 13:56:48 +00:00
+								The Distributed engine requires writing clusters to the config file. Clusters from the config file are updated on the fly, without restarting the server. If you need to send a query to an unknown set of shards and replicas each time, you don't need to create a Distributed table – use the 'remote' table function instead. See the section [Table functions](../../query_language/table_functions/index.md).
-												Initial commit if EN docs

											
										
										
											2017-04-03 19:49:50 +00:00
-												Table engines

											
										
										
											2017-04-26 17:26:17 +00:00
+								There are two methods for writing data to a cluster:
-												Initial commit if EN docs

											
										
										
											2017-04-03 19:49:50 +00:00
-												Update distributed.md (#77)


											
										
										
											2019-12-25 02:19:00 +00:00
+								First, you can define which servers to write which data to and perform the write directly on each shard. In other words, perform INSERT in the tables that the distributed table "looks at". This is the most flexible solution as you can use any sharding scheme, which could be non-trivial due to the requirements of the subject area. This is also the most optimal solution, since data can be written to different shards completely independently.
-												Initial commit if EN docs

											
										
										
											2017-04-03 19:49:50 +00:00
-												Update distributed.md (#77)


											
										
										
											2019-12-25 02:19:00 +00:00
+								Second, you can perform INSERT in a Distributed table. In this case, the table will distribute the inserted data across servers itself. In order to write to a Distributed table, it must have a sharding key set (the last parameter). In addition, if there is only one shard, the write operation works without specifying the sharding key, since it doesn't mean anything in this case.
-												Initial commit if EN docs

											
										
										
											2017-04-03 19:49:50 +00:00
-												Table engines

											
										
										
											2017-04-26 17:26:17 +00:00
+								Each shard can have a weight defined in the config file. By default, the weight is equal to one. Data is distributed across shards in the amount proportional to the shard weight. For example, if there are two shards and the first has a weight of 9 while the second has a weight of 10, the first will be sent 9 / 19 parts of the rows, and the second will be sent 10 / 19.

Each shard can have the `internal_replication` parameter defined in the config file.

If this parameter is set to `true`, the write operation selects the first healthy replica and writes data to it. Use this option if the Distributed table "looks at" replicated tables, that is, if the table the data is written to replicates it itself.

If it is set to `false` (the default), data is written to all replicas. In essence, this means that the Distributed table replicates the data itself. This is worse than using replicated tables, because the consistency of replicas is not checked, and over time they will contain slightly different data.

To select the shard that a row of data is sent to, the sharding expression is evaluated, and the remainder of dividing its value by the total weight of the shards is taken. The row is sent to the shard that corresponds to the half-interval of remainders from `prev_weights` to `prev_weights + weight`, where `prev_weights` is the total weight of the shards with smaller numbers, and `weight` is the weight of this shard. For example, if there are two shards, and the first has a weight of 9 while the second has a weight of 10, the row will be sent to the first shard for remainders in the range \[0, 9), and to the second for remainders in the range \[9, 19).
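
The arithmetic of this rule can be reproduced in plain SQL. The sketch below only mimics the rule for the two-shard example (weights 9 and 10); it does not call the engine, and the integers produced by `numbers()` stand in for sharding expression values.

```sql
-- Map a few sample sharding values to shards: total weight = 9 + 10 = 19.
SELECT
    number AS sharding_value,
    number % 19 AS remainder,
    if(number % 19 < 9, 1, 2) AS shard   -- [0, 9) -> shard 1, [9, 19) -> shard 2
FROM numbers(20);

-- Count how 1900 consecutive values are split between the two shards.
SELECT
    if(number % 19 < 9, 1, 2) AS shard,
    count() AS row_count
FROM numbers(1900)
GROUP BY shard
ORDER BY shard;
-- shard 1: 900 rows (9/19), shard 2: 1000 rows (10/19)
```

Each of the 19 possible remainders occurs 100 times in the second query, so the split matches the 9/19 and 10/19 proportions given by the weights.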

The sharding expression can be any expression built from constants and table columns that returns an integer. For example, you can use the expression `rand()` for random distribution of data, or `UserID` for distribution by the remainder from dividing the user's ID (then the data of a single user resides on a single shard, which simplifies running IN and JOIN by users). If one of the columns is not distributed evenly enough, you can wrap it in a hash function: `intHash64(UserID)`.
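
As an illustration, a Distributed table sharded by a hashed user ID might be declared like this. The local table `default.hits` with a `UserID` column and the name `hits_by_user` are assumptions made for the example.

```sql
-- Assumes a local table default.hits with a UserID column exists on every shard.
-- intHash64 evens out the distribution if raw UserID values are skewed.
CREATE TABLE default.hits_by_user AS default.hits
ENGINE = Distributed(logs, default, hits, intHash64(UserID));
```

All rows with the same `UserID` produce the same hash value and therefore land on the same shard.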

A simple remainder from division is a limited solution for sharding and isn't always appropriate. It works for medium and large volumes of data (dozens of servers), but not for very large volumes of data (hundreds of servers or more). In the latter case, use the sharding scheme required by the subject area instead of writing to Distributed tables.

SELECT queries are sent to all the shards and work regardless of how data is distributed across the shards (it can be distributed completely randomly). When you add a new shard, you don't have to transfer the old data to it. Instead, you can write new data to it with a heavier weight: the data will be distributed slightly unevenly, but queries will work correctly and efficiently.

You should be concerned about the sharding scheme in the following cases:

- Queries are used that require joining data (IN or JOIN) by a specific key. If data is sharded by this key, you can use local IN or JOIN instead of GLOBAL IN or GLOBAL JOIN, which is much more efficient.
- A large number of servers is used (hundreds or more) with a large number of small queries (queries of individual clients: websites, advertisers, or partners). So that the small queries do not affect the entire cluster, it makes sense to locate data for a single client on a single shard. Alternatively, as we've done in Yandex.Metrica, you can set up bi-level sharding: divide the entire cluster into "layers", where a layer may consist of multiple shards. Data for a single client is located on a single layer, but shards can be added to a layer as necessary, and data is randomly distributed within them. Distributed tables are created for each layer, and a single shared distributed table is created for global queries.
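
The first case can be sketched as follows. The example assumes a hypothetical Distributed table `default.hits_all` over local tables `default.hits` that have `UserID` and `URL` columns; the names and the filter are illustrative only.

```sql
-- Data sharded by UserID: all rows of a user are on one shard,
-- so each shard can evaluate the subquery against its own local table.
SELECT uniq(UserID)
FROM default.hits_all
WHERE UserID IN
(
    SELECT UserID FROM default.hits WHERE URL LIKE '%/checkout%'
);

-- Data sharded arbitrarily: the subquery has to see the whole cluster,
-- so its result is computed once and sent to every shard.
SELECT uniq(UserID)
FROM default.hits_all
WHERE UserID GLOBAL IN
(
    SELECT UserID FROM default.hits_all WHERE URL LIKE '%/checkout%'
);
```

The local form gives correct results only when the sharding key guarantees that matching rows are on the same shard.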

Data is written asynchronously. When inserted in the table, the data block is just written to the local file system. The data is sent to the remote servers in the background as soon as possible. The period for sending data is managed by the [distributed_directory_monitor_sleep_time_ms](../settings/settings.md#distributed_directory_monitor_sleep_time_ms) and [distributed_directory_monitor_max_sleep_time_ms](../settings/settings.md#distributed_directory_monitor_max_sleep_time_ms) settings. The `Distributed` engine sends each file with inserted data separately, but you can enable batch sending of files with the [distributed_directory_monitor_batch_inserts](../settings/settings.md#distributed_directory_monitor_batch_inserts) setting. This setting improves cluster performance by better utilizing local server and network resources. You can check whether data is sent successfully by looking at the list of files (data waiting to be sent) in the table directory: `/var/lib/clickhouse/data/database/table/`.
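
One way to inspect the current values of the settings mentioned above is to query the `system.settings` table. This is only a sketch for checking values; it does not change them.

```sql
-- Lists the sleep-time and batch-insert settings of the directory monitor.
SELECT name, value
FROM system.settings
WHERE name LIKE 'distributed_directory_monitor%';
```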

If the server is lost or restarts abruptly (for example, after a device failure) after an INSERT to a Distributed table, the inserted data might be lost. If a damaged data part is detected in the table directory, it is transferred to the `broken` subdirectory and no longer used.

When the max_parallel_replicas option is enabled, query processing is parallelized across all replicas within a single shard. For more information, see the section [max_parallel_replicas](../settings/settings.md#settings-max_parallel_replicas).

## Virtual Columns

- `_shard_num` — Contains the `shard_num` (from `system.clusters`). Type: [UInt32](../../data_types/int_uint.md).

!!! note "Note"
    Since the [`remote`](../../query_language/table_functions/remote.md)/`cluster` table functions internally create a temporary instance of the same Distributed engine, `_shard_num` is available there too.

**See Also**

- [Virtual columns](index.md#table_engines-virtual_columns)
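
As an illustration of the `_shard_num` virtual column, the query below reads rows that came from the first shard only. The Distributed table `default.hits_all` and its `UserID` and `URL` columns are assumptions made for the example.

```sql
-- _shard_num is not stored; it is filled in by the Distributed engine at read time.
SELECT _shard_num, UserID, URL
FROM default.hits_all
WHERE _shard_num = 1
LIMIT 10;
```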

[Original article](https://clickhouse.tech/docs/en/operations/table_engines/distributed/) <!--hide-->