---
toc_priority: 33
toc_title: Distributed
---

# Distributed Table Engine {#distributed}

Tables with the `Distributed` engine do not store any data of their own, but allow distributed query processing on multiple servers.
Reading is automatically parallelized. During a read, the table indexes on remote servers are used, if there are any.

## Creating a Table {#distributed-creating-a-table}

``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
) ENGINE = Distributed(cluster, database, table[, sharding_key[, policy_name]])
[SETTINGS name=value, ...]
```

### From a Table {#distributed-from-a-table}
When the `Distributed` table points to a table on the current server, you can adopt that table's schema:

``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] AS [db2.]name2 ENGINE = Distributed(cluster, database, table[, sharding_key[, policy_name]]) [SETTINGS name=value, ...]
```

**Distributed Parameters**

-   `cluster` - the cluster name in the server’s config file

-   `database` - the name of a remote database

-   `table` - the name of a remote table

-   `sharding_key` - (optional) sharding key

-   `policy_name` - (optional) storage policy name, used to store temporary files for asynchronous sends

**See Also**

 - [insert_distributed_sync](../../../operations/settings/settings.md#insert_distributed_sync) setting
 - [MergeTree](../../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-multiple-volumes) for the examples

**Distributed Settings**

- `fsync_after_insert` - run `fsync` on the file data after an asynchronous insert to Distributed. Guarantees that the OS has flushed all inserted data to a file **on the initiator node** disk.

- `fsync_directories` - run `fsync` on directories. Guarantees that the OS has refreshed directory metadata after operations related to asynchronous inserts on a Distributed table (after insert, after sending the data to a shard, etc.).

- `bytes_to_throw_insert` - if more than this number of compressed bytes are pending for async INSERT, an exception is thrown. 0 - do not throw. Default 0.

- `bytes_to_delay_insert` - if more than this number of compressed bytes are pending for async INSERT, the query is delayed. 0 - do not delay. Default 0.

- `max_delay_to_insert` - maximum delay, in seconds, when inserting data into a Distributed table if many bytes are pending for async send. Default 60.

- `monitor_batch_inserts` - same as [distributed_directory_monitor_batch_inserts](../../../operations/settings/settings.md#distributed_directory_monitor_batch_inserts)

- `monitor_split_batch_on_failure` - same as [distributed_directory_monitor_split_batch_on_failure](../../../operations/settings/settings.md#distributed_directory_monitor_split_batch_on_failure)

- `monitor_sleep_time_ms` - same as [distributed_directory_monitor_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_sleep_time_ms)

- `monitor_max_sleep_time_ms` - same as [distributed_directory_monitor_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_max_sleep_time_ms)

!!! note "Note"

    **Durability settings** (`fsync_...`):

    - Affect only asynchronous INSERTs (i.e. `insert_distributed_sync=false`), when data is first stored on the initiator node disk and later sent asynchronously to shards.
    - May significantly decrease insert performance.
    - Affect writing the data stored inside the Distributed table folder on the **node which accepted your insert**. If you need guarantees for writing data to the underlying MergeTree tables, see the durability settings (`...fsync...`) in `system.merge_tree_settings`.

    For **Insert limit settings** (`..._insert`) see also:

    - [insert_distributed_sync](../../../operations/settings/settings.md#insert_distributed_sync) setting
    - [prefer_localhost_replica](../../../operations/settings/settings.md#settings-prefer-localhost-replica) setting
    - `bytes_to_throw_insert` is handled before `bytes_to_delay_insert`, so you should not set it to a value less than `bytes_to_delay_insert`.
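
As a rough illustration of the insert limit settings, the sketch below delays inserts once a moderate amount of data is pending and rejects them at a higher threshold. It assumes the `logs` cluster and the `hits` table used elsewhere on this page. The table name `hits_all_limited`, the `rand()` sharding key, and all byte values are hypothetical and chosen only for the example.

``` sql
-- Hypothetical table name and arbitrary thresholds, for illustration only.
CREATE TABLE hits_all_limited AS hits
ENGINE = Distributed(logs, default, hits, rand())
SETTINGS
    bytes_to_delay_insert = 1000000000, -- delay INSERTs once ~1 GB of compressed data is pending
    max_delay_to_insert = 30,           -- delay for at most 30 seconds
    bytes_to_throw_insert = 5000000000; -- throw once ~5 GB of compressed data is pending
```

Keeping `bytes_to_throw_insert` above `bytes_to_delay_insert` follows the ordering advice in the note above.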

**Example**

``` sql
CREATE TABLE hits_all AS hits
ENGINE = Distributed(logs, default, hits)
SETTINGS
    fsync_after_insert=0,
    fsync_directories=0;
```

Data will be read from all servers in the `logs` cluster, from the `default.hits` table located on every server in the cluster.
Data is not only read but is partially processed on the remote servers (to the extent that this is possible).
For example, for a query with `GROUP BY`, data is aggregated on the remote servers, and the intermediate states of aggregate functions are sent to the requesting server, where the data is aggregated further.

Instead of the database name, you can use a constant expression that returns a string. For example: `currentDatabase()`.
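
A minimal sketch of the `currentDatabase()` variant, assuming a `logs` cluster and a local `hits` table that lives in the current database on every server. The table name `hits_all_current` is hypothetical.

``` sql
-- The database argument is a constant expression instead of a literal name.
-- No sharding key is given, so this table is suited to reads
-- (or to writes on a single-shard cluster).
CREATE TABLE hits_all_current AS hits
ENGINE = Distributed(logs, currentDatabase(), hits);
```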

## Clusters {#distributed-clusters}

Clusters are configured in the [server configuration file](../../../operations/configuration-files.md):

``` xml
<remote_servers>
    <logs>
        <!-- Inter-server per-cluster secret for Distributed queries
             default: no secret (no authentication will be performed)

             If set, then Distributed queries will be validated on shards, so at least:
             - such cluster should exist on the shard,
             - such cluster should have the same secret.

             And also (and which is more important), the initial_user will
             be used as current user for the query.
        -->
        <!-- <secret></secret> -->
        <shard>
            <!-- Optional. Shard weight when writing data. Default: 1. -->
            <weight>1</weight>
            <!-- Optional. Whether to write data to just one of the replicas. Default: false (write data to all replicas). -->
            <internal_replication>false</internal_replication>
            <replica>
                <!-- Optional. Priority of the replica for load balancing (see also load_balancing setting). Default: 1 (less value has more priority). -->
                <priority>1</priority>
                <host>example01-01-1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>example01-01-2</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard>
            <weight>2</weight>
            <internal_replication>false</internal_replication>
            <replica>
                <host>example01-02-1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>example01-02-2</host>
                <secure>1</secure>
                <port>9440</port>
            </replica>
        </shard>
    </logs>
</remote_servers>
```

Here a cluster is defined with the name `logs` that consists of two shards, each of which contains two replicas.
Shards refer to the servers that contain different parts of the data (in order to read all the data, you must access all the shards).
Replicas are duplicating servers (in order to read all the data, you can access the data on any one of the replicas).

Cluster names must not contain dots.

The parameters `host`, `port`, and optionally `user`, `password`, `secure`, `compression` are specified for each server:

- `host` – The address of the remote server. You can use either the domain or the IPv4 or IPv6 address. If you specify the domain, the server makes a DNS request when it starts, and the result is stored as long as the server is running. If the DNS request fails, the server does not start. If you change the DNS record, restart the server.
- `port` – The TCP port for inter-server communication (`tcp_port` in the config, usually set to 9000). Not to be confused with `http_port`.
- `user` – Name of the user for connecting to a remote server. Default value is the `default` user. This user must have access to connect to the specified server. Access is configured in the `users.xml` file. For more information, see the section [Access rights](../../../operations/access-rights.md).
- `password` – The password for connecting to a remote server (not masked). Default value: empty string.
- `secure` – Whether to use a secure SSL/TLS connection. Usually also requires specifying the port (the default secure port is `9440`). The server should listen on `<tcp_port_secure>9440</tcp_port_secure>` and be configured with correct certificates.
- `compression` – Use data compression. Default value: `true`.

When specifying replicas, one of the available replicas will be selected for each of the shards when reading. You can configure the algorithm for load balancing (the preference for which replica to access) – see the [load_balancing](../../../operations/settings/settings.md#settings-load_balancing) setting.
If no connection to the server is established, a connection attempt is made with a short timeout. If that attempt fails, the next replica is selected, and so on for all the replicas. If the attempt fails for all the replicas, it is repeated the same way several times.
This works in favour of resiliency, but does not provide complete fault tolerance: a remote server might accept the connection, but might not work, or work poorly.
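
The replica selection policy can be changed per query. The sketch below assumes a `Distributed` table named `hits_all` (as in the example earlier on this page) and picks the `in_order` policy purely for illustration.

``` sql
-- Prefer replicas in the order they are listed in the cluster config.
SELECT count()
FROM hits_all
SETTINGS load_balancing = 'in_order';
```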

You can specify just one of the shards (in this case, query processing should be called remote, rather than distributed) or up to any number of shards. In each shard, you can specify from one to any number of replicas. You can specify a different number of replicas for each shard.

You can specify as many clusters as you wish in the configuration.

To view your clusters, use the `system.clusters` table.
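
For instance, the following query lists the shards and replicas of the `logs` cluster from the configuration above. The column selection is only a suggestion.

``` sql
SELECT
    cluster,
    shard_num,
    shard_weight,
    replica_num,
    host_name,
    port
FROM system.clusters
WHERE cluster = 'logs'
ORDER BY shard_num, replica_num;
```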

The `Distributed` engine allows working with a cluster like a local server. However, the cluster's configuration cannot be specified dynamically; it has to be configured in the server config file. Usually, all servers in a cluster will have the same cluster config (though this is not required). Clusters from the config file are updated on the fly, without restarting the server.

If you need to send a query to an unknown set of shards and replicas each time, you do not need to create a `Distributed` table – use the `remote` table function instead. See the section [Table functions](../../../sql-reference/table-functions/index.md).
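
A minimal sketch of an ad hoc query with `remote`, assuming the host names from the configuration example above, a `default.hits` table on each host, and access as the `default` user. Comma-separated addresses are treated as separate shards.

``` sql
-- Query two servers directly, without creating a Distributed table.
SELECT count()
FROM remote('example01-01-1,example01-02-1', default.hits);
```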

## Writing data {#distributed-writing-data}

There are two methods for writing data to a cluster:

First, you can define which servers to write which data to and perform the write directly on each shard. In other words, perform direct `INSERT` statements on the remote tables in the cluster that the `Distributed` table is pointing to. This is the most flexible solution as you can use any sharding scheme, even one that is non-trivial due to the requirements of the subject area. This is also the most optimal solution since data can be written to different shards completely independently.

Second, you can perform `INSERT` statements on a `Distributed` table. In this case, the table will distribute the inserted data across the servers itself. In order to write to a `Distributed` table, it must have the `sharding_key` parameter configured (except if there is only one shard).

Each shard can have a `<weight>` defined in the config file. By default, the weight is `1`. Data is distributed across shards in the amount proportional to the shard weight. All shard weights are summed up, then each shard's weight is divided by the total to determine each shard's proportion. For example, if there are two shards and the first has a weight of 1 while the second has a weight of 2, the first will be sent one third (1 / 3) of inserted rows and the second will be sent two thirds (2 / 3).

Each shard can have the `internal_replication` parameter defined in the config file. If this parameter is set to `true`, the write operation selects the first healthy replica and writes data to it. Use this if the tables underlying the `Distributed` table are replicated tables (e.g. any of the `Replicated*MergeTree` table engines). One of the table replicas will receive the write and it will be replicated to the other replicas automatically.

If `internal_replication` is set to `false` (the default), data is written to all replicas. In this case, the `Distributed` table replicates data itself. This is worse than using replicated tables because the consistency of replicas is not checked and, over time, they will contain slightly different data.

To select the shard that a row of data is sent to, the sharding expression is evaluated, and the remainder of dividing its value by the total weight of the shards is taken. The row is sent to the shard that corresponds to the half-interval of remainders from `prev_weights` to `prev_weights + weight`, where `prev_weights` is the total weight of the shards with smaller numbers, and `weight` is the weight of this shard. For example, if there are two shards, and the first has a weight of 9 while the second has a weight of 10, the row will be sent to the first shard for the remainders from the range \[0, 9), and to the second for the remainders from the range \[9, 19).
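
The arithmetic of the 9/10 example can be reproduced on a single server. This sketch does not touch any cluster: it treats the values 0 to 1899 as sharding expression results and applies the remainder rule by hand. The column names are hypothetical.

``` sql
-- Total weight is 9 + 10 = 19.
-- Remainders [0, 9) map to shard 1, remainders [9, 19) map to shard 2.
SELECT
    if(sharding_value % 19 < 9, 1, 2) AS target_shard,
    count() AS row_count
FROM
(
    SELECT number AS sharding_value
    FROM numbers(1900)
)
GROUP BY target_shard
ORDER BY target_shard;
```

Each remainder occurs exactly 100 times in this range, so shard 1 gets 900 values and shard 2 gets 1000, matching the 9:10 weight ratio.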

The sharding expression can be any expression from constants and table columns that returns an integer. For example, you can use the expression `rand()` for random distribution of data, or `UserID` for distribution by the remainder from dividing the user’s ID (then the data of a single user will reside on a single shard, which simplifies running `IN` and `JOIN` by users). If one of the columns is not distributed evenly enough, you can wrap it in a hash function e.g. `intHash64(UserID)`.
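
The two kinds of sharding key mentioned above could look as follows. This is a sketch that assumes the `logs` cluster and a local `hits` table with a `UserID` column. Both table names are hypothetical.

``` sql
-- Random distribution: rows of one user may land on different shards.
CREATE TABLE hits_all_random AS hits
ENGINE = Distributed(logs, default, hits, rand());

-- Distribution by hashed user ID: all rows of one user land on the same shard.
CREATE TABLE hits_all_by_user AS hits
ENGINE = Distributed(logs, default, hits, intHash64(UserID));
```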

A simple remainder from the division is a limited solution for sharding and isn’t always appropriate. It works for medium and large volumes of data (dozens of servers), but not for very large volumes of data (hundreds of servers or more). In the latter case, use the sharding scheme required by the subject area rather than using entries in `Distributed` tables.

You should be concerned about the sharding scheme in the following cases:

-   Queries are used that require joining data (`IN` or `JOIN`) by a specific key. If data is sharded by this key, you can use local `IN` or `JOIN` instead of `GLOBAL IN` or `GLOBAL JOIN`, which is much more efficient.
-   A large number of servers is used (hundreds or more) with a large number of small queries, for example, queries for data of individual clients (e.g. websites, advertisers, or partners). In order for the small queries to not affect the entire cluster, it makes sense to locate data for a single client on a single shard. Alternatively, as we’ve done in Yandex.Metrica, you can set up bi-level sharding: divide the entire cluster into “layers”, where a layer may consist of multiple shards. Data for a single client is located on a single layer, but shards can be added to a layer as necessary, and data is randomly distributed within them. `Distributed` tables are created for each layer, and a single shared distributed table is created for global queries.
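
The difference between local and global `IN` can be sketched as follows, assuming a local table `hits`, a `Distributed` table `hits_all` over it, and hypothetical columns `UserID` and `CounterID`.

``` sql
-- Data is sharded by UserID: each shard runs the subquery
-- against its own local table.
SELECT uniq(UserID)
FROM hits_all
WHERE CounterID = 1
  AND UserID IN (SELECT UserID FROM hits WHERE CounterID = 2);

-- Data is not sharded by UserID: the subquery runs once on the
-- requesting server and its result is sent to every shard.
SELECT uniq(UserID)
FROM hits_all
WHERE CounterID = 1
  AND UserID GLOBAL IN (SELECT UserID FROM hits_all WHERE CounterID = 2);
```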

Data is written asynchronously. When inserted in the table, the data block is just written to the local file system. The data is sent to the remote servers in the background as soon as possible. The periodicity for sending data is managed by the [distributed_directory_monitor_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_sleep_time_ms) and [distributed_directory_monitor_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_max_sleep_time_ms) settings. The `Distributed` engine sends each file with inserted data separately, but you can enable batch sending of files with the [distributed_directory_monitor_batch_inserts](../../../operations/settings/settings.md#distributed_directory_monitor_batch_inserts) setting. This setting improves cluster performance by better utilizing local server and network resources. You should check whether data is sent successfully by checking the list of files (data waiting to be sent) in the table directory: `/var/lib/clickhouse/data/database/table/`. The number of threads performing background tasks can be set by [background_distributed_schedule_pool_size](../../../operations/settings/settings.md#background_distributed_schedule_pool_size) setting.
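
Besides listing the directory, pending data can be inspected from SQL. This sketch assumes a server version that provides the `system.distribution_queue` table and the `SYSTEM FLUSH DISTRIBUTED` statement, and a `Distributed` table named `hits_all` in the `default` database.

``` sql
-- Show how much data is still waiting to be sent for each Distributed table.
SELECT
    database,
    table,
    data_files,
    data_compressed_bytes,
    error_count,
    last_exception
FROM system.distribution_queue;

-- Force the pending data of one table to be sent now.
SYSTEM FLUSH DISTRIBUTED default.hits_all;
```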

If the server ceases to exist or has a rough restart (for example, due to a hardware failure) after an `INSERT` to a `Distributed` table, the inserted data might be lost. If a damaged data part is detected in the table directory, it is moved to the `broken` subdirectory and no longer used.

## Reading data {#distributed-reading-data}

When querying a `Distributed` table, `SELECT` queries are sent to all shards and work regardless of how data is distributed across the shards (they can be distributed completely randomly). When you add a new shard, you do not have to transfer old data into it. Instead, you can write new data to it by using a heavier weight – the data will be distributed slightly unevenly, but queries will work correctly and efficiently.

When the `max_parallel_replicas` option is enabled, query processing is parallelized across all replicas within a single shard. For more information, see the section [max_parallel_replicas](../../../operations/settings/settings.md#settings-max_parallel_replicas).
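
A minimal sketch of enabling the option for a single query, assuming a `Distributed` table named `hits_all`. The value `2` is arbitrary. See the linked setting description for its requirements.

``` sql
SELECT count()
FROM hits_all
SETTINGS max_parallel_replicas = 2;
```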

## Virtual Columns {#virtual-columns}

-   `_shard_num` — Contains the `shard_num` value from the table `system.clusters`. Type: [UInt32](../../../sql-reference/data-types/int-uint.md).

!!! note "Note"
    Since the [remote](../../../sql-reference/table-functions/remote.md) and [cluster](../../../sql-reference/table-functions/cluster.md) table functions internally create a temporary Distributed table, `_shard_num` is available there too.
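
For example, `_shard_num` can be used to see how rows are spread across shards. The sketch assumes a `Distributed` table named `hits_all`.

``` sql
SELECT
    _shard_num,
    count() AS row_count
FROM hits_all
GROUP BY _shard_num
ORDER BY _shard_num;
```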

**See Also**

-   [Virtual columns](../../../engines/table-engines/index.md#table_engines-virtual_columns) description
-   [background_distributed_schedule_pool_size](../../../operations/settings/settings.md#background_distributed_schedule_pool_size) setting
-   [shardNum()](../../../sql-reference/functions/other-functions.md#shard-num) and [shardCount()](../../../sql-reference/functions/other-functions.md#shard-count) functions
-												Get rid of toc_en.yml (#10023)


											
										
										
											2020-04-03 13:23:32 +00:00
+								---
 								toc_priority: 33
 								toc_title: Distributed
 								---
-												Update distributed.md
											
										
										
											2020-06-10 20:18:36 +00:00
+								# Distributed Table Engine {#distributed}
-												Sources for english documentation switched to Markdown.
Edit page link is fixed too for both language versions of documentation.

											
										
										
											2017-12-28 15:13:23 +00:00
-												Grammar suggestions to distributed.md

* fixed some typos.
* improved wording of some statements.
											
										
										
											2021-10-20 20:35:17 +00:00
+								Tables with Distributed engine do not store any data of their own, but allow distributed query processing on multiple servers.
-												Table engines

											
										
										
											2017-04-26 17:26:17 +00:00
+								Reading is automatically parallelized. During a read, the table indexes on remote servers are used, if there are any.
-												Add ability to use multiple disks/volumes for Distributed engine

Now Distributed() has gain the 5-th argument -- policy name (for storing
data to send):

  CREATE TABLE foo (key Int) Engine=Distributed(test_shard_localhost, currentDatabase(), some_table, key%2, 'default');

											
										
										
											2020-01-20 17:54:52 +00:00
-												Add sections to Distributed documentation.

											
										
										
											2021-12-10 18:29:15 +00:00
+								## Creating a Table {#distributed-creating-a-table}
-												Add ability to use multiple disks/volumes for Distributed engine

Now Distributed() has gain the 5-th argument -- policy name (for storing
data to send):

  CREATE TABLE foo (key Int) Engine=Distributed(test_shard_localhost, currentDatabase(), some_table, key%2, 'default');

											
										
										
											2020-01-20 17:54:52 +00:00
-												Add sections to Distributed documentation.

											
										
										
											2021-12-10 18:29:15 +00:00
+								``` sql
 								CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
 								(
 								    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
 								    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
 								    ...
 								) ENGINE = Distributed(cluster, database, table[, sharding_key[, policy_name]])
 								[SETTINGS name=value, ...]
 								```
 								### From a Table {#distributed-from-a-table}
 								When the `Distributed` table is pointing to a table on the current server you can adopt that table's schema:
 								``` sql
 								CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] AS [db2.]name2 ENGINE = Distributed(cluster, database, table[, sharding_key[, policy_name]]) [SETTINGS name=value, ...]
 								```
 								**Distributed Parameters**
-												Normalization for en markdown (#9763)


											
										
										
											2020-03-20 10:10:48 +00:00
-												Add sections to Distributed documentation.

											
										
										
											2021-12-10 18:29:15 +00:00
+								-   `cluster` - the cluster name in the server’s config file
-												Normalization for en markdown (#9763)


											
										
										
											2020-03-20 10:10:48 +00:00
-												Add sections to Distributed documentation.

											
										
										
											2021-12-10 18:29:15 +00:00
+								-   `database` - the name of a remote database
-												Normalization for en markdown (#9763)


											
										
										
											2020-03-20 10:10:48 +00:00
-												Add sections to Distributed documentation.

											
										
										
											2021-12-10 18:29:15 +00:00
+								-   `table` - the name of a remote table
-												Normalization for en markdown (#9763)


											
										
										
											2020-03-20 10:10:48 +00:00
-												Add sections to Distributed documentation.

											
										
										
											2021-12-10 18:29:15 +00:00
+								-   `sharding_key` - (optionally) sharding key
-												Add ability to use multiple disks/volumes for Distributed engine

Now Distributed() has gain the 5-th argument -- policy name (for storing
data to send):

  CREATE TABLE foo (key Int) Engine=Distributed(test_shard_localhost, currentDatabase(), some_table, key%2, 'default');

											
										
										
											2020-01-20 17:54:52 +00:00
-												Add sections to Distributed documentation.

											
										
										
											2021-12-10 18:29:15 +00:00
+								-   `policy_name` - (optionally) policy name, it will be used to store temporary files for async send
-												Fix list formatting in Distributed docs.

											
										
										
											2021-12-16 12:31:28 +00:00
+								**See Also**
-												Normalization for en markdown (#9763)


											
										
										
											2020-03-20 10:10:48 +00:00
-												Fix list formatting in Distributed docs.

											
										
										
											2021-12-16 12:31:28 +00:00
+								 - [insert_distributed_sync](../../../operations/settings/settings.md#insert_distributed_sync) setting
 								 - [MergeTree](../../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-multiple-volumes) for the examples
-												Add ability to use multiple disks/volumes for Distributed engine

Now Distributed() has gain the 5-th argument -- policy name (for storing
data to send):

  CREATE TABLE foo (key Int) Engine=Distributed(test_shard_localhost, currentDatabase(), some_table, key%2, 'default');

											
										
										
											2020-01-20 17:54:52 +00:00
-												Add sections to Distributed documentation.

											
										
										
											2021-12-10 18:29:15 +00:00
+								**Distributed Settings**
-												Add fsync support for Distributed engine.

Two new settings (by analogy with MergeTree family) has been added:

- `fsync_after_insert` - Do fsync for every inserted. Will decreases
  performance of inserts.

- `fsync_tmp_directory` - Do fsync for temporary directory (that is used
  for async INSERT only) after all part operations (writes, renames,
  etc.).

Refs: #17380 (p1)

											
										
										
											2021-01-07 14:14:41 +00:00
-												Update documentation for Distributed fsync settings.

											
										
										
											2021-01-08 23:42:15 +00:00
+								- `fsync_after_insert` - do the `fsync` for the file data after asynchronous insert to Distributed. Guarantees that the OS flushed the whole inserted data to a file **on the initiator node** disk.
-												Add fsync support for Distributed engine.

Two new settings (by analogy with MergeTree family) has been added:

- `fsync_after_insert` - Do fsync for every inserted. Will decreases
  performance of inserts.

- `fsync_tmp_directory` - Do fsync for temporary directory (that is used
  for async INSERT only) after all part operations (writes, renames,
  etc.).

Refs: #17380 (p1)

											
										
										
											2021-01-07 14:14:41 +00:00
-												Rename fsync_tmp_directory to fsync_directories for Distributed engine

											
										
										
											2021-01-09 14:51:30 +00:00
+								- `fsync_directories` - do the `fsync` for directories. Guarantees that the OS refreshed directory metadata after operations related to asynchronous inserts on Distributed table (after insert, after sending the data to shard, etc).
-												Update documentation for Distributed fsync settings.

											
										
										
											2021-01-08 23:42:15 +00:00
-												Distributed: Add ability to delay/throttle INSERT until pending data will be reduced

Add two new settings for the Distributed engine:
- bytes_to_delay_insert
- max_delay_to_insert

If at the beginning of INSERT there will be too much pending data, more
then bytes_to_delay_insert, then the INSERT will wait until it will be
shrinked, and not more then max_delay_to_insert seconds.

If after this there will be still too much pending, it will throw an
exception.

Also new profile events were added (by analogy to the MergeTree):
- DistributedDelayedInserts (although you can use system.errors instead
  of this, but still)
- DistributedRejectedInserts
- DistributedDelayedInsertsMilliseconds

											
										
										
											2021-01-27 18:43:41 +00:00
+								- `bytes_to_throw_insert` - if more than this number of compressed bytes will be pending for async INSERT, an exception will be thrown. 0 - do not throw. Default 0.
-												Distributed: Add ability to limit amount of pending bytes for async INSERT

Right now with distributed_directory_monitor_batch_inserts=1 and
insert_distributed_sync=0 INSERT into Distributed table will store
blocks that should be sent to remote (and in case of
prefer_localhost_replica=0 to the localhost too) on the local
filesystem, and sent it in background.

However there is no limit for this storage, and if the remote is
unavailable (or some other error), these pending blocks may take
significant space, and this is not always desired behaviour.

Add new Distributed setting - bytes_to_throw_insert, that will set the
limit for how much pending bytes is allowed, if the limit will be
reached an exception will be throw.

By default was set to 0, to avoid surprises.

											
										
										
											2021-01-26 18:45:37 +00:00
-												Distributed: Add ability to delay/throttle INSERT until pending data will be reduced

Add two new settings for the Distributed engine:
- bytes_to_delay_insert
- max_delay_to_insert

If at the beginning of INSERT there will be too much pending data, more
then bytes_to_delay_insert, then the INSERT will wait until it will be
shrinked, and not more then max_delay_to_insert seconds.

If after this there will be still too much pending, it will throw an
exception.

Also new profile events were added (by analogy to the MergeTree):
- DistributedDelayedInserts (although you can use system.errors instead
  of this, but still)
- DistributedRejectedInserts
- DistributedDelayedInsertsMilliseconds

											
										
										
											2021-01-27 18:43:41 +00:00
+								- `bytes_to_delay_insert` - if more than this number of compressed bytes will be pending for async INSERT, the query will be delayed. 0 - do not delay. Default 0.
-												Distributed: Add ability to limit amount of pending bytes for async INSERT

Right now with distributed_directory_monitor_batch_inserts=1 and
insert_distributed_sync=0 INSERT into Distributed table will store
blocks that should be sent to remote (and in case of
prefer_localhost_replica=0 to the localhost too) on the local
filesystem, and sent it in background.

However there is no limit for this storage, and if the remote is
unavailable (or some other error), these pending blocks may take
significant space, and this is not always desired behaviour.

Add new Distributed setting - bytes_to_throw_insert, that will set the
limit for how much pending bytes is allowed, if the limit will be
reached an exception will be throw.

By default was set to 0, to avoid surprises.

											
										
										
											2021-01-26 18:45:37 +00:00
-												Distributed: Add ability to delay/throttle INSERT until pending data will be reduced

Add two new settings for the Distributed engine:
- bytes_to_delay_insert
- max_delay_to_insert

If at the beginning of INSERT there will be too much pending data, more
then bytes_to_delay_insert, then the INSERT will wait until it will be
shrinked, and not more then max_delay_to_insert seconds.

If after this there will be still too much pending, it will throw an
exception.

Also new profile events were added (by analogy to the MergeTree):
- DistributedDelayedInserts (although you can use system.errors instead
  of this, but still)
- DistributedRejectedInserts
- DistributedDelayedInsertsMilliseconds

											
										
										
- `max_delay_to_insert` - maximum delay, in seconds, of inserting data into a Distributed table when there are a lot of pending bytes for asynchronous send. Default: 60.

- `monitor_batch_inserts` - same as [distributed_directory_monitor_batch_inserts](../../../operations/settings/settings.md#distributed_directory_monitor_batch_inserts)

- `monitor_split_batch_on_failure` - same as [distributed_directory_monitor_split_batch_on_failure](../../../operations/settings/settings.md#distributed_directory_monitor_split_batch_on_failure)

- `monitor_sleep_time_ms` - same as [distributed_directory_monitor_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_sleep_time_ms)

- `monitor_max_sleep_time_ms` - same as [distributed_directory_monitor_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_max_sleep_time_ms)

											
										
										
!!! note "Note"

    **Durability settings** (`fsync_...`):

    - Affect only asynchronous INSERTs (i.e. `insert_distributed_sync=false`), when data is first stored on the initiator node's disk and later sent to shards asynchronously.
    - May significantly decrease insert performance.
    - Affect writing the data stored inside the Distributed table folder on the **node which accepted your insert**. If you need guarantees for writing data to the underlying MergeTree tables, see the durability settings (`...fsync...`) in `system.merge_tree_settings`.

    For **Insert limit settings** (`..._insert`) see also:

    - [insert_distributed_sync](../../../operations/settings/settings.md#insert_distributed_sync) setting
    - [prefer_localhost_replica](../../../operations/settings/settings.md#settings-prefer-localhost-replica) setting
    - `bytes_to_throw_insert` is handled before `bytes_to_delay_insert`, so you should not set it to a value less than `bytes_to_delay_insert`.

											
										
										
**Example**

``` sql
CREATE TABLE hits_all AS hits
ENGINE = Distributed(logs, default, hits[, sharding_key[, policy_name]])
SETTINGS
    fsync_after_insert=0,
    fsync_directories=0;
```

											
										
										
Data will be read from all servers in the `logs` cluster, from the `default.hits` table located on every server in the cluster.
Data is not only read but is partially processed on the remote servers (to the extent that this is possible).
For example, for a query with `GROUP BY`, data will be aggregated on remote servers, and the intermediate states of aggregate functions will be sent to the requestor server, where the data is aggregated further.

											
										
										
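The two-stage aggregation can be pictured with the `-State` and `-Merge` aggregate function combinators. The sketch below is only an analogy written as a single local query, not what the server literally executes; the `CounterID` and `UserID` columns are assumed for illustration.

``` sql
-- Query sent to the Distributed table by the client.
SELECT CounterID, uniq(UserID) AS users
FROM hits_all
GROUP BY CounterID;

-- Conceptual equivalent: the inner query plays the role of one shard,
-- returning intermediate aggregation states instead of final values;
-- the outer query plays the role of the requestor server, merging the states.
SELECT CounterID, uniqMerge(users_state) AS users
FROM
(
    SELECT CounterID, uniqState(UserID) AS users_state
    FROM hits
    GROUP BY CounterID
)
GROUP BY CounterID;
```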

											
										
										
Instead of the database name, you can use a constant expression that returns a string. For example: `currentDatabase()`.

											
										
										
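As a sketch, the following definition resolves the database name with `currentDatabase()` and also sets the insert limit settings. The table name `hits_all_limited`, the `rand()` sharding key, and the byte thresholds are illustrative assumptions, not recommended values.

``` sql
CREATE TABLE hits_all_limited AS hits
ENGINE = Distributed(logs, currentDatabase(), hits, rand())
SETTINGS
    -- Start delaying INSERTs once about 1 GiB is pending for asynchronous send...
    bytes_to_delay_insert = 1073741824,
    -- ...for at most 60 seconds per INSERT (the default).
    max_delay_to_insert = 60,
    -- Reject INSERTs once about 10 GiB is pending; keep this value
    -- not less than bytes_to_delay_insert.
    bytes_to_throw_insert = 10737418240;
```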

											
										
										
## Clusters {#distributed-clusters}

Clusters are configured in the [server configuration file](../../../operations/configuration-files.md):

											
										
										
``` xml
<remote_servers>
    <logs>
        <!-- Inter-server per-cluster secret for Distributed queries
             default: no secret (no authentication will be performed)

             If set, then Distributed queries will be validated on shards, so at least:
             - such cluster should exist on the shard,
             - such cluster should have the same secret.

             More importantly, the initial_user will
             be used as the current user for the query.
        -->
        <!-- <secret></secret> -->
        <shard>
            <!-- Optional. Shard weight when writing data. Default: 1. -->
            <weight>1</weight>
            <!-- Optional. Whether to write data to just one of the replicas. Default: false (write data to all replicas). -->
            <internal_replication>false</internal_replication>
            <replica>
                <!-- Optional. Priority of the replica for load balancing (see also load_balancing setting). Default: 1 (a lower value means a higher priority). -->
                <priority>1</priority>
                <host>example01-01-1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>example01-01-2</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard>
            <weight>2</weight>
            <internal_replication>false</internal_replication>
            <replica>
                <host>example01-02-1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>example01-02-2</host>
                <secure>1</secure>
                <port>9440</port>
            </replica>
        </shard>
    </logs>
</remote_servers>
```
											
										
										
Here a cluster is defined with the name `logs` that consists of two shards, each of which contains two replicas.
Shards refer to the servers that contain different parts of the data (in order to read all the data, you must access all the shards).
Replicas are duplicating servers (in order to read all the data, you can access the data on any one of the replicas).

											
										
										
Cluster names must not contain dots.

											
										
										
The parameters `host`, `port`, and optionally `user`, `password`, `secure`, `compression` are specified for each server:

- `host` - The address of the remote server. You can use either the domain or the IPv4 or IPv6 address. If you specify the domain, the server makes a DNS request when it starts, and the result is stored as long as the server is running. If the DNS request fails, the server does not start. If you change the DNS record, restart the server.
- `port` - The TCP port for communication between servers (`tcp_port` in the config, usually set to 9000). Not to be confused with `http_port`.
- `user` - Name of the user for connecting to a remote server. Default value is the `default` user. This user must have access to connect to the specified server. Access is configured in the `users.xml` file. For more information, see the section [Access rights](../../../operations/access-rights.md).
- `password` - The password for connecting to a remote server (not masked). Default value: empty string.
- `secure` - Whether to use a secure SSL/TLS connection. Usually also requires specifying the port (the default secure port is `9440`). The server should listen on `<tcp_port_secure>9440</tcp_port_secure>` and be configured with correct certificates.
- `compression` - Use data compression. Default value: `true`.


											
										
										
When specifying replicas, one of the available replicas will be selected for each of the shards when reading. You can configure the algorithm for load balancing (the preference for which replica to access); see the [load_balancing](../../../operations/settings/settings.md#settings-load_balancing) setting.

If a connection to the server is not established, a connection attempt is made with a short timeout. If that attempt fails, the next replica is selected, and so on for all the replicas. If the connection attempt fails for all the replicas, the attempt is repeated the same way several times.
This works in favour of resiliency, but does not provide complete fault tolerance: a remote server might accept the connection, but might not work, or work poorly.

											
										
										
You can specify a single shard (in this case, query processing should be called remote, rather than distributed) or any number of shards. In each shard, you can specify one or more replicas. You can specify a different number of replicas for each shard.

You can specify as many clusters as you wish in the configuration.

To view your clusters, use the `system.clusters` table.

											
										
										
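For instance, a query along these lines lists the shards and replicas of the `logs` cluster from the configuration example. The selected columns are a subset of `system.clusters`; the exact set of columns available may differ between server versions.

``` sql
SELECT
    cluster,
    shard_num,
    shard_weight,
    replica_num,
    host_name,
    port
FROM system.clusters
WHERE cluster = 'logs'
ORDER BY shard_num, replica_num;
```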

											
										
										
The `Distributed` engine allows working with a cluster like a local server. However, the cluster's configuration cannot be specified dynamically; it has to be configured in the server config file. Usually, all servers in a cluster will have the same cluster config (though this is not required). Clusters from the config file are updated on the fly, without restarting the server.

If you need to send a query to an unknown set of shards and replicas each time, you do not need to create a `Distributed` table; use the `remote` table function instead. See the section [Table functions](../../../sql-reference/table-functions/index.md).

											
										
										
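A minimal sketch of such an ad hoc query with the `remote` table function follows. It reuses host names from the cluster example and assumes that the `default.hits` table exists on both hosts and that the `default` user can connect without a password.

``` sql
-- Two comma-separated addresses are treated as two shards,
-- so the result covers the data of both servers.
SELECT count()
FROM remote('example01-01-1,example01-02-1', default.hits);
```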

											
										
										
## Writing data {#distributed-writing-data}

There are two methods for writing data to a cluster:

First, you can define which servers to write which data to and perform the write directly on each shard. In other words, perform direct `INSERT` statements on the remote tables in the cluster that the `Distributed` table is pointing to. This is the most flexible solution as you can use any sharding scheme, even one that is non-trivial due to the requirements of the subject area. This is also the most efficient solution since data can be written to different shards completely independently.

Second, you can perform `INSERT` statements on a `Distributed` table. In this case, the table will distribute the inserted data across the servers itself. In order to write to a `Distributed` table, it must have the `sharding_key` parameter configured (except if there is only one shard).

											
										
										
Each shard can have a `<weight>` defined in the config file. By default, the weight is `1`. Data is distributed across shards in the amount proportional to the shard weight. All shard weights are summed up, then each shard's weight is divided by the total to determine each shard's proportion. For example, if there are two shards and the first has a weight of 1 while the second has a weight of 2, the first will be sent one third (1 / 3) of inserted rows and the second will be sent two thirds (2 / 3).

Each shard can have the `internal_replication` parameter defined in the config file. If this parameter is set to `true`, the write operation selects the first healthy replica and writes data to it. Use this if the tables underlying the `Distributed` table are replicated tables (e.g. any of the `Replicated*MergeTree` table engines). One of the table replicas will receive the write and it will be replicated to the other replicas automatically.

If `internal_replication` is set to `false` (the default), data is written to all replicas. In this case, the `Distributed` table replicates data itself. This is worse than using replicated tables because the consistency of replicas is not checked and, over time, they will contain slightly different data.

To select the shard that a row of data is sent to, the sharding expression is evaluated, and the remainder from dividing its value by the total weight of the shards is taken. The row is sent to the shard that corresponds to the half-interval of the remainders from `prev_weights` to `prev_weights + weight`, where `prev_weights` is the total weight of the shards with smaller numbers, and `weight` is the weight of this shard. For example, if there are two shards, and the first has a weight of 9 while the second has a weight of 10, the row will be sent to the first shard for the remainders from the range \[0, 9), and to the second for the remainders from the range \[9, 19).

											
										
										
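The interval rule for weights 9 and 10 can be checked with plain arithmetic. The query below is a sketch that uses the `numbers` table function as a stand-in for sharding key values and hard-codes the two weights; it only mimics the rule and does not touch a `Distributed` table.

``` sql
SELECT
    number AS sharding_value,
    -- Remainder from dividing by the total weight of the shards (9 + 10).
    sharding_value % (9 + 10) AS remainder,
    -- Remainders in [0, 9) go to the first shard, [9, 19) to the second.
    if(remainder < 9, 1, 2) AS shard
FROM numbers(40);
```

Grouping by `shard` over a whole number of cycles, for example `numbers(1900)`, gives row counts in the ratio 9 : 10.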

											
										
										
The sharding expression can be any expression from constants and table columns that returns an integer. For example, you can use the expression `rand()` for random distribution of data, or `UserID` for distribution by the remainder from dividing the user's ID (then the data of a single user will reside on a single shard, which simplifies running `IN` and `JOIN` by users). If one of the columns is not distributed evenly enough, you can wrap it in a hash function, e.g. `intHash64(UserID)`.

											
										
										
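As an illustration, the following sketch declares a `Distributed` table sharded by a hash of the user ID. The table name `hits_by_user` is hypothetical, and the underlying `hits` table is assumed to have a `UserID` column.

``` sql
CREATE TABLE hits_by_user AS hits
ENGINE = Distributed(logs, default, hits, intHash64(UserID));
```

All rows with the same `UserID` then produce the same remainder and are therefore sent to the same shard.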

											
										
										
A simple remainder from the division is a limited solution for sharding and isn't always appropriate. It works for medium and large volumes of data (dozens of servers), but not for very large volumes of data (hundreds of servers or more). In the latter case, use the sharding scheme required by the subject area rather than using entries in `Distributed` tables.

You should be concerned about the sharding scheme in the following cases:

-   Queries are used that require joining data (`IN` or `JOIN`) by a specific key. If data is sharded by this key, you can use local `IN` or `JOIN` instead of `GLOBAL IN` or `GLOBAL JOIN`, which is much more efficient.
-   A large number of servers is used (hundreds or more) with a large number of small queries, for example, queries for data of individual clients (e.g. websites, advertisers, or partners). In order for the small queries to not affect the entire cluster, it makes sense to locate data for a single client on a single shard. Alternatively, as we've done in Yandex.Metrica, you can set up bi-level sharding: divide the entire cluster into "layers", where a layer may consist of multiple shards. Data for a single client is located on a single layer, but shards can be added to a layer as necessary, and data is randomly distributed within them. `Distributed` tables are created for each layer, and a single shared distributed table is created for global queries.

											
										
										
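The difference between the local and the `GLOBAL` form can be sketched as follows, assuming a `Distributed` table `hits_all` over local `hits` tables with hypothetical `CounterID` and `UserID` columns.

``` sql
-- Data is sharded by UserID: every shard can evaluate the subquery
-- against its own local table, because matching users live on that shard.
SELECT uniq(UserID)
FROM hits_all
WHERE CounterID = 101500
  AND UserID IN (SELECT UserID FROM hits WHERE CounterID = 34);

-- Data is not sharded by UserID: the subquery runs once over the whole
-- cluster and its result is sent to every shard, which costs more.
SELECT uniq(UserID)
FROM hits_all
WHERE CounterID = 101500
  AND UserID GLOBAL IN (SELECT UserID FROM hits_all WHERE CounterID = 34);
```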
											
										
										
Data is written asynchronously. When data is inserted into the table, the data block is just written to the local file system. The data is sent to the remote servers in the background as soon as possible. The periodicity for sending data is managed by the [distributed_directory_monitor_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_sleep_time_ms) and [distributed_directory_monitor_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_max_sleep_time_ms) settings. The `Distributed` engine sends each file with inserted data separately, but you can enable batch sending of files with the [distributed_directory_monitor_batch_inserts](../../../operations/settings/settings.md#distributed_directory_monitor_batch_inserts) setting. This setting improves cluster performance by better utilizing local server and network resources.

To check whether data has been sent successfully, look at the list of files (data waiting to be sent) in the table directory: `/var/lib/clickhouse/data/database/table/`. The number of threads performing background tasks can be set by the [background_distributed_schedule_pool_size](../../../operations/settings/settings.md#background_distributed_schedule_pool_size) setting.

											
										
										
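Besides looking at the files on disk, pending data can also be inspected from SQL. The sketch below assumes a server version that provides the `system.distribution_queue` table and the `SYSTEM FLUSH DISTRIBUTED` statement, and uses the `hits_all` table from the earlier example.

``` sql
-- Files and bytes still waiting to be sent for the Distributed table.
SELECT database, table, data_files, data_compressed_bytes, error_count, last_exception
FROM system.distribution_queue
WHERE table = 'hits_all';

-- Send the pending data to the shards synchronously.
SYSTEM FLUSH DISTRIBUTED hits_all;
```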

											
										
										
If the server ceases to exist or has a rough restart (for example, due to a hardware failure) after an `INSERT` to a `Distributed` table, the inserted data might be lost. If a damaged data part is detected in the table directory, it is transferred to the `broken` subdirectory and no longer used.

## Reading data {#distributed-reading-data}

When querying a `Distributed` table, `SELECT` queries are sent to all shards and work regardless of how data is distributed across the shards (they can be distributed completely randomly). When you add a new shard, you do not have to transfer old data into it. Instead, you can write new data to it by using a heavier weight: the data will be distributed slightly unevenly, but queries will work correctly and efficiently.

When the `max_parallel_replicas` option is enabled, query processing is parallelized across all replicas within a single shard. For more information, see the section [max_parallel_replicas](../../../operations/settings/settings.md#settings-max_parallel_replicas).

											
										
										
											2017-12-28 15:13:23 +00:00
-												Normalization for en markdown (#9763)


											
										
										
											2020-03-20 10:10:48 +00:00
+								## Virtual Columns {#virtual-columns}
-												Add _shard_num virtual column for the Distributed engine

With JOIN from system.clusters one can figure out from which server data
came.

TODO:
- optimization to avoid communicating with unrelated shards (for queries
  like "AND _shard_num = n")
- fix aliases (see tests with serverError expected)

v0: AddingConstColumnBlockInputStream
v2: VirtualColumnUtils::rewriteEntityInAst
v3: fix remote(Distributed) by appending _shard_num only if has been requested

											
										
										
											2019-09-18 21:17:00 +00:00
-												Virtual column in Distributed updated, link fixed, links added
Translated that part

											
										
										
											2021-10-09 19:17:02 +00:00
+								-   `_shard_num` — Contains the `shard_num` value from the table `system.clusters`. Type: [UInt32](../../../sql-reference/data-types/int-uint.md).

!!! note "Note"
    Since the [remote](../../../sql-reference/table-functions/remote.md) and [cluster](../../../sql-reference/table-functions/cluster.md) table functions internally create a temporary Distributed table, `_shard_num` is available there too.

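To illustrate the virtual column, the following sketch counts rows per shard by grouping on `_shard_num`. The table name `distributed_table` is hypothetical and stands for any table with the Distributed engine.

``` sql
-- Count how many rows each shard contributes to the result.
-- distributed_table is a hypothetical Distributed table.
SELECT
    _shard_num,
    count() AS row_count
FROM distributed_table
GROUP BY _shard_num
ORDER BY _shard_num;
```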
**See Also**

											
										
										
-   [Virtual columns](../../../engines/table-engines/index.md#table_engines-virtual_columns) description
-   [background_distributed_schedule_pool_size](../../../operations/settings/settings.md#background_distributed_schedule_pool_size) setting
-   [shardNum()](../../../sql-reference/functions/other-functions.md#shard-num) and [shardCount()](../../../sql-reference/functions/other-functions.md#shard-count) functions