ClickHouse/docs/en/sql-reference/table-functions/remote.md

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

161 lines
7.3 KiB
Markdown
Raw Normal View History

2020-04-03 13:23:32 +00:00
---
2022-08-28 14:53:34 +00:00
slug: /en/sql-reference/table-functions/remote
sidebar_position: 40
sidebar_label: remote
2020-04-03 13:23:32 +00:00
---
2022-06-02 10:55:18 +00:00
# remote, remoteSecure
2022-07-13 19:07:20 +00:00
Allows accessing remote servers, including migration of data, without creating a [Distributed](../../engines/table-engines/special/distributed.md) table. `remoteSecure` - same as `remote` but with a secured connection.
2021-01-19 22:39:12 +00:00
Both functions can be used in `SELECT` and `INSERT` queries.
2022-07-13 19:07:20 +00:00
## Syntax
2020-03-20 10:10:48 +00:00
``` sql
remote('addresses_expr', db, table[, 'user'[, 'password'], sharding_key])
remote('addresses_expr', db.table[, 'user'[, 'password'], sharding_key])
remoteSecure('addresses_expr', db, table[, 'user'[, 'password'], sharding_key])
remoteSecure('addresses_expr', db.table[, 'user'[, 'password'], sharding_key])
```
2022-07-13 19:07:20 +00:00
## Parameters
2021-01-19 22:39:12 +00:00
2021-07-29 15:20:55 +00:00
- `addresses_expr` — An expression that generates addresses of remote servers. This may be just one server address. The server address is `host:port`, or just `host`.
2021-07-29 15:27:50 +00:00
2021-07-29 15:20:55 +00:00
The host can be specified as the server name, or as the IPv4 or IPv6 address. An IPv6 address is specified in square brackets.
2021-07-29 15:27:50 +00:00
The port is the TCP port on the remote server. If the port is omitted, it uses [tcp_port](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-tcp_port) from the servers config file in `remote` (by default, 9000) and [tcp_port_secure](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-tcp_port_secure) in `remoteSecure` (by default, 9440).
2021-01-19 22:39:12 +00:00
2021-01-19 23:02:46 +00:00
The port is required for an IPv6 address.
Type: [String](../../sql-reference/data-types/string.md).
2021-01-19 22:39:12 +00:00
- `db` — Database name. Type: [String](../../sql-reference/data-types/string.md).
- `table` — Table name. Type: [String](../../sql-reference/data-types/string.md).
- `user` — User name. If the user is not specified, `default` is used. Type: [String](../../sql-reference/data-types/string.md).
- `password` — User password. If the password is not specified, an empty password is used. Type: [String](../../sql-reference/data-types/string.md).
- `sharding_key` — Sharding key to support distributing data across nodes. For example: `insert into remote('127.0.0.1:9000,127.0.0.2', db, table, 'default', rand())`. Type: [UInt32](../../sql-reference/data-types/int-uint.md).
2021-01-19 22:39:12 +00:00
2022-07-13 19:07:20 +00:00
## Returned value
2021-01-19 22:39:12 +00:00
The dataset from remote servers.
2021-01-19 22:39:12 +00:00
2022-07-13 19:07:20 +00:00
## Usage
2022-07-13 19:07:20 +00:00
Unless you are migrating data from one system to another, using the `remote` table function is less optimal than creating a `Distributed` table because in this case the server connection is re-established for every request. Also, if hostnames are set, the names are resolved, and errors are not counted when working with various replicas. When processing a large number of queries, always create the `Distributed` table ahead of time, and do not use the `remote` table function.
2021-01-19 22:39:12 +00:00
The `remote` table function can be useful in the following cases:
2022-07-13 19:07:20 +00:00
- Migrating data from one system to another
- Accessing a specific server for data comparison, debugging, and testing.
- Queries between various ClickHouse clusters for research purposes.
- Infrequent distributed requests that are made manually.
- Distributed requests where the set of servers is re-defined each time.
### Addresses
2020-03-20 10:10:48 +00:00
``` text
example01-01-1
2022-07-13 19:07:20 +00:00
example01-01-1:9440
example01-01-1:9000
localhost
127.0.0.1
2022-07-13 19:07:20 +00:00
[::]:9440
[::]:9000
[2a02:6b8:0:1111::11]:9000
```
Multiple addresses can be comma-separated. In this case, ClickHouse will use distributed processing, so it will send the query to all specified addresses (like shards with different data). Example:
2020-03-20 10:10:48 +00:00
``` text
example01-01-1,example01-02-1
```
2022-07-13 19:07:20 +00:00
## Examples
2022-07-13 19:07:20 +00:00
### Selecting data from a remote server:
2021-01-19 22:39:12 +00:00
``` sql
SELECT * FROM remote('127.0.0.1', db.remote_engine_table) LIMIT 3;
```
2022-07-13 19:07:20 +00:00
### Inserting data from a remote server into a table:
2018-09-06 18:15:16 +00:00
2021-01-19 22:39:12 +00:00
``` sql
CREATE TABLE remote_table (name String, value UInt32) ENGINE=Memory;
INSERT INTO FUNCTION remote('127.0.0.1', currentDatabase(), 'remote_table') VALUES ('test', 42);
SELECT * FROM remote_table;
2021-01-19 22:39:12 +00:00
```
2022-07-13 19:07:20 +00:00
### Migration of tables from one system to another:
This example uses one table from a sample dataset. The database is `imdb`, and the table is `actors`.
#### On the source ClickHouse system (the system that currently hosts the data)
2022-07-13 23:27:33 +00:00
- Verify the source database and table name (`imdb.actors`)
```sql
show databases
```
2022-07-13 19:07:20 +00:00
2022-07-13 23:27:33 +00:00
```sql
show tables in imdb
```
2022-07-13 19:07:20 +00:00
- Get the CREATE TABLE statement from the source:
2022-07-13 23:27:33 +00:00
```
select create_table_query
from system.tables
where database = 'imdb' and table = 'actors'
```
Response
```sql
CREATE TABLE imdb.actors (`id` UInt32,
`first_name` String,
`last_name` String,
`gender` FixedString(1))
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}')
ORDER BY (id, first_name, last_name, gender)
SETTINGS index_granularity = 8192
```
2022-07-13 19:07:20 +00:00
#### On the destination ClickHouse system:
- Create the destination database:
2022-07-13 23:27:33 +00:00
```sql
CREATE DATABASE imdb
```
2022-07-13 19:07:20 +00:00
- Using the CREATE TABLE statement from the source, create the destination:
2022-07-13 23:27:33 +00:00
```sql
CREATE TABLE imdb.actors (`id` UInt32,
`first_name` String,
`last_name` String,
`gender` FixedString(1))
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}')
ORDER BY (id, first_name, last_name, gender)
SETTINGS index_granularity = 8192
```
2022-07-13 19:07:20 +00:00
#### Back on the source deployment:
Insert into the new database and table created on the remote system. You will need the host, port, username, password, destination database, and destination table.
```sql
INSERT INTO FUNCTION
remoteSecure('remote.clickhouse.cloud:9440', 'imdb.actors', 'USER', 'PASSWORD', rand())
SELECT * from imdb.actors
```
2022-07-13 23:27:33 +00:00
## Globs in Addresses {#globs-in-addresses}
2021-07-29 15:20:55 +00:00
Patterns in curly brackets `{ }` are used to generate a set of shards and to specify replicas. If there are multiple pairs of curly brackets, then the direct product of the corresponding sets is generated.
The following pattern types are supported.
- {*a*,*b*} - Any number of variants separated by a comma. The pattern is replaced with *a* in the first shard address and it is replaced with *b* in the second shard address and so on. For instance, `example0{1,2}-1` generates addresses `example01-1` and `example02-1`.
- {*n*..*m*} - A range of numbers. This pattern generates shard addresses with incrementing indices from *n* to *m*. `example0{1..2}-1` generates `example01-1` and `example02-1`.
- {*0n*..*0m*} - A range of numbers with leading zeroes. This modification preserves leading zeroes in indices. The pattern `example{01..03}-1` generates `example01-1`, `example02-1` and `example03-1`.
- {*a*|*b*} - Any number of variants separated by a `|`. The pattern specifies replicas. For instance, `example01-{1|2}` generates replicas `example01-1` and `example01-2`.
2021-07-29 15:20:55 +00:00
The query will be sent to the first healthy replica. However, for `remote` the replicas are iterated in the order currently set in the [load_balancing](../../operations/settings/settings.md#settings-load_balancing) setting.
The number of generated addresses is limited by [table_function_remote_max_addresses](../../operations/settings/settings.md#table_function_remote_max_addresses) setting.