ClickHouse/docs/en/operations/named-collections.md

549 lines
15 KiB
Markdown
Raw Normal View History

2022-04-01 17:34:35 +00:00
---
2022-08-28 14:53:34 +00:00
slug: /en/operations/named-collections
sidebar_position: 69
2022-08-22 22:17:39 +00:00
sidebar_label: "Named collections"
2023-05-16 18:58:04 +00:00
title: "Named collections"
2022-04-01 17:34:35 +00:00
---
import CloudNotSupportedBadge from '@theme/badges/CloudNotSupportedBadge';
<CloudNotSupportedBadge />
Named collections provide a way to store collections of key-value pairs to be
2023-05-16 18:58:04 +00:00
used to configure integrations with external sources. You can use named collections with
dictionaries, tables, table functions, and object storage.
2023-05-16 18:58:04 +00:00
Named collections can be configured with DDL or in configuration files and are applied
when ClickHouse starts. They simplify the creation of objects and the hiding of credentials
from users without administrative access.
2023-05-17 18:48:16 +00:00
The keys in a named collection must match the parameter names of the corresponding
function, table engine, database, etc. In the examples below the parameter list is
linked to for each type.
Parameters set in a named collection can be overridden in SQL, this is shown in the examples
below. This ability can be limited using `[NOT] OVERRIDABLE` keywords and XML attributes
and/or the configuration option `allow_named_collection_override_by_default`.
:::warning
If override is allowed, it may be possible for users without administrative access to
figure out the credentials that you are trying to hide.
If you are using named collections with that purpose, you should disable
`allow_named_collection_override_by_default` (which is enabled by default).
:::
2023-05-17 18:48:16 +00:00
## Storing named collections in the system database
### DDL example
```sql
CREATE NAMED COLLECTION name AS
key_1 = 'value' OVERRIDABLE,
key_2 = 'value2' NOT OVERRIDABLE,
2023-05-17 18:48:16 +00:00
url = 'https://connection.url/'
```
2023-05-16 18:58:04 +00:00
In the above example:
* `key_1` can always be overridden.
* `key_2` can never be overridden.
* `url` can be overridden or not depending on the value of `allow_named_collection_override_by_default`.
2023-05-16 18:58:04 +00:00
### Permissions to create named collections with DDL
To manage named collections with DDL a user must have the `named_control_collection` privilege. This can be assigned by adding a file to `/etc/clickhouse-server/users.d/`. The example gives the user `default` both the `access_management` and `named_collection_control` privileges:
```xml title='/etc/clickhouse-server/users.d/user_default.xml'
<clickhouse>
<users>
<default>
<password_sha256_hex>65e84be33532fb784c48129675f9eff3a682b27168c0ea744b2cf58ee02337c5</password_sha256_hex replace=true>
<access_management>1</access_management>
<!-- highlight-start -->
<named_collection_control>1</named_collection_control>
<!-- highlight-end -->
</default>
</users>
</clickhouse>
```
:::tip
In the above example the `password_sha256_hex` value is the hexadecimal representation of the SHA256 hash of the password. This configuration for the user `default` has the attribute `replace=true` as in the default configuration has a plain text `password` set, and it is not possible to have both plain text and sha256 hex passwords set for a user.
2023-05-16 18:58:04 +00:00
:::
2024-06-07 11:08:01 +00:00
### Storage for named collections
2024-08-20 14:06:39 +00:00
Named collections can either be stored on local disk or in ZooKeeper/Keeper. By default local storage is used.
They can also be stored using encryption with the same algorithms used for [disk encryption](storing-data#encrypted-virtual-file-system),
where `aes_128_ctr` is used by default.
2024-06-07 11:08:01 +00:00
2024-08-20 14:24:53 +00:00
To configure named collections storage you need to specify a `type`. This can be either `local` or `keeper`/`zookeeper`. For encrypted storage,
2024-08-20 14:06:39 +00:00
you can use `local_encrypted` or `keeper_encrypted`/`zookeeper_encrypted`.
To use ZooKeeper/Keeper we also need to set up a `path` (path in ZooKeeper/Keeper, where named collections will be stored) to
`named_collections_storage` section in configuration file. The following example uses encryption and ZooKeeper/Keeper:
2024-06-07 11:08:01 +00:00
```
<clickhouse>
<named_collections_storage>
2024-08-20 14:06:39 +00:00
<type>zookeeper_encrypted</type>
<key_hex>bebec0cabebec0cabebec0cabebec0ca</key_hex>
<algorithm>aes_128_ctr</algorithm>
2024-06-07 11:08:01 +00:00
<path>/named_collections_path/</path>
<update_timeout_ms>1000</update_timeout_ms>
</named_collections_storage>
</clickhouse>
```
An optional configuration parameter `update_timeout_ms` by default is equal to `5000`.
2023-05-17 18:48:16 +00:00
## Storing named collections in configuration files
2023-05-16 18:58:04 +00:00
### XML example
2022-04-01 17:34:35 +00:00
2023-05-19 14:49:43 +00:00
```xml title='/etc/clickhouse-server/config.d/named_collections.xml'
2022-04-01 17:34:35 +00:00
<clickhouse>
<named_collections>
2023-05-16 18:58:04 +00:00
<name>
<key_1 overridable="true">value</key_1>
<key_2 overridable="false">value_2</key_2>
2023-05-16 18:58:04 +00:00
<url>https://connection.url/</url>
</name>
2022-04-01 17:34:35 +00:00
</named_collections>
</clickhouse>
```
In the above example:
* `key_1` can always be overridden.
* `key_2` can never be overridden.
* `url` can be overridden or not depending on the value of `allow_named_collection_override_by_default`.
2023-05-19 14:49:43 +00:00
## Modifying named collections
2023-05-19 15:03:53 +00:00
Named collections that are created with DDL queries can be altered or dropped with DDL. Named collections created with XML files can be managed by editing or deleting the corresponding XML.
2023-05-19 14:49:43 +00:00
### Alter a DDL named collection
Change or add the keys `key1` and `key3` of the collection `collection2`
(this will not change the value of the `overridable` flag for those keys):
2023-05-19 14:49:43 +00:00
```sql
ALTER NAMED COLLECTION collection2 SET key1=4, key3='value3'
```
Change or add the key `key1` and allow it to be always overridden:
```sql
ALTER NAMED COLLECTION collection2 SET key1=4 OVERRIDABLE
```
2023-05-19 14:49:43 +00:00
Remove the key `key2` from `collection2`:
```sql
ALTER NAMED COLLECTION collection2 DELETE key2
```
2023-05-19 15:03:53 +00:00
Change or add the key `key1` and delete the key `key3` of the collection `collection2`:
```sql
ALTER NAMED COLLECTION collection2 SET key1=4, DELETE key3
```
To force a key to use the default settings for the `overridable` flag, you have to
remove and re-add the key.
```sql
ALTER NAMED COLLECTION collection2 DELETE key1;
ALTER NAMED COLLECTION collection2 SET key1=4;
```
2023-05-19 14:49:43 +00:00
### Drop the DDL named collection `collection2`:
```sql
DROP NAMED COLLECTION collection2
```
2023-05-16 18:58:04 +00:00
## Named collections for accessing S3
2022-04-01 17:34:35 +00:00
The description of parameters see [s3 Table Function](../sql-reference/table-functions/s3.md).
2023-05-16 18:58:04 +00:00
### DDL example
```sql
CREATE NAMED COLLECTION s3_mydata AS
access_key_id = 'AKIAIOSFODNN7EXAMPLE',
secret_access_key = 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
format = 'CSV',
url = 'https://s3.us-east-1.amazonaws.com/yourbucket/mydata/'
```
### XML example
2022-04-01 17:34:35 +00:00
```xml
<clickhouse>
<named_collections>
<s3_mydata>
<access_key_id>AKIAIOSFODNN7EXAMPLE</access_key_id>
2022-06-22 03:04:55 +00:00
<secret_access_key>wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY</secret_access_key>
2022-04-01 17:34:35 +00:00
<format>CSV</format>
<url>https://s3.us-east-1.amazonaws.com/yourbucket/mydata/</url>
</s3_mydata>
</named_collections>
</clickhouse>
```
2023-05-17 18:48:16 +00:00
### s3() function and S3 Table named collection examples
2023-05-16 18:58:04 +00:00
Both of the following examples use the same named collection `s3_mydata`:
#### s3() function
2022-04-01 17:34:35 +00:00
```sql
INSERT INTO FUNCTION s3(s3_mydata, filename = 'test_file.tsv.gz',
format = 'TSV', structure = 'number UInt64', compression_method = 'gzip')
SELECT * FROM numbers(10000);
```
2023-05-16 18:58:04 +00:00
:::tip
The first argument to the `s3()` function above is the name of the collection, `s3_mydata`. Without named collections, the access key ID, secret, format, and URL would all be passed in every call to the `s3()` function.
:::
#### S3 table
2022-04-01 17:34:35 +00:00
```sql
CREATE TABLE s3_engine_table (number Int64)
ENGINE=S3(s3_mydata, url='https://s3.us-east-1.amazonaws.com/yourbucket/mydata/test_file.tsv.gz', format = 'TSV')
SETTINGS input_format_with_names_use_header = 0;
SELECT * FROM s3_engine_table LIMIT 3;
┌─number─┐
│ 0 │
│ 1 │
│ 2 │
└────────┘
```
2022-08-22 22:17:39 +00:00
## Named collections for accessing MySQL database
2022-04-01 17:34:35 +00:00
The description of parameters see [mysql](../sql-reference/table-functions/mysql.md).
2023-05-16 18:58:04 +00:00
### DDL example
```sql
CREATE NAMED COLLECTION mymysql AS
user = 'myuser',
password = 'mypass',
host = '127.0.0.1',
port = 3306,
2023-05-26 00:35:51 +00:00
database = 'test',
connection_pool_size = 8,
2023-05-16 18:58:04 +00:00
replace_query = 1
```
### XML example
2022-04-01 17:34:35 +00:00
```xml
<clickhouse>
<named_collections>
<mymysql>
<user>myuser</user>
<password>mypass</password>
<host>127.0.0.1</host>
<port>3306</port>
<database>test</database>
<connection_pool_size>8</connection_pool_size>
<replace_query>1</replace_query>
</mymysql>
</named_collections>
</clickhouse>
```
2023-05-16 18:58:04 +00:00
### mysql() function, MySQL table, MySQL database, and Dictionary named collection examples
The four following examples use the same named collection `mymysql`:
#### mysql() function
2022-04-01 17:34:35 +00:00
```sql
SELECT count() FROM mysql(mymysql, table = 'test');
┌─count()─┐
│ 3 │
└─────────┘
```
2023-05-16 18:58:04 +00:00
:::note
The named collection does not specify the `table` parameter, so it is specified in the function call as `table = 'test'`.
:::
2022-04-01 17:34:35 +00:00
2023-05-16 18:58:04 +00:00
#### MySQL table
2022-04-01 17:34:35 +00:00
```sql
CREATE TABLE mytable(A Int64) ENGINE = MySQL(mymysql, table = 'test', connection_pool_size=3, replace_query=0);
SELECT count() FROM mytable;
┌─count()─┐
│ 3 │
└─────────┘
```
2023-05-16 18:58:04 +00:00
:::note
The DDL overrides the named collection setting for connection_pool_size.
:::
#### MySQL database
2022-04-01 17:34:35 +00:00
```sql
CREATE DATABASE mydatabase ENGINE = MySQL(mymysql);
SHOW TABLES FROM mydatabase;
┌─name───┐
│ source │
│ test │
└────────┘
```
2023-05-16 18:58:04 +00:00
#### MySQL Dictionary
2022-04-01 17:34:35 +00:00
```sql
CREATE DICTIONARY dict (A Int64, B String)
PRIMARY KEY A
SOURCE(MYSQL(NAME mymysql TABLE 'source'))
LIFETIME(MIN 1 MAX 2)
LAYOUT(HASHED());
SELECT dictGet('dict', 'B', 2);
┌─dictGet('dict', 'B', 2)─┐
│ two │
└─────────────────────────┘
```
2022-08-22 22:17:39 +00:00
## Named collections for accessing PostgreSQL database
2022-04-01 17:34:35 +00:00
2024-08-16 10:23:20 +00:00
The description of parameters see [postgresql](../sql-reference/table-functions/postgresql.md). Additionally, there are aliases:
2022-04-01 17:34:35 +00:00
2024-08-16 10:23:20 +00:00
- `username` for `user`
- `db` for `database`.
2024-08-16 11:01:23 +00:00
Parameter `addresses_expr` is used in a collection instead of `host:port`. The parameter is optional, because there are other optional ones: `host`, `hostname`, `port`. The following pseudo code explains the priority:
2024-08-16 10:23:20 +00:00
```sql
2024-08-20 14:06:39 +00:00
CASE
2024-08-16 10:23:20 +00:00
WHEN collection['addresses_expr'] != '' THEN collection['addresses_expr']
WHEN collection['host'] != '' THEN collection['host'] || ':' || if(collection['port'] != '', collection['port'], '5432')
WHEN collection['hostname'] != '' THEN collection['hostname'] || ':' || if(collection['port'] != '', collection['port'], '5432')
END
```
Example of creation:
2023-05-16 18:58:04 +00:00
```sql
CREATE NAMED COLLECTION mypg AS
user = 'pguser',
password = 'jw8s0F4',
host = '127.0.0.1',
port = 5432,
database = 'test',
2024-08-16 10:23:20 +00:00
schema = 'test_schema'
2023-05-16 18:58:04 +00:00
```
2022-04-01 17:34:35 +00:00
Example of configuration:
```xml
<clickhouse>
<named_collections>
<mypg>
<user>pguser</user>
<password>jw8s0F4</password>
<host>127.0.0.1</host>
<port>5432</port>
<database>test</database>
<schema>test_schema</schema>
</mypg>
</named_collections>
</clickhouse>
```
2022-08-22 22:17:39 +00:00
### Example of using named collections with the postgresql function
2022-04-01 17:34:35 +00:00
```sql
SELECT * FROM postgresql(mypg, table = 'test');
┌─a─┬─b───┐
│ 2 │ two │
│ 1 │ one │
└───┴─────┘
SELECT * FROM postgresql(mypg, table = 'test', schema = 'public');
┌─a─┐
│ 1 │
│ 2 │
│ 3 │
└───┘
```
2022-08-22 22:17:39 +00:00
### Example of using named collections with database with engine PostgreSQL
2022-04-01 17:34:35 +00:00
```sql
CREATE TABLE mypgtable (a Int64) ENGINE = PostgreSQL(mypg, table = 'test', schema = 'public');
SELECT * FROM mypgtable;
┌─a─┐
│ 1 │
│ 2 │
│ 3 │
└───┘
```
2024-08-16 10:23:20 +00:00
:::note
PostgreSQL copies data from the named collection when the table is being created. A change in the collection does not affect the existing tables.
:::
2022-08-22 22:17:39 +00:00
### Example of using named collections with database with engine PostgreSQL
2022-04-01 17:34:35 +00:00
```sql
CREATE DATABASE mydatabase ENGINE = PostgreSQL(mypg);
SHOW TABLES FROM mydatabase
┌─name─┐
│ test │
└──────┘
```
### Example of using named collections with a dictionary with source POSTGRESQL
2022-04-01 17:34:35 +00:00
```sql
CREATE DICTIONARY dict (a Int64, b String)
PRIMARY KEY a
SOURCE(POSTGRESQL(NAME mypg TABLE test))
LIFETIME(MIN 1 MAX 2)
LAYOUT(HASHED());
SELECT dictGet('dict', 'b', 2);
┌─dictGet('dict', 'b', 2)─┐
│ two │
└─────────────────────────┘
2022-06-22 03:04:55 +00:00
```
2022-08-22 22:17:39 +00:00
2023-05-16 18:58:04 +00:00
## Named collections for accessing a remote ClickHouse database
2022-08-22 22:17:39 +00:00
The description of parameters see [remote](../sql-reference/table-functions/remote.md/#parameters).
Example of configuration:
2023-05-16 18:58:04 +00:00
```sql
CREATE NAMED COLLECTION remote1 AS
host = 'remote_host',
port = 9000,
database = 'system',
user = 'foo',
password = 'secret',
secure = 1
```
2022-08-22 22:17:39 +00:00
```xml
<clickhouse>
<named_collections>
<remote1>
<host>remote_host</host>
2022-08-22 22:17:39 +00:00
<port>9000</port>
<database>system</database>
<user>foo</user>
<password>secret</password>
<secure>1</secure>
2022-08-22 22:17:39 +00:00
</remote1>
</named_collections>
</clickhouse>
```
`secure` is not needed for connection because of `remoteSecure`, but it can be used for dictionaries.
2022-08-22 22:17:39 +00:00
2022-08-22 22:36:01 +00:00
### Example of using named collections with the `remote`/`remoteSecure` functions
2022-08-22 22:17:39 +00:00
```sql
SELECT * FROM remote(remote1, table = one);
┌─dummy─┐
│ 0 │
└───────┘
SELECT * FROM remote(remote1, database = merge(system, '^one'));
┌─dummy─┐
│ 0 │
└───────┘
INSERT INTO FUNCTION remote(remote1, database = default, table = test) VALUES (1,'a');
SELECT * FROM remote(remote1, database = default, table = test);
┌─a─┬─b─┐
│ 1 │ a │
└───┴───┘
```
### Example of using named collections with a dictionary with source ClickHouse
2022-08-22 22:17:39 +00:00
```sql
CREATE DICTIONARY dict(a Int64, b String)
PRIMARY KEY a
SOURCE(CLICKHOUSE(NAME remote1 TABLE test DB default))
LIFETIME(MIN 1 MAX 2)
LAYOUT(HASHED());
SELECT dictGet('dict', 'b', 1);
┌─dictGet('dict', 'b', 1)─┐
│ a │
└─────────────────────────┘
```
## Named collections for accessing Kafka
The description of parameters see [Kafka](../engines/table-engines/integrations/kafka.md).
### DDL example
```sql
CREATE NAMED COLLECTION my_kafka_cluster AS
kafka_broker_list = 'localhost:9092',
kafka_topic_list = 'kafka_topic',
kafka_group_name = 'consumer_group',
kafka_format = 'JSONEachRow',
kafka_max_block_size = '1048576';
2024-08-20 14:06:39 +00:00
```
### XML example
```xml
<clickhouse>
<named_collections>
<my_kafka_cluster>
<kafka_broker_list>localhost:9092</kafka_broker_list>
<kafka_topic_list>kafka_topic</kafka_topic_list>
<kafka_group_name>consumer_group</kafka_group_name>
<kafka_format>JSONEachRow</kafka_format>
<kafka_max_block_size>1048576</kafka_max_block_size>
</my_kafka_cluster>
</named_collections>
</clickhouse>
```
### Example of using named collections with a Kafka table
Both of the following examples use the same named collection `my_kafka_cluster`:
```sql
CREATE TABLE queue
(
timestamp UInt64,
level String,
message String
)
ENGINE = Kafka(my_kafka_cluster)
CREATE TABLE queue
(
timestamp UInt64,
level String,
message String
)
ENGINE = Kafka(my_kafka_cluster)
SETTINGS kafka_num_consumers = 4,
kafka_thread_per_consumer = 1;
```