Merge pull request #19267 from Jokser/disk-s3-docs

Add S3 disk documentation [EN]
vdimir 2021-01-22 17:20:05 +03:00 committed by GitHub
commit 4879d25748

@@ -657,6 +657,96 @@ The `default` storage policy implies using only one volume, which consists of on
The number of threads performing background moves of data parts can be changed by [background_move_pool_size](../../../operations/settings/settings.md#background_move_pool_size) setting.
## Using S3 for Data Storage {#table_engine-mergetree-s3}
`MergeTree` family table engines can store data in [S3](https://aws.amazon.com/s3/) using a disk of type `s3`.
Configuration markup:
``` xml
<storage_configuration>
    ...
    <disks>
        <s3>
            <type>s3</type>
            <endpoint>https://storage.yandexcloud.net/my-bucket/root-path/</endpoint>
            <access_key_id>your_access_key_id</access_key_id>
            <secret_access_key>your_secret_access_key</secret_access_key>
            <proxy>
                <uri>http://proxy1</uri>
                <uri>http://proxy2</uri>
            </proxy>
            <connect_timeout_ms>10000</connect_timeout_ms>
            <request_timeout_ms>5000</request_timeout_ms>
            <max_connections>100</max_connections>
            <retry_attempts>10</retry_attempts>
            <min_bytes_for_seek>1000</min_bytes_for_seek>
            <metadata_path>/var/lib/clickhouse/disks/s3/</metadata_path>
            <cache_enabled>true</cache_enabled>
            <cache_path>/var/lib/clickhouse/disks/s3/cache/</cache_path>
            <skip_access_check>false</skip_access_check>
        </s3>
    </disks>
    ...
</storage_configuration>
```
Required parameters:
- `endpoint` — S3 endpoint URL in `path` or `virtual hosted` [styles](https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html). The endpoint URL must contain the bucket and the root path where data is stored.
- `access_key_id` — S3 access key id.
- `secret_access_key` — S3 secret access key.
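To illustrate the two endpoint styles, the sketch below writes the same bucket and root path both ways. The AWS host names, the region `eu-west-1`, the bucket `my-bucket`, and the disk names `s3_path_style` and `s3_virtual_hosted` are illustrative assumptions, not values taken from this configuration.

``` xml
<disks>
    <!-- Path style: the bucket is the first segment of the URL path. -->
    <s3_path_style>
        <type>s3</type>
        <endpoint>https://s3.eu-west-1.amazonaws.com/my-bucket/root-path/</endpoint>
        <access_key_id>your_access_key_id</access_key_id>
        <secret_access_key>your_secret_access_key</secret_access_key>
    </s3_path_style>
    <!-- Virtual hosted style: the bucket is part of the host name. -->
    <s3_virtual_hosted>
        <type>s3</type>
        <endpoint>https://my-bucket.s3.eu-west-1.amazonaws.com/root-path/</endpoint>
        <access_key_id>your_access_key_id</access_key_id>
        <secret_access_key>your_secret_access_key</secret_access_key>
    </s3_virtual_hosted>
</disks>
```

In both cases everything after the bucket (`root-path/` here) is the root path under which the disk stores its data.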
Optional parameters:
- `use_environment_credentials` — Reads AWS credentials from the environment variables `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN`, if they exist. Default value is `false`.
- `proxy` — Proxy configuration for S3 endpoint. Each `uri` element inside `proxy` block should contain a proxy URL.
- `connect_timeout_ms` — Socket connect timeout in milliseconds. Default value is `10 seconds`.
- `request_timeout_ms` — Request timeout in milliseconds. Default value is `5 seconds`.
- `max_connections` — S3 connections pool size. Default value is `100`.
- `retry_attempts` — Number of retry attempts in case of failed request. Default value is `10`.
- `min_bytes_for_seek` — Minimum number of bytes for which a seek operation is used instead of a sequential read. Default value is `1 MB`.
- `metadata_path` — Path on local FS to store metadata files for S3. Default value is `/var/lib/clickhouse/disks/<disk_name>/`.
- `cache_enabled` — Enables caching of mark and index files on the local FS. Default value is `true`.
- `cache_path` — Path on the local FS where cached mark and index files are stored. Default value is `/var/lib/clickhouse/disks/<disk_name>/cache/`.
- `skip_access_check` — If `true`, disk access checks are not performed at disk start-up. Default value is `false`.
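As a sketch of how the optional parameters combine, the fragment below defines a disk that takes its credentials from the environment and overrides both timeouts. The disk name `s3_env` and the timeout values are hypothetical. Leaving out `access_key_id` and `secret_access_key` is also an assumption: this section lists them as required and does not say whether they may be omitted when `use_environment_credentials` is enabled.

``` xml
<disks>
    <s3_env>
        <type>s3</type>
        <endpoint>https://storage.yandexcloud.net/my-bucket/root-path/</endpoint>
        <!-- Assumption: explicit keys are omitted because credentials come from
             AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN. -->
        <use_environment_credentials>true</use_environment_credentials>
        <!-- Both timeouts are given in milliseconds: 20 s and 15 s here. -->
        <connect_timeout_ms>20000</connect_timeout_ms>
        <request_timeout_ms>15000</request_timeout_ms>
        <!-- Keep the start-up access check enabled (the default). -->
        <skip_access_check>false</skip_access_check>
    </s3_env>
</disks>
```

Parameters that are not listed, such as `metadata_path` and `cache_path`, fall back to the defaults described above, which are derived from the disk name.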
An S3 disk can be configured as `main` or `cold` storage:
``` xml
<storage_configuration>
    ...
    <disks>
        <s3>
            <type>s3</type>
            <endpoint>https://storage.yandexcloud.net/my-bucket/root-path/</endpoint>
            <access_key_id>your_access_key_id</access_key_id>
            <secret_access_key>your_secret_access_key</secret_access_key>
        </s3>
    </disks>
    <policies>
        <s3_main>
            <volumes>
                <main>
                    <disk>s3</disk>
                </main>
            </volumes>
        </s3_main>
        <s3_cold>
            <volumes>
                <main>
                    <disk>default</disk>
                </main>
                <external>
                    <disk>s3</disk>
                </external>
            </volumes>
            <move_factor>0.2</move_factor>
        </s3_cold>
    </policies>
    ...
</storage_configuration>
```
```
With the `cold` option, data can be moved to S3 when the free space on the local disk becomes smaller than `move_factor * disk_size`, or by a TTL move rule.
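As a worked example of the free-space threshold, the sketch below annotates the `s3_cold` policy for a hypothetical local disk of 500 GB. The disk size is an assumption chosen only to make the arithmetic concrete.

``` xml
<policies>
    <s3_cold>
        <volumes>
            <main>
                <disk>default</disk>
            </main>
            <external>
                <disk>s3</disk>
            </external>
        </volumes>
        <!-- Assumed disk_size = 500 GB for the `default` disk.
             Threshold = move_factor * disk_size = 0.2 * 500 GB = 100 GB.
             Data can be moved to the `external` (S3) volume once free space
             on the local disk drops below 100 GB. -->
        <move_factor>0.2</move_factor>
    </s3_cold>
</policies>
```

Under the same rule, a larger `move_factor` starts moves earlier: with `0.5` on the same assumed disk, the threshold would be 250 GB of free space.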
### Details {#details}
In the case of `MergeTree` tables, data gets to disk in different ways: