From 750a82a4ff615190a2793c0cfae9f4c1f5c75433 Mon Sep 17 00:00:00 2001 From: kssenii Date: Thu, 15 Feb 2024 13:23:33 +0100 Subject: [PATCH] Update doc --- .../mergetree-family/mergetree.md | 2 + docs/en/operations/storing-data.md | 146 ++++++++++++++++-- 2 files changed, 134 insertions(+), 14 deletions(-) diff --git a/docs/en/engines/table-engines/mergetree-family/mergetree.md b/docs/en/engines/table-engines/mergetree-family/mergetree.md index f185c11bab3..e1eef8db9ab 100644 --- a/docs/en/engines/table-engines/mergetree-family/mergetree.md +++ b/docs/en/engines/table-engines/mergetree-family/mergetree.md @@ -1106,6 +1106,8 @@ Configuration markup: ``` +Also see [configuring external storage options](/docs/en/operations/storing-data.md/#configuring-external-storage). + :::note cache configuration ClickHouse versions 22.3 through 22.7 use a different cache configuration, see [using local cache](/docs/en/operations/storing-data.md/#using-local-cache) if you are using one of those versions. ::: diff --git a/docs/en/operations/storing-data.md b/docs/en/operations/storing-data.md index 003277c8d4f..7a7edfb1a90 100644 --- a/docs/en/operations/storing-data.md +++ b/docs/en/operations/storing-data.md @@ -11,45 +11,163 @@ To work with data stored on `Amazon S3` disks use [S3](/docs/en/engines/table-en To load data from a web server with static files use a disk with type [web](#storing-data-on-webserver). -## Configuring HDFS {#configuring-hdfs} +## Configuring external storage {#configuring-external-storage} -[MergeTree](/docs/en/engines/table-engines/mergetree-family/mergetree.md) and [Log](/docs/en/engines/table-engines/log-family/log.md) family table engines can store data to HDFS using a disk with type `HDFS`. +[MergeTree](/docs/en/engines/table-engines/mergetree-family/mergetree.md) and [Log](/docs/en/engines/table-engines/log-family/log.md) family table engines can store data to `S3`, `AzureBlobStorage`, `HDFS` using a disk with types `s3`, `azure_blob_storage`, `hdfs` accordingly. Configuration markup: +Let's take a loop at different storage configuration options on the example of `S3` storage. +Firstly, define configuration in server configuration file. In order to configure `S3` storage the following configuration can be used: + ``` xml - - hdfs - hdfs://hdfs1:9000/clickhouse/ - + + s3 + https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/ + 1 + - +
- hdfs + s3
-
+
+
+``` +Starting with 24.1 clickhouse version, a different type of configuration is supported in addition to the older one: + +``` xml + + + + + object_storage + s3 + local + https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/ + 1 + + + + + +
+ s3 +
+
+
+
+
+
+``` + +In order to make a specific kind of storage a default option for all `MergeTree` tables add the following section to configuration file: + +``` xml + - 0 + s3 ``` -Required parameters: +If you want to configure a specific storage policy only to specific table, you can define it in settings while creating the table: -- `endpoint` — HDFS endpoint URL in `path` format. Endpoint URL should contain a root path to store data. +``` sql +CREATE TABLE test (a Int32, b String) +ENGINE = MergeTree() ORDER BY a +SETTINGS storage_policy = 's3'; +``` -Optional parameters: +You can also use `disk` instead of `storage_policy`. In this case it is not requires to have `storage_policy` section in configuration file, only `disk` section would be enough. -- `min_bytes_for_seek` — The minimal number of bytes to use seek operation instead of sequential read. Default value: `1 Mb`. +``` sql +CREATE TABLE test (a Int32, b String) +ENGINE = MergeTree() ORDER BY a +SETTINGS disk = 's3'; +``` + +There is also a possibility to specify storage configuration without a preconfigured disk in configuration file: + +``` sql +CREATE TABLE test (a Int32, b String) +ENGINE = MergeTree() ORDER BY a +SETTINGS disk = disk(name = 's3_disk', type = 's3', endpoint = 'https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/', use_environment_credentials = 1); +``` + +Adding cache is also possible: + +``` sql +CREATE TABLE test (a Int32, b String) +ENGINE = MergeTree() ORDER BY a +SETTINGS disk = disk(name = 'cached_s3_disk', type = 'cache', max_size = '10Gi', path = '/s3_cache', disk = disk(name = 's3_disk', type = 's3', endpoint = 'https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/', use_environment_credentials = 1)); +``` + +A combination of config file disk configuration and sql-defined configuration is also possible: + +``` sql +CREATE TABLE test (a Int32, b String) +ENGINE = MergeTree() ORDER BY a +SETTINGS disk = disk(name = 'cached_s3_disk', type = 'cache', max_size = '10Gi', path = '/s3_cache', disk = 's3'); +``` + +Here `s3` is a disk name from server configuration file, while `cache` disk is defined via sql. + +Let's take a closer look at configuration parameters. + +All disk configuration require `type` section, equal to one of `s3`, `azure_blob_storage`, `hdfs`, `local`, `cache`, `web`. Then goes configuration of a specific storage type. +Starting from 24.1 clickhouse version, you can you a new configuration option. For it you are required to specify `type` as `object_storage`, `object_storage_type` as one of `s3`, `azure_blob_storage`, `hdfs`, `local`, `cache`, `web`, and optionally you can specify `metadata_type`, which is `local` by default, but it can also be set to `plain`, `web`. + +E.g. first configuration option: +``` xml + + s3 + https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/ + 1 + +``` + +and second (from `24.1`): +``` xml + + object_storage + s3 + local + https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/ + 1 + +``` + +Configuration like +``` xml + + s3_plain + https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/ + 1 + +``` + +is equal to +``` xml + + object_storage + s3 + plain + https://s3.eu-west-1.amazonaws.com/clickhouse-eu-west-1.clickhouse.com/data/ + 1 + +``` + +For details configuration options of each storage see [MergeTree](/docs/en/engines/table-engines/mergetree-family/mergetree.md). ## Using Virtual File System for Data Encryption {#encrypted-virtual-file-system}