mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-12-20 05:05:38 +00:00

robot-clickhouse a9b2a54de7 Backport #71947 to 24.10: Fix weird case when s3/s3Cluster return incomplete result or exception

2024-12-16 14:08:48 +00:00

---
slug: /en/engines/table-engines/integrations/s3
sidebar_position: 180
sidebar_label: S3
---

S3 Table Engine

This engine provides integration with the Amazon S3 ecosystem. It is similar to the HDFS engine, but provides S3-specific features.

Example

CREATE TABLE s3_engine_table (name String, value UInt32)
    ENGINE=S3('https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/test-data.csv.gz', 'CSV', 'gzip')
    SETTINGS input_format_with_names_use_header = 0;

INSERT INTO s3_engine_table VALUES ('one', 1), ('two', 2), ('three', 3);

SELECT * FROM s3_engine_table LIMIT 2;

┌─name─┬─value─┐
│ one  │     1 │
│ two  │     2 │
└──────┴───────┘

Create Table

CREATE TABLE s3_engine_table (name String, value UInt32)
    ENGINE = S3(path [, NOSIGN | aws_access_key_id, aws_secret_access_key,] format, [compression])
    [PARTITION BY expr]
    [SETTINGS ...]

Engine parameters

path - Bucket URL with the path to the file. Supports the following wildcards in read-only mode: *, **, ?, {abc,def} and {N..M}, where N and M are numbers and 'abc' and 'def' are strings. For more information, see Wildcards In Path below.
NOSIGN - If this keyword is provided in place of credentials, requests are not signed.
format - The format of the file.
aws_access_key_id, aws_secret_access_key - Long-term credentials for the AWS account user, used to authenticate requests. Optional. If credentials are not specified, they are taken from the configuration file. For more information see Using S3 for Data Storage.
compression - Compression type. Supported values: none, gzip/gz, brotli/br, xz/LZMA, zstd/zst. Optional. By default, compression is auto-detected from the file extension.
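As an illustration of the argument order, the following sketch passes explicit credentials and a compression type. The table name, bucket URL, and key placeholders are hypothetical and are not values from this page.

```sql
-- Hypothetical bucket and placeholder credentials; replace with your own.
CREATE TABLE s3_with_credentials (name String, value UInt32)
    ENGINE = S3(
        'https://my-private-bucket.s3.amazonaws.com/data/file.csv.gz',  -- path
        'ACCESS_KEY_ID',       -- aws_access_key_id
        'SECRET_ACCESS_KEY',   -- aws_secret_access_key
        'CSV',                 -- format
        'gzip');               -- compression (optional; otherwise detected from the .gz extension)
```

In practice it is usually preferable to keep credentials in the server configuration rather than in the table definition.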

Data cache

The S3 table engine supports data caching on local disk. See filesystem cache configuration options and usage in this section. The cache is keyed on the path and ETag of the storage object, so ClickHouse will not read a stale cached version.

To enable caching, set filesystem_cache_name = '<name>' and enable_filesystem_cache = 1.

SELECT *
FROM s3('http://minio:10000/clickhouse//test_3.csv', 'minioadmin', 'minioadminpassword', 'CSV')
SETTINGS filesystem_cache_name = 'cache_for_s3', enable_filesystem_cache = 1;

There are two ways to define the cache in the configuration file.

Add the following section to the ClickHouse configuration file:

<clickhouse>
    <filesystem_caches>
        <cache_for_s3>
            <path>path to cache directory</path>
            <max_size>10Gi</max_size>
        </cache_for_s3>
    </filesystem_caches>
</clickhouse>

Reuse the cache configuration (and therefore the cache storage) from the ClickHouse storage_configuration section, described here.

PARTITION BY

PARTITION BY — Optional. In most cases you don't need a partition key, and if it is needed you generally don't need a partition key more granular than by month. Partitioning does not speed up queries (in contrast to the ORDER BY expression). You should never use too granular partitioning. Don't partition your data by client identifiers or names (instead, make client identifier or name the first column in the ORDER BY expression).

For partitioning by month, use the toYYYYMM(date_column) expression, where date_column is a column with a date of the type Date. The partition names here have the "YYYYMM" format.
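A minimal sketch of monthly partitioned writes, assuming the MinIO endpoint and credentials used in the "Querying partitioned data" example on this page. The table name, column names, and object name prefix are hypothetical. Each distinct month is expected to be written to its own object, with {_partition_id} replaced by the partition value; verify the resulting object names in your own bucket.

```sql
-- Hypothetical table; endpoint and credentials follow the MinIO example on this page.
CREATE TABLE events_by_month
(
    event_date Date,
    value UInt32
)
ENGINE = S3(
    'http://minio:10000/clickhouse//events_{_partition_id}.csv',
    'minioadmin',
    'minioadminpassword',
    'CSV')
PARTITION BY toYYYYMM(event_date);

-- The two rows fall in different months, so two objects are expected.
INSERT INTO events_by_month VALUES ('2024-01-15', 1), ('2024-02-03', 2);
```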

Querying partitioned data

This example uses the docker compose recipe, which integrates ClickHouse and MinIO. You should be able to reproduce the same queries using S3 by replacing the endpoint and authentication values.

Notice that the S3 endpoint in the ENGINE configuration uses the parameter token {_partition_id} as part of the S3 object (filename), and that the SELECT queries select against those resulting object names (e.g., test_3.csv).

:::note
As shown in the example, querying from S3 tables that are partitioned is not directly supported at this time, but can be accomplished by querying the individual partitions using the S3 table function.

The primary use case for writing partitioned data in S3 is to enable transferring that data into another ClickHouse system (for example, moving from on-prem systems to ClickHouse Cloud). Because ClickHouse datasets are often very large, and network reliability is sometimes imperfect, it makes sense to transfer datasets in subsets, hence partitioned writes.
:::

Create the table

CREATE TABLE p
(
    `column1` UInt32,
    `column2` UInt32,
    `column3` UInt32
)
ENGINE = S3(
           'http://minio:10000/clickhouse//test_{_partition_id}.csv',
           'minioadmin',
           'minioadminpassword',
           'CSV')
PARTITION BY column3

Insert data

INSERT INTO p VALUES (1, 2, 3), (3, 2, 1), (78, 43, 45)

Select from partition 3

:::tip
This query uses the s3 table function.
:::

SELECT *
FROM s3('http://minio:10000/clickhouse//test_3.csv', 'minioadmin', 'minioadminpassword', 'CSV')

┌─c1─┬─c2─┬─c3─┐
│  1 │  2 │  3 │
└────┴────┴────┘

Select from partition 1

SELECT *
FROM s3('http://minio:10000/clickhouse//test_1.csv', 'minioadmin', 'minioadminpassword', 'CSV')

┌─c1─┬─c2─┬─c3─┐
│  3 │  2 │  1 │
└────┴────┴────┘

Select from partition 45

SELECT *
FROM s3('http://minio:10000/clickhouse//test_45.csv', 'minioadmin', 'minioadminpassword', 'CSV')

┌─c1─┬─c2─┬─c3─┐
│ 78 │ 43 │ 45 │
└────┴────┴────┘

Limitation

You may naturally try SELECT * FROM p, but as noted above, this query fails; use the preceding per-partition queries instead.

SELECT * FROM p

Received exception from server (version 23.4.1):
Code: 48. DB::Exception: Received from localhost:9000. DB::Exception: Reading from a partitioned S3 storage is not implemented yet. (NOT_IMPLEMENTED)
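One possible workaround, sketched here under the assumption that only the objects written by table p match the pattern, is to read all partitions in a single query by combining the s3 table function with a wildcard from the "Wildcards In Path" section. The row order of the result is not guaranteed.

```sql
-- Assumption: test_*.csv matches only the objects written by table p.
SELECT *
FROM s3('http://minio:10000/clickhouse//test_*.csv', 'minioadmin', 'minioadminpassword', 'CSV');
```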

Virtual columns

_path - Path to the file. Type: LowCardinality(String).
_file - Name of the file. Type: LowCardinality(String).
_size - Size of the file in bytes. Type: Nullable(UInt64). If the size is unknown, the value is NULL.
_time - Last modified time of the file. Type: Nullable(DateTime). If the time is unknown, the value is NULL.
_etag - ETag of the file. Type: LowCardinality(String). If the etag is unknown, the value is NULL.

For more information about virtual columns see here.
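Virtual columns are not returned by SELECT * and have to be named explicitly. The sketch below assumes the s3_engine_table from the first example on this page; the row_count alias is introduced here for illustration.

```sql
-- Show which object each row came from, along with its size and modification time.
SELECT _path, _file, _size, _time, name, value
FROM s3_engine_table
LIMIT 2;

-- Count rows per source object; most useful when the path contains wildcards.
SELECT _file, count() AS row_count
FROM s3_engine_table
GROUP BY _file;
```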

Implementation Details

Reads and writes can be parallel
Not supported:
- ALTER and SELECT...SAMPLE operations.
- Indexes.
- Zero-copy replication is possible, but not supported.
:::note Zero-copy replication is not ready for production
Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher. This feature is not recommended for production use.
:::

Wildcards In Path

The path argument can specify multiple files using bash-like wildcards. To be processed, a file must exist and match the whole path pattern. The list of files is determined at SELECT time (not at CREATE time).

* - Substitutes any number of any characters except /, including the empty string.
** - Substitutes any number of any characters, including /, and including the empty string.
? - Substitutes any single character.
{some_string,another_string,yet_another_one} - Substitutes any of the strings 'some_string', 'another_string', 'yet_another_one'.
{N..M} - Substitutes any number in the range from N to M, including both bounds. N and M can have leading zeroes, e.g. 000..078.

Constructions with {} are similar to the remote table function.

:::note
If the listing of files contains number ranges with leading zeros, use the construction with braces for each digit separately or use ?.
:::
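The two alternatives named in the note can be sketched as follows for files named file-000.csv through file-999.csv. The table names are hypothetical; the bucket path is the illustrative one used in the examples on this page.

```sql
-- Per-digit braces: each {0..9} matches exactly one digit.
CREATE TABLE big_table_digits (name String, value UInt32)
    ENGINE = S3('https://clickhouse-public-datasets.s3.amazonaws.com/my-bucket/my_folder/file-{0..9}{0..9}{0..9}.csv', 'CSV');

-- Question marks: each ? matches any single character, not only digits,
-- so this pattern is looser than the per-digit form.
CREATE TABLE big_table_qmarks (name String, value UInt32)
    ENGINE = S3('https://clickhouse-public-datasets.s3.amazonaws.com/my-bucket/my_folder/file-???.csv', 'CSV');
```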

Example with wildcards 1

Create table with files named file-000.csv, file-001.csv, ... , file-999.csv:

CREATE TABLE big_table (name String, value UInt32)
    ENGINE = S3('https://clickhouse-public-datasets.s3.amazonaws.com/my-bucket/my_folder/file-{000..999}.csv', 'CSV');

Example with wildcards 2

Suppose we have several files in CSV format with the following URIs on S3:

There are several ways to make a table consisting of all six files:

Specify the range of file suffixes:

CREATE TABLE table_with_range (name String, value UInt32)
    ENGINE = S3('https://clickhouse-public-datasets.s3.amazonaws.com/my-bucket/{some,another}_folder/some_file_{1..3}', 'CSV');

Take all files with the some_file_ prefix (there should be no extra files with this prefix in either folder):

CREATE TABLE table_with_question_mark (name String, value UInt32)
    ENGINE = S3('https://clickhouse-public-datasets.s3.amazonaws.com/my-bucket/{some,another}_folder/some_file_?', 'CSV');

Take all the files in both folders (all files must match the format and schema described in the query):

CREATE TABLE table_with_asterisk (name String, value UInt32)
    ENGINE = S3('https://clickhouse-public-datasets.s3.amazonaws.com/my-bucket/{some,another}_folder/*', 'CSV');

Storage Settings

s3_truncate_on_insert - Allows truncating the file before inserting into it. Disabled by default.
s3_create_new_file_on_insert - Allows creating a new file on each insert if the format has a suffix. Disabled by default.
s3_skip_empty_files - Allows skipping empty files while reading. Enabled by default.
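A short sketch of applying these settings at the session level, assuming the s3_engine_table from the first example on this page. The inserted values are made up for illustration.

```sql
-- Overwrite the existing object on the next insert.
SET s3_truncate_on_insert = 1;
INSERT INTO s3_engine_table VALUES ('four', 4);

-- Alternatively, keep the existing object and write each insert to a new file.
SET s3_truncate_on_insert = 0;
SET s3_create_new_file_on_insert = 1;
INSERT INTO s3_engine_table VALUES ('five', 5);
```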

S3-related Settings

The following settings can be set before query execution or placed in the configuration file.

s3_max_single_part_upload_size - The maximum size of an object to upload using single-part upload to S3. Default value is 32 MiB.
s3_min_upload_part_size - The minimum size of a part to upload during S3 multipart upload. Default value is 16 MiB.
s3_max_redirects - Maximum number of S3 redirect hops allowed. Default value is 10.
s3_single_read_retries — The maximum number of attempts during single read. Default value is 4.
s3_max_put_rps — Maximum PUT requests per second rate before throttling. Default value is 0 (unlimited).
s3_max_put_burst — Max number of requests that can be issued simultaneously before hitting request per second limit. By default (0 value) equals to s3_max_put_rps.
s3_max_get_rps — Maximum GET requests per second rate before throttling. Default value is 0 (unlimited).
s3_max_get_burst — Max number of requests that can be issued simultaneously before hitting request per second limit. By default (0 value) equals to s3_max_get_rps.
s3_upload_part_size_multiply_factor - Multiply s3_min_upload_part_size by this factor each time s3_upload_part_size_multiply_parts_count_threshold parts have been uploaded from a single write to S3. Default value is 2.
s3_upload_part_size_multiply_parts_count_threshold - Each time this number of parts has been uploaded to S3, s3_min_upload_part_size is multiplied by s3_upload_part_size_multiply_factor. Default value is 500.
s3_max_inflight_parts_for_one_file - Limits the number of PUT requests that can run concurrently for one object. Because each in-flight part holds a memory buffer, this number should be limited. The value 0 means unlimited. Default value is 20. Each in-flight part has a buffer of size s3_min_upload_part_size for the first s3_upload_part_size_multiply_parts_count_threshold parts, and a larger one once the file is big enough (see s3_upload_part_size_multiply_factor). With default settings, uploading one file consumes no more than 320 MiB (20 parts of 16 MiB each) for a file smaller than about 8 GB. Consumption is greater for larger files.
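The memory figure above follows from simple arithmetic on the documented defaults. As a rough sketch, it can be checked in a ClickHouse session; the column aliases are introduced here for illustration, and the hard-coded numbers are the defaults stated in this section rather than values read from the server.

```sql
-- Inspect the upload-related settings in the current session.
SELECT name, value, changed
FROM system.settings
WHERE name IN (
    's3_min_upload_part_size',
    's3_max_inflight_parts_for_one_file',
    's3_upload_part_size_multiply_factor',
    's3_upload_part_size_multiply_parts_count_threshold');

-- Upper bound on buffer memory for one upload with the documented defaults:
-- 20 in-flight parts * 16 MiB per part = 320 MiB.
SELECT formatReadableSize(20 * 16 * 1024 * 1024) AS max_buffer_memory;

-- Data uploaded before the part size is first multiplied:
-- 500 parts * 16 MiB per part, which is where the "about 8 GB" figure comes from.
SELECT formatReadableSize(500 * 16 * 1024 * 1024) AS size_before_first_multiply;
```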

Security consideration: if a malicious user can specify arbitrary S3 URLs, s3_max_redirects must be set to zero to avoid SSRF attacks; alternatively, remote_host_filter must be specified in the server configuration.

Endpoint-based Settings

The following settings can be specified in the configuration file for a given endpoint (which is matched by the exact prefix of a URL):

endpoint — Specifies prefix of an endpoint. Mandatory.
access_key_id and secret_access_key — Specifies credentials to use with given endpoint. Optional.
use_environment_credentials — If set to true, S3 client will try to obtain credentials from environment variables and Amazon EC2 metadata for given endpoint. Optional, default value is false.
region — Specifies S3 region name. Optional.
use_insecure_imds_request — If set to true, S3 client will use insecure IMDS request while obtaining credentials from Amazon EC2 metadata. Optional, default value is false.
expiration_window_seconds — Grace period for checking if expiration-based credentials have expired. Optional, default value is 120.
no_sign_request - Ignore all the credentials so requests are not signed. Useful for accessing public buckets.
header — Adds specified HTTP header to a request to given endpoint. Optional, can be specified multiple times.
server_side_encryption_customer_key_base64 — If specified, required headers for accessing S3 objects with SSE-C encryption will be set. Optional.
server_side_encryption_kms_key_id - If specified, required headers for accessing S3 objects with SSE-KMS encryption will be set. If an empty string is specified, the AWS managed S3 key will be used. Optional.
server_side_encryption_kms_encryption_context - If specified alongside server_side_encryption_kms_key_id, the given encryption context header for SSE-KMS will be set. Optional.
server_side_encryption_kms_bucket_key_enabled - If specified alongside server_side_encryption_kms_key_id, the header to enable S3 bucket keys for SSE-KMS will be set. Optional, can be true or false, defaults to nothing (matches the bucket-level setting).
max_single_read_retries — The maximum number of attempts during single read. Default value is 4. Optional.
max_put_rps, max_put_burst, max_get_rps and max_get_burst - Throttling settings (see description above) to use for specific endpoint instead of per query. Optional.

Example:

<s3>
    <endpoint-name>
        <endpoint>https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/</endpoint>
        <!-- <access_key_id>ACCESS_KEY_ID</access_key_id> -->
        <!-- <secret_access_key>SECRET_ACCESS_KEY</secret_access_key> -->
        <!-- <region>us-west-1</region> -->
        <!-- <use_environment_credentials>false</use_environment_credentials> -->
        <!-- <use_insecure_imds_request>false</use_insecure_imds_request> -->
        <!-- <expiration_window_seconds>120</expiration_window_seconds> -->
        <!-- <no_sign_request>false</no_sign_request> -->
        <!-- <header>Authorization: Bearer SOME-TOKEN</header> -->
        <!-- <server_side_encryption_customer_key_base64>BASE64-ENCODED-KEY</server_side_encryption_customer_key_base64> -->
        <!-- <server_side_encryption_kms_key_id>KMS_KEY_ID</server_side_encryption_kms_key_id> -->
        <!-- <server_side_encryption_kms_encryption_context>KMS_ENCRYPTION_CONTEXT</server_side_encryption_kms_encryption_context> -->
        <!-- <server_side_encryption_kms_bucket_key_enabled>true</server_side_encryption_kms_bucket_key_enabled> -->
        <!-- <max_single_read_retries>4</max_single_read_retries> -->
    </endpoint-name>
</s3>

Accessing public buckets

ClickHouse tries to fetch credentials from many different types of sources. This can sometimes cause problems when accessing public buckets, with the client returning a 403 error code. To avoid this, use the NOSIGN keyword, which forces the client to ignore all credentials and not sign requests.

CREATE TABLE big_table (name String, value UInt32)
    ENGINE = S3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/aapl_stock.csv', NOSIGN, 'CSVWithNames');

Optimizing performance

For details on optimizing the performance of the s3 function see our detailed guide.
