---
slug: /zh/sql-reference/table-functions/s3
sidebar_position: 45
sidebar_label: s3
---
# s3 Table Function

Provides a table-like interface to select/insert files in Amazon S3. This table function is similar to hdfs, but provides S3-specific features.

**Syntax**
``` sql
s3(path [, access_key_id, secret_access_key [, session_token]], format, structure, [compression])
```
**Parameters**

- `path` — Bucket URL with a path to the file. Supports the following wildcards in read-only mode: `*`, `?`, `{abc,def}` and `{N..M}`, where `N`, `M` are numbers and `'abc'`, `'def'` are strings. For more information see below.
- `format` — The format of the file.
- `structure` — Structure of the table. Format: `'column1_name column1_type, column2_name column2_type, ...'`.
- `compression` — Compression type. Supported values: `none`, `gzip/gz`, `brotli/br`, `xz/LZMA`, `zstd/zst`. The parameter is optional. By default, the compression type is autodetected from the file extension.
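The optional credential arguments are passed positionally between `path` and `format`, as in the syntax above. A minimal sketch, assuming hypothetical placeholder credentials and a temporary session token:

``` sql
SELECT *
FROM s3(
    'https://storage.yandexcloud.net/my-test-bucket-768/data.csv',
    'AKIAEXAMPLEKEY',           -- hypothetical access_key_id
    'exampleSecretAccessKey',   -- hypothetical secret_access_key
    'exampleSessionToken',      -- hypothetical session_token, e.g. from temporary credentials
    'CSV',
    'column1 UInt32, column2 UInt32, column3 UInt32'
)
LIMIT 2;
```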
**Returned value**

A table with the specified structure for reading or writing data in the specified file.

**Examples**

Selecting the first two rows of the table from the S3 file `https://storage.yandexcloud.net/my-test-bucket-768/data.csv`:
``` sql
SELECT *
FROM s3('https://storage.yandexcloud.net/my-test-bucket-768/data.csv', 'CSV', 'column1 UInt32, column2 UInt32, column3 UInt32')
LIMIT 2;
```

``` text
┌─column1─┬─column2─┬─column3─┐
│       1 │       2 │       3 │
│       3 │       2 │       1 │
└─────────┴─────────┴─────────┘
```
The similar case, but from a `gzip`-compressed file:
``` sql
SELECT *
FROM s3('https://storage.yandexcloud.net/my-test-bucket-768/data.csv.gz', 'CSV', 'column1 UInt32, column2 UInt32, column3 UInt32', 'gzip')
LIMIT 2;
```

``` text
┌─column1─┬─column2─┬─column3─┐
│       1 │       2 │       3 │
│       3 │       2 │       1 │
└─────────┴─────────┴─────────┘
```
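Because the compression type is autodetected from the file extension (see the `compression` parameter above), the `'gzip'` argument could also be omitted for a `.gz` file. A sketch of the same query relying on autodetection:

``` sql
SELECT *
FROM s3('https://storage.yandexcloud.net/my-test-bucket-768/data.csv.gz', 'CSV', 'column1 UInt32, column2 UInt32, column3 UInt32')
LIMIT 2;
```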
**Usage**

Suppose we have several files on S3 with the following URIs:
- 'https://storage.yandexcloud.net/my-test-bucket-768/some_prefix/some_file_1.csv'
- 'https://storage.yandexcloud.net/my-test-bucket-768/some_prefix/some_file_2.csv'
- 'https://storage.yandexcloud.net/my-test-bucket-768/some_prefix/some_file_3.csv'
- 'https://storage.yandexcloud.net/my-test-bucket-768/some_prefix/some_file_4.csv'
- 'https://storage.yandexcloud.net/my-test-bucket-768/another_prefix/some_file_1.csv'
- 'https://storage.yandexcloud.net/my-test-bucket-768/another_prefix/some_file_2.csv'
- 'https://storage.yandexcloud.net/my-test-bucket-768/another_prefix/some_file_3.csv'
- 'https://storage.yandexcloud.net/my-test-bucket-768/another_prefix/some_file_4.csv'
Count the total number of rows in files whose names end with the numbers 1 to 3:
``` sql
SELECT count(*)
FROM s3('https://storage.yandexcloud.net/my-test-bucket-768/{some,another}_prefix/some_file_{1..3}.csv', 'CSV', 'name String, value UInt32')
```

``` text
┌─count()─┐
│      18 │
└─────────┘
```
Count the total number of rows in all files in these two directories:
``` sql
SELECT count(*)
FROM s3('https://storage.yandexcloud.net/my-test-bucket-768/{some,another}_prefix/*', 'CSV', 'name String, value UInt32')
```

``` text
┌─count()─┐
│      24 │
└─────────┘
```
!!! warning "Warning"
    If your listing of files contains number ranges with leading zeros, use the construction with braces for each digit separately, or use `?`.
Count the total number of rows in files named `file-000.csv`, `file-001.csv`, …, `file-999.csv`:
``` sql
SELECT count(*)
FROM s3('https://storage.yandexcloud.net/my-test-bucket-768/big_prefix/file-{000..999}.csv', 'CSV', 'name String, value UInt32');
```

``` text
┌─count()─┐
│      12 │
└─────────┘
```
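Following the warning above, the same range can also be written with braces for each digit separately. A sketch of the equivalent count under that form:

``` sql
SELECT count(*)
FROM s3('https://storage.yandexcloud.net/my-test-bucket-768/big_prefix/file-{0..9}{0..9}{0..9}.csv', 'CSV', 'name String, value UInt32');
```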
Insert data into the file `test-data.csv.gz`:
``` sql
INSERT INTO FUNCTION s3('https://storage.yandexcloud.net/my-test-bucket-768/test-data.csv.gz', 'CSV', 'name String, value UInt32', 'gzip')
VALUES ('test-data', 1), ('test-data-2', 2);
```
Insert data from an existing table into the file `test-data.csv.gz`:
``` sql
INSERT INTO FUNCTION s3('https://storage.yandexcloud.net/my-test-bucket-768/test-data.csv.gz', 'CSV', 'name String, value UInt32', 'gzip')
SELECT name, value FROM existing_table;
```
**See Also**