ClickHouse/docs/zh/sql-reference/table-functions/s3.md
2021-06-20 18:13:16 +08:00

133 lines
4.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
toc_priority: 45
toc_title: s3
---
# S3 表函数 {#s3-table-function}
提供类似于表的接口来 select/insert [Amazon S3](https://aws.amazon.com/s3/)中的文件。这个表函数类似于[hdfs](../../sql-reference/table-functions/hdfs.md),但提供了 S3 特有的功能。
**语法**
``` sql
s3(path, [aws_access_key_id, aws_secret_access_key,] format, structure, [compression])
```
**参数**
- `path` — 带有文件路径的 Bucket url。在只读模式下支持以下通配符: `*`, `?`, `{abc,def}` 和 `{N..M}` 其中 `N`, `M` 是数字, `'abc'`, `'def'` 是字符串. 更多信息见[下文](#wildcards-in-path).
- `format` — 文件的[格式](../../interfaces/formats.md#formats).
- `structure` — 表的结构. 格式像这样 `'column1_name column1_type, column2_name column2_type, ...'`.
- `compression` — 压缩类型. 支持的值: `none`, `gzip/gz`, `brotli/br`, `xz/LZMA`, `zstd/zst`. 参数是可选的. 默认情况下,通过文件扩展名自动检测压缩类型.
**返回值**
一个具有指定结构的表,用于读取或写入指定文件中的数据。
**示例**
从 S3 文件`https://storage.yandexcloud.net/my-test-bucket-768/data.csv`中选择表格的前两行:
``` sql
SELECT *
FROM s3('https://storage.yandexcloud.net/my-test-bucket-768/data.csv', 'CSV', 'column1 UInt32, column2 UInt32, column3 UInt32')
LIMIT 2;
```
``` text
┌─column1─┬─column2─┬─column3─┐
│ 1 │ 2 │ 3 │
│ 3 │ 2 │ 1 │
└─────────┴─────────┴─────────┘
```
类似的情况,但来源是`gzip`压缩的文件:
``` sql
SELECT *
FROM s3('https://storage.yandexcloud.net/my-test-bucket-768/data.csv.gz', 'CSV', 'column1 UInt32, column2 UInt32, column3 UInt32', 'gzip')
LIMIT 2;
```
``` text
┌─column1─┬─column2─┬─column3─┐
│ 1 │ 2 │ 3 │
│ 3 │ 2 │ 1 │
└─────────┴─────────┴─────────┘
```
## 用法 {#usage-examples}
假设我们在S3上有几个文件URI如下:
- 'https://storage.yandexcloud.net/my-test-bucket-768/some_prefix/some_file_1.csv'
- 'https://storage.yandexcloud.net/my-test-bucket-768/some_prefix/some_file_2.csv'
- 'https://storage.yandexcloud.net/my-test-bucket-768/some_prefix/some_file_3.csv'
- 'https://storage.yandexcloud.net/my-test-bucket-768/some_prefix/some_file_4.csv'
- 'https://storage.yandexcloud.net/my-test-bucket-768/another_prefix/some_file_1.csv'
- 'https://storage.yandexcloud.net/my-test-bucket-768/another_prefix/some_file_2.csv'
- 'https://storage.yandexcloud.net/my-test-bucket-768/another_prefix/some_file_3.csv'
- 'https://storage.yandexcloud.net/my-test-bucket-768/another_prefix/some_file_4.csv'
计算以数字1至3结尾的文件的总行数:
``` sql
SELECT count(*)
FROM s3('https://storage.yandexcloud.net/my-test-bucket-768/{some,another}_prefix/some_file_{1..3}.csv', 'CSV', 'name String, value UInt32')
```
``` text
┌─count()─┐
│ 18 │
└─────────┘
```
计算这两个目录中所有文件的行的总量:
``` sql
SELECT count(*)
FROM s3('https://storage.yandexcloud.net/my-test-bucket-768/{some,another}_prefix/*', 'CSV', 'name String, value UInt32')
```
``` text
┌─count()─┐
│ 24 │
└─────────┘
```
!!! warning "Warning"
如果文件列表中包含有从零开头的数字范围,请对每个数字分别使用带括号的结构,或者使用`?`。
计算名为 `file-000.csv`, `file-001.csv`, … , `file-999.csv` 文件的总行数:
``` sql
SELECT count(*)
FROM s3('https://storage.yandexcloud.net/my-test-bucket-768/big_prefix/file-{000..999}.csv', 'CSV', 'name String, value UInt32');
```
``` text
┌─count()─┐
│ 12 │
└─────────┘
```
插入数据到 `test-data.csv.gz` 文件:
``` sql
INSERT INTO FUNCTION s3('https://storage.yandexcloud.net/my-test-bucket-768/test-data.csv.gz', 'CSV', 'name String, value UInt32', 'gzip')
VALUES ('test-data', 1), ('test-data-2', 2);
```
从已有的表插入数据到 `test-data.csv.gz` 文件:
``` sql
INSERT INTO FUNCTION s3('https://storage.yandexcloud.net/my-test-bucket-768/test-data.csv.gz', 'CSV', 'name String, value UInt32', 'gzip')
SELECT name, value FROM existing_table;
```
**另请参阅**
- [S3 引擎](../../engines/table-engines/integrations/s3.md)
[原始文章](https://clickhouse.tech/docs/en/sql-reference/table-functions/s3/) <!--hide-->