Mirror of https://github.com/ClickHouse/ClickHouse.git

Merge pull request #28716 from olgarev/revolg-DOCSUP-13742-partitions_in_s3_table_function

This commit is contained in commit 5b967d91ba.

@@ -210,4 +210,4 @@ ENGINE = S3('https://storage.yandexcloud.net/my-test-bucket-768/big_prefix/file-

## See also

- [s3 table function](../../../sql-reference/table-functions/s3.md)

@@ -3,7 +3,7 @@ toc_priority: 45

toc_title: s3
---

# s3 Table Function {#s3-table-function}

Provides a table-like interface to select/insert files in [Amazon S3](https://aws.amazon.com/s3/). This table function is similar to [hdfs](../../sql-reference/table-functions/hdfs.md), but provides S3-specific features.
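
For orientation, a minimal call looks like the sketch below. The bucket URL, file name, and column list are placeholders, and the credentials arguments are omitted on the assumption that the bucket is publicly readable:

```sql
-- Minimal read through the s3 table function (placeholder bucket, file, and schema).
SELECT *
FROM s3('https://storage.yandexcloud.net/my-test-bucket-768/data.csv', 'CSV', 'column1 UInt32, column2 UInt32, column3 UInt32')
LIMIT 5;
```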

@@ -125,6 +125,30 @@ INSERT INTO FUNCTION s3('https://storage.yandexcloud.net/my-test-bucket-768/test

SELECT name, value FROM existing_table;
```

## Partitioned Write {#partitioned-write}

If you specify a `PARTITION BY` expression when inserting data into an `S3` table, a separate file is created for each partition value. Splitting the data into separate files helps to improve the efficiency of read operations; a read-back sketch follows the examples below.

**Examples**

1. Using the partition ID in a key creates separate files:

```sql
INSERT INTO TABLE FUNCTION
    s3('http://bucket.amazonaws.com/my_bucket/file_{_partition_id}.csv', 'CSV', 'a String, b UInt32, c UInt32')
    PARTITION BY a VALUES ('x', 2, 3), ('x', 4, 5), ('y', 11, 12), ('y', 13, 14), ('z', 21, 22), ('z', 23, 24);
```

As a result, the data is written into three files: `file_x.csv`, `file_y.csv`, and `file_z.csv`.

2. Using the partition ID in a bucket name creates files in different buckets:

```sql
INSERT INTO TABLE FUNCTION
    s3('http://bucket.amazonaws.com/my_bucket_{_partition_id}/file.csv', 'CSV', 'a UInt32, b UInt32, c UInt32')
    PARTITION BY a VALUES (1, 2, 3), (1, 4, 5), (10, 11, 12), (10, 13, 14), (20, 21, 22), (20, 23, 24);
```

As a result, the data is written into three files in different buckets: `my_bucket_1/file.csv`, `my_bucket_10/file.csv`, and `my_bucket_20/file.csv`.
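
Because each partition value lands in its own file, the output of the first example can be read back in a single query by using a glob in the key, as in this sketch (same placeholder bucket as above; it assumes the three generated files exist):

```sql
-- Read the three files produced by the first example back in one query
-- (placeholder bucket; the {x,y,z} glob in the key matches the generated file names).
SELECT a, b, c
FROM s3('http://bucket.amazonaws.com/my_bucket/file_{x,y,z}.csv', 'CSV', 'a String, b UInt32, c UInt32')
ORDER BY a, b;
```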

**See Also**

- [S3 engine](../../engines/table-engines/integrations/s3.md)

@@ -151,4 +151,4 @@ ENGINE = S3('https://storage.yandexcloud.net/my-test-bucket-768/big_prefix/file-

**See Also**

- [s3 table function](../../../sql-reference/table-functions/s3.md)

@@ -133,6 +133,30 @@ INSERT INTO FUNCTION s3('https://storage.yandexcloud.net/my-test-bucket-768/test

SELECT name, value FROM existing_table;
```

## Partitioned Write {#partitioned-write}

If you specify a `PARTITION BY` expression when inserting data into an `S3` table, a separate file is created for each value of the partitioning key. This improves the efficiency of read operations.

**Examples**

1. Using the partition ID in a key creates separate files:

```sql
INSERT INTO TABLE FUNCTION
    s3('http://bucket.amazonaws.com/my_bucket/file_{_partition_id}.csv', 'CSV', 'a String, b UInt32, c UInt32')
    PARTITION BY a VALUES ('x', 2, 3), ('x', 4, 5), ('y', 11, 12), ('y', 13, 14), ('z', 21, 22), ('z', 23, 24);
```

As a result, the data is written into three files: `file_x.csv`, `file_y.csv`, and `file_z.csv`.

2. Using the partition ID in a bucket name creates files in different buckets:

```sql
INSERT INTO TABLE FUNCTION
    s3('http://bucket.amazonaws.com/my_bucket_{_partition_id}/file.csv', 'CSV', 'a UInt32, b UInt32, c UInt32')
    PARTITION BY a VALUES (1, 2, 3), (1, 4, 5), (10, 11, 12), (10, 13, 14), (20, 21, 22), (20, 23, 24);
```

As a result, three files are created in different buckets: `my_bucket_1/file.csv`, `my_bucket_10/file.csv`, and `my_bucket_20/file.csv`.
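
The per-bucket files from the second example can be read back by addressing each bucket explicitly, as in the following sketch (placeholder buckets and schema taken from the example above, assumed to be readable without credentials):

```sql
-- Read the per-bucket files back; each bucket is addressed explicitly in this sketch
-- (placeholder buckets and schema from the example above).
SELECT * FROM s3('http://bucket.amazonaws.com/my_bucket_1/file.csv', 'CSV', 'a UInt32, b UInt32, c UInt32')
UNION ALL
SELECT * FROM s3('http://bucket.amazonaws.com/my_bucket_10/file.csv', 'CSV', 'a UInt32, b UInt32, c UInt32')
UNION ALL
SELECT * FROM s3('http://bucket.amazonaws.com/my_bucket_20/file.csv', 'CSV', 'a UInt32, b UInt32, c UInt32');
```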

**See Also**

- [S3 engine](../../engines/table-engines/integrations/s3.md)