diff --git a/docs/en/engines/table-engines/integrations/s3.md b/docs/en/engines/table-engines/integrations/s3.md index 7249e24aff9..e494e9aec6a 100644 --- a/docs/en/engines/table-engines/integrations/s3.md +++ b/docs/en/engines/table-engines/integrations/s3.md @@ -210,4 +210,4 @@ ENGINE = S3('https://storage.yandexcloud.net/my-test-bucket-768/big_prefix/file- ## See also -- [S3 table function](../../../sql-reference/table-functions/s3.md) +- [s3 table function](../../../sql-reference/table-functions/s3.md) diff --git a/docs/en/sql-reference/table-functions/s3.md b/docs/en/sql-reference/table-functions/s3.md index d84edb3f46e..ffba8f5c6d3 100644 --- a/docs/en/sql-reference/table-functions/s3.md +++ b/docs/en/sql-reference/table-functions/s3.md @@ -3,7 +3,7 @@ toc_priority: 45 toc_title: s3 --- -# S3 Table Function {#s3-table-function} +# s3 Table Function {#s3-table-function} Provides table-like interface to select/insert files in [Amazon S3](https://aws.amazon.com/s3/). This table function is similar to [hdfs](../../sql-reference/table-functions/hdfs.md), but provides S3-specific features. @@ -125,6 +125,30 @@ INSERT INTO FUNCTION s3('https://storage.yandexcloud.net/my-test-bucket-768/test SELECT name, value FROM existing_table; ``` +## Partitioned Write {#partitioned-write} + +If you specify `PARTITION BY` expression when inserting data into `S3` table, a separate file is created for each partition value. Splitting the data into separate files helps to improve reading operations efficiency. + +**Examples** + +1. Using partition ID in a key creates separate files: + +```sql +INSERT INTO TABLE FUNCTION + s3('http://bucket.amazonaws.com/my_bucket/file_{_partition_id}.csv', 'CSV', 'a String, b UInt32, c UInt32') + PARTITION BY a VALUES ('x', 2, 3), ('x', 4, 5), ('y', 11, 12), ('y', 13, 14), ('z', 21, 22), ('z', 23, 24); +``` +As a result, the data is written into three files: `file_x.csv`, `file_y.csv`, and `file_z.csv`. + +2. Using partition ID in a bucket name creates files in different buckets: + +```sql +INSERT INTO TABLE FUNCTION + s3('http://bucket.amazonaws.com/my_bucket_{_partition_id}/file.csv', 'CSV', 'a UInt32, b UInt32, c UInt32') + PARTITION BY a VALUES (1, 2, 3), (1, 4, 5), (10, 11, 12), (10, 13, 14), (20, 21, 22), (20, 23, 24); +``` +As a result, the data is written into three files in different buckets: `my_bucket_1/file.csv`, `my_bucket_10/file.csv`, and `my_bucket_20/file.csv`. + **See Also** - [S3 engine](../../engines/table-engines/integrations/s3.md) diff --git a/docs/ru/engines/table-engines/integrations/s3.md b/docs/ru/engines/table-engines/integrations/s3.md index 5895bd43d2f..c90b7293e1c 100644 --- a/docs/ru/engines/table-engines/integrations/s3.md +++ b/docs/ru/engines/table-engines/integrations/s3.md @@ -151,4 +151,4 @@ ENGINE = S3('https://storage.yandexcloud.net/my-test-bucket-768/big_prefix/file- **Смотрите также** -- [Табличная функция S3](../../../sql-reference/table-functions/s3.md) +- [Табличная функция s3](../../../sql-reference/table-functions/s3.md) diff --git a/docs/ru/sql-reference/table-functions/s3.md b/docs/ru/sql-reference/table-functions/s3.md index 597f145c096..c8dbcf81559 100644 --- a/docs/ru/sql-reference/table-functions/s3.md +++ b/docs/ru/sql-reference/table-functions/s3.md @@ -133,6 +133,30 @@ INSERT INTO FUNCTION s3('https://storage.yandexcloud.net/my-test-bucket-768/test SELECT name, value FROM existing_table; ``` +## Партиционирование при записи данных {#partitioned-write} + +Если при добавлении данных в таблицу S3 указать выражение `PARTITION BY`, то для каждого значения ключа партиционирования создается отдельный файл. Это повышает эффективность операций чтения. + +**Примеры** + +1. При использовании ID партиции в имени ключа создаются отдельные файлы: + +```sql +INSERT INTO TABLE FUNCTION + s3('http://bucket.amazonaws.com/my_bucket/file_{_partition_id}.csv', 'CSV', 'a UInt32, b UInt32, c UInt32') + PARTITION BY a VALUES ('x', 2, 3), ('x', 4, 5), ('y', 11, 12), ('y', 13, 14), ('z', 21, 22), ('z', 23, 24); +``` +В результате данные будут записаны в три файла: `file_x.csv`, `file_y.csv` и `file_z.csv`. + +2. При использовании ID партиции в названии бакета создаются файлы в разных бакетах: + +```sql +INSERT INTO TABLE FUNCTION + s3('http://bucket.amazonaws.com/my_bucket_{_partition_id}/file.csv', 'CSV', 'a UInt32, b UInt32, c UInt32') + PARTITION BY a VALUES (1, 2, 3), (1, 4, 5), (10, 11, 12), (10, 13, 14), (20, 21, 22), (20, 23, 24); +``` +В результате будут созданы три файла в разных бакетах: `my_bucket_1/file.csv`, `my_bucket_10/file.csv` и `my_bucket_20/file.csv`. + **Смотрите также** - [Движок таблиц S3](../../engines/table-engines/integrations/s3.md)