---
slug: /en/sql-reference/table-functions/s3
sidebar_position: 180
sidebar_label: s3
keywords: [s3, gcs, bucket]
---

# s3 Table Function

Provides a table-like interface to select/insert files in [Amazon S3](https://aws.amazon.com/s3/) and [Google Cloud Storage](https://cloud.google.com/storage/). This table function is similar to the [hdfs function](../../sql-reference/table-functions/hdfs.md), but provides S3-specific features.

If you have multiple replicas in your cluster, you can use the [s3Cluster function](../../sql-reference/table-functions/s3Cluster.md) instead to parallelize inserts.

When using the `s3` table function with [`INSERT INTO...SELECT`](../../sql-reference/statements/insert-into#inserting-the-results-of-select), data is read and inserted in a streaming fashion. Only a few blocks of data reside in memory while the blocks are continuously read from S3 and pushed into the destination table.
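
As a minimal sketch of the streaming load described above, the following statements copy CSV rows from S3 into a local table. The destination table `test_data` and its engine are hypothetical and introduced only for illustration; the URL and column structure are borrowed from the usage examples on this page.

``` sql
-- Hypothetical destination table; the name and engine are assumptions for illustration.
CREATE TABLE test_data
(
    name String,
    value UInt32
)
ENGINE = MergeTree
ORDER BY name;

-- Rows are read from S3 block by block and written to test_data as they arrive.
INSERT INTO test_data
SELECT name, value
FROM s3(
    'https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/some_prefix/some_file_1.csv',
    'CSV',
    'name String, value UInt32'
);
```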

**Syntax**

``` sql
s3(url [, NOSIGN | access_key_id, secret_access_key, [session_token]] [,format] [,structure] [,compression_method])
s3(named_collection[, option=value [,..]])
```

:::tip GCS
The `s3` table function integrates with Google Cloud Storage by using the GCS XML API and HMAC keys. See the [Google interoperability docs](https://cloud.google.com/storage/docs/interoperability) for more details about the endpoint and HMAC.

For GCS, substitute your HMAC key and HMAC secret where you see `access_key_id` and `secret_access_key`.
:::
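
A sketch of what a GCS query might look like, assuming the XML API endpoint described in the interoperability docs. The bucket, folder, and file name are placeholders, and the two key arguments stand for your own HMAC key and secret; none of these values come from a real bucket.

``` sql
-- Placeholders in angle brackets must be replaced with your own values.
SELECT *
FROM s3(
    'https://storage.googleapis.com/<bucket>/<folder>/<filename>',
    '<HMAC key>',     -- passed in the access_key_id position
    '<HMAC secret>',  -- passed in the secret_access_key position
    'CSVWithNames'
)
LIMIT 5;
```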

**Parameters**

The `s3` table function supports the following plain parameters:

- `url` — Bucket URL with the path to the file. Supports the following wildcards in read-only mode: `*`, `**`, `?`, `{abc,def}` and `{N..M}`, where `N` and `M` are numbers and `'abc'` and `'def'` are strings. For more information, see [Wildcards in path](../../engines/table-engines/integrations/s3.md#wildcards-in-path).
  :::note GCS
  The GCS URL has the following format, because the endpoint for the Google XML API differs from that of the JSON API:
  ```
  https://storage.googleapis.com/<bucket>/<folder>/<filename(s)>
  ```
  and not ~~https://storage.cloud.google.com~~.
  :::
- `NOSIGN` — If this keyword is provided in place of credentials, requests are not signed.
- `access_key_id` and `secret_access_key` — Keys that specify the credentials to use with the given endpoint. Optional.
- `session_token` — Session token to use with the given keys. Optional when passing keys.
- `format` — The [format](../../interfaces/formats.md#formats) of the file.
- `structure` — Structure of the table. Format `'column1_name column1_type, column2_name column2_type, ...'`.
- `compression_method` — Parameter is optional. Supported values: `none`, `gzip/gz`, `brotli/br`, `xz/LZMA`, `zstd/zst`. By default, it will autodetect compression method by file extension.
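
To illustrate how the positional parameters line up, here is a sketch that passes every argument explicitly. It reuses the `test-data.csv.gz` URL from the usage examples on this page and assumes the object can be read without signing, hence `NOSIGN` in the credentials position.

``` sql
SELECT *
FROM s3(
    'https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/test-data.csv.gz',  -- url
    NOSIGN,                       -- in place of access_key_id, secret_access_key
    'CSV',                        -- format
    'name String, value UInt32',  -- structure
    'gzip'                        -- compression_method
);
```

When `format`, `structure`, or `compression_method` are omitted, ClickHouse tries to detect them, as described in the notes in the examples section.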

Arguments can also be passed using [named collections](/docs/en/operations/named-collections.md). In this case `url`, `access_key_id`, `secret_access_key`, `format`, `structure`, `compression_method` work in the same way, and some extra parameters are supported:

 - `filename` — appended to the url if specified.
 - `use_environment_credentials` — enabled by default, allows passing extra parameters using environment variables `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI`, `AWS_CONTAINER_CREDENTIALS_FULL_URI`, `AWS_CONTAINER_AUTHORIZATION_TOKEN`, `AWS_EC2_METADATA_DISABLED`.
 - `no_sign_request` — disabled by default.
 - `expiration_window_seconds` — default value is 120.
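
The sketch below shows how `filename` might be combined with a named collection. The collection name `my_s3_data` is hypothetical, the keys are masked, and the URL prefix is taken from the usage examples on this page; it assumes the collection stores a URL ending in `/` so that the appended file name forms a valid path.

``` sql
-- Hypothetical collection; replace the masked keys with real credentials.
CREATE NAMED COLLECTION my_s3_data AS
    url = 'https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/some_prefix/',
    access_key_id = '***',
    secret_access_key = '***',
    format = 'CSV',
    structure = 'name String, value UInt32';

-- filename is appended to the url stored in the collection.
SELECT count(*)
FROM s3(my_s3_data, filename = 'some_file_1.csv');
```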

**Returned value**

A table with the specified structure for reading or writing data in the specified file.

**Examples**

Select the first 5 rows from the S3 file `https://datasets-documentation.s3.eu-west-3.amazonaws.com/aapl_stock.csv`:

``` sql
SELECT *
FROM s3(
   'https://datasets-documentation.s3.eu-west-3.amazonaws.com/aapl_stock.csv',
   'CSVWithNames'
)
LIMIT 5;
```

```response
┌───────Date─┬────Open─┬────High─┬─────Low─┬───Close─┬───Volume─┬─OpenInt─┐
│ 1984-09-07 │ 0.42388 │ 0.42902 │ 0.41874 │ 0.42388 │ 23220030 │       0 │
│ 1984-09-10 │ 0.42388 │ 0.42516 │ 0.41366 │ 0.42134 │ 18022532 │       0 │
│ 1984-09-11 │ 0.42516 │ 0.43668 │ 0.42516 │ 0.42902 │ 42498199 │       0 │
│ 1984-09-12 │ 0.42902 │ 0.43157 │ 0.41618 │ 0.41618 │ 37125801 │       0 │
│ 1984-09-13 │ 0.43927 │ 0.44052 │ 0.43927 │ 0.43927 │ 57822062 │       0 │
└────────────┴─────────┴─────────┴─────────┴─────────┴──────────┴─────────┘
```

:::note
ClickHouse uses filename extensions to determine the format of the data. For example, we could have run the previous command without the `CSVWithNames`:

``` sql
SELECT *
FROM s3(
   'https://datasets-documentation.s3.eu-west-3.amazonaws.com/aapl_stock.csv'
)
LIMIT 5;
```

ClickHouse can also determine the compression method of the file. For example, if the file was compressed with gzip and has a `.csv.gz` extension, ClickHouse decompresses the file automatically.
:::
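
One way to check what ClickHouse detects is to describe the table function without passing a format or structure. This is a sketch against the same public file; the reported column types depend on schema inference, so no output is shown here.

``` sql
-- Lists the column names and types inferred from the file.
DESCRIBE TABLE s3(
   'https://datasets-documentation.s3.eu-west-3.amazonaws.com/aapl_stock.csv'
);
```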


## Usage

Suppose that we have several files with the following URIs on S3:

- 'https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/some_prefix/some_file_1.csv'
- 'https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/some_prefix/some_file_2.csv'
- 'https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/some_prefix/some_file_3.csv'
- 'https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/some_prefix/some_file_4.csv'
- 'https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/another_prefix/some_file_1.csv'
- 'https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/another_prefix/some_file_2.csv'
- 'https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/another_prefix/some_file_3.csv'
- 'https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/another_prefix/some_file_4.csv'

Count the number of rows in files ending with numbers from 1 to 3:

``` sql
SELECT count(*)
FROM s3('https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/{some,another}_prefix/some_file_{1..3}.csv', 'CSV', 'name String, value UInt32')
```

``` text
┌─count()─┐
│      18 │
└─────────┘
```

Count the total number of rows in all files in these two directories:

``` sql
SELECT count(*)
FROM s3('https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/{some,another}_prefix/*', 'CSV', 'name String, value UInt32')
```

``` text
┌─count()─┐
│      24 │
└─────────┘
```

:::tip
If your listing of files contains number ranges with leading zeros, use the construction with braces for each digit separately or use `?`.
:::

Count the total number of rows in files named `file-000.csv`, `file-001.csv`, ..., `file-999.csv`:

``` sql
SELECT count(*)
FROM s3('https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/big_prefix/file-{000..999}.csv', 'CSV', 'name String, value UInt32');
```

``` text
┌─count()─┐
│      12 │
└─────────┘
```
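
The two alternatives mentioned in the tip above can be sketched as follows, using the same `big_prefix` path. Note that `?` matches any single character, not only digits, so the second form may match more files than the first.

``` sql
-- One brace range per digit: matches file-000.csv through file-999.csv.
SELECT count(*)
FROM s3('https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/big_prefix/file-{0..9}{0..9}{0..9}.csv', 'CSV', 'name String, value UInt32');

-- One `?` per character: matches any three characters after "file-".
SELECT count(*)
FROM s3('https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/big_prefix/file-???.csv', 'CSV', 'name String, value UInt32');
```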

Insert data into file `test-data.csv.gz`:

``` sql
INSERT INTO FUNCTION s3('https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/test-data.csv.gz', 'CSV', 'name String, value UInt32', 'gzip')
VALUES ('test-data', 1), ('test-data-2', 2);
```

Insert data into file `test-data.csv.gz` from existing table:

``` sql
INSERT INTO FUNCTION s3('https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/test-data.csv.gz', 'CSV', 'name String, value UInt32', 'gzip')
SELECT name, value FROM existing_table;
```

The `**` glob can be used for recursive directory traversal. The example below fetches all files from the `my-test-bucket-768` directory recursively:

``` sql
SELECT * FROM s3('https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/**', 'CSV', 'name String, value UInt32', 'gzip');
```

The example below gets data from all `test-data.csv.gz` files in any folder inside the `my-test-bucket-768` directory, recursively:

``` sql
SELECT * FROM s3('https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/**/test-data.csv.gz', 'CSV', 'name String, value UInt32', 'gzip');
```

Note. It is possible to specify custom URL mappers in the server configuration file. Example:
``` sql
SELECT * FROM s3('s3://clickhouse-public-datasets/my-test-bucket-768/**/test-data.csv.gz', 'CSV', 'name String, value UInt32', 'gzip');
```
The URL `'s3://clickhouse-public-datasets/my-test-bucket-768/**/test-data.csv.gz'` would be replaced with `'https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/**/test-data.csv.gz'`.


Custom mappers can be added to `config.xml`:
``` xml
<url_scheme_mappers>
   <s3>
      <to>https://{bucket}.s3.amazonaws.com</to>
   </s3>
   <gs>
      <to>https://{bucket}.storage.googleapis.com</to>
   </gs>
   <oss>
      <to>https://{bucket}.oss.aliyuncs.com</to>
   </oss>
</url_scheme_mappers>
```

For production use cases, it is recommended to use [named collections](/docs/en/operations/named-collections.md). For example:
``` sql
CREATE NAMED COLLECTION creds AS
        access_key_id = '***',
        secret_access_key = '***';

SELECT count(*)
FROM s3(creds, url='https://s3-object-url.csv');
```

## Partitioned Write

If you specify a `PARTITION BY` expression when inserting data with the `s3` table function, a separate file is created for each partition value. Splitting the data into separate files helps to improve the efficiency of read operations.

**Examples**

1. Using partition ID in a key creates separate files:

```sql
INSERT INTO TABLE FUNCTION
    s3('http://bucket.amazonaws.com/my_bucket/file_{_partition_id}.csv', 'CSV', 'a String, b UInt32, c UInt32')
    PARTITION BY a VALUES ('x', 2, 3), ('x', 4, 5), ('y', 11, 12), ('y', 13, 14), ('z', 21, 22), ('z', 23, 24);
```
As a result, the data is written into three files: `file_x.csv`, `file_y.csv`, and `file_z.csv`.

2. Using partition ID in a bucket name creates files in different buckets:

```sql
INSERT INTO TABLE FUNCTION
    s3('http://bucket.amazonaws.com/my_bucket_{_partition_id}/file.csv', 'CSV', 'a UInt32, b UInt32, c UInt32')
    PARTITION BY a VALUES (1, 2, 3), (1, 4, 5), (10, 11, 12), (10, 13, 14), (20, 21, 22), (20, 23, 24);
```
As a result, the data is written into three files in different buckets: `my_bucket_1/file.csv`, `my_bucket_10/file.csv`, and `my_bucket_20/file.csv`.
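
The files written by the first example could then be read back together with a wildcard. This is a sketch that assumes the same illustrative `bucket.amazonaws.com` URL and column structure as above; `_file` is one of the virtual columns described later on this page.

``` sql
-- Count the rows that landed in each partition file (file_x.csv, file_y.csv, file_z.csv).
SELECT _file, count(*) AS row_count
FROM s3('http://bucket.amazonaws.com/my_bucket/file_*.csv', 'CSV', 'a String, b UInt32, c UInt32')
GROUP BY _file
ORDER BY _file;
```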

## Accessing public buckets

ClickHouse tries to fetch credentials from many different types of sources.
Sometimes this causes problems when accessing public buckets: the client sends credentials that the bucket does not accept and receives a `403` error code.
This issue can be avoided by using the `NOSIGN` keyword, which forces the client to ignore all credentials and not sign the requests.

``` sql
SELECT *
FROM s3(
   'https://datasets-documentation.s3.eu-west-3.amazonaws.com/aapl_stock.csv',
   NOSIGN,
   'CSVWithNames'
)
LIMIT 5;
```

## Virtual Columns {#virtual-columns}

- `_path` — Path to the file. Type: `LowCardinality(String)`.
- `_file` — Name of the file. Type: `LowCardinality(String)`.
- `_size` — Size of the file in bytes. Type: `Nullable(UInt64)`. If the file size is unknown, the value is `NULL`.
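
Virtual columns are not returned by `SELECT *` and have to be named explicitly. A short sketch, reusing the files from the usage section, that reports how many rows each matched file contributes:

``` sql
-- any(_size) is used because the size is the same for every row of a given file.
SELECT
    _path,
    _file,
    any(_size) AS size_bytes,
    count(*) AS row_count
FROM s3('https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/{some,another}_prefix/some_file_{1..3}.csv', 'CSV', 'name String, value UInt32')
GROUP BY _path, _file
ORDER BY _path;
```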

## Storage Settings {#storage-settings}

- [s3_truncate_on_insert](/docs/en/operations/settings/settings.md#s3_truncate_on_insert) - allows truncating the file before inserting into it. Disabled by default.
- [s3_create_new_file_on_insert](/docs/en/operations/settings/settings.md#s3_create_new_file_on_insert) - allows creating a new file on each insert if the format has a suffix. Disabled by default.
- [s3_skip_empty_files](/docs/en/operations/settings/settings.md#s3_skip_empty_files) - allows skipping empty files while reading. Disabled by default.
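
These settings can be changed for the session with `SET` or for a single query with a `SETTINGS` clause. The sketch below reuses the URLs from the usage section and assumes you have write access to the target object.

``` sql
-- Overwrite the target file on insert for the rest of the session.
SET s3_truncate_on_insert = 1;

INSERT INTO FUNCTION s3('https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/test-data.csv.gz', 'CSV', 'name String, value UInt32', 'gzip')
VALUES ('test-data', 1), ('test-data-2', 2);

-- Skip empty files for this query only.
SELECT count(*)
FROM s3('https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/{some,another}_prefix/*', 'CSV', 'name String, value UInt32')
SETTINGS s3_skip_empty_files = 1;
```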

**See Also**

- [S3 engine](../../engines/table-engines/integrations/s3.md)