Add documentation

kssenii 2024-10-10 13:40:08 +08:00
parent 6f3c1fe0e8
commit 367cdb7227
4 changed files with 67 additions and 0 deletions


@@ -63,7 +63,34 @@ Currently there are 3 ways to authenticate:
- `SAS Token` - Can be used by providing an `endpoint`, `connection_string` or `storage_account_url`. It is identified by the presence of '?' in the URL (see the sketch after this list).
- `Workload Identity` - Can be used by providing an `endpoint` or `storage_account_url`. If the `use_workload_identity` parameter is set in the config, [workload identity](https://github.com/Azure/azure-sdk-for-cpp/tree/main/sdk/identity/azure-identity#authenticate-azure-hosted-applications) is used for authentication.
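For illustration, a SAS-based call could look roughly like the following. This is a hypothetical sketch: the storage account URL, container, blob path and SAS query string are placeholders, not real credentials.
```sql
SELECT *
FROM azureBlobStorage(
    -- hypothetical storage_account_url carrying a SAS query string; the '?' marks SAS authentication
    'https://myaccount.blob.core.windows.net/?sv=2022-11-02&ss=b&srt=sco&sp=rl&sig=REDACTED',
    'test_container', 'data.csv', 'CSV');
```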
### Data cache {#data-cache}
The `Azure` table engine supports data caching on local disk.
See filesystem cache configuration options and usage in this [section](/docs/en/operations/storing-data.md/#using-local-cache).
Caching is keyed by the path and the ETag of the storage object, so ClickHouse will not read a stale cached version.
To enable caching, use the settings `filesystem_cache_name = '<name>'` and `enable_filesystem_cache = 1`.
```sql
SELECT *
FROM azureBlobStorage('DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://azurite1:10000/devstoreaccount1/;', 'test_container', 'test_table', 'CSV')
SETTINGS filesystem_cache_name = 'cache_for_azure', enable_filesystem_cache = 1;
```
There are two ways to define the cache in the configuration file.
1. Add the following section to the ClickHouse configuration file:
``` xml
<clickhouse>
<filesystem_caches>
<cache_for_azure>
<path>path to cache directory</path>
<max_size>10Gi</max_size>
</cache_for_azure>
</filesystem_caches>
</clickhouse>
```
2. Reuse a cache configuration (and therefore cache storage) from the ClickHouse `storage_configuration` section, [described here](/docs/en/operations/storing-data.md/#using-local-cache), as in the sketch after this list.
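A minimal sketch of such a reusable definition, assuming an existing `azure_blob_storage` disk named `blob_storage_disk`; the disk name, paths and credentials below are illustrative (the account shown is the public Azurite development account):
``` xml
<clickhouse>
    <storage_configuration>
        <disks>
            <!-- existing object storage disk (illustrative) -->
            <blob_storage_disk>
                <type>azure_blob_storage</type>
                <storage_account_url>http://azurite1:10000/devstoreaccount1</storage_account_url>
                <container_name>test_container</container_name>
                <account_name>devstoreaccount1</account_name>
                <account_key>Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==</account_key>
            </blob_storage_disk>
            <!-- cache over that disk; referenced by filesystem_cache_name = 'cache_for_azure' -->
            <cache_for_azure>
                <type>cache</type>
                <disk>blob_storage_disk</disk>
                <path>/var/lib/clickhouse/azure_cache/</path>
                <max_size>10Gi</max_size>
            </cache_for_azure>
        </disks>
    </storage_configuration>
</clickhouse>
```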
## See also


@@ -48,6 +48,10 @@ Using named collections:
CREATE TABLE deltalake ENGINE=DeltaLake(deltalake_conf, filename = 'test_table')
```
### Data cache {#data-cache}
The `DeltaLake` table engine and table function support data caching in the same way as the `S3`, `AzureBlobStorage`, and `HDFS` storages. See [here](../../../engines/table-engines/integrations/s3.md#data-cache).
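For example, the cache could be enabled on the `deltaLake` table function roughly like this. This is a sketch: the table URL, credentials and cache name are placeholders, and the cache itself must already be defined in the configuration as described in the linked section.
```sql
SELECT *
FROM deltaLake('http://minio:10000/clickhouse/test_table', 'minioadmin', 'minioadminpassword')
SETTINGS filesystem_cache_name = 'cache_for_s3', enable_filesystem_cache = 1;
```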
## See also
- [deltaLake table function](../../../sql-reference/table-functions/deltalake.md)


@@ -60,6 +60,10 @@ CREATE TABLE iceberg_table ENGINE=IcebergS3(iceberg_conf, filename = 'test_table
Table engine `Iceberg` is an alias to `IcebergS3` now.
### Data cache {#data-cache}
The `Iceberg` table engine and table function support data caching in the same way as the `S3`, `AzureBlobStorage`, and `HDFS` storages. See [here](../../../engines/table-engines/integrations/s3.md#data-cache).
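For example, the cache could be enabled on the `icebergS3` table function roughly like this. This is a sketch: the table URL, credentials and cache name are placeholders, and the cache itself must already be defined in the configuration as described in the linked section.
```sql
SELECT *
FROM icebergS3('http://minio:10000/clickhouse/iceberg_table', 'minioadmin', 'minioadminpassword')
SETTINGS filesystem_cache_name = 'cache_for_s3', enable_filesystem_cache = 1;
```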
## See also
- [iceberg table function](/docs/en/sql-reference/table-functions/iceberg.md)


@@ -26,6 +26,7 @@ SELECT * FROM s3_engine_table LIMIT 2;
│ two │ 2 │
└──────┴───────┘
```
## Create Table {#creating-a-table}
``` sql
@@ -43,6 +44,37 @@ CREATE TABLE s3_engine_table (name String, value UInt32)
- `aws_access_key_id`, `aws_secret_access_key` - Long-term credentials for the [AWS](https://aws.amazon.com/) account user. You can use these to authenticate your requests. This parameter is optional. If credentials are not specified, they are taken from the configuration file. For more information see [Using S3 for Data Storage](../mergetree-family/mergetree.md#table_engine-mergetree-s3).
- `compression` — Compression type. Supported values: `none`, `gzip/gz`, `brotli/br`, `xz/LZMA`, `zstd/zst`. This parameter is optional. By default, compression is auto-detected from the file extension (see the sketch after this list).
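For instance, a table over a gzip-compressed file with explicit long-term credentials could be declared roughly as follows; the bucket URL and credentials are illustrative placeholders.
```sql
CREATE TABLE s3_example_table (name String, value UInt32)
    -- placeholder bucket URL and credentials; 'CSV' is the format, 'gzip' the compression
    ENGINE = S3('https://my-bucket.s3.amazonaws.com/data/rows.csv.gz',
                'AKIAIOSFODNN7EXAMPLE', 'wJalrXUtnFEMI/K7MDENG/bPxRCYEXAMPLEKEY',
                'CSV', 'gzip');
```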
### Data cache {#data-cache}
The `S3` table engine supports data caching on local disk.
See filesystem cache configuration options and usage in this [section](/docs/en/operations/storing-data.md/#using-local-cache).
Caching is keyed by the path and the ETag of the storage object, so ClickHouse will not read a stale cached version.
To enable caching, use the settings `filesystem_cache_name = '<name>'` and `enable_filesystem_cache = 1`.
```sql
SELECT *
FROM s3('http://minio:10000/clickhouse//test_3.csv', 'minioadmin', 'minioadminpassword', 'CSV')
SETTINGS filesystem_cache_name = 'cache_for_s3', enable_filesystem_cache = 1;
```
There are two ways to define the cache in the configuration file.
1. Add the following section to the ClickHouse configuration file:
``` xml
<clickhouse>
<filesystem_caches>
<cache_for_s3>
<path>path to cache directory</path>
<max_size>10Gi</max_size>
</cache_for_s3>
</filesystem_caches>
</clickhouse>
```
2. Reuse a cache configuration (and therefore cache storage) from the ClickHouse `storage_configuration` section, [described here](/docs/en/operations/storing-data.md/#using-local-cache), as in the sketch after this list.
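A minimal sketch of such a reusable definition, assuming an existing `s3` disk named `s3_disk`; the endpoint, credentials and paths below are illustrative:
``` xml
<clickhouse>
    <storage_configuration>
        <disks>
            <!-- existing S3 disk (illustrative endpoint and credentials) -->
            <s3_disk>
                <type>s3</type>
                <endpoint>http://minio:10000/clickhouse/</endpoint>
                <access_key_id>minioadmin</access_key_id>
                <secret_access_key>minioadminpassword</secret_access_key>
            </s3_disk>
            <!-- cache over that disk; referenced by filesystem_cache_name = 'cache_for_s3' -->
            <cache_for_s3>
                <type>cache</type>
                <disk>s3_disk</disk>
                <path>/var/lib/clickhouse/s3_cache/</path>
                <max_size>10Gi</max_size>
            </cache_for_s3>
        </disks>
    </storage_configuration>
</clickhouse>
```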
### PARTITION BY
`PARTITION BY` — Optional. In most cases you don't need a partition key, and if it is needed you generally don't need a partition key more granular than by month. Partitioning does not speed up queries (in contrast to the ORDER BY expression). You should never use too granular partitioning. Don't partition your data by client identifiers or names (instead, make client identifier or name the first column in the ORDER BY expression).
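When a partition key is used with the `S3` engine, the partition value is substituted into the object key. A rough sketch, with a placeholder bucket URL and credentials, assuming monthly partitioning:
```sql
CREATE TABLE s3_partitioned_table (name String, value UInt32, date Date)
    -- '{_partition_id}' in the key is replaced with the partition value on insert
    ENGINE = S3('https://my-bucket.s3.amazonaws.com/data/month_{_partition_id}.csv',
                'AKIAIOSFODNN7EXAMPLE', 'wJalrXUtnFEMI/K7MDENG/bPxRCYEXAMPLEKEY', 'CSV')
    PARTITION BY toYYYYMM(date);
```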