diff --git a/docs/en/engines/table-engines/integrations/azureBlobStorage.md b/docs/en/engines/table-engines/integrations/azureBlobStorage.md
index bdf96832e9d..bb1349ad9d0 100644
--- a/docs/en/engines/table-engines/integrations/azureBlobStorage.md
+++ b/docs/en/engines/table-engines/integrations/azureBlobStorage.md
@@ -63,7 +63,34 @@ Currently there are 3 ways to authenticate:
 - `SAS Token` - Can be used by providing an `endpoint`, `connection_string` or `storage_account_url`. It is identified by presence of '?' in the url.
 - `Workload Identity` - Can be used by providing an `endpoint` or `storage_account_url`. If `use_workload_identity` parameter is set in config, ([workload identity](https://github.com/Azure/azure-sdk-for-cpp/tree/main/sdk/identity/azure-identity#authenticate-azure-hosted-applications)) is used for authentication.
 
+### Data cache {#data-cache}
+
+`AzureBlobStorage` table engine supports data caching on local disk.
+See filesystem cache configuration options and usage in this [section](/docs/en/operations/storing-data.md/#using-local-cache).
+The cache is keyed by the path and the ETag of the storage object, so ClickHouse will not read a stale cached version.
+
+To enable caching, use the settings `filesystem_cache_name = '<name>'` and `enable_filesystem_cache = 1`.
+
+```sql
+SELECT *
+FROM azureBlobStorage('DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://azurite1:10000/devstoreaccount1/;', 'test_container', 'test_table', 'CSV')
+SETTINGS filesystem_cache_name = 'cache_for_azure', enable_filesystem_cache = 1;
+```
+
+There are two ways to define a cache in the configuration file:
+
+1. Add the following section to the ClickHouse configuration file:
+
+``` xml
+<clickhouse>
+    <filesystem_caches>
+        <cache_for_azure>
+            <path>path to cache directory</path>
+            <max_size>10Gi</max_size>
+        </cache_for_azure>
+    </filesystem_caches>
+</clickhouse>
+```
+
+2. Reuse a cache configuration (and therefore cache storage) from the ClickHouse `storage_configuration` section, [described here](/docs/en/operations/storing-data.md/#using-local-cache).
 
 ## See also
 
diff --git a/docs/en/engines/table-engines/integrations/deltalake.md b/docs/en/engines/table-engines/integrations/deltalake.md
index 964c952f31a..fb564b4873e 100644
--- a/docs/en/engines/table-engines/integrations/deltalake.md
+++ b/docs/en/engines/table-engines/integrations/deltalake.md
@@ -48,6 +48,10 @@ Using named collections:
 CREATE TABLE deltalake ENGINE=DeltaLake(deltalake_conf, filename = 'test_table')
 ```
 
+### Data cache {#data-cache}
+
+`DeltaLake` table engine and table function support data caching in the same way as the `S3`, `AzureBlobStorage` and `HDFS` storages do. See [here](../../../engines/table-engines/integrations/s3.md#data-cache).
+
 ## See also
 
 - [deltaLake table function](../../../sql-reference/table-functions/deltalake.md)
diff --git a/docs/en/engines/table-engines/integrations/iceberg.md b/docs/en/engines/table-engines/integrations/iceberg.md
index 94468066372..939312bafab 100644
--- a/docs/en/engines/table-engines/integrations/iceberg.md
+++ b/docs/en/engines/table-engines/integrations/iceberg.md
@@ -60,6 +60,10 @@ CREATE TABLE iceberg_table ENGINE=IcebergS3(iceberg_conf, filename = 'test_table
 
 Table engine `Iceberg` is an alias to `IcebergS3` now.
 
+### Data cache {#data-cache}
+
+`Iceberg` table engine and table function support data caching in the same way as the `S3`, `AzureBlobStorage` and `HDFS` storages do. See [here](../../../engines/table-engines/integrations/s3.md#data-cache).
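+
+For example, assuming a filesystem cache named `cache_for_s3` is defined in the server configuration (as shown in the `S3` data cache section), a read through the `icebergS3` table function can enable it with the same settings; the endpoint, credentials and table path below are placeholders:
+
+```sql
+-- hypothetical endpoint, credentials and path; the cache name must match a cache defined in the configuration
+SELECT *
+FROM icebergS3('http://minio:10000/clickhouse/iceberg_table/', 'minioadmin', 'minioadminpassword')
+SETTINGS filesystem_cache_name = 'cache_for_s3', enable_filesystem_cache = 1;
+```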
+
 ## See also
 
 - [iceberg table function](/docs/en/sql-reference/table-functions/iceberg.md)
diff --git a/docs/en/engines/table-engines/integrations/s3.md b/docs/en/engines/table-engines/integrations/s3.md
index f02d0563491..fb759b948a5 100644
--- a/docs/en/engines/table-engines/integrations/s3.md
+++ b/docs/en/engines/table-engines/integrations/s3.md
@@ -26,6 +26,7 @@ SELECT * FROM s3_engine_table LIMIT 2;
 │ two │ 2 │
 └──────┴───────┘
 ```
+
 ## Create Table {#creating-a-table}
 
 ``` sql
@@ -43,6 +44,37 @@ CREATE TABLE s3_engine_table (name String, value UInt32)
 - `aws_access_key_id`, `aws_secret_access_key` - Long-term credentials for the [AWS](https://aws.amazon.com/) account user. You can use these to authenticate your requests. Parameter is optional. If credentials are not specified, they are used from the configuration file. For more information see [Using S3 for Data Storage](../mergetree-family/mergetree.md#table_engine-mergetree-s3).
 - `compression` — Compression type. Supported values: `none`, `gzip/gz`, `brotli/br`, `xz/LZMA`, `zstd/zst`. Parameter is optional. By default, it will auto-detect compression by file extension.
 
+### Data cache {#data-cache}
+
+`S3` table engine supports data caching on local disk.
+See filesystem cache configuration options and usage in this [section](/docs/en/operations/storing-data.md/#using-local-cache).
+The cache is keyed by the path and the ETag of the storage object, so ClickHouse will not read a stale cached version.
+
+To enable caching, use the settings `filesystem_cache_name = '<name>'` and `enable_filesystem_cache = 1`.
+
+```sql
+SELECT *
+FROM s3('http://minio:10000/clickhouse//test_3.csv', 'minioadmin', 'minioadminpassword', 'CSV')
+SETTINGS filesystem_cache_name = 'cache_for_s3', enable_filesystem_cache = 1;
+```
+
+There are two ways to define a cache in the configuration file:
+
+1. Add the following section to the ClickHouse configuration file:
+
+``` xml
+<clickhouse>
+    <filesystem_caches>
+        <cache_for_s3>
+            <path>path to cache directory</path>
+            <max_size>10Gi</max_size>
+        </cache_for_s3>
+    </filesystem_caches>
+</clickhouse>
+```
+
+2. Reuse a cache configuration (and therefore cache storage) from the ClickHouse `storage_configuration` section, [described here](/docs/en/operations/storing-data.md/#using-local-cache).
+
 ### PARTITION BY
 
 `PARTITION BY` — Optional. In most cases you don't need a partition key, and if it is needed you generally don't need a partition key more granular than by month. Partitioning does not speed up queries (in contrast to the ORDER BY expression). You should never use too granular partitioning. Don't partition your data by client identifiers or names (instead, make client identifier or name the first column in the ORDER BY expression).
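+
+For illustration, a partitioned table might look like the sketch below; the bucket URL is a placeholder, and the `{_partition_id}` substitution in the key is replaced by the partition value on write (partitioned writes are described in more detail for the s3 table function):
+
+```sql
+-- placeholder bucket; {_partition_id} expands to the partition value in each written key
+CREATE TABLE s3_partitioned (name String, value UInt32, date Date)
+ENGINE = S3('https://my-bucket.s3.amazonaws.com/data_{_partition_id}.csv', 'CSV')
+PARTITION BY toYYYYMM(date);
+```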