Add documentation

kssenii 2024-10-10 13:40:08 +08:00
parent 6f3c1fe0e8
commit 367cdb7227
4 changed files with 67 additions and 0 deletions


@@ -63,7 +63,34 @@ Currently there are 3 ways to authenticate:
- `SAS Token` - Can be used by providing an `endpoint`, `connection_string` or `storage_account_url`. It is identified by the presence of '?' in the URL (see the sketch after this list).
- `Workload Identity` - Can be used by providing an `endpoint` or `storage_account_url`. If the `use_workload_identity` parameter is set in the config, [workload identity](https://github.com/Azure/azure-sdk-for-cpp/tree/main/sdk/identity/azure-identity#authenticate-azure-hosted-applications) is used for authentication.
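For illustration, a SAS-based call could look roughly like the following. This is a hypothetical sketch: the storage account URL, container, blob path and SAS query string are placeholders, not real credentials.
```sql
SELECT *
FROM azureBlobStorage(
    -- hypothetical storage_account_url carrying a SAS query string; the '?' marks SAS authentication
    'https://myaccount.blob.core.windows.net/?sv=2022-11-02&ss=b&srt=sco&sp=rl&sig=REDACTED',
    'test_container', 'data.csv', 'CSV');
```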
### Data cache {#data-cache}
The `Azure` table engine supports data caching on local disk.
See filesystem cache configuration options and usage in this [section](/docs/en/operations/storing-data.md/#using-local-cache).
Caching is keyed by the path and the ETag of the storage object, so ClickHouse will not read a stale cached version.
To enable caching, use the settings `filesystem_cache_name = '<name>'` and `enable_filesystem_cache = 1`.
```sql
SELECT *
FROM azureBlobStorage('DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://azurite1:10000/devstoreaccount1/;', 'test_container', 'test_table', 'CSV')
SETTINGS filesystem_cache_name = 'cache_for_azure', enable_filesystem_cache = 1;
```
There are two ways to define the cache in the configuration file.
1. Add the following section to the ClickHouse configuration file:
``` xml
<clickhouse>
<filesystem_caches>
<cache_for_azure>
<path>path to cache directory</path>
<max_size>10Gi</max_size>
</cache_for_azure>
</filesystem_caches>
</clickhouse>
```
2. Reuse a cache configuration (and therefore cache storage) from the ClickHouse `storage_configuration` section, [described here](/docs/en/operations/storing-data.md/#using-local-cache), as in the sketch after this list.
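A minimal sketch of such a reusable definition, assuming an existing `azure_blob_storage` disk named `blob_storage_disk`; the disk name, paths and credentials below are illustrative (the account shown is the public Azurite development account):
``` xml
<clickhouse>
    <storage_configuration>
        <disks>
            <!-- existing object storage disk (illustrative) -->
            <blob_storage_disk>
                <type>azure_blob_storage</type>
                <storage_account_url>http://azurite1:10000/devstoreaccount1</storage_account_url>
                <container_name>test_container</container_name>
                <account_name>devstoreaccount1</account_name>
                <account_key>Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==</account_key>
            </blob_storage_disk>
            <!-- cache over that disk; referenced by filesystem_cache_name = 'cache_for_azure' -->
            <cache_for_azure>
                <type>cache</type>
                <disk>blob_storage_disk</disk>
                <path>/var/lib/clickhouse/azure_cache/</path>
                <max_size>10Gi</max_size>
            </cache_for_azure>
        </disks>
    </storage_configuration>
</clickhouse>
```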
## See also


@@ -48,6 +48,10 @@ Using named collections:
CREATE TABLE deltalake ENGINE=DeltaLake(deltalake_conf, filename = 'test_table')
```
### Data cache {#data-cache}
The `DeltaLake` table engine and table function support data caching in the same way as the `S3`, `AzureBlobStorage`, and `HDFS` storages. See [here](../../../engines/table-engines/integrations/s3.md#data-cache).
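For example, the cache could be enabled on the `deltaLake` table function roughly like this. This is a sketch: the table URL, credentials and cache name are placeholders, and the cache itself must already be defined in the configuration as described in the linked section.
```sql
SELECT *
FROM deltaLake('http://minio:10000/clickhouse/test_table', 'minioadmin', 'minioadminpassword')
SETTINGS filesystem_cache_name = 'cache_for_s3', enable_filesystem_cache = 1;
```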
## See also
- [deltaLake table function](../../../sql-reference/table-functions/deltalake.md)


@@ -60,6 +60,10 @@ CREATE TABLE iceberg_table ENGINE=IcebergS3(iceberg_conf, filename = 'test_table
Table engine `Iceberg` is an alias to `IcebergS3` now.
### Data cache {#data-cache}
The `Iceberg` table engine and table function support data caching in the same way as the `S3`, `AzureBlobStorage`, and `HDFS` storages. See [here](../../../engines/table-engines/integrations/s3.md#data-cache).
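For example, the cache could be enabled on the `icebergS3` table function roughly like this. This is a sketch: the table URL, credentials and cache name are placeholders, and the cache itself must already be defined in the configuration as described in the linked section.
```sql
SELECT *
FROM icebergS3('http://minio:10000/clickhouse/iceberg_table', 'minioadmin', 'minioadminpassword')
SETTINGS filesystem_cache_name = 'cache_for_s3', enable_filesystem_cache = 1;
```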
## See also
- [iceberg table function](/docs/en/sql-reference/table-functions/iceberg.md)


@@ -26,6 +26,7 @@ SELECT * FROM s3_engine_table LIMIT 2;
│ two │ 2 │
└──────┴───────┘
```
## Create Table {#creating-a-table}
``` sql
@@ -43,6 +44,37 @@ CREATE TABLE s3_engine_table (name String, value UInt32)
- `aws_access_key_id`, `aws_secret_access_key` - Long-term credentials for the [AWS](https://aws.amazon.com/) account user. You can use these to authenticate your requests. This parameter is optional. If credentials are not specified, they are taken from the configuration file. For more information see [Using S3 for Data Storage](../mergetree-family/mergetree.md#table_engine-mergetree-s3).
- `compression` — Compression type. Supported values: `none`, `gzip/gz`, `brotli/br`, `xz/LZMA`, `zstd/zst`. This parameter is optional. By default, compression is auto-detected from the file extension (see the sketch after this list).
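For instance, a table over a gzip-compressed file with explicit long-term credentials could be declared roughly as follows; the bucket URL and credentials are illustrative placeholders.
```sql
CREATE TABLE s3_example_table (name String, value UInt32)
    -- placeholder bucket URL and credentials; 'CSV' is the format, 'gzip' the compression
    ENGINE = S3('https://my-bucket.s3.amazonaws.com/data/rows.csv.gz',
                'AKIAIOSFODNN7EXAMPLE', 'wJalrXUtnFEMI/K7MDENG/bPxRCYEXAMPLEKEY',
                'CSV', 'gzip');
```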
### Data cache {#data-cache}
The `S3` table engine supports data caching on local disk.
See filesystem cache configuration options and usage in this [section](/docs/en/operations/storing-data.md/#using-local-cache).
Caching is keyed by the path and the ETag of the storage object, so ClickHouse will not read a stale cached version.
To enable caching, use the settings `filesystem_cache_name = '<name>'` and `enable_filesystem_cache = 1`.
```sql
SELECT *
FROM s3('http://minio:10000/clickhouse//test_3.csv', 'minioadmin', 'minioadminpassword', 'CSV')
SETTINGS filesystem_cache_name = 'cache_for_s3', enable_filesystem_cache = 1;
```
There are two ways to define the cache in the configuration file.
1. Add the following section to the ClickHouse configuration file:
``` xml
<clickhouse>
<filesystem_caches>
<cache_for_s3>
<path>path to cache directory</path>
<max_size>10Gi</max_size>
</cache_for_s3>
</filesystem_caches>
</clickhouse>
```
2. Reuse a cache configuration (and therefore cache storage) from the ClickHouse `storage_configuration` section, [described here](/docs/en/operations/storing-data.md/#using-local-cache), as in the sketch after this list.
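A minimal sketch of such a reusable definition, assuming an existing `s3` disk named `s3_disk`; the endpoint, credentials and paths below are illustrative:
``` xml
<clickhouse>
    <storage_configuration>
        <disks>
            <!-- existing S3 disk (illustrative endpoint and credentials) -->
            <s3_disk>
                <type>s3</type>
                <endpoint>http://minio:10000/clickhouse/</endpoint>
                <access_key_id>minioadmin</access_key_id>
                <secret_access_key>minioadminpassword</secret_access_key>
            </s3_disk>
            <!-- cache over that disk; referenced by filesystem_cache_name = 'cache_for_s3' -->
            <cache_for_s3>
                <type>cache</type>
                <disk>s3_disk</disk>
                <path>/var/lib/clickhouse/s3_cache/</path>
                <max_size>10Gi</max_size>
            </cache_for_s3>
        </disks>
    </storage_configuration>
</clickhouse>
```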
### PARTITION BY
`PARTITION BY` — Optional. In most cases you don't need a partition key, and if it is needed you generally don't need a partition key more granular than by month. Partitioning does not speed up queries (in contrast to the ORDER BY expression). You should never use too granular partitioning. Don't partition your data by client identifiers or names (instead, make client identifier or name the first column in the ORDER BY expression).
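When a partition key is used with the `S3` engine, the partition value is substituted into the object key. A rough sketch, with a placeholder bucket URL and credentials, assuming monthly partitioning:
```sql
CREATE TABLE s3_partitioned_table (name String, value UInt32, date Date)
    -- '{_partition_id}' in the key is replaced with the partition value on insert
    ENGINE = S3('https://my-bucket.s3.amazonaws.com/data/month_{_partition_id}.csv',
                'AKIAIOSFODNN7EXAMPLE', 'wJalrXUtnFEMI/K7MDENG/bPxRCYEXAMPLEKEY', 'CSV')
    PARTITION BY toYYYYMM(date);
```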