From 613c688331f9a1afea8f4c98ce3a6017348917f0 Mon Sep 17 00:00:00 2001 From: kssenii Date: Wed, 7 Sep 2022 13:04:44 +0200 Subject: [PATCH 1/5] Add documentation for cache --- docs/en/operations/caches.md | 1 + docs/en/operations/storing-data.md | 110 +++++++++++++++++++++++++++++ 2 files changed, 111 insertions(+) diff --git a/docs/en/operations/caches.md b/docs/en/operations/caches.md index 910f57ec56b..3aeae7d1c9d 100644 --- a/docs/en/operations/caches.md +++ b/docs/en/operations/caches.md @@ -20,6 +20,7 @@ Additional cache types: - [Avro format](../interfaces/formats.md#data-format-avro) schemas cache. - [Dictionaries](../sql-reference/dictionaries/index.md) data cache. - Schema inference cache. +- [Filesystem cache](storing-data.md) over S3, Azure, Local and other disks. Indirectly used: diff --git a/docs/en/operations/storing-data.md b/docs/en/operations/storing-data.md index fab78366892..19d5a9a1651 100644 --- a/docs/en/operations/storing-data.md +++ b/docs/en/operations/storing-data.md @@ -112,6 +112,116 @@ Example of disk configuration: ``` +## Using local cache {#using-local-cache} + +It is possible to configure cache over disks in storage configuration starting from version 22.3. For versions 22.3 - 22.7 cache is supported only for `s3` disk type. For versions >= 22.8 cache is supported for any disk type: S3, Azure, Local, Encrypted, etc. Cache uses `LRU` cache policy. + +Example of configuration for versions later or equal to 21.8: + +``` xml + + + + + s3 + ... + ... s3 configuration ... + + + cache + s3 + /s3_cache/ + 10000000 + + + +``` + +Example of configuration for versions earlier than 21.8: + +``` xml + + + + + s3 + ... + ... s3 configuration ... + 1 + 10000000 + + + +``` + +Cache configuration settings (the list corresponds to latest ClickHouse version, for earlier versions something might be unsupported): + +- `path` - path to cache Default: None, this settings is obligatory. 
+ +- `max_size` - size of cache in bytes Default: None, this settings is obligatory. + +- `cache_on_write_operations` - turn on `write-through` cache. Default: `false`. The `write-through` cache is enabled if `cache_on_write_operations` is `true` and user setting `filesystem`. + +- `enable_filesystem_query_cache_limit` - allow to limit the size of cache which is downloaded within each query (depends on user setting `max_query_cache_size`). Default: `false`. + +- `enable_cache_hits_threshold` - a number which defines the number of times some data needs to be read before it will be cached. Default: `0`, e.g. the data is cached at the first attempt to read it. + +- `do_not_evict_index_and_mark_files` - do not evict small frequently used files according to cache policy. Default: `true`. + +- `max_file_segment_size` - a max size for a single cache file. Default: `100 Mb`. + +- `max_elements` a limit for a number of cache files. + +Cache user settings (can be changes per query): + +- `enable_filesystem_cache` - allows to disable cache even if storage policy was configured with `cache` disk type. Default: `true`. + +- `read_from_filesystem_cache_if_exists_otherwise_bypass_cache` - allows to use cache in query only if it already exists, otherwise cache will not be filled with the query data. Default: `false`. + +- `enable_filesystem_cache_on_write_operations` - turn on `write-through` cache. This setting works only if settings `cache_on_write_operations` in cache configuration is turned on. + +- `enable_filesystem_cache_log` - turn on writing to `system.filesystem_cache_log` table. Gives a detailed view of cache usage per query. Default: `false`. + +- `max_query_cache_size` - a limit for the cache size, which can be written to local cache storage. Requires enabled `enable_filesystem_query_cache_limit` in cache configuration. Default: `false`. + +- `skip_download_if_exceeds_query_cache` - allows to change the behaviour of setting `max_query_cache_size`. Default: `true`. 
If this setting is turned on and cache download limit during query was reached, no more cache will be downloaded to cache storage. If this setting is turned off and cache download limit during query was reached, cache will still be written by evicting previously written within current query cache data. E.g. second behaviour allows to preserve `last recentltly used` behaviour. + +Cache system tables: + +- `system.filesystem_cache` - system tables which shows current state of cache. + +- `system.filesystem_cache_log` - system table which shows detailed cache usage per query. Requires `enable_filesystem_cache_log` setting to be `true`. + +Cache commands: + +- `SYSTEM DROP FILESYSTEM CACHE (ON CLUSTER)` + +- `SHOW CACHES` -- show list of caches which were configured on the server. + +- `DESCRIBE CACHE ''` - show cache configuration and some general statistics for a specific cache. Cache name can be taken from `SHOW CACHES` command. + +Cache current metrics: + +- `FilesystemCacheSize` + +- `FilesystemCacheElements` + +Cache asynchronous metrics: + +- `FilesystemCacheBytes` + +- `FilesystemCacheFiles` + +Cache profile events: + +- `CachedReadBufferReadFromSourceBytes`, `CachedReadBufferReadFromCacheBytes,` + +- `CachedReadBufferReadFromSourceMicroseconds`, `CachedReadBufferReadFromCacheMicroseconds` + +- `CachedReadBufferCacheWriteBytes`, `CachedReadBufferCacheWriteMicroseconds` + +- `CachedWriteBufferCacheWriteBytes`, `CachedWriteBufferCacheWriteMicroseconds` + ## Storing Data on Web Server {#storing-data-on-webserver} There is a tool `clickhouse-static-files-uploader`, which prepares a data directory for a given table (`SELECT data_paths FROM system.tables WHERE name = 'table_name'`). For each table you need, you get a directory of files. These files can be uploaded to, for example, a web server with static files. After this preparation, you can load this table into any ClickHouse server via `DiskWeb`. 
From 6f06633df6c47e94c7dc1dc09819db899aaf8cc2 Mon Sep 17 00:00:00 2001 From: Kseniia Sumarokova <54203879+kssenii@users.noreply.github.com> Date: Wed, 7 Sep 2022 13:59:39 +0200 Subject: [PATCH 2/5] Update storing-data.md --- docs/en/operations/storing-data.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/docs/en/operations/storing-data.md b/docs/en/operations/storing-data.md index 19d5a9a1651..f52b388d517 100644 --- a/docs/en/operations/storing-data.md +++ b/docs/en/operations/storing-data.md @@ -154,7 +154,7 @@ Example of configuration for versions earlier than 21.8: ``` -Cache configuration settings (the list corresponds to latest ClickHouse version, for earlier versions something might be unsupported): +Cache **configuration settings**: - `path` - path to cache Default: None, this settings is obligatory. @@ -172,7 +172,7 @@ Cache configuration settings (the list corresponds to latest ClickHouse version, - `max_elements` a limit for a number of cache files. -Cache user settings (can be changes per query): +Cache **query settings**: - `enable_filesystem_cache` - allows to disable cache even if storage policy was configured with `cache` disk type. Default: `true`. @@ -186,15 +186,17 @@ Cache user settings (can be changes per query): - `skip_download_if_exceeds_query_cache` - allows to change the behaviour of setting `max_query_cache_size`. Default: `true`. If this setting is turned on and cache download limit during query was reached, no more cache will be downloaded to cache storage. If this setting is turned off and cache download limit during query was reached, cache will still be written by evicting previously written within current query cache data. E.g. second behaviour allows to preserve `last recentltly used` behaviour. -Cache system tables: +* Cache configuration settings and cache query settings correspond to the latest ClickHouse version, for earlier versions something might not be supported. 
+ +Cache **system tables**: - `system.filesystem_cache` - system tables which shows current state of cache. - `system.filesystem_cache_log` - system table which shows detailed cache usage per query. Requires `enable_filesystem_cache_log` setting to be `true`. -Cache commands: +Cache **commands**: -- `SYSTEM DROP FILESYSTEM CACHE (ON CLUSTER)` +- `SYSTEM DROP FILESYSTEM CACHE () (ON CLUSTER)` - `SHOW CACHES` -- show list of caches which were configured on the server. From 3af51f4340589570888cbb041af22d1a3f68bf73 Mon Sep 17 00:00:00 2001 From: Kseniia Sumarokova <54203879+kssenii@users.noreply.github.com> Date: Wed, 7 Sep 2022 22:21:46 +0200 Subject: [PATCH 3/5] Update storing-data.md --- docs/en/operations/storing-data.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/en/operations/storing-data.md b/docs/en/operations/storing-data.md index f52b388d517..926ee92b95c 100644 --- a/docs/en/operations/storing-data.md +++ b/docs/en/operations/storing-data.md @@ -116,7 +116,7 @@ Example of disk configuration: It is possible to configure cache over disks in storage configuration starting from version 22.3. For versions 22.3 - 22.7 cache is supported only for `s3` disk type. For versions >= 22.8 cache is supported for any disk type: S3, Azure, Local, Encrypted, etc. Cache uses `LRU` cache policy. -Example of configuration for versions later or equal to 21.8: +Example of configuration for versions later or equal to 22.8: ``` xml @@ -137,7 +137,7 @@ Example of configuration for versions later or equal to 21.8: ``` -Example of configuration for versions earlier than 21.8: +Example of configuration for versions earlier than 22.8: ``` xml @@ -156,11 +156,11 @@ Example of configuration for versions earlier than 21.8: Cache **configuration settings**: -- `path` - path to cache Default: None, this settings is obligatory. +- `path` - path to cache. Default: None, this setting is obligatory. 
-- `max_size` - size of cache in bytes Default: None, this settings is obligatory. +- `max_size` - size of the cache in bytes. Default: None, this setting is obligatory. -- `cache_on_write_operations` - turn on `write-through` cache. Default: `false`. The `write-through` cache is enabled if `cache_on_write_operations` is `true` and user setting `filesystem`. +- `cache_on_write_operations` - turn on `write-through` cache. Default: `false`. The `write-through` cache is enabled if `cache_on_write_operations` is `true` and user setting `enable_filesystem_cache_on_write_operations`. - `enable_filesystem_query_cache_limit` - allow to limit the size of cache which is downloaded within each query (depends on user setting `max_query_cache_size`). Default: `false`. @@ -168,9 +168,9 @@ Cache **configuration settings**: - `do_not_evict_index_and_mark_files` - do not evict small frequently used files according to cache policy. Default: `true`. -- `max_file_segment_size` - a max size for a single cache file. Default: `100 Mb`. +- `max_file_segment_size` - a max size for a single cache file. Default: `104857600` (100 Mb). -- `max_elements` a limit for a number of cache files. +- `max_elements` a limit for a number of cache files. Default: `1048576`. Cache **query settings**: From eb53df48d1c29238637eb2c4511ce27ec5853e3a Mon Sep 17 00:00:00 2001 From: Kseniia Sumarokova <54203879+kssenii@users.noreply.github.com> Date: Wed, 7 Sep 2022 22:26:52 +0200 Subject: [PATCH 4/5] Update storing-data.md --- docs/en/operations/storing-data.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/en/operations/storing-data.md b/docs/en/operations/storing-data.md index 926ee92b95c..c0d2c828247 100644 --- a/docs/en/operations/storing-data.md +++ b/docs/en/operations/storing-data.md @@ -184,7 +184,7 @@ Cache **query settings**: - `max_query_cache_size` - a limit for the cache size, which can be written to local cache storage. 
Requires enabled `enable_filesystem_query_cache_limit` in cache configuration. Default: `false`. -- `skip_download_if_exceeds_query_cache` - allows to change the behaviour of setting `max_query_cache_size`. Default: `true`. If this setting is turned on and cache download limit during query was reached, no more cache will be downloaded to cache storage. If this setting is turned off and cache download limit during query was reached, cache will still be written by evicting previously written within current query cache data. E.g. second behaviour allows to preserve `last recentltly used` behaviour. +- `skip_download_if_exceeds_query_cache` - allows to change the behaviour of setting `max_query_cache_size`. Default: `true`. If this setting is turned on and cache download limit during query was reached, no more cache will be downloaded to cache storage. If this setting is turned off and cache download limit during query was reached, cache will still be written by cost of evicting previously downloaded (within current query) data, e.g. second behaviour allows to preserve `last recentltly used` behaviour while keeping query cache limit. * Cache configuration settings and cache query settings correspond to the latest ClickHouse version, for earlier versions something might not be supported. 
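The two behaviours of `skip_download_if_exceeds_query_cache` described in this patch can be illustrated with a toy model. This is only a sketch under stated assumptions, not ClickHouse's implementation: the class `QueryCacheBudget`, its method `download`, and the segment keys and sizes are hypothetical names chosen for illustration; only the two setting names come from the documentation above.

```python
from collections import OrderedDict


class QueryCacheBudget:
    """Toy per-query download budget (hypothetical model, not ClickHouse code)."""

    def __init__(self, max_query_cache_size, skip_download_if_exceeds_query_cache=True):
        self.limit = max_query_cache_size
        self.skip = skip_download_if_exceeds_query_cache
        self.segments = OrderedDict()  # key -> size in bytes, least recently used first
        self.used = 0

    def download(self, key, size):
        """Try to cache a segment; return True if it ends up in the cache."""
        if key in self.segments:
            self.segments.move_to_end(key)  # already cached by this query: refresh recency
            return True
        if size > self.limit:
            return False  # can never fit into this query's budget
        if self.used + size > self.limit:
            if self.skip:
                return False  # limit reached: nothing more is downloaded to the cache
            # Limit reached: evict data cached earlier by this same query until it fits.
            while self.used + size > self.limit:
                _, evicted_size = self.segments.popitem(last=False)
                self.used -= evicted_size
        self.segments[key] = size
        self.used += size
        return True


skipping = QueryCacheBudget(100, skip_download_if_exceeds_query_cache=True)
evicting = QueryCacheBudget(100, skip_download_if_exceeds_query_cache=False)
for budget in (skipping, evicting):
    for key, size in [("a", 60), ("b", 30), ("c", 40)]:
        budget.download(key, size)

print(list(skipping.segments))  # ['a', 'b']
print(list(evicting.segments))  # ['b', 'c']
```

In both modes the bytes cached by one query never exceed the limit. The difference is whether the newest data is dropped (setting on) or the oldest data from the same query is evicted (setting off).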
From 7f086a94a7309b7a708cad806301d553fc20da2b Mon Sep 17 00:00:00 2001 From: Kseniia Sumarokova <54203879+kssenii@users.noreply.github.com> Date: Thu, 8 Sep 2022 12:34:58 +0200 Subject: [PATCH 5/5] Update storing-data.md --- docs/en/operations/storing-data.md | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/docs/en/operations/storing-data.md b/docs/en/operations/storing-data.md index c0d2c828247..546e3d7b7a6 100644 --- a/docs/en/operations/storing-data.md +++ b/docs/en/operations/storing-data.md @@ -114,7 +114,7 @@ Example of disk configuration: ## Using local cache {#using-local-cache} -It is possible to configure cache over disks in storage configuration starting from version 22.3. For versions 22.3 - 22.7 cache is supported only for `s3` disk type. For versions >= 22.8 cache is supported for any disk type: S3, Azure, Local, Encrypted, etc. Cache uses `LRU` cache policy. +It is possible to configure local cache over disks in storage configuration starting from version 22.3. For versions 22.3 - 22.7 cache is supported only for `s3` disk type. For versions >= 22.8 cache is supported for any disk type: S3, Azure, Local, Encrypted, etc. Cache uses `LRU` cache policy. Example of configuration for versions later or equal to 22.8: @@ -156,37 +156,38 @@ Example of configuration for versions earlier than 22.8: Cache **configuration settings**: -- `path` - path to cache. Default: None, this setting is obligatory. +- `path` - path to the directory with cache. Default: None, this setting is obligatory. -- `max_size` - size of the cache in bytes. Default: None, this setting is obligatory. +- `max_size` - maximum size of the cache in bytes. When the limit is reached, cache files are evicted according to the cache eviction policy. Default: None, this setting is obligatory. -- `cache_on_write_operations` - turn on `write-through` cache. Default: `false`. 
The `write-through` cache is enabled if `cache_on_write_operations` is `true` and user setting `enable_filesystem_cache_on_write_operations`. +- `cache_on_write_operations` - allows turning on the `write-through` cache (caching data on any write operation: `INSERT` queries, background merges). Default: `false`. The `write-through` cache can be disabled per query with the setting `enable_filesystem_cache_on_write_operations` (data is cached only if both the cache configuration setting and the corresponding query setting are enabled). - `enable_filesystem_query_cache_limit` - allow to limit the size of cache which is downloaded within each query (depends on user setting `max_query_cache_size`). Default: `false`. -- `enable_cache_hits_threshold` - a number which defines the number of times some data needs to be read before it will be cached. Default: `0`, e.g. the data is cached at the first attempt to read it. +- `enable_cache_hits_threshold` - the number of times some data needs to be read before it is cached. Default: `0`, i.e. the data is cached at the first attempt to read it. - `do_not_evict_index_and_mark_files` - do not evict small frequently used files according to cache policy. Default: `true`. -- `max_file_segment_size` - a max size for a single cache file. Default: `104857600` (100 Mb). -- `max_elements` a limit for a number of cache files. Default: `1048576`. +- `max_file_segment_size` - the maximum size of a single cache file. Default: `104857600` (100 MiB). +- `max_elements` - the limit on the number of cache files. Default: `1048576`. Cache **query settings**: -- `enable_filesystem_cache` - allows to disable cache even if storage policy was configured with `cache` disk type. Default: `true`. +- `enable_filesystem_cache` - allows disabling the cache per query even if the storage policy was configured with the `cache` disk type. Default: `true`. 
-- `read_from_filesystem_cache_if_exists_otherwise_bypass_cache` - allows to use cache in query only if it already exists, otherwise cache will not be filled with the query data. Default: `false`. +- `read_from_filesystem_cache_if_exists_otherwise_bypass_cache` - allows using the cache in a query only if the data is already cached; otherwise the query data is not written to local cache storage. Default: `false`. -- `enable_filesystem_cache_on_write_operations` - turn on `write-through` cache. This setting works only if settings `cache_on_write_operations` in cache configuration is turned on. +- `enable_filesystem_cache_on_write_operations` - turns on the `write-through` cache. This setting works only if the `cache_on_write_operations` setting in the cache configuration is turned on. -- `enable_filesystem_cache_log` - turn on writing to `system.filesystem_cache_log` table. Gives a detailed view of cache usage per query. Default: `false`. +- `enable_filesystem_cache_log` - turns on logging to the `system.filesystem_cache_log` table. Gives a detailed view of cache usage per query. Default: `false`. - `max_query_cache_size` - a limit for the cache size, which can be written to local cache storage. Requires enabled `enable_filesystem_query_cache_limit` in cache configuration. Default: `false`. - `skip_download_if_exceeds_query_cache` - allows to change the behaviour of setting `max_query_cache_size`. Default: `true`. If this setting is turned on and cache download limit during query was reached, no more cache will be downloaded to cache storage. If this setting is turned off and cache download limit during query was reached, cache will still be written by cost of evicting previously downloaded (within current query) data, e.g. second behaviour allows to preserve `last recentltly used` behaviour while keeping query cache limit. -* Cache configuration settings and cache query settings correspond to the latest ClickHouse version, for earlier versions something might not be supported. 
+**Warning** +Cache configuration settings and cache query settings correspond to the latest ClickHouse version; some of them might not be supported in earlier versions. Cache **system tables**:
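Several settings in this series take effect only when a server-side cache configuration setting and a per-query setting are enabled together. The sketch below restates those documented dependencies as code. It is an illustrative model only: the dataclasses `CacheConfig` and `QuerySettings` and the helper functions are hypothetical, and only the setting names are taken from the documentation above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheConfig:
    """Subset of the cache disk configuration (hypothetical container)."""
    cache_on_write_operations: bool = False            # documented default: false
    enable_filesystem_query_cache_limit: bool = False  # documented default: false


@dataclass
class QuerySettings:
    """Subset of per-query settings (hypothetical container, no defaults assumed)."""
    enable_filesystem_cache_on_write_operations: bool
    max_query_cache_size: int


def write_through_enabled(config: CacheConfig, query: QuerySettings) -> bool:
    # Data is cached on writes only if both the configuration setting
    # and the corresponding query setting are enabled.
    return (config.cache_on_write_operations
            and query.enable_filesystem_cache_on_write_operations)


def query_cache_limit(config: CacheConfig, query: QuerySettings) -> Optional[int]:
    # `max_query_cache_size` requires `enable_filesystem_query_cache_limit`
    # in the cache configuration; None here means "no per-query limit applies".
    if not config.enable_filesystem_query_cache_limit:
        return None
    return query.max_query_cache_size


query = QuerySettings(enable_filesystem_cache_on_write_operations=True,
                      max_query_cache_size=1_000_000)

print(write_through_enabled(CacheConfig(), query))  # False
print(write_through_enabled(CacheConfig(cache_on_write_operations=True), query))  # True
print(query_cache_limit(CacheConfig(), query))  # None
print(query_cache_limit(CacheConfig(enable_filesystem_query_cache_limit=True), query))  # 1000000
```

The query-level settings can only narrow what the server configuration allows; a query cannot turn on the write-through cache or the per-query limit if the cache disk was not configured for it.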