---
slug: /en/operations/caches
sidebar_position: 65
sidebar_label: Query Result Cache [experimental]
title: "Query Result Cache [experimental]"
---
# Query Result Cache [experimental]
The query result cache is an experimental feature which can speed up repeated executions of the same SELECT query.
## Background, Design and Limitations
Query caches are generally either transactionally consistent or inconsistent.

- In transactionally consistent caches, the database invalidates (discards) cached query results if the result of the SELECT query changes or
potentially changes. In ClickHouse, operations which change the data include inserts into, updates of, and deletes from tables, as well as
collapsing merges. Transactionally consistent caching is especially suitable for OLTP databases, for example
[MySQL](https://dev.mysql.com/doc/refman/5.6/en/query-cache.html) (which removed its query cache in version 8.0) and
[Oracle](https://docs.oracle.com/database/121/TGDBA/tune_result_cache.htm).
- In transactionally inconsistent caches, slight inaccuracies in query results are accepted under the assumption that all cache entries are
assigned a validity period after which they expire (e.g. 1 minute) and that the underlying data changes only slightly during this period.
This approach is generally better suited to OLAP databases. As an example where transactionally inconsistent caching is sufficient,
consider an hourly sales report in a reporting tool which is simultaneously accessed by multiple users. Sales data typically changes
slowly enough that the database only needs to compute the report (represented by a SELECT query) once. Further queries can be served
directly from the query result cache. In this example, a reasonable validity period could be 30 minutes.

Transactionally inconsistent caching is traditionally provided by client tools or proxy packages interacting with the database. As a result,
the same caching logic and configuration are often duplicated across these tools. With ClickHouse's query result cache, the caching logic
moves to the server side. This reduces maintenance effort and avoids redundancy.
## Usage Examples and Configuration Settings
The query/user/profile-level setting [enable_experimental_query_result_cache](settings/settings.md#enable-experimental-query-result-cache)
controls whether query results are inserted into or retrieved from the cache. For example, the first execution of the query
``` sql
SELECT expensive_calculation(A, B, C)
FROM T
SETTINGS enable_experimental_query_result_cache = true;
```
stores the query result in the query result cache, and subsequent executions retrieve the result directly from the cache.

It is sometimes desirable to use the query result cache only passively, i.e. to read from it but not write to it. For that, use setting
[enable_experimental_query_result_cache_passive_usage](settings/settings.md#enable-experimental-query-result-cache-passive-usage)
instead of `enable_experimental_query_result_cache`.
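
A minimal sketch of passive usage, reusing the query from the example above: with only the passive setting enabled, the query may be answered from an existing cache entry, but on a cache miss its result is not written to the cache.

```sql
-- Read-only use of the cache: an existing entry for this query may be returned,
-- but a cache miss does not create a new entry.
SELECT expensive_calculation(A, B, C)
FROM T
SETTINGS enable_experimental_query_result_cache_passive_usage = true;
```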

For maximum control, it is generally recommended to enable caching on a per-query basis. It is also possible to activate caching at the
user or profile level, but users should keep in mind that all SELECT queries may then return outdated results.
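
The following sketch shows the broader scopes. The profile name `report_caching` and the user name `report_user` are hypothetical (the user must already exist), and the statements assume that `enable_experimental_query_result_cache` can be stored in a settings profile, as its description as a query/user/profile-level setting suggests. In both cases, every SELECT query in that scope may return outdated results.

```sql
-- Hypothetical profile and user: every SELECT query run by report_user
-- may now be written to and served from the query result cache.
CREATE SETTINGS PROFILE IF NOT EXISTS report_caching
    SETTINGS enable_experimental_query_result_cache = 1
    TO report_user;

-- Session level: applies to all subsequent SELECT queries of the current session.
SET enable_experimental_query_result_cache = 1;
```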

To clear the query result cache, use the statement `SYSTEM DROP QUERY RESULT CACHE`. The content of the query result cache is displayed in
system table `system.query_result_cache`. The number of query result cache hits and misses is shown as events `QueryResultCacheHits` and
`QueryResultCacheMisses` in system table `system.events`. Both counters are only updated for SELECT queries which run with setting
`enable_experimental_query_result_cache = true` or `enable_experimental_query_result_cache_passive_usage = true`. In particular, other
queries do not increment the cache miss counter.
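
A sketch for inspecting and clearing the cache with the statement, system tables and event names named above. Note that `system.events` by default lists only events that have occurred at least once, so a counter may be absent before the first cache hit or miss.

```sql
-- List the current cache entries.
SELECT *
FROM system.query_result_cache;

-- Show how often results were served from the cache (hits) or not (misses).
SELECT event, value
FROM system.events
WHERE event IN ('QueryResultCacheHits', 'QueryResultCacheMisses');

-- Remove all entries from the query result cache.
SYSTEM DROP QUERY RESULT CACHE;
```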

The cache exists once per ClickHouse server process, but cache entries are by default not shared between users (see below).

Query results are keyed in the cache by the [Abstract Syntax Tree (AST)](https://en.wikipedia.org/wiki/Abstract_syntax_tree) of their query. This means that caching is agnostic to the upper/lowercase spelling of keywords: for example, `SELECT 1` and `select 1` are treated as the same query.
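
As a small illustration, the two statements below differ only in keyword case and should therefore map to the same cache entry, so the second execution can be served from the entry written by the first. This assumes that the minimum-duration and minimum-runs thresholds described below are left at values that allow the first result to be cached.

```sql
-- Same AST, different keyword case: the second query is expected to be a cache hit.
SELECT 1 SETTINGS enable_experimental_query_result_cache = true;
select 1 settings enable_experimental_query_result_cache = true;
```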
### Further Configuration Options
To configure the size of the query result cache, use setting [query_result_cache_size](server-configuration-parameters/settings.md#server_configuration_parameters_query-result-cache-size).

To set the maximum number of cache entries and the maximum size of a cache entry in bytes and in records, use settings [query_result_cache_max_entries](server-configuration-parameters/settings.md#server_configuration_parameters_query-result-cache-max-entries), [query_result_cache_max_entry_size](server-configuration-parameters/settings.md#server_configuration_parameters_query-result-cache-max-entry-size) and [query_result_cache_max_entry_records](server-configuration-parameters/settings.md#server_configuration_parameters_query-result-cache-max-entry-records).

To define the minimum time a query must run for its result to be cached, use setting [query_result_cache_min_query_duration](settings/settings.md#query-result-cache-min-query-duration).
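
For example, the following sketch caches the result only if the query ran long enough. It assumes that `query_result_cache_min_query_duration` is specified in milliseconds; the threshold of 5000 is an arbitrary illustration.

```sql
-- Store the result only if this execution took at least 5000 ms (unit assumed to be milliseconds).
SELECT expensive_calculation(A, B, C)
FROM T
SETTINGS enable_experimental_query_result_cache = true,
         query_result_cache_min_query_duration = 5000;
```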

To control how many times a query must run before its result is cached, use setting [query_result_cache_min_query_runs](settings/settings.md#query-result-cache-min-query-runs).
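
A similar sketch for the run-count threshold. The value 3 is arbitrary, and whether the result is first stored on the third or the fourth execution is an assumption to verify against the settings reference.

```sql
-- Only repeatedly executed queries are cached: the result is not stored
-- until this query has run the configured number of times (here 3).
SELECT expensive_calculation(A, B, C)
FROM T
SETTINGS enable_experimental_query_result_cache = true,
         query_result_cache_min_query_runs = 3;
```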
To specify the validity period after which cache entries become stale, use setting [query_result_cache_ttl](settings/settings.md#query-result-cache-ttl).
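
Applied to the hourly sales report from the background section, a 30-minute validity period might look as follows. The table `sales` and its columns are hypothetical, and the sketch assumes that `query_result_cache_ttl` is specified in seconds.

```sql
-- Hypothetical table: sales(ts DateTime, amount Decimal(18, 2)).
-- Cache entries for this report become stale after 1800 s = 30 minutes (unit assumed to be seconds).
SELECT toStartOfHour(ts) AS report_hour, sum(amount) AS revenue
FROM sales
GROUP BY report_hour
ORDER BY report_hour
SETTINGS enable_experimental_query_result_cache = true,
         query_result_cache_ttl = 1800;
```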

Results of queries with non-deterministic functions such as `rand()` and `now()` are not cached by default. This behavior can be overridden using setting [query_result_cache_store_results_of_queries_with_nondeterministic_functions](settings/settings.md#query-result-cache-store-results-of-queries-with-nondeterministic-functions).
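
A sketch of overriding this default for a query that calls `now()`. Because the cached result is returned unchanged until the entry expires, the reported timestamp can lag behind the actual time by up to the validity period.

```sql
-- now() makes the query non-deterministic; without the last setting, its result would not be cached.
SELECT now() AS computed_at, count() AS row_count
FROM T
SETTINGS enable_experimental_query_result_cache = true,
         query_result_cache_store_results_of_queries_with_nondeterministic_functions = true;
```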

Query cache entries are not shared between users for security reasons. For example, user A must not be able to bypass a row policy on a
table by running the same query as another user B for whom no such policy exists. If sharing is nevertheless necessary, cache entries can be marked
accessible by other users (i.e. shared) using setting [query_result_cache_share_between_users](settings/settings.md#query-result-cache-share-between-users).
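
A sketch of opting a single query into sharing. Given the row-policy concern above, this is only safe when no row policies or other per-user restrictions apply to the queried table (`T` is the placeholder table from the earlier example).

```sql
-- Make this cache entry readable by other users as well.
-- Only appropriate if every user is allowed to see the same rows of T.
SELECT expensive_calculation(A, B, C)
FROM T
SETTINGS enable_experimental_query_result_cache = true,
         query_result_cache_share_between_users = true;
```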

Finally, it is sometimes useful to cache the results of the same query multiple times with different validity periods. To distinguish
different entries for the same query, users may pass setting [query_result_cache_partition_key](settings/settings.md#query-result-cache-partition-key).
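
For instance, a reporting tool might keep a short-lived and a long-lived result for the same query. The partition key values `'short'` and `'long'` are hypothetical, the sketch assumes that `query_result_cache_partition_key` accepts a string, and it assumes that `query_result_cache_ttl` is given in seconds.

```sql
-- Two independent cache entries for the same query, distinguished by partition key.
-- Entry 'short' becomes stale after 60 s, entry 'long' after 3600 s.
SELECT expensive_calculation(A, B, C)
FROM T
SETTINGS enable_experimental_query_result_cache = true,
         query_result_cache_ttl = 60,
         query_result_cache_partition_key = 'short';

SELECT expensive_calculation(A, B, C)
FROM T
SETTINGS enable_experimental_query_result_cache = true,
         query_result_cache_ttl = 3600,
         query_result_cache_partition_key = 'long';
```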