Merge branch 'master' into iaadeflate_upgrade_qpl_v1.0.0

Robert Schulze 2023-02-02 16:45:33 +01:00 committed by GitHub
commit 4b59f5bb4c
GPG Key ID: 4AEE18F83AFDEB23 (no known key found for this signature in database)
9 changed files with 95 additions and 145 deletions

View File

@@ -48,6 +48,7 @@ RUN apt-get update \
gdb \
git \
gperf \
libclang-rt-${LLVM_VERSION}-dev \
lld-${LLVM_VERSION} \
llvm-${LLVM_VERSION} \
llvm-${LLVM_VERSION}-dev \

View File

@@ -22,6 +22,6 @@ Additional cache types:
- [Dictionaries](../sql-reference/dictionaries/index.md) data cache.
- Schema inference cache.
- [Filesystem cache](storing-data.md) over S3, Azure, Local and other disks.
-- [(Experimental) Query result cache](query-result-cache.md).
+- [(Experimental) Query cache](query-cache.md).
To drop one of the caches, use [SYSTEM DROP ... CACHE](../sql-reference/statements/system.md#drop-mark-cache) statements.
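For example, the mark cache (the anchor the link above points to) can be dropped like this:
```sql
SYSTEM DROP MARK CACHE;
```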

View File

@@ -1,112 +0,0 @@
---
slug: /en/operations/query-result-cache
sidebar_position: 65
sidebar_label: Query Result Cache [experimental]
---
# Query Result Cache [experimental]
The query result cache allows computing `SELECT` queries just once and serving further executions of the same query directly from the
cache. Depending on the type of the queries, this can dramatically reduce latency and resource consumption of the ClickHouse server.
## Background, Design and Limitations
Query result caches can generally be viewed as transactionally consistent or inconsistent.
- In transactionally consistent caches, the database invalidates (discards) cached query results if the result of the `SELECT` query changes
or potentially changes. In ClickHouse, operations which change the data include inserts/updates/deletes in/of/from tables or collapsing
merges. Transactionally consistent caching is especially suitable for OLTP databases, for example
[MySQL](https://dev.mysql.com/doc/refman/5.6/en/query-cache.html) (which removed its query cache in v8.0) and
[Oracle](https://docs.oracle.com/database/121/TGDBA/tune_result_cache.htm).
- In transactionally inconsistent caches, slight inaccuracies in query results are accepted under the assumption that all cache entries are
assigned a validity period after which they expire (e.g. 1 minute) and that the underlying data changes only slightly during this period.
This approach is overall more suitable for OLAP databases. As an example where transactionally inconsistent caching is sufficient,
consider an hourly sales report in a reporting tool which is simultaneously accessed by multiple users. Sales data typically changes
slowly enough that the database only needs to compute the report once (represented by the first `SELECT` query). Further queries can be
served directly from the query result cache. In this example, a reasonable validity period could be 30 min.
Transactionally inconsistent caching is traditionally provided by client tools or proxy packages interacting with the database. As a result,
the same caching logic and configuration are often duplicated. With ClickHouse's query result cache, the caching logic moves to the server
side. This reduces maintenance effort and avoids redundancy.
:::warning
The query result cache is an experimental feature that should not be used in production. There are known cases (e.g. in distributed query
processing) where wrong results are returned.
:::
## Configuration Settings and Usage
As long as the query result cache is experimental, it must be activated using the following configuration setting:
```sql
SET allow_experimental_query_result_cache = true;
```
Afterwards, setting [use_query_result_cache](settings/settings.md#use-query-result-cache) can be used to control whether a specific query or
all queries of the current session should utilize the query result cache. For example, the first execution of query
```sql
SELECT some_expensive_calculation(column_1, column_2)
FROM table
SETTINGS use_query_result_cache = true;
```
will store the query result in the query result cache. Subsequent executions of the same query (also with setting
`use_query_result_cache = true`) will read the computed result from the cache and return it immediately.
The way the cache is utilized can be configured in more detail using settings [enable_writes_to_query_result_cache](settings/settings.md#enable-writes-to-query-result-cache)
and [enable_reads_from_query_result_cache](settings/settings.md#enable-reads-from-query-result-cache) (both `true` by default). The first
setting controls whether query results are stored in the cache, whereas the second setting determines whether the database should try to
retrieve query results from the cache. For example, the following query will use the cache only passively, i.e. attempt to read from it but
not store its result in it:
```sql
SELECT some_expensive_calculation(column_1, column_2)
FROM table
SETTINGS use_query_result_cache = true, enable_writes_to_query_result_cache = false;
```
For maximum control, it is generally recommended to provide the settings `use_query_result_cache`, `enable_writes_to_query_result_cache` and
`enable_reads_from_query_result_cache` only with specific queries. It is also possible to enable caching at the user or profile level (e.g. via
`SET use_query_result_cache = true`), but one should keep in mind that all `SELECT` queries, including monitoring or debugging queries to
system tables, may then return cached results.
The query result cache can be cleared using the statement `SYSTEM DROP QUERY RESULT CACHE`. The content of the query result cache is displayed
in system table `SYSTEM.QUERY_RESULT_CACHE`. The numbers of query result cache hits and misses are shown as events `QueryCacheHits` and
`QueryCacheMisses` in system table `SYSTEM.EVENTS`. Both counters are only updated for `SELECT` queries which run with setting
`use_query_result_cache = true`; other queries do not affect the cache miss counter.
The query result cache exists once per ClickHouse server process. However, cached results are by default not shared between users. This can
be changed (see below) but doing so is not recommended for security reasons.
Query results are referenced in the query result cache by the [Abstract Syntax Tree (AST)](https://en.wikipedia.org/wiki/Abstract_syntax_tree)
of their query. This means that caching is case-insensitive; for example, `SELECT 1` and `select 1` are treated as the same query.
To make the matching more natural, all query-level settings related to the query result cache are removed from the AST.
If the query was aborted due to an exception or user cancellation, no entry is written into the query result cache.
The size of the query result cache, the maximum number of cache entries and the maximum size of cache entries (in bytes and in records) can
be configured using different [server configuration options](server-configuration-parameters/settings.md#server_configuration_parameters_query-result-cache).
To define the minimum duration a query must run for its result to be cached, use setting
[query_result_cache_min_query_duration](settings/settings.md#query-result-cache-min-query-duration). For example, the result of query
```sql
SELECT some_expensive_calculation(column_1, column_2)
FROM table
SETTINGS use_query_result_cache = true, query_result_cache_min_query_duration = 5000;
```
is only cached if the query runs longer than 5 seconds. It is also possible to specify how many times a query must run before its result
is cached; for that, use setting [query_result_cache_min_query_runs](settings/settings.md#query-result-cache-min-query-runs).
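A minimal sketch using this setting (the run count is chosen purely for illustration):
```sql
SELECT some_expensive_calculation(column_1, column_2)
FROM table
SETTINGS use_query_result_cache = true, query_result_cache_min_query_runs = 5;
```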
Entries in the query result cache become stale after a certain time period (time-to-live). By default, this period is 60 seconds, but a
different value can be specified at session, profile or query level using setting [query_result_cache_ttl](settings/settings.md#query-result-cache-ttl).
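For example, to keep a result for 5 minutes instead of the default 60 seconds:
```sql
SELECT some_expensive_calculation(column_1, column_2)
FROM table
SETTINGS use_query_result_cache = true, query_result_cache_ttl = 300;
```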
Also, results of queries with non-deterministic functions such as `rand()` and `now()` are not cached by default. This behavior can be
overridden using setting [query_result_cache_store_results_of_queries_with_nondeterministic_functions](settings/settings.md#query-result-cache-store-results-of-queries-with-nondeterministic-functions).
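A sketch of opting such a query into the cache anyway (illustrative only):
```sql
SELECT now(), some_expensive_calculation(column_1, column_2)
FROM table
SETTINGS use_query_result_cache = true,
         query_result_cache_store_results_of_queries_with_nondeterministic_functions = true;
```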
Finally, entries in the query result cache are not shared between users, for security reasons. For example, user A must not be able to bypass a
row policy on a table by running the same query as another user B for whom no such policy exists. However, if necessary, cache entries can
be marked accessible by other users (i.e. shared) by supplying setting
[query_result_cache_share_between_users](settings/settings.md#query-result-cache-share-between-users).
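If sharing is really required, a sketch would be:
```sql
SELECT some_expensive_calculation(column_1, column_2)
FROM table
SETTINGS use_query_result_cache = true, query_result_cache_share_between_users = true;
```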

View File

@@ -1303,7 +1303,7 @@ Default value: `3`.
## use_query_cache {#use-query-cache}
-If turned on, `SELECT` queries may utilize the [query cache](../query-cache.md). Parameters [enable_reads_from_query_cache](#enable-readsfrom-query-cache)
+If turned on, `SELECT` queries may utilize the [query cache](../query-cache.md). Parameters [enable_reads_from_query_cache](#enable-reads-from-query-cache)
and [enable_writes_to_query_cache](#enable-writes-to-query-cache) control in more detail how the cache is used.
Possible values:

View File

@@ -76,7 +76,7 @@
#charts
{
height: 100%;
-display: flex;
+display: none;
flex-flow: row wrap;
gap: 1rem;
}
@@ -170,6 +170,14 @@
background: var(--button-background-color);
}
#auth-error {
color: var(--error-color);
display: flex;
flex-flow: row nowrap;
justify-content: center;
}
form {
display: inline;
}
@@ -293,6 +301,7 @@
</div>
</form>
</div>
<div id="auth-error"></div>
<div id="charts"></div>
<script>
@@ -322,6 +331,11 @@ if (location.protocol != 'file:') {
user = 'default';
}
const errorCodeRegex = /Code: (\d+)/
const errorCodeMessageMap = {
516: 'Error authenticating with database. Please check your connection params and try again.'
}
/// This is just a demo configuration of the dashboard.
let queries = [
@@ -597,6 +611,11 @@ function insertChart(i) {
query_editor_confirm.value = 'Ok';
query_editor_confirm.className = 'edit-confirm';
function getCurrentIndex() {
/// Indices may change after deletion of other element, hence captured "i" may become incorrect.
return [...charts.querySelectorAll('.chart')].findIndex(child => chart == child);
}
function editConfirm() {
query_editor.style.display = 'none';
query_error.style.display = 'none';
@@ -605,7 +624,8 @@ function insertChart(i) {
title_text.data = '';
findParamsInQuery(q.query, params);
buildParams();
-draw(i, chart, getParamsForURL(), q.query);
+const idx = getCurrentIndex();
+draw(idx, chart, getParamsForURL(), q.query);
saveState();
}
@@ -649,8 +669,7 @@ function insertChart(i) {
let trash_text = document.createTextNode('✕');
trash.appendChild(trash_text);
trash.addEventListener('click', e => {
-/// Indices may change after deletion of other element, hence captured "i" may become incorrect.
-let idx = [...charts.querySelectorAll('.chart')].findIndex(child => chart == child);
+const idx = getCurrentIndex();
if (plots[idx]) {
plots[idx].destroy();
plots[idx] = null;
@@ -796,6 +815,18 @@ async function draw(idx, chart, url_params, query) {
error = e.toString();
}
if (error) {
const errorMatch = error.match(errorCodeRegex)
if (errorMatch && errorMatch[1]) {
const code = errorMatch[1]
if (errorCodeMessageMap[code]) {
const authError = new Error(errorCodeMessageMap[code])
authError.code = code
throw authError
}
}
}
if (!error) {
if (!Array.isArray(data)) {
error = "Query should return an array.";
@@ -853,16 +884,50 @@ async function draw(idx, chart, url_params, query) {
sync.sub(plots[idx]);
/// Set title
-const title = queries[idx].title ? queries[idx].title.replaceAll(/\{(\w+)\}/g, (_, name) => params[name] ) : '';
+const title = queries[idx] && queries[idx].title ? queries[idx].title.replaceAll(/\{(\w+)\}/g, (_, name) => params[name] ) : '';
chart.querySelector('.title').firstChild.data = title;
}
function showAuthError(message) {
const charts = document.querySelector('#charts');
charts.style.display = 'none';
const add = document.querySelector('#add');
add.style.display = 'none';
const authError = document.querySelector('#auth-error');
authError.textContent = message;
authError.style.display = 'flex';
}
function hideAuthError() {
const charts = document.querySelector('#charts');
charts.style.display = 'flex';
const add = document.querySelector('#add');
add.style.display = 'block';
const authError = document.querySelector('#auth-error');
authError.textContent = '';
authError.style.display = 'none';
}
let firstLoad = true;
async function drawAll() {
let params = getParamsForURL();
const charts = document.getElementsByClassName('chart');
-for (let i = 0; i < queries.length; ++i) {
-draw(i, charts[i], params, queries[i].query);
+if (!firstLoad) {
+hideAuthError();
+}
+await Promise.all([...Array(queries.length)].map(async (_, i) => {
+return draw(i, charts[i], params, queries[i].query).catch((e) => {
+if (!firstLoad) {
+showAuthError(e.message);
+}
+});
+})).then(() => {
+firstLoad = false;
+})
}
function resize() {

View File

@@ -17,7 +17,7 @@
#include <Processors/QueryPlan/ExpressionStep.h>
#include <Processors/QueryPlan/FilterStep.h>
#include <Processors/QueryPlan/ReadFromPreparedSource.h>
-#include <Processors/Executors/PullingPipelineExecutor.h>
+#include <Processors/Executors/PullingAsyncPipelineExecutor.h>
#include <Processors/Transforms/CheckSortedTransform.h>
#include <Parsers/ASTIdentifier.h>
#include <Parsers/ASTFunction.h>
@@ -197,7 +197,7 @@ bool isStorageTouchedByMutations(
MergeTreeData::DataPartPtr source_part,
const StorageMetadataPtr & metadata_snapshot,
const std::vector<MutationCommand> & commands,
-ContextMutablePtr context_copy)
+ContextPtr context)
{
if (commands.empty())
return false;
@@ -210,7 +210,7 @@ bool isStorageTouchedByMutations(
if (command.partition)
{
-const String partition_id = storage.getPartitionIDFromQuery(command.partition, context_copy);
+const String partition_id = storage.getPartitionIDFromQuery(command.partition, context);
if (partition_id == source_part->info.partition_id)
all_commands_can_be_skipped = false;
}
@@ -221,15 +221,7 @@ bool isStorageTouchedByMutations(
if (all_commands_can_be_skipped)
return false;
-/// We must read with one thread because it guarantees that
-/// output stream will be sorted after reading from MergeTree parts.
-/// Disable all settings that can enable reading with several streams.
-context_copy->setSetting("max_streams_to_max_threads_ratio", 1);
-context_copy->setSetting("max_threads", 1);
-context_copy->setSetting("allow_asynchronous_read_from_io_pool_for_merge_tree", false);
-context_copy->setSetting("max_streams_for_merge_tree_reading", Field(0));
-ASTPtr select_query = prepareQueryAffectedAST(commands, storage.shared_from_this(), context_copy);
+ASTPtr select_query = prepareQueryAffectedAST(commands, storage.shared_from_this(), context);
auto storage_from_part = std::make_shared<StorageFromMergeTreeDataPart>(source_part);
@@ -237,12 +229,12 @@ bool isStorageTouchedByMutations(
/// For some reason it may copy context and give it into ExpressionTransform
/// after that we will use context from destroyed stack frame in our stream.
InterpreterSelectQuery interpreter(
-select_query, context_copy, storage_from_part, metadata_snapshot, SelectQueryOptions().ignoreLimits().ignoreProjections());
+select_query, context, storage_from_part, metadata_snapshot, SelectQueryOptions().ignoreLimits().ignoreProjections());
auto io = interpreter.execute();
-PullingPipelineExecutor executor(io.pipeline);
+PullingAsyncPipelineExecutor executor(io.pipeline);
Block block;
-while (executor.pull(block)) {}
+while (block.rows() == 0 && executor.pull(block));
if (!block.rows())
return false;

View File

@@ -23,7 +23,7 @@ bool isStorageTouchedByMutations(
MergeTreeData::DataPartPtr source_part,
const StorageMetadataPtr & metadata_snapshot,
const std::vector<MutationCommand> & commands,
-ContextMutablePtr context_copy
+ContextPtr context
);
ASTPtr getPartitionAndPredicateExpressionForMutationCommand(

View File

@@ -1543,13 +1543,6 @@ bool MutateTask::prepare()
auto context_for_reading = Context::createCopy(ctx->context);
-/// We must read with one thread because it guarantees that output stream will be sorted.
-/// Disable all settings that can enable reading with several streams.
-context_for_reading->setSetting("max_streams_to_max_threads_ratio", 1);
-context_for_reading->setSetting("max_threads", 1);
-context_for_reading->setSetting("allow_asynchronous_read_from_io_pool_for_merge_tree", false);
-context_for_reading->setSetting("max_streams_for_merge_tree_reading", Field(0));
/// Allow mutations to work when force_index_by_date or force_primary_key is on.
context_for_reading->setSetting("force_index_by_date", false);
context_for_reading->setSetting("force_primary_key", false);
@@ -1562,7 +1555,7 @@ bool MutateTask::prepare()
}
if (ctx->source_part->isStoredOnDisk() && !isStorageTouchedByMutations(
-*ctx->data, ctx->source_part, ctx->metadata_snapshot, ctx->commands_for_part, Context::createCopy(context_for_reading)))
+*ctx->data, ctx->source_part, ctx->metadata_snapshot, ctx->commands_for_part, context_for_reading))
{
NameSet files_to_copy_instead_of_hardlinks;
auto settings_ptr = ctx->data->getSettings();
@@ -1597,6 +1590,15 @@ bool MutateTask::prepare()
LOG_TRACE(ctx->log, "Mutating part {} to mutation version {}", ctx->source_part->name, ctx->future_part->part_info.mutation);
}
/// We must read with one thread because it guarantees that output stream will be sorted.
/// Disable all settings that can enable reading with several streams.
/// NOTE: isStorageTouchedByMutations() above is done without these settings because it
/// should be ok to calculate count() with multiple streams.
context_for_reading->setSetting("max_streams_to_max_threads_ratio", 1);
context_for_reading->setSetting("max_threads", 1);
context_for_reading->setSetting("allow_asynchronous_read_from_io_pool_for_merge_tree", false);
context_for_reading->setSetting("max_streams_for_merge_tree_reading", Field(0));
MutationHelpers::splitMutationCommands(ctx->source_part, ctx->commands_for_part, ctx->for_interpreter, ctx->for_file_renames);
ctx->stage_progress = std::make_unique<MergeStageProgress>(1.0);

View File

@@ -208,6 +208,8 @@ Merge it only if you intend to backport changes to the target branch, otherwise
self.cherrypick_pr.add_to_labels(Labels.CHERRYPICK)
self.cherrypick_pr.add_to_labels(Labels.DO_NOT_TEST)
self._assign_new_pr(self.cherrypick_pr)
# update cherrypick PR to get the state for PR.mergeable
self.cherrypick_pr.update()
def create_backport(self):
assert self.cherrypick_pr is not None