Merge branch 'master' into iaadeflate_upgrade_qpl_v1.0.0
commit 4b59f5bb4c

@@ -48,6 +48,7 @@ RUN apt-get update \
 gdb \
 git \
 gperf \
+libclang-rt-${LLVM_VERSION}-dev \
 lld-${LLVM_VERSION} \
 llvm-${LLVM_VERSION} \
 llvm-${LLVM_VERSION}-dev \

@@ -22,6 +22,6 @@ Additional cache types:
 - [Dictionaries](../sql-reference/dictionaries/index.md) data cache.
 - Schema inference cache.
 - [Filesystem cache](storing-data.md) over S3, Azure, Local and other disks.
-- [(Experimental) Query result cache](query-result-cache.md).
+- [(Experimental) Query cache](query-cache.md).

 To drop one of the caches, use [SYSTEM DROP ... CACHE](../sql-reference/statements/system.md#drop-mark-cache) statements.
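
For instance, a minimal sketch of dropping one of the caches listed above (the mark cache is just an example):

```sql
SYSTEM DROP MARK CACHE;
```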

@@ -1,112 +0,0 @@
---
slug: /en/operations/query-result-cache
sidebar_position: 65
sidebar_label: Query Result Cache [experimental]
---

# Query Result Cache [experimental]

The query result cache allows computing `SELECT` queries just once and serving further executions of the same query directly from the cache. Depending on the type of the queries, this can dramatically reduce latency and resource consumption of the ClickHouse server.

## Background, Design and Limitations

Query result caches can generally be viewed as transactionally consistent or inconsistent.

- In transactionally consistent caches, the database invalidates (discards) cached query results if the result of the `SELECT` query changes or potentially changes. In ClickHouse, operations which change the data include inserts/updates/deletes in/of/from tables or collapsing merges. Transactionally consistent caching is especially suitable for OLTP databases, for example [MySQL](https://dev.mysql.com/doc/refman/5.6/en/query-cache.html) (which removed its query cache after v8.0) and [Oracle](https://docs.oracle.com/database/121/TGDBA/tune_result_cache.htm).
- In transactionally inconsistent caches, slight inaccuracies in query results are accepted under the assumption that all cache entries are assigned a validity period after which they expire (e.g. 1 minute) and that the underlying data changes only slightly during this period. This approach is overall more suitable for OLAP databases. As an example where transactionally inconsistent caching is sufficient, consider an hourly sales report in a reporting tool which is simultaneously accessed by multiple users. Sales data typically changes slowly enough that the database only needs to compute the report once (represented by the first `SELECT` query). Further queries can be served directly from the query result cache. In this example, a reasonable validity period could be 30 minutes.

Transactionally inconsistent caching is traditionally provided by client tools or proxy packages interacting with the database. As a result, the same caching logic and configuration is often duplicated. With ClickHouse's query result cache, the caching logic moves to the server side. This reduces maintenance effort and avoids redundancy.

:::warning
The query result cache is an experimental feature that should not be used in production. There are known cases (e.g. in distributed query processing) where wrong results are returned.
:::

## Configuration Settings and Usage

As long as the query result cache is experimental, it must be activated using the following configuration setting:

```sql
SET allow_experimental_query_result_cache = true;
```

Afterwards, setting [use_query_result_cache](settings/settings.md#use-query-result-cache) can be used to control whether a specific query or all queries of the current session should utilize the query result cache. For example, the first execution of query

```sql
SELECT some_expensive_calculation(column_1, column_2)
FROM table
SETTINGS use_query_result_cache = true;
```

will store the query result in the query result cache. Subsequent executions of the same query (also with parameter `use_query_result_cache = true`) will read the computed result from the cache and return it immediately.

The way the cache is utilized can be configured in more detail using settings [enable_writes_to_query_result_cache](settings/settings.md#enable-writes-to-query-result-cache) and [enable_reads_from_query_result_cache](settings/settings.md#enable-reads-from-query-result-cache) (both `true` by default). The first setting controls whether query results are stored in the cache, whereas the second setting determines if the database should try to retrieve query results from the cache. For example, the following query will use the cache only passively, i.e. attempt to read from it but not store its result in it:

```sql
SELECT some_expensive_calculation(column_1, column_2)
FROM table
SETTINGS use_query_result_cache = true, enable_writes_to_query_result_cache = false;
```

For maximum control, it is generally recommended to provide the settings `use_query_result_cache`, `enable_writes_to_query_result_cache` and `enable_reads_from_query_result_cache` only with specific queries. It is also possible to enable caching at the user or profile level (e.g. via `SET use_query_result_cache = true`), but one should keep in mind that all `SELECT` queries, including monitoring or debugging queries to system tables, may then return cached results.
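
As a minimal sketch of such session-level activation (the query against `system.tables` merely illustrates the caveat above):

```sql
SET allow_experimental_query_result_cache = true;
SET use_query_result_cache = true;

-- every SELECT in this session may now read from and write to the cache,
-- including monitoring queries like this one
SELECT count() FROM system.tables;
```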

The query result cache can be cleared using statement `SYSTEM DROP QUERY RESULT CACHE`. The content of the query result cache is displayed in system table `system.query_result_cache`. The number of query result cache hits and misses is shown as events `QueryCacheHits` and `QueryCacheMisses` in system table `system.events`. Both counters are only updated for `SELECT` queries which run with setting `use_query_result_cache = true`; other queries do not affect the cache miss counter.

The query result cache exists once per ClickHouse server process. However, cache results are by default not shared between users. This can be changed (see below), but doing so is not recommended for security reasons.

Query results are referenced in the query result cache by the [Abstract Syntax Tree (AST)](https://en.wikipedia.org/wiki/Abstract_syntax_tree) of their query. This means that caching is agnostic to upper/lowercase; for example, `SELECT 1` and `select 1` are treated as the same query. To make the matching more natural, all query-level settings related to the query result cache are removed from the AST.
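
For example (a sketch), these two statements produce a single cache entry because their ASTs are identical once the cache-related settings are stripped:

```sql
SELECT 1 SETTINGS use_query_result_cache = true; -- computes the result and stores it
select 1 SETTINGS use_query_result_cache = true; -- same AST: served from the cache
```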

If the query was aborted due to an exception or user cancellation, no entry is written into the query result cache.

The size of the query result cache, the maximum number of cache entries and the maximum size of cache entries (in bytes and in records) can be configured using different [server configuration options](server-configuration-parameters/settings.md#server_configuration_parameters_query-result-cache).

To define the minimum amount of time a query must run for its result to be cached, you can use setting [query_result_cache_min_query_duration](settings/settings.md#query-result-cache-min-query-duration). For example, the result of query

```sql
SELECT some_expensive_calculation(column_1, column_2)
FROM table
SETTINGS use_query_result_cache = true, query_result_cache_min_query_duration = 5000;
```

is only cached if the query runs longer than 5 seconds. It is also possible to specify how often a query needs to run until its result is cached; for that, use setting [query_result_cache_min_query_runs](settings/settings.md#query-result-cache-min-query-runs).
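
A corresponding sketch for run-count-based caching (the threshold of 3 is illustrative):

```sql
SELECT some_expensive_calculation(column_1, column_2)
FROM table
SETTINGS use_query_result_cache = true, query_result_cache_min_query_runs = 3;
```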

Entries in the query result cache become stale after a certain time period (time-to-live). By default, this period is 60 seconds, but a different value can be specified at session, profile or query level using setting [query_result_cache_ttl](settings/settings.md#query-result-cache-ttl).
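
A sketch of overriding the TTL for a single query (the value of 300 seconds is illustrative):

```sql
SELECT some_expensive_calculation(column_1, column_2)
FROM table
SETTINGS use_query_result_cache = true, query_result_cache_ttl = 300;
```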

Also, results of queries with non-deterministic functions such as `rand()` and `now()` are not cached. This can be overruled using setting [query_result_cache_store_results_of_queries_with_nondeterministic_functions](settings/settings.md#query-result-cache-store-results-of-queries-with-nondeterministic-functions).
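
For instance (a sketch), a query calling `now()` is normally bypassed by the cache, but caching can be forced:

```sql
SELECT now(), some_expensive_calculation(column_1, column_2)
FROM table
SETTINGS use_query_result_cache = true,
         query_result_cache_store_results_of_queries_with_nondeterministic_functions = true;
```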

Finally, entries in the query result cache are not shared between users for security reasons. For example, user A must not be able to bypass a row policy on a table by running the same query as another user B for whom no such policy exists. However, if necessary, cache entries can be marked as accessible by other users (i.e. shared) by supplying setting [query_result_cache_share_between_users](settings/settings.md#query-result-cache-share-between-users).
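
A sketch of marking a single query's cache entry as shared; as noted above, this is not recommended where row policies differ between users:

```sql
SELECT some_expensive_calculation(column_1, column_2)
FROM table
SETTINGS use_query_result_cache = true, query_result_cache_share_between_users = true;
```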

@@ -1303,7 +1303,7 @@ Default value: `3`.

 ## use_query_cache {#use-query-cache}

-If turned on, `SELECT` queries may utilize the [query cache](../query-cache.md). Parameters [enable_reads_from_query_cache](#enable-readsfrom-query-cache)
+If turned on, `SELECT` queries may utilize the [query cache](../query-cache.md). Parameters [enable_reads_from_query_cache](#enable-reads-from-query-cache)
 and [enable_writes_to_query_cache](#enable-writes-to-query-cache) control in more detail how the cache is used.

 Possible values:

@@ -76,7 +76,7 @@
 #charts
 {
     height: 100%;
-    display: flex;
+    display: none;
     flex-flow: row wrap;
     gap: 1rem;
 }

@@ -170,6 +170,14 @@
     background: var(--button-background-color);
 }

+#auth-error {
+    color: var(--error-color);
+
+    display: flex;
+    flex-flow: row nowrap;
+    justify-content: center;
+}
+
 form {
     display: inline;
 }

@@ -293,6 +301,7 @@
         </div>
     </form>
 </div>
+<div id="auth-error"></div>
 <div id="charts"></div>
 <script>

@@ -322,6 +331,11 @@ if (location.protocol != 'file:') {
     user = 'default';
 }

+const errorCodeRegex = /Code: (\d+)/
+const errorCodeMessageMap = {
+    516: 'Error authenticating with database. Please check your connection params and try again.'
+}
+
 /// This is just a demo configuration of the dashboard.

 let queries = [

@@ -597,6 +611,11 @@ function insertChart(i) {
     query_editor_confirm.value = 'Ok';
     query_editor_confirm.className = 'edit-confirm';

+    function getCurrentIndex() {
+        /// Indices may change after deletion of other element, hence captured "i" may become incorrect.
+        return [...charts.querySelectorAll('.chart')].findIndex(child => chart == child);
+    }
+
     function editConfirm() {
         query_editor.style.display = 'none';
         query_error.style.display = 'none';

@@ -605,7 +624,8 @@ function insertChart(i) {
         title_text.data = '';
         findParamsInQuery(q.query, params);
         buildParams();
-        draw(i, chart, getParamsForURL(), q.query);
+        const idx = getCurrentIndex();
+        draw(idx, chart, getParamsForURL(), q.query);
         saveState();
     }


@@ -649,8 +669,7 @@ function insertChart(i) {
     let trash_text = document.createTextNode('✕');
     trash.appendChild(trash_text);
     trash.addEventListener('click', e => {
-        /// Indices may change after deletion of other element, hence captured "i" may become incorrect.
-        let idx = [...charts.querySelectorAll('.chart')].findIndex(child => chart == child);
+        const idx = getCurrentIndex();
         if (plots[idx]) {
             plots[idx].destroy();
             plots[idx] = null;

@@ -796,6 +815,18 @@ async function draw(idx, chart, url_params, query) {
         error = e.toString();
     }

+    if (error) {
+        const errorMatch = error.match(errorCodeRegex)
+        if (errorMatch && errorMatch[1]) {
+            const code = errorMatch[1]
+            if (errorCodeMessageMap[code]) {
+                const authError = new Error(errorCodeMessageMap[code])
+                authError.code = code
+                throw authError
+            }
+        }
+    }
+
     if (!error) {
         if (!Array.isArray(data)) {
             error = "Query should return an array.";

@@ -853,16 +884,50 @@ async function draw(idx, chart, url_params, query) {
     sync.sub(plots[idx]);

     /// Set title
-    const title = queries[idx].title ? queries[idx].title.replaceAll(/\{(\w+)\}/g, (_, name) => params[name] ) : '';
+    const title = queries[idx] && queries[idx].title ? queries[idx].title.replaceAll(/\{(\w+)\}/g, (_, name) => params[name] ) : '';
     chart.querySelector('.title').firstChild.data = title;
 }

+function showAuthError(message) {
+    const charts = document.querySelector('#charts');
+    charts.style.display = 'none';
+    const add = document.querySelector('#add');
+    add.style.display = 'none';
+
+    const authError = document.querySelector('#auth-error');
+    authError.textContent = message;
+    authError.style.display = 'flex';
+}
+
+function hideAuthError() {
+    const charts = document.querySelector('#charts');
+    charts.style.display = 'flex';
+    const add = document.querySelector('#add');
+    add.style.display = 'block';
+
+    const authError = document.querySelector('#auth-error');
+    authError.textContent = '';
+    authError.style.display = 'none';
+}
+
+let firstLoad = true;
+
 async function drawAll() {
     let params = getParamsForURL();
     const charts = document.getElementsByClassName('chart');
-    for (let i = 0; i < queries.length; ++i) {
-        draw(i, charts[i], params, queries[i].query);
-    }
+
+    if (!firstLoad) {
+        hideAuthError();
+    }
+    await Promise.all([...Array(queries.length)].map(async (_, i) => {
+        return draw(i, charts[i], params, queries[i].query).catch((e) => {
+            if (!firstLoad) {
+                showAuthError(e.message);
+            }
+        });
+    })).then(() => {
+        firstLoad = false;
+    })
 }

 function resize() {

@@ -17,7 +17,7 @@
 #include <Processors/QueryPlan/ExpressionStep.h>
 #include <Processors/QueryPlan/FilterStep.h>
 #include <Processors/QueryPlan/ReadFromPreparedSource.h>
-#include <Processors/Executors/PullingPipelineExecutor.h>
+#include <Processors/Executors/PullingAsyncPipelineExecutor.h>
 #include <Processors/Transforms/CheckSortedTransform.h>
 #include <Parsers/ASTIdentifier.h>
 #include <Parsers/ASTFunction.h>

@@ -197,7 +197,7 @@ bool isStorageTouchedByMutations(
     MergeTreeData::DataPartPtr source_part,
     const StorageMetadataPtr & metadata_snapshot,
     const std::vector<MutationCommand> & commands,
-    ContextMutablePtr context_copy)
+    ContextPtr context)
 {
     if (commands.empty())
         return false;

@@ -210,7 +210,7 @@ bool isStorageTouchedByMutations(

         if (command.partition)
         {
-            const String partition_id = storage.getPartitionIDFromQuery(command.partition, context_copy);
+            const String partition_id = storage.getPartitionIDFromQuery(command.partition, context);
             if (partition_id == source_part->info.partition_id)
                 all_commands_can_be_skipped = false;
         }

@@ -221,15 +221,7 @@ bool isStorageTouchedByMutations(
     if (all_commands_can_be_skipped)
         return false;

-    /// We must read with one thread because it guarantees that
-    /// output stream will be sorted after reading from MergeTree parts.
-    /// Disable all settings that can enable reading with several streams.
-    context_copy->setSetting("max_streams_to_max_threads_ratio", 1);
-    context_copy->setSetting("max_threads", 1);
-    context_copy->setSetting("allow_asynchronous_read_from_io_pool_for_merge_tree", false);
-    context_copy->setSetting("max_streams_for_merge_tree_reading", Field(0));
-
-    ASTPtr select_query = prepareQueryAffectedAST(commands, storage.shared_from_this(), context_copy);
+    ASTPtr select_query = prepareQueryAffectedAST(commands, storage.shared_from_this(), context);

     auto storage_from_part = std::make_shared<StorageFromMergeTreeDataPart>(source_part);

@@ -237,12 +229,12 @@ bool isStorageTouchedByMutations(
     /// For some reason it may copy context and give it into ExpressionTransform
     /// after that we will use context from destroyed stack frame in our stream.
     InterpreterSelectQuery interpreter(
-        select_query, context_copy, storage_from_part, metadata_snapshot, SelectQueryOptions().ignoreLimits().ignoreProjections());
+        select_query, context, storage_from_part, metadata_snapshot, SelectQueryOptions().ignoreLimits().ignoreProjections());
     auto io = interpreter.execute();
-    PullingPipelineExecutor executor(io.pipeline);
+    PullingAsyncPipelineExecutor executor(io.pipeline);

     Block block;
-    while (executor.pull(block)) {}
+    while (block.rows() == 0 && executor.pull(block));

     if (!block.rows())
         return false;

@@ -23,7 +23,7 @@ bool isStorageTouchedByMutations(
     MergeTreeData::DataPartPtr source_part,
     const StorageMetadataPtr & metadata_snapshot,
     const std::vector<MutationCommand> & commands,
-    ContextMutablePtr context_copy
+    ContextPtr context
 );

 ASTPtr getPartitionAndPredicateExpressionForMutationCommand(

@@ -1543,13 +1543,6 @@ bool MutateTask::prepare()

     auto context_for_reading = Context::createCopy(ctx->context);

-    /// We must read with one thread because it guarantees that output stream will be sorted.
-    /// Disable all settings that can enable reading with several streams.
-    context_for_reading->setSetting("max_streams_to_max_threads_ratio", 1);
-    context_for_reading->setSetting("max_threads", 1);
-    context_for_reading->setSetting("allow_asynchronous_read_from_io_pool_for_merge_tree", false);
-    context_for_reading->setSetting("max_streams_for_merge_tree_reading", Field(0));
-
     /// Allow mutations to work when force_index_by_date or force_primary_key is on.
     context_for_reading->setSetting("force_index_by_date", false);
     context_for_reading->setSetting("force_primary_key", false);

@@ -1562,7 +1555,7 @@ bool MutateTask::prepare()
     }

     if (ctx->source_part->isStoredOnDisk() && !isStorageTouchedByMutations(
-        *ctx->data, ctx->source_part, ctx->metadata_snapshot, ctx->commands_for_part, Context::createCopy(context_for_reading)))
+        *ctx->data, ctx->source_part, ctx->metadata_snapshot, ctx->commands_for_part, context_for_reading))
     {
         NameSet files_to_copy_instead_of_hardlinks;
         auto settings_ptr = ctx->data->getSettings();

@@ -1597,6 +1590,15 @@ bool MutateTask::prepare()
         LOG_TRACE(ctx->log, "Mutating part {} to mutation version {}", ctx->source_part->name, ctx->future_part->part_info.mutation);
     }

+    /// We must read with one thread because it guarantees that output stream will be sorted.
+    /// Disable all settings that can enable reading with several streams.
+    /// NOTE: isStorageTouchedByMutations() above is done without these settings because it
+    /// should be ok to calculate count() with multiple streams.
+    context_for_reading->setSetting("max_streams_to_max_threads_ratio", 1);
+    context_for_reading->setSetting("max_threads", 1);
+    context_for_reading->setSetting("allow_asynchronous_read_from_io_pool_for_merge_tree", false);
+    context_for_reading->setSetting("max_streams_for_merge_tree_reading", Field(0));
+
     MutationHelpers::splitMutationCommands(ctx->source_part, ctx->commands_for_part, ctx->for_interpreter, ctx->for_file_renames);

     ctx->stage_progress = std::make_unique<MergeStageProgress>(1.0);

@@ -208,6 +208,8 @@ Merge it only if you intend to backport changes to the target branch, otherwise
         self.cherrypick_pr.add_to_labels(Labels.CHERRYPICK)
         self.cherrypick_pr.add_to_labels(Labels.DO_NOT_TEST)
         self._assign_new_pr(self.cherrypick_pr)
+        # update cherrypick PR to get the state for PR.mergable
+        self.cherrypick_pr.update()

     def create_backport(self):
         assert self.cherrypick_pr is not None