Merge branch 'master' into use-iobject-storage-for-table-engines-1

Kseniia Sumarokova 2024-03-11 18:47:56 +01:00 committed by GitHub
commit 38ba6990bb
36 changed files with 1063 additions and 153 deletions


@ -433,3 +433,292 @@ Result:
│ [0,1,2,3,4,5,6,7] │
└───────────────────┘
```
## mortonEncode
Calculates the Morton encoding (ZCurve) for a list of unsigned integers.
The function has two modes of operation:
- Simple
- Expanded
### Simple mode
Accepts up to 8 unsigned integers as arguments and produces a UInt64 code.
**Syntax**
```sql
mortonEncode(args)
```
**Parameters**
- `args`: up to 8 [unsigned integers](../../sql-reference/data-types/int-uint.md) or columns of the aforementioned type.
**Returned value**
- A UInt64 code
Type: [UInt64](../../sql-reference/data-types/int-uint.md)
**Example**
Query:
```sql
SELECT mortonEncode(1, 2, 3);
```
Result:
```response
53
```
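To see where `53` comes from: in binary, 1 = 001, 2 = 010, and 3 = 011. Interleaving the bits of the arguments, with the first argument contributing the least significant bit of each group, gives 110101 in binary, which is 53.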
### Expanded mode
Accepts a range mask ([tuple](../../sql-reference/data-types/tuple.md)) as the first argument and up to 8 [unsigned integers](../../sql-reference/data-types/int-uint.md) as the remaining arguments.
Each number in the mask configures the amount of range expansion:<br/>
1 - no expansion<br/>
2 - 2x expansion<br/>
3 - 3x expansion<br/>
...<br/>
Up to 8x expansion.<br/>
**Syntax**
```sql
mortonEncode(range_mask, args)
```
**Parameters**
- `range_mask`: a [tuple](../../sql-reference/data-types/tuple.md) with one element per argument, each element in the range 1-8.
- `args`: up to 8 [unsigned integers](../../sql-reference/data-types/int-uint.md) or columns of the aforementioned type.
Note: when using columns for `args`, the provided `range_mask` tuple should still be a constant.
**Returned value**
- A UInt64 code
Type: [UInt64](../../sql-reference/data-types/int-uint.md)
**Example**
Range expansion can be beneficial when you need a similar distribution for arguments with wildly different ranges (or cardinality).
For example: 'IP Address' (0...FFFFFFFF) and 'Country code' (0...FF).
Query:
```sql
SELECT mortonEncode((1,2), 1024, 16);
```
Result:
```response
1572864
```
Note: tuple size must be equal to the number of the other arguments.
**Example**
Morton encoding for one argument is always the argument itself:
Query:
```sql
SELECT mortonEncode(1);
```
Result:
```response
1
```
**Example**
It is also possible to expand a single argument:
Query:
```sql
SELECT mortonEncode(tuple(2), 128);
```
Result:
```response
32768
```
**Example**
You can also use column names in the function.
First create the table and insert some data.
Query:
```sql
create table morton_numbers(
n1 UInt32,
n2 UInt32,
n3 UInt16,
n4 UInt16,
n5 UInt8,
n6 UInt8,
n7 UInt8,
n8 UInt8
)
Engine=MergeTree()
ORDER BY n1 SETTINGS index_granularity = 8192, index_granularity_bytes = '10Mi';
insert into morton_numbers (*) values(1,2,3,4,5,6,7,8);
```
Use column names instead of constants as function arguments to `mortonEncode`.
Query:
```sql
SELECT mortonEncode(n1, n2, n3, n4, n5, n6, n7, n8) FROM morton_numbers;
```
Result:
```response
2155374165
```
**Implementation details**
Please note that you can fit only as many bits of information into the Morton code as [UInt64](../../sql-reference/data-types/int-uint.md) has. Two arguments will have a range of at most 2^32 (64/2) each, three arguments a range of at most 2^21 (64/3) each, and so on. All overflow will be clamped to zero.
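**Example**
As a quick sanity check of the bit budget above (derived from the interleaving rule, not an example taken from the reference page): with two arguments, each argument may use 32 bits, so encoding the 32-bit maximum in both positions fills all 64 bits of the result.
Query:
```sql
SELECT mortonEncode(4294967295, 4294967295); -- 0xFFFFFFFF in both arguments
```
Result:
```response
18446744073709551615
```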
## mortonDecode
Decodes a Morton encoding (ZCurve) into the corresponding unsigned integer tuple.
As with the `mortonEncode` function, this function has two modes of operation:
- Simple
- Expanded
### Simple mode
Accepts a resulting tuple size as the first argument and the code as the second argument.
**Syntax**
```sql
mortonDecode(tuple_size, code)
```
**Parameters**
- `tuple_size`: integer value no greater than 8.
- `code`: [UInt64](../../sql-reference/data-types/int-uint.md) code.
**Returned value**
- [tuple](../../sql-reference/data-types/tuple.md) of the specified size.
Type: [tuple](../../sql-reference/data-types/tuple.md) of [UInt64](../../sql-reference/data-types/int-uint.md) elements.
**Example**
Query:
```sql
SELECT mortonDecode(3, 53);
```
Result:
```response
["1","2","3"]
```
### Expanded mode
Accepts a range mask (tuple) as the first argument and the code as the second argument.
Each number in the mask configures the amount of range shrink:<br/>
1 - no shrink<br/>
2 - 2x shrink<br/>
3 - 3x shrink<br/>
...<br/>
Up to 8x shrink.<br/>
Range expansion can be beneficial when you need a similar distribution for arguments with wildly different ranges (or cardinality).
For example: 'IP Address' (0...FFFFFFFF) and 'Country code' (0...FF).
As with the encode function, this is limited to 8 numbers at most.
**Example**
Query:
```sql
SELECT mortonDecode(1, 1);
```
Result:
```response
["1"]
```
**Example**
It is also possible to shrink one argument:
Query:
```sql
SELECT mortonDecode(tuple(2), 32768);
```
Result:
```response
["128"]
```
**Example**
You can also use column names in the function.
First create the table and insert some data.
Query:
```sql
create table morton_numbers(
n1 UInt32,
n2 UInt32,
n3 UInt16,
n4 UInt16,
n5 UInt8,
n6 UInt8,
n7 UInt8,
n8 UInt8
)
Engine=MergeTree()
ORDER BY n1 SETTINGS index_granularity = 8192, index_granularity_bytes = '10Mi';
insert into morton_numbers (*) values(1,2,3,4,5,6,7,8);
```
Use column names instead of constants as function arguments to `mortonDecode`.
Query:
```sql
select untuple(mortonDecode(8, mortonEncode(n1, n2, n3, n4, n5, n6, n7, n8))) from morton_numbers;
```
Result:
```response
1 2 3 4 5 6 7 8
```
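**Example**
By symmetry with the expanded-mode encoding example above (this query is inferred from that example, not taken from the reference page), decoding `1572864` with the same range mask recovers the original arguments.
Query:
```sql
SELECT mortonDecode((1, 2), 1572864);
```
Result:
```response
["1024","16"]
```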


@ -53,6 +53,62 @@ String starting with `POLYGON`
Polygon
## readWKTPoint
The `readWKTPoint` function in ClickHouse parses a Well-Known Text (WKT) representation of a Point geometry and returns a point in the internal ClickHouse format.
### Syntax
```sql
readWKTPoint(wkt_string)
```
### Arguments
- `wkt_string`: The input WKT string representing a Point geometry.
### Returned value
The function returns a ClickHouse internal representation of the Point geometry.
### Example
```sql
SELECT readWKTPoint('POINT (1.2 3.4)');
```
```response
(1.2,3.4)
```
## readWKTRing
Parses a Well-Known Text (WKT) representation of a Polygon geometry and returns a ring (closed linestring) in the internal ClickHouse format.
### Syntax
```sql
readWKTRing(wkt_string)
```
### Arguments
- `wkt_string`: The input WKT string representing a Polygon geometry.
### Returned value
The function returns a ClickHouse internal representation of the ring (closed linestring) geometry.
### Example
```sql
SELECT readWKTRing('POLYGON ((1 1, 2 2, 3 3, 1 1))');
```
```response
[(1,1),(2,2),(3,3),(1,1)]
```
## polygonsWithinSpherical
Returns true or false depending on whether one polygon lies completely inside another polygon. Reference: https://www.boost.org/doc/libs/1_62_0/libs/geometry/doc/html/geometry/reference/algorithms/within/within_2.html
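A minimal usage sketch (the coordinates are illustrative (longitude, latitude) pairs chosen for this sketch; the first polygon does not lie inside the second, so the expected result is 0):
```sql
SELECT polygonsWithinSpherical(
    [[[(4.3613577, 50.8651821), (4.349556, 50.8535879), (4.3602419, 50.8435626), (4.3830299, 50.8428851), (4.3904543, 50.8564867), (4.3613148, 50.8651279)]]],
    [[[(4.346693, 50.858306), (4.367945, 50.852455), (4.366227, 50.840809), (4.344961, 50.833264), (4.338074, 50.848677), (4.346693, 50.858306)]]]);
```
```response
0
```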


@ -86,6 +86,7 @@
namespace ProfileEvents
{
extern const Event ScalarSubqueriesGlobalCacheHit;
extern const Event ScalarSubqueriesLocalCacheHit;
extern const Event ScalarSubqueriesCacheMiss;
}
@ -1444,7 +1445,8 @@ private:
std::unordered_map<QueryTreeNodePtr, size_t> node_to_tree_size;
/// Global scalar subquery to scalar value map
std::unordered_map<QueryTreeNodePtrWithHash, Block> scalar_subquery_to_scalar_value;
std::unordered_map<QueryTreeNodePtrWithHash, Block> scalar_subquery_to_scalar_value_local;
std::unordered_map<QueryTreeNodePtrWithHash, Block> scalar_subquery_to_scalar_value_global;
const bool only_analyze;
};
@ -1951,6 +1953,24 @@ QueryTreeNodePtr QueryAnalyzer::tryGetLambdaFromSQLUserDefinedFunctions(const st
return result_node;
}
bool subtreeHasViewSource(const IQueryTreeNode * node, const Context & context)
{
if (!node)
return false;
if (const auto * table_node = node->as<TableNode>())
{
if (table_node->getStorageID().getFullNameNotQuoted() == context.getViewSource()->getStorageID().getFullNameNotQuoted())
return true;
}
for (const auto & child : node->getChildren())
if (subtreeHasViewSource(child.get(), context))
return true;
return false;
}
/// Evaluate scalar subquery and perform constant folding if scalar subquery does not have constant value
void QueryAnalyzer::evaluateScalarSubqueryIfNeeded(QueryTreeNodePtr & node, IdentifierResolveScope & scope)
{
@ -1970,12 +1990,26 @@ void QueryAnalyzer::evaluateScalarSubqueryIfNeeded(QueryTreeNodePtr & node, Iden
node_without_alias->removeAlias();
QueryTreeNodePtrWithHash node_with_hash(node_without_alias);
auto scalar_value_it = scalar_subquery_to_scalar_value.find(node_with_hash);
auto str_hash = DB::toString(node_with_hash.hash);
if (scalar_value_it != scalar_subquery_to_scalar_value.end())
bool can_use_global_scalars = !only_analyze && !(context->getViewSource() && subtreeHasViewSource(node_without_alias.get(), *context));
auto & scalars_cache = can_use_global_scalars ? scalar_subquery_to_scalar_value_global : scalar_subquery_to_scalar_value_local;
if (scalars_cache.contains(node_with_hash))
{
if (can_use_global_scalars)
ProfileEvents::increment(ProfileEvents::ScalarSubqueriesGlobalCacheHit);
else
ProfileEvents::increment(ProfileEvents::ScalarSubqueriesLocalCacheHit);
scalar_block = scalars_cache.at(node_with_hash);
}
else if (context->hasQueryContext() && can_use_global_scalars && context->getQueryContext()->hasScalar(str_hash))
{
scalar_block = context->getQueryContext()->getScalar(str_hash);
scalar_subquery_to_scalar_value_global.emplace(node_with_hash, scalar_block);
ProfileEvents::increment(ProfileEvents::ScalarSubqueriesGlobalCacheHit);
scalar_block = scalar_value_it->second;
}
else
{
@ -2087,7 +2121,9 @@ void QueryAnalyzer::evaluateScalarSubqueryIfNeeded(QueryTreeNodePtr & node, Iden
}
}
scalar_subquery_to_scalar_value.emplace(node_with_hash, scalar_block);
scalars_cache.emplace(node_with_hash, scalar_block);
if (can_use_global_scalars && context->hasQueryContext())
context->getQueryContext()->addScalar(str_hash, scalar_block);
}
const auto & scalar_column_with_type = scalar_block.safeGetByPosition(0);


@ -21,6 +21,7 @@ namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
extern const int ALL_CONNECTION_TRIES_FAILED;
extern const int BAD_ARGUMENTS;
}
@ -191,11 +192,20 @@ std::vector<ConnectionPoolWithFailover::TryResult> ConnectionPoolWithFailover::g
max_entries = nested_pools.size();
}
else if (pool_mode == PoolMode::GET_ONE)
{
max_entries = 1;
}
else if (pool_mode == PoolMode::GET_MANY)
{
if (settings.max_parallel_replicas == 0)
throw Exception(ErrorCodes::BAD_ARGUMENTS, "The value of the setting max_parallel_replicas must be greater than 0");
max_entries = settings.max_parallel_replicas;
}
else
{
throw DB::Exception(DB::ErrorCodes::LOGICAL_ERROR, "Unknown pool allocation mode");
}
if (!priority_func)
priority_func = makeGetPriorityFunc(settings);


@ -19,6 +19,7 @@ namespace ErrorCodes
extern const int ALL_CONNECTION_TRIES_FAILED;
extern const int ALL_REPLICAS_ARE_STALE;
extern const int LOGICAL_ERROR;
extern const int BAD_ARGUMENTS;
}
HedgedConnectionsFactory::HedgedConnectionsFactory(
@ -82,7 +83,10 @@ std::vector<Connection *> HedgedConnectionsFactory::getManyConnections(PoolMode
}
case PoolMode::GET_MANY:
{
max_entries = max_parallel_replicas;
if (max_parallel_replicas == 0)
throw Exception(ErrorCodes::BAD_ARGUMENTS, "The value of the setting max_parallel_replicas must be greater than 0");
max_entries = std::min(max_parallel_replicas, shuffled_pools.size());
break;
}
}


@ -158,7 +158,7 @@ private:
/// checking the number of requested replicas that are still in process).
size_t requested_connections_count = 0;
const size_t max_parallel_replicas = 0;
const size_t max_parallel_replicas = 1;
const bool skip_unavailable_shards = 0;
};


@ -112,7 +112,6 @@ namespace DB
M(UInt64, tables_loader_background_pool_size, 0, "The maximum number of threads that will be used for background async loading of tables. Zero means use all CPUs.", 0) \
M(Bool, async_load_databases, false, "Enable asynchronous loading of databases and tables to speedup server startup. Queries to not yet loaded entity will be blocked until load is finished.", 0) \
M(Bool, display_secrets_in_show_and_select, false, "Allow showing secrets in SHOW and SELECT queries via a format setting and a grant", 0) \
\
M(Seconds, keep_alive_timeout, DEFAULT_HTTP_KEEP_ALIVE_TIMEOUT, "The number of seconds that ClickHouse waits for incoming requests before closing the connection.", 0) \
M(Seconds, replicated_fetches_http_connection_timeout, 0, "HTTP connection timeout for part fetch requests. Inherited from default profile `http_connection_timeout` if not set explicitly.", 0) \
M(Seconds, replicated_fetches_http_send_timeout, 0, "HTTP send timeout for part fetch requests. Inherited from default profile `http_send_timeout` if not set explicitly.", 0) \


@ -17,16 +17,15 @@ using namespace DB;
namespace
{
bool withFileCache(const ReadSettings & settings)
{
return settings.remote_fs_cache && settings.enable_filesystem_cache
&& (!CurrentThread::getQueryId().empty() || settings.read_from_filesystem_cache_if_exists_otherwise_bypass_cache
|| !settings.avoid_readthrough_cache_outside_query_context);
}
bool withPageCache(const ReadSettings & settings, bool with_file_cache)
{
return settings.page_cache && !with_file_cache && settings.use_page_cache_for_disks_without_file_cache;
}
bool withFileCache(const ReadSettings & settings)
{
return settings.remote_fs_cache && settings.enable_filesystem_cache;
}
bool withPageCache(const ReadSettings & settings, bool with_file_cache)
{
return settings.page_cache && !with_file_cache && settings.use_page_cache_for_disks_without_file_cache;
}
}
namespace DB


@ -43,10 +43,6 @@ ReadSettings CachedObjectStorage::patchSettings(const ReadSettings & read_settin
{
ReadSettings modified_settings{read_settings};
modified_settings.remote_fs_cache = cache;
if (!canUseReadThroughCache(read_settings))
modified_settings.read_from_filesystem_cache_if_exists_otherwise_bypass_cache = true;
return object_storage->patchSettings(modified_settings);
}
@ -206,14 +202,4 @@ String CachedObjectStorage::getObjectsNamespace() const
return object_storage->getObjectsNamespace();
}
bool CachedObjectStorage::canUseReadThroughCache(const ReadSettings & settings)
{
if (!settings.avoid_readthrough_cache_outside_query_context)
return true;
return CurrentThread::isInitialized()
&& CurrentThread::get().getQueryContext()
&& !CurrentThread::getQueryId().empty();
}
}


@ -119,8 +119,6 @@ public:
const FileCacheSettings & getCacheSettings() const { return cache_settings; }
static bool canUseReadThroughCache(const ReadSettings & settings);
#if USE_AZURE_BLOB_STORAGE
std::shared_ptr<const Azure::Storage::Blobs::BlobContainerClient> getAzureBlobStorageClient() override
{


@ -99,8 +99,7 @@ struct ReadSettings
bool enable_filesystem_cache = true;
bool read_from_filesystem_cache_if_exists_otherwise_bypass_cache = false;
bool enable_filesystem_cache_log = false;
/// Don't populate cache when the read is not part of query execution (e.g. background thread).
bool avoid_readthrough_cache_outside_query_context = true;
bool force_read_through_cache_merges = false;
size_t filesystem_cache_segments_batch_size = 20;
bool use_page_cache_for_disks_without_file_cache = false;


@ -10,6 +10,7 @@
#include <Common/logger_useful.h>
#include <Common/scope_guard_safe.h>
#include <Common/ElapsedTimeProfileEventIncrement.h>
#include <Common/setThreadName.h>
#include <magic_enum.hpp>
@ -195,7 +196,7 @@ bool FileSegment::isDownloaded() const
String FileSegment::getCallerId()
{
if (!CurrentThread::isInitialized() || CurrentThread::getQueryId().empty())
return "None:" + toString(getThreadId());
return fmt::format("None:{}:{}", getThreadName(), toString(getThreadId()));
return std::string(CurrentThread::getQueryId()) + ":" + toString(getThreadId());
}


@ -947,7 +947,7 @@ bool InterpreterSelectQuery::adjustParallelReplicasAfterAnalysis()
if (number_of_replicas_to_use <= 1)
{
context->setSetting("allow_experimental_parallel_reading_from_replicas", Field(0));
context->setSetting("max_parallel_replicas", UInt64{0});
context->setSetting("max_parallel_replicas", UInt64{1});
LOG_DEBUG(log, "Disabling parallel replicas because there aren't enough rows to read");
return true;
}


@ -295,7 +295,7 @@ bool applyTrivialCountIfPossible(
/// The query could use trivial count if it didn't use parallel replicas, so let's disable it
query_context->setSetting("allow_experimental_parallel_reading_from_replicas", Field(0));
query_context->setSetting("max_parallel_replicas", UInt64{0});
query_context->setSetting("max_parallel_replicas", UInt64{1});
LOG_TRACE(getLogger("Planner"), "Disabling parallel replicas to be able to use a trivial count optimization");
}
@ -756,7 +756,7 @@ JoinTreeQueryPlan buildQueryPlanForTableExpression(QueryTreeNodePtr table_expres
{
planner_context->getMutableQueryContext()->setSetting(
"allow_experimental_parallel_reading_from_replicas", Field(0));
planner_context->getMutableQueryContext()->setSetting("max_parallel_replicas", UInt64{0});
planner_context->getMutableQueryContext()->setSetting("max_parallel_replicas", UInt64{1});
LOG_DEBUG(getLogger("Planner"), "Disabling parallel replicas because there aren't enough rows to read");
}
else if (number_of_replicas_to_use < settings.max_parallel_replicas)


@ -149,7 +149,8 @@ MergeTreeSequentialSource::MergeTreeSequentialSource(
const auto & context = storage.getContext();
ReadSettings read_settings = context->getReadSettings();
read_settings.read_from_filesystem_cache_if_exists_otherwise_bypass_cache = true;
read_settings.read_from_filesystem_cache_if_exists_otherwise_bypass_cache = !storage.getSettings()->force_read_through_cache_for_merges;
/// It does not make sense to use pthread_threadpool for background merges/mutations
/// And also to preserve backward compatibility
read_settings.local_fs_method = LocalFSReadMethod::pread;


@ -192,6 +192,7 @@ struct Settings;
M(String, remote_fs_zero_copy_zookeeper_path, "/clickhouse/zero_copy", "ZooKeeper path for zero-copy table-independent info.", 0) \
M(Bool, remote_fs_zero_copy_path_compatible_mode, false, "Run zero-copy in compatible mode during conversion process.", 0) \
M(Bool, cache_populated_by_fetch, false, "Only available in ClickHouse Cloud", 0) \
M(Bool, force_read_through_cache_for_merges, false, "Force read-through filesystem cache for merges", 0) \
M(Bool, allow_experimental_block_number_column, false, "Enable persisting column _block_number for each row.", 0) \
M(Bool, allow_experimental_replacing_merge_with_cleanup, false, "Allow experimental CLEANUP merges for ReplacingMergeTree with is_deleted column.", 0) \
\


@ -4,14 +4,12 @@
01062_pm_all_join_with_block_continuation
01083_expressions_in_engine_arguments
01155_rename_move_materialized_view
01244_optimize_distributed_group_by_sharding_key
01584_distributed_buffer_cannot_find_column
01624_soft_constraints
01747_join_view_filter_dictionary
01761_cast_to_enum_nullable
01925_join_materialized_columns
01952_optimize_distributed_group_by_sharding_key
02174_cte_scalar_cache_mv
02354_annoy
# Check after constants refactoring
02901_parallel_replicas_rollup


@ -1756,7 +1756,8 @@ def main() -> int:
result["build"] = build_digest
result["docs"] = docs_digest
result["ci_flags"] = ci_flags
result["stages_data"] = _generate_ci_stage_config(jobs_data)
if not args.skip_jobs:
result["stages_data"] = _generate_ci_stage_config(jobs_data)
result["jobs_data"] = jobs_data
result["docker_data"] = docker_data
### CONFIGURE action: end


@ -44,11 +44,12 @@ RETRY_SLEEP = 0
class EventType:
UNKNOWN = 0
PUSH = 1
PULL_REQUEST = 2
SCHEDULE = 3
DISPATCH = 4
UNKNOWN = "unknown"
PUSH = "commits"
PULL_REQUEST = "pull_request"
SCHEDULE = "schedule"
DISPATCH = "dispatch"
MERGE_QUEUE = "merge_group"
def get_pr_for_commit(sha, ref):
@ -114,6 +115,12 @@ class PRInfo:
# release_pr and merged_pr are used for docker images additional cache
self.release_pr = 0
self.merged_pr = 0
self.labels = set()
repo_prefix = f"{GITHUB_SERVER_URL}/{GITHUB_REPOSITORY}"
self.task_url = GITHUB_RUN_URL
self.repo_full_name = GITHUB_REPOSITORY
self.event_type = EventType.UNKNOWN
ref = github_event.get("ref", "refs/heads/master")
if ref and ref.startswith("refs/heads/"):
@ -154,10 +161,6 @@ class PRInfo:
else:
self.sha = github_event["pull_request"]["head"]["sha"]
repo_prefix = f"{GITHUB_SERVER_URL}/{GITHUB_REPOSITORY}"
self.task_url = GITHUB_RUN_URL
self.repo_full_name = GITHUB_REPOSITORY
self.commit_html_url = f"{repo_prefix}/commits/{self.sha}"
self.pr_html_url = f"{repo_prefix}/pull/{self.number}"
@ -176,7 +179,7 @@ class PRInfo:
self.body = github_event["pull_request"]["body"]
self.labels = {
label["name"] for label in github_event["pull_request"]["labels"]
} # type: Set[str]
}
self.user_login = github_event["pull_request"]["user"]["login"] # type: str
self.user_orgs = set() # type: Set[str]
@ -191,6 +194,28 @@ class PRInfo:
self.diff_urls.append(self.compare_pr_url(github_event["pull_request"]))
elif (
EventType.MERGE_QUEUE in github_event
): # pull request and other similar events
self.event_type = EventType.MERGE_QUEUE
# FIXME: need pr? we can parse it from ["head_ref": "refs/heads/gh-readonly-queue/test-merge-queue/pr-6751-4690229995a155e771c52e95fbd446d219c069bf"]
self.number = 0
self.sha = github_event[EventType.MERGE_QUEUE]["head_sha"]
self.base_ref = github_event[EventType.MERGE_QUEUE]["base_ref"]
base_sha = github_event[EventType.MERGE_QUEUE]["base_sha"] # type: str
# ClickHouse/ClickHouse
self.base_name = github_event["repository"]["full_name"]
# any_branch-name - the name of the working branch
self.head_ref = github_event[EventType.MERGE_QUEUE]["head_ref"]
# UserName/ClickHouse or ClickHouse/ClickHouse
self.head_name = self.base_name
self.user_login = github_event["sender"]["login"]
self.diff_urls.append(
github_event["repository"]["compare_url"]
.replace("{base}", base_sha)
.replace("{head}", self.sha)
)
elif "commits" in github_event:
self.event_type = EventType.PUSH
# `head_commit` always comes with `commits`
@ -203,10 +228,8 @@ class PRInfo:
logging.error("Failed to convert %s to integer", merged_pr)
self.sha = github_event["after"]
pull_request = get_pr_for_commit(self.sha, github_event["ref"])
repo_prefix = f"{GITHUB_SERVER_URL}/{GITHUB_REPOSITORY}"
self.task_url = GITHUB_RUN_URL
self.commit_html_url = f"{repo_prefix}/commits/{self.sha}"
self.repo_full_name = GITHUB_REPOSITORY
if pull_request is None or pull_request["state"] == "closed":
# it's merged PR to master
self.number = 0
@ -272,11 +295,7 @@ class PRInfo:
"GITHUB_SHA", "0000000000000000000000000000000000000000"
)
self.number = 0
self.labels = set()
repo_prefix = f"{GITHUB_SERVER_URL}/{GITHUB_REPOSITORY}"
self.task_url = GITHUB_RUN_URL
self.commit_html_url = f"{repo_prefix}/commits/{self.sha}"
self.repo_full_name = GITHUB_REPOSITORY
self.pr_html_url = f"{repo_prefix}/commits/{ref}"
self.base_ref = ref
self.base_name = self.repo_full_name
@ -300,6 +319,9 @@ class PRInfo:
def is_scheduled(self):
return self.event_type == EventType.SCHEDULE
def is_merge_queue(self):
return self.event_type == EventType.MERGE_QUEUE
def is_dispatched(self):
return self.event_type == EventType.DISPATCH


@ -288,6 +288,9 @@ class JobReport:
# if False no GH commit status will be created by CI
need_commit_status: bool = True
def __post_init__(self):
assert self.status in (SUCCESS, ERROR, FAILURE, PENDING)
@classmethod
def exist(cls) -> bool:
return JOB_REPORT_FILE.is_file()


@ -1,16 +1,17 @@
#!/usr/bin/env python3
import argparse
from concurrent.futures import ProcessPoolExecutor
import csv
import logging
import os
import shutil
import subprocess
import sys
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path
from typing import List, Tuple
from docker_images_helper import get_docker_image, pull_image
from env_helper import REPO_COPY, TEMP_PATH
from env_helper import CI, REPO_COPY, TEMP_PATH
from git_helper import GIT_PREFIX, git_runner
from pr_info import PRInfo
from report import ERROR, FAILURE, SUCCESS, JobReport, TestResults, read_test_results
@ -120,43 +121,70 @@ def checkout_last_ref(pr_info: PRInfo) -> None:
def main():
logging.basicConfig(level=logging.INFO)
logging.getLogger("git_helper").setLevel(logging.DEBUG)
# args = parse_args()
args = parse_args()
stopwatch = Stopwatch()
repo_path = Path(REPO_COPY)
temp_path = Path(TEMP_PATH)
if temp_path.is_dir():
shutil.rmtree(temp_path)
temp_path.mkdir(parents=True, exist_ok=True)
# pr_info = PRInfo()
pr_info = PRInfo()
IMAGE_NAME = "clickhouse/style-test"
image = pull_image(get_docker_image(IMAGE_NAME))
cmd_1 = (
cmd_cpp = (
f"docker run -u $(id -u ${{USER}}):$(id -g ${{USER}}) --cap-add=SYS_PTRACE "
f"--volume={repo_path}:/ClickHouse --volume={temp_path}:/test_output "
f"--entrypoint= -w/ClickHouse/utils/check-style "
f"{image} ./check_cpp_docs.sh"
f"{image} ./check_cpp.sh"
)
cmd_2 = (
cmd_py = (
f"docker run -u $(id -u ${{USER}}):$(id -g ${{USER}}) --cap-add=SYS_PTRACE "
f"--volume={repo_path}:/ClickHouse --volume={temp_path}:/test_output "
f"--entrypoint= -w/ClickHouse/utils/check-style "
f"{image} ./check_py.sh"
)
logging.info("Is going to run the command: %s", cmd_1)
logging.info("Is going to run the command: %s", cmd_2)
cmd_docs = (
f"docker run -u $(id -u ${{USER}}):$(id -g ${{USER}}) --cap-add=SYS_PTRACE "
f"--volume={repo_path}:/ClickHouse --volume={temp_path}:/test_output "
f"--entrypoint= -w/ClickHouse/utils/check-style "
f"{image} ./check_docs.sh"
)
with ProcessPoolExecutor(max_workers=2) as executor:
# Submit commands for execution in parallel
future1 = executor.submit(subprocess.run, cmd_1, shell=True)
future2 = executor.submit(subprocess.run, cmd_2, shell=True)
# Wait for both commands to complete
_ = future1.result()
_ = future2.result()
logging.info("Run docs files check: %s", cmd_docs)
future = executor.submit(subprocess.run, cmd_docs, shell=True)
# Parallelization does not make it faster - run sequentially
_ = future.result()
# if args.push:
# checkout_head(pr_info)
run_cppcheck = True
run_pycheck = True
if CI and pr_info.number > 0:
pr_info.fetch_changed_files()
if not any(file.endswith(".py") for file in pr_info.changed_files):
run_pycheck = False
if all(file.endswith(".py") for file in pr_info.changed_files):
run_cppcheck = False
if run_cppcheck:
logging.info("Run source files check: %s", cmd_cpp)
future1 = executor.submit(subprocess.run, cmd_cpp, shell=True)
_ = future1.result()
if run_pycheck:
if args.push:
checkout_head(pr_info)
logging.info("Run py files check: %s", cmd_py)
future2 = executor.submit(subprocess.run, cmd_py, shell=True)
_ = future2.result()
if args.push:
commit_push_staged(pr_info)
checkout_last_ref(pr_info)
subprocess.check_call(
f"python3 ../../utils/check-style/process_style_check_result.py --in-results-dir {temp_path} "
@ -165,10 +193,6 @@ def main():
shell=True,
)
# if args.push:
# commit_push_staged(pr_info)
# checkout_last_ref(pr_info)
state, description, test_results, additional_files = process_result(temp_path)
JobReport(


@ -0,0 +1,5 @@
<clickhouse>
<merge_tree>
<force_read_through_cache_for_merges>1</force_read_through_cache_for_merges>
</merge_tree>
</clickhouse>


@ -19,6 +19,9 @@ def cluster():
main_configs=[
"config.d/storage_conf.xml",
],
user_configs=[
"users.d/cache_on_write_operations.xml",
],
stay_alive=True,
)
cluster.add_instance(
@ -35,6 +38,17 @@ def cluster():
],
stay_alive=True,
)
cluster.add_instance(
"node_force_read_through_cache_on_merge",
main_configs=[
"config.d/storage_conf.xml",
"config.d/force_read_through_cache_for_merges.xml",
],
user_configs=[
"users.d/cache_on_write_operations.xml",
],
stay_alive=True,
)
logging.info("Starting cluster...")
cluster.start()
@ -80,12 +94,21 @@ def test_parallel_cache_loading_on_startup(cluster, node_name):
cache_state = node.query(
"SELECT key, file_segment_range_begin, size FROM system.filesystem_cache WHERE size > 0 ORDER BY key, file_segment_range_begin, size"
)
keys = (
node.query(
"SELECT distinct(key) FROM system.filesystem_cache WHERE size > 0 ORDER BY key, file_segment_range_begin, size"
)
.strip()
.splitlines()
)
node.restart_clickhouse()
assert cache_count == int(node.query("SELECT count() FROM system.filesystem_cache"))
# < because of additional files loaded into cache on server startup.
assert cache_count <= int(node.query("SELECT count() FROM system.filesystem_cache"))
keys_set = ",".join(["'" + x + "'" for x in keys])
assert cache_state == node.query(
"SELECT key, file_segment_range_begin, size FROM system.filesystem_cache ORDER BY key, file_segment_range_begin, size"
f"SELECT key, file_segment_range_begin, size FROM system.filesystem_cache WHERE key in ({keys_set}) ORDER BY key, file_segment_range_begin, size"
)
assert node.contains_in_log("Loading filesystem cache with 30 threads")
@ -323,3 +346,83 @@ def test_custom_cached_disk(cluster):
"SELECT cache_path FROM system.disks WHERE name = 'custom_cached4'"
).strip()
)
def test_force_filesystem_cache_on_merges(cluster):
def test(node, forced_read_through_cache_on_merge):
def to_int(value):
if value == "":
return 0
else:
return int(value)
r_cache_count = to_int(
node.query(
"SELECT value FROM system.events WHERE name = 'CachedReadBufferCacheWriteBytes'"
)
)
w_cache_count = to_int(
node.query(
"SELECT value FROM system.events WHERE name = 'CachedWriteBufferCacheWriteBytes'"
)
)
node.query(
"""
DROP TABLE IF EXISTS test SYNC;
CREATE TABLE test (key UInt32, value String)
Engine=MergeTree()
ORDER BY value
SETTINGS disk = disk(
type = cache,
path = 'force_cache_on_merges',
disk = 'hdd_blob',
max_file_segment_size = '1Ki',
cache_on_write_operations = 1,
boundary_alignment = '1Ki',
max_size = '10Gi',
max_elements = 10000000,
load_metadata_threads = 30);
SYSTEM DROP FILESYSTEM CACHE;
INSERT INTO test SELECT * FROM generateRandom('a Int32, b String') LIMIT 1000000;
INSERT INTO test SELECT * FROM generateRandom('a Int32, b String') LIMIT 1000000;
"""
)
assert int(node.query("SELECT count() FROM system.filesystem_cache")) > 0
assert int(node.query("SELECT max(size) FROM system.filesystem_cache")) == 1024
w_cache_count_2 = int(
node.query(
"SELECT value FROM system.events WHERE name = 'CachedWriteBufferCacheWriteBytes'"
)
)
assert w_cache_count_2 > w_cache_count
r_cache_count_2 = to_int(
node.query(
"SELECT value FROM system.events WHERE name = 'CachedReadBufferCacheWriteBytes'"
)
)
assert r_cache_count_2 == r_cache_count
node.query("SYSTEM DROP FILESYSTEM CACHE")
node.query("OPTIMIZE TABLE test FINAL")
r_cache_count_3 = to_int(
node.query(
"SELECT value FROM system.events WHERE name = 'CachedReadBufferCacheWriteBytes'"
)
)
if forced_read_through_cache_on_merge:
assert r_cache_count_3 > r_cache_count
else:
assert r_cache_count_3 == r_cache_count
node = cluster.instances["node_force_read_through_cache_on_merge"]
test(node, True)
node = cluster.instances["node"]
test(node, False)


@ -0,0 +1,7 @@
<clickhouse>
<profiles>
<default>
<enable_filesystem_cache_on_write_operations>1</enable_filesystem_cache_on_write_operations>
</default>
</profiles>
</clickhouse>


@ -42,7 +42,7 @@ select 'GROUP BY number ORDER BY number DESC';
select count(), * from dist_01247 group by number order by number desc;
select 'GROUP BY toString(number)';
select count(), * from dist_01247 group by toString(number);
select count(), any(number) from dist_01247 group by toString(number);
select 'GROUP BY number%2';
select count(), any(number) from dist_01247 group by number%2;


@ -19,6 +19,48 @@
94 94 94 94 5
99 99 99 99 5
02177_MV 7 80 22
4 4 4 4 5
9 9 9 9 5
14 14 14 14 5
19 19 19 19 5
24 24 24 24 5
29 29 29 29 5
34 34 34 34 5
39 39 39 39 5
44 44 44 44 5
49 49 49 49 5
54 54 54 54 5
59 59 59 59 5
64 64 64 64 5
69 69 69 69 5
74 74 74 74 5
79 79 79 79 5
84 84 84 84 5
89 89 89 89 5
94 94 94 94 5
99 99 99 99 5
02177_MV 0 0 22
10
40
70
100
130
160
190
220
250
280
310
340
370
400
430
460
490
520
550
580
02177_MV_2 0 0 21
10
40
70
@ -61,3 +103,24 @@
188
198
02177_MV_3 20 0 1
8
18
28
38
48
58
68
78
88
98
108
118
128
138
148
158
168
178
188
198
02177_MV_3 19 0 2


@ -14,6 +14,8 @@ CREATE MATERIALIZED VIEW mv1 TO t2 AS
FROM t1
LIMIT 5;
set allow_experimental_analyzer = 0;
-- FIRST INSERT
INSERT INTO t1
WITH
@ -58,8 +60,48 @@ WHERE
AND query LIKE '-- FIRST INSERT\nINSERT INTO t1\n%'
AND event_date >= yesterday() AND event_time > now() - interval 10 minute;
truncate table t2;
set allow_experimental_analyzer = 1;
-- FIRST INSERT ANALYZER
INSERT INTO t1
WITH
(SELECT max(i) FROM t1) AS t1
SELECT
number as i,
t1 + t1 + t1 AS j -- Using global cache
FROM system.numbers
LIMIT 100
SETTINGS
min_insert_block_size_rows=5,
max_insert_block_size=5,
min_insert_block_size_rows_for_materialized_views=5,
max_block_size=5,
max_threads=1;
SELECT k, l, m, n, count()
FROM t2
GROUP BY k, l, m, n
ORDER BY k, l, m, n;
SYSTEM FLUSH LOGS;
SELECT
'02177_MV',
ProfileEvents['ScalarSubqueriesGlobalCacheHit'] as scalar_cache_global_hit,
ProfileEvents['ScalarSubqueriesLocalCacheHit'] as scalar_cache_local_hit,
ProfileEvents['ScalarSubqueriesCacheMiss'] as scalar_cache_miss
FROM system.query_log
WHERE
current_database = currentDatabase()
AND type = 'QueryFinish'
AND query LIKE '-- FIRST INSERT ANALYZER\nINSERT INTO t1\n%'
AND event_date >= yesterday() AND event_time > now() - interval 10 minute;
DROP TABLE mv1;
set allow_experimental_analyzer = 0;
CREATE TABLE t3 (z Int64) ENGINE = Memory;
CREATE MATERIALIZED VIEW mv2 TO t3 AS
SELECT
@ -91,8 +133,36 @@ WHERE
AND query LIKE '-- SECOND INSERT\nINSERT INTO t1%'
AND event_date >= yesterday() AND event_time > now() - interval 10 minute;
truncate table t3;
set allow_experimental_analyzer = 1;
-- SECOND INSERT ANALYZER
INSERT INTO t1
SELECT 0 as i, number as j from numbers(100)
SETTINGS
min_insert_block_size_rows=5,
max_insert_block_size=5,
min_insert_block_size_rows_for_materialized_views=5,
max_block_size=5,
max_threads=1;
SELECT * FROM t3 ORDER BY z ASC;
SYSTEM FLUSH LOGS;
SELECT
'02177_MV_2',
ProfileEvents['ScalarSubqueriesGlobalCacheHit'] as scalar_cache_global_hit,
ProfileEvents['ScalarSubqueriesLocalCacheHit'] as scalar_cache_local_hit,
ProfileEvents['ScalarSubqueriesCacheMiss'] as scalar_cache_miss
FROM system.query_log
WHERE
current_database = currentDatabase()
AND type = 'QueryFinish'
AND query LIKE '-- SECOND INSERT ANALYZER\nINSERT INTO t1%'
AND event_date >= yesterday() AND event_time > now() - interval 10 minute;
DROP TABLE mv2;
set allow_experimental_analyzer = 0;
CREATE TABLE t4 (z Int64) ENGINE = Memory;
CREATE MATERIALIZED VIEW mv3 TO t4 AS
@ -126,6 +196,35 @@ WHERE
AND query LIKE '-- THIRD INSERT\nINSERT INTO t1%'
AND event_date >= yesterday() AND event_time > now() - interval 10 minute;
truncate table t4;
set allow_experimental_analyzer = 1;
-- THIRD INSERT ANALYZER
INSERT INTO t1
SELECT number as i, number as j from numbers(100)
SETTINGS
min_insert_block_size_rows=5,
max_insert_block_size=5,
min_insert_block_size_rows_for_materialized_views=5,
max_block_size=5,
max_threads=1;
SYSTEM FLUSH LOGS;
SELECT * FROM t4 ORDER BY z ASC;
SELECT
'02177_MV_3',
ProfileEvents['ScalarSubqueriesGlobalCacheHit'] as scalar_cache_global_hit,
ProfileEvents['ScalarSubqueriesLocalCacheHit'] as scalar_cache_local_hit,
ProfileEvents['ScalarSubqueriesCacheMiss'] as scalar_cache_miss
FROM system.query_log
WHERE
current_database = currentDatabase()
AND type = 'QueryFinish'
AND query LIKE '-- THIRD INSERT ANALYZER\nINSERT INTO t1%'
AND event_date >= yesterday() AND event_time > now() - interval 10 minute;
DROP TABLE mv3;
DROP TABLE t1;
DROP TABLE t2;


@ -1,62 +1,220 @@
Using storage policy: s3_cache
DROP TABLE IF EXISTS test_02241
CREATE TABLE test_02241 (key UInt32, value String) Engine=MergeTree() ORDER BY key SETTINGS storage_policy='s3_cache', min_bytes_for_wide_part = 10485760, compress_marks=false, compress_primary_key=false, ratio_of_defaults_for_sparse_serialization = 1
SYSTEM STOP MERGES test_02241
SYSTEM DROP FILESYSTEM CACHE
SELECT file_segment_range_begin, file_segment_range_end, size, state
FROM
(
SELECT file_segment_range_begin, file_segment_range_end, size, state, local_path
FROM
(
SELECT arrayJoin(cache_paths) AS cache_path, local_path, remote_path
FROM system.remote_data_paths
) AS data_paths
INNER JOIN
system.filesystem_cache AS caches
ON data_paths.cache_path = caches.cache_path
)
WHERE endsWith(local_path, 'data.bin')
FORMAT Vertical
SELECT count() FROM (SELECT arrayJoin(cache_paths) AS cache_path, local_path, remote_path FROM system.remote_data_paths ) AS data_paths INNER JOIN system.filesystem_cache AS caches ON data_paths.cache_path = caches.cache_path
0
SELECT count(), sum(size) FROM system.filesystem_cache
0 0
INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(100)
SELECT file_segment_range_begin, file_segment_range_end, size, state
FROM
(
SELECT file_segment_range_begin, file_segment_range_end, size, state, local_path
FROM
(
SELECT arrayJoin(cache_paths) AS cache_path, local_path, remote_path
FROM system.remote_data_paths
) AS data_paths
INNER JOIN
system.filesystem_cache AS caches
ON data_paths.cache_path = caches.cache_path
)
WHERE endsWith(local_path, 'data.bin')
FORMAT Vertical
Row 1:
──────
file_segment_range_begin: 0
file_segment_range_end: 745
size: 746
state: DOWNLOADED
SELECT count() FROM (SELECT arrayJoin(cache_paths) AS cache_path, local_path, remote_path FROM system.remote_data_paths ) AS data_paths INNER JOIN system.filesystem_cache AS caches ON data_paths.cache_path = caches.cache_path
8
SELECT count(), sum(size) FROM system.filesystem_cache
8 1100
SELECT count() FROM system.filesystem_cache WHERE cache_hits > 0
0
SELECT * FROM test_02241 FORMAT Null
SELECT count() FROM system.filesystem_cache WHERE cache_hits > 0
2
SELECT * FROM test_02241 FORMAT Null
SELECT count() FROM system.filesystem_cache WHERE cache_hits > 0
2
SELECT count(), sum(size) size FROM system.filesystem_cache
8 1100
SYSTEM DROP FILESYSTEM CACHE
INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(100, 200)
SELECT file_segment_range_begin, file_segment_range_end, size, state
FROM
(
SELECT file_segment_range_begin, file_segment_range_end, size, state, local_path
FROM
(
SELECT arrayJoin(cache_paths) AS cache_path, local_path, remote_path
FROM system.remote_data_paths
) AS data_paths
INNER JOIN
system.filesystem_cache AS caches
ON data_paths.cache_path = caches.cache_path
)
WHERE endsWith(local_path, 'data.bin')
FORMAT Vertical;
Row 1:
──────
file_segment_range_begin: 0
file_segment_range_end: 1659
size: 1660
state: DOWNLOADED
SELECT count() FROM (SELECT arrayJoin(cache_paths) AS cache_path, local_path, remote_path FROM system.remote_data_paths ) AS data_paths INNER JOIN system.filesystem_cache AS caches ON data_paths.cache_path = caches.cache_path
8
SELECT count(), sum(size) FROM system.filesystem_cache
8 2014
SELECT count(), sum(size) FROM system.filesystem_cache
8 2014
INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(100) SETTINGS enable_filesystem_cache_on_write_operations=0
SELECT count(), sum(size) FROM system.filesystem_cache
8 2014
INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(100)
INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(300, 10000)
SELECT count(), sum(size) FROM system.filesystem_cache
24 84045
SYSTEM START MERGES test_02241
OPTIMIZE TABLE test_02241 FINAL
SELECT count(), sum(size) FROM system.filesystem_cache
32 167243
ALTER TABLE test_02241 UPDATE value = 'kek' WHERE key = 100
SELECT count(), sum(size) FROM system.filesystem_cache
41 250541
INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(5000000)
SYSTEM FLUSH LOGS
INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(5000000) 0
SELECT count() FROM test_02241
5010500
SELECT count() FROM test_02241 WHERE value LIKE '%010%'
18816
Using storage policy: local_cache
DROP TABLE IF EXISTS test_02241
CREATE TABLE test_02241 (key UInt32, value String) Engine=MergeTree() ORDER BY key SETTINGS storage_policy='local_cache', min_bytes_for_wide_part = 10485760, compress_marks=false, compress_primary_key=false, ratio_of_defaults_for_sparse_serialization = 1
SYSTEM STOP MERGES test_02241
SYSTEM DROP FILESYSTEM CACHE
SELECT file_segment_range_begin, file_segment_range_end, size, state
FROM
(
SELECT file_segment_range_begin, file_segment_range_end, size, state, local_path
FROM
(
SELECT arrayJoin(cache_paths) AS cache_path, local_path, remote_path
FROM system.remote_data_paths
) AS data_paths
INNER JOIN
system.filesystem_cache AS caches
ON data_paths.cache_path = caches.cache_path
)
WHERE endsWith(local_path, 'data.bin')
FORMAT Vertical
SELECT count() FROM (SELECT arrayJoin(cache_paths) AS cache_path, local_path, remote_path FROM system.remote_data_paths ) AS data_paths INNER JOIN system.filesystem_cache AS caches ON data_paths.cache_path = caches.cache_path
0
SELECT count(), sum(size) FROM system.filesystem_cache
0 0
INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(100)
SELECT file_segment_range_begin, file_segment_range_end, size, state
FROM
(
SELECT file_segment_range_begin, file_segment_range_end, size, state, local_path
FROM
(
SELECT arrayJoin(cache_paths) AS cache_path, local_path, remote_path
FROM system.remote_data_paths
) AS data_paths
INNER JOIN
system.filesystem_cache AS caches
ON data_paths.cache_path = caches.cache_path
)
WHERE endsWith(local_path, 'data.bin')
FORMAT Vertical
Row 1:
──────
file_segment_range_begin: 0
file_segment_range_end: 745
size: 746
state: DOWNLOADED
SELECT count() FROM (SELECT arrayJoin(cache_paths) AS cache_path, local_path, remote_path FROM system.remote_data_paths ) AS data_paths INNER JOIN system.filesystem_cache AS caches ON data_paths.cache_path = caches.cache_path
8
SELECT count(), sum(size) FROM system.filesystem_cache
8 1100
SELECT count() FROM system.filesystem_cache WHERE cache_hits > 0
0
SELECT * FROM test_02241 FORMAT Null
SELECT count() FROM system.filesystem_cache WHERE cache_hits > 0
2
SELECT * FROM test_02241 FORMAT Null
SELECT count() FROM system.filesystem_cache WHERE cache_hits > 0
2
SELECT count(), sum(size) size FROM system.filesystem_cache
8 1100
SYSTEM DROP FILESYSTEM CACHE
INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(100, 200)
SELECT file_segment_range_begin, file_segment_range_end, size, state
FROM
(
SELECT file_segment_range_begin, file_segment_range_end, size, state, local_path
FROM
(
SELECT arrayJoin(cache_paths) AS cache_path, local_path, remote_path
FROM system.remote_data_paths
) AS data_paths
INNER JOIN
system.filesystem_cache AS caches
ON data_paths.cache_path = caches.cache_path
)
WHERE endsWith(local_path, 'data.bin')
FORMAT Vertical;
Row 1:
──────
file_segment_range_begin: 0
file_segment_range_end: 1659
size: 1660
state: DOWNLOADED
SELECT count() FROM (SELECT arrayJoin(cache_paths) AS cache_path, local_path, remote_path FROM system.remote_data_paths ) AS data_paths INNER JOIN system.filesystem_cache AS caches ON data_paths.cache_path = caches.cache_path
8
SELECT count(), sum(size) FROM system.filesystem_cache
8 2014
SELECT count(), sum(size) FROM system.filesystem_cache
8 2014
INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(100) SETTINGS enable_filesystem_cache_on_write_operations=0
SELECT count(), sum(size) FROM system.filesystem_cache
8 2014
INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(100)
INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(300, 10000)
SELECT count(), sum(size) FROM system.filesystem_cache
24 84045
SYSTEM START MERGES test_02241
OPTIMIZE TABLE test_02241 FINAL
SELECT count(), sum(size) FROM system.filesystem_cache
32 167243
ALTER TABLE test_02241 UPDATE value = 'kek' WHERE key = 100
SELECT count(), sum(size) FROM system.filesystem_cache
41 250541
INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(5000000)
SYSTEM FLUSH LOGS
INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(5000000) 0
SELECT count() FROM test_02241
5010500
SELECT count() FROM test_02241 WHERE value LIKE '%010%'
18816


@ -10,13 +10,13 @@ CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
for STORAGE_POLICY in 's3_cache' 'local_cache'; do
echo "Using storage policy: $STORAGE_POLICY"
$CLICKHOUSE_CLIENT --query "DROP TABLE IF EXISTS test_02241"
$CLICKHOUSE_CLIENT --query "CREATE TABLE test_02241 (key UInt32, value String) Engine=MergeTree() ORDER BY key SETTINGS storage_policy='$STORAGE_POLICY', min_bytes_for_wide_part = 10485760, compress_marks=false, compress_primary_key=false, ratio_of_defaults_for_sparse_serialization = 1"
$CLICKHOUSE_CLIENT --query "SYSTEM STOP MERGES test_02241"
$CLICKHOUSE_CLIENT --echo --query "DROP TABLE IF EXISTS test_02241"
$CLICKHOUSE_CLIENT --echo --query "CREATE TABLE test_02241 (key UInt32, value String) Engine=MergeTree() ORDER BY key SETTINGS storage_policy='$STORAGE_POLICY', min_bytes_for_wide_part = 10485760, compress_marks=false, compress_primary_key=false, ratio_of_defaults_for_sparse_serialization = 1"
$CLICKHOUSE_CLIENT --echo --query "SYSTEM STOP MERGES test_02241"
$CLICKHOUSE_CLIENT --query "SYSTEM DROP FILESYSTEM CACHE"
$CLICKHOUSE_CLIENT --echo --query "SYSTEM DROP FILESYSTEM CACHE"
$CLICKHOUSE_CLIENT -n --query "SELECT file_segment_range_begin, file_segment_range_end, size, state
$CLICKHOUSE_CLIENT --echo -n --query "SELECT file_segment_range_begin, file_segment_range_end, size, state
FROM
(
SELECT file_segment_range_begin, file_segment_range_end, size, state, local_path
@ -32,12 +32,12 @@ for STORAGE_POLICY in 's3_cache' 'local_cache'; do
WHERE endsWith(local_path, 'data.bin')
FORMAT Vertical"
$CLICKHOUSE_CLIENT --query "SELECT count() FROM (SELECT arrayJoin(cache_paths) AS cache_path, local_path, remote_path FROM system.remote_data_paths ) AS data_paths INNER JOIN system.filesystem_cache AS caches ON data_paths.cache_path = caches.cache_path"
$CLICKHOUSE_CLIENT --query "SELECT count(), sum(size) FROM system.filesystem_cache"
$CLICKHOUSE_CLIENT --echo --query "SELECT count() FROM (SELECT arrayJoin(cache_paths) AS cache_path, local_path, remote_path FROM system.remote_data_paths ) AS data_paths INNER JOIN system.filesystem_cache AS caches ON data_paths.cache_path = caches.cache_path"
$CLICKHOUSE_CLIENT --echo --query "SELECT count(), sum(size) FROM system.filesystem_cache"
$CLICKHOUSE_CLIENT --enable_filesystem_cache_on_write_operations=1 --query "INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(100)"
$CLICKHOUSE_CLIENT --echo --enable_filesystem_cache_on_write_operations=1 --query "INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(100)"
$CLICKHOUSE_CLIENT -n --query "SELECT file_segment_range_begin, file_segment_range_end, size, state
$CLICKHOUSE_CLIENT --echo -n --query "SELECT file_segment_range_begin, file_segment_range_end, size, state
FROM
(
SELECT file_segment_range_begin, file_segment_range_end, size, state, local_path
@ -53,24 +53,24 @@ for STORAGE_POLICY in 's3_cache' 'local_cache'; do
WHERE endsWith(local_path, 'data.bin')
FORMAT Vertical"
$CLICKHOUSE_CLIENT --query "SELECT count() FROM (SELECT arrayJoin(cache_paths) AS cache_path, local_path, remote_path FROM system.remote_data_paths ) AS data_paths INNER JOIN system.filesystem_cache AS caches ON data_paths.cache_path = caches.cache_path"
$CLICKHOUSE_CLIENT --query "SELECT count(), sum(size) FROM system.filesystem_cache"
$CLICKHOUSE_CLIENT --echo --query "SELECT count() FROM (SELECT arrayJoin(cache_paths) AS cache_path, local_path, remote_path FROM system.remote_data_paths ) AS data_paths INNER JOIN system.filesystem_cache AS caches ON data_paths.cache_path = caches.cache_path"
$CLICKHOUSE_CLIENT --echo --query "SELECT count(), sum(size) FROM system.filesystem_cache"
$CLICKHOUSE_CLIENT --query "SELECT count() FROM system.filesystem_cache WHERE cache_hits > 0"
$CLICKHOUSE_CLIENT --echo --query "SELECT count() FROM system.filesystem_cache WHERE cache_hits > 0"
$CLICKHOUSE_CLIENT --query "SELECT * FROM test_02241 FORMAT Null"
$CLICKHOUSE_CLIENT --query "SELECT count() FROM system.filesystem_cache WHERE cache_hits > 0"
$CLICKHOUSE_CLIENT --echo --query "SELECT * FROM test_02241 FORMAT Null"
$CLICKHOUSE_CLIENT --echo --query "SELECT count() FROM system.filesystem_cache WHERE cache_hits > 0"
$CLICKHOUSE_CLIENT --query "SELECT * FROM test_02241 FORMAT Null"
$CLICKHOUSE_CLIENT --query "SELECT count() FROM system.filesystem_cache WHERE cache_hits > 0"
$CLICKHOUSE_CLIENT --echo --query "SELECT * FROM test_02241 FORMAT Null"
$CLICKHOUSE_CLIENT --echo --query "SELECT count() FROM system.filesystem_cache WHERE cache_hits > 0"
$CLICKHOUSE_CLIENT --query "SELECT count(), sum(size) size FROM system.filesystem_cache"
$CLICKHOUSE_CLIENT --echo --query "SELECT count(), sum(size) size FROM system.filesystem_cache"
$CLICKHOUSE_CLIENT --query "SYSTEM DROP FILESYSTEM CACHE"
$CLICKHOUSE_CLIENT --echo --query "SYSTEM DROP FILESYSTEM CACHE"
$CLICKHOUSE_CLIENT --enable_filesystem_cache_on_write_operations=1 --query "INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(100, 200)"
$CLICKHOUSE_CLIENT --echo --enable_filesystem_cache_on_write_operations=1 --query "INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(100, 200)"
$CLICKHOUSE_CLIENT -n --query "SELECT file_segment_range_begin, file_segment_range_end, size, state
$CLICKHOUSE_CLIENT --echo -n --query "SELECT file_segment_range_begin, file_segment_range_end, size, state
FROM
(
SELECT file_segment_range_begin, file_segment_range_end, size, state, local_path
@ -86,27 +86,28 @@ for STORAGE_POLICY in 's3_cache' 'local_cache'; do
WHERE endsWith(local_path, 'data.bin')
FORMAT Vertical;"
$CLICKHOUSE_CLIENT --query "SELECT count() FROM (SELECT arrayJoin(cache_paths) AS cache_path, local_path, remote_path FROM system.remote_data_paths ) AS data_paths INNER JOIN system.filesystem_cache AS caches ON data_paths.cache_path = caches.cache_path"
$CLICKHOUSE_CLIENT --query "SELECT count(), sum(size) FROM system.filesystem_cache"
$CLICKHOUSE_CLIENT --echo --query "SELECT count() FROM (SELECT arrayJoin(cache_paths) AS cache_path, local_path, remote_path FROM system.remote_data_paths ) AS data_paths INNER JOIN system.filesystem_cache AS caches ON data_paths.cache_path = caches.cache_path"
$CLICKHOUSE_CLIENT --echo --query "SELECT count(), sum(size) FROM system.filesystem_cache"
$CLICKHOUSE_CLIENT --query "SELECT count(), sum(size) FROM system.filesystem_cache"
$CLICKHOUSE_CLIENT --enable_filesystem_cache_on_write_operations=1 --query "INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(100) SETTINGS enable_filesystem_cache_on_write_operations=0"
$CLICKHOUSE_CLIENT --query "SELECT count(), sum(size) FROM system.filesystem_cache"
$CLICKHOUSE_CLIENT --echo --query "SELECT count(), sum(size) FROM system.filesystem_cache"
$CLICKHOUSE_CLIENT --echo --enable_filesystem_cache_on_write_operations=1 --query "INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(100) SETTINGS enable_filesystem_cache_on_write_operations=0"
$CLICKHOUSE_CLIENT --echo --query "SELECT count(), sum(size) FROM system.filesystem_cache"
$CLICKHOUSE_CLIENT --enable_filesystem_cache_on_write_operations=1 --query "INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(100)"
$CLICKHOUSE_CLIENT --enable_filesystem_cache_on_write_operations=1 --query "INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(300, 10000)"
$CLICKHOUSE_CLIENT --query "SELECT count(), sum(size) FROM system.filesystem_cache"
$CLICKHOUSE_CLIENT --echo --enable_filesystem_cache_on_write_operations=1 --query "INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(100)"
$CLICKHOUSE_CLIENT --echo --enable_filesystem_cache_on_write_operations=1 --query "INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(300, 10000)"
$CLICKHOUSE_CLIENT --echo --query "SELECT count(), sum(size) FROM system.filesystem_cache"
$CLICKHOUSE_CLIENT --query "SYSTEM START MERGES test_02241"
$CLICKHOUSE_CLIENT --echo --query "SYSTEM START MERGES test_02241"
$CLICKHOUSE_CLIENT --enable_filesystem_cache_on_write_operations=1 --query "OPTIMIZE TABLE test_02241 FINAL"
$CLICKHOUSE_CLIENT --query "SELECT count(), sum(size) FROM system.filesystem_cache"
$CLICKHOUSE_CLIENT --echo --enable_filesystem_cache_on_write_operations=1 --query "OPTIMIZE TABLE test_02241 FINAL"
$CLICKHOUSE_CLIENT --enable_filesystem_cache_on_write_operations=1 --mutations_sync=2 --query "ALTER TABLE test_02241 UPDATE value = 'kek' WHERE key = 100"
$CLICKHOUSE_CLIENT --query "SELECT count(), sum(size) FROM system.filesystem_cache"
$CLICKHOUSE_CLIENT --enable_filesystem_cache_on_write_operations=1 --query "INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(5000000)"
$CLICKHOUSE_CLIENT --echo --query "SELECT count(), sum(size) FROM system.filesystem_cache"
$CLICKHOUSE_CLIENT --query "SYSTEM FLUSH LOGS"
$CLICKHOUSE_CLIENT --echo --enable_filesystem_cache_on_write_operations=1 --mutations_sync=2 --query "ALTER TABLE test_02241 UPDATE value = 'kek' WHERE key = 100"
$CLICKHOUSE_CLIENT --echo --query "SELECT count(), sum(size) FROM system.filesystem_cache"
$CLICKHOUSE_CLIENT --echo --enable_filesystem_cache_on_write_operations=1 --query "INSERT INTO test_02241 SELECT number, toString(number) FROM numbers(5000000)"
$CLICKHOUSE_CLIENT --echo --query "SYSTEM FLUSH LOGS"
$CLICKHOUSE_CLIENT -n --query "SELECT
query, ProfileEvents['RemoteFSReadBytes'] > 0 as remote_fs_read
@ -121,6 +122,6 @@ for STORAGE_POLICY in 's3_cache' 'local_cache'; do
DESC
LIMIT 1"
$CLICKHOUSE_CLIENT --query "SELECT count() FROM test_02241"
$CLICKHOUSE_CLIENT --query "SELECT count() FROM test_02241 WHERE value LIKE '%010%'"
$CLICKHOUSE_CLIENT --echo --query "SELECT count() FROM test_02241"
$CLICKHOUSE_CLIENT --echo --query "SELECT count() FROM test_02241 WHERE value LIKE '%010%'"
done


@ -0,0 +1,5 @@
drop table if exists test_d;
create table test_d engine=Distributed(test_cluster_two_shard_three_replicas_localhost, system, numbers);
select * from test_d limit 10 settings max_parallel_replicas = 0, prefer_localhost_replica = 0; --{serverError BAD_ARGUMENTS}
drop table test_d;


@ -260,6 +260,7 @@ ExactEdgeLengthRads
ExecutablePool
ExtType
ExternalDistributed
FFFFFFFF
FFFD
FIPS
FOSDEM
@ -546,6 +547,8 @@ MinIO
MinMax
MindsDB
Mongodb
mortonDecode
mortonEncode
MsgPack
MultiPolygon
Multiline
@ -1846,6 +1849,7 @@ linearized
lineasstring
linefeeds
lineorder
linestring
linux
llvm
loadDefaultCAFile
@ -2204,7 +2208,9 @@ rankCorr
rapidjson
rawblob
readWKTMultiPolygon
readWKTPoint
readWKTPolygon
readWKTRing
readahead
readline
readme
@ -2743,6 +2749,7 @@ xz
yaml
yandex
youtube
ZCurve
zLib
zLinux
zabbix


@ -4,31 +4,35 @@
cd /ClickHouse/utils/check-style || echo -e "failure\tRepo not found" > /test_output/check_status.tsv
start_total=`date +%s`
# FIXME: 30 sec to wait
# echo "Check duplicates" | ts
# ./check-duplicate-includes.sh |& tee /test_output/duplicate_includes_output.txt
echo "Check style" | ts
start=`date +%s`
./check-style -n |& tee /test_output/style_output.txt
echo "Check typos" | ts
./check-typos |& tee /test_output/typos_output.txt
echo "Check docs spelling" | ts
./check-doc-aspell |& tee /test_output/docs_spelling_output.txt
echo "Check whitespaces" | ts
runtime=$((`date +%s`-start))
echo "Check style. Done. $runtime seconds."
start=`date +%s`
./check-whitespaces -n |& tee /test_output/whitespaces_output.txt
echo "Check workflows" | ts
runtime=$((`date +%s`-start))
echo "Check whitespaces. Done. $runtime seconds."
start=`date +%s`
./check-workflows |& tee /test_output/workflows_output.txt
echo "Check submodules" | ts
runtime=$((`date +%s`-start))
echo "Check workflows. Done. $runtime seconds."
start=`date +%s`
./check-submodules |& tee /test_output/submodules_output.txt
echo "Check style. Done" | ts
runtime=$((`date +%s`-start))
echo "Check submodules. Done. $runtime seconds."
# FIXME: 6 min to wait
# echo "Check shell scripts with shellcheck" | ts
# ./shellcheck-run.sh |& tee /test_output/shellcheck_output.txt
# FIXME: move out
# /process_style_check_result.py || echo -e "failure\tCannot parse results" > /test_output/check_status.tsv
# echo "Check help for changelog generator works" | ts
# cd ../changelog || exit 1
# ./changelog.py -h 2>/dev/null 1>&2
runtime=$((`date +%s`-start_total))
echo "Check style total. Done. $runtime seconds."

utils/check-style/check_docs.sh Executable file

@ -0,0 +1,20 @@
#!/bin/bash
# yaml check is not the best one
cd /ClickHouse/utils/check-style || echo -e "failure\tRepo not found" > /test_output/check_status.tsv
start_total=`date +%s`
start=`date +%s`
./check-typos |& tee /test_output/typos_output.txt
runtime=$((`date +%s`-start))
echo "Check typos. Done. $runtime seconds."
start=`date +%s`
./check-doc-aspell |& tee /test_output/docs_spelling_output.txt
runtime=$((`date +%s`-start))
echo "Check docs spelling. Done. $runtime seconds."
runtime=$((`date +%s`-start_total))
echo "Check Docs, total. Done. $runtime seconds."


@ -1,17 +1,22 @@
#!/bin/bash
# yaml check is not the best one
cd /ClickHouse/utils/check-style || echo -e "failure\tRepo not found" > /test_output/check_status.tsv
start_total=`date +%s`
# FIXME: 1 min to wait + head checkout
# echo "Check python formatting with black" | ts
# ./check-black -n |& tee /test_output/black_output.txt
echo "Check python formatting with black" | ts
./check-black -n |& tee /test_output/black_output.txt
echo "Check pylint" | ts
start=`date +%s`
./check-pylint -n |& tee /test_output/pylint_output.txt
echo "Check pylint. Done" | ts
runtime=$((`date +%s`-start))
echo "Check pylint. Done. $runtime seconds."
echo "Check python type hinting with mypy" | ts
start=`date +%s`
./check-mypy -n |& tee /test_output/mypy_output.txt
echo "Check python type hinting with mypy. Done" | ts
runtime=$((`date +%s`-start))
echo "Check python type hinting with mypy. Done. $runtime seconds."
runtime=$((`date +%s`-start_total))
echo "Check python total. Done. $runtime seconds."


@ -1,9 +1,9 @@
#!/usr/bin/env python3
import os
import logging
import argparse
import csv
import logging
import os
# TODO: add typing and log files to the fourth column, think about launching
@ -13,11 +13,11 @@ def process_result(result_folder):
description = ""
test_results = []
checks = (
#"duplicate includes",
#"shellcheck",
# "duplicate includes",
# "shellcheck",
"style",
"pylint",
#"black",
"black",
"mypy",
"typos",
"whitespaces",
@ -30,11 +30,15 @@ def process_result(result_folder):
out_file = name.replace(" ", "_") + "_output.txt"
full_path = os.path.join(result_folder, out_file)
if not os.path.exists(full_path):
logging.info("No %s check log on path %s", name, full_path)
return "exception", f"No {name} check log", []
test_results.append((f"Check {name}", "SKIPPED"))
elif os.stat(full_path).st_size != 0:
with open(full_path, "r") as file:
lines = file.readlines()
if len(lines) > 100:
lines = lines[:100] + ["====TRIMMED===="]
content = "\n".join(lines)
description += f"Check {name} failed. "
test_results.append((f"Check {name}", "FAIL"))
test_results.append((f"Check {name}", "FAIL", None, content))
status = "failure"
else:
test_results.append((f"Check {name}", "OK"))
@ -42,6 +46,8 @@ def process_result(result_folder):
if not description:
description += "Style check success"
assert test_results, "No single style-check output found"
return status, description, test_results