Compare commits

26 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Shichao Jin | 06e93c4a15 | Merge 5231d00616 into 44b4bd38b9 | 2024-11-20 15:25:35 -08:00 |
| Mikhail Artemenko | 44b4bd38b9 | Merge pull request #72045 from ClickHouse/issues/70174/cluster_versions (Enable cluster table functions for DataLake Storages) | 2024-11-20 21:22:37 +00:00 |
| Shichao Jin | 40c7d5fd1a | Merge pull request #71894 from udiz/fix-arrayWithConstant-size-estimation (Fix: arrayWithConstant size estimation using row's element size) | 2024-11-20 19:56:27 +00:00 |
| Mikhail Artemenko | 4ccebd9a24 | fix syntax for iceberg in docs | 2024-11-20 11:15:39 +00:00 |
| Mikhail Artemenko | 99177c0daf | remove icebergCluster alias | 2024-11-20 11:15:12 +00:00 |
| jsc0218 | 5231d00616 | remove unused and add compact projout | 2024-11-20 03:11:36 +00:00 |
| jsc0218 | 190651cf23 | add test | 2024-11-19 20:44:19 +00:00 |
| Mikhail Artemenko | 0951991c1d | update aspell-dict.txt | 2024-11-19 13:10:42 +00:00 |
| Mikhail Artemenko | 19aec5e572 | Merge branch 'issues/70174/cluster_versions' of github.com:ClickHouse/ClickHouse into issues/70174/cluster_versions | 2024-11-19 12:51:56 +00:00 |
| Mikhail Artemenko | a367de9977 | add docs | 2024-11-19 12:49:59 +00:00 |
| Mikhail Artemenko | 6894e280b2 | fix pr issues | 2024-11-19 12:34:42 +00:00 |
| Mikhail Artemenko | 39ebe113d9 | Merge branch 'master' into issues/70174/cluster_versions | 2024-11-19 11:28:46 +00:00 |
| jsc0218 | 0ee0ab9132 | use existing projpart | 2024-11-19 00:35:46 +00:00 |
| jsc0218 | 8fa8d5ac0f | add metadata for projpart and remove duplicated code | 2024-11-19 00:08:14 +00:00 |
| udiz | 239bbaa133 | use length | 2024-11-19 00:00:43 +00:00 |
| udiz | 07fac5808d | format null on test | 2024-11-18 23:08:48 +00:00 |
| udiz | ed95e0781f | test uses less memory | 2024-11-18 22:48:38 +00:00 |
| robot-clickhouse | 014608fb6b | Automatic style fix | 2024-11-18 17:51:51 +00:00 |
| Mikhail Artemenko | a29ded4941 | add test for iceberg | 2024-11-18 17:39:46 +00:00 |
| Mikhail Artemenko | d2efae7511 | enable cluster versions for datalake storages | 2024-11-18 17:35:21 +00:00 |
| jsc0218 | 445879a2ac | prepare to add meta for proj part | 2024-11-18 02:09:50 +00:00 |
| jsc0218 | 130a1151fe | MutationInterpreter prepare for proj | 2024-11-14 03:18:56 +00:00 |
| udiz | 6879aa130a | newline | 2024-11-13 22:47:54 +00:00 |
| udiz | 43f3c886a2 | add test | 2024-11-13 22:46:36 +00:00 |
| udiz | c383a743f7 | arrayWithConstant size estimation using single value size | 2024-11-13 20:02:31 +00:00 |
| jsc0218 | af7e3640d1 | preparaion | 2024-11-11 01:52:19 +00:00 |
23 changed files with 599 additions and 33 deletions

@@ -49,4 +49,4 @@ LIMIT 2
 **See Also**
 - [DeltaLake engine](/docs/en/engines/table-engines/integrations/deltalake.md)
+- [DeltaLake cluster table function](/docs/en/sql-reference/table-functions/deltalakeCluster.md)

@@ -0,0 +1,30 @@
---
slug: /en/sql-reference/table-functions/deltalakeCluster
sidebar_position: 46
sidebar_label: deltaLakeCluster
title: "deltaLakeCluster Table Function"
---

This is an extension to the [deltaLake](/docs/en/sql-reference/table-functions/deltalake.md) table function.

Allows processing files from [Delta Lake](https://github.com/delta-io/delta) tables in Amazon S3 in parallel from many nodes in a specified cluster. On the initiator, it creates a connection to all nodes in the cluster and dispatches each file dynamically. On the worker node, it asks the initiator about the next task to process and processes it. This is repeated until all tasks are finished.

**Syntax**

``` sql
deltaLakeCluster(cluster_name, url [,aws_access_key_id, aws_secret_access_key] [,format] [,structure] [,compression])
```

**Arguments**

- `cluster_name` — Name of a cluster that is used to build a set of addresses and connection parameters to remote and local servers.
- The description of all other arguments coincides with the description of arguments in the equivalent [deltaLake](/docs/en/sql-reference/table-functions/deltalake.md) table function.

**Returned value**

A table with the specified structure for reading data from the cluster in the specified Delta Lake table in S3.

**See Also**

- [deltaLake engine](/docs/en/engines/table-engines/integrations/deltalake.md)
- [deltaLake table function](/docs/en/sql-reference/table-functions/deltalake.md)

@@ -29,4 +29,4 @@ A table with the specified structure for reading data in the specified Hudi tabl
 **See Also**
 - [Hudi engine](/docs/en/engines/table-engines/integrations/hudi.md)
+- [Hudi cluster table function](/docs/en/sql-reference/table-functions/hudiCluster.md)

@@ -0,0 +1,30 @@
---
slug: /en/sql-reference/table-functions/hudiCluster
sidebar_position: 86
sidebar_label: hudiCluster
title: "hudiCluster Table Function"
---

This is an extension to the [hudi](/docs/en/sql-reference/table-functions/hudi.md) table function.

Allows processing files from Apache [Hudi](https://hudi.apache.org/) tables in Amazon S3 in parallel from many nodes in a specified cluster. On the initiator, it creates a connection to all nodes in the cluster and dispatches each file dynamically. On the worker node, it asks the initiator about the next task to process and processes it. This is repeated until all tasks are finished.

**Syntax**

``` sql
hudiCluster(cluster_name, url [,aws_access_key_id, aws_secret_access_key] [,format] [,structure] [,compression])
```

**Arguments**

- `cluster_name` — Name of a cluster that is used to build a set of addresses and connection parameters to remote and local servers.
- The description of all other arguments coincides with the description of arguments in the equivalent [hudi](/docs/en/sql-reference/table-functions/hudi.md) table function.

**Returned value**

A table with the specified structure for reading data from the cluster in the specified Hudi table in S3.

**See Also**

- [Hudi engine](/docs/en/engines/table-engines/integrations/hudi.md)
- [Hudi table function](/docs/en/sql-reference/table-functions/hudi.md)

@@ -72,3 +72,4 @@ Table function `iceberg` is an alias to `icebergS3` now.
 **See Also**
 - [Iceberg engine](/docs/en/engines/table-engines/integrations/iceberg.md)
+- [Iceberg cluster table function](/docs/en/sql-reference/table-functions/icebergCluster.md)

@@ -0,0 +1,43 @@
---
slug: /en/sql-reference/table-functions/icebergCluster
sidebar_position: 91
sidebar_label: icebergCluster
title: "icebergCluster Table Function"
---

This is an extension to the [iceberg](/docs/en/sql-reference/table-functions/iceberg.md) table function.

Allows processing files from Apache [Iceberg](https://iceberg.apache.org/) tables in parallel from many nodes in a specified cluster. On the initiator, it creates a connection to all nodes in the cluster and dispatches each file dynamically. On the worker node, it asks the initiator about the next task to process and processes it. This is repeated until all tasks are finished.

**Syntax**

``` sql
icebergS3Cluster(cluster_name, url [, NOSIGN | access_key_id, secret_access_key, [session_token]] [,format] [,compression_method])
icebergS3Cluster(cluster_name, named_collection[, option=value [,..]])

icebergAzureCluster(cluster_name, connection_string|storage_account_url, container_name, blobpath, [,account_name], [,account_key] [,format] [,compression_method])
icebergAzureCluster(cluster_name, named_collection[, option=value [,..]])

icebergHDFSCluster(cluster_name, path_to_table, [,format] [,compression_method])
icebergHDFSCluster(cluster_name, named_collection[, option=value [,..]])
```

**Arguments**

- `cluster_name` — Name of a cluster that is used to build a set of addresses and connection parameters to remote and local servers.
- The description of all other arguments coincides with the description of arguments in the equivalent [iceberg](/docs/en/sql-reference/table-functions/iceberg.md) table function.

**Returned value**

A table with the specified structure for reading data from the cluster in the specified Iceberg table.

**Examples**

```sql
SELECT * FROM icebergS3Cluster('cluster_simple', 'http://test.s3.amazonaws.com/clickhouse-bucket/test_table', 'test', 'test')
```

**See Also**

- [Iceberg engine](/docs/en/engines/table-engines/integrations/iceberg.md)
- [Iceberg table function](/docs/en/sql-reference/table-functions/iceberg.md)
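All three cluster pages describe the same pull-based scheme: the initiator hands out files one at a time, and each worker keeps asking for the next one until nothing is left. A minimal single-process sketch of that idea, assuming threads stand in for cluster nodes and a lock-protected iterator stands in for the initiator; `Initiator`, `next_task`, `worker`, and `run_cluster_read` are hypothetical names, not ClickHouse APIs.

```python
import threading
from typing import Iterable, List, Optional, Tuple


class Initiator:
    """Hypothetical stand-in for the initiator: hands out one file per request."""

    def __init__(self, files: Iterable[str]) -> None:
        self._files = iter(files)
        self._lock = threading.Lock()

    def next_task(self) -> Optional[str]:
        # Serialize access so two workers never receive the same file;
        # None signals that every task has already been handed out.
        with self._lock:
            return next(self._files, None)


def worker(node: str, initiator: Initiator,
           out: List[Tuple[str, str]], out_lock: threading.Lock) -> None:
    # Keep asking for the next task until none is left, so a faster node
    # simply ends up processing more files.
    while True:
        task = initiator.next_task()
        if task is None:
            return
        with out_lock:
            out.append((node, task))


def run_cluster_read(files: List[str], nodes: List[str]) -> List[Tuple[str, str]]:
    initiator = Initiator(files)
    processed: List[Tuple[str, str]] = []
    out_lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(node, initiator, processed, out_lock))
        for node in nodes
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed
```

For example, `run_cluster_read([f"part-{i}.parquet" for i in range(10)], ["node1", "node2", "node3"])` returns ten `(node, file)` pairs with every file appearing exactly once. How the files are split across nodes varies from run to run, which is the point of dynamic dispatch compared with a fixed up-front split.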

@@ -62,16 +62,17 @@ public:
 for (size_t i = 0; i < num_rows; ++i)
 {
 auto array_size = col_num->getInt(i);
+auto element_size = col_value->byteSizeAt(i);
 if (unlikely(array_size < 0))
 throw Exception(ErrorCodes::TOO_LARGE_ARRAY_SIZE, "Array size {} cannot be negative: while executing function {}", array_size, getName());
 Int64 estimated_size = 0;
-if (unlikely(common::mulOverflow(array_size, col_value->byteSize(), estimated_size)))
-throw Exception(ErrorCodes::TOO_LARGE_ARRAY_SIZE, "Array size {} with element size {} bytes is too large: while executing function {}", array_size, col_value->byteSize(), getName());
+if (unlikely(common::mulOverflow(array_size, element_size, estimated_size)))
+throw Exception(ErrorCodes::TOO_LARGE_ARRAY_SIZE, "Array size {} with element size {} bytes is too large: while executing function {}", array_size, element_size, getName());
 if (unlikely(estimated_size > max_array_size_in_columns_bytes))
-throw Exception(ErrorCodes::TOO_LARGE_ARRAY_SIZE, "Array size {} with element size {} bytes is too large: while executing function {}", array_size, col_value->byteSize(), getName());
+throw Exception(ErrorCodes::TOO_LARGE_ARRAY_SIZE, "Array size {} with element size {} bytes is too large: while executing function {}", array_size, element_size, getName());
 offset += array_size;
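This hunk replaces `col_value->byteSize()`, the byte size of the whole value column, with `col_value->byteSizeAt(i)`, the size of row `i`'s value only. The Python sketch below models both estimates; `old_estimate`, `new_estimate`, and `check_row` are hypothetical helpers, the explicit Int64 bound only imitates what `common::mulOverflow` guards against, and the 1 GiB limit used below is an assumed value for `max_array_size_in_columns_bytes`.

```python
from typing import List

INT64_MAX = 2**63 - 1


def old_estimate(array_size: int, element_sizes: List[int]) -> int:
    # Pre-fix behaviour: every row is multiplied by the byte size of the
    # whole value column, as with col_value->byteSize().
    return array_size * sum(element_sizes)


def new_estimate(array_size: int, element_sizes: List[int], row: int) -> int:
    # Post-fix behaviour: only the current row's value size is used,
    # as with col_value->byteSizeAt(row).
    return array_size * element_sizes[row]


def check_row(array_size: int, element_size: int, max_bytes: int) -> int:
    """Validate one row the way the patched loop does; return the estimate."""
    if array_size < 0:
        raise ValueError(f"Array size {array_size} cannot be negative")
    estimated_size = array_size * element_size
    # Python ints never overflow, so compare against the Int64 range explicitly.
    if estimated_size > INT64_MAX:
        raise OverflowError(
            f"Array size {array_size} with element size {element_size} bytes is too large"
        )
    if estimated_size > max_bytes:
        raise ValueError(
            f"Array size {array_size} with element size {element_size} bytes is too large"
        )
    return estimated_size
```

For a block of 1,000,000 rows whose values take 8 bytes each and `array_size = 1000`, `old_estimate` gives 8,000,000,000 bytes, which trips an assumed 1 GiB limit even though each produced array needs only 8,000 bytes, the value `new_estimate` returns. The old estimate therefore grows with the number of rows in the block rather than with the element being repeated.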

@@ -756,12 +756,11 @@ MergeTreeDataWriter::TemporaryPart MergeTreeDataWriter::writeProjectionPartImpl(
 TemporaryPart temp_part;
 const auto & metadata_snapshot = projection.metadata;
-MergeTreeDataPartType part_type;
 /// Size of part would not be greater than block.bytes() + epsilon
 size_t expected_size = block.bytes();
 // just check if there is enough space on parent volume
 MergeTreeData::reserveSpace(expected_size, parent_part->getDataPartStorage());
-part_type = data.choosePartFormatOnDisk(expected_size, block.rows()).part_type;
+MergeTreeDataPartType part_type = data.choosePartFormatOnDisk(expected_size, block.rows()).part_type;
 auto new_data_part = parent_part->getProjectionPartBuilder(part_name, is_temp).withPartType(part_type).build();
 auto projection_part_storage = new_data_part->getDataPartStoragePtr();

@@ -1008,7 +1008,6 @@ void finalizeMutatedPart(
 new_data_part->default_codec = codec;
 }
 }
 struct MutationContext
@@ -1023,7 +1022,9 @@ struct MutationContext
 FutureMergedMutatedPartPtr future_part;
 MergeTreeData::DataPartPtr source_part;
+MergeTreeData::DataPartPtr projection_part;
 StorageMetadataPtr metadata_snapshot;
+StorageMetadataPtr projection_metadata_snapshot;
 MutationCommandsConstPtr commands;
 time_t time_of_mutation;
@@ -1040,6 +1041,11 @@ struct MutationContext
 ProgressCallback progress_callback;
 Block updated_header;
+/// source here is projection part instead of table part
+QueryPipelineBuilder projection_mutating_pipeline_builder;
+QueryPipeline projection_mutating_pipeline;
+std::unique_ptr<PullingPipelineExecutor> projection_mutating_executor;
 std::unique_ptr<MutationsInterpreter> interpreter;
 UInt64 watch_prev_elapsed = 0;
 std::unique_ptr<MergeStageProgress> stage_progress;
@@ -1056,8 +1062,14 @@ struct MutationContext
 MergeTreeData::MutableDataPartPtr new_data_part;
 IMergedBlockOutputStreamPtr out;
+MergeTreeData::MutableDataPartPtr new_projection_part;
+IMergedBlockOutputStreamPtr projection_out;
 String mrk_extension;
+/// Used in lightweight delete to bit mask the projection if possible,
+/// preference is higher than rebuild as more performant.
+std::vector<ProjectionDescriptionRawPtr> projections_to_mask;
 std::vector<ProjectionDescriptionRawPtr> projections_to_build;
 IMergeTreeDataPart::MinMaxIndexPtr minmax_idx;
@@ -1151,6 +1163,14 @@ public:
 if (iterateThroughAllProjections())
 return true;
+state = State::NEED_MASK_PROJECTION_PARTS;
+return true;
+}
+case State::NEED_MASK_PROJECTION_PARTS:
+{
+if (iterateThroughAllProjectionsToMask())
+return true;
 state = State::SUCCESS;
 return true;
 }
@@ -1170,6 +1190,7 @@ private:
 void finalizeTempProjections();
 bool iterateThroughAllProjections();
 void constructTaskForProjectionPartsMerge();
+bool iterateThroughAllProjectionsToMask();
 void finalize();
 enum class State : uint8_t
@@ -1177,7 +1198,7 @@ private:
 NEED_PREPARE,
 NEED_MUTATE_ORIGINAL_PART,
 NEED_MERGE_PROJECTION_PARTS,
+NEED_MASK_PROJECTION_PARTS,
 SUCCESS
 };
@@ -1343,6 +1364,14 @@ bool PartMergerWriter::iterateThroughAllProjections()
 return true;
 }
+bool PartMergerWriter::iterateThroughAllProjectionsToMask()
+{
+for (size_t i = 0, size = ctx->projections_to_mask.size(); i < size; ++i)
+{
+}
+return false;
+}
 void PartMergerWriter::finalize()
 {
 if (ctx->count_lightweight_deleted_rows)
@@ -1660,6 +1689,30 @@ private:
 ctx->mutating_pipeline.disableProfileEventUpdate();
 ctx->mutating_executor = std::make_unique<PullingPipelineExecutor>(ctx->mutating_pipeline);
+chassert (ctx->projection_mutating_pipeline_builder.initialized());
+builder = std::make_unique<QueryPipelineBuilder>(std::move(ctx->projection_mutating_pipeline_builder));
+const auto & projections_name_and_part = ctx->source_part->getProjectionParts();
+auto part_name = fmt::format("{}", projections_name_and_part.begin()->first);
+auto new_data_part = ctx->new_data_part->getProjectionPartBuilder(part_name, true).withPartType(ctx->projection_part->getType()).build();
+ctx->projection_out = std::make_shared<MergedBlockOutputStream>(
+new_data_part,
+ctx->projection_metadata_snapshot,
+ctx->projection_metadata_snapshot->getColumns().getAllPhysical(),
+MergeTreeIndices{},
+ColumnsStatistics{},
+ctx->data->getContext()->chooseCompressionCodec(0, 0),
+Tx::PrehistoricTID,
+/*reset_columns=*/ false,
+/*save_marks_in_cache=*/ false,
+/*blocks_are_granules_size=*/ false,
+ctx->data->getContext()->getWriteSettings());
+ctx->projection_mutating_pipeline = QueryPipelineBuilder::getPipeline(std::move(*builder));
+ctx->projection_mutating_pipeline.disableProfileEventUpdate();
+ctx->projection_mutating_executor = std::make_unique<PullingPipelineExecutor>(ctx->projection_mutating_pipeline);
 part_merger_writer_task = std::make_unique<PartMergerWriter>(ctx);
 }
@@ -1895,6 +1948,26 @@ private:
 ctx->projections_to_build = std::vector<ProjectionDescriptionRawPtr>{ctx->projections_to_recalc.begin(), ctx->projections_to_recalc.end()};
+chassert (ctx->projection_mutating_pipeline_builder.initialized());
+builder = std::make_unique<QueryPipelineBuilder>(std::move(ctx->projection_mutating_pipeline_builder));
+// ctx->projection_out = std::make_shared<MergedColumnOnlyOutputStream>(
+// ctx->new_projection_part,
+// ctx->metadata_snapshot,
+// ctx->updated_header.getNamesAndTypesList(),
+// ctx->compression_codec,
+// std::vector<MergeTreeIndexPtr>(),
+// std::vector<ColumnStatisticsPartPtr>(),
+// nullptr,
+// /*save_marks_in_cache=*/ false,
+// ctx->projection_part->index_granularity,
+// &ctx->projection_part->index_granularity_info
+// );
+ctx->projection_mutating_pipeline = QueryPipelineBuilder::getPipeline(std::move(*builder));
+ctx->projection_mutating_pipeline.disableProfileEventUpdate();
+ctx->projection_mutating_executor = std::make_unique<PullingPipelineExecutor>(ctx->projection_mutating_pipeline);
 part_merger_writer_task = std::make_unique<PartMergerWriter>(ctx);
 }
 }
@@ -2308,6 +2381,32 @@ bool MutateTask::prepare()
 ctx->updated_header = ctx->interpreter->getUpdatedHeader();
 ctx->progress_callback = MergeProgressCallback((*ctx->mutate_entry)->ptr(), ctx->watch_prev_elapsed, *ctx->stage_progress);
+const auto & proj_desc = *(ctx->metadata_snapshot->getProjections().begin());
+ctx->projection_metadata_snapshot = proj_desc.metadata;
+const auto & projections_name_and_part = ctx->source_part->getProjectionParts();
+ctx->projection_part = projections_name_and_part.begin()->second;
+MutationCommands projection_commands;
+MutationHelpers::splitAndModifyMutationCommands(
+ctx->projection_part,
+proj_desc.metadata,
+alter_conversions,
+ctx->commands_for_part,
+projection_commands,
+ctx->for_file_renames,
+false,
+ctx->log);
+chassert(!projection_commands.empty());
+auto projection_interpreter = std::make_unique<MutationsInterpreter>(
+*ctx->data, ctx->projection_part, alter_conversions,
+proj_desc.metadata, projection_commands,
+proj_desc.metadata->getColumns().getNamesOfPhysical(), context_for_reading, settings);
+ctx->projection_mutating_pipeline_builder = projection_interpreter->execute();
 lightweight_delete_mode = ctx->updated_header.has(RowExistsColumn::name);
 /// If under the condition of lightweight delete mode with rebuild option, add projections again here as we can only know
 /// the condition as early as from here.
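The new `NEED_MASK_PROJECTION_PARTS` state sits between `NEED_MERGE_PROJECTION_PARTS` and `SUCCESS`. In the hunks above, each case either does one more unit of work and returns `true`, or moves to the next state and also returns `true`, presumably so the scheduler calls the task again. The Python sketch below models only that stepping pattern; `WriterSketch` is hypothetical, plain counters stand in for `iterateThroughAllProjections()` and `iterateThroughAllProjectionsToMask()`, and treating `SUCCESS` as "return false" is an assumption since that case is not shown in the diff.

```python
from enum import Enum, auto
from typing import Dict, List


class State(Enum):
    NEED_PREPARE = auto()
    NEED_MUTATE_ORIGINAL_PART = auto()
    NEED_MERGE_PROJECTION_PARTS = auto()
    NEED_MASK_PROJECTION_PARTS = auto()
    SUCCESS = auto()


class WriterSketch:
    """Hypothetical model of PartMergerWriter's stepping; not the real class."""

    def __init__(self, blocks: int, to_merge: int, to_mask: int) -> None:
        self.state = State.NEED_PREPARE
        # Units of work left in each working state.
        self.remaining: Dict[State, int] = {
            State.NEED_MUTATE_ORIGINAL_PART: blocks,
            State.NEED_MERGE_PROJECTION_PARTS: to_merge,
            State.NEED_MASK_PROJECTION_PARTS: to_mask,
        }
        self.trace: List[str] = []

    def _iterate(self) -> bool:
        # Do one unit of work for the current state; True means "call me again".
        if self.remaining[self.state] > 0:
            self.remaining[self.state] -= 1
            return True
        return False

    def execute(self) -> bool:
        """One scheduler step; returning True asks to be called again."""
        self.trace.append(self.state.name)
        if self.state is State.NEED_PREPARE:
            self.state = State.NEED_MUTATE_ORIGINAL_PART
            return True
        if self.state is State.SUCCESS:
            return False  # in this sketch, SUCCESS ends the loop
        if self._iterate():
            return True
        # Current stage is exhausted: advance to the next state in declaration order.
        order = list(State)
        self.state = order[order.index(self.state) + 1]
        return True
```

Driving it with `w = WriterSketch(blocks=1, to_merge=0, to_mask=1)` and `while w.execute(): pass` visits the states in the enum's order. The stub committed here, which returns `false` on its first call, corresponds to `to_mask=0`: the writer spends exactly one step in `NEED_MASK_PROJECTION_PARTS` before reaching `SUCCESS`.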

@@ -334,6 +334,17 @@ Block ProjectionDescription::calculate(const Block & block, ContextPtr context)
 return ret;
 }
+void ProjectionDescription::mask(const Block & block[[maybe_unused]], ContextPtr context[[maybe_unused]]) const
+{
+// auto mut_context = Context::createCopy(context);
+// query_ast_copy = query_ast->clone();
+// auto builder = InterpreterAlterQuery(query_ast_copy, mut_context,
+// Pipe(std::make_shared<SourceFromSingleChunk>(block)))
+// .buildQueryPipeline();
+// auto pipeline = QueryPipelineBuilder::getPipeline(std::move(builder));
+// PullingPipelineExecutor executor(pipeline);
+// executor.pull(ret);
+}
 String ProjectionsDescription::toString() const
 {

@@ -93,6 +93,8 @@ struct ProjectionDescription
 Block calculate(const Block & block, ContextPtr context) const;
+void mask(const Block & block, ContextPtr context) const;
 String getDirectoryName() const { return name + ".proj"; }
 };

@@ -226,6 +226,26 @@ template class TableFunctionObjectStorage<HDFSClusterDefinition, StorageHDFSConf
 #endif
 template class TableFunctionObjectStorage<LocalDefinition, StorageLocalConfiguration>;
+#if USE_AVRO && USE_AWS_S3
+template class TableFunctionObjectStorage<IcebergS3ClusterDefinition, StorageS3IcebergConfiguration>;
+#endif
+#if USE_AVRO && USE_AZURE_BLOB_STORAGE
+template class TableFunctionObjectStorage<IcebergAzureClusterDefinition, StorageAzureIcebergConfiguration>;
+#endif
+#if USE_AVRO && USE_HDFS
+template class TableFunctionObjectStorage<IcebergHDFSClusterDefinition, StorageHDFSIcebergConfiguration>;
+#endif
+#if USE_PARQUET && USE_AWS_S3
+template class TableFunctionObjectStorage<DeltaLakeClusterDefinition, StorageS3DeltaLakeConfiguration>;
+#endif
+#if USE_AWS_S3
+template class TableFunctionObjectStorage<HudiClusterDefinition, StorageS3HudiConfiguration>;
+#endif
 #if USE_AVRO
 void registerTableFunctionIceberg(TableFunctionFactory & factory)
 {

@@ -96,7 +96,7 @@ void registerTableFunctionObjectStorageCluster(TableFunctionFactory & factory)
 {
 .documentation = {
 .description=R"(The table function can be used to read the data stored on HDFS in parallel for many nodes in a specified cluster.)",
-.examples{{"HDFSCluster", "SELECT * FROM HDFSCluster(cluster_name, uri, format)", ""}}},
+.examples{{"HDFSCluster", "SELECT * FROM HDFSCluster(cluster, uri, format)", ""}}},
 .allow_readonly = false
 }
 );
@@ -105,15 +105,77 @@ void registerTableFunctionObjectStorageCluster(TableFunctionFactory & factory)
 UNUSED(factory);
 }
+#if USE_AVRO
+void registerTableFunctionIcebergCluster(TableFunctionFactory & factory)
+{
+UNUSED(factory);
 #if USE_AWS_S3
-template class TableFunctionObjectStorageCluster<S3ClusterDefinition, StorageS3Configuration>;
+factory.registerFunction<TableFunctionIcebergS3Cluster>(
+{.documentation
+= {.description = R"(The table function can be used to read the Iceberg table stored on S3 object store in parallel for many nodes in a specified cluster.)",
+.examples{{"icebergS3Cluster", "SELECT * FROM icebergS3Cluster(cluster, url, [, NOSIGN | access_key_id, secret_access_key, [session_token]], format, [,compression])", ""}},
+.categories{"DataLake"}},
+.allow_readonly = false});
 #endif
 #if USE_AZURE_BLOB_STORAGE
-template class TableFunctionObjectStorageCluster<AzureClusterDefinition, StorageAzureConfiguration>;
+factory.registerFunction<TableFunctionIcebergAzureCluster>(
+{.documentation
+= {.description = R"(The table function can be used to read the Iceberg table stored on Azure object store in parallel for many nodes in a specified cluster.)",
+.examples{{"icebergAzureCluster", "SELECT * FROM icebergAzureCluster(cluster, connection_string|storage_account_url, container_name, blobpath, [account_name, account_key, format, compression])", ""}},
+.categories{"DataLake"}},
+.allow_readonly = false});
 #endif
 #if USE_HDFS
-template class TableFunctionObjectStorageCluster<HDFSClusterDefinition, StorageHDFSConfiguration>;
+factory.registerFunction<TableFunctionIcebergHDFSCluster>(
+{.documentation
+= {.description = R"(The table function can be used to read the Iceberg table stored on HDFS virtual filesystem in parallel for many nodes in a specified cluster.)",
+.examples{{"icebergHDFSCluster", "SELECT * FROM icebergHDFSCluster(cluster, uri, [format], [structure], [compression_method])", ""}},
+.categories{"DataLake"}},
+.allow_readonly = false});
 #endif
 }
+#endif
+#if USE_AWS_S3
+#if USE_PARQUET
+void registerTableFunctionDeltaLakeCluster(TableFunctionFactory & factory)
+{
+factory.registerFunction<TableFunctionDeltaLakeCluster>(
+{.documentation
+= {.description = R"(The table function can be used to read the DeltaLake table stored on object store in parallel for many nodes in a specified cluster.)",
+.examples{{"deltaLakeCluster", "SELECT * FROM deltaLakeCluster(cluster, url, access_key_id, secret_access_key)", ""}},
+.categories{"DataLake"}},
+.allow_readonly = false});
+}
+#endif
+void registerTableFunctionHudiCluster(TableFunctionFactory & factory)
+{
+factory.registerFunction<TableFunctionHudiCluster>(
+{.documentation
+= {.description = R"(The table function can be used to read the Hudi table stored on object store in parallel for many nodes in a specified cluster.)",
+.examples{{"hudiCluster", "SELECT * FROM hudiCluster(cluster, url, access_key_id, secret_access_key)", ""}},
+.categories{"DataLake"}},
+.allow_readonly = false});
+}
+#endif
+void registerDataLakeClusterTableFunctions(TableFunctionFactory & factory)
+{
+UNUSED(factory);
+#if USE_AVRO
+registerTableFunctionIcebergCluster(factory);
+#endif
+#if USE_AWS_S3
+#if USE_PARQUET
+registerTableFunctionDeltaLakeCluster(factory);
+#endif
+registerTableFunctionHudiCluster(factory);
+#endif
+}
+}
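The `#if` guards in `registerDataLakeClusterTableFunctions` decide which of the new functions exist in a given build: the Iceberg variants need `USE_AVRO` plus the matching storage backend, `deltaLakeCluster` needs `USE_AWS_S3` and `USE_PARQUET`, and `hudiCluster` needs only `USE_AWS_S3`. The Python model below restates that guard logic as a function over a dict of flags. It is an illustration under that assumption only, since the real flags are compile-time macros rather than runtime values, and `registered_cluster_functions` is a hypothetical helper.

```python
from typing import Dict, List


def registered_cluster_functions(flags: Dict[str, bool]) -> List[str]:
    """Names registerDataLakeClusterTableFunctions would register (model only)."""

    def on(flag: str) -> bool:
        # A missing flag behaves like an undefined (false) macro.
        return bool(flags.get(flag, False))

    names: List[str] = []
    if on("USE_AVRO"):
        if on("USE_AWS_S3"):
            names.append("icebergS3Cluster")
        if on("USE_AZURE_BLOB_STORAGE"):
            names.append("icebergAzureCluster")
        if on("USE_HDFS"):
            names.append("icebergHDFSCluster")
    if on("USE_AWS_S3"):
        if on("USE_PARQUET"):
            names.append("deltaLakeCluster")
        names.append("hudiCluster")
    return names
```

For instance, a build with only `USE_AWS_S3` enabled would register just `hudiCluster`, while a build with every flag enabled registers all five functions in the order shown.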

@@ -33,6 +33,36 @@ struct HDFSClusterDefinition
 static constexpr auto storage_type_name = "HDFSCluster";
 };
+struct IcebergS3ClusterDefinition
+{
+static constexpr auto name = "icebergS3Cluster";
+static constexpr auto storage_type_name = "IcebergS3Cluster";
+};
+struct IcebergAzureClusterDefinition
+{
+static constexpr auto name = "icebergAzureCluster";
+static constexpr auto storage_type_name = "IcebergAzureCluster";
+};
+struct IcebergHDFSClusterDefinition
+{
+static constexpr auto name = "icebergHDFSCluster";
+static constexpr auto storage_type_name = "IcebergHDFSCluster";
+};
+struct DeltaLakeClusterDefinition
+{
+static constexpr auto name = "deltaLakeCluster";
+static constexpr auto storage_type_name = "DeltaLakeS3Cluster";
+};
+struct HudiClusterDefinition
+{
+static constexpr auto name = "hudiCluster";
+static constexpr auto storage_type_name = "HudiS3Cluster";
+};
 /**
 * Class implementing s3/hdfs/azureBlobStorageCluster(...) table functions,
 * which allow to process many files from S3/HDFS/Azure blob storage on a specific cluster.
@@ -79,4 +109,25 @@ using TableFunctionAzureBlobCluster = TableFunctionObjectStorageCluster<AzureClu
 #if USE_HDFS
 using TableFunctionHDFSCluster = TableFunctionObjectStorageCluster<HDFSClusterDefinition, StorageHDFSConfiguration>;
 #endif
+#if USE_AVRO && USE_AWS_S3
+using TableFunctionIcebergS3Cluster = TableFunctionObjectStorageCluster<IcebergS3ClusterDefinition, StorageS3IcebergConfiguration>;
+#endif
+#if USE_AVRO && USE_AZURE_BLOB_STORAGE
+using TableFunctionIcebergAzureCluster = TableFunctionObjectStorageCluster<IcebergAzureClusterDefinition, StorageAzureIcebergConfiguration>;
+#endif
+#if USE_AVRO && USE_HDFS
+using TableFunctionIcebergHDFSCluster = TableFunctionObjectStorageCluster<IcebergHDFSClusterDefinition, StorageHDFSIcebergConfiguration>;
+#endif
+#if USE_AWS_S3 && USE_PARQUET
+using TableFunctionDeltaLakeCluster = TableFunctionObjectStorageCluster<DeltaLakeClusterDefinition, StorageS3DeltaLakeConfiguration>;
+#endif
+#if USE_AWS_S3
+using TableFunctionHudiCluster = TableFunctionObjectStorageCluster<HudiClusterDefinition, StorageS3HudiConfiguration>;
+#endif
 }

@@ -66,6 +66,7 @@ void registerTableFunctions(bool use_legacy_mongodb_integration [[maybe_unused]]
 registerTableFunctionObjectStorage(factory);
 registerTableFunctionObjectStorageCluster(factory);
 registerDataLakeTableFunctions(factory);
+registerDataLakeClusterTableFunctions(factory);
 }
 }

@@ -70,6 +70,7 @@ void registerTableFunctionExplain(TableFunctionFactory & factory);
 void registerTableFunctionObjectStorage(TableFunctionFactory & factory);
 void registerTableFunctionObjectStorageCluster(TableFunctionFactory & factory);
 void registerDataLakeTableFunctions(TableFunctionFactory & factory);
+void registerDataLakeClusterTableFunctions(TableFunctionFactory & factory);
 void registerTableFunctionTimeSeries(TableFunctionFactory & factory);

@@ -0,0 +1,20 @@
<clickhouse>
<remote_servers>
<cluster_simple>
<shard>
<replica>
<host>node1</host>
<port>9000</port>
</replica>
<replica>
<host>node2</host>
<port>9000</port>
</replica>
<replica>
<host>node3</host>
<port>9000</port>
</replica>
</shard>
</cluster_simple>
</remote_servers>
</clickhouse>
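This `cluster_simple` block is what the tests pass as `cluster_name` (for example `icebergS3Cluster('cluster_simple', ...)`): a single shard containing three replicas, `node1`, `node2`, and `node3`, all on port 9000. To check which hosts a `remote_servers` entry defines, here is a Python sketch using the standard-library `xml.etree.ElementTree`. `cluster_replicas` is a hypothetical helper that reads only `host` and `port` and ignores any other replica settings ClickHouse supports.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

CONFIG = """<clickhouse>
  <remote_servers>
    <cluster_simple>
      <shard>
        <replica><host>node1</host><port>9000</port></replica>
        <replica><host>node2</host><port>9000</port></replica>
        <replica><host>node3</host><port>9000</port></replica>
      </shard>
    </cluster_simple>
  </remote_servers>
</clickhouse>"""


def cluster_replicas(config_xml: str, cluster_name: str) -> List[Tuple[int, str, int]]:
    """Return (shard_index, host, port) for every replica of one cluster."""
    root = ET.fromstring(config_xml)  # the root element is <clickhouse>
    cluster = root.find(f"remote_servers/{cluster_name}")
    if cluster is None:
        raise KeyError(f"cluster {cluster_name!r} not found in remote_servers")
    replicas: List[Tuple[int, str, int]] = []
    for shard_index, shard in enumerate(cluster.findall("shard")):
        for replica in shard.findall("replica"):
            host = replica.findtext("host")
            port = int(replica.findtext("port"))
            replicas.append((shard_index, host, port))
    return replicas
```

Calling `cluster_replicas(CONFIG, "cluster_simple")` lists the three replicas of shard 0, matching the `node1`, `node2`, and `node3` instances that the test fixture further down adds with this file.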

@@ -0,0 +1,6 @@
<clickhouse>
<query_log>
<database>system</database>
<table>query_log</table>
</query_log>
</clickhouse>

@ -73,14 +73,38 @@ def started_cluster():
cluster.add_instance( cluster.add_instance(
"node1", "node1",
main_configs=[ main_configs=[
"configs/config.d/query_log.xml",
"configs/config.d/cluster.xml",
"configs/config.d/named_collections.xml", "configs/config.d/named_collections.xml",
"configs/config.d/filesystem_caches.xml", "configs/config.d/filesystem_caches.xml",
], ],
user_configs=["configs/users.d/users.xml"], user_configs=["configs/users.d/users.xml"],
with_minio=True, with_minio=True,
with_azurite=True, with_azurite=True,
stay_alive=True,
with_hdfs=with_hdfs, with_hdfs=with_hdfs,
stay_alive=True,
)
cluster.add_instance(
"node2",
main_configs=[
"configs/config.d/query_log.xml",
"configs/config.d/cluster.xml",
"configs/config.d/named_collections.xml",
"configs/config.d/filesystem_caches.xml",
],
user_configs=["configs/users.d/users.xml"],
stay_alive=True,
)
cluster.add_instance(
"node3",
main_configs=[
"configs/config.d/query_log.xml",
"configs/config.d/cluster.xml",
"configs/config.d/named_collections.xml",
"configs/config.d/filesystem_caches.xml",
],
user_configs=["configs/users.d/users.xml"],
stay_alive=True,
) )
logging.info("Starting cluster...") logging.info("Starting cluster...")
@ -182,6 +206,7 @@ def get_creation_expression(
cluster, cluster,
format="Parquet", format="Parquet",
table_function=False, table_function=False,
run_on_cluster=False,
**kwargs, **kwargs,
): ):
if storage_type == "s3": if storage_type == "s3":
@@ -189,7 +214,11 @@ def get_creation_expression(
            bucket = kwargs["bucket"]
        else:
            bucket = cluster.minio_bucket
        print(bucket)
        if run_on_cluster:
            assert table_function
            return f"icebergS3Cluster('cluster_simple', s3, filename = 'iceberg_data/default/{table_name}/', format={format}, url = 'http://minio1:9001/{bucket}/')"
        else:
            if table_function:
                return f"icebergS3(s3, filename = 'iceberg_data/default/{table_name}/', format={format}, url = 'http://minio1:9001/{bucket}/')"
            else:
@@ -197,7 +226,14 @@ def get_creation_expression(
                    DROP TABLE IF EXISTS {table_name};
                    CREATE TABLE {table_name}
                    ENGINE=IcebergS3(s3, filename = 'iceberg_data/default/{table_name}/', format={format}, url = 'http://minio1:9001/{bucket}/')"""
    elif storage_type == "azure":
        if run_on_cluster:
            assert table_function
            return f"""
                icebergAzureCluster('cluster_simple', azure, container = '{cluster.azure_container_name}', storage_account_url = '{cluster.env_variables["AZURITE_STORAGE_ACCOUNT_URL"]}', blob_path = '/iceberg_data/default/{table_name}/', format={format})
            """
        else:
            if table_function:
                return f"""
                    icebergAzure(azure, container = '{cluster.azure_container_name}', storage_account_url = '{cluster.env_variables["AZURITE_STORAGE_ACCOUNT_URL"]}', blob_path = '/iceberg_data/default/{table_name}/', format={format})
@@ -207,7 +243,14 @@ def get_creation_expression(
                    DROP TABLE IF EXISTS {table_name};
                    CREATE TABLE {table_name}
                    ENGINE=IcebergAzure(azure, container = {cluster.azure_container_name}, storage_account_url = '{cluster.env_variables["AZURITE_STORAGE_ACCOUNT_URL"]}', blob_path = '/iceberg_data/default/{table_name}/', format={format})"""
    elif storage_type == "hdfs":
        if run_on_cluster:
            assert table_function
            return f"""
                icebergHDFSCluster('cluster_simple', hdfs, filename= 'iceberg_data/default/{table_name}/', format={format}, url = 'hdfs://hdfs1:9000/')
            """
        else:
            if table_function:
                return f"""
                    icebergHDFS(hdfs, filename= 'iceberg_data/default/{table_name}/', format={format}, url = 'hdfs://hdfs1:9000/')
@@ -217,7 +260,10 @@ def get_creation_expression(
                    DROP TABLE IF EXISTS {table_name};
                    CREATE TABLE {table_name}
                    ENGINE=IcebergHDFS(hdfs, filename = 'iceberg_data/default/{table_name}/', format={format}, url = 'hdfs://hdfs1:9000/');"""

    elif storage_type == "local":
        assert not run_on_cluster

        if table_function:
            return f"""
                icebergLocal(local, path = '/iceberg_data/default/{table_name}/', format={format})
@@ -227,6 +273,7 @@ def get_creation_expression(
                DROP TABLE IF EXISTS {table_name};
                CREATE TABLE {table_name}
                ENGINE=IcebergLocal(local, path = '/iceberg_data/default/{table_name}/', format={format});"""

    else:
        raise Exception(f"Unknown iceberg storage type: {storage_type}")
@@ -492,6 +539,108 @@ def test_types(started_cluster, format_version, storage_type):
    )


@pytest.mark.parametrize("format_version", ["1", "2"])
@pytest.mark.parametrize("storage_type", ["s3", "azure", "hdfs"])
def test_cluster_table_function(started_cluster, format_version, storage_type):
    if is_arm() and storage_type == "hdfs":
        pytest.skip("Disabled test IcebergHDFS for aarch64")

    instance = started_cluster.instances["node1"]
    spark = started_cluster.spark_session

    TABLE_NAME = (
        "test_iceberg_cluster_"
        + format_version
        + "_"
        + storage_type
        + "_"
        + get_uuid_str()
    )

    def add_df(mode):
        write_iceberg_from_df(
            spark,
            generate_data(spark, 0, 100),
            TABLE_NAME,
            mode=mode,
            format_version=format_version,
        )

        files = default_upload_directory(
            started_cluster,
            storage_type,
            f"/iceberg_data/default/{TABLE_NAME}/",
            f"/iceberg_data/default/{TABLE_NAME}/",
        )

        logging.info(f"Adding another dataframe. result files: {files}")

        return files

    files = add_df(mode="overwrite")
    for i in range(1, len(started_cluster.instances)):
        files = add_df(mode="append")

    logging.info(f"Setup complete. files: {files}")
    assert len(files) == 5 + 4 * (len(started_cluster.instances) - 1)

    clusters = instance.query(f"SELECT * FROM system.clusters")
    logging.info(f"Clusters setup: {clusters}")

    # Regular Query only node1
    table_function_expr = get_creation_expression(
        storage_type, TABLE_NAME, started_cluster, table_function=True
    )
    select_regular = (
        instance.query(f"SELECT * FROM {table_function_expr}").strip().split()
    )

    # Cluster Query with node1 as coordinator
    table_function_expr_cluster = get_creation_expression(
        storage_type,
        TABLE_NAME,
        started_cluster,
        table_function=True,
        run_on_cluster=True,
    )
    select_cluster = (
        instance.query(f"SELECT * FROM {table_function_expr_cluster}").strip().split()
    )

    # Simple size check
    assert len(select_regular) == 600
    assert len(select_cluster) == 600

    # Actual check
    assert select_cluster == select_regular

    # Check query_log
    for replica in started_cluster.instances.values():
        replica.query("SYSTEM FLUSH LOGS")

    for node_name, replica in started_cluster.instances.items():
        cluster_secondary_queries = (
            replica.query(
                f"""
                SELECT query, type, is_initial_query, read_rows, read_bytes FROM system.query_log
                WHERE
                    type = 'QueryStart' AND
                    positionCaseInsensitive(query, '{storage_type}Cluster') != 0 AND
                    position(query, '{TABLE_NAME}') != 0 AND
                    position(query, 'system.query_log') = 0 AND
                    NOT is_initial_query
            """
            )
            .strip()
            .split("\n")
        )

        logging.info(
            f"[{node_name}] cluster_secondary_queries: {cluster_secondary_queries}"
        )
        assert len(cluster_secondary_queries) == 1


@pytest.mark.parametrize("format_version", ["1", "2"])
@pytest.mark.parametrize("storage_type", ["s3", "azure", "hdfs", "local"])
def test_delete_files(started_cluster, format_version, storage_type):
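Each `*Cluster` call built by `get_creation_expression` passes the cluster name (`'cluster_simple'`) as the first argument and otherwise mirrors the single-node call, and the `system.query_log` check in `test_cluster_table_function` finds the resulting secondary queries with `positionCaseInsensitive(query, '{storage_type}Cluster') != 0`. The sketch below reproduces only that string matching in plain Python, to show why `s3`, `azure` and `hdfs` each match their cluster function while the plain single-node call does not. `cluster_function_name` and `matches_query_log_filter` are hypothetical helpers, not part of the test file.

```python
# Hypothetical helpers (not in the test file). The mapping mirrors the table
# function names that get_creation_expression emits for each storage_type.
SINGLE_NODE_FUNCTIONS = {
    "s3": "icebergS3",
    "azure": "icebergAzure",
    "hdfs": "icebergHDFS",
}


def cluster_function_name(storage_type: str) -> str:
    """Return the cluster variant, e.g. 's3' -> 'icebergS3Cluster'."""
    return SINGLE_NODE_FUNCTIONS[storage_type] + "Cluster"


def matches_query_log_filter(query: str, storage_type: str) -> bool:
    """Mimic positionCaseInsensitive(query, '{storage_type}Cluster') != 0.

    ClickHouse position functions are 1-based and return 0 when the needle is
    absent, so "!= 0" amounts to a case-insensitive substring test.
    """
    return f"{storage_type}Cluster".lower() in query.lower()


if __name__ == "__main__":
    for storage_type in SINGLE_NODE_FUNCTIONS:
        name = cluster_function_name(storage_type)
        query = f"SELECT * FROM {name}('cluster_simple', {storage_type})"
        # Prints True for every storage type: 's3cluster' is a substring of
        # 'icebergs3cluster', 'hdfscluster' of 'iceberghdfscluster', and so on.
        print(storage_type, name, matches_query_log_filter(query, storage_type))
```

Matching case-insensitively is what lets the lowercase `storage_type` values (`s3`, `hdfs`) find the mixed-case function names (`icebergS3Cluster`, `icebergHDFSCluster`) without a separate lookup table in the SQL.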

@@ -1,3 +1,6 @@
SELECT arrayWithConstant(96142475, ['qMUF']); -- { serverError TOO_LARGE_ARRAY_SIZE }
SELECT arrayWithConstant(100000000, materialize([[[[[[[[[['Hello, world!']]]]]]]]]])); -- { serverError TOO_LARGE_ARRAY_SIZE }
SELECT length(arrayWithConstant(10000000, materialize([[[[[[[[[['Hello world']]]]]]]]]])));

CREATE TEMPORARY TABLE args (value Array(Int)) ENGINE=Memory AS SELECT [1, 1, 1, 1] as value FROM numbers(1, 100);
SELECT length(arrayWithConstant(1000000, value)) FROM args FORMAT NULL;

@@ -0,0 +1,29 @@
-- compact
DROP TABLE IF EXISTS users;
CREATE TABLE users (
uid Int16,
name String,
age Int16,
projection p1 (select age, count() group by age),
) ENGINE = MergeTree order by uid
SETTINGS lightweight_mutation_projection_mode = 'rebuild', min_bytes_for_wide_part = 10485760;
INSERT INTO users VALUES (1231, 'John', 33), (1232, 'Mary', 34);
DELETE FROM users WHERE age = 34;
-- wide
DROP TABLE IF EXISTS users;
CREATE TABLE users (
uid Int16,
name String,
age Int16,
projection p1 (select age, count() group by age),
) ENGINE = MergeTree order by uid
SETTINGS lightweight_mutation_projection_mode = 'rebuild', min_bytes_for_wide_part = 0;
INSERT INTO users VALUES (1231, 'John', 33), (1232, 'Mary', 34);
DELETE FROM users WHERE age = 34;
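The two blocks differ only in `min_bytes_for_wide_part`: with 10485760 (10 MiB) the two-row insert stays below the threshold and lands in a Compact part, while 0 forces a Wide part, so the lightweight `DELETE` with `lightweight_mutation_projection_mode = 'rebuild'` runs against both layouts. The sketch below is a simplification that assumes the documented rule (a part is Compact when its uncompressed size or row count is below `min_bytes_for_wide_part` or `min_rows_for_wide_part`) and an assumed default of 0 for `min_rows_for_wide_part`; `choose_part_format` is a hypothetical name, not a ClickHouse API.

```python
# Simplified sketch of how MergeTree picks a data part format. The function name
# is hypothetical; min_rows_for_wide_part is assumed to keep its default of 0.
def choose_part_format(
    uncompressed_bytes: int,
    rows: int,
    min_bytes_for_wide_part: int,
    min_rows_for_wide_part: int = 0,
) -> str:
    # Below either threshold: Compact (all columns stored in one file).
    # Otherwise: Wide (each column stored in its own file).
    if uncompressed_bytes < min_bytes_for_wide_part or rows < min_rows_for_wide_part:
        return "Compact"
    return "Wide"


if __name__ == "__main__":
    # Illustrative size only; the real size of the two-row part is not measured here.
    tiny_part_bytes = 64
    print(choose_part_format(tiny_part_bytes, 2, min_bytes_for_wide_part=10485760))  # Compact
    print(choose_part_format(tiny_part_bytes, 2, min_bytes_for_wide_part=0))  # Wide
```

Setting the byte threshold to 0 is the usual way to force Wide parts in small tests, since no part can be smaller than zero bytes.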

@@ -244,7 +244,10 @@ Deduplication
DefaultTableEngine
DelayedInserts
DeliveryTag
Deltalake
DeltaLake
deltalakeCluster
deltaLakeCluster
Denormalize
DestroyAggregatesThreads
DestroyAggregatesThreadsActive
@@ -377,10 +380,15 @@ Homebrew's
HorizontalDivide
Hostname
HouseOps
hudi
Hudi
hudiCluster
HudiCluster
HyperLogLog
Hypot
IANA
icebergCluster
IcebergCluster
IDE
IDEs
IDNA