Merge c9e674a01e into e0f8b8d351

Merge pull request #70458 from ClickHouse/fix-ephemeral-comment
Fix ephemeral column comment
2024-11-21 15:12:02 +00:00 · 2024-11-21 07:06:34 +01:00 · 2024-11-21 05:10:11 +00:00 · 2024-11-21 05:06:51 +00:00 · 2024-11-21 05:05:41 +00:00 · 2024-11-21 05:03:24 +00:00
29 changed files with 547 additions and 62 deletions
--- a/docker/server/README.md
+++ b/docker/server/README.md
@ -16,16 +16,18 @@ ClickHouse works 100-1000x faster than traditional database management systems,

 For more information and documentation see https://clickhouse.com/.

-<!-- This is not related to the docker official library, remove it before commit to https://github.com/docker-library/docs -->
 ## Versions

 -	The `latest` tag points to the latest release of the latest stable branch.
 -	Branch tags like `22.2` point to the latest release of the corresponding branch.
-	Full version tags like `22.2.3.5` point to the corresponding release.
+-	Full version tags like `22.2.3` and `22.2.3.5` point to the corresponding release.
+<!-- docker-official-library:off -->
+<!-- This is not related to the docker official library, remove it before commit to https://github.com/docker-library/docs -->
 -	The tag `head` is built from the latest commit to the default branch.
 -	Each tag has optional `-alpine` suffix to reflect that it's built on top of `alpine`.

 <!-- REMOVE UNTIL HERE -->
+<!-- docker-official-library:on -->
 ### Compatibility

 -	The amd64 image requires support for [SSE3 instructions](https://en.wikipedia.org/wiki/SSE3). Virtually all x86 CPUs after 2005 support SSE3.
--- a/docker/server/README.src/content.md
+++ b/docker/server/README.src/content.md
@ -10,16 +10,18 @@ ClickHouse works 100-1000x faster than traditional database management systems,

 For more information and documentation see https://clickhouse.com/.

-<!-- This is not related to the docker official library, remove it before commit to https://github.com/docker-library/docs -->
 ## Versions

 -	The `latest` tag points to the latest release of the latest stable branch.
 -	Branch tags like `22.2` point to the latest release of the corresponding branch.
-	Full version tags like `22.2.3.5` point to the corresponding release.
+-	Full version tags like `22.2.3` and `22.2.3.5` point to the corresponding release.
+<!-- docker-official-library:off -->
+<!-- This is not related to the docker official library, remove it before commit to https://github.com/docker-library/docs -->
 -	The tag `head` is built from the latest commit to the default branch.
 -	Each tag has optional `-alpine` suffix to reflect that it's built on top of `alpine`.

 <!-- REMOVE UNTIL HERE -->
+<!-- docker-official-library:on -->
 ### Compatibility

 -	The amd64 image requires support for [SSE3 instructions](https://en.wikipedia.org/wiki/SSE3). Virtually all x86 CPUs after 2005 support SSE3.
--- a/docs/changelogs/v24.8.1.2684-lts.md
+++ b/docs/changelogs/v24.8.1.2684-lts.md
@ -522,4 +522,3 @@ sidebar_label: 2024
 * Backported in [#68518](https://github.com/ClickHouse/ClickHouse/issues/68518): Minor update in Dynamic/JSON serializations. [#68459](https://github.com/ClickHouse/ClickHouse/pull/68459) ([Kruglov Pavel](https://github.com/Avogar)).
 * Backported in [#68558](https://github.com/ClickHouse/ClickHouse/issues/68558): CI: Minor release workflow fix. [#68536](https://github.com/ClickHouse/ClickHouse/pull/68536) ([Max K.](https://github.com/maxknv)).
 * Backported in [#68576](https://github.com/ClickHouse/ClickHouse/issues/68576): CI: Tidy build timeout from 2h to 3h. [#68567](https://github.com/ClickHouse/ClickHouse/pull/68567) ([Max K.](https://github.com/maxknv)).
-
--- a/docs/changelogs/v24.9.1.3278-stable.md
+++ b/docs/changelogs/v24.9.1.3278-stable.md
@ -497,4 +497,3 @@ sidebar_label: 2024
 * Backported in [#69899](https://github.com/ClickHouse/ClickHouse/issues/69899): Revert "Merge pull request [#69032](https://github.com/ClickHouse/ClickHouse/issues/69032) from alexon1234/include_real_time_execution_in_http_header". [#69885](https://github.com/ClickHouse/ClickHouse/pull/69885) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
 * Backported in [#69931](https://github.com/ClickHouse/ClickHouse/issues/69931): RIPE is an acronym and thus should be capital. RIPE stands for **R**ACE **I**ntegrity **P**rimitives **E**valuation and RACE stands for **R**esearch and Development in **A**dvanced **C**ommunications **T**echnologies in **E**urope. [#69901](https://github.com/ClickHouse/ClickHouse/pull/69901) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
 * Backported in [#70034](https://github.com/ClickHouse/ClickHouse/issues/70034): Revert "Add RIPEMD160 function". [#70005](https://github.com/ClickHouse/ClickHouse/pull/70005) ([Robert Schulze](https://github.com/rschu1ze)).
-
--- a/docs/en/sql-reference/table-functions/deltalake.md
+++ b/docs/en/sql-reference/table-functions/deltalake.md
@ -49,4 +49,4 @@ LIMIT 2
 **See Also**

 - [DeltaLake engine](/docs/en/engines/table-engines/integrations/deltalake.md)
-
+- [DeltaLake cluster table function](/docs/en/sql-reference/table-functions/deltalakeCluster.md)
--- a/docs/en/sql-reference/table-functions/deltalakeCluster.md
+++ b/docs/en/sql-reference/table-functions/deltalakeCluster.md
@ -0,0 +1,30 @@
+---
+slug: /en/sql-reference/table-functions/deltalakeCluster
+sidebar_position: 46
+sidebar_label: deltaLakeCluster
+title: "deltaLakeCluster Table Function"
+---
+This is an extension to the [deltaLake](/docs/en/sql-reference/table-functions/deltalake.md) table function.
+
+Allows processing files from [Delta Lake](https://github.com/delta-io/delta) tables in Amazon S3 in parallel from many nodes in a specified cluster. The initiator creates a connection to every node in the cluster and dispatches files dynamically: each worker node asks the initiator for the next task, processes it, and repeats until all tasks are finished.
+
+**Syntax**
+
+``` sql
+deltaLakeCluster(cluster_name, url [,aws_access_key_id, aws_secret_access_key] [,format] [,structure] [,compression])
+```
+
+**Arguments**
+
+- `cluster_name` — Name of a cluster that is used to build a set of addresses and connection parameters to remote and local servers.
+
+- All other arguments are the same as for the equivalent [deltaLake](/docs/en/sql-reference/table-functions/deltalake.md) table function.
+
+**Returned value**
+
+A table with the specified structure for reading data from the specified Delta Lake table in S3, with the work distributed across the cluster.
+
+**See Also**
+
+- [DeltaLake engine](/docs/en/engines/table-engines/integrations/deltalake.md)
+- [deltaLake table function](/docs/en/sql-reference/table-functions/deltalake.md)
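
The pull-based dispatch described in the new `deltaLakeCluster`, `hudiCluster`, and `icebergCluster` pages can be illustrated with a minimal sketch. This is not ClickHouse's implementation: `run_cluster_read`, the file names, and the use of threads as stand-ins for worker nodes are all assumptions made for illustration. It only shows the idea that workers ask for the next file until none are left, so each file is processed exactly once.

```python
import queue
import threading


def run_cluster_read(files, worker_names):
    """Hypothetical sketch: hand out files on demand, return {worker: [files]}."""
    # The "initiator" holds the full task list.
    tasks = queue.Queue()
    for f in files:
        tasks.put(f)

    processed = {name: [] for name in worker_names}

    def worker(name):
        while True:
            try:
                # The worker asks the initiator for the next task.
                f = tasks.get_nowait()
            except queue.Empty:
                return  # no tasks left, the worker is done
            processed[name].append(f)  # stand-in for actually reading the file

    threads = [threading.Thread(target=worker, args=(n,)) for n in worker_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed


result = run_cluster_read(
    [f"part-{i}.parquet" for i in range(10)],
    ["node1", "node2", "node3"],
)
all_files = sorted(f for fs in result.values() for f in fs)
```

How many files each worker ends up with depends on timing, which is the point of dynamic dispatch: a faster node simply asks for more tasks.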
--- a/docs/en/sql-reference/table-functions/hudi.md
+++ b/docs/en/sql-reference/table-functions/hudi.md
@ -29,4 +29,4 @@ A table with the specified structure for reading data in the specified Hudi tabl
 **See Also**

 - [Hudi engine](/docs/en/engines/table-engines/integrations/hudi.md)
-
+- [Hudi cluster table function](/docs/en/sql-reference/table-functions/hudiCluster.md)
--- a/docs/en/sql-reference/table-functions/hudiCluster.md
+++ b/docs/en/sql-reference/table-functions/hudiCluster.md
@ -0,0 +1,30 @@
+---
+slug: /en/sql-reference/table-functions/hudiCluster
+sidebar_position: 86
+sidebar_label: hudiCluster
+title: "hudiCluster Table Function"
+---
+This is an extension to the [hudi](/docs/en/sql-reference/table-functions/hudi.md) table function.
+
+Allows processing files from Apache [Hudi](https://hudi.apache.org/) tables in Amazon S3 in parallel from many nodes in a specified cluster. The initiator creates a connection to every node in the cluster and dispatches files dynamically: each worker node asks the initiator for the next task, processes it, and repeats until all tasks are finished.
+
+**Syntax**
+
+``` sql
+hudiCluster(cluster_name, url [,aws_access_key_id, aws_secret_access_key] [,format] [,structure] [,compression])
+```
+
+**Arguments**
+
+- `cluster_name` — Name of a cluster that is used to build a set of addresses and connection parameters to remote and local servers.
+
+- All other arguments are the same as for the equivalent [hudi](/docs/en/sql-reference/table-functions/hudi.md) table function.
+
+**Returned value**
+
+A table with the specified structure for reading data from the specified Hudi table in S3, with the work distributed across the cluster.
+
+**See Also**
+
+- [Hudi engine](/docs/en/engines/table-engines/integrations/hudi.md)
+- [Hudi table function](/docs/en/sql-reference/table-functions/hudi.md)
--- a/docs/en/sql-reference/table-functions/iceberg.md
+++ b/docs/en/sql-reference/table-functions/iceberg.md
@ -72,3 +72,4 @@ Table function `iceberg` is an alias to `icebergS3` now.
 **See Also**

 - [Iceberg engine](/docs/en/engines/table-engines/integrations/iceberg.md)
+- [Iceberg cluster table function](/docs/en/sql-reference/table-functions/icebergCluster.md)
--- a/docs/en/sql-reference/table-functions/icebergCluster.md
+++ b/docs/en/sql-reference/table-functions/icebergCluster.md
@ -0,0 +1,43 @@
+---
+slug: /en/sql-reference/table-functions/icebergCluster
+sidebar_position: 91
+sidebar_label: icebergCluster
+title: "icebergCluster Table Function"
+---
+This is an extension to the [iceberg](/docs/en/sql-reference/table-functions/iceberg.md) table function.
+
+Allows processing files from Apache [Iceberg](https://iceberg.apache.org/) in parallel from many nodes in a specified cluster. The initiator creates a connection to every node in the cluster and dispatches files dynamically: each worker node asks the initiator for the next task, processes it, and repeats until all tasks are finished.
+
+**Syntax**
+
+``` sql
+icebergS3Cluster(cluster_name, url [, NOSIGN | access_key_id, secret_access_key, [session_token]] [,format] [,compression_method])
+icebergS3Cluster(cluster_name, named_collection[, option=value [,..]])
+
+icebergAzureCluster(cluster_name, connection_string|storage_account_url, container_name, blobpath, [,account_name], [,account_key] [,format] [,compression_method])
+icebergAzureCluster(cluster_name, named_collection[, option=value [,..]])
+
+icebergHDFSCluster(cluster_name, path_to_table, [,format] [,compression_method])
+icebergHDFSCluster(cluster_name, named_collection[, option=value [,..]])
+```
+
+**Arguments**
+
+- `cluster_name` — Name of a cluster that is used to build a set of addresses and connection parameters to remote and local servers.
+
+- All other arguments are the same as for the equivalent [iceberg](/docs/en/sql-reference/table-functions/iceberg.md) table function.
+
+**Returned value**
+
+A table with the specified structure for reading data from the specified Iceberg table, with the work distributed across the cluster.
+
+**Examples**
+
+```sql
+SELECT * FROM icebergS3Cluster('cluster_simple', 'http://test.s3.amazonaws.com/clickhouse-bucket/test_table', 'test', 'test')
+```
+
+**See Also**
+
+- [Iceberg engine](/docs/en/engines/table-engines/integrations/iceberg.md)
+- [Iceberg table function](/docs/en/sql-reference/table-functions/iceberg.md)
--- a/programs/server/dashboard.html
+++ b/programs/server/dashboard.html
@ -476,7 +476,7 @@
            <input id="edit" type="button" value="✎" style="display: none;">
            <input id="add" type="button" value="Add chart" style="display: none;">
            <input id="reload" type="button" value="Reload">
-            <span id="search-span" class="nowrap" style="display: none;"><input id="search" type="button" value="🔎" title="Run query to obtain list of charts from ClickHouse"><input id="search-query" name="search" type="text" spellcheck="false"></span>
+            <span id="search-span" class="nowrap" style="display: none;"><input id="search" type="button" value="🔎" title="Run query to obtain list of charts from ClickHouse. Either select dashboard name or write your own query"><input id="search-query" name="search" list="search-options" type="text" spellcheck="false"><datalist id="search-options"></datalist></span>
            <div id="chart-params"></div>
        </div>
    </form>
@ -532,9 +532,15 @@ const errorMessages = [
    }
 ]

+/// Dashboard selector
+const dashboardSearchQuery = (dashboard_name) => `SELECT title, query FROM system.dashboards WHERE dashboard = '${dashboard_name}'`;
+let dashboard_queries = {
+    "Overview": dashboardSearchQuery("Overview"),
+};
+const default_dashboard = 'Overview';

 /// Query to fill `queries` list for the dashboard
-let search_query = `SELECT title, query FROM system.dashboards WHERE dashboard = 'Overview'`;
+let search_query = dashboardSearchQuery(default_dashboard);
 let customized = false;
 let queries = [];

@ -1439,7 +1445,7 @@ async function reloadAll(do_search) {
    try {
        updateParams();
        if (do_search) {
-            search_query = document.getElementById('search-query').value;
+            search_query = toSearchQuery(document.getElementById('search-query').value);
            queries = [];
            refreshCustomized(false);
        }
@ -1504,7 +1510,7 @@ function updateFromState() {
    document.getElementById('url').value = host;
    document.getElementById('user').value = user;
    document.getElementById('password').value = password;
-    document.getElementById('search-query').value = search_query;
+    document.getElementById('search-query').value = fromSearchQuery(search_query);
    refreshCustomized();
 }

@ -1543,6 +1549,44 @@ if (window.location.hash) {
    } catch {}
 }

+function fromSearchQuery(query) {
+    for (const dashboard_name in dashboard_queries) {
+        if (query == dashboard_queries[dashboard_name])
+            return dashboard_name;
+    }
+    return query;
+}
+
+function toSearchQuery(value) {
+    if (value in dashboard_queries)
+        return dashboard_queries[value];
+    else
+        return value;
+}
+
+async function populateSearchOptions() {
+    let {reply, error} = await doFetch("SELECT dashboard FROM system.dashboards GROUP BY dashboard ORDER BY ALL");
+    if (error) {
+        throw new Error(error);
+    }
+    let data = reply.data;
+    if (data.dashboard.length == 0) {
+        console.log("Unable to fetch dashboards list");
+        return;
+    }
+    dashboard_queries = {};
+    for (let i = 0; i < data.dashboard.length; i++) {
+        const dashboard = data.dashboard[i];
+        dashboard_queries[dashboard] = dashboardSearchQuery(dashboard);
+    }
+    const searchOptions = document.getElementById('search-options');
+    for (const dashboard in dashboard_queries) {
+        const opt = document.createElement('option');
+        opt.value = dashboard;
+        searchOptions.appendChild(opt);
+    }
+}
+
 async function start() {
    try {
        updateFromState();
@ -1558,6 +1602,7 @@ async function start() {
        } else {
            drawAll();
        }
+        await populateSearchOptions();
    } catch (e) {
        showError(e.message);
    }
--- a/src/Analyzer/Resolve/IdentifierResolver.cpp
+++ b/src/Analyzer/Resolve/IdentifierResolver.cpp
@ -528,7 +528,7 @@ QueryTreeNodePtr IdentifierResolver::tryResolveIdentifierFromCompoundExpression(
  *
  * Resolve strategy:
  * 1. Try to bind identifier to scope argument name to node map.
-  * 2. If identifier is binded but expression context and node type are incompatible return nullptr.
+  * 2. If identifier is bound but expression context and node type are incompatible return nullptr.
  *
  * It is important to support edge cases, where we lookup for table or function node, but argument has same name.
  * Example: WITH (x -> x + 1) AS func, (func -> func(1) + func) AS lambda SELECT lambda(1);
--- a/src/Client/ReplxxLineReader.cpp
+++ b/src/Client/ReplxxLineReader.cpp
@ -362,7 +362,7 @@ ReplxxLineReader::ReplxxLineReader(
    if (highlighter)
        rx.set_highlighter_callback(highlighter);

-    /// By default C-p/C-n binded to COMPLETE_NEXT/COMPLETE_PREV,
+    /// By default C-p/C-n are bound to COMPLETE_NEXT/COMPLETE_PREV,
    /// bind C-p/C-n to history-previous/history-next like readline.
    rx.bind_key(Replxx::KEY::control('N'), [this](char32_t code) { return rx.invoke(Replxx::ACTION::HISTORY_NEXT, code); });
    rx.bind_key(Replxx::KEY::control('P'), [this](char32_t code) { return rx.invoke(Replxx::ACTION::HISTORY_PREVIOUS, code); });
@ -384,9 +384,9 @@ ReplxxLineReader::ReplxxLineReader(
    rx.bind_key(Replxx::KEY::control('J'), commit_action);
    rx.bind_key(Replxx::KEY::ENTER, commit_action);

-    /// By default COMPLETE_NEXT/COMPLETE_PREV was binded to C-p/C-n, re-bind
+    /// By default COMPLETE_NEXT/COMPLETE_PREV was bound to C-p/C-n, re-bind
    /// to M-P/M-N (that was used for HISTORY_COMMON_PREFIX_SEARCH before, but
-    /// it also binded to M-p/M-n).
+    /// it is also bound to M-p/M-n).
    rx.bind_key(Replxx::KEY::meta('N'), [this](char32_t code) { return rx.invoke(Replxx::ACTION::COMPLETE_NEXT, code); });
    rx.bind_key(Replxx::KEY::meta('P'), [this](char32_t code) { return rx.invoke(Replxx::ACTION::COMPLETE_PREVIOUS, code); });
    /// By default M-BACKSPACE is KILL_TO_WHITESPACE_ON_LEFT, while in readline it is backward-kill-word
--- a/src/Parsers/ParserCreateQuery.h
+++ b/src/Parsers/ParserCreateQuery.h
@ -237,6 +237,7 @@ bool IParserColumnDeclaration<NameParser>::parseImpl(Pos & pos, ASTPtr & node, E
            null_modifier.emplace(true);
    }

+    bool is_comment = false;
    /// Collate is also allowed after NULL/NOT NULL
    if (!collation_expression && s_collate.ignore(pos, expected)
        && !collation_parser.parse(pos, collation_expression, expected))
@ -254,7 +255,9 @@ bool IParserColumnDeclaration<NameParser>::parseImpl(Pos & pos, ASTPtr & node, E
    else if (s_ephemeral.ignore(pos, expected))
    {
        default_specifier = s_ephemeral.getName();
-        if (!expr_parser.parse(pos, default_expression, expected) && type)
+        if (s_comment.ignore(pos, expected))
+            is_comment = true;
+        if ((is_comment || !expr_parser.parse(pos, default_expression, expected)) && type)
        {
            ephemeral_default = true;

@ -289,6 +292,8 @@ bool IParserColumnDeclaration<NameParser>::parseImpl(Pos & pos, ASTPtr & node, E
    if (require_type && !type && !default_expression)
        return false; /// reject column name without type

+    if (!is_comment)
+    {
        if ((type || default_expression) && allow_null_modifiers && !null_modifier.has_value())
        {
            if (s_not.ignore(pos, expected))
@ -300,8 +305,9 @@ bool IParserColumnDeclaration<NameParser>::parseImpl(Pos & pos, ASTPtr & node, E
            else if (s_null.ignore(pos, expected))
                null_modifier.emplace(true);
        }
+    }

-    if (s_comment.ignore(pos, expected))
+    if (is_comment || s_comment.ignore(pos, expected))
    {
        /// should be followed by a string literal
        if (!string_literal_parser.parse(pos, comment_expression, expected))
--- a/src/TableFunctions/TableFunctionObjectStorage.cpp
+++ b/src/TableFunctions/TableFunctionObjectStorage.cpp
@ -226,6 +226,26 @@ template class TableFunctionObjectStorage<HDFSClusterDefinition, StorageHDFSConf
 #endif
 template class TableFunctionObjectStorage<LocalDefinition, StorageLocalConfiguration>;

+#if USE_AVRO && USE_AWS_S3
+template class TableFunctionObjectStorage<IcebergS3ClusterDefinition, StorageS3IcebergConfiguration>;
+#endif
+
+#if USE_AVRO && USE_AZURE_BLOB_STORAGE
+template class TableFunctionObjectStorage<IcebergAzureClusterDefinition, StorageAzureIcebergConfiguration>;
+#endif
+
+#if USE_AVRO && USE_HDFS
+template class TableFunctionObjectStorage<IcebergHDFSClusterDefinition, StorageHDFSIcebergConfiguration>;
+#endif
+
+#if USE_PARQUET && USE_AWS_S3
+template class TableFunctionObjectStorage<DeltaLakeClusterDefinition, StorageS3DeltaLakeConfiguration>;
+#endif
+
+#if USE_AWS_S3
+template class TableFunctionObjectStorage<HudiClusterDefinition, StorageS3HudiConfiguration>;
+#endif
+
 #if USE_AVRO
 void registerTableFunctionIceberg(TableFunctionFactory & factory)
 {
--- a/src/TableFunctions/TableFunctionObjectStorageCluster.cpp
+++ b/src/TableFunctions/TableFunctionObjectStorageCluster.cpp
@ -96,7 +96,7 @@ void registerTableFunctionObjectStorageCluster(TableFunctionFactory & factory)
    {
        .documentation = {
            .description=R"(The table function can be used to read the data stored on HDFS in parallel for many nodes in a specified cluster.)",
-            .examples{{"HDFSCluster", "SELECT * FROM HDFSCluster(cluster_name, uri, format)", ""}}},
+            .examples{{"HDFSCluster", "SELECT * FROM HDFSCluster(cluster, uri, format)", ""}}},
            .allow_readonly = false
        }
    );
@ -105,15 +105,77 @@ void registerTableFunctionObjectStorageCluster(TableFunctionFactory & factory)
    UNUSED(factory);
 }

+
+#if USE_AVRO
+void registerTableFunctionIcebergCluster(TableFunctionFactory & factory)
+{
+    UNUSED(factory);
+
 #if USE_AWS_S3
-template class TableFunctionObjectStorageCluster<S3ClusterDefinition, StorageS3Configuration>;
+    factory.registerFunction<TableFunctionIcebergS3Cluster>(
+        {.documentation
+         = {.description = R"(The table function can be used to read the Iceberg table stored on S3 object store in parallel for many nodes in a specified cluster.)",
+            .examples{{"icebergS3Cluster", "SELECT * FROM icebergS3Cluster(cluster, url, [, NOSIGN | access_key_id, secret_access_key, [session_token]], format, [,compression])", ""}},
+            .categories{"DataLake"}},
+         .allow_readonly = false});
 #endif

 #if USE_AZURE_BLOB_STORAGE
-template class TableFunctionObjectStorageCluster<AzureClusterDefinition, StorageAzureConfiguration>;
+    factory.registerFunction<TableFunctionIcebergAzureCluster>(
+        {.documentation
+         = {.description = R"(The table function can be used to read the Iceberg table stored on Azure object store in parallel for many nodes in a specified cluster.)",
+            .examples{{"icebergAzureCluster", "SELECT * FROM icebergAzureCluster(cluster, connection_string|storage_account_url, container_name, blobpath, [account_name, account_key, format, compression])", ""}},
+            .categories{"DataLake"}},
+         .allow_readonly = false});
 #endif

 #if USE_HDFS
-template class TableFunctionObjectStorageCluster<HDFSClusterDefinition, StorageHDFSConfiguration>;
+    factory.registerFunction<TableFunctionIcebergHDFSCluster>(
+        {.documentation
+         = {.description = R"(The table function can be used to read the Iceberg table stored on HDFS virtual filesystem in parallel for many nodes in a specified cluster.)",
+            .examples{{"icebergHDFSCluster", "SELECT * FROM icebergHDFSCluster(cluster, uri, [format], [structure], [compression_method])", ""}},
+            .categories{"DataLake"}},
+         .allow_readonly = false});
 #endif
 }
+#endif
+
+#if USE_AWS_S3
+#if USE_PARQUET
+void registerTableFunctionDeltaLakeCluster(TableFunctionFactory & factory)
+{
+    factory.registerFunction<TableFunctionDeltaLakeCluster>(
+        {.documentation
+         = {.description = R"(The table function can be used to read the DeltaLake table stored on object store in parallel for many nodes in a specified cluster.)",
+            .examples{{"deltaLakeCluster", "SELECT * FROM deltaLakeCluster(cluster, url, access_key_id, secret_access_key)", ""}},
+            .categories{"DataLake"}},
+         .allow_readonly = false});
+}
+#endif
+
+void registerTableFunctionHudiCluster(TableFunctionFactory & factory)
+{
+    factory.registerFunction<TableFunctionHudiCluster>(
+        {.documentation
+         = {.description = R"(The table function can be used to read the Hudi table stored on object store in parallel for many nodes in a specified cluster.)",
+            .examples{{"hudiCluster", "SELECT * FROM hudiCluster(cluster, url, access_key_id, secret_access_key)", ""}},
+            .categories{"DataLake"}},
+         .allow_readonly = false});
+}
+#endif
+
+void registerDataLakeClusterTableFunctions(TableFunctionFactory & factory)
+{
+    UNUSED(factory);
+#if USE_AVRO
+    registerTableFunctionIcebergCluster(factory);
+#endif
+#if USE_AWS_S3
+#if USE_PARQUET
+    registerTableFunctionDeltaLakeCluster(factory);
+#endif
+    registerTableFunctionHudiCluster(factory);
+#endif
+}
+
+}
--- a/src/TableFunctions/TableFunctionObjectStorageCluster.h
+++ b/src/TableFunctions/TableFunctionObjectStorageCluster.h
@ -33,6 +33,36 @@ struct HDFSClusterDefinition
    static constexpr auto storage_type_name = "HDFSCluster";
 };

+struct IcebergS3ClusterDefinition
+{
+    static constexpr auto name = "icebergS3Cluster";
+    static constexpr auto storage_type_name = "IcebergS3Cluster";
+};
+
+struct IcebergAzureClusterDefinition
+{
+    static constexpr auto name = "icebergAzureCluster";
+    static constexpr auto storage_type_name = "IcebergAzureCluster";
+};
+
+struct IcebergHDFSClusterDefinition
+{
+    static constexpr auto name = "icebergHDFSCluster";
+    static constexpr auto storage_type_name = "IcebergHDFSCluster";
+};
+
+struct DeltaLakeClusterDefinition
+{
+    static constexpr auto name = "deltaLakeCluster";
+    static constexpr auto storage_type_name = "DeltaLakeS3Cluster";
+};
+
+struct HudiClusterDefinition
+{
+    static constexpr auto name = "hudiCluster";
+    static constexpr auto storage_type_name = "HudiS3Cluster";
+};
+
 /**
 * Class implementing s3/hdfs/azureBlobStorageCluster(...) table functions,
 * which allow to process many files from S3/HDFS/Azure blob storage on a specific cluster.
@ -79,4 +109,25 @@ using TableFunctionAzureBlobCluster = TableFunctionObjectStorageCluster<AzureClu
 #if USE_HDFS
 using TableFunctionHDFSCluster = TableFunctionObjectStorageCluster<HDFSClusterDefinition, StorageHDFSConfiguration>;
 #endif
+
+#if USE_AVRO && USE_AWS_S3
+using TableFunctionIcebergS3Cluster = TableFunctionObjectStorageCluster<IcebergS3ClusterDefinition, StorageS3IcebergConfiguration>;
+#endif
+
+#if USE_AVRO && USE_AZURE_BLOB_STORAGE
+using TableFunctionIcebergAzureCluster = TableFunctionObjectStorageCluster<IcebergAzureClusterDefinition, StorageAzureIcebergConfiguration>;
+#endif
+
+#if USE_AVRO && USE_HDFS
+using TableFunctionIcebergHDFSCluster = TableFunctionObjectStorageCluster<IcebergHDFSClusterDefinition, StorageHDFSIcebergConfiguration>;
+#endif
+
+#if USE_AWS_S3 && USE_PARQUET
+using TableFunctionDeltaLakeCluster = TableFunctionObjectStorageCluster<DeltaLakeClusterDefinition, StorageS3DeltaLakeConfiguration>;
+#endif
+
+#if USE_AWS_S3
+using TableFunctionHudiCluster = TableFunctionObjectStorageCluster<HudiClusterDefinition, StorageS3HudiConfiguration>;
+#endif
+
 }
--- a/src/TableFunctions/registerTableFunctions.cpp
+++ b/src/TableFunctions/registerTableFunctions.cpp
@ -66,6 +66,7 @@ void registerTableFunctions(bool use_legacy_mongodb_integration [[maybe_unused]]
    registerTableFunctionObjectStorage(factory);
    registerTableFunctionObjectStorageCluster(factory);
    registerDataLakeTableFunctions(factory);
+    registerDataLakeClusterTableFunctions(factory);
 }

 }
--- a/src/TableFunctions/registerTableFunctions.h
+++ b/src/TableFunctions/registerTableFunctions.h
@ -70,6 +70,7 @@ void registerTableFunctionExplain(TableFunctionFactory & factory);
 void registerTableFunctionObjectStorage(TableFunctionFactory & factory);
 void registerTableFunctionObjectStorageCluster(TableFunctionFactory & factory);
 void registerDataLakeTableFunctions(TableFunctionFactory & factory);
+void registerDataLakeClusterTableFunctions(TableFunctionFactory & factory);

 void registerTableFunctionTimeSeries(TableFunctionFactory & factory);

--- a/tests/ci/official_docker.py
+++ b/tests/ci/official_docker.py
@ -299,8 +299,6 @@ class TagAttrs:

    # Only one latest can exist
    latest: ClickHouseVersion
-    # Only one can be a major one (the most fresh per a year)
-    majors: Dict[int, ClickHouseVersion]
    # Only one lts version can exist
    lts: Optional[ClickHouseVersion]

@ -345,14 +343,6 @@ def ldf_tags(version: ClickHouseVersion, distro: str, tag_attrs: TagAttrs) -> st
            tags.append("lts")
        tags.append(f"lts-{distro}")

-    # If the tag `22`, `23`, `24` etc. should be included in the tags
-    with_major = tag_attrs.majors.get(version.major) in (None, version)
-    if with_major:
-        tag_attrs.majors[version.major] = version
-        if without_distro:
-            tags.append(f"{version.major}")
-        tags.append(f"{version.major}-{distro}")
-
    # Add all normal tags
    for tag in (
        f"{version.major}.{version.minor}",
@ -384,7 +374,7 @@ def generate_ldf(args: argparse.Namespace) -> None:
        args.directory / git_runner(f"git -C {args.directory} rev-parse --show-cdup")
    ).absolute()
    lines = ldf_header(git, directory)
-    tag_attrs = TagAttrs(versions[-1], {}, None)
+    tag_attrs = TagAttrs(versions[-1], None)

    # We iterate from the most recent to the oldest version
    for version in reversed(versions):
--- a/tests/clickhouse-test
+++ b/tests/clickhouse-test
@ -809,6 +809,7 @@ class SettingsRandomizer:
        "prefer_localhost_replica": lambda: random.randint(0, 1),
        "max_block_size": lambda: random.randint(8000, 100000),
        "max_joined_block_size_rows": lambda: random.randint(8000, 100000),
+        "min_joined_block_size_bytes": lambda: random.randint(524288 // 2, 4 * 524288),
        "max_threads": lambda: 32 if random.random() < 0.03 else random.randint(1, 3),
        "optimize_append_index": lambda: random.randint(0, 1),
        "optimize_if_chain_to_multiif": lambda: random.randint(0, 1),
@ -921,6 +922,14 @@ class SettingsRandomizer:
        "parallel_replicas_local_plan": lambda: random.randint(0, 1),
        "output_format_native_write_json_as_string": lambda: random.randint(0, 1),
        "enable_vertical_final": lambda: random.randint(0, 1),
+        "grace_hash_join_initial_buckets": lambda: random.randint(1, 16),
+        "grace_hash_join_max_buckets": lambda: 1024 * random.randint(0, 4),
+        "join_to_sort_minimum_perkey_rows": lambda: random.randint(1, 100),
+        "join_to_sort_maximum_table_rows": lambda: random.randint(1000, 100000),
+        "allow_experimental_join_right_table_sorting": lambda: True,
+        "allow_experimental_join_condition": lambda: True,
+        "join_output_by_rowlist_perkey_rows_threshold": lambda: random.randint(0, 10),
+        "max_rows_in_set_to_optimize_join": lambda: random.randint(0, 100),
    }

    @staticmethod
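
For readers unfamiliar with the randomizer table touched above, here is a minimal sketch of how a dict of zero-argument lambdas can be turned into one concrete set of settings. The three entries are copied from the added lines in this diff; the helper `draw_settings` and its `rng_seed` parameter are hypothetical and are not claimed to match the real `SettingsRandomizer` methods.

```python
import random

# Three generators copied from the diff above; the real table is much larger.
settings = {
    "min_joined_block_size_bytes": lambda: random.randint(524288 // 2, 4 * 524288),
    "grace_hash_join_initial_buckets": lambda: random.randint(1, 16),
    "grace_hash_join_max_buckets": lambda: 1024 * random.randint(0, 4),
}


def draw_settings(rng_seed=None):
    """Hypothetical helper: call every generator once and collect the values."""
    if rng_seed is not None:
        random.seed(rng_seed)  # fixed seed makes a draw reproducible
    return {name: generate() for name, generate in settings.items()}


sample = draw_settings(rng_seed=0)
```

Storing generators rather than values means every test run can draw a fresh combination, while a fixed seed still allows a failing combination to be reproduced.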
--- a/tests/integration/test_server_reload/test.py
+++ b/tests/integration/test_server_reload/test.py
@ -378,7 +378,7 @@ def test_reload_via_client(cluster, zk):
                        configure_from_zk(zk)
                    break
                except QueryRuntimeException:
-                    logging.exception("The new socket is not binded yet")
+                    logging.exception("The new socket is not bound yet")
                    time.sleep(0.1)

    if exception:
--- a/tests/integration/test_storage_iceberg/configs/config.d/cluster.xml
+++ b/tests/integration/test_storage_iceberg/configs/config.d/cluster.xml
@ -0,0 +1,20 @@
+<clickhouse>
+    <remote_servers>
+        <cluster_simple>
+            <shard>
+                <replica>
+                    <host>node1</host>
+                    <port>9000</port>
+                </replica>
+                <replica>
+                    <host>node2</host>
+                    <port>9000</port>
+                </replica>
+                <replica>
+                    <host>node3</host>
+                    <port>9000</port>
+                </replica>
+            </shard>
+        </cluster_simple>
+    </remote_servers>
+</clickhouse>
--- a/tests/integration/test_storage_iceberg/configs/config.d/query_log.xml
+++ b/tests/integration/test_storage_iceberg/configs/config.d/query_log.xml
@ -0,0 +1,6 @@
+<clickhouse>
+    <query_log>
+      <database>system</database>
+      <table>query_log</table>
+    </query_log>
+</clickhouse>
--- a/tests/integration/test_storage_iceberg/test.py
+++ b/tests/integration/test_storage_iceberg/test.py
@ -73,14 +73,38 @@ def started_cluster():
        cluster.add_instance(
            "node1",
            main_configs=[
+                "configs/config.d/query_log.xml",
+                "configs/config.d/cluster.xml",
                "configs/config.d/named_collections.xml",
                "configs/config.d/filesystem_caches.xml",
            ],
            user_configs=["configs/users.d/users.xml"],
            with_minio=True,
            with_azurite=True,
-            stay_alive=True,
            with_hdfs=with_hdfs,
+            stay_alive=True,
+        )
+        cluster.add_instance(
+            "node2",
+            main_configs=[
+                "configs/config.d/query_log.xml",
+                "configs/config.d/cluster.xml",
+                "configs/config.d/named_collections.xml",
+                "configs/config.d/filesystem_caches.xml",
+            ],
+            user_configs=["configs/users.d/users.xml"],
+            stay_alive=True,
+        )
+        cluster.add_instance(
+            "node3",
+            main_configs=[
+                "configs/config.d/query_log.xml",
+                "configs/config.d/cluster.xml",
+                "configs/config.d/named_collections.xml",
+                "configs/config.d/filesystem_caches.xml",
+            ],
+            user_configs=["configs/users.d/users.xml"],
+            stay_alive=True,
        )

        logging.info("Starting cluster...")
@ -182,6 +206,7 @@ def get_creation_expression(
    cluster,
    format="Parquet",
    table_function=False,
+    run_on_cluster=False,
    **kwargs,
 ):
    if storage_type == "s3":
@ -189,7 +214,11 @@ def get_creation_expression(
            bucket = kwargs["bucket"]
        else:
            bucket = cluster.minio_bucket
-        print(bucket)
+
+        if run_on_cluster:
+            assert table_function
+            return f"icebergS3Cluster('cluster_simple', s3, filename = 'iceberg_data/default/{table_name}/', format={format}, url = 'http://minio1:9001/{bucket}/')"
+        else:
            if table_function:
                return f"icebergS3(s3, filename = 'iceberg_data/default/{table_name}/', format={format}, url = 'http://minio1:9001/{bucket}/')"
            else:
@ -197,7 +226,14 @@ def get_creation_expression(
                    DROP TABLE IF EXISTS {table_name};
                    CREATE TABLE {table_name}
                    ENGINE=IcebergS3(s3, filename = 'iceberg_data/default/{table_name}/', format={format}, url = 'http://minio1:9001/{bucket}/')"""
+
    elif storage_type == "azure":
+        if run_on_cluster:
+            assert table_function
+            return f"""
+                icebergAzureCluster('cluster_simple', azure, container = '{cluster.azure_container_name}', storage_account_url = '{cluster.env_variables["AZURITE_STORAGE_ACCOUNT_URL"]}', blob_path = '/iceberg_data/default/{table_name}/', format={format})
+            """
+        else:
            if table_function:
                return f"""
                    icebergAzure(azure, container = '{cluster.azure_container_name}', storage_account_url = '{cluster.env_variables["AZURITE_STORAGE_ACCOUNT_URL"]}', blob_path = '/iceberg_data/default/{table_name}/', format={format})
@ -207,7 +243,14 @@ def get_creation_expression(
                    DROP TABLE IF EXISTS {table_name};
                    CREATE TABLE {table_name}
                    ENGINE=IcebergAzure(azure, container = {cluster.azure_container_name}, storage_account_url = '{cluster.env_variables["AZURITE_STORAGE_ACCOUNT_URL"]}', blob_path = '/iceberg_data/default/{table_name}/', format={format})"""
+
    elif storage_type == "hdfs":
+        if run_on_cluster:
+            assert table_function
+            return f"""
+                icebergHDFSCluster('cluster_simple', hdfs, filename= 'iceberg_data/default/{table_name}/', format={format}, url = 'hdfs://hdfs1:9000/')
+            """
+        else:
            if table_function:
                return f"""
                    icebergHDFS(hdfs, filename= 'iceberg_data/default/{table_name}/', format={format}, url = 'hdfs://hdfs1:9000/')
@ -217,7 +260,10 @@ def get_creation_expression(
                    DROP TABLE IF EXISTS {table_name};
                    CREATE TABLE {table_name}
                    ENGINE=IcebergHDFS(hdfs, filename = 'iceberg_data/default/{table_name}/', format={format}, url = 'hdfs://hdfs1:9000/');"""
+
    elif storage_type == "local":
+        assert not run_on_cluster
+
        if table_function:
            return f"""
                icebergLocal(local, path = '/iceberg_data/default/{table_name}/', format={format})
@ -227,6 +273,7 @@ def get_creation_expression(
                DROP TABLE IF EXISTS {table_name};
                CREATE TABLE {table_name}
                ENGINE=IcebergLocal(local, path = '/iceberg_data/default/{table_name}/', format={format});"""
+
    else:
        raise Exception(f"Unknown iceberg storage type: {storage_type}")

@ -492,6 +539,108 @@ def test_types(started_cluster, format_version, storage_type):
    )


+@pytest.mark.parametrize("format_version", ["1", "2"])
+@pytest.mark.parametrize("storage_type", ["s3", "azure", "hdfs"])
+def test_cluster_table_function(started_cluster, format_version, storage_type):
+    if is_arm() and storage_type == "hdfs":
+        pytest.skip("Disabled test IcebergHDFS for aarch64")
+
+    instance = started_cluster.instances["node1"]
+    spark = started_cluster.spark_session
+
+    TABLE_NAME = (
+        "test_iceberg_cluster_"
+        + format_version
+        + "_"
+        + storage_type
+        + "_"
+        + get_uuid_str()
+    )
+
+    def add_df(mode):
+        write_iceberg_from_df(
+            spark,
+            generate_data(spark, 0, 100),
+            TABLE_NAME,
+            mode=mode,
+            format_version=format_version,
+        )
+
+        files = default_upload_directory(
+            started_cluster,
+            storage_type,
+            f"/iceberg_data/default/{TABLE_NAME}/",
+            f"/iceberg_data/default/{TABLE_NAME}/",
+        )
+
+        logging.info(f"Adding another dataframe. result files: {files}")
+
+        return files
+
+    files = add_df(mode="overwrite")
+    for i in range(1, len(started_cluster.instances)):
+        files = add_df(mode="append")
+
+    logging.info(f"Setup complete. files: {files}")
+    assert len(files) == 5 + 4 * (len(started_cluster.instances) - 1)
+
+    clusters = instance.query("SELECT * FROM system.clusters")
+    logging.info(f"Clusters setup: {clusters}")
+
+    # Regular query, executed on node1 only
+    table_function_expr = get_creation_expression(
+        storage_type, TABLE_NAME, started_cluster, table_function=True
+    )
+    select_regular = (
+        instance.query(f"SELECT * FROM {table_function_expr}").strip().split()
+    )
+
+    # Cluster Query with node1 as coordinator
+    table_function_expr_cluster = get_creation_expression(
+        storage_type,
+        TABLE_NAME,
+        started_cluster,
+        table_function=True,
+        run_on_cluster=True,
+    )
+    select_cluster = (
+        instance.query(f"SELECT * FROM {table_function_expr_cluster}").strip().split()
+    )
+
+    # Simple size check
+    assert len(select_regular) == 600
+    assert len(select_cluster) == 600
+
+    # Actual check
+    assert select_cluster == select_regular
+
+    # Check query_log
+    for replica in started_cluster.instances.values():
+        replica.query("SYSTEM FLUSH LOGS")
+
+    for node_name, replica in started_cluster.instances.items():
+        cluster_secondary_queries = (
+            replica.query(
+                f"""
+                SELECT query, type, is_initial_query, read_rows, read_bytes FROM system.query_log
+                WHERE
+                    type = 'QueryStart' AND
+                    positionCaseInsensitive(query, '{storage_type}Cluster') != 0 AND
+                    position(query, '{TABLE_NAME}') != 0 AND
+                    position(query, 'system.query_log') = 0 AND
+                    NOT is_initial_query
+            """
+            )
+            .strip()
+            .split("\n")
+        )
+
+        logging.info(
+            f"[{node_name}] cluster_secondary_queries: {cluster_secondary_queries}"
+        )
+        assert len(cluster_secondary_queries) == 1
+
+
@pytest.mark.parametrize("format_version", ["1", "2"])
@pytest.mark.parametrize("storage_type", ["s3", "azure", "hdfs", "local"])
 def test_delete_files(started_cluster, format_version, storage_type):
--- a/tests/queries/0_stateless/03250_ephemeral_comment.reference
+++ b/tests/queries/0_stateless/03250_ephemeral_comment.reference
--- a/tests/queries/0_stateless/03250_ephemeral_comment.sql
+++ b/tests/queries/0_stateless/03250_ephemeral_comment.sql
@ -0,0 +1,11 @@
+drop table if exists test;
+CREATE TABLE test (
+    `start_s`  UInt32 EPHEMERAL COMMENT 'start UNIX time' ,
+    `start_us` UInt16 EPHEMERAL COMMENT 'start microseconds',
+    `finish_s`  UInt32 EPHEMERAL COMMENT 'finish UNIX time',
+    `finish_us` UInt16 EPHEMERAL COMMENT 'finish microseconds',
+    `captured` DateTime MATERIALIZED fromUnixTimestamp(start_s),
+    `duration` Decimal32(6) MATERIALIZED finish_s - start_s + (finish_us - start_us)/1000000
+)
+ENGINE Null;
+drop table if exists test;
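To make the `duration` expression in the test table above concrete, here is a plain-Python sketch of the same arithmetic. It does not model ClickHouse's integer type promotion or `Decimal32(6)` rounding; the function name `duration_seconds` and the sample values are illustrative assumptions, not part of the commit.

```python
from decimal import Decimal


def duration_seconds(start_s, start_us, finish_s, finish_us):
    """Mirror `finish_s - start_s + (finish_us - start_us)/1000000`.

    Hypothetical helper: only the arithmetic is taken from the test SQL.
    """
    whole_seconds = Decimal(finish_s - start_s)
    # The microsecond difference may be negative when finish_us < start_us.
    fractional = Decimal(finish_us - start_us) / Decimal(1_000_000)
    return whole_seconds + fractional


# Sample values chosen for illustration; microseconds stay within UInt16 range.
print(duration_seconds(100, 50_000, 102, 20_000))
```

The four inputs correspond to the `EPHEMERAL` columns, which are supplied at insert time but not stored; only the `MATERIALIZED` results are kept.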
--- a/utils/check-style/aspell-ignore/en/aspell-dict.txt
+++ b/utils/check-style/aspell-ignore/en/aspell-dict.txt
@ -244,7 +244,10 @@ Deduplication
 DefaultTableEngine
 DelayedInserts
 DeliveryTag
+Deltalake
 DeltaLake
+deltalakeCluster
+deltaLakeCluster
 Denormalize
 DestroyAggregatesThreads
 DestroyAggregatesThreadsActive
@ -377,10 +380,15 @@ Homebrew's
 HorizontalDivide
 Hostname
 HouseOps
+hudi
 Hudi
+hudiCluster
+HudiCluster
 HyperLogLog
 Hypot
 IANA
+icebergCluster
+IcebergCluster
 IDE
 IDEs
 IDNA
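The Iceberg test changes above can be condensed into a small standalone sketch. It covers only the S3 table-function branch of `get_creation_expression` and the file-count formula asserted in `test_cluster_table_function`. The helper names `iceberg_s3_expr` and `expected_file_count`, and the sample table and bucket names, are assumptions made for illustration; the string templates and the formula `5 + 4 * (n - 1)` are copied from the diff.

```python
def iceberg_s3_expr(table_name, bucket, fmt="Parquet", run_on_cluster=False,
                    cluster_name="cluster_simple"):
    """Build the S3 table-function expression (hypothetical helper).

    With run_on_cluster=True the cluster name becomes the first argument
    and the function name gains the `Cluster` suffix, as in the diff.
    """
    args = (
        f"s3, filename = 'iceberg_data/default/{table_name}/', "
        f"format={fmt}, url = 'http://minio1:9001/{bucket}/'"
    )
    if run_on_cluster:
        return f"icebergS3Cluster('{cluster_name}', {args})"
    return f"icebergS3({args})"


def expected_file_count(n_instances):
    """File count asserted by the test: one initial "overwrite" write,
    then one "append" write per additional instance."""
    return 5 + 4 * (n_instances - 1)


# Illustrative values only; "demo_table" and "root" are not from the commit.
print(iceberg_s3_expr("demo_table", "root"))
print(iceberg_s3_expr("demo_table", "root", run_on_cluster=True))
print(expected_file_count(3))
```

With the three instances configured in `started_cluster` (`node1`, `node2`, `node3`), the formula gives 13 files. The test then checks that the regular and cluster queries return identical results and that each replica logged exactly one non-initial query for the cluster function.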
Author	SHA1	Message	Date
Vladimir Cherkasov	6f4c4dc206	Merge `c9e674a01e` into `e0f8b8d351`	2024-11-21 07:06:34 +01:00
Yakov Olkhovskiy	e0f8b8d351	Merge pull request #70458 from ClickHouse/fix-ephemeral-comment Fix ephemeral column comment	2024-11-21 05:10:11 +00:00
Alexey Milovidov	da2176d696	Merge pull request #72081 from ClickHouse/add-dashboard-selector Add advanced dashboard selector	2024-11-21 05:06:51 +00:00
Alexey Milovidov	53e0036593	Merge pull request #72176 from ClickHouse/change-ldf-major-versions Get rid of `major` tags in official docker images	2024-11-21 05:05:41 +00:00
Alexey Milovidov	25bd73ea5e	Merge pull request #72023 from ClickHouse/fix-bind Fix comments	2024-11-21 05:03:24 +00:00
Yakov Olkhovskiy	72d5af29e0	Merge branch 'master' into fix-ephemeral-comment	2024-11-20 22:01:54 +00:00
Mikhail Artemenko	44b4bd38b9	Merge pull request #72045 from ClickHouse/issues/70174/cluster_versions Enable cluster table functions for DataLake Storages	2024-11-20 21:22:37 +00:00
vdimir	c9e674a01e	Add missing settings to randomization	2024-11-20 16:30:26 +00:00
Mikhail f. Shiryaev	9a2a664b04	Get rid of `major` tags in official docker images	2024-11-20 16:36:50 +01:00
Mikhail Artemenko	4ccebd9a24	fix syntax for iceberg in docs	2024-11-20 11:15:39 +00:00
Mikhail Artemenko	99177c0daf	remove icebergCluster alias	2024-11-20 11:15:12 +00:00
serxa	ad67608956	Add advanced dashboard selector	2024-11-19 13:18:21 +00:00
Mikhail Artemenko	0951991c1d	update aspell-dict.txt	2024-11-19 13:10:42 +00:00
Mikhail Artemenko	19aec5e572	Merge branch 'issues/70174/cluster_versions' of github.com:ClickHouse/ClickHouse into issues/70174/cluster_versions	2024-11-19 12:51:56 +00:00
Mikhail Artemenko	a367de9977	add docs	2024-11-19 12:49:59 +00:00
Mikhail Artemenko	6894e280b2	fix pr issues	2024-11-19 12:34:42 +00:00
Mikhail Artemenko	39ebe113d9	Merge branch 'master' into issues/70174/cluster_versions	2024-11-19 11:28:46 +00:00
robot-clickhouse	014608fb6b	Automatic style fix	2024-11-18 17:51:51 +00:00
Mikhail Artemenko	a29ded4941	add test for iceberg	2024-11-18 17:39:46 +00:00
Mikhail Artemenko	d2efae7511	enable cluster versions for datalake storages	2024-11-18 17:35:21 +00:00
Alexey Milovidov	49589da56e	Fix comments	2024-11-18 07:18:46 +01:00
Yakov Olkhovskiy	3827d90bb0	add test	2024-10-08 02:37:41 +00:00
Yakov Olkhovskiy	bf3a3ad607	fix ephemeral comment	2024-10-08 02:27:36 +00:00