additional help messages for extended syntax [#CLICKHOUSE-3000]

2024-09-22 17:50:47 +00:00 · 2017-10-20 23:00:41 +03:00
commit dd42d53856
parent c61d4106e8
1 changed file with 175 additions and 106 deletions
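
The parameter bookkeeping that this commit moves into the `add_mandatory_param` and `add_optional_param` lambdas can be tried outside the factory. Below is a minimal standalone sketch, not the code from the commit: `ParamSpec`, `addMandatory`, `addOptional` and `summingMergeTreeSpec` are hypothetical names that stand in for the three captured locals and the two lambdas. It also assumes that an optional parameter always gets its opening bracket, even when it is the first entry in the list.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the locals (min_num_params, max_num_params,
// needed_params) that the lambdas in StorageFactory::get capture by reference.
struct ParamSpec
{
    std::size_t min_num_params = 0;
    std::size_t max_num_params = 0;
    std::string needed_params;

    // A mandatory parameter raises both the lower and the upper bound.
    void addMandatory(const char * desc)
    {
        ++min_num_params;
        ++max_num_params;
        needed_params += needed_params.empty() ? "\n" : ",\n";
        needed_params += desc;
    }

    // An optional parameter raises only the upper bound and is listed in brackets.
    // Assumption of this sketch: the opening bracket is written even when the
    // list is still empty, so the brackets always balance.
    void addOptional(const char * desc)
    {
        ++max_num_params;
        needed_params += needed_params.empty() ? "\n[" : ",\n[";
        needed_params += desc;
        needed_params += "]";
    }
};

// Parameter list of a non-replicated SummingMergeTree in the old
// (non-extended) syntax, in the order the diff registers the parameters.
inline ParamSpec summingMergeTreeSpec()
{
    ParamSpec spec;
    spec.addMandatory("name of column with date");
    spec.addOptional("sampling element of primary key");
    spec.addMandatory("primary key expression");
    spec.addMandatory("index granularity");
    spec.addOptional("list of columns to sum");
    return spec;
}
```

With this sketch, `summingMergeTreeSpec()` accepts between 3 and 5 arguments, which matches the counts the removed code reached through explicit `min_num_params += 3; max_num_params += 4;` followed by `max_num_params += 1;`. Deriving the counts from the same calls that build the description keeps the two from drifting apart.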
--- a/dbms/src/Storages/StorageFactory.cpp
+++ b/dbms/src/Storages/StorageFactory.cpp
@@ -246,6 +246,117 @@ static void checkAllTypesAreAllowedInTable(const NamesAndTypesList & names_and_t
 }


+static String getMergeTreeVerboseHelp(bool is_extended_syntax)
+{
+    String help = R"(
+
+MergeTree is a family of storage engines.
+
+MergeTrees differ in two ways:
+- they may be replicated or non-replicated;
+- they may do different actions on merge: nothing; sign collapse; sum; apply aggregate functions.
+
+So we have 14 combinations:
+    MergeTree, CollapsingMergeTree, SummingMergeTree, AggregatingMergeTree, ReplacingMergeTree, UnsortedMergeTree, GraphiteMergeTree
+    ReplicatedMergeTree, ReplicatedCollapsingMergeTree, ReplicatedSummingMergeTree, ReplicatedAggregatingMergeTree, ReplicatedReplacingMergeTree, ReplicatedUnsortedMergeTree, ReplicatedGraphiteMergeTree
+
+In most cases, you need MergeTree or ReplicatedMergeTree.
+
+For replicated merge trees, you need to supply a path in ZooKeeper and a replica name as the first two parameters.
+Path in ZooKeeper is like '/clickhouse/tables/01/' where /clickhouse/tables/ is a common prefix and 01 is a shard name.
+Replica name is like 'mtstat01-1' - it may be the hostname or any suitable string identifying replica.
+You may use macro substitutions for these parameters. It's like ReplicatedMergeTree('/clickhouse/tables/{shard}/', '{replica}'...
+Look at the <macros> section in server configuration file.
+)";
+
+    if (!is_extended_syntax)
+        help += R"(
+Next parameter (which is the first for unreplicated tables and the third for replicated tables) is the name of date column.
+Date column must exist in the table and have type Date (not DateTime).
+It is used for internal data partitioning and works like some kind of index.
+
+If your source data doesn't have a column of type Date, but has a DateTime column, you may add values for Date column while loading,
+ or you may INSERT your source data to a table of type Log and then transform it with INSERT INTO t SELECT toDate(time) AS date, * FROM ...
+If your source data doesn't have any date or time, you may just pass any constant for a date column while loading.
+
+Next parameter is optional sampling expression. Sampling expression is used to implement SAMPLE clause in query for approximate query execution.
+If you don't need approximate query execution, simply omit this parameter.
+Sampling expression must be one of the elements of the primary key tuple. For example, if your primary key is (CounterID, EventDate, intHash64(UserID)), your sampling expression might be intHash64(UserID).
+
+Next parameter is the primary key tuple. It's like (CounterID, EventDate, intHash64(UserID)) - a list of column names or functional expressions in round brackets. If your primary key has just one element, you may omit round brackets.
+
+Careful choice of the primary key is extremely important for processing short-time queries.
+
+Next parameter is index (primary key) granularity. A good value is 8192. You have no reason to use any other value.
+)";
+
+    help += R"(
+For the Collapsing mode, the last parameter is the name of a sign column - a special column that is used to 'collapse' rows with the same primary key while merging.
+
+For the Summing mode, the optional last parameter is a list of columns to sum while merging. This list is passed in round brackets, like (PageViews, Cost).
+If this parameter is omitted, the storage will sum all numeric columns except columns participating in the primary key.
+
+For the Replacing mode, the optional last parameter is the name of a 'version' column. While merging, for all rows with the same primary key, only one row is selected: the last row, if the version column was not specified, or the last row with the maximum version value, if specified.
+)";
+
+    if (is_extended_syntax)
+        help += R"(
+You can specify a partitioning expression in the PARTITION BY clause. It is optional but highly recommended.
+A common partitioning expression is some function of the event date column, e.g. PARTITION BY toYYYYMM(EventDate) will partition the table by month.
+Rows with different partition expression values are never merged together. That allows manipulating partitions with ALTER commands.
+Also it acts as a kind of index.
+
+Primary key is specified in the ORDER BY clause. It is mandatory for all MergeTree types except UnsortedMergeTree.
+It is like (CounterID, EventDate, intHash64(UserID)) - a list of column names or functional expressions in round brackets.
+If your primary key has just one element, you may omit round brackets.
+
+Careful choice of the primary key is extremely important for processing short-time queries.
+
+Optional sampling expression can be specified in the SAMPLE BY clause. It is used to implement the SAMPLE clause in a SELECT query for approximate query execution.
+Sampling expression must be one of the elements of the primary key tuple. For example, if your primary key is (CounterID, EventDate, intHash64(UserID)), your sampling expression might be intHash64(UserID).
+
+Engine settings can be specified in the SETTINGS clause. Full list is in the source code in the 'dbms/src/Storages/MergeTree/MergeTreeSettings.h' file.
+E.g. you can specify the index (primary key) granularity with SETTINGS index_granularity = 8192.
+
+Examples:
+
+MergeTree PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate) SETTINGS index_granularity = 8192
+
+MergeTree PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID), EventTime) SAMPLE BY intHash32(UserID)
+
+CollapsingMergeTree(Sign) PARTITION BY StartDate SAMPLE BY intHash32(UserID) ORDER BY (CounterID, StartDate, intHash32(UserID), VisitID)
+
+SummingMergeTree PARTITION BY toMonday(EventDate) ORDER BY (OrderID, EventDate, BannerID, PhraseID, ContextType, RegionID, PageID, IsFlat, TypeID, ResourceNo)
+
+SummingMergeTree((Shows, Clicks, Cost, CostCur, ShowsSumPosition, ClicksSumPosition, SessionNum, SessionLen, SessionCost, GoalsNum, SessionDepth)) PARTITION BY toYYYYMM(EventDate) ORDER BY (OrderID, EventDate, BannerID, PhraseID, ContextType, RegionID, PageID, IsFlat, TypeID, ResourceNo)
+
+ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/hits', '{replica}') PARTITION BY EventDate ORDER BY (CounterID, EventDate, intHash32(UserID), EventTime) SAMPLE BY intHash32(UserID)
+)";
+    else
+        help += R"(
+Examples:
+
+MergeTree(EventDate, (CounterID, EventDate), 8192)
+
+MergeTree(EventDate, intHash32(UserID), (CounterID, EventDate, intHash32(UserID), EventTime), 8192)
+
+CollapsingMergeTree(StartDate, intHash32(UserID), (CounterID, StartDate, intHash32(UserID), VisitID), 8192, Sign)
+
+SummingMergeTree(EventDate, (OrderID, EventDate, BannerID, PhraseID, ContextType, RegionID, PageID, IsFlat, TypeID, ResourceNo), 8192)
+
+SummingMergeTree(EventDate, (OrderID, EventDate, BannerID, PhraseID, ContextType, RegionID, PageID, IsFlat, TypeID, ResourceNo), 8192, (Shows, Clicks, Cost, CostCur, ShowsSumPosition, ClicksSumPosition, SessionNum, SessionLen, SessionCost, GoalsNum, SessionDepth))
+
+ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/hits', '{replica}', EventDate, intHash32(UserID), (CounterID, EventDate, intHash32(UserID), EventTime), 8192)
+)";
+
+    help += R"(
+For further info please read the documentation: https://clickhouse.yandex/
+)";
+
+    return help;
+}
+
+
 StoragePtr StorageFactory::get(
    const String & name,
    const String & data_path,
@@ -688,7 +799,7 @@ StoragePtr StorageFactory::get(
          *  - the name of the column with the date;
          *  - (optional) expression for sampling
          *     (the query with `SAMPLE x` will select rows that have a lower value in this column than `x * UINT32_MAX`);
-          *  - an expression for sorting (either a scalar expression or a tuple from several);
+          *  - an expression for sorting (either a scalar expression or a tuple of several);
          *  - index_granularity;
          *  - (for Collapsing) the name of Int8 column that contains `sign` type with the change of "visit" (taking values 1 and -1).
          * For example: ENGINE = ReplicatedCollapsingMergeTree('/tables/mytable', 'rep02', EventDate, (CounterID, EventDate, intHash32(UniqID), VisitID), 8192, Sign).
@@ -704,72 +815,15 @@ StoragePtr StorageFactory::get(
          * ReplacingMergeTree(date, [sample_key], primary_key, index_granularity, [version_column])
          * GraphiteMergeTree(date, [sample_key], primary_key, index_granularity, 'config_element')
          * UnsortedMergeTree(date, index_granularity)  TODO Add description below.
+          *
+          * Alternatively, if the experimental_allow_extended_storage_definition_syntax setting is enabled,
+          * you can specify:
+          *  - Partitioning expression in the PARTITION BY clause;
+          *  - Primary key in the ORDER BY clause;
+          *  - Sampling expression in the SAMPLE BY clause;
+          *  - Additional MergeTreeSettings in the SETTINGS clause;
          */

-        const char * verbose_help = R"(
-
-MergeTree is family of storage engines.
-
-MergeTrees is different in two ways:
- it may be replicated and non-replicated;
- it may do different actions on merge: nothing; sign collapse; sum; apply aggregete functions.
-
-So we have 14 combinations:
-    MergeTree, CollapsingMergeTree, SummingMergeTree, AggregatingMergeTree, ReplacingMergeTree, UnsortedMergeTree, GraphiteMergeTree
-    ReplicatedMergeTree, ReplicatedCollapsingMergeTree, ReplicatedSummingMergeTree, ReplicatedAggregatingMergeTree, ReplicatedReplacingMergeTree, ReplicatedUnsortedMergeTree, ReplicatedGraphiteMergeTree
-
-In most of cases, you need MergeTree or ReplicatedMergeTree.
-
-For replicated merge trees, you need to supply path in ZooKeeper and replica name as first two parameters.
-Path in ZooKeeper is like '/clickhouse/tables/01/' where /clickhouse/tables/ is common prefix and 01 is shard name.
-Replica name is like 'mtstat01-1' - it may be hostname or any suitable string identifying replica.
-You may use macro substitutions for these parameters. It's like ReplicatedMergeTree('/clickhouse/tables/{shard}/', '{replica}'...
-Look at <macros> section in server configuration file.
-
-Next parameter (which is first for unreplicated tables and third for replicated tables) is name of date column.
-Date column must exist in table and have type Date (not DateTime).
-It is used for internal data partitioning and works like some kind of index.
-
-If your source data doesn't have column of type Date, but have DateTime column, you may add values for Date column while loading,
- or you may INSERT your source data to table of type Log and then transform it with INSERT INTO t SELECT toDate(time) AS date, * FROM ...
-If your source data doesn't have any date or time, you may just pass any constant for date column while loading.
-
-Next parameter is optional sampling expression. Sampling expression is used to implement SAMPLE clause in query for approximate query execution.
-If you don't need approximate query execution, simply omit this parameter.
-Sample expression must be one of elements of primary key tuple. For example, if your primary key is (CounterID, EventDate, intHash64(UserID)), your sampling expression might be intHash64(UserID).
-
-Next parameter is primary key tuple. It's like (CounterID, EventDate, intHash64(UserID)) - list of column names or functional expressions in round brackets. If your primary key have just one element, you may omit round brackets.
-
-Careful choice of primary key is extremely important for processing short-time queries.
-
-Next parameter is index (primary key) granularity. Good value is 8192. You have no reasons to use any other value.
-
-For Collapsing mode, last parameter is name of sign column - special column that is used to 'collapse' rows with same primary key while merge.
-
-For Summing mode, last parameter is optional list of columns to sum while merge. List is passed in round brackets, like (PageViews, Cost).
-If this parameter is omitted, storage will sum all numeric columns except columns participated in primary key.
-
-For Replacing mode, last parameter is optional name of 'version' column. While merging, for all rows with same primary key, only one row is selected: last row, if version column was not specified, or last row with maximum version value, if specified.
-
-
-Examples:
-
-MergeTree(EventDate, (CounterID, EventDate), 8192)
-
-MergeTree(EventDate, intHash32(UserID), (CounterID, EventDate, intHash32(UserID), EventTime), 8192)
-
-CollapsingMergeTree(StartDate, intHash32(UserID), (CounterID, StartDate, intHash32(UserID), VisitID), 8192, Sign)
-
-SummingMergeTree(EventDate, (OrderID, EventDate, BannerID, PhraseID, ContextType, RegionID, PageID, IsFlat, TypeID, ResourceNo), 8192)
-
-SummingMergeTree(EventDate, (OrderID, EventDate, BannerID, PhraseID, ContextType, RegionID, PageID, IsFlat, TypeID, ResourceNo), 8192, (Shows, Clicks, Cost, CostCur, ShowsSumPosition, ClicksSumPosition, SessionNum, SessionLen, SessionCost, GoalsNum, SessionDepth))
-
-ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/hits', '{replica}', EventDate, intHash32(UserID), (CounterID, EventDate, intHash32(UserID), EventTime), 8192)
-
-
-For further info please read the documentation: https://clickhouse.yandex/
-)";
-
        String name_part = name.substr(0, name.size() - strlen("MergeTree"));

        bool replicated = startsWith(name_part, "Replicated");
@@ -779,6 +833,9 @@ For further info please read the documentation: https://clickhouse.yandex/
        MergeTreeData::MergingParams merging_params;
        merging_params.mode = MergeTreeData::MergingParams::Ordinary;

+        const bool allow_extended_storage_def =
+            local_context.getSettingsRef().experimental_allow_extended_storage_definition_syntax;
+
        if (name_part == "Collapsing")
            merging_params.mode = MergeTreeData::MergingParams::Collapsing;
        else if (name_part == "Summing")
@@ -792,7 +849,9 @@ For further info please read the documentation: https://clickhouse.yandex/
        else if (name_part == "Graphite")
            merging_params.mode = MergeTreeData::MergingParams::Graphite;
        else if (!name_part.empty())
-            throw Exception("Unknown storage " + name + verbose_help, ErrorCodes::UNKNOWN_STORAGE);
+            throw Exception(
+                "Unknown storage " + name + getMergeTreeVerboseHelp(allow_extended_storage_def),
+                ErrorCodes::UNKNOWN_STORAGE);

        ASTs args;
        if (engine_def.arguments)
@@ -803,9 +862,6 @@ For further info please read the documentation: https://clickhouse.yandex/
        bool is_extended_storage_def =
            storage_def.partition_by || storage_def.order_by || storage_def.sample_by || storage_def.settings;

-        const bool allow_extended_storage_def =
-            local_context.getSettingsRef().experimental_allow_extended_storage_definition_syntax;
-
        if (is_extended_storage_def && !allow_extended_storage_def)
            throw Exception(
                "Extended storage definition syntax (PARTITION BY, ORDER BY, SAMPLE BY and SETTINGS clauses) "
@@ -815,13 +871,25 @@ For further info please read the documentation: https://clickhouse.yandex/
        size_t max_num_params = 0;
        String needed_params;

+        auto add_mandatory_param = [&](const char * desc)
+        {
+            ++min_num_params;
+            ++max_num_params;
+            needed_params += needed_params.empty() ? "\n" : ",\n";
+            needed_params += desc;
+        };
+        auto add_optional_param = [&](const char * desc)
+        {
+            ++max_num_params;
+            needed_params += needed_params.empty() ? "\n[" : ",\n[";
+            needed_params += desc;
+            needed_params += "]";
+        };
+
        if (replicated)
        {
-            needed_params +=
-                "\npath in ZooKeeper,"
-                "\nreplica name";
-            min_num_params += 2;
-            max_num_params += 2;
+            add_mandatory_param("path in ZooKeeper");
+            add_mandatory_param("replica name");
        }

        if (!is_extended_storage_def)
@@ -832,22 +900,16 @@ For further info please read the documentation: https://clickhouse.yandex/
                    is_extended_storage_def = true;
                else
                {
-                    needed_params +=
-                        "\nname of column with date,"
-                        "\nindex granularity";
-                    min_num_params += 2;
-                    max_num_params += 2;
+                    add_mandatory_param("name of column with date");
+                    add_mandatory_param("index granularity");
                }
            }
            else
            {
-                needed_params +=
-                    ",\nname of column with date"
-                    ",\n[sampling element of primary key]"
-                    ",\nprimary key expression"
-                    ",\nindex granularity";
-                min_num_params += 3;
-                max_num_params += 4;
+                add_mandatory_param("name of column with date");
+                add_optional_param("sampling element of primary key");
+                add_mandatory_param("primary key expression");
+                add_mandatory_param("index granularity");
            }
        }

@@ -856,22 +918,16 @@ For further info please read the documentation: https://clickhouse.yandex/
        default:
            break;
        case MergeTreeData::MergingParams::Summing:
-            needed_params += ",\n[list of columns to sum]";
-            max_num_params += 1;
+            add_optional_param("list of columns to sum");
            break;
        case MergeTreeData::MergingParams::Replacing:
-            needed_params += ",\n[version]";
-            max_num_params += 1;
+            add_optional_param("version");
            break;
        case MergeTreeData::MergingParams::Collapsing:
-            needed_params += ",\nsign column";
-            min_num_params += 1;
-            max_num_params += 1;
+            add_mandatory_param("sign column");
            break;
        case MergeTreeData::MergingParams::Graphite:
-            needed_params += ",\n'config_element_for_graphite_schema'";
-            min_num_params += 1;
-            max_num_params += 1;
+            add_mandatory_param("'config_element_for_graphite_schema'");
            break;
        }

@@ -894,8 +950,7 @@ For further info please read the documentation: https://clickhouse.yandex/
            else
                msg += "no parameters";

-            if (!is_extended_storage_def)
-                msg += verbose_help;
+            msg += getMergeTreeVerboseHelp(is_extended_storage_def);

            throw Exception(msg, ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
        }
@@ -910,16 +965,22 @@ For further info please read the documentation: https://clickhouse.yandex/
            if (ast && ast->value.getType() == Field::Types::String)
                zookeeper_path = safeGet<String>(ast->value);
            else
-                throw Exception(String("Path in ZooKeeper must be a string literal") + verbose_help, ErrorCodes::BAD_ARGUMENTS);
+                throw Exception(
+                    "Path in ZooKeeper must be a string literal" + getMergeTreeVerboseHelp(is_extended_storage_def),
+                    ErrorCodes::BAD_ARGUMENTS);

            ast = typeid_cast<ASTLiteral *>(&*args[1]);
            if (ast && ast->value.getType() == Field::Types::String)
                replica_name = safeGet<String>(ast->value);
            else
-                throw Exception(String("Replica name must be a string literal") + verbose_help, ErrorCodes::BAD_ARGUMENTS);
+                throw Exception(
+                    "Replica name must be a string literal" + getMergeTreeVerboseHelp(is_extended_storage_def),
+                    ErrorCodes::BAD_ARGUMENTS);

            if (replica_name.empty())
-                throw Exception(String("No replica name in config") + verbose_help, ErrorCodes::NO_REPLICA_NAME_GIVEN);
+                throw Exception(
+                    "No replica name in config" + getMergeTreeVerboseHelp(is_extended_storage_def),
+                    ErrorCodes::NO_REPLICA_NAME_GIVEN);

            args.erase(args.begin(), args.begin() + 2);
        }
@@ -929,7 +990,9 @@ For further info please read the documentation: https://clickhouse.yandex/
            if (auto ast = typeid_cast<ASTIdentifier *>(&*args.back()))
                merging_params.sign_column = ast->name;
            else
-                throw Exception(String("Sign column name must be an unquoted string") + verbose_help, ErrorCodes::BAD_ARGUMENTS);
+                throw Exception(
+                    "Sign column name must be an unquoted string" + getMergeTreeVerboseHelp(is_extended_storage_def),
+                    ErrorCodes::BAD_ARGUMENTS);

            args.pop_back();
        }
@@ -941,7 +1004,9 @@ For further info please read the documentation: https://clickhouse.yandex/
                if (auto ast = typeid_cast<ASTIdentifier *>(&*args.back()))
                    merging_params.version_column = ast->name;
                else
-                    throw Exception(String("Version column name must be an unquoted string") + verbose_help, ErrorCodes::BAD_ARGUMENTS);
+                    throw Exception(
+                        "Version column name must be an unquoted string" + getMergeTreeVerboseHelp(is_extended_storage_def),
+                        ErrorCodes::BAD_ARGUMENTS);

                args.pop_back();
            }
@@ -959,7 +1024,7 @@ For further info please read the documentation: https://clickhouse.yandex/
        {
            String graphite_config_name;
            String error_msg = "Last parameter of GraphiteMergeTree must be name (in single quotes) of element in configuration file with Graphite options";
-            error_msg += verbose_help;
+            error_msg += getMergeTreeVerboseHelp(is_extended_storage_def);

            if (auto ast = typeid_cast<ASTLiteral *>(&*args.back()))
            {
@@ -1008,7 +1073,9 @@ For further info please read the documentation: https://clickhouse.yandex/
            if (auto ast = typeid_cast<ASTIdentifier *>(args[0].get()))
                date_column_name = ast->name;
            else
-                throw Exception(String("Date column name must be an unquoted string") + verbose_help, ErrorCodes::BAD_ARGUMENTS);
+                throw Exception(
+                    "Date column name must be an unquoted string" + getMergeTreeVerboseHelp(is_extended_storage_def),
+                    ErrorCodes::BAD_ARGUMENTS);

            if (merging_params.mode != MergeTreeData::MergingParams::Unsorted)
                primary_expr_list = extractKeyExpressionList(*args[1]);
@@ -1017,7 +1084,9 @@ For further info please read the documentation: https://clickhouse.yandex/
            if (ast && ast->value.getType() == Field::Types::UInt64)
                storage_settings.index_granularity = safeGet<UInt64>(ast->value);
            else
-                throw Exception(String("Index granularity must be a positive integer") + verbose_help, ErrorCodes::BAD_ARGUMENTS);
+                throw Exception(
+                    "Index granularity must be a positive integer" + getMergeTreeVerboseHelp(is_extended_storage_def),
+                    ErrorCodes::BAD_ARGUMENTS);
        }

        if (replicated)