From 774c4b52dadbd0fbb2430d2abbf62c3b630204ef Mon Sep 17 00:00:00 2001 From: Robert Schulze Date: Mon, 18 Sep 2023 20:08:37 +0000 Subject: [PATCH] Rework --- docs/en/operations/settings/settings.md | 11 +- .../functions/splitting-merging-functions.md | 16 +- src/Core/Settings.h | 2 +- src/Functions/FunctionsStringArray.cpp | 22 +- src/Functions/FunctionsStringArray.h | 339 +++++------------- src/Functions/URL/URLHierarchy.cpp | 2 +- src/Functions/URL/URLPathHierarchy.cpp | 2 +- .../URL/extractURLParameterNames.cpp | 2 +- src/Functions/URL/extractURLParameters.cpp | 2 +- .../02475_split_with_max_substrings.reference | 204 ++++++++--- .../02475_split_with_max_substrings.sql | 226 +++++++++--- ...6_splitby_max_substring_behavior.reference | 126 ------- .../02876_splitby_max_substring_behavior.sql | 151 -------- 13 files changed, 446 insertions(+), 659 deletions(-) delete mode 100644 tests/queries/0_stateless/02876_splitby_max_substring_behavior.reference delete mode 100644 tests/queries/0_stateless/02876_splitby_max_substring_behavior.sql diff --git a/docs/en/operations/settings/settings.md b/docs/en/operations/settings/settings.md index ad1437ea3eb..ef4703e3bc3 100644 --- a/docs/en/operations/settings/settings.md +++ b/docs/en/operations/settings/settings.md @@ -4067,17 +4067,16 @@ Result: └─────┴─────┴───────┘ ``` -## splitby_max_substring_behavior {#splitby-max-substring-behavior} +## splitby_max_substrings_includes_remaining_string {#splitby_max_substrings_includes_remaining_string} -Controls how functions [splitBy*()](../../sql-reference/functions/splitting-merging-functions.md) with given `max_substring` argument behave. +Controls whether function [splitBy*()](../../sql-reference/functions/splitting-merging-functions.md) with argument `max_substrings` > 0 will include the remaining string in the last element of the result array. Possible values: -- `''` - If `max_substring` >=1, return the first `max_substring`-many splits. 
-- `'python'` - If `max_substring` >= 0, split `max_substring`-many times, and return `max_substring + 1` elements where the last element contains the remaining string. -- `'spark'` - If `max_substring` >= 1, split `max_substring`-many times, and return `max_substring + 1` elements where the last element contains the remaining string. +- `0` - The remaining string will not be included in the last element of the result array. +- `1` - The remaining string will be included in the last element of the result array. This is the behavior of Spark's [`split()`](https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.sql.functions.split.html) function and Python's [`str.split()`](https://docs.python.org/3/library/stdtypes.html#str.split) method. -Default value: ``. +Default value: `0`. ## enable_extended_results_for_datetime_functions {#enable-extended-results-for-datetime-functions} diff --git a/docs/en/sql-reference/functions/splitting-merging-functions.md b/docs/en/sql-reference/functions/splitting-merging-functions.md index 1e0bc3da664..614bf556c8e 100644 --- a/docs/en/sql-reference/functions/splitting-merging-functions.md +++ b/docs/en/sql-reference/functions/splitting-merging-functions.md @@ -21,7 +21,7 @@ splitByChar(separator, s[, max_substrings])) - `separator` — The separator which should contain exactly one character. [String](../../sql-reference/data-types/string.md). - `s` — The string to split. [String](../../sql-reference/data-types/string.md). -- `max_substrings` — An optional `Int64` defaulting to 0. When `max_substrings` > 0, the returned substrings will be no more than `max_substrings`, otherwise the function will return as many substrings as possible. +- `max_substrings` — An optional `Int64` defaulting to 0. If `max_substrings` > 0, the returned array will contain at most `max_substrings` substrings, otherwise the function will return as many substrings as possible. 
**Returned value(s)** @@ -39,7 +39,9 @@ For example, - in v22.10: `SELECT splitByChar('=', 'a=b=c=d', 2); -- ['a','b','c=d']` - in v22.11: `SELECT splitByChar('=', 'a=b=c=d', 2); -- ['a','b']` -The previous behavior can be restored by setting [splitby_max_substring_behavior](../../operations/settings/settings.md#splitby-max-substring-behavior) = 'python'. +A behavior similar to ClickHouse pre-v22.11 can be achieved by setting +[splitby_max_substrings_includes_remaining_string](../../operations/settings/settings.md#splitby_max_substrings_includes_remaining_string) +`SELECT splitByChar('=', 'a=b=c=d', 2) SETTINGS splitby_max_substrings_includes_remaining_string = 1 -- ['a', 'b=c=d']` ::: **Example** @@ -82,7 +84,7 @@ Type: [Array](../../sql-reference/data-types/array.md)([String](../../sql-refere - There are multiple consecutive non-empty separators; - The original string `s` is empty while the separator is not empty. -Setting [splitby_max_substring_behavior](../../operations/settings/settings.md#splitby-max-substring-behavior) (default: '') controls the behavior with `max_substrings` > 0. +Setting [splitby_max_substrings_includes_remaining_string](../../operations/settings/settings.md#splitby_max_substrings_includes_remaining_string) (default: 0) controls if the remaining string is included in the last element of the result array when argument `max_substrings` > 0. **Example** @@ -137,7 +139,7 @@ Returns an array of selected substrings. Empty substrings may be selected when: Type: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)). -Setting [splitby_max_substring_behavior](../../operations/settings/settings.md#splitby-max-substring-behavior) (default: '') controls the behavior with `max_substrings` > 0. 
+Setting [splitby_max_substrings_includes_remaining_string](../../operations/settings/settings.md#splitby_max_substrings_includes_remaining_string) (default: 0) controls if the remaining string is included in the last element of the result array when argument `max_substrings` > 0. **Example** @@ -188,7 +190,7 @@ Returns an array of selected substrings. Type: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)). -Setting [splitby_max_substring_behavior](../../operations/settings/settings.md#splitby-max-substring-behavior) (default: '') controls the behavior with `max_substrings` > 0. +Setting [splitby_max_substrings_includes_remaining_string](../../operations/settings/settings.md#splitby_max_substrings_includes_remaining_string) (default: 0) controls if the remaining string is included in the last element of the result array when argument `max_substrings` > 0. **Example** @@ -227,7 +229,7 @@ Returns an array of selected substrings. Type: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)). -Setting [splitby_max_substring_behavior](../../operations/settings/settings.md#splitby-max-substring-behavior) (default: '') controls the behavior with `max_substrings` > 0. +Setting [splitby_max_substrings_includes_remaining_string](../../operations/settings/settings.md#splitby_max_substrings_includes_remaining_string) (default: 0) controls if the remaining string is included in the last element of the result array when argument `max_substrings` > 0. **Example** @@ -289,7 +291,7 @@ Returns an array of selected substrings. Type: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)). -Setting [splitby_max_substring_behavior](../../operations/settings/settings.md#splitby-max-substring-behavior) (default: '') controls the behavior with `max_substrings` > 0. 
+Setting [splitby_max_substrings_includes_remaining_string](../../operations/settings/settings.md#splitby_max_substrings_includes_remaining_string) (default: 0) controls if the remaining string is included in the last element of the result array when argument `max_substrings` > 0. **Example** diff --git a/src/Core/Settings.h b/src/Core/Settings.h index ca8f82ed8b6..fe9f50baf20 100644 --- a/src/Core/Settings.h +++ b/src/Core/Settings.h @@ -502,7 +502,7 @@ class IColumn; M(Bool, reject_expensive_hyperscan_regexps, true, "Reject patterns which will likely be expensive to evaluate with hyperscan (due to NFA state explosion)", 0) \ M(Bool, allow_simdjson, true, "Allow using simdjson library in 'JSON*' functions if AVX2 instructions are available. If disabled rapidjson will be used.", 0) \ M(Bool, allow_introspection_functions, false, "Allow functions for introspection of ELF and DWARF for query profiling. These functions are slow and may impose security considerations.", 0) \ - M(String, splitby_max_substring_behavior, "", "Control the behavior of the 'max_substring' argument in functions splitBy*(): '' (default), 'python' or 'spark'", 0) \ + M(Bool, splitby_max_substrings_includes_remaining_string, false, "Functions 'splitBy*()' with 'max_substrings' argument > 0 include the remaining string as last element in the result", 0) \ \ M(Bool, allow_execute_multiif_columnar, true, "Allow execute multiIf function columnar", 0) \ M(Bool, formatdatetime_f_prints_single_zero, false, "Formatter '%f' in function 'formatDateTime()' produces a single zero instead of six zeros if the formatted value has no fractional seconds.", 0) \ diff --git a/src/Functions/FunctionsStringArray.cpp b/src/Functions/FunctionsStringArray.cpp index 326651c111d..4afee55704f 100644 --- a/src/Functions/FunctionsStringArray.cpp +++ b/src/Functions/FunctionsStringArray.cpp @@ -19,7 +19,7 @@ std::optional extractMaxSplitsImpl(const ColumnWithTypeAndName & argument return static_cast(value); } -std::optional 
extractMaxSplits(const ColumnsWithTypeAndName & arguments, size_t max_substrings_argument_position, MaxSubstringBehavior max_substring_behavior) +std::optional extractMaxSplits(const ColumnsWithTypeAndName & arguments, size_t max_substrings_argument_position) { if (max_substrings_argument_position >= arguments.size()) return std::nullopt; @@ -35,24 +35,8 @@ std::optional extractMaxSplits(const ColumnsWithTypeAndName & arguments, arguments[max_substrings_argument_position].column->getName(), max_substrings_argument_position + 1); - if (max_splits) - switch (max_substring_behavior) - { - case MaxSubstringBehavior::LikeClickHouse: - case MaxSubstringBehavior::LikeSpark: - { - if (*max_splits <= 0) - return std::nullopt; - break; - } - case MaxSubstringBehavior::LikePython: - { - if (*max_splits < 0) - return std::nullopt; - break; - } - } - + if (*max_splits <= 0) + return std::nullopt; return max_splits; } diff --git a/src/Functions/FunctionsStringArray.h b/src/Functions/FunctionsStringArray.h index e720fc96e52..d7d7e3b5100 100644 --- a/src/Functions/FunctionsStringArray.h +++ b/src/Functions/FunctionsStringArray.h @@ -54,14 +54,7 @@ namespace ErrorCodes using Pos = const char *; -enum class MaxSubstringBehavior -{ - LikeClickHouse, - LikeSpark, - LikePython -}; - -std::optional extractMaxSplits(const ColumnsWithTypeAndName & arguments, size_t max_substrings_argument_position, MaxSubstringBehavior max_substring_behavior); +std::optional extractMaxSplits(const ColumnsWithTypeAndName & arguments, size_t max_substrings_argument_position); /// Substring generators. All of them have a common interface. 
@@ -72,7 +65,7 @@ private: Pos end; std::optional max_splits; size_t splits; - MaxSubstringBehavior max_substring_behavior; + bool max_substrings_includes_remaining_string; public: static constexpr auto name = "alphaTokens"; @@ -97,10 +90,10 @@ public: static constexpr auto strings_argument_position = 0uz; - void init(const ColumnsWithTypeAndName & arguments, MaxSubstringBehavior max_substring_behavior_) + void init(const ColumnsWithTypeAndName & arguments, bool max_substrings_includes_remaining_string_) { - max_substring_behavior = max_substring_behavior_; - max_splits = extractMaxSplits(arguments, 1, max_substring_behavior); + max_substrings_includes_remaining_string = max_substrings_includes_remaining_string_; + max_splits = extractMaxSplits(arguments, 1); } /// Called for each next string. @@ -125,35 +118,18 @@ public: if (max_splits) { - switch (max_substring_behavior) + if (max_substrings_includes_remaining_string) { - case MaxSubstringBehavior::LikeClickHouse: + if (splits == *max_splits - 1) { - if (splits == *max_splits) - return false; - break; - } - case MaxSubstringBehavior::LikeSpark: - { - if (splits == *max_splits - 1) - { - token_end = end; - pos = end; - return true; - } - break; - } - case MaxSubstringBehavior::LikePython: - { - if (splits == *max_splits) - { - token_end = end; - pos = end; - return true; - } - break; + token_end = end; + pos = end; + return true; } } + else + if (splits == *max_splits) + return false; } while (pos < end && isAlphaASCII(*pos)) @@ -173,7 +149,7 @@ private: Pos end; std::optional max_splits; size_t splits; - MaxSubstringBehavior max_substring_behavior; + bool max_substrings_includes_remaining_string; public: /// Get the name of the function. 
@@ -190,10 +166,10 @@ public: static constexpr auto strings_argument_position = 0uz; - void init(const ColumnsWithTypeAndName & arguments, MaxSubstringBehavior max_substring_behavior_) + void init(const ColumnsWithTypeAndName & arguments, bool max_substrings_includes_remaining_string_) { - max_substring_behavior = max_substring_behavior_; - max_splits = extractMaxSplits(arguments, 1, max_substring_behavior); + max_substrings_includes_remaining_string = max_substrings_includes_remaining_string_; + max_splits = extractMaxSplits(arguments, 1); } /// Called for each next string. @@ -218,35 +194,18 @@ public: if (max_splits) { - switch (max_substring_behavior) + if (max_substrings_includes_remaining_string) { - case MaxSubstringBehavior::LikeClickHouse: + if (splits == *max_splits - 1) { - if (splits == *max_splits) - return false; - break; - } - case MaxSubstringBehavior::LikeSpark: - { - if (splits == *max_splits - 1) - { - token_end = end; - pos = end; - return true; - } - break; - } - case MaxSubstringBehavior::LikePython: - { - if (splits == *max_splits) - { - token_end = end; - pos = end; - return true; - } - break; + token_end = end; + pos = end; + return true; } } + else + if (splits == *max_splits) + return false; } while (pos < end && !(isWhitespaceASCII(*pos) || isPunctuationASCII(*pos))) @@ -266,7 +225,7 @@ private: Pos end; std::optional max_splits; size_t splits; - MaxSubstringBehavior max_substring_behavior; + bool max_substrings_includes_remaining_string; public: static constexpr auto name = "splitByWhitespace"; @@ -282,10 +241,10 @@ public: static constexpr auto strings_argument_position = 0uz; - void init(const ColumnsWithTypeAndName & arguments, MaxSubstringBehavior max_substring_behavior_) + void init(const ColumnsWithTypeAndName & arguments, bool max_substrings_includes_remaining_string_) { - max_substring_behavior = max_substring_behavior_; - max_splits = extractMaxSplits(arguments, 1, max_substring_behavior); + 
max_substrings_includes_remaining_string = max_substrings_includes_remaining_string_; + max_splits = extractMaxSplits(arguments, 1); } /// Called for each next string. @@ -310,35 +269,18 @@ public: if (max_splits) { - switch (max_substring_behavior) + if (max_substrings_includes_remaining_string) { - case MaxSubstringBehavior::LikeClickHouse: + if (splits == *max_splits - 1) { - if (splits == *max_splits) - return false; - break; - } - case MaxSubstringBehavior::LikeSpark: - { - if (splits == *max_splits - 1) - { - token_end = end; - pos = end; - return true; - } - break; - } - case MaxSubstringBehavior::LikePython: - { - if (splits == *max_splits) - { - token_end = end; - pos = end; - return true; - } - break; + token_end = end; + pos = end; + return true; } } + else + if (splits == *max_splits) + return false; } while (pos < end && !isWhitespaceASCII(*pos)) @@ -359,7 +301,7 @@ private: char separator; std::optional max_splits; size_t splits; - MaxSubstringBehavior max_substring_behavior; + bool max_substrings_includes_remaining_string; public: static constexpr auto name = "splitByChar"; @@ -383,7 +325,7 @@ public: static constexpr auto strings_argument_position = 1uz; - void init(const ColumnsWithTypeAndName & arguments, MaxSubstringBehavior max_substring_behavior_) + void init(const ColumnsWithTypeAndName & arguments, bool max_substrings_includes_remaining_string_) { const ColumnConst * col = checkAndGetColumnConstStringOrFixedString(arguments[0].column.get()); @@ -398,8 +340,8 @@ public: separator = sep_str[0]; - max_substring_behavior = max_substring_behavior_; - max_splits = extractMaxSplits(arguments, 2, max_substring_behavior); + max_substrings_includes_remaining_string = max_substrings_includes_remaining_string_; + max_splits = extractMaxSplits(arguments, 2); } void set(Pos pos_, Pos end_) @@ -418,35 +360,18 @@ public: if (max_splits) { - switch (max_substring_behavior) + if (max_substrings_includes_remaining_string) { - case 
MaxSubstringBehavior::LikeClickHouse: + if (splits == *max_splits - 1) { - if (splits == *max_splits) - return false; - break; - } - case MaxSubstringBehavior::LikeSpark: - { - if (splits == *max_splits - 1) - { - token_end = end; - pos = nullptr; - return true; - } - break; - } - case MaxSubstringBehavior::LikePython: - { - if (splits == *max_splits) - { - token_end = end; - pos = nullptr; - return true; - } - break; + token_end = end; + pos = nullptr; + return true; } } + else + if (splits == *max_splits) + return false; } pos = reinterpret_cast(memchr(pos, separator, end - pos)); @@ -472,7 +397,7 @@ private: String separator; std::optional max_splits; size_t splits; - MaxSubstringBehavior max_substring_behavior; + bool max_substrings_includes_remaining_string; public: static constexpr auto name = "splitByString"; @@ -487,7 +412,7 @@ public: static constexpr auto strings_argument_position = 1uz; - void init(const ColumnsWithTypeAndName & arguments, MaxSubstringBehavior max_substring_behavior_) + void init(const ColumnsWithTypeAndName & arguments, bool max_substrings_includes_remaining_string_) { const ColumnConst * col = checkAndGetColumnConstStringOrFixedString(arguments[0].column.get()); @@ -497,8 +422,8 @@ public: separator = col->getValue(); - max_substring_behavior = max_substring_behavior_; - max_splits = extractMaxSplits(arguments, 2, max_substring_behavior); + max_substrings_includes_remaining_string = max_substrings_includes_remaining_string_; + max_splits = extractMaxSplits(arguments, 2); } /// Called for each next string. 
@@ -521,35 +446,18 @@ public: if (max_splits) { - switch (max_substring_behavior) + if (max_substrings_includes_remaining_string) { - case MaxSubstringBehavior::LikeClickHouse: + if (splits == *max_splits - 1) { - if (splits == *max_splits) - return false; - break; - } - case MaxSubstringBehavior::LikeSpark: - { - if (splits == *max_splits - 1) - { - token_end = end; - pos = end; - return true; - } - break; - } - case MaxSubstringBehavior::LikePython: - { - if (splits == *max_splits) - { - token_end = end; - pos = end; - return true; - } - break; + token_end = end; + pos = end; + return true; } } + else + if (splits == *max_splits) + return false; } pos += 1; @@ -565,35 +473,18 @@ public: if (max_splits) { - switch (max_substring_behavior) + if (max_substrings_includes_remaining_string) { - case MaxSubstringBehavior::LikeClickHouse: + if (splits == *max_splits - 1) { - if (splits == *max_splits) - return false; - break; - } - case MaxSubstringBehavior::LikeSpark: - { - if (splits == *max_splits - 1) - { - token_end = end; - pos = nullptr; - return true; - } - break; - } - case MaxSubstringBehavior::LikePython: - { - if (splits == *max_splits) - { - token_end = end; - pos = nullptr; - return true; - } - break; + token_end = end; + pos = nullptr; + return true; } } + else + if (splits == *max_splits) + return false; } pos = reinterpret_cast(memmem(pos, end - pos, separator.data(), separator.size())); @@ -622,7 +513,7 @@ private: std::optional max_splits; size_t splits; - MaxSubstringBehavior max_substring_behavior; + bool max_substrings_includes_remaining_string; public: static constexpr auto name = "splitByRegexp"; @@ -638,7 +529,7 @@ public: static constexpr auto strings_argument_position = 1uz; - void init(const ColumnsWithTypeAndName & arguments, MaxSubstringBehavior max_substring_behavior_) + void init(const ColumnsWithTypeAndName & arguments, bool max_substrings_includes_remaining_string_) { const ColumnConst * col = 
checkAndGetColumnConstStringOrFixedString(arguments[0].column.get()); @@ -649,8 +540,8 @@ public: if (!col->getValue().empty()) re = std::make_shared(Regexps::createRegexp(col->getValue())); - max_substring_behavior = max_substring_behavior_; - max_splits = extractMaxSplits(arguments, 2, max_substring_behavior); + max_substrings_includes_remaining_string = max_substrings_includes_remaining_string_; + max_splits = extractMaxSplits(arguments, 2); } /// Called for each next string. @@ -673,35 +564,18 @@ public: if (max_splits) { - switch (max_substring_behavior) + if (max_substrings_includes_remaining_string) { - case MaxSubstringBehavior::LikeClickHouse: + if (splits == *max_splits - 1) { - if (splits == *max_splits) - return false; - break; - } - case MaxSubstringBehavior::LikeSpark: - { - if (splits == *max_splits - 1) - { - token_end = end; - pos = end; - return true; - } - break; - } - case MaxSubstringBehavior::LikePython: - { - if (splits == *max_splits) - { - token_end = end; - pos = end; - return true; - } - break; + token_end = end; + pos = end; + return true; } } + else + if (splits == *max_splits) + return false; } pos += 1; @@ -717,35 +591,18 @@ public: if (max_splits) { - switch (max_substring_behavior) + if (max_substrings_includes_remaining_string) { - case MaxSubstringBehavior::LikeClickHouse: + if (splits == *max_splits - 1) { - if (splits == *max_splits) - return false; - break; - } - case MaxSubstringBehavior::LikeSpark: - { - if (splits == *max_splits - 1) - { - token_end = end; - pos = nullptr; - return true; - } - break; - } - case MaxSubstringBehavior::LikePython: - { - if (splits == *max_splits) - { - token_end = end; - pos = nullptr; - return true; - } - break; + token_end = end; + pos = nullptr; + return true; } } + else + if (splits == *max_splits) + return false; } if (!re->match(pos, end - pos, matches) || !matches[0].length) @@ -792,7 +649,7 @@ public: static constexpr auto strings_argument_position = 0uz; - void init(const 
ColumnsWithTypeAndName & arguments, MaxSubstringBehavior /*max_substring_behavior*/) + void init(const ColumnsWithTypeAndName & arguments, bool /*max_substrings_includes_remaining_string*/) { const ColumnConst * col = checkAndGetColumnConstStringOrFixedString(arguments[1].column.get()); @@ -845,7 +702,7 @@ template class FunctionTokens : public IFunction { private: - MaxSubstringBehavior max_substring_behavior; + bool max_substrings_includes_remaining_string; public: static constexpr auto name = Generator::name; @@ -854,17 +711,7 @@ public: explicit FunctionTokens(ContextPtr context) { const Settings & settings = context->getSettingsRef(); - if (settings.splitby_max_substring_behavior.value == "") - max_substring_behavior = MaxSubstringBehavior::LikeClickHouse; - else if (settings.splitby_max_substring_behavior.value == "python") - max_substring_behavior = MaxSubstringBehavior::LikePython; - else if (settings.splitby_max_substring_behavior.value == "spark") - max_substring_behavior = MaxSubstringBehavior::LikeSpark; - else - throw Exception( - ErrorCodes::ILLEGAL_COLUMN, - "Illegal value {} for setting splitby_max_substring_behavior in function {}, must be '', 'python' or 'spark'", - settings.splitby_max_substring_behavior.value, getName()); + max_substrings_includes_remaining_string = settings.splitby_max_substrings_includes_remaining_string; } String getName() const override { return name; } @@ -885,7 +732,7 @@ public: ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t /*input_rows_count*/) const override { Generator generator; - generator.init(arguments, max_substring_behavior); + generator.init(arguments, max_substrings_includes_remaining_string); const auto & array_argument = arguments[generator.strings_argument_position]; diff --git a/src/Functions/URL/URLHierarchy.cpp b/src/Functions/URL/URLHierarchy.cpp index 260053dc401..96b64d3182b 100644 --- a/src/Functions/URL/URLHierarchy.cpp +++ 
b/src/Functions/URL/URLHierarchy.cpp @@ -30,7 +30,7 @@ public: static constexpr auto strings_argument_position = 0uz; - void init(const ColumnsWithTypeAndName & /*arguments*/, MaxSubstringBehavior /*max_substring_behavior*/) {} + void init(const ColumnsWithTypeAndName & /*arguments*/, bool /*max_substrings_includes_remaining_string*/) {} /// Called for each next string. void set(Pos pos_, Pos end_) diff --git a/src/Functions/URL/URLPathHierarchy.cpp b/src/Functions/URL/URLPathHierarchy.cpp index a11be358a70..7fd6601d780 100644 --- a/src/Functions/URL/URLPathHierarchy.cpp +++ b/src/Functions/URL/URLPathHierarchy.cpp @@ -29,7 +29,7 @@ public: static constexpr auto strings_argument_position = 0uz; - void init(const ColumnsWithTypeAndName & /*arguments*/, MaxSubstringBehavior /*max_substring_behavior*/) {} + void init(const ColumnsWithTypeAndName & /*arguments*/, bool /*max_substrings_includes_remaining_string*/) {} /// Called for each next string. void set(Pos pos_, Pos end_) diff --git a/src/Functions/URL/extractURLParameterNames.cpp b/src/Functions/URL/extractURLParameterNames.cpp index 2b79be07cae..b792d9140d6 100644 --- a/src/Functions/URL/extractURLParameterNames.cpp +++ b/src/Functions/URL/extractURLParameterNames.cpp @@ -29,7 +29,7 @@ public: static constexpr auto strings_argument_position = 0uz; - void init(const ColumnsWithTypeAndName & /*arguments*/, MaxSubstringBehavior /*max_substring_behavior*/) {} + void init(const ColumnsWithTypeAndName & /*arguments*/, bool /*max_substrings_includes_remaining_string*/) {} /// Called for each next string. 
void set(Pos pos_, Pos end_) diff --git a/src/Functions/URL/extractURLParameters.cpp b/src/Functions/URL/extractURLParameters.cpp index 271e5dc89c9..e1243d8fbcd 100644 --- a/src/Functions/URL/extractURLParameters.cpp +++ b/src/Functions/URL/extractURLParameters.cpp @@ -27,7 +27,7 @@ public: validateFunctionArgumentTypes(func, arguments, mandatory_args); } - void init(const ColumnsWithTypeAndName & /*arguments*/, MaxSubstringBehavior /*max_substring_behavior*/) {} + void init(const ColumnsWithTypeAndName & /*arguments*/, bool /*max_substrings_includes_remaining_string*/) {} static constexpr auto strings_argument_position = 0uz; diff --git a/tests/queries/0_stateless/02475_split_with_max_substrings.reference b/tests/queries/0_stateless/02475_split_with_max_substrings.reference index d55ef45a5e0..904441f83fa 100644 --- a/tests/queries/0_stateless/02475_split_with_max_substrings.reference +++ b/tests/queries/0_stateless/02475_split_with_max_substrings.reference @@ -1,44 +1,160 @@ -['1','2','3'] -['1','2','3'] -['1','2','3'] -['1'] -['1','2'] -['1','2','3'] -['1','2','3'] -['one','two','three',''] -['one','two','three',''] -['one','two','three',''] -['one'] -['one','two'] -['one','two','three'] -['one','two','three',''] -['one','two','three',''] -['abca','abc'] -['abca','abc'] -['abca','abc'] -['abca'] -['abca','abc'] -['abca','abc'] -['abca','abc'] -['1','a','b'] -['1','a','b'] -['1','a','b'] -['1'] -['1','a'] -['1','a','b'] -['1','a','b'] -['1!','a,','b.'] -['1!','a,','b.'] -['1!','a,','b.'] -['1!'] -['1!','a,'] -['1!','a,','b.'] -['1!','a,','b.'] -['1','2 3','4,5','abcde'] -['1','2 3','4,5','abcde'] -['1','2 3','4,5','abcde'] -['1'] -['1','2 3'] -['1','2 3','4,5'] -['1','2 3','4,5','abcde'] -['1','2 3','4,5','abcde'] +-- negative tests +-- splitByChar +-- (default) +['a','','b','c','d'] +['a','','b','c','d'] +['a','','b','c','d'] +['a'] +['a',''] +['a','','b'] +['a','','b','c'] +['a','','b','c','d'] +['a','','b','c','d'] +-- (include remainder) +['a','','b','c','d'] 
+['a','','b','c','d'] +['a','','b','c','d'] +['a==b=c=d'] +['a','=b=c=d'] +['a','','b=c=d'] +['a','','b','c=d'] +['a','','b','c','d'] +['a','','b','c','d'] +-- splitByString +-- (default) +['a','=','=','b','=','c','=','d'] +['a','=','=','b','=','c','=','d'] +['a','=','=','b','=','c','=','d'] +['a'] +['a','='] +['a','=','='] +['a','=','=','b'] +['a','=','=','b','='] +['a','=','=','b','=','c'] +['a','=','=','b','=','c','='] +['a','=','=','b','=','c','='] +['a','=','=','b','=','c','=','d'] +['a','=','=','b','=','c','=','d'] +['a','','b','c','d'] +['a','','b','c','d'] +['a','','b','c','d'] +['a'] +['a',''] +['a','','b'] +['a','','b','c'] +['a','','b','c','d'] +['a','','b','c','d'] +-- (include remainder) +['a','=','=','b','=','c','=','d'] +['a','=','=','b','=','c','=','d'] +['a','=','=','b','=','c','=','d'] +['a==b=c=d'] +['a','==b=c=d'] +['a','=','=b=c=d'] +['a','=','=','b=c=d'] +['a','=','=','b','=c=d'] +['a','=','=','b','=','c=d'] +['a','=','=','b','=','c','=d'] +['a','=','=','b','=','c','=','d'] +['a','=','=','b','=','c','=','d'] +['a','','b','c','d'] +['a','','b','c','d'] +['a','','b','c','d'] +['a==b=c=d'] +['a','=b=c=d'] +['a','','b=c=d'] +['a','','b','c=d'] +['a','','b','c','d'] +['a','','b','c','d'] +-- splitByRegexp +-- (default) +['a','bc','de','f'] +['a','bc','de','f'] +['a','bc','de','f'] +['a'] +['a','bc'] +['a','bc','de'] +['a','bc','de','f'] +['a','bc','de','f'] +['a','1','2','b','c','2','3','d','e','3','4','5','f'] +['a','1','2','b','c','2','3','d','e','3','4','5','f'] +['a','1','2','b','c','2','3','d','e','3','4','5','f'] +['a'] +['a','1'] +['a','1','2'] +['a','1','2','b'] +['a','1','2','b','c'] +-- (include remainder) +['a','1','2','b','c','2','3','d','e','3','4','5','f'] +['a','1','2','b','c','2','3','d','e','3','4','5','f'] +['a','1','2','b','c','2','3','d','e','3','4','5','f'] +['a12bc23de345f'] +['a','12bc23de345f'] +['a','1','2bc23de345f'] +['a','1','2','bc23de345f'] +['a','1','2','b','c23de345f'] +['a','bc','de','f'] +['a','bc','de','f'] 
+['a','bc','de','f'] +['a12bc23de345f'] +['a','bc23de345f'] +['a','bc','de345f'] +['a','bc','de','f'] +['a','bc','de','f'] +-- splitByAlpha +-- (default) +['ab','cd','ef','gh'] +['ab','cd','ef','gh'] +['ab','cd','ef','gh'] +['ab'] +['ab','cd'] +['ab','cd','ef'] +['ab','cd','ef','gh'] +['ab','cd','ef','gh'] +-- (include remainder) +['ab','cd','ef','gh'] +['ab','cd','ef','gh'] +['ab','cd','ef','gh'] +['ab.cd.ef.gh'] +['ab','cd.ef.gh'] +['ab','cd','ef.gh'] +['ab','cd','ef','gh'] +['ab','cd','ef','gh'] +-- splitByNonAlpha +-- (default) +['128','0','0','1'] +['128','0','0','1'] +['128','0','0','1'] +['128'] +['128','0'] +['128','0','0'] +['128','0','0','1'] +['128','0','0','1'] +-- (include remainder) +['128','0','0','1'] +['128','0','0','1'] +['128','0','0','1'] +['128.0.0.1'] +['128','0.0.1'] +['128','0','0.1'] +['128','0','0','1'] +['128','0','0','1'] +-- splitByWhitespace +-- (default) +['Nein,','nein,','nein!','Doch!'] +['Nein,','nein,','nein!','Doch!'] +['Nein,','nein,','nein!','Doch!'] +['Nein,'] +['Nein,','nein,'] +['Nein,','nein,','nein!'] +['Nein,','nein,','nein!','Doch!'] +['Nein,','nein,','nein!','Doch!'] +-- (include remainder) +['Nein,','nein,','nein!','Doch!'] +['Nein,','nein,','nein!','Doch!'] +['Nein,','nein,','nein!','Doch!'] +['Nein, nein, nein! Doch!'] +['Nein,','nein, nein! Doch!'] +['Nein,','nein,','nein! 
Doch!'] +['Nein,','nein,','nein!','Doch!'] +['Nein,','nein,','nein!','Doch!'] diff --git a/tests/queries/0_stateless/02475_split_with_max_substrings.sql b/tests/queries/0_stateless/02475_split_with_max_substrings.sql index c51133c604e..3f367c75433 100644 --- a/tests/queries/0_stateless/02475_split_with_max_substrings.sql +++ b/tests/queries/0_stateless/02475_split_with_max_substrings.sql @@ -1,59 +1,175 @@ -select splitByChar(',', '1,2,3'); -select splitByChar(',', '1,2,3', -1); -select splitByChar(',', '1,2,3', 0); -select splitByChar(',', '1,2,3', 1); -select splitByChar(',', '1,2,3', 2); -select splitByChar(',', '1,2,3', 3); -select splitByChar(',', '1,2,3', 4); - -select splitByRegexp('[ABC]', 'oneAtwoBthreeC'); -select splitByRegexp('[ABC]', 'oneAtwoBthreeC', -1); -select splitByRegexp('[ABC]', 'oneAtwoBthreeC', 0); -select splitByRegexp('[ABC]', 'oneAtwoBthreeC', 1); -select splitByRegexp('[ABC]', 'oneAtwoBthreeC', 2); -select splitByRegexp('[ABC]', 'oneAtwoBthreeC', 3); -select splitByRegexp('[ABC]', 'oneAtwoBthreeC', 4); -select splitByRegexp('[ABC]', 'oneAtwoBthreeC', 5); - -SELECT alphaTokens('abca1abc'); -SELECT alphaTokens('abca1abc', -1); -SELECT alphaTokens('abca1abc', 0); -SELECT alphaTokens('abca1abc', 1); -SELECT alphaTokens('abca1abc', 2); -SELECT alphaTokens('abca1abc', 3); - -SELECT splitByAlpha('abca1abc'); - -SELECT splitByNonAlpha(' 1! a, b. '); -SELECT splitByNonAlpha(' 1! a, b. ', -1); -SELECT splitByNonAlpha(' 1! a, b. ', 0); -SELECT splitByNonAlpha(' 1! a, b. ', 1); -SELECT splitByNonAlpha(' 1! a, b. ', 2); -SELECT splitByNonAlpha(' 1! a, b. ', 3); -SELECT splitByNonAlpha(' 1! a, b. ', 4); - -SELECT splitByWhitespace(' 1! a, b. '); -SELECT splitByWhitespace(' 1! a, b. ', -1); -SELECT splitByWhitespace(' 1! a, b. ', 0); -SELECT splitByWhitespace(' 1! a, b. ', 1); -SELECT splitByWhitespace(' 1! a, b. ', 2); -SELECT splitByWhitespace(' 1! a, b. ', 3); -SELECT splitByWhitespace(' 1! a, b. 
', 4); - -SELECT splitByString(', ', '1, 2 3, 4,5, abcde'); -SELECT splitByString(', ', '1, 2 3, 4,5, abcde', -1); -SELECT splitByString(', ', '1, 2 3, 4,5, abcde', 0); -SELECT splitByString(', ', '1, 2 3, 4,5, abcde', 1); -SELECT splitByString(', ', '1, 2 3, 4,5, abcde', 2); -SELECT splitByString(', ', '1, 2 3, 4,5, abcde', 3); -SELECT splitByString(', ', '1, 2 3, 4,5, abcde', 4); -SELECT splitByString(', ', '1, 2 3, 4,5, abcde', 5); - - -select splitByChar(',', '1,2,3', ''); -- { serverError 43 } -select splitByRegexp('[ABC]', 'oneAtwoBthreeC', ''); -- { serverError 43 } +SELECT '-- negative tests'; +SELECT splitByChar(',', '1,2,3', ''); -- { serverError 43 } +SELECT splitByRegexp('[ABC]', 'oneAtwoBthreeC', ''); -- { serverError 43 } SELECT alphaTokens('abca1abc', ''); -- { serverError 43 } SELECT splitByAlpha('abca1abc', ''); -- { serverError 43 } SELECT splitByNonAlpha(' 1! a, b. ', ''); -- { serverError 43 } SELECT splitByWhitespace(' 1! a, b. ', ''); -- { serverError 43 } -SELECT splitByString(', ', '1, 2 3, 4,5, abcde', ''); -- { serverError 43 } \ No newline at end of file +SELECT splitByString(', ', '1, 2 3, 4,5, abcde', ''); -- { serverError 43 } + +SELECT '-- splitByChar'; +SELECT '-- (default)'; +SELECT splitByChar('=', 'a==b=c=d'); +SELECT splitByChar('=', 'a==b=c=d', -1); +SELECT splitByChar('=', 'a==b=c=d', 0); +SELECT splitByChar('=', 'a==b=c=d', 1); +SELECT splitByChar('=', 'a==b=c=d', 2); +SELECT splitByChar('=', 'a==b=c=d', 3); +SELECT splitByChar('=', 'a==b=c=d', 4); +SELECT splitByChar('=', 'a==b=c=d', 5); +SELECT splitByChar('=', 'a==b=c=d', 6); +SELECT '-- (include remainder)'; +SELECT splitByChar('=', 'a==b=c=d') SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByChar('=', 'a==b=c=d', -1) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByChar('=', 'a==b=c=d', 0) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByChar('=', 'a==b=c=d', 1) SETTINGS 
splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByChar('=', 'a==b=c=d', 2) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByChar('=', 'a==b=c=d', 3) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByChar('=', 'a==b=c=d', 4) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByChar('=', 'a==b=c=d', 5) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByChar('=', 'a==b=c=d', 6) SETTINGS splitby_max_substrings_includes_remaining_string = 1; + +SELECT '-- splitByString'; +SELECT '-- (default)'; +SELECT splitByString('', 'a==b=c=d') SETTINGS splitby_max_substrings_includes_remaining_string = 0; +SELECT splitByString('', 'a==b=c=d', -1) SETTINGS splitby_max_substrings_includes_remaining_string = 0; +SELECT splitByString('', 'a==b=c=d', 0) SETTINGS splitby_max_substrings_includes_remaining_string = 0; +SELECT splitByString('', 'a==b=c=d', 1) SETTINGS splitby_max_substrings_includes_remaining_string = 0; +SELECT splitByString('', 'a==b=c=d', 2) SETTINGS splitby_max_substrings_includes_remaining_string = 0; +SELECT splitByString('', 'a==b=c=d', 3) SETTINGS splitby_max_substrings_includes_remaining_string = 0; +SELECT splitByString('', 'a==b=c=d', 4) SETTINGS splitby_max_substrings_includes_remaining_string = 0; +SELECT splitByString('', 'a==b=c=d', 5) SETTINGS splitby_max_substrings_includes_remaining_string = 0; +SELECT splitByString('', 'a==b=c=d', 6) SETTINGS splitby_max_substrings_includes_remaining_string = 0; +SELECT splitByString('', 'a==b=c=d', 7) SETTINGS splitby_max_substrings_includes_remaining_string = 0; +SELECT splitByString('', 'a==b=c=d', 7) SETTINGS splitby_max_substrings_includes_remaining_string = 0; +SELECT splitByString('', 'a==b=c=d', 8) SETTINGS splitby_max_substrings_includes_remaining_string = 0; +SELECT splitByString('', 'a==b=c=d', 9) SETTINGS splitby_max_substrings_includes_remaining_string = 0; +SELECT splitByString('=', 
'a==b=c=d') SETTINGS splitby_max_substrings_includes_remaining_string = 0; +SELECT splitByString('=', 'a==b=c=d', -1) SETTINGS splitby_max_substrings_includes_remaining_string = 0; +SELECT splitByString('=', 'a==b=c=d', 0) SETTINGS splitby_max_substrings_includes_remaining_string = 0; +SELECT splitByString('=', 'a==b=c=d', 1) SETTINGS splitby_max_substrings_includes_remaining_string = 0; +SELECT splitByString('=', 'a==b=c=d', 2) SETTINGS splitby_max_substrings_includes_remaining_string = 0; +SELECT splitByString('=', 'a==b=c=d', 3) SETTINGS splitby_max_substrings_includes_remaining_string = 0; +SELECT splitByString('=', 'a==b=c=d', 4) SETTINGS splitby_max_substrings_includes_remaining_string = 0; +SELECT splitByString('=', 'a==b=c=d', 5) SETTINGS splitby_max_substrings_includes_remaining_string = 0; +SELECT splitByString('=', 'a==b=c=d', 6) SETTINGS splitby_max_substrings_includes_remaining_string = 0; +SELECT '-- (include remainder)'; +SELECT splitByString('', 'a==b=c=d') SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByString('', 'a==b=c=d', -1) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByString('', 'a==b=c=d', 0) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByString('', 'a==b=c=d', 1) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByString('', 'a==b=c=d', 2) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByString('', 'a==b=c=d', 3) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByString('', 'a==b=c=d', 4) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByString('', 'a==b=c=d', 5) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByString('', 'a==b=c=d', 6) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByString('', 'a==b=c=d', 7) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT 
splitByString('', 'a==b=c=d', 8) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByString('', 'a==b=c=d', 9) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByString('=', 'a==b=c=d') SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByString('=', 'a==b=c=d', -1) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByString('=', 'a==b=c=d', 0) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByString('=', 'a==b=c=d', 1) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByString('=', 'a==b=c=d', 2) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByString('=', 'a==b=c=d', 3) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByString('=', 'a==b=c=d', 4) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByString('=', 'a==b=c=d', 5) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByString('=', 'a==b=c=d', 6) SETTINGS splitby_max_substrings_includes_remaining_string = 1; + + +SELECT '-- splitByRegexp'; +SELECT '-- (default)'; +SELECT splitByRegexp('\\d+', 'a12bc23de345f'); +SELECT splitByRegexp('\\d+', 'a12bc23de345f', -1); +SELECT splitByRegexp('\\d+', 'a12bc23de345f', 0); +SELECT splitByRegexp('\\d+', 'a12bc23de345f', 1); +SELECT splitByRegexp('\\d+', 'a12bc23de345f', 2); +SELECT splitByRegexp('\\d+', 'a12bc23de345f', 3); +SELECT splitByRegexp('\\d+', 'a12bc23de345f', 4); +SELECT splitByRegexp('\\d+', 'a12bc23de345f', 5); +SELECT splitByRegexp('', 'a12bc23de345f'); +SELECT splitByRegexp('', 'a12bc23de345f', -1); +SELECT splitByRegexp('', 'a12bc23de345f', 0); +SELECT splitByRegexp('', 'a12bc23de345f', 1); +SELECT splitByRegexp('', 'a12bc23de345f', 2); +SELECT splitByRegexp('', 'a12bc23de345f', 3); +SELECT splitByRegexp('', 'a12bc23de345f', 4); +SELECT splitByRegexp('', 'a12bc23de345f', 5); +SELECT '-- (include 
remainder)'; +SELECT splitByRegexp('', 'a12bc23de345f') SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByRegexp('', 'a12bc23de345f', -1) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByRegexp('', 'a12bc23de345f', 0) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByRegexp('', 'a12bc23de345f', 1) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByRegexp('', 'a12bc23de345f', 2) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByRegexp('', 'a12bc23de345f', 3) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByRegexp('', 'a12bc23de345f', 4) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByRegexp('', 'a12bc23de345f', 5) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByRegexp('\\d+', 'a12bc23de345f') SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByRegexp('\\d+', 'a12bc23de345f', -1) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByRegexp('\\d+', 'a12bc23de345f', 0) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByRegexp('\\d+', 'a12bc23de345f', 1) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByRegexp('\\d+', 'a12bc23de345f', 2) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByRegexp('\\d+', 'a12bc23de345f', 3) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByRegexp('\\d+', 'a12bc23de345f', 4) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByRegexp('\\d+', 'a12bc23de345f', 5) SETTINGS splitby_max_substrings_includes_remaining_string = 1; + +SELECT '-- splitByAlpha'; +SELECT '-- (default)'; +SELECT splitByAlpha('ab.cd.ef.gh'); +SELECT splitByAlpha('ab.cd.ef.gh', -1); +SELECT splitByAlpha('ab.cd.ef.gh', 0); +SELECT splitByAlpha('ab.cd.ef.gh', 1); 
+SELECT splitByAlpha('ab.cd.ef.gh', 2); +SELECT splitByAlpha('ab.cd.ef.gh', 3); +SELECT splitByAlpha('ab.cd.ef.gh', 4); +SELECT splitByAlpha('ab.cd.ef.gh', 5); +SELECT '-- (include remainder)'; +SELECT splitByAlpha('ab.cd.ef.gh') SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByAlpha('ab.cd.ef.gh', -1) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByAlpha('ab.cd.ef.gh', 0) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByAlpha('ab.cd.ef.gh', 1) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByAlpha('ab.cd.ef.gh', 2) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByAlpha('ab.cd.ef.gh', 3) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByAlpha('ab.cd.ef.gh', 4) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByAlpha('ab.cd.ef.gh', 5) SETTINGS splitby_max_substrings_includes_remaining_string = 1; + +SELECT '-- splitByNonAlpha'; +SELECT '-- (default)'; +SELECT splitByNonAlpha('128.0.0.1'); +SELECT splitByNonAlpha('128.0.0.1', -1); +SELECT splitByNonAlpha('128.0.0.1', 0); +SELECT splitByNonAlpha('128.0.0.1', 1); +SELECT splitByNonAlpha('128.0.0.1', 2); +SELECT splitByNonAlpha('128.0.0.1', 3); +SELECT splitByNonAlpha('128.0.0.1', 4); +SELECT splitByNonAlpha('128.0.0.1', 5); +SELECT '-- (include remainder)'; +SELECT splitByNonAlpha('128.0.0.1') SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByNonAlpha('128.0.0.1', -1) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByNonAlpha('128.0.0.1', 0) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByNonAlpha('128.0.0.1', 1) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByNonAlpha('128.0.0.1', 2) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByNonAlpha('128.0.0.1', 3) SETTINGS 
splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByNonAlpha('128.0.0.1', 4) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByNonAlpha('128.0.0.1', 5) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +-- +-- +SELECT '-- splitByWhitespace'; +SELECT '-- (default)'; +SELECT splitByWhitespace('Nein, nein, nein! Doch!'); +SELECT splitByWhitespace('Nein, nein, nein! Doch!', -1); +SELECT splitByWhitespace('Nein, nein, nein! Doch!', 0); +SELECT splitByWhitespace('Nein, nein, nein! Doch!', 1); +SELECT splitByWhitespace('Nein, nein, nein! Doch!', 2); +SELECT splitByWhitespace('Nein, nein, nein! Doch!', 3); +SELECT splitByWhitespace('Nein, nein, nein! Doch!', 4); +SELECT splitByWhitespace('Nein, nein, nein! Doch!', 5); +SELECT '-- (include remainder)'; +SELECT splitByWhitespace('Nein, nein, nein! Doch!') SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByWhitespace('Nein, nein, nein! Doch!', -1) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByWhitespace('Nein, nein, nein! Doch!', 0) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByWhitespace('Nein, nein, nein! Doch!', 1) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByWhitespace('Nein, nein, nein! Doch!', 2) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByWhitespace('Nein, nein, nein! Doch!', 3) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByWhitespace('Nein, nein, nein! Doch!', 4) SETTINGS splitby_max_substrings_includes_remaining_string = 1; +SELECT splitByWhitespace('Nein, nein, nein! 
Doch!', 5) SETTINGS splitby_max_substrings_includes_remaining_string = 1; diff --git a/tests/queries/0_stateless/02876_splitby_max_substring_behavior.reference b/tests/queries/0_stateless/02876_splitby_max_substring_behavior.reference deleted file mode 100644 index 9966c7d090e..00000000000 --- a/tests/queries/0_stateless/02876_splitby_max_substring_behavior.reference +++ /dev/null @@ -1,126 +0,0 @@ --- splitByAlpha -['ab','cd','ef','gh'] -['ab','cd','ef','gh'] -['ab','cd','ef','gh'] -['ab'] -['ab','cd'] -['ab','cd','ef','gh'] -['ab','cd','ef','gh'] -['ab.cd.ef.gh'] -['ab','cd.ef.gh'] -['ab','cd','ef.gh'] -['ab','cd','ef','gh'] -['ab','cd','ef','gh'] -['ab','cd','ef','gh'] -['ab.cd.ef.gh'] -['ab','cd.ef.gh'] --- splitByNonAlpha -['128','0','0','1'] -['128','0','0','1'] -['128','0','0','1'] -['128'] -['128','0'] -['128','0','0','1'] -['128','0','0','1'] -['128.0.0.1'] -['128','0.0.1'] -['128','0','0.1'] -['128','0','0','1'] -['128','0','0','1'] -['128','0','0','1'] -['128.0.0.1'] -['128','0.0.1'] --- splitByWhitespace -['Nein,','nein,','nein!','Doch!'] -['Nein,','nein,','nein!','Doch!'] -['Nein,','nein,','nein!','Doch!'] -['Nein,'] -['Nein,','nein,'] -['Nein,','nein,','nein!','Doch!'] -['Nein,','nein,','nein!','Doch!'] -['Nein, nein, nein! Doch!'] -['Nein,','nein, nein! Doch!'] -['Nein,','nein,','nein! Doch!'] -['Nein,','nein,','nein!','Doch!'] -['Nein,','nein,','nein!','Doch!'] -['Nein,','nein,','nein!','Doch!'] -['Nein, nein, nein! Doch!'] -['Nein,','nein, nein! 
Doch!'] --- splitByChar -['a','','b','c','d'] -['a','','b','c','d'] -['a','','b','c','d'] -['a'] -['a',''] -['a','','b','c','d'] -['a','','b','c','d'] -['a==b=c=d'] -['a','=b=c=d'] -['a','','b=c=d'] -['a','','b','c','d'] -['a','','b','c','d'] -['a','','b','c','d'] -['a==b=c=d'] -['a','=b=c=d'] --- splitByString -['a','b=c=d'] -['a','b=c=d'] -['a','b=c=d'] -['a'] -['a','b=c=d'] -['a','b=c=d'] -['a','b=c=d'] -['a==b=c=d'] -['a','b=c=d'] -['a','b=c=d'] -['a','b=c=d'] -['a','b=c=d'] -['a','b=c=d'] -['a==b=c=d'] -['a','b=c=d'] -['a','=','=','b','=','c','=','d'] -['a','=','=','b','=','c','=','d'] -['a','=','=','b','=','c','=','d'] -['a'] -['a','='] -['a','=','=','b','=','c','=','d'] -['a','=','=','b','=','c','=','d'] -['a==b=c=d'] -['a','==b=c=d'] -['a','=','=b=c=d'] -['a','=','=','b','=','c','=','d'] -['a','=','=','b','=','c','=','d'] -['a','=','=','b','=','c','=','d'] -['a==b=c=d'] -['a','==b=c=d'] --- splitByRegexp -['a','bc','de','f'] -['a','bc','de','f'] -['a','bc','de','f'] -['a'] -['a','bc'] -['a','bc','de','f'] -['a','bc','de','f'] -['a12bc23de345f'] -['a','bc23de345f'] -['a','bc','de345f'] -['a','bc','de','f'] -['a','bc','de','f'] -['a','bc','de','f'] -['a12bc23de345f'] -['a','bc23de345f'] -['a','1','2','b','c','2','3','d','e','3','4','5','f'] -['a','1','2','b','c','2','3','d','e','3','4','5','f'] -['a','1','2','b','c','2','3','d','e','3','4','5','f'] -['a'] -['a','1'] -['a','1','2','b','c','2','3','d','e','3','4','5','f'] -['a','1','2','b','c','2','3','d','e','3','4','5','f'] -['a12bc23de345f'] -['a','12bc23de345f'] -['a','1','2bc23de345f'] -['a','1','2','b','c','2','3','d','e','3','4','5','f'] -['a','1','2','b','c','2','3','d','e','3','4','5','f'] -['a','1','2','b','c','2','3','d','e','3','4','5','f'] -['a12bc23de345f'] -['a','12bc23de345f'] diff --git a/tests/queries/0_stateless/02876_splitby_max_substring_behavior.sql b/tests/queries/0_stateless/02876_splitby_max_substring_behavior.sql deleted file mode 100644 index 1dcad65f09b..00000000000 --- 
a/tests/queries/0_stateless/02876_splitby_max_substring_behavior.sql +++ /dev/null @@ -1,151 +0,0 @@ -SELECT '-- splitByAlpha'; -SELECT splitByAlpha('ab.cd.ef.gh') SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByAlpha('ab.cd.ef.gh', -1) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByAlpha('ab.cd.ef.gh', 0) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByAlpha('ab.cd.ef.gh', 1) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByAlpha('ab.cd.ef.gh', 2) SETTINGS splitby_max_substring_behavior = ''; - -SELECT splitByAlpha('ab.cd.ef.gh') SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByAlpha('ab.cd.ef.gh', -1) SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByAlpha('ab.cd.ef.gh', 0) SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByAlpha('ab.cd.ef.gh', 1) SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByAlpha('ab.cd.ef.gh', 2) SETTINGS splitby_max_substring_behavior = 'python'; - -SELECT splitByAlpha('ab.cd.ef.gh') SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByAlpha('ab.cd.ef.gh', -1) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByAlpha('ab.cd.ef.gh', 0) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByAlpha('ab.cd.ef.gh', 1) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByAlpha('ab.cd.ef.gh', 2) SETTINGS splitby_max_substring_behavior = 'spark'; - -SELECT '-- splitByNonAlpha'; -SELECT splitByNonAlpha('128.0.0.1') SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByNonAlpha('128.0.0.1', -1) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByNonAlpha('128.0.0.1', 0) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByNonAlpha('128.0.0.1', 1) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByNonAlpha('128.0.0.1', 2) SETTINGS splitby_max_substring_behavior = ''; - -SELECT splitByNonAlpha('128.0.0.1') SETTINGS splitby_max_substring_behavior = 
'python'; -SELECT splitByNonAlpha('128.0.0.1', -1) SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByNonAlpha('128.0.0.1', 0) SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByNonAlpha('128.0.0.1', 1) SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByNonAlpha('128.0.0.1', 2) SETTINGS splitby_max_substring_behavior = 'python'; - -SELECT splitByNonAlpha('128.0.0.1') SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByNonAlpha('128.0.0.1', -1) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByNonAlpha('128.0.0.1', 0) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByNonAlpha('128.0.0.1', 1) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByNonAlpha('128.0.0.1', 2) SETTINGS splitby_max_substring_behavior = 'spark'; - -SELECT '-- splitByWhitespace'; -SELECT splitByWhitespace('Nein, nein, nein! Doch!') SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByWhitespace('Nein, nein, nein! Doch!', -1) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByWhitespace('Nein, nein, nein! Doch!', 0) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByWhitespace('Nein, nein, nein! Doch!', 1) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByWhitespace('Nein, nein, nein! Doch!', 2) SETTINGS splitby_max_substring_behavior = ''; - -SELECT splitByWhitespace('Nein, nein, nein! Doch!') SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByWhitespace('Nein, nein, nein! Doch!', -1) SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByWhitespace('Nein, nein, nein! Doch!', 0) SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByWhitespace('Nein, nein, nein! Doch!', 1) SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByWhitespace('Nein, nein, nein! Doch!', 2) SETTINGS splitby_max_substring_behavior = 'python'; - -SELECT splitByWhitespace('Nein, nein, nein! 
Doch!') SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByWhitespace('Nein, nein, nein! Doch!', -1) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByWhitespace('Nein, nein, nein! Doch!', 0) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByWhitespace('Nein, nein, nein! Doch!', 1) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByWhitespace('Nein, nein, nein! Doch!', 2) SETTINGS splitby_max_substring_behavior = 'spark'; - -SELECT '-- splitByChar'; -SELECT splitByChar('=', 'a==b=c=d') SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByChar('=', 'a==b=c=d', -1) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByChar('=', 'a==b=c=d', 0) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByChar('=', 'a==b=c=d', 1) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByChar('=', 'a==b=c=d', 2) SETTINGS splitby_max_substring_behavior = ''; - -SELECT splitByChar('=', 'a==b=c=d') SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByChar('=', 'a==b=c=d', -1) SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByChar('=', 'a==b=c=d', 0) SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByChar('=', 'a==b=c=d', 1) SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByChar('=', 'a==b=c=d', 2) SETTINGS splitby_max_substring_behavior = 'python'; - -SELECT splitByChar('=', 'a==b=c=d') SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByChar('=', 'a==b=c=d', -1) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByChar('=', 'a==b=c=d', 0) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByChar('=', 'a==b=c=d', 1) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByChar('=', 'a==b=c=d', 2) SETTINGS splitby_max_substring_behavior = 'spark'; - -SELECT '-- splitByString'; - -SELECT splitByString('==', 'a==b=c=d') SETTINGS splitby_max_substring_behavior = ''; -SELECT 
splitByString('==', 'a==b=c=d', -1) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByString('==', 'a==b=c=d', 0) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByString('==', 'a==b=c=d', 1) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByString('==', 'a==b=c=d', 2) SETTINGS splitby_max_substring_behavior = ''; - -SELECT splitByString('==', 'a==b=c=d') SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByString('==', 'a==b=c=d', -1) SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByString('==', 'a==b=c=d', 0) SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByString('==', 'a==b=c=d', 1) SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByString('==', 'a==b=c=d', 2) SETTINGS splitby_max_substring_behavior = 'python'; - -SELECT splitByString('==', 'a==b=c=d') SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByString('==', 'a==b=c=d', -1) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByString('==', 'a==b=c=d', 0) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByString('==', 'a==b=c=d', 1) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByString('==', 'a==b=c=d', 2) SETTINGS splitby_max_substring_behavior = 'spark'; - -SELECT splitByString('', 'a==b=c=d') SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByString('', 'a==b=c=d', -1) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByString('', 'a==b=c=d', 0) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByString('', 'a==b=c=d', 1) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByString('', 'a==b=c=d', 2) SETTINGS splitby_max_substring_behavior = ''; - -SELECT splitByString('', 'a==b=c=d') SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByString('', 'a==b=c=d', -1) SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByString('', 'a==b=c=d', 0) SETTINGS splitby_max_substring_behavior = 
'python'; -SELECT splitByString('', 'a==b=c=d', 1) SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByString('', 'a==b=c=d', 2) SETTINGS splitby_max_substring_behavior = 'python'; - -SELECT splitByString('', 'a==b=c=d') SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByString('', 'a==b=c=d', -1) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByString('', 'a==b=c=d', 0) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByString('', 'a==b=c=d', 1) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByString('', 'a==b=c=d', 2) SETTINGS splitby_max_substring_behavior = 'spark'; - -SELECT '-- splitByRegexp'; - -SELECT splitByRegexp('\\d+', 'a12bc23de345f') SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByRegexp('\\d+', 'a12bc23de345f', -1) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByRegexp('\\d+', 'a12bc23de345f', 0) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByRegexp('\\d+', 'a12bc23de345f', 1) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByRegexp('\\d+', 'a12bc23de345f', 2) SETTINGS splitby_max_substring_behavior = ''; - -SELECT splitByRegexp('\\d+', 'a12bc23de345f') SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByRegexp('\\d+', 'a12bc23de345f', -1) SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByRegexp('\\d+', 'a12bc23de345f', 0) SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByRegexp('\\d+', 'a12bc23de345f', 1) SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByRegexp('\\d+', 'a12bc23de345f', 2) SETTINGS splitby_max_substring_behavior = 'python'; - -SELECT splitByRegexp('\\d+', 'a12bc23de345f') SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByRegexp('\\d+', 'a12bc23de345f', -1) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByRegexp('\\d+', 'a12bc23de345f', 0) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT 
splitByRegexp('\\d+', 'a12bc23de345f', 1) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByRegexp('\\d+', 'a12bc23de345f', 2) SETTINGS splitby_max_substring_behavior = 'spark'; - -SELECT splitByRegexp('', 'a12bc23de345f') SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByRegexp('', 'a12bc23de345f', -1) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByRegexp('', 'a12bc23de345f', 0) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByRegexp('', 'a12bc23de345f', 1) SETTINGS splitby_max_substring_behavior = ''; -SELECT splitByRegexp('', 'a12bc23de345f', 2) SETTINGS splitby_max_substring_behavior = ''; - -SELECT splitByRegexp('', 'a12bc23de345f') SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByRegexp('', 'a12bc23de345f', -1) SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByRegexp('', 'a12bc23de345f', 0) SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByRegexp('', 'a12bc23de345f', 1) SETTINGS splitby_max_substring_behavior = 'python'; -SELECT splitByRegexp('', 'a12bc23de345f', 2) SETTINGS splitby_max_substring_behavior = 'python'; - -SELECT splitByRegexp('', 'a12bc23de345f') SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByRegexp('', 'a12bc23de345f', -1) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByRegexp('', 'a12bc23de345f', 0) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByRegexp('', 'a12bc23de345f', 1) SETTINGS splitby_max_substring_behavior = 'spark'; -SELECT splitByRegexp('', 'a12bc23de345f', 2) SETTINGS splitby_max_substring_behavior = 'spark';