mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-11-26 17:41:59 +00:00
Merge pull request #59088 from Blargian/#31363_format_template_configure_in_settings
#31363 format template configure in settings
commit c8c9adbd8b
@ -253,7 +253,7 @@ This format is also available under the name `TSVRawWithNamesAndNames`.

This format allows specifying a custom format string with placeholders for values with a specified escaping rule.

It uses settings `format_template_resultset`, `format_template_row`, `format_template_rows_between_delimiter` and some settings of other formats (e.g. `output_format_json_quote_64bit_integers` when using `JSON` escaping, see further).
It uses settings `format_template_resultset` (`format_template_resultset_format`), `format_template_row` (`format_template_row_format`), `format_template_rows_between_delimiter` and some settings of other formats (e.g. `output_format_json_quote_64bit_integers` when using `JSON` escaping, see further).

Setting `format_template_row` specifies the path to the file containing format strings for rows with the following syntax:
@ -279,9 +279,11 @@ the values of `SearchPhrase`, `c` and `price` columns, which are escaped as `Quo

`Search phrase: 'bathroom interior design', count: 2166, ad price: $3;`

In cases where it is difficult or impossible to deploy the format configuration file for the Template format to a directory on every node of a cluster, or where the format is trivial, `format_template_row_format` can be used to set the template string directly in the query instead of a path to the file that contains it.

The `format_template_rows_between_delimiter` setting specifies the delimiter between rows, which is printed (or expected) after every row except the last one (`\n` by default).

Setting `format_template_resultset` specifies the path to the file that contains the format string for the result set. The format string for the result set has the same syntax as the format string for rows and allows specifying a prefix, a suffix and a way to print some additional information. It contains the following placeholders instead of column names:
Setting `format_template_resultset` specifies the path to the file that contains the format string for the result set. Setting `format_template_resultset_format` can be used to set the template string for the result set directly in the query itself. The format string for the result set has the same syntax as the format string for rows and allows specifying a prefix, a suffix and a way to print some additional information. It contains the following placeholders instead of column names:

- `data` contains the rows with data in `format_template_row` format, separated by `format_template_rows_between_delimiter`. This placeholder must be the first placeholder in the format string.
- `totals` is the row with total values in `format_template_row` format (when using WITH TOTALS)
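The placeholder rules described above (`${column:Rule}` in the row format, rows separated by `format_template_rows_between_delimiter`, and the row block spliced into `${data}` of the result-set format) can be illustrated with a small C++ sketch. It is not ClickHouse's implementation: `Row`, `quoted`, `renderRow` and `renderResultset` are hypothetical names, only the `Quoted` and `Raw` escaping rules are modelled (for values already converted to text), `$$` is assumed to stand for a literal `$`, and the result-set format is assumed to be just a prefix, one `${data}` and a suffix.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical row type for this sketch: column name -> value already converted to text.
using Row = std::map<std::string, std::string>;

// Simplified `Quoted` rule for string values: wrap in single quotes and
// backslash-escape ' and \. (The real rule also depends on the column type.)
std::string quoted(const std::string & value)
{
    std::string out = "'";
    for (char c : value)
    {
        if (c == '\'' || c == '\\')
            out += '\\';
        out += c;
    }
    out += '\'';
    return out;
}

// Expand every `${column:Rule}` placeholder in a row format string.
// `$$` is treated as a literal `$` (an assumption of this sketch).
std::string renderRow(const std::string & format, const Row & row)
{
    std::string out;
    std::size_t i = 0;
    while (i < format.size())
    {
        if (format[i] != '$')
        {
            out += format[i++];  // delimiter text is copied verbatim
            continue;
        }
        if (i + 1 < format.size() && format[i + 1] == '$')
        {
            out += '$';
            i += 2;
            continue;
        }
        if (i + 1 >= format.size() || format[i + 1] != '{')
            throw std::runtime_error("expected '{' or '$' after '$'");
        const std::size_t close = format.find('}', i + 2);
        if (close == std::string::npos)
            throw std::runtime_error("unterminated placeholder");
        const std::string body = format.substr(i + 2, close - i - 2);
        const std::size_t colon = body.find(':');
        if (colon == std::string::npos)
            throw std::runtime_error("this sketch needs an explicit escaping rule: " + body);
        const std::string & value = row.at(body.substr(0, colon));  // throws if the column is unknown
        const std::string rule = body.substr(colon + 1);
        if (rule == "Quoted")
            out += quoted(value);
        else if (rule == "Raw")
            out += value;
        else
            throw std::runtime_error("escaping rule not modelled in this sketch: " + rule);
        i = close + 1;
    }
    return out;
}

// Render all rows with the delimiter between them (i.e. after every row except
// the last) and splice the block into the single `${data}` placeholder.
std::string renderResultset(const std::string & resultset_format, const std::string & row_format,
                            const std::string & delimiter, const std::vector<Row> & rows)
{
    std::string data;
    for (std::size_t r = 0; r < rows.size(); ++r)
    {
        if (r > 0)
            data += delimiter;
        data += renderRow(row_format, rows[r]);
    }
    const std::string placeholder = "${data}";
    const std::size_t pos = resultset_format.find(placeholder);
    if (pos == std::string::npos)
        throw std::runtime_error("result-set format must contain ${data}");
    return resultset_format.substr(0, pos) + data + resultset_format.substr(pos + placeholder.size());
}
```

Given the row template and the `;\n` delimiter used in the test script of this commit, `renderResultset` produces the same row lines as the expected reference output; totals and other result-set placeholders are deliberately left out of the sketch.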
@ -1662,6 +1662,10 @@ Result:

Path to file which contains format string for result set (for Template format).

### format_template_resultset_format {#format_template_resultset_format}

Format string for the result set (for Template format), specified directly in the query instead of in a file. Cannot be combined with `format_template_resultset`.

### format_template_row {#format_template_row}

Path to file which contains format string for rows (for Template format).
@ -1670,6 +1674,10 @@ Path to file which contains format string for rows (for Template format).

Delimiter between rows (for Template format).

### format_template_row_format {#format_template_row_format}

Format string for rows (for Template format), specified directly in the query instead of in a file. Cannot be combined with `format_template_row`.

## CustomSeparated format settings {#custom-separated-format-settings}

### format_custom_escaping_rule {#format_custom_escaping_rule}
@ -201,7 +201,7 @@ SELECT * FROM nestedt FORMAT TSV

This format allows specifying a custom format string into which values, serialized with the selected escaping rule, are substituted.

For this, the settings `format_template_resultset`, `format_template_row`, `format_template_rows_between_delimiter` and the escaping settings of other formats are used (for example, `output_format_json_quote_64bit_integers` when escaping as in `JSON`, see below).
For this, the settings `format_template_resultset` (`format_template_resultset_format`), `format_template_row` (`format_template_row_format`), `format_template_rows_between_delimiter` and the escaping settings of other formats are used (for example, `output_format_json_quote_64bit_integers` when escaping as in `JSON`, see below).

The `format_template_row` setting specifies the path to the file containing the format string for table rows, which must have the form:

@ -227,9 +227,11 @@ SELECT * FROM nestedt FORMAT TSV

`Search phrase: 'bathroom interior design', count: 2166, ad price: $3;`

In cases where it is inconvenient or impossible to put the format string in a file, `format_template_row_format` can be used to specify the format string directly in the query.

The `format_template_rows_between_delimiter` setting specifies the delimiter between rows, which is printed (or expected on input) after every row except the last one. The default is `\n`.

The `format_template_resultset` setting specifies the path to the file containing the format string for the result. The format string for the result has the same syntax as the format string for table rows and allows specifying a prefix, a suffix and a way to print additional information. Instead of column names, it contains the following placeholder names:
The `format_template_resultset` setting specifies the path to the file containing the format string for the result. The `format_template_resultset_format` setting is used to set the format string for the result directly in the query. The format string for the result has the same syntax as the format string for table rows and allows specifying a prefix, a suffix and a way to print additional information. Instead of column names, it contains the following placeholder names:

- `data` - the rows with data in `format_template_row` format, separated by `format_template_rows_between_delimiter`. This placeholder must be the first placeholder in the format string.
- `totals` - the row with total values in `format_template_row` format (when using WITH TOTALS)
@ -1093,6 +1093,8 @@ class IColumn;
M(String, format_schema, "", "Schema identifier (used by schema-based formats)", 0) \
M(String, format_template_resultset, "", "Path to file which contains format string for result set (for Template format)", 0) \
M(String, format_template_row, "", "Path to file which contains format string for rows (for Template format)", 0) \
M(String, format_template_row_format, "", "Format string for rows (for Template format)", 0) \
M(String, format_template_resultset_format, "", "Format string for result set (for Template format)", 0) \
M(String, format_template_rows_between_delimiter, "\n", "Delimiter between rows (for Template format)", 0) \
\
M(EscapingRule, format_custom_escaping_rule, "Escaped", "Field escaping rule (for CustomSeparated format)", 0) \
@ -91,6 +91,8 @@ static std::map<ClickHouseVersion, SettingsChangesHistory::SettingsChanges> sett
{"async_insert_busy_timeout_max_ms", 200, 200, "The minimum value of the asynchronous insert timeout in milliseconds; async_insert_busy_timeout_ms is aliased to async_insert_busy_timeout_max_ms"},
{"async_insert_busy_timeout_increase_rate", 0.2, 0.2, "The exponential growth rate at which the adaptive asynchronous insert timeout increases"},
{"async_insert_busy_timeout_decrease_rate", 0.2, 0.2, "The exponential growth rate at which the adaptive asynchronous insert timeout decreases"},
{"format_template_row_format", "", "", "Template row format string can be set directly in query"},
{"format_template_resultset_format", "", "", "Template result set format string can be set in query"},
{"split_parts_ranges_into_intersecting_and_non_intersecting_final", true, true, "Allow to split parts ranges into intersecting and non intersecting during FINAL optimization"},
{"split_intersecting_parts_ranges_into_layers_final", true, true, "Allow to split intersecting parts ranges into layers during FINAL optimization"},
{"azure_max_single_part_copy_size", 256*1024*1024, 256*1024*1024, "The maximum size of object to copy using single part copy to Azure blob storage."},
@ -166,6 +166,8 @@ FormatSettings getFormatSettings(ContextPtr context, const Settings & settings)
format_settings.template_settings.resultset_format = settings.format_template_resultset;
format_settings.template_settings.row_between_delimiter = settings.format_template_rows_between_delimiter;
format_settings.template_settings.row_format = settings.format_template_row;
format_settings.template_settings.row_format_template = settings.format_template_row_format;
format_settings.template_settings.resultset_format_template = settings.format_template_resultset_format;
format_settings.tsv.crlf_end_of_line = settings.output_format_tsv_crlf_end_of_line;
format_settings.tsv.empty_as_default = settings.input_format_tsv_empty_as_default;
format_settings.tsv.enum_as_number = settings.input_format_tsv_enum_as_number;
@ -338,6 +338,8 @@ struct FormatSettings
String resultset_format;
String row_format;
String row_between_delimiter;
String row_format_template;
String resultset_format_template;
} template_settings;

struct
@ -11,6 +11,7 @@ namespace DB
namespace ErrorCodes
{
extern const int SYNTAX_ERROR;
extern const int INVALID_TEMPLATE_FORMAT;
}

TemplateBlockOutputFormat::TemplateBlockOutputFormat(const Block & header_, WriteBuffer & out_, const FormatSettings & settings_,
@ -193,13 +194,25 @@ void registerOutputFormatTemplate(FormatFactory & factory)
const FormatSettings & settings)
{
ParsedTemplateFormatString resultset_format;
auto idx_resultset_by_name = [&](const String & partName)
{
return static_cast<size_t>(TemplateBlockOutputFormat::stringToResultsetPart(partName));
};
if (settings.template_settings.resultset_format.empty())
{
/// Default format string: "${data}"
resultset_format.delimiters.resize(2);
resultset_format.escaping_rules.emplace_back(ParsedTemplateFormatString::EscapingRule::None);
resultset_format.format_idx_to_column_idx.emplace_back(0);
resultset_format.column_names.emplace_back("data");
if (settings.template_settings.resultset_format_template.empty())
{
resultset_format.delimiters.resize(2);
resultset_format.escaping_rules.emplace_back(ParsedTemplateFormatString::EscapingRule::None);
resultset_format.format_idx_to_column_idx.emplace_back(0);
resultset_format.column_names.emplace_back("data");
}
else
{
resultset_format = ParsedTemplateFormatString();
resultset_format.parse(settings.template_settings.resultset_format_template, idx_resultset_by_name);
}
}
else
{
@ -207,20 +220,34 @@ void registerOutputFormatTemplate(FormatFactory & factory)
resultset_format = ParsedTemplateFormatString(
FormatSchemaInfo(settings.template_settings.resultset_format, "Template", false,
settings.schema.is_server, settings.schema.format_schema_path),
[&](const String & partName)
{
return static_cast<size_t>(TemplateBlockOutputFormat::stringToResultsetPart(partName));
});
idx_resultset_by_name);
if (!settings.template_settings.resultset_format_template.empty())
{
throw Exception(DB::ErrorCodes::INVALID_TEMPLATE_FORMAT, "Expected either format_template_resultset or format_template_resultset_format, but not both");
}
}

ParsedTemplateFormatString row_format = ParsedTemplateFormatString(
ParsedTemplateFormatString row_format;
auto idx_row_by_name = [&](const String & colName)
{
return sample.getPositionByName(colName);
};
if (settings.template_settings.row_format.empty())
{
row_format = ParsedTemplateFormatString();
row_format.parse(settings.template_settings.row_format_template, idx_row_by_name);
}
else
{
row_format = ParsedTemplateFormatString(
FormatSchemaInfo(settings.template_settings.row_format, "Template", false,
settings.schema.is_server, settings.schema.format_schema_path),
[&](const String & colName)
{
return sample.getPositionByName(colName);
});

idx_row_by_name);
if (!settings.template_settings.row_format_template.empty())
{
throw Exception(DB::ErrorCodes::INVALID_TEMPLATE_FORMAT, "Expected either format_template_row or format_template_row_format, but not both");
}
}
return std::make_shared<TemplateBlockOutputFormat>(sample, buf, settings, resultset_format, row_format, settings.template_settings.row_between_delimiter);
});
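The branching added to `registerOutputFormatTemplate` applies one rule to both the row and the result-set template: a file-path setting and its inline `*_format` counterpart are mutually exclusive. The C++ sketch below isolates that rule on plain strings. `resolveTemplate` and `TemplateFileReader` are hypothetical names, not ClickHouse APIs. Two differences from the diff are deliberate: the diff reads and parses the file before it detects the conflict, while the sketch checks first; and for rows the diff has no default (it parses the possibly empty inline string), which an empty `fallback` argument models.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical callback returning the contents of a template file; in the diff
// this role is played by FormatSchemaInfo together with ParsedTemplateFormatString.
using TemplateFileReader = std::function<std::string(const std::string &)>;

// Decide which template string to use for one template kind (rows or result set).
std::string resolveTemplate(const std::string & path_setting,
                            const std::string & inline_setting,
                            const std::string & fallback,
                            const TemplateFileReader & read_file)
{
    // Both sources set: ambiguous, so reject (INVALID_TEMPLATE_FORMAT in the diff).
    if (!path_setting.empty() && !inline_setting.empty())
        throw std::invalid_argument("Expected either a file path or an inline format string, but not both");
    // Only the file-path setting is set: load the template from the file.
    if (!path_setting.empty())
        return read_file(path_setting);
    // Only the inline setting is set: use the string supplied in the query.
    if (!inline_setting.empty())
        return inline_setting;
    // Neither is set: use the caller's default ("${data}" for the result set).
    return fallback;
}
```

Keeping the check in one helper means the row and result-set paths cannot drift apart in how they treat conflicting settings.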
@ -0,0 +1,9 @@
Question: 'How awesome is clickhouse?', Answer: 'unbelievably awesome!', Number of Likes: 456, Date: 2016-01-02;
Question: 'How fast is clickhouse?', Answer: 'Lightning fast!', Number of Likes: 9876543210, Date: 2016-01-03;
Question: 'Is it opensource?', Answer: 'of course it is!', Number of Likes: 789, Date: 2016-01-04

===== Results =====
Question: 'How awesome is clickhouse?', Answer: 'unbelievably awesome!', Number of Likes: 456, Date: 2016-01-02;
Question: 'How fast is clickhouse?', Answer: 'Lightning fast!', Number of Likes: 9876543210, Date: 2016-01-03;
Question: 'Is it opensource?', Answer: 'of course it is!', Number of Likes: 789, Date: 2016-01-04
===================
49
tests/queries/0_stateless/00937_format_schema_rows_template.sh
Executable file
@ -0,0 +1,49 @@
#!/usr/bin/env bash
# shellcheck disable=SC2016

CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
# shellcheck source=../shell_config.sh
. "$CURDIR"/../shell_config.sh

# Test format_template_row_format setting

$CLICKHOUSE_CLIENT --query="DROP TABLE IF EXISTS template";
$CLICKHOUSE_CLIENT --query="CREATE TABLE template (question String, answer String, likes UInt64, date Date) ENGINE = Memory";
$CLICKHOUSE_CLIENT --query="INSERT INTO template VALUES
('How awesome is clickhouse?', 'unbelievably awesome!', 456, '2016-01-02'),\
('How fast is clickhouse?', 'Lightning fast!', 9876543210, '2016-01-03'),\
('Is it opensource?', 'of course it is!', 789, '2016-01-04')";

$CLICKHOUSE_CLIENT --query="SELECT * FROM template GROUP BY question, answer, likes, date WITH TOTALS ORDER BY date LIMIT 3 FORMAT Template SETTINGS \
format_template_row_format = 'Question: \${question:Quoted}, Answer: \${answer:Quoted}, Number of Likes: \${likes:Raw}, Date: \${date:Raw}', \
format_template_rows_between_delimiter = ';\n'";

echo -e "\n"

# Test that if both format_template_row_format setting and format_template_row are provided, an error is thrown
row_format_file="$CURDIR"/"${CLICKHOUSE_TEST_UNIQUE_NAME}"_template_output_format_row.tmp
echo -ne 'Question: ${question:Quoted}, Answer: ${answer:Quoted}, Number of Likes: ${likes:Raw}, Date: ${date:Raw}' > "$row_format_file"
$CLICKHOUSE_CLIENT --multiline --multiquery --query "SELECT * FROM template GROUP BY question, answer, likes, date WITH TOTALS ORDER BY date LIMIT 3 FORMAT Template SETTINGS \
format_template_row = '$row_format_file', \
format_template_row_format = 'Question: \${question:Quoted}, Answer: \${answer:Quoted}, Number of Likes: \${likes:Raw}, Date: \${date:Raw}', \
format_template_rows_between_delimiter = ';\n'; --{clientError 474}"

# Test format_template_resultset_format setting

$CLICKHOUSE_CLIENT --query="SELECT * FROM template GROUP BY question, answer, likes, date WITH TOTALS ORDER BY date LIMIT 3 FORMAT Template SETTINGS \
format_template_row_format = 'Question: \${question:Quoted}, Answer: \${answer:Quoted}, Number of Likes: \${likes:Raw}, Date: \${date:Raw}', \
format_template_resultset_format = '===== Results ===== \n\${data}\n===================\n', \
format_template_rows_between_delimiter = ';\n'";

# Test that if both format_template_resultset_format setting and format_template_resultset are provided, an error is thrown
resultset_output_file="$CURDIR"/"$CLICKHOUSE_TEST_UNIQUE_NAME"_template_output_format_resultset.tmp
echo -ne '===== Resultset ===== \n ${data} \n ===============' > "$resultset_output_file"
$CLICKHOUSE_CLIENT --multiline --multiquery --query "SELECT * FROM template GROUP BY question, answer, likes, date WITH TOTALS ORDER BY date LIMIT 3 FORMAT Template SETTINGS \
format_template_resultset = '$resultset_output_file', \
format_template_resultset_format = '===== Resultset ===== \n \${data} \n ===============', \
format_template_row_format = 'Question: \${question:Quoted}, Answer: \${answer:Quoted}, Number of Likes: \${likes:Raw}, Date: \${date:Raw}', \
format_template_rows_between_delimiter = ';\n'; --{clientError 474}"

$CLICKHOUSE_CLIENT --query="DROP TABLE template";
rm "$row_format_file"
rm "$resultset_output_file"