Merge pull request #52692 from Avogar/variable-number-of-volumns-more-formats
Allow variable number of columns in more formats, make it work with schema inference
Commit: c68456a20a
@ -196,6 +196,7 @@ SELECT * FROM nestedt FORMAT TSV
- [input_format_tsv_skip_first_lines](/docs/en/operations/settings/settings-formats.md/#input_format_tsv_skip_first_lines) - skip specified number of lines at the beginning of data. Default value - `0`.
- [input_format_tsv_detect_header](/docs/en/operations/settings/settings-formats.md/#input_format_tsv_detect_header) - automatically detect header with names and types in TSV format. Default value - `true`.
- [input_format_tsv_skip_trailing_empty_lines](/docs/en/operations/settings/settings-formats.md/#input_format_tsv_skip_trailing_empty_lines) - skip trailing empty lines at the end of data. Default value - `false`.
- [input_format_tsv_allow_variable_number_of_columns](/docs/en/operations/settings/settings-formats.md/#input_format_tsv_allow_variable_number_of_columns) - allow variable number of columns in TSV format, ignore extra columns and use default values on missing columns. Default value - `false`.

## TabSeparatedRaw {#tabseparatedraw}
@ -473,7 +474,7 @@ The CSV format supports the output of totals and extremes the same way as `TabSe
- [input_format_csv_skip_trailing_empty_lines](/docs/en/operations/settings/settings-formats.md/#input_format_csv_skip_trailing_empty_lines) - skip trailing empty lines at the end of data. Default value - `false`.
- [input_format_csv_trim_whitespaces](/docs/en/operations/settings/settings-formats.md/#input_format_csv_trim_whitespaces) - trim spaces and tabs in non-quoted CSV strings. Default value - `true`.
- [input_format_csv_allow_whitespace_or_tab_as_delimiter](/docs/en/operations/settings/settings-formats.md/#input_format_csv_allow_whitespace_or_tab_as_delimiter) - allow to use whitespace or tab as field delimiter in CSV strings. Default value - `false`.
- [input_format_csv_allow_variable_number_of_columns](/docs/en/operations/settings/settings-formats.md/#input_format_csv_allow_variable_number_of_columns) - ignore extra columns in CSV input (if file has more columns than expected) and treat missing fields in CSV input as default values. Default value - `false`.
- [input_format_csv_allow_variable_number_of_columns](/docs/en/operations/settings/settings-formats.md/#input_format_csv_allow_variable_number_of_columns) - allow variable number of columns in CSV format, ignore extra columns and use default values on missing columns. Default value - `false`.
- [input_format_csv_use_default_on_bad_values](/docs/en/operations/settings/settings-formats.md/#input_format_csv_use_default_on_bad_values) - allow to set default value to column when CSV field deserialization failed on bad value. Default value - `false`.

## CSVWithNames {#csvwithnames}
@ -502,9 +503,10 @@ the types from input data will be compared with the types of the corresponding c

Similar to [Template](#format-template), but it prints or reads all names and types of columns and uses escaping rule from [format_custom_escaping_rule](/docs/en/operations/settings/settings-formats.md/#format_custom_escaping_rule) setting and delimiters from [format_custom_field_delimiter](/docs/en/operations/settings/settings-formats.md/#format_custom_field_delimiter), [format_custom_row_before_delimiter](/docs/en/operations/settings/settings-formats.md/#format_custom_row_before_delimiter), [format_custom_row_after_delimiter](/docs/en/operations/settings/settings-formats.md/#format_custom_row_after_delimiter), [format_custom_row_between_delimiter](/docs/en/operations/settings/settings-formats.md/#format_custom_row_between_delimiter), [format_custom_result_before_delimiter](/docs/en/operations/settings/settings-formats.md/#format_custom_result_before_delimiter) and [format_custom_result_after_delimiter](/docs/en/operations/settings/settings-formats.md/#format_custom_result_after_delimiter) settings, not from format strings.

If setting [input_format_custom_detect_header](/docs/en/operations/settings/settings-formats.md/#input_format_custom_detect_header) is enabled, ClickHouse will automatically detect header with names and types if any.

If setting [input_format_custom_skip_trailing_empty_lines](/docs/en/operations/settings/settings-formats.md/#input_format_custom_skip_trailing_empty_lines) is enabled, trailing empty lines at the end of file will be skipped.

Additional settings:

- [input_format_custom_detect_header](/docs/en/operations/settings/settings-formats.md/#input_format_custom_detect_header) - enables automatic detection of header with names and types if any. Default value - `true`.
- [input_format_custom_skip_trailing_empty_lines](/docs/en/operations/settings/settings-formats.md/#input_format_custom_skip_trailing_empty_lines) - skip trailing empty lines at the end of file. Default value - `false`.
- [input_format_custom_allow_variable_number_of_columns](/docs/en/operations/settings/settings-formats.md/#input_format_custom_allow_variable_number_of_columns) - allow variable number of columns in CustomSeparated format, ignore extra columns and use default values on missing columns. Default value - `false`.

There is also `CustomSeparatedIgnoreSpaces` format, which is similar to [TemplateIgnoreSpaces](#templateignorespaces).
@ -1262,6 +1264,7 @@ SELECT * FROM json_each_row_nested
- [input_format_json_named_tuples_as_objects](/docs/en/operations/settings/settings-formats.md/#input_format_json_named_tuples_as_objects) - parse named tuple columns as JSON objects. Default value - `true`.
- [input_format_json_defaults_for_missing_elements_in_named_tuple](/docs/en/operations/settings/settings-formats.md/#input_format_json_defaults_for_missing_elements_in_named_tuple) - insert default values for missing elements in JSON object while parsing named tuple. Default value - `true`.
- [input_format_json_ignore_unknown_keys_in_named_tuple](/docs/en/operations/settings/settings-formats.md/#input_format_json_ignore_unknown_keys_in_named_tuple) - ignore unknown keys in json object for named tuples. Default value - `false`.
- [input_format_json_compact_allow_variable_number_of_columns](/docs/en/operations/settings/settings-formats.md/#input_format_json_compact_allow_variable_number_of_columns) - allow variable number of columns in JSONCompact/JSONCompactEachRow format, ignore extra columns and use default values on missing columns. Default value - `false`.
- [output_format_json_quote_64bit_integers](/docs/en/operations/settings/settings-formats.md/#output_format_json_quote_64bit_integers) - controls quoting of 64-bit integers in JSON output format. Default value - `true`.
- [output_format_json_quote_64bit_floats](/docs/en/operations/settings/settings-formats.md/#output_format_json_quote_64bit_floats) - controls quoting of 64-bit floats in JSON output format. Default value - `false`.
- [output_format_json_quote_denormals](/docs/en/operations/settings/settings-formats.md/#output_format_json_quote_denormals) - enables '+nan', '-nan', '+inf', '-inf' outputs in JSON output format. Default value - `false`.
@ -627,6 +627,13 @@ Column type should be String. If value is empty, default names `row_{i}` will be

Default value: ''.

### input_format_json_compact_allow_variable_number_of_columns {#input_format_json_compact_allow_variable_number_of_columns}

Allow variable number of columns in rows in JSONCompact/JSONCompactEachRow input formats.
Ignore extra columns in rows with more columns than expected and treat missing columns as default values.

Disabled by default.

## TSV format settings {#tsv-format-settings}

### input_format_tsv_empty_as_default {#input_format_tsv_empty_as_default}
@ -764,6 +771,13 @@ When enabled, trailing empty lines at the end of TSV file will be skipped.

Disabled by default.

### input_format_tsv_allow_variable_number_of_columns {#input_format_tsv_allow_variable_number_of_columns}

Allow variable number of columns in rows in TSV input format.
Ignore extra columns in rows with more columns than expected and treat missing columns as default values.

Disabled by default.
## CSV format settings {#csv-format-settings}

### format_csv_delimiter {#format_csv_delimiter}
@ -955,9 +969,11 @@ Result
```text
" string "
```

### input_format_csv_allow_variable_number_of_columns {#input_format_csv_allow_variable_number_of_columns}

ignore extra columns in CSV input (if file has more columns than expected) and treat missing fields in CSV input as default values.
Allow variable number of columns in rows in CSV input format.
Ignore extra columns in rows with more columns than expected and treat missing columns as default values.

Disabled by default.
@ -1571,6 +1587,13 @@ When enabled, trailing empty lines at the end of file in CustomSeparated format

Disabled by default.

### input_format_custom_allow_variable_number_of_columns {#input_format_custom_allow_variable_number_of_columns}

Allow variable number of columns in rows in CustomSeparated input format.
Ignore extra columns in rows with more columns than expected and treat missing columns as default values.

Disabled by default.
## Regexp format settings {#regexp-format-settings}

### format_regexp_escaping_rule {#format_regexp_escaping_rule}
@ -894,6 +894,10 @@ class IColumn;
M(Bool, input_format_csv_allow_whitespace_or_tab_as_delimiter, false, "Allow to use spaces and tabs(\\t) as field delimiter in the CSV strings", 0) \
M(Bool, input_format_csv_trim_whitespaces, true, "Trims spaces and tabs (\\t) characters at the beginning and end in CSV strings", 0) \
M(Bool, input_format_csv_use_default_on_bad_values, false, "Allow to set default value to column when CSV field deserialization failed on bad value", 0) \
M(Bool, input_format_csv_allow_variable_number_of_columns, false, "Ignore extra columns in CSV input (if file has more columns than expected) and treat missing fields in CSV input as default values", 0) \
M(Bool, input_format_tsv_allow_variable_number_of_columns, false, "Ignore extra columns in TSV input (if file has more columns than expected) and treat missing fields in TSV input as default values", 0) \
M(Bool, input_format_custom_allow_variable_number_of_columns, false, "Ignore extra columns in CustomSeparated input (if file has more columns than expected) and treat missing fields in CustomSeparated input as default values", 0) \
M(Bool, input_format_json_compact_allow_variable_number_of_columns, false, "Ignore extra columns in JSONCompact(EachRow) input (if file has more columns than expected) and treat missing fields in JSONCompact(EachRow) input as default values", 0) \
M(Bool, input_format_tsv_detect_header, true, "Automatically detect header with names and types in TSV format", 0) \
M(Bool, input_format_custom_detect_header, true, "Automatically detect header with names and types in CustomSeparated format", 0) \
M(Bool, input_format_parquet_skip_columns_with_unsupported_types_in_schema_inference, false, "Skip columns with unsupported types while schema inference for format Parquet", 0) \
@ -1042,7 +1046,6 @@ class IColumn;
M(Bool, regexp_dict_allow_hyperscan, true, "Allow regexp_tree dictionary using Hyperscan library.", 0) \
\
M(Bool, dictionary_use_async_executor, false, "Execute a pipeline for reading from a dictionary with several threads. It's supported only by DIRECT dictionary with CLICKHOUSE source.", 0) \
M(Bool, input_format_csv_allow_variable_number_of_columns, false, "Ignore extra columns in CSV input (if file has more columns than expected) and treat missing fields in CSV input as default values", 0) \
M(Bool, precise_float_parsing, false, "Prefer more precise (but slower) float parsing algorithm", 0) \

// End of FORMAT_FACTORY_SETTINGS
@ -86,6 +86,7 @@ FormatSettings getFormatSettings(ContextPtr context, const Settings & settings)
format_settings.custom.row_between_delimiter = settings.format_custom_row_between_delimiter;
format_settings.custom.try_detect_header = settings.input_format_custom_detect_header;
format_settings.custom.skip_trailing_empty_lines = settings.input_format_custom_skip_trailing_empty_lines;
format_settings.custom.allow_variable_number_of_columns = settings.input_format_custom_allow_variable_number_of_columns;
format_settings.date_time_input_format = settings.date_time_input_format;
format_settings.date_time_output_format = settings.date_time_output_format;
format_settings.interval.output_format = settings.interval_output_format;
@ -115,6 +116,7 @@ FormatSettings getFormatSettings(ContextPtr context, const Settings & settings)
format_settings.json.validate_utf8 = settings.output_format_json_validate_utf8;
format_settings.json_object_each_row.column_for_object_name = settings.format_json_object_each_row_column_for_object_name;
format_settings.json.allow_object_type = context->getSettingsRef().allow_experimental_object_type;
format_settings.json.compact_allow_variable_number_of_columns = settings.input_format_json_compact_allow_variable_number_of_columns;
format_settings.null_as_default = settings.input_format_null_as_default;
format_settings.decimal_trailing_zeros = settings.output_format_decimal_trailing_zeros;
format_settings.parquet.row_group_rows = settings.output_format_parquet_row_group_size;
@ -163,6 +165,7 @@ FormatSettings getFormatSettings(ContextPtr context, const Settings & settings)
format_settings.tsv.skip_first_lines = settings.input_format_tsv_skip_first_lines;
format_settings.tsv.try_detect_header = settings.input_format_tsv_detect_header;
format_settings.tsv.skip_trailing_empty_lines = settings.input_format_tsv_skip_trailing_empty_lines;
format_settings.tsv.allow_variable_number_of_columns = settings.input_format_tsv_allow_variable_number_of_columns;
format_settings.values.accurate_types_of_literals = settings.input_format_values_accurate_types_of_literals;
format_settings.values.deduce_templates_of_expressions = settings.input_format_values_deduce_templates_of_expressions;
format_settings.values.interpret_expressions = settings.input_format_values_interpret_expressions;
@ -175,6 +175,7 @@ struct FormatSettings
EscapingRule escaping_rule = EscapingRule::Escaped;
bool try_detect_header = true;
bool skip_trailing_empty_lines = false;
bool allow_variable_number_of_columns = false;
} custom;

struct
@ -197,6 +198,7 @@ struct FormatSettings
bool validate_types_from_metadata = true;
bool validate_utf8 = false;
bool allow_object_type = false;
bool compact_allow_variable_number_of_columns = false;
} json;

struct
@ -317,6 +319,7 @@ struct FormatSettings
UInt64 skip_first_lines = 0;
bool try_detect_header = true;
bool skip_trailing_empty_lines = false;
bool allow_variable_number_of_columns = false;
} tsv;

struct
@ -115,21 +115,24 @@ NamesAndTypesList IRowSchemaReader::readSchema()
"Cannot read rows to determine the schema, the maximum number of rows (or bytes) to read is set to 0. "
"Most likely setting input_format_max_rows_to_read_for_schema_inference or input_format_max_bytes_to_read_for_schema_inference is set to 0");

DataTypes data_types = readRowAndGetDataTypes();
auto data_types_maybe = readRowAndGetDataTypes();

/// Check that we read at least one column.
if (data_types.empty())
if (!data_types_maybe)
throw Exception(ErrorCodes::EMPTY_DATA_PASSED, "Cannot read rows from the data");

DataTypes data_types = std::move(*data_types_maybe);

/// If column names weren't set, use default names 'c1', 'c2', ...
if (column_names.empty())
bool use_default_column_names = column_names.empty();
if (use_default_column_names)
{
column_names.reserve(data_types.size());
for (size_t i = 0; i != data_types.size(); ++i)
column_names.push_back("c" + std::to_string(i + 1));
}
/// If column names were set, check that the number of names matches the number of types.
else if (column_names.size() != data_types.size())
else if (column_names.size() != data_types.size() && !allowVariableNumberOfColumns())
{
throw Exception(
ErrorCodes::INCORRECT_DATA,
@ -137,6 +140,9 @@ NamesAndTypesList IRowSchemaReader::readSchema()
}
else
{
if (column_names.size() != data_types.size())
data_types.resize(column_names.size());

std::unordered_set<std::string_view> names_set;
for (const auto & name : column_names)
{
@ -155,13 +161,39 @@ NamesAndTypesList IRowSchemaReader::readSchema()

for (rows_read = 1; rows_read < max_rows_to_read && in.count() < max_bytes_to_read; ++rows_read)
{
DataTypes new_data_types = readRowAndGetDataTypes();
if (new_data_types.empty())
auto new_data_types_maybe = readRowAndGetDataTypes();
if (!new_data_types_maybe)
/// We reached eof.
break;

DataTypes new_data_types = std::move(*new_data_types_maybe);

if (new_data_types.size() != data_types.size())
throw Exception(ErrorCodes::INCORRECT_DATA, "Rows have different amount of values");
{
if (!allowVariableNumberOfColumns())
throw Exception(ErrorCodes::INCORRECT_DATA, "Rows have different amount of values");

if (use_default_column_names)
{
/// Current row contains new columns, add new default names.
if (new_data_types.size() > data_types.size())
{
for (size_t i = data_types.size(); i < new_data_types.size(); ++i)
column_names.push_back("c" + std::to_string(i + 1));
data_types.resize(new_data_types.size());
}
/// Current row contains fewer columns than previous rows.
else
{
new_data_types.resize(data_types.size());
}
}
/// If names were explicitly set, ignore all extra columns.
else
{
new_data_types.resize(column_names.size());
}
}

for (field_index = 0; field_index != data_types.size(); ++field_index)
{
@ -93,11 +93,13 @@ protected:
/// Read one row and determine types of columns in it.
/// Return types in the same order in which the values were in the row.
/// If it's impossible to determine the type for some column, return nullptr for it.
/// Return empty list if can't read more data.
virtual DataTypes readRowAndGetDataTypes() = 0;
/// Return std::nullopt if can't read more data.
virtual std::optional<DataTypes> readRowAndGetDataTypes() = 0;

void setColumnNames(const std::vector<String> & names) { column_names = names; }

virtual bool allowVariableNumberOfColumns() const { return false; }

size_t field_index;

private:
@ -284,7 +284,7 @@ bool CSVFormatReader::parseRowEndWithDiagnosticInfo(WriteBuffer & out)
return true;
}

bool CSVFormatReader::allowVariableNumberOfColumns()
bool CSVFormatReader::allowVariableNumberOfColumns() const
{
return format_settings.csv.allow_variable_number_of_columns;
}
@ -410,19 +410,22 @@ CSVSchemaReader::CSVSchemaReader(ReadBuffer & in_, bool with_names_, bool with_t
{
}

std::pair<std::vector<String>, DataTypes> CSVSchemaReader::readRowAndGetFieldsAndDataTypes()
std::optional<std::pair<std::vector<String>, DataTypes>> CSVSchemaReader::readRowAndGetFieldsAndDataTypes()
{
if (buf.eof())
return {};

auto fields = reader.readRow();
auto data_types = tryInferDataTypesByEscapingRule(fields, format_settings, FormatSettings::EscapingRule::CSV);
return {fields, data_types};
return std::make_pair(std::move(fields), std::move(data_types));
}

DataTypes CSVSchemaReader::readRowAndGetDataTypesImpl()
std::optional<DataTypes> CSVSchemaReader::readRowAndGetDataTypesImpl()
{
return std::move(readRowAndGetFieldsAndDataTypes().second);
auto fields_with_types = readRowAndGetFieldsAndDataTypes();
if (!fields_with_types)
return {};
return std::move(fields_with_types->second);
}

@ -70,7 +70,7 @@ public:
void skipPrefixBeforeHeader() override;

bool checkForEndOfRow() override;
bool allowVariableNumberOfColumns() override;
bool allowVariableNumberOfColumns() const override;

std::vector<String> readNames() override { return readHeaderRow(); }
std::vector<String> readTypes() override { return readHeaderRow(); }
@ -102,8 +102,10 @@ public:
CSVSchemaReader(ReadBuffer & in_, bool with_names_, bool with_types_, const FormatSettings & format_settings_);

private:
DataTypes readRowAndGetDataTypesImpl() override;
std::pair<std::vector<String>, DataTypes> readRowAndGetFieldsAndDataTypes() override;
bool allowVariableNumberOfColumns() const override { return format_settings.csv.allow_variable_number_of_columns; }

std::optional<DataTypes> readRowAndGetDataTypesImpl() override;
std::optional<std::pair<std::vector<String>, DataTypes>> readRowAndGetFieldsAndDataTypes() override;

PeekableReadBuffer buf;
CSVFormatReader reader;
@ -139,10 +139,13 @@ void CustomSeparatedFormatReader::skipRowBetweenDelimiter()
void CustomSeparatedFormatReader::skipField()
{
skipSpaces();
skipFieldByEscapingRule(*buf, format_settings.custom.escaping_rule, format_settings);
if (format_settings.custom.escaping_rule == FormatSettings::EscapingRule::CSV)
readCSVFieldWithTwoPossibleDelimiters(*buf, format_settings.csv, format_settings.custom.field_delimiter, format_settings.custom.row_after_delimiter);
else
skipFieldByEscapingRule(*buf, format_settings.custom.escaping_rule, format_settings);
}

bool CustomSeparatedFormatReader::checkEndOfRow()
bool CustomSeparatedFormatReader::checkForEndOfRow()
{
PeekableReadBufferCheckpoint checkpoint{*buf, true};

@ -200,12 +203,12 @@ std::vector<String> CustomSeparatedFormatReader::readRowImpl()
std::vector<String> values;
skipRowStartDelimiter();

if (columns == 0)
if (columns == 0 || allowVariableNumberOfColumns())
{
do
{
values.push_back(readFieldIntoString<mode>(values.empty(), false, true));
} while (!checkEndOfRow());
} while (!checkForEndOfRow());
columns = values.size();
}
else
@ -230,7 +233,7 @@ void CustomSeparatedFormatReader::skipHeaderRow()

skipField();
}
while (!checkEndOfRow());
while (!checkForEndOfRow());

skipRowEndDelimiter();
}
@ -369,7 +372,7 @@ CustomSeparatedSchemaReader::CustomSeparatedSchemaReader(
{
}

std::pair<std::vector<String>, DataTypes> CustomSeparatedSchemaReader::readRowAndGetFieldsAndDataTypes()
std::optional<std::pair<std::vector<String>, DataTypes>> CustomSeparatedSchemaReader::readRowAndGetFieldsAndDataTypes()
{
if (no_more_data || reader.checkForSuffix())
{
@ -385,12 +388,15 @@ std::pair<std::vector<String>, DataTypes> CustomSeparatedSchemaReader::readRowAn

auto fields = reader.readRow();
auto data_types = tryInferDataTypesByEscapingRule(fields, reader.getFormatSettings(), reader.getEscapingRule(), &json_inference_info);
return {fields, data_types};
return std::make_pair(std::move(fields), std::move(data_types));
}

DataTypes CustomSeparatedSchemaReader::readRowAndGetDataTypesImpl()
std::optional<DataTypes> CustomSeparatedSchemaReader::readRowAndGetDataTypesImpl()
{
return readRowAndGetFieldsAndDataTypes().second;
auto fields_with_types = readRowAndGetFieldsAndDataTypes();
if (!fields_with_types)
return {};
return std::move(fields_with_types->second);
}

void CustomSeparatedSchemaReader::transformTypesIfNeeded(DataTypePtr & type, DataTypePtr & new_type)
@ -74,7 +74,9 @@ public:

std::vector<String> readRowForHeaderDetection() override { return readRowImpl<ReadFieldMode::AS_POSSIBLE_STRING>(); }

bool checkEndOfRow();
bool checkForEndOfRow() override;
bool allowVariableNumberOfColumns() const override { return format_settings.custom.allow_variable_number_of_columns; }

bool checkForSuffixImpl(bool check_eof);
inline void skipSpaces() { if (ignore_spaces) skipWhitespaceIfAny(*buf, true); }

@ -109,9 +111,11 @@ public:
CustomSeparatedSchemaReader(ReadBuffer & in_, bool with_names_, bool with_types_, bool ignore_spaces_, const FormatSettings & format_setting_);

private:
DataTypes readRowAndGetDataTypesImpl() override;
bool allowVariableNumberOfColumns() const override { return format_settings.custom.allow_variable_number_of_columns; }

std::pair<std::vector<String>, DataTypes> readRowAndGetFieldsAndDataTypes() override;
std::optional<DataTypes> readRowAndGetDataTypesImpl() override;

std::optional<std::pair<std::vector<String>, DataTypes>> readRowAndGetFieldsAndDataTypes() override;

void transformTypesIfNeeded(DataTypePtr & type, DataTypePtr & new_type) override;
@ -112,6 +112,12 @@ bool JSONCompactEachRowFormatReader::readField(IColumn & column, const DataTypeP
return JSONUtils::readField(*in, column, type, serialization, column_name, format_settings, yield_strings);
}

bool JSONCompactEachRowFormatReader::checkForEndOfRow()
{
skipWhitespaceIfAny(*in);
return !in->eof() && *in->position() == ']';
}

bool JSONCompactEachRowFormatReader::parseRowStartWithDiagnosticInfo(WriteBuffer & out)
{
skipWhitespaceIfAny(*in);
@ -187,7 +193,7 @@ JSONCompactEachRowRowSchemaReader::JSONCompactEachRowRowSchemaReader(
{
}

DataTypes JSONCompactEachRowRowSchemaReader::readRowAndGetDataTypesImpl()
std::optional<DataTypes> JSONCompactEachRowRowSchemaReader::readRowAndGetDataTypesImpl()
{
if (first_row)
first_row = false;
@ -68,6 +68,9 @@ public:
std::vector<String> readNames() override { return readHeaderRow(); }
std::vector<String> readTypes() override { return readHeaderRow(); }

bool checkForEndOfRow() override;
bool allowVariableNumberOfColumns() const override { return format_settings.json.compact_allow_variable_number_of_columns; }

bool yieldStrings() const { return yield_strings; }
private:
bool yield_strings;
@ -79,7 +82,9 @@ public:
JSONCompactEachRowRowSchemaReader(ReadBuffer & in_, bool with_names_, bool with_types_, bool yield_strings_, const FormatSettings & format_settings_);

private:
DataTypes readRowAndGetDataTypesImpl() override;
bool allowVariableNumberOfColumns() const override { return format_settings.json.compact_allow_variable_number_of_columns; }

std::optional<DataTypes> readRowAndGetDataTypesImpl() override;

void transformTypesIfNeeded(DataTypePtr & type, DataTypePtr & new_type) override;
void transformFinalTypeIfNeeded(DataTypePtr & type) override;
@ -634,7 +634,7 @@ DataTypePtr MsgPackSchemaReader::getDataType(const msgpack::object & object)
UNREACHABLE();
}

DataTypes MsgPackSchemaReader::readRowAndGetDataTypes()
std::optional<DataTypes> MsgPackSchemaReader::readRowAndGetDataTypes()
{
if (buf.eof())
return {};
@ -91,7 +91,7 @@ public:
private:
msgpack::object_handle readObject();
DataTypePtr getDataType(const msgpack::object & object);
DataTypes readRowAndGetDataTypes() override;
std::optional<DataTypes> readRowAndGetDataTypes() override;

PeekableReadBuffer buf;
UInt64 number_of_columns;
@ -422,7 +422,7 @@ NamesAndTypesList MySQLDumpSchemaReader::readSchema()
return IRowSchemaReader::readSchema();
}

DataTypes MySQLDumpSchemaReader::readRowAndGetDataTypes()
std::optional<DataTypes> MySQLDumpSchemaReader::readRowAndGetDataTypes()
{
if (in.eof())
return {};
@ -33,7 +33,7 @@ public:

private:
NamesAndTypesList readSchema() override;
DataTypes readRowAndGetDataTypes() override;
std::optional<DataTypes> readRowAndGetDataTypes() override;

String table_name;
};
@ -143,7 +143,7 @@ RegexpSchemaReader::RegexpSchemaReader(ReadBuffer & in_, const FormatSettings &
{
}

DataTypes RegexpSchemaReader::readRowAndGetDataTypes()
std::optional<DataTypes> RegexpSchemaReader::readRowAndGetDataTypes()
{
if (buf.eof())
return {};
@ -79,7 +79,7 @@ public:
RegexpSchemaReader(ReadBuffer & in_, const FormatSettings & format_settings);

private:
DataTypes readRowAndGetDataTypes() override;
std::optional<DataTypes> readRowAndGetDataTypes() override;

void transformTypesIfNeeded(DataTypePtr & type, DataTypePtr & new_type) override;
@ -300,6 +300,11 @@ bool TabSeparatedFormatReader::checkForSuffix()
return false;
}

bool TabSeparatedFormatReader::checkForEndOfRow()
{
return buf->eof() || *buf->position() == '\n';
}

TabSeparatedSchemaReader::TabSeparatedSchemaReader(
ReadBuffer & in_, bool with_names_, bool with_types_, bool is_raw_, const FormatSettings & format_settings_)
: FormatWithNamesAndTypesSchemaReader(
@ -315,19 +320,22 @@ TabSeparatedSchemaReader::TabSeparatedSchemaReader(
{
}

std::pair<std::vector<String>, DataTypes> TabSeparatedSchemaReader::readRowAndGetFieldsAndDataTypes()
std::optional<std::pair<std::vector<String>, DataTypes>> TabSeparatedSchemaReader::readRowAndGetFieldsAndDataTypes()
{
if (buf.eof())
return {};

auto fields = reader.readRow();
auto data_types = tryInferDataTypesByEscapingRule(fields, reader.getFormatSettings(), reader.getEscapingRule());
return {fields, data_types};
return std::make_pair(fields, data_types);
}

DataTypes TabSeparatedSchemaReader::readRowAndGetDataTypesImpl()
std::optional<DataTypes> TabSeparatedSchemaReader::readRowAndGetDataTypesImpl()
{
return readRowAndGetFieldsAndDataTypes().second;
auto fields_with_types = readRowAndGetFieldsAndDataTypes();
if (!fields_with_types)
return {};
return std::move(fields_with_types->second);
}

void registerInputFormatTabSeparated(FormatFactory & factory)
@ -76,6 +76,9 @@ public:
|
||||
void setReadBuffer(ReadBuffer & in_) override;
|
||||
|
||||
bool checkForSuffix() override;
|
||||
bool checkForEndOfRow() override;
|
||||
|
||||
bool allowVariableNumberOfColumns() const override { return format_settings.tsv.allow_variable_number_of_columns; }
|
||||
|
||||
private:
|
||||
template <bool is_header>
|
||||
@ -92,8 +95,10 @@ public:
|
||||
TabSeparatedSchemaReader(ReadBuffer & in_, bool with_names_, bool with_types_, bool is_raw_, const FormatSettings & format_settings);
|
||||
|
||||
private:
|
||||
DataTypes readRowAndGetDataTypesImpl() override;
|
||||
std::pair<std::vector<String>, DataTypes> readRowAndGetFieldsAndDataTypes() override;
|
||||
bool allowVariableNumberOfColumns() const override { return format_settings.tsv.allow_variable_number_of_columns; }
|
||||
|
||||
std::optional<DataTypes> readRowAndGetDataTypesImpl() override;
|
||||
std::optional<std::pair<std::vector<String>, DataTypes>> readRowAndGetFieldsAndDataTypes() override;
|
||||
|
||||
PeekableReadBuffer buf;
|
||||
TabSeparatedFormatReader reader;
|
||||
|
@ -490,7 +490,7 @@ TemplateSchemaReader::TemplateSchemaReader(
|
||||
setColumnNames(row_format.column_names);
|
||||
}
|
||||
|
||||
DataTypes TemplateSchemaReader::readRowAndGetDataTypes()
|
||||
std::optional<DataTypes> TemplateSchemaReader::readRowAndGetDataTypes()
|
||||
{
|
||||
if (first_row)
|
||||
format_reader.readPrefix();
|
||||
|
@ -119,7 +119,7 @@ public:
|
||||
std::string row_between_delimiter,
|
||||
const FormatSettings & format_settings_);
|
||||
|
||||
DataTypes readRowAndGetDataTypes() override;
|
||||
std::optional<DataTypes> readRowAndGetDataTypes() override;
|
||||
|
||||
private:
|
||||
void transformTypesIfNeeded(DataTypePtr & type, DataTypePtr & new_type) override;
|
||||
|
@ -638,7 +638,7 @@ ValuesSchemaReader::ValuesSchemaReader(ReadBuffer & in_, const FormatSettings &
|
||||
{
|
||||
}
|
||||
|
||||
DataTypes ValuesSchemaReader::readRowAndGetDataTypes()
|
||||
std::optional<DataTypes> ValuesSchemaReader::readRowAndGetDataTypes()
|
||||
{
|
||||
if (first_row)
|
||||
{
|
||||
|
@ -105,7 +105,7 @@ public:
|
||||
ValuesSchemaReader(ReadBuffer & in_, const FormatSettings & format_settings);
|
||||
|
||||
private:
|
||||
DataTypes readRowAndGetDataTypes() override;
|
||||
std::optional<DataTypes> readRowAndGetDataTypes() override;
|
||||
|
||||
PeekableReadBuffer buf;
|
||||
ParserExpression parser;
|
||||
|
@@ -212,8 +212,24 @@ bool RowInputFormatWithNamesAndTypes::readRow(MutableColumns & columns, RowReadE
     format_reader->skipRowStartDelimiter();
 
     ext.read_columns.resize(data_types.size());
-    for (size_t file_column = 0; file_column < column_mapping->column_indexes_for_input_fields.size(); ++file_column)
+    size_t file_column = 0;
+    for (; file_column < column_mapping->column_indexes_for_input_fields.size(); ++file_column)
     {
+        if (format_reader->allowVariableNumberOfColumns() && format_reader->checkForEndOfRow())
+        {
+            while (file_column < column_mapping->column_indexes_for_input_fields.size())
+            {
+                const auto & rem_column_index = column_mapping->column_indexes_for_input_fields[file_column];
+                if (rem_column_index)
+                    columns[*rem_column_index]->insertDefault();
+                ++file_column;
+            }
+            break;
+        }
+
+        if (file_column != 0)
+            format_reader->skipFieldDelimiter();
+
         const auto & column_index = column_mapping->column_indexes_for_input_fields[file_column];
         const bool is_last_file_column = file_column + 1 == column_mapping->column_indexes_for_input_fields.size();
         if (column_index)
@@ -225,22 +241,6 @@ bool RowInputFormatWithNamesAndTypes::readRow(MutableColumns & columns, RowReadE
                 column_mapping->names_of_columns[file_column]);
         else
             format_reader->skipField(file_column);
 
-        if (!is_last_file_column)
-        {
-            if (format_reader->allowVariableNumberOfColumns() && format_reader->checkForEndOfRow())
-            {
-                ++file_column;
-                while (file_column < column_mapping->column_indexes_for_input_fields.size())
-                {
-                    const auto & rem_column_index = column_mapping->column_indexes_for_input_fields[file_column];
-                    columns[*rem_column_index]->insertDefault();
-                    ++file_column;
-                }
-            }
-            else
-                format_reader->skipFieldDelimiter();
-        }
     }
 
     if (format_reader->allowVariableNumberOfColumns() && !format_reader->checkForEndOfRow())
@@ -248,7 +248,7 @@ bool RowInputFormatWithNamesAndTypes::readRow(MutableColumns & columns, RowReadE
         do
         {
             format_reader->skipFieldDelimiter();
-            format_reader->skipField(1);
+            format_reader->skipField(file_column++);
         }
         while (!format_reader->checkForEndOfRow());
     }
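In plain terms, the rewritten readRow loop above implements one policy: when the row ends before all expected columns are consumed, the remaining columns receive their default values; when fields remain after the last expected column, they are skipped. A minimal standalone sketch of that policy follows — the names here are hypothetical stand-ins, not the actual ClickHouse implementation:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the variable-number-of-columns policy:
// pad short rows with a per-column default, drop fields past the expected width.
std::vector<std::string> normalizeRow(std::vector<std::string> fields,
                                      size_t expected_columns,
                                      const std::string & default_value = "0")
{
    // resize() with a fill value pads when the row is short
    // and truncates when the row has extra trailing fields.
    fields.resize(expected_columns, default_value);
    return fields;
}
```

With two expected columns this mirrors the behavior exercised by the tests in this PR: `1,1` stays two fields, a lone `2` is padded to `{2, 0}`, and `3,3,3,3` is cut down to `{3, 3}`.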
@@ -419,12 +419,14 @@ namespace
 
 void FormatWithNamesAndTypesSchemaReader::tryDetectHeader(std::vector<String> & column_names, std::vector<String> & type_names)
 {
-    auto [first_row_values, first_row_types] = readRowAndGetFieldsAndDataTypes();
+    auto first_row = readRowAndGetFieldsAndDataTypes();
 
     /// No data.
-    if (first_row_values.empty())
+    if (!first_row)
         return;
 
+    const auto & [first_row_values, first_row_types] = *first_row;
+
     /// The first row contains non String elements, it cannot be a header.
     if (!checkIfAllTypesAreString(first_row_types))
     {
@@ -432,15 +434,17 @@ void FormatWithNamesAndTypesSchemaReader::tryDetectHeader(std::vector<String> &
         return;
     }
 
-    auto [second_row_values, second_row_types] = readRowAndGetFieldsAndDataTypes();
+    auto second_row = readRowAndGetFieldsAndDataTypes();
 
     /// Data contains only 1 row, don't treat it as a header.
-    if (second_row_values.empty())
+    if (!second_row)
     {
         buffered_types = first_row_types;
         return;
     }
 
+    const auto & [second_row_values, second_row_types] = *second_row;
+
     DataTypes data_types;
     bool second_row_can_be_type_names = checkIfAllTypesAreString(second_row_types) && checkIfAllValuesAreTypeNames(readNamesFromFields(second_row_values));
     size_t row = 2;
@@ -450,15 +454,16 @@ void FormatWithNamesAndTypesSchemaReader::tryDetectHeader(std::vector<String> &
     }
     else
     {
-        data_types = readRowAndGetDataTypes();
+        auto data_types_maybe = readRowAndGetDataTypes();
         /// Data contains only 2 rows.
-        if (data_types.empty())
+        if (!data_types_maybe)
         {
             second_row_can_be_type_names = false;
             data_types = second_row_types;
         }
         else
+        {
+            data_types = *data_types_maybe;
             ++row;
+        }
     }
@@ -490,10 +495,10 @@ void FormatWithNamesAndTypesSchemaReader::tryDetectHeader(std::vector<String> &
         return;
     }
 
-    auto next_row_types = readRowAndGetDataTypes();
+    auto next_row_types_maybe = readRowAndGetDataTypes();
     /// Check if there are no more rows in data. It means that all rows contains only String values and Nulls,
     /// so, the first two rows with all String elements can be real data and we cannot use them as a header.
-    if (next_row_types.empty())
+    if (!next_row_types_maybe)
     {
         /// Buffer first data types from the first row, because it doesn't contain Nulls.
         buffered_types = first_row_types;
@@ -502,11 +507,11 @@ void FormatWithNamesAndTypesSchemaReader::tryDetectHeader(std::vector<String> &
 
         ++row;
         /// Combine types from current row and from previous rows.
-        chooseResultColumnTypes(*this, data_types, next_row_types, getDefaultDataTypeForEscapingRule(FormatSettings::EscapingRule::CSV), default_colum_names, row);
+        chooseResultColumnTypes(*this, data_types, *next_row_types_maybe, getDefaultDataTypeForEscapingRule(FormatSettings::EscapingRule::CSV), default_colum_names, row);
     }
 }
 
-DataTypes FormatWithNamesAndTypesSchemaReader::readRowAndGetDataTypes()
+std::optional<DataTypes> FormatWithNamesAndTypesSchemaReader::readRowAndGetDataTypes()
 {
     /// Check if we tried to detect a header and have buffered types from read rows.
     if (!buffered_types.empty())

@@ -119,9 +119,10 @@ public:
     /// Check suffix.
     virtual bool checkForSuffix() { return in->eof(); }
 
     /// Check if we are at the end of row, not between fields.
     virtual bool checkForEndOfRow() { throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Method checkForEndOfRow is not implemented"); }
 
-    virtual bool allowVariableNumberOfColumns() { return false; }
+    virtual bool allowVariableNumberOfColumns() const { return false; }
 
     const FormatSettings & getFormatSettings() const { return format_settings; }
 
@@ -160,15 +161,15 @@ public:
     NamesAndTypesList readSchema() override;
 
 protected:
-    virtual DataTypes readRowAndGetDataTypes() override;
+    virtual std::optional<DataTypes> readRowAndGetDataTypes() override;
 
-    virtual DataTypes readRowAndGetDataTypesImpl()
+    virtual std::optional<DataTypes> readRowAndGetDataTypesImpl()
     {
         throw Exception{ErrorCodes::NOT_IMPLEMENTED, "Method readRowAndGetDataTypesImpl is not implemented"};
     }
 
-    /// Return column fields with inferred types. In case of no more rows, return empty vectors.
-    virtual std::pair<std::vector<String>, DataTypes> readRowAndGetFieldsAndDataTypes()
+    /// Return column fields with inferred types. In case of no more rows, return nullopt.
+    virtual std::optional<std::pair<std::vector<String>, DataTypes>> readRowAndGetFieldsAndDataTypes()
     {
         throw Exception{ErrorCodes::NOT_IMPLEMENTED, "Method readRowAndGetFieldsAndDataTypes is not implemented"};
     }
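The switch from `DataTypes` to `std::optional<DataTypes>` in the schema-reader interface above exists because, once rows may have a variable number of columns, "empty vector" becomes ambiguous: a legitimately empty row and end-of-data would look the same. A small illustrative sketch of the distinction, using stand-in types rather than the real DB classes:

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Stand-in for DB::DataTypes, just for illustration.
using DataTypes = std::vector<int>;

// Hypothetical reader: nullopt means "no more rows", while an empty
// vector is still a valid (zero-column) row. With a plain vector
// return type these two cases could not be told apart.
std::optional<DataTypes> readNextRow(std::vector<DataTypes> & pending_rows)
{
    if (pending_rows.empty())
        return std::nullopt;  // end of data
    DataTypes row = pending_rows.front();
    pending_rows.erase(pending_rows.begin());
    return row;  // may be empty, which is still a row
}
```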
@@ -0,0 +1,76 @@
+CSV
+1	1
+2	0
+0	0
+3	3
+1	1	\N	\N
+2	\N	\N	\N
+\N	\N	\N	\N
+3	3	3	3
+1	1
+2	\N
+\N	\N
+3	3
+1	0
+2	0
+0	0
+3	0
+TSV
+1	1
+2	0
+0	0
+3	3
+1	1	\N	\N
+2	\N	\N	\N
+\N	\N	\N	\N
+3	3	3	3
+1	1
+2	\N
+\N	\N
+3	3
+1	0
+2	0
+0	0
+3	0
+JSONCompactEachRow
+1	1
+2	0
+0	0
+3	3
+1	1
+2	0
+0	0
+3	3
+1	[1,2,3]
+2	[]
+0	[]
+3	[3]
+1	1	\N	\N
+2	\N	\N	\N
+\N	\N	\N	\N
+3	3	3	3
+1	1
+2	\N
+\N	\N
+3	3
+1	0
+2	0
+0	0
+3	0
+CustomSeparated
+1	1
+2	0
+0	0
+3	3
+1	1	\N	\N
+2	\N	\N	\N
+\N	\N	\N	\N
+3	3	3	3
+1	1
+2	\N
+\N	\N
+3	3
+1	0
+2	0
+0	0
+3	0
@@ -0,0 +1,24 @@
+select 'CSV';
+select * from format(CSV, 'x UInt32, y UInt32', '1,1\n2\n\n3,3,3,3') settings input_format_csv_allow_variable_number_of_columns=1;
+select * from format(CSV, '1,1\n2\n\n3,3,3,3') settings input_format_csv_allow_variable_number_of_columns=1;
+select * from format(CSVWithNames, '"x","y"\n1,1\n2\n\n3,3,3,3') settings input_format_csv_allow_variable_number_of_columns=1;
+select * from format(CSVWithNames, 'x UInt32, z UInt32', '"x","y"\n1,1\n2\n\n3,3,3,3') settings input_format_csv_allow_variable_number_of_columns=1;
+select 'TSV';
+select * from format(TSV, 'x UInt32, y UInt32', '1\t1\n2\n\n3\t3\t3\t3') settings input_format_tsv_allow_variable_number_of_columns=1;
+select * from format(TSV, '1\t1\n2\n\n3\t3\t3\t3') settings input_format_tsv_allow_variable_number_of_columns=1;
+select * from format(TSVWithNames, 'x\ty\n1\t1\n2\n\n3\t3\t3\t3') settings input_format_tsv_allow_variable_number_of_columns=1;
+select * from format(TSVWithNames, 'x UInt32, z UInt32', 'x\ty\n1\t1\n2\n\n3\t3\t3\t3') settings input_format_tsv_allow_variable_number_of_columns=1;
+select 'JSONCompactEachRow';
+select * from format(JSONCompactEachRow, 'x UInt32, y UInt32', '[1,1]\n[2]\n[]\n[3,3,3,3]') settings input_format_json_compact_allow_variable_number_of_columns=1;
+select * from format(JSONCompactEachRow, 'x UInt32, y UInt32', '[1,1,[1,2,3]]\n[2]\n[]\n[3,3,3,3,[1,2,3]]') settings input_format_json_compact_allow_variable_number_of_columns=1;
+select * from format(JSONCompactEachRow, 'x UInt32, y Array(UInt32)', '[1,[1,2,3],1]\n[2]\n[]\n[3,[3],3,3,[1,2,3]]') settings input_format_json_compact_allow_variable_number_of_columns=1;
+select * from format(JSONCompactEachRow, '[1,1]\n[2]\n[]\n[3,3,3,3]') settings input_format_json_compact_allow_variable_number_of_columns=1;
+select * from format(JSONCompactEachRowWithNames, '["x","y"]\n[1,1]\n[2]\n[]\n[3,3,3,3]') settings input_format_json_compact_allow_variable_number_of_columns=1;
+select * from format(JSONCompactEachRowWithNames, 'x UInt32, z UInt32', '["x","y"]\n[1,1]\n[2]\n[]\n[3,3,3,3]') settings input_format_json_compact_allow_variable_number_of_columns=1;
+select 'CustomSeparated';
+set format_custom_escaping_rule='CSV', format_custom_field_delimiter='<field_delimiter>', format_custom_row_before_delimiter='<row_before_delimiter>', format_custom_row_after_delimiter='<row_after_delimiter>', format_custom_row_between_delimiter='<row_between_delimiter>', format_custom_result_before_delimiter='<result_before_delimiter>', format_custom_result_after_delimiter='<result_after_delimiter>';
+select * from format(CustomSeparated, 'x UInt32, y UInt32', '<result_before_delimiter><row_before_delimiter>1<field_delimiter>1<row_after_delimiter><row_between_delimiter><row_before_delimiter>2<row_after_delimiter><row_between_delimiter><row_before_delimiter><row_after_delimiter><row_between_delimiter><row_before_delimiter>3<field_delimiter>3<field_delimiter>3<field_delimiter>3<row_after_delimiter><result_after_delimiter>') settings input_format_custom_allow_variable_number_of_columns=1;
+select * from format(CustomSeparated, '<result_before_delimiter><row_before_delimiter>1<field_delimiter>1<row_after_delimiter><row_between_delimiter><row_before_delimiter>2<row_after_delimiter><row_between_delimiter><row_before_delimiter><row_after_delimiter><row_between_delimiter><row_before_delimiter>3<field_delimiter>3<field_delimiter>3<field_delimiter>3<row_after_delimiter><result_after_delimiter>') settings input_format_custom_allow_variable_number_of_columns=1;
+select * from format(CustomSeparatedWithNames, '<result_before_delimiter><row_before_delimiter>"x"<field_delimiter>"y"<row_after_delimiter><row_between_delimiter><row_before_delimiter>1<field_delimiter>1<row_after_delimiter><row_between_delimiter><row_before_delimiter>2<row_after_delimiter><row_between_delimiter><row_before_delimiter><row_after_delimiter><row_between_delimiter><row_before_delimiter>3<field_delimiter>3<field_delimiter>3<field_delimiter>3<row_after_delimiter><result_after_delimiter>') settings input_format_custom_allow_variable_number_of_columns=1;
+select * from format(CustomSeparatedWithNames, 'x UInt32, z UInt32', '<result_before_delimiter><row_before_delimiter>"x"<field_delimiter>"y"<row_after_delimiter><row_between_delimiter><row_before_delimiter>1<field_delimiter>1<row_after_delimiter><row_between_delimiter><row_before_delimiter>2<row_after_delimiter><row_between_delimiter><row_before_delimiter><row_after_delimiter><row_between_delimiter><row_before_delimiter>3<field_delimiter>3<field_delimiter>3<field_delimiter>3<row_after_delimiter><result_after_delimiter>') settings input_format_custom_allow_variable_number_of_columns=1;
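The CSV queries above all feed `'1,1\n2\n\n3,3,3,3'` into a two-column table. A self-contained sketch of the normalization the reference output expects — `parseFlexibleCsv` is a hypothetical helper name, not ClickHouse code, and `"0"` stands in for the column default:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split unquoted CSV and normalize every row to `expected_columns` fields:
// an empty line becomes an all-defaults row, short rows are padded,
// and fields past the expected width are dropped.
std::vector<std::vector<std::string>> parseFlexibleCsv(const std::string & data, size_t expected_columns)
{
    std::vector<std::vector<std::string>> rows;
    std::istringstream lines(data);
    std::string line;
    while (std::getline(lines, line))
    {
        std::vector<std::string> fields;
        if (!line.empty())
        {
            std::istringstream cells(line);
            std::string cell;
            while (std::getline(cells, cell, ','))
                fields.push_back(cell);
        }
        fields.resize(expected_columns, "0");  // pad missing, drop extra
        rows.push_back(fields);
    }
    return rows;
}
```

Under these assumptions the four input lines normalize to `{1,1}`, `{2,0}`, `{0,0}`, `{3,3}` — the first block of the expected test output.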