Consider the following query:
SELECT avgWeighted(x, y) FROM (SELECT NULL, 255 AS x, 1 AS y UNION ALL SELECT y, NULL AS x, 1 AS y)
This is a UNION of two SELECT queries:
- `SELECT NULL, 255 AS x, 1 AS y`
- `SELECT y, NULL AS x, 1 AS y`
UNION queries match columns by position, not by name, so the following
columns should be used by `avgWeighted()`:
- `255 AS x, 1 AS y`
- `NULL AS x, 1 AS y`
The resulting argument types should be:
- `x Nullable(UInt8)`
- `y UInt8`
When the UNION query is itself a subquery, it returns only the required
columns; in the example above, only `x` and `y`. To do this, it takes
the positions of these columns from the first query and then uses those
positions to look up the required column names in the second query
(internally, columns can only be requested by name, not by position).
Because the second query has duplicated column names, it returns
(`y`, `x`) instead of (`x`, `y`), which makes the result incorrect:
EXPLAIN header = 1, optimize = 0, actions=1 SELECT avgWeighted(x, y) FROM (SELECT NULL, 255 AS x, 1 AS y UNION ALL SELECT y, NULL AS x, 1 AS y)
Aggregates:
avgWeighted(x, y)
Function: avgWeighted(Nullable(UInt8), UInt8) → Nullable(Float64)
Arguments: x, y
Argument positions: 0, 1
Expression (Before GROUP BY)
Header: x UInt8
y Nullable(UInt8)
...
Union
Header: x UInt8
y Nullable(UInt8)
Expression (Conversion before UNION)
Header: x UInt8
y Nullable(UInt8)
Expression (Conversion before UNION)
Header: x UInt8
y Nullable(UInt8)
And the query itself fails with an error:
Logical error: 'Bad cast from type DB::ColumnVector<char8_t> to DB::ColumnNullable'.
_NOTE: `avgWeighted()` here is required to trigger `LOGICAL_ERROR`_
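
The mismatch described above can be reproduced with a small toy model.
This is only an illustrative sketch, not ClickHouse code: the list-based
representation of a SELECT and the helpers `select_by_name()` and
`select_by_position()` are hypothetical, and it assumes that name-based
pruning keeps the first column found for each required name, in query
order.

```python
# Toy model (assumption): a SELECT is a list of (column name, value) pairs.
first_select = [("NULL", None), ("x", 255), ("y", 1)]
second_select = [("y", 1), ("x", None), ("y", 1)]


def select_by_name(select, required_names):
    """Keep the first column for each required name, in query order."""
    seen = set()
    result = []
    for name, value in select:
        if name in required_names and name not in seen:
            seen.add(name)
            result.append((name, value))
    return result


def select_by_position(select, positions):
    """Keep exactly the columns at the given positions, in that order."""
    return [select[i] for i in positions]


required = ["x", "y"]

# Positions of the required columns in the first query.
first_names = [name for name, _ in first_select]
positions = [first_names.index(name) for name in required]

# Names found at those positions in the second query.
names_in_second = [second_select[i][0] for i in positions]

# Name-based lookup hits the duplicated `y` first, so the order flips.
by_name = select_by_name(second_select, set(names_in_second))

# Position-based lookup, shown for contrast, keeps the expected order.
by_position = select_by_position(second_select, positions)

print(by_name)      # [('y', 1), ('x', None)]
print(by_position)  # [('x', None), ('y', 1)]
```

In this model the name-based result is then matched positionally against
(`x`, `y`) from the first query, so `x` receives a non-nullable value and
`y` receives the NULL, which corresponds to the swapped types in the
header above.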
CI: https://s3.amazonaws.com/clickhouse-test-reports/37796/e637489f81768df582fe7389e57f7ed12893087c/fuzzer_astfuzzerdebug,actions//report.html
Fixes: 02227_union_match_by_name
v2: fix untuple() (reserve space for output_columns_positions too)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
This is better than introducing a separate
SelectQueryExpressionAnalyzer::useGroupingSetKey(), since that method
would not be enough for optimize_aggregation_in_order: the size of
ManyExpressionActions would not match the size of SortDescription in
ReadInOrderOptimizer::ReadInOrderOptimizer().
It is also cleaner.
v2: fix clang-tidy
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>