Consider the following query:
SELECT avgWeighted(x, y) FROM (SELECT NULL, 255 AS x, 1 AS y UNION ALL SELECT y, NULL AS x, 1 AS y)
This is a UNION of two SELECT queries:
- `SELECT NULL, 255 AS x, 1 AS y`
- `SELECT y, NULL AS x, 1 AS y`
UNION queries match columns by position, not by name, so `avgWeighted()`
should use the following columns:
- `255 AS x, 1 AS y`
- `NULL AS x, 1 AS y`
Result types of arguments should be:
- `x Nullable(UInt8)`
- `y UInt8`
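The positional matching above can be illustrated with a small Python model. This is only a sketch, not ClickHouse code: `common_type()` and `union_header()` are hypothetical helpers, and the supertype rule is simplified to cover just the types in this example.

```python
# Toy model of positional UNION matching; not ClickHouse code.
# Each branch is a list of (name, type) pairs in SELECT order.
first = [("NULL", "Nullable(Nothing)"), ("x", "UInt8"), ("y", "UInt8")]
second = [("y", "UInt8"), ("x", "Nullable(Nothing)"), ("y", "UInt8")]


def common_type(a, b):
    """Simplified supertype rule covering only the types in this example."""
    if a == b:
        return a
    nullable = a.startswith("Nullable(") or b.startswith("Nullable(")
    inner = [t for t in (a, b) if not t.startswith("Nullable(")]
    base = inner[0] if inner else "Nothing"
    return f"Nullable({base})" if nullable else base


def union_header(lhs, rhs):
    """Names come from the first branch; types are merged per position."""
    return [(name, common_type(t1, t2)) for (name, t1), (_, t2) in zip(lhs, rhs)]


header = union_header(first, second)
# Positions 1 and 2 are the ones avgWeighted(x, y) reads.
print(header[1], header[2])
```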
When the UNION query is itself a subselect, it returns only the required
columns; in the example above, only `x` and `y` are needed.
To do this, it takes the positions of these arguments from the first query
and then uses those positions to look up the required column names in the
second query (internally, columns can only be requested by name, not by
position). Because the second query has duplicated column names, it
returns (`y`, `x`) instead of (`x`, `y`), which makes the result
incorrect:
EXPLAIN header = 1, optimize = 0, actions=1 SELECT avgWeighted(x, y) FROM (SELECT NULL, 255 AS x, 1 AS y UNION ALL SELECT y, NULL AS x, 1 AS y)
Aggregates:
avgWeighted(x, y)
Function: avgWeighted(Nullable(UInt8), UInt8) → Nullable(Float64)
Arguments: x, y
Argument positions: 0, 1
Expression (Before GROUP BY)
Header: x UInt8
y Nullable(UInt8)
...
Union
Header: x UInt8
y Nullable(UInt8)
Expression (Conversion before UNION)
Header: x UInt8
y Nullable(UInt8)
Expression (Conversion before UNION)
Header: x UInt8
y Nullable(UInt8)
And the query itself fails with an error:
Logical error: 'Bad cast from type DB::ColumnVector<char8_t> to DB::ColumnNullable'.
_NOTE: `avgWeighted()` here is required to trigger `LOGICAL_ERROR`_
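The name-based lookup described above can be sketched in Python. This is a toy model, not the actual ClickHouse code: `prune_by_name()` and `prune_by_position()` are hypothetical helpers, and the first-occurrence deduplication is an assumption about how duplicated names end up as (`y`, `x`).

```python
# Toy model of the column pruning described above; not ClickHouse code.
first = ["NULL", "x", "y"]  # SELECT NULL, 255 AS x, 1 AS y
second = ["y", "x", "y"]    # SELECT y, NULL AS x, 1 AS y
required = ["x", "y"]

# Resolve the required names to positions in the first branch.
positions = [first.index(name) for name in required]

# Translate those positions back into names for the second branch.
names_in_second = [second[p] for p in positions]


def prune_by_name(select_list, wanted):
    """Keep the first occurrence of each wanted name, in select-list order."""
    kept, seen = [], set()
    for position, name in enumerate(select_list):
        if name in wanted and name not in seen:
            kept.append((position, name))
            seen.add(name)
    return kept


def prune_by_position(select_list, wanted_positions):
    """Keep exactly the requested positions, in the requested order."""
    return [(p, select_list[p]) for p in wanted_positions]


by_name = prune_by_name(second, set(names_in_second))
by_position = prune_by_position(second, positions)
print(by_name)      # picks position 0 for `y`, so the order is (y, x)
print(by_position)  # picks positions 1 and 2, so the order is (x, y)
```

Selecting by name picks up the leading `y` of the second query, while selecting by position keeps the columns that actually line up with the first query.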
CI: https://s3.amazonaws.com/clickhouse-test-reports/37796/e637489f81768df582fe7389e57f7ed12893087c/fuzzer_astfuzzerdebug,actions//report.html
Fixes: 02227_union_match_by_name
v2: fix untuple() (reserve space for output_columns_positions too)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
The order of files on disk is not guaranteed to match:
- order of creation
- lexical order
So sometimes 02305_data4.jsonl comes first, and 2 rows are enough to infer
the schema.
Reorganize checks a little to avoid flakiness.
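As an illustration of the premise only (this is not the change made here), the Python sketch below shows why code should not rely on the order in which a directory listing returns files. File names other than 02305_data4.jsonl are hypothetical, as is the `list_data_files()` helper.

```python
import os
import tempfile


def list_data_files(directory):
    """Return file names in a deterministic (lexical) order."""
    return sorted(os.listdir(directory))


with tempfile.TemporaryDirectory() as tmp:
    # Only 02305_data4.jsonl is taken from the test; the rest are made up.
    for name in ("02305_data4.jsonl", "02305_data1.jsonl",
                 "02305_data3.jsonl", "02305_data2.jsonl"):
        open(os.path.join(tmp, name), "w").close()

    raw_order = os.listdir(tmp)         # order is unspecified by the OS
    stable_order = list_data_files(tmp)  # explicit sort makes it stable

print(raw_order)
print(stable_order)
```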
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>