readability-identifier-length was added with Clang 14 which does not yet
run in the central builds yet but (at least on my system) locally.
Disabling the check because it is too noisy. Older clang-tidy versions
will just ignore the setting.
* Fix GROUP BY AggregateFunction
finalizeChunk() was unconditionally converting AggregateFunction to the
underlying type, however this should be done only if the aggregate was
applied.
So pass names of aggregates as an argument to the finalizeChunk()
Fuzzer report [1]:
Logical error: 'Bad cast from type DB::ColumnArray to DB::ColumnAggregateFunction'. Received signal 6 Received signal Aborted (6)
For the following query:
SELECT
arraySort(groupArrayArray(grp_simple)),
grp_aggreg,
arraySort(groupArrayArray(grp_simple)),
b,
arraySort(groupArrayArray(grp_simple)) AS grs
FROM data_02294
GROUP BY
a,
grp_aggreg,
b
SETTINGS optimize_aggregation_in_order = 1
[1]: https://s3.amazonaws.com/clickhouse-test-reports/37050/323ae98202d80fc4b311be1e7308ef2ac39e6063/fuzzer_astfuzzerdebug,actions//fuzzer.log
v2: fix conflicts in src/Interpreters/InterpreterSelectQuery.cpp
v3: Fix header for GROUP BY AggregateFunction WITH TOTALS
v4: Add sanity check into finalizeBlock()
v5: Use typeid_cast<&> to get more sensible error in case of bad cast (as suggested by @nickitat)
v6: Fix positions passed to finalizeChunk()
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Core/ColumnNumbers.h: remove unused <string>
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Optimize finalizeChunk()/finalizeBlock()
v2: s/ByPosition/Mask/ s/by_position/mask/
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
GCC support these days is experimental. GCCs main use is to keep the
code portable but I don't think it's used for performance tests. Hence
removing a performance workaround added in the GCC 7 days (we are now
using 11, soon: 12).
- changed config.xml/yaml files used by CH's own internal tests which
are (hopefully) not sensitive to mark_cache_size being set or not
- further occurrences exist but changing them seems a bad idea (e.g.
because they are in customer-provided data)