v2: fill AggregateColumnsConstData only for only_merge
(fixes 01291_aggregation_in_order and some other tests)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
row_begin was wrong, and before this patch aggregator processing
{row_end, row_end} range, in other words, zero range.
Fixes: #9113 (cc @dimarub2000)
v2: add static_cast to fix UBSan
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
The check is currently *not* part of .clang-tidy. It complains about:
(1) "switch has multiple consecutive identical branches"
(2) "repeated branch in conditional chain"
About (1): Lots of findings in switches were about redundant
"[[fallthrough]]" in places where the compiler would not warn anyways. I
have cleaned these up.
About (2): In if-else_if-else chains, fixing the warning would usually
mean concatenating multiple if-conditions. As this would reduce
readability in most cases, I did not fix these places.
Because of (2), I also refrained from adding "bugprone-branch-clone" to
.clang-tidy.
Before it does lots of extra work, now, it will be significantly more
optimal (thousands of rows -> 1-2 million of rows).
v2: s/executeOnBlockSimple/executeOnBlockSmall/
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
This statistics significantly decrease performance of
optimize_aggregation_in_order with a prefix key.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* use AggregationMethod ctor with reserve
* add new settings
* add HashTablesStatistics
* support queries with limit
* support distributed and with external aggregation
* add new profile events
* add some tests
* add perf test
* export cache stats through AsynchronousMetrics
* rm redundant trace
* fix style
* fix 02122_parallel_formatting test
* review fixes
* fix 02122_parallel_formatting test
* apply also to two-level HTs
* try simpler strategy
* increase max_size_to_preallocate_for_aggregation for experiment
* fixes
* Revert "increase max_size_to_preallocate_for_aggregation for experiment"
This reverts commit 6cf6f75704.
* fix test
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Reproducer:
# NOTE: we need clickhouse from 33957 since right now LSan is broken due to getauxval().
$ url=https://s3.amazonaws.com/clickhouse-builds/33957/e04b862673644d313712607a0078f5d1c48b5377/package_asan/clickhouse
$ wget $url -o clickhouse-asan
$ chmod +x clickhouse-asan
$ ./clickhouse-asan server &
$ ./clickhouse-asan client
:) create table data (key Int, value String) engine=MergeTree() order by key
:) insert into data select number%5, toString(number) from numbers(10e6)
# usually it is enough one query, benchmark is just for stability of the results
# note, that if the exception was not happen from AggregatingInOrderTransform then add --continue_on_errors and wait
$ ./clickhouse-asan benchmark --query 'select key, uniqCombined64(value), groupArray(value) from data group by key' --optimize_aggregation_in_order=1 --memory_tracker_fault_probability=0.01, max_untracked_memory='2Mi'
LSan report:
==24595==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 3932160 byte(s) in 6 object(s) allocated from:
0 0xcadba93 in realloc ()
1 0xcc108d9 in Allocator<false, false>::realloc() obj-x86_64-linux-gnu/../src/Common/Allocator.h:134:30
2 0xde19eae in void DB::PODArrayBase<>::realloc<DB::Arena*&>(unsigned long, DB::Arena*&) obj-x86_64-linux-gnu/../src/Common/PODArray.h:161:25
3 0xde5f039 in void DB::PODArrayBase<>::reserveForNextSize<DB::Arena*&>(DB::Arena*&) obj-x86_64-linux-gnu/../src/Common/PODArray.h
4 0xde5f039 in void DB::PODArray<>::push_back<>(DB::GroupArrayNodeString*&, DB::Arena*&) obj-x86_64-linux-gnu/../src/Common/PODArray.h:432:19
5 0xde5f039 in DB::GroupArrayGeneralImpl<>::add() const obj-x86_64-linux-gnu/../src/AggregateFunctions/AggregateFunctionGroupArray.h:465:31
6 0xde5f039 in DB::IAggregateFunctionHelper<>::addBatchSinglePlaceFromInterval() const obj-x86_64-linux-gnu/../src/AggregateFunctions/IAggregateFunction.h:481:53
7 0x299df134 in DB::Aggregator::executeOnIntervalWithoutKeyImpl() obj-x86_64-linux-gnu/../src/Interpreters/Aggregator.cpp:869:31
8 0x2ca75f7d in DB::AggregatingInOrderTransform::consume() obj-x86_64-linux-gnu/../src/Processors/Transforms/AggregatingInOrderTransform.cpp:124:13
...
SUMMARY: AddressSanitizer: 4523184 byte(s) leaked in 12 allocation(s).
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
1. add LRUResourceCache for managing resource cache in lru policy
2. rollback LRUCache to the original version
3. add remove() in LRUCache
4. add unit tests for LRUResourceCache and LRUCache
Defines.h is a very common header, so lots of modules will be recompiled
on changes.
Move macros for protocol into separate header, this should significantly
decreases number of units to compile on it's changes.
Overflow row is used for GROUP BY if all of the above is true:
- WITH TOTALS is requested
- max_rows_to_group_by > 0
- group_by_overflow_mode = any
- totals_mode != after_having_exclusive
And in case of overflow row and external GROUP BY, once the temporary
file dumps to disk it resets without_key data variant to nullptr, so any
subsequent dump to disk will cause SIGSEGV.
Fix this, by recreating without_key data variant after dumping to disk,
instead of reseting to nullptr.
And also add sanity check (LOGICAL_ERROR) to make error more
deterministic in case of such error.
Found with fuzzer [1].
[1]: https://clickhouse-test-reports.s3.yandex.net/23929/e7027e052998540ee660d186727e20f9555b729d/fuzzer_ubsan/report.html#fail1