Commit Graph

187 Commits

Author SHA1 Message Date
vdimir
51c44424cc
More metrics for temp files 2022-08-24 16:14:09 +00:00
vdimir
cd4038d511
Use TemporaryFileOnDisk instead of Poco::TemporaryFile 2022-08-24 16:14:08 +00:00
vdimir
1321ac87b5
Minor fixes 2022-08-24 16:14:07 +00:00
Alexey Milovidov
1a8ddf2956 Addition to prev. revision 2022-08-14 09:35:22 +02:00
alexX512
6bf29cb610 Change class LRUCache to class CachBase. Check running CacheBase with default pcahce policy SLRU 2022-08-07 19:59:30 +00:00
Nikita Taranov
4943202921
Improve memory usage during memory efficient merging of aggregation results (#39429) 2022-08-03 17:56:59 +02:00
Amos Bird
53f47127e9
Fix only_merge header 2022-07-01 23:31:45 +08:00
Nikita Taranov
2487ba7f00
Move updateInputStream to ITransformingStep (#37393) 2022-06-27 13:16:52 +02:00
Azat Khuzhin
0d4f78639e Interpreters/Aggregator: cleaner interface for block release during merge
Suggested-by: @amosbird
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-16 09:58:36 +03:00
Azat Khuzhin
4694929623 Implement merging only for AggregatingStep
v2: fill AggregateColumnsConstData only for only_merge
    (fixes 01291_aggregation_in_order and some other tests)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-16 09:58:36 +03:00
Anton Popov
df6882d2b9
Revert "Fix errors of CheckTriviallyCopyableMove type" 2022-06-07 13:53:10 +02:00
HeenaBansal2009
b7eb6bbd38 Fixed clang-tidy-CheckTriviallyCopyableMove-errors 2022-05-30 11:09:03 -07:00
Dmitry Novik
dd1e7b55b8
Merge pull request #37050 from azat/fix-optimize_aggregation_in_order-prefix-Array
Fix optimize_aggregation_in_order with prefix GROUP BY and *Array aggregate functions
2022-05-16 17:17:56 +02:00
Azat Khuzhin
323ae98202 Fix optimize_aggregation_in_order with prefix GROUP BY and *Array aggregate functions
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Fixes: #35111
See-also: #37046
2022-05-09 21:32:40 +03:00
Azat Khuzhin
6ada8a6337 Fix optimize_aggregation_in_order with *Array aggregate functions
row_begin was wrong, and before this patch aggregator processing
{row_end, row_end} range, in other words, zero range.

Fixes: #9113 (cc @dimarub2000)
v2: add static_cast to fix UBSan
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-09 20:43:19 +03:00
mergify[bot]
93f5aa7488
Merge branch 'master' into aggregator-jit-lock-fix 2022-05-02 16:16:36 +00:00
Dmitry Novik
9be17ef50c
Merge pull request #35111 from azat/optimize_aggregation_in_order-prefix
Implement partial GROUP BY key for optimize_aggregation_in_order
2022-05-02 17:49:48 +02:00
Maksim Kita
c059629f8d Aggregator JIT compilation lock fix 2022-05-02 16:21:10 +02:00
Nikita Taranov
0fd9740c72
Log hash table's cache messages with TRACE level (#36830) 2022-05-01 12:54:54 +02:00
Robert Schulze
89aa9ae00f
Fixed clang-tidy check "bugprone-branch-clone"
The check is currently *not* part of .clang-tidy. It complains about:
(1) "switch has multiple consecutive identical branches"
(2) "repeated branch in conditional chain"

About (1): Lots of findings in switches were about redundant
"[[fallthrough]]" in places where the compiler would not warn anyways. I
have cleaned these up.

About (2): In if-else_if-else chains, fixing the warning would usually
mean concatenating multiple if-conditions. As this would reduce
readability in most cases, I did not fix these places.

Because of (2), I also refrained from adding "bugprone-branch-clone" to
.clang-tidy.
2022-04-30 19:40:28 +02:00
Azat Khuzhin
0ce44f3021 Optimize optimize_aggregation_in_order with a prefix key
Before it does lots of extra work, now, it will be significantly more
optimal (thousands of rows -> 1-2 million of rows).

v2: s/executeOnBlockSimple/executeOnBlockSmall/
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-04-29 06:58:28 +03:00
Azat Khuzhin
190ce217bb Disable GROUP BY statistics for optimize_aggregation_in_order
This statistics significantly decrease performance of
optimize_aggregation_in_order with a prefix key.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-04-29 06:58:27 +03:00
Azat Khuzhin
767acd53fb Add ability to pass range of rows to Aggregator
v2: fix compiled aggregate functions (seek result to row_start)
v3: fix compiled aggregate functions (seek args to row_start)
v4: change signatures for JIT
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-04-29 06:57:55 +03:00
Robert Schulze
118e94523c
Activate clang-tidy warning "readability-container-contains"
This check suggests replacing <Container>.count() by
<Container>.contains() which is more speaking and in case of
multimaps/multisets also faster.
2022-04-18 23:53:11 +02:00
Nikita Taranov
30f2a942c5
Predict size of hash table for GROUP BY (#33439)
* use AggregationMethod ctor with reserve

* add new settings

* add HashTablesStatistics

* support queries with limit

* support distributed and with external aggregation

* add new profile events

* add some tests

* add perf test

* export cache stats through AsynchronousMetrics

* rm redundant trace

* fix style

* fix 02122_parallel_formatting test

* review fixes

* fix 02122_parallel_formatting test

* apply also to two-level HTs

* try simpler strategy

* increase max_size_to_preallocate_for_aggregation for experiment

* fixes

* Revert "increase max_size_to_preallocate_for_aggregation for experiment"

This reverts commit 6cf6f75704.

* fix test

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-03-30 22:47:51 +02:00
Azat Khuzhin
4fa2ae76bc Fix memory leak in AggregatingInOrderTransform
Reproducer:

    # NOTE: we need clickhouse from 33957 since right now LSan is broken due to getauxval().
    $ url=https://s3.amazonaws.com/clickhouse-builds/33957/e04b862673644d313712607a0078f5d1c48b5377/package_asan/clickhouse
    $ wget $url -o clickhouse-asan
    $ chmod +x clickhouse-asan
    $ ./clickhouse-asan server &

    $ ./clickhouse-asan client
    :) create table data (key Int, value String) engine=MergeTree() order by key
    :) insert into data select number%5, toString(number) from numbers(10e6)

    # usually it is enough one query, benchmark is just for stability of the results
    # note, that if the exception was not happen from AggregatingInOrderTransform then add --continue_on_errors and wait
    $ ./clickhouse-asan benchmark --query 'select key, uniqCombined64(value), groupArray(value) from data group by key' --optimize_aggregation_in_order=1 --memory_tracker_fault_probability=0.01, max_untracked_memory='2Mi'

LSan report:

    ==24595==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 3932160 byte(s) in 6 object(s) allocated from:
        0 0xcadba93 in realloc ()
        1 0xcc108d9 in Allocator<false, false>::realloc() obj-x86_64-linux-gnu/../src/Common/Allocator.h:134:30
        2 0xde19eae in void DB::PODArrayBase<>::realloc<DB::Arena*&>(unsigned long, DB::Arena*&) obj-x86_64-linux-gnu/../src/Common/PODArray.h:161:25
        3 0xde5f039 in void DB::PODArrayBase<>::reserveForNextSize<DB::Arena*&>(DB::Arena*&) obj-x86_64-linux-gnu/../src/Common/PODArray.h
        4 0xde5f039 in void DB::PODArray<>::push_back<>(DB::GroupArrayNodeString*&, DB::Arena*&) obj-x86_64-linux-gnu/../src/Common/PODArray.h:432:19
        5 0xde5f039 in DB::GroupArrayGeneralImpl<>::add() const obj-x86_64-linux-gnu/../src/AggregateFunctions/AggregateFunctionGroupArray.h:465:31
        6 0xde5f039 in DB::IAggregateFunctionHelper<>::addBatchSinglePlaceFromInterval() const obj-x86_64-linux-gnu/../src/AggregateFunctions/IAggregateFunction.h:481:53
        7 0x299df134 in DB::Aggregator::executeOnIntervalWithoutKeyImpl() obj-x86_64-linux-gnu/../src/Interpreters/Aggregator.cpp:869:31
        8 0x2ca75f7d in DB::AggregatingInOrderTransform::consume() obj-x86_64-linux-gnu/../src/Processors/Transforms/AggregatingInOrderTransform.cpp:124:13

    ...

    SUMMARY: AddressSanitizer: 4523184 byte(s) leaked in 12 allocation(s).

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-02-09 09:23:56 +03:00
Maksim Kita
5ef83deaa6 Update sort to pdqsort 2022-01-30 19:49:48 +00:00
lgbo-ustc
59cbd76880 Add LRUResourceCache
1. add LRUResourceCache for managing resource cache in lru policy
2. rollback LRUCache to the original version
3. add remove() in LRUCache
4. add unit tests for LRUResourceCache and LRUCache
2021-12-29 15:25:33 +08:00
lgbo-ustc
ef1d7142f5 remove getOrTrySet 2021-12-27 16:12:39 +08:00
Anton Popov
54f51444c0 Merge remote-tracking branch 'upstream/master' into HEAD 2021-12-01 15:49:02 +03:00
Azat Khuzhin
baf14444e6 Cleanup ProfileEvents and CurrentMetrics 2021-11-10 21:15:27 +03:00
Anton Popov
d71ffc355a Merge remote-tracking branch 'upstream/master' into HEAD 2021-10-18 15:18:22 +03:00
Nikolai Kochetov
fd14faeae2 Remove DataStreams folder. 2021-10-15 23:18:20 +03:00
Anton Popov
7aa6068fb2 Merge remote-tracking branch 'upstream/master' into HEAD 2021-10-14 19:44:08 +03:00
Nikolai Kochetov
c6bce1a4cf Update Native. 2021-10-08 20:21:19 +03:00
Azat Khuzhin
6a9dd9828d Move protocol macros into separate header
Defines.h is a very common header, so lots of modules will be recompiled
on changes.
Move macros for protocol into separate header, this should significantly
decreases number of units to compile on it's changes.
2021-10-03 14:34:03 +03:00
Anton Popov
eef436fe22 Merge remote-tracking branch 'upstream/master' into HEAD 2021-09-16 18:07:42 +03:00
Anton Popov
3adc2a9aea fix distinct with sparse columns 2021-09-16 15:52:59 +03:00
Amos Bird
91293c7449
Fix crash on exception with projection aggregate 2021-09-09 10:43:56 +08:00
Anton Popov
c3c3a06078 Merge remote-tracking branch 'upstream/master' into HEAD 2021-08-20 01:45:38 +03:00
Maksim Kita
07c1a8e26e Aggregation temporary disable compilation without key 2021-08-11 19:37:33 +03:00
Maksim Kita
e6b339fbb3
Merge pull request #26845 from kitaisreal/compile-aggregate-functions-without-key
Compile aggregate functions without key
2021-08-09 11:52:52 +03:00
Alexey Milovidov
9a5533a088 Improve performance 2021-08-05 23:44:14 +03:00
Anton Popov
16ed0f6ed4 Merge remote-tracking branch 'upstream/master' into HEAD 2021-08-02 17:55:17 +03:00
Maksim Kita
fd19f47311 Fixed MSan tests 2021-07-28 19:47:36 +03:00
Maksim Kita
3a6b37691a Compile aggregate functions without key 2021-07-27 19:50:57 +03:00
Maksim Kita
42201d3e30 Fixed code review issues 2021-07-23 01:03:44 +03:00
Maksim Kita
1fea19846b Compile aggregate functions profile events fix 2021-07-23 00:43:31 +03:00
Anton Popov
2b0ec88f5a disable jit aggregation for sparse columns 2021-07-20 20:02:41 +03:00
Anton Popov
14168b11f2 Merge remote-tracking branch 'upstream/master' into HEAD 2021-07-07 17:05:11 +03:00