Commit Graph

130 Commits

Author SHA1 Message Date
Nikita Taranov
a6c4f9218a clean up 2022-08-16 18:56:22 +02:00
Nikita Taranov
248011d7d9 move to utils 2022-08-16 18:56:22 +02:00
Nikita Taranov
6bdbaccc37 use max_block_size from settings 2022-08-16 18:56:22 +02:00
Nikita Taranov
433657e978 rm prepareBlockAndFill 2022-08-16 18:56:22 +02:00
Nikita Taranov
f650b23ee3 generate many blocks 2022-08-16 18:56:22 +02:00
Nikita Taranov
db0110fd7a more accurate crutch 2022-08-16 18:56:22 +02:00
Nikita Taranov
e5e0a24ab3 return chunks from prepareBlockAndFillWithoutKey 2022-08-16 18:56:22 +02:00
Nikita Taranov
4e974661d6 refactor convertToBlockImpl 2022-08-16 18:56:22 +02:00
Nikita Taranov
bca28ba9f8 split prepareBlockAndFill 2022-08-16 18:56:22 +02:00
Nikita Taranov
4943202921 Improve memory usage during memory efficient merging of aggregation results (#39429) 2022-08-03 17:56:59 +02:00
Robert Schulze
deda29b46b Pass const StringRef by value, not by reference
See #39224
2022-07-15 11:34:56 +00:00
Nikita Taranov
2487ba7f00 Move updateInputStream to ITransformingStep (#37393) 2022-06-27 13:16:52 +02:00
Azat Khuzhin
0d4f78639e Interpreters/Aggregator: cleaner interface for block release during merge
Suggested-by: @amosbird
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-16 09:58:36 +03:00
Azat Khuzhin
4694929623 Implement merging only for AggregatingStep
v2: fill AggregateColumnsConstData only for only_merge
    (fixes 01291_aggregation_in_order and some other tests)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-16 09:58:36 +03:00
Nikolai Kochetov
f7dbd48ee5 Simplify code a little bit. 2022-05-10 16:12:03 +00:00
Nikolai Kochetov
a02e1d2f4a Simplify code a little bit. 2022-05-10 16:00:00 +00:00
Dmitry Novik
9a251e0028 Cleanup code 2022-05-05 18:13:00 +00:00
Dmitry Novik
4cc26aa38b Merge remote-tracking branch 'origin/master' into grouping-sets-fix
And fix execution of the query with only one grouping set
2022-05-05 17:14:52 +00:00
Dmitry Novik
161f52292b Support distributed queries 2022-05-05 13:56:16 +00:00
Dmitry Novik
9be17ef50c Merge pull request #35111 from azat/optimize_aggregation_in_order-prefix
Implement partial GROUP BY key for optimize_aggregation_in_order
2022-05-02 17:49:48 +02:00
Azat Khuzhin
0ce44f3021 Optimize optimize_aggregation_in_order with a prefix key
Previously it did a lot of extra work; now it is significantly more
efficient (thousands of rows -> 1-2 million rows).

v2: s/executeOnBlockSimple/executeOnBlockSmall/
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-04-29 06:58:28 +03:00
Azat Khuzhin
190ce217bb Disable GROUP BY statistics for optimize_aggregation_in_order
These statistics significantly decrease the performance of
optimize_aggregation_in_order with a prefix key.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-04-29 06:58:27 +03:00
Azat Khuzhin
767acd53fb Add ability to pass range of rows to Aggregator
v2: fix compiled aggregate functions (seek result to row_start)
v3: fix compiled aggregate functions (seek args to row_start)
v4: change signatures for JIT
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-04-29 06:57:55 +03:00
Amos Bird
4a5e4274f0 base should not depend on Common 2022-04-29 10:26:35 +08:00
Dmitry Novik
6e73cd6929 Implement parallel grouping sets processing 2022-04-21 01:18:40 +00:00
Dmitry Novik
a16710c750 Merge remote-tracking branch 'origin/master' into grouping-sets-fix 2022-04-14 17:29:51 +00:00
Nikita Taranov
30f2a942c5 Predict size of hash table for GROUP BY (#33439)
* use AggregationMethod ctor with reserve

* add new settings

* add HashTablesStatistics

* support queries with limit

* support distributed and with external aggregation

* add new profile events

* add some tests

* add perf test

* export cache stats through AsynchronousMetrics

* rm redundant trace

* fix style

* fix 02122_parallel_formatting test

* review fixes

* fix 02122_parallel_formatting test

* apply also to two-level HTs

* try simpler strategy

* increase max_size_to_preallocate_for_aggregation for experiment

* fixes

* Revert "increase max_size_to_preallocate_for_aggregation for experiment"

This reverts commit 6cf6f75704.

* fix test

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-03-30 22:47:51 +02:00
Maksim Kita
e30117a3d6 Fix clang-tidy warnings in Interpreters, IO folders 2022-03-14 18:17:35 +00:00
Dmitry Novik
67df01d025 Merge branch 'master' into grouping-sets-fix 2022-02-22 04:18:03 -08:00
feng lv
07280e0ab1 Add name hints for data skipping indices
fix test
2022-02-20 11:48:22 +00:00
Azat Khuzhin
4fa2ae76bc Fix memory leak in AggregatingInOrderTransform
Reproducer:

    # NOTE: we need clickhouse from 33957 since right now LSan is broken due to getauxval().
    $ url=https://s3.amazonaws.com/clickhouse-builds/33957/e04b862673644d313712607a0078f5d1c48b5377/package_asan/clickhouse
    $ wget $url -O clickhouse-asan
    $ chmod +x clickhouse-asan
    $ ./clickhouse-asan server &

    $ ./clickhouse-asan client
    :) create table data (key Int, value String) engine=MergeTree() order by key
    :) insert into data select number%5, toString(number) from numbers(10e6)

    # usually one query is enough; the benchmark is just for stability of the results
    # note that if the exception did not come from AggregatingInOrderTransform, add --continue_on_errors and wait
    $ ./clickhouse-asan benchmark --query 'select key, uniqCombined64(value), groupArray(value) from data group by key' --optimize_aggregation_in_order=1 --memory_tracker_fault_probability=0.01 --max_untracked_memory='2Mi'

LSan report:

    ==24595==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 3932160 byte(s) in 6 object(s) allocated from:
        0 0xcadba93 in realloc ()
        1 0xcc108d9 in Allocator<false, false>::realloc() obj-x86_64-linux-gnu/../src/Common/Allocator.h:134:30
        2 0xde19eae in void DB::PODArrayBase<>::realloc<DB::Arena*&>(unsigned long, DB::Arena*&) obj-x86_64-linux-gnu/../src/Common/PODArray.h:161:25
        3 0xde5f039 in void DB::PODArrayBase<>::reserveForNextSize<DB::Arena*&>(DB::Arena*&) obj-x86_64-linux-gnu/../src/Common/PODArray.h
        4 0xde5f039 in void DB::PODArray<>::push_back<>(DB::GroupArrayNodeString*&, DB::Arena*&) obj-x86_64-linux-gnu/../src/Common/PODArray.h:432:19
        5 0xde5f039 in DB::GroupArrayGeneralImpl<>::add() const obj-x86_64-linux-gnu/../src/AggregateFunctions/AggregateFunctionGroupArray.h:465:31
        6 0xde5f039 in DB::IAggregateFunctionHelper<>::addBatchSinglePlaceFromInterval() const obj-x86_64-linux-gnu/../src/AggregateFunctions/IAggregateFunction.h:481:53
        7 0x299df134 in DB::Aggregator::executeOnIntervalWithoutKeyImpl() obj-x86_64-linux-gnu/../src/Interpreters/Aggregator.cpp:869:31
        8 0x2ca75f7d in DB::AggregatingInOrderTransform::consume() obj-x86_64-linux-gnu/../src/Processors/Transforms/AggregatingInOrderTransform.cpp:124:13

    ...

    SUMMARY: AddressSanitizer: 4523184 byte(s) leaked in 12 allocation(s).

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-02-09 09:23:56 +03:00
Dmitry Novik
1095814bbf Cleanup code 2022-01-11 11:26:13 +00:00
fanzhou
83e9e5d0e5 some changes 2022-01-11 11:26:12 +00:00
MaxTheHuman
e6bd807f60 grouping sets development 2022-01-11 11:26:10 +00:00
MaxTheHuman
4d1b354b5f grouping sets development 2022-01-11 11:26:10 +00:00
MaxTheHuman
abe09324c1 grouping sets development 2022-01-11 11:26:10 +00:00
MaxTheHuman
3195d600c5 feat grouping-sets: initial changes 2022-01-11 11:26:10 +00:00
alexey-milovidov
0a55fa3dc2 Revert "Grouping sets dev" 2021-12-25 20:30:31 +03:00
alexey-milovidov
6b97af4c63 Merge pull request #26869 from taylor12805/grouping-sets-dev
Grouping sets dev
2021-12-17 20:50:15 +03:00
Dmitry Novik
56a3f4a000 Cleanup code 2021-12-14 22:15:14 +03:00
fanzhou
43db4594ba some changes 2021-11-29 19:35:33 +03:00
MaxTheHuman
ddd1799743 grouping sets development 2021-11-26 22:11:34 +03:00
MaxTheHuman
d2258decf5 grouping sets development 2021-11-26 21:50:03 +03:00
MaxTheHuman
e60d1dd818 grouping sets development 2021-11-26 21:38:44 +03:00
MaxTheHuman
2bd07ef338 feat grouping-sets: initial changes 2021-11-26 20:24:35 +03:00
Anton Popov
d50137013c Merge remote-tracking branch 'upstream/master' into HEAD 2021-11-01 16:55:53 +03:00
Nikolai Kochetov
a92dc0a826 Update obsolete comments. 2021-10-19 12:58:10 +03:00
Anton Popov
d71ffc355a Merge remote-tracking branch 'upstream/master' into HEAD 2021-10-18 15:18:22 +03:00
Nikolai Kochetov
fd14faeae2 Remove DataStreams folder. 2021-10-15 23:18:20 +03:00
Anton Popov
7aa6068fb2 Merge remote-tracking branch 'upstream/master' into HEAD 2021-10-14 19:44:08 +03:00