307 Commits

Author SHA1 Message Date
Li Shuai
279970337a Fix all key value is null and group use rollup return wrong answer 2023-05-04 11:07:17 +08:00
Sema Checherinda
4dd86a406a
Merge pull request #48543 from azat/mv-uniq-thread-group
Use one ThreadGroup while pushing to materialized views (and some refactoring for ThreadGroup)
2023-04-11 11:47:46 +02:00
Azat Khuzhin
79b83c4fd2 Remove superfluous includes of logger_useful.h from headers
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-04-10 17:59:30 +02:00
Azat Khuzhin
5b2b20a0b0 Rename ThreadGroupStatus to ThreadGroup
There are methods like getThreadGroup() and ThreadGroupSwitcher class,
so seems that this is logical.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-04-07 15:31:48 +02:00
Azat Khuzhin
f38a7aeabe ThreadPool metrics introspection
There are lots of thread pools and simple local-vs-global is not enough
already, it is good to know which one in particular uses threads.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-03-29 10:46:59 +02:00
Nikolai Kochetov
73e98de46d Merge branch 'master' into aggregate-projections-analysis-query-plan 2023-03-23 21:28:36 +01:00
Sema Checherinda
3c6deddd1d work with comments on PR 2023-03-16 19:55:58 +01:00
Anton Popov
d2a8cd3ed4 fix performance regression 2023-03-14 14:51:28 +00:00
Nikolai Kochetov
669a92bae0 Merge branch 'master' into aggregate-projections-analysis-query-plan 2023-03-13 19:55:49 +01:00
Anton Popov
6f3e4d4137
Merge pull request #46118 from CurtizJ/fix-issues-with-sparse
Randomize setting `ratio_of_defaults_for_sparse_serialization`
2023-03-05 22:28:18 +01:00
LiuNeng
d4c5ab9dcd
Optimize one nullable key aggregate performance (#45772) 2023-03-02 21:01:52 +01:00
Anton Popov
d926713cf5 Merge remote-tracking branch 'upstream/master' into HEAD 2023-02-23 23:04:22 +00:00
Nikolai Kochetov
c5f93eb108 Fix more tests. 2023-02-21 15:44:50 +00:00
Nikolai Kochetov
413a8d38fa Fix totals row for projections. 2023-02-20 16:40:35 +00:00
Anton Popov
3730ea388f fix issues with sparse columns 2023-02-15 21:46:26 +00:00
Nikita Taranov
581f31ad3d better 2023-01-30 17:11:56 +00:00
Nikita Taranov
a18343773f improve perf 2023-01-30 17:11:56 +00:00
Alexander Tokmakov
70d1adfe4b
Better formatting for exception messages (#45449)
* save format string for NetException

* format exceptions

* format exceptions 2

* format exceptions 3

* format exceptions 4

* format exceptions 5

* format exceptions 6

* fix

* format exceptions 7

* format exceptions 8

* Update MergeTreeIndexGin.cpp

* Update AggregateFunctionMap.cpp

* Update AggregateFunctionMap.cpp

* fix
2023-01-24 00:13:58 +03:00
Nikita Taranov
006fdd32d4
Apply preallocation optimisation more carefully (#44455)
* impl

* add perf test

* fix

* review fixes
2023-01-09 13:30:48 +01:00
Vladimir C
7482ea54ab
Merge pull request #43972 from ClickHouse/vdimir/tmp-data-in-fs-cache-2 2022-12-23 11:59:27 +01:00
vdimir
182b34c11e
Fixes 2022-12-22 10:22:57 +00:00
Dmitry Novik
3d2fccab87
Merge branch 'master' into refector-function-node 2022-12-12 21:36:39 +01:00
Nikita Taranov
9e2265a6ed
Improve hash table preallocation optimisation (#43945)
* do not preallocate if max_size_to_preallocate_for_aggregation is too small

* skip optimisation for aggr without key

* increase default for max_size_to_preallocate_for_aggregation
2022-12-08 00:05:15 +01:00
Dmitry Novik
25ecb75ca8 Merge remote-tracking branch 'origin/master' into refector-function-node 2022-12-07 18:36:50 +00:00
Nikolai Kochetov
0ed82f3cc0
Merge branch 'master' into aggregating-in-order-from-query-plan 2022-12-06 16:36:49 +01:00
Dmitry Novik
2c70dbc76a Refactor FunctionNode 2022-12-02 19:15:26 +00:00
Alexander Tokmakov
e45105bf44 detach threads from thread group 2022-11-28 21:31:55 +01:00
Nikolai Kochetov
6d0646ed8f
Merge branch 'master' into aggregating-in-order-from-query-plan 2022-11-28 16:53:29 +01:00
Nikolai Kochetov
1dfa188c7a Add order info for aggregating step in plan. Added test. 2022-11-28 15:15:36 +00:00
Nikita Taranov
7beb58b0cf
Optimize merge of uniqExact without_key (#43072)
* impl for uniqExact

* rm unused (read|write)Text methods

* fix style

* small fixes

* impl for variadic uniqExact

* refactor

* fix style

* more aggressive inlining

* disable if max_threads=1

* small improvements

* review fixes

* Revert "rm unused (read|write)Text methods"

This reverts commit a7e7480584.

* encapsulate is_able_to_parallelize_merge in Data

* encapsulate is_exact & argument_is_tuple in Data
2022-11-17 13:19:02 +01:00
taojiatao
721e85a03e minor fix error msg, replace outdated func name 2022-11-04 11:27:10 +08:00
Azat Khuzhin
4e76629aaf Fixes for -Wshorten-64-to-32
- lots of static_cast
- add safe_cast
- types adjustments
  - config
  - IStorage::read/watch
  - ...
- some TODO's (to convert types in future)

P.S. That was quite a journey...

v2: fixes after rebase
v3: fix conflicts after #42308 merged
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-10-21 13:25:19 +02:00
Duc Canh Le
c05429574d add exception 2022-10-17 08:59:39 +08:00
Duc Canh Le
5526e05aac remove junk log 2022-10-16 17:19:40 +08:00
Duc Canh Le
9e9e967f1f choose correct aggregation method for lc128 and lc512 2022-10-16 16:57:15 +08:00
vdimir
0178307c27 Followup for TemporaryDataOnDisk 2022-10-12 15:25:23 +02:00
Alexander Tokmakov
4175f8cde6 abort instead of __builtin_unreachable in debug builds 2022-10-07 21:49:08 +02:00
Raúl Marín
adbaaca2f5
QOL log improvements (#41947)
* Uniformize disk reservation logs

* Remove log about destroying stuff that appears all the time

* More tweaks on disk reservation logs

* Reorder logs in hash join

* Remove log that provides little information

* Collapse part removal logs

Co-authored-by: Sergei Trifonov <sergei@clickhouse.com>
2022-10-06 14:22:44 +02:00
vdimir
0605f6ed7f
fix after rebase 2022-09-29 09:51:48 +00:00
vdimir
6f8e8b979d
Revert "wip"
This reverts commit 46e4f0236df9a6f7b03d40278e583bc93b96559a.
2022-09-29 09:51:47 +00:00
vdimir
74d45325b3
wip 2022-09-29 09:51:46 +00:00
vdimir
9f3f34548c
Allow to create temporary streams on leaf TemporaryDataOnDisk 2022-09-29 09:51:45 +00:00
vdimir
a56a10f089
Do not require tmp_data in Aggregator 2022-09-29 09:51:42 +00:00
vdimir
15c7a3be34
Temp data on disk: build 2022-09-29 09:51:41 +00:00
vdimir
c0898ce289
Use abstraction for temporary data on disk in Sort and Aggregation 2022-09-29 09:51:41 +00:00
vdimir
ac39bbb3f1
[wip] Common interface for temporary data on disk 2022-09-29 09:51:40 +00:00
Alexey Milovidov
45afacdae4
Merge pull request #41186 from ClickHouse/fix-three-fourth-of-trash
Fix more than half of the trash
2022-09-22 07:28:26 +03:00
Nikita Taranov
100c055510
Prefetching in aggregation (#39304)
* impl

* stash

* clean up

* do not apply when HT is small

* make branch static

* also in merge

* do not hardcode look ahead value

* fix

* apply to methods with cheap key calculation

* more tests

* silence tidy

* fix build

* support HashMethodKeysFixed

* apply during merge only for cheap

* stash

* fixes

* rename method

* add feature flag

* cache prefetch threshold value

* fix

* fix

* Update HashMap.h

* fix typo

* 256KB as default l2 size

Co-authored-by: Alexey Milovidov <milovidov@clickhouse.com>
2022-09-21 18:59:07 +02:00
Alexey Milovidov
45bd3cfc30 Merge branch 'master' into fix-three-fourth-of-trash 2022-09-20 21:27:41 +02:00
Alexey Milovidov
ab4db2d0c4 Fix 5/6 of trash 2022-09-19 08:50:53 +02:00
avogar
c9a9ef5b7e Fix comments 2022-09-16 16:12:30 +00:00
avogar
f1a0501eb2 Fix memory leaks and segfaults in combinators 2022-09-14 18:01:49 +00:00
Alexey Milovidov
193cd1b3b2
Merge pull request #39138 from nickitat/control_block_size_in_aggregator
Control block size in aggregator
2022-09-04 04:51:00 +03:00
vdimir
91788f29e8
Upd TemporaryFileOnDisk 2022-08-24 16:15:54 +00:00
vdimir
7194df1184
Move back TemporaryFile -> TemporaryFileOnDisk 2022-08-24 16:14:11 +00:00
vdimir
51c44424cc
More metrics for temp files 2022-08-24 16:14:09 +00:00
vdimir
cd4038d511
Use TemporaryFileOnDisk instead of Poco::TemporaryFile 2022-08-24 16:14:08 +00:00
vdimir
1321ac87b5
Minor fixes 2022-08-24 16:14:07 +00:00
Nikita Taranov
b31342ec2c fix 2022-08-22 19:29:48 +02:00
Nikita Taranov
a6c4f9218a clean up 2022-08-16 18:56:22 +02:00
Nikita Taranov
248011d7d9 move to utils 2022-08-16 18:56:22 +02:00
Nikita Taranov
6bdbaccc37 use max_block_size from settings 2022-08-16 18:56:22 +02:00
Nikita Taranov
370c25cd2a fix comment 2022-08-16 18:56:22 +02:00
Nikita Taranov
2b76abdacd fix tidy 2022-08-16 18:56:22 +02:00
Nikita Taranov
66b3268c65 fix 2022-08-16 18:56:22 +02:00
Nikita Taranov
56c09bf8a9 generate many blocks in convertToBlockImplNotFinal 2022-08-16 18:56:22 +02:00
Nikita Taranov
433657e978 rm prepareBlockAndFill 2022-08-16 18:56:22 +02:00
Nikita Taranov
63bc894a42 more parallelism 2022-08-16 18:56:22 +02:00
Nikita Taranov
f650b23ee3 generate many blocks 2022-08-16 18:56:22 +02:00
Nikita Taranov
db0110fd7a more accurate crutch 2022-08-16 18:56:22 +02:00
Nikita Taranov
e5e0a24ab3 return chunks from prepareBlockAndFillWithoutKey 2022-08-16 18:56:22 +02:00
Nikita Taranov
4e974661d6 refactor convertToBlockImpl 2022-08-16 18:56:22 +02:00
Nikita Taranov
bca28ba9f8 split prepareBlockAndFill 2022-08-16 18:56:22 +02:00
Alexey Milovidov
1a8ddf2956 Addition to prev. revision 2022-08-14 09:35:22 +02:00
alexX512
6bf29cb610 Change class LRUCache to class CacheBase. Check running CacheBase with default cache policy SLRU 2022-08-07 19:59:30 +00:00
Nikita Taranov
4943202921
Improve memory usage during memory efficient merging of aggregation results (#39429) 2022-08-03 17:56:59 +02:00
Amos Bird
53f47127e9
Fix only_merge header 2022-07-01 23:31:45 +08:00
Nikita Taranov
2487ba7f00
Move updateInputStream to ITransformingStep (#37393) 2022-06-27 13:16:52 +02:00
Azat Khuzhin
0d4f78639e Interpreters/Aggregator: cleaner interface for block release during merge
Suggested-by: @amosbird
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-16 09:58:36 +03:00
Azat Khuzhin
4694929623 Implement merging only for AggregatingStep
v2: fill AggregateColumnsConstData only for only_merge
    (fixes 01291_aggregation_in_order and some other tests)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-16 09:58:36 +03:00
Anton Popov
df6882d2b9
Revert "Fix errors of CheckTriviallyCopyableMove type" 2022-06-07 13:53:10 +02:00
HeenaBansal2009
b7eb6bbd38 Fixed clang-tidy-CheckTriviallyCopyableMove-errors 2022-05-30 11:09:03 -07:00
Dmitry Novik
dd1e7b55b8
Merge pull request #37050 from azat/fix-optimize_aggregation_in_order-prefix-Array
Fix optimize_aggregation_in_order with prefix GROUP BY and *Array aggregate functions
2022-05-16 17:17:56 +02:00
Azat Khuzhin
323ae98202 Fix optimize_aggregation_in_order with prefix GROUP BY and *Array aggregate functions
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Fixes: #35111
See-also: #37046
2022-05-09 21:32:40 +03:00
Azat Khuzhin
6ada8a6337 Fix optimize_aggregation_in_order with *Array aggregate functions
row_begin was wrong, and before this patch aggregator processing
{row_end, row_end} range, in other words, zero range.

Fixes: #9113 (cc @dimarub2000)
v2: add static_cast to fix UBSan
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-09 20:43:19 +03:00
mergify[bot]
93f5aa7488
Merge branch 'master' into aggregator-jit-lock-fix 2022-05-02 16:16:36 +00:00
Dmitry Novik
9be17ef50c
Merge pull request #35111 from azat/optimize_aggregation_in_order-prefix
Implement partial GROUP BY key for optimize_aggregation_in_order
2022-05-02 17:49:48 +02:00
Maksim Kita
c059629f8d Aggregator JIT compilation lock fix 2022-05-02 16:21:10 +02:00
Nikita Taranov
0fd9740c72
Log hash table's cache messages with TRACE level (#36830) 2022-05-01 12:54:54 +02:00
Robert Schulze
89aa9ae00f
Fixed clang-tidy check "bugprone-branch-clone"
The check is currently *not* part of .clang-tidy. It complains about:
(1) "switch has multiple consecutive identical branches"
(2) "repeated branch in conditional chain"

About (1): Lots of findings in switches were about redundant
"[[fallthrough]]" in places where the compiler would not warn anyways. I
have cleaned these up.

About (2): In if-else_if-else chains, fixing the warning would usually
mean concatenating multiple if-conditions. As this would reduce
readability in most cases, I did not fix these places.

Because of (2), I also refrained from adding "bugprone-branch-clone" to
.clang-tidy.
2022-04-30 19:40:28 +02:00
Azat Khuzhin
0ce44f3021 Optimize optimize_aggregation_in_order with a prefix key
Before it does lots of extra work, now, it will be significantly more
optimal (thousands of rows -> 1-2 million of rows).

v2: s/executeOnBlockSimple/executeOnBlockSmall/
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-04-29 06:58:28 +03:00
Azat Khuzhin
190ce217bb Disable GROUP BY statistics for optimize_aggregation_in_order
These statistics significantly decrease the performance of
optimize_aggregation_in_order with a prefix key.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-04-29 06:58:27 +03:00
Azat Khuzhin
767acd53fb Add ability to pass range of rows to Aggregator
v2: fix compiled aggregate functions (seek result to row_start)
v3: fix compiled aggregate functions (seek args to row_start)
v4: change signatures for JIT
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-04-29 06:57:55 +03:00
Robert Schulze
118e94523c
Activate clang-tidy warning "readability-container-contains"
This check suggests replacing <Container>.count() by
<Container>.contains(), which is more expressive and, in the case of
multimaps/multisets, also faster.
2022-04-18 23:53:11 +02:00
Nikita Taranov
30f2a942c5
Predict size of hash table for GROUP BY (#33439)
* use AggregationMethod ctor with reserve

* add new settings

* add HashTablesStatistics

* support queries with limit

* support distributed and with external aggregation

* add new profile events

* add some tests

* add perf test

* export cache stats through AsynchronousMetrics

* rm redundant trace

* fix style

* fix 02122_parallel_formatting test

* review fixes

* fix 02122_parallel_formatting test

* apply also to two-level HTs

* try simpler strategy

* increase max_size_to_preallocate_for_aggregation for experiment

* fixes

* Revert "increase max_size_to_preallocate_for_aggregation for experiment"

This reverts commit 6cf6f75704.

* fix test

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-03-30 22:47:51 +02:00
Azat Khuzhin
4fa2ae76bc Fix memory leak in AggregatingInOrderTransform
Reproducer:

    # NOTE: we need clickhouse from 33957 since right now LSan is broken due to getauxval().
    $ url=https://s3.amazonaws.com/clickhouse-builds/33957/e04b862673644d313712607a0078f5d1c48b5377/package_asan/clickhouse
    $ wget $url -O clickhouse-asan
    $ chmod +x clickhouse-asan
    $ ./clickhouse-asan server &

    $ ./clickhouse-asan client
    :) create table data (key Int, value String) engine=MergeTree() order by key
    :) insert into data select number%5, toString(number) from numbers(10e6)

    # usually one query is enough; the benchmark is just for stability of the results
    # note that if the exception did not come from AggregatingInOrderTransform, add --continue_on_errors and wait
    $ ./clickhouse-asan benchmark --query 'select key, uniqCombined64(value), groupArray(value) from data group by key' --optimize_aggregation_in_order=1 --memory_tracker_fault_probability=0.01, max_untracked_memory='2Mi'

LSan report:

    ==24595==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 3932160 byte(s) in 6 object(s) allocated from:
        0 0xcadba93 in realloc ()
        1 0xcc108d9 in Allocator<false, false>::realloc() obj-x86_64-linux-gnu/../src/Common/Allocator.h:134:30
        2 0xde19eae in void DB::PODArrayBase<>::realloc<DB::Arena*&>(unsigned long, DB::Arena*&) obj-x86_64-linux-gnu/../src/Common/PODArray.h:161:25
        3 0xde5f039 in void DB::PODArrayBase<>::reserveForNextSize<DB::Arena*&>(DB::Arena*&) obj-x86_64-linux-gnu/../src/Common/PODArray.h
        4 0xde5f039 in void DB::PODArray<>::push_back<>(DB::GroupArrayNodeString*&, DB::Arena*&) obj-x86_64-linux-gnu/../src/Common/PODArray.h:432:19
        5 0xde5f039 in DB::GroupArrayGeneralImpl<>::add() const obj-x86_64-linux-gnu/../src/AggregateFunctions/AggregateFunctionGroupArray.h:465:31
        6 0xde5f039 in DB::IAggregateFunctionHelper<>::addBatchSinglePlaceFromInterval() const obj-x86_64-linux-gnu/../src/AggregateFunctions/IAggregateFunction.h:481:53
        7 0x299df134 in DB::Aggregator::executeOnIntervalWithoutKeyImpl() obj-x86_64-linux-gnu/../src/Interpreters/Aggregator.cpp:869:31
        8 0x2ca75f7d in DB::AggregatingInOrderTransform::consume() obj-x86_64-linux-gnu/../src/Processors/Transforms/AggregatingInOrderTransform.cpp:124:13

    ...

    SUMMARY: AddressSanitizer: 4523184 byte(s) leaked in 12 allocation(s).

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-02-09 09:23:56 +03:00
Maksim Kita
5ef83deaa6 Update sort to pdqsort 2022-01-30 19:49:48 +00:00
lgbo-ustc
59cbd76880 Add LRUResourceCache
1. add LRUResourceCache for managing resource cache in lru policy
2. rollback LRUCache to the original version
3. add remove() in LRUCache
4. add unit tests for LRUResourceCache and LRUCache
2021-12-29 15:25:33 +08:00
lgbo-ustc
ef1d7142f5 remove getOrTrySet 2021-12-27 16:12:39 +08:00
Anton Popov
54f51444c0 Merge remote-tracking branch 'upstream/master' into HEAD 2021-12-01 15:49:02 +03:00