Commit Graph

744 Commits

Author SHA1 Message Date
Igor Nikonov
7cd12393c2 If sorting type is specified then use it. Otherwise rely on sort description 2022-07-14 16:26:25 +00:00
Igor Nikonov
1d49adad20 Introduce Auto mode for sorting step (replace others for now) 2022-07-14 13:29:39 +00:00
Igor Nikonov
b73aca2a3b
Merge branch 'master' into skipping_sorting_step 2022-07-13 19:06:41 +02:00
Igor Nikonov
159c9428bd clean up 2022-07-13 17:05:54 +00:00
Igor Nikonov
f0d547993a Fix: 01655_plan_optimizations_optimize_read_in_window_order_long
basically I returned code. Both plans with Finish sorting, need to check
sorting prefix
2022-07-13 16:51:51 +00:00
vdimir
4124dc9ac4
Rewrite tryPushDownFilter for join with lambda 2022-07-13 12:06:29 +00:00
vdimir
549a85fee9
Throw logical error on child idx mismatch in tryAddNewFilterStep 2022-07-13 11:53:46 +00:00
vdimir
bddf6c1b32
Pushdown filter to the right side of sorting join 2022-07-13 11:36:25 +00:00
Igor Nikonov
1d6f699a12 Use sort mode Port for reading in order 2022-07-12 21:56:00 +00:00
Amos Bird
982e1a73d3
Better 2022-07-12 22:21:46 +08:00
Amos Bird
d3709c6c26
Avoid redundant join block transformation. 2022-07-12 22:20:10 +08:00
Amos Bird
b9d9ca5194
style fix 2022-07-12 22:20:08 +08:00
Dmitry Novik
aabf5123d6 Fixup 2022-07-12 13:46:06 +00:00
Igor Nikonov
2c8d9080bd Fix: consider collation in column sort description comparison 2022-07-12 13:14:10 +00:00
Dmitry Novik
cfca3db884 Fix crash with totals 2022-07-12 12:15:43 +00:00
Igor Nikonov
ea5e7793b2 Fix: self-review comments 2022-07-11 21:26:39 +00:00
Igor Nikonov
e0776b1c82 Fix: test for optimize read in window order
+ code polishing
2022-07-11 20:59:38 +00:00
Igor Nikonov
0ca8166ab2 Fix: forgot to return sorting type in constructors 2022-07-11 20:59:38 +00:00
Igor Nikonov
47bed7e318 Try to choose sorting transform based on sort description with fallback 2022-07-11 20:59:38 +00:00
Igor Nikonov
2a7e3bd741 Fix + SortMode::None as default value 2022-07-11 20:59:38 +00:00
Igor Nikonov
16d2319a8d SortingStep: type of sorting is deduced based on input stream sorting description during transformation
+ perf test
2022-07-11 20:59:38 +00:00
Igor Nikonov
7d4d92bd61 In case full sort was the wrong choice during plan interpretation 2022-07-11 20:59:38 +00:00
Igor Nikonov
67ce421e38 Skip sorting step if input stream is globally sorted 2022-07-11 20:59:38 +00:00
Dmitry Novik
d1df66687b
Merge branch 'master' into group-by-use-nulls 2022-07-07 20:54:38 +02:00
Dmitry Novik
1587385f7a Cleanup code 2022-07-07 18:53:20 +00:00
vdimir
7c586a9e7c
Minor updates for full sorting merge join 2022-07-06 14:28:05 +00:00
vdimir
1b429fc1af
wip: any left/right sorting join 2022-07-06 14:23:46 +00:00
vdimir
8dce97123c
wip: any inner full sorting join 2022-07-06 14:23:46 +00:00
vdimir
4a16195964
Calculate output header for full sorting merge join 2022-07-06 14:23:45 +00:00
vdimir
fa8eb35599
Pipeline for full sorting merge join 2022-07-06 14:23:44 +00:00
Maksim Kita
b94489d52c
Merge pull request #38859 from kitaisreal/merge-tree-merge-disable-batch-optimization
MergeTree merge disable batch optimization
2022-07-06 15:59:40 +02:00
Maksim Kita
bdc21737d5 MergeTree merge disable batch optimization 2022-07-05 16:15:00 +02:00
Igor Nikonov
9ef8ff5a31 Addressing review comments 2022-07-01 22:50:00 +00:00
Anton Popov
ef87e1207c better support of read_in_order in case of fixed prefix of sorting key 2022-07-01 16:45:01 +00:00
Dmitry Novik
81dd90893e Merge remote-tracking branch 'origin/master' into group-by-use-nulls 2022-07-01 16:24:05 +00:00
Dmitry Novik
33f601ec0a Commit support use_nulls for GS 2022-06-30 15:14:26 +00:00
Igor Nikonov
488ee75fc4 + use DistinctSorted for final distinct step
+ fix performance tests
2022-06-30 13:03:39 +00:00
Dmitry Novik
98e9bc84d5 Refactor ROLLUP and CUBE 2022-06-30 10:13:58 +00:00
Igor Nikonov
d435532c68 Adapt range search algorithm to high cardinality case
+ range search done in steps of some number of rows.
  Controlled by the new
  setting `distinct_in_order_range_search_step`. By default 0, i.e.
  whole chunk
+ before starting binary search, linear probing is done on each step (32
  rows currently)
2022-06-29 23:30:35 +00:00
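Illustrative sketch for commit d435532c68 above, not the ClickHouse implementation: it shows one way a range search over sorted keys can be done in steps, with a short linear probe before each binary search. The names `find_range_end`, `search_step` and `LINEAR_PROBE_ROWS` are hypothetical, and the exact way steps and probing interact here is an assumption based only on the commit message.

```python
from bisect import bisect_right

LINEAR_PROBE_ROWS = 32  # probe length mentioned in the commit message


def find_range_end(keys, begin, end, search_step=0):
    """Return the first index in [begin, end) whose key differs from keys[begin].

    keys must be sorted ascending on [begin, end). search_step == 0 means the
    whole remaining chunk is searched in a single step.
    """
    if begin >= end:
        return end
    target = keys[begin]
    step = search_step if search_step > 0 else end - begin
    lo = begin
    while lo < end:
        hi = min(lo + step, end)
        # Linear probing first: cheap when ranges are short (high cardinality).
        probe_end = min(lo + LINEAR_PROBE_ROWS, hi)
        for i in range(lo, probe_end):
            if keys[i] != target:
                return i
        # Binary search over the rest of this step.
        pos = bisect_right(keys, target, probe_end, hi)
        if pos < hi:
            return pos
        lo = hi  # the whole step equals target, move to the next step
    return end
```

With many short ranges the linear probe usually finds the boundary without any binary search; with long ranges the step size bounds how far each binary search looks.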
mergify[bot]
36139eacd7
Merge branch 'master' into dictinct_in_order_optimization 2022-06-29 13:37:16 +00:00
Nikita Taranov
f5d26572df
Quick fix for aggregation pipeline (#38295) 2022-06-29 01:16:30 +02:00
Igor Nikonov
4a00e33e6b Fixes for some review comments 2022-06-28 21:42:46 +00:00
mergify[bot]
a9c1b68034
Merge branch 'master' into dictinct_in_order_optimization 2022-06-27 20:16:00 +00:00
Dmitry Novik
1d15d72211 Support NULLs in ROLLUP 2022-06-27 18:42:26 +00:00
Nikita Taranov
2487ba7f00
Move updateInputStream to ITransformingStep (#37393) 2022-06-27 13:16:52 +02:00
Igor Nikonov
68927dd60c Adapt distinct for sorted chunks to handle sorted stream, so we can use
it for final distinct as well
2022-06-26 14:52:36 +00:00
Igor Nikonov
04ce070da0 Remove unnecessary include 2022-06-24 23:11:52 +00:00
Igor Nikonov
d5c6f5c18f Fixes
+ flaky test with explain pipeline
+ consider sort direction from read order info in sort description
  (ReadFromMergeTree step)
2022-06-24 22:49:27 +00:00
mergify[bot]
b5d3fd50d2
Merge branch 'master' into dictinct_in_order_optimization 2022-06-23 09:48:38 +00:00
Igor Nikonov
944c247345 DISTINCT in order optimization
+ try use the optimization for final distinct in case of sorted stream
  (sorting inside and among chunks)
+ sorting description contains only columns from sorting key which are in
  header as well
2022-06-23 09:47:22 +00:00
Nikita Taranov
41ba0118b5
Bring back #36396 (#38110)
* Revert "Revert "More parallel execution for queries with `FINAL` (#36396)""

This reverts commit 5bfb15262c.

* fix tests

* fix review suggestions

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-06-22 15:05:07 +02:00
mergify[bot]
f45b4f56d8
Merge branch 'master' into dictinct_in_order_optimization 2022-06-21 21:25:37 +00:00
Igor Nikonov
b0a98bd875 DISTINCT in order optimization
+ use SortDescription from input data stream in DistinctStep to decide if the optimization is applicable
2022-06-21 21:23:49 +00:00
Nikolai Kochetov
b8d27aa8dd
Merge pull request #37469 from azat/projections-optimize_aggregation_in_order
Implement in order aggregation (optimize_aggregation_in_order) for projections for tables with fully materialized projections
2022-06-21 12:17:35 +02:00
Igor Nikonov
6ac68e8303 DISTINCT in order optimization
+ optimization for DISTINCT containing primary key columns
2022-06-20 10:06:15 +00:00
Vladimir Chebotarev
aef6fe6008 Rebase fix. 2022-06-20 05:15:08 +03:00
Vladimir Chebotarev
92a553fb77 Build fix. 2022-06-20 05:15:08 +03:00
Vladimir Chebotarev
6a363b7429 Build fix. 2022-06-20 05:15:08 +03:00
Vladimir Chebotarev
d41c97ea1d Review fixes. 2022-06-20 05:15:08 +03:00
Vladimir Chebotarev
4f38e01343 Unused code. 2022-06-20 05:15:08 +03:00
Vladimir Chebotarev
cc45f15eae Build fix. 2022-06-20 05:15:08 +03:00
Vladimir Chebotarev
3c2a63b87a Fix test. 2022-06-20 05:15:07 +03:00
Vladimir Chebotarev
e50210969f Style. 2022-06-20 05:15:07 +03:00
Vladimir Chebotarev
7f9557f8a3 Added optimize_read_in_window_order setting. 2022-06-20 05:15:07 +03:00
Vladimir Chebotarev
ec22f6d539 Draft. 2022-06-20 05:15:07 +03:00
Azat Khuzhin
4694929623 Implement merging only for AggregatingStep
v2: fill AggregateColumnsConstData only for only_merge
    (fixes 01291_aggregation_in_order and some other tests)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-16 09:58:36 +03:00
Azat Khuzhin
3559e35b70 AggregatingStep: remove unused forward decl
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-16 09:58:36 +03:00
wangdh15
02cce40b3a When compiling with clang12, the unused field shard_count causes a compile error, so delete it. 2022-06-16 10:43:31 +08:00
Alexander Tokmakov
5bfb15262c Revert "More parallel execution for queries with FINAL (#36396)"
This reverts commit c8afeafe0e.
2022-06-15 17:25:38 +03:00
Nikita Taranov
c8afeafe0e
More parallel execution for queries with FINAL (#36396) 2022-06-15 12:44:20 +02:00
Alexey Milovidov
ab9fc572d5
Merge pull request #37667 from ClickHouse/group-by-enum-fix
Support types with non-standard defaults in ROLLUP, CUBE, GROUPING SETS
2022-06-15 05:14:33 +03:00
Yakov Olkhovskiy
11e6b37ea6 preserve filling step position 2022-06-09 13:35:55 -04:00
mergify[bot]
2d01abf871
Merge branch 'master' into revert-37647-Fix-all-CheckTriviallyCopyableMove-Errors 2022-06-07 13:32:30 +00:00
Igor Nikonov
dcad154105
Merge pull request #37866 from ClickHouse/igor_minor_cleanup
Minor cleanup
2022-06-07 15:24:56 +02:00
Anton Popov
df6882d2b9
Revert "Fix errors of CheckTriviallyCopyableMove type" 2022-06-07 13:53:10 +02:00
Robert Schulze
2d87af2a15
Merge pull request #37647 from DevTeamBK/Fix-all-CheckTriviallyCopyableMove-Errors
Fix errors of CheckTriviallyCopyableMove type
2022-06-05 19:58:47 +02:00
Igor Nikonov
13149dc094 Minor cleanup 2022-06-05 14:31:07 +00:00
HeenaBansal2009
4cb561b070 Fix new warning from BuilderBinTidy 2022-06-03 11:47:36 -07:00
Nikolai Kochetov
468c04ee66 Fix test. 2022-06-02 21:29:29 +00:00
Nikolai Kochetov
176af473c3 Fix build. 2022-06-02 19:38:47 +00:00
Nikolai Kochetov
8991f39412 Merge branch 'master' into refactor-read-metrics-and-callbacks 2022-06-02 17:00:08 +00:00
Nikolai Kochetov
00395e752e Cleanup 2022-06-02 16:59:14 +00:00
HeenaBansal2009
e3080f2a97 Merge remote-tracking branch 'origin' into Fix-all-CheckTriviallyCopyableMove-Errors 2022-06-02 07:30:08 -07:00
Nikita Mikhaylov
d34e051c69
Support for simultaneous read from local and remote parallel replica (#37204) 2022-06-02 11:46:33 +02:00
Nikolai Kochetov
edac3d6714 Merge branch 'master' into refactor-read-metrics-and-callbacks 2022-06-02 09:36:20 +00:00
Anton Popov
6cf9405f09 fix optimize_monotonous_functions_in_order_by in distributed queries 2022-06-01 00:50:28 +00:00
Nikolai Kochetov
86fbb74703 Merge branch 'master' into refactor-read-metrics-and-callbacks 2022-05-31 18:07:47 +00:00
Nikolai Kochetov
147a819221 Refactor a little bit more. 2022-05-31 14:43:38 +00:00
Dmitry Novik
0e63583b8f Support types with non-standard defaults in ROLLUP, CUBE, GROUPING SETS 2022-05-31 00:11:10 +00:00
Nikolai Kochetov
77b07dd0a8
Merge pull request #37163 from ClickHouse/grouping-function
Add GROUPING function
2022-05-30 20:45:04 +02:00
HeenaBansal2009
b7eb6bbd38 Fixed clang-tidy-CheckTriviallyCopyableMove-errors 2022-05-30 11:09:03 -07:00
Nikolai Kochetov
5ef51ed27b Fix more tests. 2022-05-30 13:10:30 +00:00
Nikolai Kochetov
b80b1940ce Fix some tests. 2022-05-27 20:47:35 +00:00
Nikolai Kochetov
1b85f2c1d6 Merge branch 'master' into refactor-read-metrics-and-callbacks 2022-05-25 16:27:40 +02:00
Nikolai Kochetov
3d84aae0ab Better. 2022-05-24 20:06:08 +00:00
Amos Bird
76ddb39d02
refactor format 2022-05-24 12:09:00 +08:00
Amos Bird
983e52cd3f
Aggressive filter pushdown for join 2022-05-24 12:08:42 +08:00
Nikolai Kochetov
fd97a9d885 Move some resources 2022-05-23 19:47:32 +00:00
Nikolai Kochetov
9756b759c6 Move some resources 2022-05-23 13:46:57 +00:00
Nikolai Kochetov
56feef01e7 Move some resources 2022-05-20 19:49:31 +00:00
Dmitry Novik
b3ccf96c81 Merge remote-tracking branch 'origin/master' into grouping-function 2022-05-19 17:58:33 +00:00
Dmitry Novik
d4c66f4a48 Code cleanup & fix GROUPING() with TOTALS 2022-05-19 16:36:51 +00:00
Azat Khuzhin
dea1706d4e
Fix GROUP BY AggregateFunction (#37093)
* Fix GROUP BY AggregateFunction

finalizeChunk() was unconditionally converting AggregateFunction to the
underlying type, however this should be done only if the aggregate was
applied.

So pass names of aggregates as an argument to the finalizeChunk()

Fuzzer report [1]:

    Logical error: 'Bad cast from type DB::ColumnArray to DB::ColumnAggregateFunction'. Received signal 6 Received signal Aborted (6)

For the following query:

    SELECT
        arraySort(groupArrayArray(grp_simple)),
        grp_aggreg,
        arraySort(groupArrayArray(grp_simple)),
        b,
        arraySort(groupArrayArray(grp_simple)) AS grs
    FROM data_02294
    GROUP BY
        a,
        grp_aggreg,
        b
    SETTINGS optimize_aggregation_in_order = 1

  [1]: https://s3.amazonaws.com/clickhouse-test-reports/37050/323ae98202d80fc4b311be1e7308ef2ac39e6063/fuzzer_astfuzzerdebug,actions//fuzzer.log

v2: fix conflicts in src/Interpreters/InterpreterSelectQuery.cpp
v3: Fix header for GROUP BY AggregateFunction WITH TOTALS
v4: Add sanity check into finalizeBlock()
v5: Use typeid_cast<&> to get more sensible error in case of bad cast (as suggested by @nickitat)
v6: Fix positions passed to finalizeChunk()
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Core/ColumnNumbers.h: remove unused <string>

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Optimize finalizeChunk()/finalizeBlock()

v2: s/ByPosition/Mask/ s/by_position/mask/
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-18 23:37:43 +02:00
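Illustrative sketch for commit dea1706d4e above, not the actual `finalizeChunk()` code: it models the described fix, where only columns produced by the applied aggregates are converted to their final values, while a column that merely carries aggregate states as a GROUP BY key is left untouched. `SumState` and `finalize_chunk` are hypothetical stand-ins.

```python
class SumState:
    """Hypothetical stand-in for an intermediate aggregate function state."""

    def __init__(self, total):
        self.total = total

    def finalize(self):
        return self.total


def finalize_chunk(columns, aggregate_names):
    """Finalize only the columns listed in aggregate_names.

    columns maps a column name to a list of values. Columns that are not
    listed keep their values as-is, even if those values are states.
    """
    names = set(aggregate_names)
    return {
        name: [value.finalize() for value in values] if name in names else values
        for name, values in columns.items()
    }
```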
Dmitry Novik
e5b395e054 Support ROLLUP and CUBE in GROUPING function 2022-05-16 17:33:38 +00:00
Robert Schulze
d66dcdad79
Fix new occurrences of new clang-tidy warnings 2022-05-16 11:31:36 +02:00
Dmitry Novik
6fc7dfea80 Support ordinary GROUP BY 2022-05-13 23:04:12 +00:00
Nikolai Kochetov
0a715b26db Move some resources. 2022-05-13 20:02:28 +00:00
Dmitry Novik
ae81268d4d Try to compute helper column lazy 2022-05-13 14:55:50 +00:00
Dmitry Novik
c5b40a9c91 WIP on GROUPING function 2022-05-12 16:40:26 +00:00
Maksim Kita
437d70d4da Fixed tests 2022-05-11 21:59:51 +02:00
Maksim Kita
75555c436b Fix usage of min_count_to_compile_sort_description setting 2022-05-11 21:59:51 +02:00
Maksim Kita
ea8ce3140a Fixed tests 2022-05-11 21:59:51 +02:00
Maksim Kita
4e7d10297b Fixed style 2022-05-11 21:59:51 +02:00
Maksim Kita
cbfb773b50 Fixed tests 2022-05-11 21:59:51 +02:00
Maksim Kita
8ceb63ee6c Added JIT compilation of SortDescription 2022-05-11 21:59:51 +02:00
Nikolai Kochetov
2d99f0ce13 Simplify code a little bit. 2022-05-11 12:16:15 +00:00
Nikolai Kochetov
4b8a2e2d80 Fix fuzzed queries. 2022-05-11 10:22:34 +00:00
Nikolai Kochetov
b6075031d8 Delete GroupingSetsTransform. 2022-05-10 17:54:36 +00:00
Nikolai Kochetov
f7dbd48ee5 Simplify code a little bit. 2022-05-10 16:12:03 +00:00
Nikolai Kochetov
a02e1d2f4a Simplify code a little bit. 2022-05-10 16:00:00 +00:00
mergify[bot]
55a6d22ad3
Merge branch 'master' into grouping-sets-fix 2022-05-09 14:02:10 +00:00
Alexey Milovidov
6216c1827f
Merge pull request #37020 from ucasfl/remove-code
remove useless code
2022-05-09 00:00:07 +03:00
fenglv
2cd0f2aaed remove useless code 2022-05-08 16:50:13 +00:00
Vladimir C
bd5fab97d9
Merge pull request #36415 from bigo-sg/concurrent_join 2022-05-06 17:11:10 +02:00
Dmitry Novik
9a251e0028 Cleanup code 2022-05-05 18:13:00 +00:00
Dmitry Novik
4cc26aa38b Merge remote-tracking branch 'origin/master' into grouping-sets-fix
And fix execution of the query with only one grouping set
2022-05-05 17:14:52 +00:00
Dmitry Novik
161f52292b Support distributed queries 2022-05-05 13:56:16 +00:00
Dmitry Novik
9be17ef50c
Merge pull request #35111 from azat/optimize_aggregation_in_order-prefix
Implement partial GROUP BY key for optimize_aggregation_in_order
2022-05-02 17:49:48 +02:00
Azat Khuzhin
190ce217bb Disable GROUP BY statistics for optimize_aggregation_in_order
These statistics significantly decrease the performance of
optimize_aggregation_in_order with a prefix key.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-04-29 06:58:27 +03:00
Azat Khuzhin
3931dbd848 Implement partial GROUP BY key for optimize_aggregation_in_order
Suppose you have a table with lots of rows, like:

    create table data_02233 (parent_key Int, child_key Int, value Int) engine=MergeTree() order by parent_key

And you want to do GROUP BY (parent_key, child_key) with optimize_aggregation_in_order:

    select parent_key, child_key, count() from data_02233 group by parent_key, child_key with totals order by parent_key, child_key

Right now, it is not possible, because optimize_aggregation_in_order
supports only w/o key aggregation, i.e. GROUP BY cannot be done inside
unique parent_key region.

v2: rebase on top SortDescriptionWithPositions
v3: disable two-level aggregation
v4: fix merging of aggregates
v5: improve tests coverage (add a test with multiple parts, to add merge processor)
v6: add a test for compiled aggregate functions (sum()) explicitly
v7: add missing sortBlock()
v8: remove group_by_description_optimized
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-04-29 06:58:07 +03:00
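Illustrative sketch for commit 3931dbd848 above, not the ClickHouse implementation: when rows arrive sorted by `parent_key` only, a GROUP BY on (`parent_key`, `child_key`) can still be processed one `parent_key` run at a time, with a small per-run table keyed by `child_key`. The function name and the tuple layout are hypothetical.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter


def count_by_parent_and_child(rows):
    """Yield (parent_key, child_key, count) for rows sorted by parent_key.

    rows is an iterable of (parent_key, child_key, value) tuples. Only one
    run of equal parent_key values is held in memory at a time.
    """
    for parent_key, run in groupby(rows, key=itemgetter(0)):
        counts = Counter(child_key for _, child_key, _ in run)
        for child_key in sorted(counts):
            yield parent_key, child_key, counts[child_key]
```

Because each run is finished before the next one starts, results for a `parent_key` can be emitted as soon as its run ends instead of after the whole input is read.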
Amos Bird
4a5e4274f0
base should not depend on Common 2022-04-29 10:26:35 +08:00
fenglv
1b84d59047 fix typo
modify comment
2022-04-27 12:24:49 +00:00
lgbo-ustc
5738871a8b update QueryPipelineBuilder::joinPipelines 2022-04-27 10:24:19 +08:00
lgbo-ustc
520b05b9f1 update test case tests/queries/0_stateless/02236_explain_pipeline_join.sql 2022-04-27 10:08:22 +08:00
lgbo-ustc
6cb7b7888f update test case 02236_explain_pipeline_join 2022-04-26 19:07:07 +08:00
lgbo-ustc
0b0fa8453b Fixed bug: resize on the left pipeline makes the ORDER BY result wrong 2022-04-26 18:06:16 +08:00
lgbo-ustc
74ccc233d2 Merge remote-tracking branch 'ck/master' into concurrent_join 2022-04-26 09:21:02 +08:00
Nikita Taranov
5dc9478bac fix SortingStep::updateOutputStream() 2022-04-25 17:29:14 +00:00
lgbo-ustc
981d560553 Merge remote-tracking branch 'ck/master' into concurrent_join 2022-04-25 13:00:04 +08:00
Amos Bird
a25bb50096
Refactor many exception messages
1. Always use fmt variant
2. Remove redundant period at the end of message
3. Remove useless parenthesis
2022-04-24 19:44:00 +08:00
Nikita Mikhaylov
224f4dc620
Made parallel_reading_from_replicas work with localhost replica (#36281) 2022-04-22 15:52:38 +02:00
Dmitry Novik
94a381e522 Add test for parallel GROUPING SETS processing 2022-04-21 17:17:55 +00:00
Dmitry Novik
b5e2b38529 Optimize pipeline in case of single input 2022-04-21 14:11:58 +00:00
Dmitry Novik
9cefb62341 Cleanup 2022-04-21 10:43:11 +00:00
Dmitry Novik
61deae7105
Merge branch 'master' into grouping-sets-fix 2022-04-21 03:34:42 +02:00
Dmitry Novik
6e73cd6929 Implement parallel grouping sets processing 2022-04-21 01:18:40 +00:00
lgbo-ustc
3d7338581b Improve join
Adding joined blocks from the right table can now run in parallel, which speeds up the join process
2022-04-19 16:07:30 +08:00
Robert Schulze
118e94523c
Activate clang-tidy warning "readability-container-contains"
This check suggests replacing <Container>.count() by
<Container>.contains(), which is more expressive and, in the case of
multimaps/multisets, also faster.
2022-04-18 23:53:11 +02:00
Dmitry Novik
a16710c750 Merge remote-tracking branch 'origin/master' into grouping-sets-fix 2022-04-14 17:29:51 +00:00
Nikolai Kochetov
362fcfd2b8
Merge pull request #36075 from ClickHouse/fix-limit-push-down-over-window
Disable LIMIT push down through WINDOW functions.
2022-04-13 11:57:37 +02:00