Commit Graph

718 Commits

Author SHA1 Message Date
Nikolai Kochetov
56feef01e7 Move some resources 2022-05-20 19:49:31 +00:00
Dmitry Novik
b3ccf96c81 Merge remote-tracking branch 'origin/master' into grouping-function 2022-05-19 17:58:33 +00:00
Dmitry Novik
d4c66f4a48 Code cleanup & fix GROUPING() with TOTALS 2022-05-19 16:36:51 +00:00
Dmitry Novik
86d48e1c99 Disable WITH ROLLUP/CUBE for GROUPING SETS 2022-05-19 14:10:04 +00:00
Azat Khuzhin
e32b695775 Fix projections with GROUP/ORDER BY in query and optimize_aggregation_in_order
With projections, GROUP BY/ORDER BY in query, optimize_aggregation_in_order,
GROUP BY's InputOrderInfo was used incorrectly for ORDER BY.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-19 13:11:22 +03:00
Azat Khuzhin
dea1706d4e
Fix GROUP BY AggregateFunction (#37093)
* Fix GROUP BY AggregateFunction

finalizeChunk() was unconditionally converting AggregateFunction to the
underlying type, however this should be done only if the aggregate was
applied.

So pass names of aggregates as an argument to the finalizeChunk()

Fuzzer report [1]:

    Logical error: 'Bad cast from type DB::ColumnArray to DB::ColumnAggregateFunction'. Received signal 6 Received signal Aborted (6)

For the following query:

    SELECT
        arraySort(groupArrayArray(grp_simple)),
        grp_aggreg,
        arraySort(groupArrayArray(grp_simple)),
        b,
        arraySort(groupArrayArray(grp_simple)) AS grs
    FROM data_02294
    GROUP BY
        a,
        grp_aggreg,
        b
    SETTINGS optimize_aggregation_in_order = 1

  [1]: https://s3.amazonaws.com/clickhouse-test-reports/37050/323ae98202d80fc4b311be1e7308ef2ac39e6063/fuzzer_astfuzzerdebug,actions//fuzzer.log

v2: fix conflicts in src/Interpreters/InterpreterSelectQuery.cpp
v3: Fix header for GROUP BY AggregateFunction WITH TOTALS
v4: Add sanity check into finalizeBlock()
v5: Use typeid_cast<&> to get more sensible error in case of bad cast (as suggested by @nickitat)
v6: Fix positions passed to finalizeChunk()
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Core/ColumnNumbers.h: remove unused <string>

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Optimize finalizeChunk()/finalizeBlock()

v2: s/ByPosition/Mask/ s/by_position/mask/
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-18 23:37:43 +02:00
Dmitry Novik
6fc7dfea80 Support ordinary GROUP BY 2022-05-13 23:04:12 +00:00
Nikolai Kochetov
0a715b26db Move some resources. 2022-05-13 20:02:28 +00:00
Dmitry Novik
ae81268d4d Try to compute helper column lazy 2022-05-13 14:55:50 +00:00
Dmitry Novik
c5b40a9c91 WIP on GROUPING function 2022-05-12 16:40:26 +00:00
Maksim Kita
4e7d10297b Fixed style 2022-05-11 21:59:51 +02:00
Maksim Kita
8ceb63ee6c Added JIT compilation of SortDescription 2022-05-11 21:59:51 +02:00
Nikolai Kochetov
ec34761d9f
Merge pull request #33631 from ClickHouse/grouping-sets-fix
Support GROUPING SETS
2022-05-11 21:28:46 +02:00
Nikolai Kochetov
2d99f0ce13 Simplify code a little bit. 2022-05-11 12:16:15 +00:00
Nikolai Kochetov
a02e1d2f4a Simplify code a little bit. 2022-05-10 16:00:00 +00:00
vdimir
00cd21cacf
Option to force cross_to_inner_join_rewrite 2022-05-10 15:12:17 +00:00
mergify[bot]
55a6d22ad3
Merge branch 'master' into grouping-sets-fix 2022-05-09 14:02:10 +00:00
Nikolai Kochetov
ebfeca2c86 Try to move some code from SourceWithProgress to PipelineExecutor. 2022-05-09 10:28:05 +00:00
Vladimir C
bd5fab97d9
Merge pull request #36415 from bigo-sg/concurrent_join 2022-05-06 17:11:10 +02:00
Dmitry Novik
b4e1dbc3c8 Remove whitespaces 2022-05-05 18:35:41 +00:00
Dmitry Novik
4cc26aa38b Merge remote-tracking branch 'origin/master' into grouping-sets-fix
And fix execution of the query with only one grouping set
2022-05-05 17:14:52 +00:00
Dmitry Novik
161f52292b Support distributed queries 2022-05-05 13:56:16 +00:00
Amos Bird
4a5e4274f0
base should not depend on Common 2022-04-29 10:26:35 +08:00
zhanglistar
7d798b3655
Count distinct optimization by using subquery of group by (#35993) 2022-04-28 14:55:37 +02:00
lgbo-ustc
520b05b9f1 update test case tests/queries/0_stateless/02236_explain_pipeline_join.sql 2022-04-27 10:08:22 +08:00
lgbo-ustc
20fc676bff fixed bug: resize on left pipeline cause the order by result wrong 2022-04-26 18:27:42 +08:00
lgbo-ustc
d96c29810a fixed bug: resize on left pipeline cause the order by result wrong 2022-04-26 18:12:14 +08:00
lgbo-ustc
0b0fa8453b fixed bug: resize on left pipeline cause the order by result wrong 2022-04-26 18:06:16 +08:00
lgbo-ustc
981d560553 Merge remote-tracking branch 'ck/master' into concurrent_join 2022-04-25 13:00:04 +08:00
Nikita Mikhaylov
224f4dc620
Made parallel_reading_from_replicas work with localhost replica (#36281) 2022-04-22 15:52:38 +02:00
Dmitry Novik
6ee62007d3 Handle GROUPING SETS with size < 2 2022-04-21 23:52:43 +00:00
Dmitry Novik
b5e2b38529 Optimize pipeline in case of single input 2022-04-21 14:11:58 +00:00
Dmitry Novik
df63a9b205 Depricate TOTALS with GROUPING SETS 2022-04-21 13:41:59 +00:00
Dmitry Novik
77a82cc090
Merge pull request #35631 from amosbird/projection-fix1
Fix broken SET reuse during projection analysis.
2022-04-21 15:32:52 +02:00
Dmitry Novik
61deae7105
Merge branch 'master' into grouping-sets-fix 2022-04-21 03:34:42 +02:00
Dmitry Novik
6e73cd6929 Implement parallel grouping sets processing 2022-04-21 01:18:40 +00:00
lgbo-ustc
3d7338581b Improve join
now adding joined blocks from right table can be run parallelly, speedup the join process
2022-04-19 16:07:30 +08:00
Robert Schulze
118e94523c
Activate clang-tidy warning "readability-container-contains"
This check suggests replacing <Container>.count() by
<Container>.contains() which is more speaking and in case of
multimaps/multisets also faster.
2022-04-18 23:53:11 +02:00
Dmitry Novik
a16710c750 Merge remote-tracking branch 'origin/master' into grouping-sets-fix 2022-04-14 17:29:51 +00:00
mergify[bot]
0b3c15c07a
Merge branch 'master' into projection-fix1 2022-04-12 13:49:28 +00:00
Yakov Olkhovskiy
155a2a0d42
Merge pull request #35349 from yakov-olkhovskiy/interpolate-feature
Interpolate feature
2022-04-11 11:15:50 -04:00
Yakov Olkhovskiy
64dcddc6e3 fixed ASTInterpolateElement::clone, fixed QueryNormalizer to exclude ASTInterpolateElement::children 2022-04-07 17:41:05 -04:00
Amos Bird
1238babb6f
Make SelectQueryInfo pseudo-copyable 2022-04-07 17:46:50 +08:00
Amos Bird
69022850b9
Better 2022-04-07 17:46:50 +08:00
Amos Bird
9cf5935604
Fix broken SET reuse during projection analysis. 2022-04-07 17:46:49 +08:00
Yakov Olkhovskiy
7dbe8bc2dc major bugs fixed, tests added, docs updated 2022-04-07 01:21:24 -04:00
Yakov Olkhovskiy
90c4cd3de7
Merge branch 'master' into interpolate-feature 2022-04-05 14:39:07 -04:00
Yakov Olkhovskiy
e0d6033c39 all columns can participate in interpolate expression despite if they are selected or not, some optimization on expressionless INTERPOLATE 2022-04-05 14:26:49 -04:00
Alexander Tokmakov
da00beaf7f Merge branch 'master' into mvcc_prototype 2022-04-05 11:14:42 +02:00
Nikita Taranov
bd89fcafdb
Make SortDescription::column_name always non-empty (#35805) 2022-04-04 14:17:15 +02:00
Yakov Olkhovskiy
ff4d295265 style fix 2022-04-03 22:19:35 -04:00
Yakov Olkhovskiy
95ad1bf6e1 use aliases if exist for original_select_set 2022-04-03 22:10:36 -04:00
Yakov Olkhovskiy
ec0ad8804a style fix 2022-04-01 21:45:58 -04:00
Yakov Olkhovskiy
0116233d36 allow INTERPOLATE to reference optimized out columns 2022-04-01 16:18:19 -04:00
Yakov Olkhovskiy
a15996315e bugfix - columns order tracking 2022-03-31 11:51:13 -04:00
Yakov Olkhovskiy
b5682c1f02 minor refactoring 2022-03-31 08:33:50 -04:00
Alexander Tokmakov
5a50ad9de3 Merge branch 'master' into mvcc_prototype 2022-03-31 11:35:04 +02:00
Nikita Taranov
30f2a942c5
Predict size of hash table for GROUP BY (#33439)
* use AggregationMethod ctor with reserve

* add new settings

* add HashTablesStatistics

* support queries with limit

* support distributed and with external aggregation

* add new profile events

* add some tests

* add perf test

* export cache stats through AsynchronousMetrics

* rm redundant trace

* fix style

* fix 02122_parallel_formatting test

* review fixes

* fix 02122_parallel_formatting test

* apply also to two-level HTs

* try simpler strategy

* increase max_size_to_preallocate_for_aggregation for experiment

* fixes

* Revert "increase max_size_to_preallocate_for_aggregation for experiment"

This reverts commit 6cf6f75704.

* fix test

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-03-30 22:47:51 +02:00
Yakov Olkhovskiy
6a1e116c46 refactoring 2022-03-30 16:34:19 -04:00
Yakov Olkhovskiy
615efa1381 aliases processing fixed 2022-03-28 19:15:53 -04:00
Alexander Tokmakov
208b242188 Merge branch 'master' into mvcc_prototype 2022-03-28 19:58:06 +02:00
Yakov Olkhovskiy
5a4694f340 major refactoring, simplified, optimized, bugs fixed 2022-03-27 14:32:09 -04:00
Yakov Olkhovskiy
adefcfd299
Merge branch 'master' into interpolate-feature 2022-03-24 15:33:09 -04:00
Yakov Olkhovskiy
83f406b722 optimization, INTERPOLATE without expr. list, any column is allowed except WITH FILL 2022-03-24 15:29:29 -04:00
Nikolai Kochetov
283e20a9a5
Merge pull request #35395 from amosbird/distributedmultiplejoin
Validate some thoughts over making sets
2022-03-24 10:30:26 +01:00
Alexander Tokmakov
9aed0507b7 Merge branch 'master' into mvcc_prototype 2022-03-23 18:07:22 +01:00
Amos Bird
ab7923a26c
Remove comments 2022-03-23 23:21:02 +08:00
Anton Popov
4ff9627f60 fix crash with enabled optimize_functions_to_subcolumns 2022-03-23 01:27:52 +00:00
mergify[bot]
e5a5ab2a40
Merge branch 'master' into distributedmultiplejoin 2022-03-21 10:00:51 +00:00
Amos Bird
243de091bb
Validate some thoughts over making sets 2022-03-21 10:58:44 +08:00
Yakov Olkhovskiy
c4daf514d6
Update InterpreterSelectQuery.cpp
bugfix: check column existence for INTERPOLATE expression target
2022-03-19 14:12:29 -04:00
Yakov Olkhovskiy
eb7474e73a
Merge branch 'master' into interpolate-feature 2022-03-19 03:11:14 -04:00
Yakov Olkhovskiy
a8e1671a76 type match check for INTERPOLATE expressions added, bugfix, printout fixed 2022-03-18 16:44:27 -04:00
Alexander Tokmakov
07d952b728 use snapshots for semistructured data, durability fixes 2022-03-17 18:26:18 +01:00
Alexander Tokmakov
d04dc03fa4 Merge branch 'master' into mvcc_prototype 2022-03-17 15:24:32 +01:00
Yakov Olkhovskiy
7bb66e6702 added INTERPOLATE extension for ORDER BY WITH FILL 2022-03-17 01:51:35 -04:00
Anton Popov
36ec379aeb Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-14 16:28:35 +00:00
Alexander Tokmakov
061fa6a6f2 Merge branch 'master' into mvcc_prototype 2022-03-10 13:13:04 +01:00
Alexander Tokmakov
8acfb8d27f Merge branch 'master' into mvcc_prototype 2022-03-07 17:40:15 +01:00
Amos Bird
fe4534d464
Get rid of duplicate query planing. 2022-03-08 00:02:58 +08:00
Anton Popov
df3b07fe7c Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-03 22:25:28 +00:00
Frank Chen
b4829465d9
Improve the opentelemetry span logs for INSERT on distributed table (#34480) 2022-03-03 12:53:29 +01:00
Maksim Kita
b1a956c5f1 clang-tidy check performance-move-const-arg fix 2022-03-02 18:15:27 +00:00
Anton Popov
fcdebea925 Merge remote-tracking branch 'upstream/master' into HEAD 2022-02-25 13:41:30 +03:00
Alexander Tokmakov
aa6b9a2abc Merge branch 'master' into mvcc_prototype 2022-02-23 23:22:03 +03:00
Dmitry Novik
67df01d025
Merge branch 'master' into grouping-sets-fix 2022-02-22 04:18:03 -08:00
Dmitry Novik
4428e7aa1b
Merge branch 'master' into nv/move-part-count 2022-02-21 02:14:23 -08:00
Azat Khuzhin
774744a86d Fix allow_experimental_projection_optimization with enable_global_with_statement
allow_experimental_projection_optimization requires one more
InterpreterSelectQuery, which with enable_global_with_statement will
apply ApplyWithAliasVisitor if the query is not subquery.

But this should not be done for queries from
MergeTreeData::getQueryProcessingStage()/getQueryProcessingStageWithAggregateProjections()
since this will duplicate WITH statements over and over.

This will also fix scalar.xml perf tests, that leads to the following
error now:

    scalar.query0.prewarm0: DB::Exception: Stack size too large.

And since it has very long query in the log, this leads to the following
perf test error:

    _csv.Error: field larger than field limit (131072)

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-02-16 19:14:47 +03:00
Anton Popov
a661eaf39f better performance of getting storage snapshot 2022-02-16 02:17:22 +03:00
Alexander Tokmakov
07e66e690d Merge branch 'master' into mvcc_prototype 2022-02-11 15:53:32 +03:00
Anton Popov
18940b8637 Merge remote-tracking branch 'upstream/master' into HEAD 2022-02-09 23:38:38 +03:00
Nikolai Kochetov
38fb50f736
Merge pull request #33958 from Algunenano/mv_cacheable_scalars
Scalar cache improvements
2022-02-09 16:46:53 +03:00
Nicolae Vartolomei
1cdb50cf13 Disable optimize_trivial_count when deduplication for part movement feature is enabled
Fixes #34089
2022-02-07 18:26:49 +00:00
mergify[bot]
dd947f964c
Merge branch 'master' into mv_cacheable_scalars 2022-02-07 10:07:26 +00:00
Amos Bird
a6f0b01e6a
Fix order by after aggregation 2022-02-07 00:42:11 +08:00
Amos Bird
1ab773cc90
Fix aggregation_in_order with normal projection 2022-02-06 16:46:12 +08:00
Alexander Tokmakov
ca5f951558 Merge branch 'master' into mvcc_prototype 2022-02-03 18:56:44 +03:00
Anton Popov
9b844c6b42
Merge pull request #32748 from CurtizJ/read-in-order-fixed-prefix
Support `optimize_read_in_order` if prefix of sorting key is already sorted
2022-02-03 18:17:08 +03:00
Anton Popov
836a348a9c Merge remote-tracking branch 'upstream/master' into HEAD 2022-02-01 15:23:07 +03:00
Alexander Tokmakov
2e4ae37d98 Merge branch 'master' into mvcc_prototype 2022-02-01 13:20:03 +03:00