Commit Graph

1026 Commits

Author SHA1 Message Date
vdimir
714c53ab24
fix typos 2022-08-29 09:49:09 +00:00
vdimir
8e1632f824
Create sets for joins: better code 2022-08-29 09:49:08 +00:00
vdimir
7228091ff1
rename CreateSetAndFilterOnTheFlyTransform 2022-08-29 09:49:07 +00:00
vdimir
c778bba13f
Create sets for joins: wip 2022-08-29 09:47:00 +00:00
vdimir
31a167848d
Fix set finish condition in CreatingSetsOnTheFlyTransform 2022-08-29 09:46:59 +00:00
vdimir
8f06430ebd
Create sets for joins: upd 2022-08-29 09:46:58 +00:00
vdimir
3292566603
Format bytes in CreatingSetsOnTheFlyTransform logs 2022-08-29 09:46:57 +00:00
vdimir
031aaf3a45
Add Creating/FilterBySetsOnTheFlyStep for full sorting join 2022-08-29 09:46:57 +00:00
Azat Khuzhin
f9812d9917 Fix memory leak while pushing to MVs w/o query context (from Kafka/...)
While pushing to MVs, there is low-level code that creates
ThreadGroupStatus/ThreadStatus; it is required to gather some metrics
for system.query_views_log.

But one should not use the ThreadGroupStatus of the MainThreadStatus, since
this structure can hold some state that may not be cleaned, plus this
may be racy. It is better to create a new ThreadGroupStatus and
attach it instead.

Also this place misses detachQuery(), and because of this it leaks
ThreadGroupStatus::finished_threads_counters_memory. But it is only a
problem when pushing to MVs is done w/o query context (i.e. from Kafka/...),
since when it has query context detachQuery() will be called eventually.

Before this patch series, when I tried the reproducer with
500 MVs attached to the Kafka engine (that @den-crane suggested), the jemalloc
report looked like this:

    $ ../jeprof --text ~/ch/tmp/upstream/clickhouse-binary --base jeprof.44384.0.i0.heap jeprof.44384.167.i167.heap
    Using local file /home/azat/ch/tmp/upstream/clickhouse-binary.
    Using local file jeprof.44384.167.i167.heap.
    Total: 915.6 MB
       910.7  99.5%  99.5%    910.7  99.5% Snapshot (inline)
         9.5   1.0% 100.5%      9.5   1.0% std::__1::__libcpp_operator_new (inline)
         0.5   0.1% 100.6%      0.5   0.1% DB::TasksStatsCounters::create

And with focus on this place:

    $ ../jeprof --focus Snapshot --text ~/ch/tmp/upstream/clickhouse-binary --base jeprof.44384.0.i0.heap jeprof.44384.167.i167.heap
    Using local file /home/azat/ch/tmp/upstream/clickhouse-binary.
    Using local file jeprof.44384.167.i167.heap.
    Total: 915.6 MB
       910.7 100.0% 100.0%    910.7 100.0% Snapshot (inline)
         0.0   0.0% 100.0%    910.7 100.0% DB::QueryPipeline::reset
         0.0   0.0% 100.0%    910.7 100.0% DB::StorageKafka::streamToViews
         0.0   0.0% 100.0%    910.7 100.0% DB::StorageKafka::threadFunc
         0.0   0.0% 100.0%    910.7 100.0% ProfileEvents::Counters::getPartiallyAtomicSnapshot
         0.0   0.0% 100.0%    910.7 100.0% ~ThreadStatus
         0.0   0.0% 100.0%    910.7 100.0% ~ViewRuntimeData
         0.0   0.0% 100.0%    910.7 100.0% ~ViewRuntimeStats (inline)

Actually this report does not look great (you can understand it only
because I stripped it), because --text is not that smart, but if you use
--pdf for the report you will see the stacktrace (I will attach the pdf to
the pull request).

But after this patch series the process RSS does not go beyond
~700MiB.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-08-29 11:36:33 +02:00
Vladimir C
e067629e0d
Merge pull request #40239 from vdimir/vdimir/tmp-file-metrics
More metrics for on-disk temporary data
2022-08-26 11:28:01 +02:00
vdimir
91788f29e8
Upd TemporaryFileOnDisk 2022-08-24 16:15:54 +00:00
vdimir
7194df1184
Move back TemporaryFile -> TemporaryFileOnDisk 2022-08-24 16:14:11 +00:00
vdimir
0349c85017
Use getCompressedBytes in BufferingToFileTransform and TemporaryFileStream 2022-08-24 16:14:10 +00:00
vdimir
51c44424cc
More metrics for temp files 2022-08-24 16:14:09 +00:00
vdimir
1321ac87b5
Minor fixes 2022-08-24 16:14:07 +00:00
vdimir
7e0c9062c7
Add ProfileEvents::ExternalSort(Un)CompressedBytes 2022-08-24 16:14:07 +00:00
Alexander Gololobov
1c2dd50ca5 Fix vertical merge of parts with lightweight deleted rows 2022-08-24 15:18:33 +02:00
Alexey Milovidov
ab91c99495
Merge branch 'master' into control_block_size_in_aggregator 2022-08-20 21:28:27 +03:00
Nikita Taranov
f650b23ee3 generate many blocks 2022-08-16 18:56:22 +02:00
Nikita Taranov
db0110fd7a more accurate crutch 2022-08-16 18:56:22 +02:00
Nikita Taranov
e5e0a24ab3 return chunks from prepareBlockAndFillWithoutKey 2022-08-16 18:56:22 +02:00
Vladimir Chebotaryov
3cc03b141e Fixed tests on Debug build type. 2022-08-16 15:43:37 +02:00
Vladimir Chebotaryov
66f9bfca61 Fixed point of origin for exponential decay window functions to the last value in window. 2022-08-16 15:43:37 +02:00
Anton Popov
4bd50bb06c
Merge branch 'master' into distinct_sorted_simplify 2022-08-12 17:11:18 +02:00
Kruglov Pavel
4c7222d938
Merge pull request #40020 from canhld94/ch_canh_fix_hash
fix HashMethodOneNumber with const column
2022-08-12 14:40:24 +02:00
Maksim Kita
6bec0f5854
Merge pull request #38956 from vdimir/dict-join-refactoring
Join with dictionary refactoring
2022-08-11 11:54:11 +02:00
Duc Canh Le
84cd867aa8 materialize column instead of handling column in hash method 2022-08-11 10:46:06 +08:00
vdimir
ad91c16ba0
Rename join_common -> JoinUtils 2022-08-10 14:20:28 +00:00
vdimir
708747ca0b
Merge branch 'master' into refactor-prepared-sets 2022-08-08 14:27:18 +02:00
Igor Nikonov
8278da6475 Fix: read row counts before moving columns out of chunk 2022-08-05 21:29:57 +00:00
Igor Nikonov
9fddf6efde Merge remote-tracking branch 'origin/master' into ordinary_distinct_small_refact 2022-08-05 19:23:44 +00:00
Nikolai Kochetov
cf34232675 Output header is now empty for every MV chain.
Instead of checking that the number of processors is different for different
threads, simply always return an empty header from buildChainImpl() by
adding an explicit conversion.

v2: ignore UNKNOWN_TABLE errors in test
2022-08-05 13:16:32 +03:00
Vladimir C
a627b00c43
Merge branch 'master' into refactor-prepared-sets 2022-08-04 13:27:38 +02:00
Nikita Taranov
4943202921
Improve memory usage during memory efficient merging of aggregation results (#39429) 2022-08-03 17:56:59 +02:00
Igor Nikonov
30782a2b05 Test: distinct sorted is not used on const column 2022-08-02 17:44:43 +00:00
Igor Nikonov
83e1dd1172
Merge branch 'master' into ordinary_distinct_small_refact 2022-07-31 00:23:21 +02:00
Igor Nikonov
7245ddcc20 Simple refactoring: ordinary DISTINCT implementation 2022-07-30 20:25:56 +00:00
Igor Nikonov
a7cfad105e
Merge branch 'master' into distinct_sorted_simplify 2022-07-30 21:57:53 +02:00
Igor Nikonov
3be51a6dea Construct DistinctSortedTransform only when it makes sense
otherwise fall back to DistinctTransform (i.e. ordinary distinct)
2022-07-30 19:41:03 +00:00
Maksim Kita
acb0137dbb
Merge pull request #39718 from kitaisreal/join-enums-refactoring
Join enums refactoring
2022-07-30 13:53:08 +02:00
Igor Nikonov
d951154ef4 Provide NULLs direction when comparing rows 2022-07-29 22:12:03 +00:00
Igor Nikonov
13dc1697fb Remove unnecessary initialization 2022-07-29 20:34:23 +00:00
Igor Nikonov
b44373ba8f Merge remote-tracking branch 'origin/master' into distinct_sorted_simplify 2022-07-29 20:33:26 +00:00
Igor Nikonov
7b0b38e997 DistinctSortedTransform works only if columns contain a sort prefix of the
sort description
2022-07-29 20:01:07 +00:00
Maksim Kita
8fc6bad4f4 Join enums refactoring 2022-07-29 18:35:05 +02:00
Igor Nikonov
4af435bdda Fix: handle all const columns case correctly 2022-07-28 21:22:06 +00:00
Vladimir C
115506356c
Merge branch 'master' into refactor-prepared-sets 2022-07-27 19:57:23 +02:00
Igor Nikonov
377c04fbf1 Merge remote-tracking branch 'origin/master' into fix_distinct_sorted 2022-07-27 13:01:18 +00:00
Anton Popov
1547c010b9
Merge pull request #39432 from ClickHouse/distinct_sorted_chunk_perf_impr
DISTINCT in order: perf improvement
2022-07-27 14:17:58 +02:00
Nikolai Kochetov
873432fb53
Merge pull request #37849 from ClickHouse/bug-with-fill-date
Enforce equality of WITH FILL type with ORDER BY column's type for date/time types.
2022-07-27 12:27:53 +02:00
Igor Nikonov
64e51e56e7 Allocate memory for column arrays once 2022-07-27 08:22:07 +00:00
Igor Nikonov
12a7567402 Some polishing 2022-07-27 07:58:54 +00:00
Igor Nikonov
589104fa6e Make building column arrays for chunk processing straightforward 2022-07-27 07:44:42 +00:00
Igor Nikonov
cac4d77d0b Merge remote-tracking branch 'origin/master' into distinct_sorted_chunk_perf_impr 2022-07-26 20:36:38 +00:00
Yakov Olkhovskiy
0055c9307d
style fix 2022-07-26 16:08:03 -04:00
Igor Nikonov
24f3a6905f
Merge branch 'master' into fix_distinct_sorted 2022-07-26 21:57:44 +02:00
Igor Nikonov
d196ab24d4 Calculate DISTINCT columns positions which form sort prefix in sort
description once
2022-07-26 19:55:29 +00:00
Yakov Olkhovskiy
d93c67e303 comment and test added 2022-07-26 15:28:11 -04:00
vdimir
1e3fa2e01f
Refactor PreparedSets/SubqueryForSet 2022-07-26 18:39:02 +00:00
Alexander Gololobov
0666ec2e1f
Merge branch 'master' into feature/sql-standard-delete 2022-07-26 10:42:39 +02:00
Vladimir Chebotaryov
f32d9c5539
Uppercase ROWS, GROUPS, RANGE in queries with windows. (#39410) 2022-07-25 22:53:53 +02:00
Igor Nikonov
41e72aac83 Fix: DistinctSortedTransform doesn't take advantage of sorting
clearing_columns are set incorrectly, so we never clear HashSet
2022-07-24 21:35:36 +00:00
Alexander Gololobov
460950ecdc
Merge branch 'master' into feature/sql-standard-delete 2022-07-24 21:27:22 +02:00
Igor Nikonov
95511428b3 Couple optimizations
+ do not apply filter to chunk if there is no data for output
+ checking clear_data flag at compile time
2022-07-23 00:03:26 +00:00
Igor Nikonov
739ff34c6e Add some tests, still not sure about optimize_memory_usage option 2022-07-22 22:48:26 +00:00
Igor Nikonov
e50aebb5f0
Merge branch 'master' into distinct_sorted_chunk_perf_impr 2022-07-20 23:17:11 +02:00
Igor Nikonov
965f96bd84 DISTINCT in order: perf improvement
+ reduce allocations in DistinctSortedChunkTransform
+ use it for final distinct as well
2022-07-20 20:44:47 +00:00
Yakov Olkhovskiy
c4d040e02c
Merge branch 'master' into bug-with-fill-date 2022-07-20 09:10:45 -04:00
Dmitry Novik
50989bdb68
Merge branch 'master' into group-by-use-nulls 2022-07-19 14:58:01 +02:00
Alexander Gololobov
9de72d995a POC lightweight delete using __row_exists virtual column and prewhere-like filtering 2022-07-18 20:06:42 +02:00
Igor Nikonov
ef0ef9e03b
Merge pull request #39191 from ClickHouse/sort_transform_cleanup
Cleanup: done during #38719 (SortingStep: deduce way to sort based on input stream sort)
2022-07-16 01:53:58 +02:00
Igor Nikonov
b7f46d954e Cleanup: related to #38719 (SortingStep: deduce way to sort based on input stream sort) 2022-07-13 17:57:37 +00:00
vdimir
fa59133463
Do not spam log in MergeJoinAlgorithm 2022-07-13 11:51:11 +00:00
Vladimir C
d1d1e4d8a1
Merge pull request #38943 from amosbird/better-join-plan1
Avoid redundant join block transformation during planning.
2022-07-13 12:39:45 +02:00
Dmitry Novik
5f65b45269
Merge branch 'master' into group-by-use-nulls 2022-07-12 22:36:04 +02:00
Amos Bird
d3709c6c26
Avoid redundant join block transformation. 2022-07-12 22:20:10 +08:00
Nikolai Kochetov
75c3926cbb Fix insert into MV with enabled extremes. 2022-07-12 13:57:36 +00:00
vdimir
da523f3288
Fix assertion in full sorting merge join 2022-07-08 11:31:15 +00:00
Dmitry Novik
d1df66687b
Merge branch 'master' into group-by-use-nulls 2022-07-07 20:54:38 +02:00
Dmitry Novik
1587385f7a Cleanup code 2022-07-07 18:53:20 +00:00
Robert Schulze
f15d9ca59c
Merge pull request #38774 from zvonand/zvonand-nnd
Reintroduce nonNegativeDerivative()
2022-07-07 20:39:13 +02:00
vdimir
7c586a9e7c
Minor updates for full sorting merge join 2022-07-06 14:28:05 +00:00
vdimir
881d352e05
upd full sorting join 2022-07-06 14:28:05 +00:00
vdimir
aff6654d52
minor changes in full sort join 2022-07-06 14:27:33 +00:00
vdimir
f8e66601a7
Fix column remap in MergeJoinTransform 2022-07-06 14:27:32 +00:00
vdimir
0b994bb258
fix build 2022-07-06 14:27:32 +00:00
vdimir
753a567da8
full sorting join with using 2022-07-06 14:27:29 +00:00
vdimir
a90ac59ee5
MergeJoinAlgorithm::createBlockWithDefaults 2022-07-06 14:26:19 +00:00
vdimir
d184e184b4
full sort join: check key types, more tests 2022-07-06 14:26:19 +00:00
vdimir
a2a7abc2e9
add not implemented checks, add using testcase to full sort join 2022-07-06 14:26:18 +00:00
vdimir
92ff43eb7c
tests full sort join 2022-07-06 14:26:18 +00:00
vdimir
a0144e115d
full sorting all join 2022-07-06 14:26:18 +00:00
vdimir
4e88e8f5ec
full sort join: move block list to all join state 2022-07-06 14:26:17 +00:00
vdimir
94192a23fc
enable total compare in MergeJoinAlgorithm 2022-07-06 14:26:16 +00:00
vdimir
a92c60ba06
fix nulls comparison in full sorting join 2022-07-06 14:26:15 +00:00
vdimir
7c5a5f4b64
full sorted any join tests passed 2022-07-06 14:26:15 +00:00
vdimir
26d812ec72
wip any full sorting merge, rewrite cursor 2022-07-06 14:26:14 +00:00
vdimir
a2d190edb8
wip MergeJoinTransform 2022-07-06 14:25:12 +00:00
vdimir
0b9d4ee640
wip sort join same rows 2022-07-06 14:25:12 +00:00
vdimir
6d198ff3d7
fix style 2022-07-06 14:25:11 +00:00
vdimir
88d8dc5be2
wip full sort any join 2022-07-06 14:25:11 +00:00
vdimir
ba787db0bb
Fix build, small changes 2022-07-06 14:25:10 +00:00
vdimir
d34a66c915
wip sorting merge 2022-07-06 14:25:09 +00:00
vdimir
1b429fc1af
wip: any left/right sorting join 2022-07-06 14:23:46 +00:00
vdimir
8dce97123c
wip: any inner full sorting join 2022-07-06 14:23:46 +00:00
vdimir
4a16195964
Calculate output header for full sorting merge join 2022-07-06 14:23:45 +00:00
vdimir
fa8eb35599
Pipeline for full sorting merge join 2022-07-06 14:23:44 +00:00
Maksim Kita
b94489d52c
Merge pull request #38859 from kitaisreal/merge-tree-merge-disable-batch-optimization
MergeTree merge disable batch optimization
2022-07-06 15:59:40 +02:00
Nikolai Kochetov
7de2f229ab
Merge pull request #38584 from ClickHouse/filimonov-AggregatingTransform-expandPipeline
Add check for empty processors in AggregatingTransform::expandPipeline
2022-07-06 14:38:40 +02:00
Andrey Zvonov
7de39d9b15 Merge branch 'master' of github.com:ClickHouse/ClickHouse into zvonand-nnd 2022-07-06 10:59:35 +03:00
Maksim Kita
bdc21737d5 MergeTree merge disable batch optimization 2022-07-05 16:15:00 +02:00
zvonand
8a270c01e9 fix floating point in intervals 2022-07-04 20:45:05 +03:00
Dmitry Novik
864ab20582 Use correct intermediate header for ROLLUP and CUBE 2022-07-04 16:17:58 +00:00
zvonand
f814985adf minor improvements 2022-07-04 16:03:59 +03:00
zvonand
eac84351f6 fix behavior 2022-07-04 01:26:07 +03:00
Igor Nikonov
2e2ef08712
Merge pull request #37803 from ClickHouse/dictinct_in_order_optimization
DISTINCT in order optimization
2022-07-03 21:59:04 +02:00
zvonand
8e99ea84a8 fix LOGICAL_ERROR 2022-07-02 14:09:51 +03:00
mergify[bot]
12f5250e86
Merge branch 'master' into dictinct_in_order_optimization 2022-07-01 22:51:35 +00:00
Anton Popov
ef87e1207c better support of read_in_order in case of fixed prefix of sorting key 2022-07-01 16:45:01 +00:00
Dmitry Novik
81dd90893e Merge remote-tracking branch 'origin/master' into group-by-use-nulls 2022-07-01 16:24:05 +00:00
Nikita Taranov
8ba3d405de impl 2022-07-01 16:05:32 +02:00
zvonand
3b5332d15e Revert "Revert "Non Negative Derivative window function""
This reverts commit dea3b5bfce.
2022-07-01 18:59:07 +05:00
Alexey Milovidov
20841f0e1e
Merge pull request #38551 from ClickHouse/revert-37628-non-neg-deriv
Revert "Non Negative Derivative window function"
2022-07-01 02:46:28 +03:00
Igor Nikonov
488ee75fc4 + use DistinctSorted for final distinct step
+ fix performance tests
2022-06-30 13:03:39 +00:00
Maksim Kita
0de66a2712
Merge pull request #38449 from ClickHouse/revert-38361-revert-38324-fix-partial-sort
Revert "Revert "Fix optimization in PartialSortingTransform (SIGSEGV and possible incorrect result)""
2022-06-30 13:02:38 +02:00
Dmitry Novik
98e9bc84d5 Refactor ROLLUP and CUBE 2022-06-30 10:13:58 +00:00
mergify[bot]
4cbbfb431d
Merge branch 'master' into dictinct_in_order_optimization 2022-06-29 23:32:17 +00:00
Igor Nikonov
d435532c68 Adapt range search algorithm to high cardinality case
+ range search is done in steps of some number of rows.
  Controlled by the new
  setting `distinct_in_order_range_search_step`. By default 0, i.e.
  whole chunk
+ before starting binary search, linear probing is done on each step (32
  rows currently)
2022-06-29 23:30:35 +00:00
filimonov
e0acb6e337
Add check for empty processors in AggregatingTransform::expandPipeline 2022-06-29 15:23:53 +02:00
Igor Nikonov
3627c6ff36 Perf tests with high cardinality 2022-06-29 13:13:39 +00:00
mergify[bot]
26258959b1
Merge branch 'master' into distinct_sorted_small_refact 2022-06-29 09:38:34 +00:00
Alexey Milovidov
dea3b5bfce
Revert "Non Negative Derivative window function" 2022-06-29 08:56:15 +03:00
Igor Nikonov
4a00e33e6b Fixes for some review comments 2022-06-28 21:42:46 +00:00
Igor Nikonov
c1840e798c Fix: wrong header variable was used 2022-06-28 20:15:16 +00:00
Igor Nikonov
d80a21a445 Distinct sorted: calculate column positions once in constructor
- instead of calculating them on every chunk
2022-06-28 19:59:05 +00:00
Igor Nikonov
59295724ac Mark condition for empty chunk as unlikely 2022-06-27 20:44:39 +00:00
mergify[bot]
a9c1b68034
Merge branch 'master' into dictinct_in_order_optimization 2022-06-27 20:16:00 +00:00
Igor Nikonov
5a26349695 Fix: input chunk can have empty columns (no rows) 2022-06-27 19:51:06 +00:00
Dmitry Novik
1d15d72211 Support NULLs in ROLLUP 2022-06-27 18:42:26 +00:00
Nikita Taranov
2487ba7f00
Move updateInputStream to ITransformingStep (#37393) 2022-06-27 13:16:52 +02:00
Maksim Kita
3ebe6a03b1
Revert "Revert "Fix optimization in PartialSortingTransform (SIGSEGV and possible incorrect result)"" 2022-06-27 10:37:19 +02:00
Igor Nikonov
edd29707ca Some polishing 2022-06-26 21:44:10 +00:00
Igor Nikonov
68927dd60c Adapt distinct for sorted chunks to handle sorted stream, so we can use
it for final distinct as well
2022-06-26 14:52:36 +00:00
Igor Nikonov
1140cf6fb5 Fixes:
+ test warning
+ proper capacity for column positions array in DistinctTransform
2022-06-26 09:43:31 +00:00
mergify[bot]
b65cf4e1fe
Merge branch 'master' into dictinct_in_order_optimization 2022-06-24 22:52:14 +00:00
Alexander Tokmakov
3f4a09478d
Revert "Fix optimization in PartialSortingTransform (SIGSEGV and possible incorrect result)" 2022-06-23 23:01:11 +03:00
Igor Nikonov
2fd5467f36 Merge remote-tracking branch 'origin/master' into dictinct_in_order_optimization 2022-06-23 16:04:08 +00:00
mergify[bot]
b5d3fd50d2
Merge branch 'master' into dictinct_in_order_optimization 2022-06-23 09:48:38 +00:00
Igor Nikonov
944c247345 DISTINCT in order optimization
+ try to use the optimization for final distinct in case of sorted stream
  (sorting inside and among chunks)
+ sorting description contains only columns from sorting key which are in
  header as well
2022-06-23 09:47:22 +00:00
Azat Khuzhin
9db64952c0 Fix SIGSEGV in optimization in PartialSortingTransform
Fixes: #37992 (cc @kitaisreal)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-22 21:39:10 +03:00