vdimir
714c53ab24
fix typos
2022-08-29 09:49:09 +00:00
vdimir
8e1632f824
Create sets for joins: better code
2022-08-29 09:49:08 +00:00
vdimir
7228091ff1
rename CreateSetAndFilterOnTheFlyTransform
2022-08-29 09:49:07 +00:00
vdimir
c778bba13f
Create sets for joins: wip
2022-08-29 09:47:00 +00:00
vdimir
31a167848d
Fix set finish condition in CreatingSetsOnTheFlyTransform
2022-08-29 09:46:59 +00:00
vdimir
8f06430ebd
Create sets for joins: upd
2022-08-29 09:46:58 +00:00
vdimir
3292566603
Format bytes in CreatingSetsOnTheFlyTransform logs
2022-08-29 09:46:57 +00:00
vdimir
031aaf3a45
Add Creating/FilterBySetsOnTheFlyStep for full sorting join
2022-08-29 09:46:57 +00:00
Azat Khuzhin
f9812d9917
Fix memory leak while pushing to MVs w/o query context (from Kafka/...)
...
While pushing to MVs, there is low-level code that creates
ThreadGroupStatus/ThreadStatus; it is required to gather some metrics
for system.query_views_log.
But one should not use the ThreadGroupStatus of the MainThreadStatus, since
this structure can hold some state that may not be cleaned up, plus this
may be racy. It is better to create a new ThreadGroupStatus and
attach it instead.
Also, this place misses detachQuery(), and because of this it leaks
ThreadGroupStatus::finished_threads_counters_memory. But this is only a
problem when pushing to MVs is done w/o query context (i.e. from Kafka/...),
since when there is a query context, detachQuery() will be called eventually.
Before this patch series, when I tried the reproducer with
500 MVs attached to the Kafka engine (that @den-crane suggested), the jemalloc
report looked like this:
$ ../jeprof --text ~/ch/tmp/upstream/clickhouse-binary --base jeprof.44384.0.i0.heap jeprof.44384.167.i167.heap
Using local file /home/azat/ch/tmp/upstream/clickhouse-binary.
Using local file jeprof.44384.167.i167.heap.
Total: 915.6 MB
910.7 99.5% 99.5% 910.7 99.5% Snapshot (inline)
9.5 1.0% 100.5% 9.5 1.0% std::__1::__libcpp_operator_new (inline)
0.5 0.1% 100.6% 0.5 0.1% DB::TasksStatsCounters::create
And with focus on this place:
$ ../jeprof --focus Snapshot --text ~/ch/tmp/upstream/clickhouse-binary --base jeprof.44384.0.i0.heap jeprof.44384.167.i167.heap
Using local file /home/azat/ch/tmp/upstream/clickhouse-binary.
Using local file jeprof.44384.167.i167.heap.
Total: 915.6 MB
910.7 100.0% 100.0% 910.7 100.0% Snapshot (inline)
0.0 0.0% 100.0% 910.7 100.0% DB::QueryPipeline::reset
0.0 0.0% 100.0% 910.7 100.0% DB::StorageKafka::streamToViews
0.0 0.0% 100.0% 910.7 100.0% DB::StorageKafka::threadFunc
0.0 0.0% 100.0% 910.7 100.0% ProfileEvents::Counters::getPartiallyAtomicSnapshot
0.0 0.0% 100.0% 910.7 100.0% ~ThreadStatus
0.0 0.0% 100.0% 910.7 100.0% ~ViewRuntimeData
0.0 0.0% 100.0% 910.7 100.0% ~ViewRuntimeStats (inline)
Actually this report does not look great (you can understand it because I
stripped it), because --text is not that smart, but if you use
--pdf for the report you will see the stacktrace (will attach pdf to the
pull request).
But after this patch series the process RSS does not go beyond
~700MiB.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-08-29 11:36:33 +02:00
Vladimir C
e067629e0d
Merge pull request #40239 from vdimir/vdimir/tmp-file-metrics
...
More metrics for on-disk temporary data
2022-08-26 11:28:01 +02:00
vdimir
91788f29e8
Upd TemporaryFileOnDisk
2022-08-24 16:15:54 +00:00
vdimir
7194df1184
Move back TemporaryFile -> TemporaryFileOnDisk
2022-08-24 16:14:11 +00:00
vdimir
0349c85017
Use getCompressedBytes in BufferingToFileTransform and TemporaryFileStream
2022-08-24 16:14:10 +00:00
vdimir
51c44424cc
More metrics for temp files
2022-08-24 16:14:09 +00:00
vdimir
1321ac87b5
Minor fixes
2022-08-24 16:14:07 +00:00
vdimir
7e0c9062c7
Add ProfileEvents::ExternalSort(Un)CompressedBytes
2022-08-24 16:14:07 +00:00
Alexander Gololobov
1c2dd50ca5
Fix vertical merge of parts with lightweight deleted rows
2022-08-24 15:18:33 +02:00
Alexey Milovidov
ab91c99495
Merge branch 'master' into control_block_size_in_aggregator
2022-08-20 21:28:27 +03:00
Nikita Taranov
f650b23ee3
generate many blocks
2022-08-16 18:56:22 +02:00
Nikita Taranov
db0110fd7a
more accurate crutch
2022-08-16 18:56:22 +02:00
Nikita Taranov
e5e0a24ab3
return chunks from prepareBlockAndFillWithoutKey
2022-08-16 18:56:22 +02:00
Vladimir Chebotaryov
3cc03b141e
Fixed tests on Debug build type.
2022-08-16 15:43:37 +02:00
Vladimir Chebotaryov
66f9bfca61
Fixed point of origin for exponential decay window functions to the last value in window.
2022-08-16 15:43:37 +02:00
Anton Popov
4bd50bb06c
Merge branch 'master' into distinct_sorted_simplify
2022-08-12 17:11:18 +02:00
Kruglov Pavel
4c7222d938
Merge pull request #40020 from canhld94/ch_canh_fix_hash
...
fix HashMethodOneNumber with const column
2022-08-12 14:40:24 +02:00
Maksim Kita
6bec0f5854
Merge pull request #38956 from vdimir/dict-join-refactoring
...
Join with dictionary refactoring
2022-08-11 11:54:11 +02:00
Duc Canh Le
84cd867aa8
materialize column instead of handling column in hash method
2022-08-11 10:46:06 +08:00
vdimir
ad91c16ba0
Rename join_common -> JoinUtils
2022-08-10 14:20:28 +00:00
vdimir
708747ca0b
Merge branch 'master' into refactor-prepared-sets
2022-08-08 14:27:18 +02:00
Igor Nikonov
8278da6475
Fix: read row counts before moving columns out of chunk
2022-08-05 21:29:57 +00:00
Igor Nikonov
9fddf6efde
Merge remote-tracking branch 'origin/master' into ordinary_distinct_small_refact
2022-08-05 19:23:44 +00:00
Nikolai Kochetov
cf34232675
Output header is now empty for every MV chain.
...
Instead of checking that number of processors different for different
threads, simply always return empty header from buildChainImpl(), by
adding explicit conversion.
v2: ignore UNKNOWN_TABLE errors in test
2022-08-05 13:16:32 +03:00
Vladimir C
a627b00c43
Merge branch 'master' into refactor-prepared-sets
2022-08-04 13:27:38 +02:00
Nikita Taranov
4943202921
Improve memory usage during memory efficient merging of aggregation results (#39429)
2022-08-03 17:56:59 +02:00
Igor Nikonov
30782a2b05
Test: distinct sorted is not used on const column
2022-08-02 17:44:43 +00:00
Igor Nikonov
83e1dd1172
Merge branch 'master' into ordinary_distinct_small_refact
2022-07-31 00:23:21 +02:00
Igor Nikonov
7245ddcc20
Simple refactoring: ordinary DISTINCT implementation
2022-07-30 20:25:56 +00:00
Igor Nikonov
a7cfad105e
Merge branch 'master' into distinct_sorted_simplify
2022-07-30 21:57:53 +02:00
Igor Nikonov
3be51a6dea
Construct DistinctSortedTransform only when it makes sense
...
otherwise fallback to DistinctTransform (i.e. ordinary distinct)
2022-07-30 19:41:03 +00:00
Maksim Kita
acb0137dbb
Merge pull request #39718 from kitaisreal/join-enums-refactoring
...
Join enums refactoring
2022-07-30 13:53:08 +02:00
Igor Nikonov
d951154ef4
Provide NULLs direction when comparing rows
2022-07-29 22:12:03 +00:00
Igor Nikonov
13dc1697fb
Remove unnecessary initialization
2022-07-29 20:34:23 +00:00
Igor Nikonov
b44373ba8f
Merge remote-tracking branch 'origin/master' into distinct_sorted_simplify
2022-07-29 20:33:26 +00:00
Igor Nikonov
7b0b38e997
DistinctSortedTransform works only if columns contain sort prefix of
...
sort description
2022-07-29 20:01:07 +00:00
Maksim Kita
8fc6bad4f4
Join enums refactoring
2022-07-29 18:35:05 +02:00
Igor Nikonov
4af435bdda
Fix: handle all const columns case correctly
2022-07-28 21:22:06 +00:00
Vladimir C
115506356c
Merge branch 'master' into refactor-prepared-sets
2022-07-27 19:57:23 +02:00
Igor Nikonov
377c04fbf1
Merge remote-tracking branch 'origin/master' into fix_distinct_sorted
2022-07-27 13:01:18 +00:00
Anton Popov
1547c010b9
Merge pull request #39432 from ClickHouse/distinct_sorted_chunk_perf_impr
...
DISTINCT in order: perf improvement
2022-07-27 14:17:58 +02:00
Nikolai Kochetov
873432fb53
Merge pull request #37849 from ClickHouse/bug-with-fill-date
...
Enforce equality of WITH FILL type with ORDER BY column's type for date/time types.
2022-07-27 12:27:53 +02:00
Igor Nikonov
64e51e56e7
Allocate memory for column arrays once
2022-07-27 08:22:07 +00:00
Igor Nikonov
12a7567402
Some polishing
2022-07-27 07:58:54 +00:00
Igor Nikonov
589104fa6e
Make building column arrays for chunk processing straightforward
2022-07-27 07:44:42 +00:00
Igor Nikonov
cac4d77d0b
Merge remote-tracking branch 'origin/master' into distinct_sorted_chunk_perf_impr
2022-07-26 20:36:38 +00:00
Yakov Olkhovskiy
0055c9307d
style fix
2022-07-26 16:08:03 -04:00
Igor Nikonov
24f3a6905f
Merge branch 'master' into fix_distinct_sorted
2022-07-26 21:57:44 +02:00
Igor Nikonov
d196ab24d4
Calculate DISTINCT columns positions which form sort prefix in sort
...
description once
2022-07-26 19:55:29 +00:00
Yakov Olkhovskiy
d93c67e303
comment and test added
2022-07-26 15:28:11 -04:00
vdimir
1e3fa2e01f
Refactor PreparedSets/SubqueryForSet
2022-07-26 18:39:02 +00:00
Alexander Gololobov
0666ec2e1f
Merge branch 'master' into feature/sql-standard-delete
2022-07-26 10:42:39 +02:00
Vladimir Chebotaryov
f32d9c5539
Uppercase ROWS, GROUPS, RANGE in queries with windows. (#39410)
2022-07-25 22:53:53 +02:00
Igor Nikonov
41e72aac83
Fix: DistinctSortedTransform doesn't take advantage of sorting
...
clearing_columns are set incorrectly, so we never clear HashSet
2022-07-24 21:35:36 +00:00
Alexander Gololobov
460950ecdc
Merge branch 'master' into feature/sql-standard-delete
2022-07-24 21:27:22 +02:00
Igor Nikonov
95511428b3
Couple optimizations
...
+ do not apply filter to chunk if there is no data for output
+ checking clear_data flag at compile time
2022-07-23 00:03:26 +00:00
Igor Nikonov
739ff34c6e
Add some tests, still not sure about optimize_memory_usage option
2022-07-22 22:48:26 +00:00
Igor Nikonov
e50aebb5f0
Merge branch 'master' into distinct_sorted_chunk_perf_impr
2022-07-20 23:17:11 +02:00
Igor Nikonov
965f96bd84
DISTINCT in order: perf improvement
...
+ reduce allocations in DistinctSortedChunkTransform
+ use it for final distinct as well
2022-07-20 20:44:47 +00:00
Yakov Olkhovskiy
c4d040e02c
Merge branch 'master' into bug-with-fill-date
2022-07-20 09:10:45 -04:00
Dmitry Novik
50989bdb68
Merge branch 'master' into group-by-use-nulls
2022-07-19 14:58:01 +02:00
Alexander Gololobov
9de72d995a
POC lightweight delete using __row_exists virtual column and prewhere-like filtering
2022-07-18 20:06:42 +02:00
Igor Nikonov
ef0ef9e03b
Merge pull request #39191 from ClickHouse/sort_transform_cleanup
...
Cleanup: done during #38719 (SortingStep: deduce way to sort based on input stream sort)
2022-07-16 01:53:58 +02:00
Igor Nikonov
b7f46d954e
Cleanup: related to #38719 (SortingStep: deduce way to sort based on input stream sort)
2022-07-13 17:57:37 +00:00
vdimir
fa59133463
Do not spam log in MergeJoinAlgorithm
2022-07-13 11:51:11 +00:00
Vladimir C
d1d1e4d8a1
Merge pull request #38943 from amosbird/better-join-plan1
...
Avoid redundant join block transformation during planning.
2022-07-13 12:39:45 +02:00
Dmitry Novik
5f65b45269
Merge branch 'master' into group-by-use-nulls
2022-07-12 22:36:04 +02:00
Amos Bird
d3709c6c26
Avoid redundant join block transformation.
2022-07-12 22:20:10 +08:00
Nikolai Kochetov
75c3926cbb
Fix insert into MV with enabled extremes.
2022-07-12 13:57:36 +00:00
vdimir
da523f3288
Fix assertion in full sorting merge join
2022-07-08 11:31:15 +00:00
Dmitry Novik
d1df66687b
Merge branch 'master' into group-by-use-nulls
2022-07-07 20:54:38 +02:00
Dmitry Novik
1587385f7a
Cleanup code
2022-07-07 18:53:20 +00:00
Robert Schulze
f15d9ca59c
Merge pull request #38774 from zvonand/zvonand-nnd
...
Reintroduce nonNegativeDerivative()
2022-07-07 20:39:13 +02:00
vdimir
7c586a9e7c
Minor updates for full sorting merge join
2022-07-06 14:28:05 +00:00
vdimir
881d352e05
upd full sorting join
2022-07-06 14:28:05 +00:00
vdimir
aff6654d52
minor changes in full sort join
2022-07-06 14:27:33 +00:00
vdimir
f8e66601a7
Fix column remap in MergeJoinTransform
2022-07-06 14:27:32 +00:00
vdimir
0b994bb258
fix build
2022-07-06 14:27:32 +00:00
vdimir
753a567da8
full sorting join with using
2022-07-06 14:27:29 +00:00
vdimir
a90ac59ee5
MergeJoinAlgorithm::createBlockWithDefaults
2022-07-06 14:26:19 +00:00
vdimir
d184e184b4
full sort join: check key types, more tests
2022-07-06 14:26:19 +00:00
vdimir
a2a7abc2e9
add not implemented checks, add using testcase to full sort join
2022-07-06 14:26:18 +00:00
vdimir
92ff43eb7c
tests full sort join
2022-07-06 14:26:18 +00:00
vdimir
a0144e115d
full sorting all join
2022-07-06 14:26:18 +00:00
vdimir
4e88e8f5ec
full sort join: move block list to all join state
2022-07-06 14:26:17 +00:00
vdimir
94192a23fc
enable total compare in MergeJoinAlgorithm
2022-07-06 14:26:16 +00:00
vdimir
a92c60ba06
fix nulls comparison in full sorting join
2022-07-06 14:26:15 +00:00
vdimir
7c5a5f4b64
full sorted any join tests passed
2022-07-06 14:26:15 +00:00
vdimir
26d812ec72
wip any full sorting merge, rewrite cursor
2022-07-06 14:26:14 +00:00
vdimir
a2d190edb8
wip MergeJoinTransform
2022-07-06 14:25:12 +00:00
vdimir
0b9d4ee640
wip sort join same rows
2022-07-06 14:25:12 +00:00
vdimir
6d198ff3d7
fix style
2022-07-06 14:25:11 +00:00
vdimir
88d8dc5be2
wip full sort any join
2022-07-06 14:25:11 +00:00
vdimir
ba787db0bb
Fix build, small changes
2022-07-06 14:25:10 +00:00
vdimir
d34a66c915
wip sorting merge
2022-07-06 14:25:09 +00:00
vdimir
1b429fc1af
wip: any left/right sorting join
2022-07-06 14:23:46 +00:00
vdimir
8dce97123c
wip: any inner full sorting join
2022-07-06 14:23:46 +00:00
vdimir
4a16195964
Calculate output header for full sorting merge join
2022-07-06 14:23:45 +00:00
vdimir
fa8eb35599
Pipeline for full sorting merge join
2022-07-06 14:23:44 +00:00
Maksim Kita
b94489d52c
Merge pull request #38859 from kitaisreal/merge-tree-merge-disable-batch-optimization
...
MergeTree merge disable batch optimization
2022-07-06 15:59:40 +02:00
Nikolai Kochetov
7de2f229ab
Merge pull request #38584 from ClickHouse/filimonov-AggregatingTransform-expandPipeline
...
Add check for empty processors in AggregatingTransform::expandPipeline
2022-07-06 14:38:40 +02:00
Andrey Zvonov
7de39d9b15
Merge branch 'master' of github.com:ClickHouse/ClickHouse into zvonand-nnd
2022-07-06 10:59:35 +03:00
Maksim Kita
bdc21737d5
MergeTree merge disable batch optimization
2022-07-05 16:15:00 +02:00
zvonand
8a270c01e9
fix floating point in intervals
2022-07-04 20:45:05 +03:00
Dmitry Novik
864ab20582
Use correct intermediate header for ROLLUP and CUBE
2022-07-04 16:17:58 +00:00
zvonand
f814985adf
minor improvements
2022-07-04 16:03:59 +03:00
zvonand
eac84351f6
fix behavior
2022-07-04 01:26:07 +03:00
Igor Nikonov
2e2ef08712
Merge pull request #37803 from ClickHouse/dictinct_in_order_optimization
...
DISTINCT in order optimization
2022-07-03 21:59:04 +02:00
zvonand
8e99ea84a8
fix LOGICAL_ERROR
2022-07-02 14:09:51 +03:00
mergify[bot]
12f5250e86
Merge branch 'master' into dictinct_in_order_optimization
2022-07-01 22:51:35 +00:00
Anton Popov
ef87e1207c
better support of read_in_order in case of fixed prefix of sorting key
2022-07-01 16:45:01 +00:00
Dmitry Novik
81dd90893e
Merge remote-tracking branch 'origin/master' into group-by-use-nulls
2022-07-01 16:24:05 +00:00
Nikita Taranov
8ba3d405de
impl
2022-07-01 16:05:32 +02:00
zvonand
3b5332d15e
Revert "Revert "Non Negative Derivative window function""
...
This reverts commit dea3b5bfce.
2022-07-01 18:59:07 +05:00
Alexey Milovidov
20841f0e1e
Merge pull request #38551 from ClickHouse/revert-37628-non-neg-deriv
...
Revert "Non Negative Derivative window function"
2022-07-01 02:46:28 +03:00
Igor Nikonov
488ee75fc4
+ use DistinctSorted for final distinct step
...
+ fix performance tests
2022-06-30 13:03:39 +00:00
Maksim Kita
0de66a2712
Merge pull request #38449 from ClickHouse/revert-38361-revert-38324-fix-partial-sort
...
Revert "Revert "Fix optimization in PartialSortingTransform (SIGSEGV and possible incorrect result)""
2022-06-30 13:02:38 +02:00
Dmitry Novik
98e9bc84d5
Refactor ROLLUP and CUBE
2022-06-30 10:13:58 +00:00
mergify[bot]
4cbbfb431d
Merge branch 'master' into dictinct_in_order_optimization
2022-06-29 23:32:17 +00:00
Igor Nikonov
d435532c68
Adapt range search algorithm to high cardinality case
...
+ range search done in steps of some number of rows. Controlled by new
setting `distinct_in_order_range_search_step`. By default 0, i.e.
whole chunk
+ before starting binary search, linear probing is done on each step (32
rows currently)
2022-06-29 23:30:35 +00:00
filimonov
e0acb6e337
Add check for empty processors in AggregatingTransform::expandPipeline
2022-06-29 15:23:53 +02:00
Igor Nikonov
3627c6ff36
Perf tests with high cardinality
2022-06-29 13:13:39 +00:00
mergify[bot]
26258959b1
Merge branch 'master' into distinct_sorted_small_refact
2022-06-29 09:38:34 +00:00
Alexey Milovidov
dea3b5bfce
Revert "Non Negative Derivative window function"
2022-06-29 08:56:15 +03:00
Igor Nikonov
4a00e33e6b
Fixes for some review comments
2022-06-28 21:42:46 +00:00
Igor Nikonov
c1840e798c
Fix: wrong header variable was used
2022-06-28 20:15:16 +00:00
Igor Nikonov
d80a21a445
Distinct sorted: calculate column positions once in constructor
...
- instead of calculating them on every chunk
2022-06-28 19:59:05 +00:00
Igor Nikonov
59295724ac
Mark condition for empty chunk as unlikely
2022-06-27 20:44:39 +00:00
mergify[bot]
a9c1b68034
Merge branch 'master' into dictinct_in_order_optimization
2022-06-27 20:16:00 +00:00
Igor Nikonov
5a26349695
Fix: input chunk can have empty columns (no rows)
2022-06-27 19:51:06 +00:00
Dmitry Novik
1d15d72211
Support NULLs in ROLLUP
2022-06-27 18:42:26 +00:00
Nikita Taranov
2487ba7f00
Move updateInputStream to ITransformingStep (#37393)
2022-06-27 13:16:52 +02:00
Maksim Kita
3ebe6a03b1
Revert "Revert "Fix optimization in PartialSortingTransform (SIGSEGV and possible incorrect result)""
2022-06-27 10:37:19 +02:00
Igor Nikonov
edd29707ca
Some polishing
2022-06-26 21:44:10 +00:00
Igor Nikonov
68927dd60c
Adapt distinct for sorted chunks to handle sorted stream, so we can use
...
it for final distinct as well
2022-06-26 14:52:36 +00:00
Igor Nikonov
1140cf6fb5
Fixes:
...
+ test warning
+ proper capacity for column positions array in DistinctTransform
2022-06-26 09:43:31 +00:00
mergify[bot]
b65cf4e1fe
Merge branch 'master' into dictinct_in_order_optimization
2022-06-24 22:52:14 +00:00
Alexander Tokmakov
3f4a09478d
Revert "Fix optimization in PartialSortingTransform (SIGSEGV and possible incorrect result)"
2022-06-23 23:01:11 +03:00
Igor Nikonov
2fd5467f36
Merge remote-tracking branch 'origin/master' into dictinct_in_order_optimization
2022-06-23 16:04:08 +00:00
mergify[bot]
b5d3fd50d2
Merge branch 'master' into dictinct_in_order_optimization
2022-06-23 09:48:38 +00:00
Igor Nikonov
944c247345
DISTINCT in order optimization
...
+ try to use the optimization for final distinct in case of sorted stream
(sorting inside and among chunks)
+ sorting description contains only columns from sorting key which are in
header as well
2022-06-23 09:47:22 +00:00
Azat Khuzhin
9db64952c0
Fix SIGSEGV in optimization in PartialSortingTransform
...
Fixes: #37992 (cc @kitaisreal)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-22 21:39:10 +03:00