Release more num_streams if data is small (#53867)

* Release more num_streams if data is small

Besides sum_marks and min_marks_for_concurrent_read, we can also take the number of
system cores into account when deriving num_streams for small data. Increasing
num_streams and decreasing min_marks_for_concurrent_read improves parallel
performance when the system has plentiful cores (a standalone sketch of this
computation follows below).

Tested the patch on a system with 2x80 vCPUs: ClickBench Q39 improved by 3.3x, Q36 by
2.6x, and the overall geomean improved by 9%.

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
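
Here is a minimal standalone sketch of that computation (hypothetical helper name
and signature; the actual change lives in ReadFromMergeTree::spreadMarkRangesAmongStreams,
see the diff below). It assumes sum_marks > 0 and min_marks_for_concurrent_read > 0, and
uses `num_cores` to stand in for the previously computed, core-bounded num_streams:

    #include <algorithm>
    #include <cstddef>

    /// Sketch only: derive num_streams for a small read.
    /// Assumes sum_marks > 0 and min_marks_for_concurrent_read > 0.
    size_t pickNumStreams(size_t sum_marks, size_t num_cores, size_t & min_marks_for_concurrent_read)
    {
        /// Streams needed if each stream reads at least min_marks_for_concurrent_read marks.
        size_t num_streams = (sum_marks + min_marks_for_concurrent_read - 1) / min_marks_for_concurrent_read;

        /// Widen by the largest factor that neither exceeds the available cores
        /// nor pushes min_marks_for_concurrent_read below 8 marks.
        const size_t ratio = std::min(num_cores / num_streams, min_marks_for_concurrent_read / 8);
        if (ratio > 1)
        {
            num_streams *= ratio;
            min_marks_for_concurrent_read = (sum_marks + num_streams - 1) / num_streams;
        }
        return num_streams;
    }

On machines with many idle cores this trades a few large reads for more, smaller
ones, never going below the 8-mark floor per stream.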

* Release more num_streams if data is small

Change the minimum marks from 4 to 8, as the gain from a lower floor is small and
8 granules correspond to the default block size (8 * 8192 rows = 65536 rows, the
default max_block_size).

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>

---------

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
Author: Jiebin Sun <jiebin.sun@intel.com>
Date: 2023-10-17 00:41:38 +08:00 (committed by GitHub)
Commit: df17cd467b
Parent: cbdb62d389

@@ -718,7 +718,29 @@ Pipe ReadFromMergeTree::spreadMarkRangesAmongStreams(RangesInDataParts && parts_
 {
     /// Reduce the number of num_streams if the data is small.
     if (info.sum_marks < num_streams * info.min_marks_for_concurrent_read && parts_with_ranges.size() < num_streams)
-        num_streams = std::max((info.sum_marks + info.min_marks_for_concurrent_read - 1) / info.min_marks_for_concurrent_read, parts_with_ranges.size());
+    {
+        /*
+        If the data is fragmented, then allocate the size of parts to num_streams. If the data is not fragmented, besides the sum_marks and
+        min_marks_for_concurrent_read, involve the system cores to get the num_streams. Increase the num_streams and decrease the min_marks_for_concurrent_read
+        if the data is small but system has plentiful cores. It helps to improve the parallel performance of `MergeTreeRead` significantly.
+        Make sure the new num_streams `num_streams * increase_num_streams_ratio` will not exceed the previous calculated prev_num_streams.
+        The new info.min_marks_for_concurrent_read `info.min_marks_for_concurrent_read / increase_num_streams_ratio` should be larger than 8.
+        https://github.com/ClickHouse/ClickHouse/pull/53867
+        */
+        if ((info.sum_marks + info.min_marks_for_concurrent_read - 1) / info.min_marks_for_concurrent_read > parts_with_ranges.size())
+        {
+            const size_t prev_num_streams = num_streams;
+            num_streams = (info.sum_marks + info.min_marks_for_concurrent_read - 1) / info.min_marks_for_concurrent_read;
+            const size_t increase_num_streams_ratio = std::min(prev_num_streams / num_streams, info.min_marks_for_concurrent_read / 8);
+            if (increase_num_streams_ratio > 1)
+            {
+                num_streams = num_streams * increase_num_streams_ratio;
+                info.min_marks_for_concurrent_read = (info.sum_marks + num_streams - 1) / num_streams;
+            }
+        }
+        else
+            num_streams = parts_with_ranges.size();
+    }
 }
 
 auto read_type = is_parallel_reading_from_replicas ? ReadType::ParallelReplicas : ReadType::Default;
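
As a worked example of the new branch, with illustrative values not taken from the
PR: suppose sum_marks = 48, min_marks_for_concurrent_read = 24, a single part, and a
previously computed num_streams of 160 (e.g. 2x80 vCPUs). The old code would drop to
2 streams; the new code widens that to 6 streams of 8 marks each:

    #include <algorithm>
    #include <cassert>
    #include <cstddef>

    int main()
    {
        size_t sum_marks = 48, min_marks = 24, parts = 1, num_streams = 160;

        /// Old behaviour: ceil(48 / 24) = 2 streams, each reading ~24 marks.
        assert(std::max((sum_marks + min_marks - 1) / min_marks, parts) == 2);

        /// New behaviour: ceil(48 / 24) = 2 > parts, so try to widen again.
        const size_t prev_num_streams = num_streams;
        num_streams = (sum_marks + min_marks - 1) / min_marks;                        /// 2
        const size_t ratio = std::min(prev_num_streams / num_streams, min_marks / 8); /// min(80, 3) = 3
        if (ratio > 1)
        {
            num_streams *= ratio;                                     /// 6
            min_marks = (sum_marks + num_streams - 1) / num_streams;  /// ceil(48 / 6) = 8
        }
        assert(num_streams == 6 && min_marks == 8); /// 6 streams of 8 marks instead of 2 of 24
    }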