* Optimize the merge if all hashSets are singleLevel
In PR(https://github.com/ClickHouse/ClickHouse/pull/50748), it has added new phase
`parallelizeMergePrepare` before merge if all the hashSets are not all singleLevel
or not all twoLevel. Then it will convert all the singleLevelSet to twoLevelSet in
parallel, which will increase the CPU utilization and QPS.
But if all the hashtables are singleLevel, it could also benefit from the
`parallelizeMergePrepare` optimization in most cases if the hashtable size are not
too small. By tuning the Query `SELECT COUNT(DISTINCT SearchPhase) FROM hits_v1`
in different threads, we have got the mild threshold 6,000.
Test patch with the Query 'SELECT COUNT(DISTINCT Title) FROM hits_v1' on 2x80 vCPUs
server. If the threads are less than 48, the hashSets are all twoLevel or mixed by
singleLevel and twoLevel. If the threads are over 56, all the hashSets are singleLevel.
And the QPS has got at most 2.35x performance gain.
Threads Opt/Base
8 100.0%
16 99.4%
24 110.3%
32 99.9%
40 99.3%
48 99.8%
56 183.0%
64 234.7%
72 233.1%
80 229.9%
88 224.5%
96 229.6%
104 235.1%
112 229.5%
120 229.1%
128 217.8%
136 222.9%
144 217.8%
152 204.3%
160 203.2%
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
* Add the comment and explanation for PR#52973
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
---------
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
* Convert hashSets in parallel before merge
Before merge, if one of the lhs and rhs is singleLevelSet and the other is twoLevelSet,
then the SingleLevelSet will call convertToTwoLevel(). The convert process is not in parallel
and it will cost lots of cycle if it cosume all the singleLevelSet.
The idea of the patch is to convert all the singleLevelSets to twoLevelSets in parallel if
the hashsets are not all singleLevel or not all twoLevel.
I have tested the patch on Intel 2 x 112 vCPUs SPR server with clickbench and latest upstream
ClickHouse.
Q5 has got a big 264% performance improvement and 24 queries have got at least 5% performance
gain. The overall geomean of 43 queries has gained 7.4% more than the base code.
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
* add resize() for the data_vec in parallelizeMergePrepare()
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
* Add the performance test prepare_hash_before_merge.xml
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
* Fit the CI to rename the data set from hits_v1 to test.hits.
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
* remove the redundant branch in UniqExactSet
Co-authored-by: Nikita Taranov <nickita.taranov@gmail.com>
* Remove the empty methods and add throw exception in parallelizeMergePrepare()
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
---------
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
Co-authored-by: Nikita Taranov <nickita.taranov@gmail.com>