* feat: Preserve number of streams after evaluation the window functions to allow parallel stream processing
* fix style
* fix style
* fix style
* setting query_plan_preserve_num_streams_after_window_functions default true
* fix tests by SETTINGS query_plan_preserve_num_streams_after_window_functions=0
* fix test references
* Resize the streams after the last window function, to keep the order between WindowTransforms (and WindowTransform works on single stream anyway).
* feat: Preserve number of streams after evaluation the window functions to allow parallel stream processing
* fix style
* fix style
* fix style
* setting query_plan_preserve_num_streams_after_window_functions default true
* fix tests by SETTINGS query_plan_preserve_num_streams_after_window_functions=0
* fix test references
* Resize the streams after the last window function, to keep the order between WindowTransforms (and WindowTransform works on single stream anyway).
* add perf test
* perf: change the dataset from 50M to 5M
* rename query_plan_preserve_num_streams_after_window_functions -> query_plan_enable_multithreading_after_window_functions
* update test reference
* fix clang-tidy
---------
Co-authored-by: Nikita Taranov <nikita.taranov@clickhouse.com>
* Cache cast function in set during execution
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
* minor fix for performance test
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
* Update src/Interpreters/castColumn.cpp
Co-authored-by: Nikita Taranov <nickita.taranov@gmail.com>
* improvement
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
* fix use-after-free
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
---------
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
Co-authored-by: Nikita Taranov <nickita.taranov@gmail.com>
* Optimize the merge if all hashSets are singleLevel
In PR(https://github.com/ClickHouse/ClickHouse/pull/50748), it has added new phase
`parallelizeMergePrepare` before merge if all the hashSets are not all singleLevel
or not all twoLevel. Then it will convert all the singleLevelSet to twoLevelSet in
parallel, which will increase the CPU utilization and QPS.
But if all the hashtables are singleLevel, it could also benefit from the
`parallelizeMergePrepare` optimization in most cases if the hashtable size are not
too small. By tuning the Query `SELECT COUNT(DISTINCT SearchPhase) FROM hits_v1`
in different threads, we have got the mild threshold 6,000.
Test patch with the Query 'SELECT COUNT(DISTINCT Title) FROM hits_v1' on 2x80 vCPUs
server. If the threads are less than 48, the hashSets are all twoLevel or mixed by
singleLevel and twoLevel. If the threads are over 56, all the hashSets are singleLevel.
And the QPS has got at most 2.35x performance gain.
Threads Opt/Base
8 100.0%
16 99.4%
24 110.3%
32 99.9%
40 99.3%
48 99.8%
56 183.0%
64 234.7%
72 233.1%
80 229.9%
88 224.5%
96 229.6%
104 235.1%
112 229.5%
120 229.1%
128 217.8%
136 222.9%
144 217.8%
152 204.3%
160 203.2%
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
* Add the comment and explanation for PR#52973
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
---------
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>