* Maintain per-thread timer_id rather than create/delete frequently
The QueryProfiler frequently creates and deletes timer_ids globally, which
results in heavy kernel lock contention.
The idea is to maintain a thread-local timer_id. Before creating a
timer_id, check whether the thread already has one. The timer can then
be stopped with timer_settime() instead of deleting the timer_id with
timer_delete().
Applied the patch and ran ClickBench on the latest ClickHouse (65d671b7c7)
on SPR with 112 x 2 vCPUs. Queries 4, 0, 5, 3, 15 and 32 gain 17.5%, 14.4%,
8.3%, 7.9%, 7.1% and 5.8% in performance, respectively. The overall geomean
improves by 2.5%.
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
* Pack the timer and delete the timer_id when thread terminates
Pack the timer and related methods into the class. Delete the timer_id
when the thread terminates.
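A rough sketch of how such a class could tie the timer's lifetime to the
thread, assuming a thread_local instance whose destructor runs at thread
exit. The class name ThreadTimer, its methods, and the live_timers counter
are hypothetical and exist only to make the sketch observable; SIGEV_NONE
is again a simplification.

```cpp
#include <signal.h>
#include <time.h>
#include <atomic>
#include <cassert>
#include <optional>
#include <thread>

// Hypothetical counter, only for observing the sketch.
std::atomic<int> live_timers{0};

class ThreadTimer
{
public:
    ThreadTimer() = default;
    ThreadTimer(const ThreadTimer &) = delete;
    ThreadTimer & operator=(const ThreadTimer &) = delete;

    // Runs when the owning thread terminates (for a thread_local instance).
    ~ThreadTimer() { cleanup(); }

    bool createIfNecessary()
    {
        if (timer_id)
            return true;

        struct sigevent sev {};
        sev.sigev_notify = SIGEV_NONE; // assumption: no signal delivery here

        timer_t id;
        if (timer_create(CLOCK_MONOTONIC, &sev, &id) != 0)
            return false;

        timer_id = id;
        ++live_timers;
        return true;
    }

    // Disarm but keep the timer_id for later reuse.
    bool stop()
    {
        if (!timer_id)
            return true;
        struct itimerspec zero {};
        return timer_settime(*timer_id, 0, &zero, nullptr) == 0;
    }

    // Delete the timer_id; called once from the destructor.
    void cleanup()
    {
        if (!timer_id)
            return;
        [[maybe_unused]] int rc = timer_delete(*timer_id);
        assert(rc == 0);
        timer_id.reset();
        --live_timers;
    }

private:
    std::optional<timer_t> timer_id;
};

// One instance per thread; destroyed automatically at thread exit.
thread_local ThreadTimer thread_timer;
```

With this layout, timer_delete() is called at most once per thread rather
than once per query.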
According to the issue (ClickHouse#49965),
all of the SSB queries benefit from this optimization; some improve by
~30%, and the overall QPS improves by ~18%.
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
* Update src/Common/QueryProfiler.cpp
Co-authored-by: Azat Khuzhin <a3at.mail@gmail.com>
* Update src/Common/QueryProfiler.cpp
Co-authored-by: Azat Khuzhin <a3at.mail@gmail.com>
* Address review comments on the QueryProfiler Timer from PR
https://github.com/ClickHouse/ClickHouse/pull/48778.
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
* Update src/Common/QueryProfiler.cpp
Co-authored-by: Azat Khuzhin <a3at.mail@gmail.com>
* Add two separate CurrentMetrics for created and active timers
in QueryProfiler.
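The distinction between the two metrics can be modelled as follows. This is
only a bookkeeping sketch: the gauge names created_timers and active_timers
and the TimerGauges class are hypothetical stand-ins, not the real
CurrentMetrics entries, and no actual timers are involved.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical gauges standing in for the two metrics.
std::atomic<long> created_timers{0}; // timer_ids that currently exist
std::atomic<long> active_timers{0};  // timer_ids that are currently armed

class TimerGauges
{
public:
    ~TimerGauges() { onDelete(); }

    // A timer_id was created for this thread (idempotent).
    void onCreate()
    {
        if (!created)
        {
            created = true;
            ++created_timers;
        }
    }

    // The timer was armed; it must exist first.
    void onArm()
    {
        assert(created);
        if (!armed)
        {
            armed = true;
            ++active_timers;
        }
    }

    // The timer was stopped but its timer_id is kept.
    void onStop()
    {
        if (armed)
        {
            armed = false;
            --active_timers;
        }
    }

    // The timer_id was deleted, e.g. at thread termination.
    void onDelete()
    {
        onStop();
        if (created)
        {
            created = false;
            --created_timers;
        }
    }

private:
    bool created = false;
    bool armed = false;
};
```

In this model a stopped timer still counts as created but no longer as
active, which is why a single counter could not describe both states.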
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
---------
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
Co-authored-by: Azat Khuzhin <a3at.mail@gmail.com>
Co-authored-by: Nikita Taranov <nikita.taranov@clickhouse.com>
$ clickhouse-benchmark -i 10000 -d 0 --parallelize_output_from_storages=0 -q "select * from values('foo')"
Loaded 1 queries.
Queries executed: 10000.
localhost:9000, queries 10000, QPS: 2800.490, RPS: 2800.490, MiB/s: 0.032, result RPS: 2800.490, result MiB/s: 0.032.
0.000% 0.000 sec.
10.000% 0.000 sec.
20.000% 0.000 sec.
30.000% 0.000 sec.
40.000% 0.000 sec.
50.000% 0.000 sec.
60.000% 0.000 sec.
70.000% 0.000 sec.
80.000% 0.000 sec.
90.000% 0.000 sec.
95.000% 0.000 sec.
99.000% 0.001 sec.
99.900% 0.001 sec.
99.990% 0.001 sec.
$ clickhouse-benchmark -i 10000 -d 0 --parallelize_output_from_storages=1 -q "select * from values('foo')"
Loaded 1 queries.
Queries executed: 10000.
localhost:9000, queries 10000, QPS: 1259.805, RPS: 1259.805, MiB/s: 0.014, result RPS: 1259.805, result MiB/s: 0.014.
0.000% 0.001 sec.
10.000% 0.001 sec.
20.000% 0.001 sec.
30.000% 0.001 sec.
40.000% 0.001 sec.
50.000% 0.001 sec.
60.000% 0.001 sec.
70.000% 0.001 sec.
80.000% 0.001 sec.
90.000% 0.001 sec.
95.000% 0.001 sec.
99.000% 0.001 sec.
99.900% 0.001 sec.
99.990% 0.003 sec.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Adding more processors for parallelize_output_from_storages is not free
(I've run into issues in production because of this), and it is not easy
to fix properly, so let's disable it for now.
Before this patch:
- INSERT INTO input SELECT * FROM numbers(10e6) SETTINGS parallelize_output_from_storages=1, min_insert_block_size_rows=1000
0 rows in set. Elapsed: 3.648 sec. Processed 20.00 million rows, 120.00 MB (5.48 million rows/s., 32.90 MB/s.)
- INSERT INTO input SELECT * FROM numbers(10e6) SETTINGS parallelize_output_from_storages=0, min_insert_block_size_rows=1000
0 rows in set. Elapsed: 1.851 sec. Processed 20.00 million rows, 120.00 MB (10.80 million rows/s., 64.82 MB/s.)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>