SELECTs are affected by the mutations, since it tries to apply them on
fly, and scanning over existing mutations can take significant amount of
time (for simple queries, i.e. count())
And also even after mutation had been finished, it still a problem,
because mutations are not removed instantly.
So instead introduce an atomic counter alter_conversions_mutations, that
is incremented for new mutations and decremented once mutation
finished/killed, that way once the mutation finished they will not
affect queries.
Here are some numbers for non-RENAME mutations:
rmt vanilla w/o mutations | queries: 3693, QPS: 494.813
rmt vanilla w/ mutations | queries: 2190, QPS: 388.256
rmt patched w/o mutations | queries: 3168, QPS: 620.061
rmt patched w/ mutations | queries: 3155, QPS: 614.424
mt vanilla w/o mutations | queries: 3498, QPS: 656.399
mt vanilla w/ mutations | queries: 3821, QPS: 600.425
mt patched w/o mutations | queries: 5732, QPS: 745.585
mt patched w/ mutations | queries: 4719, QPS: 715.034
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
It looks redundant (added in 5ef51ed), though it has "fix tests" in the
log message, but CI reports is not available for the commits from that
PR [1], so let's try.
[1]: https://github.com/ClickHouse/ClickHouse/pull/37543
Also this can be a big problem, since the code under that lock
(throttling or quotas with previous implementation that uses
boost::atomic_shared_ptr) may sleep.
Some numbers:
run | time
------------------------|------
max_threads=100 before | 23.1
max_threads=100 after | 15.1
max_threads=4500 before | 4.5
max_threads=4500 after | 2.3
Query:
select sum(number) from numbers_mt(2000000) settings max_threads=X, max_block_size = 1
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Configure keeper for perf tests
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Improve performance of SELECTs with active mutations
getAlterMutationCommandsForPart() can be a hot path for query execution
when there are pending mutations.
- LOG_TEST - it is not only check one bool, but actually a bunch of
atomics as well.
- Return std::vector over std::map (map is not required there) - no
changes in performance.
- Copy only RENAME_COLUMN (since only this mutation is required by
AlterConversions).
And here are results:
run|result
-|-
SELECT w/o ALTER|queries: 1565, QPS: 355.259, RPS: 355.259
SELECT w/ ALTER unpatched|queries: 2099, QPS: 220.623, RPS: 220.623
SELECT w/ ALTER and w/o LOG_TEST|queries: 2730, QPS: 235.859, RPS: 235.859
SELECT w/ ALTER and w/o LOG_TEST and w/ RENAME_COLUMN only|queries: 2995, QPS: 290.982, RPS: 290.982
But there are still room for improvements, at least MergeTree engines
could implement getStorageSnapshotForQuery().
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Add AlterConversions::supportsMutationCommandType(), flatten vector<vector<MutationCommand>>
* Work around what appears to be a clang static analysis bug
---------
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Co-authored-by: Michael Kolupaev <michael.kolupaev@clickhouse.com>
* reuse result of functionfactory::get
* add perf test
* Update src/Functions/coalesce.cpp
Co-authored-by: János Benjamin Antal <antaljanosbenjamin@users.noreply.github.com>
* change as requested
---------
Co-authored-by: János Benjamin Antal <antaljanosbenjamin@users.noreply.github.com>
Before perf tests was relying on the following:
SELECT sumMap(['foo', 'bar'], [-0., -0.])
┌─sumMap(['foo', 'bar'], [-0., -0.])─┐
│ (['bar','foo'],[-0,-0]) │
└────────────────────────────────────┘
While it got changed, and now:
┌─sumMap(['foo', 'bar'], [-0., -0.])─┐
│ ([],[]) │
└────────────────────────────────────┘
But it works for nan:
SELECT sumMap(['foo', 'bar'], [nan, nan])
┌─sumMap(['foo', 'bar'], [nan, nan])─┐
│ (['bar','foo'],[nan,nan]) │
└────────────────────────────────────┘
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>