Mirror of https://github.com/ClickHouse/ClickHouse.git (synced 2024-11-21 23:21:59 +00:00)

Commit 35231662b3:
While reading from AggregatingMergeTree with SimpleAggregateFunction(String) in primary key and optimize_aggregation_in_order, perf top shows:

```
Samples: 1M of event 'cycles', 4000 Hz, Event count (approx.): 287759760270 lost: 0/0 drop: 0/0
  Children      Self  Shared Object  Symbol
+   12.64%    11.39%  clickhouse     [.] memcpy
+    9.08%     0.23%  [unknown]      [.] 0000000000000000
+    8.45%     8.40%  clickhouse     [.] ProfileEvents::increment  # <-- this, and in debug it has not 0.08x overhead, but 5.8x overhead
+    7.68%     7.67%  clickhouse     [.] LZ4_compress_fast_extState
+    5.29%     5.22%  clickhouse     [.] DB::IAggregateFunctionHelper<DB::AggregateFunctionNullUnary<true, true> >::addFree
```

The reason is obvious: ProfileEvents are atomic counters (and they are also nested):

<details>

```
Samples: 7M of event 'cycles', 4000 Hz, Event count (approx.): 450726149337
ProfileEvents::increment  /usr/bin/clickhouse [Percent: local period]
Percent│
       │
       │
       │    Disassembly of section .text:
       │
       │    00000000078d8900 <ProfileEvents::increment(unsigned long, unsigned long)@@Base>:
       │    ProfileEvents::increment(unsigned long, unsigned long):
  0.17 │      push  %rbp
  0.00 │      mov   %rsi,%rbp
  0.04 │      push  %rbx
  0.20 │      mov   %rdi,%rbx
  0.17 │      sub   $0x8,%rsp
  0.26 │    → callq DB::CurrentThread::getProfileEvents
       │    ProfileEvents::Counters::increment(unsigned long, unsigned long):
  0.00 │      lea   0x0(,%rbx,8),%rdi
  0.05 │      nop
       │    unsigned long std::__1::__cxx_atomic_fetch_add<unsigned long, unsigned long>(std::__1::__cxx_atomic_base_impl<unsigned long>*, unsigned long, std::__1::memory_order):
  1.02 │      mov   (%rax),%rdx
 97.04 │      lock  add %rbp,(%rdx,%rdi,1)
       │    ProfileEvents::Counters::increment(unsigned long, unsigned long):
  0.21 │      mov   0x10(%rax),%rax
  0.04 │      test  %rax,%rax
  0.00 │    → jne   78d8920 <ProfileEvents::increment(unsigned long, unsigned long)@@Base+0x20>
       │    ProfileEvents::increment(unsigned long, unsigned long):
  0.38 │      add   $0x8,%rsp
  0.00 │      pop   %rbx
  0.04 │      pop   %rbp
  0.38 │    ← retq
```

</details>

These ProfileEvents were ArenaAllocChunks (at ~1.5M events per second), and the reason is that the table has SimpleAggregateFunction(String) in the PK, which requires an Arena. But most of the time the Arena wasn't even used, so avoid this cost by re-creating the Arena only if it was "used" (i.e. has new chunks). Another possibility is to avoid populating Arena::head in the ctor, but that would make the Arena code more complex, so for now this approach was preferred. Also, as a long-term solution, it is worth looking at implementing these counters via RCU (to move the extra overhead out of the write code path into the read side).
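The fix described above (re-create the Arena only if it actually allocated new chunks) can be sketched as follows. This is a simplified, standalone illustration; the `Arena` class and `refreshArenaIfUsed` helper here are hypothetical and much simpler than ClickHouse's real Arena:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical minimal arena: tracks how many chunks it has allocated.
// (ClickHouse's real Arena is far more elaborate; names are illustrative.)
class Arena
{
public:
    char * alloc(size_t size)
    {
        chunks.push_back(std::make_unique<char[]>(size));
        return chunks.back().get();
    }

    size_t chunkCount() const { return chunks.size(); }

private:
    std::vector<std::unique_ptr<char[]>> chunks;
};

/// The idea from the commit: instead of unconditionally re-creating the
/// arena between blocks (which allocates a fresh head chunk and bumps the
/// ArenaAllocChunks profile counter every time), re-create it only when
/// the previous arena was actually used.
std::unique_ptr<Arena> refreshArenaIfUsed(std::unique_ptr<Arena> arena, size_t initial_chunks)
{
    if (arena->chunkCount() > initial_chunks)
        return std::make_unique<Arena>();  // was used: start fresh
    return arena;                          // untouched: keep it, skip the allocation cost
}
```

If the workload never touches the arena (e.g. fixed-size aggregation states), `refreshArenaIfUsed` keeps returning the same untouched object, and no per-block allocation or counter increment happens.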
Files in this directory:

agg_functions_min_max_any.xml
aggregate_functions_of_group_by_keys.xml
aggregating_merge_tree_simple_aggregate_function_string.xml
aggregating_merge_tree.xml
aggregation_in_order.xml
analyze_array_tuples.xml
and_function.xml
any_anyLast.xml
arithmetic_operations_in_aggr_func.xml
arithmetic.xml
array_auc.xml
array_element.xml
array_fill.xml
array_index_low_cardinality_numbers.xml
array_index_low_cardinality_strings.xml
array_join.xml
array_reduce.xml
base64_hits.xml
base64.xml
basename.xml
bigint_arithm.xml
bit_operations_fixed_string_numbers.xml
bit_operations_fixed_string.xml
bitCount.xml
bloom_filter_insert.xml
bloom_filter_select.xml
bounding_ratio.xml
cidr.xml
codecs_float_insert.xml
codecs_float_select.xml
codecs_int_insert.xml
codecs_int_select.xml
collations.xml
column_column_comparison.xml
columns_hashing.xml
complex_array_creation.xml
concat_hits.xml
conditional.xml
consistent_hashes.xml
constant_column_comparison.xml
constant_column_search.xml
count.xml
cpu_synthetic.xml
cryptographic_hashes.xml
date_parsing.xml
date_time_64.xml
date_time_long.xml
date_time_short.xml
datetime_comparison.xml
decimal_aggregates.xml
decimal_casts.xml
decimal_parse.xml
distinct_combinator.xml
distributed_aggregation_memory_efficient.xml
distributed_aggregation.xml
duplicate_order_by_and_distinct.xml
early_constant_folding.xml
empty_string_deserialization.xml
empty_string_serialization.xml
encrypt_decrypt_empty_string_slow.xml
encrypt_decrypt_empty_string.xml
encrypt_decrypt_slow.xml
encrypt_decrypt.xml
entropy.xml
extract.xml
first_significant_subdomain.xml
fixed_string16.xml
float_formatting.xml
float_mod.xml
float_parsing.xml
format_date_time.xml
format_readable.xml
functions_coding.xml
functions_geo.xml
functions_with_hash_tables.xml
fuzz_bits.xml
general_purpose_hashes_on_UUID.xml
general_purpose_hashes.xml
generate_table_function.xml
great_circle_dist.xml
group_array_moving_sum.xml
group_by_sundy_li.xml
h3.xml
if_array_num.xml
if_array_string.xml
if_string_const.xml
if_string_hits.xml
if_to_multiif.xml
if_transform_strings_to_enum.xml
information_value.xml
injective_functions_inside_uniq.xml
insert_parallel.xml
insert_select_default_small_block.xml
insert_sequential_and_background_merges.xml
insert_values_with_expressions.xml
inserts_arrays_lowcardinality.xml
int_parsing.xml
IPv4.xml
IPv6.xml
jit_large_requests.xml
jit_small_requests.xml
joins_in_memory_pmj.xml
joins_in_memory.xml
json_extract_rapidjson.xml
json_extract_simdjson.xml
least_greatest_hits.xml
leftpad.xml
linear_regression.xml
local_replica.xml
logical_functions_large.xml
logical_functions_medium.xml
logical_functions_small.xml
materialized_view_parallel_insert.xml
math.xml
merge_table_streams.xml
merge_tree_huge_pk.xml
merge_tree_many_partitions_2.xml
merge_tree_many_partitions.xml
merge_tree_simple_select.xml
mingroupby-orderbylimit1.xml
modulo.xml
monotonous_order_by.xml
ngram_distance.xml
number_formatting_formats.xml
nyc_taxi.xml
optimized_select_final.xml
or_null_default.xml
order_by_decimals.xml
order_by_read_in_order.xml
order_by_single_column.xml
order_with_limit.xml
parallel_final.xml
parallel_index.xml
parallel_insert.xml
parallel_mv.xml
parse_engine_file.xml
point_in_polygon_const.xml
point_in_polygon.xml
polymorphic_parts_l.xml
polymorphic_parts_m.xml
polymorphic_parts_s.xml
pre_limit_no_sorting.xml
prewhere.xml
push_down_limit.xml
quantile_merge.xml
questdb_sum_float32.xml
questdb_sum_float64.xml
questdb_sum_int32.xml
rand.xml
random_fixed_string.xml
random_printable_ascii.xml
random_string_utf8.xml
random_string.xml
range.xml
read_from_comp_parts.xml
read_hits_with_aio.xml
read_in_order_many_parts.xml
README.md
redundant_functions_in_order_by.xml
removing_group_by_keys.xml
right.xml
round_down.xml
round_methods.xml
scalar.xml
select_format.xml
set_hits.xml
set_index.xml
set.xml
simple_join_query.xml
single_fixed_string_groupby.xml
slices_hits.xml
sort_radix_trivial.xml
sort.xml
string_join.xml
string_set.xml
string_sort.xml
sum_map.xml
sum.xml
synthetic_hardware_benchmark.xml
trim_numbers.xml
trim_urls.xml
trim_whitespace.xml
uniq.xml
url_hits.xml
vectorize_aggregation_combinators.xml
visit_param_extract_raw.xml
website.xml
# ClickHouse performance tests

This directory contains `.xml`-files with performance tests for the @akuzm tool.
## How to write performance test

First of all you should check that existing tests don't cover your case. If there are no such tests, then you should write your own.

You have to specify `preconditions`. They contain table names. Only `hits_100m_single`, `hits_10m_single` and `test.hits` are available in CI.
You can use `substitutions`, `create`, `fill` and `drop` queries to prepare a test. You can find examples in this folder.
Take into account that these tests will run in CI, which consists of machines with 56 cores and 512 GB of RAM. Queries will be executed much faster than on a local laptop.

If your test runs longer than 10 minutes, please add the tag `long`, so that it is possible to run all tests while skipping the long ones.
## How to run performance test

TODO @akuzm
## How to validate single test

```
pip3 install clickhouse_driver
../../docker/test/performance-comparison/perf.py --runs 1 insert_parallel.xml
```