ClickHouse/tests/performance
Azat Khuzhin 35231662b3 Improve performance of AggregatingMergeTree w/ SimpleAggregateFunction(String)
While reading from an AggregatingMergeTree table with
SimpleAggregateFunction(String) in the primary key and
optimize_aggregation_in_order enabled, perf top shows:

    Samples: 1M of event 'cycles', 4000 Hz, Event count (approx.): 287759760270 lost: 0/0 drop: 0/0
      Children      Self  Shared Object         Symbol
    +   12.64%    11.39%  clickhouse            [.] memcpy
    +    9.08%     0.23%  [unknown]             [.] 0000000000000000
    +    8.45%     8.40%  clickhouse            [.] ProfileEvents::increment    # <-- this, and in debug it has not 0.08x overhead, but 5.8x overhead
    +    7.68%     7.67%  clickhouse            [.] LZ4_compress_fast_extState
    +    5.29%     5.22%  clickhouse            [.] DB::IAggregateFunctionHelper<DB::AggregateFunctionNullUnary<true, true> >::addFree

The reason is clear: ProfileEvents are atomic counters (and they are
also nested):

<details>

```
    Samples: 7M of event 'cycles', 4000 Hz, Event count (approx.): 450726149337
    ProfileEvents::increment  /usr/bin/clickhouse [Percent: local period]
    Percent│
           │
           │
           │    Disassembly of section .text:
           │
           │    00000000078d8900 <ProfileEvents::increment(unsigned long, unsigned long)@@Base>:
           │    ProfileEvents::increment(unsigned long, unsigned long):
      0.17 │      push  %rbp
      0.00 │      mov   %rsi,%rbp
      0.04 │      push  %rbx
      0.20 │      mov   %rdi,%rbx
      0.17 │      sub   $0x8,%rsp
      0.26 │    → callq DB::CurrentThread::getProfileEvents
           │    ProfileEvents::Counters::increment(unsigned long, unsigned long):
      0.00 │      lea   0x0(,%rbx,8),%rdi
      0.05 │      nop
           │    unsigned long std::__1::__cxx_atomic_fetch_add<unsigned long, unsigned long>(std::__1::__cxx_atomic_base_impl<unsigned long>*, unsigned long, std::__1::memory_order):
      1.02 │      mov   (%rax),%rdx
     97.04 │      lock  add   %rbp,(%rdx,%rdi,1)
           │    ProfileEvents::Counters::increment(unsigned long, unsigned long):
      0.21 │      mov   0x10(%rax),%rax
      0.04 │      test  %rax,%rax
      0.00 │    → jne   78d8920 <ProfileEvents::increment(unsigned long, unsigned long)@@Base+0x20>
           │    ProfileEvents::increment(unsigned long, unsigned long):
      0.38 │      add   $0x8,%rsp
      0.00 │      pop   %rbx
      0.04 │      pop   %rbp
      0.38 │    ← retq
```

</details>
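
A minimal sketch of why nesting multiplies the cost, assuming a simplified
counter model: `NestedCounters`, `num_events` and `parent` are hypothetical
names for illustration and do not mirror the real ClickHouse classes. Each
increment walks the chain of parents and performs one atomic
read-modify-write per level, which on x86-64 typically compiles to the
`lock add` seen in the disassembly above.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified model of nested atomic counters.
struct NestedCounters
{
    static constexpr std::size_t num_events = 4;

    // One atomic slot per event, zero-initialized.
    std::atomic<std::uint64_t> values[num_events] = {};

    // Next level up (for example thread -> query -> global); nullptr at the top.
    NestedCounters * parent = nullptr;

    void increment(std::size_t event, std::uint64_t amount = 1)
    {
        // One atomic add per nesting level, so a hot call site pays the
        // synchronization cost several times per event.
        for (NestedCounters * current = this; current != nullptr; current = current->parent)
            current->values[event].fetch_add(amount, std::memory_order_relaxed);
    }
};
```

With a two-level chain, a single `increment` call updates both the local and
the parent counter, so the number of atomic operations grows with the nesting
depth.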

The ProfileEvent in question was ArenaAllocChunks (~1.5M events per
second). The reason is that the table has
SimpleAggregateFunction(String) in the PK, which requires an Arena.
But most of the time the Arena wasn't even used, so avoid this cost by
re-creating the Arena only if it was "used" (i.e. has new chunks).
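
The idea can be sketched as follows. This is a toy model under stated
assumptions, not the actual patch: `ToyArena`, `hasNewChunks`, `startGroup`
and the global `g_arena_alloc_chunks` counter are hypothetical names that
stand in for the real Arena and the ArenaAllocChunks event.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the ArenaAllocChunks profile event.
std::size_t g_arena_alloc_chunks = 0;

// Toy arena that, like the Arena described above, allocates its first
// chunk in the constructor.
class ToyArena
{
public:
    explicit ToyArena(std::size_t chunk_size = 4096) : chunk_size_(chunk_size)
    {
        addChunk(chunk_size_);
    }

    char * alloc(std::size_t size)
    {
        // Start a new chunk when the current one cannot hold the request.
        if (used_ + size > chunks_.back().size())
            addChunk(size > chunk_size_ ? size : chunk_size_);
        char * result = chunks_.back().data() + used_;
        used_ += size;
        return result;
    }

    // "Used" in the sense of the text above: a chunk was added after construction.
    bool hasNewChunks() const { return chunks_.size() > 1; }

private:
    void addChunk(std::size_t size)
    {
        chunks_.emplace_back(size);
        used_ = 0;
        ++g_arena_alloc_chunks; // every chunk allocation bumps the counter
    }

    std::size_t chunk_size_;
    std::size_t used_ = 0;
    std::vector<std::vector<char>> chunks_;
};

// Called at the start of every group: re-create the arena only if the
// previous group actually grew it.
void startGroup(std::unique_ptr<ToyArena> & arena)
{
    if (!arena || arena->hasNewChunks())
        arena = std::make_unique<ToyArena>();
}
```

In this model, calling `startGroup` for many groups that never grow the arena
allocates a single chunk in total, instead of one chunk (and one counter
increment) per group.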

Another possibility is to avoid populating Arena::head in the ctor, but
this would make the Arena code more complex, so for now the approach
above was preferred.

Also, as a long-term solution it is worth looking at implementing
ProfileEvents via RCU (to move the extra overhead from the write code
path to the read side).
2020-11-19 23:06:12 +03:00
agg_functions_min_max_any.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
aggregate_functions_of_group_by_keys.xml Eliminate min/max/any aggregators of GROUP BY keys in SELECT section (#11667) 2020-06-17 12:32:43 +03:00
aggregating_merge_tree_simple_aggregate_function_string.xml Improve performance of AggregatingMergeTree w/ SimpleAggregateFunction(String) 2020-11-19 23:06:12 +03:00
aggregating_merge_tree.xml Update aggregating_merge_tree.xml 2020-06-09 01:41:57 +03:00
aggregation_in_order.xml better test 2020-05-31 03:00:16 +03:00
analyze_array_tuples.xml Explicitly mark short perftest queries 2020-06-23 15:09:54 +03:00
and_function.xml Remove garbage from performance tests 2020-04-18 15:54:16 +03:00
any_anyLast.xml more fixes 2020-10-30 18:54:38 +03:00
arithmetic_operations_in_aggr_func.xml fixup 2020-07-03 11:39:43 +03:00
arithmetic.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
array_auc.xml Remove garbage from performance tests 2020-04-18 15:54:16 +03:00
array_element.xml More stable perf tests 2020-06-08 16:57:33 +03:00
array_fill.xml performance comparison 2020-04-16 23:55:21 +03:00
array_index_low_cardinality_numbers.xml In perf test array_index_lc: adjusted iterations count 2020-08-24 12:02:48 +03:00
array_index_low_cardinality_strings.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
array_join.xml Remove garbage from performance tests 2020-04-18 15:54:16 +03:00
array_reduce.xml Remove garbage from performance tests 2020-04-18 15:54:16 +03:00
base64_hits.xml fixup 2020-06-09 16:29:07 +03:00
base64.xml Remove garbage from performance tests 2020-04-18 15:54:16 +03:00
basename.xml fixup 2020-06-09 16:29:07 +03:00
bigint_arithm.xml Faster 256-bit multiplication (#15418) 2020-09-29 20:52:34 +03:00
bit_operations_fixed_string_numbers.xml More stable perf tests 2020-06-08 16:57:33 +03:00
bit_operations_fixed_string.xml fixup 2020-06-09 01:13:08 +03:00
bitCount.xml Remove garbage from performance tests 2020-04-18 15:54:16 +03:00
bloom_filter_insert.xml Fix perf tests 2020-11-08 16:55:26 +03:00
bloom_filter_select.xml Update bloom_filter_select.xml 2020-11-09 17:58:27 +03:00
bounding_ratio.xml Explicitly mark short perftest queries 2020-06-23 15:09:54 +03:00
cidr.xml Explicitly mark short perftest queries 2020-06-23 15:09:54 +03:00
codecs_float_insert.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
codecs_float_select.xml Explicitly mark short perftest queries 2020-06-23 15:09:54 +03:00
codecs_int_insert.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
codecs_int_select.xml Explicitly mark short perftest queries 2020-06-23 15:09:54 +03:00
collations.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
column_column_comparison.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
columns_hashing.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
complex_array_creation.xml fix more queries 2020-06-26 22:11:43 +03:00
concat_hits.xml Explicitly mark short perftest queries 2020-06-23 15:09:54 +03:00
conditional.xml fix more queries 2020-06-26 22:11:43 +03:00
consistent_hashes.xml Remove perf test of sumbur hash, because we do not care 2020-11-08 21:17:30 +03:00
constant_column_comparison.xml Remove garbage from performance tests 2020-04-18 15:54:16 +03:00
constant_column_search.xml fixes 2020-11-10 07:19:43 +03:00
count.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
cpu_synthetic.xml Translate comments to english 2020-11-05 21:51:36 +03:00
cryptographic_hashes.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
date_parsing.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
date_time_64.xml fixup 2020-07-02 10:44:16 +03:00
date_time_long.xml Added tests for date_trunc function. 2020-08-25 16:41:23 -07:00
date_time_short.xml Added tests for date_trunc function. 2020-08-25 16:41:23 -07:00
datetime_comparison.xml Fix datetime comparison 2020-09-23 10:29:55 +08:00
decimal_aggregates.xml Adjust ignore thresholds for unstable perf tests 2020-09-16 18:27:51 +03:00
decimal_casts.xml more fixes 2020-10-30 18:54:38 +03:00
decimal_parse.xml Added performance test 2020-05-17 06:16:34 +03:00
distinct_combinator.xml more optimal aggregate functions with both 'if' and 'distinct' combinators 2020-06-22 17:57:30 +03:00
distributed_aggregation_memory_efficient.xml Try fix perftest. 2020-05-06 21:53:40 +03:00
distributed_aggregation.xml Merge remote-tracking branch 'origin/master' into HEAD 2020-04-16 23:03:17 +03:00
duplicate_order_by_and_distinct.xml fix more queries 2020-06-26 22:11:43 +03:00
early_constant_folding.xml fix more queries 2020-06-26 22:11:43 +03:00
empty_string_deserialization.xml fixup 2020-07-02 10:44:16 +03:00
empty_string_serialization.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
encrypt_decrypt_empty_string_slow.xml AES: Attempt to make performance tests faster and more stable. 2020-10-20 08:05:39 +03:00
encrypt_decrypt_empty_string.xml AES: Attempt to make performance tests faster and more stable. 2020-10-20 08:05:39 +03:00
encrypt_decrypt_slow.xml Try 12 threads max in perf test. 2020-11-09 19:14:05 +03:00
encrypt_decrypt.xml AES: Attempt to make performance tests faster and more stable. 2020-10-20 08:05:39 +03:00
entropy.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
extract.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
first_significant_subdomain.xml Remove garbage from performance tests 2020-04-18 15:54:16 +03:00
fixed_string16.xml fix more queries 2020-06-26 22:11:43 +03:00
float_formatting.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
float_mod.xml Add performance test 2020-07-21 13:38:14 +03:00
float_parsing.xml fixup 2020-07-02 10:44:16 +03:00
format_date_time.xml fix more queries 2020-06-26 22:11:43 +03:00
format_readable.xml more fixes 2020-10-30 18:54:38 +03:00
functions_coding.xml fix more queries 2020-06-26 22:11:43 +03:00
functions_geo.xml Remove garbage from performance tests 2020-04-18 15:54:16 +03:00
functions_with_hash_tables.xml more fixes 2020-10-30 18:54:38 +03:00
fuzz_bits.xml fix issues 2020-05-29 05:06:21 +03:00
general_purpose_hashes_on_UUID.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
general_purpose_hashes.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
generate_table_function.xml Try 12 threads max in perf test. 2020-11-09 19:14:05 +03:00
great_circle_dist.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
group_array_moving_sum.xml performance comparison 2020-04-29 09:30:37 +03:00
group_by_sundy_li.xml Update group_by_sundy_li.xml 2020-11-04 18:07:18 +03:00
h3.xml Remove garbage from performance tests 2020-04-18 15:54:16 +03:00
if_array_num.xml Remove garbage from performance tests 2020-04-18 15:54:16 +03:00
if_array_string.xml Adjust ignore thresholds for unstable perf tests 2020-09-16 18:27:51 +03:00
if_string_const.xml fix more queries 2020-06-26 22:11:43 +03:00
if_string_hits.xml fix more queries 2020-06-26 22:11:43 +03:00
if_to_multiif.xml fixup 2020-07-02 10:44:16 +03:00
if_transform_strings_to_enum.xml Try to make if_transform_strings perf test faster (#12598) 2020-07-21 13:34:55 +03:00
information_value.xml Remove garbage from performance tests 2020-04-18 15:54:16 +03:00
injective_functions_inside_uniq.xml Delete injective functions inside uniq (#12337) 2020-07-10 13:42:41 +03:00
insert_parallel.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
insert_select_default_small_block.xml Add performance test 2020-05-29 05:30:49 +03:00
insert_sequential_and_background_merges.xml Rename test 2020-10-23 13:42:15 +03:00
insert_values_with_expressions.xml Update insert_values_with_expressions.xml 2020-09-22 13:23:10 +03:00
inserts_arrays_lowcardinality.xml more fixes 2020-10-30 18:54:38 +03:00
int_parsing.xml Update int_parsing.xml 2020-07-08 12:40:33 +03:00
IPv4.xml fix more queries 2020-06-26 22:11:43 +03:00
IPv6.xml fix more queries 2020-06-26 22:11:43 +03:00
jit_large_requests.xml more fixes 2020-10-30 18:54:38 +03:00
jit_small_requests.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
joins_in_memory_pmj.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
joins_in_memory.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
json_extract_rapidjson.xml Remove garbage from performance tests 2020-04-18 15:54:16 +03:00
json_extract_simdjson.xml Remove garbage from performance tests 2020-04-18 15:54:16 +03:00
least_greatest_hits.xml fixes 2020-07-09 13:46:16 +03:00
leftpad.xml fixup 2020-07-02 10:44:16 +03:00
linear_regression.xml More robust 2020-08-24 03:14:24 +03:00
local_replica.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
logical_functions_large.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
logical_functions_medium.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
logical_functions_small.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
materialized_view_parallel_insert.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
math.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
merge_table_streams.xml Update merge_table_streams.xml 2020-05-23 23:45:28 +03:00
merge_tree_huge_pk.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
merge_tree_many_partitions_2.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
merge_tree_many_partitions.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
merge_tree_simple_select.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
mingroupby-orderbylimit1.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
modulo.xml Remove garbage from performance tests 2020-04-18 15:54:16 +03:00
monotonous_order_by.xml Optimize monotonous ORDER BY (#12467) 2020-07-15 13:10:21 +03:00
ngram_distance.xml Remove garbage from performance tests 2020-04-18 15:54:16 +03:00
number_formatting_formats.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
nyc_taxi.xml Edited strange test 2020-05-23 17:53:54 +03:00
optimized_select_final.xml Update optimized_select_final.xml 2020-10-30 17:48:59 +03:00
or_null_default.xml Add a test 2020-11-04 14:00:43 +03:00
order_by_decimals.xml Updated perftests. 2020-06-17 13:00:28 +03:00
order_by_read_in_order.xml more fixes 2020-10-30 18:54:38 +03:00
order_by_single_column.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
order_with_limit.xml Merge branch 'master' into remove-useless-setting-from-perf-test 2020-07-14 09:31:34 +03:00
parallel_final.xml Try make tests faster. 2020-04-30 20:58:09 +03:00
parallel_index.xml Don't account for short queries, we'll deal with them separately. 2020-09-17 13:00:51 +03:00
parallel_insert.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
parallel_mv.xml more fixes 2020-10-30 18:54:38 +03:00
parse_engine_file.xml Try 12 threads max in perf test. 2020-11-09 19:14:05 +03:00
point_in_polygon_const.xml fix more queries 2020-06-26 22:11:43 +03:00
point_in_polygon.xml Disable "optimize_trivial_insert_select" for one perf test 2020-08-26 01:33:31 +03:00
polymorphic_parts_l.xml Update polymorphic_parts_l.xml 2020-07-03 11:46:48 +03:00
polymorphic_parts_m.xml Update polymorphic_parts_m.xml 2020-07-03 11:46:29 +03:00
polymorphic_parts_s.xml Merge remote-tracking branch 'origin' into CurtizJ-polymorphic-parts 2020-07-06 21:34:17 +03:00
pre_limit_no_sorting.xml Added performance test to resemble questionable benchmark 2020-06-25 23:40:07 +03:00
prewhere.xml Added performance test to resemble questionable benchmark 2020-06-25 23:40:07 +03:00
push_down_limit.xml Marked some perf test queries as short 2020-11-11 19:58:31 +03:00
quantile_merge.xml Marked some perf test queries as short 2020-11-11 19:58:31 +03:00
questdb_sum_float32.xml split 2020-07-07 11:01:50 +03:00
questdb_sum_float64.xml too slow 2020-07-07 20:07:04 +03:00
questdb_sum_int32.xml split 2020-07-07 11:01:50 +03:00
rand.xml Try 12 threads max in perf test. 2020-11-09 19:14:05 +03:00
random_fixed_string.xml fixup 2020-07-02 10:44:16 +03:00
random_printable_ascii.xml Adjust ignore thresholds for unstable perf tests 2020-09-16 18:27:51 +03:00
random_string_utf8.xml more fixes 2020-10-30 18:54:38 +03:00
random_string.xml more fixes 2020-10-30 18:54:38 +03:00
range.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
read_from_comp_parts.xml Mark query short 2020-07-24 16:59:50 +03:00
read_hits_with_aio.xml Remove garbage from performance tests 2020-04-18 15:54:16 +03:00
read_in_order_many_parts.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
README.md Update instruction for perf tests 2020-08-26 20:34:05 +03:00
redundant_functions_in_order_by.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
removing_group_by_keys.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
right.xml Remove garbage from performance tests 2020-04-18 15:54:16 +03:00
round_down.xml make the test longer 2020-10-30 18:16:30 +03:00
round_methods.xml fixup 2020-07-02 10:44:16 +03:00
scalar.xml Remove garbage from performance tests 2020-04-18 15:54:16 +03:00
select_format.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
set_hits.xml fixup 2020-06-10 02:37:48 +03:00
set_index.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
set.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
simple_join_query.xml Remove garbage from performance tests 2020-04-18 15:54:16 +03:00
single_fixed_string_groupby.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
slices_hits.xml fix more queries 2020-06-26 22:11:43 +03:00
sort_radix_trivial.xml fix some broken performance tests 2020-05-28 10:45:03 +03:00
sort.xml Add performance test #10981 2020-05-23 17:41:13 +03:00
string_join.xml Edited strange test 2020-05-23 17:53:54 +03:00
string_set.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
string_sort.xml Try 12 threads max in perf test. 2020-11-09 19:14:05 +03:00
sum_map.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
sum.xml Add performance test 2020-05-18 07:55:06 +03:00
synthetic_hardware_benchmark.xml Try 12 threads max in perf test. 2020-11-09 19:14:05 +03:00
trim_numbers.xml Remove garbage from performance tests 2020-04-18 15:54:16 +03:00
trim_urls.xml fix 2020-11-13 08:00:43 +03:00
trim_whitespace.xml Remove garbage from performance tests 2020-04-18 15:54:16 +03:00
uniq.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00
url_hits.xml Remove garbage from performance tests 2020-04-18 15:54:16 +03:00
vectorize_aggregation_combinators.xml Added performance test to resemble questionable benchmark 2020-06-25 23:40:07 +03:00
visit_param_extract_raw.xml performance comparison 2020-04-28 10:45:56 +03:00
website.xml Adjust ignored perf test changes after NUMA binding 2020-10-30 18:12:15 +03:00

ClickHouse performance tests

This directory contains .xml files with performance tests for @akuzm's tool.

How to write performance test

First of all, check that existing tests don't already cover your case. If no such test exists, write your own.

You have to specify preconditions, which contain the names of the tables the test uses. Only hits_100m_single, hits_10m_single and test.hits are available in CI.

You can use substitutions and create, fill and drop queries to prepare a test. You can find examples in this folder.

Take into account that these tests run in CI on machines with 56 cores and 512 GB of RAM. Queries will execute much faster there than on a local laptop.

If your test runs for more than 10 minutes, please add the tag long, so that it is possible to run all tests while skipping the long ones.

How to run performance test

TODO @akuzm

How to validate single test

pip3 install clickhouse_driver
../../docker/test/performance-comparison/perf.py --runs 1 insert_parallel.xml