Commit Graph

1240 Commits

Author SHA1 Message Date
taiyang-li
56db02c953 Merge branch 'master' into improve_if_with_floating 2024-01-26 19:21:49 +08:00
taiyang-li
e8629cf4f5 add another perf and tests 2024-01-25 18:02:55 +08:00
taiyang-li
a657a2631f add perf tests 2024-01-24 19:58:07 +08:00
Robert Schulze
a4c6f87fb9
Further reduce runtime of norm_distance.xml 2024-01-23 10:52:52 +00:00
Raúl Marín
1e00dec997 Merge remote-tracking branch 'blessed/master' into argmin_optimization 2024-01-22 14:56:55 +01:00
Dmitry Novik
a18a8d8ea3
Merge pull request #59009 from nickitat/uniq_optimisation_for_distributed
`uniqExact` state parallel merging for distributed queries
2024-01-22 14:05:02 +01:00
Nikita Taranov
2b5482be8c add perf test 2024-01-19 17:57:11 +01:00
Raúl Marín
3739d46817 Merge remote-tracking branch 'blessed/master' into argmin_optimization 2024-01-19 14:43:44 +01:00
Robert Schulze
aa2d36e598
Reduce memory consumption of norm_distance.xml
The test data was stored in in-memory Array and Tuple tables, each
consuming about 80 GB. The population of the Tuple tables exceeded the
maximum memory thresholds (*).

As a fix,
- Reduce the Array cardinality from 200 to 150 and the number of rows
  from 10 to 8 million, reducing memory consumption to ca. 50 GB. This
  should pass the memory threshold and does not affect the test purpose.
- Don't test tuples. Due to their data layout, vector search is not
  efficient anyways. Saves another 80 GB.
2024-01-19 13:09:15 +00:00
Robert Schulze
180a07ee4b
Delete redundant test norm_distance_float.xml
- Duplicates the workload in norm_distance.xml
2024-01-19 13:09:14 +00:00
Robert Schulze
25fb44f16d
Extend performance test norm_dist.xml
Dimension = 200 (instead of 10) is more realistic for vector search use cases.
2024-01-19 13:09:08 +00:00
Raúl Marín
b92073ed7b
Revert "Extend performance test norm_dist.xml" 2024-01-19 13:21:03 +01:00
Raúl Marín
6aa5ac4cc6 Merge remote-tracking branch 'blessed/master' into argmin_optimization 2024-01-18 12:04:19 +01:00
Robert Schulze
abcb1b5c9f
Extend performance test norm_dist.xml
Dimension = 200 (instead of 10) is more realistic for vector search use cases.
2024-01-17 18:16:29 +00:00
Raúl Marín
849ac1fe99 Implement findExtremeMinIndex / findExtremeMaxIndex 2024-01-17 16:22:40 +01:00
Raúl Marín
cb9cc7a4cc Fix table names 2024-01-17 11:36:07 +01:00
Alexey Milovidov
1afcab35c1 Fix supply chain attack in performance tests 2024-01-14 08:25:12 +01:00
Duc Canh Le
458c8d758d simplify perf tests and minor code change
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
2024-01-11 08:25:35 +00:00
Duc Canh Le
6331d8a6f2 Merge branch 'master' into final_no_copy
Resolve conflicts

Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
2024-01-11 02:55:14 +00:00
Azat Khuzhin
3d88dba0a7 Fix perf tests duration (checks.test_duration_ms)
The column in the source was seconds in Float32, we need to convert it to
milliseconds in UInt64.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2024-01-10 17:03:53 +03:00
taiyang-li
231de4ac49 Merge branch 'master' into ch_opt_array_element 2024-01-10 15:49:43 +08:00
李扬
2c76b1789c
Update tests/performance/array_element.xml
Co-authored-by: Kruglov Pavel <48961922+Avogar@users.noreply.github.com>
2024-01-10 10:46:29 +08:00
Nikolay Degterinsky
24733700fb
Merge pull request #57745 from KevinyhZou/imporve_multi_if_nullable
Improve `MultiIf` function performance while type is nullable
2024-01-09 23:17:58 +01:00
taiyang-li
e5b4bc8f45 Merge branch 'master' into ch_opt_array_element 2024-01-09 17:17:38 +08:00
李扬
f6691ec334
Update tests/performance/array_element.xml
Co-authored-by: Kruglov Pavel <48961922+Avogar@users.noreply.github.com>
2024-01-09 10:15:37 +08:00
taiyang-li
5c0ea7db94 Merge branch 'master' into ch_opt_array_element 2024-01-08 10:41:20 +08:00
Raúl Marín
0522d859c2
Merge pull request #58334 from Algunenano/minmax_non_numeric
Speedup MIN/MAX for non numeric types
2024-01-04 19:48:18 +01:00
Raúl Marín
641caba5b0 Adapt more tests 2024-01-04 11:36:33 +00:00
Raúl Marín
1d1edd5b57 Reduce sum_map.xml 2024-01-04 11:31:20 +00:00
Raúl Marín
2aa6690f2c Reduce hashed_dictionary.xml 2024-01-04 11:29:17 +00:00
Raúl Marín
39eaa8dc9c Halve the size of reinterpret_as.xml 2024-01-04 11:24:36 +00:00
Raúl Marín
3c7ae2f171 Reduce bounding_ratio.xml 2024-01-04 11:20:07 +00:00
Raúl Marín
c195320612 Reduce the size of join_used_flags.xml 2024-01-03 17:31:55 +00:00
Raúl Marín
c223ae56d3 Reduce the size of decimal_parse 2024-01-03 17:29:30 +00:00
Raúl Marín
910b338584 Reduce polymorphic_parts_m 2024-01-03 17:24:15 +00:00
Raúl Marín
b8305e1a6e Make test more reasonable 2024-01-03 17:19:44 +00:00
Raúl Marín
7ee1697971 Reduce setup time of min_max_index.xml 2024-01-03 17:16:45 +00:00
Raúl Marín
1c40700ea1 Merge remote-tracking branch 'blessed/master' into minmax_non_numeric 2024-01-03 14:09:28 +01:00
Duc Canh Le
d623702378 Merge branch 'master' into final_no_copy
Resolve conflicts

Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
2024-01-02 06:07:49 +00:00
Raúl Marín
db97a69989 Add perf tests with tuples 2023-12-29 17:00:01 +01:00
Azat Khuzhin
0d71bf1411 Upload time of the perf tests into artifacts as test_duration_ms
Now perf test changes/failures will have two rows, row for new and row
for old server.

I thought about uploading only the time of the test on the new server,
but because not all perf tests uploaded, you cannot always get the time
of the test without the changes (i.e. from run on the upstream/master
repo/branch).

<details>

Before:

```sql
SELECT
    concat(test, ' #', toString(query_index)),
    'slower' AS test_status,
    0 AS test_duration_ms,
    concat('https://s3.amazonaws.com/clickhouse-test-reports/$PR_TO_TEST/$SHA_TO_TEST/${CLICKHOUSE_PERFORMANCE_COMPARISON_CHECK_NAME_PREFIX}/report.html#changes-in-performance.', test, '.', toString(query_index)) AS report_url
FROM queries
WHERE (changed_fail != 0) AND (diff > 0)
UNION ALL
SELECT
    concat(test, ' #', toString(query_index)),
    'unstable' AS test_status,
    0 AS test_duration_ms,
    concat('https://s3.amazonaws.com/clickhouse-test-reports/$PR_TO_TEST/$SHA_TO_TEST/${CLICKHOUSE_PERFORMANCE_COMPARISON_CHECK_NAME_PREFIX}/report.html#unstable-queries.', test, '.', toString(query_index)) AS report_url
FROM queries
WHERE unstable_fail != 0

Query id: 49dfdc9a-f549-4499-9a1a-410e5053f6c1

┌─concat(test, ' #', toString(query_index))─┬─test_status─┬─test_duration_ms─┬─report_url─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ hashed_array_dictionary #16               │ slower      │                0 │ https://s3.amazonaws.com/clickhouse-test-reports/$PR_TO_TEST/$SHA_TO_TEST/${CLICKHOUSE_PERFORMANCE_COMPARISON_CHECK_NAME_PREFIX}/report.html#changes-in-performance.hashed_array_dictionary.16 │
│ ngram_distance #2                         │ slower      │                0 │ https://s3.amazonaws.com/clickhouse-test-reports/$PR_TO_TEST/$SHA_TO_TEST/${CLICKHOUSE_PERFORMANCE_COMPARISON_CHECK_NAME_PREFIX}/report.html#changes-in-performance.ngram_distance.2           │
│ ngram_distance #3                         │ slower      │                0 │ https://s3.amazonaws.com/clickhouse-test-reports/$PR_TO_TEST/$SHA_TO_TEST/${CLICKHOUSE_PERFORMANCE_COMPARISON_CHECK_NAME_PREFIX}/report.html#changes-in-performance.ngram_distance.3           │
│ ngram_distance #4                         │ slower      │                0 │ https://s3.amazonaws.com/clickhouse-test-reports/$PR_TO_TEST/$SHA_TO_TEST/${CLICKHOUSE_PERFORMANCE_COMPARISON_CHECK_NAME_PREFIX}/report.html#changes-in-performance.ngram_distance.4           │
└───────────────────────────────────────────┴─────────────┴──────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

After:

```sql
SELECT
    concat(test, ' #', toString(query_index), '::', test_desc_.1) AS test_name,
    'slower' AS test_status,
    test_desc_.2 AS test_duration_ms,
    concat('https://s3.amazonaws.com/clickhouse-test-reports/$PR_TO_TEST/$SHA_TO_TEST/${CLICKHOUSE_PERFORMANCE_COMPARISON_CHECK_NAME_PREFIX}/report.html#changes-in-performance.', test, '.', toString(query_index)) AS report_url
FROM queries
ARRAY JOIN map('old', left, 'new', right) AS test_desc_
WHERE (changed_fail != 0) AND (diff > 0)
UNION ALL
SELECT
    concat(test, ' #', toString(query_index), '::', test_desc_.1) AS test_name,
    'unstable' AS test_status,
    test_desc_.2 AS test_duration_ms,
    concat('https://s3.amazonaws.com/clickhouse-test-reports/$PR_TO_TEST/$SHA_TO_TEST/${CLICKHOUSE_PERFORMANCE_COMPARISON_CHECK_NAME_PREFIX}/report.html#unstable-queries.', test, '.', toString(query_index)) AS report_url
FROM queries
ARRAY JOIN map('old', left, 'new', right) AS test_desc_
WHERE unstable_fail != 0

Query id: 20475bfd-754b-4159-aa16-7798f4720bf8

┌─test_name────────────────────────┬─test_status─┬─test_duration_ms─┬─report_url─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ hashed_array_dictionary #16::old │ slower      │           0.2149 │ https://s3.amazonaws.com/clickhouse-test-reports/$PR_TO_TEST/$SHA_TO_TEST/${CLICKHOUSE_PERFORMANCE_COMPARISON_CHECK_NAME_PREFIX}/report.html#changes-in-performance.hashed_array_dictionary.16 │
│ hashed_array_dictionary #16::new │ slower      │           0.2519 │ https://s3.amazonaws.com/clickhouse-test-reports/$PR_TO_TEST/$SHA_TO_TEST/${CLICKHOUSE_PERFORMANCE_COMPARISON_CHECK_NAME_PREFIX}/report.html#changes-in-performance.hashed_array_dictionary.16 │
│ ngram_distance #2::old           │ slower      │           0.3598 │ https://s3.amazonaws.com/clickhouse-test-reports/$PR_TO_TEST/$SHA_TO_TEST/${CLICKHOUSE_PERFORMANCE_COMPARISON_CHECK_NAME_PREFIX}/report.html#changes-in-performance.ngram_distance.2           │
│ ngram_distance #2::new           │ slower      │           0.4425 │ https://s3.amazonaws.com/clickhouse-test-reports/$PR_TO_TEST/$SHA_TO_TEST/${CLICKHOUSE_PERFORMANCE_COMPARISON_CHECK_NAME_PREFIX}/report.html#changes-in-performance.ngram_distance.2           │
│ ngram_distance #3::old           │ slower      │           0.3644 │ https://s3.amazonaws.com/clickhouse-test-reports/$PR_TO_TEST/$SHA_TO_TEST/${CLICKHOUSE_PERFORMANCE_COMPARISON_CHECK_NAME_PREFIX}/report.html#changes-in-performance.ngram_distance.3           │
│ ngram_distance #3::new           │ slower      │           0.4716 │ https://s3.amazonaws.com/clickhouse-test-reports/$PR_TO_TEST/$SHA_TO_TEST/${CLICKHOUSE_PERFORMANCE_COMPARISON_CHECK_NAME_PREFIX}/report.html#changes-in-performance.ngram_distance.3           │
│ ngram_distance #4::old           │ slower      │           0.3577 │ https://s3.amazonaws.com/clickhouse-test-reports/$PR_TO_TEST/$SHA_TO_TEST/${CLICKHOUSE_PERFORMANCE_COMPARISON_CHECK_NAME_PREFIX}/report.html#changes-in-performance.ngram_distance.4           │
│ ngram_distance #4::new           │ slower      │           0.4577 │ https://s3.amazonaws.com/clickhouse-test-reports/$PR_TO_TEST/$SHA_TO_TEST/${CLICKHOUSE_PERFORMANCE_COMPARISON_CHECK_NAME_PREFIX}/report.html#changes-in-performance.ngram_distance.4           │
└──────────────────────────────────┴─────────────┴──────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

</details>

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-12-29 13:48:41 +01:00
Alexey Milovidov
0e678fb6c1
Merge pull request #56996 from ClickHouse/vdimir/hash_join_max_block_size
HashJoin respects max_joined_block_size_rows
2023-12-27 15:46:46 +01:00
Alexey Milovidov
f00337e2ba
Merge pull request #57872 from CurtizJ/optimize-aggregation-consecutive-keys
Better optimization of consecutive keys in aggregation
2023-12-27 15:44:22 +01:00
Raúl Marín
f8d9a850c7 Fix perf test README 2023-12-27 12:16:17 +00:00
kevinyhzou
d7d1541d81 add performance test xml 2023-12-27 19:43:59 +08:00
Duc Canh Le
476ca4246d Merge branch 'master' into final_no_copy
Resolve conflicts + add some comments

Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
2023-12-27 07:00:58 +00:00
vdimir
afda7f1079
Add tests/performance/hashjoin_with_large_output.xml
Co-authored-by: liuneng <1398775315@qq.com>
2023-12-22 15:52:45 +00:00
zhanglistar
408f9ea1ae
Merge branch 'ClickHouse:master' into if-opt 2023-12-21 09:50:23 +08:00
zhanglistar
810305fafe Fix if.xml. 2023-12-20 10:58:46 +08:00
zhanglistar
f2a02cca2f if.xml add more perf tests. 2023-12-18 16:55:24 +08:00
zhanglistar
6d73c8b157 code format of if.xml 2023-12-18 16:48:27 +08:00
zhanglistar
ba34b80087 If improvement add comment and performance test. 2023-12-18 16:43:45 +08:00
Max K
84e5870b71
Reapply "improve CI with digest for docker, build and test jobs" (#57904)
* Revert "Revert "improve CI with digest for docker, build and test jobs""

* fix: docker manifest merge for missing images only
2023-12-18 09:07:22 +01:00
Anton Popov
5faf5e913e slightly faster and perf test 2023-12-16 03:35:59 +00:00
Robert Schulze
b36a93a25d
Revert "Remove arrayFold"
This reverts commit 15dc0ed610.
2023-12-14 09:52:29 +00:00
Alexey Milovidov
15dc0ed610 Remove arrayFold 2023-12-14 04:34:32 +01:00
Alexey Milovidov
2988f6f92a
Revert "Add new aggregation function groupArraySorted()" 2023-12-05 15:31:17 +03:00
taiyang-li
4bbc0db154 Optimize array element function when input is array(map)/array(array(string))/array(array(number)) 2023-12-05 15:57:43 +08:00
Alexey Milovidov
ae476002ef Remove bad test 2023-12-05 06:56:57 +01:00
Yarik Briukhovetskyi
69205769d0
Merge branch 'ClickHouse:master' into group_sorted_array_function 2023-11-23 20:23:47 +01:00
yariks5s
cb6898b52f fix due to review 2023-11-23 18:17:47 +00:00
Duc Canh Le
4b0382442f Merge branch 'master' into final_no_copy
Try fixing CI issues
2023-11-22 08:53:38 +00:00
Alexander Tokmakov
28e0c51e3f
Update avg_weighted.xml (#56797) 2023-11-16 20:46:17 +01:00
taiyang-li
819c7e75ff Merge branch 'master' into orc_tuple_field_prune 2023-11-02 17:52:05 +08:00
taiyang-li
b8665a610c fix failed perf test 2023-11-02 15:27:48 +08:00
taiyang-li
b142489c3c fix code style 2023-11-02 10:49:18 +08:00
lgbo-ustc
8334585eaf improve parquet struct field reading 2023-11-01 15:18:39 +08:00
taiyang-li
c97b2c5be7 fix code style 2023-10-31 12:00:45 +08:00
taiyang-li
b72341e1a8 Merge branch 'master' into orc_tuple_field_prune 2023-10-31 10:07:43 +08:00
taiyang-li
38f24c0455 add performance tests 2023-10-30 20:29:43 +08:00
Alexey Milovidov
88440d4c07
Merge pull request #54568 from JackyWoo/optimize_uniq_to_count2
Resubmit optimization uniq to count
2023-10-30 01:33:36 +01:00
Alexey Milovidov
64b6e68a50
Merge pull request #55683 from amosbird/issue-55653
Reuse granule during skip index reading
2023-10-30 00:51:51 +01:00
frinkr
18c50c11b3
Multithreading after window functions (#50771)
* feat: Preserve number of streams after evaluation the window functions to allow parallel stream processing

* fix style

* fix style

* fix style

* setting query_plan_preserve_num_streams_after_window_functions default true

* fix tests by SETTINGS query_plan_preserve_num_streams_after_window_functions=0

* fix test references

* Resize the streams after the last window function, to keep the order between WindowTransforms (and WindowTransform works on single stream anyway).

* feat: Preserve number of streams after evaluation the window functions to allow parallel stream processing

* fix style

* fix style

* fix style

* setting query_plan_preserve_num_streams_after_window_functions default true

* fix tests by SETTINGS query_plan_preserve_num_streams_after_window_functions=0

* fix test references

* Resize the streams after the last window function, to keep the order between WindowTransforms (and WindowTransform works on single stream anyway).

* add perf test

* perf: change the dataset from 50M to 5M

* rename query_plan_preserve_num_streams_after_window_functions -> query_plan_enable_multithreading_after_window_functions

* update test reference

* fix clang-tidy

---------

Co-authored-by: Nikita Taranov <nikita.taranov@clickhouse.com>
2023-10-27 12:36:28 +02:00
lgbo
489e6d9bdc
Optimization for getting value from map, arrayElement(1/2) (#55929) 2023-10-27 09:54:25 +02:00
Maksim Kita
aa5fc05a55 Revert "Merge pull request #55682 from ClickHouse/revert-35961-decimal-column-improve-get-permutation"
This reverts commit f6dee5fe3c, reversing
changes made to f96bda1deb.
2023-10-25 21:48:13 +03:00
李扬
465962df7f
Support orc filter push down (file + stripe + rowgroup level) (#55330)
* support orc filter push down

* update orc lib version

* replace setqueryinfo with setkeycondition

* fix issue https://github.com/ClickHouse/ClickHouse/issues/53536

* refactor source with key condition

* fix building error

* remove std::cout

* update orc

* update orc version

* fix bugs

* improve code

* upgrade orc lib

* fix code style

* change as requested

* add performance tests for orc filter push down

* add performance tests for orc filter push down

* fix all bugs

* fix default as null issue

* add uts for null as default issues

* upgrade orc lib

* fix failed orc lib uts and fix typo

* fix failed uts

* fix failed uts

* fix ast fuzzer tests

* fix bug of uint64 overflow in https://s3.amazonaws.com/clickhouse-test-reports/55330/de22fdcaea2e12c96f300e95f59beba84401712d/fuzzer_astfuzzerubsan/report.html

* fix asan fatal caused by reused column vector batch in native orc input format. refer to https://s3.amazonaws.com/clickhouse-test-reports/55330/be39d23af2d7e27f5ec7f168947cf75aeaabf674/stateless_tests__asan__[4_4].htm

* fix wrong performance tests

* disable 02892_orc_filter_pushdown on aarch64. https://s3.amazonaws.com/clickhouse-test-reports/55330/be39d23af2d7e27f5ec7f168947cf75aeaabf674/stateless_tests__aarch64_.html

* add some comments

* add some comments

* inline range::equals and range::less

* fix data race of key condition

* trigger ci
2023-10-24 12:08:17 -07:00
Robert Schulze
68c3f41b71
Fix performance tests 2023-10-24 08:56:09 +00:00
Duc Canh Le
5923e1b116
Cache cast function in set during execution (#55712)
* Cache cast function in set during execution

Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>

* minor fix for performance test

Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>

* Update src/Interpreters/castColumn.cpp

Co-authored-by: Nikita Taranov <nickita.taranov@gmail.com>

* improvement

Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>

* fix use-after-free

Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>

---------

Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
Co-authored-by: Nikita Taranov <nickita.taranov@gmail.com>
2023-10-23 13:31:44 +02:00
Amos Bird
602f01f651
Reuse granule during skip index reading 2023-10-18 14:40:34 +08:00
Alexey Milovidov
2da1ff4b0d
Revert "Improve ColumnDecimal, ColumnVector getPermutation performance using pdqsort with RadixSort" 2023-10-16 19:07:11 +03:00
Duc Canh Le
16687632da add format Null to performance tests
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
2023-10-14 08:01:55 +00:00
Duc Canh Le
285ae778e4 Merge branch 'master' into final_no_copy
Fix flaky test 02447_drop_database_replica
2023-10-14 03:34:42 +00:00
Duc Canh Le
dcc464b4da add more performance test
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
2023-10-13 08:16:54 +00:00
Maksim Kita
d4f9e0de12 Added performance test 2023-10-11 19:01:00 +03:00
Robert Schulze
2848548c63
Merge remote-tracking branch 'rschu1ze/master' into arrayFold 2023-10-08 16:32:36 +00:00
JackyWoo
826f7ac7eb Merge branch 'master' into optimize_uniq_to_count2 2023-09-27 09:14:28 +08:00
Maksim Kita
40be8227ea Fixed tests 2023-09-25 17:29:42 +03:00
Maksim Kita
1de95d8c36 Updated implementation 2023-09-25 17:29:42 +03:00
Maksim Kita
0a9835d085 Added performance tests 2023-09-25 17:29:42 +03:00
JackyWoo
231d16040b Merge branch 'master' into optimize_uniq_to_count2 2023-09-19 10:29:03 +08:00
robot-clickhouse
51851ecc21
Merge pull request #54613 from bigo-sg/improve_json_query
Improve json sql functions by serializing json element into column's buffer direclty
2023-09-15 19:35:30 +02:00
lgbo-ustc
fa0f9b0e1f update benckmark test 2023-09-15 18:09:58 +08:00
lgbo-ustc
e8d217634e improve json sql functions by serilizing data into column direclty 2023-09-14 12:41:17 +08:00
Sema Checherinda
8a9b544a97
Merge branch 'master' into optimize_all_lonely_parts 2023-09-13 16:07:19 +02:00
JackyWoo
70a262a775 Add optimization uniq to count 2023-09-13 16:16:11 +08:00
Alexey Milovidov
bd4aec0601
Revert "Optimize uniq to count" 2023-09-13 09:14:06 +03:00
JackyWoo
d065ac32e0 Merge branch 'master' into optimize_uniq_to_count 2023-09-04 10:06:36 +08:00
Duc Canh Le
06afe0c2aa more stable stateless test + add perf. test 2023-08-31 06:27:06 +00:00
Jiebin Sun
7c529e5691
Optimize the merge if all hashSets are singleLevel in UniqExactSet (#52973)
* Optimize the merge if all hashSets are singleLevel

In PR(https://github.com/ClickHouse/ClickHouse/pull/50748), it has added new phase
`parallelizeMergePrepare` before merge if all the hashSets are not all singleLevel
or not all twoLevel. Then it will convert all the singleLevelSet to twoLevelSet in
parallel, which will increase the CPU utilization and QPS.

But if all the hashtables are singleLevel, it could also benefit from the
`parallelizeMergePrepare` optimization in most cases if the hashtable size are not
too small. By tuning the Query `SELECT COUNT(DISTINCT SearchPhase) FROM hits_v1`
in different threads, we have got the mild threshold 6,000.

Test patch with the Query 'SELECT COUNT(DISTINCT Title) FROM hits_v1' on 2x80 vCPUs
server. If the threads are less than 48, the hashSets are all twoLevel or mixed by
singleLevel and twoLevel. If the threads are over 56, all the hashSets are singleLevel.
And the QPS has got at most 2.35x performance gain.

Threads	Opt/Base
8	100.0%
16	99.4%
24	110.3%
32	99.9%
40	99.3%
48	99.8%
56	183.0%
64	234.7%
72	233.1%
80	229.9%
88	224.5%
96	229.6%
104	235.1%
112	229.5%
120	229.1%
128	217.8%
136	222.9%
144	217.8%
152	204.3%
160	203.2%

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>

* Add the comment and explanation for PR#52973

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>

---------

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
2023-08-30 11:26:16 +02:00
JackyWoo
a963048e1a Merge branch 'master' into optimize_uniq_to_count 2023-08-28 11:10:05 +08:00