<!-- ClickHouse/tests/performance/uniq.xml -->

<test>
    <settings>
        <max_memory_usage>30000000000</max_memory_usage>
        <!--
            Because data is distributed randomly between threads, the number of
            unique keys per thread can differ from run to run. As a result, some
            runs switch to two-level aggregation and some don't, depending on
            whether the aggregation state crosses the byte threshold
            (group_by_two_level_threshold_bytes). Two-level aggregation turns out
            to be twice as fast, because it merges aggregation states in multiple
            threads. We lower the threshold in this test to avoid that jitter. It
            is unclear whether lowering the default would be beneficial as well.
        -->
        <group_by_two_level_threshold_bytes>10000000</group_by_two_level_threshold_bytes>
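        <!--
            For scale: 10000000 bytes is about 10 MB. That is one fifth of the
            server default for this setting, 50000000 bytes (about 50 MB), at the
            time of writing; check the defaults of the version under test.
            Two-level aggregation is used once either this byte threshold or the
            key-count threshold group_by_two_level_threshold (default 100000) is
            exceeded. To reproduce one query from this test by hand with the same
            threshold, the setting can be applied per query. This example assumes
            the hits_100m_single table from the performance-test dataset is loaded:
              SELECT SearchEngineID AS k, uniq(UserID) FROM hits_100m_single
              GROUP BY k
              SETTINGS group_by_two_level_threshold_bytes = 10000000
              FORMAT Null
        -->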
    </settings>

    <substitutions>
        <substitution>
           <name>key</name>
           <values>
               <!--
                    Using the constant 1 as a key doesn't make much sense. The
                    queries are either too long (uniqExact) or too short (the
                    limited uniq variants), so the result is drowned out by
                    noise.
                    <value>1</value>
               -->
               <value>SearchEngineID</value>
               <value>RegionID</value>
               <value>SearchPhrase</value>
               <value>ClientIP</value>
           </values>
        </substitution>
        <substitution>
           <name>func</name>
           <values>
               <value>sum</value>
               <value>uniq</value>
               <value>uniqExact</value>
               <value>uniqHLL12</value>
               <value>uniqCombined(12)</value>
               <value>uniqCombined(13)</value>
               <value>uniqCombined(14)</value>
               <value>uniqCombined(15)</value>
               <value>uniqCombined(16)</value>
               <value>uniqCombined(17)</value>
               <value>uniqCombined(18)</value>
               <value>uniqUpTo(3)</value>
               <value>uniqUpTo(5)</value>
               <value>uniqUpTo(10)</value>
               <value>uniqUpTo(25)</value>
               <value>uniqUpTo(100)</value>
           </values>
        </substitution>
    </substitutions>

    <query>SELECT {key} AS k, {func}(UserID) FROM hits_100m_single GROUP BY k FORMAT Null</query>
    <query>SELECT {key} AS k, uniqTheta(UserID) FROM hits_10m_single GROUP BY k FORMAT Null</query>
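    <!--
        Sketch of how the templates above expand. This assumes the perf-test
        runner substitutes only the placeholders a query actually uses and
        takes the Cartesian product of their values.
          First query: 4 keys x 16 functions = 64 queries, for example
            SELECT RegionID AS k, uniqCombined(14)(UserID) FROM hits_100m_single GROUP BY k FORMAT Null
          Second query: 4 keys x 1 function = 4 queries, for example
            SELECT ClientIP AS k, uniqTheta(UserID) FROM hits_10m_single GROUP BY k FORMAT Null
        This gives 68 queries in total. Parametric functions such as uniqCombined(14)
        and uniqUpTo(3) are followed by their argument list, hence (14)(UserID).
    -->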
</test>
<!--
    Revision history:
    2018-11-22 23:45:16 +00:00  Added performance test #3406
    2019-02-12 09:13:31 +00:00  Forward settings in perf test and fix exception in uniq.xml
    2019-07-27 21:17:44 +00:00  Revert "Removed <name> from all performance tests #6179"
                                (reverts commit d61d489c2e14fc5fb16aee3be0254cf214a37818)
    2020-04-23 20:18:46 +00:00  performance comparison
    2021-02-02 14:21:43 +00:00  update perf tests
    2021-05-12 13:01:48 +00:00  lower two-level aggregation threshold for uniq test to avoid jitter
    2021-05-20 08:13:27 +00:00  Try to limit all queries to see the changes
    2021-05-20 08:14:24 +00:00  Reorder values
    2021-05-21 06:29:56 +00:00  Add performance test
    2021-06-11 12:22:35 +00:00  Use hits_10m_single only for uniqTheta
-->