ClickHouse/docs/en/sql-reference/aggregate-functions/reference/uniqcombined.md
Ivan Blinkov 258d2fd499
[docs] split various kinds of CREATE queries into separate articles (#12328)
* normalize

* split & adjust links

* re-normalize

* adjust ru links

* adjust ja/tr links

* partially apply e0d19d2aea

* reset contribs
2020-07-09 18:10:35 +03:00

2.6 KiB
Raw Blame History

toc_priority
192

uniqCombined

Calculates the approximate number of different argument values.

uniqCombined(HLL_precision)(x[, ...])

The uniqCombined function is a good choice for calculating the number of different values.

Parameters

The function takes a variable number of parameters. Parameters can be Tuple, Array, Date, DateTime, String, or numeric types.

HLL_precision is the base-2 logarithm of the number of cells in HyperLogLog. Optional, you can use the function as uniqCombined(x[, ...]). The default value for HLL_precision is 17, which is effectively 96 KiB of space (2^17 cells, 6 bits each).

Returned value

  • A number UInt64-type number.

Implementation details

Function:

  • Calculates a hash (64-bit hash for String and 32-bit otherwise) for all parameters in the aggregate, then uses it in calculations.

  • Uses a combination of three algorithms: array, hash table, and HyperLogLog with an error correction table.

    For a small number of distinct elements, an array is used. When the set size is larger, a hash table is used. For a larger number of elements, HyperLogLog is used, which will occupy a fixed amount of memory.
    
  • Provides the result deterministically (it doesnt depend on the query processing order).

!!! note "Note" Since it uses 32-bit hash for non-String type, the result will have very high error for cardinalities significantly larger than UINT_MAX (error will raise quickly after a few tens of billions of distinct values), hence in this case you should use uniqCombined64

Compared to the uniq function, the uniqCombined:

  • Consumes several times less memory.
  • Calculates with several times higher accuracy.
  • Usually has slightly lower performance. In some scenarios, uniqCombined can perform better than uniq, for example, with distributed queries that transmit a large number of aggregation states over the network.

See Also