uniqCombined() uses hashtable for medium cardinality, and since
HashTable resize by the power of 2 (well actually HashTableGrower grows
double by the power of 2, hence HashTableGrower::increaseSize() should
be overwritten to change this), with 1<<13 (default for uniqCombined)
and UInt64 HashValueType, the HashTable will takes:
getBufferSizeInBytes() == 131072
While it should be not greater then sizeof(HLL) ~= 98K, so reduce the
maximum cardinality for hashtable to 1<<12 with UInt64 HashValueType and
to 1<13 with UInt32, overwrite HashTableGrower::increaseSize() and cover
this using max_memory_usage.
Refs: https://github.com/ClickHouse/ClickHouse/pull/7221#issuecomment-539672742
v2: cover uniqCombined() with non-default K
By default uniqCombined() uses 32-bit hash for all types except String,
so this means that you cannot use uniqCombined() with i.e UInt64 and
cardinality > UINT_MAX, although you can use uniqCombined(toString())
and this will lead to 64-bit hash, but will not have good performance.
So uniqCombined64() had been introduced to fix this.
Requires: #7213
Instead, these methods return a pointer to the required data as they are
stored inside the hash table. The caller uses overloaded functions to
get the key and "mapped" values from this pointer. Such an interface
avoids the need for constructing iterator-like wrapper objects, which is
especially important for compound hash tables such as the future
StringHashMap.
Some aggregation methods initially emplace a temporary StringRef key
into a hash table. Then, if the key was not seen before, they make a
persistent copy of the key and update the hash table with it. This
approach is not suitable for compound hash tables, because the logic of
when the persistent key is needed is more complex, and is contained
within the hash table itself.
In this commit, we switch to managing key memory with callbacks passed
to the hash table, that allow it to request a persistent copy of the key
if it is needed. This should be more appropriate for compound hash
tables.
This commit prepares for StringHashMap PR #5417.