* allow separate table key / hash key, and use
std::string / StringRef for the generic variant:
std::string has built-in storage and StringRef is
supported by the hash table, which avoids an
infinitely growing arena of serialised keys
* use power-of-2 size for alpha vector for faster
binning without using modulo
* use custom grower and allocator for SpaceSaving
to start with smaller tables
* store computed hash in counter for faster
reinsertion of smallest element
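
As an illustration of the power-of-2 binning point above, here is a minimal sketch. The names `AlphaVector`, `binOf` and `roundUpToPowerOfTwo` are hypothetical and are not taken from the patch; the sketch only assumes that the alpha vector holds one counter per hash bin. Once the size is a power of two, `hash & (size - 1)` selects the same bin as `hash % size` without a division.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: round n up to the next power of two (returns 1 for n <= 1).
inline std::size_t roundUpToPowerOfTwo(std::size_t n)
{
    std::size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

// Hypothetical alpha vector: one counter per hash bin.
struct AlphaVector
{
    std::vector<std::uint64_t> bins;

    explicit AlphaVector(std::size_t requested)
        : bins(roundUpToPowerOfTwo(requested), 0)
    {
    }

    // With a power-of-2 size, masking with (size - 1) equals hash % size.
    std::size_t binOf(std::uint64_t hash) const
    {
        assert((bins.size() & (bins.size() - 1)) == 0);
        return static_cast<std::size_t>(hash & (bins.size() - 1));
    }
};
```

Rounding the requested size up trades a little memory for the cheaper mask; a requested size of 100 becomes 128 bins in this sketch.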

This implements a new function for approximate
computation of the most frequent entries using
Filtered Space Saving, with a merge step adapted
from the Parallel Space Saving paper.
It is best suited to cases where GROUP BY x
is impractical due to the high cardinality of x,
such as finding the top IP addresses or top search queries.
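
For readers unfamiliar with Filtered Space Saving, the following is a simplified sketch of the insertion path, not the code in this patch. The class and member names are hypothetical, it uses `std::unordered_map` with a linear scan for the minimum instead of the custom hash table and ordered counter list, and it leaves out the merge step. It is meant to show how the alpha vector filters rare keys and why caching the hash in each counter helps when the smallest element is evicted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative only: hypothetical names, not the classes from the patch.
struct Counter
{
    std::uint64_t count;  // estimated frequency (may overestimate)
    std::uint64_t error;  // upper bound on the overestimation
    std::size_t hash;     // cached so eviction does not rehash the key
};

class FilteredSpaceSaving
{
public:
    // alpha_size must be a power of two so that binning can use a mask.
    FilteredSpaceSaving(std::size_t capacity_, std::size_t alpha_size)
        : capacity(capacity_), alpha(alpha_size, 0)
    {
        assert(capacity > 0);
        assert(alpha_size > 0 && (alpha_size & (alpha_size - 1)) == 0);
    }

    void insert(const std::string & key)
    {
        auto it = monitored.find(key);
        if (it != monitored.end())
        {
            ++it->second.count;  // already monitored: just count it
            return;
        }

        const std::size_t hash = std::hash<std::string>{}(key);
        if (monitored.size() < capacity)
        {
            monitored.emplace(key, Counter{1, 0, hash});
            return;
        }

        // A linear scan for the smallest counter keeps the sketch short.
        auto min_it = std::min_element(monitored.begin(), monitored.end(),
            [](const Map::value_type & a, const Map::value_type & b)
            { return a.second.count < b.second.count; });

        std::uint64_t & bin = alpha[hash & (alpha.size() - 1)];
        if (bin + 1 < min_it->second.count)
        {
            ++bin;  // filter: too rare to displace a monitored key
            return;
        }

        // Evict the smallest element and remember its count in its own bin,
        // located through the cached hash.
        const std::uint64_t inherited = bin;
        alpha[min_it->second.hash & (alpha.size() - 1)] = min_it->second.count;
        monitored.erase(min_it);
        monitored.emplace(key, Counter{inherited + 1, inherited, hash});
    }

    // Returns the estimated count, or 0 if the key is not monitored.
    std::uint64_t estimate(const std::string & key) const
    {
        auto it = monitored.find(key);
        return it == monitored.end() ? 0 : it->second.count;
    }

    std::size_t size() const { return monitored.size(); }

private:
    using Map = std::unordered_map<std::string, Counter>;

    std::size_t capacity;
    std::vector<std::uint64_t> alpha;
    Map monitored;
};
```

In this sketch a new key only displaces the smallest monitored key once its alpha bin has accumulated enough hits, which is what keeps a long tail of rare keys from churning the monitored set.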