SpaceSaving now has specialised storage for
some key types, which copies only the keys that
are retained in the structure, not every key seen.
Most POD types implement this interface as a no-op,
so there shouldn’t be any extra cost for them.
* allow separate table key / hash key, and use
std::string / StringRef for the generic variant:
std::string has built-in storage and StringRef is
supported by the hash table, which avoids an
unboundedly growing arena of serialised keys
* use power-of-2 size for alpha vector for faster
binning without using modulo
* use custom grower and allocator for SpaceSaving
to start with smaller tables
* store computed hash in counter for faster
reinsertion of smallest element
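
The power-of-2 binning mentioned in the list above can be sketched as follows. This is only an illustration under assumptions, not the code from this change: `AlphaVector` and `roundUpToPowerOfTwo` are hypothetical names, and the cell type is assumed to be a plain 64-bit counter. The point is that when the size is a power of two, `hash & (size - 1)` selects the same bin as `hash % size` without a division.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: smallest power of two that is >= n (returns 1 for n <= 1).
inline std::size_t roundUpToPowerOfTwo(std::size_t n)
{
    std::size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

// Hypothetical alpha vector whose size is always a power of two,
// so a bin is selected with a bit mask instead of a modulo.
class AlphaVector
{
public:
    explicit AlphaVector(std::size_t min_size)
        : cells(roundUpToPowerOfTwo(min_size), 0), mask(cells.size() - 1)
    {
        // size & (size - 1) == 0 holds exactly when size is a power of two.
        assert((cells.size() & mask) == 0);
    }

    // Equivalent to cells[hash % cells.size()] for a power-of-two size.
    std::uint64_t & at(std::uint64_t hash) { return cells[hash & mask]; }

    std::size_t size() const { return cells.size(); }

private:
    std::vector<std::uint64_t> cells; // declared first, so it is initialised before mask
    std::size_t mask;
};
```

Rounding the requested size up rather than rejecting it is a design choice of this sketch; it trades a little memory for a simpler call site.
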
This implements a new function for approximate
computation of the most frequent entries using
Filtered Space Saving with a merge step adapted
from the Parallel Space Saving paper.
It is meant for cases where GROUP BY x
is impractical due to the high cardinality of x,
such as finding the top IP addresses or top
search queries.
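
For readers unfamiliar with the underlying idea, here is a minimal sketch of the basic Space Saving algorithm. It is deliberately simplified and is not the implementation in this change: it omits the filter (alpha vector) of Filtered Space Saving and the merge step, it finds the smallest counter with a linear scan, and `SpaceSavingSketch`, `Counter`, `insert` and `topK` are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical counter: estimated count plus the maximum possible overestimate.
struct Counter
{
    std::uint64_t count;
    std::uint64_t error;
};

// Basic Space Saving with at most `capacity` monitored keys.
class SpaceSavingSketch
{
public:
    explicit SpaceSavingSketch(std::size_t capacity_) : capacity(capacity_)
    {
        assert(capacity > 0);
    }

    void insert(const std::string & key)
    {
        auto it = counters.find(key);
        if (it != counters.end())
        {
            ++it->second.count; // key is already monitored
            return;
        }
        if (counters.size() < capacity)
        {
            counters.emplace(key, Counter{1, 0}); // free slot available
            return;
        }
        // Structure is full: evict the smallest counter; the new key inherits its count.
        auto min_it = std::min_element(counters.begin(), counters.end(),
            [](const auto & a, const auto & b) { return a.second.count < b.second.count; });
        const std::uint64_t min_count = min_it->second.count;
        counters.erase(min_it);
        counters.emplace(key, Counter{min_count + 1, min_count});
    }

    // Returns up to k (key, estimated count) pairs, largest estimate first.
    std::vector<std::pair<std::string, std::uint64_t>> topK(std::size_t k) const
    {
        std::vector<std::pair<std::string, std::uint64_t>> result;
        for (const auto & [key, counter] : counters)
            result.emplace_back(key, counter.count);
        std::sort(result.begin(), result.end(), [](const auto & a, const auto & b)
        {
            return a.second != b.second ? a.second > b.second : a.first < b.first;
        });
        if (result.size() > k)
            result.resize(k);
        return result;
    }

private:
    std::size_t capacity;
    std::unordered_map<std::string, Counter> counters;
};
```

Memory stays bounded by `capacity` regardless of how many distinct keys appear, which is why this family of algorithms suits high-cardinality columns. The estimates can only overestimate, by at most the stored `error`.
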