In the current version of Poco, unsigned long no longer aliases to
UInt64 on LP64 platforms. With clang, size_t aliases to unsigned long,
so size_t is no longer interchangeable with UInt64 when interacting
with Poco interfaces. I replaced those uses with UInt64 where it makes
sense, and added a readVarUInt() overload that accepts size_t.
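
A minimal sketch of the overload idea, not the actual code: the buffer-based signature and the LEB128-style decoding below are assumptions made for illustration. Overloading on unsigned long and unsigned long long is always legal in C++ because the two are distinct types even when both are 64 bits wide.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical reader: decodes a LEB128-style variable-length integer from
// `data` (at most `size` bytes). Returns the number of bytes consumed, or 0
// if the input ends before the value is complete.
inline std::size_t readVarUInt(unsigned long long & x, const unsigned char * data, std::size_t size)
{
    x = 0;
    for (std::size_t i = 0; i < size && i < 10; ++i)
    {
        const unsigned long long byte = data[i];
        x |= (byte & 0x7F) << (7 * i);   // low 7 bits carry the payload
        if (!(byte & 0x80))              // high bit clear: last byte
            return i + 1;
    }
    return 0;
}

// Overload for the other 64-bit unsigned type. It forwards to the version
// above, so callers holding an unsigned long need no cast at the call site.
inline std::size_t readVarUInt(unsigned long & x, const unsigned char * data, std::size_t size)
{
    unsigned long long tmp = 0;
    const std::size_t consumed = readVarUInt(tmp, data, size);
    x = static_cast<unsigned long>(tmp);
    return consumed;
}
```

On an LP64 platform where size_t is unsigned long, a call with a size_t argument would resolve to the second overload.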
* always (de)serialise the actual version of the alpha_map vector
(it could sometimes be empty)
* AggregateFunctionTopK generic variant deserialises
it as well instead of ignoring it
* AggregateFunctionTopK generic variant clears the
array before deserialising
refs #1283
SpaceSaving now has specialised storage for
some key types, which copies only the keys that
are to be retained in the structure rather than all of them.
Most POD types implement this interface as a no-op,
so there shouldn’t be any extra cost.
* allow separate table key / hash key types, and use
std::string / StringRef for the generic variant:
std::string has built-in storage and StringRef is supported
by the hash table, which avoids an infinitely
growing arena of serialised keys
* use power-of-2 size for alpha vector for faster
binning without using modulo
* use custom grower and allocator for SpaceSaving
to start with smaller tables
* store computed hash in counter for faster
reinsertion of smallest element
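
The power-of-2 binning point can be illustrated as follows. This is a sketch under assumptions: roundUpToPowerOfTwo and binIndex are hypothetical helper names, not functions from the code base. When the vector size is a power of two, `hash & (size - 1)` gives the same result as `hash % size` without a division.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: smallest power of two that is >= n (1 for n == 0).
inline std::size_t roundUpToPowerOfTwo(std::size_t n)
{
    std::size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

// Hypothetical helper: map a hash to a bin of a vector whose size is a
// power of two. The mask keeps only the low log2(size) bits of the hash.
inline std::size_t binIndex(std::uint64_t hash, std::size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0); // size must be a power of two
    return static_cast<std::size_t>(hash & (size - 1));
}
```

For example, a requested size of 6 would be rounded up to 8, and a hash of 13 would land in bin 5, the same bin that `13 % 8` selects.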
This implements a new function for approximate
computation of the most frequent entries using
Filtered Space Saving with a merge step adapted
from the Parallel Space Saving paper.
It works better for cases where GROUP BY x
is impractical due to high cardinality of x,
such as top IP addresses or top search queries.
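
For readers unfamiliar with the underlying idea, here is a minimal sketch of the plain Space Saving algorithm. It is not the implementation described above: it leaves out the filtering (alpha) vector and the merge step, uses std::unordered_map with a linear scan for the minimum, and the class name SpaceSavingSketch is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

class SpaceSavingSketch
{
public:
    struct Counter
    {
        std::uint64_t count; // estimated frequency (never an underestimate)
        std::uint64_t error; // maximum possible overestimation
    };

    explicit SpaceSavingSketch(std::size_t capacity_) : capacity(capacity_)
    {
        assert(capacity > 0);
    }

    void insert(const std::string & key)
    {
        auto it = counters.find(key);
        if (it != counters.end())
        {
            ++it->second.count; // already monitored: just count it
            return;
        }
        if (counters.size() < capacity)
        {
            counters.emplace(key, Counter{1, 0}); // free slot available
            return;
        }
        // Structure is full: evict the key with the smallest count and let
        // the new key inherit that count as its error bound.
        auto min_it = std::min_element(counters.begin(), counters.end(),
            [](const auto & a, const auto & b) { return a.second.count < b.second.count; });
        const std::uint64_t min_count = min_it->second.count;
        counters.erase(min_it);
        counters.emplace(key, Counter{min_count + 1, min_count});
    }

    // Returns up to k (key, estimated count) pairs, largest count first;
    // ties are broken by key so the order is deterministic.
    std::vector<std::pair<std::string, std::uint64_t>> topK(std::size_t k) const
    {
        std::vector<std::pair<std::string, std::uint64_t>> result;
        for (const auto & kv : counters)
            result.emplace_back(kv.first, kv.second.count);
        std::sort(result.begin(), result.end(), [](const auto & a, const auto & b)
        {
            return a.second != b.second ? a.second > b.second : a.first < b.first;
        });
        if (result.size() > k)
            result.resize(k);
        return result;
    }

private:
    std::size_t capacity;
    std::unordered_map<std::string, Counter> counters;
};
```

Memory is bounded by the capacity regardless of how many distinct keys are seen, which is why this family of algorithms suits high-cardinality inputs. The linear scan for the minimum keeps the sketch short; a real implementation would track the smallest counter more efficiently.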