Commit Graph

2 Commits

Author SHA1 Message Date
Marek Vavruša
45bd332460 AggregateFunctionTopK: fix memory usage, performance
* allow separate table key / hash key, and use
  std::string / StringRef for generic variant as
  it has built-in storage and StringRef is supported
  by the hash table, this avoids infinitely
  growing arena with serialised keys
* use power-of-2 size for alpha vector for faster
  binning without using modulo
* use custom grower and allocator for SpaceSaving
  to start with smaller tables
* store computed hash in counter for faster
  reinsertion of smallest element
2017-05-11 18:52:49 +04:00
Marek Vavruša
5f1e65b252 AggregateFunctions: implemented topK(n)
This implements a new function for approximate
computation of the most frequent entries using
Filtered Space Saving with a merge step adapted
from Parallel Space Saving paper.

It works better for cases where GROUP BY x
is impractical due to high cardinality of x,
such as top IP addresses or top search queries.
2017-05-03 23:09:52 -07:00