Commit Graph

6 Commits

Author SHA1 Message Date
Marek Vavruša
28bb5e25cf AggregateFunctionTopK: read alphaMap for generic
* the alpha_map vector is always (de)serialised with
  its actual contents (which may sometimes be empty)
* AggregateFunctionTopK generic variant deserialises
  it as well instead of ignoring it
* AggregateFunctionTopK generic variant clears the
  array before deserialising

refs #1283
2017-10-09 01:12:38 +03:00
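A minimal Python sketch of the two serialisation rules above, assuming a hypothetical wire format (an unsigned 64-bit little-endian length prefix followed by one 64-bit value per entry). `serialize_alpha_map` and `deserialize_alpha_map` are illustrative names, not ClickHouse functions. Writing the length even for an empty vector lets the reader consume the field unconditionally, and clearing the target first keeps stale entries from surviving a reload.

```python
import io
import struct

# Hypothetical wire format: u64 length prefix, then one u64 per entry (little-endian).
_U64 = struct.Struct("<Q")

def serialize_alpha_map(alpha_map, out):
    # Always write the length, even for an empty vector, so the reader
    # can consume this field unconditionally.
    out.write(_U64.pack(len(alpha_map)))
    for value in alpha_map:
        out.write(_U64.pack(value))

def deserialize_alpha_map(alpha_map, inp):
    # Drop any previous contents first so stale entries cannot leak through.
    alpha_map.clear()
    (size,) = _U64.unpack(inp.read(_U64.size))
    for _ in range(size):
        (value,) = _U64.unpack(inp.read(_U64.size))
        alpha_map.append(value)

# Usage: an empty map followed by a non-empty one in the same stream.
buf = io.BytesIO()
serialize_alpha_map([], buf)
serialize_alpha_map([3, 0, 7], buf)
buf.seek(0)

first, second = [42], [1, 2]
deserialize_alpha_map(first, buf)
deserialize_alpha_map(second, buf)
print(first, second)  # [] [3, 0, 7]
```

If the empty vector were skipped on write but read unconditionally, the second map's length prefix would be misread as the first map's, which is the kind of misalignment the first bullet rules out.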
Marek Vavruša
e189c39056 SpaceSaving: internal storage for StringRef{}
SpaceSaving now has specialised storage for
some key types, so it copies only the keys that
are retained in the structure rather than all of them.

Most POD types implement this interface as a no-op,
so there should be no extra cost.
2017-06-26 21:16:13 +03:00
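A rough Python illustration of "copy only what is retained", under stated assumptions: a `memoryview` over a reused `bytearray` stands in for a `StringRef` pointing into column data, `zlib.crc32` stands in for the table's hash function, and `StringKeyCounter` is a hypothetical class, not the SpaceSaving code (eviction is omitted). Lookups hash and compare the borrowed bytes directly; an owned copy is made only when a new key enters the table. For integer (POD-like) keys the copy step would simply be a no-op.

```python
import zlib

class StringKeyCounter:
    """Counts byte-string keys that arrive as borrowed views (a stand-in for
    StringRef) and copies a key only when it is first retained."""

    def __init__(self):
        self.buckets = {}  # crc32 of key -> list of [owned key bytes, count]
        self.copies = 0    # how many keys were actually copied

    def add(self, view):
        # Hash and compare straight from the borrowed buffer: no copy yet.
        bucket = self.buckets.setdefault(zlib.crc32(view), [])
        for entry in bucket:
            if view == entry[0]:
                entry[1] += 1
                return
        # New key: take an owned copy so it survives reuse of the caller's buffer.
        bucket.append([bytes(view), 1])
        self.copies += 1

    def count(self, key):
        for owned, n in self.buckets.get(zlib.crc32(key), []):
            if owned == key:
                return n
        return 0

# Usage: one scratch buffer reused for every row, like a column being scanned.
counter = StringKeyCounter()
scratch = bytearray(3)
for row in [b"abc", b"abc", b"xyz", b"abc"]:
    scratch[:] = row
    counter.add(memoryview(scratch))

print(counter.copies, counter.count(b"abc"), counter.count(b"xyz"))  # 2 3 1
```

Four rows are processed but only two copies are made, and the retained keys stay correct even though the scratch buffer is overwritten on every row.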
Marek Vavruša
45bd332460 AggregateFunctionTopK: fix memory usage, performance
* allow a separate table key / hash key, and use
  std::string / StringRef for the generic variant, since
  std::string has built-in storage and StringRef is
  supported by the hash table; this avoids an
  ever-growing arena of serialised keys
* use a power-of-2 size for the alpha vector, so that
  binning can avoid a modulo operation
* use a custom grower and allocator for SpaceSaving
  so that tables start smaller
* store the computed hash in each counter for faster
  reinsertion of the smallest element
2017-05-11 18:52:49 +04:00
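A small Python sketch of the power-of-2 binning trick from the second bullet. `next_power_of_two` and `alpha_bin` are hypothetical helpers, and rounding a requested size up to a power of two is an illustrative assumption; the commit does not say how the alpha vector size is derived. With a size of 2^k, `size - 1` masks the low k bits, so a single AND yields the same bucket as `hash % size`.

```python
def next_power_of_two(n):
    """Smallest power of two >= n (n >= 1)."""
    return 1 << (n - 1).bit_length()

def alpha_bin(hash_value, alpha_size):
    """Map a hash to a bucket of a power-of-two-sized alpha vector."""
    # With alpha_size == 2**k, alpha_size - 1 is a mask of the low k bits,
    # so the AND gives the same bucket as hash_value % alpha_size.
    assert alpha_size > 0 and alpha_size & (alpha_size - 1) == 0, \
        "alpha_size must be a power of two"
    return hash_value & (alpha_size - 1)

alpha_size = next_power_of_two(20)               # 32
print(alpha_size, alpha_bin(0xDEADBEEF, alpha_size))  # 32 15
```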
Alexey Milovidov
d3e6321967 AggregateFunctionTopK: minor modifications [#CLICKHOUSE-2].
2017-05-05 16:36:02 -07:00
Alexey Milovidov
9d4c814b12 Aggregate function topK: style modifications [#CLICKHOUSE-2].
2017-05-05 14:17:04 -07:00
Marek Vavruša
5f1e65b252 AggregateFunctions: implemented topK(n)
This implements a new function for approximate
computation of the most frequent entries using
Filtered Space Saving, with a merge step adapted
from the Parallel Space Saving paper.

It works better than GROUP BY x in cases where that
is impractical due to the high cardinality of x,
such as finding the top IP addresses or top search queries.
2017-05-03 23:09:52 -07:00
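A compact Python sketch of the Filtered Space Saving insert path, written under stated assumptions: Python's built-in `hash` replaces ClickHouse's hash function, the minimum counter is found by a linear scan instead of a sorted counter list, the merge step from the Parallel Space Saving paper is omitted, and `FilteredSpaceSaving`, `insert` and `top_k` are hypothetical names. It also reflects two details from commit 45bd332460: the alpha vector has a power-of-2 size indexed by a mask, and each counter stores its hash so the evicted key's bucket can be found without rehashing.

```python
class FilteredSpaceSaving:
    """Filtered Space-Saving top-k sketch (illustrative, not ClickHouse code)."""

    def __init__(self, capacity, alpha_size):
        assert alpha_size > 0 and alpha_size & (alpha_size - 1) == 0
        self.capacity = capacity
        self.mask = alpha_size - 1
        self.alpha = [0] * alpha_size  # estimated counts of unmonitored keys, per bucket
        self.counters = {}             # key -> [count, error, stored hash]

    def insert(self, key):
        counter = self.counters.get(key)
        if counter is not None:        # already monitored: just count it
            counter[0] += 1
            return
        h = hash(key)
        if len(self.counters) < self.capacity:  # free slot: monitor exactly
            self.counters[key] = [1, 0, h]
            return
        min_key = min(self.counters, key=lambda k: self.counters[k][0])
        min_count, _, min_hash = self.counters[min_key]
        bucket = h & self.mask
        alpha = self.alpha[bucket]
        if alpha + 1 < min_count:
            # Filter: the key cannot yet beat the smallest counter, so only
            # its bucket estimate grows and no counter is replaced.
            self.alpha[bucket] = alpha + 1
            return
        # Evict the smallest counter; its stored hash locates its bucket
        # without rehashing, and the bucket remembers the evicted count.
        self.alpha[min_hash & self.mask] = min_count
        del self.counters[min_key]
        self.counters[key] = [alpha + 1, alpha, h]

    def top_k(self, k):
        """Return up to k (key, count, error) triples, highest count first."""
        ranked = sorted(self.counters.items(), key=lambda kv: kv[1][0], reverse=True)
        return [(key, c[0], c[1]) for key, c in ranked[:k]]

# Usage: two frequent keys followed by a long tail of distinct ones.
fss = FilteredSpaceSaving(capacity=10, alpha_size=16)
for key in ["a"] * 50 + ["b"] * 30 + [f"noise-{i}" for i in range(100)]:
    fss.insert(key)
print(fss.top_k(2))  # [('a', 50, 0), ('b', 30, 0)]
```

Because "a" and "b" arrive while slots are still free, they are counted exactly (error 0). The long tail only competes for the remaining eight slots, and the alpha filter keeps most tail keys from churning those counters, while memory stays bounded by `capacity` plus the alpha vector, unlike a full GROUP BY.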