ClickHouse

thevar1able/ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-10-09 18:10:48 +00:00

Commit Graph

Author

SHA1

Message

Date

Marek Vavruša

45bd332460

AggregateFunctionTopK: fix memory usage, performance

* allow separate table key / hash key, and use
  std::string / StringRef for generic variant as
  it has built-in storage and StringRef is supported
  by the hash table, this avoids infinitely
  growing arena with serialised keys
* use power-of-2 size for alpha vector for faster
  binning without using modulo
* use custom grower and allocator for SpaceSaving
  to start with smaller tables
* store computed hash in counter for faster
  reinsertion of smallest element

2017-05-11 18:52:49 +04:00
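
The "power-of-2 size for alpha vector" point in the commit above can be illustrated with a small sketch. It is not the ClickHouse code: `AlphaVector`, `roundUpToPowerOfTwo`, and `binOf` are hypothetical names chosen for this example. The idea is that when the number of bins is a power of two, `hash % size` gives the same result as `hash & (size - 1)`, so the bin index can be computed with a single bitwise AND instead of an integer division.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Round n up to the nearest power of two (n must be at least 1).
inline std::size_t roundUpToPowerOfTwo(std::size_t n)
{
    assert(n >= 1);
    std::size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

// Hypothetical stand-in for the "alpha vector" named in the commit message:
// an array of per-bin counters indexed by the low bits of a key's hash.
struct AlphaVector
{
    std::vector<std::uint64_t> bins; // declared first, so it is initialized before mask
    std::size_t mask;                // bins.size() - 1, valid because size is a power of two

    explicit AlphaVector(std::size_t requested_size)
        : bins(roundUpToPowerOfTwo(requested_size), 0)
        , mask(bins.size() - 1)
    {
    }

    // Equivalent to hash % bins.size(), but without a division.
    std::size_t binOf(std::uint64_t hash) const
    {
        return static_cast<std::size_t>(hash & mask);
    }
};
```

Requesting 100 bins in this sketch allocates 128, trading a little memory for a cheaper index computation on every insertion.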

Marek Vavruša

5f1e65b252

AggregateFunctions: implemented topK(n)

This implements a new function for approximate
computation of the most frequent entries using
Filtered Space Saving with a merge step adapted
from the Parallel Space Saving paper.

It works better for cases where GROUP BY x
is impractical due to high cardinality of x,
such as top IP addresses or top search queries.

2017-05-03 23:09:52 -07:00
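
For readers unfamiliar with the algorithm family named in the commit above, here is a minimal sketch of plain Space Saving. It is deliberately simplified: it omits the filter (alpha vector) of Filtered Space Saving and the merge step, and it is not the ClickHouse implementation. The class name `SpaceSavingSketch` and its methods are assumptions made for this example. The sketch keeps at most `capacity` counters; when a new key arrives and the table is full, the key with the smallest count is evicted and the newcomer inherits that count plus one, recording the inherited amount as its possible overestimate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Minimal Space Saving sketch (hypothetical, for illustration only).
class SpaceSavingSketch
{
public:
    struct Counter
    {
        std::uint64_t count; // estimated frequency (never an underestimate)
        std::uint64_t error; // maximum possible overestimate
    };

    explicit SpaceSavingSketch(std::size_t capacity) : capacity_(capacity)
    {
        assert(capacity_ > 0);
    }

    void insert(const std::string & key)
    {
        auto it = counters_.find(key);
        if (it != counters_.end())
        {
            ++it->second.count; // key is already monitored
            return;
        }
        if (counters_.size() < capacity_)
        {
            counters_.emplace(key, Counter{1, 0}); // free slot available
            return;
        }
        // Table is full: evict the key with the smallest count. A linear scan
        // keeps the example short; real implementations keep counters ordered.
        auto min_it = std::min_element(
            counters_.begin(), counters_.end(),
            [](const auto & a, const auto & b) { return a.second.count < b.second.count; });
        const std::uint64_t min_count = min_it->second.count;
        counters_.erase(min_it);
        counters_.emplace(key, Counter{min_count + 1, min_count});
    }

    // Return up to k (key, estimated count) pairs, most frequent first;
    // ties are broken by key so the order is deterministic.
    std::vector<std::pair<std::string, std::uint64_t>> topK(std::size_t k) const
    {
        std::vector<std::pair<std::string, std::uint64_t>> result;
        for (const auto & kv : counters_)
            result.emplace_back(kv.first, kv.second.count);
        std::sort(result.begin(), result.end(), [](const auto & a, const auto & b) {
            return a.second != b.second ? a.second > b.second : a.first < b.first;
        });
        if (result.size() > k)
            result.resize(k);
        return result;
    }

private:
    std::size_t capacity_;
    std::unordered_map<std::string, Counter> counters_;
};
```

Memory use is bounded by `capacity` regardless of how many distinct keys appear in the stream, which is why this family of algorithms suits the high-cardinality cases the commit message mentions, where an exact GROUP BY is impractical.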

2 Commits