mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-11-25 00:52:02 +00:00
144 lines
5.7 KiB
Plaintext
144 lines
5.7 KiB
Plaintext
This directory contains several hash-map implementations, similar in
|
|
API to SGI's hash_map class, but with different performance
|
|
characteristics. sparse_hash_map uses very little space overhead, 1-2
|
|
bits per entry. dense_hash_map is very fast, particularly on lookup.
|
|
(sparse_hash_set and dense_hash_set are the set versions of these
|
|
routines.) On the other hand, these classes have requirements that
|
|
may not make them appropriate for all applications.
|
|
|
|
All these implementation use a hashtable with internal quadratic
|
|
probing. This method is space-efficient -- there is no pointer
|
|
overhead -- and time-efficient for good hash functions.
|
|
|
|
COMPILING
|
|
---------
|
|
To compile test applications with these classes, run ./configure
|
|
followed by make. To install these header files on your system, run
|
|
'make install'. (On Windows, the instructions are different; see
|
|
README_windows.txt.) See INSTALL for more details.
|
|
|
|
This code should work on any modern C++ system. It has been tested on
|
|
Linux (Ubuntu, Fedora, RedHat, Debian), Solaris 10 x86, FreeBSD 6.0,
|
|
OS X 10.3 and 10.4, and Windows under both VC++7 and VC++8.
|
|
|
|
USING
|
|
-----
|
|
See the html files in the doc directory for small example programs
|
|
that use these classes. It's enough to just include the header file:
|
|
|
|
#include <sparsehash/sparse_hash_map> // or sparse_hash_set, dense_hash_map, ...
|
|
google::sparse_hash_set<int, int> number_mapper;
|
|
|
|
and use the class the way you would other hash-map implementations.
|
|
(Though see "API" below for caveats.)
|
|
|
|
By default (you can change it via a flag to ./configure), these hash
|
|
implementations are defined in the google namespace.
|
|
|
|
API
|
|
---
|
|
The API for sparse_hash_map, dense_hash_map, sparse_hash_set, and
|
|
dense_hash_set, are a superset of the API of SGI's hash_map class.
|
|
See doc/sparse_hash_map.html, et al., for more information about the
|
|
API.
|
|
|
|
The usage of these classes differ from SGI's hash_map, and other
|
|
hashtable implementations, in the following major ways:
|
|
|
|
1) dense_hash_map requires you to set aside one key value as the
|
|
'empty bucket' value, set via the set_empty_key() method. This
|
|
*MUST* be called before you can use the dense_hash_map. It is
|
|
illegal to insert any elements into a dense_hash_map whose key is
|
|
equal to the empty-key.
|
|
|
|
2) For both dense_hash_map and sparse_hash_map, if you wish to delete
|
|
elements from the hashtable, you must set aside a key value as the
|
|
'deleted bucket' value, set via the set_deleted_key() method. If
|
|
your hash-map is insert-only, there is no need to call this
|
|
method. If you call set_deleted_key(), it is illegal to insert any
|
|
elements into a dense_hash_map or sparse_hash_map whose key is
|
|
equal to the deleted-key.
|
|
|
|
3) These hash-map implementation support I/O. See below.
|
|
|
|
There are also some smaller differences:
|
|
|
|
1) The constructor takes an optional argument that specifies the
|
|
number of elements you expect to insert into the hashtable. This
|
|
differs from SGI's hash_map implementation, which takes an optional
|
|
number of buckets.
|
|
|
|
2) erase() does not immediately reclaim memory. As a consequence,
|
|
erase() does not invalidate any iterators, making loops like this
|
|
correct:
|
|
for (it = ht.begin(); it != ht.end(); ++it)
|
|
if (...) ht.erase(it);
|
|
As another consequence, a series of erase() calls can leave your
|
|
hashtable using more memory than it needs to. The hashtable will
|
|
automatically compact at the next call to insert(), but to
|
|
manually compact a hashtable, you can call
|
|
ht.resize(0)
|
|
|
|
I/O
|
|
---
|
|
In addition to the normal hash-map operations, sparse_hash_map can
|
|
read and write hashtables to disk. (dense_hash_map also has the API,
|
|
but it has not yet been implemented, and writes will always fail.)
|
|
|
|
In the simplest case, writing a hashtable is as easy as calling two
|
|
methods on the hashtable:
|
|
ht.write_metadata(fp);
|
|
ht.write_nopointer_data(fp);
|
|
|
|
Reading in this data is equally simple:
|
|
google::sparse_hash_map<...> ht;
|
|
ht.read_metadata(fp);
|
|
ht.read_nopointer_data(fp);
|
|
|
|
The above is sufficient if the key and value do not contain any
|
|
pointers: they are basic C types or agglomorations of basic C types.
|
|
If the key and/or value do contain pointers, you can still store the
|
|
hashtable by replacing write_nopointer_data() with a custom writing
|
|
routine. See sparse_hash_map.html et al. for more information.
|
|
|
|
SPARSETABLE
|
|
-----------
|
|
In addition to the hash-map and hash-set classes, this package also
|
|
provides sparsetable.h, an array implementation that uses space
|
|
proportional to the number of elements in the array, rather than the
|
|
maximum element index. It uses very little space overhead: 1 bit per
|
|
entry. See doc/sparsetable.html for the API.
|
|
|
|
RESOURCE USAGE
|
|
--------------
|
|
* sparse_hash_map has memory overhead of about 2 bits per hash-map
|
|
entry.
|
|
* dense_hash_map has a factor of 2-3 memory overhead: if your
|
|
hashtable data takes X bytes, dense_hash_map will use 3X-4X memory
|
|
total.
|
|
|
|
Hashtables tend to double in size when resizing, creating an
|
|
additional 50% space overhead. dense_hash_map does in fact have a
|
|
significant "high water mark" memory use requirement.
|
|
sparse_hash_map, however, is written to need very little space
|
|
overhead when resizing: only a few bits per hashtable entry.
|
|
|
|
PERFORMANCE
|
|
-----------
|
|
You can compile and run the included file time_hash_map.cc to examine
|
|
the performance of sparse_hash_map, dense_hash_map, and your native
|
|
hash_map implementation on your system. One test against the
|
|
SGI hash_map implementation gave the following timing information for
|
|
a simple find() call:
|
|
SGI hash_map: 22 ns
|
|
dense_hash_map: 13 ns
|
|
sparse_hash_map: 117 ns
|
|
SGI map: 113 ns
|
|
|
|
See doc/performance.html for more detailed charts on resource usage
|
|
and performance data.
|
|
|
|
---
|
|
16 March 2005
|
|
(Last updated: 12 September 2010)
|