With this new layout, sparsehash will be used instead of the default
HashMap. sparsehash is more memory efficient, but it is also slower.
So in a nutshell:
- HashMap uses ~2x more memory than sparse_hash_map
- HashMap is ~2-2.5x faster than sparse_hash_map
(tested on lots of inputs; the closest to production was a
dictionary with 600KK (600 million) hashes and UInt16 values)
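
To illustrate where this trade-off comes from, here is a minimal sketch
of the general "bitmap plus packed array" idea behind sparse tables. It
is NOT the sparsehash code: SparseGroup, the 64-slot group size and all
method names are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified "sparse group": 64 logical slots, but storage
// is allocated only for the occupied ones. `slot` must be in [0, 64).
class SparseGroup
{
public:
    bool contains(unsigned slot) const { return ((bitmap >> slot) & 1u) != 0; }

    // Store a value in a logical slot. Filling a previously empty slot
    // shifts the packed tail, which makes writes slower than in a dense table.
    void set(unsigned slot, std::uint16_t value)
    {
        const std::size_t pos = rank(slot);
        if (contains(slot))
        {
            packed[pos] = value;
        }
        else
        {
            packed.insert(packed.begin() + static_cast<std::ptrdiff_t>(pos), value);
            bitmap |= (std::uint64_t{1} << slot);
        }
    }

    // Returns nullptr for an empty slot.
    const std::uint16_t * find(unsigned slot) const
    {
        return contains(slot) ? &packed[rank(slot)] : nullptr;
    }

    std::size_t storedValues() const { return packed.size(); }

private:
    // Number of occupied slots strictly below `slot`; this is the index
    // of the slot's value inside the packed array.
    std::size_t rank(unsigned slot) const
    {
        std::uint64_t below = bitmap & ((std::uint64_t{1} << slot) - 1);
        std::size_t count = 0;
        for (; below != 0; below &= below - 1)
            ++count;
        return count;
    }

    std::uint64_t bitmap = 0;           // one bit per logical slot
    std::vector<std::uint16_t> packed;  // values of occupied slots only
};
```

Empty slots cost one bit instead of a full cell, which is the memory
saving. Every lookup pays for the rank computation and every insert may
move elements, which is the slowdown.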
TODO:
- fix allocated memory calculation
- getBufferSizeInBytes/getBufferSizeInCells interface
- benchmarks
v0: replace HashMap with google::sparse_hash_map
v2: use google::sparse_hash_map only when <sparse> is set to true
v3: replace attributes with different layout
v4: use ch hash over std::hash
clickhouse_common_io includes new_delete.cpp, which uses memory.h, which
uses sdallocx (jemalloc).
And since -Wl,--no-undefined is set, undefined symbols are not
allowed, hence clickhouse_common_io must know about the sdallocx symbol.
For the default build (-DUNBUNDLED=OFF) everything is good, because
jemalloc is static, and clickhouse_common_io is linked with libcommon
(which is linked with jemalloc).
But if jemalloc is shared, and clickhouse_common_io and libcommon
are different shared libraries, then clickhouse_common_io must be linked
with jemalloc, otherwise you will get an "undefined reference to
sdallocx" error.
This can be reproduced using the following build configuration:
-DUSE_STATIC_LIBRARIES=OFF -DCLICKHOUSE_SPLIT_BINARY=ON -DSPLIT_SHARED_LIBRARIES=ON -DUNBUNDLED=ON
Provided that you have a system-wide jemalloc>=4 (see memory.h).
Refs: https://github.com/yandex/ClickHouse/pull/6878#discussion_r324902295
v2: do not link jemalloc if it is static
Scenarios that use Arena::allocContinue may waste a quadratic amount of
memory and perform a quadratic amount of copying once the memory range
size reaches Arena's linear allocation threshold. To alleviate this,
make sure that the next memory chunk allocated by allocContinue is at
least linear_growth_threshold bytes bigger than the previous one, so
that we don't reallocate and copy that often.
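
The effect can be sketched with a small simulation. This is not the
Arena code: simulate(), GrowthStats and the min_growth parameter are
hypothetical names. It models a contiguous range that grows by small
appends and is copied to a new chunk whenever it no longer fits, and it
compares "grow to exactly what is needed" (min_growth = 0) with "grow by
at least a fixed threshold".

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

struct GrowthStats
{
    std::uint64_t reallocations = 0;  // how many times the range was moved
    std::uint64_t bytes_copied = 0;   // total bytes copied by those moves
};

// Simulate a contiguous range that grows by `step` bytes `appends` times.
// When the range does not fit into the current chunk, a new chunk is
// allocated and the existing bytes are copied into it.
// `min_growth` (an assumption of this sketch) is the minimum number of
// bytes by which each new chunk exceeds the previous one.
GrowthStats simulate(std::size_t appends, std::size_t step, std::size_t min_growth)
{
    GrowthStats stats;
    std::size_t capacity = 0;
    std::size_t used = 0;

    for (std::size_t i = 0; i < appends; ++i)
    {
        const std::size_t needed = used + step;
        if (needed > capacity)
        {
            // New chunk: large enough for the range, and at least
            // `min_growth` bytes bigger than the previous chunk.
            capacity = std::max(needed, capacity + min_growth);
            if (used > 0)
            {
                ++stats.reallocations;
                stats.bytes_copied += used;
            }
        }
        used = needed;
    }
    return stats;
}
```

With 1000 one-byte appends, simulate(1000, 1, 0) moves the range 999
times and copies 499500 bytes, while simulate(1000, 1, 100) moves it 9
times and copies 4500 bytes.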
- use sparsehash-c11 over libsparsehash
- fix typos in find_sparsehash and users of the vars (s/SPARCE/SPARSE/)
- drop libsparsehash-dev from docker images (but keep it for the unbundled build)
- use ::google over GOOGLE_NAMESPACE