ClickHouse/tests
Azat Khuzhin 2996b38606 Add ability to configure maximum load factor for the HASHED/SPARSE_HASHED layout
As it turns out, HashMap/PackedHashMap works great even with a max load
factor of 0.99. By "great" I mean that at least it works faster than
google sparsehash, not to mention its friendliness to the memory
allocator (it has zero fragmentation since it works with a contiguous
memory region, in comparison to sparsehash, which does lots of
realloc, which jemalloc does not like due to its slabs).
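
To illustrate why a higher max load factor saves memory, here is a minimal sketch. It assumes an open-addressing table whose bucket count grows in powers of two; that growth policy and the helper name `buckets_needed` are assumptions made for illustration, not taken from this commit.

```python
def buckets_needed(num_keys: int, max_load_factor: float) -> int:
    """Smallest power-of-two bucket count that keeps
    num_keys / buckets <= max_load_factor (hypothetical growth policy)."""
    if not 0.0 < max_load_factor < 1.0:
        raise ValueError("max_load_factor must be between 0 and 1")
    capacity = 1
    # Double the table until the keys fit under the allowed load.
    while num_keys > capacity * max_load_factor:
        capacity *= 2
    return capacity


if __name__ == "__main__":
    num_keys = 900_000  # arbitrary example size
    for max_load_factor in (0.5, 0.95, 0.99):
        capacity = buckets_needed(num_keys, max_load_factor)
        print(f"max_load_factor={max_load_factor}: {capacity} buckets, "
              f"actual load {num_keys / capacity:.2f}")
```

Under these assumptions, a max load factor of 0.5 forces one extra doubling compared to 0.95, while 0.95 and 0.99 can land on the same bucket count.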

Here is a table of different setups:

settings                         | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
-                                | -          | -          | -                     | -               | -
HASHED upstream                  | -          | -          | -                     | -               | 35GiB
SPARSE_HASHED upstream           | -          | -          | -                     | -               | 26GiB
-                                | -          | -          | -                     | -               | -
sparse_hash_map glibc hashbench  | -          | -          | -                     | -               | 17.5GiB
sparse_hash_map packed allocator | 101.878    | 231.48     | 4.32                  | -               | 17.7GiB
PackedHashMap 0.5                | 15.514     | 42.35      | 23.61                 | 20GiB           | 22GiB
hashed 0.95                      | 34.903     | 115.615    | 8.65                  | 16GiB           | 18.7GiB
**PackedHashMap 0.95**           | **19.883** | **93.6**   | **10.68**             | **10GiB**       | **12.8GiB**
PackedHashMap 0.99               | 26.113     | 83.6       | 11.96                 | 10GiB           | 12.3GiB

As it shows, PackedHashMap with a max_load_factor of 0.95 eats 2.6x less
memory than SPARSE_HASHED in upstream, and it is also 2x faster for reads!
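
The arithmetic behind these figures can be sketched as follows. Two things here are assumptions, not statements from the commit: that the 2.6x figure compares the 26GiB RSS of SPARSE_HASHED upstream with the 10GiB bytes_allocated of PackedHashMap 0.95, and that each read benchmark scanned the same number of rows (several table rows suggest roughly one billion).

```python
# (read seconds, read million rows/s) copied from several rows of the table.
READ_RESULTS = {
    "sparse_hash_map packed allocator": (231.48, 4.32),
    "PackedHashMap 0.5": (42.35, 23.61),
    "hashed 0.95": (115.615, 8.65),
    "PackedHashMap 0.99": (83.6, 11.96),
}


def implied_rows_millions(read_sec: float, mrows_per_sec: float) -> float:
    """Rows read (in millions) implied by a duration and a throughput."""
    return read_sec * mrows_per_sec


def memory_ratio(baseline_gib: float, candidate_gib: float) -> float:
    """How many times less memory the candidate uses than the baseline."""
    return baseline_gib / candidate_gib


if __name__ == "__main__":
    for name, (read_sec, rate) in READ_RESULTS.items():
        print(f"{name}: ~{implied_rows_millions(read_sec, rate):.0f}M rows")
    # Assumed reading: 26GiB RSS upstream vs 10GiB bytes_allocated.
    print(f"memory ratio: {memory_ratio(26, 10):.1f}x")
```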

v2: fix grower
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
..
ci Set allow_experimental_query_cache setting as obsolete (#49934) 2023-05-17 20:03:42 +02:00
config
fuzz
instructions
integration Support hardlinking parts transactionally 2023-05-18 21:05:56 -07:00
jepsen.clickhouse
perf_drafts
performance Add ability to configure maximum load factor for the HASHED/SPARSE_HASHED layout 2023-05-19 06:07:21 +02:00
queries Add ability to configure maximum load factor for the HASHED/SPARSE_HASHED layout 2023-05-19 06:07:21 +02:00
sqllogic
.gitignore
.rgignore
broken_tests.txt Merge pull request #49800 from ClickHouse/fix-adding-cast 2023-05-16 17:05:02 +02:00
clickhouse-test
CMakeLists.txt
msan_suppressions.txt
tsan_suppressions.txt
ubsan_suppressions.txt