Commit Graph

115165 Commits

Author SHA1 Message Date
Azat Khuzhin
01bf041cca Rewrite HashTableGrower{,WithPrecalculation}::set w/o ternary operators
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
634f168a74 Introduce max_size_degree for HashTableGrower{,WithPrecalculation}
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
42eac6bfbc Wrap implementation helpers into HashedDictionaryImpl namespace
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
6f351851ad Rename grower to HashTableGrowerWithPrecalculationAndMaxLoadFactor
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
1ab130132c Add more comments into HashedDictionaryCollectionType.h
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
7eba6def94 Add a comment for HashTableGrowerWithPrecalculation about load factor
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
422cbe08fe Do not use PackedHashMap for non-POD for the purposes of layout
In clang-16 the behaviour for POD types had been changed in [1], this
does not allows us to use PackedHashMap for some types.

  [1]: 277123376c

Note, that I tried to come up with a more generic solution then
enumeratic types, but failed. Though now I think that this is good,
since this shows which types are not allowed for PackedHashMap

Another option is to use -fclang-abi-compat=13.0 but I doubt it is a
good idea.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
fc19e79f50 Change coding style of declaring packed attribute in PackedHashMap
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
65dd87d0da Fix "reference binding to misaligned address" in PackedHashMap
Use separate helpers that accept/return values, instead of reference,
anyway PackedHashMap is developed for small structure.

v0: fix for keys
v2: fix for values
v3: fix bitEquals
v4: fix for iterating over HashMap
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
7c8d8eeb56 Use Cell::setMapped() over separate helper insertSetMapped()
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
2996b38606 Add ability to configure maximum load factor for the HASHED/SPARSE_HASHED layout
As it turns out, HashMap/PackedHashMap works great even with max load
factor of 0.99. By "great" I mean it least it works faster then
google sparsehash, and not to mention it's friendliness to the memory
allocator (it has zero fragmentation since it works with a continuious
memory region, in comparison to the sparsehash that doing lots of
realloc, which jemalloc does not like, due to it's slabs).

Here is a table of different setups:

settings                         | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
-                                | -          | -          | -                     | -               | -
HASHED upstream                  | -          | -          | -                     | -               | 35GiB
SPARSE_HASHED upstream           | -          | -          | -                     | -               | 26GiB
-                                | -          | -          | -                     | -               | -
sparse_hash_map glibc hashbench  | -          | -          | -                     | -               | 17.5GiB
sparse_hash_map packed allocator | 101.878    | 231.48     | 4.32                  | -               | 17.7GiB
PackedHashMap 0.5                | 15.514     | 42.35      | 23.61                 | 20GiB           | 22GiB
hashed 0.95                      | 34.903     | 115.615    | 8.65                  | 16GiB           | 18.7GiB
**PackedHashMap 0.95**           | **93.6**   | **19.883** | **10.68**             | **10GiB**       | **12.8GiB**
PackedHashMap 0.99               | 26.113     | 83.6       | 11.96                 | 10GiB           | 12.3GiB

As it shows, PackedHashMap with 0.95 max_load_factor, eats 2.6x less
memory then SPARSE_HASHED in upstream, and it also 2x faster for read!

v2: fix grower
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
3698302ddb Accept float values for dictionary layouts configurations
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
8c6d691f52 Use HashTable constructor in HashSet
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
fb6f7631c2 Add ability to pass grower for HashTable during creation
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
7b5d156cc5 Optimize SPARSE_HASHED layout (by using PackedHashMap)
In case you want dictionary optimized for memory, SPARSE_HASHED is not
always gives you what you need.

Consider the following example <UInt64, UInt16> as <Key, Value>, but
this pair will also have a 6 byte padding (on amd64), so this is almost
40% of space wastage.

And because of this padding, even google::sparse_hash_map, does not make
picture better, in fact, sparse_hash_map is not very friendly to memory
allocators (especially jemalloc).

Here are some numbers for dictionary with 1e9 elements and UInt64 as
key, and UInt16 as value:

settings                         | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
HASHED upstream                  | -          | -          | -                     | -               | 35GiB
SPARSE_HASHED upstream           | -          | -          | -                     | -               | 26GiB
-                                | -          | -          | -                     | -               | -
sparse_hash_map glibc hashbench  | -          | -          | -                     | -               | 17.5GiB
sparse_hash_map packed allocator | 101.878    | 231.48     | 4.32                  | -               | 17.7GiB
PackedHashMap                    | 15.514     | 42.35      | 23.61                 | 20GiB           | 22GiB

As you can see PackedHashMap looks way more better then HASHED, and
even better then SPARSE_HASHED, but slightly worse then sparse_hash_map
with packed allocator (it is done with a custom patch to google
sparse_hash_map).

v2: rebase on top of bucket_count fix
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
b44497fd4c Introduce PackedHashMap (HashMap with structure without padding)
In case of you have HashMap with <UInt64, UInt16> as <Key, Value> the
overhead of 38% can be crutial, especially if you have tons of keys.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
c4f23e87f1 Export grower_type in HashTable
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Michael Kolupaev
e84f0895e7 Support hardlinking parts transactionally 2023-05-18 21:05:56 -07:00
Yakov Olkhovskiy
a2c3de5082
Merge pull request #49933 from ClickHouse/fix-ipv6-proto-serialization
Fix IPv6 encoding in protobuf
2023-05-18 23:02:15 -04:00
Yakov Olkhovskiy
30083351f5 test fix 2023-05-18 14:42:48 +00:00
Sergei Trifonov
f98c337d2f
Fix stack-use-after-scope in resource manager test (#49908)
* Fix stack-use-after-scope in resource manager test

* fix
2023-05-18 14:53:46 +02:00
Kseniia Sumarokova
dd5ee930eb
Merge pull request #49914 from kssenii/fix-assertion-in-do-cleanup
Fix assertion in CacheMetadata::doCleanup
2023-05-18 12:22:49 +02:00
Kseniia Sumarokova
adebac1a92
Merge branch 'master' into fix-assertion-in-do-cleanup 2023-05-18 12:22:02 +02:00
robot-ch-test-poll2
a0ef0955da
Merge pull request #49983 from imbingo123/imbingo123-patch-modify_docs
Update grant.md
2023-05-18 10:39:49 +02:00
libin
d294ecbc16
Update grant.md
docs: Modifying grant example
2023-05-18 15:50:19 +08:00
Alexey Milovidov
86e14547d4
Merge pull request #49964 from ClickHouse/kssenii-patch-7
Follow up to #49429
2023-05-18 09:20:00 +03:00
Alexey Milovidov
5065049154
Merge pull request #49971 from azat/revert-48593-group_array_nullable
[RFC] Revert "`groupArray` returns cannot be nullable"
2023-05-18 09:17:42 +03:00
Rich Raposa
03b5bfe218
Merge pull request #49968 from ClickHouse/reddit
Add Reddit comments to datasets
2023-05-17 15:26:29 -06:00
Kseniia Sumarokova
855c95f626
Update src/Interpreters/Cache/Metadata.cpp
Co-authored-by: Igor Nikonov <954088+devcrafter@users.noreply.github.com>
2023-05-17 22:46:09 +02:00
Yakov Olkhovskiy
612b79868b test added 2023-05-17 20:40:51 +00:00
Azat Khuzhin
e2e3a03dbe
Revert "groupArray returns cannot be nullable" 2023-05-17 22:33:30 +02:00
rfraposa
6a136897e3 Create reddit-comments.md 2023-05-17 13:23:53 -06:00
Timur Solodovnikov
c7ab59302f
Set allow_experimental_query_cache setting as obsolete (#49934)
* set allow_experimental_query_cache as obsolete

* add tsolodov to trusted contributors

* CI linter

---------

Co-authored-by: Nikita Mikhaylov <mikhaylovnikitka@gmail.com>
2023-05-17 20:03:42 +02:00
Kseniia Sumarokova
1c04085e8f
Update MergeTreeWriteAheadLog.h 2023-05-17 18:15:51 +02:00
Dan Roscigno
67b8aca910
Merge pull request #49935 from ClickHouse/thomoco-patch-3
Update postgresql.md
2023-05-17 11:25:24 -04:00
kssenii
f2dbcb5146 Better fix 2023-05-17 16:27:06 +02:00
alesapin
2b7bc19cae
Merge pull request #49911 from ClickHouse/make_test_less_flaky
Retry connection expired in test_rename_column/test.py
2023-05-17 16:03:48 +02:00
alesapin
a7c179e401
Merge branch 'master' into make_test_less_flaky 2023-05-17 15:44:24 +02:00
Han Fei
ed1d036151
Merge pull request #49884 from azat/dist-fix-async-block-processing
Fix processing pending batch for Distributed async INSERT after restart
2023-05-17 15:19:42 +02:00
Alexander Tokmakov
36c31e1d79
Improve concurrent parts removal with zero copy replication (#49630)
* improve concurrent parts removal

* fix

* fix
2023-05-17 14:07:34 +03:00
Alexander Tokmakov
c4d074a0a0
Merge pull request #48726 from ClickHouse/Follow_up_Backup_Restore_concurrency_check_node_2
Back/Restore concurrency check on previous fails
2023-05-17 14:03:24 +03:00
Alexander Tokmakov
1e529263d0
Merge branch 'master' into Follow_up_Backup_Restore_concurrency_check_node_2 2023-05-17 13:57:50 +03:00
Vitaly Baranov
15ebbd2ed6
Merge pull request #48896 from vitlibar/write-encrypted-to-backup
BACKUP from encrypted disks must not decrypt data
2023-05-17 12:40:00 +02:00
Vitaly Baranov
6c8a923c9d
Merge branch 'master' into write-encrypted-to-backup 2023-05-17 12:37:05 +02:00
Kseniia Sumarokova
ac048cbbff
Merge pull request #49925 from kssenii/add-more-logging-for-cache
Add some logging
2023-05-17 12:29:40 +02:00
Kseniia Sumarokova
edceda494d
Merge branch 'master' into add-more-logging-for-cache 2023-05-17 12:24:59 +02:00
Kseniia Sumarokova
3787b7f127
Update Metadata.cpp 2023-05-17 12:16:18 +02:00
Vitaly Baranov
f4ac4c3f9d Corrections after review. 2023-05-17 03:23:16 +02:00
Yakov Olkhovskiy
0a44a69dc8 remove unnecessary header 2023-05-17 00:22:13 +00:00
Yakov Olkhovskiy
282297b677 binary encoding of IPv6 in protobuf 2023-05-16 23:46:01 +00:00