Commit Graph

115395 Commits

Author SHA1 Message Date
Azat Khuzhin
65dd87d0da Fix "reference binding to misaligned address" in PackedHashMap
Use separate helpers that accept/return values instead of references,
since PackedHashMap is designed for small structures anyway.

v0: fix for keys
v2: fix for values
v3: fix bitEquals
v4: fix for iterating over HashMap
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
7c8d8eeb56 Use Cell::setMapped() over separate helper insertSetMapped()
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
2996b38606 Add ability to configure maximum load factor for the HASHED/SPARSE_HASHED layout
As it turns out, HashMap/PackedHashMap works great even with a max load
factor of 0.99. By "great" I mean that at least it works faster than
google sparsehash, not to mention its friendliness to the memory
allocator (it has zero fragmentation since it works with a contiguous
memory region, in contrast to sparsehash, which does lots of
reallocs that jemalloc does not like due to its slabs).

Here is a table of different setups:

settings                         | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
-                                | -          | -          | -                     | -               | -
HASHED upstream                  | -          | -          | -                     | -               | 35GiB
SPARSE_HASHED upstream           | -          | -          | -                     | -               | 26GiB
-                                | -          | -          | -                     | -               | -
sparse_hash_map glibc hashbench  | -          | -          | -                     | -               | 17.5GiB
sparse_hash_map packed allocator | 101.878    | 231.48     | 4.32                  | -               | 17.7GiB
PackedHashMap 0.5                | 15.514     | 42.35      | 23.61                 | 20GiB           | 22GiB
hashed 0.95                      | 34.903     | 115.615    | 8.65                  | 16GiB           | 18.7GiB
**PackedHashMap 0.95**           | **19.883** | **93.6**   | **10.68**             | **10GiB**       | **12.8GiB**
PackedHashMap 0.99               | 26.113     | 83.6       | 11.96                 | 10GiB           | 12.3GiB

As the table shows, PackedHashMap with a 0.95 max_load_factor uses 2.6x less
memory than upstream SPARSE_HASHED, and it is also 2x faster for reads!

v2: fix grower
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
3698302ddb Accept float values for dictionary layouts configurations
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
8c6d691f52 Use HashTable constructor in HashSet
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
fb6f7631c2 Add ability to pass grower for HashTable during creation
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
7b5d156cc5 Optimize SPARSE_HASHED layout (by using PackedHashMap)
In case you want a dictionary optimized for memory, SPARSE_HASHED does not
always give you what you need.

Consider <UInt64, UInt16> as <Key, Value>: this pair also has 6 bytes
of padding (on amd64), so almost 40% of the space is wasted.

And because of this padding, even google::sparse_hash_map does not make
the picture better; in fact, sparse_hash_map is not very friendly to memory
allocators (especially jemalloc).

Here are some numbers for a dictionary with 1e9 elements, UInt64 as
the key and UInt16 as the value:

settings                         | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
-                                | -          | -          | -                     | -               | -
HASHED upstream                  | -          | -          | -                     | -               | 35GiB
SPARSE_HASHED upstream           | -          | -          | -                     | -               | 26GiB
-                                | -          | -          | -                     | -               | -
sparse_hash_map glibc hashbench  | -          | -          | -                     | -               | 17.5GiB
sparse_hash_map packed allocator | 101.878    | 231.48     | 4.32                  | -               | 17.7GiB
PackedHashMap                    | 15.514     | 42.35      | 23.61                 | 20GiB           | 22GiB

As you can see, PackedHashMap looks much better than HASHED, and
even better than SPARSE_HASHED, but slightly worse in memory usage than
sparse_hash_map with the packed allocator (which is done with a custom
patch to google sparse_hash_map).

v2: rebase on top of bucket_count fix
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
b44497fd4c Introduce PackedHashMap (HashMap with structure without padding)
If you have a HashMap with <UInt64, UInt16> as <Key, Value>, the 38%
padding overhead can be crucial, especially if you have tons of keys.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
c4f23e87f1 Export grower_type in HashTable
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Michael Kolupaev
e84f0895e7 Support hardlinking parts transactionally 2023-05-18 21:05:56 -07:00
Yakov Olkhovskiy
a2c3de5082
Merge pull request #49933 from ClickHouse/fix-ipv6-proto-serialization
Fix IPv6 encoding in protobuf
2023-05-18 23:02:15 -04:00
Nikolay Degterinsky
ef45956713 Fix style 2023-05-19 01:31:45 +00:00
Nikolay Degterinsky
b8be714830 Add schema inference to more table engines 2023-05-19 00:44:27 +00:00
Dmitry Novik
aea71cf1bb
Merge branch 'master' into group-by-constant-fix 2023-05-19 01:29:56 +02:00
Michael Kolupaev
8dc59c1efe Fix test_insert_same_partition_and_merge failing if one Azure request attempt fails 2023-05-18 21:40:24 +00:00
Denny Crane
e7b6056bbb test for #46128 2023-05-18 15:18:55 -03:00
Azat Khuzhin
0f7a310a67 Fix woboq codebrowser build with -Wno-poison-system-directories
woboq codebrowser uses clang tooling, which adds clang system includes
(in Linux::AddClangSystemIncludeArgs()), because neither -nostdinc nor
-nobuiltininc is set.

Later, in InitHeaderSearch::AddUnmappedPath(), it complains via
-Wpoison-system-directories about the includes it added itself, because
they start with one of the following:
- /usr/include
- /usr/local/include

The interesting thing here is that it broke only after upgrading to
llvm 16 (in #49678). The reason is that the clang 15 build has system
includes that do not trigger the warning ("/usr/lib/clang/15.0.7/include"),
while clang 16 has "/usr/include/clang/16.0.4/include".

So let's simply disable this warning, but only for woboq.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-18 18:26:05 +02:00
Azat Khuzhin
73661c3a46 Move tunings for woboq codebrowser from build.sh to cmake
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-18 18:18:30 +02:00
Amos Bird
6b4dcbd3ed
Use PROJECT_*_DIR instead of CMAKE_*_DIR. 2023-05-18 23:23:39 +08:00
Yakov Olkhovskiy
30083351f5 test fix 2023-05-18 14:42:48 +00:00
Denny Crane
94fe224935
Update partition.md 2023-05-18 10:06:59 -03:00
Sergei Trifonov
f98c337d2f
Fix stack-use-after-scope in resource manager test (#49908)
* Fix stack-use-after-scope in resource manager test

* fix
2023-05-18 14:53:46 +02:00
Kseniia Sumarokova
dd5ee930eb
Merge pull request #49914 from kssenii/fix-assertion-in-do-cleanup
Fix assertion in CacheMetadata::doCleanup
2023-05-18 12:22:49 +02:00
Kseniia Sumarokova
adebac1a92
Merge branch 'master' into fix-assertion-in-do-cleanup 2023-05-18 12:22:02 +02:00
robot-ch-test-poll2
a0ef0955da
Merge pull request #49983 from imbingo123/imbingo123-patch-modify_docs
Update grant.md
2023-05-18 10:39:49 +02:00
libin
d294ecbc16
Update grant.md
docs: Modifying grant example
2023-05-18 15:50:19 +08:00
FFFFFFFHHHHHHH
d31371adac
Merge branch 'master' into dot_product 2023-05-18 15:31:25 +08:00
Alexey Gerasimchuk
e44263d101
Merge branch 'master' into ADQM-808 2023-05-18 17:08:25 +10:00
Alexey Milovidov
86e14547d4
Merge pull request #49964 from ClickHouse/kssenii-patch-7
Follow up to #49429
2023-05-18 09:20:00 +03:00
Alexey Milovidov
5065049154
Merge pull request #49971 from azat/revert-48593-group_array_nullable
[RFC] Revert "`groupArray` returns cannot be nullable"
2023-05-18 09:17:42 +03:00
Alexey Gerasimchuk
1fb9e36b81
Merge branch 'master' into ADQM-808 2023-05-18 07:59:02 +10:00
Rich Raposa
03b5bfe218
Merge pull request #49968 from ClickHouse/reddit
Add Reddit comments to datasets
2023-05-17 15:26:29 -06:00
Kseniia Sumarokova
855c95f626
Update src/Interpreters/Cache/Metadata.cpp
Co-authored-by: Igor Nikonov <954088+devcrafter@users.noreply.github.com>
2023-05-17 22:46:09 +02:00
Yakov Olkhovskiy
612b79868b test added 2023-05-17 20:40:51 +00:00
Azat Khuzhin
e2e3a03dbe
Revert "groupArray returns cannot be nullable" 2023-05-17 22:33:30 +02:00
rfraposa
6a136897e3 Create reddit-comments.md 2023-05-17 13:23:53 -06:00
Han Fei
549af4d351 address comments 2023-05-17 21:23:32 +02:00
DanRoscigno
a1fc96953f reorder 2023-05-17 14:48:16 -04:00
Timur Solodovnikov
c7ab59302f
Set allow_experimental_query_cache setting as obsolete (#49934)
* set allow_experimental_query_cache as obsolete

* add tsolodov to trusted contributors

* CI linter

---------

Co-authored-by: Nikita Mikhaylov <mikhaylovnikitka@gmail.com>
2023-05-17 20:03:42 +02:00
Dan Roscigno
addc0c0ece
Merge branch 'master' into allow_experimental_parallel_reading_from_replicas 2023-05-17 13:20:14 -04:00
Kseniia Sumarokova
1c04085e8f
Update MergeTreeWriteAheadLog.h 2023-05-17 18:15:51 +02:00
Dan Roscigno
67b8aca910
Merge pull request #49935 from ClickHouse/thomoco-patch-3
Update postgresql.md
2023-05-17 11:25:24 -04:00
kssenii
f2dbcb5146 Better fix 2023-05-17 16:27:06 +02:00
alesapin
2b7bc19cae
Merge pull request #49911 from ClickHouse/make_test_less_flaky
Retry connection expired in test_rename_column/test.py
2023-05-17 16:03:48 +02:00
alesapin
a7c179e401
Merge branch 'master' into make_test_less_flaky 2023-05-17 15:44:24 +02:00
Han Fei
ed1d036151
Merge pull request #49884 from azat/dist-fix-async-block-processing
Fix processing pending batch for Distributed async INSERT after restart
2023-05-17 15:19:42 +02:00
Alexander Tokmakov
36c31e1d79
Improve concurrent parts removal with zero copy replication (#49630)
* improve concurrent parts removal

* fix

* fix
2023-05-17 14:07:34 +03:00
Alexander Tokmakov
c4d074a0a0
Merge pull request #48726 from ClickHouse/Follow_up_Backup_Restore_concurrency_check_node_2
Backup/Restore concurrency check on previous fails
2023-05-17 14:03:24 +03:00
Alexander Tokmakov
1e529263d0
Merge branch 'master' into Follow_up_Backup_Restore_concurrency_check_node_2 2023-05-17 13:57:50 +03:00
Vitaly Baranov
15ebbd2ed6
Merge pull request #48896 from vitlibar/write-encrypted-to-backup
BACKUP from encrypted disks must not decrypt data
2023-05-17 12:40:00 +02:00