ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-11-24 08:32:02 +00:00

Author	SHA1	Message	Date
Azat Khuzhin	01bf041cca	Rewrite HashTableGrower{,WithPrecalculation}::set w/o ternary operators Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	634f168a74	Introduce max_size_degree for HashTableGrower{,WithPrecalculation} Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
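The two grower commits above (max_size_degree, and set() rewritten without ternary operators) concern how a power-of-two hash table picks its buffer size. Below is a minimal sketch of that idea. It assumes a table of 2^size_degree cells that is full at 50% load, and it needs C++17 for std::clamp. SketchGrower, its constants and its exact set() rule are hypothetical. They are not ClickHouse's HashTableGrower code.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical power-of-two grower (illustrative, not ClickHouse's HashTableGrower):
// the buffer holds 2^size_degree cells and is considered full at 50% load.
struct SketchGrower
{
    static constexpr std::size_t initial_size_degree = 8;
    static constexpr std::size_t max_size_degree = 30; // assumed cap on growth

    std::size_t size_degree = initial_size_degree;

    std::size_t bufSize() const { return std::size_t(1) << size_degree; }
    std::size_t maxFill() const { return std::size_t(1) << (size_degree - 1); }
    bool overflow(std::size_t elems) const { return elems > maxFill(); }

    // Double the buffer, but never grow past max_size_degree.
    void increaseSize() { size_degree = std::min(size_degree + 1, max_size_degree); }

    // Pick the smallest degree whose maxFill() holds num_elems, then clamp it
    // to [initial_size_degree, max_size_degree] without any ternary operators.
    void set(std::size_t num_elems)
    {
        std::size_t ceil_log2 = 0;
        while (ceil_log2 < 63 && (std::size_t(1) << ceil_log2) < num_elems)
            ++ceil_log2;
        size_degree = std::clamp(ceil_log2 + 1, initial_size_degree, max_size_degree);
    }
};
```

With std::min and std::clamp, the upper bound lives in one constant instead of being repeated across nested conditionals. This is one possible reading of those commits, not their diff.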
Azat Khuzhin	42eac6bfbc	Wrap implementation helpers into HashedDictionaryImpl namespace Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	6f351851ad	Rename grower to HashTableGrowerWithPrecalculationAndMaxLoadFactor Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	1ab130132c	Add more comments into HashedDictionaryCollectionType.h Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	7eba6def94	Add a comment for HashTableGrowerWithPrecalculation about load factor Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	422cbe08fe	Do not use PackedHashMap for non-POD for the purposes of layout In clang-16 the behaviour for POD types was changed in [1], which does not allow us to use PackedHashMap for some types. [1]: `277123376c` Note that I tried to come up with a more generic solution than enumerating types, but failed. Though now I think that this is good, since it shows which types are not allowed for PackedHashMap. Another option is to use -fclang-abi-compat=13.0, but I doubt it is a good idea. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	fc19e79f50	Change coding style of declaring packed attribute in PackedHashMap Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	65dd87d0da	Fix "reference binding to misaligned address" in PackedHashMap Use separate helpers that accept/return values instead of references; PackedHashMap is intended for small structures anyway. v0: fix for keys v2: fix for values v3: fix bitEquals v4: fix for iterating over HashMap Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
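The misalignment problem fixed above can be shown with a hypothetical packed cell. In an array of 10-byte cells, the UInt64 key of every odd cell starts at an address that is not 8-byte aligned. Code should therefore read and write members by value rather than bind references to them. This is a sketch that assumes GCC/Clang's packed attribute. PackedCell, getKey and setValue are illustrative names, not PackedHashMap's helpers.

```cpp
#include <cstdint>

// Hypothetical 10-byte cell: no padding, so `key` may sit at a misaligned address.
struct __attribute__((packed)) PackedCell
{
    uint64_t key;
    uint16_t value;
};

// Reading the member through the packed struct lets the compiler emit a load
// that tolerates misalignment; the result is handed back by value.
inline uint64_t getKey(const PackedCell & cell)
{
    return cell.key;
}

// Likewise, take the new value by value and store it directly into the cell.
inline void setValue(PackedCell & cell, uint16_t value)
{
    cell.value = value;
}

// Avoid binding a reference such as `const uint64_t & k = cell.key;`:
// it may refer to a misaligned address.
```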
Azat Khuzhin	7c8d8eeb56	Use Cell::setMapped() over separate helper insertSetMapped() Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	2996b38606	Add ability to configure maximum load factor for the HASHED/SPARSE_HASHED layout As it turns out, HashMap/PackedHashMap works great even with a max load factor of 0.99. By "great" I mean that at least it works faster than google sparsehash, not to mention its friendliness to the memory allocator (it has zero fragmentation since it works with a contiguous memory region, in comparison to sparsehash, which does lots of reallocs that jemalloc does not like due to its slabs). Here is a table of different setups:
settings | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
- | - | - | - | - | -
HASHED upstream | - | - | - | - | 35GiB
SPARSE_HASHED upstream | - | - | - | - | 26GiB
- | - | - | - | - | -
sparse_hash_map glibc hashbench | - | - | - | - | 17.5GiB
sparse_hash_map packed allocator | 101.878 | 231.48 | 4.32 | - | 17.7GiB
PackedHashMap 0.5 | 15.514 | 42.35 | 23.61 | 20GiB | 22GiB
hashed 0.95 | 34.903 | 115.615 | 8.65 | 16GiB | 18.7GiB
PackedHashMap 0.95 | 19.883 | 93.6 | 10.68 | 10GiB | 12.8GiB
PackedHashMap 0.99 | 26.113 | 83.6 | 11.96 | 10GiB | 12.3GiB
As it shows, PackedHashMap with a 0.95 max_load_factor eats 2.6x less memory than upstream SPARSE_HASHED, and it is also 2x faster for reads! v2: fix grower Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
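The bytes_allocated column above can be roughly reproduced with back-of-the-envelope arithmetic. This sketch assumes the following:
- The cell array has a power-of-two size and must satisfy elements <= capacity * max_load_factor.
- Cells are 10 bytes for packed <UInt64, UInt16> and 16 bytes for the padded HASHED layout.

minCapacity and cellBytes are hypothetical helpers, and the model ignores every other source of allocation.

```cpp
#include <cstdint>

// Hypothetical model (not ClickHouse's grower): smallest power-of-two cell
// count whose load stays at or below max_load_factor for `elements` keys.
constexpr uint64_t minCapacity(uint64_t elements, double max_load_factor)
{
    uint64_t capacity = 1;
    while (static_cast<double>(capacity) * max_load_factor < static_cast<double>(elements))
        capacity <<= 1;
    return capacity;
}

// Bytes taken by the cell array alone, for a given per-cell size.
constexpr uint64_t cellBytes(uint64_t elements, double max_load_factor, uint64_t cell_size)
{
    return minCapacity(elements, max_load_factor) * cell_size;
}
```

For 1e9 elements, this model gives 2^31 cells at a load factor of 0.5, which is 20 GiB of 10-byte cells. At both 0.95 and 0.99 it gives 2^30 cells: 10 GiB of 10-byte cells, or 16 GiB of 16-byte cells. These figures agree with the bytes_allocated values in the table. The model also suggests why raising the load factor from 0.95 to 0.99 does not shrink the allocation for this row count: both settings fit in the same power of two.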
Azat Khuzhin	3698302ddb	Accept float values for dictionary layouts configurations Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	8c6d691f52	Use HashTable constructor in HashSet Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	fb6f7631c2	Add ability to pass grower for HashTable during creation Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	7b5d156cc5	Optimize SPARSE_HASHED layout (by using PackedHashMap) In case you want a dictionary optimized for memory, SPARSE_HASHED does not always give you what you need. Consider <UInt64, UInt16> as <Key, Value>: this pair will also have 6 bytes of padding (on amd64), so almost 40% of the space (6 of 16 bytes) is wasted. And because of this padding, even google::sparse_hash_map does not make the picture better; in fact, sparse_hash_map is not very friendly to memory allocators (especially jemalloc). Here are some numbers for a dictionary with 1e9 elements, UInt64 as key and UInt16 as value:
settings | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
- | - | - | - | - | -
HASHED upstream | - | - | - | - | 35GiB
SPARSE_HASHED upstream | - | - | - | - | 26GiB
- | - | - | - | - | -
sparse_hash_map glibc hashbench | - | - | - | - | 17.5GiB
sparse_hash_map packed allocator | 101.878 | 231.48 | 4.32 | - | 17.7GiB
PackedHashMap | 15.514 | 42.35 | 23.61 | 20GiB | 22GiB
As you can see, PackedHashMap looks much better than HASHED, and even better than SPARSE_HASHED, but slightly worse than sparse_hash_map with a packed allocator (done with a custom patch to google sparse_hash_map). v2: rebase on top of bucket_count fix Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	b44497fd4c	Introduce PackedHashMap (HashMap with structure without padding) In case you have a HashMap with <UInt64, UInt16> as <Key, Value>, the overhead of 38% can be crucial, especially if you have tons of keys. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
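The roughly 38% overhead described for <UInt64, UInt16> comes from struct padding. Below is a minimal sketch of it. It assumes a typical 64-bit target such as amd64 or aarch64, and GCC/Clang's __attribute__((packed)). PaddedCell and PackedCell are illustrative names, not types from ClickHouse's HashMap/PackedHashMap.

```cpp
#include <cstdint>

// Natural layout: the 8-byte key gives the struct 8-byte alignment, so the
// compiler adds 6 bytes of tail padding after the 2-byte value (sizeof == 16).
struct PaddedCell
{
    uint64_t key;
    uint16_t value;
};

// Packed layout (GCC/Clang attribute): 8 + 2 = 10 bytes, no padding.
struct __attribute__((packed)) PackedCell
{
    uint64_t key;
    uint16_t value;
};

// Share of each padded cell that is pure padding, in percent.
constexpr double paddingOverheadPercent()
{
    return 100.0 * static_cast<double>(sizeof(PaddedCell) - sizeof(PackedCell))
        / static_cast<double>(sizeof(PaddedCell));
}
```

6 of 16 bytes is 37.5%, which is the 38% quoted above. The price of packing is that members can end up misaligned, which is the issue behind the "reference binding to misaligned address" fix listed earlier in this log.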
Azat Khuzhin	c4f23e87f1	Export grower_type in HashTable Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Michael Kolupaev	e84f0895e7	Support hardlinking parts transactionally	2023-05-18 21:05:56 -07:00
Yakov Olkhovskiy	a2c3de5082	Merge pull request #49933 from ClickHouse/fix-ipv6-proto-serialization Fix IPv6 encoding in protobuf	2023-05-18 23:02:15 -04:00
Yakov Olkhovskiy	30083351f5	test fix	2023-05-18 14:42:48 +00:00
Sergei Trifonov	f98c337d2f	Fix stack-use-after-scope in resource manager test (#49908) * Fix stack-use-after-scope in resource manager test * fix	2023-05-18 14:53:46 +02:00
Kseniia Sumarokova	dd5ee930eb	Merge pull request #49914 from kssenii/fix-assertion-in-do-cleanup Fix assertion in CacheMetadata::doCleanup	2023-05-18 12:22:49 +02:00
Kseniia Sumarokova	adebac1a92	Merge branch 'master' into fix-assertion-in-do-cleanup	2023-05-18 12:22:02 +02:00
robot-ch-test-poll2	a0ef0955da	Merge pull request #49983 from imbingo123/imbingo123-patch-modify_docs Update grant.md	2023-05-18 10:39:49 +02:00
libin	d294ecbc16	Update grant.md docs: Modifying grant example	2023-05-18 15:50:19 +08:00
Alexey Milovidov	86e14547d4	Merge pull request #49964 from ClickHouse/kssenii-patch-7 Follow up to #49429	2023-05-18 09:20:00 +03:00
Alexey Milovidov	5065049154	Merge pull request #49971 from azat/revert-48593-group_array_nullable [RFC] Revert "`groupArray` returns cannot be nullable"	2023-05-18 09:17:42 +03:00
Rich Raposa	03b5bfe218	Merge pull request #49968 from ClickHouse/reddit Add Reddit comments to datasets	2023-05-17 15:26:29 -06:00
Kseniia Sumarokova	855c95f626	Update src/Interpreters/Cache/Metadata.cpp Co-authored-by: Igor Nikonov <954088+devcrafter@users.noreply.github.com>	2023-05-17 22:46:09 +02:00
Yakov Olkhovskiy	612b79868b	test added	2023-05-17 20:40:51 +00:00
Azat Khuzhin	e2e3a03dbe	Revert "`groupArray` returns cannot be nullable"	2023-05-17 22:33:30 +02:00
rfraposa	6a136897e3	Create reddit-comments.md	2023-05-17 13:23:53 -06:00
Timur Solodovnikov	c7ab59302f	Set allow_experimental_query_cache setting as obsolete (#49934) * set allow_experimental_query_cache as obsolete * add tsolodov to trusted contributors * CI linter --------- Co-authored-by: Nikita Mikhaylov <mikhaylovnikitka@gmail.com>	2023-05-17 20:03:42 +02:00
Kseniia Sumarokova	1c04085e8f	Update MergeTreeWriteAheadLog.h	2023-05-17 18:15:51 +02:00
Dan Roscigno	67b8aca910	Merge pull request #49935 from ClickHouse/thomoco-patch-3 Update postgresql.md	2023-05-17 11:25:24 -04:00
kssenii	f2dbcb5146	Better fix	2023-05-17 16:27:06 +02:00
alesapin	2b7bc19cae	Merge pull request #49911 from ClickHouse/make_test_less_flaky Retry connection expired in test_rename_column/test.py	2023-05-17 16:03:48 +02:00
alesapin	a7c179e401	Merge branch 'master' into make_test_less_flaky	2023-05-17 15:44:24 +02:00
Han Fei	ed1d036151	Merge pull request #49884 from azat/dist-fix-async-block-processing Fix processing pending batch for Distributed async INSERT after restart	2023-05-17 15:19:42 +02:00
Alexander Tokmakov	36c31e1d79	Improve concurrent parts removal with zero copy replication (#49630) * improve concurrent parts removal * fix * fix	2023-05-17 14:07:34 +03:00
Alexander Tokmakov	c4d074a0a0	Merge pull request #48726 from ClickHouse/Follow_up_Backup_Restore_concurrency_check_node_2 Back/Restore concurrency check on previous fails	2023-05-17 14:03:24 +03:00
Alexander Tokmakov	1e529263d0	Merge branch 'master' into Follow_up_Backup_Restore_concurrency_check_node_2	2023-05-17 13:57:50 +03:00
Vitaly Baranov	15ebbd2ed6	Merge pull request #48896 from vitlibar/write-encrypted-to-backup BACKUP from encrypted disks must not decrypt data	2023-05-17 12:40:00 +02:00
Vitaly Baranov	6c8a923c9d	Merge branch 'master' into write-encrypted-to-backup	2023-05-17 12:37:05 +02:00
Kseniia Sumarokova	ac048cbbff	Merge pull request #49925 from kssenii/add-more-logging-for-cache Add some logging	2023-05-17 12:29:40 +02:00
Kseniia Sumarokova	edceda494d	Merge branch 'master' into add-more-logging-for-cache	2023-05-17 12:24:59 +02:00
Kseniia Sumarokova	3787b7f127	Update Metadata.cpp	2023-05-17 12:16:18 +02:00
Vitaly Baranov	f4ac4c3f9d	Corrections after review.	2023-05-17 03:23:16 +02:00
Yakov Olkhovskiy	0a44a69dc8	remove unnecessary header	2023-05-17 00:22:13 +00:00
Yakov Olkhovskiy	282297b677	binary encoding of IPv6 in protobuf	2023-05-16 23:46:01 +00:00


115165 Commits