ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-11-24 00:22:29 +00:00

Author	SHA1	Message	Date
Azat Khuzhin	65dd87d0da	Fix "reference binding to misaligned address" in PackedHashMap Use separate helpers that accept/return values instead of references; PackedHashMap is designed for small structures anyway. v0: fix for keys v2: fix for values v3: fix bitEquals v4: fix for iterating over HashMap Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	7c8d8eeb56	Use Cell::setMapped() over separate helper insertSetMapped() Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	2996b38606	Add ability to configure maximum load factor for the HASHED/SPARSE_HASHED layout As it turns out, HashMap/PackedHashMap works great even with a max load factor of 0.99. By "great" I mean that at least it works faster than google sparsehash, not to mention its friendliness to the memory allocator (it has zero fragmentation since it works with a contiguous memory region, in comparison to sparsehash, which does lots of realloc, which jemalloc does not like due to its slabs). Here is a table of different setups: settings \| load (sec) \| read (sec) \| read (million rows/s) \| bytes_allocated \| RSS - \| - \| - \| - \| - \| - HASHED upstream \| - \| - \| - \| - \| 35GiB SPARSE_HASHED upstream \| - \| - \| - \| - \| 26GiB - \| - \| - \| - \| - \| - sparse_hash_map glibc hashbench \| - \| - \| - \| - \| 17.5GiB sparse_hash_map packed allocator \| 101.878 \| 231.48 \| 4.32 \| - \| 17.7GiB PackedHashMap 0.5 \| 15.514 \| 42.35 \| 23.61 \| 20GiB \| 22GiB hashed 0.95 \| 34.903 \| 115.615 \| 8.65 \| 16GiB \| 18.7GiB PackedHashMap 0.95 \| 19.883 \| 93.6 \| 10.68 \| 10GiB \| 12.8GiB PackedHashMap 0.99 \| 26.113 \| 83.6 \| 11.96 \| 10GiB \| 12.3GiB As it shows, PackedHashMap with 0.95 max_load_factor eats 2.6x less memory than SPARSE_HASHED in upstream, and it is also 2x faster for read! v2: fix grower Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	3698302ddb	Accept float values for dictionary layouts configurations Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	8c6d691f52	Use HashTable constructor in HashSet Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	fb6f7631c2	Add ability to pass grower for HashTable during creation Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	7b5d156cc5	Optimize SPARSE_HASHED layout (by using PackedHashMap) If you want a dictionary optimized for memory, SPARSE_HASHED does not always give you what you need. Consider the following example: <UInt64, UInt16> as <Key, Value>. This pair will also have 6 bytes of padding (on amd64), so almost 40% of the space is wasted. And because of this padding, even google::sparse_hash_map does not improve the picture; in fact, sparse_hash_map is not very friendly to memory allocators (especially jemalloc). Here are some numbers for a dictionary with 1e9 elements, UInt64 as key and UInt16 as value: settings \| load (sec) \| read (sec) \| read (million rows/s) \| bytes_allocated \| RSS HASHED upstream \| - \| - \| - \| - \| 35GiB SPARSE_HASHED upstream \| - \| - \| - \| - \| 26GiB - \| - \| - \| - \| - \| - sparse_hash_map glibc hashbench \| - \| - \| - \| - \| 17.5GiB sparse_hash_map packed allocator \| 101.878 \| 231.48 \| 4.32 \| - \| 17.7GiB PackedHashMap \| 15.514 \| 42.35 \| 23.61 \| 20GiB \| 22GiB As you can see, PackedHashMap looks much better than HASHED, and even better than SPARSE_HASHED, but slightly worse than sparse_hash_map with the packed allocator (which is done with a custom patch to google sparse_hash_map). v2: rebase on top of bucket_count fix Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	b44497fd4c	Introduce PackedHashMap (HashMap with structure without padding) If you have a HashMap with <UInt64, UInt16> as <Key, Value>, the overhead of 38% can be crucial, especially if you have tons of keys. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	c4f23e87f1	Export grower_type in HashTable Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Michael Kolupaev	e84f0895e7	Support hardlinking parts transactionally	2023-05-18 21:05:56 -07:00
Yakov Olkhovskiy	a2c3de5082	Merge pull request #49933 from ClickHouse/fix-ipv6-proto-serialization Fix IPv6 encoding in protobuf	2023-05-18 23:02:15 -04:00
Nikolay Degterinsky	ef45956713	Fix style	2023-05-19 01:31:45 +00:00
Nikolay Degterinsky	b8be714830	Add schema inference to more table engines	2023-05-19 00:44:27 +00:00
Dmitry Novik	aea71cf1bb	Merge branch 'master' into group-by-constant-fix	2023-05-19 01:29:56 +02:00
Michael Kolupaev	8dc59c1efe	Fix test_insert_same_partition_and_merge failing if one Azure request attempt fails	2023-05-18 21:40:24 +00:00
Denny Crane	e7b6056bbb	test for #46128	2023-05-18 15:18:55 -03:00
Azat Khuzhin	0f7a310a67	Fix woboq codebrowser build with -Wno-poison-system-directories woboq codebrowser uses clang tooling, which adds clang system includes (in Linux::AddClangSystemIncludeArgs()), because neither of (-nostdinc, -nobuiltininc) is set. Later it complains with -Wpoison-system-directories about the includes it added itself in InitHeaderSearch::AddUnmappedPath(), because they start with one of the following: - /usr/include - /usr/local/include The interesting thing here is that it got broken only after upgrading to llvm 16 (in #49678). The reason is that the clang 15 build has system includes that do not trigger the warning - "/usr/lib/clang/15.0.7/include", while clang 16 has "/usr/include/clang/16.0.4/include" So let's simply disable this warning, but only for woboq. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-18 18:26:05 +02:00
Azat Khuzhin	73661c3a46	Move tunings for woboq codebrowser from build.sh to cmake Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-18 18:18:30 +02:00
Amos Bird	6b4dcbd3ed	Use PROJECT_*_DIR instead of CMAKE_*_DIR.	2023-05-18 23:23:39 +08:00
Yakov Olkhovskiy	30083351f5	test fix	2023-05-18 14:42:48 +00:00
Denny Crane	94fe224935	Update partition.md	2023-05-18 10:06:59 -03:00
Sergei Trifonov	f98c337d2f	Fix stack-use-after-scope in resource manager test (#49908) * Fix stack-use-after-scope in resource manager test * fix	2023-05-18 14:53:46 +02:00
Kseniia Sumarokova	dd5ee930eb	Merge pull request #49914 from kssenii/fix-assertion-in-do-cleanup Fix assertion in CacheMetadata::doCleanup	2023-05-18 12:22:49 +02:00
Kseniia Sumarokova	adebac1a92	Merge branch 'master' into fix-assertion-in-do-cleanup	2023-05-18 12:22:02 +02:00
robot-ch-test-poll2	a0ef0955da	Merge pull request #49983 from imbingo123/imbingo123-patch-modify_docs Update grant.md	2023-05-18 10:39:49 +02:00
libin	d294ecbc16	Update grant.md docs: Modifying grant example	2023-05-18 15:50:19 +08:00
FFFFFFFHHHHHHH	d31371adac	Merge branch 'master' into dot_product	2023-05-18 15:31:25 +08:00
Alexey Gerasimchuk	e44263d101	Merge branch 'master' into ADQM-808	2023-05-18 17:08:25 +10:00
Alexey Milovidov	86e14547d4	Merge pull request #49964 from ClickHouse/kssenii-patch-7 Follow up to #49429	2023-05-18 09:20:00 +03:00
Alexey Milovidov	5065049154	Merge pull request #49971 from azat/revert-48593-group_array_nullable [RFC] Revert "`groupArray` returns cannot be nullable"	2023-05-18 09:17:42 +03:00
Alexey Gerasimchuk	1fb9e36b81	Merge branch 'master' into ADQM-808	2023-05-18 07:59:02 +10:00
Rich Raposa	03b5bfe218	Merge pull request #49968 from ClickHouse/reddit Add Reddit comments to datasets	2023-05-17 15:26:29 -06:00
Kseniia Sumarokova	855c95f626	Update src/Interpreters/Cache/Metadata.cpp Co-authored-by: Igor Nikonov <954088+devcrafter@users.noreply.github.com>	2023-05-17 22:46:09 +02:00
Yakov Olkhovskiy	612b79868b	test added	2023-05-17 20:40:51 +00:00
Azat Khuzhin	e2e3a03dbe	Revert "`groupArray` returns cannot be nullable"	2023-05-17 22:33:30 +02:00
rfraposa	6a136897e3	Create reddit-comments.md	2023-05-17 13:23:53 -06:00
Han Fei	549af4d351	address comments	2023-05-17 21:23:32 +02:00
DanRoscigno	a1fc96953f	reorder	2023-05-17 14:48:16 -04:00
Timur Solodovnikov	c7ab59302f	Set allow_experimental_query_cache setting as obsolete (#49934) * set allow_experimental_query_cache as obsolete * add tsolodov to trusted contributors * CI linter --------- Co-authored-by: Nikita Mikhaylov <mikhaylovnikitka@gmail.com>	2023-05-17 20:03:42 +02:00
Dan Roscigno	addc0c0ece	Merge branch 'master' into allow_experimental_parallel_reading_from_replicas	2023-05-17 13:20:14 -04:00
Kseniia Sumarokova	1c04085e8f	Update MergeTreeWriteAheadLog.h	2023-05-17 18:15:51 +02:00
Dan Roscigno	67b8aca910	Merge pull request #49935 from ClickHouse/thomoco-patch-3 Update postgresql.md	2023-05-17 11:25:24 -04:00
kssenii	f2dbcb5146	Better fix	2023-05-17 16:27:06 +02:00
alesapin	2b7bc19cae	Merge pull request #49911 from ClickHouse/make_test_less_flaky Retry connection expired in test_rename_column/test.py	2023-05-17 16:03:48 +02:00
alesapin	a7c179e401	Merge branch 'master' into make_test_less_flaky	2023-05-17 15:44:24 +02:00
Han Fei	ed1d036151	Merge pull request #49884 from azat/dist-fix-async-block-processing Fix processing pending batch for Distributed async INSERT after restart	2023-05-17 15:19:42 +02:00
Alexander Tokmakov	36c31e1d79	Improve concurrent parts removal with zero copy replication (#49630) * improve concurrent parts removal * fix * fix	2023-05-17 14:07:34 +03:00
Alexander Tokmakov	c4d074a0a0	Merge pull request #48726 from ClickHouse/Follow_up_Backup_Restore_concurrency_check_node_2 Back/Restore concurrency check on previous fails	2023-05-17 14:03:24 +03:00
Alexander Tokmakov	1e529263d0	Merge branch 'master' into Follow_up_Backup_Restore_concurrency_check_node_2	2023-05-17 13:57:50 +03:00
Vitaly Baranov	15ebbd2ed6	Merge pull request #48896 from vitlibar/write-encrypted-to-backup BACKUP from encrypted disks must not decrypt data	2023-05-17 12:40:00 +02:00


115395 Commits