Commit Graph

115280 Commits

Author SHA1 Message Date
Alexey Milovidov
193c82a09a
Merge pull request #49993 from den-crane/test/issue_46128
test for #46128
2023-05-19 11:43:13 +03:00
Alexey Milovidov
d234aebfc3
Merge pull request #49992 from azat/build/fix-woboq
Fix woboq codebrowser build with -Wno-poison-system-directories
2023-05-19 11:38:36 +03:00
Alexey Milovidov
f47375d16c Support Tableau 2023-05-19 10:28:13 +02:00
Azat Khuzhin
dc353faf44 Simplify obtaining query shard in test_distributed_load_balancing
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:58 +02:00
Azat Khuzhin
e37e8f83bb Fix flakiness of test_distributed_load_balancing
I saw the following in the logs for the failed test:

    2023.05.16 07:12:12.894051 [ 262 ] {74575ac0-b296-4fdc-bc8e-3476a305e6ea} <Warning> ConnectionPoolWithFailover: Connection failed at try №1, reason: Timeout exceeded while reading from socket (socket (172.16.3.2:9000), receive timeout 2000 ms)

And I think that the culprit is the
test_distributed_replica_max_ignored_errors for which it is normal,
however not for others, and this should not affect other tests.

So fix this by calling SYSTEM RELOAD CONFIG, which should reset error
count.

CI: https://s3.amazonaws.com/clickhouse-test-reports/49380/5abc1a1c68ee204c9024493be1d19835cf5630f7/integration_tests__release__[3_4].html
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:58 +02:00
Azat Khuzhin
e1e2a83a9e Print type of the structure that will be used for HASHED/SPARSE_HASHED
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
2b240d3721 Improve documentation for HASHED/SPARSE_HASHED/COMPLEX_KEY_HASHED/COMPLEX_KEY_SPARSE_HASHED
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
f8e7d2cb1f Remove part of the HashTableGrowerWithPrecalculationAndMaxLoadFactor comment
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
c9cde110cd Add initial degree as parameter for HashTableGrowerWithPrecalculationAndMaxLoadFactor
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
01bf041cca Rewrite HashTableGrower{,WithPrecalculation}::set w/o ternary operators
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
634f168a74 Introduce max_size_degree for HashTableGrower{,WithPrecalculation}
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
42eac6bfbc Wrap implementation helpers into HashedDictionaryImpl namespace
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
6f351851ad Rename grower to HashTableGrowerWithPrecalculationAndMaxLoadFactor
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
1ab130132c Add more comments into HashedDictionaryCollectionType.h
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
7eba6def94 Add a comment for HashTableGrowerWithPrecalculation about load factor
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
422cbe08fe Do not use PackedHashMap for non-POD for the purposes of layout
In clang-16 the behaviour for POD types had been changed in [1], this
does not allows us to use PackedHashMap for some types.

  [1]: 277123376c

Note, that I tried to come up with a more generic solution then
enumeratic types, but failed. Though now I think that this is good,
since this shows which types are not allowed for PackedHashMap

Another option is to use -fclang-abi-compat=13.0 but I doubt it is a
good idea.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
fc19e79f50 Change coding style of declaring packed attribute in PackedHashMap
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
65dd87d0da Fix "reference binding to misaligned address" in PackedHashMap
Use separate helpers that accept/return values, instead of reference,
anyway PackedHashMap is developed for small structure.

v0: fix for keys
v2: fix for values
v3: fix bitEquals
v4: fix for iterating over HashMap
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
7c8d8eeb56 Use Cell::setMapped() over separate helper insertSetMapped()
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
2996b38606 Add ability to configure maximum load factor for the HASHED/SPARSE_HASHED layout
As it turns out, HashMap/PackedHashMap works great even with max load
factor of 0.99. By "great" I mean it least it works faster then
google sparsehash, and not to mention it's friendliness to the memory
allocator (it has zero fragmentation since it works with a continuious
memory region, in comparison to the sparsehash that doing lots of
realloc, which jemalloc does not like, due to it's slabs).

Here is a table of different setups:

settings                         | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
-                                | -          | -          | -                     | -               | -
HASHED upstream                  | -          | -          | -                     | -               | 35GiB
SPARSE_HASHED upstream           | -          | -          | -                     | -               | 26GiB
-                                | -          | -          | -                     | -               | -
sparse_hash_map glibc hashbench  | -          | -          | -                     | -               | 17.5GiB
sparse_hash_map packed allocator | 101.878    | 231.48     | 4.32                  | -               | 17.7GiB
PackedHashMap 0.5                | 15.514     | 42.35      | 23.61                 | 20GiB           | 22GiB
hashed 0.95                      | 34.903     | 115.615    | 8.65                  | 16GiB           | 18.7GiB
**PackedHashMap 0.95**           | **93.6**   | **19.883** | **10.68**             | **10GiB**       | **12.8GiB**
PackedHashMap 0.99               | 26.113     | 83.6       | 11.96                 | 10GiB           | 12.3GiB

As it shows, PackedHashMap with 0.95 max_load_factor, eats 2.6x less
memory then SPARSE_HASHED in upstream, and it also 2x faster for read!

v2: fix grower
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
3698302ddb Accept float values for dictionary layouts configurations
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
8c6d691f52 Use HashTable constructor in HashSet
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
fb6f7631c2 Add ability to pass grower for HashTable during creation
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
7b5d156cc5 Optimize SPARSE_HASHED layout (by using PackedHashMap)
In case you want dictionary optimized for memory, SPARSE_HASHED is not
always gives you what you need.

Consider the following example <UInt64, UInt16> as <Key, Value>, but
this pair will also have a 6 byte padding (on amd64), so this is almost
40% of space wastage.

And because of this padding, even google::sparse_hash_map, does not make
picture better, in fact, sparse_hash_map is not very friendly to memory
allocators (especially jemalloc).

Here are some numbers for dictionary with 1e9 elements and UInt64 as
key, and UInt16 as value:

settings                         | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
HASHED upstream                  | -          | -          | -                     | -               | 35GiB
SPARSE_HASHED upstream           | -          | -          | -                     | -               | 26GiB
-                                | -          | -          | -                     | -               | -
sparse_hash_map glibc hashbench  | -          | -          | -                     | -               | 17.5GiB
sparse_hash_map packed allocator | 101.878    | 231.48     | 4.32                  | -               | 17.7GiB
PackedHashMap                    | 15.514     | 42.35      | 23.61                 | 20GiB           | 22GiB

As you can see PackedHashMap looks way more better then HASHED, and
even better then SPARSE_HASHED, but slightly worse then sparse_hash_map
with packed allocator (it is done with a custom patch to google
sparse_hash_map).

v2: rebase on top of bucket_count fix
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
b44497fd4c Introduce PackedHashMap (HashMap with structure without padding)
In case of you have HashMap with <UInt64, UInt16> as <Key, Value> the
overhead of 38% can be crutial, especially if you have tons of keys.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
c4f23e87f1 Export grower_type in HashTable
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Michael Kolupaev
e84f0895e7 Support hardlinking parts transactionally 2023-05-18 21:05:56 -07:00
Yakov Olkhovskiy
a2c3de5082
Merge pull request #49933 from ClickHouse/fix-ipv6-proto-serialization
Fix IPv6 encoding in protobuf
2023-05-18 23:02:15 -04:00
Dmitry Novik
aea71cf1bb
Merge branch 'master' into group-by-constant-fix 2023-05-19 01:29:56 +02:00
Michael Kolupaev
8dc59c1efe Fix test_insert_same_partition_and_merge failing if one Azure request attempt fails 2023-05-18 21:40:24 +00:00
Denny Crane
e7b6056bbb test for #46128 2023-05-18 15:18:55 -03:00
Azat Khuzhin
0f7a310a67 Fix woboq codebrowser build with -Wno-poison-system-directories
woboq codebrowser uses clang tooling, which adds clang system includes
(in Linux::AddClangSystemIncludeArgs()), because none of (-nostdinc,
-nobuiltininc) is set.

And later it will complain with -Wpoison-system-directories for added by
itself includes in InitHeaderSearch::AddUnmappedPath(), because they are
starts from one of the following:
- /usr/include
- /usr/local/include

The interesting thing here is that it got broken only after upgrading to
llvm 16 (in #49678), and the reason for this is that clang 15 build has
system includes that does not trigger the warning -
"/usr/lib/clang/15.0.7/include", while clang 16 has
"/usr/include/clang/16.0.4/include"

So let's simply disable this warning, but only for woboq.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-18 18:26:05 +02:00
Azat Khuzhin
73661c3a46 Move tunnings for woboq codebrowser to cmake out from build.sh
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-18 18:18:30 +02:00
Amos Bird
6b4dcbd3ed
Use PROJECT_*_DIR instead of CMAKE_*_DIR. 2023-05-18 23:23:39 +08:00
Yakov Olkhovskiy
30083351f5 test fix 2023-05-18 14:42:48 +00:00
Denny Crane
94fe224935
Update partition.md 2023-05-18 10:06:59 -03:00
Sergei Trifonov
f98c337d2f
Fix stack-use-after-scope in resource manager test (#49908)
* Fix stack-use-after-scope in resource manager test

* fix
2023-05-18 14:53:46 +02:00
Kseniia Sumarokova
dd5ee930eb
Merge pull request #49914 from kssenii/fix-assertion-in-do-cleanup
Fix assertion in CacheMetadata::doCleanup
2023-05-18 12:22:49 +02:00
Kseniia Sumarokova
adebac1a92
Merge branch 'master' into fix-assertion-in-do-cleanup 2023-05-18 12:22:02 +02:00
robot-ch-test-poll2
a0ef0955da
Merge pull request #49983 from imbingo123/imbingo123-patch-modify_docs
Update grant.md
2023-05-18 10:39:49 +02:00
libin
d294ecbc16
Update grant.md
docs: Modifying grant example
2023-05-18 15:50:19 +08:00
FFFFFFFHHHHHHH
d31371adac
Merge branch 'master' into dot_product 2023-05-18 15:31:25 +08:00
Alexey Milovidov
86e14547d4
Merge pull request #49964 from ClickHouse/kssenii-patch-7
Follow up to #49429
2023-05-18 09:20:00 +03:00
Alexey Milovidov
5065049154
Merge pull request #49971 from azat/revert-48593-group_array_nullable
[RFC] Revert "`groupArray` returns cannot be nullable"
2023-05-18 09:17:42 +03:00
Rich Raposa
03b5bfe218
Merge pull request #49968 from ClickHouse/reddit
Add Reddit comments to datasets
2023-05-17 15:26:29 -06:00
Kseniia Sumarokova
855c95f626
Update src/Interpreters/Cache/Metadata.cpp
Co-authored-by: Igor Nikonov <954088+devcrafter@users.noreply.github.com>
2023-05-17 22:46:09 +02:00
Yakov Olkhovskiy
612b79868b test added 2023-05-17 20:40:51 +00:00
Azat Khuzhin
e2e3a03dbe
Revert "groupArray returns cannot be nullable" 2023-05-17 22:33:30 +02:00
rfraposa
6a136897e3 Create reddit-comments.md 2023-05-17 13:23:53 -06:00
DanRoscigno
a1fc96953f reorder 2023-05-17 14:48:16 -04:00