Commit Graph

115364 Commits

Author SHA1 Message Date
Antonio Andelic
e46476dba2
Update src/Coordination/Changelog.cpp
Co-authored-by: alesapin <alesapin@clickhouse.com>
2023-05-19 12:44:20 +02:00
Alexey Milovidov
f5506210d6 Geo types are production ready 2023-05-19 12:43:55 +02:00
alesapin
2398de9d2f
Merge pull request #49473 from ClickHouse/fix_another_zero_copy_bug
Fix another zero copy bug
2023-05-19 12:41:36 +02:00
alesapin
e741450b88
Merge branch 'master' into fix_another_zero_copy_bug 2023-05-19 12:40:48 +02:00
alesapin
c46a5c27d0
Merge pull request #49889 from ClickHouse/fix_some_tests4
Fix some tests
2023-05-19 12:40:34 +02:00
alesapin
e5b001abda
Merge branch 'master' into fix_some_tests4 2023-05-19 12:34:03 +02:00
Antonio Andelic
6e468b29e8 Check return value of ftruncate 2023-05-19 10:15:06 +00:00
Alexey Milovidov
70c83f5133
Merge pull request #49991 from amosbird/clickhouse_as_library
Use PROJECT_*_DIR instead of CMAKE_*_DIR.
2023-05-19 12:37:18 +03:00
Alexey Milovidov
4dbe5b8329 Support them in tests 2023-05-19 11:13:28 +02:00
Alexey Milovidov
193c82a09a
Merge pull request #49993 from den-crane/test/issue_46128
test for #46128
2023-05-19 11:43:13 +03:00
Alexey Milovidov
d234aebfc3
Merge pull request #49992 from azat/build/fix-woboq
Fix woboq codebrowser build with -Wno-poison-system-directories
2023-05-19 11:38:36 +03:00
Alexey Milovidov
f47375d16c Support Tableau 2023-05-19 10:28:13 +02:00
Azat Khuzhin
dc353faf44 Simplify obtaining query shard in test_distributed_load_balancing
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:58 +02:00
Azat Khuzhin
e37e8f83bb Fix flakiness of test_distributed_load_balancing
I saw the following in the logs for the failed test:

    2023.05.16 07:12:12.894051 [ 262 ] {74575ac0-b296-4fdc-bc8e-3476a305e6ea} <Warning> ConnectionPoolWithFailover: Connection failed at try №1, reason: Timeout exceeded while reading from socket (socket (172.16.3.2:9000), receive timeout 2000 ms)

I think that the culprit is test_distributed_replica_max_ignored_errors:
such errors are expected there, but not in the other tests, and that test
should not affect them.

So fix this by calling SYSTEM RELOAD CONFIG, which should reset the error
count.

CI: https://s3.amazonaws.com/clickhouse-test-reports/49380/5abc1a1c68ee204c9024493be1d19835cf5630f7/integration_tests__release__[3_4].html
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:58 +02:00
Azat Khuzhin
e1e2a83a9e Print type of the structure that will be used for HASHED/SPARSE_HASHED
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
2b240d3721 Improve documentation for HASHED/SPARSE_HASHED/COMPLEX_KEY_HASHED/COMPLEX_KEY_SPARSE_HASHED
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
f8e7d2cb1f Remove part of the HashTableGrowerWithPrecalculationAndMaxLoadFactor comment
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
c9cde110cd Add initial degree as parameter for HashTableGrowerWithPrecalculationAndMaxLoadFactor
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
01bf041cca Rewrite HashTableGrower{,WithPrecalculation}::set w/o ternary operators
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
634f168a74 Introduce max_size_degree for HashTableGrower{,WithPrecalculation}
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
42eac6bfbc Wrap implementation helpers into HashedDictionaryImpl namespace
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
6f351851ad Rename grower to HashTableGrowerWithPrecalculationAndMaxLoadFactor
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
1ab130132c Add more comments into HashedDictionaryCollectionType.h
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
7eba6def94 Add a comment for HashTableGrowerWithPrecalculation about load factor
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
422cbe08fe Do not use PackedHashMap for non-POD for the purposes of layout
In clang-16 the behaviour for POD types was changed in [1]; this
does not allow us to use PackedHashMap for some types.

  [1]: 277123376c

Note that I tried to come up with a more generic solution than
enumerating types, but failed. Now, though, I think this is good,
since it shows which types are not allowed for PackedHashMap.

Another option is to use -fclang-abi-compat=13.0, but I doubt it is a
good idea.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
fc19e79f50 Change coding style of declaring packed attribute in PackedHashMap
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
65dd87d0da Fix "reference binding to misaligned address" in PackedHashMap
Use separate helpers that accept/return values instead of references;
PackedHashMap is designed for small structures anyway.

v0: fix for keys
v2: fix for values
v3: fix bitEquals
v4: fix for iterating over HashMap
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
7c8d8eeb56 Use Cell::setMapped() over separate helper insertSetMapped()
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
2996b38606 Add ability to configure maximum load factor for the HASHED/SPARSE_HASHED layout
As it turns out, HashMap/PackedHashMap works great even with a max load
factor of 0.99. By "great" I mean that it at least works faster than
google sparsehash, not to mention its friendliness to the memory
allocator (it has zero fragmentation since it works with a continuous
memory region, in contrast to sparsehash, which does lots of reallocs
that jemalloc does not like due to its slabs).

Here is a table of different setups:

settings                         | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
-                                | -          | -          | -                     | -               | -
HASHED upstream                  | -          | -          | -                     | -               | 35GiB
SPARSE_HASHED upstream           | -          | -          | -                     | -               | 26GiB
-                                | -          | -          | -                     | -               | -
sparse_hash_map glibc hashbench  | -          | -          | -                     | -               | 17.5GiB
sparse_hash_map packed allocator | 101.878    | 231.48     | 4.32                  | -               | 17.7GiB
PackedHashMap 0.5                | 15.514     | 42.35      | 23.61                 | 20GiB           | 22GiB
hashed 0.95                      | 34.903     | 115.615    | 8.65                  | 16GiB           | 18.7GiB
**PackedHashMap 0.95**           | **19.883** | **93.6**   | **10.68**             | **10GiB**       | **12.8GiB**
PackedHashMap 0.99               | 26.113     | 83.6       | 11.96                 | 10GiB           | 12.3GiB

As it shows, PackedHashMap with a max_load_factor of 0.95 uses 2.6x less
memory than upstream SPARSE_HASHED, and it is also 2x faster for reads!

v2: fix grower
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
3698302ddb Accept float values for dictionary layouts configurations
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
8c6d691f52 Use HashTable constructor in HashSet
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
fb6f7631c2 Add ability to pass grower for HashTable during creation
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
7b5d156cc5 Optimize SPARSE_HASHED layout (by using PackedHashMap)
If you want a dictionary optimized for memory, SPARSE_HASHED does not
always give you what you need.

Consider <UInt64, UInt16> as <Key, Value>: this pair also carries
6 bytes of padding (on amd64), so almost 40% of the space is wasted.

Because of this padding, even google::sparse_hash_map does not make the
picture much better; in fact, sparse_hash_map is not very friendly to memory
allocators (especially jemalloc).

Here are some numbers for a dictionary with 1e9 elements, UInt64 as the
key and UInt16 as the value:

settings                         | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
-                                | -          | -          | -                     | -               | -
HASHED upstream                  | -          | -          | -                     | -               | 35GiB
SPARSE_HASHED upstream           | -          | -          | -                     | -               | 26GiB
-                                | -          | -          | -                     | -               | -
sparse_hash_map glibc hashbench  | -          | -          | -                     | -               | 17.5GiB
sparse_hash_map packed allocator | 101.878    | 231.48     | 4.32                  | -               | 17.7GiB
PackedHashMap                    | 15.514     | 42.35      | 23.61                 | 20GiB           | 22GiB

As you can see, PackedHashMap looks much better than HASHED, and
even better than SPARSE_HASHED, but slightly worse than sparse_hash_map
with the packed allocator (done with a custom patch to google
sparse_hash_map).

v2: rebase on top of bucket_count fix
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
b44497fd4c Introduce PackedHashMap (HashMap with structure without padding)
If you have a HashMap with <UInt64, UInt16> as <Key, Value>, the padding
overhead of 38% can be crucial, especially if you have tons of keys.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
c4f23e87f1 Export grower_type in HashTable
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Michael Kolupaev
e84f0895e7 Support hardlinking parts transactionally 2023-05-18 21:05:56 -07:00
Yakov Olkhovskiy
a2c3de5082
Merge pull request #49933 from ClickHouse/fix-ipv6-proto-serialization
Fix IPv6 encoding in protobuf
2023-05-18 23:02:15 -04:00
Nikolay Degterinsky
ef45956713 Fix style 2023-05-19 01:31:45 +00:00
Nikolay Degterinsky
b8be714830 Add schema inference to more table engines 2023-05-19 00:44:27 +00:00
Dmitry Novik
aea71cf1bb
Merge branch 'master' into group-by-constant-fix 2023-05-19 01:29:56 +02:00
Michael Kolupaev
8dc59c1efe Fix test_insert_same_partition_and_merge failing if one Azure request attempt fails 2023-05-18 21:40:24 +00:00
Denny Crane
e7b6056bbb test for #46128 2023-05-18 15:18:55 -03:00
Azat Khuzhin
0f7a310a67 Fix woboq codebrowser build with -Wno-poison-system-directories
woboq codebrowser uses clang tooling, which adds clang system includes
(in Linux::AddClangSystemIncludeArgs()) because neither -nostdinc nor
-nobuiltininc is set.

Later it complains with -Wpoison-system-directories about the includes it
added itself in InitHeaderSearch::AddUnmappedPath(), because they start
with one of the following:
- /usr/include
- /usr/local/include

The interesting thing here is that it broke only after upgrading to
llvm 16 (in #49678). The reason is that the clang 15 build has
system includes that do not trigger the warning
("/usr/lib/clang/15.0.7/include"), while clang 16 has
"/usr/include/clang/16.0.4/include".

So let's simply disable this warning, but only for woboq.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-18 18:26:05 +02:00
Azat Khuzhin
73661c3a46 Move tunings for woboq codebrowser from build.sh to cmake
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-18 18:18:30 +02:00
Amos Bird
6b4dcbd3ed
Use PROJECT_*_DIR instead of CMAKE_*_DIR. 2023-05-18 23:23:39 +08:00
Yakov Olkhovskiy
30083351f5 test fix 2023-05-18 14:42:48 +00:00
Denny Crane
94fe224935
Update partition.md 2023-05-18 10:06:59 -03:00
Sergei Trifonov
f98c337d2f
Fix stack-use-after-scope in resource manager test (#49908)
* Fix stack-use-after-scope in resource manager test

* fix
2023-05-18 14:53:46 +02:00
Kseniia Sumarokova
dd5ee930eb
Merge pull request #49914 from kssenii/fix-assertion-in-do-cleanup
Fix assertion in CacheMetadata::doCleanup
2023-05-18 12:22:49 +02:00
Kseniia Sumarokova
adebac1a92
Merge branch 'master' into fix-assertion-in-do-cleanup 2023-05-18 12:22:02 +02:00