Commit Graph

115451 Commits

Author SHA1 Message Date
Alexander Tokmakov
55fc4adf05
Update 02441_alter_delete_and_drop_column.sql 2023-05-19 16:42:15 +03:00
Antonio Andelic
7f60af11cb
Merge branch 'master' into write-buffer-from-s3 2023-05-19 15:17:05 +02:00
robot-clickhouse
f947d1cc0a Automatic style fix 2023-05-19 13:00:26 +00:00
Antonio Andelic
3107070e76 Avoid deadlock when starting table in attach thread 2023-05-19 12:48:19 +00:00
Dmitry Novik
d705e5102b
Merge pull request #49838 from ClickHouse/group-by-constant-fix
Analyzer: do not optimize GROUP BY keys with ROLLUP and CUBE
2023-05-19 14:27:34 +02:00
Sergei Trifonov
5db5f6e44b
Merge branch 'master' into fix-throttlers 2023-05-19 14:08:36 +02:00
Alexey Milovidov
ab162756ba
Merge branch 'master' into dot_product 2023-05-19 14:46:53 +03:00
Antonio Andelic
9c3b17fa18
Remove whitespace 2023-05-19 13:00:51 +02:00
alesapin
6676285f02
Merge pull request #49921 from azat/tests/fix-test_distributed_load_balancing
Fix flakiness of test_distributed_load_balancing test
2023-05-19 13:00:38 +02:00
alesapin
632ab8a3d1
Merge pull request #49996 from ClickHouse/az
Fix test_insert_same_partition_and_merge failing if one Azure request attempt fails
2023-05-19 12:58:47 +02:00
Antonio Andelic
e46476dba2
Update src/Coordination/Changelog.cpp
Co-authored-by: alesapin <alesapin@clickhouse.com>
2023-05-19 12:44:20 +02:00
Alexey Milovidov
f5506210d6 Geo types are production ready 2023-05-19 12:43:55 +02:00
alesapin
2398de9d2f
Merge pull request #49473 from ClickHouse/fix_another_zero_copy_bug
Fix another zero copy bug
2023-05-19 12:41:36 +02:00
alesapin
e741450b88
Merge branch 'master' into fix_another_zero_copy_bug 2023-05-19 12:40:48 +02:00
alesapin
c46a5c27d0
Merge pull request #49889 from ClickHouse/fix_some_tests4
Fix some tests
2023-05-19 12:40:34 +02:00
alesapin
e5b001abda
Merge branch 'master' into fix_some_tests4 2023-05-19 12:34:03 +02:00
kssenii
d6df009842 Fix 2023-05-19 12:22:46 +02:00
Antonio Andelic
6e468b29e8 Check return value of ftruncate 2023-05-19 10:15:06 +00:00
Alexey Milovidov
c1b96fd2f5
Merge branch 'master' into Fix_flaky_test_ssl_cert_authentication 2023-05-19 12:37:55 +03:00
Alexey Milovidov
70c83f5133
Merge pull request #49991 from amosbird/clickhouse_as_library
Use PROJECT_*_DIR instead of CMAKE_*_DIR.
2023-05-19 12:37:18 +03:00
Alexey Milovidov
4dbe5b8329 Support them in tests 2023-05-19 11:13:28 +02:00
Alexey Milovidov
193c82a09a
Merge pull request #49993 from den-crane/test/issue_46128
test for #46128
2023-05-19 11:43:13 +03:00
Alexey Milovidov
d234aebfc3
Merge pull request #49992 from azat/build/fix-woboq
Fix woboq codebrowser build with -Wno-poison-system-directories
2023-05-19 11:38:36 +03:00
Alexey Milovidov
f47375d16c Support Tableau 2023-05-19 10:28:13 +02:00
Victor Krasnov
4efb2037a2 Fix the premature amendment of the 00921 test 2023-05-19 06:22:21 +00:00
Azat Khuzhin
dc353faf44 Simplify obtaining query shard in test_distributed_load_balancing
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:58 +02:00
Azat Khuzhin
e37e8f83bb Fix flakiness of test_distributed_load_balancing
I saw the following in the logs for the failed test:

    2023.05.16 07:12:12.894051 [ 262 ] {74575ac0-b296-4fdc-bc8e-3476a305e6ea} <Warning> ConnectionPoolWithFailover: Connection failed at try №1, reason: Timeout exceeded while reading from socket (socket (172.16.3.2:9000), receive timeout 2000 ms)

The likely culprit is test_distributed_replica_max_ignored_errors, for which
such timeouts are normal; however, they should not affect other tests.

So fix this by calling SYSTEM RELOAD CONFIG, which should reset the error
count.

CI: https://s3.amazonaws.com/clickhouse-test-reports/49380/5abc1a1c68ee204c9024493be1d19835cf5630f7/integration_tests__release__[3_4].html
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:58 +02:00
Azat Khuzhin
e1e2a83a9e Print type of the structure that will be used for HASHED/SPARSE_HASHED
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
2b240d3721 Improve documentation for HASHED/SPARSE_HASHED/COMPLEX_KEY_HASHED/COMPLEX_KEY_SPARSE_HASHED
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
f8e7d2cb1f Remove part of the HashTableGrowerWithPrecalculationAndMaxLoadFactor comment
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
c9cde110cd Add initial degree as parameter for HashTableGrowerWithPrecalculationAndMaxLoadFactor
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
01bf041cca Rewrite HashTableGrower{,WithPrecalculation}::set w/o ternary operators
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
634f168a74 Introduce max_size_degree for HashTableGrower{,WithPrecalculation}
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
42eac6bfbc Wrap implementation helpers into HashedDictionaryImpl namespace
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
6f351851ad Rename grower to HashTableGrowerWithPrecalculationAndMaxLoadFactor
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
1ab130132c Add more comments into HashedDictionaryCollectionType.h
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
7eba6def94 Add a comment for HashTableGrowerWithPrecalculation about load factor
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
422cbe08fe Do not use PackedHashMap for non-POD for the purposes of layout
In clang-16 the behaviour for POD types was changed in [1], and this no
longer allows us to use PackedHashMap for some types.

  [1]: 277123376c

Note: I tried to come up with a more generic solution than enumerating the
types, but failed. Though now I think this is fine, since it makes explicit
which types are not allowed for PackedHashMap.

Another option is to use -fclang-abi-compat=13.0, but I doubt that is a
good idea.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
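A minimal sketch of the kind of compile-time restriction this implies (illustrative only, not the actual ClickHouse code; the trait and selector names are made up):

    #include <cstdint>
    #include <type_traits>

    // Hypothetical trait: only allow the packed layout for types whose packed
    // representation is safe, i.e. trivially copyable standard-layout types.
    template <typename T>
    inline constexpr bool is_packed_map_safe_v =
        std::is_trivially_copyable_v<T> && std::is_standard_layout_v<T>;

    // Hypothetical selector: fall back to the regular (padded) map otherwise.
    template <typename Key, typename Value>
    struct LayoutSelector
    {
        static constexpr bool use_packed =
            is_packed_map_safe_v<Key> && is_packed_map_safe_v<Value>;
    };

    static_assert(LayoutSelector<std::uint64_t, std::uint16_t>::use_packed,
                  "a plain POD pair can use the packed layout");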
Azat Khuzhin
fc19e79f50 Change coding style of declaring packed attribute in PackedHashMap
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
65dd87d0da Fix "reference binding to misaligned address" in PackedHashMap
Use separate helpers that accept/return values instead of references;
PackedHashMap is intended for small structures anyway.

v0: fix for keys
v2: fix for values
v3: fix bitEquals
v4: fix for iterating over HashMap
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
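A self-contained illustration of the by-value approach (hypothetical names, not the real PackedHashMap code): UBSan complains when a reference is bound to an unaligned member of a packed struct, so the accessors copy values instead.

    #include <cstdint>
    #include <cstdio>

    // Packed cell: members are unaligned, so handing out references to them
    // would trigger "reference binding to misaligned address" under UBSan.
    struct __attribute__((packed)) PackedCell
    {
        std::uint64_t key;
        std::uint16_t value;

        // By-value accessors copy the bytes out of the packed storage.
        std::uint64_t getKey() const { return key; }
        std::uint16_t getValue() const { return value; }
        void setValue(std::uint16_t v) { value = v; }
    };

    int main()
    {
        PackedCell cell{42, 7};
        cell.setValue(8);
        std::printf("%llu -> %u, sizeof = %zu\n",
                    static_cast<unsigned long long>(cell.getKey()),
                    static_cast<unsigned>(cell.getValue()),
                    sizeof(PackedCell)); // prints: 42 -> 8, sizeof = 10
        return 0;
    }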
Azat Khuzhin
7c8d8eeb56 Use Cell::setMapped() over separate helper insertSetMapped()
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
2996b38606 Add ability to configure maximum load factor for the HASHED/SPARSE_HASHED layout
As it turns out, HashMap/PackedHashMap works great even with a max load
factor of 0.99. By "great" I mean that it is at least faster than google
sparsehash, not to mention its friendliness to the memory allocator (it has
zero fragmentation since it works with a contiguous memory region, in
contrast to sparsehash, which does lots of reallocs that jemalloc does not
like because of its slabs).

Here is a table of different setups:

settings                         | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
-                                | -          | -          | -                     | -               | -
HASHED upstream                  | -          | -          | -                     | -               | 35GiB
SPARSE_HASHED upstream           | -          | -          | -                     | -               | 26GiB
-                                | -          | -          | -                     | -               | -
sparse_hash_map glibc hashbench  | -          | -          | -                     | -               | 17.5GiB
sparse_hash_map packed allocator | 101.878    | 231.48     | 4.32                  | -               | 17.7GiB
PackedHashMap 0.5                | 15.514     | 42.35      | 23.61                 | 20GiB           | 22GiB
hashed 0.95                      | 34.903     | 115.615    | 8.65                  | 16GiB           | 18.7GiB
**PackedHashMap 0.95**           | **19.883** | **93.6**   | **10.68**             | **10GiB**       | **12.8GiB**
PackedHashMap 0.99               | 26.113     | 83.6       | 11.96                 | 10GiB           | 12.3GiB

As the table shows, PackedHashMap with a 0.95 max_load_factor uses 2.6x less
memory than upstream SPARSE_HASHED, and it is also 2x faster for reads!

v2: fix grower
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
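A rough sketch of what a grower with a configurable max load factor can look like (illustrative only; the real HashTableGrowerWithPrecalculationAndMaxLoadFactor in ClickHouse is more involved):

    #include <cstddef>

    // Illustrative grower: the table doubles once the fill ratio exceeds the
    // configured max load factor, e.g. 0.5 or 0.95 / 0.99.
    class GrowerWithMaxLoadFactor
    {
        std::size_t size_degree;   // table capacity is 2^size_degree
        double max_load_factor;    // in (0, 1)

    public:
        GrowerWithMaxLoadFactor(std::size_t initial_degree, double max_load_factor_)
            : size_degree(initial_degree), max_load_factor(max_load_factor_) {}

        std::size_t bufSize() const { return std::size_t{1} << size_degree; }

        // True when the element count pushes the fill ratio over the limit.
        bool overflow(std::size_t elems) const
        {
            return static_cast<double>(elems)
                > static_cast<double>(bufSize()) * max_load_factor;
        }

        void increaseSize() { ++size_degree; }
    };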
Azat Khuzhin
3698302ddb Accept float values for dictionary layouts configurations
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
8c6d691f52 Use HashTable constructor in HashSet
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
fb6f7631c2 Add ability to pass grower for HashTable during creation
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
7b5d156cc5 Optimize SPARSE_HASHED layout (by using PackedHashMap)
If you want a dictionary optimized for memory, SPARSE_HASHED does not always
give you what you need.

Consider the example of <UInt64, UInt16> as <Key, Value>: this pair also
carries 6 bytes of padding (on amd64), which is almost 40% of wasted space.

Because of this padding, even google::sparse_hash_map does not improve the
picture much; in fact, sparse_hash_map is not very friendly to memory
allocators (especially jemalloc).

Here are some numbers for a dictionary with 1e9 elements, UInt64 as key
and UInt16 as value:

settings                         | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
HASHED upstream                  | -          | -          | -                     | -               | 35GiB
SPARSE_HASHED upstream           | -          | -          | -                     | -               | 26GiB
-                                | -          | -          | -                     | -               | -
sparse_hash_map glibc hashbench  | -          | -          | -                     | -               | 17.5GiB
sparse_hash_map packed allocator | 101.878    | 231.48     | 4.32                  | -               | 17.7GiB
PackedHashMap                    | 15.514     | 42.35      | 23.61                 | 20GiB           | 22GiB

As you can see, PackedHashMap looks much better than HASHED, and even
better than SPARSE_HASHED, though slightly worse than sparse_hash_map with
a packed allocator (the latter uses a custom patch to google
sparse_hash_map).

v2: rebase on top of bucket_count fix
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
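A small self-contained illustration of the padding discussed above (assuming amd64; PackedPair here is a stand-in, not the actual PackedHashMap cell type):

    #include <cstdint>
    #include <cstdio>
    #include <utility>

    // A <UInt64, UInt16> pair is padded to 16 bytes; a packed cell needs 10,
    // which is where the ~40% space saving comes from.
    struct __attribute__((packed)) PackedPair
    {
        std::uint64_t key;
        std::uint16_t value;
    };

    int main()
    {
        std::printf("std::pair<uint64_t, uint16_t>: %zu bytes\n",
                    sizeof(std::pair<std::uint64_t, std::uint16_t>)); // 16 on amd64
        std::printf("PackedPair (no padding):       %zu bytes\n",
                    sizeof(PackedPair));                              // 10
        return 0;
    }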
Azat Khuzhin
b44497fd4c Introduce PackedHashMap (HashMap with structure without padding)
If you have a HashMap with <UInt64, UInt16> as <Key, Value>, the 38%
overhead can be crucial, especially if you have tons of keys.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
c4f23e87f1 Export grower_type in HashTable
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Michael Kolupaev
e84f0895e7 Support hardlinking parts transactionally 2023-05-18 21:05:56 -07:00
Yakov Olkhovskiy
a2c3de5082
Merge pull request #49933 from ClickHouse/fix-ipv6-proto-serialization
Fix IPv6 encoding in protobuf
2023-05-18 23:02:15 -04:00