Commit Graph

41573 Commits

Author SHA1 Message Date
lgbo-ustc
80af345ea6 update 2023-05-22 10:17:41 +08:00
lgbo-ustc
d5efc0e688 update 2023-05-22 10:17:41 +08:00
lgbo-ustc
8efec9bcca add locks for getNonJoinedBlocks 2023-05-22 10:17:41 +08:00
lgbo-ustc
89dd538bea update 2023-05-22 10:17:41 +08:00
lgbo-ustc
5c44e6a562 trigger ci 2023-05-22 10:17:40 +08:00
lgbo-ustc
d89beb1bf7 update tests 2023-05-22 10:17:40 +08:00
lgbo-ustc
7772fed161 update
1. fixed the memory overflow problem when handling all delayed buckets in parallel
2. reuse existing tests
2023-05-22 10:17:40 +08:00
lgbo-ustc
39db0f84d9 add comment 2023-05-22 10:17:40 +08:00
lgbo-ustc
39ff030a6e grace hash join supports right/full join 2023-05-22 10:17:40 +08:00
Robert Schulze
2a9ff30a7f
Merge pull request #49380 from azat/dict/hashed-memory
Improve memory usage and speed of SPARSE_HASHED/HASHED dictionaries
2023-05-21 15:46:41 +02:00
Sergei Trifonov
3c002755e2
Merge pull request #50036 from ClickHouse/fix-load-balancing
Load balancing bugfixes
2023-05-21 11:21:55 +02:00
vdimir
8b77e2096c
Merge pull request #49760 from arthurpassos/extract_kv_ignore_kv_delimiter_when_reading_value 2023-05-20 13:27:59 +02:00
Igor Nikonov
fbcbd3ab90
Merge pull request #49846 from ClickHouse/clearable_hash_set_without_zero_storage
Clearable hash table and zero values
2023-05-20 11:19:44 +02:00
Alexey Milovidov
4e3188126f
Merge pull request #49050 from FFFFFFFHHHHHHH/dot_product
Add Function dotProduct for array
2023-05-20 03:07:13 +03:00
Alexey Milovidov
2323542e47
Merge pull request #50022 from ClickHouse/geo-types-production-ready
Geo types are production ready
2023-05-20 02:02:23 +03:00
Alexey Milovidov
54f7b8e6ab
Merge pull request #50030 from kssenii/aws-client-save-provider
Add method getCredentials() to S3::Client
2023-05-20 01:59:58 +03:00
Igor Nikonov
af80e29519
Merge branch 'master' into clearable_hash_set_without_zero_storage 2023-05-19 23:36:30 +02:00
Sergei Trifonov
14e8132ac4
Merge branch 'master' into fix-load-balancing 2023-05-19 23:05:27 +02:00
alekar
de710209a7
Merge branch 'master' into fix-osx-setsockopt-errors 2023-05-19 11:15:01 -07:00
serxa
052d8aca71 limit max_tries value by max_error_cap to avoid unlimited number of retries 2023-05-19 18:13:29 +00:00
serxa
d69c35fcdd fix PoolWithFailover error_count integer overflow 2023-05-19 17:57:00 +00:00
serxa
086888b285 fix ConnectionPoolWithFailover::getPriority 2023-05-19 17:54:29 +00:00
serxa
35e77f8e2a fix comment 2023-05-19 17:53:22 +00:00
kssenii
b29edc4737 Add method 2023-05-19 16:38:14 +02:00
mateng915
5237dd0245
New system table zookeeper connection (#45245)
* Feature: Support a new system table that shows which zookeeper node is connected

Description:
============
Currently there is no way to check which zk node is connected other than
using the lsof command, which is not convenient.

Solution:
=========
Implemented a new system table, system.zookeeper_host. When the CK server has
zk, this table shows the zk node dir that the current CK server is connected to.

Note: This table supports the multi-zookeeper cluster scenario.

* fixed review comments

* added test case

* update test cases

* remove unused code

* fixed review comments and removed unused code

* updated test cases for print host, port and is_expired

* modify the code comments

* fixed CI failure

* fixed code style check failure

* updated test cases by adding Tags

* update test reference

* update test cases

* added system.zookeeper_connection doc

* Update docs/en/operations/system-tables/zookeeper_connection.md

* Update docs/en/operations/system-tables/zookeeper_connection.md

* Update docs/en/operations/system-tables/zookeeper_connection.md

---------

Co-authored-by: Alexander Tokmakov <tavplubix@gmail.com>
2023-05-19 17:06:43 +03:00
Sergei Trifonov
67bf9ac539
Merge pull request #49797 from azat/fix-throttlers
Fix per-query IO/BACKUPs throttling settings
2023-05-19 15:51:57 +02:00
Dmitry Novik
d705e5102b
Merge pull request #49838 from ClickHouse/group-by-constant-fix
Analyzer: do not optimize GROUP BY keys with ROLLUP and CUBE
2023-05-19 14:27:34 +02:00
Sergei Trifonov
5db5f6e44b
Merge branch 'master' into fix-throttlers 2023-05-19 14:08:36 +02:00
Alexey Milovidov
ab162756ba
Merge branch 'master' into dot_product 2023-05-19 14:46:53 +03:00
alesapin
632ab8a3d1
Merge pull request #49996 from ClickHouse/az
Fix test_insert_same_partition_and_merge failing if one Azure request attempt fails
2023-05-19 12:58:47 +02:00
Alexey Milovidov
f5506210d6 Geo types are production ready 2023-05-19 12:43:55 +02:00
alesapin
e741450b88
Merge branch 'master' into fix_another_zero_copy_bug 2023-05-19 12:40:48 +02:00
alesapin
e5b001abda
Merge branch 'master' into fix_some_tests4 2023-05-19 12:34:03 +02:00
Alexey Milovidov
70c83f5133
Merge pull request #49991 from amosbird/clickhouse_as_library
Use PROJECT_*_DIR instead of CMAKE_*_DIR.
2023-05-19 12:37:18 +03:00
Azat Khuzhin
e1e2a83a9e Print type of the structure that will be used for HASHED/SPARSE_HASHED
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
f8e7d2cb1f Remove part of the HashTableGrowerWithPrecalculationAndMaxLoadFactor comment
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
c9cde110cd Add initial degree as parameter for HashTableGrowerWithPrecalculationAndMaxLoadFactor
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
01bf041cca Rewrite HashTableGrower{,WithPrecalculation}::set w/o ternary operators
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
634f168a74 Introduce max_size_degree for HashTableGrower{,WithPrecalculation}
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
42eac6bfbc Wrap implementation helpers into HashedDictionaryImpl namespace
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
6f351851ad Rename grower to HashTableGrowerWithPrecalculationAndMaxLoadFactor
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
1ab130132c Add more comments into HashedDictionaryCollectionType.h
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
7eba6def94 Add a comment for HashTableGrowerWithPrecalculation about load factor
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
422cbe08fe Do not use PackedHashMap for non-POD for the purposes of layout
In clang-16 the behaviour for POD types was changed in [1], which
does not allow us to use PackedHashMap for some types.

  [1]: 277123376c

Note that I tried to come up with a more generic solution than
enumerating types, but failed. Though now I think that this is good,
since it shows which types are not allowed for PackedHashMap.

Another option is to use -fclang-abi-compat=13.0 but I doubt it is a
good idea.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
fc19e79f50 Change coding style of declaring packed attribute in PackedHashMap
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
65dd87d0da Fix "reference binding to misaligned address" in PackedHashMap
Use separate helpers that accept/return values instead of references;
PackedHashMap is intended for small structures anyway.

v0: fix for keys
v2: fix for values
v3: fix bitEquals
v4: fix for iterating over HashMap
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
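
The idea in the commit above, helpers that pass values rather than references into a packed structure, can be sketched as follows. This is a minimal illustration and not the PackedHashMap code: `PackedCell`, `getValue`, and `setValue` are hypothetical names, and `__attribute__((packed))` assumes GCC or Clang.

```cpp
#include <cstdint>

// Hypothetical packed cell (not ClickHouse's type): the 8-byte value starts
// at offset 1, so it is not naturally aligned.
struct __attribute__((packed)) PackedCell
{
    std::uint8_t key;
    std::uint64_t value;
};

// Binding `std::uint64_t &` to `cell.value` could yield a misaligned
// reference. Copying the member in and out by value avoids that: the
// compiler knows the member is packed and emits an unaligned-safe access.
constexpr std::uint64_t getValue(const PackedCell & cell)
{
    return cell.value;
}

inline void setValue(PackedCell & cell, std::uint64_t value)
{
    cell.value = value;
}
```

Callers then work with copies, for example `setValue(cell, getValue(cell) + 1)`, and never hold a reference or pointer to the packed member itself.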
Azat Khuzhin
7c8d8eeb56 Use Cell::setMapped() over separate helper insertSetMapped()
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
2996b38606 Add ability to configure maximum load factor for the HASHED/SPARSE_HASHED layout
As it turns out, HashMap/PackedHashMap works great even with a max load
factor of 0.99. By "great" I mean that at least it works faster than
google sparsehash, not to mention its friendliness to the memory
allocator (it has zero fragmentation since it works with a contiguous
memory region, in comparison to sparsehash, which does lots of
realloc, which jemalloc does not like due to its slabs).

Here is a table of different setups:

settings                         | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
-                                | -          | -          | -                     | -               | -
HASHED upstream                  | -          | -          | -                     | -               | 35GiB
SPARSE_HASHED upstream           | -          | -          | -                     | -               | 26GiB
-                                | -          | -          | -                     | -               | -
sparse_hash_map glibc hashbench  | -          | -          | -                     | -               | 17.5GiB
sparse_hash_map packed allocator | 101.878    | 231.48     | 4.32                  | -               | 17.7GiB
PackedHashMap 0.5                | 15.514     | 42.35      | 23.61                 | 20GiB           | 22GiB
hashed 0.95                      | 34.903     | 115.615    | 8.65                  | 16GiB           | 18.7GiB
**PackedHashMap 0.95**           | **19.883** | **93.6**   | **10.68**             | **10GiB**       | **12.8GiB**
PackedHashMap 0.99               | 26.113     | 83.6       | 11.96                 | 10GiB           | 12.3GiB

As it shows, PackedHashMap with 0.95 max_load_factor eats 2.6x less
memory than SPARSE_HASHED in upstream, and it is also 2x faster for read!

v2: fix grower
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
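
To illustrate why the maximum load factor in the commit above affects memory, here is a rough sketch of sizing a power-of-two hash table buffer under a load factor limit. It is a simplified model under stated assumptions, not the actual HashTableGrower code: `sizeDegreeFor` and `bufferCells` are hypothetical names, and real growers also handle initial and maximum degrees.

```cpp
#include <cstdint>

// Hypothetical helper, not ClickHouse's grower: returns the smallest degree d
// such that `elems` fit into a buffer of 2^d cells without exceeding
// `max_load_factor` (elems / 2^d <= max_load_factor).
constexpr unsigned sizeDegreeFor(std::uint64_t elems, double max_load_factor)
{
    unsigned degree = 0;
    // Stop at 62 so the shift below never overflows 64 bits.
    while (degree < 62
        && static_cast<double>(elems) > max_load_factor * static_cast<double>(std::uint64_t{1} << degree))
        ++degree;
    return degree;
}

// Number of cells in a buffer of the given degree.
constexpr std::uint64_t bufferCells(unsigned degree)
{
    return std::uint64_t{1} << degree;
}
```

In this model, 900 elements need a degree of 11 (2048 cells) with a max load factor of 0.5, but only a degree of 10 (1024 cells) with 0.95, so a higher limit can halve the buffer for the same data.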
Azat Khuzhin
3698302ddb Accept float values for dictionary layouts configurations
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
8c6d691f52 Use HashTable constructor in HashSet
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00