v2: add proper escaping
v3: set distributed_ddl_output_mode=none for test to fix replicated database build
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Right now the memory for a dictionary is counted towards the query/user, but
only if it is loaded by the user (via SYSTEM RELOAD DICTIONARY or via
dictGet()). However, it can also be loaded in the background (due to
lifetime or update_field), and in that case, like with Buffer, only server
memory should be charged.
v2: mark test as long
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Co-authored-by: Sergei Trifonov <svtrifonov@gmail.com>
Before, you were able to break the format by using "\n" or "\t", which
would simply lead to a DDL hang, because DDLWorker would just log the
error and do nothing more:
<Error> DDLWorker: Cannot parse DDL task query-0000000056: Incorrect task format. Will try to send error status: Code: 27. DB::ParsingException: Cannot parse input: expected '\n' before: 'bar\n1\n'. (CANNOT_PARSE_INPUT_ASSERTION_FAILED) (version 23.5.1.1)
Fix this by adding proper escaping.
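To illustrate the idea, here is a minimal sketch of such escaping (the helper below is illustrative, not the actual DDL task serialization code): the task is stored in a line-oriented format, so a raw '\n' or '\t' inside the query has to be turned into a backslash sequence on write and decoded back on read.

```cpp
#include <cstdio>
#include <string>

/// Illustrative escaping: control characters that would break a
/// line-oriented format are replaced with backslash sequences.
static std::string escapeControlChars(const std::string & s)
{
    std::string out;
    for (char c : s)
    {
        if (c == '\\')
            out += "\\\\";
        else if (c == '\n')
            out += "\\n";
        else if (c == '\t')
            out += "\\t";
        else
            out += c;
    }
    return out;
}

int main()
{
    /// A name with an embedded newline no longer breaks the task format.
    printf("%s\n", escapeControlChars("CREATE TABLE t (`bar\n1` Int8)").c_str());
}
```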
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Feature: Support a new system table to show which ZooKeeper node is connected
Description:
============
Currently we have no way to check which ZooKeeper node is connected other
than using the lsof command, which is not convenient.
Solution:
=========
Implemented a new system table, system.zookeeper_host. When the ClickHouse
server uses ZooKeeper, this table shows the ZooKeeper node that the current
server is connected to.
Note: this table supports the multi-ZooKeeper cluster scenario.
* fixed review comments
* added test case
* update test cases
* remove unused code
* fixed review comments and removed unused code
* updated test cases to print host, port and is_expired
* modify the code comments
* fixed CI failures
* fixed code style check failure
* updated test cases by adding tags
* update test reference
* update test cases
* added system.zookeeper_connection doc
* Update docs/en/operations/system-tables/zookeeper_connection.md
* Update docs/en/operations/system-tables/zookeeper_connection.md
* Update docs/en/operations/system-tables/zookeeper_connection.md
---------
Co-authored-by: Alexander Tokmakov <tavplubix@gmail.com>
In clang-16 the behaviour for POD types was changed in [1]; this does not
allow us to use PackedHashMap for some types.
[1]: 277123376c
Note that I tried to come up with a more generic solution than enumerating
types, but failed. Though now I think that this is fine, since it makes
explicit which types are not allowed for PackedHashMap.
Another option is to use -fclang-abi-compat=13.0, but I doubt it is a good
idea.
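For illustration, here is a sketch of the "enumerate the types" approach (the trait name and the exact list are illustrative, not the actual ClickHouse code): instead of detecting generically which types are safe under the new clang-16 behaviour, explicitly list the ones known to work in the packed layout and reject the rest at compile time.

```cpp
#include <cstdint>
#include <type_traits>

/// Explicit whitelist of types allowed in the packed layout (illustrative list).
template <typename T>
inline constexpr bool allowed_in_packed_hash_map =
       std::is_same_v<T, uint8_t>  || std::is_same_v<T, int8_t>
    || std::is_same_v<T, uint16_t> || std::is_same_v<T, int16_t>
    || std::is_same_v<T, uint32_t> || std::is_same_v<T, int32_t>
    || std::is_same_v<T, uint64_t> || std::is_same_v<T, int64_t>;

template <typename Key, typename Mapped>
struct PackedHashMapSketch
{
    static_assert(allowed_in_packed_hash_map<Key> && allowed_in_packed_hash_map<Mapped>,
                  "type is not in the list of types supported by the packed layout");
    /// ... the actual map would go here ...
};

int main()
{
    [[maybe_unused]] PackedHashMapSketch<uint64_t, uint16_t> ok;   /// compiles
    /// PackedHashMapSketch<uint64_t, long double> bad;            /// would trip the static_assert
}
```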
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Use separate helpers that accept/return values instead of references;
PackedHashMap is designed for small structures anyway.
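Roughly why values instead of references (a simplified cell, not the actual HashMap code): the members of a packed cell may be misaligned, so forming a reference to them is rejected by the compiler (or unsafe), while copying a small key/value in and out is cheap.

```cpp
#include <cstdint>
#include <cstdio>

/// Simplified packed cell: 10 bytes instead of 16, but its members may be misaligned.
struct __attribute__((packed)) PackedPair
{
    uint64_t key;
    uint16_t mapped;
};

struct Cell
{
    PackedPair pair;

    /// Returning uint64_t & here would bind a reference to a potentially
    /// misaligned member (clang/gcc reject taking its address with
    /// -Waddress-of-packed-member), hence the helpers work with copies.
    uint64_t getKey() const { return pair.key; }
    uint16_t getMapped() const { return pair.mapped; }
    void setMapped(uint16_t value) { pair.mapped = value; }
};

int main()
{
    Cell cell{{42, 0}};
    cell.setMapped(7);
    printf("%llu -> %u\n", (unsigned long long) cell.getKey(), cell.getMapped());
}
```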
v0: fix for keys
v2: fix for values
v3: fix bitEquals
v4: fix for iterating over HashMap
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
As it turns out, HashMap/PackedHashMap works great even with a max load
factor of 0.99. By "great" I mean at the very least it works faster than
google sparsehash, not to mention its friendliness to the memory allocator
(it has zero fragmentation, since it works with a contiguous memory region,
in contrast to sparsehash, which does lots of reallocs, which jemalloc does
not like due to its slabs).
Here is a table of different setups:
settings | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
- | - | - | - | - | -
HASHED upstream | - | - | - | - | 35GiB
SPARSE_HASHED upstream | - | - | - | - | 26GiB
- | - | - | - | - | -
sparse_hash_map glibc hashbench | - | - | - | - | 17.5GiB
sparse_hash_map packed allocator | 101.878 | 231.48 | 4.32 | - | 17.7GiB
PackedHashMap 0.5 | 15.514 | 42.35 | 23.61 | 20GiB | 22GiB
hashed 0.95 | 34.903 | 115.615 | 8.65 | 16GiB | 18.7GiB
**PackedHashMap 0.95** | **19.883** | **93.6** | **10.68** | **10GiB** | **12.8GiB**
PackedHashMap 0.99 | 26.113 | 83.6 | 11.96 | 10GiB | 12.3GiB
As the table shows, PackedHashMap with a max_load_factor of 0.95 eats 2.6x
less memory than upstream SPARSE_HASHED, and it is also 2x faster for reads!
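The bytes_allocated column follows from simple arithmetic; here is an illustrative calculation (not the actual grower code), assuming power-of-two bucket counts and a 10-byte packed <UInt64, UInt16> cell:

```cpp
#include <cstdint>
#include <cstdio>
#include <initializer_list>

int main()
{
    const uint64_t elements = 1'000'000'000;   /// 1e9 keys, as in the table above
    const uint64_t cell_size = 10;             /// packed UInt64 key + UInt16 value

    for (double max_load_factor : {0.5, 0.95, 0.99})
    {
        /// Smallest power-of-two bucket count that keeps the fill ratio under the limit.
        uint64_t buckets = 1;
        while (buckets * max_load_factor < elements)
            buckets *= 2;

        printf("max_load_factor %.2f -> %llu buckets, ~%.1f GiB for the hash table\n",
               max_load_factor,
               (unsigned long long) buckets,
               buckets * cell_size / (1024.0 * 1024.0 * 1024.0));
    }
}
```

Going from 0.5 to 0.95 lets the table stay at 2^30 buckets instead of 2^31 for 1e9 keys, which is exactly the drop from 20GiB to 10GiB of bytes_allocated above.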
v2: fix grower
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
In case you want a dictionary optimized for memory, SPARSE_HASHED does not
always give you what you need.
Consider the following example: <UInt64, UInt16> as <Key, Value>. This pair
will also have 6 bytes of padding (on amd64), which is almost 40% of the
space wasted.
And because of this padding, even google::sparse_hash_map does not make the
picture better; in fact, sparse_hash_map is not very friendly to memory
allocators (especially jemalloc).
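The wastage is easy to see with sizeof (the packed struct below is illustrative, not the actual ClickHouse cell definition):

```cpp
#include <cstdint>
#include <cstdio>
#include <utility>

/// Same key/value, but without the 6 bytes of tail padding.
struct __attribute__((packed)) PackedKeyValue
{
    uint64_t key;     /// UInt64
    uint16_t value;   /// UInt16
};

int main()
{
    /// On amd64: 16 bytes (10 bytes of data + 6 bytes of padding), ~38% wasted.
    printf("std::pair<uint64_t, uint16_t>: %zu bytes\n", sizeof(std::pair<uint64_t, uint16_t>));
    /// 10 bytes, no padding.
    printf("packed key/value:              %zu bytes\n", sizeof(PackedKeyValue));
}
```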
Here are some numbers for a dictionary with 1e9 elements, UInt64 as the
key, and UInt16 as the value:
settings | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
- | - | - | - | - | -
HASHED upstream | - | - | - | - | 35GiB
SPARSE_HASHED upstream | - | - | - | - | 26GiB
- | - | - | - | - | -
sparse_hash_map glibc hashbench | - | - | - | - | 17.5GiB
sparse_hash_map packed allocator | 101.878 | 231.48 | 4.32 | - | 17.7GiB
PackedHashMap | 15.514 | 42.35 | 23.61 | 20GiB | 22GiB
As you can see, PackedHashMap looks way better than HASHED, and even better
than SPARSE_HASHED, but slightly worse than sparse_hash_map with the packed
allocator (the latter is done with a custom patch to google
sparse_hash_map).
v2: rebase on top of bucket_count fix
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
In case you have a HashMap with <UInt64, UInt16> as <Key, Value>, the
overhead of 38% can be crucial, especially if you have tons of keys.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* set allow_experimental_query_cache as obsolete
* add tsolodov to trusted contributors
* CI linter
---------
Co-authored-by: Nikita Mikhaylov <mikhaylovnikitka@gmail.com>
Also update the test, since now you could have slightly fewer sleep
intervals if the query spends some time in other places.
What is important is that query_duration_ms does not exceed the calculated
delay.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
The remote throttler for some reason had been overwritten by the global one
during reloads; this was likely done for graceful reload of this option, but
it breaks per-query throttling, so remove this logic.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
When some of these settings were set in the default profile (in
users.xml/users.yml), they would always be used regardless of what the user
passed.
Fix this by not inheriting per-query throttlers: they should be reset before
making the query context, and they should not be initialized in
Context::makeQueryContext() as before, since makeQueryContext() is called
too early, when the user settings have not been read yet.
But per-server throttling was also initialized there; move it into
ContextSharedPart::configureServerWideThrottling(), and call it once
ServerSettings are set.
Also note that this patch turns the following settings into server
settings:
- max_replicated_fetches_network_bandwidth_for_server
- max_replicated_sends_network_bandwidth_for_server
But this change should not affect anybody, since it is done in a compatible
way (i.e. if the setting is set in the user's profile, it will also be read
from there as a fallback).
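A heavily simplified sketch of the resulting order of operations (the real Context/Throttler classes and the settings plumbing are far more involved; all names below are stand-ins except those mentioned above):

```cpp
#include <memory>

/// Stand-ins for the real classes, just to show the ordering.
struct Throttler { explicit Throttler(unsigned long long /*max_speed*/) {} };
using ThrottlerPtr = std::shared_ptr<Throttler>;

struct ServerSettings
{
    unsigned long long max_replicated_fetches_network_bandwidth_for_server = 0;
    unsigned long long max_replicated_sends_network_bandwidth_for_server = 0;
};

struct ContextSharedPart
{
    ThrottlerPtr replicated_fetches_throttler;
    ThrottlerPtr replicated_sends_throttler;

    /// Server-wide throttling is configured once, after ServerSettings are read,
    /// instead of lazily from makeQueryContext() (which runs too early).
    void configureServerWideThrottling(const ServerSettings & settings)
    {
        if (auto bandwidth = settings.max_replicated_fetches_network_bandwidth_for_server)
            replicated_fetches_throttler = std::make_shared<Throttler>(bandwidth);
        if (auto bandwidth = settings.max_replicated_sends_network_bandwidth_for_server)
            replicated_sends_throttler = std::make_shared<Throttler>(bandwidth);
    }
};

struct Context
{
    ThrottlerPtr remote_read_query_throttler;   /// per-query, must start empty

    Context makeQueryContext() const
    {
        Context query_context = *this;
        /// Per-query throttlers are not inherited here; they are created later,
        /// after the user's settings are applied, so a value from the default
        /// profile is not forced on every query.
        query_context.remote_read_query_throttler.reset();
        return query_context;
    }
};

int main()
{
    ContextSharedPart shared;
    shared.configureServerWideThrottling(ServerSettings{});

    Context global_context;
    [[maybe_unused]] Context query_context = global_context.makeQueryContext();
}
```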
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>