If you want a dictionary optimized for memory, SPARSE_HASHED does not
always give you what you need.
Consider the following example: <UInt64, UInt16> as <Key, Value>. This
pair also carries 6 bytes of padding (on amd64), which is almost 40% of
wasted space.
And because of this padding, even google::sparse_hash_map does not
improve the picture; in fact, sparse_hash_map is not very friendly to
memory allocators (especially jemalloc).
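To make the padding concrete, here is a minimal standalone C++ sketch
(illustration only, not ClickHouse code; PackedPair is a made-up name):
```
#include <cstdint>
#include <iostream>
#include <utility>

// Packed layout: no padding between the 8-byte key and the 2-byte value.
struct PackedPair
{
    uint64_t key;
    uint16_t value;
} __attribute__((packed)); // GCC/Clang extension, assumed available

int main()
{
    // 16 on amd64: 10 bytes of payload padded up to the 8-byte alignment.
    std::cout << sizeof(std::pair<uint64_t, uint16_t>) << '\n';
    // 10: only the payload; 6 wasted bytes out of 16 is 37.5%, hence "almost 40%".
    std::cout << sizeof(PackedPair) << '\n';
}
```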
Here are some numbers for a dictionary with 1e9 elements, UInt64 as
key, and UInt16 as value:
settings | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
HASHED upstream | - | - | - | - | 35GiB
SPARSE_HASHED upstream | - | - | - | - | 26GiB
- | - | - | - | - | -
sparse_hash_map glibc hashbench | - | - | - | - | 17.5GiB
sparse_hash_map packed allocator | 101.878 | 231.48 | 4.32 | - | 17.7GiB
PackedHashMap | 15.514 | 42.35 | 23.61 | 20GiB | 22GiB
As you can see, PackedHashMap looks much better than HASHED, and even
better than SPARSE_HASHED, but slightly worse than sparse_hash_map with
the packed allocator (the latter is done with a custom patch to google
sparse_hash_map).
v2: rebase on top of bucket_count fix
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
If you have a HashMap with <UInt64, UInt16> as <Key, Value>, the
overhead of 38% can be crucial, especially if you have tons of keys.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* set allow_experimental_query_cache as obsolete
* add tsolodov to trusted contributors
* CI linter
---------
Co-authored-by: Nikita Mikhaylov <mikhaylovnikitka@gmail.com>
Also update the test, since now you could have slightly fewer sleep
intervals if the query spends some time in other places.
What is important is that query_duration_ms does not exceed the
calculated delay.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
The remote throttler had, for some reason, been overwritten by the
global one during reloads; this is likely meant for graceful reload of
the option, but it breaks per-query throttling, so remove this logic.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
When some of these settings were set for the default profile (in
users.xml/users.yml), they were always used regardless of what the user
passed.
Fix this by not inheriting per-query throttlers: they should be reset
before making the query context, and they should no longer be
initialized in Context::makeQueryContext(), since makeQueryContext() is
called too early, when user settings have not been read yet.
But that place also initialized per-server throttling; move this into
ContextSharedPart::configureServerWideThrottling(), and call it once we
have ServerSettings set.
Also note that this patch turns the following settings into server
settings:
- max_replicated_fetches_network_bandwidth_for_server
- max_replicated_sends_network_bandwidth_for_server
But this change should not affect anybody, since it is done with
compatibility in mind (i.e. if the setting is set in the user's profile
it will still be read from there as a fallback).
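For illustration, here is a hedged C++ sketch of the intended flow. The
simplified types and member names (Throttler, remote_read_throttler, the
ServerSettings field) are stand-ins and do not reproduce the actual
ClickHouse code:
```
#include <cstddef>
#include <memory>

struct Throttler
{
    explicit Throttler(size_t bytes_per_second) : limit(bytes_per_second) {}
    size_t limit;
};

struct ServerSettings
{
    size_t max_replicated_fetches_network_bandwidth_for_server = 0;
};

struct ContextSharedPart
{
    std::shared_ptr<Throttler> replicated_fetches_throttler; // server-wide, created once

    // Called once ServerSettings are known, instead of lazily from makeQueryContext(),
    // which runs before the user's settings have been read.
    void configureServerWideThrottling(const ServerSettings & server_settings)
    {
        if (server_settings.max_replicated_fetches_network_bandwidth_for_server)
            replicated_fetches_throttler = std::make_shared<Throttler>(
                server_settings.max_replicated_fetches_network_bandwidth_for_server);
    }
};

struct Context
{
    std::shared_ptr<Throttler> remote_read_throttler; // per-query

    Context makeQueryContext() const
    {
        Context query_context(*this);
        // Do not inherit per-query throttlers: reset them here so they are built
        // later from the user's own settings rather than the default profile.
        query_context.remote_read_throttler.reset();
        return query_context;
    }
};

int main()
{
    ContextSharedPart shared;
    shared.configureServerWideThrottling(ServerSettings{});

    Context global_context;
    Context query_context = global_context.makeQueryContext();
    (void)query_context;
}
```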
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
After an abnormal server restart, current_batch.txt (which contains the
list of files to send to the remote shard) may not contain all files if
the server was terminated between unlinking the .bin files and
truncating current_batch.txt.
It should be fixed in a more reliable way, but to keep the patch easy to
backport I kept it simple.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
```
create table kafka
(
a UInt32,
a_str String Alias toString(a)
) engine = Kafka;
create table data
(
a UInt32,
a_str String
) engine = MergeTree
order by tuple();
create materialized view data_mv to data
(
a UInt32,
a_str String
) as
select a, a_str from kafka;
```
The ALIAS type works as expected, in comparison with
MATERIALIZED/EPHEMERAL or a column with a default expression.
Ref: https://github.com/ClickHouse/ClickHouse/pull/47138
Co-authored-by: Azat Khuzhin <a3at.mail@gmail.com>
Fix a bug with projections and the aggregate_functions_null_for_empty
setting (for query_plan_optimize_projection). This was already fixed in
PR #42198 but got forgotten after using query_plan_optimize_projection.
This happens when the remote side disconnects due to inactivity. It
seems to work on Linux, likely due to a difference in SO_LINGER, or
maybe a different default timeout on Darwin.
Verified manually on ClickHouse Cloud using the following process:
1. Connect to instance.
2. Run `show tables`.
3. Wait 6 minutes.
4. Run `show tables`.
With this fix, EINVAL is not reported, and the client will simply
reconnect.