I've noticed this in one of our production setups, where lots of
queries are executed with distributed_group_by_no_merge=2 (set
automatically via the optimize_skip_unused_shards optimization) and
optimize_aggregation_in_order=1. With these two settings the initiator
may not read from some shards (i.e. X) for quite a long period of time,
because it is reading from other shards (i.e. Y) and does not need any
data from X yet (since it reads everything in order), and this leads to
query timeouts, so the timeouts had been increased.
Previously both timeouts had been tuned, but this leads to connection
hangs in case of abnormal machine reboots. So it is better to tune only
send_timeout (this should be enough, since the only problem is on the
sender side), which will let the initiator see that the connection is
broken once it reads from this shard.
But after changing only send_timeout, the query still timed out, and
the reason is this place, which swaps these timeouts.
It had been introduced in 134efcd, with a comment:
NOTE: We use send_timeout for the receive timeout and vice versa (change arguments ordering in TimeoutSetter),
because send_timeout is client-side setting which has opposite meaning on the server side.
But it sounds odd to me; it may only make sense for clickhouse-client,
since no other driver implements any server settings.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* support async insert for native protocol
* use separate queue for async inserts via native protocol (see the sketch after this list)
* fix test
* better logging for async inserts and more tests
* disable mixed internal and external data in async inserts
* fix tests
* fix quota in async inserts
* disable async insert for secondary query of distributed
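As a rough sketch of the "separate queue" item above (the names and key
fields here are hypothetical, not the actual AsynchronousInsertQueue
code): buffered inserts are grouped by a key, and adding the data kind
to that key keeps native-protocol inserts from being merged with plain
query-text inserts:

    #include <iostream>
    #include <map>
    #include <string>
    #include <tuple>
    #include <vector>

    // Hypothetical queue key: inserts are only merged within one bucket.
    enum class DataKind { Parsed, Preprocessed };  // query text vs. native blocks

    struct InsertKey
    {
        std::string query;  // normalized INSERT query
        DataKind kind;      // keeps native-protocol inserts in their own bucket

        bool operator<(const InsertKey & other) const
        {
            return std::tie(query, kind) < std::tie(other.query, other.kind);
        }
    };

    int main()
    {
        std::map<InsertKey, std::vector<std::string>> queue;

        // Same table, but different data kinds land in different buckets,
        // so they are never flushed as one mixed batch.
        queue[{"INSERT INTO t", DataKind::Parsed}].push_back("1,2");
        queue[{"INSERT INTO t", DataKind::Preprocessed}].push_back("<native block>");

        std::cout << "buckets: " << queue.size() << '\n';  // prints 2
    }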
Added a new type of authentication based on SSH keys. It works only for
the Native TCP protocol.
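For context, key-based authentication typically follows a
challenge-response shape: the server sends a random challenge, the
client signs it with its private key, and the server verifies the
signature against the user's registered public key. The sketch below
only illustrates that shape; fake_sign is a placeholder, not real SSH
cryptography and not the actual implementation:

    #include <functional>
    #include <iostream>
    #include <random>
    #include <string>

    // Placeholder "signature": NOT real crypto, just enough to show the
    // challenge-response flow. A real implementation signs with the SSH
    // private key and verifies with the stored public key.
    static std::string fake_sign(const std::string & challenge, const std::string & key)
    {
        return std::to_string(std::hash<std::string>{}(challenge + key));
    }

    int main()
    {
        const std::string key = "user-ssh-key";  // stands in for a key pair

        // 1. Server generates a random challenge and sends it to the client.
        std::mt19937_64 rng{std::random_device{}()};
        std::string challenge = std::to_string(rng());

        // 2. Client signs the challenge with its (private) key.
        std::string signature = fake_sign(challenge, key);

        // 3. Server verifies the signature against the registered key.
        bool ok = (signature == fake_sign(challenge, key));
        std::cout << (ok ? "authenticated" : "rejected") << '\n';
    }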
Co-authored-by: Nikita Mikhaylov <nikitamikhaylov@clickhouse.com>
Co-authored-by: Robert Schulze <robert@clickhouse.com>
Resolves: #51486
Until now, it was illegal to encode 64-bit (unsigned) integers with
MSB=1, i.e. values > (1ULL<<63) - 1, as var-int. In more detail, the
var-int code used by ClickHouse server and client spent at most 9 bytes
per value such that 9 * 7 = 63 bits could be encoded. Some 3rd-party
clients (e.g. Rust clickhouse-rs) had the same limitation, whereas
others (e.g. Python clickhouse-driver) understand the full range.
PRs #47608 and #48628 added sanity checks as asserts or exceptions
during var-int encoding on the server side. This was considered okay,
as such huge integers had so far occurred only during testing (usually
fuzzing) but not in practice.
Issue #51486 is a new fuzzing issue where the exception thrown from the
sanity check led to a half-baked progress packet and, as a result, a
logical error / server crash.
The only fix that is not another band-aid is to allow the full range in
var-int coding. Clients will have to allow the full range too; a note
will be added to the changelog. (The alternative was to create another
protocol version, but since var-int is used all over the place, this
was considered infeasible.)
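For reference, here is a minimal sketch of LEB128-style var-int coding
over the full 64-bit range (the general technique only, not necessarily
byte-for-byte what writeVarUInt/readVarUInt end up as in this change):

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Classic LEB128-style var-int: 7 payload bits per byte, MSB as the
    // continuation flag. A full 64-bit value needs up to ceil(64 / 7) = 10
    // bytes; capping the loop at 9 bytes is exactly what limited the
    // encodable range to 63 bits.
    static void write_var_uint(uint64_t x, std::vector<uint8_t> & out)
    {
        do
        {
            uint8_t byte = x & 0x7F;
            x >>= 7;
            if (x)
                byte |= 0x80;  // more bytes follow
            out.push_back(byte);
        } while (x);
    }

    static uint64_t read_var_uint(const std::vector<uint8_t> & in, size_t & pos)
    {
        uint64_t x = 0;
        for (size_t i = 0; i < 10; ++i)  // 10 bytes cover the full 64-bit range
        {
            uint64_t byte = in[pos++];
            x |= (byte & 0x7F) << (7 * i);
            if (!(byte & 0x80))
                break;
        }
        return x;
    }

    int main()
    {
        std::vector<uint8_t> buf;
        uint64_t value = (1ULL << 63) + 42;  // MSB=1: rejected by a 9-byte cap
        write_var_uint(value, buf);

        size_t pos = 0;
        uint64_t decoded = read_var_uint(buf, pos);
        std::printf("%d bytes, roundtrip ok: %d\n", (int)buf.size(), decoded == value);
    }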
Review note: this is the relevant commit.