Sometimes a replica may be read-only for a long time, which makes some
INSERT queries fail. It does not make sense to INSERT into a read-only
replica, so let's skip such replicas.
Note that this requires extending TableStatus (not extending, but
rather introducing a new version of it) that has an is_readonly field.
Also note that background INSERTs into Distributed previously did not
use getManyChecked(), and thus did not request the TableStatus packet;
now they will, though this is minor (just a note).
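A minimal sketch of the resulting filtering (illustrative only:
skipReadOnlyReplicas is a hypothetical helper, and the real TryResult
differs in detail):

    #include <vector>

    /// Illustrative stand-in for the real TryResult: a per-replica probe
    /// result that now also carries the read-only flag reported by the
    /// server in the new version of the TableStatus packet.
    struct TryResult
    {
        bool is_usable = false;
        bool is_up_to_date = false;
        bool is_readonly = false;  /// new field, placed at the end (see v8)
    };

    /// For a background INSERT into Distributed: skip read-only replicas
    /// instead of failing the INSERT on them.
    std::vector<TryResult> skipReadOnlyReplicas(const std::vector<TryResult> & results)
    {
        std::vector<TryResult> usable;
        for (const auto & r : results)
            if (r.is_usable && !r.is_readonly)
                usable.push_back(r);
        return usable;
    }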
v2: Add a note about max_replica_delay_for_distributed_queries for INSERT
v3: Skip read-only replicas for async INSERT into Distributed
v4: Remove extra @insert parameter for ConnectionPool*::get*
It makes sense only when the table name has been passed --
ConnectionPoolWithFailover::getManyChecked()
v5: rebase on top of LoggerPtr
v6: rebase
v7: rebase
v8: move TryResult::is_readonly to the end
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Right now the receive_timeout for INSERT works as a timeout for
receiving a data block. However, this is not very convenient, since
sometimes the server may not send data for quite some time (e.g. due to
in-order aggregation); Ping packets are there for a reason (Progress
and ProfileEvents as well, though their purpose is different).
Initially this special handling of receive_timeout was added in
6a5ef9be83 ("dbms: fixed error with
hanging INSERTs [#METR-16514]"), but the behaviour was different back
then: receivePacket() was outside the loop, only poll() was inside, and
that was a workaround for the poll() timeout (which does not trigger
the socket timeout).
But in fabd7193bd ("Code cleanups and
improvements"), receivePacket() was moved into the loop, which changed
the timeout's behaviour to the current one.
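Roughly, the two shapes of the receive loop (a simplified sketch with
stub types, not the actual code):

    #include <stdexcept>

    struct Packet { bool end_of_stream = false; };

    struct Connection  /// stub, only to show the loop shapes
    {
        bool poll(int /*timeout_ms*/) { return true; }
        Packet receivePacket() { return {true}; }
    };

    /// Before fabd7193bd: receivePacket() outside the loop, only poll()
    /// inside, as a workaround for poll() not triggering the socket timeout.
    void waitOld(Connection & conn, int receive_timeout_ms)
    {
        if (!conn.poll(receive_timeout_ms))
            throw std::runtime_error("receive timeout");
        conn.receivePacket();
    }

    /// After fabd7193bd: receivePacket() inside the loop, so receive_timeout
    /// now bounds the gap between any two consecutive packets.
    void waitNew(Connection & conn, int receive_timeout_ms)
    {
        while (true)
        {
            if (!conn.poll(receive_timeout_ms))
                throw std::runtime_error("receive timeout");
            if (conn.receivePacket().end_of_stream)
                break;
        }
    }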
Though none of this will help for INSERT queries anyway, since there
are no Ping packets for them. Yet.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
I noticed this in one of our production setups, where lots of queries are
executed with distributed_group_by_no_merge=2 (set automatically via the
optimize_skip_unused_shards optimization) and
optimize_aggregation_in_order=1. With these two settings the initiator
may not read from some shards (say, X) for quite a long period of time,
because it is reading from other shards (say, Y) and does not need
any data from X yet (since it reads everything in order). This leads
to query timeouts, so the timeouts had been increased.
Previously both timeouts had been tuned, but this leads to connections
hanging in case of abnormal machine reboots. So it is better to tune only
send_timeout (this should be enough, since the only problem is on the
sender's side); then the initiator will see that the connection is broken
as soon as it reads from that shard.
But after changing only send_timeout, the query still timed out, and the
reason is this place, which swaps these timeouts.
It had been introduced in 134efcd, with a comment:
    NOTE: We use send_timeout for the receive timeout and vice versa (change arguments ordering in TimeoutSetter),
    because send_timeout is client-side setting which has opposite meaning on the server side.
But this sounds odd to me; it may only make sense for
clickhouse-client, since no other driver implements any server-side
settings.
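For illustration, the swap looks roughly like this (a hypothetical
TimeoutSetter signature; the real class wraps a Poco socket):

    #include <chrono>

    /// Hypothetical stand-in for the real TimeoutSetter, shown only to make
    /// the argument swap visible.
    struct TimeoutSetter
    {
        TimeoutSetter(std::chrono::seconds receive_timeout, std::chrono::seconds send_timeout)
            : receive(receive_timeout), send(send_timeout) {}

        std::chrono::seconds receive;
        std::chrono::seconds send;
    };

    void serverSide(std::chrono::seconds send_timeout, std::chrono::seconds receive_timeout)
    {
        /// What 134efcd did: apply the client's send_timeout as the server's
        /// receive timeout and vice versa, so tuning only send_timeout on the
        /// client side never had the intended effect.
        TimeoutSetter swapped(/* receive = */ send_timeout, /* send = */ receive_timeout);

        /// The straightforward form, without the swap:
        TimeoutSetter straight(receive_timeout, send_timeout);

        (void) swapped;
        (void) straight;
    }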
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* support async insert for native protocol
* use separate queue for async inserts via native protocol
* fix test
* better logging for async inserts and more tests
* disable mixed internal and external data in async inserts
* fix tests
* fix quota in async inserts
* disable async insert for secondary query of distributed
Added a new type of authentication based on SSH keys. It works only for the native TCP protocol.
Co-authored-by: Nikita Mikhaylov <nikitamikhaylov@clickhouse.com>
Co-authored-by: Robert Schulze <robert@clickhouse.com>
Resolves: #51486
Until now, it was illegal to encode 64-bit (unsigned) integers with
MSB=1, i.e. values > (1ULL<<63) - 1, as var-int. In more detail, the
var-int code used by ClickHouse server and client spent at most 9 bytes
per value such that 9 * 7 = 63 bits could be encoded. Some 3rd party
clients (e.g. Rust clickhouse-rs) had the same limitation, whereas other
clients understand the full range (Python clickhouse-driver).
PRs #47608 and #48628 added sanity checks as asserts or exceptions
during var-int encoding on the server side. This was considered okay as
such huge integers so far occurred only during testing (usually fuzzing)
but not in practice.
Issue #51486 is a new fuzzing issue where the exception thrown from the
sanity check led to a half-baked progress packet and as a result, a
logical error / server crash.
The only fix which is not another band-aid is to allow the full range in
var-int coding. Clients will have to allow the full range too; a note
will be added to the changelog. (The alternative was to create another
protocol version, but as var-int is used all over the place, this was
considered infeasible.)
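For reference, a full-range unsigned var-int (LEB128-style) encoder
needs at most 10 bytes per UInt64; a simplified sketch (the real
writeVarUInt interface differs):

    #include <cstddef>
    #include <cstdint>

    /// 7 payload bits per byte plus a continuation bit: 9 bytes cover only
    /// 9 * 7 = 63 bits, so a 10th byte is needed for values with MSB=1,
    /// i.e. values > (1ULL << 63) - 1.
    size_t writeVarUInt(uint64_t x, uint8_t * out)
    {
        size_t i = 0;
        while (x >= 0x80)
        {
            out[i++] = static_cast<uint8_t>((x & 0x7F) | 0x80);
            x >>= 7;
        }
        out[i++] = static_cast<uint8_t>(x);
        return i;  /// 1..10 bytes
    }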
Review note: this is the relevant commit.
The previous implementation (DBMS_MIN_REVISION_WITH_INTERSERVER_SECRET)
accepted the salt from the client, which made it useless.
Reimplement the protocol so that the server sends the salt, and the
client uses it instead.
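Schematically, the fixed handshake looks like this (names are
illustrative, not the actual implementation):

    #include <random>
    #include <string>

    /// The server generates the salt and sends it to the client, so the
    /// client can no longer pick (and replay) a salt of its own choice.
    /// NB: a real implementation must use a cryptographically secure RNG.
    std::string generateServerSalt(size_t length = 32)
    {
        static const char alphabet[] = "0123456789abcdef";
        std::mt19937_64 rng{std::random_device{}()};
        std::uniform_int_distribution<size_t> dist(0, sizeof(alphabet) - 2);
        std::string salt(length, '0');
        for (auto & c : salt)
            c = alphabet[dist(rng)];
        return salt;
    }

    /// Both sides then derive the proof from the server-chosen salt and the
    /// shared interserver secret (hashing elided in this sketch):
    ///     proof = hash(salt + secret)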
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
It is useful for understanding what is going on in some obscure cases,
for instance if someone copies the configuration from production to some
Docker environment, and then you see Docker's private network addresses
in the logs.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>