mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-11-27 10:02:01 +00:00
Fix possible "No more packets are available" for distributed queries
CI founds the following case: <details> 2022.05.25 22:36:06.778808 [ 3037 ] {77743723-1fcd-4b3d-babc-d0615e3ff40e} <Fatal> : Logical error: 'No more packets are available.'. 2022.05.25 22:42:24.960075 [ 17397 ] {} <Fatal> BaseDaemon: ######################################## 2022.05.25 22:42:24.971173 [ 17397 ] {} <Fatal> BaseDaemon: (version 22.6.1.1, build id: 9A1F9489854CED36) (from thread 3037) (query_id: 77743723-1fcd-4b3d-babc-d0615e3ff40e) (query: SELECT * FROM 2022.05.25 22:42:25.046871 [ 17397 ] {} <Fatal> BaseDaemon: 5. ./build_docker/../src/Common/Exception.cpp:47: DB::abortOnFailedAssertion() 2022.05.25 22:42:25.181449 [ 17397 ] {} <Fatal> BaseDaemon: 6. ./build_docker/../src/Common/Exception.cpp:70: DB::Exception::Exception() 2022.05.25 22:42:25.367710 [ 17397 ] {} <Fatal> BaseDaemon: 7. ./build_docker/../src/Client/MultiplexedConnections.cpp:0: DB::MultiplexedConnections::receivePacketUnlocked() 2022.05.25 22:42:25.414201 [ 17397 ] {} <Fatal> BaseDaemon: 8. ./build_docker/../src/Client/MultiplexedConnections.cpp:0: DB::MultiplexedConnections::receivePacket() 2022.05.25 22:42:25.493066 [ 17397 ] {} <Fatal> BaseDaemon: 9. ./build_docker/../src/QueryPipeline/RemoteQueryExecutor.cpp:279: DB::RemoteQueryExecutor::read() 2022.05.25 22:42:25.612679 [ 17397 ] {} <Fatal> BaseDaemon: 10. ./build_docker/../src/Processors/Sources/RemoteSource.cpp:0: DB::RemoteSource::tryGenerate() Here are additional logs for this query: $ pigz -cd clickhouse-server.stress.log.gz | fgrep -a 77743723-1fcd-4b3d-babc-d0615e3ff40e | fgrep -e Connection -e Distributed -e Fatal 2022.05.25 22:36:04.698671 [ 6613 ] {77743723-1fcd-4b3d-babc-d0615e3ff40e} <Trace> Connection (127.0.0.2:9000): Connecting. Database: (not specified). User: default 2022.05.25 22:36:04.722568 [ 3419 ] {77743723-1fcd-4b3d-babc-d0615e3ff40e} <Trace> Connection (127.0.0.2:9000): Connecting. Database: (not specified). User: default 2022.05.25 22:36:05.014432 [ 6613 ] {77743723-1fcd-4b3d-babc-d0615e3ff40e} <Trace> Connection (127.0.0.2:9000): Connected to ClickHouse server version 22.6.1. 2022.05.25 22:36:05.091397 [ 6613 ] {77743723-1fcd-4b3d-babc-d0615e3ff40e} <Debug> Connection (127.0.0.2:9000): Sent data for 2 scalars, total 2 rows in 0.000125814 sec., 15602 rows/sec., 68.00 B (517.81 KiB/sec.), compressed 0.4594594594594595 times to 148.00 B (1.10 MiB/sec.) 2022.05.25 22:36:05.301301 [ 3419 ] {77743723-1fcd-4b3d-babc-d0615e3ff40e} <Trace> Connection (127.0.0.2:9000): Connected to ClickHouse server version 22.6.1. 2022.05.25 22:36:05.343140 [ 3419 ] {77743723-1fcd-4b3d-babc-d0615e3ff40e} <Debug> Connection (127.0.0.2:9000): Sent data for 2 scalars, total 2 rows in 0.000116304 sec., 16889 rows/sec., 68.00 B (559.80 KiB/sec.), compressed 0.4594594594594595 times to 148.00 B (1.19 MiB/sec.) 2022.05.25 22:36:06.682535 [ 6613 ] {77743723-1fcd-4b3d-babc-d0615e3ff40e} <Trace> StorageDistributed (remote): (127.0.0.2:9000) Cancelling query because enough data has been read 2022.05.25 22:36:06.778808 [ 3037 ] {77743723-1fcd-4b3d-babc-d0615e3ff40e} <Fatal> : Logical error: 'No more packets are available.'. 2022.05.25 22:36:06.789505 [ 3419 ] {77743723-1fcd-4b3d-babc-d0615e3ff40e} <Trace> StorageDistributed (remote): (127.0.0.2:9000) Cancelling query because enough data has been read 2022.05.25 22:42:24.971173 [ 17397 ] {} <Fatal> BaseDaemon: (version 22.6.1.1, build id: 9A1F9489854CED36) (from thread 3037) (query_id: 77743723-1fcd-4b3d-babc-d0615e3ff40e) (query: SELECT * FROM </details> So between cancelling different sources the LOGICAL_ERROR occured, I believe that this is because of the race: T1: T2: RemoteQueryExecutor::read() checks was_cancelled RemoteQueryExecutor::tryCancel() connections->cancel() calls connections->receivePacket() Note, for this problem async_socket_for_remote/use_hedged_requests should be disabled, and original settings was: - --max_parallel_replicas=3 - --use_hedged_requests=false - --allow_experimental_parallel_reading_from_replicas=3 CI: https://s3.amazonaws.com/clickhouse-test-reports/37469/41cb029ed23e77f3a108e07e6b1b1bcb03dc7fcf/stress_test__undefined__actions_/fatal_messages.txt Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
This commit is contained in:
parent
5faaf1b553
commit
078678237e
@ -275,6 +275,7 @@ Block RemoteQueryExecutor::read()
|
|||||||
|
|
||||||
while (true)
|
while (true)
|
||||||
{
|
{
|
||||||
|
std::lock_guard lock(was_cancelled_mutex);
|
||||||
if (was_cancelled)
|
if (was_cancelled)
|
||||||
return Block();
|
return Block();
|
||||||
|
|
||||||
@ -602,7 +603,8 @@ void RemoteQueryExecutor::sendExternalTables()
|
|||||||
|
|
||||||
void RemoteQueryExecutor::tryCancel(const char * reason, std::unique_ptr<ReadContext> * read_context)
|
void RemoteQueryExecutor::tryCancel(const char * reason, std::unique_ptr<ReadContext> * read_context)
|
||||||
{
|
{
|
||||||
/// Flag was_cancelled is atomic because it is checked in read().
|
/// Flag was_cancelled is atomic because it is checked in read(),
|
||||||
|
/// in case of packet had been read by fiber (async_socket_for_remote).
|
||||||
std::lock_guard guard(was_cancelled_mutex);
|
std::lock_guard guard(was_cancelled_mutex);
|
||||||
|
|
||||||
if (was_cancelled)
|
if (was_cancelled)
|
||||||
|
Loading…
Reference in New Issue
Block a user