Commit Graph

21 Commits

Author SHA1 Message Date
Azat Khuzhin
610382b693 kafka: fix SIGSEGV if there is an message with error in the middle of the batch
ReadBufferFromKafkaConsumer does not handle the case when there is
message with an error on non first position in the current batch, since
it goes through messages in the batch after poll and stop on first valid
message.

But later it can try to use message as valid:
- while storing offset
- get topic name
- ...

And besides the message itself is also invalid (you can find this in the
gdb traces below).

So just filter out messages win an error error after poll.

SIGSEGV was with the following stacktrace:
    (gdb) bt
    3  0x0000000010f05b4d in rd_kafka_offset_store (app_rkt=0x0, partition=0, offset=0) at ../contrib/librdkafka/src/rdkafka_offset.c:656
    4  0x0000000010e69657 in cppkafka::Consumer::store_offset (this=0x7f2015210820, msg=...) at ../contrib/cppkafka/include/cppkafka/message.h:225
    5  0x000000000e68f208 in DB::ReadBufferFromKafkaConsumer::storeLastReadMessageOffset (this=0x7f206a136618) at ../contrib/libcxx/include/iterator:1508
    6  0x000000000e68b207 in DB::KafkaBlockInputStream::readImpl (this=0x7f202c689020) at ../src/Storages/Kafka/KafkaBlockInputStream.cpp:150
    7  0x000000000dd1178d in DB::IBlockInputStream::read (this=this@entry=0x7f202c689020) at ../src/DataStreams/IBlockInputStream.cpp:60
    8  0x000000000dd34c0a in DB::copyDataImpl<> () at ../src/DataStreams/copyData.cpp:21
    9  DB::copyData () at ../src/DataStreams/copyData.cpp:62
    10 0x000000000e67c8f2 in DB::StorageKafka::streamToViews () at ../contrib/libcxx/include/memory:3823
    11 0x000000000e67d218 in DB::StorageKafka::threadFunc () at ../src/Storages/Kafka/StorageKafka.cpp:488

And some information from it:

    (gdb) p this.current.__i
    $14 = (std::__1::__wrap_iter<cppkafka::Message const*>::iterator_type) 0x7f1ca8f58660

    # current-1
    (gdb) p $14-1
    $15 = (const cppkafka::Message *) 0x7f1ca8f58600
    (gdb) p $16.handle_
    $17 = {__ptr_ = {<std::__1::__compressed_pair_elem<rd_kafka_message_s*, 0, false>> = { __value_ = 0x7f203577f938}, ...}
    (gdb) p *(rd_kafka_message_s*)0x7f203577f938
    $24 = {err = RD_KAFKA_RESP_ERR__TRANSPORT, rkt = 0x0, partition = 0, payload = 0x7f202f0339c0, len = 63, key = 0x0, key_len = 0, offset = 0, _private = 0x7f203577f8c0}

    # current
    (gdb) p $14-0
    $28 = (const cppkafka::Message *) 0x7f1ca8f58660
    (gdb) p $28.handle_.__ptr_
    $29 = {<std::__1::__compressed_pair_elem<rd_kafka_message_s*, 0, false>> = { __value_ = 0x7f184f129bf0}, ...}
    (gdb) p *(rd_kafka_message_s*)0x7f184f129bf0
    $30 = {err = RD_KAFKA_RESP_ERR_NO_ERROR, rkt = 0x7f1ed44fe000, partition = 1, payload = 0x7f1fc9bc6036, len = 242, key = 0x0, key_len = 0, offset = 2394853582209,

    # current+1
    (gdb) p (*($14+1)).handle_.__ptr_
    $44 = {<std::__1::__compressed_pair_elem<rd_kafka_message_s*, 0, false>> = { __value_ = 0x7f184f129d30}, ...}
    (gdb) p *(rd_kafka_message_s*)0x7f184f129d30
    $45 = {err = RD_KAFKA_RESP_ERR_NO_ERROR, rkt = 0x7f1ed44fe000, partition = 1, payload = 0x7f1fc9bc612f, len = 31, key = 0x0, key_len = 0, offset = 2394853582210,
      _private = 0x7f184f129cc0}

    # distance from the beginning
    (gdb) p messages.__end_-messages.__begin_
    $34 = 65536
    (gdb) p ($14-0)-messages.__begin_
    $37 = 8965
    (gdb) p ($14-1)-messages.__begin_
    $38 = 8964

    # parsing info
    (gdb) p allowed
    $39 = false
    (gdb) p new_rows
    $40 = 1
    (gdb) p total_rows
    $41 = 8964

    # current buffer is invalid
    (gdb) p *buffer.__ptr_
    $50 = {<DB::ReadBuffer> = {<DB::BufferBase> = {pos = 0x7f202f0339c0 "FindCoordinator response error: Local: Broker transport failure", bytes = 47904863385, working_buffer = {
            begin_pos = 0x7f202f0339c0 "FindCoordinator response error: Local: Broker transport failure",
            end_pos = 0x7f202f0339c0 "FindCoordinator response error: Local: Broker transport failure"}, internal_buffer = {

v0: check message errors in ReadBufferFromKafkaConsumer::nextImpl() (but
this may lead to using of that messages after and SIGSEGV again, doh).
v2: skip messages with an error after poll.
2020-07-10 11:41:44 +03:00
Azat Khuzhin
bd5e5e9462 kafka: remove outdated comment
As stated by @filimonov it is not relevant (after #11599)
2020-07-03 09:23:39 +03:00
Azat Khuzhin
6104872cae kafka: fix SIGSEGV on DROP TABLE
After #11599 it is possible that messages of the
ReadBufferFromKafkaConsumer will be cleaned-up right in
read_kafka_message callback (from KafkaBlockInputStream) if the stop
flag isset (i.e. DROP TABLE is waiting the consumer), and if
read_kafka_message already processed some rows it will not return 0 and
the loop after will try to get current topic from the buffer, which uses
messages in the underlying and this will got SIGSEGV:

    12:14:56.173262 [ 55421 ] {f7930856-d478-4e41-af56-24ce7b693e95} <Debug> executeQuery: (from 0.0.0.0:0, user: ) DROP TABLE IF EXISTS data.queue
    12:14:56.173285 [ 55421 ] {f7930856-d478-4e41-af56-24ce7b693e95} <Trace> StorageKafka (newly_queue): Waiting for cleanup
    12:14:56.180016 [ 55390 ] {} <Trace> BaseDaemon: Received signal 11
    12:14:56.180267 [ 4914 ] {} <Fatal> BaseDaemon: ########################################
    12:14:56.181879 [ 4914 ] {} <Fatal> BaseDaemon: (version 20.6.1.1, build id: 4CE0298F08583658) (from thread 55468) (no query) Received signal Segmentation fault (11)
    12:14:56.181900 [ 4914 ] {} <Fatal> BaseDaemon: Address: 0x8 Access: read. Address not mapped to object.
    12:14:56.181909 [ 4914 ] {} <Fatal> BaseDaemon: Stack trace:
    12:14:56.184676 [ 4914 ] {} <Fatal> BaseDaemon: 3. /ch/contrib/cppkafka/include/cppkafka/message.h:111: DB::KafkaBlockInputStream::readImpl() @ 0xe343f1c in /usr/lib/debug/usr/bin/clickhouse
    12:14:56.185553 [ 4914 ] {} <Fatal> BaseDaemon: 4. /ch/contrib/libcxx/include/vector:1003: DB::IBlockInputStream::read() @ 0xd9d95bd in /usr/lib/debug/usr/bin/clickhouse
    12:14:56.188238 [ 4914 ] {} <Fatal> BaseDaemon: 5. /ch/src/DataStreams/copyData.cpp:26: DB::copyData() @ 0xd9f712a in /usr/lib/debug/usr/bin/clickhouse
    12:14:56.188780 [ 4914 ] {} <Fatal> BaseDaemon: 6. /ch/contrib/libcxx/include/vector:1532: DB::StorageKafka::streamToViews() @ 0xe335e73 in /usr/lib/debug/usr/bin/clickhouse
    12:14:56.189331 [ 4914 ] {} <Fatal> BaseDaemon: 7. /ch/src/Storages/Kafka/StorageKafka.cpp:491: DB::StorageKafka::threadFunc() @ 0xe336738 in /usr/lib/debug/usr/bin/clickhouse

55421 thread (shows that it still waiting for deactivation):

    5  std::__1::lock_guard<>::lock_guard () at ../contrib/libcxx/include/__mutex_base:90
    6  DB::BackgroundSchedulePoolTaskInfo::deactivate (this=0x7fc7e4465f20) at ../src/Core/BackgroundSchedulePool.cpp:59
    7  DB::StorageKafka::shutdown (this=0x7fc7e45e4600) at ../contrib/libcxx/include/memory:3821

And just in case thread where read_kafka_message is called:

    0  DB::ReadBufferFromKafkaConsumer::nextImpl (this=0x7fd4901d4118) at ../contrib/libcxx/include/atomic:1491
    1  DB::ReadBuffer::next (this=0x7fd4901d4118) at ../src/IO/ReadBuffer.h:59
    2  DB::ReadBuffer::eof (this=0x7fd4901d4118) at ../src/IO/ReadBuffer.h:81
    3  DB::skipWhitespaceIfAny (buf=...) at ../src/IO/ReadHelpers.h:945
    4  DB::JSONEachRowRowInputFormat::readRow (ext=..., columns=..., this=0x7fd499a7a020) at ../src/Processors/Formats/Impl/JSONEachRowRowInputFormat.cpp:222
    5  DB::JSONEachRowRowInputFormat::readRow (this=0x7fd499a7a020, columns=..., ext=...) at ../src/Processors/Formats/Impl/JSONEachRowRowInputFormat.cpp:218
    6  DB::IRowInputFormat::generate (this=0x7fd499a7a020) at ../src/Processors/Formats/IRowInputFormat.cpp:64
    7  DB::ISource::work (this=0x7fd499a7a020) at ../src/Processors/ISource.cpp:48
    8  DB::KafkaBlockInputStream::<lambda()>::operator()(void) const () at ../contrib/libcxx/include/memory:3826
    9  DB::KafkaBlockInputStream::readImpl (this=0x7fd46e718820) at ../contrib/libcxx/include/new:340

Cc: @filimonov
2020-07-03 09:22:44 +03:00
Mikhail Filimonov
9609bd9dee Kafka better states, formats based on PeekableReadBuffer, and other minor fixes.
Add formats tests, fixes for JSONCompactEachRowWithNamesAndTypes, TSVWithNamesAndTypes. Some CR fixes
Add sanitizing for kafka_max_block_size and kafka_poll_max_batch_size
2020-06-19 10:49:05 +02:00
Alexey Milovidov
4bb3381bc9 Fix build after merge 2020-05-26 00:53:29 +03:00
filimonov
73a0ef6c0e
Fix for the hang during deletion of engine=Kafka (one more time) (#11145)
* Added flag for safer rdkafka destruction, but more testing detected another hang case, which has no straigt solutions and can be workarounded currenly only by draining the consumer queue, so destructor is back

* After review fixes

* After review fixes2

* After review fixes3
2020-05-26 00:13:10 +03:00
Alexey Milovidov
7e1813825b Return old names of macros 2020-05-24 01:24:01 +03:00
Alexey Milovidov
85f84550ba Progress on task 2020-05-23 23:37:37 +03:00
Alexey Milovidov
86b4d5a86f Remove duplicate whitespaces (preparation) 2020-05-23 22:09:06 +03:00
Alexey Milovidov
a2ad11897f Remove duplicate whitespaces (preparation) 2020-05-23 21:53:58 +03:00
Alexey Milovidov
1f13515a65 Make all LOG in single line (preparation) 2020-05-23 21:31:37 +03:00
Alexey Milovidov
d6bc4d2bd0 find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, [^<]+ << "[^"]+" << [^<]+\);' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+), ([^<]+) << "([^"]+)" << ([^<]+)\);/\1_FORMATTED(\2, "{}\4{}", \3, \5);/' 2020-05-23 20:14:57 +03:00
Alexey Milovidov
8042e5febe find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+" << [^<]+ << "[^"]+" << [^<]+\);' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+), "([^"]+)" << ([^<]+) << "([^"]+)" << ([^<]+)\);/\1_FORMATTED(\2, "\3{}\5{}", \4, \6);/' 2020-05-23 19:58:15 +03:00
Alexey Milovidov
ee4ffbc332 find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+" << [^<]+\);' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+), "([^"]+)" << ([^<]+)\);/\1_FORMATTED(\2, "\3{}", \4);/' 2020-05-23 19:47:56 +03:00
Alexey Milovidov
8d2e80a5e2 find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+"\)' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+, "[^"]+")\)/\1_FORMATTED(\2)/' 2020-05-23 19:42:39 +03:00
Mikhail Filimonov
e8a13842af Better fix, previous fix was wrong (was leaving the cycle by eof condition) 2020-05-20 11:02:02 +02:00
Mikhail Filimonov
b50480cbde Fixes the potential missed data during termination of Kafka engine table 2020-05-19 18:55:37 +02:00
Mikhail Filimonov
405cce2c53 Another variant of fix. It looks like we do not need destructor at all 2020-05-15 22:19:21 +02:00
Mikhail Filimonov
a1013f1e71 Fix for the hang during deletion of engine=Kafka 2020-05-14 10:38:02 +02:00
Alexey Milovidov
be22a4b94e Checkpoint 2020-04-22 08:39:31 +03:00
Ivan Lezhankin
06446b4f08 dbms/ → src/ 2020-04-03 18:14:31 +03:00