Fix possible issues with MySQL client protocol TLS connections

Occasionally, 02479_mysql_connect_to_self fails on CI [1].

  [1]: https://github.com/ClickHouse/ClickHouse/issues/50911

The problem was indeed query profiler and EINTR, but not in a way you
may think.

For such failures you may see the following trace in trace_log:

    contrib/openssl/crypto/bio/bss_sock.c:127::sock_read
    contrib/openssl/crypto/bio/bio_meth.c:121::bread_conv
    contrib/openssl/crypto/bio/bio_lib.c:285::bio_read_intern
    contrib/openssl/crypto/bio/bio_lib.c:311::BIO_read
    contrib/openssl/ssl/record/methods/tls_common.c:398::tls_default_read_n
    contrib/openssl/ssl/record/methods/tls_common.c:575::tls_get_more_records
    contrib/openssl/ssl/record/methods/tls_common.c:1122::tls_read_record
    contrib/openssl/ssl/record/rec_layer_s3.c:645::ssl3_read_bytes
    contrib/openssl/ssl/s3_lib.c:4527::ssl3_read_internal
    contrib/openssl/ssl/s3_lib.c:4551::ssl3_read
    contrib/openssl/ssl/ssl_lib.c:2343::ssl_read_internal
    contrib/openssl/ssl/ssl_lib.c:2357::SSL_read
    contrib/mariadb-connector-c/libmariadb/secure/openssl.c:729::ma_tls_read
    contrib/mariadb-connector-c/libmariadb/ma_tls.c:90::ma_pvio_tls_read
    contrib/mariadb-connector-c/libmariadb/ma_pvio.c:250::ma_pvio_read
    contrib/mariadb-connector-c/libmariadb/ma_pvio.c:297::ma_pvio_cache_read
    contrib/mariadb-connector-c/libmariadb/ma_net.c:373::ma_real_read
    contrib/mariadb-connector-c/libmariadb/ma_net.c:427::ma_net_read
    contrib/mariadb-connector-c/libmariadb/mariadb_lib.c:192::ma_net_safe_read
    contrib/mariadb-connector-c/libmariadb/mariadb_lib.c:2138::mthd_my_read_query_result
    contrib/mariadb-connector-c/libmariadb/mariadb_lib.c:2212::mysql_real_query
    src/Common/mysqlxx/Query.cpp:56::mysqlxx::Query::executeImpl()
    src/Common/mysqlxx/Query.cpp:73::mysqlxx::Query::use()
    src/Processors/Sources/MySQLSource.cpp:50::DB::MySQLSource::Connection::Connection()

After which the connection will fail with:

    Code: 1000. DB::Exception: Received from localhost:9000. DB::Exception: mysqlxx::ConnectionLost: Lost connection to MySQL server during query (127.0.0.1:9004). (POCO_EXCEPTION)

But, if you will take a look at ma_tls_read() you will see that it has
proper retries for SSL_ERROR_WANT_READ (and EINTR is just a special case
of it), but still, for some reason it fails.

And the reason is the units of the read/write timeout, ma_tls_read()
calls poll(read_timeout) in case of SSL_ERROR_WANT_READ, but, it
incorrectly assume that the timeout is in milliseconds, but that timeout
was in seconds, this bug had been fixed in [2], and now it works like a
charm!

  [2]: https://github.com/ClickHouse/mariadb-connector-c/pull/17

I've verified it with patching openssl library:

    diff --git a/crypto/bio/bss_sock.c b/crypto/bio/bss_sock.c
    index 82f7be85ae..3d2f3926a0 100644
    --- a/crypto/bio/bss_sock.c
    +++ b/crypto/bio/bss_sock.c
    @@ -124,7 +124,24 @@ static int sock_read(BIO *b, char *out, int outl)
                 ret = ktls_read_record(b->num, out, outl);
             else
     # endif
    -            ret = readsocket(b->num, out, outl);
    +        {
    +            static int i = 0;
    +            static int j = 0;
    +            if (!(++j % 5))
    +            {
    +                fprintf(stderr, "sock_read: inject EAGAIN with ret=0\n");
    +                ret = 0;
    +                errno = EAGAIN;
    +            }
    +            else if (!(++i % 3))
    +            {
    +                fprintf(stderr, "sock_read: inject EAGAIN with ret=-1\n");
    +                ret = -1;
    +                errno = EAGAIN;
    +            }
    +            else
    +                ret = readsocket(b->num, out, outl);
    +        }
             BIO_clear_retry_flags(b);
             if (ret <= 0) {
                 if (BIO_sock_should_retry(ret))

And before this patch (well, not the patch itself, but the referenced
patch in mariadb-connector-c) if you will pass read_write_timeout=1 it
will fail:

    SELECT * FROM mysql('127.0.0.1:9004', system, one, 'default', '', SETTINGS connect_timeout = 100, connection_wait_timeout = 100, read_write_timeout=1)
    Code: 1000. DB::Exception: Received from localhost:9000. DB::Exception: mysqlxx::ConnectionLost: Lost connection to MySQL server during query (127.0.0.1:9004). (POCO_EXCEPTION)

But after, it always works:

    $ ch benchmark -c10 -q "SELECT * FROM mysql('127.0.0.1:9004', system, one, 'default', '', SETTINGS connection_pool_size=1, connect_timeout = 100, connection_wait_timeout = 100, read_write_timeout=1)"
    ^CStopping launch of queries. SIGINT received.

    Queries executed: 478.

    localhost:9000, queries: 478, QPS: 120.171, RPS: 120.171, MiB/s: 0.001, result RPS: 120.171, result MiB/s: 0.001.

    0.000%          0.014 sec.
    10.000%         0.058 sec.
    20.000%         0.065 sec.
    30.000%         0.073 sec.
    40.000%         0.079 sec.
    50.000%         0.087 sec.
    60.000%         0.089 sec.
    70.000%         0.091 sec.
    80.000%         0.095 sec.
    90.000%         0.100 sec.
    95.000%         0.102 sec.
    99.000%         0.105 sec.
    99.900%         0.140 sec.
    99.990%         0.140 sec.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
This commit is contained in:
Azat Khuzhin 2024-07-01 17:05:34 +02:00
parent 63b031479c
commit 9c7db6711b

@ -1 +1 @@
Subproject commit e39608998f5f6944ece9ec61f48e9172ec1de660
Subproject commit d0a788c5b9fcaca2368d9233770d3ca91ea79f88