Right now it is possible to get the following error:
Having 20 errors! 0 tests passed. 0 tests skipped. 57.37 s elapsed (MainProcess).
Won't run stateful tests because test data wasn't loaded.
Traceback (most recent call last):
File "/usr/lib/python3.9/multiprocessing/managers.py", line 802, in _callmethod
conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/src/ch/clickhouse/.cmake/../tests/clickhouse-test", line 1462, in <module>
main(args)
File "/src/ch/clickhouse/.cmake/../tests/clickhouse-test", line 1261, in main
if len(restarted_tests) > 0:
File "<string>", line 2, in __len__
File "/usr/lib/python3.9/multiprocessing/managers.py", line 806, in _callmethod
self._connect()
File "/usr/lib/python3.9/multiprocessing/managers.py", line 793, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib/python3.9/multiprocessing/connection.py", line 507, in Client
c = SocketClient(address)
File "/usr/lib/python3.9/multiprocessing/connection.py", line 635, in SocketClient
s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused
The reason behind this is that manager's thread got terminated:
ipdb> p restarted_tests._manager._process
<ForkProcess name='SyncManager-1' pid=25125 parent=24939 stopped exitcode=-SIGTERM>
Refs: #29259 (cc: @vdimir)
Follow-up for: #29197 (cc: @tavplubix)
This should fix the following SIGSEGV, that was found on CI [1]:
<Fatal> BaseDaemon: Address: NULL pointer. Access: read. Unknown si_code.
<Fatal> BaseDaemon: 4.4. inlined from ../contrib/rocksdb/utilities/object_registry.cc:19: rocksdb::ObjectLibrary::FindEntry() const
...
<Fatal> BaseDaemon: 7.3. inlined from ../contrib/rocksdb/options/cf_options.cc:678: rocksdb::$_7::operator()()
[1]: https://clickhouse-test-reports.s3.yandex.net/29341/2b2bec3679df7965af908ce3f1e8e17e39bd12fe/integration_tests_flaky_check_(asan).html#fail1
And also I checked manually with TSan binary, and here is a data race
reported by TSan:
WARNING: ThreadSanitizer: data race (pid=3356)
Read of size 8 at 0x7b0c0008cca8 by thread T40:
2 rocksdb::ObjectLibrary::FindEntry() const obj-x86_64-linux-gnu/../contrib/rocksdb/utilities/object_registry.cc:18:27 (clickhouse-tsan+0x1b839a6c)
...
6 rocksdb::$_7::operator()() const obj-x86_64-linux-gnu/../contrib/rocksdb/options/cf_options.cc:676:32 (clickhouse-tsan+0x1b6bfa63)
...
28 rocksdb::GetColumnFamilyOptionsFromMap() obj-x86_64-linux-gnu/../contrib/rocksdb/options/options_helper.cc:727:10 (clickhouse-tsan+0x1b6fffd2)
29 DB::StorageEmbeddedRocksDB::initDb() obj-x86_64-linux-gnu/../src/Storages/RocksDB/StorageEmbeddedRocksDB.cpp:359:26 (clickhouse-tsan+0x14195e31)
...
Previous write of size 8 at 0x7b0c0008cca8 by thread T41:
...
9 rocksdb::ObjectLibrary::AddEntry() obj-x86_64-linux-gnu/../contrib/rocksdb/utilities/object_registry.cc:31:19 (clickhouse-tsan+0x1b8392fc)
...
11 rocksdb::RegisterTableFactories()::$_0::operator()() const obj-x86_64-linux-gnu/../contrib/rocksdb/table/table_factory.cc:23:14 (clickhouse-tsan+0x1b7ea94c)
...
43 rocksdb::GetColumnFamilyOptionsFromMap() obj-x86_64-linux-gnu/../contrib/rocksdb/options/options_helper.cc:727:10 (clickhouse-tsan+0x1b6fffd2)
44 DB::StorageEmbeddedRocksDB::initDb() obj-x86_64-linux-gnu/../src/Storages/RocksDB/StorageEmbeddedRocksDB.cpp:359:26 (clickhouse-tsan+0x14195e31)
Refs: https://github.com/ClickHouse-Extras/rocksdb/pull/13Fixes: #29341
If ReplicatedAccessStorage startup was not executing or if it
failed before completing (for instance when ZooKeeper was not configured),
its destructor would call shutdown and try to join a missing thread.