Recently (#32094) test database had been overlapped, and random prefix
for database had been increased from 6 to 8.
But actually 6 bytes for random prefix should be enough (with existing
alphabet (0-9a-z) it is 36**6=2'176'782'336), and the real reason of
this overlap is that random generator by default uses shared state [1]:
The functions supplied by this module are actually bound methods of
a hidden instance of the random.Random class. You can instantiate your
own instances of Random to get generators that don’t share state.
[1]: https://docs.python.org/3/library/random.html
I've played a little bit with random in python, and using default random
generator it generates non-unique strings pretty fast, just in a few
runs, but using SystemRandom (that uses /dev/urandom) it takes ~1 minute.
Test:
```sh
$ while /tmp/test.py | LANG=c sort -S5G | LANG=c uniq -d | tee /dev/stderr | wc -l | fgrep -q -x -c 0; do :; done
```
```python
#!/usr/bin/env python3
import multiprocessing
import string
import random
def random_str(length=6):
alphabet = string.ascii_lowercase + string.digits
return ''.join(random.SystemRandom().choice(alphabet) for _ in range(length))
def worker(_):
print(random_str())
with multiprocessing.Pool(processes=2) as pool:
pool.map(worker, range(0, int(10e3)))
```
So let's switch to SystemRandom and use 6-byte prefix.
Set --server-check-retries to 90 (and this is ~45 seconds), since right
now sometimes it is not enough [1].
[1]: https://clickhouse-test-reports.s3.yandex.net/30191/0e34a9d550cfe6924fe575871f36c44dd44acdaa/functional_stateless_tests_(memory).html#fail1
And the reason I guess is clickhouse-test had been rewritten to
http.client in #30065, and since now it does not need to execute
clickhouse-client binary, which in debug/sanitizers builds can take also
sometime.
That said that with clickhouse-client for hung check it was not 15
seconds, but more (each clickhouse-client requires 0.6sec with
sanitizers for simple SELECT 1, while w/o 0.1second, also too much
should be optimized)
Cons of clickhouse-driver:
- it is one more extra dependency
- it does not have correct timeouts (only for socket operations, and
this is not the same, so we need to set timeout by ourself)
- it is one more thing which can break (@alesapin)
Pros:
- Using native protocol over executing binaries is always better
- `clickhouse-client` in debug build takes almost a second to execute simple `SELECT 1`
and `clickhouse-test` requires ~5 queries at start (determine some
flags, zk, alive, create database)
Notes:
- `FORMAT Vertical` had been replaced with printing of `pandas.DataFrame`
And after this patch tiny tests work with the speed of the test, and
does not requires +-5 seconds of bootstrapping.
Right now it is possible to get the following error:
Having 20 errors! 0 tests passed. 0 tests skipped. 57.37 s elapsed (MainProcess).
Won't run stateful tests because test data wasn't loaded.
Traceback (most recent call last):
File "/usr/lib/python3.9/multiprocessing/managers.py", line 802, in _callmethod
conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/src/ch/clickhouse/.cmake/../tests/clickhouse-test", line 1462, in <module>
main(args)
File "/src/ch/clickhouse/.cmake/../tests/clickhouse-test", line 1261, in main
if len(restarted_tests) > 0:
File "<string>", line 2, in __len__
File "/usr/lib/python3.9/multiprocessing/managers.py", line 806, in _callmethod
self._connect()
File "/usr/lib/python3.9/multiprocessing/managers.py", line 793, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib/python3.9/multiprocessing/connection.py", line 507, in Client
c = SocketClient(address)
File "/usr/lib/python3.9/multiprocessing/connection.py", line 635, in SocketClient
s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused
The reason behind this is that manager's thread got terminated:
ipdb> p restarted_tests._manager._process
<ForkProcess name='SyncManager-1' pid=25125 parent=24939 stopped exitcode=-SIGTERM>
Refs: #29259 (cc: @vdimir)
Follow-up for: #29197 (cc: @tavplubix)