There was a race condition when issuing a DDL query on a replica just
after a new replica was added.
If the DDL query is issued after the new replica adds itself to the
list of replicas, but before the new replica has finished its
recovery, then the first replica adds the new replica to the set of
replicas it waits on to confirm that the query was replicated.
Meanwhile, the new replica is still in recovery and applies queries
from the /metadata snapshot. When it's done, it bumps its log_ptr
without marking the corresponding log entries (if any) as finished.
The first replica then waits until distributed_ddl_task_timeout
expires and wrongly assumes the query was not replicated.
The issue is fixed by remembering the max_log_ptr at the exact point
where the replica adds itself to the list of replicas, and then, during
recovery, marking as finished all log entries that fall between that
max_log_ptr and the max_log_ptr of the metadata snapshot used for
recovery.
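A minimal C++ sketch of the idea follows; the helper names here
(getMaxLogPtr, registerReplica, applyMetadataSnapshot,
markLogEntryFinished, setLogPtr) are hypothetical stand-ins for the
real DatabaseReplicated/ZooKeeper plumbing, not the actual code:

    #include <cstdint>

    uint32_t getMaxLogPtr();                  // hypothetical: read /max_log_ptr from ZooKeeper
    void registerReplica();                   // hypothetical: add this replica under /replicas
    uint32_t applyMetadataSnapshot();         // hypothetical: apply /metadata, return its max_log_ptr
    void markLogEntryFinished(uint32_t ptr);  // hypothetical: ack a log entry for this replica
    void setLogPtr(uint32_t ptr);             // hypothetical: bump this replica's log_ptr

    static uint32_t max_log_ptr_at_creation = 0;

    void createReplicaNodesInZooKeeper()
    {
        // Remember the log pointer at the exact moment this replica becomes
        // visible: any DDL entry added after this point will be waited on by
        // its initiator, so it must eventually be marked as finished.
        max_log_ptr_at_creation = getMaxLogPtr();
        registerReplica();
    }

    void recoverLostReplica()
    {
        // The snapshot already reflects every entry up to its own max_log_ptr,
        // so those entries are applied implicitly and never replayed one by one.
        uint32_t snapshot_max_log_ptr = applyMetadataSnapshot();

        // Entries created after this replica registered but already covered by
        // the snapshot would otherwise never be confirmed, and the initiator
        // would wait until distributed_ddl_task_timeout expires.
        for (uint32_t ptr = max_log_ptr_at_creation + 1; ptr <= snapshot_max_log_ptr; ++ptr)
            markLogEntryFinished(ptr);

        setLogPtr(snapshot_max_log_ptr);
    }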
The bug was randomly observed during a downstream test. It can be
reproduced more easily by inserting a sleep of a few seconds at the
end of createReplicaNodesInZooKeeper, enough to have time to issue a
DDL query on the first replica.
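For instance (an illustrative snippet, not part of the fix, using the
standard library rather than ClickHouse's own sleep helpers), something
like this at the end of createReplicaNodesInZooKeeper widens the race
window:

    #include <chrono>
    #include <thread>

    // Illustrative only: keep the replica registered but unrecovered for a
    // few seconds, leaving time to issue a DDL query on the first replica.
    std::this_thread::sleep_for(std::chrono::seconds(5));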
10 seconds is not enough [1]:
2010_lc_native: [ FAIL ] 15.53 sec. - having stderror:
Traceback (most recent call last):
  File "/usr/share/clickhouse-test/queries/0_stateless/2010_lc_native.python", line 302, in <module>
    main()
  File "/usr/share/clickhouse-test/queries/0_stateless/2010_lc_native.python", line 298, in main
    insertLowCardinalityRowWithIndexOverflow()
  File "/usr/share/clickhouse-test/queries/0_stateless/2010_lc_native.python", line 259, in insertLowCardinalityRowWithIndexOverflow
    readException(s)
  File "/usr/share/clickhouse-test/queries/0_stateless/2010_lc_native.python", line 185, in readException
    assert(readVarUInt(s) == 2)
  File "/usr/share/clickhouse-test/queries/0_stateless/2010_lc_native.python", line 65, in readVarUInt
    byte = readStrict(s)[0]
  File "/usr/share/clickhouse-test/queries/0_stateless/2010_lc_native.python", line 34, in readStrict
    cur = s.recv(size)
socket.timeout: timed out
[1]: https://clickhouse-test-reports.s3.yandex.net/29060/7f1160c466a968c680d1bc73e0ad5e9419d0b6d0/functional_stateless_tests_(debug).html#fail1
Server log:
2021.09.17 21:02:53.734388 [ 682 ] {123456} <Trace> ContextAccess (default): Access granted: INSERT(x) ON test_0tmdh4.tab
2021.09.17 21:03:04.835044 [ 682 ] {123456} <Error> executeQuery: Code: 117. DB::Exception: Index for LowCardinality is out of range. Dictionary size is 1, but found index with value 72057594037927936. (INCORRECT_DATA) (version 21.11.1.8109) (from [::ffff:127.0.0.1]:59512) (in query: insert into test_0tmdh4.tab format TSV settings input_format_defaults_for_omitted_fields=0), Stack trace (when copying this message, always include the lines below):
And according to trace_log, the problem is slow stack unwinding:
2021-09-17 21:02:55.103893 libunwind::LocalAddressSpace::get32
2021-09-17 21:02:56.151960 libunwind::LocalAddressSpace::get8
2021-09-17 21:03:03.168447 libunwind::LocalAddressSpace::getULEB128
* ALTER for materialized views and live views (en)
* Translated to Russian. Fixes.
* Fixes
* Update docs/en/sql-reference/statements/alter/view.md
Co-authored-by: Ivan Blinkov <github@blinkov.ru>