Add distributed_directory_monitor_split_batch_on_failure setting (OFF by
default), that will split the batch and send files one by one in case of
retriable errors.
v2: more error codes
Under use_compact_format_in_distributed_parts_names=1 and
internal_replication=true the server encodes all replicas for the
directory name for async INSERT into Distributed, and the directory name
looks like:
shard1_replica1,shard1_replica2,shard3_replica3
This is required for creating connections (to specific replicas only),
but in case of internal_replication=true, this can be avoided, since
this path will always includes all replicas.
This patch replaces all replicas with "_all_replicas" marker.
Note, that initial problem was that this path may overflow the NAME_MAX
if you will have more then 15 replicas, and the server will fail to
create the directory.
Also note, that changed directory name should not be a problem, since:
- empty directories will be removed since #16729
- and replicas encoded in the directory name is also supported anyway.
* initial commit: add setting and stub
* typo
* added test stub
* fix
* wip merging new integration test and code proto
* adding steps interpreters
* adding firstly proposed solution (moving parts etc)
* added checking zookeeper path existence
* fixing the include
* fixing and sorting includes
* fixing outdated struct
* fix the name
* added ast ptr as level of indirection
* fix ref
* updating the changes
* working on test stub
* fix iterator -> reference
* revert rocksdb submodule update
* fixed show privileges test
* updated the test stub
* replaced rand() with thread_local_rng(), updated the tests
updated the test
fixed test config path
test fix
removed error messages
fixed the test
updated the test
fixed string literal
fixed literal
typo: =
* fixed the empty replica error message
* updated the test and the code with logs
* updated the possible test cases, updated
* added the code/test milestone comments
* updated the test (added more testcases)
* replaced native assert with CH one
* individual replicas recursive delete fix
* updated the AS db.name AST
* two small logging fixes
* manually generated AST fixes
* Updated the test, added the possible algo change
* Some thoughts about optimizing the solution:
ALTER MOVE PARTITION .. TO TABLE -> move to detached/ + ALTER ... ATTACH
* fix
* Removed the replica sync in test as it's invalid
* Some test tweaks
* tmp
* Rewrote the algo by using the executeQuery instead of
hand-crafting the ASTPtr.
Two questions still active.
* tr: logging active parts
* Extracted the parts moving algo into a separate helper function
* Fixed the test data and the queries slightly
* Replaced query to system.parts to direct invocation,
started building the test that breaks on various parts.
* Added the case for tables when at least one replica is alive
* Updated the test to test replicas restoration by detaching/attaching
* Altered the test to check restoration without replica restart
* Added the tables swap in the start if the server failed last time
* Hotfix when only /replicas/replica... path was deleted
* Restore ZK paths while creating a replicated MergeTree table
* Updated the docs, fixed the algo for individual replicas restoration case
* Initial parts table storage fix, tests sync fix
* Reverted individual replica restoration to general algo
* Slightly optimised getDataParts
* Trying another solution with parts detaching
* Rewrote algo without any steps, added ON CLUSTER support
* Attaching parts from other replica on restoration
* Getting part checksums from ZK
* Removed ON CLUSTER, finished working solution
* Multiple small changes after review
* Fixing parallel test
* Supporting rewritten form on cluster
* Test fix
* Moar logging
* Using source replica as checksum provider
* improve test, remove some code from parser
* Trying solution with move to detached + forget
* Moving all parts (not only Committed) to detached
* Edited docs for RESTORE REPLICA
* Re-merging
* minor fixes
Co-authored-by: Alexander Tokmakov <avtokmakov@yandex-team.ru>
ASAN report:
=================================================================
==7686==ERROR: AddressSanitizer: container-overflow on address 0x6200000bf080 at pc 0x00002a787e79 bp 0x7fffffffa2f0 sp 0x7fffffffa2e8
READ of size 4 at 0x6200000bf080 thread T0
0 0x2a787e78 in replxx::calculate_displayed_length(char32_t const*, int) obj-x86_64-linux-gnu/../contrib/replxx/src/util.cxx:66:15
1 0x2a75786c in replxx::Replxx::ReplxxImpl::dynamicRefresh(replxx::Prompt&, char32_t*, int, int) obj-x86_64-linux-gnu/../contrib/replxx/src/replxx_impl.cxx:2201:3
2 0x2a7453f0 in replxx::Replxx::ReplxxImpl::incremental_history_search(char32_t) obj-x86_64-linux-gnu/../contrib/replxx/src/replxx_impl.cxx:2008:3
3 0x2a73eecc in replxx::Replxx::ReplxxImpl::action(unsigned long long, replxx::Replxx::ACTION_RESULT (replxx::Replxx::ReplxxImpl::* const&)(char32_t), char32_t) obj-x86_64-linux-gnu/../contrib/replxx/src/replxx_impl.cxx:1246:29
4 0x2a73eecc in replxx::Replxx::ReplxxImpl::invoke(replxx::Replxx::ACTION, char32_t) obj-x86_64-linux-gnu/../contrib/replxx/src/replxx_impl.cxx:318:70
5 0x2a74ed29 in std::__1::__function::__policy_func<replxx::Replxx::ACTION_RESULT (char32_t)>::operator()(char32_t&&) const obj-x86_64-linux-gnu/../contrib/libcxx/include/functional:2221:16
6 0x2a74ed29 in std::__1::function<replxx::Replxx::ACTION_RESULT (char32_t)>::operator()(char32_t) const obj-x86_64-linux-gnu/../contrib/libcxx/include/functional:2560:12
7 0x2a74ed29 in replxx::Replxx::ReplxxImpl::get_input_line() obj-x86_64-linux-gnu/../contrib/replxx/src/replxx_impl.cxx🔢11
8 0x2a74dd3c in replxx::Replxx::ReplxxImpl::input(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) obj-x86_64-linux-gnu/../contrib/replxx/src/replxx_impl.cxx:580:8
9 0x2a2a4075 in ReplxxLineReader::readOneLine(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) obj-x86_64-linux-gnu/../base/common/ReplxxLineReader.cpp:112:29
10 0x2a29b499 in LineReader::readLine(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) obj-x86_64-linux-gnu/../base/common/LineReader.cpp:81:26
11 0xb580f02 in DB::Client::mainImpl() obj-x86_64-linux-gnu/../programs/client/Client.cpp:665:33
12 0xb575825 in DB::Client::main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) obj-x86_64-linux-gnu/../programs/client/Client.cpp:300:20
13 0x2a3aff25 in Poco::Util::Application::run() obj-x86_64-linux-gnu/../contrib/poco/Util/src/Application.cpp:334:8
14 0xb54c810 in mainEntryClickHouseClient(int, char**) obj-x86_64-linux-gnu/../programs/client/Client.cpp:2702:23
15 0xb326d8a in main obj-x86_64-linux-gnu/../programs/main.cpp:360:12
16 0x7ffff7dcbb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
17 0xb2794ad in _start (/src/ch/tmp/upstream/clickhouse-asan+0xb2794ad)
0x6200000bf080 is located 0 bytes inside of 3672-byte region [0x6200000bf080,0x6200000bfed8)
allocated by thread T0 here:
0 0xb3231dd in operator new(unsigned long) (/src/ch/tmp/upstream/clickhouse-asan+0xb3231dd)
1 0x2a75fb15 in void* std::__1::__libcpp_operator_new<unsigned long>(unsigned long) obj-x86_64-linux-gnu/../contrib/libcxx/include/new:235:10
2 0x2a75fb15 in std::__1::__libcpp_allocate(unsigned long, unsigned long) obj-x86_64-linux-gnu/../contrib/libcxx/include/new:261:10
3 0x2a75fb15 in std::__1::allocator<char32_t>::allocate(unsigned long) obj-x86_64-linux-gnu/../contrib/libcxx/include/memory:840:38
4 0x2a75fb15 in std::__1::allocator_traits<std::__1::allocator<char32_t> >::allocate(std::__1::allocator<char32_t>&, unsigned long) obj-x86_64-linux-gnu/../contrib/libcxx/include/__memory/allocator_traits.h:468:21
5 0x2a75fb15 in std::__1::vector<char32_t, std::__1::allocator<char32_t> >::__vallocate(unsigned long) obj-x86_64-linux-gnu/../contrib/libcxx/include/vector:993:37
6 0x2a75fb15 in std::__1::enable_if<(__is_cpp17_forward_iterator<char32_t*>::value) && (is_constructible<char32_t, std::__1::iterator_traits<char32_t*>::reference>::value), void>::type std::__1::vector<char32_t, std::__1::allocator<char32_t> >::assign<char32_t*>(char32_t*, char32_t*) obj-x86_64-linux-gnu/../contrib/libcxx/include/vector:1460:9
7 0x2a745242 in std::__1::vector<char32_t, std::__1::allocator<char32_t> >::operator=(std::__1::vector<char32_t, std::__1::allocator<char32_t> > const&) obj-x86_64-linux-gnu/../contrib/libcxx/include/vector:1405:9
8 0x2a745242 in replxx::UnicodeString::assign(replxx::UnicodeString const&) obj-x86_64-linux-gnu/../contrib/replxx/src/unicodestring.hxx:83:9
9 0x2a745242 in replxx::Replxx::ReplxxImpl::incremental_history_search(char32_t) obj-x86_64-linux-gnu/../contrib/replxx/src/replxx_impl.cxx:1993:24
10 0x2a73eecc in replxx::Replxx::ReplxxImpl::action(unsigned long long, replxx::Replxx::ACTION_RESULT (replxx::Replxx::ReplxxImpl::* const&)(char32_t), char32_t) obj-x86_64-linux-gnu/../contrib/replxx/src/replxx_impl.cxx:1246:29
11 0x2a73eecc in replxx::Replxx::ReplxxImpl::invoke(replxx::Replxx::ACTION, char32_t) obj-x86_64-linux-gnu/../contrib/replxx/src/replxx_impl.cxx:318:70
12 0x2a74ed29 in std::__1::__function::__policy_func<replxx::Replxx::ACTION_RESULT (char32_t)>::operator()(char32_t&&) const obj-x86_64-linux-gnu/../contrib/libcxx/include/functional:2221:16
13 0x2a74ed29 in std::__1::function<replxx::Replxx::ACTION_RESULT (char32_t)>::operator()(char32_t) const obj-x86_64-linux-gnu/../contrib/libcxx/include/functional:2560:12
14 0x2a74ed29 in replxx::Replxx::ReplxxImpl::get_input_line() obj-x86_64-linux-gnu/../contrib/replxx/src/replxx_impl.cxx🔢11
15 0x2a74dd3c in replxx::Replxx::ReplxxImpl::input(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) obj-x86_64-linux-gnu/../contrib/replxx/src/replxx_impl.cxx:580:8
16 0x2a2a4075 in ReplxxLineReader::readOneLine(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) obj-x86_64-linux-gnu/../base/common/ReplxxLineReader.cpp:112:29
17 0x2a29b499 in LineReader::readLine(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) obj-x86_64-linux-gnu/../base/common/LineReader.cpp:81:26
18 0xb580f02 in DB::Client::mainImpl() obj-x86_64-linux-gnu/../programs/client/Client.cpp:665:33
19 0xb575825 in DB::Client::main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) obj-x86_64-linux-gnu/../programs/client/Client.cpp:300:20
20 0x2a3aff25 in Poco::Util::Application::run() obj-x86_64-linux-gnu/../contrib/poco/Util/src/Application.cpp:334:8
21 0xb54c810 in mainEntryClickHouseClient(int, char**) obj-x86_64-linux-gnu/../programs/client/Client.cpp:2702:23
22 0xb326d8a in main obj-x86_64-linux-gnu/../programs/main.cpp:360:12
23 0x7ffff7dcbb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
HINT: if you don't care about these errors you may set ASAN_OPTIONS=detect_container_overflow=0.
If you suspect a false positive see also: https://github.com/google/sanitizers/wiki/AddressSanitizerContainerOverflow.
SUMMARY: AddressSanitizer: container-overflow obj-x86_64-linux-gnu/../contrib/replxx/src/util.cxx:66:15 in replxx::calculate_displayed_length(char32_t const*, int)
Shadow bytes around the buggy address:
0x0c408000fdc0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c408000fdd0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c408000fde0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c408000fdf0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c408000fe00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c408000fe10:[fc]fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
0x0c408000fe20: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
0x0c408000fe30: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
0x0c408000fe40: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
0x0c408000fe50: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
0x0c408000fe60: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==7686==ABORTING
Refs: https://github.com/ClickHouse-Extras/replxx/pull/16
v2: fix test, do not use /dev/null since it client will lock it