This setting, when enabled (it is disabled by default), allows ClickHouse
to silently skip unavailable shards of a Distributed table during query
execution instead of throwing an exception to the client.
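A minimal sketch of the two behaviours, assuming a simplified model in which each shard either answers or is unreachable. `Shard`, `queryShards()` and the `skip_unavailable` flag are hypothetical names for illustration, not ClickHouse's API:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of a shard: either reachable (and returns rows) or not.
struct Shard
{
    std::string name;
    bool available;
    std::vector<int> rows;
};

// Collects rows from all shards. When skip_unavailable is false (the
// default), the first unreachable shard aborts the query with an exception;
// when it is true, unreachable shards are silently skipped.
std::vector<int> queryShards(const std::vector<Shard> & shards, bool skip_unavailable = false)
{
    std::vector<int> result;
    for (const auto & shard : shards)
    {
        if (!shard.available)
        {
            if (skip_unavailable)
                continue;
            throw std::runtime_error("shard " + shard.name + " is unavailable");
        }
        result.insert(result.end(), shard.rows.begin(), shard.rows.end());
    }
    return result;
}
```

The trade-off is visible directly: with skipping enabled the client gets a partial result without any error.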
When restoring from current_batch.txt, it is possible that some files
from the batch no longer exist. The fix submitted in #49884 was
incomplete: it still fails later in markAsSend(), because that function
tries to obtain the file size:
2023.12.04 05:43:12.676658 [ 5006 ] {} <Error> dist.DirectoryMonitor.work4: std::exception. Code: 1001, type: std::__1::__fs::filesystem::filesystem_error, e.what() = filesystem error: in file_size: No such file or directory ["/work4/clickhouse/data/dist/shard8_all_replicas//150426396.bin"], Stack trace (when copying this message, always include the lines below):
0. ./.build/./contrib/llvm-project/libcxx/include/exception:134: std::runtime_error::runtime_error(String const&) @ 0x00000000177e83f4 in /usr/lib/debug/usr/bin/clickhouse.debug
1. ./.build/./contrib/llvm-project/libcxx/include/string:1499: std::system_error::system_error(std::error_code, String const&) @ 0x00000000177f0fd5 in /usr/lib/debug/usr/bin/clickhouse.debug
2. ./.build/./contrib/llvm-project/libcxx/include/__filesystem/filesystem_error.h:42: std::__fs::filesystem::filesystem_error::filesystem_error[abi:v15000](String const&, std::__fs::filesystem::path const&, std::error_code) @ 0x000000000b844ca1 in /usr/lib/debug/usr/bin/clickhouse.debug
3. ./.build/./contrib/llvm-project/libcxx/include/__filesystem/filesystem_error.h:90: void std::__fs::filesystem::__throw_filesystem_error[abi:v15000]<String&, std::__fs::filesystem::path const&, std::error_code const&>(String&, std::__fs::filesystem::path const&, std::error_code const&) @ 0x000000001778f953 in /usr/lib/debug/usr/bin/clickhouse.debug
4. ./.build/./contrib/llvm-project/libcxx/src/filesystem/filesystem_common.h:0: std::__fs::filesystem::detail::(anonymous namespace)::ErrorHandler<unsigned long>::report(std::error_code const&) const @ 0x0000000017793ef7 in /usr/lib/debug/usr/bin/clickhouse.debug
5. ./.build/./contrib/llvm-project/libcxx/src/filesystem/operations.cpp:0: std::__fs::filesystem::__file_size(std::__fs::filesystem::path const&, std::error_code*) @ 0x0000000017793e26 in /usr/lib/debug/usr/bin/clickhouse.debug
6. ./.build/./src/Storages/Distributed/DistributedAsyncInsertDirectoryQueue.cpp:707: DB::DistributedAsyncInsertDirectoryQueue::markAsSend(String const&) @ 0x0000000011cd92c5 in /usr/lib/debug/usr/bin/clickhouse.debug
7. ./.build/./contrib/llvm-project/libcxx/include/__iterator/wrap_iter.h:100: DB::DistributedAsyncInsertBatch::send() @ 0x0000000011cdd81c in /usr/lib/debug/usr/bin/clickhouse.debug
8. ./.build/./src/Storages/Distributed/DistributedAsyncInsertDirectoryQueue.cpp:0: DB::DistributedAsyncInsertDirectoryQueue::processFilesWithBatching() @ 0x0000000011cd5054 in /usr/lib/debug/usr/bin/clickhouse.debug
9. ./.build/./src/Storages/Distributed/DistributedAsyncInsertDirectoryQueue.cpp:417: DB::DistributedAsyncInsertDirectoryQueue::processFiles() @ 0x0000000011cd3440 in /usr/lib/debug/usr/bin/clickhouse.debug
10. ./.build/./src/Storages/Distributed/DistributedAsyncInsertDirectoryQueue.cpp:0: DB::DistributedAsyncInsertDirectoryQueue::run() @ 0x0000000011cd3878 in /usr/lib/debug/usr/bin/clickhouse.debug
11. ./.build/./contrib/llvm-project/libcxx/include/__functional/function.h:0: DB::BackgroundSchedulePoolTaskInfo::execute() @ 0x00000000103dbc34 in /usr/lib/debug/usr/bin/clickhouse.debug
12. ./.build/./contrib/llvm-project/libcxx/include/__memory/shared_ptr.h:701: DB::BackgroundSchedulePool::threadFunction() @ 0x00000000103de1b6 in /usr/lib/debug/usr/bin/clickhouse.debug
13. ./.build/./src/Core/BackgroundSchedulePool.cpp:0: void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false>::ThreadFromGlobalPoolImpl<DB::BackgroundSchedulePool::BackgroundSchedulePool(unsigned long, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, char const*)::$_0>(DB::BackgroundSchedulePool::BackgroundSchedulePool(unsigned long, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, char const*)::$_0&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x00000000103de7d1 in /usr/lib/debug/usr/bin/clickhouse.debug
14. ./.build/./base/base/../base/wide_integer_impl.h:809: ThreadPoolImpl<std::thread>::worker(std::__list_iterator<std::thread, void*>) @ 0x000000000b8c5502 in /usr/lib/debug/usr/bin/clickhouse.debug
15. ./.build/./contrib/llvm-project/libcxx/include/__memory/unique_ptr.h:302: void* std::__thread_proxy[abi:v15000]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void ThreadPoolImpl<std::thread>::scheduleImpl<void>(std::function<void ()>, Priority, std::optional<unsigned long>, bool)::'lambda0'()>>(void*) @ 0x000000000b8c936e in /usr/lib/debug/usr/bin/clickhouse.debug
16. ? @ 0x00007f1be8b30fd4 in ?
17. ? @ 0x00007f1be8bb15bc in ?
Instead of ignoring errors, DistributedAsyncInsertBatch::valid() has
been added. It should be called after the files have been read from
current_batch.txt. If the batch is not valid (some of its files do not
exist), there is no point in trying to send the same batch again, so
current_batch.txt is ignored and the files are processed in the regular
order.
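The validity check can be sketched as follows, assuming current_batch.txt is a plain list of .bin paths, one per line. `readBatch()` and `batchIsValid()` are hypothetical helpers, not the actual `DistributedAsyncInsertBatch` code:

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Reads the list of files recorded in the batch file (one path per line).
std::vector<fs::path> readBatch(const fs::path & batch_file)
{
    std::vector<fs::path> files;
    std::ifstream in(batch_file);
    for (std::string line; std::getline(in, line);)
        if (!line.empty())
            files.emplace_back(line);
    return files;
}

// A restored batch is only worth re-sending if every file in it still exists.
// If one is missing (e.g. it was unlinked right before a crash), the caller
// should drop the batch and fall back to regular processing, instead of
// failing later when it tries to get the size of the missing file.
bool batchIsValid(const std::vector<fs::path> & files)
{
    for (const auto & file : files)
        if (!fs::exists(file))
            return false;
    return true;
}
```

Checking up front keeps the failure out of the send path, where a missing file would otherwise surface as the `file_size` exception shown in the trace.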
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Limit log frequency for "Skipping send data over distributed table" message
After SYSTEM STOP DISTRIBUTED SENDS, this message is printed constantly.
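Limiting the frequency of a repeated message can be done with a simple time-based throttle. The sketch below is a hypothetical illustration (the class name and interval are assumptions), not ClickHouse's own logging helper:

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Lets a message through at most once per `interval`; calls in between are
// suppressed. The current time is passed in explicitly to keep it testable.
class LogThrottle
{
public:
    using Clock = std::chrono::steady_clock;

    explicit LogThrottle(Clock::duration interval_) : interval(interval_) {}

    // Returns true if the caller should emit the message now.
    bool shouldLog(Clock::time_point now)
    {
        if (last && now - *last < interval)
            return false;
        last = now;
        return true;
    }

private:
    Clock::duration interval;
    std::optional<Clock::time_point> last;
};
```

A caller that checks "sends are stopped" in a loop would log only when `shouldLog(LogThrottle::Clock::now())` returns true.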
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Rename directory monitor concept to async INSERT
Rename the following query settings (preserving backward compatibility
by keeping the old name as an alias):
- distributed_directory_monitor_sleep_time_ms -> distributed_async_insert_sleep_time_ms
- distributed_directory_monitor_max_sleep_time_ms -> distributed_async_insert_max_sleep_time_ms
- distributed_directory_monitor_batch_inserts -> distributed_async_insert_batch
- distributed_directory_monitor_split_batch_on_failure -> distributed_async_insert_split_batch_on_failure
Rename the following table settings (preserving backward compatibility
by keeping the old name as an alias):
- monitor_batch_inserts -> async_insert_batch
- monitor_split_batch_on_failure -> async_insert_split_batch_on_failure
- directory_monitor_sleep_time_ms -> async_insert_sleep_time_ms
- directory_monitor_max_sleep_time_ms -> async_insert_max_sleep_time_ms
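Keeping an old name as an alias amounts to resolving it to the canonical name before lookup. A hypothetical sketch using a few of the renames listed above; `canonicalSettingName()` is an assumption, and ClickHouse's settings machinery may declare aliases differently:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Maps deprecated setting names to their new names (a subset of the renames).
std::string canonicalSettingName(const std::string & name)
{
    static const std::unordered_map<std::string, std::string> aliases = {
        {"distributed_directory_monitor_sleep_time_ms", "distributed_async_insert_sleep_time_ms"},
        {"distributed_directory_monitor_max_sleep_time_ms", "distributed_async_insert_max_sleep_time_ms"},
        {"monitor_batch_inserts", "async_insert_batch"},
        {"monitor_split_batch_on_failure", "async_insert_split_batch_on_failure"},
    };
    auto it = aliases.find(name);
    // Names that are already new (or unknown) are returned unchanged.
    return it == aliases.end() ? name : it->second;
}
```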
And also update all the references:
$ gg -e directory_monitor_ -e monitor_ tests docs | cut -d: -f1 | sort -u | xargs sed -e 's/distributed_directory_monitor_sleep_time_ms/distributed_async_insert_sleep_time_ms/g' -e 's/distributed_directory_monitor_max_sleep_time_ms/distributed_async_insert_max_sleep_time_ms/g' -e 's/distributed_directory_monitor_batch_inserts/distributed_async_insert_batch/g' -e 's/distributed_directory_monitor_split_batch_on_failure/distributed_async_insert_split_batch_on_failure/g' -e 's/monitor_batch_inserts/async_insert_batch/g' -e 's/monitor_split_batch_on_failure/async_insert_split_batch_on_failure/g' -e 's/monitor_sleep_time_ms/async_insert_sleep_time_ms/g' -e 's/monitor_max_sleep_time_ms/async_insert_max_sleep_time_ms/g' -i
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Rename async_insert for Distributed to background_insert
This avoids ambiguity between general async INSERTs and INSERTs into
Distributed tables, which are indeed performed in the background, so the
new term describes them even better.
Mostly done with:
$ git di HEAD^ --name-only | xargs sed -i -e 's/distributed_async_insert/distributed_background_insert/g' -e 's/async_insert_batch/background_insert_batch/g' -e 's/async_insert_split_batch_on_failure/background_insert_split_batch_on_failure/g' -e 's/async_insert_sleep_time_ms/background_insert_sleep_time_ms/g' -e 's/async_insert_max_sleep_time_ms/background_insert_max_sleep_time_ms/g'
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Mark 02417_opentelemetry_insert_on_distributed_table as long
CI: https://s3.amazonaws.com/clickhouse-test-reports/55978/7a6abb03a0b507e29e999cb7e04f246a119c6f28/stateless_tests_flaky_check__asan_.html
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
---------
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Sometimes there can be tons of pending data, e.g. a few TiB, and sending
it on server shutdown does not look sane (maybe there is a bug and you
need to update/restart to fix flushing).
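A hypothetical sketch of making the flush on shutdown optional, so that pending files stay on disk and are sent after the next start. `PendingQueue` and `flush_on_shutdown` are illustrative names, not necessarily what the patch adds:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical queue of pending files waiting to be sent to remote shards.
struct PendingQueue
{
    std::vector<std::string> pending;
    std::vector<std::string> sent;

    // On shutdown, either drain the queue (potentially TiBs of data) or leave
    // the files in place so they are sent after the server starts again.
    void shutdown(bool flush_on_shutdown)
    {
        if (!flush_on_shutdown)
            return;
        for (const auto & file : pending)
            sent.push_back(file); // stands in for actually sending the file
        pending.clear();
    }
};
```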
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Previously, you could see the following (note the empty file name):
2023.07.25 09:21:39.705559 [ 692 ] {6b5e1299-1b64-4dbb-b25d-45e10027db22} <Trace> test_hkt5nnqj.dist_opentelemetry.DirectoryMonitor.default: Finished processing `` (took 37 ms)
This was because file_path and current_file were references to the same
variable in DistributedAsyncInsertDirectoryQueue::processFile().
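The bug pattern reproduces in isolation: a reference bound to a string that is cleared before the reference is logged. A hypothetical reduction (function names are made up), together with one obvious fix of copying the name before it is reset:

```cpp
#include <cassert>
#include <string>

// Buggy: file_path aliases current_file, so clearing current_file also
// empties file_path, and the log line becomes "Finished processing ``".
std::string finishBuggy(std::string & current_file)
{
    const std::string & file_path = current_file; // same object, not a copy
    current_file.clear();
    return "Finished processing `" + file_path + "`";
}

// Fixed: take a copy of the name before resetting the shared variable.
std::string finishFixed(std::string & current_file)
{
    const std::string file_path = current_file; // independent copy
    current_file.clear();
    return "Finished processing `" + file_path + "`";
}
```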
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
After an abnormal server restart, current_batch.txt (which contains the
list of files to send to the remote shard) may list files that no longer
exist, if the server was terminated between unlinking the .bin files and
truncating current_batch.txt.
This should be fixed in a more reliable way, but to keep the patch easy
to backport I kept it simple.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
After the refactoring in #45491 this behaviour changed, and
01555_system_distribution_queue_mask became flaky: if the flush had not
happened before SYSTEM FLUSH DISTRIBUTED, the error was not saved into
system.distribution_queue.
v2: Improve 01555_system_distribution_queue_mask test
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Previously it was initialized from disk only on startup, but if some
INSERT created the object before that, it would never be initialized.
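One way to stop depending on who touches the object first is to initialize it lazily and exactly once under a lock, whichever path (startup or an INSERT) gets there first. A sketch under that assumption; `QueueStatus` and `initFromDisk()` are hypothetical stand-ins, and this is not necessarily how the patch does it:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical per-queue state whose counters must be loaded from disk.
class QueueStatus
{
public:
    // Called both on startup and from INSERT; whichever comes first loads the
    // on-disk state, later calls are no-ops.
    void ensureInitialized()
    {
        std::lock_guard<std::mutex> lock(mutex);
        if (initialized)
            return;
        initFromDisk();
        initialized = true;
    }

    std::size_t filesCount()
    {
        ensureInitialized();
        std::lock_guard<std::mutex> lock(mutex);
        return files_count;
    }

    std::size_t init_calls = 0;

private:
    void initFromDisk()
    {
        ++init_calls;
        files_count = 3; // stands in for scanning the queue directory
    }

    std::mutex mutex;
    bool initialized = false;
    std::size_t files_count = 0;
};
```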
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Since #44922 it is not a directory monitor anymore.
v2: Remove unused error codes
v3: Contains some header fixes due to conflicts with master
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>