ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-12-15 02:41:59 +00:00

Author	SHA1	Message	Date
Alexey Milovidov	58396c5546	Merge pull request #57218 from tntnatbry/issue-43666 Issue 43666: Add skip_unavailable_shards as a setting for Distributed table.	2023-12-18 04:48:57 +01:00
Alexey Milovidov	c77183a597	Merge pull request #57480 from azat/dist/async-INSERT-fixes Fix possible distributed sends stuck due to "No such file or directory" (during recovering batch from disk)	2023-12-09 17:11:35 +01:00
Gagan Goel	e547db0a8c	Issue 43666: Add skip_unavailable_shards as a setting for Distributed table. This setting, when enabled (disabled by default), allows ClickHouse to silently skip unavailable shards of a Distributed table during a query execution, instead of throwing an exception to the client.	2023-12-08 15:43:59 -05:00
zhongyuankai	7b0f8d44e8	Make DirectoryMonitor handle cluster node list change (#42826)	2023-12-08 14:41:51 +01:00
Azat Khuzhin	7986fe619a	Introduce DistributedAsyncInsertionFailures - event for async INSERT failures Useful for alerts Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-12-04 16:52:53 +01:00
Azat Khuzhin	604cec475a	Fix possible distributed sends stuck due to "No such file or directory" In case of restoring from current_batch.txt it is possible that the some file from the batch will not be exist, and the fix submitted in #49884 was not complete, since it will fail later in markAsSend() (due to it tries to obtain file size there): 2023.12.04 05:43:12.676658 [ 5006 ] {} <Error> dist.DirectoryMonitor.work4: std::exception. Code: 1001, type: std::__1::__fs::filesystem::filesystem_error, e.what() = filesystem error: in file_size: No such file or directory ["/work4/clickhouse/data/dist/shard8_all_replicas//150426396.bin"], Stack trace (when copying this message, always include the lines below): 0. ./.build/./contrib/llvm-project/libcxx/include/exception:134: std::runtime_error::runtime_error(String const&) @ 0x00000000177e83f4 in /usr/lib/debug/usr/bin/clickhouse.debug 1. ./.build/./contrib/llvm-project/libcxx/include/string:1499: std::system_error::system_error(std::error_code, String const&) @ 0x00000000177f0fd5 in /usr/lib/debug/usr/bin/clickhouse.debug 2. ./.build/./contrib/llvm-project/libcxx/include/__filesystem/filesystem_error.h:42: std::__fs::filesystem::filesystem_error::filesystem_error[abi:v15000](String const&, std::__fs::filesystem::path const&, std::error_code) @ 0x000000000b844ca1 in /usr/lib/debug/usr/bin/clickhouse.debug 3. ./.build/./contrib/llvm-project/libcxx/include/__filesystem/filesystem_error.h:90: void std::__fs::filesystem::__throw_filesystem_error[abi:v15000]<String&, std::__fs::filesystem::path const&, std::error_code const&>(String&, std::__fs::filesystem::path const&, std::error_code const&) @ 0x000000001778f953 in /usr/lib/debug/usr/bin/clickhouse.debug 4. ./.build/./contrib/llvm-project/libcxx/src/filesystem/filesystem_common.h:0: std::__fs::filesystem::detail::(anonymous namespace)::ErrorHandler<unsigned long>::report(std::error_code const&) const @ 0x0000000017793ef7 in /usr/lib/debug/usr/bin/clickhouse.debug 5. 
./.build/./contrib/llvm-project/libcxx/src/filesystem/operations.cpp:0: std::__fs::filesystem::__file_size(std::__fs::filesystem::path const&, std::error_code) @ 0x0000000017793e26 in /usr/lib/debug/usr/bin/clickhouse.debug 6. ./.build/./src/Storages/Distributed/DistributedAsyncInsertDirectoryQueue.cpp:707: DB::DistributedAsyncInsertDirectoryQueue::markAsSend(String const&) @ 0x0000000011cd92c5 in /usr/lib/debug/usr/bin/clickhouse.debug 7. ./.build/./contrib/llvm-project/libcxx/include/__iterator/wrap_iter.h:100: DB::DistributedAsyncInsertBatch::send() @ 0x0000000011cdd81c in /usr/lib/debug/usr/bin/clickhouse.debug 8. ./.build/./src/Storages/Distributed/DistributedAsyncInsertDirectoryQueue.cpp:0: DB::DistributedAsyncInsertDirectoryQueue::processFilesWithBatching() @ 0x0000000011cd5054 in /usr/lib/debug/usr/bin/clickhouse.debug 9. ./.build/./src/Storages/Distributed/DistributedAsyncInsertDirectoryQueue.cpp:417: DB::DistributedAsyncInsertDirectoryQueue::processFiles() @ 0x0000000011cd3440 in /usr/lib/debug/usr/bin/clickhouse.debug 10. ./.build/./src/Storages/Distributed/DistributedAsyncInsertDirectoryQueue.cpp:0: DB::DistributedAsyncInsertDirectoryQueue::run() @ 0x0000000011cd3878 in /usr/lib/debug/usr/bin/clickhouse.debug 11. ./.build/./contrib/llvm-project/libcxx/include/__functional/function.h:0: DB::BackgroundSchedulePoolTaskInfo::execute() @ 0x00000000103dbc34 in /usr/lib/debug/usr/bin/clickhouse.debug 12. ./.build/./contrib/llvm-project/libcxx/include/__memory/shared_ptr.h:701: DB::BackgroundSchedulePool::threadFunction() @ 0x00000000103de1b6 in /usr/lib/debug/usr/bin/clickhouse.debug 13. 
./.build/./src/Core/BackgroundSchedulePool.cpp:0: void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false>::ThreadFromGlobalPoolImpl<DB::BackgroundSchedulePool::BackgroundSchedulePool(unsigned long, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, char const)::$_0>(DB::BackgroundSchedulePool::BackgroundSchedulePool(unsigned long, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, char const)::$_0&&)::'lambda'(), void ()>>(std::__function::__policy_storage const) @ 0x00000000103de7d1 in /usr/lib/debug/usr/bin/clickhouse.debug 14. ./.build/./base/base/../base/wide_integer_impl.h:809: ThreadPoolImpl<std::thread>::worker(std::__list_iterator<std::thread, void>) @ 0x000000000b8c5502 in /usr/lib/debug/usr/bin/clickhouse.debug 15. ./.build/./contrib/llvm-project/libcxx/include/__memory/unique_ptr.h:302: void std::__thread_proxy[abi:v15000]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void ThreadPoolImpl<std::thread>::scheduleImpl<void>(std::function<void ()>, Priority, std::optional<unsigned long>, bool)::'lambda0'()>>(void*) @ 0x000000000b8c936e in /usr/lib/debug/usr/bin/clickhouse.debug 16. ? @ 0x00007f1be8b30fd4 in ? 17. ? @ 0x00007f1be8bb15bc in ? And instead of ignoring errors, DistributedAsyncInsertBatch::valid() had been added, that should be called when the files had been read from the current_batch.txt, if it is not valid (some files from the batch did not exist), then there is no sense in trying to send the same batch, so just this file will be ignored, and files will be processed in a regular order. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-12-04 16:52:53 +01:00
Azat Khuzhin	638d0102f8	Fix error_count in case of distributed_directory_monitor_max_sleep_time_ms>5min In this case the error counter will be decremented everytime. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-12-04 16:52:53 +01:00
Igor Nikonov	63a1625b77	Merge remote-tracking branch 'origin/master' into pr-cleanup-narrow-dependency	2023-11-21 16:05:48 +00:00
Igor Nikonov	ce98dfb251	Settings pointer to reference	2023-11-21 16:04:54 +00:00
Igor Nikonov	66f6a6575f	Cleanup iteration: settings usage	2023-11-21 13:29:04 +00:00
Alexey Milovidov	d56cbda185	Add metrics for the number of queued jobs, which is useful for the IO thread pool	2023-11-18 19:07:59 +01:00
Azat Khuzhin	c25d6cd624	Rename directory monitor concept into background INSERT (#55978) * Limit log frequence for "Skipping send data over distributed table" message After SYSTEM STOP DISTRIBUTED SENDS it will constantly print this message. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> * Rename directory monitor concept into async INSERT Rename the following query settings (with preserving backward compatiblity, by keeping old name as an alias): - distributed_directory_monitor_sleep_time_ms -> distributed_async_insert_sleep_time_ms - distributed_directory_monitor_max_sleep_time_ms -> distributed_async_insert_max_sleep_time_ms - distributed_directory_monitor_batch -> distributed_async_insert_batch_inserts - distributed_directory_monitor_split_batch_on_failure -> distributed_async_insert_split_batch_on_failure Rename the following table settings (with preserving backward compatiblity, by keeping old name as an alias): - monitor_batch_inserts -> async_insert_batch - monitor_split_batch_on_failure -> async_insert_split_batch_on_failure - directory_monitor_sleep_time_ms -> async_insert_sleep_time_ms - directory_monitor_max_sleep_time_ms -> async_insert_max_sleep_time_ms And also update all the references: $ gg -e directory_monitor_ -e monitor_ tests docs | cut -d: -f1 | sort -u | xargs sed -e 's/distributed_directory_monitor_sleep_time_ms/distributed_async_insert_sleep_time_ms/g' -e 's/distributed_directory_monitor_max_sleep_time_ms/distributed_async_insert_max_sleep_time_ms/g' -e 's/distributed_directory_monitor_batch_inserts/distributed_async_insert_batch/g' -e 's/distributed_directory_monitor_split_batch_on_failure/distributed_async_insert_split_batch_on_failure/g' -e 's/monitor_batch_inserts/async_insert_batch/g' -e 's/monitor_split_batch_on_failure/async_insert_split_batch_on_failure/g' -e 's/monitor_sleep_time_ms/async_insert_sleep_time_ms/g' -e 's/monitor_max_sleep_time_ms/async_insert_max_sleep_time_ms/g' -i Signed-off-by: Azat Khuzhin 
<a.khuzhin@semrush.com> * Rename async_insert for Distributed into background_insert This will avoid amigibuity between general async INSERT's and INSERT into Distributed, which are indeed background, so new term express it even better. Mostly done with: $ git di HEAD^ --name-only | xargs sed -i -e 's/distributed_async_insert/distributed_background_insert/g' -e 's/async_insert_batch/background_insert_batch/g' -e 's/async_insert_split_batch_on_failure/background_insert_split_batch_on_failure/g' -e 's/async_insert_sleep_time_ms/background_insert_sleep_time_ms/g' -e 's/async_insert_max_sleep_time_ms/background_insert_max_sleep_time_ms/g' Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> * Mark 02417_opentelemetry_insert_on_distributed_table as long CI: https://s3.amazonaws.com/clickhouse-test-reports/55978/7a6abb03a0b507e29e999cb7e04f246a119c6f28/stateless_tests_flaky_check__asan_.html Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> --------- Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-11-01 15:09:39 +01:00
Azat Khuzhin	5d2a97106c	Fix missing thread accounting for insert_distributed_sync=1 Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-10-09 15:41:52 +02:00
Alexander Tokmakov	d41eca1dcc	rename new method	2023-08-28 16:01:00 +02:00
Alexander Tokmakov	9ab545e28c	do not wait for flush on shutdown	2023-08-25 19:09:10 +02:00
Azat Khuzhin	17ca2661a1	Add ability to turn off flush of Distributed on DETACH/DROP/server shutdown Sometimes you can have tons of data there, i.e. few TiBs, and sending them on server shutdown does not looks sane (maybe there is a bug and you need to update/restart to fix flushing). Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-08-17 08:58:06 +02:00
Azat Khuzhin	2f414950b7	Fix logging for asynchronous non-batched distributed sends (#52583) Before you may see the following: 2023.07.25 09:21:39.705559 [ 692 ] {6b5e1299-1b64-4dbb-b25d-45e10027db22} <Trace> test_hkt5nnqj.dist_opentelemetry.DirectoryMonitor.default: Finished processing `` (took 37 ms) Because file_path and current_file are the references to the same variable in DistributedAsyncInsertDirectoryQueue::processFile(). Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-08-07 20:57:42 +02:00
Vitaly Baranov	815a3857de	Remove non-const function Context::getClientInfo().	2023-07-17 15:02:07 +02:00
Vitaly Baranov	0cccba62cf	Support getHexUIntLowercase() with CityHash_v1_0_2::uint128 parameter.	2023-06-29 15:29:37 +02:00
Vitaly Baranov	f1f0daa654	Show halves of checksums in "system.parts", "system.projection_parts" and error messages in the correct order.	2023-06-25 17:17:56 +02:00
Vitaly Baranov	3711430d9f	Rename member fields of CityHash_v1_0_2::uint128: "first" -> "low64", "second" -> "high64".	2023-06-24 12:25:56 +02:00
Robert Schulze	c538506f2e	More fixes	2023-06-09 20:50:17 +00:00
Robert Schulze	1aa158909e	enable_qpl_deflate_codec --> enable_deflate_qpl_codec	2023-06-09 12:43:33 +00:00
jinjunzh	f1192d59af	refine patch according to comments	2023-06-09 12:43:15 +00:00
jinjunzh	056ca4f555	Add extensive testing cases for deflate qpl codec	2023-06-09 12:42:59 +00:00
Azat Khuzhin	69aec7af9b	Add new metrics BrokenDistributedBytesToInsert/DistributedBytesToInsert Useful to see at the server status overall. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-06-03 20:49:19 +02:00
Azat Khuzhin	f2a023140e	Fix processing pending batch for Distributed async INSERT after restart After abnormal server restart current_batch.txt (that contains list of files to send to the remote shard), may not have all files, if it was terminated between unlink .bin files and truncation of current_batch.txt But it should be fixed in a more reliable way, though to backport the patch I kept it simple. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-15 15:57:30 +02:00
Robert Schulze	1be81db885	Fix build, pt. II	2023-04-05 11:23:09 +00:00
Azat Khuzhin	ba6ecd2d4e	Fix ThreadPool for DistributedSink Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-04-01 16:00:03 +02:00
Alexander Tokmakov	54314061ab	fix logical error on cancellation	2023-03-23 13:13:16 +01:00
Robert Schulze	5b036a1a3b	More preparation for libcxx(abi), llvm, clang-tidy 16 (follow-up to #47722)	2023-03-20 12:55:03 +00:00
Sema Checherinda	3c6deddd1d	work with comments on PR	2023-03-16 19:55:58 +01:00
Azat Khuzhin	3d247b8635	Preserve error in system.distribution_queue on SYSTEM FLUSH DISTRIBUTED After refactoring in #45491 this behaviour had been changed, hence 01555_system_distribution_queue_mask became flaky, since if the flush had not happened before SYSTEM FLUSH DISTRIBUTED the error was not saved into the system.distribution_queue. v2: Improve 01555_system_distribution_queue_mask test Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-03-13 20:07:55 +01:00
Mike Kot	9920a52c51	use std::lerp, constexpr hex.h	2023-03-07 22:50:17 +00:00
Han Fei	b7eef62458	Merge pull request #45491 from azat/dist/async-send-refactoring [RFC] Rewrite distributed sends to avoid using filesystem as a queue, use in-memory queue instead	2023-03-06 12:32:33 +01:00
Azat Khuzhin	d06a4b50d6	Latest review fixes (variable naming: s/monitor/queue) Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-02-28 22:48:07 +01:00
Azat Khuzhin	591fca57f3	Fix function names for opentelemetry spans in StorageDistributed Fixes: 02417_opentelemetry_insert_on_distributed_table Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-02-28 22:33:36 +01:00
Azat Khuzhin	7063c20b3c	Change noisy "Skipping send data over distributed table." message to test Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-02-28 22:33:36 +01:00
Azat Khuzhin	16bfef3c8a	Fix processing current_batch.txt on init Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-02-28 22:33:36 +01:00
Azat Khuzhin	752d27d663	Fix lossing files during distributed batch send v2: do not suppress exceptions in case of errors Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-02-28 22:33:36 +01:00
Azat Khuzhin	263c042c6a	Fix opentelemetry for distributed batch sends Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-02-28 22:33:36 +01:00
Azat Khuzhin	00115c6615	Rename readDistributedAsyncInsertHeader() Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-02-28 22:33:36 +01:00
Azat Khuzhin	a76d7b22c1	Use existing public methods of StorageDistributed in DistributedSink Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-02-28 22:33:36 +01:00
Azat Khuzhin	e83699a8d3	Improve comment for DistributedAsyncInsertDirectoryQueue::initializeFilesFromDisk() Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-02-28 22:33:36 +01:00
Azat Khuzhin	e10fb142fd	Fix race for distributed sends from disk Before it was initialized from disk only on startup, but if some INSERT can create the object before, then, it will lead to the situation when it will not be initialized. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-02-28 22:33:36 +01:00
Azat Khuzhin	b5434eac3b	Rename StorageDistributedDirectoryMonitor to DistributedAsyncInsertDirectoryQueue Since #44922 it is not a directory monitor anymore. v2: Remove unused error codes v3: Contains some header fixes due to conflicts with master Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-02-28 22:33:36 +01:00
Azat Khuzhin	1c4659b8e7	Separate out Batch as DistributedAsyncInsertBatch (and also some helpers) Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-02-28 22:33:36 +01:00
Azat Khuzhin	33b13549ad	Separate out DirectoryMonitorSource as DistributedAsyncInsertSource Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-02-28 22:33:36 +01:00
Azat Khuzhin	325a7b2305	Separate out DistributedHeader as DistributedAsyncInsertHeader Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-02-28 22:33:36 +01:00
Azat Khuzhin	22a39e29f7	Add a comment for StorageDistributedDirectoryMonitor::Batch::recovered Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-02-28 22:33:36 +01:00


288 Commits