Previously it was initialized from disk only on startup, but if some
INSERT creates the object before that, it will never be initialized
from disk.
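A minimal sketch of the intended pattern (names are illustrative, not
the actual ClickHouse code): guard the disk initialization so that it
runs exactly once, whether the object is created at table startup or
lazily by an INSERT.

    #include <mutex>

    /// Illustrative sketch, not the real class.
    class QueueSketch
    {
        std::once_flag init_flag;

        void initializeFromDisk()
        {
            /// ... read pending .bin files from disk, restore counters ...
        }

    public:
        /// Called from both the startup path and the INSERT path, so
        /// the disk state is loaded exactly once regardless of who wins.
        void initializeFromDiskOnce()
        {
            std::call_once(init_flag, [this] { initializeFromDisk(); });
        }
    };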
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Since #44922 it is not a directory monitor anymore.
v2: Remove unused error codes
v3: Contains some header fixes due to conflicts with master
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Sorry for the clickbaity title. This is about static method
ConnectionTimeouts::getHTTPTimeouts(). It was declared in header
IO/ConnectionTimeouts.h, but defined in header
IO/ConnectionTimeoutsContext.h (!). This is weird and caused linking
issues on s390x (#45520). There was an attempt to fix some
inconsistencies (#45848), but at first neither @Algunenano nor I really
understood why the definition is in a header.
Turns out that ConnectionTimeoutsContext.h is only #include'd from
source files which are part of the normal server build BUT NOT part of
the keeper standalone build (which must be enabled via CMake
-DBUILD_STANDALONE_KEEPER=1). This dependency was not documented, and
as a result some misguided workarounds were introduced earlier, e.g.
0341c6c54b
The deeper cause was that getHTTPTimeouts() is passed a "Context". This
class is part of the "dbms" library which is deliberately not linked by
the standalone build of clickhouse-keeper. The Context is only used to
read the settings, and the "Settings" class is part of the
clickhouse_common library which clickhouse-keeper already links.
To resolve this mess, this PR
- creates source file IO/ConnectionTimeouts.cpp and moves all
ConnectionTimeouts definitions into it, including getHTTPTimeouts(),
- breaks the wrong dependency by passing "Settings" instead of "Context"
into getHTTPTimeouts(),
- removes the previous hacks.
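A sketch of the resulting split, with stand-in types (the real
parameter list and timeout fields may differ):

    // Stand-ins for the real types, just for this sketch.
    struct Timespan { long ms = 0; };
    struct Settings            // lives in clickhouse_common
    {
        Timespan connect_timeout{1000};
        Timespan send_timeout{300000};
        Timespan receive_timeout{300000};
    };

    struct ConnectionTimeouts
    {
        Timespan connect, send, receive;

        // Declared in IO/ConnectionTimeouts.h; defined in the new
        // IO/ConnectionTimeouts.cpp. Takes Settings instead of Context,
        // so the standalone keeper build no longer needs "dbms".
        static ConnectionTimeouts getHTTPTimeouts(const Settings & s)
        {
            return {s.connect_timeout, s.send_timeout, s.receive_timeout};
        }
    };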
There was an error from the beginning: the code does not respect
file_indices and iterates only over file_index_to_path. This is not
correct, since there can be fewer files than entries in
file_index_to_path, and that is exactly what file_indices is for.
Note that only the error message was wrong; the logic was fine. You can
verify this from the logs:
2022.12.07 11:55:50.951976 [ 39217 ] {} <Debug> default.dist.DirectoryMonitor: Sending a batch of 10 files to localhost:9000 (128.42 thousand rows, 36.32 MiB bytes).
2022.12.07 11:55:50.953762 [ 39217 ] {} <Error> default.dist.DirectoryMonitor: Code: 516. DB::Exception: Received from localhost:9000. DB::Exception: Interserver authentication failed. Stack trace:
...
: While sending batch, nums: 62, files: /work6/clickhouse/data/default/dist/shard1_replica1/66827258.bin
As you can see, it logs "Sending a batch of 10 files" but then "nums: 62".
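A sketch of the corrected message construction (stand-in types,
hypothetical helper name): iterate over file_indices, which lists the
files actually included in the batch, instead of dumping the whole
file_index_to_path map.

    #include <cstddef>
    #include <map>
    #include <string>
    #include <vector>

    std::string formatBatchFiles(
        const std::vector<std::size_t> & file_indices,
        const std::map<std::size_t, std::string> & file_index_to_path)
    {
        std::string out;
        for (const auto idx : file_indices)   // only the batch's own files
        {
            const auto it = file_index_to_path.find(idx);
            if (it == file_index_to_path.end())
                continue;
            if (!out.empty())
                out += ", ";
            out += it->second;
        }
        return out;
    }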
Fixes: #23856
Refs: #41813
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* save format string for NetException
* format exceptions
* format exceptions 2
* format exceptions 3
* format exceptions 4
* format exceptions 5
* format exceptions 6
* fix
* format exceptions 7
* format exceptions 8
* Update MergeTreeIndexGin.cpp
* Update AggregateFunctionMap.cpp
* Update AggregateFunctionMap.cpp
* fix
There are the following problems with this patch:
- Loses files on exception
- An existing current_batch.txt on startup leads to an ENOENT error and
a hang of distributed sends without ATTACH/DETACH
- Race between creating the queue for sending at table startup and
INSERT: if the queue had been created from INSERT, it will not be
initialized from disk
They were addressed in #45491, but that makes the code more complex,
and since the release is likely coming, it is better to revert the
change.
This reverts commit 94604f71b7, reversing
changes made to 80f6a45376.
In #43406 metrics were broken for a clean start, since they were not
initialized from disk, while metrics for broken files had never been
initialized from disk at all.
Fix this and rework how DirectoryMonitor works with the file system:
- do not iterate over the directory before each send; do this only once
on init, afterwards the map of files is updated from INSERTs
- call fs::create_directories() from the ctor for the "broken" folder to
avoid excessive calls
- cache the "broken" paths (see the sketch after this list)
This patch also fixes a possible issue where current_batch.txt can be
processed multiple times (the second time throwing an exception): if
there is an existing current_batch.txt, it should be removed immediately
after processing.
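A sketch of that part (hypothetical helper name):

    #include <filesystem>

    namespace fs = std::filesystem;

    void processLeftoverBatch(const fs::path & current_batch_file)
    {
        if (!fs::exists(current_batch_file))
            return;
        /// ... resend the files recorded in current_batch.txt ...
        fs::remove(current_batch_file);  // remove right away, so a later
                                         // pass cannot process it twice
    }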
Plus this patch implicitly fixes an issue with logging that reported an
incorrect number of files in case of an error (see #44907 for details).
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Previously it was possible to have a race while updating
files_count/bytes_count, since INSERT updates those counters from one
thread while the same metrics are updated from the filesystem in a
separate thread. Even though the access is synchronized with a mutex,
that only avoids a race on the variables, not the logical race: it is
possible that getFiles() from the separate thread will increment the
counters and addAndSchedule() will later increment them again.
Here you can find an example of this race [1].
[1]: https://pastila.nl/?00950e00/41a3c7bbb0a7e75bd3f2922c58b02334
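A condensed sketch of the race (illustrative names): both updates take
the mutex, yet the same file can still be counted twice.

    #include <cstddef>
    #include <mutex>

    struct CountersSketch
    {
        std::mutex mutex;
        std::size_t files_count = 0;

        // Thread A: filesystem scan (getFiles()-like). The scan already
        // sees the freshly written file and includes it in the total.
        void setFromScan(std::size_t files_on_disk)
        {
            std::lock_guard lock(mutex);
            files_count = files_on_disk;
        }

        // Thread B: INSERT path (addAndSchedule()-like) increments for
        // that same file afterwards -> double count, despite the mutex.
        void addFromInsert()
        {
            std::lock_guard lock(mutex);
            ++files_count;
        }
    };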
Note that I analyzed logs from a production system with lots of async
Distributed INSERTs and everything is OK there, even though the logs
contain the following:
2022.11.20 02:21:15.459483 [ 11528 ] {} <Trace> v21.dist_out.DirectoryMonitor: Files set to 35 (was 34)
2022.11.20 02:21:15.459515 [ 11528 ] {} <Trace> v21.dist_out.DirectoryMonitor: Bytes set to 4035418 (was 3929008)
2022.11.20 02:21:15.819488 [ 11528 ] {} <Trace> v21.dist_out.DirectoryMonitor: Files set to 1 (was 2)
2022.11.20 02:21:15.819502 [ 11528 ] {} <Trace> v21.dist_out.DirectoryMonitor: Bytes set to 190072 (was 296482)
As you can see, the first update increases the counters and the next
update decreases them (and 4035418-3929008 == 296482-190072 == 106410).
Refs: #23885
Reported-by: @tavplubix
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Previously, connection-related settings (connect_timeout_with_failover_ms,
connect_timeout_with_failover_secure_ms) were applied from the query only
in the case insert_distributed_sync=1; an async INSERT used the global
settings instead.
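A sketch of the intended behavior, with stand-in types (the real
timeouts/pool API differs): derive the timeouts for each async send
from the Settings stored with that insert, not from the global server
settings.

    // Stand-in types, just for this sketch.
    struct Settings
    {
        long connect_timeout_with_failover_ms = 50;
    };

    struct ConnectionTimeouts
    {
        long connect_ms = 0;

        static ConnectionTimeouts withFailover(const Settings & s)
        {
            return {s.connect_timeout_with_failover_ms};
        }
    };

    // Per send: use the settings persisted with the queued insert.
    ConnectionTimeouts timeoutsForSend(const Settings & insert_settings)
    {
        return ConnectionTimeouts::withFailover(insert_settings);
    }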
Note that this changes how connections are allocated: now
split_batch_on_failure will create its own connection, which can
introduce more duplicates, since with split_batch_on_failure enabled it
may send files to a different server. But this should not be a problem
because:
- it does not resend a batch that has only one file, which is the case
where deduplication works
- in all other cases deduplication will not work anyway, since the
checksums will differ
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
- lots of static_cast
- add safe_cast (see the sketch after this list)
- type adjustments
- config
- IStorage::read/watch
- ...
- some TODOs (to convert types in the future)
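A sketch in the spirit of the added safe_cast (the real helper may
differ): refuse integer conversions that would change the value,
instead of silently truncating or flipping the sign.

    #include <stdexcept>
    #include <utility>

    template <typename To, typename From>
    To safe_cast(From from)
    {
        if (!std::in_range<To>(from))  // C++20 integer range check
            throw std::out_of_range("safe_cast: value does not fit into the target type");
        return static_cast<To>(from);
    }

    // Usage: int n = safe_cast<int>(rows);  // throws if rows does not fit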
P.S. That was quite a journey...
v2: fixes after rebase
v3: fix conflicts after #42308 merged
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>