Sorry for the clickbaity title. This is about static method
ConnectionTimeouts::getHTTPTimeouts(). It was declared in header
IO/ConnectionTimeouts.h, and defined in header
IO/ConnectionTimeoutsContext.h (!). This is weird and caused issues with
linking on s390x (#45520). There was an attempt to fix some
inconsistencies (#45848), but at first neither @Algunenano nor I
really understood why the definition is in the header.
Turns out that ConnectionTimeoutsContext.h is only #include'd from
source files which are part of the normal server build BUT NOT part of
the keeper standalone build (which must be enabled via CMake
-DBUILD_STANDALONE_KEEPER=1). This dependency was not documented, and
as a result some misguided workarounds were introduced earlier, e.g.
0341c6c54b
The deeper cause was that getHTTPTimeouts() is passed a "Context". This
class is part of the "dbms" library which is deliberately not linked by
the standalone build of clickhouse-keeper. The Context is only used to
read the settings, and the "Settings" class is part of the
clickhouse_common library, which clickhouse-keeper already links.
To resolve this mess, this PR
- creates source file IO/ConnectionTimeouts.cpp and moves all
ConnectionTimeouts definitions into it, including getHTTPTimeouts().
- breaks the wrong dependency by passing "Settings" instead of "Context"
into getHTTPTimeouts() (see the sketch below).
- removes the previous hacks
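For illustration, a minimal sketch of the new shape, assuming
approximate names (the real parameter list and the settings read may
differ):

// IO/ConnectionTimeouts.h: declaration only, no Context required
//     static ConnectionTimeouts getHTTPTimeouts(const Settings & settings);

// IO/ConnectionTimeouts.cpp: the definition moves out of the header
ConnectionTimeouts ConnectionTimeouts::getHTTPTimeouts(const Settings & settings)
{
    // Only settings are read, so the heavy "dbms" Context is not needed.
    return ConnectionTimeouts(
        settings.http_connection_timeout,
        settings.http_send_timeout,
        settings.http_receive_timeout);
}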
There was an error from the beginning: the code did not respect
file_indices and iterated only over file_index_to_path. This is not
correct, since there can be fewer files than in file_index_to_path,
and that is exactly what file_indices is for.
Note that only the error message was wrong; the logic was fine. You can
verify this from the logs:
2022.12.07 11:55:50.951976 [ 39217 ] {} <Debug> default.dist.DirectoryMonitor: Sending a batch of 10 files to localhost:9000 (128.42 thousand rows, 36.32 MiB bytes).
2022.12.07 11:55:50.953762 [ 39217 ] {} <Error> default.dist.DirectoryMonitor: Code: 516. DB::Exception: Received from localhost:9000. DB::Exception: Interserver authentication failed. Stack trace:
...
: While sending batch, nums: 62, files: /work6/clickhouse/data/default/dist/shard1_replica1/66827258.bin
As you can see, it reports "Sending a batch of 10 files" but "nums: 62".
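A sketch of the corrected message assembly (surrounding types are
assumed, the helper name is hypothetical):

#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Build the "files" part of the error message from file_indices (the
// files actually in this batch), not from the whole file_index_to_path
// map, and report file_indices.size() as "nums".
std::string formatFilesForError(
    const std::vector<uint64_t> & file_indices,
    const std::map<uint64_t, std::string> & file_index_to_path)
{
    std::string out = "nums: " + std::to_string(file_indices.size()) + ", files: ";
    for (uint64_t idx : file_indices)
        if (auto it = file_index_to_path.find(idx); it != file_index_to_path.end())
            out += it->second + "\n";
    return out;
}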
Fixes: #23856
Refs: #41813
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* save format string for NetException
* format exceptions
* format exceptions 2
* format exceptions 3
* format exceptions 4
* format exceptions 5
* format exceptions 6
* fix
* format exceptions 7
* format exceptions 8
* Update MergeTreeIndexGin.cpp
* Update AggregateFunctionMap.cpp
* Update AggregateFunctionMap.cpp
* fix
There are the following problems with this patch:
- Loses files on exception
- An existing current_batch.txt on startup leads to an ENOENT error and
a hang of distributed sends without ATTACH/DETACH
- Race between creating the queue for sending at table startup and
INSERT; if it had been created from INSERT, then it will not be
initialized from disk
They were addressed in #45491, but that makes the code more complex,
and since the release is likely coming soon, it is better to revert the
change.
This reverts commit 94604f71b7, reversing
changes made to 80f6a45376.
In #43406 the metrics were broken for a clean start, since they were
not initialized from disk; but the metrics for broken files were never
initialized from disk at all.
Fix this and rework how DirectoryMonitor works with the file system:
- do not iterate over the directory before each send; do this only once
on init, after which the map of files is updated from INSERTs
- call fs::create_directories() for the "broken" folder from the ctor
to avoid excessive calls (see the sketch below)
- cache the "broken" paths
This patch also fixes a possible issue where current_batch can be
processed multiple times (the second time would throw an exception):
if there is an existing current_batch.txt, it should be removed
immediately after processing.
Plus this patch implicitly fixes an issue with logging that reported an
incorrect number of files in case of error (see #44907 for details).
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Previously it was possible to have a race while updating
files_count/bytes_count: INSERT updates those counters from one thread,
while the same metrics are updated from the filesystem in a separate
thread. Even though the access is synchronized with a mutex, that only
avoids a data race on the variables, not the logical race: getFiles()
from a separate thread may increment the counters, and a later
addAndSchedule() will increment them again.
Here you can find an example of this race [1].
[1]: https://pastila.nl/?00950e00/41a3c7bbb0a7e75bd3f2922c58b02334
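To make the logical race concrete, an illustration (names approximate):

#include <cstddef>
#include <mutex>

struct CountersSketch
{
    // Rescan thread: sets the counters from the filesystem state, which
    // may already include a file that INSERT is about to account for.
    void getFiles(size_t files_on_disk, size_t bytes_on_disk)
    {
        std::lock_guard lock(mutex);
        files_count = files_on_disk;
        bytes_count = bytes_on_disk;
    }

    // INSERT thread: increments the counters for the new file, possibly
    // counting it a second time. Each method is mutex-protected, so
    // there is no data race, only the logical one.
    void addAndSchedule(size_t file_size)
    {
        std::lock_guard lock(mutex);
        ++files_count;
        bytes_count += file_size;
    }

    std::mutex mutex;
    size_t files_count = 0;
    size_t bytes_count = 0;
};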
Note that I analyzed logs from a production system with lots of async
Distributed INSERTs and everything is OK there, even though the logs
contain the following:
2022.11.20 02:21:15.459483 [ 11528 ] {} <Trace> v21.dist_out.DirectoryMonitor: Files set to 35 (was 34)
2022.11.20 02:21:15.459515 [ 11528 ] {} <Trace> v21.dist_out.DirectoryMonitor: Bytes set to 4035418 (was 3929008)
2022.11.20 02:21:15.819488 [ 11528 ] {} <Trace> v21.dist_out.DirectoryMonitor: Files set to 1 (was 2)
2022.11.20 02:21:15.819502 [ 11528 ] {} <Trace> v21.dist_out.DirectoryMonitor: Bytes set to 190072 (was 296482)
As you can see, it first increases the counters and the next update
decreases them (and 4035418-3929008 == 296482-190072 == 106410).
Refs: #23885
Reported-by: @tavplubix
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Previously the connection-related settings
(connect_timeout_with_failover_ms, connect_timeout_with_failover_secure_ms)
were applied from the query only in the case insert_distributed_sync=1;
async INSERT used the global settings instead.
Note that this changes how connections are allocated: now
split_batch_on_failure will create its own connection, and this can
introduce more duplicates, since with split_batch_on_failure enabled it
may send files to a different server. But this should not be a problem
because:
- it does not resend a batch if the batch has only one file, which is
the case where deduplication would work
- in all other cases deduplication will not work anyway, since the
checksums will differ
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
- lots of static_cast
- add safe_cast (see the sketch after this list)
- type adjustments
- config
- IStorage::read/watch
- ...
- some TODOs (to convert types in the future)
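A hypothetical sketch of such a helper (the actual safe_cast may
differ):

#include <stdexcept>
#include <type_traits>

// A static_cast that throws if the value does not survive the conversion.
template <typename To, typename From>
To safe_cast(From from)
{
    static_assert(std::is_integral_v<To> && std::is_integral_v<From>);
    To to = static_cast<To>(from);
    // The round-trip check catches truncation; the sign check catches
    // negative values converted to unsigned types (and vice versa).
    if (static_cast<From>(to) != from || (from < From{}) != (to < To{}))
        throw std::out_of_range("safe_cast: value out of range");
    return to;
}

E.g. safe_cast<uint8_t>(300) throws instead of silently yielding 44.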
P.S. That was quite a journey...
v2: fixes after rebase
v3: fix conflicts after #42308 merged
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
When I tried to add cool new clang-tidy 14 warnings, I noticed that the
current clang-tidy settings already produce a ton of warnings. This
commit addresses many of them. Almost all were non-critical, e.g.
C-style vs. C++-style casts.
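For illustration, the typical shape of such a fix:

#include <cstddef>

double loadFactor(size_t rows, size_t total)
{
    // was: return (double)rows / total;        // C-style cast
    return static_cast<double>(rows) / total;   // C++-style cast
}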
* Add a warning if parallel_distributed_insert_select was ignored
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Respect max_distributed_depth for parallel_distributed_insert_select
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Print warning for non applied parallel_distributed_insert_select only for initial query
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Remove Cluster::getHashOfAddresses()
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Forbid parallel_distributed_insert_select for remote()/cluster() with different addresses
Before, it used the empty cluster name (getClusterName()), which is not
correct; compare all addresses instead.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Fix max_distributed_depth check
max_distributed_depth=1 must mean not more than one distributed query,
not two, since max_distributed_depth=0 means no limit, and
distributed_depth is 0 for the first query (see the sketch below).
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
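A sketch of the corrected check under these semantics (names assumed):

#include <cstdint>
#include <stdexcept>

// distributed_depth starts at 0 for the initial query, so ">=" limits
// the number of distributed queries to exactly max_distributed_depth,
// and max_distributed_depth=0 disables the check.
void checkDistributedDepth(uint64_t distributed_depth, uint64_t max_distributed_depth)
{
    if (max_distributed_depth && distributed_depth >= max_distributed_depth)
        throw std::runtime_error("Maximum distributed depth exceeded");
}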
* Fix INSERT INTO remote()/cluster() with parallel_distributed_insert_select
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Add a test for parallel_distributed_insert_select with cluster()/remote()
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Return <remote> instead of empty cluster name in Distributed engine
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Make user with sharding_key and w/o in remote()/cluster() identical
Before, with sharding_key the user was "default", while without it, it
was empty.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Defines.h is a very common header, so lots of modules are recompiled on
its changes.
Move the protocol macros into a separate header; this should
significantly decrease the number of units to recompile when it changes.
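A sketch of the split (header name and macro value are illustrative):

// Core/ProtocolDefines.h: a new, small header that only protocol-related
// translation units include; bumping a protocol version no longer
// recompiles everything that includes Defines.h.
#pragma once
#define DBMS_TCP_PROTOCOL_VERSION 54461  // illustrative value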
UBsan reports:
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../src/Storages/Distributed/DirectoryMonitor.cpp:435:53 in
../src/Storages/Distributed/DirectoryMonitor.cpp:435: runtime error: 1.15292e+19 is outside the range of representable values of type 'long'
0 0x1df0c286 in DB::StorageDistributedDirectoryMonitor::run() obj-x86_64-linux-gnu/../src/Storages/Distributed/DirectoryMonitor.cpp:435:53
It is pretty easy to reproduce by limiting max_server_memory_usage
before starting the test.
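The usual guard for this class of report is to saturate before
converting; a sketch, not necessarily the exact fix applied here:

#include <cstdint>
#include <limits>

// Casting a double outside the int64_t range to an integer is UB;
// clamp first. 2^63 is the smallest double that overflows, while -2^63
// itself is still representable.
int64_t doubleToInt64Saturating(double value)
{
    constexpr double upper = 9223372036854775808.0;  // 2^63
    if (!(value < upper))  // also catches NaN and +inf
        return std::numeric_limits<int64_t>::max();
    if (value < -upper)
        return std::numeric_limits<int64_t>::min();
    return static_cast<int64_t>(value);
}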
- create directory monitors in parallel; this also includes rmdir in
case the directory is empty, since even an empty directory may take
some time to remove, due to waiting for the journal, or if the
directory is large (i.e. it had lots of files before; remember that
ext4 does not truncate the directory size on each unlink [1]); see the
sketch below
- initialize the increment in parallel too (since it does readdir())
[1]: https://lore.kernel.org/linux-ext4/930A5754-5CE6-4567-8CF0-62447C97825C@dilger.ca/
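A sketch of the parallel initialization (structure assumed, not the
actual code):

#include <filesystem>
#include <functional>
#include <future>
#include <vector>

namespace fs = std::filesystem;

// Initialize each per-shard monitor on its own task, so one slow rmdir
// of a once-large ext4 directory does not serialize table startup.
void initMonitorsInParallel(
    const fs::path & table_path,
    const std::function<void(const fs::path &)> & create_monitor)
{
    std::vector<std::future<void>> tasks;
    for (const auto & entry : fs::directory_iterator(table_path))
    {
        if (!entry.is_directory())
            continue;
        tasks.push_back(std::async(std::launch::async, [&create_monitor, dir = entry.path()]
        {
            if (fs::is_empty(dir))
                fs::remove(dir);  // may be slow, but now off the critical path
            else
                create_monitor(dir);
        }));
    }
    for (auto & task : tasks)
        task.get();
}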
Broken batches may appear because of an abnormal server shutdown (and
lack of fsync), and ignoring the whole batch is not great in that case,
so apply the same split logic here too.
v2: rename exception
v3: catch missing exception
v4: fix marking the file as broken multiple times (fixes
test_insert_distributed_async_send with the setting enabled)
Add the distributed_directory_monitor_split_batch_on_failure setting
(OFF by default) that will split the batch and send files one by one in
case of retriable errors (see the sketch below).
v2: more error codes
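A sketch of the fallback (shape assumed; filtering by error code is
omitted):

#include <functional>
#include <string>
#include <vector>

// Try the whole batch first; on failure, resend the files one by one
// so a single bad file does not poison the entire batch.
void sendWithSplitOnFailure(
    const std::vector<std::string> & files,
    const std::function<void(const std::vector<std::string> &)> & send,
    bool split_batch_on_failure)
{
    try
    {
        send(files);
    }
    catch (...)
    {
        if (!split_batch_on_failure || files.size() <= 1)
            throw;
        for (const auto & file : files)
            send({file});  // a genuinely broken file will surface individually
    }
}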
Under use_compact_format_in_distributed_parts_names=1 and
internal_replication=true the server encodes all replicas into the
directory name for async INSERT into Distributed, and the directory
name looks like:
shard1_replica1,shard1_replica2,shard3_replica3
This is required for creating connections (to specific replicas only),
but in the case of internal_replication=true it can be avoided, since
this path will always include all replicas.
This patch replaces all replicas with an "_all_replicas" marker.
Note that the initial problem was that this path may overflow NAME_MAX
if you have more than 15 replicas, and the server will then fail to
create the directory.
Also note that the changed directory name should not be a problem, since:
- empty directories have been removed since #16729
- replicas encoded in the directory name are supported anyway