* Limit log frequence for "Skipping send data over distributed table" message
After SYSTEM STOP DISTRIBUTED SENDS it will constantly print this
message.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Rename directory monitor concept into async INSERT
Rename the following query settings (with preserving backward
compatiblity, by keeping old name as an alias):
- distributed_directory_monitor_sleep_time_ms -> distributed_async_insert_sleep_time_ms
- distributed_directory_monitor_max_sleep_time_ms -> distributed_async_insert_max_sleep_time_ms
- distributed_directory_monitor_batch -> distributed_async_insert_batch_inserts
- distributed_directory_monitor_split_batch_on_failure -> distributed_async_insert_split_batch_on_failure
Rename the following table settings (with preserving backward
compatiblity, by keeping old name as an alias):
- monitor_batch_inserts -> async_insert_batch
- monitor_split_batch_on_failure -> async_insert_split_batch_on_failure
- directory_monitor_sleep_time_ms -> async_insert_sleep_time_ms
- directory_monitor_max_sleep_time_ms -> async_insert_max_sleep_time_ms
And also update all the references:
$ gg -e directory_monitor_ -e monitor_ tests docs | cut -d: -f1 | sort -u | xargs sed -e 's/distributed_directory_monitor_sleep_time_ms/distributed_async_insert_sleep_time_ms/g' -e 's/distributed_directory_monitor_max_sleep_time_ms/distributed_async_insert_max_sleep_time_ms/g' -e 's/distributed_directory_monitor_batch_inserts/distributed_async_insert_batch/g' -e 's/distributed_directory_monitor_split_batch_on_failure/distributed_async_insert_split_batch_on_failure/g' -e 's/monitor_batch_inserts/async_insert_batch/g' -e 's/monitor_split_batch_on_failure/async_insert_split_batch_on_failure/g' -e 's/monitor_sleep_time_ms/async_insert_sleep_time_ms/g' -e 's/monitor_max_sleep_time_ms/async_insert_max_sleep_time_ms/g' -i
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Rename async_insert for Distributed into background_insert
This will avoid amigibuity between general async INSERT's and INSERT
into Distributed, which are indeed background, so new term express it
even better.
Mostly done with:
$ git di HEAD^ --name-only | xargs sed -i -e 's/distributed_async_insert/distributed_background_insert/g' -e 's/async_insert_batch/background_insert_batch/g' -e 's/async_insert_split_batch_on_failure/background_insert_split_batch_on_failure/g' -e 's/async_insert_sleep_time_ms/background_insert_sleep_time_ms/g' -e 's/async_insert_max_sleep_time_ms/background_insert_max_sleep_time_ms/g'
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Mark 02417_opentelemetry_insert_on_distributed_table as long
CI: https://s3.amazonaws.com/clickhouse-test-reports/55978/7a6abb03a0b507e29e999cb7e04f246a119c6f28/stateless_tests_flaky_check__asan_.html
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
---------
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Added new type of authentication based on SSH keys. It works only for Native TCP protocol.
Co-authored-by: Nikita Mikhaylov <nikitamikhaylov@clickhouse.com>
Co-authored-by: Robert Schulze <robert@clickhouse.com>
There are lots of thread pools and simple local-vs-global is not enough
already, it is good to know which one in particular uses threads.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
There are very frequent flakiness of `test_cluster_copier` test, here is
an example of copier failures on CI [1]:
AssertionError: Instance: s0_1_0 (172.16.29.9). Info: {'ID': '5d68dcb46fdb4b0c54b7c7ba1ddde83b8f34d483bbb32abcb0c52b966444ce82', 'Running': False, 'ExitCode': 85, 'ProcessConfig': {'tty': False, 'entrypoint': '/usr/bin/clickhouse', 'arguments': ['copier', '--config', '/etc/clickhouse-server/config-copier.xml', '--task-path', '/clickhouse-copier/task_simple_4DFWYTDD49', '--task-file', '/task0_description.xml', '--task-upload-force', 'true', '--base-dir', '/var/log/clickhouse-server/copier', '--copy-fault-probability', '0.2', '--experimental-use-sample-offset', '1'], 'privileged': False, 'user': '0'}, 'OpenStdin': False, 'OpenStderr': True, 'OpenStdout': True, 'CanRemove': False, 'ContainerID': 'f356df6694b3cc09ee9830c623681626f8e8d999677c188b9fe911aa702784ca', 'DetachKeys': '', 'Pid': 84332}
assert 85 == 0
But let's look what the error it is, apparently it is UNFINISHED:
SELECT
name,
code
FROM system.errors
WHERE ((code % 256) = 85) AND (NOT remote)
SETTINGS system_events_show_zero_values = 1
┌─name─────────────────────────────┬─code─┐
│ FORMAT_IS_NOT_SUITABLE_FOR_INPUT │ 85 │
│ UNFINISHED │ 341 │
│ NO_SUCH_ERROR_CODE │ 597 │
└──────────────────────────────────┴──────┘
Let's verify:
$ grep -r UNFINISHED ./test_cluster_copier/_instances_0/s0_1_0/logs/copier/clickhouse-copier_*
./test_cluster_copier/_instances_0/s0_1_0/logs/copier/clickhouse-copier_20230206220846_368/log.log:2023.02.06 22:09:19.015251 [ 368 ] {} <Error> : virtual int DB::ClusterCopierApp::main(const std::vector<std::string> &): Code: 341. DB::Exception: Too many tries to process table cluster1.default.hits. Abort remaining execution. (UNFINISHED), Stack trace (when copying this message, always include the lines below):
And apparently that it is due to query error with fault injection:
2023.02.06 22:09:15.654724 [ 368 ] {} <Error> Application: An error occurred while processing partition 0: Code: 62. DB::Exception: Syntax error (Query): failed at position 168 ('Native'): Native. Expected one of: token, Dot, OR, AND, BETWEEN, NOT BETWEEN, LIKE, ILIKE, NOT LIKE, NOT ILIKE, IN, NOT IN, GLOBAL IN, GLOBAL NOT IN, MOD, DIV, IS NULL, IS NOT NULL, alias, AS, Comma, OFFSET, WITH TIES, BY, LIMIT, SETTINGS, UNION, EXCEPT, INTERSECT, INTO OUTFILE, FORMAT, end of query. (SYNTAX_ERROR), Stack trace (when copying this message, always include the lines below):
Example:
select x from x limit 1FORMAT Native
Syntax error: failed at position 32 ('Native'):
So fixing this should fix test_cluster_copier flakiness.
[1]: https://s3.amazonaws.com/clickhouse-test-reports/46045/bd4170e03c6af583a51d12d2c39fa775dcb9997b/integration_tests__release__[4/4].html
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>