* Limit log frequency for "Skipping send data over distributed table" message
After SYSTEM STOP DISTRIBUTED SENDS, this message is printed constantly.
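The general idea of limiting how often a repeated message is logged can be sketched as follows. This is a hypothetical Python helper (`ThrottledLogger`), not ClickHouse's C++ logging code. The interval and the "suppressed N similar messages" suffix are assumptions made for illustration.

```python
import time
from typing import Callable, Optional


class ThrottledLogger:
    """Hypothetical helper: emit a message at most once per `interval` seconds."""

    def __init__(self, emit: Callable[[str], None], interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._emit = emit          # where messages actually go (e.g. a logger method)
        self._interval = interval  # minimum number of seconds between two emitted messages
        self._clock = clock        # injectable clock, which makes the sketch easy to test
        self._last: Optional[float] = None
        self._suppressed = 0

    def log(self, message: str) -> bool:
        now = self._clock()
        if self._last is not None and now - self._last < self._interval:
            # Too soon after the previous message: count it, but do not print it.
            self._suppressed += 1
            return False
        if self._suppressed:
            message = f"{message} (suppressed {self._suppressed} similar messages)"
        self._emit(message)
        self._last = now
        self._suppressed = 0
        return True
```

For example, if sends are stopped and the message is attempted once per second for 65 seconds with a 30-second interval, only three lines are written (at t=0, 30 and 60).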
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Rename the directory monitor concept to async INSERT
Rename the following query settings, preserving backward
compatibility by keeping the old names as aliases:
- distributed_directory_monitor_sleep_time_ms -> distributed_async_insert_sleep_time_ms
- distributed_directory_monitor_max_sleep_time_ms -> distributed_async_insert_max_sleep_time_ms
- distributed_directory_monitor_batch_inserts -> distributed_async_insert_batch
- distributed_directory_monitor_split_batch_on_failure -> distributed_async_insert_split_batch_on_failure
Rename the following table settings, preserving backward
compatibility by keeping the old names as aliases:
- monitor_batch_inserts -> async_insert_batch
- monitor_split_batch_on_failure -> async_insert_split_batch_on_failure
- monitor_sleep_time_ms -> async_insert_sleep_time_ms
- monitor_max_sleep_time_ms -> async_insert_max_sleep_time_ms
Also update all references:
```
$ gg -e directory_monitor_ -e monitor_ tests docs | cut -d: -f1 | sort -u | xargs sed \
    -e 's/distributed_directory_monitor_sleep_time_ms/distributed_async_insert_sleep_time_ms/g' \
    -e 's/distributed_directory_monitor_max_sleep_time_ms/distributed_async_insert_max_sleep_time_ms/g' \
    -e 's/distributed_directory_monitor_batch_inserts/distributed_async_insert_batch/g' \
    -e 's/distributed_directory_monitor_split_batch_on_failure/distributed_async_insert_split_batch_on_failure/g' \
    -e 's/monitor_batch_inserts/async_insert_batch/g' \
    -e 's/monitor_split_batch_on_failure/async_insert_split_batch_on_failure/g' \
    -e 's/monitor_sleep_time_ms/async_insert_sleep_time_ms/g' \
    -e 's/monitor_max_sleep_time_ms/async_insert_max_sleep_time_ms/g' \
    -i
```
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Rename async_insert for Distributed to background_insert
This avoids ambiguity between general async INSERTs and INSERTs
into Distributed tables, which really are performed in the background,
so the new term describes them better.
Mostly done with:
```
$ git di HEAD^ --name-only | xargs sed -i \
    -e 's/distributed_async_insert/distributed_background_insert/g' \
    -e 's/async_insert_batch/background_insert_batch/g' \
    -e 's/async_insert_split_batch_on_failure/background_insert_split_batch_on_failure/g' \
    -e 's/async_insert_sleep_time_ms/background_insert_sleep_time_ms/g' \
    -e 's/async_insert_max_sleep_time_ms/background_insert_max_sleep_time_ms/g'
```
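Keeping an old name as an alias amounts to mapping every old spelling to the current one before the setting is looked up. The Python sketch below shows this for a subset of the renames above, with the final `background_insert` names. `ALIASES`, `canonical_name` and `normalize_settings` are hypothetical names and not ClickHouse's C++ settings code. Rejecting a setting passed under both names is a choice made for this sketch only.

```python
from typing import Dict

# Illustrative subset of old -> current names after both renames
# (query settings first, then Distributed table settings).
ALIASES: Dict[str, str] = {
    "distributed_directory_monitor_sleep_time_ms": "distributed_background_insert_sleep_time_ms",
    "distributed_directory_monitor_max_sleep_time_ms": "distributed_background_insert_max_sleep_time_ms",
    "distributed_directory_monitor_split_batch_on_failure": "distributed_background_insert_split_batch_on_failure",
    "monitor_batch_inserts": "background_insert_batch",
    "monitor_split_batch_on_failure": "background_insert_split_batch_on_failure",
}


def canonical_name(name: str) -> str:
    """Map a possibly old setting name to its current name; unknown names pass through."""
    return ALIASES.get(name, name)


def normalize_settings(settings: Dict[str, object]) -> Dict[str, object]:
    """Rewrite old names to current ones, rejecting a setting given under two names."""
    result: Dict[str, object] = {}
    for name, value in settings.items():
        new_name = canonical_name(name)
        if new_name in result:
            raise ValueError(f"setting {new_name!r} specified more than once")
        result[new_name] = value
    return result
```

Old queries and table definitions therefore keep working, while all code paths only ever see the new names.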
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Mark 02417_opentelemetry_insert_on_distributed_table as long
CI: https://s3.amazonaws.com/clickhouse-test-reports/55978/7a6abb03a0b507e29e999cb7e04f246a119c6f28/stateless_tests_flaky_check__asan_.html
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
---------
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Sometimes the requests library detects the encoding incorrectly, and
since this test compares binary data, it fails.
Here is an example of a successful attempt:
```
2023-10-30 07:32:37 [ 654 ] DEBUG : http://172.16.1.2:8123 "GET /?query=SELECT+%2A+FROM+test.simple+FORMAT+Protobuf+SETTINGS+format_schema%3D%27simple%3AKeyValuePair%27 HTTP/1.1" 200 None (connectionpool.py:546, _make_request)
2023-10-30 07:32:37 [ 654 ] DEBUG : Encoding detection: utf_8 will be used as a fallback match (api.py:480, from_bytes)
2023-10-30 07:32:37 [ 654 ] DEBUG : Encoding detection: Found utf_8 as plausible (best-candidate) for content. With 0 alternatives. (api.py:487, from_bytes)
```
And here is a failed one [1]:
```
2023-10-29 18:12:56 [ 525 ] DEBUG : http://172.16.9.2:8123 "GET /?query=SELECT+%2A+FROM+test.simple+FORMAT+Protobuf+SETTINGS+format_schema%3D%27message_tmp%3AMessageTmp%27 HTTP/1.1" 200 None (connectionpool.py:547, _make_request)
2023-10-29 18:12:56 [ 525 ] DEBUG : Encoding detection: Found utf_16_be as plausible (best-candidate) for content. With 1 alternatives. (api.py:487, from_bytes)
E AssertionError: assert '܈Ē͡扣܈Ȓͤ敦' == '\x07\x08\x01\x12\x03abc\x07\x08\x02\x12\x03def'
E - abcdef
E + ܈Ē͡扣܈Ȓͤ敦
```
[1]: https://s3.amazonaws.com/clickhouse-test-reports/56030/c7f392500e93863638c9ca9bd56c93b3193091f3/integration_tests__release__[3_4].html
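The failure can be reproduced offline from the bytes in the assertion above. The same 16 bytes decoded as `utf_16_be` give exactly the garbled string, while decoding them as UTF-8 gives the expected value. This is a minimal sketch in which `PAYLOAD` and `decode_with` are illustrative names. With requests, one way to avoid the guess (an assumption, not necessarily the change made here) is to compare `Response.content` (raw bytes) directly, or to set `Response.encoding` explicitly before reading `Response.text`.

```python
# Protobuf rows returned by the server, taken from the assertion in the log above.
PAYLOAD = b"\x07\x08\x01\x12\x03abc\x07\x08\x02\x12\x03def"


def decode_with(encoding: str, payload: bytes = PAYLOAD) -> str:
    """Decode the payload the way Response.text would after picking `encoding`."""
    return payload.decode(encoding)


# Encoding detection picked utf_16_be: every 2 bytes become one unrelated character.
garbled = decode_with("utf_16_be")

# A fixed encoding (or no decoding at all) keeps the binary data intact.
expected = decode_with("utf-8")

# Comparing raw bytes sidesteps encoding detection entirely.
assert PAYLOAD == b"\x07\x08\x01\x12\x03abc" + b"\x07\x08\x02\x12\x03def"
```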
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Previously, the least_used policy failed to detect when a disk gained
free space; it only reacted when a disk's free space decreased.
The reason is that it used a priority_queue: once a disk sinks to the
bottom of the queue, its free space is not updated until it is
selected again.
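A simplified Python model of the problem follows. `HeapLeastUsed` caches free space in a heap, like the priority_queue, and refreshes only the disk it returns. `ScanLeastUsed` re-reads free space for every disk on each call. Both class names and the rescan approach are illustrative assumptions, not the actual ClickHouse code or its fix.

```python
import heapq
from typing import Callable, List, Tuple


class HeapLeastUsed:
    """Models the old behaviour: free space is cached in a max-heap."""

    def __init__(self, free_space: Callable[[str], int], disks: List[str]) -> None:
        self._free_space = free_space
        # heapq is a min-heap, so store negated free space to pop the largest first.
        self._heap: List[Tuple[int, str]] = [(-free_space(d), d) for d in disks]
        heapq.heapify(self._heap)

    def select(self) -> str:
        _, disk = heapq.heappop(self._heap)
        # Only the chosen disk gets a fresh reading; all others keep stale values.
        heapq.heappush(self._heap, (-self._free_space(disk), disk))
        return disk


class ScanLeastUsed:
    """One possible fix: re-read the free space of every disk on each selection."""

    def __init__(self, free_space: Callable[[str], int], disks: List[str]) -> None:
        self._free_space = free_space
        self._disks = list(disks)

    def select(self) -> str:
        return max(self._disks, key=self._free_space)
```

With free space `{"a": 100, "b": 50}` both models pick `a`. If `b` is then cleaned up to 500, the heap model still returns `a` because `b`'s cached value is stale, while the scanning model returns `b`. The rescan costs O(number of disks) per selection, which is usually negligible for a volume with a handful of disks.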
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Refactor the code
* Add a new column xid for zookeeper_connection
* Support hostnames in the configuration
* Fix a typo
* Fix a bug in connect_time
* Update test case
* Fix a special build check error
* Resolve conflicts caused by rebase
* Update failed test case
* Refactor the code according to review comments
* Fix two compilation errors
Since getting the schema for CapnProto currently doesn't use a cache,
the integration test was written for the future, in case someone adds a cache.
Also, I was curious how the schema cache affects performance, so I compared
reading binary files in Protobuf format (I assume CapnProto behaves the same):
```
# Generate 1000 Protobuf files, all using the same schema.
for i in {1..1000}; do
    clickhouse-client -q \
        "select * from test.simple format Protobuf settings format_schema='/format_schemas/simple:KeyValuePair'" \
        > "simple-protobuf${i}.bin"
done
# Read them back in one query and report the elapsed time.
clickhouse-client --time -q "select * from file('simple-protobuf{1..999}.bin', 'Protobuf') format Null settings format_schema = 'simple:KeyValuePair'"
```
Reading Protobuf takes approximately the same time with and without
the cache.