* Limit log frequence for "Skipping send data over distributed table" message
After SYSTEM STOP DISTRIBUTED SENDS it will constantly print this
message.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Rename directory monitor concept into async INSERT
Rename the following query settings (with preserving backward
compatiblity, by keeping old name as an alias):
- distributed_directory_monitor_sleep_time_ms -> distributed_async_insert_sleep_time_ms
- distributed_directory_monitor_max_sleep_time_ms -> distributed_async_insert_max_sleep_time_ms
- distributed_directory_monitor_batch -> distributed_async_insert_batch_inserts
- distributed_directory_monitor_split_batch_on_failure -> distributed_async_insert_split_batch_on_failure
Rename the following table settings (with preserving backward
compatiblity, by keeping old name as an alias):
- monitor_batch_inserts -> async_insert_batch
- monitor_split_batch_on_failure -> async_insert_split_batch_on_failure
- directory_monitor_sleep_time_ms -> async_insert_sleep_time_ms
- directory_monitor_max_sleep_time_ms -> async_insert_max_sleep_time_ms
And also update all the references:
$ gg -e directory_monitor_ -e monitor_ tests docs | cut -d: -f1 | sort -u | xargs sed -e 's/distributed_directory_monitor_sleep_time_ms/distributed_async_insert_sleep_time_ms/g' -e 's/distributed_directory_monitor_max_sleep_time_ms/distributed_async_insert_max_sleep_time_ms/g' -e 's/distributed_directory_monitor_batch_inserts/distributed_async_insert_batch/g' -e 's/distributed_directory_monitor_split_batch_on_failure/distributed_async_insert_split_batch_on_failure/g' -e 's/monitor_batch_inserts/async_insert_batch/g' -e 's/monitor_split_batch_on_failure/async_insert_split_batch_on_failure/g' -e 's/monitor_sleep_time_ms/async_insert_sleep_time_ms/g' -e 's/monitor_max_sleep_time_ms/async_insert_max_sleep_time_ms/g' -i
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Rename async_insert for Distributed into background_insert
This will avoid amigibuity between general async INSERT's and INSERT
into Distributed, which are indeed background, so new term express it
even better.
Mostly done with:
$ git di HEAD^ --name-only | xargs sed -i -e 's/distributed_async_insert/distributed_background_insert/g' -e 's/async_insert_batch/background_insert_batch/g' -e 's/async_insert_split_batch_on_failure/background_insert_split_batch_on_failure/g' -e 's/async_insert_sleep_time_ms/background_insert_sleep_time_ms/g' -e 's/async_insert_max_sleep_time_ms/background_insert_max_sleep_time_ms/g'
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Mark 02417_opentelemetry_insert_on_distributed_table as long
CI: https://s3.amazonaws.com/clickhouse-test-reports/55978/7a6abb03a0b507e29e999cb7e04f246a119c6f28/stateless_tests_flaky_check__asan_.html
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
---------
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
The TCP interface progress has this field. This is not a super accurate
measure of server side query time, but trying to measure from the client
is even worse.
After #36425 there was a lot of confusions/problems with configuring pools - when the message was confusing, and settings need to be ajusted in several places.
See some examples in #44251, #43351, #47900, #46515.
The commit includes the following changes:
1) Introduced a unified mechanism for reading pool sizes from the configuration file(s). Previously, pool sizes were read from the Context.cpp with fallbacks to profiles, whereas main_config_reloader in Server.cpp read them directly without fallbacks.
2) Corrected the data type for background_merges_mutations_concurrency_ratio. It should be float instead of int.
3) Refactored the default values for settings. Previously, they were defined in multiple places throughout the codebase, but they are now defined in one place (or two, to be exact: Settings.h and ServerSettings.h).
4) Improved documentation, including the correct message in system.settings.
Additionally make the code more conform with #46550.
为什么图中显示的数据与结论不符合?因为图中的数据是禁用了自适应索引粒度后得到的,默认情况下索引粒度是自适应的。
https://clickhouse.com/docs/en/optimize/sparse-primary-indexes
We mentioned in the beginning of this guide in the "DDL Statement Details", that we disabled adaptive index granularity (in order to simplify the discussions in this guide, as well as make the diagrams and results reproducible).
For tables with adaptive index granularity (index granularity is adaptive by default) the size of some granules can be less than 8192 rows depending on the row data sizes.
我们在本指南开头的“DDL 语句详细信息”中提到,我们禁用了自适应索引粒度(为了简化本指南中的讨论,并使图表和结果可重现)。
对于具有自适应索引粒度的表(默认情况下索引粒度是自适应的),某些粒度的大小可以小于 8192 行,具体取决于行数据大小。
https://clickhouse.com/docs/en/whats-new/changelog/2019#experimental-features-1
ClickHouse Release 19.6.3.18, 2019-06-13
Experimental Features:实验性特性
Add setting index_granularity_bytes (adaptive index granularity) for MergeTree* tables family.
为合并树家族的表系列添加设置index_granularity_bytes(自适应索引粒度)。
ClickHouse Release 19.10.1.5, 2019-07-12
Performance Improvement:优化改进
Add the possibility to write the final mark at the end of MergeTree columns. It allows to avoid useless reads for keys that are out of table data range. It is enabled only if adaptive index granularity is in use.
添加在合并树列末尾写入最终标记的可能性。它允许避免对超出表数据范围的键进行无用的读取。仅当使用自适应索引粒度时,才会启用它。