ClickHouse/docs/en/operations/monitoring.md
Dan Roscigno f4f85a069b
Go live with doc updates (#42053)
Co-authored-by: rfraposa <richraposa@gmail.com>
2022-10-04 14:36:59 +03:00

slug: /en/operations/monitoring
sidebar_position: 45
sidebar_label: Monitoring

Monitoring

import SelfManaged from '@site/docs/en/_snippets/_self_managed_only_automated.md';

You can monitor:

  • Utilization of hardware resources.
  • ClickHouse server metrics.

Resource Utilization

ClickHouse does not monitor the state of hardware resources by itself.

It is highly recommended to set up monitoring for:

  • Load and temperature on processors.

    You can use dmesg, turbostat, or other tools.

  • Utilization of the storage system, RAM, and network.
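As a rough illustration of the kind of host-level check an external agent might run, the sketch below samples CPU load and disk usage with the Python standard library. The function name resource_snapshot and the default path "/" are hypothetical; a real deployment would more likely rely on a dedicated monitoring agent, and os.getloadavg is only available on Unix-like systems.

```python
import os
import shutil


def resource_snapshot(data_path: str = "/") -> dict:
    """Sample CPU load and disk usage for the host (Unix-like systems only).

    data_path is an assumption: point it at the volume that holds the
    ClickHouse data directory on your machine.
    """
    # 1-, 5-, and 15-minute load averages as reported by the kernel.
    load_1m, _load_5m, _load_15m = os.getloadavg()
    disk = shutil.disk_usage(data_path)
    return {
        # Load normalized by CPU count; values near or above 1.0 suggest saturation.
        "load_per_cpu_1m": load_1m / (os.cpu_count() or 1),
        # Fraction of the volume that is currently in use.
        "disk_used_fraction": disk.used / disk.total,
    }


if __name__ == "__main__":
    print(resource_snapshot("/"))
```

Temperature, RAM, and network counters are not covered here; they usually require platform-specific tools such as the ones named above.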

ClickHouse Server Metrics

The ClickHouse server has built-in instruments for monitoring its own state.

To track server events, use the server logs. See the logger section of the configuration file.

ClickHouse collects:

  • Different metrics of how the server uses computational resources.
  • Common statistics on query processing.

You can find metrics in the system.metrics, system.events, and system.asynchronous_metrics tables.
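One way to read these tables from a monitoring script is to send a SELECT over the ClickHouse HTTP interface and parse the tab-separated result. The sketch below assumes the HTTP interface is reachable at a base URL such as http://localhost:8123 without authentication; the helper names are hypothetical, and the same approach should work for system.events (columns event, value) and system.asynchronous_metrics.

```python
import urllib.parse
import urllib.request

# Query against one of the tables named above; adjust the table as needed.
METRICS_QUERY = "SELECT metric, value FROM system.metrics FORMAT TabSeparated"


def build_query_url(base_url: str, query: str) -> str:
    """Build a GET URL that passes the SQL text in the `query` parameter."""
    return base_url.rstrip("/") + "/?" + urllib.parse.urlencode({"query": query})


def parse_metric_rows(tsv_text: str) -> dict:
    """Turn 'name<TAB>value' lines into a {name: float} mapping."""
    metrics = {}
    for line in tsv_text.splitlines():
        if not line:
            continue  # skip empty lines, e.g. a trailing newline
        name, value = line.split("\t", 1)
        metrics[name] = float(value)
    return metrics


def fetch_metrics(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch system.metrics from a server (base_url is an assumption)."""
    url = build_query_url(base_url, METRICS_QUERY)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metric_rows(response.read().decode("utf-8"))
```

For example, fetch_metrics("http://localhost:8123") would return a dictionary keyed by metric name, which a script can then compare against alert thresholds.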

You can configure ClickHouse to export metrics to Graphite. See the Graphite section in the ClickHouse server configuration file. Before configuring the export of metrics, set up Graphite by following its official guide.

You can configure ClickHouse to export metrics to Prometheus. See the Prometheus section in the ClickHouse server configuration file. Before configuring the export of metrics, set up Prometheus by following its official guide.

Additionally, you can monitor server availability through the HTTP API. Send an HTTP GET request to /ping. If the server is available, it responds with 200 OK.
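A minimal availability probe based on the /ping request described above might look like the following. It assumes the HTTP interface is exposed at a base URL such as http://localhost:8123; the function names and the timeout value are illustrative choices, not part of ClickHouse.

```python
import urllib.error
import urllib.request


def ping_url(base_url: str) -> str:
    """Return the /ping URL for a server base URL."""
    return base_url.rstrip("/") + "/ping"


def is_server_available(base_url: str, timeout: float = 2.0) -> bool:
    """Return True only if GET /ping answers with HTTP 200."""
    try:
        with urllib.request.urlopen(ping_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # urllib.error.URLError and HTTPError are OSError subclasses, so this
        # covers refused connections, timeouts, DNS failures, and error statuses.
        return False
```

A scheduler or load balancer could call is_server_available("http://localhost:8123") at a fixed interval and alert after several consecutive failures.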

To monitor servers in a cluster configuration, you should set the max_replica_delay_for_distributed_queries parameter and use the HTTP resource /replicas_status. A request to /replicas_status returns 200 OK if the replica is available and is not delayed behind the other replicas. If a replica is delayed, it returns 503 HTTP_SERVICE_UNAVAILABLE with information about the gap.
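The /replicas_status check can be sketched in the same way. The code below maps the two documented outcomes (200 when the replica is not delayed, 503 with a description of the gap) to a small result type; ReplicaHealth, interpret_replicas_status, and check_replica are hypothetical names, and the base URL is an assumption about where the HTTP interface listens.

```python
import urllib.error
import urllib.request
from typing import NamedTuple


class ReplicaHealth(NamedTuple):
    healthy: bool
    detail: str


def interpret_replicas_status(status: int, body: str) -> ReplicaHealth:
    """Map an HTTP status and body from /replicas_status to a health result."""
    if status == 200:
        return ReplicaHealth(True, body.strip())
    if status == 503:
        # The body carries the server's description of the replication gap.
        return ReplicaHealth(False, body.strip())
    return ReplicaHealth(False, f"unexpected HTTP status {status}")


def check_replica(base_url: str, timeout: float = 2.0) -> ReplicaHealth:
    """Query /replicas_status on one replica."""
    url = base_url.rstrip("/") + "/replicas_status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", "replace")
            return interpret_replicas_status(response.status, body)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the body is still readable.
        body = err.read().decode("utf-8", "replace")
        return interpret_replicas_status(err.code, body)
    except OSError as err:
        return ReplicaHealth(False, f"request failed: {err}")
```

Running check_replica against each replica in turn gives a per-host view of which replicas are lagging and by how much, according to the text the server returns.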