mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-11-30 19:42:00 +00:00
118 lines
4.1 KiB
Plaintext
ClickHouse administration tips.


CPU

SSE 4.2 instruction set support is required. Most CPUs released since 2008 support it.
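
As a minimal sketch, the SSE 4.2 requirement can be checked by looking for the sse4_2 flag in /proc/cpuinfo. The function name has_sse42 and its optional file argument are assumptions made for illustration.

```shell
# Report whether the CPU advertises SSE 4.2.
# The optional argument (a cpuinfo-style file) is a hypothetical convenience
# for trying the function on a saved copy; by default /proc/cpuinfo is read.
has_sse42() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    # -w matches sse4_2 as a whole word, so sse4_1 alone does not count.
    if grep -q -w 'sse4_2' "$cpuinfo"; then
        echo "SSE 4.2 supported"
    else
        echo "SSE 4.2 not supported"
    fi
}
```

Run has_sse42 without arguments on the target server.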

When choosing between a CPU with more cores and slightly lower frequency and a CPU with fewer cores and higher frequency, choose the first.
For example, 16 cores at 2600 MHz are better than 8 cores at 3600 MHz.


Hyper-Threading

Don't disable hyper-threading. Some queries will benefit from hyper-threading and some will not.


Turbo-Boost

Don't disable turbo-boost. It gives a significant performance gain on typical load.
Use the 'turbostat' tool to show the real CPU frequency under load.


CPU scaling governor

Always use the 'performance' scaling governor. The 'ondemand' scaling governor performs much worse, even under constantly high demand.
echo 'performance' | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
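
To see which governor each core currently uses, something like the following may help. It is only a sketch: governor_summary is a hypothetical helper, and its optional sysfs root argument exists only so the function can be pointed at a copy of the tree.

```shell
# Count how many CPUs use each scaling governor.
# The default root is the standard cpufreq location in sysfs.
governor_summary() {
    local root="${1:-/sys/devices/system/cpu}"
    # Each cpuN/cpufreq/scaling_governor file holds one governor name.
    cat "$root"/cpu*/cpufreq/scaling_governor 2>/dev/null | sort | uniq -c
}
```

After the setting is applied, the output should list only 'performance'. Empty output means the cpufreq files are not present on this machine.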


CPU throttling

Your CPU could be overheating. Use 'dmesg' to check whether it was thermally throttled.
Your CPU could be power capped in the datacenter. Use the 'turbostat' tool under load to monitor that.
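
A rough way to scan the kernel log for thermal throttling is sketched below. The throttle_events name is hypothetical, and the matched phrases are an assumption based on typical messages; the exact wording varies between kernels and CPU vendors.

```shell
# Filter kernel log lines (read from stdin) that look like throttling events.
# The patterns are a guess at common wording, not an exhaustive list.
throttle_events() {
    grep -i -E 'throttl|temperature above threshold' || true
}
```

Typical use is 'dmesg | throttle_events'. Empty output suggests no throttling was logged since boot.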


RAM

For a small amount of data (up to ~200 GB compressed), prefer to have as much RAM as the data volume.
For a larger amount of data, if you run interactive (online) queries, use a reasonable amount of RAM (128 GB or more) so that hot data fits in the page cache.
Even for data volumes of ~50 TB per server, using 128 GB of RAM is much better for query performance than 64 GB.


Swap

Disable swap. The only possible reason not to disable swap is if you are running ClickHouse on your personal laptop/desktop.
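
Whether swap is currently active can be read from /proc/swaps. The sketch below assumes the usual layout of that file (one header line, then one line per swap area); swap_status and its optional file argument are hypothetical.

```shell
# Print whether any swap area is active.
swap_status() {
    local swaps="${1:-/proc/swaps}"
    # Skip the header line; any remaining non-empty line is an active swap area.
    if [ "$(tail -n +2 "$swaps" | grep -c .)" -gt 0 ]; then
        echo "swap enabled"
    else
        echo "swap disabled"
    fi
}
```

'sudo swapoff -a' turns swap off until the next reboot. To keep it off, the swap entries also have to be removed from /etc/fstab.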


Huge pages

Disable transparent huge pages. They interfere badly with memory allocators, leading to major performance degradation.
echo 'never' | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
Use 'perf top' to monitor time spent in the kernel on memory management.
Don't allocate permanent huge pages.
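
The active transparent huge pages mode can be verified with a small helper like the one below. It is a sketch that relies on the kernel marking the active mode with square brackets; thp_mode and its optional file argument are hypothetical.

```shell
# Print the active transparent huge pages mode.
# The sysfs file lists all modes and brackets the active one,
# for example: always madvise [never]
thp_mode() {
    local f="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$f"
}
```

The expected answer is 'never'. For permanent huge pages, 'cat /proc/sys/vm/nr_hugepages' should print 0.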


Storage subsystem

If you can afford SSDs, use SSDs.
Otherwise use HDDs. 7200 RPM SATA HDDs are Ok.

Prefer more servers with local storage to fewer servers with huge disk shelves.
Of course, you could use huge disk shelves for archive storage with rare queries.


RAID

When using HDDs, you could use RAID-10, RAID-5, RAID-6 or RAID-50.
Use Linux software RAID (mdadm). It is better not to use LVM.
When creating RAID-10, choose the 'far' layout.
Prefer RAID-10 if you can afford it.

Don't use RAID-5 on more than 4 HDDs: use RAID-6 or RAID-50 instead. RAID-6 is better.
When using RAID-5, RAID-6, or RAID-50, always increase stripe_cache_size, because the default setting is awful.
echo 4096 | sudo tee /sys/block/md2/md/stripe_cache_size
The exact number is calculated from the number of devices and the chunk size: 2 * num_devices * chunk_size_in_bytes / 4096.
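
The formula above can be evaluated with plain shell arithmetic. In this sketch the function name is hypothetical; the argument names follow the formula.

```shell
# Compute stripe_cache_size as 2 * num_devices * chunk_size_in_bytes / 4096.
stripe_cache_size() {
    local num_devices="$1" chunk_size_in_bytes="$2"
    echo $(( 2 * num_devices * chunk_size_in_bytes / 4096 ))
}
```

For example, 'stripe_cache_size 8 1048576' (8 devices, 1024K chunks) prints 4096, the value written in the command above.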

A chunk size of 1024K is Ok for all RAID configurations.
Never use chunk sizes that are too small or too large.
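
Putting the RAID-10 advice together (far layout, 1024K chunk), an mdadm invocation might look like the one printed by this sketch. The helper only prints the command instead of running it. The function name and all device names are hypothetical; f2 means the far layout with two copies, and mdadm takes --chunk in KiB.

```shell
# Print (do not run) an mdadm command that creates a RAID-10 array
# with the far layout and a 1024K chunk size.
# Usage: raid10_create_cmd <md device> <member device>...
raid10_create_cmd() {
    local md="$1"
    shift
    # $# is now the number of member devices, $* their names.
    echo "mdadm --create $md --level=10 --layout=f2 --chunk=1024 --raid-devices=$# $*"
}
```

Review the printed command carefully before running it with sudo, since creating an array destroys existing data on the member devices.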

On SSDs, you could use RAID-0.
Regardless of RAID, always use replication for data safety.

Enable NCQ. Use a high queue depth. Use the CFQ scheduler for HDDs and noop for SSDs. Don't lower the readahead setting.
Enable write cache on HDDs.
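
The scheduler choice can be checked per device through sysfs. The sketch below uses two hypothetical helpers: one reads the active scheduler, the other maps the 'rotational' flag (1 for HDD, 0 for SSD) to the scheduler suggested above. Newer kernels with the multi-queue block layer expose different scheduler names, so treat the mapping as an illustration of this document's advice only.

```shell
# Print the active I/O scheduler from a file such as
# /sys/block/sda/queue/scheduler, whose content looks like: noop deadline [cfq]
active_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$1"
}

# Map the content of /sys/block/<dev>/queue/rotational to the suggested scheduler.
suggested_scheduler() {
    case "$1" in
        1) echo cfq ;;      # rotational: HDD
        0) echo noop ;;     # non-rotational: SSD
        *) echo unknown ;;
    esac
}
```

For a device named sda (an assumed name), compare 'active_scheduler /sys/block/sda/queue/scheduler' with 'suggested_scheduler "$(cat /sys/block/sda/queue/rotational)"'.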


Filesystem

Ext4 is Ok. Mount with noatime,nobarrier.
XFS is Ok too, but less tested with ClickHouse.
Most other filesystems should work fine. Filesystems with delayed allocation are better.
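
To confirm that a filesystem is mounted with the options mentioned above, its option string can be tested with a helper like this. It is a sketch: has_mount_option is hypothetical, and the data path used in the usage note is an assumption.

```shell
# Print "yes" if a comma-separated mount option string contains the given option.
has_mount_option() {
    local opts="$1" want="$2"
    # Wrap both sides in commas so that only whole options match.
    case ",$opts," in
        *",$want,"*) echo yes ;;
        *) echo no ;;
    esac
}
```

The option string can be obtained with 'findmnt -n -o OPTIONS --target /var/lib/clickhouse', where the path stands for wherever the ClickHouse data lives.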


Linux kernel

Don't use a Linux kernel that is too old. For example, in 2015, 3.18.19 is Ok.
You could use the Yandex kernel: https://github.com/yandex/smart
- it gives at least a 5% performance increase.


Network

When using IPv6, you must increase the route cache.
Linux kernels before 3.2 have awful bugs in the IPv6 implementation.
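
The text gives no concrete values for the route cache, so the sketch below only prints the current limits. It assumes that net.ipv6.route.max_size and net.ipv6.route.gc_thresh are the relevant settings; the function name and its optional directory argument are hypothetical.

```shell
# Print the current IPv6 route cache limits, if the files are readable.
ipv6_route_limits() {
    local root="${1:-/proc/sys/net/ipv6/route}"
    local name
    for name in max_size gc_thresh; do
        if [ -r "$root/$name" ]; then
            echo "$name=$(cat "$root/$name")"
        fi
    done
    return 0
}
```

A new value can then be set with 'sudo sysctl -w net.ipv6.route.max_size=<value>', where the value depends on the installation.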

Prefer at least a 10 Gbit network. 1 Gbit will also work, but it is much worse for repairing replicas with tens of terabytes of data and for processing huge distributed queries with a lot of intermediate data.


ZooKeeper

You probably already have ZooKeeper for other purposes.
It's Ok to use an existing ZooKeeper installation if it is not overloaded.

Use a recent version of ZooKeeper. At least 3.5 is Ok. The version in your Linux package repository might be outdated.

With default settings, ZooKeeper has a time bomb:
"A ZooKeeper server will not remove old snapshots and log files when using the default configuration (see autopurge below), this is the responsibility of the operator."
You need to defuse the bomb.
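
One way to defuse it is to enable ZooKeeper's autopurge through the autopurge.snapRetainCount and autopurge.purgeInterval settings in zoo.cfg. The sketch below only checks whether a given config file enables it; the function name is hypothetical and the config path is supplied by the caller.

```shell
# Report whether autopurge is enabled in a zoo.cfg-style file.
# autopurge.purgeInterval is in hours; 0 (the default) disables purging.
autopurge_status() {
    local cfg="$1" interval
    # Take the last uncommented autopurge.purgeInterval=<number> line, if any.
    interval="$(sed -n 's/^[[:space:]]*autopurge\.purgeInterval[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$cfg" | tail -n 1)"
    if [ "${interval:-0}" -gt 0 ]; then
        echo "autopurge enabled (every ${interval}h)"
    else
        echo "autopurge disabled"
    fi
}
```

A file that contains, for instance, 'autopurge.snapRetainCount=10' and 'autopurge.purgeInterval=1' is reported as enabled. These two values are illustrative, not a recommendation from the original text.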