ClickHouse DBMS

ClickHouse. Just makes you think faster.

Run more queries in the same amount of time
Test more hypotheses
Slice and dice your data in many more new ways
Look at your data from new angles
Discover new dimensions

Blazing Fast

ClickHouse's performance exceeds that of comparable column-oriented DBMSs currently available on the market. A single server processes hundreds of millions to more than a billion rows, and tens of gigabytes of data, per second.

ClickHouse uses all available hardware to its full potential to process each query as fast as possible. The peak processing performance for a single query (after decompression, only used columns) stands at more than 2 terabytes per second.

Independent Benchmarks

ClickHouse: New Open Source Columnar Database by Percona
Column Store Database Benchmarks by Percona
1.1 Billion Taxi Rides on ClickHouse & an Intel Core i5 by Mark Litwintschik
ClickHouse vs Amazon RedShift Benchmark by Altinity
Geospatial processing with Clickhouse by Carto
ClickHouse and InfiniDB comparison by RamboLau (machine translation from Chinese)

Linearly Scalable

ClickHouse allows companies to add servers to their clusters when necessary without investing time or money in any additional DBMS modification. The system has been successfully serving Yandex.Metrica while the number of servers in its main production cluster has grown from 60 to 394 in two years. Those servers are located in six geographically distributed datacenters.

ClickHouse scales well both vertically and horizontally. It adapts easily to a cluster with hundreds of nodes, a single server, or even a tiny virtual machine. Currently there are installations with more than two trillion rows per single node, as well as installations with 100 TB of storage per single node.

Hardware Efficient

ClickHouse processes typical analytical queries two to three orders of magnitude faster than traditional row-oriented systems with the same available I/O throughput. The system's columnar storage format allows fitting more hot data in RAM, which leads to shorter response times.
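
One reason a columnar layout helps can be shown with simple arithmetic: a query that touches one column only has to read that column. The sketch below is an illustration under assumed numbers, not a model of ClickHouse internals. The table shape and column names are hypothetical, and compression, which widens the gap further, is ignored.

```python
import struct

# Hypothetical table: 1,000 rows with four 8-byte integer columns.
NUM_ROWS = 1000
COLUMNS = ["user_id", "timestamp", "duration_ms", "bytes_sent"]
VALUE_SIZE = struct.calcsize("<q")  # 8 bytes per value


def bytes_scanned_row_store(num_rows, num_columns):
    """A row store reads whole rows, even if the query uses one column."""
    return num_rows * num_columns * VALUE_SIZE


def bytes_scanned_column_store(num_rows, columns_used):
    """A column store reads only the columns the query references."""
    return num_rows * columns_used * VALUE_SIZE


# A query such as "sum of duration_ms" references a single column.
row_bytes = bytes_scanned_row_store(NUM_ROWS, len(COLUMNS))
col_bytes = bytes_scanned_column_store(NUM_ROWS, 1)
print(row_bytes, col_bytes)
```

With wide fact tables of hundreds of columns, the same ratio grows accordingly.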

ClickHouse minimizes the number of seeks for range queries, which makes rotational disk drives more efficient to use, because it maintains locality of reference for contiguously stored data.
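
The idea behind this can be sketched in a few lines: when data is kept sorted by key, a range query maps to one contiguous slice, so a single positioning step is followed by a sequential read. This is a simplified illustration, not ClickHouse's actual index structure. The key values and the helper name `range_slice` are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sorted key column: 0, 10, 20, ..., 990.
timestamps = list(range(0, 1000, 10))


def range_slice(sorted_keys, low, high):
    """Return [start, end) positions of keys in the closed range [low, high]."""
    start = bisect_left(sorted_keys, low)
    end = bisect_right(sorted_keys, high)
    return start, end


# All matching rows sit next to each other, so one seek is enough.
start, end = range_slice(timestamps, 200, 299)
matching = timestamps[start:end]
print(start, end, matching)
```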

ClickHouse is CPU efficient because of its vectorized query execution involving relevant processor instructions and runtime code generation.

By minimizing data transfers for most types of queries, ClickHouse enables companies to manage their data and create reports without using specialized networks that are aimed at high-performance computing.

Fault Tolerant

ClickHouse supports multi-master asynchronous replication and can be deployed across multiple datacenters. Downtime of a single node or the whole datacenter won't affect the system's availability for both reads and writes. Distributed reads are automatically balanced to live replicas to avoid increasing latency. Replicated data are synchronized automatically or semi-automatically after server downtime.

Feature Rich

ClickHouse features a user-friendly SQL query dialect with a number of built-in analytics capabilities. For example, it includes probabilistic data structures for fast and memory-efficient calculation of cardinalities and quantiles. There are functions for working with dates, times and time zones, as well as more specialized ones for handling URLs and IP addresses (both IPv4 and IPv6), and many more.
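
To give a feel for what a probabilistic cardinality structure does, here is a minimal K-minimum-values sketch. It is a generic textbook technique chosen for brevity and is not claimed to be the algorithm ClickHouse uses. The class name `KMVSketch` and the sample data are hypothetical.

```python
import hashlib
import heapq


class KMVSketch:
    """Estimate the number of distinct values while keeping only k hashes."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []        # negated hashes: heap root is the largest kept hash
        self._members = set()  # hashes currently kept, to skip duplicates

    @staticmethod
    def _hash01(value):
        # Map a value to a pseudo-uniform float in [0, 1).
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def add(self, value):
        h = self._hash01(value)
        if h in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Replace the largest kept hash with the new, smaller one.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            return len(self._heap)  # fewer than k distinct values: exact count
        kth_smallest = -self._heap[0]
        return int((self.k - 1) / kth_smallest)


# 100,000 events produced by 20,000 distinct (hypothetical) user ids.
sketch = KMVSketch(k=1024)
for event in range(100_000):
    sketch.add(f"user-{event % 20_000}")
estimated_users = sketch.estimate()
print(estimated_users)
```

Memory stays bounded by k regardless of how many events are added, at the cost of a relative error on the order of a few percent for k = 1024.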

Data organizing options available in ClickHouse, such as arrays, array joins, tuples and nested data structures, are extremely efficient for managing denormalized data.

ClickHouse can join both distributed and co-located data, as it supports local joins and distributed joins. It also supports external dictionaries, which are dimension tables loaded from an external source, for seamless joins with simple syntax.

ClickHouse supports approximate query processing: you can trade some accuracy for speed, for example by running a query on a sample of the data, which is indispensable when dealing with terabytes and petabytes of data.
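
One common way to approximate a query is to run it on a random sample of rows and scale the result. The sketch below shows that idea in plain Python. It is only an illustration of the principle, not of how ClickHouse selects or reads samples. The function name `approximate_sum`, the 10% rate and the generated data are assumptions.

```python
import random


def approximate_sum(values, sample_rate, seed=42):
    """Estimate sum(values) by reading roughly sample_rate of the rows."""
    rng = random.Random(seed)
    sampled_total = 0
    sampled_rows = 0
    for v in values:
        if rng.random() < sample_rate:
            sampled_total += v
            sampled_rows += 1
    # Scale the sampled total back up to the full data set.
    return sampled_total / sample_rate, sampled_rows


# Hypothetical column of 100,000 durations between 1 and 100.
data_rng = random.Random(7)
durations = [data_rng.randint(1, 100) for _ in range(100_000)]

exact = sum(durations)
estimate, rows_read = approximate_sum(durations, sample_rate=0.1)
relative_error = abs(estimate - exact) / exact
print(rows_read, round(relative_error, 4))
```

The estimate touches about a tenth of the rows, and its error shrinks as the sample grows.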

Conditional aggregate functions and the calculation of totals and extremes make it possible to get results with a single query instead of running several.
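
What "a single query instead of several" means can be sketched as one pass over the rows that produces grouped counts, a conditional count, grand totals and extremes together. The code below is a plain-Python analogy, not ClickHouse syntax. The row layout and field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical visit log: (country, device, duration in seconds).
rows = [
    ("ru", "mobile", 120),
    ("ru", "desktop", 300),
    ("us", "mobile", 80),
    ("us", "mobile", 200),
    ("de", "desktop", 50),
]

groups = defaultdict(lambda: {"visits": 0, "mobile_visits": 0, "duration": 0})
totals = {"visits": 0, "mobile_visits": 0, "duration": 0}

# One pass fills the per-country aggregates and the grand totals together.
for country, device, duration in rows:
    is_mobile = 1 if device == "mobile" else 0  # the "condition" of the aggregate
    for target in (groups[country], totals):
        target["visits"] += 1
        target["mobile_visits"] += is_mobile
        target["duration"] += duration

# Extremes: minimum and maximum of a result column across the groups.
durations_per_group = [g["duration"] for g in groups.values()]
extremes = {"min": min(durations_per_group), "max": max(durations_per_group)}

print(dict(groups))
print(totals)
print(extremes)
```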

Success Stories

Yandex.Metrica
CloudFlare DNS Analytics
Migrating to Yandex ClickHouse by LifeStreet (machine translation from Russian)
How to start ClickHouse up and win the jackpot by SMI2 (machine translation from Russian)
First place at Analysys OLAP algorithm contest (machine translation from Chinese)
LHCb experiment by CERN

When to use ClickHouse

For analytics over a stream of clean, well-structured and immutable events or logs. It is recommended to put each such stream into a single wide fact table with pre-joined dimensions.

Some examples of viable applications:

Web and App analytics
Advertising networks and RTB
Telecommunications
E-commerce and finance
Information security
Monitoring and telemetry
Time series
Business intelligence
Online games
Internet of Things

When NOT to use ClickHouse

Transactional workloads (OLTP)
Key-value access with high request rate
Blob or document storage
Over-normalized data

Highly Reliable

ClickHouse has been managing petabytes of data for a number of high-load, mass-audience services of Yandex, Russia's leading search provider and one of the largest European IT companies. Since 2012, ClickHouse has been providing robust database management for the company's web analytics service, comparison e-commerce platform, public email service, online advertising platform, business intelligence tools and infrastructure monitoring.

ClickHouse can be configured as a purely distributed system located on independent nodes, without any single point of failure.

Software and hardware failures or misconfigurations do not result in loss of data. Instead of deleting "broken" data, ClickHouse saves it or asks you what to do before a startup. All data is checksummed before every read or write to disk or network. It is virtually impossible to delete data by accident as there are safeguards even for human errors.
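
The checksumming idea can be sketched as follows: store a checksum next to each block on write, and verify it on read so that corruption is detected instead of silently returned. This is a minimal illustration that uses CRC32 from Python's standard library as a stand-in. The block layout, the function names and the choice of CRC32 are assumptions, not ClickHouse's on-disk format.

```python
import struct
import zlib


class ChecksumMismatch(Exception):
    """Raised when a block's stored checksum does not match its payload."""


def write_block(payload: bytes) -> bytes:
    """Prefix the payload with its 4-byte little-endian CRC32."""
    return struct.pack("<I", zlib.crc32(payload)) + payload


def read_block(block: bytes) -> bytes:
    """Verify the checksum before handing the payload back to the caller."""
    (expected,) = struct.unpack("<I", block[:4])
    payload = block[4:]
    if zlib.crc32(payload) != expected:
        raise ChecksumMismatch("block is corrupted; refusing to return data")
    return payload


block = write_block(b"2017-01-01,page_view,42")
assert read_block(block) == b"2017-01-01,page_view,42"

# Flip one bit in the last byte to simulate a disk or network error.
corrupted = block[:-1] + bytes([block[-1] ^ 0x01])
try:
    read_block(corrupted)
    detected = False
except ChecksumMismatch:
    detected = True
print(detected)
```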

ClickHouse offers flexible limits on query complexity and resource usage, which can be fine-tuned with settings. It is possible to simultaneously serve both a number of high priority low-latency requests and some long-running queries with background priority.

Simple and Handy

ClickHouse streamlines all your data processing. It's easy to use: ingest all your structured data into the system, and it is instantly available for reports. New columns for new properties or dimensions can be easily added to the system at any time without slowing it down.

ClickHouse is simple and works out of the box. As well as running on clusters of hundreds of nodes, it can be easily installed on a single server or even a virtual machine. No development experience or code-writing skills are required to install ClickHouse.

Quick Start

System requirements: Linux, x86_64 with SSE 4.2.

Install packages for Ubuntu/Debian:


sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4    # optional

sudo apt-add-repository "deb http://repo.yandex.ru/clickhouse/deb/stable/ main/"
sudo apt-get update

sudo apt-get install -y clickhouse-server clickhouse-client

sudo service clickhouse-server start
clickhouse-client

For other operating systems, the easiest way to get started is to use the official Docker images of ClickHouse. Alternatively, you can build ClickHouse from sources by following the build instructions.

After installation, proceed to the tutorial or the full documentation.

Contacts

Subscribe to the official ClickHouse blog and its counterpart in Russian.
Ask any questions on Stack Overflow or in the Google Group.
Join the Telegram chat to talk with real users in English or in Russian.

Or email the ClickHouse team at Yandex directly, for example if you are interested in commercial support.

Friendly reminder: check out the documentation in English or Russian first — maybe your question is already covered.

Like ClickHouse?

Help spread the word about it via Facebook, Twitter and LinkedIn!

ClickHouse source code is published under Apache 2.0 License. Software is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
