ClickHouse DBMS

ClickHouse. Just makes you think faster.

Run more queries in the same amount of time
Test more hypotheses
Slice and dice your data in many more new ways
Look at your data from new angles
Discover new dimensions

Blazing Fast

ClickHouse's performance exceeds that of comparable column-oriented DBMSs currently available on the market. It processes hundreds of millions to more than a billion rows, and tens of gigabytes of data, per server per second.

ClickHouse uses all available hardware to its full potential to process each query as fast as possible. The peak processing performance for a single query (measured after decompression, counting only the columns used) stands at more than 2 terabytes per second.

Linearly Scalable

ClickHouse allows companies to add servers to their clusters when necessary without investing time or money into additional DBMS modification. The system has been successfully serving Yandex.Metrica while the number of servers in its main cluster alone, spread across six geographically distributed datacenters, grew from 60 to 394 in two years.

ClickHouse scales well both vertically and horizontally. It runs equally well on clusters of hundreds of nodes and on a single server or even a virtual machine. It currently has installations with more than two trillion rows per node, as well as installations with 100 TB of storage per node.

Hardware Efficient

ClickHouse processes typical analytical queries two to three orders of magnitude faster than traditional row-oriented systems with the same available IO throughput. The system’s columnar format allows fitting more hot data in the server’s RAM, which leads to a shorter response time.

ClickHouse minimizes the number of seeks for range queries by continually maintaining locality of reference for stored data, which makes rotational drives more efficient to use.

ClickHouse is CPU efficient because of its vectorized query execution and runtime code generation.

By minimizing data transfers for most types of queries, ClickHouse enables companies to manage their data and create reports without needing a network built for high-performance computing.

Feature Rich

ClickHouse features a number of built-in, user-friendly web analytics capabilities, including probabilistic data structures for fast and memory-efficient calculation of cardinalities and quantiles, functions for working with URLs and IP addresses (both IPv4 and IPv6), and functions for handling dates, times and time zones.

Data management methods available in ClickHouse, such as arrays, array joins and nested data structures, are extremely efficient for managing denormalized data.

ClickHouse can join both distributed data and co-located data, as the system supports local joins and distributed joins. It can also use external dictionaries (dimension tables loaded from an external source) for seamless joins.

ClickHouse supports approximate query processing: you can get results as fast as you need them, which is indispensable when dealing with terabytes and petabytes of data.

The system’s conditional aggregate functions and its calculation of totals and extremes let you get results with a single query instead of running several.
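Some of the functions described above can be tried from the command line. The following is a minimal sketch, not an official example: it assumes clickhouse-client is installed and a server is running locally with default settings, and it queries the built-in system.numbers table. The wrapper name feature_demo and the column aliases are illustrative.

```shell
# Try an approximate distinct count, an approximate quantile and a
# conditional aggregate in one query over the first million integers.
# Assumes a local server with default settings; feature_demo is a
# hypothetical helper name.
feature_demo() {
    if ! command -v clickhouse-client >/dev/null 2>&1; then
        echo "clickhouse-client not found; install ClickHouse first"
        return 1
    fi
    clickhouse-client --query "
        SELECT
            uniq(number)            AS approx_distinct,
            quantile(0.5)(number)   AS approx_median,
            countIf(number % 2 = 0) AS even_rows
        FROM (SELECT number FROM system.numbers LIMIT 1000000)"
}

feature_demo || true
```

Here uniq and quantile return approximate results, while countIf applies a condition inside the aggregate, so all three values come back from a single pass over the data.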

Key Features

True column-oriented storage
Vectorized query execution
Data compression
Parallel and distributed query execution
Real-time query processing and data ingestion
On-disk locality of reference
Cross-datacenter replication
High availability
SQL support
Local and distributed joins
Pluggable external dimension tables
Arrays and nested data types
Approximate query processing
Probabilistic data structures
Full support of IPv6
Features for web analytics
State-of-the-art algorithms
Detailed documentation
Clean documented code

Applications

Web and App analytics
Advertising networks and RTB
Telecommunications
E-commerce
Information security
Monitoring and telemetry
Business intelligence
Online games
Internet of Things

Simple and Handy

ClickHouse streamlines all your data processing. It’s easy to use: ingest all your structured data into the system, and it is instantly available for reports. New columns for new properties or dimensions can be easily added to the system at any time without slowing it down.

ClickHouse is simple and works out of the box. As well as running on clusters of hundreds of nodes, it can be easily installed on a single server or even a virtual machine. No development experience or code-writing skills are required to install ClickHouse.

Highly Reliable

ClickHouse has been managing petabytes of data for a number of high-load, mass-audience services of Yandex, Russia’s leading search provider and one of Europe’s largest IT companies. Since 2012, ClickHouse has been providing robust database management for the company’s web analytics service, comparison shopping platform, email service, online advertising platform, business intelligence and infrastructure monitoring.

ClickHouse is a purely distributed system running on independent nodes, with no single point of failure.

Software or hardware failures or misconfigurations do not result in loss of data. Instead of deleting "broken" data, ClickHouse saves it or asks you what to do before starting up. All data is checksummed before every read or write to disk or network. It is virtually impossible to delete data by accident.

ClickHouse offers flexible limits on query complexity and resource usage, which can be fine-tuned using settings. It is possible to simultaneously serve both a number of high priority low-latency requests and some long-running queries with lowered priority.

Use Cases

ClickHouse currently powers Yandex.Metrica, the world's second-largest web analytics platform, with over 13 trillion database records and over 20 billion events a day, generating customized reports on the fly directly from non-aggregated data.

Another example is CERN’s LHCb experiment, which uses ClickHouse to store and process metadata on 10 billion events registered in 2011, with over 1000 attributes per event.

Quick Start

System requirements: Linux, x86_64 with SSE 4.2.
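The SSE 4.2 requirement can be checked before installing. This is a minimal sketch that assumes a Linux system, where CPU flags are exposed in /proc/cpuinfo; the function name check_sse42 is illustrative.

```shell
# Report whether the CPU advertises the sse4_2 flag.
# Reads /proc/cpuinfo, so on systems without that file it reports
# "not supported" even if the CPU actually has the instruction set.
check_sse42() {
    if grep -q sse4_2 /proc/cpuinfo 2>/dev/null; then
        echo "SSE 4.2 supported"
    else
        echo "SSE 4.2 not supported"
    fi
}

check_sse42
```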

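Install packages for Ubuntu 16.04 Xenial, Ubuntu 14.04 Trusty or Ubuntu 12.04 Precise. The repository line below uses the trusty path; for another release, substitute its codename (xenial or precise):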


sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4    # optional

sudo mkdir -p /etc/apt/sources.list.d
echo "deb http://repo.yandex.ru/clickhouse/trusty stable main" |
    sudo tee /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update

sudo apt-get install clickhouse-server-common clickhouse-client

sudo service clickhouse-server start
clickhouse-client
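Once the server has started, a quick smoke test can confirm that the client reaches it. This is a sketch that assumes the default local setup (server on localhost, default user without a password); the helper name smoke_test is illustrative.

```shell
# Run a trivial query against the local server in non-interactive mode.
# Prints a hint instead of failing obscurely when the client is missing.
smoke_test() {
    if ! command -v clickhouse-client >/dev/null 2>&1; then
        echo "clickhouse-client not found; install the packages first"
        return 1
    fi
    clickhouse-client --query "SELECT 1"
}

smoke_test || true
```

If the query returns a row, the server is accepting connections and the interactive clickhouse-client session should work as well.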

For other operating systems, the easiest way to get started is to use the official ClickHouse Docker images. Alternatively, you can build ClickHouse from source by following the build instructions.

After installation, proceed to the tutorial or the full documentation.

Contacts

Ask any questions on Stack Overflow.
Discuss with real users in the Telegram chat, in English or in Russian.
Use the Google Group for discussion.
Or email the ClickHouse team at Yandex.

Like ClickHouse?

Help spread the word about it on Facebook, Twitter and LinkedIn!

Software is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
