ClickHouse is an open-source, column-oriented database management system (DBMS) that generates analytical data reports in real time using SQL queries.
ClickHouse's performance exceeds that of comparable column-oriented DBMSs currently available on the market. A single server processes hundreds of millions to more than a billion rows, and tens of gigabytes of data, per second.
ClickHouse uses all available hardware to its full potential to process each query as fast as possible. Peak processing throughput for a single query exceeds 2 terabytes per second (measured after decompression, counting only the columns the query uses).
Unlike common data management approaches, in which vast amounts of raw data are kept in their native format as a "data lake" for any given query, ClickHouse returns results almost instantly in most cases: the data is processed faster than it takes to write the query. Detailed benchmarks by Yandex comparing ClickHouse with other database management systems are available, as are links to third-party benchmarks.
ClickHouse lets companies add servers to their clusters as needed, without investing time or money in modifying the DBMS. It has been successfully serving Yandex.Metrica while the number of servers in its main production cluster grew from 60 to 394 in two years; these servers are located in six geographically distributed datacenters.
ClickHouse scales well both vertically and horizontally. It adapts easily to run on a cluster of hundreds of nodes, on a single server, or even on a tiny virtual machine. There are currently installations with more than two trillion rows on a single node, as well as installations with 100 TB of storage per node.
ClickHouse processes typical analytical queries two to three orders of magnitude faster than traditional row-oriented systems with the same available I/O throughput. Its columnar storage format fits more hot data in RAM, which leads to shorter response times.
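To make the columnar argument concrete, the following Python sketch (not ClickHouse code) stores the same hypothetical table both row by row and column by column, then counts how many bytes a query that sums a single column has to read from each layout. The schema, field widths and row count are illustrative assumptions.

```python
import struct
from array import array

# Hypothetical schema: event_id int64, user_id int64, price float64, url padded to 64 bytes.
ROW_FORMAT = "<qqd64s"
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # 8 + 8 + 8 + 64 = 88 bytes per row

def build_tables(n_rows):
    """Return the same data as one row-oriented buffer and as per-column arrays."""
    row_store = bytearray()
    columns = {"event_id": array("q"), "user_id": array("q"), "price": array("d"), "url": []}
    for i in range(n_rows):
        url = f"https://example.com/page/{i % 100}".encode()
        row_store += struct.pack(ROW_FORMAT, i, i % 1000, float(i % 50), url)
        columns["event_id"].append(i)
        columns["user_id"].append(i % 1000)
        columns["price"].append(float(i % 50))
        columns["url"].append(url)
    return bytes(row_store), columns

def sum_price_row_store(row_store):
    """Row layout: every full row must be read to reach the price field."""
    total, bytes_read = 0.0, 0
    for _, _, price, _ in struct.iter_unpack(ROW_FORMAT, row_store):
        total += price
        bytes_read += ROW_SIZE
    return total, bytes_read

def sum_price_column_store(columns):
    """Column layout: only the contiguous price column is read."""
    price = columns["price"]
    return sum(price), len(price) * price.itemsize
```

With this schema the column layout reads 8 of every 88 bytes per row for the same answer; the wider the table and the fewer columns a query touches, the larger the difference.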
ClickHouse minimizes the number of seeks for range queries, which makes efficient use of rotational disk drives, because it maintains locality of reference for contiguously stored data.
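Range queries stay cheap when rows are stored sorted by key, because the matching rows form one contiguous run that can be read sequentially. The sketch below illustrates the idea with a sparse index that records only the first key of every fixed-size granule. ClickHouse's MergeTree tables use a sparse primary index in a similar spirit, but the granule size, function names and structure here are simplifying assumptions.

```python
from bisect import bisect_left, bisect_right

GRANULE_SIZE = 4  # rows per granule; deliberately tiny so the example is easy to follow

def build_sparse_index(sorted_keys, granule_size=GRANULE_SIZE):
    """Record the first key of every granule (one index entry per granule)."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]

def granules_for_range(index, lo, hi):
    """Return (first, last) granule numbers that may contain keys in [lo, hi]."""
    # Granules before bisect_left(index, lo) - 1 only hold keys < lo; the one just
    # before the first entry >= lo may still end with keys equal to lo.
    first = max(bisect_left(index, lo) - 1, 0)
    # Granules whose first key is > hi cannot contain matching keys.
    last = bisect_right(index, hi) - 1
    return first, last

def range_query(sorted_keys, index, lo, hi, granule_size=GRANULE_SIZE):
    """Read one contiguous slice of granules, then filter it; returns (matches, rows_read)."""
    first, last = granules_for_range(index, lo, hi)
    if last < first:
        return [], 0
    start = first * granule_size
    stop = min((last + 1) * granule_size, len(sorted_keys))
    candidate_rows = sorted_keys[start:stop]  # one sequential read, no scattered seeks
    return [k for k in candidate_rows if lo <= k <= hi], stop - start
```

For keys `0, 2, ..., 38` the query `[10, 20]` touches only two adjacent granules (8 rows) instead of all 20 rows.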
ClickHouse is CPU-efficient thanks to its vectorized query execution, which uses the relevant processor instructions, and to runtime code generation.
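Vectorized execution means that operators work on blocks of column values instead of one row at a time, which amortizes per-row interpretation overhead and lets compiled code use SIMD instructions. Python cannot show those CPU-level gains, so the sketch below only illustrates the control structure: a filter-and-sum pipeline that processes a column in fixed-size blocks, next to a row-at-a-time version for comparison. The block size and function names are assumptions for illustration.

```python
from array import array

BLOCK_SIZE = 1024  # hypothetical number of rows per block

def blocks(column, block_size=BLOCK_SIZE):
    """Yield consecutive slices (blocks) of a column."""
    for start in range(0, len(column), block_size):
        yield column[start:start + block_size]

def filter_block(values, threshold):
    """Operate on a whole block: build a selection mask, then keep the selected values."""
    mask = [v > threshold for v in values]
    return [v for v, keep in zip(values, mask) if keep]

def vectorized_sum_where(column, threshold):
    """Sum of values > threshold, computed block by block."""
    total = 0
    for block in blocks(column):
        total += sum(filter_block(block, threshold))  # one operator call per block
    return total

def row_at_a_time_sum_where(column, threshold):
    """The same query, evaluated with one interpretation step per row."""
    total = 0
    for v in column:
        if v > threshold:
            total += v
    return total
```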
By minimizing data transfers for most types of queries, ClickHouse enables companies to manage their data and create reports without using specialized networks that are aimed at high-performance computing.
ClickHouse supports multi-master asynchronous replication and can be deployed across multiple datacenters. The loss of a single node or of a whole datacenter does not affect the system's availability for reads or writes. Distributed reads are automatically balanced across live replicas to avoid increased latency. After server downtime, replicated data is synchronized automatically or semi-automatically.
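The read balancing can be pictured with a small sketch: send each read to a live replica, prefer replicas that have recorded fewer errors, and pick randomly among ties. This follows the spirit of ClickHouse's default `random` load-balancing policy, but the `Replica` class and `choose_replica` function are hypothetical and not ClickHouse APIs.

```python
import random
from dataclasses import dataclass

@dataclass
class Replica:
    """Hypothetical replica descriptor (not a ClickHouse API)."""
    host: str
    alive: bool = True
    error_count: int = 0

def choose_replica(replicas, rng=random):
    """Pick a live replica with the fewest recorded errors; break ties randomly."""
    live = [r for r in replicas if r.alive]
    if not live:
        raise RuntimeError("no live replicas available")
    fewest = min(r.error_count for r in live)
    candidates = [r for r in live if r.error_count == fewest]
    return rng.choice(candidates)
```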
ClickHouse features a user-friendly SQL dialect with a number of built-in analytics capabilities. For example, it includes probabilistic data structures for fast and memory-efficient calculation of cardinalities and quantiles. There are functions for working with dates, times and time zones, as well as specialized functions for working with URLs and IP addresses (both IPv4 and IPv6), and many more.
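As an illustration of such a probabilistic structure, here is a compact HyperLogLog cardinality estimator. ClickHouse offers HyperLogLog-based functions such as `uniqHLL12`, but this is a textbook version, not ClickHouse's implementation; the precision `p`, the BLAKE2 hash and the class name are assumptions. Its memory use is fixed at `2**p` small registers no matter how many items are added.

```python
import hashlib
import math

class HyperLogLog:
    """Textbook HyperLogLog: 2**p registers, each keeping the maximum observed rank."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Standard bias-correction constant (valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def _hash64(self, item):
        digest = hashlib.blake2b(str(item).encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, item):
        h = self._hash64(item)
        index = h >> (64 - self.p)             # first p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def estimate(self):
        harmonic = sum(2.0 ** -r for r in self.registers)
        raw = self.alpha * self.m * self.m / harmonic
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw
```

Adding the same items again never changes the estimate, because each register only keeps a maximum.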
Data organizing options available in ClickHouse, such as arrays, array joins, tuples and nested data structures, are extremely efficient for managing denormalized data.
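An array join unfolds an array column into one row per element. The Python sketch below mimics that behaviour on a list of dictionaries with made-up column names. It is meant to mirror ClickHouse's `ARRAY JOIN`, which drops rows whose array is empty, and `LEFT ARRAY JOIN`, which keeps them with a default value; treat it as an approximation rather than a specification.

```python
def array_join(rows, array_column, left=False, default=None):
    """Unfold `array_column` so that each element becomes its own output row.

    left=False drops rows with an empty array (like ARRAY JOIN);
    left=True keeps them, with `default` in place of the element (like LEFT ARRAY JOIN).
    """
    out = []
    for row in rows:
        values = row[array_column]
        if not values:
            if left:
                out.append({**row, array_column: default})
            continue
        for value in values:
            out.append({**row, array_column: value})
    return out
```

For example, a visit row with `goals = ["signup", "purchase"]` becomes two rows, one per goal, which can then be grouped and counted like any other column.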
ClickHouse can join both distributed and co-located data, since it supports local joins as well as distributed joins. It also supports external dictionaries (dimension tables loaded from an external source) for seamless joins with simple syntax.
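A join against an external dictionary is essentially a key lookup in an in-memory table loaded from another source. The sketch below loads a hypothetical dimension table from CSV text and enriches fact rows with it, roughly what a `dictGet` call does inside a ClickHouse query; the data, column names and helper functions are illustrative assumptions.

```python
import csv
import io

# Hypothetical dimension data, e.g. exported from an external database.
REGIONS_CSV = """region_id,region_name
1,Moscow
2,Saint Petersburg
3,Yekaterinburg
"""

def load_dictionary(csv_text, key, attribute):
    """Load a key -> attribute mapping, kept in memory like an external dictionary."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row[key]): row[attribute] for row in reader}

def enrich(facts, dictionary, key, attribute, default=""):
    """Add the looked-up attribute to each fact row; unknown keys get `default`."""
    return [{**fact, attribute: dictionary.get(fact[key], default)} for fact in facts]
```

Because the lookup replaces a full join, the fact table never has to be shuffled or sorted to match the dimension table.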
ClickHouse supports approximate query processing: you can trade accuracy for speed and get results as quickly as you need, which is indispensable when dealing with terabytes and petabytes of data.
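One way to trade accuracy for speed is deterministic sampling: keep only rows whose hashed key falls into a fixed fraction of the hash space, aggregate that subset, and scale the result up. ClickHouse exposes this idea through its `SAMPLE` clause on tables that define a sampling key; the Python sketch below is a simplified stand-in, with the hash function and names chosen for illustration.

```python
import hashlib

def in_sample(key, fraction):
    """Keep a key if its 64-bit hash lies in the first `fraction` of the hash space."""
    digest = hashlib.blake2b(str(key).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") < fraction * 2**64

def approximate_sum(rows, key_column, value_column, fraction):
    """Aggregate only the sampled rows, then scale up by 1 / fraction."""
    sampled = sum(row[value_column] for row in rows if in_sample(row[key_column], fraction))
    return sampled / fraction
```

Because the choice depends only on the key's hash, the same key is always either in or out of the sample, so repeated queries return the same approximate answer.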
Conditional aggregate functions and the calculation of totals and extremes let you obtain several results from a single query instead of running a number of separate queries.
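The benefit of conditional aggregates is that one pass over the data yields several filtered results at once. In ClickHouse SQL this is written with functions such as `countIf` and `sumIf`, totals can be requested with `WITH TOTALS`, and extremes with the `extremes` setting. The Python sketch below only emulates that single-pass shape on made-up event data: it groups by country, computes conditional counts and sums, and also returns overall totals and the minimum and maximum per-group revenue.

```python
from collections import defaultdict

def report(events):
    """Single pass: per-group conditional aggregates, plus totals and extremes."""
    groups = defaultdict(lambda: {"views": 0, "purchases": 0, "revenue": 0.0})
    for e in events:
        g = groups[e["country"]]
        g["views"] += int(e["type"] == "view")          # like countIf(type = 'view')
        g["purchases"] += int(e["type"] == "purchase")  # like countIf(type = 'purchase')
        if e["type"] == "purchase":
            g["revenue"] += e["amount"]                 # like sumIf(amount, type = 'purchase')
    totals = {k: sum(g[k] for g in groups.values()) for k in ("views", "purchases", "revenue")}
    revenues = [g["revenue"] for g in groups.values()]
    extremes = {"min_revenue": min(revenues), "max_revenue": max(revenues)} if revenues else {}
    return dict(groups), totals, extremes
```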
ClickHouse is intended for analytics over streams of clean, well-structured, immutable events or logs. It is recommended to put each such stream into a single wide fact table with pre-joined dimensions.
Some examples of viable applications:
ClickHouse manages petabytes of data for a number of high-load, mass-audience services at Yandex, Russia's leading search provider and one of the largest European IT companies. Since 2012, ClickHouse has provided robust database management for the company's web analytics service, e-commerce comparison platform, public email service, online advertising platform, business intelligence tools and infrastructure monitoring.
ClickHouse can be configured as a purely distributed system running on independent nodes, with no single point of failure.
Software and hardware failures or misconfigurations do not result in data loss. Instead of deleting "broken" data, ClickHouse preserves it or asks you what to do with it before starting up. All data is checksummed on every read from and write to disk or the network. It is virtually impossible to delete data by accident, as there are safeguards even against human error.
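The checksum guarantee can be sketched as follows: store a checksum alongside each block when writing, then recompute and compare it on every read, refusing to return data that does not match. The sketch uses CRC32 from Python's `zlib` and a made-up block header purely for illustration; it makes no claim about which checksum algorithm or block format ClickHouse actually uses.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # (checksum, payload length); hypothetical block header

class ChecksumError(Exception):
    """Raised when a block's stored checksum does not match its contents."""

def write_block(payload: bytes) -> bytes:
    """Prefix the payload with its CRC32 and length before it goes to disk or the network."""
    return HEADER.pack(zlib.crc32(payload), len(payload)) + payload

def read_block(block: bytes) -> bytes:
    """Verify the checksum before handing the payload to the caller."""
    checksum, length = HEADER.unpack_from(block)
    payload = block[HEADER.size:HEADER.size + length]
    if len(payload) != length or zlib.crc32(payload) != checksum:
        raise ChecksumError("block is corrupted or truncated")
    return payload
```

A corrupted block is reported instead of silently returned, leaving the decision about what to do with it to the caller.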
ClickHouse offers flexible limits on query complexity and resource usage, which can be fine-tuned with settings. It can simultaneously serve a number of high-priority, low-latency requests alongside long-running queries with background priority.
ClickHouse streamlines all your data processing. It's easy to use: ingest all your structured data into the system, and it is instantly available for reports. New columns for new properties or dimensions can be easily added to the system at any time without slowing it down.
ClickHouse is simple and works out of the box. Besides running on clusters of hundreds of nodes, it can easily be installed on a single server or even a virtual machine. Installing ClickHouse requires no development experience or coding skills.
System requirements: Linux, x86_64 with SSE 4.2.
Install packages for Ubuntu/Debian:
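```bash
# Optional: import the repository signing key
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4
# Add the ClickHouse repository, then install the server and the client
sudo apt-add-repository "deb http://repo.yandex.ru/clickhouse/deb/stable/ main/"
sudo apt-get update
sudo apt-get install -y clickhouse-server clickhouse-client
# Start the server and connect to it with the command-line client
sudo service clickhouse-server start
clickhouse-client
```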
For other operating systems, the easiest way to get started is to use the official Docker images of ClickHouse. Alternatively, you can build ClickHouse from source by following the build instructions.
After installation, proceed to the tutorial or the full documentation.
You can also email the ClickHouse team at Yandex directly, for example if you are interested in commercial support.
Friendly reminder: check the documentation in English or Russian first; your question may already be covered.
ClickHouse source code is published under Apache 2.0 License. Software is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.