---
toc_priority: 4
toc_title: Distinctive Features
---

# Distinctive Features of ClickHouse {#distinctive-features-of-clickhouse}

## True Column-Oriented DBMS {#true-column-oriented-dbms}

In a true column-oriented DBMS, no extra data is stored with the values. Among other things, this means that fixed-length values must be supported, to avoid storing their length next to the values. For example, a billion UInt8 values should consume around 1 GB uncompressed; otherwise, CPU usage is strongly affected. It is essential to store data compactly (without any “garbage”) even when uncompressed, since the speed of decompression (CPU usage) depends mainly on the volume of uncompressed data.
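
To make the size argument concrete, here is a small Python sketch (an illustration, not ClickHouse code) that compares a packed UInt8 column with a hypothetical layout storing a one-byte length header before every value. The function names are made up for this example; scaling the packed result from a million to a billion values gives the roughly 1 GB figure above.

```python
from array import array

def fixed_width_size(values):
    """Bytes used when UInt8 values are packed back to back with no per-value metadata."""
    column = array("B", values)  # 'B' = unsigned char, exactly 1 byte per item
    return len(column) * column.itemsize

def length_prefixed_size(values):
    """Bytes used by a hypothetical layout that stores a 1-byte length before each value."""
    encoded = bytearray()
    for v in values:
        encoded.append(1)  # length header: the per-value "garbage" a true columnar store avoids
        encoded.append(v)  # the UInt8 payload itself
    return len(encoded)

values = [i % 256 for i in range(1_000_000)]
print(fixed_width_size(values))      # 1000000
print(length_prefixed_size(values))  # 2000000
```

Even for this smallest type, the hypothetical header doubles the volume of data the CPU has to touch.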

This is worth noting because there are systems that can store values of different columns separately but cannot effectively process analytical queries because they are optimized for other scenarios. Examples are HBase, BigTable, Cassandra, and HyperTable. In these systems, you would get a throughput of around a hundred thousand rows per second, but not hundreds of millions of rows per second.

It’s also worth noting that ClickHouse is a database management system, not a single database. ClickHouse allows creating tables and databases at runtime, loading data, and running queries without reconfiguring or restarting the server.

## Data Compression {#data-compression}

Some column-oriented DBMSs, such as InfiniDB CE and MonetDB, do not use data compression. However, data compression plays a key role in achieving excellent performance.
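
Columnar layout is what makes compression effective: values of one column sit next to each other and tend to be similar. The sketch below compresses a hypothetical column of slowly increasing UInt32 timestamps. It uses Python's built-in `zlib` only because it ships with the standard library; ClickHouse itself defaults to LZ4, with ZSTD and specialized codecs available.

```python
import zlib
from array import array

# Hypothetical column: 100,000 UInt32 timestamps stored contiguously,
# where each second repeats 10 times (typical of event data sorted by time).
timestamps = array("I", (1_600_000_000 + i // 10 for i in range(100_000)))

raw = timestamps.tobytes()
compressed = zlib.compress(raw, level=6)  # zlib as a stand-in for LZ4/ZSTD

print(len(raw), len(compressed), round(len(raw) / len(compressed), 1))
```

Because decompression cost tracks the volume of data, reading fewer compressed bytes from disk usually outweighs the extra CPU spent decompressing them.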

## Disk Storage of Data {#disk-storage-of-data}

Keeping data physically sorted by primary key makes it possible to extract data for its specific values or value ranges with low latency, less than a few dozen milliseconds. Some column-oriented DBMSs (such as SAP HANA and Google PowerDrill) can only work in RAM. This approach encourages the allocation of a larger hardware budget than is necessary for real-time analysis. ClickHouse is designed to work on regular hard drives, which means the cost per GB of data storage is low, but SSD and additional RAM are also fully used if available.

## Parallel Processing on Multiple Cores {#parallel-processing-on-multiple-cores}

Large queries are parallelized naturally, taking all the necessary resources available on the current server.
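
Multi-core execution typically follows a split, partially aggregate, merge pattern. The sketch below shows that pattern for a SUM over one column. It is an illustration under simplifying assumptions: it uses threads so it runs anywhere, which in CPython demonstrates the structure rather than a real speedup (a process pool or native code would be needed for that).

```python
import os
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each worker aggregates its own slice of the column independently.
    return sum(chunk)

def parallel_sum(column, workers=None):
    workers = workers or os.cpu_count() or 1
    step = max(1, -(-len(column) // workers))  # ceiling division: one chunk per worker
    chunks = [column[i:i + step] for i in range(0, len(column), step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Merge step: combine the per-worker partial results into the final answer.
    return sum(partials)

column = list(range(1_000_000))
print(parallel_sum(column))  # 499999500000
```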

## Distributed Processing on Multiple Servers {#distributed-processing-on-multiple-servers}

Almost none of the columnar DBMSs mentioned above support distributed query processing.
In ClickHouse, data can reside on different shards. Each shard can be a group of replicas used for fault tolerance. All shards are used to run a query in parallel, transparently for the user.
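
For aggregations, the key to running a query on all shards in parallel is that each shard returns a mergeable intermediate state rather than a final value. The toy sketch below computes `avg(value) GROUP BY key` by keeping a `(sum, count)` state per key on each shard and merging the states on the server that received the query. The function names and data are hypothetical; the idea is similar in spirit to how ClickHouse merges intermediate aggregation states.

```python
from collections import defaultdict

def shard_partial_avg(rows):
    """Run on each shard: GROUP BY key, keeping a mergeable (sum, count) state per key."""
    state = defaultdict(lambda: [0, 0])
    for key, value in rows:
        state[key][0] += value
        state[key][1] += 1
    return dict(state)

def merge_and_finalize(partials):
    """Run on the initiator: merge per-shard states, then turn them into averages."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            merged[key][0] += s
            merged[key][1] += c
    return {key: s / c for key, (s, c) in merged.items()}

shards = [
    [("a", 10), ("a", 20), ("b", 1)],
    [("a", 60), ("b", 3), ("c", 7)],
]
result = merge_and_finalize(shard_partial_avg(rows) for rows in shards)
print(result)  # {'a': 30.0, 'b': 2.0, 'c': 7.0}
```

Averaging the per-shard averages instead would give 37.5 for key `a`, which is why the state, not the finished result, is what travels between servers.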

## SQL Support {#sql-support}

ClickHouse supports a declarative query language based on SQL that is identical to the SQL standard in many cases.
Supported queries include GROUP BY, ORDER BY, subqueries in FROM, IN, and JOIN clauses, and scalar subqueries.
Dependent subqueries and window functions are not supported.

## Vector Engine {#vector-engine}

Data is not only stored by columns but also processed by vectors (parts of columns), which yields high CPU efficiency.
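
The sketch below shows the control flow of block-at-a-time processing for `SUM(x) WHERE x > threshold`: the condition is evaluated for a whole block into a filter mask, and the aggregate then consumes only the kept values. The block size is an arbitrary assumption for illustration. In Python the inner loops are not faster than row-at-a-time code; the benefit appears in native code, where each step becomes a tight loop over a contiguous array that compilers can often auto-vectorize.

```python
from array import array

BLOCK_SIZE = 65_536  # illustrative block size chosen for this sketch

def blocks(column, block_size=BLOCK_SIZE):
    """Yield consecutive slices ("vectors") of a column."""
    for start in range(0, len(column), block_size):
        yield column[start:start + block_size]

def filtered_sum(column, threshold, block_size=BLOCK_SIZE):
    """Compute SUM(x) WHERE x > threshold, processing one block at a time."""
    total = 0
    for block in blocks(column, block_size):
        # Step 1: evaluate the WHERE condition for the whole block into a filter mask.
        mask = [x > threshold for x in block]
        # Step 2: aggregate only the values the mask keeps.
        total += sum(x for x, keep in zip(block, mask) if keep)
    return total

column = array("q", range(200_000))  # signed 64-bit integers stored contiguously
print(filtered_sum(column, 100_000))  # 14999850000
```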

## Real-time Data Updates {#real-time-data-updates}

ClickHouse supports tables with a primary key. To quickly perform range queries on the primary key, the data is sorted incrementally using the merge tree. Due to this, data can be continually added to the table. No locks are taken when new data is ingested.
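
The toy model below illustrates incremental sorting in the merge-tree style: every insert becomes a new, independently sorted part, and a later merge step combines sorted parts into a larger sorted part with a k-way merge. The class and method names are hypothetical and the model stores only primary-key values; real MergeTree parts also hold column data, and merges run in the background with their own part-selection rules.

```python
import bisect
import heapq

class ToyMergeTree:
    """Hypothetical toy model of merge-tree-style ingestion (not ClickHouse's code)."""

    def __init__(self):
        self.parts = []  # each part is a list of primary-key values, sorted

    def insert(self, keys):
        # Each insert creates a new sorted part; existing parts are never modified.
        self.parts.append(sorted(keys))

    def merge(self):
        # Background step: k-way merge of already sorted parts into one sorted part.
        if len(self.parts) > 1:
            self.parts = [list(heapq.merge(*self.parts))]

    def range_query(self, lo, hi):
        # Every part is sorted, so each one can be searched by binary search.
        result = []
        for part in self.parts:
            start = bisect.bisect_left(part, lo)
            end = bisect.bisect_right(part, hi)
            result.extend(part[start:end])
        return sorted(result)

tree = ToyMergeTree()
tree.insert([5, 1, 9])
tree.insert([3, 7])
tree.insert([2, 8, 4])
print(len(tree.parts), tree.range_query(3, 7))  # 3 [3, 4, 5, 7]
tree.merge()
print(tree.parts)  # [[1, 2, 3, 4, 5, 7, 8, 9]]
```

Queries return the same answer before and after merging; merging only reduces the number of parts a query has to visit.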

## Index {#index}

Having data physically sorted by primary key makes it possible to extract data for its specific values or value ranges with low latency, less than a few dozen milliseconds.
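
One way to exploit sorted data is a sparse index: keep only the first key of every block (granule) of rows, binary-search that small index, and scan only the granules that can contain the requested range. The sketch below assumes a granule of 8192 rows, which matches ClickHouse's default `index_granularity` setting; the function names are hypothetical.

```python
import bisect

def build_sparse_index(sorted_keys, granularity):
    """Keep only the first key of every granule of `granularity` rows."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granularity)]

def granules_for_range(index, lo, hi):
    """Return [first, last) granule numbers that may contain keys in [lo, hi]."""
    # Start at the last granule whose first key is strictly below lo: it may end with lo.
    first = max(bisect.bisect_left(index, lo) - 1, 0)
    # Stop before the first granule whose first key is already above hi.
    last = bisect.bisect_right(index, hi)
    return first, last

GRANULARITY = 8192
keys = list(range(0, 100_000, 2))  # 50,000 sorted primary-key values
index = build_sparse_index(keys, GRANULARITY)

lo, hi = 20_000, 20_100
first, last = granules_for_range(index, lo, hi)
rows = keys[first * GRANULARITY:last * GRANULARITY]  # only these rows are read
matches = [k for k in rows if lo <= k <= hi]
print((first, last), len(rows), len(matches))  # (1, 2) 8192 51
```

The index is a superset filter: it may select a granule with no matching rows, but it never skips one that has them.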

## Suitable for Online Queries {#suitable-for-online-queries}

Low latency means that queries can be processed without delay and without trying to prepare an answer in advance, right at the moment the user interface page is loading; in other words, online.

## Support for Approximated Calculations {#support-for-approximated-calculations}

ClickHouse provides various ways to trade accuracy for performance:

1.  Aggregate functions for approximate calculation of the number of distinct values, medians, and quantiles.
2.  Running a query based on a part (sample) of data and getting an approximated result. In this case, proportionally less data is retrieved from the disk.
3.  Running an aggregation for a limited number of random keys, instead of for all keys. Under certain conditions for key distribution in the data, this provides a reasonably accurate result while using fewer resources.
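
The second approach, querying a sample, can be sketched as follows: map each key through a hash to a stable number in [0, 1), keep only rows below the sampling ratio, and scale the result by `1 / ratio`. The function names and data are hypothetical; in ClickHouse this is exposed through the `SAMPLE` clause on tables declared with a sampling key. Unlike a real engine, this sketch still iterates over the skipped rows instead of avoiding reading them.

```python
import hashlib

def sample_key(user_id):
    """Map a key to a stable pseudo-random number in [0, 1), like a hashed sampling key."""
    digest = hashlib.blake2b(user_id.to_bytes(8, "little"), digest_size=8).digest()
    return int.from_bytes(digest, "little") / 2**64

def approximate_count(rows, predicate, ratio):
    """Count matching rows among those whose sampling key is below `ratio`, then scale up."""
    sampled = 0
    for user_id, value in rows:
        if sample_key(user_id) < ratio and predicate(value):
            sampled += 1
    return round(sampled / ratio)

rows = [(uid, uid % 7) for uid in range(200_000)]
exact = sum(1 for _, v in rows if v == 3)
estimate = approximate_count(rows, lambda v: v == 3, ratio=0.1)
print(exact, estimate)
```

Because the sample is chosen by hashing the key rather than at random per query, repeating the query returns the same estimate.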

## Data Replication and Data Integrity Support {#data-replication-and-data-integrity-support}

ClickHouse uses asynchronous multi-master replication. After data is written to any available replica, all the remaining replicas retrieve their copy in the background. The system maintains identical data on different replicas. Recovery after most failures is performed automatically, or semi-automatically in complex cases.
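
The toy model below sketches asynchronous multi-master replication: any replica accepts a write and records it in a shared log, and the other replicas later catch up by fetching the parts they are missing. The classes are hypothetical and ignore failures and ordering conflicts; in ClickHouse, replicated tables coordinate this log through ZooKeeper and fetch parts from each other.

```python
class SharedLog:
    """Hypothetical stand-in for the coordination log that replicas read from."""

    def __init__(self):
        self.entries = []  # (part_name, source_replica) in commit order

class Replica:
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.parts = {}   # part_name -> rows stored locally
        self.applied = 0  # number of log entries this replica has processed

    def insert(self, part_name, rows):
        # Multi-master: any replica accepts the write, stores it, and logs it.
        self.parts[part_name] = list(rows)
        self.log.entries.append((part_name, self.name))

    def sync(self, replicas):
        # Background fetch: copy every logged part this replica does not have yet.
        while self.applied < len(self.log.entries):
            part_name, source = self.log.entries[self.applied]
            if part_name not in self.parts:
                self.parts[part_name] = list(replicas[source].parts[part_name])
            self.applied += 1

log = SharedLog()
replicas = {name: Replica(name, log) for name in ("r1", "r2", "r3")}
replicas["r1"].insert("part_1", [1, 2, 3])
replicas["r2"].insert("part_2", [4, 5])
print({n: sorted(r.parts) for n, r in replicas.items()})
# {'r1': ['part_1'], 'r2': ['part_2'], 'r3': []}
for replica in replicas.values():
    replica.sync(replicas)
```

Until the background sync runs, a replica can briefly lack recent data; that window is what makes the replication asynchronous.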

For more information, see the section [Data replication](../engines/table_engines/mergetree_family/replication.md).

[Original article](https://clickhouse.tech/docs/en/introduction/distinctive_features/) <!--hide-->