# What is ClickHouse? {#what-is-clickhouse}

ClickHouse is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).

In a “normal” row-oriented DBMS, data is stored in this order:

| Row | WatchID     | JavaEnable | Title              | GoodEvent | EventTime           |
|-----|-------------|------------|--------------------|-----------|---------------------|
| #0  | 89354350662 | 1          | Investor Relations | 1         | 2016-05-18 05:19:20 |
| #1  | 90329509958 | 0          | Contact us         | 1         | 2016-05-18 08:10:20 |
| #2  | 89953706054 | 1          | Mission            | 1         | 2016-05-18 07:38:00 |
| #N  | ...         | ...        | ...                | ...       | ...                 |

In other words, all the values related to a row are physically stored next to each other.

Examples of a row-oriented DBMS are MySQL, Postgres, and MS SQL Server.
{: .grey }

In a column-oriented DBMS, data is stored like this:

| Row:        | #0                  | #1                  | #2                  | #N  |
|-------------|---------------------|---------------------|---------------------|-----|
| WatchID:    | 89354350662         | 90329509958         | 89953706054         | ... |
| JavaEnable: | 1                   | 0                   | 1                   | ... |
| Title:      | Investor Relations  | Contact us          | Mission             | ... |
| GoodEvent:  | 1                   | 1                   | 1                   | ... |
| EventTime:  | 2016-05-18 05:19:20 | 2016-05-18 08:10:20 | 2016-05-18 07:38:00 | ... |

These examples only show the order that data is arranged in. The values from different columns are stored separately, and data from the same column is stored together.

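To make the two layouts concrete, here is a minimal C++ sketch (an illustration of the idea, not ClickHouse's actual storage code): the row-oriented table is an array of structs, the column-oriented one a struct of arrays.

``` cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Row-oriented layout: all values of one row are adjacent in memory.
struct HitRow {
    uint64_t    watch_id;
    uint8_t     java_enable;
    std::string title;
    uint8_t     good_event;
    int64_t     event_time;  // Unix timestamp, for simplicity
};

// Column-oriented layout: all values of one column are adjacent, so a
// query touching only java_enable scans one contiguous array.
struct HitColumns {
    std::vector<uint64_t>    watch_id;
    std::vector<uint8_t>     java_enable;
    std::vector<std::string> title;
    std::vector<uint8_t>     good_event;
    std::vector<int64_t>     event_time;
};

int main() {
    HitColumns cols;
    cols.watch_id    = {89354350662ULL, 90329509958ULL, 89953706054ULL};
    cols.java_enable = {1, 0, 1};
    cols.title       = {"Investor Relations", "Contact us", "Mission"};
    cols.good_event  = {1, 1, 1};
    cols.event_time  = {1463548760, 1463559020, 1463557080};

    // "SELECT count() WHERE JavaEnable = 1" reads only one column.
    uint64_t count = 0;
    for (uint8_t v : cols.java_enable)
        count += (v == 1);
    std::printf("rows with JavaEnable = 1: %llu\n",
                static_cast<unsigned long long>(count));
    return 0;
}
```

In the column layout, the scan over `java_enable` touches a single contiguous byte array and never loads titles or timestamps into cache.
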
Examples of a column-oriented DBMS: Vertica, Paraccel (Actian Matrix and Amazon Redshift), Sybase IQ, Exasol, Infobright, InfiniDB, MonetDB (VectorWise and Actian Vector), LucidDB, SAP HANA, Google Dremel, Google PowerDrill, Druid, and kdb+.
{: .grey }

Different orders for storing data are better suited to different scenarios. The data access scenario refers to what queries are made, how often, and in what proportion; how much data is read for each type of query – rows, columns, and bytes; the relationship between reading and updating data; the working size of the data and how locally it is used; whether transactions are used, and how isolated they are; requirements for data replication and logical integrity; requirements for latency and throughput for each type of query, and so on.

The higher the load on the system, the more important it is to tailor the system setup to the requirements of the usage scenario, and the more fine-grained this tailoring becomes. No system is equally well-suited to significantly different scenarios. If a system is adaptable to a wide set of scenarios, then under a high load it will handle all of them equally poorly, or will work well for just one or a few of the possible scenarios.

## Key Properties of the OLAP scenario {#key-properties-of-the-olap-scenario}

- The vast majority of requests are for read access.
- Data is updated in fairly large batches (> 1000 rows), not by single rows; or it is not updated at all.
- Data is added to the DB but is not modified.
- For reads, quite a large number of rows are extracted from the DB, but only a small subset of columns.
- Tables are “wide,” meaning they contain a large number of columns.
- Queries are relatively rare (usually hundreds of queries per server per second, or fewer).
- For simple queries, latencies around 50 ms are allowed.
- Column values are fairly small: numbers and short strings (for example, 60 bytes per URL).
- High throughput is required when processing a single query (up to billions of rows per second per server).
- Transactions are not necessary.
- Low requirements for data consistency.
- There is one large table per query. All tables are small, except for one.
- A query result is significantly smaller than the source data. In other words, data is filtered or aggregated, so the result fits in a single server’s RAM.

It is easy to see that the OLAP scenario is very different from other popular scenarios (such as OLTP or Key-Value access). So it doesn’t make sense to try to use OLTP or a Key-Value DB for processing analytical queries if you want decent performance. For example, if you try to use MongoDB or Redis for analytics, you will get very poor performance compared to OLAP databases.

## Why Column-Oriented Databases Work Better in the OLAP Scenario {#why-column-oriented-databases-work-better-in-the-olap-scenario}

Column-oriented databases are better suited to OLAP scenarios: they are at least 100 times faster in processing most queries. The reasons are explained in detail below, but the fact is easier to demonstrate visually:

**Row-oriented DBMS**

![Row-oriented](images/row_oriented.gif)

**Column-oriented DBMS**

![Column-oriented](images/column_oriented.gif)

See the difference?

### Input/output {#inputoutput}

1.  For an analytical query, only a small number of table columns need to be read. In a column-oriented database, you can read just the data you need. For example, if you need 5 columns out of 100, you can expect a 20-fold reduction in I/O (a toy illustration follows this list).
2.  Since data is read in packets, it is easier to compress. Data in columns is also easier to compress. This further reduces the I/O volume.
3.  Due to the reduced I/O, more data fits in the system cache.

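As a toy illustration of point 1 (a deliberate simplification, not a description of ClickHouse's on-disk format; the table and column names are just examples), suppose each column of a 100-column table lives in its own file. A query then opens only the files for the columns it references:

``` cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical layout: each column of a table lives in its own file.
std::string columnPath(const std::string& table, const std::string& column) {
    return "/data/" + table + "/" + column + ".bin";  // illustrative path scheme
}

int main() {
    // A query referencing 5 of 100 columns opens 5 files; the other 95 are
    // never touched -- roughly the 20-fold I/O reduction from point 1.
    const std::vector<std::string> query_columns =
        {"CounterID", "EventTime", "URL", "Referer", "UserID"};
    for (const auto& column : query_columns)
        std::printf("read %s\n", columnPath("hits", column).c_str());
    return 0;
}
```
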
For example, the query “count the number of records for each advertising platform” requires reading one “advertising platform ID” column, which takes up 1 byte uncompressed. If most of the traffic was not from advertising platforms, you can expect at least 10-fold compression of this column. When using a quick compression algorithm, data decompression is possible at a speed of at least several gigabytes of uncompressed data per second. In other words, this query can be processed at a speed of approximately several billion rows per second on a single server. This speed is actually achieved in practice.

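The arithmetic behind that estimate can be written out directly. Here is a small C++ sketch using the illustrative figures from the paragraph above (1 byte per value, 10-fold compression, and 3 GB/s standing in for “several gigabytes per second”):

``` cpp
#include <cstdio>

int main() {
    const double rows              = 1e9;   // rows scanned by the query
    const double bytes_per_value   = 1.0;   // uncompressed "advertising platform ID"
    const double compression_ratio = 10.0;  // assumed for this column
    const double decompress_speed  = 3e9;   // bytes of uncompressed data per second

    const double bytes_from_disk = rows * bytes_per_value / compression_ratio;
    const double seconds         = rows * bytes_per_value / decompress_speed;

    std::printf("compressed data read: ~%.0f MB\n", bytes_from_disk / 1e6);   // ~100 MB
    std::printf("throughput: ~%.1f billion rows/s\n", rows / seconds / 1e9);  // ~3.0
    return 0;
}
```
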
<details markdown="1">

<summary>Example</summary>

``` bash
$ clickhouse-client
ClickHouse client version 0.0.52053.
Connecting to localhost:9000.
Connected to ClickHouse server version 0.0.52053.

:) SELECT CounterID, count() FROM hits GROUP BY CounterID ORDER BY count() DESC LIMIT 20

SELECT
    CounterID,
    count()
FROM hits
GROUP BY CounterID
ORDER BY count() DESC
LIMIT 20

┌─CounterID─┬──count()─┐
│    114208 │ 56057344 │
│    115080 │ 51619590 │
│      3228 │ 44658301 │
│     38230 │ 42045932 │
│    145263 │ 42042158 │
│     91244 │ 38297270 │
│    154139 │ 26647572 │
│    150748 │ 24112755 │
│    242232 │ 21302571 │
│    338158 │ 13507087 │
│     62180 │ 12229491 │
│     82264 │ 12187441 │
│    232261 │ 12148031 │
│    146272 │ 11438516 │
│    168777 │ 11403636 │
│   4120072 │ 11227824 │
│  10938808 │ 10519739 │
│     74088 │  9047015 │
│    115079 │  8837972 │
│    337234 │  8205961 │
└───────────┴──────────┘

20 rows in set. Elapsed: 0.153 sec. Processed 1.00 billion rows, 4.00 GB (6.53 billion rows/s., 26.10 GB/s.)

:)
```

</details>

### CPU {#cpu}

Since executing a query requires processing a large number of rows, it helps to dispatch all operations for entire vectors instead of for separate rows, or to implement the query engine so that there is almost no dispatching cost. If you don’t do this, with any half-decent disk subsystem, the query interpreter inevitably stalls the CPU. It makes sense to both store data in columns and process it, when possible, by columns.

There are two ways to do this:

1.  A vector engine. All operations are written for vectors, instead of for separate values. This means you don’t need to call operations very often, and dispatching costs are negligible. Operation code contains an optimized internal loop (sketched below).
2.  Code generation. Machine code is generated for the specific query, with all the indirect calls inlined.

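Here is a minimal C++ sketch of the dispatch-cost difference behind the first approach (an illustration, not ClickHouse's engine code): an interpreter pays one virtual call per value, while a vector engine pays one virtual call per block and spends the rest of its time in a tight loop the compiler can optimize.

``` cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Per-value dispatch: one virtual call for every single value.
struct ScalarOp {
    virtual ~ScalarOp() = default;
    virtual int64_t apply(int64_t value) const = 0;
};

struct AddConstScalar : ScalarOp {
    int64_t c;
    explicit AddConstScalar(int64_t c_) : c(c_) {}
    int64_t apply(int64_t value) const override { return value + c; }
};

// Vectorized dispatch: one virtual call per block; the inner loop runs
// over a contiguous array and can be unrolled and auto-vectorized.
struct VectorOp {
    virtual ~VectorOp() = default;
    virtual void apply(const std::vector<int64_t>& in,
                       std::vector<int64_t>& out) const = 0;
};

struct AddConstVector : VectorOp {
    int64_t c;
    explicit AddConstVector(int64_t c_) : c(c_) {}
    void apply(const std::vector<int64_t>& in,
               std::vector<int64_t>& out) const override {
        out.resize(in.size());
        for (std::size_t i = 0; i < in.size(); ++i)
            out[i] = in[i] + c;  // the optimized internal loop
    }
};

int main() {
    std::vector<int64_t> column(1 << 16, 42);  // one block of 65536 values
    std::vector<int64_t> result(column.size());

    AddConstScalar scalar(1);
    AddConstVector vec(1);

    // Interpreter style: 65536 virtual calls.
    for (std::size_t i = 0; i < column.size(); ++i)
        result[i] = scalar.apply(column[i]);

    // Vector engine style: 1 virtual call for the whole block.
    vec.apply(column, result);

    std::printf("last value: %lld\n", static_cast<long long>(result.back()));
    return 0;
}
```

With blocks of tens of thousands of values, the per-call overhead is amortized across the whole block instead of being paid for every row.
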
This is not done in “normal” databases, because it doesn’t make sense when running simple queries. However, there are exceptions. For example, MemSQL uses code generation to reduce latency when processing SQL queries. (For comparison, analytical DBMSs require optimization of throughput, not latency.)

Note that for CPU efficiency, the query language must be declarative (SQL or MDX), or at least vector-oriented (J, K). The query should only contain implicit loops, allowing for optimization.

[Original article](https://clickhouse.tech/docs/en/) <!--hide-->