From 8f694c1fb3a85c508ac43d959ff041aeb45a9353 Mon Sep 17 00:00:00 2001
From: Alexey Milovidov
Date: Mon, 27 Jun 2016 05:43:07 +0300
Subject: [PATCH] Tutorial: preparation [#METR-20000].

---
 doc/tutorial.html | 34 +++++++++++++++++++++++++++-------
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/doc/tutorial.html b/doc/tutorial.html
index b9ca5c70dd3..1e395a13222 100644
--- a/doc/tutorial.html
+++ b/doc/tutorial.html
@@ -166,6 +166,25 @@
     border-bottom: 1px dashed #f00;
     text-decoration: none;
 }
+
+.tip
+{
+    background-color: #EEE;
+    border: 1px solid #EEE;
+    padding: 5px 10px 5px 10px;
+}
+
+.tip b
+{
+    font-size: 150%;
+    color: #888;
+}
+
+.warranty {
+    font-size: 10pt;
+    color: #888;
+    line-height: 150%;
+}
@@ -403,14 +422,12 @@ Let assume that our aim is to provide a set of reports for each advertiser. Comm

The first is that the String data type is used in cases where an Enum or numeric type would fit better.

-Tip 1
-When the set of possible values is fixed and known to be small (e.g. OS names, browser vendors), it's recommended to use Enums or numbers to improve performance.
-When the set of possible values is not limited (search queries, URLs, etc.), just go ahead with String.
+When the set of possible values is fixed and known to be small (e.g. OS names, browser vendors), it's recommended to use Enums or numbers to improve performance.
+When the set of possible values is not limited (search queries, URLs, etc.), just go ahead with String.

The second is that the dataset contains redundant fields such as Year, Quarter, Month, DayOfMonth and DayOfWeek. In fact, a single FlightDate would be enough. Most likely they were added to improve performance in other DBMSs whose DateTime handling functions may be inefficient.

-Tip 2
-ClickHouse functions for working with DateTime fields are well optimized, so such redundancy is not required. In any case, a large number of columns is no reason to worry: ClickHouse is a column-oriented DBMS, so you can have as many fields as you need. Hundreds of columns in a table is fine for ClickHouse.
+ClickHouse functions for working with DateTime fields are well optimized, so such redundancy is not required. In any case, a large number of columns is no reason to worry: ClickHouse is a column-oriented DBMS, so you can have as many fields as you need. Hundreds of columns in a table is fine for ClickHouse.

Querying the sample dataset

@@ -580,8 +597,7 @@ Creating a distributed table providing a view into local tables of the cluster:
INSERT INTO ontime_all SELECT * FROM ontime;
-Tip 3
-It is worth noting that the approach above is not suitable for sharding large tables. Please use the built-in sharding feature instead.
+It is worth noting that the approach above is not suitable for sharding large tables.
Please use the built-in sharding feature instead.

As you would expect, heavy queries run N times faster when launched on 3 servers instead of one.

See here
@@ -667,6 +683,10 @@ ENGINE = ReplicatedMergeTree(
INSERT INTO ontime_replica SELECT * FROM ontime;

Replication operates in multi-master mode. Data can be loaded into any replica, and it will be synced with the other instances automatically. Replication is asynchronous, so at any given moment not all replicas may contain recently inserted data. To allow data insertion, at least one replica must be up. The others will sync up the data and restore consistency once they become active again. Please note that this scheme allows for the possibility of losing data that has just been inserted.

+Feedback
+Ask any questions on Stack Overflow. Use the Google Group for discussion.
Or send a private message to the developers: clickhouse-feedback@yandex-team.com.
+Software is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+