* Update of english version of descriprion of the table function `file`. * New syntax for ReplacingMergeTree. Some improvements in text. * Significantly change article about SummingMergeTree. Article is restructured, text is changed in many places of the document. New syntax for table creation is described. * Descriptions of AggregateFunction and AggregatingMergeTree are updated. Russian version. * New syntax for new syntax of CREATE TABLE * Added english docs on Aggregating, Replacing and SummingMergeTree. * CollapsingMergeTree docs. English version. * 1. Update of CollapsingMergeTree. 2. Minor changes in markup * Update aggregatefunction.md * Update aggregatefunction.md * Update aggregatefunction.md * Update aggregatingmergetree.md * GraphiteMergeTree docs update. New syntax for creation of Replicated* tables. Minor changes in *MergeTree tables creation syntax. * Markup fix * Markup and language fixes * Clarification in the CollapsingMergeTree article
3.5 KiB
AggregatingMergeTree
The engine inherits from MergeTree, altering the logic for data parts merging. ClickHouse replaces all rows with the same primary key with a single row (within a one data part) that stores a combination of states of aggregate functions.
You can use AggregatingMergeTree
tables for incremental data aggregation, including for aggregated materialized views.
The engine processes all columns with AggregateFunction type.
It is appropriate to use AggregatingMergeTree
if it reduces the number of rows by orders.
Creating a Table
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
...
) ENGINE = AggregatingMergeTree()
[PARTITION BY expr]
[ORDER BY expr]
[SAMPLE BY expr]
[SETTINGS name=value, ...]
For a description of request parameters, see request description.
Query clauses
When creating a ReplacingMergeTree
table the same clauses are required, as when creating a MergeTree
table.
Deprecated Method for Creating a Table
!!! attention Do not use this method in new projects and, if possible, switch the old projects to the method described above.
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
...
) ENGINE [=] AggregatingMergeTree(date-column [, sampling_expression], (primary, key), index_granularity)
All of the parameters have the same meaning as in MergeTree
.
SELECT and INSERT
To insert data, use INSERT SELECT query with aggregate -State
- functions.
When selecting data from AggregatingMergeTree
table, use GROUP BY
clause and the same aggregate functions as when inserting data, but using -Merge
suffix.
In the results of SELECT
query the values of AggregateFunction
type have implementation-specific binary representation for all of the ClickHouse output formats. If dump data into, for example, TabSeparated
format with SELECT
query then this dump can be loaded back using INSERT
query.
Example of an Aggregated Materialized View
AggregatingMergeTree
materialized view that watches the test.visits
table:
CREATE MATERIALIZED VIEW test.basic
ENGINE = AggregatingMergeTree() PARTITION BY toYYYYMM(StartDate) ORDER BY (CounterID, StartDate)
AS SELECT
CounterID,
StartDate,
sumState(Sign) AS Visits,
uniqState(UserID) AS Users
FROM test.visits
GROUP BY CounterID, StartDate;
Inserting of data into the test.visits
table.
INSERT INTO test.visits ...
The data are inserted in both the table and view test.basic
that will perform the aggregation.
To get the aggregated data, we need to execute a query such as SELECT ... GROUP BY ...
from the view test.basic
:
SELECT
StartDate,
sumMerge(Visits) AS Visits,
uniqMerge(Users) AS Users
FROM test.basic
GROUP BY StartDate
ORDER BY StartDate;