Added relevant video to the page.
6.3 KiB
slug | sidebar_position | sidebar_label |
---|---|---|
/en/engines/table-engines/mergetree-family/aggregatingmergetree | 60 | AggregatingMergeTree |
AggregatingMergeTree
The engine inherits from MergeTree, altering the logic for data parts merging. ClickHouse replaces all rows with the same primary key (or more accurately, with the same sorting key) with a single row (within a one data part) that stores a combination of states of aggregate functions.
You can use AggregatingMergeTree
tables for incremental data aggregation, including for aggregated materialized views.
You can see an example of how to use the AggregatingMergeTree and Aggregate functions in the below video:
The engine processes all columns with the following types:
AggregateFunction
SimpleAggregateFunction
It is appropriate to use AggregatingMergeTree
if it reduces the number of rows by orders.
Creating a Table
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
...
) ENGINE = AggregatingMergeTree()
[PARTITION BY expr]
[ORDER BY expr]
[SAMPLE BY expr]
[TTL expr]
[SETTINGS name=value, ...]
For a description of request parameters, see request description.
Query clauses
When creating an AggregatingMergeTree
table, the same clauses are required as when creating a MergeTree
table.
Deprecated Method for Creating a Table
:::note Do not use this method in new projects and, if possible, switch the old projects to the method described above. :::
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
...
) ENGINE [=] AggregatingMergeTree(date-column [, sampling_expression], (primary, key), index_granularity)
All of the parameters have the same meaning as in MergeTree
.
SELECT and INSERT
To insert data, use INSERT SELECT query with aggregate -State- functions.
When selecting data from AggregatingMergeTree
table, use GROUP BY
clause and the same aggregate functions as when inserting data, but using the -Merge
suffix.
In the results of SELECT
query, the values of AggregateFunction
type have implementation-specific binary representation for all of the ClickHouse output formats. For example, if you dump data into TabSeparated
format with a SELECT
query, then this dump can be loaded back using an INSERT
query.
Example of an Aggregated Materialized View
The following example assumes that you have a database named test
, so create it if it doesn't already exist:
CREATE DATABASE test;
Now create the table test.visits
that contains the raw data:
CREATE TABLE test.visits
(
StartDate DateTime64 NOT NULL,
CounterID UInt64,
Sign Nullable(Int32),
UserID Nullable(Int32)
) ENGINE = MergeTree ORDER BY (StartDate, CounterID);
Next, you need an AggregatingMergeTree
table that will store AggregationFunction
s that keep track of the total number of visits and the number of unique users.
Create an AggregatingMergeTree
materialized view that watches the test.visits
table, and uses the AggregateFunction
type:
CREATE TABLE test.agg_visits (
StartDate DateTime64 NOT NULL,
CounterID UInt64,
Visits AggregateFunction(sum, Nullable(Int32)),
Users AggregateFunction(uniq, Nullable(Int32))
)
ENGINE = AggregatingMergeTree() ORDER BY (StartDate, CounterID);
Create a materialized view that populates test.agg_visits
from test.visits
:
CREATE MATERIALIZED VIEW test.visits_mv TO test.agg_visits
AS SELECT
StartDate,
CounterID,
sumState(Sign) AS Visits,
uniqState(UserID) AS Users
FROM test.visits
GROUP BY StartDate, CounterID;
Insert data into the test.visits
table:
INSERT INTO test.visits (StartDate, CounterID, Sign, UserID)
VALUES (1667446031000, 1, 3, 4), (1667446031000, 1, 6, 3);
The data is inserted in both test.visits
and test.agg_visits
.
To get the aggregated data, execute a query such as SELECT ... GROUP BY ...
from the materialized view test.mv_visits
:
SELECT
StartDate,
sumMerge(Visits) AS Visits,
uniqMerge(Users) AS Users
FROM test.agg_visits
GROUP BY StartDate
ORDER BY StartDate;
┌───────────────StartDate─┬─Visits─┬─Users─┐
│ 2022-11-03 03:27:11.000 │ 9 │ 2 │
└─────────────────────────┴────────┴───────┘
Add another couple of records to test.visits
, but this time try using a different timestamp for one of the records:
INSERT INTO test.visits (StartDate, CounterID, Sign, UserID)
VALUES (1669446031000, 2, 5, 10), (1667446031000, 3, 7, 5);
Run the SELECT
query again, which will return the following output:
┌───────────────StartDate─┬─Visits─┬─Users─┐
│ 2022-11-03 03:27:11.000 │ 16 │ 3 │
│ 2022-11-26 07:00:31.000 │ 5 │ 1 │
└─────────────────────────┴────────┴───────┘