mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-11-27 10:02:01 +00:00

Mark Needham a85886c2e0 AggregatingMergeTree: Split table creation and MV definition + add more to example

2024-03-22 16:26:43 +00:00

5.9 KiB

Raw Blame History

slug	sidebar_position	sidebar_label
/en/engines/table-engines/mergetree-family/aggregatingmergetree	60	AggregatingMergeTree

AggregatingMergeTree

The engine inherits from MergeTree, altering the logic for data parts merging. ClickHouse replaces all rows with the same primary key (or more accurately, with the same sorting key) with a single row (within a one data part) that stores a combination of states of aggregate functions.

You can use AggregatingMergeTree tables for incremental data aggregation, including for aggregated materialized views.

The engine processes all columns with the following types:

AggregateFunction

SimpleAggregateFunction

It is appropriate to use AggregatingMergeTree if it reduces the number of rows by orders.

Creating a Table

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
) ENGINE = AggregatingMergeTree()
[PARTITION BY expr]
[ORDER BY expr]
[SAMPLE BY expr]
[TTL expr]
[SETTINGS name=value, ...]

For a description of request parameters, see request description.

Query clauses

When creating an AggregatingMergeTree table the same clauses are required, as when creating a MergeTree table.

Deprecated Method for Creating a Table

:::note Do not use this method in new projects and, if possible, switch the old projects to the method described above. :::

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
) ENGINE [=] AggregatingMergeTree(date-column [, sampling_expression], (primary, key), index_granularity)

All of the parameters have the same meaning as in MergeTree.

SELECT and INSERT

To insert data, use INSERT SELECT query with aggregate -State- functions. When selecting data from AggregatingMergeTree table, use GROUP BY clause and the same aggregate functions as when inserting data, but using -Merge suffix.

In the results of SELECT query, the values of AggregateFunction type have implementation-specific binary representation for all of the ClickHouse output formats. If dump data into, for example, TabSeparated format with SELECT query then this dump can be loaded back using INSERT query.

Example of an Aggregated Materialized View

The following examples assumes that you have a database named test so make sure you create that if it doesn't already exist:

CREATE DATABASE test;

We will create the table test.visits that contain the raw data:

CREATE TABLE test.visits
 (
    StartDate DateTime64 NOT NULL,
    CounterID UInt64,
    Sign Nullable(Int32),
    UserID Nullable(Int32)
) ENGINE = MergeTree ORDER BY (StartDate, CounterID);

Next, we need to create an AggregatingMergeTree table that will store AggregationFunctions that keep track of the total number of visits and the number of unique users.

AggregatingMergeTree materialized view that watches the test.visits table, and use the AggregateFunction type:

CREATE TABLE test.agg_visits (
    StartDate DateTime64 NOT NULL,
    CounterID UInt64,
    Visits AggregateFunction(sum, Nullable(Int32)),
    Users AggregateFunction(uniq, Nullable(Int32))
)
ENGINE = AggregatingMergeTree() ORDER BY (StartDate, CounterID);

And then let's create a materialized view that populates test.agg_visits from test.visits :

CREATE MATERIALIZED VIEW test.visits_mv TO test.agg_visits
AS SELECT
    StartDate,
    CounterID,
    sumState(Sign) AS Visits,
    uniqState(UserID) AS Users
FROM test.visits
GROUP BY StartDate, CounterID;

Inserting data into the test.visits table.

INSERT INTO test.visits (StartDate, CounterID, Sign, UserID)
 VALUES (1667446031000, 1, 3, 4), (1667446031000, 1, 6, 3);

The data is inserted in both test.visits and test.agg_visits.

To get the aggregated data, we need to execute a query such as SELECT ... GROUP BY ... from the materialized view test.mv_visits:

SELECT
    StartDate,
    sumMerge(Visits) AS Visits,
    uniqMerge(Users) AS Users
FROM test.agg_visits
GROUP BY StartDate
ORDER BY StartDate;

┌───────────────StartDate─┬─Visits─┬─Users─┐
│ 2022-11-03 03:27:11.000 │      9 │     2 │
└─────────────────────────┴────────┴───────┘

And how about if we add another couple of records to test.visits, but this time we'll use a different timestamp for one of the records:

INSERT INTO test.visits (StartDate, CounterID, Sign, UserID)
 VALUES (1669446031000, 2, 5, 10), (1667446031000, 3, 7, 5);

If we then run the SELECT query again, we'll see the following output:

┌───────────────StartDate─┬─Visits─┬─Users─┐
│ 2022-11-03 03:27:11.000 │     16 │     3 │
│ 2022-11-26 07:00:31.000 │      5 │     1 │
└─────────────────────────┴────────┴───────┘

Blog: Using Aggregate Combinators in ClickHouse

5.9 KiB Raw Blame History