Merge pull request #38166 from ClickHouse/grouping-docs

Update GROUP BY clause docs
2024-11-10 09:32:06 +00:00 · 2022-06-18 22:12:55 +03:00 · 2022-06-18 22:12:55 +03:00 · ac0891a48d
commit ac0891a48d
parent 8c8cd6a21d 00aca924d0
1 changed files with 47 additions and 6 deletions
--- a/docs/en/sql-reference/statements/select/group-by.md
+++ b/docs/en/sql-reference/statements/select/group-by.md
@ -48,9 +48,9 @@ You can see that `GROUP BY` for `y = NULL` summed up `x`, as if `NULL` is this v

 If you pass several keys to `GROUP BY`, the result will give you all the combinations of the selection, as if `NULL` were a specific value.

-## WITH ROLLUP Modifier
+## ROLLUP Modifier

-`WITH ROLLUP` modifier is used to calculate subtotals for the key expressions, based on their order in the `GROUP BY` list. The subtotals rows are added after the result table.
+`ROLLUP` modifier is used to calculate subtotals for the key expressions, based on their order in the `GROUP BY` list. The subtotals rows are added after the result table.

 The subtotals are calculated in the reverse order: at first subtotals are calculated for the last key expression in the list, then for the previous one, and so on up to the first key expression.

@ -78,7 +78,7 @@ Consider the table t:
 Query:

 ```sql
-SELECT year, month, day, count(*) FROM t GROUP BY year, month, day WITH ROLLUP;
+SELECT year, month, day, count(*) FROM t GROUP BY ROLLUP(year, month, day);
 ```
 As `GROUP BY` section has three key expressions, the result contains four tables with subtotals "rolled up" from right to left:

@ -109,10 +109,14 @@ As `GROUP BY` section has three key expressions, the result contains four tables
 │    0 │     0 │   0 │       6 │
 └──────┴───────┴─────┴─────────┘
 ```
+The same query also can be written using `WITH` keyword.
+```sql
+SELECT year, month, day, count(*) FROM t GROUP BY year, month, day WITH ROLLUP;
+```

-## WITH CUBE Modifier
+## CUBE Modifier

-`WITH CUBE` modifier is used to calculate subtotals for every combination of the key expressions in the `GROUP BY` list. The subtotals rows are added after the result table.
+`CUBE` modifier is used to calculate subtotals for every combination of the key expressions in the `GROUP BY` list. The subtotals rows are added after the result table.

 In the subtotals rows the values of all "grouped" key expressions are set to `0` or empty line.

@ -138,7 +142,7 @@ Consider the table t:
 Query:

 ```sql
-SELECT year, month, day, count(*) FROM t GROUP BY year, month, day WITH CUBE;
+SELECT year, month, day, count(*) FROM t GROUP BY CUBE(year, month, day);
 ```

 As `GROUP BY` section has three key expressions, the result contains eight tables with subtotals for all key expression combinations:
@ -196,6 +200,10 @@ Columns, excluded from `GROUP BY`, are filled with zeros.
 │    0 │     0 │   0 │       6 │
 └──────┴───────┴─────┴─────────┘
 ```
+The same query also can be written using `WITH` keyword.
+```sql
+SELECT year, month, day, count(*) FROM t GROUP BY year, month, day WITH CUBE;
+```


 ## WITH TOTALS Modifier
@ -260,6 +268,39 @@ GROUP BY domain

 For every different key value encountered, `GROUP BY` calculates a set of aggregate function values.

+## GROUPING SETS modifier
+
+This is the most general modifier.
+This modifier allows to manually specify several aggregation key sets (grouping sets).
+Aggregation is performed separately for each grouping set, after that all results are combined.
+If a column is not presented in a grouping set, it's filled with a default value.
+
+In other words, modifiers described above can be represented via `GROUPING SETS`.
+Despite the fact that queries with `ROLLUP`, `CUBE` and `GROUPING SETS` modifiers are syntactically equal, they may have different performance.
+When `GROUPING SETS` try to execute everything in parallel, `ROLLUP` and `CUBE` are executing the final merging of the aggregates in a single thread.
+
+In the situation when source columns contain default values, it might be hard to distinguish if a row is a part of the aggregation which uses those columns as keys or not.
+To solve this problem `GROUPING` function must be used.
+
+**Example**
+
+The following two queries are equivalent.
+
+```sql
+-- Query 1
+SELECT year, month, day, count(*) FROM t GROUP BY year, month, day WITH ROLLUP;
+
+-- Query 2
+SELECT year, month, day, count(*) FROM t GROUP BY
+GROUPING SETS
+(
+    (year, month, day),
+    (year, month),
+    (year),
+    ()
+);
+```
+
 ## Implementation Details

 Aggregation is one of the most important features of a column-oriented DBMS, and thus it’s implementation is one of the most heavily optimized parts of ClickHouse. By default, aggregation is done in memory using a hash-table. It has 40+ specializations that are chosen automatically depending on “grouping key” data types.