From ef3ab85b76d7a2ba93c5dfafa0918bf1df78a81a Mon Sep 17 00:00:00 2001
From: Vasily Nemkov
Date: Mon, 4 Sep 2023 16:57:38 +0200
Subject: [PATCH 1/2] Minor clarifications to the `OPTIMIZE ... DEDUPLICATE` docs

---
 docs/en/sql-reference/statements/optimize.md | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/docs/en/sql-reference/statements/optimize.md b/docs/en/sql-reference/statements/optimize.md
index 45d336c42f2..784428a24c7 100644
--- a/docs/en/sql-reference/statements/optimize.md
+++ b/docs/en/sql-reference/statements/optimize.md
@@ -94,8 +94,10 @@ Result:
 │           1 │             1 │     3 │             3 │
 └─────────────┴───────────────┴───────┴───────────────┘
 ```
+All following examples are executed against this state with 5 rows.

-When columns for deduplication are not specified, all of them are taken into account. Row is removed only if all values in all columns are equal to corresponding values in previous row:
+#### `DEDUPLICATE`
+When columns for deduplication are not specified, all of them are taken into account. The row is removed only if all values in all columns are equal to corresponding values in the previous row:

 ``` sql
 OPTIMIZE TABLE example FINAL DEDUPLICATE;
@@ -116,7 +118,7 @@ Result:
 │           1 │             1 │     3 │             3 │
 └─────────────┴───────────────┴───────┴───────────────┘
 ```
-
+#### `DEDUPLICATE BY *`
 When columns are specified implicitly, the table is deduplicated by all columns that are not `ALIAS` or `MATERIALIZED`. Considering the table above, these are `primary_key`, `secondary_key`, `value`, and `partition_key` columns:
 ```sql
 OPTIMIZE TABLE example FINAL DEDUPLICATE BY *;
@@ -137,7 +139,7 @@ Result:
 │           1 │             1 │     3 │             3 │
 └─────────────┴───────────────┴───────┴───────────────┘
 ```
-
+#### `DEDUPLICATE BY * EXCEPT`
 Deduplicate by all columns that are not `ALIAS` or `MATERIALIZED` and explicitly not `value`: `primary_key`, `secondary_key`, and `partition_key` columns.

 ``` sql
@@ -158,7 +160,7 @@ Result:
 │           1 │             1 │     2 │             3 │
 └─────────────┴───────────────┴───────┴───────────────┘
 ```
-
+#### `DEDUPLICATE BY `
 Deduplicate explicitly by `primary_key`, `secondary_key`, and `partition_key` columns:
 ```sql
 OPTIMIZE TABLE example FINAL DEDUPLICATE BY primary_key, secondary_key, partition_key;
@@ -178,7 +180,7 @@ Result:
 │           1 │             1 │     2 │             3 │
 └─────────────┴───────────────┴───────┴───────────────┘
 ```
-
+#### `DEDUPLICATE BY COLUMNS()`
 Deduplicate by any column matching a regex: `primary_key`, `secondary_key`, and `partition_key` columns:
 ```sql
 OPTIMIZE TABLE example FINAL DEDUPLICATE BY COLUMNS('.*_key');

From c0db042879fac1fc281771fd8b6a598967816b30 Mon Sep 17 00:00:00 2001
From: Vasily Nemkov
Date: Mon, 4 Sep 2023 17:01:10 +0200
Subject: [PATCH 2/2] Update optimize.md

---
 docs/en/sql-reference/statements/optimize.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/en/sql-reference/statements/optimize.md b/docs/en/sql-reference/statements/optimize.md
index 784428a24c7..49843eaff9a 100644
--- a/docs/en/sql-reference/statements/optimize.md
+++ b/docs/en/sql-reference/statements/optimize.md
@@ -181,7 +181,7 @@ Result:
 └─────────────┴───────────────┴───────┴───────────────┘
 ```
 #### `DEDUPLICATE BY COLUMNS()`
-Deduplicate by any column matching a regex: `primary_key`, `secondary_key`, and `partition_key` columns:
+Deduplicate by all columns matching a regex: `primary_key`, `secondary_key`, and `partition_key` columns:
 ```sql
 OPTIMIZE TABLE example FINAL DEDUPLICATE BY COLUMNS('.*_key');
 ```
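
Review note: the semantics documented in this series (a row is dropped when it equals the previous row on every selected column, and `COLUMNS('.*_key')` selects all columns matching the regex) can be modelled in a few lines of Python. This is only a rough sketch, not ClickHouse's implementation. The five sample rows are an assumption, since the patch shows only fragments of the table state, and `columns_matching` and `deduplicate` are hypothetical helper names. The sketch also assumes partial regex matching and sorts by all columns as a simplification.

```python
import re

# Column order of the example table, as named in the patched docs.
COLUMNS = ["primary_key", "secondary_key", "value", "partition_key"]

# Assumed 5-row state; the patch itself only shows parts of it.
ROWS = [
    (0, 0, 0, 0),
    (0, 0, 0, 0),
    (1, 1, 2, 2),
    (1, 1, 2, 3),
    (1, 1, 3, 3),
]


def columns_matching(pattern, columns=COLUMNS):
    """Return every column whose name matches the regex (assumed partial match)."""
    return [name for name in columns if re.search(pattern, name)]


def deduplicate(rows, by=None, columns=COLUMNS):
    """Drop a row when it equals the previously kept row on all `by` columns."""
    selected = list(columns) if by is None else list(by)
    positions = [columns.index(name) for name in selected]
    kept = []
    for row in sorted(rows):  # simplification: sort by all columns
        key = tuple(row[i] for i in positions)
        if kept and tuple(kept[-1][i] for i in positions) == key:
            continue  # same values as the previous row on the selected columns
        kept.append(row)
    return kept


if __name__ == "__main__":
    key_columns = columns_matching(".*_key")
    print(key_columns)
    print(deduplicate(ROWS))                  # models plain DEDUPLICATE
    print(deduplicate(ROWS, by=key_columns))  # models DEDUPLICATE BY COLUMNS('.*_key')
```

Under these assumptions the last surviving row is `(1, 1, 3, 3)` for plain `DEDUPLICATE` and `(1, 1, 2, 3)` when deduplicating by the `*_key` columns, which lines up with the last result rows visible in the hunks above.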