Merge pull request #54257 from Enmk/patch-3

Minor clarifications to the `OPTIMIZE ... DEDUPLICATE` docs
This commit is contained in:
Igor Nikonov 2023-09-04 22:08:12 +02:00 committed by GitHub
commit 6e73dcf654
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -94,8 +94,10 @@ Result:
│ 1 │ 1 │ 3 │ 3 │
└─────────────┴───────────────┴───────┴───────────────┘
```
All following examples are executed against this state with 5 rows.
When columns for deduplication are not specified, all of them are taken into account. Row is removed only if all values in all columns are equal to corresponding values in previous row:
#### `DEDUPLICATE`
When columns for deduplication are not specified, all of them are taken into account. The row is removed only if all values in all columns are equal to corresponding values in the previous row:
``` sql
OPTIMIZE TABLE example FINAL DEDUPLICATE;
@ -116,7 +118,7 @@ Result:
│ 1 │ 1 │ 3 │ 3 │
└─────────────┴───────────────┴───────┴───────────────┘
```
#### `DEDUPLICATE BY *`
When columns are specified implicitly, the table is deduplicated by all columns that are not `ALIAS` or `MATERIALIZED`. Considering the table above, these are `primary_key`, `secondary_key`, `value`, and `partition_key` columns:
```sql
OPTIMIZE TABLE example FINAL DEDUPLICATE BY *;
@ -137,7 +139,7 @@ Result:
│ 1 │ 1 │ 3 │ 3 │
└─────────────┴───────────────┴───────┴───────────────┘
```
#### `DEDUPLICATE BY * EXCEPT`
Deduplicate by all columns that are not `ALIAS` or `MATERIALIZED` and explicitly not `value`: `primary_key`, `secondary_key`, and `partition_key` columns.
``` sql
@ -158,7 +160,7 @@ Result:
│ 1 │ 1 │ 2 │ 3 │
└─────────────┴───────────────┴───────┴───────────────┘
```
#### `DEDUPLICATE BY <list of columns>`
Deduplicate explicitly by `primary_key`, `secondary_key`, and `partition_key` columns:
```sql
OPTIMIZE TABLE example FINAL DEDUPLICATE BY primary_key, secondary_key, partition_key;
@ -178,8 +180,8 @@ Result:
│ 1 │ 1 │ 2 │ 3 │
└─────────────┴───────────────┴───────┴───────────────┘
```
Deduplicate by any column matching a regex: `primary_key`, `secondary_key`, and `partition_key` columns:
#### `DEDUPLICATE BY COLUMNS(<regex>)`
Deduplicate by all columns matching a regex: `primary_key`, `secondary_key`, and `partition_key` columns:
```sql
OPTIMIZE TABLE example FINAL DEDUPLICATE BY COLUMNS('.*_key');
```