Commit Graph

6 Commits

Author SHA1 Message Date
Ivan Lezhankin
9d70b9e520 Add more tests to skip-list 2020-12-21 17:04:52 +03:00
Vasily Nemkov
bf8c7cd685 Checking that columns from PARTITION BY are present in DEDUPLICATE BY 2020-12-15 13:41:00 +03:00
Vasily Nemkov
a2f85a03f3 Enforcing all sorting keys to be present in DEDUPLICATE BY columns
Updated test and minor cleanup
2020-12-09 18:08:37 +03:00
Vasily Nemkov
168155eeec Minor: cleanup 2020-12-07 18:07:40 +03:00
Vasily Nemkov
f01a566646 Updated tests 2020-12-07 17:42:49 +03:00
Vasily Nemkov
70ea507dae OPTIMIZE DEDUPLICATE BY columns
Extended the OPTIMIZE ... DEDUPLICATE syntax to accept an explicit list of columns to check for duplicates on, or an implicit one given with an asterisk or column transformers.

The following syntax variants are now supported:

OPTIMIZE TABLE table DEDUPLICATE; -- the old one
OPTIMIZE TABLE table DEDUPLICATE BY *;
OPTIMIZE TABLE table DEDUPLICATE BY * EXCEPT colX;
OPTIMIZE TABLE table DEDUPLICATE BY * EXCEPT (colX, colY);
OPTIMIZE TABLE table DEDUPLICATE BY col1,col2,col3;
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex');
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex') EXCEPT colX;
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex') EXCEPT (colX, colY);

Note that * behaves just like in SELECT: MATERIALIZED and ALIAS columns are not used for expansion.
Also, it is an error to specify an empty list of columns, to write an expression that results in an empty list of columns, or to deduplicate by an ALIAS column.
Column transformers other than EXCEPT are not supported.
2020-12-07 09:44:07 +03:00
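
The column-resolution rules described in these commit messages can be modeled in a few lines. What follows is a minimal Python sketch, not ClickHouse's actual implementation: `Column` and `resolve_dedup_columns` are hypothetical names, `COLUMNS('regex')` is approximated with Python's `re.search`, and the sketch assumes that the regex form, like `*`, only considers ordinary columns. The check that sorting-key and PARTITION BY columns must be present follows the two later commits (a2f85a03f3, bf8c7cd685).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical column descriptor; kind is ORDINARY, MATERIALIZED or ALIAS."""
    name: str
    kind: str = "ORDINARY"


def resolve_dedup_columns(table_columns, required_keys, selector, excluded=()):
    """Model how a DEDUPLICATE BY clause could resolve to a column list.

    selector is "*" (asterisk), a compiled regex (stands in for COLUMNS('...')),
    or an explicit list of column names. excluded models EXCEPT (...).
    required_keys are the sorting-key and PARTITION BY columns.
    """
    known = {c.name: c for c in table_columns}
    excluded = set(excluded)

    if selector == "*":
        # As in SELECT: MATERIALIZED and ALIAS columns are not expanded.
        names = [c.name for c in table_columns if c.kind == "ORDINARY"]
    elif isinstance(selector, re.Pattern):
        # Assumption: the regex form also expands to ordinary columns only.
        names = [c.name for c in table_columns
                 if c.kind == "ORDINARY" and selector.search(c.name)]
    else:
        names = list(selector)
        unknown = [n for n in names if n not in known]
        if unknown:
            raise ValueError(f"unknown columns: {unknown}")

    # Apply EXCEPT, the only transformer the commit message says is supported.
    names = [n for n in names if n not in excluded]

    if not names:
        raise ValueError("DEDUPLICATE BY resolved to an empty column list")

    aliases = [n for n in names if known[n].kind == "ALIAS"]
    if aliases:
        raise ValueError(f"cannot deduplicate by ALIAS columns: {aliases}")

    missing = [k for k in required_keys if k not in names]
    if missing:
        raise ValueError(f"key columns missing from DEDUPLICATE BY: {missing}")

    return names


if __name__ == "__main__":
    # Hypothetical table: pk and sk form the sorting key.
    table = [Column("pk"), Column("sk"), Column("val"),
             Column("mat", "MATERIALIZED"), Column("al", "ALIAS")]
    print(resolve_dedup_columns(table, ["pk", "sk"], "*", excluded=["val"]))
```

In this model a MATERIALIZED column can still be named explicitly, since the commit message only rules out ALIAS columns; whether the server accepts that in every case is not stated in the log.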