Sometimes we want to push a log entry once and only once. Because it is
not possible to create a sequential node in ZooKeeper and store its name
to a well known location in the same transaction we'll do it in the
other order. First somehow generate a unique identifier, then submit a
log entry with that identifier. Later, we can search through log entries
using the identifier we provided to find the node.
Required for part movement between shards.
When the part data (e.g. data.bin) is corrupted, but the checksums.txt
is present -- explicitly deleting the checksums.txt.
Removed the extra logging, changes some exceptions message.
The original ticket idea was to search for the possibly available data
into the /detached folders for the GET_PART command, but
@tavplubix pointed out this would be quite expensive for an every
fetch.
So a new command is going to be introduced, ATTACH_PART, which will
cover ALTER TABLE ATTACH PART and only for which the search will start.
Extended OPTIMIZE ... DEDUPLICATE syntax to allow explicit (or implicit with asterisk/column transformers) list of columns to check for duplicates on.
Following syntax variants are now supported:
OPTIMIZE TABLE table DEDUPLICATE; -- the old one
OPTIMIZE TABLE table DEDUPLICATE BY *;
OPTIMIZE TABLE table DEDUPLICATE BY * EXCEPT colX;
OPTIMIZE TABLE table DEDUPLICATE BY * EXCEPT (colX, colY);
OPTIMIZE TABLE table DEDUPLICATE BY col1,col2,col3;
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex');
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex') EXCEPT colX;
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex') EXCEPT (colX, colY);
Note that * behaves just like in SELECT: MATERIALIZED, and ALIAS columns are not used for expansion.
Also, it is an error to specify empty list of columns, or write an expression that results in an empty list of columns, or deduplicate by an ALIAS column.
Column transformers other than EXCEPT are not supported.