The problem with the initial implementation #52653 was:
- OR can have multiple arguments
- It is simply not correct to assume that if there are two arguments
  left, this is OK.
Consider the following example:
"WHERE (column_not_from_partition_by = 1) OR false OR false"
Will be converted to:
"WHERE false OR false"
And it will simply read nothing.
Yes, we could apply some optimization for bool, but this will not always
work, since to optimize things like "0 = 1" we need to execute them.
And the only way to handle this correctly (with the ability to ignore
some commands during filtering) is to make the is_constant() function
also return whether the expression uses something from the input block,
so that we can be sure that we have something sensible, and not just
"false".
Plus, we cannot simply ignore the difference between the number of input
and output arguments when handling OR: we need to add an always-true
(1/true) argument if the size is different, since otherwise it could
break invariants (see the comment in the code).
This includes (but is not limited to):
- _part* filtering for MergeTree
- _path/_file for various File/HDFS/... engines
- _table for Merge
- ...
P.S. The analyzer does not have this bug, since it executes the
expression as a whole, and this is what filterBlockWithQuery() should
actually do instead, but this will be a more complex patch.
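The OR handling described above can be illustrated with a small model.
This is only a sketch, not the ClickHouse code: Arg, naive_or() and
safe_or() are hypothetical names, an argument is assumed to be evaluable
when all the columns it reads are present in the block, and the
is_constant() change is not modelled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Row = Dict[str, object]


@dataclass
class Arg:
    """One OR argument: the columns it reads and how to evaluate it."""
    columns: Set[str]
    evaluate: Callable[[Row], bool]


def evaluable(arg: Arg, available: Set[str]) -> bool:
    # Assumption: an argument can be evaluated iff all its columns are known.
    return arg.columns <= available


def naive_or(args: List[Arg], available: Set[str]) -> Callable[[Row], bool]:
    """Buggy variant: drop what cannot be evaluated and OR the rest."""
    kept = [a for a in args if evaluable(a, available)]
    return lambda row: any(a.evaluate(row) for a in kept)


def safe_or(args: List[Arg], available: Set[str]) -> Callable[[Row], bool]:
    """Fixed variant: a dropped argument is replaced by always-true."""
    kept = [a for a in args if evaluable(a, available)]
    if len(kept) != len(args):
        kept.append(Arg(set(), lambda row: True))
    return lambda row: any(a.evaluate(row) for a in kept)


def filter_rows(predicate, rows):
    return [row for row in rows if predicate(row)]


# Block of virtual columns: only _part is known at this stage.
parts = [{"_part": "all_1_1_0"}, {"_part": "all_2_2_0"}]
available = {"_part"}

# WHERE (column_not_from_partition_by = 1) OR false OR false
where = [
    Arg({"column_not_from_partition_by"},
        lambda row: row["column_not_from_partition_by"] == 1),
    Arg(set(), lambda row: False),
    Arg(set(), lambda row: False),
]

naive_result = filter_rows(naive_or(where, available), parts)  # "false OR false"
safe_result = filter_rows(safe_or(where, available), parts)    # "... OR true"
```

In this model naive_result is empty (nothing would be read), while
safe_result keeps every part, so the real condition can still be
checked later against the actual data.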
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
The problem was the order of the columns: in case of SELECT FINAL it got
"counters_Map.count", "counters_Map.id",
but in case of OPTIMIZE FINAL it correctly got "counters_Map.id",
"counters_Map.count".
Note that this bug has existed since very old versions: I've checked
19.x and it was there.
P.S. There is a workaround for this problem: use one of the following
patterns for the names of key columns:
- *ID
- *Key
- *Type
That way the column will be explicitly matched as a key and everything
will work.
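A minimal sketch of the name matching used by the workaround. It
assumes the patterns are applied as case-sensitive suffix matches, and
is_explicit_key() is a hypothetical helper, not a ClickHouse function.

```python
from fnmatch import fnmatchcase

# Patterns listed above (assumed to be matched case-sensitively).
KEY_PATTERNS = ("*ID", "*Key", "*Type")


def is_explicit_key(column_name: str) -> bool:
    """Return True if the column name matches one of the key patterns."""
    return any(fnmatchcase(column_name, pattern) for pattern in KEY_PATTERNS)


# Names from the example above: neither is matched explicitly, so which
# one is treated as the key depends on something else (here: the order).
explicit_keys = [name for name in ("counters_Map.count", "counters_Map.id")
                 if is_explicit_key(name)]
```

Under that assumption "counters_Map.id" is not matched (lowercase
"id"), while a name such as "counters_Map.ID" would be.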
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
One interesting thing about the S3 C++ SDK is that it can read the file
multiple times to calculate checksums and the signature. The latter is
not done for the https protocol, though the checksum "could" be.
And indeed it is, since the default checksum algorithm (MD5) does not
support streaming, and so it is always calculated, regardless of the
protocol. However, everything else (CRC*/SHA*) supports streaming and
actually will not be calculated for https at all!
This will be fixed in the follow-up patch.
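Why a non-streaming checksum costs an extra read can be sketched as
follows. This is a simplified model, not the SDK code: it assumes
"non-streaming" means the checksum must be known before the body is
sent (e.g. as a request header), while a streaming checksum can be
updated as chunks are sent and appended at the end. CountingBody,
send_with_header_checksum() and send_with_trailing_checksum() are
hypothetical names.

```python
import base64
import hashlib
import io
import zlib


class CountingBody(io.BytesIO):
    """In-memory request body that counts full passes over its data."""

    def __init__(self, data: bytes):
        super().__init__(data)
        self.passes = 0

    def chunks(self, chunk_size: int = 4):
        # Rewind and read the whole body once, chunk by chunk.
        self.seek(0)
        self.passes += 1
        while chunk := self.read(chunk_size):
            yield chunk


def send_with_header_checksum(body: CountingBody):
    """Checksum goes first: one pass to hash, another pass to send."""
    md5 = hashlib.md5()
    for chunk in body.chunks():
        md5.update(chunk)
    header = base64.b64encode(md5.digest()).decode()
    sent = b"".join(body.chunks())
    return header, sent


def send_with_trailing_checksum(body: CountingBody):
    """Checksum goes last: it is updated while the body is being sent."""
    crc = 0
    sent = bytearray()
    for chunk in body.chunks():
        crc = zlib.crc32(chunk, crc)
        sent += chunk
    return bytes(sent), crc


data = b"some object payload"

header_body = CountingBody(data)
md5_header, _ = send_with_header_checksum(header_body)

trailer_body = CountingBody(data)
_, crc_trailer = send_with_trailing_checksum(trailer_body)
```

In this model header_body.passes ends up as 2 and trailer_body.passes
as 1, which is the extra read described above.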
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>