The problem with the initial implementation #52653 was:
- OR can have multiple arguments
- It simply not correct to assume that if there are two arguments this is OK.
Consider the following example:
"WHERE (column_not_from_partition_by = 1) OR false OR false"
Will be converted to:
"WHERE false OR false"
And it will simply read nothing.
Yes, we could apply some optimization for bool, but this will not always
work, since to optimize things like "0 = 1" we need to execute it.
And the only way to make handle this correctly (with ability to ignore
some commands during filtering) is to make is_constant() function return
has it use something from the input block, so that we can be sure, that
we have some sensible, and not just "false".
Plus we cannot simply ignore the difference of the input and output
arguments of handling OR, we need to add always-true (1/true) if the
size is different, since otherwise it could break invariants (see
comment in the code).
This includes (but not limited to):
- _part* filtering for MergeTree
- _path/_file for various File/HDFS/... engines
- _table for Merge
- ...
P.S. analyzer does not have this bug, since it execute expression as
whole, and this is what filterBlockWithQuery() should do actually
instead, but this will be a more complex patch.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
(cherry picked from commit b107712e0c)
The problem with the initial implementation #52653 was:
- OR can have multiple arguments
- It simply not correct to assume that if there are two arguments this is OK.
Consider the following example:
"WHERE (column_not_from_partition_by = 1) OR false OR false"
Will be converted to:
"WHERE false OR false"
And it will simply read nothing.
Yes, we could apply some optimization for bool, but this will not always
work, since to optimize things like "0 = 1" we need to execute it.
And the only way to make handle this correctly (with ability to ignore
some commands during filtering) is to make is_constant() function return
has it use something from the input block, so that we can be sure, that
we have some sensible, and not just "false".
Plus we cannot simply ignore the difference of the input and output
arguments of handling OR, we need to add always-true (1/true) if the
size is different, since otherwise it could break invariants (see
comment in the code).
This includes (but not limited to):
- _part* filtering for MergeTree
- _path/_file for various File/HDFS/... engines
- _table for Merge
- ...
P.S. analyzer does not have this bug, since it execute expression as
whole, and this is what filterBlockWithQuery() should do actually
instead, but this will be a more complex patch.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
This should fix filtering by virtual columns with non-uint8 types, i.e.
queries like:
SELECT * FROM data WHERE 1.0
Fixes: 02346_full_text_search
Fixes: 00990_hasToken_and_tokenbf
v2: move check out from is_const to filterBlockWithQuery(), since in is_const there is no way to validate sets.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Virtual columns did not supports queries with OR, for example query like
this (here `m` is the `Merge` table, see the test):
select key from m where (value = 10 and _table = 'v1') or (value = 20 and _table = 'v1');
Will always leads to:
Cannot find column `value` in source stream, there are only columns ...
The reason for this is that it actually executes the following queries:
SELECT key, value FROM default.d1 WHERE ((value = 10) AND ('v1' = 'v1')) OR ((value = 20) AND ('v1' = 'v1'));
SELECT key FROM default.d2 WHERE 0;
And this kind of filtering is used not only for `Merge` table but also:
- `_table` for `Merge` (already mentioned)
- `_file` for `File`
- `_idx` for `S3`
- and as well as filtering `system.*` tables by `database`/`table`/...
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* save format string for NetException
* format exceptions
* format exceptions 2
* format exceptions 3
* format exceptions 4
* format exceptions 5
* format exceptions 6
* fix
* format exceptions 7
* format exceptions 8
* Update MergeTreeIndexGin.cpp
* Update AggregateFunctionMap.cpp
* Update AggregateFunctionMap.cpp
* fix