Commit Graph

48 Commits

Author SHA1 Message Date
Azat Khuzhin
96f2a46a66 Fix filtering by virtual columns with OR filter in query
The problem with the initial implementation #52653 was:
- OR can have multiple arguments
- It is simply not correct to assume that if there are two arguments this is OK.
  Consider the following example:

    "WHERE (column_not_from_partition_by = 1) OR false OR false"

  Will be converted to:

    "WHERE false OR false"

And it will simply read nothing.

Yes, we could apply some optimization for bool, but this will not always
work, since to optimize things like "0 = 1" we need to execute it.

And the only way to handle this correctly (with the ability to ignore
some commands during filtering) is to make the is_constant() function
report whether the expression uses something from the input block, so
that we can be sure that we have something sensible, and not just "false".

Plus, we cannot simply ignore the difference between the input and
output arguments when handling OR: we need to add always-true (1/true)
if the sizes differ, since otherwise it could break invariants (see the
comment in the code).
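
As an illustration only (a toy Python model, not the code of this patch):
expressions are nested tuples, and `columns_of`, `is_usable` and `extract`
are hypothetical names. The sketch keeps only sub-expressions that read
columns present in the input block, treats bare constants as not useful,
and appends always-true when an OR loses some of its arguments.

```python
# Toy model, not ClickHouse code. Expression shapes (all hypothetical):
#   ("and", a, b, ...), ("or", a, b, ...), ("eq", column, value), ("const", value)
TRUE = ("const", True)


def columns_of(expr):
    """Collect the column names an expression reads."""
    if expr[0] == "eq":
        return {expr[1]}
    if expr[0] == "const":
        return set()
    return set().union(*(columns_of(child) for child in expr[1:]))


def is_usable(expr, block_columns):
    """True if expr reads at least one column and only columns of the block."""
    columns = columns_of(expr)
    return bool(columns) and columns <= block_columns


def extract(expr, block_columns):
    """Return a filter over block_columns that is never stricter than expr,
    or None when nothing useful can be extracted."""
    if expr[0] in ("and", "or"):
        kept = [e for e in (extract(c, block_columns) for c in expr[1:]) if e is not None]
        if not kept:
            return None
        if expr[0] == "or" and len(kept) != len(expr) - 1:
            # A dropped OR argument may be true for any row, so pad with always-true.
            kept.append(TRUE)
        return kept[0] if len(kept) == 1 else (expr[0], *kept)
    return expr if is_usable(expr, block_columns) else None


block = {"_part"}  # assumed: the only column available before reading data
print(extract(("or", ("eq", "x", 1), ("const", False), ("const", False)), block))
print(extract(("or", ("eq", "x", 1), ("eq", "_part", "a")), block))
```

With only `_part` in the block, `(x = 1) OR false OR false` yields no
filter at all in this model, so nothing is pruned by mistake, while
`(x = 1) OR (_part = 'a')` is widened with always-true.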

This includes (but is not limited to):
- _part* filtering for MergeTree
- _path/_file for various File/HDFS/... engines
- _table for Merge
- ...

P.S. The analyzer does not have this bug, since it executes the
expression as a whole. This is what filterBlockWithQuery() should
actually do instead, but that would be a more complex patch.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
(cherry picked from commit b107712e0c)
2023-10-16 15:42:10 +02:00
Antonio Andelic
fb901c24a1
Revert "Fix filtering by virtual columns with OR filter in query" 2023-10-16 09:45:49 +02:00
Azat Khuzhin
b107712e0c Fix filtering by virtual columns with OR filter in query
The problem with the initial implementation #52653 was:
- OR can have multiple arguments
- It is simply not correct to assume that if there are two arguments this is OK.
  Consider the following example:

    "WHERE (column_not_from_partition_by = 1) OR false OR false"

  Will be converted to:

    "WHERE false OR false"

And it will simply read nothing.

Yes, we could apply some optimization for bool, but this will not always
work, since to optimize things like "0 = 1" we need to execute it.

And the only way to handle this correctly (with the ability to ignore
some commands during filtering) is to make the is_constant() function
report whether the expression uses something from the input block, so
that we can be sure that we have something sensible, and not just "false".

Plus, we cannot simply ignore the difference between the input and
output arguments when handling OR: we need to add always-true (1/true)
if the sizes differ, since otherwise it could break invariants (see the
comment in the code).

This includes (but is not limited to):
- _part* filtering for MergeTree
- _path/_file for various File/HDFS/... engines
- _table for Merge
- ...

P.S. The analyzer does not have this bug, since it executes the
expression as a whole. This is what filterBlockWithQuery() should
actually do instead, but that would be a more complex patch.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-10-10 20:47:52 +02:00
Kruglov Pavel
4d675dbad4
Merge pull request #54825 from azat/fix-virtual-columns-filtering
Fix filtering parts with indexHint for non analyzer (resubmit)
2023-09-22 18:20:16 +02:00
Michael Kolupaev
be0c512329
Fix virtual columns having incorrect values after ORDER BY (#54811)
Fix virtual columns having incorrect values after ORDER BY
2023-09-21 10:14:28 -07:00
Azat Khuzhin
9c3fb64106 Check type of the filter expressions while filtering by virtual columns
This should fix filtering by virtual columns with non-uint8 types, i.e.
queries like:

    SELECT * FROM data WHERE 1.0

Fixes: 02346_full_text_search
Fixes: 00990_hasToken_and_tokenbf
v2: move check out from is_const to filterBlockWithQuery(), since in is_const there is no way to validate sets.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-09-20 13:36:25 +02:00
Nikolai Kochetov
4cbfca78e4 Fixing 01748_partition_id_pruning 2023-09-01 14:29:31 +00:00
Antonio Andelic
f406019413 Apply PR comments 2023-08-30 09:26:01 +00:00
avogar
89b402f4e2 Fix 2023-08-22 13:01:07 +00:00
Kruglov Pavel
9869196021
Update VirtualColumnUtils.cpp 2023-08-21 14:22:48 +02:00
Kruglov Pavel
21413636ac
Remove debug logging 2023-08-21 13:30:08 +02:00
avogar
8e445b5461 Fixes 2023-08-18 17:49:40 +00:00
avogar
4c32097df3 Use filter by file/path before reading in url/file/hdfs table functions, reduce code duplication 2023-08-17 16:54:43 +00:00
Azat Khuzhin
68aed0d16e RFC: Fix filtering by virtual columns with OR expression
Virtual columns did not support queries with OR. For example, a query like
this (here `m` is the `Merge` table, see the test):

    select key from m where (value = 10 and _table = 'v1') or (value = 20 and _table = 'v1');

Will always lead to:

    Cannot find column `value` in source stream, there are only columns ...

The reason for this is that it actually executes the following queries:

    SELECT key, value FROM default.d1 WHERE ((value = 10) AND ('v1' = 'v1')) OR ((value = 20) AND ('v1' = 'v1'));
    SELECT key FROM default.d2 WHERE 0;

And this kind of filtering is used not only for the `Merge` table, but also for:
- `_table` for `Merge` (already mentioned)
- `_file` for `File`
- `_idx` for `S3`
- as well as filtering `system.*` tables by `database`/`table`/...

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-07-27 16:35:17 +02:00
Nikolai Kochetov
22e49748b5 Cleanup. 2023-06-22 14:23:04 +00:00
Nikolai Kochetov
8e7d06e0a4 Remove isReady from FutureSet iface. 2023-06-19 12:56:24 +00:00
Nikolai Kochetov
afa74f697c Refactor a bit. 2023-06-16 19:38:50 +00:00
Nikolai Kochetov
d8f39b8df1 Fixing more tests. 2023-05-24 17:53:37 +00:00
Nikolai Kochetov
618486815b Merge branch 'master' into refactor-subqueries-for-in 2023-05-05 20:39:09 +02:00
Nikolai Kochetov
f598a39ea2 Refactor PreparedSets [3] 2023-05-04 17:54:08 +00:00
Alexander Gololobov
2e20f2a14d Do not skip building set even when reading from remote 2023-05-02 21:31:56 +02:00
Alexander Tokmakov
70d1adfe4b
Better formatting for exception messages (#45449)
* save format string for NetException

* format exceptions

* format exceptions 2

* format exceptions 3

* format exceptions 4

* format exceptions 5

* format exceptions 6

* fix

* format exceptions 7

* format exceptions 8

* Update MergeTreeIndexGin.cpp

* Update AggregateFunctionMap.cpp

* Update AggregateFunctionMap.cpp

* fix
2023-01-24 00:13:58 +03:00
Raúl Marín
45d27f461b
Merge branch 'master' into perf_experiment 2022-12-20 09:07:48 +00:00
Anton Popov
e0dd533811 fix scheduling of async tasks in StorageS3 2022-11-28 16:13:01 +00:00
Anton Popov
65a78bcd91 improve performance of storage S3 2022-11-26 15:24:01 +00:00
Raúl Marín
19310a5877 Replace std::vector with absl inlined_vector 2022-10-13 21:50:11 +02:00
vdimir
1e3fa2e01f
Refactor PreparedSets/SubqueryForSet 2022-07-26 18:39:02 +00:00
Dmitry Novik
16c6b60703 Introduce AggregationKeysInfo 2022-05-25 23:22:29 +00:00
Dmitry Novik
e5b395e054 Support ROLLUP and CUBE in GROUPING function 2022-05-16 17:33:38 +00:00
Dmitry Novik
6fc7dfea80 Support ordinary GROUP BY 2022-05-13 23:04:12 +00:00
Dmitry Novik
ae81268d4d Try to compute helper column lazy 2022-05-13 14:55:50 +00:00
Dmitry Novik
c5b40a9c91 WIP on GROUPING function 2022-05-12 16:40:26 +00:00
tavplubix
3afb7aa97e
Update VirtualColumnUtils.cpp 2021-06-28 16:47:47 +03:00
Ivan Lezhankin
d0ad6d9cff Fix all remaining tests 2021-05-31 21:50:07 +03:00
Maksim Kita
150a88d647 ExpressionActions compile only necessary places 2021-05-19 11:43:16 +03:00
Amos Bird
50f2e488bd
Fix invalid virtual column expr 2021-04-21 10:29:03 +08:00
Alexey Milovidov
22720dd7a4 Fix trivial mistake in filtering by virtual columns 2021-04-11 22:39:22 +03:00
Ivan
495c6e03aa
Replace all Context references with std::weak_ptr (#22297)
* Replace all Context references with std::weak_ptr

* Fix shared context captured by value

* Fix build

* Fix Context with named sessions

* Fix copy context

* Fix gcc build

* Merge with master and fix build

* Fix gcc-9 build
2021-04-11 02:33:54 +03:00
Amos Bird
02604185f2
Correctly check constant expr 2021-03-10 02:48:46 +08:00
Amos Bird
091894f8ca
Add more comments 2021-03-08 11:09:06 +08:00
Amos Bird
909cb3c243
Fix again.... 2021-03-04 22:27:07 +08:00
Amos Bird
fac832227c
Fix again 2021-03-04 19:43:03 +08:00
Amos Bird
9205fad8c7
Better 2021-03-04 19:43:03 +08:00
Amos Bird
2f8f4e9697
Fix tests 2021-03-04 19:43:03 +08:00
Amos Bird
93b661ad5a
partition id pruning 2021-03-04 19:43:03 +08:00
Artem Zuikov
2afd123eda
Refactoring: extract TreeOptimizer from SyntaxAnalyzer (#12645) 2020-07-22 20:13:05 +03:00
Alexey Milovidov
b78e1145e8 Fix filtering by virtual columns #12166 2020-07-09 02:52:57 +03:00
Ivan Lezhankin
06446b4f08 dbms/ → src/ 2020-04-03 18:14:31 +03:00