Commit Graph

23 Commits

Author SHA1 Message Date
Azat Khuzhin
0b47f4a9e9 Fix optimize_trivial_count_query with partition predicate
Consider the following example:

    CREATE TABLE test(p DateTime, k int) ENGINE MergeTree PARTITION BY toDate(p) ORDER BY k;
    INSERT INTO test VALUES ('2020-09-01 00:01:02', 1), ('2020-09-01 20:01:03', 2), ('2020-09-02 00:01:03', 3);

- SELECT count() FROM test WHERE toDate(p) >= '2020-09-01' AND p <= '2020-09-01 00:00:00'
  In this case rpn will be (FUNCTION_IN_RANGE, FUNCTION_UNKNOWN (due to strict), FUNCTION_AND)
  and for optimize_trivial_count_query we cannot use index if there is at least one FUNCTION_UNKNOWN.
  since there is no post processing and return count() based on only the first predicate is wrong.

  Before this patch FUNCTION_UNKNOWN was allowed for optimize_trivial_count_query, and the result was wrong.

And two examples above just to show the difference, the behaviour hadn't been changed with this patch:

- SELECT * FROM test WHERE toDate(p) >= '2020-09-01' AND p <= '2020-09-01 00:00:00'
  In this case will be (FUNCTION_IN_RANGE, FUNCTION_IN_RANGE (due to non-strict), FUNCTION_AND)
  so it will prune everything out and nothing will be read.

- SELECT * FROM test WHERE toDate(p) >= '2020-09-01' AND toUnixTimestamp(p)%5==0
  In this case will be (FUNCTION_IN_RANGE, FUNCTION_UNKNOWN, FUNCTION_AND)
  and all, two, partitions will be scanned, but due to filtering later none of rows will be matched.
2020-11-25 23:09:17 +03:00
Nikolai Kochetov
46f70dd0de Merge branch 'master' into actions-dag-f14 2020-11-12 11:54:44 +03:00
Alexander Tokmakov
b94cc5c4e5 remove more stringstreams 2020-11-10 21:22:26 +03:00
Nikolai Kochetov
99cc9b1ec0 Fix build 2020-11-09 16:20:56 +03:00
Amos Bird
aa436a3cb1
Transform single point 2020-11-06 14:59:55 +08:00
alexey-milovidov
adeba6bdd8
Merge pull request #15074 from amosbird/btc
Extend trivial count optimization.
2020-10-22 02:50:57 +03:00
Nikolai Kochetov
a7fb2e38a5 Use ColumnWithTypeAndName as function argument instead of Block. 2020-10-09 10:41:28 +03:00
Amos Bird
867216103f
Extend trivial count optimization. 2020-10-08 18:08:17 +08:00
Nikolai Kochetov
dad9d369a1 Merge branch 'master' into bobrik-parallel-randes 2020-07-23 16:21:32 +03:00
Artem Zuikov
2afd123eda
Refactoring: extract TreeOptimizer from SyntaxAnalyzer (#12645) 2020-07-22 20:13:05 +03:00
Nikolai Kochetov
755f15def3 Make MergeTreeSetIndex::checkInRange const. 2020-07-21 14:22:45 +03:00
Nikolai Kochetov
12c5e376c6 Remove mutable from RPNElement. 2020-07-21 14:02:58 +03:00
Ivan Babrou
d9d8d0242e Optimize PK lookup for queries that match exact PK range
Existing code that looks up marks that match the query has a pathological
case, when most of the part does in fact match the query.

The code works by recursively splitting a part into ranges and then discarding
the ranges that definitely do not match the query, based on primary key.

The problem is that it requires visiting every mark that matches the query,
making the complexity of this sort of look up O(n).

For queries that match exact range on the primary key, we can find
both left and right parts of the range with O(log 2) complexity.

This change implements exactly that.

To engage this optimization, the query must:

* Have a prefix list of the primary key.
* Have only range or single set element constraints for columns.
* Have only AND as a boolean operator.

Consider a table with `(service, timestamp)` as the primary key.

The following conditions will be optimized:

* `service = 'foo'`
* `service = 'foo' and timestamp >= now() - 3600`
* `service in ('foo')`
* `service in ('foo') and timestamp >= now() - 3600 and timestamp <= now`

The following will fall back to previous lookup algorithm:

* `timestamp >= now() - 3600`
* `service in ('foo', 'bar') and timestamp >= now() - 3600`
* `service = 'foo'`

Note that the optimization won't engage when PK has a range expression
followed by a point expression, since in that case the range is not continuous.

Trace query logging provides the following messages types of messages,
each representing a different kind of PK usage for a part:

```
Used optimized inclusion search over index for part 20200711_5710108_5710108_0 with 9 steps
Used generic exclusion search over index for part 20200711_5710118_5710228_5 with 1495 steps
Not using index on part 20200710_5710473_5710473_0
```

Number of steps translates to computational complexity.

Here's a comparison for before and after for a query over 24h of data:

```
Read 4562944 rows, 148.05 MiB in 45.19249672 sec.,   100966 rows/sec.,   3.28 MiB/sec.
Read 4183040 rows, 135.78 MiB in 0.196279627 sec., 21311636 rows/sec., 691.75 MiB/sec.
```

This is especially useful for queries that read data in order
and terminate early to return "last X things" matching a query.

See #11564 for more thoughts on this.
2020-07-11 12:26:54 -07:00
myrrc
8c3417fbf7
ILIKE operator (#12125)
* Integrated CachingAllocator into MarkCache

* fixed build errors

* reset func hotfix

* upd: Fixing build

* updated submodules links

* fix 2

* updating grabber allocator proto

* updating lost work

* updating CMake to use concepts

* some other changes to get it building (integration into MarkCache)

* further integration into caches

* updated Async metrics, fixed some build errors

* and some other errors revealing

* added perfect forwarding to some functions

* fix: forward template

* fix: constexpr modifier

* fix: FakePODAllocator missing member func

* updated PODArray constructor taking alloc params

* fix: PODArray overload with n restored

* fix: FakePODAlloc duplicating alloc() func

* added constexpr variable for alloc_tag_t

* split cache values by allocators, provided updates

* fix: memcpy

* fix: constexpr modifier

* fix: noexcept modifier

* fix: alloc_tag_t for PODArray constructor

* fix: PODArray copy ctor with different alloc

* fix: resize() signature

* updating to lastest working master

* syncing with 273267

* first draft version

* fix: update Searcher to case-insensitive

* added ILIKE test

* fixed style errors, updated test, split like and ilike,  added notILike

* replaced inconsistent comments

* fixed show tables ilike

* updated missing test cases

* regenerated ya.make

* Update 01355_ilike.sql

Co-authored-by: myrrc <me-clickhouse@myrrec.space>
Co-authored-by: alexey-milovidov <milovidov@yandex-team.ru>
2020-07-05 18:57:59 +03:00
Azat Khuzhin
d93b9a57f6 Forward declaration for Context as much as possible.
Now after changing Context.h 488 modules will be recompiled instead of 582.
2020-05-21 01:53:18 +03:00
alexey-milovidov
a46a61c970
Update KeyCondition.h 2020-04-08 05:56:25 +03:00
alexey-milovidov
a42d875a68
Update KeyCondition.h 2020-04-08 05:55:39 +03:00
alexey-milovidov
723a1f41e2
Update KeyCondition.h 2020-04-08 05:55:22 +03:00
alexey-milovidov
94a621060d
Update KeyCondition.h 2020-04-08 05:55:03 +03:00
Anton Popov
2dc1eddfab fix FieldRef 2020-04-06 16:35:11 +03:00
Anton Popov
5ada959853 improve performance of index analysis with monotonic functions 2020-04-06 13:37:34 +03:00
Anton Popov
79024d73a2 improve performance of index analysis with monotonic functions 2020-04-06 13:37:34 +03:00
Ivan Lezhankin
06446b4f08 dbms/ → src/ 2020-04-03 18:14:31 +03:00