Commit Graph

59 Commits

Author SHA1 Message Date
Amos Bird
661c541e57
Fix Nullable keys in hyperrectangle. 2022-11-11 11:14:05 +08:00
Duc Canh Le
bd2bd7149d fix typo 2022-11-07 18:22:50 +08:00
Duc Canh Le
69d6d42443 NOT LIKE only work for 'prefix%' 2022-11-06 15:26:19 +08:00
Duc Canh Le
07d9720ed5
Merge branch 'master' into ch_canh_fix_prefix_not_like 2022-11-01 22:16:09 +08:00
Duc Canh Le
c080964391 correct 'notLike' in key condition 2022-10-28 13:39:01 +08:00
Maksim Kita
1b6293f6db MergeTree indexes use ActionsDAG 2022-10-26 12:44:37 +02:00
Amos Bird
15a69bce84
Use index when row_policy_filter is always false 2022-08-29 16:44:32 +08:00
Amos Bird
fa8fab2e8f
Fix KeyCondition with other filters 2022-08-11 19:20:44 +08:00
vdimir
b7c5c54181
Fix build 2022-08-10 13:43:55 +00:00
vdimir
708747ca0b
Merge branch 'master' into refactor-prepared-sets 2022-08-08 14:27:18 +02:00
vdimir
1e3fa2e01f
Refactor PreparedSets/SubqueryForSet 2022-07-26 18:39:02 +00:00
Nikolai Kochetov
eaeb30a71a Merge branch 'master' into use-dag-in-key-condition 2022-07-19 18:39:52 +02:00
avogar
9291d33080 Pass const std::string_view & by value, not by reference 2022-07-14 16:11:57 +00:00
taiyang-li
8dbf1c60e7 merge master and fix conflict 2022-03-23 11:36:50 +08:00
Maksim Kita
2fdcf53a76 Fix clang-tidy warnings in Server, Storages folders 2022-03-14 18:17:35 +00:00
liyang
37ba8004ff Speep up mergetree starting up process 2021-12-18 16:39:59 +08:00
Raúl Marín
f0ee0724ac Reduce dependencies on ASTSelectQuery.h
243 -> 152
2021-11-26 18:35:24 +01:00
Raúl Marín
b2cfa70541 Reduce dependencies on ASTFunction.h
481 -> 230
2021-11-26 18:21:54 +01:00
Kruglov Pavel
f559c34113
Merge pull request #28302 from amosbird/binaryconstantwrap
Always monotonic for non-zero division
2021-09-03 20:02:54 +03:00
Amos Bird
4624bf70b0
Always monotonic for non-zero division 2021-08-28 23:33:18 +08:00
Amos Bird
f2374a6916
Better nullable primary key implementation. 2021-08-28 17:48:28 +08:00
alexey-milovidov
b52411a715
Merge pull request #12455 from amosbird/npc
Nullable primary key with correct KeyCondition
2021-07-18 17:52:20 +03:00
Nikolai Kochetov
f5f57781b7 Fix build after merge. 2021-06-22 17:45:22 +03:00
Nikolai Kochetov
3dc0b9c096 Merge branch 'master' into use-dag-in-key-condition 2021-06-22 17:02:01 +03:00
Nikolai Kochetov
21e39e10ea Update KeyCondition constructor 2021-06-22 13:28:56 +03:00
Nikolai Kochetov
eef6c73030 Use DAG in KeyCondition 2021-06-21 19:17:05 +03:00
Anton Popov
ffa56bde24 fix usage of index with array columns and ARRAY JOIN 2021-06-21 15:34:05 +03:00
Nikolai Kochetov
47387d4faa Merge branch 'master' into use-dag-in-key-condition 2021-06-21 14:10:09 +03:00
Amos Bird
f2ed5ef42b
Nullable primary key with correct KeyCondition 2021-06-18 23:04:24 +08:00
Nikolai Kochetov
481b87b37a Remove some code from keyCondition. 2021-06-15 16:47:37 +03:00
Nikolai Kochetov
38966e3e6b Part 2. 2021-06-03 15:26:02 +03:00
Nikolai Kochetov
c855cf7057 Part 1. 2021-06-02 19:56:24 +03:00
kssenii
70469429c1 Fixes 2021-05-24 23:39:56 +00:00
Nikolai Kochetov
8d8e57615c A little bit better index description. 2021-04-16 12:42:23 +03:00
Nikolai Kochetov
be52b2889a Better description for key condition. 2021-04-15 20:30:04 +03:00
Ivan
495c6e03aa
Replace all Context references with std::weak_ptr (#22297)
* Replace all Context references with std::weak_ptr

* Fix shared context captured by value

* Fix build

* Fix Context with named sessions

* Fix copy context

* Fix gcc build

* Merge with master and fix build

* Fix gcc-9 build
2021-04-11 02:33:54 +03:00
Azat Khuzhin
0b47f4a9e9 Fix optimize_trivial_count_query with partition predicate
Consider the following example:

    CREATE TABLE test(p DateTime, k int) ENGINE MergeTree PARTITION BY toDate(p) ORDER BY k;
    INSERT INTO test VALUES ('2020-09-01 00:01:02', 1), ('2020-09-01 20:01:03', 2), ('2020-09-02 00:01:03', 3);

- SELECT count() FROM test WHERE toDate(p) >= '2020-09-01' AND p <= '2020-09-01 00:00:00'
  In this case rpn will be (FUNCTION_IN_RANGE, FUNCTION_UNKNOWN (due to strict), FUNCTION_AND)
  and for optimize_trivial_count_query we cannot use index if there is at least one FUNCTION_UNKNOWN.
  since there is no post processing and return count() based on only the first predicate is wrong.

  Before this patch FUNCTION_UNKNOWN was allowed for optimize_trivial_count_query, and the result was wrong.

And two examples above just to show the difference, the behaviour hadn't been changed with this patch:

- SELECT * FROM test WHERE toDate(p) >= '2020-09-01' AND p <= '2020-09-01 00:00:00'
  In this case will be (FUNCTION_IN_RANGE, FUNCTION_IN_RANGE (due to non-strict), FUNCTION_AND)
  so it will prune everything out and nothing will be read.

- SELECT * FROM test WHERE toDate(p) >= '2020-09-01' AND toUnixTimestamp(p)%5==0
  In this case will be (FUNCTION_IN_RANGE, FUNCTION_UNKNOWN, FUNCTION_AND)
  and all, two, partitions will be scanned, but due to filtering later none of rows will be matched.
2020-11-25 23:09:17 +03:00
Nikolai Kochetov
46f70dd0de Merge branch 'master' into actions-dag-f14 2020-11-12 11:54:44 +03:00
Alexander Tokmakov
b94cc5c4e5 remove more stringstreams 2020-11-10 21:22:26 +03:00
Nikolai Kochetov
99cc9b1ec0 Fix build 2020-11-09 16:20:56 +03:00
Amos Bird
aa436a3cb1
Transform single point 2020-11-06 14:59:55 +08:00
alexey-milovidov
adeba6bdd8
Merge pull request #15074 from amosbird/btc
Extend trivial count optimization.
2020-10-22 02:50:57 +03:00
Nikolai Kochetov
a7fb2e38a5 Use ColumnWithTypeAndName as function argument instead of Block. 2020-10-09 10:41:28 +03:00
Amos Bird
867216103f
Extend trivial count optimization. 2020-10-08 18:08:17 +08:00
Nikolai Kochetov
dad9d369a1 Merge branch 'master' into bobrik-parallel-randes 2020-07-23 16:21:32 +03:00
Artem Zuikov
2afd123eda
Refactoring: extract TreeOptimizer from SyntaxAnalyzer (#12645) 2020-07-22 20:13:05 +03:00
Nikolai Kochetov
755f15def3 Make MergeTreeSetIndex::checkInRange const. 2020-07-21 14:22:45 +03:00
Nikolai Kochetov
12c5e376c6 Remove mutable from RPNElement. 2020-07-21 14:02:58 +03:00
Ivan Babrou
d9d8d0242e Optimize PK lookup for queries that match exact PK range
Existing code that looks up marks that match the query has a pathological
case, when most of the part does in fact match the query.

The code works by recursively splitting a part into ranges and then discarding
the ranges that definitely do not match the query, based on primary key.

The problem is that it requires visiting every mark that matches the query,
making the complexity of this sort of look up O(n).

For queries that match exact range on the primary key, we can find
both left and right parts of the range with O(log 2) complexity.

This change implements exactly that.

To engage this optimization, the query must:

* Have a prefix list of the primary key.
* Have only range or single set element constraints for columns.
* Have only AND as a boolean operator.

Consider a table with `(service, timestamp)` as the primary key.

The following conditions will be optimized:

* `service = 'foo'`
* `service = 'foo' and timestamp >= now() - 3600`
* `service in ('foo')`
* `service in ('foo') and timestamp >= now() - 3600 and timestamp <= now`

The following will fall back to previous lookup algorithm:

* `timestamp >= now() - 3600`
* `service in ('foo', 'bar') and timestamp >= now() - 3600`
* `service = 'foo'`

Note that the optimization won't engage when PK has a range expression
followed by a point expression, since in that case the range is not continuous.

Trace query logging provides the following messages types of messages,
each representing a different kind of PK usage for a part:

```
Used optimized inclusion search over index for part 20200711_5710108_5710108_0 with 9 steps
Used generic exclusion search over index for part 20200711_5710118_5710228_5 with 1495 steps
Not using index on part 20200710_5710473_5710473_0
```

Number of steps translates to computational complexity.

Here's a comparison for before and after for a query over 24h of data:

```
Read 4562944 rows, 148.05 MiB in 45.19249672 sec.,   100966 rows/sec.,   3.28 MiB/sec.
Read 4183040 rows, 135.78 MiB in 0.196279627 sec., 21311636 rows/sec., 691.75 MiB/sec.
```

This is especially useful for queries that read data in order
and terminate early to return "last X things" matching a query.

See #11564 for more thoughts on this.
2020-07-11 12:26:54 -07:00
myrrc
8c3417fbf7
ILIKE operator (#12125)
* Integrated CachingAllocator into MarkCache

* fixed build errors

* reset func hotfix

* upd: Fixing build

* updated submodules links

* fix 2

* updating grabber allocator proto

* updating lost work

* updating CMake to use concepts

* some other changes to get it building (integration into MarkCache)

* further integration into caches

* updated Async metrics, fixed some build errors

* and some other errors revealing

* added perfect forwarding to some functions

* fix: forward template

* fix: constexpr modifier

* fix: FakePODAllocator missing member func

* updated PODArray constructor taking alloc params

* fix: PODArray overload with n restored

* fix: FakePODAlloc duplicating alloc() func

* added constexpr variable for alloc_tag_t

* split cache values by allocators, provided updates

* fix: memcpy

* fix: constexpr modifier

* fix: noexcept modifier

* fix: alloc_tag_t for PODArray constructor

* fix: PODArray copy ctor with different alloc

* fix: resize() signature

* updating to lastest working master

* syncing with 273267

* first draft version

* fix: update Searcher to case-insensitive

* added ILIKE test

* fixed style errors, updated test, split like and ilike,  added notILike

* replaced inconsistent comments

* fixed show tables ilike

* updated missing test cases

* regenerated ya.make

* Update 01355_ilike.sql

Co-authored-by: myrrc <me-clickhouse@myrrec.space>
Co-authored-by: alexey-milovidov <milovidov@yandex-team.ru>
2020-07-05 18:57:59 +03:00