Commit Graph

68 Commits

Author SHA1 Message Date
Ivan Babrou
d9d8d0242e Optimize PK lookup for queries that match exact PK range
Existing code that looks up marks that match the query has a pathological
case, when most of the part does in fact match the query.

The code works by recursively splitting a part into ranges and then discarding
the ranges that definitely do not match the query, based on primary key.

The problem is that it requires visiting every mark that matches the query,
making the complexity of this sort of lookup O(n) in the number of matching marks.

For queries that match an exact range on the primary key, we can find
both the left and right boundaries of the range with O(log n) complexity.

This change implements exactly that.
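
As a rough illustration of why an exact range needs only O(log n) steps, here is a minimal Python sketch that finds both boundaries with two binary searches. It assumes `marks` is a sorted list holding the first key of each granule and uses plain integers as keys. `find_mark_range` is a hypothetical name, not the function this change adds to ClickHouse.

```python
from bisect import bisect_left, bisect_right


def find_mark_range(marks, lo_key, hi_key):
    """Return a half-open (begin, end) range of granule indexes that may
    contain keys in [lo_key, hi_key].

    Assumption of this sketch: marks[i] is the first key of granule i and
    the list is sorted in ascending order.
    """
    # The granule just before the first mark >= lo_key may still end with
    # rows equal to lo_key, so step back by one (but never below zero).
    begin = max(bisect_left(marks, lo_key) - 1, 0)
    # Every granule whose first key is <= hi_key may contain matching rows.
    end = bisect_right(marks, hi_key)
    # An empty result is reported as begin == end.
    return (min(begin, end), end)


print(find_mark_range([0, 10, 20, 30, 40], 15, 25))
```

With five marks, the call above returns `(1, 3)`, selecting granules 1 and 2. Each boundary costs one binary search, regardless of how many granules fall inside the range.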

To engage this optimization, the query must:

* Have constraints on a prefix of the primary key columns.
* Have only range or single-element set constraints for those columns.
* Have only AND as a boolean operator.

Consider a table with `(service, timestamp)` as the primary key.

The following conditions will be optimized:

* `service = 'foo'`
* `service = 'foo' and timestamp >= now() - 3600`
* `service in ('foo')`
* `service in ('foo') and timestamp >= now() - 3600 and timestamp <= now()`

The following will fall back to previous lookup algorithm:

* `timestamp >= now() - 3600`
* `service in ('foo', 'bar') and timestamp >= now() - 3600`

Note that the optimization won't engage when PK has a range expression
followed by a point expression, since in that case the range is not continuous.
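
The rules above can be sketched as a small eligibility check. This is an illustration under stated assumptions, not the code from this change: `can_use_optimized_search` is a hypothetical helper, each primary key column is labelled by hand as `"point"` (equality or single-element `IN`), `"range"`, or `"other"`, and the sketch conservatively rejects any constrained column that follows a range.

```python
def can_use_optimized_search(pk_columns, constraints, only_and=True):
    """Decide whether a query qualifies for the optimized lookup.

    pk_columns:  primary key columns in order, e.g. ["service", "timestamp"].
    constraints: maps a column name to "point", "range" or "other".
                 Columns outside the primary key are ignored (assumption).
    only_and:    True if AND is the only boolean operator in the query.
    """
    if not only_and:
        return False

    seen_gap = False    # an unconstrained PK column was already passed
    seen_range = False  # a range constraint was already passed
    constrained = 0

    for column in pk_columns:
        kind = constraints.get(column)
        if kind is None:
            seen_gap = True
            continue
        if seen_gap:
            return False  # constrained columns do not form a PK prefix
        if kind not in ("point", "range"):
            return False  # e.g. a set with several elements
        if seen_range:
            return False  # nothing may follow a range (conservative)
        seen_range = kind == "range"
        constrained += 1

    return constrained > 0


pk = ["service", "timestamp"]
print(can_use_optimized_search(pk, {"service": "point", "timestamp": "range"}))
print(can_use_optimized_search(pk, {"timestamp": "range"}))
```

For the `(service, timestamp)` table, the first call corresponds to `service = 'foo' and timestamp >= now() - 3600` and the second to `timestamp >= now() - 3600` alone.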

Trace query logging provides the following types of messages,
each representing a different kind of PK usage for a part:

```
Used optimized inclusion search over index for part 20200711_5710108_5710108_0 with 9 steps
Used generic exclusion search over index for part 20200711_5710118_5710228_5 with 1495 steps
Not using index on part 20200710_5710473_5710473_0
```

The number of steps reflects the computational cost of the lookup.
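
To compare step counts across parts, the trace lines can be classified with a couple of regular expressions. The sketch below is a hypothetical helper (`classify_trace_line` is not part of ClickHouse) and assumes the messages keep exactly the wording shown in the sample above.

```python
import re

# Patterns derived from the three sample messages above.
SEARCH_RE = re.compile(
    r"Used (optimized inclusion|generic exclusion) search over index "
    r"for part (\S+) with (\d+) steps"
)
NOT_USED_RE = re.compile(r"Not using index on part (\S+)")


def classify_trace_line(line):
    """Return (kind, part_name, steps) for a PK usage message, or None.

    kind is "optimized inclusion", "generic exclusion" or "not used";
    steps is None when the index was not used.
    """
    match = SEARCH_RE.search(line)
    if match:
        return (match.group(1), match.group(2), int(match.group(3)))
    match = NOT_USED_RE.search(line)
    if match:
        return ("not used", match.group(1), None)
    return None


line = ("Used generic exclusion search over index "
        "for part 20200711_5710118_5710228_5 with 1495 steps")
print(classify_trace_line(line))
```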

Here's a before-and-after comparison for a query over 24h of data
(first line before this change, second line after):

```
Read 4562944 rows, 148.05 MiB in 45.19249672 sec.,   100966 rows/sec.,   3.28 MiB/sec.
Read 4183040 rows, 135.78 MiB in 0.196279627 sec., 21311636 rows/sec., 691.75 MiB/sec.
```

This is especially useful for queries that read data in order
and terminate early to return "last X things" matching a query.

See #11564 for more thoughts on this.
2020-07-11 12:26:54 -07:00
Nikita Mikhaylov
d31ed58f01 done 2020-07-06 17:33:31 +03:00
Anton Popov
73676f5022
Improve performance of reading in order of sorting key. (#11696)
* simplify reading in order of sorting key

* add perf test for reading many parts

* Revert "simplify reading in order of sorting key"

This reverts commit 7267d7c46e.

* add threshold for preliminary merge for reading in order

* better threshold

* limit threads in test
2020-07-04 15:48:51 +03:00
Alexey Milovidov
8eed47857b Fix estimation of the number of marks for various thresholds 2020-06-25 23:20:22 +03:00
Alexey Milovidov
8872417d00 Respect direct_io/mmap settings while reading secondary indices 2020-06-25 22:31:54 +03:00
Alexey Milovidov
5608f15749 Revive mmap IO 2020-06-25 22:15:41 +03:00
alesapin
b1e8976df4 Merge with master 2020-06-22 12:04:27 +03:00
alesapin
4c0879ae30 Better logging in storages 2020-06-19 20:17:13 +03:00
alesapin
dffdece350 getColumns in StorageInMemoryMetadata (only compilable) 2020-06-17 19:39:58 +03:00
alesapin
33c27de54d Check methods in metadata 2020-06-17 17:32:25 +03:00
alesapin
1afdebeebd Primary key in storage metadata 2020-06-17 15:39:20 +03:00
alesapin
1da393b218 Sampling key in StorageInMemoryMetadata 2020-06-17 15:07:09 +03:00
alesapin
ba04d02f1e Compilable sorting key in metadata 2020-06-17 14:05:11 +03:00
alesapin
62f2c17a66 Secondary indices in StorageInMemoryMetadata 2020-06-17 12:38:47 +03:00
alesapin
71f99a274d Compilable getSampleBlockWithColumns in StorageInMemoryMetadata 2020-06-16 17:25:08 +03:00
Anton Popov
5c42408add
Merge pull request #9113 from dimarub2000/group_by_in_order_optimization
[WIP] Optimization of GROUP BY with respect to table sorting key.
2020-06-06 14:25:59 +03:00
Alexander Kuzmenkov
c7d9094a7a
Merge pull request #11259 from ClickHouse/consistent_metadata3
More consistent metadata for secondary indices
2020-06-03 12:23:21 +03:00
Nikolai Kochetov
53d12f5ab8 Try fix build. 2020-06-01 20:06:21 +03:00
Nikolai Kochetov
d25326e75c Try fix build. 2020-06-01 20:03:57 +03:00
alesapin
663e92b1c5 Rename to methods 2020-06-01 14:29:11 +03:00
alesapin
3847ea892d Merge branch 'master' into consistent_metadata3 2020-06-01 13:17:59 +03:00
Alexey Milovidov
25f941020b Remove namespace pollution 2020-05-31 00:57:37 +03:00
Dmitry
4b0d32f026 Merge branch 'master' of github.com:yandex/ClickHouse into group_by_in_order_optimization 2020-05-31 00:21:02 +03:00
alesapin
5f8b69547b More readable code 2020-05-28 16:45:08 +03:00
alesapin
26a9829e04 Indices description as vector 2020-05-28 15:47:17 +03:00
alesapin
52ca6b2051 I'm able to build it 2020-05-28 15:37:05 +03:00
Nikolai Kochetov
da0052858d Fix build. 2020-05-28 13:57:04 +03:00
Dmitry
41d1cd1c9b fix bad merge 2020-05-27 01:17:32 +03:00
Dmitry
38c585f867 Merge branch 'master' of github.com:yandex/ClickHouse into group_by_in_order_optimization 2020-05-26 21:27:50 +03:00
alesapin
c3a6571036 Compilable code 2020-05-25 20:22:20 +03:00
alesapin
6281dd6893
Merge pull request #11115 from ClickHouse/consistent_metadata
Refactoring in storage metadata (more consistent keys)
2020-05-25 13:09:56 +03:00
Alexey Milovidov
7e1813825b Return old names of macros 2020-05-24 01:24:01 +03:00
Alexey Milovidov
18febd7b97 find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_[^\_(]+\([^,]+, "[^"]+" << [^<]+ << "[^"]+" << [^<]+ << "[^"]+" << [^<]+ << "[^"]+" << [^<]+ << "[^"]+"\);' | while read file; do perl -pne 's/(LOG_[^\_(]+)\(([^,]+), "([^"]+)" << ([^<]+) << "([^"]+)" << ([^<]+) << "([^"]+)" << ([^<]+) << "([^"]+)" << ([^<]+) << "([^"]+)"\);/${1}_FORMATTED(${2}, "${3}{}${5}{}${7}{}${9}{}${11}", ${4}, ${6}, ${8}, ${10});/' $file > ${file}.tmp; mv ${file}.tmp $file; done 2020-05-23 22:56:05 +03:00
Alexey Milovidov
a2ad11897f Remove duplicate whitespaces (preparation) 2020-05-23 21:53:58 +03:00
Alexey Milovidov
1f13515a65 Make all LOG in single line (preparation) 2020-05-23 21:31:37 +03:00
Alexey Milovidov
533f86278a find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+" << [^<]+ << "[^"]+" << [^<]+ << "[^"]+"\);' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+), "([^"]+)" << ([^<]+) << "([^"]+)" << ([^<]+) << "([^"]+)"\);/\1_FORMATTED(\2, "\3{}\5{}\7", \4, \6);/' 2020-05-23 20:00:41 +03:00
Alexey Milovidov
e391b77d81 find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+" << [^<]+ << "[^"]+"\);' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+), "([^"]+)" << ([^<]+) << "([^"]+)"\);/\1_FORMATTED(\2, "\3{}\5", \4);/' 2020-05-23 19:56:05 +03:00
Alexey Milovidov
ee4ffbc332 find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+" << [^<]+\);' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+), "([^"]+)" << ([^<]+)\);/\1_FORMATTED(\2, "\3{}", \4);/' 2020-05-23 19:47:56 +03:00
Alexey Milovidov
8d2e80a5e2 find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+"\)' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+, "[^"]+")\)/\1_FORMATTED(\2)/' 2020-05-23 19:42:39 +03:00
Dmitry
47778c0259 Merge branch 'master' of github.com:yandex/ClickHouse into group_by_in_order_optimization 2020-05-21 23:45:12 +03:00
alesapin
327d17ac6a Better 2020-05-21 22:46:03 +03:00
Azat Khuzhin
d93b9a57f6 Forward declaration for Context as much as possible.
Now after changing Context.h 488 modules will be recompiled instead of 582.
2020-05-21 01:53:18 +03:00
alesapin
616902a995 Sorting and primary key (broken) 2020-05-20 21:11:38 +03:00
alesapin
9fb28f5ac0 Add sampling key 2020-05-20 18:16:39 +03:00
alesapin
07cb21ccb7
Merge branch 'master' into refactor-reservations 2020-05-18 11:43:48 +03:00
Gleb Novikov
1a25ac6e1f Merge branch 'master' into refactor-reservations 2020-05-16 23:34:45 +03:00
Nikolai Kochetov
4d22374f24 Merged with master. 2020-05-14 12:06:15 +03:00
Anton Popov
84d4ad4315 add some comments 2020-05-13 19:00:06 +03:00
Dmitry
bbe0245b9d changes after review #1 2020-05-13 16:49:10 +03:00
Dmitry
152a636c23 fix bad merge 2020-05-13 03:13:01 +03:00