Commit Graph

3360 Commits

Author SHA1 Message Date
Ivan Babrou
d9d8d0242e Optimize PK lookup for queries that match exact PK range
The existing code that looks up marks matching the query has a pathological
case: when most of the part does in fact match the query.

The code works by recursively splitting a part into ranges and then discarding
the ranges that definitely do not match the query, based on primary key.

The problem is that it requires visiting every mark that matches the query,
making the complexity of this sort of lookup O(n).
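
A rough sketch of why split-and-discard costs O(n) when most marks match. This is an illustration only, not the ClickHouse code: it assumes a sorted list with one key per mark, a plain binary split, and a hypothetical helper named `exclusion_search`.

```python
def exclusion_search(marks, lo, hi):
    """Return (matching mark indexes, number of ranges visited)."""
    steps = 0
    matching = []
    # Half-open ranges [begin, end) that still have to be examined.
    stack = [(0, len(marks))] if marks else []
    while stack:
        begin, end = stack.pop()
        steps += 1
        # marks is sorted, so marks[begin:end] covers keys marks[begin]..marks[end - 1].
        if marks[end - 1] < lo or marks[begin] > hi:
            continue                   # range definitely does not match: discard it
        if end - begin == 1:
            matching.append(begin)     # a single mark that may match
            continue
        mid = (begin + end) // 2
        stack.append((mid, end))       # push the right half first so the left is visited first
        stack.append((begin, mid))
    return matching, steps

marks = list(range(1024))
_, narrow_steps = exclusion_search(marks, 5, 5)      # only one mark matches
_, wide_steps = exclusion_search(marks, 0, 1023)     # every mark matches
print(narrow_steps, wide_steps)
```

When the query matches a single mark, almost every range is discarded near the top of the recursion. When it matches the whole part, nothing can be discarded and every mark is visited individually.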

For queries that match an exact range on the primary key, we can find
both the left and right boundaries of the range with O(log n) complexity.

This change implements exactly that.
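
A minimal sketch of the boundary lookup, assuming the index is a sorted list holding the first primary-key tuple of each granule. The function name `find_mark_range` and the sample data are hypothetical, and the real implementation in ClickHouse differs in detail.

```python
from bisect import bisect_left, bisect_right

def find_mark_range(marks, lo, hi):
    """Return (begin, end) so that granules begin..end-1 may hold keys in [lo, hi].

    marks[i] is the first primary-key value of granule i; marks is sorted.
    """
    # Granule i spans keys from marks[i] up to marks[i + 1] inclusive,
    # because rows with equal keys can straddle a granule boundary.
    first_ge_lo = bisect_left(marks, lo)   # first mark with key >= lo
    begin = max(first_ge_lo - 1, 0)        # the previous granule may also reach lo
    end = bisect_right(marks, hi)          # first mark with key > hi
    return begin, end

marks = [("a", 0), ("a", 50), ("foo", 0), ("foo", 40), ("foo", 90), ("z", 10)]
print(find_mark_range(marks, ("foo", 0), ("foo", 100)))   # prints (1, 5)
```

Both boundaries come from a binary search, so the cost grows with the logarithm of the number of marks rather than with the number of matching marks.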

To engage this optimization, the query must:

* Constrain a prefix of the primary key columns.
* Have only range or single-element set constraints on those columns.
* Have only AND as a boolean operator.

Consider a table with `(service, timestamp)` as the primary key.

The following conditions will be optimized:

* `service = 'foo'`
* `service = 'foo' and timestamp >= now() - 3600`
* `service in ('foo')`
* `service in ('foo') and timestamp >= now() - 3600 and timestamp <= now()`

The following will fall back to the previous lookup algorithm:

* `timestamp >= now() - 3600`
* `service in ('foo', 'bar') and timestamp >= now() - 3600`
* `service = 'foo' or timestamp >= now() - 3600`

Note that the optimization won't engage when PK has a range expression
followed by a point expression, since in that case the range is not continuous.
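
The rules above can be sketched as a small eligibility check. This is a hypothetical model, not the ClickHouse code: `can_use_inclusion_search` and the `"point"`, `"range"` and `"multi_set"` labels are invented for illustration, constraints on non-key columns are ignored, and rejecting any constraint after a range (not just a point) is an assumption that extends the same continuity argument.

```python
def can_use_inclusion_search(primary_key, constraints, only_and=True):
    """primary_key: column names in key order.
    constraints: column -> "point", "range" or "multi_set" (hypothetical labels).
    only_and: True when the condition combines its parts with AND only.
    """
    if not only_and:
        return False
    seen_gap = False     # an unconstrained key column has been passed
    seen_range = False   # a range constraint has been passed
    matched = 0
    for column in primary_key:
        kind = constraints.get(column)
        if kind is None:
            seen_gap = True
            continue
        if seen_gap:
            return False             # constrained columns do not form a key prefix
        if kind not in ("point", "range"):
            return False             # e.g. a set with more than one element
        if seen_range:
            return False             # assumption: nothing may follow a range
        seen_range = kind == "range"
        matched += 1
    return matched > 0

pk = ("service", "timestamp")
print(can_use_inclusion_search(pk, {"service": "point", "timestamp": "range"}))
print(can_use_inclusion_search(pk, {"timestamp": "range"}))
```

The first call models `service = 'foo' and timestamp >= now() - 3600` and is accepted. The second models `timestamp >= now() - 3600` alone and is rejected because `service` is unconstrained.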

Trace query logging provides the following types of messages,
each representing a different kind of PK usage for a part:

```
Used optimized inclusion search over index for part 20200711_5710108_5710108_0 with 9 steps
Used generic exclusion search over index for part 20200711_5710118_5710228_5 with 1495 steps
Not using index on part 20200710_5710473_5710473_0
```

The number of steps reflects the computational cost of the lookup.
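
To compare step counts across parts, the trace lines can be parsed with a short script. This is a sketch that assumes the messages keep exactly the wording shown above. The helper name `parse_pk_usage` is hypothetical.

```python
import re

SEARCH_RE = re.compile(
    r"Used (optimized inclusion|generic exclusion) search over index "
    r"for part (\S+) with (\d+) steps"
)
NO_INDEX_RE = re.compile(r"Not using index on part (\S+)")

def parse_pk_usage(line):
    """Return (kind, part, steps) for a recognized trace line, else None."""
    m = SEARCH_RE.search(line)
    if m:
        return m.group(1), m.group(2), int(m.group(3))
    m = NO_INDEX_RE.search(line)
    if m:
        return "no index", m.group(1), None   # no step count is logged in this case
    return None

line = "Used generic exclusion search over index for part 20200711_5710118_5710228_5 with 1495 steps"
print(parse_pk_usage(line))
```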

Here's a before-and-after comparison for a query over 24h of data:

```
Read 4562944 rows, 148.05 MiB in 45.19249672 sec.,   100966 rows/sec.,   3.28 MiB/sec.
Read 4183040 rows, 135.78 MiB in 0.196279627 sec., 21311636 rows/sec., 691.75 MiB/sec.
```
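
The two lines above work out to roughly 230 times less wall-clock time. The row counts differ slightly between the runs, so the throughput ratio is a little lower. A quick check of the arithmetic:

```python
# Figures copied from the two "Read ... rows" lines above.
before_rows, before_sec = 4562944, 45.19249672
after_rows, after_sec = 4183040, 0.196279627

before_rate = before_rows / before_sec   # rows per second before the change
after_rate = after_rows / after_sec      # rows per second after the change

time_speedup = before_sec / after_sec
rate_speedup = after_rate / before_rate

print(f"wall-clock speedup: {time_speedup:.0f}x")
print(f"throughput speedup: {rate_speedup:.0f}x")
```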

This is especially useful for queries that read data in order
and terminate early to return "last X things" matching a query.

See #11564 for more thoughts on this.
2020-07-11 12:26:54 -07:00
Vitaly Baranov
fe6122a1dd
Merge pull request #12394 from vitlibar/fix-calculating-implicit-access-rights
Fix calculating implicit access rights
2020-07-11 19:31:56 +03:00
Alexander Kuzmenkov
10b0287fa4
Merge pull request #12341 from ClickHouse/aku/perfect-visitor
More perfect forwarding in field visitors
2020-07-11 13:36:37 +03:00
alexey-milovidov
c615ea658b
Merge pull request #12400 from vitlibar/fix-bad_typeid
Fix std::bad_typeid when JSON functions called with argument of wrong type
2020-07-11 05:40:44 +03:00
alexey-milovidov
7bc2c83b80
Merge pull request #12372 from s-mx/issue-10429-add_setting_for_ascii_grid_symbols
add setting for ascii grid symbols
2020-07-11 05:31:41 +03:00
alexey-milovidov
4b2da605da
Update PrettyCompactBlockOutputFormat.cpp 2020-07-11 00:32:21 +03:00
alexey-milovidov
ca0591320d
Update PrettyBlockOutputFormat.cpp 2020-07-11 00:31:25 +03:00
alexey-milovidov
88e9003c35
Update DataTypeNullable.cpp 2020-07-11 00:29:07 +03:00
alexey-milovidov
7c2bd32c9c
Merge pull request #12401 from ClickHouse/fix_segfault_in_storage_merge
Fix another Context-related segfault
2020-07-11 00:19:00 +03:00
Maxim Sabyanin
40f7ec71d3 add setting output_format_pretty_grid_charset
This setting allows choosing the charset for printing grids (either UTF-8 or
ASCII).
2020-07-10 22:25:49 +03:00
Artem Zuikov
6b26842ce5
RIGHT and FULL JOIN for MergeJoin (#12118) 2020-07-10 21:10:06 +03:00
alexey-milovidov
e22547c29d
Merge pull request #12388 from ClickHouse/bloom-filter-arg-check
Check arguments of bloom filter index
2020-07-10 20:54:16 +03:00
alexey-milovidov
caef1d8e24
Update MergeTreeIndexFullText.cpp 2020-07-10 20:53:58 +03:00
alexey-milovidov
2d9e0ec049
Merge pull request #12376 from ClickHouse/fix-totals-state-2
Fix TOTALS/ROLLUP/CUBE for aggregate functions with -State and Nullable arguments
2020-07-10 20:18:48 +03:00
alexey-milovidov
d819624d7c
Merge pull request #12378 from ClickHouse/allow-clear-column-with-dependencies
Allow to CLEAR column even if there are depending DEFAULT expressions
2020-07-10 20:18:14 +03:00
alexey-milovidov
031c773260
Merge pull request #12384 from ClickHouse/support-negative-float-constants-in-key-condition
Avoid exception when negative or floating point constant is used in WHERE condition for indexed tables
2020-07-10 20:16:35 +03:00
Vitaly Baranov
30e3d61b01 Fix calculating implicit access rights. 2020-07-10 17:16:43 +03:00
Vitaly Baranov
94c858b2dc Fix std::bad_typeid when JSON functions called with argument of wrong type. 2020-07-10 17:12:57 +03:00
Vitaly Baranov
3a0d358694 Allow typeid_cast() to cast nullptr to nullptr. 2020-07-10 17:02:48 +03:00
Alexander Tokmakov
20d95a21fc fix another context-related segfault 2020-07-10 17:00:44 +03:00
Artem Zuikov
01b5c2663c
Delete injective functions inside uniq (#12337) 2020-07-10 13:42:41 +03:00
Alexey Milovidov
47eaffbe63 Additional checks 2020-07-10 11:21:40 +03:00
Alexey Milovidov
4b86f36d37 Check arguments of bloom filter index 2020-07-10 11:13:21 +03:00
alesapin
5cae87e664
Merge pull request #12335 from ClickHouse/fix_alter_exit_codes
Fix alter rename error messages
2020-07-10 11:05:20 +03:00
Alexey Milovidov
d543a75f65 Allow to parse operator NOT as a function #12262 2020-07-10 09:48:05 +03:00
Alexey Milovidov
276b3a0215 Avoid exception when negative or floating point constant is used in WHERE condition for indexed tables #11905 2020-07-10 09:30:49 +03:00
Alexey Milovidov
a4b35a8a6f Allow to CLEAR column even if there are depending DEFAULT expressions #12333 2020-07-10 08:54:35 +03:00
Alexey Milovidov
df2c7fec24 Add comment 2020-07-10 08:42:09 +03:00
alexey-milovidov
c16d8e094b
Merge pull request #12308 from ClickHouse/fix-codec-bad-exception-code
Fix wrong exception code in codecs Delta, DoubleDelta #12110
2020-07-10 08:40:46 +03:00
alexey-milovidov
6479e2b406
Update FieldVisitors.h 2020-07-10 08:37:43 +03:00
Alexey Milovidov
70273725d5 Fix error 2020-07-10 08:30:54 +03:00
Alexey Milovidov
c610a4b0a8 Fix error with ownership of aggregate function states with nested states 2020-07-10 08:28:34 +03:00
alexey-milovidov
c5ebf596c8
Merge pull request #12315 from ClickHouse/fix-race-condition-replicated-merge-tree-queue
Fix race condition in ReplicatedMergeTreeQueue
2020-07-10 08:11:56 +03:00
alexey-milovidov
8d7e418617
Merge pull request #12314 from BohuTANG/mysql_select_database
Support MySQL 'SELECT DATABASE()'
2020-07-10 06:32:04 +03:00
Alexey Milovidov
12e00411b4 Fix TOTALS/ROLLUP/CUBE for aggregate functions with -State and Nullable arguments #12163 2020-07-10 06:23:42 +03:00
Alexey Milovidov
f252dd94c8 Miscellaneous 2020-07-10 05:17:15 +03:00
Alexey Milovidov
fcdcb3cb1e Remove useless code 2020-07-10 04:58:27 +03:00
Alexey Milovidov
afc00fa0b8 Merge branch 'master' into fix-codec-bad-exception-code 2020-07-10 04:12:24 +03:00
Alexey Milovidov
7fc90aa070 Fix error 2020-07-10 02:45:29 +03:00
Alexey Milovidov
ad6fcd57b2 Merge branch 'master' into fix-race-condition-replicated-merge-tree-queue 2020-07-10 02:21:24 +03:00
Anton Popov
c4767557f2
Merge pull request #12306 from CurtizJ/fix-with-fill
Fix order of columns in WITH FILL modifier
2020-07-10 00:12:11 +03:00
Alexander Kuzmenkov
0001a08081 More perfect forwarding in field visitors 2020-07-09 19:16:33 +03:00
Alexey Milovidov
1b3d389135 Revert unrelated changes 2020-07-09 17:45:53 +03:00
alexey-milovidov
bb5247ea9d
Merge pull request #12313 from ClickHouse/sanitizer-trap-log-from-separate-thread
Log sanitizer trap messages from separate thread
2020-07-09 17:40:55 +03:00
alesapin
0156f43ed3 Human readable errors in alter rename queries 2020-07-09 17:30:38 +03:00
BohuTANG
260bcb9d79 Add integration test for mysql replacement query 2020-07-09 22:20:54 +08:00
alesapin
9dea4ab323 Initial version 2020-07-09 17:14:44 +03:00
Vladimir Chebotarev
faf6be6576
Implemented single part uploads for DiskS3 (#12026)
* Implemented single part uploads for DiskS3.
* Added `min_multi_part_upload_size` to disk configuration.
2020-07-09 17:09:17 +03:00
alesapin
47f05dcadd
Merge pull request #12304 from CurtizJ/fix-ttl-rename
Fix TTL after renaming column.
2020-07-09 13:06:27 +03:00
Alexey Milovidov
380f748358 Fix issues 2020-07-09 08:08:41 +03:00