Commit Graph

199 Commits

Author SHA1 Message Date
Alexey Milovidov
8dfa933028 Amend 2021-01-25 23:48:10 +03:00
Alexey Milovidov
9ee5c1535e Allow to disable checksums on read 2021-01-25 23:29:04 +03:00
Azat Khuzhin
1c364b6ee3 Fix SIGSEGV with merge_tree_min_rows_for_concurrent_read/merge_tree_min_bytes_for_concurrent_read=0/UINT64_MAX
With 0 or an overly large value, it tries to read nonexistent marks
and fails with:

    Logical error: 'Trying to get non existing mark 11936128518282651045, while size is 2'.
2021-01-24 14:39:57 +03:00
Pavel Kruglov
900580af02 Add parallel select when there is one part with level>0 in select final 2021-01-21 20:34:50 +03:00
Amos Bird
a3d19fa64d
Correctly override default settings remotely 2021-01-08 12:28:09 +08:00
Amos Bird
0260953a47
better 2021-01-06 17:18:48 +08:00
Amos Bird
a157a5b3b3
add max_partitions_to_read setting 2021-01-04 12:40:48 +08:00
Nikolai Kochetov
46f70dd0de Merge branch 'master' into actions-dag-f14 2020-11-12 11:54:44 +03:00
tavplubix
058aa8f85e
Merge pull request #16824 from ClickHouse/replace_stringstreams_with_buffers
Replace std::*stringstreams with DB::*Buffers
2020-11-12 01:11:44 +03:00
Nikolai Kochetov
1846bb3cac Merge branch 'master' into actions-dag-f14 2020-11-11 13:08:57 +03:00
Alexander Tokmakov
b94cc5c4e5 remove more stringstreams 2020-11-10 21:22:26 +03:00
Nikolai Kochetov
c6575c9032 Update ExpressionActions constructor 2020-11-10 19:27:55 +03:00
Nikolai Kochetov
07fe3a6347 Fix build. 2020-11-10 15:14:05 +03:00
Nikolai Kochetov
195c941c4e Merge branch 'master' into storage-read-query-plan 2020-11-10 15:02:22 +03:00
Nikolai Kochetov
363c1e05c0 Try fix tests. 2020-11-10 12:35:05 +03:00
Nikolai Kochetov
6717c7a0af Merge branch 'master' into actions-dag-f14 2020-11-09 14:57:48 +03:00
alexey-milovidov
0e6ae4aff7
Merge pull request #16253 from amosbird/pf
Prune partition in verbatim way.
2020-11-08 18:58:02 +03:00
Alexey Milovidov
fd84d16387 Fix "server failed to start" error 2020-11-07 03:14:53 +03:00
alexey-milovidov
7fb53b205c
Merge pull request #16637 from azat/mt-read_in_order-spread-fix
Fix spreading for ReadInOrderOptimizer with expression in ORDER BY
2020-11-06 17:36:03 +03:00
Nikolai Kochetov
c10f733587 Merge branch 'master' into storage-read-query-plan 2020-11-06 15:43:46 +03:00
Nikolai Kochetov
9aeb757da4 Merge branch 'master' into actions-dag-f14 2020-11-06 15:04:20 +03:00
Amos Bird
2b0085c106
Pruning is different from counting 2020-11-06 19:58:03 +08:00
Amos Bird
30bf5e6d26
Prune partition in verbatim way. 2020-11-06 09:56:13 +08:00
Alexey Milovidov
1bcf22d42f Fix 'max_parallel_replicas' without sampling. 2020-11-04 18:59:14 +03:00
Azat Khuzhin
2389406c21 Fix spreading for ReadInOrderOptimizer with expression in ORDER BY
This will fix optimize_read_in_order/optimize_aggregation_in_order with
max_threads>0 and expression in ORDER BY
2020-11-04 07:07:26 +03:00
Nikolai Kochetov
54a9b80a11 Fix build 2020-11-03 22:30:58 +03:00
Nikolai Kochetov
6767a226fc Merge branch 'master' into actions-dag-f14 2020-11-03 15:21:06 +03:00
Nikolai Kochetov
07a7c46b89 Refactor ExpressionActions [Part 3] 2020-11-03 14:28:28 +03:00
Anton Popov
a3a8e18637
Merge branch 'master' into select_final 2020-11-03 00:00:43 +03:00
Nikolai Kochetov
1c106691b5
Merge pull request #16423 from amosbird/jbodread
Balanced reading from JBOD
2020-10-29 19:22:45 +03:00
Amos Bird
f995ef9797
Balanced reading from JBOD 2020-10-29 04:05:07 +08:00
Mikhail Filimonov
41971e073a
Fix typos reported by codespell 2020-10-27 12:04:03 +01:00
Pavel Kruglov
89fdeb4e15 Fix style, move setting and add checking level>0 2020-10-21 20:35:31 +03:00
Pavel Kruglov
f5fac575f4 don't postprocess single parts 2020-10-15 15:22:41 +03:00
Pavel Kruglov
44c2b138f3 Fix style 2020-10-13 22:53:36 +03:00
Pavel Kruglov
be0cb31d21 Add tests and comments 2020-10-13 21:55:03 +03:00
Pavel Kruglov
8200bab859 Add setting do_not_merge_across_partitions 2020-10-13 17:54:52 +03:00
Nikolai Kochetov
7e58f99f64 Merge branch 'master' into storage-read-query-plan 2020-10-12 13:12:39 +03:00
Nikolai Kochetov
76a04fb4b4
Merge pull request #15762 from ClickHouse/new-block-for-functions
Use `ColumnsWithTypeAndName` instead of `Block` for function calls
2020-10-10 08:50:38 +03:00
Nikolai Kochetov
a7fb2e38a5 Use ColumnWithTypeAndName as function argument instead of Block. 2020-10-09 10:41:28 +03:00
Azat Khuzhin
75e612fc16 Use full featured parser for force_data_skipping_indices 2020-10-07 01:44:14 +03:00
Azat Khuzhin
ef6d12967f Implement force_data_skipping_indices setting 2020-10-07 01:42:31 +03:00
Nikolai Kochetov
f9bf1e3406 Merge branch 'master' into storage-read-query-plan 2020-10-06 09:50:10 +03:00
Azat Khuzhin
21deb6812c
Drop unused code for numeric_limits<int128> in MergeTreeDataSelectExecutor (#15519) 2020-10-02 16:46:20 +03:00
Nikolai Kochetov
ea131989be Try fix test. 2020-10-01 21:47:20 +03:00
Nikolai Kochetov
ec64def384 Use QueryPlan while reading from MergeTree. 2020-10-01 20:34:22 +03:00
Amos Bird
81d08b59e5
Replace useless multiset with unordered_set 2020-09-25 16:38:09 +08:00
filimonov
cc24ef9f83
Better debug message from MergeTreeDataSelectExecutor
See #15168
2020-09-22 21:35:29 +02:00
alexey-milovidov
85483f8532
Merge pull request #14853 from ClickHouse/akz/optimized_index_binary_search
Optimized marks selection algorithm for continuous marks ranges
2020-09-20 19:48:45 +03:00
Alexander Kazakov
1ee2e3d2b3 Review fix 2020-09-18 16:03:48 +03:00
roman
ddca262fe6 fix review comments 2020-09-17 20:54:21 +01:00
roman
b41421cb1c [settings]: introduce new query complexity settings for leaf-nodes
The new settings make it possible to control query complexity on leaf nodes,
excluding the final merging stage on the root node. For example, a distributed
query that reads 1k rows from each of 5 shards will breach `max_rows_to_read=5000`,
although each shard effectively reads only 1k rows. With `max_rows_to_read_leaf=1500`
the limit won't be reached and the query will succeed, since every shard reads
no more than ~1k rows.
2020-09-17 10:37:05 +01:00
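
The arithmetic in the commit message above can be sketched as follows. This is an illustrative Python model, not ClickHouse code: the function `check_limits` is hypothetical, the per-shard row counts are made up (roughly 1k each), and the model assumes a limit is breached only when the count strictly exceeds it. Only the setting names `max_rows_to_read` and `max_rows_to_read_leaf` come from the message.

```python
def check_limits(rows_per_shard, max_rows_to_read=0, max_rows_to_read_leaf=0):
    """Return the names of the limits that would be exceeded (0 means 'no limit')."""
    breached = []
    # Root-level limit: compared against the total rows read across all shards.
    if max_rows_to_read and sum(rows_per_shard) > max_rows_to_read:
        breached.append("max_rows_to_read")
    # Leaf-level limit: compared against each shard's own row count.
    if max_rows_to_read_leaf and any(r > max_rows_to_read_leaf for r in rows_per_shard):
        breached.append("max_rows_to_read_leaf")
    return breached


# Hypothetical workload: five shards, each reading about 1k rows.
rows = [1100, 1050, 980, 1020, 1010]
root_result = check_limits(rows, max_rows_to_read=5000)
leaf_result = check_limits(rows, max_rows_to_read_leaf=1500)
```

In this model the total across shards trips the root-level limit, while the per-shard limit is never reached because no single shard reads more than 1,500 rows.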
Alexander Kazakov
7465e00163 Optimized marks selection algorithm for continuous marks ranges 2020-09-15 17:22:32 +03:00
Nikolai Kochetov
92c937db8b Remove CreatingSetsBlockInputStream 2020-09-02 16:13:13 +03:00
Amos Bird
591a4d60d4
Fix bug in mark inclusion search. 2020-08-29 09:46:46 +08:00
Amos Bird
078b14610d
ALTER MODIFY SAMPLE BY 2020-08-27 22:31:30 +08:00
Artem Zuikov
becc186c91
Add support for extended precision integers and decimals (#13097) 2020-08-19 14:52:17 +03:00
alexey-milovidov
23ccb0b6be
Merge pull request #13677 from hagen1778/merge-tree-fail-fast-on-rows-limit
[mergeTree]: fail fast if max_rows_to_read limit exceeded on parts scan
2020-08-18 22:24:39 +03:00
roman
35e28b4c6b [mergeTree]: make exception message more clear 2020-08-17 09:52:04 +01:00
roman
b637699ccd [mergeTree]: fail fast if max_rows_to_read limit exceeded on parts scan
The motivation behind this change is to skip the range scan for all selected parts
once it is clear that `max_rows_to_read` is already exceeded. The change is quite
noticeable for queries over a large number of parts.
2020-08-14 13:53:48 +01:00
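
The fail-fast idea described above can be sketched as below. This is a simplified Python model under stated assumptions, not the actual implementation: `scan_parts`, `TooManyRows`, and the `select_ranges` callback are hypothetical names, and ranges are expressed directly in rows rather than in marks.

```python
class TooManyRows(Exception):
    """Raised when the running row total exceeds the configured limit."""


def scan_parts(parts, select_ranges, max_rows_to_read):
    """Select ranges part by part, aborting as soon as the limit is exceeded."""
    total_rows = 0
    selected = []
    for part in parts:
        ranges = select_ranges(part)  # stands in for the costly per-part index analysis
        total_rows += sum(end - begin for begin, end in ranges)
        if max_rows_to_read and total_rows > max_rows_to_read:
            # Remaining parts are never analyzed: this is the fail-fast path.
            raise TooManyRows(f"{total_rows} rows selected, limit is {max_rows_to_read}")
        selected.append((part, ranges))
    return selected


# Hypothetical usage: each part contributes 1000 rows and the limit is 1500.
calls = []


def fake_select_ranges(part):
    calls.append(part)
    return [(0, 1000)]


try:
    scan_parts(["p1", "p2", "p3", "p4"], fake_select_ranges, max_rows_to_read=1500)
    failed = False
except TooManyRows:
    failed = True
```

Here only `p1` and `p2` are analyzed before the exception is raised; `p3` and `p4` are skipped entirely, which is where the saving comes from when there are many parts.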
Nikolai Kochetov
9b67cd9faf Merge branch 'master' into refactor-pipes-3 2020-08-10 10:50:17 +03:00
Alexey Milovidov
edd89a8610 Fix half of typos 2020-08-08 03:47:03 +03:00
Alexey Milovidov
0ac3f0481f Probably fix error 2020-08-07 04:27:29 +03:00
Nikolai Kochetov
20e63d2271 Refactor Pipe [part 6] 2020-08-06 15:24:05 +03:00
Nikolai Kochetov
09fbce1b1e Merge branch 'master' into refactor-pipes-3 2020-08-04 11:32:34 +03:00
Nikolai Kochetov
2cca4d5fcf Refactor Pipe [part 2]. 2020-08-03 16:54:14 +03:00
Nikolai Kochetov
e411916bde Refactor Pipe [part 1]. 2020-08-03 14:33:11 +03:00
Azat Khuzhin
e37c42c56c Fix logging in MergeTreeDataSelectExecutor for multiple threads (attach to thread group) 2020-08-02 13:40:01 +03:00
Azat Khuzhin
101217470e Use "Not using primary index on part" over "Not using index on part" (add "primary") 2020-08-02 13:40:01 +03:00
Nikolai Kochetov
39530f837e Remove TreeExecutorBlockInputStream. 2020-07-31 16:23:19 +03:00
Alexander Kuzmenkov
1b9269ae0c fixup 2020-07-28 19:58:19 +03:00
Alexander Kuzmenkov
297cf65f1f Merge remote-tracking branch 'origin/master' into HEAD 2020-07-28 19:56:35 +03:00
Alexander Kuzmenkov
ba7c33f806
Merge pull request #12754 from bobrik/ivan/obvious-skip
Show total granules examined by skipping indices
2020-07-28 17:14:25 +03:00
Ivan Babrou
e835ec0b56 Show marks before applying skipping indices
This change makes skipping index efficiency more obvious, changing this:

```
Selected 30 parts by date, 30 parts by key, 592 marks to read from 541 ranges
```

Into this:

```
Selected 30 parts by date, 30 parts by key, 48324 marks by primary key, 592 marks to read from 541 ranges
```
2020-07-24 15:45:38 -07:00
Ivan Babrou
67d4529783 Show total granules examined by skipping indices
This change makes skipping index efficiency more obvious, changing this:

```
Index `idx_duration` has dropped 59 granules.
```

Into this:

```
Index `idx_duration` has dropped 59 / 61 granules.
```
2020-07-24 14:50:32 -07:00
Nikolai Kochetov
dad9d369a1 Merge branch 'master' into bobrik-parallel-randes 2020-07-23 16:21:32 +03:00
Artem Zuikov
2afd123eda
Refactoring: extract TreeOptimizer from SyntaxAnalyzer (#12645) 2020-07-22 20:13:05 +03:00
Nikolai Kochetov
b27066389a Do not create ThreadPool for single thread. 2020-07-22 14:51:35 +03:00
Nikolai Kochetov
486a4932c3 Fix tests. 2020-07-21 17:08:18 +03:00
Nikolai Kochetov
0cc55781d8 Try fix tests. 2020-07-20 18:09:00 +03:00
Ivan Babrou
72622a9b00 Parallelize PK range and skipping index stages
This runs PK lookup and skipping index stages on parts
in parallel, as described in #11564.

While #12277 sped up PK lookups, skipping index stage
may still be a bottleneck in a select query. Here we
parallelize both stages between parts.

On a query that uses a bloom filter skipping index to pick
2,688 rows out of 8,273,114,994 on a two day time span,
this change reduces latency from 10.5s to 1.5s.
2020-07-19 21:49:41 -07:00
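
The structure of the change above, running both stages per part and spreading parts across threads, might look roughly like this. It is a hedged Python sketch, not the ClickHouse code: `analyze_part`, `analyze_parts`, `pk_lookup`, `apply_skip_indices`, and `num_threads` are all hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_part(part, pk_lookup, apply_skip_indices):
    """Run both stages for one part: PK lookup first, then skipping indices."""
    ranges = pk_lookup(part)
    return part, apply_skip_indices(part, ranges)


def analyze_parts(parts, pk_lookup, apply_skip_indices, num_threads=4):
    """Analyze every part, spreading parts across threads; output keeps input order."""
    if num_threads <= 1 or len(parts) <= 1:
        # Nothing to parallelize, so avoid creating a pool at all.
        return [analyze_part(p, pk_lookup, apply_skip_indices) for p in parts]
    with ThreadPoolExecutor(max_workers=min(num_threads, len(parts))) as pool:
        return list(pool.map(lambda p: analyze_part(p, pk_lookup, apply_skip_indices), parts))
```

Each part is independent of the others, so the per-part work needs no shared state, and `pool.map` returns results in the same order as the input parts.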
Ivan Babrou
d9d8d0242e Optimize PK lookup for queries that match exact PK range
Existing code that looks up marks that match the query has a pathological
case, when most of the part does in fact match the query.

The code works by recursively splitting a part into ranges and then discarding
the ranges that definitely do not match the query, based on primary key.

The problem is that it requires visiting every mark that matches the query,
making the complexity of this sort of lookup O(n).

For queries that match an exact range on the primary key, we can find
both the left and right ends of the range with O(log n) complexity.

This change implements exactly that.

To engage this optimization, the query must:

* Have a prefix list of the primary key.
* Have only range or single set element constraints for columns.
* Have only AND as a boolean operator.

Consider a table with `(service, timestamp)` as the primary key.

The following conditions will be optimized:

* `service = 'foo'`
* `service = 'foo' and timestamp >= now() - 3600`
* `service in ('foo')`
* `service in ('foo') and timestamp >= now() - 3600 and timestamp <= now()`

The following will fall back to previous lookup algorithm:

* `timestamp >= now() - 3600`
* `service in ('foo', 'bar') and timestamp >= now() - 3600`
* `service = 'foo'`

Note that the optimization won't engage when PK has a range expression
followed by a point expression, since in that case the range is not continuous.

Trace query logging provides the following types of messages,
each representing a different kind of PK usage for a part:

```
Used optimized inclusion search over index for part 20200711_5710108_5710108_0 with 9 steps
Used generic exclusion search over index for part 20200711_5710118_5710228_5 with 1495 steps
Not using index on part 20200710_5710473_5710473_0
```

Number of steps translates to computational complexity.

Here's a comparison for before and after for a query over 24h of data:

```
Read 4562944 rows, 148.05 MiB in 45.19249672 sec.,   100966 rows/sec.,   3.28 MiB/sec.
Read 4183040 rows, 135.78 MiB in 0.196279627 sec., 21311636 rows/sec., 691.75 MiB/sec.
```

This is especially useful for queries that read data in order
and terminate early to return "last X things" matching a query.

See #11564 for more thoughts on this.
2020-07-11 12:26:54 -07:00
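
The inclusion search described in the commit above can be illustrated with two binary searches over a sorted index. This is a minimal Python sketch under simplifying assumptions, not the code from the commit: it uses a single integer key column, `index[i]` is assumed to hold the first key of granule `i`, a granule is assumed to cover keys from `index[i]` up to and including `index[i + 1]`, and the name `inclusion_search` is hypothetical.

```python
from bisect import bisect_left, bisect_right


def inclusion_search(index, lo, hi):
    """Return the half-open mark range [begin, end) that may hold keys in [lo, hi].

    `index` is the sorted list of first keys per granule. Both ends are found
    by binary search, so the number of steps grows logarithmically with len(index).
    """
    # First granule whose upper boundary can reach `lo`: one before the
    # first index entry that is >= lo (clamped to 0).
    begin = max(bisect_left(index, lo) - 1, 0)
    # Granules starting after `hi` cannot match, so stop at the first
    # index entry that is > hi.
    end = bisect_right(index, hi)
    return begin, end


# Hypothetical index with five granules.
index = [10, 20, 30, 40, 50]
marks = inclusion_search(index, lo=25, hi=35)
```

For this example the result covers granules 1 and 2, which span keys 20 to 40. The generic exclusion search would reach the same answer by repeatedly splitting the part and testing sub-ranges, which is what makes it linear when most of the part matches.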
Nikita Mikhaylov
d31ed58f01 done 2020-07-06 17:33:31 +03:00
Anton Popov
73676f5022
Improve performance of reading in order of sorting key. (#11696)
* simplify reading in order of sorting key

* add perf test for reading many parts

* Revert "simplify reading in order of sorting key"

This reverts commit 7267d7c46e.

* add threshold for preliminary merge for reading in order

* better threshold

* limit threads in test
2020-07-04 15:48:51 +03:00
Alexey Milovidov
8eed47857b Fix estimation of the number of marks for various thresholds 2020-06-25 23:20:22 +03:00
Alexey Milovidov
8872417d00 Respect direct_io/mmap settings while reading secondary indices 2020-06-25 22:31:54 +03:00
Alexey Milovidov
5608f15749 Revive mmap IO 2020-06-25 22:15:41 +03:00
alesapin
b1e8976df4 Merge with master 2020-06-22 12:04:27 +03:00
alesapin
4c0879ae30 Better logging in storages 2020-06-19 20:17:13 +03:00
alesapin
dffdece350 getColumns in StorageInMemoryMetadata (only compilable) 2020-06-17 19:39:58 +03:00
alesapin
33c27de54d Check methods in metadata 2020-06-17 17:32:25 +03:00
alesapin
1afdebeebd Primary key in storage metadata 2020-06-17 15:39:20 +03:00
alesapin
1da393b218 Sampling key in StorageInMemoryMetadata 2020-06-17 15:07:09 +03:00
alesapin
ba04d02f1e Compilable sorting key in metadata 2020-06-17 14:05:11 +03:00
alesapin
62f2c17a66 Secondary indices in StorageInMemoryMetadata 2020-06-17 12:38:47 +03:00
alesapin
71f99a274d Compilable getSampleBlockWithColumns in StorageInMemoryMetadata 2020-06-16 17:25:08 +03:00
Anton Popov
5c42408add
Merge pull request #9113 from dimarub2000/group_by_in_order_optimization
[WIP] Optimization of GROUP BY with respect to table sorting key.
2020-06-06 14:25:59 +03:00
Alexander Kuzmenkov
c7d9094a7a
Merge pull request #11259 from ClickHouse/consistent_metadata3
More consistent metadata for secondary indices
2020-06-03 12:23:21 +03:00
Nikolai Kochetov
53d12f5ab8 Try fix build. 2020-06-01 20:06:21 +03:00
Nikolai Kochetov
d25326e75c Try fix build. 2020-06-01 20:03:57 +03:00