ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-12-15 19:02:04 +00:00

Author	SHA1	Message	Date
Anton Popov	a3a8e18637	Merge branch 'master' into select_final	2020-11-03 00:00:43 +03:00
Nikolai Kochetov	1c106691b5	Merge pull request #16423 from amosbird/jbodread Balanced reading from JBOD	2020-10-29 19:22:45 +03:00
Amos Bird	f995ef9797	Balanced reading from JBOD	2020-10-29 04:05:07 +08:00
Mikhail Filimonov	41971e073a	Fix typos reported by codespell	2020-10-27 12:04:03 +01:00
Pavel Kruglov	89fdeb4e15	Fix style, move setting and add checking level>0	2020-10-21 20:35:31 +03:00
Pavel Kruglov	f5fac575f4	don't postprocess single parts	2020-10-15 15:22:41 +03:00
Pavel Kruglov	44c2b138f3	Fix style	2020-10-13 22:53:36 +03:00
Pavel Kruglov	be0cb31d21	Add tests and comments	2020-10-13 21:55:03 +03:00
Pavel Kruglov	8200bab859	Add setting do_not_merge_across_partitions	2020-10-13 17:54:52 +03:00
Nikolai Kochetov	76a04fb4b4	Merge pull request #15762 from ClickHouse/new-block-for-functions Use `ColumnsWithTypeAndName` instead of `Block` for function calls	2020-10-10 08:50:38 +03:00
Nikolai Kochetov	a7fb2e38a5	Use ColumnWithTypeAndName as function argument instead of Block.	2020-10-09 10:41:28 +03:00
Azat Khuzhin	75e612fc16	Use full featured parser for force_data_skipping_indices	2020-10-07 01:44:14 +03:00
Azat Khuzhin	ef6d12967f	Implement force_data_skipping_indices setting	2020-10-07 01:42:31 +03:00
Azat Khuzhin	21deb6812c	Drop unused code for numeric_limits<int128> in MergeTreeDataSelectExecutor (#15519 )	2020-10-02 16:46:20 +03:00
Amos Bird	81d08b59e5	Replace useless multiset with unordered_set	2020-09-25 16:38:09 +08:00
filimonov	cc24ef9f83	Better debug message from MergeTreeDataSelectExecutor See #15168	2020-09-22 21:35:29 +02:00
alexey-milovidov	85483f8532	Merge pull request #14853 from ClickHouse/akz/optimized_index_binary_search Optimized marks selection algorithm for continuous marks ranges	2020-09-20 19:48:45 +03:00
Alexander Kazakov	1ee2e3d2b3	Review fix	2020-09-18 16:03:48 +03:00
roman	ddca262fe6	fix review comments	2020-09-17 20:54:21 +01:00
roman	b41421cb1c	[settings]: introduce new query complexity settings for leaf-nodes The new setting should allow to control query complexity on leaf nodes excluding the final merging stage on the root-node. For example, distributed query that reads 1k rows from 5 shards will breach the `max_rows_to_read=5000`, while effectively every shard reads only 1k rows. With setting `max_rows_to_read_leaf=1500` this limit won't be reached and query will succeed since every shard reads not more that ~1k rows.	2020-09-17 10:37:05 +01:00
Alexander Kazakov	7465e00163	Optimized marks selection algorithm for continuous marks ranges	2020-09-15 17:22:32 +03:00
Nikolai Kochetov	92c937db8b	Remove CreatingSetsBlockInputStream	2020-09-02 16:13:13 +03:00
Amos Bird	591a4d60d4	Fix bug in mark inclusion search.	2020-08-29 09:46:46 +08:00
Amos Bird	078b14610d	ALTER MODIFY SAMPLE BY	2020-08-27 22:31:30 +08:00
Artem Zuikov	becc186c91	Add support for extended precision integers and decimals (#13097 )	2020-08-19 14:52:17 +03:00
alexey-milovidov	23ccb0b6be	Merge pull request #13677 from hagen1778/merge-tree-fail-fast-on-rows-limit [mergeTree]: fail fast if max_rows_to_read limit exceeded on parts scan	2020-08-18 22:24:39 +03:00
roman	35e28b4c6b	[mergeTree]: make exception message more clear	2020-08-17 09:52:04 +01:00
roman	b637699ccd	[mergeTree]: fail fast if max_rows_to_read limit exceeded on parts scan The motivation behind this change is to skip ranges scan for all selected parts if it is clear that `max_rows_to_read` is already exceeded. The change is quite noticeable for queries over big number of parts.	2020-08-14 13:53:48 +01:00
Nikolai Kochetov	9b67cd9faf	Merge branch 'master' into refactor-pipes-3	2020-08-10 10:50:17 +03:00
Alexey Milovidov	edd89a8610	Fix half of typos	2020-08-08 03:47:03 +03:00
Alexey Milovidov	0ac3f0481f	Probably fix error	2020-08-07 04:27:29 +03:00
Nikolai Kochetov	20e63d2271	Refactor Pipe [part 6]	2020-08-06 15:24:05 +03:00
Nikolai Kochetov	09fbce1b1e	Merge branch 'master' into refactor-pipes-3	2020-08-04 11:32:34 +03:00
Nikolai Kochetov	2cca4d5fcf	Refactor Pipe [part 2].	2020-08-03 16:54:14 +03:00
Nikolai Kochetov	e411916bde	Refactor Pipe [part 1].	2020-08-03 14:33:11 +03:00
Azat Khuzhin	e37c42c56c	Fix logging in MergeTreeDataSelectExecutor for multiple threads (attach to thread group)	2020-08-02 13:40:01 +03:00
Azat Khuzhin	101217470e	Use "Not using primary index on part" over "Not using index on part" (add "primary")	2020-08-02 13:40:01 +03:00
Nikolai Kochetov	39530f837e	Remove TreeExecutorBlockInputStream.	2020-07-31 16:23:19 +03:00
Alexander Kuzmenkov	1b9269ae0c	fixup	2020-07-28 19:58:19 +03:00
Alexander Kuzmenkov	297cf65f1f	Merge remote-tracking branch 'origin/master' into HEAD	2020-07-28 19:56:35 +03:00
Alexander Kuzmenkov	ba7c33f806	Merge pull request #12754 from bobrik/ivan/obvious-skip Show total granules examined by skipping indices	2020-07-28 17:14:25 +03:00
Ivan Babrou	e835ec0b56	Show marks before applying skipping indices This change makes skipping index efficiency more obvious, changing this: ``` Selected 30 parts by date, 30 parts by key, 592 marks to read from 541 ranges ``` Into this: ``` Selected 30 parts by date, 30 parts by key, 48324 marks by primary key, 592 marks to read from 541 ranges ```	2020-07-24 15:45:38 -07:00
Ivan Babrou	67d4529783	Show total granules examined by skipping indices This change makes skipping index efficiency more obvious, changing this: ``` Index `idx_duration` has dropped 59 granules. ``` Into this: ``` Index `idx_duration` has dropped 59 / 61 granules. ```	2020-07-24 14:50:32 -07:00
Nikolai Kochetov	dad9d369a1	Merge branch 'master' into bobrik-parallel-randes	2020-07-23 16:21:32 +03:00
Artem Zuikov	2afd123eda	Refactoring: extract TreeOptimizer from SyntaxAnalyzer (#12645 )	2020-07-22 20:13:05 +03:00
Nikolai Kochetov	b27066389a	Do not create ThreadPool for single thread.	2020-07-22 14:51:35 +03:00
Nikolai Kochetov	486a4932c3	Fix tests.	2020-07-21 17:08:18 +03:00
Nikolai Kochetov	0cc55781d8	Try fix tests.	2020-07-20 18:09:00 +03:00
Ivan Babrou	72622a9b00	Parallelize PK range and skipping index stages This runs PK lookup and skipping index stages on parts in parallel, as described in #11564. While #12277 sped up PK lookups, skipping index stage may still be a bottleneck in a select query. Here we parallelize both stages between parts. On a query that uses a bloom filter skipping index to pick 2,688 rows out of 8,273,114,994 on a two day time span, this change reduces latency from 10.5s to 1.5s.	2020-07-19 21:49:41 -07:00
Ivan Babrou	d9d8d0242e	Optimize PK lookup for queries that match exact PK range Existing code that looks up marks that match the query has a pathological case, when most of the part does in fact match the query. The code works by recursively splitting a part into ranges and then discarding the ranges that definitely do not match the query, based on primary key. The problem is that it requires visiting every mark that matches the query, making the complexity of this sort of look up O(n). For queries that match exact range on the primary key, we can find both left and right parts of the range with O(log 2) complexity. This change implements exactly that. To engage this optimization, the query must: * Have a prefix list of the primary key. * Have only range or single set element constraints for columns. * Have only AND as a boolean operator. Consider a table with `(service, timestamp)` as the primary key. The following conditions will be optimized: * `service = 'foo'` * `service = 'foo' and timestamp >= now() - 3600` * `service in ('foo')` * `service in ('foo') and timestamp >= now() - 3600 and timestamp <= now` The following will fall back to previous lookup algorithm: * `timestamp >= now() - 3600` * `service in ('foo', 'bar') and timestamp >= now() - 3600` * `service = 'foo'` Note that the optimization won't engage when PK has a range expression followed by a point expression, since in that case the range is not continuous. Trace query logging provides the following messages types of messages, each representing a different kind of PK usage for a part: ``` Used optimized inclusion search over index for part 20200711_5710108_5710108_0 with 9 steps Used generic exclusion search over index for part 20200711_5710118_5710228_5 with 1495 steps Not using index on part 20200710_5710473_5710473_0 ``` Number of steps translates to computational complexity. Here's a comparison for before and after for a query over 24h of data: ``` Read 4562944 rows, 148.05 MiB in 45.19249672 sec., 100966 rows/sec., 3.28 MiB/sec. Read 4183040 rows, 135.78 MiB in 0.196279627 sec., 21311636 rows/sec., 691.75 MiB/sec. ``` This is especially useful for queries that read data in order and terminate early to return "last X things" matching a query. See #11564 for more thoughts on this.	2020-07-11 12:26:54 -07:00

1 2 3

117 Commits