ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-12-15 19:02:04 +00:00

Author	SHA1	Message	Date
Nikolai Kochetov	39530f837e	Remove TreeExecutorBlockInputStream.	2020-07-31 16:23:19 +03:00
Alexander Kuzmenkov	1b9269ae0c	fixup	2020-07-28 19:58:19 +03:00
Alexander Kuzmenkov	297cf65f1f	Merge remote-tracking branch 'origin/master' into HEAD	2020-07-28 19:56:35 +03:00
Alexander Kuzmenkov	ba7c33f806	Merge pull request #12754 from bobrik/ivan/obvious-skip Show total granules examined by skipping indices	2020-07-28 17:14:25 +03:00
Ivan Babrou	e835ec0b56	Show marks before applying skipping indices This change makes skipping index efficiency more obvious, changing this: ``` Selected 30 parts by date, 30 parts by key, 592 marks to read from 541 ranges ``` Into this: ``` Selected 30 parts by date, 30 parts by key, 48324 marks by primary key, 592 marks to read from 541 ranges ```	2020-07-24 15:45:38 -07:00
Ivan Babrou	67d4529783	Show total granules examined by skipping indices This change makes skipping index efficiency more obvious, changing this: ``` Index `idx_duration` has dropped 59 granules. ``` Into this: ``` Index `idx_duration` has dropped 59 / 61 granules. ```	2020-07-24 14:50:32 -07:00
Nikolai Kochetov	dad9d369a1	Merge branch 'master' into bobrik-parallel-randes	2020-07-23 16:21:32 +03:00
Artem Zuikov	2afd123eda	Refactoring: extract TreeOptimizer from SyntaxAnalyzer (#12645)	2020-07-22 20:13:05 +03:00
Nikolai Kochetov	b27066389a	Do not create ThreadPool for single thread.	2020-07-22 14:51:35 +03:00
Nikolai Kochetov	486a4932c3	Fix tests.	2020-07-21 17:08:18 +03:00
Nikolai Kochetov	0cc55781d8	Try fix tests.	2020-07-20 18:09:00 +03:00
Ivan Babrou	72622a9b00	Parallelize PK range and skipping index stages This runs PK lookup and skipping index stages on parts in parallel, as described in #11564. While #12277 sped up PK lookups, skipping index stage may still be a bottleneck in a select query. Here we parallelize both stages between parts. On a query that uses a bloom filter skipping index to pick 2,688 rows out of 8,273,114,994 on a two day time span, this change reduces latency from 10.5s to 1.5s.	2020-07-19 21:49:41 -07:00
Ivan Babrou	d9d8d0242e	Optimize PK lookup for queries that match exact PK range
Existing code that looks up marks that match the query has a pathological case, when most of the part does in fact match the query. The code works by recursively splitting a part into ranges and then discarding the ranges that definitely do not match the query, based on primary key. The problem is that it requires visiting every mark that matches the query, making the complexity of this sort of lookup O(n).
For queries that match an exact range on the primary key, we can find both the left and right ends of the range with O(log n) complexity. This change implements exactly that.
To engage this optimization, the query must:
* Have a prefix list of the primary key.
* Have only range or single set element constraints for columns.
* Have only AND as a boolean operator.
Consider a table with `(service, timestamp)` as the primary key. The following conditions will be optimized:
* `service = 'foo'`
* `service = 'foo' and timestamp >= now() - 3600`
* `service in ('foo')`
* `service in ('foo') and timestamp >= now() - 3600 and timestamp <= now`
The following will fall back to the previous lookup algorithm:
* `timestamp >= now() - 3600`
* `service in ('foo', 'bar') and timestamp >= now() - 3600`
* `service = 'foo'`
Note that the optimization won't engage when the PK has a range expression followed by a point expression, since in that case the range is not continuous.
Trace query logging provides the following types of messages, each representing a different kind of PK usage for a part:
```
Used optimized inclusion search over index for part 20200711_5710108_5710108_0 with 9 steps
Used generic exclusion search over index for part 20200711_5710118_5710228_5 with 1495 steps
Not using index on part 20200710_5710473_5710473_0
```
The number of steps translates to computational complexity.
Here's a before-and-after comparison for a query over 24h of data:
```
Read 4562944 rows, 148.05 MiB in 45.19249672 sec., 100966 rows/sec., 3.28 MiB/sec.
Read 4183040 rows, 135.78 MiB in 0.196279627 sec., 21311636 rows/sec., 691.75 MiB/sec.
```
This is especially useful for queries that read data in order and terminate early to return "last X things" matching a query. See #11564 for more thoughts on this.	2020-07-11 12:26:54 -07:00
Nikita Mikhaylov	d31ed58f01	done	2020-07-06 17:33:31 +03:00
Anton Popov	73676f5022	Improve performance of reading in order of sorting key. (#11696) * simplify reading in order of sorting key * add perf test for reading many parts * Revert "simplify reading in order of sorting key" This reverts commit `7267d7c46e`. * add threshold for preliminary merge for reading in order * better threshold * limit threads in test	2020-07-04 15:48:51 +03:00
Alexey Milovidov	8eed47857b	Fix estimation of the number of marks for various thresholds	2020-06-25 23:20:22 +03:00
Alexey Milovidov	8872417d00	Respect direct_io/mmap settings while reading secondary indices	2020-06-25 22:31:54 +03:00
Alexey Milovidov	5608f15749	Revive mmap IO	2020-06-25 22:15:41 +03:00
alesapin	b1e8976df4	Merge with master	2020-06-22 12:04:27 +03:00
alesapin	4c0879ae30	Better logging in storages	2020-06-19 20:17:13 +03:00
alesapin	dffdece350	getColumns in StorageInMemoryMetadata (only compilable)	2020-06-17 19:39:58 +03:00
alesapin	33c27de54d	Check methods in metadata	2020-06-17 17:32:25 +03:00
alesapin	1afdebeebd	Primary key in storage metadata	2020-06-17 15:39:20 +03:00
alesapin	1da393b218	Sampling key in StorageInMemoryMetadata	2020-06-17 15:07:09 +03:00
alesapin	ba04d02f1e	Compilable sorting key in metadata	2020-06-17 14:05:11 +03:00
alesapin	62f2c17a66	Secondary indices in StorageInMemoryMetadata	2020-06-17 12:38:47 +03:00
alesapin	71f99a274d	Compileable getSampleBlockWithColumns in StorageInMemoryMetadata	2020-06-16 17:25:08 +03:00
Anton Popov	5c42408add	Merge pull request #9113 from dimarub2000/group_by_in_order_optimization [WIP] Optimization of GROUP BY with respect to table sorting key.	2020-06-06 14:25:59 +03:00
Alexander Kuzmenkov	c7d9094a7a	Merge pull request #11259 from ClickHouse/consistent_metadata3 More consistent metadata for secondary indices	2020-06-03 12:23:21 +03:00
Nikolai Kochetov	53d12f5ab8	Try fix build.	2020-06-01 20:06:21 +03:00
Nikolai Kochetov	d25326e75c	Try fix build.	2020-06-01 20:03:57 +03:00
alesapin	663e92b1c5	Rename to methods	2020-06-01 14:29:11 +03:00
alesapin	3847ea892d	Merge branch 'master' into consistent_metadata3	2020-06-01 13:17:59 +03:00
Alexey Milovidov	25f941020b	Remove namespace pollution	2020-05-31 00:57:37 +03:00
Dmitry	4b0d32f026	Merge branch 'master' of github.com:yandex/ClickHouse into group_by_in_order_optimization	2020-05-31 00:21:02 +03:00
alesapin	5f8b69547b	More readable code	2020-05-28 16:45:08 +03:00
alesapin	26a9829e04	Indices description as vector	2020-05-28 15:47:17 +03:00
alesapin	52ca6b2051	I'm able to build it	2020-05-28 15:37:05 +03:00
Nikolai Kochetov	da0052858d	Fix build.	2020-05-28 13:57:04 +03:00
Dmitry	41d1cd1c9b	fix bad merge	2020-05-27 01:17:32 +03:00
Dmitry	38c585f867	Merge branch 'master' of github.com:yandex/ClickHouse into group_by_in_order_optimization	2020-05-26 21:27:50 +03:00
alesapin	c3a6571036	Compilable code	2020-05-25 20:22:20 +03:00
alesapin	6281dd6893	Merge pull request #11115 from ClickHouse/consistent_metadata Refactoring in storage metadata (more consistent keys)	2020-05-25 13:09:56 +03:00
Alexey Milovidov	7e1813825b	Return old names of macros	2020-05-24 01:24:01 +03:00
Alexey Milovidov	18febd7b97	find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_[^\_(]+\([^,]+, "[^"]+" << [^<]+ << "[^"]+" << [^<]+ << "[^"]+" << [^<]+ << "[^"]+" << [^<]+ << "[^"]+"\);' | while read file; do perl -pne 's/(LOG_[^\_(]+)\(([^,]+), "([^"]+)" << ([^<]+) << "([^"]+)" << ([^<]+) << "([^"]+)" << ([^<]+) << "([^"]+)" << ([^<]+) << "([^"]+)"\);/${1}_FORMATTED(${2}, "${3}{}${5}{}${7}{}${9}{}${11}", ${4}, ${6}, ${8}, ${10});/' $file > ${file}.tmp; mv ${file}.tmp $file; done	2020-05-23 22:56:05 +03:00
Alexey Milovidov	a2ad11897f	Remove duplicate whitespaces (preparation)	2020-05-23 21:53:58 +03:00
Alexey Milovidov	1f13515a65	Make all LOG in single line (preparation)	2020-05-23 21:31:37 +03:00
Alexey Milovidov	533f86278a	find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+" << [^<]+ << "[^"]+" << [^<]+ << "[^"]+"\);' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+), "([^"]+)" << ([^<]+) << "([^"]+)" << ([^<]+) << "([^"]+)"\);/\1_FORMATTED(\2, "\3{}\5{}\7", \4, \6);/'	2020-05-23 20:00:41 +03:00
Alexey Milovidov	e391b77d81	find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+" << [^<]+ << "[^"]+"\);' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+), "([^"]+)" << ([^<]+) << "([^"]+)"\);/\1_FORMATTED(\2, "\3{}\5", \4);/'	2020-05-23 19:56:05 +03:00
Alexey Milovidov	ee4ffbc332	find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+" << [^<]+\);' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+), "([^"]+)" << ([^<]+)\);/\1_FORMATTED(\2, "\3{}", \4);/'	2020-05-23 19:47:56 +03:00
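
The scripted commits at the end of the list (ee4ffbc332, e391b77d81, 533f86278a, 18febd7b97) rewrite stream-style log calls into format-string calls. The following is a minimal Python sketch of the one-argument and two-argument rewrites, for illustration only. It assumes the patterns are meant to match a literal `(` after the macro name and a closing `);`. The helper name `rewrite_log_calls` and the sample C++ lines are hypothetical.

```python
import re

# Each pair is (pattern, replacement). The longer pattern is tried first so a
# two-piece message is not partially matched by the one-piece rule.
RULES = [
    # LOG_X(log, "text" << arg << "text");  ->  LOG_X_FORMATTED(log, "text{}text", arg);
    (re.compile(r'(LOG_\w+)\(([^,]+), "([^"]+)" << ([^<]+) << "([^"]+)"\);'),
     r'\1_FORMATTED(\2, "\3{}\5", \4);'),
    # LOG_X(log, "text" << arg);  ->  LOG_X_FORMATTED(log, "text{}", arg);
    (re.compile(r'(LOG_\w+)\(([^,]+), "([^"]+)" << ([^<]+)\);'),
     r'\1_FORMATTED(\2, "\3{}", \4);'),
]

def rewrite_log_calls(line: str) -> str:
    """Apply the first rule that matches; leave other lines untouched."""
    for pattern, replacement in RULES:
        new_line, count = pattern.subn(replacement, line)
        if count:
            return new_line
    return line

one_arg = rewrite_log_calls('LOG_DEBUG(log, "Loaded " << count);')
two_piece = rewrite_log_calls('LOG_INFO(log, "Read " << rows << " rows");')
untouched = rewrite_log_calls('int x = 1;')
print(one_arg)
print(two_piece)
```

Each `<<` operand becomes a `{}` placeholder in the format string and moves to the argument list, which is why the commits in the log handle one message shape at a time.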

80 Commits
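
Commit d9d8d0242e in the list above describes finding both ends of a matching primary key range by binary search instead of visiting every matching mark. The sketch below illustrates that idea on a sorted list of index marks. It is not ClickHouse's implementation: `inclusion_search`, the mark layout, and the sample keys are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

def inclusion_search(marks, lo, hi):
    """Return a half-open range (begin, end) of granule indices that may
    contain keys in the closed interval [lo, hi].

    Assumptions: marks is sorted, marks[i] is the first key of granule i,
    and granule i holds keys between marks[i] and marks[i + 1] inclusive.
    """
    # The granule just before the first mark >= lo may still end with lo.
    begin = max(bisect_left(marks, lo) - 1, 0)
    # Every granule whose first key is <= hi may contain matching rows.
    end = bisect_right(marks, hi)
    return begin, end

# Hypothetical index over a (service, timestamp) primary key.
marks = [("bar", 0), ("bar", 100), ("foo", 0), ("foo", 50), ("foo", 120), ("qux", 0)]

# service = 'foo' and timestamp >= 60 is one continuous key range.
result = inclusion_search(marks, ("foo", 60), ("foo", float("inf")))
print(result)
```

Both bounds come from a binary search, so the cost grows with the logarithm of the number of marks rather than with the number of marks that match. A condition such as `service in ('foo', 'bar')` does not map to a single `[lo, hi]` interval, which is consistent with the commit message listing it as a fallback case.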