Commit Graph

176 Commits

Author SHA1 Message Date
Alexey Milovidov
1bcf22d42f Fix 'max_parallel_replicas' without sampling. 2020-11-04 18:59:14 +03:00
Azat Khuzhin
2389406c21 Fix spreading for ReadInOrderOptimizer with expression in ORDER BY
This will fix optimize_read_in_order/optimize_aggregation_in_order with
max_threads>0 and expression in ORDER BY
2020-11-04 07:07:26 +03:00
Nikolai Kochetov
54a9b80a11 Fix build 2020-11-03 22:30:58 +03:00
Nikolai Kochetov
6767a226fc Merge branch 'master' into actions-dag-f14 2020-11-03 15:21:06 +03:00
Nikolai Kochetov
07a7c46b89 Refactor ExpressionActions [Part 3] 2020-11-03 14:28:28 +03:00
Anton Popov
a3a8e18637
Merge branch 'master' into select_final 2020-11-03 00:00:43 +03:00
Nikolai Kochetov
1c106691b5
Merge pull request #16423 from amosbird/jbodread
Balanced reading from JBOD
2020-10-29 19:22:45 +03:00
Amos Bird
f995ef9797
Balanced reading from JBOD 2020-10-29 04:05:07 +08:00
Mikhail Filimonov
41971e073a
Fix typos reported by codespell 2020-10-27 12:04:03 +01:00
Pavel Kruglov
89fdeb4e15 Fix style, move setting and add checking level>0 2020-10-21 20:35:31 +03:00
Pavel Kruglov
f5fac575f4 don't postprocess single parts 2020-10-15 15:22:41 +03:00
Pavel Kruglov
44c2b138f3 Fix style 2020-10-13 22:53:36 +03:00
Pavel Kruglov
be0cb31d21 Add tests and comments 2020-10-13 21:55:03 +03:00
Pavel Kruglov
8200bab859 Add setting do_not_merge_across_partitions 2020-10-13 17:54:52 +03:00
Nikolai Kochetov
7e58f99f64 Merge branch 'master' into storage-read-query-plan 2020-10-12 13:12:39 +03:00
Nikolai Kochetov
76a04fb4b4
Merge pull request #15762 from ClickHouse/new-block-for-functions
Use `ColumnsWithTypeAndName` instead of `Block` for function calls
2020-10-10 08:50:38 +03:00
Nikolai Kochetov
a7fb2e38a5 Use ColumnWithTypeAndName as function argument instead of Block. 2020-10-09 10:41:28 +03:00
Azat Khuzhin
75e612fc16 Use full featured parser for force_data_skipping_indices 2020-10-07 01:44:14 +03:00
Azat Khuzhin
ef6d12967f Implement force_data_skipping_indices setting 2020-10-07 01:42:31 +03:00
Nikolai Kochetov
f9bf1e3406 Merge branch 'master' into storage-read-query-plan 2020-10-06 09:50:10 +03:00
Azat Khuzhin
21deb6812c
Drop unused code for numeric_limits<int128> in MergeTreeDataSelectExecutor (#15519) 2020-10-02 16:46:20 +03:00
Nikolai Kochetov
ea131989be Try fix test. 2020-10-01 21:47:20 +03:00
Nikolai Kochetov
ec64def384 Use QueryPlan while reading from MergeTree. 2020-10-01 20:34:22 +03:00
Amos Bird
81d08b59e5
Replace useless multiset with unordered_set 2020-09-25 16:38:09 +08:00
filimonov
cc24ef9f83
Better debug message from MergeTreeDataSelectExecutor
See #15168
2020-09-22 21:35:29 +02:00
alexey-milovidov
85483f8532
Merge pull request #14853 from ClickHouse/akz/optimized_index_binary_search
Optimized marks selection algorithm for continuous marks ranges
2020-09-20 19:48:45 +03:00
Alexander Kazakov
1ee2e3d2b3 Review fix 2020-09-18 16:03:48 +03:00
roman
ddca262fe6 fix review comments 2020-09-17 20:54:21 +01:00
roman
b41421cb1c [settings]: introduce new query complexity settings for leaf-nodes
The new settings allow controlling query complexity on leaf nodes,
excluding the final merging stage on the root node. For example, a distributed
query that reads 1k rows from each of 5 shards will breach `max_rows_to_read=5000`,
even though every shard effectively reads only 1k rows. With the setting `max_rows_to_read_leaf=1500`
that limit is not reached and the query succeeds, since every shard reads
no more than ~1k rows.
2020-09-17 10:37:05 +01:00
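A minimal sketch of the leaf-limit idea above, assuming illustrative names rather than ClickHouse's actual settings machinery: the check runs per shard, so ~1k rows on each of 5 shards passes a leaf limit of 1500 even though the merged total would hit a global limit of 5000.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

struct ReadLimits
{
    uint64_t max_rows_to_read = 0;       // checked against the merged total on the root node
    uint64_t max_rows_to_read_leaf = 0;  // checked on every shard (leaf node) independently
};

// Hypothetical helper: reject the query on a shard that reads more rows than the leaf limit allows.
void checkLeafLimit(uint64_t rows_read_on_shard, const ReadLimits & limits)
{
    if (limits.max_rows_to_read_leaf != 0 && rows_read_on_shard > limits.max_rows_to_read_leaf)
        throw std::runtime_error(
            "Limit for rows to read on a leaf node exceeded: " + std::to_string(rows_read_on_shard));
}

int main()
{
    ReadLimits limits;
    limits.max_rows_to_read = 5000;
    limits.max_rows_to_read_leaf = 1500;

    for (int shard = 0; shard < 5; ++shard)
        checkLeafLimit(1000, limits);  // each shard reads ~1k rows: the leaf check passes

    // A global max_rows_to_read=5000 checked on the root node would see 5 * 1000 = 5000 rows
    // in the merged stream and hit the limit, which is exactly what the leaf setting avoids.
    return 0;
}
```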
Alexander Kazakov
7465e00163 Optimized marks selection algorithm for continuous marks ranges 2020-09-15 17:22:32 +03:00
Nikolai Kochetov
92c937db8b Remove CreatingSetsBlockInputStream 2020-09-02 16:13:13 +03:00
Amos Bird
591a4d60d4
Fix bug in mark inclusion search. 2020-08-29 09:46:46 +08:00
Amos Bird
078b14610d
ALTER MODIFY SAMPLE BY 2020-08-27 22:31:30 +08:00
Artem Zuikov
becc186c91
Add support for extended precision integers and decimals (#13097) 2020-08-19 14:52:17 +03:00
alexey-milovidov
23ccb0b6be
Merge pull request #13677 from hagen1778/merge-tree-fail-fast-on-rows-limit
[mergeTree]: fail fast if max_rows_to_read limit exceeded on parts scan
2020-08-18 22:24:39 +03:00
roman
35e28b4c6b [mergeTree]: make exception message more clear 2020-08-17 09:52:04 +01:00
roman
b637699ccd [mergeTree]: fail fast if max_rows_to_read limit exceeded on parts scan
The motivation behind this change is to skip the range scan for all selected parts
if it is already clear that `max_rows_to_read` is exceeded. The change is quite
noticeable for queries over a big number of parts.
2020-08-14 13:53:48 +01:00
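A hedged sketch of the fail-fast check described in this commit, with illustrative types rather than ClickHouse's own: abort before the per-part mark range selection as soon as the accumulated row estimate already exceeds the limit.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

struct PartInfo
{
    std::size_t approx_rows = 0;  // rows this part would contribute if scanned
};

// Hypothetical helper: throw before any range scan is started once the running
// total of rows across the selected parts already exceeds max_rows_to_read.
void failFastOnRowsLimit(const std::vector<PartInfo> & selected_parts, std::size_t max_rows_to_read)
{
    std::size_t total_rows = 0;
    for (const auto & part : selected_parts)
    {
        total_rows += part.approx_rows;
        if (max_rows_to_read != 0 && total_rows > max_rows_to_read)
            throw std::runtime_error(
                "Too many rows to read: at least " + std::to_string(total_rows)
                + ", limit is " + std::to_string(max_rows_to_read));
    }
    // Only if we get here do we pay for selecting mark ranges in every part.
}
```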
Nikolai Kochetov
9b67cd9faf Merge branch 'master' into refactor-pipes-3 2020-08-10 10:50:17 +03:00
Alexey Milovidov
edd89a8610 Fix half of typos 2020-08-08 03:47:03 +03:00
Alexey Milovidov
0ac3f0481f Probably fix error 2020-08-07 04:27:29 +03:00
Nikolai Kochetov
20e63d2271 Refactor Pipe [part 6] 2020-08-06 15:24:05 +03:00
Nikolai Kochetov
09fbce1b1e Merge branch 'master' into refactor-pipes-3 2020-08-04 11:32:34 +03:00
Nikolai Kochetov
2cca4d5fcf Refactor Pipe [part 2]. 2020-08-03 16:54:14 +03:00
Nikolai Kochetov
e411916bde Refactor Pipe [part 1]. 2020-08-03 14:33:11 +03:00
Azat Khuzhin
e37c42c56c Fix logging in MergeTreeDataSelectExecutor for multiple threads (attach to thread group) 2020-08-02 13:40:01 +03:00
Azat Khuzhin
101217470e Use "Not using primary index on part" over "Not using index on part" (add "primary") 2020-08-02 13:40:01 +03:00
Nikolai Kochetov
39530f837e Remove TreeExecutorBlockInputStream. 2020-07-31 16:23:19 +03:00
Alexander Kuzmenkov
1b9269ae0c fixup 2020-07-28 19:58:19 +03:00
Alexander Kuzmenkov
297cf65f1f Merge remote-tracking branch 'origin/master' into HEAD 2020-07-28 19:56:35 +03:00
Alexander Kuzmenkov
ba7c33f806
Merge pull request #12754 from bobrik/ivan/obvious-skip
Show total granules examined by skipping indices
2020-07-28 17:14:25 +03:00
Ivan Babrou
e835ec0b56 Show marks before applying skipping indices
This change makes skipping index efficiency more obvious, changing this:

```
Selected 30 parts by date, 30 parts by key, 592 marks to read from 541 ranges
```

Into this:

```
Selected 30 parts by date, 30 parts by key, 48324 marks by primary key, 592 marks to read from 541 ranges
```
2020-07-24 15:45:38 -07:00
Ivan Babrou
67d4529783 Show total granules examined by skipping indices
This change makes skipping index efficiency more obvious, changing this:

```
Index `idx_duration` has dropped 59 granules.
```

Into this:

```
Index `idx_duration` has dropped 59 / 61 granules.
```
2020-07-24 14:50:32 -07:00
Nikolai Kochetov
dad9d369a1 Merge branch 'master' into bobrik-parallel-randes 2020-07-23 16:21:32 +03:00
Artem Zuikov
2afd123eda
Refactoring: extract TreeOptimizer from SyntaxAnalyzer (#12645) 2020-07-22 20:13:05 +03:00
Nikolai Kochetov
b27066389a Do not create ThreadPool for single thread. 2020-07-22 14:51:35 +03:00
Nikolai Kochetov
486a4932c3 Fix tests. 2020-07-21 17:08:18 +03:00
Nikolai Kochetov
0cc55781d8 Try fix tests. 2020-07-20 18:09:00 +03:00
Ivan Babrou
72622a9b00 Parallelize PK range and skipping index stages
This runs PK lookup and skipping index stages on parts
in parallel, as described in #11564.

While #12277 sped up PK lookups, the skipping index stage
may still be a bottleneck in a select query. Here we
parallelize both stages across parts.

On a query that uses a bloom filter skipping index to pick
2,688 rows out of 8,273,114,994 on a two day time span,
this change reduces latency from 10.5s to 1.5s.
2020-07-19 21:49:41 -07:00
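A rough sketch of the idea in this commit, using std::async for brevity instead of the thread pool the real code uses; all names here are illustrative.

```cpp
#include <future>
#include <vector>

struct Part { /* metadata of one data part */ };
struct MarkRanges { /* mark ranges selected for one part */ };

// Placeholder for the per-part work: in the real executor this is the primary-key
// lookup followed by the data skipping indices for a single part.
MarkRanges selectRangesForPart(const Part & /*part*/)
{
    return {};
}

// Run the per-part PK/skipping-index stage for all parts concurrently
// instead of processing parts one after another.
std::vector<MarkRanges> selectRangesForAllParts(const std::vector<Part> & parts)
{
    std::vector<std::future<MarkRanges>> futures;
    futures.reserve(parts.size());
    for (const auto & part : parts)
        futures.push_back(std::async(std::launch::async, [&part] { return selectRangesForPart(part); }));

    std::vector<MarkRanges> ranges;
    ranges.reserve(parts.size());
    for (auto & future : futures)
        ranges.push_back(future.get());
    return ranges;
}
```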
Ivan Babrou
d9d8d0242e Optimize PK lookup for queries that match exact PK range
Existing code that looks up marks matching the query has a pathological
case when most of the part does in fact match the query.

The code works by recursively splitting a part into ranges and then discarding
the ranges that definitely do not match the query, based on primary key.

The problem is that it requires visiting every mark that matches the query,
making the complexity of this sort of lookup O(n).

For queries that match an exact range on the primary key, we can find
both the left and right bounds of the range with O(log n) complexity
(two binary searches).

This change implements exactly that.

To engage this optimization, the query must:

* Have a prefix list of the primary key.
* Have only range or single set element constraints for columns.
* Have only AND as a boolean operator.

Consider a table with `(service, timestamp)` as the primary key.

The following conditions will be optimized:

* `service = 'foo'`
* `service = 'foo' and timestamp >= now() - 3600`
* `service in ('foo')`
* `service in ('foo') and timestamp >= now() - 3600 and timestamp <= now()`

The following will fall back to previous lookup algorithm:

* `timestamp >= now() - 3600`
* `service in ('foo', 'bar') and timestamp >= now() - 3600`
* `service = 'foo'`

Note that the optimization won't engage when the PK has a range expression
followed by a point expression, since in that case the range is not continuous.

Trace query logging provides the following types of messages,
each representing a different kind of PK usage for a part:

```
Used optimized inclusion search over index for part 20200711_5710108_5710108_0 with 9 steps
Used generic exclusion search over index for part 20200711_5710118_5710228_5 with 1495 steps
Not using index on part 20200710_5710473_5710473_0
```

Number of steps translates to computational complexity.

Here's a comparison for before and after for a query over 24h of data:

```
Read 4562944 rows, 148.05 MiB in 45.19249672 sec.,   100966 rows/sec.,   3.28 MiB/sec.
Read 4183040 rows, 135.78 MiB in 0.196279627 sec., 21311636 rows/sec., 691.75 MiB/sec.
```

This is especially useful for queries that read data in order
and terminate early to return "last X things" matching a query.

See #11564 for more thoughts on this.
2020-07-11 12:26:54 -07:00
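To make the two-binary-search idea concrete, here is a simplified sketch over a single-column key; the real index is multi-column and lives in MergeTreeDataSelectExecutor, and `marksForKeyRange` and its layout are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

/// index_keys[i] is the primary-key value at the start of mark i (sorted ascending).
/// Returns the half-open range of marks [first_mark, end_mark) that may contain
/// rows with keys in [range_begin, range_end].
std::pair<std::size_t, std::size_t>
marksForKeyRange(const std::vector<int> & index_keys, int range_begin, int range_end)
{
    // The last mark whose starting key is <= range_begin may still contain range_begin.
    auto left = std::upper_bound(index_keys.begin(), index_keys.end(), range_begin);
    std::size_t first_mark = (left == index_keys.begin())
        ? 0
        : static_cast<std::size_t>(left - index_keys.begin()) - 1;

    // The first mark that starts strictly after range_end cannot contain matching rows.
    auto right = std::upper_bound(index_keys.begin(), index_keys.end(), range_end);
    std::size_t end_mark = static_cast<std::size_t>(right - index_keys.begin());

    return {first_mark, end_mark};
}

// Example: with index_keys = {0, 10, 20, 30} and a condition key BETWEEN 12 AND 22,
// marksForKeyRange returns {1, 3}: only marks 1 and 2 need to be read, found with
// two binary searches instead of recursively splitting and checking every range.
```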
Nikita Mikhaylov
d31ed58f01 done 2020-07-06 17:33:31 +03:00
Anton Popov
73676f5022
Improve performance of reading in order of sorting key. (#11696)
* simplify reading in order of sorting key

* add perf test for reading many parts

* Revert "simplify reading in order of sorting key"

This reverts commit 7267d7c46e.

* add threshold for preliminary merge for reading in order

* better threshold

* limit threads in test
2020-07-04 15:48:51 +03:00
Alexey Milovidov
8eed47857b Fix estimation of the number of marks for various thresholds 2020-06-25 23:20:22 +03:00
Alexey Milovidov
8872417d00 Respect direct_io/mmap settings while reading secondary indices 2020-06-25 22:31:54 +03:00
Alexey Milovidov
5608f15749 Revive mmap IO 2020-06-25 22:15:41 +03:00
alesapin
b1e8976df4 Merge with master 2020-06-22 12:04:27 +03:00
alesapin
4c0879ae30 Better logging in storages 2020-06-19 20:17:13 +03:00
alesapin
dffdece350 getColumns in StorageInMemoryMetadata (only compilable) 2020-06-17 19:39:58 +03:00
alesapin
33c27de54d Check methods in metadata 2020-06-17 17:32:25 +03:00
alesapin
1afdebeebd Primary key in storage metadata 2020-06-17 15:39:20 +03:00
alesapin
1da393b218 Sampling key in StorageInMemoryMetadata 2020-06-17 15:07:09 +03:00
alesapin
ba04d02f1e Compilable sorting key in metadata 2020-06-17 14:05:11 +03:00
alesapin
62f2c17a66 Secondary indices in StorageInMemoryMetadata 2020-06-17 12:38:47 +03:00
alesapin
71f99a274d Compilable getSampleBlockWithColumns in StorageInMemoryMetadata 2020-06-16 17:25:08 +03:00
Anton Popov
5c42408add
Merge pull request #9113 from dimarub2000/group_by_in_order_optimization
[WIP] Optimization of GROUP BY with respect to table sorting key.
2020-06-06 14:25:59 +03:00
Alexander Kuzmenkov
c7d9094a7a
Merge pull request #11259 from ClickHouse/consistent_metadata3
More consistent metadata for secondary indices
2020-06-03 12:23:21 +03:00
Nikolai Kochetov
53d12f5ab8 Try fix build. 2020-06-01 20:06:21 +03:00
Nikolai Kochetov
d25326e75c Try fix build. 2020-06-01 20:03:57 +03:00
alesapin
663e92b1c5 Rename to methods 2020-06-01 14:29:11 +03:00
alesapin
3847ea892d Merge branch 'master' into consistent_metadata3 2020-06-01 13:17:59 +03:00
Alexey Milovidov
25f941020b Remove namespace pollution 2020-05-31 00:57:37 +03:00
Dmitry
4b0d32f026 Merge branch 'master' of github.com:yandex/ClickHouse into group_by_in_order_optimization 2020-05-31 00:21:02 +03:00
alesapin
5f8b69547b More readable code 2020-05-28 16:45:08 +03:00
alesapin
26a9829e04 Indices description as vector 2020-05-28 15:47:17 +03:00
alesapin
52ca6b2051 I'm able to build it 2020-05-28 15:37:05 +03:00
Nikolai Kochetov
da0052858d Fix build. 2020-05-28 13:57:04 +03:00
Dmitry
41d1cd1c9b fix bad merge 2020-05-27 01:17:32 +03:00
Dmitry
38c585f867 Merge branch 'master' of github.com:yandex/ClickHouse into group_by_in_order_optimization 2020-05-26 21:27:50 +03:00
alesapin
c3a6571036 Compilable code 2020-05-25 20:22:20 +03:00
alesapin
6281dd6893
Merge pull request #11115 from ClickHouse/consistent_metadata
Refactoring in storage metadata (more consistent keys)
2020-05-25 13:09:56 +03:00
Alexey Milovidov
7e1813825b Return old names of macros 2020-05-24 01:24:01 +03:00
Alexey Milovidov
18febd7b97 find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_[^\_(]+\([^,]+, "[^"]+" << [^<]+ << "[^"]+" << [^<]+ << "[^"]+" << [^<]+ << "[^"]+" << [^<]+ << "[^"]+"\);' | while read file; do perl -pne 's/(LOG_[^\_(]+)\(([^,]+), "([^"]+)" << ([^<]+) << "([^"]+)" << ([^<]+) << "([^"]+)" << ([^<]+) << "([^"]+)" << ([^<]+) << "([^"]+)"\);/${1}_FORMATTED(${2}, "${3}{}${5}{}${7}{}${9}{}${11}", ${4}, ${6}, ${8}, ${10});/' $file > ${file}.tmp; mv ${file}.tmp $file; done 2020-05-23 22:56:05 +03:00
Alexey Milovidov
a2ad11897f Remove duplicate whitespaces (preparation) 2020-05-23 21:53:58 +03:00
Alexey Milovidov
1f13515a65 Make all LOG in single line (preparation) 2020-05-23 21:31:37 +03:00
Alexey Milovidov
533f86278a find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+" << [^<]+ << "[^"]+" << [^<]+ << "[^"]+"\);' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+), "([^"]+)" << ([^<]+) << "([^"]+)" << ([^<]+) << "([^"]+)"\);/\1_FORMATTED(\2, "\3{}\5{}\7", \4, \6);/' 2020-05-23 20:00:41 +03:00
Alexey Milovidov
e391b77d81 find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+" << [^<]+ << "[^"]+"\);' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+), "([^"]+)" << ([^<]+) << "([^"]+)"\);/\1_FORMATTED(\2, "\3{}\5", \4);/' 2020-05-23 19:56:05 +03:00
Alexey Milovidov
ee4ffbc332 find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+" << [^<]+\);' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+), "([^"]+)" << ([^<]+)\);/\1_FORMATTED(\2, "\3{}", \4);/' 2020-05-23 19:47:56 +03:00
Alexey Milovidov
8d2e80a5e2 find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+"\)' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+, "[^"]+")\)/\1_FORMATTED(\2)/' 2020-05-23 19:42:39 +03:00
Dmitry
47778c0259 Merge branch 'master' of github.com:yandex/ClickHouse into group_by_in_order_optimization 2020-05-21 23:45:12 +03:00
alesapin
327d17ac6a Better 2020-05-21 22:46:03 +03:00
Azat Khuzhin
d93b9a57f6 Forward declaration for Context as much as possible.
Now, after changing Context.h, 488 modules are recompiled instead of 582.
2020-05-21 01:53:18 +03:00
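A minimal illustration of the technique named in this commit; the file and class names below are hypothetical. Headers keep only a forward declaration of Context, so editing Context.h no longer forces rebuilding every translation unit that includes them.

```cpp
// --- SomeStorage.h (hypothetical header) ---
namespace DB
{

class Context;  // forward declaration instead of #include <Interpreters/Context.h>

class SomeStorage
{
public:
    // References and pointers to Context only need the declaration above,
    // so this header itself no longer depends on Context.h.
    void read(const Context & context);
};

}

// --- SomeStorage.cpp (hypothetical source) ---
// #include <Interpreters/Context.h>  // the full definition is included only here, so
//                                    // only this one unit recompiles when Context.h changes
```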