ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-12-16 11:22:12 +00:00

Author	SHA1	Message	Date
Antonio Andelic	a37ca4b961	Merge branch 'master' into custom-key-parallel-replicas-with-shard	2023-01-18 11:53:04 +00:00
Alexander Tokmakov	5cd90c1a3e	Merge branch 'master' into exception_message_patterns	2023-01-17 20:04:04 +01:00
Azat Khuzhin	54fc6859ae	Fix race in Distributed table startup Before this patch it was possible to have multiple directory monitors for the same directory, one from the INSERT context, another one on storage startup(). Here is an example of logs for this scenario: 2022.12.07 12:12:27.552485 [ 39925 ] {a47fcb32-4f44-4dbd-94fe-0070d4ea0f6b} <Debug> DDLWorker: Executed query: DETACH TABLE inc.dist_urls_in ... 2022.12.07 12:12:33.228449 [ 4408 ] {20c761d3-a46d-417b-9fcd-89a8919dd1fe} <Debug> executeQuery: (from 0.0.0.0:0, user: ) /* ddl_entry=query-0000089229 */ ATTACH TABLE inc.dist_urls_in (stage: Complete) ... this is the DirectoryMonitor created from the context of INSERT for the old StoragePtr that had not been destroyed yet (because of "was 1" this can be done only from the context of INSERT) ... 2022.12.07 12:12:35.556048 [ 39536 ] {} <Trace> inc.dist_urls_in.DirectoryMonitor: Files set to 173 (was 1) 2022.12.07 12:12:35.556078 [ 39536 ] {} <Trace> inc.dist_urls_in.DirectoryMonitor: Bytes set to 29750181 (was 71004) 2022.12.07 12:12:35.562716 [ 39536 ] {} <Trace> Connection (i13.ch:9000): Connected to ClickHouse server version 22.10.1. 2022.12.07 12:12:35.562750 [ 39536 ] {} <Debug> inc.dist_urls_in.DirectoryMonitor: Sending a batch of 10 files to i13.ch:9000 (0.00 rows, 0.00 B bytes). ... this is the DirectoryMonitor that was created during ATTACH ... 2022.12.07 12:12:35.802080 [ 39265 ] {} <Trace> inc.dist_urls_in.DirectoryMonitor: Files set to 173 (was 0) 2022.12.07 12:12:35.802107 [ 39265 ] {} <Trace> inc.dist_urls_in.DirectoryMonitor: Bytes set to 29750181 (was 0) 2022.12.07 12:12:35.834216 [ 39265 ] {} <Debug> inc.dist_urls_in.DirectoryMonitor: Sending a batch of 10 files to i13.ch:9000 (0.00 rows, 0.00 B bytes). ... 2022.12.07 12:12:38.532627 [ 39536 ] {} <Trace> inc.dist_urls_in.DirectoryMonitor: Sent a batch of 10 files (took 2976 ms). ... 2022.12.07 12:12:38.601051 [ 39265 ] {} <Error> inc.dist_urls_in.DirectoryMonitor: std::exception. Code: 1001, type: std::__1::__fs::filesystem::filesystem_error, e.what() = filesystem error: in file_size: No such file or directory ["/data6/clickhouse/data/inc/dist_urls_in/shard13_replica1/66827403.bin"], Stack trace (when copying this message, always include the lines below): ... 2022.12.07 12:12:54.132837 [ 4408 ] {20c761d3-a46d-417b-9fcd-89a8919dd1fe} <Debug> DDLWorker: Executed query: ATTACH TABLE inc.dist_urls_in And eventually both monitors (for a short period of time, one replaces another) are trying to process the same batch (current_batch.txt), and one of them fails because the file had already been removed. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-17 14:51:00 +01:00
Antonio Andelic	dd31de18a3	Extend where correctly	2023-01-17 13:31:09 +00:00
Antonio Andelic	bd352068d7	Turn replicas into shard for custom_key	2023-01-17 12:34:42 +00:00
Antonio Andelic	a85565c4a6	Merge branch 'master' into custom-key-parallel-replicas	2023-01-17 07:37:18 +00:00
Alexander Tokmakov	522686f78b	less empty patterns	2023-01-17 01:19:44 +01:00
Alexander Tokmakov	94604f71b7	Merge pull request #44922 from azat/dist/async-INSERT-metrics Optimize and fix metrics for Distributed async INSERT	2023-01-16 14:12:56 +03:00
Antonio Andelic	36cf9debae	Merge branch 'master' into custom-key-parallel-replicas	2023-01-16 08:11:08 +00:00
Maksim Kita	43a0996356	Fixed tests	2023-01-12 12:07:58 +01:00
Maksim Kita	a140d6c5b1	Fixed code review issues	2023-01-12 12:07:58 +01:00
Maksim Kita	47f4159909	Analyzer support distributed queries processing	2023-01-12 12:07:58 +01:00
Antonio Andelic	ecbffa80b6	Add READ_TASKS mode	2023-01-12 07:56:15 +00:00
Nikita Mikhaylov	857799fbca	Parallel distributed insert select with s3Cluster [3] (#44955) * Revert "Revert "Resurrect parallel distributed insert select with s3Cluster (#41535)"" This reverts commit `b8d9066004`. * Fix build * Better * Fix test * Automatic style fix Co-authored-by: robot-clickhouse <robot-clickhouse@users.noreply.github.com>	2023-01-09 13:30:32 +01:00
Azat Khuzhin	f5b44cbe0d	Optimize and fix metrics for Distributed async INSERT In #43406 metrics were broken for a clean start, since they were not initialized from disk, but metrics for broken files were never initialized from disk. Fix this and rework how DirectoryMonitor works with file system: - do not iterate over directory before each send, do this only once on init, after the map of files will be updated from the INSERT - call fs::create_directories() from the ctor for "broken" folder to avoid excessive calls - cache "broken" paths This patch also fixes possible issue when current_batch can be processed multiple times (second time will be an exception), since if there is existing current_batch.txt after processing it you should remove it instantly. Plus this patch implicitly fixes issues with logging, that logs incorrect number of files in case of error (see #44907 for details). Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-05 11:07:40 +01:00
Nikita Taranov	8ed5cfc265	Memory bound merging for distributed aggregation in order (#40879) * impl * fix style * make executeQueryWithParallelReplicas similar to executeQuery * impl for parallel replicas * cleaner code for remote sorting properties * update test * fix * handle when nodes of old versions participate * small fixes * temporary enable for testing * fix after merge * Revert "temporary enable for testing" This reverts commit `cce7f8884c`. * review fixes * add bc test * Update src/Core/Settings.h	2022-11-28 00:41:31 +01:00
Anton Popov	2ae3cfa9e0	Merge branch 'master' into dynamic-columns-14	2022-10-31 16:15:19 +01:00
Maksim Kita	ca93ee7479	Fixed tests	2022-10-24 10:22:20 +02:00
Azat Khuzhin	4e76629aaf	Fixes for -Wshorten-64-to-32 - lots of static_cast - add safe_cast - types adjustments - config - IStorage::read/watch - ... - some TODO's (to convert types in future) P.S. That was quite a journey... v2: fixes after rebase v3: fix conflicts after #42308 merged Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2022-10-21 13:25:19 +02:00
Alexander Tokmakov	bd10a9d2d4	Merge pull request #42168 from ClickHouse/unreachable_macro Abort instead of `__builtin_unreachable` in debug builds	2022-10-08 19:05:57 +03:00
Alexander Tokmakov	4175f8cde6	abort instead of __builtin_unreachable in debug builds	2022-10-07 21:49:08 +02:00
Alexander Tokmakov	b8d9066004	Revert "Resurrect parallel distributed insert select with s3Cluster (#41535)" This reverts commit `860e34e760`.	2022-10-07 15:53:30 +02:00
Nikita Mikhaylov	860e34e760	Resurrect parallel distributed insert select with s3Cluster (#41535)	2022-10-06 13:47:32 +02:00
Anton Popov	6e61cf92f5	Merge remote-tracking branch 'upstream/master' into HEAD	2022-10-03 13:16:57 +00:00
Alexey Milovidov	ab4db2d0c4	Fix 5/6 of trash	2022-09-19 08:50:53 +02:00
Anton Popov	f0a404e2c8	Merge remote-tracking branch 'upstream/master' into HEAD	2022-09-06 15:51:16 +00:00
Alexander Tokmakov	f9f85a0e8b	Revert "Parallel distributed insert select from *Cluster table functions (#39107)" This reverts commit `d3cc234986`.	2022-08-24 15:17:15 +03:00
Nikita Mikhaylov	d3cc234986	Parallel distributed insert select from *Cluster table functions (#39107)	2022-08-15 12:41:17 +02:00
Alexander Gololobov	ae0d00083c	Renamed __row_exists to _row_exists	2022-07-18 20:07:36 +02:00
Alexander Gololobov	9de72d995a	POC lightweight delete using __row_exists virtual column and prewhere-like filtering	2022-07-18 20:06:42 +02:00
avogar	59c1c472cb	Better exception messages on wrong table engines/functions argument types	2022-06-23 20:04:06 +00:00
Nikolai Kochetov	8991f39412	Merge branch 'master' into refactor-read-metrics-and-callbacks	2022-06-02 17:00:08 +00:00
Nikita Mikhaylov	d34e051c69	Support for simultaneous read from local and remote parallel replica (#37204)	2022-06-02 11:46:33 +02:00
Nikolai Kochetov	c71256ea38	Remove some commented code.	2022-05-30 13:18:20 +00:00
Nikolai Kochetov	1b85f2c1d6	Merge branch 'master' into refactor-read-metrics-and-callbacks	2022-05-25 16:27:40 +02:00
Nikolai Kochetov	fd97a9d885	Move some resources	2022-05-23 19:47:32 +00:00
Nikolai Kochetov	56feef01e7	Move some resources	2022-05-20 19:49:31 +00:00
Anton Popov	e911900054	remove last mentions of data streams	2022-05-09 19:15:24 +00:00
Anton Popov	515f68eead	Merge remote-tracking branch 'upstream/master' into dynamic-columns-14	2022-05-06 16:10:51 +00:00
Anton Popov	566c08086a	support Object type inside other types	2022-05-06 14:44:00 +00:00
mergify[bot]	64084b5e32	Merge branch 'master' into shared_ptr_helper3	2022-05-03 20:46:16 +00:00
Robert Schulze	330212e0f4	Remove inherited create() method + disallow copying The original motivation for this commit was that shared_ptr_helper used std::shared_ptr<>() which does two heap allocations instead of make_shared<>() which does a single allocation. Turned out that 1. the affected code (--> Storages/) is not on a hot path (rendering the performance argument moot ...) 2. yet copying Storage objects is potentially dangerous and was previously allowed. Hence, this change - removes shared_ptr_helper and as a result all inherited create() methods, - instead, Storage objects are now created using make_shared<>() by the caller (for that to work, many constructors had to be made public), and - all Storage classes were marked as noncopyable using boost::noncopyable. In sum, we are (likely) not making things faster but the code becomes cleaner and harder to misuse.	2022-05-02 08:46:52 +02:00
mergify[bot]	265398d1b6	Merge branch 'master' into feat/add_part_offset	2022-04-25 15:58:16 +00:00
Robert Schulze	b24ca8de52	Fix various clang-tidy warnings When I tried to add cool new clang-tidy 14 warnings, I noticed that the current clang-tidy settings already produce a ton of warnings. This commit addresses many of these. Almost all of them were non-critical, i.e. C vs. C++ style casts.	2022-04-20 10:29:05 +02:00
Alexey Milovidov	242919eddd	Remove abbreviation	2022-04-18 01:02:49 +02:00
Alexander Tokmakov	07d952b728	use snapshots for semistructured data, durability fixes	2022-03-17 18:26:18 +01:00
roverxu	29a842bf22	feat(...): [LWD] support getting _part_offset of a row	2022-03-15 15:40:10 +08:00
Anton Popov	36ec379aeb	Merge remote-tracking branch 'upstream/master' into HEAD	2022-03-14 16:28:35 +00:00
Anton Popov	37efe2ddb5	Apply suggestions from code review Co-authored-by: Kseniia Sumarokova <54203879+kssenii@users.noreply.github.com>	2022-03-10 22:24:19 +01:00
Azat Khuzhin	4843e210c3	Support view() for parallel_distributed_insert_select Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2022-03-08 22:05:57 +03:00
Azat Khuzhin	c4b6342853	Improvements for `parallel_distributed_insert_select` (and related) (#34728) * Add a warning if parallel_distributed_insert_select was ignored Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> * Respect max_distributed_depth for parallel_distributed_insert_select Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> * Print warning for non applied parallel_distributed_insert_select only for initial query Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> * Remove Cluster::getHashOfAddresses() Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> * Forbid parallel_distributed_insert_select for remote()/cluster() with different addresses Before it uses empty cluster name (getClusterName()) which is not correct, compare all addresses instead. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> * Fix max_distributed_depth check max_distributed_depth=1 must mean no more than one distributed query, not two, since max_distributed_depth=0 means no limit, and distribute_depth is 0 for the first query. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> * Fix INSERT INTO remote()/cluster() with parallel_distributed_insert_select Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> * Add a test for parallel_distributed_insert_select with cluster()/remote() Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> * Return <remote> instead of empty cluster name in Distributed engine Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> * Make user with sharding_key and w/o in remote()/cluster() identical Before with sharding_key the user was "default", while w/o it it was empty. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2022-03-08 15:24:39 +01:00
Anton Popov	04a3a10148	minor fixes	2022-03-01 20:20:53 +03:00
Anton Popov	2758db5341	add more comments	2022-03-01 19:32:55 +03:00
Anton Popov	a661eaf39f	better performance of getting storage snapshot	2022-02-16 02:17:22 +03:00
Anton Popov	dcd7312d75	cache common type on objects in MergeTree	2022-02-09 23:47:53 +03:00
Anton Popov	18940b8637	Merge remote-tracking branch 'upstream/master' into HEAD	2022-02-09 23:38:38 +03:00
feng lv	6325d4d9b0	continue of #34317 fix fix	2022-02-06 08:59:17 +00:00
Anton Popov	78b9f15abb	Merge remote-tracking branch 'upstream/master' into HEAD	2022-01-30 03:24:37 +03:00
Anton Popov	e8ce091e68	Merge remote-tracking branch 'upstream/master' into HEAD	2022-01-21 20:11:18 +03:00
Kruglov Pavel	2295a07066	Merge pull request #33534 from azat/fwd-decl RFC: Split headers, move SystemLog into module, more forward declarations	2022-01-18 17:22:49 +03:00
Azat Khuzhin	c341b3b237	Add current database to table names in JOIN section for distributed queries This should fix JOIN w/o explicit database. v2: rewrite only JOIN section, since there is old behavior that relies on default_database for IN section, see [1]: - 01487_distributed_in_not_default_db - 01152_cross_replication [1]: https://s3.amazonaws.com/clickhouse-test-reports/33611/d0ea3c76fa51131171b1825939680867eb1c04da/fast_test__actions_.html Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2022-01-14 11:23:38 +03:00
Azat Khuzhin	0a9b1ee803	Remove RestoreQualifiedNamesMatcher::Data::rename (always true) Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2022-01-14 11:18:52 +03:00
Azat Khuzhin	aee034a597	Use explicit template instantiation for SystemLog - Move some code into module part to avoid dependency on IStorage in SystemLog - Remove extra headers from SystemLog.h - Rewrite some code that was relying on headers that were included by SystemLog.h v2: rebase v3: squash move into module part with explicit template instantiation (to make each commit self compilable after rebase)	2022-01-10 22:01:41 +03:00
Azat Khuzhin	1637c41d42	Remove leftovers of old _shard_num via identifier implementation	2022-01-10 21:21:24 +03:00
avogar	8112a71233	Implement schema inference for most input formats	2021-12-29 12:18:56 +03:00
Anton Popov	99ebabd822	Merge remote-tracking branch 'upstream/master' into HEAD	2021-12-17 19:02:29 +03:00
Alexey Milovidov	5c90ed2ed9	Unambiguous formatting of distributed queries	2021-12-10 00:55:14 +03:00
Nikita Mikhaylov	dbf5091016	Parallel reading from replicas (#29279)	2021-12-09 13:39:28 +03:00
Anton Popov	6f4d9a53b2	Merge remote-tracking branch 'origin/sparse-serialization' into HEAD	2021-12-01 15:54:33 +03:00
Raúl Marín	7781fc12ed	Reduce dependencies on ASTSelectWithUnionQuery.h 521 -> 77 files requiring changes	2021-11-26 19:27:16 +01:00
Raúl Marín	b2cfa70541	Reduce dependencies on ASTFunction.h 481 -> 230	2021-11-26 18:21:54 +01:00
Anton Popov	a20922b2d3	Merge remote-tracking branch 'origin/sparse-serialization' into HEAD	2021-11-09 15:36:25 +03:00
feng lv	6f12348282	enable modify table comment of some table	2021-10-29 12:31:18 +00:00
Alexander Tokmakov	2e7e195e77	change alter_lock to std::timed_mutex	2021-10-26 13:37:00 +03:00
Nikolai Kochetov	fd14faeae2	Remove DataStreams folder.	2021-10-15 23:18:20 +03:00
Nikolai Kochetov	2957971ee3	Remove some last streams.	2021-10-13 21:22:02 +03:00
Vitaly Baranov	1636ee24bb	Fix using materialized column as sharding key.	2021-10-04 10:56:42 +03:00
Nikolai Kochetov	341553febd	Fix build.	2021-09-16 20:40:42 +03:00
Nikolai Kochetov	b997214620	Rename QueryPipeline to QueryPipelineBuilder.	2021-09-14 20:48:18 +03:00
Nikolai Kochetov	0e267c50b4	Merge branch 'master' into rewrite-pushing-to-views	2021-09-14 16:13:54 +03:00
alexey-milovidov	ea13a8b562	Merge pull request #28659 from myrrc/improvement/tostring_to_magic_enum Improving CH type system with concepts	2021-09-12 15:26:29 +03:00
Nikolai Kochetov	f569a3e3f7	Merge branch 'master' into rewrite-pushing-to-views	2021-09-09 20:30:23 +03:00
Anton Popov	4c388e3d84	Merge remote-tracking branch 'origin/sparse-serialization' into HEAD	2021-09-09 14:10:16 +03:00
Nikolai Kochetov	999a4fe831	Fix other tests.	2021-09-08 21:29:38 +03:00
ZhiYong Wang	978dd19fa2	Fix coredump in creating distributed table	2021-09-07 19:05:26 +08:00
Mike Kot	8e9aacadd1	Initial: replacing hardcoded toString for enums with magic_enum	2021-09-06 16:24:03 +02:00
Alexander Tokmakov	42378b5913	fix	2021-08-20 17:05:53 +03:00
Anton Popov	61239343e3	Merge remote-tracking branch 'origin/sparse-serialization' into HEAD	2021-08-20 16:33:30 +03:00
Alexander Tokmakov	8c6dd18917	check cluster name before creating Distributed	2021-08-20 14:55:04 +03:00
Azat Khuzhin	702d9955c0	Fix distributed queries with zero shards and aggregation	2021-08-08 19:22:49 +03:00
Azat Khuzhin	3be3c503aa	Fix some comments	2021-08-08 09:58:07 +03:00
alexey-milovidov	c5207fc237	Merge pull request #26466 from azat/optimize-dist-select Rework SELECT from Distributed optimizations	2021-08-08 03:59:32 +03:00
mergify[bot]	dc57254982	Merge branch 'master' into improve_create_or_replace	2021-08-03 11:39:07 +00:00
Azat Khuzhin	97851bde08	Fix Distributed over Distributed for WithMergeableStateAfterAggregation* stages In case if one Distributed has multiple shards, and underlying Distributed has only one, there can be the case when the query will be tried to process from Complete to WithMergeableStateAfterAggregation, which is obviously wrong.	2021-08-03 10:10:08 +03:00
Anton Popov	e36736b50c	Merge remote-tracking branch 'origin/sparse-serialization' into HEAD	2021-08-02 22:52:02 +03:00
Azat Khuzhin	ff12f5102a	Avoid running LIMIT BY/DISTINCT step on the initiator for optimize_distributed_group_by_sharding_key Before the following queries were running LimitBy/Distinct step on the initiator: select distinct sharding_key from dist order by k While this can be omitted.	2021-08-02 21:04:30 +03:00
Azat Khuzhin	2fb95d9ee0	Rework SELECT from Distributed query stages optimization Before this patch it wasn't possible to optimize simple SELECT * FROM dist ORDER BY (w/o GROUP BY and DISTINCT) to more optimal stage (QueryProcessingStage::WithMergeableStateAfterAggregationAndLimit), since that code was under allow_nondeterministic_optimize_skip_unused_shards, rework it and make it possible. Also now distributed_push_down_limit is respected for optimize_distributed_group_by_sharding_key. Next step will be to enable distributed_push_down_limit by default. v2: fix detection of aggregates	2021-08-02 21:04:29 +03:00
Azat Khuzhin	bb6d030fb8	Optimize distributed SELECT w/o GROUP BY	2021-08-02 21:04:29 +03:00
Nikolai Kochetov	61d8f880cd	Rename some files.	2021-07-26 19:48:25 +03:00
mergify[bot]	044be267d6	Merge branch 'master' into improve_create_or_replace	2021-07-26 08:38:48 +00:00

377 Commits