ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-12-15 19:02:04 +00:00

Author	SHA1	Message	Date
Azat Khuzhin	b725e1d131	DistributedBlockOutputStream: Remove superfluous brackets for string construction	2021-01-17 12:48:51 +03:00
Azat Khuzhin	2955e25e83	Fix inserted blocks accounting for insert_distributed_one_random_shard=1 It is tricky due to block splitting Refs: https://github.com/ClickHouse/ClickHouse/pull/18294	2021-01-17 12:45:42 +03:00
Azat Khuzhin	858f07c796	Update comment for query AST cloning during insert into multiple local shards Refs: https://github.com/ClickHouse/ClickHouse/pull/18264#discussion_r558839456	2021-01-17 12:40:41 +03:00
alexey-milovidov	a5a19de878	Update DistributedBlockOutputStream.cpp	2021-01-16 13:22:25 +03:00
feng lv	9829c09720	fix	2021-01-15 15:54:35 +00:00
Azat Khuzhin	ecae6c1c60	Avoid reading the distributed batch just to read the block header Before this patch batched mode of the DirectoryMonitor is 2x slower than non-batched, after it should be more or less the same as non-batched.	2021-01-14 22:38:46 +03:00
Azat Khuzhin	56475774d3	Fix readability-static-definition-in-anonymous-namespace in DirectoryMonitor	2021-01-10 23:57:40 +03:00
Azat Khuzhin	2565d2ac44	Verify compressed headers while sending distributed batches Before this patch the DirectoryMonitor was checking the compressed file by reading it one more time (since w/o this receiver may stuck on truncated file), while this is ineffective and we can just check the checksums before sending. But note that this may decrease batch size that is used for sending over network.	2021-01-10 21:23:42 +03:00
Azat Khuzhin	819b9d7d56	Add more metadata into distributed .bin files to avoid doing the same on sending Before this patch StorageDistributedDirectoryMonitor reading .bin files in batch mode, just to calculate number of bytes/rows, this is very ineffective, let's just store them in the header (rows/bytes).	2021-01-10 18:17:15 +03:00
Azat Khuzhin	fce8b6b5ef	Refactoring distributed header parsing	2021-01-10 18:17:15 +03:00
Azat Khuzhin	676bc83c6d	Check per-block checksum of the distributed batch on the sender before sending This is already done for distributed_directory_monitor_batch_inserts=1, so let's do the same for the non batched mode, since otherwise in case the file will be truncated the receiver will just stuck (since it will wait for the block, but the sender will not send it).	2021-01-10 18:17:14 +03:00
Azat Khuzhin	471deab63a	Rename fsync_tmp_directory to fsync_directories for Distributed engine	2021-01-09 17:51:30 +03:00
Azat Khuzhin	ae0b15455f	Add fsync_tmp_directory support into DirectoryMonitor	2021-01-09 16:31:52 +03:00
Azat Khuzhin	2e55bd2285	Accept IDisk in DirectoryMonitor (for further fsync)	2021-01-09 16:31:42 +03:00
Azat Khuzhin	dd669cb2b6	Add fsync support for Distributed/DirectoryMonitor Note that there is no fsync_tmp_directory support in DirectoryMonitor since you cannot propagate the error to user anyway.	2021-01-09 15:26:25 +03:00
Azat Khuzhin	fbe5df809b	Sync other temporary directories for Distributed fsync_tmp_directories	2021-01-09 11:36:04 +03:00
Azat Khuzhin	b5ace27014	Add fsync support for Distributed engine. Two new settings (by analogy with MergeTree family) has been added: - `fsync_after_insert` - Do fsync for every inserted. Will decreases performance of inserts. - `fsync_tmp_directory` - Do fsync for temporary directory (that is used for async INSERT only) after all part operations (writes, renames, etc.). Refs: #17380 (p1)	2021-01-09 11:31:32 +03:00
alexey-milovidov	417e685830	Merge pull request #18775 from ClickHouse/dont-insert-empty-blocks-distributed-sync Do not insert empty blocks on sync Distributed INSERT	2021-01-06 20:06:35 +03:00
Alexey Milovidov	1572d2122b	Respect network_compression_method in async INSERT into Distributed table	2021-01-06 03:41:34 +03:00
Alexey Milovidov	190402b7d5	Do not insert empty blocks on sync Distributed INSERT	2021-01-06 02:54:22 +03:00
Amos Bird	6fc225e676	Distributed insertion to one random shard (#18294 ) * Distributed insertion to one random shard * add some tests * add some documentation * Respect shards' weights * fine locking Co-authored-by: Ivan Lezhankin <ilezhankin@yandex-team.ru>	2020-12-23 19:04:05 +03:00
Azat Khuzhin	5b3ab48861	More forward declaration for generic headers The following headers are pretty generic, so use forward declaration as much as possible: - Context.h - Settings.h - ConnectionTimeouts.h (Also this shows that some missing some includes -- this has been fixed) And split ConnectionTimeouts.h into ConnectionTimeoutsContext.h (since module part cannot be added for it, due to recursive build dependencies that will be introduced) Also remove Settings from the RemoteBlockInputStream/RemoteQueryExecutor and just pass the context, since settings was passed only in specific places, that can allow making a copy of Context (i.e. Copier). Approx results (How much units will be recompiled after changing file X?): - ConnectionTimeouts.h - mainline: 100 - Context.h: - mainline: ~800 - patched: 415 - Settings.h: - mainline: 900-1K - patched: 440 (most of them because of the Context.h)	2020-12-12 17:43:10 +03:00
Alexander Tokmakov	bfbf150c67	fix segfault when 'not enough space'	2020-12-02 17:49:43 +03:00
Amos Bird	1d9d586e20	Make global_context consistent.	2020-11-20 18:23:14 +08:00
Alexander Tokmakov	b94cc5c4e5	remove more stringstreams	2020-11-10 21:22:26 +03:00
alexey-milovidov	f39457bc77	Merge pull request #16788 from azat/fix-use_compact_format_in_distributed_parts_names Apply use_compact_format_in_distributed_parts_names for each INSERT (with internal_replication)	2020-11-08 23:23:10 +03:00
Azat Khuzhin	04db0834bf	Apply use_compact_format_in_distributed_parts_names for each INSERT (with internal_replication) Before this patch use_compact_format_in_distributed_parts_names was applied only from default profile (at server start) for internal_replication=1, and was ignored on INSERT.	2020-11-08 03:05:52 +03:00
Alexey Milovidov	fd84d16387	Fix "server failed to start" error	2020-11-07 03:14:53 +03:00
Azat Khuzhin	59cdc964a1	Do not store reference to BackgroundSchedulePool in DirectoryMonitor (useless)	2020-11-05 23:43:34 +03:00
Alexander Kuzmenkov	fb64cf210a	straighten the protocol version	2020-09-17 17:37:29 +03:00
Azat Khuzhin	0159c74f21	Secure inter-cluster query execution (with initial_user as current query user) [v3] Add inter-server cluster secret, it is used for Distributed queries inside cluster, you can configure in the configuration file: <remote_servers> <logs> <shard> <secret>foobar</secret> <!-- empty -- works as before --> ... </shard> </logs> </remote_servers> And this will allow clickhouse to make sure that the query was not faked, and was issued from the node that knows the secret. And since trust appeared it can use initial_user for query execution, this will apply correct *_for_user (since with inter-server secret enabled, the query will be executed from the same user on the shards as on initiator, unlike "default" user w/o it). v2: Change user to the initial_user for Distributed queries if secret match v3: Add Protocol::Cluster package v4: Drop Protocol::Cluster and use plain Protocol::Hello + user marker v5: Do not use user from Hello for cluster-secure (superfluous)	2020-09-15 01:36:28 +03:00
Azat Khuzhin	a588947fe2	Fix DistributedFilesToInsert metric (zeroed when it should not) CurrentMetrics::Increment add amount for specified metric only for the lifetime of the object, but this is not the intention, since DistributedFilesToInsert is a gauge and after #10263 it can exit from the callback (and enter again later, for example after SYSTEM STOP DISTRIBUTED SEND it will always exit from it, until SYSTEM START DISTRIBUTED SEND). So make Increment member of a class (this will also fix possible issues with subtracting value on DROP TABLE).	2020-08-27 00:43:00 +03:00
Alexander Tokmakov	dd4b8b9663	fix lock order inversion when renaming distributed table	2020-08-20 16:36:22 +03:00
Alexey Milovidov	12f66fa82c	Fix 99% of typos	2020-08-08 04:01:47 +03:00
Vitaly Baranov	56665a15f7	Rework and rename the template class SettingsCollection => BaseSettings.	2020-07-31 20:54:18 +03:00
Vladimir Chebotarev	faedb04722	Minor fixes.	2020-07-28 19:45:46 +03:00
Vladimir Chebotarev	1b3f5c99f5	Real fix of test.	2020-07-26 21:27:36 +03:00
Alexey Milovidov	73a5c38398	Fix potential overflow in integer division #12119	2020-07-05 03:29:03 +03:00
alesapin	ebb36bec8a	Merge branch 'master' into atomic_metadata5	2020-06-18 11:57:16 +03:00
alesapin	1ddeb3d149	Buildable getSampleBlock in StorageInMemoryMetadata	2020-06-16 18:51:29 +03:00
Azat Khuzhin	c139a05370	Forward declaration in StorageDistributed	2020-06-14 01:09:21 +03:00
Azat Khuzhin	d2383f0f5d	Fix async INSERT into Distributed for prefer_localhost_replica=0 and w/o internal_replication	2020-06-08 21:58:56 +03:00
Azat Khuzhin	86c5465bf8	Rewrite StorageSystemDistributionQueue interfaces	2020-06-04 03:04:32 +03:00
Azat Khuzhin	f0050adc51	Make system.distribution_queue metrics non racy	2020-06-04 02:36:16 +03:00
Azat Khuzhin	09c3ca9c6c	Add last_exception into system.distribution_queue	2020-06-04 02:36:16 +03:00
Azat Khuzhin	389f78ceee	Add system.distribution_queue system.distribution_queue contains the following columns: - database - table - data_path - is_blocked - error_count - data_files - data_compressed_bytes	2020-06-04 02:36:16 +03:00
alesapin	6253e9b97b	Revert disabled tests	2020-06-02 21:41:29 +03:00
Alexey Milovidov	25f941020b	Remove namespace pollution	2020-05-31 00:57:37 +03:00
Alexey Milovidov	146370934a	Keep the value of DistributedFilesToInsert metric on exceptions	2020-05-27 13:07:38 +03:00
Alexey Milovidov	7e1813825b	Return old names of macros	2020-05-24 01:24:01 +03:00
Alexey Milovidov	ce0619dabf	Progress on task	2020-05-24 00:26:45 +03:00
Alexey Milovidov	2d7d5a1547	Apply all transformations again	2020-05-24 00:16:27 +03:00
Alexey Milovidov	bab24879e9	Progress on task	2020-05-24 00:16:05 +03:00
Alexey Milovidov	e1695feb7f	Apply all transformations again	2020-05-23 23:40:32 +03:00
Alexey Milovidov	85f84550ba	Progress on task	2020-05-23 23:37:37 +03:00
Alexey Milovidov	7e2fb9ad65	Apply all transformations again	2020-05-23 22:38:30 +03:00
Alexey Milovidov	eacff92d0e	Progress on task	2020-05-23 22:35:08 +03:00
Alexey Milovidov	29762240de	Remove duplicate whitespaces (preparation)	2020-05-23 22:31:54 +03:00
Alexey Milovidov	7fed65cbe2	Remove duplicate whitespaces (preparation)	2020-05-23 22:14:58 +03:00
Alexey Milovidov	ab0562a574	Make all LOG in single line (preparation)	2020-05-23 22:05:41 +03:00
Alexey Milovidov	a2ad11897f	Remove duplicate whitespaces (preparation)	2020-05-23 21:53:58 +03:00
Alexey Milovidov	1f13515a65	Make all LOG in single line (preparation)	2020-05-23 21:31:37 +03:00
Alexey Milovidov	8042e5febe	find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+" << [^<]+ << "[^"]+" << [^<]+\);' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+), "([^"]+)" << ([^<]+) << "([^"]+)" << ([^<]+)\);/\1_FORMATTED(\2, "\3{}\5{}", \4, \6);/'	2020-05-23 19:58:15 +03:00
Alexey Milovidov	e391b77d81	find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+" << [^<]+ << "[^"]+"\);' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+), "([^"]+)" << ([^<]+) << "([^"]+)"\);/\1_FORMATTED(\2, "\3{}\5", \4);/'	2020-05-23 19:56:05 +03:00
Alexey Milovidov	8d2e80a5e2	find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+"\)' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+, "[^"]+")\)/\1_FORMATTED(\2)/'	2020-05-23 19:42:39 +03:00
Azat Khuzhin	d93b9a57f6	Forward declaration for Context as much as possible. Now after changing Context.h 488 modules will be recompiled instead of 582.	2020-05-21 01:53:18 +03:00
alexey-milovidov	7cf3538840	Merge pull request #10270 from ClickHouse/quota-key-in-client Support quota_key for Native client	2020-05-17 14:09:40 +03:00
alexey-milovidov	7ee35f102d	Merge pull request #10867 from azat/dist-INSERT-load-balancing Respect prefer_localhost_replica/load_balancing on INSERT into Distributed	2020-05-17 11:11:35 +03:00
Alexey Milovidov	397859ccb8	Fix error	2020-05-17 08:45:20 +03:00
Azat Khuzhin	2498041cc1	Avoid sending partially written files by the DistributedBlockOutputStream	2020-05-16 01:00:42 +03:00
Azat Khuzhin	52d73c7f45	Fix prefer_localhost_replica=0 and load_balancing for Distributed INSERT	2020-05-14 03:29:03 +03:00
Azat Khuzhin	e1d4837753	Fix list of possible nodes for Distributed INSERT for internal_replication=0	2020-05-14 03:28:08 +03:00
Azat Khuzhin	cdf3845e43	Respect load_balancing in DirectoryMonitor, to fix w/o internal_replication	2020-05-14 01:33:25 +03:00
Azat Khuzhin	085bafad05	Handle prefer_localhost_replica on INSERT into Distributed Right now it will issue remote send even if finally the local replica will be selected - not good I guess. This should also fix load_balancing.	2020-05-13 01:38:03 +03:00
Azat Khuzhin	889f54b549	Fix ENOENT exception on current_batch.txt in DirectoryMonitor current_batch.txt will not exist if there was no send, this is the case when all batches that was pending has been marked as pending.	2020-05-13 01:23:18 +03:00
alexey-milovidov	ddc84163a7	Merge pull request #10486 from azat/dist-send-on-INSERT Fix distributed send that are scheduled by INSERT query	2020-05-11 06:28:35 +03:00
Azat Khuzhin	5c89cdbe61	Fix distributed send retries on distributed_directory_monitor_{max_,}sleep_time_ms > 5min In this case error_count can be decreased before checking it for rescheduling send. And actually this can be a problem not only when distributed_directory_monitor_{max_,}sleep_time_ms > 5min, because all threads can be occupied and the real timeout between sends will be > 5min.	2020-05-10 12:37:38 +03:00
Gleb Novikov	c637d99e07	Volumes and storages refactoring: 1. Moved Volume to separate file 2. Created IVolume interface and implemented current behaviour in implementation of new interface — VolumeJBOD 3. Replaced all old volume usages with new VolumeJBOD. Where it is unnecessary to have JBOD — left just IVolume. 4. Removed old Volume completely 5. Moved StoragePolicy to separated files 6. Moved DiskSelector to separated files 7. Removed DiskSpaceMonitor file	2020-05-04 23:15:38 +03:00
Azat Khuzhin	6ffdd53b6a	Share auto-increment for first batch and tmp file in DistributedBlockOutputStream	2020-05-03 14:47:59 +03:00
Azat Khuzhin	53c470cab4	Fix directory monitor initialization from INSERT into Distributed This also fixes hardlink code (when one file should be sent to multiple servers, i.e. internal_replication == false) of writeToShard() with distributed_storage_policy (i.e. when StorageDistributed::getPath() will path to different filesystems). Plus also cleanup DistributedBlockOutputStream::writeToShard() a little.	2020-05-03 14:47:51 +03:00
Azat Khuzhin	e97e1f06db	Do not schedule distributed send if there were no error Since in this case it will be scheduled from the DistributedBlockOutputStream with the distributed_directory_monitor_max_sleep_time_ms, and this will overwrite timer that was set by the DistributedBlockOutputStream, not good.	2020-05-03 14:46:44 +03:00
Azat Khuzhin	947b3942dd	Schedule distributed sends after the file has been written	2020-05-03 14:46:43 +03:00
Azat Khuzhin	0157fd5d93	Fix distributed send that are scheduled by INSERT query Before this patch each INSERT query re-schedule distributed send, thus each time it resets the timer, while this is not the expected behaviour, since in on frequent INSERT distributed sends will not be triggered at all. Fix this by not resetting the timer.	2020-05-03 14:46:42 +03:00
Azat Khuzhin	6bb39dafc3	Drop decreated code (cond var and note for thread) in DirectoryMonitor	2020-05-03 14:46:41 +03:00
Azat Khuzhin	63d8ab8f03	Make createSelector() static (in storage) and const (in stream)	2020-05-01 11:31:05 +03:00
Azat Khuzhin	f22ba15b4a	Reduce copy-paste of DistributedBlockOutputStream::createSelector This will make it less error prone.	2020-05-01 02:59:40 +03:00
Alexey Milovidov	1e325a9fd9	Checkpoint	2020-04-22 09:22:14 +03:00
Azat Khuzhin	5d11118cc9	Use thread pool (background_distributed_schedule_pool_size) for distributed sends After #8756 the problem with 1 thread for each (distributed table, disk) for distributed sends became even worse (since there can be multiple disks), so use predefined thread pool for this tasks, that can be controlled with background_distributed_schedule_pool_size knob.	2020-04-19 12:01:56 +03:00
Azat Khuzhin	673ddc9d77	Drop superfluous locking for atomic in DirectoryMonitor	2020-04-19 00:22:48 +03:00
Alexey Milovidov	8ad04d4fec	Remove useless code	2020-04-15 00:05:45 +03:00
Azat Khuzhin	6d85207bfb	Convert blocks if structure does not match on INSERT into Distributed() Follow-up for: #10105	2020-04-08 23:46:01 +03:00
Azat Khuzhin	b2fa9d8750	Fix SIGSEGV on INSERT into Distributed on different struct with underlying	2020-04-08 02:35:31 +03:00
Ivan Lezhankin	06446b4f08	dbms/ → src/	2020-04-03 18:14:31 +03:00
Azat Khuzhin	f53c9a6b25	Fix "Block structure mismatch" for INSERT into Distributed Add missing conversion (via ConvertingBlockInputStream) for INSERT into remote nodes (for sync insert, async insert and async batch insert), like for local nodes (in DistributedBlockOutputStream::writeBlockConverted). This is required when the structure of the Distributed table differs from the structure of the local table. And also add a warning message, to highlight this in logs (since this works slower). Fixes: #19888	2021-02-02 10:16:41 +03:00

194 Commits