ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-12-15 19:02:04 +00:00

Author	SHA1	Message	Date
Maksim Kita	67e9b85951	Merge ext into common	2021-06-16 23:28:41 +03:00
alexey-milovidov	34d12063f8	Merge pull request #23349 from azat/dist-respect-insert_allow_materialized_columns Respect insert_allow_materialized_columns for INSERT into Distributed()	2021-06-14 07:23:00 +03:00
Azat Khuzhin	2109980284	Respect max_distributed_connections for insert_distributed_sync Otherwise for huge clusters and sync insert it may run out of max_thread_pool_size (default 10K).	2021-06-08 09:11:44 +03:00
Alexey Milovidov	17962459f5	Merge branch 'master' into issue-16775	2021-06-06 02:18:28 +03:00
tavplubix	e9ff0b6d70	Merge pull request #23657 from kssenii/poco-file-to-std-fs Poco::File to std::filesystem	2021-05-31 23:17:02 +03:00
Nikolai Kochetov	afc1fe7f3d	Make ContextPtr const by default.	2021-05-31 17:49:02 +03:00
Alexey Milovidov	273226de32	Remove string parameter for Density	2021-05-24 06:43:25 +03:00
Alexey Milovidov	40d4f0678f	Remove overload (harmful)	2021-05-23 04:25:06 +03:00
Azat Khuzhin	4d737a5481	Respect insert_allow_materialized_columns for INSERT into Distributed()	2021-05-20 07:40:46 +03:00
Azat Khuzhin	c3e65c0d27	Async INSERT into Distributed() does support settings Since #4852	2021-05-20 07:40:46 +03:00
kssenii	ab1a05a1f4	Poco::Path to fs::path, less concatination	2021-05-09 14:59:49 +03:00
kssenii	02288359c5	Less manual concatenation of paths	2021-05-08 13:59:55 +03:00
fibersel	cb53bbb7b0	add experimental codecs flag, add integration test for experimental codecs	2021-05-06 14:57:22 +03:00
kssenii	2dabdd0f73	Merge branch 'master' of github.com:ClickHouse/ClickHouse into poco-file-to-std-fs	2021-05-05 18:42:40 +03:00
Azat Khuzhin	9c6e8e1462	Add BrokenDistributedFilesToInsert new metric Number of files for asynchronous insertion into Distributed tables that has been marked as broken. This metric will starts from 0 on start. Number of files for every shard is summed.	2021-05-04 22:48:07 +03:00
Azat Khuzhin	74269882f7	Add broken_data_files/broken_data_compressed_bytes into distribution_queue	2021-05-04 22:48:07 +03:00
Azat Khuzhin	5e33604c4d	Add file paths into logs on failed distributed async sends	2021-05-03 08:55:38 +03:00
kssenii	ee06936596	Merge branch 'master' of github.com:ClickHouse/ClickHouse into poco-file-to-std-fs	2021-05-01 17:24:31 +03:00
Maksim Kita	1db6eb3666	Merge pull request #23744 from azat/dist-INSERT-preserve-error Preserve errors for INSERT into Distributed	2021-04-29 10:26:34 +03:00
Azat Khuzhin	73ab415c4c	Preserve errors for INSERT into Distributed Before this patch (and after #22208) the INSERT may fail with "Cannot schedule a task" because the pool in DistributedBlockOutputStream already throws exception and simply fail in writeSuffix().	2021-04-28 22:33:29 +03:00
kssenii	deb4903af8	Merge branch 'master' of github.com:ClickHouse/ClickHouse into poco-file-to-std-fs	2021-04-28 20:57:13 +03:00
kssenii	1e4a61ce63	Fix build	2021-04-27 20:22:39 +03:00
kssenii	eeb71672a0	Change in Storages/*	2021-04-27 16:49:37 +03:00
Amos Bird	096d76627e	Skip unavaiable shards when writing to distributed tables	2021-04-21 10:30:40 +08:00
Alexey Milovidov	77e64b3ebd	Merge branch 'master' into protocol-compression-auto	2021-04-17 16:46:51 +03:00
Azat Khuzhin	d2cf03ea41	Change logging from trace to debug for messages with rows/bytes	2021-04-15 21:00:16 +03:00
Amos Bird	bf5b668f85	Fix random_shard_insert issue	2021-04-15 21:12:39 +08:00
Alexey Milovidov	6f56c3280f	Uncompress data in Distributed sends if needed	2021-04-14 00:53:39 +03:00
Ivan	495c6e03aa	Replace all Context references with std::weak_ptr (#22297 ) * Replace all Context references with std::weak_ptr * Fix shared context captured by value * Fix build * Fix Context with named sessions * Fix copy context * Fix gcc build * Merge with master and fix build * Fix gcc-9 build	2021-04-11 02:33:54 +03:00
Alexander Kuzmenkov	0264124146	Merge pull request #21942 from ucasFL/distributed_depth Add settings max_distributed_depth	2021-04-09 15:52:58 +03:00
Azat Khuzhin	c27b931f6a	Slightly improve logging messages for Distributed async sends - add took time (in ms) - add rows/bytes	2021-04-08 08:10:39 +03:00
Azat Khuzhin	27d4fbd13b	Compare Block itself for distributed async INSERT batches INSERT into Distributed with insert_distributed_sync=1 stores the distributed batches on the disk for sending in background. But types may be a little bit different for the Distributed and it's underlying table, so the initiator need to know whether conversion is required or not. Before this patch those on disk distributed batches contains header, which includes dumpStructure() for the block in that batch, however it checks not only names and types and plus dumpStructure() is a debug method. So instead of storing string representation for the block header we should store empty block in the file header (note, that we cannot store the empty block not in header, since this will require reading all blocks from file, due to some trickery of the readers interface). Note, that this patch also contains tiny refactoring: - s/header/distributed_header/ v1: dumpNamesAndTypes() v2: dump empty block into the batch itself v3: move empty block into the header	2021-04-06 10:05:21 +03:00
feng lv	56073db22d	max distributed depth Add settings max_distributed_depth fix style fix fix fix fix test fix fix	2021-04-05 14:00:54 +00:00
Azat Khuzhin	79ed35876e	DirectoryMonitor: Remove const qualifier and lots of mutable qualifiers	2021-03-03 23:30:24 +03:00
Azat Khuzhin	45ee650e26	Distributed: check for bytes_to_throw/delay_insert only before INSERT Before it was checked for each block.	2021-03-03 23:30:24 +03:00
Azat Khuzhin	6965ac26c3	Distributed: Add ability to delay/throttle INSERT until pending data will be reduced Add two new settings for the Distributed engine: - bytes_to_delay_insert - max_delay_to_insert If at the beginning of INSERT there will be too much pending data, more then bytes_to_delay_insert, then the INSERT will wait until it will be shrinked, and not more then max_delay_to_insert seconds. If after this there will be still too much pending, it will throw an exception. Also new profile events were added (by analogy to the MergeTree): - DistributedDelayedInserts (although you can use system.errors instead of this, but still) - DistributedRejectedInserts - DistributedDelayedInsertsMilliseconds	2021-03-03 23:30:23 +03:00
Azat Khuzhin	15f7459cae	Distributed/DirectoryMonitor: protect metric_pending_files with metrics_lock Since there is local value, that is not atomic, anyway we already have lock for metrics, so it is fine.	2021-03-03 23:30:03 +03:00
Azat Khuzhin	70049db143	CurrentMetrics/Increment: Introduce add()	2021-03-03 23:30:03 +03:00
Azat Khuzhin	017c054a35	Distributed/DirectoryMonitor: Use std::lock_guard over std::unique_lock It is more natural, since we do not need lazy locking.	2021-03-03 23:30:03 +03:00
Azat Khuzhin	fcf49a4914	Distributed: Calculate counters for async INSERT at INSERT time Previous patch fixes the inaccuracy, but it's done using iterating over directory on each request (to system.distribution_queue or to check bytes_to_throw_insert), and like previous patch alredy stated, it may have pretty huge overhead (especially when you have lots of distributed files pending). This patch remove that recalculation (but it will still be done, and if there is different, there will be a log message), and replace it with proper account at INSERT time (and after file has been sent, or marked as broken).	2021-03-03 23:30:03 +03:00
Azat Khuzhin	b43046ba06	Distributed: More accurate distribution_queue counters So now system.distribution_queue will show accurate statistics, so tests does not requires sleep anymore. But note that with too much distributed pending this will iterate over all directories.	2021-03-03 23:30:03 +03:00
Azat Khuzhin	b5a5778589	Distributed: Add ability to limit amount of pending bytes for async INSERT Right now with distributed_directory_monitor_batch_inserts=1 and insert_distributed_sync=0 INSERT into Distributed table will store blocks that should be sent to remote (and in case of prefer_localhost_replica=0 to the localhost too) on the local filesystem, and sent it in background. However there is no limit for this storage, and if the remote is unavailable (or some other error), these pending blocks may take significant space, and this is not always desired behaviour. Add new Distributed setting - bytes_to_throw_insert, that will set the limit for how much pending bytes is allowed, if the limit will be reached an exception will be throw. By default was set to 0, to avoid surprises.	2021-03-03 23:30:00 +03:00
Azat Khuzhin	02198d091e	Add proper checks while parsing directory names for async INSERT (fixes SIGSEGV)	2021-02-15 10:53:41 +03:00
Kruglov Pavel	d94e8624d7	Merge branch 'master' into shard-id	2021-02-06 16:48:17 +03:00
feng lv	0edf65c094	fix test fix update fix spell	2021-02-02 09:22:30 +00:00
feng lv	4279c7da41	add setting insert_shard_id add test fix style fix	2021-02-02 04:26:59 +00:00
Anton Popov	c7070da85a	better abstractions in disk interface	2021-01-26 17:49:35 +03:00
Azat Khuzhin	109dbe5df4	Check the stream before sending while hanlding async INSERTs into Distributed It is possible to get corruption (even though it is very unlikely, and initially it wasn't corruption) just before the data block goes in the file on disk, and in case of batching, it will break the packets, since it will write the packet type but will not write any data after.	2021-01-22 21:29:58 +03:00
Azat Khuzhin	8a00816396	Do not mark file for distributed send as broken on EOF - the sender will got ATTEMPT_TO_READ_AFTER_EOF (added in `946c275dfb`) when the client just go away, i.e. server had been restarted, and this is incorrect to mark the file as broken in this case. - since #18853 the file will be checked on the sender locally, and in case the file was truncated CANNOT_READ_ALL_DATA will be thrown. But before #18853 the sender will not receive ATTEMPT_TO_READ_AFTER_EOF from the client in case of file was truncated on the sender, since the client will just wait for more data, IOW just hang. - and I don't see how ATTEMPT_TO_READ_AFTER_EOF can be received while reading local file.	2021-01-20 01:10:17 +03:00
Azat Khuzhin	a6631287a7	DistributedBlockOutputStream: add more comments	2021-01-17 12:50:37 +03:00
Azat Khuzhin	b725e1d131	DistributedBlockOutputStream: Remove superfluous brackets for string construction	2021-01-17 12:48:51 +03:00
Azat Khuzhin	2955e25e83	Fix inserted blocks accounting for insert_distributed_one_random_shard=1 It is tricky due to block splitting Refs: https://github.com/ClickHouse/ClickHouse/pull/18294	2021-01-17 12:45:42 +03:00
Azat Khuzhin	858f07c796	Update comment for query AST cloning during inesrt into multiple local shards Refs: https://github.com/ClickHouse/ClickHouse/pull/18264#discussion_r558839456	2021-01-17 12:40:41 +03:00
alexey-milovidov	a5a19de878	Update DistributedBlockOutputStream.cpp	2021-01-16 13:22:25 +03:00
feng lv	9829c09720	fix	2021-01-15 15:54:35 +00:00
Azat Khuzhin	ecae6c1c60	Avoid reading the distributed batch just to read the block header Before this patch batched mode of the DirectoryMonitor is 2x slower then non-batched, after it should be more or less the same as non-batched.	2021-01-14 22:38:46 +03:00
Azat Khuzhin	56475774d3	Fix readability-static-definition-in-anonymous-namespace in DirectoryMonitor	2021-01-10 23:57:40 +03:00
Azat Khuzhin	2565d2ac44	Verify compressed headers while sending distributed batches Before this patch the DirectoryMonitor was checking the compressed file by reading it one more time (since w/o this receiver may stuck on truncated file), while this is ineffective and we can just check the checksums before sending. But note that this may decrease batch size that is used for sending over network.	2021-01-10 21:23:42 +03:00
Azat Khuzhin	819b9d7d56	Add more metadata into distributed .bin files to avoid doing the same on sending Before this patch StorageDistributedDirectoryMonitor reading .bin files in batch mode, just to calculate number of bytes/rows, this is very ineffective, let's just store them in the header (rows/bytes).	2021-01-10 18:17:15 +03:00
Azat Khuzhin	fce8b6b5ef	Refactoring distributed header parsing	2021-01-10 18:17:15 +03:00
Azat Khuzhin	676bc83c6d	Check per-block checksum of the distributed batch on the sender before sending This is already done for distributed_directory_monitor_batch_inserts=1, so let's do the same for the non batched mode, since otherwise in case the file will be truncated the receiver will just stuck (since it will wait for the block, but the sender will not send it).	2021-01-10 18:17:14 +03:00
Azat Khuzhin	471deab63a	Rename fsync_tmp_directory to fsync_directories for Distributed engine	2021-01-09 17:51:30 +03:00
Azat Khuzhin	ae0b15455f	Add fsync_tmp_directory support into DirectoryMonitor	2021-01-09 16:31:52 +03:00
Azat Khuzhin	2e55bd2285	Accept IDisk in DirectoryMonitor (for further fsync)	2021-01-09 16:31:42 +03:00
Azat Khuzhin	dd669cb2b6	Add fsync support for Distributed/DirectoryMonitor Note that there is no fsync_tmp_directory support in DirectoryMonitor since you cannot propagate the error to user anyway.	2021-01-09 15:26:25 +03:00
Azat Khuzhin	fbe5df809b	Sync other temporary directories for Distributed fsync_tmp_directories	2021-01-09 11:36:04 +03:00
Azat Khuzhin	b5ace27014	Add fsync support for Distributed engine. Two new settings (by analogy with MergeTree family) has been added: - `fsync_after_insert` - Do fsync for every inserted. Will decreases performance of inserts. - `fsync_tmp_directory` - Do fsync for temporary directory (that is used for async INSERT only) after all part operations (writes, renames, etc.). Refs: #17380 (p1)	2021-01-09 11:31:32 +03:00
alexey-milovidov	417e685830	Merge pull request #18775 from ClickHouse/dont-insert-empty-blocks-distributed-sync Do not insert empty blocks on sync Distributed INSERT	2021-01-06 20:06:35 +03:00
Alexey Milovidov	1572d2122b	Respect network_compression_method in async INSERT into Distributed table	2021-01-06 03:41:34 +03:00
Alexey Milovidov	190402b7d5	Do not insert empty blocks on sync Distributed INSERT	2021-01-06 02:54:22 +03:00
Amos Bird	6fc225e676	Distributed insertion to one random shard (#18294 ) * Distributed insertion to one random shard * add some tests * add some documentation * Respect shards' weights * fine locking Co-authored-by: Ivan Lezhankin <ilezhankin@yandex-team.ru>	2020-12-23 19:04:05 +03:00
Azat Khuzhin	5b3ab48861	More forward declaration for generic headers The following headers are pretty generic, so use forward declaration as much as possible: - Context.h - Settings.h - ConnectionTimeouts.h (Also this shows that some missing some includes -- this has been fixed) And split ConnectionTimeouts.h into ConnectionTimeoutsContext.h (since module part cannot be added for it, due to recursive build dependencies that will be introduced) Also remove Settings from the RemoteBlockInputStream/RemoteQueryExecutor and just pass the context, since settings was passed only in speicifc places, that can allow making a copy of Context (i.e. Copier). Approx results (How much units will be recompiled after changing file X?): - ConnectionTimeouts.h - mainline: 100 - Context.h: - mainline: ~800 - patched: 415 - Settings.h: - mainline: 900-1K - patched: 440 (most of them because of the Context.h)	2020-12-12 17:43:10 +03:00
Alexander Tokmakov	bfbf150c67	fix segfault when 'not enough space'	2020-12-02 17:49:43 +03:00
Amos Bird	1d9d586e20	Make global_context consistent.	2020-11-20 18:23:14 +08:00
Alexander Tokmakov	b94cc5c4e5	remove more stringstreams	2020-11-10 21:22:26 +03:00
alexey-milovidov	f39457bc77	Merge pull request #16788 from azat/fix-use_compact_format_in_distributed_parts_names Apply use_compact_format_in_distributed_parts_names for each INSERT (with internal_replication)	2020-11-08 23:23:10 +03:00
Azat Khuzhin	04db0834bf	Apply use_compact_format_in_distributed_parts_names for each INSERT (with internal_replication) Before this patch use_compact_format_in_distributed_parts_names was applied only from default profile (at server start) for internal_replication=1, and was ignored on INSERT.	2020-11-08 03:05:52 +03:00
Alexey Milovidov	fd84d16387	Fix "server failed to start" error	2020-11-07 03:14:53 +03:00
Azat Khuzhin	59cdc964a1	Do not store reference to BackgroundSchedulePool in DirectoryMonitor (useless)	2020-11-05 23:43:34 +03:00
Alexander Kuzmenkov	fb64cf210a	straighten the protocol version	2020-09-17 17:37:29 +03:00
Azat Khuzhin	0159c74f21	Secure inter-cluster query execution (with initial_user as current query user) [v3] Add inter-server cluster secret, it is used for Distributed queries inside cluster, you can configure in the configuration file: <remote_servers> <logs> <shard> <secret>foobar</secret> <!-- empty -- works as before --> ... </shard> </logs> </remote_servers> And this will allow clickhouse to make sure that the query was not faked, and was issued from the node that knows the secret. And since trust appeared it can use initial_user for query execution, this will apply correct *_for_user (since with inter-server secret enabled, the query will be executed from the same user on the shards as on initator, unlike "default" user w/o it). v2: Change user to the initial_user for Distributed queries if secret match v3: Add Protocol::Cluster package v4: Drop Protocol::Cluster and use plain Protocol::Hello + user marker v5: Do not use user from Hello for cluster-secure (superfluous)	2020-09-15 01:36:28 +03:00
Azat Khuzhin	a588947fe2	Fix DistributedFilesToInsert metric (zeroed when it should not) CurrentMetrics::Increment add amount for specified metric only for the lifetime of the object, but this is not the intention, since DistributedFilesToInsert is a gauge and after #10263 it can exit from the callback (and enter again later, for example after SYSTEM STOP DISTRIBUTED SEND it will always exit from it, until SYSTEM START DISTRIBUTED SEND). So make Increment member of a class (this will also fix possible issues with substructing value on DROP TABLE).	2020-08-27 00:43:00 +03:00
Alexander Tokmakov	dd4b8b9663	fix lock order inversion when renaming distributed table	2020-08-20 16:36:22 +03:00
Alexey Milovidov	12f66fa82c	Fix 99% of typos	2020-08-08 04:01:47 +03:00
Vitaly Baranov	56665a15f7	Rework and rename the template class SettingsCollection => BaseSettings.	2020-07-31 20:54:18 +03:00
Vladimir Chebotarev	faedb04722	Minor fixes.	2020-07-28 19:45:46 +03:00
Vladimir Chebotarev	1b3f5c99f5	Real fix of test.	2020-07-26 21:27:36 +03:00
Alexey Milovidov	73a5c38398	Fix potential overflow in integer division #12119	2020-07-05 03:29:03 +03:00
alesapin	ebb36bec8a	Merge branch 'master' into atomic_metadata5	2020-06-18 11:57:16 +03:00
alesapin	1ddeb3d149	Buildable getSampleBlock in StorageInMemoryMetadata	2020-06-16 18:51:29 +03:00
Azat Khuzhin	c139a05370	Forward declaration in StorageDistributed	2020-06-14 01:09:21 +03:00
Azat Khuzhin	d2383f0f5d	Fix async INSERT into Distributed for prefer_localhost_replica=0 and w/o internal_replication	2020-06-08 21:58:56 +03:00
Azat Khuzhin	86c5465bf8	Rewrite StorageSystemDistributionQueue interfaces	2020-06-04 03:04:32 +03:00
Azat Khuzhin	f0050adc51	Make system.distribution_queue metrics non racy	2020-06-04 02:36:16 +03:00
Azat Khuzhin	09c3ca9c6c	Add last_exception into system.distribution_queue	2020-06-04 02:36:16 +03:00
Azat Khuzhin	389f78ceee	Add system.distribution_queue system.distribution_queue contains the following columns: - database - table - data_path - is_blocked - error_count - data_files - data_compressed_bytes	2020-06-04 02:36:16 +03:00
alesapin	6253e9b97b	Revert disabled tests	2020-06-02 21:41:29 +03:00
Alexey Milovidov	25f941020b	Remove namespace pollution	2020-05-31 00:57:37 +03:00
Alexey Milovidov	146370934a	Keep the value of DistributedFilesToInsert metric on exceptions	2020-05-27 13:07:38 +03:00
Alexey Milovidov	7e1813825b	Return old names of macros	2020-05-24 01:24:01 +03:00

194 Commits