ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-12-16 03:12:43 +00:00

Author	SHA1	Message	Date
Frank Chen	40c6e4c0d6	Merge remote-tracking branch 'origin/master' into tracing_context_propagation	2022-08-02 10:02:09 +08:00
Alexey Milovidov	8fb70abe3e	Merge pull request #39178 from azat/dist-insert-log Add connection info for Distributed sends log message	2022-07-31 02:22:22 +03:00
Robert Schulze	4333750985	Less usage of StringRef ... replaced by std::string_view, see #39262	2022-07-24 18:33:52 +00:00
Robert Schulze	81ef1099cc	Even less usage of StringRef --> see #39300	2022-07-19 07:01:06 +00:00
Azat Khuzhin	4f25a08b7c	Add connection info for Distributed sends log message Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2022-07-13 16:04:12 +03:00
Frank Chen	da57a993e4	Fix CI	2022-07-09 13:43:10 +08:00
Frank Chen	d3d89f59ca	Add tracing support to distributed insert	2022-07-07 17:43:09 +08:00
Nikolai Kochetov	1b85f2c1d6	Merge branch 'master' into refactor-read-metrics-and-callbacks	2022-05-25 16:27:40 +02:00
Nikolai Kochetov	56feef01e7	Move some resources	2022-05-20 19:49:31 +00:00
Anton Popov	e911900054	remove last mentions of data streams	2022-05-09 19:15:24 +00:00
Amos Bird	4a5e4274f0	base should not depend on Common	2022-04-29 10:26:35 +08:00
Maksim Kita	57444fc7d3	Merge pull request #36444 from rschu1ze/clang-tidy-fixes Clang tidy fixes	2022-04-21 16:11:27 +02:00
Robert Schulze	b24ca8de52	Fix various clang-tidy warnings When I tried to add cool new clang-tidy 14 warnings, I noticed that the current clang-tidy settings already produce a ton of warnings. This commit addresses many of these. Almost all of them were non-critical, i.e. C vs. C++ style casts.	2022-04-20 10:29:05 +02:00
Robert Schulze	118e94523c	Activate clang-tidy warning "readability-container-contains" This check suggests replacing <Container>.count() by <Container>.contains() which is more speaking and in case of multimaps/multisets also faster.	2022-04-18 23:53:11 +02:00
Azat Khuzhin	c4b6342853	Improvements for `parallel_distributed_insert_select` (and related) (#34728) * Add a warning if parallel_distributed_insert_select was ignored Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> * Respect max_distributed_depth for parallel_distributed_insert_select Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> * Print warning for non applied parallel_distributed_insert_select only for initial query Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> * Remove Cluster::getHashOfAddresses() Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> * Forbid parallel_distributed_insert_select for remote()/cluster() with different addresses Before it uses empty cluster name (getClusterName()) which is not correct, compare all addresses instead. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> * Fix max_distributed_depth check max_distributed_depth=1 must mean not more then one distributed query, not two, since max_distributed_depth=0 means no limit, and distribute_depth is 0 for the first query. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> * Fix INSERT INTO remote()/cluster() with parallel_distributed_insert_select Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> * Add a test for parallel_distributed_insert_select with cluster()/remote() Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> * Return <remote> instead of empty cluster name in Distributed engine Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> * Make user with sharding_key and w/o in remote()/cluster() identical Before with sharding_key the user was "default", while w/o it it was empty. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2022-03-08 15:24:39 +01:00
Frank Chen	b4829465d9	Improve the opentelemetry span logs for INSERT on distributed table (#34480)	2022-03-03 12:53:29 +01:00
Hongbin	99bd56e2de	Fix some code comments style	2022-02-28 08:15:37 +08:00
Anton Popov	7e9770dcf0	minor enhancements	2022-02-08 15:57:23 +03:00
Anton Popov	96a506c6fa	fix inserts to distributed tables in case of change of native protocol	2022-01-29 03:23:25 +03:00
Amos Bird	6d62060e16	Build improvement	2022-01-17 22:36:27 +08:00
Raúl Marín	b2cfa70541	Reduce dependencies on ASTFunction.h 481 -> 230	2021-11-26 18:21:54 +01:00
avogar	51831afff8	Fix tests	2021-11-11 20:27:23 +03:00
Nikolai Kochetov	a08c98d760	Move some files.	2021-10-16 17:03:50 +03:00
Nikolai Kochetov	fd14faeae2	Remove DataStreams folder.	2021-10-15 23:18:20 +03:00
Nikolai Kochetov	c6bce1a4cf	Update Native.	2021-10-08 20:21:19 +03:00
Nikolai Kochetov	78e1db209f	Remove more data streams (#29491) * Remove more streams. * Fixing build. * Fixing build. * Rename files. * Fix fast test. * Fix StorageKafka. * Try fix kafka test. * Move createBuffer to KafkaSource ctor. * Revert "Move createBuffer to KafkaSource ctor." This reverts commit `81fa94d27e`. * Revert "Try fix kafka test." This reverts commit `2107e54969`. * Comment some rows in test. Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2021-10-07 11:26:08 +03:00
Vitaly Baranov	8a01b32cba	Merge pull request #28637 from vitlibar/fix-materialized-column-as-sharding-key Fix materialized column as sharding key	2021-10-05 10:53:24 +03:00
Vitaly Baranov	1636ee24bb	Fix using materialized column as sharding key.	2021-10-04 10:56:42 +03:00
Azat Khuzhin	ae5ee2dd28	Move macros for distributed engine into separate header	2021-10-03 14:34:03 +03:00
Azat Khuzhin	6a9dd9828d	Move protocol macros into separate header Defines.h is a very common header, so lots of modules will be recompiled on changes. Move macros for protocol into separate header, this should significantly decreases number of units to compile on it's changes.	2021-10-03 14:34:03 +03:00
Alexey Milovidov	fe6b7c77c7	Rename "common" to "base"	2021-10-02 10:13:14 +03:00
Nikolai Kochetov	341553febd	Fix build.	2021-09-16 20:40:42 +03:00
Nikolai Kochetov	66a76ab70f	Rewrite PushingToViewsBlockOutputStream part 6	2021-09-03 20:29:36 +03:00
Nikolai Kochetov	61d8f880cd	Rename some files.	2021-07-26 19:48:25 +03:00
Nikolai Kochetov	9b5a816b43	Merge branch 'master' into output-streams-to-processors	2021-07-26 18:03:11 +03:00
Nikolai Kochetov	0eb563dc1b	Fix more tests.	2021-07-26 17:47:29 +03:00
Nikolai Kochetov	9c92f43359	Update storages.	2021-07-23 22:33:59 +03:00
Nikolai Kochetov	2dc5c89b66	Update Storage::write	2021-07-23 17:25:35 +03:00
Nikolai Kochetov	3ed3f7a9f7	Fix integration tests.	2021-07-22 13:38:22 +03:00
Nikolai Kochetov	65d3e713d6	Fix another one test.	2021-07-21 15:16:13 +03:00
Nikolai Kochetov	179ec05a72	Remove some streams.	2021-07-20 21:18:43 +03:00
alexey-milovidov	b16e01507f	Merge pull request #26464 from azat/ubsan-dir-mon-fix Fix undefined-behavior in DirectoryMonitor (for exponential back off)	2021-07-17 18:18:42 +03:00
alexey-milovidov	ca37548888	Merge pull request #26430 from azat/fix-dist-msg Fix "While sending batch" (on Distributed async send)	2021-07-17 13:01:36 +03:00
Azat Khuzhin	d2967ffa0b	Fix undefined-behavior in DirectoryMonitor (for exponential back off) UBsan reports [1]: ../src/Storages/Distributed/DirectoryMonitor.cpp:435:54: runtime error: 2.30584e+19 is outside the range of representable values of type 'unsigned long' [1]: https://clickhouse-test-reports.s3.yandex.net/0/10f3500b3be73c9498d994d189784c7d44ed6793/stress_test_(undefined).html#fail1	2021-07-17 12:18:05 +03:00
Azat Khuzhin	80e614318c	Fix "While sending batch" (on Distributed async send)	2021-07-16 22:27:46 +03:00
Azat Khuzhin	a3653bd665	Fix overflow in exponential sleep in DirectoryMonitor UBsan reports: SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../src/Storages/Distributed/DirectoryMonitor.cpp:435:53 in ../src/Storages/Distributed/DirectoryMonitor.cpp:435: runtime error: 1.15292e+19 is outside the range of representable values of type 'long' 0 0x1df0c286 in DB::StorageDistributedDirectoryMonitor::run() obj-x86_64-linux-gnu/../src/Storages/Distributed/DirectoryMonitor.cpp:435:53 It is pretty easy to reproduce by limiting max_server_memory_usage before staring the test.	2021-07-16 04:10:47 +03:00
Azat Khuzhin	f3d3ec44a6	Add ability to set Distributed directory monitor settings via CREATE TABLE	2021-07-16 04:10:47 +03:00
alexey-milovidov	4183f3164a	Merge branch 'master' into fixrandomoneshardinsert	2021-07-13 04:46:40 +03:00
alexey-milovidov	0a26687115	Merge pull request #23864 from azat/dist-split-batch-and-retry Add ability to split distributed batch on failures (i.e. due to memory limits)	2021-06-27 19:28:26 +03:00
Azat Khuzhin	a616ae8861	Improve startup time of Distributed engine. - create directory monitors in parallel (this also includes rmdir in case of directory is empty, since even if the directory is empty it may take some time to remove it, due to waiting for journal or if the directory is large, i.e. it had lots of files before, since remember ext4 does not truncate the directory size on each unlink [1]) - initialize increment in parallel too (since it does readdir()) [1]: https://lore.kernel.org/linux-ext4/930A5754-5CE6-4567-8CF0-62447C97825C@dilger.ca/	2021-06-24 10:27:51 +03:00
Azat Khuzhin	3bd53c68f9	Try to split the batch in case of broken batch too Broken batches may be because of abnormal server shutdown (and lack of fsync), and ignoring the whole batch is not great in this case, so apply the same split logic here too. v2: rename exception v3: catch missing exception v4: fix marking the file as broken multiple times (fixes test_insert_distributed_async_send with setting enabled)	2021-06-23 02:48:47 +03:00
Azat Khuzhin	a0209178cc	Add ability to split distributed batch on failures Add distributed_directory_monitor_split_batch_on_failure setting (OFF by default), that will split the batch and send files one by one in case of retriable errors. v2: more error codes	2021-06-23 02:48:47 +03:00
Azat Khuzhin	e148ef739d	Drop replicas from dirname for internal_replication=true Under use_compact_format_in_distributed_parts_names=1 and internal_replication=true the server encodes all replicas for the directory name for async INSERT into Distributed, and the directory name looks like: shard1_replica1,shard1_replica2,shard3_replica3 This is required for creating connections (to specific replicas only), but in case of internal_replication=true, this can be avoided, since this path will always includes all replicas. This patch replaces all replicas with "_all_replicas" marker. Note, that initial problem was that this path may overflow the NAME_MAX if you will have more then 15 replicas, and the server will fail to create the directory. Also note, that changed directory name should not be a problem, since: - empty directories will be removed since #16729 - and replicas encoded in the directory name is also supported anyway.	2021-06-23 02:47:38 +03:00
Maksim Kita	67e9b85951	Merge ext into common	2021-06-16 23:28:41 +03:00
alexey-milovidov	34d12063f8	Merge pull request #23349 from azat/dist-respect-insert_allow_materialized_columns Respect insert_allow_materialized_columns for INSERT into Distributed()	2021-06-14 07:23:00 +03:00
Azat Khuzhin	2109980284	Respect max_distributed_connections for insert_distributed_sync Otherwise for huge clusters and sync insert it may run out of max_thread_pool_size (default 10K).	2021-06-08 09:11:44 +03:00
Alexey Milovidov	17962459f5	Merge branch 'master' into issue-16775	2021-06-06 02:18:28 +03:00
tavplubix	e9ff0b6d70	Merge pull request #23657 from kssenii/poco-file-to-std-fs Poco::File to std::filesystem	2021-05-31 23:17:02 +03:00
Nikolai Kochetov	afc1fe7f3d	Make ContextPtr const by default.	2021-05-31 17:49:02 +03:00
Alexey Milovidov	273226de32	Remove string parameter for Density	2021-05-24 06:43:25 +03:00
Alexey Milovidov	40d4f0678f	Remove overload (harmful)	2021-05-23 04:25:06 +03:00
Azat Khuzhin	4d737a5481	Respect insert_allow_materialized_columns for INSERT into Distributed()	2021-05-20 07:40:46 +03:00
Azat Khuzhin	c3e65c0d27	Async INSERT into Distributed() does support settings Since #4852	2021-05-20 07:40:46 +03:00
kssenii	ab1a05a1f4	Poco::Path to fs::path, less concatination	2021-05-09 14:59:49 +03:00
kssenii	02288359c5	Less manual concatenation of paths	2021-05-08 13:59:55 +03:00
fibersel	cb53bbb7b0	add experimental codecs flag, add integration test for experimental codecs	2021-05-06 14:57:22 +03:00
kssenii	2dabdd0f73	Merge branch 'master' of github.com:ClickHouse/ClickHouse into poco-file-to-std-fs	2021-05-05 18:42:40 +03:00
Azat Khuzhin	9c6e8e1462	Add BrokenDistributedFilesToInsert new metric Number of files for asynchronous insertion into Distributed tables that has been marked as broken. This metric will starts from 0 on start. Number of files for every shard is summed.	2021-05-04 22:48:07 +03:00
Azat Khuzhin	74269882f7	Add broken_data_files/broken_data_compressed_bytes into distribution_queue	2021-05-04 22:48:07 +03:00
Azat Khuzhin	5e33604c4d	Add file paths into logs on failed distributed async sends	2021-05-03 08:55:38 +03:00
kssenii	ee06936596	Merge branch 'master' of github.com:ClickHouse/ClickHouse into poco-file-to-std-fs	2021-05-01 17:24:31 +03:00
Maksim Kita	1db6eb3666	Merge pull request #23744 from azat/dist-INSERT-preserve-error Preserve errors for INSERT into Distributed	2021-04-29 10:26:34 +03:00
Azat Khuzhin	73ab415c4c	Preserve errors for INSERT into Distributed Before this patch (and after #22208) the INSERT may fail with "Cannot schedule a task" because the pool in DistributedBlockOutputStream already throws exception and simply fail in writeSuffix().	2021-04-28 22:33:29 +03:00
kssenii	deb4903af8	Merge branch 'master' of github.com:ClickHouse/ClickHouse into poco-file-to-std-fs	2021-04-28 20:57:13 +03:00
kssenii	1e4a61ce63	Fix build	2021-04-27 20:22:39 +03:00
kssenii	eeb71672a0	Change in Storages/*	2021-04-27 16:49:37 +03:00
Amos Bird	096d76627e	Skip unavaiable shards when writing to distributed tables	2021-04-21 10:30:40 +08:00
Alexey Milovidov	77e64b3ebd	Merge branch 'master' into protocol-compression-auto	2021-04-17 16:46:51 +03:00
Azat Khuzhin	d2cf03ea41	Change logging from trace to debug for messages with rows/bytes	2021-04-15 21:00:16 +03:00
Amos Bird	bf5b668f85	Fix random_shard_insert issue	2021-04-15 21:12:39 +08:00
Alexey Milovidov	6f56c3280f	Uncompress data in Distributed sends if needed	2021-04-14 00:53:39 +03:00
Ivan	495c6e03aa	Replace all Context references with std::weak_ptr (#22297) * Replace all Context references with std::weak_ptr * Fix shared context captured by value * Fix build * Fix Context with named sessions * Fix copy context * Fix gcc build * Merge with master and fix build * Fix gcc-9 build	2021-04-11 02:33:54 +03:00
Alexander Kuzmenkov	0264124146	Merge pull request #21942 from ucasFL/distributed_depth Add settings max_distributed_depth	2021-04-09 15:52:58 +03:00
Azat Khuzhin	c27b931f6a	Slightly improve logging messages for Distributed async sends - add took time (in ms) - add rows/bytes	2021-04-08 08:10:39 +03:00
Azat Khuzhin	27d4fbd13b	Compare Block itself for distributed async INSERT batches INSERT into Distributed with insert_distributed_sync=1 stores the distributed batches on the disk for sending in background. But types may be a little bit different for the Distributed and it's underlying table, so the initiator need to know whether conversion is required or not. Before this patch those on disk distributed batches contains header, which includes dumpStructure() for the block in that batch, however it checks not only names and types and plus dumpStructure() is a debug method. So instead of storing string representation for the block header we should store empty block in the file header (note, that we cannot store the empty block not in header, since this will require reading all blocks from file, due to some trickery of the readers interface). Note, that this patch also contains tiny refactoring: - s/header/distributed_header/ v1: dumpNamesAndTypes() v2: dump empty block into the batch itself v3: move empty block into the header	2021-04-06 10:05:21 +03:00
feng lv	56073db22d	max distributed depth Add settings max_distributed_depth fix style fix fix fix fix test fix fix	2021-04-05 14:00:54 +00:00
Azat Khuzhin	79ed35876e	DirectoryMonitor: Remove const qualifier and lots of mutable qualifiers	2021-03-03 23:30:24 +03:00
Azat Khuzhin	45ee650e26	Distributed: check for bytes_to_throw/delay_insert only before INSERT Before it was checked for each block.	2021-03-03 23:30:24 +03:00
Azat Khuzhin	6965ac26c3	Distributed: Add ability to delay/throttle INSERT until pending data will be reduced Add two new settings for the Distributed engine: - bytes_to_delay_insert - max_delay_to_insert If at the beginning of INSERT there will be too much pending data, more then bytes_to_delay_insert, then the INSERT will wait until it will be shrinked, and not more then max_delay_to_insert seconds. If after this there will be still too much pending, it will throw an exception. Also new profile events were added (by analogy to the MergeTree): - DistributedDelayedInserts (although you can use system.errors instead of this, but still) - DistributedRejectedInserts - DistributedDelayedInsertsMilliseconds	2021-03-03 23:30:23 +03:00
Azat Khuzhin	15f7459cae	Distributed/DirectoryMonitor: protect metric_pending_files with metrics_lock Since there is local value, that is not atomic, anyway we already have lock for metrics, so it is fine.	2021-03-03 23:30:03 +03:00
Azat Khuzhin	70049db143	CurrentMetrics/Increment: Introduce add()	2021-03-03 23:30:03 +03:00
Azat Khuzhin	017c054a35	Distributed/DirectoryMonitor: Use std::lock_guard over std::unique_lock It is more natural, since we do not need lazy locking.	2021-03-03 23:30:03 +03:00
Azat Khuzhin	fcf49a4914	Distributed: Calculate counters for async INSERT at INSERT time Previous patch fixes the inaccuracy, but it's done using iterating over directory on each request (to system.distribution_queue or to check bytes_to_throw_insert), and like previous patch alredy stated, it may have pretty huge overhead (especially when you have lots of distributed files pending). This patch remove that recalculation (but it will still be done, and if there is different, there will be a log message), and replace it with proper account at INSERT time (and after file has been sent, or marked as broken).	2021-03-03 23:30:03 +03:00
Azat Khuzhin	b43046ba06	Distributed: More accurate distribution_queue counters So now system.distribution_queue will show accurate statistics, so tests does not requires sleep anymore. But note that with too much distributed pending this will iterate over all directories.	2021-03-03 23:30:03 +03:00
Azat Khuzhin	b5a5778589	Distributed: Add ability to limit amount of pending bytes for async INSERT Right now with distributed_directory_monitor_batch_inserts=1 and insert_distributed_sync=0 INSERT into Distributed table will store blocks that should be sent to remote (and in case of prefer_localhost_replica=0 to the localhost too) on the local filesystem, and sent it in background. However there is no limit for this storage, and if the remote is unavailable (or some other error), these pending blocks may take significant space, and this is not always desired behaviour. Add new Distributed setting - bytes_to_throw_insert, that will set the limit for how much pending bytes is allowed, if the limit will be reached an exception will be throw. By default was set to 0, to avoid surprises.	2021-03-03 23:30:00 +03:00
Azat Khuzhin	02198d091e	Add proper checks while parsing directory names for async INSERT (fixes SIGSEGV)	2021-02-15 10:53:41 +03:00
Kruglov Pavel	d94e8624d7	Merge branch 'master' into shard-id	2021-02-06 16:48:17 +03:00
feng lv	0edf65c094	fix test fix update fix spell	2021-02-02 09:22:30 +00:00
feng lv	4279c7da41	add setting insert_shard_id add test fix style fix	2021-02-02 04:26:59 +00:00
Anton Popov	c7070da85a	better abstractions in disk interface	2021-01-26 17:49:35 +03:00

247 Commits