Commit Graph

67 Commits

Author SHA1 Message Date
Azat Khuzhin
6965ac26c3 Distributed: Add ability to delay/throttle INSERT until pending data will be reduced
Add two new settings for the Distributed engine:
- bytes_to_delay_insert
- max_delay_to_insert

If at the beginning of INSERT there will be too much pending data, more
then bytes_to_delay_insert, then the INSERT will wait until it will be
shrinked, and not more then max_delay_to_insert seconds.

If after this there will be still too much pending, it will throw an
exception.

Also new profile events were added (by analogy to the MergeTree):
- DistributedDelayedInserts (although you can use system.errors instead
  of this, but still)
- DistributedRejectedInserts
- DistributedDelayedInsertsMilliseconds
2021-03-03 23:30:23 +03:00
Azat Khuzhin
fcf49a4914 Distributed: Calculate counters for async INSERT at INSERT time
Previous patch fixes the inaccuracy, but it's done using iterating over
directory on each request (to system.distribution_queue or to check
bytes_to_throw_insert), and like previous patch alredy stated, it may
have pretty huge overhead (especially when you have lots of distributed
files pending).

This patch remove that recalculation (but it will still be done, and
if there is different, there will be a log message), and replace it with
proper account at INSERT time (and after file has been sent, or marked
as broken).
2021-03-03 23:30:03 +03:00
Azat Khuzhin
b5a5778589 Distributed: Add ability to limit amount of pending bytes for async INSERT
Right now with distributed_directory_monitor_batch_inserts=1 and
insert_distributed_sync=0 INSERT into Distributed table will store
blocks that should be sent to remote (and in case of
prefer_localhost_replica=0 to the localhost too) on the local
filesystem, and sent it in background.

However there is no limit for this storage, and if the remote is
unavailable (or some other error), these pending blocks may take
significant space, and this is not always desired behaviour.

Add new Distributed setting - bytes_to_throw_insert, that will set the
limit for how much pending bytes is allowed, if the limit will be
reached an exception will be throw.

By default was set to 0, to avoid surprises.
2021-03-03 23:30:00 +03:00
Kruglov Pavel
d94e8624d7
Merge branch 'master' into shard-id 2021-02-06 16:48:17 +03:00
feng lv
0edf65c094 fix test
fix

update

fix spell
2021-02-02 09:22:30 +00:00
feng lv
4279c7da41 add setting insert_shard_id
add test

fix style

fix
2021-02-02 04:26:59 +00:00
Anton Popov
c7070da85a better abstractions in disk interface 2021-01-26 17:49:35 +03:00
Azat Khuzhin
a6631287a7 DistributedBlockOutputStream: add more comments 2021-01-17 12:50:37 +03:00
Azat Khuzhin
b725e1d131 DistributedBlockOutputStream: Remove superfluous brackets for string construction 2021-01-17 12:48:51 +03:00
Azat Khuzhin
2955e25e83 Fix inserted blocks accounting for insert_distributed_one_random_shard=1
It is tricky due to block splitting

Refs: https://github.com/ClickHouse/ClickHouse/pull/18294
2021-01-17 12:45:42 +03:00
Azat Khuzhin
858f07c796 Update comment for query AST cloning during inesrt into multiple local shards
Refs: https://github.com/ClickHouse/ClickHouse/pull/18264#discussion_r558839456
2021-01-17 12:40:41 +03:00
alexey-milovidov
a5a19de878
Update DistributedBlockOutputStream.cpp 2021-01-16 13:22:25 +03:00
feng lv
9829c09720 fix 2021-01-15 15:54:35 +00:00
Azat Khuzhin
ecae6c1c60 Avoid reading the distributed batch just to read the block header
Before this patch batched mode of the DirectoryMonitor is 2x slower then
non-batched, after it should be more or less the same as non-batched.
2021-01-14 22:38:46 +03:00
Azat Khuzhin
819b9d7d56 Add more metadata into distributed .bin files to avoid doing the same on sending
Before this patch StorageDistributedDirectoryMonitor reading .bin files
in batch mode, just to calculate number of bytes/rows, this is very
ineffective, let's just store them in the header (rows/bytes).
2021-01-10 18:17:15 +03:00
Azat Khuzhin
471deab63a Rename fsync_tmp_directory to fsync_directories for Distributed engine 2021-01-09 17:51:30 +03:00
Azat Khuzhin
2e55bd2285 Accept IDisk in DirectoryMonitor (for further fsync) 2021-01-09 16:31:42 +03:00
Azat Khuzhin
fbe5df809b Sync other temporary directories for Distributed fsync_tmp_directories 2021-01-09 11:36:04 +03:00
Azat Khuzhin
b5ace27014 Add fsync support for Distributed engine.
Two new settings (by analogy with MergeTree family) has been added:

- `fsync_after_insert` - Do fsync for every inserted. Will decreases
  performance of inserts.

- `fsync_tmp_directory` - Do fsync for temporary directory (that is used
  for async INSERT only) after all part operations (writes, renames,
  etc.).

Refs: #17380 (p1)
2021-01-09 11:31:32 +03:00
alexey-milovidov
417e685830
Merge pull request #18775 from ClickHouse/dont-insert-empty-blocks-distributed-sync
Do not insert empty blocks on sync Distributed INSERT
2021-01-06 20:06:35 +03:00
Alexey Milovidov
1572d2122b Respect network_compression_method in async INSERT into Distributed table 2021-01-06 03:41:34 +03:00
Alexey Milovidov
190402b7d5 Do not insert empty blocks on sync Distributed INSERT 2021-01-06 02:54:22 +03:00
Amos Bird
6fc225e676
Distributed insertion to one random shard (#18294)
* Distributed insertion to one random shard

* add some tests

* add some documentation

* Respect shards' weights

* fine locking

Co-authored-by: Ivan Lezhankin <ilezhankin@yandex-team.ru>
2020-12-23 19:04:05 +03:00
Azat Khuzhin
5b3ab48861 More forward declaration for generic headers
The following headers are pretty generic, so use forward declaration as
much as possible:
- Context.h
- Settings.h
- ConnectionTimeouts.h
(Also this shows that some missing some includes -- this has been fixed)

And split ConnectionTimeouts.h into ConnectionTimeoutsContext.h (since
module part cannot be added for it, due to recursive build dependencies
that will be introduced)

Also remove Settings from the RemoteBlockInputStream/RemoteQueryExecutor
and just pass the context, since settings was passed only in speicifc
places, that can allow making a copy of Context (i.e. Copier).

Approx results (How much units will be recompiled after changing file X?):

- ConnectionTimeouts.h
  - mainline: 100

- Context.h:
  - mainline: ~800
  - patched:  415

- Settings.h:
  - mainline: 900-1K
  - patched:  440 (most of them because of the Context.h)
2020-12-12 17:43:10 +03:00
Alexander Tokmakov
bfbf150c67 fix segfault when 'not enough space' 2020-12-02 17:49:43 +03:00
Alexander Tokmakov
b94cc5c4e5 remove more stringstreams 2020-11-10 21:22:26 +03:00
alexey-milovidov
f39457bc77
Merge pull request #16788 from azat/fix-use_compact_format_in_distributed_parts_names
Apply use_compact_format_in_distributed_parts_names for each INSERT (with internal_replication)
2020-11-08 23:23:10 +03:00
Azat Khuzhin
04db0834bf Apply use_compact_format_in_distributed_parts_names for each INSERT (with internal_replication)
Before this patch use_compact_format_in_distributed_parts_names was
applied only from default profile (at server start) for
internal_replication=1, and was ignored on INSERT.
2020-11-08 03:05:52 +03:00
Alexey Milovidov
fd84d16387 Fix "server failed to start" error 2020-11-07 03:14:53 +03:00
Alexander Kuzmenkov
fb64cf210a straighten the protocol version 2020-09-17 17:37:29 +03:00
Vitaly Baranov
56665a15f7 Rework and rename the template class SettingsCollection => BaseSettings. 2020-07-31 20:54:18 +03:00
Vladimir Chebotarev
faedb04722 Minor fixes. 2020-07-28 19:45:46 +03:00
Vladimir Chebotarev
1b3f5c99f5 Real fix of test. 2020-07-26 21:27:36 +03:00
Alexey Milovidov
73a5c38398 Fix potential overflow in integer division #12119 2020-07-05 03:29:03 +03:00
alesapin
1ddeb3d149 Buildable getSampleBlock in StorageInMemoryMetadata 2020-06-16 18:51:29 +03:00
Azat Khuzhin
d2383f0f5d Fix async INSERT into Distributed for prefer_localhost_replica=0 and w/o internal_replication 2020-06-08 21:58:56 +03:00
Alexey Milovidov
25f941020b Remove namespace pollution 2020-05-31 00:57:37 +03:00
Alexey Milovidov
7e1813825b Return old names of macros 2020-05-24 01:24:01 +03:00
Alexey Milovidov
ce0619dabf Progress on task 2020-05-24 00:26:45 +03:00
Alexey Milovidov
2d7d5a1547 Apply all transformations again 2020-05-24 00:16:27 +03:00
Alexey Milovidov
bab24879e9 Progress on task 2020-05-24 00:16:05 +03:00
Alexey Milovidov
eacff92d0e Progress on task 2020-05-23 22:35:08 +03:00
Alexey Milovidov
a2ad11897f Remove duplicate whitespaces (preparation) 2020-05-23 21:53:58 +03:00
Alexey Milovidov
1f13515a65 Make all LOG in single line (preparation) 2020-05-23 21:31:37 +03:00
Alexey Milovidov
8042e5febe find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+" << [^<]+ << "[^"]+" << [^<]+\);' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+), "([^"]+)" << ([^<]+) << "([^"]+)" << ([^<]+)\);/\1_FORMATTED(\2, "\3{}\5{}", \4, \6);/' 2020-05-23 19:58:15 +03:00
Azat Khuzhin
d93b9a57f6 Forward declaration for Context as much as possible.
Now after changing Context.h 488 modules will be recompiled instead of 582.
2020-05-21 01:53:18 +03:00
alexey-milovidov
7cf3538840
Merge pull request #10270 from ClickHouse/quota-key-in-client
Support quota_key for Native client
2020-05-17 14:09:40 +03:00
alexey-milovidov
7ee35f102d
Merge pull request #10867 from azat/dist-INSERT-load-balancing
Respect prefer_localhost_replica/load_balancing on INSERT into Distributed
2020-05-17 11:11:35 +03:00
Alexey Milovidov
397859ccb8 Fix error 2020-05-17 08:45:20 +03:00
Azat Khuzhin
2498041cc1 Avoid sending partially written files by the DistributedBlockOutputStream 2020-05-16 01:00:42 +03:00