Commit Graph

261 Commits

Author SHA1 Message Date
Frank Chen
2067096035 Optimize span log for SYNC insert 2022-09-06 16:06:30 +08:00
Frank Chen
8365e7bfac Remove extra attribute 2022-09-06 15:41:21 +08:00
Frank Chen
6ab1549d6c Update writeToLocal to record related info 2022-09-05 16:40:48 +08:00
Frank Chen
a17bc51d5b Save cluster/distributed/table to log 2022-09-05 16:39:47 +08:00
Frank Chen
3d65e3f2ee Add cluster/distributed/remote to file 2022-09-05 16:37:55 +08:00
Frank Chen
92f7ca3616 Move TracingContextOnThread::current() out of class for simplicity
Signed-off-by: Frank Chen <frank.chen021@outlook.com>
2022-08-25 20:23:56 +08:00
Frank Chen
bb00dcc19b Remove using namespace from header
Signed-off-by: Frank Chen <frank.chen021@outlook.com>
2022-08-25 20:20:13 +08:00
Frank Chen
99c37ce6c6
Merge branch 'master' into tracing_context_propagation 2022-08-25 10:07:16 +08:00
Frank Chen
cd19366b44 Move classes into DB::OpenTelemetry namespace 2022-08-24 16:41:40 +08:00
Frank Chen
efc6a60a60 Clean code 2022-08-24 15:59:44 +08:00
Robert Schulze
77e64935e1
Reduce some usage of StringRef 2022-08-19 09:56:59 +00:00
Frank Chen
a3b6ad2a65
Merge branch 'master' into tracing_context_propagation 2022-08-18 20:59:07 +08:00
Yakov Olkhovskiy
2e34b384c1 update tcp protocol, add quota_key 2022-08-03 15:44:08 -04:00
Frank Chen
8eb254c0c8 Fix merge problem 2022-08-02 10:23:51 +08:00
Frank Chen
40c6e4c0d6 Merge remote-tracking branch 'origin/master' into tracing_context_propagation 2022-08-02 10:02:09 +08:00
Alexey Milovidov
8fb70abe3e
Merge pull request #39178 from azat/dist-insert-log
Add connection info for Distributed sends log message
2022-07-31 02:22:22 +03:00
Robert Schulze
4333750985
Less usage of StringRef
... replaced by std::string_view, see #39262
2022-07-24 18:33:52 +00:00
Robert Schulze
81ef1099cc
Even less usage of StringRef
--> see #39300
2022-07-19 07:01:06 +00:00
Azat Khuzhin
4f25a08b7c Add connection info for Distributed sends log message
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-07-13 16:04:12 +03:00
Frank Chen
da57a993e4 Fix CI 2022-07-09 13:43:10 +08:00
Frank Chen
d3d89f59ca Add tracing support to distributed insert 2022-07-07 17:43:09 +08:00
Nikolai Kochetov
1b85f2c1d6 Merge branch 'master' into refactor-read-metrics-and-callbacks 2022-05-25 16:27:40 +02:00
Nikolai Kochetov
56feef01e7 Move some resources 2022-05-20 19:49:31 +00:00
Anton Popov
e911900054 remove last mentions of data streams 2022-05-09 19:15:24 +00:00
Amos Bird
4a5e4274f0
base should not depend on Common 2022-04-29 10:26:35 +08:00
Maksim Kita
57444fc7d3
Merge pull request #36444 from rschu1ze/clang-tidy-fixes
Clang tidy fixes
2022-04-21 16:11:27 +02:00
Robert Schulze
b24ca8de52
Fix various clang-tidy warnings
When I tried to add cool new clang-tidy 14 warnings, I noticed that the
current clang-tidy settings already produce a ton of warnings. This
commit addresses many of these. Almost all of them were non-critical,
i.e. C vs. C++ style casts.
2022-04-20 10:29:05 +02:00
Robert Schulze
118e94523c
Activate clang-tidy warning "readability-container-contains"
This check suggests replacing <Container>.count() by
<Container>.contains() which is more speaking and in case of
multimaps/multisets also faster.
2022-04-18 23:53:11 +02:00
Azat Khuzhin
c4b6342853
Improvements for parallel_distributed_insert_select (and related) (#34728)
* Add a warning if parallel_distributed_insert_select was ignored

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Respect max_distributed_depth for parallel_distributed_insert_select

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Print warning for non applied parallel_distributed_insert_select only for initial query

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Remove Cluster::getHashOfAddresses()

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Forbid parallel_distributed_insert_select for remote()/cluster() with different addresses

Before it uses empty cluster name (getClusterName()) which is not
correct, compare all addresses instead.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Fix max_distributed_depth check

max_distributed_depth=1 must mean not more then one distributed query,
not two, since max_distributed_depth=0 means no limit, and
distribute_depth is 0 for the first query.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Fix INSERT INTO remote()/cluster() with parallel_distributed_insert_select

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Add a test for parallel_distributed_insert_select with cluster()/remote()

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Return <remote> instead of empty cluster name in Distributed engine

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Make user with sharding_key and w/o in remote()/cluster() identical

Before with sharding_key the user was "default", while w/o it it was
empty.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-03-08 15:24:39 +01:00
Frank Chen
b4829465d9
Improve the opentelemetry span logs for INSERT on distributed table (#34480) 2022-03-03 12:53:29 +01:00
Hongbin
99bd56e2de
Fix some code comments style 2022-02-28 08:15:37 +08:00
Anton Popov
7e9770dcf0 minor enhancements 2022-02-08 15:57:23 +03:00
Anton Popov
96a506c6fa fix inserts to distributed tables in case of change of native protocol 2022-01-29 03:23:25 +03:00
Amos Bird
6d62060e16
Build improvement 2022-01-17 22:36:27 +08:00
Raúl Marín
b2cfa70541 Reduce dependencies on ASTFunction.h
481 -> 230
2021-11-26 18:21:54 +01:00
avogar
51831afff8 Fix tests 2021-11-11 20:27:23 +03:00
Nikolai Kochetov
a08c98d760 Move some files. 2021-10-16 17:03:50 +03:00
Nikolai Kochetov
fd14faeae2 Remove DataStreams folder. 2021-10-15 23:18:20 +03:00
Nikolai Kochetov
c6bce1a4cf Update Native. 2021-10-08 20:21:19 +03:00
Nikolai Kochetov
78e1db209f
Remove more data streams (#29491)
* Remove more streams.

* Fixing build.

* Fixing build.

* Rename files.

* Fix fast test.

* Fix StorageKafka.

* Try fix kafka test.

* Move createBuffer to KafkaSource ctor.

* Revert "Move createBuffer to KafkaSource ctor."

This reverts commit 81fa94d27e.

* Revert "Try fix kafka test."

This reverts commit 2107e54969.

* Comment some rows in test.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-10-07 11:26:08 +03:00
Vitaly Baranov
8a01b32cba
Merge pull request #28637 from vitlibar/fix-materialized-column-as-sharding-key
Fix materialized column as sharding key
2021-10-05 10:53:24 +03:00
Vitaly Baranov
1636ee24bb Fix using materialized column as sharding key. 2021-10-04 10:56:42 +03:00
Azat Khuzhin
ae5ee2dd28 Move macros for distributed engine into separate header 2021-10-03 14:34:03 +03:00
Azat Khuzhin
6a9dd9828d Move protocol macros into separate header
Defines.h is a very common header, so lots of modules will be recompiled
on changes.
Move macros for protocol into separate header, this should significantly
decreases number of units to compile on it's changes.
2021-10-03 14:34:03 +03:00
Alexey Milovidov
fe6b7c77c7 Rename "common" to "base" 2021-10-02 10:13:14 +03:00
Nikolai Kochetov
341553febd Fix build. 2021-09-16 20:40:42 +03:00
Nikolai Kochetov
66a76ab70f Rewrite PushingToViewsBlockOutputStream part 6 2021-09-03 20:29:36 +03:00
Nikolai Kochetov
61d8f880cd Rename some files. 2021-07-26 19:48:25 +03:00
Nikolai Kochetov
9b5a816b43 Merge branch 'master' into output-streams-to-processors 2021-07-26 18:03:11 +03:00
Nikolai Kochetov
0eb563dc1b Fix more tests. 2021-07-26 17:47:29 +03:00
Nikolai Kochetov
9c92f43359 Update storages. 2021-07-23 22:33:59 +03:00
Nikolai Kochetov
2dc5c89b66 Update Storage::write 2021-07-23 17:25:35 +03:00
Nikolai Kochetov
3ed3f7a9f7 Fix integration tests. 2021-07-22 13:38:22 +03:00
Nikolai Kochetov
65d3e713d6 Fix another one test. 2021-07-21 15:16:13 +03:00
Nikolai Kochetov
179ec05a72 Remove some streams. 2021-07-20 21:18:43 +03:00
alexey-milovidov
b16e01507f
Merge pull request #26464 from azat/ubsan-dir-mon-fix
Fix undefined-behavior in DirectoryMonitor (for exponential back off)
2021-07-17 18:18:42 +03:00
alexey-milovidov
ca37548888
Merge pull request #26430 from azat/fix-dist-msg
Fix "While sending batch" (on Distributed async send)
2021-07-17 13:01:36 +03:00
Azat Khuzhin
d2967ffa0b Fix undefined-behavior in DirectoryMonitor (for exponential back off)
UBsan reports [1]:

    ../src/Storages/Distributed/DirectoryMonitor.cpp:435:54: runtime error: 2.30584e+19 is outside the range of representable values of type 'unsigned long'

  [1]: https://clickhouse-test-reports.s3.yandex.net/0/10f3500b3be73c9498d994d189784c7d44ed6793/stress_test_(undefined).html#fail1
2021-07-17 12:18:05 +03:00
Azat Khuzhin
80e614318c Fix "While sending batch" (on Distributed async send) 2021-07-16 22:27:46 +03:00
Azat Khuzhin
a3653bd665 Fix overflow in exponential sleep in DirectoryMonitor
UBsan reports:

    SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../src/Storages/Distributed/DirectoryMonitor.cpp:435:53 in
    ../src/Storages/Distributed/DirectoryMonitor.cpp:435: runtime error: 1.15292e+19 is outside the range of representable values of type 'long'
        0 0x1df0c286 in DB::StorageDistributedDirectoryMonitor::run() obj-x86_64-linux-gnu/../src/Storages/Distributed/DirectoryMonitor.cpp:435:53

It is pretty easy to reproduce by limiting max_server_memory_usage
before staring the test.
2021-07-16 04:10:47 +03:00
Azat Khuzhin
f3d3ec44a6 Add ability to set Distributed directory monitor settings via CREATE TABLE 2021-07-16 04:10:47 +03:00
alexey-milovidov
4183f3164a
Merge branch 'master' into fixrandomoneshardinsert 2021-07-13 04:46:40 +03:00
alexey-milovidov
0a26687115
Merge pull request #23864 from azat/dist-split-batch-and-retry
Add ability to split distributed batch on failures (i.e. due to memory limits)
2021-06-27 19:28:26 +03:00
Azat Khuzhin
a616ae8861 Improve startup time of Distributed engine.
- create directory monitors in parallel (this also includes rmdir in
  case of directory is empty, since even if the directory is empty it
  may take some time to remove it, due to waiting for journal or if the
  directory is large, i.e. it had lots of files before, since remember
  ext4 does not truncate the directory size on each unlink [1])
- initialize increment in parallel too (since it does readdir())

  [1]: https://lore.kernel.org/linux-ext4/930A5754-5CE6-4567-8CF0-62447C97825C@dilger.ca/
2021-06-24 10:27:51 +03:00
Azat Khuzhin
3bd53c68f9 Try to split the batch in case of broken batch too
Broken batches may be because of abnormal server shutdown (and lack of
fsync), and ignoring the whole batch is not great in this case, so apply
the same split logic here too.

v2: rename exception
v3: catch missing exception
v4: fix marking the file as broken multiple times (fixes
test_insert_distributed_async_send with setting enabled)
2021-06-23 02:48:47 +03:00
Azat Khuzhin
a0209178cc Add ability to split distributed batch on failures
Add distributed_directory_monitor_split_batch_on_failure setting (OFF by
default), that will split the batch and send files one by one in case of
retriable errors.

v2: more error codes
2021-06-23 02:48:47 +03:00
Azat Khuzhin
e148ef739d Drop replicas from dirname for internal_replication=true
Under use_compact_format_in_distributed_parts_names=1 and
internal_replication=true the server encodes all replicas for the
directory name for async INSERT into Distributed, and the directory name
looks like:

    shard1_replica1,shard1_replica2,shard3_replica3

This is required for creating connections (to specific replicas only),
but in case of internal_replication=true, this can be avoided, since
this path will always includes all replicas.

This patch replaces all replicas with "_all_replicas" marker.

Note, that initial problem was that this path may overflow the NAME_MAX
if you will have more then 15 replicas, and the server will fail to
create the directory.

Also note, that changed directory name should not be a problem, since:
- empty directories will be removed since #16729
- and replicas encoded in the directory name is also supported anyway.
2021-06-23 02:47:38 +03:00
Maksim Kita
67e9b85951 Merge ext into common 2021-06-16 23:28:41 +03:00
alexey-milovidov
34d12063f8
Merge pull request #23349 from azat/dist-respect-insert_allow_materialized_columns
Respect insert_allow_materialized_columns for INSERT into Distributed()
2021-06-14 07:23:00 +03:00
Azat Khuzhin
2109980284 Respect max_distributed_connections for insert_distributed_sync
Otherwise for huge clusters and sync insert it may run out of
max_thread_pool_size (default 10K).
2021-06-08 09:11:44 +03:00
Alexey Milovidov
17962459f5 Merge branch 'master' into issue-16775 2021-06-06 02:18:28 +03:00
tavplubix
e9ff0b6d70
Merge pull request #23657 from kssenii/poco-file-to-std-fs
Poco::File to std::filesystem
2021-05-31 23:17:02 +03:00
Nikolai Kochetov
afc1fe7f3d Make ContextPtr const by default. 2021-05-31 17:49:02 +03:00
Alexey Milovidov
273226de32 Remove string parameter for Density 2021-05-24 06:43:25 +03:00
Alexey Milovidov
40d4f0678f Remove overload (harmful) 2021-05-23 04:25:06 +03:00
Azat Khuzhin
4d737a5481 Respect insert_allow_materialized_columns for INSERT into Distributed() 2021-05-20 07:40:46 +03:00
Azat Khuzhin
c3e65c0d27 Async INSERT into Distributed() does support settings
Since #4852
2021-05-20 07:40:46 +03:00
kssenii
ab1a05a1f4 Poco::Path to fs::path, less concatination 2021-05-09 14:59:49 +03:00
kssenii
02288359c5 Less manual concatenation of paths 2021-05-08 13:59:55 +03:00
fibersel
cb53bbb7b0 add experimental codecs flag, add integration test for experimental codecs 2021-05-06 14:57:22 +03:00
kssenii
2dabdd0f73 Merge branch 'master' of github.com:ClickHouse/ClickHouse into poco-file-to-std-fs 2021-05-05 18:42:40 +03:00
Azat Khuzhin
9c6e8e1462 Add BrokenDistributedFilesToInsert new metric
Number of files for asynchronous insertion into Distributed tables that
has been marked as broken. This metric will starts from 0 on start.
Number of files for every shard is summed.
2021-05-04 22:48:07 +03:00
Azat Khuzhin
74269882f7 Add broken_data_files/broken_data_compressed_bytes into distribution_queue 2021-05-04 22:48:07 +03:00
Azat Khuzhin
5e33604c4d Add file paths into logs on failed distributed async sends 2021-05-03 08:55:38 +03:00
kssenii
ee06936596 Merge branch 'master' of github.com:ClickHouse/ClickHouse into poco-file-to-std-fs 2021-05-01 17:24:31 +03:00
Maksim Kita
1db6eb3666
Merge pull request #23744 from azat/dist-INSERT-preserve-error
Preserve errors for INSERT into Distributed
2021-04-29 10:26:34 +03:00
Azat Khuzhin
73ab415c4c Preserve errors for INSERT into Distributed
Before this patch (and after #22208) the INSERT may fail with "Cannot
schedule a task" because the pool in DistributedBlockOutputStream
already throws exception and simply fail in writeSuffix().
2021-04-28 22:33:29 +03:00
kssenii
deb4903af8 Merge branch 'master' of github.com:ClickHouse/ClickHouse into poco-file-to-std-fs 2021-04-28 20:57:13 +03:00
kssenii
1e4a61ce63 Fix build 2021-04-27 20:22:39 +03:00
kssenii
eeb71672a0 Change in Storages/* 2021-04-27 16:49:37 +03:00
Amos Bird
096d76627e
Skip unavaiable shards when writing to distributed tables 2021-04-21 10:30:40 +08:00
Alexey Milovidov
77e64b3ebd Merge branch 'master' into protocol-compression-auto 2021-04-17 16:46:51 +03:00
Azat Khuzhin
d2cf03ea41 Change logging from trace to debug for messages with rows/bytes 2021-04-15 21:00:16 +03:00
Amos Bird
bf5b668f85
Fix random_shard_insert issue 2021-04-15 21:12:39 +08:00
Alexey Milovidov
6f56c3280f Uncompress data in Distributed sends if needed 2021-04-14 00:53:39 +03:00
Ivan
495c6e03aa
Replace all Context references with std::weak_ptr (#22297)
* Replace all Context references with std::weak_ptr

* Fix shared context captured by value

* Fix build

* Fix Context with named sessions

* Fix copy context

* Fix gcc build

* Merge with master and fix build

* Fix gcc-9 build
2021-04-11 02:33:54 +03:00
Alexander Kuzmenkov
0264124146
Merge pull request #21942 from ucasFL/distributed_depth
Add settings max_distributed_depth
2021-04-09 15:52:58 +03:00
Azat Khuzhin
c27b931f6a Slightly improve logging messages for Distributed async sends
- add took time (in ms)
- add rows/bytes
2021-04-08 08:10:39 +03:00
Azat Khuzhin
27d4fbd13b Compare Block itself for distributed async INSERT batches
INSERT into Distributed with insert_distributed_sync=1 stores the
distributed batches on the disk for sending in background.

But types may be a little bit different for the Distributed and it's
underlying table, so the initiator need to know whether conversion is
required or not.

Before this patch those on disk distributed batches contains header,
which includes dumpStructure() for the block in that batch, however it
checks not only names and types and plus dumpStructure() is a debug
method.

So instead of storing string representation for the block header we
should store empty block in the file header (note, that we cannot store
the empty block not in header, since this will require reading all
blocks from file, due to some trickery of the readers interface).

Note, that this patch also contains tiny refactoring:
- s/header/distributed_header/

v1: dumpNamesAndTypes()
v2: dump empty block into the batch itself
v3: move empty block into the header
2021-04-06 10:05:21 +03:00
feng lv
56073db22d max distributed depth
Add settings max_distributed_depth

fix style

fix

fix

fix

fix test

fix

fix
2021-04-05 14:00:54 +00:00