Commit Graph

194 Commits

Author SHA1 Message Date
Alexey Milovidov
8fb70abe3e
Merge pull request #39178 from azat/dist-insert-log
Add connection info for Distributed sends log message
2022-07-31 02:22:22 +03:00
Robert Schulze
4333750985
Less usage of StringRef
... replaced by std::string_view, see #39262
2022-07-24 18:33:52 +00:00
Robert Schulze
81ef1099cc
Even less usage of StringRef
--> see #39300
2022-07-19 07:01:06 +00:00
Azat Khuzhin
4f25a08b7c Add connection info for Distributed sends log message
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-07-13 16:04:12 +03:00
Nikolai Kochetov
1b85f2c1d6 Merge branch 'master' into refactor-read-metrics-and-callbacks 2022-05-25 16:27:40 +02:00
Nikolai Kochetov
56feef01e7 Move some resources 2022-05-20 19:49:31 +00:00
Anton Popov
e911900054 remove last mentions of data streams 2022-05-09 19:15:24 +00:00
Amos Bird
4a5e4274f0
base should not depend on Common 2022-04-29 10:26:35 +08:00
Maksim Kita
57444fc7d3
Merge pull request #36444 from rschu1ze/clang-tidy-fixes
Clang tidy fixes
2022-04-21 16:11:27 +02:00
Robert Schulze
b24ca8de52
Fix various clang-tidy warnings
When I tried to add cool new clang-tidy 14 warnings, I noticed that the
current clang-tidy settings already produce a ton of warnings. This
commit addresses many of these. Almost all of them were non-critical,
i.e. C vs. C++ style casts.
2022-04-20 10:29:05 +02:00
Robert Schulze
118e94523c
Activate clang-tidy warning "readability-container-contains"
This check suggests replacing <Container>.count() by
<Container>.contains() which is more speaking and in case of
multimaps/multisets also faster.
2022-04-18 23:53:11 +02:00
Azat Khuzhin
c4b6342853
Improvements for parallel_distributed_insert_select (and related) (#34728)
* Add a warning if parallel_distributed_insert_select was ignored

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Respect max_distributed_depth for parallel_distributed_insert_select

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Print warning for non applied parallel_distributed_insert_select only for initial query

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Remove Cluster::getHashOfAddresses()

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Forbid parallel_distributed_insert_select for remote()/cluster() with different addresses

Before it uses empty cluster name (getClusterName()) which is not
correct, compare all addresses instead.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Fix max_distributed_depth check

max_distributed_depth=1 must mean not more then one distributed query,
not two, since max_distributed_depth=0 means no limit, and
distribute_depth is 0 for the first query.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Fix INSERT INTO remote()/cluster() with parallel_distributed_insert_select

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Add a test for parallel_distributed_insert_select with cluster()/remote()

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Return <remote> instead of empty cluster name in Distributed engine

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Make user with sharding_key and w/o in remote()/cluster() identical

Before with sharding_key the user was "default", while w/o it it was
empty.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-03-08 15:24:39 +01:00
Frank Chen
b4829465d9
Improve the opentelemetry span logs for INSERT on distributed table (#34480) 2022-03-03 12:53:29 +01:00
Hongbin
99bd56e2de
Fix some code comments style 2022-02-28 08:15:37 +08:00
Anton Popov
7e9770dcf0 minor enhancements 2022-02-08 15:57:23 +03:00
Anton Popov
96a506c6fa fix inserts to distributed tables in case of change of native protocol 2022-01-29 03:23:25 +03:00
Amos Bird
6d62060e16
Build improvement 2022-01-17 22:36:27 +08:00
Raúl Marín
b2cfa70541 Reduce dependencies on ASTFunction.h
481 -> 230
2021-11-26 18:21:54 +01:00
avogar
51831afff8 Fix tests 2021-11-11 20:27:23 +03:00
Nikolai Kochetov
a08c98d760 Move some files. 2021-10-16 17:03:50 +03:00
Nikolai Kochetov
fd14faeae2 Remove DataStreams folder. 2021-10-15 23:18:20 +03:00
Nikolai Kochetov
c6bce1a4cf Update Native. 2021-10-08 20:21:19 +03:00
Nikolai Kochetov
78e1db209f
Remove more data streams (#29491)
* Remove more streams.

* Fixing build.

* Fixing build.

* Rename files.

* Fix fast test.

* Fix StorageKafka.

* Try fix kafka test.

* Move createBuffer to KafkaSource ctor.

* Revert "Move createBuffer to KafkaSource ctor."

This reverts commit 81fa94d27e.

* Revert "Try fix kafka test."

This reverts commit 2107e54969.

* Comment some rows in test.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-10-07 11:26:08 +03:00
Vitaly Baranov
8a01b32cba
Merge pull request #28637 from vitlibar/fix-materialized-column-as-sharding-key
Fix materialized column as sharding key
2021-10-05 10:53:24 +03:00
Vitaly Baranov
1636ee24bb Fix using materialized column as sharding key. 2021-10-04 10:56:42 +03:00
Azat Khuzhin
ae5ee2dd28 Move macros for distributed engine into separate header 2021-10-03 14:34:03 +03:00
Azat Khuzhin
6a9dd9828d Move protocol macros into separate header
Defines.h is a very common header, so lots of modules will be recompiled
on changes.
Move macros for protocol into separate header, this should significantly
decreases number of units to compile on it's changes.
2021-10-03 14:34:03 +03:00
Alexey Milovidov
fe6b7c77c7 Rename "common" to "base" 2021-10-02 10:13:14 +03:00
Nikolai Kochetov
341553febd Fix build. 2021-09-16 20:40:42 +03:00
Nikolai Kochetov
66a76ab70f Rewrite PushingToViewsBlockOutputStream part 6 2021-09-03 20:29:36 +03:00
Nikolai Kochetov
61d8f880cd Rename some files. 2021-07-26 19:48:25 +03:00
Nikolai Kochetov
9b5a816b43 Merge branch 'master' into output-streams-to-processors 2021-07-26 18:03:11 +03:00
Nikolai Kochetov
0eb563dc1b Fix more tests. 2021-07-26 17:47:29 +03:00
Nikolai Kochetov
9c92f43359 Update storages. 2021-07-23 22:33:59 +03:00
Nikolai Kochetov
2dc5c89b66 Update Storage::write 2021-07-23 17:25:35 +03:00
Nikolai Kochetov
3ed3f7a9f7 Fix integration tests. 2021-07-22 13:38:22 +03:00
Nikolai Kochetov
65d3e713d6 Fix another one test. 2021-07-21 15:16:13 +03:00
Nikolai Kochetov
179ec05a72 Remove some streams. 2021-07-20 21:18:43 +03:00
alexey-milovidov
b16e01507f
Merge pull request #26464 from azat/ubsan-dir-mon-fix
Fix undefined-behavior in DirectoryMonitor (for exponential back off)
2021-07-17 18:18:42 +03:00
alexey-milovidov
ca37548888
Merge pull request #26430 from azat/fix-dist-msg
Fix "While sending batch" (on Distributed async send)
2021-07-17 13:01:36 +03:00
Azat Khuzhin
d2967ffa0b Fix undefined-behavior in DirectoryMonitor (for exponential back off)
UBsan reports [1]:

    ../src/Storages/Distributed/DirectoryMonitor.cpp:435:54: runtime error: 2.30584e+19 is outside the range of representable values of type 'unsigned long'

  [1]: https://clickhouse-test-reports.s3.yandex.net/0/10f3500b3be73c9498d994d189784c7d44ed6793/stress_test_(undefined).html#fail1
2021-07-17 12:18:05 +03:00
Azat Khuzhin
80e614318c Fix "While sending batch" (on Distributed async send) 2021-07-16 22:27:46 +03:00
Azat Khuzhin
a3653bd665 Fix overflow in exponential sleep in DirectoryMonitor
UBsan reports:

    SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../src/Storages/Distributed/DirectoryMonitor.cpp:435:53 in
    ../src/Storages/Distributed/DirectoryMonitor.cpp:435: runtime error: 1.15292e+19 is outside the range of representable values of type 'long'
        0 0x1df0c286 in DB::StorageDistributedDirectoryMonitor::run() obj-x86_64-linux-gnu/../src/Storages/Distributed/DirectoryMonitor.cpp:435:53

It is pretty easy to reproduce by limiting max_server_memory_usage
before staring the test.
2021-07-16 04:10:47 +03:00
Azat Khuzhin
f3d3ec44a6 Add ability to set Distributed directory monitor settings via CREATE TABLE 2021-07-16 04:10:47 +03:00
alexey-milovidov
4183f3164a
Merge branch 'master' into fixrandomoneshardinsert 2021-07-13 04:46:40 +03:00
alexey-milovidov
0a26687115
Merge pull request #23864 from azat/dist-split-batch-and-retry
Add ability to split distributed batch on failures (i.e. due to memory limits)
2021-06-27 19:28:26 +03:00
Azat Khuzhin
a616ae8861 Improve startup time of Distributed engine.
- create directory monitors in parallel (this also includes rmdir in
  case of directory is empty, since even if the directory is empty it
  may take some time to remove it, due to waiting for journal or if the
  directory is large, i.e. it had lots of files before, since remember
  ext4 does not truncate the directory size on each unlink [1])
- initialize increment in parallel too (since it does readdir())

  [1]: https://lore.kernel.org/linux-ext4/930A5754-5CE6-4567-8CF0-62447C97825C@dilger.ca/
2021-06-24 10:27:51 +03:00
Azat Khuzhin
3bd53c68f9 Try to split the batch in case of broken batch too
Broken batches may be because of abnormal server shutdown (and lack of
fsync), and ignoring the whole batch is not great in this case, so apply
the same split logic here too.

v2: rename exception
v3: catch missing exception
v4: fix marking the file as broken multiple times (fixes
test_insert_distributed_async_send with setting enabled)
2021-06-23 02:48:47 +03:00
Azat Khuzhin
a0209178cc Add ability to split distributed batch on failures
Add distributed_directory_monitor_split_batch_on_failure setting (OFF by
default), that will split the batch and send files one by one in case of
retriable errors.

v2: more error codes
2021-06-23 02:48:47 +03:00
Azat Khuzhin
e148ef739d Drop replicas from dirname for internal_replication=true
Under use_compact_format_in_distributed_parts_names=1 and
internal_replication=true the server encodes all replicas for the
directory name for async INSERT into Distributed, and the directory name
looks like:

    shard1_replica1,shard1_replica2,shard3_replica3

This is required for creating connections (to specific replicas only),
but in case of internal_replication=true, this can be avoided, since
this path will always includes all replicas.

This patch replaces all replicas with "_all_replicas" marker.

Note, that initial problem was that this path may overflow the NAME_MAX
if you will have more then 15 replicas, and the server will fail to
create the directory.

Also note, that changed directory name should not be a problem, since:
- empty directories will be removed since #16729
- and replicas encoded in the directory name is also supported anyway.
2021-06-23 02:47:38 +03:00