Commit Graph

325 Commits

Author SHA1 Message Date
Antonio Andelic
a37ca4b961 Merge branch 'master' into custom-key-parallel-replicas-with-shard 2023-01-18 11:53:04 +00:00
Azat Khuzhin
54fc6859ae Fix race in Distributed table startup
Before this patch it was possible to have multiple directory monitors
for the same directory, one from the INSERT context, another one on
storage startup().

Here are an example of logs for this scenario:

    2022.12.07 12:12:27.552485 [ 39925 ] {a47fcb32-4f44-4dbd-94fe-0070d4ea0f6b} <Debug> DDLWorker: Executed query: DETACH TABLE inc.dist_urls_in
    ...
    2022.12.07 12:12:33.228449 [ 4408 ] {20c761d3-a46d-417b-9fcd-89a8919dd1fe} <Debug> executeQuery: (from 0.0.0.0:0, user: ) /* ddl_entry=query-0000089229 */ ATTACH TABLE inc.dist_urls_in (stage: Complete)
    ... this is the DirectoryMonitor created from the context of INSERT for the old StoragePtr that had not been destroyed yet (becase of "was 1" this can be done only from the context of INSERT) ...
    2022.12.07 12:12:35.556048 [ 39536 ] {} <Trace> inc.dist_urls_in.DirectoryMonitor: Files set to 173 (was 1)
    2022.12.07 12:12:35.556078 [ 39536 ] {} <Trace> inc.dist_urls_in.DirectoryMonitor: Bytes set to 29750181 (was 71004)
    2022.12.07 12:12:35.562716 [ 39536 ] {} <Trace> Connection (i13.ch:9000): Connected to ClickHouse server version 22.10.1.
    2022.12.07 12:12:35.562750 [ 39536 ] {} <Debug> inc.dist_urls_in.DirectoryMonitor: Sending a batch of 10 files to i13.ch:9000 (0.00 rows, 0.00 B bytes).
    ... this is the DirectoryMonitor that created during ATTACH ...
    2022.12.07 12:12:35.802080 [ 39265 ] {} <Trace> inc.dist_urls_in.DirectoryMonitor: Files set to 173 (was 0)
    2022.12.07 12:12:35.802107 [ 39265 ] {} <Trace> inc.dist_urls_in.DirectoryMonitor: Bytes set to 29750181 (was 0)
    2022.12.07 12:12:35.834216 [ 39265 ] {} <Debug> inc.dist_urls_in.DirectoryMonitor: Sending a batch of 10 files to i13.ch:9000 (0.00 rows, 0.00 B bytes).
    ...
    2022.12.07 12:12:38.532627 [ 39536 ] {} <Trace> inc.dist_urls_in.DirectoryMonitor: Sent a batch of 10 files (took 2976 ms).
    ...
    2022.12.07 12:12:38.601051 [ 39265 ] {} <Error> inc.dist_urls_in.DirectoryMonitor: std::exception. Code: 1001, type: std::__1::__fs::filesystem::filesystem_error, e.what() = filesystem error: in file_size: No such file or directory ["/data6/clickhouse/data/inc/dist_urls_in/shard13_replica1/66827403.bin"], Stack trace (when copying this message, always include the lines below):
    ...
    2022.12.07 12:12:54.132837 [ 4408 ] {20c761d3-a46d-417b-9fcd-89a8919dd1fe} <Debug> DDLWorker: Executed query: ATTACH TABLE inc.dist_urls_in

And eventually both monitors (for a short period of time, one replaces
another) are trying to process the same batch (current_batch.txt), and
one of them fails because such file had been already removed.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-17 14:51:00 +01:00
Antonio Andelic
dd31de18a3 Extend where correctly 2023-01-17 13:31:09 +00:00
Antonio Andelic
bd352068d7 Turn replicas into shard for custom_key 2023-01-17 12:34:42 +00:00
Antonio Andelic
a85565c4a6 Merge branch 'master' into custom-key-parallel-replicas 2023-01-17 07:37:18 +00:00
Alexander Tokmakov
94604f71b7
Merge pull request #44922 from azat/dist/async-INSERT-metrics
Optimize and fix metrics for Distributed async INSERT
2023-01-16 14:12:56 +03:00
Antonio Andelic
36cf9debae Merge branch 'master' into custom-key-parallel-replicas 2023-01-16 08:11:08 +00:00
Maksim Kita
43a0996356 Fixed tests 2023-01-12 12:07:58 +01:00
Maksim Kita
a140d6c5b1 Fixed code review issues 2023-01-12 12:07:58 +01:00
Maksim Kita
47f4159909 Analyzer support distributed queries processing 2023-01-12 12:07:58 +01:00
Antonio Andelic
ecbffa80b6 Add READ_TASKS mode 2023-01-12 07:56:15 +00:00
Nikita Mikhaylov
857799fbca
Parallel distributed insert select with s3Cluster [3] (#44955)
* Revert "Revert "Resurrect parallel distributed insert select with s3Cluster (#41535)""

This reverts commit b8d9066004.

* Fix build

* Better

* Fix test

* Automatic style fix

Co-authored-by: robot-clickhouse <robot-clickhouse@users.noreply.github.com>
2023-01-09 13:30:32 +01:00
Azat Khuzhin
f5b44cbe0d Optimize and fix metrics for Distributed async INSERT
In #43406 metrics was broken for a clean start, since they where not
initialized from disk, but metrics for broken files was never
initialized from disk.

Fix this and rework how DirectoryMonitor works with file system:
- do not iterate over directory before each send, do this only once on
  init, after the map of files will be updated from the INSERT
- call fs::create_directories() from the ctor for "broken" folder to
  avoid excessive calls
- cache "broken" paths

This patch also fixes possible issue when current_batch can be processed
multiple times (second time will be an exception), since if there is
existing current_batch.txt after processing it you should remove it
instantly.

Plus this patch implicitly fixes issues with logging, that logs
incorrect number of files in case of error (see #44907 for details).

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-05 11:07:40 +01:00
Nikita Taranov
8ed5cfc265
Memory bound merging for distributed aggregation in order (#40879)
* impl

* fix style

* make executeQueryWithParallelReplicas similar to executeQuery

* impl for parallel replicas

* cleaner code for remote sorting properties

* update test

* fix

* handle when nodes of old versions participate

* small fixes

* temporary enable for testing

* fix after merge

* Revert "temporary enable for testing"

This reverts commit cce7f8884c.

* review fixes

* add bc test

* Update src/Core/Settings.h
2022-11-28 00:41:31 +01:00
Anton Popov
2ae3cfa9e0
Merge branch 'master' into dynamic-columns-14 2022-10-31 16:15:19 +01:00
Maksim Kita
ca93ee7479 Fixed tests 2022-10-24 10:22:20 +02:00
Azat Khuzhin
4e76629aaf Fixes for -Wshorten-64-to-32
- lots of static_cast
- add safe_cast
- types adjustments
  - config
  - IStorage::read/watch
  - ...
- some TODO's (to convert types in future)

P.S. That was quite a journey...

v2: fixes after rebase
v3: fix conflicts after #42308 merged
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-10-21 13:25:19 +02:00
Alexander Tokmakov
bd10a9d2d4
Merge pull request #42168 from ClickHouse/unreachable_macro
Abort instead of `__builtin_unreachable` in debug builds
2022-10-08 19:05:57 +03:00
Alexander Tokmakov
4175f8cde6 abort instead of __builtin_unreachable in debug builds 2022-10-07 21:49:08 +02:00
Alexander Tokmakov
b8d9066004 Revert "Resurrect parallel distributed insert select with s3Cluster (#41535)"
This reverts commit 860e34e760.
2022-10-07 15:53:30 +02:00
Nikita Mikhaylov
860e34e760
Resurrect parallel distributed insert select with s3Cluster (#41535) 2022-10-06 13:47:32 +02:00
Anton Popov
6e61cf92f5 Merge remote-tracking branch 'upstream/master' into HEAD 2022-10-03 13:16:57 +00:00
Alexey Milovidov
ab4db2d0c4 Fix 5/6 of trash 2022-09-19 08:50:53 +02:00
Anton Popov
f0a404e2c8 Merge remote-tracking branch 'upstream/master' into HEAD 2022-09-06 15:51:16 +00:00
Alexander Tokmakov
f9f85a0e8b Revert "Parallel distributed insert select from *Cluster table functions (#39107)"
This reverts commit d3cc234986.
2022-08-24 15:17:15 +03:00
Nikita Mikhaylov
d3cc234986
Parallel distributed insert select from *Cluster table functions (#39107) 2022-08-15 12:41:17 +02:00
Alexander Gololobov
ae0d00083c Renamed __row_exists to _row_exists 2022-07-18 20:07:36 +02:00
Alexander Gololobov
9de72d995a POC lightweight delete using __row_exists virtual column and prewhere-like filtering 2022-07-18 20:06:42 +02:00
avogar
59c1c472cb Better exception messages on wrong table engines/functions argument types 2022-06-23 20:04:06 +00:00
Nikolai Kochetov
8991f39412 Merge branch 'master' into refactor-read-metrics-and-callbacks 2022-06-02 17:00:08 +00:00
Nikita Mikhaylov
d34e051c69
Support for simultaneous read from local and remote parallel replica (#37204) 2022-06-02 11:46:33 +02:00
Nikolai Kochetov
c71256ea38 Remove some commented code. 2022-05-30 13:18:20 +00:00
Nikolai Kochetov
1b85f2c1d6 Merge branch 'master' into refactor-read-metrics-and-callbacks 2022-05-25 16:27:40 +02:00
Nikolai Kochetov
fd97a9d885 Move some resources 2022-05-23 19:47:32 +00:00
Nikolai Kochetov
56feef01e7 Move some resources 2022-05-20 19:49:31 +00:00
Anton Popov
e911900054 remove last mentions of data streams 2022-05-09 19:15:24 +00:00
Anton Popov
515f68eead Merge remote-tracking branch 'upstream/master' into dynamic-columns-14 2022-05-06 16:10:51 +00:00
Anton Popov
566c08086a support Object type inside other types 2022-05-06 14:44:00 +00:00
mergify[bot]
64084b5e32
Merge branch 'master' into shared_ptr_helper3 2022-05-03 20:46:16 +00:00
Robert Schulze
330212e0f4
Remove inherited create() method + disallow copying
The original motivation for this commit was that shared_ptr_helper used
std::shared_ptr<>() which does two heap allocations instead of
make_shared<>() which does a single allocation. Turned out that
1. the affected code (--> Storages/) is not on a hot path (rendering the
performance argument moot ...)
2. yet copying Storage objects is potentially dangerous and was
   previously allowed.

Hence, this change

- removes shared_ptr_helper and as a result all inherited create() methods,

- instead, Storage objects are now created using make_shared<>() by the
  caller (for that to work, many constructors had to be made public), and

- all Storage classes were marked as noncopyable using boost::noncopyable.

In sum, we are (likely) not making things faster but the code becomes
cleaner and harder to misuse.
2022-05-02 08:46:52 +02:00
mergify[bot]
265398d1b6
Merge branch 'master' into feat/add_part_offset 2022-04-25 15:58:16 +00:00
Robert Schulze
b24ca8de52
Fix various clang-tidy warnings
When I tried to add cool new clang-tidy 14 warnings, I noticed that the
current clang-tidy settings already produce a ton of warnings. This
commit addresses many of these. Almost all of them were non-critical,
i.e. C vs. C++ style casts.
2022-04-20 10:29:05 +02:00
Alexey Milovidov
242919eddd Remove abbreviation 2022-04-18 01:02:49 +02:00
Alexander Tokmakov
07d952b728 use snapshots for semistructured data, durability fixes 2022-03-17 18:26:18 +01:00
roverxu
29a842bf22 feat(...): [LWD] support getting _part_offset of a row 2022-03-15 15:40:10 +08:00
Anton Popov
36ec379aeb Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-14 16:28:35 +00:00
Anton Popov
37efe2ddb5
Apply suggestions from code review
Co-authored-by: Kseniia Sumarokova <54203879+kssenii@users.noreply.github.com>
2022-03-10 22:24:19 +01:00
Azat Khuzhin
4843e210c3 Support view() for parallel_distributed_insert_select
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-03-08 22:05:57 +03:00
Azat Khuzhin
c4b6342853
Improvements for parallel_distributed_insert_select (and related) (#34728)
* Add a warning if parallel_distributed_insert_select was ignored

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Respect max_distributed_depth for parallel_distributed_insert_select

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Print warning for non applied parallel_distributed_insert_select only for initial query

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Remove Cluster::getHashOfAddresses()

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Forbid parallel_distributed_insert_select for remote()/cluster() with different addresses

Before it uses empty cluster name (getClusterName()) which is not
correct, compare all addresses instead.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Fix max_distributed_depth check

max_distributed_depth=1 must mean not more then one distributed query,
not two, since max_distributed_depth=0 means no limit, and
distribute_depth is 0 for the first query.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Fix INSERT INTO remote()/cluster() with parallel_distributed_insert_select

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Add a test for parallel_distributed_insert_select with cluster()/remote()

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Return <remote> instead of empty cluster name in Distributed engine

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

* Make user with sharding_key and w/o in remote()/cluster() identical

Before with sharding_key the user was "default", while w/o it it was
empty.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-03-08 15:24:39 +01:00
Anton Popov
04a3a10148 minor fixes 2022-03-01 20:20:53 +03:00