Commit Graph

446 Commits

Author SHA1 Message Date
Dmitry Novik
eb7ae91d01 Do not add alias to a temporary table 2023-06-01 14:34:30 +00:00
Dmitry Novik
c6dcb69b85 Fix GLOBAL IN 2023-06-01 14:34:30 +00:00
Dmitry Novik
85e5ed79e5 Fix distributed JOINs 2023-06-01 14:34:30 +00:00
Dmitry Novik
b86516131b Attempt to fix global JOINs and INs 2023-06-01 14:34:30 +00:00
Dmitry Novik
a4cb82127d Analyzer: WIP on distributed queries 2023-06-01 14:34:29 +00:00
Alexey Milovidov
956c399b2a Remove useless code 2023-06-01 03:04:29 +02:00
Nikolai Kochetov
8cec00dd6e Merge branch 'master' into refactor-subqueries-for-in 2023-05-30 18:08:37 +02:00
Anton Popov
612173e734 refactoring near alter conversions 2023-05-25 22:54:54 +00:00
Nikolai Kochetov
b5b261b22c Merge branch 'master' into refactor-subqueries-for-in 2023-05-25 21:22:06 +02:00
Nikita Mikhaylov
1c3b6738f4 Fixes for parallel replicas (#50195) 2023-05-25 14:41:04 +02:00
Nikolai Kochetov
d8f39b8df1 Fixing more tests. 2023-05-24 17:53:37 +00:00
Alexey Milovidov
5a44dc26e7 Fixes for clang-17 2023-05-13 02:57:31 +02:00
Sema Checherinda
f2ad1122a1 fix conversion 2023-05-10 17:50:42 +00:00
Kruglov Pavel
2ad161d2b7 Merge branch 'master' into non-blocking-connect 2023-04-19 13:39:40 +02:00
Yakov Olkhovskiy
35e9e45249 Merge pull request #48062 from Algunenano/unnecessary_alter_checks
Only check MV on ALTER when necessary
2023-04-03 17:23:11 -04:00
Azat Khuzhin
f38a7aeabe ThreadPool metrics introspection
There are lots of thread pools and simple local-vs-global is not enough
already, it is good to know which one in particular uses threads.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-03-29 10:46:59 +02:00
Raúl Marín
d1a6c1991a Only check MV on ALTER when necessary 2023-03-27 17:45:15 +02:00
avogar
38e44861ae Fix possible race conditions 2023-03-21 16:01:54 +00:00
Dmitry Novik
cced9cf613 Fix build 2023-03-14 12:04:39 +00:00
Dmitry Novik
ae3d30a736 Merge remote-tracking branch 'origin/master' into fix-grouping-for-grouping-sets 2023-03-14 12:01:51 +00:00
Alexander Tokmakov
ed08f8f5c5 Merge branch 'master' into revert_25674 2023-03-12 02:33:25 +03:00
Alexander Tokmakov
7b1b238d0b Revert "Merge pull request #25674 from amosbird/distributedreturnconnection"
This reverts commit 5ffd99dfd4, reversing
changes made to 2796aa333f.
2023-03-11 19:09:47 +01:00
Maksim Kita
0358cb36d8 Fixed tests 2023-03-11 11:51:54 +01:00
Maksim Kita
677408e02e Fixed style check 2023-03-11 11:51:54 +01:00
Maksim Kita
a762112e15 Analyzer support distributed JOINS and subqueries in IN functions 2023-03-11 11:51:54 +01:00
Dmitry Novik
2699ef477f Move visitor 2023-03-10 14:36:56 +00:00
Dmitry Novik
a305c6e7ab Fix distributed GROUPING SETS and GROUPING function 2023-03-09 18:00:23 +00:00
Antonio Andelic
35c15e6ef8 Merge branch 'master' into custom-key-parallel-replicas 2023-03-07 09:37:38 +00:00
Han Fei
b7eef62458 Merge pull request #45491 from azat/dist/async-send-refactoring
[RFC] Rewrite distributed sends to avoid using filesystem as a queue, use in-memory queue instead
2023-03-06 12:32:33 +01:00
Antonio Andelic
737cf8e149 Better 2023-03-03 15:14:49 +00:00
Antonio Andelic
01cf9c94f4 Merge branch 'master' into custom-key-parallel-replicas 2023-03-02 14:28:42 +00:00
Maksim Kita
d39be3ac9c Fixed tests 2023-03-01 18:05:07 +01:00
Maksim Kita
51ee007e01 Fixed tests 2023-03-01 18:05:07 +01:00
Azat Khuzhin
e10fb142fd Fix race for distributed sends from disk
Before, it was initialized from disk only on startup, but if an INSERT
creates the object earlier, it will never be initialized from disk.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-02-28 22:33:36 +01:00
Azat Khuzhin
b5434eac3b Rename StorageDistributedDirectoryMonitor to DistributedAsyncInsertDirectoryQueue
Since #44922 it is not a directory monitor anymore.

v2: Remove unused error codes
v3: Contains some header fixes due to conflicts with master
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-02-28 22:33:36 +01:00
Azat Khuzhin
3f892e52ab Revert "Revert "Merge pull request #44922 from azat/dist/async-INSERT-metrics""
This is the revert of revert since there will be follow up patches to
address the issues.

This reverts commit a55798626a.
2023-02-28 22:33:36 +01:00
Maksim Kita
cbd961de98 Fixed code review issues 2023-02-18 17:06:00 +01:00
Maksim Kita
05baf271f0 Analyzer fix table functions with invalid arguments analysis 2023-02-16 12:17:04 +01:00
Maksim Kita
6b2adc1ec2 Analyzer storage Merge fixes 2023-02-16 12:17:03 +01:00
Maksim Kita
b1ab2af7ad Analyzer support storage Merge 2023-02-16 12:17:03 +01:00
Maksim Kita
84065fb13f Analyzer added distributed table functions support 2023-02-16 12:17:03 +01:00
Maksim Kita
a090a8449d Analyzer distributed read fix 2023-02-16 12:17:03 +01:00
Maksim Kita
f8442b2a8d Analyzer support LiveView 2023-02-16 12:17:03 +01:00
Maksim Kita
25da9dcef7 StorageDistributed Planner initialization fix 2023-02-16 12:17:02 +01:00
Antonio Andelic
adde580756 Merge branch 'master' into custom-key-parallel-replicas 2023-02-14 14:09:12 +01:00
Antonio Andelic
f67e5505ab Merge branch 'master' into custom-key-parallel-replicas 2023-02-06 11:12:39 +00:00
Robert Schulze
84b9ff450f Fix terribly broken, fragile and potentially cyclic linking
Sorry for the clickbaity title. This is about static method
ConnectionTimeouts::getHTTPTimeouts(). It was declared in header
IO/ConnectionTimeouts.h, and defined in header
IO/ConnectionTimeoutsContext.h (!). This is weird and caused issues with
linking on s390x (#45520). There was an attempt to fix some
inconsistencies (#45848) but neither @Algunenano nor I at first
really understood why the definition is in the header.

Turns out that ConnectionTimeoutsContext.h is only #include'd from
source files which are part of the normal server build BUT NOT part of
the keeper standalone build (which must be enabled via CMake
-DBUILD_STANDALONE_KEEPER=1). This dependency was not documented and as
a result, some misguided workarounds were introduced earlier, e.g.
0341c6c54b

The deeper cause was that getHTTPTimeouts() is passed a "Context". This
class is part of the "dbms" library which is deliberately not linked by
the standalone build of clickhouse-keeper. The context is only used to
read the settings and the "Settings" class is part of the
clickhouse_common library which is linked by clickhouse-keeper already.

To resolve this mess, this PR

- creates source file IO/ConnectionTimeouts.cpp and moves all
  ConnectionTimeouts definitions into it, including getHTTPTimeouts().

- breaks the wrong dependency by passing "Settings" instead of "Context"
  into getHTTPTimeouts().

- resolves the previous hacks
2023-02-05 20:49:34 +00:00
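
The dependency fix described in 84b9ff450f can be illustrated with a minimal sketch. The types below (Settings, Context, ConnectionTimeouts and all of their fields) are simplified, hypothetical stand-ins and not ClickHouse's real declarations; only the idea of passing "Settings" instead of "Context" is taken from the commit message.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in: lives in a small "common" library that every binary links.
struct Settings
{
    std::chrono::seconds http_connection_timeout{1};
    std::chrono::seconds http_send_timeout{30};
    std::chrono::seconds http_receive_timeout{30};
};

// Hypothetical stand-in: lives in the large server-only library,
// which the standalone keeper build does not link.
struct Context
{
    Settings settings;
    const Settings & getSettingsRef() const { return settings; }
};

struct ConnectionTimeouts
{
    std::chrono::seconds connection_timeout;
    std::chrono::seconds send_timeout;
    std::chrono::seconds receive_timeout;

    // Takes only Settings, so nothing from the server-only library is needed here.
    static ConnectionTimeouts getHTTPTimeouts(const Settings & settings)
    {
        return {settings.http_connection_timeout, settings.http_send_timeout, settings.http_receive_timeout};
    }
};
```

Callers that hold a Context would pass context.getSettingsRef(), while a binary without Context can still call the function with a plain Settings object.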
Han Fei
532b341de9 Merge pull request #45975 from ucasfl/_part
use LowCardinality for _part and _partition_id virtual column
2023-02-05 18:00:46 +01:00
Nikita Mikhaylov
33877b5e00 Parallel replicas. Part [2] (#43772) 2023-02-03 14:34:18 +01:00
flynn
2d1dd694c6 make _table LowCardinality 2023-02-02 16:33:31 +00:00
flynn
f88a8bac19 fix 2023-02-02 16:22:09 +00:00
flynn
bc38ebaf52 use LowCardinality for _part and _partition_id virtual column
fix
2023-02-02 16:20:29 +00:00
Antonio Andelic
c99efa75b7 Merge branch 'master' into custom-key-parallel-replicas 2023-01-31 11:58:30 +01:00
alesapin
be58d5d1af Fix bug in tables drop which can lead to a potential query hang 2023-01-30 17:00:28 +01:00
Antonio Andelic
95853af459 Merge branch 'master' into custom-key-parallel-replicas 2023-01-24 10:49:40 +00:00
Antonio Andelic
37b62b3a58 Use Map for custom_key 2023-01-24 10:46:47 +00:00
Alexander Tokmakov
70d1adfe4b Better formatting for exception messages (#45449)
* save format string for NetException

* format exceptions

* format exceptions 2

* format exceptions 3

* format exceptions 4

* format exceptions 5

* format exceptions 6

* fix

* format exceptions 7

* format exceptions 8

* Update MergeTreeIndexGin.cpp

* Update AggregateFunctionMap.cpp

* Update AggregateFunctionMap.cpp

* fix
2023-01-24 00:13:58 +03:00
Azat Khuzhin
51019bc9f3 Fix a race between Distributed table creation and INSERT into it
The queues for pending on-disk files of async INSERT must not be
initialized after the table has been attached and become visible to the
user, because that initialization sets the per-table counter that is used
during INSERT.

Currently there is a window in which this counter is not yet initialized
and starts from the beginning, which can lead to a CANNOT_LINK error:

    Destination file /data/clickhouse/data/urls_v1/urls_in/shard6_replica1/13129817.bin is already exist and have different inode

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-23 09:55:43 +01:00
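
The failure mode in 51019bc9f3 can be modelled with a small sketch. FileNameCounter and its methods are hypothetical names chosen for illustration, not the real ClickHouse class, and the sketch ignores thread safety. It only shows why the counter has to be seeded from the files already on disk before any INSERT can run.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of a per-table file counter; not ClickHouse's real class.
class FileNameCounter
{
public:
    // Must run before any INSERT can see the table: continue after the
    // largest index that is already present on disk.
    void initializeFromDisk(const std::vector<uint64_t> & existing_indices)
    {
        for (uint64_t index : existing_indices)
            counter = std::max(counter, index);
    }

    // Called from INSERT: returns the name of the next file to create.
    std::string nextFileName()
    {
        return std::to_string(++counter) + ".bin";
    }

private:
    uint64_t counter = 0;
};
```

If nextFileName() is called on a counter that was never seeded, it starts again from 1.bin and can eventually produce a name that already exists on disk, which corresponds to the collision reported in the error above.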
Antonio Andelic
44ef00dc05 Merge branch 'master' into custom-key-parallel-replicas 2023-01-22 14:20:58 +00:00
Alexander Tokmakov
1174eaa132 Merge pull request #45492 from azat/revert/dist/async-INSERT-metrics
Revert "Merge pull request #44922 from azat/dist/async-INSERT-metrics"
2023-01-22 00:45:10 +03:00
Azat Khuzhin
a55798626a Revert "Merge pull request #44922 from azat/dist/async-INSERT-metrics"
There are the following problems with this patch:
- Loses files on exception
- Existing current_batch.txt on startup leads to an ENOENT error and a hang
  of distributed sends without ATTACH/DETACH
- Race between creating the queue for sending at table startup and
  INSERT: if the queue was created from INSERT, it will not be
  initialized from disk

They were addressed in #45491, but that makes the code more complex, and
since the release is likely coming soon, it is better to revert the
change.

This reverts commit 94604f71b7, reversing
changes made to 80f6a45376.
2023-01-21 22:42:00 +01:00
Maksim Kita
19f1bae5ed Merge pull request #45254 from kitaisreal/planner-small-fixes
Planner small fixes
2023-01-21 19:54:17 +03:00
Alexander Tokmakov
910d6dc0ce Merge pull request #45342 from ClickHouse/exception_message_patterns
Save message format strings for DB::Exception
2023-01-20 18:46:52 +03:00
Maksim Kita
2c56b0b2b9 Planner small fixes 2023-01-19 19:05:49 +01:00
Antonio Andelic
d93cb3e1dd More correct check 2023-01-19 13:24:35 +00:00
Antonio Andelic
ddfb913f99 better 2023-01-19 11:28:26 +00:00
Antonio Andelic
3b0c63551e Combine approaches 2023-01-19 10:26:38 +00:00
Antonio Andelic
7a75144ce3 Refactor 2023-01-19 09:42:54 +00:00
Antonio Andelic
1c0a3e38c0 Fix queries with Distributed storage 2023-01-19 08:13:59 +00:00
Antonio Andelic
a37ca4b961 Merge branch 'master' into custom-key-parallel-replicas-with-shard 2023-01-18 11:53:04 +00:00
Alexander Tokmakov
5cd90c1a3e Merge branch 'master' into exception_message_patterns 2023-01-17 20:04:04 +01:00
Azat Khuzhin
54fc6859ae Fix race in Distributed table startup
Before this patch it was possible to have multiple directory monitors
for the same directory, one from the INSERT context, another one on
storage startup().

Here is an example of logs for this scenario:

    2022.12.07 12:12:27.552485 [ 39925 ] {a47fcb32-4f44-4dbd-94fe-0070d4ea0f6b} <Debug> DDLWorker: Executed query: DETACH TABLE inc.dist_urls_in
    ...
    2022.12.07 12:12:33.228449 [ 4408 ] {20c761d3-a46d-417b-9fcd-89a8919dd1fe} <Debug> executeQuery: (from 0.0.0.0:0, user: ) /* ddl_entry=query-0000089229 */ ATTACH TABLE inc.dist_urls_in (stage: Complete)
    ... this is the DirectoryMonitor created from the context of INSERT for the old StoragePtr that had not been destroyed yet (because of "was 1" this can be done only from the context of INSERT) ...
    2022.12.07 12:12:35.556048 [ 39536 ] {} <Trace> inc.dist_urls_in.DirectoryMonitor: Files set to 173 (was 1)
    2022.12.07 12:12:35.556078 [ 39536 ] {} <Trace> inc.dist_urls_in.DirectoryMonitor: Bytes set to 29750181 (was 71004)
    2022.12.07 12:12:35.562716 [ 39536 ] {} <Trace> Connection (i13.ch:9000): Connected to ClickHouse server version 22.10.1.
    2022.12.07 12:12:35.562750 [ 39536 ] {} <Debug> inc.dist_urls_in.DirectoryMonitor: Sending a batch of 10 files to i13.ch:9000 (0.00 rows, 0.00 B bytes).
    ... this is the DirectoryMonitor that was created during ATTACH ...
    2022.12.07 12:12:35.802080 [ 39265 ] {} <Trace> inc.dist_urls_in.DirectoryMonitor: Files set to 173 (was 0)
    2022.12.07 12:12:35.802107 [ 39265 ] {} <Trace> inc.dist_urls_in.DirectoryMonitor: Bytes set to 29750181 (was 0)
    2022.12.07 12:12:35.834216 [ 39265 ] {} <Debug> inc.dist_urls_in.DirectoryMonitor: Sending a batch of 10 files to i13.ch:9000 (0.00 rows, 0.00 B bytes).
    ...
    2022.12.07 12:12:38.532627 [ 39536 ] {} <Trace> inc.dist_urls_in.DirectoryMonitor: Sent a batch of 10 files (took 2976 ms).
    ...
    2022.12.07 12:12:38.601051 [ 39265 ] {} <Error> inc.dist_urls_in.DirectoryMonitor: std::exception. Code: 1001, type: std::__1::__fs::filesystem::filesystem_error, e.what() = filesystem error: in file_size: No such file or directory ["/data6/clickhouse/data/inc/dist_urls_in/shard13_replica1/66827403.bin"], Stack trace (when copying this message, always include the lines below):
    ...
    2022.12.07 12:12:54.132837 [ 4408 ] {20c761d3-a46d-417b-9fcd-89a8919dd1fe} <Debug> DDLWorker: Executed query: ATTACH TABLE inc.dist_urls_in

And eventually both monitors (for a short period of time, one replaces
another) are trying to process the same batch (current_batch.txt), and
one of them fails because the file has already been removed.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-17 14:51:00 +01:00
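
One common way to guarantee a single monitor per directory is a get-or-create lookup under a mutex, sketched below. This is an assumption for illustration only: the commit message does not say how the patch itself prevents duplicates, and DirectoryQueue and DirectoryQueueRegistry are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a per-directory sender.
struct DirectoryQueue
{
    explicit DirectoryQueue(std::string path_) : path(std::move(path_)) {}
    std::string path;
};

// Hypothetical registry: both INSERT and table startup go through getOrCreate(),
// so they always end up sharing one object per directory.
class DirectoryQueueRegistry
{
public:
    std::shared_ptr<DirectoryQueue> getOrCreate(const std::string & path)
    {
        std::lock_guard<std::mutex> lock(mutex);
        auto & slot = queues[path];
        if (!slot)
            slot = std::make_shared<DirectoryQueue>(path);
        return slot;
    }

private:
    std::mutex mutex;
    std::map<std::string, std::shared_ptr<DirectoryQueue>> queues;
};
```

With a shared entry point like this, a second request for the same directory returns the existing object instead of starting a competing sender for the same batch.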
Antonio Andelic
dd31de18a3 Extend where correctly 2023-01-17 13:31:09 +00:00
Antonio Andelic
bd352068d7 Turn replicas into shard for custom_key 2023-01-17 12:34:42 +00:00
Antonio Andelic
a85565c4a6 Merge branch 'master' into custom-key-parallel-replicas 2023-01-17 07:37:18 +00:00
Alexander Tokmakov
522686f78b less empty patterns 2023-01-17 01:19:44 +01:00
Alexander Tokmakov
94604f71b7 Merge pull request #44922 from azat/dist/async-INSERT-metrics
Optimize and fix metrics for Distributed async INSERT
2023-01-16 14:12:56 +03:00
Antonio Andelic
36cf9debae Merge branch 'master' into custom-key-parallel-replicas 2023-01-16 08:11:08 +00:00
Maksim Kita
43a0996356 Fixed tests 2023-01-12 12:07:58 +01:00
Maksim Kita
a140d6c5b1 Fixed code review issues 2023-01-12 12:07:58 +01:00
Maksim Kita
47f4159909 Analyzer support distributed queries processing 2023-01-12 12:07:58 +01:00
Antonio Andelic
ecbffa80b6 Add READ_TASKS mode 2023-01-12 07:56:15 +00:00
Nikita Mikhaylov
857799fbca Parallel distributed insert select with s3Cluster [3] (#44955)
* Revert "Revert "Resurrect parallel distributed insert select with s3Cluster (#41535)""

This reverts commit b8d9066004.

* Fix build

* Better

* Fix test

* Automatic style fix

Co-authored-by: robot-clickhouse <robot-clickhouse@users.noreply.github.com>
2023-01-09 13:30:32 +01:00
Azat Khuzhin
f5b44cbe0d Optimize and fix metrics for Distributed async INSERT
In #43406 metrics were broken for a clean start, since they were not
initialized from disk, but metrics for broken files were never
initialized from disk.

Fix this and rework how DirectoryMonitor works with the file system:
- do not iterate over the directory before each send; do this only once on
  init, and afterwards update the map of files from the INSERT
- call fs::create_directories() from the ctor for the "broken" folder to
  avoid excessive calls
- cache "broken" paths

This patch also fixes a possible issue where current_batch can be processed
multiple times (the second time results in an exception): if
current_batch.txt already exists, it should be removed immediately after
it has been processed.

Plus this patch implicitly fixes a logging issue where an incorrect number
of files is logged in case of an error (see #44907 for details).

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-05 11:07:40 +01:00
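
The "scan once, then track in memory" approach from f5b44cbe0d can be sketched as follows. PendingFiles and its methods are hypothetical names, the directory listing is passed in as a plain vector instead of being read from the file system, and file names are assumed to start with a decimal index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory index of pending files; names are illustrative.
class PendingFiles
{
public:
    // The directory listing is taken once, at construction.
    explicit PendingFiles(const std::vector<std::string> & listing_at_startup)
    {
        for (const auto & name : listing_at_startup)
            files.emplace(parseIndex(name), name);
    }

    // INSERT registers each new file directly, so no rescan is needed.
    void addFromInsert(const std::string & name)
    {
        files.emplace(parseIndex(name), name);
    }

    // Take up to max_files of the oldest files for the next send.
    std::vector<std::string> takeBatch(std::size_t max_files)
    {
        std::vector<std::string> batch;
        while (!files.empty() && batch.size() < max_files)
        {
            batch.push_back(files.begin()->second);
            files.erase(files.begin());
        }
        return batch;
    }

private:
    // "12.bin" -> 12; assumes the name starts with a decimal index.
    static uint64_t parseIndex(const std::string & name) { return std::stoull(name); }

    std::map<uint64_t, std::string> files;
};
```

Keeping the map ordered by numeric index means each send picks the oldest files first without touching the directory again.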
Nikita Taranov
8ed5cfc265 Memory bound merging for distributed aggregation in order (#40879)
* impl

* fix style

* make executeQueryWithParallelReplicas similar to executeQuery

* impl for parallel replicas

* cleaner code for remote sorting properties

* update test

* fix

* handle when nodes of old versions participate

* small fixes

* temporary enable for testing

* fix after merge

* Revert "temporary enable for testing"

This reverts commit cce7f8884c.

* review fixes

* add bc test

* Update src/Core/Settings.h
2022-11-28 00:41:31 +01:00
Anton Popov
2ae3cfa9e0 Merge branch 'master' into dynamic-columns-14 2022-10-31 16:15:19 +01:00
Maksim Kita
ca93ee7479 Fixed tests 2022-10-24 10:22:20 +02:00
Azat Khuzhin
4e76629aaf Fixes for -Wshorten-64-to-32
- lots of static_cast
- add safe_cast
- types adjustments
  - config
  - IStorage::read/watch
  - ...
- some TODO's (to convert types in future)

P.S. That was quite a journey...

v2: fixes after rebase
v3: fix conflicts after #42308 merged
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-10-21 13:25:19 +02:00
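
For context on -Wshorten-64-to-32: the warning flags implicit 64-bit to 32-bit conversions, which can be silenced with a static_cast or replaced with a checked conversion. The helper below is a hypothetical sketch of a checked conversion; its name and throwing behaviour are assumptions, and the safe_cast added by 4e76629aaf may work differently.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical checked narrowing helper; name and behaviour are assumptions.
inline uint32_t checkedNarrow(uint64_t value)
{
    // Reject values that would be truncated by the cast.
    if (value > std::numeric_limits<uint32_t>::max())
        throw std::out_of_range("value does not fit into 32 bits");
    return static_cast<uint32_t>(value);
}
```

A plain static_cast documents intent but truncates silently, while a checked helper like this turns an out-of-range value into an error.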
Alexander Tokmakov
bd10a9d2d4 Merge pull request #42168 from ClickHouse/unreachable_macro
Abort instead of `__builtin_unreachable` in debug builds
2022-10-08 19:05:57 +03:00
Alexander Tokmakov
4175f8cde6 abort instead of __builtin_unreachable in debug builds 2022-10-07 21:49:08 +02:00
Alexander Tokmakov
b8d9066004 Revert "Resurrect parallel distributed insert select with s3Cluster (#41535)"
This reverts commit 860e34e760.
2022-10-07 15:53:30 +02:00
Nikita Mikhaylov
860e34e760 Resurrect parallel distributed insert select with s3Cluster (#41535) 2022-10-06 13:47:32 +02:00
Anton Popov
6e61cf92f5 Merge remote-tracking branch 'upstream/master' into HEAD 2022-10-03 13:16:57 +00:00
Alexey Milovidov
ab4db2d0c4 Fix 5/6 of trash 2022-09-19 08:50:53 +02:00
Anton Popov
f0a404e2c8 Merge remote-tracking branch 'upstream/master' into HEAD 2022-09-06 15:51:16 +00:00
Alexander Tokmakov
f9f85a0e8b Revert "Parallel distributed insert select from *Cluster table functions (#39107)"
This reverts commit d3cc234986.
2022-08-24 15:17:15 +03:00
Nikita Mikhaylov
d3cc234986 Parallel distributed insert select from *Cluster table functions (#39107) 2022-08-15 12:41:17 +02:00
Alexander Gololobov
ae0d00083c Renamed __row_exists to _row_exists 2022-07-18 20:07:36 +02:00
Alexander Gololobov
9de72d995a POC lightweight delete using __row_exists virtual column and prewhere-like filtering 2022-07-18 20:06:42 +02:00
avogar
59c1c472cb Better exception messages on wrong table engines/functions argument types 2022-06-23 20:04:06 +00:00