Commit Graph

82 Commits

vdimir
498ba33160
Rename max_temp_data_on_disk -> max_temporary_data_on_disk 2022-10-01 08:51:33 +00:00
vdimir
0f1a7c252d
better TemporaryDataOnDisk 2022-09-29 09:51:46 +00:00
vdimir
9f3f34548c
Allow creating temporary streams on leaf TemporaryDataOnDisk 2022-09-29 09:51:45 +00:00
vdimir
15c7a3be34
Temp data on disk: build 2022-09-29 09:51:41 +00:00
vdimir
c0898ce289
Use abstraction for temporary data on disk in Sort and Aggregation 2022-09-29 09:51:41 +00:00
Alexander Tokmakov
f9e7451d88
Merge pull request #41343 from azat/deadlock-fix
Fix possible deadlock with async_socket_for_remote/use_hedged_requests and parallel KILL
2022-09-20 16:59:31 +03:00
Azat Khuzhin
f76f14a99c Fix possible hung/deadlock on query cancellation
Right now, due to graceful cancellation of the query, it is possible to hang
while accepting new queries, or even to deadlock, because cancellation is
done while holding ProcessListBase::mutex.

So this patch makes query cancellation lock-free: the lock is now acquired
only while preparing the query and again after the cancellation is done.
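
A minimal sketch of that lock-free pattern (hypothetical names and a
simplified QueryStatus, not the actual ClickHouse code): cancellation only
flips an atomic flag, and the mutex is held only around the container
update, so KILL QUERY can never block on it.

    #include <atomic>
    #include <mutex>
    #include <stdexcept>
    #include <vector>

    struct QueryStatusSketch
    {
        std::atomic<bool> is_killed{false}; // set lock-free by cancelQuery()
        std::mutex executors_mutex;         // protects only the container
        std::vector<void *> executors;

        void addPipelineExecutor(void * executor)
        {
            // The flag is checked without taking executors_mutex, so a
            // concurrent cancellation cannot deadlock against this insert.
            if (is_killed.load())
                throw std::runtime_error("query was cancelled");

            std::lock_guard<std::mutex> lock(executors_mutex);
            executors.push_back(executor);
        }

        void cancelQuery()
        {
            // Lock-free: no mutex is taken on the cancellation path.
            is_killed.store(true);
        }
    };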

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-09-19 17:31:21 +02:00
Azat Khuzhin
5cc5ca22de Fix possible deadlock with async_socket_for_remote/use_hedged_requests and parallel KILL
Right now it is possible to call QueryStatus::addPipelineExecutor() while
executors_mutex is already acquired; this can happen when the query was
cancelled via KILL QUERY.

Below are some debugger traces from a real example in which many
ProcessList::insert() calls were deadlocked.

Let's look at the lock owner for one of the threads that was deadlocked
in ProcessList::insert():

    (gdb) p *mutex
    $2 = {
      __data = {
        __owner = 46899,
      },
    }

And now let's see the stack trace of thread 46899:

    #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
    #1  0x00007fb65569b714 in __GI___pthread_mutex_lock (mutex=0x7fb4a9d15298) at ../nptl/pthread_mutex_lock.c:80
    #2  0x000000001b6edd91 in pthread_mutex_lock (arg=0x7fb4a9d15298) at ../src/Common/ThreadFuzzer.cpp:317
    #3  std::__1::__libcpp_mutex_lock (__m=0x7fb4a9d15298) at ../contrib/libcxx/include/__threading_support:303
    #4  std::__1::mutex::lock (this=0x7fb4a9d15298) at ../contrib/libcxx/src/mutex.cpp:33
    #5  0x0000000014c7ae63 in std::__1::lock_guard<std::__1::mutex>::lock_guard (__m=..., this=<optimized out>) at ../contrib/libcxx/include/__mutex_base:91
    #6  DB::QueryStatus::addPipelineExecutor (this=0x7fb4a9d14f90, e=0x80) at ../src/Interpreters/ProcessList.cpp:372
    #7  0x0000000015bee4a7 in DB::PipelineExecutor::PipelineExecutor (this=0x7fb4b1e53618, processors=..., elem=<optimized out>) at ../src/Processors/Executors/PipelineExecutor.cpp:54
    #12 std::__1::make_shared<DB::PipelineExecutor, std::__1::vector<std::__1::shared_ptr<DB::IProcessor>, std::__1::allocator<std::__1::shared_ptr<DB::IProcessor> > >&, DB::QueryStatus*&, void> (__args=@0x7fb63095b9b0: 0x7fb4a9d14f90, __args=@0x7fb63095b9b0: 0x7fb4a9d14f90) at ../contrib/libcxx/include/__memory/shared_ptr.h:963
    #13 DB::QueryPipelineBuilder::execute (this=0x7fb63095b8b0) at ../src/QueryPipeline/QueryPipelineBuilder.cpp:552
    #14 0x00000000158c6c27 in DB::Connection::sendExternalTablesData (this=0x7fb6545e9d98, data=...) at ../src/Client/Connection.cpp:797
    #27 0x0000000014043a81 in DB::RemoteQueryExecutorRoutine::operator() (this=0x7fb63095bf20, sink=...) at ../src/QueryPipeline/RemoteQueryExecutorReadContext.cpp:46
    #32 0x000000000a16dd4f in make_fcontext () at ../contrib/boost/libs/context/src/asm/make_x86_64_sysv_elf_gas.S:71

The logs also show something very strange for this thread:

    2022.09.13 14:14:51.228979 [ 51145 ] {1712D4E914EC7C99} <Debug> Connection (localhost:9000): Sent data for 1 external tables, total 11 rows in 0.00046389 sec., 23688 rows/sec., 3.84 KiB (8.07 MiB/sec.), compressed 1.1070121092649958 times to 3.47 KiB (7.29 MiB/sec.)
    ...
    2022.09.13 14:14:51.719402 [ 46899 ] {7c90ffa4-1dc8-42fd-938c-4e307c244394} <Debug> executeQuery: (from 10.101.15.181:42478) KILL QUERY WHERE query_id = '1712D4E914EC7C99' (stage: Complete)
    2022.09.13 14:14:51.719488 [ 46899 ] {7c90ffa4-1dc8-42fd-938c-4e307c244394} <Debug> executeQuery: (internal) SELECT query_id, user, query FROM system.processes WHERE query_id = '1712D4E914EC7C99' (stage: Complete)
    2022.09.13 14:14:51.719754 [ 46899 ] {7c90ffa4-1dc8-42fd-938c-4e307c244394} <Trace> ContextAccess (default): Access granted: SELECT(user, query_id, query) ON system.processes
    2022.09.13 14:14:51.720544 [ 46899 ] {7c90ffa4-1dc8-42fd-938c-4e307c244394} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
    2022.09.13 14:14:53.228964 [ 46899 ] {7c90ffa4-1dc8-42fd-938c-4e307c244394} <Debug> Connection (localhost:9000): Sent data for 2 scalars, total 2 rows in 2.6838e-05 sec., 73461 rows/sec., 68.00 B (2.38 MiB/sec.), compressed 0.4594594594594595 times to 148.00 B (5.16 MiB/sec.)

How is this possible? The answer is fibers and the query cancellation
routine. During cancellation of an async query, execution goes into the
fibers again to cancel the query gracefully. Because of this, cancelling a
query may call QueryStatus::addPipelineExecutor() from
QueryStatus::cancelQuery().
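
A minimal sketch of that self-deadlock (hypothetical names, not the actual
ClickHouse code): a plain std::mutex is not recursive, so re-locking it on
the same thread, here via the nested call that the fiber re-entry triggers,
blocks forever.

    #include <mutex>
    #include <vector>

    struct QueryStatusSketch
    {
        std::mutex executors_mutex;
        std::vector<void *> executors;

        void addPipelineExecutor(void * executor)
        {
            // Deadlocks if the caller already holds executors_mutex.
            std::lock_guard<std::mutex> lock(executors_mutex);
            executors.push_back(executor);
        }

        void cancelQuery()
        {
            std::lock_guard<std::mutex> lock(executors_mutex); // first lock
            // Graceful cancellation re-enters the fiber, which may build a
            // new pipeline (e.g. for external tables data)...
            addPipelineExecutor(nullptr); // ...and locks the mutex again.
        }
    };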

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-09-15 14:41:27 +03:00
Alexey Milovidov
fa62c7e982 Fix half of trash 2022-09-10 04:08:16 +02:00
Dmitry Novik
865ee5d0d6 Refactor code 2022-08-29 20:24:35 +02:00
Dmitry Novik
1169315580 Add OvercommitTracker blocking 2022-08-29 19:44:05 +02:00
Dmitry Novik
cfe509c3de Block overcommit tracker in ProcessList near allocations 2022-08-29 17:49:01 +02:00
Dmitry Novik
5b7fe91675 Avoid deadlock in case of new query and OOM 2022-08-26 17:02:31 +02:00
Azat Khuzhin
c29768325c Print query in one line on fatal errors
executeQuery() logs the query with newlines stripped from the SQL, which is
useful, since you can grep the log and (usually) see the whole query.

So let's do the same in the fatal error handler to make CI reports more
informative.
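
A minimal sketch of the idea (an assumed helper, not the actual ClickHouse
code): collapse newlines so the whole query lands on a single log line.

    #include <algorithm>
    #include <string>

    std::string toOneLine(std::string query)
    {
        // Replace every newline with a space so that grepping the log
        // matches the complete statement at once.
        std::replace(query.begin(), query.end(), '\n', ' ');
        return query;
    }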

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-03 18:55:53 +03:00
Dmitry Novik
60b9d81773 Remove global_memory_usage_overcommit_max_wait_microseconds 2022-05-27 16:30:29 +00:00
Dmitry Novik
17608b3d93 Update documentation and defaults for memory overcommit 2022-05-11 16:18:41 +00:00
Dmitry Novik
2ed5a4013a
Revert "Revert "Memory overcommit: continue query execution if memory is available"" 2022-05-03 00:45:13 +02:00
alesapin
f0b7af0aa2
Revert "Memory overcommit: continue query execution if memory is available" 2022-05-03 00:36:50 +02:00
Dmitry Novik
71b6f89166
Merge pull request #35637 from ClickHouse/memory-overcommit-free
Memory overcommit: continue query execution if memory is available
2022-05-02 19:00:18 +02:00
mergify[bot]
e9fde5d067
Merge branch 'master' into memory-overcommit-free 2022-04-29 19:31:59 +00:00
Amos Bird
4a5e4274f0
base should not depend on Common 2022-04-29 10:26:35 +08:00
Azat Khuzhin
6339a48923 Add is_all_data_sent into system.processes
v2: fix SHOW PROCESSLIST (it does not have a process list entry)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-04-26 12:15:00 +03:00
Dmitry Novik
0d75e773ab Continue query execution if memory is freed 2022-03-26 18:25:58 +00:00
Dmitry Novik
07d0e3c823 cleanup 2022-02-16 20:19:10 +00:00
Dmitry Novik
44d5ddc939 Fix deadlock in OvercommitTracker 2022-02-16 20:02:14 +00:00
Dmitry Novik
bb6dad7d0e Fix lock order 2022-02-15 15:04:13 +00:00
mergify[bot]
cb3e5f8538
Merge branch 'master' into memory-overcommit 2022-02-10 11:01:43 +00:00
Anton Popov
c28255850a fix metric Query 2022-02-01 15:06:49 +03:00
Dmitry Novik
d7b4a32938 fix build 2022-01-18 20:26:12 +00:00
Dmitry Novik
c0970b75ee
Merge branch 'master' into memory-overcommit 2022-01-18 15:30:24 +03:00
Dmitry Novik
83c663e2d6 Cleanup after code review 2022-01-18 12:21:59 +00:00
Mikhail f. Shiryaev
d35ad19135
Fix review points 2022-01-10 18:23:17 +01:00
Mikhail f. Shiryaev
a8c83dce14
Use map for QueryKindAmounts 2022-01-10 13:49:53 +01:00
Mikhail f. Shiryaev
a882e64644
Use IAST::QueryKind instead of strings in QueryKindAmount 2021-12-29 17:32:59 +01:00
Mikhail f. Shiryaev
272ea7fc5b
Merge pull request #32609 from cmsxbc/query-kind-concurent_restriction
add settings: max_concurrent_select_queries and max_concurrent_insert_queries
2021-12-29 15:23:45 +01:00
Dmitry Novik
dce9390ece Init thread_group before QueryStatus creation 2021-12-16 16:46:15 +03:00
Dmitry Novik
0d91864ec0 Avoid possible data race when thread_group is initialized 2021-12-15 18:11:17 +03:00
Dmitry Novik
de432f9270 Fix possible NPE when thread_group is not set 2021-12-14 16:39:53 +03:00
cmsxbc
b30e250eed add max_concurrent_select_queries and max_concurrent_insert_queries 2021-12-14 07:37:38 +00:00
Raúl Marín
3fc4167c54 Rework how progress is reported in views 2021-12-09 17:08:29 +01:00
Dmitry Novik
8d8222acf2 Fix build with gcc 2021-12-07 13:11:31 +03:00
mergify[bot]
600dcb749a
Merge branch 'master' into memory-overcommit 2021-12-07 00:40:20 +00:00
Dmitry Novik
12101d82aa Fix overcommit ratio comparison and race condition 2021-12-06 21:34:52 +03:00
Raúl Marín
5662d0aa59 Use softer checks 2021-12-02 14:53:55 +01:00
Raúl Marín
c498b7ba59 Move limits check to ProcessList 2021-11-26 12:44:39 +01:00
Raúl Marín
f39648dafb Style 2021-11-23 09:23:22 +01:00
Raúl Marín
c6d3065885 Check max_execution_time in the pipeline and pulling executors 2021-11-23 09:23:22 +01:00
Kseniia Sumarokova
2313981fd7
Merge pull request #31260 from azat/external-cleanup
Cleanup extern ProfileEvents/CurrentMetrics and add a style check
2021-11-12 00:02:57 +03:00
Azat Khuzhin
baf14444e6 Cleanup ProfileEvents and CurrentMetrics 2021-11-10 21:15:27 +03:00
Dmitry Novik
9464dc2fd3 Add waiting timeout 2021-11-09 16:02:36 +03:00