When Buffer() is under preassure, acquiring per-layer lock may take
significant time. And so the following query may take significant amount of time:
SELECT total_bytes, total_rows FROM system.tables WHERE engine='Buffer'
Add 3 new engine arguments:
- flush_time
- flush_rows
- flush_bytes
That will be checked only for background flush, this maybe useful if
INSERT latency is "crucial".
It uses very fast CLOCK_MONOTONIC_COARSE, so this should not be a
problem.
Also note that there is no sense in using microseconds/nanoseconds since
accuracy of CLOCK_MONOTONIC_COARSE usually milliseconds.
* master: (759 commits)
Suppress UBSan report in Decimal comparison
Suppress UBSan report in Decimal comparison
Fix UBSan report in arrayDifference
Update README.md
Non significant change in AggregationCommon
Print stack trace on SIGTRAP
Fix dependent test
Fix tests for better parallel run
Add test for already working code
Revert "Fix access control manager destruction order"
Update index.md
Update index.md
Update index.md
Bit more complicated example for isIPv4String - ru
Bit more complicated example for isIPv4String
cleanup
Replace database with ordinary
Added comments
Split tests to make them stable
Fixes
...
# Conflicts:
# src/Storages/MergeTree/MergeTreeRangeReader.cpp
If you push data via Buffer engine then all your queries will be done
from one user, however this is not always desired behavior, since this
will not allow to limit queries with max_concurrent_queries_for_user and
similar.
Otherwise you can see something like this for the following query:
```sql
WITH
arrayMap(x -> demangle(addressToSymbol(x)), s.trace) AS trace_array,
arrayStringConcat(trace_array, '\n') AS trace_string
SELECT
p.thread_id,
p.query_id,
p.query,
trace_string
FROM
(
SELECT
query_id,
query,
arrayJoin(thread_ids) AS thread_id
FROM system.processes
) AS p
INNER JOIN system.stack_trace AS s ON p.thread_id = s.thread_id
ORDER BY p.query_id ASC
SETTINGS enable_global_with_statement = 0, allow_introspection_functions = 1
FORMAT PrettyCompactNoEscapes
```
Lots of the following:
```sql
INSERT INTO buffer (...) VALUES
__lll_lock_wait
pthread_mutex_lock
std::__1::mutex::lock()
DB::StorageBuffer::reschedule()
DB::BufferBlockOutputStream::write(DB::Block const&)
```
That will wait one of this:
```
INSERT INTO buffer (...) VALUES
...
DB::PushingToViewsBlockOutputStream::write(DB::Block const&)
DB::AddingDefaultBlockOutputStream::write(DB::Block const&)
DB::SquashingBlockOutputStream::finalize()
DB::SquashingBlockOutputStream::writeSuffix()
DB::PushingToViewsBlockOutputStream::writeSuffix()
DB::StorageBuffer::writeBlockToDestination(DB::Block const&, std::__1::shared_ptr<DB::IStorage>)
DB::StorageBuffer::flushBuffer(DB::StorageBuffer::Buffer&, bool, bool, bool)
```
P.S. we cannot simply unlock the buffer during flushing, see comments in
the code
<details>
Stacktrace:
```
(gdb) bt
0 DB::appendBlock (from=..., to=...) at ../src/Storages/StorageBuffer.cpp:411
1 DB::BufferBlockOutputStream::insertIntoBuffer (this=<optimized out>, block=..., buffer=...) at ../src/Storages/StorageBuffer.cpp:541
2 0x000000000f2e9d5f in DB::BufferBlockOutputStream::write (this=<optimized out>, block=...) at ../src/Storages/StorageBuffer.cpp:508
3 0x000000000ec426c4 in DB::PushingToViewsBlockOutputStream::write (this=0x7f74660faa18, block=...) at ../src/DataStreams/PushingToViewsBlockOutputStream.cpp:160
4 0x000000000ec49633 in DB::AddingDefaultBlockOutputStream::write (this=0x7f74660f1b18, block=...) at ../src/DataStreams/AddingDefaultBlockOutputStream.cpp:10
5 0x000000000ec483ac in DB::SquashingBlockOutputStream::finalize (this=0x7f74660f1d18) at ../src/DataStreams/SquashingBlockOutputStream.cpp:30
6 0x000000000ec48429 in DB::SquashingBlockOutputStream::writeSuffix (this=0x7f74660f1d18) at ../src/DataStreams/SquashingBlockOutputStream.cpp:50
7 0x000000000ec43f8f in DB::PushingToViewsBlockOutputStream::writeSuffix (this=0x7f74660f8258) at ../src/DataStreams/PushingToViewsBlockOutputStream.cpp:280
8 0x000000000ec43f8f in DB::PushingToViewsBlockOutputStream::writeSuffix (this=0x7f74b7ddea18) at ../src/DataStreams/PushingToViewsBlockOutputStream.cpp:280
9 0x000000000f2e6748 in DB::StorageBuffer::writeBlockToDestination (this=<optimized out>, block=..., table=...) at ../src/Storages/StorageBuffer.cpp:820
10 0x000000000f2ea00b in DB::BufferBlockOutputStream::write (this=0x7f7574e11748, block=...) at ../src/Storages/StorageBuffer.cpp:469
11 0x000000000ec426c4 in DB::PushingToViewsBlockOutputStream::write (this=0x7f7574ed3658, block=...) at ../src/DataStreams/PushingToViewsBlockOutputStream.cpp:160
12 0x000000000ec49633 in DB::AddingDefaultBlockOutputStream::write (this=0x7f7574e84518, block=...) at ../src/DataStreams/AddingDefaultBlockOutputStream.cpp:10
13 0x000000000ec482f4 in DB::SquashingBlockOutputStream::write (this=0x7f7574e84718, block=...) at ../src/DataStreams/SquashingBlockOutputStream.cpp:17
14 0x000000000ebe8bce in DB::CountingBlockOutputStream::write (this=0x7f7574ed3720, block=...) at ../src/DataStreams/CountingBlockOutputStream.cpp:17
15 0x000000000f68e834 in DB::TCPHandler::receiveData (this=<optimized out>, scalar=<optimized out>) at ../src/Server/TCPHandler.cpp:1168
16 0x000000000f68737c in DB::TCPHandler::receivePacket (this=0x7f7574f17000) at ../src/Server/TCPHandler.cpp:918
17 0x000000000f688d2f in DB::TCPHandler::readDataNext (this=0x7f7574f17000, poll_interval=@0x7f6f1dff1f78: 10000000, receive_timeout=@0x7f6f1dff1f68: 300) at ../src/Server/TCPHandler.cpp:460
18 0x000000000f6878be in DB::TCPHandler::readData (this=0x7f7574f17000, connection_settings=...) at ../src/Server/TCPHandler.cpp:490
19 DB::TCPHandler::processInsertQuery (this=0x7f7574f17000, connection_settings=...) at ../src/Server/TCPHandler.cpp:519
20 0x000000000f680ab9 in DB::TCPHandler::runImpl (this=0x7f7574f17000) at ../src/Server/TCPHandler.cpp:268
21 0x000000000f68f297 in DB::TCPHandler::run (this=0x7f7574f17000) at ../src/Server/TCPHandler.cpp:1414
22 0x0000000011fb81cf in Poco::Net::TCPServerConnection::start (this=0x0) at ../contrib/poco/Net/src/TCPServerConnection.cpp:43
23 0x0000000011fb9be1 in Poco::Net::TCPServerDispatcher::run (this=0x7f752ab5fd00) at ../contrib/poco/Net/src/TCPServerDispatcher.cpp:112
24 0x00000000120e71c9 in Poco::PooledThread::run (this=0x7f747d3a4580) at ../contrib/poco/Foundation/src/ThreadPool.cpp:199
25 0x00000000120e315a in Poco::ThreadImpl::runnableEntry (pThread=<optimized out>) at ../contrib/poco/Foundation/src/Thread_POSIX.cpp:345
26 0x00007f760620aea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
27 0x00007f760613aeaf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) p to.data.__end_-to.data.__begin_
$17 = 10
(gdb) p to.data.__begin_[9].column.px
$19 = (const DB::IColumn *) 0x7f7328392720
(gdb) p to.data.__begin_[8].column.px
$20 = (const DB::IColumn *) 0x0
(gdb) p to.data.__begin_[7].column.px
$21 = (const DB::IColumn *) 0x7f746f33d360
```
Line numbers matched with this version -
f0e7cb16a7/src/Storages/StorageBuffer.cpp (L411)
</details>
Extended OPTIMIZE ... DEDUPLICATE syntax to allow explicit (or implicit with asterisk/column transformers) list of columns to check for duplicates on.
Following syntax variants are now supported:
OPTIMIZE TABLE table DEDUPLICATE; -- the old one
OPTIMIZE TABLE table DEDUPLICATE BY *;
OPTIMIZE TABLE table DEDUPLICATE BY * EXCEPT colX;
OPTIMIZE TABLE table DEDUPLICATE BY * EXCEPT (colX, colY);
OPTIMIZE TABLE table DEDUPLICATE BY col1,col2,col3;
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex');
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex') EXCEPT colX;
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex') EXCEPT (colX, colY);
Note that * behaves just like in SELECT: MATERIALIZED, and ALIAS columns are not used for expansion.
Also, it is an error to specify empty list of columns, or write an expression that results in an empty list of columns, or deduplicate by an ALIAS column.
Column transformers other than EXCEPT are not supported.
Buffer engine is usually used on INSERTs, but right now there is no way
to track number of INSERTed rows per-table, since only summary metrics
exists:
- StorageBufferRows
- StorageBufferBytes
But it can be pretty useful to track INSERTed rows rate (and it can be
exposed via http_handlers for i.e. prometheus)