Commit Graph

184 Commits

Author SHA1 Message Date
Vladimir C
963c0111bf
Merge pull request #39418 from vdimir/join_and_sets
Filter joined streams for `full_sorting_join` by each other before sorting
2022-09-02 13:57:06 +02:00
vdimir
c778bba13f
Create sets for joins: wip 2022-08-29 09:47:00 +00:00
vdimir
71708d595f
Create sets for joins: wip 2022-08-29 09:46:59 +00:00
vdimir
8f06430ebd
Create sets for joins: upd 2022-08-29 09:46:58 +00:00
vdimir
c5bc7b0a0c
Resize pipeline after full sort join 2022-08-29 09:46:56 +00:00
Frank Chen
a3b6ad2a65
Merge branch 'master' into tracing_context_propagation 2022-08-18 20:59:07 +08:00
Alexey Milovidov
1a8ddf2956 Addition to prev. revision 2022-08-14 09:35:22 +02:00
Alexey Milovidov
001aca3b47 ProfileEvents for incomplete data due to query complexity settings 2022-08-14 09:17:02 +02:00
Nikita Taranov
17956cb668
Extend protocol with query parameters (#39906) 2022-08-12 14:28:35 +02:00
vdimir
708747ca0b
Merge branch 'master' into refactor-prepared-sets 2022-08-08 14:27:18 +02:00
Nikolai Kochetov
60599197b2 Review fixes. 2022-08-04 15:23:10 +00:00
Frank Chen
40c6e4c0d6 Merge remote-tracking branch 'origin/master' into tracing_context_propagation 2022-08-02 10:02:09 +08:00
Nikolai Kochetov
59a11b32ad
Merge branch 'master' into use-dag-in-key-condition 2022-07-29 17:01:33 +02:00
Vladimir C
115506356c
Merge branch 'master' into refactor-prepared-sets 2022-07-27 19:57:23 +02:00
vdimir
d9928ac93d
Add methods to SubqueryForSet, do not use reference to SetPtr 2022-07-26 18:39:09 +00:00
vdimir
1e3fa2e01f
Refactor PreparedSets/SubqueryForSet 2022-07-26 18:39:02 +00:00
Nikolai Kochetov
908c1d8cba
Merge pull request #39601 from ClickHouse/fix-chain-add-sink
Fix Chain::addSink
2022-07-26 14:54:44 +02:00
Nikolai Kochetov
683a8866ef Fix Chain::addSink 2022-07-26 09:36:39 +00:00
Nikolai Kochetov
be9c7ed52c Add ReadFromMerge step. 2022-07-25 19:41:43 +00:00
Alexander Tokmakov
bed2206ae9
Merge pull request #39460 from ClickHouse/remove_some_dead_and_commented_code
Remove some dead and commented code
2022-07-22 13:24:34 +03:00
Alexander Tokmakov
a8da5d96fc remove some dead and commented code 2022-07-21 15:05:48 +02:00
Nikolai Kochetov
91043351aa Fixing build. 2022-07-20 20:30:16 +00:00
Amos Bird
982e1a73d3
Better 2022-07-12 22:21:46 +08:00
Amos Bird
d3709c6c26
Avoid redundant join block transformation. 2022-07-12 22:20:10 +08:00
Frank Chen
57a7e4a7c9 Remove old API reference 2022-07-07 17:42:35 +08:00
vdimir
7c586a9e7c
Minor updates for full sorting merge join 2022-07-06 14:28:05 +00:00
vdimir
881d352e05
upd full sorting join 2022-07-06 14:28:05 +00:00
vdimir
0b994bb258
fix build 2022-07-06 14:27:32 +00:00
vdimir
d184e184b4
full sort join: check key types, more tests 2022-07-06 14:26:19 +00:00
vdimir
4e88e8f5ec
full sort join: move block list to all join state 2022-07-06 14:26:17 +00:00
vdimir
1b429fc1af
wip: any left/right sorting join 2022-07-06 14:23:46 +00:00
vdimir
8dce97123c
wip: any inner full sorting join 2022-07-06 14:23:46 +00:00
vdimir
4a16195964
Calculate output header for full sorting merge join 2022-07-06 14:23:45 +00:00
vdimir
fa8eb35599
Pipeline for full sorting merge join 2022-07-06 14:23:44 +00:00
Maksim Kita
bdc21737d5 MergeTree merge disable batch optimization 2022-07-05 16:15:00 +02:00
Kseniia Sumarokova
cce381a3ae
Merge pull request #38307 from azat/fix-insert-profile-events
Fix INSERT into Distributed hung due to ProfileEvents
2022-06-23 10:09:45 +02:00
Nikita Taranov
41ba0118b5
Bring back #36396 (#38110)
* Revert "Revert "More parallel execution for queries with `FINAL` (#36396)""

This reverts commit 5bfb15262c.

* fix tests

* fix review suggestions

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-06-22 15:05:07 +02:00
Azat Khuzhin
9eeb856519 Fix INSERT into Distributed hung due to ProfileEvents
Right now RemoteInserter does not read ProfileEvents for INSERT; it
handles them only after sending the query or on finish.

But #37391 sends them for each INSERT block, and sometimes there may be
no ProfileEvents packet at all, since only non-empty blocks are sent.

This adds too much complexity, and ProfileEvents are useless for the
server anyway, so let's send them only if the query is initial (i.e.
sent by the user).

Note that it is okay to change the logic of sending ProfileEvents
without changing DBMS_TCP_PROTOCOL_VERSION, because there have been no
public releases that include the original patch yet.

Fixes: #37391
Refs: #35075
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-22 15:41:15 +03:00
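A minimal, self-contained sketch of the decision described in the commit above, assuming a hypothetical shouldSendProfileEvents() helper and a simplified ClientInfo (illustrative names only, not the actual ClickHouse types): ProfileEvents packets are forwarded only for the initial, user-issued query, never for secondary server-to-server INSERTs.

    #include <iostream>

    // Simplified stand-in for ClickHouse's distinction between a query issued
    // by a user (initial) and a query one server sends to another (secondary).
    enum class QueryKind { InitialQuery, SecondaryQuery };

    struct ClientInfo
    {
        QueryKind query_kind;
    };

    // Send ProfileEvents only for the initial query, so a server-to-server
    // INSERT (e.g. from a Distributed table) never has to handle an optional
    // ProfileEvents packet.
    bool shouldSendProfileEvents(const ClientInfo & info)
    {
        return info.query_kind == QueryKind::InitialQuery;
    }

    int main()
    {
        ClientInfo user_query{QueryKind::InitialQuery};
        ClientInfo distributed_insert{QueryKind::SecondaryQuery};

        std::cout << std::boolalpha
                  << shouldSendProfileEvents(user_query) << '\n'          // true
                  << shouldSendProfileEvents(distributed_insert) << '\n'; // false
    }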
Maksim Kita
cb018348cf
Merge pull request #38022 from kitaisreal/sorting-added-batch-queue-variants
Sorting added batch queue variants
2022-06-20 22:35:44 +02:00
Kseniia Sumarokova
a756b4be27
Merge pull request #37391 from azat/insert-profile-events-fix
Send profile events for INSERT queries (previously only SELECT was supported)
2022-06-20 12:16:29 +02:00
Maksim Kita
dbbf499005 Fixed unit tests 2022-06-18 18:20:01 +02:00
Azat Khuzhin
4baa7690ae Send profile events for INSERT queries (previously only SELECT was supported)
Reproducer:

    echo "1" | clickhouse-client --query="insert into function null('foo String') format TSV" --print-profile-events --profile-events-delay-ms=-1

However, clickhouse-local is different: it does send them periodically,
but only if the query runs long enough, e.g.:

    # yes | head -n100000 | clickhouse-local --query="insert into function null('foo String') format TSV" --print-profile-events --profile-events-delay-ms=-1
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] ContextLock: 10 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] DiskReadElapsedMicroseconds: 29 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] IOBufferAllocBytes: 200000 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] IOBufferAllocs: 1 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] InsertQuery: 1 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] InsertedBytes: 1000000 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] InsertedRows: 100000 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] MemoryTrackerUsage: 1521975 (gauge)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] OSCPUVirtualTimeMicroseconds: 102148 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] OSReadChars: 135700 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] OSWriteChars: 8 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] Query: 1 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] RWLockAcquiredReadLocks: 2 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] ReadBufferFromFileDescriptorRead: 5 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] ReadBufferFromFileDescriptorReadBytes: 134464 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] RealTimeMicroseconds: 293747 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] SoftPageFaults: 382 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] TableFunctionExecute: 1 (increment)
    [s1.ch] 2022.05.20 15:20:27 [ 0 ] UserTimeMicroseconds: 102148 (increment)

v2: Properly support ProfileEvents in INSERTs (with protocol change)
v3: Receive profile events on INSERT queries
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-16 11:59:01 +03:00
Alexander Tokmakov
5bfb15262c Revert "More parallel execution for queries with FINAL (#36396)"
This reverts commit c8afeafe0e.
2022-06-15 17:25:38 +03:00
Nikita Taranov
c8afeafe0e
More parallel execution for queries with FINAL (#36396) 2022-06-15 12:44:20 +02:00
Azat Khuzhin
7210be1534 Disable send_logs_level for INSERT into Distributed to avoid possible hung
In case of an INSERT into a Distributed table with send_logs_level != none,
it is possible to receive tons of Log packets, and without consuming them
properly the socket buffer fills up and eventually the query hangs.

This happens because the receiver will not read data until it has sent its
Log packets, but the sender does not read those Log packets, so the receiver
hangs, and hence the sender hangs too, because the receiver no longer
consumes Data packets.

In the initial version of this patch I tried to properly consume Log
packets, but it is not possible to ensure that all Log packets have been
consumed before writing Data blocks. In other words, with the current
protocol implementation Log packet consumption cannot be fixed properly
enough to avoid the deadlock, so send_logs_level has simply been disabled.

Note that for the user this does not differ from what ClickHouse did
before: it simply did not consume those packets, so the client never saw
those messages anyway.

<details>

The receiver:

    Poco::Net::SocketImpl::poll(Poco::Timespan const&, int)
    Poco::Net::SocketImpl::sendBytes(void const*, int, int)
    Poco::Net::StreamSocketImpl::sendBytes(void const*, int, int)
    DB::WriteBufferFromPocoSocket::nextImpl()
    DB::TCPHandler::sendLogData(DB::Block const&)
    DB::TCPHandler::sendLogs()
    DB::TCPHandler::readDataNext()
    DB::TCPHandler::processInsertQuery()

    State      Recv-Q  Send-Q          Local Address:Port         Peer Address:Port Process
    ESTAB      4331792 211637           127.0.0.1:9000            127.0.0.1:24446 users:(("clickhouse-serv",pid=46874,fd=3850))

The sender:

    Poco::Net::SocketImpl::poll(Poco::Timespan const&, int)
    Poco::Net::SocketImpl::sendBytes(void const*, int, int)
    Poco::Net::StreamSocketImpl::sendBytes(void const*, int, int)
    DB::WriteBufferFromPocoSocket::nextImpl()
    DB::WriteBuffer::write(char const*, unsigned long)
    DB::CompressedWriteBuffer::nextImpl()
    DB::WriteBuffer::write(char const*, unsigned long)
    DB::SerializationString::serializeBinaryBulk(DB::IColumn const&, DB::WriteBuffer&, unsigned long, unsigned long) const
    DB::NativeWriter::write(DB::Block const&)
    DB::Connection::sendData(DB::Block const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool)
    DB::RemoteInserter::write(DB::Block)
    DB::RemoteSink::consume(DB::Chunk)
    DB::SinkToStorage::onConsume(DB::Chunk)

    State      Recv-Q  Send-Q         Local Address:Port         Peer Address:Port Process
    ESTAB      67883   3008240           127.0.0.1:24446           127.0.0.1:9000  users:(("clickhouse-serv",pid=41610,fd=25))

</details>

v2: rebase to use clickhouse_client_timeout and add clickhouse_test_wait_queries
v3: use KILL QUERY
v4: adjust the test
v5: disable send_logs_level for INSERT into Distributed
v6: add no-backward-compatibility-check tag
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-13 13:44:33 +03:00
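A minimal, self-contained sketch of the workaround described in the commit above, using hypothetical InsertSettings / makeDistributedInsertSettings names (not the actual ClickHouse classes): the settings used for the server-to-server INSERT are copied from the user settings, but send_logs_level is forced to "none" so the remote shard never produces Log packets that the writing side would have to drain.

    #include <iostream>
    #include <string>

    // Tiny stand-in for the per-connection settings of a distributed INSERT.
    struct InsertSettings
    {
        std::string send_logs_level = "none";
        // ... other settings elided
    };

    // Keep everything the user asked for except log forwarding: the sender only
    // writes Data packets, so any Log traffic coming back can fill both socket
    // buffers and deadlock the INSERT.
    InsertSettings makeDistributedInsertSettings(InsertSettings user_settings)
    {
        user_settings.send_logs_level = "none";
        return user_settings;
    }

    int main()
    {
        InsertSettings user;
        user.send_logs_level = "trace";
        std::cout << makeDistributedInsertSettings(user).send_logs_level << '\n'; // prints "none"
    }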
Anton Popov
df6882d2b9
Revert "Fix errors of CheckTriviallyCopyableMove type" 2022-06-07 13:53:10 +02:00
Azat Khuzhin
078678237e Fix possible "No more packets are available" for distributed queries
CI found the following case:

<details>

    2022.05.25 22:36:06.778808 [ 3037 ] {77743723-1fcd-4b3d-babc-d0615e3ff40e} <Fatal> : Logical error: 'No more packets are available.'.
    2022.05.25 22:42:24.960075 [ 17397 ] {} <Fatal> BaseDaemon: ########################################
    2022.05.25 22:42:24.971173 [ 17397 ] {} <Fatal> BaseDaemon: (version 22.6.1.1, build id: 9A1F9489854CED36) (from thread 3037) (query_id: 77743723-1fcd-4b3d-babc-d0615e3ff40e) (query: SELECT * FROM
    2022.05.25 22:42:25.046871 [ 17397 ] {} <Fatal> BaseDaemon: 5. ./build_docker/../src/Common/Exception.cpp:47: DB::abortOnFailedAssertion()
    2022.05.25 22:42:25.181449 [ 17397 ] {} <Fatal> BaseDaemon: 6. ./build_docker/../src/Common/Exception.cpp:70: DB::Exception::Exception()
    2022.05.25 22:42:25.367710 [ 17397 ] {} <Fatal> BaseDaemon: 7. ./build_docker/../src/Client/MultiplexedConnections.cpp:0: DB::MultiplexedConnections::receivePacketUnlocked()
    2022.05.25 22:42:25.414201 [ 17397 ] {} <Fatal> BaseDaemon: 8. ./build_docker/../src/Client/MultiplexedConnections.cpp:0: DB::MultiplexedConnections::receivePacket()
    2022.05.25 22:42:25.493066 [ 17397 ] {} <Fatal> BaseDaemon: 9. ./build_docker/../src/QueryPipeline/RemoteQueryExecutor.cpp:279: DB::RemoteQueryExecutor::read()
    2022.05.25 22:42:25.612679 [ 17397 ] {} <Fatal> BaseDaemon: 10. ./build_docker/../src/Processors/Sources/RemoteSource.cpp:0: DB::RemoteSource::tryGenerate()

Here are additional logs for this query:
    $ pigz -cd clickhouse-server.stress.log.gz | fgrep -a 77743723-1fcd-4b3d-babc-d0615e3ff40e | fgrep -e Connection -e Distributed -e Fatal
    2022.05.25 22:36:04.698671 [ 6613 ] {77743723-1fcd-4b3d-babc-d0615e3ff40e} <Trace> Connection (127.0.0.2:9000): Connecting. Database: (not specified). User: default
    2022.05.25 22:36:04.722568 [ 3419 ] {77743723-1fcd-4b3d-babc-d0615e3ff40e} <Trace> Connection (127.0.0.2:9000): Connecting. Database: (not specified). User: default
    2022.05.25 22:36:05.014432 [ 6613 ] {77743723-1fcd-4b3d-babc-d0615e3ff40e} <Trace> Connection (127.0.0.2:9000): Connected to ClickHouse server version 22.6.1.
    2022.05.25 22:36:05.091397 [ 6613 ] {77743723-1fcd-4b3d-babc-d0615e3ff40e} <Debug> Connection (127.0.0.2:9000): Sent data for 2 scalars, total 2 rows in 0.000125814 sec., 15602 rows/sec., 68.00 B (517.81 KiB/sec.), compressed 0.4594594594594595 times to 148.00 B (1.10 MiB/sec.)
    2022.05.25 22:36:05.301301 [ 3419 ] {77743723-1fcd-4b3d-babc-d0615e3ff40e} <Trace> Connection (127.0.0.2:9000): Connected to ClickHouse server version 22.6.1.
    2022.05.25 22:36:05.343140 [ 3419 ] {77743723-1fcd-4b3d-babc-d0615e3ff40e} <Debug> Connection (127.0.0.2:9000): Sent data for 2 scalars, total 2 rows in 0.000116304 sec., 16889 rows/sec., 68.00 B (559.80 KiB/sec.), compressed 0.4594594594594595 times to 148.00 B (1.19 MiB/sec.)
    2022.05.25 22:36:06.682535 [ 6613 ] {77743723-1fcd-4b3d-babc-d0615e3ff40e} <Trace> StorageDistributed (remote): (127.0.0.2:9000) Cancelling query because enough data has been read
    2022.05.25 22:36:06.778808 [ 3037 ] {77743723-1fcd-4b3d-babc-d0615e3ff40e} <Fatal> : Logical error: 'No more packets are available.'.
    2022.05.25 22:36:06.789505 [ 3419 ] {77743723-1fcd-4b3d-babc-d0615e3ff40e} <Trace> StorageDistributed (remote): (127.0.0.2:9000) Cancelling query because enough data has been read
    2022.05.25 22:42:24.971173 [ 17397 ] {} <Fatal> BaseDaemon: (version 22.6.1.1, build id: 9A1F9489854CED36) (from thread 3037) (query_id: 77743723-1fcd-4b3d-babc-d0615e3ff40e) (query: SELECT * FROM

</details>

So the LOGICAL_ERROR occurred between cancelling different sources; I
believe this is because of the following race:

    T1:                                        T2:
    RemoteQueryExecutor::read()
    checks was_cancelled
                                               RemoteQueryExecutor::tryCancel()
                                               connections->cancel()
    calls connections->receivePacket()

Note that for this problem to occur, async_socket_for_remote/use_hedged_requests
should be disabled; the original settings were:
- --max_parallel_replicas=3
- --use_hedged_requests=false
- --allow_experimental_parallel_reading_from_replicas=3

CI: https://s3.amazonaws.com/clickhouse-test-reports/37469/41cb029ed23e77f3a108e07e6b1b1bcb03dc7fcf/stress_test__undefined__actions_/fatal_messages.txt
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-06-07 08:20:32 +03:00
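A minimal, self-contained model of the race above and of the kind of guard that avoids it (hypothetical Executor class, not the real RemoteQueryExecutor): the cancelled check and the packet read are covered by one lock, so tryCancel() can no longer run between the check and the receive.

    #include <deque>
    #include <iostream>
    #include <mutex>
    #include <optional>

    class Executor
    {
    public:
        // read() checks the cancelled flag and takes a packet inside a single
        // critical section; a concurrent tryCancel() either runs before it
        // (read returns nothing) or after it (the packet was already taken).
        std::optional<int> read()
        {
            std::lock_guard lock(mutex);
            if (cancelled || packets.empty())
                return std::nullopt;          // instead of "No more packets are available"
            int packet = packets.front();
            packets.pop_front();
            return packet;
        }

        void tryCancel()
        {
            std::lock_guard lock(mutex);
            cancelled = true;
            packets.clear();                  // models connections->cancel()
        }

        void push(int p)
        {
            std::lock_guard lock(mutex);
            packets.push_back(p);
        }

    private:
        std::mutex mutex;
        bool cancelled = false;
        std::deque<int> packets;
    };

    int main()
    {
        Executor ex;
        ex.push(42);
        std::cout << ex.read().value_or(-1) << '\n'; // 42
        ex.tryCancel();
        std::cout << ex.read().value_or(-1) << '\n'; // -1: cancelled, no exception
    }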
Robert Schulze
2d87af2a15
Merge pull request #37647 from DevTeamBK/Fix-all-CheckTriviallyCopyableMove-Errors
Fix errors of CheckTriviallyCopyableMove type
2022-06-05 19:58:47 +02:00
Nikolai Kochetov
00395e752e Cleanup 2022-06-02 16:59:14 +00:00
Nikolai Kochetov
506e102c17 Fix quotas. 2022-06-02 09:27:08 +00:00