Commit Graph

84 Commits

Author SHA1 Message Date
Aleksei Semiglazov
921518db0a CLICKHOUSE-606: query deduplication based on parts' UUID
* add the query data deduplication excluding duplicated parts in MergeTree family engines.

query deduplication is based on parts' UUID which should be enabled first with merge_tree setting
assign_part_uuids=1

allow_experimental_query_deduplication setting is to enable part deduplication, default ot false.

data part UUID is a mechanism of giving a data part a unique identifier.
Having UUID and deduplication mechanism provides a potential of moving parts
between shards preserving data consistency on a read path:
duplicated UUIDs will cause root executor to retry query against on of the replica explicitly
asking to exclude encountered duplicated fingerprints during a distributed query execution.

NOTE: this implementation don't provide any knobs to lock part and hence its UUID. Any mutations/merge will
update part's UUID.

* add _part_uuid virtual column, allowing to use UUIDs in predicates.

Signed-off-by: Aleksei Semiglazov <asemiglazov@cloudflare.com>

address comments
2021-02-02 16:53:39 +00:00
Alexey Milovidov
01a703ae50 Update test 2021-01-31 10:48:18 +03:00
Alexey Milovidov
bfcb12c2e9 Add test-connect tool 2021-01-29 09:13:43 +03:00
Alexey Milovidov
9477f8a8b1 Revert "Remove old non-automated test"
This reverts commit 217d05443a.
2021-01-29 08:27:58 +03:00
Azat Khuzhin
e97c01c3ea
Fix UAF of the CompressedWriteBuffer after Connection::disconnect (#19599)
ASan report [1]:

<details>

Stacktrace with stripped shared_ptr and vector stuff:

```
==86==ERROR: AddressSanitizer: heap-use-after-free on address 0x60d0002b4888 at pc 0x00000a997056 bp 0x7f9e2ad55c00 sp 0x7f9e2ad55bf8
READ of size 8 at 0x60d0002b4888 thread T3 (TCPHandler)
    0 0xa997055 in DB::BufferBase::Buffer::end() const obj-x86_64-linux-gnu/../src/IO/BufferBase.h:40:46
    1 0xa997055 in DB::BufferBase::available() const obj-x86_64-linux-gnu/../src/IO/BufferBase.h:81:68
    2 0xa997055 in DB::BufferBase::hasPendingData() const obj-x86_64-linux-gnu/../src/IO/BufferBase.h:94:56
    3 0xa997055 in DB::WriteBuffer::nextIfAtEnd() obj-x86_64-linux-gnu/../src/IO/WriteBuffer.h:67:14
    4 0xa997055 in DB::WriteBuffer::write(char const*, unsigned long) obj-x86_64-linux-gnu/../src/IO/WriteBuffer.h:78:13
    5 0x1dcff45e in DB::CompressedWriteBuffer::nextImpl() obj-x86_64-linux-gnu/../src/Compression/CompressedWriteBuffer.cpp:37:9
    6 0x1dcffb8a in DB::WriteBuffer::next() obj-x86_64-linux-gnu/../src/IO/WriteBuffer.h:46:13
    7 0x1dcffb8a in DB::CompressedWriteBuffer::~CompressedWriteBuffer() obj-x86_64-linux-gnu/../src/Compression/CompressedWriteBuffer.cpp:54:9
    11 0xab600cf in DB::Connection::~Connection() obj-x86_64-linux-gnu/../src/Client/Connection.h:114:28
    15 0xac4adb9 in PoolBase<DB::Connection>::PooledObject::~PooledObject() obj-x86_64-linux-gnu/../src/Common/PoolBase.h:35:12
    30 0xac485e4 in PoolBase<DB::Connection>::~PoolBase() obj-x86_64-linux-gnu/../src/Common/PoolBase.h:105:26
    44 0xad2722f in DB::Cluster::ShardInfo::~ShardInfo() obj-x86_64-linux-gnu/../src/Interpreters/Cluster.h:167:12
    52 0xad393b0 in DB::Cluster::~Cluster() obj-x86_64-linux-gnu/../src/Interpreters/Cluster.h:30:7
    56 0x1f99f269 in DB::StorageDistributed::~StorageDistributed() obj-x86_64-linux-gnu/../src/Storages/StorageDistributed.cpp:338:41
    69 0x1e231846 in DB::Context::~Context() obj-x86_64-linux-gnu/../src/Interpreters/Context.cpp:501:19
    71 0x2073ccd3 in DB::TCPHandler::runImpl() obj-x86_64-linux-gnu/../src/Server/TCPHandler.cpp:416:23
    72 0x2075db1c in DB::TCPHandler::run() obj-x86_64-linux-gnu/../src/Server/TCPHandler.cpp:1417:9

0x60d0002b4888 is located 56 bytes inside of 136-byte region [0x60d0002b4850,0x60d0002b48d8)
freed by thread T3 (TCPHandler) here:
    0 0xa93d682 in operator delete(void*, unsigned long) (/workspace/clickhouse+0xa93d682)
    1 0x2059d592 in std::__1::__shared_weak_count::__release_shared() obj-x86_64-linux-gnu/../contrib/libcxx/include/memory:2518:9
    2 0x2059d592 in std::__1::shared_ptr<DB::WriteBuffer>::~shared_ptr() obj-x86_64-linux-gnu/../contrib/libcxx/include/memory:3212:19
    3 0x2059d592 in std::__1::shared_ptr<DB::WriteBuffer>::operator=(std::__1::shared_ptr<DB::WriteBuffer>&&) obj-x86_64-linux-gnu/../contrib/libcxx/include/memory:3243:5
    4 0x2059d592 in DB::Connection::disconnect() obj-x86_64-linux-gnu/../src/Client/Connection.cpp:143:9
    5 0x205d6e6d in DB::MultiplexedConnections::disconnect() obj-x86_64-linux-gnu/../src/Client/MultiplexedConnections.cpp:159:25
    6 0x1de6ec19 in DB::RemoteQueryExecutor::~RemoteQueryExecutor() obj-x86_64-linux-gnu/../src/DataStreams/RemoteQueryExecutor.cpp:86:34
    10 0x20bf0e2c in DB::RemoteSource::~RemoteSource() obj-x86_64-linux-gnu/../src/Processors/Sources/RemoteSource.cpp:22:29
    21 0x20869680 in DB::Pipe::~Pipe() obj-x86_64-linux-gnu/../src/Processors/Pipe.h:25:7
    22 0x20869680 in DB::QueryPipeline::reset() obj-x86_64-linux-gnu/../src/Processors/QueryPipeline.cpp:79:1
    23 0x1de2d89d in DB::BlockIO::reset() obj-x86_64-linux-gnu/../src/DataStreams/BlockIO.cpp:45:14
    24 0x1de2d9f7 in DB::BlockIO::operator=(DB::BlockIO&&) obj-x86_64-linux-gnu/../src/DataStreams/BlockIO.cpp:57:5
    25 0x20762731 in DB::QueryState::operator=(DB::QueryState&&) obj-x86_64-linux-gnu/../src/Server/TCPHandler.h:31:8
    26 0x2073c70c in DB::QueryState::reset() obj-x86_64-linux-gnu/../src/Server/TCPHandler.h:85:15
    27 0x2073c70c in DB::TCPHandler::runImpl() obj-x86_64-linux-gnu/../src/Server/TCPHandler.cpp:399:19
    28 0x2075db1c in DB::TCPHandler::run() obj-x86_64-linux-gnu/../src/Server/TCPHandler.cpp:1417:9
    29 0x266eeebe in Poco::Net::TCPServerConnection::start() obj-x86_64-linux-gnu/../contrib/poco/Net/src/TCPServerConnection.cpp:43:3
    30 0x266ef9db in Poco::Net::TCPServerDispatcher::run() obj-x86_64-linux-gnu/../contrib/poco/Net/src/TCPServerDispatcher.cpp:112:19
    31 0x269b1204 in Poco::PooledThread::run() obj-x86_64-linux-gnu/../contrib/poco/Foundation/src/ThreadPool.cpp:199:14
    32 0x269ab756 in Poco::ThreadImpl::runnableEntry(void*) obj-x86_64-linux-gnu/../contrib/poco/Foundation/src/Thread_POSIX.cpp:345:27
    33 0x7f9f06ea8608 in start_thread /build/glibc-ZN95T4/glibc-2.31/nptl/pthread_create.c:477:8

previously allocated by thread T3 (TCPHandler) here:
    0 0xa93ca1d in operator new(unsigned long) (/workspace/clickhouse+0xa93ca1d)
    1 0x2059b8cd in void* std::__1::__libcpp_operator_new<unsigned long>(unsigned long) obj-x86_64-linux-gnu/../contrib/libcxx/include/new:235:10
    8 0x2059b8cd in DB::Connection::connect(DB::ConnectionTimeouts const&) obj-x86_64-linux-gnu/../src/Client/Connection.cpp:112:15
    9 0x205a0a1d in DB::Connection::getServerRevision(DB::ConnectionTimeouts const&) obj-x86_64-linux-gnu/../src/Client/Connection.cpp:289:9
    10 0x205bdafa in DB::ConnectionPoolWithFailover::tryGetEntry(DB::IConnectionPool&, DB::ConnectionTimeouts const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, DB::Settings const*, DB::QualifiedTableName const*) obj-x86_64-linux-gnu/../src/Client/ConnectionPoolWithFailover.cpp:251:45
    11 0x205c06cf in DB::ConnectionPoolWithFailover::getManyChecked()::$_8::operator()() const obj-x86_64-linux-gnu/../src/Client/ConnectionPoolWithFailover.cpp:169:16
    20 0x205bd61f in DB::ConnectionPoolWithFailover::getManyChecked(DB::ConnectionTimeouts const&, DB::Settings const*, DB::PoolMode, DB::QualifiedTableName const&) obj-x86_64-linux-gnu/../src/Client/ConnectionPoolWithFailover.cpp:172:12
    28 0x1de6f425 in DB::RemoteQueryExecutor::sendQuery() obj-x86_64-linux-gnu/../src/DataStreams/RemoteQueryExecutor.cpp:143:31
    29 0x1fdd1410 in DB::getStructureOfRemoteTableInShard(DB::Cluster const&, DB::Cluster::ShardInfo const&, DB::StorageID const&, DB::Context const&, std::__1::shared_ptr<DB::IAST> const&) obj-x86_64-linux-gnu/../src/Storages/getStructureOfRemoteTable.cpp:78:12
    30 0x1fdd74a8 in DB::getStructureOfRemoteTable(DB::Cluster const&, DB::StorageID const&, DB::Context const&, std::__1::shared_ptr<DB::IAST> const&) obj-x86_64-linux-gnu/../src/Storages/getStructureOfRemoteTable.cpp:131:32
    31 0x1d42cd79 in DB::TableFunctionRemote::getActualTableStructure(DB::Context const&) const obj-x86_64-linux-gnu/../src/TableFunctions/TableFunctionRemote.cpp:261:12
    32 0x1d42b4e2 in DB::TableFunctionRemote::executeImpl(std::__1::shared_ptr<DB::IAST> const&, DB::Context const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::ColumnsDescription) const obj-x86_64-linux-gnu/../src/TableFunctions/TableFunctionRemote.cpp:222:26
    33 0x1e2c8b15 in DB::ITableFunction::execute(std::__1::shared_ptr<DB::IAST> const&, DB::Context const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::ColumnsDescription) const obj-x86_64-linux-gnu/../src/TableFunctions/ITableFunction.cpp:24:16
    34 0x1e2417ff in DB::Context::executeTableFunction(std::__1::shared_ptr<DB::IAST> const&) obj-x86_64-linux-gnu/../src/Interpreters/Context.cpp:1007:35
    35 0x1f15bd67 in DB::JoinedTables::getLeftTableStorage() obj-x86_64-linux-gnu/../src/Interpreters/JoinedTables.cpp:162:42
    36 0x1ebf393f in DB::InterpreterSelectQuery::InterpreterSelectQuery() obj-x86_64-linux-gnu/../src/Interpreters/InterpreterSelectQuery.cpp:306:33
    42 0x1eb3bf90 in DB::InterpreterFactory::get(std::__1::shared_ptr<DB::IAST>&, DB::Context&, DB::SelectQueryOptions const&) obj-x86_64-linux-gnu/../src/Interpreters/InterpreterFactory.cpp:110:16
    43 0x1f4d9dee in DB::executeQueryImpl(char const*, char const*, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool, DB::ReadBuffer*) obj-x86_64-linux-gnu/../src/Interpreters/executeQuery.cpp:520:28
    44 0x1f4d7067 in DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool) obj-x86_64-linux-gnu/../src/Interpreters/executeQuery.cpp:900:30
    45 0x2073b0bc in DB::TCPHandler::runImpl() obj-x86_64-linux-gnu/../src/Server/TCPHandler.cpp:260:24
    46 0x2075db1c in DB::TCPHandler::run() obj-x86_64-linux-gnu/../src/Server/TCPHandler.cpp:1417:9
```

</details>

  [1]: https://clickhouse-test-reports.s3.yandex.net/19583/9f8ab99dd12a6f60a20f5f84ab2f5d53874c6ae7/fuzzer_asan/report.htmlfail1
2021-01-26 14:27:58 +03:00
Azat Khuzhin
109dbe5df4 Check the stream before sending while hanlding async INSERTs into Distributed
It is possible to get corruption (even though it is very unlikely, and
initially it wasn't corruption) just before the data block goes in the
file on disk, and in case of batching, it will break the packets, since
it will write the packet type but will not write any data after.
2021-01-22 21:29:58 +03:00
Azat Khuzhin
e4350e078c Add ability to distinguish remote exceptions from local 2021-01-20 01:10:17 +03:00
Azat Khuzhin
cf085b0687 Split RemoteQueryExecutorReadContext into module part 2021-01-16 01:57:36 +03:00
Nikolai Kochetov
af7f5c9518
Merge pull request #17868 from ClickHouse/async-read-from-socket
Async read from socket
2020-12-23 12:20:42 +03:00
Nikolai Kochetov
01c8d5dccc Use replace fiber to callback in ReadBufferFromPocoSocket 2020-12-18 17:24:44 +03:00
nikitamikhaylov
18d52dbc63 better 2020-12-16 14:55:33 +03:00
nikitamikhaylov
4ccdb3ca20 done 2020-12-16 14:55:33 +03:00
Nikolai Kochetov
4c290996dc Fixing tests. 2020-12-15 21:22:20 +03:00
Nikolai Kochetov
3d6ace5890 Fixing tests. 2020-12-15 21:20:14 +03:00
Nikolai Kochetov
00de8abb0a Fixing tests. 2020-12-15 00:07:10 +03:00
Nikolai Kochetov
8de5cd5bc7 Merge branch 'master' into async-read-from-socket 2020-12-14 17:45:38 +03:00
Nikolai Kochetov
75ac87c241 Fixing build. 2020-12-14 17:42:12 +03:00
Nikolai Kochetov
e295dfe6e3 Use ucontext for asan 2020-12-14 17:42:08 +03:00
Azat Khuzhin
5b3ab48861 More forward declaration for generic headers
The following headers are pretty generic, so use forward declaration as
much as possible:
- Context.h
- Settings.h
- ConnectionTimeouts.h
(Also this shows that some missing some includes -- this has been fixed)

And split ConnectionTimeouts.h into ConnectionTimeoutsContext.h (since
module part cannot be added for it, due to recursive build dependencies
that will be introduced)

Also remove Settings from the RemoteBlockInputStream/RemoteQueryExecutor
and just pass the context, since settings was passed only in speicifc
places, that can allow making a copy of Context (i.e. Copier).

Approx results (How much units will be recompiled after changing file X?):

- ConnectionTimeouts.h
  - mainline: 100

- Context.h:
  - mainline: ~800
  - patched:  415

- Settings.h:
  - mainline: 900-1K
  - patched:  440 (most of them because of the Context.h)
2020-12-12 17:43:10 +03:00
Nikolai Kochetov
156f44808f Fixing crash. 2020-12-09 17:11:45 +03:00
Nikolai Kochetov
082a496364 Add async read to RemoteQueryExecutor 2020-12-03 15:21:10 +03:00
Nikolai Kochetov
e3946bc2b5 Add async read to RemoteQueryExecutor. 2020-12-02 20:02:14 +03:00
Pavel Kruglov
5b94dd2f74 Add eof check in receiveHello 2020-11-24 18:15:13 +03:00
Alexey Milovidov
24f4fa6edf Follow Arcadia ya.make rules 2020-11-17 00:16:50 +03:00
Alexey Milovidov
3df04ce0c2 Follow Arcadia ya.make rules 2020-11-16 21:24:58 +03:00
filimonov
7f7f66a1f1
add comment & restart CI... 2020-11-13 17:45:58 +01:00
filimonov
be16b4ef77
Update Connection.cpp 2020-11-13 12:04:56 +01:00
filimonov
178d8e9b75
Update Connection.cpp 2020-11-12 22:07:19 +01:00
filimonov
8eff47420b
Update Connection.cpp 2020-11-12 18:18:34 +01:00
Alexander Tokmakov
a06be511df pcg serialization 2020-11-09 16:07:38 +03:00
Alexey Milovidov
fd84d16387 Fix "server failed to start" error 2020-11-07 03:14:53 +03:00
Maxim Akhmedov
3627fabfb9 Remove -g0 form Arcadia build settings. 2020-10-29 17:37:23 +03:00
Amos Bird
d2dcfc3f0d
Refactor processors. 2020-10-12 17:30:05 +08:00
Alexander Kuzmenkov
dde4cf70e1 Merge remote-tracking branch 'origin/master' into HEAD 2020-09-22 14:03:59 +03:00
alexey-milovidov
9b6c62e82b
Merge pull request #14867 from amosbird/lbo
Explicit define what first replica is.
2020-09-17 19:37:15 +03:00
Alexander Kuzmenkov
fb64cf210a straighten the protocol version 2020-09-17 17:37:29 +03:00
Amos Bird
38d53c38f6
Explicit define what first replica is. 2020-09-16 17:54:41 +08:00
Azat Khuzhin
785d1b2a75 OpenSSLHelpers cleanup
Add few more specializations for encodeSHA256():
- std::string encodeSHA256(const std::string_view &);
- std::string encodeSHA256(const void *, size_t);
- void encodeSHA256(const void *, size_t, unsigned char *);
2020-09-15 01:36:28 +03:00
Azat Khuzhin
0159c74f21 Secure inter-cluster query execution (with initial_user as current query user) [v3]
Add inter-server cluster secret, it is used for Distributed queries
inside cluster, you can configure in the configuration file:

  <remote_servers>
      <logs>
          <shard>
              <secret>foobar</secret> <!-- empty -- works as before -->
              ...
          </shard>
      </logs>
  </remote_servers>

And this will allow clickhouse to make sure that the query was not
faked, and was issued from the node that knows the secret. And since
trust appeared it can use initial_user for query execution, this will
apply correct *_for_user (since with inter-server secret enabled, the
query will be executed from the same user on the shards as on initator,
unlike "default" user w/o it).

v2: Change user to the initial_user for Distributed queries if secret match
v3: Add Protocol::Cluster package
v4: Drop Protocol::Cluster and use plain Protocol::Hello + user marker
v5: Do not use user from Hello for cluster-secure (superfluous)
2020-09-15 01:36:28 +03:00
Alexey Milovidov
e3924b8057 Fix "Arcadia" 2020-09-08 01:14:13 +03:00
alesapin
10c7a6c45e
Add ability to specify Default codec for columns (#14049)
* Add ability to specify DefaultCompression codec which correspond to settings specified in config.xml

* Fix style

* Rename DefaultCompression to simple Default

* Fix compression codec

* Better codec description representation

* Less strange code and one method

* Fix delta
2020-08-28 20:40:45 +03:00
Vasily Nemkov
f94f786cc3 First attempt to fix data race in ConnectionPoolWithFailover::getStatus() 2020-08-20 23:25:38 +03:00
Nikolai Kochetov
9b67cd9faf Merge branch 'master' into refactor-pipes-3 2020-08-10 10:50:17 +03:00
Alexey Milovidov
edd89a8610 Fix half of typos 2020-08-08 03:47:03 +03:00
Nikolai Kochetov
20e63d2271 Refactor Pipe [part 6] 2020-08-06 15:24:05 +03:00
Alexey Milovidov
6f690b7c0d Normalize ya.make files, fix "Arcadia" build 2020-08-02 16:57:38 +03:00
Vitaly Baranov
56665a15f7 Rework and rename the template class SettingsCollection => BaseSettings. 2020-07-31 20:54:18 +03:00
Nikita Mikhaylov
4d49d2c671 another removes 2020-07-30 13:31:14 +03:00
Alexander Kuzmenkov
ac436c79eb Merge remote-tracking branch 'origin/master' into HEAD 2020-07-07 15:42:11 +03:00
Alexander Kuzmenkov
5c417f45b8 streaming wip 2020-06-30 12:25:23 +03:00