Aleksei Semiglazov
921518db0a
CLICKHOUSE-606: query deduplication based on parts' UUID
...
* add the query data deduplication excluding duplicated parts in MergeTree family engines.
query deduplication is based on parts' UUID which should be enabled first with merge_tree setting
assign_part_uuids=1
allow_experimental_query_deduplication setting is to enable part deduplication, default ot false.
data part UUID is a mechanism of giving a data part a unique identifier.
Having UUID and deduplication mechanism provides a potential of moving parts
between shards preserving data consistency on a read path:
duplicated UUIDs will cause root executor to retry query against on of the replica explicitly
asking to exclude encountered duplicated fingerprints during a distributed query execution.
NOTE: this implementation don't provide any knobs to lock part and hence its UUID. Any mutations/merge will
update part's UUID.
* add _part_uuid virtual column, allowing to use UUIDs in predicates.
Signed-off-by: Aleksei Semiglazov <asemiglazov@cloudflare.com>
address comments
2021-02-02 16:53:39 +00:00
Alexey Milovidov
01a703ae50
Update test
2021-01-31 10:48:18 +03:00
Alexey Milovidov
bfcb12c2e9
Add test-connect tool
2021-01-29 09:13:43 +03:00
Alexey Milovidov
9477f8a8b1
Revert "Remove old non-automated test"
...
This reverts commit 217d05443a
.
2021-01-29 08:27:58 +03:00
Azat Khuzhin
e97c01c3ea
Fix UAF of the CompressedWriteBuffer after Connection::disconnect ( #19599 )
...
ASan report [1]:
<details>
Stacktrace with stripped shared_ptr and vector stuff:
```
==86==ERROR: AddressSanitizer: heap-use-after-free on address 0x60d0002b4888 at pc 0x00000a997056 bp 0x7f9e2ad55c00 sp 0x7f9e2ad55bf8
READ of size 8 at 0x60d0002b4888 thread T3 (TCPHandler)
0 0xa997055 in DB::BufferBase::Buffer::end() const obj-x86_64-linux-gnu/../src/IO/BufferBase.h:40:46
1 0xa997055 in DB::BufferBase::available() const obj-x86_64-linux-gnu/../src/IO/BufferBase.h:81:68
2 0xa997055 in DB::BufferBase::hasPendingData() const obj-x86_64-linux-gnu/../src/IO/BufferBase.h:94:56
3 0xa997055 in DB::WriteBuffer::nextIfAtEnd() obj-x86_64-linux-gnu/../src/IO/WriteBuffer.h:67:14
4 0xa997055 in DB::WriteBuffer::write(char const*, unsigned long) obj-x86_64-linux-gnu/../src/IO/WriteBuffer.h:78:13
5 0x1dcff45e in DB::CompressedWriteBuffer::nextImpl() obj-x86_64-linux-gnu/../src/Compression/CompressedWriteBuffer.cpp:37:9
6 0x1dcffb8a in DB::WriteBuffer::next() obj-x86_64-linux-gnu/../src/IO/WriteBuffer.h:46:13
7 0x1dcffb8a in DB::CompressedWriteBuffer::~CompressedWriteBuffer() obj-x86_64-linux-gnu/../src/Compression/CompressedWriteBuffer.cpp:54:9
11 0xab600cf in DB::Connection::~Connection() obj-x86_64-linux-gnu/../src/Client/Connection.h:114:28
15 0xac4adb9 in PoolBase<DB::Connection>::PooledObject::~PooledObject() obj-x86_64-linux-gnu/../src/Common/PoolBase.h:35:12
30 0xac485e4 in PoolBase<DB::Connection>::~PoolBase() obj-x86_64-linux-gnu/../src/Common/PoolBase.h:105:26
44 0xad2722f in DB::Cluster::ShardInfo::~ShardInfo() obj-x86_64-linux-gnu/../src/Interpreters/Cluster.h:167:12
52 0xad393b0 in DB::Cluster::~Cluster() obj-x86_64-linux-gnu/../src/Interpreters/Cluster.h:30:7
56 0x1f99f269 in DB::StorageDistributed::~StorageDistributed() obj-x86_64-linux-gnu/../src/Storages/StorageDistributed.cpp:338:41
69 0x1e231846 in DB::Context::~Context() obj-x86_64-linux-gnu/../src/Interpreters/Context.cpp:501:19
71 0x2073ccd3 in DB::TCPHandler::runImpl() obj-x86_64-linux-gnu/../src/Server/TCPHandler.cpp:416:23
72 0x2075db1c in DB::TCPHandler::run() obj-x86_64-linux-gnu/../src/Server/TCPHandler.cpp:1417:9
0x60d0002b4888 is located 56 bytes inside of 136-byte region [0x60d0002b4850,0x60d0002b48d8)
freed by thread T3 (TCPHandler) here:
0 0xa93d682 in operator delete(void*, unsigned long) (/workspace/clickhouse+0xa93d682)
1 0x2059d592 in std::__1::__shared_weak_count::__release_shared() obj-x86_64-linux-gnu/../contrib/libcxx/include/memory:2518:9
2 0x2059d592 in std::__1::shared_ptr<DB::WriteBuffer>::~shared_ptr() obj-x86_64-linux-gnu/../contrib/libcxx/include/memory:3212:19
3 0x2059d592 in std::__1::shared_ptr<DB::WriteBuffer>::operator=(std::__1::shared_ptr<DB::WriteBuffer>&&) obj-x86_64-linux-gnu/../contrib/libcxx/include/memory:3243:5
4 0x2059d592 in DB::Connection::disconnect() obj-x86_64-linux-gnu/../src/Client/Connection.cpp:143:9
5 0x205d6e6d in DB::MultiplexedConnections::disconnect() obj-x86_64-linux-gnu/../src/Client/MultiplexedConnections.cpp:159:25
6 0x1de6ec19 in DB::RemoteQueryExecutor::~RemoteQueryExecutor() obj-x86_64-linux-gnu/../src/DataStreams/RemoteQueryExecutor.cpp:86:34
10 0x20bf0e2c in DB::RemoteSource::~RemoteSource() obj-x86_64-linux-gnu/../src/Processors/Sources/RemoteSource.cpp:22:29
21 0x20869680 in DB::Pipe::~Pipe() obj-x86_64-linux-gnu/../src/Processors/Pipe.h:25:7
22 0x20869680 in DB::QueryPipeline::reset() obj-x86_64-linux-gnu/../src/Processors/QueryPipeline.cpp:79:1
23 0x1de2d89d in DB::BlockIO::reset() obj-x86_64-linux-gnu/../src/DataStreams/BlockIO.cpp:45:14
24 0x1de2d9f7 in DB::BlockIO::operator=(DB::BlockIO&&) obj-x86_64-linux-gnu/../src/DataStreams/BlockIO.cpp:57:5
25 0x20762731 in DB::QueryState::operator=(DB::QueryState&&) obj-x86_64-linux-gnu/../src/Server/TCPHandler.h:31:8
26 0x2073c70c in DB::QueryState::reset() obj-x86_64-linux-gnu/../src/Server/TCPHandler.h:85:15
27 0x2073c70c in DB::TCPHandler::runImpl() obj-x86_64-linux-gnu/../src/Server/TCPHandler.cpp:399:19
28 0x2075db1c in DB::TCPHandler::run() obj-x86_64-linux-gnu/../src/Server/TCPHandler.cpp:1417:9
29 0x266eeebe in Poco::Net::TCPServerConnection::start() obj-x86_64-linux-gnu/../contrib/poco/Net/src/TCPServerConnection.cpp:43:3
30 0x266ef9db in Poco::Net::TCPServerDispatcher::run() obj-x86_64-linux-gnu/../contrib/poco/Net/src/TCPServerDispatcher.cpp:112:19
31 0x269b1204 in Poco::PooledThread::run() obj-x86_64-linux-gnu/../contrib/poco/Foundation/src/ThreadPool.cpp:199:14
32 0x269ab756 in Poco::ThreadImpl::runnableEntry(void*) obj-x86_64-linux-gnu/../contrib/poco/Foundation/src/Thread_POSIX.cpp:345:27
33 0x7f9f06ea8608 in start_thread /build/glibc-ZN95T4/glibc-2.31/nptl/pthread_create.c:477:8
previously allocated by thread T3 (TCPHandler) here:
0 0xa93ca1d in operator new(unsigned long) (/workspace/clickhouse+0xa93ca1d)
1 0x2059b8cd in void* std::__1::__libcpp_operator_new<unsigned long>(unsigned long) obj-x86_64-linux-gnu/../contrib/libcxx/include/new:235:10
8 0x2059b8cd in DB::Connection::connect(DB::ConnectionTimeouts const&) obj-x86_64-linux-gnu/../src/Client/Connection.cpp:112:15
9 0x205a0a1d in DB::Connection::getServerRevision(DB::ConnectionTimeouts const&) obj-x86_64-linux-gnu/../src/Client/Connection.cpp:289:9
10 0x205bdafa in DB::ConnectionPoolWithFailover::tryGetEntry(DB::IConnectionPool&, DB::ConnectionTimeouts const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, DB::Settings const*, DB::QualifiedTableName const*) obj-x86_64-linux-gnu/../src/Client/ConnectionPoolWithFailover.cpp:251:45
11 0x205c06cf in DB::ConnectionPoolWithFailover::getManyChecked()::$_8::operator()() const obj-x86_64-linux-gnu/../src/Client/ConnectionPoolWithFailover.cpp:169:16
20 0x205bd61f in DB::ConnectionPoolWithFailover::getManyChecked(DB::ConnectionTimeouts const&, DB::Settings const*, DB::PoolMode, DB::QualifiedTableName const&) obj-x86_64-linux-gnu/../src/Client/ConnectionPoolWithFailover.cpp:172:12
28 0x1de6f425 in DB::RemoteQueryExecutor::sendQuery() obj-x86_64-linux-gnu/../src/DataStreams/RemoteQueryExecutor.cpp:143:31
29 0x1fdd1410 in DB::getStructureOfRemoteTableInShard(DB::Cluster const&, DB::Cluster::ShardInfo const&, DB::StorageID const&, DB::Context const&, std::__1::shared_ptr<DB::IAST> const&) obj-x86_64-linux-gnu/../src/Storages/getStructureOfRemoteTable.cpp:78:12
30 0x1fdd74a8 in DB::getStructureOfRemoteTable(DB::Cluster const&, DB::StorageID const&, DB::Context const&, std::__1::shared_ptr<DB::IAST> const&) obj-x86_64-linux-gnu/../src/Storages/getStructureOfRemoteTable.cpp:131:32
31 0x1d42cd79 in DB::TableFunctionRemote::getActualTableStructure(DB::Context const&) const obj-x86_64-linux-gnu/../src/TableFunctions/TableFunctionRemote.cpp:261:12
32 0x1d42b4e2 in DB::TableFunctionRemote::executeImpl(std::__1::shared_ptr<DB::IAST> const&, DB::Context const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::ColumnsDescription) const obj-x86_64-linux-gnu/../src/TableFunctions/TableFunctionRemote.cpp:222:26
33 0x1e2c8b15 in DB::ITableFunction::execute(std::__1::shared_ptr<DB::IAST> const&, DB::Context const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::ColumnsDescription) const obj-x86_64-linux-gnu/../src/TableFunctions/ITableFunction.cpp:24:16
34 0x1e2417ff in DB::Context::executeTableFunction(std::__1::shared_ptr<DB::IAST> const&) obj-x86_64-linux-gnu/../src/Interpreters/Context.cpp:1007:35
35 0x1f15bd67 in DB::JoinedTables::getLeftTableStorage() obj-x86_64-linux-gnu/../src/Interpreters/JoinedTables.cpp:162:42
36 0x1ebf393f in DB::InterpreterSelectQuery::InterpreterSelectQuery() obj-x86_64-linux-gnu/../src/Interpreters/InterpreterSelectQuery.cpp:306:33
42 0x1eb3bf90 in DB::InterpreterFactory::get(std::__1::shared_ptr<DB::IAST>&, DB::Context&, DB::SelectQueryOptions const&) obj-x86_64-linux-gnu/../src/Interpreters/InterpreterFactory.cpp:110:16
43 0x1f4d9dee in DB::executeQueryImpl(char const*, char const*, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool, DB::ReadBuffer*) obj-x86_64-linux-gnu/../src/Interpreters/executeQuery.cpp:520:28
44 0x1f4d7067 in DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool) obj-x86_64-linux-gnu/../src/Interpreters/executeQuery.cpp:900:30
45 0x2073b0bc in DB::TCPHandler::runImpl() obj-x86_64-linux-gnu/../src/Server/TCPHandler.cpp:260:24
46 0x2075db1c in DB::TCPHandler::run() obj-x86_64-linux-gnu/../src/Server/TCPHandler.cpp:1417:9
```
</details>
[1]: https://clickhouse-test-reports.s3.yandex.net/19583/9f8ab99dd12a6f60a20f5f84ab2f5d53874c6ae7/fuzzer_asan/report.htmlfail1
2021-01-26 14:27:58 +03:00
Azat Khuzhin
109dbe5df4
Check the stream before sending while hanlding async INSERTs into Distributed
...
It is possible to get corruption (even though it is very unlikely, and
initially it wasn't corruption) just before the data block goes in the
file on disk, and in case of batching, it will break the packets, since
it will write the packet type but will not write any data after.
2021-01-22 21:29:58 +03:00
Azat Khuzhin
e4350e078c
Add ability to distinguish remote exceptions from local
2021-01-20 01:10:17 +03:00
Azat Khuzhin
cf085b0687
Split RemoteQueryExecutorReadContext into module part
2021-01-16 01:57:36 +03:00
Nikolai Kochetov
af7f5c9518
Merge pull request #17868 from ClickHouse/async-read-from-socket
...
Async read from socket
2020-12-23 12:20:42 +03:00
Nikolai Kochetov
01c8d5dccc
Use replace fiber to callback in ReadBufferFromPocoSocket
2020-12-18 17:24:44 +03:00
nikitamikhaylov
18d52dbc63
better
2020-12-16 14:55:33 +03:00
nikitamikhaylov
4ccdb3ca20
done
2020-12-16 14:55:33 +03:00
Nikolai Kochetov
4c290996dc
Fixing tests.
2020-12-15 21:22:20 +03:00
Nikolai Kochetov
3d6ace5890
Fixing tests.
2020-12-15 21:20:14 +03:00
Nikolai Kochetov
00de8abb0a
Fixing tests.
2020-12-15 00:07:10 +03:00
Nikolai Kochetov
8de5cd5bc7
Merge branch 'master' into async-read-from-socket
2020-12-14 17:45:38 +03:00
Nikolai Kochetov
75ac87c241
Fixing build.
2020-12-14 17:42:12 +03:00
Nikolai Kochetov
e295dfe6e3
Use ucontext for asan
2020-12-14 17:42:08 +03:00
Azat Khuzhin
5b3ab48861
More forward declaration for generic headers
...
The following headers are pretty generic, so use forward declaration as
much as possible:
- Context.h
- Settings.h
- ConnectionTimeouts.h
(Also this shows that some missing some includes -- this has been fixed)
And split ConnectionTimeouts.h into ConnectionTimeoutsContext.h (since
module part cannot be added for it, due to recursive build dependencies
that will be introduced)
Also remove Settings from the RemoteBlockInputStream/RemoteQueryExecutor
and just pass the context, since settings was passed only in speicifc
places, that can allow making a copy of Context (i.e. Copier).
Approx results (How much units will be recompiled after changing file X?):
- ConnectionTimeouts.h
- mainline: 100
- Context.h:
- mainline: ~800
- patched: 415
- Settings.h:
- mainline: 900-1K
- patched: 440 (most of them because of the Context.h)
2020-12-12 17:43:10 +03:00
Nikolai Kochetov
156f44808f
Fixing crash.
2020-12-09 17:11:45 +03:00
Nikolai Kochetov
082a496364
Add async read to RemoteQueryExecutor
2020-12-03 15:21:10 +03:00
Nikolai Kochetov
e3946bc2b5
Add async read to RemoteQueryExecutor.
2020-12-02 20:02:14 +03:00
Pavel Kruglov
5b94dd2f74
Add eof check in receiveHello
2020-11-24 18:15:13 +03:00
Alexey Milovidov
24f4fa6edf
Follow Arcadia ya.make rules
2020-11-17 00:16:50 +03:00
Alexey Milovidov
3df04ce0c2
Follow Arcadia ya.make rules
2020-11-16 21:24:58 +03:00
filimonov
7f7f66a1f1
add comment & restart CI...
2020-11-13 17:45:58 +01:00
filimonov
be16b4ef77
Update Connection.cpp
2020-11-13 12:04:56 +01:00
filimonov
178d8e9b75
Update Connection.cpp
2020-11-12 22:07:19 +01:00
filimonov
8eff47420b
Update Connection.cpp
2020-11-12 18:18:34 +01:00
Alexander Tokmakov
a06be511df
pcg serialization
2020-11-09 16:07:38 +03:00
Alexey Milovidov
fd84d16387
Fix "server failed to start" error
2020-11-07 03:14:53 +03:00
Maxim Akhmedov
3627fabfb9
Remove -g0 form Arcadia build settings.
2020-10-29 17:37:23 +03:00
Amos Bird
d2dcfc3f0d
Refactor processors.
2020-10-12 17:30:05 +08:00
Alexander Kuzmenkov
dde4cf70e1
Merge remote-tracking branch 'origin/master' into HEAD
2020-09-22 14:03:59 +03:00
alexey-milovidov
9b6c62e82b
Merge pull request #14867 from amosbird/lbo
...
Explicit define what first replica is.
2020-09-17 19:37:15 +03:00
Alexander Kuzmenkov
fb64cf210a
straighten the protocol version
2020-09-17 17:37:29 +03:00
Amos Bird
38d53c38f6
Explicit define what first replica is.
2020-09-16 17:54:41 +08:00
Azat Khuzhin
785d1b2a75
OpenSSLHelpers cleanup
...
Add few more specializations for encodeSHA256():
- std::string encodeSHA256(const std::string_view &);
- std::string encodeSHA256(const void *, size_t);
- void encodeSHA256(const void *, size_t, unsigned char *);
2020-09-15 01:36:28 +03:00
Azat Khuzhin
0159c74f21
Secure inter-cluster query execution (with initial_user as current query user) [v3]
...
Add inter-server cluster secret, it is used for Distributed queries
inside cluster, you can configure in the configuration file:
<remote_servers>
<logs>
<shard>
<secret>foobar</secret> <!-- empty -- works as before -->
...
</shard>
</logs>
</remote_servers>
And this will allow clickhouse to make sure that the query was not
faked, and was issued from the node that knows the secret. And since
trust appeared it can use initial_user for query execution, this will
apply correct *_for_user (since with inter-server secret enabled, the
query will be executed from the same user on the shards as on initator,
unlike "default" user w/o it).
v2: Change user to the initial_user for Distributed queries if secret match
v3: Add Protocol::Cluster package
v4: Drop Protocol::Cluster and use plain Protocol::Hello + user marker
v5: Do not use user from Hello for cluster-secure (superfluous)
2020-09-15 01:36:28 +03:00
Alexey Milovidov
e3924b8057
Fix "Arcadia"
2020-09-08 01:14:13 +03:00
alesapin
10c7a6c45e
Add ability to specify Default codec for columns ( #14049 )
...
* Add ability to specify DefaultCompression codec which correspond to settings specified in config.xml
* Fix style
* Rename DefaultCompression to simple Default
* Fix compression codec
* Better codec description representation
* Less strange code and one method
* Fix delta
2020-08-28 20:40:45 +03:00
Vasily Nemkov
f94f786cc3
First attempt to fix data race in ConnectionPoolWithFailover::getStatus()
2020-08-20 23:25:38 +03:00
Nikolai Kochetov
9b67cd9faf
Merge branch 'master' into refactor-pipes-3
2020-08-10 10:50:17 +03:00
Alexey Milovidov
edd89a8610
Fix half of typos
2020-08-08 03:47:03 +03:00
Nikolai Kochetov
20e63d2271
Refactor Pipe [part 6]
2020-08-06 15:24:05 +03:00
Alexey Milovidov
6f690b7c0d
Normalize ya.make files, fix "Arcadia" build
2020-08-02 16:57:38 +03:00
Vitaly Baranov
56665a15f7
Rework and rename the template class SettingsCollection => BaseSettings.
2020-07-31 20:54:18 +03:00
Nikita Mikhaylov
4d49d2c671
another removes
2020-07-30 13:31:14 +03:00
Alexander Kuzmenkov
ac436c79eb
Merge remote-tracking branch 'origin/master' into HEAD
2020-07-07 15:42:11 +03:00
Alexander Kuzmenkov
5c417f45b8
streaming wip
2020-06-30 12:25:23 +03:00