mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-11-23 08:02:02 +00:00
Merge branch 'master' into is_obsolete
This commit is contained in:
commit
0243542587
@ -74,6 +74,7 @@ ConstructorInitializerIndentWidth: 4
|
||||
ContinuationIndentWidth: 4
|
||||
DerivePointerAlignment: false
|
||||
DisableFormat: false
|
||||
IndentRequiresClause: false
|
||||
IndentWidth: 4
|
||||
IndentWrappedFunctionNames: false
|
||||
MacroBlockBegin: ''
|
||||
|
1
.github/workflows/woboq.yml
vendored
1
.github/workflows/woboq.yml
vendored
@ -12,6 +12,7 @@ jobs:
|
||||
# don't use dockerhub push because this image updates so rarely
|
||||
WoboqCodebrowser:
|
||||
runs-on: [self-hosted, style-checker]
|
||||
timeout-minutes: 420 # the task is pretty heavy, so there's an additional hour
|
||||
steps:
|
||||
- name: Set envs
|
||||
run: |
|
||||
|
2
.gitmodules
vendored
2
.gitmodules
vendored
@ -19,7 +19,7 @@
|
||||
url = https://github.com/google/googletest
|
||||
[submodule "contrib/capnproto"]
|
||||
path = contrib/capnproto
|
||||
url = https://github.com/capnproto/capnproto
|
||||
url = https://github.com/ClickHouse/capnproto
|
||||
[submodule "contrib/double-conversion"]
|
||||
path = contrib/double-conversion
|
||||
url = https://github.com/google/double-conversion
|
||||
|
@ -6,8 +6,10 @@ rules:
|
||||
level: warning
|
||||
indent-sequences: consistent
|
||||
line-length:
|
||||
# there are some bash -c "", so this is OK
|
||||
max: 300
|
||||
# there are:
|
||||
# - bash -c "", so this is OK
|
||||
# - yaml in tests
|
||||
max: 1000
|
||||
level: warning
|
||||
comments:
|
||||
min-spaces-from-content: 1
|
||||
|
@ -21,6 +21,7 @@
|
||||
* Setting `enable_memory_bound_merging_of_aggregation_results` is enabled by default. If you update from version prior to 22.12, we recommend to set this flag to `false` until update is finished. [#50319](https://github.com/ClickHouse/ClickHouse/pull/50319) ([Nikita Taranov](https://github.com/nickitat)).
|
||||
|
||||
#### New Feature
|
||||
* Added storage engine AzureBlobStorage and azureBlobStorage table function. The supported set of features is very similar to storage/table function S3 [#50604] (https://github.com/ClickHouse/ClickHouse/pull/50604) ([alesapin](https://github.com/alesapin)) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni).
|
||||
* Added native ClickHouse Keeper CLI Client, it is available as `clickhouse keeper-client` [#47414](https://github.com/ClickHouse/ClickHouse/pull/47414) ([pufit](https://github.com/pufit)).
|
||||
* Add `urlCluster` table function. Refactor all *Cluster table functions to reduce code duplication. Make schema inference work for all possible *Cluster function signatures and for named collections. Closes [#38499](https://github.com/ClickHouse/ClickHouse/issues/38499). [#45427](https://github.com/ClickHouse/ClickHouse/pull/45427) ([attack204](https://github.com/attack204)), Pavel Kruglov.
|
||||
* The query cache can now be used for production workloads. [#47977](https://github.com/ClickHouse/ClickHouse/pull/47977) ([Robert Schulze](https://github.com/rschu1ze)). The query cache can now support queries with totals and extremes modifier. [#48853](https://github.com/ClickHouse/ClickHouse/pull/48853) ([Robert Schulze](https://github.com/rschu1ze)). Make `allow_experimental_query_cache` setting as obsolete for backward-compatibility. It was removed in https://github.com/ClickHouse/ClickHouse/pull/47977. [#49934](https://github.com/ClickHouse/ClickHouse/pull/49934) ([Timur Solodovnikov](https://github.com/tsolodov)).
|
||||
@ -31,7 +32,7 @@
|
||||
* Introduces new keyword `INTO OUTFILE 'file.txt' APPEND`. [#48880](https://github.com/ClickHouse/ClickHouse/pull/48880) ([alekar](https://github.com/alekar)).
|
||||
* Added `system.zookeeper_connection` table that shows information about Keeper connections. [#45245](https://github.com/ClickHouse/ClickHouse/pull/45245) ([mateng915](https://github.com/mateng0915)).
|
||||
* Add new function `generateRandomStructure` that generates random table structure. It can be used in combination with table function `generateRandom`. [#47409](https://github.com/ClickHouse/ClickHouse/pull/47409) ([Kruglov Pavel](https://github.com/Avogar)).
|
||||
* Allow the use of `CASE` without an `ELSE` branch and extended `transform` to deal with more types. Also fix some issues that made transform() return incorrect results when decimal types were mixed with other numeric types. [#48300](https://github.com/ClickHouse/ClickHouse/pull/48300) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
|
||||
* Allow the use of `CASE` without an `ELSE` branch and extended `transform` to deal with more types. Also fix some issues that made transform() return incorrect results when decimal types were mixed with other numeric types. [#48300](https://github.com/ClickHouse/ClickHouse/pull/48300) ([Salvatore Mesoraca](https://github.com/aiven-sal)). This closes #2655. This closes #9596. This closes #38666.
|
||||
* Added [server-side encryption using KMS keys](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html) with S3 tables, and the `header` setting with S3 disks. Closes [#48723](https://github.com/ClickHouse/ClickHouse/issues/48723). [#48724](https://github.com/ClickHouse/ClickHouse/pull/48724) ([Johann Gan](https://github.com/johanngan)).
|
||||
* Add MemoryTracker for the background tasks (merges and mutation). Introduces `merges_mutations_memory_usage_soft_limit` and `merges_mutations_memory_usage_to_ram_ratio` settings that represent the soft memory limit for merges and mutations. If this limit is reached ClickHouse won't schedule new merge or mutation tasks. Also `MergesMutationsMemoryTracking` metric is introduced to allow observing current memory usage of background tasks. Resubmit [#46089](https://github.com/ClickHouse/ClickHouse/issues/46089). Closes [#48774](https://github.com/ClickHouse/ClickHouse/issues/48774). [#48787](https://github.com/ClickHouse/ClickHouse/pull/48787) ([Dmitry Novik](https://github.com/novikd)).
|
||||
* Function `dotProduct` work for array. [#49050](https://github.com/ClickHouse/ClickHouse/pull/49050) ([FFFFFFFHHHHHHH](https://github.com/FFFFFFFHHHHHHH)).
|
||||
|
11
README.md
11
README.md
@ -22,12 +22,13 @@ curl https://clickhouse.com/ | sh
|
||||
|
||||
## Upcoming Events
|
||||
|
||||
* [**v23.5 Release Webinar**](https://clickhouse.com/company/events/v23-5-release-webinar?utm_source=github&utm_medium=social&utm_campaign=release-webinar-2023-05) - Jun 8 - 23.5 is rapidly approaching. Original creator, co-founder, and CTO of ClickHouse Alexey Milovidov will walk us through the highlights of the release.
|
||||
* [**ClickHouse Meetup in Bangalore**](https://www.meetup.com/clickhouse-bangalore-user-group/events/293740066/) - Jun 7
|
||||
* [**ClickHouse Meetup in San Francisco**](https://www.meetup.com/clickhouse-silicon-valley-meetup-group/events/293426725/) - Jun 7
|
||||
* [**v23.6 Release Webinar**](https://clickhouse.com/company/events/v23-6-release-call?utm_source=github&utm_medium=social&utm_campaign=release-webinar-2023-06) - Jun 29 - 23.6 is rapidly approaching. Original creator, co-founder, and CTO of ClickHouse Alexey Milovidov will walk us through the highlights of the release.
|
||||
* [**ClickHouse Meetup in Paris**](https://www.meetup.com/clickhouse-france-user-group/events/294283460) - Jul 4
|
||||
* [**ClickHouse Meetup in Boston**](https://www.meetup.com/clickhouse-boston-user-group/events/293913596) - Jul 18
|
||||
* [**ClickHouse Meetup in NYC**](https://www.meetup.com/clickhouse-new-york-user-group/events/293913441) - Jul 19
|
||||
* [**ClickHouse Meetup in Toronto**](https://www.meetup.com/clickhouse-toronto-user-group/events/294183127) - Jul 20
|
||||
|
||||
|
||||
Also, keep an eye out for upcoming meetups in Amsterdam, Boston, NYC, Beijing, and Toronto. Somewhere else you want us to be? Please feel free to reach out to tyler <at> clickhouse <dot> com.
|
||||
Also, keep an eye out for upcoming meetups around the world. Somewhere else you want us to be? Please feel free to reach out to tyler <at> clickhouse <dot> com.
|
||||
|
||||
## Recent Recordings
|
||||
* **Recent Meetup Videos**: [Meetup Playlist](https://www.youtube.com/playlist?list=PL0Z2YDlm0b3iNDUzpY1S3L_iV4nARda_U) Whenever possible recordings of the ClickHouse Community Meetups are edited and presented as individual talks. Current featuring "Modern SQL in 2023", "Fast, Concurrent, and Consistent Asynchronous INSERTS in ClickHouse", and "Full-Text Indices: Design and Experiments"
|
||||
|
@ -2,6 +2,7 @@
|
||||
|
||||
#include <cstdint>
|
||||
#include <string>
|
||||
#include <array>
|
||||
|
||||
#if defined(__SSE2__)
|
||||
#include <emmintrin.h>
|
||||
|
@ -11,3 +11,8 @@ constexpr double interpolateExponential(double min, double max, double ratio)
|
||||
assert(min > 0 && ratio >= 0 && ratio <= 1);
|
||||
return min * std::pow(max / min, ratio);
|
||||
}
|
||||
|
||||
constexpr double interpolateLinear(double min, double max, double ratio)
|
||||
{
|
||||
return std::lerp(min, max, ratio);
|
||||
}
|
||||
|
@ -117,6 +117,9 @@ public:
|
||||
void readRaw(char * buffer, std::streamsize length);
|
||||
/// Reads length bytes of raw data into buffer.
|
||||
|
||||
void readCString(std::string& value);
|
||||
/// Reads zero-terminated C-string into value.
|
||||
|
||||
void readBOM();
|
||||
/// Reads a byte-order mark from the stream and configures
|
||||
/// the reader for the encountered byte order.
|
||||
|
@ -56,6 +56,8 @@ public:
|
||||
LITTLE_ENDIAN_BYTE_ORDER = 3 /// little-endian byte-order
|
||||
};
|
||||
|
||||
static const std::streamsize DEFAULT_MAX_CSTR_LENGTH { 1024 };
|
||||
|
||||
BinaryWriter(std::ostream & ostr, StreamByteOrder byteOrder = NATIVE_BYTE_ORDER);
|
||||
/// Creates the BinaryWriter.
|
||||
|
||||
@ -131,6 +133,9 @@ public:
|
||||
void writeRaw(const char * buffer, std::streamsize length);
|
||||
/// Writes length raw bytes from the given buffer to the stream.
|
||||
|
||||
void writeCString(const char* cString, std::streamsize maxLength = DEFAULT_MAX_CSTR_LENGTH);
|
||||
/// Writes zero-terminated C-string.
|
||||
|
||||
void writeBOM();
|
||||
/// Writes a byte-order mark to the stream. A byte order mark is
|
||||
/// a 16-bit integer with a value of 0xFEFF, written in host byte-order.
|
||||
|
@ -274,6 +274,31 @@ void BinaryReader::readRaw(char* buffer, std::streamsize length)
|
||||
}
|
||||
|
||||
|
||||
void BinaryReader::readCString(std::string& value)
|
||||
{
|
||||
value.clear();
|
||||
if (!_istr.good())
|
||||
{
|
||||
return;
|
||||
}
|
||||
value.reserve(256);
|
||||
while (true)
|
||||
{
|
||||
char c;
|
||||
_istr.get(c);
|
||||
if (!_istr.good())
|
||||
{
|
||||
break;
|
||||
}
|
||||
if (c == '\0')
|
||||
{
|
||||
break;
|
||||
}
|
||||
value += c;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
void BinaryReader::readBOM()
|
||||
{
|
||||
UInt16 bom;
|
||||
|
@ -271,7 +271,7 @@ BinaryWriter& BinaryWriter::operator << (const std::string& value)
|
||||
BinaryWriter& BinaryWriter::operator << (const char* value)
|
||||
{
|
||||
poco_check_ptr (value);
|
||||
|
||||
|
||||
if (_pTextConverter)
|
||||
{
|
||||
std::string converted;
|
||||
@ -332,6 +332,15 @@ void BinaryWriter::writeRaw(const char* buffer, std::streamsize length)
|
||||
}
|
||||
|
||||
|
||||
void BinaryWriter::writeCString(const char* cString, std::streamsize maxLength)
|
||||
{
|
||||
const std::size_t len = ::strnlen(cString, maxLength);
|
||||
writeRaw(cString, len);
|
||||
static const char zero = '\0';
|
||||
_ostr.write(&zero, sizeof(zero));
|
||||
}
|
||||
|
||||
|
||||
void BinaryWriter::writeBOM()
|
||||
{
|
||||
UInt16 value = 0xFEFF;
|
||||
|
@ -13,3 +13,4 @@ target_compile_options (_poco_mongodb
|
||||
|
||||
target_include_directories (_poco_mongodb SYSTEM PUBLIC "include")
|
||||
target_link_libraries (_poco_mongodb PUBLIC Poco::Net)
|
||||
|
||||
|
@ -33,7 +33,7 @@ namespace MongoDB
|
||||
/// This class represents a BSON Array.
|
||||
{
|
||||
public:
|
||||
typedef SharedPtr<Array> Ptr;
|
||||
using Ptr = SharedPtr<Array>;
|
||||
|
||||
Array();
|
||||
/// Creates an empty Array.
|
||||
@ -41,8 +41,31 @@ namespace MongoDB
|
||||
virtual ~Array();
|
||||
/// Destroys the Array.
|
||||
|
||||
// Document template functions available for backward compatibility
|
||||
using Document::add;
|
||||
using Document::get;
|
||||
|
||||
template <typename T>
|
||||
T get(int pos) const
|
||||
Document & add(T value)
|
||||
/// Creates an element with the name from the current pos and value and
|
||||
/// adds it to the array document.
|
||||
///
|
||||
/// The active document is returned to allow chaining of the add methods.
|
||||
{
|
||||
return Document::add<T>(Poco::NumberFormatter::format(size()), value);
|
||||
}
|
||||
|
||||
Document & add(const char * value)
|
||||
/// Creates an element with a name from the current pos and value and
|
||||
/// adds it to the array document.
|
||||
///
|
||||
/// The active document is returned to allow chaining of the add methods.
|
||||
{
|
||||
return Document::add(Poco::NumberFormatter::format(size()), value);
|
||||
}
|
||||
|
||||
template <typename T>
|
||||
T get(std::size_t pos) const
|
||||
/// Returns the element at the given index and tries to convert
|
||||
/// it to the template type. If the element is not found, a
|
||||
/// Poco::NotFoundException will be thrown. If the element cannot be
|
||||
@ -52,7 +75,7 @@ namespace MongoDB
|
||||
}
|
||||
|
||||
template <typename T>
|
||||
T get(int pos, const T & deflt) const
|
||||
T get(std::size_t pos, const T & deflt) const
|
||||
/// Returns the element at the given index and tries to convert
|
||||
/// it to the template type. If the element is not found, or
|
||||
/// has the wrong type, the deflt argument will be returned.
|
||||
@ -60,12 +83,12 @@ namespace MongoDB
|
||||
return Document::get<T>(Poco::NumberFormatter::format(pos), deflt);
|
||||
}
|
||||
|
||||
Element::Ptr get(int pos) const;
|
||||
Element::Ptr get(std::size_t pos) const;
|
||||
/// Returns the element at the given index.
|
||||
/// An empty element will be returned if the element is not found.
|
||||
|
||||
template <typename T>
|
||||
bool isType(int pos) const
|
||||
bool isType(std::size_t pos) const
|
||||
/// Returns true if the type of the element equals the TypeId of ElementTrait,
|
||||
/// otherwise false.
|
||||
{
|
||||
@ -74,6 +97,9 @@ namespace MongoDB
|
||||
|
||||
std::string toString(int indent = 0) const;
|
||||
/// Returns a string representation of the Array.
|
||||
|
||||
private:
|
||||
friend void BSONReader::read<Array::Ptr>(Array::Ptr & to);
|
||||
};
|
||||
|
||||
|
||||
|
@ -40,7 +40,7 @@ namespace MongoDB
|
||||
/// A Binary stores its data in a Poco::Buffer<unsigned char>.
|
||||
{
|
||||
public:
|
||||
typedef SharedPtr<Binary> Ptr;
|
||||
using Ptr = SharedPtr<Binary>;
|
||||
|
||||
Binary();
|
||||
/// Creates an empty Binary with subtype 0.
|
||||
|
@ -18,6 +18,7 @@
|
||||
#define MongoDB_Connection_INCLUDED
|
||||
|
||||
|
||||
#include "Poco/MongoDB/OpMsgMessage.h"
|
||||
#include "Poco/MongoDB/RequestMessage.h"
|
||||
#include "Poco/MongoDB/ResponseMessage.h"
|
||||
#include "Poco/Mutex.h"
|
||||
@ -39,7 +40,7 @@ namespace MongoDB
|
||||
/// for more information on the wire protocol.
|
||||
{
|
||||
public:
|
||||
typedef Poco::SharedPtr<Connection> Ptr;
|
||||
using Ptr = Poco::SharedPtr<Connection>;
|
||||
|
||||
class MongoDB_API SocketFactory
|
||||
{
|
||||
@ -90,7 +91,7 @@ namespace MongoDB
|
||||
|
||||
Poco::Net::SocketAddress address() const;
|
||||
/// Returns the address of the MongoDB server.
|
||||
|
||||
|
||||
const std::string & uri() const;
|
||||
/// Returns the uri on which the connection was made.
|
||||
|
||||
@ -145,6 +146,21 @@ namespace MongoDB
|
||||
/// Use this when a response is expected: only a "query" or "getmore"
|
||||
/// request will return a response.
|
||||
|
||||
void sendRequest(OpMsgMessage & request, OpMsgMessage & response);
|
||||
/// Sends a request to the MongoDB server and receives the response
|
||||
/// using newer wire protocol with OP_MSG.
|
||||
|
||||
void sendRequest(OpMsgMessage & request);
|
||||
/// Sends an unacknowledged request to the MongoDB server using newer
|
||||
/// wire protocol with OP_MSG.
|
||||
/// No response is sent by the server.
|
||||
|
||||
void readResponse(OpMsgMessage & response);
|
||||
/// Reads additional response data when previous message's flag moreToCome
|
||||
/// indicates that server will send more data.
|
||||
/// NOTE: See comments in OpMsgCursor code.
|
||||
|
||||
|
||||
protected:
|
||||
void connect();
|
||||
|
||||
@ -164,7 +180,7 @@ namespace MongoDB
|
||||
}
|
||||
inline const std::string & Connection::uri() const
|
||||
{
|
||||
return _uri;
|
||||
return _uri;
|
||||
}
|
||||
|
||||
|
||||
|
@ -40,6 +40,9 @@ namespace MongoDB
|
||||
Cursor(const std::string & fullCollectionName, QueryRequest::Flags flags = QueryRequest::QUERY_DEFAULT);
|
||||
/// Creates a Cursor for the given database and collection ("database.collection"), using the specified flags.
|
||||
|
||||
Cursor(const Document & aggregationResponse);
|
||||
/// Creates a Cursor for the given aggregation query response.
|
||||
|
||||
virtual ~Cursor();
|
||||
/// Destroys the Cursor.
|
||||
|
||||
|
@ -26,6 +26,8 @@
|
||||
#include "Poco/MongoDB/QueryRequest.h"
|
||||
#include "Poco/MongoDB/UpdateRequest.h"
|
||||
|
||||
#include "Poco/MongoDB/OpMsgCursor.h"
|
||||
#include "Poco/MongoDB/OpMsgMessage.h"
|
||||
|
||||
namespace Poco
|
||||
{
|
||||
@ -45,6 +47,9 @@ namespace MongoDB
|
||||
virtual ~Database();
|
||||
/// Destroys the Database.
|
||||
|
||||
const std::string & name() const;
|
||||
/// Database name
|
||||
|
||||
bool authenticate(
|
||||
Connection & connection,
|
||||
const std::string & username,
|
||||
@ -62,34 +67,49 @@ namespace MongoDB
|
||||
/// May throw a Poco::ProtocolException if authentication fails for a reason other than
|
||||
/// invalid credentials.
|
||||
|
||||
Document::Ptr queryBuildInfo(Connection & connection) const;
|
||||
/// Queries server build info (all wire protocols)
|
||||
|
||||
Document::Ptr queryServerHello(Connection & connection) const;
|
||||
/// Queries hello response from server (all wire protocols)
|
||||
|
||||
Int64 count(Connection & connection, const std::string & collectionName) const;
|
||||
/// Sends a count request for the given collection to MongoDB.
|
||||
/// Sends a count request for the given collection to MongoDB. (old wire protocol)
|
||||
///
|
||||
/// If the command fails, -1 is returned.
|
||||
|
||||
Poco::SharedPtr<Poco::MongoDB::QueryRequest> createCommand() const;
|
||||
/// Creates a QueryRequest for a command.
|
||||
/// Creates a QueryRequest for a command. (old wire protocol)
|
||||
|
||||
Poco::SharedPtr<Poco::MongoDB::QueryRequest> createCountRequest(const std::string & collectionName) const;
|
||||
/// Creates a QueryRequest to count the given collection.
|
||||
/// The collectionname must not contain the database name.
|
||||
/// The collectionname must not contain the database name. (old wire protocol)
|
||||
|
||||
Poco::SharedPtr<Poco::MongoDB::DeleteRequest> createDeleteRequest(const std::string & collectionName) const;
|
||||
/// Creates a DeleteRequest to delete documents in the given collection.
|
||||
/// The collectionname must not contain the database name.
|
||||
/// The collectionname must not contain the database name. (old wire protocol)
|
||||
|
||||
Poco::SharedPtr<Poco::MongoDB::InsertRequest> createInsertRequest(const std::string & collectionName) const;
|
||||
/// Creates an InsertRequest to insert new documents in the given collection.
|
||||
/// The collectionname must not contain the database name.
|
||||
/// The collectionname must not contain the database name. (old wire protocol)
|
||||
|
||||
Poco::SharedPtr<Poco::MongoDB::QueryRequest> createQueryRequest(const std::string & collectionName) const;
|
||||
/// Creates a QueryRequest.
|
||||
/// Creates a QueryRequest. (old wire protocol)
|
||||
/// The collectionname must not contain the database name.
|
||||
|
||||
Poco::SharedPtr<Poco::MongoDB::UpdateRequest> createUpdateRequest(const std::string & collectionName) const;
|
||||
/// Creates an UpdateRequest.
|
||||
/// Creates an UpdateRequest. (old wire protocol)
|
||||
/// The collectionname must not contain the database name.
|
||||
|
||||
Poco::SharedPtr<Poco::MongoDB::OpMsgMessage> createOpMsgMessage(const std::string & collectionName) const;
|
||||
/// Creates OpMsgMessage. (new wire protocol)
|
||||
|
||||
Poco::SharedPtr<Poco::MongoDB::OpMsgMessage> createOpMsgMessage() const;
|
||||
/// Creates OpMsgMessage for database commands that do not require collection as an argument. (new wire protocol)
|
||||
|
||||
Poco::SharedPtr<Poco::MongoDB::OpMsgCursor> createOpMsgCursor(const std::string & collectionName) const;
|
||||
/// Creates OpMsgCursor. (new wire protocol)
|
||||
|
||||
Poco::MongoDB::Document::Ptr ensureIndex(
|
||||
Connection & connection,
|
||||
const std::string & collection,
|
||||
@ -100,14 +120,16 @@ namespace MongoDB
|
||||
int version = 0,
|
||||
int ttl = 0);
|
||||
/// Creates an index. The document returned is the result of a getLastError call.
|
||||
/// For more info look at the ensureIndex information on the MongoDB website.
|
||||
/// For more info look at the ensureIndex information on the MongoDB website. (old wire protocol)
|
||||
|
||||
Document::Ptr getLastErrorDoc(Connection & connection) const;
|
||||
/// Sends the getLastError command to the database and returns the error document.
|
||||
/// (old wire protocol)
|
||||
|
||||
std::string getLastError(Connection & connection) const;
|
||||
/// Sends the getLastError command to the database and returns the err element
|
||||
/// from the error document. When err is null, an empty string is returned.
|
||||
/// (old wire protocol)
|
||||
|
||||
static const std::string AUTH_MONGODB_CR;
|
||||
/// Default authentication mechanism prior to MongoDB 3.0.
|
||||
@ -115,6 +137,27 @@ namespace MongoDB
|
||||
static const std::string AUTH_SCRAM_SHA1;
|
||||
/// Default authentication mechanism for MongoDB 3.0.
|
||||
|
||||
enum WireVersion
|
||||
/// Wire version as reported by the command hello.
|
||||
/// See details in MongoDB github, repository specifications.
|
||||
/// @see queryServerHello
|
||||
{
|
||||
VER_26 = 1,
|
||||
VER_26_2 = 2,
|
||||
VER_30 = 3,
|
||||
VER_32 = 4,
|
||||
VER_34 = 5,
|
||||
VER_36 = 6, ///< First wire version that supports OP_MSG
|
||||
VER_40 = 7,
|
||||
VER_42 = 8,
|
||||
VER_44 = 9,
|
||||
VER_50 = 13,
|
||||
VER_51 = 14, ///< First wire version that supports only OP_MSG
|
||||
VER_52 = 15,
|
||||
VER_53 = 16,
|
||||
VER_60 = 17
|
||||
};
|
||||
|
||||
protected:
|
||||
bool authCR(Connection & connection, const std::string & username, const std::string & password);
|
||||
bool authSCRAM(Connection & connection, const std::string & username, const std::string & password);
|
||||
@ -127,6 +170,12 @@ namespace MongoDB
|
||||
//
|
||||
// inlines
|
||||
//
|
||||
inline const std::string & Database::name() const
|
||||
{
|
||||
return _dbname;
|
||||
}
|
||||
|
||||
|
||||
inline Poco::SharedPtr<Poco::MongoDB::QueryRequest> Database::createCommand() const
|
||||
{
|
||||
Poco::SharedPtr<Poco::MongoDB::QueryRequest> cmd = createQueryRequest("$cmd");
|
||||
@ -158,6 +207,24 @@ namespace MongoDB
|
||||
return new Poco::MongoDB::UpdateRequest(_dbname + '.' + collectionName);
|
||||
}
|
||||
|
||||
// -- New wire protocol commands
|
||||
|
||||
inline Poco::SharedPtr<Poco::MongoDB::OpMsgMessage> Database::createOpMsgMessage(const std::string & collectionName) const
|
||||
{
|
||||
return new Poco::MongoDB::OpMsgMessage(_dbname, collectionName);
|
||||
}
|
||||
|
||||
inline Poco::SharedPtr<Poco::MongoDB::OpMsgMessage> Database::createOpMsgMessage() const
|
||||
{
|
||||
// Collection name for database commands is not needed.
|
||||
return createOpMsgMessage("");
|
||||
}
|
||||
|
||||
inline Poco::SharedPtr<Poco::MongoDB::OpMsgCursor> Database::createOpMsgCursor(const std::string & collectionName) const
|
||||
{
|
||||
return new Poco::MongoDB::OpMsgCursor(_dbname, collectionName);
|
||||
}
|
||||
|
||||
|
||||
}
|
||||
} // namespace Poco::MongoDB
|
||||
|
@ -31,6 +31,7 @@ namespace Poco
|
||||
namespace MongoDB
|
||||
{
|
||||
|
||||
class Array;
|
||||
|
||||
class ElementFindByName
|
||||
{
|
||||
@ -48,8 +49,8 @@ namespace MongoDB
|
||||
/// Represents a MongoDB (BSON) document.
|
||||
{
|
||||
public:
|
||||
typedef SharedPtr<Document> Ptr;
|
||||
typedef std::vector<Document::Ptr> Vector;
|
||||
using Ptr = SharedPtr<Document>;
|
||||
using Vector = std::vector<Document::Ptr>;
|
||||
|
||||
Document();
|
||||
/// Creates an empty Document.
|
||||
@ -86,6 +87,10 @@ namespace MongoDB
|
||||
/// Unlike the other add methods, this method returns
|
||||
/// a reference to the new document.
|
||||
|
||||
Array & addNewArray(const std::string & name);
|
||||
/// Create a new array and add it to this document.
|
||||
/// Method returns a reference to the new array.
|
||||
|
||||
void clear();
|
||||
/// Removes all elements from the document.
|
||||
|
||||
@ -95,7 +100,7 @@ namespace MongoDB
|
||||
bool empty() const;
|
||||
/// Returns true if the document doesn't contain any documents.
|
||||
|
||||
bool exists(const std::string & name);
|
||||
bool exists(const std::string & name) const;
|
||||
/// Returns true if the document has an element with the given name.
|
||||
|
||||
template <typename T>
|
||||
@ -158,6 +163,9 @@ namespace MongoDB
|
||||
/// return an Int64. When the element is not found, a
|
||||
/// Poco::NotFoundException will be thrown.
|
||||
|
||||
bool remove(const std::string & name);
|
||||
/// Removes an element from the document.
|
||||
|
||||
template <typename T>
|
||||
bool isType(const std::string & name) const
|
||||
/// Returns true when the type of the element equals the TypeId of ElementTrait.
|
||||
@ -227,12 +235,23 @@ namespace MongoDB
|
||||
}
|
||||
|
||||
|
||||
inline bool Document::exists(const std::string & name)
|
||||
inline bool Document::exists(const std::string & name) const
|
||||
{
|
||||
return std::find_if(_elements.begin(), _elements.end(), ElementFindByName(name)) != _elements.end();
|
||||
}
|
||||
|
||||
|
||||
inline bool Document::remove(const std::string & name)
|
||||
{
|
||||
auto it = std::find_if(_elements.begin(), _elements.end(), ElementFindByName(name));
|
||||
if (it == _elements.end())
|
||||
return false;
|
||||
|
||||
_elements.erase(it);
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
inline std::size_t Document::size() const
|
||||
{
|
||||
return _elements.size();
|
||||
|
@ -45,7 +45,7 @@ namespace MongoDB
|
||||
/// Represents an Element of a Document or an Array.
|
||||
{
|
||||
public:
|
||||
typedef Poco::SharedPtr<Element> Ptr;
|
||||
using Ptr = Poco::SharedPtr<Element>;
|
||||
|
||||
explicit Element(const std::string & name);
|
||||
/// Creates the Element with the given name.
|
||||
@ -80,7 +80,7 @@ namespace MongoDB
|
||||
}
|
||||
|
||||
|
||||
typedef std::list<Element::Ptr> ElementSet;
|
||||
using ElementSet = std::list<Element::Ptr>;
|
||||
|
||||
|
||||
template <typename T>
|
||||
@ -266,7 +266,7 @@ namespace MongoDB
|
||||
}
|
||||
|
||||
|
||||
typedef Nullable<unsigned char> NullValue;
|
||||
using NullValue = Nullable<unsigned char>;
|
||||
|
||||
|
||||
// BSON Null Value
|
||||
|
@ -35,7 +35,7 @@ namespace MongoDB
|
||||
/// Represents JavaScript type in BSON.
|
||||
{
|
||||
public:
|
||||
typedef SharedPtr<JavaScriptCode> Ptr;
|
||||
using Ptr = SharedPtr<JavaScriptCode>;
|
||||
|
||||
JavaScriptCode();
|
||||
/// Creates an empty JavaScriptCode object.
|
||||
|
@ -28,6 +28,9 @@ namespace MongoDB
|
||||
{
|
||||
|
||||
|
||||
class Message; // Required to disambiguate friend declaration in MessageHeader.
|
||||
|
||||
|
||||
class MongoDB_API MessageHeader
|
||||
/// Represents the message header which is always prepended to a
|
||||
/// MongoDB request or response message.
|
||||
@ -37,14 +40,18 @@ namespace MongoDB
|
||||
|
||||
enum OpCode
|
||||
{
|
||||
// Opcodes deprecated in MongoDB 5.0
|
||||
OP_REPLY = 1,
|
||||
OP_MSG = 1000,
|
||||
OP_UPDATE = 2001,
|
||||
OP_INSERT = 2002,
|
||||
OP_QUERY = 2004,
|
||||
OP_GET_MORE = 2005,
|
||||
OP_DELETE = 2006,
|
||||
OP_KILL_CURSORS = 2007
|
||||
OP_KILL_CURSORS = 2007,
|
||||
|
||||
/// Opcodes supported in MongoDB 5.1 and later
|
||||
OP_COMPRESSED = 2012,
|
||||
OP_MSG = 2013
|
||||
};
|
||||
|
||||
explicit MessageHeader(OpCode);
|
||||
|
@ -33,6 +33,13 @@
|
||||
//
|
||||
|
||||
|
||||
#if defined(_WIN32) && defined(POCO_DLL)
|
||||
# if defined(MongoDB_EXPORTS)
|
||||
# define MongoDB_API __declspec(dllexport)
|
||||
# else
|
||||
# define MongoDB_API __declspec(dllimport)
|
||||
# endif
|
||||
#endif
|
||||
|
||||
|
||||
#if !defined(MongoDB_API)
|
||||
@ -47,6 +54,11 @@
|
||||
//
|
||||
// Automatically link MongoDB library.
|
||||
//
|
||||
#if defined(_MSC_VER)
|
||||
# if !defined(POCO_NO_AUTOMATIC_LIBS) && !defined(MongoDB_EXPORTS)
|
||||
# pragma comment(lib, "PocoMongoDB" POCO_LIB_SUFFIX)
|
||||
# endif
|
||||
#endif
|
||||
|
||||
|
||||
#endif // MongoDBMongoDB_INCLUDED
|
||||
|
@ -44,7 +44,7 @@ namespace MongoDB
|
||||
/// as its value.
|
||||
{
|
||||
public:
|
||||
typedef SharedPtr<ObjectId> Ptr;
|
||||
using Ptr = SharedPtr<ObjectId>;
|
||||
|
||||
explicit ObjectId(const std::string & id);
|
||||
/// Creates an ObjectId from a string.
|
||||
|
96
base/poco/MongoDB/include/Poco/MongoDB/OpMsgCursor.h
Normal file
96
base/poco/MongoDB/include/Poco/MongoDB/OpMsgCursor.h
Normal file
@ -0,0 +1,96 @@
|
||||
//
|
||||
// OpMsgCursor.h
|
||||
//
|
||||
// Library: MongoDB
|
||||
// Package: MongoDB
|
||||
// Module: OpMsgCursor
|
||||
//
|
||||
// Definition of the OpMsgCursor class.
|
||||
//
|
||||
// Copyright (c) 2012, Applied Informatics Software Engineering GmbH.
|
||||
// and Contributors.
|
||||
//
|
||||
// SPDX-License-Identifier: BSL-1.0
|
||||
//
|
||||
|
||||
|
||||
#ifndef MongoDB_OpMsgCursor_INCLUDED
|
||||
#define MongoDB_OpMsgCursor_INCLUDED
|
||||
|
||||
|
||||
#include "Poco/MongoDB/Connection.h"
|
||||
#include "Poco/MongoDB/MongoDB.h"
|
||||
#include "Poco/MongoDB/OpMsgMessage.h"
|
||||
|
||||
namespace Poco
|
||||
{
|
||||
namespace MongoDB
|
||||
{
|
||||
|
||||
|
||||
class MongoDB_API OpMsgCursor : public Document
|
||||
/// OpMsgCursor is an helper class for querying multiple documents using OpMsgMessage.
|
||||
{
|
||||
public:
|
||||
OpMsgCursor(const std::string & dbname, const std::string & collectionName);
|
||||
/// Creates a OpMsgCursor for the given database and collection.
|
||||
|
||||
virtual ~OpMsgCursor();
|
||||
/// Destroys the OpMsgCursor.
|
||||
|
||||
void setEmptyFirstBatch(bool empty);
|
||||
/// Empty first batch is used to get error response faster with little server processing
|
||||
|
||||
bool emptyFirstBatch() const;
|
||||
|
||||
void setBatchSize(Int32 batchSize);
|
||||
/// Set non-default batch size
|
||||
|
||||
Int32 batchSize() const;
|
||||
/// Current batch size (zero or negative number indicates default batch size)
|
||||
|
||||
Int64 cursorID() const;
|
||||
|
||||
OpMsgMessage & next(Connection & connection);
|
||||
/// Tries to get the next documents. As long as response message has a
|
||||
/// cursor ID next can be called to retrieve the next bunch of documents.
|
||||
///
|
||||
/// The cursor must be killed (see kill()) when not all documents are needed.
|
||||
|
||||
OpMsgMessage & query();
|
||||
/// Returns the associated query.
|
||||
|
||||
void kill(Connection & connection);
|
||||
/// Kills the cursor and reset it so that it can be reused.
|
||||
|
||||
private:
|
||||
OpMsgMessage _query;
|
||||
OpMsgMessage _response;
|
||||
|
||||
bool _emptyFirstBatch{false};
|
||||
Int32 _batchSize{-1};
|
||||
/// Batch size used in the cursor. Zero or negative value means that default shall be used.
|
||||
|
||||
Int64 _cursorID{0};
|
||||
};
|
||||
|
||||
|
||||
//
|
||||
// inlines
|
||||
//
|
||||
inline OpMsgMessage & OpMsgCursor::query()
|
||||
{
|
||||
return _query;
|
||||
}
|
||||
|
||||
inline Int64 OpMsgCursor::cursorID() const
|
||||
{
|
||||
return _cursorID;
|
||||
}
|
||||
|
||||
|
||||
}
|
||||
} // namespace Poco::MongoDB
|
||||
|
||||
|
||||
#endif // MongoDB_OpMsgCursor_INCLUDED
|
163
base/poco/MongoDB/include/Poco/MongoDB/OpMsgMessage.h
Normal file
163
base/poco/MongoDB/include/Poco/MongoDB/OpMsgMessage.h
Normal file
@ -0,0 +1,163 @@
|
||||
//
|
||||
// OpMsgMessage.h
|
||||
//
|
||||
// Library: MongoDB
|
||||
// Package: MongoDB
|
||||
// Module: OpMsgMessage
|
||||
//
|
||||
// Definition of the OpMsgMessage class.
|
||||
//
|
||||
// Copyright (c) 2022, Applied Informatics Software Engineering GmbH.
|
||||
// and Contributors.
|
||||
//
|
||||
// SPDX-License-Identifier: BSL-1.0
|
||||
//
|
||||
|
||||
|
||||
#ifndef MongoDB_OpMsgMessage_INCLUDED
|
||||
#define MongoDB_OpMsgMessage_INCLUDED
|
||||
|
||||
|
||||
#include "Poco/MongoDB/Document.h"
|
||||
#include "Poco/MongoDB/Message.h"
|
||||
#include "Poco/MongoDB/MongoDB.h"
|
||||
|
||||
#include <string>
|
||||
|
||||
namespace Poco
|
||||
{
|
||||
namespace MongoDB
|
||||
{
|
||||
|
||||
|
||||
class MongoDB_API OpMsgMessage : public Message
|
||||
/// This class represents a request/response (OP_MSG) to send requests and receive responses to/from MongoDB.
|
||||
{
|
||||
public:
|
||||
// Constants for most often used MongoDB commands that can be sent using OP_MSG
|
||||
// For complete list see: https://www.mongodb.com/docs/manual/reference/command/
|
||||
|
||||
// Query and write
|
||||
static const std::string CMD_INSERT;
|
||||
static const std::string CMD_DELETE;
|
||||
static const std::string CMD_UPDATE;
|
||||
static const std::string CMD_FIND;
|
||||
static const std::string CMD_FIND_AND_MODIFY;
|
||||
static const std::string CMD_GET_MORE;
|
||||
|
||||
// Aggregation
|
||||
static const std::string CMD_AGGREGATE;
|
||||
static const std::string CMD_COUNT;
|
||||
static const std::string CMD_DISTINCT;
|
||||
static const std::string CMD_MAP_REDUCE;
|
||||
|
||||
// Replication and administration
|
||||
static const std::string CMD_HELLO;
|
||||
static const std::string CMD_REPL_SET_GET_STATUS;
|
||||
static const std::string CMD_REPL_SET_GET_CONFIG;
|
||||
|
||||
static const std::string CMD_CREATE;
|
||||
static const std::string CMD_CREATE_INDEXES;
|
||||
static const std::string CMD_DROP;
|
||||
static const std::string CMD_DROP_DATABASE;
|
||||
static const std::string CMD_KILL_CURSORS;
|
||||
static const std::string CMD_LIST_DATABASES;
|
||||
static const std::string CMD_LIST_INDEXES;
|
||||
|
||||
// Diagnostic
|
||||
static const std::string CMD_BUILD_INFO;
|
||||
static const std::string CMD_COLL_STATS;
|
||||
static const std::string CMD_DB_STATS;
|
||||
static const std::string CMD_HOST_INFO;
|
||||
|
||||
|
||||
enum Flags : UInt32
|
||||
{
|
||||
MSG_FLAGS_DEFAULT = 0,
|
||||
|
||||
MSG_CHECKSUM_PRESENT = (1 << 0),
|
||||
|
||||
MSG_MORE_TO_COME = (1 << 1),
|
||||
/// Sender will send another message and is not prepared for overlapping messages
|
||||
|
||||
MSG_EXHAUST_ALLOWED = (1 << 16)
|
||||
/// Client is prepared for multiple replies (using the moreToCome bit) to this request
|
||||
};
|
||||
|
||||
OpMsgMessage();
|
||||
/// Creates an OpMsgMessage for response.
|
||||
|
||||
OpMsgMessage(const std::string & databaseName, const std::string & collectionName, UInt32 flags = MSG_FLAGS_DEFAULT);
|
||||
/// Creates an OpMsgMessage for requests.
|
||||
|
||||
virtual ~OpMsgMessage();
|
||||
|
||||
const std::string & databaseName() const;
|
||||
|
||||
const std::string & collectionName() const;
|
||||
|
||||
void setCommandName(const std::string & command);
|
||||
/// Sets the command name and clears the command document
|
||||
|
||||
void setCursor(Poco::Int64 cursorID, Poco::Int32 batchSize = -1);
|
||||
/// Sets the command "getMore" for the cursor id with batch size (if it is not negative).
|
||||
|
||||
const std::string & commandName() const;
|
||||
/// Current command name.
|
||||
|
||||
void setAcknowledgedRequest(bool ack);
|
||||
/// Set false to create request that does not return response.
|
||||
/// It has effect only for commands that write or delete documents.
|
||||
/// Default is true (request returns acknowledge response).
|
||||
|
||||
bool acknowledgedRequest() const;
|
||||
|
||||
UInt32 flags() const;
|
||||
|
||||
Document & body();
|
||||
/// Access to body document.
|
||||
/// Additional query arguments shall be added after setting the command name.
|
||||
|
||||
const Document & body() const;
|
||||
|
||||
Document::Vector & documents();
|
||||
/// Documents prepared for request or retrieved in response.
|
||||
|
||||
const Document::Vector & documents() const;
|
||||
/// Documents prepared for request or retrieved in response.
|
||||
|
||||
bool responseOk() const;
|
||||
/// Reads "ok" status from the response message.
|
||||
|
||||
void clear();
|
||||
/// Clears the message.
|
||||
|
||||
void send(std::ostream & ostr);
|
||||
/// Writes the request to stream.
|
||||
|
||||
void read(std::istream & istr);
|
||||
/// Reads the response from the stream.
|
||||
|
||||
private:
|
||||
enum PayloadType : UInt8
|
||||
{
|
||||
PAYLOAD_TYPE_0 = 0,
|
||||
PAYLOAD_TYPE_1 = 1
|
||||
};
|
||||
|
||||
std::string _databaseName;
|
||||
std::string _collectionName;
|
||||
UInt32 _flags{MSG_FLAGS_DEFAULT};
|
||||
std::string _commandName;
|
||||
bool _acknowledged{true};
|
||||
|
||||
Document _body;
|
||||
Document::Vector _documents;
|
||||
};
|
||||
|
||||
|
||||
}
|
||||
} // namespace Poco::MongoDB
|
||||
|
||||
|
||||
#endif // MongoDB_OpMsgMessage_INCLUDED
|
@ -94,7 +94,23 @@ namespace MongoDB
|
||||
|
||||
operator Connection::Ptr() { return _connection; }
|
||||
|
||||
#if defined(POCO_ENABLE_CPP11)
|
||||
// Disable copy to prevent unwanted release of resources: C++11 way
|
||||
PooledConnection(const PooledConnection &) = delete;
|
||||
PooledConnection & operator=(const PooledConnection &) = delete;
|
||||
|
||||
// Enable move semantics
|
||||
PooledConnection(PooledConnection && other) = default;
|
||||
PooledConnection & operator=(PooledConnection &&) = default;
|
||||
#endif
|
||||
|
||||
private:
|
||||
#if !defined(POCO_ENABLE_CPP11)
|
||||
// Disable copy to prevent unwanted release of resources: pre C++11 way
|
||||
PooledConnection(const PooledConnection &);
|
||||
PooledConnection & operator=(const PooledConnection &);
|
||||
#endif
|
||||
|
||||
Poco::ObjectPool<Connection, Connection::Ptr> & _pool;
|
||||
Connection::Ptr _connection;
|
||||
};
|
||||
|
@ -33,7 +33,7 @@ namespace MongoDB
|
||||
/// Represents a regular expression in BSON format.
|
||||
{
|
||||
public:
|
||||
typedef SharedPtr<RegularExpression> Ptr;
|
||||
using Ptr = SharedPtr<RegularExpression>;
|
||||
|
||||
RegularExpression();
|
||||
/// Creates an empty RegularExpression.
|
||||
|
@ -38,6 +38,9 @@ namespace MongoDB
|
||||
ResponseMessage();
|
||||
/// Creates an empty ResponseMessage.
|
||||
|
||||
ResponseMessage(const Int64 & cursorID);
|
||||
/// Creates an ResponseMessage for existing cursor ID.
|
||||
|
||||
virtual ~ResponseMessage();
|
||||
/// Destroys the ResponseMessage.
|
||||
|
||||
|
@ -20,7 +20,7 @@ namespace Poco {
|
||||
namespace MongoDB {
|
||||
|
||||
|
||||
Array::Array():
|
||||
Array::Array():
|
||||
Document()
|
||||
{
|
||||
}
|
||||
@ -31,7 +31,7 @@ Array::~Array()
|
||||
}
|
||||
|
||||
|
||||
Element::Ptr Array::get(int pos) const
|
||||
Element::Ptr Array::get(std::size_t pos) const
|
||||
{
|
||||
std::string name = Poco::NumberFormatter::format(pos);
|
||||
return Document::get(name);
|
||||
|
@ -319,4 +319,30 @@ void Connection::sendRequest(RequestMessage& request, ResponseMessage& response)
|
||||
}
|
||||
|
||||
|
||||
void Connection::sendRequest(OpMsgMessage& request, OpMsgMessage& response)
|
||||
{
|
||||
Poco::Net::SocketOutputStream sos(_socket);
|
||||
request.send(sos);
|
||||
|
||||
response.clear();
|
||||
readResponse(response);
|
||||
}
|
||||
|
||||
|
||||
void Connection::sendRequest(OpMsgMessage& request)
|
||||
{
|
||||
request.setAcknowledgedRequest(false);
|
||||
Poco::Net::SocketOutputStream sos(_socket);
|
||||
request.send(sos);
|
||||
}
|
||||
|
||||
|
||||
void Connection::readResponse(OpMsgMessage& response)
|
||||
{
|
||||
Poco::Net::SocketInputStream sis(_socket);
|
||||
response.read(sis);
|
||||
}
|
||||
|
||||
|
||||
|
||||
} } // Poco::MongoDB
|
||||
|
@ -33,6 +33,12 @@ Cursor::Cursor(const std::string& fullCollectionName, QueryRequest::Flags flags)
|
||||
}
|
||||
|
||||
|
||||
Cursor::Cursor(const Document& aggregationResponse) :
|
||||
_query(aggregationResponse.get<Poco::MongoDB::Document::Ptr>("cursor")->get<std::string>("ns")),
|
||||
_response(aggregationResponse.get<Poco::MongoDB::Document::Ptr>("cursor")->get<Int64>("id"))
|
||||
{
|
||||
}
|
||||
|
||||
Cursor::~Cursor()
|
||||
{
|
||||
try
|
||||
|
@ -334,6 +334,50 @@ bool Database::authSCRAM(Connection& connection, const std::string& username, co
|
||||
}
|
||||
|
||||
|
||||
Document::Ptr Database::queryBuildInfo(Connection& connection) const
|
||||
{
|
||||
// build info can be issued on "config" system database
|
||||
Poco::SharedPtr<Poco::MongoDB::QueryRequest> request = createCommand();
|
||||
request->selector().add("buildInfo", 1);
|
||||
|
||||
Poco::MongoDB::ResponseMessage response;
|
||||
connection.sendRequest(*request, response);
|
||||
|
||||
Document::Ptr buildInfo;
|
||||
if ( response.documents().size() > 0 )
|
||||
{
|
||||
buildInfo = response.documents()[0];
|
||||
}
|
||||
else
|
||||
{
|
||||
throw Poco::ProtocolException("Didn't get a response from the buildinfo command");
|
||||
}
|
||||
return buildInfo;
|
||||
}
|
||||
|
||||
|
||||
Document::Ptr Database::queryServerHello(Connection& connection) const
|
||||
{
|
||||
// hello can be issued on "config" system database
|
||||
Poco::SharedPtr<Poco::MongoDB::QueryRequest> request = createCommand();
|
||||
request->selector().add("hello", 1);
|
||||
|
||||
Poco::MongoDB::ResponseMessage response;
|
||||
connection.sendRequest(*request, response);
|
||||
|
||||
Document::Ptr hello;
|
||||
if ( response.documents().size() > 0 )
|
||||
{
|
||||
hello = response.documents()[0];
|
||||
}
|
||||
else
|
||||
{
|
||||
throw Poco::ProtocolException("Didn't get a response from the hello command");
|
||||
}
|
||||
return hello;
|
||||
}
|
||||
|
||||
|
||||
Int64 Database::count(Connection& connection, const std::string& collectionName) const
|
||||
{
|
||||
Poco::SharedPtr<Poco::MongoDB::QueryRequest> countRequest = createCountRequest(collectionName);
|
||||
@ -390,7 +434,7 @@ Document::Ptr Database::getLastErrorDoc(Connection& connection) const
|
||||
{
|
||||
Document::Ptr errorDoc;
|
||||
|
||||
Poco::SharedPtr<Poco::MongoDB::QueryRequest> request = createQueryRequest("$cmd");
|
||||
Poco::SharedPtr<Poco::MongoDB::QueryRequest> request = createCommand();
|
||||
request->setNumberToReturn(1);
|
||||
request->selector().add("getLastError", 1);
|
||||
|
||||
@ -420,7 +464,7 @@ std::string Database::getLastError(Connection& connection) const
|
||||
|
||||
Poco::SharedPtr<Poco::MongoDB::QueryRequest> Database::createCountRequest(const std::string& collectionName) const
|
||||
{
|
||||
Poco::SharedPtr<Poco::MongoDB::QueryRequest> request = createQueryRequest("$cmd");
|
||||
Poco::SharedPtr<Poco::MongoDB::QueryRequest> request = createCommand();
|
||||
request->setNumberToReturn(1);
|
||||
request->selector().add("count", collectionName);
|
||||
return request;
|
||||
|
@ -20,8 +20,8 @@ namespace MongoDB {
|
||||
|
||||
|
||||
DeleteRequest::DeleteRequest(const std::string& collectionName, DeleteRequest::Flags flags):
|
||||
RequestMessage(MessageHeader::OP_DELETE),
|
||||
_flags(flags),
|
||||
RequestMessage(MessageHeader::OP_DELETE),
|
||||
_flags(flags),
|
||||
_fullCollectionName(collectionName),
|
||||
_selector()
|
||||
{
|
||||
|
@ -35,6 +35,14 @@ Document::~Document()
|
||||
}
|
||||
|
||||
|
||||
Array& Document::addNewArray(const std::string& name)
|
||||
{
|
||||
Array::Ptr newArray = new Array();
|
||||
add(name, newArray);
|
||||
return *newArray;
|
||||
}
|
||||
|
||||
|
||||
Element::Ptr Document::get(const std::string& name) const
|
||||
{
|
||||
Element::Ptr element;
|
||||
@ -84,7 +92,7 @@ void Document::read(BinaryReader& reader)
|
||||
while (type != '\0')
|
||||
{
|
||||
Element::Ptr element;
|
||||
|
||||
|
||||
std::string name = BSONReader(reader).readCString();
|
||||
|
||||
switch (type)
|
||||
@ -198,7 +206,7 @@ void Document::write(BinaryWriter& writer)
|
||||
else
|
||||
{
|
||||
std::stringstream sstream;
|
||||
Poco::BinaryWriter tempWriter(sstream);
|
||||
Poco::BinaryWriter tempWriter(sstream, BinaryWriter::LITTLE_ENDIAN_BYTE_ORDER);
|
||||
for (ElementSet::iterator it = _elements.begin(); it != _elements.end(); ++it)
|
||||
{
|
||||
tempWriter << static_cast<unsigned char>((*it)->type());
|
||||
@ -207,7 +215,7 @@ void Document::write(BinaryWriter& writer)
|
||||
element->write(tempWriter);
|
||||
}
|
||||
tempWriter.flush();
|
||||
|
||||
|
||||
Poco::Int32 len = static_cast<Poco::Int32>(5 + sstream.tellp()); /* 5 = sizeof(len) + 0-byte */
|
||||
writer << len;
|
||||
writer.writeRaw(sstream.str());
|
||||
|
@ -24,7 +24,7 @@ Element::Element(const std::string& name) : _name(name)
|
||||
}
|
||||
|
||||
|
||||
Element::~Element()
|
||||
Element::~Element()
|
||||
{
|
||||
}
|
||||
|
||||
|
@ -21,7 +21,7 @@ namespace MongoDB {
|
||||
|
||||
|
||||
GetMoreRequest::GetMoreRequest(const std::string& collectionName, Int64 cursorID):
|
||||
RequestMessage(MessageHeader::OP_GET_MORE),
|
||||
RequestMessage(MessageHeader::OP_GET_MORE),
|
||||
_fullCollectionName(collectionName),
|
||||
_numberToReturn(100),
|
||||
_cursorID(cursorID)
|
||||
|
@ -20,7 +20,7 @@ namespace MongoDB {
|
||||
|
||||
|
||||
InsertRequest::InsertRequest(const std::string& collectionName, Flags flags):
|
||||
RequestMessage(MessageHeader::OP_INSERT),
|
||||
RequestMessage(MessageHeader::OP_INSERT),
|
||||
_flags(flags),
|
||||
_fullCollectionName(collectionName)
|
||||
{
|
||||
|
@ -37,7 +37,7 @@ void KillCursorsRequest::buildRequest(BinaryWriter& writer)
|
||||
for (std::vector<Int64>::iterator it = _cursors.begin(); it != _cursors.end(); ++it)
|
||||
{
|
||||
writer << *it;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
|
@ -19,7 +19,7 @@ namespace Poco {
|
||||
namespace MongoDB {
|
||||
|
||||
|
||||
Message::Message(MessageHeader::OpCode opcode):
|
||||
Message::Message(MessageHeader::OpCode opcode):
|
||||
_header(opcode)
|
||||
{
|
||||
}
|
||||
|
@ -20,10 +20,10 @@ namespace Poco {
|
||||
namespace MongoDB {
|
||||
|
||||
|
||||
MessageHeader::MessageHeader(OpCode opCode):
|
||||
_messageLength(0),
|
||||
_requestID(0),
|
||||
_responseTo(0),
|
||||
MessageHeader::MessageHeader(OpCode opCode):
|
||||
_messageLength(0),
|
||||
_requestID(0),
|
||||
_responseTo(0),
|
||||
_opCode(opCode)
|
||||
{
|
||||
}
|
||||
@ -42,7 +42,7 @@ void MessageHeader::read(BinaryReader& reader)
|
||||
|
||||
Int32 opCode;
|
||||
reader >> opCode;
|
||||
_opCode = (OpCode) opCode;
|
||||
_opCode = static_cast<OpCode>(opCode);
|
||||
|
||||
if (!reader.good())
|
||||
{
|
||||
@ -56,7 +56,7 @@ void MessageHeader::write(BinaryWriter& writer)
|
||||
writer << _messageLength;
|
||||
writer << _requestID;
|
||||
writer << _responseTo;
|
||||
writer << (Int32) _opCode;
|
||||
writer << static_cast<Int32>(_opCode);
|
||||
}
|
||||
|
||||
|
||||
|
@ -32,7 +32,7 @@ ObjectId::ObjectId(const std::string& id)
|
||||
poco_assert_dbg(id.size() == 24);
|
||||
|
||||
const char* p = id.c_str();
|
||||
for (std::size_t i = 0; i < 12; ++i)
|
||||
for (std::size_t i = 0; i < 12; ++i)
|
||||
{
|
||||
_id[i] = fromHex(p);
|
||||
p += 2;
|
||||
|
187
base/poco/MongoDB/src/OpMsgCursor.cpp
Normal file
187
base/poco/MongoDB/src/OpMsgCursor.cpp
Normal file
@ -0,0 +1,187 @@
|
||||
//
|
||||
// OpMsgCursor.cpp
|
||||
//
|
||||
// Library: MongoDB
|
||||
// Package: MongoDB
|
||||
// Module: OpMsgCursor
|
||||
//
|
||||
// Copyright (c) 2022, Applied Informatics Software Engineering GmbH.
|
||||
// and Contributors.
|
||||
//
|
||||
// SPDX-License-Identifier: BSL-1.0
|
||||
//
|
||||
|
||||
|
||||
#include "Poco/MongoDB/OpMsgCursor.h"
|
||||
#include "Poco/MongoDB/Array.h"
|
||||
|
||||
//
|
||||
// NOTE:
|
||||
//
|
||||
// MongoDB specification indicates that the flag MSG_EXHAUST_ALLOWED shall be
|
||||
// used in the request when the receiver is ready to receive multiple messages
|
||||
// without sending additional requests in between. Sender (MongoDB) indicates
|
||||
// that more messages follow with flag MSG_MORE_TO_COME.
|
||||
//
|
||||
// It seems that this does not work properly. MSG_MORE_TO_COME is set and reading
|
||||
// next messages sometimes works, however often the data is missing in response
|
||||
// or the message header contains wrong message length and reading blocks.
|
||||
// Opcode in the header is correct.
|
||||
//
|
||||
// Using MSG_EXHAUST_ALLOWED is therefore currently disabled.
|
||||
//
|
||||
// It seems that related JIRA ticket is:
|
||||
//
|
||||
// https://jira.mongodb.org/browse/SERVER-57297
|
||||
//
|
||||
// https://github.com/mongodb/specifications/blob/master/source/message/OP_MSG.rst
|
||||
//
|
||||
|
||||
#define MONGODB_EXHAUST_ALLOWED_WORKS false
|
||||
|
||||
namespace Poco {
|
||||
namespace MongoDB {
|
||||
|
||||
|
||||
static const std::string keyCursor {"cursor"};
|
||||
static const std::string keyFirstBatch {"firstBatch"};
|
||||
static const std::string keyNextBatch {"nextBatch"};
|
||||
|
||||
static Poco::Int64 cursorIdFromResponse(const MongoDB::Document& doc);
|
||||
|
||||
|
||||
OpMsgCursor::OpMsgCursor(const std::string& db, const std::string& collection):
|
||||
#if MONGODB_EXHAUST_ALLOWED_WORKS
|
||||
_query(db, collection, OpMsgMessage::MSG_EXHAUST_ALLOWED)
|
||||
#else
|
||||
_query(db, collection)
|
||||
#endif
|
||||
{
|
||||
}
|
||||
|
||||
OpMsgCursor::~OpMsgCursor()
|
||||
{
|
||||
try
|
||||
{
|
||||
poco_assert_dbg(_cursorID == 0);
|
||||
}
|
||||
catch (...)
|
||||
{
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
void OpMsgCursor::setEmptyFirstBatch(bool empty)
|
||||
{
|
||||
_emptyFirstBatch = empty;
|
||||
}
|
||||
|
||||
|
||||
bool OpMsgCursor::emptyFirstBatch() const
|
||||
{
|
||||
return _emptyFirstBatch;
|
||||
}
|
||||
|
||||
|
||||
void OpMsgCursor::setBatchSize(Int32 batchSize)
|
||||
{
|
||||
_batchSize = batchSize;
|
||||
}
|
||||
|
||||
|
||||
Int32 OpMsgCursor::batchSize() const
|
||||
{
|
||||
return _batchSize;
|
||||
}
|
||||
|
||||
|
||||
OpMsgMessage& OpMsgCursor::next(Connection& connection)
|
||||
{
|
||||
if (_cursorID == 0)
|
||||
{
|
||||
_response.clear();
|
||||
|
||||
if (_emptyFirstBatch || _batchSize > 0)
|
||||
{
|
||||
Int32 bsize = _emptyFirstBatch ? 0 : _batchSize;
|
||||
if (_query.commandName() == OpMsgMessage::CMD_FIND)
|
||||
{
|
||||
_query.body().add("batchSize", bsize);
|
||||
}
|
||||
else if (_query.commandName() == OpMsgMessage::CMD_AGGREGATE)
|
||||
{
|
||||
auto& cursorDoc = _query.body().addNewDocument("cursor");
|
||||
cursorDoc.add("batchSize", bsize);
|
||||
}
|
||||
}
|
||||
|
||||
connection.sendRequest(_query, _response);
|
||||
|
||||
const auto& rdoc = _response.body();
|
||||
_cursorID = cursorIdFromResponse(rdoc);
|
||||
}
|
||||
else
|
||||
{
|
||||
#if MONGODB_EXHAUST_ALLOWED_WORKS
|
||||
std::cout << "Response flags: " << _response.flags() << std::endl;
|
||||
if (_response.flags() & OpMsgMessage::MSG_MORE_TO_COME)
|
||||
{
|
||||
std::cout << "More to come. Reading more response: " << std::endl;
|
||||
_response.clear();
|
||||
connection.readResponse(_response);
|
||||
}
|
||||
else
|
||||
#endif
|
||||
{
|
||||
_response.clear();
|
||||
_query.setCursor(_cursorID, _batchSize);
|
||||
connection.sendRequest(_query, _response);
|
||||
}
|
||||
}
|
||||
|
||||
const auto& rdoc = _response.body();
|
||||
_cursorID = cursorIdFromResponse(rdoc);
|
||||
|
||||
return _response;
|
||||
}
|
||||
|
||||
|
||||
void OpMsgCursor::kill(Connection& connection)
|
||||
{
|
||||
_response.clear();
|
||||
if (_cursorID != 0)
|
||||
{
|
||||
_query.setCommandName(OpMsgMessage::CMD_KILL_CURSORS);
|
||||
|
||||
MongoDB::Array::Ptr cursors = new MongoDB::Array();
|
||||
cursors->add<Poco::Int64>(_cursorID);
|
||||
_query.body().add("cursors", cursors);
|
||||
|
||||
connection.sendRequest(_query, _response);
|
||||
|
||||
const auto killed = _response.body().get<MongoDB::Array::Ptr>("cursorsKilled", nullptr);
|
||||
if (!killed || killed->size() != 1 || killed->get<Poco::Int64>(0, -1) != _cursorID)
|
||||
{
|
||||
throw Poco::ProtocolException("Cursor not killed as expected: " + std::to_string(_cursorID));
|
||||
}
|
||||
|
||||
_cursorID = 0;
|
||||
_query.clear();
|
||||
_response.clear();
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
Poco::Int64 cursorIdFromResponse(const MongoDB::Document& doc)
|
||||
{
|
||||
Poco::Int64 id {0};
|
||||
auto cursorDoc = doc.get<Document::Ptr>(keyCursor, nullptr);
|
||||
if(cursorDoc)
|
||||
{
|
||||
id = cursorDoc->get<Poco::Int64>("id", 0);
|
||||
}
|
||||
return id;
|
||||
}
|
||||
|
||||
|
||||
} } // Namespace Poco::MongoDB
|
412
base/poco/MongoDB/src/OpMsgMessage.cpp
Normal file
412
base/poco/MongoDB/src/OpMsgMessage.cpp
Normal file
@ -0,0 +1,412 @@
|
||||
//
|
||||
// OpMsgMessage.cpp
|
||||
//
|
||||
// Library: MongoDB
|
||||
// Package: MongoDB
|
||||
// Module: OpMsgMessage
|
||||
//
|
||||
// Copyright (c) 2022, Applied Informatics Software Engineering GmbH.
|
||||
// and Contributors.
|
||||
//
|
||||
// SPDX-License-Identifier: BSL-1.0
|
||||
//
|
||||
|
||||
#include "Poco/MongoDB/OpMsgMessage.h"
|
||||
#include "Poco/MongoDB/MessageHeader.h"
|
||||
#include "Poco/MongoDB/Array.h"
|
||||
#include "Poco/StreamCopier.h"
|
||||
#include "Poco/Logger.h"
|
||||
|
||||
#define POCO_MONGODB_DUMP false
|
||||
|
||||
namespace Poco {
|
||||
namespace MongoDB {
|
||||
|
||||
// Query and write
|
||||
const std::string OpMsgMessage::CMD_INSERT { "insert" };
|
||||
const std::string OpMsgMessage::CMD_DELETE { "delete" };
|
||||
const std::string OpMsgMessage::CMD_UPDATE { "update" };
|
||||
const std::string OpMsgMessage::CMD_FIND { "find" };
|
||||
const std::string OpMsgMessage::CMD_FIND_AND_MODIFY { "findAndModify" };
|
||||
const std::string OpMsgMessage::CMD_GET_MORE { "getMore" };
|
||||
|
||||
// Aggregation
|
||||
const std::string OpMsgMessage::CMD_AGGREGATE { "aggregate" };
|
||||
const std::string OpMsgMessage::CMD_COUNT { "count" };
|
||||
const std::string OpMsgMessage::CMD_DISTINCT { "distinct" };
|
||||
const std::string OpMsgMessage::CMD_MAP_REDUCE { "mapReduce" };
|
||||
|
||||
// Replication and administration
|
||||
const std::string OpMsgMessage::CMD_HELLO { "hello" };
|
||||
const std::string OpMsgMessage::CMD_REPL_SET_GET_STATUS { "replSetGetStatus" };
|
||||
const std::string OpMsgMessage::CMD_REPL_SET_GET_CONFIG { "replSetGetConfig" };
|
||||
|
||||
const std::string OpMsgMessage::CMD_CREATE { "create" };
|
||||
const std::string OpMsgMessage::CMD_CREATE_INDEXES { "createIndexes" };
|
||||
const std::string OpMsgMessage::CMD_DROP { "drop" };
|
||||
const std::string OpMsgMessage::CMD_DROP_DATABASE { "dropDatabase" };
|
||||
const std::string OpMsgMessage::CMD_KILL_CURSORS { "killCursors" };
|
||||
const std::string OpMsgMessage::CMD_LIST_DATABASES { "listDatabases" };
|
||||
const std::string OpMsgMessage::CMD_LIST_INDEXES { "listIndexes" };
|
||||
|
||||
// Diagnostic
|
||||
const std::string OpMsgMessage::CMD_BUILD_INFO { "buildInfo" };
|
||||
const std::string OpMsgMessage::CMD_COLL_STATS { "collStats" };
|
||||
const std::string OpMsgMessage::CMD_DB_STATS { "dbStats" };
|
||||
const std::string OpMsgMessage::CMD_HOST_INFO { "hostInfo" };
|
||||
|
||||
|
||||
static const std::string& commandIdentifier(const std::string& command);
|
||||
/// Commands have different names for the payload that is sent in a separate section
|
||||
|
||||
|
||||
static const std::string keyCursor {"cursor"};
|
||||
static const std::string keyFirstBatch {"firstBatch"};
|
||||
static const std::string keyNextBatch {"nextBatch"};
|
||||
|
||||
|
||||
OpMsgMessage::OpMsgMessage() :
|
||||
Message(MessageHeader::OP_MSG)
|
||||
{
|
||||
}
|
||||
|
||||
|
||||
OpMsgMessage::OpMsgMessage(const std::string& databaseName, const std::string& collectionName, UInt32 flags) :
|
||||
Message(MessageHeader::OP_MSG),
|
||||
_databaseName(databaseName),
|
||||
_collectionName(collectionName),
|
||||
_flags(flags)
|
||||
{
|
||||
}
|
||||
|
||||
|
||||
OpMsgMessage::~OpMsgMessage()
|
||||
{
|
||||
}
|
||||
|
||||
const std::string& OpMsgMessage::databaseName() const
|
||||
{
|
||||
return _databaseName;
|
||||
}
|
||||
|
||||
|
||||
const std::string& OpMsgMessage::collectionName() const
|
||||
{
|
||||
return _collectionName;
|
||||
}
|
||||
|
||||
|
||||
void OpMsgMessage::setCommandName(const std::string& command)
|
||||
{
|
||||
_commandName = command;
|
||||
_body.clear();
|
||||
|
||||
// IMPORTANT: Command name must be first
|
||||
if (_collectionName.empty())
|
||||
{
|
||||
// Collection is not specified. It is assumed that this particular command does
|
||||
// not need it.
|
||||
_body.add(_commandName, Int32(1));
|
||||
}
|
||||
else
|
||||
{
|
||||
_body.add(_commandName, _collectionName);
|
||||
}
|
||||
_body.add("$db", _databaseName);
|
||||
}
|
||||
|
||||
|
||||
void OpMsgMessage::setCursor(Poco::Int64 cursorID, Poco::Int32 batchSize)
|
||||
{
|
||||
_commandName = OpMsgMessage::CMD_GET_MORE;
|
||||
_body.clear();
|
||||
|
||||
// IMPORTANT: Command name must be first
|
||||
_body.add(_commandName, cursorID);
|
||||
_body.add("$db", _databaseName);
|
||||
_body.add("collection", _collectionName);
|
||||
if (batchSize > 0)
|
||||
{
|
||||
_body.add("batchSize", batchSize);
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
const std::string& OpMsgMessage::commandName() const
|
||||
{
|
||||
return _commandName;
|
||||
}
|
||||
|
||||
|
||||
void OpMsgMessage::setAcknowledgedRequest(bool ack)
|
||||
{
|
||||
const auto& id = commandIdentifier(_commandName);
|
||||
if (id.empty())
|
||||
return;
|
||||
|
||||
_acknowledged = ack;
|
||||
|
||||
auto writeConcern = _body.get<Document::Ptr>("writeConcern", nullptr);
|
||||
if (writeConcern)
|
||||
writeConcern->remove("w");
|
||||
|
||||
if (ack)
|
||||
{
|
||||
_flags = _flags & (~MSG_MORE_TO_COME);
|
||||
}
|
||||
else
|
||||
{
|
||||
_flags = _flags | MSG_MORE_TO_COME;
|
||||
if (!writeConcern)
|
||||
_body.addNewDocument("writeConcern").add("w", 0);
|
||||
else
|
||||
writeConcern->add("w", 0);
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
|
||||
bool OpMsgMessage::acknowledgedRequest() const
|
||||
{
|
||||
return _acknowledged;
|
||||
}
|
||||
|
||||
|
||||
UInt32 OpMsgMessage::flags() const
|
||||
{
|
||||
return _flags;
|
||||
}
|
||||
|
||||
|
||||
Document& OpMsgMessage::body()
|
||||
{
|
||||
return _body;
|
||||
}
|
||||
|
||||
|
||||
const Document& OpMsgMessage::body() const
|
||||
{
|
||||
return _body;
|
||||
}
|
||||
|
||||
|
||||
Document::Vector& OpMsgMessage::documents()
|
||||
{
|
||||
return _documents;
|
||||
}
|
||||
|
||||
|
||||
const Document::Vector& OpMsgMessage::documents() const
|
||||
{
|
||||
return _documents;
|
||||
}
|
||||
|
||||
|
||||
bool OpMsgMessage::responseOk() const
|
||||
{
|
||||
Poco::Int64 ok {false};
|
||||
if (_body.exists("ok"))
|
||||
{
|
||||
ok = _body.getInteger("ok");
|
||||
}
|
||||
return (ok != 0);
|
||||
}
|
||||
|
||||
|
||||
void OpMsgMessage::clear()
|
||||
{
|
||||
_flags = MSG_FLAGS_DEFAULT;
|
||||
_commandName.clear();
|
||||
_body.clear();
|
||||
_documents.clear();
|
||||
}
|
||||
|
||||
|
||||
void OpMsgMessage::send(std::ostream& ostr)
|
||||
{
|
||||
BinaryWriter socketWriter(ostr, BinaryWriter::LITTLE_ENDIAN_BYTE_ORDER);
|
||||
|
||||
// Serialise the body
|
||||
std::stringstream ss;
|
||||
BinaryWriter writer(ss, BinaryWriter::LITTLE_ENDIAN_BYTE_ORDER);
|
||||
writer << _flags;
|
||||
|
||||
writer << PAYLOAD_TYPE_0;
|
||||
_body.write(writer);
|
||||
|
||||
if (!_documents.empty())
|
||||
{
|
||||
// Serialise attached documents
|
||||
|
||||
std::stringstream ssdoc;
|
||||
BinaryWriter wdoc(ssdoc, BinaryWriter::LITTLE_ENDIAN_BYTE_ORDER);
|
||||
for (auto& doc: _documents)
|
||||
{
|
||||
doc->write(wdoc);
|
||||
}
|
||||
wdoc.flush();
|
||||
|
||||
const std::string& identifier = commandIdentifier(_commandName);
|
||||
const Poco::Int32 size = static_cast<Poco::Int32>(sizeof(size) + identifier.size() + 1 + ssdoc.tellp());
|
||||
writer << PAYLOAD_TYPE_1;
|
||||
writer << size;
|
||||
writer.writeCString(identifier.c_str());
|
||||
StreamCopier::copyStream(ssdoc, ss);
|
||||
}
|
||||
writer.flush();
|
||||
|
||||
#if POCO_MONGODB_DUMP
|
||||
const std::string section = ss.str();
|
||||
std::string dump;
|
||||
Logger::formatDump(dump, section.data(), section.length());
|
||||
std::cout << dump << std::endl;
|
||||
#endif
|
||||
|
||||
messageLength(static_cast<Poco::Int32>(ss.tellp()));
|
||||
|
||||
_header.write(socketWriter);
|
||||
StreamCopier::copyStream(ss, ostr);
|
||||
|
||||
ostr.flush();
|
||||
}
|
||||
|
||||
|
||||
void OpMsgMessage::read(std::istream& istr)
|
||||
{
|
||||
std::string message;
|
||||
{
|
||||
BinaryReader reader(istr, BinaryReader::LITTLE_ENDIAN_BYTE_ORDER);
|
||||
_header.read(reader);
|
||||
|
||||
poco_assert_dbg(_header.opCode() == _header.OP_MSG);
|
||||
|
||||
const std::streamsize remainingSize {_header.getMessageLength() - _header.MSG_HEADER_SIZE };
|
||||
message.reserve(remainingSize);
|
||||
|
||||
#if POCO_MONGODB_DUMP
|
||||
std::cout
|
||||
<< "Message hdr: " << _header.getMessageLength() << " " << remainingSize << " "
|
||||
<< _header.opCode() << " " << _header.getRequestID() << " " << _header.responseTo()
|
||||
<< std::endl;
|
||||
#endif
|
||||
|
||||
reader.readRaw(remainingSize, message);
|
||||
|
||||
#if POCO_MONGODB_DUMP
|
||||
std::string dump;
|
||||
Logger::formatDump(dump, message.data(), message.length());
|
||||
std::cout << dump << std::endl;
|
||||
#endif
|
||||
}
|
||||
// Read complete message and then interpret it.
|
||||
|
||||
std::istringstream msgss(message);
|
||||
BinaryReader reader(msgss, BinaryReader::LITTLE_ENDIAN_BYTE_ORDER);
|
||||
|
||||
Poco::UInt8 payloadType {0xFF};
|
||||
|
||||
reader >> _flags;
|
||||
reader >> payloadType;
|
||||
poco_assert_dbg(payloadType == PAYLOAD_TYPE_0);
|
||||
|
||||
_body.read(reader);
|
||||
|
||||
// Read next sections from the buffer
|
||||
while (msgss.good())
|
||||
{
|
||||
// NOTE: Not tested yet with database, because it returns everything in the body.
|
||||
// Does MongoDB ever return documents as Payload type 1?
|
||||
reader >> payloadType;
|
||||
if (!msgss.good())
|
||||
{
|
||||
break;
|
||||
}
|
||||
poco_assert_dbg(payloadType == PAYLOAD_TYPE_1);
|
||||
#if POCO_MONGODB_DUMP
|
||||
std::cout << "section payload: " << payloadType << std::endl;
|
||||
#endif
|
||||
|
||||
Poco::Int32 sectionSize {0};
|
||||
reader >> sectionSize;
|
||||
poco_assert_dbg(sectionSize > 0);
|
||||
|
||||
#if POCO_MONGODB_DUMP
|
||||
std::cout << "section size: " << sectionSize << std::endl;
|
||||
#endif
|
||||
std::streamoff offset = sectionSize - sizeof(sectionSize);
|
||||
std::streampos endOfSection = msgss.tellg() + offset;
|
||||
|
||||
std::string identifier;
|
||||
reader.readCString(identifier);
|
||||
#if POCO_MONGODB_DUMP
|
||||
std::cout << "section identifier: " << identifier << std::endl;
|
||||
#endif
|
||||
|
||||
// Loop to read documents from this section.
|
||||
while (msgss.tellg() < endOfSection)
|
||||
{
|
||||
#if POCO_MONGODB_DUMP
|
||||
std::cout << "section doc: " << msgss.tellg() << " " << endOfSection << std::endl;
|
||||
#endif
|
||||
Document::Ptr doc = new Document();
|
||||
doc->read(reader);
|
||||
_documents.push_back(doc);
|
||||
if (msgss.tellg() < 0)
|
||||
{
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Extract documents from the cursor batch if they are there.
|
||||
MongoDB::Array::Ptr batch;
|
||||
auto curDoc = _body.get<MongoDB::Document::Ptr>(keyCursor, nullptr);
|
||||
if (curDoc)
|
||||
{
|
||||
batch = curDoc->get<MongoDB::Array::Ptr>(keyFirstBatch, nullptr);
|
||||
if (!batch)
|
||||
{
|
||||
batch = curDoc->get<MongoDB::Array::Ptr>(keyNextBatch, nullptr);
|
||||
}
|
||||
}
|
||||
if (batch)
|
||||
{
|
||||
for(std::size_t i = 0; i < batch->size(); i++)
|
||||
{
|
||||
const auto& d = batch->get<MongoDB::Document::Ptr>(i, nullptr);
|
||||
if (d)
|
||||
{
|
||||
_documents.push_back(d);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
const std::string& commandIdentifier(const std::string& command)
|
||||
{
|
||||
// Names of identifiers for commands that send bulk documents in the request
|
||||
// The identifier is set in the section type 1.
|
||||
static std::map<std::string, std::string> identifiers {
|
||||
{ OpMsgMessage::CMD_INSERT, "documents" },
|
||||
{ OpMsgMessage::CMD_DELETE, "deletes" },
|
||||
{ OpMsgMessage::CMD_UPDATE, "updates" },
|
||||
|
||||
// Not sure if create index can send document section
|
||||
{ OpMsgMessage::CMD_CREATE_INDEXES, "indexes" }
|
||||
};
|
||||
|
||||
const auto i = identifiers.find(command);
|
||||
if (i != identifiers.end())
|
||||
{
|
||||
return i->second;
|
||||
}
|
||||
|
||||
// This likely means that documents are incorrectly set for a command
|
||||
// that does not send list of documents in section type 1.
|
||||
static const std::string emptyIdentifier;
|
||||
return emptyIdentifier;
|
||||
}
|
||||
|
||||
|
||||
} } // namespace Poco::MongoDB
|
@ -20,10 +20,10 @@ namespace MongoDB {
|
||||
|
||||
|
||||
QueryRequest::QueryRequest(const std::string& collectionName, QueryRequest::Flags flags):
|
||||
RequestMessage(MessageHeader::OP_QUERY),
|
||||
_flags(flags),
|
||||
RequestMessage(MessageHeader::OP_QUERY),
|
||||
_flags(flags),
|
||||
_fullCollectionName(collectionName),
|
||||
_numberToSkip(0),
|
||||
_numberToSkip(0),
|
||||
_numberToReturn(100),
|
||||
_selector(),
|
||||
_returnFieldSelector()
|
||||
|
@ -25,8 +25,8 @@ RegularExpression::RegularExpression()
|
||||
}
|
||||
|
||||
|
||||
RegularExpression::RegularExpression(const std::string& pattern, const std::string& options):
|
||||
_pattern(pattern),
|
||||
RegularExpression::RegularExpression(const std::string& pattern, const std::string& options):
|
||||
_pattern(pattern),
|
||||
_options(options)
|
||||
{
|
||||
}
|
||||
|
@ -21,7 +21,7 @@ namespace Poco {
|
||||
namespace MongoDB {
|
||||
|
||||
|
||||
ReplicaSet::ReplicaSet(const std::vector<Net::SocketAddress> &addresses):
|
||||
ReplicaSet::ReplicaSet(const std::vector<Net::SocketAddress> &addresses):
|
||||
_addresses(addresses)
|
||||
{
|
||||
}
|
||||
@ -81,8 +81,8 @@ Connection::Ptr ReplicaSet::isMaster(const Net::SocketAddress& address)
|
||||
{
|
||||
conn = 0;
|
||||
}
|
||||
|
||||
return 0;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
|
||||
|
@ -21,7 +21,7 @@ namespace Poco {
|
||||
namespace MongoDB {
|
||||
|
||||
|
||||
RequestMessage::RequestMessage(MessageHeader::OpCode opcode):
|
||||
RequestMessage::RequestMessage(MessageHeader::OpCode opcode):
|
||||
Message(opcode)
|
||||
{
|
||||
}
|
||||
@ -35,7 +35,7 @@ RequestMessage::~RequestMessage()
|
||||
void RequestMessage::send(std::ostream& ostr)
|
||||
{
|
||||
std::stringstream ss;
|
||||
BinaryWriter requestWriter(ss);
|
||||
BinaryWriter requestWriter(ss, BinaryWriter::LITTLE_ENDIAN_BYTE_ORDER);
|
||||
buildRequest(requestWriter);
|
||||
requestWriter.flush();
|
||||
|
||||
|
@ -21,10 +21,20 @@ namespace MongoDB {
|
||||
|
||||
|
||||
ResponseMessage::ResponseMessage():
|
||||
Message(MessageHeader::OP_REPLY),
|
||||
_responseFlags(0),
|
||||
_cursorID(0),
|
||||
_startingFrom(0),
|
||||
Message(MessageHeader::OP_REPLY),
|
||||
_responseFlags(0),
|
||||
_cursorID(0),
|
||||
_startingFrom(0),
|
||||
_numberReturned(0)
|
||||
{
|
||||
}
|
||||
|
||||
|
||||
ResponseMessage::ResponseMessage(const Int64& cursorID):
|
||||
Message(MessageHeader::OP_REPLY),
|
||||
_responseFlags(0),
|
||||
_cursorID(cursorID),
|
||||
_startingFrom(0),
|
||||
_numberReturned(0)
|
||||
{
|
||||
}
|
||||
@ -50,7 +60,7 @@ void ResponseMessage::read(std::istream& istr)
|
||||
clear();
|
||||
|
||||
BinaryReader reader(istr, BinaryReader::LITTLE_ENDIAN_BYTE_ORDER);
|
||||
|
||||
|
||||
_header.read(reader);
|
||||
|
||||
reader >> _responseFlags;
|
||||
|
@ -20,7 +20,7 @@ namespace MongoDB {
|
||||
|
||||
|
||||
UpdateRequest::UpdateRequest(const std::string& collectionName, UpdateRequest::Flags flags):
|
||||
RequestMessage(MessageHeader::OP_UPDATE),
|
||||
RequestMessage(MessageHeader::OP_UPDATE),
|
||||
_flags(flags),
|
||||
_fullCollectionName(collectionName),
|
||||
_selector(),
|
||||
|
4
contrib/CMakeLists.txt
vendored
4
contrib/CMakeLists.txt
vendored
@ -146,7 +146,7 @@ add_contrib (amqpcpp-cmake AMQP-CPP) # requires: libuv
|
||||
add_contrib (cassandra-cmake cassandra) # requires: libuv
|
||||
if (NOT OS_DARWIN)
|
||||
add_contrib (curl-cmake curl)
|
||||
add_contrib (azure-cmake azure)
|
||||
add_contrib (azure-cmake azure) # requires: curl
|
||||
add_contrib (sentry-native-cmake sentry-native) # requires: curl
|
||||
endif()
|
||||
add_contrib (fmtlib-cmake fmtlib)
|
||||
@ -157,7 +157,7 @@ add_contrib (librdkafka-cmake librdkafka) # requires: libgsasl
|
||||
add_contrib (nats-io-cmake nats-io)
|
||||
add_contrib (isa-l-cmake isa-l)
|
||||
add_contrib (libhdfs3-cmake libhdfs3) # requires: google-protobuf, krb5, isa-l
|
||||
add_contrib (hive-metastore-cmake hive-metastore) # requires: thrift/avro/arrow/libhdfs3
|
||||
add_contrib (hive-metastore-cmake hive-metastore) # requires: thrift, avro, arrow, libhdfs3
|
||||
add_contrib (cppkafka-cmake cppkafka)
|
||||
add_contrib (libpqxx-cmake libpqxx)
|
||||
add_contrib (libpq-cmake libpq)
|
||||
|
2
contrib/NuRaft
vendored
2
contrib/NuRaft
vendored
@ -1 +1 @@
|
||||
Subproject commit b56784be1aec568fb72aff47f281097c017623cb
|
||||
Subproject commit 491eaf592d950e0e37accbe8b3f217e068c9fecf
|
@ -116,43 +116,79 @@ configure_file("${ORC_SOURCE_SRC_DIR}/Adaptor.hh.in" "${ORC_BUILD_INCLUDE_DIR}/A
|
||||
# ARROW_ORC + adapters/orc/CMakefiles
|
||||
set(ORC_SRCS
|
||||
"${CMAKE_CURRENT_BINARY_DIR}/orc_proto.pb.h"
|
||||
"${ORC_SOURCE_SRC_DIR}/sargs/ExpressionTree.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/sargs/Literal.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/sargs/PredicateLeaf.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/sargs/SargsApplier.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/sargs/SearchArgument.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/sargs/TruthValue.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/Exceptions.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/OrcFile.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/Reader.cc"
|
||||
"${ORC_ADDITION_SOURCE_DIR}/orc_proto.pb.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/Adaptor.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/Adaptor.hh.in"
|
||||
"${ORC_SOURCE_SRC_DIR}/BlockBuffer.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/BlockBuffer.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/BloomFilter.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/BloomFilter.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/Bpacking.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/BpackingDefault.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/BpackingDefault.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/ByteRLE.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/ByteRLE.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/CMakeLists.txt"
|
||||
"${ORC_SOURCE_SRC_DIR}/ColumnPrinter.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/ColumnReader.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/ColumnReader.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/ColumnWriter.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/ColumnWriter.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/Common.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/Compression.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/Compression.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/ConvertColumnReader.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/ConvertColumnReader.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/CpuInfoUtil.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/CpuInfoUtil.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/Dispatch.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/Exceptions.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/Int128.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/LzoDecompressor.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/LzoDecompressor.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/MemoryPool.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/Murmur3.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/Murmur3.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/Options.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/OrcFile.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/RLE.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/RLE.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/RLEV2Util.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/RLEV2Util.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/RLEv1.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/RLEv1.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/RLEv2.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/Reader.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/Reader.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/RleDecoderV2.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/RleEncoderV2.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/RLEV2Util.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/SchemaEvolution.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/SchemaEvolution.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/Statistics.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/Statistics.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/StripeStream.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/StripeStream.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/Timezone.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/Timezone.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/TypeImpl.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/TypeImpl.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/Utils.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/Vector.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/Writer.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/Adaptor.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/BloomFilter.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/Murmur3.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/BlockBuffer.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/wrap/orc-proto-wrapper.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/io/InputStream.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/io/InputStream.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/io/OutputStream.cc"
|
||||
"${ORC_ADDITION_SOURCE_DIR}/orc_proto.pb.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/io/OutputStream.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/sargs/ExpressionTree.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/sargs/ExpressionTree.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/sargs/Literal.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/sargs/PredicateLeaf.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/sargs/PredicateLeaf.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/sargs/SargsApplier.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/sargs/SargsApplier.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/sargs/SearchArgument.cc"
|
||||
"${ORC_SOURCE_SRC_DIR}/sargs/SearchArgument.hh"
|
||||
"${ORC_SOURCE_SRC_DIR}/sargs/TruthValue.cc"
|
||||
)
|
||||
|
||||
add_library(_orc ${ORC_SRCS})
|
||||
|
@ -1,6 +1,6 @@
|
||||
option (ENABLE_AZURE_BLOB_STORAGE "Enable Azure blob storage" ${ENABLE_LIBRARIES})
|
||||
|
||||
if (NOT ENABLE_AZURE_BLOB_STORAGE OR BUILD_STANDALONE_KEEPER OR OS_FREEBSD OR (NOT ARCH_AMD64))
|
||||
if (NOT ENABLE_AZURE_BLOB_STORAGE OR BUILD_STANDALONE_KEEPER OR OS_FREEBSD)
|
||||
message(STATUS "Not using Azure blob storage")
|
||||
return()
|
||||
endif()
|
||||
|
2
contrib/capnproto
vendored
2
contrib/capnproto
vendored
@ -1 +1 @@
|
||||
Subproject commit dc8b50b999777bcb23c89bb5907c785c3f654441
|
||||
Subproject commit 976209a6d18074804f60d18ef99b6a809d27dadf
|
@ -4,7 +4,7 @@ if (SANITIZE OR NOT (
|
||||
))
|
||||
if (ENABLE_JEMALLOC)
|
||||
message (${RECONFIGURE_MESSAGE_LEVEL}
|
||||
"jemalloc is disabled implicitly: it doesn't work with sanitizers and can only be used with x86_64, aarch64, or ppc64le Linux or FreeBSD builds and RelWithDebInfo macOS builds.")
|
||||
"jemalloc is disabled implicitly: it doesn't work with sanitizers and can only be used with x86_64, aarch64, or ppc64le Linux or FreeBSD builds and RelWithDebInfo macOS builds. Use -DENABLE_JEMALLOC=0")
|
||||
endif ()
|
||||
set (ENABLE_JEMALLOC OFF)
|
||||
else ()
|
||||
|
@ -1,11 +1,11 @@
|
||||
if(NOT ARCH_AARCH64 AND NOT OS_FREEBSD AND NOT APPLE AND NOT ARCH_PPC64LE AND NOT ARCH_S390X)
|
||||
if(NOT OS_FREEBSD AND NOT APPLE AND NOT ARCH_PPC64LE AND NOT ARCH_S390X)
|
||||
option(ENABLE_HDFS "Enable HDFS" ${ENABLE_LIBRARIES})
|
||||
elseif(ENABLE_HDFS)
|
||||
message (${RECONFIGURE_MESSAGE_LEVEL} "Cannot use HDFS3 with current configuration")
|
||||
endif()
|
||||
|
||||
if(NOT ENABLE_HDFS)
|
||||
message(STATUS "Not using hdfs")
|
||||
message(STATUS "Not using HDFS")
|
||||
return()
|
||||
endif()
|
||||
|
||||
|
2
contrib/orc
vendored
2
contrib/orc
vendored
@ -1 +1 @@
|
||||
Subproject commit c5d7755ba0b9a95631c8daea4d094101f26ec761
|
||||
Subproject commit 568d1d60c250af1890f226c182bc15bd8cc94cf1
|
@ -32,7 +32,7 @@ RUN arch=${TARGETARCH:-amd64} \
|
||||
esac
|
||||
|
||||
ARG REPOSITORY="https://s3.amazonaws.com/clickhouse-builds/22.4/31c367d3cd3aefd316778601ff6565119fe36682/package_release"
|
||||
ARG VERSION="23.5.1.3174"
|
||||
ARG VERSION="23.5.3.24"
|
||||
ARG PACKAGES="clickhouse-keeper"
|
||||
|
||||
# user/group precreated explicitly with fixed uid/gid on purpose.
|
||||
|
@ -33,7 +33,7 @@ RUN arch=${TARGETARCH:-amd64} \
|
||||
# lts / testing / prestable / etc
|
||||
ARG REPO_CHANNEL="stable"
|
||||
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
|
||||
ARG VERSION="23.5.1.3174"
|
||||
ARG VERSION="23.5.3.24"
|
||||
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
|
||||
|
||||
# user/group precreated explicitly with fixed uid/gid on purpose.
|
||||
|
@ -22,7 +22,7 @@ RUN sed -i "s|http://archive.ubuntu.com|${apt_archive}|g" /etc/apt/sources.list
|
||||
|
||||
ARG REPO_CHANNEL="stable"
|
||||
ARG REPOSITORY="deb https://packages.clickhouse.com/deb ${REPO_CHANNEL} main"
|
||||
ARG VERSION="23.5.1.3174"
|
||||
ARG VERSION="23.5.3.24"
|
||||
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
|
||||
|
||||
# set non-empty deb_location_url url to create a docker image
|
||||
|
@ -16,6 +16,12 @@ For more information and documentation see https://clickhouse.com/.
|
||||
- The tag `head` is built from the latest commit to the default branch.
|
||||
- Each tag has optional `-alpine` suffix to reflect that it's built on top of `alpine`.
|
||||
|
||||
### Compatibility
|
||||
|
||||
- The amd64 image requires support for [SSE3 instructions](https://en.wikipedia.org/wiki/SSE3). Virtually all x86 CPUs after 2005 support SSE3.
|
||||
- The arm64 image requires support for the [ARMv8.2-A architecture](https://en.wikipedia.org/wiki/AArch64#ARMv8.2-A). Most ARM CPUs after 2017 support ARMv8.2-A. A notable exception is Raspberry Pi 4 from 2019 whose CPU only supports ARMv8.0-A.
|
||||
- Since the Clickhouse 23.3 Ubuntu image started using `ubuntu:22.04` as its base image, it requires docker version >= `20.10.10`, or use `docker run -- privileged` instead. Alternatively, try the Clickhouse Alpine image.
|
||||
|
||||
## How to use this image
|
||||
|
||||
### start server instance
|
||||
|
@ -12,10 +12,10 @@ RUN apt-get update --yes && \
|
||||
# We need to get the repository's HEAD each time despite, so we invalidate layers' cache
|
||||
ARG CACHE_INVALIDATOR=0
|
||||
RUN mkdir /sqlancer && \
|
||||
wget -q -O- https://github.com/sqlancer/sqlancer/archive/master.tar.gz | \
|
||||
wget -q -O- https://github.com/sqlancer/sqlancer/archive/main.tar.gz | \
|
||||
tar zx -C /sqlancer && \
|
||||
cd /sqlancer/sqlancer-master && \
|
||||
mvn package -DskipTests && \
|
||||
cd /sqlancer/sqlancer-main && \
|
||||
mvn --no-transfer-progress package -DskipTests && \
|
||||
rm -r /root/.m2
|
||||
|
||||
COPY run.sh /
|
||||
|
@ -16,7 +16,6 @@ def process_result(result_folder):
|
||||
"TLPGroupBy",
|
||||
"TLPHaving",
|
||||
"TLPWhere",
|
||||
"TLPWhereGroupBy",
|
||||
"NoREC",
|
||||
]
|
||||
failed_tests = []
|
||||
|
@ -33,7 +33,7 @@ cd /workspace
|
||||
|
||||
for _ in $(seq 1 60); do if [[ $(wget -q 'localhost:8123' -O-) == 'Ok.' ]]; then break ; else sleep 1; fi ; done
|
||||
|
||||
cd /sqlancer/sqlancer-master
|
||||
cd /sqlancer/sqlancer-main
|
||||
|
||||
TIMEOUT=300
|
||||
NUM_QUERIES=1000
|
||||
|
@ -15,6 +15,9 @@ dpkg -i package_folder/clickhouse-client_*.deb
|
||||
|
||||
ln -s /usr/share/clickhouse-test/clickhouse-test /usr/bin/clickhouse-test
|
||||
|
||||
# shellcheck disable=SC1091
|
||||
source /usr/share/clickhouse-test/ci/attach_gdb.lib || true # FIXME: to not break old builds, clean on 2023-09-01
|
||||
|
||||
# install test configs
|
||||
/usr/share/clickhouse-test/config/install.sh
|
||||
|
||||
@ -85,6 +88,8 @@ fi
|
||||
|
||||
sleep 5
|
||||
|
||||
attach_gdb_to_clickhouse || true # FIXME: to not break old builds, clean on 2023-09-01
|
||||
|
||||
function run_tests()
|
||||
{
|
||||
set -x
|
||||
|
@ -59,14 +59,9 @@ install_packages previous_release_package_folder
|
||||
# available for dump via clickhouse-local
|
||||
configure
|
||||
|
||||
# local_blob_storage disk type does not exist in older versions
|
||||
sudo cat /etc/clickhouse-server/config.d/storage_conf.xml \
|
||||
| sed "s|<type>local_blob_storage</type>|<type>local</type>|" \
|
||||
> /etc/clickhouse-server/config.d/storage_conf.xml.tmp
|
||||
sudo mv /etc/clickhouse-server/config.d/storage_conf.xml.tmp /etc/clickhouse-server/config.d/storage_conf.xml
|
||||
|
||||
# it contains some new settings, but we can safely remove it
|
||||
rm /etc/clickhouse-server/config.d/merge_tree.xml
|
||||
rm /etc/clickhouse-server/users.d/nonconst_timezone.xml
|
||||
|
||||
start
|
||||
stop
|
||||
@ -92,13 +87,9 @@ export USE_S3_STORAGE_FOR_MERGE_TREE=1
|
||||
export ZOOKEEPER_FAULT_INJECTION=0
|
||||
configure
|
||||
|
||||
sudo cat /etc/clickhouse-server/config.d/storage_conf.xml \
|
||||
| sed "s|<type>local_blob_storage</type>|<type>local</type>|" \
|
||||
> /etc/clickhouse-server/config.d/storage_conf.xml.tmp
|
||||
sudo mv /etc/clickhouse-server/config.d/storage_conf.xml.tmp /etc/clickhouse-server/config.d/storage_conf.xml
|
||||
|
||||
# it contains some new settings, but we can safely remove it
|
||||
rm /etc/clickhouse-server/config.d/merge_tree.xml
|
||||
rm /etc/clickhouse-server/users.d/nonconst_timezone.xml
|
||||
|
||||
start
|
||||
|
||||
@ -128,6 +119,13 @@ mv /var/log/clickhouse-server/clickhouse-server.log /var/log/clickhouse-server/c
|
||||
install_packages package_folder
|
||||
export ZOOKEEPER_FAULT_INJECTION=1
|
||||
configure
|
||||
|
||||
# Just in case previous version left some garbage in zk
|
||||
sudo cat /etc/clickhouse-server/config.d/lost_forever_check.xml \
|
||||
| sed "s|>1<|>0<|g" \
|
||||
> /etc/clickhouse-server/config.d/lost_forever_check.xml.tmp
|
||||
sudo mv /etc/clickhouse-server/config.d/lost_forever_check.xml.tmp /etc/clickhouse-server/config.d/lost_forever_check.xml
|
||||
|
||||
start 500
|
||||
clickhouse-client --query "SELECT 'Server successfully started', 'OK', NULL, ''" >> /test_output/test_results.tsv \
|
||||
|| (rg --text "<Error>.*Application" /var/log/clickhouse-server/clickhouse-server.log > /test_output/application_errors.txt \
|
||||
|
32
docs/changelogs/v22.8.18.31-lts.md
Normal file
32
docs/changelogs/v22.8.18.31-lts.md
Normal file
@ -0,0 +1,32 @@
|
||||
---
|
||||
sidebar_position: 1
|
||||
sidebar_label: 2023
|
||||
---
|
||||
|
||||
# 2023 Changelog
|
||||
|
||||
### ClickHouse release v22.8.18.31-lts (4de7a95a544) FIXME as compared to v22.8.17.17-lts (df7f2ef0b41)
|
||||
|
||||
#### Performance Improvement
|
||||
* Backported in [#49214](https://github.com/ClickHouse/ClickHouse/issues/49214): Fixed excessive reading in queries with `FINAL`. [#47801](https://github.com/ClickHouse/ClickHouse/pull/47801) ([Nikita Taranov](https://github.com/nickitat)).
|
||||
|
||||
#### Build/Testing/Packaging Improvement
|
||||
* Backported in [#49079](https://github.com/ClickHouse/ClickHouse/issues/49079): Update time zones. The following were updated: Africa/Cairo, Africa/Casablanca, Africa/El_Aaiun, America/Bogota, America/Cambridge_Bay, America/Ciudad_Juarez, America/Godthab, America/Inuvik, America/Iqaluit, America/Nuuk, America/Ojinaga, America/Pangnirtung, America/Rankin_Inlet, America/Resolute, America/Whitehorse, America/Yellowknife, Asia/Gaza, Asia/Hebron, Asia/Kuala_Lumpur, Asia/Singapore, Canada/Yukon, Egypt, Europe/Kirov, Europe/Volgograd, Singapore. [#48572](https://github.com/ClickHouse/ClickHouse/pull/48572) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
|
||||
#### Bug Fix (user-visible misbehavior in an official stable release)
|
||||
|
||||
* Fix bad cast from LowCardinality column when using short circuit function execution [#43311](https://github.com/ClickHouse/ClickHouse/pull/43311) ([Kruglov Pavel](https://github.com/Avogar)).
|
||||
* Fix msan issue in randomStringUTF8(<uneven number>) [#49750](https://github.com/ClickHouse/ClickHouse/pull/49750) ([Robert Schulze](https://github.com/rschu1ze)).
|
||||
* JIT compilation not equals NaN fix [#50056](https://github.com/ClickHouse/ClickHouse/pull/50056) ([Maksim Kita](https://github.com/kitaisreal)).
|
||||
* Fix crash with `multiIf` and constant condition and nullable arguments [#50123](https://github.com/ClickHouse/ClickHouse/pull/50123) ([Anton Popov](https://github.com/CurtizJ)).
|
||||
* Fixed type conversion from Date/Date32 to DateTime64 when querying with DateTime64 index [#50280](https://github.com/ClickHouse/ClickHouse/pull/50280) ([Lucas Chang](https://github.com/lucas-tubi)).
|
||||
* Fix Keeper deadlock on exception when preprocessing requests. [#50387](https://github.com/ClickHouse/ClickHouse/pull/50387) ([frinkr](https://github.com/frinkr)).
|
||||
* Fix Log family table return wrong rows count after truncate [#50585](https://github.com/ClickHouse/ClickHouse/pull/50585) ([flynn](https://github.com/ucasfl)).
|
||||
* Do not read all the columns from right GLOBAL JOIN table. [#50721](https://github.com/ClickHouse/ClickHouse/pull/50721) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
|
||||
#### NOT FOR CHANGELOG / INSIGNIFICANT
|
||||
|
||||
* Improve test reports [#49151](https://github.com/ClickHouse/ClickHouse/pull/49151) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
|
||||
* Update github.com/distribution/distribution [#50114](https://github.com/ClickHouse/ClickHouse/pull/50114) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
|
||||
* Catch issues with dockerd during the build [#50700](https://github.com/ClickHouse/ClickHouse/pull/50700) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
|
||||
|
19
docs/changelogs/v22.8.19.10-lts.md
Normal file
19
docs/changelogs/v22.8.19.10-lts.md
Normal file
@ -0,0 +1,19 @@
|
||||
---
|
||||
sidebar_position: 1
|
||||
sidebar_label: 2023
|
||||
---
|
||||
|
||||
# 2023 Changelog
|
||||
|
||||
### ClickHouse release v22.8.19.10-lts (989bc2fe8b0) FIXME as compared to v22.8.18.31-lts (4de7a95a544)
|
||||
|
||||
#### Bug Fix (user-visible misbehavior in an official stable release)
|
||||
|
||||
* Fix subtly broken copy-on-write of ColumnLowCardinality dictionary [#51064](https://github.com/ClickHouse/ClickHouse/pull/51064) ([Michael Kolupaev](https://github.com/al13n321)).
|
||||
* Generate safe IVs [#51086](https://github.com/ClickHouse/ClickHouse/pull/51086) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
|
||||
|
||||
#### NOT FOR CHANGELOG / INSIGNIFICANT
|
||||
|
||||
* Fix a versions' tweak for tagged commits, improve version_helper [#51035](https://github.com/ClickHouse/ClickHouse/pull/51035) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
|
||||
* Sqlancer has changed master to main [#51060](https://github.com/ClickHouse/ClickHouse/pull/51060) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
|
||||
|
45
docs/changelogs/v23.3.3.52-lts.md
Normal file
45
docs/changelogs/v23.3.3.52-lts.md
Normal file
@ -0,0 +1,45 @@
|
||||
---
|
||||
sidebar_position: 1
|
||||
sidebar_label: 2023
|
||||
---
|
||||
|
||||
# 2023 Changelog
|
||||
|
||||
### ClickHouse release v23.3.3.52-lts (cb963c474db) FIXME as compared to v23.3.2.37-lts (1b144bcd101)
|
||||
|
||||
#### Improvement
|
||||
* Backported in [#49954](https://github.com/ClickHouse/ClickHouse/issues/49954): Add support for (an unusual) case where the arguments in the `IN` operator are single-element tuples. [#49844](https://github.com/ClickHouse/ClickHouse/pull/49844) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
|
||||
|
||||
#### Build/Testing/Packaging Improvement
|
||||
* Backported in [#49210](https://github.com/ClickHouse/ClickHouse/issues/49210): Fix glibc compatibility check: replace `preadv` from musl. [#49144](https://github.com/ClickHouse/ClickHouse/pull/49144) ([alesapin](https://github.com/alesapin)).
|
||||
|
||||
#### Bug Fix (user-visible misbehavior in an official stable release)
|
||||
|
||||
* Fix aggregate empty string error [#48999](https://github.com/ClickHouse/ClickHouse/pull/48999) ([LiuNeng](https://github.com/liuneng1994)).
|
||||
* Fix key not found error for queries with multiple StorageJoin [#49137](https://github.com/ClickHouse/ClickHouse/pull/49137) ([vdimir](https://github.com/vdimir)).
|
||||
* Fix race on Outdated parts loading [#49223](https://github.com/ClickHouse/ClickHouse/pull/49223) ([Alexander Tokmakov](https://github.com/tavplubix)).
|
||||
* Fix bug in DISTINCT [#49628](https://github.com/ClickHouse/ClickHouse/pull/49628) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Fix msan issue in randomStringUTF8(<uneven number>) [#49750](https://github.com/ClickHouse/ClickHouse/pull/49750) ([Robert Schulze](https://github.com/rschu1ze)).
|
||||
* fix `is_prefix` in OptimizeRegularExpression [#49919](https://github.com/ClickHouse/ClickHouse/pull/49919) ([Han Fei](https://github.com/hanfei1991)).
|
||||
* Fix IPv6 encoding in protobuf [#49933](https://github.com/ClickHouse/ClickHouse/pull/49933) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
|
||||
* Avoid deadlock when starting table in attach thread of `ReplicatedMergeTree` [#50026](https://github.com/ClickHouse/ClickHouse/pull/50026) ([Antonio Andelic](https://github.com/antonio2368)).
|
||||
* JIT compilation not equals NaN fix [#50056](https://github.com/ClickHouse/ClickHouse/pull/50056) ([Maksim Kita](https://github.com/kitaisreal)).
|
||||
* Fix crash with `multiIf` and constant condition and nullable arguments [#50123](https://github.com/ClickHouse/ClickHouse/pull/50123) ([Anton Popov](https://github.com/CurtizJ)).
|
||||
* Fix reconnecting of HTTPS session when target host IP was changed [#50240](https://github.com/ClickHouse/ClickHouse/pull/50240) ([Aleksei Filatov](https://github.com/aalexfvk)).
|
||||
* Fixed type conversion from Date/Date32 to DateTime64 when querying with DateTime64 index [#50280](https://github.com/ClickHouse/ClickHouse/pull/50280) ([Lucas Chang](https://github.com/lucas-tubi)).
|
||||
* Fix Keeper deadlock on exception when preprocessing requests. [#50387](https://github.com/ClickHouse/ClickHouse/pull/50387) ([frinkr](https://github.com/frinkr)).
|
||||
* Fix incorrect constant folding [#50536](https://github.com/ClickHouse/ClickHouse/pull/50536) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Fix Log family table return wrong rows count after truncate [#50585](https://github.com/ClickHouse/ClickHouse/pull/50585) ([flynn](https://github.com/ucasfl)).
|
||||
* Fix bug in `uniqExact` parallel merging [#50590](https://github.com/ClickHouse/ClickHouse/pull/50590) ([Nikita Taranov](https://github.com/nickitat)).
|
||||
* Do not read all the columns from right GLOBAL JOIN table. [#50721](https://github.com/ClickHouse/ClickHouse/pull/50721) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
|
||||
#### NOT FOR CHANGELOG / INSIGNIFICANT
|
||||
|
||||
* Implement status comment [#48468](https://github.com/ClickHouse/ClickHouse/pull/48468) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
|
||||
* Update curl to 8.0.1 (for CVEs) [#48765](https://github.com/ClickHouse/ClickHouse/pull/48765) ([Boris Kuschel](https://github.com/bkuschel)).
|
||||
* Improve test reports [#49151](https://github.com/ClickHouse/ClickHouse/pull/49151) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
|
||||
* Fallback auth gh api [#49314](https://github.com/ClickHouse/ClickHouse/pull/49314) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
|
||||
* Improve CI: status commit, auth for get_gh_api [#49388](https://github.com/ClickHouse/ClickHouse/pull/49388) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
|
||||
* Update github.com/distribution/distribution [#50114](https://github.com/ClickHouse/ClickHouse/pull/50114) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
|
||||
* Catch issues with dockerd during the build [#50700](https://github.com/ClickHouse/ClickHouse/pull/50700) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
|
||||
|
22
docs/changelogs/v23.3.4.17-lts.md
Normal file
22
docs/changelogs/v23.3.4.17-lts.md
Normal file
@ -0,0 +1,22 @@
|
||||
---
|
||||
sidebar_position: 1
|
||||
sidebar_label: 2023
|
||||
---
|
||||
|
||||
# 2023 Changelog
|
||||
|
||||
### ClickHouse release v23.3.4.17-lts (2c99b73ff40) FIXME as compared to v23.3.3.52-lts (cb963c474db)
|
||||
|
||||
#### Bug Fix (user-visible misbehavior in an official stable release)
|
||||
|
||||
* Fix crash when Pool::Entry::disconnect() is called [#50334](https://github.com/ClickHouse/ClickHouse/pull/50334) ([Val Doroshchuk](https://github.com/valbok)).
|
||||
* Avoid storing logs in Keeper containing unknown operation [#50751](https://github.com/ClickHouse/ClickHouse/pull/50751) ([Antonio Andelic](https://github.com/antonio2368)).
|
||||
* Fix subtly broken copy-on-write of ColumnLowCardinality dictionary [#51064](https://github.com/ClickHouse/ClickHouse/pull/51064) ([Michael Kolupaev](https://github.com/al13n321)).
|
||||
* Generate safe IVs [#51086](https://github.com/ClickHouse/ClickHouse/pull/51086) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
|
||||
|
||||
#### NOT FOR CHANGELOG / INSIGNIFICANT
|
||||
|
||||
* Don't mark a part as broken on `Poco::TimeoutException` [#50811](https://github.com/ClickHouse/ClickHouse/pull/50811) ([Alexander Tokmakov](https://github.com/tavplubix)).
|
||||
* Fix a versions' tweak for tagged commits, improve version_helper [#51035](https://github.com/ClickHouse/ClickHouse/pull/51035) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
|
||||
* Sqlancer has changed master to main [#51060](https://github.com/ClickHouse/ClickHouse/pull/51060) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
|
||||
|
19
docs/changelogs/v23.3.5.9-lts.md
Normal file
19
docs/changelogs/v23.3.5.9-lts.md
Normal file
@ -0,0 +1,19 @@
|
||||
---
|
||||
sidebar_position: 1
|
||||
sidebar_label: 2023
|
||||
---
|
||||
|
||||
# 2023 Changelog
|
||||
|
||||
### ClickHouse release v23.3.5.9-lts (f5fbc2fd2b3) FIXME as compared to v23.3.4.17-lts (2c99b73ff40)
|
||||
|
||||
#### Bug Fix (user-visible misbehavior in an official stable release)
|
||||
|
||||
* Fix broken index analysis when binary operator contains a null constant argument [#50177](https://github.com/ClickHouse/ClickHouse/pull/50177) ([Amos Bird](https://github.com/amosbird)).
|
||||
* Cleanup moving parts [#50489](https://github.com/ClickHouse/ClickHouse/pull/50489) ([vdimir](https://github.com/vdimir)).
|
||||
* Do not apply projection if read-in-order was enabled. [#50923](https://github.com/ClickHouse/ClickHouse/pull/50923) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
|
||||
#### NOT FOR CHANGELOG / INSIGNIFICANT
|
||||
|
||||
* Increase max array size in group bitmap [#50620](https://github.com/ClickHouse/ClickHouse/pull/50620) ([Kruglov Pavel](https://github.com/Avogar)).
|
||||
|
42
docs/changelogs/v23.4.3.48-stable.md
Normal file
42
docs/changelogs/v23.4.3.48-stable.md
Normal file
@ -0,0 +1,42 @@
|
||||
---
|
||||
sidebar_position: 1
|
||||
sidebar_label: 2023
|
||||
---
|
||||
|
||||
# 2023 Changelog
|
||||
|
||||
### ClickHouse release v23.4.3.48-stable (d9199f8d3cc) FIXME as compared to v23.4.2.11-stable (b6442320f9d)
|
||||
|
||||
#### Backward Incompatible Change
|
||||
* Backported in [#49981](https://github.com/ClickHouse/ClickHouse/issues/49981): Revert "`groupArray` returns cannot be nullable" (due to binary compatibility breakage for `groupArray`/`groupArrayLast`/`groupArraySample` over `Nullable` types, which likely will lead to `TOO_LARGE_ARRAY_SIZE` or `CANNOT_READ_ALL_DATA`). [#49971](https://github.com/ClickHouse/ClickHouse/pull/49971) ([Azat Khuzhin](https://github.com/azat)).
|
||||
|
||||
#### Bug Fix (user-visible misbehavior in an official stable release)
|
||||
|
||||
* Fix key not found error for queries with multiple StorageJoin [#49137](https://github.com/ClickHouse/ClickHouse/pull/49137) ([vdimir](https://github.com/vdimir)).
|
||||
* Fix fuzz bug when subquery set is not built when reading from remote() [#49425](https://github.com/ClickHouse/ClickHouse/pull/49425) ([Alexander Gololobov](https://github.com/davenger)).
|
||||
* Fix postgres database setting [#49481](https://github.com/ClickHouse/ClickHouse/pull/49481) ([Mal Curtis](https://github.com/snikch)).
|
||||
* Fix AsynchronousReadIndirectBufferFromRemoteFS breaking on short seeks [#49525](https://github.com/ClickHouse/ClickHouse/pull/49525) ([Michael Kolupaev](https://github.com/al13n321)).
|
||||
* Fix bug in DISTINCT [#49628](https://github.com/ClickHouse/ClickHouse/pull/49628) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
|
||||
* Fix assert in SpanHolder::finish() with fibers [#49673](https://github.com/ClickHouse/ClickHouse/pull/49673) ([Kruglov Pavel](https://github.com/Avogar)).
|
||||
* Fix msan issue in randomStringUTF8(<uneven number>) [#49750](https://github.com/ClickHouse/ClickHouse/pull/49750) ([Robert Schulze](https://github.com/rschu1ze)).
|
||||
* fix `is_prefix` in OptimizeRegularExpression [#49919](https://github.com/ClickHouse/ClickHouse/pull/49919) ([Han Fei](https://github.com/hanfei1991)).
|
||||
* Fix IPv6 encoding in protobuf [#49933](https://github.com/ClickHouse/ClickHouse/pull/49933) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
|
||||
* Avoid deadlock when starting table in attach thread of `ReplicatedMergeTree` [#50026](https://github.com/ClickHouse/ClickHouse/pull/50026) ([Antonio Andelic](https://github.com/antonio2368)).
|
||||
* Fix assert in SpanHolder::finish() with fibers attempt 2 [#50034](https://github.com/ClickHouse/ClickHouse/pull/50034) ([Kruglov Pavel](https://github.com/Avogar)).
|
||||
* JIT compilation not equals NaN fix [#50056](https://github.com/ClickHouse/ClickHouse/pull/50056) ([Maksim Kita](https://github.com/kitaisreal)).
|
||||
* Fix crashing in case of Replicated database without arguments [#50058](https://github.com/ClickHouse/ClickHouse/pull/50058) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Fix crash with `multiIf` and constant condition and nullable arguments [#50123](https://github.com/ClickHouse/ClickHouse/pull/50123) ([Anton Popov](https://github.com/CurtizJ)).
|
||||
* Fix iceberg metadata parsing [#50232](https://github.com/ClickHouse/ClickHouse/pull/50232) ([Kseniia Sumarokova](https://github.com/kssenii)).
|
||||
* Fix bugs in Poco sockets in non-blocking mode, use true non-blocking sockets [#50252](https://github.com/ClickHouse/ClickHouse/pull/50252) ([Kruglov Pavel](https://github.com/Avogar)).
|
||||
* Fixed type conversion from Date/Date32 to DateTime64 when querying with DateTime64 index [#50280](https://github.com/ClickHouse/ClickHouse/pull/50280) ([Lucas Chang](https://github.com/lucas-tubi)).
|
||||
* Fix Keeper deadlock on exception when preprocessing requests. [#50387](https://github.com/ClickHouse/ClickHouse/pull/50387) ([frinkr](https://github.com/frinkr)).
|
||||
* Fix Log family table return wrong rows count after truncate [#50585](https://github.com/ClickHouse/ClickHouse/pull/50585) ([flynn](https://github.com/ucasfl)).
|
||||
* Fix bug in `uniqExact` parallel merging [#50590](https://github.com/ClickHouse/ClickHouse/pull/50590) ([Nikita Taranov](https://github.com/nickitat)).
|
||||
* Do not read all the columns from right GLOBAL JOIN table. [#50721](https://github.com/ClickHouse/ClickHouse/pull/50721) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
|
||||
#### NOT FOR CHANGELOG / INSIGNIFICANT
|
||||
|
||||
* Improve CI: status commit, auth for get_gh_api [#49388](https://github.com/ClickHouse/ClickHouse/pull/49388) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
|
||||
* Update github.com/distribution/distribution [#50114](https://github.com/ClickHouse/ClickHouse/pull/50114) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
|
||||
* Catch issues with dockerd during the build [#50700](https://github.com/ClickHouse/ClickHouse/pull/50700) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
|
||||
|
22
docs/changelogs/v23.4.4.16-stable.md
Normal file
22
docs/changelogs/v23.4.4.16-stable.md
Normal file
@ -0,0 +1,22 @@
|
||||
---
|
||||
sidebar_position: 1
|
||||
sidebar_label: 2023
|
||||
---
|
||||
|
||||
# 2023 Changelog
|
||||
|
||||
### ClickHouse release v23.4.4.16-stable (747ba4fc6a0) FIXME as compared to v23.4.3.48-stable (d9199f8d3cc)
|
||||
|
||||
#### Bug Fix (user-visible misbehavior in an official stable release)
|
||||
|
||||
* Fix crash when Pool::Entry::disconnect() is called [#50334](https://github.com/ClickHouse/ClickHouse/pull/50334) ([Val Doroshchuk](https://github.com/valbok)).
|
||||
* Fix iceberg V2 optional metadata parsing [#50974](https://github.com/ClickHouse/ClickHouse/pull/50974) ([Kseniia Sumarokova](https://github.com/kssenii)).
|
||||
* Fix subtly broken copy-on-write of ColumnLowCardinality dictionary [#51064](https://github.com/ClickHouse/ClickHouse/pull/51064) ([Michael Kolupaev](https://github.com/al13n321)).
|
||||
* Generate safe IVs [#51086](https://github.com/ClickHouse/ClickHouse/pull/51086) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
|
||||
|
||||
#### NOT FOR CHANGELOG / INSIGNIFICANT
|
||||
|
||||
* Don't mark a part as broken on `Poco::TimeoutException` [#50811](https://github.com/ClickHouse/ClickHouse/pull/50811) ([Alexander Tokmakov](https://github.com/tavplubix)).
|
||||
* Fix a versions' tweak for tagged commits, improve version_helper [#51035](https://github.com/ClickHouse/ClickHouse/pull/51035) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
|
||||
* Sqlancer has changed master to main [#51060](https://github.com/ClickHouse/ClickHouse/pull/51060) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
|
||||
|
18
docs/changelogs/v23.5.2.7-stable.md
Normal file
18
docs/changelogs/v23.5.2.7-stable.md
Normal file
@ -0,0 +1,18 @@
|
||||
---
|
||||
sidebar_position: 1
|
||||
sidebar_label: 2023
|
||||
---
|
||||
|
||||
# 2023 Changelog
|
||||
|
||||
### ClickHouse release v23.5.2.7-stable (5751aa1ab9f) FIXME as compared to v23.5.1.3174-stable (2fec796e73e)
|
||||
|
||||
#### Bug Fix (user-visible misbehavior in an official stable release)
|
||||
|
||||
* Do not read all the columns from right GLOBAL JOIN table. [#50721](https://github.com/ClickHouse/ClickHouse/pull/50721) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
|
||||
#### NOT FOR CHANGELOG / INSIGNIFICANT
|
||||
|
||||
* Fix build for aarch64 (temporary disable azure) [#50770](https://github.com/ClickHouse/ClickHouse/pull/50770) ([alesapin](https://github.com/alesapin)).
|
||||
* Rename azure_blob_storage to azureBlobStorage [#50812](https://github.com/ClickHouse/ClickHouse/pull/50812) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
|
||||
|
26
docs/changelogs/v23.5.3.24-stable.md
Normal file
26
docs/changelogs/v23.5.3.24-stable.md
Normal file
@ -0,0 +1,26 @@
|
||||
---
|
||||
sidebar_position: 1
|
||||
sidebar_label: 2023
|
||||
---
|
||||
|
||||
# 2023 Changelog
|
||||
|
||||
### ClickHouse release v23.5.3.24-stable (76f54616d3b) FIXME as compared to v23.5.2.7-stable (5751aa1ab9f)
|
||||
|
||||
#### Bug Fix (user-visible misbehavior in an official stable release)
|
||||
|
||||
* Fix Log family table return wrong rows count after truncate [#50585](https://github.com/ClickHouse/ClickHouse/pull/50585) ([flynn](https://github.com/ucasfl)).
|
||||
* Fix bug in `uniqExact` parallel merging [#50590](https://github.com/ClickHouse/ClickHouse/pull/50590) ([Nikita Taranov](https://github.com/nickitat)).
|
||||
* Revert recent grace hash join changes [#50699](https://github.com/ClickHouse/ClickHouse/pull/50699) ([vdimir](https://github.com/vdimir)).
|
||||
* Avoid storing logs in Keeper containing unknown operation [#50751](https://github.com/ClickHouse/ClickHouse/pull/50751) ([Antonio Andelic](https://github.com/antonio2368)).
|
||||
* Add compat setting for non-const timezones [#50834](https://github.com/ClickHouse/ClickHouse/pull/50834) ([Robert Schulze](https://github.com/rschu1ze)).
|
||||
* Fix iceberg V2 optional metadata parsing [#50974](https://github.com/ClickHouse/ClickHouse/pull/50974) ([Kseniia Sumarokova](https://github.com/kssenii)).
|
||||
* Fix subtly broken copy-on-write of ColumnLowCardinality dictionary [#51064](https://github.com/ClickHouse/ClickHouse/pull/51064) ([Michael Kolupaev](https://github.com/al13n321)).
|
||||
* Generate safe IVs [#51086](https://github.com/ClickHouse/ClickHouse/pull/51086) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
|
||||
|
||||
#### NOT FOR CHANGELOG / INSIGNIFICANT
|
||||
|
||||
* Don't mark a part as broken on `Poco::TimeoutException` [#50811](https://github.com/ClickHouse/ClickHouse/pull/50811) ([Alexander Tokmakov](https://github.com/tavplubix)).
|
||||
* Fix a versions' tweak for tagged commits, improve version_helper [#51035](https://github.com/ClickHouse/ClickHouse/pull/51035) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
|
||||
* Sqlancer has changed master to main [#51060](https://github.com/ClickHouse/ClickHouse/pull/51060) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
|
||||
|
@ -53,6 +53,7 @@ Engines in the family:
|
||||
- [JDBC](../../engines/table-engines/integrations/jdbc.md)
|
||||
- [MySQL](../../engines/table-engines/integrations/mysql.md)
|
||||
- [MongoDB](../../engines/table-engines/integrations/mongodb.md)
|
||||
- [Redis](../../engines/table-engines/integrations/redis.md)
|
||||
- [HDFS](../../engines/table-engines/integrations/hdfs.md)
|
||||
- [S3](../../engines/table-engines/integrations/s3.md)
|
||||
- [Kafka](../../engines/table-engines/integrations/kafka.md)
|
||||
|
@ -1,6 +1,6 @@
|
||||
---
|
||||
slug: /en/engines/table-engines/integrations/ExternalDistributed
|
||||
sidebar_position: 12
|
||||
sidebar_position: 55
|
||||
sidebar_label: ExternalDistributed
|
||||
title: ExternalDistributed
|
||||
---
|
||||
|
@ -0,0 +1,52 @@
|
||||
---
|
||||
slug: /en/engines/table-engines/integrations/azureBlobStorage
|
||||
sidebar_position: 10
|
||||
sidebar_label: Azure Blob Storage
|
||||
---
|
||||
|
||||
# AzureBlobStorage Table Engine
|
||||
|
||||
This engine provides an integration with [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs) ecosystem.
|
||||
|
||||
## Create Table
|
||||
|
||||
``` sql
|
||||
CREATE TABLE azure_blob_storage_table (name String, value UInt32)
|
||||
ENGINE = AzureBlobStorage(connection_string|storage_account_url, container_name, blobpath, [account_name, account_key, format, compression])
|
||||
[PARTITION BY expr]
|
||||
[SETTINGS ...]
|
||||
```
|
||||
|
||||
### Engine parameters
|
||||
|
||||
- `connection_string|storage_account_url` — connection_string includes account name & key ([Create connection string](https://learn.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string?toc=%2Fazure%2Fstorage%2Fblobs%2Ftoc.json&bc=%2Fazure%2Fstorage%2Fblobs%2Fbreadcrumb%2Ftoc.json#configure-a-connection-string-for-an-azure-storage-account)) or you could also provide the storage account url here and account name & account key as separate parameters (see parameters account_name & account_key)
|
||||
- `container_name` - Container name
|
||||
- `blobpath` - file path. Supports following wildcards in readonly mode: `*`, `?`, `{abc,def}` and `{N..M}` where `N`, `M` — numbers, `'abc'`, `'def'` — strings.
|
||||
- `account_name` - if storage_account_url is used, then account name can be specified here
|
||||
- `account_key` - if storage_account_url is used, then account key can be specified here
|
||||
- `format` — The [format](/docs/en/interfaces/formats.md) of the file.
|
||||
- `compression` — Supported values: `none`, `gzip/gz`, `brotli/br`, `xz/LZMA`, `zstd/zst`. By default, it will autodetect compression by file extension. (same as setting to `auto`).
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
CREATE TABLE test_table (key UInt64, data String)
|
||||
ENGINE = AzureBlobStorage('DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://azurite1:10000/devstoreaccount1/;',
|
||||
'test_container', 'test_table', 'CSV');
|
||||
|
||||
INSERT INTO test_table VALUES (1, 'a'), (2, 'b'), (3, 'c');
|
||||
|
||||
SELECT * FROM test_table;
|
||||
```
|
||||
|
||||
```text
|
||||
┌─key──┬─data──┐
|
||||
│ 1 │ a │
|
||||
│ 2 │ b │
|
||||
│ 3 │ c │
|
||||
└──────┴───────┘
|
||||
```
|
||||
|
||||
## See also
|
||||
|
||||
[Azure Blob Storage Table Function](/docs/en/sql-reference/table-functions/azureBlobStorage)
|
@ -1,5 +1,6 @@
|
||||
---
|
||||
slug: /en/engines/table-engines/integrations/deltalake
|
||||
sidebar_position: 40
|
||||
sidebar_label: DeltaLake
|
||||
---
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
---
|
||||
slug: /en/engines/table-engines/integrations/embedded-rocksdb
|
||||
sidebar_position: 9
|
||||
sidebar_position: 50
|
||||
sidebar_label: EmbeddedRocksDB
|
||||
---
|
||||
|
||||
@ -99,7 +99,7 @@ INSERT INTO test VALUES ('some key', 1, 'value', 3.2);
|
||||
|
||||
### Deletes
|
||||
|
||||
Rows can be deleted using `DELETE` query or `TRUNCATE`.
|
||||
Rows can be deleted using `DELETE` query or `TRUNCATE`.
|
||||
|
||||
```sql
|
||||
DELETE FROM test WHERE key LIKE 'some%' AND v1 > 1;
|
||||
|
@ -1,6 +1,6 @@
|
||||
---
|
||||
slug: /en/engines/table-engines/integrations/hdfs
|
||||
sidebar_position: 6
|
||||
sidebar_position: 80
|
||||
sidebar_label: HDFS
|
||||
---
|
||||
|
||||
@ -63,7 +63,7 @@ SELECT * FROM hdfs_engine_table LIMIT 2
|
||||
- `ALTER` and `SELECT...SAMPLE` operations.
|
||||
- Indexes.
|
||||
- [Zero-copy](../../../operations/storing-data.md#zero-copy) replication is possible, but not recommended.
|
||||
|
||||
|
||||
:::note Zero-copy replication is not ready for production
|
||||
Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher. This feature is not recommended for production use.
|
||||
:::
|
||||
@ -233,6 +233,12 @@ libhdfs3 support HDFS namenode HA.
|
||||
- `_path` — Path to the file.
|
||||
- `_file` — Name of the file.
|
||||
|
||||
## Storage Settings {#storage-settings}
|
||||
|
||||
- [hdfs_truncate_on_insert](/docs/en/operations/settings/settings.md#hdfs-truncate-on-insert) - allows to truncate file before insert into it. Disabled by default.
|
||||
- [hdfs_create_multiple_files](/docs/en/operations/settings/settings.md#hdfs_allow_create_multiple_files) - allows to create a new file on each insert if format has suffix. Disabled by default.
|
||||
- [hdfs_skip_empty_files](/docs/en/operations/settings/settings.md#hdfs_skip_empty_files) - allows to skip empty files while reading. Disabled by default.
|
||||
|
||||
**See Also**
|
||||
|
||||
- [Virtual columns](../../../engines/table-engines/index.md#table_engines-virtual_columns)
|
||||
|
@ -1,6 +1,6 @@
|
||||
---
|
||||
slug: /en/engines/table-engines/integrations/hive
|
||||
sidebar_position: 4
|
||||
sidebar_position: 84
|
||||
sidebar_label: Hive
|
||||
---
|
||||
|
||||
|
@ -1,5 +1,6 @@
|
||||
---
|
||||
slug: /en/engines/table-engines/integrations/hudi
|
||||
sidebar_position: 86
|
||||
sidebar_label: Hudi
|
||||
---
|
||||
|
||||
|
@ -1,5 +1,6 @@
|
||||
---
|
||||
slug: /en/engines/table-engines/integrations/iceberg
|
||||
sidebar_position: 90
|
||||
sidebar_label: Iceberg
|
||||
---
|
||||
|
||||
|
@ -6,24 +6,4 @@ sidebar_label: Integrations
|
||||
|
||||
# Table Engines for Integrations
|
||||
|
||||
ClickHouse provides various means for integrating with external systems, including table engines. Like with all other table engines, the configuration is done using `CREATE TABLE` or `ALTER TABLE` queries. Then from a user perspective, the configured integration looks like a normal table, but queries to it are proxied to the external system. This transparent querying is one of the key advantages of this approach over alternative integration methods, like dictionaries or table functions, which require to use custom query methods on each use.
|
||||
|
||||
List of supported integrations:
|
||||
|
||||
- [ODBC](../../../engines/table-engines/integrations/odbc.md)
|
||||
- [JDBC](../../../engines/table-engines/integrations/jdbc.md)
|
||||
- [MySQL](../../../engines/table-engines/integrations/mysql.md)
|
||||
- [MongoDB](../../../engines/table-engines/integrations/mongodb.md)
|
||||
- [HDFS](../../../engines/table-engines/integrations/hdfs.md)
|
||||
- [S3](../../../engines/table-engines/integrations/s3.md)
|
||||
- [Kafka](../../../engines/table-engines/integrations/kafka.md)
|
||||
- [EmbeddedRocksDB](../../../engines/table-engines/integrations/embedded-rocksdb.md)
|
||||
- [RabbitMQ](../../../engines/table-engines/integrations/rabbitmq.md)
|
||||
- [PostgreSQL](../../../engines/table-engines/integrations/postgresql.md)
|
||||
- [SQLite](../../../engines/table-engines/integrations/sqlite.md)
|
||||
- [Hive](../../../engines/table-engines/integrations/hive.md)
|
||||
- [ExternalDistributed](../../../engines/table-engines/integrations/ExternalDistributed.md)
|
||||
- [MaterializedPostgreSQL](../../../engines/table-engines/integrations/materialized-postgresql.md)
|
||||
- [NATS](../../../engines/table-engines/integrations/nats.md)
|
||||
- [DeltaLake](../../../engines/table-engines/integrations/deltalake.md)
|
||||
- [Hudi](../../../engines/table-engines/integrations/hudi.md)
|
||||
ClickHouse provides various means for integrating with external systems, including table engines. Like with all other table engines, the configuration is done using `CREATE TABLE` or `ALTER TABLE` queries. Then from a user perspective, the configured integration looks like a normal table, but queries to it are proxied to the external system. This transparent querying is one of the key advantages of this approach over alternative integration methods, like dictionaries or table functions, which require the use of custom query methods on each use.
|
||||
|
@ -1,6 +1,6 @@
|
||||
---
|
||||
slug: /en/engines/table-engines/integrations/jdbc
|
||||
sidebar_position: 3
|
||||
sidebar_position: 100
|
||||
sidebar_label: JDBC
|
||||
---
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
---
|
||||
slug: /en/engines/table-engines/integrations/kafka
|
||||
sidebar_position: 8
|
||||
sidebar_position: 110
|
||||
sidebar_label: Kafka
|
||||
---
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
---
|
||||
slug: /en/engines/table-engines/integrations/materialized-postgresql
|
||||
sidebar_position: 12
|
||||
sidebar_position: 130
|
||||
sidebar_label: MaterializedPostgreSQL
|
||||
title: MaterializedPostgreSQL
|
||||
---
|
||||
|
@ -1,6 +1,6 @@
|
||||
---
|
||||
slug: /en/engines/table-engines/integrations/mongodb
|
||||
sidebar_position: 5
|
||||
sidebar_position: 135
|
||||
sidebar_label: MongoDB
|
||||
---
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
---
|
||||
slug: /en/engines/table-engines/integrations/mysql
|
||||
sidebar_position: 4
|
||||
sidebar_position: 138
|
||||
sidebar_label: MySQL
|
||||
---
|
||||
|
||||
@ -35,6 +35,10 @@ The table structure can differ from the original MySQL table structure:
|
||||
- Column types may differ from those in the original MySQL table. ClickHouse tries to [cast](../../../engines/database-engines/mysql.md#data_types-support) values to the ClickHouse data types.
|
||||
- The [external_table_functions_use_nulls](../../../operations/settings/settings.md#external-table-functions-use-nulls) setting defines how to handle Nullable columns. Default value: 1. If 0, the table function does not make Nullable columns and inserts default values instead of nulls. This is also applicable for NULL values inside arrays.
|
||||
|
||||
:::note
|
||||
The MySQL Table Engine is currently not available on the ClickHouse builds for MacOS ([issue](https://github.com/ClickHouse/ClickHouse/issues/21191))
|
||||
:::
|
||||
|
||||
**Engine Parameters**
|
||||
|
||||
- `host:port` — MySQL server address.
|
||||
|
@ -1,6 +1,6 @@
|
||||
---
|
||||
slug: /en/engines/table-engines/integrations/nats
|
||||
sidebar_position: 14
|
||||
sidebar_position: 140
|
||||
sidebar_label: NATS
|
||||
---
|
||||
|
||||
@ -83,12 +83,12 @@ You can select one of the subjects the table reads from and publish your data th
|
||||
CREATE TABLE queue (
|
||||
key UInt64,
|
||||
value UInt64
|
||||
) ENGINE = NATS
|
||||
) ENGINE = NATS
|
||||
SETTINGS nats_url = 'localhost:4444',
|
||||
nats_subjects = 'subject1,subject2',
|
||||
nats_format = 'JSONEachRow';
|
||||
|
||||
INSERT INTO queue
|
||||
INSERT INTO queue
|
||||
SETTINGS stream_like_engine_insert_queue = 'subject2'
|
||||
VALUES (1, 1);
|
||||
```
|
||||
@ -102,7 +102,7 @@ Example:
|
||||
key UInt64,
|
||||
value UInt64,
|
||||
date DateTime
|
||||
) ENGINE = NATS
|
||||
) ENGINE = NATS
|
||||
SETTINGS nats_url = 'localhost:4444',
|
||||
nats_subjects = 'subject1',
|
||||
nats_format = 'JSONEachRow',
|
||||
@ -137,7 +137,7 @@ Example:
|
||||
CREATE TABLE queue (
|
||||
key UInt64,
|
||||
value UInt64
|
||||
) ENGINE = NATS
|
||||
) ENGINE = NATS
|
||||
SETTINGS nats_url = 'localhost:4444',
|
||||
nats_subjects = 'subject1',
|
||||
nats_format = 'JSONEachRow',
|
||||
|
@ -1,6 +1,6 @@
|
||||
---
|
||||
slug: /en/engines/table-engines/integrations/odbc
|
||||
sidebar_position: 2
|
||||
sidebar_position: 150
|
||||
sidebar_label: ODBC
|
||||
---
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
---
|
||||
slug: /en/engines/table-engines/integrations/postgresql
|
||||
sidebar_position: 11
|
||||
sidebar_position: 160
|
||||
sidebar_label: PostgreSQL
|
||||
---
|
||||
|
||||
@ -136,7 +136,7 @@ postgresql> SELECT * FROM test;
|
||||
|
||||
### Creating Table in ClickHouse, and connecting to PostgreSQL table created above
|
||||
|
||||
This example uses the [PostgreSQL table engine](/docs/en/engines/table-engines/integrations/postgresql.md) to connect the ClickHouse table to the PostgreSQL table:
|
||||
This example uses the [PostgreSQL table engine](/docs/en/engines/table-engines/integrations/postgresql.md) to connect the ClickHouse table to the PostgreSQL table and use both SELECT and INSERT statements to the PostgreSQL database:
|
||||
|
||||
``` sql
|
||||
CREATE TABLE default.postgresql_table
|
||||
@ -150,10 +150,21 @@ ENGINE = PostgreSQL('localhost:5432', 'public', 'test', 'postges_user', 'postgre
|
||||
|
||||
### Inserting initial data from PostgreSQL table into ClickHouse table, using a SELECT query
|
||||
|
||||
The [postgresql table function](/docs/en/sql-reference/table-functions/postgresql.md) copies the data from PostgreSQL to ClickHouse, which is often used for improving the query performance of the data by querying or performing analytics in ClickHouse rather than in PostgreSQL, or can also be used for migrating data from PostgreSQL to ClickHouse:
|
||||
The [postgresql table function](/docs/en/sql-reference/table-functions/postgresql.md) copies the data from PostgreSQL to ClickHouse, which is often used for improving the query performance of the data by querying or performing analytics in ClickHouse rather than in PostgreSQL, or can also be used for migrating data from PostgreSQL to ClickHouse. Since we will be copying the data from PostgreSQL to ClickHouse, we will use a MergeTree table engine in ClickHouse and call it postgresql_copy:
|
||||
|
||||
``` sql
|
||||
INSERT INTO default.postgresql_table
|
||||
CREATE TABLE default.postgresql_copy
|
||||
(
|
||||
`float_nullable` Nullable(Float32),
|
||||
`str` String,
|
||||
`int_id` Int32
|
||||
)
|
||||
ENGINE = MergeTree
|
||||
ORDER BY (int_id);
|
||||
```
|
||||
|
||||
``` sql
|
||||
INSERT INTO default.postgresql_copy
|
||||
SELECT * FROM postgresql('localhost:5432', 'public', 'test', 'postges_user', 'postgres_password');
|
||||
```
|
||||
|
||||
@ -164,13 +175,13 @@ If then performing ongoing synchronization between the PostgreSQL table and Clic
|
||||
This would require keeping track of the max ID or timestamp previously added, such as the following:
|
||||
|
||||
``` sql
|
||||
SELECT max(`int_id`) AS maxIntID FROM default.postgresql_table;
|
||||
SELECT max(`int_id`) AS maxIntID FROM default.postgresql_copy;
|
||||
```
|
||||
|
||||
Then inserting values from PostgreSQL table greater than the max
|
||||
|
||||
``` sql
|
||||
INSERT INTO default.postgresql_table
|
||||
INSERT INTO default.postgresql_copy
|
||||
SELECT * FROM postgresql('localhost:5432', 'public', 'test', 'postges_user', 'postgres_password');
|
||||
WHERE int_id > maxIntID;
|
||||
```
|
||||
@ -178,7 +189,7 @@ WHERE int_id > maxIntID;
|
||||
### Selecting data from the resulting ClickHouse table
|
||||
|
||||
``` sql
|
||||
SELECT * FROM postgresql_table WHERE str IN ('test');
|
||||
SELECT * FROM postgresql_copy WHERE str IN ('test');
|
||||
```
|
||||
|
||||
``` text
|
||||
|
@ -1,6 +1,6 @@
|
||||
---
|
||||
slug: /en/engines/table-engines/integrations/rabbitmq
|
||||
sidebar_position: 10
|
||||
sidebar_position: 170
|
||||
sidebar_label: RabbitMQ
|
||||
---
|
||||
|
||||
|
119
docs/en/engines/table-engines/integrations/redis.md
Normal file
119
docs/en/engines/table-engines/integrations/redis.md
Normal file
@ -0,0 +1,119 @@
|
||||
---
|
||||
slug: /en/engines/table-engines/integrations/redis
|
||||
sidebar_position: 175
|
||||
sidebar_label: Redis
|
||||
---
|
||||
|
||||
# Redis
|
||||
|
||||
This engine allows integrating ClickHouse with [Redis](https://redis.io/). For Redis takes kv model, we strongly recommend you only query it in a point way, such as `where k=xx` or `where k in (xx, xx)`.
|
||||
|
||||
## Creating a Table {#creating-a-table}
|
||||
|
||||
``` sql
|
||||
CREATE TABLE [IF NOT EXISTS] [db.]table_name
|
||||
(
|
||||
name1 [type1],
|
||||
name2 [type2],
|
||||
...
|
||||
) ENGINE = Redis(host:port[, db_index[, password[, pool_size]]]) PRIMARY KEY(primary_key_name);
|
||||
```
|
||||
|
||||
**Engine Parameters**
|
||||
|
||||
- `host:port` — Redis server address, you can ignore port and default Redis port 6379 will be used.
|
||||
|
||||
- `db_index` — Redis db index range from 0 to 15, default is 0.
|
||||
|
||||
- `password` — User password, default is blank string.
|
||||
|
||||
- `pool_size` — Redis max connection pool size, default is 16.
|
||||
|
||||
- `primary_key_name` - any column name in the column list.
|
||||
|
||||
- `primary` must be specified, it supports only one column in the primary key. The primary key will be serialized in binary as a Redis key.
|
||||
|
||||
- columns other than the primary key will be serialized in binary as Redis value in corresponding order.
|
||||
|
||||
- queries with key equals or in filtering will be optimized to multi keys lookup from Redis. If queries without filtering key full table scan will happen which is a heavy operation.
|
||||
|
||||
## Usage Example {#usage-example}
|
||||
|
||||
Create a table in ClickHouse which allows to read data from Redis:
|
||||
|
||||
``` sql
|
||||
CREATE TABLE redis_table
|
||||
(
|
||||
`k` String,
|
||||
`m` String,
|
||||
`n` UInt32
|
||||
)
|
||||
ENGINE = Redis('redis1:6379') PRIMARY KEY(k);
|
||||
```
|
||||
|
||||
Insert:
|
||||
|
||||
```sql
|
||||
INSERT INTO redis_table Values('1', 1, '1', 1.0), ('2', 2, '2', 2.0);
|
||||
```
|
||||
|
||||
Query:
|
||||
|
||||
``` sql
|
||||
SELECT COUNT(*) FROM redis_table;
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─count()─┐
|
||||
│ 2 │
|
||||
└─────────┘
|
||||
```
|
||||
|
||||
``` sql
|
||||
SELECT * FROM redis_table WHERE key='1';
|
||||
```
|
||||
|
||||
```text
|
||||
┌─key─┬─v1─┬─v2─┬─v3─┐
|
||||
│ 1 │ 1 │ 1 │ 1 │
|
||||
└─────┴────┴────┴────┘
|
||||
```
|
||||
|
||||
``` sql
|
||||
SELECT * FROM redis_table WHERE v1=2;
|
||||
```
|
||||
|
||||
```text
|
||||
┌─key─┬─v1─┬─v2─┬─v3─┐
|
||||
│ 2 │ 2 │ 2 │ 2 │
|
||||
└─────┴────┴────┴────┘
|
||||
```
|
||||
|
||||
Update:
|
||||
|
||||
Note that the primary key cannot be updated.
|
||||
|
||||
```sql
|
||||
ALTER TABLE redis_table UPDATE v1=2 WHERE key='1';
|
||||
```
|
||||
|
||||
Delete:
|
||||
|
||||
```sql
|
||||
ALTER TABLE redis_table DELETE WHERE key='1';
|
||||
```
|
||||
|
||||
Truncate:
|
||||
|
||||
Flush Redis db asynchronously. Also `Truncate` support SYNC mode.
|
||||
|
||||
```sql
|
||||
TRUNCATE TABLE redis_table SYNC;
|
||||
```
|
||||
|
||||
|
||||
## Limitations {#limitations}
|
||||
|
||||
Redis engine also supports scanning queries, such as `where k > xx`, but it has some limitations:
|
||||
1. Scanning query may produce some duplicated keys in a very rare case when it is rehashing. See details in [Redis Scan](https://github.com/redis/redis/blob/e4d183afd33e0b2e6e8d1c79a832f678a04a7886/src/dict.c#L1186-L1269)
|
||||
2. During the scanning, keys could be created and deleted, so the resulting dataset can not represent a valid point in time.
|
@ -1,6 +1,6 @@
|
||||
---
|
||||
slug: /en/engines/table-engines/integrations/s3
|
||||
sidebar_position: 7
|
||||
sidebar_position: 180
|
||||
sidebar_label: S3
|
||||
---
|
||||
|
||||
@ -8,30 +8,7 @@ sidebar_label: S3
|
||||
|
||||
This engine provides integration with [Amazon S3](https://aws.amazon.com/s3/) ecosystem. This engine is similar to the [HDFS](../../../engines/table-engines/special/file.md#table_engines-hdfs) engine, but provides S3-specific features.
|
||||
|
||||
## Create Table {#creating-a-table}
|
||||
|
||||
``` sql
|
||||
CREATE TABLE s3_engine_table (name String, value UInt32)
|
||||
ENGINE = S3(path [, NOSIGN | aws_access_key_id, aws_secret_access_key,] format, [compression])
|
||||
[PARTITION BY expr]
|
||||
[SETTINGS ...]
|
||||
```
|
||||
|
||||
**Engine parameters**
|
||||
|
||||
- `path` — Bucket url with path to file. Supports following wildcards in readonly mode: `*`, `?`, `{abc,def}` and `{N..M}` where `N`, `M` — numbers, `'abc'`, `'def'` — strings. For more information see [below](#wildcards-in-path).
|
||||
- `NOSIGN` - If this keyword is provided in place of credentials, all the requests will not be signed.
|
||||
- `format` — The [format](../../../interfaces/formats.md#formats) of the file.
|
||||
- `aws_access_key_id`, `aws_secret_access_key` - Long-term credentials for the [AWS](https://aws.amazon.com/) account user. You can use these to authenticate your requests. Parameter is optional. If credentials are not specified, they are used from the configuration file. For more information see [Using S3 for Data Storage](../mergetree-family/mergetree.md#table_engine-mergetree-s3).
|
||||
- `compression` — Compression type. Supported values: `none`, `gzip/gz`, `brotli/br`, `xz/LZMA`, `zstd/zst`. Parameter is optional. By default, it will auto-detect compression by file extension.
|
||||
|
||||
### PARTITION BY
|
||||
|
||||
`PARTITION BY` — Optional. In most cases you don't need a partition key, and if it is needed you generally don't need a partition key more granular than by month. Partitioning does not speed up queries (in contrast to the ORDER BY expression). You should never use too granular partitioning. Don't partition your data by client identifiers or names (instead, make client identifier or name the first column in the ORDER BY expression).
|
||||
|
||||
For partitioning by month, use the `toYYYYMM(date_column)` expression, where `date_column` is a column with a date of the type [Date](/docs/en/sql-reference/data-types/date.md). The partition names here have the `"YYYYMM"` format.
|
||||
|
||||
**Example**
|
||||
## Example
|
||||
|
||||
``` sql
|
||||
CREATE TABLE s3_engine_table (name String, value UInt32)
|
||||
@ -49,6 +26,135 @@ SELECT * FROM s3_engine_table LIMIT 2;
|
||||
│ two │ 2 │
|
||||
└──────┴───────┘
|
||||
```
|
||||
## Create Table {#creating-a-table}
|
||||
|
||||
``` sql
|
||||
CREATE TABLE s3_engine_table (name String, value UInt32)
|
||||
ENGINE = S3(path [, NOSIGN | aws_access_key_id, aws_secret_access_key,] format, [compression])
|
||||
[PARTITION BY expr]
|
||||
[SETTINGS ...]
|
||||
```
|
||||
|
||||
### Engine parameters
|
||||
|
||||
- `path` — Bucket url with path to file. Supports following wildcards in readonly mode: `*`, `?`, `{abc,def}` and `{N..M}` where `N`, `M` — numbers, `'abc'`, `'def'` — strings. For more information see [below](#wildcards-in-path).
|
||||
- `NOSIGN` - If this keyword is provided in place of credentials, all the requests will not be signed.
|
||||
- `format` — The [format](../../../interfaces/formats.md#formats) of the file.
|
||||
- `aws_access_key_id`, `aws_secret_access_key` - Long-term credentials for the [AWS](https://aws.amazon.com/) account user. You can use these to authenticate your requests. Parameter is optional. If credentials are not specified, they are used from the configuration file. For more information see [Using S3 for Data Storage](../mergetree-family/mergetree.md#table_engine-mergetree-s3).
|
||||
- `compression` — Compression type. Supported values: `none`, `gzip/gz`, `brotli/br`, `xz/LZMA`, `zstd/zst`. Parameter is optional. By default, it will auto-detect compression by file extension.
|
||||
|
||||
### PARTITION BY
|
||||
|
||||
`PARTITION BY` — Optional. In most cases you don't need a partition key, and if it is needed you generally don't need a partition key more granular than by month. Partitioning does not speed up queries (in contrast to the ORDER BY expression). You should never use too granular partitioning. Don't partition your data by client identifiers or names (instead, make client identifier or name the first column in the ORDER BY expression).
|
||||
|
||||
For partitioning by month, use the `toYYYYMM(date_column)` expression, where `date_column` is a column with a date of the type [Date](/docs/en/sql-reference/data-types/date.md). The partition names here have the `"YYYYMM"` format.
|
||||
|
||||
### Querying partitioned data
|
||||
|
||||
This example uses the [docker compose recipe](https://github.com/ClickHouse/examples/tree/5fdc6ff72f4e5137e23ea075c88d3f44b0202490/docker-compose-recipes/recipes/ch-and-minio-S3), which integrates ClickHouse and MinIO. You should be able to reproduce the same queries using S3 by replacing the endpoint and authentication values.
|
||||
|
||||
Notice that the S3 endpoint in the `ENGINE` configuration uses the parameter token `{_partition_id}` as part of the S3 object (filename), and that the SELECT queries select against those resulting object names (e.g., `test_3.csv`).
|
||||
|
||||
:::note
|
||||
As shown in the example, querying from S3 tables that are partitioned is
|
||||
not directly supported at this time, but can be accomplished by querying the bucket contents with a wildcard.
|
||||
|
||||
The primary use-case for writing
|
||||
partitioned data in S3 is to enable transferring that data into another
|
||||
ClickHouse system (for example, moving from on-prem systems to ClickHouse
|
||||
Cloud). Because ClickHouse datasets are often very large, and network
|
||||
reliability is sometimes imperfect it makes sense to transfer datasets
|
||||
in subsets, hence partitioned writes.
|
||||
:::
|
||||
|
||||
#### Create the table
|
||||
```sql
|
||||
CREATE TABLE p
|
||||
(
|
||||
`column1` UInt32,
|
||||
`column2` UInt32,
|
||||
`column3` UInt32
|
||||
)
|
||||
ENGINE = S3(
|
||||
# highlight-next-line
|
||||
'http://minio:10000/clickhouse//test_{_partition_id}.csv',
|
||||
'minioadmin',
|
||||
'minioadminpassword',
|
||||
'CSV')
|
||||
PARTITION BY column3
|
||||
```
|
||||
|
||||
#### Insert data
|
||||
```sql
|
||||
insert into p values (1, 2, 3), (3, 2, 1), (78, 43, 45)
|
||||
```
|
||||
|
||||
#### Select from partition 3
|
||||
|
||||
:::tip
|
||||
This query uses the s3 table function
|
||||
:::
|
||||
|
||||
```sql
|
||||
SELECT *
|
||||
FROM s3('http://minio:10000/clickhouse//test_3.csv', 'minioadmin', 'minioadminpassword', 'CSV')
|
||||
```
|
||||
```response
|
||||
┌─c1─┬─c2─┬─c3─┐
|
||||
│ 1 │ 2 │ 3 │
|
||||
└────┴────┴────┘
|
||||
```
|
||||
|
||||
#### Select from partition 1
|
||||
```sql
|
||||
SELECT *
|
||||
FROM s3('http://minio:10000/clickhouse//test_1.csv', 'minioadmin', 'minioadminpassword', 'CSV')
|
||||
```
|
||||
```response
|
||||
┌─c1─┬─c2─┬─c3─┐
|
||||
│ 3 │ 2 │ 1 │
|
||||
└────┴────┴────┘
|
||||
```
|
||||
|
||||
#### Select from partition 45
|
||||
```sql
|
||||
SELECT *
|
||||
FROM s3('http://minio:10000/clickhouse//test_45.csv', 'minioadmin', 'minioadminpassword', 'CSV')
|
||||
```
|
||||
```response
|
||||
┌─c1─┬─c2─┬─c3─┐
|
||||
│ 78 │ 43 │ 45 │
|
||||
└────┴────┴────┘
|
||||
```
|
||||
|
||||
#### Select from all partitions
|
||||
|
||||
```sql
|
||||
SELECT *
|
||||
FROM s3('http://minio:10000/clickhouse//**', 'minioadmin', 'minioadminpassword', 'CSV')
|
||||
```
|
||||
```response
|
||||
┌─c1─┬─c2─┬─c3─┐
|
||||
│ 3 │ 2 │ 1 │
|
||||
└────┴────┴────┘
|
||||
┌─c1─┬─c2─┬─c3─┐
|
||||
│ 1 │ 2 │ 3 │
|
||||
└────┴────┴────┘
|
||||
┌─c1─┬─c2─┬─c3─┐
|
||||
│ 78 │ 43 │ 45 │
|
||||
└────┴────┴────┘
|
||||
```
|
||||
|
||||
You may naturally try to `Select * from p`, but as noted above, this query will fail; use the preceding query.
|
||||
|
||||
```sql
|
||||
SELECT * FROM p
|
||||
```
|
||||
```response
|
||||
Received exception from server (version 23.4.1):
|
||||
Code: 48. DB::Exception: Received from localhost:9000. DB::Exception: Reading from a partitioned S3 storage is not implemented yet. (NOT_IMPLEMENTED)
|
||||
```
|
||||
|
||||
## Virtual columns {#virtual-columns}
|
||||
|
||||
- `_path` — Path to the file.
|
||||
@ -127,6 +233,12 @@ CREATE TABLE table_with_asterisk (name String, value UInt32)
|
||||
ENGINE = S3('https://clickhouse-public-datasets.s3.amazonaws.com/my-bucket/{some,another}_folder/*', 'CSV');
|
||||
```
|
||||
|
||||
## Storage Settings {#storage-settings}
|
||||
|
||||
- [s3_truncate_on_insert](/docs/en/operations/settings/settings.md#s3-truncate-on-insert) - allows to truncate file before insert into it. Disabled by default.
|
||||
- [s3_create_multiple_files](/docs/en/operations/settings/settings.md#s3_allow_create_multiple_files) - allows to create a new file on each insert if format has suffix. Disabled by default.
|
||||
- [s3_skip_empty_files](/docs/en/operations/settings/settings.md#s3_skip_empty_files) - allows to skip empty files while reading. Disabled by default.
|
||||
|
||||
## S3-related Settings {#settings}
|
||||
|
||||
The following settings can be set before query execution or placed into configuration file.
|
||||
|
@ -1,6 +1,6 @@
|
||||
---
|
||||
slug: /en/engines/table-engines/integrations/sqlite
|
||||
sidebar_position: 7
|
||||
sidebar_position: 185
|
||||
sidebar_label: SQLite
|
||||
---
|
||||
|
||||
|
@ -1,104 +1,142 @@
|
||||
# Approximate Nearest Neighbor Search Indexes [experimental] {#table_engines-ANNIndex}
|
||||
|
||||
Nearest neighborhood search refers to the problem of finding the point(s) with the smallest distance to a given point in an n-dimensional
|
||||
space. Since exact search is in practice usually typically too slow, the task is often solved with approximate algorithms. A popular use
|
||||
case of of neighbor search is finding similar pictures (texts) for a given picture (text). Pictures (texts) can be decomposed into
|
||||
[embeddings](https://cloud.google.com/architecture/overview-extracting-and-serving-feature-embeddings-for-machine-learning), and instead of
|
||||
comparing pictures (texts) pixel-by-pixel (character-by-character), only the embeddings are compared.
|
||||
Nearest neighborhood search is the problem of finding the M closest points for a given point in an N-dimensional vector space. The most
|
||||
straightforward approach to solve this problem is a brute force search where the distance between all points in the vector space and the
|
||||
reference point is computed. This method guarantees perfect accuracy but it is usually too slow for practical applications. Thus, nearest
|
||||
neighborhood search problems are often solved with [approximative algorithms](https://github.com/erikbern/ann-benchmarks). Approximative
|
||||
nearest neighborhood search techniques, in conjunction with [embedding
|
||||
methods](https://cloud.google.com/architecture/overview-extracting-and-serving-feature-embeddings-for-machine-learning) allow to search huge
|
||||
amounts of media (pictures, songs, articles, etc.) in milliseconds.
|
||||
|
||||
In terms of SQL, the problem can be expressed as follows:
|
||||
Blogs:
|
||||
- [Vector Search with ClickHouse - Part 1](https://clickhouse.com/blog/vector-search-clickhouse-p1)
|
||||
- [Vector Search with ClickHouse - Part 2](https://clickhouse.com/blog/vector-search-clickhouse-p2)
|
||||
|
||||
|
||||
In terms of SQL, the nearest neighborhood problem can be expressed as follows:
|
||||
|
||||
``` sql
|
||||
SELECT *
|
||||
FROM table
|
||||
WHERE L2Distance(column, Point) < MaxDistance
|
||||
ORDER BY Distance(vectors, Point)
|
||||
LIMIT N
|
||||
```
|
||||
|
||||
`vectors` contains N-dimensional values of type [Array](../../../sql-reference/data-types/array.md) or
|
||||
[Tuple](../../../sql-reference/data-types/tuple.md), for example embeddings. Function `Distance` computes the distance between two vectors.
|
||||
Often, the the Euclidean (L2) distance is chosen as distance function but [other
|
||||
distance functions](/docs/en/sql-reference/functions/distance-functions.md) are also possible. `Point` is the reference point, e.g. `(0.17,
|
||||
0.33, ...)`, and `N` limits the number of search results.
|
||||
|
||||
An alternative formulation of the nearest neighborhood search problem looks as follows:
|
||||
|
||||
``` sql
|
||||
SELECT *
|
||||
FROM table
|
||||
ORDER BY L2Distance(column, Point)
|
||||
WHERE Distance(vectors, Point) < MaxDistance
|
||||
LIMIT N
|
||||
```
|
||||
|
||||
The queries are expensive because the L2 (Euclidean) distance between `Point` and all points in `column` and must be computed. To speed this process up, Approximate Nearest Neighbor Search Indexes (ANN indexes) store a compact representation of the search space (using clustering, search trees, etc.) which allows to compute an approximate answer quickly.
|
||||
While the first query returns the top-`N` closest points to the reference point, the second query returns all points closer to the reference
|
||||
point than a maximally allowed radius `MaxDistance`. Parameter `N` limits the number of returned values which is useful for situations where
|
||||
`MaxDistance` is difficult to determine in advance.
|
||||
|
||||
# Creating ANN Indexes
|
||||
With brute force search, both queries are expensive (linear in the number of points) because the distance between all points in `vectors` and
|
||||
`Point` must be computed. To speed this process up, Approximate Nearest Neighbor Search Indexes (ANN indexes) store a compact representation
|
||||
of the search space (using clustering, search trees, etc.) which allows to compute an approximate answer much quicker (in sub-linear time).
|
||||
|
||||
As long as ANN indexes are experimental, you first need to `SET allow_experimental_annoy_index = 1`.
|
||||
# Creating and Using ANN Indexes
|
||||
|
||||
Syntax to create an ANN index over an `Array` column:
|
||||
Syntax to create an ANN index over an [Array](../../../sql-reference/data-types/array.md) column:
|
||||
|
||||
```sql
|
||||
CREATE TABLE table
|
||||
(
|
||||
`id` Int64,
|
||||
`embedding` Array(Float32),
|
||||
INDEX <ann_index_name> embedding TYPE <ann_index_type>(<ann_index_parameters>) GRANULARITY <N>
|
||||
`vectors` Array(Float32),
|
||||
INDEX [ann_index_name vectors TYPE [ann_index_type]([ann_index_parameters]) [GRANULARITY [N]]
|
||||
)
|
||||
ENGINE = MergeTree
|
||||
ORDER BY id;
|
||||
```
|
||||
|
||||
Syntax to create an ANN index over a `Tuple` column:
|
||||
Syntax to create an ANN index over a [Tuple](../../../sql-reference/data-types/tuple.md) column:
|
||||
|
||||
```sql
|
||||
CREATE TABLE table
|
||||
(
|
||||
`id` Int64,
|
||||
`embedding` Tuple(Float32[, Float32[, ...]]),
|
||||
INDEX <ann_index_name> embedding TYPE <ann_index_type>(<ann_index_parameters>) GRANULARITY <N>
|
||||
`vectors` Tuple(Float32[, Float32[, ...]]),
|
||||
INDEX [ann_index_name] vectors TYPE [ann_index_type]([ann_index_parameters]) [GRANULARITY [N]]
|
||||
)
|
||||
ENGINE = MergeTree
|
||||
ORDER BY id;
|
||||
```
|
||||
|
||||
ANN indexes are built during column insertion and merge and `INSERT` and `OPTIMIZE` statements will be slower than for ordinary tables. ANNIndexes are ideally used only with immutable or rarely changed data, respectively there are much more read requests than write requests.
|
||||
|
||||
Similar to regular skip indexes, ANN indexes are constructed over granules and each indexed block consists of `GRANULARITY = <N>`-many
|
||||
granules. For example, if the primary index granularity of the table is 8192 (setting `index_granularity = 8192`) and `GRANULARITY = 2`,
|
||||
then each indexed block will consist of 16384 rows. However, unlike skip indexes, ANN indexes are not only able to skip the entire indexed
|
||||
block, they are able to skip individual granules in indexed blocks. As a result, the `GRANULARITY` parameter has a different meaning in ANN
|
||||
indexes than in normal skip indexes. Basically, the bigger `GRANULARITY` is chosen, the more data is provided to a single ANN index, and the
|
||||
higher the chance that with the right hyper parameters, the index will remember the data structure better.
|
||||
|
||||
# Using ANN Indexes
|
||||
ANN indexes are built during column insertion and merge. As a result, `INSERT` and `OPTIMIZE` statements will be slower than for ordinary
|
||||
tables. ANNIndexes are ideally used only with immutable or rarely changed data, respectively when are far more read requests than write
|
||||
requests.
|
||||
|
||||
ANN indexes support two types of queries:
|
||||
|
||||
- WHERE queries:
|
||||
|
||||
``` sql
|
||||
SELECT *
|
||||
FROM table
|
||||
WHERE DistanceFunction(column, Point) < MaxDistance
|
||||
LIMIT N
|
||||
```
|
||||
|
||||
- ORDER BY queries:
|
||||
|
||||
``` sql
|
||||
SELECT *
|
||||
FROM table
|
||||
[WHERE ...]
|
||||
ORDER BY DistanceFunction(column, Point)
|
||||
ORDER BY Distance(vectors, Point)
|
||||
LIMIT N
|
||||
```
|
||||
|
||||
`DistanceFunction` is a [distance function](/docs/en/sql-reference/functions/distance-functions.md), `Point` is a reference vector (e.g. `(0.17, 0.33, ...)`) and `MaxDistance` is a floating point value which restricts the size of the neighbourhood.
|
||||
- WHERE queries:
|
||||
|
||||
``` sql
|
||||
SELECT *
|
||||
FROM table
|
||||
WHERE Distance(vectors, Point) < MaxDistance
|
||||
LIMIT N
|
||||
```
|
||||
|
||||
:::tip
|
||||
To avoid writing out large vectors, you can use [query parameters](/docs/en//interfaces/cli.md#queries-with-parameters-cli-queries-with-parameters), e.g.
|
||||
To avoid writing out large vectors, you can use [query
|
||||
parameters](/docs/en/interfaces/cli.md#queries-with-parameters-cli-queries-with-parameters), e.g.
|
||||
|
||||
```bash
|
||||
clickhouse-client --param_vec='hello' --query="SELECT * FROM table WHERE L2Distance(embedding, {vec: Array(Float32)}) < 1.0"
|
||||
clickhouse-client --param_vec='hello' --query="SELECT * FROM table WHERE L2Distance(vectors, {vec: Array(Float32)}) < 1.0"
|
||||
```
|
||||
:::
|
||||
|
||||
ANN indexes cannot speed up queries that contain both a `WHERE DistanceFunction(column, Point) < MaxDistance` and an `ORDER BY DistanceFunction(column, Point)` clause. Also, the approximate algorithms used to determine the nearest neighbors require a limit, hence queries that use an ANN index must have a `LIMIT` clause.
|
||||
**Restrictions**: Queries that contain both a `WHERE Distance(vectors, Point) < MaxDistance` and an `ORDER BY Distance(vectors, Point)`
|
||||
clause cannot use ANN indexes. Also, the approximate algorithms used to determine the nearest neighbors require a limit, hence queries
|
||||
without `LIMIT` clause cannot utilize ANN indexes. Also ANN indexes are only used if the query has a `LIMIT` value smaller than setting
|
||||
`max_limit_for_ann_queries` (default: 1 million rows). This is a safeguard to prevent large memory allocations by external libraries for
|
||||
approximate neighbor search.
|
||||
|
||||
**Differences to Skip Indexes** Similar to regular [skip indexes](https://clickhouse.com/docs/en/optimize/skipping-indexes), ANN indexes are
|
||||
constructed over granules and each indexed block consists of `GRANULARITY = [N]`-many granules (`[N]` = 1 by default for normal skip
|
||||
indexes). For example, if the primary index granularity of the table is 8192 (setting `index_granularity = 8192`) and `GRANULARITY = 2`,
|
||||
then each indexed block will contain 16384 rows. However, data structures and algorithms for approximate neighborhood search (usually
|
||||
provided by external libraries) are inherently row-oriented. They store a compact representation of a set of rows and also return rows for
|
||||
ANN queries. This causes some rather unintuitive differences in the way ANN indexes behave compared to normal skip indexes.
|
||||
|
||||
When a user defines a ANN index on a column, ClickHouse internally creates a ANN "sub-index" for each index block. The sub-index is "local"
|
||||
in the sense that it only knows about the rows of its containing index block. In the previous example and assuming that a column has 65536
|
||||
rows, we obtain four index blocks (spanning eight granules) and a ANN sub-index for each index block. A sub-index is theoretically able to
|
||||
return the rows with the N closest points within its index block directly. However, since ClickHouse loads data from disk to memory at the
|
||||
granularity of granules, sub-indexes extrapolate matching rows to granule granularity. This is different from regular skip indexes which
|
||||
skip data at the granularity of index blocks.
|
||||
|
||||
The `GRANULARITY` parameter determines how many ANN sub-indexes are created. Bigger `GRANULARITY` values mean fewer but larger ANN
|
||||
sub-indexes, up to the point where a column (or a column's data part) has only a single sub-index. In that case, the sub-index has a
|
||||
"global" view of all column rows and can directly return all granules of the column (part) with relevant rows (there are at most
|
||||
`LIMIT [N]`-many such granules). In a second step, ClickHouse will load these granules and identify the actually best rows by performing a
|
||||
brute-force distance calculation over all rows of the granules. With a small `GRANULARITY` value, each of the sub-indexes returns up to
|
||||
`LIMIT N`-many granules. As a result, more granules need to be loaded and post-filtered. Note that the search accuracy is with both cases
|
||||
equally good, only the processing performance differs. It is generally recommended to use a large `GRANULARITY` for ANN indexes and fall
|
||||
back to a smaller `GRANULARITY` values only in case of problems like excessive memory consumption of the ANN structures. If no `GRANULARITY`
|
||||
was specified for ANN indexes, the default value is 100 million.
|
||||
|
||||
An ANN index is only used if the query has a `LIMIT` value smaller than setting `max_limit_for_ann_queries` (default: 1 million rows). This is a safety measure which helps to avoid large memory consumption by external libraries for approximate neighbor search.
|
||||
|
||||
# Available ANN Indexes
|
||||
|
||||
@ -106,51 +144,68 @@ An ANN index is only used if the query has a `LIMIT` value smaller than setting
|
||||
|
||||
## Annoy {#annoy}
|
||||
|
||||
(currently disabled on ARM due to memory safety problems with the algorithm)
|
||||
Annoy indexes are currently experimental, to use them you first need to `SET allow_experimental_annoy_index = 1`. They are also currently
|
||||
disabled on ARM due to memory safety problems with the algorithm.
|
||||
|
||||
This type of ANN index implements [the Annoy algorithm](https://github.com/spotify/annoy) which uses a recursive division of the space in random linear surfaces (lines in 2D, planes in 3D etc.).
|
||||
This type of ANN index implements [the Annoy algorithm](https://github.com/spotify/annoy) which is based on a recursive division of the
|
||||
space in random linear surfaces (lines in 2D, planes in 3D etc.).
|
||||
|
||||
Syntax to create a Annoy index over a `Array` column:
|
||||
<div class='vimeo-container'>
|
||||
<iframe src="//www.youtube.com/embed/QkCCyLW0ehU"
|
||||
width="640"
|
||||
height="360"
|
||||
frameborder="0"
|
||||
allow="autoplay;
|
||||
fullscreen;
|
||||
picture-in-picture"
|
||||
allowfullscreen>
|
||||
</iframe>
|
||||
</div>
|
||||
|
||||
Syntax to create an Annoy index over an [Array](../../../sql-reference/data-types/array.md) column:
|
||||
|
||||
```sql
|
||||
CREATE TABLE table
|
||||
(
|
||||
id Int64,
|
||||
embedding Array(Float32),
|
||||
INDEX <ann_index_name> embedding TYPE annoy([DistanceName[, NumTrees]]) GRANULARITY N
|
||||
vectors Array(Float32),
|
||||
INDEX [ann_index_name] vectors TYPE annoy([Distance[, NumTrees]]) [GRANULARITY N]
|
||||
)
|
||||
ENGINE = MergeTree
|
||||
ORDER BY id;
|
||||
```
|
||||
|
||||
Syntax to create a Annoy index over a `Tuple` column:
|
||||
Syntax to create an ANN index over a [Tuple](../../../sql-reference/data-types/tuple.md) column:
|
||||
|
||||
```sql
|
||||
CREATE TABLE table
|
||||
(
|
||||
id Int64,
|
||||
embedding Tuple(Float32[, Float32[, ...]]),
|
||||
INDEX <ann_index_name> embedding TYPE annoy([DistanceName[, NumTrees]]) GRANULARITY N
|
||||
vectors Tuple(Float32[, Float32[, ...]]),
|
||||
INDEX [ann_index_name] vectors TYPE annoy([Distance[, NumTrees]]) [GRANULARITY N]
|
||||
)
|
||||
ENGINE = MergeTree
|
||||
ORDER BY id;
|
||||
```
|
||||
|
||||
Parameter `DistanceName` is name of a distance function (default `L2Distance`). Annoy currently supports `L2Distance` and `cosineDistance` as distance functions. Parameter `NumTrees` (default: 100) is the number of trees which the algorithm will create. Higher values of `NumTree` mean slower `CREATE` and `SELECT` statements (approximately linearly), but increase the accuracy of search results.
|
||||
Annoy currently supports `L2Distance` and `cosineDistance` as distance function `Distance`. If no distance function was specified during
|
||||
index creation, `L2Distance` is used as default. Parameter `NumTrees` is the number of trees which the algorithm creates (default if not
|
||||
specified: 100). Higher values of `NumTree` mean more accurate search results but slower index creation / query times (approximately
|
||||
linearly) as well as larger index sizes.
|
||||
|
||||
:::note
|
||||
Indexes over columns of type `Array` will generally work faster than indexes on `Tuple` columns. All arrays **must** have same length. Use [CONSTRAINT](/docs/en/sql-reference/statements/create/table.md#constraints) to avoid errors. For example, `CONSTRAINT constraint_name_1 CHECK length(embedding) = 256`.
|
||||
Indexes over columns of type `Array` will generally work faster than indexes on `Tuple` columns. All arrays **must** have same length. Use
|
||||
[CONSTRAINT](/docs/en/sql-reference/statements/create/table.md#constraints) to avoid errors. For example, `CONSTRAINT constraint_name_1
|
||||
CHECK length(vectors) = 256`.
|
||||
:::
|
||||
|
||||
Setting `annoy_index_search_k_nodes` (default: `NumTrees * LIMIT`) determines how many tree nodes are inspected during SELECTs. It can be used to
|
||||
balance runtime and accuracy at runtime.
|
||||
Setting `annoy_index_search_k_nodes` (default: `NumTrees * LIMIT`) determines how many tree nodes are inspected during SELECTs. Larger
|
||||
values mean more accurate results at the cost of longer query runtime:
|
||||
|
||||
Example:
|
||||
|
||||
``` sql
|
||||
```sql
|
||||
SELECT *
|
||||
FROM table_name [WHERE ...]
|
||||
ORDER BY L2Distance(column, Point)
|
||||
FROM table_name
|
||||
ORDER BY L2Distance(vectors, Point)
|
||||
LIMIT N
|
||||
SETTINGS annoy_index_search_k_nodes=100
|
||||
SETTINGS annoy_index_search_k_nodes=100;
|
||||
```
|
||||
|
@ -491,7 +491,7 @@ Syntax: `tokenbf_v1(size_of_bloom_filter_in_bytes, number_of_hash_functions, ran
|
||||
|
||||
#### Special-purpose
|
||||
|
||||
- An experimental index to support approximate nearest neighbor (ANN) search. See [here](annindexes.md) for details.
|
||||
- Experimental indexes to support approximate nearest neighbor (ANN) search. See [here](annindexes.md) for details.
|
||||
- An experimental inverted index to support full-text search. See [here](invertedindexes.md) for details.
|
||||
|
||||
### Functions Support {#functions-support}
|
||||
@ -853,7 +853,7 @@ Tags:
|
||||
- `max_data_part_size_bytes` — the maximum size of a part that can be stored on any of the volume’s disks. If the a size of a merged part estimated to be bigger than `max_data_part_size_bytes` then this part will be written to a next volume. Basically this feature allows to keep new/small parts on a hot (SSD) volume and move them to a cold (HDD) volume when they reach large size. Do not use this setting if your policy has only one volume.
|
||||
- `move_factor` — when the amount of available space gets lower than this factor, data automatically starts to move on the next volume if any (by default, 0.1). ClickHouse sorts existing parts by size from largest to smallest (in descending order) and selects parts with the total size that is sufficient to meet the `move_factor` condition. If the total size of all parts is insufficient, all parts will be moved.
|
||||
- `prefer_not_to_merge` — Disables merging of data parts on this volume. When this setting is enabled, merging data on this volume is not allowed. This allows controlling how ClickHouse works with slow disks.
|
||||
- `perform_ttl_move_on_insert` — Disables TTL move on data part INSERT. By default if we insert a data part that already expired by the TTL move rule it immediately goes to a volume/disk declared in move rule. This can significantly slowdown insert in case if destination volume/disk is slow (e.g. S3).
|
||||
- `perform_ttl_move_on_insert` — Disables TTL move on data part INSERT. By default (if enabled) if we insert a data part that already expired by the TTL move rule it immediately goes to a volume/disk declared in move rule. This can significantly slowdown insert in case if destination volume/disk is slow (e.g. S3). If disabled then already expired data part is written into a default volume and then right after moved to TTL volume.
|
||||
- `load_balancing` - Policy for disk balancing, `round_robin` or `least_used`.
|
||||
|
||||
Configuration examples:
|
||||
@ -1138,7 +1138,7 @@ These parameters define the cache layer:
|
||||
|
||||
Cache parameters:
|
||||
- `path` — The path where metadata for the cache is stored.
|
||||
- `max_size` — The size (amount of memory) that the cache can grow to.
|
||||
- `max_size` — The size (amount of disk space) that the cache can grow to.
|
||||
|
||||
:::tip
|
||||
There are several other cache parameters that you can use to tune your storage, see [using local cache](/docs/en/operations/storing-data.md/#using-local-cache) for the details.
|
||||
|
Some files were not shown because too many files have changed in this diff Show More
Loading…
Reference in New Issue
Block a user