Merge remote-tracking branch 'upstream/master' into add-separate-access-for-use-named-collections

This commit is contained in:
kssenii 2023-06-14 13:33:56 +02:00
commit 25ae93bbf8
362 changed files with 10760 additions and 3472 deletions

View File

@ -74,6 +74,7 @@ ConstructorInitializerIndentWidth: 4
ContinuationIndentWidth: 4
DerivePointerAlignment: false
DisableFormat: false
IndentRequiresClause: false
IndentWidth: 4
IndentWrappedFunctionNames: false
MacroBlockBegin: ''

View File

@ -12,7 +12,7 @@
#### Upgrade Notes
* Compress marks and primary key by default. It significantly reduces the cold query time. Upgrade notes: the support for compressed marks and primary key has been added in version 22.9. If you turned on compressed marks or primary key or installed version 23.5 or newer, which has compressed marks or primary key on by default, you will not be able to downgrade to version 22.8 or earlier. You can also explicitly disable compressed marks or primary keys by specifying the `compress_marks` and `compress_primary_key` settings in the `<merge_tree>` section of the server configuration file. **Upgrade notes:** If you upgrade from versions prior to 22.9, you should either upgrade all replicas at once or disable the compression before upgrade, or upgrade through an intermediate version, where the compressed marks are supported but not enabled by default, such as 23.3. [#42587](https://github.com/ClickHouse/ClickHouse/pull/42587) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Make local object storage work consistently with s3 object storage, fix problem with append (closes [#48465](https://github.com/ClickHouse/ClickHouse/issues/48465)), make it configurable as independent storage. The change is backward incompatible because the cache on top of local object storage is not incompatible to previous versions. [#48791](https://github.com/ClickHouse/ClickHouse/pull/48791) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Make local object storage work consistently with s3 object storage, fix problem with append (closes [#48465](https://github.com/ClickHouse/ClickHouse/issues/48465)), make it configurable as independent storage. The change is backward incompatible because the cache on top of local object storage is not compatible to previous versions. [#48791](https://github.com/ClickHouse/ClickHouse/pull/48791) ([Kseniia Sumarokova](https://github.com/kssenii)).
* The experimental feature "in-memory data parts" is removed. The data format is still supported, but the settings are no-op, and compact or wide parts will be used instead. This closes [#45409](https://github.com/ClickHouse/ClickHouse/issues/45409). [#49429](https://github.com/ClickHouse/ClickHouse/pull/49429) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Changed default values of settings `parallelize_output_from_storages` and `input_format_parquet_preserve_order`. This allows ClickHouse to reorder rows when reading from files (e.g. CSV or Parquet), greatly improving performance in many cases. To restore the old behavior of preserving order, use `parallelize_output_from_storages = 0`, `input_format_parquet_preserve_order = 1`. [#49479](https://github.com/ClickHouse/ClickHouse/pull/49479) ([Michael Kolupaev](https://github.com/al13n321)).
* Make projections production-ready. Add the `optimize_use_projections` setting to control whether the projections will be selected for SELECT queries. The setting `allow_experimental_projection_optimization` is obsolete and does nothing. [#49719](https://github.com/ClickHouse/ClickHouse/pull/49719) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
@ -21,6 +21,7 @@
* Setting `enable_memory_bound_merging_of_aggregation_results` is enabled by default. If you update from version prior to 22.12, we recommend to set this flag to `false` until update is finished. [#50319](https://github.com/ClickHouse/ClickHouse/pull/50319) ([Nikita Taranov](https://github.com/nickitat)).
#### New Feature
* Added storage engine AzureBlobStorage and azureBlobStorage table function. The supported set of features is very similar to storage/table function S3 [#50604] (https://github.com/ClickHouse/ClickHouse/pull/50604) ([alesapin](https://github.com/alesapin)) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni).
* Added native ClickHouse Keeper CLI Client, it is available as `clickhouse keeper-client` [#47414](https://github.com/ClickHouse/ClickHouse/pull/47414) ([pufit](https://github.com/pufit)).
* Add `urlCluster` table function. Refactor all *Cluster table functions to reduce code duplication. Make schema inference work for all possible *Cluster function signatures and for named collections. Closes [#38499](https://github.com/ClickHouse/ClickHouse/issues/38499). [#45427](https://github.com/ClickHouse/ClickHouse/pull/45427) ([attack204](https://github.com/attack204)), Pavel Kruglov.
* The query cache can now be used for production workloads. [#47977](https://github.com/ClickHouse/ClickHouse/pull/47977) ([Robert Schulze](https://github.com/rschu1ze)). The query cache can now support queries with totals and extremes modifier. [#48853](https://github.com/ClickHouse/ClickHouse/pull/48853) ([Robert Schulze](https://github.com/rschu1ze)). Make `allow_experimental_query_cache` setting as obsolete for backward-compatibility. It was removed in https://github.com/ClickHouse/ClickHouse/pull/47977. [#49934](https://github.com/ClickHouse/ClickHouse/pull/49934) ([Timur Solodovnikov](https://github.com/tsolodov)).
@ -31,7 +32,7 @@
* Introduces new keyword `INTO OUTFILE 'file.txt' APPEND`. [#48880](https://github.com/ClickHouse/ClickHouse/pull/48880) ([alekar](https://github.com/alekar)).
* Added `system.zookeeper_connection` table that shows information about Keeper connections. [#45245](https://github.com/ClickHouse/ClickHouse/pull/45245) ([mateng915](https://github.com/mateng0915)).
* Add new function `generateRandomStructure` that generates random table structure. It can be used in combination with table function `generateRandom`. [#47409](https://github.com/ClickHouse/ClickHouse/pull/47409) ([Kruglov Pavel](https://github.com/Avogar)).
* Allow the use of `CASE` without an `ELSE` branch and extended `transform` to deal with more types. Also fix some issues that made transform() return incorrect results when decimal types were mixed with other numeric types. [#48300](https://github.com/ClickHouse/ClickHouse/pull/48300) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Allow the use of `CASE` without an `ELSE` branch and extended `transform` to deal with more types. Also fix some issues that made transform() return incorrect results when decimal types were mixed with other numeric types. [#48300](https://github.com/ClickHouse/ClickHouse/pull/48300) ([Salvatore Mesoraca](https://github.com/aiven-sal)). This closes #2655. This closes #9596. This closes #38666.
* Added [server-side encryption using KMS keys](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html) with S3 tables, and the `header` setting with S3 disks. Closes [#48723](https://github.com/ClickHouse/ClickHouse/issues/48723). [#48724](https://github.com/ClickHouse/ClickHouse/pull/48724) ([Johann Gan](https://github.com/johanngan)).
* Add MemoryTracker for the background tasks (merges and mutation). Introduces `merges_mutations_memory_usage_soft_limit` and `merges_mutations_memory_usage_to_ram_ratio` settings that represent the soft memory limit for merges and mutations. If this limit is reached ClickHouse won't schedule new merge or mutation tasks. Also `MergesMutationsMemoryTracking` metric is introduced to allow observing current memory usage of background tasks. Resubmit [#46089](https://github.com/ClickHouse/ClickHouse/issues/46089). Closes [#48774](https://github.com/ClickHouse/ClickHouse/issues/48774). [#48787](https://github.com/ClickHouse/ClickHouse/pull/48787) ([Dmitry Novik](https://github.com/novikd)).
* Function `dotProduct` work for array. [#49050](https://github.com/ClickHouse/ClickHouse/pull/49050) ([FFFFFFFHHHHHHH](https://github.com/FFFFFFFHHHHHHH)).
@ -68,7 +69,6 @@
* Improve performance of BLAKE3 by 11% by enabling LTO for Rust. [#49600](https://github.com/ClickHouse/ClickHouse/pull/49600) ([Azat Khuzhin](https://github.com/azat)). Now it is on par with C++.
* Optimize the structure of the `system.opentelemetry_span_log`. Use `LowCardinality` where appropriate. Although this table is generally stupid (it is using the Map data type even for common attributes), it will be slightly better. [#49647](https://github.com/ClickHouse/ClickHouse/pull/49647) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Try to reserve hash table's size in `grace_hash` join. [#49816](https://github.com/ClickHouse/ClickHouse/pull/49816) ([lgbo](https://github.com/lgbo-ustc)).
* As is addressed in issue [#49748](https://github.com/ClickHouse/ClickHouse/issues/49748), the predicates with date converters, such as `toYear`, `toYYYYMM`, could be rewritten with the equivalent date (YYYY-MM-DD) comparisons at the AST level. And this transformation could bring performance improvement as it is free from the expensive date converter and the comparison between dates (or integers in the low level representation) is quite low-cost. The [prototype](https://github.com/ZhiguoZh/ClickHouse/commit/c7f1753f0c9363a19d95fa46f1cfed1d9f505ee0) shows that, with all identified date converters optimized, the overall QPS of the 13 queries is enhanced by **~11%** on the ICX server (Intel Xeon Platinum 8380 CPU, 80 cores, 160 threads). [#50062](https://github.com/ClickHouse/ClickHouse/pull/50062) [#50307](https://github.com/ClickHouse/ClickHouse/pull/50307) ([Zhiguo Zhou](https://github.com/ZhiguoZh)).
* Parallel merge of `uniqExactIf` states. Closes [#49885](https://github.com/ClickHouse/ClickHouse/issues/49885). [#50285](https://github.com/ClickHouse/ClickHouse/pull/50285) ([flynn](https://github.com/ucasfl)).
* Keeper improvement: add `CheckNotExists` request to Keeper, which allows to improve the performance of Replicated tables. [#48897](https://github.com/ClickHouse/ClickHouse/pull/48897) ([Antonio Andelic](https://github.com/antonio2368)).
* Keeper performance improvements: avoid serializing same request twice while processing. Cache deserialization results of large requests. Controlled by new coordination setting `min_request_size_for_cache`. [#49004](https://github.com/ClickHouse/ClickHouse/pull/49004) ([Antonio Andelic](https://github.com/antonio2368)).

View File

@ -13,9 +13,10 @@ The following versions of ClickHouse server are currently being supported with s
| Version | Supported |
|:-|:-|
| 23.5 | ✔️ |
| 23.4 | ✔️ |
| 23.3 | ✔️ |
| 23.2 | ✔️ |
| 23.2 | |
| 23.1 | ❌ |
| 22.12 | ❌ |
| 22.11 | ❌ |

View File

@ -117,6 +117,9 @@ public:
void readRaw(char * buffer, std::streamsize length);
/// Reads length bytes of raw data into buffer.
void readCString(std::string& value);
/// Reads zero-terminated C-string into value.
void readBOM();
/// Reads a byte-order mark from the stream and configures
/// the reader for the encountered byte order.

View File

@ -56,6 +56,8 @@ public:
LITTLE_ENDIAN_BYTE_ORDER = 3 /// little-endian byte-order
};
static const std::streamsize DEFAULT_MAX_CSTR_LENGTH { 1024 };
BinaryWriter(std::ostream & ostr, StreamByteOrder byteOrder = NATIVE_BYTE_ORDER);
/// Creates the BinaryWriter.
@ -131,6 +133,9 @@ public:
void writeRaw(const char * buffer, std::streamsize length);
/// Writes length raw bytes from the given buffer to the stream.
void writeCString(const char* cString, std::streamsize maxLength = DEFAULT_MAX_CSTR_LENGTH);
/// Writes zero-terminated C-string.
void writeBOM();
/// Writes a byte-order mark to the stream. A byte order mark is
/// a 16-bit integer with a value of 0xFEFF, written in host byte-order.

View File

@ -274,6 +274,31 @@ void BinaryReader::readRaw(char* buffer, std::streamsize length)
}
void BinaryReader::readCString(std::string& value)
{
value.clear();
if (!_istr.good())
{
return;
}
value.reserve(256);
while (true)
{
char c;
_istr.get(c);
if (!_istr.good())
{
break;
}
if (c == '\0')
{
break;
}
value += c;
}
}
void BinaryReader::readBOM()
{
UInt16 bom;

View File

@ -271,7 +271,7 @@ BinaryWriter& BinaryWriter::operator << (const std::string& value)
BinaryWriter& BinaryWriter::operator << (const char* value)
{
poco_check_ptr (value);
if (_pTextConverter)
{
std::string converted;
@ -332,6 +332,15 @@ void BinaryWriter::writeRaw(const char* buffer, std::streamsize length)
}
void BinaryWriter::writeCString(const char* cString, std::streamsize maxLength)
{
const std::size_t len = ::strnlen(cString, maxLength);
writeRaw(cString, len);
static const char zero = '\0';
_ostr.write(&zero, sizeof(zero));
}
void BinaryWriter::writeBOM()
{
UInt16 value = 0xFEFF;

View File

@ -13,3 +13,4 @@ target_compile_options (_poco_mongodb
target_include_directories (_poco_mongodb SYSTEM PUBLIC "include")
target_link_libraries (_poco_mongodb PUBLIC Poco::Net)

View File

@ -33,7 +33,7 @@ namespace MongoDB
/// This class represents a BSON Array.
{
public:
typedef SharedPtr<Array> Ptr;
using Ptr = SharedPtr<Array>;
Array();
/// Creates an empty Array.
@ -41,8 +41,31 @@ namespace MongoDB
virtual ~Array();
/// Destroys the Array.
// Document template functions available for backward compatibility
using Document::add;
using Document::get;
template <typename T>
T get(int pos) const
Document & add(T value)
/// Creates an element with the name from the current pos and value and
/// adds it to the array document.
///
/// The active document is returned to allow chaining of the add methods.
{
return Document::add<T>(Poco::NumberFormatter::format(size()), value);
}
Document & add(const char * value)
/// Creates an element with a name from the current pos and value and
/// adds it to the array document.
///
/// The active document is returned to allow chaining of the add methods.
{
return Document::add(Poco::NumberFormatter::format(size()), value);
}
template <typename T>
T get(std::size_t pos) const
/// Returns the element at the given index and tries to convert
/// it to the template type. If the element is not found, a
/// Poco::NotFoundException will be thrown. If the element cannot be
@ -52,7 +75,7 @@ namespace MongoDB
}
template <typename T>
T get(int pos, const T & deflt) const
T get(std::size_t pos, const T & deflt) const
/// Returns the element at the given index and tries to convert
/// it to the template type. If the element is not found, or
/// has the wrong type, the deflt argument will be returned.
@ -60,12 +83,12 @@ namespace MongoDB
return Document::get<T>(Poco::NumberFormatter::format(pos), deflt);
}
Element::Ptr get(int pos) const;
Element::Ptr get(std::size_t pos) const;
/// Returns the element at the given index.
/// An empty element will be returned if the element is not found.
template <typename T>
bool isType(int pos) const
bool isType(std::size_t pos) const
/// Returns true if the type of the element equals the TypeId of ElementTrait,
/// otherwise false.
{
@ -74,6 +97,9 @@ namespace MongoDB
std::string toString(int indent = 0) const;
/// Returns a string representation of the Array.
private:
friend void BSONReader::read<Array::Ptr>(Array::Ptr & to);
};

View File

@ -40,7 +40,7 @@ namespace MongoDB
/// A Binary stores its data in a Poco::Buffer<unsigned char>.
{
public:
typedef SharedPtr<Binary> Ptr;
using Ptr = SharedPtr<Binary>;
Binary();
/// Creates an empty Binary with subtype 0.

View File

@ -18,6 +18,7 @@
#define MongoDB_Connection_INCLUDED
#include "Poco/MongoDB/OpMsgMessage.h"
#include "Poco/MongoDB/RequestMessage.h"
#include "Poco/MongoDB/ResponseMessage.h"
#include "Poco/Mutex.h"
@ -39,7 +40,7 @@ namespace MongoDB
/// for more information on the wire protocol.
{
public:
typedef Poco::SharedPtr<Connection> Ptr;
using Ptr = Poco::SharedPtr<Connection>;
class MongoDB_API SocketFactory
{
@ -90,7 +91,7 @@ namespace MongoDB
Poco::Net::SocketAddress address() const;
/// Returns the address of the MongoDB server.
const std::string & uri() const;
/// Returns the uri on which the connection was made.
@ -145,6 +146,21 @@ namespace MongoDB
/// Use this when a response is expected: only a "query" or "getmore"
/// request will return a response.
void sendRequest(OpMsgMessage & request, OpMsgMessage & response);
/// Sends a request to the MongoDB server and receives the response
/// using newer wire protocol with OP_MSG.
void sendRequest(OpMsgMessage & request);
/// Sends an unacknowledged request to the MongoDB server using newer
/// wire protocol with OP_MSG.
/// No response is sent by the server.
void readResponse(OpMsgMessage & response);
/// Reads additional response data when previous message's flag moreToCome
/// indicates that server will send more data.
/// NOTE: See comments in OpMsgCursor code.
protected:
void connect();
@ -164,7 +180,7 @@ namespace MongoDB
}
inline const std::string & Connection::uri() const
{
return _uri;
return _uri;
}

View File

@ -40,6 +40,9 @@ namespace MongoDB
Cursor(const std::string & fullCollectionName, QueryRequest::Flags flags = QueryRequest::QUERY_DEFAULT);
/// Creates a Cursor for the given database and collection ("database.collection"), using the specified flags.
Cursor(const Document & aggregationResponse);
/// Creates a Cursor for the given aggregation query response.
virtual ~Cursor();
/// Destroys the Cursor.

View File

@ -26,6 +26,8 @@
#include "Poco/MongoDB/QueryRequest.h"
#include "Poco/MongoDB/UpdateRequest.h"
#include "Poco/MongoDB/OpMsgCursor.h"
#include "Poco/MongoDB/OpMsgMessage.h"
namespace Poco
{
@ -45,6 +47,9 @@ namespace MongoDB
virtual ~Database();
/// Destroys the Database.
const std::string & name() const;
/// Database name
bool authenticate(
Connection & connection,
const std::string & username,
@ -62,34 +67,49 @@ namespace MongoDB
/// May throw a Poco::ProtocolException if authentication fails for a reason other than
/// invalid credentials.
Document::Ptr queryBuildInfo(Connection & connection) const;
/// Queries server build info (all wire protocols)
Document::Ptr queryServerHello(Connection & connection) const;
/// Queries hello response from server (all wire protocols)
Int64 count(Connection & connection, const std::string & collectionName) const;
/// Sends a count request for the given collection to MongoDB.
/// Sends a count request for the given collection to MongoDB. (old wire protocol)
///
/// If the command fails, -1 is returned.
Poco::SharedPtr<Poco::MongoDB::QueryRequest> createCommand() const;
/// Creates a QueryRequest for a command.
/// Creates a QueryRequest for a command. (old wire protocol)
Poco::SharedPtr<Poco::MongoDB::QueryRequest> createCountRequest(const std::string & collectionName) const;
/// Creates a QueryRequest to count the given collection.
/// The collectionname must not contain the database name.
/// The collectionname must not contain the database name. (old wire protocol)
Poco::SharedPtr<Poco::MongoDB::DeleteRequest> createDeleteRequest(const std::string & collectionName) const;
/// Creates a DeleteRequest to delete documents in the given collection.
/// The collectionname must not contain the database name.
/// The collectionname must not contain the database name. (old wire protocol)
Poco::SharedPtr<Poco::MongoDB::InsertRequest> createInsertRequest(const std::string & collectionName) const;
/// Creates an InsertRequest to insert new documents in the given collection.
/// The collectionname must not contain the database name.
/// The collectionname must not contain the database name. (old wire protocol)
Poco::SharedPtr<Poco::MongoDB::QueryRequest> createQueryRequest(const std::string & collectionName) const;
/// Creates a QueryRequest.
/// Creates a QueryRequest. (old wire protocol)
/// The collectionname must not contain the database name.
Poco::SharedPtr<Poco::MongoDB::UpdateRequest> createUpdateRequest(const std::string & collectionName) const;
/// Creates an UpdateRequest.
/// Creates an UpdateRequest. (old wire protocol)
/// The collectionname must not contain the database name.
Poco::SharedPtr<Poco::MongoDB::OpMsgMessage> createOpMsgMessage(const std::string & collectionName) const;
/// Creates OpMsgMessage. (new wire protocol)
Poco::SharedPtr<Poco::MongoDB::OpMsgMessage> createOpMsgMessage() const;
/// Creates OpMsgMessage for database commands that do not require collection as an argument. (new wire protocol)
Poco::SharedPtr<Poco::MongoDB::OpMsgCursor> createOpMsgCursor(const std::string & collectionName) const;
/// Creates OpMsgCursor. (new wire protocol)
Poco::MongoDB::Document::Ptr ensureIndex(
Connection & connection,
const std::string & collection,
@ -100,14 +120,16 @@ namespace MongoDB
int version = 0,
int ttl = 0);
/// Creates an index. The document returned is the result of a getLastError call.
/// For more info look at the ensureIndex information on the MongoDB website.
/// For more info look at the ensureIndex information on the MongoDB website. (old wire protocol)
Document::Ptr getLastErrorDoc(Connection & connection) const;
/// Sends the getLastError command to the database and returns the error document.
/// (old wire protocol)
std::string getLastError(Connection & connection) const;
/// Sends the getLastError command to the database and returns the err element
/// from the error document. When err is null, an empty string is returned.
/// (old wire protocol)
static const std::string AUTH_MONGODB_CR;
/// Default authentication mechanism prior to MongoDB 3.0.
@ -115,6 +137,27 @@ namespace MongoDB
static const std::string AUTH_SCRAM_SHA1;
/// Default authentication mechanism for MongoDB 3.0.
enum WireVersion
/// Wire version as reported by the command hello.
/// See details in MongoDB github, repository specifications.
/// @see queryServerHello
{
VER_26 = 1,
VER_26_2 = 2,
VER_30 = 3,
VER_32 = 4,
VER_34 = 5,
VER_36 = 6, ///< First wire version that supports OP_MSG
VER_40 = 7,
VER_42 = 8,
VER_44 = 9,
VER_50 = 13,
VER_51 = 14, ///< First wire version that supports only OP_MSG
VER_52 = 15,
VER_53 = 16,
VER_60 = 17
};
protected:
bool authCR(Connection & connection, const std::string & username, const std::string & password);
bool authSCRAM(Connection & connection, const std::string & username, const std::string & password);
@ -127,6 +170,12 @@ namespace MongoDB
//
// inlines
//
inline const std::string & Database::name() const
{
return _dbname;
}
inline Poco::SharedPtr<Poco::MongoDB::QueryRequest> Database::createCommand() const
{
Poco::SharedPtr<Poco::MongoDB::QueryRequest> cmd = createQueryRequest("$cmd");
@ -158,6 +207,24 @@ namespace MongoDB
return new Poco::MongoDB::UpdateRequest(_dbname + '.' + collectionName);
}
// -- New wire protocol commands
inline Poco::SharedPtr<Poco::MongoDB::OpMsgMessage> Database::createOpMsgMessage(const std::string & collectionName) const
{
return new Poco::MongoDB::OpMsgMessage(_dbname, collectionName);
}
inline Poco::SharedPtr<Poco::MongoDB::OpMsgMessage> Database::createOpMsgMessage() const
{
// Collection name for database commands is not needed.
return createOpMsgMessage("");
}
inline Poco::SharedPtr<Poco::MongoDB::OpMsgCursor> Database::createOpMsgCursor(const std::string & collectionName) const
{
return new Poco::MongoDB::OpMsgCursor(_dbname, collectionName);
}
}
} // namespace Poco::MongoDB

View File

@ -31,6 +31,7 @@ namespace Poco
namespace MongoDB
{
class Array;
class ElementFindByName
{
@ -48,8 +49,8 @@ namespace MongoDB
/// Represents a MongoDB (BSON) document.
{
public:
typedef SharedPtr<Document> Ptr;
typedef std::vector<Document::Ptr> Vector;
using Ptr = SharedPtr<Document>;
using Vector = std::vector<Document::Ptr>;
Document();
/// Creates an empty Document.
@ -86,6 +87,10 @@ namespace MongoDB
/// Unlike the other add methods, this method returns
/// a reference to the new document.
Array & addNewArray(const std::string & name);
/// Create a new array and add it to this document.
/// Method returns a reference to the new array.
void clear();
/// Removes all elements from the document.
@ -95,7 +100,7 @@ namespace MongoDB
bool empty() const;
/// Returns true if the document doesn't contain any documents.
bool exists(const std::string & name);
bool exists(const std::string & name) const;
/// Returns true if the document has an element with the given name.
template <typename T>
@ -158,6 +163,9 @@ namespace MongoDB
/// return an Int64. When the element is not found, a
/// Poco::NotFoundException will be thrown.
bool remove(const std::string & name);
/// Removes an element from the document.
template <typename T>
bool isType(const std::string & name) const
/// Returns true when the type of the element equals the TypeId of ElementTrait.
@ -227,12 +235,23 @@ namespace MongoDB
}
inline bool Document::exists(const std::string & name)
inline bool Document::exists(const std::string & name) const
{
return std::find_if(_elements.begin(), _elements.end(), ElementFindByName(name)) != _elements.end();
}
inline bool Document::remove(const std::string & name)
{
auto it = std::find_if(_elements.begin(), _elements.end(), ElementFindByName(name));
if (it == _elements.end())
return false;
_elements.erase(it);
return true;
}
inline std::size_t Document::size() const
{
return _elements.size();

View File

@ -45,7 +45,7 @@ namespace MongoDB
/// Represents an Element of a Document or an Array.
{
public:
typedef Poco::SharedPtr<Element> Ptr;
using Ptr = Poco::SharedPtr<Element>;
explicit Element(const std::string & name);
/// Creates the Element with the given name.
@ -80,7 +80,7 @@ namespace MongoDB
}
typedef std::list<Element::Ptr> ElementSet;
using ElementSet = std::list<Element::Ptr>;
template <typename T>
@ -266,7 +266,7 @@ namespace MongoDB
}
typedef Nullable<unsigned char> NullValue;
using NullValue = Nullable<unsigned char>;
// BSON Null Value

View File

@ -35,7 +35,7 @@ namespace MongoDB
/// Represents JavaScript type in BSON.
{
public:
typedef SharedPtr<JavaScriptCode> Ptr;
using Ptr = SharedPtr<JavaScriptCode>;
JavaScriptCode();
/// Creates an empty JavaScriptCode object.

View File

@ -28,6 +28,9 @@ namespace MongoDB
{
class Message; // Required to disambiguate friend declaration in MessageHeader.
class MongoDB_API MessageHeader
/// Represents the message header which is always prepended to a
/// MongoDB request or response message.
@ -37,14 +40,18 @@ namespace MongoDB
enum OpCode
{
// Opcodes deprecated in MongoDB 5.0
OP_REPLY = 1,
OP_MSG = 1000,
OP_UPDATE = 2001,
OP_INSERT = 2002,
OP_QUERY = 2004,
OP_GET_MORE = 2005,
OP_DELETE = 2006,
OP_KILL_CURSORS = 2007
OP_KILL_CURSORS = 2007,
/// Opcodes supported in MongoDB 5.1 and later
OP_COMPRESSED = 2012,
OP_MSG = 2013
};
explicit MessageHeader(OpCode);

View File

@ -33,6 +33,13 @@
//
#if defined(_WIN32) && defined(POCO_DLL)
# if defined(MongoDB_EXPORTS)
# define MongoDB_API __declspec(dllexport)
# else
# define MongoDB_API __declspec(dllimport)
# endif
#endif
#if !defined(MongoDB_API)
@ -47,6 +54,11 @@
//
// Automatically link MongoDB library.
//
#if defined(_MSC_VER)
# if !defined(POCO_NO_AUTOMATIC_LIBS) && !defined(MongoDB_EXPORTS)
# pragma comment(lib, "PocoMongoDB" POCO_LIB_SUFFIX)
# endif
#endif
#endif // MongoDBMongoDB_INCLUDED

View File

@ -44,7 +44,7 @@ namespace MongoDB
/// as its value.
{
public:
typedef SharedPtr<ObjectId> Ptr;
using Ptr = SharedPtr<ObjectId>;
explicit ObjectId(const std::string & id);
/// Creates an ObjectId from a string.

View File

@ -0,0 +1,96 @@
//
// OpMsgCursor.h
//
// Library: MongoDB
// Package: MongoDB
// Module: OpMsgCursor
//
// Definition of the OpMsgCursor class.
//
// Copyright (c) 2012, Applied Informatics Software Engineering GmbH.
// and Contributors.
//
// SPDX-License-Identifier: BSL-1.0
//
#ifndef MongoDB_OpMsgCursor_INCLUDED
#define MongoDB_OpMsgCursor_INCLUDED
#include "Poco/MongoDB/Connection.h"
#include "Poco/MongoDB/MongoDB.h"
#include "Poco/MongoDB/OpMsgMessage.h"
namespace Poco
{
namespace MongoDB
{
class MongoDB_API OpMsgCursor : public Document
/// OpMsgCursor is an helper class for querying multiple documents using OpMsgMessage.
{
public:
OpMsgCursor(const std::string & dbname, const std::string & collectionName);
/// Creates a OpMsgCursor for the given database and collection.
virtual ~OpMsgCursor();
/// Destroys the OpMsgCursor.
void setEmptyFirstBatch(bool empty);
/// Empty first batch is used to get error response faster with little server processing
bool emptyFirstBatch() const;
void setBatchSize(Int32 batchSize);
/// Set non-default batch size
Int32 batchSize() const;
/// Current batch size (zero or negative number indicates default batch size)
Int64 cursorID() const;
OpMsgMessage & next(Connection & connection);
/// Tries to get the next documents. As long as response message has a
/// cursor ID next can be called to retrieve the next bunch of documents.
///
/// The cursor must be killed (see kill()) when not all documents are needed.
OpMsgMessage & query();
/// Returns the associated query.
void kill(Connection & connection);
/// Kills the cursor and reset it so that it can be reused.
private:
OpMsgMessage _query;
OpMsgMessage _response;
bool _emptyFirstBatch{false};
Int32 _batchSize{-1};
/// Batch size used in the cursor. Zero or negative value means that default shall be used.
Int64 _cursorID{0};
};
//
// inlines
//
inline OpMsgMessage & OpMsgCursor::query()
{
return _query;
}
inline Int64 OpMsgCursor::cursorID() const
{
return _cursorID;
}
}
} // namespace Poco::MongoDB
#endif // MongoDB_OpMsgCursor_INCLUDED

View File

@ -0,0 +1,163 @@
//
// OpMsgMessage.h
//
// Library: MongoDB
// Package: MongoDB
// Module: OpMsgMessage
//
// Definition of the OpMsgMessage class.
//
// Copyright (c) 2022, Applied Informatics Software Engineering GmbH.
// and Contributors.
//
// SPDX-License-Identifier: BSL-1.0
//
#ifndef MongoDB_OpMsgMessage_INCLUDED
#define MongoDB_OpMsgMessage_INCLUDED
#include "Poco/MongoDB/Document.h"
#include "Poco/MongoDB/Message.h"
#include "Poco/MongoDB/MongoDB.h"
#include <string>
namespace Poco
{
namespace MongoDB
{
class MongoDB_API OpMsgMessage : public Message
/// This class represents a request/response (OP_MSG) to send requests and receive responses to/from MongoDB.
{
public:
// Constants for most often used MongoDB commands that can be sent using OP_MSG
// For complete list see: https://www.mongodb.com/docs/manual/reference/command/
// Query and write
static const std::string CMD_INSERT;
static const std::string CMD_DELETE;
static const std::string CMD_UPDATE;
static const std::string CMD_FIND;
static const std::string CMD_FIND_AND_MODIFY;
static const std::string CMD_GET_MORE;
// Aggregation
static const std::string CMD_AGGREGATE;
static const std::string CMD_COUNT;
static const std::string CMD_DISTINCT;
static const std::string CMD_MAP_REDUCE;
// Replication and administration
static const std::string CMD_HELLO;
static const std::string CMD_REPL_SET_GET_STATUS;
static const std::string CMD_REPL_SET_GET_CONFIG;
static const std::string CMD_CREATE;
static const std::string CMD_CREATE_INDEXES;
static const std::string CMD_DROP;
static const std::string CMD_DROP_DATABASE;
static const std::string CMD_KILL_CURSORS;
static const std::string CMD_LIST_DATABASES;
static const std::string CMD_LIST_INDEXES;
// Diagnostic
static const std::string CMD_BUILD_INFO;
static const std::string CMD_COLL_STATS;
static const std::string CMD_DB_STATS;
static const std::string CMD_HOST_INFO;
enum Flags : UInt32
{
MSG_FLAGS_DEFAULT = 0,
MSG_CHECKSUM_PRESENT = (1 << 0),
MSG_MORE_TO_COME = (1 << 1),
/// Sender will send another message and is not prepared for overlapping messages
MSG_EXHAUST_ALLOWED = (1 << 16)
/// Client is prepared for multiple replies (using the moreToCome bit) to this request
};
OpMsgMessage();
/// Creates an OpMsgMessage for response.
OpMsgMessage(const std::string & databaseName, const std::string & collectionName, UInt32 flags = MSG_FLAGS_DEFAULT);
/// Creates an OpMsgMessage for requests.
virtual ~OpMsgMessage();
const std::string & databaseName() const;
const std::string & collectionName() const;
void setCommandName(const std::string & command);
/// Sets the command name and clears the command document
void setCursor(Poco::Int64 cursorID, Poco::Int32 batchSize = -1);
/// Sets the command "getMore" for the cursor id with batch size (if it is not negative).
const std::string & commandName() const;
/// Current command name.
void setAcknowledgedRequest(bool ack);
/// Set false to create request that does not return response.
/// It has effect only for commands that write or delete documents.
/// Default is true (request returns acknowledge response).
bool acknowledgedRequest() const;
UInt32 flags() const;
Document & body();
/// Access to body document.
/// Additional query arguments shall be added after setting the command name.
const Document & body() const;
Document::Vector & documents();
/// Documents prepared for request or retrieved in response.
const Document::Vector & documents() const;
/// Documents prepared for request or retrieved in response.
bool responseOk() const;
/// Reads "ok" status from the response message.
void clear();
/// Clears the message.
void send(std::ostream & ostr);
/// Writes the request to stream.
void read(std::istream & istr);
/// Reads the response from the stream.
private:
enum PayloadType : UInt8
{
PAYLOAD_TYPE_0 = 0,
PAYLOAD_TYPE_1 = 1
};
std::string _databaseName;
std::string _collectionName;
UInt32 _flags{MSG_FLAGS_DEFAULT};
std::string _commandName;
bool _acknowledged{true};
Document _body;
Document::Vector _documents;
};
}
} // namespace Poco::MongoDB
#endif // MongoDB_OpMsgMessage_INCLUDED

View File

@ -94,7 +94,23 @@ namespace MongoDB
operator Connection::Ptr() { return _connection; }
#if defined(POCO_ENABLE_CPP11)
// Disable copy to prevent unwanted release of resources: C++11 way
PooledConnection(const PooledConnection &) = delete;
PooledConnection & operator=(const PooledConnection &) = delete;
// Enable move semantics
PooledConnection(PooledConnection && other) = default;
PooledConnection & operator=(PooledConnection &&) = default;
#endif
private:
#if !defined(POCO_ENABLE_CPP11)
// Disable copy to prevent unwanted release of resources: pre C++11 way
PooledConnection(const PooledConnection &);
PooledConnection & operator=(const PooledConnection &);
#endif
Poco::ObjectPool<Connection, Connection::Ptr> & _pool;
Connection::Ptr _connection;
};

View File

@ -33,7 +33,7 @@ namespace MongoDB
/// Represents a regular expression in BSON format.
{
public:
typedef SharedPtr<RegularExpression> Ptr;
using Ptr = SharedPtr<RegularExpression>;
RegularExpression();
/// Creates an empty RegularExpression.

View File

@ -38,6 +38,9 @@ namespace MongoDB
ResponseMessage();
/// Creates an empty ResponseMessage.
ResponseMessage(const Int64 & cursorID);
/// Creates an ResponseMessage for existing cursor ID.
virtual ~ResponseMessage();
/// Destroys the ResponseMessage.

View File

@ -20,7 +20,7 @@ namespace Poco {
namespace MongoDB {
Array::Array():
Array::Array():
Document()
{
}
@ -31,7 +31,7 @@ Array::~Array()
}
Element::Ptr Array::get(int pos) const
Element::Ptr Array::get(std::size_t pos) const
{
std::string name = Poco::NumberFormatter::format(pos);
return Document::get(name);

View File

@ -319,4 +319,30 @@ void Connection::sendRequest(RequestMessage& request, ResponseMessage& response)
}
void Connection::sendRequest(OpMsgMessage& request, OpMsgMessage& response)
{
Poco::Net::SocketOutputStream sos(_socket);
request.send(sos);
response.clear();
readResponse(response);
}
void Connection::sendRequest(OpMsgMessage& request)
{
request.setAcknowledgedRequest(false);
Poco::Net::SocketOutputStream sos(_socket);
request.send(sos);
}
void Connection::readResponse(OpMsgMessage& response)
{
Poco::Net::SocketInputStream sis(_socket);
response.read(sis);
}
} } // Poco::MongoDB

View File

@ -33,6 +33,12 @@ Cursor::Cursor(const std::string& fullCollectionName, QueryRequest::Flags flags)
}
Cursor::Cursor(const Document& aggregationResponse) :
_query(aggregationResponse.get<Poco::MongoDB::Document::Ptr>("cursor")->get<std::string>("ns")),
_response(aggregationResponse.get<Poco::MongoDB::Document::Ptr>("cursor")->get<Int64>("id"))
{
}
Cursor::~Cursor()
{
try

View File

@ -334,6 +334,50 @@ bool Database::authSCRAM(Connection& connection, const std::string& username, co
}
Document::Ptr Database::queryBuildInfo(Connection& connection) const
{
// build info can be issued on "config" system database
Poco::SharedPtr<Poco::MongoDB::QueryRequest> request = createCommand();
request->selector().add("buildInfo", 1);
Poco::MongoDB::ResponseMessage response;
connection.sendRequest(*request, response);
Document::Ptr buildInfo;
if ( response.documents().size() > 0 )
{
buildInfo = response.documents()[0];
}
else
{
throw Poco::ProtocolException("Didn't get a response from the buildinfo command");
}
return buildInfo;
}
Document::Ptr Database::queryServerHello(Connection& connection) const
{
// hello can be issued on "config" system database
Poco::SharedPtr<Poco::MongoDB::QueryRequest> request = createCommand();
request->selector().add("hello", 1);
Poco::MongoDB::ResponseMessage response;
connection.sendRequest(*request, response);
Document::Ptr hello;
if ( response.documents().size() > 0 )
{
hello = response.documents()[0];
}
else
{
throw Poco::ProtocolException("Didn't get a response from the hello command");
}
return hello;
}
Int64 Database::count(Connection& connection, const std::string& collectionName) const
{
Poco::SharedPtr<Poco::MongoDB::QueryRequest> countRequest = createCountRequest(collectionName);
@ -390,7 +434,7 @@ Document::Ptr Database::getLastErrorDoc(Connection& connection) const
{
Document::Ptr errorDoc;
Poco::SharedPtr<Poco::MongoDB::QueryRequest> request = createQueryRequest("$cmd");
Poco::SharedPtr<Poco::MongoDB::QueryRequest> request = createCommand();
request->setNumberToReturn(1);
request->selector().add("getLastError", 1);
@ -420,7 +464,7 @@ std::string Database::getLastError(Connection& connection) const
Poco::SharedPtr<Poco::MongoDB::QueryRequest> Database::createCountRequest(const std::string& collectionName) const
{
Poco::SharedPtr<Poco::MongoDB::QueryRequest> request = createQueryRequest("$cmd");
Poco::SharedPtr<Poco::MongoDB::QueryRequest> request = createCommand();
request->setNumberToReturn(1);
request->selector().add("count", collectionName);
return request;

View File

@ -20,8 +20,8 @@ namespace MongoDB {
DeleteRequest::DeleteRequest(const std::string& collectionName, DeleteRequest::Flags flags):
RequestMessage(MessageHeader::OP_DELETE),
_flags(flags),
RequestMessage(MessageHeader::OP_DELETE),
_flags(flags),
_fullCollectionName(collectionName),
_selector()
{

View File

@ -35,6 +35,14 @@ Document::~Document()
}
Array& Document::addNewArray(const std::string& name)
{
Array::Ptr newArray = new Array();
add(name, newArray);
return *newArray;
}
Element::Ptr Document::get(const std::string& name) const
{
Element::Ptr element;
@ -84,7 +92,7 @@ void Document::read(BinaryReader& reader)
while (type != '\0')
{
Element::Ptr element;
std::string name = BSONReader(reader).readCString();
switch (type)
@ -198,7 +206,7 @@ void Document::write(BinaryWriter& writer)
else
{
std::stringstream sstream;
Poco::BinaryWriter tempWriter(sstream);
Poco::BinaryWriter tempWriter(sstream, BinaryWriter::LITTLE_ENDIAN_BYTE_ORDER);
for (ElementSet::iterator it = _elements.begin(); it != _elements.end(); ++it)
{
tempWriter << static_cast<unsigned char>((*it)->type());
@ -207,7 +215,7 @@ void Document::write(BinaryWriter& writer)
element->write(tempWriter);
}
tempWriter.flush();
Poco::Int32 len = static_cast<Poco::Int32>(5 + sstream.tellp()); /* 5 = sizeof(len) + 0-byte */
writer << len;
writer.writeRaw(sstream.str());

View File

@ -24,7 +24,7 @@ Element::Element(const std::string& name) : _name(name)
}
Element::~Element()
Element::~Element()
{
}

View File

@ -21,7 +21,7 @@ namespace MongoDB {
GetMoreRequest::GetMoreRequest(const std::string& collectionName, Int64 cursorID):
RequestMessage(MessageHeader::OP_GET_MORE),
RequestMessage(MessageHeader::OP_GET_MORE),
_fullCollectionName(collectionName),
_numberToReturn(100),
_cursorID(cursorID)

View File

@ -20,7 +20,7 @@ namespace MongoDB {
InsertRequest::InsertRequest(const std::string& collectionName, Flags flags):
RequestMessage(MessageHeader::OP_INSERT),
RequestMessage(MessageHeader::OP_INSERT),
_flags(flags),
_fullCollectionName(collectionName)
{

View File

@ -37,7 +37,7 @@ void KillCursorsRequest::buildRequest(BinaryWriter& writer)
for (std::vector<Int64>::iterator it = _cursors.begin(); it != _cursors.end(); ++it)
{
writer << *it;
}
}
}

View File

@ -19,7 +19,7 @@ namespace Poco {
namespace MongoDB {
Message::Message(MessageHeader::OpCode opcode):
Message::Message(MessageHeader::OpCode opcode):
_header(opcode)
{
}

View File

@ -20,10 +20,10 @@ namespace Poco {
namespace MongoDB {
MessageHeader::MessageHeader(OpCode opCode):
_messageLength(0),
_requestID(0),
_responseTo(0),
MessageHeader::MessageHeader(OpCode opCode):
_messageLength(0),
_requestID(0),
_responseTo(0),
_opCode(opCode)
{
}
@ -42,7 +42,7 @@ void MessageHeader::read(BinaryReader& reader)
Int32 opCode;
reader >> opCode;
_opCode = (OpCode) opCode;
_opCode = static_cast<OpCode>(opCode);
if (!reader.good())
{
@ -56,7 +56,7 @@ void MessageHeader::write(BinaryWriter& writer)
writer << _messageLength;
writer << _requestID;
writer << _responseTo;
writer << (Int32) _opCode;
writer << static_cast<Int32>(_opCode);
}

View File

@ -32,7 +32,7 @@ ObjectId::ObjectId(const std::string& id)
poco_assert_dbg(id.size() == 24);
const char* p = id.c_str();
for (std::size_t i = 0; i < 12; ++i)
for (std::size_t i = 0; i < 12; ++i)
{
_id[i] = fromHex(p);
p += 2;

View File

@ -0,0 +1,187 @@
//
// OpMsgCursor.cpp
//
// Library: MongoDB
// Package: MongoDB
// Module: OpMsgCursor
//
// Copyright (c) 2022, Applied Informatics Software Engineering GmbH.
// and Contributors.
//
// SPDX-License-Identifier: BSL-1.0
//
#include "Poco/MongoDB/OpMsgCursor.h"
#include "Poco/MongoDB/Array.h"
//
// NOTE:
//
// MongoDB specification indicates that the flag MSG_EXHAUST_ALLOWED shall be
// used in the request when the receiver is ready to receive multiple messages
// without sending additional requests in between. Sender (MongoDB) indicates
// that more messages follow with flag MSG_MORE_TO_COME.
//
// It seems that this does not work properly. MSG_MORE_TO_COME is set and reading
// next messages sometimes works, however often the data is missing in response
// or the message header contains wrong message length and reading blocks.
// Opcode in the header is correct.
//
// Using MSG_EXHAUST_ALLOWED is therefore currently disabled.
//
// It seems that related JIRA ticket is:
//
// https://jira.mongodb.org/browse/SERVER-57297
//
// https://github.com/mongodb/specifications/blob/master/source/message/OP_MSG.rst
//
#define MONGODB_EXHAUST_ALLOWED_WORKS false
namespace Poco {
namespace MongoDB {
static const std::string keyCursor {"cursor"};
static const std::string keyFirstBatch {"firstBatch"};
static const std::string keyNextBatch {"nextBatch"};
static Poco::Int64 cursorIdFromResponse(const MongoDB::Document& doc);
OpMsgCursor::OpMsgCursor(const std::string& db, const std::string& collection):
#if MONGODB_EXHAUST_ALLOWED_WORKS
_query(db, collection, OpMsgMessage::MSG_EXHAUST_ALLOWED)
#else
_query(db, collection)
#endif
{
}
OpMsgCursor::~OpMsgCursor()
{
try
{
poco_assert_dbg(_cursorID == 0);
}
catch (...)
{
}
}
void OpMsgCursor::setEmptyFirstBatch(bool empty)
{
_emptyFirstBatch = empty;
}
bool OpMsgCursor::emptyFirstBatch() const
{
return _emptyFirstBatch;
}
void OpMsgCursor::setBatchSize(Int32 batchSize)
{
_batchSize = batchSize;
}
Int32 OpMsgCursor::batchSize() const
{
return _batchSize;
}
OpMsgMessage& OpMsgCursor::next(Connection& connection)
{
if (_cursorID == 0)
{
_response.clear();
if (_emptyFirstBatch || _batchSize > 0)
{
Int32 bsize = _emptyFirstBatch ? 0 : _batchSize;
if (_query.commandName() == OpMsgMessage::CMD_FIND)
{
_query.body().add("batchSize", bsize);
}
else if (_query.commandName() == OpMsgMessage::CMD_AGGREGATE)
{
auto& cursorDoc = _query.body().addNewDocument("cursor");
cursorDoc.add("batchSize", bsize);
}
}
connection.sendRequest(_query, _response);
const auto& rdoc = _response.body();
_cursorID = cursorIdFromResponse(rdoc);
}
else
{
#if MONGODB_EXHAUST_ALLOWED_WORKS
std::cout << "Response flags: " << _response.flags() << std::endl;
if (_response.flags() & OpMsgMessage::MSG_MORE_TO_COME)
{
std::cout << "More to come. Reading more response: " << std::endl;
_response.clear();
connection.readResponse(_response);
}
else
#endif
{
_response.clear();
_query.setCursor(_cursorID, _batchSize);
connection.sendRequest(_query, _response);
}
}
const auto& rdoc = _response.body();
_cursorID = cursorIdFromResponse(rdoc);
return _response;
}
void OpMsgCursor::kill(Connection& connection)
{
_response.clear();
if (_cursorID != 0)
{
_query.setCommandName(OpMsgMessage::CMD_KILL_CURSORS);
MongoDB::Array::Ptr cursors = new MongoDB::Array();
cursors->add<Poco::Int64>(_cursorID);
_query.body().add("cursors", cursors);
connection.sendRequest(_query, _response);
const auto killed = _response.body().get<MongoDB::Array::Ptr>("cursorsKilled", nullptr);
if (!killed || killed->size() != 1 || killed->get<Poco::Int64>(0, -1) != _cursorID)
{
throw Poco::ProtocolException("Cursor not killed as expected: " + std::to_string(_cursorID));
}
_cursorID = 0;
_query.clear();
_response.clear();
}
}
Poco::Int64 cursorIdFromResponse(const MongoDB::Document& doc)
{
Poco::Int64 id {0};
auto cursorDoc = doc.get<Document::Ptr>(keyCursor, nullptr);
if(cursorDoc)
{
id = cursorDoc->get<Poco::Int64>("id", 0);
}
return id;
}
} } // Namespace Poco::MongoDB

View File

@ -0,0 +1,412 @@
//
// OpMsgMessage.cpp
//
// Library: MongoDB
// Package: MongoDB
// Module: OpMsgMessage
//
// Copyright (c) 2022, Applied Informatics Software Engineering GmbH.
// and Contributors.
//
// SPDX-License-Identifier: BSL-1.0
//
#include "Poco/MongoDB/OpMsgMessage.h"
#include "Poco/MongoDB/MessageHeader.h"
#include "Poco/MongoDB/Array.h"
#include "Poco/StreamCopier.h"
#include "Poco/Logger.h"
#define POCO_MONGODB_DUMP false
namespace Poco {
namespace MongoDB {
// Query and write
const std::string OpMsgMessage::CMD_INSERT { "insert" };
const std::string OpMsgMessage::CMD_DELETE { "delete" };
const std::string OpMsgMessage::CMD_UPDATE { "update" };
const std::string OpMsgMessage::CMD_FIND { "find" };
const std::string OpMsgMessage::CMD_FIND_AND_MODIFY { "findAndModify" };
const std::string OpMsgMessage::CMD_GET_MORE { "getMore" };
// Aggregation
const std::string OpMsgMessage::CMD_AGGREGATE { "aggregate" };
const std::string OpMsgMessage::CMD_COUNT { "count" };
const std::string OpMsgMessage::CMD_DISTINCT { "distinct" };
const std::string OpMsgMessage::CMD_MAP_REDUCE { "mapReduce" };
// Replication and administration
const std::string OpMsgMessage::CMD_HELLO { "hello" };
const std::string OpMsgMessage::CMD_REPL_SET_GET_STATUS { "replSetGetStatus" };
const std::string OpMsgMessage::CMD_REPL_SET_GET_CONFIG { "replSetGetConfig" };
const std::string OpMsgMessage::CMD_CREATE { "create" };
const std::string OpMsgMessage::CMD_CREATE_INDEXES { "createIndexes" };
const std::string OpMsgMessage::CMD_DROP { "drop" };
const std::string OpMsgMessage::CMD_DROP_DATABASE { "dropDatabase" };
const std::string OpMsgMessage::CMD_KILL_CURSORS { "killCursors" };
const std::string OpMsgMessage::CMD_LIST_DATABASES { "listDatabases" };
const std::string OpMsgMessage::CMD_LIST_INDEXES { "listIndexes" };
// Diagnostic
const std::string OpMsgMessage::CMD_BUILD_INFO { "buildInfo" };
const std::string OpMsgMessage::CMD_COLL_STATS { "collStats" };
const std::string OpMsgMessage::CMD_DB_STATS { "dbStats" };
const std::string OpMsgMessage::CMD_HOST_INFO { "hostInfo" };
static const std::string& commandIdentifier(const std::string& command);
/// Commands have different names for the payload that is sent in a separate section
static const std::string keyCursor {"cursor"};
static const std::string keyFirstBatch {"firstBatch"};
static const std::string keyNextBatch {"nextBatch"};
OpMsgMessage::OpMsgMessage() :
Message(MessageHeader::OP_MSG)
{
}
OpMsgMessage::OpMsgMessage(const std::string& databaseName, const std::string& collectionName, UInt32 flags) :
Message(MessageHeader::OP_MSG),
_databaseName(databaseName),
_collectionName(collectionName),
_flags(flags)
{
}
OpMsgMessage::~OpMsgMessage()
{
}
const std::string& OpMsgMessage::databaseName() const
{
return _databaseName;
}
const std::string& OpMsgMessage::collectionName() const
{
return _collectionName;
}
void OpMsgMessage::setCommandName(const std::string& command)
{
_commandName = command;
_body.clear();
// IMPORTANT: Command name must be first
if (_collectionName.empty())
{
// Collection is not specified. It is assumed that this particular command does
// not need it.
_body.add(_commandName, Int32(1));
}
else
{
_body.add(_commandName, _collectionName);
}
_body.add("$db", _databaseName);
}
void OpMsgMessage::setCursor(Poco::Int64 cursorID, Poco::Int32 batchSize)
{
_commandName = OpMsgMessage::CMD_GET_MORE;
_body.clear();
// IMPORTANT: Command name must be first
_body.add(_commandName, cursorID);
_body.add("$db", _databaseName);
_body.add("collection", _collectionName);
if (batchSize > 0)
{
_body.add("batchSize", batchSize);
}
}
const std::string& OpMsgMessage::commandName() const
{
return _commandName;
}
void OpMsgMessage::setAcknowledgedRequest(bool ack)
{
const auto& id = commandIdentifier(_commandName);
if (id.empty())
return;
_acknowledged = ack;
auto writeConcern = _body.get<Document::Ptr>("writeConcern", nullptr);
if (writeConcern)
writeConcern->remove("w");
if (ack)
{
_flags = _flags & (~MSG_MORE_TO_COME);
}
else
{
_flags = _flags | MSG_MORE_TO_COME;
if (!writeConcern)
_body.addNewDocument("writeConcern").add("w", 0);
else
writeConcern->add("w", 0);
}
}
bool OpMsgMessage::acknowledgedRequest() const
{
return _acknowledged;
}
UInt32 OpMsgMessage::flags() const
{
return _flags;
}
Document& OpMsgMessage::body()
{
return _body;
}
const Document& OpMsgMessage::body() const
{
return _body;
}
Document::Vector& OpMsgMessage::documents()
{
return _documents;
}
const Document::Vector& OpMsgMessage::documents() const
{
return _documents;
}
bool OpMsgMessage::responseOk() const
{
Poco::Int64 ok {false};
if (_body.exists("ok"))
{
ok = _body.getInteger("ok");
}
return (ok != 0);
}
void OpMsgMessage::clear()
{
_flags = MSG_FLAGS_DEFAULT;
_commandName.clear();
_body.clear();
_documents.clear();
}
void OpMsgMessage::send(std::ostream& ostr)
{
BinaryWriter socketWriter(ostr, BinaryWriter::LITTLE_ENDIAN_BYTE_ORDER);
// Serialise the body
std::stringstream ss;
BinaryWriter writer(ss, BinaryWriter::LITTLE_ENDIAN_BYTE_ORDER);
writer << _flags;
writer << PAYLOAD_TYPE_0;
_body.write(writer);
if (!_documents.empty())
{
// Serialise attached documents
std::stringstream ssdoc;
BinaryWriter wdoc(ssdoc, BinaryWriter::LITTLE_ENDIAN_BYTE_ORDER);
for (auto& doc: _documents)
{
doc->write(wdoc);
}
wdoc.flush();
const std::string& identifier = commandIdentifier(_commandName);
const Poco::Int32 size = static_cast<Poco::Int32>(sizeof(size) + identifier.size() + 1 + ssdoc.tellp());
writer << PAYLOAD_TYPE_1;
writer << size;
writer.writeCString(identifier.c_str());
StreamCopier::copyStream(ssdoc, ss);
}
writer.flush();
#if POCO_MONGODB_DUMP
const std::string section = ss.str();
std::string dump;
Logger::formatDump(dump, section.data(), section.length());
std::cout << dump << std::endl;
#endif
messageLength(static_cast<Poco::Int32>(ss.tellp()));
_header.write(socketWriter);
StreamCopier::copyStream(ss, ostr);
ostr.flush();
}
void OpMsgMessage::read(std::istream& istr)
{
std::string message;
{
BinaryReader reader(istr, BinaryReader::LITTLE_ENDIAN_BYTE_ORDER);
_header.read(reader);
poco_assert_dbg(_header.opCode() == _header.OP_MSG);
const std::streamsize remainingSize {_header.getMessageLength() - _header.MSG_HEADER_SIZE };
message.reserve(remainingSize);
#if POCO_MONGODB_DUMP
std::cout
<< "Message hdr: " << _header.getMessageLength() << " " << remainingSize << " "
<< _header.opCode() << " " << _header.getRequestID() << " " << _header.responseTo()
<< std::endl;
#endif
reader.readRaw(remainingSize, message);
#if POCO_MONGODB_DUMP
std::string dump;
Logger::formatDump(dump, message.data(), message.length());
std::cout << dump << std::endl;
#endif
}
// Read complete message and then interpret it.
std::istringstream msgss(message);
BinaryReader reader(msgss, BinaryReader::LITTLE_ENDIAN_BYTE_ORDER);
Poco::UInt8 payloadType {0xFF};
reader >> _flags;
reader >> payloadType;
poco_assert_dbg(payloadType == PAYLOAD_TYPE_0);
_body.read(reader);
// Read next sections from the buffer
while (msgss.good())
{
// NOTE: Not tested yet with database, because it returns everything in the body.
// Does MongoDB ever return documents as Payload type 1?
reader >> payloadType;
if (!msgss.good())
{
break;
}
poco_assert_dbg(payloadType == PAYLOAD_TYPE_1);
#if POCO_MONGODB_DUMP
std::cout << "section payload: " << payloadType << std::endl;
#endif
Poco::Int32 sectionSize {0};
reader >> sectionSize;
poco_assert_dbg(sectionSize > 0);
#if POCO_MONGODB_DUMP
std::cout << "section size: " << sectionSize << std::endl;
#endif
std::streamoff offset = sectionSize - sizeof(sectionSize);
std::streampos endOfSection = msgss.tellg() + offset;
std::string identifier;
reader.readCString(identifier);
#if POCO_MONGODB_DUMP
std::cout << "section identifier: " << identifier << std::endl;
#endif
// Loop to read documents from this section.
while (msgss.tellg() < endOfSection)
{
#if POCO_MONGODB_DUMP
std::cout << "section doc: " << msgss.tellg() << " " << endOfSection << std::endl;
#endif
Document::Ptr doc = new Document();
doc->read(reader);
_documents.push_back(doc);
if (msgss.tellg() < 0)
{
break;
}
}
}
// Extract documents from the cursor batch if they are there.
MongoDB::Array::Ptr batch;
auto curDoc = _body.get<MongoDB::Document::Ptr>(keyCursor, nullptr);
if (curDoc)
{
batch = curDoc->get<MongoDB::Array::Ptr>(keyFirstBatch, nullptr);
if (!batch)
{
batch = curDoc->get<MongoDB::Array::Ptr>(keyNextBatch, nullptr);
}
}
if (batch)
{
for(std::size_t i = 0; i < batch->size(); i++)
{
const auto& d = batch->get<MongoDB::Document::Ptr>(i, nullptr);
if (d)
{
_documents.push_back(d);
}
}
}
}
const std::string& commandIdentifier(const std::string& command)
{
// Names of identifiers for commands that send bulk documents in the request
// The identifier is set in the section type 1.
static std::map<std::string, std::string> identifiers {
{ OpMsgMessage::CMD_INSERT, "documents" },
{ OpMsgMessage::CMD_DELETE, "deletes" },
{ OpMsgMessage::CMD_UPDATE, "updates" },
// Not sure if create index can send document section
{ OpMsgMessage::CMD_CREATE_INDEXES, "indexes" }
};
const auto i = identifiers.find(command);
if (i != identifiers.end())
{
return i->second;
}
// This likely means that documents are incorrectly set for a command
// that does not send list of documents in section type 1.
static const std::string emptyIdentifier;
return emptyIdentifier;
}
} } // namespace Poco::MongoDB

View File

@ -20,10 +20,10 @@ namespace MongoDB {
QueryRequest::QueryRequest(const std::string& collectionName, QueryRequest::Flags flags):
RequestMessage(MessageHeader::OP_QUERY),
_flags(flags),
RequestMessage(MessageHeader::OP_QUERY),
_flags(flags),
_fullCollectionName(collectionName),
_numberToSkip(0),
_numberToSkip(0),
_numberToReturn(100),
_selector(),
_returnFieldSelector()

View File

@ -25,8 +25,8 @@ RegularExpression::RegularExpression()
}
RegularExpression::RegularExpression(const std::string& pattern, const std::string& options):
_pattern(pattern),
RegularExpression::RegularExpression(const std::string& pattern, const std::string& options):
_pattern(pattern),
_options(options)
{
}

View File

@ -21,7 +21,7 @@ namespace Poco {
namespace MongoDB {
ReplicaSet::ReplicaSet(const std::vector<Net::SocketAddress> &addresses):
ReplicaSet::ReplicaSet(const std::vector<Net::SocketAddress> &addresses):
_addresses(addresses)
{
}
@ -81,8 +81,8 @@ Connection::Ptr ReplicaSet::isMaster(const Net::SocketAddress& address)
{
conn = 0;
}
return 0;
return 0;
}

View File

@ -21,7 +21,7 @@ namespace Poco {
namespace MongoDB {
RequestMessage::RequestMessage(MessageHeader::OpCode opcode):
RequestMessage::RequestMessage(MessageHeader::OpCode opcode):
Message(opcode)
{
}
@ -35,7 +35,7 @@ RequestMessage::~RequestMessage()
void RequestMessage::send(std::ostream& ostr)
{
std::stringstream ss;
BinaryWriter requestWriter(ss);
BinaryWriter requestWriter(ss, BinaryWriter::LITTLE_ENDIAN_BYTE_ORDER);
buildRequest(requestWriter);
requestWriter.flush();

View File

@ -21,10 +21,20 @@ namespace MongoDB {
ResponseMessage::ResponseMessage():
Message(MessageHeader::OP_REPLY),
_responseFlags(0),
_cursorID(0),
_startingFrom(0),
Message(MessageHeader::OP_REPLY),
_responseFlags(0),
_cursorID(0),
_startingFrom(0),
_numberReturned(0)
{
}
ResponseMessage::ResponseMessage(const Int64& cursorID):
Message(MessageHeader::OP_REPLY),
_responseFlags(0),
_cursorID(cursorID),
_startingFrom(0),
_numberReturned(0)
{
}
@ -50,7 +60,7 @@ void ResponseMessage::read(std::istream& istr)
clear();
BinaryReader reader(istr, BinaryReader::LITTLE_ENDIAN_BYTE_ORDER);
_header.read(reader);
reader >> _responseFlags;

View File

@ -20,7 +20,7 @@ namespace MongoDB {
UpdateRequest::UpdateRequest(const std::string& collectionName, UpdateRequest::Flags flags):
RequestMessage(MessageHeader::OP_UPDATE),
RequestMessage(MessageHeader::OP_UPDATE),
_flags(flags),
_fullCollectionName(collectionName),
_selector(),

View File

@ -2,11 +2,11 @@
# NOTE: has nothing common with DBMS_TCP_PROTOCOL_VERSION,
# only DBMS_TCP_PROTOCOL_VERSION should be incremented on protocol changes.
SET(VERSION_REVISION 54474)
SET(VERSION_REVISION 54475)
SET(VERSION_MAJOR 23)
SET(VERSION_MINOR 5)
SET(VERSION_MINOR 6)
SET(VERSION_PATCH 1)
SET(VERSION_GITHASH 3920eb987f7ed837ada5de8907284adf123f0583)
SET(VERSION_DESCRIBE v23.5.1.1-testing)
SET(VERSION_STRING 23.5.1.1)
SET(VERSION_GITHASH 2fec796e73efda10a538a03af3205ce8ffa1b2de)
SET(VERSION_DESCRIBE v23.6.1.1-testing)
SET(VERSION_STRING 23.6.1.1)
# end of autochange

2
contrib/NuRaft vendored

@ -1 +1 @@
Subproject commit b56784be1aec568fb72aff47f281097c017623cb
Subproject commit 491eaf592d950e0e37accbe8b3f217e068c9fecf

View File

@ -1,6 +1,6 @@
option (ENABLE_AZURE_BLOB_STORAGE "Enable Azure blob storage" ${ENABLE_LIBRARIES})
if (NOT ENABLE_AZURE_BLOB_STORAGE OR BUILD_STANDALONE_KEEPER OR OS_FREEBSD OR ARCH_PPC64LE)
if (NOT ENABLE_AZURE_BLOB_STORAGE OR BUILD_STANDALONE_KEEPER OR OS_FREEBSD OR (NOT ARCH_AMD64))
message(STATUS "Not using Azure blob storage")
return()
endif()

2
contrib/capnproto vendored

@ -1 +1 @@
Subproject commit dc8b50b999777bcb23c89bb5907c785c3f654441
Subproject commit 976209a6d18074804f60d18ef99b6a809d27dadf

2
contrib/lz4 vendored

@ -1 +1 @@
Subproject commit 4c9431e9af596af0556e5da0ae99305bafb2b10b
Subproject commit e82198428c8061372d5adef1f9bfff4203f6081e

View File

@ -32,7 +32,7 @@ RUN arch=${TARGETARCH:-amd64} \
esac
ARG REPOSITORY="https://s3.amazonaws.com/clickhouse-builds/22.4/31c367d3cd3aefd316778601ff6565119fe36682/package_release"
ARG VERSION="23.4.2.11"
ARG VERSION="23.5.2.7"
ARG PACKAGES="clickhouse-keeper"
# user/group precreated explicitly with fixed uid/gid on purpose.

View File

@ -33,7 +33,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="23.4.2.11"
ARG VERSION="23.5.2.7"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
# user/group precreated explicitly with fixed uid/gid on purpose.

View File

@ -22,7 +22,7 @@ RUN sed -i "s|http://archive.ubuntu.com|${apt_archive}|g" /etc/apt/sources.list
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="deb https://packages.clickhouse.com/deb ${REPO_CHANNEL} main"
ARG VERSION="23.4.2.11"
ARG VERSION="23.5.2.7"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
# set non-empty deb_location_url url to create a docker image

View File

@ -16,6 +16,11 @@ For more information and documentation see https://clickhouse.com/.
- The tag `head` is built from the latest commit to the default branch.
- Each tag has optional `-alpine` suffix to reflect that it's built on top of `alpine`.
### Compatibility
- The amd64 image requires support for [SSE3 instructions](https://en.wikipedia.org/wiki/SSE3). Virtually all x86 CPUs after 2005 support SSE3.
- The arm64 image requires support for the [ARMv8.2-A architecture](https://en.wikipedia.org/wiki/AArch64#ARMv8.2-A). Most ARM CPUs after 2017 support ARMv8.2-A. A notable exception is Raspberry Pi 4 from 2019 whose CPU only supports ARMv8.0-A.
## How to use this image
### start server instance

View File

@ -15,6 +15,9 @@ dpkg -i package_folder/clickhouse-client_*.deb
ln -s /usr/share/clickhouse-test/clickhouse-test /usr/bin/clickhouse-test
# shellcheck disable=SC1091
source /usr/share/clickhouse-test/ci/attach_gdb.lib || true # FIXME: to not break old builds, clean on 2023-09-01
# install test configs
/usr/share/clickhouse-test/config/install.sh
@ -85,6 +88,8 @@ fi
sleep 5
attach_gdb_to_clickhouse || true # FIXME: to not break old builds, clean on 2023-09-01
function run_tests()
{
set -x

View File

@ -59,12 +59,6 @@ install_packages previous_release_package_folder
# available for dump via clickhouse-local
configure
# local_blob_storage disk type does not exist in older versions
sudo cat /etc/clickhouse-server/config.d/storage_conf.xml \
| sed "s|<type>local_blob_storage</type>|<type>local</type>|" \
> /etc/clickhouse-server/config.d/storage_conf.xml.tmp
sudo mv /etc/clickhouse-server/config.d/storage_conf.xml.tmp /etc/clickhouse-server/config.d/storage_conf.xml
# it contains some new settings, but we can safely remove it
rm /etc/clickhouse-server/config.d/merge_tree.xml
@ -92,11 +86,6 @@ export USE_S3_STORAGE_FOR_MERGE_TREE=1
export ZOOKEEPER_FAULT_INJECTION=0
configure
sudo cat /etc/clickhouse-server/config.d/storage_conf.xml \
| sed "s|<type>local_blob_storage</type>|<type>local</type>|" \
> /etc/clickhouse-server/config.d/storage_conf.xml.tmp
sudo mv /etc/clickhouse-server/config.d/storage_conf.xml.tmp /etc/clickhouse-server/config.d/storage_conf.xml
# it contains some new settings, but we can safely remove it
rm /etc/clickhouse-server/config.d/merge_tree.xml

View File

@ -0,0 +1,32 @@
---
sidebar_position: 1
sidebar_label: 2023
---
# 2023 Changelog
### ClickHouse release v22.8.18.31-lts (4de7a95a544) FIXME as compared to v22.8.17.17-lts (df7f2ef0b41)
#### Performance Improvement
* Backported in [#49214](https://github.com/ClickHouse/ClickHouse/issues/49214): Fixed excessive reading in queries with `FINAL`. [#47801](https://github.com/ClickHouse/ClickHouse/pull/47801) ([Nikita Taranov](https://github.com/nickitat)).
#### Build/Testing/Packaging Improvement
* Backported in [#49079](https://github.com/ClickHouse/ClickHouse/issues/49079): Update time zones. The following were updated: Africa/Cairo, Africa/Casablanca, Africa/El_Aaiun, America/Bogota, America/Cambridge_Bay, America/Ciudad_Juarez, America/Godthab, America/Inuvik, America/Iqaluit, America/Nuuk, America/Ojinaga, America/Pangnirtung, America/Rankin_Inlet, America/Resolute, America/Whitehorse, America/Yellowknife, Asia/Gaza, Asia/Hebron, Asia/Kuala_Lumpur, Asia/Singapore, Canada/Yukon, Egypt, Europe/Kirov, Europe/Volgograd, Singapore. [#48572](https://github.com/ClickHouse/ClickHouse/pull/48572) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix bad cast from LowCardinality column when using short circuit function execution [#43311](https://github.com/ClickHouse/ClickHouse/pull/43311) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix msan issue in randomStringUTF8(<uneven number>) [#49750](https://github.com/ClickHouse/ClickHouse/pull/49750) ([Robert Schulze](https://github.com/rschu1ze)).
* JIT compilation not equals NaN fix [#50056](https://github.com/ClickHouse/ClickHouse/pull/50056) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix crash with `multiIf` and constant condition and nullable arguments [#50123](https://github.com/ClickHouse/ClickHouse/pull/50123) ([Anton Popov](https://github.com/CurtizJ)).
* Fixed type conversion from Date/Date32 to DateTime64 when querying with DateTime64 index [#50280](https://github.com/ClickHouse/ClickHouse/pull/50280) ([Lucas Chang](https://github.com/lucas-tubi)).
* Fix Keeper deadlock on exception when preprocessing requests. [#50387](https://github.com/ClickHouse/ClickHouse/pull/50387) ([frinkr](https://github.com/frinkr)).
* Fix Log family table return wrong rows count after truncate [#50585](https://github.com/ClickHouse/ClickHouse/pull/50585) ([flynn](https://github.com/ucasfl)).
* Do not read all the columns from right GLOBAL JOIN table. [#50721](https://github.com/ClickHouse/ClickHouse/pull/50721) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Improve test reports [#49151](https://github.com/ClickHouse/ClickHouse/pull/49151) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Update github.com/distribution/distribution [#50114](https://github.com/ClickHouse/ClickHouse/pull/50114) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Catch issues with dockerd during the build [#50700](https://github.com/ClickHouse/ClickHouse/pull/50700) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).

View File

@ -0,0 +1,35 @@
---
sidebar_position: 1
sidebar_label: 2023
---
# 2023 Changelog
### ClickHouse release v23.2.7.32-stable (934f6a2aa0e) FIXME as compared to v23.2.6.34-stable (570190045b0)
#### Performance Improvement
* Backported in [#49218](https://github.com/ClickHouse/ClickHouse/issues/49218): Fixed excessive reading in queries with `FINAL`. [#47801](https://github.com/ClickHouse/ClickHouse/pull/47801) ([Nikita Taranov](https://github.com/nickitat)).
#### Build/Testing/Packaging Improvement
* Backported in [#49208](https://github.com/ClickHouse/ClickHouse/issues/49208): Fix glibc compatibility check: replace `preadv` from musl. [#49144](https://github.com/ClickHouse/ClickHouse/pull/49144) ([alesapin](https://github.com/alesapin)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix key not found error for queries with multiple StorageJoin [#49137](https://github.com/ClickHouse/ClickHouse/pull/49137) ([vdimir](https://github.com/vdimir)).
* Fix race on Outdated parts loading [#49223](https://github.com/ClickHouse/ClickHouse/pull/49223) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix bug in DISTINCT [#49628](https://github.com/ClickHouse/ClickHouse/pull/49628) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix msan issue in randomStringUTF8(<uneven number>) [#49750](https://github.com/ClickHouse/ClickHouse/pull/49750) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix IPv6 encoding in protobuf [#49933](https://github.com/ClickHouse/ClickHouse/pull/49933) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Avoid deadlock when starting table in attach thread of `ReplicatedMergeTree` [#50026](https://github.com/ClickHouse/ClickHouse/pull/50026) ([Antonio Andelic](https://github.com/antonio2368)).
* JIT compilation not equals NaN fix [#50056](https://github.com/ClickHouse/ClickHouse/pull/50056) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix crash with `multiIf` and constant condition and nullable arguments [#50123](https://github.com/ClickHouse/ClickHouse/pull/50123) ([Anton Popov](https://github.com/CurtizJ)).
* Fix Keeper deadlock on exception when preprocessing requests. [#50387](https://github.com/ClickHouse/ClickHouse/pull/50387) ([frinkr](https://github.com/frinkr)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Improve test reports [#49151](https://github.com/ClickHouse/ClickHouse/pull/49151) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fallback auth gh api [#49314](https://github.com/ClickHouse/ClickHouse/pull/49314) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Improve CI: status commit, auth for get_gh_api [#49388](https://github.com/ClickHouse/ClickHouse/pull/49388) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Update github.com/distribution/distribution [#50114](https://github.com/ClickHouse/ClickHouse/pull/50114) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Catch issues with dockerd during the build [#50700](https://github.com/ClickHouse/ClickHouse/pull/50700) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).

View File

@ -0,0 +1,45 @@
---
sidebar_position: 1
sidebar_label: 2023
---
# 2023 Changelog
### ClickHouse release v23.3.3.52-lts (cb963c474db) FIXME as compared to v23.3.2.37-lts (1b144bcd101)
#### Improvement
* Backported in [#49954](https://github.com/ClickHouse/ClickHouse/issues/49954): Add support for (an unusual) case where the arguments in the `IN` operator are single-element tuples. [#49844](https://github.com/ClickHouse/ClickHouse/pull/49844) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
#### Build/Testing/Packaging Improvement
* Backported in [#49210](https://github.com/ClickHouse/ClickHouse/issues/49210): Fix glibc compatibility check: replace `preadv` from musl. [#49144](https://github.com/ClickHouse/ClickHouse/pull/49144) ([alesapin](https://github.com/alesapin)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix aggregate empty string error [#48999](https://github.com/ClickHouse/ClickHouse/pull/48999) ([LiuNeng](https://github.com/liuneng1994)).
* Fix key not found error for queries with multiple StorageJoin [#49137](https://github.com/ClickHouse/ClickHouse/pull/49137) ([vdimir](https://github.com/vdimir)).
* Fix race on Outdated parts loading [#49223](https://github.com/ClickHouse/ClickHouse/pull/49223) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix bug in DISTINCT [#49628](https://github.com/ClickHouse/ClickHouse/pull/49628) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix msan issue in randomStringUTF8(<uneven number>) [#49750](https://github.com/ClickHouse/ClickHouse/pull/49750) ([Robert Schulze](https://github.com/rschu1ze)).
* fix `is_prefix` in OptimizeRegularExpression [#49919](https://github.com/ClickHouse/ClickHouse/pull/49919) ([Han Fei](https://github.com/hanfei1991)).
* Fix IPv6 encoding in protobuf [#49933](https://github.com/ClickHouse/ClickHouse/pull/49933) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Avoid deadlock when starting table in attach thread of `ReplicatedMergeTree` [#50026](https://github.com/ClickHouse/ClickHouse/pull/50026) ([Antonio Andelic](https://github.com/antonio2368)).
* JIT compilation not equals NaN fix [#50056](https://github.com/ClickHouse/ClickHouse/pull/50056) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix crash with `multiIf` and constant condition and nullable arguments [#50123](https://github.com/ClickHouse/ClickHouse/pull/50123) ([Anton Popov](https://github.com/CurtizJ)).
* Fix reconnecting of HTTPS session when target host IP was changed [#50240](https://github.com/ClickHouse/ClickHouse/pull/50240) ([Aleksei Filatov](https://github.com/aalexfvk)).
* Fixed type conversion from Date/Date32 to DateTime64 when querying with DateTime64 index [#50280](https://github.com/ClickHouse/ClickHouse/pull/50280) ([Lucas Chang](https://github.com/lucas-tubi)).
* Fix Keeper deadlock on exception when preprocessing requests. [#50387](https://github.com/ClickHouse/ClickHouse/pull/50387) ([frinkr](https://github.com/frinkr)).
* Fix incorrect constant folding [#50536](https://github.com/ClickHouse/ClickHouse/pull/50536) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix Log family table return wrong rows count after truncate [#50585](https://github.com/ClickHouse/ClickHouse/pull/50585) ([flynn](https://github.com/ucasfl)).
* Fix bug in `uniqExact` parallel merging [#50590](https://github.com/ClickHouse/ClickHouse/pull/50590) ([Nikita Taranov](https://github.com/nickitat)).
* Do not read all the columns from right GLOBAL JOIN table. [#50721](https://github.com/ClickHouse/ClickHouse/pull/50721) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Implement status comment [#48468](https://github.com/ClickHouse/ClickHouse/pull/48468) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Update curl to 8.0.1 (for CVEs) [#48765](https://github.com/ClickHouse/ClickHouse/pull/48765) ([Boris Kuschel](https://github.com/bkuschel)).
* Improve test reports [#49151](https://github.com/ClickHouse/ClickHouse/pull/49151) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fallback auth gh api [#49314](https://github.com/ClickHouse/ClickHouse/pull/49314) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Improve CI: status commit, auth for get_gh_api [#49388](https://github.com/ClickHouse/ClickHouse/pull/49388) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Update github.com/distribution/distribution [#50114](https://github.com/ClickHouse/ClickHouse/pull/50114) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Catch issues with dockerd during the build [#50700](https://github.com/ClickHouse/ClickHouse/pull/50700) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).

View File

@ -0,0 +1,42 @@
---
sidebar_position: 1
sidebar_label: 2023
---
# 2023 Changelog
### ClickHouse release v23.4.3.48-stable (d9199f8d3cc) FIXME as compared to v23.4.2.11-stable (b6442320f9d)
#### Backward Incompatible Change
* Backported in [#49981](https://github.com/ClickHouse/ClickHouse/issues/49981): Revert "`groupArray` returns cannot be nullable" (due to binary compatibility breakage for `groupArray`/`groupArrayLast`/`groupArraySample` over `Nullable` types, which likely will lead to `TOO_LARGE_ARRAY_SIZE` or `CANNOT_READ_ALL_DATA`). [#49971](https://github.com/ClickHouse/ClickHouse/pull/49971) ([Azat Khuzhin](https://github.com/azat)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix key not found error for queries with multiple StorageJoin [#49137](https://github.com/ClickHouse/ClickHouse/pull/49137) ([vdimir](https://github.com/vdimir)).
* Fix fuzz bug when subquery set is not built when reading from remote() [#49425](https://github.com/ClickHouse/ClickHouse/pull/49425) ([Alexander Gololobov](https://github.com/davenger)).
* Fix postgres database setting [#49481](https://github.com/ClickHouse/ClickHouse/pull/49481) ([Mal Curtis](https://github.com/snikch)).
* Fix AsynchronousReadIndirectBufferFromRemoteFS breaking on short seeks [#49525](https://github.com/ClickHouse/ClickHouse/pull/49525) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix bug in DISTINCT [#49628](https://github.com/ClickHouse/ClickHouse/pull/49628) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix assert in SpanHolder::finish() with fibers [#49673](https://github.com/ClickHouse/ClickHouse/pull/49673) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix msan issue in randomStringUTF8(<uneven number>) [#49750](https://github.com/ClickHouse/ClickHouse/pull/49750) ([Robert Schulze](https://github.com/rschu1ze)).
* fix `is_prefix` in OptimizeRegularExpression [#49919](https://github.com/ClickHouse/ClickHouse/pull/49919) ([Han Fei](https://github.com/hanfei1991)).
* Fix IPv6 encoding in protobuf [#49933](https://github.com/ClickHouse/ClickHouse/pull/49933) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Avoid deadlock when starting table in attach thread of `ReplicatedMergeTree` [#50026](https://github.com/ClickHouse/ClickHouse/pull/50026) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix assert in SpanHolder::finish() with fibers attempt 2 [#50034](https://github.com/ClickHouse/ClickHouse/pull/50034) ([Kruglov Pavel](https://github.com/Avogar)).
* JIT compilation not equals NaN fix [#50056](https://github.com/ClickHouse/ClickHouse/pull/50056) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix crashing in case of Replicated database without arguments [#50058](https://github.com/ClickHouse/ClickHouse/pull/50058) ([Azat Khuzhin](https://github.com/azat)).
* Fix crash with `multiIf` and constant condition and nullable arguments [#50123](https://github.com/ClickHouse/ClickHouse/pull/50123) ([Anton Popov](https://github.com/CurtizJ)).
* Fix iceberg metadata parsing [#50232](https://github.com/ClickHouse/ClickHouse/pull/50232) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix bugs in Poco sockets in non-blocking mode, use true non-blocking sockets [#50252](https://github.com/ClickHouse/ClickHouse/pull/50252) ([Kruglov Pavel](https://github.com/Avogar)).
* Fixed type conversion from Date/Date32 to DateTime64 when querying with DateTime64 index [#50280](https://github.com/ClickHouse/ClickHouse/pull/50280) ([Lucas Chang](https://github.com/lucas-tubi)).
* Fix Keeper deadlock on exception when preprocessing requests. [#50387](https://github.com/ClickHouse/ClickHouse/pull/50387) ([frinkr](https://github.com/frinkr)).
* Fix Log family table return wrong rows count after truncate [#50585](https://github.com/ClickHouse/ClickHouse/pull/50585) ([flynn](https://github.com/ucasfl)).
* Fix bug in `uniqExact` parallel merging [#50590](https://github.com/ClickHouse/ClickHouse/pull/50590) ([Nikita Taranov](https://github.com/nickitat)).
* Do not read all the columns from right GLOBAL JOIN table. [#50721](https://github.com/ClickHouse/ClickHouse/pull/50721) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Improve CI: status commit, auth for get_gh_api [#49388](https://github.com/ClickHouse/ClickHouse/pull/49388) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Update github.com/distribution/distribution [#50114](https://github.com/ClickHouse/ClickHouse/pull/50114) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Catch issues with dockerd during the build [#50700](https://github.com/ClickHouse/ClickHouse/pull/50700) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).

View File

@ -0,0 +1,599 @@
---
sidebar_position: 1
sidebar_label: 2023
---
# 2023 Changelog
### ClickHouse release v23.5.1.3174-stable (2fec796e73e) FIXME as compared to v23.4.1.1943-stable (3920eb987f7)
#### Backward Incompatible Change
* Make local object storage work consistently with s3 object storage, fix problem with append (closes [#48465](https://github.com/ClickHouse/ClickHouse/issues/48465)), make it configurable as independent storage. The change is backward incompatible because cache on top of local object storage is not incompatible to previous versions. [#48791](https://github.com/ClickHouse/ClickHouse/pull/48791) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Date_trunc function to always return datetime type. [#48851](https://github.com/ClickHouse/ClickHouse/pull/48851) ([Shane Andrade](https://github.com/mauidude)).
* Remove the experimental feature "in-memory data parts". The data format is still supported, but the settings are no-op, and compact or wide parts will be used instead. This closes [#45409](https://github.com/ClickHouse/ClickHouse/issues/45409). [#49429](https://github.com/ClickHouse/ClickHouse/pull/49429) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Changed default values of settings parallelize_output_from_storages and input_format_parquet_preserve_order. This allows ClickHouse to reorder rows when reading from files (e.g. CSV or Parquet), greatly improving performance in many cases. To restore the old behavior of preserving order, use `parallelize_output_from_storages = 0`, `input_format_parquet_preserve_order = 1`. [#49479](https://github.com/ClickHouse/ClickHouse/pull/49479) ([Michael Kolupaev](https://github.com/al13n321)).
* Make projections production-ready. Add the `optimize_use_projections` setting to control whether the projections will be selected for SELECT queries. The setting `allow_experimental_projection_optimization` is obsolete and does nothing. [#49719](https://github.com/ClickHouse/ClickHouse/pull/49719) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Mark joinGet() as non deterministic (so as dictGet). [#49843](https://github.com/ClickHouse/ClickHouse/pull/49843) ([Azat Khuzhin](https://github.com/azat)).
* Revert "`groupArray` returns cannot be nullable" (due to binary compatibility breakage for `groupArray`/`groupArrayLast`/`groupArraySample` over `Nullable` types, which likely will lead to `TOO_LARGE_ARRAY_SIZE` or `CANNOT_READ_ALL_DATA`). [#49971](https://github.com/ClickHouse/ClickHouse/pull/49971) ([Azat Khuzhin](https://github.com/azat)).
#### New Feature
* Password type in queries like `CREATE USER u IDENTIFIED BY 'p'` will be automatically set according to the setting `default_password_type` in the `config.xml` on the server. Closes [#42915](https://github.com/ClickHouse/ClickHouse/issues/42915). [#44674](https://github.com/ClickHouse/ClickHouse/pull/44674) ([Nikolay Degterinsky](https://github.com/evillique)).
* Add bcrypt password authentication type. Closes [#34599](https://github.com/ClickHouse/ClickHouse/issues/34599). [#44905](https://github.com/ClickHouse/ClickHouse/pull/44905) ([Nikolay Degterinsky](https://github.com/evillique)).
* Added `system.zookeeper_connection` table that shows information about ZooKeeper connections. [#45245](https://github.com/ClickHouse/ClickHouse/pull/45245) ([mateng915](https://github.com/mateng0915)).
* Add urlCluster table function. Refactor all *Cluster table functions to reduce code duplication. Make schema inference work for all possible *Cluster function signatures and for named collections. Closes [#38499](https://github.com/ClickHouse/ClickHouse/issues/38499). [#45427](https://github.com/ClickHouse/ClickHouse/pull/45427) ([attack204](https://github.com/attack204)).
* Extend `first_value` and `last_value` to accept null. [#46467](https://github.com/ClickHouse/ClickHouse/pull/46467) ([lgbo](https://github.com/lgbo-ustc)).
* Add server and format settings `display_secrets_in_show_and_select` for displaying secrets of tables, databases, table functions, and dictionaries. Add privilege `displaySecretsInShowAndSelect` controlling which users can view secrets. [#46528](https://github.com/ClickHouse/ClickHouse/pull/46528) ([Mike Kot](https://github.com/myrrc)).
* Add new function `generateRandomStructure` that generates random table structure. It can be used in combination with table function `generateRandom`. [#47409](https://github.com/ClickHouse/ClickHouse/pull/47409) ([Kruglov Pavel](https://github.com/Avogar)).
* Added native ClickHouse Keeper CLI Client. [#47414](https://github.com/ClickHouse/ClickHouse/pull/47414) ([pufit](https://github.com/pufit)).
* The query cache can now be used for production workloads. [#47977](https://github.com/ClickHouse/ClickHouse/pull/47977) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix a bug that prevented the use of `CASE` without an `ELSE` branch and extended `transform` to deal with more types. Also fix some bugs that made transform() return incorrect results when decimal types were mixed with other numeric types. [#48300](https://github.com/ClickHouse/ClickHouse/pull/48300) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Added [server-side encryption using KMS keys](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html) with S3 tables, and the `header` setting with S3 disks. Closes [#48723](https://github.com/ClickHouse/ClickHouse/issues/48723). [#48724](https://github.com/ClickHouse/ClickHouse/pull/48724) ([Johann Gan](https://github.com/johanngan)).
* Add MemoryTracker for the background tasks (merges and mutation). Introduces `merges_mutations_memory_usage_soft_limit` and `merges_mutations_memory_usage_to_ram_ratio` settings that represent the soft memory limit for merges and mutations. If this limit is reached ClickHouse won't schedule new merge or mutation tasks. Also `MergesMutationsMemoryTracking` metric is introduced to allow observing current memory usage of background tasks. Resubmit [#46089](https://github.com/ClickHouse/ClickHouse/issues/46089). Closes [#48774](https://github.com/ClickHouse/ClickHouse/issues/48774). [#48787](https://github.com/ClickHouse/ClickHouse/pull/48787) ([Dmitry Novik](https://github.com/novikd)).
* Function `dotProduct` work for array. [#49050](https://github.com/ClickHouse/ClickHouse/pull/49050) ([FFFFFFFHHHHHHH](https://github.com/FFFFFFFHHHHHHH)).
* Support statement `SHOW INDEX` to improve compatibility with MySQL. [#49158](https://github.com/ClickHouse/ClickHouse/pull/49158) ([Robert Schulze](https://github.com/rschu1ze)).
* Add virtual column `_file` and `_path` support to table function `url`. - Impove error message for table function `url`. - resolves [#49231](https://github.com/ClickHouse/ClickHouse/issues/49231) - resolves [#49232](https://github.com/ClickHouse/ClickHouse/issues/49232). [#49356](https://github.com/ClickHouse/ClickHouse/pull/49356) ([Ziyi Tan](https://github.com/Ziy1-Tan)).
* Adding the `grants` field in the users.xml file, which allows specifying grants for users. [#49381](https://github.com/ClickHouse/ClickHouse/pull/49381) ([pufit](https://github.com/pufit)).
* Add alias `str_to_map` and `mapfromstring` for `extractkeyvaluepairs`. closes [#47185](https://github.com/ClickHouse/ClickHouse/issues/47185). [#49466](https://github.com/ClickHouse/ClickHouse/pull/49466) ([flynn](https://github.com/ucasfl)).
* Support full/right join by using grace hash join algorithm. [#49483](https://github.com/ClickHouse/ClickHouse/pull/49483) ([lgbo](https://github.com/lgbo-ustc)).
* `WITH FILL` modifier groups filling by sorting prefix. Controlled by `use_with_fill_by_sorting_prefix` setting (enabled by default). Related to [#33203](https://github.com/ClickHouse/ClickHouse/issues/33203)#issuecomment-1418736794. [#49503](https://github.com/ClickHouse/ClickHouse/pull/49503) ([Igor Nikonov](https://github.com/devcrafter)).
* Add SQL functions for entropy-learned hashing. [#49656](https://github.com/ClickHouse/ClickHouse/pull/49656) ([Robert Schulze](https://github.com/rschu1ze)).
* Clickhouse-client now accepts queries after "--multiquery" when "--query" (or "-q") is absent. example: clickhouse-client --multiquery "select 1; select 2;". [#49870](https://github.com/ClickHouse/ClickHouse/pull/49870) ([Alexey Gerasimchuck](https://github.com/Demilivor)).
* Add separate `handshake_timeout` for receiving Hello packet from replica. Closes [#48854](https://github.com/ClickHouse/ClickHouse/issues/48854). [#49948](https://github.com/ClickHouse/ClickHouse/pull/49948) ([Kruglov Pavel](https://github.com/Avogar)).
* New setting s3_max_inflight_parts_for_one_file sets the limit of concurrently loaded parts with multipart upload request in scope of one file. [#49961](https://github.com/ClickHouse/ClickHouse/pull/49961) ([Sema Checherinda](https://github.com/CheSema)).
* Geographical data types (`Point`, `Ring`, `Polygon`, and `MultiPolygon`) are production-ready. [#50022](https://github.com/ClickHouse/ClickHouse/pull/50022) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Added a function "space()" which repeats a space as many times as specified. [#50103](https://github.com/ClickHouse/ClickHouse/pull/50103) ([Robert Schulze](https://github.com/rschu1ze)).
* Added --input_format_csv_trim_whitespaces option. [#50215](https://github.com/ClickHouse/ClickHouse/pull/50215) ([Alexey Gerasimchuck](https://github.com/Demilivor)).
* Added the dictGetAll function for regexp tree dictionaries to return values from multiple matches as arrays. Closes [#50254](https://github.com/ClickHouse/ClickHouse/issues/50254). [#50255](https://github.com/ClickHouse/ClickHouse/pull/50255) ([Johann Gan](https://github.com/johanngan)).
* Added toLastDayOfWeek() function to round a date or a date with time up to the nearest Saturday or Sunday. [#50315](https://github.com/ClickHouse/ClickHouse/pull/50315) ([Victor Krasnov](https://github.com/sirvickr)).
* Ability to ignore a skip index by specifying `ignore_data_skipping_indices`. [#50329](https://github.com/ClickHouse/ClickHouse/pull/50329) ([Boris Kuschel](https://github.com/bkuschel)).
* Revert 'Add SQL functions for entropy-learned hashing'. [#50416](https://github.com/ClickHouse/ClickHouse/pull/50416) ([Robert Schulze](https://github.com/rschu1ze)).
* Add `system.user_processes` table and `SHOW USER PROCESSES` query to show memory info and ProfileEvents on user level. [#50492](https://github.com/ClickHouse/ClickHouse/pull/50492) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Added storage engine `AzureBlobStorage` and `azure_blob_storage` table function. The supported set of features is very similar to storage/table function `S3`. Implements [#19307](https://github.com/ClickHouse/ClickHouse/issues/19307). [#50604](https://github.com/ClickHouse/ClickHouse/pull/50604) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
#### Performance Improvement
* Compress marks and primary key by default. It significantly reduces the cold query time. Upgrade notes: the support for compressed marks and primary key has been added in version 22.9. If you turned on compressed marks or primary key or installed version 23.5 or newer, which has compressed marks or primary key on by default, you will not be able to downgrade to version 22.8 or earlier. You can also explicitly disable compressed marks or primary keys by specifying the `compress_marks` and `compress_primary_key` settings in the `<merge_tree>` section of the server configuration file. **Upgrade notes:** If you upgrade from versions prior to 22.9, you should either upgrade all replicas at once or disable the compression before upgrade, or upgrade through an intermediate version, where the compressed marks are supported but not enabled by default, such as 23.3. [#42587](https://github.com/ClickHouse/ClickHouse/pull/42587) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* When reading from multiple files reduce parallel parsing threads for each file resolves [#42192](https://github.com/ClickHouse/ClickHouse/issues/42192). [#46661](https://github.com/ClickHouse/ClickHouse/pull/46661) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Do not store blocks in `ANY` hash join if nothing is inserted. [#48633](https://github.com/ClickHouse/ClickHouse/pull/48633) ([vdimir](https://github.com/vdimir)).
* Fixes aggregate combinator `-If` when JIT compiled. Closes [#48120](https://github.com/ClickHouse/ClickHouse/issues/48120). [#49083](https://github.com/ClickHouse/ClickHouse/pull/49083) ([Igor Nikonov](https://github.com/devcrafter)).
* For reading from remote tables we use smaller tasks (instead of reading the whole part) to make tasks stealing work * task size is determined by size of columns to read * always use 1mb buffers for reading from s3 * boundaries of cache segments aligned to 1mb so they have decent size even with small tasks. it also should prevent fragmentation. [#49287](https://github.com/ClickHouse/ClickHouse/pull/49287) ([Nikita Taranov](https://github.com/nickitat)).
* Default size of a read buffer for reading from local filesystem changed to a slightly better value. Also two new settings are introduced: `max_read_buffer_size_local_fs` and `max_read_buffer_size_remote_fs`. [#49321](https://github.com/ClickHouse/ClickHouse/pull/49321) ([Nikita Taranov](https://github.com/nickitat)).
* Improve memory usage and speed of `SPARSE_HASHED`/`HASHED` dictionaries (e.g. `SPARSE_HASHED` now eats 2.6x less memory, and is ~2x faster). [#49380](https://github.com/ClickHouse/ClickHouse/pull/49380) ([Azat Khuzhin](https://github.com/azat)).
* Use aggregate projection only if it reads fewer granules than normal reading. It should help in case if query hits the PK of the table, but not the projection. Fixes [#49150](https://github.com/ClickHouse/ClickHouse/issues/49150). [#49417](https://github.com/ClickHouse/ClickHouse/pull/49417) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Optimize PODArray::resize_fill() callers. [#49459](https://github.com/ClickHouse/ClickHouse/pull/49459) ([Azat Khuzhin](https://github.com/azat)).
* Optimize the system.query_log and system.query_thread_log tables by applying LowCardinality when appropriate. The queries over these tables will be faster. [#49530](https://github.com/ClickHouse/ClickHouse/pull/49530) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Better performance when reading local Parquet files (through parallel reading). [#49539](https://github.com/ClickHouse/ClickHouse/pull/49539) ([Michael Kolupaev](https://github.com/al13n321)).
* Improve the performance of `RIGHT/FULL JOIN` by up to 2 times in certain scenarios, especially when joining a small left table with a large right table. [#49585](https://github.com/ClickHouse/ClickHouse/pull/49585) ([lgbo](https://github.com/lgbo-ustc)).
* Improve performance of BLAKE3 by 11% by enabling LTO for Rust. [#49600](https://github.com/ClickHouse/ClickHouse/pull/49600) ([Azat Khuzhin](https://github.com/azat)).
* Optimize the structure of the `system.opentelemetry_span_log`. Use `LowCardinality` where appropriate. Although this table is generally stupid (it is using the Map data type even for common attributes), it will be slightly better. [#49647](https://github.com/ClickHouse/ClickHouse/pull/49647) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Try to reserve hash table's size in `grace_hash` join. [#49816](https://github.com/ClickHouse/ClickHouse/pull/49816) ([lgbo](https://github.com/lgbo-ustc)).
* As is addresed in issue [#49748](https://github.com/ClickHouse/ClickHouse/issues/49748), the predicates with date converters, such as **toYear, toYYYYMM**, could be rewritten with the equivalent date (YYYY-MM-DD) comparisons at the AST level. And this transformation could bring performance improvement as it is free from the expensive date converter and the comparison between dates (or integers in the low level representation) is quite low-cost. The [prototype](https://github.com/ZhiguoZh/ClickHouse/commit/c7f1753f0c9363a19d95fa46f1cfed1d9f505ee0) shows that, with all identified date converters optimized, the overall QPS of the 13 queries is enhanced by **~11%** on the ICX server (Intel Xeon Platinum 8380 CPU, 80 cores, 160 threads). [#50062](https://github.com/ClickHouse/ClickHouse/pull/50062) ([Zhiguo Zhou](https://github.com/ZhiguoZh)).
* Parallel merge of `uniqExactIf` states. Closes [#49885](https://github.com/ClickHouse/ClickHouse/issues/49885). [#50285](https://github.com/ClickHouse/ClickHouse/pull/50285) ([flynn](https://github.com/ucasfl)).
* As is addresed in issue [#49748](https://github.com/ClickHouse/ClickHouse/issues/49748), the predicates with date converters, such as toYear, toYYYYMM, could be rewritten with the equivalent date (YYYY-MM-DD) comparisons at the AST level. And this transformation could bring performance improvement as it is free from the expensive date converter and the comparison between dates (or integers in the low level representation) is quite low-cost. [#50307](https://github.com/ClickHouse/ClickHouse/pull/50307) ([Zhiguo Zhou](https://github.com/ZhiguoZh)).
* Parallel merging supported for `uniqExact` with modifiers `-Array`, `-Merge`, `-OrNull`, `-State`. [#50413](https://github.com/ClickHouse/ClickHouse/pull/50413) ([flynn](https://github.com/ucasfl)).
* Enable LZ4_FAST_DEC_LOOP for Arm LZ4 to get 5% of decompression speed. [#50588](https://github.com/ClickHouse/ClickHouse/pull/50588) ([Daniel Kutenin](https://github.com/danlark1)).
#### Improvement
* Add support for CGroup version 2 for asynchronous metrics about the memory usage and availability. This closes [#37983](https://github.com/ClickHouse/ClickHouse/issues/37983). [#45999](https://github.com/ClickHouse/ClickHouse/pull/45999) ([sichenzhao](https://github.com/sichenzhao)).
* Cluster table functions should always skip unavailable shards. close [#46314](https://github.com/ClickHouse/ClickHouse/issues/46314). [#46765](https://github.com/ClickHouse/ClickHouse/pull/46765) ([zk_kiger](https://github.com/zk-kiger)).
* When your csv file contains empty columns, like ```. [#47496](https://github.com/ClickHouse/ClickHouse/pull/47496) ([你不要过来啊](https://github.com/iiiuwioajdks)).
* ROW POLICY for all tables that belong to a DATABASE. [#47640](https://github.com/ClickHouse/ClickHouse/pull/47640) ([Ilya Golshtein](https://github.com/ilejn)).
* Add Google Cloud Storage S3 compatible table function `gcs`. Like the `oss` and `cosn` functions, it is just an alias over the `s3` table function, and it does not bring any new features. [#47815](https://github.com/ClickHouse/ClickHouse/pull/47815) ([Kuba Kaflik](https://github.com/jkaflik)).
* Add ability to use strict parts size for S3 (compatibility with CloudFlare R2 S3 Storage). [#48492](https://github.com/ClickHouse/ClickHouse/pull/48492) ([Azat Khuzhin](https://github.com/azat)).
* Added new columns with info about `Replicated` database replicas to `system.clusters`: `database_shard_name`, `database_replica_name`, `is_active`. Added an optional `FROM SHARD` clause to `SYSTEM DROP DATABASE REPLICA` query. [#48548](https://github.com/ClickHouse/ClickHouse/pull/48548) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Add a new column `zookeeper_name` in system.replicas, to indicate on which (auxiliary) zookeeper cluster the replicated table's metadata is stored. [#48549](https://github.com/ClickHouse/ClickHouse/pull/48549) ([cangyin](https://github.com/cangyin)).
* `IN` operator support compare `Date` and `Date32`. Closes [#48736](https://github.com/ClickHouse/ClickHouse/issues/48736). [#48806](https://github.com/ClickHouse/ClickHouse/pull/48806) ([flynn](https://github.com/ucasfl)).
* Support for erasure codes in HDFS, author: @M1eyu2018, @tomscut. [#48833](https://github.com/ClickHouse/ClickHouse/pull/48833) ([M1eyu](https://github.com/M1eyu2018)).
* The query cache can now supports queries with totals and extremes modifier. [#48853](https://github.com/ClickHouse/ClickHouse/pull/48853) ([Robert Schulze](https://github.com/rschu1ze)).
* Introduces new keyword `INTO OUTFILE 'file.txt' APPEND`. [#48880](https://github.com/ClickHouse/ClickHouse/pull/48880) ([alekar](https://github.com/alekar)).
* The `BACKUP` command will not decrypt data from encrypted disks while making a backup. Instead the data will be stored in a backup in encrypted form. Such backups can be restored only to an encrypted disk with the same (or extended) list of encryption keys. [#48896](https://github.com/ClickHouse/ClickHouse/pull/48896) ([Vitaly Baranov](https://github.com/vitlibar)).
* Keeper improvement: add `CheckNotExists` request to Keeper. [#48897](https://github.com/ClickHouse/ClickHouse/pull/48897) ([Antonio Andelic](https://github.com/antonio2368)).
* Implement SYSTEM DROP REPLICA from auxillary ZooKeeper clusters, may be close [#48931](https://github.com/ClickHouse/ClickHouse/issues/48931). [#48932](https://github.com/ClickHouse/ClickHouse/pull/48932) ([wangxiaobo](https://github.com/wzb5212)).
* Add Array data type to MongoDB. Closes [#48598](https://github.com/ClickHouse/ClickHouse/issues/48598). [#48983](https://github.com/ClickHouse/ClickHouse/pull/48983) ([Nikolay Degterinsky](https://github.com/evillique)).
* Keeper performance improvements: avoid serializing same request twice while processing. Cache deserialization results of large requests. Controlled by new coordination setting `min_request_size_for_cache`. [#49004](https://github.com/ClickHouse/ClickHouse/pull/49004) ([Antonio Andelic](https://github.com/antonio2368)).
* Support storing `Interval` data types in tables. [#49085](https://github.com/ClickHouse/ClickHouse/pull/49085) ([larryluogit](https://github.com/larryluogit)).
* Add support for size suffixes in quota creation statement parameters. [#49087](https://github.com/ClickHouse/ClickHouse/pull/49087) ([Eridanus](https://github.com/Eridanus117)).
* Allow using `ntile` window function without explicit window frame definition: `ntile(3) OVER (ORDER BY a)`, close [#46763](https://github.com/ClickHouse/ClickHouse/issues/46763). [#49093](https://github.com/ClickHouse/ClickHouse/pull/49093) ([vdimir](https://github.com/vdimir)).
* Added settings (`number_of_mutations_to_delay`, `number_of_mutations_to_throw`) to delay or throw `ALTER` queries that create mutations (`ALTER UPDATE`, `ALTER DELETE`, `ALTER MODIFY COLUMN`, ...) in case when table already has a lot of unfinished mutations. [#49117](https://github.com/ClickHouse/ClickHouse/pull/49117) ([Anton Popov](https://github.com/CurtizJ)).
* Added setting `async_insert` for `MergeTables`. It has the same meaning as query-level setting `async_insert` and enables asynchronous inserts for specific table. Note: it doesn't take effect for insert queries from `clickhouse-client`, use query-level setting in that case. [#49122](https://github.com/ClickHouse/ClickHouse/pull/49122) ([Anton Popov](https://github.com/CurtizJ)).
* Catch exception from `create_directories` in filesystem cache. [#49203](https://github.com/ClickHouse/ClickHouse/pull/49203) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Copies embedded examples to a new field `example` in `system.functions` to supplement the field `description`. [#49222](https://github.com/ClickHouse/ClickHouse/pull/49222) ([Dan Roscigno](https://github.com/DanRoscigno)).
* Enable connection options for the MongoDB dictionary. Example: ``` xml <source> <mongodb> <host>localhost</host> <port>27017</port> <user></user> <password></password> <db>test</db> <collection>dictionary_source</collection> <options>ssl=true</options> </mongodb> </source> ``` ### Documentation entry for user-facing changes. [#49225](https://github.com/ClickHouse/ClickHouse/pull/49225) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
* Added an alias `asymptotic` for `asymp` computational method for `kolmogorovSmirnovTest`. Improved documentation. [#49286](https://github.com/ClickHouse/ClickHouse/pull/49286) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Aggregation function groupBitAnd/Or/Xor now work on signed integer data. This makes them consistent with the behavior of scalar functions bitAnd/Or/Xor. [#49292](https://github.com/ClickHouse/ClickHouse/pull/49292) ([exmy](https://github.com/exmy)).
* Split function-documentation into more fine-granular fields. [#49300](https://github.com/ClickHouse/ClickHouse/pull/49300) ([Robert Schulze](https://github.com/rschu1ze)).
* Introduced settings: - `merge_max_block_size_bytes` to limit the amount of memory used for background operations. - `vertical_merge_algorithm_min_bytes_to_activate` to add another condition to activate vertical merges. [#49313](https://github.com/ClickHouse/ClickHouse/pull/49313) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Use multiple threads shared between all tables within a server to load outdated data parts. The the size of the pool and its queue is controlled by `max_outdated_parts_loading_thread_pool_size` and `outdated_part_loading_thread_pool_queue_size` settings. [#49317](https://github.com/ClickHouse/ClickHouse/pull/49317) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Don't overestimate the size of processed data for `LowCardinality` columns when they share dictionaries between blocks. This closes [#49322](https://github.com/ClickHouse/ClickHouse/issues/49322). See also [#48745](https://github.com/ClickHouse/ClickHouse/issues/48745). [#49323](https://github.com/ClickHouse/ClickHouse/pull/49323) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Parquet writer now uses reasonable row group size when invoked through OUTFILE. [#49325](https://github.com/ClickHouse/ClickHouse/pull/49325) ([Michael Kolupaev](https://github.com/al13n321)).
* Allow restricted keywords like `ARRAY` as an alias if the alias is quoted. Closes [#49324](https://github.com/ClickHouse/ClickHouse/issues/49324). [#49360](https://github.com/ClickHouse/ClickHouse/pull/49360) ([Nikolay Degterinsky](https://github.com/evillique)).
* Added possibility to use temporary tables in FROM part of ATTACH PARTITION FROM and REPLACE PARTITION FROM. [#49436](https://github.com/ClickHouse/ClickHouse/pull/49436) ([Roman Vasin](https://github.com/rvasin)).
* Data parts loading and deletion jobs were moved to shared server-wide pools instead of per-table pools. Pools sizes are controlled via settings `max_active_parts_loading_thread_pool_size`, `max_outdated_parts_loading_thread_pool_size` and `max_parts_cleaning_thread_pool_size` in top-level config. Table-level settings `max_part_loading_threads` and `max_part_removal_threads` became obsolete. [#49474](https://github.com/ClickHouse/ClickHouse/pull/49474) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Allow `?password=pass` in URL. Password is replaced in browser history. [#49505](https://github.com/ClickHouse/ClickHouse/pull/49505) ([Mike Kot](https://github.com/myrrc)).
* Allow zero objects in ReadBufferFromRemoteFSGather (because empty files are not backuped, so we might end up with zero blobs in metadata file). Closes [#49480](https://github.com/ClickHouse/ClickHouse/issues/49480). [#49519](https://github.com/ClickHouse/ClickHouse/pull/49519) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Attach thread MemoryTracker to `total_memory_tracker` after `ThreadGroup` detached. [#49527](https://github.com/ClickHouse/ClickHouse/pull/49527) ([Dmitry Novik](https://github.com/novikd)).
* Make `Pretty` formats prettier: squash blocks if not much time passed since the output of the previous block. This is controlled by a new setting `output_format_pretty_squash_ms` (100ms by default). This closes [#49153](https://github.com/ClickHouse/ClickHouse/issues/49153). [#49537](https://github.com/ClickHouse/ClickHouse/pull/49537) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add initial support to do JOINs with pure parallel replicas. [#49544](https://github.com/ClickHouse/ClickHouse/pull/49544) ([Raúl Marín](https://github.com/Algunenano)).
* Fix parameterized views when query parameter used multiple times in the query. [#49556](https://github.com/ClickHouse/ClickHouse/pull/49556) ([Azat Khuzhin](https://github.com/azat)).
* Release memory allocated for the last sent ProfileEvents snapshot in the context of a query. Followup [#47564](https://github.com/ClickHouse/ClickHouse/issues/47564). [#49561](https://github.com/ClickHouse/ClickHouse/pull/49561) ([Dmitry Novik](https://github.com/novikd)).
* Function "makeDate" now provides a MySQL-compatible overload (year & day of the year argument). [#49603](https://github.com/ClickHouse/ClickHouse/pull/49603) ([Robert Schulze](https://github.com/rschu1ze)).
* More parallelism on `Outdated` parts removal with "zero-copy replication". [#49630](https://github.com/ClickHouse/ClickHouse/pull/49630) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Reduced number of `List` ZooKeeper requests when selecting parts to merge and a lot of partitions do not have anything to merge. [#49637](https://github.com/ClickHouse/ClickHouse/pull/49637) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Support `dictionary` table function for `RegExpTreeDictionary`. [#49666](https://github.com/ClickHouse/ClickHouse/pull/49666) ([Han Fei](https://github.com/hanfei1991)).
* Added weighted fair IO scheduling policy. Added dynamic resource manager, which allows IO scheduling hierarchy to be updated in runtime w/o server restarts. [#49671](https://github.com/ClickHouse/ClickHouse/pull/49671) ([Sergei Trifonov](https://github.com/serxa)).
* Add compose request after multipart upload to GCS. This enables the usage of copy operation on objects uploaded with the multipart upload. It's recommended to set `s3_strict_upload_part_size` to some value because compose request can fail on objects created with parts of different sizes. [#49693](https://github.com/ClickHouse/ClickHouse/pull/49693) ([Antonio Andelic](https://github.com/antonio2368)).
* Improve the "best-effort" parsing logic to accept `key_value_delimiter` as a valid part of the value. This also simplifies branching and might even speed up things a bit. [#49760](https://github.com/ClickHouse/ClickHouse/pull/49760) ([Arthur Passos](https://github.com/arthurpassos)).
* Facilitate profile data association and aggregation for the same query. [#49777](https://github.com/ClickHouse/ClickHouse/pull/49777) ([helifu](https://github.com/helifu)).
* System log tables can now have custom sorting keys. [#49778](https://github.com/ClickHouse/ClickHouse/pull/49778) ([helifu](https://github.com/helifu)).
* A new field 'partitions' is used to indicate which partitions are participating in the calculation. [#49779](https://github.com/ClickHouse/ClickHouse/pull/49779) ([helifu](https://github.com/helifu)).
* Added `enable_the_endpoint_id_with_zookeeper_name_prefix` setting for `ReplicatedMergeTree` (disabled by default). When enabled, it adds ZooKeeper cluster name to table's interserver communication endpoint. It avoids `Duplicate interserver IO endpoint` errors when having replicated tables with the same path, but different auxiliary ZooKeepers. [#49780](https://github.com/ClickHouse/ClickHouse/pull/49780) ([helifu](https://github.com/helifu)).
* Add query parameters to clickhouse-local. Closes [#46561](https://github.com/ClickHouse/ClickHouse/issues/46561). [#49785](https://github.com/ClickHouse/ClickHouse/pull/49785) ([Nikolay Degterinsky](https://github.com/evillique)).
* Qpl_deflate codec lower the minimum simd version to sse 4.2. [doc change in qpl](https://github.com/intel/qpl/commit/3f8f5cea27739f5261e8fd577dc233ffe88bf679) - intel® qpl relies on a run-time kernels dispatcher and cpuid check to choose the best available implementation(sse/avx2/avx512) - restructured cmakefile for qpl build in clickhouse to align with latest upstream qpl. [#49811](https://github.com/ClickHouse/ClickHouse/pull/49811) ([jasperzhu](https://github.com/jinjunzh)).
* Allow loading dictionaries and functions from YAML by default. In previous versions, it required editing the `dictionaries_config` or `user_defined_executable_functions_config` in the configuration file, as they expected `*.xml` files. [#49812](https://github.com/ClickHouse/ClickHouse/pull/49812) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* The Kafka table engine now allows to use alias columns. [#49824](https://github.com/ClickHouse/ClickHouse/pull/49824) ([Aleksandr Musorin](https://github.com/AVMusorin)).
* Add setting to limit the max number of pairs produced by extractKeyValuePairs, safeguard to avoid using way too much memory. [#49836](https://github.com/ClickHouse/ClickHouse/pull/49836) ([Arthur Passos](https://github.com/arthurpassos)).
* Add support for (an unusual) case where the arguments in the `IN` operator are single-element tuples. [#49844](https://github.com/ClickHouse/ClickHouse/pull/49844) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
* `bitHammingDistance` function support `String` and `FixedString` data type. Closes [#48827](https://github.com/ClickHouse/ClickHouse/issues/48827). [#49858](https://github.com/ClickHouse/ClickHouse/pull/49858) ([flynn](https://github.com/ucasfl)).
* Fix timeout resetting errors in the client on OS X. [#49863](https://github.com/ClickHouse/ClickHouse/pull/49863) ([alekar](https://github.com/alekar)).
* Add support for big integers, such as UInt128, Int128, UInt256, and Int256 in the function `bitCount`. This enables Hamming distance over large bit masks for AI applications. [#49867](https://github.com/ClickHouse/ClickHouse/pull/49867) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* This PR makes fingerprints to be used instead of key IDs in encrypted disks. [#49882](https://github.com/ClickHouse/ClickHouse/pull/49882) ([Vitaly Baranov](https://github.com/vitlibar)).
* Add UUID data type to PostgreSQL. Closes [#49739](https://github.com/ClickHouse/ClickHouse/issues/49739). [#49894](https://github.com/ClickHouse/ClickHouse/pull/49894) ([Nikolay Degterinsky](https://github.com/evillique)).
* Make `allow_experimental_query_cache` setting as obsolete for backward-compatibility. It was removed in https://github.com/ClickHouse/ClickHouse/pull/47977. [#49934](https://github.com/ClickHouse/ClickHouse/pull/49934) ([Timur Solodovnikov](https://github.com/tsolodov)).
* Function toUnixTimestamp() now accepts Date and Date32 arguments. [#49989](https://github.com/ClickHouse/ClickHouse/pull/49989) ([Victor Krasnov](https://github.com/sirvickr)).
* Charge only server memory for dictionaries. [#49995](https://github.com/ClickHouse/ClickHouse/pull/49995) ([Azat Khuzhin](https://github.com/azat)).
* Add schema inference to PostgreSQL, MySQL, MeiliSearch, and SQLite table engines. Closes [#49972](https://github.com/ClickHouse/ClickHouse/issues/49972). [#50000](https://github.com/ClickHouse/ClickHouse/pull/50000) ([Nikolay Degterinsky](https://github.com/evillique)).
* The server will allow using the `SQL_*` settings such as `SQL_AUTO_IS_NULL` as no-ops for MySQL compatibility. This closes [#49927](https://github.com/ClickHouse/ClickHouse/issues/49927). [#50013](https://github.com/ClickHouse/ClickHouse/pull/50013) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Preserve initial_query_id for ON CLUSTER queries, which is useful for introspection (under `distributed_ddl_entry_format_version=5`). [#50015](https://github.com/ClickHouse/ClickHouse/pull/50015) ([Azat Khuzhin](https://github.com/azat)).
* Preserve backward incompatibility for renamed settings by using aliases (`allow_experimental_projection_optimization` for `optimize_use_projections`, `allow_experimental_lightweight_delete` for `enable_lightweight_delete`). [#50044](https://github.com/ClickHouse/ClickHouse/pull/50044) ([Azat Khuzhin](https://github.com/azat)).
* Support cross-replication in distributed queries using the new infrastructure. [#50097](https://github.com/ClickHouse/ClickHouse/pull/50097) ([Dmitry Novik](https://github.com/novikd)).
* Support passing fqdn through setting my_hostname to register cluster node in keeper. Add setting of invisible to support multi compute groups. A compute group as a cluster, is invisible to other compute groups. [#50186](https://github.com/ClickHouse/ClickHouse/pull/50186) ([Yangkuan Liu](https://github.com/LiuYangkuan)).
* Fix PostgreSQL reading all the data even though `LIMIT n` could be specified. [#50187](https://github.com/ClickHouse/ClickHouse/pull/50187) ([Kseniia Sumarokova](https://github.com/kssenii)).
* 1) Fixed an error `NOT_FOUND_COLUMN_IN_BLOCK` in case of using parallel replicas with non-replicated storage with disabled setting `parallel_replicas_for_non_replicated_merge_tree` 2) Now `allow_experimental_parallel_reading_from_replicas` have 3 possible values - 0, 1 and 2. 0 - disabled, 1 - enabled, silently disable them in case of failure (in case of FINAL or JOIN), 2 - enabled, throw an expection in case of failure. 3) If FINAL modifier is used in SELECT query and parallel replicas are enabled, ClickHouse will try to disable them if `allow_experimental_parallel_reading_from_replicas` is set to 1 and throw an exception otherwise. [#50195](https://github.com/ClickHouse/ClickHouse/pull/50195) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Don't send head request for all keys in Iceberg schema inference, only for keys that are used for reaing data. [#50203](https://github.com/ClickHouse/ClickHouse/pull/50203) ([Kruglov Pavel](https://github.com/Avogar)).
* Add new profile events for queries with subqueries (`QueriesWithSubqueries`/`SelectQueriesWithSubqueries`/`InsertQueriesWithSubqueries`). [#50204](https://github.com/ClickHouse/ClickHouse/pull/50204) ([Azat Khuzhin](https://github.com/azat)).
* Adding the roles field in the users.xml file, which allows specifying roles with grants via a config file. [#50278](https://github.com/ClickHouse/ClickHouse/pull/50278) ([pufit](https://github.com/pufit)).
* When parallel replicas are enabled they will always skip unavailable servers (the behavior is controlled by the setting `skip_unavailable_shards`, enabled by default and can be only disabled). This closes: [#48565](https://github.com/ClickHouse/ClickHouse/issues/48565). [#50293](https://github.com/ClickHouse/ClickHouse/pull/50293) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fix a typo. [#50306](https://github.com/ClickHouse/ClickHouse/pull/50306) ([helifu](https://github.com/helifu)).
* Setting `enable_memory_bound_merging_of_aggregation_results` is enabled by default. If you update from version prior to 22.12, we recommend to set this flag to `false` until update is finished. [#50319](https://github.com/ClickHouse/ClickHouse/pull/50319) ([Nikita Taranov](https://github.com/nickitat)).
* Report `CGroupCpuCfsPeriod` and `CGroupCpuCfsQuota` in AsynchronousMetrics. - Respect cgroup v2 memory limits during server startup. [#50379](https://github.com/ClickHouse/ClickHouse/pull/50379) ([alekar](https://github.com/alekar)).
* Bump internal protobuf to v3.18 (fixes CVE-2022-1941). [#50400](https://github.com/ClickHouse/ClickHouse/pull/50400) ([Robert Schulze](https://github.com/rschu1ze)).
* Bump internal libxml2 to v2.10.4 (fixes CVE-2023-28484 and CVE-2023-29469). [#50402](https://github.com/ClickHouse/ClickHouse/pull/50402) ([Robert Schulze](https://github.com/rschu1ze)).
* Bump c-ares to v1.19.1 (CVE-2023-32067, CVE-2023-31130, CVE-2023-31147). [#50403](https://github.com/ClickHouse/ClickHouse/pull/50403) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix CVE-2022-2469 in libgsasl. [#50404](https://github.com/ClickHouse/ClickHouse/pull/50404) ([Robert Schulze](https://github.com/rschu1ze)).
* Make filter push down through cross join. [#50430](https://github.com/ClickHouse/ClickHouse/pull/50430) ([Han Fei](https://github.com/hanfei1991)).
* Add a signal handler for SIGQUIT to work the same way as SIGINT. Closes [#50298](https://github.com/ClickHouse/ClickHouse/issues/50298). [#50435](https://github.com/ClickHouse/ClickHouse/pull/50435) ([Nikolay Degterinsky](https://github.com/evillique)).
* In case JSON parse fails due to the large size of the object output the last position to allow debugging. [#50474](https://github.com/ClickHouse/ClickHouse/pull/50474) ([Valentin Alexeev](https://github.com/valentinalexeev)).
* Support decimals with not fixed size. Closes [#49130](https://github.com/ClickHouse/ClickHouse/issues/49130). [#50586](https://github.com/ClickHouse/ClickHouse/pull/50586) ([Kruglov Pavel](https://github.com/Avogar)).
* Disable pure parallel replicas if trivial count optimization is possible. [#50594](https://github.com/ClickHouse/ClickHouse/pull/50594) ([Raúl Marín](https://github.com/Algunenano)).
* Added support of TRUNCATE db.table additional to TRUNCATE TABLE db.table in MaterializedMySQL. [#50624](https://github.com/ClickHouse/ClickHouse/pull/50624) ([Val Doroshchuk](https://github.com/valbok)).
* Disable parallel replicas automatically when the estimated number of granules is less than threshold. The behavior is controlled by a setting `parallel_replicas_min_number_of_granules_to_enable`. [#50639](https://github.com/ClickHouse/ClickHouse/pull/50639) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* When creating skipping indexes via "ALTER TABLE table ADD INDEX", the "GRANULARITY" clause can now be omitted. In that case, GRANULARITY is assumed to be 1. [#50658](https://github.com/ClickHouse/ClickHouse/pull/50658) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix slow cache in presence of big inserts. [#50680](https://github.com/ClickHouse/ClickHouse/pull/50680) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Set default max_elements limit in filesystem cache to 10000000. [#50682](https://github.com/ClickHouse/ClickHouse/pull/50682) ([Kseniia Sumarokova](https://github.com/kssenii)).
* SHOW INDICES is now an alias of statement SHOW INDEX/INDEXES/KEYS. [#50713](https://github.com/ClickHouse/ClickHouse/pull/50713) ([Robert Schulze](https://github.com/rschu1ze)).
#### Build/Testing/Packaging Improvement
* New and improved keeper-bench. Everything can be customized from yaml/XML file: - request generator - each type of request generator can have a specific set of fields - multi requests can be generated just by doing the same under `multi` key - for each request or subrequest in multi a `weight` field can be defined to control distribution - define trees that need to be setup for a test run - hosts can be defined with all timeouts customizable and it's possible to control how many sessions to generate for each host - integers defined with `min_value` and `max_value` fields are random number generators. [#48547](https://github.com/ClickHouse/ClickHouse/pull/48547) ([Antonio Andelic](https://github.com/antonio2368)).
* ... Add a test to check max_rows_to_read_leaf behaviour. [#48950](https://github.com/ClickHouse/ClickHouse/pull/48950) ([Sean Haynes](https://github.com/seandhaynes)).
* Io_uring is not supported on macos, don't choose it when running tests on local to avoid occassional failures. [#49250](https://github.com/ClickHouse/ClickHouse/pull/49250) ([Frank Chen](https://github.com/FrankChen021)).
* Support named fault injection for testing. [#49361](https://github.com/ClickHouse/ClickHouse/pull/49361) ([Han Fei](https://github.com/hanfei1991)).
* Fix the 01193_metadata_loading test to match the query execution time specific to s390x. [#49455](https://github.com/ClickHouse/ClickHouse/pull/49455) ([MeenaRenganathan22](https://github.com/MeenaRenganathan22)).
* Use the RapidJSONParser library to parse the JSON float values in case of s390x. [#49457](https://github.com/ClickHouse/ClickHouse/pull/49457) ([MeenaRenganathan22](https://github.com/MeenaRenganathan22)).
* Allow running ClickHouse in the OS where the `prctl` (process control) syscall is not available, such as AWS Lambda. [#49538](https://github.com/ClickHouse/ClickHouse/pull/49538) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Improve CI check with an enabled analyzer. Now it should be green if only tests from `tests/broken_tests.txt` are broken. [#49562](https://github.com/ClickHouse/ClickHouse/pull/49562) ([Dmitry Novik](https://github.com/novikd)).
* Fixed the issue of build conflict between contrib/isa-l and isa-l in qpl [49296](https://github.com/ClickHouse/ClickHouse/issues/49296). [#49584](https://github.com/ClickHouse/ClickHouse/pull/49584) ([jasperzhu](https://github.com/jinjunzh)).
* Utilities are now only build if explicitly requested ("-DENABLE_UTILS=1") instead of by default, this reduces link times in typical development builds. [#49620](https://github.com/ClickHouse/ClickHouse/pull/49620) ([Robert Schulze](https://github.com/rschu1ze)).
* Pull build description of idxd-config into a separate CMake file to avoid accidental removal in future. [#49651](https://github.com/ClickHouse/ClickHouse/pull/49651) ([jasperzhu](https://github.com/jinjunzh)).
* Add CI check with an enabled analyzer in the master. Followup [#49562](https://github.com/ClickHouse/ClickHouse/issues/49562). [#49668](https://github.com/ClickHouse/ClickHouse/pull/49668) ([Dmitry Novik](https://github.com/novikd)).
* Switch to LLVM/clang 16. [#49678](https://github.com/ClickHouse/ClickHouse/pull/49678) ([Azat Khuzhin](https://github.com/azat)).
* Fixed DefaultHash64 for non-64 bit integers on s390x. [#49833](https://github.com/ClickHouse/ClickHouse/pull/49833) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Allow building ClickHouse with clang-17. [#49851](https://github.com/ClickHouse/ClickHouse/pull/49851) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* ClickHouse is now easier to be integrated into other cmake projects. [#49991](https://github.com/ClickHouse/ClickHouse/pull/49991) ([Amos Bird](https://github.com/amosbird)).
* Link `boost::context` library to `clickhouse_common_io`. This closes: [#50381](https://github.com/ClickHouse/ClickHouse/issues/50381). [#50385](https://github.com/ClickHouse/ClickHouse/pull/50385) ([HaiBo Li](https://github.com/marising)).
* Add support for building with clang-17. [#50410](https://github.com/ClickHouse/ClickHouse/pull/50410) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix strange additional QEMU logging after [#47151](https://github.com/ClickHouse/ClickHouse/issues/47151), see https://s3.amazonaws.com/clickhouse-test-reports/50078/a4743996ee4f3583884d07bcd6501df0cfdaa346/stateless_tests__release__databasereplicated__[3_4].html. [#50442](https://github.com/ClickHouse/ClickHouse/pull/50442) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* ClickHouse can work on Linux RISC-V 6.1.22. This closes [#50456](https://github.com/ClickHouse/ClickHouse/issues/50456). [#50457](https://github.com/ClickHouse/ClickHouse/pull/50457) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* ActionsDAG: fix wrong optimization [#47584](https://github.com/ClickHouse/ClickHouse/pull/47584) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Correctly handle concurrent snapshots in Keeper [#48466](https://github.com/ClickHouse/ClickHouse/pull/48466) ([Antonio Andelic](https://github.com/antonio2368)).
* MergeTreeMarksLoader holds DataPart instead of DataPartStorage [#48515](https://github.com/ClickHouse/ClickHouse/pull/48515) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* sequence state fix [#48603](https://github.com/ClickHouse/ClickHouse/pull/48603) ([Ilya Golshtein](https://github.com/ilejn)).
* Back/Restore concurrency check on previous fails [#48726](https://github.com/ClickHouse/ClickHouse/pull/48726) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Fix Attaching a table with non-existent ZK path does not increase the ReadonlyReplica metric [#48954](https://github.com/ClickHouse/ClickHouse/pull/48954) ([wangxiaobo](https://github.com/wzb5212)).
* Fix possible terminate called for uncaught exception in some places [#49112](https://github.com/ClickHouse/ClickHouse/pull/49112) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix key not found error for queries with multiple StorageJoin [#49137](https://github.com/ClickHouse/ClickHouse/pull/49137) ([vdimir](https://github.com/vdimir)).
* Fix wrong query result when using nullable primary key [#49172](https://github.com/ClickHouse/ClickHouse/pull/49172) ([Duc Canh Le](https://github.com/canhld94)).
* Revert "Fix GCS native copy ([#48981](https://github.com/ClickHouse/ClickHouse/issues/48981))" [#49194](https://github.com/ClickHouse/ClickHouse/pull/49194) ([Raúl Marín](https://github.com/Algunenano)).
* Fix reinterpretAs*() on big endian machines [#49198](https://github.com/ClickHouse/ClickHouse/pull/49198) ([Suzy Wang](https://github.com/SuzyWangIBMer)).
* Lock zero copy parts more atomically [#49211](https://github.com/ClickHouse/ClickHouse/pull/49211) ([alesapin](https://github.com/alesapin)).
* Fix race on Outdated parts loading [#49223](https://github.com/ClickHouse/ClickHouse/pull/49223) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix all key value is null and group use rollup return wrong answer [#49282](https://github.com/ClickHouse/ClickHouse/pull/49282) ([Shuai li](https://github.com/loneylee)).
* Fix calculating load_factor for HASHED dictionaries with SHARDS [#49319](https://github.com/ClickHouse/ClickHouse/pull/49319) ([Azat Khuzhin](https://github.com/azat)).
* Disallow configuring compression CODECs for alias columns [#49363](https://github.com/ClickHouse/ClickHouse/pull/49363) ([Timur Solodovnikov](https://github.com/tsolodov)).
* Fix bug in removal of existing part directory [#49365](https://github.com/ClickHouse/ClickHouse/pull/49365) ([alesapin](https://github.com/alesapin)).
* Properly fix GCS when HMAC is used [#49390](https://github.com/ClickHouse/ClickHouse/pull/49390) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix fuzz bug when subquery set is not built when reading from remote() [#49425](https://github.com/ClickHouse/ClickHouse/pull/49425) ([Alexander Gololobov](https://github.com/davenger)).
* Invert `shutdown_wait_unfinished_queries` [#49427](https://github.com/ClickHouse/ClickHouse/pull/49427) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Fix another zero copy bug [#49473](https://github.com/ClickHouse/ClickHouse/pull/49473) ([alesapin](https://github.com/alesapin)).
* Fix postgres database setting [#49481](https://github.com/ClickHouse/ClickHouse/pull/49481) ([Mal Curtis](https://github.com/snikch)).
* Correctly handle s3Cluster arguments [#49490](https://github.com/ClickHouse/ClickHouse/pull/49490) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix bug in TraceCollector destructor. [#49508](https://github.com/ClickHouse/ClickHouse/pull/49508) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix AsynchronousReadIndirectBufferFromRemoteFS breaking on short seeks [#49525](https://github.com/ClickHouse/ClickHouse/pull/49525) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix dictionaries loading order [#49560](https://github.com/ClickHouse/ClickHouse/pull/49560) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Forbid the change of data type of Object('json') column [#49563](https://github.com/ClickHouse/ClickHouse/pull/49563) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix stress test (Logical error: Expected 7134 >= 11030) [#49623](https://github.com/ClickHouse/ClickHouse/pull/49623) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix bug in DISTINCT [#49628](https://github.com/ClickHouse/ClickHouse/pull/49628) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix: DISTINCT in order with zero values in non-sorted columns [#49636](https://github.com/ClickHouse/ClickHouse/pull/49636) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix one-off error in big integers found by UBSan with fuzzer [#49645](https://github.com/ClickHouse/ClickHouse/pull/49645) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix reading from sparse columns after restart [#49660](https://github.com/ClickHouse/ClickHouse/pull/49660) ([Anton Popov](https://github.com/CurtizJ)).
* Fix assert in SpanHolder::finish() with fibers [#49673](https://github.com/ClickHouse/ClickHouse/pull/49673) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix short circuit functions and mutations with sparse arguments [#49716](https://github.com/ClickHouse/ClickHouse/pull/49716) ([Anton Popov](https://github.com/CurtizJ)).
* Fix writing appended files to incremental backups [#49725](https://github.com/ClickHouse/ClickHouse/pull/49725) ([Vitaly Baranov](https://github.com/vitlibar)).
* Ignore LWD column in checkPartDynamicColumns [#49737](https://github.com/ClickHouse/ClickHouse/pull/49737) ([Alexander Gololobov](https://github.com/davenger)).
* Fix msan issue in randomStringUTF8(<uneven number>) [#49750](https://github.com/ClickHouse/ClickHouse/pull/49750) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix aggregate function kolmogorovSmirnovTest [#49768](https://github.com/ClickHouse/ClickHouse/pull/49768) ([FFFFFFFHHHHHHH](https://github.com/FFFFFFFHHHHHHH)).
* Fix settings aliases in native protocol [#49776](https://github.com/ClickHouse/ClickHouse/pull/49776) ([Azat Khuzhin](https://github.com/azat)).
* Fix `arrayMap` with array of tuples with single argument [#49789](https://github.com/ClickHouse/ClickHouse/pull/49789) ([Anton Popov](https://github.com/CurtizJ)).
* Fix per-query IO/BACKUPs throttling settings [#49797](https://github.com/ClickHouse/ClickHouse/pull/49797) ([Azat Khuzhin](https://github.com/azat)).
* Fix setting NULL in profile definition [#49831](https://github.com/ClickHouse/ClickHouse/pull/49831) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix a bug with projections and the aggregate_functions_null_for_empty setting (for query_plan_optimize_projection) [#49873](https://github.com/ClickHouse/ClickHouse/pull/49873) ([Amos Bird](https://github.com/amosbird)).
* Fix processing pending batch for Distributed async INSERT after restart [#49884](https://github.com/ClickHouse/ClickHouse/pull/49884) ([Azat Khuzhin](https://github.com/azat)).
* Fix assertion in CacheMetadata::doCleanup [#49914](https://github.com/ClickHouse/ClickHouse/pull/49914) ([Kseniia Sumarokova](https://github.com/kssenii)).
* fix `is_prefix` in OptimizeRegularExpression [#49919](https://github.com/ClickHouse/ClickHouse/pull/49919) ([Han Fei](https://github.com/hanfei1991)).
* Fix metrics `WriteBufferFromS3Bytes`, `WriteBufferFromS3Microseconds` and `WriteBufferFromS3RequestsErrors` [#49930](https://github.com/ClickHouse/ClickHouse/pull/49930) ([Aleksandr Musorin](https://github.com/AVMusorin)).
* Fix IPv6 encoding in protobuf [#49933](https://github.com/ClickHouse/ClickHouse/pull/49933) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix possible Logical error on bad Nullable parsing for text formats [#49960](https://github.com/ClickHouse/ClickHouse/pull/49960) ([Kruglov Pavel](https://github.com/Avogar)).
* Add setting output_format_parquet_compliant_nested_types to produce more compatible Parquet files [#50001](https://github.com/ClickHouse/ClickHouse/pull/50001) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix logical error in stress test "Not enough space to add ..." [#50021](https://github.com/ClickHouse/ClickHouse/pull/50021) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Avoid deadlock when starting table in attach thread of `ReplicatedMergeTree` [#50026](https://github.com/ClickHouse/ClickHouse/pull/50026) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix assert in SpanHolder::finish() with fibers attempt 2 [#50034](https://github.com/ClickHouse/ClickHouse/pull/50034) ([Kruglov Pavel](https://github.com/Avogar)).
* Add proper escaping for DDL OpenTelemetry context serialization [#50045](https://github.com/ClickHouse/ClickHouse/pull/50045) ([Azat Khuzhin](https://github.com/azat)).
* Fix reporting broken projection parts [#50052](https://github.com/ClickHouse/ClickHouse/pull/50052) ([Amos Bird](https://github.com/amosbird)).
* JIT compilation not equals NaN fix [#50056](https://github.com/ClickHouse/ClickHouse/pull/50056) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix crashing in case of Replicated database without arguments [#50058](https://github.com/ClickHouse/ClickHouse/pull/50058) ([Azat Khuzhin](https://github.com/azat)).
* Fix crash with `multiIf` and constant condition and nullable arguments [#50123](https://github.com/ClickHouse/ClickHouse/pull/50123) ([Anton Popov](https://github.com/CurtizJ)).
* Fix invalid index analysis for date related keys [#50153](https://github.com/ClickHouse/ClickHouse/pull/50153) ([Amos Bird](https://github.com/amosbird)).
* do not allow modify order by when there are no order by cols [#50154](https://github.com/ClickHouse/ClickHouse/pull/50154) ([Han Fei](https://github.com/hanfei1991)).
* Fix broken index analysis when binary operator contains a null constant argument [#50177](https://github.com/ClickHouse/ClickHouse/pull/50177) ([Amos Bird](https://github.com/amosbird)).
* clickhouse-client: disallow usage of `--query` and `--queries-file` at the same time [#50210](https://github.com/ClickHouse/ClickHouse/pull/50210) ([Alexey Gerasimchuck](https://github.com/Demilivor)).
* Fix UB for INTO OUTFILE extensions (APPEND / AND STDOUT) and WATCH EVENTS [#50216](https://github.com/ClickHouse/ClickHouse/pull/50216) ([Azat Khuzhin](https://github.com/azat)).
* Fix skipping spaces at end of row in CustomSeparatedIgnoreSpaces format [#50224](https://github.com/ClickHouse/ClickHouse/pull/50224) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix iceberg metadata parsing [#50232](https://github.com/ClickHouse/ClickHouse/pull/50232) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix nested distributed SELECT in WITH clause [#50234](https://github.com/ClickHouse/ClickHouse/pull/50234) ([Azat Khuzhin](https://github.com/azat)).
* Fix reconnecting of HTTPS session when target host IP was changed [#50240](https://github.com/ClickHouse/ClickHouse/pull/50240) ([Aleksei Filatov](https://github.com/aalexfvk)).
* Fix msan issue in keyed siphash [#50245](https://github.com/ClickHouse/ClickHouse/pull/50245) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix bugs in Poco sockets in non-blocking mode, use true non-blocking sockets [#50252](https://github.com/ClickHouse/ClickHouse/pull/50252) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix checksum calculation for backup entries [#50264](https://github.com/ClickHouse/ClickHouse/pull/50264) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fixed type conversion from Date/Date32 to DateTime64 when querying with DateTime64 index [#50280](https://github.com/ClickHouse/ClickHouse/pull/50280) ([Lucas Chang](https://github.com/lucas-tubi)).
* Comparison functions NaN fix [#50287](https://github.com/ClickHouse/ClickHouse/pull/50287) ([Maksim Kita](https://github.com/kitaisreal)).
* JIT aggregation nullable key fix [#50291](https://github.com/ClickHouse/ClickHouse/pull/50291) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix clickhouse-local crashing when writing empty Arrow or Parquet output [#50328](https://github.com/ClickHouse/ClickHouse/pull/50328) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix crash when Pool::Entry::disconnect() is called [#50334](https://github.com/ClickHouse/ClickHouse/pull/50334) ([Val Doroshchuk](https://github.com/valbok)).
* Improved fetch part by holding directory lock longer [#50339](https://github.com/ClickHouse/ClickHouse/pull/50339) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Fix bitShift* functions with both constant arguments [#50343](https://github.com/ClickHouse/ClickHouse/pull/50343) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix Keeper deadlock on exception when preprocessing requests. [#50387](https://github.com/ClickHouse/ClickHouse/pull/50387) ([frinkr](https://github.com/frinkr)).
* Fix hashing of const integer values [#50421](https://github.com/ClickHouse/ClickHouse/pull/50421) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix excessive memory usage for FINAL (due to too much streams usage) [#50429](https://github.com/ClickHouse/ClickHouse/pull/50429) ([Azat Khuzhin](https://github.com/azat)).
* Fix merge_tree_min_rows_for_seek/merge_tree_min_bytes_for_seek for data skipping indexes [#50432](https://github.com/ClickHouse/ClickHouse/pull/50432) ([Azat Khuzhin](https://github.com/azat)).
* Limit the number of in-flight tasks for loading outdated parts [#50450](https://github.com/ClickHouse/ClickHouse/pull/50450) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Keeper fix: apply uncommitted state after snapshot install [#50483](https://github.com/ClickHouse/ClickHouse/pull/50483) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix incorrect constant folding [#50536](https://github.com/ClickHouse/ClickHouse/pull/50536) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix logical error in stress test (Not enough space to add ...) [#50583](https://github.com/ClickHouse/ClickHouse/pull/50583) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix converting Null to LowCardinality(Nullable) in values table function [#50637](https://github.com/ClickHouse/ClickHouse/pull/50637) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix crash in anti/semi join [#50638](https://github.com/ClickHouse/ClickHouse/pull/50638) ([vdimir](https://github.com/vdimir)).
* Revert invalid RegExpTreeDictionary optimization [#50642](https://github.com/ClickHouse/ClickHouse/pull/50642) ([Johann Gan](https://github.com/johanngan)).
* Correctly disable async insert with deduplication when it's not needed [#50663](https://github.com/ClickHouse/ClickHouse/pull/50663) ([Antonio Andelic](https://github.com/antonio2368)).
#### Build Improvement
* Fixed Functional Test 00870_t64_codec, 00871_t64_codec_signed, 00872_t64_bit_codec. [#49658](https://github.com/ClickHouse/ClickHouse/pull/49658) ([Sanjam Panda](https://github.com/saitama951)).
#### NO CL ENTRY
* NO CL ENTRY: 'Fix user MemoryTracker counter in async inserts'. [#47630](https://github.com/ClickHouse/ClickHouse/pull/47630) ([Dmitry Novik](https://github.com/novikd)).
* NO CL ENTRY: 'Revert "Make `Pretty` formats even prettier."'. [#49850](https://github.com/ClickHouse/ClickHouse/pull/49850) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Update first_value.md:remove redundant 's''. [#50331](https://github.com/ClickHouse/ClickHouse/pull/50331) ([sslouis](https://github.com/savezed)).
* NO CL ENTRY: 'Revert "less logs in WriteBufferFromS3"'. [#50390](https://github.com/ClickHouse/ClickHouse/pull/50390) ([Alexander Tokmakov](https://github.com/tavplubix)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Attempt to fix the "system.stack_trace" test [#44627](https://github.com/ClickHouse/ClickHouse/pull/44627) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* rework WriteBufferFromS3, add tests, add abortion [#44869](https://github.com/ClickHouse/ClickHouse/pull/44869) ([Sema Checherinda](https://github.com/CheSema)).
* Rework locking in fs cache [#44985](https://github.com/ClickHouse/ClickHouse/pull/44985) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Update ubuntu_ami_for_ci.sh [#47151](https://github.com/ClickHouse/ClickHouse/pull/47151) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Implement status comment [#48468](https://github.com/ClickHouse/ClickHouse/pull/48468) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Update curl to 8.0.1 (for CVEs) [#48765](https://github.com/ClickHouse/ClickHouse/pull/48765) ([Boris Kuschel](https://github.com/bkuschel)).
* Fix some tests [#48792](https://github.com/ClickHouse/ClickHouse/pull/48792) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Bug Fix for 02432_s3_parallel_parts_cleanup.sql with zero copy replication [#48865](https://github.com/ClickHouse/ClickHouse/pull/48865) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Add AsyncLoader with dependency tracking and runtime prioritization [#48923](https://github.com/ClickHouse/ClickHouse/pull/48923) ([Sergei Trifonov](https://github.com/serxa)).
* Fix incorrect createColumn call on join clause [#48998](https://github.com/ClickHouse/ClickHouse/pull/48998) ([Ongkong](https://github.com/ongkong)).
* Try fix flaky 01346_alter_enum_partition_key_replicated_zookeeper_long [#49099](https://github.com/ClickHouse/ClickHouse/pull/49099) ([Sergei Trifonov](https://github.com/serxa)).
* Fix possible logical error "Cannot cancel. Either no query sent or already cancelled" [#49106](https://github.com/ClickHouse/ClickHouse/pull/49106) ([Kruglov Pavel](https://github.com/Avogar)).
* Refactor ColumnLowCardinality::cutAndCompact [#49111](https://github.com/ClickHouse/ClickHouse/pull/49111) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix tests with enabled analyzer [#49116](https://github.com/ClickHouse/ClickHouse/pull/49116) ([Dmitry Novik](https://github.com/novikd)).
* Use `SharedMutex` instead of `UpgradableMutex` [#49139](https://github.com/ClickHouse/ClickHouse/pull/49139) ([Sergei Trifonov](https://github.com/serxa)).
* Don't add metadata_version file if it doesn't exist [#49146](https://github.com/ClickHouse/ClickHouse/pull/49146) ([alesapin](https://github.com/alesapin)).
* clearing s3 between tests in a robust way [#49157](https://github.com/ClickHouse/ClickHouse/pull/49157) ([Sema Checherinda](https://github.com/CheSema)).
* Align connect timeout with aws sdk default [#49161](https://github.com/ClickHouse/ClickHouse/pull/49161) ([Nikita Taranov](https://github.com/nickitat)).
* Fix test_encrypted_disk_replication [#49193](https://github.com/ClickHouse/ClickHouse/pull/49193) ([Vitaly Baranov](https://github.com/vitlibar)).
* Allow using function `concat` with `Map` type [#49200](https://github.com/ClickHouse/ClickHouse/pull/49200) ([Anton Popov](https://github.com/CurtizJ)).
* Slight improvements to coordinator logging [#49204](https://github.com/ClickHouse/ClickHouse/pull/49204) ([Raúl Marín](https://github.com/Algunenano)).
* Fix some typos in conversion functions [#49221](https://github.com/ClickHouse/ClickHouse/pull/49221) ([Raúl Marín](https://github.com/Algunenano)).
* CMake: Remove some GCC-specific code [#49224](https://github.com/ClickHouse/ClickHouse/pull/49224) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix oss-fuzz build errors [#49236](https://github.com/ClickHouse/ClickHouse/pull/49236) ([Nikita Taranov](https://github.com/nickitat)).
* Update version after release [#49237](https://github.com/ClickHouse/ClickHouse/pull/49237) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update version_date.tsv and changelogs after v23.4.1.1943-stable [#49239](https://github.com/ClickHouse/ClickHouse/pull/49239) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Merge [#24050](https://github.com/ClickHouse/ClickHouse/issues/24050) [#49240](https://github.com/ClickHouse/ClickHouse/pull/49240) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add file name to exception raised during decompression [#49241](https://github.com/ClickHouse/ClickHouse/pull/49241) ([Nikolay Degterinsky](https://github.com/evillique)).
* Disable ISA-L on aarch64 architectures [#49256](https://github.com/ClickHouse/ClickHouse/pull/49256) ([Jordi Villar](https://github.com/jrdi)).
* Add a comment in FileCache.cpp [#49260](https://github.com/ClickHouse/ClickHouse/pull/49260) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix garbage [#48719](https://github.com/ClickHouse/ClickHouse/issues/48719) [#49263](https://github.com/ClickHouse/ClickHouse/pull/49263) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update build for nasm [#49288](https://github.com/ClickHouse/ClickHouse/pull/49288) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix race in `waitForProcessingQueue` [#49302](https://github.com/ClickHouse/ClickHouse/pull/49302) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix stress test [#49309](https://github.com/ClickHouse/ClickHouse/pull/49309) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix 02516_join_with_totals_and_subquery_bug with new analyzer [#49310](https://github.com/ClickHouse/ClickHouse/pull/49310) ([Dmitry Novik](https://github.com/novikd)).
* Fallback auth gh api [#49314](https://github.com/ClickHouse/ClickHouse/pull/49314) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Unpoison stack frame ptrs from libunwind for msan [#49316](https://github.com/ClickHouse/ClickHouse/pull/49316) ([Robert Schulze](https://github.com/rschu1ze)).
* Respect projections in 01600_parts [#49318](https://github.com/ClickHouse/ClickHouse/pull/49318) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* move pipe compute into initializePipeline [#49326](https://github.com/ClickHouse/ClickHouse/pull/49326) ([Konstantin Morozov](https://github.com/k-morozov)).
* Fix compiling average example (suppress -Wframe-larger-than) [#49358](https://github.com/ClickHouse/ClickHouse/pull/49358) ([Azat Khuzhin](https://github.com/azat)).
* Fix join_use_nulls in analyzer [#49359](https://github.com/ClickHouse/ClickHouse/pull/49359) ([vdimir](https://github.com/vdimir)).
* Fix 02680_mysql_ast_logical_err in analyzer [#49362](https://github.com/ClickHouse/ClickHouse/pull/49362) ([vdimir](https://github.com/vdimir)).
* Remove wrong assertion in cache [#49376](https://github.com/ClickHouse/ClickHouse/pull/49376) ([Kseniia Sumarokova](https://github.com/kssenii)).
* A better way of excluding ISA-L on non-x86 [#49378](https://github.com/ClickHouse/ClickHouse/pull/49378) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix decimal aggregates test for s390x [#49382](https://github.com/ClickHouse/ClickHouse/pull/49382) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Move logging one line higher [#49387](https://github.com/ClickHouse/ClickHouse/pull/49387) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Improve CI: status commit, auth for get_gh_api [#49388](https://github.com/ClickHouse/ClickHouse/pull/49388) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix printing hung queries in clickhouse-test. [#49389](https://github.com/ClickHouse/ClickHouse/pull/49389) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Correctly stop CNF convert for too many atomics in new analyzer [#49402](https://github.com/ClickHouse/ClickHouse/pull/49402) ([Antonio Andelic](https://github.com/antonio2368)).
* Remove 02707_complex_query_fails_analyzer test [#49403](https://github.com/ClickHouse/ClickHouse/pull/49403) ([Dmitry Novik](https://github.com/novikd)).
* Update FileSegment.cpp [#49411](https://github.com/ClickHouse/ClickHouse/pull/49411) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Switch Block::NameMap to google::dense_hash_map over HashMap [#49412](https://github.com/ClickHouse/ClickHouse/pull/49412) ([Azat Khuzhin](https://github.com/azat)).
* Slightly reduce inter-header dependencies [#49413](https://github.com/ClickHouse/ClickHouse/pull/49413) ([Azat Khuzhin](https://github.com/azat)).
* Update WithFileName.cpp [#49414](https://github.com/ClickHouse/ClickHouse/pull/49414) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix some assertions failing in stress test [#49415](https://github.com/ClickHouse/ClickHouse/pull/49415) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Correctly cleanup sequential node in ZooKeeperWithFaultInjection [#49418](https://github.com/ClickHouse/ClickHouse/pull/49418) ([vdimir](https://github.com/vdimir)).
* Throw an exception for non-parametric functions in new analyzer [#49419](https://github.com/ClickHouse/ClickHouse/pull/49419) ([Dmitry Novik](https://github.com/novikd)).
* Fix some bad error messages [#49420](https://github.com/ClickHouse/ClickHouse/pull/49420) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Update version_date.tsv and changelogs after v23.4.2.11-stable [#49422](https://github.com/ClickHouse/ClickHouse/pull/49422) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Remove trash [#49423](https://github.com/ClickHouse/ClickHouse/pull/49423) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Whitespaces [#49424](https://github.com/ClickHouse/ClickHouse/pull/49424) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove dependency from DB::Context in remote/cache readers [#49426](https://github.com/ClickHouse/ClickHouse/pull/49426) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Merging [#49066](https://github.com/ClickHouse/ClickHouse/issues/49066) (Better error handling during loading of parts) [#49430](https://github.com/ClickHouse/ClickHouse/pull/49430) ([Anton Popov](https://github.com/CurtizJ)).
* all s3-blobs removed when merge aborted, remove part from failed fetch without unlock keper [#49432](https://github.com/ClickHouse/ClickHouse/pull/49432) ([Sema Checherinda](https://github.com/CheSema)).
* Make INSERT do more things in parallel to avoid getting bottlenecked on one thread [#49434](https://github.com/ClickHouse/ClickHouse/pull/49434) ([Michael Kolupaev](https://github.com/al13n321)).
* Make 'exceptions shorter than 30' test less noisy [#49435](https://github.com/ClickHouse/ClickHouse/pull/49435) ([Michael Kolupaev](https://github.com/al13n321)).
* Build fixes for ENABLE_LIBRARIES=OFF [#49437](https://github.com/ClickHouse/ClickHouse/pull/49437) ([Azat Khuzhin](https://github.com/azat)).
* Add image for docker-server jepsen [#49452](https://github.com/ClickHouse/ClickHouse/pull/49452) ([alesapin](https://github.com/alesapin)).
* Follow-up to [#48792](https://github.com/ClickHouse/ClickHouse/issues/48792) [#49458](https://github.com/ClickHouse/ClickHouse/pull/49458) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Add method `getCurrentAvailabilityZone` to `AWSEC2MetadataClient` [#49464](https://github.com/ClickHouse/ClickHouse/pull/49464) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add an integration test for `shutdown_wait_unfinished_queries` [#49469](https://github.com/ClickHouse/ClickHouse/pull/49469) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Replace `NO DELAY` with `SYNC` in tests [#49470](https://github.com/ClickHouse/ClickHouse/pull/49470) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Check the PRs body directly in lambda, without rerun. Fix RCE in the CI [#49475](https://github.com/ClickHouse/ClickHouse/pull/49475) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Minor changes for setThreadName [#49476](https://github.com/ClickHouse/ClickHouse/pull/49476) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Static cast std::atomic<size_t> to uint64_t to serialize. [#49482](https://github.com/ClickHouse/ClickHouse/pull/49482) ([alekar](https://github.com/alekar)).
* Fix logical error in stress test, add some logging [#49491](https://github.com/ClickHouse/ClickHouse/pull/49491) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fixes in server jepsen image [#49492](https://github.com/ClickHouse/ClickHouse/pull/49492) ([alesapin](https://github.com/alesapin)).
* Fix UserTimeMicroseconds and SystemTimeMicroseconds descriptions [#49521](https://github.com/ClickHouse/ClickHouse/pull/49521) ([Sergei Trifonov](https://github.com/serxa)).
* Remove garbage from HDFS [#49531](https://github.com/ClickHouse/ClickHouse/pull/49531) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Split ReadWriteBufferFromHTTP.h into .h and .cpp file [#49533](https://github.com/ClickHouse/ClickHouse/pull/49533) ([Michael Kolupaev](https://github.com/al13n321)).
* Remove garbage from Pretty format [#49534](https://github.com/ClickHouse/ClickHouse/pull/49534) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Make input_format_parquet_preserve_order imply !parallelize_output_from_storages [#49536](https://github.com/ClickHouse/ClickHouse/pull/49536) ([Michael Kolupaev](https://github.com/al13n321)).
* Remove extra semicolons [#49545](https://github.com/ClickHouse/ClickHouse/pull/49545) ([Bulat Gaifullin](https://github.com/bgaifullin)).
* Fix 00597_push_down_predicate_long for analyzer [#49551](https://github.com/ClickHouse/ClickHouse/pull/49551) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix stress test (assertion 'key_metadata.lock()') [#49554](https://github.com/ClickHouse/ClickHouse/pull/49554) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix writeAnyEscapedString if quote_character is a meta character [#49558](https://github.com/ClickHouse/ClickHouse/pull/49558) ([Robert Schulze](https://github.com/rschu1ze)).
* Add CMake option for BOOST_USE_UCONTEXT [#49564](https://github.com/ClickHouse/ClickHouse/pull/49564) ([ltrk2](https://github.com/ltrk2)).
* Fix 01655_plan_optimizations_optimize_read_in_window_order for analyzer [#49565](https://github.com/ClickHouse/ClickHouse/pull/49565) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix `ThreadPool::wait` [#49572](https://github.com/ClickHouse/ClickHouse/pull/49572) ([Anton Popov](https://github.com/CurtizJ)).
* Query cache: disable for internal queries [#49573](https://github.com/ClickHouse/ClickHouse/pull/49573) ([Robert Schulze](https://github.com/rschu1ze)).
* Remove `test_merge_tree_s3_restore` [#49576](https://github.com/ClickHouse/ClickHouse/pull/49576) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix bad test [#49578](https://github.com/ClickHouse/ClickHouse/pull/49578) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove obsolete test about deprecated feature [#49579](https://github.com/ClickHouse/ClickHouse/pull/49579) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Avoid error found by AST Fuzzer [#49580](https://github.com/ClickHouse/ClickHouse/pull/49580) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix wrong assert [#49581](https://github.com/ClickHouse/ClickHouse/pull/49581) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Flaky test 02723_zookeeper_name.sql [#49592](https://github.com/ClickHouse/ClickHouse/pull/49592) ([Sema Checherinda](https://github.com/CheSema)).
* Query Cache: Safeguard against empty chunks [#49593](https://github.com/ClickHouse/ClickHouse/pull/49593) ([Robert Schulze](https://github.com/rschu1ze)).
* 02723_zookeeper_name: Force a deterministic result order [#49594](https://github.com/ClickHouse/ClickHouse/pull/49594) ([Robert Schulze](https://github.com/rschu1ze)).
* Remove dangerous code (stringstream) [#49595](https://github.com/ClickHouse/ClickHouse/pull/49595) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove some code [#49596](https://github.com/ClickHouse/ClickHouse/pull/49596) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove "locale" [#49597](https://github.com/ClickHouse/ClickHouse/pull/49597) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* CMake: Cleanup utils build [#49598](https://github.com/ClickHouse/ClickHouse/pull/49598) ([Robert Schulze](https://github.com/rschu1ze)).
* Follow-up for [#49580](https://github.com/ClickHouse/ClickHouse/issues/49580) [#49604](https://github.com/ClickHouse/ClickHouse/pull/49604) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix typo [#49605](https://github.com/ClickHouse/ClickHouse/pull/49605) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix bad test 01660_system_parts_smoke [#49611](https://github.com/ClickHouse/ClickHouse/pull/49611) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Minor changes [#49612](https://github.com/ClickHouse/ClickHouse/pull/49612) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Follow-up for [#49576](https://github.com/ClickHouse/ClickHouse/issues/49576) [#49615](https://github.com/ClickHouse/ClickHouse/pull/49615) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix error in [#48300](https://github.com/ClickHouse/ClickHouse/issues/48300) [#49616](https://github.com/ClickHouse/ClickHouse/pull/49616) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix typo: "as much slots" -> "as many slots" [#49617](https://github.com/ClickHouse/ClickHouse/pull/49617) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Better concurrent parts removal with zero copy [#49619](https://github.com/ClickHouse/ClickHouse/pull/49619) ([alesapin](https://github.com/alesapin)).
* CMake: Remove legacy switch for ccache [#49627](https://github.com/ClickHouse/ClickHouse/pull/49627) ([Robert Schulze](https://github.com/rschu1ze)).
* Try to fix integration test 'test_ssl_cert_authentication' [#49632](https://github.com/ClickHouse/ClickHouse/pull/49632) ([Nikolay Degterinsky](https://github.com/evillique)).
* Unflake 01660_system_parts_smoke [#49633](https://github.com/ClickHouse/ClickHouse/pull/49633) ([Robert Schulze](https://github.com/rschu1ze)).
* Add trash [#49634](https://github.com/ClickHouse/ClickHouse/pull/49634) ([Robert Schulze](https://github.com/rschu1ze)).
* Remove commented code [#49635](https://github.com/ClickHouse/ClickHouse/pull/49635) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add flaky test [#49646](https://github.com/ClickHouse/ClickHouse/pull/49646) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix race in `Context::createCopy` [#49663](https://github.com/ClickHouse/ClickHouse/pull/49663) ([Anton Popov](https://github.com/CurtizJ)).
* Disable 01710_projection_aggregation_in_order.sql [#49667](https://github.com/ClickHouse/ClickHouse/pull/49667) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix flaky 02684_bson.sql [#49674](https://github.com/ClickHouse/ClickHouse/pull/49674) ([Kruglov Pavel](https://github.com/Avogar)).
* Some cache cleanup after rework locking [#49675](https://github.com/ClickHouse/ClickHouse/pull/49675) ([Igor Nikonov](https://github.com/devcrafter)).
* Correctly update log pointer during database replica recovery [#49676](https://github.com/ClickHouse/ClickHouse/pull/49676) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Enable distinct in order after fix [#49636](https://github.com/ClickHouse/ClickHouse/issues/49636) [#49677](https://github.com/ClickHouse/ClickHouse/pull/49677) ([Igor Nikonov](https://github.com/devcrafter)).
* Build fixes for RISCV64 [#49688](https://github.com/ClickHouse/ClickHouse/pull/49688) ([Azat Khuzhin](https://github.com/azat)).
* Add some logging [#49690](https://github.com/ClickHouse/ClickHouse/pull/49690) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix a wrong built generator removal, use `depth=1` [#49692](https://github.com/ClickHouse/ClickHouse/pull/49692) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix member call on null pointer in AST fuzzer [#49696](https://github.com/ClickHouse/ClickHouse/pull/49696) ([Nikolay Degterinsky](https://github.com/evillique)).
* Improve woboq codebrowser pipeline [#49701](https://github.com/ClickHouse/ClickHouse/pull/49701) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Enable `do_not_evict_index_and_mark_files` by default [#49702](https://github.com/ClickHouse/ClickHouse/pull/49702) ([Nikita Taranov](https://github.com/nickitat)).
* Backport fix for UBSan error in musl/logf.c [#49705](https://github.com/ClickHouse/ClickHouse/pull/49705) ([Nikita Taranov](https://github.com/nickitat)).
* Fix flaky test for `kolmogorovSmirnovTest` function [#49710](https://github.com/ClickHouse/ClickHouse/pull/49710) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Update clickhouse-test [#49712](https://github.com/ClickHouse/ClickHouse/pull/49712) ([Alexander Tokmakov](https://github.com/tavplubix)).
* IBM s390x: ip encoding fix [#49713](https://github.com/ClickHouse/ClickHouse/pull/49713) ([Suzy Wang](https://github.com/SuzyWangIBMer)).
* Remove not used ErrorCodes [#49715](https://github.com/ClickHouse/ClickHouse/pull/49715) ([Sergei Trifonov](https://github.com/serxa)).
* Disable mmap for StorageFile in clickhouse-server [#49717](https://github.com/ClickHouse/ClickHouse/pull/49717) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix typo [#49718](https://github.com/ClickHouse/ClickHouse/pull/49718) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Do not launch workflows for PRs w/o "can be tested" [#49726](https://github.com/ClickHouse/ClickHouse/pull/49726) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Move assertions after logging [#49729](https://github.com/ClickHouse/ClickHouse/pull/49729) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Docs: Fix sidebar label for dictionary table function [#49730](https://github.com/ClickHouse/ClickHouse/pull/49730) ([Robert Schulze](https://github.com/rschu1ze)).
* Do not allocate own buffer in CachedOnDiskReadBufferFromFile when `use_external_buffer == true` [#49733](https://github.com/ClickHouse/ClickHouse/pull/49733) ([Nikita Taranov](https://github.com/nickitat)).
* fix convertation [#49749](https://github.com/ClickHouse/ClickHouse/pull/49749) ([Sema Checherinda](https://github.com/CheSema)).
* fix flaky test 02504_regexp_dictionary_ua_parser [#49753](https://github.com/ClickHouse/ClickHouse/pull/49753) ([Han Fei](https://github.com/hanfei1991)).
* Fix unit test `ExceptionFromWait` [#49755](https://github.com/ClickHouse/ClickHouse/pull/49755) ([Anton Popov](https://github.com/CurtizJ)).
* Add forgotten lock (addition to [#49117](https://github.com/ClickHouse/ClickHouse/issues/49117)) [#49757](https://github.com/ClickHouse/ClickHouse/pull/49757) ([Anton Popov](https://github.com/CurtizJ)).
* Fix typo [#49762](https://github.com/ClickHouse/ClickHouse/pull/49762) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix build of `libfiu` on clang-16 [#49766](https://github.com/ClickHouse/ClickHouse/pull/49766) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update README.md [#49782](https://github.com/ClickHouse/ClickHouse/pull/49782) ([Tyler Hannan](https://github.com/tylerhannan)).
* Analyzer: fix column not found for optimized prewhere with sample by [#49784](https://github.com/ClickHouse/ClickHouse/pull/49784) ([vdimir](https://github.com/vdimir)).
* Typo: demange.cpp --> demangle.cpp [#49799](https://github.com/ClickHouse/ClickHouse/pull/49799) ([Robert Schulze](https://github.com/rschu1ze)).
* Analyzer: apply _CAST to constants only once [#49800](https://github.com/ClickHouse/ClickHouse/pull/49800) ([Dmitry Novik](https://github.com/novikd)).
* Use CLOCK_MONOTONIC_RAW over CLOCK_MONOTONIC on Linux (fixes non monotonic clock) [#49819](https://github.com/ClickHouse/ClickHouse/pull/49819) ([Azat Khuzhin](https://github.com/azat)).
* README.md: 4 --> 5 [#49822](https://github.com/ClickHouse/ClickHouse/pull/49822) ([Robert Schulze](https://github.com/rschu1ze)).
* Allow ASOF JOIN over nullable right column [#49826](https://github.com/ClickHouse/ClickHouse/pull/49826) ([vdimir](https://github.com/vdimir)).
* Make 01533_multiple_nested test more reliable [#49828](https://github.com/ClickHouse/ClickHouse/pull/49828) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* What happens if I remove everything in msan_suppressions? [#49829](https://github.com/ClickHouse/ClickHouse/pull/49829) ([Robert Schulze](https://github.com/rschu1ze)).
* Update README.md [#49832](https://github.com/ClickHouse/ClickHouse/pull/49832) ([AnneClickHouse](https://github.com/AnneClickHouse)).
* Randomize enable_multiple_prewhere_read_steps setting [#49834](https://github.com/ClickHouse/ClickHouse/pull/49834) ([Alexander Gololobov](https://github.com/davenger)).
* Analyzer: do not optimize GROUP BY keys with ROLLUP and CUBE [#49838](https://github.com/ClickHouse/ClickHouse/pull/49838) ([Dmitry Novik](https://github.com/novikd)).
* Clearable hash table and zero values [#49846](https://github.com/ClickHouse/ClickHouse/pull/49846) ([Igor Nikonov](https://github.com/devcrafter)).
* Reset vectorscan reference to an "official" repo [#49848](https://github.com/ClickHouse/ClickHouse/pull/49848) ([Robert Schulze](https://github.com/rschu1ze)).
* Enable few slow clang-tidy checks for clangd [#49855](https://github.com/ClickHouse/ClickHouse/pull/49855) ([Azat Khuzhin](https://github.com/azat)).
* Update QPL docs [#49857](https://github.com/ClickHouse/ClickHouse/pull/49857) ([Robert Schulze](https://github.com/rschu1ze)).
* Small-ish .clang-tidy update [#49859](https://github.com/ClickHouse/ClickHouse/pull/49859) ([Robert Schulze](https://github.com/rschu1ze)).
* Follow-up for clang-tidy [#49861](https://github.com/ClickHouse/ClickHouse/pull/49861) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix "reference to local binding" after fixes for clang-17 [#49874](https://github.com/ClickHouse/ClickHouse/pull/49874) ([Azat Khuzhin](https://github.com/azat)).
* fix typo [#49876](https://github.com/ClickHouse/ClickHouse/pull/49876) ([JackyWoo](https://github.com/JackyWoo)).
* Log with warning if the server was terminated forcefully [#49881](https://github.com/ClickHouse/ClickHouse/pull/49881) ([Azat Khuzhin](https://github.com/azat)).
* Fix some tests [#49889](https://github.com/ClickHouse/ClickHouse/pull/49889) ([Alexander Tokmakov](https://github.com/tavplubix)).
* use chassert in MergeTreeDeduplicationLog to have better log info [#49891](https://github.com/ClickHouse/ClickHouse/pull/49891) ([Han Fei](https://github.com/hanfei1991)).
* Multiple pools support for AsyncLoader [#49893](https://github.com/ClickHouse/ClickHouse/pull/49893) ([Sergei Trifonov](https://github.com/serxa)).
* Fix stack-use-after-scope in resource manager test [#49908](https://github.com/ClickHouse/ClickHouse/pull/49908) ([Sergei Trifonov](https://github.com/serxa)).
* Retry connection expired in test_rename_column/test.py [#49911](https://github.com/ClickHouse/ClickHouse/pull/49911) ([alesapin](https://github.com/alesapin)).
* Try to fix flaky test_distributed_load_balancing tests [#49912](https://github.com/ClickHouse/ClickHouse/pull/49912) ([Kruglov Pavel](https://github.com/Avogar)).
* Remove unused code [#49918](https://github.com/ClickHouse/ClickHouse/pull/49918) ([alesapin](https://github.com/alesapin)).
* Fix flakiness of test_distributed_load_balancing test [#49921](https://github.com/ClickHouse/ClickHouse/pull/49921) ([Azat Khuzhin](https://github.com/azat)).
* Add some logging [#49925](https://github.com/ClickHouse/ClickHouse/pull/49925) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Support hardlinking parts transactionally [#49931](https://github.com/ClickHouse/ClickHouse/pull/49931) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix for analyzer: 02377_ optimize_sorting_by_input_stream_properties_e… [#49943](https://github.com/ClickHouse/ClickHouse/pull/49943) ([Igor Nikonov](https://github.com/devcrafter)).
* Follow up to [#49429](https://github.com/ClickHouse/ClickHouse/issues/49429) [#49964](https://github.com/ClickHouse/ClickHouse/pull/49964) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix flaky test_ssl_cert_authentication to use urllib3 [#49982](https://github.com/ClickHouse/ClickHouse/pull/49982) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Fix woboq codebrowser build with -Wno-poison-system-directories [#49992](https://github.com/ClickHouse/ClickHouse/pull/49992) ([Azat Khuzhin](https://github.com/azat)).
* test for [#46128](https://github.com/ClickHouse/ClickHouse/issues/46128) [#49993](https://github.com/ClickHouse/ClickHouse/pull/49993) ([Denny Crane](https://github.com/den-crane)).
* Fix test_insert_same_partition_and_merge failing if one Azure request attempt fails [#49996](https://github.com/ClickHouse/ClickHouse/pull/49996) ([Michael Kolupaev](https://github.com/al13n321)).
* Check return value of `ftruncate` in Keeper [#50020](https://github.com/ClickHouse/ClickHouse/pull/50020) ([Antonio Andelic](https://github.com/antonio2368)).
* Add some assertions [#50025](https://github.com/ClickHouse/ClickHouse/pull/50025) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Update 02441_alter_delete_and_drop_column.sql [#50027](https://github.com/ClickHouse/ClickHouse/pull/50027) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Move some common code to common [#50028](https://github.com/ClickHouse/ClickHouse/pull/50028) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add method getCredentials() to S3::Client [#50030](https://github.com/ClickHouse/ClickHouse/pull/50030) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Update query_log.md [#50032](https://github.com/ClickHouse/ClickHouse/pull/50032) ([Sergei Trifonov](https://github.com/serxa)).
* Get rid of indirect write buffer in object storages [#50033](https://github.com/ClickHouse/ClickHouse/pull/50033) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Load balancing bugfixes [#50036](https://github.com/ClickHouse/ClickHouse/pull/50036) ([Sergei Trifonov](https://github.com/serxa)).
* Update S3 sdk to v1.11.61 [#50037](https://github.com/ClickHouse/ClickHouse/pull/50037) ([Nikita Taranov](https://github.com/nickitat)).
* Fix 02735_system_zookeeper_connection for DatabaseReplicated [#50047](https://github.com/ClickHouse/ClickHouse/pull/50047) ([Azat Khuzhin](https://github.com/azat)).
* Add more profile events for distributed connections [#50051](https://github.com/ClickHouse/ClickHouse/pull/50051) ([Sergei Trifonov](https://github.com/serxa)).
* FileCache: simple tryReserve() cleanup [#50059](https://github.com/ClickHouse/ClickHouse/pull/50059) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix hashed/sparse_hashed dictionaries max_load_factor upper range [#50065](https://github.com/ClickHouse/ClickHouse/pull/50065) ([Azat Khuzhin](https://github.com/azat)).
* Clearer coordinator log [#50101](https://github.com/ClickHouse/ClickHouse/pull/50101) ([Raúl Marín](https://github.com/Algunenano)).
* Analyzer: Do not execute table functions multiple times [#50105](https://github.com/ClickHouse/ClickHouse/pull/50105) ([Dmitry Novik](https://github.com/novikd)).
* Update default settings for Replicated database [#50108](https://github.com/ClickHouse/ClickHouse/pull/50108) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Make async prefetched buffer work with arbitrary impl [#50109](https://github.com/ClickHouse/ClickHouse/pull/50109) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Update github.com/distribution/distribution [#50114](https://github.com/ClickHouse/ClickHouse/pull/50114) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Docs: Update clickhouse-local arguments [#50138](https://github.com/ClickHouse/ClickHouse/pull/50138) ([Robert Schulze](https://github.com/rschu1ze)).
* Change fields destruction order in AsyncTaskExecutor [#50151](https://github.com/ClickHouse/ClickHouse/pull/50151) ([Kruglov Pavel](https://github.com/Avogar)).
* Follow-up to [#49889](https://github.com/ClickHouse/ClickHouse/issues/49889) [#50152](https://github.com/ClickHouse/ClickHouse/pull/50152) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Clarification comment on retries controller behavior [#50155](https://github.com/ClickHouse/ClickHouse/pull/50155) ([Igor Nikonov](https://github.com/devcrafter)).
* Switch to upstream repository of vectorscan [#50159](https://github.com/ClickHouse/ClickHouse/pull/50159) ([Azat Khuzhin](https://github.com/azat)).
* Refactor lambdas, prepare to prio runners [#50160](https://github.com/ClickHouse/ClickHouse/pull/50160) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Speed-up the shellcheck with parallel xargs [#50164](https://github.com/ClickHouse/ClickHouse/pull/50164) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Update an exception message [#50180](https://github.com/ClickHouse/ClickHouse/pull/50180) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Upgrade boost submodule [#50188](https://github.com/ClickHouse/ClickHouse/pull/50188) ([ltrk2](https://github.com/ltrk2)).
* Implement a uniform way to query processor core IDs [#50190](https://github.com/ClickHouse/ClickHouse/pull/50190) ([ltrk2](https://github.com/ltrk2)).
* Don't replicate delete through DDL worker if there is just 1 shard [#50193](https://github.com/ClickHouse/ClickHouse/pull/50193) ([Alexander Gololobov](https://github.com/davenger)).
* Fix codebrowser by using clang-15 image [#50197](https://github.com/ClickHouse/ClickHouse/pull/50197) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Add comments to build reports [#50200](https://github.com/ClickHouse/ClickHouse/pull/50200) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Automatic backports of important fixes to cloud-release [#50202](https://github.com/ClickHouse/ClickHouse/pull/50202) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Unify priorities: lower value means higher priority [#50205](https://github.com/ClickHouse/ClickHouse/pull/50205) ([Sergei Trifonov](https://github.com/serxa)).
* Use transactions for encrypted disks [#50206](https://github.com/ClickHouse/ClickHouse/pull/50206) ([alesapin](https://github.com/alesapin)).
* Get detailed error instead of unknown error for function test [#50207](https://github.com/ClickHouse/ClickHouse/pull/50207) ([Suzy Wang](https://github.com/SuzyWangIBMer)).
* README.md: Remove Berlin Meetup from upcoming events [#50218](https://github.com/ClickHouse/ClickHouse/pull/50218) ([Robert Schulze](https://github.com/rschu1ze)).
* Minor adjustment of clickhouse-client/local parameter docs [#50219](https://github.com/ClickHouse/ClickHouse/pull/50219) ([Robert Schulze](https://github.com/rschu1ze)).
* Unify priorities: rework IO scheduling subsystem [#50231](https://github.com/ClickHouse/ClickHouse/pull/50231) ([Sergei Trifonov](https://github.com/serxa)).
* Add new metrics BrokenDistributedBytesToInsert/DistributedBytesToInsert [#50238](https://github.com/ClickHouse/ClickHouse/pull/50238) ([Azat Khuzhin](https://github.com/azat)).
* Fix URL in backport comment [#50241](https://github.com/ClickHouse/ClickHouse/pull/50241) ([pufit](https://github.com/pufit)).
* Fix `02535_max_parallel_replicas_custom_key` [#50242](https://github.com/ClickHouse/ClickHouse/pull/50242) ([Antonio Andelic](https://github.com/antonio2368)).
* Fixes for MergeTree with readonly disks [#50244](https://github.com/ClickHouse/ClickHouse/pull/50244) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Yet another refactoring [#50257](https://github.com/ClickHouse/ClickHouse/pull/50257) ([Anton Popov](https://github.com/CurtizJ)).
* Unify priorities: rework AsyncLoader [#50272](https://github.com/ClickHouse/ClickHouse/pull/50272) ([Sergei Trifonov](https://github.com/serxa)).
* buffers d-tor finalize free [#50275](https://github.com/ClickHouse/ClickHouse/pull/50275) ([Sema Checherinda](https://github.com/CheSema)).
* Fix 02767_into_outfile_extensions_msan under analyzer [#50290](https://github.com/ClickHouse/ClickHouse/pull/50290) ([Azat Khuzhin](https://github.com/azat)).
* QPL: Add a comment about isal [#50308](https://github.com/ClickHouse/ClickHouse/pull/50308) ([Robert Schulze](https://github.com/rschu1ze)).
* Avoid clang 15 crash [#50310](https://github.com/ClickHouse/ClickHouse/pull/50310) ([Raúl Marín](https://github.com/Algunenano)).
* Cleanup Annoy index [#50312](https://github.com/ClickHouse/ClickHouse/pull/50312) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix flaky `AsyncLoader.StaticPriorities` unit test [#50313](https://github.com/ClickHouse/ClickHouse/pull/50313) ([Sergei Trifonov](https://github.com/serxa)).
* Update gtest_async_loader.cpp [#50317](https://github.com/ClickHouse/ClickHouse/pull/50317) ([Nikita Taranov](https://github.com/nickitat)).
* Fix IS (NOT) NULL operator priority [#50327](https://github.com/ClickHouse/ClickHouse/pull/50327) ([Nikolay Degterinsky](https://github.com/evillique)).
* Update README.md [#50340](https://github.com/ClickHouse/ClickHouse/pull/50340) ([Tyler Hannan](https://github.com/tylerhannan)).
* do not fix the event list in test [#50342](https://github.com/ClickHouse/ClickHouse/pull/50342) ([Sema Checherinda](https://github.com/CheSema)).
* less logs in WriteBufferFromS3 [#50347](https://github.com/ClickHouse/ClickHouse/pull/50347) ([Sema Checherinda](https://github.com/CheSema)).
* Remove legacy install scripts superseded by universal.sh [#50360](https://github.com/ClickHouse/ClickHouse/pull/50360) ([Robert Schulze](https://github.com/rschu1ze)).
* Fail perf tests when too many queries slowed down [#50361](https://github.com/ClickHouse/ClickHouse/pull/50361) ([Nikita Taranov](https://github.com/nickitat)).
* Fix after [#50109](https://github.com/ClickHouse/ClickHouse/issues/50109) [#50362](https://github.com/ClickHouse/ClickHouse/pull/50362) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix log message [#50363](https://github.com/ClickHouse/ClickHouse/pull/50363) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Compare functions NaN update test [#50366](https://github.com/ClickHouse/ClickHouse/pull/50366) ([Maksim Kita](https://github.com/kitaisreal)).
* Add re-creation for cherry-pick PRs [#50373](https://github.com/ClickHouse/ClickHouse/pull/50373) ([pufit](https://github.com/pufit)).
* Without applying `prepareRightBlock` will cause mismatch block structrue [#50383](https://github.com/ClickHouse/ClickHouse/pull/50383) ([lgbo](https://github.com/lgbo-ustc)).
* fix hung in unit tests [#50391](https://github.com/ClickHouse/ClickHouse/pull/50391) ([Sema Checherinda](https://github.com/CheSema)).
* Fix poll timeout in MaterializedMySQL [#50392](https://github.com/ClickHouse/ClickHouse/pull/50392) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Compile aggregate expressions enable by default [#50401](https://github.com/ClickHouse/ClickHouse/pull/50401) ([Maksim Kita](https://github.com/kitaisreal)).
* Update app.py [#50407](https://github.com/ClickHouse/ClickHouse/pull/50407) ([Nikita Taranov](https://github.com/nickitat)).
* reuse s3_mocks, rewrite test test_paranoid_check_in_logs [#50408](https://github.com/ClickHouse/ClickHouse/pull/50408) ([Sema Checherinda](https://github.com/CheSema)).
* test for [#42610](https://github.com/ClickHouse/ClickHouse/issues/42610) [#50409](https://github.com/ClickHouse/ClickHouse/pull/50409) ([Denny Crane](https://github.com/den-crane)).
* Remove something [#50411](https://github.com/ClickHouse/ClickHouse/pull/50411) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Mark the builds without results as pending [#50415](https://github.com/ClickHouse/ClickHouse/pull/50415) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Revert "Fix msan issue in keyed siphash" [#50426](https://github.com/ClickHouse/ClickHouse/pull/50426) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Revert "Revert "less logs in WriteBufferFromS3" ([#50390](https://github.com/ClickHouse/ClickHouse/issues/50390))" [#50444](https://github.com/ClickHouse/ClickHouse/pull/50444) ([Sema Checherinda](https://github.com/CheSema)).
* Paranoid fix for removing parts from ZooKeeper [#50448](https://github.com/ClickHouse/ClickHouse/pull/50448) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Add timeout for unit tests [#50449](https://github.com/ClickHouse/ClickHouse/pull/50449) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Changes related to an internal feature [#50453](https://github.com/ClickHouse/ClickHouse/pull/50453) ([Michael Kolupaev](https://github.com/al13n321)).
* Don't crash if config doesn't have logger section [#50455](https://github.com/ClickHouse/ClickHouse/pull/50455) ([Michael Kolupaev](https://github.com/al13n321)).
* Update function docs [#50466](https://github.com/ClickHouse/ClickHouse/pull/50466) ([Robert Schulze](https://github.com/rschu1ze)).
* Revert "make filter push down through cross join" [#50467](https://github.com/ClickHouse/ClickHouse/pull/50467) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Add some assertions [#50470](https://github.com/ClickHouse/ClickHouse/pull/50470) ([Kseniia Sumarokova](https://github.com/kssenii)).
* CI: Enable aspell on nested docs [#50476](https://github.com/ClickHouse/ClickHouse/pull/50476) ([Robert Schulze](https://github.com/rschu1ze)).
* Try fix flaky test test_async_query_sending [#50480](https://github.com/ClickHouse/ClickHouse/pull/50480) ([Kruglov Pavel](https://github.com/Avogar)).
* Disable 00534_functions_bad_arguments with msan [#50481](https://github.com/ClickHouse/ClickHouse/pull/50481) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Typos: Follow-up to [#50476](https://github.com/ClickHouse/ClickHouse/issues/50476) [#50482](https://github.com/ClickHouse/ClickHouse/pull/50482) ([Robert Schulze](https://github.com/rschu1ze)).
* Remove unneeded Keeper test [#50485](https://github.com/ClickHouse/ClickHouse/pull/50485) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix KeyError in cherry-pick [#50493](https://github.com/ClickHouse/ClickHouse/pull/50493) ([pufit](https://github.com/pufit)).
* Make typeid_cast for pointers noexcept [#50495](https://github.com/ClickHouse/ClickHouse/pull/50495) ([Sergey Kazmin ](https://github.com/yerseg)).
* less traces in logs [#50518](https://github.com/ClickHouse/ClickHouse/pull/50518) ([Sema Checherinda](https://github.com/CheSema)).
* Implement endianness-independent serialization for UUID [#50519](https://github.com/ClickHouse/ClickHouse/pull/50519) ([ltrk2](https://github.com/ltrk2)).
* Remove strange object storage methods [#50521](https://github.com/ClickHouse/ClickHouse/pull/50521) ([alesapin](https://github.com/alesapin)).
* Fix low quality code around metadata in RocksDB (experimental feature never used in production) [#50527](https://github.com/ClickHouse/ClickHouse/pull/50527) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Function if constant folding [#50529](https://github.com/ClickHouse/ClickHouse/pull/50529) ([Maksim Kita](https://github.com/kitaisreal)).
* Add profile events for fs cache eviction [#50533](https://github.com/ClickHouse/ClickHouse/pull/50533) ([Kseniia Sumarokova](https://github.com/kssenii)).
* QueryNode small fix [#50535](https://github.com/ClickHouse/ClickHouse/pull/50535) ([Maksim Kita](https://github.com/kitaisreal)).
* Control memory usage in generateRandom [#50538](https://github.com/ClickHouse/ClickHouse/pull/50538) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Disable skim (Rust library) under memory sanitizer [#50539](https://github.com/ClickHouse/ClickHouse/pull/50539) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* MSan support for Rust [#50541](https://github.com/ClickHouse/ClickHouse/pull/50541) ([Azat Khuzhin](https://github.com/azat)).
* Make 01565_query_loop_after_client_error slightly more robust [#50542](https://github.com/ClickHouse/ClickHouse/pull/50542) ([Azat Khuzhin](https://github.com/azat)).
* Resize BufferFromVector underlying vector only pos_offset == vector.size() [#50546](https://github.com/ClickHouse/ClickHouse/pull/50546) ([auxten](https://github.com/auxten)).
* Add async iteration to object storage [#50548](https://github.com/ClickHouse/ClickHouse/pull/50548) ([alesapin](https://github.com/alesapin)).
* skip extracting darwin toolchain in builder when unncessary [#50550](https://github.com/ClickHouse/ClickHouse/pull/50550) ([SuperDJY](https://github.com/cmsxbc)).
* Remove flaky test [#50558](https://github.com/ClickHouse/ClickHouse/pull/50558) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Revert "Disable skim (Rust library) under memory sanitizer" [#50574](https://github.com/ClickHouse/ClickHouse/pull/50574) ([Azat Khuzhin](https://github.com/azat)).
* Analyzer: fix 01487_distributed_in_not_default_db [#50587](https://github.com/ClickHouse/ClickHouse/pull/50587) ([Dmitry Novik](https://github.com/novikd)).
* Fix commit for DiskObjectStorage [#50599](https://github.com/ClickHouse/ClickHouse/pull/50599) ([alesapin](https://github.com/alesapin)).
* Fix Jepsen runs in PRs [#50615](https://github.com/ClickHouse/ClickHouse/pull/50615) ([Antonio Andelic](https://github.com/antonio2368)).
* Revert incorrect optimizations [#50629](https://github.com/ClickHouse/ClickHouse/pull/50629) ([Raúl Marín](https://github.com/Algunenano)).
* Disable 01676_clickhouse_client_autocomplete under UBSan [#50636](https://github.com/ClickHouse/ClickHouse/pull/50636) ([Nikita Taranov](https://github.com/nickitat)).
* Merging [#50329](https://github.com/ClickHouse/ClickHouse/issues/50329) [#50660](https://github.com/ClickHouse/ClickHouse/pull/50660) ([Anton Popov](https://github.com/CurtizJ)).
* Revert "date_trunc function to always return DateTime type" [#50670](https://github.com/ClickHouse/ClickHouse/pull/50670) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fix flaky test 02461_prewhere_row_level_policy_lightweight_delete [#50674](https://github.com/ClickHouse/ClickHouse/pull/50674) ([Alexander Gololobov](https://github.com/davenger)).
* Fix asan issue with analyzer and prewhere [#50685](https://github.com/ClickHouse/ClickHouse/pull/50685) ([Alexander Gololobov](https://github.com/davenger)).
* Catch issues with dockerd during the build [#50700](https://github.com/ClickHouse/ClickHouse/pull/50700) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Temporarily disable annoy index tests (flaky for analyzer) [#50714](https://github.com/ClickHouse/ClickHouse/pull/50714) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix assertion from stress test [#50718](https://github.com/ClickHouse/ClickHouse/pull/50718) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix flaky unit test [#50719](https://github.com/ClickHouse/ClickHouse/pull/50719) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Show correct sharing state in system.query_cache [#50728](https://github.com/ClickHouse/ClickHouse/pull/50728) ([Robert Schulze](https://github.com/rschu1ze)).

View File

@ -0,0 +1,18 @@
---
sidebar_position: 1
sidebar_label: 2023
---
# 2023 Changelog
### ClickHouse release v23.5.2.7-stable (5751aa1ab9f) FIXME as compared to v23.5.1.3174-stable (2fec796e73e)
#### Bug Fix (user-visible misbehavior in an official stable release)
* Do not read all the columns from right GLOBAL JOIN table. [#50721](https://github.com/ClickHouse/ClickHouse/pull/50721) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Fix build for aarch64 (temporary disable azure) [#50770](https://github.com/ClickHouse/ClickHouse/pull/50770) ([alesapin](https://github.com/alesapin)).
* Rename azure_blob_storage to azureBlobStorage [#50812](https://github.com/ClickHouse/ClickHouse/pull/50812) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).

View File

@ -53,6 +53,7 @@ Engines in the family:
- [JDBC](../../engines/table-engines/integrations/jdbc.md)
- [MySQL](../../engines/table-engines/integrations/mysql.md)
- [MongoDB](../../engines/table-engines/integrations/mongodb.md)
- [Redis](../../engines/table-engines/integrations/redis.md)
- [HDFS](../../engines/table-engines/integrations/hdfs.md)
- [S3](../../engines/table-engines/integrations/s3.md)
- [Kafka](../../engines/table-engines/integrations/kafka.md)

View File

@ -0,0 +1,51 @@
---
slug: /en/engines/table-engines/integrations/azureBlobStorage
sidebar_label: Azure Blob Storage
---
# AzureBlobStorage Table Engine
This engine provides an integration with [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs) ecosystem.
## Create Table
``` sql
CREATE TABLE azure_blob_storage_table (name String, value UInt32)
ENGINE = AzureBlobStorage(connection_string|storage_account_url, container_name, blobpath, [account_name, account_key, format, compression])
[PARTITION BY expr]
[SETTINGS ...]
```
### Engine parameters
- `connection_string|storage_account_url` — connection_string includes account name & key ([Create connection string](https://learn.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string?toc=%2Fazure%2Fstorage%2Fblobs%2Ftoc.json&bc=%2Fazure%2Fstorage%2Fblobs%2Fbreadcrumb%2Ftoc.json#configure-a-connection-string-for-an-azure-storage-account)) or you could also provide the storage account url here and account name & account key as separate parameters (see parameters account_name & account_key)
- `container_name` - Container name
- `blobpath` - file path. Supports following wildcards in readonly mode: `*`, `?`, `{abc,def}` and `{N..M}` where `N`, `M` — numbers, `'abc'`, `'def'` — strings.
- `account_name` - if storage_account_url is used, then account name can be specified here
- `account_key` - if storage_account_url is used, then account key can be specified here
- `format` — The [format](/docs/en/interfaces/formats.md) of the file.
- `compression` — Supported values: `none`, `gzip/gz`, `brotli/br`, `xz/LZMA`, `zstd/zst`. By default, it will autodetect compression by file extension. (same as setting to `auto`).
**Example**
``` sql
CREATE TABLE test_table (key UInt64, data String)
ENGINE = AzureBlobStorage('DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://azurite1:10000/devstoreaccount1/;',
'test_container', 'test_table', 'CSV');
INSERT INTO test_table VALUES (1, 'a'), (2, 'b'), (3, 'c');
SELECT * FROM test_table;
```
```text
┌─key──┬─data──┐
│ 1 │ a │
│ 2 │ b │
│ 3 │ c │
└──────┴───────┘
```
## See also
[Azure Blob Storage Table Function](/docs/en/sql-reference/table-functions/azureBlobStorage)

View File

@ -6,24 +6,4 @@ sidebar_label: Integrations
# Table Engines for Integrations
ClickHouse provides various means for integrating with external systems, including table engines. Like with all other table engines, the configuration is done using `CREATE TABLE` or `ALTER TABLE` queries. Then from a user perspective, the configured integration looks like a normal table, but queries to it are proxied to the external system. This transparent querying is one of the key advantages of this approach over alternative integration methods, like dictionaries or table functions, which require to use custom query methods on each use.
List of supported integrations:
- [ODBC](../../../engines/table-engines/integrations/odbc.md)
- [JDBC](../../../engines/table-engines/integrations/jdbc.md)
- [MySQL](../../../engines/table-engines/integrations/mysql.md)
- [MongoDB](../../../engines/table-engines/integrations/mongodb.md)
- [HDFS](../../../engines/table-engines/integrations/hdfs.md)
- [S3](../../../engines/table-engines/integrations/s3.md)
- [Kafka](../../../engines/table-engines/integrations/kafka.md)
- [EmbeddedRocksDB](../../../engines/table-engines/integrations/embedded-rocksdb.md)
- [RabbitMQ](../../../engines/table-engines/integrations/rabbitmq.md)
- [PostgreSQL](../../../engines/table-engines/integrations/postgresql.md)
- [SQLite](../../../engines/table-engines/integrations/sqlite.md)
- [Hive](../../../engines/table-engines/integrations/hive.md)
- [ExternalDistributed](../../../engines/table-engines/integrations/ExternalDistributed.md)
- [MaterializedPostgreSQL](../../../engines/table-engines/integrations/materialized-postgresql.md)
- [NATS](../../../engines/table-engines/integrations/nats.md)
- [DeltaLake](../../../engines/table-engines/integrations/deltalake.md)
- [Hudi](../../../engines/table-engines/integrations/hudi.md)
ClickHouse provides various means for integrating with external systems, including table engines. Like with all other table engines, the configuration is done using `CREATE TABLE` or `ALTER TABLE` queries. Then from a user perspective, the configured integration looks like a normal table, but queries to it are proxied to the external system. This transparent querying is one of the key advantages of this approach over alternative integration methods, like dictionaries or table functions, which require the use of custom query methods on each use.

View File

@ -0,0 +1,119 @@
---
slug: /en/sql-reference/table-functions/redis
sidebar_position: 43
sidebar_label: Redis
---
# Redis
This engine allows integrating ClickHouse with [Redis](https://redis.io/). For Redis takes kv model, we strongly recommend you only query it in a point way, such as `where k=xx` or `where k in (xx, xx)`.
## Creating a Table {#creating-a-table}
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name
(
name1 [type1],
name2 [type2],
...
) ENGINE = Redis(host:port[, db_index[, password[, pool_size]]]) PRIMARY KEY(primary_key_name);
```
**Engine Parameters**
- `host:port` — Redis server address, you can ignore port and default Redis port 6379 will be used.
- `db_index` — Redis db index range from 0 to 15, default is 0.
- `password` — User password, default is blank string.
- `pool_size` — Redis max connection pool size, default is 16.
- `primary_key_name` - any column name in the column list.
- `primary` must be specified, it supports only one column in the primary key. The primary key will be serialized in binary as a Redis key.
- columns other than the primary key will be serialized in binary as Redis value in corresponding order.
- queries with key equals or in filtering will be optimized to multi keys lookup from Redis. If queries without filtering key full table scan will happen which is a heavy operation.
## Usage Example {#usage-example}
Create a table in ClickHouse which allows to read data from Redis:
``` sql
CREATE TABLE redis_table
(
`k` String,
`m` String,
`n` UInt32
)
ENGINE = Redis('redis1:6379') PRIMARY KEY(k);
```
Insert:
```sql
INSERT INTO redis_table Values('1', 1, '1', 1.0), ('2', 2, '2', 2.0);
```
Query:
``` sql
SELECT COUNT(*) FROM redis_table;
```
``` text
┌─count()─┐
│ 2 │
└─────────┘
```
``` sql
SELECT * FROM redis_table WHERE key='1';
```
```text
┌─key─┬─v1─┬─v2─┬─v3─┐
│ 1 │ 1 │ 1 │ 1 │
└─────┴────┴────┴────┘
```
``` sql
SELECT * FROM redis_table WHERE v1=2;
```
```text
┌─key─┬─v1─┬─v2─┬─v3─┐
│ 2 │ 2 │ 2 │ 2 │
└─────┴────┴────┴────┘
```
Update:
Note that the primary key cannot be updated.
```sql
ALTER TABLE redis_table UPDATE v1=2 WHERE key='1';
```
Delete:
```sql
ALTER TABLE redis_table DELETE WHERE key='1';
```
Truncate:
Flush Redis db asynchronously. Also `Truncate` support SYNC mode.
```sql
TRUNCATE TABLE redis_table SYNC;
```
## Limitations {#limitations}
Redis engine also supports scanning queries, such as `where k > xx`, but it has some limitations:
1. Scanning query may produce some duplicated keys in a very rare case when it is rehashing. See details in [Redis Scan](https://github.com/redis/redis/blob/e4d183afd33e0b2e6e8d1c79a832f678a04a7886/src/dict.c#L1186-L1269)
2. During the scanning, keys could be created and deleted, so the resulting dataset can not represent a valid point in time.

View File

@ -1,104 +1,142 @@
# Approximate Nearest Neighbor Search Indexes [experimental] {#table_engines-ANNIndex}
Nearest neighborhood search refers to the problem of finding the point(s) with the smallest distance to a given point in an n-dimensional
space. Since exact search is in practice usually typically too slow, the task is often solved with approximate algorithms. A popular use
case of of neighbor search is finding similar pictures (texts) for a given picture (text). Pictures (texts) can be decomposed into
[embeddings](https://cloud.google.com/architecture/overview-extracting-and-serving-feature-embeddings-for-machine-learning), and instead of
comparing pictures (texts) pixel-by-pixel (character-by-character), only the embeddings are compared.
Nearest neighborhood search is the problem of finding the M closest points for a given point in an N-dimensional vector space. The most
straightforward approach to solve this problem is a brute force search where the distance between all points in the vector space and the
reference point is computed. This method guarantees perfect accuracy but it is usually too slow for practical applications. Thus, nearest
neighborhood search problems are often solved with [approximative algorithms](https://github.com/erikbern/ann-benchmarks). Approximative
nearest neighborhood search techniques, in conjunction with [embedding
methods](https://cloud.google.com/architecture/overview-extracting-and-serving-feature-embeddings-for-machine-learning) allow to search huge
amounts of media (pictures, songs, articles, etc.) in milliseconds.
In terms of SQL, the problem can be expressed as follows:
Blogs:
- [Vector Search with ClickHouse - Part 1](https://clickhouse.com/blog/vector-search-clickhouse-p1)
- [Vector Search with ClickHouse - Part 2](https://clickhouse.com/blog/vector-search-clickhouse-p2)
In terms of SQL, the nearest neighborhood problem can be expressed as follows:
``` sql
SELECT *
FROM table
WHERE L2Distance(column, Point) < MaxDistance
ORDER BY Distance(vectors, Point)
LIMIT N
```
`vectors` contains N-dimensional values of type [Array](../../../sql-reference/data-types/array.md) or
[Tuple](../../../sql-reference/data-types/tuple.md), for example embeddings. Function `Distance` computes the distance between two vectors.
Often, the the Euclidean (L2) distance is chosen as distance function but [other
distance functions](/docs/en/sql-reference/functions/distance-functions.md) are also possible. `Point` is the reference point, e.g. `(0.17,
0.33, ...)`, and `N` limits the number of search results.
An alternative formulation of the nearest neighborhood search problem looks as follows:
``` sql
SELECT *
FROM table
ORDER BY L2Distance(column, Point)
WHERE Distance(vectors, Point) < MaxDistance
LIMIT N
```
The queries are expensive because the L2 (Euclidean) distance between `Point` and all points in `column` and must be computed. To speed this process up, Approximate Nearest Neighbor Search Indexes (ANN indexes) store a compact representation of the search space (using clustering, search trees, etc.) which allows to compute an approximate answer quickly.
While the first query returns the top-`N` closest points to the reference point, the second query returns all points closer to the reference
point than a maximally allowed radius `MaxDistance`. Parameter `N` limits the number of returned values which is useful for situations where
`MaxDistance` is difficult to determine in advance.
# Creating ANN Indexes
With brute force search, both queries are expensive (linear in the number of points) because the distance between all points in `vectors` and
`Point` must be computed. To speed this process up, Approximate Nearest Neighbor Search Indexes (ANN indexes) store a compact representation
of the search space (using clustering, search trees, etc.) which allows to compute an approximate answer much quicker (in sub-linear time).
As long as ANN indexes are experimental, you first need to `SET allow_experimental_annoy_index = 1`.
# Creating and Using ANN Indexes
Syntax to create an ANN index over an `Array` column:
Syntax to create an ANN index over an [Array](../../../sql-reference/data-types/array.md) column:
```sql
CREATE TABLE table
(
`id` Int64,
`embedding` Array(Float32),
INDEX <ann_index_name> embedding TYPE <ann_index_type>(<ann_index_parameters>) GRANULARITY <N>
`vectors` Array(Float32),
INDEX [ann_index_name vectors TYPE [ann_index_type]([ann_index_parameters]) [GRANULARITY [N]]
)
ENGINE = MergeTree
ORDER BY id;
```
Syntax to create an ANN index over a `Tuple` column:
Syntax to create an ANN index over a [Tuple](../../../sql-reference/data-types/tuple.md) column:
```sql
CREATE TABLE table
(
`id` Int64,
`embedding` Tuple(Float32[, Float32[, ...]]),
INDEX <ann_index_name> embedding TYPE <ann_index_type>(<ann_index_parameters>) GRANULARITY <N>
`vectors` Tuple(Float32[, Float32[, ...]]),
INDEX [ann_index_name] vectors TYPE [ann_index_type]([ann_index_parameters]) [GRANULARITY [N]]
)
ENGINE = MergeTree
ORDER BY id;
```
ANN indexes are built during column insertion and merge and `INSERT` and `OPTIMIZE` statements will be slower than for ordinary tables. ANNIndexes are ideally used only with immutable or rarely changed data, respectively there are much more read requests than write requests.
Similar to regular skip indexes, ANN indexes are constructed over granules and each indexed block consists of `GRANULARITY = <N>`-many
granules. For example, if the primary index granularity of the table is 8192 (setting `index_granularity = 8192`) and `GRANULARITY = 2`,
then each indexed block will consist of 16384 rows. However, unlike skip indexes, ANN indexes are not only able to skip the entire indexed
block, they are able to skip individual granules in indexed blocks. As a result, the `GRANULARITY` parameter has a different meaning in ANN
indexes than in normal skip indexes. Basically, the bigger `GRANULARITY` is chosen, the more data is provided to a single ANN index, and the
higher the chance that with the right hyper parameters, the index will remember the data structure better.
# Using ANN Indexes
ANN indexes are built during column insertion and merge. As a result, `INSERT` and `OPTIMIZE` statements will be slower than for ordinary
tables. ANNIndexes are ideally used only with immutable or rarely changed data, respectively when are far more read requests than write
requests.
ANN indexes support two types of queries:
- WHERE queries:
``` sql
SELECT *
FROM table
WHERE DistanceFunction(column, Point) < MaxDistance
LIMIT N
```
- ORDER BY queries:
``` sql
SELECT *
FROM table
[WHERE ...]
ORDER BY DistanceFunction(column, Point)
ORDER BY Distance(vectors, Point)
LIMIT N
```
`DistanceFunction` is a [distance function](/docs/en/sql-reference/functions/distance-functions.md), `Point` is a reference vector (e.g. `(0.17, 0.33, ...)`) and `MaxDistance` is a floating point value which restricts the size of the neighbourhood.
- WHERE queries:
``` sql
SELECT *
FROM table
WHERE Distance(vectors, Point) < MaxDistance
LIMIT N
```
:::tip
To avoid writing out large vectors, you can use [query parameters](/docs/en//interfaces/cli.md#queries-with-parameters-cli-queries-with-parameters), e.g.
To avoid writing out large vectors, you can use [query
parameters](/docs/en/interfaces/cli.md#queries-with-parameters-cli-queries-with-parameters), e.g.
```bash
clickhouse-client --param_vec='hello' --query="SELECT * FROM table WHERE L2Distance(embedding, {vec: Array(Float32)}) < 1.0"
clickhouse-client --param_vec='hello' --query="SELECT * FROM table WHERE L2Distance(vectors, {vec: Array(Float32)}) < 1.0"
```
:::
ANN indexes cannot speed up queries that contain both a `WHERE DistanceFunction(column, Point) < MaxDistance` and an `ORDER BY DistanceFunction(column, Point)` clause. Also, the approximate algorithms used to determine the nearest neighbors require a limit, hence queries that use an ANN index must have a `LIMIT` clause.
**Restrictions**: Queries that contain both a `WHERE Distance(vectors, Point) < MaxDistance` and an `ORDER BY Distance(vectors, Point)`
clause cannot use ANN indexes. Also, the approximate algorithms used to determine the nearest neighbors require a limit, hence queries
without `LIMIT` clause cannot utilize ANN indexes. Also ANN indexes are only used if the query has a `LIMIT` value smaller than setting
`max_limit_for_ann_queries` (default: 1 million rows). This is a safeguard to prevent large memory allocations by external libraries for
approximate neighbor search.
**Differences to Skip Indexes** Similar to regular [skip indexes](https://clickhouse.com/docs/en/optimize/skipping-indexes), ANN indexes are
constructed over granules and each indexed block consists of `GRANULARITY = [N]`-many granules (`[N]` = 1 by default for normal skip
indexes). For example, if the primary index granularity of the table is 8192 (setting `index_granularity = 8192`) and `GRANULARITY = 2`,
then each indexed block will contain 16384 rows. However, data structures and algorithms for approximate neighborhood search (usually
provided by external libraries) are inherently row-oriented. They store a compact representation of a set of rows and also return rows for
ANN queries. This causes some rather unintuitive differences in the way ANN indexes behave compared to normal skip indexes.
When a user defines a ANN index on a column, ClickHouse internally creates a ANN "sub-index" for each index block. The sub-index is "local"
in the sense that it only knows about the rows of its containing index block. In the previous example and assuming that a column has 65536
rows, we obtain four index blocks (spanning eight granules) and a ANN sub-index for each index block. A sub-index is theoretically able to
return the rows with the N closest points within its index block directly. However, since ClickHouse loads data from disk to memory at the
granularity of granules, sub-indexes extrapolate matching rows to granule granularity. This is different from regular skip indexes which
skip data at the granularity of index blocks.
The `GRANULARITY` parameter determines how many ANN sub-indexes are created. Bigger `GRANULARITY` values mean fewer but larger ANN
sub-indexes, up to the point where a column (or a column's data part) has only a single sub-index. In that case, the sub-index has a
"global" view of all column rows and can directly return all granules of the column (part) with relevant rows (there are at most
`LIMIT [N]`-many such granules). In a second step, ClickHouse will load these granules and identify the actually best rows by performing a
brute-force distance calculation over all rows of the granules. With a small `GRANULARITY` value, each of the sub-indexes returns up to
`LIMIT N`-many granules. As a result, more granules need to be loaded and post-filtered. Note that the search accuracy is with both cases
equally good, only the processing performance differs. It is generally recommended to use a large `GRANULARITY` for ANN indexes and fall
back to a smaller `GRANULARITY` values only in case of problems like excessive memory consumption of the ANN structures. If no `GRANULARITY`
was specified for ANN indexes, the default value is 100 million.
An ANN index is only used if the query has a `LIMIT` value smaller than setting `max_limit_for_ann_queries` (default: 1 million rows). This is a safety measure which helps to avoid large memory consumption by external libraries for approximate neighbor search.
# Available ANN Indexes
@ -106,51 +144,68 @@ An ANN index is only used if the query has a `LIMIT` value smaller than setting
## Annoy {#annoy}
(currently disabled on ARM due to memory safety problems with the algorithm)
Annoy indexes are currently experimental, to use them you first need to `SET allow_experimental_annoy_index = 1`. They are also currently
disabled on ARM due to memory safety problems with the algorithm.
This type of ANN index implements [the Annoy algorithm](https://github.com/spotify/annoy) which uses a recursive division of the space in random linear surfaces (lines in 2D, planes in 3D etc.).
This type of ANN index implements [the Annoy algorithm](https://github.com/spotify/annoy) which is based on a recursive division of the
space in random linear surfaces (lines in 2D, planes in 3D etc.).
Syntax to create a Annoy index over a `Array` column:
<div class='vimeo-container'>
<iframe src="//www.youtube.com/embed/QkCCyLW0ehU"
width="640"
height="360"
frameborder="0"
allow="autoplay;
fullscreen;
picture-in-picture"
allowfullscreen>
</iframe>
</div>
Syntax to create an Annoy index over an [Array](../../../sql-reference/data-types/array.md) column:
```sql
CREATE TABLE table
(
id Int64,
embedding Array(Float32),
INDEX <ann_index_name> embedding TYPE annoy([DistanceName[, NumTrees]]) GRANULARITY N
vectors Array(Float32),
INDEX [ann_index_name] vectors TYPE annoy([Distance[, NumTrees]]) [GRANULARITY N]
)
ENGINE = MergeTree
ORDER BY id;
```
Syntax to create a Annoy index over a `Tuple` column:
Syntax to create an ANN index over a [Tuple](../../../sql-reference/data-types/tuple.md) column:
```sql
CREATE TABLE table
(
id Int64,
embedding Tuple(Float32[, Float32[, ...]]),
INDEX <ann_index_name> embedding TYPE annoy([DistanceName[, NumTrees]]) GRANULARITY N
vectors Tuple(Float32[, Float32[, ...]]),
INDEX [ann_index_name] vectors TYPE annoy([Distance[, NumTrees]]) [GRANULARITY N]
)
ENGINE = MergeTree
ORDER BY id;
```
Parameter `DistanceName` is name of a distance function (default `L2Distance`). Annoy currently supports `L2Distance` and `cosineDistance` as distance functions. Parameter `NumTrees` (default: 100) is the number of trees which the algorithm will create. Higher values of `NumTree` mean slower `CREATE` and `SELECT` statements (approximately linearly), but increase the accuracy of search results.
Annoy currently supports `L2Distance` and `cosineDistance` as distance function `Distance`. If no distance function was specified during
index creation, `L2Distance` is used as default. Parameter `NumTrees` is the number of trees which the algorithm creates (default if not
specified: 100). Higher values of `NumTree` mean more accurate search results but slower index creation / query times (approximately
linearly) as well as larger index sizes.
:::note
Indexes over columns of type `Array` will generally work faster than indexes on `Tuple` columns. All arrays **must** have same length. Use [CONSTRAINT](/docs/en/sql-reference/statements/create/table.md#constraints) to avoid errors. For example, `CONSTRAINT constraint_name_1 CHECK length(embedding) = 256`.
Indexes over columns of type `Array` will generally work faster than indexes on `Tuple` columns. All arrays **must** have same length. Use
[CONSTRAINT](/docs/en/sql-reference/statements/create/table.md#constraints) to avoid errors. For example, `CONSTRAINT constraint_name_1
CHECK length(vectors) = 256`.
:::
Setting `annoy_index_search_k_nodes` (default: `NumTrees * LIMIT`) determines how many tree nodes are inspected during SELECTs. It can be used to
balance runtime and accuracy at runtime.
Setting `annoy_index_search_k_nodes` (default: `NumTrees * LIMIT`) determines how many tree nodes are inspected during SELECTs. Larger
values mean more accurate results at the cost of longer query runtime:
Example:
``` sql
```sql
SELECT *
FROM table_name [WHERE ...]
ORDER BY L2Distance(column, Point)
FROM table_name
ORDER BY L2Distance(vectors, Point)
LIMIT N
SETTINGS annoy_index_search_k_nodes=100
SETTINGS annoy_index_search_k_nodes=100;
```

View File

@ -491,7 +491,7 @@ Syntax: `tokenbf_v1(size_of_bloom_filter_in_bytes, number_of_hash_functions, ran
#### Special-purpose
- An experimental index to support approximate nearest neighbor (ANN) search. See [here](annindexes.md) for details.
- Experimental indexes to support approximate nearest neighbor (ANN) search. See [here](annindexes.md) for details.
- An experimental inverted index to support full-text search. See [here](invertedindexes.md) for details.
### Functions Support {#functions-support}
@ -1138,7 +1138,7 @@ These parameters define the cache layer:
Cache parameters:
- `path` — The path where metadata for the cache is stored.
- `max_size` — The size (amount of memory) that the cache can grow to.
- `max_size` — The size (amount of disk space) that the cache can grow to.
:::tip
There are several other cache parameters that you can use to tune your storage, see [using local cache](/docs/en/operations/storing-data.md/#using-local-cache) for the details.

View File

@ -194,7 +194,129 @@ You can pass parameters to `clickhouse-client` (all parameters have a default va
- `--print-profile-events` Print `ProfileEvents` packets.
- `--profile-events-delay-ms` Delay between printing `ProfileEvents` packets (-1 - print only totals, 0 - print every single packet).
Since version 20.5, `clickhouse-client` has automatic syntax highlighting (always enabled).
Instead of `--host`, `--port`, `--user` and `--password` options, ClickHouse client also supports connection strings (see next section).
## Connection string {#connection_string}
clickhouse-client alternatively supports connecting to clickhouse server using a connection string similar to [MongoDB](https://www.mongodb.com/docs/manual/reference/connection-string/), [PostgreSQL](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING), [MySQL](https://dev.mysql.com/doc/refman/8.0/en/connecting-using-uri-or-key-value-pairs.html#connecting-using-uri). It has the following syntax:
```text
clickhouse:[//[user[:password]@][hosts_and_ports]][/database][?query_parameters]
```
Where
- `user` - (optional) is a user name,
- `password` - (optional) is a user password. If `:` is specified and the password is blank, the client will prompt for the user's password.
- `hosts_and_ports` - (optional) is a list of hosts and optional ports `host[:port] [, host:[port]], ...`,
- `database` - (optional) is the database name,
- `query_parameters` - (optional) is a list of key-value pairs `param1=value1[,&param2=value2], ...`. For some parameters, no value is required. Parameter names and values are case-sensitive.
If no user is specified, `default` user without password will be used.
If no host is specified, the `localhost` will be used (localhost).
If no port is specified is not specified, `9000` will be used as port.
If no database is specified, the `default` database will be used.
If the user name, password or database was specified in the connection string, it cannot be specified using `--user`, `--password` or `--database` (and vice versa).
The host component can either be an a host name and IP address. Put an IPv6 address in square brackets to specify it:
```text
clickhouse://[2001:db8::1234]
```
URI allows multiple hosts to be connected to. Connection strings can contain multiple hosts. ClickHouse-client will try to connect to these hosts in order (i.e. from left to right). After the connection is established, no attempt to connect to the remaining hosts is made.
The connection string must be specified as the first argument of clickhouse-client. The connection string can be combined with arbitrary other [command-line-options](#command-line-options) except `--host/-h` and `--port`.
The following keys are allowed for component `query_parameter`:
- `secure` or shorthanded `s` - no value. If specified, client will connect to the server over a secure connection (TLS). See `secure` in [command-line-options](#command-line-options)
### Percent encoding {#connection_string_uri_percent_encoding}
Non-US ASCII, spaces and special characters in the `user`, `password`, `hosts`, `database` and `query parameters` must be [percent-encoded](https://en.wikipedia.org/wiki/URL_encoding).
### Examples {#connection_string_examples}
Connect to localhost using port 9000 and execute the query `SELECT 1`.
``` bash
clickhouse-client clickhouse://localhost:9000 --query "SELECT 1"
```
Connect to localhost using user `john` with password `secret`, host `127.0.0.1` and port `9000`
``` bash
clickhouse-client clickhouse://john:secret@127.0.0.1:9000
```
Connect to localhost using default user, host with IPV6 address `[::1]` and port `9000`.
``` bash
clickhouse-client clickhouse://[::1]:9000
```
Connect to localhost using port 9000 in multiline mode.
``` bash
clickhouse-client clickhouse://localhost:9000 '-m'
```
Connect to localhost using port 9000 with the user `default`.
``` bash
clickhouse-client clickhouse://default@localhost:9000
# equivalent to:
clickhouse-client clickhouse://localhost:9000 --user default
```
Connect to localhost using port 9000 to `my_database` database.
``` bash
clickhouse-client clickhouse://localhost:9000/my_database
# equivalent to:
clickhouse-client clickhouse://localhost:9000 --database my_database
```
Connect to localhost using port 9000 to `my_database` database specified in the connection string and a secure connection using shorthanded 's' URI parameter.
```bash
clickhouse-client clickhouse://localhost/my_database?s
# equivalent to:
clickhouse-client clickhouse://localhost/my_database -s
```
Connect to default host using default port, default user, and default database.
``` bash
clickhouse-client clickhouse:
```
Connect to the default host using the default port, using user `my_user` and no password.
``` bash
clickhouse-client clickhouse://my_user@
# Using a blank password between : and @ means to asking user to enter the password before starting the connection.
clickhouse-client clickhouse://my_user:@
```
Connect to localhost using email as the user name. `@` symbol is percent encoded to `%40`.
``` bash
clickhouse-client clickhouse://some_user%40some_mail.com@localhost:9000
```
Connect to one of provides hosts: `192.168.1.15`, `192.168.1.25`.
``` bash
clickhouse-client clickhouse://192.168.1.15,192.168.1.25
```
### Configuration Files {#configuration_files}

View File

@ -193,6 +193,7 @@ SELECT * FROM nestedt FORMAT TSV
- [output_format_tsv_crlf_end_of_line](/docs/en/operations/settings/settings-formats.md/#output_format_tsv_crlf_end_of_line) - if it is set true, end of line in TSV output format will be `\r\n` instead of `\n`. Default value - `false`.
- [input_format_tsv_skip_first_lines](/docs/en/operations/settings/settings-formats.md/#input_format_tsv_skip_first_lines) - skip specified number of lines at the beginning of data. Default value - `0`.
- [input_format_tsv_detect_header](/docs/en/operations/settings/settings-formats.md/#input_format_tsv_detect_header) - automatically detect header with names and types in TSV format. Default value - `true`.
- [input_format_tsv_skip_trailing_empty_lines](/docs/en/operations/settings/settings-formats.md/#input_format_tsv_skip_trailing_empty_lines) - skip trailing empty lines at the end of data. Default value - `false`.
## TabSeparatedRaw {#tabseparatedraw}
@ -467,6 +468,7 @@ The CSV format supports the output of totals and extremes the same way as `TabSe
- [output_format_csv_crlf_end_of_line](/docs/en/operations/settings/settings-formats.md/#output_format_csv_crlf_end_of_line) - if it is set to true, end of line in CSV output format will be `\r\n` instead of `\n`. Default value - `false`.
- [input_format_csv_skip_first_lines](/docs/en/operations/settings/settings-formats.md/#input_format_csv_skip_first_lines) - skip the specified number of lines at the beginning of data. Default value - `0`.
- [input_format_csv_detect_header](/docs/en/operations/settings/settings-formats.md/#input_format_csv_detect_header) - automatically detect header with names and types in CSV format. Default value - `true`.
- [input_format_csv_skip_trailing_empty_lines](/docs/en/operations/settings/settings-formats.md/#input_format_csv_skip_trailing_empty_lines) - skip trailing empty lines at the end of data. Default value - `false`.
- [input_format_csv_trim_whitespaces](/docs/en/operations/settings/settings-formats.md/#input_format_csv_trim_whitespaces) - trim spaces and tabs in non-quoted CSV strings. Default value - `true`.
## CSVWithNames {#csvwithnames}
@ -495,7 +497,9 @@ the types from input data will be compared with the types of the corresponding c
Similar to [Template](#format-template), but it prints or reads all names and types of columns and uses escaping rule from [format_custom_escaping_rule](/docs/en/operations/settings/settings-formats.md/#format_custom_escaping_rule) setting and delimiters from [format_custom_field_delimiter](/docs/en/operations/settings/settings-formats.md/#format_custom_field_delimiter), [format_custom_row_before_delimiter](/docs/en/operations/settings/settings-formats.md/#format_custom_row_before_delimiter), [format_custom_row_after_delimiter](/docs/en/operations/settings/settings-formats.md/#format_custom_row_after_delimiter), [format_custom_row_between_delimiter](/docs/en/operations/settings/settings-formats.md/#format_custom_row_between_delimiter), [format_custom_result_before_delimiter](/docs/en/operations/settings/settings-formats.md/#format_custom_result_before_delimiter) and [format_custom_result_after_delimiter](/docs/en/operations/settings/settings-formats.md/#format_custom_result_after_delimiter) settings, not from format strings.
If setting [input_format_custom_detect_header](/docs/en/operations/settings/settings.md/#input_format_custom_detect_header) is enabled, ClickHouse will automatically detect header with names and types if any.
If setting [input_format_custom_detect_header](/docs/en/operations/settings/settings-formats.md/#input_format_custom_detect_header) is enabled, ClickHouse will automatically detect header with names and types if any.
If setting [input_format_tsv_skip_trailing_empty_lines](/docs/en/operations/settings/settings-formats.md/#input_format_custom_detect_header) is enabled, trailing empty lines at the end of file will be skipped.
There is also `CustomSeparatedIgnoreSpaces` format, which is similar to [TemplateIgnoreSpaces](#templateignorespaces).

View File

@ -329,8 +329,8 @@ SELECT count() FROM system.schema_inference_cache WHERE storage='S3'
## Text formats {#text-formats}
For text formats, ClickHouse reads the data row by row, extracts column values according to the format,
and then uses some recursive parsers and heuristics to determine the type for each value. The maximum number of rows read from the data in schema inference
is controlled by the setting `input_format_max_rows_to_read_for_schema_inference` with default value 25000.
and then uses some recursive parsers and heuristics to determine the type for each value. The maximum number of rows and bytes read from the data in schema inference
is controlled by the settings `input_format_max_rows_to_read_for_schema_inference` (25000 by default) and `input_format_max_bytes_to_read_for_schema_inference` (32Mb by default).
By default, all inferred types are [Nullable](../sql-reference/data-types/nullable.md), but you can change this by setting `schema_inference_make_columns_nullable` (see examples in the [settings](#settings-for-text-formats) section).
### JSON formats {#json-formats}
@ -1144,13 +1144,15 @@ Line: value_1=2, value_2="Some string 2", value_3="[4, 5, NULL]"$$)
### Settings for text formats {#settings-for-text-formats}
#### input_format_max_rows_to_read_for_schema_inference
#### input_format_max_rows_to_read_for_schema_inference/input_format_max_bytes_to_read_for_schema_inference
This setting controls the maximum number of rows to be read while schema inference.
The more rows are read, the more time is spent on schema inference, but the greater the chance to
These settings control the amount of data to be read while schema inference.
The more rows/bytes are read, the more time is spent on schema inference, but the greater the chance to
correctly determine the types (especially when the data contains a lot of nulls).
Default value: `25000`.
Default values:
- `25000` for `input_format_max_rows_to_read_for_schema_inference`.
- `33554432` (32 Mb) for `input_format_max_bytes_to_read_for_schema_inference`.
#### column_names_for_schema_inference
@ -1192,7 +1194,7 @@ DESC format(JSONEachRow, '{"id" : 1, "age" : 25, "name" : "Josh", "status" : nul
#### schema_inference_make_columns_nullable
Controls making inferred types `Nullable` in schema inference for formats without information about nullability.
If the setting is enabled, all inferred type will be `Nullable`, if disabled, the inferred type will be `Nullable` only if the column contains `NULL` in a sample that is parsed during schema inference.
If the setting is enabled, all inferred type will be `Nullable`, if disabled, the inferred type will be `Nullable` only if `input_format_null_as_default` is disabled and the column contains `NULL` in a sample that is parsed during schema inference.
Enabled by default.
@ -1215,7 +1217,8 @@ DESC format(JSONEachRow, $$
└─────────┴─────────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
```
```sql
SET schema_inference_make_columns_nullable = 0
SET schema_inference_make_columns_nullable = 0;
SET input_format_null_as_default = 0;
DESC format(JSONEachRow, $$
{"id" : 1, "age" : 25, "name" : "Josh", "status" : null, "hobbies" : ["football", "cooking"]}
{"id" : 2, "age" : 19, "name" : "Alan", "status" : "married", "hobbies" : ["tennis", "art"]}
@ -1232,6 +1235,25 @@ DESC format(JSONEachRow, $$
└─────────┴──────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
```
```sql
SET schema_inference_make_columns_nullable = 0;
SET input_format_null_as_default = 1;
DESC format(JSONEachRow, $$
{"id" : 1, "age" : 25, "name" : "Josh", "status" : null, "hobbies" : ["football", "cooking"]}
{"id" : 2, "age" : 19, "name" : "Alan", "status" : "married", "hobbies" : ["tennis", "art"]}
$$)
```
```response
┌─name────┬─type──────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ id │ Int64 │ │ │ │ │ │
│ age │ Int64 │ │ │ │ │ │
│ name │ String │ │ │ │ │ │
│ status │ String │ │ │ │ │ │
│ hobbies │ Array(String) │ │ │ │ │ │
└─────────┴───────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
```
#### input_format_try_infer_integers
If enabled, ClickHouse will try to infer integers instead of floats in schema inference for text formats.
@ -1623,7 +1645,7 @@ In schema inference for CapnProto format ClickHouse uses the following type matc
## Strong-typed binary formats {#strong-typed-binary-formats}
In such formats, each serialized value contains information about its type (and possibly about its name), but there is no information about the whole table.
In schema inference for such formats, ClickHouse reads data row by row (up to `input_format_max_rows_to_read_for_schema_inference` rows) and extracts
In schema inference for such formats, ClickHouse reads data row by row (up to `input_format_max_rows_to_read_for_schema_inference` rows or `input_format_max_bytes_to_read_for_schema_inference` bytes) and extracts
the type (and possibly name) for each value from the data and then converts these types to ClickHouse types.
### MsgPack {#msgpack}

View File

@ -1881,6 +1881,32 @@ The default server configuration file `config.xml` contains the following settin
</trace_log>
```
## asynchronous_insert_log {#server_configuration_parameters-asynchronous_insert_log}
Settings for the [asynchronous_insert_log](../../operations/system-tables/asynchronous_insert_log.md#system_tables-asynchronous_insert_log) system table for logging async inserts.
Parameters:
- `database` — Database name.
- `table` — Table name.
- `partition_by` — [Custom partitioning key](../../engines/table-engines/mergetree-family/custom-partitioning-key.md) for a system table. Can't be used if `engine` defined.
- `engine` - [MergeTree Engine Definition](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-creating-a-table) for a system table. Can't be used if `partition_by` defined.
- `flush_interval_milliseconds` — Interval for flushing data from the buffer in memory to the table.
- `storage_policy` Name of storage policy to use for the table (optional)
**Example**
```xml
<clickhouse>
<asynchronous_insert_log>
<database>system</database>
<table>asynchronous_insert_log</table>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
<partition_by>toYYYYMM(event_date)</partition_by>
<!-- <engine>Engine = MergeTree PARTITION BY event_date ORDER BY event_time TTL event_date + INTERVAL 30 day</engine> -->
</asynchronous_insert_log>
</clickhouse>
```
## query_masking_rules {#query-masking-rules}
Regexp-based rules, which will be applied to queries as well as all log messages before storing them in server logs,

View File

@ -137,6 +137,12 @@ The maximum rows of data to read for automatic schema inference.
Default value: `25'000`.
## input_format_max_bytes_to_read_for_schema_inference {#input_format_max_bytes_to_read_for_schema_inference}
The maximum amount of data in bytes to read for automatic schema inference.
Default value: `33554432` (32 Mb).
## column_names_for_schema_inference {#column_names_for_schema_inference}
The list of column names to use in schema inference for formats without column names. The format: 'column1,column2,column3,...'
@ -728,6 +734,12 @@ My NULL
My NULL
```
### input_format_tsv_skip_trailing_empty_lines {input_format_tsv_skip_trailing_empty_lines}
When enabled, trailing empty lines at the end of TSV file will be skipped.
Disabled by default.
## CSV format settings {#csv-format-settings}
### format_csv_delimiter {#format_csv_delimiter}
@ -882,6 +894,12 @@ My NULL
My NULL
```
### input_format_csv_skip_trailing_empty_lines {input_format_csv_skip_trailing_empty_lines}
When enabled, trailing empty lines at the end of CSV file will be skipped.
Disabled by default.
### input_format_csv_trim_whitespaces {#input_format_csv_trim_whitespaces}
Trims spaces and tabs in non-quoted CSV strings.
@ -1475,6 +1493,12 @@ Sets the character that is interpreted as a suffix after the result set for [Cus
Default value: `''`.
### input_format_custom_skip_trailing_empty_lines {input_format_custom_skip_trailing_empty_lines}
When enabled, trailing empty lines at the end of file in CustomSeparated format will be skipped.
Disabled by default.
## Regexp format settings {#regexp-format-settings}
### format_regexp_escaping_rule {#format_regexp_escaping_rule}

View File

@ -535,8 +535,6 @@ Possible values:
The first phase of a grace join reads the right table and splits it into N buckets depending on the hash value of key columns (initially, N is `grace_hash_join_initial_buckets`). This is done in a way to ensure that each bucket can be processed independently. Rows from the first bucket are added to an in-memory hash table while the others are saved to disk. If the hash table grows beyond the memory limit (e.g., as set by [`max_bytes_in_join`](/docs/en/operations/settings/query-complexity.md/#settings-max_bytes_in_join)), the number of buckets is increased and the assigned bucket for each row. Any rows which dont belong to the current bucket are flushed and reassigned.
Supports `INNER/LEFT/RIGHT/FULL ALL/ANY JOIN`.
- hash
[Hash join algorithm](https://en.wikipedia.org/wiki/Hash_join) is used. The most generic implementation that supports all combinations of kind and strictness and multiple join keys that are combined with `OR` in the `JOIN ON` section.
@ -1959,6 +1957,10 @@ Default value: empty string (disabled)
For the replicated tables by default the only 100 of the most recent inserts for each partition are deduplicated (see [replicated_deduplication_window](merge-tree-settings.md/#replicated-deduplication-window), [replicated_deduplication_window_seconds](merge-tree-settings.md/#replicated-deduplication-window-seconds)).
For not replicated tables see [non_replicated_deduplication_window](merge-tree-settings.md/#non-replicated-deduplication-window).
:::note
`insert_deduplication_token` works on a partition level (the same as `insert_deduplication` checksum). Multiple partitions can have the same `insert_deduplication_token`.
:::
Example:
```sql
@ -4367,6 +4369,32 @@ Possible values:
Default value: `false`.
## rename_files_after_processing
- **Type:** String
- **Default value:** Empty string
This setting allows to specify renaming pattern for files processed by `file` table function. When option is set, all files read by `file` table function will be renamed according to specified pattern with placeholders, only if files processing was successful.
### Placeholders
- `%f` — Original filename without extension (e.g., "sample").
- `%e` — Original file extension with dot (e.g., ".csv").
- `%t` — Timestamp (in microseconds).
- `%%` — Percentage sign ("%").
### Example
- Option: `--rename_files_after_processing="processed_%f_%t%e"`
- Query: `SELECT * FROM file('sample.csv')`
If reading `sample.csv` is successful, file will be renamed to `processed_sample_1683473210851438.csv`
## function_json_value_return_type_allow_complex
Control whether allow to return complex type (such as: struct, array, map) for json_value function.

View File

@ -0,0 +1,64 @@
---
slug: /en/operations/system-tables/asynchronous_insert_log
---
# asynchronous_insert_log
Contains information about async inserts. Each entry represents an insert query buffered into an async insert query.
To start logging configure parameters in the [asynchronous_insert_log](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-asynchronous_insert_log) section.
The flushing period of data is set in `flush_interval_milliseconds` parameter of the [asynchronous_insert_log](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-asynchronous_insert_log) server settings section. To force flushing, use the [SYSTEM FLUSH LOGS](../../sql-reference/statements/system.md#query_language-system-flush_logs) query.
ClickHouse does not delete data from the table automatically. See [Introduction](../../operations/system-tables/index.md#system-tables-introduction) for more details.
Columns:
- `event_date` ([Date](../../sql-reference/data-types/date.md)) — The date when the async insert happened.
- `event_time` ([DateTime](../../sql-reference/data-types/datetime.md)) — The date and time when the async insert finished execution.
- `event_time_microseconds` ([DateTime64](../../sql-reference/data-types/datetime64.md)) — The date and time when the async insert finished execution with microseconds precision.
- `query` ([String](../../sql-reference/data-types/string.md)) — Query string.
- `database` ([String](../../sql-reference/data-types/string.md)) — The name of the database the table is in.
- `table` ([String](../../sql-reference/data-types/string.md)) — Table name.
- `format` ([String](/docs/en/sql-reference/data-types/string.md)) — Format name.
- `query_id` ([String](../../sql-reference/data-types/string.md)) — ID of the initial query.
- `bytes` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Number of inserted bytes.
- `exception` ([String](../../sql-reference/data-types/string.md)) — Exception message.
- `status` ([Enum8](../../sql-reference/data-types/enum.md)) — Status of the view. Values:
- `'Ok' = 1` — Successful insert.
- `'ParsingError' = 2` — Exception when parsing the data.
- `'FlushError' = 3` — Exception when flushing the data.
- `flush_time` ([DateTime](../../sql-reference/data-types/datetime.md)) — The date and time when the flush happened.
- `flush_time_microseconds` ([DateTime64](../../sql-reference/data-types/datetime64.md)) — The date and time when the flush happened with microseconds precision.
- `flush_query_id` ([String](../../sql-reference/data-types/string.md)) — ID of the flush query.
**Example**
Query:
``` sql
SELECT * FROM system.asynchronous_insert_log LIMIT 1 \G;
```
Result:
``` text
event_date: 2023-06-08
event_time: 2023-06-08 10:08:53
event_time_microseconds: 2023-06-08 10:08:53.199516
query: INSERT INTO public.data_guess (user_id, datasource_id, timestamp, path, type, num, str) FORMAT CSV
database: public
table: data_guess
format: CSV
query_id: b46cd4c4-0269-4d0b-99f5-d27668c6102e
bytes: 133223
exception:
status: Ok
flush_time: 2023-06-08 10:08:55
flush_time_microseconds: 2023-06-08 10:08:55.139676
flush_query_id: cd2c1e43-83f5-49dc-92e4-2fbc7f8d3716
```
**See Also**
- [system.query_log](../../operations/system-tables/query_log.md#system_tables-query_log) — Description of the `query_log` system table which contains common information about queries execution.
- [system.asynchronous_inserts](../../operations/system-tables/asynchronous_inserts.md#system_tables-asynchronous_inserts) — This table contains information about pending asynchronous inserts in queue.

View File

@ -0,0 +1,45 @@
---
slug: /en/operations/system-tables/asynchronous_inserts
---
# asynchronous_inserts
Contains information about pending asynchronous inserts in queue.
Columns:
- `query` ([String](../../sql-reference/data-types/string.md)) — Query string.
- `database` ([String](../../sql-reference/data-types/string.md)) — The name of the database the table is in.
- `table` ([String](../../sql-reference/data-types/string.md)) — Table name.
- `format` ([String](/docs/en/sql-reference/data-types/string.md)) — Format name.
- `first_update` ([DateTime64](../../sql-reference/data-types/datetime64.md)) — First insert time with microseconds resolution.
- `total_bytes` ([UInt64](../../sql-reference/data-types/int-uint.md#uint-ranges)) — Total number of bytes waiting in the queue.
- `entries.query_id` ([Array(String)](../../sql-reference/data-types/array.md)) - Array of query ids of the inserts waiting in the queue.
- `entries.bytes` ([Array(UInt64)](../../sql-reference/data-types/array.md)) - Array of bytes of each insert query waiting in the queue.
**Example**
Query:
``` sql
SELECT * FROM system.asynchronous_inserts LIMIT 1 \G;
```
Result:
``` text
Row 1:
──────
query: INSERT INTO public.data_guess (user_id, datasource_id, timestamp, path, type, num, str) FORMAT CSV
database: public
table: data_guess
format: CSV
first_update: 2023-06-08 10:08:54.199606
total_bytes: 133223
entries.query_id: ['b46cd4c4-0269-4d0b-99f5-d27668c6102e']
entries.bytes: [133223]
```
**See Also**
- [system.query_log](../../operations/system-tables/query_log.md#system_tables-query_log) — Description of the `query_log` system table which contains common information about queries execution.
- [system.asynchronous_insert_log](../../operations/system-tables/asynchronous_insert_log.md#system_tables-asynchronous_insert_log) — This table contains information about async inserts performed.

View File

@ -33,7 +33,7 @@ SELECT
toTypeName(toNullable('') AS val) AS source_type,
toTypeName(toString(val)) AS to_type_result_type,
toTypeName(CAST(val, 'String')) AS cast_result_type
┌─source_type──────┬─to_type_result_type─┬─cast_result_type─┐
│ Nullable(String) │ Nullable(String) │ String │
└──────────────────┴─────────────────────┴──────────────────┘
@ -203,7 +203,7 @@ Result:
## toDate
Converts the argument to [Date](/docs/en/sql-reference/data-types/date.md) data type.
Converts the argument to [Date](/docs/en/sql-reference/data-types/date.md) data type.
If the argument is [DateTime](/docs/en/sql-reference/data-types/datetime.md) or [DateTime64](/docs/en/sql-reference/data-types/datetime64.md), it truncates it and leaves the date component of the DateTime:
@ -232,7 +232,7 @@ SELECT
│ 2022-12-30 │ Date │
└────────────┴──────────────────────────────────┘
1 row in set. Elapsed: 0.001 sec.
1 row in set. Elapsed: 0.001 sec.
```
```sql
@ -314,20 +314,183 @@ SELECT
└─────────────────────┴───────────────┴─────────────┴─────────────────────┘
```
## toDateOrZero
The same as [toDate](#todate) but returns lower boundary of [Date](/docs/en/sql-reference/data-types/date.md) if an invalid argument is received. Only [String](/docs/en/sql-reference/data-types/string.md) argument is supported.
**Example**
Query:
``` sql
SELECT toDateOrZero('2022-12-30'), toDateOrZero('');
```
Result:
```response
┌─toDateOrZero('2022-12-30')─┬─toDateOrZero('')─┐
│ 2022-12-30 │ 1970-01-01 │
└────────────────────────────┴──────────────────┘
```
## toDateOrNull
The same as [toDate](#todate) but returns `NULL` if an invalid argument is received. Only [String](/docs/en/sql-reference/data-types/string.md) argument is supported.
**Example**
Query:
``` sql
SELECT toDateOrNull('2022-12-30'), toDateOrNull('');
```
Result:
```response
┌─toDateOrNull('2022-12-30')─┬─toDateOrNull('')─┐
│ 2022-12-30 │ ᴺᵁᴸᴸ │
└────────────────────────────┴──────────────────┘
```
## toDateOrDefault
Like [toDate](#todate) but if unsuccessful, returns a default value which is either the second argument (if specified), or otherwise the lower boundary of [Date](/docs/en/sql-reference/data-types/date.md).
**Syntax**
``` sql
toDateOrDefault(expr [, default_value])
```
**Example**
Query:
``` sql
SELECT toDateOrDefault('2022-12-30'), toDateOrDefault('', '2023-01-01'::Date);
```
Result:
```response
┌─toDateOrDefault('2022-12-30')─┬─toDateOrDefault('', CAST('2023-01-01', 'Date'))─┐
│ 2022-12-30 │ 2023-01-01 │
└───────────────────────────────┴─────────────────────────────────────────────────┘
```
## toDateTime
Converts an input value to [DateTime](/docs/en/sql-reference/data-types/datetime.md).
**Syntax**
``` sql
toDateTime(expr[, time_zone ])
```
**Arguments**
- `expr` — The value. [String](/docs/en/sql-reference/data-types/string.md), [Int](/docs/en/sql-reference/data-types/int-uint.md), [Date](/docs/en/sql-reference/data-types/date.md) or [DateTime](/docs/en/sql-reference/data-types/datetime.md).
- `time_zone` — Time zone. [String](/docs/en/sql-reference/data-types/string.md).
If `expr` is a number, it is interpreted as the number of seconds since the beginning of the Unix Epoch (as Unix timestamp).
**Returned value**
- A date time. [DateTime](/docs/en/sql-reference/data-types/datetime.md)
**Example**
Query:
``` sql
SELECT toDateTime('2022-12-30 13:44:17'), toDateTime(1685457500, 'UTC');
```
Result:
```response
┌─toDateTime('2022-12-30 13:44:17')─┬─toDateTime(1685457500, 'UTC')─┐
│ 2022-12-30 13:44:17 │ 2023-05-30 14:38:20 │
└───────────────────────────────────┴───────────────────────────────┘
```
## toDateTimeOrZero
The same as [toDateTime](#todatetime) but returns lower boundary of [DateTime](/docs/en/sql-reference/data-types/datetime.md) if an invalid argument is received. Only [String](/docs/en/sql-reference/data-types/string.md) argument is supported.
**Example**
Query:
``` sql
SELECT toDateTimeOrZero('2022-12-30 13:44:17'), toDateTimeOrZero('');
```
Result:
```response
┌─toDateTimeOrZero('2022-12-30 13:44:17')─┬─toDateTimeOrZero('')─┐
│ 2022-12-30 13:44:17 │ 1970-01-01 00:00:00 │
└─────────────────────────────────────────┴──────────────────────┘
```
## toDateTimeOrNull
The same as [toDateTime](#todatetime) but returns `NULL` if an invalid argument is received. Only [String](/docs/en/sql-reference/data-types/string.md) argument is supported.
**Example**
Query:
``` sql
SELECT toDateTimeOrNull('2022-12-30 13:44:17'), toDateTimeOrNull('');
```
Result:
```response
┌─toDateTimeOrNull('2022-12-30 13:44:17')─┬─toDateTimeOrNull('')─┐
│ 2022-12-30 13:44:17 │ ᴺᵁᴸᴸ │
└─────────────────────────────────────────┴──────────────────────┘
```
## toDateTimeOrDefault
Like [toDateTime](#todatetime) but if unsuccessful, returns a default value which is either the third argument (if specified), or otherwise the lower boundary of [DateTime](/docs/en/sql-reference/data-types/datetime.md).
**Syntax**
``` sql
toDateTimeOrDefault(expr [, time_zone [, default_value]])
```
**Example**
Query:
``` sql
SELECT toDateTimeOrDefault('2022-12-30 13:44:17'), toDateTimeOrDefault('', 'UTC', '2023-01-01'::DateTime('UTC'));
```
Result:
```response
┌─toDateTimeOrDefault('2022-12-30 13:44:17')─┬─toDateTimeOrDefault('', 'UTC', CAST('2023-01-01', 'DateTime(\'UTC\')'))─┐
│ 2022-12-30 13:44:17 │ 2023-01-01 00:00:00 │
└────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────────┘
```
## toDate32
Converts the argument to the [Date32](/docs/en/sql-reference/data-types/date32.md) data type. If the value is outside the range, `toDate32` returns the border values supported by [Date32](/docs/en/sql-reference/data-types/date32.md). If the argument has [Date](/docs/en/sql-reference/data-types/date.md) type, it's borders are taken into account.
@ -519,6 +682,11 @@ SELECT toDateTime64('2019-01-01 00:00:00', 3, 'Asia/Istanbul') AS value, toTypeN
└─────────────────────────┴─────────────────────────────────────────────────────────────────────┘
```
## toDateTime64OrZero
## toDateTime64OrNull
## toDateTime64OrDefault
## toDecimal(32\|64\|128\|256)
@ -1247,7 +1415,7 @@ Returns DateTime values parsed from input string according to a MySQL style form
**Supported format specifiers**
All format specifiers listed in [formatDateTime](/docs/en/sql-reference/functions/date-time-functions.md#date_time_functions-formatDateTime) except:
- %Q: Quarter (1-4)
- %Q: Quarter (1-4)
**Example**

View File

@ -142,11 +142,11 @@ The following operations with [projections](/docs/en/engines/table-engines/merge
## ADD PROJECTION
`ALTER TABLE [db].name ADD PROJECTION name ( SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY] )` - Adds projection description to tables metadata.
`ALTER TABLE [db].name ADD PROJECTION [IF NOT EXISTS] name ( SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY] )` - Adds projection description to tables metadata.
## DROP PROJECTION
`ALTER TABLE [db].name DROP PROJECTION name` - Removes projection description from tables metadata and deletes projection files from disk. Implemented as a [mutation](/docs/en/sql-reference/statements/alter/index.md#mutations).
`ALTER TABLE [db].name DROP PROJECTION [IF EXISTS] name` - Removes projection description from tables metadata and deletes projection files from disk. Implemented as a [mutation](/docs/en/sql-reference/statements/alter/index.md#mutations).
## MATERIALIZE PROJECTION
@ -154,7 +154,7 @@ The following operations with [projections](/docs/en/engines/table-engines/merge
## CLEAR PROJECTION
`ALTER TABLE [db.]table CLEAR PROJECTION name IN PARTITION partition_name` - Deletes projection files from disk without removing description. Implemented as a [mutation](/docs/en/sql-reference/statements/alter/index.md#mutations).
`ALTER TABLE [db.]table CLEAR PROJECTION [IF EXISTS] name IN PARTITION partition_name` - Deletes projection files from disk without removing description. Implemented as a [mutation](/docs/en/sql-reference/statements/alter/index.md#mutations).
The commands `ADD`, `DROP` and `CLEAR` are lightweight in a sense that they only change metadata or remove files.

View File

@ -0,0 +1,71 @@
---
slug: /en/sql-reference/table-functions/azureBlobStorage
sidebar_label: azureBlobStorage
keywords: [azure blob storage]
---
# azureBlobStorage Table Function
Provides a table-like interface to select/insert files in [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs). This table function is similar to the [s3 function](../../sql-reference/table-functions/s3.md).
**Syntax**
``` sql
azureBlobStorage(- connection_string|storage_account_url, container_name, blobpath, [account_name, account_key, format, compression, structure])
```
**Arguments**
- `connection_string|storage_account_url` — connection_string includes account name & key ([Create connection string](https://learn.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string?toc=%2Fazure%2Fstorage%2Fblobs%2Ftoc.json&bc=%2Fazure%2Fstorage%2Fblobs%2Fbreadcrumb%2Ftoc.json#configure-a-connection-string-for-an-azure-storage-account)) or you could also provide the storage account url here and account name & account key as separate parameters (see parameters account_name & account_key)
- `container_name` - Container name
- `blobpath` - file path. Supports following wildcards in readonly mode: `*`, `?`, `{abc,def}` and `{N..M}` where `N`, `M` — numbers, `'abc'`, `'def'` — strings.
- `account_name` - if storage_account_url is used, then account name can be specified here
- `account_key` - if storage_account_url is used, then account key can be specified here
- `format` — The [format](../../interfaces/formats.md#formats) of the file.
- `compression` — Supported values: `none`, `gzip/gz`, `brotli/br`, `xz/LZMA`, `zstd/zst`. By default, it will autodetect compression by file extension. (same as setting to `auto`).
- `structure` — Structure of the table. Format `'column1_name column1_type, column2_name column2_type, ...'`.
**Returned value**
A table with the specified structure for reading or writing data in the specified file.
**Examples**
Write data into azure blob storage using the following :
```sql
INSERT INTO TABLE FUNCTION azureBlobStorage('http://azurite1:10000/devstoreaccount1',
'test_container', 'test_{_partition_id}.csv', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==',
'CSV', 'auto', 'column1 UInt32, column2 UInt32, column3 UInt32') PARTITION BY column3 VALUES (1, 2, 3), (3, 2, 1), (78, 43, 3);
```
And then it can be read using
```sql
SELECT * FROM azureBlobStorage('http://azurite1:10000/devstoreaccount1',
'test_container', 'test_1.csv', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==',
'CSV', 'auto', 'column1 UInt32, column2 UInt32, column3 UInt32');
```
```response
┌───column1─┬────column2─┬───column3─┐
│ 3 │ 2 │ 1 │
└───────────┴────────────┴───────────┘
```
or using connection_string
```sql
SELECT count(*) FROM azureBlobStorage('DefaultEndpointsProtocol=https;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;EndPointSuffix=core.windows.net',
'test_container', 'test_3.csv', 'CSV', 'auto' , 'column1 UInt32, column2 UInt32, column3 UInt32');
```
``` text
┌─count()─┐
│ 2 │
└─────────┘
```
**See Also**
- [AzureBlobStorage Table Engine](/docs/en/engines/table-engines/integrations/azureBlobStorage.md)

View File

@ -1,11 +0,0 @@
---
slug: /en/sql-reference/table-functions/azure_blob_storage
sidebar_position: 45
sidebar_label: azure_blob_storage
keywords: [azure blob storage]
---
# azure\_blob\_storage Table Function
Provides a table-like interface to select/insert files in [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs). This table function is similar to the [s3 function](../../sql-reference/table-functions/s3.md).

View File

@ -40,7 +40,7 @@ VALUES (1, 2, 3), (3, 2, 1), (1, 3, 2)
As a result, the data is written into the file `test.tsv`:
```bash
# cat /var/lib/clickhouse/user_files/test.tsv
# cat /var/lib/clickhouse/user_files/test.tsv
1 2 3
3 2 1
1 3 2
@ -163,7 +163,7 @@ Query the number of rows in all files of these two directories:
SELECT count(*) FROM file('{some,another}_dir/*', 'TSV', 'name String, value UInt32');
```
:::note
:::note
If your listing of files contains number ranges with leading zeros, use the construction with braces for each digit separately or use `?`.
:::
@ -199,3 +199,4 @@ SELECT count(*) FROM file('big_dir/**/file002', 'CSV', 'name String, value UInt3
**See Also**
- [Virtual columns](/docs/en/engines/table-engines/index.md#table_engines-virtual_columns)
- [Rename files after processing](/docs/en/operations/settings/settings.md#rename_files_after_processing)

View File

@ -0,0 +1,67 @@
---
slug: /en/sql-reference/table-functions/redis
sidebar_position: 10
sidebar_label: Redis
---
# Redis
This table function allows integrating ClickHouse with [Redis](https://redis.io/).
**Syntax**
```sql
redis(host:port, key, structure[, db_index[, password[, pool_size]]])
```
**Arguments**
- `host:port` — Redis server address, you can ignore port and default Redis port 6379 will be used.
- `key` — any column name in the column list.
- `structure` — The schema for the ClickHouse table returned from this function.
- `db_index` — Redis db index range from 0 to 15, default is 0.
- `password` — User password, default is blank string.
- `pool_size` — Redis max connection pool size, default is 16.
- `primary` must be specified, it supports only one column in the primary key. The primary key will be serialized in binary as a Redis key.
- columns other than the primary key will be serialized in binary as Redis value in corresponding order.
- queries with key equals or in filtering will be optimized to multi keys lookup from Redis. If queries without filtering key full table scan will happen which is a heavy operation.
**Returned Value**
A table object with key as Redis key, other columns packaged together as Redis value.
## Usage Example {#usage-example}
Create a table in ClickHouse which allows to read data from Redis:
``` sql
CREATE TABLE redis_table
(
`k` String,
`m` String,
`n` UInt32
)
ENGINE = Redis('redis1:6379') PRIMARY KEY(k);
```
```sql
SELECT * FROM redis(
'redis1:6379',
'key',
'key String, v1 String, v2 UInt32'
)
```
**See Also**
- [The `Redis` table engine](/docs/en/engines/table-engines/integrations/redis.md)
- [Using redis as a dictionary source](/docs/en/sql-reference/dictionaries/index.md#redis)

View File

@ -142,7 +142,129 @@ $ clickhouse-client --param_tbl="numbers" --param_db="system" --param_col="numbe
- `--history_file` - путь к файлу с историей команд.
- `--param_<name>` — значение параметра для [запроса с параметрами](#cli-queries-with-parameters).
Начиная с версии 20.5, в `clickhouse-client` есть автоматическая подсветка синтаксиса (включена всегда).
Вместо параметров `--host`, `--port`, `--user` и `--password` клиент ClickHouse также поддерживает строки подключения (смотри следующий раздел).
## Строка подключения {#connection_string}
clickhouse-client также поддерживает подключение к серверу clickhouse с помощью строки подключения, аналогичной [MongoDB](https://www.mongodb.com/docs/manual/reference/connection-string/), [PostgreSQL](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING), [MySQL](https://dev.mysql.com/doc/refman/8.0/en/connecting-using-uri-or-key-value-pairs.html#connecting-using-uri). Она имеет следующий синтаксис:
```text
clickhouse:[//[user[:password]@][hosts_and_ports]][/database][?query_parameters]
```
Где
- `user` - (необязательно) - это имя пользователя,
- `password` - (необязательно) - Пароль пользователя. Если символ `:` укаказан, и пароль пуст, то клиент запросит ввести пользователя пароль.
- `hosts_and_ports` - (необязательно) - список хостов и необязательных портов. `host[:port] [, host:[port]], ...`,
- `database` - (необязательно) - это имя базы данных,
- `query_parameters` - (опционально) список пар ключ-значение `param1=value1[,&param2=value2], ...`. Для некоторых параметров значение не требуется. Имена и значения параметров чувствительны к регистру.
Если user не указан, будут использоваться имя пользователя `default`.
Если host не указан, будет использован хост `localhost`.
Если port не указан, будет использоваться порт `9000`.
Если база данных не указана, будет использоваться база данных `default`.
Если имя пользователя, пароль или база данных были указаны в строке подключения, их нельзя указать с помощью `--user`, `--password` или `--database` (и наоборот).
Параметр host может быть либо именем хоста, либо IP-адресом. Для указания IPv6-адреса поместите его в квадратные скобки:
```text
clickhouse://[2001:db8::1234]
```
URI позволяет подключаться к нескольким хостам. Строки подключения могут содержать несколько хостов. ClickHouse-client будет пытаться подключиться к этим хостам по порядку (т.е. слева направо). После установления соединения попытки подключения к оставшимся хостам не предпринимаются.
Строка подключения должна быть указана в первом аргументе clickhouse-client. Строка подключения может комбинироваться с другими [параметрами командной строки] (#command-line-options) кроме `--host/-h` и `--port`.
Для компонента `query_parameter` разрешены следующие ключи:
- `secure` или сокращенно `s` - без значение. Если параметр указан, то соединение с сервером будет осуществляться по защищенному каналу (TLS). См. `secure` в [command-line-options](#command-line-options).
### Кодирование URI {#connection_string_uri_percent_encoding}
Не US ASCII и специальные символы в имени пользователя, пароле, хостах, базе данных и параметрах запроса должны быть [закодированы](https://ru.wikipedia.org/wiki/URL#%D0%9A%D0%BE%D0%B4%D0%B8%D1%80%D0%BE%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5_URL).
### Примеры {#connection_string_examples}
Подключиться к localhost через порт 9000 и выполнить запрос `SELECT 1`
``` bash
clickhouse-client clickhouse://localhost:9000 --query "SELECT 1"
```
Подключиться к localhost, используя пользователя `john` с паролем `secret`, хост `127.0.0.1` и порт `9000`
``bash
clickhouse-client clickhouse://john:secret@127.0.0.1:9000
```
Подключиться к localhost, используя пользователя по умолчанию, хост с IPV6 адресом `[::1]` и порт `9000`.
``` bash
clickhouse-client clickhouse://[::1]:9000
```
Подключиться к localhost через порт 9000 многострочном режиме.
``` bash
clickhouse-client clickhouse://localhost:9000 '-m'
```
Подключиться к localhost через порт 9000 с пользователем default.
``` bash
clickhouse-client clickhouse://default@localhost:9000
# Эквивалетно:
clickhouse-client clickhouse://localhost:9000 --user default
```
Подключиться к localhost через порт 9000 с базой данных `my_database`
``` bash
clickhouse-client clickhouse://localhost:9000/my_database
# Эквивалетно:
clickhouse-client clickhouse://localhost:9000 --database my_database
```
Подключиться к localhost через порт 9000 с базой данных `my_database`, указанной в строке подключения, используя безопасным соединением при помощи короткого варианта параметра URI 's'.
``` bash
clickhouse-client clickhouse://localhost/my_database?s
# Эквивалетно:
clickhouse-client clickhouse://localhost/my_database -s
```
Подключиться к хосту по умолчанию с использованием порта по умолчанию, пользователя по умолчанию, и базы данных по умолчанию.
``` bash
clickhouse-client clickhouse:
```
Подключиться к хосту по умолчанию через порт по умолчанию, используя имя пользователя `my_user` без пароля.
``` bash
clickhouse-client clickhouse://my_user@
# Использование пустого пароля между : и @ означает, что пользователь должен ввести пароль перед началом соединения.
clickhouse-client clickhouse://my_user:@
```
Подключиться к localhost, используя электронную почту, как имя пользователя. Символ `@` закодирован как `%40`.
``` bash
clickhouse-client clickhouse://some_user%40some_mail.com@localhost:9000
```
Подключится к одному из хостов: `192.168.1.15`, `192.168.1.25`.
``` bash
clickhouse-client clickhouse://192.168.1.15,192.168.1.25
```
### Конфигурационные файлы {#configuration_files}

View File

@ -4126,3 +4126,26 @@ SELECT sum(number) FROM numbers(10000000000) SETTINGS partial_result_on_first_ca
Возможные значения:: `true`, `false`
Значение по умолчанию: `false`
## rename_files_after_processing
- **Тип:** Строка
- **Значение по умолчанию:** Пустая строка
Этот параметр позволяет задать паттерн для переименования файлов, обрабатываемых табличной функцией `file`. Когда опция установлена, все файлы, прочитанные табличной функцией `file`, будут переименованы в соответствии с указанным шаблоном, если обработка и чтение файла завершились успешно.
### Шаблон
Шаблон поддерживает следующие виды плейсхолдеров:
- `%f` — Исходное имя файла без расширения (например "sample").
- `%e` — Оригинальное расширение файла с точкой (например ".csv").
- `%t` — Текущее время (в микросекундах).
- `%%` — Знак процента ("%").
### Пример
- Значение аргумента: `--rename_files_after_processing="processed_%f_%t%e"`
- Запрос: `SELECT * FROM file('sample.csv')`
Если чтение и обработка `sample.csv` прошли успешно, файл будет переименован в `processed_sample_1683473210851438.csv`.

View File

@ -165,22 +165,217 @@ SELECT toUInt64(nan), toUInt32(-32), toUInt16('16'), toUInt8(8.8);
## toDate {#todate}
Cиноним: `DATE`.
Конвертирует аргумент в значение [Date](/docs/ru/sql-reference/data-types/date.md).
**Синтаксис**
``` sql
toDate(expr)
```
**Аргументы**
- `expr` — Значение для преобразования. [String](/docs/ru/sql-reference/data-types/string.md), [Int](/docs/ru/sql-reference/data-types/int-uint.md), [Date](/docs/ru/sql-reference/data-types/date.md) или [DateTime](/docs/ru/sql-reference/data-types/datetime.md).
Если `expr` является числом выглядит как UNIX timestamp (больше чем 65535), оно интерпретируется как DateTime, затем обрезается до Date учитывавая текущую часовой пояс. Если `expr` является числом и меньше чем 65536, оно интерпретируется как количество дней с 1970-01-01.
**Возвращаемое значение**
- Календарная дата. [Date](/docs/ru/sql-reference/data-types/date.md).
**Пример**
Запрос:
``` sql
SELECT toDate('2022-12-30'), toDate(1685457500);
```
Результат:
```response
┌─toDate('2022-12-30')─┬─toDate(1685457500)─┐
│ 2022-12-30 │ 2023-05-30 │
└──────────────────────┴────────────────────┘
```
## toDateOrZero {#todateorzero}
Как [toDate](#todate), но в случае неудачи возвращает нижнюю границу [Date](/docs/ru/sql-reference/data-types/date.md)). Поддерживается только аргумент типа [String](/docs/ru/sql-reference/data-types/string.md).
**Пример**
Запрос:
``` sql
SELECT toDateOrZero('2022-12-30'), toDateOrZero('');
```
Результат:
```response
┌─toDateOrZero('2022-12-30')─┬─toDateOrZero('')─┐
│ 2022-12-30 │ 1970-01-01 │
└────────────────────────────┴──────────────────┘
```
## toDateOrNull {#todateornull}
Как [toDate](#todate), но в случае неудачи возвращает `NULL`. Поддерживается только аргумент типа [String](/docs/ru/sql-reference/data-types/string.md).
**Пример**
Запрос:
``` sql
SELECT toDateOrNull('2022-12-30'), toDateOrNull('');
```
Результат:
```response
┌─toDateOrNull('2022-12-30')─┬─toDateOrNull('')─┐
│ 2022-12-30 │ ᴺᵁᴸᴸ │
└────────────────────────────┴──────────────────┘
```
## toDateOrDefault {#todateordefault}
Как [toDate](#todate), но в случае неудачи возвращает значение по умолчанию (или второй аргумент (если указан), или нижняя граница [Date](/docs/ru/sql-reference/data-types/date.md)).
**Синтаксис**
``` sql
toDateOrDefault(expr [, default_value])
```
**Пример**
Запрос:
``` sql
SELECT toDateOrDefault('2022-12-30'), toDateOrDefault('', '2023-01-01'::Date);
```
Результат:
```response
┌─toDateOrDefault('2022-12-30')─┬─toDateOrDefault('', CAST('2023-01-01', 'Date'))─┐
│ 2022-12-30 │ 2023-01-01 │
└───────────────────────────────┴─────────────────────────────────────────────────┘
```
## toDateTime {#todatetime}
Конвертирует аргумент в значение [DateTime](/docs/ru/sql-reference/data-types/datetime.md).
**Синтаксис**
``` sql
toDateTime(expr[, time_zone ])
```
**Аргументы**
- `expr` — Значение для преобразования. [String](/docs/ru/sql-reference/data-types/string.md), [Int](/docs/ru/sql-reference/data-types/int-uint.md), [Date](/docs/ru/sql-reference/data-types/date.md) или [DateTime](/docs/ru/sql-reference/data-types/datetime.md).
- `time_zone` — Часовой пояс. [String](/docs/ru/sql-reference/data-types/string.md).
Если `expr` является числом, оно интерпретируется как количество секунд от начала unix эпохи.
**Возвращаемое значение**
- Время. [DateTime](/docs/ru/sql-reference/data-types/datetime.md)
**Пример**
Запрос:
``` sql
SELECT toDateTime('2022-12-30 13:44:17'), toDateTime(1685457500, 'UTC');
```
Результат:
```response
┌─toDateTime('2022-12-30 13:44:17')─┬─toDateTime(1685457500, 'UTC')─┐
│ 2022-12-30 13:44:17 │ 2023-05-30 14:38:20 │
└───────────────────────────────────┴───────────────────────────────┘
```
## toDateTimeOrZero {#todatetimeorzero}
Как [toDateTime](#todatetime), но в случае неудачи возвращает нижнюю границу [DateTime](/docs/ru/sql-reference/data-types/datetime.md)). Поддерживается только аргумент типа [String](/docs/ru/sql-reference/data-types/string.md).
**Пример**
Запрос:
``` sql
SELECT toDateTimeOrZero('2022-12-30 13:44:17'), toDateTimeOrZero('');
```
Результат:
```response
┌─toDateTimeOrZero('2022-12-30 13:44:17')─┬─toDateTimeOrZero('')─┐
│ 2022-12-30 13:44:17 │ 1970-01-01 00:00:00 │
└─────────────────────────────────────────┴──────────────────────┘
```
## toDateTimeOrNull {#todatetimeornull}
Как [toDateTime](#todatetime), но в случае неудачи возвращает `NULL`. Поддерживается только аргумент типа [String](/docs/ru/sql-reference/data-types/string.md).
**Example**
Query:
``` sql
SELECT toDateTimeOrNull('2022-12-30 13:44:17'), toDateTimeOrNull('');
```
Result:
```response
┌─toDateTimeOrNull('2022-12-30 13:44:17')─┬─toDateTimeOrNull('')─┐
│ 2022-12-30 13:44:17 │ ᴺᵁᴸᴸ │
└─────────────────────────────────────────┴──────────────────────┘
```
## toDateTimeOrDefault {#todatetimeordefault}
Как [toDateTime](#todatetime), но в случае неудачи возвращает значение по умолчанию (или третий аргумент (если указан), или нижняя граница [DateTime](/docs/ru/sql-reference/data-types/datetime.md)).
**Синтаксис**
``` sql
toDateTimeOrDefault(expr, [, time_zone [, default_value]])
```
**Пример**
Запрос:
``` sql
SELECT toDateTimeOrDefault('2022-12-30 13:44:17'), toDateTimeOrDefault('', 'UTC', '2023-01-01'::DateTime('UTC'));
```
Результат:
```response
┌─toDateTimeOrDefault('2022-12-30 13:44:17')─┬─toDateTimeOrDefault('', 'UTC', CAST('2023-01-01', 'DateTime(\'UTC\')'))─┐
│ 2022-12-30 13:44:17 │ 2023-01-01 00:00:00 │
└────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────────┘
```
## toDate32 {#todate32}
Конвертирует аргумент в значение типа [Date32](../../sql-reference/data-types/date32.md). Если значение выходит за границы диапазона, возвращается пограничное значение `Date32`. Если аргумент имеет тип [Date](../../sql-reference/data-types/date.md), учитываются границы типа `Date`.
@ -301,6 +496,14 @@ SELECT
└─────────────────────────────────────────────────────────┴───────────────────────────────────────────────────────────┘
```
## toDateTime64
## toDateTime64OrZero
## toDateTime64OrNull
## toDateTime64OrDefault
## toDecimal(32\|64\|128\|256) {#todecimal3264128}
Преобразует `value` к типу данных [Decimal](../../sql-reference/functions/type-conversion-functions.md) с точностью `S`. `value` может быть числом или строкой. Параметр `S` (scale) задаёт число десятичных знаков.

View File

@ -8,13 +8,13 @@ sidebar_label: PROJECTION
Доступны следующие операции с [проекциями](../../../engines/table-engines/mergetree-family/mergetree.md#projections):
- `ALTER TABLE [db].name ADD PROJECTION name ( SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY] )` — добавляет описание проекции в метаданные.
- `ALTER TABLE [db].name ADD PROJECTION [IF NOT EXISTS] name ( SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY] )` — добавляет описание проекции в метаданные.
- `ALTER TABLE [db].name DROP PROJECTION name` — удаляет описание проекции из метаданных и удаляет файлы проекции с диска.
- `ALTER TABLE [db].name DROP PROJECTION [IF EXISTS] name` — удаляет описание проекции из метаданных и удаляет файлы проекции с диска.
- `ALTER TABLE [db.]table MATERIALIZE PROJECTION name IN PARTITION partition_name` — перестраивает проекцию в указанной партиции. Реализовано как [мутация](../../../sql-reference/statements/alter/index.md#mutations).
- `ALTER TABLE [db.]table CLEAR PROJECTION name IN PARTITION partition_name` — удаляет файлы проекции с диска без удаления описания.
- `ALTER TABLE [db.]table CLEAR PROJECTION [IF EXISTS] name IN PARTITION partition_name` — удаляет файлы проекции с диска без удаления описания.
Команды `ADD`, `DROP` и `CLEAR` — легковесны, поскольку они только меняют метаданные или удаляют файлы.
@ -22,4 +22,4 @@ sidebar_label: PROJECTION
:::note
Манипуляции с проекциями поддерживаются только для таблиц с движком [`*MergeTree`](../../../engines/table-engines/mergetree-family/mergetree.md) (включая [replicated](../../../engines/table-engines/mergetree-family/replication.md) варианты).
:::
:::

View File

@ -126,3 +126,4 @@ SELECT count(*) FROM file('big_dir/file{0..9}{0..9}{0..9}', 'CSV', 'name String,
**Смотрите также**
- [Виртуальные столбцы](index.md#table_engines-virtual_columns)
- [Переименование файлов после обработки](/docs/ru/operations/settings/settings.md#rename_files_after_processing)

View File

@ -5,13 +5,13 @@
#include <iostream>
#include <iomanip>
#include <optional>
#include <string_view>
#include <Common/scope_guard_safe.h>
#include <boost/program_options.hpp>
#include <boost/algorithm/string/replace.hpp>
#include <filesystem>
#include <string>
#include "Client.h"
#include "Client/ConnectionString.h"
#include "Core/Protocol.h"
#include "Parsers/formatAST.h"
@ -1248,6 +1248,9 @@ void Client::readArguments(
std::vector<Arguments> & external_tables_arguments,
std::vector<Arguments> & hosts_and_ports_arguments)
{
bool has_connection_string = argc >= 2 && tryParseConnectionString(std::string_view(argv[1]), common_arguments, hosts_and_ports_arguments);
int start_argument_index = has_connection_string ? 2 : 1;
/** We allow different groups of arguments:
* - common arguments;
* - arguments for any number of external tables each in form "--external args...",
@ -1260,10 +1263,13 @@ void Client::readArguments(
std::string prev_host_arg;
std::string prev_port_arg;
for (int arg_num = 1; arg_num < argc; ++arg_num)
for (int arg_num = start_argument_index; arg_num < argc; ++arg_num)
{
std::string_view arg = argv[arg_num];
if (has_connection_string)
checkIfCmdLineOptionCanBeUsedWithConnectionString(arg);
if (arg == "--external")
{
in_external_group = true;

View File

@ -449,7 +449,7 @@ let queries = [
FROM system.metric_log
WHERE event_date >= toDate(now() - {seconds:UInt32}) AND event_time >= now() - {seconds:UInt32}
GROUP BY t
ORDER BY t`
ORDER BY t WITH FILL STEP {rounding:UInt32}`
},
{
"title": "CPU Usage (cores)",
@ -457,7 +457,7 @@ ORDER BY t`
FROM system.metric_log
WHERE event_date >= toDate(now() - {seconds:UInt32}) AND event_time >= now() - {seconds:UInt32}
GROUP BY t
ORDER BY t`
ORDER BY t WITH FILL STEP {rounding:UInt32}`
},
{
"title": "Queries Running",
@ -465,7 +465,7 @@ ORDER BY t`
FROM system.metric_log
WHERE event_date >= toDate(now() - {seconds:UInt32}) AND event_time >= now() - {seconds:UInt32}
GROUP BY t
ORDER BY t`
ORDER BY t WITH FILL STEP {rounding:UInt32}`
},
{
"title": "Merges Running",
@ -473,7 +473,7 @@ ORDER BY t`
FROM system.metric_log
WHERE event_date >= toDate(now() - {seconds:UInt32}) AND event_time >= now() - {seconds:UInt32}
GROUP BY t
ORDER BY t`
ORDER BY t WITH FILL STEP {rounding:UInt32}`
},
{
"title": "Selected Bytes/second",
@ -481,7 +481,7 @@ ORDER BY t`
FROM system.metric_log
WHERE event_date >= toDate(now() - {seconds:UInt32}) AND event_time >= now() - {seconds:UInt32}
GROUP BY t
ORDER BY t`
ORDER BY t WITH FILL STEP {rounding:UInt32}`
},
{
"title": "IO Wait",
@ -489,7 +489,7 @@ ORDER BY t`
FROM system.metric_log
WHERE event_date >= toDate(now() - {seconds:UInt32}) AND event_time >= now() - {seconds:UInt32}
GROUP BY t
ORDER BY t`
ORDER BY t WITH FILL STEP {rounding:UInt32}`
},
{
"title": "CPU Wait",
@ -497,7 +497,7 @@ ORDER BY t`
FROM system.metric_log
WHERE event_date >= toDate(now() - {seconds:UInt32}) AND event_time >= now() - {seconds:UInt32}
GROUP BY t
ORDER BY t`
ORDER BY t WITH FILL STEP {rounding:UInt32}`
},
{
"title": "OS CPU Usage (Userspace)",
@ -506,7 +506,7 @@ FROM system.asynchronous_metric_log
WHERE event_date >= toDate(now() - {seconds:UInt32}) AND event_time >= now() - {seconds:UInt32}
AND metric = 'OSUserTimeNormalized'
GROUP BY t
ORDER BY t`
ORDER BY t WITH FILL STEP {rounding:UInt32}`
},
{
"title": "OS CPU Usage (Kernel)",
@ -515,7 +515,7 @@ FROM system.asynchronous_metric_log
WHERE event_date >= toDate(now() - {seconds:UInt32}) AND event_time >= now() - {seconds:UInt32}
AND metric = 'OSSystemTimeNormalized'
GROUP BY t
ORDER BY t`
ORDER BY t WITH FILL STEP {rounding:UInt32}`
},
{
"title": "Read From Disk",
@ -523,7 +523,7 @@ ORDER BY t`
FROM system.metric_log
WHERE event_date >= toDate(now() - {seconds:UInt32}) AND event_time >= now() - {seconds:UInt32}
GROUP BY t
ORDER BY t`
ORDER BY t WITH FILL STEP {rounding:UInt32}`
},
{
"title": "Read From Filesystem",
@ -531,7 +531,7 @@ ORDER BY t`
FROM system.metric_log
WHERE event_date >= toDate(now() - {seconds:UInt32}) AND event_time >= now() - {seconds:UInt32}
GROUP BY t
ORDER BY t`
ORDER BY t WITH FILL STEP {rounding:UInt32}`
},
{
"title": "Memory (tracked)",
@ -539,7 +539,7 @@ ORDER BY t`
FROM system.metric_log
WHERE event_date >= toDate(now() - {seconds:UInt32}) AND event_time >= now() - {seconds:UInt32}
GROUP BY t
ORDER BY t`
ORDER BY t WITH FILL STEP {rounding:UInt32}`
},
{
"title": "Load Average (15 minutes)",
@ -548,7 +548,7 @@ FROM system.asynchronous_metric_log
WHERE event_date >= toDate(now() - {seconds:UInt32}) AND event_time >= now() - {seconds:UInt32}
AND metric = 'LoadAverage15'
GROUP BY t
ORDER BY t`
ORDER BY t WITH FILL STEP {rounding:UInt32}`
},
{
"title": "Selected Rows/second",
@ -556,7 +556,7 @@ ORDER BY t`
FROM system.metric_log
WHERE event_date >= toDate(now() - {seconds:UInt32}) AND event_time >= now() - {seconds:UInt32}
GROUP BY t
ORDER BY t`
ORDER BY t WITH FILL STEP {rounding:UInt32}`
},
{
"title": "Inserted Rows/second",
@ -564,7 +564,7 @@ ORDER BY t`
FROM system.metric_log
WHERE event_date >= toDate(now() - {seconds:UInt32}) AND event_time >= now() - {seconds:UInt32}
GROUP BY t
ORDER BY t`
ORDER BY t WITH FILL STEP {rounding:UInt32}`
},
{
"title": "Total MergeTree Parts",
@ -573,7 +573,7 @@ FROM system.asynchronous_metric_log
WHERE event_date >= toDate(now() - {seconds:UInt32}) AND event_time >= now() - {seconds:UInt32}
AND metric = 'TotalPartsOfMergeTreeTables'
GROUP BY t
ORDER BY t`
ORDER BY t WITH FILL STEP {rounding:UInt32}`
},
{
"title": "Max Parts For Partition",
@ -582,7 +582,7 @@ FROM system.asynchronous_metric_log
WHERE event_date >= toDate(now() - {seconds:UInt32}) AND event_time >= now() - {seconds:UInt32}
AND metric = 'MaxPartCountForPartition'
GROUP BY t
ORDER BY t`
ORDER BY t WITH FILL STEP {rounding:UInt32}`
}
];

View File

@ -202,6 +202,7 @@ enum class AccessType
M(URL, "", GLOBAL, SOURCES) \
M(REMOTE, "", GLOBAL, SOURCES) \
M(MONGO, "", GLOBAL, SOURCES) \
M(REDIS, "", GLOBAL, SOURCES) \
M(MEILISEARCH, "", GLOBAL, SOURCES) \
M(MYSQL, "", GLOBAL, SOURCES) \
M(POSTGRES, "", GLOBAL, SOURCES) \

View File

@ -122,7 +122,7 @@ public:
size_t size;
readVarUInt(size, in);
static constexpr size_t max_size = 1_GiB;
static constexpr size_t max_size = 100_GiB;
if (size == 0)
throw Exception(ErrorCodes::INCORRECT_DATA, "Incorrect size (0) in groupBitmap.");

View File

@ -157,8 +157,8 @@ public:
void read(DB::ReadBuffer & buf)
{
size_t size = 0;
DB::readIntBinary<size_t>(size, buf);
DB::readIntBinary<size_t>(total_values, buf);
readBinaryLittleEndian(size, buf);
readBinaryLittleEndian(total_values, buf);
/// Compatibility with old versions.
if (size > total_values)
@ -171,16 +171,16 @@ public:
samples.resize(size);
for (size_t i = 0; i < size; ++i)
DB::readPODBinary(samples[i], buf);
readBinaryLittleEndian(samples[i], buf);
sorted = false;
}
void write(DB::WriteBuffer & buf) const
{
size_t size = samples.size();
DB::writeIntBinary<size_t>(size, buf);
DB::writeIntBinary<size_t>(total_values, buf);
const size_t size = samples.size();
writeBinaryLittleEndian(size, buf);
writeBinaryLittleEndian(total_values, buf);
for (size_t i = 0; i < size; ++i)
{
@ -190,12 +190,12 @@ public:
/// Here we ensure that padding is zero without changing the protocol.
/// TODO: After implementation of "versioning aggregate function state",
/// change the serialization format.
Element elem;
memset(&elem, 0, sizeof(elem));
elem = samples[i];
DB::writePODBinary(elem, buf);
DB::transformEndianness<std::endian::little>(elem);
DB::writeString(reinterpret_cast<const char*>(&elem), sizeof(elem), buf);
}
}

View File

@ -1,10 +1,11 @@
#pragma once
#include <exception>
#include <Common/CurrentThread.h>
#include <Common/HashTable/HashSet.h>
#include <Common/ThreadPool.h>
#include <Common/setThreadName.h>
#include <Common/scope_guard_safe.h>
#include <Common/setThreadName.h>
namespace DB
@ -48,30 +49,38 @@ public:
}
else
{
auto next_bucket_to_merge = std::make_shared<std::atomic_uint32_t>(0);
auto thread_func = [&lhs, &rhs, next_bucket_to_merge, thread_group = CurrentThread::getGroup()]()
try
{
SCOPE_EXIT_SAFE(
if (thread_group)
CurrentThread::detachFromGroupIfNotDetached();
);
if (thread_group)
CurrentThread::attachToGroupIfDetached(thread_group);
setThreadName("UniqExactMerger");
auto next_bucket_to_merge = std::make_shared<std::atomic_uint32_t>(0);
while (true)
auto thread_func = [&lhs, &rhs, next_bucket_to_merge, thread_group = CurrentThread::getGroup()]()
{
const auto bucket = next_bucket_to_merge->fetch_add(1);
if (bucket >= rhs.NUM_BUCKETS)
return;
lhs.impls[bucket].merge(rhs.impls[bucket]);
}
};
SCOPE_EXIT_SAFE(
if (thread_group)
CurrentThread::detachFromGroupIfNotDetached();
);
if (thread_group)
CurrentThread::attachToGroupIfDetached(thread_group);
setThreadName("UniqExactMerger");
for (size_t i = 0; i < std::min<size_t>(thread_pool->getMaxThreads(), rhs.NUM_BUCKETS); ++i)
thread_pool->scheduleOrThrowOnError(thread_func);
thread_pool->wait();
while (true)
{
const auto bucket = next_bucket_to_merge->fetch_add(1);
if (bucket >= rhs.NUM_BUCKETS)
return;
lhs.impls[bucket].merge(rhs.impls[bucket]);
}
};
for (size_t i = 0; i < std::min<size_t>(thread_pool->getMaxThreads(), rhs.NUM_BUCKETS); ++i)
thread_pool->scheduleOrThrowOnError(thread_func);
thread_pool->wait();
}
catch (...)
{
thread_pool->wait();
throw;
}
}
}
}

View File

@ -144,6 +144,7 @@ void BackupImpl::open(const ContextPtr & context)
if (!uuid)
uuid = UUIDHelpers::generateV4();
lock_file_name = use_archive ? (archive_params.archive_name + ".lock") : ".lock";
lock_file_before_first_file_checked = false;
writing_finalized = false;
/// Check that we can write a backup there and create the lock file to own this destination.
@ -833,13 +834,10 @@ void BackupImpl::writeFile(const BackupFileInfo & info, BackupEntryPtr entry)
if (writing_finalized)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Backup is already finalized");
bool should_check_lock_file = false;
{
std::lock_guard lock{mutex};
++num_files;
total_size += info.size;
if (!num_entries)
should_check_lock_file = true;
}
auto src_disk = entry->getDisk();
@ -859,7 +857,7 @@ void BackupImpl::writeFile(const BackupFileInfo & info, BackupEntryPtr entry)
return;
}
if (!should_check_lock_file)
if (!lock_file_before_first_file_checked.exchange(true))
checkLockFile(true);
/// NOTE: `mutex` must be unlocked during copying otherwise writing will be in one thread maximum and hence slow.

View File

@ -141,6 +141,7 @@ private:
std::shared_ptr<IArchiveReader> archive_reader;
std::shared_ptr<IArchiveWriter> archive_writer;
String lock_file_name;
std::atomic<bool> lock_file_before_first_file_checked = false;
bool writing_finalized = false;
bool deduplicate_files = true;

View File

@ -413,6 +413,7 @@ dbms_target_link_libraries (
boost::system
clickhouse_common_io
Poco::MongoDB
Poco::Redis
)
if (TARGET ch::mysqlxx)

View File

@ -1364,6 +1364,7 @@ void ClientBase::sendData(Block & sample, const ColumnsDescription & columns_des
columns_description_for_query,
ConstraintsDescription{},
String{},
{},
};
StoragePtr storage = std::make_shared<StorageFile>(in_file, global_context->getUserFilesPath(), args);
storage->startup();

View File

@ -0,0 +1,239 @@
#include "ConnectionString.h"
#include <Common/Exception.h>
#include <Poco/Exception.h>
#include <Poco/URI.h>
#include <array>
#include <iostream>
#include <string>
#include <unordered_map>
namespace DB
{
namespace ErrorCodes
{
extern const int BAD_ARGUMENTS;
}
}
namespace
{
using namespace std::string_literals;
using namespace std::literals::string_view_literals;
constexpr auto CONNECTION_URI_SCHEME = "clickhouse:"sv;
const std::unordered_map<std::string_view, std::string_view> PROHIBITED_CLIENT_OPTIONS = {
/// Client option, client option long name
{"-h", "--host"},
{"--host", "--host"},
{"--port", "--port"},
{"--connection", "--connection"},
};
std::string uriDecode(const std::string & uri_encoded_string, bool plus_as_space)
{
std::string decoded_string;
Poco::URI::decode(uri_encoded_string, decoded_string, plus_as_space);
return decoded_string;
}
void getHostAndPort(const Poco::URI & uri, std::vector<std::vector<std::string>> & hosts_and_ports_arguments)
{
std::vector<std::string> host_and_port;
const std::string & host = uri.getHost();
if (!host.empty())
{
host_and_port.push_back("--host=" + uriDecode(host, false));
}
// Port can be written without host (":9000"). Empty host name equals to default host.
auto port = uri.getPort();
if (port != 0)
host_and_port.push_back("--port=" + std::to_string(port));
if (!host_and_port.empty())
hosts_and_ports_arguments.push_back(std::move(host_and_port));
}
void buildConnectionString(
std::string_view host_and_port,
std::string_view right_part,
Poco::URI & uri,
std::vector<std::vector<std::string>> & hosts_and_ports_arguments)
{
// User info does not matter in sub URI
auto uri_string = std::string(CONNECTION_URI_SCHEME);
if (!host_and_port.empty())
{
uri_string.append("//");
uri_string.append(host_and_port);
}
// Right part from string includes '/database?[params]'
uri_string.append(right_part);
try
{
uri = Poco::URI(uri_string);
}
catch (const Poco::URISyntaxException & invalid_uri_exception)
{
throw DB::Exception(DB::ErrorCodes::BAD_ARGUMENTS,
"Invalid connection string syntax {}: {}", uri_string, invalid_uri_exception.what());
}
getHostAndPort(uri, hosts_and_ports_arguments);
}
std::string makeArgument(const std::string & connection_string_parameter_name)
{
return (connection_string_parameter_name.size() == 1 ? "-"s : "--"s) + connection_string_parameter_name;
}
}
namespace DB
{
bool tryParseConnectionString(
std::string_view connection_string,
std::vector<std::string> & common_arguments,
std::vector<std::vector<std::string>> & hosts_and_ports_arguments)
{
if (connection_string == CONNECTION_URI_SCHEME)
return true;
if (!connection_string.starts_with(CONNECTION_URI_SCHEME))
return false;
size_t offset = CONNECTION_URI_SCHEME.size();
if ((connection_string.substr(offset).starts_with("//")))
offset += 2;
auto hosts_end_pos = std::string_view::npos;
auto hosts_or_user_info_end_pos = connection_string.find_first_of("?/@", offset);
auto has_user_info = hosts_or_user_info_end_pos != std::string_view::npos && connection_string[hosts_or_user_info_end_pos] == '@';
if (has_user_info)
{
// Move offset right after user info
offset = hosts_or_user_info_end_pos + 1;
hosts_end_pos = connection_string.find_first_of("?/@", offset);
// Several '@' symbols in connection string is prohibited.
// If user name contains '@' then it should be percent-encoded.
// several users: 'usr1@host1,@usr2@host2' is invalid.
if (hosts_end_pos != std::string_view::npos && connection_string[hosts_end_pos] == '@')
{
throw Exception(ErrorCodes::BAD_ARGUMENTS,
"Symbols '@' in URI in password or user name should be percent-encoded. Individual user names for different hosts also prohibited. {}",
connection_string);
}
}
else
hosts_end_pos = hosts_or_user_info_end_pos;
const auto * hosts_end = hosts_end_pos != std::string_view::npos ? connection_string.begin() + hosts_end_pos
: connection_string.end();
try
{
/** Poco::URI doesn't support several hosts in URI.
* Split string clickhouse:[user[:password]@]host1:port1, ... , hostN:portN[database]?[query_parameters]
* into multiple string for each host:
* clickhouse:[user[:password]@]host1:port1[database]?[query_parameters]
* ...
* clickhouse:[user[:password]@]hostN:portN[database]?[query_parameters]
*/
Poco::URI uri;
const auto * last_host_begin = connection_string.begin() + offset;
for (const auto * it = last_host_begin; it != hosts_end; ++it)
{
if (*it == ',')
{
buildConnectionString({last_host_begin, it}, {hosts_end, connection_string.end()}, uri, hosts_and_ports_arguments);
last_host_begin = it + 1;
}
}
if (uri.empty())
{
// URI has no host specified
uri = std::string(connection_string);
getHostAndPort(uri, hosts_and_ports_arguments);
}
else
buildConnectionString({last_host_begin, hosts_end}, {hosts_end, connection_string.end()}, uri, hosts_and_ports_arguments);
Poco::URI::QueryParameters params = uri.getQueryParameters();
for (const auto & param : params)
{
if (param.first == "secure" || param.first == "s")
{
if (!param.second.empty())
throw Exception(ErrorCodes::BAD_ARGUMENTS, "secure URI query parameter does not allow value");
common_arguments.push_back(makeArgument(param.first));
}
else
throw Exception(ErrorCodes::BAD_ARGUMENTS, "URI query parameter {} is not supported", param.first);
}
auto user_info = uri.getUserInfo();
if (!user_info.empty())
{
// Poco::URI doesn't decode user name/password by default.
// But ClickHouse allows to have users with email user name like: 'john@some_mail.com'
// john@some_mail.com should be percent-encoded: 'john%40some_mail.com'
size_t pos = user_info.find(':');
if (pos != std::string::npos)
{
common_arguments.push_back("--user");
common_arguments.push_back(uriDecode(user_info.substr(0, pos), true));
++pos; // Skip ':'
common_arguments.push_back("--password");
if (user_info.size() > pos + 1)
common_arguments.push_back(uriDecode(user_info.substr(pos), true));
else
{
// in case of user_info == 'user:', ':' is specified, but password is empty
// then add password argument "\n" which means: Ask user for a password.
common_arguments.push_back("\n");
}
}
else
{
common_arguments.push_back("--user");
common_arguments.push_back(uriDecode(user_info, true));
}
}
const auto & database_name = uri.getPath();
size_t start_symbol = !database_name.empty() && database_name[0] == '/' ? 1u : 0u;
if (database_name.size() > start_symbol)
{
common_arguments.push_back("--database");
common_arguments.push_back(start_symbol == 0u ? database_name : database_name.substr(start_symbol));
}
}
catch (const Poco::URISyntaxException & invalid_uri_exception)
{
throw DB::Exception(DB::ErrorCodes::BAD_ARGUMENTS,
"Invalid connection string '{}': {}", connection_string, invalid_uri_exception.what());
}
return true;
}
void checkIfCmdLineOptionCanBeUsedWithConnectionString(std::string_view command_line_option)
{
if (PROHIBITED_CLIENT_OPTIONS.contains(command_line_option))
throw Exception(ErrorCodes::BAD_ARGUMENTS,
"Mixing a connection string and {} option is prohibited", PROHIBITED_CLIENT_OPTIONS.at(command_line_option));
}
}

View File

@ -0,0 +1,27 @@
#pragma once
#include <string>
#include <string_view>
#include <vector>
namespace DB
{
/** Tries to parse ClickHouse connection string.
* if @connection_string starts with 'clickhouse:' then connection string will be parsed
* and converted into a set of arguments for the client.
* Connection string format is similar to URI "clickhouse:[//[user[:password]@][hosts_and_ports]][/dbname][?query_parameters]"
* with the difference that hosts_and_ports can contain multiple hosts separated by ','.
* example: clickhouse://user@host1:port1,host2:port2
* @return Returns false if no connection string was specified. If a connection string was specified, returns true if it is valid, and throws an exception if it is invalid.
* @exception Throws DB::Exception if URI has valid scheme (clickhouse:), but invalid internals.
*/
bool tryParseConnectionString(
std::string_view connection_string,
std::vector<std::string> & common_arguments,
std::vector<std::vector<std::string>> & hosts_and_ports_arguments);
// Throws DB::Exception with BAD_ARGUMENTS if the given command line argument
// is not allowed to be used with a connection string.
void checkIfCmdLineOptionCanBeUsedWithConnectionString(std::string_view command_line_option);
}

View File

@ -101,7 +101,9 @@ static String getLoadSuggestionQuery(Int32 suggestion_limit, bool basic_suggesti
add_column("name", "columns", true, suggestion_limit);
}
query = "SELECT DISTINCT arrayJoin(extractAll(name, '[\\\\w_]{2,}')) AS res FROM (" + query + ") WHERE notEmpty(res)";
/// FIXME: Forbid this query using new analyzer because of bug https://github.com/ClickHouse/ClickHouse/issues/50669
/// We should remove this restriction after resolving this bug.
query = "SELECT DISTINCT arrayJoin(extractAll(name, '[\\\\w_]{2,}')) AS res FROM (" + query + ") WHERE notEmpty(res) SETTINGS allow_experimental_analyzer=0";
return query;
}

Some files were not shown because too many files have changed in this diff Show More