ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-11-10 01:25:21 +00:00

Author	SHA1	Message	Date
Alexey Milovidov	d3c3d8b8e4	Remove export of dynamic symbols	2023-05-06 23:52:16 +02:00
alex filatov	bafd9773bc	fix Unknown library method 'extDict_libClone' We have an issue when using an external dictionary: occasionally the library bridge is called with extDict_libClone and fails with Unknown library method 'extDict_libClone'. It looks like this is because at some point `else if (method == "extDict_libNew")` was changed to `if (lib_new)` without handling extDict_libClone inside the new if/else statement, so the bridge reports extDict_libClone as an unknown method. This is a two-line fix to handle extDict_libClone properly. Error logs that we have: ``` 2022.12.16 14:17:44.285088 [ 393573 ] {} <Error> ExternalDictionaries: Could not update cache dictionary 'dict.vhash_s', next update is scheduled at 2022-12-16 14:18:00: Code: 86. DB::Exception: Received error from remote server /extdict_request?version=1&dictionary_id=be2b2cd1-ba57-4658-8d1b-35ef40ab005b&method=extDict_libClone&from_dictionary_id=c3537142-eaa9-4deb-9b65-47eb8ea1dee6. HTTP status code: 500 Internal Server Error, body: Unknown library method 'extDict_libClone' 2022.12.16 14:17:44.387049 [ 399133 ] {} <Error> ExternalDictionaries: Could not update cache dictionary 'dict.vhash_s', next update is scheduled at 2022-12-16 14:17:51: Code: 86. DB::Exception: Received error from remote server /extdict_request?version=1&dictionary_id=0df866ac-6c94-4974-a76c-3940522091b9&method=extDict_libClone&from_dictionary_id=c3537142-eaa9-4deb-9b65-47eb8ea1dee6. HTTP status code: 500 Internal Server Error, body: Unknown library method 'extDict_libClone' 2022.12.16 14:17:44.488468 [ 397769 ] {} <Error> ExternalDictionaries: Could not update cache dictionary 'dict.vhash_s', next update is scheduled at 2022-12-16 14:19:38: Code: 86. DB::Exception: Received error from remote server /extdict_request?version=1&dictionary_id=2d8af321-b669-4526-982b-42c0fabf0e8d&method=extDict_libClone&from_dictionary_id=c3537142-eaa9-4deb-9b65-47eb8ea1dee6. HTTP status code: 500 Internal Server Error, body: Unknown library method 'extDict_libClone' 2022.12.16 14:17:44.489935 [ 398226 ] {datamarts_v_dwh_node0032-241534:0x552da2_1_11} <Error> executeQuery: Code: 510. DB::Exception: Update failed for dictionary 'dict.vhash_s': Code: 510. DB::Exception: Update failed for dictionary dict.vhash_s : Code: 86. DB::Exception: Received error from remote server /extdict_request?version=1&dictionary_id=be2b2cd1-ba57-4658-8d1b-35ef40ab005b&method=extDict_libClone&from_dictionary_id=c3537142-eaa9-4deb-9b65-47eb8ea1dee6. HTTP status code: 500 Internal Server Error, body: Unknown library method 'extDict_libClone' ```	2023-03-02 15:53:09 +03:00
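A minimal C++17 sketch of the failure mode described in the commit above, assuming a dispatcher that branches on the method name. The `LibraryAction` enum, the function `dispatchLibraryMethod` and the exception type are hypothetical and are not the library-bridge's actual code; other bridge methods are elided.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical classification of an incoming library-bridge request.
enum class LibraryAction { New, Clone };

// Hypothetical dispatcher. Once "extDict_libNew" is detected through a flag,
// "extDict_libClone" still needs its own branch; without it the request falls
// through to the "unknown method" error seen in the logs above.
LibraryAction dispatchLibraryMethod(const std::string & method)
{
    const bool lib_new = (method == "extDict_libNew");
    if (lib_new)
        return LibraryAction::New;
    else if (method == "extDict_libClone") // the branch the fix restores
        return LibraryAction::Clone;

    // Other methods are elided in this sketch.
    throw std::invalid_argument("Unknown library method '" + method + "'");
}
```

Deleting the `extDict_libClone` branch reproduces the reported behavior: the clone request reaches the final `throw`.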
Alexander Tokmakov	31e16c4b4d	fix	2023-01-24 15:29:19 +01:00
Alexander Tokmakov	70d1adfe4b	Better formatting for exception messages (#45449) * save format string for NetException * format exceptions * format exceptions 2 * format exceptions 3 * format exceptions 4 * format exceptions 5 * format exceptions 6 * fix * format exceptions 7 * format exceptions 8 * Update MergeTreeIndexGin.cpp * Update AggregateFunctionMap.cpp * Update AggregateFunctionMap.cpp * fix	2023-01-24 00:13:58 +03:00
Robert Schulze	9c066e964d	Less use of CH-specific bit_cast() Converted usage of CH-custom bit_cast to std::bit_cast if possible, i.e. when sizeof(From) == sizeof(To). (The CH-custom bit_cast is able to deal with sizeof(From) != sizeof(To).) Motivation for this came from #42847, where it is not clear how the internal bit_cast should behave on big-endian systems, so we had better avoid that situation as much as possible.	2022-11-04 15:52:48 +00:00
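To illustrate why size-mismatched casts are ambiguous on big-endian systems, here is a hedged C++17 sketch of a size-tolerant cast. `bitCastAnySize` is a hypothetical name and is not ClickHouse's implementation; it only mimics the property described above (accepting sizeof(From) != sizeof(To)). When the sizes match, C++20's std::bit_cast from `<bit>` is the standard replacement.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical size-tolerant cast: zero-fill the destination, then copy
// min(sizeof(From), sizeof(To)) bytes starting at the lowest address.
// Which bytes end up in the result depends on the host byte order.
template <typename To, typename From>
To bitCastAnySize(const From & from)
{
    static_assert(std::is_trivially_copyable_v<To> && std::is_trivially_copyable_v<From>);
    To to{};
    std::memcpy(&to, &from, std::min(sizeof(From), sizeof(To)));
    return to;
}
```

With equal sizes the result is platform-independent (1.0f always maps to 0x3F800000 on IEEE 754 hosts). Narrowing `std::uint32_t{0x12345678}` to 16 bits, however, yields 0x5678 on a little-endian host and 0x1234 on a big-endian one, which is the ambiguity the commit avoids.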
Alexey Milovidov	bac578b23a	Merge pull request #41428 from ClickHouse/remove-dlopen Remove `dlopen`	2022-09-18 00:09:57 +03:00
Alexey Milovidov	ada7a44ae4	Remove -WithTerminatingZero methods	2022-09-17 05:34:18 +02:00
Alexey Milovidov	35cce03125	Remove dlopen	2022-09-17 03:02:34 +02:00
Robert Schulze	fd97058e45	fix: incorporate review comments	2022-09-14 15:21:24 +00:00
Robert Schulze	fac1be9700	chore: restore SYSTEM RELOAD MODEL(S) and monitoring view SYSTEM.MODELS - This commit restores statements "SYSTEM RELOAD MODEL(S)" which provide a mechanism to update a model explicitly. It also saves potentially unnecessary reloads of a model from disk after its initial load. To keep the complexity low, the semantics of "SYSTEM RELOAD MODEL(S)" were changed from eager to lazy. This means that both statements previously immediately reloaded the specified/all models, whereas now the statements only trigger an unload and the first call to catboostEvaluate() does the actual load. - Monitoring view SYSTEM.MODELS is also restored but with some obsolete fields removed. The view was not documented in the past and for now it remains undocumented. The commit is thus not considered a breach of ClickHouse's public interface.	2022-09-12 19:33:02 +00:00
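The eager-to-lazy change described above can be sketched in C++17 as a cache in which "reload" only evicts and the next lookup performs the load. `LazyModelCache` and `LoadedModel` are hypothetical names, and loading is simulated with a counter instead of reading a model from disk.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for a loaded model; a real one would hold the
// catboost library handle for the model file.
struct LoadedModel
{
    std::string path;
    int load_number;
};

// Lazy reload: reload()/reloadAll() only evict, and the next get()
// (the analogue of the first catboostEvaluate() call) performs the load.
class LazyModelCache
{
public:
    std::shared_ptr<const LoadedModel> get(const std::string & path)
    {
        std::lock_guard<std::mutex> lock(mutex);
        auto it = models.find(path);
        if (it == models.end())
        {
            ++loads; // simulated load from disk
            it = models.emplace(path, std::make_shared<LoadedModel>(LoadedModel{path, loads})).first;
        }
        return it->second;
    }

    // Analogue of reloading one model: unload only.
    void reload(const std::string & path)
    {
        std::lock_guard<std::mutex> lock(mutex);
        models.erase(path);
    }

    // Analogue of reloading all models: unload everything.
    void reloadAll()
    {
        std::lock_guard<std::mutex> lock(mutex);
        models.clear();
    }

    int loadCount() const
    {
        std::lock_guard<std::mutex> lock(mutex);
        return loads;
    }

private:
    mutable std::mutex mutex;
    std::map<std::string, std::shared_ptr<const LoadedModel>> models;
    int loads = 0;
};
```

Returning `shared_ptr` keeps a model alive for queries that already hold it, even after a concurrent reload has evicted it from the cache.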
Robert Schulze	60f9f6855d	feat: implement catboost in library-bridge This commit moves the catboost model evaluation out of the server process into the library-bridge binary. This serves two goals: On the one hand, crashes / memory corruptions of the catboost library no longer affect the server. On the other hand, we can forbid loading dynamic libraries in the server (catboost was the last consumer of this functionality), thus improving security. SQL syntax: SELECT catboostEvaluate('/path/to/model.bin', FEAT_1, ..., FEAT_N) > 0 AS prediction, ACTION AS target FROM amazon_train LIMIT 10 Required configuration: <catboost_lib_path>/path/to/libcatboostmodel.so</catboost_lib_path> * Implementation Details * The internal protocol between the server and the library-bridge is simple: - HTTP GET on path "/extdict_ping": A ping, used during the handshake to check if the library-bridge runs. - HTTP POST on path "extdict_request" (1) Send a "catboost_GetTreeCount" request from the server to the bridge, containing a library path (e.g. /home/user/libcatboost.so) and a model path (e.g. /home/user/model.bin). First, this unloads the catboost library handler associated with the model path (if it was loaded), then loads the catboost library handler associated with the model path, then executes GetTreeCount() on the library handler and finally sends the result back to the server. Step (1) is called once by the server from FunctionCatBoostEvaluate::getReturnTypeImpl(). The library path handler is unloaded in the beginning because it contains state which may no longer be valid if the user runs catboost("/path/to/model.bin", ...) more than once and if "model.bin" was updated in between. (2) Send "catboost_Evaluate" from the server to the bridge, containing the model path and the features to run the inference on. Step (2) is called multiple times (once per chunk) by the server from function FunctionCatBoostEvaluate::executeImpl(). The library handler for the given model path is expected to be already loaded by Step (1). Fixes #27870	2022-09-08 09:01:32 +00:00
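A rough C++17 sketch of the bridge-side bookkeeping implied by steps (1) and (2), assuming one handler per model path. `CatBoostBridgeState`, `HandlerStub` and the placeholder prediction are hypothetical; a real handler would call into libcatboostmodel.so.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a catboost library handler bound to one model.
struct HandlerStub
{
    std::string library_path;
    size_t tree_count = 0; // a real handler would ask the library
    int generation = 0;    // increments on every (re)load
};

class CatBoostBridgeState
{
public:
    // Step (1): unload any handler for this model path, load a fresh one,
    // and return its tree count.
    size_t getTreeCount(const std::string & library_path, const std::string & model_path)
    {
        handlers.erase(model_path);
        HandlerStub & handler = handlers[model_path];
        handler.library_path = library_path;
        handler.generation = ++loads;
        return handler.tree_count;
    }

    // Step (2): evaluate one chunk; the handler must already exist.
    std::vector<double> evaluate(const std::string & model_path, const std::vector<std::vector<float>> & rows) const
    {
        if (handlers.find(model_path) == handlers.end())
            throw std::logic_error("catboost_Evaluate before catboost_GetTreeCount for " + model_path);
        // Placeholder: one value per row, not a real model output.
        return std::vector<double>(rows.size(), 0.0);
    }

    int generationOf(const std::string & model_path) const { return handlers.at(model_path).generation; }

private:
    std::map<std::string, HandlerStub> handlers;
    int loads = 0;
};
```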
Robert Schulze	912663b719	Revert "Move CatBoost evaluation into clickhouse-library-bridge"	2022-08-31 20:54:43 +02:00
Robert Schulze	4ed1e1a5be	perf: don't copy data around unnecessarily	2022-08-29 20:27:06 +00:00
Robert Schulze	35a37c91f8	chore: incorporate review feedback	2022-08-29 20:27:06 +00:00
robot-clickhouse	64fa077148	style: fix style	2022-08-29 20:27:06 +00:00
Robert Schulze	6b2b3c1eb3	feat: implement catboost in library-bridge This commit moves the catboost model evaluation out of the server process into the library-bridge binary. This serves two goals: On the one hand, crashes / memory corruptions of the catboost library no longer affect the server. On the other hand, we can forbid loading dynamic libraries in the server (catboost was the last consumer of this functionality), thus improving security. SQL syntax: SELECT catboostEvaluate('/path/to/model.bin', FEAT_1, ..., FEAT_N) > 0 AS prediction, ACTION AS target FROM amazon_train LIMIT 10 Required configuration: <catboost_lib_path>/path/to/libcatboostmodel.so</catboost_lib_path> * Implementation Details * The internal protocol between the server and the library-bridge is simple: - HTTP GET on path "/extdict_ping": A ping, used during the handshake to check if the library-bridge runs. - HTTP POST on path "extdict_request" (1) Send a "catboost_GetTreeCount" request from the server to the bridge, containing a library path (e.g. /home/user/libcatboost.so) and a model path (e.g. /home/user/model.bin). First, this unloads the catboost library handler associated with the model path (if it was loaded), then loads the catboost library handler associated with the model path, then executes GetTreeCount() on the library handler and finally sends the result back to the server. Step (1) is called once by the server from FunctionCatBoostEvaluate::getReturnTypeImpl(). The library path handler is unloaded in the beginning because it contains state which may no longer be valid if the user runs catboost("/path/to/model.bin", ...) more than once and if "model.bin" was updated in between. (2) Send "catboost_Evaluate" from the server to the bridge, containing the model path and the features to run the inference on. Step (2) is called multiple times (once per chunk) by the server from function FunctionCatBoostEvaluate::executeImpl(). The library handler for the given model path is expected to be already loaded by Step (1). Fixes #27870	2022-08-29 20:26:45 +00:00
Robert Schulze	810221baf2	Assume unversioned server has version=0 and use tryParse() instead of from_chars()	2022-08-10 07:39:32 +00:00
Robert Schulze	e0d5020a92	Add simple versioning to the *-bridge-to-server protocol - In general, it is expected that clickhouse-*-bridges and clickhouse-server were built from the same source version (i.e. are upgraded "atomically"). If that is not the case, we should at least be able to detect the mismatch and abort. - This commit adds a URL parameter "version", defined in a header shared by the server and bridges. The bridge returns an error in case of mismatch. - The version is not sent and checked for "ping" requests (used for handshake), only for regular requests sent after handshake. This is because the internally thrown server-side exception due to HTTP failure does not propagate the exact HTTP error (it only stores the error as text), and as a result, the server-side handshake code simply retries in case of error with exponential backoff and finally fails with a "timeout error". This is reasonable as pings typically fail due to timeouts. However, without a rework of HTTP exceptions, version mismatch during ping would also appear as "timeout" which is too misleading. The behavior may be changed later if needed. - Note that introducing a version parameter does not represent a protocol upgrade itself. Bridges older than the server will simply ignore the field. Only servers older than the bridges receive an error but such a situation should never occur in practice.	2022-08-08 19:40:37 +00:00
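A hedged C++17 sketch of the bridge-side check, assuming the query string has already been parsed into a map. The constant `kBridgeProtocolVersion` (assumed to be 1 here) and both functions are hypothetical. A missing or unparsable "version" is treated as 0, following the follow-up commit 810221baf2 listed above; std::from_chars stands in for the tryParse() that commit switched to.

```cpp
#include <charconv>
#include <map>
#include <string>
#include <system_error>

// Hypothetical shared constant; the commit only says it lives in a header
// shared by the server and the bridges. Assumed to be 1 for this sketch.
constexpr unsigned kBridgeProtocolVersion = 1;

// Hypothetical representation of already-parsed URL parameters.
using QueryParams = std::map<std::string, std::string>;

// A missing or unparsable "version" parameter is treated as 0 ("unversioned").
unsigned requestVersion(const QueryParams & params)
{
    const auto it = params.find("version");
    if (it == params.end())
        return 0;

    const std::string & text = it->second;
    unsigned value = 0;
    const auto [end, ec] = std::from_chars(text.data(), text.data() + text.size(), value);
    if (ec != std::errc{} || end != text.data() + text.size())
        return 0;
    return value;
}

// Pings are exempt from the check; every other request must match exactly.
bool acceptRequest(bool is_ping, const QueryParams & params)
{
    return is_ping || requestVersion(params) == kBridgeProtocolVersion;
}
```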
Robert Schulze	9952ab1099	Prefix class names "LibraryBridge*Handler" with "ExternalDictionary" - necessary to disambiguate the names from "CatBoost"-"LibraryBridgeHandler" which will be added in a next step	2022-08-08 17:16:46 +00:00
Robert Schulze	20bb8a248e	Prepare server-side BridgeHelper for catboost integration Wall of text, sorry, but I also had to document some stuff for myself: There are three ways to communicate data using HTTP: - the HTTP verb: for our purposes, PUT and GET, - the HTTP path: '/ping', '/request' etc., - the HTTP URL parameter(s), e.g. 'method=libNew&dictionary_id=1234' The bridge will use different handlers for communication with the external dictionary library and for communication with the catboost library. Handlers are created based on a combination of the HTTP verb and the HTTP path. More specifically, there will be combinations - GET + '/extdict_ping' - PUT + '/extdict_request' - GET + '/catboost_ping' - PUT + '/catboost_request'. For each combination, the bridge expects a certain set of URL parameters, e.g. for the first combination parameter "dictionary_id" is expected. Starting with this commit, the library-bridge creates handlers based on the first two combinations (the latter two combinations will be added later). This makes the handler creation mechanism consistent with its counterpart in xdbc-bridge. For that, it was necessary to make both IBridgeHelper methods "getMainURI()" and "getPingURI()" pure virtual so that derived classes (LibraryBridgeHelper and XDBCBridgeHelper) must provide custom URLs with custom paths. Side note 1: Previously, LibraryBridgeHelper sent HTTP URL parameter "method=ping" during handshake (PING) but the library-bridge ignored that parameter. We now omit this parameter, i.e. LibraryBridgeHelper::PING was removed. Again, this makes things consistent with xdbc-bridge. Side note 2: xdbc-bridge is unchanged in this commit. Therefore, XDBCBridgeHelper now uses the HTTP paths previously in the base class. For some reason, XDBCBridgeHelper did not use IBridgeHelper::getMainURI() - it generates the URLs by itself. I kept it that way for now but provided an implementation of getMainURI() anyway.	2022-08-04 19:29:51 +00:00
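Handler selection from the (HTTP verb, HTTP path) combinations listed in this commit could look roughly like the C++ sketch below. `BridgeHandler` and `chooseHandler` are hypothetical names, and the sketch includes all four combinations even though the commit only wires up the first two. Because this message lists PUT for requests while the later catboost commit describes POST, the sketch accepts both.

```cpp
#include <string>

// Hypothetical handler kinds, one per combination listed in the commit message.
enum class BridgeHandler { ExtDictPing, ExtDictRequest, CatBoostPing, CatBoostRequest, NotFound };

BridgeHandler chooseHandler(const std::string & verb, const std::string & path)
{
    const bool is_get = (verb == "GET");
    const bool is_write = (verb == "PUT" || verb == "POST");

    if (is_get && path == "/extdict_ping")
        return BridgeHandler::ExtDictPing;
    if (is_write && path == "/extdict_request")
        return BridgeHandler::ExtDictRequest;
    if (is_get && path == "/catboost_ping")
        return BridgeHandler::CatBoostPing;
    if (is_write && path == "/catboost_request")
        return BridgeHandler::CatBoostRequest;

    // Any other verb/path pair has no handler.
    return BridgeHandler::NotFound;
}
```

Each handler would then validate its own expected URL parameters (e.g. "dictionary_id"), keeping parameter checks out of the dispatch step.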
Robert Schulze	ea73b98fb9	Prepare library-bridge for catboost integration - Rename generic file and identifier names in library-bridge to something more dictionary-specific. This is needed because later on, catboost will be integrated into library-bridge. - Also: Some smaller fixes like typos and un-inlining non-performance critical code. - The logic remains unchanged in this commit.	2022-08-04 19:26:51 +00:00
Robert Schulze	1a7727a254	Prefix overridden add_executable() command with "clickhouse_" A simple HelloWorld program with zero includes except iostream triggers a build of ca. 2000 source files. The reason is that ClickHouse's top-level CMakeLists.txt overrides "add_executable()" to link all binaries against "clickhouse_new_delete". This links against "clickhouse_common_io", which in turn has lots of 3rd party library dependencies ... Without linking "clickhouse_new_delete", the number of compiled files for "HelloWorld" goes down to ca. 70. As an example, the self-extracting-executable needs none of its current dependencies but other programs may also benefit. In order to restore access to the original "add_executable()", the overriding version is now prefixed. There is precedent for a "clickhouse_" prefix (as opposed to "ch_"), for example "clickhouse_split_debug_symbols". In general, prefixing also makes sense because overriding CMake commands relies on undocumented behavior and is considered not-so-great practice (*). (*) https://crascit.com/2018/09/14/do-not-redefine-cmake-commands/	2022-07-11 19:36:18 +02:00
Robert Schulze	bb358617e1	Better naming for stuff related to split debug symbols The previous name was slightly misleading, e.g. it is not about "installing stripped binaries" but about splitting debug symbols from the binary.	2022-06-30 23:41:27 +02:00
Robert Schulze	5a4f21c50f	Support for Clang Thread Safety Analysis (TSA) - TSA is a static analyzer built by Google which finds race conditions and deadlocks at compile time. - It works by associating a shared member variable with a synchronization primitive that protects it. The compiler can then check at each access if proper locking happened before. Good introductions are [0] and [1]. - TSA requires some help from the programmer via annotations. Luckily, LLVM's libcxx already has annotations for std::mutex, std::lock_guard, std::shared_mutex and std::scoped_lock. This commit enables them (--> contrib/libcxx-cmake/CMakeLists.txt). - Further, this commit adds convenience macros for the low-level annotations for use in ClickHouse (--> base/defines.h). For demonstration, they are leveraged in a few places. - As we compile with "-Wall -Wextra -Weverything", the required compiler flag "-Wthread-safety-analysis" was already enabled. Negative checks are an experimental feature of TSA and disabled (--> cmake/warnings.cmake). Compile times did not increase noticeably. - TSA is used in a few places with simple locking. I tried TSA also where locking is more complex. The problem was usually that it is unclear which data is protected by which lock :-(. But there was definitely some weird code where locking looked broken. So there is some potential to find bugs. *** Limitations of TSA besides the ones listed in [1]: - The programmer needs to know which lock protects which piece of shared data. This is not always easy for large classes. - Two synchronization primitives used in ClickHouse are not annotated in libcxx: (1) std::unique_lock: A releasable lock handle, often used together with std::condition_variable, e.g. to solve producer-consumer problems. (2) std::recursive_mutex: A re-entrant mutex variant. Its usage can be considered a design flaw + typically it is slower than a standard mutex. In this commit, one std::recursive_mutex was converted to std::mutex and annotated with TSA. - For free-standing functions (e.g. helper functions) which are passed shared data members, it can be tricky to specify the associated lock. This is because the annotations use the normal C++ rules for symbol resolution. [0] https://clang.llvm.org/docs/ThreadSafetyAnalysis.html [1] https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42958.pdf	2022-06-20 16:13:25 +02:00
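A minimal C++17 sketch of the annotation style described above. The `TSA_*` macro names are assumptions modeled on the convenience macros the commit mentions, and the attributes are the ones documented in [0]; on compilers other than Clang the macros expand to nothing. To stay independent of library-provided annotations, the example uses its own annotated `Mutex` wrapper and locks and unlocks explicitly.

```cpp
#include <mutex>

// Assumed macro names; they map to Clang's thread-safety attributes.
#if defined(__clang__)
#    define TSA_CAPABILITY(x) __attribute__((capability(x)))
#    define TSA_GUARDED_BY(x) __attribute__((guarded_by(x)))
#    define TSA_ACQUIRE(...) __attribute__((acquire_capability(__VA_ARGS__)))
#    define TSA_RELEASE(...) __attribute__((release_capability(__VA_ARGS__)))
#    define TSA_REQUIRES(...) __attribute__((requires_capability(__VA_ARGS__)))
#else
#    define TSA_CAPABILITY(x)
#    define TSA_GUARDED_BY(x)
#    define TSA_ACQUIRE(...)
#    define TSA_RELEASE(...)
#    define TSA_REQUIRES(...)
#endif

// Annotated wrapper so the analysis knows lock() acquires and unlock() releases.
class TSA_CAPABILITY("mutex") Mutex
{
public:
    void lock() TSA_ACQUIRE() { m.lock(); }
    void unlock() TSA_RELEASE() { m.unlock(); }

private:
    std::mutex m;
};

class Counter
{
    Mutex mutex;
    int value TSA_GUARDED_BY(mutex) = 0;

    // Callers must already hold `mutex`; Clang checks this at each call site.
    void incrementLocked() TSA_REQUIRES(mutex) { ++value; }

public:
    void increment()
    {
        mutex.lock();
        incrementLocked();
        mutex.unlock();
    }

    int get()
    {
        mutex.lock();
        const int result = value;
        mutex.unlock();
        return result;
    }
};
```

Compiled with `clang++ -Wthread-safety`, deleting the `mutex.lock()` call in `increment()` should produce a warning that `incrementLocked()` requires holding `mutex`.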
Amos Bird	4a5e4274f0	base should not depend on Common	2022-04-29 10:26:35 +08:00
Robert Schulze	118e94523c	Activate clang-tidy warning "readability-container-contains" This check suggests replacing <Container>.count() with <Container>.contains(), which is more expressive and, in the case of multimaps/multisets, also faster.	2022-04-18 23:53:11 +02:00
alesapin	e790a73081	Simplify strip for new packages	2022-03-23 15:14:30 +01:00
Mikhail f. Shiryaev	1d362796df	Fix strip bug	2022-03-22 11:10:02 +01:00
Mikhail f. Shiryaev	fa2a9bb9aa	Separate BUILD_STRIPPED_BINARIES_PREFIX to option and parameter	2022-03-22 11:10:02 +01:00
alesapin	e53578910b	Add ability to strip binaries in cmake	2022-03-10 22:23:28 +01:00
Azat Khuzhin	3b3635c6d5	Fix formatting error in logging messages Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2022-02-01 14:30:04 +03:00
Azat Khuzhin	bedf208cbd	Use fmt::runtime() for LOG_* for non constexpr Here is a one-liner: $ gg 'LOG_\(DEBUG\|TRACE\|INFO\|TEST\|WARNING\|ERROR\|FATAL\)([^,]*, [a-zA-Z]' -- :*.cpp :*.h | cut -d: -f1 | sort -u | xargs -r sed -E -i 's#(LOG_[A-Z]*)\(([^,]*), ([A-Za-z][^,)]*)#\1(\2, fmt::runtime(\3)#' Note that I tried to do this with coccinelle (a tool for semantic patching), but it cannot parse C++: $ cat fmt.cocci @@ expression log; expression var; @@ -LOG_DEBUG(log, var) +LOG_DEBUG(log, fmt::runtime(var)) I've also tried to use some macro/template magic to do this implicitly in logger_useful.h, but I failed to do so, and apparently it is not possible for now. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> v2: manual fixes Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2022-02-01 14:30:03 +03:00
Nikolai Kochetov	a08c98d760	Move some files.	2021-10-16 17:03:50 +03:00
Nikolai Kochetov	ab28c6c855	Remove BlockInputStream interfaces.	2021-10-14 13:25:43 +03:00
Nikolai Kochetov	ec18340351	Remove streams from formats.	2021-10-11 19:11:50 +03:00
Alexey Milovidov	fe6b7c77c7	Rename "common" to "base"	2021-10-02 10:13:14 +03:00
kssenii	294695bb7d	Review fixes	2021-08-02 13:40:58 +00:00
kssenii	9c6a8b0059	Restore previous ids passing	2021-08-01 08:59:19 +00:00
kssenii	130253e3b9	Fix bridge-server interaction in case of metadata inconsistency	2021-08-01 08:59:16 +00:00
Raúl Marín	2442216472	Fix style too	2021-07-28 11:39:53 +02:00
kssenii	3fe5e8d1ce	Fix	2021-07-28 08:30:58 +00:00
kssenii	6c220c8b35	Fix ids parsing	2021-07-27 20:54:21 +00:00
Nikolai Kochetov	d03bcebc8e	Remove debug logging.	2021-07-23 12:05:42 +03:00
Nikolai Kochetov	f38de35b14	Rename some constants.	2021-07-21 19:13:17 +03:00
alexey-milovidov	04be5437d9	Merge pull request #25296 from abyss7/http-issues Add settings for HTTP header limitations	2021-06-17 01:50:48 +03:00
Maksim Kita	67e9b85951	Merge ext into common	2021-06-16 23:28:41 +03:00
Ivan Lezhankin	ba08a580f8	Add test	2021-06-16 17:33:14 +03:00
Ivan Lezhankin	b182d87d9c	Add settings for HTTP header limitations	2021-06-15 17:33:46 +03:00
Alexey Milovidov	e905883c75	More fixes for PVS-Studio	2021-05-08 19:12:31 +03:00
Maksim Kita	dcf41db1ae	Merge pull request #23048 from kitaisreal/library-dictionary-bridge-library-interface LibraryDictionary bridge library interface	2021-04-14 11:23:29 +03:00


82 Commits