ClickHouse/programs/library-bridge/LibraryBridgeHandlerFactory.cpp


#include "LibraryBridgeHandlerFactory.h"

#include <Poco/Net/HTTPServerRequest.h>
#include <Server/HTTP/HTMLForm.h>
#include "LibraryBridgeHandlers.h"

namespace DB
{
LibraryBridgeHandlerFactory::LibraryBridgeHandlerFactory(
    const std::string & name_,
    size_t keep_alive_timeout_,
    ContextPtr context_)
    : WithContext(context_)
    , log(&Poco::Logger::get(name_))
    , name(name_)
    , keep_alive_timeout(keep_alive_timeout_)
{
}

/// Selects a handler from the combination of HTTP verb and URL path.
/// Returns nullptr if no combination matches.
std::unique_ptr<HTTPRequestHandler> LibraryBridgeHandlerFactory::createRequestHandler(const HTTPServerRequest & request)
{
    Poco::URI uri{request.getURI()};
    LOG_DEBUG(log, "Request URI: {}", uri.toString());

    /// GET requests are pings, used during the handshake to check that the library-bridge is running.
    if (request.getMethod() == Poco::Net::HTTPRequest::HTTP_GET)
    {
        if (uri.getPath() == "/extdict_ping")
            return std::make_unique<ExternalDictionaryLibraryBridgeExistsHandler>(keep_alive_timeout, getContext());
        else if (uri.getPath() == "/catboost_ping")
            return std::make_unique<CatBoostLibraryBridgeExistsHandler>(keep_alive_timeout, getContext());
    }

    /// POST requests carry the actual library calls (external dictionary or catboost).
    if (request.getMethod() == Poco::Net::HTTPRequest::HTTP_POST)
    {
        if (uri.getPath() == "/extdict_request")
            return std::make_unique<ExternalDictionaryLibraryBridgeRequestHandler>(keep_alive_timeout, getContext());
        else if (uri.getPath() == "/catboost_request")
            return std::make_unique<CatBoostLibraryBridgeRequestHandler>(keep_alive_timeout, getContext());
    }

    return nullptr;
}
}
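
The routing in createRequestHandler() can be tried out without Poco or the ClickHouse headers. The following is a minimal sketch, not the actual implementation: HandlerKind and selectHandler() are hypothetical names introduced for illustration, the verb is passed as a plain string ("GET" or "POST"), and the path is assumed to have its query string already stripped, as Poco::URI::getPath() would return it.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical stand-in for the four handler classes; not part of ClickHouse.
enum class HandlerKind
{
    ExtDictExists,
    ExtDictRequest,
    CatBoostExists,
    CatBoostRequest,
};

// Maps (HTTP verb, URL path) to a handler kind, mirroring the if/else chain
// of the factory. Returns std::nullopt where the factory returns nullptr.
inline std::optional<HandlerKind> selectHandler(std::string_view method, std::string_view path)
{
    if (method == "GET")
    {
        if (path == "/extdict_ping")
            return HandlerKind::ExtDictExists;
        if (path == "/catboost_ping")
            return HandlerKind::CatBoostExists;
    }

    if (method == "POST")
    {
        if (path == "/extdict_request")
            return HandlerKind::ExtDictRequest;
        if (path == "/catboost_request")
            return HandlerKind::CatBoostRequest;
    }

    return std::nullopt;
}

// Usage example: a matching pair yields a handler kind, anything else yields nothing.
inline void checkRoutingExamples()
{
    assert(selectHandler("GET", "/catboost_ping") == HandlerKind::CatBoostExists);
    assert(!selectHandler("GET", "/catboost_request").has_value());
}
```

A request only matches when both the verb and the path agree, so a POST to a ping path or a GET to a request path falls through to the "no handler" case, just as the factory falls through to `return nullptr`.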