ClickHouse/programs
Robert Schulze 60f9f6855d
feat: implement catboost in library-bridge
This commit moves the catboost model evaluation out of the server
process into the library-bridge binary. This serves two goals: First,
crashes / memory corruptions of the catboost library no longer affect
the server. Second, we can forbid loading dynamic libraries in the
server (catboost was the last consumer of this functionality), thus
improving security.

SQL syntax:

  SELECT
    catboostEvaluate('/path/to/model.bin', FEAT_1, ..., FEAT_N) > 0 AS prediction,
    ACTION AS target
  FROM amazon_train
  LIMIT 10

Required configuration:

  <catboost_lib_path>/path/to/libcatboostmodel.so</catboost_lib_path>

*** Implementation Details ***

The internal protocol between the server and the library-bridge is
simple:

- HTTP GET on path "/extdict_ping":
  A ping, used during the handshake to check if the library-bridge runs.

- HTTP POST on path "/extdict_request":
  (1) Send a "catboost_GetTreeCount" request from the server to the
      bridge, containing a library path (e.g. /home/user/libcatboost.so) and
      a model path (e.g. /home/user/model.bin). First, this unloads the
      catboost library handler associated with the model path (if it was
      loaded), then loads the catboost library handler associated with the
      model path, then executes GetTreeCount() on the library handler and
      finally sends the result back to the server. Step (1) is called once
      by the server from FunctionCatBoostEvaluate::getReturnTypeImpl(). The
      library handler is unloaded in the beginning because it contains
      state which may no longer be valid if the user runs
      catboostEvaluate('/path/to/model.bin', ...) more than once and if
      "model.bin" was updated in between.
  (2) Send a "catboost_Evaluate" request from the server to the bridge,
      containing the model path and the features to run the inference on.
      Step (2) is called multiple times (once per chunk) by the server from
      FunctionCatBoostEvaluate::executeImpl(). The library handler for the
      given model path is expected to be already loaded by Step (1).
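
The handler lifecycle behind steps (1) and (2) can be sketched roughly as
follows. This is an illustration only, not the code in library-bridge:
FakeHandler, HandlerRegistry, loadCount() and the returned values are
hypothetical names and placeholders, and the real bridge would load the
shared library and the model file instead of just recording their paths.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a loaded catboost library handler. A real handler
// would load the shared library and the model; this one only records paths.
struct FakeHandler
{
    std::string library_path;
    std::string model_path;

    // Placeholder values, not real catboost results.
    size_t getTreeCount() const { return 42; }

    std::vector<double> evaluate(const std::vector<std::vector<double>> & rows) const
    {
        return std::vector<double>(rows.size(), 0.0);
    }
};

// Hypothetical bridge-side registry: at most one handler per model path.
class HandlerRegistry
{
public:
    // Step (1): drop a possibly stale handler, load a fresh one and
    // return its tree count.
    size_t getTreeCount(const std::string & library_path, const std::string & model_path)
    {
        handlers.erase(model_path);
        auto handler = std::make_shared<FakeHandler>(FakeHandler{library_path, model_path});
        handlers[model_path] = handler;
        ++loads;
        assert(handlers.count(model_path) == 1);
        return handler->getTreeCount();
    }

    // Step (2): expects that step (1) has already loaded the handler.
    std::vector<double> evaluate(
        const std::string & model_path, const std::vector<std::vector<double>> & rows) const
    {
        auto it = handlers.find(model_path);
        if (it == handlers.end())
            throw std::runtime_error("no handler loaded for " + model_path);
        return it->second->evaluate(rows);
    }

    // Number of (re)loads so far, only here to make the reload visible.
    size_t loadCount() const { return loads; }

private:
    std::map<std::string, std::shared_ptr<FakeHandler>> handlers;
    size_t loads = 0;
};
```

In this sketch, calling getTreeCount() twice for the same model path
reloads the handler each time, which mirrors why step (1) unloads first:
a model file updated between two queries is picked up on the next query.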

Fixes #27870
2022-09-08 09:01:32 +00:00
..
bash-completion Update available formats for bash completion 2022-08-28 17:22:32 +02:00
benchmark Limit suppression to a specific warning 2022-08-21 18:24:17 +00:00
client Fix typos with new codespell 2022-09-02 08:54:48 +00:00
compressor fixed cosmetic issues 2022-07-15 17:23:37 -04:00
copier update tcp protocol, add quota_key 2022-08-03 15:44:08 -04:00
diagnostics Escape credentials for diagnostics tool 2022-07-29 12:25:03 +01:00
disks Added mkdir command 2022-09-02 19:30:35 +02:00
extract-from-config Allow globs in keys for clickhouse-extract-from-config tool (#38966) 2022-07-08 16:13:32 +02:00
format Activate clang-tidy warning "readability-container-contains" 2022-04-18 23:53:11 +02:00
git-import Fix typos with new codespell 2022-09-02 08:54:48 +00:00
install Limit suppression to a specific warning 2022-08-21 18:24:17 +00:00
keeper Prefix overridden add_executable() command with "clickhouse_" 2022-07-11 19:36:18 +02:00
keeper-converter Address PR comments 2022-07-27 07:51:30 +00:00
library-bridge feat: implement catboost in library-bridge 2022-09-08 09:01:32 +00:00
local Add "readpassphrase" as a dependency for clickhouse-local 2022-08-21 12:12:11 +02:00
obfuscator Better 2022-09-01 20:19:25 +00:00
odbc-bridge Assume unversioned server has version=0 and use tryParse() instead of from_chars() 2022-08-10 07:39:32 +00:00
self-extracting add native build for cross-compilation 2022-07-20 23:09:05 -04:00
server Revert "Remove trash" 2022-09-06 02:04:36 +02:00
static-files-disk-uploader Limit suppression to a specific warning 2022-08-21 18:24:17 +00:00
su Limit suppression to a specific warning 2022-08-21 18:24:17 +00:00
clickhouse-split-helper Move all folders inside /dbms one level up (#9974) 2020-04-02 02:51:21 +03:00
CMakeLists.txt feat: implement catboost in library-bridge 2022-09-08 09:01:32 +00:00
config_tools.h.in Add basic commands for disk tool (list-disks, list, move, remove, link, copy, read, write) + tests 2022-06-06 16:52:58 +03:00
embed_binary.S.in Adds a better way to include binary resources 2021-06-09 14:03:30 -07:00
main.cpp Limit suppression to a specific warning 2022-08-21 18:24:17 +00:00