Commit Graph

20 Commits

Author SHA1 Message Date
Alexey Milovidov
35cce03125 Remove dlopen 2022-09-17 03:02:34 +02:00
Robert Schulze
60f9f6855d
feat: implement catboost in library-bridge
This commit moves the catboost model evaluation out of the server
process into the library-bridge binary. This serves two goals: On the
one hand, crashes / memory corruptions of the catboost library no longer
affect the server. On the other hand, we can forbid loading dynamic
libraries in the server (catboost was the last consumer of this
functionality), thus improving security.

SQL syntax:

  SELECT
    catboostEvaluate('/path/to/model.bin', FEAT_1, ..., FEAT_N) > 0 AS prediction,
    ACTION AS target
  FROM amazon_train
  LIMIT 10

Required configuration:

  <catboost_lib_path>/path/to/libcatboostmodel.so</catboost_lib_path>

*** Implementation Details ***

The internal protocol between the server and the library-bridge is
simple:

- HTTP GET on path "/extdict_ping":
  A ping, used during the handshake to check if the library-bridge runs.

- HTTP POST on path "extdict_request"
  (1) Send a "catboost_GetTreeCount" request from the server to the
      bridge, containing a library path (e.g /home/user/libcatboost.so) and
      a model path (e.g. /home/user/model.bin). Rirst, this unloads the
      catboost library handler associated to the model path (if it was
      loaded), then loads the catboost library handler associated to the
      model path, then executes GetTreeCount() on the library handler and
      finally sends the result back to the server. Step (1) is called once
      by the server from FunctionCatBoostEvaluate::getReturnTypeImpl(). The
      library path handler is unloaded in the beginning because it contains
      state which may no longer be valid if the user runs
      catboost("/path/to/model.bin", ...) more than once and if "model.bin"
      was updated in between.
  (2) Send "catboost_Evaluate" from the server to the bridge, containing
      the model path and the features to run the interference on. Step (2)
      is called multiple times (once per chunk) by the server from function
      FunctionCatBoostEvaluate::executeImpl(). The library handler for the
      given model path is expected to be already loaded by Step (1).

Fixes #27870
2022-09-08 09:01:32 +00:00
Robert Schulze
912663b719
Revert "Move CatBoost evaluation into clickhouse-library-bridge" 2022-08-31 20:54:43 +02:00
Robert Schulze
6b2b3c1eb3
feat: implement catboost in library-bridge
This commit moves the catboost model evaluation out of the server
process into the library-bridge binary. This serves two goals: On the
one hand, crashes / memory corruptions of the catboost library no longer
affect the server. On the other hand, we can forbid loading dynamic
libraries in the server (catboost was the last consumer of this
functionality), thus improving security.

SQL syntax:

  SELECT
    catboostEvaluate('/path/to/model.bin', FEAT_1, ..., FEAT_N) > 0 AS prediction,
    ACTION AS target
  FROM amazon_train
  LIMIT 10

Required configuration:

  <catboost_lib_path>/path/to/libcatboostmodel.so</catboost_lib_path>

*** Implementation Details ***

The internal protocol between the server and the library-bridge is
simple:

- HTTP GET on path "/extdict_ping":
  A ping, used during the handshake to check if the library-bridge runs.

- HTTP POST on path "extdict_request"
  (1) Send a "catboost_GetTreeCount" request from the server to the
      bridge, containing a library path (e.g /home/user/libcatboost.so) and
      a model path (e.g. /home/user/model.bin). Rirst, this unloads the
      catboost library handler associated to the model path (if it was
      loaded), then loads the catboost library handler associated to the
      model path, then executes GetTreeCount() on the library handler and
      finally sends the result back to the server. Step (1) is called once
      by the server from FunctionCatBoostEvaluate::getReturnTypeImpl(). The
      library path handler is unloaded in the beginning because it contains
      state which may no longer be valid if the user runs
      catboost("/path/to/model.bin", ...) more than once and if "model.bin"
      was updated in between.
  (2) Send "catboost_Evaluate" from the server to the bridge, containing
      the model path and the features to run the interference on. Step (2)
      is called multiple times (once per chunk) by the server from function
      FunctionCatBoostEvaluate::executeImpl(). The library handler for the
      given model path is expected to be already loaded by Step (1).

Fixes #27870
2022-08-29 20:26:45 +00:00
Robert Schulze
ea73b98fb9
Prepare library-bridge for catboost integration
- Rename generic file and identifier names in library-bridge to
  something more dictionary-specific. This is needed because later on,
  catboost will be integrated into library-bridge.

- Also: Some smaller fixes like typos and un-inlining non-performance
  critical code.

- The logic remains unchanged in this commit.
2022-08-04 19:26:51 +00:00
Robert Schulze
1a7727a254
Prefix overridden add_executable() command with "clickhouse_"
A simple HelloWorld program with zero includes except iostream triggers
a build of ca. 2000 source files. The reason is that ClickHouse's
top-level CMakeLists.txt overrides "add_executable()" to link all
binaries against "clickhouse_new_delete". This links against
"clickhouse_common_io", which in turn has lots of 3rd party library
dependencies ... Without linking "clickhouse_new_delete", the number of
compiled files for "HelloWorld" goes down to ca. 70.

As an example, the self-extracting-executable needs none of its current
dependencies but other programs may also benefit.

In order to restore access to the original "add_executable()", the
overriding version is now prefixed. There is precedence for a
"clickhouse_" prefix (as opposed to "ch_"), for example
"clickhouse_split_debug_symbols". In general prefixing makes sense also
because overriding CMake commands relies on undocumented behavior and is
considered not-so-great practice (*).

(*) https://crascit.com/2018/09/14/do-not-redefine-cmake-commands/
2022-07-11 19:36:18 +02:00
Robert Schulze
bb358617e1
Better naming for stuff related to splitted debug symbols
The previous name was slightly misleading, e.g. it is not about
"intalling stripped binaries" but about splitting debug symbols from the
binary.
2022-06-30 23:41:27 +02:00
alesapin
e790a73081 Simplify strip for new packages 2022-03-23 15:14:30 +01:00
Mikhail f. Shiryaev
1d362796df
Fix strip bug 2022-03-22 11:10:02 +01:00
Mikhail f. Shiryaev
fa2a9bb9aa
Separate BUILD_STRIPPED_BINARIES_PREFIX to option and parameter 2022-03-22 11:10:02 +01:00
alesapin
e53578910b Add ability to strip binaries in cmake 2022-03-10 22:23:28 +01:00
Maksim Kita
dcf41db1ae
Merge pull request #23048 from kitaisreal/library-dictionary-bridge-library-interface
LibraryDictionary bridge library interface
2021-04-14 11:23:29 +03:00
Maksim Kita
616d7d19f8 LibraryDictionary bridge library interface 2021-04-13 22:53:36 +03:00
kssenii
7a89948801 Fix 2021-04-07 07:16:50 +00:00
kssenii
4419a430cb Less dependencies 2021-04-06 20:15:32 +00:00
kssenii
323fb54a8e Fix split build finally 2021-03-12 21:12:34 +00:00
kssenii
02ed33936a Fix split build 2021-03-12 12:54:49 +00:00
kssenii
b7b15fe920 Remove unused 2021-03-07 15:23:20 +00:00
kssenii
94af06588e Make libraries storage as singleton 2021-03-06 18:38:01 +00:00
kssenii
dd4a7b6e3d First version 2021-03-05 10:19:01 +00:00