ClickHouse® is a real-time analytics DBMS
Robert Schulze 6b2b3c1eb3
feat: implement catboost in library-bridge
This commit moves the catboost model evaluation out of the server
process into the library-bridge binary. This serves two goals: On the
one hand, crashes / memory corruptions of the catboost library no longer
affect the server. On the other hand, we can forbid loading dynamic
libraries in the server (catboost was the last consumer of this
functionality), thus improving security.

SQL syntax:

  SELECT
    catboostEvaluate('/path/to/model.bin', FEAT_1, ..., FEAT_N) > 0 AS prediction,
    ACTION AS target
  FROM amazon_train
  LIMIT 10
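
The `> 0` comparison in the query above can be read as a decision threshold. The following is a minimal sketch, assuming the model is a binary classifier whose raw output is a log-odds score (this is an assumption about the model, not something stated in this commit). Under that assumption, `raw > 0` is the same decision as "predicted probability above 0.5". The helper names `sigmoid` and `predict_label` are hypothetical and exist only for illustration.

```python
import math


def sigmoid(raw_score: float) -> float:
    """Map a raw log-odds score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-raw_score))


def predict_label(raw_score: float) -> bool:
    """Mirror the SQL expression `catboostEvaluate(...) > 0`."""
    return raw_score > 0


if __name__ == "__main__":
    # Made-up raw scores, only to show that both decisions agree.
    for raw in (-2.0, -0.1, 0.0, 0.1, 3.0):
        print(raw, predict_label(raw), sigmoid(raw) > 0.5)
```

If a probability is wanted instead of a label, the same sigmoid can be applied to the raw score in the query.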

Required configuration:

  <catboost_lib_path>/path/to/libcatboostmodel.so</catboost_lib_path>

*** Implementation Details ***

The internal protocol between the server and the library-bridge is
simple:

- HTTP GET on path "/extdict_ping":
  A ping, used during the handshake to check if the library-bridge runs.

- HTTP POST on path "/extdict_request":
  (1) Send a "catboost_GetTreeCount" request from the server to the
      bridge, containing a library path (e.g. /home/user/libcatboost.so) and
      a model path (e.g. /home/user/model.bin). First, this unloads the
      catboost library handler associated with the model path (if it was
      loaded), then loads the catboost library handler associated with the
      model path, then executes GetTreeCount() on the library handler and
      finally sends the result back to the server. Step (1) is called once
      by the server from FunctionCatBoostEvaluate::getReturnTypeImpl(). The
      library handler is unloaded at the beginning because it contains
      state which may no longer be valid if the user runs
      catboostEvaluate('/path/to/model.bin', ...) more than once and if
      "model.bin" was updated in between.
  (2) Send a "catboost_Evaluate" request from the server to the bridge,
      containing the model path and the features to run the inference on.
      Step (2) is called multiple times (once per chunk) by the server from
      FunctionCatBoostEvaluate::executeImpl(). The library handler for the
      given model path is expected to be already loaded by Step (1).
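
The handler lifecycle described in steps (1) and (2) can be sketched as a small in-memory model. This is an illustration only, not the library-bridge code: `BridgeSketch`, `FakeHandler`, the `loader` callback and the placeholder scoring are all hypothetical, and no HTTP or real catboost library is involved. It shows only the ordering guarantees: GetTreeCount always reloads the handler for a model path, and Evaluate expects that handler to be present already.

```python
from typing import Callable, Dict, List


class FakeHandler:
    """Hypothetical stand-in for a loaded catboost library handler."""

    def __init__(self, library_path: str, model_path: str, tree_count: int):
        self.library_path = library_path
        self.model_path = model_path
        self.tree_count = tree_count

    def get_tree_count(self) -> int:
        return self.tree_count

    def evaluate(self, rows: List[List[float]]) -> List[float]:
        # Placeholder scoring (sum of features); a real model would run here.
        return [float(sum(row)) for row in rows]


class BridgeSketch:
    """Toy model of the bridge-side state, keyed by model path."""

    def __init__(self, loader: Callable[[str, str], FakeHandler]):
        self._loader = loader
        self._handlers: Dict[str, FakeHandler] = {}

    def get_tree_count(self, library_path: str, model_path: str) -> int:
        # Step (1): drop any stale handler first, then load a fresh one,
        # so an updated model file is picked up.
        self._handlers.pop(model_path, None)
        handler = self._loader(library_path, model_path)
        self._handlers[model_path] = handler
        return handler.get_tree_count()

    def evaluate(self, model_path: str, rows: List[List[float]]) -> List[float]:
        # Step (2): called once per chunk; relies on step (1) having run.
        handler = self._handlers.get(model_path)
        if handler is None:
            raise RuntimeError(f"no handler loaded for {model_path}")
        return handler.evaluate(rows)


if __name__ == "__main__":
    tree_counts = {"/home/user/model.bin": 3}  # pretend on-disk model state
    bridge = BridgeSketch(lambda lib, model: FakeHandler(lib, model, tree_counts[model]))
    print(bridge.get_tree_count("/home/user/libcatboost.so", "/home/user/model.bin"))
    print(bridge.evaluate("/home/user/model.bin", [[1.0, 2.0], [3.0, 4.0]]))
```

Reloading on every GetTreeCount trades a little load time for the guarantee that a query never evaluates against a handler created from an older version of the model file.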

Fixes #27870
2022-08-29 20:26:45 +00:00
.github use input token instead of env var 2022-08-19 14:55:02 -04:00
base Merge pull request #40224 from ClickHouse/alexey-milovidov-patch-4 2022-08-22 23:12:01 +03:00
benchmark Remove old file 2022-07-12 20:28:02 +02:00
cmake Revert "Avx enablement" 2022-08-21 15:47:02 +03:00
contrib Merge branch 'master' into keeper-listen-host 2022-08-21 18:43:55 +02:00
docker Merge pull request #40647 from ClickHouse/high-level-coverage 2022-08-27 23:13:10 +03:00
docs feat: implement catboost in library-bridge 2022-08-29 20:26:45 +00:00
packages Fix doinst.sh generator for tgz packages 2022-08-26 20:09:56 +02:00
programs feat: implement catboost in library-bridge 2022-08-29 20:26:45 +00:00
src feat: implement catboost in library-bridge 2022-08-29 20:26:45 +00:00
tests feat: implement catboost in library-bridge 2022-08-29 20:26:45 +00:00
utils Update aspell-dict to fix doc check 2022-08-29 00:27:19 +02:00
website move-images-to-clickhouse-presentations 2022-08-10 04:04:56 +02:00
.clang-format add BeforeLambdaBody to .clang-format 2022-02-11 16:51:45 +01:00
.clang-tidy Check what will be if I enable concurrency-mt-unsafe in clang-tidy 2022-08-15 07:49:23 +03:00
.editorconfig Changed tabs to spaces in editor configs and in style guide [#CLICKHOUSE-3]. 2017-04-01 11:35:09 +03:00
.gitattributes mark test data as binary 2022-01-22 03:19:47 +03:00
.gitignore feat: implement catboost in library-bridge 2022-08-29 20:26:45 +00:00
.gitmodules fix build with clang-15 2022-08-01 18:00:54 +02:00
.pylintrc Cover deprecated bad-* pylint options with black 2022-06-08 14:18:28 +02:00
.vimrc Changed tabs to spaces in editor configs and in style guide [#CLICKHOUSE-3]. 2017-04-01 11:35:09 +03:00
.yamllint Drop truthy.check-keys from yamllint (does not supported on CI) 2021-02-21 06:15:36 +03:00
AUTHORS Update AUTHORS 2021-09-22 11:38:03 +03:00
CHANGELOG.md Update CHANGELOG.md 2022-08-24 03:35:51 +03:00
CMakeLists.txt Revert "Support for DWARF-5 in in house DWARF parser" 2022-08-29 14:25:53 +03:00
CODE_OF_CONDUCT.md Add minimal code of conduct #9676 2020-03-16 12:44:28 +03:00
CONTRIBUTING.md Mention ClickHouse CLA in CONTRIBUTING.md (#32697) 2021-12-14 03:47:19 +03:00
format_sources allow several <graphite> targets (#603) 2017-03-21 23:08:09 +04:00
LICENSE Update year 2022-01-27 01:01:27 +03:00
PreLoad.cmake Update PreLoad.cmake 2022-08-26 18:30:05 +08:00
README.md Update README.md 2022-08-14 07:04:49 +02:00
SECURITY.md Update security 2022-08-20 13:04:01 +02:00

ClickHouse — open source distributed column-oriented DBMS

ClickHouse® is an open-source column-oriented database management system that allows generating analytical data reports in real-time.

  • Official website has a quick high-level overview of ClickHouse on the main page.
  • Tutorial shows how to set up and query a small ClickHouse cluster.
  • Documentation provides more in-depth information.
  • YouTube channel has a lot of content about ClickHouse in video format.
  • Slack and Telegram allow chatting with ClickHouse users in real-time.
  • Blog contains various ClickHouse-related articles, as well as announcements and reports about events.
  • Code Browser (Woboq) with syntax highlight and navigation.
  • Code Browser (github.dev) with syntax highlight, powered by github.dev.
  • Contacts can help to get your questions answered if there are any.

Upcoming events

  • v22.8 Release Webinar: Original creator, co-founder, and CTO of ClickHouse Alexey Milovidov will walk us through the highlights of the release, provide live demos, and share his vision of what is coming on the roadmap.