Note that this can make the client fail if skim itself fails;
however, I haven't seen it panic, so let's try.
P.S. About adding USE_SKIM to the configure header instead of just a compile
option for the target: this is better because it avoids recompiling lots
of C++ headers, since we have to add the skim library as PUBLIC. Anyway,
this will be resolved in a different way, but separately.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
This commit moves the catboost model evaluation out of the server
process into the library-bridge binary. This serves two goals: On the
one hand, crashes / memory corruptions of the catboost library no longer
affect the server. On the other hand, we can forbid loading dynamic
libraries in the server (catboost was the last consumer of this
functionality), thus improving security.
SQL syntax:

    SELECT
        catboostEvaluate('/path/to/model.bin', FEAT_1, ..., FEAT_N) > 0 AS prediction,
        ACTION AS target
    FROM amazon_train
    LIMIT 10

Required configuration:

    <catboost_lib_path>/path/to/libcatboostmodel.so</catboost_lib_path>

*** Implementation Details ***
The internal protocol between the server and the library-bridge is
simple:
- HTTP GET on path "/extdict_ping":
A ping, used during the handshake to check if the library-bridge runs.
- HTTP POST on path "/extdict_request":
(1) Send a "catboost_GetTreeCount" request from the server to the
bridge, containing a library path (e.g. /home/user/libcatboost.so) and
a model path (e.g. /home/user/model.bin). First, this unloads the
catboost library handler associated with the model path (if it was
loaded), then loads the catboost library handler associated with the
model path, then executes GetTreeCount() on the library handler and
finally sends the result back to the server. Step (1) is called once
by the server from FunctionCatBoostEvaluate::getReturnTypeImpl(). The
library handler is unloaded in the beginning because it contains
state which may no longer be valid if the user runs
catboostEvaluate('/path/to/model.bin', ...) more than once and if "model.bin"
was updated in between.
(2) Send "catboost_Evaluate" from the server to the bridge, containing
the model path and the features to run the inference on. Step (2)
is called multiple times (once per chunk) by the server from function
FunctionCatBoostEvaluate::executeImpl(). The library handler for the
given model path is expected to be already loaded by Step (1).
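
The two steps above can be sketched on the bridge side roughly as follows.
This is only an illustration of the unload/reload/evaluate ordering, not the
actual implementation: BridgeSketch, ModelHandler, loadCount() and the fixed
tree count are hypothetical names and values, and the "inference" is a
placeholder that just sums the features of each row.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a loaded catboost library + model pair.
// A real handler would load the shared library and the model file.
struct ModelHandler
{
    std::string library_path;
    std::string model_path;
    size_t tree_count;

    // Placeholder "inference": one output value per input row (sum of features).
    std::vector<double> evaluate(const std::vector<std::vector<double>> & rows) const
    {
        std::vector<double> result;
        result.reserve(rows.size());
        for (const auto & row : rows)
        {
            double sum = 0;
            for (double feature : row)
                sum += feature;
            result.push_back(sum);
        }
        return result;
    }
};

// Hypothetical bridge-side registry of handlers, keyed by model path.
class BridgeSketch
{
public:
    // Step (1): drop a possibly stale handler, load a fresh one, return the tree count.
    size_t getTreeCount(const std::string & library_path, const std::string & model_path)
    {
        handlers.erase(model_path); // no-op if nothing was loaded for this model
        ++load_count;
        // The tree count is faked here; the real value would come from the library.
        handlers.emplace(model_path, ModelHandler{library_path, model_path, 42});
        assert(handlers.count(model_path) == 1);
        return handlers.at(model_path).tree_count;
    }

    // Step (2): evaluate one chunk; the handler must have been loaded by step (1).
    std::vector<double> evaluate(
        const std::string & model_path, const std::vector<std::vector<double>> & rows) const
    {
        auto it = handlers.find(model_path);
        if (it == handlers.end())
            throw std::runtime_error("No handler loaded for model " + model_path);
        return it->second.evaluate(rows);
    }

    // Number of (re)loads so far, only here to make the reload behaviour observable.
    size_t loadCount() const { return load_count; }

private:
    std::map<std::string, ModelHandler> handlers;
    size_t load_count = 0;
};
```

Calling getTreeCount() twice for the same model path reloads the handler each
time, which mirrors why a model file updated between two queries is picked up,
while evaluate() alone never loads anything.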
Fixes #27870
* replace exit with assert in test_single_page
* improve save_raw_single_page docs option
* More grammar fixes
* "Built from" link in new tab
* fix mistype
* Example of include in docs
* add anchor to meeting form
* Draft of translation helper
* WIP on translation helper
* Replace some fa docs content with machine translation
* Some improvements for introduction/performance.md
* Minor improvements for example_datasets
* Add website/package-lock.json to .gitignore
* YT paragraph was badly outdated and there is no real reason to write a new one
* Use weird introduction article as a starting point for F.A.Q.
* Some refactoring of first half of ya_metrika_task.md
* minor
* Weird docs footer bugfix
* Additional .gitignore entries
* Merge a bunch of small articles about system tables into single one
* Merge a bunch of small articles about formats into single one
* Adapt table with formats to English docs too
* Add SPb meetup link to main page
* Move Utilities out of top level of docs (the location is probably not yet final) + translate a couple of articles
* Merge MacOS.md into build_osx.md
* Move Data types higher in ToC
* Publish changelog on website alongside documentation
* Few fixes for en/table_engines/file.md
* Use smaller header sizes in changelogs
* Group up table engines inside ToC
* Move table engines out of top level too
* Specify in ToC that query language is SQL based. That's a bit excessive, but catches the eye.
* Move stuff that is part of query language into respective folder
* Move table functions lower in ToC
* Lost redirects.txt update
* Do not rely on comments in yaml + fix few ru titles
* Extract major parts of queries.md into separate articles
* queries.md was supposed to be removed
* Fix weird translation
* Fix a bunch of links
* There is only table of contents left
* "Query language" is actually part of SQL abbreviation
* Change filename in README.md too
* fix mistype
* s/formats\/interfaces/interfaces\/formats/g
* Remove extra clarification from header as it was too verbose, probably making it a bit more confusing
* Empty article was supposed to be hidden
* At least change incorrect title
* Move special links to the bottom of nav and slightly highlight them
* Skip hidden pages in bottom navigation too
* Make front page of documentation part of Introduction
* Make tables in introduction somewhat readable + move abbreviation definitions earlier
* Some introduction text refactoring
* Some docs introduction refactoring
* Use admonitions instead of divs
* Additional .gitignore
* Treat .gif as images too
* Clarify ToC item