This commit moves the catboost model evaluation out of the server
process into the library-bridge binary. This serves two goals. First,
crashes and memory corruption in the catboost library no longer affect
the server. Second, we can forbid loading dynamic libraries in the
server (catboost was the last consumer of this functionality), which
improves security.
SQL syntax:
SELECT
catboostEvaluate('/path/to/model.bin', FEAT_1, ..., FEAT_N) > 0 AS prediction,
ACTION AS target
FROM amazon_train
LIMIT 10
Required configuration:
<catboost_lib_path>/path/to/libcatboostmodel.so</catboost_lib_path>
*** Implementation Details ***
The internal protocol between the server and the library-bridge is
simple:
- HTTP GET on path "/extdict_ping":
A ping, used during the handshake to check if the library-bridge runs.
- HTTP POST on path "/extdict_request":
(1) Send a "catboost_GetTreeCount" request from the server to the
bridge, containing a library path (e.g. /home/user/libcatboost.so) and
a model path (e.g. /home/user/model.bin). First, this unloads the
catboost library handler associated with the model path (if it was
loaded), then loads the catboost library handler associated with the
model path, then executes GetTreeCount() on the library handler and
finally sends the result back to the server. Step (1) is called once
by the server from FunctionCatBoostEvaluate::getReturnTypeImpl(). The
library handler is unloaded at the beginning because it contains
state which may no longer be valid if the user runs
catboostEvaluate('/path/to/model.bin', ...) more than once and "model.bin"
was updated in between.
(2) Send "catboost_Evaluate" from the server to the bridge, containing
the model path and the features to run the inference on. Step (2)
is called multiple times (once per chunk) by the server from function
FunctionCatBoostEvaluate::executeImpl(). The library handler for the
given model path is expected to be already loaded by Step (1).
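The handler lifecycle in steps (1) and (2) can be sketched as a small in-memory model. This is only an illustration, not the bridge's code: the names Handler, BridgeState, fake_loader and models_on_disk are hypothetical, the HTTP layer is left out, and the scoring is a placeholder rather than real catboost inference.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Handler:
    """Hypothetical stand-in for a loaded catboost library handler."""
    tree_count: int

    def evaluate(self, rows: Sequence[Sequence[float]]) -> List[float]:
        # Placeholder scoring; the real bridge would call into the catboost library.
        return [float(sum(row)) for row in rows]


Loader = Callable[[str, str], Handler]


class BridgeState:
    """Per-model handler cache kept by the (simulated) library-bridge."""

    def __init__(self, loader: Loader) -> None:
        self._loader = loader
        self._handlers: Dict[str, Handler] = {}

    def get_tree_count(self, library_path: str, model_path: str) -> int:
        # Step (1): drop any stale handler first, then load a fresh one.
        self._handlers.pop(model_path, None)
        handler = self._loader(library_path, model_path)
        self._handlers[model_path] = handler
        return handler.tree_count

    def evaluate(self, model_path: str, rows: Sequence[Sequence[float]]) -> List[float]:
        # Step (2): the handler must already have been loaded by step (1).
        if model_path not in self._handlers:
            raise LookupError(f"no handler loaded for {model_path}")
        return self._handlers[model_path].evaluate(rows)


# Fake model store: changing the value simulates "model.bin" being updated on disk.
models_on_disk = {"/home/user/model.bin": 10}


def fake_loader(library_path: str, model_path: str) -> Handler:
    return Handler(tree_count=models_on_disk[model_path])


bridge = BridgeState(fake_loader)
first = bridge.get_tree_count("/home/user/libcatboost.so", "/home/user/model.bin")
models_on_disk["/home/user/model.bin"] = 25  # model file replaced between queries
second = bridge.get_tree_count("/home/user/libcatboost.so", "/home/user/model.bin")
scores = bridge.evaluate("/home/user/model.bin", [[1.0, 2.0], [0.5, 0.5]])
print(first, second, scores)
```

Because get_tree_count() always reloads, the second call sees the updated model; evaluate() only looks the handler up and fails if step (1) never ran for that model path.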
Fixes #27870
* replace exit with assert in test_single_page
* improve save_raw_single_page docs option
* More grammar fixes
* "Built from" link in new tab
* fix mistype
* Example of include in docs
* add anchor to meeting form
* Draft of translation helper
* WIP on translation helper
* Replace some fa docs content with machine translation
* Some improvements for introduction/performance.md
* Minor improvements for example_datasets
* Add website/package-lock.json to .gitignore
* YT paragraph was badly outdated and there is no real reason to write a new one
* Use weird introduction article as a starting point for F.A.Q.
* Some refactoring of first half of ya_metrika_task.md
* minor
* Weird docs footer bugfix
* Additional .gitignore entries
* Merge a bunch of small articles about system tables into single one
* Merge a bunch of small articles about formats into single one
* Adapt table with formats to English docs too
* Add SPb meetup link to main page
* Move Utilities out of top level of docs (the location is probably not yet final) + translate couple articles
* Merge MacOS.md into build_osx.md
* Move Data types higher in ToC
* Publish changelog on website alongside documentation
* Few fixes for en/table_engines/file.md
* Use smaller header sizes in changelogs
* Group up table engines inside ToC
* Move table engines out of top level too
* Specify in ToC that query language is SQL based. That's a bit excessive, but catches the eye.
* Move stuff that is part of query language into respective folder
* Move table functions lower in ToC
* Lost redirects.txt update
* Do not rely on comments in yaml + fix few ru titles
* Extract major parts of queries.md into separate articles
* queries.md was supposed to be removed
* Fix weird translation
* Fix a bunch of links
* There is only table of contents left
* "Query language" is actually part of SQL abbreviation
* Change filename in README.md too
* fix mistype
* s/formats\/interfaces/interfaces\/formats/g
* Remove extra clarification from header as it was too verbose, probably making it a bit more confusing
* Empty article was supposed to be hidden
* At least change incorrect title
* Move special links to the bottom of nav and slightly highlight them
* Skip hidden pages in bottom navigation too
* Make front page of documentation to be part of Introduction
* Make tables in introduction somewhat readable + move abbreviation definitions earlier
* Some introduction text refactoring
* Some docs introduction refactoring
* Use admonitions instead of divs
* Additional .gitignore
* Treat .gif as images too
* Clarify ToC item
* Clean up docs folder by moving all build-related tools to subdirectory
* Remove unused script
* Remove unused script #2
* Some refactoring in concatenate.py
* Rewrite build.sh in Python
- Get rid of half of copypaste in yml files
- Draft of redirects support
* Actually include redirects.conf
* copy conf too
* Keep H1 the same in single page docs
* fix some paths
* Keep only pages index in yaml
* Workaround for missing jQuery
* Delay docs init
* update presentations
* CLICKHOUSE-2936: redirect from clickhouse.yandex.ru and clickhouse.yandex.com
* update submodule
* lost files
* CLICKHOUSE-2981: prefer sphinx docs over original reference
* CLICKHOUSE-2981: docs styles more similar to main website + add flags to switch language links
* update presentations
* Less confusing directory structure (docs -> doc/reference/)
* Minify sphinx docs too
* Website release script: fail fast + pass docker hash on deploy
* Do not underline links in docs
* shorter
* cleanup docker images
* tune nginx config
* CLICKHOUSE-3043: get rid of habrastorage links
* Lost translation
* CLICKHOUSE-2936: temporary client-side redirect
* behaves weird in test
* put redirect back
* CLICKHOUSE-3047: copy docs txts to public too
* move to proper file
* remove old pages to avoid confusion
* Remove reference redirect warning for now
* Refresh README.md
* Yellow buttons in docs
* Use svg flags instead of unicode ones in docs
* fix test website instance
* Put flags to separate files
* wrong flag
* Copy Yandex.Metrica introduction from main page to docs
* Yet another home page structure change, couple new blocks (CLICKHOUSE-3045)
* Update Contacts section
* CLICKHOUSE-2849: more detailed legal information
* CLICKHOUSE-2978 preparation - split by files
* More changes in Contacts block
* Tune texts on index page
* update presentations
* One more benchmark
* Add usage sections to index page, adapted from slides
* Get the roadmap started, based on slides from last ClickHouse Meetup
* CLICKHOUSE-2977: some rendering tuning
* Get rid of excessive section in the end of getting started
* Make headers linkable
* CLICKHOUSE-2981: links to editing reference - https://github.com/yandex/ClickHouse/issues/849
* CLICKHOUSE-2981: fix mobile styles in docs
* Ban crawling of duplicating docs
* Open some external links in new tab
* Ban old docs too
* Lots of trivial fixes in english docs
* Lots of trivial fixes in russian docs
* Remove getting started copies in markdown
* Add Yandex.Webmaster
* Fix some sphinx warnings
* More warnings fixed in english docs
* More sphinx warnings fixed
* Add code-block:: text
* More code-block:: text
* These headers do not look that good
* Better switch between documentation languages
* merge use_case.rst into ya_metrika_task.rst
* Edit the agg_functions.rst texts
* Add lost empty lines
* Lost blank lines
* Add new logo sizes
* update presentations
* Next step in migrating to new documentation
* Fix all warnings in en reference
* Fix all warnings in ru reference
* Re-arrange existing reference
* Move operation tips to main reference
* Fix typos noticed by milovidov@
* Get rid of zookeeper.md
* Looks like duplicate of tutorial.html
* Fix some mess with html tags in tutorial
* No idea why nobody noticed this before, but it was not at all clear where to get the data
* Match code block styling between main and tutorial pages (in favor of the latter)
* Get rid of some copypaste in tutorial
* Normalize header styles
* Move example_datasets to sphinx
* Move presentations submodule to website
* Move and update README.md
* No point in duplicating articles from habrahabr here
* Move development-related docs as is for now
* doc/reference/ -> docs/ (to match the URL on website)
* Adapt links to match the previous commit
* Adapt development docs to rst (still lacks translation and strikethrough support)
* clean on release
* blacklist presentations in gulp
* strikethrough support in sphinx
* just copy development folder for now
* fix weird introduction in style article
* Style guide translation (WIP)
* Finish style guide translation to English
* gulp clean separately
* Update year in LICENSE
* Initial CONTRIBUTING.md
* Fix remaining links to old docs in tutorial
* Some tutorial fixes
* Typo
* Another typo
* Update list of authors from yandex-team according to git log
* Fix diff with master
* couple fixes in en what_is_clickhouse.rst
* Try different link to blog in Russian
* Swap words
* Slightly larger line height
* CLICKHOUSE-3089: disable hyphenation in docs
* update presentations
* Fix copying of txt files
* update submodule
* CLICKHOUSE-3108: fix overflow issues in mobile version
* Less weird tutorial header in mobile version
* CLICKHOUSE-3073: skip sourcemaps by default
* CLICKHOUSE-3067: rename item in docs navigation
* fix list markup
* CLICKHOUSE-3067: some documentation style tuning
* CLICKHOUSE-3067: less laggy single page documentation
* update presentations
* YQL-3278: add some links to ClickHouse Meetup in Berlin on October 5, 2017
* Add "time series" keyword
* Switch link to next event
* Switch link to next event #2
* smaller font
* Remove Palo Alto link
* Add link to Success stories list
* better title
* Update index.html
* Update index.html
* Do not expect gulp in $PATH
* Add link to Beijing meetup
* ignore presentations
* introduce requirements.txt
* Apply hacks by bayonet@ using monkey patching
* Simplify and fix patching of "single" docs on Mac OS (it still has a bug on chunk borders though)
* remove hidden symbol
* s/2016–2017/2016–2018/g
* Add some place to put virtualenv
* mkdocs was missing from requirements.txt
* This way it hurts eyes less
* Change header layout + add flags
* yandex_fonts.css -> custom.css
* Larger docs logo
* Shorter link
* Link to home from logo
* Borrow some more styles from main page
* Tune some links
* Remove shadow
* Add header border
* Header font
* Better flag margin
* Improve single page mode
* Fix search results hover
* Fix some MarkDown errors
* Silence useless error
* Get rid of index.html's
* Enable syntax highlight
* Fix link label in ru
* More style fixes in documentation scripts
Interpreters/Compiler.cpp contained hard-coded paths for the system
includes needed by the query compiler. These paths were not portable
between different Linux distros and gcc/clang versions. For example,
Debian/Ubuntu use /usr/lib/gcc/x86_64-linux-gnu/*/include,
RHEL/Fedora use /usr/lib/gcc/x86_64-redhat-linux/*/include,
others use /usr/lib/gcc/*/include (without x86_64-XXX triplet).
Patch 68850012b "Embedded compiler fixes" attempted to fix this problem
by adding CMAKE_LIBRARY_ARCHITECTURE after /usr/lib. Unfortunately,
CMAKE_LIBRARY_ARCHITECTURE is not defined on RHEL/Fedora because someone
decided to omit "-gnu" from x86_64-redhat-linux (see RHBZ#1531678).
Patch 70e35d0bc "Build fixes (#1718)" added a workaround for
undefined CMAKE_LIBRARY_ARCHITECTURE on RHEL/Fedora, but did not fix
the problem of /usr/lib/gcc/x86_64-redhat-linux/*/include/ missing
from the list of hard-coded paths.
Remove hard-coded paths and get the list of `-isystem` includes directly
from the bundled clickhouse-clang.
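One common way to ask a gcc- or clang-compatible driver for its system include directories is to parse the search list it prints in verbose mode. The sketch below shows that technique only; it is not necessarily how this patch does it. The function names and the sample output (including its version numbers and paths) are illustrative assumptions.

```python
from typing import List

START = "#include <...> search starts here:"
END = "End of search list."


def parse_system_includes(verbose_output: str) -> List[str]:
    """Extract the <...> include directories from verbose compiler output."""
    dirs: List[str] = []
    in_list = False
    for line in verbose_output.splitlines():
        stripped = line.strip()
        if stripped == START:
            in_list = True
        elif stripped == END:
            break
        elif in_list and stripped:
            dirs.append(stripped)
    return dirs


def isystem_flags(dirs: List[str]) -> List[str]:
    """Turn each directory into a pair of arguments: -isystem <dir>."""
    flags: List[str] = []
    for directory in dirs:
        flags += ["-isystem", directory]
    return flags


# Illustrative sample only; real output depends on the distro and compiler version.
sample = """\
clang version 5.0.0
#include "..." search starts here:
#include <...> search starts here:
 /usr/include/c++/7
 /usr/lib/gcc/x86_64-redhat-linux/7/include
 /usr/include
End of search list.
"""

include_dirs = parse_system_includes(sample)
print(isystem_flags(include_dirs))
```

With a real compiler, the input text would typically come from the stderr of a command such as `clang -x c++ -E -v /dev/null`; whether the bundled clickhouse-clang is invoked with exactly these flags is an assumption here.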
Other changes:
- Enable RPATH for the build directory to get working binaries
without installing them with `make install`.