Commit Graph

71 Commits

Author SHA1 Message Date
Robert Schulze
60f9f6855d
feat: implement catboost in library-bridge
This commit moves the catboost model evaluation out of the server
process into the library-bridge binary. This serves two goals: On the
one hand, crashes / memory corruptions of the catboost library no longer
affect the server. On the other hand, we can forbid loading dynamic
libraries in the server (catboost was the last consumer of this
functionality), thus improving security.

SQL syntax:

  SELECT
    catboostEvaluate('/path/to/model.bin', FEAT_1, ..., FEAT_N) > 0 AS prediction,
    ACTION AS target
  FROM amazon_train
  LIMIT 10

Required configuration:

  <catboost_lib_path>/path/to/libcatboostmodel.so</catboost_lib_path>

*** Implementation Details ***

The internal protocol between the server and the library-bridge is
simple:

- HTTP GET on path "/extdict_ping":
  A ping, used during the handshake to check if the library-bridge runs.

- HTTP POST on path "/extdict_request":
  (1) Send a "catboost_GetTreeCount" request from the server to the
      bridge, containing a library path (e.g. /home/user/libcatboost.so) and
      a model path (e.g. /home/user/model.bin). First, this unloads the
      catboost library handler associated to the model path (if it was
      loaded), then loads the catboost library handler associated to the
      model path, then executes GetTreeCount() on the library handler and
      finally sends the result back to the server. Step (1) is called once
      by the server from FunctionCatBoostEvaluate::getReturnTypeImpl(). The
      library handler is unloaded in the beginning because it contains
      state which may no longer be valid if the user runs
      catboostEvaluate("/path/to/model.bin", ...) more than once and if
      "model.bin" was updated in between.
  (2) Send "catboost_Evaluate" from the server to the bridge, containing
      the model path and the features to run the inference on. Step (2)
      is called multiple times (once per chunk) by the server from function
      FunctionCatBoostEvaluate::executeImpl(). The library handler for the
      given model path is expected to be already loaded by Step (1).
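
The two request types described above can be modelled in a few lines. The
following is only an in-memory sketch of the unload/load/evaluate ordering,
not the bridge's actual code: the names BridgeSketch, ModelHandler and
load_tree_count are hypothetical, no HTTP is involved, and the "inference"
is a placeholder sum rather than a call into the catboost library.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ModelHandler:
    """Hypothetical stand-in for a loaded catboost library handler."""
    library_path: str
    model_path: str
    tree_count: int

    def evaluate(self, rows: List[List[float]]) -> List[float]:
        # Placeholder inference: a real handler would call into the library.
        return [sum(row) for row in rows]


@dataclass
class BridgeSketch:
    """In-memory model of the bridge's request handling (assumption, no HTTP)."""
    # Hypothetical loader returning the tree count for (library_path, model_path).
    load_tree_count: Callable[[str, str], int]
    handlers: Dict[str, ModelHandler] = field(default_factory=dict)

    def get_tree_count(self, library_path: str, model_path: str) -> int:
        # Step (1): unload a possibly stale handler, then load a fresh one.
        self.handlers.pop(model_path, None)
        tree_count = self.load_tree_count(library_path, model_path)
        self.handlers[model_path] = ModelHandler(library_path, model_path, tree_count)
        return tree_count

    def evaluate(self, model_path: str, rows: List[List[float]]) -> List[float]:
        # Step (2): the handler is expected to be loaded already by step (1).
        handler = self.handlers.get(model_path)
        if handler is None:
            raise KeyError(f"no handler loaded for {model_path}")
        return handler.evaluate(rows)
```

Calling get_tree_count() again for the same model path replaces the handler,
which mirrors the reason given above for unloading first: the model file may
have changed between two queries.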

Fixes #27870
2022-09-08 09:01:32 +00:00
Robert Schulze
912663b719
Revert "Move CatBoost evaluation into clickhouse-library-bridge" 2022-08-31 20:54:43 +02:00
Robert Schulze
6b2b3c1eb3
feat: implement catboost in library-bridge
This commit moves the catboost model evaluation out of the server
process into the library-bridge binary. This serves two goals: On the
one hand, crashes / memory corruptions of the catboost library no longer
affect the server. On the other hand, we can forbid loading dynamic
libraries in the server (catboost was the last consumer of this
functionality), thus improving security.

SQL syntax:

  SELECT
    catboostEvaluate('/path/to/model.bin', FEAT_1, ..., FEAT_N) > 0 AS prediction,
    ACTION AS target
  FROM amazon_train
  LIMIT 10

Required configuration:

  <catboost_lib_path>/path/to/libcatboostmodel.so</catboost_lib_path>

*** Implementation Details ***

The internal protocol between the server and the library-bridge is
simple:

- HTTP GET on path "/extdict_ping":
  A ping, used during the handshake to check if the library-bridge runs.

- HTTP POST on path "/extdict_request":
  (1) Send a "catboost_GetTreeCount" request from the server to the
      bridge, containing a library path (e.g. /home/user/libcatboost.so) and
      a model path (e.g. /home/user/model.bin). First, this unloads the
      catboost library handler associated to the model path (if it was
      loaded), then loads the catboost library handler associated to the
      model path, then executes GetTreeCount() on the library handler and
      finally sends the result back to the server. Step (1) is called once
      by the server from FunctionCatBoostEvaluate::getReturnTypeImpl(). The
      library handler is unloaded in the beginning because it contains
      state which may no longer be valid if the user runs
      catboostEvaluate("/path/to/model.bin", ...) more than once and if
      "model.bin" was updated in between.
  (2) Send "catboost_Evaluate" from the server to the bridge, containing
      the model path and the features to run the inference on. Step (2)
      is called multiple times (once per chunk) by the server from function
      FunctionCatBoostEvaluate::executeImpl(). The library handler for the
      given model path is expected to be already loaded by Step (1).

Fixes #27870
2022-08-29 20:26:45 +00:00
rfraposa
06ac99b1e7 Add cmake page back to docs && fix /settings/settings in /zh 2022-04-24 16:47:19 -06:00
Yatsishin Ilya
f59c7f5254 Add more build paths to .gitignore 2022-01-18 15:25:48 +00:00
Alexander Tokmakov
571dd3acfb fix style check 2021-09-21 10:28:33 +03:00
Azat Khuzhin
24c8968c80 Add *.log/*.stderr/*.stdout into gitignore 2021-06-08 09:14:47 +03:00
Nikolai Kochetov
77233bbdbb Ignore cmake-in-clickhouse 2021-04-28 18:10:12 +03:00
tison
c809af5dc2 ignore data store files 2021-02-17 12:58:17 +08:00
Ivan
315ff4f0d9
ANTLR4 Grammar for ClickHouse and new parser (#11298) 2020-12-04 05:15:44 +03:00
qianmoQ
e740bd40f1 fix document for index.md and distinctive-features.md 2020-11-24 20:36:19 +08:00
Ivan Lezhankin
7ea393ada8 Fix build without libraries 2020-10-10 23:41:27 +03:00
vladimir golovchenko
cb153d2605 Updated gitignore-files. 2020-08-06 18:05:32 -07:00
Ivan Blinkov
4ef322274d Add integration test 2020-05-29 22:53:16 +03:00
Alexey Milovidov
2986fcd93e Remove outdated contents from gitignore 2020-05-05 19:38:40 +03:00
Ivan Blinkov
98769778f4
Turkish docs translation stub (#10282) 2020-04-15 16:56:49 +03:00
Ivan Blinkov
765dd7c495
Update some docs translations (#10044) 2020-04-04 12:15:31 +03:00
Ivan Lezhankin
06446b4f08 dbms/ → src/ 2020-04-03 18:14:31 +03:00
Ivan
97f2a2213e
Move all folders inside /dbms one level up (#9974)
* Move some code outside dbms/src folder
* Fix paths
2020-04-02 02:51:21 +03:00
Ivan Blinkov
03aa7894d9
Draft of docs translation helper (#9755)
* replace exit with assert in test_single_page

* improve save_raw_single_page docs option

* More grammar fixes

* "Built from" link in new tab

* fix mistype

* Example of include in docs

* add anchor to meeting form

* Draft of translation helper

* WIP on translation helper

* Replace some fa docs content with machine translation
2020-03-19 20:49:27 +03:00
Ivan Blinkov
5abe3ac3f1
Switch docs to python3 and update MkDocs to 1.1 (#9711)
+ some grammar and css fixes
2020-03-18 16:02:32 +03:00
Ivan Blinkov
242a1a85d4 adjust .gitignore 2020-02-14 12:34:18 +03:00
Ivan Blinkov
18538f5c65 Domain change in docs 2020-01-30 13:34:55 +03:00
Ivan Lezhankin
e63ef08af8 Update gitignore 2020-01-14 16:30:06 +03:00
Ivan
4f2f5cca84
Add support for cross-compiling to the CPU architecture AARCH64 (#7370) 2019-10-30 10:01:53 +03:00
Alexander Tokmakov
93c672aa0b delete BlockInputStreamFromRowInputStream 2019-08-27 21:29:56 +03:00
Nikita Mikhaylov
b8f99255ae better gitignore with mrk2 2019-07-04 21:30:01 +03:00
Ivan Lezhankin
92769a2460 Don't update "arrow" on client-side for nothing 2019-02-07 16:47:16 +03:00
Alexander GQ Gerasiov
afa9e8d4ea .gitignore: Move debian/ specific entries to debian/.gitignore
Signed-off-by: Alexander GQ Gerasiov <gq@cs.msu.su>
2019-01-21 01:26:50 +03:00
Ivan Lezhankin
77daa519ff Update librdkafka to v1.0.0-RC5 2019-01-14 14:15:57 +03:00
Ivan Lezhankin
6df757c6f7 Refactor constant folding and make it reusable for primary_key_expr 2018-12-17 17:59:01 +03:00
Ivan Lezhankin
935615a647 Reimplement FREEZE command. 2018-11-12 15:26:14 +03:00
proller
a7437b93a9 Committed StorageSystemContributors.generated.cpp (#3510)
* CLICKHOUSE-4085 system.contributors

* fi

* Fix random

* Committed StorageSystemContributors.generated.cpp

* fix

* Update CMakeLists.txt
2018-11-01 17:05:37 +03:00
abyss7
d538f70679 Fix build and tests on Fedora (#3496)
* Fix some tests and build on Fedora 28

* Update contrib/ssl

* Try `sudo` first, then without `sudo`.
2018-10-30 17:05:44 +03:00
proller
c35c979285 CLICKHOUSE-4085 system.contributors (#3452)
* CLICKHOUSE-4085 system.contributors

* fi

* Fix random
2018-10-26 20:43:50 +03:00
Alexey Milovidov
3021852383 Imported stateful tests (without data) [#CLICKHOUSE-3] 2018-08-07 20:08:51 +03:00
Ivan Blinkov
b589903680
WIP on docs (#2753)
* Some improvements for introduction/performance.md

* Minor improvements for example_datasets

* Add website/package-lock.json to .gitignore

* YT paragraph was badly outdated and there is no real reason to write a new one

* Use weird introduction article as a starting point for F.A.Q.

* Some refactoring of first half of ya_metrika_task.md

* minor

* Weird docs footer bugfix
2018-07-30 19:34:55 +03:00
Ivan Blinkov
c7e526e050
WIP on documentation (#2692)
* Additional .gitignore entries

* Merge a bunch of small articles about system tables into single one

* Merge a bunch of small articles about formats into single one

* Adapt table with formats to English docs too

* Add SPb meetup link to main page

* Move Utilities out of top level of docs (the location is probably not yet final) + translate couple articles

* Merge MacOS.md into build_osx.md

* Move Data types higher in ToC

* Publish changelog on website alongside documentation

* Few fixes for en/table_engines/file.md

* Use smaller header sizes in changelogs

* Group up table engines inside ToC

* Move table engines out of top level too

* Specify in ToC that query language is SQL based. That's a bit excessive, but catches the eye.

* Move stuff that is part of query language into respective folder

* Move table functions lower in ToC

* Lost redirects.txt update

* Do not rely on comments in yaml + fix few ru titles

* Extract major parts of queries.md into separate articles

* queries.md has been supposed to be removed

* Fix weird translation

* Fix a bunch of links

* There is only table of contents left

* "Query language" is actually part of SQL abbreviation

* Change filename in README.md too

* fix mistype

* s/formats\/interfaces/interfaces\/formats/g

* Remove extra clarification from header as it was too verbose, probably making it a bit more confusing

* Empty article was supposed to be hidden

* At least change incorrect title

* Move special links to the bottom of nav and slightly highlight them

* Skip hidden pages in bottom navigation too

* Make front page of documentation to be part of Introduction

* Make tables in introduction somewhat readable + move abbreviation definitions earlier

* Some introduction text refactoring

* Some docs introduction refactoring

* Use admonitions instead of divs

* Additional .gitignore

* Treat .gif as images too

* Clarify ToC item
2018-07-20 20:35:34 +03:00
Ivan Blinkov
0a4a5b36cc
Some WIP on documentation refactoring (#2659)
* Additional .gitignore entries

* Merge a bunch of small articles about system tables into single one

* Merge a bunch of small articles about formats into single one

* Adapt table with formats to English docs too

* Add SPb meetup link to main page

* Move Utilities out of top level of docs (the location is probably not yet final) + translate couple articles

* Merge MacOS.md into build_osx.md

* Move Data types higher in ToC

* Publish changelog on website alongside documentation

* Few fixes for en/table_engines/file.md

* Use smaller header sizes in changelogs

* Group up table engines inside ToC

* Move table engines out of top level too

* Specify in ToC that query language is SQL based. That's a bit excessive, but catches the eye.

* Move stuff that is part of query language into respective folder

* Move table functions lower in ToC

* Lost redirects.txt update

* Do not rely on comments in yaml + fix few ru titles

* Extract major parts of queries.md into separate articles

* queries.md has been supposed to be removed

* Fix weird translation

* Fix a bunch of links

* There is only table of contents left

* "Query language" is actually part of SQL abbreviation

* Change filename in README.md too

* fix mistype
2018-07-18 13:00:53 +03:00
Ivan Blinkov
ba1393fbbd Refactoring of documentation infrastructure to get rid of a lot of copypaste (#2616)
* Clean up docs folder by moving all build-related tools to subdirectory

* Remove unused script

* Remove unused script #2

* Some refactoring in concatenate.py

* Rewrite build.sh in Python

- Get rid of half of copypaste in yml files
- Draft of redirects support

* Actually include redirects.conf

* copy conf too

* Keep H1 the same in single page docs

* fix some paths

* Keep only pages index in yaml

* Workaround for missing jQuery

* Delay docs init
2018-07-09 22:59:07 +03:00
Ivan Blinkov
2d541d3016 Add Berlin meetup link & update roadmap (#2491)
* Update roadmap

* Add Berlin meetup link

* fix indent
2018-06-09 15:21:45 +03:00
proller
76468d8d89 Change build system DIST from artful to bionic (#2330)
* Pbuilder: use ubuntu-ports mirror (with arm64 packages)

* Fix arm64

* Fixed tests isolation. [#CLICKHOUSE-2]

* Fix nodes leak in case of session expiration. [#CLICKHOUSE-2]

* fix

* Add new clang versions

* ubuntu bionic && gcc-8 fixes

* Fixes

* wip

* Change build system DIST from artful to bionic
2018-05-09 07:50:54 +03:00
Vitaliy Lyudvichenko
63cc34d3f6 Fixed incorrect failed OP detection in ZooKeeper. [#CLICKHOUSE-2] 2018-05-03 16:34:19 +03:00
Ivan Blinkov
361a27485d Some progress on documentation (#1942)
* update presentations

* CLICKHOUSE-2936: redirect from clickhouse.yandex.ru and clickhouse.yandex.com

* update submodule

* lost files

* CLICKHOUSE-2981: prefer sphinx docs over original reference

* CLICKHOUSE-2981: docs styles more similar to main website + add flags to switch language links

* update presentations

* Less confusing directory structure (docs -> doc/reference/)

* Minify sphinx docs too

* Website release script: fail fast + pass docker hash on deploy

* Do not underline links in docs

* shorter

* cleanup docker images

* tune nginx config

* CLICKHOUSE-3043: get rid of habrastorage links

* Lost translation

* CLICKHOUSE-2936: temporary client-side redirect

* behaves weird in test

* put redirect back

* CLICKHOUSE-3047: copy docs txts to public too

* move to proper file

* remove old pages to avoid confusion

* Remove reference redirect warning for now

* Refresh README.md

* Yellow buttons in docs

* Use svg flags instead of unicode ones in docs

* fix test website instance

* Put flags to separate files

* wrong flag

* Copy Yandex.Metrica introduction from main page to docs

* Yet another home page structure change, couple new blocks (CLICKHOUSE-3045)

* Update Contacts section

* CLICKHOUSE-2849: more detailed legal information

* CLICKHOUSE-2978 preparation - split by files

* More changes in Contacts block

* Tune texts on index page

* update presentations

* One more benchmark

* Add usage sections to index page, adapted from slides

* Get the roadmap started, based on slides from last ClickHouse Meetup

* CLICKHOUSE-2977: some rendering tuning

* Get rid of excessive section in the end of getting started

* Make headers linkable

* CLICKHOUSE-2981: links to editing reference - https://github.com/yandex/ClickHouse/issues/849

* CLICKHOUSE-2981: fix mobile styles in docs

* Ban crawling of duplicating docs

* Open some external links in new tab

* Ban old docs too

* Lots of trivial fixes in english docs

* Lots of trivial fixes in russian docs

* Remove getting started copies in markdown

* Add Yandex.Webmaster

* Fix some sphinx warnings

* More warnings fixed in english docs

* More sphinx warnings fixed

* Add code-block:: text

* More code-block:: text

* These headers look not that well

* Better switch between documentation languages

* merge use_case.rst into ya_metrika_task.rst

* Edit the agg_functions.rst texts

* Add lost empty lines

* Lost blank lines

* Add new logo sizes

* update presentations

* Next step in migrating to new documentation

* Fix all warnings in en reference

* Fix all warnings in ru reference

* Re-arrange existing reference

* Move operation tips to main reference

* Fix typos noticed by milovidov@

* Get rid of zookeeper.md

* Looks like duplicate of tutorial.html

* Fix some mess with html tags in tutorial

* No idea why nobody noticed this before, but it was completely unclear where to get the data

* Match code block styling between main and tutorial pages (in favor of the latter)

* Get rid of some copypaste in tutorial

* Normalize header styles

* Move example_datasets to sphinx

* Move presentations submodule to website

* Move and update README.md

* No point in duplicating articles from habrahabr here

* Move development-related docs as is for now

* doc/reference/ -> docs/ (to match the URL on website)

* Adapt links to match the previous commit

* Adapt development docs to rst (still lacks translation and strikethrough support)

* clean on release

* blacklist presentations in gulp

* strikethrough support in sphinx

* just copy development folder for now

* fix weird introduction in style article

* Style guide translation (WIP)

* Finish style guide translation to English

* gulp clean separately

* Update year in LICENSE

* Initial CONTRIBUTING.md

* Fix remaining links to old docs in tutorial

* Some tutorial fixes

* Typo

* Another typo

* Update list of authors from yandex-team according to git log

* Fix diff with master

* couple fixes in en what_is_clickhouse.rst

* Try different link to blog in Russian

* Swap words

* Slightly larger line height

* CLICKHOUSE-3089: disable hyphenation in docs

* update presentations

* Fix copying of txt files

* update submodule

* CLICKHOUSE-3108: fix overflow issues in mobile version

* Less weird tutorial header in mobile version

* CLICKHOUSE-3073: skip sourcemaps by default

* CLICKHOUSE-3067: rename item in docs navigation

* fix list markup

* CLICKHOUSE-3067: some documentation style tuning

* CLICKHOUSE-3067: less laggy single page documentation

* update presentations

* YQL-3278: add some links to ClickHouse Meetup in Berlin on October 5, 2017

* Add "time series" keyword

* Switch link to next event

* Switch link to next event #2

* smaller font

* Remove Palo Alto link

* Add link to Success stories list

* better title

* Update index.html

* Update index.html

* Do not expect gulp in $PATH

* Add link to Beijing meetup

* ignore presentations

* introduce requirements.txt

* Apply hacks by bayonet@ using monkey patching

* Simplify and fix patching of "single" docs on Mac OS (it still has a bug on chunk borders though)

* remove hidden symbol

* s/2016–2017/2016–2018/g

* Add some place to put virtualenv

* mkdocs was missing from requirements.txt

* This way it hurts eyes less

* Change header layout + add flags

* yandex_fonts.css -> custom.css

* Larger docs logo

* Shorter link

* Link to home from logo

* Borrow some more styles from main page

* Tune some links

* Remove shadow

* Add header border

* Header font

* Better flag margin

* Improve single page mode

* Fix search results hover

* Fix some MarkDown errors

* Silence useless error

* Get rid of index.html's

* Enable syntax highlight

* Fix link label in ru

* More style fixes in documentation scripts
2018-02-21 21:44:33 +03:00
BayoNet
c4eb3d8069 Preparations for changing documentation site generator. 2018-02-13 01:05:44 +03:00
Alexey Milovidov
996eafada9 Revert "Remove hard-coded paths in Interpreters/Compiler.cpp"
This reverts commit 3a97fbd0e7.
2018-01-09 20:49:25 +03:00
Roman Tsisyk
3a97fbd0e7 Remove hard-coded paths in Interpreters/Compiler.cpp
Interpreters/Compiler.cpp contained hard-coded paths for system's
includes needed for query compiler. These paths were not portable
between different Linux distros and gcc/clang versions. For example,
Debian/Ubuntu use /usr/lib/gcc/x86_64-linux-gnu/*/include,
RHEL/Fedora use /usr/lib/gcc/x86_64-redhat-linux/*/include,
others use /usr/lib/gcc/*/include (without x86_64-XXX triplet).

Patch 68850012b "Embedded compiler fixes" attempted to fix this problem
by adding CMAKE_LIBRARY_ARCHITECTURE after /usr/lib. Unfortunately,
CMAKE_LIBRARY_ARCHITECTURE is not defined on RHEL/Fedora because someone
decided to omit "-gnu" from x86_64-redhat-linux (see RHBZ#1531678).

Patch 70e35d0bc "Build fixes (#1718)" added a workaround for
undefined CMAKE_LIBRARY_ARCHITECTURE on RHEL/Fedora, but hasn't fixed
problem with missing /usr/lib/gcc/x86_64-redhat-linux/*/include/
in the list of hardcoded paths.

Remove hard-coded paths and get the list of `-isystem` includes directly
from bundled clickhouse-clang.
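
One way to obtain such a list, sketched here under assumptions and not taken
from the patch itself, is to parse the search-path section that gcc and clang
print in verbose mode. The helper names parse_system_includes and
isystem_flags are hypothetical.

```python
from typing import List


def parse_system_includes(verbose_output: str) -> List[str]:
    """Extract the <...> include directories from verbose compiler output."""
    start = "#include <...> search starts here:"
    end = "End of search list."
    paths: List[str] = []
    collecting = False
    for line in verbose_output.splitlines():
        stripped = line.strip()
        if stripped == start:
            collecting = True       # directories follow, one per line
        elif stripped == end:
            break                   # end of the search-path section
        elif collecting and stripped:
            paths.append(stripped)
    return paths


def isystem_flags(paths: List[str]) -> List[str]:
    """Turn a list of directories into -isystem compiler arguments."""
    flags: List[str] = []
    for path in paths:
        flags += ["-isystem", path]
    return flags
```

The input text would typically come from the standard error output of a
command such as `clang -x c++ -E -v /dev/null`. Whether the bundled
clickhouse-clang is queried in exactly this way is an assumption.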

Other changes:

- Enable RPATH for the build directory to get working binaries
  without installing them by `make install`.
2018-01-09 20:24:25 +03:00
Ivan Blinkov
bd98072259 Some progress on website and docs (#1717)
* Add link to Beijing meetup

* ignore presentations

* introduce requirements.txt

* Apply hacks by bayonet@ using monkey patching

* Simplify and fix patching of "single" docs on Mac OS (it still has a bug on chunk borders though)
2017-12-29 18:45:21 +03:00
Vitaliy Lyudvichenko
db3a67a421 Add clearer RangeFiltered implementation. [#CLICKHOUSE-3178] 2017-10-26 17:16:06 +03:00
Alexey Milovidov
14d5149293 Added missing files [#CLICKHOUSE-3276]. 2017-09-04 22:33:17 +03:00