Commit Graph

94 Commits

Author SHA1 Message Date
Robert Schulze
60f9f6855d
feat: implement catboost in library-bridge
This commit moves the catboost model evaluation out of the server
process into the library-bridge binary. This serves two goals: On the
one hand, crashes / memory corruptions of the catboost library no longer
affect the server. On the other hand, we can forbid loading dynamic
libraries in the server (catboost was the last consumer of this
functionality), thus improving security.

SQL syntax:

  SELECT
    catboostEvaluate('/path/to/model.bin', FEAT_1, ..., FEAT_N) > 0 AS prediction,
    ACTION AS target
  FROM amazon_train
  LIMIT 10

Required configuration:

  <catboost_lib_path>/path/to/libcatboostmodel.so</catboost_lib_path>

*** Implementation Details ***

The internal protocol between the server and the library-bridge is
simple:

- HTTP GET on path "/extdict_ping":
  A ping, used during the handshake to check if the library-bridge runs.

- HTTP POST on path "extdict_request"
  (1) Send a "catboost_GetTreeCount" request from the server to the
      bridge, containing a library path (e.g /home/user/libcatboost.so) and
      a model path (e.g. /home/user/model.bin). Rirst, this unloads the
      catboost library handler associated to the model path (if it was
      loaded), then loads the catboost library handler associated to the
      model path, then executes GetTreeCount() on the library handler and
      finally sends the result back to the server. Step (1) is called once
      by the server from FunctionCatBoostEvaluate::getReturnTypeImpl(). The
      library path handler is unloaded in the beginning because it contains
      state which may no longer be valid if the user runs
      catboost("/path/to/model.bin", ...) more than once and if "model.bin"
      was updated in between.
  (2) Send "catboost_Evaluate" from the server to the bridge, containing
      the model path and the features to run the interference on. Step (2)
      is called multiple times (once per chunk) by the server from function
      FunctionCatBoostEvaluate::executeImpl(). The library handler for the
      given model path is expected to be already loaded by Step (1).

Fixes #27870
2022-09-08 09:01:32 +00:00
Robert Schulze
912663b719
Revert "Move CatBoost evaluation into clickhouse-library-bridge" 2022-08-31 20:54:43 +02:00
Robert Schulze
6b2b3c1eb3
feat: implement catboost in library-bridge
This commit moves the catboost model evaluation out of the server
process into the library-bridge binary. This serves two goals: On the
one hand, crashes / memory corruptions of the catboost library no longer
affect the server. On the other hand, we can forbid loading dynamic
libraries in the server (catboost was the last consumer of this
functionality), thus improving security.

SQL syntax:

  SELECT
    catboostEvaluate('/path/to/model.bin', FEAT_1, ..., FEAT_N) > 0 AS prediction,
    ACTION AS target
  FROM amazon_train
  LIMIT 10

Required configuration:

  <catboost_lib_path>/path/to/libcatboostmodel.so</catboost_lib_path>

*** Implementation Details ***

The internal protocol between the server and the library-bridge is
simple:

- HTTP GET on path "/extdict_ping":
  A ping, used during the handshake to check if the library-bridge runs.

- HTTP POST on path "extdict_request"
  (1) Send a "catboost_GetTreeCount" request from the server to the
      bridge, containing a library path (e.g /home/user/libcatboost.so) and
      a model path (e.g. /home/user/model.bin). Rirst, this unloads the
      catboost library handler associated to the model path (if it was
      loaded), then loads the catboost library handler associated to the
      model path, then executes GetTreeCount() on the library handler and
      finally sends the result back to the server. Step (1) is called once
      by the server from FunctionCatBoostEvaluate::getReturnTypeImpl(). The
      library path handler is unloaded in the beginning because it contains
      state which may no longer be valid if the user runs
      catboost("/path/to/model.bin", ...) more than once and if "model.bin"
      was updated in between.
  (2) Send "catboost_Evaluate" from the server to the bridge, containing
      the model path and the features to run the interference on. Step (2)
      is called multiple times (once per chunk) by the server from function
      FunctionCatBoostEvaluate::executeImpl(). The library handler for the
      given model path is expected to be already loaded by Step (1).

Fixes #27870
2022-08-29 20:26:45 +00:00
Yakov Olkhovskiy
31a7ed09a1 disable default ENABLE_CLICKHOUSE_SELF_EXTRACTING and add to env 2022-08-27 21:08:01 +00:00
Robert Schulze
ad0d060dc1
Merge pull request #39904 from ClickHouse/library-bridge-refactoring
Prepare library-bridge for catboost integration
2022-08-08 12:15:01 +02:00
Yakov Olkhovskiy
b1f45fa787 Don't create self-extracting clickhouse for split build 2022-08-05 21:48:40 -04:00
Robert Schulze
ea73b98fb9
Prepare library-bridge for catboost integration
- Rename generic file and identifier names in library-bridge to
  something more dictionary-specific. This is needed because later on,
  catboost will be integrated into library-bridge.

- Also: Some smaller fixes like typos and un-inlining non-performance
  critical code.

- The logic remains unchanged in this commit.
2022-08-04 19:26:51 +00:00
Robert Schulze
dcc8751685
Disable harmful env var check to workaround failure to start the server 2022-07-31 08:55:07 +00:00
Robert Schulze
7c23e48b5b
Revert exclusion of libharmful (did not work anyways) 2022-07-31 08:05:12 +00:00
Robert Schulze
7fe106a0fb
Try to fix libharmful fail 2022-07-31 07:44:25 +00:00
Robert Schulze
3d1797f75f
Merge remote-tracking branch 'origin/master' into no-split-binary 2022-07-29 12:17:43 +00:00
Azat Khuzhin
b90152b6ec Fix clickhouse-su building in splitted build
- Add status log message
- Add it to clickhouse-bundle in shared build
- Move clickhouse-su.cpp into su.cpp, since executable does not have
  include directories of linked libraries (dbms here), only
  clickhouse-lib-su does, hence it cannot find includes

CI: https://github.com/ClickHouse/ClickHouse/runs/7566319416?check_suite_focus=true
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-07-29 11:36:51 +03:00
Robert Schulze
199e254777
Merge remote-tracking branch 'origin/master' into no-split-binary 2022-07-28 15:54:22 +00:00
Alexey Milovidov
071374b152 Remove SPLIT_BINARY 2022-07-24 01:15:54 +02:00
Yakov Olkhovskiy
e5f165d909
Merge branch 'master' into cmake-self-extracting-executable 2022-07-13 16:09:18 -04:00
Robert Schulze
1a7727a254
Prefix overridden add_executable() command with "clickhouse_"
A simple HelloWorld program with zero includes except iostream triggers
a build of ca. 2000 source files. The reason is that ClickHouse's
top-level CMakeLists.txt overrides "add_executable()" to link all
binaries against "clickhouse_new_delete". This links against
"clickhouse_common_io", which in turn has lots of 3rd party library
dependencies ... Without linking "clickhouse_new_delete", the number of
compiled files for "HelloWorld" goes down to ca. 70.

As an example, the self-extracting-executable needs none of its current
dependencies but other programs may also benefit.

In order to restore access to the original "add_executable()", the
overriding version is now prefixed. There is precedence for a
"clickhouse_" prefix (as opposed to "ch_"), for example
"clickhouse_split_debug_symbols". In general prefixing makes sense also
because overriding CMake commands relies on undocumented behavior and is
considered not-so-great practice (*).

(*) https://crascit.com/2018/09/14/do-not-redefine-cmake-commands/
2022-07-11 19:36:18 +02:00
Yakov Olkhovskiy
8a3f124982 add self-extracting to clickhouse-bundle 2022-07-06 22:01:21 -04:00
Yakov Olkhovskiy
c6db15458a build utils/self-extracting-executable/compressor whenever we want to build compressed binary 2022-07-06 20:40:41 -04:00
Robert Schulze
59236d60c9
Merge pull request #38654 from ClickHouse/better-naming-for-split-debug-symbols
Better naming for stuff related to splitted debug symbols
2022-07-01 09:28:41 +02:00
Robert Schulze
bb358617e1
Better naming for stuff related to splitted debug symbols
The previous name was slightly misleading, e.g. it is not about
"intalling stripped binaries" but about splitting debug symbols from the
binary.
2022-06-30 23:41:27 +02:00
Yakov Olkhovskiy
5d36994c4d self-extracting requires utils (uses utils/self-extracting-executable/compressor) 2022-06-27 11:41:23 -04:00
mergify[bot]
4e5fd226c8
Merge branch 'master' into utility-self-extracting 2022-06-27 12:26:16 +00:00
Yakov Olkhovskiy
8ce6b8226d
Update CMakeLists.txt 2022-06-27 08:25:21 -04:00
Yakov Olkhovskiy
39ea5ffdcb compress clickhouse executable, new target 'self-extracted' is added 2022-06-27 01:36:27 -04:00
Robert Schulze
bc46cef63c
Minor follow-up
- change ELF section name to ".clickhouse.hash" (lowercase seems
  standard)

- more expressive/concise integrity check messages at startup
2022-06-14 08:52:13 +00:00
Robert Schulze
bc6f30fd40
Move binary hash to ELF section ".ClickHouse.hash" 2022-06-13 08:46:23 +00:00
Varinara
ed6e8176fe Add basic commands for disk tool (list-disks, list, move, remove, link, copy, read, write) + tests 2022-06-06 16:52:58 +03:00
Alexey Milovidov
fd7642b6aa Fix "splitted" build 2022-05-24 06:04:48 +02:00
Alexey Milovidov
2f93f11144 Maybe better 2022-05-23 02:03:13 +02:00
alesapin
f88f654798
Update programs/CMakeLists.txt
Co-authored-by: Mikhail f. Shiryaev <felixoid@clickhouse.com>
2022-04-22 11:30:35 +02:00
Mikhail f. Shiryaev
49a572e00c
Build only clickhouse-keeper with musl 2022-04-21 13:43:24 +02:00
alesapin
ba81816dc1 Better cmake 2022-04-20 12:11:55 +02:00
alesapin
2f496c7945 Merge branch 'master' into musl-check 2022-04-12 14:40:47 +02:00
alesapin
e790a73081 Simplify strip for new packages 2022-03-23 15:14:30 +01:00
Mikhail f. Shiryaev
1d362796df
Fix strip bug 2022-03-22 11:10:02 +01:00
Mikhail f. Shiryaev
fa2a9bb9aa
Separate BUILD_STRIPPED_BINARIES_PREFIX to option and parameter 2022-03-22 11:10:02 +01:00
alesapin
96c0e9fddf Better cmake 2022-03-11 15:47:07 +01:00
alesapin
e53578910b Add ability to strip binaries in cmake 2022-03-10 22:23:28 +01:00
Azat Khuzhin
4a0facd341 Remove MAKE_STATIC_LIBRARIES (in favor of USE_STATIC_LIBRARIES)
There is no more MAKE_*, so remove this alias.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-01-24 17:28:33 +03:00
Azat Khuzhin
7496ed7fde Remove unbundled gtest support
v2: Fix unit tests (do not rely on USE_GTEST)
2022-01-20 10:01:54 +03:00
Azat Khuzhin
5c32f6dd3e Remove unbundled nuraft support 2022-01-20 08:47:16 +03:00
Alexey Milovidov
70ad617062 Disable odbc-bridge and library-bridge 2022-01-09 11:34:27 +03:00
Alexey Boykov
c5cb4e071c Creating only one binary, check compatibility 2021-10-07 21:01:36 +03:00
Alexey Milovidov
e9e77b4403 .tech -> .com 2021-09-22 03:22:57 +03:00
kssenii
a930823518 Fix build 2021-08-29 14:18:04 +00:00
kssenii
f27f519aa2 Fix build and add example 2021-08-28 20:35:51 +00:00
kssenii
073d7fdd5e Merge branch 'master' of https://github.com/ClickHouse/ClickHouse into disk-over-web-server 2021-08-06 19:42:29 +00:00
alesapin
743600a359 Merge branch 'update_buffer_size_in_nuraft' into zookeeper_snapshots 2021-06-21 13:46:22 +03:00
Mike Kot
4c391f8e99
SYSTEM RESTORE REPLICA replica [ON CLUSTER cluster] (#13652)
* initial commit: add setting and stub

* typo

* added test stub

* fix

* wip merging new integration test and code proto

* adding steps interpreters

* adding firstly proposed solution (moving parts etc)

* added checking zookeeper path existence

* fixing the include

* fixing and sorting includes

* fixing outdated struct

* fix the name

* added ast ptr as level of indirection

* fix ref

* updating the changes

* working on test stub

* fix iterator -> reference

* revert rocksdb submodule update

* fixed show privileges test

* updated the test stub

* replaced rand() with thread_local_rng(), updated the tests

updated the test

fixed test config path

test fix

removed error messages

fixed the test

updated the test

fixed string literal

fixed literal

typo: =

* fixed the empty replica error message

* updated the test and the code with logs

* updated the possible test cases, updated

* added the code/test milestone comments

* updated the test (added more testcases)

* replaced native assert with CH one

* individual replicas recursive delete fix

* updated the AS db.name AST

* two small logging fixes

* manually generated AST fixes

* Updated the test, added the possible algo change

* Some thoughts about optimizing the solution:

ALTER MOVE PARTITION .. TO TABLE -> move to detached/ + ALTER ... ATTACH

* fix

* Removed the replica sync in test as it's invalid

* Some test tweaks

* tmp

* Rewrote the algo by using the executeQuery instead of

hand-crafting the ASTPtr.

Two questions still active.

* tr: logging active parts

* Extracted the parts moving algo into a separate helper function

* Fixed the test data and the queries slightly

* Replaced query to system.parts to direct invocation,

started building the test that breaks on various parts.

* Added the case for tables when at least one replica is alive

* Updated the test to test replicas restoration by detaching/attaching

* Altered the test to check restoration without replica restart

* Added the tables swap in the start if the server failed last time

* Hotfix when only /replicas/replica... path was deleted

* Restore ZK paths while creating a replicated MergeTree table

* Updated the docs, fixed the algo for individual replicas restoration case

* Initial parts table storage fix, tests sync fix

* Reverted individual replica restoration to general algo

* Slightly optimised getDataParts

* Trying another solution with parts detaching

* Rewrote algo without any steps, added ON CLUSTER support

* Attaching parts from other replica on restoration

* Getting part checksums from ZK

* Removed ON CLUSTER, finished working solution

* Multiple small changes after review

* Fixing parallel test

* Supporting rewritten form on cluster

* Test fix

* Moar logging

* Using source replica as checksum provider

* improve test, remove some code from parser

* Trying solution with move to detached + forget

* Moving all parts (not only Committed) to detached

* Edited docs for RESTORE REPLICA

* Re-merging

* minor fixes

Co-authored-by: Alexander Tokmakov <avtokmakov@yandex-team.ru>
2021-06-20 11:24:43 +03:00
kssenii
7cc6588f96 Tool to put files on server 2021-06-18 14:11:32 +00:00