ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-11-21 15:12:02 +00:00

Author	SHA1	Message	Date
Robert Schulze	60f9f6855d	feat: implement catboost in library-bridge This commit moves the catboost model evaluation out of the server process into the library-bridge binary. This serves two goals: On the one hand, crashes / memory corruptions of the catboost library no longer affect the server. On the other hand, we can forbid loading dynamic libraries in the server (catboost was the last consumer of this functionality), thus improving security. SQL syntax: SELECT catboostEvaluate('/path/to/model.bin', FEAT_1, ..., FEAT_N) > 0 AS prediction, ACTION AS target FROM amazon_train LIMIT 10 Required configuration: <catboost_lib_path>/path/to/libcatboostmodel.so</catboost_lib_path> * Implementation Details * The internal protocol between the server and the library-bridge is simple: - HTTP GET on path "/extdict_ping": A ping, used during the handshake to check if the library-bridge runs. - HTTP POST on path "extdict_request" (1) Send a "catboost_GetTreeCount" request from the server to the bridge, containing a library path (e.g. /home/user/libcatboost.so) and a model path (e.g. /home/user/model.bin). First, this unloads the catboost library handler associated to the model path (if it was loaded), then loads the catboost library handler associated to the model path, then executes GetTreeCount() on the library handler and finally sends the result back to the server. Step (1) is called once by the server from FunctionCatBoostEvaluate::getReturnTypeImpl(). The library path handler is unloaded in the beginning because it contains state which may no longer be valid if the user runs catboost("/path/to/model.bin", ...) more than once and if "model.bin" was updated in between. (2) Send "catboost_Evaluate" from the server to the bridge, containing the model path and the features to run the inference on. Step (2) is called multiple times (once per chunk) by the server from function FunctionCatBoostEvaluate::executeImpl(). 
The library handler for the given model path is expected to be already loaded by Step (1). Fixes #27870	2022-09-08 09:01:32 +00:00
Robert Schulze	912663b719	Revert "Move CatBoost evaluation into clickhouse-library-bridge"	2022-08-31 20:54:43 +02:00
Robert Schulze	6b2b3c1eb3	feat: implement catboost in library-bridge This commit moves the catboost model evaluation out of the server process into the library-bridge binary. This serves two goals: On the one hand, crashes / memory corruptions of the catboost library no longer affect the server. On the other hand, we can forbid loading dynamic libraries in the server (catboost was the last consumer of this functionality), thus improving security. SQL syntax: SELECT catboostEvaluate('/path/to/model.bin', FEAT_1, ..., FEAT_N) > 0 AS prediction, ACTION AS target FROM amazon_train LIMIT 10 Required configuration: <catboost_lib_path>/path/to/libcatboostmodel.so</catboost_lib_path> * Implementation Details * The internal protocol between the server and the library-bridge is simple: - HTTP GET on path "/extdict_ping": A ping, used during the handshake to check if the library-bridge runs. - HTTP POST on path "extdict_request" (1) Send a "catboost_GetTreeCount" request from the server to the bridge, containing a library path (e.g. /home/user/libcatboost.so) and a model path (e.g. /home/user/model.bin). First, this unloads the catboost library handler associated to the model path (if it was loaded), then loads the catboost library handler associated to the model path, then executes GetTreeCount() on the library handler and finally sends the result back to the server. Step (1) is called once by the server from FunctionCatBoostEvaluate::getReturnTypeImpl(). The library path handler is unloaded in the beginning because it contains state which may no longer be valid if the user runs catboost("/path/to/model.bin", ...) more than once and if "model.bin" was updated in between. (2) Send "catboost_Evaluate" from the server to the bridge, containing the model path and the features to run the inference on. Step (2) is called multiple times (once per chunk) by the server from function FunctionCatBoostEvaluate::executeImpl(). 
The library handler for the given model path is expected to be already loaded by Step (1). Fixes #27870	2022-08-29 20:26:45 +00:00
Yakov Olkhovskiy	31a7ed09a1	disable default ENABLE_CLICKHOUSE_SELF_EXTRACTING and add to env	2022-08-27 21:08:01 +00:00
Robert Schulze	ad0d060dc1	Merge pull request #39904 from ClickHouse/library-bridge-refactoring Prepare library-bridge for catboost integration	2022-08-08 12:15:01 +02:00
Yakov Olkhovskiy	b1f45fa787	Don't create self-extracting clickhouse for split build	2022-08-05 21:48:40 -04:00
Robert Schulze	ea73b98fb9	Prepare library-bridge for catboost integration - Rename generic file and identifier names in library-bridge to something more dictionary-specific. This is needed because later on, catboost will be integrated into library-bridge. - Also: Some smaller fixes like typos and un-inlining non-performance critical code. - The logic remains unchanged in this commit.	2022-08-04 19:26:51 +00:00
Robert Schulze	dcc8751685	Disable harmful env var check to workaround failure to start the server	2022-07-31 08:55:07 +00:00
Robert Schulze	7c23e48b5b	Revert exclusion of libharmful (did not work anyways)	2022-07-31 08:05:12 +00:00
Robert Schulze	7fe106a0fb	Try to fix libharmful fail	2022-07-31 07:44:25 +00:00
Robert Schulze	3d1797f75f	Merge remote-tracking branch 'origin/master' into no-split-binary	2022-07-29 12:17:43 +00:00
Azat Khuzhin	b90152b6ec	Fix clickhouse-su building in splitted build - Add status log message - Add it to clickhouse-bundle in shared build - Move clickhouse-su.cpp into su.cpp, since executable does not have include directories of linked libraries (dbms here), only clickhouse-lib-su does, hence it cannot find includes CI: https://github.com/ClickHouse/ClickHouse/runs/7566319416?check_suite_focus=true Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2022-07-29 11:36:51 +03:00
Robert Schulze	199e254777	Merge remote-tracking branch 'origin/master' into no-split-binary	2022-07-28 15:54:22 +00:00
Alexey Milovidov	071374b152	Remove SPLIT_BINARY	2022-07-24 01:15:54 +02:00
Yakov Olkhovskiy	e5f165d909	Merge branch 'master' into cmake-self-extracting-executable	2022-07-13 16:09:18 -04:00
Robert Schulze	1a7727a254	Prefix overridden add_executable() command with "clickhouse_" A simple HelloWorld program with zero includes except iostream triggers a build of ca. 2000 source files. The reason is that ClickHouse's top-level CMakeLists.txt overrides "add_executable()" to link all binaries against "clickhouse_new_delete". This links against "clickhouse_common_io", which in turn has lots of 3rd party library dependencies ... Without linking "clickhouse_new_delete", the number of compiled files for "HelloWorld" goes down to ca. 70. As an example, the self-extracting-executable needs none of its current dependencies but other programs may also benefit. In order to restore access to the original "add_executable()", the overriding version is now prefixed. There is precedent for a "clickhouse_" prefix (as opposed to "ch_"), for example "clickhouse_split_debug_symbols". In general prefixing makes sense also because overriding CMake commands relies on undocumented behavior and is considered not-so-great practice (*). (*) https://crascit.com/2018/09/14/do-not-redefine-cmake-commands/	2022-07-11 19:36:18 +02:00
Yakov Olkhovskiy	8a3f124982	add self-extracting to clickhouse-bundle	2022-07-06 22:01:21 -04:00
Yakov Olkhovskiy	c6db15458a	build utils/self-extracting-executable/compressor whenever we want to build compressed binary	2022-07-06 20:40:41 -04:00
Robert Schulze	59236d60c9	Merge pull request #38654 from ClickHouse/better-naming-for-split-debug-symbols Better naming for stuff related to splitted debug symbols	2022-07-01 09:28:41 +02:00
Robert Schulze	bb358617e1	Better naming for stuff related to splitted debug symbols The previous name was slightly misleading, e.g. it is not about "installing stripped binaries" but about splitting debug symbols from the binary.	2022-06-30 23:41:27 +02:00
Yakov Olkhovskiy	5d36994c4d	self-extracting requires utils (uses utils/self-extracting-executable/compressor)	2022-06-27 11:41:23 -04:00
mergify[bot]	4e5fd226c8	Merge branch 'master' into utility-self-extracting	2022-06-27 12:26:16 +00:00
Yakov Olkhovskiy	8ce6b8226d	Update CMakeLists.txt	2022-06-27 08:25:21 -04:00
Yakov Olkhovskiy	39ea5ffdcb	compress clickhouse executable, new target 'self-extracted' is added	2022-06-27 01:36:27 -04:00
Robert Schulze	bc46cef63c	Minor follow-up - change ELF section name to ".clickhouse.hash" (lowercase seems standard) - more expressive/concise integrity check messages at startup	2022-06-14 08:52:13 +00:00
Robert Schulze	bc6f30fd40	Move binary hash to ELF section ".ClickHouse.hash"	2022-06-13 08:46:23 +00:00
Varinara	ed6e8176fe	Add basic commands for disk tool (list-disks, list, move, remove, link, copy, read, write) + tests	2022-06-06 16:52:58 +03:00
Alexey Milovidov	fd7642b6aa	Fix "splitted" build	2022-05-24 06:04:48 +02:00
Alexey Milovidov	2f93f11144	Maybe better	2022-05-23 02:03:13 +02:00
alesapin	f88f654798	Update programs/CMakeLists.txt Co-authored-by: Mikhail f. Shiryaev <felixoid@clickhouse.com>	2022-04-22 11:30:35 +02:00
Mikhail f. Shiryaev	49a572e00c	Build only clickhouse-keeper with musl	2022-04-21 13:43:24 +02:00
alesapin	ba81816dc1	Better cmake	2022-04-20 12:11:55 +02:00
alesapin	2f496c7945	Merge branch 'master' into musl-check	2022-04-12 14:40:47 +02:00
alesapin	e790a73081	Simplify strip for new packages	2022-03-23 15:14:30 +01:00
Mikhail f. Shiryaev	1d362796df	Fix strip bug	2022-03-22 11:10:02 +01:00
Mikhail f. Shiryaev	fa2a9bb9aa	Separate BUILD_STRIPPED_BINARIES_PREFIX to option and parameter	2022-03-22 11:10:02 +01:00
alesapin	96c0e9fddf	Better cmake	2022-03-11 15:47:07 +01:00
alesapin	e53578910b	Add ability to strip binaries in cmake	2022-03-10 22:23:28 +01:00
Azat Khuzhin	4a0facd341	Remove MAKE_STATIC_LIBRARIES (in favor of USE_STATIC_LIBRARIES) There is no more MAKE_*, so remove this alias. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2022-01-24 17:28:33 +03:00
Azat Khuzhin	7496ed7fde	Remove unbundled gtest support v2: Fix unit tests (do not rely on USE_GTEST)	2022-01-20 10:01:54 +03:00
Azat Khuzhin	5c32f6dd3e	Remove unbundled nuraft support	2022-01-20 08:47:16 +03:00
Alexey Milovidov	70ad617062	Disable odbc-bridge and library-bridge	2022-01-09 11:34:27 +03:00
Alexey Boykov	c5cb4e071c	Creating only one binary, check compatibility	2021-10-07 21:01:36 +03:00
Alexey Milovidov	e9e77b4403	.tech -> .com	2021-09-22 03:22:57 +03:00
kssenii	a930823518	Fix build	2021-08-29 14:18:04 +00:00
kssenii	f27f519aa2	Fix build and add example	2021-08-28 20:35:51 +00:00
kssenii	073d7fdd5e	Merge branch 'master' of https://github.com/ClickHouse/ClickHouse into disk-over-web-server	2021-08-06 19:42:29 +00:00
alesapin	743600a359	Merge branch 'update_buffer_size_in_nuraft' into zookeeper_snapshots	2021-06-21 13:46:22 +03:00
Mike Kot	4c391f8e99	SYSTEM RESTORE REPLICA replica [ON CLUSTER cluster] (#13652 ) * initial commit: add setting and stub * typo * added test stub * fix * wip merging new integration test and code proto * adding steps interpreters * adding firstly proposed solution (moving parts etc) * added checking zookeeper path existence * fixing the include * fixing and sorting includes * fixing outdated struct * fix the name * added ast ptr as level of indirection * fix ref * updating the changes * working on test stub * fix iterator -> reference * revert rocksdb submodule update * fixed show privileges test * updated the test stub * replaced rand() with thread_local_rng(), updated the tests updated the test fixed test config path test fix removed error messages fixed the test updated the test fixed string literal fixed literal typo: = * fixed the empty replica error message * updated the test and the code with logs * updated the possible test cases, updated * added the code/test milestone comments * updated the test (added more testcases) * replaced native assert with CH one * individual replicas recursive delete fix * updated the AS db.name AST * two small logging fixes * manually generated AST fixes * Updated the test, added the possible algo change * Some thoughts about optimizing the solution: ALTER MOVE PARTITION .. TO TABLE -> move to detached/ + ALTER ... ATTACH * fix * Removed the replica sync in test as it's invalid * Some test tweaks * tmp * Rewrote the algo by using the executeQuery instead of hand-crafting the ASTPtr. Two questions still active. * tr: logging active parts * Extracted the parts moving algo into a separate helper function * Fixed the test data and the queries slightly * Replaced query to system.parts to direct invocation, started building the test that breaks on various parts. 
* Added the case for tables when at least one replica is alive * Updated the test to test replicas restoration by detaching/attaching * Altered the test to check restoration without replica restart * Added the tables swap in the start if the server failed last time * Hotfix when only /replicas/replica... path was deleted * Restore ZK paths while creating a replicated MergeTree table * Updated the docs, fixed the algo for individual replicas restoration case * Initial parts table storage fix, tests sync fix * Reverted individual replica restoration to general algo * Slightly optimised getDataParts * Trying another solution with parts detaching * Rewrote algo without any steps, added ON CLUSTER support * Attaching parts from other replica on restoration * Getting part checksums from ZK * Removed ON CLUSTER, finished working solution * Multiple small changes after review * Fixing parallel test * Supporting rewritten form on cluster * Test fix * Moar logging * Using source replica as checksum provider * improve test, remove some code from parser * Trying solution with move to detached + forget * Moving all parts (not only Committed) to detached * Edited docs for RESTORE REPLICA * Re-merging * minor fixes Co-authored-by: Alexander Tokmakov <avtokmakov@yandex-team.ru>	2021-06-20 11:24:43 +03:00
kssenii	7cc6588f96	Tool to put files on server	2021-06-18 14:11:32 +00:00
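
The two-step protocol described in the commit message for 60f9f6855d above (reload the handler and get the tree count once, then evaluate once per chunk) can be illustrated with a small in-memory model. This is only a sketch, not ClickHouse code: `FakeBridge`, `FakeModelHandler`, the fixed tree count and the sum-of-features "inference" are all hypothetical stand-ins, and no HTTP transport or real catboost library is involved.

```python
from dataclasses import dataclass, field


@dataclass
class FakeModelHandler:
    """Hypothetical stand-in for a loaded catboost library handler."""
    library_path: str
    model_path: str
    tree_count: int = 3  # made-up value, for illustration only

    def evaluate(self, rows):
        # Placeholder "inference": sums the features of each row.
        return [float(sum(row)) for row in rows]


@dataclass
class FakeBridge:
    """In-memory model of the bridge-side handler cache (assumed design)."""
    handlers: dict = field(default_factory=dict)
    load_count: int = 0

    def ping(self):
        # Models the handshake ping: the bridge is reachable.
        return True

    def get_tree_count(self, library_path, model_path):
        # Step (1): drop a possibly stale handler, reload it, query the tree count.
        self.handlers.pop(model_path, None)
        handler = FakeModelHandler(library_path, model_path)
        self.handlers[model_path] = handler
        self.load_count += 1
        return handler.tree_count

    def evaluate(self, model_path, rows):
        # Step (2): the handler is expected to be loaded already by step (1).
        handler = self.handlers.get(model_path)
        if handler is None:
            raise KeyError(f"no handler loaded for {model_path}")
        return handler.evaluate(rows)


bridge = FakeBridge()
bridge.get_tree_count("/home/user/libcatboost.so", "/home/user/model.bin")
for chunk in ([[1, 2], [3, 4]], [[5, 6]]):
    print(bridge.evaluate("/home/user/model.bin", chunk))
```

Reloading in step (1) rather than caching across queries mirrors the rationale given in the commit message: the handler holds state that may be stale if the model file was updated between two queries.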

94 Commits