From now on cargo will not download anything from the internet during
builds. This step had been moved for docker image builds (via cargo
vendor).
And now cargo inside docker.io/clickhouse/binary-builder will not use
any crates from the internet, so we don't need to add --offline for
cargo commands in cmake (corrosion_import_crate()).
Also the docker build command had been adjusted to allow following
symlinks inside build context, by using tar, this is required for Rust
packages.
Note, that to make proper Cargo.lock that could be vendored I did the
following:
- per-project locks had been removed (since there is no automatic way to
sync the workspace Cargo.lock with per-project Cargo.lock, since cargo
update/generate-lockfile will use only per-project Cargo.toml files
apparently, -Z minimal-versions does not helps either)
- and to generate Cargo.lock with less changes I've pinned version in
the Cargo.toml strictly, i.e. not 'foo = "0.1"' but 'foo = "=0.1"'
then the Cargo.lock for workspace had been generated and afterwards
I've reverted this part.
Plus I have to update the dependencies afterwards, since otherwise there
are conflicts with dependencies for std library. Non trivial.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
In #47424 the readability-identifier-naming had been disabled for
clang-tidy builds, but the code base is already uses this notations, so
at least, let's enable it for clangd, so that developers who are using
editor/IDE with clangd will be warned.
FWIW clangd does not think that this check is slow [1], but I guess this
file hadn't been updated for quite a while.
[1]: https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clangd/TidyFastChecks.inc
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Yes, all writes to the history file is done under flock, *but*, before
writing the history file there is sort(), and so if you will run the
following tests the 01300_client_save_history_when_terminated_long will fail:
$ /src/tests/clickhouse-test --print-time -j2 01180_client_syntax_errors 01300_client_save_history_when_terminated_long 0001_select
And it has nothing todo with timeouts:
expect: does "" (spawn_id exp8) match glob pattern "for the history"? no
f8f1dbfdaaca :) select (1, 2
expect: does "\u001b[1Gf8f1dbfdaaca :) select \u001b[0;22;33m(\u001b[0;22;32m1\u001b[0;1m,\u001b[0m \u001b[0;22;32m2\u001b[0m\u001b[J" (spawn_id exp8) match glob pattern "for the history"? no
expect: does "\u001b[1Gf8f1dbfdaaca :) select \u001b[0;22;33m(\u001b[0;22;32m1\u001b[0;1m,\u001b[0m \u001b[0;22;32m2\u001b[0m\u001b[J\u001b[29G" (spawn_id exp8) match glob pattern "for the history"? no
expect: timed out
The "select (1, 2" is from 01180_client_syntax_errors
And use real file only when the history should be preserved across runs
(i.e. there are multiple invocations of clickhouse-client)
CI: https://s3.amazonaws.com/clickhouse-test-reports/0/1adfbac19fed9813725d8b1df14e617b58a45d20/stateless_tests__asan__[2/2].html
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Note, that it can the fail the client if the skim itself will fail,
however I haven't seen it panicd, so let's try.
P.S. about adding USE_SKIM into configure header instead of just compile
option for target, it is better, because it allows not to recompile lots
of C++ headers, since we have to add skim library as PUBLIC. But anyway
this will be resolved in a different way, but separatelly.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Note, that it can the fail the client if the skim itself will fail,
however I haven't seen it panicd, so let's try.
P.S. about adding USE_SKIM into configure header instead of just compile
option for target, it is better, because it allows not to recompile lots
of C++ headers, since we have to add skim library as PUBLIC.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
This commit moves the catboost model evaluation out of the server
process into the library-bridge binary. This serves two goals: On the
one hand, crashes / memory corruptions of the catboost library no longer
affect the server. On the other hand, we can forbid loading dynamic
libraries in the server (catboost was the last consumer of this
functionality), thus improving security.
SQL syntax:
SELECT
catboostEvaluate('/path/to/model.bin', FEAT_1, ..., FEAT_N) > 0 AS prediction,
ACTION AS target
FROM amazon_train
LIMIT 10
Required configuration:
<catboost_lib_path>/path/to/libcatboostmodel.so</catboost_lib_path>
*** Implementation Details ***
The internal protocol between the server and the library-bridge is
simple:
- HTTP GET on path "/extdict_ping":
A ping, used during the handshake to check if the library-bridge runs.
- HTTP POST on path "extdict_request"
(1) Send a "catboost_GetTreeCount" request from the server to the
bridge, containing a library path (e.g /home/user/libcatboost.so) and
a model path (e.g. /home/user/model.bin). Rirst, this unloads the
catboost library handler associated to the model path (if it was
loaded), then loads the catboost library handler associated to the
model path, then executes GetTreeCount() on the library handler and
finally sends the result back to the server. Step (1) is called once
by the server from FunctionCatBoostEvaluate::getReturnTypeImpl(). The
library path handler is unloaded in the beginning because it contains
state which may no longer be valid if the user runs
catboost("/path/to/model.bin", ...) more than once and if "model.bin"
was updated in between.
(2) Send "catboost_Evaluate" from the server to the bridge, containing
the model path and the features to run the interference on. Step (2)
is called multiple times (once per chunk) by the server from function
FunctionCatBoostEvaluate::executeImpl(). The library handler for the
given model path is expected to be already loaded by Step (1).
Fixes#27870
This commit moves the catboost model evaluation out of the server
process into the library-bridge binary. This serves two goals: On the
one hand, crashes / memory corruptions of the catboost library no longer
affect the server. On the other hand, we can forbid loading dynamic
libraries in the server (catboost was the last consumer of this
functionality), thus improving security.
SQL syntax:
SELECT
catboostEvaluate('/path/to/model.bin', FEAT_1, ..., FEAT_N) > 0 AS prediction,
ACTION AS target
FROM amazon_train
LIMIT 10
Required configuration:
<catboost_lib_path>/path/to/libcatboostmodel.so</catboost_lib_path>
*** Implementation Details ***
The internal protocol between the server and the library-bridge is
simple:
- HTTP GET on path "/extdict_ping":
A ping, used during the handshake to check if the library-bridge runs.
- HTTP POST on path "extdict_request"
(1) Send a "catboost_GetTreeCount" request from the server to the
bridge, containing a library path (e.g /home/user/libcatboost.so) and
a model path (e.g. /home/user/model.bin). Rirst, this unloads the
catboost library handler associated to the model path (if it was
loaded), then loads the catboost library handler associated to the
model path, then executes GetTreeCount() on the library handler and
finally sends the result back to the server. Step (1) is called once
by the server from FunctionCatBoostEvaluate::getReturnTypeImpl(). The
library path handler is unloaded in the beginning because it contains
state which may no longer be valid if the user runs
catboost("/path/to/model.bin", ...) more than once and if "model.bin"
was updated in between.
(2) Send "catboost_Evaluate" from the server to the bridge, containing
the model path and the features to run the interference on. Step (2)
is called multiple times (once per chunk) by the server from function
FunctionCatBoostEvaluate::executeImpl(). The library handler for the
given model path is expected to be already loaded by Step (1).
Fixes#27870
* replace exit with assert in test_single_page
* improve save_raw_single_page docs option
* More grammar fixes
* "Built from" link in new tab
* fix mistype
* Example of include in docs
* add anchor to meeting form
* Draft of translation helper
* WIP on translation helper
* Replace some fa docs content with machine translation