---
slug: /en/development/integrating_rust_libraries
---
# Integrating Rust libraries
This guide describes Rust library integration using the BLAKE3 hash function as an example.

The first step is to add the library to the `/rust` folder. To do this, create an empty Rust project and include the required library as a dependency in `Cargo.toml`. You also need to configure the new project to compile as a static library by adding `crate-type = ["staticlib"]` to `Cargo.toml`.

Next, link the library to CMake using the Corrosion library. First, add the library folder to the `CMakeLists.txt` inside the `/rust` folder. Then add a `CMakeLists.txt` file to the library directory and call the Corrosion import function in it. These lines were used to import BLAKE3:
```
corrosion_import_crate(MANIFEST_PATH Cargo.toml NO_STD)

target_include_directories(_ch_rust_blake3 INTERFACE include)
add_library(ch_rust::blake3 ALIAS _ch_rust_blake3)
```
This creates a correct CMake target using Corrosion and then gives it a more convenient alias. Note that the name `_ch_rust_blake3` comes from `Cargo.toml`, where it is used as the project name (`name = "_ch_rust_blake3"`).

Since Rust data types are not compatible with C/C++ data types, we use our empty library project to create shim methods that convert the data received from C/C++, call the library methods, and convert the output data back. For example, this method was written for BLAKE3:
```
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

#[no_mangle]
pub unsafe extern "C" fn blake3_apply_shim(
    begin: *const c_char,
    _size: u32,
    out_char_data: *mut u8,
) -> *mut c_char {
    // Report a null input by returning a heap-allocated error string.
    if begin.is_null() {
        let err_str = CString::new("input was a null pointer").unwrap();
        return err_str.into_raw();
    }
    let mut hasher = blake3::Hasher::new();
    // The input is read as a NUL-terminated C string; `_size` is not used here.
    let input_bytes = CStr::from_ptr(begin);
    let input_res = input_bytes.to_bytes();
    hasher.update(input_res);
    let mut reader = hasher.finalize_xof();
    // Write the hash directly into the caller-provided buffer of `blake3::OUT_LEN` bytes.
    reader.fill(std::slice::from_raw_parts_mut(out_char_data, blake3::OUT_LEN));
    // A null return value signals success.
    std::ptr::null_mut()
}
```
This method takes a C-compatible string, its size, and an output pointer as input. It converts the C-compatible inputs into the types used by the actual library methods and calls them. After that, it should convert the library methods' outputs back into C-compatible types. In this particular case the library supports writing directly into a pointer via the `fill()` method, so no conversion was needed. The main advice here is to create fewer methods, so that fewer conversions are needed on each call and the overhead stays low.

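The shim above reads its input as a NUL-terminated string and leaves the size argument unused. As a minimal sketch of the same conversion pattern that uses the explicit length instead, the example below builds a byte slice from the pointer and size, so embedded NUL bytes are preserved. The names `example_hash_shim`, `fnv1a`, and `OUT_LEN` are hypothetical, and a simple FNV-1a hash stands in for the real library call so that the sketch compiles without external crates.

```rust
use std::ffi::CString;
use std::os::raw::c_char;
use std::slice;

/// Hypothetical output size for this sketch (BLAKE3 would use `blake3::OUT_LEN`).
pub const OUT_LEN: usize = 8;

/// Hypothetical stand-in for a library call: 64-bit FNV-1a over the input bytes.
fn fnv1a(data: &[u8]) -> [u8; OUT_LEN] {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &byte in data {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash.to_le_bytes()
}

/// Hypothetical shim: converts (pointer, length) into a Rust slice, calls the
/// "library", and writes the result into the caller-provided buffer.
/// Returns null on success or a heap-allocated error string on failure.
#[no_mangle]
pub unsafe extern "C" fn example_hash_shim(
    begin: *const c_char,
    size: u32,
    out_char_data: *mut u8,
) -> *mut c_char {
    if begin.is_null() || out_char_data.is_null() {
        let err_str = CString::new("input or output was a null pointer").unwrap();
        return err_str.into_raw();
    }
    // Use the explicit length rather than scanning for a NUL terminator.
    let input = slice::from_raw_parts(begin as *const u8, size as usize);
    let digest = fnv1a(input);
    // Copy the result into the output buffer, which must hold OUT_LEN bytes.
    slice::from_raw_parts_mut(out_char_data, OUT_LEN).copy_from_slice(&digest);
    std::ptr::null_mut()
}
```

The caller is assumed to pass an output buffer of at least `OUT_LEN` bytes, just as the BLAKE3 shim assumes a buffer of `blake3::OUT_LEN` bytes.
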
Note that the `#[no_mangle]` attribute and `extern "C"` are mandatory for all such methods. Without them, it is not possible to compile the library in a C/C++-compatible way. They are also necessary for the next step of the integration.

After writing the code for the shim methods, we need to prepare the header file for the library. This can be done manually, or you can use the cbindgen library to generate it automatically. If you use cbindgen, you need to write a `build.rs` build script and include cbindgen as a build dependency.

An example of a build script that can auto-generate a header file:
```
use std::env;

fn main() {
    let crate_dir = env::var("CARGO_MANIFEST_DIR").unwrap();
    let package_name = env::var("CARGO_PKG_NAME").unwrap();
    // The header is written to include/<package name>.h
    let output_file = format!("include/{}.h", package_name);
    match cbindgen::generate(&crate_dir) {
        Ok(header) => {
            header.write_to_file(&output_file);
        }
        Err(err) => {
            panic!("{}", err)
        }
    }
}
```
Also, use the `#[no_mangle]` attribute and `extern "C"` for every C-compatible method. Without them the library may compile incorrectly and cbindgen will not generate the header.

After all these steps you can test your library in a small project to find any compatibility or header generation problems. If problems occur during header generation, you can try to configure it with a `cbindgen.toml` file (a template is available here: [https://github.com/eqrion/cbindgen/blob/master/template.toml](https://github.com/eqrion/cbindgen/blob/master/template.toml)).

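When testing a shim in a small project, the error path is worth exercising as well. A string returned through `CString::into_raw` is owned by Rust's allocator, so the caller has to hand it back to Rust to free it. The sketch below assumes a hypothetical validating shim `example_check_shim`, a hypothetical companion `example_free_char_pointer`, and a helper `call_and_collect_error` that calls them the way a C/C++ caller would; none of these names come from the BLAKE3 integration itself.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical shim that only validates its input and follows the same
/// convention as the BLAKE3 shim: null on success, error string on failure.
#[no_mangle]
pub unsafe extern "C" fn example_check_shim(begin: *const c_char) -> *mut c_char {
    if begin.is_null() {
        return CString::new("input was a null pointer").unwrap().into_raw();
    }
    std::ptr::null_mut()
}

/// Hypothetical companion: takes ownership of an error string back and frees it.
#[no_mangle]
pub unsafe extern "C" fn example_free_char_pointer(ptr_to_free: *mut c_char) {
    if !ptr_to_free.is_null() {
        drop(CString::from_raw(ptr_to_free));
    }
}

/// Calls the shim as a C/C++ caller would and returns the error message, if any.
pub fn call_and_collect_error(input: Option<&CStr>) -> Option<String> {
    let ptr = input.map_or(std::ptr::null(), |s| s.as_ptr());
    unsafe {
        let err = example_check_shim(ptr);
        if err.is_null() {
            return None;
        }
        // Copy the message before releasing the Rust-owned allocation.
        let message = CStr::from_ptr(err).to_string_lossy().into_owned();
        example_free_char_pointer(err);
        Some(message)
    }
}
```

Passing `None` exercises the null-pointer branch and yields the error message, while a valid C string yields no error.
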
One problem that occurred when integrating BLAKE3 is worth noting:
MemorySanitizer can produce false-positive reports because it cannot see whether some variables in Rust are initialized. This was solved by writing a method with more explicit definitions for some variables, although this implementation is slower and is used only to fix MemorySanitizer builds.