From now on, cargo will not download anything from the internet during
builds. This step has been moved to the docker image build (via cargo
vendor).
And now cargo inside docker.io/clickhouse/binary-builder will not use
any crates from the internet, so we don't need to add --offline to
cargo commands in cmake (corrosion_import_crate()).
Also, the docker build command has been adjusted to follow symlinks
inside the build context (by using tar); this is required for Rust
packages.
Note that to make a proper Cargo.lock that could be vendored, I did the
following:
- removed the per-project locks (there is no automatic way to sync the
  workspace Cargo.lock with the per-project Cargo.lock files, since cargo
  update/generate-lockfile apparently uses only the per-project
  Cargo.toml files; -Z minimal-versions does not help either)
- to generate a Cargo.lock with fewer changes, I pinned the versions in
  Cargo.toml strictly, i.e. not 'foo = "0.1"' but 'foo = "=0.1"', then
  generated the Cargo.lock for the workspace, and afterwards reverted
  this part.
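The temporary pinning step from the second item (foo = "0.1" to
foo = "=0.1") could be scripted roughly as below. This is a minimal
sketch: pin_exact is a hypothetical helper, not what was actually used,
and it only rewrites plain string requirements, leaving inline tables
such as foo = { version = "0.1" } and existing operators untouched.

```rust
/// Hypothetical helper: turn a plain `name = "x.y"` dependency line into
/// an exact pin `name = "=x.y"`. Inline tables and requirements that
/// already start with an operator (`=`, `^`, `~`, ...) are left as-is.
fn pin_exact(line: &str) -> String {
    if let Some((name, rest)) = line.split_once('=') {
        let value = rest.trim();
        // Only touch a quoted string value.
        if value.len() >= 2 && value.starts_with('"') && value.ends_with('"') {
            let req = &value[1..value.len() - 1];
            // Pin only bare versions, i.e. ones starting with a digit.
            if matches!(req.chars().next(), Some(c) if c.is_ascii_digit()) {
                return format!("{} = \"={}\"", name.trim(), req);
            }
        }
    }
    line.to_string()
}

fn main() {
    let manifest = [
        "[dependencies]",
        "foo = \"0.1\"",
        "bar = \"=1.2.3\"",
        "baz = { version = \"0.3\" }",
    ];
    for line in manifest {
        println!("{}", pin_exact(line));
    }
}
```

Reverting afterwards is then just restoring the original Cargo.toml files
while keeping the generated workspace Cargo.lock.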
Plus I had to update the dependencies afterwards, since otherwise there
are conflicts with the dependencies of the std library. Non-trivial.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
LTO in Rust produces multiple definitions of `rust_eh_personality' (and
a few others); to overcome this, --allow-multiple-definition has been
added.
Query for benchmark:
SELECT ignore(BLAKE3(materialize('Lorem ipsum dolor sit amet, consectetur adipiscing elit'))) FROM numbers(1000000000) FORMAT `Null`
upstream           : Elapsed: 2.494 sec. Processed 31.13 million rows, 249.08 MB (12.48 million rows/s., 99.86 MB/s.)
upstream + rust lto: Elapsed: 13.56 sec. Processed 191.9 million rows, 1.5400 GB (14.15 million rows/s., 113.22 MB/s.)
llvm BLAKE3        : Elapsed: 3.053 sec. Processed 43.24 million rows, 345.88 MB (14.16 million rows/s., 113.28 MB/s.)
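The three runs report different row counts (the query over
numbers(1000000000) seems to have been stopped early each time), so the
elapsed times are not directly comparable; throughput is. A small check
that recomputes rows/s from the figures above:

```rust
/// Throughput in millions of rows per second, from the processed row
/// count (in millions) and the elapsed time printed by the client.
fn throughput(rows_millions: f64, elapsed_sec: f64) -> f64 {
    rows_millions / elapsed_sec
}

fn main() {
    // (label, processed rows in millions, elapsed seconds), copied from above.
    let runs = [
        ("upstream", 31.13, 2.494),
        ("upstream + rust lto", 191.9, 13.56),
        ("llvm BLAKE3", 43.24, 3.053),
    ];
    for (label, rows, secs) in runs {
        // Different row counts per run: compare rows/s, not elapsed time.
        println!("{:<20} {:.2} million rows/s", label, throughput(rows, secs));
    }
}
```

This reproduces the reported 12.48, 14.15 and 14.16 million rows/s, i.e.
both the Rust LTO build and BLAKE3 from LLVM process roughly 13% more
rows per second than upstream.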
Note, I thought about simply replacing it with BLAKE3 from LLVM, but:
- this will not solve LTO issues for Rust (and more libraries could be
  added in the future)
- it makes integrating_rust_libraries.md useless (and there is even a
  blog post)
So instead I've decided to add this quirk (--allow-multiple-definition)
to fix builds.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Right now fuzzy search is too smart for SQL: it even takes case into
account, which it should not (you don't want to type "SELECT" instead
of "select" to find the query).
And to tell the truth, I think overly smart fuzzy searching for SQL
queries is not required, and is only harmful.
Exact matching seems a better algorithm for SQL. It is not 100% exact:
it splits the input by spaces and applies a separate matcher to each
word.
Note that if you think "space is not enough" as the delimiter, you
should first know that it is the delimiter only for the input query, so
to match "system.query_log" you can use "sy qu log" (you can also
disable exact mode by prepending the "'" char).
Also, it ignores case by default, which is the behaviour expected from
CaseMatching::Ignore.
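The matching behaviour described above can be modelled roughly like
this. It is a simplified sketch, not skim's implementation: exact_match
is a hypothetical function, terms are matched in any order, and the "'"
prefix for switching a term back to fuzzy matching is not modelled.

```rust
/// Simplified model of exact, case-insensitive multi-term matching:
/// the query is split on whitespace and every term must occur as a
/// substring of the candidate history line (in any order).
fn exact_match(candidate: &str, query: &str) -> bool {
    let candidate = candidate.to_lowercase();
    query
        .split_whitespace()
        .all(|term| candidate.contains(&term.to_lowercase()))
}

fn main() {
    let history = [
        "SELECT * FROM system.query_log LIMIT 1",
        "select count() from system.parts",
        "SELECT 1",
    ];
    // "sy qu log" finds system.query_log, regardless of the query's case.
    for line in history {
        if exact_match(line, "sy qu log") {
            println!("{}", line);
        }
    }
}
```

Splitting only the input (not the history lines) is what lets short
fragments like "sy" and "qu" match inside a single dotted identifier.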
TL;DR
Just for the history, I will describe what had been tried.
At first I tried CaseMatching::Ignore, but it does not help for the
SkimV1/SkimV2/Clangd matchers.
So I converted the lines from the history and the input query to lower
case. However, this does not work for UPPER CASE, since only the initial
portion of the query had been converted to lower case.
Then I looked into the skim/fuzzy-matcher crates' code for the reason
why CaseMatching::Ignore does not work, and found that there is still a
penalty for case mismatch, but there is no way to pass it from the user
code, so I tried to monkey-patch the library's code with guerrilla, and
it works:
    use fuzzy_matcher::skim::SkimScoreConfig;

    // Avoid penalty for case mismatch (even with CaseMatching::Ignore)
    let _guard = guerrilla::patch0(SkimScoreConfig::default, || {
        let score_match = 16;
        let gap_start = -3;
        let gap_extension = -1;
        let bonus_first_char_multiplier = 2;
        SkimScoreConfig {
            score_match,
            gap_start,
            gap_extension,
            bonus_first_char_multiplier,
            bonus_head: score_match / 2,
            bonus_break: score_match / 2 + gap_extension,
            bonus_camel: score_match / 2 + 2 * gap_extension,
            bonus_consecutive: -(gap_start + gap_extension),
            // Default was: penalty_case_mismatch: gap_extension * 2,
            penalty_case_mismatch: 0,
        }
    });
    // The patch stays active until `_guard` is dropped.
But this does not look like trivial code, so I decided to look around,
and realized that "exact" matching should do what is required for query
completion (at least from my point of view).
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
In case of multi-line queries in the history, skim may leave some
symbols on the screen, which looks icky.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
This leads to problems when you switch compiler flags, for example:
$ cmake -DSANITIZE=memory ..
$ ninja
$ cmake -DSANITIZE= ..
$ ninja
And this leads to:
ld.lld-15: error: undefined symbol: __msan_init
>>> referenced by lib.rs.cc
>>> lib.rs.o:(msan.module_ctor) in archive rust/skim/RelWithDebInfo/lib_ch_rust_skim_rust.a
Reported-by: @alexey-milovidov
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Before this patch, corrosion required CMAKE_BUILD_TYPE to match one of
CMAKE_CONFIGURATION_TYPES, which is
"RelWithDebInfo;Debug;Release;MinSizeRel", so if you were using
CMAKE_BUILD_TYPE=debug, it would not work.
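One way to accept such values is a case-insensitive lookup of the build
type in CMAKE_CONFIGURATION_TYPES. The sketch below only illustrates that
idea; canonical_config is a hypothetical helper, not corrosion's code and
not necessarily how this patch solves it.

```rust
/// Hypothetical helper: map a user-supplied CMAKE_BUILD_TYPE onto the
/// matching entry of the ';'-separated CMAKE_CONFIGURATION_TYPES,
/// ignoring ASCII case. Returns None if there is no such configuration.
fn canonical_config<'a>(build_type: &str, configuration_types: &'a str) -> Option<&'a str> {
    configuration_types
        .split(';')
        .find(|cfg| cfg.eq_ignore_ascii_case(build_type))
}

fn main() {
    let types = "RelWithDebInfo;Debug;Release;MinSizeRel";
    // "debug" fails a case-sensitive comparison with "Debug",
    // but resolves to it here.
    println!("{:?}", canonical_config("debug", types));
    println!("{:?}", canonical_config("Profile", types));
}
```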
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Note that it can fail the client if skim itself fails; however, I
haven't seen it panic, so let's try.
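If that ever becomes a problem, a Rust-side guard could turn a panic
into an error before it reaches the C++ client. This is a sketch under
assumptions: run_skim is a hypothetical stand-in (not skim's API), and
nothing here claims the integration already does this.

```rust
use std::panic;

/// Hypothetical stand-in for the call into skim; it panics on empty
/// input only to simulate a failure.
fn run_skim(query: &str) -> String {
    if query.is_empty() {
        panic!("simulated skim failure");
    }
    format!("picked: {}", query)
}

/// Run the picker, but catch a Rust panic and return it as an Err
/// instead of letting it escape into the C++ caller.
fn run_skim_guarded(query: &str) -> Result<String, String> {
    panic::catch_unwind(|| run_skim(query)).map_err(|_| "skim panicked".to_string())
}

fn main() {
    println!("{:?}", run_skim_guarded("select 1"));
    // The default panic hook still prints the message to stderr.
    println!("{:?}", run_skim_guarded(""));
}
```

catch_unwind only helps with unwinding panics; a build with
panic = "abort" would still terminate the process.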
P.S. About adding USE_SKIM to the configure header instead of just a
compile option for the target: it is better, because it avoids
recompiling lots of C++ headers, since we have to add the skim library
as PUBLIC. Anyway, this will be resolved in a different way, but
separately.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>