Previously, a skip index could only be created with quoted arguments:
INDEX index_name column_name TYPE vector_similarity('hnsw', 'L2Distance')
Now, unquoted arguments work as well:
INDEX index_name column_name TYPE vector_similarity(hnsw, L2Distance)
USearch (like FAISS) lets callers specify the distance function,
quantization, and various HNSW meta-parameters for index creation and
search. Some users asked for greater configurability, so let's expose
these parameters.
Index creation now requires either
- 2 parameters (with the other 4 parameters taking on default values), or
- 6 parameters for full control
This commit also removes quantization `f64` (that would be upsampling).
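As a rough illustration of the two forms, here is a minimal sketch. The
names `tab`, `id`, `vec`, and `idx` are hypothetical. The order of the
four extra arguments (quantization first, then three HNSW
meta-parameters) and the values shown are assumptions, not taken from
this commit. The setting name is the one introduced for this index type.

```sql
-- The index type is experimental and must be enabled first.
SET allow_experimental_vector_similarity_index = 1;

-- Two-parameter form: the other four parameters take default values.
CREATE TABLE tab
(
    id UInt64,
    vec Array(Float32),
    INDEX idx vec TYPE vector_similarity('hnsw', 'L2Distance')
)
ENGINE = MergeTree
ORDER BY id;

-- Six-parameter form for full control.
-- ASSUMPTION: argument order and values are illustrative placeholders
-- (quantization, then three HNSW meta-parameters).
CREATE TABLE tab_full
(
    id UInt64,
    vec Array(Float32),
    INDEX idx vec TYPE vector_similarity('hnsw', 'L2Distance', 'f32', 16, 128, 64)
)
ENGINE = MergeTree
ORDER BY id;
```

Any number of parameters other than 2 or 6 should be rejected at index
creation time.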
First, the index type name "vector_similarity" is more descriptive and
user-friendly than "usearch". Second, we should not expose the name of
the library doing the work (usearch). Of course, the docs will continue
to mention usearch (credit where credit is due).
The existing setting `allow_experimental_usearch_index` was marked
obsolete. A new setting `allow_experimental_vector_similarity_index` was
added.
This kind of vector similarity search query is rather obscure and rare
in practice. Such queries require the user to specify a maximum
distance, which is not intuitive to obtain. Furthermore, they are not
natively supported in USearch, so the vector search index had to emulate
them.
Therefore, this commit simplifies the code base and restricts vector
search to ORDER-BY queries only.
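For illustration, a minimal sketch of the ORDER-BY query shape that
remains supported. The table `tab`, its columns `id` and
`vec Array(Float32)`, the reference vector, and the limit are all
hypothetical. The commented-out filter shows the assumed shape of the
removed query type, which needs a maximum distance.

```sql
-- Supported: return the 10 rows nearest to the reference vector.
SELECT id
FROM tab
ORDER BY L2Distance(vec, [0.1, 0.2, 0.3])
LIMIT 10;

-- No longer served by the index (assumed shape of the removed query
-- type): the caller has to pick a maximum distance such as 0.5.
-- SELECT id
-- FROM tab
-- WHERE L2Distance(vec, [0.1, 0.2, 0.3]) < 0.5
-- LIMIT 10;
```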
Annoy indexes fell out of favor in the community, at least when it comes
to vector databases. Such indexes work okay-ish in low dimensions, but
they suffer badly from the curse of dimensionality, which makes them
unsuitable for a high number of dimensions.
Now that Annoy is gone, issue (*) also disappears and we can drop
'no-ubsan', 'no-cpu-aarch64', and 'no-asan' from tests.
(*) spotify/annoy#456
Indexes for approximate nearest neighbor (ANN) search (Annoy, USearch)
can currently be built on columns of type Array(Float32) or
Tuple(Float32[, Float32[, ...]]). In practice, only arrays are relevant,
which makes sense: arrays store high-dimensional embeddings
consecutively, and the additional flexibility of different data types in
a tuple is not needed.
Therefore, this commit removes support for ANN indexes over tuple
columns to simplify the code, tests, and docs.