add test and docs

FArthur-cmd 2022-08-22 19:00:20 +00:00
parent 6c7e04e13d
commit 9222e41a5b
3 changed files with 26 additions and 1 deletions


@@ -32,13 +32,18 @@ Approximate Nearest Neighbor Search Indexes (`ANNIndexes`) are similar to skip i
LIMIT N
```
In these queries, `DistanceFunction` is one of the [distance functions](../../../sql-reference/functions/distance-functions). `Point` is a known vector (for example, `(0.1, 0.1, ... )`). `Value` is a float that bounds the neighbourhood.
In these queries, `DistanceFunction` is one of the [distance functions](../../../sql-reference/functions/distance-functions). `Point` is a known vector (for example, `(0.1, 0.1, ... )`). To avoid writing large vectors inline, use [client parameters](../../../interfaces/cli.md#queries-with-parameters-cli-queries-with-parameters). `Value` is a float that bounds the neighbourhood.
!!! note "Note"
An ANN index can't speed up a query that combines both types (`WHERE` and `ORDER BY`); it supports only one of them per query. Every query must have a `LIMIT`, because the algorithms search for nearest neighbors and need a specific number of them to return.
!!! note "Note"
Indexes are applied only to queries whose limit is less than the `max_limit_for_ann_queries` setting. This helps to avoid memory overflows in queries with a large limit. The `max_limit_for_ann_queries` setting can be changed if you know you can provide enough memory. The default value is `1000000`.
Both types of queries are handled the same way. The indexes get `n` neighbors (where `n` is taken from the `LIMIT` clause) and work with them. For an `ORDER BY` query, they remember the numbers of all parts of the granule that contain at least one neighbor. For a `WHERE` query, they remember only those parts that satisfy the condition.
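
For intuition, the following is a minimal brute-force sketch in Python (not ClickHouse code, and not how the index is implemented) of what the two query forms compute exactly. It uses the five rows that appear in this commit's test reference output. The helper names `l2_distance`, `where_query`, and `order_by_query` are hypothetical, and the row order is assumed to be by id.

```python
import math

# Rows as they appear in the test reference output: (id, embedding).
rows = [
    (1, [0.0, 0.0, 10.0]),
    (2, [0.0, 0.0, 10.5]),
    (3, [0.0, 0.0, 9.5]),
    (4, [0.0, 0.0, 9.7]),
    (5, [0.0, 0.0, 10.2]),
]


def l2_distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def where_query(rows, point, value, limit):
    """WHERE DistanceFunction(column, Point) < Value LIMIT N, done exhaustively."""
    matches = [row for row in rows if l2_distance(row[1], point) < value]
    return matches[:limit]


def order_by_query(rows, point, limit):
    """ORDER BY DistanceFunction(column, Point) LIMIT N, done exhaustively."""
    return sorted(rows, key=lambda row: l2_distance(row[1], point))[:limit]


target = [0.0, 0.0, 10.0]
where_ids = [row[0] for row in where_query(rows, target, 1.0, 5)]
order_ids = [row[0] for row in order_by_query(rows, target, 3)]
print(where_ids)
print(order_ids)
```

An ANN index approximates these results without scanning every row, which is why both forms need a `LIMIT`.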
## Create table with ANNIndex
```sql


@@ -6,3 +6,11 @@
1 [0,0,10]
5 [0,0,10.2]
4 [0,0,9.7]
1 [0,0,10]
2 [0,0,10.5]
3 [0,0,9.5]
4 [0,0,9.7]
5 [0,0,10.2]
1 [0,0,10]
5 [0,0,10.2]
4 [0,0,9.7]


@@ -24,6 +24,18 @@ FROM 02354_annoy
ORDER BY L2Distance(embedding, [0.0, 0.0, 10.0])
LIMIT 3;
SET param_02354_target_vector='[0.0, 0.0, 10.0]';
SELECT *
FROM 02354_annoy
WHERE L2Distance(embedding, {02354_target_vector: Array(Float32)}) < 1.0
LIMIT 5;
SELECT *
FROM 02354_annoy
ORDER BY L2Distance(embedding, {02354_target_vector: Array(Float32)})
LIMIT 3;
SELECT *
FROM 02354_annoy
ORDER BY L2Distance(embedding, [0.0, 0.0])