Update Annoy docs

Robert Schulze 2023-06-12 20:06:57 +00:00
parent 33ab8ee95c
commit 4f39ee51ae
GPG Key ID: 26703B55FB13728A
4 changed files with 17 additions and 14 deletions

@@ -100,7 +100,7 @@ ANN indexes support two types of queries:
 :::tip
 To avoid writing out large vectors, you can use [query
-parameters](/docs/en//interfaces/cli.md#queries-with-parameters-cli-queries-with-parameters), e.g.
+parameters](/docs/en/interfaces/cli.md#queries-with-parameters-cli-queries-with-parameters), e.g.
 ```bash
 clickhouse-client --param_vec='hello' --query="SELECT * FROM table WHERE L2Distance(vectors, {vec: Array(Float32)}) < 1.0"
@@ -128,14 +128,14 @@ granularity of granules, sub-indexes extrapolate matching rows to granule granul
 skip data at the granularity of index blocks.
 The `GRANULARITY` parameter determines how many ANN sub-indexes are created. Bigger `GRANULARITY` values mean fewer but larger ANN
-sub-indexes, up to the point where a column (or a column part) has only a single sub-index. In that case, the sub-index has a "global" view of
-all column rows and can directly return all granules of the column (part) with relevant rows (there are at at most `LIMIT <N>`-many
-such granules). In a second step, ClickHouse will load these granules and identify the actually best rows by performing a brute-force distance
-calculation over all rows of the granules. With a small `GRANULARITY` value, each of the sub-indexes returns up to `LIMIT N`-many granules.
-As a result, more granules need to be loaded and post-filtered. Note that the search accuracy is with both cases equally good, only the
-processing performance differs. It is generally recommended to use a large `GRANULARITY` for ANN indexes and fall back to a smaller
-`GRANULARITY` values only in case of problems like excessive memory consumption of the ANN structures. If no `GRANULARITY` was specified for
-ANN indexes, the default value is 100 million.
+sub-indexes, up to the point where a column (or a column's data part) has only a single sub-index. In that case, the sub-index has a
+"global" view of all column rows and can directly return all granules of the column (part) with relevant rows (there are at most `LIMIT
+<N>`-many such granules). In a second step, ClickHouse will load these granules and identify the actually best rows by performing a
+brute-force distance calculation over all rows of the granules. With a small `GRANULARITY` value, each of the sub-indexes returns up to
+`LIMIT N`-many granules. As a result, more granules need to be loaded and post-filtered. Note that the search accuracy is with both cases
+equally good, only the processing performance differs. It is generally recommended to use a large `GRANULARITY` for ANN indexes and fall
+back to a smaller `GRANULARITY` values only in case of problems like excessive memory consumption of the ANN structures. If no `GRANULARITY`
+was specified for ANN indexes, the default value is 100 million.
 # Available ANN Indexes
@@ -204,7 +204,7 @@ values mean more accurate results at the cost of longer query runtime:
 ``` sql
 SELECT *
-FROM table_name [WHERE ...]
+FROM table_name
 ORDER BY L2Distance(vectors, Point)
 LIMIT N
 SETTINGS annoy_index_search_k_nodes=100
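
The `GRANULARITY` paragraph in the documentation diff above argues that a large `GRANULARITY` reduces the number of granules that must be loaded and post-filtered. The C++ sketch below makes that arithmetic concrete. It is a back-of-the-envelope model, not ClickHouse code: the helper names `numSubIndexes` and `maxGranulesToLoad` are hypothetical, each sub-index is assumed to cover exactly `GRANULARITY` granules (the last one possibly fewer), and each sub-index is assumed to return at most `LIMIT N` granules, as the text describes.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helpers for illustration only, not ClickHouse APIs.
// Precondition: index_granularity > 0.

// Number of ANN sub-indexes built for a part with `part_granules` granules
// when each sub-index covers `index_granularity` granules. Ceiling division:
// the last sub-index may cover fewer granules.
constexpr std::size_t numSubIndexes(std::size_t part_granules, std::size_t index_granularity)
{
    return (part_granules + index_granularity - 1) / index_granularity;
}

// Upper bound on the granules loaded for the brute-force second step:
// each sub-index returns at most `limit_n` granules, and no more granules
// than the part contains can be loaded.
constexpr std::size_t maxGranulesToLoad(std::size_t part_granules, std::size_t index_granularity, std::size_t limit_n)
{
    return std::min(part_granules, numSubIndexes(part_granules, index_granularity) * limit_n);
}
```

Under these assumptions, a part of 1,000 granules queried with `LIMIT 10` gets a single sub-index with the default `GRANULARITY` of 100 million and loads at most 10 granules, whereas `GRANULARITY 100` yields 10 sub-indexes and up to 100 granules, and `GRANULARITY 1` may require loading all 1,000. The search accuracy is the same in each case; only the amount of post-filtering work changes.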

@@ -12,6 +12,9 @@ class ASTFunction;
 class ASTIndexDeclaration : public IAST
 {
 public:
+    static const auto DEFAULT_INDEX_GRANULARITY = 1uz;
+    static const auto DEFAULT_ANNOY_INDEX_GRANULARITY = 100'000'000uz;
     String name;
     IAST * expr;
     ASTFunction * type;

@@ -52,9 +52,9 @@ bool ParserCreateIndexDeclaration::parseImpl(Pos & pos, ASTPtr & node, Expected
     else
     {
         if (index->type->name == "annoy")
-            index->granularity = 100'000'000;
+            index->granularity = ASTIndexDeclaration::DEFAULT_ANNOY_INDEX_GRANULARITY;
         else
-            index->granularity = 1;
+            index->granularity = ASTIndexDeclaration::DEFAULT_INDEX_GRANULARITY;
     }
     node = index;

@@ -147,9 +147,9 @@ bool ParserIndexDeclaration::parseImpl(Pos & pos, ASTPtr & node, Expected & expe
     else
     {
         if (index->type->name == "annoy")
-            index->granularity = 100'000'000;
+            index->granularity = ASTIndexDeclaration::DEFAULT_ANNOY_INDEX_GRANULARITY;
         else
-            index->granularity = 1;
+            index->granularity = ASTIndexDeclaration::DEFAULT_INDEX_GRANULARITY;
     }
     node = index;
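
The parser changes replace the magic numbers `100'000'000` and `1` with two named constants on `ASTIndexDeclaration`. The C++17 sketch below restates the resulting default-selection rule outside of ClickHouse. The names `kDefaultIndexGranularity`, `kDefaultAnnoyIndexGranularity`, and `defaultGranularityFor` are hypothetical stand-ins, not part of the ClickHouse code base. The sketch omits the C++23 `uz` (size_t) literal suffix used in the commit, and it returns the value instead of assigning it to `index->granularity` as the real parsers do.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-ins for DEFAULT_INDEX_GRANULARITY and
// DEFAULT_ANNOY_INDEX_GRANULARITY, written without the C++23 `uz` suffix.
constexpr std::size_t kDefaultIndexGranularity = 1;
constexpr std::size_t kDefaultAnnoyIndexGranularity = 100'000'000;

// Mirrors the fallback both parsers apply when an index declaration has no
// GRANULARITY clause: "annoy" indexes get the large default so that a part
// usually ends up with a single sub-index; every other index type gets 1.
constexpr std::size_t defaultGranularityFor(std::string_view index_type)
{
    return index_type == "annoy" ? kDefaultAnnoyIndexGranularity : kDefaultIndexGranularity;
}
```

For example, `defaultGranularityFor("annoy")` yields 100'000'000 and `defaultGranularityFor("minmax")` yields 1, matching the documented default of 100 million for ANN indexes. Keeping both values as named constants in one place means the two parsers cannot drift apart.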