Docs: Remove tuple support from ANN indexes

Indexes for approximate nearest neighbour (ANN) search (Annoy, USearch) can currently be built on columns of type Array(Float32) or Tuple(Float32[, Float32[, ...]]). In practice, only arrays are relevant, which makes sense: arrays store high-dimensional embeddings contiguously, and the additional flexibility of mixing data types in a tuple is not needed. Therefore, remove support for ANN indexes over tuple columns to simplify the code, tests and docs.
2024-09-20 00:30:49 +00:00 · 2024-05-06 14:18:30 +00:00 · 2024-05-06 14:18:30 +00:00 · b00c64fe9d
commit b00c64fe9d
parent b64ad9ac28
1 changed file with 7 additions and 48 deletions
--- a/docs/en/engines/table-engines/mergetree-family/annindexes.md
+++ b/docs/en/engines/table-engines/mergetree-family/annindexes.md
@@ -22,9 +22,8 @@ ORDER BY Distance(vectors, Point)
 LIMIT N
 ```

-`vectors` contains N-dimensional values of type [Array](../../../sql-reference/data-types/array.md) or
-[Tuple](../../../sql-reference/data-types/tuple.md), for example embeddings. Function `Distance` computes the distance between two vectors.
-Often, the Euclidean (L2) distance is chosen as distance function but [other
+`vectors` contains N-dimensional values of type [Array(Float32)](../../../sql-reference/data-types/array.md), for example embeddings.
+Function `Distance` computes the distance between two vectors. Often, the Euclidean (L2) distance is chosen as distance function but [other
 distance functions](/docs/en/sql-reference/functions/distance-functions.md) are also possible. `Point` is the reference point, e.g. `(0.17,
 0.33, ...)`, and `N` limits the number of search results.

@@ -47,7 +46,7 @@ of the search space (using clustering, search trees, etc.) which allows to compu

 # Creating and Using ANN Indexes {#creating_using_ann_indexes}

-Syntax to create an ANN index over an [Array](../../../sql-reference/data-types/array.md) column:
+Syntax to create an ANN index over an [Array(Float32)](../../../sql-reference/data-types/array.md) column:

 ```sql
 CREATE TABLE table_with_ann_index
@@ -60,19 +59,6 @@ ENGINE = MergeTree
 ORDER BY id;
 ```

-Syntax to create an ANN index over a [Tuple](../../../sql-reference/data-types/tuple.md) column:
-
-```sql
-CREATE TABLE table_with_ann_index
-(
-  `id` Int64,
-  `vectors` Tuple(Float32[, Float32[, ...]]),
-  INDEX [ann_index_name] vectors TYPE [ann_index_type]([ann_index_parameters]) [GRANULARITY [N]]
-)
-ENGINE = MergeTree
-ORDER BY id;
-```
-
 ANN indexes are built during column insertion and merge. As a result, `INSERT` and `OPTIMIZE` statements will be slower than for ordinary
 tables. ANNIndexes are ideally used only with immutable or rarely changed data, respectively when are far more read requests than write
 requests.
@@ -164,7 +150,7 @@ linear surfaces (lines in 2D, planes in 3D etc.).
  </iframe>
 </div>

-Syntax to create an Annoy index over an [Array](../../../sql-reference/data-types/array.md) column:
+Syntax to create an Annoy index over an [Array(Float32)](../../../sql-reference/data-types/array.md) column:

 ```sql
 CREATE TABLE table_with_annoy_index
@@ -177,19 +163,6 @@ ENGINE = MergeTree
 ORDER BY id;
 ```

-Syntax to create an ANN index over a [Tuple](../../../sql-reference/data-types/tuple.md) column:
-
-```sql
-CREATE TABLE table_with_annoy_index
-(
-  id Int64,
-  vectors Tuple(Float32[, Float32[, ...]]),
-  INDEX [ann_index_name] vectors TYPE annoy([Distance[, NumTrees]]) [GRANULARITY N]
-)
-ENGINE = MergeTree
-ORDER BY id;
-```
-
 Annoy currently supports two distance functions:
 - `L2Distance`, also called Euclidean distance, is the length of a line segment between two points in Euclidean space
  ([Wikipedia](https://en.wikipedia.org/wiki/Euclidean_distance)).
@@ -203,10 +176,9 @@ Parameter `NumTrees` is the number of trees which the algorithm creates (default
 more accurate search results but slower index creation / query times (approximately linearly) as well as larger index sizes.

 :::note
-Indexes over columns of type `Array` will generally work faster than indexes on `Tuple` columns. All arrays must have same length. To avoid
-errors, you can use a [CONSTRAINT](/docs/en/sql-reference/statements/create/table.md#constraints), for example, `CONSTRAINT
-constraint_name_1 CHECK length(vectors) = 256`. Also, empty `Arrays` and unspecified `Array` values in INSERT statements (i.e. default
-values) are not supported.
+All arrays must have same length. To avoid errors, you can use a
+[CONSTRAINT](/docs/en/sql-reference/statements/create/table.md#constraints), for example, `CONSTRAINT constraint_name_1 CHECK
+length(vectors) = 256`. Also, empty `Arrays` and unspecified `Array` values in INSERT statements (i.e. default values) are not supported.
 :::

 The creation of Annoy indexes (whenever a new part is build, e.g. at the end of a merge) is a relatively slow process. You can increase
@@ -264,19 +236,6 @@ ENGINE = MergeTree
 ORDER BY id;
 ```

-Syntax to create an ANN index over a [Tuple](../../../sql-reference/data-types/tuple.md) column:
-
-```sql
-CREATE TABLE table_with_usearch_index
-(
-  id Int64,
-  vectors Tuple(Float32[, Float32[, ...]]),
-  INDEX [ann_index_name] vectors TYPE usearch([Distance[, ScalarKind]]) [GRANULARITY N]
-)
-ENGINE = MergeTree
-ORDER BY id;
-```
-
 USearch currently supports two distance functions:
 - `L2Distance`, also called Euclidean distance, is the length of a line segment between two points in Euclidean space
  ([Wikipedia](https://en.wikipedia.org/wiki/Euclidean_distance)).