Commit Graph

488 Commits

Author SHA1 Message Date
Robert Schulze
9fb4c23c06
Merge pull request #68678 from rschu1ze/usearch-2.14
Vector similarity index: make `bf16` the default quantization
2024-08-28 08:45:02 +00:00
Robert Schulze
c40c8b7adb
Enable bf16 + f64 quantization, make bf16 the default 2024-08-23 07:32:34 +00:00
leonkozlowski
e416a2b3d2 patch: fix reference to sorting key in primary key docs 2024-08-20 09:42:19 -04:00
Robert Schulze
38a2b0dcc7
Allow Array(Float64) as type of underlying column 2024-08-15 10:47:55 +00:00
Robert Schulze
6170a8663f
Bump usearch to 2.13.2 2024-08-14 08:04:00 +00:00
Robert Schulze
fb76cb90b1
Allow un-quoted skip index parameters
Previously, only this syntax to create a skip index worked:

   INDEX index_name column_name TYPE vector_similarity('hnsw', 'L2Distance')

Now, this syntax will work as well:

  INDEX index_name column_name TYPE vector_similarity(hnsw, L2Distance)
2024-08-12 15:32:25 +00:00
Robert Schulze
d2e79f0b92
Rework vector index parameters
USearch (similar to FAISS) allows specifying the distance function,
quantization, and various HNSW meta-parameters for index creation and
search. Some users wished for greater configurability, so let's expose
them.

Index creation now requires either
- 2 parameters (with the other 4 parameters taking on default values), or
- 6 parameters for full control

This commit also removes quantization `f64` (that would be upsampling).
2024-08-12 15:32:19 +00:00
Robert Schulze
785b6637fa
Rename index type "usearch" to "vector_similarity"
First, index type "vector_similarity" is more descriptive and
user-friendly than "usearch". Second, we should not expose the name of
the library doing the job (usearch). Of course, the docs will continue
to mention usearch (credit where credit is due).

Existing setting `allow_experimental_usearch_index` was marked obsolete.
A new setting `allow_experimental_vector_similarity_index` was added.
2024-08-12 15:30:45 +00:00
Robert Schulze
40bed3e20f
Remove support for WHERE-type queries
These kinds of vector similarity search queries are rather obscure and
rare in practice. They require the user to specify a maximum distance
which is not intuitive to obtain. Furthermore, these queries are not
natively supported in USearch, so the vector search index had to emulate
these queries.

Therefore, simplify the code base and restrict vector search to
ORDER-BY queries only.
2024-08-12 15:25:52 +00:00
Robert Schulze
218421c255
Remove Annoy indexes
Annoy indexes fell out of favor in the community, at least when it comes
to vector databases. Such indexes work okay-ish in low dimensions but
they suffer badly from the curse of dimensionality, which makes them
unsuitable for a high number of dimensions.

Now that Annoy is gone, issue (*) also disappears and we can drop
'no-ubsan', 'no-cpu-aarch64', and 'no-asan' from tests.

(*) spotify/annoy#456
2024-08-12 15:24:49 +00:00
Robert Schulze
d09c82ff76
Cosmetics II 2024-08-06 12:36:09 +00:00
JackyWoo
9036ce9725 Some fixups after merging 2024-07-04 15:38:33 +08:00
JackyWoo
0c5821e5b8 Merge branch 'master' into add_statistics_cmsketch
# Conflicts:
#	docs/en/engines/table-engines/mergetree-family/mergetree.md
#	src/Storages/Statistics/Statistics.cpp
#	src/Storages/Statistics/Statistics.h
#	src/Storages/Statistics/StatisticsTDigest.h
#	src/Storages/Statistics/StatisticsUniq.h
#	src/Storages/Statistics/TDigestStatistics.cpp
#	tests/queries/0_stateless/02864_statistics_uniq.sql
2024-07-04 10:25:53 +08:00
Robert Schulze
2cefa56f9b
Update docs 2024-07-03 10:13:15 +00:00
Robert Schulze
cc67efd789
Some fixups 2024-06-26 12:39:50 +00:00
Robert Schulze
d59a170144
Docs for MergeTree: Capitalized SETTINGS 2024-06-10 07:05:36 +00:00
Robert Schulze
cdd2957a31
Move MergeTree setting docs into MergeTree settings docs page 2024-06-09 19:09:33 +00:00
Han Fei
c04e7e64af Merge branch 'master' into hanfei/stats_uniq 2024-06-05 13:09:15 +02:00
Robert Schulze
46434f9040
Merge pull request #63578 from ElderlyPassionFruit/add-compression-sorts-optimization
Best-effort sorting to improve compressibility
2024-06-04 19:02:55 +00:00
Robert Schulze
d5ec72d2a3
Be more graceful to existing inverted indexes 2024-05-31 15:50:16 +00:00
Robert Schulze
335a0844f5
Cosmetics and docs 2024-05-27 09:41:29 +00:00
Han Fei
dc30cee58f refine docs 2024-05-25 18:02:06 +02:00
Han Fei
ee7ad460fd Merge branch 'master' into hanfei/stats_uniq 2024-05-25 18:01:21 +02:00
Shaun Struwig
f8fc1fa338
Merge branch 'master' into document_revision 2024-05-23 16:00:38 +02:00
Blargian
ce26c4f657 Review changes and replace … with ... 2024-05-23 13:54:45 +02:00
Han Fei
b8e7e99c91 Merge branch 'master' into hanfei/stats_uniq 2024-05-21 01:28:36 +02:00
Jordi Villar
16889ff032 Rollback doc example 2024-05-17 18:03:51 +02:00
Jordi Villar
acba6fd7a2 Fix typo 2024-05-17 17:57:24 +02:00
Jordi Villar
a2c040111c Improve ReplacingMergeTree is_deleted documentation 2024-05-17 17:47:35 +02:00
Han Fei
2e2d20717b refine docs 2024-05-17 17:37:16 +02:00
Han Fei
79bbe0b587 Merge branch 'master' into hanfei/stats_uniq 2024-05-14 18:16:52 +02:00
Justin de Guzman
849dd825c5
[Docs] Use ReplicatedMergeTree not ReplicatedReplacingMergeTree for data replication examples 2024-05-09 15:33:40 -07:00
Robert Schulze
0f87653fef
Rename "inverted" to "fulltext" 2024-05-06 16:59:36 +00:00
Robert Schulze
b00c64fe9d
Docs: Remove tuple support from ANN indexes
Indexes for approximate nearest neighbor (ANN) search (Annoy,
USearch) can currently be built on columns of type Array(Float32) or
Tuple(Float32[, Float32[, ...]]). In practice, only Arrays are relevant,
which makes sense as arrays store high-dimensional embeddings
consecutively and the additional flexibility of different data types in
a tuple is not needed.

Therefore, removing support for ANN indexes over tuple columns to
simplify the code, tests and docs.
2024-05-06 14:18:30 +00:00
Robert Schulze
53c089722c
Docs: Add a note about the naming of the inverted index in earlier versions 2024-05-06 14:04:34 +00:00
Robert Schulze
0a01a92bf8
Merge pull request #62884 from rschu1ze/inverted-to-fulltext
Rename "inverted index" to "full-text index"
2024-04-30 20:03:35 +00:00
Julia Kartseva
8c99b0d5eb docs 2024-04-29 18:01:21 +00:00
HowePa
26b46816d4 fix settings link 2024-04-29 11:36:45 +08:00
alesapin
1b562ce569
Merge pull request #61769 from kirillgarbar/modify-engine
Search for convert_to_replicated flag at the correct path
2024-04-24 18:17:29 +00:00
Han Fei
42621b2060 Merge branch 'master' into hanfei/stats_uniq 2024-04-24 19:55:15 +02:00
Robert Schulze
56e32e0f99
Rename MergeTreeIndexInverted* to MergeTreeIndexFullText* 2024-04-24 07:20:08 +00:00
Raúl Marín
b90eb1962f Remove mentions of clean_deleted_rows from the documentation 2024-04-11 19:56:30 +02:00
Kirill
f2e0a3be1c
Fix typo in docs
Co-authored-by: Alexey Milovidov <milovidov@clickhouse.com>
2024-03-25 01:31:59 +03:00
Mark Needham
a85886c2e0 AggregatingMergeTree: Split table creation and MV definition + add more to example 2024-03-22 16:26:43 +00:00
Кирилл Гарбар
acb623f802 Mention problem in docs 2024-03-22 16:11:23 +03:00
Han Fei
b2ceeba0f2 Merge branch 'master' into hanfei/stats_uniq 2024-03-19 23:47:08 +01:00
kssenii
b538e94acf Merge remote-tracking branch 'origin/master' into add-documentation-for-disks-configuration 2024-03-14 14:06:12 +01:00
Smita Kulkarni
a168a84624 Merge branch 'master' into Fix_endpoint_for_azureblobstorage 2024-02-29 10:55:31 +01:00
Smita Kulkarni
5d68c9f046 Updated default value of endpoint_contains_account_name to true 2024-02-29 09:38:13 +01:00
Alexander Tokmakov
25c82a5949
Merge pull request #57798 from kirillgarbar/modify-engine
Convert MergeTree tables to replicated on server restart if flag is set
2024-02-28 21:55:15 +01:00