Commit Graph

150670 Commits

Author SHA1 Message Date
Robert Schulze
d03b354550
Merge pull request #67964 from rschu1ze/multiquery-followup-new2
Remove obsolete `--multiquery` parameter (follow-up to #63898), pt. III
2024-08-12 18:42:53 +00:00
Shaun Struwig
eab8594570
Update aspell-dict.txt 2024-08-12 20:35:33 +02:00
Shaun Struwig
aa7a2bcb02
Fix typo 2024-08-12 20:34:02 +02:00
Robert Schulze
c22265b889
Some fixups 2024-08-12 17:45:38 +00:00
Kruglov Pavel
ba85cc8d59
Merge pull request #67043 from Avogar/improve-squashing
Improve columns squashing for String/Array/Map/Variant/Dynamic types
2024-08-12 17:14:15 +00:00
Antonio Andelic
ea9b7d4c27
Merge pull request #67998 from ClickHouse/minio-audit-logs
Collect minio audit logs in stateless tests
2024-08-12 17:03:17 +00:00
Alexey Milovidov
74fb08198c Merge branch 'master' into fix-68177 2024-08-12 18:39:06 +02:00
Alexey Milovidov
58ed71bf11
Merge pull request #68181 from ClickHouse/fix-leftovers
Fix leftovers
2024-08-12 16:22:17 +00:00
Alexey Milovidov
c95a40cdf0
Merge pull request #68182 from ClickHouse/fix-transactions
Fix test `01172_transaction_counters`
2024-08-12 16:20:26 +00:00
Pablo Marcos
858b7e55d0 Improve condition in case the default column consumes slightly more memory
It never happened in the few hundred tests I ran successfully,
but we'd rather be safe than sorry.
2024-08-12 16:16:57 +00:00
Robert Schulze
fe537045c9
Merge remote-tracking branch 'ClickHouse/master' into query_cache_tag 2024-08-12 16:16:32 +00:00
Yarik Briukhovetskyi
3a6e05eb43
try to fix includes 2024-08-12 18:03:42 +02:00
Yarik Briukhovetskyi
ea1cd66575
fix tidy 2024-08-12 17:32:43 +02:00
Robert Schulze
fb76cb90b1
Allow un-quoted skip index parameters
Previously, only this syntax to create a skip index worked:

   INDEX index_name column_name TYPE vector_similarity('hnsw', 'L2Distance')

Now, this syntax will work as well:

   INDEX index_name column_name TYPE vector_similarity(hnsw, L2Distance)
2024-08-12 15:32:25 +00:00
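
The equivalence of the two spellings above can be illustrated with a small sketch. This is not ClickHouse's parser; `parse_index_type` is a hypothetical helper written only to show that dropping one pair of surrounding single quotes makes both forms yield the same arguments.

```python
import re


def parse_index_type(type_expr: str) -> tuple[str, list[str]]:
    """Split 'name(arg1, arg2)' into the name and its argument list.

    Hypothetical helper for illustration only, not ClickHouse code.
    """
    match = re.fullmatch(r"\s*(\w+)\s*\((.*)\)\s*", type_expr)
    if match is None:
        raise ValueError(f"not an index type expression: {type_expr!r}")
    name, raw_args = match.groups()
    args = []
    for raw in raw_args.split(","):
        arg = raw.strip()
        # Treat 'hnsw' and hnsw alike by dropping one pair of single quotes.
        if len(arg) >= 2 and arg[0] == "'" and arg[-1] == "'":
            arg = arg[1:-1]
        args.append(arg)
    return name, args


quoted = parse_index_type("vector_similarity('hnsw', 'L2Distance')")
unquoted = parse_index_type("vector_similarity(hnsw, L2Distance)")
```

Both calls return the same name and argument list, which is the behavior the commit message describes for the two index definitions.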
Robert Schulze
d2e79f0b92
Rework vector index parameters
USearch (similar to FAISS) allows specifying the distance function,
quantization, and various HNSW meta-parameters for index creation and
search. Some users wished for greater configurability, so let's expose
them.

Index creation now requires either
- 2 parameters (with the other 4 parameters taking on default values), or
- 6 parameters for full control

This commit also removes quantization `f64` (that would be upsampling).
2024-08-12 15:32:19 +00:00
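
The "2 or 6 parameters" rule above can be sketched as a simple arity check. This is an illustration under assumptions, not the actual implementation: `resolve_index_params` is a hypothetical name, and the four default values are placeholders because the commit message does not state them.

```python
# Placeholders only: the commit message does not name the four optional
# parameters or their defaults, so none are assumed here.
PLACEHOLDER_DEFAULTS = ("<default 3>", "<default 4>", "<default 5>", "<default 6>")


def resolve_index_params(params):
    """Return a full 6-tuple of index parameters (hypothetical helper).

    Accepts either 2 parameters (the rest take defaults) or all 6.
    """
    if len(params) == 2:
        return tuple(params) + PLACEHOLDER_DEFAULTS
    if len(params) == 6:
        return tuple(params)
    raise ValueError(f"expected 2 or 6 index parameters, got {len(params)}")


short_form = resolve_index_params(["hnsw", "L2Distance"])
```

Any other parameter count, such as 3 or 5, is rejected rather than partially filled in.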
Robert Schulze
cc5c64e1ed
Add migration helper for legacy 'annoy' and 'usearch' indexes types
Index types 'annoy' and 'usearch' were removed and replaced by
'vector_similarity' indexes in an earlier commit.

This means, unfortunately, that if customers have tables with these
indexes and upgrade, their database might not start anymore - the
system loads the metadata at startup, thinks something is wrong with
such tables, and halts immediately.

This commit adds support for loading and attaching such indexes back.
Inserting data into or searching such an index returns an error that
recommends a migration to 'vector_similarity' indexes. The implementation
is generally similar to what has recently been implemented for
'full_text' indexes [1, 2].

[1] https://github.com/ClickHouse/ClickHouse/pull/64656
[2] https://github.com/ClickHouse/ClickHouse/pull/64846
2024-08-12 15:31:27 +00:00
Robert Schulze
785b6637fa
Rename index type "usearch" to "vector_similarity"
First, index type "vector_similarity" is more descriptive and user-friendly
than "usearch". Second, we should not expose the name of the library
doing the job (usearch). Of course, the docs will continue to mention
usearch (credit where credit is due).

Existing setting `allow_experimental_usearch_index` was marked obsolete.
A new setting `allow_experimental_vector_similarity_index` was added.
2024-08-12 15:30:45 +00:00
Robert Schulze
021fad920e
Cosmetics: minor stuff 2024-08-12 15:30:41 +00:00
Robert Schulze
2aa037985b
Cosmetics: simplify inheritance hierarchy 2024-08-12 15:30:38 +00:00
Robert Schulze
901906159d
Cosmetics: ApproximateNearestNeighborInformation --> Info + nest in class 2024-08-12 15:30:35 +00:00
Robert Schulze
6170aad43e
Cosmetics: ApproximateNearestNeighborIndexesCommon --> VectorSimilarityCondition 2024-08-12 15:30:30 +00:00
Robert Schulze
e20eff635e
Cosmetics: variable naming 2024-08-12 15:30:27 +00:00
Robert Schulze
1bf320a1a8
Cosmetics: metric --> distance_function (for consistent terminology) 2024-08-12 15:30:24 +00:00
Robert Schulze
3f47b42d71
Remove funny typedef 2024-08-12 15:30:21 +00:00
Robert Schulze
fb26a9e6d4
Cosmetics: whitespaces 2024-08-12 15:30:18 +00:00
Robert Schulze
0f1765a273
Cosmetics: function naming 2024-08-12 15:30:14 +00:00
Robert Schulze
a8167abca2
Cosmetics: use native types/functions 2024-08-12 15:30:10 +00:00
Robert Schulze
9ad890e399
Cosmetics: whitespaces 2024-08-12 15:30:07 +00:00
Robert Schulze
27a6931a35
Cosmetics: variable naming 2024-08-12 15:29:59 +00:00
Robert Schulze
289c27c804
Introduce version for index files in persistence 2024-08-12 15:29:02 +00:00
Robert Schulze
4ad624cb7e
Cosmetics 2024-08-12 15:28:58 +00:00
Robert Schulze
74de79e52b
Add logging of basic statistics 2024-08-12 15:28:46 +00:00
Robert Schulze
8853b3359b
Remove useless templatization
Makes the code cleaner, compile faster, and the binary smaller.
2024-08-12 15:27:06 +00:00
Robert Schulze
4f23f7754b
Cosmetics 2024-08-12 15:26:05 +00:00
Robert Schulze
7f611681df
Add a similar sanity check as in other skipping indexes 2024-08-12 15:26:01 +00:00
Robert Schulze
f944ef25bb
Better handling of errors during add, search, and save 2024-08-12 15:25:58 +00:00
Robert Schulze
e7c2bf49c3
Add detach/attach test 2024-08-12 15:25:55 +00:00
Robert Schulze
40bed3e20f
Remove support for WHERE-type queries
This kind of vector similarity search query is rather obscure and
rare in practice. It requires the user to specify a maximum distance,
which is not intuitive to obtain. Furthermore, these queries are not
natively supported in USearch, so the vector search index had to emulate
them.

Therefore, simplify the code base and restrict vector search to
ORDER-BY queries only.
2024-08-12 15:25:52 +00:00
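
The difference between the two query shapes discussed above can be sketched with an exact, brute-force search. This is an illustration only: it does not use an approximate index, and the names `top_k` and `within_distance` are hypothetical. It assumes the WHERE-type query filters on a distance below a user-supplied maximum, while the ORDER-BY type returns the k nearest vectors.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(vectors, reference, k):
    """ORDER-BY style: the k vectors closest to the reference."""
    return sorted(vectors, key=lambda v: l2_distance(v, reference))[:k]


def within_distance(vectors, reference, max_distance):
    """WHERE style: all vectors closer than a user-chosen maximum distance."""
    return [v for v in vectors if l2_distance(v, reference) < max_distance]


vectors = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
reference = [0.0, 0.0]
nearest_two = top_k(vectors, reference, 2)
close_ones = within_distance(vectors, reference, 2.0)
```

The ORDER-BY form only needs a result count, whereas the WHERE form depends on picking a sensible `max_distance` for the data, which is the usability problem the commit message points out.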
Robert Schulze
abb8e61981
Remove support code for Lp norm in vector search
It is a generalization of other norms, too expensive to calculate and
not relevant in practice. Also, USearch doesn't support it.
2024-08-12 15:25:48 +00:00
Robert Schulze
65186f0b69
Remove tuple support
Indexes for approximate nearest neighbor (ANN) search (USearch) can
be built on columns of type Array(Float32) or Tuple(Float32[, Float32[, ...]]).
In practice, Array(Float32) is the only relevant data type.
Arrays store high-dimensional embeddings consecutively (--> cache
locality) and the additional flexibility of different data types in a
tuple is not needed for vector search.

Therefore removing support for ANN indexes over tuple columns to
simplify the code, tests and docs.
2024-08-12 15:25:39 +00:00
Robert Schulze
218421c255
Remove Annoy indexes
Annoy indexes fell out of favor in the community, at least when it comes
to vector databases. Such indexes work okay-ish in low dimensions but they
suffer badly from the curse of dimensionality, which makes them unsuitable
for a high number of dimensions.

Now that Annoy is gone, issue (*) also disappears and we can drop
'no-ubsan', 'no-cpu-aarch64', and 'no-asan' from tests.

(*) spotify/annoy#456
2024-08-12 15:24:49 +00:00
Robert Schulze
7c41939921
Fix test results (no analyzer support yet ...) 2024-08-12 15:24:22 +00:00
Robert Schulze
d7211f9d12
Fix CMake integration of usearch and annoy
Registers usearch and annoy properly via configure_config.cmake and
config.h.in like all other 3rd party libs, instead of (mis)using
target_compile_definitions.
2024-08-12 15:24:18 +00:00
Robert Schulze
a39b9cf643
Un-screw usearch's build description
No directory 'SimSIMD-map' exists; the build only worked because SimSIMD
support in usearch was (accidentally?) disabled. This commit corrects
the build description. SimSIMD support in usearch will be enabled by a
later commit.
2024-08-12 15:24:14 +00:00
divanik
eb3ffb7184 Add supportsReplication 2024-08-12 15:09:16 +00:00
Robert Schulze
85f63b056b
Merge pull request #68135 from ClickHouse/refactor-field-get
Only use Field::safeGet - Field::get prone to type punning
2024-08-12 14:25:11 +00:00
Pablo Marcos
da5b9582a9 Fix indent 2024-08-12 13:54:17 +00:00
János Benjamin Antal
6cde029ed9 Fix style 2024-08-12 13:48:44 +00:00
Pablo Marcos
f7c6eabb49 Small fix to filter by current_database in system.query_log 2024-08-12 13:44:05 +00:00
János Benjamin Antal
34643ee16c Run test only from modified files 2024-08-12 13:30:25 +00:00