Commit Graph

150516 Commits

Author SHA1 Message Date
Robert Schulze
2ffcc97af2
Merge pull request #63675 from rschu1ze/vector-search
Initial implementation of vector similarity index
2024-08-13 15:06:20 +00:00
Yakov Olkhovskiy
3e8a177622
Merge pull request #61908 from ClickHouse/ci-fuzzer-enable
CI: enable libfuzzer (fixing build and docker)
2024-08-13 14:22:09 +00:00
Yarik Briukhovetskyi
39c25663ae
Merge pull request #67879 from bigo-sg/opt_orc_writer
Avoid allocating unnecessary capacity for array column while writing orc & some minor refactors
2024-08-13 12:51:11 +00:00
Robert Schulze
99282e526a
Merge pull request #68235 from sakulali/query_cache_tag
QueryCache: Add tagging
2024-08-13 10:44:10 +00:00
Yarik Briukhovetskyi
086c0f03a6
Merge pull request #65997 from yariks5s/hive_style_partitioning
Implementing Hive-style partitioning
2024-08-13 10:04:21 +00:00
vdimir
dfb892ba5f
Merge pull request #66616 from Blargian/docs_getXYZ
add documentation for `getSubcolumn` and `getTypeSerializationStreams`
2024-08-13 09:35:44 +00:00
Pablo Marcos
d28e2d7546
Merge pull request #68203 from pamarcos/fix-test-01903_correct_block_size_prediction_with_default
[Green CI] Fix test 01903_correct_block_size_prediction_with_default
2024-08-13 08:11:35 +00:00
pufit
ae5223854f
Merge pull request #67653 from ClickHouse/pufit/inconsistent-formating-grant-current-grants
Fix inconsistent formatting for `GRANT CURRENT GRANTS`
2024-08-13 03:21:26 +00:00
Yakov Olkhovskiy
a517bc90cd
Update PULL_REQUEST_TEMPLATE.md 2024-08-12 21:42:47 -04:00
Alexey Milovidov
203857020f
Merge pull request #68178 from ClickHouse/fix-68177
Fix `test_cluster_all_replicas`
2024-08-12 21:52:13 +00:00
János Benjamin Antal
ac6826392d
Merge pull request #67554 from ClickHouse/fix-message-queue-sink-from-http-interface
Fix message queue sink from http interface
2024-08-12 21:29:14 +00:00
János Benjamin Antal
6eb4a71ad3
Merge pull request #68163 from azat/backups-processes
[RFC] Fix settings/current_database in system.processes for async BACKUP/RESTORE
2024-08-12 21:07:55 +00:00
János Benjamin Antal
eaa5715a02
Merge pull request #68200 from ClickHouse/remove-log-engine-from-kafka-integration-tests
Remove Log engine from Kafka integration tests
2024-08-12 20:44:53 +00:00
Han Fei
40382451a2
Merge pull request #68186 from rschu1ze/stats-tests-refactoring
Refactor tests for (experimental) statistics
2024-08-12 18:58:19 +00:00
Robert Schulze
45a14fa0ce
Fix spelling 2024-08-12 18:54:06 +00:00
Robert Schulze
d03b354550
Merge pull request #67964 from rschu1ze/multiquery-followup-new2
Remove obsolete `--multiquery` parameter (follow-up to #63898), pt. III
2024-08-12 18:42:53 +00:00
Shaun Struwig
eab8594570
Update aspell-dict.txt 2024-08-12 20:35:33 +02:00
Shaun Struwig
aa7a2bcb02
Fix typo 2024-08-12 20:34:02 +02:00
Robert Schulze
c22265b889
Some fixups 2024-08-12 17:45:38 +00:00
Kruglov Pavel
ba85cc8d59
Merge pull request #67043 from Avogar/improve-squashing
Improve columns squashing for String/Array/Map/Variant/Dynamic types
2024-08-12 17:14:15 +00:00
Antonio Andelic
ea9b7d4c27
Merge pull request #67998 from ClickHouse/minio-audit-logs
Collect minio audit logs in stateless tests
2024-08-12 17:03:17 +00:00
Alexey Milovidov
74fb08198c Merge branch 'master' into fix-68177 2024-08-12 18:39:06 +02:00
Alexey Milovidov
58ed71bf11
Merge pull request #68181 from ClickHouse/fix-leftovers
Fix leftovers
2024-08-12 16:22:17 +00:00
Alexey Milovidov
c95a40cdf0
Merge pull request #68182 from ClickHouse/fix-transactions
Fix test `01172_transaction_counters`
2024-08-12 16:20:26 +00:00
Pablo Marcos
858b7e55d0 Improve condition in case the default column consumes slightly more memory
It never happened in the few hundreds of tests I ran successfully,
but we'd rather be safe than sorry.
2024-08-12 16:16:57 +00:00
Robert Schulze
fe537045c9
Merge remote-tracking branch 'ClickHouse/master' into query_cache_tag 2024-08-12 16:16:32 +00:00
Yarik Briukhovetskyi
3a6e05eb43
try to fix includes 2024-08-12 18:03:42 +02:00
Yarik Briukhovetskyi
ea1cd66575
fix tidy 2024-08-12 17:32:43 +02:00
Robert Schulze
fb76cb90b1
Allow un-quoted skip index parameters
Previously, only this syntax to create a skip index worked:

   INDEX index_name column_name TYPE vector_similarity('hnsw', 'L2Distance')

Now, this syntax will work as well:

  INDEX index_name column_name TYPE vector_similarity(hnsw, L2Distance)
2024-08-12 15:32:25 +00:00
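A minimal sketch of the idea in the commit above, where quoted and un-quoted index parameters are treated as equivalent. This is not ClickHouse's parser; the function names `normalize_index_argument` and `parse_index_arguments` are hypothetical and exist only for illustration.

```python
from typing import List


def normalize_index_argument(token: str) -> str:
    """Return the bare value of a parameter, whether or not it was quoted.

    Hypothetical helper: strips surrounding whitespace and one pair of
    single quotes, if present.
    """
    token = token.strip()
    if len(token) >= 2 and token[0] == "'" and token[-1] == "'":
        return token[1:-1]
    return token


def parse_index_arguments(argument_list: str) -> List[str]:
    """Split a comma-separated argument list and normalize each entry."""
    return [normalize_index_argument(part) for part in argument_list.split(",")]


quoted = parse_index_arguments("'hnsw', 'L2Distance'")
unquoted = parse_index_arguments("hnsw, L2Distance")
```

Under this sketch both spellings produce the same list of parameter values, which is the behavior the commit message describes for the two `INDEX ... TYPE vector_similarity(...)` forms.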
Robert Schulze
d2e79f0b92
Rework vector index parameters
USearch (similar to FAISS) allows specifying the distance function,
quantization, and various HNSW meta-parameters for index creation and
search. Some users wished for greater configurability, so let's expose
them.

Index creation now requires either
- 2 parameters (with the other 4 parameters taking on default values), or
- 6 parameters for full control

This commit also removes quantization `f64` (that would be upsampling).
2024-08-12 15:32:19 +00:00
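The "2 or 6 parameters" rule from the commit above can be sketched as a simple arity check. This is an illustration only: the function name `uses_default_parameters` and the error text are assumptions, and the sketch does not name the four optional parameters because the commit message does not list them.

```python
# Hypothetical sketch; names are illustrative and not taken from ClickHouse.
ALLOWED_ARGUMENT_COUNTS = (2, 6)


def uses_default_parameters(arguments):
    """Validate the argument count and report whether defaults apply.

    Returns True for the 2-argument form (the other 4 parameters take
    default values) and False for the 6-argument form (full control).
    Any other count is rejected.
    """
    count = len(arguments)
    if count not in ALLOWED_ARGUMENT_COUNTS:
        raise ValueError(f"expected 2 or 6 index arguments, got {count}")
    return count == 2
```

For example, `uses_default_parameters(["hnsw", "L2Distance"])` accepts the short form, while a list of three arguments raises `ValueError`.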
Robert Schulze
cc5c64e1ed
Add migration helper for legacy 'annoy' and 'usearch' index types
Index types 'annoy' and 'usearch' were removed and replaced by
'vector_similarity' indexes in an earlier commit.

Unfortunately, this means that if customers have tables with these
indexes and upgrade, their database might not start anymore - the
system loads the metadata at startup, thinks something is wrong with
such tables, and halts immediately.

This commit adds support for loading and attaching such indexes back.
Inserting data or searching returns an error that recommends migrating
to 'vector_similarity' indexes. The implementation is generally similar
to what has recently been implemented for 'full_text' indexes [1, 2].

[1] https://github.com/ClickHouse/ClickHouse/pull/64656
[2] https://github.com/ClickHouse/ClickHouse/pull/64846
2024-08-12 15:31:27 +00:00
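The behavior described in the commit above (legacy indexes can still be loaded and attached, but inserts and searches fail with a migration hint) can be sketched roughly as follows. This is not the actual implementation; the class `LegacyIndexStub`, the exception `LegacyIndexError`, and the message wording are all hypothetical.

```python
# Hypothetical sketch of a placeholder for removed index types.
LEGACY_INDEX_TYPES = {"annoy", "usearch"}


class LegacyIndexError(Exception):
    """Raised when a removed index type is used for inserts or searches."""


class LegacyIndexStub:
    def __init__(self, index_type):
        if index_type not in LEGACY_INDEX_TYPES:
            raise ValueError(f"not a legacy index type: {index_type}")
        self.index_type = index_type

    def load_metadata(self):
        # Loading succeeds so that startup is not halted by old tables.
        return True

    def _reject(self, operation):
        raise LegacyIndexError(
            f"cannot {operation} with index type '{self.index_type}'; "
            "please migrate to a 'vector_similarity' index"
        )

    def insert(self, rows):
        self._reject("insert")

    def search(self, query):
        self._reject("search")
```

The design choice illustrated here is to defer the failure from startup to first use, so that an upgrade does not prevent the server from starting.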
Robert Schulze
785b6637fa
Rename index type "usearch" to "vector_similarity"
First, index type "vector_similarity" is more descriptive and user-friendly
than "usearch". Second, we should not expose the name of the library
doing the job (usearch). Of course, the docs will continue to mention
usearch (credit where credit is due).

Existing setting `allow_experimental_usearch_index` was marked obsolete.
A new setting `allow_experimental_vector_similarity_index` was added.
2024-08-12 15:30:45 +00:00
Robert Schulze
021fad920e
Cosmetics: minor stuff 2024-08-12 15:30:41 +00:00
Robert Schulze
2aa037985b
Cosmetics: simplify inheritance hierarchy 2024-08-12 15:30:38 +00:00
Robert Schulze
901906159d
Cosmetics: ApproximateNearestNeighborInformation --> Info + nest in class 2024-08-12 15:30:35 +00:00
Robert Schulze
6170aad43e
Cosmetics: ApproximateNearestNeighborIndexesCommon --> VectorSimilarityCondition 2024-08-12 15:30:30 +00:00
Robert Schulze
e20eff635e
Cosmetics: variable naming 2024-08-12 15:30:27 +00:00
Robert Schulze
1bf320a1a8
Cosmetics: metric --> distance_function (for consistent terminology) 2024-08-12 15:30:24 +00:00
Robert Schulze
3f47b42d71
Remove funny typedef 2024-08-12 15:30:21 +00:00
Robert Schulze
fb26a9e6d4
Cosmetics: whitespaces 2024-08-12 15:30:18 +00:00
Robert Schulze
0f1765a273
Cosmetics: function naming 2024-08-12 15:30:14 +00:00
Robert Schulze
a8167abca2
Cosmetics: use native types/functions 2024-08-12 15:30:10 +00:00
Robert Schulze
9ad890e399
Cosmetics: whitespaces 2024-08-12 15:30:07 +00:00
Robert Schulze
27a6931a35
Cosmetics: variable naming 2024-08-12 15:29:59 +00:00
Robert Schulze
289c27c804
Introduce version for index files in persistence 2024-08-12 15:29:02 +00:00
Robert Schulze
4ad624cb7e
Cosmetics 2024-08-12 15:28:58 +00:00
Robert Schulze
74de79e52b
Add logging of basic statistics 2024-08-12 15:28:46 +00:00
Robert Schulze
8853b3359b
Remove useless templatization
Makes the code cleaner, compile faster, and the binary smaller.
2024-08-12 15:27:06 +00:00
Robert Schulze
4f23f7754b
Cosmetics 2024-08-12 15:26:05 +00:00
Robert Schulze
7f611681df
Add a similar sanity check as in other skipping indexes 2024-08-12 15:26:01 +00:00