ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-11-26 09:32:01 +00:00

Author	SHA1	Message	Date
Robert Schulze	fb76cb90b1	Allow un-quoted skip index parameters Previously, only this syntax to create a skip index worked: INDEX index_name column_name TYPE vector_similarity('hnsw', 'L2Distance') Now, this syntax will work as well: INDEX index_name column_name TYPE vector_similarity(hnsw, L2Distance)	2024-08-12 15:32:25 +00:00
Robert Schulze	d2e79f0b92	Rework vector index parameters USearch (similar to FAISS) allows specifying the distance function, quantization, and various HNSW meta-parameters for index creation and search. Some users wished for greater configurability, so let's expose them. Index creation now requires either - 2 parameters (with the other 4 parameters taking on default values), or - 6 parameters for full control. This commit also removes quantization `f64` (that would be upsampling).	2024-08-12 15:32:19 +00:00
Robert Schulze	cc5c64e1ed	Add migration helper for legacy 'annoy' and 'usearch' index types Index types 'annoy' and 'usearch' were removed and replaced by 'vector_similarity' indexes in an earlier commit. Unfortunately, this means that if customers have tables with these indexes and upgrade, their database might not start anymore - the system loads the metadata at startup, thinks something is wrong with such tables, and halts immediately. This commit adds support for loading and attaching such indexes back. Inserting data into or searching such an index returns an error which recommends a migration to 'vector_similarity' indexes. The implementation is generally similar to what has recently been implemented for 'full_text' indexes [1, 2]. [1] https://github.com/ClickHouse/ClickHouse/pull/64656 [2] https://github.com/ClickHouse/ClickHouse/pull/64846	2024-08-12 15:31:27 +00:00
Robert Schulze	785b6637fa	Rename index type "usearch" to "vector_similarity" First, index type "vector_similarity" is more descriptive and user-friendly than "usearch". Second, we should not expose the name of the library doing the job (usearch). Of course, the docs will continue to mention usearch (credit where credit is due). Existing setting `allow_experimental_usearch_index` was marked obsolete. A new setting `allow_experimental_vector_similarity_index` was added.	2024-08-12 15:30:45 +00:00
Robert Schulze	021fad920e	Cosmetics: minor stuff	2024-08-12 15:30:41 +00:00
Robert Schulze	2aa037985b	Cosmetics: simplify inheritance hierarchy	2024-08-12 15:30:38 +00:00
Robert Schulze	901906159d	Cosmetics: ApproximateNearestNeighborInformation --> Info + nest in class	2024-08-12 15:30:35 +00:00
Robert Schulze	6170aad43e	Cosmetics: ApproximateNearestNeighborIndexesCommon --> VectorSimilarityCondition	2024-08-12 15:30:30 +00:00
Robert Schulze	e20eff635e	Cosmetics: variable naming	2024-08-12 15:30:27 +00:00
Robert Schulze	1bf320a1a8	Cosmetics: metric --> distance_function (for consistent terminology)	2024-08-12 15:30:24 +00:00
Robert Schulze	3f47b42d71	Remove funny typedef	2024-08-12 15:30:21 +00:00
Robert Schulze	fb26a9e6d4	Cosmetics: whitespaces	2024-08-12 15:30:18 +00:00
Robert Schulze	0f1765a273	Cosmetics: function naming	2024-08-12 15:30:14 +00:00
Robert Schulze	a8167abca2	Cosmetics: use native types/functions	2024-08-12 15:30:10 +00:00
Robert Schulze	9ad890e399	Cosmetics: whitespaces	2024-08-12 15:30:07 +00:00
Robert Schulze	27a6931a35	Cosmetics: variable naming	2024-08-12 15:29:59 +00:00
Robert Schulze	289c27c804	Introduce version for index files in persistence	2024-08-12 15:29:02 +00:00
Robert Schulze	4ad624cb7e	Cosmetics	2024-08-12 15:28:58 +00:00
Robert Schulze	74de79e52b	Add logging of basic statistics	2024-08-12 15:28:46 +00:00
Kruglov Pavel	bba4a90a9c	Merge branch 'master' into better-dynamic	2024-08-12 17:28:09 +02:00
Robert Schulze	8853b3359b	Remove useless templatization Makes the code cleaner, compile faster, and the binary smaller.	2024-08-12 15:27:06 +00:00
Nikita Taranov	57a614857c	address review comments	2024-08-12 16:27:01 +01:00
Robert Schulze	4f23f7754b	Cosmetics	2024-08-12 15:26:05 +00:00
Robert Schulze	7f611681df	Add a similar sanity check as in other skipping indexes	2024-08-12 15:26:01 +00:00
Robert Schulze	f944ef25bb	Better handling of errors during add, search, and save	2024-08-12 15:25:58 +00:00
Robert Schulze	e7c2bf49c3	Add detach/attach test	2024-08-12 15:25:55 +00:00
Robert Schulze	40bed3e20f	Remove support for WHERE-type queries These kinds of vector similarity search queries are rather obscure and rare in practice. They require the user to specify a maximum distance which is not intuitive to obtain. Furthermore, these queries are not natively supported in USearch, so the vector search index had to emulate these queries. Therefore simplifying the code base and restricting vector search to ORDER-BY queries only.	2024-08-12 15:25:52 +00:00
Robert Schulze	abb8e61981	Remove support code for Lp norm in vector search It is a generalization of other norms, too expensive to calculate and not relevant in practice. Also, Usearch doesn't support it.	2024-08-12 15:25:48 +00:00
Robert Schulze	65186f0b69	Remove tuple support Indexes for approximate nearest neighbor (ANN) search (USearch) can be built on columns of type Array(Float32) or Tuple(Float32[, Float32[, ...]]). In practice, Array(Float32) is the only relevant data type. Arrays store high-dimensional embeddings consecutively (--> cache locality) and the additional flexibility of different data types in a tuple is not needed for vector search. Therefore removing support for ANN indexes over tuple columns to simplify the code, tests and docs.	2024-08-12 15:25:39 +00:00
Robert Schulze	218421c255	Remove Annoy indexes Annoy indexes fell out of favor in the community, at least when it comes to vector databases. Such indexes work okay-ish in low dimensions but they suffer badly from the curse of dimensionality, which makes them unsuitable for a high number of dimensions. Now that Annoy is gone, issue () also disappears and we can drop 'no-ubsan', 'no-cpu-aarch64', and 'no-asan' from tests. () spotify/annoy#456	2024-08-12 15:24:49 +00:00
Robert Schulze	7c41939921	Fix test results (no analyzer support yet ...)	2024-08-12 15:24:22 +00:00
Robert Schulze	d7211f9d12	Fix CMake integration of usearch and annoy Registers usearch and annoy properly via configure_config.cmake and config.h.in like all other 3rd party libs, instead of (mis)using target_compile_definitions.	2024-08-12 15:24:18 +00:00
Robert Schulze	a39b9cf643	Un-screw usearch's build description No directory 'SimSIMD-map' exists, the build only worked because SimSIMD support in usearch was (accidentally?) disabled. This commit corrects the build description. SimSIMD support in usearch will be enabled by a later commit.	2024-08-12 15:24:14 +00:00
János Benjamin Antal	897b8d5a88	Try to give more chances to `node2` to steal some work	2024-08-12 15:21:01 +00:00
divanik	eb3ffb7184	Add supportsReplication	2024-08-12 15:09:16 +00:00
Robert Schulze	85f63b056b	Merge pull request #68135 from ClickHouse/refactor-field-get Only use Field::safeGet - Field::get prone to type punning	2024-08-12 14:25:11 +00:00
jsc0218	a9d19c7aca	Merge remote-tracking branch 'origin/master' into LWDRebuildProj	2024-08-12 14:06:25 +00:00
Pablo Marcos	da5b9582a9	Fix indent	2024-08-12 13:54:17 +00:00
avogar	8522776c33	Remove unused code	2024-08-12 13:49:01 +00:00
János Benjamin Antal	6cde029ed9	Fix style	2024-08-12 13:48:44 +00:00
avogar	9834457c26	Fix copying arguments	2024-08-12 13:48:18 +00:00
avogar	a329456146	Fix review comments	2024-08-12 13:47:10 +00:00
Peter Nguyen	c817a4e8ad	Update settings.md to clarify create_if_not_exists behavior Co-authored-by: Nikita Taranov <nickita.taranov@gmail.com>	2024-08-12 07:45:51 -06:00
Pablo Marcos	f7c6eabb49	Small fix to filter by current_database in system.query_log	2024-08-12 13:44:05 +00:00
Peter Nguyen	83d20bee00	Update 03221_create_if_not_exists_setting test to a .sql test	2024-08-12 07:42:55 -06:00
János Benjamin Antal	34643ee16c	Run test only from modified files	2024-08-12 13:30:25 +00:00
Nikita Taranov	633b15d7a4	Merge branch 'master' into cpu_cgroup_aware	2024-08-12 14:18:36 +01:00
Robert Schulze	037a1006fd	Merge remote-tracking branch 'ClickHouse/master' into ci-fuzzer-enable	2024-08-12 12:28:32 +00:00
Anton Popov	3172bf8d76	better accounting of time for merge of projections	2024-08-12 12:23:32 +00:00
Nikita Taranov	2f546fb513	Merge pull request #68098 from aiven-sal/aiven-sal/segfault Fix UB in hopEnd, hopStart, tumbleEnd, and tumbleStart	2024-08-12 12:09:23 +00:00

151178 Commits