ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-11-12 18:45:20 +00:00

Author	SHA1	Message	Date
Robert Schulze	cc5c64e1ed	Add migration helper for legacy 'annoy' and 'usearch' index types Index types 'annoy' and 'usearch' were removed and replaced by 'vector_similarity' indexes in an earlier commit. Unfortunately, this means that if customers have tables with these indexes and upgrade, their database might not start anymore - the system loads the metadata at startup, thinks something is wrong with such tables, and halts immediately. This commit adds support for loading and attaching such indexes back. Data insert or use (search) return an error which recommends a migration to 'vector_similarity' indexes. The implementation is generally similar to what has recently been implemented for 'full_text' indexes [1, 2]. [1] https://github.com/ClickHouse/ClickHouse/pull/64656 [2] https://github.com/ClickHouse/ClickHouse/pull/64846	2024-08-12 15:31:27 +00:00
Robert Schulze	785b6637fa	Rename index type "usearch" to "vector_similarity" First, index type "vector_similarity" is more descriptive and user-friendly than "usearch". Second, we should not expose the name of the library doing the job (usearch). Of course, the docs will continue to mention usearch (credit where credit is due). Existing setting `allow_experimental_usearch_index` was marked obsolete. A new setting `allow_experimental_vector_similarity_index` was added.	2024-08-12 15:30:45 +00:00
Robert Schulze	021fad920e	Cosmetics: minor stuff	2024-08-12 15:30:41 +00:00
Robert Schulze	2aa037985b	Cosmetics: simplify inheritance hierarchy	2024-08-12 15:30:38 +00:00
Robert Schulze	901906159d	Cosmetics: ApproximateNearestNeighborInformation --> Info + nest in class	2024-08-12 15:30:35 +00:00
Robert Schulze	6170aad43e	Cosmetics: ApproximateNearestNeighborIndexesCommon --> VectorSimilarityCondition	2024-08-12 15:30:30 +00:00
Robert Schulze	e20eff635e	Cosmetics: variable naming	2024-08-12 15:30:27 +00:00
Robert Schulze	1bf320a1a8	Cosmetics: metric --> distance_function (for consistent terminology)	2024-08-12 15:30:24 +00:00
Robert Schulze	3f47b42d71	Remove funny typedef	2024-08-12 15:30:21 +00:00
Robert Schulze	fb26a9e6d4	Cosmetics: whitespaces	2024-08-12 15:30:18 +00:00
Robert Schulze	0f1765a273	Cosmetics: function naming	2024-08-12 15:30:14 +00:00
Robert Schulze	a8167abca2	Cosmetics: use native types/functions	2024-08-12 15:30:10 +00:00
Robert Schulze	9ad890e399	Cosmetics: whitespaces	2024-08-12 15:30:07 +00:00
Robert Schulze	27a6931a35	Cosmetics: variable naming	2024-08-12 15:29:59 +00:00
Robert Schulze	289c27c804	Introduce version for index files in persistence	2024-08-12 15:29:02 +00:00
Robert Schulze	4ad624cb7e	Cosmetics	2024-08-12 15:28:58 +00:00
Robert Schulze	74de79e52b	Add logging of basic statistics	2024-08-12 15:28:46 +00:00
Robert Schulze	8853b3359b	Remove useless templatization Makes the code cleaner, compile faster, and the binary smaller.	2024-08-12 15:27:06 +00:00
Robert Schulze	4f23f7754b	Cosmetics	2024-08-12 15:26:05 +00:00
Robert Schulze	7f611681df	Add a similar sanity check as in other skipping indexes	2024-08-12 15:26:01 +00:00
Robert Schulze	f944ef25bb	Better handling of errors during add, search, and save	2024-08-12 15:25:58 +00:00
Robert Schulze	e7c2bf49c3	Add detach/attach test	2024-08-12 15:25:55 +00:00
Robert Schulze	40bed3e20f	Remove support for WHERE-type queries These kinds of vector similarity search queries are rather obscure and rare in practice. They require the user to specify a maximum distance which is not intuitive to obtain. Furthermore, these queries are not natively supported in USearch, so the vector search index had to emulate these queries. Therefore simplifying the code base and restricting vector search to ORDER-BY queries only.	2024-08-12 15:25:52 +00:00
Robert Schulze	abb8e61981	Remove support code for Lp norm in vector search It is a generalization of other norms, too expensive to calculate and not relevant in practice. Also, Usearch doesn't support it.	2024-08-12 15:25:48 +00:00
Robert Schulze	65186f0b69	Remove tuple support Indexes for approximate nearest neighbor (ANN) search (USearch) can be built on columns of type Array(Float32) or Tuple(Float32[, Float32[, ...]]). In practice, Array(Float32) is the only relevant data type. Arrays store high-dimensional embeddings consecutively (--> cache locality) and the additional flexibility of different data types in a tuple is not needed for vector search. Therefore removing support for ANN indexes over tuple columns to simplify the code, tests and docs.	2024-08-12 15:25:39 +00:00
Robert Schulze	218421c255	Remove Annoy indexes Annoy indexes fell out of favor in the community, at least when it comes to vector databases. Such indexes work okay-ish in low dimensions but they suffer badly from the curse of dimensionality, which makes them unsuitable for a high number of dimensions. Now that Annoy is gone, issue () also disappears and we can drop 'no-ubsan', 'no-cpu-aarch64', and 'no-asan' from tests. () spotify/annoy#456	2024-08-12 15:24:49 +00:00
Robert Schulze	7c41939921	Fix test results (no analyzer support yet ...)	2024-08-12 15:24:22 +00:00
Robert Schulze	d7211f9d12	Fix CMake integration of usearch and annoy Registers usearch and annoy properly via configure_config.cmake and config.h.in like all other 3rd party libs, instead of (mis)using target_compile_definitions.	2024-08-12 15:24:18 +00:00
Robert Schulze	a39b9cf643	Un-screw usearch's build description No directory 'SimSIMD-map' exists, the build only worked because SimSIMD support in usearch was (accidentally?) disabled. This commit corrects the build description. SimSIMD support in usearch will be enabled by a later commit.	2024-08-12 15:24:14 +00:00
Robert Schulze	85f63b056b	Merge pull request #68135 from ClickHouse/refactor-field-get Only use Field::safeGet - Field::get prone to type punning	2024-08-12 14:25:11 +00:00
Nikita Taranov	2f546fb513	Merge pull request #68098 from aiven-sal/aiven-sal/segfault Fix UB in hopEnd, hopStart, tumbleEnd, and tumbleStart	2024-08-12 12:09:23 +00:00
Sema Checherinda	5e836bc20e	Merge pull request #67472 from ClickHouse/chesema-02765 speed up system flush logs	2024-08-12 11:51:55 +00:00
vdimir	52f37f2ec6	Merge pull request #67980 from ClickHouse/vdimir/fix_03130_convert_outer_join_to_inner_join Fix 03130_convert_outer_join_to_inner_join	2024-08-12 11:34:10 +00:00
Robert Schulze	0aa30b10d5	Merge pull request #68069 from rschu1ze/cmake-cleanup Minor CMake cleanup	2024-08-12 06:43:00 +00:00
Alexey Milovidov	2a12604cf5	Merge pull request #66494 from azat/gdb-image Update gdb to 15.1 (by compiling from sources)	2024-08-12 05:04:57 +00:00
Alexey Milovidov	94fe53de64	Merge branch 'master' into vdimir/fix_03130_convert_outer_join_to_inner_join	2024-08-12 07:03:34 +02:00
Alexey Milovidov	0b1887eb65	Merge pull request #68138 from jsc0218/Fix01710 Fix01710 Timeout	2024-08-12 04:07:52 +00:00
Alexey Milovidov	c1c8e6dd8d	Merge pull request #68099 from GrahamCampbell/patch-2 Do not apply redundant sorting removal when there's an offset	2024-08-11 23:59:15 +00:00
Alexey Milovidov	d18a68f285	Merge pull request #68160 from azat/tests/02122_join_group_by_timeout tests: fix 02122_join_group_by_timeout flakiness	2024-08-12 02:20:51 +02:00
Alexey Milovidov	c462f4639b	Merge pull request #68161 from narqo/patch-1 Fix typos in Prometheus protocol docs	2024-08-11 23:55:53 +00:00
Alexey Milovidov	d2a9eaaa01	Merge pull request #68157 from azat/local-fix-log Remove "Processing configuration file" message from clickhouse-local	2024-08-11 23:07:43 +00:00
Yakov Olkhovskiy	5c8665c660	fix system.kafka_consumers and doc, fix tidy	2024-08-11 20:40:55 +00:00
Alexey Milovidov	4de79653ea	Merge pull request #68134 from azat/tests/01246_buffer_flush tests: fix 01246_buffer_flush flakiness due to slow trace_log flush	2024-08-11 17:56:05 +00:00
Azat Khuzhin	e384e2c38e	tests: fix 02122_join_group_by_timeout flakiness CI found [1] failure of the test: 2024-08-11 21:06:07 /usr/share/clickhouse-test/queries/0_stateless/02122_join_group_by_timeout.sh: line 51: 52614 Killed timeout -s KILL $MAX_PROCESS_WAIT $CLICKHOUSE_CLIENT -q "SELECT a.name as n And the problem is not the server, but the client, since query executed for ~1 second: 2024.08.11 21:06:02.284318 [ 49232 ] {ba989ee2-f615-49ca-bcd8-31b3916aeb2c} <Debug> executeQuery: (from [::1]:54144) (comment: 02122_join_group_by_timeout.sh) SELECT a.name as n FROM ( SELECT 'Name' as name, number FROM system.numbers LIMIT 2000000 ) AS a, ( SELECT 'Name' as name2, number FROM system.numbers LIMIT 2000000 ) as b FORMAT Null SETTINGS max_execution_time = 1, timeout_overflow_mode = 'break' (stage: Complete) 2024.08.11 21:06:03.331249 [ 49232 ] {ba989ee2-f615-49ca-bcd8-31b3916aeb2c} <Debug> executeQuery: Read 517104 rows, 3.95 MiB in 1.072023 sec., 482362.78512681165 rows/sec., 3.68 MiB/sec. [1]: https://s3.amazonaws.com/clickhouse-test-reports/67134/18da3f0ab63da1eef9396627d0dfd56cf5356f65/stateless_tests__msan__[1_4].html So instead of using timeout, let's use time from the system.query_log instead. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2024-08-11 18:39:09 +02:00
Vladimir Varankin	d314e5aa45	typos in prometheus.md	2024-08-11 18:37:29 +02:00
Yakov Olkhovskiy	8e706265e6	fix	2024-08-11 16:29:35 +00:00
Igor Nikonov	4ef3fe416d	Fix and simplify test	2024-08-11 13:08:53 +00:00
Yakov Olkhovskiy	4fec61da55	fix wrong datatype in system.kafka_consumers	2024-08-11 12:35:27 +00:00
Igor Nikonov	fbf4baf47e	Merge remote-tracking branch 'origin/master' into patch-2	2024-08-11 11:52:42 +00:00
Azat Khuzhin	29afd2de78	Remove "Processing configuration file" message from clickhouse-local Make the behaviour identical to the clickhouse-client Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2024-08-11 13:26:45 +02:00

150337 Commits