Commit Graph

150325 Commits

Author SHA1 Message Date
Robert Schulze
9ad890e399
Cosmetics: whitespaces 2024-08-12 15:30:07 +00:00
Robert Schulze
27a6931a35
Cosmetics: variable naming 2024-08-12 15:29:59 +00:00
Robert Schulze
289c27c804
Introduce version for index files in persistence 2024-08-12 15:29:02 +00:00
Robert Schulze
4ad624cb7e
Cosmetics 2024-08-12 15:28:58 +00:00
Robert Schulze
74de79e52b
Add logging of basic statistics 2024-08-12 15:28:46 +00:00
Robert Schulze
8853b3359b
Remove useless templatization
Makes the code cleaner, compile faster, and the binary smaller.
2024-08-12 15:27:06 +00:00
Robert Schulze
4f23f7754b
Cosmetics 2024-08-12 15:26:05 +00:00
Robert Schulze
7f611681df
Add a similar sanity check as in other skipping indexes 2024-08-12 15:26:01 +00:00
Robert Schulze
f944ef25bb
Better handling of errors during add, search, and save 2024-08-12 15:25:58 +00:00
Robert Schulze
e7c2bf49c3
Add detach/attach test 2024-08-12 15:25:55 +00:00
Robert Schulze
40bed3e20f
Remove support for WHERE-type queries
These kinds of vector similarity search queries are rather obscure and
rare in practice. They require the user to specify a maximum distance,
which is not intuitive to obtain. Furthermore, these queries are not
natively supported in USearch, so the vector search index had to emulate
them.

Therefore simplifying the code base and restricting vector search to
ORDER-BY queries only.
2024-08-12 15:25:52 +00:00
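The difference between the two query shapes named in the commit message above can be sketched with a brute-force search in plain Python. This is only an illustration: the function names `top_k` and `within_distance` are hypothetical, and nothing here reflects the ClickHouse or USearch implementation. It shows why the WHERE-type form needs a caller-supplied maximum distance while the ORDER-BY-type form only needs a result count.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(vectors, query, k):
    """ORDER-BY-type query: indices of the k vectors nearest to query.

    The caller only chooses how many results to return.
    """
    ranked = sorted(range(len(vectors)), key=lambda i: l2_distance(vectors[i], query))
    return ranked[:k]


def within_distance(vectors, query, max_distance):
    """WHERE-type query: indices of all vectors closer than max_distance.

    The caller has to know a meaningful threshold in advance.
    """
    return [i for i, v in enumerate(vectors) if l2_distance(v, query) < max_distance]


# Hypothetical toy data, not taken from any test in the repository.
vectors = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0], [10.0, 10.0]]
query = [0.0, 0.0]

print(top_k(vectors, query, 2))
print(within_distance(vectors, query, 6.0))
```

With a threshold that is too small, `within_distance` returns few or no rows, and with one that is too large it returns nearly everything, which is the usability problem the commit message describes.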
Robert Schulze
abb8e61981
Remove support code for Lp norm in vector search
It is a generalization of other norms, too expensive to calculate and
not relevant in practice. Also, USearch doesn't support it.
2024-08-12 15:25:48 +00:00
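As a reminder of what "generalization of other norms" means in the commit message above, here is a minimal sketch of the Lp distance in plain Python. The function name `lp_distance` is hypothetical and the code is not taken from ClickHouse. Setting p = 1 gives the L1 (Manhattan) distance and p = 2 gives the L2 (Euclidean) distance; the general form needs a power per component plus a final root, which hints at the extra cost.

```python
def lp_distance(a, b, p):
    """Lp distance between two equally sized vectors, for p >= 1."""
    if p < 1:
        raise ValueError("p must be >= 1")
    # One power per component, then a p-th root of the sum.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Hypothetical example vectors.
a = [0.0, 0.0]
b = [3.0, 4.0]

print(lp_distance(a, b, 1))  # L1 (Manhattan) distance
print(lp_distance(a, b, 2))  # L2 (Euclidean) distance
print(lp_distance(a, b, 3))  # a general Lp distance
```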
Robert Schulze
65186f0b69
Remove tuple support
Indexes for approximate nearest neighbour (ANN) search (USearch) can
be built on columns of type Array(Float32) or Tuple(Float32[, Float32[, ...]]).
In practice, Array(Float32) is the only relevant data type.
Arrays store high-dimensional embeddings consecutively (--> cache
locality) and the additional flexibility of different data types in a
tuple is not needed for vector search.

Therefore removing support for ANN indexes over tuple columns to
simplify the code, tests and docs.
2024-08-12 15:25:39 +00:00
Robert Schulze
218421c255
Remove Annoy indexes
Annoy indexes fell out of favor in the community, at least when it comes
to vector databases. Such indexes work okay-ish in low dimensions but they
suffer badly from the curse of dimensionality, which makes them inapt for
a high number of dimensions.

Now that Annoy is gone, issue (*) also disappears and we can drop
'no-ubsan', 'no-cpu-aarch64', and 'no-asan' from tests.

(*) spotify/annoy#456
2024-08-12 15:24:49 +00:00
Robert Schulze
7c41939921
Fix test results (no analyzer support yet ...) 2024-08-12 15:24:22 +00:00
Robert Schulze
d7211f9d12
Fix CMake integration of usearch and annoy
Registers usearch and annoy properly via configure_config.cmake and
config.h.in like all other 3rd party libs, instead of (mis)using
target_compile_definitions.
2024-08-12 15:24:18 +00:00
Robert Schulze
a39b9cf643
Un-screw usearch's build description
No directory 'SimSIMD-map' exists, the build only worked because SimSIMD
support in usearch was (accidentally?) disabled. This commit corrects
the build description. SimSIMD support in usearch will be enabled by a
later commit.
2024-08-12 15:24:14 +00:00
Robert Schulze
85f63b056b
Merge pull request #68135 from ClickHouse/refactor-field-get
Only use Field::safeGet - Field::get prone to type punning
2024-08-12 14:25:11 +00:00
Nikita Taranov
2f546fb513
Merge pull request #68098 from aiven-sal/aiven-sal/segfault
Fix UB in hopEnd, hopStart, tumbleEnd, and tumbleStart
2024-08-12 12:09:23 +00:00
Sema Checherinda
5e836bc20e
Merge pull request #67472 from ClickHouse/chesema-02765
speed up system flush logs
2024-08-12 11:51:55 +00:00
vdimir
52f37f2ec6
Merge pull request #67980 from ClickHouse/vdimir/fix_03130_convert_outer_join_to_inner_join
Fix 03130_convert_outer_join_to_inner_join
2024-08-12 11:34:10 +00:00
Robert Schulze
0aa30b10d5
Merge pull request #68069 from rschu1ze/cmake-cleanup
Minor CMake cleanup
2024-08-12 06:43:00 +00:00
Alexey Milovidov
2a12604cf5
Merge pull request #66494 from azat/gdb-image
Update gdb to 15.1 (by compiling from sources)
2024-08-12 05:04:57 +00:00
Alexey Milovidov
94fe53de64 Merge branch 'master' into vdimir/fix_03130_convert_outer_join_to_inner_join 2024-08-12 07:03:34 +02:00
Alexey Milovidov
0b1887eb65
Merge pull request #68138 from jsc0218/Fix01710
Fix01710 Timeout
2024-08-12 04:07:52 +00:00
Alexey Milovidov
c1c8e6dd8d
Merge pull request #68099 from GrahamCampbell/patch-2
Do not apply redundant sorting removal when there's an offset
2024-08-11 23:59:15 +00:00
Alexey Milovidov
d18a68f285
Merge pull request #68160 from azat/tests/02122_join_group_by_timeout
tests: fix 02122_join_group_by_timeout flakiness
2024-08-12 02:20:51 +02:00
Alexey Milovidov
c462f4639b
Merge pull request #68161 from narqo/patch-1
Fix typos in Prometheus protocol docs
2024-08-11 23:55:53 +00:00
Alexey Milovidov
d2a9eaaa01
Merge pull request #68157 from azat/local-fix-log
Remove "Processing configuration file" message from clickhouse-local
2024-08-11 23:07:43 +00:00
Yakov Olkhovskiy
5c8665c660 fix system.kafka_consumers and doc, fix tidy 2024-08-11 20:40:55 +00:00
Alexey Milovidov
4de79653ea
Merge pull request #68134 from azat/tests/01246_buffer_flush
tests: fix 01246_buffer_flush flakiness due to slow trace_log flush
2024-08-11 17:56:05 +00:00
Azat Khuzhin
e384e2c38e tests: fix 02122_join_group_by_timeout flakiness
CI found [1] failure of the test:

    2024-08-11 21:06:07 /usr/share/clickhouse-test/queries/0_stateless/02122_join_group_by_timeout.sh: line 51: 52614 Killed                  timeout -s KILL $MAX_PROCESS_WAIT $CLICKHOUSE_CLIENT -q "SELECT a.name as n

And the problem is not the server, but the client, since the query executed
for ~1 second:

    2024.08.11 21:06:02.284318 [ 49232 ] {ba989ee2-f615-49ca-bcd8-31b3916aeb2c} <Debug> executeQuery: (from [::1]:54144) (comment: 02122_join_group_by_timeout.sh) SELECT a.name as n FROM ( SELECT 'Name' as name, number FROM system.numbers LIMIT 2000000 ) AS a, ( SELECT 'Name' as name2, number FROM system.numbers LIMIT 2000000 ) as b FORMAT Null SETTINGS max_execution_time = 1, timeout_overflow_mode = 'break'  (stage: Complete)
    2024.08.11 21:06:03.331249 [ 49232 ] {ba989ee2-f615-49ca-bcd8-31b3916aeb2c} <Debug> executeQuery: Read 517104 rows, 3.95 MiB in 1.072023 sec., 482362.78512681165 rows/sec., 3.68 MiB/sec.

  [1]: https://s3.amazonaws.com/clickhouse-test-reports/67134/18da3f0ab63da1eef9396627d0dfd56cf5356f65/stateless_tests__msan__[1_4].html

So instead of using timeout, let's use the time from system.query_log.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2024-08-11 18:39:09 +02:00
Vladimir Varankin
d314e5aa45
typos in prometheus.md 2024-08-11 18:37:29 +02:00
Yakov Olkhovskiy
8e706265e6 fix 2024-08-11 16:29:35 +00:00
Igor Nikonov
4ef3fe416d Fix and simplify test 2024-08-11 13:08:53 +00:00
Yakov Olkhovskiy
4fec61da55 fix wrong datatype in system.kafka_consumers 2024-08-11 12:35:27 +00:00
Igor Nikonov
fbf4baf47e Merge remote-tracking branch 'origin/master' into patch-2 2024-08-11 11:52:42 +00:00
Azat Khuzhin
29afd2de78 Remove "Processing configuration file" message from clickhouse-local
Make the behaviour identical to the clickhouse-client

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2024-08-11 13:26:45 +02:00
Alexander Tokmakov
7de1c4bfc3
Merge pull request #68051 from ClickHouse/tavplubix-patch-10
Update test_drop_is_lock_free/test.py
2024-08-11 11:21:38 +00:00
Alexander Tokmakov
459e34b7b8
Merge pull request #68156 from ClickHouse/revert-68034-stats-tests-refactoring
Revert "Refactor tests for (experimental) statistics"
2024-08-11 11:20:15 +00:00
Alexander Tokmakov
53bc1b7e35
Revert "Refactor tests for (experimental) statistics" 2024-08-11 13:19:36 +02:00
Robert Schulze
f7e7a884b5
Merge pull request #67962 from Blargian/docs_toDecimalXYZ
Docs:`toDecimal32/64/128/256` and variants
2024-08-11 09:23:40 +00:00
Robert Schulze
45db564354
Merge pull request #68034 from rschu1ze/stats-tests-refactoring
Refactor tests for (experimental) statistics
2024-08-11 08:43:31 +00:00
Robert Schulze
4502862033
Fix no-SSE3 build 2024-08-11 08:35:47 +00:00
Azat Khuzhin
1142305b11 tests: fix 01246_buffer_flush flakiness due to slow trace_log flush
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2024-08-11 07:06:51 +02:00
Alexey Milovidov
80b18a8591
Merge pull request #68145 from ClickHouse/followup-68112
Remove the extra cell from reports when it is not necessary
2024-08-11 02:42:35 +00:00
Yakov Olkhovskiy
e93584e741 fix Field conversion to IPv4 2024-08-10 23:02:30 +00:00
Igor Nikonov
9ce97e918b
Merge branch 'master' into patch-2 2024-08-11 00:07:46 +02:00
János Benjamin Antal
613ebe367c Only add extra cell when necessary 2024-08-10 22:05:11 +00:00
Alexey Milovidov
639cfd3cc2
Merge pull request #68136 from azat/tests/01600_parts_states_metrics_long
tests: attempt to fix 01600_parts_states_metrics_long (by forbidding parallel runs)
2024-08-10 21:13:22 +00:00