ClickHouse® is a real-time analytics DBMS
Azat Khuzhin 177c98b6a9 Use "exact" matching for fuzzy search
Right now fuzzy search is too smart for SQL: it even takes case into
account, which it should not (you don't want to type "SELECT" instead
of "select" to find the query).

And to tell the truth, I think overly smart fuzzy searching is not
needed for SQL queries and only does harm.

Exact matching seems to be a better algorithm for SQL. It is not 100%
exact: it splits the input by spaces and applies a separate matcher to
each word. Note that if you think "space is not enough" as a delimiter,
you should first know that it is the delimiter only for the input
query, so to match "system.query_log" you can use "sy qu log"
(you can also disable exact mode by prepending the "'" char).

It also ignores case by default, which is the behaviour expected from
CaseMatching::Ignore.
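
The matching rule described above can be illustrated with a minimal
sketch. This is not skim's implementation: the function name
exact_match is hypothetical, the "'" prefix is not modelled, and the
assumption is that each space-separated term must occur in the line as
a case-insensitive substring.

```rust
/// Hypothetical helper (not part of skim): returns true when every
/// whitespace-separated term of `query` occurs in `line` as a
/// case-insensitive substring.
fn exact_match(query: &str, line: &str) -> bool {
    let line = line.to_lowercase();
    query
        .split_whitespace()
        .all(|term| line.contains(&term.to_lowercase()))
}

fn main() {
    // Made-up history entries, for illustration only.
    let history = [
        "SELECT * FROM system.query_log",
        "select 1",
        "SHOW TABLES",
    ];
    for query in ["sy qu log", "SELECT", "show"] {
        let hits: Vec<&str> = history
            .iter()
            .copied()
            .filter(|line| exact_match(query, line))
            .collect();
        println!("{query:?} -> {hits:?}");
    }
}
```

Under these assumptions "sy qu log" selects only the system.query_log
entry, and "SELECT" finds both the upper-case and the lower-case query.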

TL;DR;

Just for the history, I will describe what has been tried.

At first I tried CaseMatching::Ignore, but it does not help with the
SkimV1/SkimV2/Clangd matchers.

So I converted the lines from the history and the input query to lower
case. However, this does not work for UPPER CASE, since only the
initial portion of the query had been converted to lower case.

Then I looked into the code of the skim/fuzzy-matcher crates for the
reason why CaseMatching::Ignore does not work, and found that there is
still a penalty for a case mismatch, and there is no way to override it
from user code. So I tried guerrilla to monkey patch the library's
code, and it works:

    // Avoid penalty for case mismatch (even with CaseMatching::Ignore)
    let _guard = guerrilla::patch0(SkimScoreConfig::default, || {
        let score_match = 16;
        let gap_start = -3;
        let gap_extension = -1;
        let bonus_first_char_multiplier = 2;

        SkimScoreConfig {
            score_match,
            gap_start,
            gap_extension,
            bonus_first_char_multiplier,
            bonus_head: score_match / 2,
            bonus_break: score_match / 2 + gap_extension,
            bonus_camel: score_match / 2 + 2 * gap_extension,
            bonus_consecutive: -(gap_start + gap_extension),
            // penalty_case_mismatch: gap_extension * 2,
            penalty_case_mismatch: 0,
        }
    });

But this does not look like trivial code, so I decided to look around
and realized that "exact" matching should do what is required for the
completion of queries (at least from my point of view).

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-02-04 14:15:02 +01:00
.github Merge pull request #45568 from ClickHouse/keeper-systemd 2023-02-03 23:08:02 +01:00
base fix TSA support 2023-01-10 01:19:42 +00:00
benchmark Remove old file 2022-07-12 20:28:02 +02:00
cmake Update version to 23.2.1.1 2023-01-25 23:57:29 +01:00
contrib Define S3 client with bucket and endpoint resolution (#45783) 2023-02-03 14:30:52 +01:00
docker Merge pull request #38983 from CurtizJ/randomize-mt-settings 2023-02-04 02:59:52 +01:00
docs fix heading level 2023-02-03 13:57:47 -05:00
packages Do not use debconf/confmodule in tgz packages 2023-02-03 12:16:19 +01:00
programs Merge pull request #38983 from CurtizJ/randomize-mt-settings 2023-02-04 02:59:52 +01:00
rust Use "exact" matching for fuzzy search 2023-02-04 14:15:02 +01:00
src Merge pull request #45985 from ClickHouse/fix-crash-in-regression 2023-02-04 03:01:46 +01:00
tests Merge pull request #45985 from ClickHouse/fix-crash-in-regression 2023-02-04 03:01:46 +01:00
utils Update version_date.tsv and changelogs after v23.1.3.5-stable 2023-02-03 13:00:13 +00:00
.clang-format add BeforeLambdaBody to .clang-format 2022-02-11 16:51:45 +01:00
.clang-tidy Temporarily disable misc-* due to being too slow 2022-12-07 11:43:47 +00:00
.editorconfig Changed tabs to spaces in editor configs and in style guide [#CLICKHOUSE-3]. 2017-04-01 11:35:09 +03:00
.exrc Fix vim settings (wrong group for autocmd) 2022-12-03 21:23:24 +01:00
.git-blame-ignore-revs Add files with revision to ignore for git blame 2022-09-13 23:05:56 +02:00
.gitattributes Ignore core.autocrlf for tests references 2022-10-05 09:13:27 +02:00
.gitignore Update .gitignore 2023-01-18 01:49:52 +01:00
.gitmodules Merge branch 'master' into iaadeflate_upgrade_qpl_v1.0.0 2023-02-02 10:20:49 +08:00
.pylintrc Cover deprecated bad-* pylint options with black 2022-06-08 14:18:28 +02:00
.snyk Add exclusions from the Snyk scan 2022-10-31 17:47:02 +01:00
.yamllint Drop truthy.check-keys from yamllint (does not supported on CI) 2021-02-21 06:15:36 +03:00
AUTHORS Update AUTHORS 2021-09-22 11:38:03 +03:00
CHANGELOG.md Update CHANGELOG.md 2023-01-31 13:41:01 +01:00
CMakeLists.txt What happens if I remove 156 lines of code? 2023-01-03 18:51:16 +00:00
CODE_OF_CONDUCT.md Add minimal code of conduct #9676 2020-03-16 12:44:28 +03:00
CONTRIBUTING.md Mention ClickHouse CLA in CONTRIBUTING.md (#32697) 2021-12-14 03:47:19 +03:00
format_sources allow several <graphite> targets (#603) 2017-03-21 23:08:09 +04:00
LICENSE Update LICENSE 2023-01-02 00:35:32 +01:00
PreLoad.cmake Update PreLoad.cmake 2022-08-26 18:30:05 +08:00
README.md Fix slack link in README 2023-01-31 19:47:22 -05:00
SECURITY.md Update version_date.tsv and changelogs after v23.1.1.3077-stable 2023-01-25 23:05:49 +00:00

ClickHouse — open source distributed column-oriented DBMS

ClickHouse® is an open-source column-oriented database management system that allows generating analytical data reports in real-time.

  • Official website has a quick high-level overview of ClickHouse on the main page.
  • ClickHouse Cloud: ClickHouse as a service, built by the creators and maintainers.
  • Tutorial shows how to set up and query a small ClickHouse cluster.
  • Documentation provides more in-depth information.
  • YouTube channel has a lot of content about ClickHouse in video format.
  • Slack and Telegram allow chatting with ClickHouse users in real-time.
  • Blog contains various ClickHouse-related articles, as well as announcements and reports about events.
  • Code Browser (Woboq) with syntax highlight and navigation.
  • Code Browser (github.dev) with syntax highlight, powered by github.dev.
  • Contacts can help to get your questions answered if there are any.

Upcoming events

  • Recording available: v23.1 Release Webinar - 23.1 is the ClickHouse New Year release. Original creator, co-founder, and CTO of ClickHouse Alexey Milovidov will walk us through the highlights of the release. Inverted indices, query cache, and so -- very -- much more.
  • Recording available: ClickHouse Meetup at the CHEQ office in Tel Aviv - We are very excited to be holding our next in-person ClickHouse meetup at the CHEQ office in Tel Aviv! Hear from CHEQ, ServiceNow and Contentsquare, as well as a deep dive presentation from ClickHouse CTO Alexey Milovidov. Join us for a fun evening of talks, food and discussion!