The (experimental) inverted index writes and reads files that differ from
the standard files written by the other skip indexes. The original
problem was that, with database engine "Ordinary", DROP TABLE on a table
with an inverted index finds unknown files in storage and complains. The
same happens with engine "Atomic", only deferred. As a hotfix, the error
was silenced by explicitly adding the four files created in one specific
test to the deletion code.
This PR tries a cleaner solution in which all needed files are provided
via the normal checksum structure. One drawback remains: the affected
files were written earlier, so their checksums are not available.
Therefore, the inverted index is currently excluded from CHECK TABLE.
Minimal repro:
SET allow_experimental_inverted_index = 1;
DROP TABLE IF EXISTS tab;
CREATE TABLE tab(s String, INDEX af(s) TYPE inverted(2)) ENGINE = MergeTree() ORDER BY s;
INSERT INTO tab VALUES ('Alick a01');
CHECK TABLE tab;
DROP TABLE IF EXISTS tab;
Then run ./clickhouse-test with --db-engine Ordinary.
These tests were simply broken and timed out without failing the test
before these PRs:
- #46911
- #46857
- #46779, #46636, #46619
So, after those fixes, they should be fast and fasttest-compatible.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Example of such a greeting:
2023-02-26 15:26:33 expect: does "\u001b[0J\u001b[?25hClickHouse client version 23.3.1.207 (official build).\r\nConnecting to database test_i4r0kozg at localhost:9000 as user default.\r\nConnected to ClickHouse server version 23.3.1 revision 54461.\r\n\r\nWarnings:\r\n * Server logging level is set to 'test' and performance is degraded. This cannot be used in production.\r\n * Linux is not using a fast clock source. Performance can be degraded. Check /sys/devices/system/clocksource/clocksource0/current_clocksource\r\n * The setting 'allow_remote_fs_zero_copy_replication' is enabled for MergeTree tables. But the feature of 'zero-copy replication' is under development and is not ready for production. The usage of this feature can lead to data corruption and loss. The setting should be disabled in production.\r\n * Table system.session_log is enabled. It's unreliable and may contain garbage. Do not use it for any kind of security monitoring.\r\n\r\n\u001b[?2004h\u001b[1Gfunctional-tests :) " (spawn_id exp4) match regular expression "ClickHouse client version [\d]{2}.[\d]{1,2}.[\d]{1,2}.[\d]{1,}.\r"? Gate "ClickHouse client version *\r"? gate=yes re=no
CI: https://s3.amazonaws.com/clickhouse-test-reports/0/df1e18ad4cb9b08240273169ca7dd6ca1cac617c/stateless_tests__aarch64_/run.log
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Fixes #45204
The problem is that ASTSelectQuery::group_by_with_grouping_sets == true
implies ASTSelectQuery::groupBy(), but sometimes this invariant did not
hold. I added a sanity check a few months ago but had no idea how the
AST became corrupt.
All crashes/exceptions occurred during AST fuzzing. Looking at
Client/QueryFuzzer.cpp, there is a very small chance of running into the
issue. In detail:
1. In QueryFuzzer::fuzz(), we find that the AST is an ASTSelectQuery and
groupBy() returns true.
2. With small probability, we do
select->group_by_with_grouping_sets = !select->group_by_with_grouping_sets;
which flips group_by_with_grouping_sets (default: false) to true.
3. With small probability, we change the expression type in the
following WHERE or PREWHERE if-branches.
The resulting state (group_by_with_grouping_sets == true without a GROUP
BY expression) is illegal. One option is to change the fuzzing code so
that it no longer generates it. However, the fuzzing code is generic and
doesn't really care about such details. Therefore, this PR instead adds
a (theoretically unnecessary) extra check to ASTSelectQuery::formatImpl()
for robustness.