---
title: 'Fuzzing ClickHouse'
date: '2021-03-08'
author: '[Alexander Kuzmenkov](https://github.com/akuzm)'
tags: ['fuzzing', 'testing']
---

Testing is a major problem in software development: there is never enough of
it. This is especially true for a database management system, whose task is
to interpret a query language that works on the persistent state managed by the
system in a distributed fashion. Each of these three functions is hard enough
to test even in isolation, and it gets much worse when you combine them. As
ClickHouse developers, we know this from experience. Despite the large amount
of automated testing of all kinds that we routinely perform as part of our
continuous integration system, new bugs and regressions keep creeping in. We
are always looking for ways to improve our test coverage, and this article
describes our recent development in this area -- the AST-based query fuzzer.

A natural form of testing for an SQL DBMS is to create an SQL script describing
the test case and record its reference result. To test, we run the script and
check that the result matches the reference. This is used in many SQL DBMSes,
and it is the default kind of test you are expected to write for any
ClickHouse feature or fix. Currently we have [73k lines of SQL tests
alone](https://github.com/ClickHouse/ClickHouse/tree/master/tests/queries/0_stateless),
which reach a [code coverage of
76%](https://clickhouse-test-reports.s3.yandex.net/0/47d684a5c35410201d4dd4f63f3287bf25cdabb7/coverage_report/test_output/index.html).

This form of testing, where a developer writes a few simplified examples of how
the feature can and cannot be used, is sometimes called "example-based
testing". Sadly, bugs often appear in corner cases and at the intersections
of features, and it is not practical to enumerate them all by hand. There is a
technique for automating this process, called "property-based testing". It lets
you write more general tests of the form "for all values matching these specs,
the result of some operation on them should match this other spec". For
example, such a test can check that if you add two positive numbers, the result
is greater than both of them. But you don't specify which numbers exactly, only
these properties. Then the property testing system randomly generates some
examples with particular numbers that match the specification, and checks that
the result also matches its specification.

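The addition example can be written down in a few lines. What follows is only a
minimal hand-rolled sketch in Python, not a real property-testing framework:
the function name, the number of trials and the input range are all
assumptions made for illustration.

```python
import random


def check_sum_exceeds_both(trials: int = 1000, seed: int = 0) -> None:
    """Property: for all positive integers a and b, a + b > a and a + b > b."""
    rng = random.Random(seed)  # fixed seed, so a failing case can be reproduced
    for _ in range(trials):
        # Generate inputs that match the spec "positive integer".
        a = rng.randint(1, 2**63 - 1)
        b = rng.randint(1, 2**63 - 1)
        total = a + b
        # Check that the result matches its own spec.
        assert total > a and total > b, f"property violated for a={a}, b={b}"


check_sum_exceeds_both()
```

Python integers have arbitrary precision, so the property holds here. With
fixed-width integers or floating-point numbers it would have to be stated more
carefully, which is exactly the kind of corner case such tests tend to expose.
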
Property-based testing is said to be very efficient, but it requires some
developer effort and expertise to write the tests in a special way. There is
another well-known testing technique that is in some sense a special case of
property-based testing, and that doesn't require much developer time. It is
called fuzzing. When you are fuzzing your program, you feed it random inputs
generated according to some grammar, and the property you are checking is that
your program terminates correctly (no segfaults, failed assertions, or other
kinds of program errors). Most often, the grammar of input for fuzzing is
simple -- say, bit flips and additions, or maybe some dictionary. The space of
possible inputs is huge, so to find interesting paths in it, fuzzing software
records the code paths taken by the program under test for a particular input,
and focuses on the inputs that lead to new code paths that were not seen
before. It also employs some techniques for finding interesting constant
values, and so on. In general, fuzzing allows you to find many interesting
corner cases in your program automatically, without much developer
involvement.

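To make the feedback loop concrete, here is a toy sketch of coverage-guided
fuzzing in Python. It is not how any real fuzzing tool is implemented: the
`target` function stands in for an instrumented program and simply reports
which branches it took, and every name in it is hypothetical.

```python
import random
from typing import List, Set, Tuple


def target(data: bytes) -> Set[str]:
    """Toy program under test; returns labels of the branches it executed."""
    seen = {"entry"}
    if len(data) > 0 and data[0] == ord("S"):
        seen.add("first_byte_S")
        if len(data) > 1 and data[1] == ord("Q"):
            seen.add("second_byte_Q")
    return seen


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Primitive input grammar: flip one random bit or append a random byte."""
    buf = bytearray(data)
    if buf and rng.random() < 0.5:
        pos = rng.randrange(len(buf))
        buf[pos] ^= 1 << rng.randrange(8)
    else:
        buf.append(rng.randrange(256))
    return bytes(buf)


def fuzz(iterations: int, seed: int = 0) -> Tuple[List[bytes], Set[str]]:
    """Keep only the inputs that reach branches not seen before."""
    rng = random.Random(seed)
    corpus = [b""]
    covered = target(corpus[0])
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        new_branches = target(candidate) - covered
        if new_branches:
            corpus.append(candidate)
            covered |= new_branches
    return corpus, covered


corpus, covered = fuzz(iterations=5000)
```

Because every input that reaches a new branch is kept and mutated further, the
nested `Q` check only needs one more lucky byte once an input starting with `S`
is in the corpus, instead of two lucky bytes at once.
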
Finding valid SQL queries with bit flips would take a long time, so there are
systems that generate valid SQL queries based on the grammar, such as
[SQLSmith](https://github.com/anse1/sqlsmith). They are successfully used for
finding bugs in databases. It would be interesting to use such a system for
ClickHouse, but it requires some up-front effort to support the ClickHouse SQL
grammar and functions, which may differ from the standard. Also, such
systems don't use any feedback, so while they are much better than systems with
a primitive grammar, they still might have a hard time finding interesting
examples. But we already have a big corpus of interesting human-written SQL
queries -- it's in our regression tests. Maybe we can use them as a base for
fuzzing? We tried to do this, and it turned out to be surprisingly simple and
efficient.

Consider some SQL query from a regression test. After parsing, it is easy to
mutate the resulting AST (abstract syntax tree, an internal representation of
the parsed query) before execution to introduce random changes into the query.
For strings and arrays, we make random modifications such as inserting a random
character or doubling the string. For numbers, there are well-known Bad Numbers
such as 0, 1, powers of two and their neighbors, integer limits, and `NaN`.
`NaN`s proved to be especially efficient at finding bugs: numeric code often
has alternative branches that are assumed to be mutually exclusive, but every
ordered comparison with a `NaN` is false, so both branch conditions can hold
(or fail) simultaneously, which leads to nasty effects.

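The `NaN` effect is easy to reproduce. The Python sketch below builds a small
list of such boundary values and shows a pair of seemingly complementary
branches that `NaN` slips through. The list is illustrative only and is not the
set of values the ClickHouse fuzzer actually uses; `bad_numbers` and `classify`
are hypothetical names.

```python
import math


def bad_numbers(bits=(8, 16, 32, 64)):
    """Collect boundary values that often expose numeric bugs."""
    values = [0, 1, math.nan]
    for b in bits:
        power = 2 ** b
        values += [power - 1, power, power + 1]        # power of two and neighbors
        values += [2 ** (b - 1) - 1, -(2 ** (b - 1))]  # signed integer limits
    return values


def classify(x):
    """Numeric code with two branches that look complementary."""
    if x < 0:
        return "negative"
    if x >= 0:
        return "non-negative"
    return "neither"  # reachable only when both comparisons are false


for value in bad_numbers():
    if classify(value) == "neither":
        print("both branches skipped for", value)
```

A developer who writes `else` instead of the second `if` silently treats `NaN`
as non-negative, and one who writes two independent checks may run neither.
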
Another interesting thing we can do is change the arguments to functions and
expressions in the select list. Naturally, all the interesting arguments can be
taken from other test queries. The same goes for changing the tables used in
the queries. When the fuzzer runs in CI, it runs queries from all the SQL tests
in random order, mixing in parts of queries from different tests, so that we
can eventually test all the possible permutations of our features.

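The idea of borrowing arguments from other queries can be sketched on a toy
AST. This is an illustration in Python under simplifying assumptions, not the
C++ code of the actual fuzzer: `Func`, `collect` and `fuzz_args` are
hypothetical names, and identifiers and literals are represented as plain
strings and integers.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Func:
    """A function call node; leaves are plain str (identifier) or int (literal)."""
    name: str
    args: list = field(default_factory=list)


def collect(node, pool):
    """Remember every subexpression so that later queries can borrow it."""
    pool.append(node)
    if isinstance(node, Func):
        for arg in node.args:
            collect(arg, pool)


def fuzz_args(node, pool, rng, p=0.3):
    """Return a copy of `node` with some arguments replaced by pool entries."""
    if not isinstance(node, Func):
        return node
    new_args = []
    for arg in node.args:
        if pool and rng.random() < p:
            new_args.append(rng.choice(pool))  # borrow from any earlier query
        else:
            new_args.append(fuzz_args(arg, pool, rng, p))
    return Func(node.name, new_args)


rng = random.Random(0)
pool = []
collect(Func("plus", ["a", 1]), pool)                  # from one test
collect(Func("toString", [Func("abs", ["b"])]), pool)  # from another test
mutant = fuzz_args(Func("plus", ["a", 1]), pool, rng)
```

Since the pool grows with every query seen, expressions from unrelated tests
end up nested inside each other, which is how features get combined in ways
nobody wrote a test for.
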
The core implementation of the fuzzer is relatively small, consisting of about
700 lines of C++ code. A prototype was made in a couple of days, but naturally
it took significantly longer to polish it and to start routinely using it in
CI. It is very productive and has already let us find more than 200 bugs (see
the label [fuzz](https://github.com/ClickHouse/ClickHouse/labels/fuzz) on
GitHub). Some errors it finds are not very interesting, e.g. wrong error
messages when the type of an argument doesn't match. But we also found some
serious logic errors and even memory errors. We fix all the errors we find,
even insignificant ones, because this lets us ensure that under normal
operation, the fuzzer doesn't find any errors. This is similar to the approach
usually taken with compiler warnings and other optional diagnostics -- it's
better to fix or disable every single case, so that you can be sure you have no
diagnostics if everything is OK, and it's easy to notice new problems.

After we fixed the majority of pre-existing errors, this fuzzer became efficient
at finding errors in new features. Pull requests introducing new features
normally add an SQL test, and we pay extra attention to the new tests when
fuzzing, generating more permutations for them. Even if the coverage of the
test is not sufficient, there is a good chance that the fuzzer will find the
missing corner cases. So when we see that all the fuzzer runs in different
configurations have failed for a particular pull request, this almost always
means that it introduces a new bug.

A major factor that makes fuzzing really efficient is that we have a lot of
assertions and other checks of program logic in our code. For debug-only
checks, we use the plain `assert` macro from `<cassert>`. For checks that are
needed even in release mode, we use an exception with a special code
`LOGICAL_ERROR` that signifies an internal program error. We did some work to
ensure that these errors are distinct from errors caused by wrong user
actions. A user error reported for a randomly generated query is normal (e.g.
it references some non-existent columns), but when we see an internal program
error, we know that it's definitely a bug, same as a failed assertion. Of
course, even without assertions, you get some checks for memory errors provided
by the OS (segfaults). Various kinds of sanitizers are also very useful in
conjunction with fuzzing. We run this fuzzer under clang's Address, Memory,
UndefinedBehavior and Thread sanitizers, as we do for most of our tests.

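The distinction between user errors and internal errors is what lets a fuzzer
decide automatically whether a run failed. A rough Python sketch of that
decision follows. Only the `LOGICAL_ERROR` code name comes from ClickHouse; the
`QueryError` class, the `execute` callable and the other error code are
assumptions for illustration and do not mirror the real client or server API.

```python
class QueryError(Exception):
    """Hypothetical exception carrying a symbolic error code name."""

    def __init__(self, code_name, message):
        super().__init__(f"{code_name}: {message}")
        self.code_name = code_name


def is_bug(exc):
    """Internal errors and failed assertions are bugs; user errors are not."""
    if isinstance(exc, AssertionError):
        return True
    return isinstance(exc, QueryError) and exc.code_name == "LOGICAL_ERROR"


def run_fuzzed(execute, queries):
    """Run each query via `execute`; return the ones that exposed a bug."""
    failures = []
    for query in queries:
        try:
            execute(query)
        except Exception as exc:
            if is_bug(exc):
                failures.append((query, exc))
            # Anything else is an ordinary user error, expected for random
            # queries, so it is ignored.
    return failures
```

If internal errors were reported with the same codes as user errors, every
failure of a random query would need manual triage, and the fuzzer would be far
less useful in CI.
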
To see for yourself how it works, you only need the normal ClickHouse client.
Start `clickhouse-client --query-fuzzer-runs=100`, enter any query, and enjoy
the client going crazy and running a hundred random queries instead. All
queries from the current session become a source of expressions for fuzzing,
so try entering several different queries to get more interesting results. Be
careful not to do this in production! When you do this experiment, you'll soon
notice that the fuzzer tends to generate queries that take too long to run.
This is why for the CI fuzzer runs we have to configure the server to limit
query execution time, memory usage and so on using the corresponding [server
settings](https://clickhouse.tech/docs/en/operations/settings/query-complexity/#:~:text=In%20the%20default%20configuration%20file,query%20within%20a%20single%20server.).
We had a hilarious situation after that: the fuzzer figured out how to remove
the limits by generating a `SET max_execution_time = 0` query, and then
generated a never-ending query and failed. Thankfully we were able to defeat
its cleverness by using [settings
constraints](https://clickhouse.tech/docs/en/operations/settings/constraints-on-settings/).

The AST-based fuzzer we discussed is only one of the many kinds of fuzzers we
have in ClickHouse. There is a talk (in Russian) [3] by Alexey Milovidov that
explores all the fuzzers in greater detail. Another interesting recent
development is the application of the pivoted query synthesis technique,
implemented in [SQLancer](https://github.com/sqlancer/sqlancer), to ClickHouse.
The authors are going to give [a talk about
this](https://heisenbug-piter.ru/2021/spb/talks/nr1cwknssdodjkqgzsbvh/) soon,
so stay tuned.

2021-03-08 [Alexander Kuzmenkov](https://github.com/akuzm)