Trailing escape ('ab\') is disallowed in SQL, in standardese:
"If an escape character is specified, then [...] If there is not a
partitioning of the string PVC into substrings such that each substring
has length 1 (one) or 2, no substring of length 1 (one) is the escape
character ECV, and each substring of length 2 is the escape character
ECV followed by either the escape character ECV, an <underscore>
character, or the <percent> character, then an exception condition is
raised: data exception - invalid escape sequence."
I first thought this is checked already higher up in the stack, at least
for const needles, as single trailing backslashes ('ab\') are rejected,
but then I realized that ClickHouse quotes by default. I.e., double
trailing backslashes ('ab\\') are not rejected but when interpreted as
LIKE needle ('ab\') they should.
The original goal was to get change
const auto & needle = String(
reinterpret_cast<const char *>(cur_needle_data),
cur_needle_length);
in Functions/MatchImpl.h into a std::string_view to save an allocation +
copy. The needle is eventually passed as search pattern into the re2
library. Re2 has an alternative constructor taking a const char * i.e. a
NULL-terminated string. Here, the needle is NULL-terminated but
1. this is only because it is passed inside a ColumnString yet this is
not always the case (e.g. fixed string columns has a dense layout w/o
NULL terminator).
2. assuming NULL termination for users != MatchImpl of the regex code is
too dangerous.
So, for now we'll stay with copying to be on the safe side. One fine day
when re2 has a ptr/size ctor, we can use std::string_view.
Just changing a few other places from std::string to std::string_view
but this will not help with performance.