After #37544 OptimizedRegularExpressionImpl started to be moved, but
StringSearcher is not copyable since it holds pointers that goes out of
scope after move (before Regexps::get() returns std::shared_ptr<Regexp>
but it had been replaced with Regexps::createRegexp() that returns
Regexp object).
<details>
<summary>ASan report</summary>
==48348==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7fff577239a9 at pc 0x00001518209b bp 0x7fff57723820 sp 0x7fff57723818
READ of size 1 at 0x7fff577239a9 thread T0
0 0x1518209a in char8_t const* DB::StringSearcher<true, true>::search<char8_t>(char8_t const*, char8_t const*) const /bld/./src/Common/StringSearcher.h:730:41
1 0x1518dd3f in char8_t const* DB::StringSearcher<true, true>::search<char8_t>(char8_t const*, unsigned long) const /bld/./src/Common/StringSearcher.h:751:16
2 0x1518dd3f in OptimizedRegularExpressionImpl<false>::match(char const*, unsigned long, std::__1::vector<OptimizedRegularExpressionDetails::Match, std::__1::allocator<OptimizedRegularExpressionDetails::Match> >&, unsigned int) const /bld/./src/Common/OptimizedRegularExpression.cpp:463:54
3 0x1811cb42 in DB::ExtractAllImpl::get(char const*&, char const*&) /bld/./src/Functions/FunctionsStringArray.h:588:18
4 0x1811aa62 in DB::FunctionTokens<DB::ExtractAllImpl>::executeImpl(std::__1::vector<DB::ColumnWithTypeAndName, std::__1::allocator<DB::ColumnWithTypeAndName> > const&, std::__1::shared_ptr<DB::IDataType const> const&, unsigned long) const /bld/./src/Functions/FunctionsStringArray.h:704:30
5 0x14fe17b4 in DB::IFunction::executeImplDryRun(std::__1::vector<DB::ColumnWithTypeAndName, std::__1::allocator<DB::ColumnWithTypeAndName> > const&, std::__1::shared_ptr<DB::IDataType const> const&, unsigned long) const /bld/./src/Functions/IFunction.h:409:16
Address 0x7fff577239a9 is located in stack of thread T0 at offset 201 in frame
0 0x1518d98f in OptimizedRegularExpressionImpl<false>::match(char const*, unsigned long, std::__1::vector<OptimizedRegularExpressionDetails::Match, std::__1::allocator<OptimizedRegularExpressionDetails::Match> >&, unsigned int) const /bld/./src/Common/OptimizedRegularExpression.cpp:439
</details>
CI: https://s3.amazonaws.com/clickhouse-test-reports/39342/c6f7698f9ad6ae22199182ebf7c3b2dac77d69d8/fuzzer_astfuzzerasan,actions//report.htmlFixes: #37544 (cc @rschu1ze)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Current implementation of NOEXCEPT_SCOPE will not work, you cannot
rethrow exception outside the catch block, this will simply terminate
(via std::terminate) the program.
In other words NOEXCEPT_SCOPE macro will simply call std::terminate on
exception and will lost original exception.
But if NOEXCEPT_SCOPE will accept the code that should be runned w/o
exceptions, then it can catch exception and log it, rewrite it in this
way.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
As found by @KochetovNicolai before this patch, lambda in
ThreadFromGlobalPool() ctor assigns value only to a copy of the
thread_id value, and so check in joinable() had been working
incorrectly, fix this by changing the value not the shared_ptr itself.
Also it is not safe to assign thread_id w/o atomics, since this can be
racy, so wrap id with std::atomic<>
Fixes: #28431
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
joinable() should be used only outside, since internally it is enough to
know `state` to know that something is wrong.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
A simple HelloWorld program with zero includes except iostream triggers
a build of ca. 2000 source files. The reason is that ClickHouse's
top-level CMakeLists.txt overrides "add_executable()" to link all
binaries against "clickhouse_new_delete". This links against
"clickhouse_common_io", which in turn has lots of 3rd party library
dependencies ... Without linking "clickhouse_new_delete", the number of
compiled files for "HelloWorld" goes down to ca. 70.
As an example, the self-extracting-executable needs none of its current
dependencies but other programs may also benefit.
In order to restore access to the original "add_executable()", the
overriding version is now prefixed. There is precedence for a
"clickhouse_" prefix (as opposed to "ch_"), for example
"clickhouse_split_debug_symbols". In general prefixing makes sense also
because overriding CMake commands relies on undocumented behavior and is
considered not-so-great practice (*).
(*) https://crascit.com/2018/09/14/do-not-redefine-cmake-commands/
When WRITE lock attemp fails (exclusive lock for ALTER/DELETE), and
there are multiple READ locks (shared lock for SELECT/INSERT), i.e. one
INSERT is in progress and one SELECT is queued after ALTER/DELETE
started but before it fails, this SELECT will wait until INSERT will
finishes.
This happens because in case of WRITE lock failure it does not notify
the next READ lock that can be acquired.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
SQL functions countSubstrings(), countSubstringsCaseInsensitive(),
countSubstringsUTF8(), position(), positionCaseInsensitive(),
positionUTF8() with non-const pattern argument use fallback sorters
LibCASCIICaseSensitiveStringSearcher and LibCASCIICaseInsensitiveStringSearcher
which call ::strstr(), resp. ::strcasestr(). These functions assume that
the haystack is 0-terminated and they even document that. However, the
callers did not check if the haystack contains 0-byte (perhaps because
its sort of expensive). As a consequence, if the haystack contained a
zero byte in it's payload, matches behind this zero byte were ignored.
create table t (id UInt32, pattern String) engine = MergeTree() order by id;
insert into t values (1, 'x');
select countSubstrings('aaaxxxaa\0xxx', pattern) from t;
We returned 3 before this commit, now we return 6