The following new provile events had been added:
- FileSync - Number of times the F_FULLFSYNC/fsync/fdatasync function was called for files.
- DirectorySync - Number of times the F_FULLFSYNC/fsync/fdatasync function was called for directories.
- FileSyncElapsedMicroseconds - Total time spent waiting for F_FULLFSYNC/fsync/fdatasync syscall for files.
- DirectorySyncElapsedMicroseconds - Total time spent waiting for F_FULLFSYNC/fsync/fdatasync syscall for directories.
v2: rewrite test to sh with retries
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
- Introduced with the C++20 <bit> header
- The problem with __builtin_c(l|t)z() is that 0 as input has an
undefined result (*) and the code did not always check. The std::
versions do not have this issue.
- In some cases, we continue to use buildin_c(l|t)z(), (e.g. in
src/Common/BitHelpers.h) because the std:: versions only accept
unsigned inputs (and they also check that) and the casting would be
ugly.
(*) https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
Public suffix list may contain special characters (you may find format
here - [1]):
- asterisk (*)
- exclamation mark (!)
[1]: https://github.com/publicsuffix/list/wiki/Format
It is easier to describe how it should be interpreted with an examples.
Consider the following part of the list:
*.sch.uk
*.kawasaki.jp
!city.kawasaki.jp
And here are the results for `cutToFirstSignificantSubdomainCustom()`:
If you have only asterisk (*):
foo.something.sheffield.sch.uk -> something.sheffield.sch.uk
sheffield.sch.uk -> sheffield.sch.uk
If you have exclamation mark (!) too:
foo.kawasaki.jp -> foo.kawasaki.jp
foo.foo.kawasaki.jp -> foo.foo.kawasaki.jp
city.kawasaki.jp -> city.kawasaki.jp
some.city.kawasaki.jp -> city.kawasaki.jp
TLDs had been verified wit the following script [2], to match with
python publicsuffix2 module.
[2]: https://gist.github.com/azat/c1a7a9f1e3519793134ef4b1df5461a6
v2: fix StringHashTable padding requirements
Fixes: #39468
Follow-up for: #17748
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
After #37544 OptimizedRegularExpressionImpl started to be moved, but
StringSearcher is not copyable since it holds pointers that goes out of
scope after move (before Regexps::get() returns std::shared_ptr<Regexp>
but it had been replaced with Regexps::createRegexp() that returns
Regexp object).
<details>
<summary>ASan report</summary>
==48348==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7fff577239a9 at pc 0x00001518209b bp 0x7fff57723820 sp 0x7fff57723818
READ of size 1 at 0x7fff577239a9 thread T0
0 0x1518209a in char8_t const* DB::StringSearcher<true, true>::search<char8_t>(char8_t const*, char8_t const*) const /bld/./src/Common/StringSearcher.h:730:41
1 0x1518dd3f in char8_t const* DB::StringSearcher<true, true>::search<char8_t>(char8_t const*, unsigned long) const /bld/./src/Common/StringSearcher.h:751:16
2 0x1518dd3f in OptimizedRegularExpressionImpl<false>::match(char const*, unsigned long, std::__1::vector<OptimizedRegularExpressionDetails::Match, std::__1::allocator<OptimizedRegularExpressionDetails::Match> >&, unsigned int) const /bld/./src/Common/OptimizedRegularExpression.cpp:463:54
3 0x1811cb42 in DB::ExtractAllImpl::get(char const*&, char const*&) /bld/./src/Functions/FunctionsStringArray.h:588:18
4 0x1811aa62 in DB::FunctionTokens<DB::ExtractAllImpl>::executeImpl(std::__1::vector<DB::ColumnWithTypeAndName, std::__1::allocator<DB::ColumnWithTypeAndName> > const&, std::__1::shared_ptr<DB::IDataType const> const&, unsigned long) const /bld/./src/Functions/FunctionsStringArray.h:704:30
5 0x14fe17b4 in DB::IFunction::executeImplDryRun(std::__1::vector<DB::ColumnWithTypeAndName, std::__1::allocator<DB::ColumnWithTypeAndName> > const&, std::__1::shared_ptr<DB::IDataType const> const&, unsigned long) const /bld/./src/Functions/IFunction.h:409:16
Address 0x7fff577239a9 is located in stack of thread T0 at offset 201 in frame
0 0x1518d98f in OptimizedRegularExpressionImpl<false>::match(char const*, unsigned long, std::__1::vector<OptimizedRegularExpressionDetails::Match, std::__1::allocator<OptimizedRegularExpressionDetails::Match> >&, unsigned int) const /bld/./src/Common/OptimizedRegularExpression.cpp:439
</details>
CI: https://s3.amazonaws.com/clickhouse-test-reports/39342/c6f7698f9ad6ae22199182ebf7c3b2dac77d69d8/fuzzer_astfuzzerasan,actions//report.htmlFixes: #37544 (cc @rschu1ze)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>