19 Commits

Author SHA1 Message Date
Han Fei
a257ff6cf3 address comment 2023-05-22 10:41:22 +02:00
Han Fei
ea59761809 fix OptimizeRegularExpression 2023-05-16 15:25:04 +02:00
Han Fei
02de4ad6df address comments 2023-03-22 17:50:19 +01:00
Han Fei
575c4263a3 address comments 2023-03-22 17:47:25 +01:00
Han Fei
d78a9e03ad refine 2023-03-15 15:38:11 +01:00
Han Fei
076d33bb03 refine a little bit 2023-03-14 18:15:42 +01:00
Han Fei
01be209e43 fix test 2023-03-14 17:44:02 +01:00
Han Fei
de8d0040a4 refine code 2023-03-13 18:34:47 +01:00
Han Fei
39a1185486 fix test 2023-03-10 15:30:29 +01:00
Han Fei
420108a7a0 support alternatives 2023-03-06 19:10:36 +01:00
Han Fei
c1e80683c4 Refine OptimizeRegularExpression Function 2023-03-03 17:59:21 +01:00
Alexander Tokmakov
70d1adfe4b Better formatting for exception messages (#45449)
* save format string for NetException

* format exceptions

* format exceptions 2

* format exceptions 3

* format exceptions 4

* format exceptions 5

* format exceptions 6

* fix

* format exceptions 7

* format exceptions 8

* Update MergeTreeIndexGin.cpp

* Update AggregateFunctionMap.cpp

* Update AggregateFunctionMap.cpp

* fix
2023-01-24 00:13:58 +03:00
Azat Khuzhin
4e76629aaf Fixes for -Wshorten-64-to-32
- lots of static_cast
- add safe_cast
- types adjustments
  - config
  - IStorage::read/watch
  - ...
- some TODO's (to convert types in future)

P.S. That was quite a journey...

v2: fixes after rebase
v3: fix conflicts after #42308 merged
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-10-21 13:25:19 +02:00
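
The kind of change listed in 4e76629aaf (explicit casts where a 64-bit value is narrowed to 32 bits) can be illustrated with a minimal sketch. `truncatingNarrow` and `checkedNarrow` are hypothetical names introduced here; in particular, `checkedNarrow` is only an assumption about what a checked narrowing helper might look like and is not ClickHouse's `safe_cast`, whose definition does not appear in this log.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical: an explicit cast silences -Wshorten-64-to-32 but still
// keeps only the low 32 bits of the value.
inline std::uint32_t truncatingNarrow(std::uint64_t value)
{
    return static_cast<std::uint32_t>(value);
}

// Hypothetical checked variant: refuses values that do not fit.
inline std::uint32_t checkedNarrow(std::uint64_t value)
{
    if (value > std::numeric_limits<std::uint32_t>::max())
        throw std::out_of_range("value does not fit into 32 bits");
    return static_cast<std::uint32_t>(value);
}

// Helper for the usage example: reports whether checkedNarrow() rejects a value.
inline bool checkedNarrowThrows(std::uint64_t value)
{
    try
    {
        checkedNarrow(value);
        return false;
    }
    catch (const std::out_of_range &)
    {
        return true;
    }
}

void demonstrateNarrowing()
{
    assert(truncatingNarrow(0x100000001ULL) == 1u); // high bits are dropped silently
    assert(checkedNarrow(42) == 42u);               // small values pass through
    assert(checkedNarrowThrows(0x100000000ULL));    // 2^32 does not fit
}
```

A plain `static_cast` documents the narrowing but does not make it safe; a checked helper trades a branch for an early, explicit failure.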
Azat Khuzhin
18eb8b6d48 Fix UB (stack-use-after-scope) in extractAll()
After #37544 OptimizedRegularExpressionImpl started to be moved, but
StringSearcher is not copyable since it holds pointers that go out of
scope after the move (previously, Regexps::get() returned
std::shared_ptr<Regexp>, but it was replaced with Regexps::createRegexp(),
which returns a Regexp object).

<details>

<summary>ASan report</summary>

    ==48348==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7fff577239a9 at pc 0x00001518209b bp 0x7fff57723820 sp 0x7fff57723818
    READ of size 1 at 0x7fff577239a9 thread T0
        #0 0x1518209a in char8_t const* DB::StringSearcher<true, true>::search<char8_t>(char8_t const*, char8_t const*) const /bld/./src/Common/StringSearcher.h:730:41
        #1 0x1518dd3f in char8_t const* DB::StringSearcher<true, true>::search<char8_t>(char8_t const*, unsigned long) const /bld/./src/Common/StringSearcher.h:751:16
        #2 0x1518dd3f in OptimizedRegularExpressionImpl<false>::match(char const*, unsigned long, std::__1::vector<OptimizedRegularExpressionDetails::Match, std::__1::allocator<OptimizedRegularExpressionDetails::Match> >&, unsigned int) const /bld/./src/Common/OptimizedRegularExpression.cpp:463:54
        #3 0x1811cb42 in DB::ExtractAllImpl::get(char const*&, char const*&) /bld/./src/Functions/FunctionsStringArray.h:588:18
        #4 0x1811aa62 in DB::FunctionTokens<DB::ExtractAllImpl>::executeImpl(std::__1::vector<DB::ColumnWithTypeAndName, std::__1::allocator<DB::ColumnWithTypeAndName> > const&, std::__1::shared_ptr<DB::IDataType const> const&, unsigned long) const /bld/./src/Functions/FunctionsStringArray.h:704:30
        #5 0x14fe17b4 in DB::IFunction::executeImplDryRun(std::__1::vector<DB::ColumnWithTypeAndName, std::__1::allocator<DB::ColumnWithTypeAndName> > const&, std::__1::shared_ptr<DB::IDataType const> const&, unsigned long) const /bld/./src/Functions/IFunction.h:409:16

    Address 0x7fff577239a9 is located in stack of thread T0 at offset 201 in frame
        #0 0x1518d98f in OptimizedRegularExpressionImpl<false>::match(char const*, unsigned long, std::__1::vector<OptimizedRegularExpressionDetails::Match, std::__1::allocator<OptimizedRegularExpressionDetails::Match> >&, unsigned int) const /bld/./src/Common/OptimizedRegularExpression.cpp:439

</details>

CI: https://s3.amazonaws.com/clickhouse-test-reports/39342/c6f7698f9ad6ae22199182ebf7c3b2dac77d69d8/fuzzer_astfuzzerasan,actions//report.html
Fixes: #37544 (cc @rschu1ze)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-07-20 10:18:44 +03:00
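
The failure mode described in 18eb8b6d48 (an object that stores a raw pointer into its own storage and is then moved) can be sketched in a few lines. `NaiveMatcher` and `FixedMatcher` are hypothetical stand-ins, not the real `StringSearcher` or `OptimizedRegularExpressionImpl`, and the user-defined move constructor is just one possible remedy, not necessarily the fix applied in that commit. The sketch only compares pointers while the source object is still alive, so it stays free of undefined behaviour itself.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <utility>

// Hypothetical matcher that remembers where its own needle bytes live.
struct NaiveMatcher
{
    std::array<char, 16> needle{};
    const char * needle_ptr = nullptr; // intended to point into `needle`

    explicit NaiveMatcher(std::string_view s)
    {
        std::copy_n(s.data(), std::min(s.size(), needle.size() - 1), needle.begin());
        needle_ptr = needle.data();
    }

    // The implicit move constructor copies the bytes AND the raw pointer verbatim.
    bool pointsIntoSelf() const { return needle_ptr == needle.data(); }
};

// Same layout, but the move constructor re-points at the new object's storage.
struct FixedMatcher
{
    std::array<char, 16> needle{};
    const char * needle_ptr = nullptr;

    explicit FixedMatcher(std::string_view s)
    {
        std::copy_n(s.data(), std::min(s.size(), needle.size() - 1), needle.begin());
        needle_ptr = needle.data();
    }

    FixedMatcher(FixedMatcher && other) noexcept
        : needle(other.needle), needle_ptr(needle.data())
    {
    }

    bool pointsIntoSelf() const { return needle_ptr == needle.data(); }
};

void demonstrateMove()
{
    NaiveMatcher a("abc");
    NaiveMatcher b(std::move(a));
    assert(a.pointsIntoSelf());
    assert(!b.pointsIntoSelf()); // b still refers to a's buffer

    FixedMatcher c("abc");
    FixedMatcher d(std::move(c));
    assert(d.pointsIntoSelf());
}
```

If `a` were a local in a function that returns `b` by value, `b.needle_ptr` would refer to a dead stack frame, which is the situation a stack-use-after-scope report points at.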
Robert Schulze
01ab7b9bad Pass strings in some places as string_view
The original goal was to change

  const auto & needle = String(
        reinterpret_cast<const char *>(cur_needle_data),
        cur_needle_length);

in Functions/MatchImpl.h into a std::string_view to save an allocation +
copy. The needle is eventually passed as search pattern into the re2
library. Re2 has an alternative constructor taking a const char * i.e. a
NULL-terminated string. Here, the needle is NULL-terminated but
1. this is only because it is passed inside a ColumnString yet this is
   not always the case (e.g. fixed string columns have a dense layout w/o
   NULL terminator).
2. assuming NULL termination for users != MatchImpl of the regex code is
   too dangerous.

So, for now we'll stay with copying to be on the safe side. One fine day
when re2 has a ptr/size ctor, we can use std::string_view.

Just changing a few other places from std::string to std::string_view
but this will not help with performance.
2022-05-25 10:05:51 +02:00
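
The termination concern raised in 01ab7b9bad can be made concrete with a small sketch. The buffer `dense_values` is a hypothetical model of a dense fixed-width layout (two 3-byte values stored back to back); it is an illustration only and does not reproduce ClickHouse's column classes or any re2 API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical dense storage: "abc" and "def" with no terminator between them.
// The string literal contributes a single NUL after the last value only.
inline constexpr char dense_values[] = "abcdef";

// A view over the first 3-byte value; it knows its size but has no terminator.
inline std::string_view firstValue()
{
    return std::string_view(dense_values, 3);
}

// What an API that takes only `const char *` would consider the string length.
inline std::size_t lengthSeenThroughCString(const char * p)
{
    return std::strlen(p);
}

// Copying into std::string costs an allocation but guarantees c_str() is terminated.
inline std::string terminatedCopy(std::string_view v)
{
    return std::string(v);
}

void demonstrateTermination()
{
    const std::string_view v = firstValue();
    assert(v.size() == 3);
    assert(lengthSeenThroughCString(v.data()) == 6);                 // runs into "def"
    assert(lengthSeenThroughCString(terminatedCopy(v).c_str()) == 3); // copy is safe
}
```

In this model the over-read happens to stop at the literal's trailing NUL; with real column memory there is no such guarantee, which is why the commit keeps the copy.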
Alexey Milovidov
cbeeb7ec4f Remove Arcadia 2022-04-16 00:20:47 +02:00
Alexey Milovidov
30f1f88118 Allow case-insensitive regexps; added a test #11101 2020-06-14 03:43:42 +03:00
Alexey Milovidov
dea8d366c9 Loosen some limitations 2020-05-07 04:29:31 +03:00
Ivan Lezhankin
06446b4f08 dbms/ → src/ 2020-04-03 18:14:31 +03:00