Commit Graph

35 Commits

Author SHA1 Message Date
Robert Schulze
49934a3dc8
Cache compiled regexps when evaluating non-const needles
Needles in a (non-const) needle column may repeat and this commit allows
to skip compilation for known needles. Out of the different design
alternatives (see below, if someone is interested), we now maintain
- one global pattern cache,
- with a fixed size of 42k elements currently,
- and use LRU as eviction strategy.

------------------------------------------------------------------------

(sorry for the wall of text, dumping it here not for reading but just
for reference)

Write-up about considered design alternatives:

1. Keep the current global cache of const needles. For non-const
   needles, probe the cache but don't store values in it.
   Pros: need to maintain just a single cache, no problem with cache
         pollution assuming there are few distinct constant needles
   Cons: only useful if a non-const needle occurred as already as a
         const needle
   --> overall too simplistic

2. Keep the current global cache for const needles. For non-const
   needles, create a local (e.g. per-query) cache
   Pros: unlike (1.), non-const needles can be skipped even if they
         did not occur yet, no pollution of the const pattern cache when
         there are very many non-const needles (e.g. large / highly
         distinct needle columns).
   Cons: caches may explode "horizontally", i.e. we'll end up with the
         const cache + caches for Q1, Q2, ... QN, this makes it harder
         to control the overall space consumption, also patterns
         residing in different caches cannot be reused between queries,
         another difficulty is that the concept of "query" does not
         really exist at matching level - there are only column chunks
         and we'd potentially end up with 1 cache / chunk

3. Queries with const and non-const needles insert into the same global
   cache.
   Pros: the advantages of (2.) + allows to reuse compiled patterns
         accross parallel queries
   Cons: needs an eviction strategy to control cache size and pollution
         (and btw. (2.) also needs eviction strategies for the
         individual caches)

4. Queries with const needle use global cache, queries with non-const
   needle use a different global cache
   --> Overall similar to (3) but ignores the (likely) edge case that
       const and non-const needles overlap.

In sum, (3.) seems the simplest and most beneficial approach.

Eviction strategies:

0. Don't ever evict --> cache may grow infinitely and eventually make
   the system unusable (may even pose a DoS risk)

1. Flush the cache after a certain threshold is exceeded --> very
   simple but may lead to peridic performance drops

2. Use LRU --> more graceful performance degradation at threshold but
   comes with a (constant) performance overhead to maintain the LRU
   queue

In sum, given that the pattern compilation in RE2 should be quite costly
(pattern-to-DFA/NFA), LRU may be acceptable.
2022-05-25 22:04:06 +02:00
Robert Schulze
7232f47c68
Fix Bug 37114 - ilike on FixedString(N) columns produces wrong results
The main fix is in MatchImpl.h where the "case_insensitive" parameter is
added to Regexps::get().

Also made "case_insensitive" a non-default template parameter to reduce
the risk of future bugs.

The remainder of this commit are minor random code improvements.

resoves #37114
2022-05-11 14:30:21 +02:00
Antonio Andelic
9990abb76a Use compile-time check for Exception messages, fix wrong messages 2022-03-29 13:16:11 +00:00
Maksim Kita
538f8cbaad Fix clang-tidy warnings in Disks, Formats, Functions folders 2022-03-14 18:17:35 +00:00
taiyang-li
b9c42effb4 change as requested 2022-02-05 19:30:40 +08:00
taiyang-li
e9c435a23f fix style 2022-01-30 13:23:11 +08:00
taiyang-li
c9d5251e12 finish dev 2022-01-30 09:10:27 +08:00
taiyang-li
5228a3e421 commit again 2022-01-29 23:42:04 +08:00
Dmitry Novik
64b0365848 Fix build 2021-12-15 20:40:36 +03:00
Nickita Taranov
734bb5b026 support any serializable column 2021-10-28 23:17:45 +03:00
Nickita Taranov
877e8b579b fix style 2021-10-24 14:45:12 +03:00
Nickita Taranov
2d23e3b17d support for ColumnString 2021-10-22 18:53:02 +03:00
Nickita Taranov
a211c9ecbc support nullable for ColumnConst 2021-10-22 18:52:59 +03:00
Pavel Kruglov
70b51133c1 Try to simplify code 2021-08-09 18:01:08 +03:00
Pavel Kruglov
0662df8b76 Fix performance with JIT, add arguments to function isSuitableForShortCircuitArgumentsExecution 2021-08-09 17:54:14 +03:00
Pavel Kruglov
e792fa588f Mark all Functions as sutable or not for executing as short circuit arguments 2021-08-09 17:50:09 +03:00
Nikolay Degterinsky
491ddc859d Fixed spelling and code style, added generated ya.make files 2021-06-19 18:52:09 +00:00
Nikolay Degterinsky
4fb23c25fb Added SplitByWhitespace & SplitByNonAlpha functions (new tokenize functions) 2021-06-19 12:33:36 +00:00
Nikolai Kochetov
dbaa6ffc62 Rename ContextConstPtr to ContextPtr. 2021-06-01 15:20:52 +03:00
Alexander Kuzmenkov
3f57fc085b remove mutable context references from functions interface
Also remove it from some visitors.
2021-05-28 19:45:37 +03:00
Maksim Kita
d923d9e6ef Function move file 2021-05-17 10:30:42 +03:00
Maksim Kita
947f28d430 IFunction refactoring 2021-05-15 20:33:15 +03:00
Vladimir
454b77c654
Update SplitByRegexpImpl 2021-05-13 13:27:29 +03:00
abel-wang
51c1d7c7ba split into characters when split by '' & add docs 2021-05-13 11:15:38 +08:00
abel-wang
99b9fe6c33 add function splitByRegexp 2021-05-13 10:37:09 +08:00
Ivan
495c6e03aa
Replace all Context references with std::weak_ptr (#22297)
* Replace all Context references with std::weak_ptr

* Fix shared context captured by value

* Fix build

* Fix Context with named sessions

* Fix copy context

* Fix gcc build

* Merge with master and fix build

* Fix gcc-9 build
2021-04-11 02:33:54 +03:00
Ivan Lezhankin
f897f7c93f Refactor IFunction to execute with const arguments 2020-11-17 16:24:45 +03:00
Nikolai Kochetov
740fad66f3 Part 6. 2020-10-18 22:00:13 +03:00
Nikolai Kochetov
959424f28a Rename block to columns. 2020-10-14 17:04:50 +03:00
Nikolai Kochetov
966b1d6cf5 Rename Block to ColumnsWithTypeAndName. 2020-10-14 16:09:11 +03:00
Nikolai Kochetov
3a17c2a7ac Rename FunctionArguments to ColumnsWithTypeAndName 2020-10-11 22:20:20 +03:00
Nikolai Kochetov
d28325a353 Replace getByPosition to [] 2020-10-10 21:24:57 +03:00
Nikolai Kochetov
a7fb2e38a5 Use ColumnWithTypeAndName as function argument instead of Block. 2020-10-09 10:41:28 +03:00
Nikolai Kochetov
e4689ce302 Make IFunction::executeImpl const. 2020-07-21 16:58:07 +03:00
Ivan Lezhankin
06446b4f08 dbms/ → src/ 2020-04-03 18:14:31 +03:00