Add functions to count the number of substring occurrences in a string:
- if the needle is multi-character, only non-overlapping occurrences are counted
- the code is based on the position helpers.
The following new functions are available:
- countSubstrings()
- countSubstringsCaseInsensitive()
- countSubstringsCaseInsensitiveUTF8()
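The non-overlapping counting rule can be illustrated with a minimal Python sketch. This is not the ClickHouse implementation (which is built on the position helpers); the names `count_substrings` and `count_substrings_case_insensitive` are hypothetical, and returning 0 for an empty needle is an assumption of the sketch only.

```python
def count_substrings(haystack: str, needle: str) -> int:
    """Count non-overlapping occurrences of needle in haystack."""
    if not needle:
        # Assumption of this sketch: an empty needle counts as zero matches.
        return 0
    count = 0
    pos = haystack.find(needle)
    while pos != -1:
        count += 1
        # Skip the whole match, so that occurrences never intersect.
        pos = haystack.find(needle, pos + len(needle))
    return count


def count_substrings_case_insensitive(haystack: str, needle: str) -> int:
    """Same rule, comparing lower-cased copies of both strings."""
    return count_substrings(haystack.lower(), needle.lower())
```

With this rule, the needle `aa` occurs twice in `aaaa` (not three times), because the search resumes after the end of each match.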
v0: substringCount()
v2:
- add substringCountCaseInsensitiveUTF8
- improve tests
- fix coding style issues
- fix multichar needle
v3: rename to countSubstrings (by analogy with countEqual())
Consider the following example:
CREATE TABLE test(p DateTime, k int) ENGINE MergeTree PARTITION BY toDate(p) ORDER BY k;
INSERT INTO test VALUES ('2020-09-01 00:01:02', 1), ('2020-09-01 20:01:03', 2), ('2020-09-02 00:01:03', 3);
- SELECT count() FROM test WHERE toDate(p) >= '2020-09-01' AND p <= '2020-09-01 00:00:00'
In this case rpn will be (FUNCTION_IN_RANGE, FUNCTION_UNKNOWN (due to strict), FUNCTION_AND)
and for optimize_trivial_count_query we cannot use the index if there is at least one FUNCTION_UNKNOWN,
since there is no post-processing, and returning count() based only on the first predicate is wrong.
Before this patch FUNCTION_UNKNOWN was allowed for optimize_trivial_count_query, and the result was wrong.
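A minimal Python sketch of the reasoning follows, assuming a simplified model of the key condition. The element names come from the rpn above, but `may_match` and `can_use_trivial_count` are hypothetical helpers, not ClickHouse functions.

```python
FUNCTION_IN_RANGE = "FUNCTION_IN_RANGE"
FUNCTION_UNKNOWN = "FUNCTION_UNKNOWN"
FUNCTION_AND = "FUNCTION_AND"


def may_match(rpn, range_results):
    """Evaluate the rpn for one partition.

    range_results holds one boolean per FUNCTION_IN_RANGE element, in order.
    FUNCTION_UNKNOWN cannot exclude anything, so it evaluates to True.
    """
    stack = []
    results = iter(range_results)
    for element in rpn:
        if element == FUNCTION_IN_RANGE:
            stack.append(next(results))
        elif element == FUNCTION_UNKNOWN:
            stack.append(True)
        elif element == FUNCTION_AND:
            right = stack.pop()
            left = stack.pop()
            stack.append(left and right)
        else:
            raise ValueError(f"unexpected rpn element: {element}")
    return stack.pop()


def can_use_trivial_count(rpn):
    """The index alone gives an exact count only if nothing is unknown."""
    return FUNCTION_UNKNOWN not in rpn
```

In this model the partition survives pruning for the count() query even though the second predicate was never evaluated, so counting the rows of the surviving partitions would ignore `p <= '2020-09-01 00:00:00'` entirely.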
The two examples below only show the difference; their behaviour has not been changed by this patch:
- SELECT * FROM test WHERE toDate(p) >= '2020-09-01' AND p <= '2020-09-01 00:00:00'
In this case rpn will be (FUNCTION_IN_RANGE, FUNCTION_IN_RANGE (due to non-strict), FUNCTION_AND),
so it will prune everything out and nothing will be read.
- SELECT * FROM test WHERE toDate(p) >= '2020-09-01' AND toUnixTimestamp(p)%5==0
In this case rpn will be (FUNCTION_IN_RANGE, FUNCTION_UNKNOWN, FUNCTION_AND)
and both partitions will be scanned, but due to the later filtering no rows will be matched.
Before this patch the following query ignored the settings for INSERT:
insert into test_parallel_insert select * from numbers_mt(65535*2) settings max_insert_threads=10
The reason is that SETTINGS was parsed by the SELECT parser.
Fix this by pushing the SETTINGS down from the SELECT to the INSERT.
Also note that since the INSERT parser does not use ParserQueryWithOutput, the
following works:
insert into test_parallel_insert select * from numbers_mt(65535*2) format Null settings max_insert_threads=10
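
The idea of the fix can be sketched in Python as a toy model, not the actual parser code. The classes `SelectQuery` and `InsertQuery` and the helper `push_down_settings` are hypothetical; copying the settings while letting the INSERT's own values win is an assumption of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class SelectQuery:
    text: str
    # SETTINGS that the SELECT parser consumed.
    settings: dict = field(default_factory=dict)


@dataclass
class InsertQuery:
    table: str
    select: SelectQuery
    settings: dict = field(default_factory=dict)


def push_down_settings(insert: InsertQuery) -> InsertQuery:
    """Make SETTINGS parsed as part of the SELECT visible to the INSERT."""
    for name, value in insert.select.settings.items():
        # Assumption: a setting already present on the INSERT is kept as-is.
        insert.settings.setdefault(name, value)
    return insert
```

In this model, `max_insert_threads=10` attached to the SELECT ends up in the INSERT's settings as well, which is what the INSERT needs in order to honour it.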