The following headers are pretty generic, so use forward declaration as
much as possible:
- Context.h
- Settings.h
- ConnectionTimeouts.h
(This also revealed some missing includes, which have been fixed.)
Also split part of ConnectionTimeouts.h out into ConnectionTimeoutsContext.h
(a module part cannot be added for it, since that would introduce
recursive build dependencies).
Also remove Settings from RemoteBlockInputStream/RemoteQueryExecutor
and just pass the Context, since settings were passed separately only in
specific places, where a copy of the Context can be made instead (i.e. Copier).
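A minimal single-file sketch of the idea, not the actual ClickHouse code: the "header" part only forward-declares Context and Settings, so editing their definitions would not force units that include only that header to be recompiled. The RemoteExecutor class, the max_threads field and its value are hypothetical stand-ins chosen for illustration.

```cpp
#include <cstdint>

// --- would live in a lightweight header (hypothetical RemoteExecutor.h) ---
// Forward declarations are enough here: the class only stores a reference
// and never looks inside Context or Settings.
struct Settings;
class Context;

class RemoteExecutor
{
public:
    // Takes only the Context; no separate Settings argument.
    explicit RemoteExecutor(const Context & context_);
    std::uint64_t maxThreads() const;

private:
    const Context & context;
};

// --- would live in the .cpp, which includes the heavy headers ---
// Stand-in definitions (assumptions, not the real ClickHouse classes).
struct Settings
{
    std::uint64_t max_threads = 8;
};

class Context
{
public:
    const Settings & getSettingsRef() const { return settings; }

private:
    Settings settings;
};

RemoteExecutor::RemoteExecutor(const Context & context_) : context(context_) {}

std::uint64_t RemoteExecutor::maxThreads() const
{
    // Settings are reached through the Context only where they are needed.
    return context.getSettingsRef().max_threads;
}
```

Only the second half needs the full definitions, which is what keeps the number of dependent translation units down.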
Approx results (how many units will be recompiled after changing file X?):
- ConnectionTimeouts.h:
  - mainline: 100
- Context.h:
  - mainline: ~800
  - patched: 415
- Settings.h:
  - mainline: 900-1K
  - patched: 440 (most of them because of Context.h)
v2: Add a note that top_level_domains_lists are not applied w/o restart
v3: Remove ExtractFirstSignificantSubdomain{Default,Custom}Lookup.h headers
v4: TLDListsHolder: remove FIXME for dense_hash_map (this is not significant)
- Update after IFunction interfaces changes
- move type checks into FunctionCountMatches::getReturnTypeImpl()
- Use StringRef over String
- Separate out logic for counting sub matches into separate helper
- Do not copy other regular expression matches, only the first
- Add some comments
- Set is_no_capture, to avoid check for number of subpatterns
- Add countMatchesCaseInsensitive()
- Register functions in a case-sensitive manner, since they are not part of
the SQL standard
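A rough sketch of what counting matches looks like, assuming std::regex in place of the regular expression engine ClickHouse actually uses. Only the whole match (group 0) is inspected, in the spirit of the "only the first" and no-capture notes above. Skipping empty matches is an assumption of this sketch, and countMatchesImpl is a hypothetical helper name.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: counts non-overlapping, non-empty matches of `pattern`.
std::size_t countMatchesImpl(const std::string & haystack, const std::string & pattern, bool case_insensitive)
{
    std::regex::flag_type flags = std::regex::ECMAScript;
    if (case_insensitive)
        flags |= std::regex::icase;

    const std::regex re(pattern, flags);

    std::size_t count = 0;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(haystack.begin(), haystack.end(), re); it != end; ++it)
    {
        // Look only at the whole match; sub-matches are never copied.
        // Assumption: empty matches do not count.
        if (it->length(0) > 0)
            ++count;
    }
    return count;
}

std::size_t countMatches(const std::string & haystack, const std::string & pattern)
{
    return countMatchesImpl(haystack, pattern, false);
}

std::size_t countMatchesCaseInsensitive(const std::string & haystack, const std::string & pattern)
{
    return countMatchesImpl(haystack, pattern, true);
}
```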
Was:
Code: 44. DB::Exception: Received from localhost:9000. DB::Exception: Illegal column UInt16 of first argument of function toUnixTimestamp: While processing toUnixTimestamp(today()).
Now:
Code: 44. DB::Exception: Received from localhost:9000. DB::Exception: Illegal type Date of first argument of function toUnixTimestamp: While processing toUnixTimestamp(today()).
Add a function to count the number of substring occurrences in a string:
- if the needle is longer than one character, non-overlapping occurrences are counted
- the code is based on the position helpers.
The following new functions are available:
- countSubstrings()
- countSubstringsCaseInsensitive()
- countSubstringsCaseInsensitiveUTF8()
v0: substringCount()
v2:
- add substringCountCaseInsensitiveUTF8
- improve tests
- fix coding style issues
- fix multichar needle
v3: rename to countSubstrings (by analogy with countEqual())
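The non-overlapping counting rule can be sketched as follows. This is an illustration built on std::string_view::find, not the position-helper based code from the patch. Returning 0 for an empty needle is an assumption, and the case-insensitive variant here handles ASCII only (the UTF8 variant is not covered).

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Counts non-overlapping occurrences of `needle` in `haystack`.
std::size_t countSubstrings(std::string_view haystack, std::string_view needle)
{
    if (needle.empty())
        return 0; // assumption made for this sketch

    std::size_t count = 0;
    std::size_t pos = haystack.find(needle);
    while (pos != std::string_view::npos)
    {
        ++count;
        // Continue after the end of this occurrence, so matches never overlap.
        pos = haystack.find(needle, pos + needle.size());
    }
    return count;
}

// ASCII-only case-insensitive variant: lowercase both sides, then count.
std::size_t countSubstringsCaseInsensitive(std::string_view haystack, std::string_view needle)
{
    auto lower = [](std::string_view s)
    {
        std::string out(s);
        std::transform(out.begin(), out.end(), out.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        return out;
    };
    return countSubstrings(lower(haystack), lower(needle));
}
```

With a multi-character needle, "aaaa" contains "aa" twice (not three times), because the search resumes after each occurrence.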
Making it implicitly cast to Date() does not look correct, since previously
it returned a somewhat unexpected result:
SELECT toUnixTimestamp(today())
┌─toUnixTimestamp(today())─┐
│                    18591 │
└──────────────────────────┘
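For context, 18591 is the raw day number stored in a Date (days since 1970-01-01), not a count of seconds. A small sketch of the arithmetic a caller probably expected, assuming midnight UTC and ignoring the server time zone; dayNumberToUnixTimestampUTC is a hypothetical name, not a ClickHouse function.

```cpp
#include <cstdint>

constexpr std::int64_t seconds_per_day = 86400;

// Hypothetical helper: converts a Date day number to seconds since the epoch (UTC midnight).
constexpr std::int64_t dayNumberToUnixTimestampUTC(std::uint16_t day_number)
{
    return static_cast<std::int64_t>(day_number) * seconds_per_day;
}

// 18591 days after 1970-01-01 is 2020-11-25, i.e. 1606262400 seconds.
static_assert(dayNumberToUnixTimestampUTC(18591) == 1606262400);
```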
Sometimes it is odd to get the TLD itself from
cutToFirstSignificantSubdomain() (since you will not get the TLD itself if
you pass it directly):
- cutToFirstSignificantSubdomain('org') -> ""
- cutToFirstSignificantSubdomain('www.org') -> org
- cutToFirstSignificantSubdomain('kernel.org') -> kernel.org
- cutToFirstSignificantSubdomain('www.kernel.org') -> kernel.org
So add one more function to get www.org in this case:
- cutToFirstSignificantSubdomainWithWWW('org') -> ""
- cutToFirstSignificantSubdomainWithWWW('www.org') -> www.org
- cutToFirstSignificantSubdomainWithWWW('kernel.org') -> kernel.org
- cutToFirstSignificantSubdomainWithWWW('www.kernel.org') -> kernel.org
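The difference between the two functions can be modelled with a toy that reproduces only the four examples listed above. It is not the ClickHouse implementation: it assumes the TLD is always the single last label (so multi-part TLDs are not handled), and the function name and keep_www flag are hypothetical.

```cpp
#include <string>
#include <string_view>

// Toy model: keep_www = false mimics cutToFirstSignificantSubdomain(),
// keep_www = true mimics cutToFirstSignificantSubdomainWithWWW().
std::string cutToFirstSignificantSubdomainToy(std::string_view host, bool keep_www)
{
    const auto last_dot = host.rfind('.');
    if (last_dot == std::string_view::npos || last_dot == 0)
        return ""; // a bare TLD such as "org" yields an empty result

    const auto prev_dot = host.rfind('.', last_dot - 1);
    if (prev_dot != std::string_view::npos)
        return std::string(host.substr(prev_dot + 1)); // keep the last two labels

    // Exactly two labels: the only case where the two functions differ.
    if (!keep_www && host.substr(0, last_dot) == "www")
        return std::string(host.substr(last_dot + 1)); // "www.org" -> "org"

    return std::string(host); // "kernel.org", or "www.org" when keep_www is set
}
```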
P.S. not sure about the naming though, so it would be great if someone has
a suggestion for the name.