The public suffix list may contain special characters (the format is
described in [1]):
- asterisk (*)
- exclamation mark (!)
[1]: https://github.com/publicsuffix/list/wiki/Format
It is easier to describe how they should be interpreted with an example.
Consider the following part of the list:
*.sch.uk
*.kawasaki.jp
!city.kawasaki.jp
And here are the results for `cutToFirstSignificantSubdomainCustom()`:
With only the asterisk (*) rule:
foo.something.sheffield.sch.uk -> something.sheffield.sch.uk
sheffield.sch.uk -> sheffield.sch.uk
With the exclamation mark (!) rule as well:
foo.kawasaki.jp -> foo.kawasaki.jp
foo.foo.kawasaki.jp -> foo.foo.kawasaki.jp
city.kawasaki.jp -> city.kawasaki.jp
some.city.kawasaki.jp -> city.kawasaki.jp
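The interpretation above can be sketched in Python. This is a minimal illustration, not the ClickHouse implementation: it assumes the simplified PSL matching algorithm (the longest matching rule wins, an exception rule "!X" makes the parent of X the public suffix, and an implicit "*" rule applies when nothing matches). The names `SuffixRules` and `first_significant_subdomain` are hypothetical.

```python
class SuffixRules:
    """Hypothetical in-memory form of a public suffix list (PSL)."""

    def __init__(self, lines):
        self.plain = set()      # "co.uk"             -> "co.uk"
        self.wildcard = set()   # "*.sch.uk"          -> "sch.uk"
        self.exception = set()  # "!city.kawasaki.jp" -> "city.kawasaki.jp"
        for line in lines:
            if line.startswith("*."):
                self.wildcard.add(line[2:])
            elif line.startswith("!"):
                self.exception.add(line[1:])
            else:
                self.plain.add(line)


def first_significant_subdomain(host, rules):
    """Return the public suffix plus one label, or the host itself
    if the host is already a public suffix (hypothetical helper)."""
    labels = host.split(".")
    n = len(labels)
    suffix_start = n - 1  # implicit default rule "*": the last label is a public suffix
    # Scan from the longest suffix to the shortest, so the first match is the longest rule.
    for i in range(n):
        suffix = ".".join(labels[i:])
        if suffix in rules.exception:
            # "!city.kawasaki.jp": the exception itself is not a public suffix, its parent is.
            suffix_start = i + 1
            break
        parent = ".".join(labels[i + 1:])
        if suffix in rules.plain or (i + 1 < n and parent in rules.wildcard):
            suffix_start = i
            break
    if suffix_start == 0:
        return host  # the host is itself a public suffix, return it unchanged
    return ".".join(labels[suffix_start - 1:])


rules = SuffixRules(["*.sch.uk", "*.kawasaki.jp", "!city.kawasaki.jp"])
for host in ["foo.something.sheffield.sch.uk", "sheffield.sch.uk",
             "foo.kawasaki.jp", "foo.foo.kawasaki.jp",
             "city.kawasaki.jp", "some.city.kawasaki.jp"]:
    print(host, "->", first_significant_subdomain(host, rules))
```

Checking the exception rule before the wildcard at each suffix length is what makes "city.kawasaki.jp" resolve to itself instead of being treated as a public suffix under "*.kawasaki.jp".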
TLDs have been verified with the following script [2] to match the
Python publicsuffix2 module.
[2]: https://gist.github.com/azat/c1a7a9f1e3519793134ef4b1df5461a6
v2: Fix StringHashTable padding requirements
Fixes: #39468
Follow-up for: #17748
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
v2: Add a note that top_level_domains_lists are not applied without a restart
v3: Remove ExtractFirstSignificantSubdomain{Default,Custom}Lookup.h headers
v4: TLDListsHolder: remove FIXME for dense_hash_map (this is not significant)