Commit Graph

9 Commits

Author SHA1 Message Date
Azat Khuzhin
1d4a7c7290 Add support of !/* (exclamation/asterisk) in custom TLDs
Public suffix list may contain special characters (you may find format
here - [1]):
- asterisk (*)
- exclamation mark (!)

  [1]: https://github.com/publicsuffix/list/wiki/Format

It is easier to describe how it should be interpreted with an examples.

Consider the following part of the list:

    *.sch.uk
    *.kawasaki.jp
    !city.kawasaki.jp

And here are the results for `cutToFirstSignificantSubdomainCustom()`:

If you have only asterisk (*):

    foo.something.sheffield.sch.uk -> something.sheffield.sch.uk
    sheffield.sch.uk               -> sheffield.sch.uk

If you have exclamation mark (!) too:

    foo.kawasaki.jp                -> foo.kawasaki.jp
    foo.foo.kawasaki.jp            -> foo.foo.kawasaki.jp
    city.kawasaki.jp               -> city.kawasaki.jp
    some.city.kawasaki.jp          -> city.kawasaki.jp

TLDs had been verified wit the following script [2], to match with
python publicsuffix2 module.

  [2]: https://gist.github.com/azat/c1a7a9f1e3519793134ef4b1df5461a6

v2: fix StringHashTable padding requirements
Fixes: #39468
Follow-up for: #17748
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-07-26 08:34:30 +03:00
Robert Schulze
deda29b46b
Pass const StringRef by value, not by reference
See #39224
2022-07-15 11:34:56 +00:00
Amos Bird
4a5e4274f0
base should not depend on Common 2022-04-29 10:26:35 +08:00
Alexey Milovidov
fe6b7c77c7 Rename "common" to "base" 2021-10-02 10:13:14 +03:00
Azat Khuzhin
91621cfcd2 Fix reading of custom TLD w/o new line at EOF
Fixes: #28177
2021-08-27 00:43:21 +03:00
Azat Khuzhin
bad1f91f28 Fix reading of custom TLDs (stops processing with lower buffer or bigger file) 2021-07-29 10:36:32 +03:00
Azat Khuzhin
840a21d073 Add top_level_domains_path for easier overriding 2020-12-09 21:08:31 +03:00
Azat Khuzhin
c987be632f Switch TLDList to StringHashSet (to avoid errors on collisions) 2020-12-09 21:08:30 +03:00
Azat Khuzhin
916cbd6610 Add ability to use custom TLD list
v2: Add a note that top_level_domains_lists aren not applied w/o restart
v3: Remove ExtractFirstSignificantSubdomain{Default,Custom}Lookup.h headers
v4: TLDListsHolder: remove FIXME for dense_hash_map (this is not significant)
2020-12-09 21:08:22 +03:00