Commit Graph

7 Commits

Author SHA1 Message Date
vdimir
14d0f6457b
Add tests and doc for some url-related functions 2022-10-26 10:52:57 +00:00
Azat Khuzhin
1d4a7c7290 Add support of !/* (exclamation/asterisk) in custom TLDs
The public suffix list may contain special characters (the format is
described in [1]):
- asterisk (*)
- exclamation mark (!)

  [1]: https://github.com/publicsuffix/list/wiki/Format

It is easier to describe how they should be interpreted with examples.

Consider the following part of the list:

    *.sch.uk
    *.kawasaki.jp
    !city.kawasaki.jp

And here are the results for `cutToFirstSignificantSubdomainCustom()`:

If you have only asterisk (*):

    foo.something.sheffield.sch.uk -> something.sheffield.sch.uk
    sheffield.sch.uk               -> sheffield.sch.uk

If you have exclamation mark (!) too:

    foo.kawasaki.jp                -> foo.kawasaki.jp
    foo.foo.kawasaki.jp            -> foo.foo.kawasaki.jp
    city.kawasaki.jp               -> city.kawasaki.jp
    some.city.kawasaki.jp          -> city.kawasaki.jp
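
The matching rules illustrated above can be sketched in Python. This is an
illustration only, not the ClickHouse implementation: the function name
`cut_to_first_significant_subdomain` and the `RULES` list are hypothetical,
and the fallback for hosts that match no rule (treat the last label as the
suffix) is an assumption borrowed from the public suffix list algorithm, not
something stated in this patch.

```python
# Rules taken from the example list in this commit message.
RULES = ["*.sch.uk", "*.kawasaki.jp", "!city.kawasaki.jp"]


def cut_to_first_significant_subdomain(host, rules):
    """Return the public suffix plus one extra label (hypothetical helper)."""
    labels = host.lower().rstrip(".").split(".")

    # Split the rules into the three kinds the list format allows.
    exact = {r for r in rules if not r.startswith(("*.", "!"))}
    wildcard = {r[2:] for r in rules if r.startswith("*.")}
    exception = {r[1:] for r in rules if r.startswith("!")}

    # All trailing parts of the host, longest first.
    candidates = [".".join(labels[i:]) for i in range(len(labels))]

    # Assumption: with no matching rule, the last label is the suffix.
    suffix_len = 1

    for i, cand in enumerate(candidates):
        if cand in exception:
            # An exception rule wins over everything else; the suffix is
            # the rule without its leftmost label.
            suffix_len = len(labels) - i - 1
            break
    else:
        for i, cand in enumerate(candidates):
            parent = candidates[i + 1] if i + 1 < len(candidates) else ""
            # "*.x" matches any single label placed in front of "x".
            if cand in exact or (parent and parent in wildcard):
                suffix_len = len(labels) - i
                break

    # Keep the suffix plus one label; a host that is itself a suffix is
    # returned unchanged, as in the examples above.
    keep = min(len(labels), suffix_len + 1)
    return ".".join(labels[-keep:])
```

Calling `cut_to_first_significant_subdomain("some.city.kawasaki.jp", RULES)`
returns `city.kawasaki.jp`, matching the last example above.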

TLDs have been verified with the following script [2] to match the
Python publicsuffix2 module.

  [2]: https://gist.github.com/azat/c1a7a9f1e3519793134ef4b1df5461a6

v2: fix StringHashTable padding requirements
Fixes: #39468
Follow-up for: #17748
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-07-26 08:34:30 +03:00
Azat Khuzhin
196b517e79 tests: add echo for 01601_custom_tld
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-07-26 08:34:30 +03:00
Azat Khuzhin
91621cfcd2 Fix reading of custom TLD w/o new line at EOF
Fixes: #28177
2021-08-27 00:43:21 +03:00
Azat Khuzhin
42a8445462 Fix constness of custom TLDs
Before this patch the functions below returned an incorrect type for
constants, and hence optimize_skip_unused_shards did not work:

- cutToFirstSignificantSubdomainCustom()
- cutToFirstSignificantSubdomainCustomWithWWW()
- firstSignificantSubdomainCustom()
2021-07-07 01:27:31 +03:00
Azat Khuzhin
b68517f69e Fix cutToFirstSignificantSubdomainCustom()/firstSignificantSubdomainCustom() for 3+level domains
Custom TLD lists (added in #17748) may contain third-level domains.
The built-in TLD lists do not have such records, so they are not
affected.

Note that this will significantly increase the number of hashtable lookups.

Fixes: #17748
2021-03-26 00:00:16 +03:00
Azat Khuzhin
8875767b87 Add a test for custom TLD 2020-12-09 21:08:30 +03:00