mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-12-16 03:12:43 +00:00
1d4a7c7290
Public suffix list may contain special characters (the format is described in [1]):
- asterisk (*)
- exclamation mark (!)

[1]: https://github.com/publicsuffix/list/wiki/Format

It is easier to describe how they should be interpreted with examples.

Consider the following part of the list:

    *.sch.uk
    *.kawasaki.jp
    !city.kawasaki.jp

And here are the results for `cutToFirstSignificantSubdomainCustom()`:

If you have only an asterisk (*):

    foo.something.sheffield.sch.uk -> something.sheffield.sch.uk
    sheffield.sch.uk               -> sheffield.sch.uk

If you have an exclamation mark (!) too:

    foo.kawasaki.jp                -> foo.kawasaki.jp
    foo.foo.kawasaki.jp            -> foo.foo.kawasaki.jp
    city.kawasaki.jp               -> city.kawasaki.jp
    some.city.kawasaki.jp          -> city.kawasaki.jp

TLDs have been verified with the following script [2] to match the Python publicsuffix2 module.

[2]: https://gist.github.com/azat/c1a7a9f1e3519793134ef4b1df5461a6

v2: fix StringHashTable padding requirements

Fixes: #39468
Follow-up for: #17748
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

 | ||
---|---|---|
.. | ||
basename.cpp | ||
CMakeLists.txt | ||
config_functions_url.h.in | ||
cutFragment.cpp | ||
cutQueryString.cpp | ||
cutQueryStringAndFragment.cpp | ||
cutToFirstSignificantSubdomain.cpp | ||
cutToFirstSignificantSubdomainCustom.cpp | ||
cutURLParameter.cpp | ||
cutWWW.cpp | ||
decodeURLComponent.cpp | ||
domain.cpp | ||
domain.h | ||
domainWithoutWWW.cpp | ||
ExtractFirstSignificantSubdomain.h | ||
extractURLParameter.cpp | ||
extractURLParameterNames.cpp | ||
extractURLParameters.cpp | ||
firstSignificantSubdomain.cpp | ||
firstSignificantSubdomainCustom.cpp | ||
FirstSignificantSubdomainCustomImpl.h | ||
fragment.cpp | ||
fragment.h | ||
FunctionsURL.h | ||
netloc.cpp | ||
path.cpp | ||
path.h | ||
pathFull.cpp | ||
port.cpp | ||
protocol.cpp | ||
protocol.h | ||
queryString.cpp | ||
queryString.h | ||
queryStringAndFragment.cpp | ||
queryStringAndFragment.h | ||
registerFunctionsURL.cpp | ||
tldLookup.generated.cpp | ||
tldLookup.gperf | ||
tldLookup.h | ||
tldLookup.sh | ||
topLevelDomain.cpp | ||
URLHierarchy.cpp | ||
URLPathHierarchy.cpp |
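
The wildcard (`*`) and exception (`!`) rules described in the commit message above can be illustrated with a small standalone sketch. This is not the ClickHouse implementation: `SuffixRules`, `splitLabels`, `joinFrom`, `cutToRegistrable` and `exampleRules` are hypothetical names introduced here, and the rule set contains only the three entries quoted in the commit message. The sketch also assumes that the longest matching rule wins and that a host matching no rule falls back to treating its last label as the suffix, as the Public Suffix List format describes; whether ClickHouse handles unmatched hosts the same way is not stated in the source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical container for parsed rules (not a ClickHouse type).
// "*.sch.uk" is stored in `wildcard` as "sch.uk";
// "!city.kawasaki.jp" is stored in `exception` as "city.kawasaki.jp".
struct SuffixRules
{
    std::unordered_set<std::string> exact;
    std::unordered_set<std::string> wildcard;
    std::unordered_set<std::string> exception;
};

// Split "a.b.c" into {"a", "b", "c"}. Always returns at least one label.
std::vector<std::string> splitLabels(const std::string & host)
{
    std::vector<std::string> labels;
    std::size_t start = 0;
    while (true)
    {
        const std::size_t dot = host.find('.', start);
        if (dot == std::string::npos)
        {
            labels.push_back(host.substr(start));
            return labels;
        }
        labels.push_back(host.substr(start, dot - start));
        start = dot + 1;
    }
}

// Join labels[from..end) back together with dots.
std::string joinFrom(const std::vector<std::string> & labels, std::size_t from)
{
    std::string out;
    for (std::size_t i = from; i < labels.size(); ++i)
    {
        if (i > from)
            out += '.';
        out += labels[i];
    }
    return out;
}

// Return the public suffix plus one label to its left, or the whole host
// when the host itself is a public suffix.
std::string cutToRegistrable(const std::string & host, const SuffixRules & rules)
{
    const std::vector<std::string> labels = splitLabels(host);
    const std::size_t n = labels.size();
    assert(n > 0);

    // Default rule "*": the last label alone is the suffix.
    std::size_t suffix_start = n - 1;

    // Try candidates from longest to shortest, so the first hit is the longest rule.
    for (std::size_t i = 0; i < n; ++i)
    {
        const std::string candidate = joinFrom(labels, i);
        if (rules.exception.count(candidate))
        {
            // Exception rule: the suffix is the rule without its leftmost label.
            suffix_start = i + 1;
            break;
        }
        if (rules.exact.count(candidate)
            || (i + 1 < n && rules.wildcard.count(joinFrom(labels, i + 1))))
        {
            // Exact rule, or "*.<parent>" where <parent> is the candidate minus one label.
            suffix_start = i;
            break;
        }
    }

    const std::size_t from = suffix_start > 0 ? suffix_start - 1 : 0;
    return joinFrom(labels, from);
}

// The three rules quoted in the commit message.
SuffixRules exampleRules()
{
    SuffixRules rules;
    rules.wildcard = {"sch.uk", "kawasaki.jp"};
    rules.exception = {"city.kawasaki.jp"};
    return rules;
}
```

Calling `cutToRegistrable("some.city.kawasaki.jp", exampleRules())` yields `city.kawasaki.jp`, because the exception rule takes the suffix back to `kawasaki.jp`, while `cutToRegistrable("foo.kawasaki.jp", exampleRules())` returns the host unchanged, because the wildcard makes the whole host a public suffix.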