The public suffix list may contain special characters (the format is
described in [1]):
- asterisk (*)
- exclamation mark (!)
[1]: https://github.com/publicsuffix/list/wiki/Format
It is easier to describe how they should be interpreted with an example.
Consider the following part of the list:
*.sch.uk
*.kawasaki.jp
!city.kawasaki.jp
And here are the results for `cutToFirstSignificantSubdomainCustom()`:
If you have only asterisk (*):
foo.something.sheffield.sch.uk -> something.sheffield.sch.uk
sheffield.sch.uk -> sheffield.sch.uk
If you have exclamation mark (!) too:
foo.kawasaki.jp -> foo.kawasaki.jp
foo.foo.kawasaki.jp -> foo.foo.kawasaki.jp
city.kawasaki.jp -> city.kawasaki.jp
some.city.kawasaki.jp -> city.kawasaki.jp
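The mapping above can be reproduced with a minimal Python sketch of the
wildcard/exception matching. This is not the ClickHouse implementation:
the function name `cut_custom` and the in-memory `RULES` list are
hypothetical, and the sketch assumes that a host which is itself a public
suffix is returned unchanged, as in the examples above.

```python
# Rules taken from the list fragment above.
RULES = ["*.sch.uk", "*.kawasaki.jp", "!city.kawasaki.jp"]


def cut_custom(host, rules):
    """Return the public suffix plus one label (hypothetical helper)."""
    labels = host.lower().strip(".").split(".")
    exceptions = {r[1:] for r in rules if r.startswith("!")}
    wildcards = {r[2:] for r in rules if r.startswith("*.")}
    plain = {r for r in rules if not r.startswith(("!", "*."))}

    # Default rule "*": the last label alone is the public suffix.
    suffix_len = 1
    for i in range(len(labels)):
        # An exception rule wins over everything else; the public suffix
        # is the exception rule without its leftmost label.
        if ".".join(labels[i:]) in exceptions:
            suffix_len = len(labels) - i - 1
            break
    else:
        # Otherwise take the longest matching plain or wildcard rule.
        for i in range(len(labels)):
            candidate = ".".join(labels[i:])
            parent = ".".join(labels[i + 1:])
            if candidate in plain or parent in wildcards:
                suffix_len = len(labels) - i
                break

    if len(labels) <= suffix_len:
        # The host is itself a public suffix: return it as-is.
        return host
    return ".".join(labels[-(suffix_len + 1):])


if __name__ == "__main__":
    for name in ("foo.something.sheffield.sch.uk", "some.city.kawasaki.jp"):
        print(name, "->", cut_custom(name, RULES))
```

Checking exceptions before wildcards keeps `!city.kawasaki.jp` from being
shadowed by `*.kawasaki.jp`, which is what makes the last two examples
differ from the first two.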
TLDs have been verified with the following script [2] to match the
Python publicsuffix2 module.
[2]: https://gist.github.com/azat/c1a7a9f1e3519793134ef4b1df5461a6
v2: fix StringHashTable padding requirements
Fixes: #39468
Follow-up for: #17748
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
- fix the link to 22.7 log in table of contents
- To allow reuse of the changelog in the docs I need to change `<br>` to `<br/>`. With this change we can update the doc build process to:
```
cp $PARENT_DIR/CHANGELOG.md $PARENT_DIR/clickhouse-docs/docs/en/whats-new/changelog/index.md
```
getauxval() from glibc-compatibility did not always work correctly:
- It does not work after setenv(), and this breaks vsyscalls,
like sched_getcpu() [1] (and BaseDaemon.cpp always sets TZ if timezone
is defined, which is true for CI [2]).
Also note that fixing setenv() will not fix LSan,
since the culprit is getauxval().
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1163404
[2]: ClickHouse#32928 (comment)
- Another thing that is definitely broken is LSan (Leak Sanitizer): it
relies on a working getauxval(), but getauxval() does not work if __environ
is not initialized yet (there is even a commit about this).
And because of this, at least one leak had been introduced [3]:
[3]: ClickHouse#33840
Fix this by using /proc/self/auxv, with a fallback to the environ solution
to stay compatible with environments that do not allow reading from
auxv (or have no procfs).
v2: add fallback to environ solution
v3: fix return value for __auxv_init_procfs()
(cherry picked from commit f187c3499a)
v4: more verbose message on errors, CI found [1]:
AUXV already has value (529267711)
[1]: https://s3.amazonaws.com/clickhouse-test-reports/39103/2325f7e8442d1672ce5fb43b11039b6a8937e298/stress_test__memory__actions_.html
v5: break at AT_NULL
v6: ignore AT_IGNORE
v7: suppress TSan and remove superior check to avoid abort() in case of race
v8: proper suppressions (not inner function but itself)
Refs: #33957
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>