* save format string for NetException
* format exceptions
* format exceptions 2
* format exceptions 3
* format exceptions 4
* format exceptions 5
* format exceptions 6
* fix
* format exceptions 7
* format exceptions 8
* Update MergeTreeIndexGin.cpp
* Update AggregateFunctionMap.cpp
* Update AggregateFunctionMap.cpp
* fix
- lots of static_cast
- add safe_cast
- types adjustments
- config
- IStorage::read/watch
- ...
- some TODO's (to convert types in future)
P.S. That was quite a journey...
v2: fixes after rebase
v3: fix conflicts after #42308 merged
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
When UserInfo part and '@' appear in the URL, the host after @ should
be returned. For example, when url is "https://user:pass@clickhouse.com/",
start_of_host should be char 'c' after '@', end_of_host should be '/'
other than ':'.
Public suffix list may contain special characters (you may find format
here - [1]):
- asterisk (*)
- exclamation mark (!)
[1]: https://github.com/publicsuffix/list/wiki/Format
It is easier to describe how it should be interpreted with an examples.
Consider the following part of the list:
*.sch.uk
*.kawasaki.jp
!city.kawasaki.jp
And here are the results for `cutToFirstSignificantSubdomainCustom()`:
If you have only asterisk (*):
foo.something.sheffield.sch.uk -> something.sheffield.sch.uk
sheffield.sch.uk -> sheffield.sch.uk
If you have exclamation mark (!) too:
foo.kawasaki.jp -> foo.kawasaki.jp
foo.foo.kawasaki.jp -> foo.foo.kawasaki.jp
city.kawasaki.jp -> city.kawasaki.jp
some.city.kawasaki.jp -> city.kawasaki.jp
TLDs had been verified wit the following script [2], to match with
python publicsuffix2 module.
[2]: https://gist.github.com/azat/c1a7a9f1e3519793134ef4b1df5461a6
v2: fix StringHashTable padding requirements
Fixes: #39468
Follow-up for: #17748
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Sometimes it is useful to build contrib with debug symbols for further
debugging.
With everything turned ON (i.e. debug build) I got 3.3GB vs 3.0GB w/o
this patch, 9% bloat, thoughts about this is this OK or not for you, if
not STRIP_DEBUG_SYMBOLS_HEAVY_CONTRIB can be OFF by default (regardless
of build type).
P.S. aws debug symbols adds just 1.7%.
v2: rename STRIP_HEAVY_DEBUG_SYMBOLS
v3: OMIT_HEAVY_DEBUG_SYMBOLS
v4: documentation had been removed
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
This commit migrates ClickHouse to Vectorscan. The first 10 min of
[0] explain the reasons for it.
(*) Addresses (but does not resolve) #38046
(*) Config parameter names (e.g. "max_hyperscan_regexp_length") are
preserved for compatibility. Likewise, error codes (e.g.
"ErrorCodes::HYPERSCAN_CANNOT_SCAN_TEXT") and function/class names (e.g.
"HyperscanDeleter") are preserved as vectorscan aims to be a drop-in
replacement.
[0] https://www.youtube.com/watch?v=KlZWmmflW6M
Official docs:
Some headers from C library were deprecated in C++ and are no longer
welcome in C++ codebases. Some have no effect in C++. For more details
refer to the C++ 14 Standard [depr.c.headers] section. This check
replaces C standard library headers with their C++ alternatives and
removes redundant ones.
When I tried to add cool new clang-tidy 14 warnings, I noticed that the
current clang-tidy settings already produce a ton of warnings. This
commit addresses many of these. Almost all of them were non-critical,
i.e. C vs. C++ style casts.