Commit Graph

87 Commits

Author SHA1 Message Date
Roman Vasin
c002d0944b Refactor to do erase() on small fragment 2022-12-02 17:25:55 +00:00
Roman Vasin
fe2e0987eb Remove unneeded header files 2022-12-02 11:05:54 +00:00
Roman Vasin
1a2bdf14fa Small refactor: renaming various vars 2022-12-01 14:09:37 +00:00
Roman Vasin
1cce8c023d Refactor cutURLParameter to work directly on result column 2022-12-01 13:02:37 +00:00
Roman Vasin
d4aeb9342c Add check for array type 2022-11-29 07:53:16 +00:00
Roman Vasin
b7cac89a8f Add 02483_cuturlparameter_with_arrays test 2022-11-28 16:01:29 +00:00
Roman Vasin
75b2aaad64 Use {} in exception messages 2022-11-28 15:21:57 +00:00
Roman Vasin
532e2c50d7 Fix bug with zeros after end of values 2022-11-28 09:53:34 +00:00
Roman Vasin
d0382270e5 Make arrays work correctly on table data 2022-11-25 13:44:17 +00:00
Roman Vasin
7bbb32f80a Refactor code by implementation of cuURL() 2022-11-25 08:25:48 +00:00
Roman Vasin
7084fbfee3 Make iteration through array of values 2022-11-25 06:51:08 +00:00
Roman Vasin
1e10d022ce Refactor FunctionCutURLParameter to be based directly on IFunction 2022-11-22 11:27:52 +00:00
taiyang-li
5fa0968bd5 reset to original solution 2022-11-03 15:05:23 +08:00
taiyang-li
e78861ad1c Merge branch 'master' into enable_max_splits 2022-11-03 14:41:52 +08:00
vdimir
14d0f6457b Add tests and doc for some url-related functions 2022-10-26 10:52:57 +00:00
taiyang-li
fcbc217a7d enable limits for functions using FunctionTokens 2022-10-26 16:18:32 +08:00
Vladimir C
8fabe1515c Merge pull request #42274 from dentiscalprum/fix_domain 2022-10-25 13:56:58 +02:00
Quanfa Fu
b07f65343d Add functions: domainRFC, topLevelDomainRFC, domainWithoutWWWRFC... 2022-10-23 12:01:26 +08:00
Azat Khuzhin
4e76629aaf Fixes for -Wshorten-64-to-32
- lots of static_cast
- add safe_cast
- types adjustments
  - config
  - IStorage::read/watch
  - ...
- some TODO's (to convert types in future)

P.S. That was quite a journey...

v2: fixes after rebase
v3: fix conflicts after #42308 merged
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-10-21 13:25:19 +02:00
Quanfa Fu
dbe68ab0a8 Fix wrong behavior of domain function for URLs containing a UserInfo part and '@'
When a UserInfo part and '@' appear in the URL, the host after '@' should
be returned. For example, when the URL is "https://user:pass@clickhouse.com/",
start_of_host should be the char 'c' after '@', and end_of_host should be '/'
rather than ':'.
2022-10-19 14:27:06 +08:00
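
The behavior described in commit dbe68ab0a8 above can be illustrated with a minimal Python sketch. This is not the ClickHouse implementation (which is C++ and works on raw character ranges); the function name `extract_host` is hypothetical, and the sketch assumes a simple `scheme://[userinfo@]host[:port]/...` shape without IPv6 literals.

```python
def extract_host(url: str) -> str:
    """Return the host of a URL, skipping any 'user:pass@' prefix.

    Hypothetical helper for illustration only; IPv6 literals such as
    '[::1]' are not handled.
    """
    # Skip the scheme ("https://"), if present.
    scheme_end = url.find("://")
    start = scheme_end + 3 if scheme_end != -1 else 0

    # The authority part ends at the first '/', '?' or '#'.
    end = len(url)
    for i in range(start, len(url)):
        if url[i] in "/?#":
            end = i
            break
    authority = url[start:end]

    # Everything up to the last '@' is UserInfo; the host starts after it.
    at = authority.rfind("@")
    if at != -1:
        authority = authority[at + 1:]

    # Drop an optional ':port' suffix.
    host, _, _ = authority.partition(":")
    return host


print(extract_host("https://user:pass@clickhouse.com/"))
```

The key point is the order of the checks: the ':' inside `user:pass` must not be taken as the end of the host, so the UserInfo part is stripped before looking for a port separator.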
Robert Schulze
0c095b30b2 Remove unused file 2022-09-28 08:12:15 +00:00
Robert Schulze
77e64935e1 Reduce some usage of StringRef 2022-08-19 09:56:59 +00:00
Li Yin
4088c0a7f3 Automated function registration
Automatically register all functions that follow the naming convention
below by iterating through the symbols:
void DB::registerXXX(DB::FunctionFactory &)
2022-07-29 15:39:50 +08:00
Azat Khuzhin
1d4a7c7290 Add support of !/* (exclamation/asterisk) in custom TLDs
Public suffix list may contain special characters (you may find format
here - [1]):
- asterisk (*)
- exclamation mark (!)

  [1]: https://github.com/publicsuffix/list/wiki/Format

It is easier to describe how it should be interpreted with examples.

Consider the following part of the list:

    *.sch.uk
    *.kawasaki.jp
    !city.kawasaki.jp

And here are the results for `cutToFirstSignificantSubdomainCustom()`:

If you have only asterisk (*):

    foo.something.sheffield.sch.uk -> something.sheffield.sch.uk
    sheffield.sch.uk               -> sheffield.sch.uk

If you have exclamation mark (!) too:

    foo.kawasaki.jp                -> foo.kawasaki.jp
    foo.foo.kawasaki.jp            -> foo.foo.kawasaki.jp
    city.kawasaki.jp               -> city.kawasaki.jp
    some.city.kawasaki.jp          -> city.kawasaki.jp

TLDs were verified with the following script [2] to match the
Python publicsuffix2 module.

  [2]: https://gist.github.com/azat/c1a7a9f1e3519793134ef4b1df5461a6

v2: fix StringHashTable padding requirements
Fixes: #39468
Follow-up for: #17748
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-07-26 08:34:30 +03:00
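
The wildcard and exception rules from commit 1d4a7c7290 above can be sketched in Python as follows. This is an illustrative sketch only, not the ClickHouse implementation: the function name `cut_to_first_significant_subdomain` and the rule-list argument are hypothetical, and returning the host unchanged when it is itself a public suffix is an assumption taken from the examples in the commit message.

```python
def cut_to_first_significant_subdomain(host: str, rules: list[str]) -> str:
    """Keep the public suffix plus one more label (hypothetical sketch)."""
    labels = host.split(".")

    # Split the rule list into plain, wildcard ("*.x") and exception ("!x") rules.
    exact = {r for r in rules if not r.startswith(("*.", "!"))}
    wildcard = {r[2:] for r in rules if r.startswith("*.")}
    exception = {r[1:] for r in rules if r.startswith("!")}

    suffix_len = 1  # default: the last label alone is the suffix
    # Try candidate suffixes from longest to shortest.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        parent = ".".join(labels[i + 1:])
        n = len(labels) - i
        if candidate in exception:
            # "!city.kawasaki.jp": the suffix is the rule minus its first label.
            suffix_len = n - 1
            break
        if candidate in exact or parent in wildcard:
            # "*.kawasaki.jp" matches any single label in front of kawasaki.jp.
            suffix_len = n
            break

    # Keep the suffix plus one label; if the host has no extra label,
    # return it unchanged (assumption based on the commit's examples).
    keep = min(len(labels), suffix_len + 1)
    return ".".join(labels[-keep:])


rules = ["*.sch.uk", "*.kawasaki.jp", "!city.kawasaki.jp"]
for h in ["foo.something.sheffield.sch.uk", "some.city.kawasaki.jp"]:
    print(h, "->", cut_to_first_significant_subdomain(h, rules))
```

With the three rules listed in the commit message, this sketch reproduces all six example results given there. It does not handle unusual rule sets where a wildcard and an exception overlap at different depths.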
Robert Schulze
13482af4ee First try at reducing the use of StringRef
- to be replaced by std::string_view
- suggested in #39262
2022-07-17 17:26:02 +00:00
Azat Khuzhin
e8f5cd3c68 Add separate option to omit symbols from heavy contrib
Sometimes it is useful to build contrib with debug symbols for further
debugging.

With everything turned ON (i.e. a debug build) I got 3.3GB vs 3.0GB w/o
this patch, a 9% bloat. Is this OK for you or not? If not,
STRIP_DEBUG_SYMBOLS_HEAVY_CONTRIB can be OFF by default (regardless
of build type).

P.S. aws debug symbols add just 1.7%.
v2: rename STRIP_HEAVY_DEBUG_SYMBOLS
v3: OMIT_HEAVY_DEBUG_SYMBOLS
v4: documentation had been removed
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-07-02 06:32:03 +03:00
Robert Schulze
2c828338f4 Replace hyperscan by vectorscan
This commit migrates ClickHouse to Vectorscan. The first 10 min of
[0] explain the reasons for it.

(*) Addresses (but does not resolve) #38046

(*) Config parameter names (e.g. "max_hyperscan_regexp_length") are
    preserved for compatibility. Likewise, error codes (e.g.
    "ErrorCodes::HYPERSCAN_CANNOT_SCAN_TEXT") and function/class names (e.g.
    "HyperscanDeleter") are preserved as vectorscan aims to be a drop-in
    replacement.

[0] https://www.youtube.com/watch?v=KlZWmmflW6M
2022-06-24 10:47:52 +02:00
Alexey Milovidov
c50791dd3b Fix clang-tidy-14, part 1 2022-05-27 22:52:14 +02:00
Robert Schulze
1b81bb49b4 Enable clang-tidy modernize-deprecated-headers & hicpp-deprecated-headers
Official docs:

  Some headers from C library were deprecated in C++ and are no longer
  welcome in C++ codebases. Some have no effect in C++. For more details
  refer to the C++ 14 Standard [depr.c.headers] section. This check
  replaces C standard library headers with their C++ alternatives and
  removes redundant ones.
2022-05-09 08:23:33 +02:00
Robert Schulze
b24ca8de52 Fix various clang-tidy warnings
When I tried to add cool new clang-tidy 14 warnings, I noticed that the
current clang-tidy settings already produce a ton of warnings. This
commit addresses many of these. Almost all of them were non-critical,
i.e. C vs. C++ style casts.
2022-04-20 10:29:05 +02:00
Alexey Milovidov
cbeeb7ec4f Remove Arcadia 2022-04-16 00:20:47 +02:00
Maksim Kita
538f8cbaad Fix clang-tidy warnings in Disks, Formats, Functions folders 2022-03-14 18:17:35 +00:00
Maksim Kita
5c92ad0d57 Function encodeURLComponent minor fixes 2022-02-17 18:34:23 +00:00
zzsmdfj
1c068f1295 to issue/#31092_add_encodeURLComponent_function 2022-02-17 11:55:06 +08:00
zzsmdfj
6b78da6f02 to issue/#31092_add_encodeURLComponent_function 2022-02-17 11:32:47 +08:00
zzsmdfj
4dcb411f4f to #31092_add_encodeURLComponent_function 2022-02-16 10:19:20 +08:00
taiyang-li
c9d5251e12 finish dev 2022-01-30 09:10:27 +08:00
Azat Khuzhin
66a210410f Fix build w/o hyperscan 2022-01-20 10:02:02 +03:00
alexey-milovidov
df2fede98b Update decodeURLComponent.cpp 2022-01-08 07:21:12 +03:00
cmsxbc
37349a9d0f Add function decodeURLFormComponent 2022-01-07 20:51:30 +08:00
Alexey Milovidov
fe6b7c77c7 Rename "common" to "base" 2021-10-02 10:13:14 +03:00
Pavel Kruglov
70b51133c1 Try to simplify code 2021-08-09 18:01:08 +03:00
Pavel Kruglov
0662df8b76 Fix performance with JIT, add arguments to function isSuitableForShortCircuitArgumentsExecution 2021-08-09 17:54:14 +03:00
Pavel Kruglov
e792fa588f Mark all Functions as suitable or not for executing as short circuit arguments 2021-08-09 17:50:09 +03:00
Alexey Milovidov
ba1442532b Fix build 2021-07-10 11:43:28 +03:00
Alexey Milovidov
9ca38235aa Correct fix for #26041 2021-07-10 11:29:08 +03:00
alexey-milovidov
05d1af153c Merge branch 'master' into rename-const-context-ptr 2021-06-12 03:25:09 +03:00
Azat Khuzhin
e0c1780370 Fix topLevelDomain() for IDN hosts 2021-06-09 10:59:56 +03:00
Azat Khuzhin
38ac83dff9 Update comments for getURLHost() 2021-06-09 10:59:10 +03:00
Nikolai Kochetov
dbaa6ffc62 Rename ContextConstPtr to ContextPtr. 2021-06-01 15:20:52 +03:00