Commit Graph

65 Commits

Author SHA1 Message Date
Li Yin
4088c0a7f3 Automated function registration
Automated register all functions with below naming convention by
iterating through the symbols:
void DB::registerXXX(DB::FunctionFactory &)
2022-07-29 15:39:50 +08:00
Azat Khuzhin
1d4a7c7290 Add support of !/* (exclamation/asterisk) in custom TLDs
Public suffix list may contain special characters (you may find format
here - [1]):
- asterisk (*)
- exclamation mark (!)

  [1]: https://github.com/publicsuffix/list/wiki/Format

It is easier to describe how it should be interpreted with an examples.

Consider the following part of the list:

    *.sch.uk
    *.kawasaki.jp
    !city.kawasaki.jp

And here are the results for `cutToFirstSignificantSubdomainCustom()`:

If you have only asterisk (*):

    foo.something.sheffield.sch.uk -> something.sheffield.sch.uk
    sheffield.sch.uk               -> sheffield.sch.uk

If you have exclamation mark (!) too:

    foo.kawasaki.jp                -> foo.kawasaki.jp
    foo.foo.kawasaki.jp            -> foo.foo.kawasaki.jp
    city.kawasaki.jp               -> city.kawasaki.jp
    some.city.kawasaki.jp          -> city.kawasaki.jp

TLDs had been verified wit the following script [2], to match with
python publicsuffix2 module.

  [2]: https://gist.github.com/azat/c1a7a9f1e3519793134ef4b1df5461a6

v2: fix StringHashTable padding requirements
Fixes: #39468
Follow-up for: #17748
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-07-26 08:34:30 +03:00
Robert Schulze
13482af4ee
First try at reducing the use of StringRef
- to be replaced by std::string_view
- suggested in #39262
2022-07-17 17:26:02 +00:00
Azat Khuzhin
e8f5cd3c68 Add separate option to omit symbols from heavy contrib
Sometimes it is useful to build contrib with debug symbols for further
debugging.

With everything turned ON (i.e. debug build) I got 3.3GB vs 3.0GB w/o
this patch, 9% bloat, thoughts about this is this OK or not for you, if
not STRIP_DEBUG_SYMBOLS_HEAVY_CONTRIB can be OFF by default (regardless
of build type).

P.S. aws debug symbols adds just 1.7%.
v2: rename STRIP_HEAVY_DEBUG_SYMBOLS
v3: OMIT_HEAVY_DEBUG_SYMBOLS
v4: documentation had been removed
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-07-02 06:32:03 +03:00
Robert Schulze
2c828338f4
Replace hyperscan by vectorscan
This commit migrates ClickHouse to Vectorscan. The first 10 min of
[0] explain the reasons for it.

(*) Addresses (but does not resolve) #38046

(*) Config parameter names (e.g. "max_hyperscan_regexp_length") are
    preserved for compatibility. Likewise, error codes (e.g.
    "ErrorCodes::HYPERSCAN_CANNOT_SCAN_TEXT") and function/class names (e.g.
    "HyperscanDeleter") are preserved as vectorscan aims to be a drop-in
    replacement.

[0] https://www.youtube.com/watch?v=KlZWmmflW6M
2022-06-24 10:47:52 +02:00
Alexey Milovidov
c50791dd3b Fix clang-tidy-14, part 1 2022-05-27 22:52:14 +02:00
Robert Schulze
1b81bb49b4
Enable clang-tidy modernize-deprecated-headers & hicpp-deprecated-headers
Official docs:

  Some headers from C library were deprecated in C++ and are no longer
  welcome in C++ codebases. Some have no effect in C++. For more details
  refer to the C++ 14 Standard [depr.c.headers] section. This check
  replaces C standard library headers with their C++ alternatives and
  removes redundant ones.
2022-05-09 08:23:33 +02:00
Robert Schulze
b24ca8de52
Fix various clang-tidy warnings
When I tried to add cool new clang-tidy 14 warnings, I noticed that the
current clang-tidy settings already produce a ton of warnings. This
commit addresses many of these. Almost all of them were non-critical,
i.e. C vs. C++ style casts.
2022-04-20 10:29:05 +02:00
Alexey Milovidov
cbeeb7ec4f Remove Arcadia 2022-04-16 00:20:47 +02:00
Maksim Kita
538f8cbaad Fix clang-tidy warnings in Disks, Formats, Functions folders 2022-03-14 18:17:35 +00:00
Maksim Kita
5c92ad0d57 Function encodeURLComponent minor fixes 2022-02-17 18:34:23 +00:00
zzsmdfj
1c068f1295 to issue/#31092_add_encodeURLComponent_function 2022-02-17 11:55:06 +08:00
zzsmdfj
6b78da6f02 to issue/#31092_add_encodeURLComponent_function 2022-02-17 11:32:47 +08:00
zzsmdfj
4dcb411f4f to #31092_add_encodeURLComponent_function 2022-02-16 10:19:20 +08:00
taiyang-li
c9d5251e12 finish dev 2022-01-30 09:10:27 +08:00
Azat Khuzhin
66a210410f Fix build w/o hyperscan 2022-01-20 10:02:02 +03:00
alexey-milovidov
df2fede98b
Update decodeURLComponent.cpp 2022-01-08 07:21:12 +03:00
cmsxbc
37349a9d0f
add function decodeURLFormComponent 2022-01-07 20:51:30 +08:00
Alexey Milovidov
fe6b7c77c7 Rename "common" to "base" 2021-10-02 10:13:14 +03:00
Pavel Kruglov
70b51133c1 Try to simplify code 2021-08-09 18:01:08 +03:00
Pavel Kruglov
0662df8b76 Fix performance with JIT, add arguments to function isSuitableForShortCircuitArgumentsExecution 2021-08-09 17:54:14 +03:00
Pavel Kruglov
e792fa588f Mark all Functions as sutable or not for executing as short circuit arguments 2021-08-09 17:50:09 +03:00
Alexey Milovidov
ba1442532b Fix build 2021-07-10 11:43:28 +03:00
Alexey Milovidov
9ca38235aa Correct fix for #26041 2021-07-10 11:29:08 +03:00
alexey-milovidov
05d1af153c
Merge branch 'master' into rename-const-context-ptr 2021-06-12 03:25:09 +03:00
Azat Khuzhin
e0c1780370 Fix topLevelDomain() for IDN hosts 2021-06-09 10:59:56 +03:00
Azat Khuzhin
38ac83dff9 Update comments for getURLHost() 2021-06-09 10:59:10 +03:00
Nikolai Kochetov
dbaa6ffc62 Rename ContextConstPtr to ContextPtr. 2021-06-01 15:20:52 +03:00
Alexander Kuzmenkov
3f57fc085b remove mutable context references from functions interface
Also remove it from some visitors.
2021-05-28 19:45:37 +03:00
Maksim Kita
d923d9e6ef Function move file 2021-05-17 10:30:42 +03:00
Maksim Kita
947f28d430 IFunction refactoring 2021-05-15 20:33:15 +03:00
Alexey Milovidov
4ff812db7f Maybe better support for paths with whitespaces 2021-04-24 22:47:52 +03:00
Ivan
495c6e03aa
Replace all Context references with std::weak_ptr (#22297)
* Replace all Context references with std::weak_ptr

* Fix shared context captured by value

* Fix build

* Fix Context with named sessions

* Fix copy context

* Fix gcc build

* Merge with master and fix build

* Fix gcc-9 build
2021-04-11 02:33:54 +03:00
Azat Khuzhin
b68517f69e Fix cutToFirstSignificantSubdomainCustom()/firstSignificantSubdomainCustom() for 3+level domains
Custom TLD lists (added in #17748), may contain domain of the 3-d level,
however builtin TLD lists does not have such records, so it is not
affected.

Note that this will significantly increase hashtable lookups.

Fixes: #17748
2021-03-26 00:00:16 +03:00
Azat Khuzhin
30cd1c6145 Fix typo in FirstSignificantSubdomainCustomLookup name 2021-03-26 00:00:16 +03:00
Azat Khuzhin
916cbd6610 Add ability to use custom TLD list
v2: Add a note that top_level_domains_lists aren not applied w/o restart
v3: Remove ExtractFirstSignificantSubdomain{Default,Custom}Lookup.h headers
v4: TLDListsHolder: remove FIXME for dense_hash_map (this is not significant)
2020-12-09 21:08:22 +03:00
Nikolai Kochetov
bac1def5f9
Merge pull request #17134 from abyss7/tcp-port
Implement tcpPort() function for tests
2020-11-20 20:32:55 +03:00
Azat Khuzhin
48645eae33 Add cutToFirstSignificantSubdomainWithWWW()
Sometimes it is odd to get TLD itself from the
cutToFirstSignificantSubdomain() (since you will not get TLD itself if
you pass it directly):
- cutToFirstSignificantSubdomain('org')            -> ""
- cutToFirstSignificantSubdomain('www.org')        -> org
- cutToFirstSignificantSubdomain('kernel.org')     -> kernel.org
- cutToFirstSignificantSubdomain('www.kernel.org') -> kernel.org

So add one more function to get www.org in this case:
- cutToFirstSignificantSubdomainWithWWW('org')            -> ""
- cutToFirstSignificantSubdomainWithWWW('www.org')        -> www.org
- cutToFirstSignificantSubdomainWithWWW('kernel.org')     -> kernel.org
- cutToFirstSignificantSubdomainWithWWW('www.kernel.org') -> kernel.org

P.S. not sure about the naming though, so it will great if someone has
suggestion for the name.
2020-11-18 21:09:27 +03:00
Ivan Lezhankin
f897f7c93f Refactor IFunction to execute with const arguments 2020-11-17 16:24:45 +03:00
Nikolai Kochetov
295e612343 Fix build and tests. 2020-10-20 16:11:57 +03:00
Nikolai Kochetov
740fad66f3 Part 6. 2020-10-18 22:00:13 +03:00
Nikolai Kochetov
959424f28a Rename block to columns. 2020-10-14 17:04:50 +03:00
Nikolai Kochetov
966b1d6cf5 Rename Block to ColumnsWithTypeAndName. 2020-10-14 16:09:11 +03:00
Nikolai Kochetov
3a17c2a7ac Rename FunctionArguments to ColumnsWithTypeAndName 2020-10-11 22:20:20 +03:00
Nikolai Kochetov
d28325a353 Replace getByPosition to [] 2020-10-10 21:24:57 +03:00
Nikolai Kochetov
a7fb2e38a5 Use ColumnWithTypeAndName as function argument instead of Block. 2020-10-09 10:41:28 +03:00
myrrc
c309f55c20 updated setting and added default value 2020-09-10 14:02:52 +03:00
Alexey Milovidov
53e39b05b2 Lower binary size in debug build 2020-09-07 18:35:18 +03:00
Azat Khuzhin
2d7cb03120 Suppress superfluous wget (-nv) output
Since for dowloading some of files wget logging may take 50% of overall
log [1].

  [1]: https://clickhouse-builds.s3.yandex.net/14315/c32ff4c98cb3b83a12f945eadd180415b7a3b269/clickhouse_build_check/build_log_761119955_1598923036.txt
2020-09-01 10:25:13 +03:00
Alexey Milovidov
ac2de4cc95 Fix build 2020-08-08 03:08:41 +03:00