Update url-functions.md

Blargian 2024-05-29 15:04:02 +02:00
parent a68ebaafd7
commit 33dc8ce456


@@ -7,9 +7,35 @@ sidebar_label: URLs
# Functions for Working with URLs
:::note
-The functions mentioned in this section for the most part do not follow the RFC convention as they are maximally simplified for improved performance. Functions following a specific RFC convention have `RFC` appended to the function name and are generally less performant.
+The functions mentioned in this section for the most part do not follow the RFC 3986 convention, as they are maximally simplified for improved performance. Functions that follow RFC 3986 have `RFC` appended to the function name and are generally less performant.
+- When should I pick the non-`RFC` variant?
+  - Pick the non-`RFC` variant when working with domains that can be publicly registered and when neither userinfo nor the `@` symbol appears in the URL.
:::
+The table below shows which symbols are restricted (`✗`) and which are allowed (`✔`) anywhere in the URL for each of the two variants.
+|Symbol | non-`RFC`| `RFC` |
+|-------|----------|-------|
+| ' ' | ✗ |✗ |
+| \t | ✗ |✗ |
+| < | ✗ |✗ |
+| > | ✗ |✗ |
+| % | ✗ |✔* |
+| { | ✗ |✗ |
+| } | ✗ |✗ |
+| \| | ✗ |✗ |
+| \\\ | ✗ |✗ |
+| ^ | ✗ |✗ |
+| ~ | ✗ |✔* |
+| [ | ✗ |✗ |
+| ] | ✗ |✔ |
+| ; | ✗ |✔* |
+| = | ✗ |✔* |
+| & | ✗ |✔* |
+The symbols marked `*` are allowed in the userinfo component, which precedes the `@` symbol, under RFC 3986: `;`, `=`, and `&` are sub-delimiters, `~` is an unreserved character, and `%` introduces a percent-encoded octet.
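As a rough illustration, the note and the symbol table can be read as a pre-check that suggests which family of functions suits a given URL. This sketch is hypothetical: `suggest_variant`, `RESTRICTED_IN_BOTH`, and `RFC_ONLY` are not part of ClickHouse. It applies the table literally to the whole URL, as the note states, so a query string such as `?a=1&b=2` is also flagged. It ignores the `*` footnote about userinfo, and it cannot tell whether a domain is publicly registrable.

```python
# Hypothetical pre-check derived from the symbol table above.
# Assumptions: the table applies to the whole URL, and the `*` footnote
# (symbols allowed only inside userinfo) is ignored for simplicity.

# Symbols marked ✗ for both the non-RFC and the RFC variants.
RESTRICTED_IN_BOTH = set(" \t<>{}|\\^[")
# Symbols marked ✗ for non-RFC but ✔ or ✔* for RFC.
RFC_ONLY = set("%~];=&")


def suggest_variant(url: str) -> str:
    """Return "neither", "RFC", or "non-RFC" for a URL (illustration only)."""
    # A symbol that neither variant accepts rules both out.
    if any(ch in RESTRICTED_IN_BOTH for ch in url):
        return "neither"
    # The note recommends the non-RFC variant only when there is no userinfo / "@".
    if "@" in url or any(ch in RFC_ONLY for ch in url):
        return "RFC"
    return "non-RFC"


if __name__ == "__main__":
    for u in [
        "https://clickhouse.com/docs",
        "https://user:pass@clickhouse.com/",
        "https://clickhouse.com/a b",
    ]:
        print(u, "->", suggest_variant(u))
```

Checking `RESTRICTED_IN_BOTH` first means a URL that neither variant accepts is reported as such, rather than being routed to the `RFC` functions.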
## Functions that Extract Parts of a URL
If the relevant part isn't present in a URL, an empty string is returned.
@@ -18,7 +44,7 @@ If the relevant part isn't present in a URL, an empty string is returned.
Extracts the protocol from a URL.
-Examples of typical returned values: http, https, ftp, mailto, tel, magnet...
+Examples of typical returned values: http, https, ftp, mailto, tel, magnet
### domain
@@ -32,7 +58,7 @@ domain(URL)
**Arguments**
-- `url` — URL. [String](../data-types/string.md).
+- `URL` — URL. Type: [String](../../sql-reference/data-types/string.md).
The URL can be specified with or without a protocol. Examples:
@@ -114,7 +140,7 @@ topLevelDomain(URL)
**Arguments**
-- `url` — URL. [String](../data-types/string.md).
+- `URL` — URL. [String](../../sql-reference/data-types/string.md).
:::note
The URL can be specified with or without a protocol. Examples:
@@ -128,7 +154,7 @@ https://clickhouse.com/time/
**Returned values**
-- Domain name if ClickHouse can parse the input string as a URL. Otherwise, an empty string. [String](../data-types/string.md).
+- Domain name if ClickHouse can parse the input string as a URL. Otherwise, an empty string. [String](../../sql-reference/data-types/string.md).
**Example**
@@ -156,7 +182,7 @@ topLevelDomainRFC(URL)
**Arguments**
-- `url` — URL. [String](../data-types/string.md).
+- `URL` — URL. [String](../../sql-reference/data-types/string.md).
:::note
The URL can be specified with or without a protocol. Examples:
@@ -170,7 +196,7 @@ https://clickhouse.com/time/
**Returned values**
-- Domain name if ClickHouse can parse the input string as a URL. Otherwise, an empty string. [String](../data-types/string.md).
+- Domain name if ClickHouse can parse the input string as a URL. Otherwise, an empty string. [String](../../sql-reference/data-types/string.md).
**Example**
@@ -425,17 +451,17 @@ This can be useful if you need a fresh TLD list or if you have a custom list.
**Syntax**
``` sql
cutToFirstSignificantSubdomainCustom(URL, TLD)
```
**Arguments**
-- `URL` — URL. [String](../data-types/string.md).
-- `TLD` — Custom TLD list name. [String](../data-types/string.md).
+- `URL` — URL. [String](../../sql-reference/data-types/string.md).
+- `TLD` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).
**Returned value**
-- Part of the domain that includes top-level subdomains up to the first significant subdomain. [String](../data-types/string.md).
+- Part of the domain that includes top-level subdomains up to the first significant subdomain. [String](../../sql-reference/data-types/string.md).
**Example**
@@ -505,12 +531,14 @@ cutToFirstSignificantSubdomainCustomWithWWW(URL, TLD)
**Arguments**
-- `URL` — URL. [String](../data-types/string.md).
-- `TLD` — Custom TLD list name. [String](../data-types/string.md).
+- `URL` — URL. [String](../../sql-reference/data-types/string.md).
+- `TLD` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).
**Returned value**
-- Part of the domain that includes top-level subdomains up to the first significant subdomain without stripping `www`. [String](../data-types/string.md).
+- Part of the domain that includes top-level subdomains up to the first significant subdomain without stripping `www`.
+Type: [String](../../sql-reference/data-types/string.md).
**Example**
@@ -580,12 +608,14 @@ firstSignificantSubdomainCustom(URL, TLD)
**Arguments**
-- `URL` — URL. [String](../data-types/string.md).
-- `TLD` — Custom TLD list name. [String](../data-types/string.md).
+- `URL` — URL. [String](../../sql-reference/data-types/string.md).
+- `TLD` — Custom TLD list name. [String](../../sql-reference/data-types/string.md).
**Returned value**
-- First significant subdomain. [String](../data-types/string.md).
+- First significant subdomain.
+Type: [String](../../sql-reference/data-types/string.md).
**Example**
@@ -825,11 +855,13 @@ netloc(URL)
**Arguments**
-- `url` — URL. [String](../data-types/string.md).
+- `URL` — URL. [String](../../sql-reference/data-types/string.md).
**Returned value**
-- `username:password@host:port`. [String](../data-types/string.md).
+- `username:password@host:port`.
+Type: `String`.
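For readers coming from Python, the `username:password@host:port` value described for `netloc` is the same URL component that `urllib.parse.urlsplit` exposes as `netloc`. The sketch below is only an analogy: `netloc_of` is a hypothetical helper, and ClickHouse's `netloc` is a separate implementation that may treat edge cases differently. One known Python behavior to keep in mind is that `urlsplit` only finds the netloc when the URL has a scheme or starts with `//`.

```python
from urllib.parse import urlsplit


def netloc_of(url: str) -> str:
    """Hypothetical helper: return the authority part, userinfo@host:port."""
    return urlsplit(url).netloc


if __name__ == "__main__":
    url = "http://paul:secret@www.example.com:80/path?q=1"
    parts = urlsplit(url)
    print(netloc_of(url))  # paul:secret@www.example.com:80
    # The same component, split into its pieces.
    print(parts.username, parts.password, parts.hostname, parts.port)  # paul secret www.example.com 80
    # Without a scheme or a leading "//", urlsplit finds no netloc.
    print(repr(netloc_of("www.example.com/path")))  # ''
```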
**Example**
@@ -879,12 +911,14 @@ cutURLParameter(URL, name)
**Arguments**
-- `URL` — URL. [String](../data-types/string.md).
-- `name` — name of URL parameter. [String](../data-types/string.md) or [Array](../data-types/array.md) of Strings.
+- `URL` — URL. [String](../../sql-reference/data-types/string.md).
+- `name` — name of URL parameter. [String](../../sql-reference/data-types/string.md) or [Array](../../sql-reference/data-types/array.md) of Strings.
**Returned value**
-- URL with `name` URL parameter removed. [String](../data-types/string.md).
+- URL with `name` URL parameter removed.
+Type: `String`.
**Example**