Functions for working with URLs
-------------------------------

None of these functions follow the RFC. They are simplified as much as possible for better performance.

Functions that extract part of a URL.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the URL does not contain the requested part, an empty string is returned.

protocol
""""""""
- Selects the protocol. Examples: http, ftp, mailto, magnet...

domain
""""""
- Selects the domain.

domainWithoutWWW
""""""""""""""""
- Selects the domain and removes no more than one 'www.' from the beginning of it, if present.

topLevelDomain
""""""""""""""
- Selects the top-level domain. Example: .ru.

firstSignificantSubdomain
"""""""""""""""""""""""""
- Selects the "first significant subdomain". This is a non-standard concept specific to Yandex.Metrica. The first significant subdomain is the second-level domain, unless the second-level domain is 'com', 'net', 'org', or 'co', in which case it is the third-level domain. For example, ``firstSignificantSubdomain('https://news.yandex.ru/') = 'yandex'`` and ``firstSignificantSubdomain('https://news.yandex.com.tr/') = 'yandex'``. The list of "insignificant" second-level domains and other implementation details may change in the future.

cutToFirstSignificantSubdomain
""""""""""""""""""""""""""""""
- Selects the part of the domain that includes top-level subdomains up to the "first significant subdomain" (see the explanation above).

  For example, ``cutToFirstSignificantSubdomain('https://news.yandex.com.tr/') = 'yandex.com.tr'``.

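The rule for the "first significant subdomain" can be sketched in a few lines of Python. This only models the description above and is not ClickHouse's implementation: the helper names are hypothetical, the host extraction is deliberately naive, and the set of "insignificant" second-level domains is the one listed in the text, which may change.

```python
# Second-level labels treated as "insignificant" (taken from the text above).
INSIGNIFICANT_SECOND_LEVEL = {"com", "net", "org", "co"}


def _host_labels(url):
    """Naive host extraction: text between '://' and the next '/', split on dots."""
    rest = url.split("://", 1)[-1]
    host = rest.split("/", 1)[0]
    return [label for label in host.split(".") if label]


def _significant_index(labels):
    """Index of the first significant label, or None if the host is too short."""
    if len(labels) < 2:
        return None  # assumption of this sketch: nothing to select
    if labels[-2] in INSIGNIFICANT_SECOND_LEVEL and len(labels) >= 3:
        return len(labels) - 3  # fall back to the third-level domain
    return len(labels) - 2      # the second-level domain


def first_significant_subdomain(url):
    labels = _host_labels(url)
    index = _significant_index(labels)
    return "" if index is None else labels[index]


def cut_to_first_significant_subdomain(url):
    labels = _host_labels(url)
    index = _significant_index(labels)
    # Keep everything from the significant label through the top-level domain.
    return "" if index is None else ".".join(labels[index:])
```

With the examples from the text, ``first_significant_subdomain('https://news.yandex.com.tr/')`` gives ``'yandex'`` and ``cut_to_first_significant_subdomain`` gives ``'yandex.com.tr'`` for the same URL.
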
path
""""
- Selects the path. Example: /top/news.html. The path does not include the query-string.

pathFull
""""""""
- The same as path, but including the query-string and fragment. Example: /top/news.html?page=2#comments

queryString
"""""""""""
- Selects the query-string. Example: page=1&lr=213. The query-string does not include the initial question mark, nor the # and everything that comes after it.

fragment
""""""""
- Selects the fragment identifier. The fragment does not include the initial number sign (#).

queryStringAndFragment
""""""""""""""""""""""
- Selects the query-string and fragment identifier. Example: page=1#29390.

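How path, pathFull, queryString, fragment and queryStringAndFragment relate to each other can be illustrated with a small Python sketch. It follows the descriptions above for a well-formed URL only; the function names and the ``example.com`` URL are hypothetical, and edge cases (no path, a ``?`` inside the fragment, and so on) may behave differently in ClickHouse.

```python
def split_url_tail(url):
    """Return (path, query_string, fragment), without the '?' and '#' separators."""
    rest = url.split("://", 1)[-1]
    slash = rest.find("/")
    tail = rest[slash:] if slash != -1 else ""       # everything after the host
    before_fragment, _, fragment = tail.partition("#")
    path, _, query_string = before_fragment.partition("?")
    return path, query_string, fragment


def path_full(url):
    """Path together with the query-string and fragment, separators included."""
    rest = url.split("://", 1)[-1]
    slash = rest.find("/")
    return rest[slash:] if slash != -1 else ""


def query_string_and_fragment(url):
    """Query-string and fragment joined by '#', without the leading '?'."""
    _, query_string, fragment = split_url_tail(url)
    return query_string + ("#" + fragment if fragment else "")
```

For ``https://example.com/top/news.html?page=2#comments`` the sketch yields the path ``/top/news.html``, the query-string ``page=2`` and the fragment ``comments``.
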
extractURLParameter(URL, name)
""""""""""""""""""""""""""""""
- Selects the value of the 'name' parameter in the URL, if present. Otherwise, selects an empty string. If there are several parameters with this name, the first occurrence is returned. This function works under the assumption that the parameter name is encoded in the URL in exactly the same way as in the argument passed.

extractURLParameters(URL)
"""""""""""""""""""""""""
- Gets an array of name=value strings corresponding to the URL parameters. The values are not decoded in any way.

extractURLParameterNames(URL)
"""""""""""""""""""""""""""""
- Gets an array of name strings corresponding to the names of URL parameters. The values are not decoded in any way.

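The three parameter functions can be modelled roughly as follows. This is a Python sketch of the behaviour described above, not the ClickHouse code: the function names are hypothetical, and the important points it shows are that nothing is decoded and that the name is compared exactly as written.

```python
def extract_url_parameters(url):
    """Raw 'name=value' strings from the query-string, in order, undecoded."""
    without_fragment = url.split("#", 1)[0]
    _, question_mark, query = without_fragment.partition("?")
    if not question_mark:
        return []
    return [pair for pair in query.split("&") if pair]


def extract_url_parameter_names(url):
    """Only the names, still undecoded."""
    return [pair.split("=", 1)[0] for pair in extract_url_parameters(url)]


def extract_url_parameter(url, name):
    """Value of the first parameter whose name matches exactly, else ''."""
    for pair in extract_url_parameters(url):
        key, _, value = pair.partition("=")
        if key == name:  # exact comparison: neither side is URL-decoded
            return value
    return ""
```

For a URL with the query-string ``q=a%20b&lr=213&q=second``, looking up ``q`` returns ``a%20b``: the first occurrence, with the percent-encoding left in place.
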
URLHierarchy(URL)
"""""""""""""""""
- Gets an array containing the URL, truncated at the end by the ``/`` and ``?`` characters in the path and query-string. Consecutive separator characters are counted as one. The cut is made in the position after all the consecutive separator characters.

URLPathHierarchy(URL)
"""""""""""""""""""""
- The same as URLHierarchy, but without the protocol and host in the result. The / element (root) is not included. This function is used for implementing tree-view reports by URL in Yandex.Metrica. Example:

::

    URLPathHierarchy('https://example.com/browse/CONV-6788') =
    [
        '/browse/',
        '/browse/CONV-6788'
    ]

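The trimming rule (cut after each run of ``/`` or ``?`` separators, skip the root) can be sketched in Python as below. The function name is hypothetical and the sketch only models the description and the example above; how ClickHouse treats fragments and other edge cases is not covered.

```python
SEPARATORS = "/?"


def url_path_hierarchy(url):
    """Prefixes of the path and query-string, cut after each run of separators."""
    rest = url.split("://", 1)[-1]
    slash = rest.find("/")
    if slash == -1:
        return []
    tail = rest[slash:]  # protocol and host are dropped
    result = []
    i, n = 0, len(tail)
    # Skip the leading separators so that the root element is not emitted.
    while i < n and tail[i] in SEPARATORS:
        i += 1
    while i < n:
        # Advance to the next separator...
        while i < n and tail[i] not in SEPARATORS:
            i += 1
        # ...and past all consecutive separators, which count as one.
        while i < n and tail[i] in SEPARATORS:
            i += 1
        result.append(tail[:i])
    return result
```

Applied to ``https://example.com/browse/CONV-6788``, the sketch returns ``['/browse/', '/browse/CONV-6788']``, matching the example above.
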
decodeURLComponent(URL)
"""""""""""""""""""""""
Returns the URL with percent-encoded sequences decoded.
Example:

.. code-block:: sql

    :) SELECT decodeURLComponent('http://127.0.0.1:8123/?query=SELECT%201%3B') AS DecodedURL;

    ┌─DecodedURL─────────────────────────────┐
    │ http://127.0.0.1:8123/?query=SELECT 1; │
    └────────────────────────────────────────┘

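For comparison, the same decoding can be reproduced outside ClickHouse with Python's standard library. This is only an analogy: ``urllib.parse.unquote`` is Python's own decoder, and its handling of malformed sequences or non-UTF-8 bytes may differ from ``decodeURLComponent``.

```python
from urllib.parse import unquote

# The encoded URL from the example above.
encoded = "http://127.0.0.1:8123/?query=SELECT%201%3B"

# %20 becomes a space and %3B becomes a semicolon; '+' would be left as is.
decoded = unquote(encoded)
print(decoded)
```
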
Functions that remove part of a URL.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the URL does not contain the part to be removed, it remains unchanged.

cutWWW
""""""
Removes no more than one 'www.' from the beginning of the URL's domain, if present.

cutQueryString
""""""""""""""
Removes the query-string. The question mark is also removed.

cutFragment
"""""""""""
Removes the fragment identifier. The number sign is also removed.

cutQueryStringAndFragment
"""""""""""""""""""""""""
Removes the query-string and fragment identifier. The question mark and number sign are also removed.

cutURLParameter(URL, name)
""""""""""""""""""""""""""
Removes the URL parameter named 'name', if present. This function works under the assumption that the parameter name is encoded in the URL exactly the same way as in the passed argument.

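The removal functions can be sketched in Python in the same spirit. The names are hypothetical and the code models only the descriptions above. In particular, what remains of the ``?`` once the last parameter has been removed is a choice made by this sketch, not documented behaviour.

```python
def cut_www(url):
    """Remove at most one leading 'www.' from the domain."""
    scheme, separator, rest = url.partition("://")
    if separator and rest.startswith("www."):
        return scheme + separator + rest[len("www."):]
    return url


def cut_fragment(url):
    """Drop '#' and everything after it."""
    return url.split("#", 1)[0]


def cut_query_string_and_fragment(url):
    """Drop everything from the first '?' or '#' onwards."""
    for index, char in enumerate(url):
        if char in "?#":
            return url[:index]
    return url


def cut_url_parameter(url, name):
    """Remove every 'name=value' pair whose name matches exactly (no decoding)."""
    head, hash_sign, fragment = url.partition("#")
    base, question_mark, query = head.partition("?")
    if not question_mark:
        return url
    kept = [pair for pair in query.split("&") if pair.partition("=")[0] != name]
    # Sketch-level choice: drop the '?' when no parameters are left.
    new_head = base + ("?" + "&".join(kept) if kept else "")
    return new_head + hash_sign + fragment
```

For instance, ``cut_url_parameter('http://example.com/?a=1&b=2#f', 'a')`` gives ``'http://example.com/?b=2#f'``, and a URL without the named parameter comes back unchanged.
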