Functions for working with URLs
-------------------------------

These functions do not follow the RFC. They are deliberately simplified for better performance.

Functions that extract part of a URL
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If the URL does not contain the relevant part, an empty string is returned.

protocol
""""""""
Extracts the protocol. Examples: http, ftp, mailto, magnet...

domain
""""""
Extracts the domain.

domainWithoutWWW
""""""""""""""""
Extracts the domain and removes no more than one 'www.' from the beginning of it, if present.

topLevelDomain
""""""""""""""
Extracts the top-level domain. Example: .ru.
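
As an illustration, the four functions above can be applied to a single URL in one query. This is only a sketch: the URL and the column aliases are made up for the example, and no output is shown because the exact results depend on the simplified parsing rules described above.

.. code-block :: sql

    -- Hypothetical input URL; the aliases are illustrative names only.
    SELECT
        protocol(url) AS proto,
        domain(url) AS host,
        domainWithoutWWW(url) AS host_without_www,
        topLevelDomain(url) AS tld
    FROM (SELECT 'https://www.example.com/top/news.html' AS url);
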

firstSignificantSubdomain
"""""""""""""""""""""""""
Extracts the "first significant subdomain". This is a non-standard concept specific to Yandex.Metrica. The first significant subdomain is the second-level domain, unless the second-level domain is 'com', 'net', 'org', or 'co', in which case it is the third-level domain. For example, firstSignificantSubdomain('https://news.yandex.ru/') = 'yandex', firstSignificantSubdomain('https://news.yandex.com.tr/') = 'yandex'. The list of "insignificant" second-level domains and other implementation details may change in the future.

cutToFirstSignificantSubdomain
""""""""""""""""""""""""""""""
Extracts the part of the domain from the top-level domain up to and including the "first significant subdomain" (see the explanation above).

For example, ``cutToFirstSignificantSubdomain('https://news.yandex.com.tr/') = 'yandex.com.tr'``.

path
""""
Extracts the path. Example: /top/news.html. The path does not include the query-string.

pathFull
""""""""
The same as path, but including the query-string and fragment. Example: /top/news.html?page=2#comments

queryString
"""""""""""
Extracts the query-string. Example: page=1&lr=213. The query-string does not include the initial question mark, nor the # and everything after it.

fragment
""""""""
Extracts the fragment identifier. The fragment does not include the initial number sign (#).

queryStringAndFragment
""""""""""""""""""""""
Extracts the query-string and fragment identifier. Example: page=1#29390.
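
To compare these five functions side by side, a query along the following lines may help. It is a sketch under assumptions: the URL and the aliases are invented for the example, and the values in the comments are what the descriptions above imply rather than captured server output.

.. code-block :: sql

    -- Hypothetical input URL; comments show the results implied by the descriptions.
    SELECT
        path(url) AS url_path,                        -- '/top/news.html'
        pathFull(url) AS url_path_full,               -- '/top/news.html?page=2#comments'
        queryString(url) AS query,                    -- 'page=2'
        fragment(url) AS frag,                        -- 'comments'
        queryStringAndFragment(url) AS query_and_frag -- 'page=2#comments'
    FROM (SELECT 'https://example.com/top/news.html?page=2#comments' AS url);
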

extractURLParameter(URL, name)
""""""""""""""""""""""""""""""
Returns the value of the 'name' parameter in the URL, if present. Otherwise, returns an empty string. If there are several parameters with this name, the first occurrence is returned. This function assumes that the parameter name is encoded in the URL in exactly the same way as in the passed argument.

extractURLParameters(URL)
"""""""""""""""""""""""""
Returns an array of name=value strings corresponding to the URL parameters. The values are not decoded in any way.

extractURLParameterNames(URL)
"""""""""""""""""""""""""""""
Returns an array of name strings corresponding to the names of the URL parameters. The values are not decoded in any way.
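
The difference between the three parameter functions is easiest to see on one URL. The following is a sketch only: the URL and the aliases are invented for the example, and the values in the comments are what the descriptions above imply rather than captured server output.

.. code-block :: sql

    -- Hypothetical input URL with two parameters.
    SELECT
        extractURLParameter(url, 'lr') AS lr_value,     -- '213'
        extractURLParameters(url) AS name_value_pairs,  -- ['page=1','lr=213']
        extractURLParameterNames(url) AS names          -- ['page','lr']
    FROM (SELECT 'https://example.com/search?page=1&lr=213' AS url);

Because the parameter name is matched as written, a name that appears percent-encoded in the URL would have to be passed in the same encoded form.
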

URLHierarchy(URL)
"""""""""""""""""
Returns an array containing the URL truncated at each ``/`` and ``?`` character in the path and query-string. Consecutive separator characters are counted as one. The cut is made at the position after all the consecutive separator characters.
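
A query such as the following sketch can be used to inspect the resulting array. The URL is only an illustration and the alias is an invented name; the output is not shown here.

.. code-block :: sql

    -- Hypothetical input URL; each array element is a progressively longer prefix of it.
    SELECT URLHierarchy('https://example.com/browse/CONV-6788') AS hierarchy;
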

URLPathHierarchy(URL)
"""""""""""""""""""""
The same as URLHierarchy, but without the protocol and host in the result. The / element (root) is not included. This function is used for implementing tree-view reports by URL in Yandex.Metrica. Example:

.. code-block :: text

    URLPathHierarchy('https://example.com/browse/CONV-6788') =
    [
        '/browse/',
        '/browse/CONV-6788'
    ]

decodeURLComponent(URL)
"""""""""""""""""""""""
Returns the URL with its percent-encoded sequences decoded.

Example:

.. code-block :: sql

    SELECT decodeURLComponent('http://127.0.0.1:8123/?query=SELECT%201%3B') AS DecodedURL;

.. code-block :: text

    ┌─DecodedURL─────────────────────────────┐
    │ http://127.0.0.1:8123/?query=SELECT 1; │
    └────────────────────────────────────────┘

Functions that remove part of a URL
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If the URL does not contain the relevant part, it is returned unchanged.

cutWWW
""""""
Removes no more than one 'www.' from the beginning of the URL's domain, if present.

cutQueryString
""""""""""""""
Removes the query-string. The question mark is also removed.

cutFragment
"""""""""""
Removes the fragment identifier. The number sign is also removed.

cutQueryStringAndFragment
"""""""""""""""""""""""""
Removes the query-string and fragment identifier. The question mark and number sign are also removed.

cutURLParameter(URL, name)
""""""""""""""""""""""""""
Removes the URL parameter named 'name', if present. This function assumes that the parameter name is encoded in the URL in exactly the same way as in the passed argument.
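
As a sketch of how the removal functions compare, the query below applies each of them to the same URL. The URL and the aliases are invented for the example, and the output is deliberately omitted because details such as leftover separators depend on the implementation.

.. code-block :: sql

    -- Hypothetical input URL containing 'www.', two parameters and a fragment.
    SELECT
        cutWWW(url) AS without_www,
        cutQueryString(url) AS without_query,
        cutFragment(url) AS without_fragment,
        cutQueryStringAndFragment(url) AS without_query_and_fragment,
        cutURLParameter(url, 'page') AS without_page_parameter
    FROM (SELECT 'http://www.example.com/top/news.html?page=2&lr=213#comments' AS url);
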