ClickHouse/docs/en/sql-reference/functions/splitting-merging-functions.md

339 lines
9.6 KiB
Markdown
Raw Normal View History

2020-04-03 13:23:32 +00:00
---
sidebar_position: 47
sidebar_label: Splitting and Merging Strings and Arrays
2020-04-03 13:23:32 +00:00
---
# Functions for Splitting and Merging Strings and Arrays {#functions-for-splitting-and-merging-strings-and-arrays}
2020-03-20 10:10:48 +00:00
## splitByChar(separator, s) {#splitbycharseparator-s}
Splits a string into substrings separated by a specified character. It uses a constant string `separator` which consisting of exactly one character.
Returns an array of selected substrings. Empty substrings may be selected if the separator occurs at the beginning or end of the string, or if there are multiple consecutive separators.
**Syntax**
``` sql
2021-05-25 13:03:35 +00:00
splitByChar(separator, s)
```
**Arguments**
- `separator` — The separator which should contain exactly one character. [String](../../sql-reference/data-types/string.md).
- `s` — The string to split. [String](../../sql-reference/data-types/string.md).
**Returned value(s)**
Returns an array of selected substrings. Empty substrings may be selected when:
- A separator occurs at the beginning or end of the string;
- There are multiple consecutive separators;
- The original string `s` is empty.
2021-05-25 13:03:35 +00:00
Type: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)).
**Example**
2020-03-20 10:10:48 +00:00
``` sql
2021-05-25 13:03:35 +00:00
SELECT splitByChar(',', '1,2,3,abcde');
```
2020-03-20 10:10:48 +00:00
``` text
┌─splitByChar(',', '1,2,3,abcde')─┐
│ ['1','2','3','abcde'] │
└─────────────────────────────────┘
```
2020-03-20 10:10:48 +00:00
## splitByString(separator, s) {#splitbystringseparator-s}
Splits a string into substrings separated by a string. It uses a constant string `separator` of multiple characters as the separator. If the string `separator` is empty, it will split the string `s` into an array of single characters.
**Syntax**
``` sql
2021-05-25 13:03:35 +00:00
splitByString(separator, s)
```
**Arguments**
- `separator` — The separator. [String](../../sql-reference/data-types/string.md).
- `s` — The string to split. [String](../../sql-reference/data-types/string.md).
**Returned value(s)**
Returns an array of selected substrings. Empty substrings may be selected when:
2021-05-25 13:03:35 +00:00
Type: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)).
2020-03-20 18:36:14 +00:00
- A non-empty separator occurs at the beginning or end of the string;
- There are multiple consecutive non-empty separators;
- The original string `s` is empty while the separator is not empty.
**Example**
2020-03-20 10:10:48 +00:00
``` sql
2021-05-25 13:03:35 +00:00
SELECT splitByString(', ', '1, 2 3, 4,5, abcde');
```
2020-03-20 10:10:48 +00:00
``` text
┌─splitByString(', ', '1, 2 3, 4,5, abcde')─┐
│ ['1','2 3','4,5','abcde'] │
└───────────────────────────────────────────┘
```
2020-03-20 10:10:48 +00:00
``` sql
2021-05-25 13:03:35 +00:00
SELECT splitByString('', 'abcde');
```
2020-03-20 10:10:48 +00:00
``` text
┌─splitByString('', 'abcde')─┐
│ ['a','b','c','d','e'] │
└────────────────────────────┘
```
## splitByRegexp(regexp, s) {#splitbyregexpseparator-s}
2021-05-19 10:21:34 +00:00
Splits a string into substrings separated by a regular expression. It uses a regular expression string `regexp` as the separator. If the `regexp` is empty, it will split the string `s` into an array of single characters. If no match is found for this regular expression, the string `s` won't be split.
**Syntax**
``` sql
2021-05-25 13:03:35 +00:00
splitByRegexp(regexp, s)
```
**Arguments**
- `regexp` — Regular expression. Constant. [String](../data-types/string.md) or [FixedString](../data-types/fixedstring.md).
- `s` — The string to split. [String](../../sql-reference/data-types/string.md).
**Returned value(s)**
Returns an array of selected substrings. Empty substrings may be selected when:
- A non-empty regular expression match occurs at the beginning or end of the string;
- There are multiple consecutive non-empty regular expression matches;
- The original string `s` is empty while the regular expression is not empty.
2021-05-25 13:03:35 +00:00
Type: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)).
2021-05-19 10:21:34 +00:00
**Example**
2021-05-19 10:21:34 +00:00
Query:
``` sql
2021-05-25 13:03:35 +00:00
SELECT splitByRegexp('\\d+', 'a12bc23de345f');
```
2021-05-19 10:21:34 +00:00
Result:
``` text
2021-05-13 09:21:00 +00:00
┌─splitByRegexp('\\d+', 'a12bc23de345f')─┐
│ ['a','bc','de','f'] │
└────────────────────────────────────────┘
```
2021-05-19 10:21:34 +00:00
Query:
``` sql
2021-05-25 13:03:35 +00:00
SELECT splitByRegexp('', 'abcde');
```
2021-05-19 10:21:34 +00:00
Result:
``` text
┌─splitByRegexp('', 'abcde')─┐
│ ['a','b','c','d','e'] │
└────────────────────────────┘
```
## splitByWhitespace(s) {#splitbywhitespaceseparator-s}
Splits a string into substrings separated by whitespace characters.
Returns an array of selected substrings.
**Syntax**
``` sql
splitByWhitespace(s)
```
**Arguments**
- `s` — The string to split. [String](../../sql-reference/data-types/string.md).
**Returned value(s)**
Returns an array of selected substrings.
Type: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)).
**Example**
``` sql
SELECT splitByWhitespace(' 1! a, b. ');
```
``` text
┌─splitByWhitespace(' 1! a, b. ')─┐
│ ['1!','a,','b.'] │
└─────────────────────────────────────┘
```
## splitByNonAlpha(s) {#splitbynonalphaseparator-s}
Splits a string into substrings separated by whitespace and punctuation characters.
Returns an array of selected substrings.
**Syntax**
``` sql
splitByNonAlpha(s)
```
**Arguments**
- `s` — The string to split. [String](../../sql-reference/data-types/string.md).
**Returned value(s)**
Returns an array of selected substrings.
Type: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)).
**Example**
``` sql
SELECT splitByNonAlpha(' 1! a, b. ');
```
``` text
┌─splitByNonAlpha(' 1! a, b. ')─┐
│ ['1','a','b'] │
└───────────────────────────────────┘
```
2020-03-20 10:10:48 +00:00
## arrayStringConcat(arr\[, separator\]) {#arraystringconcatarr-separator}
2021-10-28 21:40:42 +00:00
Concatenates string representations of values listed in the array with the separator. `separator` is an optional parameter: a constant string, set to an empty string by default.
Returns the string.
2020-03-20 10:10:48 +00:00
## alphaTokens(s) {#alphatokenss}
Selects substrings of consecutive bytes from the ranges a-z and A-Z.Returns an array of substrings.
**Example**
2020-03-20 10:10:48 +00:00
``` sql
2021-05-25 13:03:35 +00:00
SELECT alphaTokens('abca1abc');
DOCAPI-8530: Code blocks markup fix (#7060) * Typo fix. * Links fix. * Fixed links in docs. * More fixes. * docs/en: cleaning some files * docs/en: cleaning data_types * docs/en: cleaning database_engines * docs/en: cleaning development * docs/en: cleaning getting_started * docs/en: cleaning interfaces * docs/en: cleaning operations * docs/en: cleaning query_lamguage * docs/en: cleaning en * docs/ru: cleaning data_types * docs/ru: cleaning index * docs/ru: cleaning database_engines * docs/ru: cleaning development * docs/ru: cleaning general * docs/ru: cleaning getting_started * docs/ru: cleaning interfaces * docs/ru: cleaning operations * docs/ru: cleaning query_language * docs: cleaning interfaces/http * Update docs/en/data_types/array.md decorated ``` Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/getting_started/example_datasets/nyc_taxi.md fixed typo Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/getting_started/example_datasets/ontime.md fixed typo Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/interfaces/formats.md fixed error Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/operations/table_engines/custom_partitioning_key.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/operations/utils/clickhouse-local.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/query_language/dicts/external_dicts_dict_sources.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/operations/utils/clickhouse-local.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/query_language/functions/json_functions.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/query_language/functions/json_functions.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/query_language/functions/other_functions.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/query_language/functions/other_functions.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/query_language/functions/date_time_functions.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/operations/table_engines/jdbc.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * docs: fixed error * docs: fixed error
2019-09-23 15:31:46 +00:00
```
2020-03-20 10:10:48 +00:00
``` text
┌─alphaTokens('abca1abc')─┐
│ ['abca','abc'] │
└─────────────────────────┘
```
2020-03-20 10:10:48 +00:00
## extractAllGroups(text, regexp) {#extractallgroups}
Extracts all groups from non-overlapping substrings matched by a regular expression.
2021-07-29 15:20:55 +00:00
**Syntax**
``` sql
2021-07-29 15:20:55 +00:00
extractAllGroups(text, regexp)
```
2021-07-29 15:20:55 +00:00
**Arguments**
- `text` — [String](../data-types/string.md) or [FixedString](../data-types/fixedstring.md).
- `regexp` — Regular expression. Constant. [String](../data-types/string.md) or [FixedString](../data-types/fixedstring.md).
**Returned values**
- If the function finds at least one matching group, it returns `Array(Array(String))` column, clustered by group_id (1 to N, where N is number of capturing groups in `regexp`).
- If there is no matching group, returns an empty array.
Type: [Array](../data-types/array.md).
**Example**
Query:
``` sql
SELECT extractAllGroups('abc=123, 8="hkl"', '("[^"]+"|\\w+)=("[^"]+"|\\w+)');
```
Result:
``` text
┌─extractAllGroups('abc=123, 8="hkl"', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')─┐
│ [['abc','123'],['8','"hkl"']] │
└───────────────────────────────────────────────────────────────────────┘
```
2021-10-25 16:15:28 +00:00
## ngrams {#ngrams}
Splits the UTF-8 string into n-grams of `ngramsize` symbols.
**Syntax**
``` sql
ngrams(string, ngramsize)
```
**Arguments**
- `string` — String. [String](../../sql-reference/data-types/string.md) or [FixedString](../../sql-reference/data-types/fixedstring.md).
- `ngramsize` — The size of an n-gram. [UInt](../../sql-reference/data-types/int-uint.md).
**Returned values**
- Array with n-grams.
2021-11-24 13:18:38 +00:00
Type: [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)).
2021-10-25 16:15:28 +00:00
**Example**
Query:
``` sql
SELECT ngrams('ClickHouse', 3);
```
Result:
``` text
┌─ngrams('ClickHouse', 3)───────────────────────────┐
│ ['Cli','lic','ick','ckH','kHo','Hou','ous','use'] │
└───────────────────────────────────────────────────┘
```
## tokens {#tokens}
Splits a string into tokens using non-alphanumeric ASCII characters as separators.
**Arguments**
- `input_string` — Any set of bytes represented as the [String](../../sql-reference/data-types/string.md) data type object.
**Returned value**
- The resulting array of tokens from input string.
Type: [Array](../data-types/array.md).
**Example**
Query:
2021-10-25 16:15:28 +00:00
``` sql
SELECT tokens('test1,;\\ test2,;\\ test3,;\\ test4') AS tokens;
```
Result:
2021-10-25 16:15:28 +00:00
``` text
┌─tokens────────────────────────────┐
│ ['test1','test2','test3','test4'] │
└───────────────────────────────────┘
```