mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-11-22 23:52:03 +00:00

rfraposa 869967de41 Remove H1 anchor tags from docs

2022-06-02 04:55:18 -06:00

sidebar_position	sidebar_label
47	Splitting and Merging Strings and Arrays

Functions for Splitting and Merging Strings and Arrays

splitByChar(separator, s)

Splits a string into substrings separated by a specified character. The separator is a constant string consisting of exactly one character. Returns an array of the selected substrings. Empty substrings may be selected if the separator occurs at the beginning or end of the string, or if there are multiple consecutive separators.

Syntax

splitByChar(separator, s)

Arguments

separator — The separator which should contain exactly one character. String.
s — The string to split. String.

Returned value(s)

Returns an array of selected substrings. Empty substrings may be selected when:

A separator occurs at the beginning or end of the string;
There are multiple consecutive separators;
The original string s is empty.

Type: Array(String).

Example

SELECT splitByChar(',', '1,2,3,abcde');

┌─splitByChar(',', '1,2,3,abcde')─┐
│ ['1','2','3','abcde']           │
└─────────────────────────────────┘
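The empty-substring cases listed above can be modeled outside ClickHouse. The following is a minimal Python sketch of the documented semantics, not ClickHouse's implementation; the helper name `split_by_char` is hypothetical.

```python
def split_by_char(separator, s):
    """Hypothetical model of the splitByChar behavior described above."""
    if len(separator) != 1:
        raise ValueError("separator must be exactly one character")
    # str.split with an explicit separator keeps empty substrings,
    # which matches the documented leading/trailing/consecutive cases.
    return s.split(separator)


print(split_by_char(",", "1,2,3,abcde"))  # ['1', '2', '3', 'abcde']
print(split_by_char(",", ",a,,b,"))       # ['', 'a', '', 'b', '']
print(split_by_char(",", ""))             # ['']
```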

splitByString(separator, s)

Splits a string into substrings separated by a string. The separator is a constant string that may contain multiple characters. If the separator is empty, the string s is split into an array of single characters.

Syntax

splitByString(separator, s)

Arguments

separator — The separator. String.
s — The string to split. String.

Returned value(s)

Returns an array of selected substrings. Empty substrings may be selected when:

A non-empty separator occurs at the beginning or end of the string;
There are multiple consecutive non-empty separators;
The original string s is empty while the separator is not empty.

Type: Array(String).

Example

SELECT splitByString(', ', '1, 2 3, 4,5, abcde');

┌─splitByString(', ', '1, 2 3, 4,5, abcde')─┐
│ ['1','2 3','4,5','abcde']                 │
└───────────────────────────────────────────┘

SELECT splitByString('', 'abcde');

┌─splitByString('', 'abcde')─┐
│ ['a','b','c','d','e']      │
└────────────────────────────┘
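The two branches (empty and non-empty separator) can be sketched in Python as follows. This is an illustrative model under the assumption of ASCII input, not ClickHouse's implementation; `split_by_string` is a hypothetical helper name.

```python
def split_by_string(separator, s):
    """Hypothetical model of the splitByString behavior described above."""
    if separator == "":
        # An empty separator splits the string into single characters.
        return list(s)
    # A non-empty separator keeps empty substrings at the edges
    # and between consecutive separators.
    return s.split(separator)


print(split_by_string(", ", "1, 2 3, 4,5, abcde"))  # ['1', '2 3', '4,5', 'abcde']
print(split_by_string("", "abcde"))                 # ['a', 'b', 'c', 'd', 'e']
print(split_by_string("ab", "abxab"))               # ['', 'x', '']
```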

splitByRegexp(regexp, s)

Splits a string into substrings separated by a regular expression. It uses a regular expression string regexp as the separator. If the regexp is empty, it will split the string s into an array of single characters. If no match is found for this regular expression, the string s won't be split.

Syntax

splitByRegexp(regexp, s)

Arguments

regexp — Regular expression. Constant. String or FixedString.
s — The string to split. String.

Returned value(s)

Returns an array of selected substrings. Empty substrings may be selected when:

A non-empty regular expression match occurs at the beginning or end of the string;
There are multiple consecutive non-empty regular expression matches;
The original string s is empty while the regular expression is not empty.

Type: Array(String).

Example

Query:

SELECT splitByRegexp('\\d+', 'a12bc23de345f');

Result:

┌─splitByRegexp('\\d+', 'a12bc23de345f')─┐
│ ['a','bc','de','f']                    │
└────────────────────────────────────────┘

Query:

SELECT splitByRegexp('', 'abcde');

Result:

┌─splitByRegexp('', 'abcde')─┐
│ ['a','b','c','d','e']      │
└────────────────────────────┘
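For comparison, the same behavior can be approximated with Python's `re` module. This is a rough sketch only: Python's regular expression dialect is not the one ClickHouse uses, and the sketch assumes a pattern without capturing groups that never matches the empty string. The name `split_by_regexp` is hypothetical.

```python
import re


def split_by_regexp(regexp, s):
    """Hypothetical model of the splitByRegexp behavior described above."""
    if regexp == "":
        # An empty regexp splits the string into single characters.
        return list(s)
    # If nothing matches, re.split returns the whole string as one element.
    return re.split(regexp, s)


print(split_by_regexp(r"\d+", "a12bc23de345f"))  # ['a', 'bc', 'de', 'f']
print(split_by_regexp(r"\d+", "abc"))            # ['abc']
print(split_by_regexp(r"\d+", "1a2"))            # ['', 'a', '']
```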

splitByWhitespace(s)

Splits a string into substrings separated by whitespace characters. Returns an array of selected substrings.

Syntax

splitByWhitespace(s)

Arguments

s — The string to split. String.

Returned value(s)

Returns an array of selected substrings.

Type: Array(String).

Example

SELECT splitByWhitespace('  1!  a,  b.  ');

┌─splitByWhitespace('  1!  a,  b.  ')─┐
│ ['1!','a,','b.']                    │
└─────────────────────────────────────┘

splitByNonAlpha(s)

Splits a string into substrings separated by whitespace and punctuation characters. Returns an array of selected substrings.

Syntax

splitByNonAlpha(s)

Arguments

s — The string to split. String.

Returned value(s)

Returns an array of selected substrings.

Type: Array(String).

Example

SELECT splitByNonAlpha('  1!  a,  b.  ');

┌─splitByNonAlpha('  1!  a,  b.  ')─┐
│ ['1','a','b']                     │
└───────────────────────────────────┘
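The difference between splitByWhitespace and splitByNonAlpha is easiest to see on the same input. The Python sketch below models both under the assumption that "punctuation" means ASCII punctuation (`string.punctuation`); the helper names are hypothetical and this is not ClickHouse's implementation.

```python
import string


def split_by_whitespace(s):
    # str.split() without arguments splits on runs of whitespace
    # and never produces empty substrings.
    return s.split()


# Assumption: separators are ASCII whitespace and ASCII punctuation.
NON_ALPHA = set(string.whitespace + string.punctuation)


def split_by_non_alpha(s):
    tokens, current = [], []
    for ch in s:
        if ch in NON_ALPHA:
            if current:  # close the token in progress, if any
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


sample = "  1!  a,  b.  "
print(split_by_whitespace(sample))  # ['1!', 'a,', 'b.']
print(split_by_non_alpha(sample))   # ['1', 'a', 'b']
```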

arrayStringConcat(arr[, separator])

Concatenates string representations of values listed in the array with the separator. separator is an optional parameter: a constant string, set to an empty string by default. Returns the string.
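As an illustration of the description above, here is a minimal Python sketch of the joining step. The helper name `array_string_concat` is hypothetical, and Python's `str()` is only a stand-in for however ClickHouse renders non-string values.

```python
def array_string_concat(arr, separator=""):
    """Hypothetical model of the arrayStringConcat behavior described above."""
    # str() stands in for the "string representation" of each value.
    return separator.join(str(x) for x in arr)


print(array_string_concat(["1", "2", "3"], ","))  # 1,2,3
print(array_string_concat(["a", "b", "c"]))       # abc
```

In this model, joining with a one-character separator reverses a split on that character, provided no element contains the separator.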

alphaTokens(s)

Selects substrings of consecutive bytes from the ranges a-z and A-Z. Returns an array of substrings.

Example

SELECT alphaTokens('abca1abc');

┌─alphaTokens('abca1abc')─┐
│ ['abca','abc']          │
└─────────────────────────┘
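The selection rule can be sketched in Python with a single character class. This is an illustrative model with a hypothetical helper name, not ClickHouse's implementation.

```python
import re


def alpha_tokens(s):
    """Hypothetical model of alphaTokens: maximal runs of a-z and A-Z."""
    return re.findall(r"[a-zA-Z]+", s)


print(alpha_tokens("abca1abc"))  # ['abca', 'abc']
print(alpha_tokens("a-b_c 9"))   # ['a', 'b', 'c']
```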

extractAllGroups(text, regexp)

Extracts all groups from non-overlapping substrings matched by a regular expression.

Syntax

extractAllGroups(text, regexp)

Arguments

text — String or FixedString.
regexp — Regular expression. Constant. String or FixedString.

Returned values

If the function finds at least one matching group, it returns an Array(Array(String)) column, clustered by group_id (1 to N, where N is the number of capturing groups in regexp).
If there is no matching group, returns an empty array.

Type: Array.

Example

Query:

SELECT extractAllGroups('abc=123, 8="hkl"', '("[^"]+"|\\w+)=("[^"]+"|\\w+)');

Result:

┌─extractAllGroups('abc=123, 8="hkl"', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')─┐
│ [['abc','123'],['8','"hkl"']]                                         │
└───────────────────────────────────────────────────────────────────────┘
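The shape of the result (one inner array per match, one element per capturing group) can be reproduced in Python as a rough model. Python's `re` is not the engine ClickHouse uses, and `extract_all_groups` is a hypothetical helper name.

```python
import re


def extract_all_groups(text, regexp):
    """Hypothetical model: one list of captured groups per non-overlapping match."""
    # Note: in Python, a group that did not take part in a match yields None.
    return [list(m.groups()) for m in re.finditer(regexp, text)]


pattern = r'("[^"]+"|\w+)=("[^"]+"|\w+)'
print(extract_all_groups('abc=123, 8="hkl"', pattern))
# [['abc', '123'], ['8', '"hkl"']]
print(extract_all_groups("no pairs here", pattern))
# []
```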

ngrams

Splits the UTF-8 string into n-grams of ngramsize symbols.

Syntax

ngrams(string, ngramsize)

Arguments

string — The input string. String or FixedString.
ngramsize — The size of an n-gram. UInt.

Returned values

Array with n-grams.

Type: Array(String).

Example

Query:

SELECT ngrams('ClickHouse', 3);

Result:

┌─ngrams('ClickHouse', 3)───────────────────────────┐
│ ['Cli','lic','ick','ckH','kHo','Hou','ous','use'] │
└───────────────────────────────────────────────────┘
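The sliding window behind this result can be written as a one-line comprehension. The Python sketch below is a model only: it indexes by Unicode code point, and its handling of inputs shorter than the n-gram size is a choice made for the sketch, not something the text above specifies.

```python
def ngrams(s, ngramsize):
    """Hypothetical model of ngrams: every window of ngramsize symbols."""
    if ngramsize < 1:
        raise ValueError("ngramsize must be positive")
    # A string shorter than ngramsize yields an empty list in this sketch.
    return [s[i:i + ngramsize] for i in range(len(s) - ngramsize + 1)]


print(ngrams("ClickHouse", 3))
# ['Cli', 'lic', 'ick', 'ckH', 'kHo', 'Hou', 'ous', 'use']
```

A string of length L produces L - ngramsize + 1 n-grams, which is 8 for 'ClickHouse' and size 3.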

tokens

Splits a string into tokens using non-alphanumeric ASCII characters as separators.

Arguments

input_string — Any set of bytes represented as the String data type object.

Returned value

The resulting array of tokens from input string.

Type: Array.

Example

Query:

SELECT tokens('test1,;\\ test2,;\\ test3,;\\   test4') AS tokens;

Result:

┌─tokens────────────────────────────┐
│ ['test1','test2','test3','test4'] │
└───────────────────────────────────┘
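Read literally, the rule above treats every ASCII character that is not a letter or digit as a separator and leaves everything else inside tokens. The Python sketch below models that reading; it is an assumption-based illustration with a hypothetical helper name, not ClickHouse's implementation.

```python
import re

# A token is a maximal run of characters that are NOT ASCII separators.
# Excluded ranges: 0x00-0x2F, 0x3A-0x40, 0x5B-0x60, 0x7B-0x7F
# (that is, every ASCII character other than 0-9, A-Z, a-z).
TOKEN = re.compile(r"[^\x00-\x2F\x3A-\x40\x5B-\x60\x7B-\x7F]+")


def tokens(input_string):
    """Hypothetical model of the tokens behavior described above."""
    return TOKEN.findall(input_string)


# In this Python literal, "\\" is a single backslash, as in the SQL query above.
print(tokens("test1,;\\ test2,;\\ test3,;\\   test4"))
# ['test1', 'test2', 'test3', 'test4']
```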
