---
toc_priority: 40
toc_title: Working with strings
---
# Functions for Working with Strings {#functions-for-working-with-strings}
## empty {#empty}
Returns 1 for an empty string or 0 for a non-empty string.
The result type is UInt8.
A string is considered non-empty if it contains at least one byte, even if this is a space or a null byte.
The function also works for arrays.
## notEmpty {#notempty}
Returns 0 for an empty string or 1 for a non-empty string.
The result type is UInt8.
The function also works for arrays.
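
As a small illustration of `empty` and `notEmpty`, the query below is a sketch with literal values chosen only for demonstration. It checks an empty string, a string holding a single space, and an empty array:

``` sql
SELECT
    empty('')     AS empty_str,       -- 1: the string has no bytes
    empty(' ')    AS space_str,       -- 0: a space still counts as one byte
    notEmpty(' ') AS space_not_empty, -- 1
    empty([])     AS empty_arr        -- 1: arrays are accepted as well
```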
## length {#length}
Returns the length of a string in bytes (not in characters, and not in code points).
The result type is UInt64.
The function also works for arrays.
## lengthUTF8 {#lengthutf8}
Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it returns some result (it doesn’t throw an exception).
The result type is UInt64.
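
To make the difference between bytes and code points concrete, here is a minimal sketch. It assumes the query text reaches the server as UTF-8, so each Cyrillic letter occupies two bytes:

``` sql
SELECT
    length('Привет')     AS bytes,       -- 12: six letters, two bytes each
    lengthUTF8('Привет') AS code_points  -- 6
```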
## char\_length, CHAR\_LENGTH {#char-length}
Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it returns some result (it doesn’t throw an exception).
The result type is UInt64.
## character\_length, CHARACTER\_LENGTH {#character-length}
Returns the length of a string in Unicode code points (not in characters), assuming that the string contains a set of bytes that make up UTF-8 encoded text. If this assumption is not met, it returns some result (it doesn’t throw an exception).
The result type is UInt64.
## lower, lcase {#lower}
Converts ASCII Latin symbols in a string to lowercase.
## upper, ucase {#upper}
Converts ASCII Latin symbols in a string to uppercase.
## lowerUTF8 {#lowerutf8}
Converts a string to lowercase, assuming the string contains a set of bytes that make up a UTF-8 encoded text.
It doesn’t detect the language, so for Turkish the result might not be exactly correct.
If the length of the UTF-8 byte sequence is different for upper and lower case of a code point, the result may be incorrect for this code point.
If the string contains a set of bytes that is not UTF-8, then the behavior is undefined.
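
The contrast between the ASCII-only and the UTF-8-aware variants can be sketched as follows. The sample string is an assumption made for illustration, and the query text is assumed to be UTF-8:

``` sql
SELECT
    lower('Hello, МИР')     AS ascii_only, -- 'hello, МИР': non-ASCII bytes are left untouched
    lowerUTF8('Hello, МИР') AS utf8_aware  -- 'hello, мир'
```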
## upperUTF8 {#upperutf8}
Converts a string to uppercase, assuming the string contains a set of bytes that make up a UTF-8 encoded text.
It doesn’t detect the language, so for Turkish the result might not be exactly correct.
If the length of the UTF-8 byte sequence is different for upper and lower case of a code point, the result may be incorrect for this code point.
If the string contains a set of bytes that is not UTF-8, then the behavior is undefined.
## isValidUTF8 {#isvalidutf8}
Returns 1 if the set of bytes is valid UTF-8, otherwise 0.
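
A short sketch of the check, using a plain ASCII string and a deliberately malformed byte sequence written with `\x` escapes (the inputs are illustrative only):

``` sql
SELECT
    isValidUTF8('abc')                   AS ok,  -- 1
    isValidUTF8('\x61\xF0\x80\x80\x80b') AS bad  -- 0: F0 80 80 80 is not a valid UTF-8 sequence
```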
2019-04-07 18:58:13 +00:00
2020-03-19 15:32:53 +00:00
## toValidUTF8 {#tovalidutf8}
2019-05-17 12:55:21 +00:00
2019-05-20 14:41:10 +00:00
Replaces invalid UTF-8 characters by the `<60> ` (U+FFFD) character. All running in a row invalid characters are collapsed into the one replacement character.
2020-03-20 10:10:48 +00:00
``` sql
toValidUTF8(input_string)
```
Parameters:
- input\_string — Any set of bytes represented as the [String](../../sql-reference/data-types/string.md) data type object.

Returned value: Valid UTF-8 string.
**Example**
``` sql
SELECT toValidUTF8('\x61\xF0\x80\x80\x80b')
```
2020-03-20 10:10:48 +00:00
``` text
2019-05-23 11:37:05 +00:00
┌─toValidUTF8('a<> <61> <EFBFBD> <EFBFBD> b')─┐
│ a<> b │
└───────────────────────┘
2019-05-20 14:41:10 +00:00
```
2019-05-17 12:55:21 +00:00
2020-03-18 18:43:51 +00:00
## repeat {#repeat}
Repeats a string as many times as specified and concatenates the replicated values as a single string.
**Syntax**
``` sql
repeat(s, n)
```
**Parameters**
- `s` — The string to repeat. [String](../../sql-reference/data-types/string.md).
- `n` — The number of times to repeat the string. [UInt](../../sql-reference/data-types/int-uint.md).
**Returned value**
A single string containing the string `s` repeated `n` times. If `n` \< 1, the function returns an empty string.

Type: `String`.
**Example**
Query:
``` sql
SELECT repeat('abc', 10)
```
Result:
``` text
┌─repeat('abc', 10)──────────────┐
│ abcabcabcabcabcabcabcabcabcabc │
└────────────────────────────────┘
```
## reverse {#reverse}
Reverses the string (as a sequence of bytes).
## reverseUTF8 {#reverseutf8}
Reverses a sequence of Unicode code points, assuming that the string contains a set of bytes representing UTF-8 text. Otherwise, it returns some other result (it doesn’t throw an exception).
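
The byte-wise and code-point-wise variants can be compared with a sketch like the one below. The sample strings are illustrative, and the query text is assumed to be UTF-8:

``` sql
SELECT
    reverse('abc')        AS bytes_reversed,       -- 'cba'
    reverseUTF8('Привет') AS code_points_reversed  -- 'тевирП'
```

Applying `reverse` to the Cyrillic string would reorder the individual bytes of each two-byte sequence, so the result would no longer be valid UTF-8.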
## format(pattern, s0, s1, …) {#format}
Formats a constant pattern with the strings listed in the arguments. `pattern` is a simplified Python format pattern. The format string contains “replacement fields” surrounded by curly braces `{}`. Anything that is not contained in braces is considered literal text, which is copied unchanged to the output. If you need to include a brace character in the literal text, it can be escaped by doubling: `{{ '{{' }}` and `{{ '}}' }}`. Field names can be numbers (starting from zero) or empty (then they are treated as consecutive numbers).
``` sql
SELECT format('{1} {0} {1}', 'World', 'Hello')
```
``` text
┌─format('{1} {0} {1}', 'World', 'Hello')─┐
│ Hello World Hello                       │
└─────────────────────────────────────────┘
```
``` sql
SELECT format('{} {}', 'Hello', 'World')
```
``` text
┌─format('{} {}', 'Hello', 'World')─┐
│ Hello World                       │
└───────────────────────────────────┘
```
## concat {#concat}
Concatenates the strings listed in the arguments, without a separator.
**Syntax**

``` sql
concat(s1, s2, ...)
```
**Parameters**

Values of type String or FixedString.

**Returned values**

Returns the String that results from concatenating the arguments.

If any of the argument values is `NULL`, `concat` returns `NULL`.

**Example**
Query:
``` sql
SELECT concat('Hello, ', 'World!')
```
Result:
``` text
┌─concat('Hello, ', 'World!')─┐
│ Hello, World!               │
└─────────────────────────────┘
```
## concatAssumeInjective {#concatassumeinjective}
Same as [concat](#concat), except that you must ensure that `concat(s1, s2, ...) → sn` is injective. This property is used to optimize GROUP BY.

A function is called “injective” if it always returns different results for different argument values. In other words, different arguments never yield an identical result.

**Syntax**

``` sql
concatAssumeInjective(s1, s2, ...)
```
**Parameters**
Values of type String or FixedString.
**Returned values**
Returns the String that results from concatenating the arguments.

If any of the argument values is `NULL`, `concatAssumeInjective` returns `NULL`.

**Example**
Input table:
``` sql
CREATE TABLE key_val(`key1` String, `key2` String, `value` UInt32) ENGINE = TinyLog;
INSERT INTO key_val VALUES ('Hello, ','World',1), ('Hello, ','World',2), ('Hello, ','World!',3), ('Hello',', World!',2);
SELECT * FROM key_val;
```
``` text
┌─key1────┬─key2─────┬─value─┐
│ Hello,  │ World    │     1 │
│ Hello,  │ World    │     2 │
│ Hello,  │ World!   │     3 │
│ Hello   │ , World! │     2 │
└─────────┴──────────┴───────┘
```
Query:
``` sql
SELECT concat(key1, key2), sum(value) FROM key_val GROUP BY concatAssumeInjective(key1, key2)
```
Result:
``` text
┌─concat(key1, key2)─┬─sum(value)─┐
│ Hello, World!      │          3 │
│ Hello, World!      │          2 │
│ Hello, World       │          3 │
└────────────────────┴────────────┘
```
## substring(s, offset, length), mid(s, offset, length), substr(s, offset, length) {#substring}
Returns a substring starting with the byte from the ‘offset’ index that is ‘length’ bytes long. Character indexing starts from one (as in standard SQL). The ‘offset’ and ‘length’ arguments must be constants.
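
A minimal sketch of one-based byte indexing, with an ASCII sample string chosen for illustration so that bytes and characters coincide:

``` sql
SELECT substring('Hello, world!', 8, 5) AS s -- 'world': bytes 8 through 12
```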
## substringUTF8(s, offset, length) {#substringutf8}
The same as ‘substring’, but for Unicode code points. Works under the assumption that the string contains a set of bytes representing a UTF-8 encoded text. If this assumption is not met, it returns some result (it doesn’t throw an exception).
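
For non-ASCII text, offsets and lengths are counted in code points. The sketch below assumes the query text is UTF-8; the sample string is illustrative:

``` sql
SELECT substringUTF8('Привет, мир', 9, 3) AS s -- 'мир': code points 9 through 11
```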
## appendTrailingCharIfAbsent(s, c) {#appendtrailingcharifabsent}
If the ‘s’ string is non-empty and does not contain the ‘c’ character at the end, it appends the ‘c’ character to the end.
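
The three cases described above (character missing, character already present, empty input) can be sketched like this, using a hypothetical path string as sample data:

``` sql
SELECT
    appendTrailingCharIfAbsent('/var/log', '/')  AS added,      -- '/var/log/'
    appendTrailingCharIfAbsent('/var/log/', '/') AS unchanged,  -- '/var/log/'
    appendTrailingCharIfAbsent('', '/')          AS still_empty -- '': an empty string is left as is
```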
## convertCharset(s, from, to) {#convertcharset}
Returns the string ‘s’ converted from the encoding ‘from’ to the encoding ‘to’.
## base64Encode(s) {#base64encode}
Encodes the ‘s’ string into base64.
## base64Decode(s) {#base64decode}
Decodes the base64-encoded string ‘s’ into the original string. Raises an exception in case of failure.
## tryBase64Decode(s) {#trybase64decode}
Similar to base64Decode, but returns an empty string in case of error.
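
The three base64 functions can be tried together in a sketch such as the following. The inputs are illustrative; the last one contains characters outside the base64 alphabet:

``` sql
SELECT
    base64Encode('Hello')          AS encoded,  -- 'SGVsbG8='
    base64Decode('SGVsbG8=')       AS decoded,  -- 'Hello'
    tryBase64Decode('not base64!') AS fallback  -- '': invalid input yields an empty string
```

Passing the same invalid input to `base64Decode` would raise an exception instead.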
## endsWith(s, suffix) {#endswith}
Returns 1 if the string ends with the specified suffix, otherwise it returns 0.
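
A brief sketch with a hypothetical file name as sample data:

``` sql
SELECT
    endsWith('report.csv', '.csv') AS is_csv, -- 1
    endsWith('report.csv', '.tsv') AS is_tsv  -- 0
```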
## startsWith(str, prefix) {#startswith}
Returns 1 if the string starts with the specified prefix, otherwise it returns 0.
``` sql
SELECT startsWith('Spider-Man', 'Spi');
```
**Returned values**
- 1, if the string starts with the specified prefix.
- 0, if the string doesn’t start with the specified prefix.

**Example**
Query:
``` sql
SELECT startsWith('Hello, world!', 'He');
```
Result:
``` text
┌─startsWith('Hello, world!', 'He')─┐
│                                 1 │
└───────────────────────────────────┘
```
## trim {#trim}
Removes all specified characters from the start or end of a string.
By default, it removes all consecutive occurrences of common whitespace (ASCII character 32) from both ends of a string.
**Syntax**
``` sql
trim([[LEADING|TRAILING|BOTH] trim_character FROM] input_string)
```
**Parameters**
- `trim_character` — characters to trim. [String](../../sql-reference/data-types/string.md).
- `input_string` — string to trim. [String](../../sql-reference/data-types/string.md).
**Returned value**
A string without the specified leading and/or trailing characters.

Type: `String`.
**Example**
Query:
``` sql
SELECT trim(BOTH ' ()' FROM '( Hello, world! )')
```
Result:
``` text
┌─trim(BOTH ' ()' FROM '( Hello, world! )')─┐
│ Hello, world!                             │
└───────────────────────────────────────────┘
```
## trimLeft {#trimleft}
Removes all consecutive occurrences of common whitespace (ASCII character 32) from the beginning of a string. It doesn’t remove other kinds of whitespace characters (tab, no-break space, etc.).
**Syntax**

``` sql
trimLeft(input_string)
```
Alias: `ltrim(input_string)`.

**Parameters**

- `input_string` — string to trim. [String](../../sql-reference/data-types/string.md).

**Returned value**
A string without leading common whitespace.

Type: `String`.
**Example**
Query:
``` sql
SELECT trimLeft(' Hello, world! ')
```
Result:
``` text
┌─trimLeft(' Hello, world! ')─┐
│ Hello, world!               │
└─────────────────────────────┘
```
## trimRight {#trimright}
Removes all consecutive occurrences of common whitespace (ASCII character 32) from the end of a string. It doesn’t remove other kinds of whitespace characters (tab, no-break space, etc.).
**Syntax**

``` sql
trimRight(input_string)
```
Alias: `rtrim(input_string)`.

**Parameters**

- `input_string` — string to trim. [String](../../sql-reference/data-types/string.md).

**Returned value**
A string without trailing common whitespace.

Type: `String`.

**Example**
Query:
``` sql
SELECT trimRight(' Hello, world! ')
```
Result:
``` text
┌─trimRight(' Hello, world! ')─┐
│  Hello, world!               │
└──────────────────────────────┘
```
## trimBoth {#trimboth}
Removes all consecutive occurrences of common whitespace (ASCII character 32) from both ends of a string. It doesn’t remove other kinds of whitespace characters (tab, no-break space, etc.).
**Syntax**

``` sql
trimBoth(input_string)
```
Alias: `trim(input_string)`.

**Parameters**

- `input_string` — string to trim. [String](../../sql-reference/data-types/string.md).

**Returned value**
A string without leading and trailing common whitespace.

Type: `String`.
**Example**
Query:
``` sql
SELECT trimBoth(' Hello, world! ')
```
Result:
``` text
┌─trimBoth(' Hello, world! ')─┐
│ Hello, world!               │
└─────────────────────────────┘
```
## CRC32(s) {#crc32}
Returns the CRC32 checksum of a string, using the CRC-32-IEEE 802.3 polynomial and initial value `0xffffffff` (zlib implementation).
The result type is UInt32.
## CRC32IEEE(s) {#crc32ieee}
Returns the CRC32 checksum of a string, using the CRC-32-IEEE 802.3 polynomial.
The result type is UInt32.
## CRC64(s) {#crc64}
Returns the CRC64 checksum of a string, using the CRC-64-ECMA polynomial.
The result type is UInt64.
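
The checksum functions can be called side by side as in the sketch below. The input string is arbitrary, and the numeric checksum values are intentionally not shown here; only the result types stated above are noted:

``` sql
SELECT
    CRC32('ClickHouse')             AS crc32,
    CRC32IEEE('ClickHouse')         AS crc32_ieee,
    CRC64('ClickHouse')             AS crc64,
    toTypeName(CRC32('ClickHouse')) AS crc32_type, -- 'UInt32'
    toTypeName(CRC64('ClickHouse')) AS crc64_type  -- 'UInt64'
```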
[Original article](https://clickhouse.tech/docs/en/query_language/functions/string_functions/) <!--hide-->