---
slug: /en/sql-reference/functions/string-search-functions
sidebar_position: 160
sidebar_label: Searching in Strings
---

# Functions for Searching in Strings

All functions in this section search case-sensitively by default. Case-insensitive search is usually provided by separate function variants.

:::note
Case-insensitive search follows the lowercase-uppercase rules of the English language. For example, the uppercase form of `i` is `I` in English,
whereas in Turkish it is `İ`; results for languages other than English may be unexpected.
:::

Functions in this section also assume that the searched string (referred to in this section as `haystack`) and the search string (referred to in this section as `needle`) are single-byte encoded text. If this assumption is
violated, no exception is thrown and results are undefined. Search with UTF-8 encoded strings is usually provided by separate function
variants. Likewise, if a UTF-8 function variant is used and the input strings are not UTF-8 encoded text, no exception is thrown and the
results are undefined. Note that no automatic Unicode normalization is performed; you can use the
[normalizeUTF8*()](string-functions.md) functions for that.

[General string functions](string-functions.md) and [functions for replacing in strings](string-replace-functions.md) are described separately.

## position

Returns the position (in bytes, starting at 1) of a substring `needle` in a string `haystack`.

**Syntax**

``` sql
position(haystack, needle[, start_pos])
```

Alias:
- `position(needle IN haystack)`

**Arguments**

- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Substring to be searched. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `start_pos` – Position (1-based) in `haystack` at which the search starts. [UInt](../data-types/int-uint.md). Optional.

**Returned value**

- Starting position in bytes and counting from 1, if the substring was found. [UInt64](../data-types/int-uint.md).
- 0, if the substring was not found. [UInt64](../data-types/int-uint.md).

If substring `needle` is empty, these rules apply:
- if no `start_pos` was specified: return `1`
- if `start_pos = 0`: return `1`
- if `start_pos >= 1` and `start_pos <= length(haystack) + 1`: return `start_pos`
- otherwise: return `0`

The same rules also apply to functions `locate`, `positionCaseInsensitive`, `positionUTF8` and `positionCaseInsensitiveUTF8`.

**Examples**

Query:

``` sql
SELECT position('Hello, world!', '!');
```

Result:

``` text
┌─position('Hello, world!', '!')─┐
│ 13 │
└────────────────────────────────┘
```

Example with `start_pos` argument:

Query:

``` sql
SELECT
    position('Hello, world!', 'o', 1),
    position('Hello, world!', 'o', 7)
```

Result:

``` text
┌─position('Hello, world!', 'o', 1)─┬─position('Hello, world!', 'o', 7)─┐
│ 5 │ 9 │
└───────────────────────────────────┴───────────────────────────────────┘
```

Example for `needle IN haystack` syntax:

Query:

```sql
SELECT 6 = position('/' IN s) FROM (SELECT 'Hello/World' AS s);
```

Result:

```text
┌─equals(6, position(s, '/'))─┐
│ 1 │
└─────────────────────────────┘
```

Examples with empty `needle` substring:

Query:

``` sql
SELECT
    position('abc', ''),
    position('abc', '', 0),
    position('abc', '', 1),
    position('abc', '', 2),
    position('abc', '', 3),
    position('abc', '', 4),
    position('abc', '', 5)
```

Result:

``` text
┌─position('abc', '')─┬─position('abc', '', 0)─┬─position('abc', '', 1)─┬─position('abc', '', 2)─┬─position('abc', '', 3)─┬─position('abc', '', 4)─┬─position('abc', '', 5)─┐
│ 1 │ 1 │ 1 │ 2 │ 3 │ 4 │ 0 │
└─────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┘
```
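The byte-offset and empty-needle rules above can be condensed into a small Python model, which may help when reasoning about edge cases. It is a sketch, not ClickHouse's implementation: it assumes the inputs are UTF-8 encoded before counting bytes, and treating a zero `start_pos` with a non-empty `needle` as "search from the first byte" is an assumption this page does not state.

```python
from typing import Optional


def position(haystack: str, needle: str, start_pos: Optional[int] = None) -> int:
    """Model of `position`: 1-based byte offset of needle in haystack, 0 if absent."""
    h = haystack.encode("utf-8")
    n = needle.encode("utf-8")

    if not n:
        # Empty-needle rules as listed in this section.
        if start_pos is None or start_pos == 0:
            return 1
        if 1 <= start_pos <= len(h) + 1:
            return start_pos
        return 0

    # Assumption: a missing or zero start_pos means "search from the first byte".
    begin = start_pos - 1 if start_pos else 0
    idx = h.find(n, begin)
    return idx + 1 if idx >= 0 else 0


print(position("Hello, world!", "!"))                        # 13
print(position("Hello, world!", "o", 7))                     # 9
print([position("abc", "", p) for p in (0, 1, 2, 3, 4, 5)])  # [1, 1, 2, 3, 4, 0]
```

The printed values reproduce the query results shown above.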

## locate

Like [position](#position) but with arguments `haystack` and `needle` switched.

The behavior of this function depends on the ClickHouse version:
- in versions < v24.3, `locate` was an alias of function `position` and accepted arguments `(haystack, needle[, start_pos])`.
- in versions >= v24.3, `locate` is an individual function (for better compatibility with MySQL) and accepts arguments `(needle, haystack[, start_pos])`. The previous behavior can be restored using setting [function_locate_has_mysql_compatible_argument_order = false](../../operations/settings/settings.md#function-locate-has-mysql-compatible-argument-order).

**Syntax**

``` sql
locate(needle, haystack[, start_pos])
```
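The version-dependent argument order can be illustrated with a minimal Python sketch. The parameter `mysql_compatible_order` is hypothetical and merely stands in for the `function_locate_has_mysql_compatible_argument_order` setting; `start_pos` is omitted for brevity.

```python
def position(haystack: bytes, needle: bytes) -> int:
    """1-based byte offset of needle in haystack, 0 if absent (an empty needle gives 1)."""
    idx = haystack.find(needle)
    return idx + 1 if idx >= 0 else 0


def locate(first: bytes, second: bytes, mysql_compatible_order: bool = True) -> int:
    """Model of `locate`; mysql_compatible_order is a hypothetical stand-in for the setting."""
    if mysql_compatible_order:
        needle, haystack = first, second  # v24.3+ default: locate(needle, haystack)
    else:
        haystack, needle = first, second  # previous behavior: same order as position
    return position(haystack, needle)


print(locate(b"!", b"Hello, world!"))                                # 13
print(locate(b"Hello, world!", b"!", mysql_compatible_order=False))  # 13
```

Passing the arguments in the wrong order silently searches the wrong string, so `locate(b"Hello, world!", b"!")` returns 0 under the default order.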

## positionCaseInsensitive

A case-insensitive variant of [position](#position).

**Example**

Query:

``` sql
SELECT positionCaseInsensitive('Hello, world!', 'hello');
```

Result:

``` text
┌─positionCaseInsensitive('Hello, world!', 'hello')─┐
│ 1 │
└───────────────────────────────────────────────────┘
```
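A minimal Python model of the single-byte case-insensitive search lowercases both inputs before searching. It assumes that only ASCII letters are folded (Python's `bytes.lower()` changes only `A`-`Z`), which is an assumption about the non-UTF-8 variant rather than something this page states; for non-ASCII text, the UTF-8 variant is the intended choice.

```python
def position_case_insensitive(haystack: bytes, needle: bytes) -> int:
    # bytes.lower() lowercases only the ASCII letters A-Z and leaves all other bytes unchanged.
    idx = haystack.lower().find(needle.lower())
    return idx + 1 if idx >= 0 else 0


print(position_case_insensitive(b"Hello, world!", b"hello"))  # 1

# Under this model, non-ASCII letters are not folded: 'É' and 'é' differ byte-wise.
print(position_case_insensitive("CAF\u00c9".encode("utf-8"), "caf\u00e9".encode("utf-8")))  # 0
```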

## positionUTF8

Like [position](#position) but assumes `haystack` and `needle` are UTF-8 encoded strings.

**Examples**

Function `positionUTF8` correctly counts character `ö` (represented by two bytes) as a single Unicode codepoint:

Query:

``` sql
SELECT positionUTF8('Motörhead', 'r');
```

Result:

``` text
┌─positionUTF8('Motörhead', 'r')─┐
│ 5 │
└────────────────────────────────┘
```
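The difference between byte counting (`position`) and code-point counting (`positionUTF8`) can be shown with a short Python sketch; Python `str` indexes code points, while the encoded `bytes` index bytes. This is an illustration of the counting units, not ClickHouse's implementation.

```python
def position_bytes(haystack: str, needle: str) -> int:
    """Byte-based counting, as in `position`."""
    idx = haystack.encode("utf-8").find(needle.encode("utf-8"))
    return idx + 1 if idx >= 0 else 0


def position_code_points(haystack: str, needle: str) -> int:
    """Code-point-based counting, as in `positionUTF8`."""
    idx = haystack.find(needle)
    return idx + 1 if idx >= 0 else 0


word = "Mot\u00f6rhead"  # 'Motörhead' with precomposed U+00F6 (two bytes in UTF-8)
print(position_bytes(word, "r"))        # 6
print(position_code_points(word, "r"))  # 5
```

Because no Unicode normalization is performed, a decomposed spelling (`o` followed by U+0308 COMBINING DIAERESIS) has one more code point, so `position_code_points("Moto\u0308rhead", "r")` gives 6 in this model.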

## positionCaseInsensitiveUTF8

Like [positionUTF8](#positionutf8) but searches case-insensitively.

## multiSearchAllPositions

Like [position](#position) but returns an array of positions (in bytes, starting at 1) for multiple `needle` substrings in a `haystack` string.

:::note
All `multiSearch*()` functions only support up to 2<sup>8</sup> needles.
:::

**Syntax**

``` sql
multiSearchAllPositions(haystack, [needle1, needle2, ..., needleN])
```

**Arguments**

- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Substrings to be searched. [Array](../data-types/array.md).

**Returned value**

- Array of starting positions in bytes, counting from 1, with one element per `needle`. An element is 0 if the corresponding substring was not found.

**Example**

Query:

``` sql
SELECT multiSearchAllPositions('Hello, World!', ['hello', '!', 'world']);
```

Result:

``` text
┌─multiSearchAllPositions('Hello, World!', ['hello', '!', 'world'])─┐
│ [0,13,0] │
└───────────────────────────────────────────────────────────────────┘
```
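The per-needle result can be modeled in a few lines of Python: each needle gets the 1-based byte position of its first occurrence, or 0. This is a behavioral sketch only; it does not reflect how ClickHouse searches many needles at once.

```python
from typing import List


def multi_search_all_positions(haystack: bytes, needles: List[bytes]) -> List[int]:
    """One 1-based byte position per needle (first occurrence), 0 where a needle is absent."""
    positions = []
    for needle in needles:
        idx = haystack.find(needle)
        positions.append(idx + 1 if idx >= 0 else 0)
    return positions


print(multi_search_all_positions(b"Hello, World!", [b"hello", b"!", b"world"]))  # [0, 13, 0]
```

Lowercasing the haystack and needles first (ASCII only) gives a rough model of the case-insensitive variant: `multi_search_all_positions(b"clickhouse", [b"c", b"h"])` returns `[1, 6]`.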

## multiSearchAllPositionsCaseInsensitive

Like [multiSearchAllPositions](#multisearchallpositions) but ignores case.

**Syntax**

```sql
multiSearchAllPositionsCaseInsensitive(haystack, [needle1, needle2, ..., needleN])
```

**Parameters**

- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Substrings to be searched. [Array](../data-types/array.md).

**Returned value**

- Array of starting positions in bytes, counting from 1, with one element per `needle`. An element is 0 if the corresponding substring was not found.

**Example**

Query:

```sql
SELECT multiSearchAllPositionsCaseInsensitive('ClickHouse',['c','h']);
```

Result:

```response
["1","6"]
```

## multiSearchAllPositionsUTF8

Like [multiSearchAllPositions](#multisearchallpositions) but assumes `haystack` and the `needle` substrings are UTF-8 encoded strings.

**Syntax**

```sql
multiSearchAllPositionsUTF8(haystack, [needle1, needle2, ..., needleN])
```

**Parameters**

- `haystack` — UTF-8 encoded string in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — UTF-8 encoded substrings to be searched. [Array](../data-types/array.md).

**Returned value**

- Array of starting positions, counting from 1, with one element per `needle`. An element is 0 if the corresponding substring was not found.

**Example**

Given `ClickHouse` as a UTF-8 string, find the positions of `C` (`\x43`) and `H` (`\x48`).

Query:

```sql
SELECT multiSearchAllPositionsUTF8('\x43\x6c\x69\x63\x6b\x48\x6f\x75\x73\x65',['\x43','\x48']);
```

Result:

```response
["1","6"]
```

## multiSearchAllPositionsCaseInsensitiveUTF8

Like [multiSearchAllPositionsUTF8](#multisearchallpositionsutf8) but ignores case.

**Syntax**

```sql
multiSearchAllPositionsCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
```

**Parameters**

- `haystack` — UTF-8 encoded string in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — UTF-8 encoded substrings to be searched. [Array](../data-types/array.md).

**Returned value**

- Array of starting positions, counting from 1, with one element per `needle`. An element is 0 if the corresponding substring was not found.

**Example**

Given `ClickHouse` as a UTF-8 string, find the positions of `c` (`\x63`) and `h` (`\x68`).

Query:

```sql
SELECT multiSearchAllPositionsCaseInsensitiveUTF8('\x43\x6c\x69\x63\x6b\x48\x6f\x75\x73\x65',['\x63','\x68']);
```

Result:

```response
["1","6"]
```

## multiSearchFirstPosition

Like [`position`](#position) but returns the leftmost offset in a `haystack` string which matches any of multiple `needle` strings.

Functions [`multiSearchFirstPositionCaseInsensitive`](#multisearchfirstpositioncaseinsensitive), [`multiSearchFirstPositionUTF8`](#multisearchfirstpositionutf8) and [`multiSearchFirstPositionCaseInsensitiveUTF8`](#multisearchfirstpositioncaseinsensitiveutf8) provide case-insensitive and/or UTF-8 variants of this function.

**Syntax**

```sql
multiSearchFirstPosition(haystack, [needle1, needle2, ..., needleN])
```

**Parameters**

- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Substrings to be searched. [Array](../data-types/array.md).

**Returned value**

- Leftmost offset in a `haystack` string which matches any of multiple `needle` strings.
- 0, if there was no match.

**Example**

Query:

```sql
SELECT multiSearchFirstPosition('Hello World',['llo', 'Wor', 'ld']);
```

Result:

```response
3
```
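"Leftmost offset which matches any needle" is the minimum over the first-occurrence positions of all needles that are found. A Python sketch of that rule (a behavioral model, not the library's multi-needle search):

```python
from typing import List


def multi_search_first_position(haystack: bytes, needles: List[bytes]) -> int:
    """Smallest 1-based byte offset at which any needle occurs, 0 if none occurs."""
    offsets = [haystack.find(needle) for needle in needles]
    found = [offset for offset in offsets if offset >= 0]
    return min(found) + 1 if found else 0


print(multi_search_first_position(b"Hello World", [b"llo", b"Wor", b"ld"]))  # 3
```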

## multiSearchFirstPositionCaseInsensitive

Like [`multiSearchFirstPosition`](#multisearchfirstposition) but ignores case.

**Syntax**

```sql
multiSearchFirstPositionCaseInsensitive(haystack, [needle1, needle2, ..., needleN])
```

**Parameters**

- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Array of substrings to be searched. [Array](../data-types/array.md).

**Returned value**

- Leftmost offset in a `haystack` string which matches any of multiple `needle` strings.
- 0, if there was no match.

**Example**

Query:

```sql
SELECT multiSearchFirstPositionCaseInsensitive('HELLO WORLD',['wor', 'ld', 'ello']);
```

Result:

```response
2
```

## multiSearchFirstPositionUTF8

Like [`multiSearchFirstPosition`](#multisearchfirstposition) but assumes `haystack` and `needle` to be UTF-8 strings.

**Syntax**

```sql
multiSearchFirstPositionUTF8(haystack, [needle1, needle2, ..., needleN])
```

**Parameters**

- `haystack` — UTF-8 string in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Array of UTF-8 substrings to be searched. [Array](../data-types/array.md).

**Returned value**

- Leftmost offset in a `haystack` string which matches any of multiple `needle` strings.
- 0, if there was no match.

**Example**

Find the leftmost offset in UTF-8 string `hello world` which matches any of the given needles.

Query:

```sql
SELECT multiSearchFirstPositionUTF8('\x68\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64',['wor', 'ld', 'ello']);
```

Result:

```response
2
```

## multiSearchFirstPositionCaseInsensitiveUTF8

Like [`multiSearchFirstPosition`](#multisearchfirstposition) but assumes `haystack` and `needle` to be UTF-8 strings and ignores case.

**Syntax**

```sql
multiSearchFirstPositionCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
```

**Parameters**

- `haystack` — UTF-8 string in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Array of UTF-8 substrings to be searched. [Array](../data-types/array.md).

**Returned value**

- Leftmost offset in a `haystack` string which matches any of multiple `needle` strings, ignoring case.
- 0, if there was no match.

**Example**

Find the leftmost offset in UTF-8 string `HELLO WORLD` which matches any of the given needles.

Query:

```sql
SELECT multiSearchFirstPositionCaseInsensitiveUTF8('\x48\x45\x4c\x4c\x4f\x20\x57\x4f\x52\x4c\x44',['wor', 'ld', 'ello']);
```

Result:

```response
2
```

## multiSearchFirstIndex

Returns the index `i` (starting from 1) of the leftmost needle<sub>i</sub> in the needle array that is found in the string `haystack`, and 0 otherwise.

Functions [`multiSearchFirstIndexCaseInsensitive`](#multisearchfirstindexcaseinsensitive), [`multiSearchFirstIndexUTF8`](#multisearchfirstindexutf8) and [`multiSearchFirstIndexCaseInsensitiveUTF8`](#multisearchfirstindexcaseinsensitiveutf8) provide case-insensitive and/or UTF-8 variants of this function.

**Syntax**

```sql
multiSearchFirstIndex(haystack, [needle1, needle2, ..., needleN])
```

**Parameters**

- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Substrings to be searched. [Array](../data-types/array.md).

**Returned value**

- Index (starting from 1) of the leftmost found needle, or 0 if there was no match. [UInt8](../data-types/int-uint.md).

**Example**

Query:

```sql
SELECT multiSearchFirstIndex('Hello World',['World','Hello']);
```

Result:

```response
1
```
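In the example above the result is 1 even though `Hello` occurs before `World` in the haystack, which suggests that "leftmost" refers to the order of the needle array. A Python model under that reading (an interpretation of the documented example, not a statement about the implementation):

```python
from typing import List


def multi_search_first_index(haystack: bytes, needles: List[bytes]) -> int:
    """1-based index of the first needle, in array order, that occurs in haystack; 0 if none."""
    for index, needle in enumerate(needles, start=1):
        if needle in haystack:
            return index
    return 0


print(multi_search_first_index(b"Hello World", [b"World", b"Hello"]))  # 1
print(multi_search_first_index(b"Hello World", [b"xyz", b"World"]))    # 2
```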

## multiSearchFirstIndexCaseInsensitive

Returns the index `i` (starting from 1) of the leftmost needle<sub>i</sub> in the needle array that is found in the string `haystack`, and 0 otherwise. Ignores case.

**Syntax**

```sql
multiSearchFirstIndexCaseInsensitive(haystack, [needle1, needle2, ..., needleN])
```

**Parameters**

- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Substrings to be searched. [Array](../data-types/array.md).

**Returned value**

- Index (starting from 1) of the leftmost found needle, or 0 if there was no match. [UInt8](../data-types/int-uint.md).

**Example**

Query:

```sql
SELECT multiSearchFirstIndexCaseInsensitive('hElLo WoRlD',['World','Hello']);
```

Result:

```response
1
```

## multiSearchFirstIndexUTF8

Returns the index `i` (starting from 1) of the leftmost needle<sub>i</sub> in the needle array that is found in the string `haystack`, and 0 otherwise. Assumes `haystack` and `needle` are UTF-8 encoded strings.

**Syntax**

```sql
multiSearchFirstIndexUTF8(haystack, [needle1, needle2, ..., needleN])
```

**Parameters**

- `haystack` — UTF-8 string in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Array of UTF-8 substrings to be searched. [Array](../data-types/array.md).

**Returned value**

- Index (starting from 1) of the leftmost found needle, or 0 if there was no match. [UInt8](../data-types/int-uint.md).

**Example**

Given `Hello World` as a UTF-8 string, find the first index of UTF-8 strings `Hello` and `World`.

Query:

```sql
SELECT multiSearchFirstIndexUTF8('\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64',['\x57\x6f\x72\x6c\x64','\x48\x65\x6c\x6c\x6f']);
```

Result:

```response
1
```

## multiSearchFirstIndexCaseInsensitiveUTF8

Returns the index `i` (starting from 1) of the leftmost needle<sub>i</sub> in the needle array that is found in the string `haystack`, and 0 otherwise. Assumes `haystack` and `needle` are UTF-8 encoded strings. Ignores case.

**Syntax**

```sql
multiSearchFirstIndexCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
```

**Parameters**

- `haystack` — UTF-8 string in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Array of UTF-8 substrings to be searched. [Array](../data-types/array.md).

**Returned value**

- Index (starting from 1) of the leftmost found needle, or 0 if there was no match. [UInt8](../data-types/int-uint.md).

**Example**

Given `HELLO WORLD` as a UTF-8 string, find the first index of UTF-8 strings `hello` and `world`.

Query:

```sql
SELECT multiSearchFirstIndexCaseInsensitiveUTF8('\x48\x45\x4c\x4c\x4f\x20\x57\x4f\x52\x4c\x44',['\x68\x65\x6c\x6c\x6f','\x77\x6f\x72\x6c\x64']);
```

Result:

```response
1
```

## multiSearchAny

Returns 1 if at least one needle<sub>i</sub> occurs in the string `haystack`, and 0 otherwise.

Functions [`multiSearchAnyCaseInsensitive`](#multisearchanycaseinsensitive), [`multiSearchAnyUTF8`](#multisearchanyutf8) and [`multiSearchAnyCaseInsensitiveUTF8`](#multisearchanycaseinsensitiveutf8) provide case-insensitive and/or UTF-8 variants of this function.

**Syntax**

```sql
multiSearchAny(haystack, [needle1, needle2, ..., needleN])
```

**Parameters**

- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Substrings to be searched. [Array](../data-types/array.md).

**Returned value**

- 1, if there was at least one match.
- 0, if there was no match.

**Example**

Query:

```sql
SELECT multiSearchAny('ClickHouse',['C','H']);
```

Result:

```response
1
```
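The result of `multiSearchAny` is simply the logical OR of "needle occurs in haystack" over all needles. A one-line Python model of that rule (not of how the search is executed):

```python
from typing import List


def multi_search_any(haystack: bytes, needles: List[bytes]) -> int:
    """1 if at least one needle occurs in haystack, else 0 (case-sensitive)."""
    return int(any(needle in haystack for needle in needles))


print(multi_search_any(b"ClickHouse", [b"C", b"H"]))          # 1
print(multi_search_any(b"ClickHouse", [b"CLICK", b"HOUSE"]))  # 0: the search is case-sensitive
```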

## multiSearchAnyCaseInsensitive

Like [multiSearchAny](#multisearchany) but ignores case.

**Syntax**

```sql
multiSearchAnyCaseInsensitive(haystack, [needle1, needle2, ..., needleN])
```

**Parameters**

- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Substrings to be searched. [Array](../data-types/array.md).

**Returned value**

- 1, if there was at least one case-insensitive match.
- 0, if there was no case-insensitive match.

**Example**

Query:

```sql
SELECT multiSearchAnyCaseInsensitive('ClickHouse',['c','h']);
```

Result:

```response
1
```

## multiSearchAnyUTF8

Like [multiSearchAny](#multisearchany) but assumes `haystack` and the `needle` substrings are UTF-8 encoded strings.

**Syntax**

```sql
multiSearchAnyUTF8(haystack, [needle1, needle2, ..., needleN])
```

**Parameters**

- `haystack` — UTF-8 string in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — UTF-8 substrings to be searched. [Array](../data-types/array.md).

**Returned value**

- 1, if there was at least one match.
- 0, if there was no match.

**Example**

Given `ClickHouse` as a UTF-8 string, check if there are any `C` (`\x43`) or `H` (`\x48`) letters in the word.

Query:

```sql
SELECT multiSearchAnyUTF8('\x43\x6c\x69\x63\x6b\x48\x6f\x75\x73\x65',['\x43','\x48']);
```

Result:

```response
1
```

## multiSearchAnyCaseInsensitiveUTF8

Like [multiSearchAnyUTF8](#multisearchanyutf8) but ignores case.

**Syntax**

```sql
multiSearchAnyCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
```

**Parameters**

- `haystack` — UTF-8 string in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — UTF-8 substrings to be searched. [Array](../data-types/array.md).

**Returned value**

- 1, if there was at least one case-insensitive match.
- 0, if there was no case-insensitive match.

**Example**

Given `ClickHouse` as a UTF-8 string, check if there is any letter `h` (`\x68`) in the word, ignoring case.

Query:

```sql
SELECT multiSearchAnyCaseInsensitiveUTF8('\x43\x6c\x69\x63\x6b\x48\x6f\x75\x73\x65',['\x68']);
```

Result:

```response
1
```

## match {#match}

Returns whether string `haystack` matches the regular expression `pattern` in [re2 regular expression syntax](https://github.com/google/re2/wiki/Syntax).

Matching is based on UTF-8, e.g. `.` matches the Unicode code point `¥` which is represented in UTF-8 using two bytes. The regular
expression must not contain null bytes. If the haystack or the pattern is not valid UTF-8, the behavior is undefined.

Unlike re2's default behavior, `.` matches line breaks. To disable this, prepend the pattern with `(?-s)`.

If you only want to search substrings in a string, you can use functions [like](#like) or [position](#position) instead; they work much faster than this function.

**Syntax**

```sql
match(haystack, pattern)
```

Alias: `haystack REGEXP pattern` operator
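The two behaviors described above (an unanchored match, and `.` matching line breaks) can be mimicked with Python's `re` module using `re.search` and `re.DOTALL`. Python's `re` is not RE2 and differs in syntax details; in particular it rejects a leading `(?-s)` and only accepts the scoped form `(?-s:...)`, which the sketch uses instead.

```python
import re


def match(haystack: str, pattern: str) -> int:
    """Unanchored search; re.DOTALL mimics '.' matching line breaks."""
    return int(re.search(pattern, haystack, flags=re.DOTALL) is not None)


text = "Hello\nWorld"
print(match(text, r"Hello.World"))        # 1: '.' matches the line break
print(match(text, r"(?-s:Hello.World)"))  # 0: dot-all switched off inside the group
print(match("abc123", r"\d+"))            # 1: the pattern may match anywhere
```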

## multiMatchAny

Like `match` but returns 1 if at least one of the patterns matches and 0 otherwise.

:::note
Functions in the `multi[Fuzzy]Match*()` family use the [Vectorscan](https://github.com/VectorCamp/vectorscan) library. As such, they are only enabled if ClickHouse is compiled with support for vectorscan.

To turn off all functions that use hyperscan, use setting `SET allow_hyperscan = 0;`.

Due to restrictions of vectorscan, the length of the `haystack` string must be less than 2<sup>32</sup> bytes.

Hyperscan is generally vulnerable to regular expression denial of service (ReDoS) attacks (e.g. see
[here](https://www.usenix.org/conference/usenixsecurity22/presentation/turonova), [here](https://doi.org/10.1007/s10664-021-10033-1) and
[here](https://doi.org/10.1145/3236024.3236027)). Users are advised to check the provided patterns carefully.
:::

If you only want to search multiple substrings in a string, you can use function [multiSearchAny](#multisearchany) instead; it works much faster than this function.

**Syntax**

```sql
multiMatchAny(haystack, [pattern1, pattern2, ..., patternN])
```

## multiMatchAnyIndex

Like `multiMatchAny` but returns the index of any pattern that matches the haystack.

**Syntax**

```sql
multiMatchAnyIndex(haystack, [pattern1, pattern2, ..., patternN])
```

## multiMatchAllIndices

Like `multiMatchAny` but returns an array, in any order, of the indices of all patterns that match the haystack.

**Syntax**

```sql
multiMatchAllIndices(haystack, [pattern1, pattern2, ..., patternN])
```
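The relationship between the three `multiMatch*` functions can be sketched in Python with the standard `re` module standing in for Vectorscan. Assumptions: indices are 1-based and 0 means "no match" (by analogy with `multiSearchFirstIndex`), and the returned order and the choice of "any" index are arbitrary in ClickHouse; this model simply uses ascending order and the smallest index.

```python
import re
from typing import List


def multi_match_all_indices(haystack: str, patterns: List[str]) -> List[int]:
    """1-based indices of all patterns matching somewhere in haystack (ascending in this model)."""
    return [i for i, p in enumerate(patterns, start=1) if re.search(p, haystack)]


def multi_match_any(haystack: str, patterns: List[str]) -> int:
    return int(bool(multi_match_all_indices(haystack, patterns)))


def multi_match_any_index(haystack: str, patterns: List[str]) -> int:
    # "Any" matching index: this model picks the smallest; callers should not rely on which.
    matches = multi_match_all_indices(haystack, patterns)
    return matches[0] if matches else 0


patterns = [r"^Click", r"house$", r"H.use"]
print(multi_match_all_indices("ClickHouse", patterns))  # [1, 3]
print(multi_match_any("ClickHouse", patterns))          # 1
```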
2019-01-23 08:38:32 +00:00
2023-04-20 09:30:11 +00:00
## multiFuzzyMatchAny
2019-01-23 08:38:32 +00:00
2023-04-20 09:30:11 +00:00
Like `multiMatchAny` but returns 1 if any pattern matches the haystack within a constant [edit distance ](https://en.wikipedia.org/wiki/Edit_distance ). This function relies on the experimental feature of [hyperscan ](https://intel.github.io/hyperscan/dev-reference/compilation.html#approximate-matching ) library, and can be slow for some corner cases. The performance depends on the edit distance value and patterns used, but it's always more expensive compared to a non-fuzzy variants.
2019-01-23 08:38:32 +00:00
2022-06-17 13:26:59 +00:00
:::note
2023-04-20 09:30:11 +00:00
`multiFuzzyMatch*()` function family do not support UTF-8 regular expressions (it threats them as a sequence of bytes) due to restrictions of hyperscan.
2022-04-09 13:29:05 +00:00
:::
2019-03-28 15:12:37 +00:00
2023-04-20 09:30:11 +00:00
**Syntax**
2023-01-06 11:14:49 +00:00
2023-04-20 09:30:11 +00:00
```sql
2024-05-23 11:54:45 +00:00
multiFuzzyMatchAny(haystack, distance, \[pattern< sub > 1</ sub > , pattern< sub > 2</ sub > , ..., pattern< sub > n</ sub > \])
2023-04-20 09:30:11 +00:00
```
2017-12-28 15:13:23 +00:00
2023-04-20 09:30:11 +00:00
## multiFuzzyMatchAnyIndex
2019-03-23 22:49:38 +00:00
2023-04-20 09:30:11 +00:00
Like `multiFuzzyMatchAny` but returns any index that matches the haystack within a constant edit distance.
2019-03-23 22:49:38 +00:00
2023-04-20 09:30:11 +00:00
**Syntax**
2023-02-08 13:07:27 +00:00
2023-04-20 09:30:11 +00:00
```sql
multiFuzzyMatchAnyIndex(haystack, distance, [pattern1, pattern2, ..., patternN])
```

## multiFuzzyMatchAllIndices

Like `multiFuzzyMatchAny` but returns an array of the indices of all patterns that match the haystack within a constant edit distance, in any order.

**Syntax**

```sql
multiFuzzyMatchAllIndices(haystack, distance, [pattern1, pattern2, ..., patternN])
```

## extract

Extracts a fragment of a string using a regular expression. If `haystack` does not match the `pattern` regex, an empty string is returned.

For a regex without subpatterns, the function uses the fragment that matches the entire regex. Otherwise, it uses the fragment that matches the first subpattern.

**Syntax**

```sql
extract(haystack, pattern)
```
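
The subpattern rule above can be modelled in a few lines of Python. This sketch assumes that Python's `re` module behaves like re2 for the simple patterns involved; the two dialects differ in general. The function name `extract` is only a label for the model.

```python
import re


def extract(haystack: str, pattern: str) -> str:
    """Python model of extract(): the first match, or its first subpattern."""
    match = re.search(pattern, haystack)
    if match is None:
        return ""  # no match: empty string
    if match.re.groups > 0:
        # Regex with subpatterns: return what the first subpattern captured.
        return match.group(1) or ""
    return match.group(0)  # no subpatterns: the whole match
```
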

## extractAll

Extracts all fragments of a string using a regular expression. If `haystack` does not match the `pattern` regex, an empty array is returned.

Returns an array of strings consisting of all matches of the regex.

The behavior with respect to subpatterns is the same as in function `extract`.

**Syntax**

```sql
extractAll(haystack, pattern)
```
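
A Python model of `extractAll` follows. It assumes that `re` and re2 agree on the patterns used. The sketch collects non-overlapping matches from left to right and applies the same subpattern rule as `extract`.

```python
import re


def extract_all(haystack: str, pattern: str) -> list:
    """Python model of extractAll(): one entry per non-overlapping match."""
    compiled = re.compile(pattern)
    # Same rule as extract(): first subpattern if there is one, else the whole match.
    group = 1 if compiled.groups > 0 else 0
    return [m.group(group) or "" for m in compiled.finditer(haystack)]
```
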

## extractAllGroupsHorizontal

Matches all groups of the `haystack` string using the `pattern` regular expression. Returns an array of arrays, where the first array includes all fragments matching the first group, the second array includes all fragments matching the second group, etc.

This function is slower than [extractAllGroupsVertical](#extractallgroupsvertical).

**Syntax**

``` sql
extractAllGroupsHorizontal(haystack, pattern)
```

**Arguments**

- `haystack` — Input string. [String](../data-types/string.md).
- `pattern` — Regular expression with [re2 syntax](https://github.com/google/re2/wiki/Syntax). Must contain groups, each group enclosed in parentheses. If `pattern` contains no groups, an exception is thrown. [String](../data-types/string.md).

**Returned value**

- Array of arrays of matches. [Array](../data-types/array.md).

:::note
If `haystack` does not match the `pattern` regex, an array of empty arrays is returned.
:::

**Example**

``` sql
SELECT extractAllGroupsHorizontal('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)');
```
Result:
``` text
┌─extractAllGroupsHorizontal('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')─┐
│ [['abc','def','ghi'],['111','222','333']] │
└──────────────────────────────────────────────────────────────────────────────────────────┘
```
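
The horizontal layout means collecting each capture group into its own list, in match order. The Python model below assumes that `re` and re2 agree on the example pattern. With that assumption it reproduces the documented result.

```python
import re


def extract_all_groups_horizontal(haystack: str, pattern: str) -> list:
    """Python model of extractAllGroupsHorizontal (sketch)."""
    compiled = re.compile(pattern)
    if compiled.groups == 0:
        raise ValueError("pattern must contain at least one group")
    # One output list per group; fragments are appended in match order.
    result = [[] for _ in range(compiled.groups)]
    for match in compiled.finditer(haystack):
        for i in range(compiled.groups):
            result[i].append(match.group(i + 1) or "")
    return result
```
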

## extractAllGroupsVertical

Matches all groups of the `haystack` string using the `pattern` regular expression. Returns an array of arrays, where each array includes matching fragments from every group. Fragments are grouped in order of appearance in the `haystack`.

**Syntax**

``` sql
extractAllGroupsVertical(haystack, pattern)
```

**Arguments**

- `haystack` — Input string. [String](../data-types/string.md).
- `pattern` — Regular expression with [re2 syntax](https://github.com/google/re2/wiki/Syntax). Must contain groups, each group enclosed in parentheses. If `pattern` contains no groups, an exception is thrown. [String](../data-types/string.md).

**Returned value**

- Array of arrays of matches. [Array](../data-types/array.md).

:::note
If `haystack` does not match the `pattern` regex, an empty array is returned.
:::

**Example**

``` sql
SELECT extractAllGroupsVertical('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)');
```
Result:
``` text
┌─extractAllGroupsVertical('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')─┐
│ [['abc','111'],['def','222'],['ghi','333']] │
└────────────────────────────────────────────────────────────────────────────────────────┘
```
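
The vertical layout holds one list per match, containing that match's groups in order. The Python model below assumes that `re` and re2 agree on the example pattern. Transposing its result gives the `extractAllGroupsHorizontal` layout.

```python
import re


def extract_all_groups_vertical(haystack: str, pattern: str) -> list:
    """Python model of extractAllGroupsVertical (sketch)."""
    compiled = re.compile(pattern)
    if compiled.groups == 0:
        raise ValueError("pattern must contain at least one group")
    # One output list per match, holding that match's groups in order.
    return [[g or "" for g in m.groups()] for m in compiled.finditer(haystack)]
```
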

## like

Returns whether string `haystack` matches the LIKE expression `pattern`.

A LIKE expression can contain normal characters and the following metasymbols:

- `%` indicates an arbitrary number of arbitrary characters (including zero characters).
- `_` indicates a single arbitrary character.
- `\` is for escaping literals `%`, `_` and `\`.

Matching is based on UTF-8, e.g. `_` matches the Unicode code point `¥`, which is represented in UTF-8 using two bytes.

If the haystack or the LIKE expression is not valid UTF-8, the behavior is undefined.

No automatic Unicode normalization is performed; you can use the [normalizeUTF8*()](string-functions.md) functions for that.

To match against literal `%`, `_` and `\` (which are LIKE metacharacters), prepend them with a backslash: `\%`, `\_` and `\\`.
The backslash loses its special meaning (i.e. is interpreted literally) if it precedes a character other than `%`, `_` or `\`.
Note that ClickHouse requires backslashes in strings [to be quoted as well](../syntax.md#string), so you would actually need to write `\\%`, `\\_` and `\\\\`.

For LIKE expressions of the form `%needle%`, the function is as fast as the `position` function.
All other LIKE expressions are internally converted to a regular expression and executed with a performance similar to function `match`.

**Syntax**

```sql
like(haystack, pattern)
```

Alias: `haystack LIKE pattern` (operator)
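
The metasymbol and escaping rules above amount to translating a LIKE expression into an anchored regular expression. The Python sketch below illustrates that translation; it is not ClickHouse's internal converter and ignores the `%needle%` fast path. It rests on these assumptions:

- Matching works on Unicode code points.
- A trailing lone backslash is taken literally.
- `%` and `_` also match line breaks.
- Python string literals escape backslashes the same way ClickHouse string literals do, so `'100\\%'` denotes the characters `100\%` in both.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a LIKE expression into an equivalent Python regex (sketch)."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern) and pattern[i + 1] in "%_\\":
            parts.append(re.escape(pattern[i + 1]))  # \%, \_ or \\ -> literal
            i += 2
            continue
        if ch == "%":
            parts.append(".*")  # any number of any characters
        elif ch == "_":
            parts.append(".")  # exactly one code point
        else:
            parts.append(re.escape(ch))  # ordinary character, incl. a lone backslash
        i += 1
    return "".join(parts)


def like(haystack: str, pattern: str) -> bool:
    # LIKE must match the whole string; DOTALL lets '%' and '_' match newlines (assumed).
    return re.fullmatch(like_to_regex(pattern), haystack, flags=re.DOTALL) is not None
```
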

## notLike {#notlike}

Like `like` but negates the result.

Alias: `haystack NOT LIKE pattern` (operator)

## ilike

Like `like` but searches case-insensitively.

Alias: `haystack ILIKE pattern` (operator)

## notILike

Like `ilike` but negates the result.

Alias: `haystack NOT ILIKE pattern` (operator)

## ngramDistance

Calculates the 4-gram distance between a `haystack` string and a `needle` string. For this, it counts the symmetric difference between two multisets of 4-grams and normalizes it by the sum of their cardinalities. Returns a [Float32](../data-types/float.md/#float32-float64) between 0 and 1. The smaller the result is, the more similar the strings are to each other.

Functions [`ngramDistanceCaseInsensitive`](#ngramdistancecaseinsensitive), [`ngramDistanceUTF8`](#ngramdistanceutf8), [`ngramDistanceCaseInsensitiveUTF8`](#ngramdistancecaseinsensitiveutf8) provide case-insensitive and/or UTF-8 variants of this function.

**Syntax**

```sql
ngramDistance(haystack, needle)
```

**Parameters**

- `haystack`: First comparison string. [String literal](../syntax#string)
- `needle`: Second comparison string. [String literal](../syntax#string)

**Returned value**

- Value between 0 and 1 representing the distance between the two strings; the smaller the value, the more similar the strings. [Float32](../data-types/float.md/#float32-float64)

**Implementation details**

This function throws an exception if constant `needle` or `haystack` arguments are more than 32 KB in size. If any non-constant `haystack` or `needle` arguments are more than 32 KB in size, the distance is always 1.

**Examples**

The more similar two strings are to each other, the closer the result will be to 0 (identical).

Query:
```sql
SELECT ngramDistance('ClickHouse','ClickHouse!');
```
Result:
```response
0.06666667
```
The less similar two strings are to each other, the larger the result will be.
Query:
```sql
SELECT ngramDistance('ClickHouse','House');
```
Result:
```response
0.5555556
```
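
The formula above can be checked against the examples with the Python sketch below: the size of the symmetric difference of the two n-gram multisets, divided by the sum of their sizes. It is a model, not ClickHouse's implementation. It compares exact n-grams instead of 2-byte hashes, so it ignores hash collisions, and it returns a 64-bit float rather than a Float32. Passing `n=3` over code points approximates the UTF-8 variants. The value returned when neither string has an n-gram is an assumption.

```python
from collections import Counter


def ngrams(s: str, n: int) -> Counter:
    """Multiset of all length-n windows of s (no padding)."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def ngram_distance(haystack: str, needle: str, n: int = 4) -> float:
    h, k = ngrams(haystack, n), ngrams(needle, n)
    total = sum(h.values()) + sum(k.values())
    if total == 0:
        return 0.0  # assumption: neither string has an n-gram
    # Counter subtraction keeps only positive counts, so (h - k) plus (k - h)
    # is the multiset symmetric difference.
    sym_diff = sum((h - k).values()) + sum((k - h).values())
    return sym_diff / total
```

For example, `ngram_distance('ClickHouse', 'House')` is 5/9. `House` contributes the 4-grams `Hous` and `ouse`, and both are shared. `ClickHouse` has five 4-grams that `House` lacks, out of 7 + 2 = 9 in total.
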

## ngramDistanceCaseInsensitive

Provides a case-insensitive variant of [ngramDistance](#ngramdistance).

**Syntax**
```sql
ngramDistanceCaseInsensitive(haystack, needle)
```
**Parameters**

- `haystack`: First comparison string. [String literal](../syntax#string)
- `needle`: Second comparison string. [String literal](../syntax#string)

**Returned value**

- Value between 0 and 1 representing the distance between the two strings; the smaller the value, the more similar the strings. [Float32](../data-types/float.md/#float32-float64)

**Examples**

With [ngramDistance](#ngramdistance), differences in case affect the similarity value:

Query:
```sql
SELECT ngramDistance('ClickHouse','clickhouse');
```
Result:
```response
0.71428573
```

With [ngramDistanceCaseInsensitive](#ngramdistancecaseinsensitive), case is ignored, so two strings that differ only in case now have a distance of 0, i.e. maximal similarity:

Query:
```sql
SELECT ngramDistanceCaseInsensitive('ClickHouse','clickhouse');
```
Result:
```response
0
```
## ngramDistanceUTF8
Provides a UTF-8 variant of [ngramDistance](#ngramdistance). Assumes that `needle` and `haystack` are UTF-8 encoded strings.

**Syntax**
```sql
ngramDistanceUTF8(haystack, needle)
```
**Parameters**

- `haystack`: First UTF-8 encoded comparison string. [String literal](../syntax#string)
- `needle`: Second UTF-8 encoded comparison string. [String literal](../syntax#string)

**Returned value**

- Value between 0 and 1 representing the distance between the two strings; the smaller the value, the more similar the strings. [Float32](../data-types/float.md/#float32-float64)

**Example**

Query:
```sql
SELECT ngramDistanceUTF8('abcde','cde');
```
Result:
```response
0.5
```

## ngramDistanceCaseInsensitiveUTF8

Provides a case-insensitive variant of [ngramDistanceUTF8](#ngramdistanceutf8).

**Syntax**
```sql
ngramDistanceCaseInsensitiveUTF8(haystack, needle)
```
**Parameters**

- `haystack`: First UTF-8 encoded comparison string. [String literal](../syntax#string)
- `needle`: Second UTF-8 encoded comparison string. [String literal](../syntax#string)

**Returned value**

- Value between 0 and 1 representing the distance between the two strings; the smaller the value, the more similar the strings. [Float32](../data-types/float.md/#float32-float64)

**Example**

Query:
```sql
SELECT ngramDistanceCaseInsensitiveUTF8('abcde','CDE');
```
Result:
```response
0.5
```

## ngramSearch

Like `ngramDistance` but calculates the non-symmetric difference between a `needle` string and a `haystack` string, i.e. the number of n-grams from the needle minus the number of common n-grams, normalized by the number of `needle` n-grams, and returns one minus this value. Returns a [Float32](../data-types/float.md/#float32-float64) between 0 and 1. The bigger the result is, the more likely `needle` is in the `haystack`. This function is useful for fuzzy string search. Also see function [`soundex`](../../sql-reference/functions/string-functions#soundex).

Functions [`ngramSearchCaseInsensitive`](#ngramsearchcaseinsensitive), [`ngramSearchUTF8`](#ngramsearchutf8), [`ngramSearchCaseInsensitiveUTF8`](#ngramsearchcaseinsensitiveutf8) provide case-insensitive and/or UTF-8 variants of this function.

**Syntax**
```sql
ngramSearch(haystack, needle)
```
**Parameters**

- `haystack`: First comparison string. [String literal](../syntax#string)
- `needle`: Second comparison string. [String literal](../syntax#string)

**Returned value**

- Value between 0 and 1 representing the likelihood of the `needle` being in the `haystack`. [Float32](../data-types/float.md/#float32-float64)

**Implementation details**

:::note
The UTF-8 variants use the 3-gram distance. These are not perfectly fair n-gram distances: n-grams are hashed with 2-byte hashes and the (non-)symmetric difference is calculated between these hash tables, so collisions may occur. The case-insensitive UTF-8 variants do not use a fair `tolower` function; instead, they zero the 5th bit (counting from zero) of each code point byte, and the first bit of the zeroth byte if the code point has more than one byte. This works for Latin and for most Cyrillic letters.
:::

**Example**

Query:
```sql
SELECT ngramSearch('Hello World','World Hello');
```
Result:
```response
0.5
```
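
Read as a formula, `ngramSearch` returns one minus the fraction of `needle` n-grams missing from `haystack`. That equals the number of shared n-grams divided by the number of `needle` n-grams. The Python sketch below models this with exact n-grams, so there is no 2-byte hashing and no collisions. The result for a `needle` too short to contain an n-gram is an assumption.

```python
from collections import Counter


def ngrams(s: str, n: int) -> Counter:
    """Multiset of all length-n windows of s (no padding)."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def ngram_search(haystack: str, needle: str, n: int = 4) -> float:
    h, k = ngrams(haystack, n), ngrams(needle, n)
    needle_total = sum(k.values())
    if needle_total == 0:
        return 0.0  # assumption: needle too short to form an n-gram
    common = sum((h & k).values())  # multiset intersection
    # One minus the normalized count of needle n-grams not found in haystack.
    return 1 - (needle_total - common) / needle_total
```

Lowercasing with `str.lower()` only approximates the case-insensitive variants. Going by the bit-masking described in the note above, `Ё` and `ё` are not folded together. That appears to be why `ngramSearchCaseInsensitiveUTF8('абвГДЕёжз', 'АбвгдЕЁжз')` returns 0.57142854 (4 of 7 needle 3-grams shared) rather than 1.
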
## ngramSearchCaseInsensitive

Provides a case-insensitive variant of [ngramSearch](#ngramsearch).

**Syntax**
```sql
ngramSearchCaseInsensitive(haystack, needle)
```
**Parameters**

- `haystack`: First comparison string. [String literal](../syntax#string)
- `needle`: Second comparison string. [String literal](../syntax#string)

**Returned value**

- Value between 0 and 1 representing the likelihood of the `needle` being in the `haystack`. [Float32](../data-types/float.md/#float32-float64)

The bigger the result is, the more likely `needle` is in the `haystack`.

**Example**

Query:
```sql
SELECT ngramSearchCaseInsensitive('Hello World','hello');
```
Result:
```response
1
```
## ngramSearchUTF8
Provides a UTF-8 variant of [ngramSearch](#ngramsearch) in which `needle` and `haystack` are assumed to be UTF-8 encoded strings.

**Syntax**
```sql
ngramSearchUTF8(haystack, needle)
```
**Parameters**

- `haystack`: First UTF-8 encoded comparison string. [String literal](../syntax#string)
- `needle`: Second UTF-8 encoded comparison string. [String literal](../syntax#string)

**Returned value**

- Value between 0 and 1 representing the likelihood of the `needle` being in the `haystack`. [Float32](../data-types/float.md/#float32-float64)

The bigger the result is, the more likely `needle` is in the `haystack`.

**Example**
Query:
```sql
SELECT ngramSearchUTF8('абвгдеёжз', 'гдеёзд');
```
Result:
```response
0.5
```
## ngramSearchCaseInsensitiveUTF8
Provides a case-insensitive variant of [ngramSearchUTF8](#ngramsearchutf8) in which `needle` and `haystack` are assumed to be UTF-8 encoded strings.

**Syntax**
```sql
ngramSearchCaseInsensitiveUTF8(haystack, needle)
```
**Parameters**

- `haystack`: First UTF-8 encoded comparison string. [String literal](../syntax#string)
- `needle`: Second UTF-8 encoded comparison string. [String literal](../syntax#string)

**Returned value**

- Value between 0 and 1 representing the likelihood of the `needle` being in the `haystack`. [Float32](../data-types/float.md/#float32-float64)

The bigger the result is, the more likely `needle` is in the `haystack`.

**Example**
Query:
```sql
SELECT ngramSearchCaseInsensitiveUTF8('абвГДЕёжз', 'АбвгдЕЁжз');
```
Result:
```response
0.57142854
```

## countSubstrings

Returns how often a substring `needle` occurs in a string `haystack`.

Functions [`countSubstringsCaseInsensitive`](#countsubstringscaseinsensitive) and [`countSubstringsCaseInsensitiveUTF8`](#countsubstringscaseinsensitiveutf8) provide case-insensitive and case-insensitive + UTF-8 variants of this function, respectively.

**Syntax**
``` sql
countSubstrings(haystack, needle[, start_pos])
```

**Arguments**

- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Substring to be searched. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `start_pos` – Position (1-based) in `haystack` at which the search starts. [UInt](../data-types/int-uint.md). Optional.

**Returned value**

- The number of occurrences. [UInt64](../data-types/int-uint.md).

**Examples**

``` sql
SELECT countSubstrings('aaaa', 'aa');
```
Result:
``` text
┌─countSubstrings('aaaa', 'aa')─┐
│ 2 │
└───────────────────────────────┘
```

Example with `start_pos` argument:

```sql
SELECT countSubstrings('abc___abc', 'abc', 4);
```
Result:
``` text
┌─countSubstrings('abc___abc', 'abc', 4)─┐
│ 1 │
└────────────────────────────────────────┘
```
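
The `'aaaa'` example returns 2 rather than 3, so occurrences are counted without overlap. Below is a minimal Python model over bytes. Two assumptions are not spelled out above: `start_pos` is taken as a 1-based byte offset, and an empty `needle` yields 0.

```python
def count_substrings(haystack: bytes, needle: bytes, start_pos: int = 1) -> int:
    """Model of countSubstrings: non-overlapping occurrences, byte positions."""
    if not needle:
        return 0  # assumption: empty-needle behavior is not documented here
    count = 0
    pos = haystack.find(needle, start_pos - 1)  # start_pos is 1-based
    while pos != -1:
        count += 1
        # Resume after the end of this occurrence, so matches never overlap.
        pos = haystack.find(needle, pos + len(needle))
    return count
```

For ASCII input, the case-insensitive variant can be approximated by lowercasing both arguments first. For example, `count_substrings(b'abc___ABC___abc'.lower(), b'abc', 4)` returns 2, matching the `countSubstringsCaseInsensitive` example below.
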

## countSubstringsCaseInsensitive

Returns how often a substring `needle` occurs in a string `haystack`. Ignores case.

**Syntax**
``` sql
countSubstringsCaseInsensitive(haystack, needle[, start_pos])
```
**Arguments**
- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Substring to be searched. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `start_pos` – Position (1-based) in `haystack` at which the search starts. [UInt](../data-types/int-uint.md). Optional.

**Returned value**

- The number of occurrences. [UInt64](../data-types/int-uint.md).

**Examples**

Query:

``` sql
SELECT countSubstringsCaseInsensitive('AAAA', 'aa');
```
Result:
``` text
┌─countSubstringsCaseInsensitive('AAAA', 'aa')─┐
│ 2 │
└──────────────────────────────────────────────┘
```
Example with `start_pos` argument:

Query:

```sql
SELECT countSubstringsCaseInsensitive('abc___ABC___abc', 'abc', 4);
```
Result:
``` text
┌─countSubstringsCaseInsensitive('abc___ABC___abc', 'abc', 4)─┐
│ 2 │
└─────────────────────────────────────────────────────────────┘
```
## countSubstringsCaseInsensitiveUTF8

Returns how often a substring `needle` occurs in a string `haystack`. Ignores case and assumes that `haystack` is a UTF-8 string.

**Syntax**
``` sql
countSubstringsCaseInsensitiveUTF8(haystack, needle[, start_pos])
```
**Arguments**
- `haystack` — UTF-8 string in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Substring to be searched. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `start_pos` – Position (1-based) in `haystack` at which the search starts. [UInt](../data-types/int-uint.md). Optional.

**Returned value**

- The number of occurrences. [UInt64](../data-types/int-uint.md).

**Examples**

Query:

``` sql
SELECT countSubstringsCaseInsensitiveUTF8('ложка, кошка, картошка', 'КА');
```
Result:
``` text
┌─countSubstringsCaseInsensitiveUTF8('ложка, кошка, картошка', 'КА')─┐
│ 4 │
└────────────────────────────────────────────────────────────────────┘
```
Example with `start_pos` argument:

Query:

```sql
SELECT countSubstringsCaseInsensitiveUTF8('ложка, кошка, картошка', 'КА', 13);
```
Result:
``` text
┌─countSubstringsCaseInsensitiveUTF8('ложка, кошка, картошка', 'КА', 13)─┐
│ 2 │
└────────────────────────────────────────────────────────────────────────┘
```
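
The `start_pos` example above only works out if `start_pos` counts code points. Each Cyrillic letter takes two bytes, so byte 13 is where `кошка` starts, which would give 3. Code point 13 is the comma after `кошка`, which gives the documented 2. The Python sketch below models the function under that inferred reading. It uses `str.lower()` for case folding, which is an approximation.

```python
def count_substrings_ci_utf8(haystack: str, needle: str, start_pos: int = 1) -> int:
    """Model of countSubstringsCaseInsensitiveUTF8 (sketch).

    Assumes start_pos counts code points (inferred from the documented example).
    """
    key = needle.lower()
    if not key:
        return 0  # assumption: empty-needle behavior is not documented here
    # Slice before lowercasing so that case folding cannot shift positions.
    tail = haystack[start_pos - 1:].lower()
    return tail.count(key)  # str.count counts non-overlapping occurrences
```
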

## countMatches

Returns the number of regular expression matches for a `pattern` in a `haystack`.

**Syntax**
``` sql
countMatches(haystack, pattern)
```

**Arguments**

- `haystack` — The string to search in. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `pattern` — The regular expression with [re2 syntax](https://github.com/google/re2/wiki/Syntax). [String](../data-types/string.md).

**Returned value**

- The number of matches. [UInt64](../data-types/int-uint.md).

**Examples**

``` sql
SELECT countMatches('foobar.com', 'o+');
```
Result:
``` text
┌─countMatches('foobar.com', 'o+')─┐
│ 2 │
└──────────────────────────────────┘
```
``` sql
SELECT countMatches('aaaa', 'aa');
```
Result:
``` text
┌─countMatches('aaaa', 'aa')────┐
│ 2 │
└───────────────────────────────┘
```

## countMatchesCaseInsensitive

Like [`countMatches`](#countmatches), returns the number of regular expression matches for a `pattern` in a `haystack`, but matching ignores case.

**Syntax**
``` sql
countMatchesCaseInsensitive(haystack, pattern)
```
**Arguments**
- `haystack` — The string to search in. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `pattern` — The regular expression with [re2 syntax](https://github.com/google/re2/wiki/Syntax). [String](../data-types/string.md).

**Returned value**

- The number of matches. [UInt64](../data-types/int-uint.md).

**Examples**

Query:
``` sql
SELECT countMatchesCaseInsensitive('AAAA', 'aa');
```
Result:
``` text
┌─countMatchesCaseInsensitive('AAAA', 'aa')────┐
│ 2 │
└──────────────────────────────────────────────┘
```
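
Below is a Python model of `countMatches` and `countMatchesCaseInsensitive`, assuming `re` and re2 agree on the patterns used. As in the `'aaaa'` example, it counts non-overlapping matches. It does not try to reproduce ClickHouse's handling of patterns that can match the empty string, which this page does not describe.

```python
import re


def count_matches(haystack: str, pattern: str, case_insensitive: bool = False) -> int:
    """Model of countMatches / countMatchesCaseInsensitive (sketch)."""
    flags = re.IGNORECASE if case_insensitive else 0
    # finditer yields non-overlapping matches, scanning left to right.
    return sum(1 for _ in re.finditer(pattern, haystack, flags))
```
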

## regexpExtract

Extracts the first string in `haystack` that matches the regexp pattern and corresponds to the regex group index.

**Syntax**
``` sql
regexpExtract(haystack, pattern[, index])
```
Alias: `REGEXP_EXTRACT(haystack, pattern[, index])`.

**Arguments**

- `haystack` — String in which the regexp pattern is matched. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `pattern` — Regular expression; must be constant. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `index` – An integer greater than or equal to 0, with default 1. It represents which regex group to extract. [UInt or Int](../data-types/int-uint.md). Optional.

**Returned value**

The fragment matched by the selected group. `pattern` may contain multiple regexp groups; `index` indicates which regex group to extract. An index of 0 means matching the entire regular expression. [String](../data-types/string.md).

**Examples**

``` sql
SELECT
regexpExtract('100-200', '(\\d+)-(\\d+)', 1),
regexpExtract('100-200', '(\\d+)-(\\d+)', 2),
regexpExtract('100-200', '(\\d+)-(\\d+)', 0),
regexpExtract('100-200', '(\\d+)-(\\d+)');
```
Result:
``` text
┌─regexpExtract('100-200', '(\\d+)-(\\d+)', 1)─┬─regexpExtract('100-200', '(\\d+)-(\\d+)', 2)─┬─regexpExtract('100-200', '(\\d+)-(\\d+)', 0)─┬─regexpExtract('100-200', '(\\d+)-(\\d+)')─┐
│ 100 │ 200 │ 100-200 │ 100 │
└──────────────────────────────────────────────┴──────────────────────────────────────────────┴──────────────────────────────────────────────┴───────────────────────────────────────────┘
```
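
Below is a Python model of `regexpExtract`, assuming `re` and re2 agree on the pattern. The page does not say what happens when `haystack` does not match or when `index` exceeds the number of groups. The empty-string result and the `ValueError` here are therefore assumptions.

```python
import re


def regexp_extract(haystack: str, pattern: str, index: int = 1) -> str:
    """Model of regexpExtract (sketch): group `index` of the first match."""
    compiled = re.compile(pattern)
    if not 0 <= index <= compiled.groups:
        # Assumption: an out-of-range group index is treated as an error.
        raise ValueError(f"index {index} is out of range for {compiled.groups} groups")
    match = compiled.search(haystack)
    if match is None:
        return ""  # assumption: no match yields an empty string
    return match.group(index) or ""  # index 0 is the entire match
```
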

## hasSubsequence

Returns 1 if `needle` is a subsequence of `haystack`, or 0 otherwise.
A subsequence of a string is a sequence that can be derived from the given string by deleting zero or more elements without changing the order of the remaining elements.

**Syntax**
``` sql
hasSubsequence(haystack, needle)
```
**Arguments**
- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Subsequence to be searched. [String](../../sql-reference/syntax.md#syntax-string-literal).

**Returned value**

- 1 if `needle` is a subsequence of `haystack`, 0 otherwise. [UInt8](../data-types/int-uint.md).

**Examples**

Query:

``` sql
SELECT hasSubsequence('garbage', 'arg');
```
Result:
``` text
┌─hasSubsequence('garbage', 'arg')─┐
│ 1 │
└──────────────────────────────────┘
```
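
Greedily matching each character of `needle` at its earliest possible position in `haystack` is enough to decide whether it is a subsequence. The Python sketch below works on code points, which corresponds to the UTF-8 variants. Modelling the case-insensitive variants by lowercasing both arguments is an assumption.

```python
def has_subsequence(haystack: str, needle: str) -> int:
    """Greedy two-pointer check: 1 if needle is a subsequence of haystack."""
    matched = 0  # number of needle characters found so far, in order
    for ch in haystack:
        if matched < len(needle) and ch == needle[matched]:
            matched += 1
    return int(matched == len(needle))
```
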
## hasSubsequenceCaseInsensitive

Like [hasSubsequence](#hassubsequence) but searches case-insensitively.

**Syntax**
``` sql
hasSubsequenceCaseInsensitive(haystack, needle)
```
**Arguments**
- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Subsequence to be searched. [String](../../sql-reference/syntax.md#syntax-string-literal).

**Returned value**

- 1 if `needle` is a subsequence of `haystack`, 0 otherwise. [UInt8](../data-types/int-uint.md).

**Examples**
Query:
``` sql
SELECT hasSubsequenceCaseInsensitive('garbage', 'ARG');
```
Result:
``` text
┌─hasSubsequenceCaseInsensitive('garbage', 'ARG')─┐
│ 1 │
└─────────────────────────────────────────────────┘
```

## hasSubsequenceUTF8

Like [hasSubsequence](#hassubsequence) but assumes `haystack` and `needle` are UTF-8 encoded strings.

**Syntax**
``` sql
hasSubsequenceUTF8(haystack, needle)
```
**Arguments**
- `haystack` — String in which the search is performed. UTF-8 encoded [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Subsequence to be searched. UTF-8 encoded [String](../../sql-reference/syntax.md#syntax-string-literal).

**Returned value**

- 1 if `needle` is a subsequence of `haystack`, 0 otherwise. [UInt8](../data-types/int-uint.md).

**Examples**

Query:
``` sql
select hasSubsequenceUTF8('ClickHouse - столбцовая система управления базами данных', 'система');
```
Result:
``` text
┌─hasSubsequenceUTF8('ClickHouse - столбцовая система управления базами данных', 'система')─┐
│ 1 │
└───────────────────────────────────────────────────────────────────────────────────────────┘
```

## hasSubsequenceCaseInsensitiveUTF8

Like [hasSubsequenceUTF8](#hassubsequenceutf8) but searches case-insensitively.

**Syntax**
``` sql
hasSubsequenceCaseInsensitiveUTF8(haystack, needle)
```
**Arguments**
- `haystack` — String in which the search is performed. UTF-8 encoded [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Subsequence to be searched. UTF-8 encoded [String](../../sql-reference/syntax.md#syntax-string-literal).

**Returned value**

- 1 if `needle` is a subsequence of `haystack`, 0 otherwise. [UInt8](../data-types/int-uint.md).

**Examples**
Query:
``` sql
select hasSubsequenceCaseInsensitiveUTF8('ClickHouse - столбцовая система управления базами данных', 'СИСТЕМА');
```
Result:
``` text
┌─hasSubsequenceCaseInsensitiveUTF8('ClickHouse - столбцовая система управления базами данных', 'СИСТЕМА')─┐
│ 1 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

## hasToken

Returns 1 if a given token is present in a haystack, or 0 otherwise.

**Syntax**
```sql
hasToken(haystack, token)
```
**Parameters**
- `haystack` : String in which the search is performed. [String ](../../sql-reference/syntax.md#syntax-string-literal ).
- `token` : Maximal length substring between two non alphanumeric ASCII characters (or boundaries of haystack).
**Returned value**
2024-05-24 04:42:13 +00:00
- 1, if the token is present in the haystack, 0 otherwise. [UInt8 ](../data-types/int-uint.md ).
2024-03-28 10:22:28 +00:00
**Implementation details**
Token must be a constant string. Supported by tokenbf_v1 index specialization.
**Example**
Query:
```sql
SELECT hasToken('Hello World','Hello');
```
```response
1
```
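
The token rule above can be modelled in a few lines of Python. This is a hypothetical sketch, not ClickHouse code: `is_separator`, `split_tokens` and `has_token` are illustrative names. It assumes that every ASCII character other than a letter or digit is a separator and that non-ASCII characters always belong to a token. Like `hasToken`, it rejects an ill-formed token with an error.

```python
def is_separator(ch: str) -> bool:
    # Assumption based on the definition above: ASCII characters other than
    # letters and digits separate tokens; non-ASCII characters never do.
    return ch.isascii() and not ch.isalnum()


def split_tokens(haystack: str) -> list[str]:
    # Collect maximal runs of non-separator characters.
    tokens, current = [], []
    for ch in haystack:
        if is_separator(ch):
            if current:
                tokens.append(''.join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append(''.join(current))
    return tokens


def has_token(haystack: str, token: str) -> int:
    if any(is_separator(ch) for ch in token):
        # Mirrors the documented behaviour: hasToken rejects ill-formed tokens.
        raise ValueError(f'ill-formed token: {token!r}')
    return int(token in split_tokens(haystack))


print(has_token('Hello World', 'Hello'))  # 1
print(has_token('Hello World', 'Hell'))   # 0: a substring, but not a whole token
```

The second call shows how token search differs from plain substring search such as [position](#position): `Hell` occurs in the haystack but is not a whole token.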

## hasTokenOrNull

Returns 1 if a given token is present, 0 if not present, and null if the token is ill-formed.

**Syntax**

```sql
hasTokenOrNull(haystack, token)
```

**Arguments**

- `haystack`: String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `token`: Token to search for, i.e. a maximal-length substring between two non-alphanumeric ASCII characters (or the boundaries of `haystack`). [String](../../sql-reference/syntax.md#syntax-string-literal).

**Returned value**

- 1, if the token is present in the haystack, 0 if it is not present, and [`null`](../data-types/nullable.md) if the token is ill-formed. [UInt8](../data-types/int-uint.md).

**Implementation details**

`token` must be a constant string. Supported by the `tokenbf_v1` index specialization.

**Example**

Where `hasToken` throws an error for an ill-formed token (for example, one containing a comma), `hasTokenOrNull` returns `null`.

Query:

```sql
SELECT hasTokenOrNull('Hello World','Hello,World');
```
```response
null
```
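
The following hypothetical Python sketch models the `OrNull` behaviour, with `None` standing in for SQL `null`. `SEPARATORS` and `has_token_or_null` are illustrative names. The separator class is the same assumption as above: any ASCII character outside `[0-9A-Za-z]`. The sketch expresses it as a regular expression that is used both to validate the token and to split the haystack.

```python
import re
from typing import Optional

# Assumption: one or more ASCII characters outside [0-9A-Za-z] separate tokens.
SEPARATORS = re.compile(r'[\x00-\x2f\x3a-\x40\x5b-\x60\x7b-\x7f]+')


def has_token_or_null(haystack: str, token: str) -> Optional[int]:
    if SEPARATORS.search(token):
        return None  # ill-formed token: null instead of an error
    tokens = [t for t in SEPARATORS.split(haystack) if t]
    return int(token in tokens)


print(has_token_or_null('Hello World', 'Hello,World'))  # None
print(has_token_or_null('Hello World', 'World'))        # 1
```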

## hasTokenCaseInsensitive

Returns 1 if a given token is present in a haystack, 0 otherwise. Ignores case.

**Syntax**

```sql
hasTokenCaseInsensitive(haystack, token)
```

**Arguments**

- `haystack`: String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `token`: Token to search for, i.e. a maximal-length substring between two non-alphanumeric ASCII characters (or the boundaries of `haystack`). [String](../../sql-reference/syntax.md#syntax-string-literal).

**Returned value**

- 1, if the token is present in the haystack, 0 otherwise. [UInt8](../data-types/int-uint.md).

**Implementation details**

`token` must be a constant string. Supported by the `tokenbf_v1` index specialization.

**Example**

Query:

```sql
SELECT hasTokenCaseInsensitive('Hello World','hello');
```
```response
1
```
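
A token match can also be found without splitting the haystack. The approach is to look for each occurrence of the token and accept it only if it is bounded by separators or by the ends of the string. The sketch below models `hasTokenCaseInsensitive` this way on UTF-8 bytes. It is hypothetical: `is_separator_byte` and `has_token_case_insensitive` are illustrative names. It also assumes that case folding is ASCII-only, in line with the English-only case rules noted at the top of this page.

```python
def is_separator_byte(b: int) -> bool:
    # Assumption: ASCII bytes other than letters and digits separate tokens;
    # bytes >= 0x80 (parts of multi-byte UTF-8 characters) never do.
    return b < 0x80 and not bytes([b]).isalnum()


def has_token_case_insensitive(haystack: str, token: str) -> int:
    # Assumption: case folding is ASCII-only; bytes.lower() changes only A-Z.
    h = haystack.encode('utf-8').lower()
    t = token.encode('utf-8').lower()
    if any(is_separator_byte(b) for b in t):
        raise ValueError(f'ill-formed token: {token!r}')
    start = h.find(t)
    while start != -1:
        end = start + len(t)
        # The occurrence counts only if it is bounded by separators or edges.
        left_ok = start == 0 or is_separator_byte(h[start - 1])
        right_ok = end == len(h) or is_separator_byte(h[end])
        if left_ok and right_ok:
            return 1
        start = h.find(t, start + 1)
    return 0


print(has_token_case_insensitive('Hello World', 'hello'))  # 1
print(has_token_case_insensitive('HelloWorld', 'hello'))   # 0: not a whole token
```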

## hasTokenCaseInsensitiveOrNull

Returns 1 if a given token is present in a haystack, 0 otherwise. Ignores case and returns null if the token is ill-formed.

**Syntax**

```sql
hasTokenCaseInsensitiveOrNull(haystack, token)
```

**Arguments**

- `haystack`: String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `token`: Token to search for, i.e. a maximal-length substring between two non-alphanumeric ASCII characters (or the boundaries of `haystack`). [String](../../sql-reference/syntax.md#syntax-string-literal).

**Returned value**

- 1, if the token is present in the haystack, 0 if the token is not present, and [`null`](../data-types/nullable.md) if the token is ill-formed. [UInt8](../data-types/int-uint.md).

**Implementation details**

`token` must be a constant string. Supported by the `tokenbf_v1` index specialization.

**Example**

Where `hasTokenCaseInsensitive` throws an error for an ill-formed token, `hasTokenCaseInsensitiveOrNull` returns `null`.

Query:

```sql
SELECT hasTokenCaseInsensitiveOrNull('Hello World','hello,world');
```
```response
null
```
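
The following hypothetical Python sketch combines the two behaviours of `hasTokenCaseInsensitiveOrNull`: ASCII-only case folding, and `None` (standing in for `null`) for an ill-formed token. `ASCII_LOWER`, `is_separator` and `has_token_case_insensitive_or_null` are illustrative names, and the separator rule is the same assumption as in the sketches above.

```python
import string
from typing import Optional

# Maps only ASCII A-Z to a-z (assumption: case folding is ASCII-only).
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def is_separator(ch: str) -> bool:
    # Assumption: ASCII characters other than letters and digits separate tokens.
    return ch.isascii() and not ch.isalnum()


def has_token_case_insensitive_or_null(haystack: str, token: str) -> Optional[int]:
    if any(is_separator(ch) for ch in token):
        return None  # ill-formed token: null instead of an error
    # Replace every separator with a space, then keep the non-empty pieces.
    spaced = ''.join(' ' if is_separator(ch) else ch for ch in haystack)
    tokens = {w.translate(ASCII_LOWER) for w in spaced.split(' ') if w}
    return int(token.translate(ASCII_LOWER) in tokens)


for needle in ['hello,world', 'WORLD', 'wor']:
    print(needle, has_token_case_insensitive_or_null('Hello World', needle))
# prints: "hello,world None", "WORLD 1", "wor 0"
```

Returning `None` rather than 0 keeps "the token is malformed" distinguishable from "the token is absent".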