Merge pull request #62075 from Blargian/document_multiSearchXYZ

[Docs] add multiSearchXYZ functions
Robert Schulze 2024-04-09 21:02:01 +00:00 committed by GitHub
commit 007ae001de
2 changed files with 558 additions and 23 deletions


@ -6,14 +6,17 @@ sidebar_label: Searching in Strings
# Functions for Searching in Strings
All functions in this section search by default case-sensitively. Case-insensitive search is usually provided by separate function variants.
Note that case-insensitive search follows the lowercase-uppercase rules of the English language. E.g. Uppercased `i` in English language is
`I` whereas in Turkish language it is `İ` - results for languages other than English may be unexpected.
All functions in this section search case-sensitively by default. Case-insensitive search is usually provided by separate function variants.
Functions in this section also assume that the searched string and the search string are single-byte encoded text. If this assumption is
:::note
Case-insensitive search follows the lowercase-uppercase rules of the English language. E.g. Uppercased `i` in the English language is
`I` whereas in the Turkish language it is `İ` - results for languages other than English may be unexpected.
:::
Functions in this section also assume that the searched string (referred to in this section as `haystack`) and the search string (referred to in this section as `needle`) are single-byte encoded text. If this assumption is
violated, no exception is thrown and results are undefined. Search with UTF-8 encoded strings is usually provided by separate function
variants. Likewise, if a UTF-8 function variant is used and the input strings are not UTF-8 encoded text, no exception is thrown and the
results are undefined. Note that no automatic Unicode normalization is performed, you can use the
results are undefined. Note that no automatic Unicode normalization is performed; however, you can use the
[normalizeUTF8*()](https://clickhouse.com/docs/en/sql-reference/functions/string-functions/) functions for that.
[General strings functions](string-functions.md) and [functions for replacing in strings](string-replace-functions.md) are described separately.
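To make the byte-versus-code-point distinction concrete, the sketch below runs `position` and `positionUTF8` on the same input. It assumes the literal stores `ö` as the two-byte UTF-8 sequence for U+00F6 (not as `o` followed by a combining mark); the column aliases are illustrative only.

```sql
-- Assumption: 'ö' is encoded as the two bytes 0xC3 0xB6 (U+00F6).
SELECT
    position('Motörhead', 'r')     AS byte_offset,        -- counts bytes: 'r' is the 6th byte
    positionUTF8('Motörhead', 'r') AS code_point_offset;  -- counts code points: 'r' is the 5th
```

Under that assumption the two columns differ by one, because the byte-oriented variant counts `ö` twice.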
@ -54,6 +57,8 @@ Type: `Integer`.
**Examples**
Query:
``` sql
SELECT position('Hello, world!', '!');
```
@ -68,6 +73,8 @@ Result:
Example with `start_pos` argument:
Query:
``` sql
SELECT
position('Hello, world!', 'o', 1),
@ -84,6 +91,8 @@ Result:
Example for `needle IN haystack` syntax:
Query:
```sql
SELECT 6 = position('/' IN s) FROM (SELECT 'Hello/World' AS s);
```
@ -98,6 +107,8 @@ Result:
Examples with empty `needle` substring:
Query:
``` sql
SELECT
position('abc', ''),
@ -109,6 +120,8 @@ SELECT
position('abc', '', 5)
```
Result:
``` text
┌─position('abc', '')─┬─position('abc', '', 0)─┬─position('abc', '', 1)─┬─position('abc', '', 2)─┬─position('abc', '', 3)─┬─position('abc', '', 4)─┬─position('abc', '', 5)─┐
│ 1 │ 1 │ 1 │ 2 │ 3 │ 4 │ 0 │
@ -132,7 +145,23 @@ locate(needle, haystack[, start_pos])
## positionCaseInsensitive
Like [position](#position) but searches case-insensitively.
A case-insensitive variant of [position](#position).
**Example**
Query:
``` sql
SELECT positionCaseInsensitive('Hello, world!', 'hello');
```
Result:
``` text
┌─positionCaseInsensitive('Hello, world!', 'hello')─┐
│                                                 1 │
└───────────────────────────────────────────────────┘
```
## positionUTF8
@ -142,6 +171,8 @@ Like [position](#position) but assumes `haystack` and `needle` are UTF-8 encoded
Function `positionUTF8` correctly counts character `ö` (represented by two points) as a single Unicode codepoint:
Query:
``` sql
SELECT positionUTF8('Motörhead', 'r');
```
@ -175,14 +206,17 @@ multiSearchAllPositions(haystack, [needle1, needle2, ..., needleN])
**Arguments**
- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Substrings to be searched. Array
- `needle` — Substrings to be searched. [Array](../../sql-reference/data-types/array.md).
**Returned values**
- Array of the starting position in bytes and counting from 1 (if the substring was found) or 0 (if the substring was not found)
- Array of the starting position in bytes and counting from 1, if the substring was found.
- 0, if the substring was not found.
**Example**
Query:
``` sql
SELECT multiSearchAllPositions('Hello, World!', ['hello', '!', 'world']);
```
@ -194,45 +228,535 @@ Result:
│ [0,13,0] │
└───────────────────────────────────────────────────────────────────┘
```
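As a side-by-side sketch, the same haystack and needles can be passed to the case-sensitive and case-insensitive variants in one query. The column aliases are illustrative and not part of the function signatures.

```sql
SELECT
    -- Only '!' matches with exact case, so the result is [0,13,0].
    multiSearchAllPositions('Hello, World!', ['hello', '!', 'world']) AS case_sensitive,
    -- Ignoring case, 'hello' matches at 1 and 'world' at 8, giving [1,13,8].
    multiSearchAllPositionsCaseInsensitive('Hello, World!', ['hello', '!', 'world']) AS case_insensitive;
```

Each array element corresponds to the needle at the same index, and a `0` marks a needle that was not found.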
## multiSearchAllPositionsCaseInsensitive
## multiSearchAllPositionsUTF8
Like [multiSearchAllPositions](#multiSearchAllPositions) but assumes `haystack` and the `needle`-s are UTF-8 encoded strings.
## multiSearchFirstPosition
Like `position` but returns the leftmost offset in a `haystack` string which matches any of multiple `needle` strings.
Functions `multiSearchFirstPositionCaseInsensitive`, `multiSearchFirstPositionUTF8` and `multiSearchFirstPositionCaseInsensitiveUTF8` provide case-insensitive and/or UTF-8 variants of this function.
Like [multiSearchAllPositions](#multisearchallpositions) but ignores case.
**Syntax**
```sql
multiSearchFirstPosition(haystack, \[needle<sub>1</sub>, needle<sub>2</sub>, …, needle<sub>n</sub>\])
multiSearchAllPositionsCaseInsensitive(haystack, [needle1, needle2, ..., needleN])
```
**Parameters**
- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Substrings to be searched. [Array](../../sql-reference/data-types/array.md).
**Returned value**
- Array of the starting position in bytes and counting from 1 (if the substring was found).
- 0 if the substring was not found.
**Example**
Query:
```sql
SELECT multiSearchAllPositionsCaseInsensitive('ClickHouse',['c','h']);
```
Result:
```response
[1,6]
```
## multiSearchAllPositionsUTF8
Like [multiSearchAllPositions](#multisearchallpositions) but assumes `haystack` and the `needle` substrings are UTF-8 encoded strings.
**Syntax**
```sql
multiSearchAllPositionsUTF8(haystack, [needle1, needle2, ..., needleN])
```
**Parameters**
- `haystack` — UTF-8 encoded string in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — UTF-8 encoded substrings to be searched. [Array](../../sql-reference/data-types/array.md).
**Returned value**
- Array of the starting positions in Unicode code points, counting from 1 (if the substring was found).
- 0 if the substring was not found.
**Example**
Given `ClickHouse` as a UTF-8 string, find the positions of `C` (`\x43`) and `H` (`\x48`).
Query:
```sql
SELECT multiSearchAllPositionsUTF8('\x43\x6c\x69\x63\x6b\x48\x6f\x75\x73\x65',['\x43','\x48']);
```
Result:
```response
[1,6]
```
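The ASCII-only input above gives the same answer with or without the UTF-8 variant. A non-ASCII haystack shows the difference more clearly. The sketch below assumes `ö` is stored as the two-byte UTF-8 sequence for U+00F6, and that the UTF-8 variant counts code points in the same way `positionUTF8` does.

```sql
-- Assumption: 'ö' is encoded as the two bytes 0xC3 0xB6 (U+00F6).
SELECT
    multiSearchAllPositions('Motörhead', ['r', 'd'])     AS byte_positions,       -- byte offsets of 'r' and 'd'
    multiSearchAllPositionsUTF8('Motörhead', ['r', 'd']) AS code_point_positions; -- offsets counted in code points
```

Under these assumptions, every position after `ö` should be one lower in the UTF-8 column than in the byte-oriented column.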
## multiSearchAllPositionsCaseInsensitiveUTF8
Like [multiSearchAllPositionsUTF8](#multisearchallpositionsutf8) but ignores case.
**Syntax**
```sql
multiSearchAllPositionsCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
```
**Parameters**
- `haystack` — UTF-8 encoded string in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — UTF-8 encoded substrings to be searched. [Array](../../sql-reference/data-types/array.md).
**Returned value**
- Array of the starting positions in Unicode code points, counting from 1 (if the substring was found).
- 0 if the substring was not found.
**Example**
Given `ClickHouse` as a UTF-8 string, find the positions of `c` (`\x63`) and `h` (`\x68`).
Query:
```sql
SELECT multiSearchAllPositionsCaseInsensitiveUTF8('\x43\x6c\x69\x63\x6b\x48\x6f\x75\x73\x65',['\x63','\x68']);
```
Result:
```response
[1,6]
```
## multiSearchFirstPosition
Like [`position`](#position) but returns the leftmost offset in a `haystack` string which matches any of multiple `needle` strings.
Functions [`multiSearchFirstPositionCaseInsensitive`](#multisearchfirstpositioncaseinsensitive), [`multiSearchFirstPositionUTF8`](#multisearchfirstpositionutf8) and [`multiSearchFirstPositionCaseInsensitiveUTF8`](#multisearchfirstpositioncaseinsensitiveutf8) provide case-insensitive and/or UTF-8 variants of this function.
**Syntax**
```sql
multiSearchFirstPosition(haystack, [needle1, needle2, ..., needleN])
```
**Parameters**
- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Substrings to be searched. [Array](../../sql-reference/data-types/array.md).
**Returned value**
- Leftmost offset in a `haystack` string which matches any of multiple `needle` strings.
- 0, if there was no match.
**Example**
Query:
```sql
SELECT multiSearchFirstPosition('Hello World',['llo', 'Wor', 'ld']);
```
Result:
```response
3
```
## multiSearchFirstPositionCaseInsensitive
Like [`multiSearchFirstPosition`](#multisearchfirstposition) but ignores case.
**Syntax**
```sql
multiSearchFirstPositionCaseInsensitive(haystack, [needle1, needle2, ..., needleN])
```
**Parameters**
- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Array of substrings to be searched. [Array](../../sql-reference/data-types/array.md).
**Returned value**
- Leftmost offset in a `haystack` string which matches any of multiple `needle` strings.
- 0, if there was no match.
**Example**
Query:
```sql
SELECT multiSearchFirstPositionCaseInsensitive('HELLO WORLD',['wor', 'ld', 'ello']);
```
Result:
```response
2
```
## multiSearchFirstPositionUTF8
Like [`multiSearchFirstPosition`](#multisearchfirstposition) but assumes `haystack` and `needle` to be UTF-8 strings.
**Syntax**
```sql
multiSearchFirstPositionUTF8(haystack, [needle1, needle2, ..., needleN])
```
**Parameters**
- `haystack` — UTF-8 string in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Array of UTF-8 substrings to be searched. [Array](../../sql-reference/data-types/array.md).
**Returned value**
- Leftmost offset in a `haystack` string which matches any of multiple `needle` strings.
- 0, if there was no match.
**Example**
Find the leftmost offset in UTF-8 string `hello world` which matches any of the given needles.
Query:
```sql
SELECT multiSearchFirstPositionUTF8('\x68\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64',['wor', 'ld', 'ello']);
```
Result:
```response
2
```
## multiSearchFirstPositionCaseInsensitiveUTF8
Like [`multiSearchFirstPosition`](#multisearchfirstposition) but assumes `haystack` and `needle` to be UTF-8 strings and ignores case.
**Syntax**
```sql
multiSearchFirstPositionCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
```
**Parameters**
- `haystack` — UTF-8 string in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Array of UTF-8 substrings to be searched. [Array](../../sql-reference/data-types/array.md).
**Returned value**
- Leftmost offset in a `haystack` string which matches any of multiple `needle` strings, ignoring case.
- 0, if there was no match.
**Example**
Find the leftmost offset in UTF-8 string `HELLO WORLD` which matches any of the given needles.
Query:
```sql
SELECT multiSearchFirstPositionCaseInsensitiveUTF8('\x48\x45\x4c\x4c\x4f\x20\x57\x4f\x52\x4c\x44',['wor', 'ld', 'ello']);
```
Result:
```response
2
```
## multiSearchFirstIndex
Returns the index `i` (starting from 1) of the leftmost found needle<sub>i</sub> in the string `haystack` and 0 otherwise.
Functions `multiSearchFirstIndexCaseInsensitive`, `multiSearchFirstIndexUTF8` and `multiSearchFirstIndexCaseInsensitiveUTF8` provide case-insensitive and/or UTF-8 variants of this function.
Functions [`multiSearchFirstIndexCaseInsensitive`](#multisearchfirstindexcaseinsensitive), [`multiSearchFirstIndexUTF8`](#multisearchfirstindexutf8) and [`multiSearchFirstIndexCaseInsensitiveUTF8`](#multisearchfirstindexcaseinsensitiveutf8) provide case-insensitive and/or UTF-8 variants of this function.
**Syntax**
```sql
multiSearchFirstIndex(haystack, \[needle<sub>1</sub>, needle<sub>2</sub>, …, needle<sub>n</sub>\])
multiSearchFirstIndex(haystack, [needle1, needle2, ..., needleN])
```
**Parameters**
- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Substrings to be searched. [Array](../../sql-reference/data-types/array.md).
**Returned value**
- Index (starting from 1) of the leftmost found needle.
- 0, if there was no match.
**Example**
Query:
```sql
SELECT multiSearchFirstIndex('Hello World',['World','Hello']);
```
## multiSearchAny {#multisearchany}
Result:
```response
1
```
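Since `multiSearchFirstPosition` and `multiSearchFirstIndex` are easy to confuse, here is a small sketch that calls both with the same arguments. The needle `'xyz'` is chosen so that only one needle matches, which keeps the result unambiguous. The column aliases are illustrative.

```sql
SELECT
    -- Offset inside the haystack where a needle matches: 'World' starts at 7.
    multiSearchFirstPosition('Hello World', ['xyz', 'World']) AS first_position,
    -- Index inside the needle array of the needle that matched: 'World' is needle 2.
    multiSearchFirstIndex('Hello World', ['xyz', 'World'])    AS first_index;
```

The first function answers "where in the haystack", the second "which needle".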
## multiSearchFirstIndexCaseInsensitive
Returns the index `i` (starting from 1) of the leftmost found needle<sub>i</sub> in the string `haystack` and 0 otherwise. Ignores case.
**Syntax**
```sql
multiSearchFirstIndexCaseInsensitive(haystack, [needle1, needle2, ..., needleN])
```
**Parameters**
- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Substrings to be searched. [Array](../../sql-reference/data-types/array.md).
**Returned value**
- Index (starting from 1) of the leftmost found needle.
- 0, if there was no match.
**Example**
Query:
```sql
SELECT multiSearchFirstIndexCaseInsensitive('hElLo WoRlD',['World','Hello']);
```
Result:
```response
1
```
## multiSearchFirstIndexUTF8
Returns the index `i` (starting from 1) of the leftmost found needle<sub>i</sub> in the string `haystack` and 0 otherwise. Assumes `haystack` and `needle` are UTF-8 encoded strings.
**Syntax**
```sql
multiSearchFirstIndexUTF8(haystack, [needle1, needle2, ..., needleN])
```
**Parameters**
- `haystack` — UTF-8 string in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Array of UTF-8 substrings to be searched. [Array](../../sql-reference/data-types/array.md).
**Returned value**
- Index (starting from 1) of the leftmost found needle.
- 0, if there was no match.
**Example**
Given `Hello World` as a UTF-8 string, find the first index of UTF-8 strings `Hello` and `World`.
Query:
```sql
SELECT multiSearchFirstIndexUTF8('\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64',['\x57\x6f\x72\x6c\x64','\x48\x65\x6c\x6c\x6f']);
```
Result:
```response
1
```
## multiSearchFirstIndexCaseInsensitiveUTF8
Returns the index `i` (starting from 1) of the leftmost found needle<sub>i</sub> in the string `haystack` and 0 otherwise. Assumes `haystack` and `needle` are UTF-8 encoded strings. Ignores case.
**Syntax**
```sql
multiSearchFirstIndexCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
```
**Parameters**
- `haystack` — UTF-8 string in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Array of UTF-8 substrings to be searched. [Array](../../sql-reference/data-types/array.md).
**Returned value**
- Index (starting from 1) of the leftmost found needle.
- 0, if there was no match.
**Example**
Given `HELLO WORLD` as a UTF-8 string, find the first index of UTF-8 strings `hello` and `world`.
Query:
```sql
SELECT multiSearchFirstIndexCaseInsensitiveUTF8('\x48\x45\x4c\x4c\x4f\x20\x57\x4f\x52\x4c\x44',['\x68\x65\x6c\x6c\x6f','\x77\x6f\x72\x6c\x64']);
```
Result:
```response
1
```
## multiSearchAny
Returns 1, if at least one string needle<sub>i</sub> matches the string `haystack` and 0 otherwise.
Functions `multiSearchAnyCaseInsensitive`, `multiSearchAnyUTF8` and `multiSearchAnyCaseInsensitiveUTF8` provide case-insensitive and/or UTF-8 variants of this function.
Functions [`multiSearchAnyCaseInsensitive`](#multisearchanycaseinsensitive), [`multiSearchAnyUTF8`](#multisearchanyutf8) and [`multiSearchAnyCaseInsensitiveUTF8`](#multisearchanycaseinsensitiveutf8) provide case-insensitive and/or UTF-8 variants of this function.
**Syntax**
```sql
multiSearchAny(haystack, \[needle<sub>1</sub>, needle<sub>2</sub>, …, needle<sub>n</sub>\])
multiSearchAny(haystack, [needle1, needle2, ..., needleN])
```
**Parameters**
- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Substrings to be searched. [Array](../../sql-reference/data-types/array.md).
**Returned value**
- 1, if there was at least one match.
- 0, if there was no match.
**Example**
Query:
```sql
SELECT multiSearchAny('ClickHouse',['C','H']);
```
Result:
```response
1
```
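Because `multiSearchAny` returns 0 or 1, it can serve directly as a filter condition. The sketch below builds its sample rows inline with `arrayJoin`, so no table is needed. The sample strings and the alias `s` are made up for illustration.

```sql
-- Inline sample data via arrayJoin, so no table is required.
SELECT s
FROM
(
    SELECT arrayJoin(['ClickHouse', 'Postgres', 'clickstream']) AS s
)
WHERE multiSearchAny(s, ['House', 'stream']);
-- Keeps 'ClickHouse' and 'clickstream'; 'Postgres' contains neither needle.
```

The match is case-sensitive here; `multiSearchAnyCaseInsensitive` could be swapped in if case should be ignored.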
## multiSearchAnyCaseInsensitive
Like [multiSearchAny](#multisearchany) but ignores case.
**Syntax**
```sql
multiSearchAnyCaseInsensitive(haystack, [needle1, needle2, ..., needleN])
```
**Parameters**
- `haystack` — String in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — Substrings to be searched. [Array](../../sql-reference/data-types/array.md).
**Returned value**
- 1, if there was at least one case-insensitive match.
- 0, if there was no case-insensitive match.
**Example**
Query:
```sql
SELECT multiSearchAnyCaseInsensitive('ClickHouse',['c','h']);
```
Result:
```response
1
```
## multiSearchAnyUTF8
Like [multiSearchAny](#multisearchany) but assumes `haystack` and the `needle` substrings are UTF-8 encoded strings.
**Syntax**
```sql
multiSearchAnyUTF8(haystack, [needle1, needle2, ..., needleN])
```
**Parameters**
- `haystack` — UTF-8 string in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — UTF-8 substrings to be searched. [Array](../../sql-reference/data-types/array.md).
**Returned value**
- 1, if there was at least one match.
- 0, if there was no match.
**Example**
Given `ClickHouse` as a UTF-8 string, check if there are any `C` (`\x43`) or `H` (`\x48`) letters in the word.
Query:
```sql
SELECT multiSearchAnyUTF8('\x43\x6c\x69\x63\x6b\x48\x6f\x75\x73\x65',['\x43','\x48']);
```
Result:
```response
1
```
## multiSearchAnyCaseInsensitiveUTF8
Like [multiSearchAnyUTF8](#multisearchanyutf8) but ignores case.
**Syntax**
```sql
multiSearchAnyCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])
```
**Parameters**
- `haystack` — UTF-8 string in which the search is performed. [String](../../sql-reference/syntax.md#syntax-string-literal).
- `needle` — UTF-8 substrings to be searched. [Array](../../sql-reference/data-types/array.md).
**Returned value**
- 1, if there was at least one case-insensitive match.
- 0, if there was no case-insensitive match.
**Example**
Given `ClickHouse` as a UTF-8 string, check if there is any letter `h` (`\x68`) in the word, ignoring case.
Query:
```sql
SELECT multiSearchAnyCaseInsensitiveUTF8('\x43\x6c\x69\x63\x6b\x48\x6f\x75\x73\x65',['\x68']);
```
Result:
```response
1
```
## match {#match}


@ -557,6 +557,17 @@ Mongodb
mortonDecode
mortonEncode
MsgPack
multiSearchAllPositionsCaseInsensitive
multiSearchAllPositionsCaseInsensitiveUTF
multiSearchAnyCaseInsensitive
multiSearchAnyCaseInsensitiveUTF
multiSearchAnyUTF
multiSearchFirstIndexCaseInsensitive
multiSearchFirstIndexCaseInsensitiveUTF
multiSearchFirstIndexUTF
multiSearchFirstPositionCaseInsensitive
multiSearchFirstPositionCaseInsensitiveUTF
multiSearchFirstPositionUTF
MultiPolygon
Multiline
Multiqueries