ClickHouse/docs/en/sql-reference/functions/string-search-functions.md

---
sidebar_position: 41
sidebar_label: For Searching in Strings
---

# Functions for Searching in Strings

The search is case-sensitive by default in all these functions. There are separate variants for case-insensitive search.

:::note    
Functions for [replacing](../../sql-reference/functions/string-replace-functions.md) and [other manipulations with strings](../../sql-reference/functions/string-functions.md) are described separately.
:::

## position(haystack, needle), locate(haystack, needle)

Searches for the substring `needle` in the string `haystack`.

Returns the position (in bytes) of the found substring in the string, starting from 1.

For a case-insensitive search, use the function [positionCaseInsensitive](#positioncaseinsensitive).

**Syntax**

``` sql
position(haystack, needle[, start_pos])
```

``` sql
position(needle IN haystack)
```

Alias: `locate(haystack, needle[, start_pos])`.

:::note    
The `position(needle IN haystack)` syntax provides SQL compatibility; the function works the same way as `position(haystack, needle)`.
:::

**Arguments**

-   `haystack` — String in which the substring is searched. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `needle` — Substring to search for. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `start_pos` – Position in the string at which the search starts. [UInt](../../sql-reference/data-types/int-uint.md). Optional.

**Returned values**

-   Starting position in bytes (counting from 1), if substring was found.
-   0, if the substring was not found.

Type: `Integer`.

**Examples**

The phrase “Hello, world!” consists of single-byte encoded characters, so the function returns the expected result:

Query:

``` sql
SELECT position('Hello, world!', '!');
```

Result:

``` text
┌─position('Hello, world!', '!')─┐
│                             13 │
└────────────────────────────────┘
```

``` sql
SELECT
    position('Hello, world!', 'o', 1),
    position('Hello, world!', 'o', 7)
```

``` text
┌─position('Hello, world!', 'o', 1)─┬─position('Hello, world!', 'o', 7)─┐
│                                 5 │                                 9 │
└───────────────────────────────────┴───────────────────────────────────┘
```

The same phrase in Russian contains characters that can’t be represented using a single byte, so the returned byte position differs from the character position (use the [positionUTF8](#positionutf8) function for multi-byte encoded text):

Query:

``` sql
SELECT position('Привет, мир!', '!');
```

Result:

``` text
┌─position('Привет, мир!', '!')─┐
│                            21 │
└───────────────────────────────┘
```

**Examples for POSITION(needle IN haystack) syntax**

Query:

```sql
SELECT 3 = position('c' IN 'abc');
```

Result:

```text
┌─equals(3, position('abc', 'c'))─┐
│                               1 │
└─────────────────────────────────┘
```

Query:

```sql
SELECT 6 = position('/' IN s) FROM (SELECT 'Hello/World' AS s);
```

Result:

```text
┌─equals(6, position(s, '/'))─┐
│                           1 │
└─────────────────────────────┘
```

## positionCaseInsensitive

The same as [position](#position), but case-insensitive. Returns the position (in bytes) of the found substring in the string, starting from 1.

Works under the assumption that the string contains a set of bytes representing a single-byte encoded text. If this assumption is not met and a character can’t be represented using a single byte, the function does not throw an exception and returns an unexpected result. A character that is encoded with two bytes is counted as two bytes, and so on.

**Syntax**

``` sql
positionCaseInsensitive(haystack, needle[, start_pos])
```

**Arguments**

-   `haystack` — String in which the substring is searched. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `needle` — Substring to search for. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `start_pos` — Optional. Position in the string at which the search starts. [UInt](../../sql-reference/data-types/int-uint.md).

**Returned values**

-   Starting position in bytes (counting from 1), if substring was found.
-   0, if the substring was not found.

Type: `Integer`.

**Example**

Query:

``` sql
SELECT positionCaseInsensitive('Hello, world!', 'hello');
```

Result:

``` text
┌─positionCaseInsensitive('Hello, world!', 'hello')─┐
│                                                 1 │
└───────────────────────────────────────────────────┘
```

## positionUTF8

Returns the position (in Unicode points) of the found substring in the string, starting from 1.

Works under the assumption that the string contains a set of bytes representing a UTF-8 encoded text. If this assumption is not met, the function does not throw an exception and returns an unexpected result. A character that is represented using two Unicode points is counted as two, and so on.

For a case-insensitive search, use the function [positionCaseInsensitiveUTF8](#positioncaseinsensitiveutf8).

**Syntax**

``` sql
positionUTF8(haystack, needle[, start_pos])
```

**Arguments**

-   `haystack` — String in which the substring is searched. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `needle` — Substring to search for. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `start_pos` — Optional. Position in the string at which the search starts. [UInt](../../sql-reference/data-types/int-uint.md).

**Returned values**

-   Starting position in Unicode points (counting from 1), if substring was found.
-   0, if the substring was not found.

Type: `Integer`.

**Examples**

The phrase “Hello, world!” in Russian consists of characters that are each represented by a single Unicode point, so the function returns the expected result:

Query:

``` sql
SELECT positionUTF8('Привет, мир!', '!');
```

Result:

``` text
┌─positionUTF8('Привет, мир!', '!')─┐
│                                12 │
└───────────────────────────────────┘
```

In the phrase “Salut, étudiante!”, the character `é` can be represented using one point (`U+00E9`) or two points (`U+0065U+0301`), so the function can return an unexpected result:

Query for the letter `é` represented as one Unicode point, `U+00E9`:

``` sql
SELECT positionUTF8('Salut, étudiante!', '!');
```

Result:

``` text
┌─positionUTF8('Salut, étudiante!', '!')─┐
│                                     17 │
└────────────────────────────────────────┘
```

Query for the letter `é` represented as two Unicode points, `U+0065U+0301`:

``` sql
SELECT positionUTF8('Salut, étudiante!', '!');
```

Result:

``` text
┌─positionUTF8('Salut, étudiante!', '!')─┐
│                                     18 │
└────────────────────────────────────────┘
```

## positionCaseInsensitiveUTF8

The same as [positionUTF8](#positionutf8), but is case-insensitive. Returns the position (in Unicode points) of the found substring in the string, starting from 1.

Works under the assumption that the string contains a set of bytes representing a UTF-8 encoded text. If this assumption is not met, the function does not throw an exception and returns an unexpected result. A character that is represented using two Unicode points is counted as two, and so on.

**Syntax**

``` sql
positionCaseInsensitiveUTF8(haystack, needle[, start_pos])
```

**Arguments**

-   `haystack` — String in which the substring is searched. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `needle` — Substring to search for. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `start_pos` — Optional. Position in the string at which the search starts. [UInt](../../sql-reference/data-types/int-uint.md).

**Returned value**

-   Starting position in Unicode points (counting from 1), if substring was found.
-   0, if the substring was not found.

Type: `Integer`.

**Example**

Query:

``` sql
SELECT positionCaseInsensitiveUTF8('Привет, мир!', 'Мир');
```

Result:

``` text
┌─positionCaseInsensitiveUTF8('Привет, мир!', 'Мир')─┐
│                                                  9 │
└────────────────────────────────────────────────────┘
```

## multiSearchAllPositions

The same as [position](../../sql-reference/functions/string-search-functions.md#position), but returns an `Array` of positions (in bytes) of the corresponding found substrings in the string. Positions are indexed starting from 1.

The search is performed on sequences of bytes without respect to string encoding and collation.

-   For case-insensitive ASCII search, use the function `multiSearchAllPositionsCaseInsensitive`.
-   For search in UTF-8, use the function [multiSearchAllPositionsUTF8](#multiSearchAllPositionsUTF8).
-   For case-insensitive UTF-8 search, use the function `multiSearchAllPositionsCaseInsensitiveUTF8`.

**Syntax**

``` sql
multiSearchAllPositions(haystack, [needle1, needle2, ..., needlen])
```

**Arguments**

-   `haystack` — String in which the substrings are searched. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `needle` — Substring to be searched. [String](../../sql-reference/syntax.md#syntax-string-literal).

**Returned values**

-   Array of starting positions in bytes (counting from 1), if the corresponding substring was found and 0 if not found.

**Example**

Query:

``` sql
SELECT multiSearchAllPositions('Hello, World!', ['hello', '!', 'world']);
```

Result:

``` text
┌─multiSearchAllPositions('Hello, World!', ['hello', '!', 'world'])─┐
│ [0,13,0]                                                          │
└───────────────────────────────────────────────────────────────────┘
```

## multiSearchAllPositionsUTF8

See `multiSearchAllPositions`.

## multiSearchFirstPosition(haystack, \[needle<sub>1</sub>, needle<sub>2</sub>, …, needle<sub>n</sub>\])

The same as `position`, but returns the leftmost offset in the string `haystack` at which any of the needles matches.

For a case-insensitive search and/or a search in UTF-8 format, use the functions `multiSearchFirstPositionCaseInsensitive`, `multiSearchFirstPositionUTF8`, and `multiSearchFirstPositionCaseInsensitiveUTF8`.
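As a minimal sketch of the behavior described above (the literal strings are illustrative and not taken from the original text): in `'Hello, World!'` the needle `'World'` starts at byte 8 and `'lo'` starts at byte 4, so the leftmost offset should be 4.

```sql
-- Leftmost position at which any of the needles is found.
SELECT multiSearchFirstPosition('Hello, World!', ['World', 'lo']) AS pos; -- expected: 4
```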

## multiSearchFirstIndex(haystack, \[needle<sub>1</sub>, needle<sub>2</sub>, …, needle<sub>n</sub>\])

Returns the index `i` (starting from 1) of the leftmost found needle<sub>i</sub> in the string `haystack` and 0 otherwise.

For a case-insensitive search and/or a search in UTF-8 format, use the functions `multiSearchFirstIndexCaseInsensitive`, `multiSearchFirstIndexUTF8`, and `multiSearchFirstIndexCaseInsensitiveUTF8`.
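A short illustrative query (the strings are assumptions chosen for this sketch): the first needle does not occur in the haystack, and the second one occurs at the very beginning, so index 2 should be returned. When no needle is found, the result is 0.

```sql
SELECT
    multiSearchFirstIndex('Hello, World!', ['xyz', 'Hello', 'World']) AS found,     -- expected: 2
    multiSearchFirstIndex('Hello, World!', ['xyz', 'abc'])            AS not_found; -- expected: 0
```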

## multiSearchAny(haystack, \[needle<sub>1</sub>, needle<sub>2</sub>, …, needle<sub>n</sub>\])

Returns 1 if at least one needle<sub>i</sub> is found in the string `haystack`, and 0 otherwise.

For a case-insensitive search and/or a search in UTF-8 format, use the functions `multiSearchAnyCaseInsensitive`, `multiSearchAnyUTF8`, and `multiSearchAnyCaseInsensitiveUTF8`.
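For illustration, a small sketch with made-up strings: the first call contains one needle that occurs in the haystack, the second contains none. Note that the search is case-sensitive, so `'world'` is not found in `'Hello, World!'`.

```sql
SELECT
    multiSearchAny('Hello, World!', ['xyz', 'World']) AS any_found,  -- expected: 1
    multiSearchAny('Hello, World!', ['xyz', 'world']) AS none_found; -- expected: 0
```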

:::note    
In all `multiSearch*` functions, the number of needles must be less than 2<sup>8</sup> because of the implementation.
:::

## match(haystack, pattern)

Checks whether the string matches the regular expression `pattern`, which uses `re2` syntax. The [syntax](https://github.com/google/re2/wiki/Syntax) of `re2` regular expressions is more limited than the syntax of Perl regular expressions.

Returns 0 if it does not match, or 1 if it matches.

The regular expression works with the string as if it is a set of bytes. The regular expression can’t contain null bytes.
For patterns that only search for substrings in a string, it is better to use `LIKE` or `position`, since they work much faster.
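A brief sketch of typical usage, with example strings that are not from the original text. The pattern does not have to cover the whole string unless it is anchored with `^` or `$`.

```sql
SELECT
    match('Hello, world!', '^Hello') AS starts_with_hello, -- expected: 1
    match('Hello, world!', 'wor.d')  AS contains_pattern,  -- expected: 1
    match('Hello, world!', '^world') AS starts_with_world; -- expected: 0
```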

## multiMatchAny(haystack, \[pattern<sub>1</sub>, pattern<sub>2</sub>, …, pattern<sub>n</sub>\])

The same as `match`, but returns 0 if none of the regular expressions match and 1 if any of the patterns matches. It uses the [hyperscan](https://github.com/intel/hyperscan) library. For patterns that only search for substrings in a string, it is better to use `multiSearchAny`, since it works much faster.

:::note    
The length of each `haystack` string must be less than 2<sup>32</sup> bytes; otherwise an exception is thrown. This restriction comes from the hyperscan API.
:::
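The following sketch shows `multiMatchAny` with illustrative patterns, assuming a build in which the hyperscan-based functions are available and enabled. In the first call the second pattern matches `llo`; in the second call neither pattern matches.

```sql
SELECT
    multiMatchAny('Hello, world!', ['^world', 'l+o'])   AS any_match, -- expected: 1
    multiMatchAny('Hello, world!', ['^world', '\\d+'])  AS no_match;  -- expected: 0
```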

## multiMatchAnyIndex(haystack, \[pattern<sub>1</sub>, pattern<sub>2</sub>, …, pattern<sub>n</sub>\])

The same as `multiMatchAny`, but returns the index of any pattern that matches the haystack.
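As an illustrative sketch (strings chosen for this example), only the second pattern matches here, so the returned index is unambiguous. When several patterns match, any one of their indexes may be returned.

```sql
SELECT multiMatchAnyIndex('Hello, world!', ['^world', 'wor.d']) AS idx; -- expected: 2
```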

## multiMatchAllIndices(haystack, \[pattern<sub>1</sub>, pattern<sub>2</sub>, …, pattern<sub>n</sub>\])

The same as `multiMatchAny`, but returns the array of all indices that match the haystack, in any order.
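Because the order of the returned indices is not defined, one option is to wrap the call in `arraySort` when a stable result is needed. The patterns below are illustrative: the first and third match, the second does not.

```sql
SELECT arraySort(
    multiMatchAllIndices('Hello, world!', ['^Hello', '\\d+', 'wor.d'])
) AS indices; -- expected: [1,3]
```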

## multiFuzzyMatchAny(haystack, distance, \[pattern<sub>1</sub>, pattern<sub>2</sub>, …, pattern<sub>n</sub>\])

The same as `multiMatchAny`, but returns 1 if any pattern matches the haystack within a constant [edit distance](https://en.wikipedia.org/wiki/Edit_distance). This function relies on the experimental feature of the [hyperscan](https://intel.github.io/hyperscan/dev-reference/compilation.html#approximate-matching) library and can be slow in some corner cases. The performance depends on the edit distance value and the patterns used, but it is always more expensive than the non-fuzzy variants.
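A small sketch of fuzzy matching, using a made-up misspelling: `wurld` differs from `world` by a single substitution, so with an edit distance of 1 the call should report a match.

```sql
-- 'wurld' is one substitution away from 'world'.
SELECT multiFuzzyMatchAny('Hello, world!', 1, ['wurld']) AS fuzzy_match; -- expected: 1
```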

## multiFuzzyMatchAnyIndex(haystack, distance, \[pattern<sub>1</sub>, pattern<sub>2</sub>, …, pattern<sub>n</sub>\])

The same as `multiFuzzyMatchAny`, but returns any index that matches the haystack within a constant edit distance.

## multiFuzzyMatchAllIndices(haystack, distance, \[pattern<sub>1</sub>, pattern<sub>2</sub>, …, pattern<sub>n</sub>\])

The same as `multiFuzzyMatchAny`, but returns the array of all indices in any order that match the haystack within a constant edit distance.

:::note    
`multiFuzzyMatch*` functions do not support UTF-8 regular expressions; because of a hyperscan restriction, such expressions are treated as sequences of bytes.
:::

:::note    
To turn off all functions that use hyperscan, use the setting `SET allow_hyperscan = 0;`.
:::

## extract(haystack, pattern)

Extracts a fragment of a string using a regular expression. If `haystack` does not match the `pattern` regex, an empty string is returned. If the regex does not contain subpatterns, it takes the fragment that matches the entire regex. Otherwise, it takes the fragment that matches the first subpattern.
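The three cases described above can be sketched as follows (the input strings are illustrative). Backslashes are doubled because the pattern is written as an SQL string literal.

```sql
SELECT
    extract('order 42 shipped', '\\d+')   AS whole_match, -- expected: '42'
    extract('key=value', '(\\w+)=(\\w+)') AS first_group, -- expected: 'key'
    extract('no digits here', '\\d+')     AS no_match;    -- expected: ''
```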

## extractAll(haystack, pattern)

Extracts all the fragments of a string using a regular expression. Returns an array of strings consisting of all matches of the regex. If `haystack` does not match the `pattern` regex, the returned array is empty. In general, the behavior is the same as the `extract` function (it takes the first subpattern, or the entire expression if there isn’t a subpattern).
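A minimal sketch with illustrative inputs: the first call has no subpattern, so whole matches are collected; the second has one subpattern, so only the captured part of each match is collected.

```sql
SELECT
    extractAll('a1 b22 c333', '\\d+')      AS whole_matches, -- expected: ['1','22','333']
    extractAll('k1=v1, k2=v2', '(\\w+)=')  AS first_groups;  -- expected: ['k1','k2']
```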

## extractAllGroupsHorizontal

Matches all groups of the `haystack` string using the `pattern` regular expression. Returns an array of arrays, where the first array includes all fragments matching the first group, the second array includes all fragments matching the second group, and so on.

:::note    
`extractAllGroupsHorizontal` function is slower than [extractAllGroupsVertical](#extractallgroups-vertical).
:::

**Syntax**

``` sql
extractAllGroupsHorizontal(haystack, pattern)
```

**Arguments**

-   `haystack` — Input string. Type: [String](../../sql-reference/data-types/string.md).
-   `pattern` — Regular expression with [re2 syntax](https://github.com/google/re2/wiki/Syntax). Must contain groups, each group enclosed in parentheses. If `pattern` contains no groups, an exception is thrown. Type: [String](../../sql-reference/data-types/string.md).

**Returned value**

-   Type: [Array](../../sql-reference/data-types/array.md).

If `haystack` does not match the `pattern` regex, an array of empty arrays is returned.

**Example**

Query:

``` sql
SELECT extractAllGroupsHorizontal('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)');
```

Result:

``` text
┌─extractAllGroupsHorizontal('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')─┐
│ [['abc','def','ghi'],['111','222','333']]                                                │
└──────────────────────────────────────────────────────────────────────────────────────────┘
```

**See Also**

-   [extractAllGroupsVertical](#extractallgroups-vertical)

## extractAllGroupsVertical

Matches all groups of the `haystack` string using the `pattern` regular expression. Returns an array of arrays, where each array includes matching fragments from every group. Fragments are grouped in order of appearance in the `haystack`.

**Syntax**

``` sql
extractAllGroupsVertical(haystack, pattern)
```

**Arguments**

-   `haystack` — Input string. Type: [String](../../sql-reference/data-types/string.md).
-   `pattern` — Regular expression with [re2 syntax](https://github.com/google/re2/wiki/Syntax). Must contain groups, each group enclosed in parentheses. If `pattern` contains no groups, an exception is thrown. Type: [String](../../sql-reference/data-types/string.md).

**Returned value**

-   Type: [Array](../../sql-reference/data-types/array.md).

If `haystack` does not match the `pattern` regex, an empty array is returned.

**Example**

Query:

``` sql
SELECT extractAllGroupsVertical('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)');
```

Result:

``` text
┌─extractAllGroupsVertical('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')─┐
│ [['abc','111'],['def','222'],['ghi','333']]                                            │
└────────────────────────────────────────────────────────────────────────────────────────┘
```

**See Also**

-   [extractAllGroupsHorizontal](#extractallgroups-horizontal)

## like(haystack, pattern), haystack LIKE pattern operator

Checks whether a string matches a simple regular expression.
The regular expression can contain the metasymbols `%` and `_`.

`%` indicates any quantity of any bytes (including zero bytes).

`_` indicates any one byte.

Use the backslash (`\`) for escaping metasymbols. See the note on escaping in the description of the `match` function.

For regular expressions like `%needle%`, the code is more optimized and works as fast as the `position` function.
For other regular expressions, the code is the same as for the `match` function.
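For illustration, a few patterns with made-up strings. The pattern must match the whole string, so a prefix search needs a trailing `%`.

```sql
SELECT
    'Hello, world!' LIKE '%world%' AS substring_match, -- expected: 1
    'abc' LIKE 'a_c'               AS single_byte,     -- expected: 1
    'abc' LIKE 'b%'                AS prefix_mismatch, -- expected: 0
    like('abc', 'a%')              AS function_form;   -- expected: 1
```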

## notLike(haystack, pattern), haystack NOT LIKE pattern operator

The same as `like`, but negated.

## ilike

Case-insensitive variant of the [like](https://clickhouse.com/docs/en/sql-reference/functions/string-search-functions/#function-like) function. You can use the `ILIKE` operator instead of the `ilike` function.

**Syntax**

``` sql
ilike(haystack, pattern)
```

**Arguments**

-   `haystack` — Input string. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `pattern` — If `pattern` does not contain percent signs or underscores, then the `pattern` only represents the string itself. An underscore (`_`) in `pattern` stands for (matches) any single character. A percent sign (`%`) matches any sequence of zero or more characters.

Some `pattern` examples:

``` text
'abc' ILIKE 'abc'    true
'abc' ILIKE 'a%'     true
'abc' ILIKE '_b_'    true
'abc' ILIKE 'c'      false
```

**Returned values**

-   True, if the string matches `pattern`.
-   False, if the string does not match `pattern`.

**Example**

Input table:

``` text
┌─id─┬─name─────┬─days─┐
│  1 │ January  │   31 │
│  2 │ February │   29 │
│  3 │ March    │   31 │
│  4 │ April    │   30 │
└────┴──────────┴──────┘
```

Query:

``` sql
SELECT * FROM Months WHERE ilike(name, '%j%');
```

Result:

``` text
┌─id─┬─name────┬─days─┐
│  1 │ January │   31 │
└────┴─────────┴──────┘
```

**See Also**

-   [like](https://clickhouse.com/docs/en/sql-reference/functions/string-search-functions/#function-like) <!--hide-->

## ngramDistance(haystack, needle)

Calculates the 4-gram distance between `haystack` and `needle`: counts the symmetric difference between two multisets of 4-grams and normalizes it by the sum of their cardinalities. Returns a float number from 0 to 1; the closer to zero, the more similar the strings are to each other. If the constant `needle` or `haystack` is larger than 32Kb, an exception is thrown. If some of the non-constant `haystack` or `needle` strings are larger than 32Kb, the distance is always one.

For a case-insensitive search and/or a search in UTF-8 format, use the functions `ngramDistanceCaseInsensitive`, `ngramDistanceUTF8`, and `ngramDistanceCaseInsensitiveUTF8`.
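A short sketch of how the distance behaves, using illustrative strings. Identical strings have no differing 4-grams, so their distance is 0; the other two values are not stated here, but the misspelled string should score lower than the unrelated one.

```sql
SELECT
    ngramDistance('ClickHouse', 'ClickHouse') AS identical, -- expected: 0
    ngramDistance('ClickHouse', 'ClickHose')  AS one_typo,  -- some value above 0
    ngramDistance('ClickHouse', 'PostgreSQL') AS unrelated; -- a value near 1
```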

## ngramSearch(haystack, needle)

Same as `ngramDistance` but calculates the non-symmetric difference between `needle` and `haystack` – the number of n-grams from needle minus the common number of n-grams normalized by the number of `needle` n-grams. The closer to one, the more likely `needle` is in the `haystack`. Can be useful for fuzzy string search.

For a case-insensitive search and/or a search in UTF-8 format, use the functions `ngramSearchCaseInsensitive`, `ngramSearchUTF8`, and `ngramSearchCaseInsensitiveUTF8`.
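As a hedged sketch of fuzzy search with `ngramSearch` (the strings are illustrative and exact scores are intentionally not given): the first score should be noticeably closer to 1 than the second, because the needle's n-grams all occur in the haystack.

```sql
SELECT
    ngramSearch('ClickHouse is a column-oriented DBMS', 'ClickHouse') AS likely_present,
    ngramSearch('ClickHouse is a column-oriented DBMS', 'PostgreSQL') AS likely_absent;
```

In practice, such a score could be used in an `ORDER BY ... DESC` clause to rank candidate rows by how likely they are to contain the needle.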

:::note    
For the UTF-8 case, a 3-gram distance is used. None of these are perfectly fair n-gram distances: n-grams are hashed with 2-byte hashes and the (non-)symmetric difference is calculated between the resulting hash tables, so collisions may occur. In the UTF-8 case-insensitive format, a fair `tolower` function is not used: the 5-th bit (starting from zero) of each codepoint byte is zeroed, as is the first bit of the zeroth byte if there is more than one byte. This works for Latin and for almost all Cyrillic letters.
:::

## countSubstrings

Returns the number of substring occurrences.

For a case-insensitive search, use [countSubstringsCaseInsensitive](../../sql-reference/functions/string-search-functions.md#countSubstringsCaseInsensitive) or [countSubstringsCaseInsensitiveUTF8](../../sql-reference/functions/string-search-functions.md#countSubstringsCaseInsensitiveUTF8) functions.

**Syntax**

``` sql
countSubstrings(haystack, needle[, start_pos])
```

**Arguments**

-   `haystack` — The string to search in. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `needle` — The substring to search for. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `start_pos` – Position of the first character in the string to start search. Optional. [UInt](../../sql-reference/data-types/int-uint.md).

**Returned values**

-   Number of occurrences.

Type: [UInt64](../../sql-reference/data-types/int-uint.md).

**Examples**

Query:

``` sql
SELECT countSubstrings('foobar.com', '.');
```

Result:

``` text
┌─countSubstrings('foobar.com', '.')─┐
│                                  1 │
└────────────────────────────────────┘
```

Query:

``` sql
SELECT countSubstrings('aaaa', 'aa');
```

Result:

``` text
┌─countSubstrings('aaaa', 'aa')─┐
│                             2 │
└───────────────────────────────┘
```

Query:

```sql
SELECT countSubstrings('abc___abc', 'abc', 4);
```

Result:

``` text
┌─countSubstrings('abc___abc', 'abc', 4)─┐
│                                      1 │
└────────────────────────────────────────┘
```

## countSubstringsCaseInsensitive

Returns the number of substring occurrences, case-insensitively.

**Syntax**

``` sql
countSubstringsCaseInsensitive(haystack, needle[, start_pos])
```

**Arguments**

-   `haystack` — The string to search in. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `needle` — The substring to search for. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `start_pos` — Position of the first character in the string to start search. Optional. [UInt](../../sql-reference/data-types/int-uint.md).

**Returned values**

-   Number of occurrences.

Type: [UInt64](../../sql-reference/data-types/int-uint.md).

**Examples**

Query:

``` sql
SELECT countSubstringsCaseInsensitive('aba', 'B');
```

Result:

``` text
┌─countSubstringsCaseInsensitive('aba', 'B')─┐
│                                          1 │
└────────────────────────────────────────────┘
```

Query:

``` sql
SELECT countSubstringsCaseInsensitive('foobar.com', 'CoM');
```

Result:

``` text
┌─countSubstringsCaseInsensitive('foobar.com', 'CoM')─┐
│                                                   1 │
└─────────────────────────────────────────────────────┘
```

Query:

``` sql
SELECT countSubstringsCaseInsensitive('abC___abC', 'aBc', 2);
```

Result:

``` text
┌─countSubstringsCaseInsensitive('abC___abC', 'aBc', 2)─┐
│                                                     1 │
└───────────────────────────────────────────────────────┘
```

## countSubstringsCaseInsensitiveUTF8

Returns the number of substring occurrences in `UTF-8`, case-insensitively.

**Syntax**

``` sql
countSubstringsCaseInsensitiveUTF8(haystack, needle[, start_pos])
```

**Arguments**

-   `haystack` — The string to search in. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `needle` — The substring to search for. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `start_pos` — Position of the first character in the string to start search. Optional. [UInt](../../sql-reference/data-types/int-uint.md).

**Returned values**

-   Number of occurrences.

Type: [UInt64](../../sql-reference/data-types/int-uint.md).

**Examples**

Query:

``` sql
SELECT countSubstringsCaseInsensitiveUTF8('абв', 'A');
```

Result:

``` text
┌─countSubstringsCaseInsensitiveUTF8('абв', 'A')─┐
│                                              1 │
└────────────────────────────────────────────────┘
```

Query:

```sql
SELECT countSubstringsCaseInsensitiveUTF8('аБв__АбВ__абв', 'Абв');
```

Result:

``` text
┌─countSubstringsCaseInsensitiveUTF8('аБв__АбВ__абв', 'Абв')─┐
│                                                          3 │
└────────────────────────────────────────────────────────────┘
```

## countMatches(haystack, pattern)

Returns the number of regular expression matches for a `pattern` in a `haystack`.

**Syntax**

``` sql
countMatches(haystack, pattern)
```

**Arguments**

-   `haystack` — The string to search in. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `pattern` — The regular expression with [re2 syntax](https://github.com/google/re2/wiki/Syntax). [String](../../sql-reference/data-types/string.md).

**Returned value**

-   The number of matches.

Type: [UInt64](../../sql-reference/data-types/int-uint.md).

**Examples**

Query:

``` sql
SELECT countMatches('foobar.com', 'o+');
```

Result:

``` text
┌─countMatches('foobar.com', 'o+')─┐
│                                2 │
└──────────────────────────────────┘
```

Query:

``` sql
SELECT countMatches('aaaa', 'aa');
```

Result:

``` text
┌─countMatches('aaaa', 'aa')────┐
│                             2 │
└───────────────────────────────┘
```
-												Get rid of toc_en.yml (#10023)


											
										
										
											2020-04-03 13:23:32 +00:00
+								---
-												Removed /ja folder, cleaned up /ru markdown

											
										
										
											2022-04-09 13:29:05 +00:00
+								sidebar_position: 41
 								sidebar_label: For Searching in Strings
-												Get rid of toc_en.yml (#10023)


											
										
										
											2020-04-03 13:23:32 +00:00
+								---
-												Remove H1 anchor tags from docs

											
										
										
											2022-06-02 10:55:18 +00:00
+								# Functions for Searching in Strings
-												Sources for english documentation switched to Markdown.
Edit page link is fixed too for both language versions of documentation.

											
										
										
											2017-12-28 15:13:23 +00:00
-												Update string_search_functions.md
											
										
										
											2019-02-11 12:49:33 +00:00
+								The search is case-sensitive by default in all these functions. There are separate variants for case insensitive search.
-												Sources for english documentation switched to Markdown.
Edit page link is fixed too for both language versions of documentation.

											
										
										
											2017-12-28 15:13:23 +00:00
-												Removed /ja folder, cleaned up /ru markdown

											
										
										
											2022-04-09 13:29:05 +00:00
+								:::note
 								Functions for [replacing](../../sql-reference/functions/string-replace-functions.md) and [other manipulations with strings](../../sql-reference/functions/string-functions.md) are described separately.
 								:::
-												Update string-search-functions.md
											
										
										
											2020-06-19 10:08:10 +00:00
-												Remove H1 anchor tags from docs

											
										
										
											2022-06-02 10:55:18 +00:00
+								## position(haystack, needle), locate(haystack, needle)
-												Sources for english documentation switched to Markdown.
Edit page link is fixed too for both language versions of documentation.

											
										
										
											2017-12-28 15:13:23 +00:00
-												Update docs/en/sql-reference/functions/string-search-functions.md

Co-authored-by: Anna <42538400+adevyatova@users.noreply.github.com>
											
										
										
											2021-03-23 06:25:38 +00:00
+								Searches for the substring `needle` in the string `haystack`.
-												new examples

											
										
										
											2021-03-22 16:30:28 +00:00
-												Remove trailing whitespaces from docs

											
										
										
											2021-07-29 15:20:55 +00:00
+								Returns the position (in bytes) of the found substring in the string, starting from 1.
-												draft

											
										
										
											2021-02-22 09:49:49 +00:00
-												DOCS-57: position, positionCaseInsensitive, positionUTF8, positionCaseInsensitiveUTF8 (#9631)


											
										
										
											2020-03-13 06:33:02 +00:00
+								For a case-insensitive search, use the function [positionCaseInsensitive](#positioncaseinsensitive).
-												Sources for english documentation switched to Markdown.
Edit page link is fixed too for both language versions of documentation.

											
										
										
											2017-12-28 15:13:23 +00:00
-												DOCS-57: position, positionCaseInsensitive, positionUTF8, positionCaseInsensitiveUTF8 (#9631)


											
										
										
											2020-03-13 06:33:02 +00:00
+								**Syntax**
-												Changes in accordance with comments from the developers.

											
										
										
											2018-04-28 11:45:37 +00:00
-												Normalization for en markdown (#9763)


											
										
										
											2020-03-20 10:10:48 +00:00
+								``` sql
-												developer`s comments done

											
										
										
											2021-03-30 06:15:52 +00:00
+								position(haystack, needle[, start_pos])
-												Remove trailing whitespaces from docs

											
										
										
											2021-07-29 15:20:55 +00:00
+								```
-												developer`s comments done

											
										
										
											2021-03-30 06:15:52 +00:00
 								``` sql
 								position(needle IN haystack)
-												Remove trailing whitespaces from docs

											
										
										
											2021-07-29 15:20:55 +00:00
+								```
-												DOCS-57: position, positionCaseInsensitive, positionUTF8, positionCaseInsensitiveUTF8 (#9631)


											
										
										
											2020-03-13 06:33:02 +00:00
-												Add start_pos argument for position to documentation, case insensitive tests

											
										
										
											2020-08-02 13:29:10 +00:00
+								Alias: `locate(haystack, needle[, start_pos])`.
-												DOCS-57: position, positionCaseInsensitive, positionUTF8, positionCaseInsensitiveUTF8 (#9631)


											
										
										
											2020-03-13 06:33:02 +00:00
-												Removed /ja folder, cleaned up /ru markdown

											
										
										
:::note
The `position(needle IN haystack)` syntax provides SQL compatibility; the function works the same way as `position(haystack, needle)`.
:::

**Arguments**

-   `haystack` — String in which the substring is searched. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `needle` — Substring to search for. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `start_pos` – Position in the string (counting from 1) at which the search starts. [UInt](../../sql-reference/data-types/int-uint.md). Optional.

**Returned values**

-   Starting position in bytes (counting from 1), if the substring was found.
-   0, if the substring was not found.

Type: `Integer`.

**Examples**

The phrase “Hello, world!” contains only characters that are encoded with a single byte each, so the function returns the expected result:

Query:

``` sql
SELECT position('Hello, world!', '!');
```

Result:

``` text
┌─position('Hello, world!', '!')─┐
│                             13 │
└────────────────────────────────┘
```

Query with the `start_pos` argument:

``` sql
SELECT
    position('Hello, world!', 'o', 1),
    position('Hello, world!', 'o', 7)
```

Result:

``` text
┌─position('Hello, world!', 'o', 1)─┬─position('Hello, world!', 'o', 7)─┐
│                                 5 │                                 9 │
└───────────────────────────────────┴───────────────────────────────────┘
```

The same phrase in Russian contains characters that cannot be represented using a single byte. Because `position` counts bytes, the result may be unexpected (use the [positionUTF8](#positionutf8) function for multi-byte encoded text):

Query:

``` sql
SELECT position('Привет, мир!', '!');
```

Result:

``` text
┌─position('Привет, мир!', '!')─┐
│                            21 │
└───────────────────────────────┘
```

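To see where the value 21 comes from, the byte-oriented and UTF-8 variants can be compared side by side. The following is a minimal sketch: it assumes the literal is stored as UTF-8, where each Cyrillic letter takes two bytes, and it uses `length` and `lengthUTF8` only to show the string size in both units. The column aliases are illustrative.

``` sql
SELECT
    position('Привет, мир!', '!')     AS pos_bytes,        -- position counted in bytes
    positionUTF8('Привет, мир!', '!') AS pos_code_points,  -- position counted in Unicode code points
    length('Привет, мир!')            AS len_bytes,        -- string size in bytes
    lengthUTF8('Привет, мир!')        AS len_code_points;  -- string size in Unicode code points
```

Under that assumption, both byte-based values should be 21 and both code-point-based values 12: the nine Cyrillic letters take 18 bytes, and the comma, the space and the exclamation mark take one byte each.
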
**Examples for POSITION(needle IN haystack) syntax**

Query:

```sql
SELECT 3 = position('c' IN 'abc');
```

Result:

```text
┌─equals(3, position('abc', 'c'))─┐
│                               1 │
└─────────────────────────────────┘
```

Query:

```sql
SELECT 6 = position('/' IN s) FROM (SELECT 'Hello/World' AS s);
```

Result:

```text
┌─equals(6, position(s, '/'))─┐
│                           1 │
└─────────────────────────────┘
```

## positionCaseInsensitive

The same as [position](#position), but case-insensitive. Returns the position (in bytes) of the found substring in the string, starting from 1.

Works under the assumption that the string contains a set of bytes representing single-byte encoded text. If this assumption is not met and a character cannot be represented using a single byte, the function does not throw an exception and returns an unexpected result. A character that is encoded with two bytes counts as two bytes, and so on.

**Syntax**

``` sql
positionCaseInsensitive(haystack, needle[, start_pos])
```

**Arguments**

-   `haystack` — String in which the substring is searched. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `needle` — Substring to search for. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `start_pos` — Position in the string (counting from 1) at which the search starts. [UInt](../../sql-reference/data-types/int-uint.md). Optional.

**Returned values**

-   Starting position in bytes (counting from 1), if the substring was found.
-   0, if the substring was not found.

Type: `Integer`.

**Example**

Query:

``` sql
SELECT positionCaseInsensitive('Hello, world!', 'hello');
```

Result:

``` text
┌─positionCaseInsensitive('Hello, world!', 'hello')─┐
│                                                 1 │
└───────────────────────────────────────────────────┘
```

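As a further sketch, the query below contrasts the case-sensitive and case-insensitive variants on the same ASCII string and shows the optional `start_pos` argument. The column aliases are illustrative and not part of the function's interface.

``` sql
SELECT
    position('Hello, world!', 'O')                   AS case_sensitive,    -- upper-case 'O' does not occur
    positionCaseInsensitive('Hello, world!', 'O')    AS case_insensitive,  -- matches the 'o' in 'Hello'
    positionCaseInsensitive('Hello, world!', 'O', 6) AS from_pos_6;        -- search starts at byte 6
```

The three columns should contain 0, 5 and 9: the first `o` is the fifth byte, and the next one after byte 6 is the `o` in `world`.
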
## positionUTF8

Returns the position (in Unicode code points) of the found substring in the string, starting from 1.

Works under the assumption that the string contains a set of bytes representing UTF-8 encoded text. If this assumption is not met, the function does not throw an exception and returns an unexpected result. A character that is represented using two Unicode code points counts as two, and so on.

For a case-insensitive search, use the function [positionCaseInsensitiveUTF8](#positioncaseinsensitiveutf8).

**Syntax**

``` sql
positionUTF8(haystack, needle[, start_pos])
```

**Arguments**

-   `haystack` — String in which the substring is searched. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `needle` — Substring to search for. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `start_pos` — Position in the string (counting from 1) at which the search starts. [UInt](../../sql-reference/data-types/int-uint.md). Optional.

**Returned values**

-   Starting position in Unicode code points (counting from 1), if the substring was found.
-   0, if the substring was not found.

Type: `Integer`.

**Examples**

In the phrase “Hello, world!” in Russian, every character is represented by a single Unicode code point, so the function returns the expected result:

Query:

``` sql
SELECT positionUTF8('Привет, мир!', '!');
```

Result:

``` text
┌─positionUTF8('Привет, мир!', '!')─┐
│                                12 │
└───────────────────────────────────┘
```

In the phrase “Salut, étudiante!”, the character `é` can be represented using one code point (`U+00E9`) or two code points (`U+0065U+0301`), so the function can return an unexpected result:

Query with the letter `é` represented as one Unicode code point, `U+00E9`:

``` sql
SELECT positionUTF8('Salut, étudiante!', '!');
```

Result:

``` text
┌─positionUTF8('Salut, étudiante!', '!')─┐
│                                     17 │
└────────────────────────────────────────┘
```

Query with the letter `é` represented as two Unicode code points, `U+0065U+0301`:

``` sql
SELECT positionUTF8('Salut, étudiante!', '!');
```

Result:

``` text
┌─positionUTF8('Salut, étudiante!', '!')─┐
│                                     18 │
└────────────────────────────────────────┘
```

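The two forms of `é` look identical when printed, so the difference is easy to lose when a query is copied. The sketch below builds both strings from explicit bytes instead. It assumes the `char` function (which returns a string made of the given byte values) and `concat` are available; the column aliases are illustrative.

``` sql
SELECT
    -- 0xC3 0xA9 is the UTF-8 encoding of U+00E9, the precomposed 'é'
    positionUTF8(concat('Salut, ', char(0xC3, 0xA9), 'tudiante!'), '!')  AS precomposed,
    -- 'e' followed by 0xCC 0x81, the UTF-8 encoding of U+0301 (combining acute accent)
    positionUTF8(concat('Salut, e', char(0xCC, 0x81), 'tudiante!'), '!') AS decomposed;
```

The first column should be 17 and the second 18, matching the two results above: the decomposed form has one more code point before the exclamation mark.
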
## positionCaseInsensitiveUTF8

The same as [positionUTF8](#positionutf8), but case-insensitive. Returns the position (in Unicode code points) of the found substring in the string, starting from 1.

Works under the assumption that the string contains a set of bytes representing UTF-8 encoded text. If this assumption is not met, the function does not throw an exception and returns an unexpected result. A character that is represented using two Unicode code points counts as two, and so on.

**Syntax**

``` sql
positionCaseInsensitiveUTF8(haystack, needle[, start_pos])
```

**Arguments**

-   `haystack` — String in which the substring is searched. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `needle` — Substring to search for. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `start_pos` — Position in the string (counting from 1) at which the search starts. [UInt](../../sql-reference/data-types/int-uint.md). Optional.

**Returned values**

-   Starting position in Unicode code points (counting from 1), if the substring was found.
-   0, if the substring was not found.

Type: `Integer`.

**Example**

Query:

``` sql
SELECT positionCaseInsensitiveUTF8('Привет, мир!', 'Мир');
```

Result:

``` text
┌─positionCaseInsensitiveUTF8('Привет, мир!', 'Мир')─┐
│                                                  9 │
└────────────────────────────────────────────────────┘
```

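For comparison, the sketch below runs the case-sensitive UTF-8 variant on the same input. The column aliases are illustrative.

``` sql
SELECT
    positionUTF8('Привет, мир!', 'Мир')                AS case_sensitive,    -- capital 'М' does not occur
    positionCaseInsensitiveUTF8('Привет, мир!', 'Мир') AS case_insensitive;  -- matches 'мир'
```

The first column should be 0, because the haystack contains only a lowercase `м`, and the second should be 9, as in the example above.
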
## multiSearchAllPositions

The same as [position](../../sql-reference/functions/string-search-functions.md#position), but returns an `Array` of positions (in bytes) of the found corresponding substrings in the string. Positions are indexed starting from 1.

The search is performed on sequences of bytes without respect to string encoding and collation.

-   For case-insensitive ASCII search, use the function `multiSearchAllPositionsCaseInsensitive`.
-   For search in UTF-8, use the function [multiSearchAllPositionsUTF8](#multiSearchAllPositionsUTF8).
-   For case-insensitive UTF-8 search, use the function `multiSearchAllPositionsCaseInsensitiveUTF8`.

**Syntax**

``` sql
multiSearchAllPositions(haystack, [needle1, needle2, ..., needlen])
```

											2021-02-15 21:22:10 +00:00
+								**Arguments**
-												elenbaskakova-DOCSUP-178
docs(multiSearchAllPositions, multiSearchAllPositionsUTF8): Full description of functions  was added

											
										
										
											2019-10-12 21:12:09 +00:00
-												Edit and translate to Russian

Поправил шаблоны в английской и русской версиях.

											
										
										
											2021-03-13 18:18:45 +00:00
+								-   `haystack` — String, in which substring will to be searched. [String](../../sql-reference/syntax.md#syntax-string-literal).
 								-   `needle` — Substring to be searched. [String](../../sql-reference/syntax.md#syntax-string-literal).
-												elenbaskakova-DOCSUP-178
docs(multiSearchAllPositions, multiSearchAllPositionsUTF8): Full description of functions  was added

											
										
										
											2019-10-12 21:12:09 +00:00
 								**Returned values**
-												[experimental] add "es" docs language as machine translated draft (#9787)

* replace exit with assert in test_single_page

* improve save_raw_single_page docs option

* More grammar fixes

* "Built from" link in new tab

* fix mistype

* Example of include in docs

* add anchor to meeting form

* Draft of translation helper

* WIP on translation helper

* Replace some fa docs content with machine translation

* add normalize-en-markdown.sh

* normalize some en markdown

* normalize some en markdown

* admonition support

* normalize

* normalize

* normalize

* support wide tables

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* normalize

* lightly edited machine translation of introdpection.md

* lightly edited machhine translation of lazy.md

* WIP on translation utils

* Normalize ru docs

* Normalize other languages

* some fixes

* WIP on normalize/translate tools

* add requirements.txt

* [experimental] add es docs language as machine translated draft

* remove duplicate script

* Back to wider tab-stop (narrow renders not so well)
											
										
										
											2020-03-21 04:11:51 +00:00
+								-   Array of starting positions in bytes (counting from 1), if the corresponding substring was found and 0 if not found.
-												elenbaskakova-DOCSUP-178
docs(multiSearchAllPositions, multiSearchAllPositionsUTF8): Full description of functions  was added

											
										
										
											2019-10-12 21:12:09 +00:00
 								**Example**
 								Query:
-												Normalization for en markdown (#9763)


											
										
										
											2020-03-20 10:10:48 +00:00
+								``` sql
-												Edit and translate to Russian

Поправил шаблоны в английской и русской версиях.

											
										
										
											2021-03-13 18:18:45 +00:00
+								SELECT multiSearchAllPositions('Hello, World!', ['hello', '!', 'world']);
-												elenbaskakova-DOCSUP-178
docs(multiSearchAllPositions, multiSearchAllPositionsUTF8): Full description of functions  was added

											
										
										
											2019-10-12 21:12:09 +00:00
+								```
 								Result:
-												Normalization for en markdown (#9763)


											
										
										
											2020-03-20 10:10:48 +00:00
+								``` text
-												elenbaskakova-DOCSUP-178
docs(multiSearchAllPositions, multiSearchAllPositionsUTF8): Full description of functions  was added

											
										
										
											2019-10-12 21:12:09 +00:00
+								┌─multiSearchAllPositions('Hello, World!', ['hello', '!', 'world'])─┐
 								│ [0,13,0]                                                          │
 								└───────────────────────────────────────────────────────────────────┘
 								```
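
For comparison, a minimal sketch of the same query with the case-insensitive variant. With case ignored, `hello` and `world` should both be found, so the zeros in the result above are expected to become byte positions:

``` sql
-- Same haystack and needles as above, but ASCII case is ignored.
-- 'hello' starts at byte 1, '!' at byte 13, 'world' at byte 8.
SELECT multiSearchAllPositionsCaseInsensitive('Hello, World!', ['hello', '!', 'world']);
-- expected: [1,13,8]
```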

## multiSearchAllPositionsUTF8

See `multiSearchAllPositions`.
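
A small sketch of how the two variants differ on non-ASCII text. The haystack is an arbitrary illustration; each Cyrillic letter in it takes two bytes in UTF-8, so the byte position and the code point position of the needle differ:

``` sql
-- 'Привет' is 6 code points but 12 bytes; ',' and ' ' take one byte each.
SELECT
    multiSearchAllPositions('Привет, мир', ['мир'])     AS byte_positions,       -- expected: [15]
    multiSearchAllPositionsUTF8('Привет, мир', ['мир']) AS code_point_positions; -- expected: [9]
```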

## multiSearchFirstPosition(haystack, \[needle<sub>1</sub>, needle<sub>2</sub>, …, needle<sub>n</sub>\])

The same as `position`, but returns the leftmost offset in the string `haystack` at which any of the needles is found.

For a case-insensitive search and/or a search in UTF-8, use the functions `multiSearchFirstPositionCaseInsensitive`, `multiSearchFirstPositionUTF8`, `multiSearchFirstPositionCaseInsensitiveUTF8`.
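
As an illustration (the haystack and needles are arbitrary), the query below passes the needles in an order that differs from their order in the haystack. The result is the smallest starting position among the needles that were found:

``` sql
-- 'World' starts at byte 8 and 'llo' at byte 3, so the leftmost offset is 3.
SELECT multiSearchFirstPosition('Hello, World!', ['World', 'llo']);
-- expected: 3
```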

## multiSearchFirstIndex(haystack, \[needle<sub>1</sub>, needle<sub>2</sub>, …, needle<sub>n</sub>\])

Returns the index `i` (starting from 1) of the leftmost found needle<sub>i</sub> in the string `haystack`, and 0 otherwise.

For a case-insensitive search and/or a search in UTF-8, use the functions `multiSearchFirstIndexCaseInsensitive`, `multiSearchFirstIndexUTF8`, `multiSearchFirstIndexCaseInsensitiveUTF8`.
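
A minimal sketch with arbitrary inputs. Only one needle occurs in the haystack in the first call, and none in the second, so the results do not depend on how ties between several found needles are resolved:

``` sql
SELECT
    multiSearchFirstIndex('Hello, World!', ['xyz', 'World']) AS found_index, -- expected: 2
    multiSearchFirstIndex('Hello, World!', ['xyz', 'abc'])   AS not_found;   -- expected: 0
```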

## multiSearchAny(haystack, \[needle<sub>1</sub>, needle<sub>2</sub>, …, needle<sub>n</sub>\])

Returns 1 if at least one string needle<sub>i</sub> is found in the string `haystack`, and 0 otherwise.

For a case-insensitive search and/or a search in UTF-8, use the functions `multiSearchAnyCaseInsensitive`, `multiSearchAnyUTF8`, `multiSearchAnyCaseInsensitiveUTF8`.

:::note
In all `multiSearch*` functions, the number of needles must be less than 2<sup>8</sup> because of an implementation restriction.
:::
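
A short sketch of `multiSearchAny` as a plain substring test over several needles; the inputs are arbitrary:

``` sql
SELECT
    multiSearchAny('Hello, World!', ['xyz', 'World']) AS any_found, -- expected: 1
    multiSearchAny('Hello, World!', ['xyz', 'abc'])   AS none_found; -- expected: 0
```

A typical use would be in a `WHERE` clause to keep rows whose text column contains any keyword from a list.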

## match(haystack, pattern)

Checks whether the string matches the `pattern` regular expression, which is interpreted as a `re2` regular expression. The [syntax](https://github.com/google/re2/wiki/Syntax) of `re2` regular expressions is more limited than the syntax of Perl regular expressions.

Returns 0 if it does not match, or 1 if it matches.

The regular expression works with the string as if it is a set of bytes. The regular expression can’t contain null bytes.

For patterns to search for substrings in a string, it is better to use `LIKE` or `position`, since they work much faster.
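
A minimal sketch with arbitrary inputs. The pattern does not have to cover the whole string unless it is anchored with `^` and `$`. The last column shows the `position` alternative for a plain substring test:

``` sql
SELECT
    match('Hello, World!', '^Hello')        AS anchored_prefix,  -- expected: 1
    match('Hello, World!', 'W.rld')         AS wildcard,         -- expected: 1
    match('Hello, World!', '^World')        AS wrong_anchor,     -- expected: 0
    position('Hello, World!', 'World') > 0  AS plain_substring;  -- expected: 1
```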

## multiMatchAny(haystack, \[pattern<sub>1</sub>, pattern<sub>2</sub>, …, pattern<sub>n</sub>\])

The same as `match`, but returns 0 if none of the regular expressions match and 1 if any of the patterns matches. It uses the [hyperscan](https://github.com/intel/hyperscan) library. For patterns to search for substrings in a string, it is better to use `multiSearchAny` since it works much faster.

:::note
The length of any `haystack` string must be less than 2<sup>32</sup> bytes, otherwise an exception is thrown. This restriction comes from the hyperscan API.
:::
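
A small sketch, assuming a build with hyperscan support; the haystack and patterns are arbitrary:

``` sql
SELECT
    multiMatchAny('Hello, World!', ['^xyz', 'W.rld']) AS some_match, -- expected: 1
    multiMatchAny('Hello, World!', ['^xyz', '^abc'])  AS no_match;   -- expected: 0
```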

## multiMatchAnyIndex(haystack, \[pattern<sub>1</sub>, pattern<sub>2</sub>, …, pattern<sub>n</sub>\])

The same as `multiMatchAny`, but returns the index of any pattern that matches the haystack.

## multiMatchAllIndices(haystack, \[pattern<sub>1</sub>, pattern<sub>2</sub>, …, pattern<sub>n</sub>\])

The same as `multiMatchAny`, but returns an array of the indices of all patterns that match the haystack, in any order.
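
A sketch of both index-returning variants, assuming a build with hyperscan support. Since the order of the returned indices is not specified, the second call is wrapped in `arraySort` to make the result stable:

``` sql
SELECT
    -- Only the second pattern matches, so the returned index is unambiguous.
    multiMatchAnyIndex('Hello, World!', ['^xyz', 'W.rld'])                      AS any_index,   -- expected: 2
    -- Patterns 1 and 3 match; sorting removes the dependence on return order.
    arraySort(multiMatchAllIndices('Hello, World!', ['^Hello', '^xyz', 'W.rld'])) AS all_indices; -- expected: [1,3]
```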

## multiFuzzyMatchAny(haystack, distance, \[pattern<sub>1</sub>, pattern<sub>2</sub>, …, pattern<sub>n</sub>\])

The same as `multiMatchAny`, but returns 1 if any pattern matches the haystack within a constant [edit distance](https://en.wikipedia.org/wiki/Edit_distance). This function relies on the experimental feature of the [hyperscan](https://intel.github.io/hyperscan/dev-reference/compilation.html#approximate-matching) library, and can be slow for some corner cases. The performance depends on the edit distance value and the patterns used, but it is always more expensive than the non-fuzzy variants.

## multiFuzzyMatchAnyIndex(haystack, distance, \[pattern<sub>1</sub>, pattern<sub>2</sub>, …, pattern<sub>n</sub>\])

The same as `multiFuzzyMatchAny`, but returns the index of any pattern that matches the haystack within a constant edit distance.

## multiFuzzyMatchAllIndices(haystack, distance, \[pattern<sub>1</sub>, pattern<sub>2</sub>, …, pattern<sub>n</sub>\])

The same as `multiFuzzyMatchAny`, but returns an array of the indices, in any order, of all patterns that match the haystack within a constant edit distance.

:::note
`multiFuzzyMatch*` functions do not support UTF-8 regular expressions; such expressions are treated as sequences of bytes because of a hyperscan restriction.
:::

:::note
To turn off all functions that use hyperscan, use the setting `SET allow_hyperscan = 0;`.
:::
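
A rough sketch of a fuzzy call, assuming a build with hyperscan support. The misspelled pattern `Wxrld` differs from the substring `World` by one substituted character, so it is expected to match with `distance = 1`. Exact results depend on hyperscan's approximate matching, so no output is shown:

``` sql
-- The second argument is the maximum edit distance allowed for a match.
SELECT
    multiFuzzyMatchAny('Hello, World!', 1, ['Wxrld'])               AS any_match,
    multiFuzzyMatchAnyIndex('Hello, World!', 1, ['^xyz', 'Wxrld'])  AS any_index,
    multiFuzzyMatchAllIndices('Hello, World!', 1, ['Hxllo', 'Wxrld']) AS all_indices;
```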

## extract(haystack, pattern)

Extracts a fragment of a string using a regular expression. If `haystack` does not match the `pattern` regex, an empty string is returned. If the regex does not contain subpatterns, it takes the fragment that matches the entire regex. Otherwise, it takes the fragment that matches the first subpattern.
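
A minimal sketch of the three cases described above, with arbitrary inputs. As in the other examples on this page, backslashes in the pattern are doubled inside the string literal:

``` sql
SELECT
    extract('key=value', '\\w+=\\w+') AS whole_match,  -- no subpattern; expected: 'key=value'
    extract('key=value', '=(\\w+)')   AS first_group,  -- one subpattern; expected: 'value'
    extract('key=value', '\\d+')      AS no_match;     -- expected: '' (empty string)
```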

## extractAll(haystack, pattern)

Extracts all the fragments of a string using a regular expression. Returns an array of strings consisting of all matches to the regex. If `haystack` does not match the `pattern` regex, an empty array is returned. In general, the behavior is the same as the `extract` function (it takes the first subpattern, or the entire expression if there isn’t a subpattern).
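
A short sketch with an arbitrary input string, showing the result without and with a subpattern:

``` sql
SELECT
    extractAll('a=1, b=22, c=333', '\\d+')       AS whole_matches, -- expected: ['1','22','333']
    extractAll('a=1, b=22, c=333', '(\\w)=\\d+') AS first_groups;  -- expected: ['a','b','c']
```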

## extractAllGroupsHorizontal

Matches all groups of the `haystack` string using the `pattern` regular expression. Returns an array of arrays, where the first array includes all fragments matching the first group, the second array includes all fragments matching the second group, and so on.

:::note
`extractAllGroupsHorizontal` function is slower than [extractAllGroupsVertical](#extractallgroups-vertical).
:::

**Syntax**

``` sql
extractAllGroupsHorizontal(haystack, pattern)
```

**Arguments**

-   `haystack` — Input string. Type: [String](../../sql-reference/data-types/string.md).
-   `pattern` — Regular expression with [re2 syntax](https://github.com/google/re2/wiki/Syntax). Must contain groups, each group enclosed in parentheses. If `pattern` contains no groups, an exception is thrown. Type: [String](../../sql-reference/data-types/string.md).

**Returned value**

-   Type: [Array](../../sql-reference/data-types/array.md).

If `haystack` does not match the `pattern` regex, an array of empty arrays is returned.

**Example**

Query:

``` sql
SELECT extractAllGroupsHorizontal('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)');
```

Result:

``` text
┌─extractAllGroupsHorizontal('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')─┐
│ [['abc','def','ghi'],['111','222','333']]                                                │
└──────────────────────────────────────────────────────────────────────────────────────────┘
```

**See Also**

-   [extractAllGroupsVertical](#extractallgroups-vertical)
-												Remove H1 anchor tags from docs

											
										
										
											2022-06-02 10:55:18 +00:00
+								## extractAllGroupsVertical
-												DOCSUP-1674: Docs for the extractAllGroupsHorizontal and extractAllGroupsVertical functions (English) (#14317)

* Docs for the extractAllGroupsHorizontal and extractAllGroupsVertical functions (English).

* Minor fixes (en).

* Misspelling fixed.

* English docs corrected and translated into Russian.

* English misspelling corrected.

Co-authored-by: Olga Revyakina <revolg@yandex.ru>
Co-authored-by: Olga Revyakina <revolg@yandex-team.ru>
											
										
										
											2020-10-06 11:17:19 +00:00
 								Matches all groups of the `haystack` string using the `pattern` regular expression. Returns an array of arrays, where each array includes matching fragments from every group. Fragments are grouped in order of appearance in the `haystack`.
-												Remove trailing whitespaces from docs

											
										
										
											2021-07-29 15:20:55 +00:00
+								**Syntax**
-												DOCSUP-1674: Docs for the extractAllGroupsHorizontal and extractAllGroupsVertical functions (English) (#14317)

* Docs for the extractAllGroupsHorizontal and extractAllGroupsVertical functions (English).

* Minor fixes (en).

* Misspelling fixed.

* English docs corrected and translated into Russian.

* English misspelling corrected.

Co-authored-by: Olga Revyakina <revolg@yandex.ru>
Co-authored-by: Olga Revyakina <revolg@yandex-team.ru>
											
										
										
											2020-10-06 11:17:19 +00:00
 								``` sql
 								extractAllGroupsVertical(haystack, pattern)
 								```
-												Remove trailing whitespaces from docs

											
										
										
											2021-07-29 15:20:55 +00:00
+								**Arguments**
-												DOCSUP-1674: Docs for the extractAllGroupsHorizontal and extractAllGroupsVertical functions (English) (#14317)

* Docs for the extractAllGroupsHorizontal and extractAllGroupsVertical functions (English).

* Minor fixes (en).

* Misspelling fixed.

* English docs corrected and translated into Russian.

* English misspelling corrected.

Co-authored-by: Olga Revyakina <revolg@yandex.ru>
Co-authored-by: Olga Revyakina <revolg@yandex-team.ru>
											
										
										
											2020-10-06 11:17:19 +00:00
 								-   `haystack` — Input string. Type: [String](../../sql-reference/data-types/string.md).
 								-   `pattern` — Regular expression with [re2 syntax](https://github.com/google/re2/wiki/Syntax). Must contain groups, each group enclosed in parentheses. If `pattern` contains no groups, an exception is thrown. Type: [String](../../sql-reference/data-types/string.md).
 								**Returned value**
 								-   Type: [Array](../../sql-reference/data-types/array.md).
-												Remove trailing whitespaces from docs

											
										
										
											2021-07-29 15:20:55 +00:00
+								If `haystack` does not match the `pattern` regex, an empty array is returned.
-												DOCSUP-1674: Docs for the extractAllGroupsHorizontal and extractAllGroupsVertical functions (English) (#14317)

* Docs for the extractAllGroupsHorizontal and extractAllGroupsVertical functions (English).

* Minor fixes (en).

* Misspelling fixed.

* English docs corrected and translated into Russian.

* English misspelling corrected.

Co-authored-by: Olga Revyakina <revolg@yandex.ru>
Co-authored-by: Olga Revyakina <revolg@yandex-team.ru>
											
										
										
											2020-10-06 11:17:19 +00:00
**Example**

Query:

``` sql
SELECT extractAllGroupsVertical('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)');
```

Result:

``` text
┌─extractAllGroupsVertical('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')─┐
│ [['abc','111'],['def','222'],['ghi','333']]                                            │
└────────────────────────────────────────────────────────────────────────────────────────┘
```

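The no-match case described above can be checked with a query along these lines. This is a minimal sketch: the haystack and the simplified two-group pattern are illustrative values, not taken from the original example. Because the haystack contains no `key=value` pair, the call is expected to return an empty array.

``` sql
-- Illustrative input: the haystack has no `key=value` pair,
-- so the two-group pattern never matches.
SELECT extractAllGroupsVertical('no pairs here', '(\\w+)=(\\w+)');
```
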
**See Also**

-   [extractAllGroupsHorizontal](#extractallgroups-horizontal)

## like(haystack, pattern), haystack LIKE pattern operator

Checks whether a string matches a simple regular expression.
The regular expression can contain the metasymbols `%` and `_`.

`%` indicates any quantity of any bytes (including zero bytes).

`_` indicates any one byte.

Use the backslash (`\`) for escaping metasymbols. See the note on escaping in the description of the `match` function.

For regular expressions like `%needle%`, the implementation is optimized and works as fast as the `position` function.
For other regular expressions, the code is the same as for the `match` function.

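The metasymbol rules above can be tried with a query like the following. It is a sketch with illustrative literals. The escaped pattern is written as `'100\\%'` on the assumption that the SQL string literal needs a doubled backslash to pass a single backslash to the pattern. Under the rules described above, each column is expected to be 1.

``` sql
SELECT
    like('ClickHouse', 'Click%')   AS prefix_match,     -- `%` matches the remaining bytes
    'ClickHouse' LIKE '_lickHouse' AS single_byte,      -- `_` matches exactly one byte
    'ClickHouse' LIKE '%House%'    AS substring_match,  -- the `%needle%` form
    '100%' LIKE '100\\%'           AS escaped_percent;  -- escaped `%` matches a literal percent sign
```
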
## notLike(haystack, pattern), haystack NOT LIKE pattern operator

Same as `like`, but with the result negated.

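As a short illustration (the literals are made up for this sketch), the function form and the operator form can be used interchangeably. The first column is expected to be 0 because the pattern matches, and the second is expected to be 1 because it does not.

``` sql
SELECT
    notLike('ClickHouse', 'Click%') AS negated_match,     -- like(...) would return 1 here
    'ClickHouse' NOT LIKE '%SQL%'   AS negated_no_match;  -- like(...) would return 0 here
```
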
## ilike

Case-insensitive variant of the [like](https://clickhouse.com/docs/en/sql-reference/functions/string-search-functions/#function-like) function. You can use the `ILIKE` operator instead of the `ilike` function.

**Syntax**

``` sql
ilike(haystack, pattern)
```

**Arguments**

-   `haystack` — Input string. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `pattern` — If `pattern` does not contain percent signs or underscores, then the `pattern` only represents the string itself. An underscore (`_`) in `pattern` stands for (matches) any single character. A percent sign (`%`) matches any sequence of zero or more characters.

Some `pattern` examples:

``` text
'abc' ILIKE 'abc'    true
'abc' ILIKE 'a%'     true
'abc' ILIKE '_b_'    true
'abc' ILIKE 'c'      false
```

**Returned values**

-   True, if the string matches `pattern`.
-   False, if the string does not match `pattern`.

**Example**

Input table:

``` text
┌─id─┬─name─────┬─days─┐
│  1 │ January  │   31 │
│  2 │ February │   29 │
│  3 │ March    │   31 │
│  4 │ April    │   30 │
└────┴──────────┴──────┘
```

Query:

``` sql
SELECT * FROM Months WHERE ilike(name, '%j%');
```

Result:

``` text
┌─id─┬─name────┬─days─┐
│  1 │ January │   31 │
└────┴─────────┴──────┘
```
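
The same filter can also be written with the `ILIKE` operator. The sketch below assumes the `Months` table from the example above and is expected to return the same `January` row.

``` sql
-- Operator form of ilike(name, '%j%')
SELECT * FROM Months WHERE name ILIKE '%j%';
```
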
**See Also**

-   [like](https://clickhouse.com/docs/en/sql-reference/functions/string-search-functions/#function-like) <!--hide-->

## ngramDistance(haystack, needle)

Calculates the 4-gram distance between `haystack` and `needle`: counts the symmetric difference between two multisets of 4-grams and normalizes it by the sum of their cardinalities. Returns a float number from 0 to 1: the closer to zero, the more similar the strings are to each other. If the constant `needle` or `haystack` is more than 32Kb, an exception is thrown. If any of the non-constant `haystack` or `needle` strings is more than 32Kb, the distance is always one.

For case-insensitive search and/or UTF-8 input, use the functions `ngramDistanceCaseInsensitive`, `ngramDistanceUTF8`, and `ngramDistanceCaseInsensitiveUTF8`.

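A quick way to get a feel for the distance is to compare a string with itself, with a near-duplicate, and with an unrelated string. The literals below are illustrative only. Exact values are not listed here because they depend on the hashing details of the implementation. By the definition above, the identical pair should give 0, and the other values should grow as the strings share fewer 4-grams.

``` sql
SELECT
    ngramDistance('ClickHouse', 'ClickHouse') AS identical,  -- no differing 4-grams
    ngramDistance('ClickHouse', 'ClickHome')  AS similar,    -- some shared 4-grams
    ngramDistance('ClickHouse', 'PostgreSQL') AS different;  -- few or no shared 4-grams
```
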
## ngramSearch(haystack, needle)

Same as `ngramDistance`, but calculates the non-symmetric difference between `needle` and `haystack`: the number of n-grams from `needle` minus the common number of n-grams, normalized by the number of `needle` n-grams. The closer to one, the more likely `needle` is in the `haystack`. Can be useful for fuzzy string search.

For case-insensitive search and/or UTF-8 input, use the functions `ngramSearchCaseInsensitive`, `ngramSearchUTF8`, and `ngramSearchCaseInsensitiveUTF8`.

:::note
For the UTF-8 case we use the 3-gram distance. None of these are perfectly fair n-gram distances. We use 2-byte hashes to hash n-grams and then calculate the (non-)symmetric difference between these hash tables, so collisions may occur. With the UTF-8 case-insensitive format we do not use a fair `tolower` function: we zero the 5-th bit (starting from zero) of each codepoint byte and the first bit of the zeroth byte if there is more than one byte. This works for Latin and for almost all Cyrillic letters.
:::

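A fuzzy-search use of `ngramSearch` might look like the sketch below. The haystack and needles are illustrative, and no exact scores are given because they depend on the hashing described in the note above. The first score is expected to be noticeably closer to one than the second.

``` sql
SELECT
    ngramSearch('ClickHouse is a column-oriented DBMS', 'column-oriented') AS likely_contained,
    ngramSearch('ClickHouse is a column-oriented DBMS', 'row-based')       AS unlikely_contained;
```
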
## countSubstrings

Returns the number of substring occurrences.

For a case-insensitive search, use the [countSubstringsCaseInsensitive](../../sql-reference/functions/string-search-functions.md#countSubstringsCaseInsensitive) or [countSubstringsCaseInsensitiveUTF8](../../sql-reference/functions/string-search-functions.md#countSubstringsCaseInsensitiveUTF8) function.

**Syntax**

``` sql
countSubstrings(haystack, needle[, start_pos])
```

**Arguments**

-   `haystack` — The string to search in. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `needle` — The substring to search for. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `start_pos` — Position of the first character in the string to start search. Optional. [UInt](../../sql-reference/data-types/int-uint.md).

**Returned values**

-   Number of occurrences.

Type: [UInt64](../../sql-reference/data-types/int-uint.md).

**Examples**

Query:

``` sql
SELECT countSubstrings('foobar.com', '.');
```

Result:

``` text
┌─countSubstrings('foobar.com', '.')─┐
│                                  1 │
└────────────────────────────────────┘
```

Query:

``` sql
SELECT countSubstrings('aaaa', 'aa');
```

Result:

``` text
┌─countSubstrings('aaaa', 'aa')─┐
│                             2 │
└───────────────────────────────┘
```

Query:

``` sql
SELECT countSubstrings('abc___abc', 'abc', 4);
```

Result:

``` text
┌─countSubstrings('abc___abc', 'abc', 4)─┐
│                                      1 │
└────────────────────────────────────────┘
```

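To see the effect of case sensitivity, the two variants can be run side by side on the same input. The literals are illustrative. Given that only the lowercase `foo` matches exactly, the first column is expected to be 1 and the second 3.

``` sql
SELECT
    countSubstrings('Foo foo FOO', 'foo')                AS case_sensitive,
    countSubstringsCaseInsensitive('Foo foo FOO', 'foo') AS case_insensitive;
```
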
## countSubstringsCaseInsensitive

Returns the number of substring occurrences, ignoring case.

**Syntax**

``` sql
countSubstringsCaseInsensitive(haystack, needle[, start_pos])
```

**Arguments**

-   `haystack` — The string to search in. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `needle` — The substring to search for. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `start_pos` — Position of the first character in the string to start search. Optional. [UInt](../../sql-reference/data-types/int-uint.md).

**Returned values**

-   Number of occurrences.

Type: [UInt64](../../sql-reference/data-types/int-uint.md).

**Examples**

Query:

``` sql
SELECT countSubstringsCaseInsensitive('aba', 'B');
```

Result:

``` text
┌─countSubstringsCaseInsensitive('aba', 'B')─┐
│                                          1 │
└────────────────────────────────────────────┘
```

Query:

``` sql
SELECT countSubstringsCaseInsensitive('foobar.com', 'CoM');
```

Result:

``` text
┌─countSubstringsCaseInsensitive('foobar.com', 'CoM')─┐
│                                                   1 │
└─────────────────────────────────────────────────────┘
```

Query:

``` sql
SELECT countSubstringsCaseInsensitive('abC___abC', 'aBc', 2);
```

Result:

``` text
┌─countSubstringsCaseInsensitive('abC___abC', 'aBc', 2)─┐
│                                                     1 │
└───────────────────────────────────────────────────────┘
```

## countSubstringsCaseInsensitiveUTF8

Returns the number of substring occurrences in a `UTF-8` string, ignoring case.

**Syntax**

``` sql
countSubstringsCaseInsensitiveUTF8(haystack, needle[, start_pos])
```

**Arguments**

-   `haystack` — The string to search in. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `needle` — The substring to search for. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `start_pos` — Position of the first character in the string to start search. Optional. [UInt](../../sql-reference/data-types/int-uint.md).

**Returned values**

-   Number of occurrences.

Type: [UInt64](../../sql-reference/data-types/int-uint.md).

**Examples**

Query:

``` sql
SELECT countSubstringsCaseInsensitiveUTF8('абв', 'A');
```

Result:

``` text
┌─countSubstringsCaseInsensitiveUTF8('абв', 'A')─┐
│                                              1 │
└────────────────────────────────────────────────┘
```

Query:

``` sql
SELECT countSubstringsCaseInsensitiveUTF8('аБв__АбВ__абв', 'Абв');
```

Result:

``` text
┌─countSubstringsCaseInsensitiveUTF8('аБв__АбВ__абв', 'Абв')─┐
│                                                          3 │
└────────────────────────────────────────────────────────────┘
```

## countMatches(haystack, pattern)

Returns the number of regular expression matches for a `pattern` in a `haystack`.

**Syntax**

``` sql
countMatches(haystack, pattern)
```

**Arguments**

-   `haystack` — The string to search in. [String](../../sql-reference/syntax.md#syntax-string-literal).
-   `pattern` — The regular expression with [re2 syntax](https://github.com/google/re2/wiki/Syntax). [String](../../sql-reference/data-types/string.md).

**Returned value**

-   The number of matches.

Type: [UInt64](../../sql-reference/data-types/int-uint.md).

**Examples**

Query:

``` sql
SELECT countMatches('foobar.com', 'o+');
```

Result:

``` text
┌─countMatches('foobar.com', 'o+')─┐
│                                2 │
└──────────────────────────────────┘
```

Query:

``` sql
SELECT countMatches('aaaa', 'aa');
```

Result:

``` text
┌─countMatches('aaaa', 'aa')─┐
│                          2 │
└────────────────────────────┘
```
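
As a further sketch, a character-class pattern can count runs of digits. The timestamp-like literal is illustrative, and the pattern is written as `'\\d+'` on the assumption that the SQL string literal needs a doubled backslash. The input contains six separate digit runs, so the expected count is 6.

``` sql
-- Digit runs in the input: 2021, 07, 29, 15, 20, 55
SELECT countMatches('2021-07-29 15:20:55', '\\d+') AS digit_runs;
```
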