Mirror of https://github.com/ClickHouse/ClickHouse.git (synced 2024-11-10 01:25:21 +00:00)

full path to links

parent b6b9be7a3e
commit 8d002e6d43
@@ -12,7 +12,7 @@ Simhash is a hash function, which returns close hash values for close (similar)

 ## halfMD5

-[Interprets](../../sql-reference/functions/type-conversion-functions.md#type_conversion_functions-reinterpretAsString) all the input parameters as strings and calculates the [MD5](https://en.wikipedia.org/wiki/MD5) hash value for each of them. Then combines hashes, takes the first 8 bytes of the hash of the resulting string, and interprets them as `UInt64` in big-endian byte order.
+[Interprets](/docs/en/sql-reference/functions/type-conversion-functions.md/#type_conversion_functions-reinterpretAsString) all the input parameters as strings and calculates the [MD5](https://en.wikipedia.org/wiki/MD5) hash value for each of them. Then combines hashes, takes the first 8 bytes of the hash of the resulting string, and interprets them as `UInt64` in big-endian byte order.

 ```sql
 halfMD5(par1, ...)
@@ -23,11 +23,11 @@ Consider using the [sipHash64](#hash_functions-siphash64) function instead.

 **Arguments**

-The function takes a variable number of input parameters. Arguments can be any of the [supported data types](../../sql-reference/data-types/index.md). For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed `Tuple` with the same data, `Map` and the corresponding `Array(Tuple(key, value))` type with the same data).
+The function takes a variable number of input parameters. Arguments can be any of the [supported data types](/docs/en/sql-reference/data-types/index.md). For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed `Tuple` with the same data, `Map` and the corresponding `Array(Tuple(key, value))` type with the same data).

 **Returned Value**

-A [UInt64](../../sql-reference/data-types/int-uint.md) data type hash value.
+A [UInt64](/docs/en/sql-reference/data-types/int-uint.md) data type hash value.

 **Example**

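As a minimal sketch of the `halfMD5` call described in this hunk (assuming a running ClickHouse server; the argument values are illustrative and are not part of this commit), the query below mixes several argument types and checks the result type with `toTypeName`:

```sql
-- Illustrative arguments only; any supported data types may be mixed.
SELECT
    halfMD5(array('e', 'x', 'a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS halfMD5hash,
    toTypeName(halfMD5hash) AS type;  -- documented return type is UInt64
```

The hash value itself is not shown here because it depends on the exact arguments.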
@@ -61,7 +61,7 @@ sipHash64(par1,...)

 This is a cryptographic hash function. It works at least three times faster than the [MD5](#hash_functions-md5) function.

-Function [interprets](../../sql-reference/functions/type-conversion-functions.md#type_conversion_functions-reinterpretAsString) all the input parameters as strings and calculates the hash value for each of them. Then combines hashes by the following algorithm:
+Function [interprets](/docs/en/sql-reference/functions/type-conversion-functions.md/#type_conversion_functions-reinterpretAsString) all the input parameters as strings and calculates the hash value for each of them. Then combines hashes by the following algorithm:

 1. After hashing all the input parameters, the function gets the array of hashes.
 2. Function takes the first and the second elements and calculates a hash for the array of them.
@@ -70,11 +70,11 @@ Function [interprets](../../sql-reference/functions/type-conversion-functions.md

 **Arguments**

-The function takes a variable number of input parameters. Arguments can be any of the [supported data types](../../sql-reference/data-types/index.md). For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed `Tuple` with the same data, `Map` and the corresponding `Array(Tuple(key, value))` type with the same data).
+The function takes a variable number of input parameters. Arguments can be any of the [supported data types](/docs/en/sql-reference/data-types/index.md). For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed `Tuple` with the same data, `Map` and the corresponding `Array(Tuple(key, value))` type with the same data).

 **Returned Value**

-A [UInt64](../../sql-reference/data-types/int-uint.md) data type hash value.
+A [UInt64](/docs/en/sql-reference/data-types/int-uint.md) data type hash value.

 **Example**

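The Arguments note says that hashes may coincide when only the argument types differ. A hedged way to probe this on a given server is sketched below; the literal values and column aliases are hypothetical, and no particular result is asserted here:

```sql
-- Each comparison returns 1 if the two hashes coincide on this server, 0 otherwise.
SELECT
    sipHash64(toUInt8(1)) = sipHash64(toUInt64(1)) AS same_for_int_sizes,
    sipHash64(map('k', 1)) = sipHash64([('k', 1)]) AS same_for_map_and_array_of_tuples;
```

If type-sensitive hashing is needed, one option is to hash an explicit type tag together with the value, for example `sipHash64('UInt8', x)`.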
@@ -100,13 +100,13 @@ sipHash128(par1,...)

 **Arguments**

-The function takes a variable number of input parameters. Arguments can be any of the [supported data types](../../sql-reference/data-types/index.md). For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed `Tuple` with the same data, `Map` and the corresponding `Array(Tuple(key, value))` type with the same data).
+The function takes a variable number of input parameters. Arguments can be any of the [supported data types](/docs/en/sql-reference/data-types/index.md). For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed `Tuple` with the same data, `Map` and the corresponding `Array(Tuple(key, value))` type with the same data).

 **Returned value**

 A 128-bit `SipHash` hash value.

-Type: [FixedString(16)](../../sql-reference/data-types/fixedstring.md).
+Type: [FixedString(16)](/docs/en/sql-reference/data-types/fixedstring.md).

 **Example**

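Because `sipHash128` returns raw bytes in a `FixedString(16)`, the result is usually wrapped in `hex` for display. A small sketch with illustrative arguments (not taken from this commit):

```sql
-- hex() turns the 16 raw bytes into 32 hexadecimal characters for readability.
SELECT
    hex(sipHash128('foo', 'bar')) AS hash_hex,
    toTypeName(sipHash128('foo', 'bar')) AS type,      -- documented as FixedString(16)
    length(sipHash128('foo', 'bar')) AS size_in_bytes;
```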
@@ -136,11 +136,11 @@ This is a fast non-cryptographic hash function. It uses the CityHash algorithm f

 **Arguments**

-The function takes a variable number of input parameters. Arguments can be any of the [supported data types](../../sql-reference/data-types/index.md). For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed `Tuple` with the same data, `Map` and the corresponding `Array(Tuple(key, value))` type with the same data).
+The function takes a variable number of input parameters. Arguments can be any of the [supported data types](/docs/en/sql-reference/data-types/index.md). For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed `Tuple` with the same data, `Map` and the corresponding `Array(Tuple(key, value))` type with the same data).

 **Returned Value**

-A [UInt64](../../sql-reference/data-types/int-uint.md) data type hash value.
+A [UInt64](/docs/en/sql-reference/data-types/int-uint.md) data type hash value.

 **Examples**

@@ -174,7 +174,7 @@ It works faster than intHash32. Average quality.

 ## SHA1, SHA224, SHA256, SHA512

-Calculates SHA-1, SHA-224, SHA-256, SHA-512 hash from a string and returns the resulting set of bytes as [FixedString](../data-types/fixedstring.md).
+Calculates SHA-1, SHA-224, SHA-256, SHA-512 hash from a string and returns the resulting set of bytes as [FixedString](/docs/en/sql-reference/data-types/fixedstring.md).

 **Syntax**

@@ -190,17 +190,17 @@ Even in these cases, we recommend applying the function offline and pre-calculat

 **Arguments**

-- `s` — Input string for SHA hash calculation. [String](../data-types/string.md).
+- `s` — Input string for SHA hash calculation. [String](/docs/en/sql-reference/data-types/string.md).

 **Returned value**

 - SHA hash as a hex-unencoded FixedString. SHA-1 returns as FixedString(20), SHA-224 as FixedString(28), SHA-256 — FixedString(32), SHA-512 — FixedString(64).

-Type: [FixedString](../data-types/fixedstring.md).
+Type: [FixedString](/docs/en/sql-reference/data-types/fixedstring.md).

 **Example**

-Use the [hex](../functions/encoding-functions.md#hex) function to represent the result as a hex-encoded string.
+Use the [hex](/docs/en/sql-reference/functions/encoding-functions.md/#hex) function to represent the result as a hex-encoded string.

 Query:

@@ -218,7 +218,7 @@ Result:

 ## BLAKE3

-Calculates BLAKE3 hash string and returns the resulting set of bytes as [FixedString](../data-types/fixedstring.md).
+Calculates BLAKE3 hash string and returns the resulting set of bytes as [FixedString](/docs/en/sql-reference/data-types/fixedstring.md).

 **Syntax**

@@ -230,17 +230,17 @@ This cryptographic hash-function is integrated into ClickHouse with BLAKE3 Rust

 **Arguments**

-- s - input string for BLAKE3 hash calculation. [String](../data-types/string.md).
+- s - input string for BLAKE3 hash calculation. [String](/docs/en/sql-reference/data-types/string.md).

 **Return value**

 - BLAKE3 hash as a byte array with type FixedString(32).

-Type: [FixedString](../data-types/fixedstring.md).
+Type: [FixedString](/docs/en/sql-reference/data-types/fixedstring.md).

 **Example**

-Use function [hex](../functions/encoding-functions.md#hex) to represent the result as a hex-encoded string.
+Use function [hex](/docs/en/sql-reference/functions/encoding-functions.md/#hex) to represent the result as a hex-encoded string.

 Query:
 ```sql
@@ -276,11 +276,11 @@ These functions use the `Fingerprint64` and `Hash64` methods respectively from a

 **Arguments**

-The function takes a variable number of input parameters. Arguments can be any of the [supported data types](../../sql-reference/data-types/index.md). For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed `Tuple` with the same data, `Map` and the corresponding `Array(Tuple(key, value))` type with the same data)..
+The function takes a variable number of input parameters. Arguments can be any of the [supported data types](/docs/en/sql-reference/data-types/index.md). For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed `Tuple` with the same data, `Map` and the corresponding `Array(Tuple(key, value))` type with the same data).

 **Returned Value**

-A [UInt64](../../sql-reference/data-types/int-uint.md) data type hash value.
+A [UInt64](/docs/en/sql-reference/data-types/int-uint.md) data type hash value.

 **Example**

@@ -423,11 +423,11 @@ metroHash64(par1, ...)

 **Arguments**

-The function takes a variable number of input parameters. Arguments can be any of the [supported data types](../../sql-reference/data-types/index.md). For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed `Tuple` with the same data, `Map` and the corresponding `Array(Tuple(key, value))` type with the same data).
+The function takes a variable number of input parameters. Arguments can be any of the [supported data types](/docs/en/sql-reference/data-types/index.md). For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed `Tuple` with the same data, `Map` and the corresponding `Array(Tuple(key, value))` type with the same data).

 **Returned Value**

-A [UInt64](../../sql-reference/data-types/int-uint.md) data type hash value.
+A [UInt64](/docs/en/sql-reference/data-types/int-uint.md) data type hash value.

 **Example**

@@ -458,12 +458,12 @@ murmurHash2_64(par1, ...)

 **Arguments**

-Both functions take a variable number of input parameters. Arguments can be any of the [supported data types](../../sql-reference/data-types/index.md). For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed `Tuple` with the same data, `Map` and the corresponding `Array(Tuple(key, value))` type with the same data).
+Both functions take a variable number of input parameters. Arguments can be any of the [supported data types](/docs/en/sql-reference/data-types/index.md). For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed `Tuple` with the same data, `Map` and the corresponding `Array(Tuple(key, value))` type with the same data).

 **Returned Value**

-- The `murmurHash2_32` function returns hash value having the [UInt32](../../sql-reference/data-types/int-uint.md) data type.
-- The `murmurHash2_64` function returns hash value having the [UInt64](../../sql-reference/data-types/int-uint.md) data type.
+- The `murmurHash2_32` function returns hash value having the [UInt32](/docs/en/sql-reference/data-types/int-uint.md) data type.
+- The `murmurHash2_64` function returns hash value having the [UInt64](/docs/en/sql-reference/data-types/int-uint.md) data type.

 **Example**

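The two return types listed in this hunk can be confirmed with `toTypeName`. This is only a sketch; the input string is an arbitrary placeholder:

```sql
-- The 32-bit and 64-bit variants differ in the width of the returned integer.
SELECT
    toTypeName(murmurHash2_32('example')) AS type_32,  -- documented as UInt32
    toTypeName(murmurHash2_64('example')) AS type_64;  -- documented as UInt64
```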
@@ -489,13 +489,13 @@ gccMurmurHash(par1, ...)

 **Arguments**

-- `par1, ...` — A variable number of parameters that can be any of the [supported data types](../../sql-reference/data-types/index.md#data_types).
+- `par1, ...` — A variable number of parameters that can be any of the [supported data types](/docs/en/sql-reference/data-types/index.md/#data_types).

 **Returned value**

 - Calculated hash value.

-Type: [UInt64](../../sql-reference/data-types/int-uint.md).
+Type: [UInt64](/docs/en/sql-reference/data-types/int-uint.md).

 **Example**

@@ -526,12 +526,12 @@ murmurHash3_64(par1, ...)

 **Arguments**

-Both functions take a variable number of input parameters. Arguments can be any of the [supported data types](../../sql-reference/data-types/index.md). For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed `Tuple` with the same data, `Map` and the corresponding `Array(Tuple(key, value))` type with the same data).
+Both functions take a variable number of input parameters. Arguments can be any of the [supported data types](/docs/en/sql-reference/data-types/index.md). For some data types calculated value of hash function may be the same for the same values even if types of arguments differ (integers of different size, named and unnamed `Tuple` with the same data, `Map` and the corresponding `Array(Tuple(key, value))` type with the same data).

 **Returned Value**

-- The `murmurHash3_32` function returns a [UInt32](../../sql-reference/data-types/int-uint.md) data type hash value.
-- The `murmurHash3_64` function returns a [UInt64](../../sql-reference/data-types/int-uint.md) data type hash value.
+- The `murmurHash3_32` function returns a [UInt32](/docs/en/sql-reference/data-types/int-uint.md) data type hash value.
+- The `murmurHash3_64` function returns a [UInt64](/docs/en/sql-reference/data-types/int-uint.md) data type hash value.

 **Example**

@@ -557,13 +557,13 @@ murmurHash3_128(expr)

 **Arguments**

-- `expr` — A list of [expressions](../../sql-reference/syntax.md#syntax-expressions). [String](../../sql-reference/data-types/string.md).
+- `expr` — A list of [expressions](/docs/en/sql-reference/syntax.md/#syntax-expressions). [String](/docs/en/sql-reference/data-types/string.md).

 **Returned value**

 A 128-bit `MurmurHash3` hash value.

-Type: [FixedString(16)](../../sql-reference/data-types/fixedstring.md).
+Type: [FixedString(16)](/docs/en/sql-reference/data-types/fixedstring.md).

 **Example**

@@ -593,13 +593,13 @@ xxh3(expr)

 **Arguments**

-- `expr` — A list of [expressions](../../sql-reference/syntax.md#syntax-expressions) of any data type.
+- `expr` — A list of [expressions](/docs/en/sql-reference/syntax.md/#syntax-expressions) of any data type.

 **Returned value**

 A 64-bit `xxh3` hash value.

-Type: [UInt64](../../sql-reference/data-types/int-uint.md).
+Type: [UInt64](/docs/en/sql-reference/data-types/int-uint.md).

 **Example**

@@ -658,7 +658,7 @@ Result:

 Splits a ASCII string into n-grams of `ngramsize` symbols and returns the n-gram `simhash`. Is case sensitive.

-Can be used for detection of semi-duplicate strings with [bitHammingDistance](../../sql-reference/functions/bit-functions.md#bithammingdistance). The smaller is the [Hamming Distance](https://en.wikipedia.org/wiki/Hamming_distance) of the calculated `simhashes` of two strings, the more likely these strings are the same.
+Can be used for detection of semi-duplicate strings with [bitHammingDistance](/docs/en/sql-reference/functions/bit-functions.md/#bithammingdistance). The smaller is the [Hamming Distance](https://en.wikipedia.org/wiki/Hamming_distance) of the calculated `simhashes` of two strings, the more likely these strings are the same.

 **Syntax**

@@ -668,14 +668,14 @@ ngramSimHash(string[, ngramsize])

 **Arguments**

-- `string` — String. [String](../../sql-reference/data-types/string.md).
-- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
+- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
+- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).

 **Returned value**

 - Hash value.

-Type: [UInt64](../../sql-reference/data-types/int-uint.md).
+Type: [UInt64](/docs/en/sql-reference/data-types/int-uint.md).

 **Example**

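The near-duplicate check mentioned for `ngramSimHash` can be sketched by feeding two simhashes to `bitHammingDistance`. The sample sentences and aliases below are hypothetical and chosen only to contrast a near-duplicate pair with an unrelated pair; no specific distances are claimed:

```sql
-- A smaller distance suggests the two strings are more likely near-duplicates.
SELECT
    bitHammingDistance(
        ngramSimHash('ClickHouse is a column-oriented database'),
        ngramSimHash('ClickHouse is a column oriented database')
    ) AS near_duplicate_distance,
    bitHammingDistance(
        ngramSimHash('ClickHouse is a column-oriented database'),
        ngramSimHash('An entirely unrelated sentence about cooking')
    ) AS unrelated_distance;
```

A practical threshold for "similar enough" depends on the data and on `ngramsize`, so it is best chosen empirically.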
@@ -697,7 +697,7 @@ Result:

 Splits a ASCII string into n-grams of `ngramsize` symbols and returns the n-gram `simhash`. Is case insensitive.

-Can be used for detection of semi-duplicate strings with [bitHammingDistance](../../sql-reference/functions/bit-functions.md#bithammingdistance). The smaller is the [Hamming Distance](https://en.wikipedia.org/wiki/Hamming_distance) of the calculated `simhashes` of two strings, the more likely these strings are the same.
+Can be used for detection of semi-duplicate strings with [bitHammingDistance](/docs/en/sql-reference/functions/bit-functions.md/#bithammingdistance). The smaller is the [Hamming Distance](https://en.wikipedia.org/wiki/Hamming_distance) of the calculated `simhashes` of two strings, the more likely these strings are the same.

 **Syntax**

@@ -707,14 +707,14 @@ ngramSimHashCaseInsensitive(string[, ngramsize])

 **Arguments**

-- `string` — String. [String](../../sql-reference/data-types/string.md).
-- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
+- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
+- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).

 **Returned value**

 - Hash value.

-Type: [UInt64](../../sql-reference/data-types/int-uint.md).
+Type: [UInt64](/docs/en/sql-reference/data-types/int-uint.md).

 **Example**

@@ -736,7 +736,7 @@ Result:

 Splits a UTF-8 string into n-grams of `ngramsize` symbols and returns the n-gram `simhash`. Is case sensitive.

-Can be used for detection of semi-duplicate strings with [bitHammingDistance](../../sql-reference/functions/bit-functions.md#bithammingdistance). The smaller is the [Hamming Distance](https://en.wikipedia.org/wiki/Hamming_distance) of the calculated `simhashes` of two strings, the more likely these strings are the same.
+Can be used for detection of semi-duplicate strings with [bitHammingDistance](/docs/en/sql-reference/functions/bit-functions.md/#bithammingdistance). The smaller is the [Hamming Distance](https://en.wikipedia.org/wiki/Hamming_distance) of the calculated `simhashes` of two strings, the more likely these strings are the same.

 **Syntax**

@@ -746,14 +746,14 @@ ngramSimHashUTF8(string[, ngramsize])

 **Arguments**

-- `string` — String. [String](../../sql-reference/data-types/string.md).
-- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
+- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
+- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).

 **Returned value**

 - Hash value.

-Type: [UInt64](../../sql-reference/data-types/int-uint.md).
+Type: [UInt64](/docs/en/sql-reference/data-types/int-uint.md).

 **Example**

@@ -775,7 +775,7 @@ Result:

 Splits a UTF-8 string into n-grams of `ngramsize` symbols and returns the n-gram `simhash`. Is case insensitive.

-Can be used for detection of semi-duplicate strings with [bitHammingDistance](../../sql-reference/functions/bit-functions.md#bithammingdistance). The smaller is the [Hamming Distance](https://en.wikipedia.org/wiki/Hamming_distance) of the calculated `simhashes` of two strings, the more likely these strings are the same.
+Can be used for detection of semi-duplicate strings with [bitHammingDistance](/docs/en/sql-reference/functions/bit-functions.md/#bithammingdistance). The smaller is the [Hamming Distance](https://en.wikipedia.org/wiki/Hamming_distance) of the calculated `simhashes` of two strings, the more likely these strings are the same.

 **Syntax**

@@ -785,14 +785,14 @@ ngramSimHashCaseInsensitiveUTF8(string[, ngramsize])

 **Arguments**

-- `string` — String. [String](../../sql-reference/data-types/string.md).
-- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
+- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
+- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).

 **Returned value**

 - Hash value.

-Type: [UInt64](../../sql-reference/data-types/int-uint.md).
+Type: [UInt64](/docs/en/sql-reference/data-types/int-uint.md).

 **Example**

@@ -814,7 +814,7 @@ Result:

 Splits a ASCII string into parts (shingles) of `shinglesize` words and returns the word shingle `simhash`. Is case sensitive.

-Can be used for detection of semi-duplicate strings with [bitHammingDistance](../../sql-reference/functions/bit-functions.md#bithammingdistance). The smaller is the [Hamming Distance](https://en.wikipedia.org/wiki/Hamming_distance) of the calculated `simhashes` of two strings, the more likely these strings are the same.
+Can be used for detection of semi-duplicate strings with [bitHammingDistance](/docs/en/sql-reference/functions/bit-functions.md/#bithammingdistance). The smaller is the [Hamming Distance](https://en.wikipedia.org/wiki/Hamming_distance) of the calculated `simhashes` of two strings, the more likely these strings are the same.

 **Syntax**

@@ -824,14 +824,14 @@ wordShingleSimHash(string[, shinglesize])

 **Arguments**

-- `string` — String. [String](../../sql-reference/data-types/string.md).
-- `shinglesize` — The size of a word shingle. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
+- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
+- `shinglesize` — The size of a word shingle. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).

 **Returned value**

 - Hash value.

-Type: [UInt64](../../sql-reference/data-types/int-uint.md).
+Type: [UInt64](/docs/en/sql-reference/data-types/int-uint.md).

 **Example**

@@ -853,7 +853,7 @@ Result:

 Splits a ASCII string into parts (shingles) of `shinglesize` words and returns the word shingle `simhash`. Is case insensitive.

-Can be used for detection of semi-duplicate strings with [bitHammingDistance](../../sql-reference/functions/bit-functions.md#bithammingdistance). The smaller is the [Hamming Distance](https://en.wikipedia.org/wiki/Hamming_distance) of the calculated `simhashes` of two strings, the more likely these strings are the same.
+Can be used for detection of semi-duplicate strings with [bitHammingDistance](/docs/en/sql-reference/functions/bit-functions.md/#bithammingdistance). The smaller is the [Hamming Distance](https://en.wikipedia.org/wiki/Hamming_distance) of the calculated `simhashes` of two strings, the more likely these strings are the same.

 **Syntax**

@@ -863,14 +863,14 @@ wordShingleSimHashCaseInsensitive(string[, shinglesize])

 **Arguments**

-- `string` — String. [String](../../sql-reference/data-types/string.md).
-- `shinglesize` — The size of a word shingle. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
+- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
+- `shinglesize` — The size of a word shingle. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).

 **Returned value**

 - Hash value.

-Type: [UInt64](../../sql-reference/data-types/int-uint.md).
+Type: [UInt64](/docs/en/sql-reference/data-types/int-uint.md).

 **Example**

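To see how the case-insensitive variant differs from `wordShingleSimHash`, one can hash two strings that differ only in letter case. This is a sketch with made-up input text and an explicit `shinglesize` of 2; the aliases are hypothetical and no result is asserted:

```sql
-- Distance between simhashes of two strings that differ only in letter case,
-- computed with the case-sensitive and the case-insensitive function.
SELECT
    bitHammingDistance(
        wordShingleSimHash('ClickHouse is a fast analytical database', 2),
        wordShingleSimHash('CLICKHOUSE IS A FAST ANALYTICAL DATABASE', 2)
    ) AS case_sensitive_distance,
    bitHammingDistance(
        wordShingleSimHashCaseInsensitive('ClickHouse is a fast analytical database', 2),
        wordShingleSimHashCaseInsensitive('CLICKHOUSE IS A FAST ANALYTICAL DATABASE', 2)
    ) AS case_insensitive_distance;
```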
@@ -892,7 +892,7 @@ Result:

 Splits a UTF-8 string into parts (shingles) of `shinglesize` words and returns the word shingle `simhash`. Is case sensitive.

-Can be used for detection of semi-duplicate strings with [bitHammingDistance](../../sql-reference/functions/bit-functions.md#bithammingdistance). The smaller is the [Hamming Distance](https://en.wikipedia.org/wiki/Hamming_distance) of the calculated `simhashes` of two strings, the more likely these strings are the same.
+Can be used for detection of semi-duplicate strings with [bitHammingDistance](/docs/en/sql-reference/functions/bit-functions.md/#bithammingdistance). The smaller is the [Hamming Distance](https://en.wikipedia.org/wiki/Hamming_distance) of the calculated `simhashes` of two strings, the more likely these strings are the same.

 **Syntax**

@@ -902,14 +902,14 @@ wordShingleSimHashUTF8(string[, shinglesize])

 **Arguments**

-- `string` — String. [String](../../sql-reference/data-types/string.md).
-- `shinglesize` — The size of a word shingle. Optinal. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
+- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
+- `shinglesize` — The size of a word shingle. Optinal. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).

 **Returned value**

 - Hash value.

-Type: [UInt64](../../sql-reference/data-types/int-uint.md).
+Type: [UInt64](/docs/en/sql-reference/data-types/int-uint.md).

 **Example**

@@ -931,7 +931,7 @@ Result:

 Splits a UTF-8 string into parts (shingles) of `shinglesize` words and returns the word shingle `simhash`. Is case insensitive.

-Can be used for detection of semi-duplicate strings with [bitHammingDistance](../../sql-reference/functions/bit-functions.md#bithammingdistance). The smaller is the [Hamming Distance](https://en.wikipedia.org/wiki/Hamming_distance) of the calculated `simhashes` of two strings, the more likely these strings are the same.
+Can be used for detection of semi-duplicate strings with [bitHammingDistance](/docs/en/sql-reference/functions/bit-functions.md/#bithammingdistance). The smaller is the [Hamming Distance](https://en.wikipedia.org/wiki/Hamming_distance) of the calculated `simhashes` of two strings, the more likely these strings are the same.

 **Syntax**

@@ -941,14 +941,14 @@ wordShingleSimHashCaseInsensitiveUTF8(string[, shinglesize])

 **Arguments**

-- `string` — String. [String](../../sql-reference/data-types/string.md).
-- `shinglesize` — The size of a word shingle. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
+- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
+- `shinglesize` — The size of a word shingle. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).

 **Returned value**

 - Hash value.

-Type: [UInt64](../../sql-reference/data-types/int-uint.md).
+Type: [UInt64](/docs/en/sql-reference/data-types/int-uint.md).

 **Example**

@@ -970,7 +970,7 @@ Result:

 Splits a ASCII string into n-grams of `ngramsize` symbols and calculates hash values for each n-gram. Uses `hashnum` minimum hashes to calculate the minimum hash and `hashnum` maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.

-Can be used for detection of semi-duplicate strings with [tupleHammingDistance](../../sql-reference/functions/tuple-functions.md#tuplehammingdistance). For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.
+Can be used for detection of semi-duplicate strings with [tupleHammingDistance](/docs/en/sql-reference/functions/tuple-functions.md/#tuplehammingdistance). For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.

 **Syntax**

@ -980,15 +980,15 @@ ngramMinHash(string[, ngramsize, hashnum])
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `string` — String. [String](../../sql-reference/data-types/string.md).
|
||||
- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
|
||||
- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Tuple with two hashes — the minimum and the maximum.
|
||||
|
||||
Type: [Tuple](../../sql-reference/data-types/tuple.md)([UInt64](../../sql-reference/data-types/int-uint.md), [UInt64](../../sql-reference/data-types/int-uint.md)).
|
||||
Type: [Tuple](/docs/en/sql-reference/data-types/tuple.md)([UInt64](/docs/en/sql-reference/data-types/int-uint.md), [UInt64](/docs/en/sql-reference/data-types/int-uint.md)).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -1010,7 +1010,7 @@ Result:
|
||||
|
||||
Splits a ASCII string into n-grams of `ngramsize` symbols and calculates hash values for each n-gram. Uses `hashnum` minimum hashes to calculate the minimum hash and `hashnum` maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.
|
||||
|
||||
Can be used for detection of semi-duplicate strings with [tupleHammingDistance](../../sql-reference/functions/tuple-functions.md#tuplehammingdistance). For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.
|
||||
Can be used for detection of semi-duplicate strings with [tupleHammingDistance](/docs/en/sql-reference/functions/tuple-functions.md/#tuplehammingdistance). For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.
|
||||
|
||||
**Syntax**
|
||||
|
||||
@ -1020,15 +1020,15 @@ ngramMinHashCaseInsensitive(string[, ngramsize, hashnum])
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `string` — String. [String](../../sql-reference/data-types/string.md).
|
||||
- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
|
||||
- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Tuple with two hashes — the minimum and the maximum.
|
||||
|
||||
Type: [Tuple](../../sql-reference/data-types/tuple.md)([UInt64](../../sql-reference/data-types/int-uint.md), [UInt64](../../sql-reference/data-types/int-uint.md)).
|
||||
Type: [Tuple](/docs/en/sql-reference/data-types/tuple.md)([UInt64](/docs/en/sql-reference/data-types/int-uint.md), [UInt64](/docs/en/sql-reference/data-types/int-uint.md)).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -1050,7 +1050,7 @@ Result:
|
||||
|
||||
Splits a UTF-8 string into n-grams of `ngramsize` symbols and calculates hash values for each n-gram. Uses `hashnum` minimum hashes to calculate the minimum hash and `hashnum` maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.
|
||||
|
||||
Can be used for detection of semi-duplicate strings with [tupleHammingDistance](../../sql-reference/functions/tuple-functions.md#tuplehammingdistance). For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.
|
||||
Can be used for detection of semi-duplicate strings with [tupleHammingDistance](/docs/en/sql-reference/functions/tuple-functions.md/#tuplehammingdistance). For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.
|
||||
|
||||
**Syntax**
|
||||
|
||||
@ -1060,15 +1060,15 @@ ngramMinHashUTF8(string[, ngramsize, hashnum])
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `string` — String. [String](../../sql-reference/data-types/string.md).
|
||||
- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
|
||||
- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Tuple with two hashes — the minimum and the maximum.
|
||||
|
||||
Type: [Tuple](../../sql-reference/data-types/tuple.md)([UInt64](../../sql-reference/data-types/int-uint.md), [UInt64](../../sql-reference/data-types/int-uint.md)).
|
||||
Type: [Tuple](/docs/en/sql-reference/data-types/tuple.md)([UInt64](/docs/en/sql-reference/data-types/int-uint.md), [UInt64](/docs/en/sql-reference/data-types/int-uint.md)).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -1090,7 +1090,7 @@ Result:
|
||||
|
||||
Splits a UTF-8 string into n-grams of `ngramsize` symbols and calculates hash values for each n-gram. Uses `hashnum` minimum hashes to calculate the minimum hash and `hashnum` maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.
|
||||
|
||||
Can be used for detection of semi-duplicate strings with [tupleHammingDistance](../../sql-reference/functions/tuple-functions.md#tuplehammingdistance). For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.
|
||||
Can be used for detection of semi-duplicate strings with [tupleHammingDistance](/docs/en/sql-reference/functions/tuple-functions.md/#tuplehammingdistance). For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.
|
||||
|
||||
**Syntax**
|
||||
|
||||
@ -1100,15 +1100,15 @@ ngramMinHashCaseInsensitiveUTF8(string [, ngramsize, hashnum])
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `string` — String. [String](../../sql-reference/data-types/string.md).
|
||||
- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
|
||||
- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Tuple with two hashes — the minimum and the maximum.
|
||||
|
||||
Type: [Tuple](../../sql-reference/data-types/tuple.md)([UInt64](../../sql-reference/data-types/int-uint.md), [UInt64](../../sql-reference/data-types/int-uint.md)).
|
||||
Type: [Tuple](/docs/en/sql-reference/data-types/tuple.md)([UInt64](/docs/en/sql-reference/data-types/int-uint.md), [UInt64](/docs/en/sql-reference/data-types/int-uint.md)).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -1138,15 +1138,15 @@ ngramMinHashArg(string[, ngramsize, hashnum])
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `string` — String. [String](../../sql-reference/data-types/string.md).
|
||||
- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
|
||||
- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Tuple with two tuples with `hashnum` n-grams each.
|
||||
|
||||
Type: [Tuple](../../sql-reference/data-types/tuple.md)([Tuple](../../sql-reference/data-types/tuple.md)([String](../../sql-reference/data-types/string.md)), [Tuple](../../sql-reference/data-types/tuple.md)([String](../../sql-reference/data-types/string.md))).
|
||||
Type: [Tuple](/docs/en/sql-reference/data-types/tuple.md)([Tuple](/docs/en/sql-reference/data-types/tuple.md)([String](/docs/en/sql-reference/data-types/string.md)), [Tuple](/docs/en/sql-reference/data-types/tuple.md)([String](/docs/en/sql-reference/data-types/string.md))).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -1176,15 +1176,15 @@ ngramMinHashArgCaseInsensitive(string[, ngramsize, hashnum])
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `string` — String. [String](../../sql-reference/data-types/string.md).
|
||||
- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
|
||||
- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Tuple with two tuples with `hashnum` n-grams each.
|
||||
|
||||
Type: [Tuple](../../sql-reference/data-types/tuple.md)([Tuple](../../sql-reference/data-types/tuple.md)([String](../../sql-reference/data-types/string.md)), [Tuple](../../sql-reference/data-types/tuple.md)([String](../../sql-reference/data-types/string.md))).
|
||||
Type: [Tuple](/docs/en/sql-reference/data-types/tuple.md)([Tuple](/docs/en/sql-reference/data-types/tuple.md)([String](/docs/en/sql-reference/data-types/string.md)), [Tuple](/docs/en/sql-reference/data-types/tuple.md)([String](/docs/en/sql-reference/data-types/string.md))).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -1214,15 +1214,15 @@ ngramMinHashArgUTF8(string[, ngramsize, hashnum])
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `string` — String. [String](../../sql-reference/data-types/string.md).
|
||||
- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
|
||||
- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Tuple with two tuples with `hashnum` n-grams each.
|
||||
|
||||
Type: [Tuple](../../sql-reference/data-types/tuple.md)([Tuple](../../sql-reference/data-types/tuple.md)([String](../../sql-reference/data-types/string.md)), [Tuple](../../sql-reference/data-types/tuple.md)([String](../../sql-reference/data-types/string.md))).
|
||||
Type: [Tuple](/docs/en/sql-reference/data-types/tuple.md)([Tuple](/docs/en/sql-reference/data-types/tuple.md)([String](/docs/en/sql-reference/data-types/string.md)), [Tuple](/docs/en/sql-reference/data-types/tuple.md)([String](/docs/en/sql-reference/data-types/string.md))).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -1252,15 +1252,15 @@ ngramMinHashArgCaseInsensitiveUTF8(string[, ngramsize, hashnum])
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `string` — String. [String](../../sql-reference/data-types/string.md).
|
||||
- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
|
||||
- `ngramsize` — The size of an n-gram. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Tuple with two tuples with `hashnum` n-grams each.
|
||||
|
||||
Type: [Tuple](../../sql-reference/data-types/tuple.md)([Tuple](../../sql-reference/data-types/tuple.md)([String](../../sql-reference/data-types/string.md)), [Tuple](../../sql-reference/data-types/tuple.md)([String](../../sql-reference/data-types/string.md))).
|
||||
Type: [Tuple](/docs/en/sql-reference/data-types/tuple.md)([Tuple](/docs/en/sql-reference/data-types/tuple.md)([String](/docs/en/sql-reference/data-types/string.md)), [Tuple](/docs/en/sql-reference/data-types/tuple.md)([String](/docs/en/sql-reference/data-types/string.md))).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -1282,7 +1282,7 @@ Result:
|
||||
|
||||
Splits a ASCII string into parts (shingles) of `shinglesize` words and calculates hash values for each word shingle. Uses `hashnum` minimum hashes to calculate the minimum hash and `hashnum` maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.
|
||||
|
||||
Can be used for detection of semi-duplicate strings with [tupleHammingDistance](../../sql-reference/functions/tuple-functions.md#tuplehammingdistance). For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.
|
||||
Can be used for detection of semi-duplicate strings with [tupleHammingDistance](/docs/en/sql-reference/functions/tuple-functions.md/#tuplehammingdistance). For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.
|
||||
|
||||
**Syntax**
|
||||
|
||||
@ -1292,15 +1292,15 @@ wordShingleMinHash(string[, shinglesize, hashnum])
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `string` — String. [String](../../sql-reference/data-types/string.md).
|
||||
- `shinglesize` — The size of a word shingle. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
|
||||
- `shinglesize` — The size of a word shingle. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Tuple with two hashes — the minimum and the maximum.
|
||||
|
||||
Type: [Tuple](../../sql-reference/data-types/tuple.md)([UInt64](../../sql-reference/data-types/int-uint.md), [UInt64](../../sql-reference/data-types/int-uint.md)).
|
||||
Type: [Tuple](/docs/en/sql-reference/data-types/tuple.md)([UInt64](/docs/en/sql-reference/data-types/int-uint.md), [UInt64](/docs/en/sql-reference/data-types/int-uint.md)).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -1322,7 +1322,7 @@ Result:
|
||||
|
||||
Splits a ASCII string into parts (shingles) of `shinglesize` words and calculates hash values for each word shingle. Uses `hashnum` minimum hashes to calculate the minimum hash and `hashnum` maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.
|
||||
|
||||
Can be used for detection of semi-duplicate strings with [tupleHammingDistance](../../sql-reference/functions/tuple-functions.md#tuplehammingdistance). For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.
|
||||
Can be used for detection of semi-duplicate strings with [tupleHammingDistance](/docs/en/sql-reference/functions/tuple-functions.md/#tuplehammingdistance). For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.
|
||||
|
||||
**Syntax**
|
||||
|
||||
@ -1332,15 +1332,15 @@ wordShingleMinHashCaseInsensitive(string[, shinglesize, hashnum])
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `string` — String. [String](../../sql-reference/data-types/string.md).
|
||||
- `shinglesize` — The size of a word shingle. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
|
||||
- `shinglesize` — The size of a word shingle. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Tuple with two hashes — the minimum and the maximum.
|
||||
|
||||
Type: [Tuple](../../sql-reference/data-types/tuple.md)([UInt64](../../sql-reference/data-types/int-uint.md), [UInt64](../../sql-reference/data-types/int-uint.md)).
|
||||
Type: [Tuple](/docs/en/sql-reference/data-types/tuple.md)([UInt64](/docs/en/sql-reference/data-types/int-uint.md), [UInt64](/docs/en/sql-reference/data-types/int-uint.md)).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -1362,7 +1362,7 @@ Result:
|
||||
|
||||
Splits a UTF-8 string into parts (shingles) of `shinglesize` words and calculates hash values for each word shingle. Uses `hashnum` minimum hashes to calculate the minimum hash and `hashnum` maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.
|
||||
|
||||
Can be used for detection of semi-duplicate strings with [tupleHammingDistance](../../sql-reference/functions/tuple-functions.md#tuplehammingdistance). For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.
|
||||
Can be used for detection of semi-duplicate strings with [tupleHammingDistance](/docs/en/sql-reference/functions/tuple-functions.md/#tuplehammingdistance). For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.
|
||||
|
||||
**Syntax**
|
||||
|
||||
@ -1372,15 +1372,15 @@ wordShingleMinHashUTF8(string[, shinglesize, hashnum])
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `string` — String. [String](../../sql-reference/data-types/string.md).
|
||||
- `shinglesize` — The size of a word shingle. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
|
||||
- `shinglesize` — The size of a word shingle. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Tuple with two hashes — the minimum and the maximum.
|
||||
|
||||
Type: [Tuple](../../sql-reference/data-types/tuple.md)([UInt64](../../sql-reference/data-types/int-uint.md), [UInt64](../../sql-reference/data-types/int-uint.md)).
|
||||
Type: [Tuple](/docs/en/sql-reference/data-types/tuple.md)([UInt64](/docs/en/sql-reference/data-types/int-uint.md), [UInt64](/docs/en/sql-reference/data-types/int-uint.md)).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -1402,7 +1402,7 @@ Result:
|
||||
|
||||
Splits a UTF-8 string into parts (shingles) of `shinglesize` words and calculates hash values for each word shingle. Uses `hashnum` minimum hashes to calculate the minimum hash and `hashnum` maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.
|
||||
|
||||
Can be used for detection of semi-duplicate strings with [tupleHammingDistance](../../sql-reference/functions/tuple-functions.md#tuplehammingdistance). For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.
|
||||
Can be used for detection of semi-duplicate strings with [tupleHammingDistance](/docs/en/sql-reference/functions/tuple-functions.md/#tuplehammingdistance). For two strings: if one of the returned hashes is the same for both strings, we think that those strings are the same.
|
||||
|
||||
**Syntax**
|
||||
|
||||
@ -1412,15 +1412,15 @@ wordShingleMinHashCaseInsensitiveUTF8(string[, shinglesize, hashnum])
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `string` — String. [String](../../sql-reference/data-types/string.md).
|
||||
- `shinglesize` — The size of a word shingle. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
|
||||
- `shinglesize` — The size of a word shingle. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Tuple with two hashes — the minimum and the maximum.
|
||||
|
||||
Type: [Tuple](../../sql-reference/data-types/tuple.md)([UInt64](../../sql-reference/data-types/int-uint.md), [UInt64](../../sql-reference/data-types/int-uint.md)).
|
||||
Type: [Tuple](/docs/en/sql-reference/data-types/tuple.md)([UInt64](/docs/en/sql-reference/data-types/int-uint.md), [UInt64](/docs/en/sql-reference/data-types/int-uint.md)).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -1450,15 +1450,15 @@ wordShingleMinHashArg(string[, shinglesize, hashnum])
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `string` — String. [String](../../sql-reference/data-types/string.md).
|
||||
- `shinglesize` — The size of a word shingle. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
|
||||
- `shinglesize` — The size of a word shingle. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Tuple with two tuples with `hashnum` word shingles each.
|
||||
|
||||
Type: [Tuple](../../sql-reference/data-types/tuple.md)([Tuple](../../sql-reference/data-types/tuple.md)([String](../../sql-reference/data-types/string.md)), [Tuple](../../sql-reference/data-types/tuple.md)([String](../../sql-reference/data-types/string.md))).
|
||||
Type: [Tuple](/docs/en/sql-reference/data-types/tuple.md)([Tuple](/docs/en/sql-reference/data-types/tuple.md)([String](/docs/en/sql-reference/data-types/string.md)), [Tuple](/docs/en/sql-reference/data-types/tuple.md)([String](/docs/en/sql-reference/data-types/string.md))).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -1488,15 +1488,15 @@ wordShingleMinHashArgCaseInsensitive(string[, shinglesize, hashnum])
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `string` — String. [String](../../sql-reference/data-types/string.md).
|
||||
- `shinglesize` — The size of a word shingle. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
|
||||
- `shinglesize` — The size of a word shingle. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Tuple with two tuples with `hashnum` word shingles each.
|
||||
|
||||
Type: [Tuple](../../sql-reference/data-types/tuple.md)([Tuple](../../sql-reference/data-types/tuple.md)([String](../../sql-reference/data-types/string.md)), [Tuple](../../sql-reference/data-types/tuple.md)([String](../../sql-reference/data-types/string.md))).
|
||||
Type: [Tuple](/docs/en/sql-reference/data-types/tuple.md)([Tuple](/docs/en/sql-reference/data-types/tuple.md)([String](/docs/en/sql-reference/data-types/string.md)), [Tuple](/docs/en/sql-reference/data-types/tuple.md)([String](/docs/en/sql-reference/data-types/string.md))).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -1526,15 +1526,15 @@ wordShingleMinHashArgUTF8(string[, shinglesize, hashnum])
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `string` — String. [String](../../sql-reference/data-types/string.md).
|
||||
- `shinglesize` — The size of a word shingle. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
|
||||
- `shinglesize` — The size of a word shingle. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Tuple with two tuples with `hashnum` word shingles each.
|
||||
|
||||
Type: [Tuple](../../sql-reference/data-types/tuple.md)([Tuple](../../sql-reference/data-types/tuple.md)([String](../../sql-reference/data-types/string.md)), [Tuple](../../sql-reference/data-types/tuple.md)([String](../../sql-reference/data-types/string.md))).
|
||||
Type: [Tuple](/docs/en/sql-reference/data-types/tuple.md)([Tuple](/docs/en/sql-reference/data-types/tuple.md)([String](/docs/en/sql-reference/data-types/string.md)), [Tuple](/docs/en/sql-reference/data-types/tuple.md)([String](/docs/en/sql-reference/data-types/string.md))).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -1564,15 +1564,15 @@ wordShingleMinHashArgCaseInsensitiveUTF8(string[, shinglesize, hashnum])
|
||||
|
||||
**Arguments**
|
||||
|
||||
- `string` — String. [String](../../sql-reference/data-types/string.md).
|
||||
- `shinglesize` — The size of a word shingle. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](../../sql-reference/data-types/int-uint.md).
|
||||
- `string` — String. [String](/docs/en/sql-reference/data-types/string.md).
|
||||
- `shinglesize` — The size of a word shingle. Optional. Possible values: any number from `1` to `25`. Default value: `3`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
- `hashnum` — The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from `1` to `25`. Default value: `6`. [UInt8](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
|
||||
**Returned value**
|
||||
|
||||
- Tuple with two tuples with `hashnum` word shingles each.
|
||||
|
||||
Type: [Tuple](../../sql-reference/data-types/tuple.md)([Tuple](../../sql-reference/data-types/tuple.md)([String](../../sql-reference/data-types/string.md)), [Tuple](../../sql-reference/data-types/tuple.md)([String](../../sql-reference/data-types/string.md))).
|
||||
Type: [Tuple](/docs/en/sql-reference/data-types/tuple.md)([Tuple](/docs/en/sql-reference/data-types/tuple.md)([String](/docs/en/sql-reference/data-types/string.md)), [Tuple](/docs/en/sql-reference/data-types/tuple.md)([String](/docs/en/sql-reference/data-types/string.md))).
|
||||
|
||||
**Example**
|
||||
|
||||