---
toc_priority: 50
toc_title: Hash
---
# Hash functions {#hash-functions}

Hash functions can be used for the deterministic pseudo-random shuffling of elements.
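
As a minimal illustration of deterministic pseudo-random shuffling, the Python sketch below orders a list by the hash of each element. The helper names `hash_key` and `deterministic_shuffle` are hypothetical, and the first 8 bytes of MD5 are used only as a stand-in hash; any stable hash function would do.

```python
import hashlib

def hash_key(value: str) -> int:
    # Stand-in 64-bit hash (first 8 bytes of MD5); an assumption for illustration.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

def deterministic_shuffle(items):
    # Sorting by hash yields an order that looks random but is identical on every run.
    return sorted(items, key=hash_key)

users = ["alice", "bob", "carol", "dave"]
shuffled = deterministic_shuffle(users)
print(shuffled)
```

The result depends only on the values themselves, not on their original order, so repeated runs always produce the same sequence.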
## halfMD5 {#hash-functions-halfmd5}
[Interprets](../../sql_reference/functions/type_conversion_functions.md#type_conversion_functions-reinterpretAsString) all the input parameters as strings and calculates the [MD5](https://en.wikipedia.org/wiki/MD5) hash value for each of them. It then combines the hashes, takes the first 8 bytes of the hash of the resulting string, and interprets them as `UInt64` in big-endian byte order.

``` sql
halfMD5(par1, ...)
```

The function is relatively slow (5 million short strings per second per processor core).
Consider using the [sipHash64](#hash_functions-siphash64) function instead.

**Parameters**

The function takes a variable number of input parameters. Parameters can be any of the [supported data types](../../sql_reference/data_types/index.md).

**Returned Value**

A [UInt64](../../sql_reference/data_types/int_uint.md) data type hash value.
**Example**

``` sql
SELECT halfMD5(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS halfMD5hash, toTypeName(halfMD5hash) AS type
```

``` text
┌────────halfMD5hash─┬─type───┐
│ 186182704141653334 │ UInt64 │
└────────────────────┴────────┘
```
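
The final step of `halfMD5` (take the first 8 bytes of an MD5 digest and read them as a big-endian unsigned integer) can be sketched in Python. The function name `half_md5_single` is hypothetical, and the sketch covers a single byte string only: how multiple arguments are serialized and combined is not reproduced, so it is not expected to match the multi-argument example above.

```python
import hashlib

def half_md5_single(data: bytes) -> int:
    # First 8 bytes of the MD5 digest, interpreted as an unsigned big-endian integer.
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")

value = half_md5_single(b"")
print(value, hex(value))
```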
## MD5 {#hash_functions-md5}
Calculates the MD5 hash of a string and returns the resulting set of bytes as FixedString(16).
If you don't need MD5 in particular, but you need a decent cryptographic 128-bit hash, use the 'sipHash128' function instead.
If you want to get the same result as output by the md5sum utility, use `lower(hex(MD5(s)))`.
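
For comparison, the md5sum utility prints the 16 digest bytes as 32 lowercase hexadecimal characters. The Python sketch below shows that representation using the standard `hashlib` module; the helper name `md5sum_hex` is hypothetical.

```python
import hashlib

def md5sum_hex(data: bytes) -> str:
    # 16 raw digest bytes rendered as 32 lowercase hex characters, as md5sum prints them.
    return hashlib.md5(data).hexdigest()

print(md5sum_hex(b"abc"))  # 900150983cd24fb0d6963f7d28e17f72
```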
## sipHash64 {#hash_functions-siphash64}
Produces a 64-bit [SipHash](https://131002.net/siphash/) hash value.

``` sql
sipHash64(par1, ...)
```

This is a cryptographic hash function. It works at least three times faster than the [MD5](#hash_functions-md5) function.

The function [interprets](../../sql_reference/functions/type_conversion_functions.md#type_conversion_functions-reinterpretAsString) all the input parameters as strings and calculates the hash value for each of them. It then combines the hashes using the following algorithm:

1.  After hashing all the input parameters, the function gets the array of hashes.
2.  The function takes the first and the second elements and calculates a hash for the array of them.
3.  The function then takes the hash value calculated at the previous step and the third element of the initial hash array, and calculates a hash for the array of them.
4.  The previous step is repeated for all the remaining elements of the initial hash array.
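
The combining steps above amount to a left fold over the array of hashes. The Python sketch below shows only that shape. It rests on two assumptions: `h` is a stand-in 64-bit hash (truncated BLAKE2b, not SipHash), and "a hash for the array" of two elements is modelled as hashing their byte concatenation. The names `h` and `combine` are hypothetical.

```python
import hashlib
from functools import reduce

def h(data: bytes) -> bytes:
    # Stand-in 64-bit hash (BLAKE2b truncated to 8 bytes); NOT SipHash.
    return hashlib.blake2b(data, digest_size=8).digest()

def combine(args):
    # Step 1: hash every input parameter.
    hashes = [h(a) for a in args]
    # Steps 2-4: fold left, hashing the pair (running hash, next hash) each time.
    return reduce(lambda acc, nxt: h(acc + nxt), hashes)

print(combine([b"e", b"x", b"a"]).hex())
```

Because each step feeds the previous result into the next, the combined value depends on the argument order.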
**Parameters**

The function takes a variable number of input parameters. Parameters can be any of the [supported data types](../../sql_reference/data_types/index.md).

**Returned Value**

A [UInt64](../../sql_reference/data_types/int_uint.md) data type hash value.
**Example**

``` sql
SELECT sipHash64(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS SipHash, toTypeName(SipHash) AS type
```

``` text
┌──────────────SipHash─┬─type───┐
│ 13726873534472839665 │ UInt64 │
└──────────────────────┴────────┘
```
## sipHash128 {#hash_functions-siphash128}
Calculates a SipHash hash value from a string.
Accepts a String-type argument. Returns FixedString(16).
Differs from sipHash64 in that the final XOR-folding of the state is done only up to 128 bits.
## cityHash64 {#cityhash64}
Produces a 64-bit [CityHash](https://github.com/google/cityhash) hash value.

``` sql
cityHash64(par1, ...)
```

This is a fast non-cryptographic hash function. It uses the CityHash algorithm for string parameters and an implementation-specific fast non-cryptographic hash function for parameters with other data types. The function uses the CityHash combinator to get the final results.

**Parameters**

The function takes a variable number of input parameters. Parameters can be any of the [supported data types](../../sql_reference/data_types/index.md).

**Returned Value**

A [UInt64](../../sql_reference/data_types/int_uint.md) data type hash value.
**Examples**

Call example:

``` sql
SELECT cityHash64(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS CityHash, toTypeName(CityHash) AS type
```

``` text
┌─────────────CityHash─┬─type───┐
│ 12072650598913549138 │ UInt64 │
└──────────────────────┴────────┘
```
The following example shows how to compute a checksum of the entire table that does not depend on the row order:

``` sql
SELECT groupBitXor(cityHash64(*)) FROM table
```
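The checksum is insensitive to row order because XOR is commutative and associative. The Python sketch below demonstrates the property with a stand-in row hash (truncated BLAKE2b over the row's `repr`, not CityHash); `row_hash` and `table_checksum` are hypothetical names.

```python
import hashlib
import random
from functools import reduce

def row_hash(row) -> int:
    # Stand-in 64-bit row hash (not CityHash): hash the repr of the row tuple.
    digest = hashlib.blake2b(repr(row).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")

def table_checksum(rows) -> int:
    # XOR-fold all row hashes; the order of rows cannot affect the result.
    return reduce(lambda acc, r: acc ^ row_hash(r), rows, 0)

rows = [(1, "a"), (2, "b"), (3, "c")]
shuffled = rows[:]
random.shuffle(shuffled)
print(table_checksum(rows) == table_checksum(shuffled))  # True
```

One consequence of XOR-folding is that two identical rows cancel each other out, so a table with duplicated rows can produce the same checksum as a table without them.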
## intHash32 {#inthash32}
Calculates a 32-bit hash code from any type of integer.
This is a relatively fast non-cryptographic hash function of average quality for numbers.
## intHash64 {#inthash64}
Calculates a 64-bit hash code from any type of integer.
It works faster than intHash32 and is of average quality.
## SHA1 {#sha1}
## SHA224 {#sha224}

## SHA256 {#sha256}
Calculates SHA-1, SHA-224, or SHA-256 from a string and returns the resulting set of bytes as FixedString(20), FixedString(28), or FixedString(32), respectively.
The function works fairly slowly (SHA-1 processes about 5 million short strings per second per processor core, while SHA-224 and SHA-256 process about 2.2 million).
We recommend using this function only in cases when you need a specific hash function and cannot choose a different one.
Even in these cases, we recommend applying the function offline and pre-calculating values when inserting them into the table, instead of applying it in SELECT queries.
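
The three FixedString lengths follow directly from the digest sizes of the algorithms. A short Python check using the standard `hashlib` module (the variable name `sizes` is illustrative):

```python
import hashlib

# Digest length in bytes for each algorithm, computed on an arbitrary input.
sizes = {
    name: len(hashlib.new(name, b"example").digest())
    for name in ("sha1", "sha224", "sha256")
}
print(sizes)  # {'sha1': 20, 'sha224': 28, 'sha256': 32}
```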
## URLHash(url\[, N\]) {#urlhashurl-n}
A fast, decent-quality non-cryptographic hash function for a string obtained from a URL using some type of normalization.

- `URLHash(s)` – Calculates a hash from a string without one of the trailing symbols `/`, `?` or `#` at the end, if present.
- `URLHash(s, N)` – Calculates a hash from a string up to the N level in the URL hierarchy, without one of the trailing symbols `/`, `?` or `#` at the end, if present.

Levels are the same as in URLHierarchy. This function is specific to Yandex.Metrica.
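
The normalization for the one-argument form can be read as "drop at most one trailing `/`, `?` or `#`". The Python sketch below encodes that reading only. It is an interpretation of the sentence above, not the actual implementation, and it covers neither the hash function itself nor the level truncation of the two-argument form. The name `strip_one_trailing_symbol` is hypothetical.

```python
def strip_one_trailing_symbol(url: str) -> str:
    # Drop at most one trailing '/', '?' or '#'; leave everything else untouched.
    if url and url[-1] in "/?#":
        return url[:-1]
    return url

print(strip_one_trailing_symbol("http://example.com/"))
```

Under this reading, `http://example.com` and `http://example.com/` normalize to the same string and would therefore hash to the same value.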
## farmHash64 {#farmhash64}
Produces a 64-bit [FarmHash](https://github.com/google/farmhash) hash value.

``` sql
farmHash64(par1, ...)
```

Of all the [available methods](https://github.com/google/farmhash/blob/master/src/farmhash.h), the function uses the `Hash64` method.

**Parameters**

The function takes a variable number of input parameters. Parameters can be any of the [supported data types](../../sql_reference/data_types/index.md).

**Returned Value**

A [UInt64](../../sql_reference/data_types/int_uint.md) data type hash value.
**Example**

``` sql
SELECT farmHash64(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS FarmHash, toTypeName(FarmHash) AS type
```

``` text
┌─────────────FarmHash─┬─type───┐
│ 17790458267262532859 │ UInt64 │
└──────────────────────┴────────┘
```
## javaHash {#hash_functions-javahash}
Calculates [JavaHash](http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/478a4add975b/src/share/classes/java/lang/String.java#l1452) from a string. This hash function is neither fast nor of good quality. The only reason to use it is when this algorithm is already used in another system and you have to calculate exactly the same result.

**Syntax**

``` sql
SELECT javaHash('');
```
**Returned value**

An `Int32` data type hash value.

**Example**

Query:

``` sql
SELECT javaHash('Hello, world!');
```

Result:

``` text
┌─javaHash('Hello, world!')─┐
│               -1880044555 │
└───────────────────────────┘
```
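
For reference, the linked `String.hashCode` recurrence (`h = 31 * h + c`, with 32-bit signed overflow) can be sketched in Python. The function name `java_hash` is hypothetical, and the sketch assumes ASCII input, where each character corresponds to exactly one byte and one UTF-16 code unit.

```python
def java_hash(s: str) -> int:
    h = 0
    for ch in s:
        # h = 31 * h + ch, wrapped to 32 bits as Java int arithmetic does.
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed Int32.
    return h - 0x100000000 if h >= 0x80000000 else h

print(java_hash("Hello, world!"))  # -1880044555
```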
## javaHashUTF16LE {#javahashutf16le}
Calculates [JavaHash](http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/478a4add975b/src/share/classes/java/lang/String.java#l1452) from a string, assuming it contains bytes representing a string in UTF-16LE encoding.

**Syntax**

``` sql
javaHashUTF16LE(stringUtf16le)
```

**Parameters**

- `stringUtf16le` – a string in UTF-16LE encoding.

**Returned value**

An `Int32` data type hash value.
**Example**

Correct query with UTF-16LE encoded string.

Query:

``` sql
SELECT javaHashUTF16LE(convertCharset('test', 'utf-8', 'utf-16le'))
```

Result:

``` text
┌─javaHashUTF16LE(convertCharset('test', 'utf-8', 'utf-16le'))─┐
│                                                      3556498 │
└──────────────────────────────────────────────────────────────┘
```
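The same recurrence applied to UTF-16LE bytes reads the input two bytes at a time, each pair forming one little-endian code unit. A minimal Python sketch follows. The name `java_hash_utf16le` is hypothetical, and rejecting odd-length input with `ValueError` is an assumption of this sketch, not documented behavior.

```python
def java_hash_utf16le(data: bytes) -> int:
    if len(data) % 2:
        # Assumption of this sketch: odd-length input is rejected.
        raise ValueError("UTF-16LE input must have an even number of bytes")
    h = 0
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "little")  # one UTF-16 code unit
        h = (31 * h + unit) & 0xFFFFFFFF                # wrap to 32 bits
    # Reinterpret the unsigned 32-bit value as a signed Int32.
    return h - 0x100000000 if h >= 0x80000000 else h

print(java_hash_utf16le("test".encode("utf-16-le")))  # 3556498
```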
## hiveHash {#hash-functions-hivehash}
Calculates `HiveHash` from a string.

``` sql
SELECT hiveHash('');
```

This is just [JavaHash](#hash_functions-javahash) with the sign bit zeroed out. This function is used in [Apache Hive](https://en.wikipedia.org/wiki/Apache_Hive) for versions before 3.0. This hash function is neither fast nor of good quality. The only reason to use it is when this algorithm is already used in another system and you have to calculate exactly the same result.
**Returned value**

An `Int32` data type hash value.

Type: `Int32`.

**Example**

Query:

``` sql
SELECT hiveHash('Hello, world!');
```

Result:

``` text
┌─hiveHash('Hello, world!')─┐
│                 267439093 │
└───────────────────────────┘
```
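Zeroing the sign bit is a single mask applied to the 32-bit JavaHash value. The Python sketch below (hypothetical name `hive_hash`, ASCII input assumed) reproduces the example result.

```python
def hive_hash(s: str) -> int:
    h = 0
    for ch in s:
        # JavaHash recurrence, wrapped to 32 bits.
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Clear the sign bit so the result is always non-negative.
    return h & 0x7FFFFFFF

print(hive_hash("Hello, world!"))  # 267439093
```

For this input the JavaHash value is -1880044555, and clearing the sign bit adds 2^31 to it: -1880044555 + 2147483648 = 267439093.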
## metroHash64 {#metrohash64}
Produces a 64-bit [MetroHash](http://www.jandrewrogers.com/2015/05/27/metrohash/) hash value.

``` sql
metroHash64(par1, ...)
```

**Parameters**

The function takes a variable number of input parameters. Parameters can be any of the [supported data types](../../sql_reference/data_types/index.md).

**Returned Value**

A [UInt64](../../sql_reference/data_types/int_uint.md) data type hash value.
**Example**

``` sql
SELECT metroHash64(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS MetroHash, toTypeName(MetroHash) AS type
```

``` text
┌────────────MetroHash─┬─type───┐
│ 14235658766382344533 │ UInt64 │
└──────────────────────┴────────┘
```
## jumpConsistentHash {#jumpconsistenthash}
Calculates JumpConsistentHash from a UInt64.
Accepts two arguments: a UInt64-type key and the number of buckets. Returns Int32.
For more information, see [JumpConsistentHash](https://arxiv.org/pdf/1406.2294.pdf).
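
The algorithm from the linked paper is short enough to transcribe. The Python sketch below follows the published reference code; the name `jump_consistent_hash` is hypothetical, and it is not claimed to match the ClickHouse function bit for bit.

```python
def jump_consistent_hash(key: int, num_buckets: int) -> int:
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step, wrapped to emulate uint64 overflow.
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        # Jump forward to the next candidate bucket.
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b

print([jump_consistent_hash(k, 10) for k in range(5)])
```

The property that makes the hash "consistent" is that when the number of buckets grows from `n` to `n + 1`, a key either stays in its bucket or moves to the new bucket `n`.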
## murmurHash2\_32, murmurHash2\_64 {#murmurhash2-32-murmurhash2-64}
Produces a [MurmurHash2](https://github.com/aappleby/smhasher) hash value.

``` sql
murmurHash2_32(par1, ...)
murmurHash2_64(par1, ...)
```

**Parameters**

Both functions take a variable number of input parameters. Parameters can be any of the [supported data types](../../sql_reference/data_types/index.md).

**Returned Value**

- The `murmurHash2_32` function returns a [UInt32](../../sql_reference/data_types/int_uint.md) data type hash value.
- The `murmurHash2_64` function returns a [UInt64](../../sql_reference/data_types/int_uint.md) data type hash value.
**Example**

``` sql
SELECT murmurHash2_64(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS MurmurHash2, toTypeName(MurmurHash2) AS type
```

``` text
┌──────────MurmurHash2─┬─type───┐
│ 11832096901709403633 │ UInt64 │
└──────────────────────┴────────┘
```
## murmurHash3\_32, murmurHash3\_64 {#murmurhash3-32-murmurhash3-64}
Produces a [MurmurHash3](https://github.com/aappleby/smhasher) hash value.

``` sql
murmurHash3_32(par1, ...)
murmurHash3_64(par1, ...)
```

**Parameters**

Both functions take a variable number of input parameters. Parameters can be any of the [supported data types](../../sql_reference/data_types/index.md).

**Returned Value**

- The `murmurHash3_32` function returns a [UInt32](../../sql_reference/data_types/int_uint.md) data type hash value.
- The `murmurHash3_64` function returns a [UInt64](../../sql_reference/data_types/int_uint.md) data type hash value.
**Example**

``` sql
SELECT murmurHash3_32(array('e','x','a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS MurmurHash3, toTypeName(MurmurHash3) AS type
```

``` text
┌─MurmurHash3─┬─type───┐
│     2152717 │ UInt32 │
└─────────────┴────────┘
```
## murmurHash3\_128 {#murmurhash3-128}
Produces a 128-bit [MurmurHash3](https://github.com/aappleby/smhasher) hash value.

``` sql
murmurHash3_128(expr)
```

**Parameters**

- `expr` – An [expression](../syntax.md#syntax-expressions) returning a [String](../../sql_reference/data_types/string.md)-type value.

**Returned Value**

A [FixedString(16)](../../sql_reference/data_types/fixedstring.md) data type hash value.
**Example**

``` sql
SELECT murmurHash3_128('example_string') AS MurmurHash3, toTypeName(MurmurHash3) AS type
```
``` text
┌─MurmurHash3──────┬─type────────────┐
│ 6<> 1<1C> 4"S5KT<4B> ~~q │ FixedString(16) │
└──────────────────┴─────────────────┘
```
## xxHash32, xxHash64 {#hash-functions-xxhash32}
Calculates `xxHash` from a string. It comes in two flavors, 32 and 64 bits.

``` sql
SELECT xxHash32('');
-- or
SELECT xxHash64('');
```

**Returned value**

A `UInt32` or `UInt64` data type hash value.

Type: `UInt32` for `xxHash32`, `UInt64` for `xxHash64`.
**Example**

Query:

``` sql
SELECT xxHash32('Hello, world!');
```

Result:

``` text
┌─xxHash32('Hello, world!')─┐
│                 834093149 │
└───────────────────────────┘
```

**See Also**

- [xxHash](http://cyan4973.github.io/xxHash/).

[Original article](https://clickhouse.tech/docs/en/query_language/functions/hash_functions/) <!--hide-->