ClickHouse/docs/en/sql-reference/functions/bit-functions.md
gyuton 8011f5c36a
DOCSUP-5910: Documented SimHash, MinHash, bitHammingDistance and tupleHammingDistance functions (#22131)
Co-authored-by: olgarev <56617294+olgarev@users.noreply.github.com>
Co-authored-by: George <gyuton@yandex-team.ru>
Co-authored-by: Vladimir <vdimir@yandex-team.ru>
2021-04-02 14:19:25 +03:00


toc_priority: 48
toc_title: Bit

Bit Functions

Bit functions work for any pair of types from UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Float32, or Float64.

The result type is an integer whose bit width equals the largest bit width among its arguments. If at least one of the arguments is signed, the result is a signed number. If an argument is a floating-point number, it is cast to Int64.
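The rule above can be sketched as a small Python helper. This is only a literal reading of the paragraph, not ClickHouse code: the function name `result_type` is hypothetical, and it takes type names as strings purely for illustration.

```python
import re

def result_type(*arg_types: str) -> str:
    """Model the documented result-type rule for bit functions (hypothetical helper)."""
    widths = []
    signed = False
    for type_name in arg_types:
        if type_name.startswith("Float"):
            type_name = "Int64"  # floating-point arguments are cast to Int64
        match = re.fullmatch(r"(U?)Int(8|16|32|64)", type_name)
        if match is None:
            raise ValueError(f"unsupported type: {type_name}")
        if match.group(1) == "":
            signed = True  # at least one signed argument makes the result signed
        widths.append(int(match.group(2)))
    prefix = "Int" if signed else "UInt"
    return f"{prefix}{max(widths)}"

print(result_type("UInt8", "UInt32"))   # UInt32
print(result_type("UInt16", "Int8"))    # Int16
print(result_type("UInt8", "Float32"))  # Int64
```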

bitAnd(a, b)

bitOr(a, b)

bitXor(a, b)

bitNot(a)

bitShiftLeft(a, b)

bitShiftRight(a, b)

bitRotateLeft(a, b)

bitRotateRight(a, b)
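The eight functions listed above are given without examples. The Python sketch below models what their names suggest, under loud assumptions: both operands are treated as UInt8 values, shift amounts are smaller than the width, and bits shifted past the width are discarded. The Python function names are hypothetical and the code is not taken from ClickHouse.

```python
WIDTH = 8                 # assumption: model UInt8 operands only
MASK = (1 << WIDTH) - 1   # 0xFF, keeps results inside 8 bits

def bit_and(a, b):
    return a & b

def bit_or(a, b):
    return a | b

def bit_xor(a, b):
    return a ^ b

def bit_not(a):
    return ~a & MASK      # flip every bit, then drop Python's infinite sign bits

def bit_shift_left(a, b):
    return (a << b) & MASK  # assumption: bits shifted past the width are lost

def bit_shift_right(a, b):
    return a >> b

def bit_rotate_left(a, b):
    b %= WIDTH
    return ((a << b) | (a >> (WIDTH - b))) & MASK

def bit_rotate_right(a, b):
    b %= WIDTH
    return ((a >> b) | (a << (WIDTH - b))) & MASK

a = 0b10010110  # 150
print(bit_and(a, 0b00001111))   # 6
print(bit_or(a, 0b00001111))    # 159
print(bit_xor(a, 0b00001111))   # 153
print(bit_not(a))               # 105
print(bit_shift_left(a, 1))     # 44
print(bit_shift_right(a, 1))    # 75
print(bit_rotate_left(a, 1))    # 45
print(bit_rotate_right(1, 1))   # 128
```

Unlike a shift, a rotation feeds the bits that fall off one end back in at the other, which is why `bit_rotate_left(a, 1)` differs from `bit_shift_left(a, 1)` by the wrapped-around high bit.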

bitTest

Takes any integer, converts it to binary form, and returns the value of the bit at the specified position. Positions are counted from 0, from right to left, so position 0 is the least significant bit.

Syntax

SELECT bitTest(number, index)

Arguments

  • number: Integer number.
  • index: Position of the bit.

Returned values

Returns the value of the bit at the specified position.

Type: UInt8.

Example

For example, the number 43 in base-2 (binary) numeral system is 101011.

Query:

SELECT bitTest(43, 1);

Result:

┌─bitTest(43, 1)─┐
│              1 │
└────────────────┘

Another example:

Query:

SELECT bitTest(43, 2);

Result:

┌─bitTest(43, 2)─┐
│              0 │
└────────────────┘
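The two results above can be checked with a minimal Python model of the bit lookup. The name `bit_test` is hypothetical, and the sketch assumes a non-negative input; it shifts the wanted bit down to position 0 and masks everything else off.

```python
def bit_test(number: int, index: int) -> int:
    """Return the bit of a non-negative integer at `index`, counting from 0 at the right."""
    return (number >> index) & 1

print(bin(43))                                # 0b101011
print(bit_test(43, 1))                        # 1
print(bit_test(43, 2))                        # 0
print([bit_test(43, i) for i in range(6)])    # [1, 1, 0, 1, 0, 1]
```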

bitTestAll

Returns the result of the logical conjunction (AND operator) of all bits at the given positions. Positions are counted from 0, from right to left.

The conjunction for bitwise operations:

0 AND 0 = 0

0 AND 1 = 0

1 AND 0 = 0

1 AND 1 = 1

Syntax

SELECT bitTestAll(number, index1, index2, index3, index4, ...)

Arguments

  • number: Integer number.
  • index1, index2, index3, index4: Positions of bits. For example, the conjunction over the set of positions (index1, index2, index3, index4) is true if and only if the bits at all of these positions are set (index1 ⋀ index2 ⋀ index3 ⋀ index4).

Returned values

Returns the result of the logical conjunction.

Type: UInt8.

Example

For example, the number 43 in base-2 (binary) numeral system is 101011.

Query:

SELECT bitTestAll(43, 0, 1, 3, 5);

Result:

┌─bitTestAll(43, 0, 1, 3, 5)─┐
│                          1 │
└────────────────────────────┘

Another example:

Query:

SELECT bitTestAll(43, 0, 1, 3, 5, 2);

Result:

┌─bitTestAll(43, 0, 1, 3, 5, 2)─┐
│                             0 │
└───────────────────────────────┘
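As a rough illustration of the conjunction, the Python sketch below models the two queries above in two equivalent ways: bit by bit, and with a single mask comparison. The names `bit_test_all` and `bit_test_all_mask` are hypothetical, and non-negative inputs are assumed.

```python
def bit_test_all(number: int, *indexes: int) -> int:
    """Return 1 only if every bit at the given positions is set."""
    return int(all((number >> i) & 1 for i in indexes))

def bit_test_all_mask(number: int, *indexes: int) -> int:
    """Same check done by building one mask and comparing once."""
    mask = 0
    for i in indexes:
        mask |= 1 << i
    return int(number & mask == mask)

print(bit_test_all(43, 0, 1, 3, 5))          # 1
print(bit_test_all(43, 0, 1, 3, 5, 2))       # 0
print(bit_test_all_mask(43, 0, 1, 3, 5, 2))  # 0
```

Adding position 2 flips the result because bit 2 of 101011 is 0, and a single unset bit makes the whole conjunction false.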

bitTestAny

Returns the result of the logical disjunction (OR operator) of all bits at the given positions. Positions are counted from 0, from right to left.

The disjunction for bitwise operations:

0 OR 0 = 0

0 OR 1 = 1

1 OR 0 = 1

1 OR 1 = 1

Syntax

SELECT bitTestAny(number, index1, index2, index3, index4, ...)

Arguments

  • number: Integer number.
  • index1, index2, index3, index4: Positions of bits.

Returned values

Returns the result of the logical disjunction.

Type: UInt8.

Example

For example, the number 43 in base-2 (binary) numeral system is 101011.

Query:

SELECT bitTestAny(43, 0, 2);

Result:

┌─bitTestAny(43, 0, 2)─┐
│                    1 │
└──────────────────────┘

Another example:

Query:

SELECT bitTestAny(43, 4, 2);

Result:

┌─bitTestAny(43, 4, 2)─┐
│                    0 │
└──────────────────────┘
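The disjunction can be modelled the same way. The Python sketch below reproduces the two queries above; `bit_test_any` is a hypothetical name and non-negative inputs are assumed.

```python
def bit_test_any(number: int, *indexes: int) -> int:
    """Return 1 if at least one bit at the given positions is set."""
    mask = 0
    for i in indexes:
        mask |= 1 << i
    return int(number & mask != 0)

print(bit_test_any(43, 0, 2))  # 1
print(bit_test_any(43, 4, 2))  # 0
```

In the second call both bits 4 and 2 of 101011 are 0, so no operand of the OR is set and the result is 0.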

bitCount

Calculates the number of bits set to one in the binary representation of a number.

Syntax

bitCount(x)

Arguments

  • x: Integer or floating-point number. The function uses the value's representation in memory, which is what allows it to support floating-point numbers.

Returned value

  • Number of bits set to one in the input number.

The function does not convert the input value to a larger type (sign extension). So, for example, bitCount(toUInt8(-1)) = 8.

Type: UInt8.

Example

Take, for example, the number 333. Its binary representation is 0000000101001101.

Query:

SELECT bitCount(333);

Result:

┌─bitCount(333)─┐
│             5 │
└───────────────┘
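A minimal Python sketch of the counting described above follows. It is a model, not ClickHouse code: the function names are hypothetical, the integer width is passed explicitly to mimic a fixed-size type, and the floating-point case assumes an IEEE 754 Float64 whose raw 64 bits are counted.

```python
import struct

def bit_count_int(value: int, width: int) -> int:
    """Count set bits of `value` stored as a two's complement integer of `width` bits."""
    return bin(value & ((1 << width) - 1)).count("1")

def bit_count_float64(value: float) -> int:
    """Count set bits in the 64-bit in-memory representation of a double."""
    raw = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bin(raw).count("1")

print(bit_count_int(333, 16))    # 5
print(bit_count_int(-1, 8))      # 8, no widening beyond 8 bits
print(bit_count_float64(1.0))    # 10, the bits of 0x3FF0000000000000
```

Masking to the stated width is what keeps -1 at 8 set bits instead of the 64 it would have after widening to a 64-bit type.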

bitHammingDistance

Returns the Hamming distance between the bit representations of two integer values. It can be used with SimHash functions to detect semi-duplicate strings. The smaller the distance, the more likely the strings are the same.

Syntax

bitHammingDistance(int1, int2)

Arguments

  • int1 — First integer value. Int64.
  • int2 — Second integer value. Int64.

Returned value

  • The Hamming distance.

Type: UInt8.

Examples

Query:

SELECT bitHammingDistance(111, 121);

Result:

┌─bitHammingDistance(111, 121)─┐
│                            3 │
└──────────────────────────────┘

With SimHash:

SELECT bitHammingDistance(ngramSimHash('cat ate rat'), ngramSimHash('rat ate cat'));

Result:

┌─bitHammingDistance(ngramSimHash('cat ate rat'), ngramSimHash('rat ate cat'))─┐
│                                                                            5 │
└──────────────────────────────────────────────────────────────────────────────┘
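The Hamming distance between two integers is the number of bit positions in which they differ, which equals the number of set bits in their XOR. The Python sketch below checks the first example above under that definition; `bit_hamming_distance` is a hypothetical name, non-negative inputs are assumed, and the SimHash example is not reproduced because it depends on ClickHouse's hashing.

```python
def bit_hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two non-negative integers differ."""
    return bin(a ^ b).count("1")

print(format(111, "07b"))              # 1101111
print(format(121, "07b"))              # 1111001
print(format(111 ^ 121, "07b"))        # 0010110
print(bit_hamming_distance(111, 121))  # 3
```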