ClickHouse/docs/en/sql-reference/aggregate-functions/reference/quantileexact.md
2021-06-23 00:32:41 +03:00

12 KiB
Raw Blame History

toc_priority
202

quantileExact Functions

quantileExact

Exactly computes the quantile of a numeric data sequence.

To get exact value, all the passed values are combined into an array, which is then partially sorted. Therefore, the function consumes O(n) memory, where n is a number of values that were passed. However, for a small number of values, the function is very effective.

When using multiple quantile* functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the quantiles function.

Syntax

quantileExact(level)(expr)

Alias: medianExact.

Arguments

  • level — Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a level value in the range of [0.01, 0.99]. Default value: 0.5. At level=0.5 the function calculates median.
  • expr — Expression over the column values resulting in numeric data types, Date or DateTime.

Returned value

  • Quantile of the specified level.

Type:

  • Float64 for numeric data type input.
  • Date if input values have the Date type.
  • DateTime if input values have the DateTime type.

Example

Query:

SELECT quantileExact(number) FROM numbers(10)

Result:

┌─quantileExact(number)─┐
│                     5 │
└───────────────────────┘

quantileExactLow

Similar to quantileExact, this computes the exact quantile of a numeric data sequence.

To get the exact value, all the passed values are combined into an array, which is then fully sorted. The sorting algorithm's complexity is O(N·log(N)), where N = std::distance(first, last) comparisons.

The return value depends on the quantile level and the number of elements in the selection, i.e. if the level is 0.5, then the function returns the lower median value for an even number of elements and the middle median value for an odd number of elements. Median is calculated similarly to the median_low implementation which is used in python.

For all other levels, the element at the index corresponding to the value of level * size_of_array is returned. For example:

SELECT quantileExactLow(0.1)(number) FROM numbers(10)

┌─quantileExactLow(0.1)(number)─┐
                             1 
└───────────────────────────────┘

When using multiple quantile* functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the quantiles function.

Syntax

quantileExact(level)(expr)

Alias: medianExactLow.

Arguments

  • level — Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a level value in the range of [0.01, 0.99]. Default value: 0.5. At level=0.5 the function calculates median.
  • expr — Expression over the column values resulting in numeric data types, Date or DateTime.

Returned value

  • Quantile of the specified level.

Type:

  • Float64 for numeric data type input.
  • Date if input values have the Date type.
  • DateTime if input values have the DateTime type.

Example

Query:

SELECT quantileExactLow(number) FROM numbers(10)

Result:

┌─quantileExactLow(number)─┐
│                        4 │
└──────────────────────────┘

quantileExactHigh

Similar to quantileExact, this computes the exact quantile of a numeric data sequence.

All the passed values are combined into an array, which is then fully sorted, to get the exact value. The sorting algorithm's complexity is O(N·log(N)), where N = std::distance(first, last) comparisons.

The return value depends on the quantile level and the number of elements in the selection, i.e. if the level is 0.5, then the function returns the higher median value for an even number of elements and the middle median value for an odd number of elements. Median is calculated similarly to the median_high implementation which is used in python. For all other levels, the element at the index corresponding to the value of level * size_of_array is returned.

This implementation behaves exactly similar to the current quantileExact implementation.

When using multiple quantile* functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the quantiles function.

Syntax

quantileExactHigh(level)(expr)

Alias: medianExactHigh.

Arguments

  • level — Level of quantile. Optional parameter. Constant floating-point number from 0 to 1. We recommend using a level value in the range of [0.01, 0.99]. Default value: 0.5. At level=0.5 the function calculates median.
  • expr — Expression over the column values resulting in numeric data types, Date or DateTime.

Returned value

  • Quantile of the specified level.

Type:

  • Float64 for numeric data type input.
  • Date if input values have the Date type.
  • DateTime if input values have the DateTime type.

Example

Query:

SELECT quantileExactHigh(number) FROM numbers(10)

Result:

┌─quantileExactHigh(number)─┐
│                         5 │
└───────────────────────────┘

quantileExactExclusive

Exactly computes the quantile of a numeric data sequence.

To get exact value, all the passed values are combined into an array, which is then partially sorted. Therefore, the function consumes O(n) memory, where n is a number of values that were passed. However, for a small number of values, the function is very effective.

This function is equivalent to PERCENTILE.EXC Excel function, (type R6).

When using multiple quantileExactExclusive functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the quantilesExactExclusive function.

Syntax

quantileExactExclusive(level)(expr)

Arguments

Parameters

  • level — Level of quantile. Optional. Possible values: (0, 1) — bounds not included. Default value: 0.5. At level=0.5 the function calculates median. Float.

Returned value

  • Quantile of the specified level.

Type:

  • Float64 for numeric data type input.
  • Date if input values have the Date type.
  • DateTime if input values have the DateTime type.

Example

Query:

CREATE TABLE num AS numbers(1000);

SELECT quantileExactExclusive(0.6)(x) FROM (SELECT number AS x FROM num);

Result:

┌─quantileExactExclusive(0.6)(x)─┐
│                          599.6 │
└────────────────────────────────┘

quantileExactInclusive

Exactly computes the quantile of a numeric data sequence.

To get exact value, all the passed values are combined into an array, which is then partially sorted. Therefore, the function consumes O(n) memory, where n is a number of values that were passed. However, for a small number of values, the function is very effective.

This function is equivalent to PERCENTILE.INC Excel function, (type R7).

When using multiple quantileExactInclusive functions with different levels in a query, the internal states are not combined (that is, the query works less efficiently than it could). In this case, use the quantilesExactInclusive function.

Syntax

quantileExactInclusive(level)(expr)

Arguments

Parameters

  • level — Level of quantile. Optional. Possible values: [0, 1] — bounds included. Default value: 0.5. At level=0.5 the function calculates median. Float.

Returned value

  • Quantile of the specified level.

Type:

  • Float64 for numeric data type input.
  • Date if input values have the Date type.
  • DateTime if input values have the DateTime type.

Example

Query:

CREATE TABLE num AS numbers(1000);

SELECT quantileExactInclusive(0.6)(x) FROM (SELECT number AS x FROM num);

Result:

┌─quantileExactInclusive(0.6)(x)─┐
│                          599.4 │
└────────────────────────────────┘

See Also