---
toc_priority: 64
toc_title: Machine Learning
---

# Machine Learning Functions {#machine-learning-functions}

## evalMLMethod {#machine_learning_methods-evalmlmethod}

To make predictions with fitted regression models, use the `evalMLMethod` function. See the link in `linearRegression`.

## stochasticLinearRegression {#stochastic-linear-regression}

The [stochasticLinearRegression](../../sql-reference/aggregate-functions/reference/stochasticlinearregression.md#agg_functions-stochasticlinearregression) aggregate function implements the stochastic gradient descent method using a linear model and the MSE loss function. Use `evalMLMethod` to predict on new data.
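
The train-then-predict flow can be sketched as follows. This is a minimal illustration, not a recommended setup: the tables `train_data`, `test_data`, and `your_model` and the columns `param1`, `param2`, and `target` are hypothetical names, and the parameter values (learning rate `0.1`, L2 regularization coefficient `0.0`, mini-batch size `5`, update method `'SGD'`) are arbitrary. The sketch assumes `test_data` already exists with the same feature columns as `train_data`.

``` sql
-- Hypothetical training table; all names here are placeholders.
CREATE TABLE IF NOT EXISTS train_data
(
    param1 Float64,
    param2 Float64,
    target Float64
) ENGINE = Memory;

-- Fit the model and keep its aggregation state by using the -State combinator.
-- The first argument in the second pair of parentheses is the target column,
-- the remaining arguments are the features.
CREATE TABLE your_model ENGINE = Memory AS
SELECT stochasticLinearRegressionState(0.1, 0.0, 5, 'SGD')(target, param1, param2) AS state
FROM train_data;

-- Predict on new data: pass the saved state and the feature columns to evalMLMethod.
-- Assumes test_data has param1 and param2 columns.
WITH (SELECT state FROM your_model) AS model
SELECT evalMLMethod(model, param1, param2) AS prediction
FROM test_data;
```

Storing the state in a table means the fitted model can be reused across queries without retraining.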

## stochasticLogisticRegression {#stochastic-logistic-regression}

The [stochasticLogisticRegression](../../sql-reference/aggregate-functions/reference/stochasticlogisticregression.md#agg_functions-stochasticlogisticregression) aggregate function implements the stochastic gradient descent method for the binary classification problem. Use `evalMLMethod` to predict on new data.
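
A classifier can be fitted and applied in the same way. The sketch below assumes hypothetical tables `train_data` (columns `param1`, `param2`, `target`) and `test_data` (columns `param1`, `param2`) that already exist; the table name `logistic_model` and the parameter values are likewise illustrative only. Per the aggregate function reference, labels in `target` are expected to lie in the range [-1, 1].

``` sql
-- Fit the classifier and save its state (all table and column names are placeholders).
CREATE TABLE logistic_model ENGINE = Memory AS
SELECT stochasticLogisticRegressionState(1.0, 1.0, 10, 'SGD')(target, param1, param2) AS state
FROM train_data;

-- evalMLMethod returns the predicted probability that an object has label 1.
WITH (SELECT state FROM logistic_model) AS model
SELECT evalMLMethod(model, param1, param2) AS probability
FROM test_data;
```

To turn probabilities into labels, compare the result against a threshold of your choice (for example, `probability > 0.5`).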

## bayesAB {#bayesab}

Compares test groups (variants) and calculates, for each group, the probability of being the best one. The first group is used as the control group.

**Syntax** 

``` sql
bayesAB(distribution_name, higher_is_better, variant_names, x, y)
```

**Arguments** 

-   `distribution_name` — Name of the probability distribution. [String](../../sql-reference/data-types/string.md). Possible values:

    -   `beta` for [Beta distribution](https://en.wikipedia.org/wiki/Beta_distribution)
    -   `gamma` for [Gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution)

-   `higher_is_better` — Boolean flag. [Boolean](../../sql-reference/data-types/boolean.md). Possible values:

    -    `0` — lower values are considered to be better than higher
    -    `1` — higher values are considered to be better than lower

-   `variant_names` — Variant names. [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)).

-   `x` — Numbers of tests for the corresponding variants. [Array](../../sql-reference/data-types/array.md)([Float64](../../sql-reference/data-types/float.md)).

-   `y` — Numbers of successful tests for the corresponding variants. [Array](../../sql-reference/data-types/array.md)([Float64](../../sql-reference/data-types/float.md)).

!!! note "Note"
    All three arrays must have the same size. All `x` and `y` values must be non-negative constant numbers. `y` cannot be larger than `x`.

**Returned values**

For each variant, the function calculates:

-   `beats_control` — long-term probability of outperforming the first (control) variant
-   `to_be_best` — long-term probability of outperforming all other variants

Type: JSON.

**Example**

Query:

``` sql
SELECT bayesAB('beta', 1, ['Control', 'A', 'B'], [3000., 3000., 3000.], [100., 90., 110.]) FORMAT PrettySpace;
```

Result:

``` text
{
   "data":[
      {
         "variant_name":"Control",
         "x":3000,
         "y":100,
         "beats_control":0,
         "to_be_best":0.22619
      },
      {
         "variant_name":"A",
         "x":3000,
         "y":90,
         "beats_control":0.23469,
         "to_be_best":0.04671
      },
      {
         "variant_name":"B",
         "x":3000,
         "y":110,
         "beats_control":0.7580899999999999,
         "to_be_best":0.7271
      }
   ]
}
```