---
toc_priority: 64
toc_title: Machine Learning
---
# Machine Learning Functions {#machine-learning-functions}
## evalMLMethod {#machine_learning_methods-evalmlmethod}
To make predictions with a fitted regression model, use the `evalMLMethod` function. See the link in `linearRegression`.
## stochasticLinearRegression {#stochastic-linear-regression}
The [stochasticLinearRegression](../../sql-reference/aggregate-functions/reference/stochasticlinearregression.md#agg_functions-stochasticlinearregression) aggregate function implements the stochastic gradient descent method using a linear model and the MSE loss function. It uses `evalMLMethod` to predict on new data.
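The fit-then-predict flow can be sketched as follows. This is a minimal illustration, not a tuned setup: the table names `train_data` and `your_model`, the column names, and the sample rows are assumptions made for the example. The four parameters are assumed to be the learning rate, the L2 regularization coefficient, the mini-batch size, and the update method.

``` sql
-- Hypothetical training table: two features and a numeric target.
CREATE TABLE IF NOT EXISTS train_data
(
    param1 Float64,
    param2 Float64,
    target Float64
) ENGINE = Memory;

-- Made-up sample rows, for illustration only.
INSERT INTO train_data VALUES (1, 2, 5), (2, 1, 4), (3, 3, 9), (4, 2, 8);

-- Fit: the -State combinator stores the trained model state
-- instead of returning the final weights.
CREATE TABLE IF NOT EXISTS your_model ENGINE = Memory AS
SELECT stochasticLinearRegressionState(0.1, 0.0, 5, 'SGD')(target, param1, param2) AS state
FROM train_data;

-- Predict: pass the saved state and the feature columns to evalMLMethod.
WITH (SELECT state FROM your_model) AS model
SELECT evalMLMethod(model, param1, param2) AS prediction
FROM train_data;
```

The target column comes first in the aggregate call, followed by the features. The prediction query passes the features in the same order and omits the target.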
## stochasticLogisticRegression {#stochastic-logistic-regression}
The [stochasticLogisticRegression](../../sql-reference/aggregate-functions/reference/stochasticlogisticregression.md#agg_functions-stochasticlogisticregression) aggregate function implements the stochastic gradient descent method for a binary classification problem. It uses `evalMLMethod` to predict on new data.
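A classifier can be trained and applied in the same way. The sketch below assumes hypothetical tables `train_labels` and `my_classifier`, made-up sample rows, and class labels encoded as `-1` and `1`. Check the linked aggregate function reference for the exact label and parameter requirements.

``` sql
-- Hypothetical training table: two features and a label in {-1, 1}.
CREATE TABLE IF NOT EXISTS train_labels
(
    param1 Float64,
    param2 Float64,
    target Float64
) ENGINE = Memory;

-- Made-up sample rows, for illustration only.
INSERT INTO train_labels VALUES (0.5, 1.0, -1), (1.0, 0.5, -1), (3.0, 3.5, 1), (4.0, 3.0, 1);

-- Fit: store the trained model state with the -State combinator.
CREATE TABLE IF NOT EXISTS my_classifier ENGINE = Memory AS
SELECT stochasticLogisticRegressionState(1.0, 1.0, 10, 'SGD')(target, param1, param2) AS state
FROM train_labels;

-- Predict: evalMLMethod is assumed here to return a score per row
-- that can be thresholded to obtain a class label.
WITH (SELECT state FROM my_classifier) AS model
SELECT param1, param2, evalMLMethod(model, param1, param2) AS score
FROM train_labels;
```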
## bayesAB {#bayesab}
Compares test groups (variants) and calculates, for each group, the probability of being the best one. The first group is used as the control group.
**Syntax**
``` sql
bayesAB(distribution_name, higher_is_better, variant_names, x, y)
```
**Arguments**
- `distribution_name` — Name of the probability distribution. [String](../../sql-reference/data-types/string.md). Possible values:
    - `beta` for [Beta distribution](https://en.wikipedia.org/wiki/Beta_distribution)
    - `gamma` for [Gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution)
- `higher_is_better` — Boolean flag. [Boolean](../../sql-reference/data-types/boolean.md). Possible values:
    - `0` — lower values are considered to be better than higher
    - `1` — higher values are considered to be better than lower
- `variant_names` — Variant names. [Array](../../sql-reference/data-types/array.md)([String](../../sql-reference/data-types/string.md)).
- `x` — Numbers of tests for the corresponding variants. [Array](../../sql-reference/data-types/array.md)([Float64](../../sql-reference/data-types/float.md)).
- `y` — Numbers of successful tests for the corresponding variants. [Array](../../sql-reference/data-types/array.md)([Float64](../../sql-reference/data-types/float.md)).

!!! note "Note"
    All three arrays must have the same size. All `x` and `y` values must be non-negative constant numbers. `y` cannot be larger than `x`.
**Returned values**
For each variant the function calculates:
- `beats_control` — long-term probability to out-perform the first (control) variant
- `to_be_best` — long-term probability to out-perform all other variants
Type: JSON.
**Example**
Query:
``` sql
SELECT bayesAB('beta', 1, ['Control', 'A', 'B'], [3000., 3000., 3000.], [100., 90., 110.]) FORMAT PrettySpace;
```
Result:
``` text
{
    "data":[
        {
            "variant_name":"Control",
            "x":3000,
            "y":100,
            "beats_control":0,
            "to_be_best":0.22619
        },
        {
            "variant_name":"A",
            "x":3000,
            "y":90,
            "beats_control":0.23469,
            "to_be_best":0.04671
        },
        {
            "variant_name":"B",
            "x":3000,
            "y":110,
            "beats_control":0.7580899999999999,
            "to_be_best":0.7271
        }
    ]
}
```