From c59d7003924bdf2f2769db0b04558ecfe301b5a2 Mon Sep 17 00:00:00 2001 From: Sergey Suvorov Date: Mon, 30 Oct 2023 20:54:51 +0100 Subject: [PATCH] Add documentation for minSampleSizeConversion and minSampleSizeContinous --- .../functions/other-functions.md | 87 +++++++++++++++++++ 1 file changed, 87 insertions(+) diff --git a/docs/en/sql-reference/functions/other-functions.md b/docs/en/sql-reference/functions/other-functions.md index 6b092cf384d..db231166b74 100644 --- a/docs/en/sql-reference/functions/other-functions.md +++ b/docs/en/sql-reference/functions/other-functions.md @@ -2823,3 +2823,90 @@ Result: │ SELECT a, b FROM tab WHERE (a > 3) AND (b < 3) │ └─────────────────────────────────────────────────────────────────────────┘ ``` + +## minSampleSizeConversion + +Calculates minimum required sample size for an A/B test comparing conversions (proportions) in two samples. + +**Syntax** + +``` sql +minSampleSizeConversion(baseline, mde, power, alpha) +``` + +Uses the formula described in [this article](https://towardsdatascience.com/required-sample-size-for-a-b-testing-6f6608dd330a). Assumes equal sizes of treatment and control groups. Returns the sample size required for one group (i.e. the sample size required for the whole experiment is twice the returned value). + +**Arguments** + +- `baseline` — Baseline conversion. [Float](../data-types/float.md). +- `mde` — Minimum detectable effect (MDE) as percentage points (e.g. for a baseline conversion 0.25 the MDE 0.03 means an expected change to 0.25 ± 0.03). [Float](../data-types/float.md). +- `power` — Required statistical power of a test (1 - probability of Type II error). [Float](../data-types/float.md). +- `alpha` — Required significance level of a test (probability of Type I error). [Float](../data-types/float.md). + +**Returned value** + +A named [Tuple](../data-types/tuple.md) with 3 elements: + +- `"minimum_sample_size"` — Required sample size. [Float64](../data-types/float.md). +- `"detect_range_lower"` — Lower bound of the range of values not detectable with the returned required sample size (i.e. all values less than or equal to `"detect_range_lower"` are detectable with the provided `alpha` and `power`). Calculated as `baseline - mde`. [Float64](../data-types/float.md). +- `"detect_range_upper"` — Upper bound of the range of values not detectable with the returned required sample size (i.e. all values greater than or equal to `"detect_range_upper"` are detectable with the provided `alpha` and `power`). Calculated as `baseline + mde`. [Float64](../data-types/float.md). + +**Example** + +The following query calculates the required sample size for an A/B test with baseline conversion of 25%, MDE of 3%, significance level of 5%, and the desired statistical power of 80%: + +``` sql +SELECT minSampleSizeConversion(0.25, 0.03, 0.80, 0.05) AS sample_size; +``` + +Result: + +``` text +┌─sample_size───────────────────┐ +│ (3396.077603219163,0.22,0.28) │ +└───────────────────────────────┘ +``` + +## minSampleSizeContinous + +Calculates minimum required sample size for an A/B test comparing means of a continuous metric in two samples. + +**Syntax** + +``` sql +minSampleSizeContinous(baseline, sigma, mde, power, alpha) +``` + +Uses the formula described in [this article](https://towardsdatascience.com/required-sample-size-for-a-b-testing-6f6608dd330a). Assumes equal sizes of treatment and control groups. Returns the required sample size for one group (i.e. the sample size required for the whole experiment is twice the returned value). Also assumes equal variance of the test metric in treatment and control groups. + +**Arguments** + +- `baseline` — Baseline value of a metric. [Integer](../data-types/int-uint.md) or [Float](../data-types/float.md). +- `sigma` — Baseline standard deviation of a metric. [Integer](../data-types/int-uint.md) or [Float](../data-types/float.md). +- `mde` — Minimum detectable effect (MDE) as percentage of the baseline value (e.g. for a baseline value 112.25 the MDE 0.03 means an expected change to 112.25 ± 112.25*0.03). [Integer](../data-types/int-uint.md) or [Float](../data-types/float.md). +- `power` — Required statistical power of a test (1 - probability of Type II error). [Integer](../data-types/int-uint.md) or [Float](../data-types/float.md). +- `alpha` — Required significance level of a test (probability of Type I error). [Integer](../data-types/int-uint.md) or [Float](../data-types/float.md). + +**Returned value** + +A named [Tuple](../data-types/tuple.md) with 3 elements: + +- `"minimum_sample_size"` — Required sample size. [Float64](../data-types/float.md). +- `"detect_range_lower"` — Lower bound of the range of values not detectable with the returned required sample size (i.e. all values less than or equal to `"detect_range_lower"` are detectable with the provided `alpha` and `power`). Calculated as `baseline * (1 - mde)`. [Float64](../data-types/float.md). +- `"detect_range_upper"` — Upper bound of the range of values not detectable with the returned required sample size (i.e. all values greater than or equal to `"detect_range_upper"` are detectable with the provided `alpha` and `power`). Calculated as `baseline * (1 + mde)`. [Float64](../data-types/float.md). + +**Example** + +The following query calculates the required sample size for an A/B test on a metric with baseline value of 112.25, standard deviation of 21.1, MDE of 3%, significance level of 5%, and the desired statistical power of 80%: + +``` sql +SELECT minSampleSizeContinous(112.25, 21.1, 0.03, 0.80, 0.05) AS sample_size; +``` + +Result: + +``` text +┌─sample_size───────────────────────────┐ +│ (616.2931945826209,108.8825,115.6175) │ +└───────────────────────────────────────┘ +```