Corrections
corrections
Multiple testing correction (MTC) statistical engines.
This module provides correction algorithms for family-wise error rate (FWER) and false discovery rate (FDR) control. MTC is essential when multiple metrics or variants are tested simultaneously: without it, the probability of at least one false positive (Type I error) grows rapidly with the number of tests.
| FUNCTION | DESCRIPTION |
|---|---|
| `apply_multiple_testing_correction` | Applies multiple testing corrections on p-values using statsmodels. |
apply_multiple_testing_correction
apply_multiple_testing_correction(
p_values: List[float],
alpha: float = 0.05,
method: str = "fdr_bh",
) -> List[float]
Applies multiple testing corrections on p-values using statsmodels.
TODO: Implement a step-down Dunnett's correction procedure for multi-arm comparisons against a common control.
TODO: Support family-wise, bootstrap-based resampling corrections that account for non-normal dependency structures.
When performing multiple statistical tests simultaneously, the probability of obtaining at least one false positive (rejecting \(H_0\) when it is actually true) increases with the number of tests. This inflation of Type I error is known as the Multiple Testing Problem.
Mathematical Background of FWER Inflation
For \(m\) independent tests, each run at nominal significance level \(\alpha\) and with every null hypothesis true:
$$ \text{FWER} = P(\text{at least one false positive}) = 1 - (1 - \alpha)^m $$
- If \(m = 1\) and \(\alpha = 0.05\), \(\text{FWER} = 0.05\).
- If \(m = 10\) and \(\alpha = 0.05\), \(\text{FWER} = 1 - 0.95^{10} \approx 0.40\) (a \(40\%\) chance of at least one false positive).
- If \(m = 50\) and \(\alpha = 0.05\), \(\text{FWER} \approx 0.92\) (at least one false positive becomes very likely).
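The closed-form expression above can be checked numerically. The sketch below assumes independent tests with every null true, so each p-value is uniform on \([0, 1]\); the helper names `fwer_independent` and `simulate_fwer` are illustrative and not part of this module.

```python
import random


def fwer_independent(alpha: float, m: int) -> float:
    """Closed-form FWER for m independent tests when every null is true."""
    return 1.0 - (1.0 - alpha) ** m


def simulate_fwer(alpha: float, m: int, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate: under H0 each p-value is Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One "family" of m tests; count it if any test falsely rejects.
        if any(rng.random() < alpha for _ in range(m)):
            hits += 1
    return hits / trials


for m in (1, 10, 50):
    print(m, round(fwer_independent(0.05, m), 3), round(simulate_fwer(0.05, m), 3))
```

The closed-form column reads 0.05, 0.401 and 0.923; the simulated column should agree to within Monte Carlo noise.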
Supported Correction Methodologies
- Bonferroni correction (`"bonferroni"`): Controls the family-wise error rate (FWER) in the strong sense. Each p-value is multiplied by the total number of tests \(m\):
  $$ p^{\text{adj}}_i = \min(m \, p_i,\ 1) $$
  Highly conservative; statistical power is low when \(m\) is large or when the tests are highly correlated.
- Holm-Bonferroni procedure (`"holm"`): A step-down FWER-controlling method that is uniformly at least as powerful as the standard Bonferroni correction. Order the raw p-values as \(p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}\) and compute the adjusted p-values from the smallest upward:
  $$ p^{\text{adj}}_{(i)} = \min\left\{ \max\left( (m - i + 1)\, p_{(i)},\ p^{\text{adj}}_{(i-1)} \right),\ 1 \right\}, \quad i = 1, \dots, m $$
  with \(p^{\text{adj}}_{(0)} = 0\).
- Benjamini-Hochberg (BH) procedure (`"fdr_bh"`): Controls the false discovery rate (FDR), the expected proportion of false positives among all rejections. It is the usual choice in digital product experimentation (A/B tests with multiple secondary metrics) because it typically retains substantially more power than FWER-controlling methods. Order the raw p-values as above and compute the adjusted p-values from the largest downward:
  $$ p^{\text{adj}}_{(i)} = \min\left( \frac{m}{i}\, p_{(i)},\ p^{\text{adj}}_{(i+1)} \right), \quad i = m - 1, \dots, 1 $$
  with \(p^{\text{adj}}_{(m)} = p_{(m)}\), bounded above by \(1\). The FDR guarantee holds for independent or positively dependent (PRDS) test statistics.
- Benjamini-Yekutieli (BY) procedure (`"fdr_by"`): Controls the FDR under arbitrary dependence among test statistics, including negative correlation, where the BH guarantee may not hold. BY applies an additional harmonic-number penalty \(c(m) = \sum_{j=1}^{m} 1/j\): it rejects \(H_{(1)}, \dots, H_{(k)}\), where \(k\) is the largest \(i\) satisfying
  $$ p_{(i)} \le \frac{i}{m\, c(m)}\, \alpha $$
  Equivalently, BY adjusted p-values are the BH adjusted p-values with \(m/i\) replaced by \(m\, c(m)/i\), bounded above by \(1\).
- Hochberg step-up procedure (`"hochberg"`): A step-up FWER-controlling procedure that is uniformly at least as powerful as Holm-Bonferroni, but requires the test statistics to be independent or to satisfy Simes' inequality. It works from the largest p-value down to the smallest:
  $$ p^{\text{adj}}_{(i)} = \min\left( (m - i + 1)\, p_{(i)},\ p^{\text{adj}}_{(i+1)} \right), \quad i = m - 1, \dots, 1 $$
  with \(p^{\text{adj}}_{(m)} = p_{(m)}\).
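The update rules above can be compared side by side with a small pure-Python sketch. It is a hypothetical reference implementation written for illustration, not this module's code (which is documented as delegating to statsmodels); all function names and the `METHODS` table are assumptions, and ties are simply kept in input order.

```python
from typing import Callable, Dict, List, Sequence


def _ascending_order(p: Sequence[float]) -> List[int]:
    """Indices that sort p from smallest to largest (ties keep input order)."""
    return sorted(range(len(p)), key=lambda i: p[i])


def bonferroni(p: Sequence[float]) -> List[float]:
    m = len(p)
    return [min(m * pi, 1.0) for pi in p]


def holm(p: Sequence[float]) -> List[float]:
    # Step-down: running maximum of (m - i + 1) * p_(i), smallest p first.
    m = len(p)
    adj = [0.0] * m
    running = 0.0  # plays the role of p_adj_(0) = 0
    for i, idx in enumerate(_ascending_order(p), start=1):
        running = max((m - i + 1) * p[idx], running)
        adj[idx] = min(running, 1.0)
    return adj


def _step_up(p: Sequence[float], weight: Callable[[int, int], float]) -> List[float]:
    """Running minimum of weight(m, i) * p_(i), largest p first, capped at 1."""
    m = len(p)
    order = _ascending_order(p)
    adj = [0.0] * m
    running = 1.0  # starting at 1 enforces the upper bound
    for i in range(m, 0, -1):
        idx = order[i - 1]
        running = min(weight(m, i) * p[idx], running)
        adj[idx] = running
    return adj


def hochberg(p: Sequence[float]) -> List[float]:
    return _step_up(p, lambda m, i: m - i + 1)


def benjamini_hochberg(p: Sequence[float]) -> List[float]:
    return _step_up(p, lambda m, i: m / i)


def benjamini_yekutieli(p: Sequence[float]) -> List[float]:
    c_m = sum(1.0 / j for j in range(1, len(p) + 1))  # harmonic number c(m)
    return _step_up(p, lambda m, i: c_m * m / i)


# Hypothetical dispatch table keyed by the method names documented above.
METHODS: Dict[str, Callable[[Sequence[float]], List[float]]] = {
    "bonferroni": bonferroni,
    "holm": holm,
    "hochberg": hochberg,
    "fdr_bh": benjamini_hochberg,
    "fdr_by": benjamini_yekutieli,
}

raw = [0.01, 0.04, 0.03, 0.005]
for name, fn in METHODS.items():
    print(name, [round(x, 4) for x in fn(raw)])
```

For these four p-values, Holm gives [0.03, 0.06, 0.06, 0.02] while BH gives [0.02, 0.04, 0.04, 0.02]. At \(\alpha = 0.05\), Bonferroni, Holm and BY each reject two hypotheses, whereas Hochberg and BH reject all four.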
| PARAMETER | DESCRIPTION |
|---|---|
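| `p_values` | List of raw, unadjusted p-values calculated from various metric tests. TYPE: `List[float]` |
| `alpha` | Nominal significance level (e.g., 0.05). Defaults to 0.05. TYPE: `float` |
| `method` | Correction algorithm; the documented options are listed under Supported Correction Methodologies (e.g., `"fdr_bh"`, `"holm"`). Defaults to `"fdr_bh"`. TYPE: `str` |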
| RETURNS | DESCRIPTION |
|---|---|
| `List[float]` | A list of adjusted p-values, in the same index order as the input. |
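The function body is not shown on this page. Since it is documented as delegating to statsmodels, a wrapper along the following lines is one plausible shape, not the actual implementation. The name `correct_p_values_sketch` and the `_METHOD_ALIASES` table are hypothetical. statsmodels registers Hochberg's step-up procedure as `"simes-hochberg"` rather than `"hochberg"`, so the sketch maps the documented name onto it; whether the real function does the same is an assumption.

```python
from typing import List

from statsmodels.stats.multitest import multipletests

# Hypothetical alias table: maps this page's method names to statsmodels keys.
_METHOD_ALIASES = {"hochberg": "simes-hochberg"}


def correct_p_values_sketch(
    p_values: List[float],
    alpha: float = 0.05,
    method: str = "fdr_bh",
) -> List[float]:
    """Return adjusted p-values in the same order as the input."""
    if not p_values:
        return []
    sm_method = _METHOD_ALIASES.get(method, method)
    # multipletests returns (reject, pvals_corrected, alphacSidak, alphacBonf);
    # with returnsorted=False (the default) the output keeps the input order.
    _, pvals_corrected, _, _ = multipletests(p_values, alpha=alpha, method=sm_method)
    return [float(p) for p in pvals_corrected]


print(correct_p_values_sketch([0.01, 0.04, 0.03, 0.005]))
```

Returning plain `float`s rather than the NumPy array matches the documented `List[float]` return type.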