SRM
srm
Sample Ratio Mismatch (SRM) validation using Pearson Chi-Square Goodness-of-Fit tests.
This module provides the diagnostic engine for detecting Sample Ratio Mismatches (SRMs). An SRM is a strong signal that experiment integrity has been compromised (e.g., by assignment skew, tracking failures, or redirection bugs).
| FUNCTION | DESCRIPTION |
|---|---|
| `check_srm` | Calculates the Chi-square p-value to check for Sample Ratio Mismatch (SRM). |
check_srm
Calculates the Chi-square p-value to check for Sample Ratio Mismatch (SRM).
Sample Ratio Mismatch (SRM) is one of the most critical diagnostic flags in web and system experimentation. It indicates that the observed sample allocation counts deviate from the planned allocation ratios. This function performs a Pearson Chi-square goodness-of-fit test to determine whether the observed counts are statistically compatible with the expected ratios.
Mathematical Formulation
Let \(k\) be the number of variants, let \(O_i\) be the observed count of units in variant \(i\) (\(i \in \{1, \dots, k\}\)), and let \(r_i\) be the planned allocation ratio for variant \(i\). The total observed sample size is:

$$ N = \sum_{i=1}^{k} O_i $$

The expected sample count \(E_i\) for variant \(i\) is calculated as:

$$ E_i = N \times \frac{r_i}{\sum_{j=1}^{k} r_j} $$

The Pearson Chi-square test statistic is computed as:

$$ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} $$

Under the null hypothesis \(H_0\) (there is no SRM, and the assignment mechanism is unbiased), the statistic approximately follows a Chi-square distribution:

$$ \chi^2 \sim \chi^2_{k-1} $$

where \(k-1\) is the degrees of freedom of the distribution. The p-value is calculated as:

$$ p = 1 - F_{\chi^2_{k-1}}(\chi^2_{\text{calc}}) $$

where \(F_{\chi^2_{k-1}}\) is the cumulative distribution function of the Chi-square distribution with \(k-1\) degrees of freedom and \(\chi^2_{\text{calc}}\) is the value of the test statistic computed from the observed counts.
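The formulas above can be sketched in plain Python. This is an illustrative sketch, not the module's actual implementation: the names `chi2_sf` and `srm_p_value` are hypothetical, and the survival function \(1 - F\) is evaluated with a standard-library recurrence for the regularized incomplete gamma function that is only intended for small integer degrees of freedom. In practice a statistics library such as SciPy (`scipy.stats.chisquare`) would normally be used instead.

```python
import math
from typing import Sequence


def chi2_sf(x: float, df: int) -> float:
    """Survival function 1 - F(x) of a Chi-square distribution (integer df >= 1)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)             # Q(1, z) = exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))  # Q(1/2, z) = erfc(sqrt(z))
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def srm_p_value(observed_counts: Sequence[float],
                expected_ratios: Sequence[float]) -> float:
    """Hypothetical helper: p-value of the Pearson goodness-of-fit test."""
    if len(observed_counts) != len(expected_ratios) or len(observed_counts) < 2:
        raise ValueError("need matching sequences with at least two variants")
    n = sum(observed_counts)                                    # N
    ratio_total = sum(expected_ratios)                          # sum of r_j
    expected = [n * r / ratio_total for r in expected_ratios]   # E_i
    stat = sum((o - e) ** 2 / e
               for o, e in zip(observed_counts, expected))      # chi-square statistic
    return chi2_sf(stat, len(observed_counts) - 1)              # k - 1 degrees of freedom


if __name__ == "__main__":
    print(srm_p_value([4900, 5100], [0.5, 0.5]))
```

Because the ratios are normalized by their sum, relative weights such as `[1, 1]` and proportions such as `[0.5, 0.5]` give the same expected counts.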
Interpretation Threshold
- If \(p < 0.001\) (the \(0.1\%\) significance level), the null hypothesis of unbiased assignment is rejected. An SRM is highly likely, which points to a telemetry or system bug that can invalidate downstream causal inferences.
- Common causes of SRM: browser-specific treatment crashes, asymmetric page-redirection delays, bot filters interacting with treatment flags, or mid-experiment changes in allocation rates.
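As a worked illustration of the threshold, consider a hypothetical two-variant experiment with planned ratios \(r = (1, 1)\) and \(N = 10{,}000\) units, so that \(E_1 = E_2 = 5000\) and the test has \(k - 1 = 1\) degree of freedom. The counts below are invented for illustration only.

$$ O = (4900, 5100): \quad \chi^2 = \frac{(-100)^2}{5000} + \frac{100^2}{5000} = 4, \quad p \approx 0.046 $$

$$ O = (4800, 5200): \quad \chi^2 = \frac{(-200)^2}{5000} + \frac{200^2}{5000} = 16, \quad p \approx 6.3 \times 10^{-5} $$

The first split would look significant at a conventional \(5\%\) level, but it is not flagged at the much stricter \(0.001\) threshold. The second split falls below \(0.001\) and would be treated as an SRM.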
| PARAMETER | DESCRIPTION |
|---|---|
| `observed_counts` | The actual recorded sample sizes allocated to each variant (e.g., TYPE: |
| `expected_ratios` | The target allocation proportions or relative weights (e.g., TYPE: |
| RETURNS | DESCRIPTION |
|---|---|
| `float` | The calculated p-value of the goodness-of-fit test. |
| RAISES | DESCRIPTION |
|---|---|
| `SRMError` | If the computed p-value is strictly less than 0.001, indicating a severe, non-random mismatch. |
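The return and raise behavior described in the tables can be sketched as follows. This is only an assumption-laden illustration: the `SRMError` class here is a local stand-in (the real class, its base type, and its message are not shown on this page), and `enforce_srm_threshold` and `safe_to_analyse` are hypothetical helper names. The sketch takes an already computed p-value so that it stays independent of how the statistic is obtained.

```python
class SRMError(Exception):
    """Local stand-in for the module's SRMError; the real class may differ."""


SRM_ALPHA = 0.001  # threshold stated in the documentation above


def enforce_srm_threshold(p_value: float, alpha: float = SRM_ALPHA) -> float:
    """Return the p-value unchanged, or raise if it is strictly below alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < alpha:  # strict inequality: p == alpha does not raise
        raise SRMError(f"Sample Ratio Mismatch suspected: p = {p_value:.3g} < {alpha}")
    return p_value


def safe_to_analyse(p_value: float) -> bool:
    """Hypothetical caller-side pattern: treat an SRM as a hard stop."""
    try:
        enforce_srm_threshold(p_value)
    except SRMError:
        return False  # do not trust downstream effect estimates
    return True


if __name__ == "__main__":
    print(safe_to_analyse(0.046))   # above the threshold
    print(safe_to_analyse(6.3e-5))  # below the threshold
```

Raising rather than returning a flag forces callers to handle the mismatch explicitly before any effect estimates are read, which matches the "invalidates downstream causal inferences" framing above.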
Examples:
Example
Source code in src\xpyrment\validate\srm.py