Analyze Module
The xpyrment.analyze module contains the submodules and components used for experiment analysis.
analyze
Experiment analysis, variance reduction, multiple testing corrections, and statistical inference.
This package houses the core statistical analysis engine of xpyrment. It coordinates the calculation of treatment
effects, computes variance-reduced adjusted statistics, corrects for multiple simultaneous comparisons, and
provides a modular statistical inference suite (frequentist, bayesian, sequential, and bootstrap).
Fluent API Integration
To preserve fluent, object-oriented API chaining, this package dynamically registers the
run_analysis method on the main Experiment orchestrator class when it is imported.
| MODULE | DESCRIPTION |
|---|---|
| confounding | Multi-Factor Fractional ANOVA Confounding Resolvers for sparse experimental designs. |
| copula | Non-Gaussian Copula-Based Multi-Metric Inference (Block 24). |
| corrections | Multiple testing correction (MTC) statistical engines. |
| extreme | Extreme Value Theory (EVT) for Heavy-Tailed Conversions (Block 32). |
| inference | Statistical inference engines, frameworks, and decision-making systems. |
| its | Interrupted Time Series (ITS) with HAC Standard Errors (Block 35). |
| markov | Multi-State Markov Transition Journey Modeling (Block 30). |
| meta_regression | Meta-Regression with Knapp-Hartung and Newey-West HAC Standard Errors (Block 38). |
| orchestrator | Experiment analysis orchestrator, results compiler, and setup entrypoints. |
| outliers | Winsorization & Outlier Stabilization (Block 43). |
| ratio | Ratio Metric Delta Method Delta-Variance Estimation (Block 45). |
| registry | Metric Registry & Directed Acyclic Graph (DAG) Evaluator (Block 49). |
| sequential | Group Sequential Lan-DeMets Alpha Spending Functions (Block 36). |
| srm | Sample Ratio Mismatch (SRM) Detection & Sequential Guardrails (Block 41). |
| streaming | Low-latency streaming OLS/Ridge regression via recursive least squares (RLS) and Woodbury updates. |
| variance_reduction | Variance reduction algorithms, focusing on continuous and ratio-level CUPED. |
| CLASS | DESCRIPTION |
|---|---|
| AnalysisResult | Holds results from an experiment analysis and provides summary formatting and plotting interfaces. |
| StreamingOLS | Streaming Ordinary Least Squares and Ridge Regression. |
| AliasResolver | Computes and resolves the alias structure in fractional factorial designs. |
| CopulaMultiMetricInference | Joint inference engine for correlated non-Gaussian metrics using empirical Gaussian Copulas. |
| MarkovJourneyAnalyzer | Analyzes chronological user state transitions in experimentation datasets. |
| ExtremeValueTailEstimator | Estimates tail indices and expected extreme lift using Generalized Pareto Distributions. |
| InterruptedTimeSeries | Models segmented regression over system-wide updates with Newey-West HAC standard errors. |
| GroupSequentialMonitor | Computes O'Brien-Fleming and Pocock sequential boundaries via Lan-DeMets spending functions. |
| MetaRegressor | Solves random-effects meta-regression models with Knapp-Hartung or Newey-West HAC standard errors. |
| SampleRatioMismatchDetector | Detects Sample Ratio Mismatch (SRM) using frequentist retrospective and online sequential tests. |
| WinsorizationEngine | Provides symmetric or asymmetric percentile-based capping boundaries. |
| RatioMetricDeltaMethod | Estimates ratio values, delta-variance bounds, and Wald treatment comparisons. |
| MetricRegistry | Manages a registry of metrics and evaluates them using topological sorting of a DAG. |
| FUNCTION | DESCRIPTION |
|---|---|
| run_analysis | Executes the statistical analysis across all registered metrics in an Experiment container. |
| setup | Initializes the experimental setup container, serving as the library's primary entrypoint. |
| apply_cuped | Applies Controlled-experiments Using Pre-Experiment Data (CUPED) on a series. |
| apply_multiple_testing_correction | Applies multiple testing corrections on p-values using statsmodels. |
AnalysisResult
AnalysisResult(
raw_results: List[dict],
alpha: float = 0.05,
balance_checker: Optional[Any] = None,
)
Holds results from an experiment analysis and provides summary formatting and plotting interfaces.
This container aggregates the individual metric dictionaries calculated across control and treatment groups. It provides high-level APIs to compile clean summary tables and forward coordinates to the visualization engine.
| ATTRIBUTE | DESCRIPTION |
|---|---|
raw_results |
A list of metric calculation result dictionaries (keys: mean, lift, p_value, etc.).
TYPE:
|
alpha |
Nominal significance level (Type I error rate) used in the analysis. Defaults to 0.05.
TYPE:
|
df_raw |
The raw, unformatted results compiled into a pandas DataFrame.
TYPE:
|
balance_checker |
Fitted balance checker object if covariates were present.
TYPE:
|
| PARAMETER | DESCRIPTION |
|---|---|
raw_results
|
Raw list of metric results.
TYPE:
|
alpha
|
Nominal significance level used.
TYPE:
|
balance_checker
|
Fitted CovariateBalanceChecker if covariates were specified.
TYPE:
|
| METHOD | DESCRIPTION |
|---|---|
love_plot |
Returns the ASCII Love Plot visualization for baseline covariate balance. |
to_dict |
Converts the complete analysis results and metadata to a robust, serializable dictionary. |
to_json |
Converts the analysis results into a standardized, portable JSON string. |
summary |
Returns a summarized, human-readable DataFrame of the analysis. |
plot |
Generates and returns a forest plot of the relative metric lifts and confidence intervals. |
Source code in src\xpyrment\analyze\orchestrator.py
love_plot
Returns the ASCII Love Plot visualization for baseline covariate balance.
| RETURNS | DESCRIPTION |
|---|---|
str
|
An ASCII text-based representation or message.
TYPE:
|
Source code in src\xpyrment\analyze\orchestrator.py
to_dict
Converts the complete analysis results and metadata to a robust, serializable dictionary.
Includes significance thresholds (alpha), raw metric outcomes, and covariate balance diagnostics if present.
| RETURNS | DESCRIPTION |
|---|---|
dict
|
A nested dictionary with native Python types, guaranteed to be JSON serializable.
TYPE:
|
Source code in src\xpyrment\analyze\orchestrator.py
to_json
Converts the analysis results into a standardized, portable JSON string.
| PARAMETER | DESCRIPTION |
|---|---|
indent
|
If provided, formats the JSON string with this indentation level.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Standardized JSON representation of the analysis results.
TYPE:
|
Source code in src\xpyrment\analyze\orchestrator.py
summary
Returns a summarized, human-readable DataFrame of the analysis.
Formats raw numeric statistics (standard errors, differences, variances) into readable percentage lifts, relative confidence intervals, power indicators, and significance star symbols.
Significance Star Mapping
- ***: \(p < 0.001\) (Highly significant)
- **: \(p < 0.01\) (Significant)
- *: \(p < 0.05\) (Significant)
- No star: \(p \ge 0.05\) (Not statistically significant at the nominal level \(\alpha=0.05\))
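The star mapping above can be expressed as a small lookup. This is a minimal sketch; `significance_stars` is a hypothetical helper name, not a function documented for this package.

```python
def significance_stars(p_value: float) -> str:
    """Map a p-value to the star notation listed above."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # no star: not significant at alpha = 0.05


for p in (0.0004, 0.004, 0.02, 0.2):
    print(f"p={p}: '{significance_stars(p)}'")
```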
| PARAMETER | DESCRIPTION |
|---|---|
formatted
|
If True, returns nicely formatted strings for display (with percentage symbols, stars, and bracketed intervals). If False, returns the raw numeric values. Defaults to True.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame
|
pd.DataFrame: A pandas DataFrame containing binned summaries of each analyzed metric. |
Source code in src\xpyrment\analyze\orchestrator.py
plot
Generates and returns a forest plot of the relative metric lifts and confidence intervals.
Forwards coordinates to the visualization module.
| PARAMETER | DESCRIPTION |
|---|---|
**kwargs
|
Plot customization arguments forwarded to
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Any
|
matplotlib.axes.Axes or plotly.graph_objects.Figure: The generated relative lift forest plot. |
Source code in src\xpyrment\analyze\orchestrator.py
StreamingOLS
Streaming Ordinary Least Squares and Ridge Regression.
Maintains the running coefficients (beta) and the inverse of the covariance matrix using the Sherman-Morrison formula, the rank-1 case of the Woodbury matrix identity. Each update costs O(p^2) time per sample, where p is the number of coefficients, rather than the O(p^3) cost of recomputing the inverse offline.
| PARAMETER | DESCRIPTION |
|---|---|
n_features
|
Number of independent features (excluding bias).
TYPE:
|
l2_penalty
|
Ridge L2 regularization multiplier (lambda). Defaults to 1.0.
TYPE:
|
fit_intercept
|
If True, automatically appends a bias term. Defaults to True.
TYPE:
|
| METHOD | DESCRIPTION |
|---|---|
update |
Performs a single-sample recursive least squares update. |
update_batch |
Updates the running model with a batch of observations. |
predict |
Predicts the target value using the active coefficient weights. |
| ATTRIBUTE | DESCRIPTION |
|---|---|
coefficients |
Returns the current estimated regression coefficients.
TYPE:
|
Source code in src\xpyrment\analyze\streaming.py
update
Performs a single-sample recursive least squares update.
Updates the running coefficient vector (beta) and the covariance inverse (P) in O(p^2) time, where p is the number of coefficients.
| PARAMETER | DESCRIPTION |
|---|---|
x
|
1D array of features of shape (n_features,).
TYPE:
|
y
|
Numeric target value.
TYPE:
|
Source code in src\xpyrment\analyze\streaming.py
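The rank-1 update described above can be sketched in a few lines of NumPy. This is an illustration of recursive least squares with a ridge prior, not the library's implementation: `TinyRLS` is a hypothetical stand-in class, and intercept handling is omitted for brevity.

```python
import numpy as np


class TinyRLS:
    """Hypothetical stand-in for a streaming ridge regressor."""

    def __init__(self, n_features: int, l2_penalty: float = 1.0):
        self.beta = np.zeros(n_features)
        # Inverse of the regularized Gram matrix; equals (lambda * I)^{-1} before any data.
        self.P = np.eye(n_features) / l2_penalty

    def update(self, x: np.ndarray, y: float) -> None:
        Px = self.P @ x                           # O(p^2) matrix-vector product
        gain = Px / (1.0 + x @ Px)                # Sherman-Morrison gain vector
        self.beta = self.beta + gain * (y - x @ self.beta)
        self.P = self.P - np.outer(gain, Px)      # rank-1 correction of the inverse


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

model = TinyRLS(n_features=3, l2_penalty=1.0)
for x_row, y_val in zip(X, y):
    model.update(x_row, y_val)

# Batch ridge solution for comparison: (X^T X + lambda * I)^{-1} X^T y
beta_batch = np.linalg.solve(X.T @ X + np.eye(3), X.T @ y)
print(np.allclose(model.beta, beta_batch))
```

Starting from P = I / lambda and beta = 0, the streamed coefficients agree with the batch ridge solution up to floating-point error, which makes the comparison a convenient sanity check.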
update_batch
Updates the running model with a batch of observations.
Loops through each row to perform recursive rank-1 updates.
| PARAMETER | DESCRIPTION |
|---|---|
X
|
Feature matrix of shape (M, n_features).
TYPE:
|
y
|
Target outcomes of shape (M,).
TYPE:
|
Source code in src\xpyrment\analyze\streaming.py
predict
Predicts the target value using the active coefficient weights.
| PARAMETER | DESCRIPTION |
|---|---|
X
|
Feature matrix of shape (M, n_features).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
ndarray
|
np.ndarray: Vector of predictions of shape (M,). |
Source code in src\xpyrment\analyze\streaming.py
AliasResolver
Computes and resolves the alias structure in fractional factorial designs.
TODO: Implement sequential D-optimal design updates that minimize the trace of the alias matrix projection error
covariance, Cov(beta_1_true) = sigma^2 (X_1^T X_1)^{-1} + A Cov(beta_2) A^T.
| PARAMETER | DESCRIPTION |
|---|---|
primary_cols
|
Column names representing primary effects of interest (e.g., main effects).
TYPE:
|
potential_confounding_cols
|
Column names representing potential higher-order confounding effects (e.g., 2-way or 3-way interactions).
TYPE:
|
| METHOD | DESCRIPTION |
|---|---|
compute_alias_matrix |
Computes the alias matrix: A = (X1^T X1 + lambda * I)^{-1} X1^T X2. |
get_alias_report |
Generates a structured report detailing which primary effects are aliased with which confounding effects. |
resolve_coefficients |
Resolves/decouples true primary coefficients given biased estimates and prior confounding parameters. |
Source code in src\xpyrment\analyze\confounding.py
compute_alias_matrix
Computes the alias matrix: A = (X1^T X1 + lambda * I)^{-1} X1^T X2.
| PARAMETER | DESCRIPTION |
|---|---|
df
|
Input design or experimental data containing all specified columns.
TYPE:
|
l2_penalty
|
Regularization parameter for the inverse computation to ensure numerical stability. Defaults to 1e-6.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
ndarray
|
np.ndarray: The alias matrix A of shape (len(primary_cols) + 1, len(potential_confounding_cols)). The first row corresponds to the unpenalized intercept/bias term. |
Source code in src\xpyrment\analyze\confounding.py
get_alias_report
Generates a structured report detailing which primary effects are aliased with which confounding effects.
| PARAMETER | DESCRIPTION |
|---|---|
df
|
Input design or experimental data.
TYPE:
|
threshold
|
Only reports alias coefficients whose absolute value exceeds this threshold. Defaults to 1e-3.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Dict[str, List[Tuple[str, float]]]
|
Dict[str, List[Tuple[str, float]]]: A dictionary mapping primary effect names (and 'intercept') to lists of (confounding_col_name, alias_coefficient) tuples. |
Source code in src\xpyrment\analyze\confounding.py
resolve_coefficients
Resolves/decouples true primary coefficients given biased estimates and prior confounding parameters.
Uses the relationship: beta1_true = beta1_biased - A * beta2_prior
| PARAMETER | DESCRIPTION |
|---|---|
beta1_biased
|
Biased primary coefficients (including intercept) of shape (len(primary_cols) + 1,).
TYPE:
|
beta2_prior
|
Prior known or estimated higher-order confounding coefficients of shape (len(potential_confounding_cols),).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
ndarray
|
np.ndarray: Cleaned/debiased primary coefficients of shape (len(primary_cols) + 1,). |
Source code in src\xpyrment\analyze\confounding.py
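To make the alias matrix and the debiasing relation above concrete, here is a small NumPy sketch on a half-fraction 2^(3-1) design with generator C = A*B. It assumes an unpenalized intercept as described for compute_alias_matrix; the design, the coefficient values, and all variable names are illustrative and not taken from the library.

```python
import numpy as np

# Half-fraction 2^(3-1) design with generator C = A * B.
A = np.array([-1.0, 1.0, -1.0, 1.0])
B = np.array([-1.0, -1.0, 1.0, 1.0])
C = A * B

X1 = np.column_stack([np.ones(4), A, B, C])     # intercept + main effects
X2 = np.column_stack([A * B, A * C, B * C])     # two-way interactions AB, AC, BC

l2_penalty = 1e-6
penalty = l2_penalty * np.eye(X1.shape[1])
penalty[0, 0] = 0.0                             # leave the intercept unpenalized

# Alias matrix: (X1^T X1 + lambda * I)^{-1} X1^T X2
alias = np.linalg.solve(X1.T @ X1 + penalty, X1.T @ X2)
print(np.round(alias, 3))

# Debias: beta1_true = beta1_biased - alias @ beta2_prior
beta1_biased = np.array([10.0, 2.0, 1.0, 3.5])  # illustrative fitted values
beta2_prior = np.array([0.5, 0.0, 0.0])         # assumed AB interaction of 0.5
beta1_true = beta1_biased - alias @ beta2_prior
print(np.round(beta1_true, 3))
```

In this design each main effect is fully aliased with one interaction (A with BC, B with AC, C with AB), so the assumed AB effect is subtracted entirely from the C coefficient.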
CopulaMultiMetricInference
Joint inference engine for correlated non-Gaussian metrics using empirical Gaussian Copulas.
Estimates univariate empirical cumulative distribution functions (CDFs) to project marginal observations onto uniform space [0, 1], maps to standard normal space, estimates the latent correlation structure, and performs a joint Wald test.
TODO: Support parametric copula families (such as Clayton or Gumbel) to capture asymmetric tail dependencies.
TODO: Add multivariate p-value corrections (e.g. step-down procedures) to control Family-Wise Error Rate (FWER) under copula.
| PARAMETER | DESCRIPTION |
|---|---|
l2_penalty
|
Regularization for covariance matrix inversion. Defaults to 1e-6.
TYPE:
|
| METHOD | DESCRIPTION |
|---|---|
fit_copula |
Fits empirical marginals and constructs the latent copula correlation matrix. |
test_joint_shift |
Performs a joint Wald test of treatment effects across all correlated metrics. |
Source code in src\xpyrment\analyze\copula.py
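The marginal-to-latent mapping described above (empirical CDF, then uniform scores, then standard normal scores, then a correlation estimate) can be sketched as follows. This is not the library's code: the helper names are hypothetical, ties are assumed absent, and ranks are divided by n + 1 (an assumption) so the uniform scores stay strictly inside (0, 1).

```python
from statistics import NormalDist

import numpy as np


def normal_scores(column: np.ndarray) -> np.ndarray:
    """Empirical CDF -> uniform (0, 1) -> standard normal scores."""
    n = len(column)
    ranks = np.argsort(np.argsort(column)) + 1   # ranks 1..n, assumes no ties
    u = ranks / (n + 1)                          # strictly inside (0, 1)
    std_normal = NormalDist()
    return np.array([std_normal.inv_cdf(p) for p in u])


def latent_correlation(data: np.ndarray) -> np.ndarray:
    """Correlation matrix of the normal scores, one column per metric."""
    z = np.column_stack([normal_scores(data[:, j]) for j in range(data.shape[1])])
    return np.corrcoef(z, rowvar=False)


rng = np.random.default_rng(1)
x = rng.normal(size=500)
skewed = np.exp(x)                               # monotone but heavily skewed transform
corr = latent_correlation(np.column_stack([x, skewed]))
print(np.round(corr, 3))
```

Because the scores depend only on ranks, a monotone transform of a metric leaves the latent correlation unchanged, which is the property that makes the approach suitable for non-Gaussian marginals.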
fit_copula
fit_copula(
df: DataFrame, metric_cols: List[str]
) -> CopulaMultiMetricInference
Fits empirical marginals and constructs the latent copula correlation matrix.
| PARAMETER | DESCRIPTION |
|---|---|
df
|
DataFrame containing experimental metric columns.
TYPE:
|
metric_cols
|
List of metric column names to include in the joint copula.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
CopulaMultiMetricInference
|
Fitted engine. |
Source code in src\xpyrment\analyze\copula.py
test_joint_shift
test_joint_shift(
df: DataFrame,
treatment_col: str,
metric_cols: List[str],
) -> Dict[str, Union[float, ndarray]]
Performs a joint Wald test of treatment effects across all correlated metrics.
H0: delta_1 = delta_2 = ... = delta_D = 0
| PARAMETER | DESCRIPTION |
|---|---|
df
|
Experimental dataset.
TYPE:
|
treatment_col
|
Column indicating binary treatment assignment (0 = control, 1 = treated).
TYPE:
|
metric_cols
|
Correlated metric column names.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Dict
|
Dictionary containing joint Wald statistic, joint p-value, individual lifts, and covariance.
TYPE:
|
Source code in src\xpyrment\analyze\copula.py
MarkovJourneyAnalyzer
Analyzes chronological user state transitions in experimentation datasets.
Extracts transition probability matrices, stationary distributions (long-term states), and executes Chi-squared transition homogeneity tests.
TODO: Add continuous-time Markov intensity matrix (Q) estimations to model exact duration stay times within states.
TODO: Implement bootstrap confidence interval approximations for stationary distribution probability shifts.
| PARAMETER | DESCRIPTION |
|---|---|
states
|
Unique ordered labels of possible states in the funnel.
TYPE:
|
| METHOD | DESCRIPTION |
|---|---|
extract_transitions |
Helper to extract chronological pairwise transitions from a DataFrame grouped by user. |
compute_transition_matrix |
Computes transition count and probability matrices. |
compute_stationary_distribution |
Computes the stationary distribution pi of the Markov transition matrix via power iteration. |
test_transition_homogeneity |
Runs a Chi-squared homogeneity test of transition matrices between Control and Treatment. |
Source code in src\xpyrment\analyze\markov.py
extract_transitions
Helper to extract chronological pairwise transitions from a DataFrame grouped by user.
| PARAMETER | DESCRIPTION |
|---|---|
df
|
DataFrame containing chronological sequence of states per user.
TYPE:
|
user_col
|
Column indicating user ID.
TYPE:
|
state_col
|
Column indicating user state.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
List[Tuple[str, str]]
|
List[Tuple[str, str]]: Chronological transitions. |
Source code in src\xpyrment\analyze\markov.py
compute_transition_matrix
Computes transition count and probability matrices.
| PARAMETER | DESCRIPTION |
|---|---|
transitions
|
List of state-to-state transitions.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Tuple[ndarray, ndarray]
|
Tuple[np.ndarray, np.ndarray]: Transition count matrix (S, S) and probability matrix (S, S). |
Source code in src\xpyrment\analyze\markov.py
compute_stationary_distribution
compute_stationary_distribution(
probs: ndarray, max_iters: int = 150, tol: float = 1e-07
) -> ndarray
Computes the stationary distribution pi of the Markov transition matrix via power iteration.
Solves pi * P = pi subject to sum(pi) = 1.0.
Source code in src\xpyrment\analyze\markov.py
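As an illustration of the two steps above, the sketch below builds count and probability matrices from a transition list and then runs power iteration for pi * P = pi. The function names and the two-state example are hypothetical; only the max_iters and tol defaults mirror the signature shown above.

```python
import numpy as np


def transition_matrices(transitions, states):
    """Return (counts, probs), each of shape (S, S)."""
    index = {s: i for i, s in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    for src, dst in transitions:
        counts[index[src], index[dst]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed exits stay all-zero instead of dividing by zero.
    probs = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    return counts, probs


def stationary_distribution(probs, max_iters=150, tol=1e-7):
    """Power iteration: repeatedly apply pi <- pi @ P until it stops changing."""
    pi = np.full(probs.shape[0], 1.0 / probs.shape[0])
    for _ in range(max_iters):
        nxt = pi @ probs
        nxt = nxt / nxt.sum()                    # keep sum(pi) = 1
        if np.abs(nxt - pi).max() < tol:
            return nxt
        pi = nxt
    return pi


transitions = (
    [("browse", "browse")] * 9 + [("browse", "cart")]
    + [("cart", "browse"), ("cart", "cart")]
)
counts, probs = transition_matrices(transitions, states=["browse", "cart"])
pi = stationary_distribution(probs)
print(probs)
print(np.round(pi, 4))
```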
test_transition_homogeneity
test_transition_homogeneity(
control_transitions: List[Tuple[str, str]],
treatment_transitions: List[Tuple[str, str]],
) -> Dict[str, Union[float, ndarray]]
Runs a Chi-squared homogeneity test of transition matrices between Control and Treatment.
H0: P^(C) = P^(T) (transition probabilities are identical across groups)
Source code in src\xpyrment\analyze\markov.py
ExtremeValueTailEstimator
Estimates tail indices and expected extreme lift using Generalized Pareto Distributions.
TODO: Add a profile likelihood fallback optimizer to support shape parameter estimation when xi is outside the MOM boundary [0, 0.5].
TODO: Implement automated threshold choice heuristics (e.g., using Hill plots or Gerstengarbe's sequential tests).
| PARAMETER | DESCRIPTION |
|---|---|
percentile
|
Percentile threshold to define the tail cutoff. Defaults to 0.95.
TYPE:
|
| METHOD | DESCRIPTION |
|---|---|
fit |
Fits the GPD parameters on outcomes exceeding the pre-specified percentile. |
expected_shortfall |
Calculates Expected Shortfall (conditional tail expectation) for fitted GPD. |
| ATTRIBUTE | DESCRIPTION |
|---|---|
metrics |
Returns GPD parameters and tail indices.
TYPE:
|
Source code in src\xpyrment\analyze\extreme.py
fit
fit(outcomes: ndarray) -> ExtremeValueTailEstimator
Fits the GPD parameters on outcomes exceeding the pre-specified percentile.
Uses the Method of Moments (MOM) for GPD estimation (valid for shape xi < 0.5):
sigma_MOM = 1/2 * mean_exceed * (1 + mean_exceed^2 / var_exceed)
xi_MOM = 1/2 * (1 - mean_exceed^2 / var_exceed)
Source code in src\xpyrment\analyze\extreme.py
expected_shortfall
Calculates Expected Shortfall (conditional tail expectation) for fitted GPD.
ES_u = u + (sigma + xi * (VaR - u)) / (1 - xi)
At the threshold itself (VaR = u), ES is simply:
ES = u + sigma / (1 - xi)
Source code in src\xpyrment\analyze\extreme.py
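The moment estimators and the threshold-level Expected Shortfall above can be tried on simulated data. The sketch below is a standalone illustration under stated assumptions: the function names are hypothetical, the sample variance uses ddof=1, and the data are exponential, for which the true GPD shape is xi = 0 and the scale equals the exponential scale.

```python
import numpy as np


def gpd_mom_fit(outcomes: np.ndarray, percentile: float = 0.95):
    """Method-of-moments GPD fit on exceedances over the percentile threshold."""
    u = float(np.quantile(outcomes, percentile))
    exceed = outcomes[outcomes > u] - u
    mean_exceed = exceed.mean()
    var_exceed = exceed.var(ddof=1)
    ratio = mean_exceed**2 / var_exceed
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * mean_exceed * (1.0 + ratio)
    return u, float(sigma), float(xi)


def expected_shortfall_at_threshold(u: float, sigma: float, xi: float) -> float:
    """ES = u + sigma / (1 - xi); only finite for xi < 1."""
    if xi >= 1.0:
        raise ValueError("expected shortfall is undefined for xi >= 1")
    return u + sigma / (1.0 - xi)


rng = np.random.default_rng(42)
outcomes = rng.exponential(scale=2.0, size=200_000)

u, sigma, xi = gpd_mom_fit(outcomes)
es = expected_shortfall_at_threshold(u, sigma, xi)
print(f"u={u:.3f} sigma={sigma:.3f} xi={xi:.3f} ES={es:.3f}")
```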
InterruptedTimeSeries
Models segmented regression over system-wide updates with Newey-West HAC standard errors.
TODO: Support autoregressive integrated moving average (ARIMA) error components integrated with the segmented OLS model.
TODO: Add dynamic lag order selection using AIC/BIC information criteria to optimize the Newey-West HAC spectral bandwidth.
| PARAMETER | DESCRIPTION |
|---|---|
treatment_index
|
Step/index where the policy intervention was activated.
TYPE:
|
| METHOD | DESCRIPTION |
|---|---|
fit |
Fits the interrupted time series model. |
| ATTRIBUTE | DESCRIPTION |
|---|---|
results |
Returns coefficient summaries, standard errors, and p-values.
TYPE:
|
Source code in src\xpyrment\analyze\its.py
results
property
Returns coefficient summaries, standard errors, and p-values.
fit
fit(outcomes: ndarray) -> InterruptedTimeSeries
Fits the interrupted time series model.
| PARAMETER | DESCRIPTION |
|---|---|
outcomes
|
Chronological series of outcomes.
TYPE:
|
Source code in src\xpyrment\analyze\its.py
GroupSequentialMonitor
Computes O'Brien-Fleming and Pocock sequential boundaries via Lan-DeMets spending functions.
TODO: Implement exact multivariate normal integration (e.g. using Genz-Bretz algorithms) to solve multi-look joint covariance critical bounds.
TODO: Support binding and non-binding futility boundaries using beta-spending formulations to support early stopping for futility.
| PARAMETER | DESCRIPTION |
|---|---|
alpha
|
Total Type I error budget. Defaults to 0.05.
TYPE:
|
spending_type
|
'obrien_fleming' or 'pocock' type spending.
TYPE:
|
| METHOD | DESCRIPTION |
|---|---|
alpha_spent |
Computes cumulative alpha spent at information fraction t in [0, 1]. |
compute_boundaries |
Computes sequential critical boundary values Z_crit and p-value boundaries for each look. |
Source code in src\xpyrment\analyze\sequential.py
alpha_spent
Computes cumulative alpha spent at information fraction t in [0, 1].
Source code in src\xpyrment\analyze\sequential.py
compute_boundaries
Computes sequential critical boundary values Z_crit and p-value boundaries for each look.
Using the increments of cumulative alpha spent, the critical z-score at look k satisfies:
P(Reject at look k | No previous rejections) = alpha(t_k) - alpha(t_{k-1})
Using a standard sequential spending boundary approximation:
Z_crit(k) = Phi^-1(1 - (alpha(t_k) - alpha(t_{k-1})) / 2)
Source code in src\xpyrment\analyze\sequential.py
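The boundary approximation above can be reproduced with the standard library alone. The spending functions below are the usual Lan-DeMets forms (O'Brien-Fleming type: 2 - 2 * Phi(z_{alpha/2} / sqrt(t)); Pocock type: alpha * ln(1 + (e - 1) * t)); whether the library uses exactly these expressions is an assumption, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def alpha_spent(t: float, alpha: float = 0.05, spending_type: str = "obrien_fleming") -> float:
    """Cumulative alpha spent at information fraction t in (0, 1]."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction must lie in (0, 1]")
    if spending_type == "obrien_fleming":
        z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
        return 2.0 - 2.0 * _STD_NORMAL.cdf(z / math.sqrt(t))
    if spending_type == "pocock":
        return alpha * math.log(1.0 + (math.e - 1.0) * t)
    raise ValueError(f"unknown spending_type: {spending_type!r}")


def approximate_boundaries(fractions, alpha=0.05, spending_type="obrien_fleming"):
    """Z_crit(k) = Phi^-1(1 - (alpha(t_k) - alpha(t_{k-1})) / 2) for each look."""
    z_crit, previous = [], 0.0
    for t in fractions:
        spent = alpha_spent(t, alpha, spending_type)
        z_crit.append(_STD_NORMAL.inv_cdf(1.0 - (spent - previous) / 2.0))
        previous = spent
    return z_crit


looks = [0.25, 0.5, 0.75, 1.0]
for t, z in zip(looks, approximate_boundaries(looks)):
    print(f"t={t:.2f}  Z_crit={z:.3f}")
```

Both spending functions exhaust the full budget at t = 1, and the O'Brien-Fleming type boundary starts very strict and relaxes at later looks.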
MetaRegressor
Solves random-effects meta-regression models with Knapp-Hartung or Newey-West HAC standard errors.
Mathematical Specifications of Meta-Regression
Meta-regression aggregates study-level treatment effect estimates across various trials and cohorts.
Let \(y_j\) be the estimated treatment effect of study \(j \in \{1, \dots, J\}\), let \(v_j\) be the within-study variance, and let \(X_j\) be a \(P\)-dimensional vector of study-level covariates. The random-effects meta-regression model is:
$$ y_j = X_j \beta + u_j + \epsilon_j $$
where \(u_j \sim N(0, \tau^2)\) is the between-study random effect, and \(\epsilon_j \sim N(0, v_j)\) is the within-study error.
- DerSimonian-Laird Heterogeneity Variance Estimation: The between-study variance parameter \(\tau^2\) is estimated using a closed-form method of moments:
$$ \tau^2 = \max\left(0, \frac{Q - (J - P)}{\sum_{j=1}^J w_{FE, j} - \text{tr}((X^T W_{FE} X)^{-1} X^T W_{FE}^2 X)}\right) $$
where \(Q = \sum_{j=1}^J (y_j - X_j \hat{\beta}_{FE})^2 / v_j\) is Cochran's Q statistic, and \(W_{FE} = \text{diag}(1/v_j)\) represents fixed-effect weights.
- Coefficients WLS Solver: With the estimated \(\tau^2\), the final random-effects weights are \(w_j = 1 / (v_j + \tau^2)\). The coefficients are solved via:
$$ \hat{\beta} = (X^T W X)^{-1} X^T W y $$
where \(W = \text{diag}(w_j)\).
- Knapp-Hartung (KH) Covariance Adjustment: Standard random-effects models can underestimate standard errors when \(J\) is small. KH adjusts the variance-covariance matrix:
$$ \text{Cov}(\hat{\beta})_{KH} = q \cdot (X^T W X)^{-1} $$
where \(q\) is a scaling multiplier based on the weighted residual sum of squares:
$$ q = \max\left(1.0, \frac{\sum_{j=1}^J w_j (y_j - X_j \hat{\beta})^2}{J - P}\right) $$
- Newey-West HAC Covariance Correction: If the studies are ordered chronologically and exhibit serial correlation, Newey-West HAC standard errors can be computed. The scores (estimating function contributions) for each study \(j\) are:
$$ g_j = w_j e_j X_j^T $$
where \(e_j = y_j - X_j \hat{\beta}\) is the study-level residual. The Heteroskedasticity and Autocorrelation Consistent (HAC) long-run covariance of these scores is estimated with the Bartlett kernel:
$$ \Omega_{HAC} = \Gamma_0 + \sum_{l=1}^L \left(1 - \frac{l}{L+1}\right) (\Gamma_l + \Gamma_l^T) $$
where \(\Gamma_l = \sum_{j=l+1}^J g_j g_{j-l}^T\). The final sandwich covariance estimator is:
$$ \text{Cov}(\hat{\beta})_{HAC} = (X^T W X)^{-1} \Omega_{HAC} (X^T W X)^{-1} $$
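The first three steps above (DerSimonian-Laird \(\tau^2\), the random-effects WLS solve, and the Knapp-Hartung scaling) translate almost line by line into NumPy. The sketch below is an illustration under stated assumptions, not the library's implementation: the function name and the study data are made up, the ridge penalty is omitted, and the HAC branch is not shown.

```python
import numpy as np


def fit_meta_regression(y: np.ndarray, v: np.ndarray, X: np.ndarray):
    """Return (beta, tau2, cov_kh) for a random-effects meta-regression."""
    J, P = X.shape

    # Fixed-effect fit and Cochran's Q.
    w_fe = 1.0 / v
    XtWX_fe = X.T @ (w_fe[:, None] * X)
    beta_fe = np.linalg.solve(XtWX_fe, X.T @ (w_fe * y))
    Q = np.sum((y - X @ beta_fe) ** 2 / v)

    # DerSimonian-Laird method-of-moments estimate of tau^2.
    XtW2X = X.T @ ((w_fe**2)[:, None] * X)
    denom = w_fe.sum() - np.trace(np.linalg.solve(XtWX_fe, XtW2X))
    tau2 = max(0.0, float((Q - (J - P)) / denom))

    # Random-effects WLS solve.
    w = 1.0 / (v + tau2)
    XtWX = X.T @ (w[:, None] * X)
    beta = np.linalg.solve(XtWX, X.T @ (w * y))

    # Knapp-Hartung scaling of the covariance matrix.
    resid = y - X @ beta
    q = max(1.0, float(np.sum(w * resid**2) / (J - P)))
    cov_kh = q * np.linalg.inv(XtWX)
    return beta, tau2, cov_kh


# Six illustrative studies with an intercept and one study-level covariate.
X = np.column_stack([np.ones(6), np.arange(6.0)])
y = np.array([1.1, 1.4, 2.3, 2.4, 3.2, 3.4])
v = np.array([0.04, 0.09, 0.05, 0.08, 0.06, 0.07])

beta, tau2, cov_kh = fit_meta_regression(y, v, X)
print("beta:", np.round(beta, 3), "tau2:", round(tau2, 4))
print("KH standard errors:", np.round(np.sqrt(np.diag(cov_kh)), 3))
```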
| PARAMETER | DESCRIPTION |
|---|---|
l2_penalty
|
Ridge penalty for numerical stability. Defaults to 1e-5.
TYPE:
|
| METHOD | DESCRIPTION |
|---|---|
fit |
Fits the random-effects meta-regression model. |
| ATTRIBUTE | DESCRIPTION |
|---|---|
results |
Returns fitted parameters, tau^2, and coefficient stats.
TYPE:
|
Source code in src\xpyrment\analyze\meta_regression.py
results
property
Returns fitted parameters, tau^2, and coefficient stats.
fit
fit(
outcomes: ndarray,
variances: ndarray,
covariates: ndarray,
cov_type: str = "classic",
hac_lag: Optional[int] = None,
) -> MetaRegressor
Fits the random-effects meta-regression model.
| PARAMETER | DESCRIPTION |
|---|---|
outcomes
|
Study-level estimated treatment effects of shape (J,).
TYPE:
|
variances
|
Study-level within-study variances (SE^2) of shape (J,).
TYPE:
|
covariates
|
Covariate matrix of shape (J, P).
TYPE:
|
cov_type
|
Covariance/standard error adjustment type (
TYPE:
|
hac_lag
|
Spectral lag bandwidth parameter (\(L\)). If None, computed automatically. Defaults to None.
TYPE:
|
Source code in src\xpyrment\analyze\meta_regression.py
SampleRatioMismatchDetector
Detects Sample Ratio Mismatch (SRM) using frequentist retrospective and online sequential tests.
TODO: Implement sequential binomial SPRT variance boundaries to handle dynamic sample-rate drift under multi-arm scenarios.
TODO: Add Monte Carlo power estimation diagnostics for retrospective sample size mismatch sensitivity.
| PARAMETER | DESCRIPTION |
|---|---|
target_treatment_ratio
|
Targeted assignment fraction for the treatment group. Defaults to 0.5.
TYPE:
|
| METHOD | DESCRIPTION |
|---|---|
test_retrospective |
Performs a retrospective Pearson Chi-Squared goodness-of-fit test. |
test_sequential |
Performs Wald's Sequential Probability Ratio Test (SPRT) over binomial allocations. |
Source code in src\xpyrment\analyze\srm.py
test_retrospective
Performs a retrospective Pearson Chi-Squared goodness-of-fit test.
| PARAMETER | DESCRIPTION |
|---|---|
observed_counts
|
A tuple of (observed_control, observed_treatment).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Dict[str, Union[float, bool]]
|
Dict[str, Union[float, bool]]: SRM test statistics, p-value, and flag indicating mismatch. |
Source code in src\xpyrment\analyze\srm.py
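For a two-arm experiment the retrospective check reduces to a one-degree-of-freedom Pearson chi-squared test, which needs only the standard library. The sketch below is illustrative: the function name, the result keys, and the 0.001 flagging threshold are assumptions rather than documented library behavior.

```python
import math


def srm_chi_square(observed_counts, target_treatment_ratio=0.5, alpha=0.001):
    """Pearson goodness-of-fit test for (observed_control, observed_treatment)."""
    observed_control, observed_treatment = observed_counts
    total = observed_control + observed_treatment
    expected_treatment = total * target_treatment_ratio
    expected_control = total - expected_treatment

    statistic = (
        (observed_control - expected_control) ** 2 / expected_control
        + (observed_treatment - expected_treatment) ** 2 / expected_treatment
    )
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return {"statistic": statistic, "p_value": p_value, "srm_detected": p_value < alpha}


print(srm_chi_square((5000, 5000)))
print(srm_chi_square((4800, 5200)))
```

A strict threshold is common for SRM checks because the test runs on every experiment and a false alarm triggers an investigation; the value used here is only a placeholder.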
test_sequential
test_sequential(
assignments: ndarray,
delta: float = 0.02,
alpha: float = 0.01,
) -> Dict[str, Union[ndarray, bool]]
Performs Wald's Sequential Probability Ratio Test (SPRT) over binomial allocations.
Determines if the running allocation ratio departs significantly from the target ratio.
Null Hypothesis (H_0): p = target_treatment_ratio
Alternative Hypotheses (H_1): p = target_treatment_ratio + delta OR p = target_treatment_ratio - delta
| RETURNS | DESCRIPTION |
|---|---|
Dict[str, Union[ndarray, bool]]
|
Dict[str, Union[np.ndarray, bool]]: Running likelihood ratios and stopping decisions. |
Source code in src\xpyrment\analyze\srm.py
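A two-sided binomial SPRT along the lines described above can be sketched by tracking one log-likelihood ratio per alternative. The stopping rule used here (stop once either ratio reaches log(1 / alpha)) is an assumption, as are the function name and result keys; the library's actual boundaries may differ.

```python
import math


def _log_lr(assignment: int, p0: float, p1: float) -> float:
    """Log-likelihood ratio contribution of one Bernoulli assignment (1 = treatment)."""
    if assignment == 1:
        return math.log(p1 / p0)
    return math.log((1.0 - p1) / (1.0 - p0))


def sprt_srm(assignments, target=0.5, delta=0.02, alpha=0.01):
    """Run two one-sided SPRTs: H1 up (target + delta) and H1 down (target - delta)."""
    threshold = math.log(1.0 / alpha)    # assumed rejection boundary
    llr_up = llr_down = 0.0
    for n, a in enumerate(assignments, start=1):
        llr_up += _log_lr(a, target, target + delta)
        llr_down += _log_lr(a, target, target - delta)
        if max(llr_up, llr_down) >= threshold:
            return {"srm_detected": True, "stopped_at": n}
    return {"srm_detected": False, "stopped_at": None}


print(sprt_srm([1, 0] * 250))   # perfectly alternating assignments
print(sprt_srm([1] * 500))      # every unit lands in treatment
```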
WinsorizationEngine
Provides symmetric or asymmetric percentile-based capping boundaries.
TODO: Support adaptive Hampel filters for time-series rolling outlier identification and correction.
TODO: Implement Huber-loss robust M-estimation scaling as an alternative to hard Winsorized trimming.
| PARAMETER | DESCRIPTION |
|---|---|
bounds
|
Percentile thresholds [lower, upper] in [0, 1]. Defaults to (0.01, 0.99).
TYPE:
|
| METHOD | DESCRIPTION |
|---|---|
fit |
Calculates value thresholds at the configured percentiles. |
transform |
Applies winsorization capping boundaries on the target data. |
fit_transform |
Fits thresholds and applies winsorization transformation. |
Source code in src\xpyrment\analyze\outliers.py
fit
fit(data: ndarray) -> WinsorizationEngine
Calculates value thresholds at the configured percentiles.
| PARAMETER | DESCRIPTION |
|---|---|
data
|
Data array to extract percentiles from.
TYPE:
|
Source code in src\xpyrment\analyze\outliers.py
transform
Applies winsorization capping boundaries on the target data.
| PARAMETER | DESCRIPTION |
|---|---|
data
|
Target data to transform.
TYPE:
|
Source code in src\xpyrment\analyze\outliers.py
fit_transform
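A fit-then-transform pass of percentile capping can be sketched with NumPy as follows. The helper names are hypothetical and the interpolation rule is NumPy's default, which may differ from the library's choice.

```python
import numpy as np


def fit_bounds(data: np.ndarray, bounds=(0.01, 0.99)):
    """Compute the lower and upper capping values at the given percentiles."""
    lower, upper = np.quantile(data, bounds)
    return float(lower), float(upper)


def winsorize(data: np.ndarray, lower: float, upper: float) -> np.ndarray:
    """Cap values outside [lower, upper]; values inside are left unchanged."""
    return np.clip(data, lower, upper)


data = np.arange(101, dtype=float)
data[-1] = 10_000.0                     # one extreme outlier

lower, upper = fit_bounds(data)
capped = winsorize(data, lower, upper)
print(lower, upper, capped.max())
```

Fitting the bounds once and reusing them keeps the caps identical across groups, for example when thresholds are learned on pooled or pre-experiment data and then applied to each variant.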
RatioMetricDeltaMethod
Estimates ratio values, delta-variance bounds, and Wald treatment comparisons.
TODO: Integrate Fieller's Theorem confidence intervals to complement standard Delta Method Wald standard errors.
TODO: Implement robust Huber-White sandwich estimator overrides for ratio cluster-correlated observations.
| METHOD | DESCRIPTION |
|---|---|
compute_ratio_variance |
Applies the Delta Method Taylor expansion to calculate the variance of a ratio of means. |
fit |
Fits the ratio delta method over control and treatment groups. |
| ATTRIBUTE | DESCRIPTION |
|---|---|
results |
Returns point estimates, delta variances, standard errors, and Wald p-values.
TYPE:
|
Source code in src\xpyrment\analyze\ratio.py
results
property
Returns point estimates, delta variances, standard errors, and Wald p-values.
compute_ratio_variance
Applies the Delta Method Taylor expansion to calculate variance of ratio of means:
Var(Y_bar / X_bar) approx (1 / mu_x^2) * Var(Y_bar) + (mu_y^2 / mu_x^4) * Var(X_bar) - 2 * (mu_y / mu_x^3) * Cov(Y_bar, X_bar)
Source code in src\xpyrment\analyze\ratio.py
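The delta-method expression above maps directly onto sample moments. In this sketch the variances and covariance of the means are estimated as the sample quantities divided by n; the function name and the example data are hypothetical.

```python
import numpy as np


def ratio_variance(num: np.ndarray, den: np.ndarray) -> float:
    """Delta-method variance of mean(num) / mean(den) for paired observations."""
    n = len(num)
    mu_y, mu_x = num.mean(), den.mean()
    var_y_bar = num.var(ddof=1) / n
    var_x_bar = den.var(ddof=1) / n
    cov_bar = np.cov(num, den, ddof=1)[0, 1] / n
    return float(
        var_y_bar / mu_x**2
        + (mu_y**2 / mu_x**4) * var_x_bar
        - 2.0 * (mu_y / mu_x**3) * cov_bar
    )


# Clicks (numerator) and sessions (denominator) per user; illustrative values.
clicks = np.array([1.0, 0.0, 3.0, 2.0, 5.0, 1.0])
sessions = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 2.0])

ratio = clicks.mean() / sessions.mean()
se = ratio_variance(clicks, sessions) ** 0.5
print(f"ratio={ratio:.3f} se={se:.3f}")
```

When the denominator is constant, the second and third terms vanish and the expression reduces to the ordinary variance of a mean, which is a quick way to check an implementation.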
fit
fit(
num_ctrl: ndarray,
den_ctrl: ndarray,
num_trt: ndarray,
den_trt: ndarray,
) -> RatioMetricDeltaMethod
Fits the ratio delta method over control and treatment groups.
Source code in src\xpyrment\analyze\ratio.py
MetricRegistry
Manages a registry of metrics and evaluates them using topological sorting of a DAG.
TODO: Implement out-of-core streaming batch evaluations for high-velocity telemetry pipelines.
TODO: Add symbolic differentiation support to automatically derive Delta Method variance paths for complex algebraic combinations of parents.
| METHOD | DESCRIPTION |
|---|---|
add_raw |
Registers a raw, primitive input metric name. |
add_derived |
Registers a derived metric calculated from dependency parents. |
evaluate |
Evaluates all registered metrics topologically, using caching. |
Source code in src\xpyrment\analyze\registry.py
add_derived
add_derived(
name: str,
dependencies: List[str],
formula: Callable[..., ndarray],
) -> MetricRegistry
Registers a derived metric calculated from dependency parents.
| PARAMETER | DESCRIPTION |
|---|---|
name
|
Name of the derived metric.
TYPE:
|
dependencies
|
List of dependency metric names.
TYPE:
|
formula
|
A function returning an np.ndarray given dependency arrays.
TYPE:
|
Source code in src\xpyrment\analyze\registry.py
evaluate
Evaluates all registered metrics topologically, using caching.
| PARAMETER | DESCRIPTION |
|---|---|
inputs
|
Dictionary of raw primitive arrays.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Dict[str, ndarray]
|
Dict[str, np.ndarray]: Complete dictionary containing both raw and derived arrays. |
Source code in src\xpyrment\analyze\registry.py
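Topological evaluation with caching can be illustrated with the standard library's graphlib. `TinyRegistry` below is a hypothetical stand-in that mirrors the method names listed above; it uses plain Python lists instead of arrays and is not the library's implementation.

```python
from graphlib import TopologicalSorter


class TinyRegistry:
    """Hypothetical stand-in for a DAG-based metric registry."""

    def __init__(self):
        self._raw = set()
        self._derived = {}                       # name -> (dependencies, formula)

    def add_raw(self, name):
        self._raw.add(name)
        return self

    def add_derived(self, name, dependencies, formula):
        self._derived[name] = (list(dependencies), formula)
        return self

    def evaluate(self, inputs):
        # Map each derived metric to its parents; graphlib yields parents first.
        graph = {name: deps for name, (deps, _) in self._derived.items()}
        values = {name: inputs[name] for name in self._raw}
        for name in TopologicalSorter(graph).static_order():
            if name in values:                   # raw input or already computed
                continue
            deps, formula = self._derived[name]
            values[name] = formula(*(values[d] for d in deps))
        return values


registry = (
    TinyRegistry()
    .add_raw("clicks")
    .add_raw("views")
    .add_derived("ctr", ["clicks", "views"], lambda c, v: [a / b for a, b in zip(c, v)])
    .add_derived("ctr_pct", ["ctr"], lambda r: [100.0 * x for x in r])
)
result = registry.evaluate({"clicks": [1, 2], "views": [10, 10]})
print(result["ctr"], result["ctr_pct"])
```

Each metric is computed once and stored in the values dictionary, so a derived metric shared by several children is not recomputed; a cyclic dependency makes graphlib raise CycleError.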
run_analysis
run_analysis(
experiment: Experiment,
control: str = "control",
treatment: str = "treatment",
alpha: float = 0.05,
multi_test_correction: Optional[str] = None,
covariates: Optional[List[str]] = None,
) -> AnalysisResult
Executes the statistical analysis across all registered metrics in an Experiment container.
Iterates over each registered metric in the experiment, calculates means, relative lifts, p-values,
confidence intervals, and power. If requested, applies multiple testing corrections across the p-values,
updates the experiment state to ANALYZED, and returns a structured AnalysisResult.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| experiment | Experiment | The initialized, pre-registered experiment setup container. |
| control | str | The label of the control variant in the treatment column. Defaults to "control". |
| treatment | str | The label of the treatment variant in the treatment column. Defaults to "treatment". |
| alpha | float | Significance level (Type I error probability) for confidence intervals. Defaults to 0.05. |
| multi_test_correction | Optional[str] | Multiple testing correction algorithm to apply across the registered metrics. Options: |
| covariates | Optional[List[str]] | List of covariates to check for balance and adjust for. |

| RETURNS | DESCRIPTION |
|---|---|
| AnalysisResult | A rich, summarized results container. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If no metrics have been registered, or if control/treatment labels are missing from the active dataset. |
| PhaseOrderError | If the experiment is in an invalid state for running analysis. |
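The per-metric quantities named in the description (means, relative lift, p-value, confidence interval) can be illustrated with a two-sample z-test on the difference in means. This is a sketch under assumptions: the source does not state which test `run_analysis` applies, `compare_means` is a hypothetical helper that is not part of xpyrment, and power is left out to keep the example short.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def compare_means(control, treatment, alpha=0.05):
    """Hypothetical helper: normal-approximation comparison of two samples."""
    control_mean, treatment_mean = mean(control), mean(treatment)
    diff = treatment_mean - control_mean
    # Standard error of the difference, using unpooled sample variances.
    se = sqrt(variance(control) / len(control) + variance(treatment) / len(treatment))
    z = diff / se
    # Two-sided p-value under the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # (1 - alpha) confidence interval for the absolute difference.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        "control_mean": control_mean,
        "treatment_mean": treatment_mean,
        "relative_lift": diff / control_mean,
        "p_value": p_value,
        "ci": (diff - z_crit * se, diff + z_crit * se),
    }


control_values = [10.0, 12.0, 11.0, 9.0, 10.0, 11.0, 12.0, 10.0]
treatment_values = [12.0, 13.0, 12.5, 11.0, 12.0, 13.5, 12.0, 11.5]
summary = compare_means(control_values, treatment_values, alpha=0.05)
```

With a normal approximation, the p-value falls below `alpha` exactly when the confidence interval for the difference excludes zero, so the two outputs always agree on significance.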
Source code in src\xpyrment\analyze\orchestrator.py
setup
setup(
data: DataFrame,
treatment_col: str,
id_col: Optional[str] = None,
covariates: Optional[List[str]] = None,
) -> Experiment
Initializes the experimental setup container, serving as the library's primary entrypoint.
Sets up the Experiment object with the target dataset, identifying variant and unit columns,
and locks the state machine to ExperimentState.DESIGNED.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| data | DataFrame | The main experiment dataset containing exposure logs and outcomes. |
| treatment_col | str | Column name containing variant strings (e.g., |
| id_col | Optional[str] | Column name containing unique unit identifiers (e.g., |
| covariates | Optional[List[str]] | Optional list of baseline covariates. |

| RETURNS | DESCRIPTION |
|---|---|
| Experiment | A state-gated |
Source code in src\xpyrment\analyze\orchestrator.py
apply_cuped
Applies Controlled-experiments Using Pre-Experiment Data (CUPED) on a series.
CUPED (Deng et al., 2013) is a widely used variance reduction method in online experimentation. It uses a pre-experiment covariate to remove the part of the metric's variance that is explained by pre-existing user-level differences, which tightens the estimate of the treatment effect without changing its expected value.
| PARAMETER | DESCRIPTION |
|---|---|
| df | The dataset containing both target and pre-period columns. |
| target_col | Column name representing the post-experiment metric of interest (\(Y\)). |
| pre_col | Column name representing the pre-period covariate (\(X\)). |

| RETURNS | DESCRIPTION |
|---|---|
| Series | A pandas Series containing the CUPED-adjusted values. |
Source code in src\xpyrment\analyze\variance_reduction.py
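As a rough illustration of the adjustment, the standard CUPED formula from Deng et al. (2013) is \(Y^{\text{adj}} = Y - \theta (X - \bar{X})\) with \(\theta = \operatorname{Cov}(X, Y) / \operatorname{Var}(X)\). The sketch below applies it to plain Python lists. `cuped_adjust` is a hypothetical helper, and whether `apply_cuped` estimates \(\theta\) in exactly this way (for example, pooled across variants) is an assumption, since the source does not show it.

```python
from statistics import mean, variance


def cuped_adjust(y, x):
    """Hypothetical helper: CUPED-adjust outcomes y using pre-period covariate x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sample covariance between the covariate and the outcome.
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)
    # Centering x keeps the mean of the adjusted outcome equal to the mean of y.
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


pre_period = [10.0, 12.0, 8.0, 15.0, 11.0]
in_experiment = [11.0, 13.5, 8.5, 16.0, 11.5]
adjusted = cuped_adjust(in_experiment, pre_period)
```

The adjusted values keep the original mean, and their variance shrinks by roughly a factor of \(1 - \rho^2\), where \(\rho\) is the correlation between the covariate and the outcome.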
apply_multiple_testing_correction
apply_multiple_testing_correction(
p_values: List[float],
alpha: float = 0.05,
method: str = "fdr_bh",
) -> List[float]
Applies multiple testing corrections on p-values using statsmodels.
TODO: Implement step-down Dunnett's correction procedure for multi-arm comparisons against a common control.
TODO: Support family-wise bootstrap-based resampling corrections to account for non-normal dependency structures.
When performing multiple statistical tests simultaneously, the probability of obtaining at least one false positive (rejecting \(H_0\) when it is actually true) increases with the number of tests. This inflation of Type I error is known as the Multiple Testing Problem.
Mathematical Background of FWER Inflation
For \(m\) independent tests, each run at nominal significance level \(\alpha\):
$$ \text{FWER} = P(\text{at least one false positive}) = 1 - (1 - \alpha)^m $$
- If \(m = 1\) and \(\alpha = 0.05\), \(\text{FWER} = 0.05\).
- If \(m = 10\) and \(\alpha = 0.05\), \(\text{FWER} = 1 - (0.95)^{10} \approx 0.40\) (\(40\%\) false positive probability).
- If \(m = 50\) and \(\alpha = 0.05\), \(\text{FWER} \approx 0.92\) (near-certainty of committing a false positive).
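The arithmetic above is easy to reproduce. The following short sketch evaluates the FWER formula for the three cases; `fwer` is a helper name chosen for this example.

```python
def fwer(m: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


# Evaluate the three cases discussed in the text.
fwer_by_m = {m: round(fwer(m), 2) for m in (1, 10, 50)}
print(fwer_by_m)
```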
Supported Correction Methodologies
- Bonferroni Correction ("bonferroni"): Controls the Family-Wise Error Rate (FWER) in the strong sense. It adjusts each p-value by multiplying it by the total number of tests \(m\): $$ p^{\text{adj}}_i = \min(p_i \times m, \ 1.0) $$ It is highly conservative and has low statistical power when \(m\) is large or when tests are highly correlated.
- Holm-Bonferroni Procedure ("holm"): A step-down FWER control method that is uniformly more powerful than the standard Bonferroni correction. It orders the raw p-values: \(p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}\). The adjusted p-values are computed sequentially as: $$ p^{\text{adj}}_{(i)} = \max \left( (m - i + 1) \times p_{(i)}, \ p^{\text{adj}}_{(i-1)} \right) \quad \text{for } i \ge 1 $$ (with \(p^{\text{adj}}_{(0)} = 0\), bounded above by \(1.0\)).
- Benjamini-Hochberg (BH) Procedure ("fdr_bh"): Controls the False Discovery Rate (FDR), which is the expected proportion of false positives among all rejections. It is commonly preferred for digital product experimentation (A/B testing with multiple secondary metrics) because it typically retains more statistical power than FWER-controlling procedures. It orders raw p-values: \(p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}\). The adjusted p-values are calculated as: $$ p^{\text{adj}}_{(i)} = \min \left( \frac{m}{i} \times p_{(i)}, \ p^{\text{adj}}_{(i+1)} \right) \quad \text{for } i \le m - 1 $$ (with \(p^{\text{adj}}_{(m)} = p_{(m)}\), bounded above by \(1.0\)).
- Benjamini-Yekutieli (BY) Procedure ("fdr_by"): Controls the False Discovery Rate under arbitrary dependency structures (including positive regression dependency and negative correlation) among test statistics. BY tightens the BH rejection threshold by a harmonic penalty: it finds the largest \(i\) satisfying $$ p_{(i)} \le \frac{i}{m \sum_{j=1}^m \frac{1}{j}} \alpha $$ and rejects the hypotheses with the \(i\) smallest p-values.
- Hochberg Step-up Procedure ("hochberg"): A step-up FWER controlling procedure that is uniformly more powerful than Holm-Bonferroni, but requires the test statistics to be independent or to satisfy Simes' inequality. It works from the largest p-value down to the smallest.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| p_values | List[float] | List of raw, unadjusted p-values calculated from various metric tests. |
| alpha | float | Nominal significance level (e.g., 0.05). Defaults to 0.05. |
| method | str | Correction algorithm. Options include "bonferroni", "holm", "fdr_bh", "fdr_by", and "hochberg". Defaults to "fdr_bh". |

| RETURNS | DESCRIPTION |
|---|---|
| List[float] | A list of adjusted p-values, in the same index order as the input. |
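The adjusted p-value formulas given above for Bonferroni, Holm, and Benjamini-Hochberg can be checked by hand with a small pure-Python sketch. The function names below are hypothetical and written for this example; the documented function delegates to statsmodels, so this is an illustration of the formulas rather than the library's implementation.

```python
def bonferroni(p_values):
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]


def holm(p_values):
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order, start=1):
        # Step down: multiplier shrinks from m to 1, monotonicity via running max.
        running_max = max(running_max, (m - rank + 1) * p_values[idx])
        adjusted[idx] = min(running_max, 1.0)
    return adjusted


def benjamini_hochberg(p_values):
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        # Step up: start at the largest p-value, monotonicity via running min.
        idx = order[rank - 1]
        running_min = min(running_min, m / rank * p_values[idx])
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
adj_bonferroni = bonferroni(raw)
adj_holm = holm(raw)
adj_bh = benjamini_hochberg(raw)
```

Each function writes results back to the original positions, so the output keeps the same index order as the input. For these four p-values, Bonferroni is the most conservative, Holm is never larger than Bonferroni, and BH is never larger than Holm.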