Variance Reduction

variance_reduction

Variance reduction algorithms, focusing on continuous and ratio-level CUPED.

This module provides variance reduction utilities. Reducing the variance of a metric raises the signal-to-noise ratio of the treatment-effect estimate, which increases statistical power at a fixed sample size. Equivalently, because the sample size needed to reach a given power scales linearly with the metric's variance, variance reduction lowers the required sample size (and therefore runtime).
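The link between variance and sample size can be made concrete with the usual normal-approximation formula for a two-sided, two-sample test with equal arms, \(n = 2(z_{1-\alpha/2} + z_{1-\beta})^2\sigma^2/\delta^2\) per arm. The sketch below is illustrative only: the helper `required_n_per_arm` is hypothetical (not part of this module), and the values of \(\sigma\), \(\delta\) and \(\rho\) are made up. It assumes CUPED shrinks the variance by the factor \(1 - \rho^2\), where \(\rho\) is the correlation between the metric and its pre-period covariate.

```python
from statistics import NormalDist


def required_n_per_arm(sigma: float, delta: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Per-arm sample size for a two-sided, two-sample z-test (normal approximation).

    Hypothetical helper for illustration; not part of this module.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 * sigma**2 / delta**2


# Illustrative (made-up) inputs: metric std dev, minimum detectable effect, covariate correlation
sigma, delta, rho = 10.0, 0.5, 0.7

n_raw = required_n_per_arm(sigma, delta)
# Assumes CUPED leaves a residual variance of sigma^2 * (1 - rho^2)
sigma_cuped = sigma * (1 - rho**2) ** 0.5
n_cuped = required_n_per_arm(sigma_cuped, delta)

print(f"unadjusted: {n_raw:.0f} per arm, CUPED: {n_cuped:.0f} per arm, ratio {n_cuped / n_raw:.2f}")
```

Because \(n\) is linear in \(\sigma^2\), a covariate correlation of 0.7 cuts the required sample size to about 51% of the unadjusted value under these assumptions.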

FUNCTION	DESCRIPTION
`apply_cuped`	Applies Controlled-experiments Using Pre-Experiment Data (CUPED) to a metric column.

apply_cuped

apply_cuped(
    df: DataFrame, target_col: str, pre_col: str
) -> Series

Applies Controlled-experiments Using Pre-Experiment Data (CUPED) to a metric column.

CUPED (Deng et al., 2013) is a widely adopted variance reduction method in online experimentation. It uses pre-experiment covariate data to remove user-level variation that predates the experiment, so the adjusted metric has the same expected treatment effect as the original but lower variance.
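Written out, the adjustment computed by `apply_cuped` is the following, where \(Y\) is the post-experiment metric, \(X\) is the pre-period covariate, \(\bar{X}\) is the covariate mean, and \(\rho\) is the correlation between \(Y\) and \(X\). The variance identity holds exactly for the population-optimal \(\theta\); with \(\theta\) estimated from the data it holds approximately.

\[
Y_{\text{cuped}} = Y - \theta\,(X - \bar{X}), \qquad \theta = \frac{\operatorname{Cov}(Y, X)}{\operatorname{Var}(X)}
\]

\[
\operatorname{Var}(Y_{\text{cuped}}) = \operatorname{Var}(Y) - \frac{\operatorname{Cov}(Y, X)^2}{\operatorname{Var}(X)} = \operatorname{Var}(Y)\,(1 - \rho^2)
\]

For example, a covariate with \(\rho = 0.5\) removes 25% of the variance, and one with \(\rho = 0.8\) removes 64%. Because \(X\) is measured before assignment, \(\bar{X}\) has the same expectation in every arm, so subtracting \(\theta(X - \bar{X})\) leaves the expected difference in means unchanged.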

PARAMETER	DESCRIPTION
`df`	The dataset containing both target and pre-period columns. TYPE: `DataFrame`
`target_col`	Column name representing the post-experiment metric of interest (\(Y\)). TYPE: `str`
`pre_col`	Column name representing the pre-period covariate (\(X\)). TYPE: `str`

RETURNS	DESCRIPTION
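`Series`	The CUPED-adjusted values, indexed like `df`. Rows with a missing `pre_col` value are NaN, and if fewer than two rows have both columns present, `target_col` is returned unadjusted.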

Source code in src\xpyrment\analyze\variance_reduction.py

import pandas as pd


def apply_cuped(df: pd.DataFrame, target_col: str, pre_col: str) -> pd.Series:
    """Applies Controlled-experiments Using Pre-Experiment Data (CUPED) to a metric column.

    CUPED (Deng et al., 2013) is a widely adopted variance reduction method in online experimentation. It uses
    pre-experiment covariate data to remove user-level variation that predates the experiment, so the adjusted
    metric has the same expected treatment effect as the original but lower variance.

    Args:
        df (pd.DataFrame): The dataset containing both target and pre-period columns.
        target_col (str): Column name representing the post-experiment metric of interest ($Y$).
        pre_col (str): Column name representing the pre-period covariate ($X$).

    Returns:
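        pd.Series: The CUPED-adjusted values, indexed like `df`. Rows with a missing `pre_col`
            value are NaN; if fewer than two rows have both columns present, `target_col` is
            returned unadjusted.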
    """
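    # Estimate theta only on rows where both the target and the covariate are present
    clean_df = df[[target_col, pre_col]].dropna()
    if len(clean_df) < 2:
        # Too few complete rows to estimate a covariance: return an unadjusted copy
        # so callers cannot mutate df through the returned Series
        return df[target_col].copy()

    # theta is the OLS slope of Y on X: Cov(Y, X) / Var(X), both with ddof=1
    cov_yx = clean_df[target_col].cov(clean_df[pre_col])
    var_x = clean_df[pre_col].var()

    # A constant covariate carries no information, so skip the adjustment
    theta = cov_yx / var_x if var_x > 0.0 else 0.0

    mean_x = clean_df[pre_col].mean()

    # Apply CUPED adjustment to all rows, maintaining the original Series shape and index.
    # Rows whose pre_col value is missing come out as NaN.
    adjusted = df[target_col] - theta * (df[pre_col] - mean_x)
    return adjusted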