Balance

balance

Covariate balance checking and standardized mean differences (SMD).

This module provides diagnostics to evaluate whether the control and treatment groups are balanced across key pre-experiment covariates (demographics, platform, historical engagement), so that confounding or pre-existing selection bias can be detected before it skews treatment estimates.

FUNCTION DESCRIPTION
check_covariate_balance

Computes Normalized Differences and t-tests to evaluate balance of pre-period covariates.

check_covariate_balance

check_covariate_balance(
    df: DataFrame, treatment_col: str, covariate_cols: list
) -> dict

Computes Normalized Differences and t-tests to evaluate balance of pre-period covariates.

Verifies that pre-period characteristics are distributed similarly across treatment arms. Simple t-tests alone are a poor balance criterion in online experiments: with very large sample sizes, even practically negligible differences yield statistically significant p-values (\(p < 0.05\)). The function therefore reports the Standardized Mean Difference (SMD) as the primary effect size metric for numeric covariates, alongside Welch's t-test and a two-sample Kolmogorov-Smirnov test.
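The point about large samples can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not part of the module: the sample size, the 0.01 shift, and the variable names are all assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic data (assumption): two arms of one million units each,
# differing by a practically negligible 0.01 standard deviations.
rng = np.random.default_rng(0)
n = 1_000_000
control = rng.normal(loc=0.00, scale=1.0, size=n)
treatment = rng.normal(loc=0.01, scale=1.0, size=n)

# SMD with the same pooled standard deviation used in this module's formula.
pooled_sd = np.sqrt((treatment.var(ddof=1) + control.var(ddof=1)) / 2.0)
smd = (treatment.mean() - control.mean()) / pooled_sd

# Welch's t-test (unequal variances), as used for numeric covariates.
_, p_value = stats.ttest_ind(treatment, control, equal_var=False)

print(f"SMD = {smd:.4f}, p-value = {p_value:.2e}")
```

The p-value falls far below 0.05 even though the SMD stays near 0.01, which is why the effect size is the more useful balance signal at this scale.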

Mathematical Representation
  1. Standardized Mean Difference (SMD) for continuous covariates: Let \(\bar{X}_T\) and \(\bar{X}_C\) be the sample means of a covariate \(X\) in the treatment and control groups, and let \(s_T^2\) and \(s_C^2\) be their sample variances.
     $$
     \text{SMD} = \frac{\bar{X}_T - \bar{X}_C}{\sqrt{\frac{s_T^2 + s_C^2}{2}}}
     $$
  2. Pearson Chi-Square Test for Independence for categorical covariates: Evaluates whether the proportion of units in each category is independent of treatment.
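Both quantities above can be checked by hand on toy inputs. The following sketch uses made-up values (the arrays and the 2x2 table are illustrative assumptions, not data from the library) and calls only `numpy` and `scipy.stats.chi2_contingency`.

```python
import numpy as np
from scipy import stats

# 1. SMD for a continuous covariate (toy values).
treatment = np.array([2.0, 4.0, 6.0])  # mean 4, sample variance 4
control = np.array([1.0, 3.0, 5.0])    # mean 3, sample variance 4
pooled_sd = np.sqrt((treatment.var(ddof=1) + control.var(ddof=1)) / 2.0)
smd = (treatment.mean() - control.mean()) / pooled_sd  # (4 - 3) / 2

# 2. Pearson chi-square test for a categorical covariate.
# Rows are categories, columns are treatment arms (toy counts, perfectly balanced).
table = np.array([[50, 50],
                  [50, 50]])
chi2, p_value, dof, expected = stats.chi2_contingency(table)

print(f"SMD = {smd:.2f}")
print(f"chi2 = {chi2:.2f}, dof = {dof}, p-value = {p_value:.2f}")
```

With equal variances of 4 in both arms the pooled standard deviation is 2, so the SMD is 0.5; the balanced table matches its expected counts exactly, so the chi-square statistic is 0.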
PARAMETER DESCRIPTION
df

The experimental dataset containing units, treatment assignments, and covariates.

TYPE: DataFrame

treatment_col

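Column name identifying experimental groups/arms. Group labels are sorted; for numeric covariates the first label is treated as control and the second as treatment.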

TYPE: str

covariate_cols

List of column names representing categorical or continuous pre-experiment covariates.

TYPE: list

RETURNS DESCRIPTION
dict

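A dictionary mapping each covariate name to a diagnostic sub-dictionary. Numeric covariates report type, smd, p_value, ks_statistic, and ks_p_value; categorical covariates report type and p_value. When a Mahalanobis distance can be computed over the numeric covariates, an additional _multivariate entry holds mahalanobis_distance and n_covariates.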

TYPE: dict
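One way to consume the returned dictionary is to turn it into per-covariate imbalance flags. The helper below is hypothetical and not part of the library: `flag_imbalanced`, the 0.1 SMD threshold (a common rule of thumb), the 0.05 significance level, and the example values are all assumptions for illustration. It relies only on the keys produced by the source code shown on this page.

```python
def flag_imbalanced(results: dict, smd_threshold: float = 0.1, alpha: float = 0.05) -> dict:
    """Hypothetical helper: map each covariate to True if it looks imbalanced."""
    flags = {}
    for name, diag in results.items():
        # Skip summary entries such as "_multivariate".
        if name.startswith("_"):
            continue
        if diag["type"] == "numeric":
            # Judge numeric covariates by effect size, not by p-value.
            flags[name] = abs(diag["smd"]) > smd_threshold
        else:
            # Categorical covariates only carry a chi-square p-value.
            flags[name] = diag["p_value"] < alpha
    return flags


# Made-up result in the shape returned by check_covariate_balance.
example_results = {
    "age": {"type": "numeric", "smd": 0.02, "p_value": 0.001,
            "ks_statistic": 0.01, "ks_p_value": 0.20},
    "sessions": {"type": "numeric", "smd": -0.25, "p_value": 0.0001,
                 "ks_statistic": 0.12, "ks_p_value": 0.0001},
    "platform": {"type": "categorical", "p_value": 0.40},
    "_multivariate": {"mahalanobis_distance": 0.26, "n_covariates": 2},
}

flags = flag_imbalanced(example_results)
print(flags)
```

Here `age` has a significant p-value but a negligible SMD, so it is not flagged, while `sessions` exceeds the threshold in absolute value.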

Source code in src\xpyrment\validate\balance.py
def check_covariate_balance(df: pd.DataFrame, treatment_col: str, covariate_cols: list) -> dict:
    r"""Computes Normalized Differences and t-tests to evaluate balance of pre-period covariates.

    Verifies that pre-period characteristics are distributed similarly across treatment arms.
    Simple t-tests alone are a poor balance criterion in online experiments: with very large sample sizes,
    even practically negligible differences yield statistically significant p-values ($p < 0.05$).
    Therefore, we compute **Standardized Mean Differences (SMD)** as the primary effect size metric.

    ??? mathbox "Mathematical Representation"

        1. **Standardized Mean Difference (SMD)** for continuous covariates:
           Let $\bar{X}_T$ and $\bar{X}_C$ be the sample means of a covariate $X$ in the treatment and control groups,
           and let $s_T^2$ and $s_C^2$ be their sample variances.
           $$
           \text{SMD} = \frac{\bar{X}_T - \bar{X}_C}{\sqrt{\frac{s_T^2 + s_C^2}{2}}}
           $$
        2. **Pearson Chi-Square Test for Independence** for categorical covariates:
           Evaluates whether the proportion of units in each category is independent of treatment.

    Args:
        df (pd.DataFrame): The experimental dataset containing units, treatment assignments, and covariates.
        treatment_col (str): Column name identifying experimental groups/arms.
        covariate_cols (list): List of column names representing categorical or continuous pre-experiment covariates.

    Returns:
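        dict: A dictionary mapping each covariate name to a diagnostic sub-dictionary. Numeric covariates
            report `type`, `smd`, `p_value`, `ks_statistic`, and `ks_p_value`; categorical covariates report
            `type` and `p_value`. When it can be computed, a `_multivariate` entry holds
            `mahalanobis_distance` and `n_covariates`.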
    """
    import numpy as np
    from scipy import stats

    groups = df[treatment_col].unique()
    if len(groups) < 2:
        raise ValueError(f"Balance check requires at least 2 distinct groups in '{treatment_col}'. Found {len(groups)}.")

    # Sort groups to be deterministic: first group is control (group 0), second is treatment (group 1)
    groups = sorted(groups)
    grp_0 = df[df[treatment_col] == groups[0]]
    grp_1 = df[df[treatment_col] == groups[1]]

    results = {}

    for cov in covariate_cols:
        if cov not in df.columns:
            raise KeyError(f"Covariate column '{cov}' not found in DataFrame.")

        # Determine type: check if column is numeric
        if pd.api.types.is_numeric_dtype(df[cov]):
            val_0 = grp_0[cov].dropna()
            val_1 = grp_1[cov].dropna()

            mean_0 = val_0.mean()
            mean_1 = val_1.mean()
            var_0 = val_0.var(ddof=1)
            var_1 = val_1.var(ddof=1)

            # Compute Standardized Mean Difference (SMD)
            pooled_sd = np.sqrt((var_0 + var_1) / 2.0)
            if pooled_sd == 0.0:
                smd = 0.0
            else:
                smd = (mean_1 - mean_0) / pooled_sd

            # Welch's t-test (unequal variances assumed)
            if len(val_0) > 0 and len(val_1) > 0:
                _, p_val = stats.ttest_ind(val_1, val_0, equal_var=False)
                # Kolmogorov-Smirnov test for distribution shape alignment
                ks_stat, ks_p = stats.ks_2samp(val_1, val_0)
            else:
                p_val = 1.0
                ks_stat, ks_p = 0.0, 1.0

            results[cov] = {
                "type": "numeric",
                "smd": float(smd),
                "p_value": float(p_val),
                "ks_statistic": float(ks_stat),
                "ks_p_value": float(ks_p)
            }
        else:
            # Categorical covariate: build crosstab contingency table (covers all groups in treatment_col)
            contingency_table = pd.crosstab(df[cov], df[treatment_col])

            if contingency_table.shape[0] > 0 and contingency_table.shape[1] > 0:
                # Pearson's chi-square test of independence
                chi2_res = stats.chi2_contingency(contingency_table)
                p_val = chi2_res.pvalue
            else:
                p_val = 1.0

            results[cov] = {
                "type": "categorical",
                "p_value": float(p_val)
            }

    # Multivariate balance: Mahalanobis distance between the group mean vectors of the numeric covariates
    numeric_covs = [cov for cov in covariate_cols if pd.api.types.is_numeric_dtype(df[cov])]
    if len(numeric_covs) >= 1:
        data_0 = grp_0[numeric_covs].dropna()
        data_1 = grp_1[numeric_covs].dropna()

        if len(data_0) > len(numeric_covs) and len(data_1) > len(numeric_covs):
            mean_0 = data_0.mean().values
            mean_1 = data_1.mean().values

            # Pooled covariance matrix (unweighted average of the two group covariance matrices)
            cov_0 = data_0.cov().values
            cov_1 = data_1.cov().values
            pooled_cov = (cov_0 + cov_1) / 2.0

            try:
                inv_pooled_cov = np.linalg.pinv(pooled_cov)
                diff = mean_1 - mean_0
                mahalanobis_dist = np.sqrt(np.dot(np.dot(diff, inv_pooled_cov), diff))
                results["_multivariate"] = {
                    "mahalanobis_distance": float(mahalanobis_dist),
                    "n_covariates": len(numeric_covs)
                }
            except np.linalg.LinAlgError:
                pass

    return results
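The multivariate step at the end of the function can be reproduced in isolation. The sketch below mirrors the same calculation (averaged group covariances, pseudo-inverse, distance between mean vectors) on toy arrays; the data and variable names are assumptions for illustration, and `np.cov` is used in place of `DataFrame.cov` since both default to `ddof=1`.

```python
import numpy as np

# Toy data (assumption): group 1 is group 0 shifted by one unit on the first covariate.
data_0 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
data_1 = data_0 + np.array([1.0, 0.0])

# Unweighted average of the two sample covariance matrices (rows are observations).
pooled_cov = (np.cov(data_0, rowvar=False) + np.cov(data_1, rowvar=False)) / 2.0

# Difference of mean vectors, scaled by the pseudo-inverse of the pooled covariance.
diff = data_1.mean(axis=0) - data_0.mean(axis=0)
distance = float(np.sqrt(diff @ np.linalg.pinv(pooled_cov) @ diff))

print(f"Mahalanobis distance = {distance:.4f}")
```

Each covariate has sample variance 1/3 and the covariates are uncorrelated, so the squared distance is 1 / (1/3) = 3 and the distance is the square root of 3.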