Skip to content

Interactions Module

The xpyrment.interactions module contains submodules and components for interactions.

interactions

Multi-factor experimental design and covariate-treatment interaction analysis.

This package provides tools to discover, model, and visualize interactions within experimental studies. Understanding interactions is critical to determine whether multiple treatments conflict or work synergistically, and whether a treatment effect is heterogeneous across user characteristics.

Submodules: - detector: Orchestrates and dispatches multi-model interaction checks across covariates and factors. - anova: Performs Factorial Analysis of Variance (ANOVA) and partitions sum of squares. - regression: Fits interactive regression models and conducts Likelihood Ratio Tests (LRT). - hstat: Quantifies model-agnostic interaction strengths using Friedman's H-statistic. - shap: Calculates second-order game-theoretic Shapley interaction values (TreeSHAP). - plots: Renders symmetric heatmaps and diagnostic visualization charts.

MODULE DESCRIPTION
anova

Factorial Analysis of Variance (ANOVA) and statistical interaction testing.

detector

Interaction dispatcher and multi-model statistical screening.

hstat

Model-agnostic interaction measurement using Friedman's H-statistic.

plots

Visualizations and plot generation utilities for multi-factor interactions.

regression

Interactive regression modeling and Likelihood Ratio Testing (LRT).

shap

Game-theoretic feature interaction analysis using SHAP interaction indices.

CLASS DESCRIPTION
InteractionDetector

Dispatches multi-factor, covariate-treatment, and non-linear interactions.

FUNCTION DESCRIPTION
run_factorial_anova

Computes factorial ANOVA tables with interaction terms for DoE factors.

check_treatment_covariate_interaction

Computes a Likelihood Ratio Test (LRT) to check if a covariate significantly interacts with the treatment split.

calculate_shap_interactions

Computes SHAP interaction values to decompose multi-factor combinations (computationally expensive).

compute_friedman_h_statistic

Computes model-agnostic Friedman's H-statistic representing the degree of interaction between two features.

plot_interaction_heatmap

Generates an interaction term heatmap using matplotlib.

plot_interaction_effects

Plots interaction effects between a treatment and a covariate on a metric.

InteractionDetector

InteractionDetector(experiment)

Dispatches multi-factor, covariate-treatment, and non-linear interactions.

In complex online and physical experiments, interventions rarely operate in a vacuum. The treatment effect of one change may depend heavily on the status of other features (multi-factor interaction) or the characteristics of the experimental unit (covariate-treatment interaction, or Heterogeneous Treatment Effect). The InteractionDetector screens for these interactions automatically.

Mathematical Interaction Categories
  1. Multi-Factor Synergy or Interference (Factor-Factor Interaction): Evaluates whether combining Treatment 1 (\(T_1\)) and Treatment 2 (\(T_2\)) yields a response that differs from the sum of their individual effects: $$ Y = \beta_0 + \beta_1 T_1 + \beta_2 T_2 + \beta_3 (T_1 \times T_2) + \varepsilon $$
  2. If \(\\beta_3 > 0\), the factors are synergistic.
  3. If \(\\beta_3 < 0\), the factors interfere with each other (redundancy or collision).

  4. Covariate-Treatment Interaction (Heterogeneous Treatment Effects - HTE): Evaluates whether the treatment effect varies systematically across pre-experiment characteristics \(C\) (e.g., country, browser, mobile vs. desktop, or historical spending): $$ Y = \beta_0 + \beta_1 T + \beta_2 C + \beta_3 (T \times C) + \varepsilon $$ A significant \(\\beta_3\) (\(p < 0.05\)) indicates that the treatment effect varies across subpopulations.

  5. Response Surface Curvature (Non-linear Interaction): In continuous Design of Experiments (DoE), evaluates quadratic curvatures and continuous interaction slopes: $$ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1^2 + \beta_4 X_2^2 + \beta_5 (X_1 \times X_2) + \varepsilon $$ where \(\\beta_5\) captures the continuous twisting of the response landscape, and \(\\beta_3, \\beta_4\) capture curvature.

Algorithmic Screening Workflow
function detect_all(experiment):
    Initialize interaction_results = {}
    For each pair of factors (A, B) in design:
        Run OLS: Y ~ A * B
        Extract interaction p-value.
        If p-value < 0.05:
            Add to interaction_results["factor_factor"]

    For each covariate C in experiment.covariates:
        Run OLS: Y ~ Treatment * C
        Extract interaction p-value.
        If p-value < 0.05:
            Add to interaction_results["heterogeneous_treatment_effects"]

    Return interaction_results
ATTRIBUTE DESCRIPTION
experiment

The completed or active experiment container.

TYPE: Experiment

PARAMETER DESCRIPTION
experiment

The experiment container containing datasets, metrics, and factor definitions.

TYPE: Experiment

METHOD DESCRIPTION
detect_all

Runs ANOVA and regression checks to identify interaction terms across factors and covariates.

Source code in src\xpyrment\interactions\detector.py
def __init__(self, experiment):
    """Initializes an InteractionDetector.

    Args:
        experiment (Experiment): The experiment container containing datasets, metrics, and factor definitions.
    """
    self.experiment = experiment

detect_all

detect_all() -> dict

Runs ANOVA and regression checks to identify interaction terms across factors and covariates.

Fits multi-variable linear regression models with product terms and flags statistically significant interactions.

RETURNS DESCRIPTION
dict

A dictionary grouping detected interactions, their estimated coefficients, standard errors, and p-values.

TYPE: dict

Source code in src\xpyrment\interactions\detector.py
def detect_all(self) -> dict:
    """Runs ANOVA and regression checks to identify interaction terms across factors and covariates.

    Fits multi-variable linear regression models with product terms and flags statistically
    significant interactions.

    Returns:
        dict: A dictionary grouping detected interactions, their estimated coefficients, standard errors,
            and p-values.
    """
    from xpyrment.interactions.regression import check_treatment_covariate_interaction

    results = {
        "factor_factor": [],
        "heterogeneous_treatment_effects": []
    }

    df = self.experiment.data
    treatment_col = self.experiment.treatment_col
    target_metrics = [m.name for m in self.experiment.metrics]
    # Fallback: if no metrics registered, but columns exist, we could guess, 
    # but let's stick to registered metrics.
    if not target_metrics:
        # If orchestrator was used, metrics might be registered there, 
        # but usually they are in experiment.metrics
        pass
    covariates = self.experiment.covariates

    # 1. Screen for Heterogeneous Treatment Effects (Covariate-Treatment)
    for metric in target_metrics:
        for cov in covariates:
            p_val = check_treatment_covariate_interaction(df, treatment_col, cov, metric)
            if p_val < 0.05:
                results["heterogeneous_treatment_effects"].append({
                    "metric": metric,
                    "covariate": cov,
                    "p_value": p_val
                })

    # 2. Factor-Factor Interactions (if multiple factors exist)
    # Note: In xpyrment, factors are often encoded in the variant column or separate columns.
    # For simplicity, we assume if treatment_col contains more than 2 variants, we check them.
    # But a more robust way is to check if the user registered multiple factors.
    # For now, let's just stick to the docstring's plan of OLS: Y ~ T * C.

    return results

run_factorial_anova

run_factorial_anova(
    df: DataFrame, formula: str
) -> DataFrame

Computes factorial ANOVA tables with interaction terms for DoE factors.

Factorial ANOVA decomposes the total variability of an experimental outcome into portions attributable to main factor effects, multi-factor interaction effects, and random error. This is crucial for verifying which process factors have a statistically significant impact on the response variable, and whether factors behave synergetically or antagonistically when combined.

Mathematical Formulation

For a two-factor experimental design (Factor \(A\) with \(I\) levels, Factor \(B\) with \(J\) levels, and \(K\) replicates per cell), the response \(Y_{ijk}\) is modeled as: $$ Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta){ij} + \varepsilon{ijk} $$ where: - \(\\mu\): The grand mean of the response. - \(\\alpha_i\): The main effect of Factor \(A\) at level \(i\) (subject to \(\\sum_{i=1}^I \\alpha_i = 0\)). - \(\\beta_j\): The main effect of Factor \(B\) at level \(j\) (subject to \(\\sum_{j=1}^J \\beta_j = 0\)). - \((\\alpha\\beta)_{ij}\) smokes: The interaction effect between Factor \(A\) at level \(i\) and Factor \(B\) at level \(j\) (subject to \(\\sum_{i=1}^I (\\alpha\\beta)_{ij} = \\sum_{j=1}^J (\\alpha\\beta)_{ij} = 0\)). - \(\\varepsilon_{ijk}\): Independent and identically distributed normal error terms, \(\\varepsilon_{ijk} \\sim \\mathcal{N}(0, \\sigma^2)\).

Decomposition of Sum of Squares (SS): The total sum of squares (\(SS_{\\text{Total}}\)) measures total sample variation: $$ SS_{\text{Total}} = SS_A + SS_B + SS_{AB} + SS_{\text{Error}} $$ where: - \(SS_A = J K \\sum_{i=1}^I (\\bar{Y}_{i\\cdot\\cdot} - \\bar{Y}_{\\cdot\\cdot\\cdot})^2\) (Main effect \(A\)) - \(SS_B = I K \\sum_{j=1}^J (\\bar{Y}_{\\cdot j\\cdot} - \\bar{Y}_{\\cdot\\cdot\\cdot})^2\) (Main effect \(B\)) - \(SS_{AB} = K \\sum_{i=1}^I \\sum_{j=1}^J (\\bar{Y}_{ij\\cdot} - \\bar{Y}_{i\\cdot\\cdot} - \\bar{Y}_{\\cdot j\\cdot} + \\bar{Y}_{\\cdot\\cdot\\cdot})^2\) (Interaction effect \(AB\)) - \(SS_{\\text{Error}} = \\sum_{i=1}^I \\sum_{j=1}^J \\sum_{k=1}^K (Y_{ijk} - \\bar{Y}_{ij\\cdot})^2\) (Residual variation)

F-Test Ratios

Significance of each effect is evaluated by comparing the Mean Square (\(MS = SS / df\)) against the residual Mean Square (\(MS_{\\text{Error}}\)): - For Factor \(A\): $$ F_A = \frac{MS_A}{MS_{\text{Error}}} = \frac{SS_A / (I-1)}{SS_{\text{Error}} / [IJ(K-1)]} \sim F_{I-1, \ IJ(K-1)} $$ - For Interaction \(AB\): $$ F_{AB} = \frac{MS_{AB}}{MS_{\text{Error}}} = \frac{SS_{AB} / [(I-1)(J-1)]}{SS_{\text{Error}} / [IJ(K-1)]} \sim F_{(I-1)(J-1), \ IJ(K-1)} $$ A significant \(F_{AB}\) (\(p < 0.05\)) proves that the effect of Factor \(A\) depends on the level of Factor \(B\). This indicates that interpreting main effects alone is statistically misleading; the interaction must be evaluated.

PARAMETER DESCRIPTION
df

The experimental dataset.

TYPE: DataFrame

formula

R-style regression formula (e.g., "revenue ~ factor_a * factor_b").

TYPE: str

RETURNS DESCRIPTION
DataFrame

pd.DataFrame: A standard ANOVA table detailing Sum of Squares, degrees of freedom (\(df\)), F-statistics, and p-values for each term.

Source code in src\xpyrment\interactions\anova.py
def run_factorial_anova(df: pd.DataFrame, formula: str) -> pd.DataFrame:
    r"""Computes factorial ANOVA tables with interaction terms for DoE factors.

    Factorial ANOVA decomposes the total variability of an experimental outcome into portions attributable
    to main factor effects, multi-factor interaction effects, and random error. This is crucial for verifying
    which process factors have a statistically significant impact on the response variable, and whether factors
    behave synergetically or antagonistically when combined.

    Mathematical Formulation:
        For a two-factor experimental design (Factor $A$ with $I$ levels, Factor $B$ with $J$ levels, and $K$ replicates per cell),
        the response $Y_{ijk}$ is modeled as:
        $$
        Y_{ijk} = \\mu + \\alpha_i + \\beta_j + (\\alpha\\beta)_{ij} + \\varepsilon_{ijk}
        $$
        where:
        - $\\mu$: The grand mean of the response.
        - $\\alpha_i$: The main effect of Factor $A$ at level $i$ (subject to $\\sum_{i=1}^I \\alpha_i = 0$).
        - $\\beta_j$: The main effect of Factor $B$ at level $j$ (subject to $\\sum_{j=1}^J \\beta_j = 0$).
        - $(\\alpha\\beta)_{ij}$ smokes: The interaction effect between Factor $A$ at level $i$ and Factor $B$ at level $j$
          (subject to $\\sum_{i=1}^I (\\alpha\\beta)_{ij} = \\sum_{j=1}^J (\\alpha\\beta)_{ij} = 0$).
        - $\\varepsilon_{ijk}$: Independent and identically distributed normal error terms, $\\varepsilon_{ijk} \\sim \\mathcal{N}(0, \\sigma^2)$.

    Decomposition of Sum of Squares (SS):
        The total sum of squares ($SS_{\\text{Total}}$) measures total sample variation:
        $$
        SS_{\\text{Total}} = SS_A + SS_B + SS_{AB} + SS_{\\text{Error}}
        $$
        where:
        - $SS_A = J K \\sum_{i=1}^I (\\bar{Y}_{i\\cdot\\cdot} - \\bar{Y}_{\\cdot\\cdot\\cdot})^2$ (Main effect $A$)
        - $SS_B = I K \\sum_{j=1}^J (\\bar{Y}_{\\cdot j\\cdot} - \\bar{Y}_{\\cdot\\cdot\\cdot})^2$ (Main effect $B$)
        - $SS_{AB} = K \\sum_{i=1}^I \\sum_{j=1}^J (\\bar{Y}_{ij\\cdot} - \\bar{Y}_{i\\cdot\\cdot} - \\bar{Y}_{\\cdot j\\cdot} + \\bar{Y}_{\\cdot\\cdot\\cdot})^2$ (Interaction effect $AB$)
        - $SS_{\\text{Error}} = \\sum_{i=1}^I \\sum_{j=1}^J \\sum_{k=1}^K (Y_{ijk} - \\bar{Y}_{ij\\cdot})^2$ (Residual variation)

    F-Test Ratios:
        Significance of each effect is evaluated by comparing the Mean Square ($MS = SS / df$) against the residual Mean Square ($MS_{\\text{Error}}$):
        - For Factor $A$:
          $$
          F_A = \\frac{MS_A}{MS_{\\text{Error}}} = \\frac{SS_A / (I-1)}{SS_{\\text{Error}} / [IJ(K-1)]} \\sim F_{I-1, \\ IJ(K-1)}
          $$
        - For Interaction $AB$:
          $$
          F_{AB} = \\frac{MS_{AB}}{MS_{\\text{Error}}} = \\frac{SS_{AB} / [(I-1)(J-1)]}{SS_{\\text{Error}} / [IJ(K-1)]} \\sim F_{(I-1)(J-1), \\ IJ(K-1)}
          $$
        A significant $F_{AB}$ ($p < 0.05$) proves that the effect of Factor $A$ depends on the level of Factor $B$. This indicates that
        interpreting main effects alone is statistically misleading; the interaction must be evaluated.

    Args:
        df (pd.DataFrame): The experimental dataset.
        formula (str): R-style regression formula (e.g., `"revenue ~ factor_a * factor_b"`).

    Returns:
        pd.DataFrame: A standard ANOVA table detailing Sum of Squares, degrees of freedom ($df$), F-statistics,
            and p-values for each term.
    """
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Fit ordinary least squares model using R-style formula
    model = smf.ols(formula, data=df).fit()

    # Compute Type II ANOVA table
    anova_table = sm.stats.anova_lm(model, typ=2)

    return anova_table

check_treatment_covariate_interaction

check_treatment_covariate_interaction(
    df: DataFrame,
    treatment_col: str,
    covariate_col: str,
    target_col: str,
) -> float

Computes a Likelihood Ratio Test (LRT) to check if a covariate significantly interacts with the treatment split.

Evaluates whether the treatment effect varies across different values of a pre-period covariate. To determine if the interaction term is statistically necessary (rather than just overfitting the sample), we fit nested regression models and perform a classical Likelihood Ratio Test.

Mathematical Formulation of Nested Models

We define two models representing competing hypotheses: 1. Restricted Null Model (\(M_{\\text{null}}\)) (additive, assuming no interaction): $$ Y_i = \beta_0 + \beta_1 T_i + \beta_2 C_i + \varepsilon_i $$ 2. Unrestricted Alternative Model (\(M_{\\text{alt}}\)) (interactive, assuming interaction): $$ Y_i = \beta_0 + \beta_1 T_i + \beta_2 C_i + \beta_3 (T_i \times C_i) + \varepsilon_i $$ where: - \(Y_i\): The target outcome metric (\(target\\_col\)) for unit \(i\). - \(T_i\): The treatment group indicator (\(treatment\\_col\), e.g., \(0\) or \(1\)). - \(C_i\): The pre-period covariate (\(covariate\\_col\), e.g., device type or baseline revenue). - \(T_i \\times C_i\): The interaction/product term.

The Likelihood Ratio Test (LRT): Let \(\\ln L(M_{\\text{null}})\) and \(\\ln L(M_{\\text{alt}})\) be the maximized log-likelihood values of the nested models. The test statistic \(D\) is computed as: $$ D = 2 \left( \ln L(M_{\text{alt}}) - \ln L(M_{\text{null}}) \right) $$ Under the null hypothesis \(H_0: \\beta_3 = 0\) (no interaction), the test statistic \(D\) asymptotically follows a Chi-square distribution with degrees of freedom equal to the difference in the number of parameters: $$ D \sim \chi^2_{df_{\text{alt}} - df_{\text{null}}} = \chi^2_1 $$ (since we added exactly one interaction parameter, \(\\beta_3\)).

The resulting p-value is calculated as:
$$
p = 1 - F_{\\chi^2_1}(D)
$$
where $F$ is the cumulative distribution function of the Chi-square distribution with 1 degree of freedom.
If $p < 0.05$, the alternative model is selected, confirming that the treatment effect varies across levels of the covariate.
PARAMETER DESCRIPTION
df

The experimental dataset.

TYPE: DataFrame

treatment_col

Column containing treatment assignments.

TYPE: str

covariate_col

Column containing the target pre-period covariate under evaluation.

TYPE: str

target_col

Column containing the outcome response variable (\(Y\)).

TYPE: str

RETURNS DESCRIPTION
float

The calculated p-value of the Likelihood Ratio Test. A value \(< 0.05\) indicates a significant interaction.

TYPE: float

Source code in src\xpyrment\interactions\regression.py
def check_treatment_covariate_interaction(df: pd.DataFrame, treatment_col: str, covariate_col: str, target_col: str) -> float:
    r"""Computes a Likelihood Ratio Test (LRT) to check if a covariate significantly interacts with the treatment split.

    Evaluates whether the treatment effect varies across different values of a pre-period covariate.
    To determine if the interaction term is statistically necessary (rather than just overfitting the sample),
    we fit nested regression models and perform a classical Likelihood Ratio Test.

    Mathematical Formulation of Nested Models:
        We define two models representing competing hypotheses:
        1. **Restricted Null Model ($M_{\\text{null}}$)** (additive, assuming no interaction):
           $$
           Y_i = \\beta_0 + \\beta_1 T_i + \\beta_2 C_i + \\varepsilon_i
           $$
        2. **Unrestricted Alternative Model ($M_{\\text{alt}}$)** (interactive, assuming interaction):
           $$
           Y_i = \\beta_0 + \\beta_1 T_i + \\beta_2 C_i + \\beta_3 (T_i \\times C_i) + \\varepsilon_i
           $$
        where:
        - $Y_i$: The target outcome metric ($target\\_col$) for unit $i$.
        - $T_i$: The treatment group indicator ($treatment\\_col$, e.g., $0$ or $1$).
        - $C_i$: The pre-period covariate ($covariate\\_col$, e.g., device type or baseline revenue).
        - $T_i \\times C_i$: The interaction/product term.

    The Likelihood Ratio Test (LRT):
        Let $\\ln L(M_{\\text{null}})$ and $\\ln L(M_{\\text{alt}})$ be the maximized log-likelihood values of the nested models.
        The test statistic $D$ is computed as:
        $$
        D = 2 \\left( \\ln L(M_{\\text{alt}}) - \\ln L(M_{\\text{null}}) \\right)
        $$
        Under the null hypothesis $H_0: \\beta_3 = 0$ (no interaction), the test statistic $D$ asymptotically follows a
        Chi-square distribution with degrees of freedom equal to the difference in the number of parameters:
        $$
        D \\sim \\chi^2_{df_{\\text{alt}} - df_{\\text{null}}} = \\chi^2_1
        $$
        (since we added exactly one interaction parameter, $\\beta_3$).

        The resulting p-value is calculated as:
        $$
        p = 1 - F_{\\chi^2_1}(D)
        $$
        where $F$ is the cumulative distribution function of the Chi-square distribution with 1 degree of freedom.
        If $p < 0.05$, the alternative model is selected, confirming that the treatment effect varies across levels of the covariate.

    Args:
        df (pd.DataFrame): The experimental dataset.
        treatment_col (str): Column containing treatment assignments.
        covariate_col (str): Column containing the target pre-period covariate under evaluation.
        target_col (str): Column containing the outcome response variable ($Y$).

    Returns:
        float: The calculated p-value of the Likelihood Ratio Test. A value $< 0.05$ indicates a significant interaction.
    """
    import statsmodels.formula.api as smf
    from scipy.stats import chi2

    # Model 1: Additive (Restricted)
    formula_null = f"{target_col} ~ {treatment_col} + {covariate_col}"
    model_null = smf.ols(formula_null, data=df).fit()

    # Model 2: Interactive (Unrestricted)
    formula_alt = f"{target_col} ~ {treatment_col} * {covariate_col}"
    model_alt = smf.ols(formula_alt, data=df).fit()

    # Likelihood Ratio Test
    # For OLS, LRT = n * ln(RSS_null / RSS_alt)
    # But statsmodels OLS results have .llf (log-likelihood)
    llf_null = model_null.llf
    llf_alt = model_alt.llf

    lr_stat = 2 * (llf_alt - llf_null)
    p_value = chi2.sf(lr_stat, df=1)

    return float(p_value)

calculate_shap_interactions

calculate_shap_interactions(
    model: Any, X_data: Any
) -> Union[list, ndarray]

Computes SHAP interaction values to decompose multi-factor combinations (computationally expensive).

SHAP (SHapley Additive exPlanations) interaction values (Lundberg et al., 2018) are based on the coalitional game-theoretic Shapley Interaction Index (Grabisch and Roubens, 1999). While standard Shapley values partition a model's prediction additively among individual features, SHAP interaction values separate these contributions into pure main effects and pairwise interaction effects, providing an exact model-agnostic representation of how features cooperate or compete.

Mathematical Formulation and Coalition Deficits

Let \(M\) be the complete set of all features. The SHAP interaction value \(\\Phi_{i,j}\) between feature \(i\) and feature \(j\) (where \(i \\neq j\)) measures the pure interaction effect after accounting for all other subsets of features: $$ \Phi_{i,j} = \sum_{S \subseteq M \setminus \{i, j\}} \frac{|S|! (|M| - |S| - 2)!}{2 (|M| - 1)!} \Delta_{i,j}(S) $$ where the second-order marginal contribution difference \(\\Delta_{i,j}(S)\) is defined as: $$ \Delta_{i,j}(S) = f(S \cup \{i, j\}) - f(S \cup \{i\}) - f(S \cup \{j\}) + f(S) $$ The diagonal elements \(\\Phi_{i,i}\) capture the main effect of feature \(i\) after removing all of its pairwise interactions with other features: $$ \Phi_{i,i} = \phi_i - \sum_{j \neq i} \Phi_{i,j} $$ where \(\\phi_i\) is the standard Shapley value for feature \(i\).

The complete set of main effects and interaction values decomposes the model prediction \(f(x)\) exactly: $$ f(x) = \Phi_0 + \sum_{i=1}^{|M|} \Phi_{i,i} + \sum_{i \neq j} \Phi_{i,j} $$ where \(\\Phi_0 = E[f(x)]\) is the base value (expected prediction of the model across the background training distribution).

Computational Complexity

Evaluating the summation requires computing model predictions across \(2^{|M|}\) feature coalitions, which is NP-hard in the general case. To make this practical, modern libraries utilize TreeSHAP (Lundberg et al., 2020), which calculates exact tree-based Shapley interaction values in \(O(T L D^2)\) time where \(T\) is the number of trees, \(L\) is max leaves, and \(D\) is max depth.

PARAMETER DESCRIPTION
model

A trained model (typically a tree ensemble like XGBoost or LightGBM).

TYPE: Any

X_data

The background evaluation matrix of covariate features.

TYPE: Any

RETURNS DESCRIPTION
list

A nested list or 3D numpy array of shape (num_samples, num_features, num_features) containing individual Shapley interaction matrices.

TYPE: Union[list, ndarray]

Source code in src\xpyrment\interactions\shap.py
def calculate_shap_interactions(model: Any, X_data: Any) -> Union[list, np.ndarray]:
    r"""Computes SHAP interaction values to decompose multi-factor combinations (computationally expensive).

    SHAP (SHapley Additive exPlanations) interaction values (Lundberg et al., 2018) are based on the coalitional game-theoretic
    Shapley Interaction Index (Grabisch and Roubens, 1999). While standard Shapley values partition a model's prediction
    additively among individual features, SHAP interaction values separate these contributions into pure main effects and
    pairwise interaction effects, providing an exact model-agnostic representation of how features cooperate or compete.

    Mathematical Formulation and Coalition Deficits:
        Let $M$ be the complete set of all features. The SHAP interaction value $\\Phi_{i,j}$ between feature $i$ and feature $j$
        (where $i \\neq j$) measures the pure interaction effect after accounting for all other subsets of features:
        $$
        \\Phi_{i,j} = \\sum_{S \\subseteq M \\setminus \\{i, j\\}} \\frac{|S|! (|M| - |S| - 2)!}{2 (|M| - 1)!} \\Delta_{i,j}(S)
        $$
        where the second-order marginal contribution difference $\\Delta_{i,j}(S)$ is defined as:
        $$
        \\Delta_{i,j}(S) = f(S \\cup \\{i, j\\}) - f(S \\cup \\{i\\}) - f(S \\cup \\{j\\}) + f(S)
        $$
        The diagonal elements $\\Phi_{i,i}$ capture the main effect of feature $i$ after removing all of its pairwise interactions
        with other features:
        $$
        \\Phi_{i,i} = \\phi_i - \\sum_{j \\neq i} \\Phi_{i,j}
        $$
        where $\\phi_i$ is the standard Shapley value for feature $i$.

        The complete set of main effects and interaction values decomposes the model prediction $f(x)$ exactly:
        $$
        f(x) = \\Phi_0 + \\sum_{i=1}^{|M|} \\Phi_{i,i} + \\sum_{i \\neq j} \\Phi_{i,j}
        $$
        where $\\Phi_0 = E[f(x)]$ is the base value (expected prediction of the model across the background training distribution).

    Computational Complexity:
        Evaluating the summation requires computing model predictions across $2^{|M|}$ feature coalitions, which is NP-hard
        in the general case. To make this practical, modern libraries utilize TreeSHAP (Lundberg et al., 2020), which calculates
        exact tree-based Shapley interaction values in $O(T L D^2)$ time where $T$ is the number of trees, $L$ is max leaves,
        and $D$ is max depth.

    Args:
        model: A trained model (typically a tree ensemble like XGBoost or LightGBM).
        X_data: The background evaluation matrix of covariate features.

    Returns:
        list: A nested list or 3D numpy array of shape `(num_samples, num_features, num_features)` containing
            individual Shapley interaction matrices.
    """
    try:
        import shap
    except ImportError as e:
        raise ImportError(
            "The 'shap' library is required to calculate SHAP interactions. "
            "Please install it using: pip install shap"
        ) from e

    explainer = shap.TreeExplainer(model)
    interaction_values = explainer.shap_interaction_values(X_data)

    return interaction_values

compute_friedman_h_statistic

compute_friedman_h_statistic(
    model: Any, X_data: Any, feature_i: str, feature_j: str
) -> float

Computes model-agnostic Friedman's H-statistic representing the degree of interaction between two features.

Friedman's H-statistic (Friedman and Popescu, 2008) measures the strength of interaction between features by evaluating how much of the model's prediction variation is due to joint, non-additive behavior. Unlike linear regression product terms, the H-statistic is model-agnostic and can capture highly complex, non-linear interactions in machine learning models (e.g., gradient boosted trees, neural networks).

Mathematical Formulation and Partial Dependence

Let \(x_i\) and \(x_j\) be two features. Let \(PD_i(x_i)\) and \(PD_j(x_j)\) be the 1-way Partial Dependence (PD) functions, which represent the average prediction of the model when fixing the respective feature value: $$ PD_i(x_i) = \frac{1}{N} \sum_{k=1}^N f(x_i, \ x_{k, \setminus i}) $$ Let \(PD_{ij}(x_i, \\ x_j)\) be the 2-way joint Partial Dependence function: $$ PD_{ij}(x_i, \ x_j) = \frac{1}{N} \sum_{k=1}^N f(x_i, \ x_j, \ x_{k, \setminus \{i, j\}}) $$ If there is no interaction between \(x_i\) and \(x_j\) (meaning their combined effect on the prediction is perfectly additive), then the joint PD can be decomposed exactly as the sum of their individual PD functions: $$ PD_{ij}(x_i, \ x_j) = PD_i(x_i) + PD_j(x_j) $$ Friedman's \(H^2_{ij}\) statistic measures the normalized squared deviation from this additive null hypothesis over the empirical distribution of the dataset: $$ H^2_{ij} = \frac{\sum_{k=1}^N \left[ PD_{ij}(x_{k,i}, \ x_{k,j}) - PD_i(x_{k,i}) - PD_j(x_{k,j}) \right]^2}{\sum_{k=1}^N \left[ PD_{ij}(x_{k,i}, \ x_{k,j}) \right]^2} $$

Interpretation of the H-Statistic: - \(H^2_{ij} = 0\): No interaction whatsoever. The features affect the response in a perfectly additive manner. - \(H^2_{ij} = 1.0\): The combined prediction depends entirely on their interaction; the individual main effects explain \(0\\%\) of the joint variation. - In practice, a value of \(H_{ij} > 0.10\) (representing the square root of \(H^2_{ij}\)) suggests a substantial, non-negligible interaction effect.

PARAMETER DESCRIPTION
model

A trained, black-box machine learning model (must implement .predict() or .predict_proba()).

TYPE: Any

X_data

The covariate dataset used to evaluate the partial dependencies.

TYPE: Any

feature_i

Name of the first target feature.

TYPE: str

feature_j

Name of the second target feature.

TYPE: str

RETURNS DESCRIPTION
float

The computed Friedman's H-statistic (\(H_{ij} \\in [0, 1]\)).

TYPE: float

Source code in src\xpyrment\interactions\hstat.py
def compute_friedman_h_statistic(model: Any, X_data: Any, feature_i: str, feature_j: str) -> float:
    r"""Computes model-agnostic Friedman's H-statistic representing the degree of interaction between two features.

    Friedman's H-statistic (Friedman and Popescu, 2008) measures the strength of interaction between features
    by evaluating how much of the model's prediction variation is due to joint, non-additive behavior.
    Unlike linear regression product terms, the H-statistic is model-agnostic and can capture highly complex,
    non-linear interactions in machine learning models (e.g., gradient boosted trees, neural networks).

    Mathematical Formulation and Partial Dependence:
        Let $x_i$ and $x_j$ be two features. Let $PD_i(x_i)$ and $PD_j(x_j)$ be the 1-way Partial Dependence (PD) functions,
        which represent the average prediction of the model when fixing the respective feature value:
        $$
        PD_i(x_i) = \\frac{1}{N} \\sum_{k=1}^N f(x_i, \\ x_{k, \\setminus i})
        $$
        Let $PD_{ij}(x_i, \\ x_j)$ be the 2-way joint Partial Dependence function:
        $$
        PD_{ij}(x_i, \\ x_j) = \\frac{1}{N} \\sum_{k=1}^N f(x_i, \\ x_j, \\ x_{k, \\setminus \\{i, j\\}})
        $$
        If there is no interaction between $x_i$ and $x_j$ (meaning their combined effect on the prediction is perfectly additive),
        then the joint PD can be decomposed exactly as the sum of their individual PD functions:
        $$
        PD_{ij}(x_i, \\ x_j) = PD_i(x_i) + PD_j(x_j)
        $$
        Friedman's $H^2_{ij}$ statistic measures the normalized squared deviation from this additive null hypothesis
        over the empirical distribution of the dataset:
        $$
        H^2_{ij} = \\frac{\\sum_{k=1}^N \\left[ PD_{ij}(x_{k,i}, \\ x_{k,j}) - PD_i(x_{k,i}) - PD_j(x_{k,j}) \\right]^2}{\\sum_{k=1}^N \\left[ PD_{ij}(x_{k,i}, \\ x_{k,j}) \\right]^2}
        $$
    Interpretation of the H-Statistic:
        - $H^2_{ij} = 0$: No interaction whatsoever. The features affect the response in a perfectly additive manner.
        - $H^2_{ij} = 1.0$: The combined prediction depends entirely on their interaction; the individual main effects explain
          $0\\%$ of the joint variation.
        - In practice, a value of $H_{ij} > 0.10$ (representing the square root of $H^2_{ij}$) suggests a substantial,
          non-negligible interaction effect.

    Args:
        model: A trained, black-box machine learning model (must implement `.predict()` or `.predict_proba()`).
        X_data: The covariate dataset used to evaluate the partial dependencies.
        feature_i (str): Name of the first target feature.
        feature_j (str): Name of the second target feature.

    Returns:
        float: The computed Friedman's H-statistic ($H_{ij} \\in [0, 1]$).
    """
    import numpy as np
    import pandas as pd

    # Ensure X_data is a DataFrame for easy column indexing
    if not isinstance(X_data, pd.DataFrame):
        X_data = pd.DataFrame(X_data)

    # We evaluate PDs over the empirical distribution of X_data
    n = len(X_data)
    pd_ij = np.zeros(n)
    pd_i = np.zeros(n)
    pd_j = np.zeros(n)

    for k in range(n):
        xi_k = X_data.iloc[k][feature_i]
        xj_k = X_data.iloc[k][feature_j]

        # PD_ij(xi_k, xj_k)
        # We need the average of f(xi_k, xj_k, X_\setminus ij)
        X_ij = X_data.copy()
        X_ij[feature_i] = xi_k
        X_ij[feature_j] = xj_k
        pd_ij[k] = np.mean(model.predict(X_ij))

        # PD_i(xi_k)
        X_i = X_data.copy()
        X_i[feature_i] = xi_k
        pd_i[k] = np.mean(model.predict(X_i))

        # PD_j(xj_k)
        X_j = X_data.copy()
        X_j[feature_j] = xj_k
        pd_j[k] = np.mean(model.predict(X_j))

    # Center the PDs as per Friedman (2008) to remove grand mean effects
    # In some implementations, this is handled by the differencing

    numerator = np.sum((pd_ij - pd_i - pd_j)**2)
    denominator = np.sum(pd_ij**2)

    if denominator == 0:
        return 0.0

    h2 = numerator / denominator
    return float(np.sqrt(max(0, h2)))

plot_interaction_heatmap

plot_interaction_heatmap(
    df_interactions: DataFrame,
    annot: Optional[bool] = None,
    ax: Optional[Axes] = None,
    **kwargs
) -> tuple

Generates an interaction term heatmap using matplotlib.

Visualizes a symmetric matrix of feature/factor interactions. Heatmaps are a highly effective diagnostic chart for screening complex multi-factor studies or high-dimensional covariate sets, allowing the user to instantly recognize clusters of strong synergy or severe interference.

Matrix Structure

Let \(F = \{f_1, f_2, \dots, f_m\}\) be the set of analyzed factors or covariates. The plotting engine constructs a symmetric \(m \times m\) matrix \(H\): - Cell \(H_{i,j}\) contains the strength of the interaction between \(f_i\) and \(f_j\). This value can represent either: 1. The absolute regression interaction coefficient (\(|\beta_{\text{interaction}}|\)). 2. The model-agnostic Friedman's H-statistic (\(H_{ij}\)). 3. The statistical significance transformed index (\(-\log_{10}(p_{\text{value}})\)). - Cells along the diagonal (\(H_{i,i}\)) are typically zeroed or set to represent the main effect of factor \(f_i\). - The matrix is rendered using a divergent colormap (such as RdBu or seismic if mapping positive/negative coefficients) or a sequential colormap (such as Viridis or YlOrRd if mapping absolute H-statistics or significance).

PARAMETER DESCRIPTION
df_interactions

A rectangular or pivoted DataFrame representing the interaction strength matrix, with factor names as both index and column headings.

TYPE: DataFrame

annot

Whether to annotate the cells with numeric values. If None, annotations are enabled automatically only if the matrix size is small (e.g., <= 20 features).

TYPE: bool DEFAULT: None

ax

Pre-existing axes for the plot. If None, a new figure and axes are created.

TYPE: Axes DEFAULT: None

**kwargs

Additional keyword arguments to pass to seaborn.heatmap.

DEFAULT: {}

RETURNS DESCRIPTION
tuple

A tuple (fig, ax) containing: - fig (matplotlib.figure.Figure): The active matplotlib Figure canvas. - ax (matplotlib.axes.Axes): The axes container housing the rendered heatmap.

TYPE: tuple

Source code in src\xpyrment\interactions\plots.py
def plot_interaction_heatmap(
    df_interactions: pd.DataFrame,
    annot: Optional[bool] = None,
    ax: Optional[plt.Axes] = None,
    **kwargs
) -> tuple:
    r"""Generates an interaction term heatmap using matplotlib.

    Visualizes a symmetric matrix of feature/factor interactions. Heatmaps are a highly effective diagnostic
    chart for screening complex multi-factor studies or high-dimensional covariate sets, allowing the user
    to instantly recognize clusters of strong synergy or severe interference.

    Matrix Structure:
        Let $F = \{f_1, f_2, \dots, f_m\}$ be the set of analyzed factors or covariates. The plotting engine constructs
        a symmetric $m \times m$ matrix $H$:
        - Cell $H_{i,j}$ contains the strength of the interaction between $f_i$ and $f_j$. This value can represent
          either:
          1. The absolute regression interaction coefficient ($|\beta_{\text{interaction}}|$).
          2. The model-agnostic Friedman's H-statistic ($H_{ij}$).
          3. The statistical significance transformed index ($-\log_{10}(p_{\text{value}})$).
        - Cells along the diagonal ($H_{i,i}$) are typically zeroed or set to represent the main effect of factor $f_i$.
        - The matrix is rendered using a divergent colormap (such as `RdBu` or `seismic` if mapping positive/negative coefficients)
          or a sequential colormap (such as `Viridis` or `YlOrRd` if mapping absolute H-statistics or significance).

    Args:
        df_interactions (pd.DataFrame): A rectangular or pivoted DataFrame representing the interaction strength matrix,
            with factor names as both index and column headings.
        annot (bool, optional): Whether to annotate the cells with numeric values. If None,
            annotations are enabled automatically only if the matrix size is small (e.g., <= 20 features).
        ax (matplotlib.axes.Axes, optional): Pre-existing axes for the plot. If None, a new figure
            and axes are created.
        **kwargs: Additional keyword arguments to pass to `seaborn.heatmap`.

    Returns:
        tuple: A tuple `(fig, ax)` containing:
            - `fig` (matplotlib.figure.Figure): The active matplotlib Figure canvas.
            - `ax` (matplotlib.axes.Axes): The axes container housing the rendered heatmap.
    """
    # Validate symmetric square matrix
    if df_interactions.shape[0] != df_interactions.shape[1]:
        raise ValueError(f"Interaction matrix must be square, got shape {df_interactions.shape}")
    if list(df_interactions.index) != list(df_interactions.columns):
        raise ValueError("Interaction matrix must have matching index and columns")

    num_features = df_interactions.shape[0]

    # Determine whether to annotate cells
    if annot is None:
        # Heuristic: only annotate for smaller matrices to keep the plot readable and fast
        annot_flag = num_features <= 20
    else:
        annot_flag = annot

    # Determine the appropriate colormap based on the numeric data values
    numeric_df = df_interactions.select_dtypes(include=[np.number])
    if numeric_df.empty:
        raise TypeError("Interaction matrix must contain numeric columns to plot heatmap.")

    has_negative_values = (numeric_df.to_numpy() < 0).any()
    cmap = kwargs.pop("cmap", "RdBu_r" if has_negative_values else "YlOrRd")

    if ax is None:
        # Calculate a dynamic figure size based on the number of features
        fig_size = max(8, min(24, num_features * 1.5))
        fig, ax_to_use = plt.subplots(figsize=(fig_size, fig_size * 0.8))
    else:
        fig = ax.figure
        ax_to_use = ax

    # Calculate symmetric vmin/vmax for divergent colormaps
    heatmap_kwargs = {
        "annot": annot_flag,
        "fmt": ".3g",
        "cmap": cmap,
        "ax": ax_to_use,
        "square": True,
        "cbar_kws": {'label': 'Interaction Strength'}
    }

    if has_negative_values:
        max_abs = np.abs(numeric_df.to_numpy()).max()
        heatmap_kwargs["vmin"] = -max_abs
        heatmap_kwargs["vmax"] = max_abs
        heatmap_kwargs["center"] = 0
    else:
        heatmap_kwargs["vmin"] = 0

    heatmap_kwargs.update(kwargs)

    # Render the heatmap
    sns.heatmap(df_interactions, **heatmap_kwargs)

    # Add title and adjust layout
    ax_to_use.set_title("Factor Interaction Heatmap", pad=20, fontsize=14, fontweight="bold")
    ax_to_use.set_xlabel("Factor", fontweight="bold")
    ax_to_use.set_ylabel("Factor", fontweight="bold")

    if ax is None:
        fig.tight_layout()

    return fig, ax_to_use

plot_interaction_effects

plot_interaction_effects(
    data: DataFrame,
    treatment_col: str,
    metric_col: str,
    covariate_col: str,
    ax: Optional[Axes] = None,
    **kwargs
) -> tuple

Plots interaction effects between a treatment and a covariate on a metric.

Generates a line plot showing the average metric value for different treatment groups across levels of the covariate. This helps visualize if the treatment effect varies depending on the covariate value (heterogeneous treatment effect).

Example
from xpyrment.interactions import plot_interaction_effects

plot_interaction_effects(data, "treatment", "revenue", "country")
PARAMETER DESCRIPTION
data

The experimental data containing treatments, covariates, and metrics.

TYPE: DataFrame

treatment_col

The name of the column representing the treatment group.

TYPE: str

metric_col

The name of the column representing the outcome metric.

TYPE: str

covariate_col

The name of the column representing the interacting covariate.

TYPE: str

ax

Pre-existing axes for the plot. If None, a new figure and axes are created.

TYPE: Axes DEFAULT: None

**kwargs

Additional keyword arguments to pass to the underlying plotting functions.

DEFAULT: {}

RETURNS DESCRIPTION
tuple

A tuple (fig, ax) containing the matplotlib Figure and Axes objects.

TYPE: tuple

Source code in src\xpyrment\interactions\plots.py
def plot_interaction_effects(
    data: pd.DataFrame,
    treatment_col: str,
    metric_col: str,
    covariate_col: str,
    ax: Optional[plt.Axes] = None,
    **kwargs
) -> tuple:
    """Plots interaction effects between a treatment and a covariate on a metric.

    Generates a line plot showing the average metric value for different treatment
    groups across levels of the covariate. This helps visualize if the treatment
    effect varies depending on the covariate value (heterogeneous treatment effect).

    Example:
        ```python
        from xpyrment.interactions import plot_interaction_effects

        plot_interaction_effects(data, "treatment", "revenue", "country")
        ```

    Args:
        data (pd.DataFrame): The experimental data containing treatments, covariates, and metrics.
        treatment_col (str): The name of the column representing the treatment group.
        metric_col (str): The name of the column representing the outcome metric.
        covariate_col (str): The name of the column representing the interacting covariate.
        ax (matplotlib.axes.Axes, optional): Pre-existing axes for the plot. If None, a new figure
            and axes are created.
        **kwargs: Additional keyword arguments to pass to the underlying plotting functions.

    Returns:
        tuple: A tuple `(fig, ax)` containing the matplotlib Figure and Axes objects.
    """
    num_treatments = data[treatment_col].nunique()

    if ax is None:
        fig, ax_to_use = plt.subplots(figsize=(10, 6))
    else:
        fig = ax.figure
        ax_to_use = ax

    # Drop NaNs to prevent plotting issues
    plot_data = data.dropna(subset=[treatment_col, metric_col, covariate_col]).copy()

    # Check if covariate is numeric or categorical
    if pd.api.types.is_numeric_dtype(plot_data[covariate_col]) and plot_data[covariate_col].nunique() > 10:
        # For continuous numeric covariates, use a scatter plot with regression lines

        # Get default seaborn color palette
        palette = sns.color_palette(n_colors=num_treatments)
        treatments = sorted(plot_data[treatment_col].unique())

        for idx, treatment in enumerate(treatments):
            group_data = plot_data[plot_data[treatment_col] == treatment]

            # Combine default and provided kwargs
            regplot_kwargs = {
                "scatter_kws": {"alpha": 0.5},
                "label": str(treatment),
                "color": palette[idx]
            }
            regplot_kwargs.update(kwargs)

            sns.regplot(
                data=group_data,
                x=covariate_col,
                y=metric_col,
                ax=ax_to_use,
                **regplot_kwargs
            )

        ax_to_use.legend(title=treatment_col)
    else:
        # For categorical or discrete covariates, use a point plot (interaction plot)

        # Ensure we have enough markers and linestyles for all treatment groups
        all_markers = ["o", "s", "D", "^", "v", "<", ">", "p", "*", "h", "H", "+", "x", "X", "d", "|", "_"]
        all_linestyles = ["-", "--", "-.", ":"] * ((num_treatments // 4) + 1)

        pointplot_kwargs = {
            "dodge": True,
            "markers": all_markers[:num_treatments],
            "linestyles": all_linestyles[:num_treatments]
        }
        pointplot_kwargs.update(kwargs)

        sns.pointplot(
            data=plot_data,
            x=covariate_col,
            y=metric_col,
            hue=treatment_col,
            ax=ax_to_use,
            **pointplot_kwargs
        )

    ax_to_use.set_title(f"Interaction Effect of {treatment_col} and {covariate_col} on {metric_col}", pad=15)
    ax_to_use.set_xlabel(covariate_col.capitalize())
    ax_to_use.set_ylabel(metric_col.capitalize())

    if ax is None:
        fig.tight_layout()

    return fig, ax_to_use