Interactions Module

The xpyrment.interactions module contains submodules and components for interactions.

interactions

Multi-factor experimental design and covariate-treatment interaction analysis.

This package provides tools to discover, model, and visualize interactions within experimental studies. Understanding interactions is critical to determine whether multiple treatments conflict or work synergistically, and whether a treatment effect is heterogeneous across user characteristics.

Submodules: - detector: Orchestrates and dispatches multi-model interaction checks across covariates and factors. - anova: Performs Factorial Analysis of Variance (ANOVA) and partitions sum of squares. - regression: Fits interactive regression models and conducts Likelihood Ratio Tests (LRT). - hstat: Quantifies model-agnostic interaction strengths using Friedman's H-statistic. - shap: Calculates second-order game-theoretic Shapley interaction values (TreeSHAP). - plots: Renders symmetric heatmaps and diagnostic visualization charts.

MODULE	DESCRIPTION
`anova`	Factorial Analysis of Variance (ANOVA) and statistical interaction testing.
`detector`	Interaction dispatcher and multi-model statistical screening.
`hstat`	Model-agnostic interaction measurement using Friedman's H-statistic.
`plots`	Visualizations and plot generation utilities for multi-factor interactions.
`regression`	Interactive regression modeling and Likelihood Ratio Testing (LRT).
`shap`	Game-theoretic feature interaction analysis using SHAP interaction indices.

CLASS	DESCRIPTION
`InteractionDetector`	Dispatches multi-factor, covariate-treatment, and non-linear interactions.

FUNCTION	DESCRIPTION
`run_factorial_anova`	Computes factorial ANOVA tables with interaction terms for DoE factors.
`check_treatment_covariate_interaction`	Computes a Likelihood Ratio Test (LRT) to check if a covariate significantly interacts with the treatment split.
`calculate_shap_interactions`	Computes SHAP interaction values to decompose multi-factor combinations (computationally expensive).
`compute_friedman_h_statistic`	Computes model-agnostic Friedman's H-statistic representing the degree of interaction between two features.
`plot_interaction_heatmap`	Generates an interaction term heatmap using matplotlib.
`plot_interaction_effects`	Plots interaction effects between a treatment and a covariate on a metric.

InteractionDetector

InteractionDetector(experiment)

Dispatches multi-factor, covariate-treatment, and non-linear interactions.

In complex online and physical experiments, interventions rarely operate in a vacuum. The treatment effect of one change may depend heavily on the status of other features (multi-factor interaction) or the characteristics of the experimental unit (covariate-treatment interaction, or Heterogeneous Treatment Effect). The InteractionDetector screens for these interactions automatically.

Mathematical Interaction Categories

Multi-Factor Synergy or Interference (Factor-Factor Interaction): Evaluates whether combining Treatment 1 ($T_1$) and Treatment 2 ($T_2$) yields a response that differs from the sum of their individual effects: $$ Y = \beta_0 + \beta_1 T_1 + \beta_2 T_2 + \beta_3 (T_1 \times T_2) + \varepsilon $$
If $\\beta_3 > 0$, the factors are synergistic.
If $\\beta_3 < 0$, the factors interfere with each other (redundancy or collision).
Covariate-Treatment Interaction (Heterogeneous Treatment Effects - HTE): Evaluates whether the treatment effect varies systematically across pre-experiment characteristics $C$ (e.g., country, browser, mobile vs. desktop, or historical spending): $$ Y = \beta_0 + \beta_1 T + \beta_2 C + \beta_3 (T \times C) + \varepsilon $$ A significant $\\beta_3$ ($p < 0.05$) indicates that the treatment effect varies across subpopulations.
Response Surface Curvature (Non-linear Interaction): In continuous Design of Experiments (DoE), evaluates quadratic curvatures and continuous interaction slopes: $$ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1^2 + \beta_4 X_2^2 + \beta_5 (X_1 \times X_2) + \varepsilon $$ where $\\beta_5$ captures the continuous twisting of the response landscape, and $\\beta_3, \\beta_4$ capture curvature.

Algorithmic Screening Workflow

function detect_all(experiment):
    Initialize interaction_results = {}
    For each pair of factors (A, B) in design:
        Run OLS: Y ~ A * B
        Extract interaction p-value.
        If p-value < 0.05:
            Add to interaction_results["factor_factor"]

    For each covariate C in experiment.covariates:
        Run OLS: Y ~ Treatment * C
        Extract interaction p-value.
        If p-value < 0.05:
            Add to interaction_results["heterogeneous_treatment_effects"]

    Return interaction_results

ATTRIBUTE	DESCRIPTION
`experiment`	The completed or active experiment container. TYPE: `Experiment`

PARAMETER	DESCRIPTION
`experiment`	The experiment container containing datasets, metrics, and factor definitions. TYPE: `Experiment`

METHOD	DESCRIPTION
`detect_all`	Runs ANOVA and regression checks to identify interaction terms across factors and covariates.

Source code in src\xpyrment\interactions\detector.py

def __init__(self, experiment):
    """Initializes an InteractionDetector.

    Args:
        experiment (Experiment): The experiment container containing datasets, metrics, and factor definitions.
    """
    self.experiment = experiment

detect_all

detect_all() -> dict

Runs ANOVA and regression checks to identify interaction terms across factors and covariates.

Fits multi-variable linear regression models with product terms and flags statistically significant interactions.

RETURNS	DESCRIPTION
`dict`	A dictionary grouping detected interactions, their estimated coefficients, standard errors, and p-values. TYPE: `dict`

Source code in src\xpyrment\interactions\detector.py

def detect_all(self) -> dict:
    """Runs ANOVA and regression checks to identify interaction terms across factors and covariates.

    Fits multi-variable linear regression models with product terms and flags statistically
    significant interactions.

    Returns:
        dict: A dictionary grouping detected interactions, their estimated coefficients, standard errors,
            and p-values.
    """
    from xpyrment.interactions.regression import check_treatment_covariate_interaction

    results = {
        "factor_factor": [],
        "heterogeneous_treatment_effects": []
    }

    df = self.experiment.data
    treatment_col = self.experiment.treatment_col
    target_metrics = [m.name for m in self.experiment.metrics]
    # Fallback: if no metrics registered, but columns exist, we could guess, 
    # but let's stick to registered metrics.
    if not target_metrics:
        # If orchestrator was used, metrics might be registered there, 
        # but usually they are in experiment.metrics
        pass
    covariates = self.experiment.covariates

    # 1. Screen for Heterogeneous Treatment Effects (Covariate-Treatment)
    for metric in target_metrics:
        for cov in covariates:
            p_val = check_treatment_covariate_interaction(df, treatment_col, cov, metric)
            if p_val < 0.05:
                results["heterogeneous_treatment_effects"].append({
                    "metric": metric,
                    "covariate": cov,
                    "p_value": p_val
                })

    # 2. Factor-Factor Interactions (if multiple factors exist)
    # Note: In xpyrment, factors are often encoded in the variant column or separate columns.
    # For simplicity, we assume if treatment_col contains more than 2 variants, we check them.
    # But a more robust way is to check if the user registered multiple factors.
    # For now, let's just stick to the docstring's plan of OLS: Y ~ T * C.

    return results

run_factorial_anova

run_factorial_anova(
    df: DataFrame, formula: str
) -> DataFrame

Computes factorial ANOVA tables with interaction terms for DoE factors.

Factorial ANOVA decomposes the total variability of an experimental outcome into portions attributable to main factor effects, multi-factor interaction effects, and random error. This is crucial for verifying which process factors have a statistically significant impact on the response variable, and whether factors behave synergetically or antagonistically when combined.

Mathematical Formulation

For a two-factor experimental design (Factor $A$ with $I$ levels, Factor $B$ with $J$ levels, and $K$ replicates per cell), the response $Y_{ijk}$ is modeled as: $$ Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta){ij} + \varepsilon{ijk} $$ where: - $\\mu$: The grand mean of the response. - $\\alpha_i$: The main effect of Factor $A$ at level $i$ (subject to $\\sum_{i=1}^I \\alpha_i = 0$). - $\\beta_j$: The main effect of Factor $B$ at level $j$ (subject to $\\sum_{j=1}^J \\beta_j = 0$). - $(\\alpha\\beta)_{ij}$ smokes: The interaction effect between Factor $A$ at level $i$ and Factor $B$ at level $j$ (subject to $\\sum_{i=1}^I (\\alpha\\beta)_{ij} = \\sum_{j=1}^J (\\alpha\\beta)_{ij} = 0$). - $\\varepsilon_{ijk}$: Independent and identically distributed normal error terms, $\\varepsilon_{ijk} \\sim \\mathcal{N}(0, \\sigma^2)$.

Decomposition of Sum of Squares (SS): The total sum of squares ($SS_{\\text{Total}}$) measures total sample variation: $$ SS_{\text{Total}} = SS_A + SS_B + SS_{AB} + SS_{\text{Error}} $$ where: - $SS_A = J K \\sum_{i=1}^I (\\bar{Y}_{i\\cdot\\cdot} - \\bar{Y}_{\\cdot\\cdot\\cdot})^2$ (Main effect $A$) - $SS_B = I K \\sum_{j=1}^J (\\bar{Y}_{\\cdot j\\cdot} - \\bar{Y}_{\\cdot\\cdot\\cdot})^2$ (Main effect $B$) - $SS_{AB} = K \\sum_{i=1}^I \\sum_{j=1}^J (\\bar{Y}_{ij\\cdot} - \\bar{Y}_{i\\cdot\\cdot} - \\bar{Y}_{\\cdot j\\cdot} + \\bar{Y}_{\\cdot\\cdot\\cdot})^2$ (Interaction effect $AB$) - $SS_{\\text{Error}} = \\sum_{i=1}^I \\sum_{j=1}^J \\sum_{k=1}^K (Y_{ijk} - \\bar{Y}_{ij\\cdot})^2$ (Residual variation)

F-Test Ratios

Significance of each effect is evaluated by comparing the Mean Square ($MS = SS / df$) against the residual Mean Square ($MS_{\\text{Error}}$): - For Factor $A$: $$ F_A = \frac{MS_A}{MS_{\text{Error}}} = \frac{SS_A / (I-1)}{SS_{\text{Error}} / [IJ(K-1)]} \sim F_{I-1, \ IJ(K-1)} $$ - For Interaction $AB$: $$ F_{AB} = \frac{MS_{AB}}{MS_{\text{Error}}} = \frac{SS_{AB} / [(I-1)(J-1)]}{SS_{\text{Error}} / [IJ(K-1)]} \sim F_{(I-1)(J-1), \ IJ(K-1)} $$ A significant $F_{AB}$ ($p < 0.05$) proves that the effect of Factor $A$ depends on the level of Factor $B$. This indicates that interpreting main effects alone is statistically misleading; the interaction must be evaluated.

PARAMETER	DESCRIPTION
`df`	The experimental dataset. TYPE: `DataFrame`
`formula`	R-style regression formula (e.g., `"revenue ~ factor_a * factor_b"`). TYPE: `str`

RETURNS	DESCRIPTION
`DataFrame`	pd.DataFrame: A standard ANOVA table detailing Sum of Squares, degrees of freedom ($df$), F-statistics, and p-values for each term.

Source code in src\xpyrment\interactions\anova.py

def run_factorial_anova(df: pd.DataFrame, formula: str) -> pd.DataFrame:
    r"""Computes factorial ANOVA tables with interaction terms for DoE factors.

    Factorial ANOVA decomposes the total variability of an experimental outcome into portions attributable
    to main factor effects, multi-factor interaction effects, and random error. This is crucial for verifying
    which process factors have a statistically significant impact on the response variable, and whether factors
    behave synergetically or antagonistically when combined.

    Mathematical Formulation:
        For a two-factor experimental design (Factor $A$ with $I$ levels, Factor $B$ with $J$ levels, and $K$ replicates per cell),
        the response $Y_{ijk}$ is modeled as:
        $$
        Y_{ijk} = \\mu + \\alpha_i + \\beta_j + (\\alpha\\beta)_{ij} + \\varepsilon_{ijk}
        $$
        where:
        - $\\mu$: The grand mean of the response.
        - $\\alpha_i$: The main effect of Factor $A$ at level $i$ (subject to $\\sum_{i=1}^I \\alpha_i = 0$).
        - $\\beta_j$: The main effect of Factor $B$ at level $j$ (subject to $\\sum_{j=1}^J \\beta_j = 0$).
        - $(\\alpha\\beta)_{ij}$ smokes: The interaction effect between Factor $A$ at level $i$ and Factor $B$ at level $j$
          (subject to $\\sum_{i=1}^I (\\alpha\\beta)_{ij} = \\sum_{j=1}^J (\\alpha\\beta)_{ij} = 0$).
        - $\\varepsilon_{ijk}$: Independent and identically distributed normal error terms, $\\varepsilon_{ijk} \\sim \\mathcal{N}(0, \\sigma^2)$.

    Decomposition of Sum of Squares (SS):
        The total sum of squares ($SS_{\\text{Total}}$) measures total sample variation:
        $$
        SS_{\\text{Total}} = SS_A + SS_B + SS_{AB} + SS_{\\text{Error}}
        $$
        where:
        - $SS_A = J K \\sum_{i=1}^I (\\bar{Y}_{i\\cdot\\cdot} - \\bar{Y}_{\\cdot\\cdot\\cdot})^2$ (Main effect $A$)
        - $SS_B = I K \\sum_{j=1}^J (\\bar{Y}_{\\cdot j\\cdot} - \\bar{Y}_{\\cdot\\cdot\\cdot})^2$ (Main effect $B$)
        - $SS_{AB} = K \\sum_{i=1}^I \\sum_{j=1}^J (\\bar{Y}_{ij\\cdot} - \\bar{Y}_{i\\cdot\\cdot} - \\bar{Y}_{\\cdot j\\cdot} + \\bar{Y}_{\\cdot\\cdot\\cdot})^2$ (Interaction effect $AB$)
        - $SS_{\\text{Error}} = \\sum_{i=1}^I \\sum_{j=1}^J \\sum_{k=1}^K (Y_{ijk} - \\bar{Y}_{ij\\cdot})^2$ (Residual variation)

    F-Test Ratios:
        Significance of each effect is evaluated by comparing the Mean Square ($MS = SS / df$) against the residual Mean Square ($MS_{\\text{Error}}$):
        - For Factor $A$:
          $$
          F_A = \\frac{MS_A}{MS_{\\text{Error}}} = \\frac{SS_A / (I-1)}{SS_{\\text{Error}} / [IJ(K-1)]} \\sim F_{I-1, \\ IJ(K-1)}
          $$
        - For Interaction $AB$:
          $$
          F_{AB} = \\frac{MS_{AB}}{MS_{\\text{Error}}} = \\frac{SS_{AB} / [(I-1)(J-1)]}{SS_{\\text{Error}} / [IJ(K-1)]} \\sim F_{(I-1)(J-1), \\ IJ(K-1)}
          $$
        A significant $F_{AB}$ ($p < 0.05$) proves that the effect of Factor $A$ depends on the level of Factor $B$. This indicates that
        interpreting main effects alone is statistically misleading; the interaction must be evaluated.

    Args:
        df (pd.DataFrame): The experimental dataset.
        formula (str): R-style regression formula (e.g., `"revenue ~ factor_a * factor_b"`).

    Returns:
        pd.DataFrame: A standard ANOVA table detailing Sum of Squares, degrees of freedom ($df$), F-statistics,
            and p-values for each term.
    """
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Fit ordinary least squares model using R-style formula
    model = smf.ols(formula, data=df).fit()

    # Compute Type II ANOVA table
    anova_table = sm.stats.anova_lm(model, typ=2)

    return anova_table

check_treatment_covariate_interaction

check_treatment_covariate_interaction(
    df: DataFrame,
    treatment_col: str,
    covariate_col: str,
    target_col: str,
) -> float

Computes a Likelihood Ratio Test (LRT) to check if a covariate significantly interacts with the treatment split.

Evaluates whether the treatment effect varies across different values of a pre-period covariate. To determine if the interaction term is statistically necessary (rather than just overfitting the sample), we fit nested regression models and perform a classical Likelihood Ratio Test.

Mathematical Formulation of Nested Models

We define two models representing competing hypotheses: 1. Restricted Null Model ($M_{\\text{null}}$) (additive, assuming no interaction): $$ Y_i = \beta_0 + \beta_1 T_i + \beta_2 C_i + \varepsilon_i $$ 2. Unrestricted Alternative Model ($M_{\\text{alt}}$) (interactive, assuming interaction): $$ Y_i = \beta_0 + \beta_1 T_i + \beta_2 C_i + \beta_3 (T_i \times C_i) + \varepsilon_i $$ where: - $Y_i$: The target outcome metric ($target\\_col$) for unit $i$. - $T_i$: The treatment group indicator ($treatment\\_col$, e.g., $0$ or $1$). - $C_i$: The pre-period covariate ($covariate\\_col$, e.g., device type or baseline revenue). - $T_i \\times C_i$: The interaction/product term.

The Likelihood Ratio Test (LRT): Let $\\ln L(M_{\\text{null}})$ and $\\ln L(M_{\\text{alt}})$ be the maximized log-likelihood values of the nested models. The test statistic $D$ is computed as: $$ D = 2 \left( \ln L(M_{\text{alt}}) - \ln L(M_{\text{null}}) \right) $$ Under the null hypothesis $H_0: \\beta_3 = 0$ (no interaction), the test statistic $D$ asymptotically follows a Chi-square distribution with degrees of freedom equal to the difference in the number of parameters: $$ D \sim \chi^2_{df_{\text{alt}} - df_{\text{null}}} = \chi^2_1 $$ (since we added exactly one interaction parameter, $\\beta_3$).

The resulting p-value is calculated as:
$$
p = 1 - F_{\\chi^2_1}(D)
$$
where $F$ is the cumulative distribution function of the Chi-square distribution with 1 degree of freedom.
If $p < 0.05$, the alternative model is selected, confirming that the treatment effect varies across levels of the covariate.

PARAMETER	DESCRIPTION
`df`	The experimental dataset. TYPE: `DataFrame`
`treatment_col`	Column containing treatment assignments. TYPE: `str`
`covariate_col`	Column containing the target pre-period covariate under evaluation. TYPE: `str`
`target_col`	Column containing the outcome response variable ($Y$). TYPE: `str`

RETURNS	DESCRIPTION
`float`	The calculated p-value of the Likelihood Ratio Test. A value $< 0.05$ indicates a significant interaction. TYPE: `float`

Source code in src\xpyrment\interactions\regression.py

def check_treatment_covariate_interaction(df: pd.DataFrame, treatment_col: str, covariate_col: str, target_col: str) -> float:
    r"""Computes a Likelihood Ratio Test (LRT) to check if a covariate significantly interacts with the treatment split.

    Evaluates whether the treatment effect varies across different values of a pre-period covariate.
    To determine if the interaction term is statistically necessary (rather than just overfitting the sample),
    we fit nested regression models and perform a classical Likelihood Ratio Test.

    Mathematical Formulation of Nested Models:
        We define two models representing competing hypotheses:
        1. **Restricted Null Model ($M_{\\text{null}}$)** (additive, assuming no interaction):
           $$
           Y_i = \\beta_0 + \\beta_1 T_i + \\beta_2 C_i + \\varepsilon_i
           $$
        2. **Unrestricted Alternative Model ($M_{\\text{alt}}$)** (interactive, assuming interaction):
           $$
           Y_i = \\beta_0 + \\beta_1 T_i + \\beta_2 C_i + \\beta_3 (T_i \\times C_i) + \\varepsilon_i
           $$
        where:
        - $Y_i$: The target outcome metric ($target\\_col$) for unit $i$.
        - $T_i$: The treatment group indicator ($treatment\\_col$, e.g., $0$ or $1$).
        - $C_i$: The pre-period covariate ($covariate\\_col$, e.g., device type or baseline revenue).
        - $T_i \\times C_i$: The interaction/product term.

    The Likelihood Ratio Test (LRT):
        Let $\\ln L(M_{\\text{null}})$ and $\\ln L(M_{\\text{alt}})$ be the maximized log-likelihood values of the nested models.
        The test statistic $D$ is computed as:
        $$
        D = 2 \\left( \\ln L(M_{\\text{alt}}) - \\ln L(M_{\\text{null}}) \\right)
        $$
        Under the null hypothesis $H_0: \\beta_3 = 0$ (no interaction), the test statistic $D$ asymptotically follows a
        Chi-square distribution with degrees of freedom equal to the difference in the number of parameters:
        $$
        D \\sim \\chi^2_{df_{\\text{alt}} - df_{\\text{null}}} = \\chi^2_1
        $$
        (since we added exactly one interaction parameter, $\\beta_3$).

        The resulting p-value is calculated as:
        $$
        p = 1 - F_{\\chi^2_1}(D)
        $$
        where $F$ is the cumulative distribution function of the Chi-square distribution with 1 degree of freedom.
        If $p < 0.05$, the alternative model is selected, confirming that the treatment effect varies across levels of the covariate.

    Args:
        df (pd.DataFrame): The experimental dataset.
        treatment_col (str): Column containing treatment assignments.
        covariate_col (str): Column containing the target pre-period covariate under evaluation.
        target_col (str): Column containing the outcome response variable ($Y$).

    Returns:
        float: The calculated p-value of the Likelihood Ratio Test. A value $< 0.05$ indicates a significant interaction.
    """
    import statsmodels.formula.api as smf
    from scipy.stats import chi2

    # Model 1: Additive (Restricted)
    formula_null = f"{target_col} ~ {treatment_col} + {covariate_col}"
    model_null = smf.ols(formula_null, data=df).fit()

    # Model 2: Interactive (Unrestricted)
    formula_alt = f"{target_col} ~ {treatment_col} * {covariate_col}"
    model_alt = smf.ols(formula_alt, data=df).fit()

    # Likelihood Ratio Test
    # For OLS, LRT = n * ln(RSS_null / RSS_alt)
    # But statsmodels OLS results have .llf (log-likelihood)
    llf_null = model_null.llf
    llf_alt = model_alt.llf

    lr_stat = 2 * (llf_alt - llf_null)
    p_value = chi2.sf(lr_stat, df=1)

    return float(p_value)

calculate_shap_interactions

calculate_shap_interactions(
    model: Any, X_data: Any
) -> Union[list, ndarray]

Computes SHAP interaction values to decompose multi-factor combinations (computationally expensive).

SHAP (SHapley Additive exPlanations) interaction values (Lundberg et al., 2018) are based on the coalitional game-theoretic Shapley Interaction Index (Grabisch and Roubens, 1999). While standard Shapley values partition a model's prediction additively among individual features, SHAP interaction values separate these contributions into pure main effects and pairwise interaction effects, providing an exact model-agnostic representation of how features cooperate or compete.

Mathematical Formulation and Coalition Deficits

Let $M$ be the complete set of all features. The SHAP interaction value $\\Phi_{i,j}$ between feature $i$ and feature $j$ (where $i \\neq j$) measures the pure interaction effect after accounting for all other subsets of features: $$ \Phi_{i,j} = \sum_{S \subseteq M \setminus \{i, j\}} \frac{|S|! (|M| - |S| - 2)!}{2 (|M| - 1)!} \Delta_{i,j}(S) $$ where the second-order marginal contribution difference $\\Delta_{i,j}(S)$ is defined as: $$ \Delta_{i,j}(S) = f(S \cup \{i, j\}) - f(S \cup \{i\}) - f(S \cup \{j\}) + f(S) $$ The diagonal elements $\\Phi_{i,i}$ capture the main effect of feature $i$ after removing all of its pairwise interactions with other features: $$ \Phi_{i,i} = \phi_i - \sum_{j \neq i} \Phi_{i,j} $$ where $\\phi_i$ is the standard Shapley value for feature $i$.

The complete set of main effects and interaction values decomposes the model prediction $f(x)$ exactly: $$ f(x) = \Phi_0 + \sum_{i=1}^{|M|} \Phi_{i,i} + \sum_{i \neq j} \Phi_{i,j} $$ where $\\Phi_0 = E[f(x)]$ is the base value (expected prediction of the model across the background training distribution).

Computational Complexity

Evaluating the summation requires computing model predictions across $2^{|M|}$ feature coalitions, which is NP-hard in the general case. To make this practical, modern libraries utilize TreeSHAP (Lundberg et al., 2020), which calculates exact tree-based Shapley interaction values in $O(T L D^2)$ time where $T$ is the number of trees, $L$ is max leaves, and $D$ is max depth.

PARAMETER	DESCRIPTION
`model`	A trained model (typically a tree ensemble like XGBoost or LightGBM). TYPE: `Any`
`X_data`	The background evaluation matrix of covariate features. TYPE: `Any`

RETURNS	DESCRIPTION
`list`	A nested list or 3D numpy array of shape `(num_samples, num_features, num_features)` containing individual Shapley interaction matrices. TYPE: `Union[list, ndarray]`

Source code in src\xpyrment\interactions\shap.py

def calculate_shap_interactions(model: Any, X_data: Any) -> Union[list, np.ndarray]:
    r"""Computes SHAP interaction values to decompose multi-factor combinations (computationally expensive).

    SHAP (SHapley Additive exPlanations) interaction values (Lundberg et al., 2018) are based on the coalitional game-theoretic
    Shapley Interaction Index (Grabisch and Roubens, 1999). While standard Shapley values partition a model's prediction
    additively among individual features, SHAP interaction values separate these contributions into pure main effects and
    pairwise interaction effects, providing an exact model-agnostic representation of how features cooperate or compete.

    Mathematical Formulation and Coalition Deficits:
        Let $M$ be the complete set of all features. The SHAP interaction value $\\Phi_{i,j}$ between feature $i$ and feature $j$
        (where $i \\neq j$) measures the pure interaction effect after accounting for all other subsets of features:
        $$
        \\Phi_{i,j} = \\sum_{S \\subseteq M \\setminus \\{i, j\\}} \\frac{|S|! (|M| - |S| - 2)!}{2 (|M| - 1)!} \\Delta_{i,j}(S)
        $$
        where the second-order marginal contribution difference $\\Delta_{i,j}(S)$ is defined as:
        $$
        \\Delta_{i,j}(S) = f(S \\cup \\{i, j\\}) - f(S \\cup \\{i\\}) - f(S \\cup \\{j\\}) + f(S)
        $$
        The diagonal elements $\\Phi_{i,i}$ capture the main effect of feature $i$ after removing all of its pairwise interactions
        with other features:
        $$
        \\Phi_{i,i} = \\phi_i - \\sum_{j \\neq i} \\Phi_{i,j}
        $$
        where $\\phi_i$ is the standard Shapley value for feature $i$.

        The complete set of main effects and interaction values decomposes the model prediction $f(x)$ exactly:
        $$
        f(x) = \\Phi_0 + \\sum_{i=1}^{|M|} \\Phi_{i,i} + \\sum_{i \\neq j} \\Phi_{i,j}
        $$
        where $\\Phi_0 = E[f(x)]$ is the base value (expected prediction of the model across the background training distribution).

    Computational Complexity:
        Evaluating the summation requires computing model predictions across $2^{|M|}$ feature coalitions, which is NP-hard
        in the general case. To make this practical, modern libraries utilize TreeSHAP (Lundberg et al., 2020), which calculates
        exact tree-based Shapley interaction values in $O(T L D^2)$ time where $T$ is the number of trees, $L$ is max leaves,
        and $D$ is max depth.

    Args:
        model: A trained model (typically a tree ensemble like XGBoost or LightGBM).
        X_data: The background evaluation matrix of covariate features.

    Returns:
        list: A nested list or 3D numpy array of shape `(num_samples, num_features, num_features)` containing
            individual Shapley interaction matrices.
    """
    try:
        import shap
    except ImportError as e:
        raise ImportError(
            "The 'shap' library is required to calculate SHAP interactions. "
            "Please install it using: pip install shap"
        ) from e

    explainer = shap.TreeExplainer(model)
    interaction_values = explainer.shap_interaction_values(X_data)

    return interaction_values

compute_friedman_h_statistic

compute_friedman_h_statistic(
    model: Any, X_data: Any, feature_i: str, feature_j: str
) -> float

Computes model-agnostic Friedman's H-statistic representing the degree of interaction between two features.

Friedman's H-statistic (Friedman and Popescu, 2008) measures the strength of interaction between features by evaluating how much of the model's prediction variation is due to joint, non-additive behavior. Unlike linear regression product terms, the H-statistic is model-agnostic and can capture highly complex, non-linear interactions in machine learning models (e.g., gradient boosted trees, neural networks).

Mathematical Formulation and Partial Dependence

Let $x_i$ and $x_j$ be two features. Let $PD_i(x_i)$ and $PD_j(x_j)$ be the 1-way Partial Dependence (PD) functions, which represent the average prediction of the model when fixing the respective feature value: $$ PD_i(x_i) = \frac{1}{N} \sum_{k=1}^N f(x_i, \ x_{k, \setminus i}) $$ Let $PD_{ij}(x_i, \\ x_j)$ be the 2-way joint Partial Dependence function: $$ PD_{ij}(x_i, \ x_j) = \frac{1}{N} \sum_{k=1}^N f(x_i, \ x_j, \ x_{k, \setminus \{i, j\}}) $$ If there is no interaction between $x_i$ and $x_j$ (meaning their combined effect on the prediction is perfectly additive), then the joint PD can be decomposed exactly as the sum of their individual PD functions: $$ PD_{ij}(x_i, \ x_j) = PD_i(x_i) + PD_j(x_j) $$ Friedman's $H^2_{ij}$ statistic measures the normalized squared deviation from this additive null hypothesis over the empirical distribution of the dataset: $$ H^2_{ij} = \frac{\sum_{k=1}^N \left[ PD_{ij}(x_{k,i}, \ x_{k,j}) - PD_i(x_{k,i}) - PD_j(x_{k,j}) \right]^2}{\sum_{k=1}^N \left[ PD_{ij}(x_{k,i}, \ x_{k,j}) \right]^2} $$

Interpretation of the H-Statistic: - $H^2_{ij} = 0$: No interaction whatsoever. The features affect the response in a perfectly additive manner. - $H^2_{ij} = 1.0$: The combined prediction depends entirely on their interaction; the individual main effects explain $0\\%$ of the joint variation. - In practice, a value of $H_{ij} > 0.10$ (representing the square root of $H^2_{ij}$) suggests a substantial, non-negligible interaction effect.

PARAMETER	DESCRIPTION
`model`	A trained, black-box machine learning model (must implement `.predict()` or `.predict_proba()`). TYPE: `Any`
`X_data`	The covariate dataset used to evaluate the partial dependencies. TYPE: `Any`
`feature_i`	Name of the first target feature. TYPE: `str`
`feature_j`	Name of the second target feature. TYPE: `str`

RETURNS	DESCRIPTION
`float`	The computed Friedman's H-statistic ($H_{ij} \\in [0, 1]$). TYPE: `float`

Source code in src\xpyrment\interactions\hstat.py

def compute_friedman_h_statistic(model: Any, X_data: Any, feature_i: str, feature_j: str) -> float:
    r"""Computes model-agnostic Friedman's H-statistic representing the degree of interaction between two features.

    Friedman's H-statistic (Friedman and Popescu, 2008) measures the strength of interaction between features
    by evaluating how much of the model's prediction variation is due to joint, non-additive behavior.
    Unlike linear regression product terms, the H-statistic is model-agnostic and can capture highly complex,
    non-linear interactions in machine learning models (e.g., gradient boosted trees, neural networks).

    Mathematical Formulation and Partial Dependence:
        Let $x_i$ and $x_j$ be two features. Let $PD_i(x_i)$ and $PD_j(x_j)$ be the 1-way Partial Dependence (PD) functions,
        which represent the average prediction of the model when fixing the respective feature value:
        $$
        PD_i(x_i) = \\frac{1}{N} \\sum_{k=1}^N f(x_i, \\ x_{k, \\setminus i})
        $$
        Let $PD_{ij}(x_i, \\ x_j)$ be the 2-way joint Partial Dependence function:
        $$
        PD_{ij}(x_i, \\ x_j) = \\frac{1}{N} \\sum_{k=1}^N f(x_i, \\ x_j, \\ x_{k, \\setminus \\{i, j\\}})
        $$
        If there is no interaction between $x_i$ and $x_j$ (meaning their combined effect on the prediction is perfectly additive),
        then the joint PD can be decomposed exactly as the sum of their individual PD functions:
        $$
        PD_{ij}(x_i, \\ x_j) = PD_i(x_i) + PD_j(x_j)
        $$
        Friedman's $H^2_{ij}$ statistic measures the normalized squared deviation from this additive null hypothesis
        over the empirical distribution of the dataset:
        $$
        H^2_{ij} = \\frac{\\sum_{k=1}^N \\left[ PD_{ij}(x_{k,i}, \\ x_{k,j}) - PD_i(x_{k,i}) - PD_j(x_{k,j}) \\right]^2}{\\sum_{k=1}^N \\left[ PD_{ij}(x_{k,i}, \\ x_{k,j}) \\right]^2}
        $$
    Interpretation of the H-Statistic:
        - $H^2_{ij} = 0$: No interaction whatsoever. The features affect the response in a perfectly additive manner.
        - $H^2_{ij} = 1.0$: The combined prediction depends entirely on their interaction; the individual main effects explain
          $0\\%$ of the joint variation.
        - In practice, a value of $H_{ij} > 0.10$ (representing the square root of $H^2_{ij}$) suggests a substantial,
          non-negligible interaction effect.

    Args:
        model: A trained, black-box machine learning model (must implement `.predict()` or `.predict_proba()`).
        X_data: The covariate dataset used to evaluate the partial dependencies.
        feature_i (str): Name of the first target feature.
        feature_j (str): Name of the second target feature.

    Returns:
        float: The computed Friedman's H-statistic ($H_{ij} \\in [0, 1]$).
    """
    import numpy as np
    import pandas as pd

    # Ensure X_data is a DataFrame for easy column indexing
    if not isinstance(X_data, pd.DataFrame):
        X_data = pd.DataFrame(X_data)

    # We evaluate PDs over the empirical distribution of X_data
    n = len(X_data)
    pd_ij = np.zeros(n)
    pd_i = np.zeros(n)
    pd_j = np.zeros(n)

    for k in range(n):
        xi_k = X_data.iloc[k][feature_i]
        xj_k = X_data.iloc[k][feature_j]

        # PD_ij(xi_k, xj_k)
        # We need the average of f(xi_k, xj_k, X_\setminus ij)
        X_ij = X_data.copy()
        X_ij[feature_i] = xi_k
        X_ij[feature_j] = xj_k
        pd_ij[k] = np.mean(model.predict(X_ij))

        # PD_i(xi_k)
        X_i = X_data.copy()
        X_i[feature_i] = xi_k
        pd_i[k] = np.mean(model.predict(X_i))

        # PD_j(xj_k)
        X_j = X_data.copy()
        X_j[feature_j] = xj_k
        pd_j[k] = np.mean(model.predict(X_j))

    # Center the PDs as per Friedman (2008) to remove grand mean effects
    # In some implementations, this is handled by the differencing

    numerator = np.sum((pd_ij - pd_i - pd_j)**2)
    denominator = np.sum(pd_ij**2)

    if denominator == 0:
        return 0.0

    h2 = numerator / denominator
    return float(np.sqrt(max(0, h2)))

plot_interaction_heatmap

plot_interaction_heatmap(
    df_interactions: DataFrame,
    annot: Optional[bool] = None,
    ax: Optional[Axes] = None,
    **kwargs
) -> tuple

Generates an interaction term heatmap using matplotlib.

Visualizes a symmetric matrix of feature/factor interactions. Heatmaps are a highly effective diagnostic chart for screening complex multi-factor studies or high-dimensional covariate sets, allowing the user to instantly recognize clusters of strong synergy or severe interference.

Matrix Structure

Let $F = \{f_1, f_2, \dots, f_m\}$ be the set of analyzed factors or covariates. The plotting engine constructs a symmetric $m \times m$ matrix $H$: - Cell $H_{i,j}$ contains the strength of the interaction between $f_i$ and $f_j$. This value can represent either: 1. The absolute regression interaction coefficient ($|\beta_{\text{interaction}}|$). 2. The model-agnostic Friedman's H-statistic ($H_{ij}$). 3. The statistical significance transformed index ($-\log_{10}(p_{\text{value}})$). - Cells along the diagonal ($H_{i,i}$) are typically zeroed or set to represent the main effect of factor $f_i$. - The matrix is rendered using a divergent colormap (such as RdBu or seismic if mapping positive/negative coefficients) or a sequential colormap (such as Viridis or YlOrRd if mapping absolute H-statistics or significance).

PARAMETER	DESCRIPTION
`df_interactions`	A rectangular or pivoted DataFrame representing the interaction strength matrix, with factor names as both index and column headings. TYPE: `DataFrame`
`annot`	Whether to annotate the cells with numeric values. If None, annotations are enabled automatically only if the matrix size is small (e.g., <= 20 features). TYPE: `bool` DEFAULT: `None`
`ax`	Pre-existing axes for the plot. If None, a new figure and axes are created. TYPE: `Axes` DEFAULT: `None`
`**kwargs`	Additional keyword arguments to pass to `seaborn.heatmap`. DEFAULT: `{}`

RETURNS	DESCRIPTION
`tuple`	A tuple `(fig, ax)` containing: - `fig` (matplotlib.figure.Figure): The active matplotlib Figure canvas. - `ax` (matplotlib.axes.Axes): The axes container housing the rendered heatmap. TYPE: `tuple`

Source code in src\xpyrment\interactions\plots.py

def plot_interaction_heatmap(
    df_interactions: pd.DataFrame,
    annot: Optional[bool] = None,
    ax: Optional[plt.Axes] = None,
    **kwargs
) -> tuple:
    r"""Generates an interaction term heatmap using matplotlib.

    Visualizes a symmetric matrix of feature/factor interactions. Heatmaps are a highly effective diagnostic
    chart for screening complex multi-factor studies or high-dimensional covariate sets, allowing the user
    to instantly recognize clusters of strong synergy or severe interference.

    Matrix Structure:
        Let $F = \{f_1, f_2, \dots, f_m\}$ be the set of analyzed factors or covariates. The plotting engine constructs
        a symmetric $m \times m$ matrix $H$:
        - Cell $H_{i,j}$ contains the strength of the interaction between $f_i$ and $f_j$. This value can represent
          either:
          1. The absolute regression interaction coefficient ($|\beta_{\text{interaction}}|$).
          2. The model-agnostic Friedman's H-statistic ($H_{ij}$).
          3. The statistical significance transformed index ($-\log_{10}(p_{\text{value}})$).
        - Cells along the diagonal ($H_{i,i}$) are typically zeroed or set to represent the main effect of factor $f_i$.
        - The matrix is rendered using a divergent colormap (such as `RdBu` or `seismic` if mapping positive/negative coefficients)
          or a sequential colormap (such as `Viridis` or `YlOrRd` if mapping absolute H-statistics or significance).

    Args:
        df_interactions (pd.DataFrame): A rectangular or pivoted DataFrame representing the interaction strength matrix,
            with factor names as both index and column headings.
        annot (bool, optional): Whether to annotate the cells with numeric values. If None,
            annotations are enabled automatically only if the matrix size is small (e.g., <= 20 features).
        ax (matplotlib.axes.Axes, optional): Pre-existing axes for the plot. If None, a new figure
            and axes are created.
        **kwargs: Additional keyword arguments to pass to `seaborn.heatmap`.

    Returns:
        tuple: A tuple `(fig, ax)` containing:
            - `fig` (matplotlib.figure.Figure): The active matplotlib Figure canvas.
            - `ax` (matplotlib.axes.Axes): The axes container housing the rendered heatmap.
    """
    # Validate symmetric square matrix
    if df_interactions.shape[0] != df_interactions.shape[1]:
        raise ValueError(f"Interaction matrix must be square, got shape {df_interactions.shape}")
    if list(df_interactions.index) != list(df_interactions.columns):
        raise ValueError("Interaction matrix must have matching index and columns")

    num_features = df_interactions.shape[0]

    # Determine whether to annotate cells
    if annot is None:
        # Heuristic: only annotate for smaller matrices to keep the plot readable and fast
        annot_flag = num_features <= 20
    else:
        annot_flag = annot

    # Determine the appropriate colormap based on the numeric data values
    numeric_df = df_interactions.select_dtypes(include=[np.number])
    if numeric_df.empty:
        raise TypeError("Interaction matrix must contain numeric columns to plot heatmap.")

    has_negative_values = (numeric_df.to_numpy() < 0).any()
    cmap = kwargs.pop("cmap", "RdBu_r" if has_negative_values else "YlOrRd")

    if ax is None:
        # Calculate a dynamic figure size based on the number of features
        fig_size = max(8, min(24, num_features * 1.5))
        fig, ax_to_use = plt.subplots(figsize=(fig_size, fig_size * 0.8))
    else:
        fig = ax.figure
        ax_to_use = ax

    # Calculate symmetric vmin/vmax for divergent colormaps
    heatmap_kwargs = {
        "annot": annot_flag,
        "fmt": ".3g",
        "cmap": cmap,
        "ax": ax_to_use,
        "square": True,
        "cbar_kws": {'label': 'Interaction Strength'}
    }

    if has_negative_values:
        max_abs = np.abs(numeric_df.to_numpy()).max()
        heatmap_kwargs["vmin"] = -max_abs
        heatmap_kwargs["vmax"] = max_abs
        heatmap_kwargs["center"] = 0
    else:
        heatmap_kwargs["vmin"] = 0

    heatmap_kwargs.update(kwargs)

    # Render the heatmap
    sns.heatmap(df_interactions, **heatmap_kwargs)

    # Add title and adjust layout
    ax_to_use.set_title("Factor Interaction Heatmap", pad=20, fontsize=14, fontweight="bold")
    ax_to_use.set_xlabel("Factor", fontweight="bold")
    ax_to_use.set_ylabel("Factor", fontweight="bold")

    if ax is None:
        fig.tight_layout()

    return fig, ax_to_use

plot_interaction_effects

plot_interaction_effects(
    data: DataFrame,
    treatment_col: str,
    metric_col: str,
    covariate_col: str,
    ax: Optional[Axes] = None,
    **kwargs
) -> tuple

Plots interaction effects between a treatment and a covariate on a metric.

Generates a line plot showing the average metric value for different treatment groups across levels of the covariate. This helps visualize if the treatment effect varies depending on the covariate value (heterogeneous treatment effect).

Example

from xpyrment.interactions import plot_interaction_effects

plot_interaction_effects(data, "treatment", "revenue", "country")

PARAMETER	DESCRIPTION
`data`	The experimental data containing treatments, covariates, and metrics. TYPE: `DataFrame`
`treatment_col`	The name of the column representing the treatment group. TYPE: `str`
`metric_col`	The name of the column representing the outcome metric. TYPE: `str`
`covariate_col`	The name of the column representing the interacting covariate. TYPE: `str`
`ax`	Pre-existing axes for the plot. If None, a new figure and axes are created. TYPE: `Axes` DEFAULT: `None`
`**kwargs`	Additional keyword arguments to pass to the underlying plotting functions. DEFAULT: `{}`

RETURNS	DESCRIPTION
`tuple`	A tuple `(fig, ax)` containing the matplotlib Figure and Axes objects. TYPE: `tuple`

Source code in src\xpyrment\interactions\plots.py

def plot_interaction_effects(
    data: pd.DataFrame,
    treatment_col: str,
    metric_col: str,
    covariate_col: str,
    ax: Optional[plt.Axes] = None,
    **kwargs
) -> tuple:
    """Plots interaction effects between a treatment and a covariate on a metric.

    Generates a line plot showing the average metric value for different treatment
    groups across levels of the covariate. This helps visualize if the treatment
    effect varies depending on the covariate value (heterogeneous treatment effect).

    Example:
        ```python
        from xpyrment.interactions import plot_interaction_effects

        plot_interaction_effects(data, "treatment", "revenue", "country")
        ```

    Args:
        data (pd.DataFrame): The experimental data containing treatments, covariates, and metrics.
        treatment_col (str): The name of the column representing the treatment group.
        metric_col (str): The name of the column representing the outcome metric.
        covariate_col (str): The name of the column representing the interacting covariate.
        ax (matplotlib.axes.Axes, optional): Pre-existing axes for the plot. If None, a new figure
            and axes are created.
        **kwargs: Additional keyword arguments to pass to the underlying plotting functions.

    Returns:
        tuple: A tuple `(fig, ax)` containing the matplotlib Figure and Axes objects.
    """
    num_treatments = data[treatment_col].nunique()

    if ax is None:
        fig, ax_to_use = plt.subplots(figsize=(10, 6))
    else:
        fig = ax.figure
        ax_to_use = ax

    # Drop NaNs to prevent plotting issues
    plot_data = data.dropna(subset=[treatment_col, metric_col, covariate_col]).copy()

    # Check if covariate is numeric or categorical
    if pd.api.types.is_numeric_dtype(plot_data[covariate_col]) and plot_data[covariate_col].nunique() > 10:
        # For continuous numeric covariates, use a scatter plot with regression lines

        # Get default seaborn color palette
        palette = sns.color_palette(n_colors=num_treatments)
        treatments = sorted(plot_data[treatment_col].unique())

        for idx, treatment in enumerate(treatments):
            group_data = plot_data[plot_data[treatment_col] == treatment]

            # Combine default and provided kwargs
            regplot_kwargs = {
                "scatter_kws": {"alpha": 0.5},
                "label": str(treatment),
                "color": palette[idx]
            }
            regplot_kwargs.update(kwargs)

            sns.regplot(
                data=group_data,
                x=covariate_col,
                y=metric_col,
                ax=ax_to_use,
                **regplot_kwargs
            )

        ax_to_use.legend(title=treatment_col)
    else:
        # For categorical or discrete covariates, use a point plot (interaction plot)

        # Ensure we have enough markers and linestyles for all treatment groups
        all_markers = ["o", "s", "D", "^", "v", "<", ">", "p", "*", "h", "H", "+", "x", "X", "d", "|", "_"]
        all_linestyles = ["-", "--", "-.", ":"] * ((num_treatments // 4) + 1)

        pointplot_kwargs = {
            "dodge": True,
            "markers": all_markers[:num_treatments],
            "linestyles": all_linestyles[:num_treatments]
        }
        pointplot_kwargs.update(kwargs)

        sns.pointplot(
            data=plot_data,
            x=covariate_col,
            y=metric_col,
            hue=treatment_col,
            ax=ax_to_use,
            **pointplot_kwargs
        )

    ax_to_use.set_title(f"Interaction Effect of {treatment_col} and {covariate_col} on {metric_col}", pad=15)
    ax_to_use.set_xlabel(covariate_col.capitalize())
    ax_to_use.set_ylabel(metric_col.capitalize())

    if ax is None:
        fig.tight_layout()

    return fig, ax_to_use