SHAP

shap

Game-theoretic feature interaction analysis using SHAP interaction indices.

This module provides the calculate_shap_interactions function, which computes second-order Shapley interaction values to decompose joint black-box predictions into main and interaction components.

FUNCTION	DESCRIPTION
`calculate_shap_interactions`	Computes SHAP interaction values to decompose multi-factor combinations (computationally expensive).

calculate_shap_interactions

calculate_shap_interactions(
    model: Any, X_data: Any
) -> Union[list, ndarray]

Computes SHAP interaction values to decompose multi-factor combinations (computationally expensive).

SHAP (SHapley Additive exPlanations) interaction values (Lundberg et al., 2018) are based on the Shapley Interaction Index from coalitional game theory (Grabisch and Roubens, 1999). While standard Shapley values partition a model's prediction additively among individual features, SHAP interaction values split each feature's contribution further into a main effect and pairwise interaction effects, showing how features cooperate or compete. The definition applies to any model, but this function computes the values with `shap.TreeExplainer`, so it supports tree-based models only.

Mathematical Formulation and Coalition Deficits

Let $M$ be the complete set of all features. The SHAP interaction value $\Phi_{i,j}$ between feature $i$ and feature $j$ (where $i \neq j$) measures the pure interaction effect after accounting for all other subsets of features:

$$ \Phi_{i,j} = \sum_{S \subseteq M \setminus \{i, j\}} \frac{|S|! (|M| - |S| - 2)!}{2 (|M| - 1)!} \Delta_{i,j}(S) $$

where the second-order marginal contribution difference $\Delta_{i,j}(S)$ is defined as:

$$ \Delta_{i,j}(S) = f(S \cup \{i, j\}) - f(S \cup \{i\}) - f(S \cup \{j\}) + f(S) $$

The diagonal elements $\Phi_{i,i}$ capture the main effect of feature $i$ after removing all of its pairwise interactions with other features:

$$ \Phi_{i,i} = \phi_i - \sum_{j \neq i} \Phi_{i,j} $$

where $\phi_i$ is the standard Shapley value for feature $i$.

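The complete set of main effects and interaction values decomposes the model prediction $f(x)$ exactly:

$$ f(x) = \Phi_0 + \sum_{i=1}^{|M|} \Phi_{i,i} + \sum_{i \neq j} \Phi_{i,j} $$

where $\Phi_0 = E[f(x)]$ is the base value (expected prediction of the model across the background training distribution). The last sum runs over ordered pairs, so each symmetric interaction $\Phi_{i,j} = \Phi_{j,i}$ is counted twice; the factor of 2 in the denominator of the weight splits the total pairwise interaction evenly between the two entries.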
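The formulas above can be checked by brute force on a tiny example. The sketch below is not how this module computes anything; it enumerates every coalition of a hypothetical three-feature set function (`toy_game`, invented for illustration) and treats $f(\emptyset)$ as the base value $\Phi_0$. The helper name `interaction_values` is likewise an assumption.

```python
from itertools import combinations
from math import factorial


def interaction_values(f, m):
    """Brute-force SHAP interaction matrix for a set function f over features 0..m-1 (m >= 2)."""
    features = range(m)

    def shapley(i):
        # Standard Shapley value phi_i: weighted marginal contribution of i.
        others = [k for k in features if k != i]
        total = 0.0
        for size in range(m):
            w = factorial(size) * factorial(m - size - 1) / factorial(m)
            for S in combinations(others, size):
                S = frozenset(S)
                total += w * (f(S | {i}) - f(S))
        return total

    phi = [[0.0] * m for _ in range(m)]
    # Off-diagonal entries: weighted second-order differences Delta_{i,j}(S).
    for i, j in combinations(features, 2):
        others = [k for k in features if k not in (i, j)]
        total = 0.0
        for size in range(m - 1):
            w = factorial(size) * factorial(m - size - 2) / (2 * factorial(m - 1))
            for S in combinations(others, size):
                S = frozenset(S)
                total += w * (f(S | {i, j}) - f(S | {i}) - f(S | {j}) + f(S))
        phi[i][j] = phi[j][i] = total
    # Diagonal entries: Shapley value minus all pairwise interactions of feature i.
    for i in features:
        phi[i][i] = shapley(i) - sum(phi[i][j] for j in features if j != i)
    return phi


def toy_game(S):
    """Hypothetical set function: additive terms plus one interaction between features 0 and 1."""
    value = 0.0
    if 0 in S:
        value += 1.0
    if 1 in S:
        value += 2.0
    if 0 in S and 1 in S:
        value += 4.0
    if 2 in S:
        value += 3.0
    return value


phi = interaction_values(toy_game, 3)
base = toy_game(frozenset())  # plays the role of Phi_0 in this toy setting
total = base + sum(sum(row) for row in phi)
for row in phi:
    print([round(v, 6) for v in row])
print("reconstructed:", round(total, 6), "full coalition:", toy_game(frozenset({0, 1, 2})))
```

The interaction of size 4 between features 0 and 1 is split evenly across the two symmetric entries, feature 2 has no interaction terms, and summing every entry of the matrix plus the base value reproduces the value of the full coalition.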

Computational Complexity

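Evaluating the summation exactly requires model predictions for up to $2^{|M|}$ feature coalitions, which is exponential in the number of features and NP-hard in the general case. To make this practical, the shap library uses TreeSHAP (Lundberg et al., 2020), which computes exact interaction values for tree ensembles in $O(T |M| L D^2)$ time, where $T$ is the number of trees, $|M|$ is the number of features, $L$ is the maximum number of leaves per tree, and $D$ is the maximum tree depth. The extra factor of $|M|$ relative to plain TreeSHAP values is why this function is considerably more expensive than computing ordinary SHAP values.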
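To give a feel for the exponential growth, the following sketch counts the coalitions involved in the brute-force definition for a few feature counts. The function name `coalition_counts` is hypothetical and the numbers describe only the naive enumeration, not the tree-based algorithm.

```python
def coalition_counts(num_features):
    """Return (subsets S per feature pair, number of unordered pairs, distinct coalitions)."""
    subsets_per_pair = 2 ** (num_features - 2)         # S ranges over subsets of M without {i, j}
    num_pairs = num_features * (num_features - 1) // 2  # unordered pairs (i, j)
    total_coalitions = 2 ** num_features                # distinct inputs to f across all pairs
    return subsets_per_pair, num_pairs, total_coalitions


for m in (10, 20, 30):
    per_pair, pairs, total = coalition_counts(m)
    print(f"|M|={m}: {per_pair:,} subsets per pair, {pairs} pairs, {total:,} distinct coalitions")
```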

PARAMETER	DESCRIPTION
`model`	A trained model (typically a tree ensemble like XGBoost or LightGBM). TYPE: `Any`
`X_data`	The feature matrix of samples to explain (one row per sample, one column per feature), passed to `TreeExplainer.shap_interaction_values`. TYPE: `Any`

RETURNS	DESCRIPTION
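`list`	A 3D numpy array of shape `(num_samples, num_features, num_features)` holding one Shapley interaction matrix per sample. For multi-output models (for example multi-class classifiers), the shap library may instead return a list of such arrays, one per output, depending on its version. TYPE: `Union[list, ndarray]`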
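Because the return type is a union, downstream code usually has to normalize it before aggregating. The sketch below is one possible way to do that, assuming every array has shape `(num_samples, num_features, num_features)`; the helper `mean_abs_interactions` is hypothetical and not part of this module, and the input here is synthetic rather than real model output.

```python
import numpy as np


def mean_abs_interactions(interaction_values):
    """Hypothetical helper: mean |Phi_ij| over samples, one matrix per model output."""
    if isinstance(interaction_values, list):
        arrays = [np.asarray(a) for a in interaction_values]
    else:
        arrays = [np.asarray(interaction_values)]
    # Average absolute interaction strength across the sample axis.
    return [np.abs(a).mean(axis=0) for a in arrays]


# Synthetic stand-in for the function's output: 5 samples, 3 features, symmetric matrices.
rng = np.random.default_rng(0)
raw = rng.normal(size=(5, 3, 3))
fake_interactions = (raw + raw.transpose(0, 2, 1)) / 2

strength = mean_abs_interactions(fake_interactions)[0]  # shape (3, 3)
per_feature_shap = fake_interactions.sum(axis=2)        # row sums give phi_i per sample
print(strength.shape, per_feature_shap.shape)
```

Summing each interaction matrix over its last axis recovers the ordinary per-feature SHAP values, which follows directly from the definition of the diagonal entries.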

Source code in src\xpyrment\interactions\shap.py

def calculate_shap_interactions(model: Any, X_data: Any) -> Union[list, np.ndarray]:
    r"""Computes SHAP interaction values to decompose multi-factor combinations (computationally expensive).

    SHAP (SHapley Additive exPlanations) interaction values (Lundberg et al., 2018) are based on the Shapley Interaction
    Index from coalitional game theory (Grabisch and Roubens, 1999). While standard Shapley values partition a model's
    prediction additively among individual features, SHAP interaction values split each feature's contribution further
    into a main effect and pairwise interaction effects, showing how features cooperate or compete. The definition applies
    to any model, but this function computes the values with `shap.TreeExplainer`, so it supports tree-based models only.

    Mathematical Formulation and Coalition Deficits:
        Let $M$ be the complete set of all features. The SHAP interaction value $\Phi_{i,j}$ between feature $i$ and feature $j$
        (where $i \neq j$) measures the pure interaction effect after accounting for all other subsets of features:
        $$
        \Phi_{i,j} = \sum_{S \subseteq M \setminus \{i, j\}} \frac{|S|! (|M| - |S| - 2)!}{2 (|M| - 1)!} \Delta_{i,j}(S)
        $$
        where the second-order marginal contribution difference $\Delta_{i,j}(S)$ is defined as:
        $$
        \Delta_{i,j}(S) = f(S \cup \{i, j\}) - f(S \cup \{i\}) - f(S \cup \{j\}) + f(S)
        $$
        The diagonal elements $\Phi_{i,i}$ capture the main effect of feature $i$ after removing all of its pairwise interactions
        with other features:
        $$
        \Phi_{i,i} = \phi_i - \sum_{j \neq i} \Phi_{i,j}
        $$
        where $\phi_i$ is the standard Shapley value for feature $i$.

        The complete set of main effects and interaction values decomposes the model prediction $f(x)$ exactly:
        $$
        f(x) = \Phi_0 + \sum_{i=1}^{|M|} \Phi_{i,i} + \sum_{i \neq j} \Phi_{i,j}
        $$
        where $\Phi_0 = E[f(x)]$ is the base value (expected prediction of the model across the background training distribution).

    Computational Complexity:
        Evaluating the summation exactly requires model predictions for up to $2^{|M|}$ feature coalitions, which is
        exponential in the number of features and NP-hard in the general case. To make this practical, the shap library
        uses TreeSHAP (Lundberg et al., 2020), which computes exact interaction values for tree ensembles in
        $O(T |M| L D^2)$ time, where $T$ is the number of trees, $|M|$ is the number of features, $L$ is the maximum
        number of leaves per tree, and $D$ is the maximum tree depth.

    Args:
        model: A trained model (typically a tree ensemble like XGBoost or LightGBM).
        X_data: The feature matrix of samples to explain (one row per sample, one column per feature).

    Returns:
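        list: A 3D numpy array of shape `(num_samples, num_features, num_features)` holding one Shapley interaction
            matrix per sample. For multi-output models, the shap library may instead return a list of such arrays,
            one per output, depending on its version.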
    """
    try:
        import shap
    except ImportError as e:
        raise ImportError(
            "The 'shap' library is required to calculate SHAP interactions. "
            "Please install it using: pip install shap"
        ) from e

    explainer = shap.TreeExplainer(model)
    interaction_values = explainer.shap_interaction_values(X_data)

    return interaction_values