SHAP
shap
Game-theoretic feature interaction analysis using SHAP interaction indices.
This module provides the calculate_shap_interactions function, which computes second-order Shapley
interaction values to decompose joint black-box predictions into main and interaction components.
| FUNCTION | DESCRIPTION |
|---|---|
| `calculate_shap_interactions` | Computes SHAP interaction values to decompose multi-factor combinations (computationally expensive). |
calculate_shap_interactions
Computes SHAP interaction values to decompose multi-factor combinations (computationally expensive).
SHAP (SHapley Additive exPlanations) interaction values (Lundberg et al., 2018) are based on the coalitional game-theoretic Shapley Interaction Index (Grabisch and Roubens, 1999). While standard Shapley values partition a model's prediction additively among individual features, SHAP interaction values separate these contributions into pure main effects and pairwise interaction effects, providing an exact model-agnostic representation of how features cooperate or compete.
Mathematical Formulation and Coalition Deficits
Let \(M\) be the complete set of all features. The SHAP interaction value \(\Phi_{i,j}\) between feature \(i\) and feature \(j\) (where \(i \neq j\)) measures the pure interaction effect after accounting for all other subsets of features:

$$ \Phi_{i,j} = \sum_{S \subseteq M \setminus \{i, j\}} \frac{|S|! (|M| - |S| - 2)!}{2 (|M| - 1)!} \Delta_{i,j}(S) $$

where the second-order marginal contribution difference \(\Delta_{i,j}(S)\) is defined as:

$$ \Delta_{i,j}(S) = f(S \cup \{i, j\}) - f(S \cup \{i\}) - f(S \cup \{j\}) + f(S) $$

The diagonal elements \(\Phi_{i,i}\) capture the main effect of feature \(i\) after removing all of its pairwise interactions with other features:

$$ \Phi_{i,i} = \phi_i - \sum_{j \neq i} \Phi_{i,j} $$

where \(\phi_i\) is the standard Shapley value for feature \(i\).
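The formulas above can be sketched as a brute-force computation. This is a minimal illustration, assuming the coalition value \(f(S)\) is available as a plain Python function over sets of feature indices. The names `shap_interaction_matrix` and `toy_value` are hypothetical and not part of this module, and the toy game is made up so the result can be checked by hand.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def subsets(pool):
    """Yield every subset of `pool` as a frozenset."""
    pool = list(pool)
    for r in range(len(pool) + 1):
        for combo in combinations(pool, r):
            yield frozenset(combo)


def shap_interaction_matrix(value, n_features):
    """Brute-force interaction matrix for a set function `value` (illustrative only)."""
    M = n_features
    players = range(M)

    # Standard Shapley values phi_i, needed for the diagonal entries.
    phi = [Fraction(0)] * M
    for i in players:
        others = [p for p in players if p != i]
        for S in subsets(others):
            w = Fraction(factorial(len(S)) * factorial(M - len(S) - 1), factorial(M))
            phi[i] += w * (value(S | {i}) - value(S))

    # Off-diagonal entries: weighted sum of second-order differences Delta_{i,j}(S).
    Phi = [[Fraction(0)] * M for _ in players]
    for i in players:
        for j in players:
            if i == j:
                continue
            others = [p for p in players if p not in (i, j)]
            for S in subsets(others):
                w = Fraction(
                    factorial(len(S)) * factorial(M - len(S) - 2),
                    2 * factorial(M - 1),
                )
                delta = value(S | {i, j}) - value(S | {i}) - value(S | {j}) + value(S)
                Phi[i][j] += w * delta

    # Diagonal entries: Shapley value minus all pairwise interactions.
    for i in players:
        Phi[i][i] = phi[i] - sum(Phi[i][j] for j in players if j != i)
    return Phi


def toy_value(S):
    """Made-up game: additive effects 2, 3, 1 plus a joint bonus of 4 for features 0 and 1."""
    total = 0
    if 0 in S:
        total += 2
    if 1 in S:
        total += 3
    if 2 in S:
        total += 1
    if 0 in S and 1 in S:
        total += 4
    return total


Phi = shap_interaction_matrix(toy_value, 3)
```

In this toy game the joint bonus of 4 is split equally between the two symmetric entries, so `Phi[0][1]` and `Phi[1][0]` are both 2, while the diagonal recovers the additive effects 2, 3, and 1. The loop visits every subset, so this approach is only feasible for a handful of features.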
The complete set of main effects and interaction values decomposes the model prediction \(f(x)\) exactly:

$$ f(x) = \Phi_0 + \sum_{i=1}^{|M|} \Phi_{i,i} + \sum_{i \neq j} \Phi_{i,j} $$

where \(\Phi_0 = E[f(x)]\) is the base value (expected prediction of the model across the background training distribution).
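As a worked check of the decomposition, consider the smallest case with two features, \(M = \{1, 2\}\), writing \(f(\emptyset)\) for the base value \(\Phi_0\). This is an illustrative derivation from the definitions above, not output from the module. The only subset of \(M \setminus \{1, 2\}\) is \(S = \emptyset\), with weight \(\frac{0!\,0!}{2 \cdot 1!} = \frac{1}{2}\), so:

$$
\begin{aligned}
\Phi_{1,2} = \Phi_{2,1} &= \tfrac{1}{2}\left[f(\{1,2\}) - f(\{1\}) - f(\{2\}) + f(\emptyset)\right] \\
\phi_1 &= \tfrac{1}{2}\left[f(\{1\}) - f(\emptyset)\right] + \tfrac{1}{2}\left[f(\{1,2\}) - f(\{2\})\right] \\
\Phi_{1,1} = \phi_1 - \Phi_{1,2} &= f(\{1\}) - f(\emptyset) \\
\Phi_{2,2} = \phi_2 - \Phi_{2,1} &= f(\{2\}) - f(\emptyset) \\
\Phi_0 + \Phi_{1,1} + \Phi_{2,2} + \Phi_{1,2} + \Phi_{2,1} &= f(\{1,2\})
\end{aligned}
$$

The sum over \(i \neq j\) runs over ordered pairs, so each interaction is counted twice at half weight, which is why the factor of 2 appears in the denominator of the interaction weight.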
Computational Complexity
Evaluating the summation directly requires model predictions for up to \(2^{|M|}\) feature coalitions, which is NP-hard in the general case. To make this practical, modern libraries use TreeSHAP (Lundberg et al., 2020), which computes exact Shapley values for tree ensembles in \(O(T L D^2)\) time and exact SHAP interaction values in \(O(T |M| L D^2)\) time, where \(T\) is the number of trees, \(L\) is the maximum number of leaves in any tree, \(D\) is the maximum tree depth, and \(|M|\) is the number of features.
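To give a rough sense of why direct evaluation does not scale, the sketch below counts the coalitions and second-order difference terms a naive implementation would touch. The helper names `distinct_coalitions` and `brute_force_term_count` are hypothetical, introduced only for this illustration, and the counts ignore the cost of each model evaluation.

```python
from math import comb


def distinct_coalitions(n_features):
    """Number of distinct feature subsets a brute-force method may need to evaluate."""
    return 2 ** n_features


def brute_force_term_count(n_features):
    """Number of Delta_{i,j}(S) terms summed over all unordered feature pairs (needs >= 2 features)."""
    pairs = comb(n_features, 2)
    subsets_per_pair = 2 ** (n_features - 2)
    return pairs * subsets_per_pair


# Print how both counts grow with the number of features.
for m in (5, 10, 20, 30):
    print(m, distinct_coalitions(m), brute_force_term_count(m))
```

Both counts roughly double with each added feature, which is the motivation for tree-specific algorithms that exploit model structure instead of enumerating coalitions.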
| PARAMETER | DESCRIPTION |
|---|---|
| `model` | A trained model (typically a tree ensemble like XGBoost or LightGBM). |
| `X_data` | The background evaluation matrix of covariate features. |
| RETURNS | DESCRIPTION |
|---|---|
| `list` | A nested list or 3D numpy array of shape |