Hstat
hstat
Model-agnostic interaction measurement using Friedman's H-statistic.
This module provides the compute_friedman_h_statistic function, which leverages partial dependence
decompositions to quantify the strength of interactions in any machine learning model.
| FUNCTION | DESCRIPTION |
|---|---|
| compute_friedman_h_statistic | Computes model-agnostic Friedman's H-statistic representing the degree of interaction between two features. |
compute_friedman_h_statistic
Computes model-agnostic Friedman's H-statistic representing the degree of interaction between two features.
Friedman's H-statistic (Friedman and Popescu, 2008) measures the strength of interaction between features by evaluating how much of the model's prediction variation is due to joint, non-additive behavior. Unlike linear regression product terms, the H-statistic is model-agnostic and can capture highly complex, non-linear interactions in machine learning models (e.g., gradient boosted trees, neural networks).
Mathematical Formulation and Partial Dependence
Let \(x_i\) and \(x_j\) be two features. Let \(PD_i(x_i)\) and \(PD_j(x_j)\) be the 1-way Partial Dependence (PD) functions, each giving the model's average prediction when the respective feature is fixed at a given value and the remaining features range over the \(N\) observed rows:

$$ PD_i(x_i) = \frac{1}{N} \sum_{k=1}^N f(x_i, \ x_{k, \setminus i}) $$

Let \(PD_{ij}(x_i, \ x_j)\) be the 2-way joint Partial Dependence function:

$$ PD_{ij}(x_i, \ x_j) = \frac{1}{N} \sum_{k=1}^N f(x_i, \ x_j, \ x_{k, \setminus \{i, j\}}) $$

If there is no interaction between \(x_i\) and \(x_j\) (meaning their combined effect on the prediction is perfectly additive), then the joint PD decomposes exactly into the sum of the individual PD functions, provided each PD function is centered to have zero mean over the dataset, as in Friedman and Popescu's definition (without centering, the identity holds only up to an additive constant):

$$ PD_{ij}(x_i, \ x_j) = PD_i(x_i) + PD_j(x_j) $$

Friedman's \(H^2_{ij}\) statistic measures the normalized squared deviation from this additive null hypothesis over the empirical distribution of the dataset:

$$ H^2_{ij} = \frac{\sum_{k=1}^N \left[ PD_{ij}(x_{k,i}, \ x_{k,j}) - PD_i(x_{k,i}) - PD_j(x_{k,j}) \right]^2}{\sum_{k=1}^N \left[ PD_{ij}(x_{k,i}, \ x_{k,j}) \right]^2} $$
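The formulas above can be illustrated with a minimal, dependency-free sketch. It is not the implementation of compute_friedman_h_statistic: the names `partial_dependence`, `center`, and `h_squared_sketch` are hypothetical, and the model is assumed to be a plain callable that maps a dict of feature values to a number, which may differ from the interface this module expects. The sketch centers each PD function to zero mean, following Friedman and Popescu's definition, and evaluates the PD functions at every observed row, so it needs on the order of \(N^2\) predictions.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


def partial_dependence(predict: Callable[[Row], float], rows: List[Row], fixed: Row) -> float:
    """Average prediction when the features in `fixed` are overridden in every row."""
    return sum(predict({**row, **fixed}) for row in rows) / len(rows)


def center(values: List[float]) -> List[float]:
    """Subtract the mean so that the values average to zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def h_squared_sketch(predict: Callable[[Row], float], rows: List[Row],
                     feature_i: str, feature_j: str) -> float:
    """Illustrative H^2 for two features (hypothetical helper, O(N^2) predictions)."""
    # Evaluate each PD function at every observed data point, then center it.
    pd_i = center([partial_dependence(predict, rows, {feature_i: r[feature_i]}) for r in rows])
    pd_j = center([partial_dependence(predict, rows, {feature_j: r[feature_j]}) for r in rows])
    pd_ij = center([
        partial_dependence(predict, rows, {feature_i: r[feature_i], feature_j: r[feature_j]})
        for r in rows
    ])
    numerator = sum((joint - a - b) ** 2 for joint, a, b in zip(pd_ij, pd_i, pd_j))
    denominator = sum(joint ** 2 for joint in pd_ij)
    # Assumption of this sketch: a constant joint PD has no variation to attribute, so return 0.
    return numerator / denominator if denominator > 0 else 0.0


# Toy data: a full grid, so every feature has zero mean and the features are uncorrelated.
grid = [
    {"a": a, "b": b, "c": c}
    for a in (-1.0, 0.0, 1.0)
    for b in (-1.0, 0.0, 1.0)
    for c in (-1.0, 1.0)
]

# Purely additive model: H^2 for (a, b) should be near 0.
h2_additive = h_squared_sketch(lambda r: 2 * r["a"] + 3 * r["b"] + r["c"], grid, "a", "b")
# Model whose dependence on (a, b) is a pure product: H^2 should be near 1.
h2_product = h_squared_sketch(lambda r: r["a"] * r["b"] + r["c"], grid, "a", "b")
print(h2_additive, h2_product)
```

On larger datasets the quadratic cost is usually reduced by evaluating the PD functions on a subsample of rows; whether this module does so is not stated here.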
Interpretation of the H-Statistic:

- \(H^2_{ij} = 0\): No interaction whatsoever. The features affect the response in a perfectly additive manner.
- \(H^2_{ij} = 1\): The combined prediction depends entirely on their interaction; the individual main effects explain \(0\%\) of the joint variation.
- In practice, a value of \(H_{ij} > 0.10\) (where \(H_{ij}\) is the square root of \(H^2_{ij}\)) suggests a substantial, non-negligible interaction effect.
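The two limiting values can be checked by hand on toy models. The models below are hypothetical illustrations. They assume that the sample means of \(x_i\), \(x_j\), and of the product \(x_i x_j\) are all zero, so that centering the PD functions changes nothing.

For a pure product model \(f(x) = x_i x_j\):

$$ PD_i(x_i) = x_i \cdot \frac{1}{N} \sum_{k=1}^N x_{k,j} = 0, \qquad PD_j(x_j) = 0, \qquad PD_{ij}(x_i, \ x_j) = x_i x_j $$

$$ H^2_{ij} = \frac{\sum_{k=1}^N \left[ x_{k,i} x_{k,j} - 0 - 0 \right]^2}{\sum_{k=1}^N \left[ x_{k,i} x_{k,j} \right]^2} = 1 $$

For an additive model \(f(x) = g(x_i) + h(x_j)\), with \(g\) and \(h\) each having zero mean over the sample:

$$ PD_i(x_i) = g(x_i), \qquad PD_j(x_j) = h(x_j), \qquad PD_{ij}(x_i, \ x_j) = g(x_i) + h(x_j) $$

so every term in the numerator vanishes and \(H^2_{ij} = 0\), provided the denominator is nonzero.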
| PARAMETER | DESCRIPTION |
|---|---|
| model | A trained, black-box machine learning model (must implement |
| X_data | The covariate dataset used to evaluate the partial dependencies. |
| feature_i | Name of the first target feature. |
| feature_j | Name of the second target feature. |
| RETURNS | DESCRIPTION |
|---|---|
| float | The computed Friedman's H-statistic (\(H_{ij} \in [0, 1]\)). |