HTE
Heterogeneous Treatment Effect (HTE) discovery and subgroup diagnostics.
This module provides detection systems to identify whether treatment effects are non-uniform across distinct user segments or demographic cohorts (e.g., country, platform, or acquisition channel).
| FUNCTION | DESCRIPTION |
|---|---|
| `scan_subgroups_for_hte` | Scans demographics/segments to detect Heterogeneous Treatment Effects (HTE) across cohorts. |
scan_subgroups_for_hte
scan_subgroups_for_hte(
    df: DataFrame,
    treatment_col: str,
    metric_col: str,
    segment_cols: list,
) -> dict
Scans demographics/segments to detect Heterogeneous Treatment Effects (HTE) across cohorts.
The Average Treatment Effect (ATE) can often be misleading if different user subgroups respond in opposite directions. For instance, a feature might increase engagement for new users but severely degrade it for power users. Identifying these Heterogeneous Treatment Effects (HTE) is critical for personalized targeting and risk mitigation.
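As a small worked illustration of how opposing subgroup responses can cancel in the average, the sketch below computes a population-weighted ATE from two hypothetical segments. The segment sizes and lifts are made-up numbers, and the weighting assumes treatment was randomized independently of segment membership.

```python
# Hypothetical segment sizes and per-user lifts; illustrative numbers only.
segments = {
    "new_users": {"n_users": 800, "lift": 2.0},     # feature helps new users
    "power_users": {"n_users": 200, "lift": -8.0},  # feature hurts power users
}

total_users = sum(s["n_users"] for s in segments.values())

# Under randomization, the ATE is the size-weighted average of segment lifts.
ate = sum(s["n_users"] * s["lift"] for s in segments.values()) / total_users

print(ate)  # 0.0: the average hides a large negative effect on power users
```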
The Statistical Threat of Naive Subgroup Sweeping
A common mistake is to run independent t-tests across many segments (e.g., checking 20 different countries). Doing so sharply inflates the family-wise error rate (FWER), the probability of at least one false positive across all tests. For independent tests, each run at significance level \(\alpha\): $$ \text{FWER} = 1 - (1 - \alpha)^g $$ where \(g\) is the number of subgroups. If \(g=20\) and \(\alpha=0.05\), there is roughly a \(64\%\) chance of detecting at least one "significant" subgroup effect purely by chance, even when no subgroup has a true effect.
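The formula can be checked numerically. The following is a minimal sketch, not part of this module: the helper names `family_wise_error_rate` and `simulate_fwer` are hypothetical, and the simulation relies on the fact that p-values are uniformly distributed on [0, 1] when the null hypothesis is true and the tests are independent.

```python
import random


def family_wise_error_rate(alpha: float, g: int) -> float:
    """Analytic FWER for g independent tests, each at level alpha."""
    return 1.0 - (1.0 - alpha) ** g


def simulate_fwer(alpha: float, g: int, n_trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo FWER: draw g null p-values per trial and count trials
    in which at least one falls below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        if any(rng.random() < alpha for _ in range(g)):
            hits += 1
    return hits / n_trials


analytic = family_wise_error_rate(0.05, 20)
simulated = simulate_fwer(0.05, 20)
print(f"analytic={analytic:.3f} simulated={simulated:.3f}")
```

The analytic value for 20 subgroups at a 0.05 level is about 0.64, and the simulated rate should land close to it.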
To prevent false discoveries, this module implements a two-stage diagnostic framework:

1. Global Interaction Filtering: Rather than running isolated tests on individual subgroups, we fit a single regression model containing an interaction term between the treatment assignment indicator \(T\) and the subgroup variable \(S\):
   $$ Y_i = \beta_0 + \beta_1 T_i + \beta_2 S_i + \beta_3 (T_i \times S_i) + \varepsilon_i $$
   Subgroup-specific effects are reported only if the interaction coefficient \(\beta_3\) is statistically significant (\(p < 0.05\)). When \(S\) has more than two levels, \(\beta_3\) stands for a set of interaction coefficients that are tested jointly.
2. Causal Partitioning (Advanced): Uses algorithmic techniques (such as Causal Trees or Forests, Wager and Athey 2018) that recursively split the covariate space to maximize the difference in treatment effects between leaves, using sample-splitting to prevent overfitting and ensure honest confidence intervals.
Pseudocode for Subgroup HTE Sweeping
function scan_subgroups_for_hte(DataFrame df, String treatment_col, String metric_col, List segment_cols):
    Initialize hte_results = {}
    For each segment in segment_cols:
        - Fit OLS: metric_col ~ treatment_col * segment
        - Compute F-test for the significance of the interaction term.
        - If interaction p-value < 0.05:
            - Calculate specific treatment lifts and confidence intervals within each level of the segment.
            - Add results to hte_results[segment]
    Return hte_results
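The pseudocode above could be realized roughly as follows. This is a hedged sketch rather than the module's actual implementation: the function name `scan_subgroups_for_hte_sketch`, the `alpha` parameter, and the shape of the returned dictionary are assumptions made for illustration, and the treatment column is assumed to be a binary 0/1 indicator. It uses pandas and statsmodels, with the joint interaction F-test computed by comparing the full model against the additive model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def scan_subgroups_for_hte_sketch(df, treatment_col, metric_col, segment_cols, alpha=0.05):
    """Illustrative two-stage scan; assumes a binary 0/1 treatment column."""
    hte_results = {}
    for segment in segment_cols:
        # Copy into fixed column names so arbitrary names are formula-safe.
        cols = df[[metric_col, treatment_col, segment]].dropna()
        work = pd.DataFrame({
            "y": cols[metric_col].astype(float),
            "t": cols[treatment_col].astype(int),
            "s": cols[segment].astype(str),
        })

        # Stage 1: joint F-test of all treatment-by-segment interaction terms,
        # comparing the full model with the additive (no-interaction) model.
        full = smf.ols("y ~ t * C(s)", data=work).fit()
        restricted = smf.ols("y ~ t + C(s)", data=work).fit()
        _, p_value, _ = full.compare_f_test(restricted)
        if p_value >= alpha:
            continue  # no evidence of heterogeneity for this segment column

        # Stage 2: per-level lift (treated mean minus control mean) with a 95% CI.
        levels = {}
        for level, sub in work.groupby("s"):
            fit = smf.ols("y ~ t", data=sub).fit()
            ci_low, ci_high = fit.conf_int(alpha=0.05).loc["t"]
            levels[level] = {
                "lift": float(fit.params["t"]),
                "ci": (float(ci_low), float(ci_high)),
            }
        hte_results[segment] = {"interaction_p_value": float(p_value), "levels": levels}
    return hte_results


# Synthetic demo data (hypothetical): the effect is +1 on iOS and -1 on Android.
rng = np.random.default_rng(0)
n = 4000
demo = pd.DataFrame({
    "treatment": rng.integers(0, 2, size=n),
    "platform": rng.choice(["ios", "android"], size=n),
})
effect = np.where(demo["platform"] == "ios", 1.0, -1.0)
demo["engagement"] = 1.0 + effect * demo["treatment"] + rng.normal(0.0, 1.0, size=n)

results = scan_subgroups_for_hte_sketch(demo, "treatment", "engagement", ["platform"])
print(results["platform"]["interaction_p_value"])
```

Note that the sketch applies the 0.05 threshold separately to each segment column and reports unadjusted per-level intervals; scanning many segment columns would call for a further multiplicity adjustment.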
| PARAMETER | DESCRIPTION |
|---|---|
| `df` | The experimental dataset. TYPE: `DataFrame` |
| `treatment_col` | Column containing treatment assignments. TYPE: `str` |
| `metric_col` | The outcome metric column. TYPE: `str` |
| `segment_cols` | List of categorical columns representing user segments (e.g. …). TYPE: `list` |
| RETURNS | DESCRIPTION |
|---|---|
| `dict` | A dictionary of detected heterogeneous treatment effects, including interaction p-values, segment-specific lifts, and confidence intervals. TYPE: `dict` |