HTE

hte

Heterogeneous Treatment Effect (HTE) discovery and subgroup diagnostics.

This module provides diagnostics for detecting whether treatment effects differ across user segments or cohorts (e.g., country, platform, or acquisition channel).

FUNCTION DESCRIPTION
scan_subgroups_for_hte

Scans demographics/segments to detect Heterogeneous Treatment Effects (HTE) across cohorts.

scan_subgroups_for_hte

scan_subgroups_for_hte(
    df: DataFrame,
    treatment_col: str,
    metric_col: str,
    segment_cols: list,
) -> dict

The Average Treatment Effect (ATE) can often be misleading if different user subgroups respond in opposite directions. For instance, a feature might increase engagement for new users but severely degrade it for power users. Identifying these Heterogeneous Treatment Effects (HTE) is critical for personalized targeting and risk mitigation.
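As a minimal illustration of how opposite-signed subgroup effects can cancel, the sketch below uses two hypothetical segments of equal size with made-up effect sizes. The names and numbers are assumptions for illustration only and do not come from real data.

```python
# Hypothetical segments: (share of users, mean treatment effect on the metric).
segments = {
    "new_users": (0.5, 2.0),
    "power_users": (0.5, -2.0),
}

# The ATE is the share-weighted average of the segment-level effects.
ate = sum(share * effect for share, effect in segments.values())

print(f"ATE = {ate:+.2f}")
for name, (_, effect) in segments.items():
    print(f"{name}: {effect:+.2f}")
```

The pooled effect is zero even though each segment responds strongly, which is the situation the ATE alone cannot reveal.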

The Statistical Threat of Naive Subgroup Sweeping

A common mistake is to run independent t-tests across many segments (e.g., checking 20 different countries). Doing so sharply inflates the probability of at least one false positive due to multiple testing. For \(g\) independent tests, each at significance level \(\alpha\), the family-wise error rate is:
$$ \text{FWER} = 1 - (1 - \alpha)^g $$
where \(g\) is the number of subgroups. If \(g=20\) and \(\alpha=0.05\), there is roughly a \(64\%\) chance of detecting at least one "significant" subgroup effect purely by chance.
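The arithmetic can be checked directly. The helper below is a hypothetical illustration (it is not part of this module) and assumes the tests are independent, as the formula does.

```python
def family_wise_error_rate(alpha: float, g: int) -> float:
    """Probability of at least one false positive across g independent tests."""
    return 1.0 - (1.0 - alpha) ** g


# FWER grows quickly with the number of subgroups tested at alpha = 0.05.
for g in (1, 5, 20, 50):
    print(g, round(family_wise_error_rate(0.05, g), 3))
```

For 20 subgroups the result is about 0.64, matching the figure quoted above.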

To prevent false discoveries, this module implements a two-stage diagnostic framework:

1. Global Interaction Filtering: Rather than running isolated tests on individual subgroups, we fit a single regression model containing an interaction term between the treatment assignment indicator \(T\) and the subgroup variable \(S\):
   $$ Y_i = \beta_0 + \beta_1 T_i + \beta_2 S_i + \beta_3 (T_i \times S_i) + \varepsilon_i $$
   Subgroup-specific effects are reported only if the interaction \(\beta_3\) is jointly statistically significant (\(p < 0.05\)). When \(S\) is categorical with more than two levels, \(\beta_3\) stands for one coefficient per non-reference level, and all of them are tested together.
2. Causal Partitioning (Advanced): Uses algorithmic techniques (such as Causal Trees or Forests, Wager and Athey 2018) that recursively split the covariate space to maximize the difference in treatment effects between leaves, using sample-splitting to prevent overfitting and ensure honest confidence intervals.
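The first stage can be sketched as a comparison of nested OLS models. This is an illustrative sketch rather than the module's own code: the helper `interaction_f_test`, the column names, and the simulated data are all assumptions. It relies on the statsmodels formula API and on `compare_f_test`, which tests the full model against a restricted one.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def interaction_f_test(df, treatment_col, metric_col, segment):
    """Jointly test all treatment-by-segment interaction terms."""
    # Restricted model: main effects only. Full model: adds the interaction.
    reduced = smf.ols(f"{metric_col} ~ {treatment_col} + C({segment})", data=df).fit()
    full = smf.ols(f"{metric_col} ~ {treatment_col} * C({segment})", data=df).fit()
    f_value, p_value, df_diff = full.compare_f_test(reduced)
    return float(f_value), float(p_value), int(df_diff)


# Simulated experiment (hypothetical): the effect differs by platform.
rng = np.random.default_rng(0)
n = 3000
demo = pd.DataFrame({
    "treated": rng.integers(0, 2, size=n),
    "platform": rng.choice(["ios", "android", "web"], size=n),
})
effect = demo["platform"].map({"ios": 1.0, "android": 0.0, "web": -1.0})
demo["metric"] = 10 + effect * demo["treated"] + rng.normal(0, 1, size=n)

f_value, p_value, df_diff = interaction_f_test(demo, "treated", "metric", "platform")
print(f"F = {f_value:.1f}, p = {p_value:.3g}, interaction terms tested = {df_diff}")
```

With three platform levels the test covers two interaction coefficients at once, so a single p-value gates the whole segment instead of one p-value per level.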

Pseudocode for Subgroup HTE Sweeping
function scan_subgroups_for_hte(DataFrame df, String treatment_col, String metric_col, List segment_cols):
    Initialize hte_results = {}
    For each segment in segment_cols:
        - Fit OLS: metric_col ~ treatment_col * segment
        - Compute F-test for the significance of the interaction term.
        - If interaction p-value < 0.05:
            - Calculate specific treatment lifts and confidence intervals within each level of the segment.
            - Add results to hte_results[segment]
    Return hte_results
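The "lifts and confidence intervals" step of the pseudocode could look roughly like the following for a single segment level. This is a sketch under stated assumptions, not the module's implementation: `level_effect` is a hypothetical helper, the inputs are plain lists of outcomes already split by arm, and the interval uses a large-sample normal approximation for the absolute difference in means.

```python
from statistics import NormalDist, mean, variance


def level_effect(ctrl, trt, alpha=0.05):
    """Relative lift plus a normal-approximation CI for the absolute difference."""
    mean_ctrl, mean_trt = mean(ctrl), mean(trt)
    diff = mean_trt - mean_ctrl
    # Standard error of a difference in means, allowing unequal variances.
    se = (variance(ctrl) / len(ctrl) + variance(trt) / len(trt)) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # A relative lift is undefined when the control mean is zero.
    lift = diff / mean_ctrl if mean_ctrl != 0 else float("nan")
    return {"lift": lift, "diff": diff, "ci": (diff - z * se, diff + z * se)}


# Toy data for one segment level (hypothetical values).
result = level_effect(ctrl=[10.0, 12.0, 11.0, 9.0], trt=[13.0, 14.0, 12.0, 15.0])
print(result)
```

With samples this small a t-based interval would be more appropriate; the normal approximation is used here only to keep the sketch dependency-free.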
PARAMETER DESCRIPTION
df

The experimental dataset.

TYPE: DataFrame

treatment_col

Column containing treatment assignments, coded 0 (control) and 1 (treatment).

TYPE: str

metric_col

The outcome metric column.

TYPE: str

segment_cols

List of categorical columns representing user segments (e.g. ["platform", "country"]).

TYPE: list

RETURNS DESCRIPTION
dict

A dictionary of detected heterogeneous treatment effects, including interaction p-values, segment-specific lifts, and confidence intervals.

TYPE: dict
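Going by the source listing, each detected segment maps to a dictionary with an `interaction_p_value` entry and a `subgroup_lifts` mapping from level to lift. The sketch below flattens such a result into rows for reporting. The helper `flatten_hte_results` and the example values are hypothetical.

```python
def flatten_hte_results(hte_results):
    """Turn the nested result dict into (segment, level, lift, p_value) rows."""
    rows = []
    for segment, info in hte_results.items():
        for level, lift in info["subgroup_lifts"].items():
            rows.append((segment, level, lift, info["interaction_p_value"]))
    return rows


# Made-up result in the shape described above (values are illustrative only).
example = {
    "platform": {
        "interaction_p_value": 0.003,
        "subgroup_lifts": {"ios": 0.12, "android": -0.04},
    }
}

rows = flatten_hte_results(example)
for row in rows:
    print(row)
```

Segments whose interaction test is not significant are absent from the dictionary, so an empty result means no heterogeneity was detected at the chosen threshold.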

Source code in src\xpyrment\interpret\hte.py
def scan_subgroups_for_hte(df: pd.DataFrame, treatment_col: str, metric_col: str, segment_cols: list) -> dict:
    r"""Scans demographics/segments to detect Heterogeneous Treatment Effects (HTE) across cohorts.

    The Average Treatment Effect (ATE) can often be misleading if different user subgroups respond in opposite directions.
    For instance, a feature might increase engagement for new users but severely degrade it for power users.
    Identifying these Heterogeneous Treatment Effects (HTE) is critical for personalized targeting and risk mitigation.

    The Statistical Threat of Naive Subgroup Sweeping:
        A common mistake is to perform independent t-tests across numerous segments (e.g., checking 20 different countries).
        Doing so dramatically inflates the probability of false positives due to multiple testing:
        $$
        \\text{FWER} = 1 - (1 - \\alpha)^g
        $$
        where $g$ is the number of subgroups. If $g=20$ and $\alpha=0.05$, there is roughly a $64\%$ chance of detecting
        at least one "significant" subgroup effect purely by chance, assuming independent tests.

    To prevent false discoveries, this module implements a two-stage diagnostic framework:
        1. **Global Interaction Filtering**: Rather than running isolated tests on individual subgroups, we fit an integrated
           regression model containing an interaction term between the treatment assignment indicator $T$ and the subgroup variable $S$:
           $$
           Y_i = \\beta_0 + \\beta_1 T_i + \\beta_2 S_i + \\beta_3 (T_i \\times S_i) + \\varepsilon_i
           $$
           We only report subgroup-specific effects if the joint interaction coefficient $\beta_3$ is statistically significant ($p < 0.05$).
        2. **Causal Partitioning** (Advanced): Uses algorithmic techniques (such as Causal Trees or Forests, Wager and Athey 2018)
           that recursively split the covariate space to maximize the difference in treatment effects between leaves, using
           sample-splitting to prevent overfitting and ensure honest confidence intervals.

    Pseudocode for Subgroup HTE Sweeping:
        ```text
        function scan_subgroups_for_hte(DataFrame df, String treatment_col, String metric_col, List segment_cols):
            Initialize hte_results = {}
            For each segment in segment_cols:
                - Fit OLS: metric_col ~ treatment_col * segment
                - Compute F-test for the significance of the interaction term.
                - If interaction p-value < 0.05:
                    - Calculate specific treatment lifts and confidence intervals within each level of the segment.
                    - Add results to hte_results[segment]
            Return hte_results
        ```

    Args:
        df (pd.DataFrame): The experimental dataset.
        treatment_col (str): Column containing treatment assignments, coded 0 (control) and 1 (treatment).
        metric_col (str): The outcome metric column.
        segment_cols (list): List of categorical columns representing user segments (e.g. `["platform", "country"]`).

    Returns:
        dict: A dictionary of detected heterogeneous treatment effects, including interaction p-values,
            segment-specific lifts, and confidence intervals.
    """
    import warnings

    import numpy as np
    import statsmodels.formula.api as smf
    from statsmodels.stats.weightstats import CompareMeans, DescrStatsW

    hte_results = {}

    for segment in segment_cols:
        # Full model Y ~ T * S and restricted model Y ~ T + S. C() forces the
        # segment to be treated as categorical even if it is numerically coded.
        full_formula = f"{metric_col} ~ {treatment_col} * C({segment})"
        reduced_formula = f"{metric_col} ~ {treatment_col} + C({segment})"
        try:
            full = smf.ols(full_formula, data=df).fit()
            reduced = smf.ols(reduced_formula, data=df).fit()
            # Joint F-test of all interaction terms. Testing each term on its
            # own would reintroduce the multiple-testing problem.
            _, interaction_p, _ = full.compare_f_test(reduced)
        except Exception as exc:
            # Report the failure instead of silently dropping the segment.
            warnings.warn(f"Skipping segment {segment!r}: {exc}")
            continue

        # Also skips NaN p-values (e.g. a segment with a single level).
        if not interaction_p < 0.05:
            continue

        # Per-level relative lift and 95% Welch CI for the absolute difference.
        lifts, diff_cis = {}, {}
        for level in df[segment].dropna().unique():
            sub_df = df[df[segment] == level]
            ctrl = sub_df.loc[sub_df[treatment_col] == 0, metric_col].dropna()
            trt = sub_df.loc[sub_df[treatment_col] == 1, metric_col].dropna()

            if len(ctrl) > 1 and len(trt) > 1:
                mean_ctrl = np.mean(ctrl)
                diff = np.mean(trt) - mean_ctrl
                # A relative lift is undefined when the control mean is zero.
                lifts[str(level)] = diff / mean_ctrl if mean_ctrl != 0 else float("nan")
                compare = CompareMeans(DescrStatsW(trt), DescrStatsW(ctrl))
                diff_cis[str(level)] = compare.tconfint_diff(alpha=0.05, usevar="unequal")

        hte_results[segment] = {
            "interaction_p_value": float(interaction_p),
            "subgroup_lifts": lifts,
            "subgroup_diff_ci_95": diff_cis,
        }

    return hte_results