Metrics Module

The xpyrment.metrics package implements a unified, low-code statistical metrics taxonomy supporting automated CUPED variance reduction, Delta method ratio calculations, and frequentist/Bayesian evaluation pipelines.

📊 Supported Metric Models

The following metric formulas are dynamically registered and supported:

Metric Model	Technical Class Path	Key Features & Analytical Properties
MeanMetric	`xpyrment.metrics.taxonomy.MeanMetric`	A metric representing a continuous or numeric value (e.g., average revenue, sessions).
ProportionMetric	`xpyrment.metrics.taxonomy.ProportionMetric`	A metric representing a binary/proportion rate (e.g., conversion rate, success rate).
RatioMetric	`xpyrment.metrics.taxonomy.RatioMetric`	A metric calculated as the ratio: sum(numerator) / sum(denominator) (e.g., Click-Through-Rate).

📦 Package Reference

metrics

Metrics package for taxonomy, guardrails, and transformations.

This package houses the core definitions and statistical calculation routines for metrics evaluated during an experiment:

- BaseMetric: The abstract base class establishing standard evaluation contracts.
- MeanMetric: For continuous measurements, supporting pre-period CUPED adjustments.
- ProportionMetric: For binary binomial event rates.
- RatioMetric: For compound aggregate metrics (e.g., Click-Through Rate) evaluated via Delta Method variance.
- GuardrailMetric: Special monitoring wrapper to prevent platform/business regressions.
- log_transform: Normalization for extremely skewed continuous metrics.
- delta_normalization: Taylor expansion adjustment for advanced aggregate metrics.

MODULE	DESCRIPTION
`guardrails`	Guardrail metrics to protect core platform health and business stability.
`taxonomy`	Standardized metrics taxonomy, calculation engines, and variance reduction routines.
`transformations`	Mathematical transformations and normalization utilities for experimental telemetry.

CLASS	DESCRIPTION
`GuardrailMetric`	Defines a guardrail metric with specific breach thresholds.
`BaseMetric`	Abstract base class representing a statistical metric in the experiment taxonomy.
`MeanMetric`	A metric representing a continuous or numeric value (e.g., average revenue, sessions).
`ProportionMetric`	A metric representing a binary/proportion rate (e.g., conversion rate, success rate).
`RatioMetric`	A metric calculated as the ratio: sum(numerator) / sum(denominator) (e.g., Click-Through-Rate).

FUNCTION	DESCRIPTION
`log_transform`	Transforms continuous metrics using a shifted natural log transformation.
`delta_normalization`	Normalizes metrics using the Delta Method expansion (Stub/Scaffolding).

GuardrailMetric

GuardrailMetric(
    metric: BaseMetric, max_allowed_change: float = 0.01
)

Defines a guardrail metric with specific breach thresholds.

Guardrail metrics are designed to prevent treatment arms from causing catastrophic regressions on critical secondary metrics. Unlike primary metrics (where we search for significant positive changes), guardrail metrics are checked against a pre-specified tolerance boundary, irrespective of statistical significance. The check is symmetric: a breach is flagged when the relative lift exceeds the tolerance in either direction, so the caller decides whether a flagged movement is actually a deterioration.

ATTRIBUTE	DESCRIPTION
`metric`	The underlying metric to monitor (e.g., MeanMetric, RatioMetric). TYPE: `BaseMetric`
`max_allowed_change`	The maximum tolerated relative change (positive or negative) expressed as a fraction (e.g., `0.01` represents a 1% threshold). TYPE: `float`

Examples:

Example

>>> from xpyrment.metrics.taxonomy import MeanMetric
>>> from xpyrment.metrics.guardrails import GuardrailMetric
>>> latency_metric = MeanMetric("Page Latency", value_col="load_time")
>>> guardrail = GuardrailMetric(latency_metric, max_allowed_change=0.02) # 2% max relative change, either direction
>>> calc_result = {"metric_name": "Page Latency", "relative_lift": 0.035} # 3.5% lift (regression)
>>> guardrail.check_breach(calc_result)
True

PARAMETER	DESCRIPTION
`metric`	The concrete metric instance being monitored. TYPE: `BaseMetric`
`max_allowed_change`	The threshold for the maximum absolute relative lift allowed before triggering a breach. Defaults to 0.01 (1%). TYPE: `float` DEFAULT: `0.01`

METHOD	DESCRIPTION
`check_breach`	Determines if the calculated lift breaches the guardrail thresholds.

Source code in src\xpyrment\metrics\guardrails.py

def __init__(self, metric: BaseMetric, max_allowed_change: float = 0.01):
    """Initializes a GuardrailMetric wrapper.

    Args:
        metric (BaseMetric): The concrete metric instance being monitored.
        max_allowed_change (float): The threshold for the maximum absolute relative lift allowed before
            triggering a breach. Defaults to 0.01 (1%).
    """
    self.metric = metric
    self.max_allowed_change = max_allowed_change

check_breach

check_breach(calculation_result: Dict[str, Any]) -> bool

Determines if the calculated lift breaches the guardrail thresholds.

Mathematical Representation

Let $L$ be the relative lift calculated for the wrapped metric:

$$ L = \frac{\bar{Y}_T - \bar{Y}_C}{\bar{Y}_C} $$

A breach is detected if the magnitude of the relative lift exceeds the maximum allowed change:

$$ \text{Breach} = |L| > \text{max\_allowed\_change} $$

PARAMETER	DESCRIPTION
`calculation_result`	Output dictionary produced by calling `metric.calculate()` on experimental data. TYPE: `Dict[str, Any]`

RETURNS	DESCRIPTION
`bool`	True if the relative lift is larger in magnitude than `max_allowed_change` (breached), False otherwise. TYPE: `bool`
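The breach rule can be sketched in isolation. This is a minimal stand-alone illustration rather than the library code: `is_breach` is a hypothetical helper that mirrors the documented comparison, and the input dictionaries are invented for the example.

```python
from typing import Any, Dict


def is_breach(calculation_result: Dict[str, Any], max_allowed_change: float = 0.01) -> bool:
    """Hypothetical stand-in for the documented rule: |relative_lift| > tolerance."""
    # A missing key falls back to 0.0, which never counts as a breach.
    lift = calculation_result.get("relative_lift", 0.0)
    return abs(lift) > max_allowed_change


checks = {
    "large_increase": is_breach({"relative_lift": 0.035}, 0.02),
    "large_decrease": is_breach({"relative_lift": -0.035}, 0.02),
    "within_tolerance": is_breach({"relative_lift": 0.015}, 0.02),
    "missing_key": is_breach({}, 0.02),
}
print(checks)
```

Because the comparison uses the absolute value, a 3.5% decrease is flagged just like a 3.5% increase, and a result dictionary without a `relative_lift` entry passes silently.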

Source code in src\xpyrment\metrics\guardrails.py

def check_breach(self, calculation_result: Dict[str, Any]) -> bool:
    r"""Determines if the calculated lift breaches the guardrail thresholds.

    ??? mathbox "Mathematical Representation"

        Let $L$ be the relative lift calculated for the wrapped metric:
        $$
        L = \frac{\bar{Y}_T - \bar{Y}_C}{\bar{Y}_C}
        $$
        A breach is detected if the magnitude of the relative lift exceeds the maximum allowed change:
        $$
        \text{Breach} = |L| > \text{max\_allowed\_change}
        $$
    Args:
        calculation_result (Dict[str, Any]): Output dictionary produced by calling
            `metric.calculate()` on experimental data.

    Returns:
        bool: True if the relative lift is larger in magnitude than `max_allowed_change` (breached),
            False otherwise.
    """
    # A missing "relative_lift" key falls back to 0.0, which is never a breach.
    lift = calculation_result.get("relative_lift", 0.0)
    # Breach occurs if the lift moves beyond max_allowed_change in either direction.
    return abs(lift) > self.max_allowed_change

BaseMetric

BaseMetric(name: str)

Bases: ABC

Abstract base class representing a statistical metric in the experiment taxonomy.

All custom metrics must inherit from BaseMetric and implement the abstract .calculate() method to return a standardized MetricResult dictionary.

ATTRIBUTE	DESCRIPTION
`name`	The unique descriptive name of the metric. TYPE: `str`

PARAMETER	DESCRIPTION
`name`	Unique descriptive name of the metric. TYPE: `str`

METHOD	DESCRIPTION
`calculate`	Abstract method to compute statistics for control and treatment groups.

Source code in src\xpyrment\metrics\taxonomy.py

def __init__(self, name: str):
    """Initializes a BaseMetric.

    Args:
        name (str): Unique descriptive name of the metric.
    """
    self.name = name

calculate `abstractmethod`

calculate(
    df: DataFrame,
    treatment_col: str,
    control: str,
    treatment: str,
) -> Dict[str, Any]

Abstract method to compute statistics for control and treatment groups.

PARAMETER	DESCRIPTION
`df`	The experimental dataset. TYPE: `DataFrame`
`treatment_col`	Column name identifying experimental groups/arms. TYPE: `str`
`control`	The value in `treatment_col` representing the control group. TYPE: `str`
`treatment`	The value in `treatment_col` representing the treatment group. TYPE: `str`

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dict[str, Any]: A compliant `MetricResult` dictionary containing mean, variance, p-value, confidence intervals, power, etc.

Source code in src\xpyrment\metrics\taxonomy.py

@abstractmethod
def calculate(
    self, df: pd.DataFrame, treatment_col: str, control: str, treatment: str
) -> Dict[str, Any]:
    """Abstract method to compute statistics for control and treatment groups.

    Args:
        df (pd.DataFrame): The experimental dataset.
        treatment_col (str): Column name identifying experimental groups/arms.
        control (str): The value in `treatment_col` representing the control group.
        treatment (str): The value in `treatment_col` representing the treatment group.

    Returns:
        Dict[str, Any]: A compliant `MetricResult` dictionary containing mean, variance,
            p-value, confidence intervals, power, etc.
    """
    pass

MeanMetric

MeanMetric(
    name: str,
    value_col: str,
    pre_period_col: Optional[str] = None,
)

Bases: BaseMetric

A metric representing a continuous or numeric value (e.g., average revenue, sessions).

Supports optional pre-period CUPED (Controlled-experiment Using Pre-Experiment Data) adjustment, which uses a pre-experiment covariate to remove variance that predates the experiment and can thereby reduce the required sample size or runtime.

ATTRIBUTE	DESCRIPTION
`value_col`	The column in the DataFrame containing active experiment period values. TYPE: `str`
`pre_period_col`	The column containing pre-experiment baseline values for CUPED. TYPE: `Optional[str]`

PARAMETER	DESCRIPTION
`name`	Unique descriptive name of the metric. TYPE: `str`
`value_col`	Column name containing experiment period values. TYPE: `str`
`pre_period_col`	Column name containing pre-experiment baseline values for CUPED. Defaults to None (no CUPED applied). TYPE: `Optional[str]` DEFAULT: `None`

METHOD	DESCRIPTION
`calculate`	Calculates descriptive and Welch's t-test statistics for the mean metric.

Source code in src\xpyrment\metrics\taxonomy.py

def __init__(
    self,
    name: str,
    value_col: str,
    pre_period_col: Optional[str] = None,
):
    """Initializes a MeanMetric.

    Args:
        name (str): Unique descriptive name of the metric.
        value_col (str): Column name containing experiment period values.
        pre_period_col (Optional[str]): Column name containing pre-experiment baseline values for CUPED.
            Defaults to None (no CUPED applied).
    """
    super().__init__(name)
    self.value_col = value_col
    self.pre_period_col = pre_period_col

calculate

calculate(
    df: DataFrame,
    treatment_col: str,
    control: str,
    treatment: str,
    alpha: float = 0.05,
) -> Dict[str, Any]

Calculates descriptive and Welch's t-test statistics for the mean metric.

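Drops rows with a missing value in the value column. If pre_period_col is provided, rows with a missing pre-period value are dropped as well, and a standard linear CUPED adjustment is applied:

\[ Y_i^{\text{CUPED}} = Y_i - \theta (X_i - \bar{X}) \]

Here $\theta = \text{Cov}(Y, X) / \text{Var}(X)$ and $\bar{X}$ are both computed over all remaining rows, pooled across arms. If the pre-period variance is numerically zero, the adjustment is skipped and the unadjusted values are used.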

PARAMETER	DESCRIPTION
`df`	The experimental dataset. TYPE: `DataFrame`
`treatment_col`	Column identifying treatment assignments. TYPE: `str`
`control`	Control arm identifier value in `treatment_col`. TYPE: `str`
`treatment`	Treatment arm identifier value in `treatment_col`. TYPE: `str`
`alpha`	Significance level for Welch's confidence intervals. Defaults to 0.05. TYPE: `float` DEFAULT: `0.05`

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dict[str, Any]: A completed `MetricResult` dictionary.

RAISES	DESCRIPTION
`ValueError`	If either control or treatment group becomes empty after filtering.
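The CUPED adjustment described for this method can be illustrated on a handful of numbers. The sketch below is an assumption-laden stand-in, not the library implementation: `cuped_adjust` is a hypothetical helper, it works on plain Python lists instead of NumPy arrays, and the data are invented.

```python
import math
from statistics import mean, variance


def cuped_adjust(y, x):
    """Return (y_cuped, theta) for outcome y and pre-period covariate x."""
    x_bar, y_bar = mean(x), mean(y)
    var_x = variance(x)  # sample variance (n - 1 denominator)
    if math.isclose(var_x, 0.0, abs_tol=1e-12):
        # A constant covariate carries no information, so skip the adjustment.
        return list(y), 0.0
    cov_yx = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(y, x)) / (len(y) - 1)
    theta = cov_yx / var_x
    return [yi - theta * (xi - x_bar) for yi, xi in zip(y, x)], theta


# Invented data: x is the pre-period value, y the experiment-period value.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

y_cuped, theta = cuped_adjust(y, x)
variance_reduction = max(0.0, (variance(y) - variance(y_cuped)) / variance(y))
print(f"theta={theta:.3f}, variance reduction={variance_reduction:.1%}")
```

Centering the covariate on its mean leaves the overall mean of the outcome unchanged, so the adjustment only removes the part of the variance that the pre-period values explain.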

Source code in src\xpyrment\metrics\taxonomy.py

def calculate(
    self,
    df: pd.DataFrame,
    treatment_col: str,
    control: str,
    treatment: str,
    alpha: float = 0.05,
) -> Dict[str, Any]:
    r"""Calculates descriptive and Welch's t-test statistics for the mean metric.

    Drops missing values on the value column. If `pre_period_col` is provided,
    performs joint missing drop and executes a standard linear CUPED adjustment:

    $$
    Y_i^{\text{CUPED}} = Y_i - \theta (X_i - \bar{X})
    $$

    Args:
        df (pd.DataFrame): The experimental dataset.
        treatment_col (str): Column identifying treatment assignments.
        control (str): Control arm identifier value in `treatment_col`.
        treatment (str): Treatment arm identifier value in `treatment_col`.
        alpha (float): Significance level for Welch's confidence intervals. Defaults to 0.05.

    Returns:
        Dict[str, Any]: A completed `MetricResult` dictionary.

    Raises:
        ValueError: If either control or treatment group becomes empty after filtering.
    """
    df_clean = df.dropna(subset=[self.value_col]).copy()

    c_mask = df_clean[treatment_col] == control
    t_mask = df_clean[treatment_col] == treatment

    n_c = int(np.sum(c_mask))
    n_t = int(np.sum(t_mask))

    if n_c == 0 or n_t == 0:
        raise ValueError(f"Control or treatment group is empty for metric {self.name}.")

    y = df_clean[self.value_col].to_numpy()

    cuped_achieved = False
    variance_reduction = 0.0

    if self.pre_period_col and self.pre_period_col in df_clean.columns:
        df_clean = df_clean.dropna(subset=[self.pre_period_col])
        c_mask = df_clean[treatment_col] == control
        t_mask = df_clean[treatment_col] == treatment
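        n_c = int(np.sum(c_mask))
        n_t = int(np.sum(t_mask))

        # Re-check: dropping rows with a missing pre-period value can empty an arm.
        if n_c == 0 or n_t == 0:
            raise ValueError(f"Control or treatment group is empty for metric {self.name}.")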

        y = df_clean[self.value_col].to_numpy()
        x = df_clean[self.pre_period_col].to_numpy()

        var_x = np.var(x, ddof=1)
        if not np.isclose(var_x, 0.0, atol=1e-12):
            cov_yx = np.cov(y, x, ddof=1)[0, 1]
            theta = cov_yx / var_x
            mean_x_global = np.mean(x)

            y_cuped = y - theta * (x - mean_x_global)

            y_c = y_cuped[c_mask]
            y_t = y_cuped[t_mask]

            mean_c = float(np.mean(y_c))
            mean_t = float(np.mean(y_t))

            var_c = float(np.var(y_c, ddof=1))
            var_t = float(np.var(y_t, ddof=1))

            cuped_achieved = True

            orig_var = np.var(y, ddof=1)
            adjusted_var = np.var(y_cuped, ddof=1)
            if orig_var > 0:
                variance_reduction = max(0.0, (orig_var - adjusted_var) / orig_var)
        else:
            mean_c = float(np.mean(y[c_mask]))
            mean_t = float(np.mean(y[t_mask]))
            var_c = float(np.var(y[c_mask], ddof=1))
            var_t = float(np.var(y[t_mask], ddof=1))
    else:
        mean_c = float(np.mean(y[c_mask]))
        mean_t = float(np.mean(y[t_mask]))
        var_c = float(np.var(y[c_mask], ddof=1))
        var_t = float(np.var(y[t_mask], ddof=1))

    stats_dict = self._calculate_stats(
        mean_c=mean_c,
        mean_t=mean_t,
        var_c=var_c,
        var_t=var_t,
        n_c=n_c,
        n_t=n_t,
        alpha=alpha,
    )

    diff = mean_t - mean_c
    relative_lift = diff / mean_c if mean_c != 0 else 0.0

    results = {
        "metric_name": self.name,
        "metric_type": "Mean",
        "control_mean": mean_c,
        "treatment_mean": mean_t,
        "control_var": var_c,
        "treatment_var": var_t,
        "control_n": n_c,
        "treatment_n": n_t,
        "absolute_difference": diff,
        "relative_lift": relative_lift,
        "cuped_applied": cuped_achieved,
        "variance_reduction": variance_reduction,
        **stats_dict,
    }

    return results
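The source above hands the per-arm means, variances, and sample sizes to `_calculate_stats`, whose body is not part of this reference. As a hedged sketch of the textbook Welch quantities such a helper would typically need (not a reproduction of it), the hypothetical function `welch_t` below computes the t statistic and the Welch-Satterthwaite degrees of freedom from the same inputs.

```python
import math


def welch_t(mean_c, mean_t, var_c, var_t, n_c, n_t):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    se2_c = var_c / n_c  # squared standard error of the control mean
    se2_t = var_t / n_t  # squared standard error of the treatment mean
    t_stat = (mean_t - mean_c) / math.sqrt(se2_c + se2_t)
    dof = (se2_c + se2_t) ** 2 / (se2_c**2 / (n_c - 1) + se2_t**2 / (n_t - 1))
    return t_stat, dof


# Invented summary statistics for two arms of 100 units each.
t_stat, dof = welch_t(mean_c=10.0, mean_t=10.5, var_c=4.0, var_t=9.0, n_c=100, n_t=100)
print(f"t={t_stat:.3f}, dof={dof:.1f}")
```

The degrees of freedom always fall between the smaller arm's `n - 1` and the pooled `n_c + n_t - 2`, which is why Welch's test is the safer default when the arm variances differ.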

ProportionMetric

ProportionMetric(
    name: str,
    value_col: str,
    pre_period_col: Optional[str] = None,
)

Bases: MeanMetric

A metric representing a binary/proportion rate (e.g., conversion rate, success rate).

Inherits continuous logic from MeanMetric, as proportions can be modelled asymptotically using normal approximations (Z-test/t-test) under the Central Limit Theorem.

PARAMETER	DESCRIPTION
`name`	Unique descriptive name of the metric. TYPE: `str`
`value_col`	Column name containing experiment period values. TYPE: `str`
`pre_period_col`	Column name containing pre-experiment baseline values for CUPED. Defaults to None (no CUPED applied). TYPE: `Optional[str]` DEFAULT: `None`

METHOD	DESCRIPTION
`calculate`	Calculates proportion conversion rates, differences, and statistical significance.

Source code in src\xpyrment\metrics\taxonomy.py

def __init__(
    self,
    name: str,
    value_col: str,
    pre_period_col: Optional[str] = None,
):
    """Initializes a MeanMetric.

    Args:
        name (str): Unique descriptive name of the metric.
        value_col (str): Column name containing experiment period values.
        pre_period_col (Optional[str]): Column name containing pre-experiment baseline values for CUPED.
            Defaults to None (no CUPED applied).
    """
    super().__init__(name)
    self.value_col = value_col
    self.pre_period_col = pre_period_col

calculate

calculate(
    df: DataFrame,
    treatment_col: str,
    control: str,
    treatment: str,
    alpha: float = 0.05,
) -> Dict[str, Any]

Calculates proportion conversion rates, differences, and statistical significance.

Drops missing values, calculates means and variances of binary inputs, and delegates to MeanMetric.calculate while overriding the metric type string to "Proportion".

PARAMETER	DESCRIPTION
`df`	The experimental dataset. TYPE: `DataFrame`
`treatment_col`	Column identifying treatment assignments. TYPE: `str`
`control`	Control arm identifier value. TYPE: `str`
`treatment`	Treatment arm identifier value. TYPE: `str`
`alpha`	Significance level. Defaults to 0.05. TYPE: `float` DEFAULT: `0.05`

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dict[str, Any]: Standardized results dict with "metric_type" set to "Proportion".
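Reusing the mean-metric logic works for binary data because the mean of a 0/1 column is the conversion rate, and its sample variance is the Bernoulli variance up to the `n / (n - 1)` correction. The following sketch checks that identity on invented data with the standard library only; it does not call the library.

```python
from statistics import mean, variance

# Invented binary outcomes: 1.0 = converted, 0.0 = did not convert.
converted = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
n = len(converted)

p_hat = mean(converted)                            # conversion rate
sample_var = variance(converted)                   # n - 1 denominator, like ddof=1
bernoulli_var = p_hat * (1 - p_hat) * n / (n - 1)  # p(1 - p) with the same correction

print(p_hat, sample_var, bernoulli_var)
```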

Source code in src\xpyrment\metrics\taxonomy.py

def calculate(
    self,
    df: pd.DataFrame,
    treatment_col: str,
    control: str,
    treatment: str,
    alpha: float = 0.05,
) -> Dict[str, Any]:
    """Calculates proportion conversion rates, differences, and statistical significance.

    Drops missing values, calculates means and variances of binary inputs, and delegates
    to `MeanMetric.calculate` while overriding the metric type string to "Proportion".

    Args:
        df (pd.DataFrame): The experimental dataset.
        treatment_col (str): Column identifying treatment assignments.
        control (str): Control arm identifier value.
        treatment (str): Treatment arm identifier value.
        alpha (float): Significance level. Defaults to 0.05.

    Returns:
        Dict[str, Any]: Standardized results dict with "metric_type" set to "Proportion".
    """
    res = super().calculate(df, treatment_col, control, treatment, alpha)
    res["metric_type"] = "Proportion"
    return res

RatioMetric

RatioMetric(
    name: str,
    numerator_col: str,
    denominator_col: str,
    pre_numerator_col: Optional[str] = None,
    pre_denominator_col: Optional[str] = None,
)

Bases: BaseMetric

A metric calculated as the ratio: sum(numerator) / sum(denominator) (e.g., Click-Through-Rate).

Employs the Delta Method to approximate ratio-level variances and supports double-covariate ratio-level CUPED adjustments to independently reduce variance in numerator and denominator.

ATTRIBUTE	DESCRIPTION
`numerator_col`	The column containing active period numerator values. TYPE: `str`
`denominator_col`	The column containing active period denominator values (must be $>0$). TYPE: `str`
`pre_numerator_col`	Column containing pre-experiment baseline numerator values. TYPE: `Optional[str]`
`pre_denominator_col`	Column containing pre-experiment baseline denominator values. TYPE: `Optional[str]`

PARAMETER	DESCRIPTION
`name`	Unique descriptive name. TYPE: `str`
`numerator_col`	Active numerator column name. TYPE: `str`
`denominator_col`	Active denominator column name. TYPE: `str`
`pre_numerator_col`	Pre-experiment numerator column. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`pre_denominator_col`	Pre-experiment denominator column. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`

METHOD	DESCRIPTION
`calculate`	Calculates ratio values, Delta-method variances, and statistical significance.

Source code in src\xpyrment\metrics\taxonomy.py

def __init__(
    self,
    name: str,
    numerator_col: str,
    denominator_col: str,
    pre_numerator_col: Optional[str] = None,
    pre_denominator_col: Optional[str] = None,
):
    """Initializes a RatioMetric.

    Args:
        name (str): Unique descriptive name.
        numerator_col (str): Active numerator column name.
        denominator_col (str): Active denominator column name.
        pre_numerator_col (Optional[str]): Pre-experiment numerator column. Defaults to None.
        pre_denominator_col (Optional[str]): Pre-experiment denominator column. Defaults to None.
    """
    super().__init__(name)
    self.numerator_col = numerator_col
    self.denominator_col = denominator_col
    self.pre_numerator_col = pre_numerator_col
    self.pre_denominator_col = pre_denominator_col

calculate

calculate(
    df: DataFrame,
    treatment_col: str,
    control: str,
    treatment: str,
    alpha: float = 0.05,
) -> Dict[str, Any]

Calculates ratio values, Delta-method variances, and statistical significance.

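Drops rows with missing values or non-positive denominators. If both pre-period columns are provided, rows with a missing pre-period value or a non-positive pre-period denominator are dropped as well, and separate linear CUPED adjustments are fitted to the numerator and denominator series: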

\[ U_i^{\text{CUPED}} = U_i - \theta_U (U_{i,\text{pre}} - \bar{U}_{\text{pre}}) \]

\[ V_i^{\text{CUPED}} = V_i - \theta_V (V_{i,\text{pre}} - \bar{V}_{\text{pre}}) \]

The ratio variance is then estimated separately for each arm using the Delta Method formulation, where $R = \bar{U} / \bar{V}$ is that arm's ratio of means:

\[ \text{Var}\left(\frac{U}{V}\right) \approx \frac{1}{\bar{V}^2} \left[ \text{Var}(U) + R^2 \text{Var}(V) - 2 R \text{Cov}(U, V) \right] \]

PARAMETER	DESCRIPTION
`df`	The experimental dataset. TYPE: `DataFrame`
`treatment_col`	Column identifying treatment assignments. TYPE: `str`
`control`	Control arm identifier value. TYPE: `str`
`treatment`	Treatment arm identifier value. TYPE: `str`
`alpha`	Significance level. Defaults to 0.05. TYPE: `float` DEFAULT: `0.05`

RETURNS	DESCRIPTION
`Dict[str, Any]`	Dict[str, Any]: Completed `MetricResult` dictionary.

RAISES	DESCRIPTION
`ValueError`	If either control or treatment group becomes empty after filtering.
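The Delta Method variance used by this method can be checked numerically. The sketch below is illustrative only: `delta_ratio_variance` is a hypothetical helper written with the standard library, and the click and view counts are invented. It relies on the fact that the bracketed Delta Method term equals the sample variance of the linearized values `u_i - R * v_i`.

```python
from statistics import mean, variance


def delta_ratio_variance(u, v):
    """Return (ratio of means, Delta Method variance) for numerator u and denominator v."""
    u_bar, v_bar = mean(u), mean(v)
    r = u_bar / v_bar
    cov_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v)) / (len(u) - 1)
    var_ratio = (variance(u) + r**2 * variance(v) - 2 * r * cov_uv) / v_bar**2
    return r, var_ratio


# Invented per-user data: clicks (numerator) and page views (denominator).
clicks = [1.0, 0.0, 3.0, 2.0, 5.0]
views = [10.0, 8.0, 20.0, 15.0, 30.0]

ratio, var_ratio = delta_ratio_variance(clicks, views)

# Cross-check: the same quantity via the linearized residuals u_i - R * v_i.
linearized = [c - ratio * v for c, v in zip(clicks, views)]
var_check = variance(linearized) / mean(views) ** 2
print(ratio, var_ratio, var_check)
```

The ratio of means equals `sum(clicks) / sum(views)`, and a numerator that is exactly proportional to the denominator gives a variance of zero, since every unit then has the same ratio.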

Source code in src\xpyrment\metrics\taxonomy.py

def calculate(
    self,
    df: pd.DataFrame,
    treatment_col: str,
    control: str,
    treatment: str,
    alpha: float = 0.05,
) -> Dict[str, Any]:
    r"""Calculates ratio values, Delta-method variances, and statistical significance.

    Cleans missing values and non-positive denominators. If double-covariates are present,
    separately fits linear CUPED adjustments to the numerator and denominator series:

    $$
    U_i^{\text{CUPED}} = U_i - \theta_U (U_{i,\text{pre}} - \bar{U}_{\text{pre}})
    $$

    $$
    V_i^{\text{CUPED}} = V_i - \theta_V (V_{i,\text{pre}} - \bar{V}_{\text{pre}})
    $$

    The ratio variance is then estimated using the Delta Method formulation:

    $$
    \text{Var}\left(\frac{U}{V}\right) \approx \frac{1}{\bar{V}^2} \left[ \text{Var}(U) + R^2 \text{Var}(V) - 2 R \text{Cov}(U, V) \right]
    $$

    Args:
        df (pd.DataFrame): The experimental dataset.
        treatment_col (str): Column identifying treatment assignments.
        control (str): Control arm identifier value.
        treatment (str): Treatment arm identifier value.
        alpha (float): Significance level. Defaults to 0.05.

    Returns:
        Dict[str, Any]: Completed `MetricResult` dictionary.

    Raises:
        ValueError: If either control or treatment group becomes empty after filtering.
    """
    df_clean = df.dropna(subset=[self.numerator_col, self.denominator_col]).copy()
    df_clean = df_clean[df_clean[self.denominator_col] > 0]

    c_mask = df_clean[treatment_col] == control
    t_mask = df_clean[treatment_col] == treatment

    n_c = int(np.sum(c_mask))
    n_t = int(np.sum(t_mask))

    if n_c == 0 or n_t == 0:
        raise ValueError(f"Control or treatment group is empty for ratio metric {self.name}.")

    cuped_achieved = False
    variance_reduction = 0.0

    num = df_clean[self.numerator_col].to_numpy()
    den = df_clean[self.denominator_col].to_numpy()

    if (
        self.pre_numerator_col
        and self.pre_denominator_col
        and self.pre_numerator_col in df_clean.columns
        and self.pre_denominator_col in df_clean.columns
    ):
        df_clean = df_clean.dropna(subset=[self.pre_numerator_col, self.pre_denominator_col])
        df_clean = df_clean[df_clean[self.pre_denominator_col] > 0]

        c_mask = df_clean[treatment_col] == control
        t_mask = df_clean[treatment_col] == treatment
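        n_c = int(np.sum(c_mask))
        n_t = int(np.sum(t_mask))

        # Re-check: the pre-period filters above can empty an arm.
        if n_c == 0 or n_t == 0:
            raise ValueError(f"Control or treatment group is empty for ratio metric {self.name}.")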

        num = df_clean[self.numerator_col].to_numpy()
        den = df_clean[self.denominator_col].to_numpy()
        pre_num = df_clean[self.pre_numerator_col].to_numpy()
        pre_den = df_clean[self.pre_denominator_col].to_numpy()

        var_pre_num = np.var(pre_num, ddof=1)
        var_pre_den = np.var(pre_den, ddof=1)

        if not np.isclose(var_pre_num, 0.0, atol=1e-12) and not np.isclose(var_pre_den, 0.0, atol=1e-12):
            cov_num = np.cov(num, pre_num, ddof=1)[0, 1]
            theta_num = cov_num / var_pre_num
            mean_pre_num_global = np.mean(pre_num)
            num_cuped = num - theta_num * (pre_num - mean_pre_num_global)

            cov_den = np.cov(den, pre_den, ddof=1)[0, 1]
            theta_den = cov_den / var_pre_den
            mean_pre_den_global = np.mean(pre_den)
            den_cuped = den - theta_den * (pre_den - mean_pre_den_global)

            num_c, num_t = num_cuped[c_mask], num_cuped[t_mask]
            den_c, den_t = den_cuped[c_mask], den_cuped[t_mask]

            mean_num_c, mean_num_t = np.mean(num_c), np.mean(num_t)
            mean_den_c, mean_den_t = np.mean(den_c), np.mean(den_t)

            ratio_c = mean_num_c / mean_den_c
            ratio_t = mean_num_t / mean_den_t

            var_num_c = np.var(num_c, ddof=1)
            var_den_c = np.var(den_c, ddof=1)
            cov_num_den_c = np.cov(num_c, den_c, ddof=1)[0, 1]

            var_ratio_c = (1 / (mean_den_c**2)) * (
                var_num_c + (ratio_c**2) * var_den_c - 2 * ratio_c * cov_num_den_c
            )

            var_num_t = np.var(num_t, ddof=1)
            var_den_t = np.var(den_t, ddof=1)
            cov_num_den_t = np.cov(num_t, den_t, ddof=1)[0, 1]

            var_ratio_t = (1 / (mean_den_t**2)) * (
                var_num_t + (ratio_t**2) * var_den_t - 2 * ratio_t * cov_num_den_t
            )

            cuped_achieved = True

            orig_ratio_global = np.mean(num) / np.mean(den)
            orig_var_ratio = (1 / (np.mean(den) ** 2)) * (
                np.var(num, ddof=1)
                + (orig_ratio_global**2) * np.var(den, ddof=1)
                - 2 * orig_ratio_global * np.cov(num, den, ddof=1)[0, 1]
            )

            adj_ratio_global = np.mean(num_cuped) / np.mean(den_cuped)
            adj_var_ratio = (1 / (np.mean(den_cuped) ** 2)) * (
                np.var(num_cuped, ddof=1)
                + (adj_ratio_global**2) * np.var(den_cuped, ddof=1)
                - 2 * adj_ratio_global * np.cov(num_cuped, den_cuped, ddof=1)[0, 1]
            )

            if orig_var_ratio > 0:
                variance_reduction = max(0.0, (orig_var_ratio - adj_var_ratio) / orig_var_ratio)
        else:
            mean_num_c, mean_num_t = np.mean(num[c_mask]), np.mean(num[t_mask])
            mean_den_c, mean_den_t = np.mean(den[c_mask]), np.mean(den[t_mask])
            ratio_c = mean_num_c / mean_den_c
            ratio_t = mean_num_t / mean_den_t

            var_num_c = np.var(num[c_mask], ddof=1)
            var_den_c = np.var(den[c_mask], ddof=1)
            cov_num_den_c = np.cov(num[c_mask], den[c_mask], ddof=1)[0, 1]
            var_ratio_c = (1 / (mean_den_c**2)) * (
                var_num_c + (ratio_c**2) * var_den_c - 2 * ratio_c * cov_num_den_c
            )

            var_num_t = np.var(num[t_mask], ddof=1)
            var_den_t = np.var(den[t_mask], ddof=1)
            cov_num_den_t = np.cov(num[t_mask], den[t_mask], ddof=1)[0, 1]
            var_ratio_t = (1 / (mean_den_t**2)) * (
                var_num_t + (ratio_t**2) * var_den_t - 2 * ratio_t * cov_num_den_t
            )
    else:
        mean_num_c, mean_num_t = np.mean(num[c_mask]), np.mean(num[t_mask])
        mean_den_c, mean_den_t = np.mean(den[c_mask]), np.mean(den[t_mask])
        ratio_c = mean_num_c / mean_den_c
        ratio_t = mean_num_t / mean_den_t

        var_num_c = np.var(num[c_mask], ddof=1)
        var_den_c = np.var(den[c_mask], ddof=1)
        cov_num_den_c = np.cov(num[c_mask], den[c_mask], ddof=1)[0, 1]
        var_ratio_c = (1 / (mean_den_c**2)) * (
            var_num_c + (ratio_c**2) * var_den_c - 2 * ratio_c * cov_num_den_c
        )

        var_num_t = np.var(num[t_mask], ddof=1)
        var_den_t = np.var(den[t_mask], ddof=1)
        cov_num_den_t = np.cov(num[t_mask], den[t_mask], ddof=1)[0, 1]
        var_ratio_t = (1 / (mean_den_t**2)) * (
            var_num_t + (ratio_t**2) * var_den_t - 2 * ratio_t * cov_num_den_t
        )

    stats_dict = self._calculate_stats(
        mean_c=ratio_c,
        mean_t=ratio_t,
        var_c=var_ratio_c,
        var_t=var_ratio_t,
        n_c=n_c,
        n_t=n_t,
        alpha=alpha,
    )

    diff = ratio_t - ratio_c
    relative_lift = diff / ratio_c if ratio_c != 0 else 0.0

    results = {
        "metric_name": self.name,
        "metric_type": "Ratio",
        "control_mean": ratio_c,
        "treatment_mean": ratio_t,
        "control_var": var_ratio_c,
        "treatment_var": var_ratio_t,
        "control_n": n_c,
        "treatment_n": n_t,
        "absolute_difference": diff,
        "relative_lift": relative_lift,
        "cuped_applied": cuped_achieved,
        "variance_reduction": variance_reduction,
        **stats_dict,
    }

    return results

log_transform

log_transform(df: DataFrame, col: str) -> Series

Transforms continuous metrics using a shifted natural log transformation.

Highly skewed distributions, such as revenue per user or session durations, often violate the normality assumptions of classical parametric tests (e.g., Student's or Welch's t-test). Applying a natural log transformation compresses the right tail, which typically brings the distribution closer to normal and can stabilize variance. The addition of 1 ensures that zero values remain mapped to zero and keeps the transform defined for all $y > -1$.

Mathematical Representation

The transformation is defined as:

$$ y_{\text{transformed}} = \ln(y + 1) $$

This is mathematically equivalent to:

$$ \operatorname{log1p}(y) $$

which maintains numerical precision for extremely small values of $y \approx 0$.

PARAMETER	DESCRIPTION
`df`	The source DataFrame containing the column to transform. TYPE: `DataFrame`
`col`	The name of the target column in `df` representing the skewed metric. TYPE: `str`

RETURNS	DESCRIPTION
`Series`	pd.Series: A new pandas Series containing the log-transformed values.

Examples:

Example

>>> import pandas as pd
>>> df = pd.DataFrame({"revenue": [0.0, 10.0, 150.5]})
>>> log_transform(df, "revenue")
0    0.000000
1    2.397895
2    5.020586
Name: revenue, dtype: float64
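The precision argument for `log1p` can be seen with the standard library alone. This small sketch uses `math.log1p` and `math.expm1` rather than the NumPy call in the source, and the input values are chosen only for illustration.

```python
import math

tiny = 1e-18

# 1.0 + 1e-18 rounds to exactly 1.0 in double precision, so the naive form loses the value.
naive = math.log(1.0 + tiny)

# log1p evaluates ln(1 + y) without forming 1 + y, so the leading term survives.
stable = math.log1p(tiny)

# expm1 is the matching inverse, useful for reporting results on the original scale.
round_trip = math.expm1(math.log1p(10.0))

print(naive, stable, round_trip)
```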

Source code in src\xpyrment\metrics\transformations.py

def log_transform(df: pd.DataFrame, col: str) -> pd.Series:
    r"""Transforms continuous metrics using a shifted natural log transformation.

    Highly skewed distributions, such as revenue per user or session durations, often violate
    the normality assumptions of classical parametric tests (e.g., Student's or Welch's t-test).
    Applying a natural log transformation normalizes the distribution and stabilizes variance
    (homoscedasticity). The addition of 1 ensures that zero values remain mapped to zero.

    ??? mathbox "Mathematical Representation"

        The transformation is defined as:
        $$
        y_{\text{transformed}} = \ln(y + 1)
        $$
        This is mathematically equivalent to:
        $$
        \operatorname{log1p}(y)
        $$
        which maintains numerical precision for extremely small values of $y \approx 0$.

    Args:
        df (pd.DataFrame): The source DataFrame containing the column to transform.
        col (str): The name of the target column in `df` representing the skewed metric.

    Returns:
        pd.Series: A new pandas Series containing the log-transformed values.

    Examples:
        ??? example "Example"

            ```python
            >>> import pandas as pd
            >>> df = pd.DataFrame({"revenue": [0.0, 10.0, 150.5]})
            >>> log_transform(df, "revenue")
            0    0.000000
            1    2.397895
            2    5.020586
            Name: revenue, dtype: float64
            ```
    """
    return np.log1p(df[col])

delta_normalization

delta_normalization(df: DataFrame, col: str) -> Series

Normalizes metrics using the Delta Method expansion (Stub/Scaffolding).

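The Delta Method is a general technique for approximating the variance of a function of random variables. For non-linear transformations or ratio metrics (e.g., Click-Through-Rate, where the denominator is itself random rather than fixed), treating the ratio as a simple mean misstates its variance. The Delta Method takes a first-order Taylor expansion of the target function around its expected value to derive an asymptotically normal approximation.

The current implementation is scaffolding and does not yet apply this expansion: it returns the z-score standardization of `col` using the sample standard deviation (`ddof=1`), or a Series of zeros when the standard deviation is zero or undefined.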

Mathematical Context

Let $g(X)$ be a differentiable function of a random variable $X$ with mean $\mu$ and variance $\sigma^2$. The first-order Taylor expansion of $g(X)$ about $\mu$ is:

$$
g(X) \approx g(\mu) + g'(\mu)(X - \mu)
$$

Taking the variance of this linear approximation yields:

$$
\text{Var}(g(X)) \approx [g'(\mu)]^2 \sigma^2
$$

For functions of several variables, such as ratio estimates of the form $g(X, Y) = X / Y$ with means $\mu_X$ and $\mu_Y \neq 0$, the expansion also incorporates the covariance between numerator and denominator:

$$
\text{Var}\left(\frac{X}{Y}\right) \approx \frac{1}{\mu_Y^2} \text{Var}(X) + \frac{\mu_X^2}{\mu_Y^4} \text{Var}(Y) - 2 \frac{\mu_X}{\mu_Y^3} \text{Cov}(X, Y)
$$
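The ratio formula can be sketched numerically. The helper below, `delta_ratio_variance`, is a hypothetical name and not part of `xpyrment`; it assumes per-unit numerator and denominator arrays and applies the formula to the ratio of their sample means, which is why the result is divided by $n$. The click and view counts are synthetic.

```python
import numpy as np


def delta_ratio_variance(num: np.ndarray, den: np.ndarray) -> float:
    """Approximate Var(mean(num) / mean(den)) with the first-order Delta Method."""
    n = len(num)
    mu_x, mu_y = num.mean(), den.mean()
    var_x = num.var(ddof=1)
    var_y = den.var(ddof=1)
    cov_xy = np.cov(num, den, ddof=1)[0, 1]
    # Per-unit variance of the linearized ratio (the three-term formula).
    unit_var = (
        var_x / mu_y**2
        + (mu_x**2 / mu_y**4) * var_y
        - 2.0 * (mu_x / mu_y**3) * cov_xy
    )
    # The ratio is built from sample means, so scale by 1 / n.
    return unit_var / n


# Synthetic per-user data: views vary by user, clicks depend on views.
rng = np.random.default_rng(42)
views = (rng.poisson(lam=20, size=5_000) + 1).astype(float)
clicks = rng.binomial(n=views.astype(int), p=0.1).astype(float)

ctr = clicks.sum() / views.sum()
se = np.sqrt(delta_ratio_variance(clicks, views))
print(f"CTR = {ctr:.4f}, delta-method SE = {se:.5f}")
```

With a constant denominator the variance and covariance terms involving $Y$ vanish and the expression reduces to the ordinary variance of a sample mean, which is a convenient sanity check.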

PARAMETER	DESCRIPTION
`df`	The source DataFrame containing the metric columns. TYPE: `DataFrame`
`col`	The name of the column representing the metric to normalize. TYPE: `str`

RETURNS	DESCRIPTION
`Series`	pd.Series: A pandas Series of normalized values.
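Judging by the source listing on this page, the current stub standardizes the column rather than applying the Delta Method. The sketch below copies that function body so it runs standalone and shows both branches; the DataFrame and its column names are made up for illustration.

```python
import pandas as pd


def delta_normalization(df: pd.DataFrame, col: str) -> pd.Series:
    # Body copied from the source listing on this page (docstring omitted).
    series = df[col]
    mean_val = series.mean()
    std_val = series.std(ddof=1)
    if pd.isna(std_val) or std_val == 0.0:
        return pd.Series(0.0, index=series.index, name=col)
    return (series - mean_val) / std_val


# Hypothetical data: one varying column and one constant column.
df = pd.DataFrame({"ctr": [0.10, 0.20, 0.30], "flat": [5.0, 5.0, 5.0]})

z = delta_normalization(df, "ctr")      # z-scores, approximately -1, 0, 1
flat = delta_normalization(df, "flat")  # zero spread, so every value is 0.0
print(z.tolist(), flat.tolist())
```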

Source code in src\xpyrment\metrics\transformations.py

def delta_normalization(df: pd.DataFrame, col: str) -> pd.Series:
    r"""Normalizes metrics using the Delta Method expansion (Stub/Scaffolding).

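    The Delta Method is a general technique for approximating the variance of a function of random
    variables. For non-linear transformations or ratio metrics (e.g., Click-Through-Rate, where the
    denominator is itself random rather than fixed), treating the ratio as a simple mean misstates
    its variance. The Delta Method takes a first-order Taylor expansion of the target function
    around its expected value to derive an asymptotically normal approximation.

    The current implementation is scaffolding and does not yet apply this expansion: it returns the
    z-score standardization of `col` using the sample standard deviation (`ddof=1`), or a Series of
    zeros when the standard deviation is zero or undefined.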

    ??? mathbox "Mathematical Context"

        Let $g(X)$ be a differentiable function of a random variable $X$ with mean $\mu$ and variance $\sigma^2$.
        The first-order Taylor expansion of $g(X)$ about $\mu$ is:
        $$
        g(X) \approx g(\mu) + g'(\mu)(X - \mu)
        $$
        Taking the variance of this linear approximation yields:
        $$
        \text{Var}(g(X)) \approx [g'(\mu)]^2 \sigma^2
        $$
        For functions of several variables, such as ratio estimates of the form $g(X, Y) = X / Y$ with
        means $\mu_X$ and $\mu_Y \neq 0$, the expansion also incorporates the covariance between
        numerator and denominator:
        $$
        \text{Var}\left(\frac{X}{Y}\right) \approx \frac{1}{\mu_Y^2} \text{Var}(X) + \frac{\mu_X^2}{\mu_Y^4} \text{Var}(Y) - 2 \frac{\mu_X}{\mu_Y^3} \text{Cov}(X, Y)
        $$

    Args:
        df (pd.DataFrame): The source DataFrame containing the metric columns.
        col (str): The name of the column representing the metric to normalize.

    Returns:
        pd.Series: A pandas Series of normalized values.
    """
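    # Scaffolding: z-score standardization; the Delta Method expansion is not applied yet.
    series = df[col]
    mean_val = series.mean()
    std_val = series.std(ddof=1)  # sample standard deviation

    # A constant or single-row column has no usable spread, so return zeros.
    if pd.isna(std_val) or std_val == 0.0:
        return pd.Series(0.0, index=series.index, name=col)

    return (series - mean_val) / std_val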
