
Metrics Module

The xpyrment.metrics package implements a unified, low-code statistical metrics taxonomy supporting automated CUPED variance reduction, Delta Method ratio calculations, and frequentist/Bayesian evaluation pipelines.

📊 Supported Metric Models

The following metric formulas are dynamically registered and supported:

| Metric Model | Technical Class Path | Key Features & Analytical Properties |
| --- | --- | --- |
| MeanMetric | xpyrment.metrics.taxonomy.MeanMetric | Continuous or numeric value (e.g., average revenue, sessions); optional pre-period CUPED; Welch's t-test. |
| ProportionMetric | xpyrment.metrics.taxonomy.ProportionMetric | Binary/proportion rate (e.g., conversion rate, success rate); analyzed like MeanMetric on 0/1 values. |
| RatioMetric | xpyrment.metrics.taxonomy.RatioMetric | Ratio sum(numerator) / sum(denominator) (e.g., click-through rate); Delta Method variance; optional numerator/denominator CUPED. |

📦 Package Reference

metrics

Metrics package for taxonomy, guardrails, and transformations.

This package houses the core definitions and statistical calculation routines for metrics evaluated during an experiment:

- BaseMetric: The abstract base class establishing standard evaluation contracts.
- MeanMetric: For continuous measurements, supporting pre-period CUPED adjustments.
- ProportionMetric: For binary (0/1) event rates.
- RatioMetric: For compound aggregate metrics (e.g., click-through rate) evaluated via Delta Method variance.
- GuardrailMetric: Monitoring wrapper that flags platform or business regressions beyond a tolerance.
- log_transform: Shifted natural log transformation for heavily skewed continuous metrics.
- delta_normalization: Scaffolding for Delta Method normalization; currently returns z-score standardized values.

MODULE DESCRIPTION
guardrails

Guardrail metrics to protect core platform health and business stability.

taxonomy

Standardized metrics taxonomy, calculation engines, and variance reduction routines.

transformations

Mathematical transformations and normalization utilities for experimental telemetry.

CLASS DESCRIPTION
GuardrailMetric

Defines a guardrail metric with specific breach thresholds.

BaseMetric

Abstract base class representing a statistical metric in the experiment taxonomy.

MeanMetric

A metric representing a continuous or numeric value (e.g., average revenue, sessions).

ProportionMetric

A metric representing a binary/proportion rate (e.g., conversion rate, success rate).

RatioMetric

A metric calculated as the ratio: sum(numerator) / sum(denominator) (e.g., Click-Through-Rate).

FUNCTION DESCRIPTION
log_transform

Transforms continuous metrics using a shifted natural log transformation.

delta_normalization

Normalizes metrics using the Delta Method expansion (Stub/Scaffolding).

GuardrailMetric

GuardrailMetric(
    metric: BaseMetric, max_allowed_change: float = 0.01
)

Defines a guardrail metric with specific breach thresholds.

Guardrail metrics are designed to detect treatment arms that cause serious regressions on critical secondary metrics. Unlike primary metrics (where we search for significant positive changes), guardrail metrics are checked against a pre-specified tolerance boundary, irrespective of statistical significance. The check is two-sided: a relative change of either sign whose magnitude exceeds max_allowed_change counts as a breach, so a large improvement is flagged just like a large deterioration.

ATTRIBUTE DESCRIPTION
metric

The underlying metric to monitor (e.g., MeanMetric, RatioMetric).

TYPE: BaseMetric

max_allowed_change

The maximum tolerated relative change (positive or negative) expressed as a fraction (e.g., 0.01 represents a 1% threshold).

TYPE: float

Examples:

Example
>>> from xpyrment.metrics.taxonomy import MeanMetric
>>> from xpyrment.metrics.guardrails import GuardrailMetric
>>> latency_metric = MeanMetric("Page Latency", value_col="load_time")
>>> guardrail = GuardrailMetric(latency_metric, max_allowed_change=0.02)  # 2% max relative change (either direction)
>>> calc_result = {"metric_name": "Page Latency", "relative_lift": 0.035}  # 3.5% lift (regression)
>>> guardrail.check_breach(calc_result)
True

PARAMETER DESCRIPTION
metric

The concrete metric instance being monitored.

TYPE: BaseMetric

max_allowed_change

The threshold for the maximum absolute relative lift allowed before triggering a breach. Defaults to 0.01 (1%).

TYPE: float DEFAULT: 0.01

METHOD DESCRIPTION
check_breach

Determines if the calculated lift breaches the guardrail thresholds.

Source code in src\xpyrment\metrics\guardrails.py
def __init__(self, metric: BaseMetric, max_allowed_change: float = 0.01):
    """Initializes a GuardrailMetric wrapper.

    Args:
        metric (BaseMetric): The concrete metric instance being monitored.
        max_allowed_change (float): The threshold for the maximum absolute relative lift allowed before
            triggering a breach. Defaults to 0.01 (1%).
    """
    self.metric = metric
    self.max_allowed_change = max_allowed_change

check_breach

check_breach(calculation_result: Dict[str, Any]) -> bool

Determines if the calculated lift breaches the guardrail thresholds.

Mathematical Representation

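Let \(L\) be the relative lift calculated for the wrapped metric, where \(\bar{Y}_T\) and \(\bar{Y}_C\) are the treatment and control estimates:

\[ L = \frac{\bar{Y}_T - \bar{Y}_C}{\bar{Y}_C} \]

A breach is detected if the magnitude of the relative lift exceeds the maximum allowed change:

\[ \text{Breach} \iff |L| > \text{max\_allowed\_change} \]

PARAMETER DESCRIPTION
calculation_result

Output dictionary produced by calling metric.calculate() on experimental data. Only the "relative_lift" key is read; if it is missing, the lift defaults to 0.0 and no breach is reported.

TYPE: Dict[str, Any]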

RETURNS DESCRIPTION
bool

True if the relative lift is larger in magnitude than max_allowed_change (breached), False otherwise.

TYPE: bool
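Because the rule is symmetric, it is worth seeing what it does with a regression, an improvement, and a missing lift. The following is a minimal standard-library sketch that reproduces the documented rule, plus a hypothetical one-sided variant for cases where only deterioration should count. The names check_breach_two_sided, check_breach_one_sided, and higher_is_worse are illustrative and are not part of xpyrment.

```python
from typing import Any, Dict


def check_breach_two_sided(result: Dict[str, Any], max_allowed_change: float) -> bool:
    """Mirror of the documented rule: breach if |relative_lift| > threshold."""
    lift = result.get("relative_lift", 0.0)
    return abs(lift) > max_allowed_change


def check_breach_one_sided(
    result: Dict[str, Any], max_allowed_change: float, higher_is_worse: bool = True
) -> bool:
    """Hypothetical variant (not in xpyrment) that only flags deterioration.

    higher_is_worse says which direction is a regression:
    True for latency-like metrics, False for revenue-like metrics.
    """
    lift = result.get("relative_lift", 0.0)
    signed = lift if higher_is_worse else -lift
    return signed > max_allowed_change


slower = {"metric_name": "Page Latency", "relative_lift": 0.035}
faster = {"metric_name": "Page Latency", "relative_lift": -0.035}

print(check_breach_two_sided(slower, 0.02))  # True
print(check_breach_two_sided(faster, 0.02))  # True: improvements breach too
print(check_breach_one_sided(slower, 0.02))  # True
print(check_breach_one_sided(faster, 0.02))  # False
print(check_breach_two_sided({}, 0.02))      # False: missing lift defaults to 0.0
```

The one-sided variant needs the extra direction flag because a relative lift alone does not say whether "up" is good or bad for a given metric.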

Source code in src\xpyrment\metrics\guardrails.py
def check_breach(self, calculation_result: Dict[str, Any]) -> bool:
    r"""Determines if the calculated lift breaches the guardrail thresholds.

    ??? mathbox "Mathematical Representation"

        Let $L$ be the relative lift calculated for the wrapped metric:
        $$
        L = \frac{\bar{Y}_T - \bar{Y}_C}{\bar{Y}_C}
        $$
        A breach is detected if the magnitude of the relative lift exceeds the maximum allowed change:
        $$
        \text{Breach} = |L| > \text{max\_allowed\_change}
        $$

    Args:
        calculation_result (Dict[str, Any]): Output dictionary produced by calling
            `metric.calculate()` on experimental data.

    Returns:
        bool: True if the relative lift is larger in magnitude than `max_allowed_change` (breached),
            False otherwise.
    """
    lift = calculation_result.get("relative_lift", 0.0)
    # Breach occurs if |lift| exceeds max_allowed_change, in either direction
    return abs(lift) > self.max_allowed_change

BaseMetric

BaseMetric(name: str)

Bases: ABC

Abstract base class representing a statistical metric in the experiment taxonomy.

All custom metrics must inherit from BaseMetric and implement the abstract .calculate() method to return a standardized MetricResult dictionary.
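A minimal sketch of this contract, using a local stand-in for BaseMetric so that it runs without xpyrment installed; in real code you would subclass xpyrment.metrics.taxonomy.BaseMetric instead. IncompleteMetric and ConstantMetric are hypothetical toy classes, and ConstantMetric returns only two keys rather than a full MetricResult. Because calculate is declared with @abstractmethod, Python refuses to instantiate a subclass that does not implement it.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class BaseMetric(ABC):
    """Stand-in mirroring the documented BaseMetric contract (not the real import)."""

    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def calculate(self, df: Any, treatment_col: str, control: str, treatment: str) -> Dict[str, Any]:
        ...


class IncompleteMetric(BaseMetric):
    """Hypothetical subclass that forgets to implement calculate()."""


class ConstantMetric(BaseMetric):
    """Hypothetical toy subclass that satisfies the contract; it ignores df."""

    def calculate(self, df: Any, treatment_col: str, control: str, treatment: str) -> Dict[str, Any]:
        return {"metric_name": self.name, "metric_type": "Constant"}


try:
    IncompleteMetric("broken")
except TypeError as exc:
    # The exact wording of the message differs between Python versions.
    print("cannot instantiate:", exc)

print(ConstantMetric("toy").calculate(None, "group", "control", "treatment"))
```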

ATTRIBUTE DESCRIPTION
name

The unique descriptive name of the metric.

TYPE: str

PARAMETER DESCRIPTION
name

Unique descriptive name of the metric.

TYPE: str

METHOD DESCRIPTION
calculate

Abstract method to compute statistics for control and treatment groups.

Source code in src\xpyrment\metrics\taxonomy.py
def __init__(self, name: str):
    """Initializes a BaseMetric.

    Args:
        name (str): Unique descriptive name of the metric.
    """
    self.name = name

calculate abstractmethod

calculate(
    df: DataFrame,
    treatment_col: str,
    control: str,
    treatment: str,
) -> Dict[str, Any]

Abstract method to compute statistics for control and treatment groups.

PARAMETER DESCRIPTION
df

The experimental dataset.

TYPE: DataFrame

treatment_col

Column name identifying experimental groups/arms.

TYPE: str

control

The value in treatment_col representing the control group.

TYPE: str

treatment

The value in treatment_col representing the treatment group.

TYPE: str

RETURNS DESCRIPTION
Dict[str, Any]

Dict[str, Any]: A compliant MetricResult dictionary containing mean, variance, p-value, confidence intervals, power, etc.

Source code in src\xpyrment\metrics\taxonomy.py
@abstractmethod
def calculate(
    self, df: pd.DataFrame, treatment_col: str, control: str, treatment: str
) -> Dict[str, Any]:
    """Abstract method to compute statistics for control and treatment groups.

    Args:
        df (pd.DataFrame): The experimental dataset.
        treatment_col (str): Column name identifying experimental groups/arms.
        control (str): The value in `treatment_col` representing the control group.
        treatment (str): The value in `treatment_col` representing the treatment group.

    Returns:
        Dict[str, Any]: A compliant `MetricResult` dictionary containing mean, variance,
            p-value, confidence intervals, power, etc.
    """
    pass

MeanMetric

MeanMetric(
    name: str,
    value_col: str,
    pre_period_col: Optional[str] = None,
)

Bases: BaseMetric

A metric representing a continuous or numeric value (e.g., average revenue, sessions).

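Supports optional pre-period CUPED (Controlled-experiment Using Pre-Experiment Data) adjustment, which removes the part of the metric's variance that is explained by a pre-experiment covariate. The variance shrinks by the squared correlation between the pre-period and experiment-period values, so a strongly correlated covariate can substantially lower the required sample size or runtime.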

ATTRIBUTE DESCRIPTION
value_col

The column in the DataFrame containing active experiment period values.

TYPE: str

pre_period_col

The column containing pre-experiment baseline values for CUPED.

TYPE: Optional[str]

PARAMETER DESCRIPTION
name

Unique descriptive name of the metric.

TYPE: str

value_col

Column name containing experiment period values.

TYPE: str

pre_period_col

Column name containing pre-experiment baseline values for CUPED. Defaults to None (no CUPED applied).

TYPE: Optional[str] DEFAULT: None

METHOD DESCRIPTION
calculate

Calculates descriptive and Welch's t-test statistics for the mean metric.

Source code in src\xpyrment\metrics\taxonomy.py
def __init__(
    self,
    name: str,
    value_col: str,
    pre_period_col: Optional[str] = None,
):
    """Initializes a MeanMetric.

    Args:
        name (str): Unique descriptive name of the metric.
        value_col (str): Column name containing experiment period values.
        pre_period_col (Optional[str]): Column name containing pre-experiment baseline values for CUPED.
            Defaults to None (no CUPED applied).
    """
    super().__init__(name)
    self.value_col = value_col
    self.pre_period_col = pre_period_col

calculate

calculate(
    df: DataFrame,
    treatment_col: str,
    control: str,
    treatment: str,
    alpha: float = 0.05,
) -> Dict[str, Any]

Calculates descriptive and Welch's t-test statistics for the mean metric.

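Drops rows with a missing value in value_col. If pre_period_col is provided and present in df, it also drops rows with a missing pre-period value and applies a standard linear CUPED adjustment:

\[ Y_i^{\text{CUPED}} = Y_i - \theta (X_i - \bar{X}) \]

where \(X_i\) is the pre-period value and \(\theta = \widehat{\text{Cov}}(Y, X) / \widehat{\text{Var}}(X)\) is estimated on the pooled control and treatment rows. CUPED is silently skipped if pre_period_col is not a column of df or if its sample variance is (near) zero.
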
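With \(\theta\) chosen this way, the sample variance of the adjusted values is exactly \((1 - r^2)\) times the original, where \(r\) is the sample correlation between \(Y\) and \(X\); the variance_reduction field reported by the source below is therefore \(r^2\). A minimal standard-library sketch of the adjustment on synthetic data, estimating \(\theta\) on one pooled sample as the method does; sample_cov and cuped_adjust are hypothetical helper names, and the data are made up for illustration.

```python
import math
import random
from statistics import fmean, variance


def sample_cov(a, b):
    """Sample covariance with ddof=1, like np.cov(a, b, ddof=1)[0, 1]."""
    ma, mb = fmean(a), fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def cuped_adjust(y, x):
    """Return CUPED-adjusted values and theta, estimated on the pooled sample."""
    theta = sample_cov(y, x) / variance(x)
    x_bar = fmean(x)
    return [yi - theta * (xi - x_bar) for yi, xi in zip(y, x)], theta


random.seed(7)
# Synthetic data: the pre-period value x partly predicts the experiment value y.
x = [random.gauss(100.0, 20.0) for _ in range(2000)]
y = [0.8 * xi + random.gauss(0.0, 10.0) for xi in x]

y_cuped, theta = cuped_adjust(y, x)
reduction = (variance(y) - variance(y_cuped)) / variance(y)
r = sample_cov(y, x) / math.sqrt(variance(y) * variance(x))

print(f"theta={theta:.3f} reduction={reduction:.3f} r^2={r * r:.3f}")
```

Centering \(X\) at its pooled mean keeps the mean of the adjusted series equal to the original mean, so CUPED changes the variance of the estimate but not the quantity being estimated.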
PARAMETER DESCRIPTION
df

The experimental dataset.

TYPE: DataFrame

treatment_col

Column identifying treatment assignments.

TYPE: str

control

Control arm identifier value in treatment_col.

TYPE: str

treatment

Treatment arm identifier value in treatment_col.

TYPE: str

alpha

Significance level for Welch's confidence intervals. Defaults to 0.05.

TYPE: float DEFAULT: 0.05

RETURNS DESCRIPTION
Dict[str, Any]

Dict[str, Any]: A completed MetricResult dictionary.

RAISES DESCRIPTION
ValueError

If either control or treatment group becomes empty after filtering.
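The significance statistics come from a _calculate_stats helper that this page does not document. For reference, here is a standard-library sketch of the textbook Welch's t-test quantities, assuming (as the call in the source suggests) that var_c and var_t are per-unit sample variances and n_c, n_t the arm sizes. The function name welch_t is hypothetical. Turning t_stat and dof into a p-value or confidence interval requires the Student's t distribution (for example scipy.stats.t), which is left out to keep the sketch dependency-free.

```python
import math


def welch_t(mean_c, mean_t, var_c, var_t, n_c, n_t):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    var_c / var_t are per-unit sample variances (ddof=1), the way MeanMetric computes them.
    """
    se2_c = var_c / n_c  # squared standard error of the control mean
    se2_t = var_t / n_t  # squared standard error of the treatment mean
    t_stat = (mean_t - mean_c) / math.sqrt(se2_c + se2_t)
    dof = (se2_c + se2_t) ** 2 / (se2_c**2 / (n_c - 1) + se2_t**2 / (n_t - 1))
    return t_stat, dof


t_stat, dof = welch_t(mean_c=10.0, mean_t=12.0, var_c=4.0, var_t=4.0, n_c=10, n_t=10)
print(round(t_stat, 4), round(dof, 4))  # 2.2361 18.0
```

With equal variances and equal arm sizes, Welch's degrees of freedom reduce to \(n_c + n_t - 2\), which is why this example gives 18.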

Source code in src\xpyrment\metrics\taxonomy.py
def calculate(
    self,
    df: pd.DataFrame,
    treatment_col: str,
    control: str,
    treatment: str,
    alpha: float = 0.05,
) -> Dict[str, Any]:
    r"""Calculates descriptive and Welch's t-test statistics for the mean metric.

    Drops missing values on the value column. If `pre_period_col` is provided,
    performs joint missing drop and executes a standard linear CUPED adjustment:

    $$
    Y_i^{\text{CUPED}} = Y_i - \theta (X_i - \bar{X})
    $$

    Args:
        df (pd.DataFrame): The experimental dataset.
        treatment_col (str): Column identifying treatment assignments.
        control (str): Control arm identifier value in `treatment_col`.
        treatment (str): Treatment arm identifier value in `treatment_col`.
        alpha (float): Significance level for Welch's confidence intervals. Defaults to 0.05.

    Returns:
        Dict[str, Any]: A completed `MetricResult` dictionary.

    Raises:
        ValueError: If either control or treatment group becomes empty after filtering.
    """
    df_clean = df.dropna(subset=[self.value_col]).copy()

    c_mask = df_clean[treatment_col] == control
    t_mask = df_clean[treatment_col] == treatment

    n_c = int(np.sum(c_mask))
    n_t = int(np.sum(t_mask))

    if n_c == 0 or n_t == 0:
        raise ValueError(f"Control or treatment group is empty for metric {self.name}.")

    y = df_clean[self.value_col].to_numpy()

    cuped_achieved = False
    variance_reduction = 0.0

    if self.pre_period_col and self.pre_period_col in df_clean.columns:
        df_clean = df_clean.dropna(subset=[self.pre_period_col])
        c_mask = df_clean[treatment_col] == control
        t_mask = df_clean[treatment_col] == treatment
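        n_c = int(np.sum(c_mask))
        n_t = int(np.sum(t_mask))
        # Dropping rows with a missing pre-period value can empty an arm; re-check.
        if n_c == 0 or n_t == 0:
            raise ValueError(f"Control or treatment group is empty for metric {self.name}.")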

        y = df_clean[self.value_col].to_numpy()
        x = df_clean[self.pre_period_col].to_numpy()

        var_x = np.var(x, ddof=1)
        if not np.isclose(var_x, 0.0, atol=1e-12):
            cov_yx = np.cov(y, x, ddof=1)[0, 1]
            theta = cov_yx / var_x
            mean_x_global = np.mean(x)

            y_cuped = y - theta * (x - mean_x_global)

            y_c = y_cuped[c_mask]
            y_t = y_cuped[t_mask]

            mean_c = float(np.mean(y_c))
            mean_t = float(np.mean(y_t))

            var_c = float(np.var(y_c, ddof=1))
            var_t = float(np.var(y_t, ddof=1))

            cuped_achieved = True

            orig_var = np.var(y, ddof=1)
            adjusted_var = np.var(y_cuped, ddof=1)
            if orig_var > 0:
                variance_reduction = max(0.0, (orig_var - adjusted_var) / orig_var)
        else:
            mean_c = float(np.mean(y[c_mask]))
            mean_t = float(np.mean(y[t_mask]))
            var_c = float(np.var(y[c_mask], ddof=1))
            var_t = float(np.var(y[t_mask], ddof=1))
    else:
        mean_c = float(np.mean(y[c_mask]))
        mean_t = float(np.mean(y[t_mask]))
        var_c = float(np.var(y[c_mask], ddof=1))
        var_t = float(np.var(y[t_mask], ddof=1))

    stats_dict = self._calculate_stats(
        mean_c=mean_c,
        mean_t=mean_t,
        var_c=var_c,
        var_t=var_t,
        n_c=n_c,
        n_t=n_t,
        alpha=alpha,
    )

    diff = mean_t - mean_c
    relative_lift = diff / mean_c if mean_c != 0 else 0.0

    results = {
        "metric_name": self.name,
        "metric_type": "Mean",
        "control_mean": mean_c,
        "treatment_mean": mean_t,
        "control_var": var_c,
        "treatment_var": var_t,
        "control_n": n_c,
        "treatment_n": n_t,
        "absolute_difference": diff,
        "relative_lift": relative_lift,
        "cuped_applied": cuped_achieved,
        "variance_reduction": variance_reduction,
        **stats_dict,
    }

    return results

ProportionMetric

ProportionMetric(
    name: str,
    value_col: str,
    pre_period_col: Optional[str] = None,
)

Bases: MeanMetric

A metric representing a binary/proportion rate (e.g., conversion rate, success rate).

Inherits the computation from MeanMetric: proportions are analyzed as means of 0/1 indicators with the same Welch's t-test, which relies on the normal approximation justified by the Central Limit Theorem for sufficiently large samples.
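Treating 0/1 values as a continuous metric works because their mean is the conversion rate \(p\), and the ddof=1 sample variance that MeanMetric computes equals the Bernoulli variance \(p(1-p)\) scaled by \(n/(n-1)\), a factor that approaches 1 for realistic sample sizes. A small standard-library check with made-up indicators:

```python
from statistics import variance

# Hypothetical 0/1 conversion indicators for one arm.
conversions = [1, 1, 1, 0]
n = len(conversions)
p = sum(conversions) / n

# MeanMetric uses the ddof=1 sample variance of the raw 0/1 values...
sample_var = variance(conversions)
# ...which equals the Bernoulli variance p(1 - p) scaled by n / (n - 1).
bernoulli_var = p * (1 - p) * n / (n - 1)

print(p, sample_var, bernoulli_var)  # 0.75 0.25 0.25
```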

PARAMETER DESCRIPTION
name

Unique descriptive name of the metric.

TYPE: str

value_col

Column name containing experiment period values.

TYPE: str

pre_period_col

Column name containing pre-experiment baseline values for CUPED. Defaults to None (no CUPED applied).

TYPE: Optional[str] DEFAULT: None

METHOD DESCRIPTION
calculate

Calculates proportion conversion rates, differences, and statistical significance.

The constructor is inherited unchanged from MeanMetric.__init__; its source is listed in the MeanMetric section.

calculate

calculate(
    df: DataFrame,
    treatment_col: str,
    control: str,
    treatment: str,
    alpha: float = 0.05,
) -> Dict[str, Any]

Calculates proportion conversion rates, differences, and statistical significance.

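Delegates to MeanMetric.calculate, which drops missing values and computes the means and sample variances of the 0/1 inputs (applying CUPED if pre_period_col is set), then overrides the "metric_type" field with "Proportion". The values in value_col are not checked for being binary.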

PARAMETER DESCRIPTION
df

The experimental dataset.

TYPE: DataFrame

treatment_col

Column identifying treatment assignments.

TYPE: str

control

Control arm identifier value.

TYPE: str

treatment

Treatment arm identifier value.

TYPE: str

alpha

Significance level. Defaults to 0.05.

TYPE: float DEFAULT: 0.05

RETURNS DESCRIPTION
Dict[str, Any]

Dict[str, Any]: Standardized results dict with "metric_type" set to "Proportion".

Source code in src\xpyrment\metrics\taxonomy.py
def calculate(
    self,
    df: pd.DataFrame,
    treatment_col: str,
    control: str,
    treatment: str,
    alpha: float = 0.05,
) -> Dict[str, Any]:
    """Calculates proportion conversion rates, differences, and statistical significance.

    Drops missing values, calculates means and variances of binary inputs, and delegates
    to `MeanMetric.calculate` while overriding the metric type string to "Proportion".

    Args:
        df (pd.DataFrame): The experimental dataset.
        treatment_col (str): Column identifying treatment assignments.
        control (str): Control arm identifier value.
        treatment (str): Treatment arm identifier value.
        alpha (float): Significance level. Defaults to 0.05.

    Returns:
        Dict[str, Any]: Standardized results dict with "metric_type" set to "Proportion".
    """
    res = super().calculate(df, treatment_col, control, treatment, alpha)
    res["metric_type"] = "Proportion"
    return res

RatioMetric

RatioMetric(
    name: str,
    numerator_col: str,
    denominator_col: str,
    pre_numerator_col: Optional[str] = None,
    pre_denominator_col: Optional[str] = None,
)

Bases: BaseMetric

A metric calculated as the ratio: sum(numerator) / sum(denominator) (e.g., Click-Through-Rate).

Employs the Delta Method to approximate ratio-level variances and supports CUPED with two covariates, adjusting the numerator and the denominator separately with their own pre-period values.

ATTRIBUTE DESCRIPTION
numerator_col

The column containing active period numerator values.

TYPE: str

denominator_col

The column containing active period denominator values. Rows where it is missing or not strictly positive (\(\le 0\)) are dropped before analysis.

TYPE: str

pre_numerator_col

Column containing pre-experiment baseline numerator values.

TYPE: Optional[str]

pre_denominator_col

Column containing pre-experiment baseline denominator values.

TYPE: Optional[str]

PARAMETER DESCRIPTION
name

Unique descriptive name.

TYPE: str

numerator_col

Active numerator column name.

TYPE: str

denominator_col

Active denominator column name.

TYPE: str

pre_numerator_col

Pre-experiment numerator column. Defaults to None.

TYPE: Optional[str] DEFAULT: None

pre_denominator_col

Pre-experiment denominator column. Defaults to None.

TYPE: Optional[str] DEFAULT: None

METHOD DESCRIPTION
calculate

Calculates ratio values, Delta-method variances, and statistical significance.

Source code in src\xpyrment\metrics\taxonomy.py
def __init__(
    self,
    name: str,
    numerator_col: str,
    denominator_col: str,
    pre_numerator_col: Optional[str] = None,
    pre_denominator_col: Optional[str] = None,
):
    """Initializes a RatioMetric.

    Args:
        name (str): Unique descriptive name.
        numerator_col (str): Active numerator column name.
        denominator_col (str): Active denominator column name.
        pre_numerator_col (Optional[str]): Pre-experiment numerator column. Defaults to None.
        pre_denominator_col (Optional[str]): Pre-experiment denominator column. Defaults to None.
    """
    super().__init__(name)
    self.numerator_col = numerator_col
    self.denominator_col = denominator_col
    self.pre_numerator_col = pre_numerator_col
    self.pre_denominator_col = pre_denominator_col

calculate

calculate(
    df: DataFrame,
    treatment_col: str,
    control: str,
    treatment: str,
    alpha: float = 0.05,
) -> Dict[str, Any]

Calculates ratio values, Delta-method variances, and statistical significance.

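Drops rows with a missing numerator or denominator, and rows whose denominator is not strictly positive. If both pre_numerator_col and pre_denominator_col are set and present in df, it also drops rows with missing pre-period values or a non-positive pre-period denominator, then separately fits linear CUPED adjustments to the numerator and denominator series (CUPED is skipped if either pre-period column has near-zero variance):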

\[ U_i^{\text{CUPED}} = U_i - \theta_U (U_{i,\text{pre}} - \bar{U}_{\text{pre}}) \]
\[ V_i^{\text{CUPED}} = V_i - \theta_V (V_{i,\text{pre}} - \bar{V}_{\text{pre}}) \]

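The ratio variance is then estimated using the Delta Method formulation, where \(R = \bar{U} / \bar{V}\) is the arm's ratio estimate and the variances and covariance are per-unit sample quantities computed within that arm (the arm sizes are passed separately to the significance calculation):

\[ \text{Var}\left(\frac{U}{V}\right) \approx \frac{1}{\bar{V}^2} \left[ \text{Var}(U) + R^2 \text{Var}(V) - 2 R \text{Cov}(U, V) \right] \]
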
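One way to read this formula is as the variance of the linearized values \(U_i - R V_i\), divided by \(\bar{V}^2\). The standard-library sketch below evaluates the Delta Method expression for one arm and checks it against that linearization; sample_cov and delta_ratio_var are hypothetical helper names, and the clicks and impressions are made-up numbers.

```python
from statistics import fmean, variance


def sample_cov(a, b):
    """Sample covariance with ddof=1."""
    ma, mb = fmean(a), fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delta_ratio_var(num, den):
    """Ratio estimate R and per-unit Delta Method variance for one arm."""
    u_bar, v_bar = fmean(num), fmean(den)
    r = u_bar / v_bar  # equals sum(num) / sum(den)
    var_unit = (variance(num) + r**2 * variance(den) - 2 * r * sample_cov(num, den)) / v_bar**2
    return r, var_unit


# Hypothetical per-user clicks (numerator) and impressions (denominator) in one arm.
clicks = [1, 0, 3, 2, 0, 5]
impressions = [10, 4, 20, 15, 6, 30]

ratio, var_unit = delta_ratio_var(clicks, impressions)

# Same quantity via linearization: Var(U - R V) / V_bar^2.
linearized = [u - ratio * v for u, v in zip(clicks, impressions)]
var_linear = variance(linearized) / fmean(impressions) ** 2

# Approximate standard error of the ratio estimate, dividing the per-unit variance by n.
se_ratio = (var_unit / len(clicks)) ** 0.5
print(ratio, var_unit, var_linear, se_ratio)
```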
PARAMETER DESCRIPTION
df

The experimental dataset.

TYPE: DataFrame

treatment_col

Column identifying treatment assignments.

TYPE: str

control

Control arm identifier value.

TYPE: str

treatment

Treatment arm identifier value.

TYPE: str

alpha

Significance level. Defaults to 0.05.

TYPE: float DEFAULT: 0.05

RETURNS DESCRIPTION
Dict[str, Any]

Dict[str, Any]: Completed MetricResult dictionary.

RAISES DESCRIPTION
ValueError

If either control or treatment group becomes empty after filtering.

Source code in src\xpyrment\metrics\taxonomy.py
def calculate(
    self,
    df: pd.DataFrame,
    treatment_col: str,
    control: str,
    treatment: str,
    alpha: float = 0.05,
) -> Dict[str, Any]:
    r"""Calculates ratio values, Delta-method variances, and statistical significance.

    Cleans missing values and non-positive denominators. If double-covariates are present,
    separately fits linear CUPED adjustments to the numerator and denominator series:

    $$
    U_i^{\text{CUPED}} = U_i - \theta_U (U_{i,\text{pre}} - \bar{U}_{\text{pre}})
    $$

    $$
    V_i^{\text{CUPED}} = V_i - \theta_V (V_{i,\text{pre}} - \bar{V}_{\text{pre}})
    $$

    The ratio variance is then estimated using the Delta Method formulation:

    $$
    \text{Var}\left(\frac{U}{V}\right) \approx \frac{1}{\bar{V}^2} \left[ \text{Var}(U) + R^2 \text{Var}(V) - 2 R \text{Cov}(U, V) \right]
    $$

    Args:
        df (pd.DataFrame): The experimental dataset.
        treatment_col (str): Column identifying treatment assignments.
        control (str): Control arm identifier value.
        treatment (str): Treatment arm identifier value.
        alpha (float): Significance level. Defaults to 0.05.

    Returns:
        Dict[str, Any]: Completed `MetricResult` dictionary.

    Raises:
        ValueError: If either control or treatment group becomes empty after filtering.
    """
    df_clean = df.dropna(subset=[self.numerator_col, self.denominator_col]).copy()
    df_clean = df_clean[df_clean[self.denominator_col] > 0]

    c_mask = df_clean[treatment_col] == control
    t_mask = df_clean[treatment_col] == treatment

    n_c = int(np.sum(c_mask))
    n_t = int(np.sum(t_mask))

    if n_c == 0 or n_t == 0:
        raise ValueError(f"Control or treatment group is empty for ratio metric {self.name}.")

    cuped_achieved = False
    variance_reduction = 0.0

    num = df_clean[self.numerator_col].to_numpy()
    den = df_clean[self.denominator_col].to_numpy()

    if (
        self.pre_numerator_col
        and self.pre_denominator_col
        and self.pre_numerator_col in df_clean.columns
        and self.pre_denominator_col in df_clean.columns
    ):
        df_clean = df_clean.dropna(subset=[self.pre_numerator_col, self.pre_denominator_col])
        df_clean = df_clean[df_clean[self.pre_denominator_col] > 0]

        c_mask = df_clean[treatment_col] == control
        t_mask = df_clean[treatment_col] == treatment
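        n_c = int(np.sum(c_mask))
        n_t = int(np.sum(t_mask))
        # Dropping rows with missing or non-positive pre-period values can empty an arm; re-check.
        if n_c == 0 or n_t == 0:
            raise ValueError(f"Control or treatment group is empty for ratio metric {self.name}.")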

        num = df_clean[self.numerator_col].to_numpy()
        den = df_clean[self.denominator_col].to_numpy()
        pre_num = df_clean[self.pre_numerator_col].to_numpy()
        pre_den = df_clean[self.pre_denominator_col].to_numpy()

        var_pre_num = np.var(pre_num, ddof=1)
        var_pre_den = np.var(pre_den, ddof=1)

        if not np.isclose(var_pre_num, 0.0, atol=1e-12) and not np.isclose(var_pre_den, 0.0, atol=1e-12):
            cov_num = np.cov(num, pre_num, ddof=1)[0, 1]
            theta_num = cov_num / var_pre_num
            mean_pre_num_global = np.mean(pre_num)
            num_cuped = num - theta_num * (pre_num - mean_pre_num_global)

            cov_den = np.cov(den, pre_den, ddof=1)[0, 1]
            theta_den = cov_den / var_pre_den
            mean_pre_den_global = np.mean(pre_den)
            den_cuped = den - theta_den * (pre_den - mean_pre_den_global)

            num_c, num_t = num_cuped[c_mask], num_cuped[t_mask]
            den_c, den_t = den_cuped[c_mask], den_cuped[t_mask]

            mean_num_c, mean_num_t = np.mean(num_c), np.mean(num_t)
            mean_den_c, mean_den_t = np.mean(den_c), np.mean(den_t)

            ratio_c = mean_num_c / mean_den_c
            ratio_t = mean_num_t / mean_den_t

            var_num_c = np.var(num_c, ddof=1)
            var_den_c = np.var(den_c, ddof=1)
            cov_num_den_c = np.cov(num_c, den_c, ddof=1)[0, 1]

            var_ratio_c = (1 / (mean_den_c**2)) * (
                var_num_c + (ratio_c**2) * var_den_c - 2 * ratio_c * cov_num_den_c
            )

            var_num_t = np.var(num_t, ddof=1)
            var_den_t = np.var(den_t, ddof=1)
            cov_num_den_t = np.cov(num_t, den_t, ddof=1)[0, 1]

            var_ratio_t = (1 / (mean_den_t**2)) * (
                var_num_t + (ratio_t**2) * var_den_t - 2 * ratio_t * cov_num_den_t
            )

            cuped_achieved = True

            orig_ratio_global = np.mean(num) / np.mean(den)
            orig_var_ratio = (1 / (np.mean(den) ** 2)) * (
                np.var(num, ddof=1)
                + (orig_ratio_global**2) * np.var(den, ddof=1)
                - 2 * orig_ratio_global * np.cov(num, den, ddof=1)[0, 1]
            )

            adj_ratio_global = np.mean(num_cuped) / np.mean(den_cuped)
            adj_var_ratio = (1 / (np.mean(den_cuped) ** 2)) * (
                np.var(num_cuped, ddof=1)
                + (adj_ratio_global**2) * np.var(den_cuped, ddof=1)
                - 2 * adj_ratio_global * np.cov(num_cuped, den_cuped, ddof=1)[0, 1]
            )

            if orig_var_ratio > 0:
                variance_reduction = max(0.0, (orig_var_ratio - adj_var_ratio) / orig_var_ratio)
        else:
            mean_num_c, mean_num_t = np.mean(num[c_mask]), np.mean(num[t_mask])
            mean_den_c, mean_den_t = np.mean(den[c_mask]), np.mean(den[t_mask])
            ratio_c = mean_num_c / mean_den_c
            ratio_t = mean_num_t / mean_den_t

            var_num_c = np.var(num[c_mask], ddof=1)
            var_den_c = np.var(den[c_mask], ddof=1)
            cov_num_den_c = np.cov(num[c_mask], den[c_mask], ddof=1)[0, 1]
            var_ratio_c = (1 / (mean_den_c**2)) * (
                var_num_c + (ratio_c**2) * var_den_c - 2 * ratio_c * cov_num_den_c
            )

            var_num_t = np.var(num[t_mask], ddof=1)
            var_den_t = np.var(den[t_mask], ddof=1)
            cov_num_den_t = np.cov(num[t_mask], den[t_mask], ddof=1)[0, 1]
            var_ratio_t = (1 / (mean_den_t**2)) * (
                var_num_t + (ratio_t**2) * var_den_t - 2 * ratio_t * cov_num_den_t
            )
    else:
        mean_num_c, mean_num_t = np.mean(num[c_mask]), np.mean(num[t_mask])
        mean_den_c, mean_den_t = np.mean(den[c_mask]), np.mean(den[t_mask])
        ratio_c = mean_num_c / mean_den_c
        ratio_t = mean_num_t / mean_den_t

        var_num_c = np.var(num[c_mask], ddof=1)
        var_den_c = np.var(den[c_mask], ddof=1)
        cov_num_den_c = np.cov(num[c_mask], den[c_mask], ddof=1)[0, 1]
        var_ratio_c = (1 / (mean_den_c**2)) * (
            var_num_c + (ratio_c**2) * var_den_c - 2 * ratio_c * cov_num_den_c
        )

        var_num_t = np.var(num[t_mask], ddof=1)
        var_den_t = np.var(den[t_mask], ddof=1)
        cov_num_den_t = np.cov(num[t_mask], den[t_mask], ddof=1)[0, 1]
        var_ratio_t = (1 / (mean_den_t**2)) * (
            var_num_t + (ratio_t**2) * var_den_t - 2 * ratio_t * cov_num_den_t
        )

    stats_dict = self._calculate_stats(
        mean_c=ratio_c,
        mean_t=ratio_t,
        var_c=var_ratio_c,
        var_t=var_ratio_t,
        n_c=n_c,
        n_t=n_t,
        alpha=alpha,
    )

    diff = ratio_t - ratio_c
    relative_lift = diff / ratio_c if ratio_c != 0 else 0.0

    results = {
        "metric_name": self.name,
        "metric_type": "Ratio",
        "control_mean": ratio_c,
        "treatment_mean": ratio_t,
        "control_var": var_ratio_c,
        "treatment_var": var_ratio_t,
        "control_n": n_c,
        "treatment_n": n_t,
        "absolute_difference": diff,
        "relative_lift": relative_lift,
        "cuped_applied": cuped_achieved,
        "variance_reduction": variance_reduction,
        **stats_dict,
    }

    return results

log_transform

log_transform(df: DataFrame, col: str) -> Series

Transforms continuous metrics using a shifted natural log transformation.

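Highly skewed distributions, such as revenue per user or session durations, often strain the normality assumptions of classical parametric tests (e.g., Student's or Welch's t-test) in finite samples. Applying a natural log transformation compresses the long right tail, bringing the distribution closer to normal and stabilizing variance (homoscedasticity). Adding 1 before taking the log keeps zero values at zero, since \(\ln(1) = 0\). Inputs must be greater than \(-1\): a value of exactly \(-1\) maps to \(-\infty\), and smaller values produce NaN.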

Mathematical Representation

The transformation is defined as:

\[ y_{\text{transformed}} = \ln(y + 1) \]

It is computed with NumPy's log1p, which evaluates \(\ln(1 + y)\) without first forming \(1 + y\) and therefore keeps full numerical precision when \(y \approx 0\).
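The precision point is easy to see with Python's scalar math.log1p, the standard-library counterpart of the np.log1p call that log_transform uses. The sketch also shows expm1, the inverse of log1p, for mapping transformed values back to the original scale; the revenue values are the same illustrative numbers as in the example below.

```python
import math

tiny = 1e-20
print(math.log(1 + tiny))  # 0.0: 1 + 1e-20 rounds to exactly 1.0 in float64
print(math.log1p(tiny))    # about 1e-20: log1p keeps the small-y information

# Back-transform with expm1, the inverse of log1p.
revenue = [0.0, 10.0, 150.5]
logged = [math.log1p(v) for v in revenue]
restored = [math.expm1(v) for v in logged]
print([round(v, 6) for v in logged])  # [0.0, 2.397895, 5.020586]
```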

PARAMETER DESCRIPTION
df

The source DataFrame containing the column to transform.

TYPE: DataFrame

col

The name of the target column in df representing the skewed metric.

TYPE: str

RETURNS DESCRIPTION
Series

pd.Series: A new pandas Series containing the log-transformed values.

Examples:

Example
>>> import pandas as pd
>>> from xpyrment.metrics.transformations import log_transform
>>> df = pd.DataFrame({"revenue": [0.0, 10.0, 150.5]})
>>> log_transform(df, "revenue")
0    0.000000
1    2.397895
2    5.020586
Name: revenue, dtype: float64
Source code in src\xpyrment\metrics\transformations.py
def log_transform(df: pd.DataFrame, col: str) -> pd.Series:
    r"""Transforms continuous metrics using a shifted natural log transformation.

    Highly skewed distributions, such as revenue per user or session durations, often violate
    the normality assumptions of classical parametric tests (e.g., Student's or Welch's t-test).
    Applying a natural log transformation normalizes the distribution and stabilizes variance
    (homoscedasticity). The addition of 1 ensures that zero values remain mapped to zero.

    ??? mathbox "Mathematical Representation"

        The transformation is defined as:
        $$
        y_{\text{transformed}} = \ln(y + 1)
        $$
        This is mathematically equivalent to:
        $$
        \log1p(y)
        $$
        which maintains numerical precision for extremely small values of $y \approx 0$.

    Args:
        df (pd.DataFrame): The source DataFrame containing the column to transform.
        col (str): The name of the target column in `df` representing the skewed metric.

    Returns:
        pd.Series: A new pandas Series containing the log-transformed values.

    Examples:
        ??? example "Example"

            ```python
            >>> import pandas as pd
            >>> df = pd.DataFrame({"revenue": [0.0, 10.0, 150.5]})
            >>> log_transform(df, "revenue")
            0    0.000000
            1    2.397895
            2    5.020586
            Name: revenue, dtype: float64
            ```
    """
    return np.log1p(df[col])

delta_normalization

delta_normalization(df: DataFrame, col: str) -> Series

Normalizes metrics using the Delta Method expansion (Stub/Scaffolding).

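The Delta Method is a general technique for approximating the variance of a function of random variables. For non-linear transformations or aggregate metrics (e.g., a click-through rate, whose denominator is itself random), variance formulas that treat the denominator as fixed are incorrect. The Delta Method instead uses a first-order Taylor expansion of the target function around its expected value to obtain an asymptotically normal approximation. The current implementation is scaffolding and does not yet perform this expansion: it returns the z-score standardization \((x - \bar{x}) / s\) of col, where \(s\) is the sample standard deviation (ddof=1), and returns all zeros when \(s\) is zero or undefined.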

Mathematical Context

Let \(g(X)\) be a differentiable function of a random variable \(X\) with mean \(\mu\) and variance \(\sigma^2\). The first-order Taylor expansion of \(g(X)\) about \(\mu\) is:

\[ g(X) \approx g(\mu) + g'(\mu)(X - \mu) \]

Taking the variance of this linear approximation yields:

\[ \text{Var}(g(X)) \approx [g'(\mu)]^2 \sigma^2 \]

For functions of several random variables, such as the ratio estimate \(g(X, Y) = X / Y\), the expansion also picks up the covariance between numerator and denominator:

\[ \text{Var}\left(\frac{X}{Y}\right) \approx \frac{1}{\mu_Y^2} \text{Var}(X) + \frac{\mu_X^2}{\mu_Y^4} \text{Var}(Y) - 2 \frac{\mu_X}{\mu_Y^3} \text{Cov}(X, Y) \]
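As a worked instance of the ratio formula (an illustration only; this page does not state that xpyrment computes lift variances this way), take \(X = \bar{Y}_T\) and \(Y = \bar{Y}_C\), the sample means of two independently sampled arms with sizes \(n_T, n_C\), means \(\mu_T, \mu_C\), and per-unit variances \(\sigma_T^2, \sigma_C^2\). Independence makes the covariance term vanish, \(\text{Var}(\bar{Y}) = \sigma^2 / n\), and subtracting the constant 1 does not change the variance, so the relative lift \(L = \bar{Y}_T / \bar{Y}_C - 1\) checked by GuardrailMetric has approximate variance:

\[ \text{Var}(L) = \text{Var}\left(\frac{\bar{Y}_T}{\bar{Y}_C}\right) \approx \frac{1}{\mu_C^2} \frac{\sigma_T^2}{n_T} + \frac{\mu_T^2}{\mu_C^4} \frac{\sigma_C^2}{n_C} \]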

PARAMETER DESCRIPTION
df

The source DataFrame containing the metric columns.

TYPE: DataFrame

col

The name of the column representing the metric to normalize.

TYPE: str

RETURNS DESCRIPTION
Series

pd.Series: A pandas Series of normalized values.
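Since the function is still scaffolding, it helps to pin down what it currently returns. A minimal standard-library sketch of the same computation on a plain list; delta_normalization_current is a hypothetical name, and unlike the pandas version it does not skip NaN values.

```python
from statistics import fmean, stdev


def delta_normalization_current(values):
    """Sketch of the current scaffolding: a z-score with ddof=1 standard deviation."""
    if len(values) < 2:
        # pandas returns NaN for the ddof=1 std of fewer than 2 values, so the
        # original falls back to all zeros; mirror that here.
        return [0.0] * len(values)
    sd = stdev(values)
    if sd == 0.0:
        return [0.0] * len(values)
    mu = fmean(values)
    return [(v - mu) / sd for v in values]


print(delta_normalization_current([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
print(delta_normalization_current([5.0, 5.0]))       # [0.0, 0.0]
```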

Source code in src\xpyrment\metrics\transformations.py
def delta_normalization(df: pd.DataFrame, col: str) -> pd.Series:
    r"""Normalizes metrics using the Delta Method expansion (Stub/Scaffolding).

    The Delta Method is a general technique for approximating the variance of a function of random
    variables. For non-linear transformations or aggregate metrics (e.g., Click-Through-Rate where the
    denominator is not fixed), direct variance calculations are biased. Delta normalization computes a
    Taylor series expansion of the target function around its expected value to derive an asymptotically
    normal approximation.

    ??? mathbox "Mathematical Context"

        Let $g(X)$ be a differentiable function of a random variable $X$ with mean $\mu$ and variance $\sigma^2$.
        The first-order Taylor expansion of $g(X)$ about $\mu$ is:
        $$
        g(X) \approx g(\mu) + g'(\mu)(X - \mu)
        $$
        Taking the variance of this linear approximation yields:
        $$
        \text{Var}(g(X)) \approx [g'(\mu)]^2 \sigma^2
        $$
        For multidimensional vectors, such as ratio estimates of the form $g(X, Y) = X / Y$, the Taylor expansion
        incorporates the covariance between numerator and denominator:
        $$
        \text{Var}\left(\frac{X}{Y}\right) \approx \frac{1}{\mu_Y^2} \text{Var}(X) + \frac{\mu_X^2}{\mu_Y^4} \text{Var}(Y) - 2 \frac{\mu_X}{\mu_Y^3} \text{Cov}(X, Y)
        $$

    Args:
        df (pd.DataFrame): The source DataFrame containing the metric columns.
        col (str): The name of the column representing the metric to normalize.

    Returns:
        pd.Series: A pandas Series of normalized values.
    """
    series = df[col]
    mean_val = series.mean()
    std_val = series.std(ddof=1)

    if pd.isna(std_val) or std_val == 0.0:
        return pd.Series(0.0, index=series.index, name=col)

    return (series - mean_val) / std_val