Power

power

Power analysis and sample-size planning calculators.

This module provides power calculators for experimental design. They determine the minimum sample size per variant needed to detect a target Minimum Detectable Effect (MDE) at a specified Type I error rate \(\alpha\) and Type II error rate \(\beta\). The module also computes the CUPED variance-reduction adjustment to sample size and estimates experiment duration from daily traffic.

Mathematical Specifications

The required sample size per variant \(n\) for a two-sided, two-sample comparison with equal allocation, using the normal approximation to the t-test, is:

$$ n = \frac{2 \sigma^2 \left(Z_{1 - \alpha/2} + Z_{1 - \beta}\right)^2}{\delta^2} $$

where:

- \(\sigma^2\): Population variance. For binary proportions (\(p\)), \(\sigma^2 = p(1 - p)\).
- \(Z_{1 - \alpha/2}\): Standard normal critical value for a two-sided test at significance level \(\alpha\).
- \(Z_{1 - \beta}\): Standard normal quantile corresponding to the desired statistical power (\(1 - \beta\)).
- \(\delta\): The target absolute Minimum Detectable Effect (MDE).
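As a worked check of the formula above, the following minimal sketch evaluates it for a 10% baseline proportion and a 5% relative MDE. It uses `statistics.NormalDist` from the Python standard library as a stand-in for the normal quantile function, and the helper name `sample_size_per_variant` is hypothetical rather than part of this module.

```python
from statistics import NormalDist


def sample_size_per_variant(variance: float, delta: float,
                            alpha: float = 0.05, power: float = 0.80) -> float:
    """Per-variant n for a two-sided test under the normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{1 - alpha/2}
    z_beta = NormalDist().inv_cdf(power)           # Z_{1 - beta}
    return 2 * variance * (z_alpha + z_beta) ** 2 / delta ** 2


p = 0.10                 # baseline conversion rate
delta = p * 0.05         # 5% relative MDE -> 0.005 absolute
n = sample_size_per_variant(variance=p * (1 - p), delta=delta)
print(round(n))          # roughly 56,512 units per variant
```

Because \(\delta\) enters squared in the denominator, small changes in the MDE dominate the result far more than changes in \(\alpha\) or power.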

If pre-period baseline covariates are available, the CUPED variance-adjusted sample size is:

$$ n_{\text{CUPED}} = n \left(1 - \rho^2\right) $$

where \(\rho\) is the correlation between pre-period and experiment-period values.
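To make the CUPED adjustment concrete, here is a small sketch that applies the factor \(1 - \rho^2\) to an illustrative standard sample size. The function name `cuped_sample_size` and the input value of 50,000 are assumptions chosen for the example, not values produced by this module.

```python
def cuped_sample_size(n: float, rho: float) -> float:
    """Deflate a standard sample size by the CUPED factor (1 - rho^2)."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must be between -1.0 and 1.0")
    return n * (1.0 - rho ** 2)


n_standard = 50_000.0  # illustrative per-variant size, not a real result
for rho in (0.0, 0.3, 0.6, 0.9):
    n_cuped = cuped_sample_size(n_standard, rho)
    savings = 1.0 - n_cuped / n_standard  # equals rho^2
    print(f"rho={rho:.1f}  n_cuped={n_cuped:,.0f}  savings={savings:.0%}")
```

The saving equals \(\rho^2\), so a correlation of 0.3 removes only 9% of the required sample while 0.9 removes 81%.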

CLASS DESCRIPTION
ExperimentDesignResult

Class to hold, format, and present experiment design and statistical power analysis results.

FUNCTION DESCRIPTION
design_experiment

Computes the required sample size and duration for an experiment based on design constraints.

generate_power_curve_data

Generates sample size coordinates across a range of relative MDE values.

ExperimentDesignResult

ExperimentDesignResult(details: Dict[str, Any])

Class to hold, format, and present experiment design and statistical power analysis results.

Provides text representations and structured summaries of design outputs to help experimenters evaluate sizing requirements, potential CUPED savings, and estimated run times.

ATTRIBUTE DESCRIPTION
details

Dictionary containing raw parameter values and sizing outputs from the power analysis engine.

TYPE: Dict[str, Any]

PARAMETER DESCRIPTION
details

Dictionary of design details from design_experiment.

TYPE: Dict[str, Any]

METHOD DESCRIPTION
summary

Compiles a structured summary of the experiment design parameters.

__repr__

Generates a formatted plain-text summary of the experiment design parameters.

Source code in src\xpyrment\plan\power.py
def __init__(self, details: Dict[str, Any]):
    """Initializes an ExperimentDesignResult wrapper.

    Args:
        details (Dict[str, Any]): Dictionary of design details from `design_experiment`.
    """
    self.details = details

summary

summary() -> Dict[str, list]

Compiles a structured summary of the experiment design parameters.

Formats raw numbers into readable strings: fixed-precision decimals, percentages, and sample sizes rounded up with digit grouping.

RETURNS DESCRIPTION
Dict[str, list]

A dictionary with the keys "Parameter" and "Value", each holding a list of strings; entries at the same index form one row.
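The formatting relies on Python's standard format specifications. The sketch below shows the three patterns on made-up raw values; the `raw` dictionary is illustrative and only mimics a few of the keys found in `details`.

```python
import math

raw = {"alpha": 0.05, "mde_absolute": 0.005, "sample_size_per_variant": 56511.93}

formatted = {
    "alpha": f"{raw['alpha']:.2%}",                # percentage with two decimals
    "mde_absolute": f"{raw['mde_absolute']:.4f}",  # fixed four decimals
    # round up to whole units, then group digits with commas
    "sample_size_per_variant": f"{math.ceil(raw['sample_size_per_variant']):,}",
}
print(formatted)
```

Rounding up rather than to the nearest integer keeps the displayed size from falling below the computed requirement.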

Source code in src\xpyrment\plan\power.py
def summary(self) -> Dict[str, list]:
    """Compiles a structured summary of the experiment design parameters.

    Formats raw numbers into readable strings: fixed-precision decimals, percentages,
    and sample sizes rounded up with digit grouping.

    Returns:
        Dict[str, list]: A dictionary with the keys `"Parameter"` and `"Value"`, each holding a
            list of strings; entries at the same index form one row.
    """
    summary = {
        "Parameter": [
            "Metric Type",
            "Baseline Value",
            "Target MDE (Absolute)",
            "Target MDE (Relative)",
            "Significance Level (Alpha)",
            "Statistical Power (1-Beta)",
            "Sample Size Per Variant",
            "Total Sample Size Required",
        ],
        "Value": [
            self.details["metric_type"].capitalize(),
            f"{self.details['baseline_value']:.4f}",
            f"{self.details['mde_absolute']:.4f}",
            f"{self.details['mde_relative']:.2%}",
            f"{self.details['alpha']:.2%}",
            f"{self.details['power']:.2%}",
            f"{int(np.ceil(self.details['sample_size_per_variant'])):,}",
            f"{int(np.ceil(self.details['total_sample_size'])):,}",
        ],
    }

    # Compare against None so that a valid correlation of 0.0 is still reported.
    if self.details.get("pre_post_correlation") is not None:
        corr = self.details["pre_post_correlation"]
        reduced_size = self.details["cuped_sample_size_per_variant"]
        summary["Parameter"].extend([
            "Pre-Post Correlation",
            "CUPED Sample Size Per Variant",
            "CUPED Total Sample Size",
            "CUPED Sample Size Savings",
        ])
        summary["Value"].extend([
            f"{corr:.2f}",
            f"{int(np.ceil(reduced_size)):,}",
            f"{int(np.ceil(reduced_size * 2)):,}",
            f"{self.details['cuped_savings']:.1%}",
        ])

    if self.details.get("daily_traffic"):
        summary["Parameter"].extend([
            "Daily Traffic",
            "Estimated Duration (Standard)",
        ])
        summary["Value"].extend([
            f"{int(self.details['daily_traffic']):,}/day",
            f"{self.details['duration_days_standard']:.1f} days",
        ])
        # Compare against None so that a valid correlation of 0.0 is still reported.
        if self.details.get("pre_post_correlation") is not None:
            summary["Parameter"].append("Estimated Duration (CUPED)")
            summary["Value"].append(f"{self.details['duration_days_cuped']:.1f} days")

    return summary

__repr__

__repr__() -> str

Generates a formatted plain-text summary of the experiment design parameters.

Source code in src\xpyrment\plan\power.py
def __repr__(self) -> str:
    """Generates a formatted plain-text summary of the experiment design parameters."""
    s = "=========================================\n"
    s += "       Experiment Design Summary        \n"
    s += "=========================================\n"
    summary_dict = self.summary()
    for param, val in zip(summary_dict["Parameter"], summary_dict["Value"]):
        s += f"{param:<30}: {val}\n"
    s += "=========================================\n"
    return s
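The column alignment comes from the `<30` format specifier, which left-justifies each parameter name in a 30-character field. A minimal sketch with a hand-written two-row summary (the values are placeholders, not library output):

```python
summary = {
    "Parameter": ["Metric Type", "Sample Size Per Variant"],
    "Value": ["Proportion", "56,512"],
}

# Pair each parameter with its value and pad the name to 30 characters.
lines = [
    f"{param:<30}: {val}"
    for param, val in zip(summary["Parameter"], summary["Value"])
]
print("\n".join(lines))
```

Every colon lands in the same column as long as no parameter name exceeds 30 characters.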

design_experiment

design_experiment(
    metric_type: str,
    baseline_value: float,
    standard_deviation: Optional[float] = None,
    mde: float = 0.05,
    mde_type: str = "relative",
    alpha: float = 0.05,
    power: float = 0.8,
    pre_post_correlation: Optional[float] = None,
    daily_traffic: Optional[int] = None,
) -> ExperimentDesignResult

Computes the required sample size and duration for an experiment based on design constraints.

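This function performs an a priori power analysis. It supports 'mean', 'proportion', and 'ratio' metrics: proportions derive their variance from the baseline rate, while mean and ratio metrics use the supplied standard deviation. If a pre-post correlation is given, it also reports the CUPED-adjusted sample size, and if daily traffic is given, it converts the total sample size into an estimated duration.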

PARAMETER DESCRIPTION
metric_type

The statistical distribution type. Options are 'mean', 'proportion', or 'ratio'.

TYPE: str

baseline_value

The current historical control group value (mean or rate).

TYPE: float

standard_deviation

The historical standard deviation of the metric. Required for 'mean' and 'ratio' metric types. Ignored for 'proportion'.

TYPE: Optional[float] DEFAULT: None

mde

The target Minimum Detectable Effect. Expressed as a fraction of baseline for "relative" (e.g., 0.05 is 5%) or directly as a raw difference for "absolute". Defaults to 0.05.

TYPE: float DEFAULT: 0.05

mde_type

Dictates how mde is interpreted. Options are 'relative' or 'absolute'. Defaults to 'relative'.

TYPE: str DEFAULT: 'relative'

alpha

The probability of a Type I error (significance level, e.g., 0.05 for 95% confidence). Defaults to 0.05.

TYPE: float DEFAULT: 0.05

power

The desired statistical power (\(1 - \beta\), e.g., 0.80 to capture true effects 80% of the time). Defaults to 0.80.

TYPE: float DEFAULT: 0.8

pre_post_correlation

The correlation coefficient (\(\rho\)) between baseline pre-period values and active experiment-period values. If provided, calculates CUPED-deflated sizing. Defaults to None.

TYPE: Optional[float] DEFAULT: None

daily_traffic

Expected daily volume of unique units entering the experiment. If provided, calculates duration. Defaults to None.

TYPE: Optional[int] DEFAULT: None
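The interaction between `mde` and `mde_type` can be sketched as follows. The helper `resolve_mde` is hypothetical; it mirrors the conversion logic visible in the function's source, including the fallback of reporting a relative MDE of 0.0 when the baseline is zero.

```python
def resolve_mde(baseline_value: float, mde: float, mde_type: str = "relative"):
    """Return (absolute, relative) MDE for either interpretation of `mde`."""
    if mde_type == "relative":
        return baseline_value * mde, mde
    if mde_type == "absolute":
        relative = mde / baseline_value if baseline_value != 0 else 0.0
        return mde, relative
    raise ValueError("mde_type must be 'relative' or 'absolute'.")


# Both calls describe the same half-point lift on a 10% baseline.
print(resolve_mde(0.10, 0.05, "relative"))
print(resolve_mde(0.10, 0.005, "absolute"))
```

Only the absolute MDE enters the sample-size formula; the relative value is carried along for reporting.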

RETURNS DESCRIPTION
ExperimentDesignResult

A wrapper object containing formatted parameters and sample sizing calculations.

TYPE: ExperimentDesignResult

RAISES DESCRIPTION
ValueError

If inputs are invalid: an unsupported metric_type or mde_type, non-positive daily traffic, a proportion baseline not in \((0, 1)\), or a correlation not in \([-1, 1]\).

ValueError

If standard deviation is missing for mean/ratio metrics.

Examples:

Example
>>> # Planning a conversion rate proportion test (10% baseline, relative MDE of 5%)
>>> result = design_experiment(
...     metric_type="proportion",
...     baseline_value=0.10,
...     mde=0.05,
...     mde_type="relative"
... )
>>> int(result.details["sample_size_per_variant"])
56511
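The duration estimate divides the total sample size across both variants by daily traffic. The sketch below reproduces that arithmetic on illustrative numbers; `duration_days` and its `n_variants` parameter are hypothetical (the module itself assumes two variants), and the sample size of 60,000 is a placeholder rather than a computed result.

```python
def duration_days(n_per_variant: float, daily_traffic: int, n_variants: int = 2) -> float:
    """Days needed for daily traffic to fill all variants."""
    if daily_traffic <= 0:
        raise ValueError("daily_traffic must be positive.")
    return n_per_variant * n_variants / daily_traffic


n_standard = 60_000.0                  # illustrative per-variant size
n_cuped = n_standard * (1 - 0.5 ** 2)  # rho = 0.5 -> 45,000 per variant

print(f"{duration_days(n_standard, 10_000):.1f} days")  # 12.0 days
print(f"{duration_days(n_cuped, 10_000):.1f} days")     # 9.0 days
```

The estimate assumes every daily unit is new and enters the experiment, so returning users or partial traffic allocation would lengthen the real run time.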
Source code in src\xpyrment\plan\power.py
def design_experiment(
    metric_type: str,
    baseline_value: float,
    standard_deviation: Optional[float] = None,
    mde: float = 0.05,
    mde_type: str = "relative",
    alpha: float = 0.05,
    power: float = 0.80,
    pre_post_correlation: Optional[float] = None,
    daily_traffic: Optional[int] = None,
) -> ExperimentDesignResult:
    r"""Computes the required sample size and duration for an experiment based on design constraints.

    This function performs rigorous a priori power analysis to determine required sample sizes.
    It supports continuous means, proportions, and ratio metrics, integrates pre-post correlation
    for CUPED calculation, and maps sizes to daily traffic to compute duration.

    Args:
        metric_type (str): The statistical distribution type. Options are `'mean'`,
            `'proportion'`, or `'ratio'`.
        baseline_value (float): The current historical control group value (mean or rate).
        standard_deviation (Optional[float]): The historical standard deviation of the metric.
            Required for `'mean'` and `'ratio'` metric types. Ignored for `'proportion'`.
        mde (float): The target Minimum Detectable Effect. Expressed as a fraction of baseline for
            `"relative"` (e.g., `0.05` is 5%) or directly as a raw difference for `"absolute"`.
            Defaults to 0.05.
        mde_type (str): Dictates how `mde` is interpreted. Options are `'relative'` or `'absolute'`.
            Defaults to `'relative'`.
        alpha (float): The probability of a Type I error (significance level, e.g., 0.05 for 95% confidence).
            Defaults to 0.05.
        power (float): The desired statistical power ($1 - \beta$, e.g., 0.80 to capture true effects 80% of the time).
            Defaults to 0.80.
        pre_post_correlation (Optional[float]): The correlation coefficient ($\rho$) between baseline pre-period
            values and active experiment-period values. If provided, calculates CUPED-deflated sizing.
            Defaults to None.
        daily_traffic (Optional[int]): Expected daily volume of unique units entering the experiment. If provided,
            calculates duration. Defaults to None.

    Returns:
        ExperimentDesignResult: A wrapper object containing formatted parameters and sample sizing calculations.

    Raises:
        ValueError: If inputs are invalid: an unsupported `metric_type` or `mde_type`, non-positive
            daily traffic, a proportion baseline not in $(0, 1)$, or a correlation not in $[-1, 1]$.
        ValueError: If standard deviation is missing for mean/ratio metrics.

    Examples:
        ??? example "Example"

            ```python
            >>> # Planning a conversion rate proportion test (10% baseline, relative MDE of 5%)
            >>> result = design_experiment(
            ...     metric_type="proportion",
            ...     baseline_value=0.10,
            ...     mde=0.05,
            ...     mde_type="relative"
            ... )
            >>> int(result.details["sample_size_per_variant"])
            56511
            ```
    """
    metric_type = metric_type.lower()
    mde_type = mde_type.lower()

    if metric_type not in ["mean", "proportion", "ratio"]:
        raise ValueError("metric_type must be one of: 'mean', 'proportion', 'ratio'.")
    if mde_type not in ["relative", "absolute"]:
        raise ValueError("mde_type must be 'relative' or 'absolute'.")

    if mde_type == "relative":
        mde_absolute = baseline_value * mde
        mde_relative = mde
    else:
        mde_absolute = mde
        mde_relative = mde / baseline_value if baseline_value != 0 else 0.0

    if metric_type == "proportion":
        if baseline_value <= 0 or baseline_value >= 1:
            raise ValueError("For proportions, baseline_value must be strictly between 0 and 1.")
        variance = baseline_value * (1 - baseline_value)
    else:
        if standard_deviation is None:
            raise ValueError(f"standard_deviation is required for metric type '{metric_type}'.")
        variance = standard_deviation**2

    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)

    factor = 2 * (z_alpha + z_beta) ** 2
    sample_size = factor * variance / (mde_absolute**2)

    details = {
        "metric_type": metric_type,
        "baseline_value": baseline_value,
        "standard_deviation": standard_deviation if metric_type != "proportion" else np.sqrt(variance),
        "mde_absolute": mde_absolute,
        "mde_relative": mde_relative,
        "alpha": alpha,
        "power": power,
        "sample_size_per_variant": sample_size,
        "total_sample_size": sample_size * 2,
    }

    if pre_post_correlation is not None:
        if not (-1.0 <= pre_post_correlation <= 1.0):
            raise ValueError("pre_post_correlation must be between -1.0 and 1.0.")

        vr_factor = 1.0 - (pre_post_correlation**2)
        cuped_sample_size = sample_size * vr_factor

        details["pre_post_correlation"] = pre_post_correlation
        details["cuped_sample_size_per_variant"] = cuped_sample_size
        details["cuped_savings"] = 1.0 - vr_factor

    if daily_traffic is not None:
        if daily_traffic <= 0:
            raise ValueError("daily_traffic must be positive.")
        details["daily_traffic"] = daily_traffic
        details["duration_days_standard"] = (sample_size * 2) / daily_traffic
        if pre_post_correlation is not None:
            details["duration_days_cuped"] = (cuped_sample_size * 2) / daily_traffic

    return ExperimentDesignResult(details)

generate_power_curve_data

generate_power_curve_data(
    metric_type: str,
    baseline_value: float,
    standard_deviation: Optional[float] = None,
    alpha: float = 0.05,
    power: float = 0.8,
    mde_range: Optional[ndarray] = None,
    pre_post_correlation: Optional[float] = None,
) -> Dict[str, ndarray]

Generates sample size coordinates across a range of relative MDE values.

This function evaluates design_experiment at each relative MDE in a grid, so that downstream reporting tools can plot required sample size against effect size (a "power curve").

PARAMETER DESCRIPTION
metric_type

Metric type ('mean', 'proportion', 'ratio').

TYPE: str

baseline_value

Historical control average value.

TYPE: float

standard_deviation

Historical metric standard deviation. Required for 'mean' and 'ratio' metric types; ignored for 'proportion'.

TYPE: Optional[float] DEFAULT: None

alpha

Significance level. Defaults to 0.05.

TYPE: float DEFAULT: 0.05

power

Desired statistical power. Defaults to 0.80.

TYPE: float DEFAULT: 0.8

mde_range

Array of relative MDE points to evaluate. If not provided, evaluates 50 evenly spaced points in \([0.01, 0.15]\). Defaults to None.

TYPE: Optional[ndarray] DEFAULT: None

pre_post_correlation

Pre-post correlation coefficient for CUPED-adjusted curve. Defaults to None.

TYPE: Optional[float] DEFAULT: None

RETURNS DESCRIPTION
Dict[str, ndarray]

Dictionary mapping coordinate names to NumPy arrays. Always contains 'mde_relative' and 'sample_size_per_variant'; includes 'cuped_sample_size_per_variant' when pre_post_correlation is provided.
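The shape of the resulting curve follows directly from the sample-size formula. The sketch below rebuilds a comparable grid with plain Python lists and the standard library's `NormalDist`; the `curve` helper is an assumption made for illustration and does not call this module.

```python
from statistics import NormalDist


def curve(baseline: float, variance: float, mde_points, alpha=0.05, power=0.80):
    """Per-variant sample size at each relative MDE in `mde_points`."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return [2 * variance * z ** 2 / (baseline * m) ** 2 for m in mde_points]


# 50 evenly spaced relative MDEs in [0.01, 0.15], mirroring the default grid.
mde_points = [0.01 + i * (0.15 - 0.01) / 49 for i in range(50)]
sizes = curve(baseline=0.10, variance=0.10 * 0.90, mde_points=mde_points)

# Sample size falls with the square of the MDE: halving the MDE quadruples n.
ratio = curve(0.10, 0.09, [0.02])[0] / curve(0.10, 0.09, [0.04])[0]
print(round(ratio, 6))  # 4.0
```

Because of this inverse-square relationship, the curve is steep at small MDEs and flattens quickly, which is why a log-scaled sample-size axis is often easier to read.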

Source code in src\xpyrment\plan\power.py
def generate_power_curve_data(
    metric_type: str,
    baseline_value: float,
    standard_deviation: Optional[float] = None,
    alpha: float = 0.05,
    power: float = 0.80,
    mde_range: Optional[np.ndarray] = None,
    pre_post_correlation: Optional[float] = None,
) -> Dict[str, np.ndarray]:
    """Generates sample size coordinates across a range of relative MDE values.

    This function calculates required sizing across a coordinate spectrum of possible MDEs,
    allowing downstream reporting tools to plot an interactive or static "power curve"
    graph (sample size vs. effect size).

    Args:
        metric_type (str): Metric type ('mean', 'proportion', 'ratio').
        baseline_value (float): Historical control average value.
        standard_deviation (Optional[float]): Historical metric standard deviation. Required for
            'mean' and 'ratio' metric types; ignored for 'proportion'.
        alpha (float): Significance level. Defaults to 0.05.
        power (float): Desired statistical power. Defaults to 0.80.
        mde_range (Optional[np.ndarray]): Array of relative MDE points to evaluate. If not provided,
            evaluates 50 evenly spaced points in $[0.01, 0.15]$. Defaults to None.
        pre_post_correlation (Optional[float]): Pre-post correlation coefficient for CUPED-adjusted curve.
            Defaults to None.

    Returns:
        Dict[str, np.ndarray]: Dictionary mapping coordinate names to numpy arrays of results.
            Contains keys `'mde_relative'` and `'sample_size_per_variant'`, and optionally
            `'cuped_sample_size_per_variant'`.
    """
    if mde_range is None:
        mde_range = np.linspace(0.01, 0.15, 50)

    sample_sizes = []
    cuped_sample_sizes = []

    for mde_val in mde_range:
        res = design_experiment(
            metric_type=metric_type,
            baseline_value=baseline_value,
            standard_deviation=standard_deviation,
            mde=mde_val,
            mde_type="relative",
            alpha=alpha,
            power=power,
            pre_post_correlation=pre_post_correlation,
        )
        sample_sizes.append(res.details["sample_size_per_variant"])
        if pre_post_correlation is not None:
            cuped_sample_sizes.append(res.details["cuped_sample_size_per_variant"])

    ret = {
        # Coerce to an array so list inputs still match the documented return type.
        "mde_relative": np.asarray(mde_range),
        "sample_size_per_variant": np.array(sample_sizes),
    }

    if pre_post_correlation is not None:
        ret["cuped_sample_size_per_variant"] = np.array(cuped_sample_sizes)

    return ret