Plan Module

The xpyrment.plan module contains the submodules and components for experiment planning.

plan

Experiment planning, power analysis, duration estimation, and preregistration.

This package houses components dedicated to the planning stage of the experimental lifecycle, helping experimenters define scientific/business hypotheses, calculate required sample sizes and run durations, and lock plans via cryptographic preregistration.

Submodules:

  - hypothesis: Forms HypothesisSpec containers mapping outcomes to statistical directions.
  - power: Handles a priori statistical power calculations and sample sizing.
  - duration: Maps required sample sizes to calendar run durations.
  - preregistration: Issues immutable PreregistrationCards to protect analysis integrity.

MODULE DESCRIPTION
duration

Duration estimation utilities based on statistical requirements and traffic pipelines.

hypothesis

Hypothesis specification layer binding metrics to targeted statistical directions.

power

Power analysis and sample-size planning calculators.

preregistration

Preregistration specifications to prevent retrospective study optimization.

CLASS DESCRIPTION
HypothesisSpec

Specifies the hypothesis under test and binds it to a primary metric.

ExperimentDesignResult

Class to hold, format, and present experiment design and statistical power analysis results.

PreregistrationCard

Represents an immutable spec card registered prior to running the experiment.

FUNCTION DESCRIPTION
estimate_duration_days

Estimates the required experiment run duration in days.

design_experiment

Computes the required sample size and duration for an experiment based on design constraints.

generate_power_curve_data

Generates sample size coordinates across a range of relative MDE values.

HypothesisSpec

HypothesisSpec(
    primary_metric: BaseMetric,
    direction: Literal[
        "two-sided", "greater", "less"
    ] = "two-sided",
    description: str = "",
)

Specifies the hypothesis under test and binds it to a primary metric.

This class serves as the formal pre-registered scientific statement of the experiment's goal, preventing retrospective hypothesis adjustments after seeing results (HARKing). It defines the directional expectation of the treatment effect.

Mathematical Specifications

Let \(\theta_C\) and \(\theta_T\) represent the true population parameter (e.g., mean, rate, or ratio) of the control and treatment groups, respectively, for the bound primary_metric.

  1. "two-sided" (Default): Tests for any difference between arms, representing the standard industrial default. $$ H_0: \theta_T = \theta_C \quad \text{vs.} \quad H_1: \theta_T \neq \theta_C $$
  2. "greater" (One-sided upper-tailed): Tests whether the treatment parameter is strictly greater than control. $$ H_0: \theta_T \le \theta_C \quad \text{vs.} \quad H_1: \theta_T > \theta_C $$
  3. "less" (One-sided lower-tailed): Tests whether the treatment parameter is strictly less than control (typically used to detect degradations in metrics where higher is better, or to confirm reductions in metrics such as latency). $$ H_0: \theta_T \ge \theta_C \quad \text{vs.} \quad H_1: \theta_T < \theta_C $$

Attributes:

  - primary_metric (BaseMetric): The registered metric used as the primary outcome variable for evaluating this hypothesis.
  - direction (Literal["two-sided", "greater", "less"]): The direction of the statistical test. Defaults to "two-sided".
  - description (str): Text describing the business or scientific hypothesis in natural language.
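To make the three directions concrete, here is a minimal sketch of how each one could translate into a p-value under a normal approximation. The helper `p_value` is hypothetical and is not part of xpyrment; it assumes a z statistic computed as treatment minus control.

```python
from statistics import NormalDist


def p_value(z: float, direction: str = "two-sided") -> float:
    """Normal-approximation p-value for a z statistic (treatment minus control).

    Hypothetical helper for illustration; not an xpyrment function.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if direction == "two-sided":
        # Evidence in either tail counts against H0.
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    if direction == "greater":
        # Only large positive z supports H1: theta_T > theta_C.
        return 1.0 - std_normal.cdf(z)
    if direction == "less":
        # Only large negative z supports H1: theta_T < theta_C.
        return std_normal.cdf(z)
    raise ValueError("direction must be 'two-sided', 'greater', or 'less'.")


z = 1.5
two_sided = p_value(z, "two-sided")
upper = p_value(z, "greater")
lower = p_value(z, "less")
```

For a positive z, the two-sided p-value is twice the "greater" p-value, which is why a one-sided test should be committed to before results are seen.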

Examples:

Example
>>> from xpyrment.metrics.taxonomy import ProportionMetric
>>> from xpyrment.plan.hypothesis import HypothesisSpec
>>> conv_metric = ProportionMetric("Conversion Rate", value_col="converted")
>>> spec = HypothesisSpec(
...     primary_metric=conv_metric,
...     direction="greater",
...     description="Redesigned checkout button increases conversion rates."
... )
>>> spec.direction
'greater'
PARAMETER DESCRIPTION
primary_metric

The evaluation metric being tested.

TYPE: BaseMetric

direction

The statistical test directionality. Options are "two-sided", "greater", or "less". Defaults to "two-sided".

TYPE: Literal['two-sided', 'greater', 'less'] DEFAULT: 'two-sided'

description

A descriptive summary of the business hypothesis. Defaults to "".

TYPE: str DEFAULT: ''

Source code in src\xpyrment\plan\hypothesis.py
def __init__(
    self,
    primary_metric: BaseMetric,
    direction: Literal["two-sided", "greater", "less"] = "two-sided",
    description: str = "",
):
    """Initializes a new HypothesisSpec.

    Args:
        primary_metric (BaseMetric): The evaluation metric being tested.
        direction (Literal["two-sided", "greater", "less"]): The statistical test directionality.
            Options are `"two-sided"`, `"greater"`, or `"less"`. Defaults to `"two-sided"`.
        description (str): A descriptive summary of the business hypothesis. Defaults to "".
    """
    self.primary_metric = primary_metric
    self.direction = direction
    self.description = description

ExperimentDesignResult

ExperimentDesignResult(details: Dict[str, Any])

Class to hold, format, and present experiment design and statistical power analysis results.

Provides text-based representations and structured summaries of design outputs to help experimenters evaluate sizing requirements, potential CUPED savings, and run durations.

ATTRIBUTE DESCRIPTION
details

Dictionary containing raw parameter values and sizing outputs from the power analysis engine.

TYPE: Dict[str, Any]

PARAMETER DESCRIPTION
details

Dictionary of design details from design_experiment.

TYPE: Dict[str, Any]

METHOD DESCRIPTION
summary

Compiles a structured summary of the experiment design parameters.

__repr__

Generates a formatted plain-text summary of the experiment design parameters.

Source code in src\xpyrment\plan\power.py
def __init__(self, details: Dict[str, Any]):
    """Initializes an ExperimentDesignResult wrapper.

    Args:
        details (Dict[str, Any]): Dictionary of design details from `design_experiment`.
    """
    self.details = details

summary

summary() -> Dict[str, list]

Compiles a structured summary of the experiment design parameters.

Formats raw numbers into clear, readable text elements (e.g., currency, percentages, and human-readable sample sizes with digit grouping).

RETURNS DESCRIPTION
Dict[str, list]

A dictionary with two parallel lists: "Parameter" (row labels) and "Value" (the corresponding formatted value strings).
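The shape of the returned dictionary can be illustrated with a hand-built stand-in. The `details` values below are made up for illustration, and only three of the rows are shown; the format specifiers mirror those in the source listing for `summary`.

```python
import math

# Illustrative stand-in for ExperimentDesignResult.details (values are made up).
details = {"alpha": 0.05, "power": 0.8, "sample_size_per_variant": 1569.78}

# Two parallel lists, matching the layout that summary() builds.
summary = {
    "Parameter": [
        "Significance Level (Alpha)",
        "Statistical Power (1-Beta)",
        "Sample Size Per Variant",
    ],
    "Value": [
        f"{details['alpha']:.2%}",                             # percentage, 2 decimals
        f"{details['power']:.2%}",
        f"{math.ceil(details['sample_size_per_variant']):,}",  # rounded up, digit grouping
    ],
}

# Pair labels with values, e.g. to print rows or build a table.
rows = list(zip(summary["Parameter"], summary["Value"]))
for label, value in rows:
    print(f"{label:<30}: {value}")
```

Because the two lists are parallel, the result can be passed directly to a tabular constructor that accepts a dict of columns.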

Source code in src\xpyrment\plan\power.py
def summary(self) -> Dict[str, list]:
    """Compiles a structured summary of the experiment design parameters.

    Formats raw numbers into clear, readable text elements (e.g., currency, percentages,
    and human-readable sample sizes with digit grouping).

    Returns:
        Dict[str, list]: A dictionary mapping parameter names to formatted value strings.
    """
    summary = {
        "Parameter": [
            "Metric Type",
            "Baseline Value",
            "Target MDE (Absolute)",
            "Target MDE (Relative)",
            "Significance Level (Alpha)",
            "Statistical Power (1-Beta)",
            "Sample Size Per Variant",
            "Total Sample Size Required",
        ],
        "Value": [
            self.details["metric_type"].capitalize(),
            f"{self.details['baseline_value']:.4f}",
            f"{self.details['mde_absolute']:.4f}",
            f"{self.details['mde_relative']:.2%}",
            f"{self.details['alpha']:.2%}",
            f"{self.details['power']:.2%}",
            f"{int(np.ceil(self.details['sample_size_per_variant'])):,}",
            f"{int(np.ceil(self.details['total_sample_size'])):,}",
        ],
    }

    if self.details.get("pre_post_correlation"):
        corr = self.details["pre_post_correlation"]
        reduced_size = self.details["cuped_sample_size_per_variant"]
        summary["Parameter"].extend([
            "Pre-Post Correlation",
            "CUPED Sample Size Per Variant",
            "CUPED Total Sample Size",
            "CUPED Sample Size Savings",
        ])
        summary["Value"].extend([
            f"{corr:.2f}",
            f"{int(np.ceil(reduced_size)):,}",
            f"{int(np.ceil(reduced_size * 2)):,}",
            f"{self.details['cuped_savings']:.1%}",
        ])

    if self.details.get("daily_traffic"):
        summary["Parameter"].extend([
            "Daily Traffic",
            "Estimated Duration (Standard)",
        ])
        summary["Value"].extend([
            f"{int(self.details['daily_traffic']):,}/day",
            f"{self.details['duration_days_standard']:.1f} days",
        ])
        if self.details.get("pre_post_correlation"):
            summary["Parameter"].append("Estimated Duration (CUPED)")
            summary["Value"].append(f"{self.details['duration_days_cuped']:.1f} days")

    return summary

__repr__

__repr__() -> str

Generates a formatted plain-text summary of the experiment design parameters.

Source code in src\xpyrment\plan\power.py
def __repr__(self) -> str:
    """Generates an aesthetic text block summary of the experiment design parameters."""
    s = "=========================================\n"
    s += "       Experiment Design Summary        \n"
    s += "=========================================\n"
    summary_dict = self.summary()
    for param, val in zip(summary_dict["Parameter"], summary_dict["Value"]):
        s += f"{param:<30}: {val}\n"
    s += "=========================================\n"
    return s

PreregistrationCard

PreregistrationCard(
    experiment_id: str, spec: Dict[str, Any]
)

Represents an immutable spec card registered prior to running the experiment.

A PreregistrationCard bundles the complete configurations of an experiment (such as primary and secondary metrics, target statistical power, significance thresholds, and sample sizes) and secures them against post-hoc tampering by computing a SHA-256 signature via the ExperimentRegistry.

Why Pre-registration Matters

In both clinical trials and industrial A/B testing, a common failure mode is post-hoc tampering (p-hacking, retrospective metric selection, or adjusting significance thresholds mid-run). A pre-registered card acts as an immutable ledger. Before final reports are compiled, the system validates the active analysis configuration against this card's signature.
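The tamper-detection idea can be sketched with the standard library. This is an assumption-laden illustration: how `ExperimentRegistry` actually serializes the spec before hashing is not shown on this page, so the canonical-JSON step and the helper name `spec_signature` are hypothetical, and the resulting digests need not match the signatures that xpyrment produces.

```python
import hashlib
import json
from typing import Any, Dict


def spec_signature(spec: Dict[str, Any]) -> str:
    """SHA-256 hex digest of a canonical JSON rendering of `spec`.

    Hypothetical helper; the real registry may serialize differently.
    """
    # sort_keys makes the digest independent of dict insertion order.
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


registered = spec_signature({"metric": "conversion_rate", "alpha": 0.05, "target_n": 10000})
reordered = spec_signature({"target_n": 10000, "alpha": 0.05, "metric": "conversion_rate"})
tampered = spec_signature({"metric": "conversion_rate", "alpha": 0.10, "target_n": 10000})

print(registered == reordered)  # same content, different key order
print(registered == tampered)   # alpha changed after registration
```

Canonicalizing before hashing matters: without a fixed key order, two semantically identical specs could yield different signatures.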

ATTRIBUTE DESCRIPTION
experiment_id

Unique tracking identifier for the experiment.

TYPE: str

spec

Structural dictionary outlining the planned experimental parameters.

TYPE: Dict[str, Any]

hash_signature

Cryptographic SHA-256 hash representing the serialized spec dictionary.

TYPE: str

Examples:

Example
>>> spec = {"metric": "conversion_rate", "alpha": 0.05, "target_n": 10000}
>>> card = PreregistrationCard("EXP-999", spec)
>>> card.hash_signature[:8]
'88e6e885'
>>> card.verify({"metric": "conversion_rate", "alpha": 0.05, "target_n": 10000})
True
>>> card.verify({"metric": "conversion_rate", "alpha": 0.10, "target_n": 10000}) # alpha altered!
False
PARAMETER DESCRIPTION
experiment_id

Unique tracking identifier for the experiment.

TYPE: str

spec

Dictionary containing complete design parameters to be registered.

TYPE: Dict[str, Any]

METHOD DESCRIPTION
verify

Verifies if the current spec matches the registered immutable signature.

to_json

Serializes the preregistration card parameters and cryptographic signature to JSON.

Source code in src\xpyrment\plan\preregistration.py
def __init__(self, experiment_id: str, spec: Dict[str, Any]):
    """Initializes a new PreregistrationCard and registers its specification.

    Args:
        experiment_id (str): Unique tracking identifier for the experiment.
        spec (Dict[str, Any]): Dictionary containing complete design parameters
            to be registered.
    """
    self.experiment_id = experiment_id
    self.spec = spec
    self._registry = ExperimentRegistry()
    self.hash_signature = self._registry.register_spec(experiment_id, spec)

verify

verify(current_spec: Dict[str, Any]) -> bool

Verifies if the current spec matches the registered immutable signature.

Compares the provided specification dictionary against the original registered spec.

PARAMETER DESCRIPTION
current_spec

The operational parameters dictionary currently under execution.

TYPE: Dict[str, Any]

RETURNS DESCRIPTION
bool

True if the current specification matches the pre-registered specification exactly, False if there is any mismatch or if the card was not properly recorded.

TYPE: bool

Source code in src\xpyrment\plan\preregistration.py
def verify(self, current_spec: Dict[str, Any]) -> bool:
    """Verifies if the current spec matches the registered immutable signature.

    Compares the provided specification dictionary against the original registered spec.

    Args:
        current_spec (Dict[str, Any]): The operational parameters dictionary currently under execution.

    Returns:
        bool: True if the current specification matches the pre-registered specification exactly,
            False if there is any mismatch or if the card was not properly recorded.
    """
    return self._registry.verify_spec(self.experiment_id, current_spec)

to_json

to_json() -> str

Serializes the preregistration card parameters and cryptographic signature to JSON.

RETURNS DESCRIPTION
str

Indented, human-readable JSON string representing the card metadata.

TYPE: str

Source code in src\xpyrment\plan\preregistration.py
def to_json(self) -> str:
    """Serializes the preregistration card parameters and cryptographic signature to JSON.

    Returns:
        str: Indented, human-readable JSON string representing the card metadata.
    """
    return json.dumps({
        "experiment_id": self.experiment_id,
        "spec": self.spec,
        "signature": self.hash_signature
    }, indent=2)

estimate_duration_days

estimate_duration_days(
    required_sample_size: Real, daily_traffic: Real
) -> float

Estimates the required experiment run duration in days.

Translates the calculated target sample size (\(N_{\text{required}}\)) into the estimated calendar days needed to accumulate that sample volume based on active daily traffic (\(T_{\text{daily}}\)).

Mathematical Model

The duration in days (\(D\)) is computed as: $$ D = \frac{N_{\text{required}}}{T_{\text{daily}}} $$ where \(N_{\text{required}}\) represents the combined total sample size across all active arms (control + treatment arms) or the single-arm requirement multiplied by the number of arms.
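As a worked sketch of the model above, the helper below starts from a per-arm requirement and multiplies by the number of arms before dividing by traffic. The function `days_to_run` and its optional rounding up to whole weeks are assumptions added for illustration (rounding to full weeks is a common practice to cover weekly seasonality, not something this module is documented to do).

```python
import math


def days_to_run(
    sample_size_per_arm: float,
    n_arms: int,
    daily_traffic: float,
    round_to_weeks: bool = False,
) -> float:
    """Hypothetical helper: D = (n_per_arm * arms) / T_daily."""
    total_required = sample_size_per_arm * n_arms  # N_required across all arms
    days = total_required / daily_traffic
    if round_to_weeks:
        # Assumption: run whole weeks so every weekday is equally represented.
        return float(math.ceil(days / 7) * 7)
    return days


exact_days = days_to_run(25_000, n_arms=2, daily_traffic=5_000)
whole_weeks = days_to_run(25_000, n_arms=2, daily_traffic=5_000, round_to_weeks=True)
```

Here 25,000 units per arm across two arms gives the same 50,000-unit total used in the example for `estimate_duration_days`.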

PARAMETER DESCRIPTION
required_sample_size

The total sample size needed across all arms combined (e.g., control \(n\) + treatment \(n\)). Must be greater than zero.

TYPE: Real

daily_traffic

The expected number of unique qualifying experimental units (e.g., users, sessions, or pageviews) entering the experiment pipeline per day. Must be greater than zero.

TYPE: Real

RETURNS DESCRIPTION
float

Estimated run duration in decimal calendar days.

TYPE: float

RAISES DESCRIPTION
TypeError

If required_sample_size or daily_traffic is not a real number.

ValueError

If required_sample_size or daily_traffic is less than or equal to zero.
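The type check in the source listing relies on `isinstance(value, numbers.Real)`. The short sketch below, which uses only the standard library, shows which common Python values pass that check; note in particular that `bool` passes and `decimal.Decimal` does not.

```python
import numbers
from decimal import Decimal
from fractions import Fraction

# Candidate inputs a caller might pass as required_sample_size or daily_traffic.
candidates = {
    "int": 50_000,
    "float": 5e4,
    "Fraction": Fraction(1, 2),
    "bool": True,            # bool is a subclass of int, so it counts as Real
    "Decimal": Decimal("5"),  # Decimal is a Number but not registered as Real
    "str": "5",
    "complex": 5 + 0j,
}

# True means the value would get past the TypeError guard.
accepted = {name: isinstance(value, numbers.Real) for name, value in candidates.items()}
```

Callers holding `Decimal` values would therefore need to convert them (for example with `float(...)`) before calling the function.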

Examples:

Example
>>> estimate_duration_days(required_sample_size=50000, daily_traffic=5000)
10.0
Source code in src\xpyrment\plan\duration.py
def estimate_duration_days(required_sample_size: numbers.Real, daily_traffic: numbers.Real) -> float:
    r"""Estimates the required experiment run duration in days.

    Translates the calculated target sample size ($N_{\text{required}}$) into the estimated calendar days
    needed to accumulate that sample volume based on active daily traffic ($T_{\text{daily}}$).

    ??? mathbox "Mathematical Model"

        The duration in days ($D$) is computed as:
        $$
        D = \frac{N_{\text{required}}}{T_{\text{daily}}}
        $$
        where $N_{\text{required}}$ represents the combined total sample size across all active arms
        (control + treatment arms) or the single-arm requirement multiplied by the number of arms.

    Args:
        required_sample_size (numbers.Real): The total sample size needed across all arms combined
            (e.g., control $n$ + treatment $n$). Must be greater than zero.
        daily_traffic (numbers.Real): The expected number of unique qualifying experimental units (e.g., users,
            sessions, or pageviews) entering the experiment pipeline per day. Must be greater than zero.

    Returns:
        float: Estimated run duration in decimal calendar days.

    Raises:
        TypeError: If `required_sample_size` or `daily_traffic` is not a real number.
        ValueError: If `required_sample_size` or `daily_traffic` is less than or equal to zero.

    Examples:
        ??? example "Example"

            ```python
            >>> estimate_duration_days(required_sample_size=50000, daily_traffic=5000)
            10.0
            ```
    """
    if not isinstance(required_sample_size, numbers.Real):
        raise TypeError("required_sample_size must be a real number.")
    if not isinstance(daily_traffic, numbers.Real):
        raise TypeError("daily_traffic must be a real number.")

    if required_sample_size <= 0:
        raise ValueError("required_sample_size must be greater than zero.")
    if daily_traffic <= 0:
        raise ValueError("daily_traffic must be greater than zero.")
    return required_sample_size / daily_traffic

design_experiment

design_experiment(
    metric_type: str,
    baseline_value: float,
    standard_deviation: Optional[float] = None,
    mde: float = 0.05,
    mde_type: str = "relative",
    alpha: float = 0.05,
    power: float = 0.8,
    pre_post_correlation: Optional[float] = None,
    daily_traffic: Optional[int] = None,
) -> ExperimentDesignResult

Computes the required sample size and duration for an experiment based on design constraints.

This function performs an a priori power analysis to determine required sample sizes. It supports mean, proportion, and ratio metrics, applies an optional pre-post correlation to estimate CUPED-adjusted sizes, and converts the total sample size into a run duration when daily traffic is supplied. As the source listing shows, sizing assumes a two-sided test and two equally sized variants.
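Reading the source listing for this function, the calculation appears to follow the usual two-sample normal-approximation formula. The symbols \(\delta\) (absolute MDE) and \(\sigma^2\) (metric variance) are notation introduced here for illustration; for proportions, \(\sigma^2 = p(1-p)\) with \(p\) the baseline value, and otherwise \(\sigma^2\) is the square of standard_deviation.

$$
n_{\text{variant}} = \frac{2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \sigma^2}{\delta^2},
\qquad
n_{\text{total}} = 2\, n_{\text{variant}}
$$

When pre_post_correlation (\(\rho\)) is supplied, the CUPED-adjusted size and the reported savings are:

$$
n_{\text{CUPED}} = n_{\text{variant}}\,(1 - \rho^2),
\qquad
\text{savings} = \rho^2
$$

When daily_traffic (\(T_{\text{daily}}\)) is supplied, the duration is \(D = n_{\text{total}} / T_{\text{daily}}\), with \(n_{\text{CUPED}}\) replacing \(n_{\text{variant}}\) for the CUPED estimate.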

PARAMETER DESCRIPTION
metric_type

The statistical distribution type. Options are 'mean', 'proportion', or 'ratio'.

TYPE: str

baseline_value

The current historical control group value (mean or rate).

TYPE: float

standard_deviation

The historical standard deviation of the metric. Required for 'mean' and 'ratio' metric types. Ignored for 'proportion'.

TYPE: Optional[float] DEFAULT: None

mde

The target Minimum Detectable Effect. Expressed as a fraction of baseline for "relative" (e.g., 0.05 is 5%) or directly as a raw difference for "absolute". Defaults to 0.05.

TYPE: float DEFAULT: 0.05

mde_type

Dictates how mde is interpreted. Options are 'relative' or 'absolute'. Defaults to 'relative'.

TYPE: str DEFAULT: 'relative'

alpha

The probability of a Type I error (significance level, e.g., 0.05 for 95% confidence). Defaults to 0.05.

TYPE: float DEFAULT: 0.05

power

The desired statistical power (\(1 - \beta\), e.g., 0.80 to capture true effects 80% of the time). Defaults to 0.80.

TYPE: float DEFAULT: 0.8

pre_post_correlation

The correlation coefficient (\(\rho\)) between baseline pre-period values and active experiment-period values. If provided, calculates CUPED-deflated sizing. Defaults to None.

TYPE: Optional[float] DEFAULT: None

daily_traffic

Expected daily volume of unique units entering the experiment. If provided, calculates duration. Defaults to None.

TYPE: Optional[int] DEFAULT: None

RETURNS DESCRIPTION
ExperimentDesignResult

A wrapper object containing formatted parameters and sample sizing calculations.

TYPE: ExperimentDesignResult

RAISES DESCRIPTION
ValueError

If statistical inputs are out of logical bounds (e.g., non-positive traffic, proportion baseline not in \((0, 1)\), or correlation not in \([-1, 1]\)).

ValueError

If standard deviation is missing for mean/ratio metrics.
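The sizing arithmetic can be reproduced without SciPy as a rough cross-check. The sketch below is a standalone re-derivation based on the source listing, not the library's implementation; `per_variant_sample_size` is a hypothetical name, and `statistics.NormalDist` stands in for `scipy.stats.norm`.

```python
from statistics import NormalDist
from typing import Optional


def per_variant_sample_size(
    variance: float,
    mde_absolute: float,
    alpha: float = 0.05,
    power: float = 0.80,
    pre_post_correlation: Optional[float] = None,
) -> float:
    """Hypothetical re-derivation: n = 2 (z_{1-alpha/2} + z_{power})^2 * var / mde^2."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * variance / mde_absolute**2
    if pre_post_correlation is not None:
        # CUPED deflates the variance, and hence n, by (1 - rho^2).
        n *= 1.0 - pre_post_correlation**2
    return n


# Mean metric: baseline 100, standard deviation 20, relative MDE of 2% (absolute 2.0).
n_plain = per_variant_sample_size(variance=20.0**2, mde_absolute=100.0 * 0.02)
n_cuped = per_variant_sample_size(
    variance=20.0**2, mde_absolute=100.0 * 0.02, pre_post_correlation=0.5
)
```

With a pre-post correlation of 0.5, the CUPED-adjusted requirement is 75% of the unadjusted one, which corresponds to the 25% value stored under `cuped_savings`.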

Examples:

Example
>>> # Planning a conversion rate proportion test (10% baseline, relative MDE of 5%)
>>> result = design_experiment(
...     metric_type="proportion",
...     baseline_value=0.10,
...     mde=0.05,
...     mde_type="relative"
... )
>>> int(result.details["sample_size_per_variant"])
56511
Source code in src\xpyrment\plan\power.py
def design_experiment(
    metric_type: str,
    baseline_value: float,
    standard_deviation: Optional[float] = None,
    mde: float = 0.05,
    mde_type: str = "relative",
    alpha: float = 0.05,
    power: float = 0.80,
    pre_post_correlation: Optional[float] = None,
    daily_traffic: Optional[int] = None,
) -> ExperimentDesignResult:
    r"""Computes the required sample size and duration for an experiment based on design constraints.

    This function performs rigorous a priori power analysis to determine required sample sizes.
    It supports continuous means, proportions, and ratio metrics, integrates pre-post correlation
    for CUPED calculation, and maps sizes to daily traffic to compute duration.

    Args:
        metric_type (str): The statistical distribution type. Options are `'mean'`,
            `'proportion'`, or `'ratio'`.
        baseline_value (float): The current historical control group value (mean or rate).
        standard_deviation (Optional[float]): The historical standard deviation of the metric.
            Required for `'mean'` and `'ratio'` metric types. Ignored for `'proportion'`.
        mde (float): The target Minimum Detectable Effect. Expressed as a fraction of baseline for
            `"relative"` (e.g., `0.05` is 5%) or directly as a raw difference for `"absolute"`.
            Defaults to 0.05.
        mde_type (str): Dictates how `mde` is interpreted. Options are `'relative'` or `'absolute'`.
            Defaults to `'relative'`.
        alpha (float): The probability of a Type I error (significance level, e.g., 0.05 for 95% confidence).
            Defaults to 0.05.
        power (float): The desired statistical power ($1 - \beta$, e.g., 0.80 to capture true effects 80% of the time).
            Defaults to 0.80.
        pre_post_correlation (Optional[float]): The correlation coefficient ($\rho$) between baseline pre-period
            values and active experiment-period values. If provided, calculates CUPED-deflated sizing.
            Defaults to None.
        daily_traffic (Optional[int]): Expected daily volume of unique units entering the experiment. If provided,
            calculates duration. Defaults to None.

    Returns:
        ExperimentDesignResult: A wrapper object containing formatted parameters and sample sizing calculations.

    Raises:
        ValueError: If statistical inputs are out of logical bounds (e.g., negative traffic, proportion baseline
            not in $(0, 1)$, or correlation not in $[-1, 1]$).
        ValueError: If standard deviation is missing for mean/ratio metrics.

    Examples:
        ??? example "Example"

            ```python
            >>> # Planning a conversion rate proportion test (10% baseline, relative MDE of 5%)
            >>> result = design_experiment(
            ...     metric_type="proportion",
            ...     baseline_value=0.10,
            ...     mde=0.05,
            ...     mde_type="relative"
            ... )
            >>> int(result.details["sample_size_per_variant"])
            56511
            ```
    """
    metric_type = metric_type.lower()
    mde_type = mde_type.lower()

    if metric_type not in ["mean", "proportion", "ratio"]:
        raise ValueError("metric_type must be one of: 'mean', 'proportion', 'ratio'.")
    if mde_type not in ["relative", "absolute"]:
        raise ValueError("mde_type must be 'relative' or 'absolute'.")

    if mde_type == "relative":
        mde_absolute = baseline_value * mde
        mde_relative = mde
    else:
        mde_absolute = mde
        mde_relative = mde / baseline_value if baseline_value != 0 else 0.0

    if metric_type == "proportion":
        if baseline_value <= 0 or baseline_value >= 1:
            raise ValueError("For proportions, baseline_value must be strictly between 0 and 1.")
        variance = baseline_value * (1 - baseline_value)
    else:
        if standard_deviation is None:
            raise ValueError(f"standard_deviation is required for metric type '{metric_type}'.")
        variance = standard_deviation**2

    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)

    factor = 2 * (z_alpha + z_beta) ** 2
    sample_size = factor * variance / (mde_absolute**2)

    details = {
        "metric_type": metric_type,
        "baseline_value": baseline_value,
        "standard_deviation": standard_deviation if metric_type != "proportion" else np.sqrt(variance),
        "mde_absolute": mde_absolute,
        "mde_relative": mde_relative,
        "alpha": alpha,
        "power": power,
        "sample_size_per_variant": sample_size,
        "total_sample_size": sample_size * 2,
    }

    if pre_post_correlation is not None:
        if not (-1.0 <= pre_post_correlation <= 1.0):
            raise ValueError("pre_post_correlation must be between -1.0 and 1.0.")

        vr_factor = 1.0 - (pre_post_correlation**2)
        cuped_sample_size = sample_size * vr_factor

        details["pre_post_correlation"] = pre_post_correlation
        details["cuped_sample_size_per_variant"] = cuped_sample_size
        details["cuped_savings"] = 1.0 - vr_factor

    if daily_traffic is not None:
        if daily_traffic <= 0:
            raise ValueError("daily_traffic must be positive.")
        details["daily_traffic"] = daily_traffic
        details["duration_days_standard"] = (sample_size * 2) / daily_traffic
        if pre_post_correlation is not None:
            details["duration_days_cuped"] = (cuped_sample_size * 2) / daily_traffic

    return ExperimentDesignResult(details)

generate_power_curve_data

generate_power_curve_data(
    metric_type: str,
    baseline_value: float,
    standard_deviation: Optional[float] = None,
    alpha: float = 0.05,
    power: float = 0.8,
    mde_range: Optional[ndarray] = None,
    pre_post_correlation: Optional[float] = None,
) -> Dict[str, ndarray]

Generates sample size coordinates across a range of relative MDE values.

This function calculates the required sample size at each point in a range of candidate MDEs, allowing downstream reporting tools to plot an interactive or static "power curve" graph (sample size vs. effect size).

PARAMETER DESCRIPTION
metric_type

Metric type ('mean', 'proportion', 'ratio').

TYPE: str

baseline_value

Historical control average value.

TYPE: float

standard_deviation

Historical metric standard deviation. Required for the 'mean' and 'ratio' metric types.

TYPE: Optional[float] DEFAULT: None

alpha

Significance level. Defaults to 0.05.

TYPE: float DEFAULT: 0.05

power

Desired statistical power. Defaults to 0.80.

TYPE: float DEFAULT: 0.8

mde_range

Array of relative MDE points to evaluate. If not provided, evaluates 50 linear coordinates in \([0.01, 0.15]\). Defaults to None.

TYPE: Optional[ndarray] DEFAULT: None

pre_post_correlation

Pre-post correlation coefficient for CUPED-adjusted curve. Defaults to None.

TYPE: Optional[float] DEFAULT: None

RETURNS DESCRIPTION
Dict[str, ndarray]

Dict[str, np.ndarray]: Dictionary mapping coordinate names to numpy arrays of results. Contains keys 'mde_relative' and 'sample_size_per_variant', and optionally 'cuped_sample_size_per_variant'.
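The shape of the resulting curve can be previewed with a dependency-free sketch. It re-implements the proportion case from the source listing under a hypothetical name (`proportion_sample_size`) and builds the same default grid of 50 evenly spaced relative MDEs between 0.01 and 0.15 in plain Python instead of NumPy.

```python
from statistics import NormalDist


def proportion_sample_size(
    baseline: float, mde_relative: float, alpha: float = 0.05, power: float = 0.80
) -> float:
    """Hypothetical stand-in for the per-variant sizing of a proportion metric."""
    std_normal = NormalDist()
    z_sum = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    variance = baseline * (1 - baseline)
    mde_absolute = baseline * mde_relative
    return 2 * z_sum**2 * variance / mde_absolute**2


# 50 evenly spaced relative MDEs from 0.01 to 0.15 (mirrors the default grid).
mde_grid = [0.01 + i * (0.15 - 0.01) / 49 for i in range(50)]
curve = [(mde, proportion_sample_size(0.10, mde)) for mde in mde_grid]

# Required size scales with 1 / MDE^2: halving the MDE quadruples the sample.
ratio = proportion_sample_size(0.10, 0.05) / proportion_sample_size(0.10, 0.10)
```

The inverse-square relationship is what gives the power curve its steep left-hand side, so small reductions in the target MDE can be very expensive in traffic.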

Source code in src\xpyrment\plan\power.py
def generate_power_curve_data(
    metric_type: str,
    baseline_value: float,
    standard_deviation: Optional[float] = None,
    alpha: float = 0.05,
    power: float = 0.80,
    mde_range: Optional[np.ndarray] = None,
    pre_post_correlation: Optional[float] = None,
) -> Dict[str, np.ndarray]:
    """Generates sample size coordinates across a range of relative MDE values.

    This function calculates required sizing across a coordinate spectrum of possible MDEs,
    allowing downstream reporting tools to plot an interactive or static "power curve"
    graph (sample size vs. effect size).

    Args:
        metric_type (str): Metric type ('mean', 'proportion', 'ratio').
        baseline_value (float): Historical control average value.
        standard_deviation (Optional[float]): Historical metric standard deviation. Required for the 'mean' and 'ratio' metric types.
        alpha (float): Significance level. Defaults to 0.05.
        power (float): Desired statistical power. Defaults to 0.80.
        mde_range (Optional[np.ndarray]): Array of relative MDE points to evaluate. If not provided,
            evaluates 50 linear coordinates in $[0.01, 0.15]$. Defaults to None.
        pre_post_correlation (Optional[float]): Pre-post correlation coefficient for CUPED-adjusted curve.
            Defaults to None.

    Returns:
        Dict[str, np.ndarray]: Dictionary mapping coordinate names to numpy arrays of results.
            Contains keys `'mde_relative'` and `'sample_size_per_variant'`, and optionally
            `'cuped_sample_size_per_variant'`.
    """
    if mde_range is None:
        mde_range = np.linspace(0.01, 0.15, 50)

    sample_sizes = []
    cuped_sample_sizes = []

    for mde_val in mde_range:
        res = design_experiment(
            metric_type=metric_type,
            baseline_value=baseline_value,
            standard_deviation=standard_deviation,
            mde=mde_val,
            mde_type="relative",
            alpha=alpha,
            power=power,
            pre_post_correlation=pre_post_correlation,
        )
        sample_sizes.append(res.details["sample_size_per_variant"])
        if pre_post_correlation is not None:
            cuped_sample_sizes.append(res.details["cuped_sample_size_per_variant"])

    ret = {
        "mde_relative": mde_range,
        "sample_size_per_variant": np.array(sample_sizes),
    }

    if pre_post_correlation is not None:
        ret["cuped_sample_size_per_variant"] = np.array(cuped_sample_sizes)

    return ret