Plan Module
The xpyrment.plan module contains the submodules and components used in the planning stage of an experiment.
plan
Experiment planning, power analysis, duration estimation, and preregistration.
This package houses components dedicated to the planning stage of the experimental lifecycle, helping experimenters define scientific or business hypotheses, calculate required sample sizes and run durations, and lock plans via cryptographic preregistration.
Submodules:
- hypothesis: Forms HypothesisSpec containers mapping outcomes to statistical directions.
- power: Handles a priori statistical power calculations and sample sizing.
- duration: Maps required sample sizes to calendar run durations.
- preregistration: Issues immutable PreregistrationCards to protect analysis integrity.
| MODULE | DESCRIPTION |
|---|---|
| duration | Duration estimation utilities based on statistical requirements and traffic pipelines. |
| hypothesis | Hypothesis specification layer binding metrics to targeted statistical directions. |
| power | Power analysis and sample-size planning calculators. |
| preregistration | Preregistration specifications to prevent retrospective study optimization. |
| CLASS | DESCRIPTION |
|---|---|
| HypothesisSpec | Specifies the hypothesis under test and binds it to a primary metric. |
| ExperimentDesignResult | Class to hold, format, and present experiment design and statistical power analysis results. |
| PreregistrationCard | Represents an immutable spec card registered prior to running the experiment. |
| FUNCTION | DESCRIPTION |
|---|---|
| estimate_duration_days | Estimates the required experiment run duration in days. |
| design_experiment | Computes the required sample size and duration for an experiment based on design constraints. |
| generate_power_curve_data | Generates sample size coordinates across a range of relative MDE values. |
HypothesisSpec
HypothesisSpec(
primary_metric: BaseMetric,
direction: Literal[
"two-sided", "greater", "less"
] = "two-sided",
description: str = "",
)
Specifies the hypothesis under test and binds it to a primary metric.
This class serves as the formal pre-registered scientific statement of the experiment's goal, preventing retrospective hypothesis adjustments after seeing results (HARKing). It defines the directional expectation of the treatment effect.
Mathematical Specifications
Let \(\theta_C\) and \(\theta_T\) represent the true population parameter (e.g., mean, rate, or ratio)
of the control and treatment groups, respectively, for the bound primary_metric.
- "two-sided" (default): Tests for any difference between arms, representing the standard industrial default.
  $$ H_0: \theta_T = \theta_C \quad \text{vs.} \quad H_1: \theta_T \neq \theta_C $$
- "greater" (one-sided, upper-tailed): Tests whether the treatment parameter is strictly greater than the control parameter.
  $$ H_0: \theta_T \le \theta_C \quad \text{vs.} \quad H_1: \theta_T > \theta_C $$
- "less" (one-sided, lower-tailed): Tests whether the treatment parameter is strictly less than the control parameter (typically used to detect negative side effects, that is, a decrease in a metric where higher is better).
  $$ H_0: \theta_T \ge \theta_C \quad \text{vs.} \quad H_1: \theta_T < \theta_C $$
Attributes:
- primary_metric (BaseMetric): The registered metric used as the primary outcome variable for evaluating this hypothesis.
- direction (Literal["two-sided", "greater", "less"]): The direction of the statistical test. Defaults to "two-sided".
- description (str): Text describing the business or scientific hypothesis in natural language.
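To make the three values of direction concrete, the following minimal sketch shows how a direction string could select the tail of a z-test. It assumes a z-statistic computed as (treatment estimate minus control estimate) divided by its standard error; p_value_from_z is a hypothetical helper written for illustration and is not part of xpyrment.

```python
from statistics import NormalDist


def p_value_from_z(z: float, direction: str = "two-sided") -> float:
    """Map a z-statistic for (treatment - control) to a p-value.

    Hypothetical helper: illustrates the hypotheses above, not the library's code.
    """
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if direction == "two-sided":
        # H1: theta_T != theta_C, so both tails count as evidence.
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    if direction == "greater":
        # H1: theta_T > theta_C, so only the upper tail counts.
        return 1.0 - std_normal.cdf(z)
    if direction == "less":
        # H1: theta_T < theta_C, so only the lower tail counts.
        return std_normal.cdf(z)
    raise ValueError(f"unknown direction: {direction!r}")


z_stat = 1.96
p_two_sided = p_value_from_z(z_stat, "two-sided")
p_greater = p_value_from_z(z_stat, "greater")
p_less = p_value_from_z(z_stat, "less")
```

For the same positive z-statistic, the "greater" p-value is half the "two-sided" one, while the "less" p-value is close to 1, which is why the direction has to be fixed before the data are seen.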
Examples:
>>> from xpyrment.metrics.taxonomy import ProportionMetric
>>> from xpyrment.plan.hypothesis import HypothesisSpec
>>> conv_metric = ProportionMetric("Conversion Rate", value_col="converted")
>>> spec = HypothesisSpec(
...     primary_metric=conv_metric,
...     direction="greater",
...     description="Redesigned checkout button increases conversion rates."
... )
>>> spec.direction
'greater'
| PARAMETER | DESCRIPTION |
|---|---|
| primary_metric | The evaluation metric being tested. TYPE: BaseMetric. |
| direction | The statistical test directionality. Options are "two-sided", "greater", and "less". TYPE: Literal["two-sided", "greater", "less"]. DEFAULT: "two-sided". |
| description | A descriptive summary of the business hypothesis. Defaults to "". TYPE: str. |
Source code in src\xpyrment\plan\hypothesis.py
ExperimentDesignResult
Class to hold, format, and present experiment design and statistical power analysis results.
Provides text-based representations and structured summaries of design outputs to help experimenters evaluate sizing requirements, potential CUPED savings, and run durations.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| details | Dictionary containing raw parameter values and sizing outputs from the power analysis engine. |
| PARAMETER | DESCRIPTION |
|---|---|
details
|
Dictionary of design details from
TYPE:
|
| METHOD | DESCRIPTION |
|---|---|
| summary | Compiles a structured summary of the experiment design parameters. |
| __repr__ | Generates an aesthetic text block summary of the experiment design parameters. |
Source code in src\xpyrment\plan\power.py
summary
Compiles a structured summary of the experiment design parameters.
Formats raw numbers into clear, readable text elements (e.g., currency, percentages, and human-readable sample sizes with digit grouping).
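As an illustration of the kind of formatting described, the sketch below turns raw numbers into display strings using Python format specifiers. The key names, the two-column output shape, and the helper format_design_summary are all assumptions made for this example; the actual keys and layout produced by summary are not shown in this reference.

```python
from typing import Any, Callable, Dict, List

# Hypothetical per-key formatters; the real summary() may use different keys.
FORMATTERS: Dict[str, Callable[[Any], str]] = {
    "alpha": lambda v: f"{v:.1%}",                # 0.05  -> "5.0%"
    "power": lambda v: f"{v:.0%}",                # 0.8   -> "80%"
    "sample_size_per_arm": lambda v: f"{v:,}",    # 31234 -> "31,234"
    "baseline_value": lambda v: f"${v:,.2f}",     # 12.5  -> "$12.50"
}


def format_design_summary(details: Dict[str, Any]) -> Dict[str, List[str]]:
    """Build a column-oriented summary with human-readable value strings."""
    summary: Dict[str, List[str]] = {"Parameter": [], "Value": []}
    for key, value in details.items():
        formatter = FORMATTERS.get(key, str)  # fall back to plain str()
        summary["Parameter"].append(key)
        summary["Value"].append(formatter(value))
    return summary


example = format_design_summary(
    {"alpha": 0.05, "power": 0.8, "sample_size_per_arm": 31234, "baseline_value": 12.5}
)
```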
| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, list] | A dictionary mapping parameter names to formatted value strings. |
Source code in src\xpyrment\plan\power.py
__repr__
Generates an aesthetic text block summary of the experiment design parameters.
Source code in src\xpyrment\plan\power.py
PreregistrationCard
Represents an immutable spec card registered prior to running the experiment.
A PreregistrationCard bundles the complete configurations of an experiment (such as primary and
secondary metrics, target statistical power, significance thresholds, and sample sizes) and
secures them against post-hoc tampering by computing a SHA-256 signature via the ExperimentRegistry.
Why Pre-registration Matters
In both clinical trials and industrial A/B testing, a common failure mode is post-hoc tampering (p-hacking, retrospective metric selection, or adjusting significance thresholds mid-run). A pre-registered card acts as an immutable ledger. Before final reports are compiled, the system validates the active analysis configuration against this card's signature.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| experiment_id | Unique tracking identifier for the experiment. |
| spec | Structural dictionary outlining the planned experimental parameters. |
| hash_signature | Cryptographic SHA-256 hash representing the serialized spec. |
Examples:
>>> from xpyrment.plan.preregistration import PreregistrationCard
>>> spec = {"metric": "conversion_rate", "alpha": 0.05, "target_n": 10000}
>>> card = PreregistrationCard("EXP-999", spec)
>>> card.hash_signature[:8]
'88e6e885'
>>> card.verify({"metric": "conversion_rate", "alpha": 0.05, "target_n": 10000})
True
>>> card.verify({"metric": "conversion_rate", "alpha": 0.10, "target_n": 10000}) # alpha altered!
False
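The sign-then-verify idea can be sketched with the standard library alone. This is an assumption-laden illustration, not the library's implementation: it assumes the spec is serialized as JSON with sorted keys before hashing, and spec_signature and verify_spec are hypothetical names. Because the real serialization scheme is not documented here, the digests it produces should not be expected to match hash_signature values from PreregistrationCard.

```python
import hashlib
import json


def spec_signature(spec: dict) -> str:
    """Return a SHA-256 hex digest of a canonical JSON form of the spec."""
    # Sorting keys makes the digest independent of dict insertion order.
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_spec(registered_signature: str, current_spec: dict) -> bool:
    """Check that the current spec hashes to the registered signature."""
    return spec_signature(current_spec) == registered_signature


registered = spec_signature({"metric": "conversion_rate", "alpha": 0.05, "target_n": 10000})

# Same content in a different key order still verifies.
same_content = verify_spec(registered, {"alpha": 0.05, "target_n": 10000, "metric": "conversion_rate"})

# Changing alpha after registration is detected.
tampered = verify_spec(registered, {"metric": "conversion_rate", "alpha": 0.10, "target_n": 10000})
```

Canonical serialization matters here: without a fixed key order, two equal dictionaries could serialize differently and fail verification for no substantive reason.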
| PARAMETER | DESCRIPTION |
|---|---|
| experiment_id | Unique tracking identifier for the experiment. |
| spec | Dictionary containing complete design parameters to be registered. |
| METHOD | DESCRIPTION |
|---|---|
| verify | Verifies if the current spec matches the registered immutable signature. |
| to_json | Serializes the preregistration card parameters and cryptographic signature to JSON. |
Source code in src\xpyrment\plan\preregistration.py
verify
Verifies if the current spec matches the registered immutable signature.
Compares the provided specification dictionary against the original registered spec.
| PARAMETER | DESCRIPTION |
|---|---|
| current_spec | The operational parameters dictionary currently under execution. |
| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the current specification matches the pre-registered specification exactly, False if there is any mismatch or if the card was not properly recorded. |
Source code in src\xpyrment\plan\preregistration.py
to_json
Serializes the preregistration card parameters and cryptographic signature to JSON.
| RETURNS | DESCRIPTION |
|---|---|
| str | Indented, human-readable JSON string representing the card metadata. |
Source code in src\xpyrment\plan\preregistration.py
estimate_duration_days
Estimates the required experiment run duration in days.
Translates the calculated target sample size (\(N_{\text{required}}\)) into the estimated calendar days needed to accumulate that sample volume based on active daily traffic (\(T_{\text{daily}}\)).
Mathematical Model
The duration in days (\(D\)) is computed as:
$$ D = \frac{N_{\text{required}}}{T_{\text{daily}}} $$
where \(N_{\text{required}}\) represents the combined total sample size across all active arms (control + treatment arms) or the single-arm requirement multiplied by the number of arms.
| PARAMETER | DESCRIPTION |
|---|---|
| required_sample_size | The total sample size needed across all arms combined (e.g., control \(n\) + treatment \(n\)). Must be greater than zero. |
| daily_traffic | The expected number of unique qualifying experimental units (e.g., users, sessions, or pageviews) entering the experiment pipeline per day. Must be greater than zero. |
| RETURNS | DESCRIPTION |
|---|---|
| float | Estimated run duration in decimal calendar days. |
| RAISES | DESCRIPTION |
|---|---|
TypeError
|
If |
ValueError
|
If |
Examples:
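A minimal sketch of the calculation \(D = N_{\text{required}} / T_{\text{daily}}\), written as a standalone stand-in rather than a call into the library. sketch_duration_days is a hypothetical name, and the exact conditions under which the real function raises TypeError are an assumption here (non-numeric inputs).

```python
import math
from numbers import Real


def sketch_duration_days(required_sample_size: float, daily_traffic: float) -> float:
    """Return total required sample size divided by daily traffic, in days."""
    for name, value in (
        ("required_sample_size", required_sample_size),
        ("daily_traffic", daily_traffic),
    ):
        # Assumption: non-numeric inputs (and booleans) are rejected as TypeError.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(f"{name} must be a real number")
        # Documented constraint: both inputs must be greater than zero.
        if value <= 0:
            raise ValueError(f"{name} must be greater than zero")
    return required_sample_size / daily_traffic


# 20,000 units in total (both arms combined) at 2,500 qualifying units per day.
days = sketch_duration_days(20_000, 2_500)

# A fractional result is usually rounded up to whole calendar days when scheduling.
whole_days = math.ceil(sketch_duration_days(10_000, 3_000))
```

Note that the numerator is the total across all arms; passing a per-arm sample size would understate the duration by a factor equal to the number of arms.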
Source code in src\xpyrment\plan\duration.py
design_experiment
design_experiment(
metric_type: str,
baseline_value: float,
standard_deviation: Optional[float] = None,
mde: float = 0.05,
mde_type: str = "relative",
alpha: float = 0.05,
power: float = 0.8,
pre_post_correlation: Optional[float] = None,
daily_traffic: Optional[int] = None,
) -> ExperimentDesignResult
Computes the required sample size and duration for an experiment based on design constraints.
This function performs an a priori power analysis to determine required sample sizes. It supports continuous means, proportions, and ratio metrics. When a pre-post correlation is provided, it also computes CUPED-adjusted sizing, and when daily traffic is provided, it converts the sample size into a run duration.
| PARAMETER | DESCRIPTION |
|---|---|
| metric_type | The statistical distribution type. Options are 'mean', 'proportion', and 'ratio'. TYPE: str. |
| baseline_value | The current historical control group value (mean or rate). TYPE: float. |
| standard_deviation | The historical standard deviation of the metric. Required for mean and ratio metrics. TYPE: Optional[float]. DEFAULT: None. |
| mde | The target Minimum Detectable Effect. Expressed as a fraction of baseline when mde_type is "relative". TYPE: float. DEFAULT: 0.05. |
| mde_type | Dictates how mde is interpreted. TYPE: str. DEFAULT: "relative". |
| alpha | The probability of a Type I error (significance level, e.g., 0.05 for 95% confidence). Defaults to 0.05. TYPE: float. |
| power | The desired statistical power (\(1 - \beta\), e.g., 0.80 to capture true effects 80% of the time). Defaults to 0.80. TYPE: float. |
| pre_post_correlation | The correlation coefficient (\(\rho\)) between baseline pre-period values and active experiment-period values. If provided, calculates CUPED-deflated sizing. Defaults to None. TYPE: Optional[float]. |
| daily_traffic | Expected daily volume of unique units entering the experiment. If provided, calculates duration. Defaults to None. TYPE: Optional[int]. |
| RETURNS | DESCRIPTION |
|---|---|
| ExperimentDesignResult | A wrapper object containing formatted parameters and sample sizing calculations. |
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If statistical inputs are out of logical bounds (e.g., negative traffic, proportion baseline not in \((0, 1)\), or correlation not in \([-1, 1]\)). |
| ValueError | If standard deviation is missing for mean/ratio metrics. |
Examples:
Example
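The sketch below shows one common way such a sizing could be computed. It is not the library's code. It assumes a two-sided test, two equally sized arms, a relative MDE, and the normal-approximation formula \(n = 2\sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2 / \delta^2\) per arm, with \(\sigma^2 = p(1-p)\) for proportions and a CUPED variance factor of \(1 - \rho^2\). The name sketch_sample_size_per_arm is hypothetical.

```python
import math
from statistics import NormalDist
from typing import Optional


def sketch_sample_size_per_arm(
    baseline_value: float,
    standard_deviation: Optional[float] = None,
    mde: float = 0.05,
    alpha: float = 0.05,
    power: float = 0.8,
    pre_post_correlation: Optional[float] = None,
) -> int:
    """Per-arm sample size under a two-sided normal approximation."""
    if standard_deviation is None:
        # Proportion metric: Bernoulli variance at the baseline rate.
        variance = baseline_value * (1.0 - baseline_value)
    else:
        variance = standard_deviation ** 2
    if pre_post_correlation is not None:
        # CUPED removes the share of variance explained by the pre-period.
        variance *= 1.0 - pre_post_correlation ** 2
    delta = baseline_value * mde  # relative MDE converted to an absolute effect
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * variance * (z_alpha + z_power) ** 2 / delta ** 2)


# 10% baseline conversion rate, 5% relative lift, default alpha and power.
n_per_arm = sketch_sample_size_per_arm(baseline_value=0.10, mde=0.05)
n_cuped = sketch_sample_size_per_arm(baseline_value=0.10, mde=0.05, pre_post_correlation=0.5)

# Duration for both arms combined at 5,000 qualifying units per day.
duration_days = 2 * n_per_arm / 5_000
```

With a correlation of 0.5 the variance factor is 0.75, so the CUPED-adjusted sample size is roughly three quarters of the unadjusted one.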
Source code in src\xpyrment\plan\power.py
generate_power_curve_data
generate_power_curve_data(
metric_type: str,
baseline_value: float,
standard_deviation: Optional[float] = None,
alpha: float = 0.05,
power: float = 0.8,
mde_range: Optional[ndarray] = None,
pre_post_correlation: Optional[float] = None,
) -> Dict[str, ndarray]
Generates sample size coordinates across a range of relative MDE values.
This function calculates required sizing across a coordinate spectrum of possible MDEs, allowing downstream reporting tools to plot an interactive or static "power curve" graph (sample size vs. effect size).
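A power curve of this kind can be sketched by evaluating a sample-size formula over a grid of relative MDEs. The example below is illustrative only: it uses plain Python lists instead of numpy arrays, assumes the same two-sided normal-approximation formula with equal arms, and uses hypothetical names (sketch_power_curve) and hypothetical dictionary keys ("mde", "sample_size_per_arm"), since the keys returned by the real function are not listed here.

```python
import math
from statistics import NormalDist
from typing import Dict, List, Optional


def sketch_power_curve(
    baseline_value: float,
    standard_deviation: Optional[float] = None,
    alpha: float = 0.05,
    power: float = 0.8,
    mde_range: Optional[List[float]] = None,
    pre_post_correlation: Optional[float] = None,
) -> Dict[str, List[float]]:
    """Per-arm sample size for each relative MDE in the grid."""
    if mde_range is None:
        # Default grid from the docs: 50 evenly spaced points in [0.01, 0.15].
        step = (0.15 - 0.01) / 49
        mde_range = [0.01 + i * step for i in range(50)]
    if standard_deviation is None:
        variance = baseline_value * (1.0 - baseline_value)  # proportion metric
    else:
        variance = standard_deviation ** 2
    if pre_post_correlation is not None:
        variance *= 1.0 - pre_post_correlation ** 2  # CUPED variance reduction
    z_sum = NormalDist().inv_cdf(1.0 - alpha / 2.0) + NormalDist().inv_cdf(power)
    sizes = [
        math.ceil(2.0 * variance * z_sum ** 2 / (baseline_value * mde) ** 2)
        for mde in mde_range
    ]
    return {"mde": list(mde_range), "sample_size_per_arm": sizes}


curve = sketch_power_curve(baseline_value=0.10)
```

Because the required sample size scales with \(1 / \text{MDE}^2\), the curve drops steeply at small effect sizes, which is the main thing such a plot is meant to show.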
| PARAMETER | DESCRIPTION |
|---|---|
| metric_type | Metric type ('mean', 'proportion', 'ratio'). TYPE: str. |
| baseline_value | Historical control average value. TYPE: float. |
| standard_deviation | Historical metric standard deviation. Required for continuous metrics. TYPE: Optional[float]. DEFAULT: None. |
| alpha | Significance level. Defaults to 0.05. TYPE: float. |
| power | Desired statistical power. Defaults to 0.80. TYPE: float. |
| mde_range | Array of relative MDE points to evaluate. If not provided, evaluates 50 linearly spaced points in \([0.01, 0.15]\). Defaults to None. TYPE: Optional[ndarray]. |
| pre_post_correlation | Pre-post correlation coefficient for the CUPED-adjusted curve. Defaults to None. TYPE: Optional[float]. |
| RETURNS | DESCRIPTION |
|---|---|
Dict[str, ndarray]
|
Dict[str, np.ndarray]: Dictionary mapping coordinate names to numpy arrays of results.
Contains keys |