Core Module
The xpyrment.core module contains the submodules and components that make up the core of the library.
core
Core engine abstractions, state management, exception classes, and shared types.
This package provides the foundational structural mechanisms for the xpyrment package:
- Experiment: The central orchestration state container that governs execution.
- ExperimentState: The rigid phase-gating mechanism (CREATED -> PLANNED -> DESIGNED -> RUNNING -> ANALYZED -> REPORTED).
- ExperimentRegistry: Cryptographic hashing and pre-registration validator to prevent post-hoc changes.
- Custom Exceptions: Robust, informative error feedback to protect experimental integrity (PhaseOrderError, SRMError, AliasError).
- Strict Typing schemas: Standardized TypedDict representation (MetricResult) of calculation outputs.
| MODULE | DESCRIPTION |
|---|---|
| exceptions | Custom exception classes for the xpyrment core system. |
| experiment | Central orchestrator for experiment setup, configuration, and phase management. |
| registry | Preregistration registry for locking and verifying experiment specifications. |
| serialization | Robust serialization utilities to guarantee native Python and JSON compatibility (Block 52). |
| state | State machine and phase-gating representation for experiment lifecycles. |
| telemetry | Centralized Telemetry, Structured JSON Logging, and Execution Profiling (Block 56). |
| types | Core type definitions, TypedDicts, and Literals for the xpyrment library. |
| CLASS | DESCRIPTION |
|---|---|
| PhaseOrderError | Raised when an operation is performed in an invalid state/phase. |
| SRMError | Raised when a Sample Ratio Mismatch (SRM) is detected during validation. |
| AliasError | Raised when fractional factorial alias confounding is violated or misconfigured. |
| Experiment | The central orchestration class for setting up, configuring, and executing experiments. |
| ExperimentRegistry | Manages immutable experiment specifications to prevent post-hoc changes (pre-registration). |
| ExperimentState | Enforces the phase-gated state of the experimental lifecycle. |
| MetricResult | The canonical data schema representing the output of a statistical metric analysis. |
| ExecutionProfiler | Context manager and decorator tracking processing stages, execution duration, and peak memory usage. |
| FUNCTION | DESCRIPTION |
|---|---|
| make_serializable | Recursively converts numpy and non-serializable objects to native, standard JSON-compliant types. |
| serialize_to_json | Converts a nested object recursively to serializable format and dumps it as a JSON string. |
| configure_telemetry | Configures and registers the centralized JSON telemetry logger handlers. |
| get_logger | Returns the centralized JSON telemetry logger instance. |
| ATTRIBUTE | DESCRIPTION |
|---|---|
| MetricType | Literal representing the supported category of metrics. |
MetricType
module-attribute
Literal representing the supported category of metrics.
Supported Types
- "mean": A continuous or discrete numeric metric where statistics are calculated on a per-unit basis (e.g., average sessions per user, average page views).
- "proportion": A binary rate metric representing yes/no outcomes on a per-unit basis, equivalent to a Bernoulli trial (e.g., conversion rate, click-through-rate where the unit of analysis is the user).
- "ratio": An aggregated metric computed as the sum of a numerator divided by the sum of a denominator across all units (e.g., global Click-Through-Rate = total clicks / total impressions). Requires Delta Method for proper variance approximation.
- "revenue": A highly skewed continuous monetary metric (e.g., revenue per user, average order value). Often subject to log-transformations or specialized variance reduction.
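The difference between a "ratio" metric and a per-unit "mean" is easy to miss, so here is a minimal sketch of it together with the first-order Delta Method variance the "ratio" entry refers to. The function name `ratio_with_delta_variance` and the sample data are hypothetical and are not part of xpyrment; the library's own estimator may differ in detail.

```python
from statistics import mean, variance


def ratio_with_delta_variance(numerators, denominators):
    """Return (ratio, approximate variance of the ratio) from per-unit totals."""
    n = len(numerators)
    mu_n, mu_d = mean(numerators), mean(denominators)
    var_n, var_d = variance(numerators), variance(denominators)
    # Unbiased sample covariance between numerator and denominator.
    cov_nd = sum(
        (a - mu_n) * (b - mu_d) for a, b in zip(numerators, denominators)
    ) / (n - 1)
    ratio = mu_n / mu_d  # identical to sum(numerators) / sum(denominators)
    # First-order Delta Method approximation of Var(mean_N / mean_D).
    var_ratio = (
        var_n / mu_d**2
        - 2 * mu_n * cov_nd / mu_d**3
        + mu_n**2 * var_d / mu_d**4
    ) / n
    return ratio, var_ratio


clicks = [1, 0, 3, 2]          # numerator per user (illustrative data)
impressions = [10, 5, 20, 10]  # denominator per user (illustrative data)

ratio, var_ratio = ratio_with_delta_variance(clicks, impressions)
# A per-unit "mean" of click rates weights every user equally instead.
mean_of_ratios = mean(c / i for c, i in zip(clicks, impressions))
print(ratio, mean_of_ratios, var_ratio)
```

The two point estimates differ because the ratio metric weights users by their denominator, while the mean of per-user rates does not.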
PhaseOrderError
Bases: Exception
Raised when an operation is performed in an invalid state/phase.
This exception is a core mechanism of the phase-gated execution flow. It prevents:
- Downstream actions (e.g., calling .analyze() or .report()) from being executed
before upstream requirements (e.g., .design() or .validate()) are complete.
- Upstream state reversals (e.g., transitioning back to CREATED or PLANNED
once an experiment is already RUNNING or ANALYZED), which could lead to
post-hoc configuration tampering or invalid statistical analysis.
Mathematical & Operational Context: The experimental lifecycle is governed by a strict, directed, acyclic chain of transitions: CREATED -> PLANNED -> DESIGNED -> RUNNING -> ANALYZED -> REPORTED. Any transition where index(target_state) < index(current_state) violates this unidirectional flow and triggers this error. Re-running the analysis on the same or new frozen data is not a reversal: the experiment remains in the ANALYZED state, so no error is raised.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| message | Explains the invalid state transition attempt and the active state. |
Examples:
SRMError
Bases: Exception
Raised when a Sample Ratio Mismatch (SRM) is detected during validation.
SRM occurs when the observed ratio of sample counts assigned to treatment arms significantly deviates from the pre-specified expected ratio (e.g., 50/50 split). An SRM is a critical indicator of data quality issues, selection bias, or bugs in the randomization/assignment mechanism.
Mathematical Background
A Pearson Chi-Square Goodness-of-Fit test is performed to evaluate the discrepancy: $$ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} $$ where \(O_i\) is the observed count in arm \(i\) and \(E_i\) is the expected count under the planned split. The degrees of freedom is \(k - 1\). This exception is raised if the resulting p-value is extremely small (typically p < 0.001), indicating that the deviation is highly unlikely to have occurred by chance alone.
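As a rough illustration of the test above, the following sketch computes the chi-square statistic and its p-value for a two-arm experiment using only the standard library. It assumes exactly two arms, so the degrees of freedom is 1 and the p-value has the closed form erfc(sqrt(chi2 / 2)); the helper name `srm_p_value` and the counts are hypothetical, and the 0.001 threshold simply mirrors the "typically p < 0.001" wording above.

```python
import math


def srm_p_value(observed, expected_ratios):
    """Chi-square goodness-of-fit statistic and p-value for two arms (df = 1)."""
    if len(observed) != 2 or len(expected_ratios) != 2:
        raise ValueError("this sketch only handles two arms")
    total = sum(observed)
    ratio_sum = sum(expected_ratios)
    # Expected counts under the planned split.
    expected = [total * r / ratio_sum for r in expected_ratios]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2_bad, p_bad = srm_p_value([5000, 5400], [0.5, 0.5])  # suspicious split
chi2_ok, p_ok = srm_p_value([5000, 5050], [0.5, 0.5])    # plausible split

srm_detected = p_bad < 0.001
print(chi2_bad, p_bad, srm_detected)
print(chi2_ok, p_ok)
```

For more than two arms the same statistic applies, but the p-value needs the general chi-square survival function with \(k - 1\) degrees of freedom.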
Remediation Procedures
- Halt the experiment or analysis immediately.
- Audit the randomization and unit assignment pipeline for bugs.
- Check for data loss, telemetry issues, or delay in event ingestion pipelines.
- Validate that the assignment tracking log captures all users on first touch.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| message | Explains the observed vs. expected sample counts and the p-value. |
AliasError
Bases: Exception
Raised when fractional factorial alias confounding is violated or misconfigured.
In classical fractional factorial Design of Experiments (DoE), only a fraction of all possible factor combinations is run. This results in confounding (aliasing), where the estimate of a specific main effect is mathematically indistinguishable from a multi-factor interaction term.
This exception is raised when:
- A user tries to estimate an effect that is completely confounded with another active effect of equal or lower order (violating the specified design Resolution).
- The defined alias structure does not match the actual combinations present in the design matrix.
- The design resolution (III, IV, or V) is insufficient to support the hypothesis or interaction analysis requested.
Mathematical Context
Let \(X_1\) be the model matrix for the terms being estimated, with least-squares estimates \(\hat{\beta}_1 = (X_1^T X_1)^{-1} X_1^T Y\), and let \(X_2\) hold the columns of the terms left out of the model, with true coefficients \(\beta_2\). In a fractional design, some columns of the full matrix \([X_1 \; X_2]\) are linear combinations of others, so the full matrix is rank-deficient and unique solutions for all factors and interactions do not exist. The alias matrix \(A = (X_1^T X_1)^{-1} X_1^T X_2\) defines which terms are confounded: $$ E[\hat{\beta}_1] = \beta_1 + A \beta_2 $$ An AliasError prevents the system from proceeding with invalid or unresolvable confounding structures.
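To make the aliasing concrete, here is a small sketch of a half fraction of a 2^3 design built with the generator C = AB. The generator choice and variable names are assumptions made for illustration; they are not taken from xpyrment.

```python
from itertools import product

# Half fraction of a 2^3 design using the generator C = A*B
# (defining relation I = ABC, Resolution III).
runs = [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]

col_a = [a for a, _, _ in runs]
col_c = [c for _, _, c in runs]
col_ab = [a * b for a, b, _ in runs]  # A x B interaction column
col_bc = [b * c for _, b, c in runs]  # B x C interaction column

# Identical columns cannot be separated by any regression on these runs.
c_aliased_with_ab = col_c == col_ab
a_aliased_with_bc = col_a == col_bc
print(runs)
print(c_aliased_with_ab, a_aliased_with_bc)
```

Because the main-effect columns coincide with two-factor interaction columns, a main-effect estimate from these four runs is only interpretable if those interactions are assumed negligible.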
| ATTRIBUTE | DESCRIPTION |
|---|---|
| message | Details the confounded factors or resolution constraint violated. |
Experiment
Experiment(
data: DataFrame,
treatment_col: str,
id_col: Optional[str] = None,
covariates: Optional[List[str]] = None,
)
The central orchestration class for setting up, configuring, and executing experiments.
The Experiment class binds the experimental dataset, defines treatment structures, maps
the metric taxonomy, and strictly enforces state transitions across the execution lifecycle.
Through the state-machine rules, it ensures that all calculations are performed sequentially
and reproducibly, eliminating retrospective tampering or incorrect state usage.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| data | A copy of the input DataFrame containing assignments and telemetry. |
| treatment_col | The column in |
| id_col | The column in |
| metrics | List of metrics registered for statistical calculation. |
| state | The current lifecycle phase of the experiment. |
State Gating Mechanism
Execution functions across downstream submodules verify that the experiment is in the
appropriate state before proceeding. For example, running power analysis transitions the
state from CREATED to PLANNED. Running randomization moves from PLANNED to DESIGNED.
Analyzing results requires a transition to ANALYZED.
Examples:
Example
>>> import pandas as pd
>>> from xpyrment import Experiment
>>> from xpyrment.metrics.taxonomy import MeanMetric
>>> df = pd.DataFrame({"user_id": [1, 2, 3], "group": ["control", "treatment", "control"], "revenue": [10.5, 12.0, 9.5]})
>>> exp = Experiment(df, treatment_col="group", id_col="user_id")
>>> exp.state
<ExperimentState.CREATED: 'CREATED'>
>>> metric = MeanMetric("Revenue Metric", value_col="revenue")
>>> _ = exp.add_metrics(metric)  # returns the Experiment itself; discard it so the doctest prints nothing
>>> exp.state
<ExperimentState.PLANNED: 'PLANNED'>
Copies the input DataFrame to guarantee immutability of the source dataset during internal state transitions and potential data transformations (e.g., CUPED alignment or log scaling).
| PARAMETER | DESCRIPTION |
|---|---|
| data | The source DataFrame containing unit-level data. |
| treatment_col | Name of the column designating experimental groups/arms. |
| id_col | Name of the column containing unique identifiers for each experimental unit. Required for certain operations like sequential analysis and user assignments. |
| covariates | List of baseline covariates for balance checking or adjustments. |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If |
| METHOD | DESCRIPTION |
|---|---|
| transition_to | Enforces transition logic to guarantee the phase-gated execution flow. |
| add_metrics | Adds statistical metrics to the experiment configuration. |
| add_covariates | Adds baseline covariates to the experiment configuration. |
| register_metric | Conveniently registers a metric and appends it to the configuration. |
Source code in src\xpyrment\core\experiment.py
transition_to
transition_to(target_state: ExperimentState) -> None
Enforces transition logic to guarantee the phase-gated execution flow.
Uses the ordinal indices of ExperimentState members to verify that the transition is
monotonically increasing (forward-only).
Mathematical/Logical Representation: Let \(S\) be the ordered tuple of states: $$ S = (\text{CREATED}, \text{PLANNED}, \text{DESIGNED}, \text{RUNNING}, \text{ANALYZED}, \text{REPORTED}) $$ A state transition from state \(s_1\) to state \(s_2\) is valid if and only if: $$ \text{Index}(s_2) \ge \text{Index}(s_1) $$ The inequality is non-strict, so the self-transition \(s_1 = \text{ANALYZED} \rightarrow s_2 = \text{ANALYZED}\) is explicitly permitted; this supports re-running statistical engines on the locked design data.
| PARAMETER | DESCRIPTION |
|---|---|
| target_state | The state the experiment is attempting to transition into. |

| RAISES | DESCRIPTION |
|---|---|
| PhaseOrderError | If a backwards state transition is attempted, or if transition is otherwise unauthorized. |
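The forward-only rule can be sketched with a plain Enum whose definition order serves as the ordinal index. The classes below are simplified stand-ins that reuse the documented names for readability, and `check_transition` is a hypothetical helper; none of this is the library's source, and any additional authorization rules the real method applies are not modeled.

```python
from enum import Enum


class PhaseOrderError(Exception):
    """Stand-in for the library exception of the same name."""


class ExperimentState(Enum):
    """Stand-in enum; member order defines the lifecycle order."""
    CREATED = "CREATED"
    PLANNED = "PLANNED"
    DESIGNED = "DESIGNED"
    RUNNING = "RUNNING"
    ANALYZED = "ANALYZED"
    REPORTED = "REPORTED"


# Iterating an Enum yields members in definition order.
_ORDER = list(ExperimentState)


def check_transition(current, target):
    """Return target if Index(target) >= Index(current), else raise."""
    if _ORDER.index(target) < _ORDER.index(current):
        raise PhaseOrderError(
            f"cannot move from {current.name} back to {target.name}"
        )
    return target


state = check_transition(ExperimentState.CREATED, ExperimentState.PLANNED)
state = check_transition(ExperimentState.ANALYZED, ExperimentState.ANALYZED)
try:
    check_transition(ExperimentState.RUNNING, ExperimentState.PLANNED)
except PhaseOrderError as err:
    print(f"rejected: {err}")
```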
add_metrics
add_metrics(
metrics: Union[BaseMetric, List[BaseMetric]],
) -> Experiment
Adds statistical metrics to the experiment configuration.
Successfully registering a metric moves the experiment from CREATED to PLANNED state, representing
that the evaluation criteria have been defined prior to running designs, validations, or analyses.
| PARAMETER | DESCRIPTION |
|---|---|
| metrics | A single metric object or a list of metrics (inheriting from |

| RETURNS | DESCRIPTION |
|---|---|
| Experiment | The experiment instance itself (for fluent API chaining). |

| RAISES | DESCRIPTION |
|---|---|
| PhaseOrderError | If the experiment has already progressed past the |
add_covariates
add_covariates(names: Union[str, List[str]]) -> Experiment
Adds baseline covariates to the experiment configuration.
| PARAMETER | DESCRIPTION |
|---|---|
| names | A single covariate column name or a list of names. |

| RETURNS | DESCRIPTION |
|---|---|
| Experiment | The experiment instance itself (for fluent API chaining). |
register_metric
register_metric(
name: str,
metric_type: str = "mean",
value_col: Optional[str] = None,
covariate: Optional[str] = None,
numerator_col: Optional[str] = None,
denominator_col: Optional[str] = None,
pre_numerator_col: Optional[str] = None,
pre_denominator_col: Optional[str] = None,
) -> Experiment
Conveniently registers a metric and appends it to the configuration.
| PARAMETER | DESCRIPTION |
|---|---|
| name | Unique descriptive name of the metric. |
| metric_type | Type of metric. Options: "mean", "proportion", "ratio". Defaults to "mean". |
| value_col | Column name containing experiment period values. Defaults to the metric name. |
| covariate | Pre-period covariate column name for CUPED. Defaults to None. |
| numerator_col | Column containing numerator values for RatioMetric. |
| denominator_col | Column containing denominator values for RatioMetric. |
| pre_numerator_col | Pre-period numerator column for RatioMetric CUPED. |
| pre_denominator_col | Pre-period denominator column for RatioMetric CUPED. |

| RETURNS | DESCRIPTION |
|---|---|
| Experiment | The experiment instance itself (for fluent API chaining). |
ExperimentRegistry
Manages immutable experiment specifications to prevent post-hoc changes (pre-registration).
By calculating a cryptographic SHA-256 signature of serialized, key-sorted experiment specifications, the registry provides an audit trail. Analysts can verify that the running parameters (such as target sample sizes, significance levels, and chosen metrics) precisely match the registered plan, preventing retrospective optimization of analysis parameters.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| _registry | Internal store mapping experiment IDs to their registered specification dictionaries and pre-computed hashes. |
Examples:
Example
>>> from xpyrment.core.registry import ExperimentRegistry
>>> registry = ExperimentRegistry()
>>> spec = {"primary_metric": "conversion_rate", "alpha": 0.05, "target_n": 10000}
>>> spec_hash = registry.register_spec("EXP-101", spec)
>>> len(spec_hash)
64
>>> registry.verify_spec("EXP-101", spec)
True
>>> modified_spec = {"primary_metric": "conversion_rate", "alpha": 0.10, "target_n": 10000}
>>> registry.verify_spec("EXP-101", modified_spec)
False
| METHOD | DESCRIPTION |
|---|---|
| register_spec | Serializes the experiment specification, hashes it, and stores it in the registry. |
| verify_spec | Verifies if the current spec_dict matches the registered hash to prevent p-hacking. |
Source code in src\xpyrment\core\registry.py
register_spec
Serializes the experiment specification, hashes it, and stores it in the registry.
Ensures that dictionaries are serialized with sorted keys to maintain deterministic hashing across systems, irrespective of key-insertion order.
Mathematical Representation
Let \(S\) be the key-sorted, compact JSON serialization of spec_dict encoded in UTF-8.
The registered hash \(H\) is:
$$
H = \text{SHA256}(S)
$$
| PARAMETER | DESCRIPTION |
|---|---|
| experiment_id | Unique identifier of the experiment. TYPE: str |
| spec_dict | Structural parameters representing the experiment plan, including registered metrics, statistical thresholds (\(\alpha, \beta\)), and design configurations. TYPE: Dict[str, Any] |

| RETURNS | DESCRIPTION |
|---|---|
| str | The hexadecimal representation of the SHA-256 signature hash. |
verify_spec
Verifies if the current spec_dict matches the registered hash to prevent p-hacking.
Re-hashes the incoming specification dictionary using key-sorted serialization and performs a constant-time comparison against the stored hash for the given experiment ID.
| PARAMETER | DESCRIPTION |
|---|---|
| experiment_id | Registered ID of the experiment to verify. |
| spec_dict | The active specification dictionary to validate. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the current specification matches the pre-registered specification exactly, False if there is a mismatch or if the experiment ID was never registered. |
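The hashing and verification steps described for register_spec and verify_spec can be sketched with the standard library alone. `TinyRegistry` and `spec_hash` are hypothetical names used for illustration; the sketch assumes compact separators `(",", ":")` for the "compact JSON serialization" and uses `hmac.compare_digest` for the constant-time comparison, which may not match the library's exact choices.

```python
import hashlib
import hmac
import json


def spec_hash(spec_dict):
    """SHA-256 hex digest of the key-sorted, compact JSON form of a spec."""
    canonical = json.dumps(spec_dict, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class TinyRegistry:
    """Illustrative stand-in for a pre-registration registry."""

    def __init__(self):
        self._hashes = {}  # experiment ID -> registered hex digest

    def register_spec(self, experiment_id, spec_dict):
        digest = spec_hash(spec_dict)
        self._hashes[experiment_id] = digest
        return digest

    def verify_spec(self, experiment_id, spec_dict):
        stored = self._hashes.get(experiment_id)
        if stored is None:
            return False  # never registered
        # Constant-time comparison of the two hex digests.
        return hmac.compare_digest(stored, spec_hash(spec_dict))


registry = TinyRegistry()
digest = registry.register_spec("EXP-101", {"alpha": 0.05, "target_n": 10000})

# Key order does not matter because keys are sorted before hashing.
same_plan = registry.verify_spec("EXP-101", {"target_n": 10000, "alpha": 0.05})
changed_plan = registry.verify_spec("EXP-101", {"alpha": 0.10, "target_n": 10000})
print(len(digest), same_plan, changed_plan)
```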
ExperimentState
Bases: Enum
Enforces the phase-gated state of the experimental lifecycle.
This enum acts as the source of truth for the state machine within Experiment. State transitions
are restricted to a forward-only unidirectional progression, preventing common experimental
malpractices such as p-hacking, retrospective hypothesis creation, or design tampering.
State Diagram & Authorized Transitions:
stateDiagram-v2
[*] --> CREATED : Initialization
CREATED --> PLANNED : add_metrics()
PLANNED --> DESIGNED : design() / do_doe()
DESIGNED --> RUNNING : start_experiment()
RUNNING --> ANALYZED : run_analysis()
ANALYZED --> ANALYZED : re-run (with frozen data)
ANALYZED --> REPORTED : compile_report()
REPORTED --> [*]
States Description
- CREATED: The experiment has been instantiated with raw data and a designated treatment column. No configuration or planning has occurred yet.
- PLANNED: Hypotheses have been bound, primary and secondary metrics have been assigned, and power calculation (required sample size and Minimum Detectable Effect) has been performed.
- DESIGNED: Randomization scheme, traffic split fractions, ramp-up schedules, or classical factorial designs (DoE matrices) have been generated and locked.
- RUNNING: The experiment is actively ingesting live assignment and telemetry data. Sequential monitoring (e.g., mSPRT) or early-stopping boundaries are actively checked.
- ANALYZED: Ingested data is locked, and statistical inference engines (frequentist, Bayesian, or sequential) have run. CUPED variance reduction and multi-comparison corrections are finalized.
- REPORTED: The full lifecycle audit trail, key metrics, and decision recommendations have been serialized into an immutable Experiment Card or exported (JSON/PDF).
Exceptions & Gating:
- Any attempt to transition backwards in the sequence (e.g., from RUNNING to PLANNED to add a new
metric) will raise a PhaseOrderError.
- Re-running analysis in the ANALYZED state to adjust statistical parameters (e.g., alpha, multiple
comparison correction method) is authorized without violating state order rules, provided the
underlying experimental design remains frozen.
MetricResult
Bases: TypedDict
The canonical data schema representing the output of a statistical metric analysis.
This TypedDict establishes a contract for all inference engines (frequentist, Bayesian, and sequential) and reporting utilities, ensuring that every calculated metric contains both descriptive statistics and rigorous statistical validation metrics.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| metric_name | The unique identifier assigned to the analyzed metric. |
| metric_type | The standardized type string (e.g., "Mean", "Proportion", "Ratio", "Revenue"). |
| control_mean | The sample mean (\(\bar{Y}_C\)) or proportion (\(p_C\)) calculated for the control group. |
| treatment_mean | The sample mean (\(\bar{Y}_T\)) or proportion (\(p_T\)) calculated for the treatment group. |
| control_var | The sample variance (\(s^2_C\)) calculated for the control group. For ratios, this represents the Delta-method approximated variance. |
| treatment_var | The sample variance (\(s^2_T\)) calculated for the treatment group. For ratios, this represents the Delta-method approximated variance. |
| control_n | The total count of unique units in the control group (\(N_C\)). |
| treatment_n | The total count of unique units in the treatment group (\(N_T\)). |
| absolute_difference | The point estimate of the absolute treatment effect: $$ \Delta = \bar{Y}_T - \bar{Y}_C $$ |
| relative_lift | The relative increase or decrease of the treatment mean with respect to the control mean, expressed as a fraction (multiply by 100 for a percentage): $$ \text{Lift} = \frac{\bar{Y}_T - \bar{Y}_C}{\bar{Y}_C} $$ |
| cuped_applied | True if Controlled-experiment Using Pre-Experiment Data (CUPED) was applied to adjust the variance of this metric. False otherwise. |
| variance_reduction | The fraction of variance removed by CUPED, bounded in \([0, 1)\): $$ \text{Reduction} = 1 - \frac{\text{Var}(Y_{\text{CUPED}})}{\text{Var}(Y_{\text{original}})} $$ |
| p_value | The statistical p-value associated with the hypothesis test. For frequentist tests, this is the probability of observing a test statistic at least as extreme as the one computed, under the null hypothesis (\(H_0\)). |
| ci_lower | The lower bound of the absolute confidence/credible interval at the \((1 - \alpha)\) confidence level. |
| ci_upper | The upper bound of the absolute confidence/credible interval at the \((1 - \alpha)\) confidence level. |
| rel_ci_lower | The lower bound of the relative confidence/credible interval, scaled relative to the control mean: $$ \text{Rel CI Lower} = \frac{\text{CI Lower}}{\bar{Y}_C} $$ |
| rel_ci_upper | The upper bound of the relative confidence/credible interval, scaled relative to the control mean: $$ \text{Rel CI Upper} = \frac{\text{CI Upper}}{\bar{Y}_C} $$ |
| power | The statistical power (\(1 - \beta\)) achieved by the sample size, denoting the probability of correctly rejecting the null hypothesis when the true treatment effect equals the observed difference. |
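The derived fields above follow directly from the per-group summaries. The sketch below fills a subset of them for a "mean" metric, assuming a two-sided z-test with a normal approximation; that test choice, the function name `summarize`, and the input numbers are assumptions for illustration, and xpyrment's inference engines may compute these values differently.

```python
from statistics import NormalDist


def summarize(control_mean, treatment_mean, control_var, treatment_var,
              control_n, treatment_n, alpha=0.05):
    """Compute a subset of MetricResult-style fields (z-test assumption)."""
    diff = treatment_mean - control_mean
    # Standard error of the difference between two independent means.
    se = (control_var / control_n + treatment_var / treatment_n) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci_lower, ci_upper = diff - z_crit * se, diff + z_crit * se
    p_value = 2 * (1 - NormalDist().cdf(abs(diff) / se))
    return {
        "absolute_difference": diff,
        "relative_lift": diff / control_mean,
        "p_value": p_value,
        "ci_lower": ci_lower,
        "ci_upper": ci_upper,
        # Relative bounds rescale the absolute bounds by the control mean.
        "rel_ci_lower": ci_lower / control_mean,
        "rel_ci_upper": ci_upper / control_mean,
    }


result = summarize(
    control_mean=10.0, treatment_mean=10.5,
    control_var=4.0, treatment_var=4.0,
    control_n=1000, treatment_n=1000,
)
print(result)
```

Note that dividing the absolute bounds by the control mean treats the control mean as fixed; it does not account for the uncertainty in \(\bar{Y}_C\) itself.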
ExecutionProfiler
Context manager and decorator tracking processing stages, execution duration, and peak memory usage.
In production platforms, fine-grained telemetry profile statistics are crucial to detect performance bottlenecks, resource hot spots, and algorithmic memory leaks (especially within bootstrap, MCMC, or massive-scale matrix solvers).
| PARAMETER | DESCRIPTION |
|---|---|
| stage_name | Identifier label of the current processing stage (e.g., "bootstrap_resampling"). |
| logger | Custom target logger instance. Defaults to None. |

| METHOD | DESCRIPTION |
|---|---|
| __enter__ | Enters the context boundary, initiating tracemalloc memory tracing and epoch timers. |
| __exit__ | Exits the context boundary, stops tracking, records peaks, and logs structured JSON telemetry metrics. |
| __call__ | Allows the class to be used as a profiling decorator for standard Python functions. |
Source code in src\xpyrment\core\telemetry.py
__enter__
__enter__() -> ExecutionProfiler
Enters the context boundary, initiating tracemalloc memory tracing and epoch timers.
| RETURNS | DESCRIPTION |
|---|---|
| ExecutionProfiler | Active instance. |
__exit__
Exits the context boundary, stops tracking, records peaks, and logs structured JSON telemetry metrics.
| PARAMETER | DESCRIPTION |
|---|---|
| exc_type | Exception type raised within the context. |
| exc_val | Exception value. |
| exc_tb | Traceback object. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | False to propagate any exceptions raised in the block. |
__call__
Allows the class to be used as a profiling decorator for standard Python functions.
| PARAMETER | DESCRIPTION |
|---|---|
| func | Callable target function to wrap. |

| RETURNS | DESCRIPTION |
|---|---|
| Any | Decorated callable function. |
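A profiler that works both as a context manager and as a decorator can be sketched as follows. `StageProfiler` is a hypothetical stand-in, not the library's ExecutionProfiler: it uses `time.perf_counter` for timing, the logger name and JSON field names are invented for the example, and nested use is not supported because `tracemalloc.stop()` ends all tracing.

```python
import functools
import json
import logging
import time
import tracemalloc


class StageProfiler:
    """Illustrative profiler: duration and peak memory for one stage."""

    def __init__(self, stage_name, logger=None):
        self.stage_name = stage_name
        self.logger = logger or logging.getLogger("profiler_sketch")
        self.duration_s = None
        self.peak_bytes = None

    def __enter__(self):
        tracemalloc.start()
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.duration_s = time.perf_counter() - self._start
        _, self.peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        self.logger.info(json.dumps({
            "stage": self.stage_name,
            "duration_s": self.duration_s,
            "peak_bytes": self.peak_bytes,
            "failed": exc_type is not None,
        }))
        return False  # never swallow exceptions from the profiled block

    def __call__(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Fresh profiler per call so measurements do not leak between calls.
            with StageProfiler(self.stage_name, self.logger):
                return func(*args, **kwargs)
        return wrapper


with StageProfiler("allocate_list") as profiler:
    data = [0] * 100_000


@StageProfiler("sum_squares")
def sum_squares(n):
    return sum(i * i for i in range(n))


total = sum_squares(10)
```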
make_serializable
Recursively converts numpy and non-serializable objects to native, standard JSON-compliant types.
| PARAMETER | DESCRIPTION |
|---|---|
| obj | The nested object or value to convert. |

| RETURNS | DESCRIPTION |
|---|---|
| Any | Standard Python dictionary, list, float, int, bool, or string. |
Source code in src\xpyrment\core\serialization.py
serialize_to_json
Converts a nested object recursively to serializable format and dumps it as a JSON string.
| PARAMETER | DESCRIPTION |
|---|---|
| obj | The object to convert and serialize. |
| indent | Indentation level for pretty-printing. |

| RETURNS | DESCRIPTION |
|---|---|
| str | Validated JSON string. |
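The recursive conversion described for make_serializable and serialize_to_json might look roughly like the sketch below. `to_native` and `dump_json` are hypothetical names, and the sketch avoids importing numpy by relying on duck typing: numpy arrays and scalars expose `tolist()`, as does the standard library's `array.array` used here for the demonstration. Falling back to `str()` for unknown objects is an assumption, not documented behavior.

```python
import json
from array import array


def to_native(obj):
    """Recursively convert obj into JSON-compatible built-in types."""
    if isinstance(obj, dict):
        return {str(key): to_native(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [to_native(value) for value in obj]
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    if hasattr(obj, "tolist"):
        # Covers numpy arrays/scalars and array.array via duck typing.
        return to_native(obj.tolist())
    return str(obj)  # last resort for unknown objects (assumption)


def dump_json(obj, indent=2):
    """Convert obj with to_native and serialize it to a JSON string."""
    return json.dumps(to_native(obj), indent=indent)


payload = {
    "values": array("d", [1.5, 2.5]),  # stands in for a numpy array
    "shape": (2, 1),
    "meta": {"note": None, 7: "non-string key"},
}
text = dump_json(payload)
print(text)
```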
configure_telemetry
Configures and registers the centralized JSON telemetry logger handlers.
| PARAMETER | DESCRIPTION |
|---|---|
| level | Log filter level threshold (e.g. logging.INFO). Defaults to logging.INFO. |
| stream | Output stream target. Defaults to sys.stdout. |

| RETURNS | DESCRIPTION |
|---|---|
| Logger | logging.Logger: Configured telemetry Logger instance. |
get_logger
Returns the centralized JSON telemetry logger instance.
Ensures the logger is fully configured with default handlers if unconfigured.
| RETURNS | DESCRIPTION |
|---|---|
| Logger | logging.Logger: Active telemetry Logger. |
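For orientation, a centralized JSON logger of the kind configure_telemetry and get_logger describe can be built on the standard logging module. The names `JsonFormatter`, `configure_json_logger`, and the logger name "telemetry_sketch" are hypothetical, and the JSON field names are chosen for the example rather than taken from xpyrment.

```python
import io
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def configure_json_logger(name="telemetry_sketch", level=logging.INFO,
                          stream=None):
    """Attach one JSON stream handler to a named logger and return it."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate handlers on reconfiguration
    handler = logging.StreamHandler(stream or sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False  # keep records out of the root logger
    return logger


buffer = io.StringIO()
log = configure_json_logger(stream=buffer)
log.info("analysis finished")
log.debug("below the INFO threshold, so not emitted")

record = json.loads(buffer.getvalue())
print(record)
```

Because `logging.getLogger` returns the same object for the same name, any module can retrieve the configured logger later without passing it around.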