Orchestrator

orchestrator

Experiment analysis orchestrator, results compiler, and setup entrypoints.

This module provides the central user-facing API for launching analyses on experimental datasets. It coordinates the execution of registered metrics, handles multiple testing corrections, manages state transitions, and constructs the unified AnalysisResult data layer for plotting and reporting.

CLASS DESCRIPTION
AnalysisResult

Holds results from an experiment analysis and provides summary formatting and plotting interfaces.

FUNCTION DESCRIPTION
run_analysis

Executes the statistical analysis across all registered metrics in an Experiment container.

setup

Initializes the experimental setup container, serving as the library's primary entrypoint.

AnalysisResult

AnalysisResult(
    raw_results: List[dict],
    alpha: float = 0.05,
    balance_checker: Optional[Any] = None,
)

Holds results from an experiment analysis and provides summary formatting and plotting interfaces.

This container aggregates the per-metric result dictionaries computed for the control and treatment groups. It provides high-level APIs to compile summary tables and to pass the results to the visualization module for plotting.

ATTRIBUTE DESCRIPTION
raw_results

A list of metric calculation result dictionaries (keys: mean, lift, p_value, etc.).

TYPE: List[dict]

alpha

Nominal significance level (Type I error rate) used in the analysis. Defaults to 0.05.

TYPE: float

df_raw

The raw, unformatted results compiled into a pandas DataFrame.

TYPE: DataFrame

balance_checker

Fitted balance checker object if covariates were present.

TYPE: Optional[Any]

PARAMETER DESCRIPTION
raw_results

Raw list of metric results.

TYPE: List[dict]

alpha

Nominal significance level used.

TYPE: float DEFAULT: 0.05

balance_checker

Fitted CovariateBalanceChecker if covariates were specified.

TYPE: Optional[Any] DEFAULT: None

METHOD DESCRIPTION
love_plot

Returns the ASCII Love Plot visualization for baseline covariate balance.

to_dict

Converts the complete analysis results and metadata to a robust, serializable dictionary.

to_json

Converts the analysis results into a standardized, portable JSON string.

summary

Returns a summarized, human-readable DataFrame of the analysis.

plot

Generates and returns a forest plot of the relative metric lifts and confidence intervals.

Source code in src\xpyrment\analyze\orchestrator.py
def __init__(
    self,
    raw_results: List[dict],
    alpha: float = 0.05,
    balance_checker: Optional[Any] = None,
):
    """Initializes an AnalysisResult.

    Args:
        raw_results (List[dict]): Raw list of metric results.
        alpha (float): Nominal significance level used.
        balance_checker (Optional[Any]): Fitted CovariateBalanceChecker if covariates were specified.
    """
    self.raw_results = raw_results
    self.alpha = alpha
    self.df_raw = pd.DataFrame(raw_results)
    self.balance_checker = balance_checker

love_plot

love_plot() -> str

Returns the ASCII Love Plot visualization for baseline covariate balance.

RETURNS DESCRIPTION
str

The ASCII Love Plot, or an explanatory message when no covariates were provided.

TYPE: str
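A Love Plot summarizes the standardized mean difference (SMD) of each baseline covariate between the two groups. The following is a minimal sketch of that idea, not the library's implementation: it assumes the common pooled-standard-deviation definition of SMD, which `CovariateBalanceChecker` may or may not use, and the helpers `smd` and `ascii_love_plot` are hypothetical names introduced only for illustration. The 0.1 threshold matches the cut-off used by the imbalance warning in `summary`.

```python
from math import sqrt
from statistics import mean, variance


def smd(treated, control):
    """Standardized mean difference using the pooled sample standard deviation.

    Assumption: this is the textbook definition, not necessarily the library's.
    """
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        return 0.0
    return (mean(treated) - mean(control)) / pooled_sd


def ascii_love_plot(smds, threshold=0.1, width=20):
    """Render one text row per covariate; '!' marks |SMD| above the threshold."""
    lines = []
    for name, value in smds.items():
        bar = "#" * min(width, round(abs(value) * width))
        flag = "!" if abs(value) > threshold else " "
        lines.append(f"{name:<12} {value:+.4f} {flag} |{bar}")
    return "\n".join(lines)


# Illustrative, made-up covariate values for two small groups.
smds = {
    "age": smd([30, 32, 34, 36], [30, 32, 34, 36]),
    "pre_spend": smd([12.0, 14.0, 16.0, 18.0], [10.0, 12.0, 14.0, 16.0]),
}
print(ascii_love_plot(smds))
```

In this toy data, `age` is perfectly balanced while `pre_spend` is flagged because its SMD exceeds 0.1.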

Source code in src\xpyrment\analyze\orchestrator.py
def love_plot(self) -> str:
    """Returns the ASCII Love Plot visualization for baseline covariate balance.

    Returns:
        str: The ASCII Love Plot, or an explanatory message when no covariates were provided.
    """
    if self.balance_checker is None:
        return "No covariate balance diagnostics were compiled (no covariates provided)."
    return self.balance_checker.generate_love_plot()

to_dict

to_dict() -> dict

Converts the complete analysis results and metadata to a robust, serializable dictionary.

Includes significance thresholds (alpha), raw metric outcomes, and covariate balance diagnostics if present.

RETURNS DESCRIPTION
dict

A nested dictionary with native Python types, guaranteed to be JSON serializable.

TYPE: dict
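To make the shape of the returned dictionary concrete, here is a small sketch using only the standard library. The top-level keys (`alpha`, `metrics`, `covariate_balance`) come from the source on this page; the metric values are made up, and `to_native` is a hypothetical stand-in for `make_serializable`, whose actual behavior is not shown here. In particular, mapping NaN to `None` is an assumption.

```python
import json
import math


def to_native(obj):
    """Hypothetical stand-in for make_serializable: recursively coerce to JSON-safe types."""
    if isinstance(obj, dict):
        return {str(k): to_native(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_native(v) for v in obj]
    if isinstance(obj, float) and math.isnan(obj):
        return None  # assumption: NaN becomes null, since strict JSON has no NaN
    return obj


state = {
    "alpha": 0.05,
    "metrics": [{"metric_name": "revenue", "p_value": 0.012, "power": float("nan")}],
    "covariate_balance": None,  # None when no balance checker was fitted
}

# allow_nan=False makes json.dumps fail loudly if any NaN slipped through.
payload = json.dumps(to_native(state), indent=2, allow_nan=False)
print(payload)
```

Passing `allow_nan=False` is a cheap way to verify that a payload is strict JSON before handing it to another system.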

Source code in src\xpyrment\analyze\orchestrator.py
def to_dict(self) -> dict:
    """Converts the complete analysis results and metadata to a robust, serializable dictionary.

    Includes significance thresholds (alpha), raw metric outcomes, and covariate balance diagnostics if present.

    Returns:
        dict: A nested dictionary with native Python types, guaranteed to be JSON serializable.
    """
    from xpyrment.core.serialization import make_serializable

    state = {
        "alpha": self.alpha,
        "metrics": self.raw_results,
        "covariate_balance": (
            self.balance_checker.diagnostics_
            if (self.balance_checker is not None)
            else None
        ),
    }
    return make_serializable(state)

to_json

to_json(indent: Optional[int] = None) -> str

Converts the analysis results into a standardized, portable JSON string.

PARAMETER DESCRIPTION
indent

If provided, formats the JSON string with this indentation level.

TYPE: Optional[int] DEFAULT: None

RETURNS DESCRIPTION
str

Standardized JSON representation of the analysis results.

TYPE: str

Source code in src\xpyrment\analyze\orchestrator.py
def to_json(self, indent: Optional[int] = None) -> str:
    """Converts the analysis results into a standardized, portable JSON string.

    Args:
        indent (Optional[int]): If provided, formats the JSON string with this indentation level.

    Returns:
        str: Standardized JSON representation of the analysis results.
    """
    from xpyrment.core.serialization import serialize_to_json

    return serialize_to_json(self.to_dict(), indent=indent)

summary

summary(formatted: bool = True) -> DataFrame

Returns a summarized, human-readable DataFrame of the analysis.

Formats raw numeric statistics (standard errors, differences, variances) into readable percentage lifts, relative confidence intervals, power indicators, and significance star symbols.

Significance Star Mapping
  • *** : \(p < 0.001\) (Highly significant)
  • ** : \(p < 0.01\) (Significant)
  • * : \(p < 0.05\) (Significant)
  • No star : \(p \ge 0.05\) (Not statistically significant at the nominal level \(\alpha=0.05\))
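The mapping above can be sketched as a plain function. This is an illustrative equivalent, not the library's code (the source on this page uses a vectorized `np.select`), and `significance_stars` is a hypothetical name. As in the source, the cut-offs are fixed constants and do not change with `alpha`.

```python
import math


def significance_stars(p_value):
    """Return the star suffix for a p-value; missing or NaN values get no stars."""
    if p_value is None or math.isnan(p_value):
        return ""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


for p in (0.0004, 0.004, 0.04, 0.4):
    # Same "{:.4f}" + stars layout as the formatted p-value column.
    print(f"{p:.4f}{significance_stars(p)}")
```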
PARAMETER DESCRIPTION
formatted

If True, returns nicely formatted strings for display (with percentage symbols, stars, and bracketed intervals). If False, returns the raw numeric values. Defaults to True.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
DataFrame

A pandas DataFrame with one summary row per analyzed metric.

TYPE: DataFrame
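The display strings in the formatted summary rely on standard Python format specifications. The sketch below shows, with made-up numbers, how a relative lift and its interval turn into the signed-percentage and bracketed forms described here. The helpers `format_lift` and `format_interval` are hypothetical and operate on scalars, whereas the library applies the same format strings column-wise.

```python
def format_lift(value):
    """Format a relative lift as a signed percentage, or 'N/A' when missing."""
    return "N/A" if value is None else "{:+.2%}".format(value)


def format_interval(lower, upper):
    """Format a relative confidence interval as '[lower, upper]', or 'N/A' if incomplete."""
    if lower is None or upper is None:
        return "N/A"
    return "[" + format_lift(lower) + ", " + format_lift(upper) + "]"


print(format_lift(0.0312))              # +3.12%
print(format_interval(-0.004, 0.0664))  # [-0.40%, +6.64%]
print(format_interval(None, 0.0664))    # N/A
```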

Source code in src\xpyrment\analyze\orchestrator.py
def summary(self, formatted: bool = True) -> pd.DataFrame:
    r"""Returns a summarized, human-readable DataFrame of the analysis.

    Formats raw numeric statistics (standard errors, differences, variances) into readable
    percentage lifts, relative confidence intervals, power indicators, and significance star symbols.

    Significance Star Mapping:
        - `***` : $p < 0.001$ (Highly significant)
        - `**`  : $p < 0.01$ (Significant)
        - `*`   : $p < 0.05$ (Significant)
        - No star : $p \ge 0.05$ (Not statistically significant at the nominal level $\alpha=0.05$)

    Args:
        formatted (bool): If True, returns nicely formatted strings for display (with percentage symbols,
            stars, and bracketed intervals). If False, returns the raw numeric values. Defaults to True.

    Returns:
        pd.DataFrame: A pandas DataFrame with one summary row per analyzed metric.
    """
    df = self.df_raw.copy()

    if not formatted:
        return df

    summary_df = pd.DataFrame()
    summary_df["Metric"] = df["metric_name"]
    summary_df["Type"] = df["metric_type"]

    summary_df["Control Mean"] = df["control_mean"].map("{:.4f}".format)
    summary_df["Treatment Mean"] = df["treatment_mean"].map("{:.4f}".format)

    summary_df["Relative Lift"] = df["relative_lift"].map("{:+.2%}".format, na_action="ignore").fillna("N/A")

    ci_mask = df["rel_ci_lower"].isna() | df["rel_ci_upper"].isna()
    lower_str = df["rel_ci_lower"].map("{:+.2%}".format, na_action="ignore")
    upper_str = df["rel_ci_upper"].map("{:+.2%}".format, na_action="ignore")
    summary_df["95% CI (Rel)"] = np.where(ci_mask, "N/A", "[" + lower_str.astype(str) + ", " + upper_str.astype(str) + "]")

    p_val = df["p_value"]
    p_str = p_val.map("{:.4f}".format, na_action="ignore")
    sig_symbol = np.select([p_val < 0.001, p_val < 0.01, p_val < 0.05], ["***", "**", "*"], default="")
    summary_df["p-value"] = np.where(p_val.isna(), "N/A", p_str.astype(str) + sig_symbol)

    summary_df["Post-hoc Power"] = df["power"].map("{:.1%}".format, na_action="ignore").fillna("N/A")

    summary_df["CUPED"] = df["cuped_applied"].map({True: "Yes", False: "No"})

    # Previously, the logic was: `if not cuped: return "-"`.
    # In python, `bool(np.nan)` is True, while `bool(None)` or `bool(False)` is False.
    # Using `.astype(bool)` exactly replicates this prior behavior.
    cuped_applied = df["cuped_applied"].astype(bool)
    var_red_str = df["variance_reduction"].map("{:.1%}".format, na_action="ignore").fillna("-")
    summary_df["Var Reduction"] = np.where(cuped_applied, var_red_str, "-")

    # Automatically raise an alert / print warning if covariate imbalance is detected
    if self.balance_checker is not None and self.balance_checker.diagnostics_:
        imbalanced = []
        for name, stats in self.balance_checker.diagnostics_.items():
            if abs(stats["smd"]) > 0.1:
                imbalanced.append(f"'{name}' (SMD={stats['smd']:+.4f})")
        if imbalanced:
            warnings.warn(
                f"COVARIATE IMBALANCE DETECTED: The following baseline covariates have standardized mean "
                f"differences (SMD) exceeding the standard 0.1 threshold: {', '.join(imbalanced)}. "
                f"Consider running CUPED adjustments, matching, or checking your randomization procedure."
            )

    return summary_df

plot

plot(**kwargs: Any) -> Any

Generates and returns a forest plot of the relative metric lifts and confidence intervals.

Passes the raw results DataFrame (df_raw) and alpha to plot_forest in the reporting module.

PARAMETER DESCRIPTION
**kwargs

Plot customization arguments forwarded to plot_forest (e.g., figure size, colors).

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
Any

matplotlib.axes.Axes or plotly.graph_objects.Figure: The generated relative lift forest plot.

Source code in src\xpyrment\analyze\orchestrator.py
def plot(self, **kwargs: Any) -> Any:
    """Generates and returns a forest plot of the relative metric lifts and confidence intervals.

    Passes the raw results DataFrame (`df_raw`) and `alpha` to `plot_forest` in the reporting module.

    Args:
        **kwargs: Plot customization arguments forwarded to `plot_forest` (e.g., figure size, colors).

    Returns:
        matplotlib.axes.Axes or plotly.graph_objects.Figure: The generated relative lift forest plot.
    """
    # Re-routed to the reporting/export layer dynamically
    from xpyrment.report.export import plot_forest

    return plot_forest(self.df_raw, alpha=self.alpha, **kwargs)

run_analysis

run_analysis(
    experiment: Experiment,
    control: str = "control",
    treatment: str = "treatment",
    alpha: float = 0.05,
    multi_test_correction: Optional[str] = None,
    covariates: Optional[List[str]] = None,
) -> AnalysisResult

Executes the statistical analysis across all registered metrics in an Experiment container.

Iterates over each registered metric in the experiment and calculates means, relative lifts, p-values, confidence intervals, and power. If a correction method is requested and more than one metric was analyzed, it adjusts the p-values for multiple testing. It then updates the experiment state to ANALYZED and returns a structured AnalysisResult.
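To illustrate what a multiple testing correction does to a list of p-values, the sketch below implements two of the listed options, Bonferroni and Holm, in plain Python. It is not the code behind `apply_multiple_testing_correction`, whose implementation is not shown on this page; the function names and example p-values are assumptions made for illustration.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


p_values = [0.01, 0.04, 0.03]  # made-up p-values for three metrics
print(bonferroni(p_values))
print(holm(p_values))
```

Holm is never more conservative than Bonferroni: here the largest adjusted p-value is about 0.06 under Holm versus 0.12 under Bonferroni.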

PARAMETER DESCRIPTION
experiment

The initialized, pre-registered experiment setup container.

TYPE: Experiment

control

The label of the control variant in the treatment column. Defaults to "control".

TYPE: str DEFAULT: 'control'

treatment

The label of the treatment variant in the treatment column. Defaults to "treatment".

TYPE: str DEFAULT: 'treatment'

alpha

Significance level (Type I error probability) for confidence intervals. Defaults to 0.05.

TYPE: float DEFAULT: 0.05

multi_test_correction

Multiple testing correction algorithm to apply across the registered metrics. Options: "bonferroni", "holm", "fdr_bh", "fdr_by", "hochberg". Defaults to None.

TYPE: Optional[str] DEFAULT: None

covariates

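List of covariate column names to check for balance and to search for CUPED pre-period columns. If None, the experiment's own covariates are used.

TYPE: Optional[List[str]] DEFAULT: None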

RETURNS DESCRIPTION
AnalysisResult

A rich, summarized results container.

TYPE: AnalysisResult

RAISES DESCRIPTION
ValueError

If no metrics have been registered, or if control/treatment labels are missing from the active dataset.

PhaseOrderError

If the experiment is in an invalid state for running analysis.
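When covariates are supplied, the source below looks for a pre-period column for each `MeanMetric` that does not already define one, using three naming conventions. The sketch isolates that lookup as a standalone function; `resolve_pre_period_col` is a hypothetical name, and the column names in the example are made up.

```python
def resolve_pre_period_col(value_col, covariates, data_columns):
    """Return the first conventional pre-period column that is both a listed
    covariate and present in the data, or None if nothing matches."""
    candidates = [
        f"pre_{value_col}",
        f"{value_col}_pre",
        f"{value_col}_baseline",
    ]
    for cand in candidates:
        if cand in covariates and cand in data_columns:
            return cand
    return None


columns = ["variant", "revenue", "revenue_pre", "age"]
covariates = ["revenue_pre", "age"]

print(resolve_pre_period_col("revenue", covariates, columns))  # revenue_pre
print(resolve_pre_period_col("clicks", covariates, columns))   # None
```

A column must satisfy both conditions: a matching name that exists in the data but is not listed as a covariate is ignored.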

Source code in src\xpyrment\analyze\orchestrator.py
def run_analysis(
    experiment: Experiment,
    control: str = "control",
    treatment: str = "treatment",
    alpha: float = 0.05,
    multi_test_correction: Optional[str] = None,
    covariates: Optional[List[str]] = None,
) -> AnalysisResult:
    """Executes the statistical analysis across all registered metrics in an Experiment container.

    Iterates over each registered metric in the experiment and calculates means, relative lifts, p-values,
    confidence intervals, and power. If a correction method is requested and more than one metric was
    analyzed, adjusts the p-values for multiple testing. It then updates the experiment state to
    `ANALYZED` and returns a structured `AnalysisResult`.

    Args:
        experiment (Experiment): The initialized, pre-registered experiment setup container.
        control (str): The label of the control variant in the treatment column. Defaults to `"control"`.
        treatment (str): The label of the treatment variant in the treatment column. Defaults to `"treatment"`.
        alpha (float): Significance level (Type I error probability) for confidence intervals. Defaults to 0.05.
        multi_test_correction (str, optional): Multiple testing correction algorithm to apply across the
            registered metrics. Options: `"bonferroni"`, `"holm"`, `"fdr_bh"`, `"fdr_by"`, `"hochberg"`. Defaults to None.
        covariates (List[str], optional): Covariate column names to check for balance and to search for
            CUPED pre-period columns. If None, the experiment's own covariates are used.

    Returns:
        AnalysisResult: A rich, summarized results container.

    Raises:
        ValueError: If no metrics have been registered, or if control/treatment labels are missing from
            the active dataset.
        PhaseOrderError: If the experiment is in an invalid state for running analysis.
    """
    unique_variants = experiment.data[experiment.treatment_col].unique()
    if control not in unique_variants:
        raise ValueError(f"Control label '{control}' not found.")
    if treatment not in unique_variants:
        raise ValueError(f"Treatment label '{treatment}' not found.")

    # 1. Topological sorted evaluation of MetricRegistry DAG if present
    if getattr(experiment, "metric_registry", None) is not None:
        registry = experiment.metric_registry
        raw_inputs = {}
        for node_name, node_info in registry.nodes.items():
            if node_info["type"] == "raw" and node_name in experiment.data.columns:
                raw_inputs[node_name] = experiment.data[node_name].to_numpy()

        evaluated_cache = registry.evaluate(raw_inputs)
        for key, val in evaluated_cache.items():
            if (
                key not in experiment.data.columns
                or registry.nodes.get(key, {}).get("type") == "derived"
            ):
                if len(val) == len(experiment.data):
                    experiment.data[key] = val

        # Auto-populate metrics from DAG if metrics list is currently empty
        if not experiment.metrics:
            from xpyrment.metrics.taxonomy import MeanMetric

            for name in evaluated_cache.keys():
                experiment.metrics.append(MeanMetric(name, value_col=name))

    if not experiment.metrics:
        raise ValueError("No metrics have been added to the experiment.")

    # Resolve global and method-specific covariates
    covs_to_check = (
        covariates if covariates is not None else getattr(experiment, "covariates", [])
    )

    # 2. Automated Covariate-adjusted CUPED routing
    if covs_to_check:
        for metric in experiment.metrics:
            from xpyrment.metrics.taxonomy import MeanMetric

            if isinstance(metric, MeanMetric) and not getattr(
                metric, "pre_period_col", None
            ):
                possible_candidates = [
                    f"pre_{metric.value_col}",
                    f"{metric.value_col}_pre",
                    f"{metric.value_col}_baseline",
                ]
                for cand in possible_candidates:
                    if cand in covs_to_check and cand in experiment.data.columns:
                        metric.pre_period_col = cand
                        break

    # 3. Covariate imbalance checking in a single call
    balance_checker = None
    if covs_to_check:
        valid_covs = [c for c in covs_to_check if c in experiment.data.columns]
        if valid_covs:
            from xpyrment.quasi.balance import CovariateBalanceChecker

            sub_df = experiment.data[
                experiment.data[experiment.treatment_col].isin([control, treatment])
            ].dropna(subset=valid_covs)
            if len(sub_df) > 0:
                X = sub_df[valid_covs].to_numpy()
                T = (
                    (sub_df[experiment.treatment_col] == treatment)
                    .astype(int)
                    .to_numpy()
                )
                balance_checker = CovariateBalanceChecker(covariate_names=valid_covs)
                balance_checker.fit(X, T)

    # 4. Statistical Evaluation of each Metric
    results = []
    for metric in experiment.metrics:
        res = metric.calculate(
            experiment.data,
            treatment_col=experiment.treatment_col,
            control=control,
            treatment=treatment,
            alpha=alpha,
        )
        results.append(res)

    # Apply multiple testing corrections if requested
    if multi_test_correction and len(results) > 1:
        p_vals = [res["p_value"] for res in results]
        adjusted_p = apply_multiple_testing_correction(
            p_vals, alpha=alpha, method=multi_test_correction
        )
        for i, val in enumerate(adjusted_p):
            results[i]["p_value"] = val

    experiment.transition_to(ExperimentState.ANALYZED)
    return AnalysisResult(results, alpha=alpha, balance_checker=balance_checker)

setup

setup(
    data: DataFrame,
    treatment_col: str,
    id_col: Optional[str] = None,
    covariates: Optional[List[str]] = None,
) -> Experiment

Initializes the experimental setup container, serving as the library's primary entrypoint.

Sets up the Experiment object with the target dataset, identifying variant and unit columns, and locks the state machine to ExperimentState.DESIGNED.

PARAMETER DESCRIPTION
data

The main experiment dataset containing exposure logs and outcomes.

TYPE: DataFrame

treatment_col

Column name containing variant strings (e.g., "variant").

TYPE: str

id_col

Column name containing unique unit identifiers (e.g., "user_id").

TYPE: Optional[str] DEFAULT: None

covariates

Optional list of baseline covariates.

TYPE: Optional[List[str]] DEFAULT: None

RETURNS DESCRIPTION
Experiment

A state-gated Experiment orchestrator instance, ready for metric registration and planning.

TYPE: Experiment
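The phrase "state-gated" means the experiment only accepts operations that are valid for its current phase. The sketch below illustrates that pattern in isolation. It is not the library's `Experiment` class: only the names `DESIGNED`, `ANALYZED`, `transition_to`, and `PhaseOrderError` appear on this page, while the `State` enum, the `GatedExperiment` class, and the transition table are assumptions made for illustration (the real state machine may define more phases and different rules).

```python
from enum import Enum


class State(Enum):
    DESIGNED = "designed"
    ANALYZED = "analyzed"


class PhaseOrderError(RuntimeError):
    """Raised when a transition is not allowed from the current state."""


# Assumed transition table, for illustration only.
ALLOWED = {
    State.DESIGNED: {State.ANALYZED},
    State.ANALYZED: {State.ANALYZED},
}


class GatedExperiment:
    """Toy container that starts in DESIGNED and validates every transition."""

    def __init__(self):
        self.state = State.DESIGNED

    def transition_to(self, target):
        if target not in ALLOWED[self.state]:
            raise PhaseOrderError(
                f"Cannot move from {self.state.name} to {target.name}."
            )
        self.state = target


exp = GatedExperiment()
exp.transition_to(State.ANALYZED)
print(exp.state.name)
```

Centralizing the check in `transition_to` keeps each phase function free of its own ordering logic.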

Source code in src\xpyrment\analyze\orchestrator.py
def setup(
    data: pd.DataFrame,
    treatment_col: str,
    id_col: Optional[str] = None,
    covariates: Optional[List[str]] = None,
) -> Experiment:
    """Initializes the experimental setup container, serving as the library's primary entrypoint.

    Sets up the `Experiment` object with the target dataset, identifying variant and unit columns,
    and locks the state machine to `ExperimentState.DESIGNED`.

    Args:
        data (pd.DataFrame): The main experiment dataset containing exposure logs and outcomes.
        treatment_col (str): Column name containing variant strings (e.g., `"variant"`).
        id_col (str, optional): Column name containing unique unit identifiers (e.g., `"user_id"`).
        covariates (List[str], optional): Optional list of baseline covariates.

    Returns:
        Experiment: A state-gated `Experiment` orchestrator instance, ready for metric registration and planning.
    """
    print("==========================================")
    print("      Initializing xpyrment Setup         ")
    print("==========================================")
    print(f"Total rows in dataset:  {len(data)}")
    print(f"Treatment column:      {treatment_col}")
    if id_col:
        print(f"ID column:             {id_col}")

    exp = Experiment(data, treatment_col, id_col, covariates=covariates)
    return exp