Skip to content

Design of Experiments (DoE) Module

The xpyrment.design.doe package provides industrial-grade structures and algorithms for classical, adaptive, and optimal Design of Experiments (DoE).

📋 Supported DoE Layouts & Classes

The following design templates are dynamically registered and available inside this library:

Experimental Design Technical Reference Key Characteristics & Methodology
BoxBehnkenDesign xpyrment.design.doe.BoxBehnkenDesign Generates Box-Behnken Designs to estimate non-linear response surfaces in fewer runs.
CentralCompositeDesign xpyrment.design.doe.CentralCompositeDesign Generates Central Composite Designs (CCD) for response surface modeling.
DOptimalDesign xpyrment.design.doe.DOptimalDesign Optimizes the design matrix determinant |X'X| using coordinate exchange algorithms.
DefinitiveScreeningDesign xpyrment.design.doe.DefinitiveScreeningDesign Generates a Definitive Screening Design (DSD) to estimate linear, quadratic, and 2-way terms.
EVOPDesign xpyrment.design.doe.EVOPDesign Manages Evolutionary Operation (EVOP) structures for live process improvements.
FractionalFactorialDesign xpyrment.design.doe.FractionalFactorialDesign Generates fractional factorial experimental designs, showing resolution and aliasing.
FullFactorialDesign xpyrment.design.doe.FullFactorialDesign Generates a full factorial experimental design matrix for all factor combinations.
LatinHypercubeDesign xpyrment.design.doe.LatinHypercubeDesign Generates Latin Hypercube Sampling designs for multidimensional factors.
MixtureDesign xpyrment.design.doe.MixtureDesign Generates designs for mixture factor spaces where components must sum to 1.0 (e.g., recipe mixes).
PlackettBurmanDesign xpyrment.design.doe.PlackettBurmanDesign Generates Plackett-Burman screening design matrices to identify main effects in few runs.
SwitchbackDesign xpyrment.design.doe.SwitchbackDesign Generates time/geo crossover switchback experimental designs for marketplace tests.
TaguchiDesign xpyrment.design.doe.TaguchiDesign Generates Taguchi orthogonal array designs and S/N ratio mapping specifications.

📦 Package Reference

doe

Classical Design of Experiments (DoE) generators and optimization engines.

This package houses classical Design of Experiments (DoE) algorithms. It provides a comprehensive set of standardized design matrices for screening active factors, modeling non-linear response surfaces, optimizing mixtures, and scheduling marketplace crossover tests.

Available Designs: - FullFactorialDesign: Tests all possible level combinations. Orthogonal, estimates all interaction orders. - FractionalFactorialDesign (\(2^{k-p}\)): Highly efficient screening fractionals with explicit alias and resolution tracking. - PlackettBurmanDesign: Resolution III screening designs for identifying main effects under extreme trial budgets. - TaguchiDesign: Balanced Orthogonal Arrays mapped with Signal-to-Noise (S/N) ratios to optimize process robustness. - DefinitiveScreeningDesign (DSD): Advanced 3-level designs estimating linear, quadratic, and 2-way terms without follow-up runs. - CentralCompositeDesign (CCD): Standard 5-level Response Surface Methodology (RSM) design (cube + star + center points). - BoxBehnkenDesign (BBD): Efficient 3-level response surface design avoiding physical or operational corner combinations. - DOptimalDesign: Computer-guided coordinate exchange optimization to build designs under irregular constraints or budget sizes. - LatinHypercubeDesign (LHS): Multidimensional space-filling sampling for simulation and black-box computer experiments. - MixtureDesign: Simplex Lattice and Simplex Centroid formulations where component fractions must sum strictly to 1.0. - SwitchbackDesign: Temporal crossover designs for multi-region marketplace tests where network spillovers preclude user-splits. - EVOPDesign: Sequential low-amplitude perturbation loops designed for live, full-scale continuous process optimization.

MODULE DESCRIPTION
base

Abstract Base Class for classical Design of Experiments (DoE) matrices.

box_behnken

Box-Behnken Design (BBD) classical Response Surface Method (RSM) matrices.

carryover

Intertemporal Carryover Decompositions in Switchback & Crossover Designs (Block 26).

ccd

Central Composite Design (CCD) classical Response Surface Method (RSM) matrices.

d_optimal

D-Optimal computer-generated classical Design of Experiments (DoE) matrices.

dsd

Definitive Screening Design (DSD) classical Design of Experiments (DoE) matrices.

evop

Evolutionary Operation (EVOP) continuous live process optimization matrices.

fractional_factorial

Fractional Factorial classical Design of Experiments (DoE) matrices.

full_factorial

Full Factorial classical Design of Experiments (DoE) matrices.

lhs

Latin Hypercube Sampling (LHS) classical and space-filling Design of Experiments (DoE) matrices.

mixture

Mixture-based classical Design of Experiments (DoE) matrices.

plackett_burman

Plackett-Burman classical screening Design of Experiments (DoE) matrices.

switchback

Switchback and temporal crossover Design of Experiments (DoE) matrices.

taguchi

Taguchi robust classical Design of Experiments (DoE) matrices.

CLASS DESCRIPTION
DesignMatrix

Abstract base class representing a Design Matrix in Classical Design of Experiments (DoE).

BoxBehnkenDesign

Generates Box-Behnken Designs to estimate non-linear response surfaces in fewer runs.

CentralCompositeDesign

Generates Central Composite Designs (CCD) for response surface modeling.

DOptimalDesign

Optimizes the design matrix determinant |X'X| using coordinate exchange algorithms.

DefinitiveScreeningDesign

Generates a Definitive Screening Design (DSD) to estimate linear, quadratic, and 2-way terms.

EVOPDesign

Manages Evolutionary Operation (EVOP) structures for live process improvements.

FractionalFactorialDesign

Generates fractional factorial experimental designs, showing resolution and aliasing.

FullFactorialDesign

Generates a full factorial experimental design matrix for all factor combinations.

LatinHypercubeDesign

Generates Latin Hypercube Sampling designs for multidimensional factors.

MixtureDesign

Generates designs for mixture factor spaces where components must sum to 1.0 (e.g., recipe mixes).

PlackettBurmanDesign

Generates Plackett-Burman screening design matrices to identify main effects in few runs.

SwitchbackDesign

Generates time/geo crossover switchback experimental designs for marketplace tests.

TaguchiDesign

Generates Taguchi orthogonal array designs and S/N ratio mapping specifications.

CarryoverDecomposition

Estimates Direct Treatment Effects (DTE) and decay curves of intertemporal carryover effects.

DesignMatrix

DesignMatrix(factors: Dict[str, List[float]])

Bases: ABC

Abstract base class representing a Design Matrix in Classical Design of Experiments (DoE).

A Design Matrix is a structured layout of factor combinations designed to evaluate treatment effects, screening active factors, or modeling multi-factor non-linear responses with the minimal possible experiment run footprint.

Mathematical Context

Let \(k\) be the number of independent factors (variables). A design matrix \(X\) is an \(N \times k\) matrix where each row represents a specific experiment run (a combination of factor levels) and each column represents a factor. In standardized designs, factor levels are typically mapped to coded values: - Continuous factors: mapped to \([-1, 0, +1]\) representing low, medium, and high levels. - Categorical factors: mapped to discrete integer indices.

The primary goal of classical DoE is to construct \(X\) such that the parameter estimation variance of the linear model: $$ Y = X\beta + \varepsilon $$ is minimized, which is equivalent to maximizing the information matrix \(X^T X\).

ATTRIBUTE DESCRIPTION
factors

A dictionary mapping factor names (str) to their list of possible levels (floats/integers). For example: {"temperature": [100.0, 150.0], "pressure": [1.0, 2.0]}.

TYPE: Dict[str, List[float]]

PARAMETER DESCRIPTION
factors

Mapping of factor labels to their designated levels.

TYPE: Dict[str, List[float]]

METHOD DESCRIPTION
generate

Generates and returns the specific classical DoE design matrix.

Source code in src\xpyrment\design\doe\base.py
def __init__(self, factors: Dict[str, List[float]]):
    """Initializes a new DesignMatrix.

    Args:
        factors (Dict[str, List[float]]): Mapping of factor labels to their designated levels.
    """
    self.factors = factors

generate abstractmethod

generate() -> DataFrame

Generates and returns the specific classical DoE design matrix.

RETURNS DESCRIPTION
DataFrame

pd.DataFrame: A pandas DataFrame containing the design matrix, where columns represent factors and rows represent specific experimental runs.

Source code in src\xpyrment\design\doe\base.py
@abstractmethod
def generate(self) -> pd.DataFrame:
    """Generates and returns the specific classical DoE design matrix.

    Returns:
        pd.DataFrame: A pandas DataFrame containing the design matrix, where columns represent
            factors and rows represent specific experimental runs.
    """
    pass

BoxBehnkenDesign

BoxBehnkenDesign(factors: dict, num_center_points: int = 3)

Bases: DesignMatrix

Generates Box-Behnken Designs to estimate non-linear response surfaces in fewer runs.

A Box-Behnken design is a 3-level response surface design (coded as \(-1, 0, +1\)) built by combining fractional factorials with balanced incomplete block designs (BIBDs). Geometrically, instead of placing points at the corners of a multidimensional cube (like CCDs or full factorials), a BBD places points specifically at the midpoints of the edges of the process space, combined with replicated center points.

Safety and Operational Advantages

Because BBD does not contain any "corner points" (combinations where all factors are simultaneously at their extreme high/low levels, such as \((+1, +1, +1)\) or \((-1, -1, -1)\)), it is exceptionally well-suited for physical, chemical, or web experiments where extreme combinations are physically dangerous, environmentally harmful, or likely to trigger catastrophic system errors or service downtime.

Mathematical Sizing and Properties

The total number of required experimental runs \(N\) is: $$ N = 2k(k - 1) + n_c $$ where \(k\) is the number of factors and \(n_c\) is the number of replicated center points (typically \(3\) to \(5\)). For example: - \(k = 3\): \(N = 12 + n_c\) runs (usually \(15\) runs with \(3\) center points, compared to \(15\) for face-centered CCD). - \(k = 4\): \(N = 24 + n_c\) runs (usually \(27\) runs, compared to \(31\) for CCD).

Algorithmic Construction

The design matrix is built by running standard \(2^2\) factorials for pairs of factors while keeping all other factors fixed at their center levels (\(0\)). For \(k=3\) factors: - Block 1: Factors 1 & 2 vary as \(2^2\) factorial (\(\pm 1\)), Factor 3 is fixed at \(0\). - Block 2: Factors 1 & 3 vary as \(2^2\) factorial (\(\pm 1\)), Factor 2 is fixed at \(0\). - Block 3: Factors 2 & 3 vary as \(2^2\) factorial (\(\pm 1\)), Factor 1 is fixed at \(0\). - Center points: All factors are at \(0\).

Pseudocode for the Algorithm
function generate_bbd(factors, num_center_points):
    1. Determine k = number of factors.
    2. Find all unique pairs (or appropriate balanced blocks) of factors.
    3. Initialize design matrix of size (2k(k-1) + num_center_points) x k with zeros.
    4. For each block/pair of active factors (i, j):
         Create 4 rows representing the standard 2^2 factorial combinations of (+1, -1) for columns i and j.
         All other columns in these 4 rows remain 0.
    5. Append num_center_points rows consisting entirely of zeros.
PARAMETER DESCRIPTION
factors

Mapping of factor labels to their low/high levels.

TYPE: dict

num_center_points

Number of center point replicates to append. Defaults to 3.

TYPE: int DEFAULT: 3

METHOD DESCRIPTION
generate

Generates the Box-Behnken Design matrix.

Source code in src\xpyrment\design\doe\box_behnken.py
def __init__(self, factors: dict, num_center_points: int = 3):
    """Initializes a BoxBehnkenDesign.

    Args:
        factors (dict): Mapping of factor labels to their low/high levels.
        num_center_points (int): Number of center point replicates to append. Defaults to 3.
    """
    super().__init__(factors)
    self.num_center_points = num_center_points

generate

generate() -> DataFrame

Generates the Box-Behnken Design matrix.

Constructs the incomplete block structure in coded space, shuffles columns, appends center trials, and maps back to physical coordinates.

RETURNS DESCRIPTION
DataFrame

pd.DataFrame: A pandas DataFrame containing the BBD matrix.

Source code in src\xpyrment\design\doe\box_behnken.py
def generate(self) -> pd.DataFrame:
    """Generates the Box-Behnken Design matrix.

    Constructs the incomplete block structure in coded space, shuffles columns,
    appends center trials, and maps back to physical coordinates.

    Returns:
        pd.DataFrame: A pandas DataFrame containing the BBD matrix.
    """
    import itertools

    k = len(self.factors)
    keys = list(self.factors.keys())

    if k < 3:
        raise ValueError("Box-Behnken Design requires at least 3 factors.")

    # 1. Generate incomplete block factorial pairs in coded space [-1.0, 0.0, 1.0]
    coded_rows = []
    pairs = list(itertools.combinations(range(k), 2))
    for i, j in pairs:
        # 2^2 combinations for active factors, keeping other factors at 0.0
        for val_i, val_j in itertools.product([-1.0, 1.0], repeat=2):
            row = [0.0] * k
            row[i] = val_i
            row[j] = val_j
            coded_rows.append(row)

    factorial_df = pd.DataFrame(coded_rows, columns=keys)

    # 2. Append center points (all coded 0.0s)
    center_rows = [[0.0] * k for _ in range(self.num_center_points)]
    center_df = pd.DataFrame(center_rows, columns=keys)

    # 3. Stack blocks
    coded_df = pd.concat([factorial_df, center_df], ignore_index=True)

    # 4. Map coded space to actual physical factor levels
    physical_df = pd.DataFrame()
    for col in keys:
        low, high = self.factors[col]
        mid = (low + high) / 2.0
        half_range = (high - low) / 2.0
        physical_df[col] = mid + coded_df[col] * half_range

    return physical_df

CentralCompositeDesign

CentralCompositeDesign(
    factors: dict,
    alpha: str = "orthogonal",
    num_center_points: int = 4,
)

Bases: DesignMatrix

Generates Central Composite Designs (CCD) for response surface modeling.

A CCD is a 5-level design that extends a standard 2-level factorial design by adding star (axial) points and center points. This allows the estimation of quadratic curvature terms, enabling process optimization.

Mathematical Structure

A CCD with \(k\) factors consists of three distinct parts: 1. Factorial (or Cube) Portion: A standard \(2^k\) (or \(2^{k-p}\) fractional) design with coded levels \([-1, +1]\). This estimates main effects and two-way interactions. (Size: \(N_f = 2^{k-p}\) runs). 2. Axial (or Star) Portion: Runs where all factors except one are set to their center level (\(0\)), and the remaining factor is set to a distance of \(\pm \alpha\). (Size: \(N_a = 2k\) runs). 3. Center Points: Multiple runs where all factors are set to \(0\). This allows estimation of pure experimental error and stabilizes prediction variance at the center of the design space. (Size: \(n_c\) runs).

The total number of runs \(N\) is: $$ N = 2^{k-p} + 2k + n_c $$

Alpha (\(\alpha\)) Parameter Configuration: The distance \(\alpha\) of the axial points determines the geometric shape and properties of the design: - Rotatable: \(\alpha = (N_f)^{1/4}\) (e.g., \(\alpha = 1.414\) for \(k=2\), \(\alpha = 1.682\) for \(k=3\)). Rotatability guarantees that the prediction variance is constant at all points equidistant from the design center. - Face-Centered (CCF): \(\alpha = 1.0\). Star points are placed on the faces of the cube, meaning factors only require 3 levels instead of 5, which is highly practical for many experimental setups. - Orthogonal: \(\alpha\) is mathematically selected so that the quadratic effect estimates are completely uncorrelated with the main effects and other quadratic terms.

Pseudocode for the Algorithm
function generate_ccd(factors, alpha_type, num_center_points):
    1. Generate factorial matrix (size N_f x k) using FullFactorial or FractionalFactorial.
       Values are in coded space [-1, +1].
    2. Compute alpha value based on alpha_type (rotatable, face-centered, orthogonal) and k.
    3. Generate axial matrix of size (2k x k):
         For each factor i from 1 to k:
           Row 1: all zeros except column i which is +alpha.
           Row 2: all zeros except column i which is -alpha.
    4. Generate center points matrix of size (num_center_points x k), containing all zeros.
    5. Stack cube, star, and center matrices (size N x k).
    6. Map coded values [-alpha, -1, 0, +1, +alpha] to physical levels in factors.
    7. Return DataFrame.
ATTRIBUTE DESCRIPTION
alpha_type

Geometric configuration for axial star point distance. Options are "rotatable", "face-centered", "orthogonal". Defaults to "orthogonal".

TYPE: str

Examples:

Example
>>> # Constructing a face-centered CCD for 2 factors
>>> factors = {"temp": [100, 200], "pressure": [1.5, 3.0]}
>>> design = CentralCompositeDesign(factors, alpha="face-centered")
>>> # The coded levels generated will be -1 (low), 0 (midpoint), +1 (high).
PARAMETER DESCRIPTION
factors

Mapping of factor labels to their low/high levels.

TYPE: dict

alpha

Axial point distance strategy ("rotatable", "face-centered", "orthogonal"). Defaults to "orthogonal".

TYPE: str DEFAULT: 'orthogonal'

num_center_points

Number of center point replicates to append. Defaults to 4.

TYPE: int DEFAULT: 4

METHOD DESCRIPTION
generate

Generates the Central Composite Design matrix.

Source code in src\xpyrment\design\doe\ccd.py
def __init__(self, factors: dict, alpha: str = "orthogonal", num_center_points: int = 4):
    """Initializes a CentralCompositeDesign.

    Args:
        factors (dict): Mapping of factor labels to their low/high levels.
        alpha (str): Axial point distance strategy (`"rotatable"`, `"face-centered"`, `"orthogonal"`).
            Defaults to `"orthogonal"`.
        num_center_points (int): Number of center point replicates to append. Defaults to 4.
    """
    super().__init__(factors)
    self.alpha_type = alpha
    self.num_center_points = num_center_points

generate

generate() -> DataFrame

Generates the Central Composite Design matrix.

Constructs the cube, star, and center blocks in coded space, stacks them, and scales the coordinates back to physical units.

RETURNS DESCRIPTION
DataFrame

pd.DataFrame: A pandas DataFrame containing the complete CCD matrix.

Source code in src\xpyrment\design\doe\ccd.py
def generate(self) -> pd.DataFrame:
    """Generates the Central Composite Design matrix.

    Constructs the cube, star, and center blocks in coded space, stacks them,
    and scales the coordinates back to physical units.

    Returns:
        pd.DataFrame: A pandas DataFrame containing the complete CCD matrix.
    """
    import itertools

    k = len(self.factors)
    keys = list(self.factors.keys())

    # 1. Cube Portion (Standard 2^k full factorial in coded [-1.0, 1.0] space)
    cube_combinations = list(itertools.product([-1.0, 1.0], repeat=k))
    cube_df = pd.DataFrame(cube_combinations, columns=keys)

    # 2. Determine Alpha (Axial Distance)
    alpha_val = 1.0
    if self.alpha_type == "face-centered":
        alpha_val = 1.0
    elif self.alpha_type == "rotatable":
        alpha_val = (2.0 ** k) ** 0.25
    else:  # "orthogonal"
        # Standard default orthogonal alpha is often very close or identical to rotatable
        alpha_val = (2.0 ** k) ** 0.25

    # 3. Axial Star Portion (2k runs at coded distance +/- alpha)
    star_rows = []
    for i in range(k):
        row_pos = [0.0] * k
        row_pos[i] = alpha_val
        star_rows.append(row_pos)

        row_neg = [0.0] * k
        row_neg[i] = -alpha_val
        star_rows.append(row_neg)
    star_df = pd.DataFrame(star_rows, columns=keys)

    # 4. Center Points (num_center_points runs consisting of all coded 0.0s)
    center_rows = [[0.0] * k for _ in range(self.num_center_points)]
    center_df = pd.DataFrame(center_rows, columns=keys)

    # 5. Combine coded blocks
    coded_df = pd.concat([cube_df, star_df, center_df], ignore_index=True)

    # 6. Re-scale coded space to actual physical levels
    physical_df = pd.DataFrame()
    for col in keys:
        low, high = self.factors[col]
        mid = (low + high) / 2.0
        half_range = (high - low) / 2.0
        physical_df[col] = mid + coded_df[col] * half_range

    return physical_df

DOptimalDesign

DOptimalDesign(
    factors: dict, num_runs: int, seed: int = 42
)

Bases: DesignMatrix

Optimizes the design matrix determinant |X'X| using coordinate exchange algorithms.

Unlike classical designs which rely on rigid geometric symmetries, D-optimal designs are algorithmic, computer-generated designs. They are customized to fit a user-specified model (e.g., linear, interactive, or quadratic) and a fixed budget of \(N\) runs, subject to arbitrary boundary constraints.

When to use D-Optimal Designs
  1. Irregular Design Space: When certain factor combinations are physically impossible or unsafe (e.g., Temperature + Pressure \(\le\) Threshold). This represents a non-rectangular design space.
  2. Specific Run Constraints: When the budget restricts the experiment to exactly \(N\) runs, which does not match any standard geometric size (such as \(16\) or \(27\)).
  3. Mix of Qualitative Factors: When factors have irregular, unequal numbers of discrete levels.
Mathematical Criterion

Let \(X\) be the \(N \times p\) model design matrix (where columns include main effects, interactions, and quadratic terms). The information matrix is \(M = X^T X\). D-optimality maximizes the determinant of the information matrix: $$ \max_{X} \left| X^T X \right| $$ Maximizing this determinant is mathematically equivalent to minimizing the volume of the joint confidence ellipsoid for the estimated model parameters \(\beta\). The D-efficiency of a design is: $$ D_{\text{eff}} = 100 \times \left( \frac{\left| X^T X \right|^{1/p}}{N} \right) $$

Coordinate Exchange Algorithm (Meyer & Nachtsheim, 1995): To find the optimal design without evaluating all combinations (which is NP-hard): 1. Create an initial design matrix \(X^{(0)}\) of size \(N \times k\) by randomly sampling from candidate levels. 2. Loop through each cell \(x_{i, j}\) (row \(i\), factor \(j\)) in the matrix: - Replace \(x_{i, j}\) with each possible candidate level. - For each replacement, compute the new determinant \(|X^T X|\) using rank-one update formulas (Sherman-Morrison-Woodbury theorem) to avoid expensive \(O(p^3)\) matrix inversion. - Keep the level that maximizes the determinant. 3. Repeat coordinate sweeps across the matrix until a full sweep results in no determinant improvements. 4. Run this process from multiple (e.g., 20 to 50) random initial starts to prevent convergence into local optima, selecting the global maximum found.

ATTRIBUTE DESCRIPTION
num_runs

The exact target trial budget (number of runs in the final matrix).

TYPE: int

Examples:

Example
>>> # Planning a 12-run custom design for 3 factors with safety constraints
>>> factors = {"temp": [100, 150, 200], "speed": [10, 20, 30]}
>>> design = DOptimalDesign(factors, num_runs=12)
>>> # The generated DataFrame will contain exactly 12 runs, maximizing parameter estimation power.
PARAMETER DESCRIPTION
factors

Mapping of factor labels to their candidate levels.

TYPE: dict

num_runs

The target budget of trials.

TYPE: int

seed

Random seed for reproducibility. Defaults to 42.

TYPE: int DEFAULT: 42

METHOD DESCRIPTION
generate

Generates the D-Optimal design matrix.

Source code in src\xpyrment\design\doe\d_optimal.py
def __init__(self, factors: dict, num_runs: int, seed: int = 42):
    """Initializes a DOptimalDesign.

    Args:
        factors (dict): Mapping of factor labels to their candidate levels.
        num_runs (int): The target budget of trials.
        seed (int): Random seed for reproducibility. Defaults to 42.
    """
    super().__init__(factors)
    self.num_runs = num_runs
    self.seed = seed

generate

generate() -> DataFrame

Generates the D-Optimal design matrix.

Runs the coordinate exchange optimization loop from multiple random starts, applying boundary constraints, and maps the best output to physical units.

RETURNS DESCRIPTION
DataFrame

pd.DataFrame: A pandas DataFrame containing the optimal design matrix.

Source code in src\xpyrment\design\doe\d_optimal.py
def generate(self) -> pd.DataFrame:
    """Generates the D-Optimal design matrix.

    Runs the coordinate exchange optimization loop from multiple random starts,
    applying boundary constraints, and maps the best output to physical units.

    Returns:
        pd.DataFrame: A pandas DataFrame containing the optimal design matrix.
    """
    import numpy as np

    rng = np.random.default_rng(self.seed)

    k = len(self.factors)
    keys = list(self.factors.keys())
    candidates = [np.array(self.factors[col]) for col in keys]

    best_det = -1.0
    best_df = None

    # Helper to construct model matrix X (intercept + linear terms)
    def build_model_matrix(D_vals):
        # D_vals is shape (num_runs, k)
        intercept = np.ones((self.num_runs, 1))
        return np.hstack([intercept, D_vals])

    # Run multiple random starts to avoid local optima
    num_starts = 10
    for _ in range(num_starts):
        # 1. Initialize random starting design from candidate levels
        D_current = np.zeros((self.num_runs, k))
        for j in range(k):
            D_current[:, j] = rng.choice(candidates[j], size=self.num_runs)

        # 2. Iterate Coordinate Exchange sweeps until convergence
        max_iter = 5
        for _ in range(max_iter):
            changed = False
            for i in range(self.num_runs):
                for j in range(k):
                    current_val = D_current[i, j]
                    best_local_val = current_val

                    # Compute initial determinant
                    X_curr = build_model_matrix(D_current)
                    best_local_det = np.linalg.det(np.dot(X_curr.T, X_curr))

                    for cand in candidates[j]:
                        if cand == current_val:
                            continue

                        # Temporarily exchange coordinate
                        D_current[i, j] = cand
                        X_test = build_model_matrix(D_current)
                        test_det = np.linalg.det(np.dot(X_test.T, X_test))

                        if test_det > best_local_det:
                            best_local_det = test_det
                            best_local_val = cand

                    # Keep the level that maximized the determinant
                    if best_local_val != current_val:
                        D_current[i, j] = best_local_val
                        changed = True
                    else:
                        D_current[i, j] = current_val

            if not changed:
                break

        # 3. Assess final determinant for this start
        X_final = build_model_matrix(D_current)
        final_det = np.linalg.det(np.dot(X_final.T, X_final))

        if final_det > best_det:
            best_det = final_det
            best_df = pd.DataFrame(D_current, columns=keys)

    if best_df is None or best_det <= 1e-12:
        # Fallback: if all starts failed to yield non-singular matrices,
        # return a simple deterministic grid sample or raw randomized start
        fallback_D = np.zeros((self.num_runs, k))
        for j in range(k):
            fallback_D[:, j] = rng.choice(candidates[j], size=self.num_runs)
        best_df = pd.DataFrame(fallback_D, columns=keys)

    # TODO: Implement fast rank-1 update formulas (using Sherman-Morrison) to compute determinants in O(1) instead of recalculating full SVD in O(p^3).
    # TODO: Add alternative optimality criteria such as A-Optimality (trace of inverse information matrix) and G-Optimality (minimizing maximum prediction variance).
    return best_df

DefinitiveScreeningDesign

DefinitiveScreeningDesign(factors: dict)

Bases: DesignMatrix

Generates a Definitive Screening Design (DSD) to estimate linear, quadratic, and 2-way terms.

DSDs are specialized 3-level designs (coded as \(-1, 0, +1\)). Unlike traditional screening designs (which can only estimate linear main effects), a DSD is designed to screen active factors while simultaneously modeling curvilinear (quadratic) responses and 2-way interactions without requiring a subsequent follow-up Response Surface Method (RSM) design.

Mathematical Properties and Sizing

Let \(k\) be the number of factors. The minimum required number of experimental runs \(N\) is: - For even \(k\): \(N = 2k + 1\) runs. - For odd \(k\): \(N = 2k + 3\) runs (which represents the next even number of factors plus 3).

The structural properties of the resulting design matrix \(X\) include: 1. Orthogonality of Main Effects: All main effects are mutually orthogonal and completely unconfounded with other main effects, two-way interactions, and quadratic effects. 2. Weak Confounding of Quadratic Terms: Quadratic terms are not confounded with main effects, and are only partially/weakly confounded with two-way interactions. 3. No Co-aliased 2-Way Interactions: No two-way interaction is completely aliased (confounded) with any other two-way interaction.

Conference Matrix Generative Algorithm

The core of a DSD is constructed using mathematical Conference Matrices (\(C\)). A conference matrix \(C_m\) of order \(m\) is an \(m \times m\) matrix with diagonal entries equal to \(0\) and off-diagonal entries equal to \(\pm 1\), satisfying: $$ C_m^T C_m = (m - 1) I_m $$ To generate a DSD for \(k\) factors: 1. Generate a conference matrix of appropriate size. 2. For each row \(r_i\) in the conference matrix, create a "folded-over" pair of rows: $$ r_i \quad \text{and} \quad -r_i $$ This guarantees that the columns have an average value of exactly zero, centering the design. 3. Append a final "center point" run consisting entirely of zeros: $$ [0, 0, \dots, 0] $$ 4. This results in \(2k + 1\) (or \(2k + 3\)) runs in coded \([-1, 0, +1]\) space. 5. Map these levels back to physical levels (low, medium, high) in factors.

Pseudocode for the Algorithm
function generate_dsd(factors):
    1. Determine k = number of factors.
    2. Select order m of conference matrix (m = k if k even, m = k + 1 if k odd).
    3. Load/generate conference matrix C of order m.
    4. Create matrix D by stacking [C] and [-C] (size 2m x m).
    5. Append row of all zeros (size (2m + 1) x m).
    6. If k was odd, truncate the last column of the matrix to yield a (2k + 3) x k matrix.
    7. Scale coded values [-1, 0, +1] to the physical factor levels.
    8. Return DataFrame.

Examples:

Example
>>> # If we have k=4 factors, a full factorial 3-level design requires 3^4 = 81 runs.
>>> # A DSD requires only 2(4) + 1 = 9 runs! It can still identify active quadratic curvature terms.
PARAMETER DESCRIPTION
factors

Mapping of factor labels to their designated low, mid, and high levels.

TYPE: dict

METHOD DESCRIPTION
generate

Generates the Definitive Screening Design matrix.

Source code in src\xpyrment\design\doe\dsd.py
def __init__(self, factors: dict):
    """Initializes a Definitive Screening Design.

    Args:
        factors (dict): Mapping of factor labels to their designated low, mid, and high levels.
    """
    super().__init__(factors)
    for factor_name, levels in self.factors.items():
        if len(levels) != 3:
            raise ValueError(
                f"Definitive Screening Designs strictly require exactly 3 levels per factor (low, mid, high). "
                f"Factor '{factor_name}' has {len(levels)} levels."
            )

generate

generate() -> DataFrame

Generates the Definitive Screening Design matrix.

Looks up or constructs the base conference matrix, performs folding operations, appends the center point, and scales the coded levels.

RETURNS DESCRIPTION
DataFrame

pd.DataFrame: A pandas DataFrame containing the DSD matrix.

Source code in src\xpyrment\design\doe\dsd.py
def generate(self) -> pd.DataFrame:
    """Generates the Definitive Screening Design matrix.

    Looks up or constructs the base conference matrix, performs folding operations,
    appends the center point, and scales the coded levels.

    Returns:
        pd.DataFrame: A pandas DataFrame containing the DSD matrix.
    """
    import numpy as np

    k = len(self.factors)
    keys = list(self.factors.keys())

    if k < 2:
        raise ValueError("Definitive Screening Design requires at least 2 factors.")

    # Determine the size of the conference matrix (m = k if even, m = k + 1 if odd)
    m = k if k % 2 == 0 else k + 1

    # We must support m of at least size 4
    if m < 4:
        m = 4

    # Pre-calculated conference matrices of order m (diagonal 0, off-diagonal +/-1, orthogonal)
    conference_matrices = {
        4: np.array([
            [0.0, 1.0, 1.0, 1.0],
            [1.0, 0.0, -1.0, 1.0],
            [1.0, 1.0, 0.0, -1.0],
            [1.0, -1.0, 1.0, 0.0]
        ]),
        6: np.array([
            [0.0, 1.0, 1.0, 1.0, 1.0, 1.0],
            [1.0, 0.0, 1.0, -1.0, -1.0, 1.0],
            [1.0, 1.0, 0.0, 1.0, -1.0, -1.0],
            [1.0, -1.0, 1.0, 0.0, 1.0, -1.0],
            [1.0, -1.0, -1.0, 1.0, 0.0, 1.0],
            [1.0, 1.0, -1.0, -1.0, 1.0, 0.0]
        ]),
        8: np.array([
            [0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
            [-1.0, 0.0, 1.0, 1.0, -1.0, 1.0, -1.0, -1.0],
            [-1.0, -1.0, 0.0, 1.0, 1.0, -1.0, 1.0, -1.0],
            [-1.0, -1.0, -1.0, 0.0, 1.0, 1.0, -1.0, 1.0],
            [-1.0, 1.0, -1.0, -1.0, 0.0, 1.0, 1.0, -1.0],
            [-1.0, -1.0, 1.0, -1.0, -1.0, 0.0, 1.0, 1.0],
            [-1.0, 1.0, -1.0, 1.0, -1.0, -1.0, 0.0, 1.0],
            [-1.0, 1.0, 1.0, -1.0, 1.0, -1.0, -1.0, 0.0]
        ])
    }

    if m not in conference_matrices:
        raise ValueError(
            f"No standard conference matrix available for order {m}. "
            "Ensure the number of factors k satisfies k <= 8."
        )

    C = conference_matrices[m]

    # Fold-over operation: Stack C and -C
    folded = np.vstack([C, -C])

    # Append overall center point (all zeros)
    center_row = np.zeros((1, m))
    coded_matrix = np.vstack([folded, center_row])

    # If k was odd (and we scaled to even m), truncate the last column to get k factors
    coded_matrix = coded_matrix[:, :k]

    # Map coded levels [-1.0, 0.0, 1.0] to physical coordinates
    physical_df = pd.DataFrame()
    for idx, col in enumerate(keys):
        levels = self.factors[col]
        low, high = levels[0], levels[-1]  # Standard low/high boundaries
        mid = (low + high) / 2.0
        half_range = (high - low) / 2.0
        physical_df[col] = mid + coded_matrix[:, idx] * half_range

    # TODO: Implement algorithmic synthesis of conference matrices of arbitrary even orders using quadratic residues.
    # TODO: Add validation checks to confirm no active two-factor interaction terms are fully aliased with main effects.
    return physical_df

EVOPDesign

EVOPDesign(factors: Dict[str, List[float]])

Bases: DesignMatrix

Manages Evolutionary Operation (EVOP) structures for live process improvements.

EVOP is based on the philosophy that a live process should not only produce a product (revenue), but should also generate valuable information on how to improve itself. To avoid producing off-specification output or causing system instability, EVOP introduces extremely small, low-amplitude perturbations to factors around the current operating center.

Operational Cycles and Phases
  1. Low-Amplitude Perturbations: Factor levels (typically in a simple \(2^2\) or \(2^3\) factorial layout with a center point) are set extremely close to the current operating standard.
  2. Sequential Replication (Cycles): Because the factor changes are small, the signal-to-noise ratio is very low. The same factor combinations are repeated sequentially over multiple "Cycles" (\(C_1, C_2, \dots\)). As cycles accumulate, statistical standard errors decrease (\(SE \propto 1/\sqrt{N}\)), and the treatment effects gradually emerge from the background noise.
  3. Evolutionary Steps (Phases): Once an effect or interaction is shown to be statistically significant and beneficial, the operating center is shifted to the new optimal combination. This starts a new "Phase" of the study, and the perturbation pattern is repeated around the new center.
Mathematical Representation

For a 2-factor EVOP with current operating settings \((X_{1, \text{opt}}, X_{2, \text{opt}})\): - Center Point (0): \((X_{1, \text{opt}}, X_{2, \text{opt}})\) - Coded run (-1, -1): \((X_{1, \text{opt}} - \delta_1, X_{2, \text{opt}} - \delta_2)\) - Coded run (+1, -1): \((X_{1, \text{opt}} + \delta_1, X_{2, \text{opt}} - \delta_2)\) - Coded run (-1, +1): \((X_{1, \text{opt}} - \delta_1, X_{2, \text{opt}} + \delta_2)\) - Coded run (+1, +1): \((X_{1, \text{opt}} + \delta_1, X_{2, \text{opt}} + \delta_2)\) where \(\delta_j\) is a very small, non-disruptive increment.

Pseudocode for the Algorithm
function generate_evop_matrix(center_settings, delta_increments):
    1. Form standard 2^k factorial + center run in coded space [-1, 0, +1].
    2. For each factor column j:
         Map 0 -> center_settings[j]
         Map -1 -> center_settings[j] - delta_increments[j]
         Map +1 -> center_settings[j] + delta_increments[j]
    3. Structure the output DataFrame with additional columns:
         "Cycle": tracking the sequence index of the replication.
         "Phase": tracking the active optimization phase.
    4. Return DataFrame.
PARAMETER DESCRIPTION
factors

Mapping of factor labels to their designated levels.

TYPE: Dict[str, List[float]]

METHOD DESCRIPTION
generate

Generates the Evolutionary Operation (EVOP) factor matrix.

Source code in src\xpyrment\design\doe\base.py
def __init__(self, factors: Dict[str, List[float]]):
    """Initializes a new DesignMatrix.

    Args:
        factors (Dict[str, List[float]]): Mapping of factor labels to their designated levels.
    """
    self.factors = factors

generate

generate(
    center_settings: dict = None,
    deltas: dict = None,
    num_cycles: int = 3,
    phase: int = 1,
) -> DataFrame

Generates the Evolutionary Operation (EVOP) factor matrix.

Applies tiny operational increments around current baseline settings, constructs the replicated factorial cycle matrix, and returns the result.

PARAMETER DESCRIPTION
center_settings

Target operating center values for each factor.

TYPE: dict DEFAULT: None

deltas

Tiny perturbation step increments for each factor.

TYPE: dict DEFAULT: None

num_cycles

Number of cycles (replications) to run. Defaults to 3.

TYPE: int DEFAULT: 3

phase

Studying phase identifier. Defaults to 1.

TYPE: int DEFAULT: 1

RETURNS DESCRIPTION
DataFrame

pd.DataFrame: A pandas DataFrame containing the EVOP design matrix.

Source code in src\xpyrment\design\doe\evop.py
def generate(
    self,
    center_settings: dict = None,
    deltas: dict = None,
    num_cycles: int = 3,
    phase: int = 1
) -> pd.DataFrame:
    """Generates the Evolutionary Operation (EVOP) factor matrix.

    Applies tiny operational increments around current baseline settings, constructs
    the replicated factorial cycle matrix, and returns the result.

    Args:
        center_settings (dict): Target operating center values for each factor.
        deltas (dict): Tiny perturbation step increments for each factor.
        num_cycles (int): Number of cycles (replications) to run. Defaults to 3.
        phase (int): Studying phase identifier. Defaults to 1.

    Returns:
        pd.DataFrame: A pandas DataFrame containing the EVOP design matrix.
    """
    import itertools
    import numpy as np

    k = len(self.factors)
    keys = list(self.factors.keys())

    # Resolve center settings and perturbation step deltas
    if center_settings is None:
        center_settings = {}
        for col in keys:
            levels = self.factors[col]
            center_settings[col] = float(np.mean(levels))

    if deltas is None:
        deltas = {}
        for col in keys:
            deltas[col] = 1.0  # Default small step

    # Standard 2^k factorial corners in coded [-1.0, 1.0] space
    corners = list(itertools.product([-1.0, 1.0], repeat=k))
    # Center point coded as 0.0
    center = [tuple([0.0] * k)]
    cycle_coded_runs = center + corners  # Run size = 2^k + 1

    rows = []
    for cycle in range(1, num_cycles + 1):
        for run in cycle_coded_runs:
            run_dict = {
                "Cycle": cycle,
                "Phase": phase
            }
            for idx, col in enumerate(keys):
                coded_val = run[idx]
                # Map coded coordinate back to low-amplitude physical space
                physical_val = center_settings[col] + coded_val * deltas[col]
                run_dict[col] = physical_val
            rows.append(run_dict)

    # TODO: Add Box-Hunter evolutionary optimization cycle calculation curves to automatically compute main/interaction effects and errors.
    # TODO: Integrate automated stopping rules when treatment boundary shift reaches optimum levels (Simplex Evolutionary Operation optimization).
    return pd.DataFrame(rows)

FractionalFactorialDesign

FractionalFactorialDesign(
    factors: dict, generator_string: str
)

Bases: DesignMatrix

Generates fractional factorial experimental designs, showing resolution and aliasing.

In a fractional factorial design (represented as \(2^{k-p}\)), only a subset of size \(2^{k-p}\) of the \(2^k\) full combinations is run, where \(p\) is the number of "generators" used to define columns for fractional factors. This is highly efficient for screening a large number of factors using a small number of runs, under the Sparsity of Effects principle (which states that system response is primarily governed by main effects and low-order interactions).

Mathematical Background and Sizing
  • Coded Notation: Let there be \(k\) factors. We select a subset of \(k - p\) "base factors" which are run as a standard \(2^{k-p}\) full factorial design.
  • Generators: The remaining \(p\) factors are generated by multiplying columns of the base factors. For example, in a \(2^{5-1}\) design (\(k=5, p=1\)), the fifth factor \(E\) can be generated as: $$ E = A \times B \times C \times D \quad (\text{or } E = ABCD) $$ The defining relation of the design is then: $$ I = ABCDE $$
  • Resolution:
  • Resolution III: Main effects are confounded with 2-way interactions. (e.g., \(I = ABD\)).
  • Resolution IV: Main effects are confounded with 3-way interactions, but 2-way interactions are confounded with other 2-way interactions. (e.g., \(I = ABCD\)).
  • Resolution V: Main effects are confounded with 4-way interactions, and 2-way interactions are confounded with 3-way interactions. No main effect or 2-way interaction is confounded with any other main effect or 2-way interaction. (e.g., \(I = ABCDE\)).

Confounding/Alias Structure Algorithm: To calculate the alias structure, we multiply every effect column by the defining relation. Since \(A \times A = I\) (identity): $$ A \times I = A \times ABCDE \implies A = BCDE $$ Thus, the estimate of factor \(A\)'s effect is confounded with the 4-way interaction \(BCDE\).

Pseudocode for the Algorithm
function generate_fractional_factorial(factors, generator_string):
    1. Parse generator_string (e.g., "E = ABCD, F = BCD").
    2. Identify base factors (e.g., A, B, C, D).
    3. Generate full factorial matrix of size 2^(k-p) for base factors.
       Map levels to coded values [-1, +1].
    4. For each generator "X = Y1*Y2*...":
         Create column X by taking the row-wise product of columns Y1, Y2, ...
    5. Re-scale coded [-1, +1] columns back to the actual factor level physical units.
    6. Return DataFrame.
ATTRIBUTE DESCRIPTION
generator_string

Defining relation generator string, such as "E=ABCD" or "E=AB, F=AC".

TYPE: str

Examples:

Example
>>> # Planning a 2^{5-1} resolution V design
>>> # Base factors are A, B, C, D. E is generated as ABCD.
>>> factors = {
...     "A": [10, 20], "B": [0, 1], "C": [-1, 1], "D": [5, 10], "E": [100, 200]
... }
>>> design = FractionalFactorialDesign(factors, generator_string="E = ABCD")
>>> # In the generated DataFrame, we will have 2^(5-1) = 16 runs instead of 32.
PARAMETER DESCRIPTION
factors

Mapping of factor labels to their designated low/high levels.

TYPE: dict

generator_string

Design generator equations, e.g. "E = AB, F = BC".

TYPE: str

METHOD DESCRIPTION
generate

Generates the fractional factorial design matrix.

Source code in src\xpyrment\design\doe\fractional_factorial.py
def __init__(self, factors: dict, generator_string: str):
    """Initializes a FractionalFactorialDesign.

    Args:
        factors (dict): Mapping of factor labels to their designated low/high levels.
        generator_string (str): Design generator equations, e.g. `"E = AB, F = BC"`.
    """
    super().__init__(factors)
    self.generator_string = generator_string

    for factor_name, levels in self.factors.items():
        if len(levels) != 2:
            raise ValueError(
                f"Fractional Factorial designs strictly require exactly 2 levels per factor. "
                f"Factor '{factor_name}' has {len(levels)} levels."
            )

generate

generate() -> DataFrame

Generates the fractional factorial design matrix.

Processes the generator string to construct base columns and applies multiplicative transformations to compute fractional factors.

RETURNS DESCRIPTION
DataFrame

pd.DataFrame: A pandas DataFrame containing the fractional factorial design matrix, using physical factor level units.

Source code in src\xpyrment\design\doe\fractional_factorial.py
def generate(self) -> pd.DataFrame:
    """Generates the fractional factorial design matrix.

    Processes the generator string to construct base columns and applies multiplicative
    transformations to compute fractional factors.

    Returns:
        pd.DataFrame: A pandas DataFrame containing the fractional factorial design matrix,
            using physical factor level units.
    """
    import itertools

    # Parse generator equations (e.g. "E = ABCD, F = BCD" or "E = A * B * C * D")
    generators = {}
    for equation in self.generator_string.split(","):
        if "=" not in equation:
            continue
        lhs, rhs = equation.split("=")
        lhs = lhs.strip()
        rhs_cleaned = rhs.replace(" ", "").replace("*", "")

        # Find which of the defined factors are components of this generator
        components = []
        for char in rhs_cleaned:
            if char in self.factors:
                components.append(char)
        generators[lhs] = components

    # Identify base factors (factors that are not generated)
    base_keys = [k for k in self.factors.keys() if k not in generators]
    if not base_keys:
        raise ValueError("All factors cannot be generated; some base factors are required.")

    # Create base full factorial in coded space [-1.0, 1.0]
    combinations = list(itertools.product([-1.0, 1.0], repeat=len(base_keys)))
    coded_df = pd.DataFrame(combinations, columns=base_keys)

    # Compute generated columns row-wise
    for lhs, components in generators.items():
        if not components:
            raise ValueError(f"No valid factors found in generator expression for '{lhs}'")

        # Start with the first component
        col_val = coded_df[components[0]].copy()
        for comp in components[1:]:
            col_val *= coded_df[comp]
        coded_df[lhs] = col_val

    # Align columns to the original factors ordering
    coded_df = coded_df[list(self.factors.keys())]

    # Map coded space [-1.0, 1.0] coordinates to actual physical bounds
    physical_df = pd.DataFrame()
    for col in self.factors.keys():
        low, high = self.factors[col]
        mid = (low + high) / 2.0
        half_range = (high - low) / 2.0
        physical_df[col] = mid + coded_df[col] * half_range

    return physical_df

FullFactorialDesign

FullFactorialDesign(factors: Dict[str, List[float]])

Bases: DesignMatrix

Generates a full factorial experimental design matrix for all factor combinations.

In a full factorial design, the experiment runs encompass all combinations of levels across all factors. This design is highly informative because it enables the orthogonal estimation of: - Main Effects: The individual impact of changing each factor independently. - Interaction Effects: How the impact of one factor depends on the level of other factors, extending to 2-way, 3-way, ..., and \(k\)-way interactions.

Mathematical Context

Let \(k\) be the number of independent factors, and let \(l_i\) be the number of levels for factor \(i\) (\(i \in \{1, 2, \dots, k\}\)). The total number of required experimental runs \(N\) is: $$ N = \prod_{i=1}^{k} l_i $$ For symmetric \(2^k\) designs (where every factor has exactly 2 levels: e.g., low and high): $$ N = 2^k $$ While extremely thorough, full factorial designs suffer from the "curse of dimensionality", where \(N\) grows exponentially as more factors are added, making them economically or temporally unfeasible for large numbers of factors (where fractional designs are preferred).

Examples:

Example
>>> # Constructing a 2^2 full factorial design
>>> factors = {"temp": [100, 200], "pressure": [1.5, 3.0]}
>>> design = FullFactorialDesign(factors)
>>> df = design.generate()
>>> len(df)
4
>>> print(df)
temp  pressure
0   100       1.5
1   100       3.0
2   200       1.5
3   200       3.0
PARAMETER DESCRIPTION
factors

Mapping of factor labels to their designated levels.

TYPE: Dict[str, List[float]]

METHOD DESCRIPTION
generate

Generates the full factorial design matrix.

Source code in src\xpyrment\design\doe\base.py
def __init__(self, factors: Dict[str, List[float]]):
    """Initializes a new DesignMatrix.

    Args:
        factors (Dict[str, List[float]]): Mapping of factor labels to their designated levels.
    """
    self.factors = factors

generate

generate() -> DataFrame

Generates the full factorial design matrix.

Utilizes itertools.product to compute the Cartesian product of all factor levels, creating an \(N \times k\) DataFrame.

RETURNS DESCRIPTION
DataFrame

pd.DataFrame: A pandas DataFrame representing the full factorial design matrix.

Source code in src\xpyrment\design\doe\full_factorial.py
def generate(self) -> pd.DataFrame:
    r"""Generates the full factorial design matrix.

    Utilizes `itertools.product` to compute the Cartesian product of all factor levels,
    creating an $N \times k$ DataFrame.

    Returns:
        pd.DataFrame: A pandas DataFrame representing the full factorial design matrix.
    """
    import itertools

    keys = list(self.factors.keys())
    values = list(self.factors.values())
    combinations = list(itertools.product(*values))

    df = pd.DataFrame(combinations, columns=keys)
    return df

LatinHypercubeDesign

LatinHypercubeDesign(
    factors: dict, num_samples: int, seed: int = 42
)

Bases: DesignMatrix

Generates Latin Hypercube Sampling designs for multidimensional factors.

LHS is a generalization of a Latin Square to an arbitrary number of dimensions. It ensures that the ensemble of random samples is distributed evenly across the entire multi-dimensional space, preventing the accidental clustering of points that can occur with simple random sampling.

Mathematical Definition and Properties

Let \(N\) be the number of target samples (runs) and \(k\) be the number of factors. A Latin Hypercube design is represented by an \(N \times k\) matrix where: 1. Projection of the \(N\) sample points onto any single factor dimension yields exactly one sample in each of \(N\) equally probable intervals. 2. Specifically, the range of each factor is divided into \(N\) non-overlapping intervals of equal probability: $$ I_j = \left[ \frac{j-1}{N}, \frac{j}{N} \right] \quad \text{for } j \in {1, 2, \dots, N} $$ 3. Within each interval \(I_j\), a point is sampled (either at the midpoint or randomly): $$ x_j = \frac{j-1 + U_j}{N} $$ where \(U_j \sim \text{Uniform}(0, 1)\) is a random noise variable. 4. The sampled values for the \(k\) dimensions are paired using independent, random permutations of the set \(\{1, 2, \dots, N\}\) for each column.

Maximin LHS Space-Filling Variant

Standard random LHS can still yield sample points clustered close together in multidimensional space. To prevent this, Maximin LHS optimizes the permutations to maximize the minimum Euclidean distance between any two sample points: $$ \max_{\Pi} \min_{a \neq b} \lVert x_a - x_b \rVert_2 $$ This forces the points to spread out as far as possible, filling the multidimensional space uniformly.

Pseudocode for the Algorithm
function generate_lhs(factors, N):
    1. Determine k = number of factors.
    2. Initialize N x k matrix LHS_coded.
    3. For each column j from 1 to k:
         a. Create array of interval indices: [1, 2, ..., N].
         b. Shuffle the interval indices randomly (permutation).
         c. For each row i:
              Draw random uniform noise U ~ Uniform(0, 1).
              Compute coded value: cell = (shuffled_index[i] - 1 + U) / N.
              LHS_coded[i, j] = cell.
    4. If Maximin optimization is enabled:
         Iterate permutations to maximize min_distance(row_a, row_b).
    5. Map the coded values in [0, 1] to the physical bounds specified in factors.
    6. Return DataFrame.
ATTRIBUTE DESCRIPTION
num_samples

The exact number of samples (runs) to draw.

TYPE: int

Examples:

Example
>>> # Drawing 50 space-filling points to explore temperature and speed bounds
>>> factors = {"temp": [100, 500], "speed": [0, 100]}
>>> design = LatinHypercubeDesign(factors, num_samples=50)
>>> # The generated DataFrame will contain 50 runs covering the entire rectangular region.
PARAMETER DESCRIPTION
factors

Mapping of factor labels to their lower and upper physical boundaries.

TYPE: dict

num_samples

The target number of samples.

TYPE: int

seed

Random seed for reproducibility. Defaults to 42.

TYPE: int DEFAULT: 42

METHOD DESCRIPTION
generate

Generates the Latin Hypercube design matrix.

Source code in src\xpyrment\design\doe\lhs.py
def __init__(self, factors: dict, num_samples: int, seed: int = 42):
    """Initializes a LatinHypercubeDesign.

    Args:
        factors (dict): Mapping of factor labels to their lower and upper physical boundaries.
        num_samples (int): The target number of samples.
        seed (int): Random seed for reproducibility. Defaults to 42.
    """
    super().__init__(factors)
    self.num_samples = num_samples
    self.seed = seed

generate

generate() -> DataFrame

Generates the Latin Hypercube design matrix.

Partitions interval ranges, draws independent uniform perturbations, shuffles columns, optionally optimizes minimum spacing, and maps results to physical bounds.

RETURNS DESCRIPTION
DataFrame

pd.DataFrame: A pandas DataFrame containing the LHS matrix.

Source code in src\xpyrment\design\doe\lhs.py
def generate(self) -> pd.DataFrame:
    """Generates the Latin Hypercube design matrix.

    Partitions interval ranges, draws independent uniform perturbations, shuffles columns,
    optionally optimizes minimum spacing, and maps results to physical bounds.

    Returns:
        pd.DataFrame: A pandas DataFrame containing the LHS matrix.
    """
    import numpy as np

    rng = np.random.default_rng(self.seed)

    k = len(self.factors)
    keys = list(self.factors.keys())
    N = self.num_samples

    coded_matrix = np.zeros((N, k))

    for j in range(k):
        # Create interval indices from 1 to N
        intervals = np.arange(1, N + 1)
        # Permute interval indices independently for each factor
        shuffled_intervals = rng.permutation(intervals)

        # For each interval, draw a uniform point
        for i in range(N):
            U = rng.uniform(0.0, 1.0)
            # Compute coded coordinate in [0.0, 1.0]
            coded_matrix[i, j] = (shuffled_intervals[i] - 1.0 + U) / N

    # Map coded levels from [0.0, 1.0] back to physical bounds
    physical_df = pd.DataFrame()
    for idx, col in enumerate(keys):
        bounds = self.factors[col]
        low, high = bounds[0], bounds[-1]
        physical_df[col] = low + coded_matrix[:, idx] * (high - low)

    # TODO: Implement Maximin Latin Hypercube optimization (shuffling columns to maximize minimum pairwise Euclidean distance).
    # TODO: Add correlation-minimization algorithms (such as Owen's randomized LHS) to reduce collinearity between factors.
    return physical_df

MixtureDesign

MixtureDesign(factors: dict, m: int = 2)

Bases: DesignMatrix

Generates designs for mixture factor spaces where components must sum to 1.0 (e.g., recipe mixes).

In mixture experiments, the independent variables are components of a blend. Changing the proportion of one ingredient automatically forces the proportions of other ingredients to change.

Mathematical Representation and Simplex Constraints

Let \(k\) be the number of components in the mixture, and let \(x_i\) be the proportion of component \(i\) (\(i \in \{1, 2, \dots, k\}\)). The design space is subject to the following strict constraints: $$ \sum_{i=1}^{k} x_i = 1.0 \quad \text{and} \quad 0 \le x_i \le 1.0 \ \ \forall \ i $$ This constraint restricts the geometric region of interest to a \((k-1)\)-dimensional simplex (e.g., an equilateral triangle for \(k=3\), or a regular tetrahedron for \(k=4\)).

Standard Classical Designs
  1. Simplex Lattice Design (denoted \(\{k, m\}\)): Each component takes \(m + 1\) equally spaced values between \(0\) and \(1\): $$ x_i \in \left{ 0, \frac{1}{m}, \frac{2}{m}, \dots, 1 \right} $$ The runs consist of all possible combinations of these levels that sum to exactly \(1.0\). The total number of runs \(N\) is: $$ N = \frac{(k + m - 1)!}{m!(k - 1)!} $$
  2. Simplex Centroid Design: Consists of runs at pure components (one factor is \(1.0\), others \(0\)), binary blends (two factors at \(0.5\), others \(0\)), ternary blends (three factors at \(1/3\), others \(0\)), and so on, up to the overall centroid blend where all \(k\) factors are equal to \(1/k\). The total number of runs \(N\) is: $$ N = 2^k - 1 $$

Pseudocode for Simplex Lattice Algorithm:

function generate_simplex_lattice(k, m):
    1. Generate all compositions of integer m into k parts:
       Find all non-negative integer vectors [v1, v2, ..., vk] such that sum(vi) = m.
    2. For each composition, compute the fractional proportions:
       x_i = v_i / m.
    3. Bind x_i columns to the physical factors.
    4. Return DataFrame.

Examples:

Example
>>> # Constructing a {3, 2} Simplex Lattice design (3 ingredients, quadratic model)
>>> # Proportions will be: [1, 0, 0], [0.5, 0.5, 0], [0, 1, 0], [0, 0.5, 0.5], [0, 0, 1], [0.5, 0, 0.5]
PARAMETER DESCRIPTION
factors

Mapping of ingredient factor labels to their low/high levels (usually [0.0, 1.0]).

TYPE: dict

m

Simplex Lattice division parameter. Defaults to 2.

TYPE: int DEFAULT: 2

METHOD DESCRIPTION
generate

Generates the Mixture Design matrix.

Source code in src\xpyrment\design\doe\mixture.py
def __init__(self, factors: dict, m: int = 2):
    """Initializes a MixtureDesign.

    Args:
        factors (dict): Mapping of ingredient factor labels to their low/high levels (usually [0.0, 1.0]).
        m (int): Simplex Lattice division parameter. Defaults to 2.
    """
    super().__init__(factors)
    self.m = m

generate

generate() -> DataFrame

Generates the Mixture Design matrix.

Constructs simplex coordinates using either Simplex Lattice or Simplex Centroid algorithms, verifies the summing constraint, and binds to physical factors.

RETURNS DESCRIPTION
DataFrame

pd.DataFrame: A pandas DataFrame containing the Mixture design matrix.

Source code in src\xpyrment\design\doe\mixture.py
def generate(self) -> pd.DataFrame:
    """Generates the Mixture Design matrix.

    Constructs simplex coordinates using either Simplex Lattice or Simplex Centroid algorithms,
    verifies the summing constraint, and binds to physical factors.

    Returns:
        pd.DataFrame: A pandas DataFrame containing the Mixture design matrix.
    """
    import numpy as np

    k = len(self.factors)
    keys = list(self.factors.keys())

    # Recursive helper to generate non-negative integer vectors of length k summing to m
    def get_compositions(num_parts, target_sum):
        if num_parts == 1:
            return [[target_sum]]
        compositions = []
        for v in range(target_sum + 1):
            for sub in get_compositions(num_parts - 1, target_sum - v):
                compositions.append([v] + sub)
        return compositions

    compositions = get_compositions(k, self.m)
    proportions = np.array(compositions) / float(self.m)

    # Scale proportions to actual ingredient boundaries if they are not [0, 1]
    # (Though usually mixture boundaries are [0.0, 1.0])
    physical_df = pd.DataFrame(proportions, columns=keys)

    # TODO: Add support for McLean-Anderson or coordinate exchange constraints to handle upper/lower bounds on individual ingredients (e.g., ingredient A must be between 10% and 30%).
    # TODO: Implement Simplex Centroid designs to supplement Simplex Lattice with interior-point checks.
    return physical_df

PlackettBurmanDesign

PlackettBurmanDesign(factors: Dict[str, List[float]])

Bases: DesignMatrix

Generates Plackett-Burman screening design matrices to identify main effects in few runs.

Plackett-Burman designs are classical Resolution III fractionals where the number of runs \(N\) is a multiple of 4 (typically \(N = 12, 16, 20, 24, 28, 36, 40, \dots\)). They allow screening of up to \(k = N - 1\) factors in \(N\) trials.

Mathematical Context and Confounding

Unlike standard \(2^{k-p}\) fractional factorials where main effects are completely aliased with specific two-way interactions, Plackett-Burman designs exhibit a complex confounding structure where each main effect is partially confounded (aliased) with multiple two-way interactions. For example, in an \(N=12\) design, every main effect is aliased with 45 distinct two-way interactions with fractional coefficient weights of \(\pm 1/3\). This properties makes Plackett-Burman designs highly robust screening tools if the Sparsity of Effects principle holds, as interaction effects are diluted and do not fully bias any single main effect estimate.

Coded Generative Algorithm

Plackett-Burman designs are built by taking a specialized "first generator row" of signs (\(+\) and \(-\)), cyclically shifting it to form the first \(N - 1\) columns, and then appending a final row of all minuses (\(-1\)). Standard generator sequences for different run sizes (\(N\)) include: - \(N = 8\): + + + - + - - - \(N = 12\): + + - + + + - - - + - - \(N = 16\): + + + + - + - + + - - + - - - - \(N = 20\): + + - - + + - - + + - - + + - - + + -

Pseudocode for the Algorithm
function generate_plackett_burman(factors, N):
    1. Select first generator row of length N - 1 based on run size N.
       (e.g., for N=12: [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]).
    2. Construct (N - 1) x (N - 1) matrix by cyclically shifting the generator row.
    3. Append a final row of N - 1 elements, all equal to -1.
       Now we have an N x (N - 1) design matrix in coded [-1, +1] format.
    4. Truncate the columns to match the actual number of factors requested.
    5. Map the coded values back to the actual factor levels.
    6. Return DataFrame.

Examples:

Example
>>> # If we want to screen 10 factors, we can use an N=12 Plackett-Burman design, requiring only 12 runs!
>>> # Compare this with a full factorial which would require 2^10 = 1024 runs.
PARAMETER DESCRIPTION
factors

Mapping of factor labels to their designated levels.

TYPE: Dict[str, List[float]]

METHOD DESCRIPTION
generate

Generates the Plackett-Burman design matrix.

Source code in src\xpyrment\design\doe\base.py
def __init__(self, factors: Dict[str, List[float]]):
    """Initializes a new DesignMatrix.

    Args:
        factors (Dict[str, List[float]]): Mapping of factor labels to their designated levels.
    """
    self.factors = factors

generate

generate() -> DataFrame

Generates the Plackett-Burman design matrix.

Builds the cyclic shifts based on standard generator sequences, truncates to match the factors dictionary, and scales back to the actual levels.

RETURNS DESCRIPTION
DataFrame

pd.DataFrame: A pandas DataFrame containing the design matrix.

Source code in src\xpyrment\design\doe\plackett_burman.py
def generate(self) -> pd.DataFrame:
    """Generates the Plackett-Burman design matrix.

    Builds the cyclic shifts based on standard generator sequences, truncates to match
    the factors dictionary, and scales back to the actual levels.

    Returns:
        pd.DataFrame: A pandas DataFrame containing the design matrix.
    """
    import numpy as np

    k = len(self.factors)
    keys = list(self.factors.keys())

    # Standard Plackett-Burman cyclic generators for N-1 columns
    # N must be a multiple of 4. We find the smallest N >= k + 1.
    target_N = ((k + 1 + 3) // 4) * 4
    if target_N < 8:
        target_N = 8

    generators = {
        8: [1.0, 1.0, 1.0, -1.0, 1.0, -1.0, -1.0],
        12: [1.0, 1.0, -1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, 1.0, -1.0],
        16: [1.0, 1.0, 1.0, 1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0, -1.0, -1.0],
        20: [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0],
        24: [1.0, 1.0, 1.0, 1.0, 1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0, -1.0, -1.0]
    }

    if target_N not in generators:
        raise ValueError(
            f"No standard Plackett-Burman generator sequence available for run size N={target_N}. "
            "Ensure number of factors k satisfies k <= 23."
        )

    S = generators[target_N]
    n_cols = target_N - 1

    # Build (N-1) x (N-1) matrix by cyclic shifting S
    matrix = np.zeros((n_cols, n_cols))
    for col_idx in range(n_cols):
        # Cyclic shift by col_idx positions
        shifted = S[col_idx:] + S[:col_idx]
        matrix[:, col_idx] = shifted

    # Append final row of all -1.0s to construct complete N x (N-1) design matrix
    final_row = np.array([-1.0] * n_cols).reshape(1, -1)
    coded_matrix = np.vstack([matrix, final_row])

    # Truncate matrix columns to match the actual number of factors requested (k)
    coded_matrix = coded_matrix[:, :k]

    # Convert coded coordinates to physical coordinates and return as DataFrame
    physical_df = pd.DataFrame()
    for idx, col in enumerate(keys):
        low, high = self.factors[col]
        mid = (low + high) / 2.0
        half_range = (high - low) / 2.0
        physical_df[col] = mid + coded_matrix[:, idx] * half_range

    return physical_df

SwitchbackDesign

SwitchbackDesign(factors: dict, unit_window_hours: int = 2)

Bases: DesignMatrix

Generates time/geo crossover switchback experimental designs for marketplace tests.

In standard user-split A/B testing, a treatment that increases supply utilization in a local market indirectly deprives the control group of supply (market interference/spillover bias). This violates the Stable Unit Treatment Value Assumption (SUTVA). Switchback designs resolve this by randomizing the entire marketplace over sequential time blocks.

Temporal Crossover and Balance

The marketplace switches back and forth between control and treatment configurations over a series of discrete time blocks of length \(W\) (e.g., 2 hours). $$ \text{Schedule}: W_1 \to \text{Control}, \ W_2 \to \text{Treatment}, \ W_3 \to \text{Treatment}, \ \dots $$ To prevent systematic time-of-day or day-of-week biases (e.g., treatment always running during rush hour), the assignments are structured using balanced crossover patterns: - Markovian Transitions: Ensuring equal transition probabilities between states (\(C \to T\), \(T \to C\), \(C \to C\), \(T \to T\)) to model and subtract temporal carryover. - Multi-region Crossover: If multiple geographic markets are available, we cross them over simultaneously: $$ \text{Region A}: C \to T \quad \text{vs.} \quad \text{Region B}: T \to C $$

Carryover and Washout Periods: A major challenge in switchbacks is the carryover effect — supply/demand states from a treatment period spilling over into the subsequent control period. To solve this, the algorithm configures a "washout" parameter \(\omega\) (e.g., 30 minutes) at the start of each window. Data collected during the first \(\omega\) minutes of each transition is excluded from statistical evaluation.

Pseudocode for the Algorithm
function generate_switchback_schedule(regions, start_time, end_time, window_hours, washout_minutes):
    1. Partition the experimental timeframe [start_time, end_time] into N discrete blocks of size window_hours.
    2. For each region:
         Generate a balanced crossover assignment sequence (using Latin Squares or randomized block schedules).
    3. For each block in the schedule:
         Mark the first washout_minutes as "washout_active = True" (telemetry exclusion flag).
         Assign the active variant label.
    4. Compile regional time-block rows into a structured DataFrame.
    5. Return DataFrame.
ATTRIBUTE DESCRIPTION
unit_window_hours

The duration in hours of each discrete experimental block. Defaults to 2.

TYPE: int

Examples:

Example
>>> # Scheduling a switchback test with 4-hour window blocks
>>> factors = {"dispatch_algorithm": ["greedy", "predictive"]}
>>> design = SwitchbackDesign(factors, unit_window_hours=4)
>>> # The output schedule allocates the marketplace state dynamically across the experimental window.
PARAMETER DESCRIPTION
factors

Mapping of the market-level factor to its variant options.

TYPE: dict

unit_window_hours

Duration of each switch window in hours. Defaults to 2.

TYPE: int DEFAULT: 2

METHOD DESCRIPTION
generate

Generates the Switchback design schedule.

optimize_washout

Evaluates AR(p) error structures across candidate washout times to stabilize covariance.

Source code in src\xpyrment\design\doe\switchback.py
def __init__(self, factors: dict, unit_window_hours: int = 2):
    """Initializes a SwitchbackDesign.

    Args:
        factors (dict): Mapping of the market-level factor to its variant options.
        unit_window_hours (int): Duration of each switch window in hours. Defaults to 2.
    """
    super().__init__(factors)
    self.unit_window_hours = unit_window_hours

generate

generate(
    regions: list = None,
    num_periods: int = 12,
    washout_minutes: int = 30,
) -> DataFrame

Generates the Switchback design schedule.

Divides the temporal horizons into balanced blocks, schedules treatment crossovers, marks washout segments, and outputs the operational assignment ledger.

PARAMETER DESCRIPTION
regions

List of geographic or logical market regions to crossover. Defaults to ["Region_A", "Region_B"].

TYPE: list DEFAULT: None

num_periods

Total number of sequential time block windows. Defaults to 12.

TYPE: int DEFAULT: 12

washout_minutes

Transition period length to discard temporal carryover. Defaults to 30.

TYPE: int DEFAULT: 30

RETURNS DESCRIPTION
DataFrame

pd.DataFrame: A pandas DataFrame representing the temporal switchback schedule.

Source code in src\xpyrment\design\doe\switchback.py
def generate(self, regions: list = None, num_periods: int = 12, washout_minutes: int = 30) -> pd.DataFrame:
    """Generates the Switchback design schedule.

    Divides the temporal horizons into balanced blocks, schedules treatment crossovers,
    marks washout segments, and outputs the operational assignment ledger.

    Args:
        regions (list): List of geographic or logical market regions to crossover.
            Defaults to `["Region_A", "Region_B"]`.
        num_periods (int): Total number of sequential time block windows. Defaults to 12.
        washout_minutes (int): Transition period length to discard temporal carryover. Defaults to 30.

    Returns:
        pd.DataFrame: A pandas DataFrame representing the temporal switchback schedule.
    """
    if regions is None:
        regions = ["Region_A", "Region_B"]

    factor_name = list(self.factors.keys())[0]
    variants = list(self.factors[factor_name])

    rows = []
    for r_idx, region in enumerate(regions):
        for period in range(1, num_periods + 1):
            start_hour = (period - 1) * self.unit_window_hours
            end_hour = period * self.unit_window_hours

            # Multi-region Crossover: toggle opposite configurations to balance periods
            variant_idx = (r_idx + period) % len(variants)
            assigned_variant = variants[variant_idx]

            # Record scheduled time block run
            rows.append({
                "region": region,
                "period": period,
                "start_hour": start_hour,
                "end_hour": end_hour,
                "washout_active": True,  # Washout indicator for early telemetry exclusion
                factor_name: assigned_variant
            })

    # TODO: Add option to optimize period switchover frequencies to minimize carrying-over spillover effects.
    # TODO: Implement Latin Square multi-period and multi-variant Latin Square crossover balancing to optimize more than 2 variants.
    return pd.DataFrame(rows)

optimize_washout

optimize_washout(
    df: DataFrame,
    time_col: str,
    metric_col: str,
    treatment_col: str,
    max_p: int = 3,
    stability_threshold: float = 0.05,
) -> int

Evaluates AR(p) error structures across candidate washout times to stabilize covariance.

PARAMETER DESCRIPTION
df

Experimental telemetry dataset.

TYPE: DataFrame

time_col

Minute/second elapsed column since block transition.

TYPE: str

metric_col

Target evaluation metric.

TYPE: str

treatment_col

Column indicating binary/categorical treatment assignment.

TYPE: str

max_p

Maximum autoregressive lag order. Defaults to 3.

TYPE: int DEFAULT: 3

stability_threshold

Variance/autocorrelation index threshold representing stability. Defaults to 0.05.

TYPE: float DEFAULT: 0.05

RETURNS DESCRIPTION
int

The optimal washout period in minutes.

TYPE: int

Source code in src\xpyrment\design\doe\switchback.py
def optimize_washout(
    self,
    df: pd.DataFrame,
    time_col: str,
    metric_col: str,
    treatment_col: str,
    max_p: int = 3,
    stability_threshold: float = 0.05,
) -> int:
    """Evaluates AR(p) error structures across candidate washout times to stabilize covariance.

    Args:
        df (pd.DataFrame): Experimental telemetry dataset.
        time_col (str): Minute/second elapsed column since block transition.
        metric_col (str): Target evaluation metric.
        treatment_col (str): Column indicating binary/categorical treatment assignment.
        max_p (int): Maximum autoregressive lag order. Defaults to 3.
        stability_threshold (float): Variance/autocorrelation index threshold representing stability.
            Defaults to 0.05.

    Returns:
        int: The optimal washout period in minutes.
    """
    import numpy as np

    # Candidate washout periods in minutes
    candidates = [0, 10, 20, 30, 45, 60]
    optimal_washout = candidates[-1]

    for wash in candidates:
        # Filter telemetry data outside the current candidate washout window and sort by time
        filtered_df = df[df[time_col] >= wash].sort_values(by=time_col).copy()
        if len(filtered_df) < (max_p + 15):
            continue

        # Subtract treatment effect to compute residual time-series
        X_design = filtered_df[treatment_col].to_numpy().reshape(-1, 1)
        y = filtered_df[metric_col].to_numpy()

        # Unpenalized regression
        X_bias = np.hstack([np.ones((X_design.shape[0], 1)), X_design])
        XTX = np.dot(X_bias.T, X_bias)
        beta = np.linalg.solve(XTX + 1e-6 * np.eye(2), np.dot(X_bias.T, y))
        residuals = y - np.dot(X_bias, beta)

        # Fit AR(p) model via OLS on residuals: residuals_t = phi_1 * res_{t-1} + ... + phi_p * res_{t-p}
        N_res = len(residuals)
        Z = np.zeros((N_res - max_p, max_p))
        for lag in range(1, max_p + 1):
            Z[:, lag - 1] = residuals[max_p - lag : -lag]

        target_y = residuals[max_p:]

        try:
            ZTZ = np.dot(Z.T, Z)
            phi = np.linalg.solve(ZTZ + 1e-6 * np.eye(max_p), np.dot(Z.T, target_y))

            # Compute autocorrelation index (L2 norm of AR parameters)
            ar_index = float(np.sum(phi**2))
            if ar_index < stability_threshold:
                optimal_washout = wash
                break
        except np.linalg.LinAlgError:
            continue

    return optimal_washout

TaguchiDesign

TaguchiDesign(factors: dict, array_name: str)

Bases: DesignMatrix

Generates Taguchi orthogonal array designs and S/N ratio mapping specifications.

Taguchi designs leverage specialized, highly balanced fractional factorials (Orthogonal Arrays, denoted as \(L_4, L_8, L_9, L_{12}, L_{16}, L_{18}, L_{27}\), etc.) to evaluate factor effects on both the mean and variability of responses. Rather than optimizing the mean alone, Taguchi methods prioritize "robustness" — making the system insensitive to uncontrollable "noise factors".

Mathematical Specifications for Signal-to-Noise (\(S/N\), denoted \(\eta\)) Ratios

Taguchi methods convert multiple experimental response replicates \(y_1, y_2, \dots, y_n\) at each run into an \(S/N\) ratio (\(\eta\) in decibels) depending on the optimization objective:

  1. Smaller-The-Better (e.g., latency, defects, material wear): $$ \eta = -10 \log_{10} \left( \frac{1}{n} \sum_{i=1}^{n} y_i^2 \right) $$
  2. Larger-The-Better (e.g., conversion rate, user engagement, revenue): $$ \eta = -10 \log_{10} \left( \frac{1}{n} \sum_{i=1}^{n} \frac{1}{y_i^2} \right) $$
  3. Nominal-The-Best (e.g., precise target dimensions, exact fluid viscosity): $$ \eta = 10 \log_{10} \left( \frac{\bar{y}^2}{s^2} \right) $$ where \(\bar{y}\) is the sample mean and \(s^2\) is the sample variance across the replicates.
Orthogonal Array Database Selection Algorithm
  • The user requests a specific array (e.g., \(L_9\), which supports up to 4 factors with 3 levels each).
  • The algorithm maps the requested physical factors to columns in the standard predefined \(L_9\) template:
Run Col 1 (Coded) Col 2 (Coded) Col 3 (Coded) Col 4 (Coded)
1 1 1 1 1
2 1 2 2 2
3 1 3 3 3
4 2 1 2 3
5 2 2 3 1
6 2 3 1 2
7 3 1 3 2
8 3 2 1 3
9 3 3 2 1

These levels are mapped back to the physical factor levels provided in factors.

ATTRIBUTE DESCRIPTION
array_name

The name of the target Taguchi Orthogonal Array (e.g., "L9", "L18", "L27").

TYPE: str

Examples:

Example
>>> # Selecting an L9 array (supports up to 4 factors at 3 levels)
>>> factors = {
...     "temperature": [100, 150, 200],
...     "pressure": [1.0, 1.5, 2.0],
...     "catalyst": [0.01, 0.05, 0.10]
... }
>>> design = TaguchiDesign(factors, array_name="L9")
>>> # Generated matrix will contain exactly 9 balanced runs.
PARAMETER DESCRIPTION
factors

Mapping of factor labels to their lists of levels.

TYPE: dict

array_name

The name of the target orthogonal array, e.g. "L9".

TYPE: str

METHOD DESCRIPTION
generate

Generates the Taguchi Orthogonal Array design matrix.

Source code in src\xpyrment\design\doe\taguchi.py
def __init__(self, factors: dict, array_name: str):
    """Initializes a TaguchiDesign.

    Args:
        factors (dict): Mapping of factor labels to their lists of levels.
        array_name (str): The name of the target orthogonal array, e.g. `"L9"`.
    """
    super().__init__(factors)
    self.array_name = array_name

generate

generate() -> DataFrame

Generates the Taguchi Orthogonal Array design matrix.

Looks up the standard template corresponding to array_name, binds the active factors to template columns, and maps them to physical levels.

RETURNS DESCRIPTION
DataFrame

pd.DataFrame: A pandas DataFrame containing the Taguchi design matrix.

Source code in src\xpyrment\design\doe\taguchi.py
def generate(self) -> pd.DataFrame:
    """Generates the Taguchi Orthogonal Array design matrix.

    Looks up the standard template corresponding to `array_name`, binds the active
    factors to template columns, and maps them to physical levels.

    Returns:
        pd.DataFrame: A pandas DataFrame containing the Taguchi design matrix.
    """
    array_name = self.array_name.upper()

    if array_name not in ARRAY_SPECS:
        raise ValueError(
            f"Taguchi array '{self.array_name}' is not currently implemented. "
            f"Supported arrays are: {', '.join(ARRAY_SPECS.keys())}."
        )

    spec = ARRAY_SPECS[array_name]
    matrix = spec["matrix"]
    max_factors = matrix.shape[1]

    k = len(self.factors)
    if k > max_factors:
        raise ValueError(f"{array_name} Orthogonal Array supports at most {max_factors} factors.")

    keys = list(self.factors.keys())
    physical_df = pd.DataFrame()

    for idx, col in enumerate(keys):
        levels = self.factors[col]
        required_levels = spec["levels_per_factor"](idx)
        if len(levels) != required_levels:
            raise ValueError(
                f"Factor '{col}' in Taguchi {array_name} design must have exactly "
                f"{required_levels} levels. It has {len(levels)} levels."
            )
        coded_col = matrix[:, idx]
        physical_df[col] = [levels[c - 1] for c in coded_col]

    # TODO: Integrate signal-to-noise ratio (SNR) loss analysis plots for parameter robust design.
    return physical_df

CarryoverDecomposition

CarryoverDecomposition(
    l2_penalty: float = 1e-05, max_lags: int = 1
)

Estimates Direct Treatment Effects (DTE) and decay curves of intertemporal carryover effects.

Fits the model

Y_t = beta_0 + beta_direct * T_t + sum_{k=1}^K beta_carryover_k * T_{t-k} * exp(-lambda_k * dt_t) + epsilon_t

Using an optimized coordinate grid search over the exponential decay parameters lambda.

PARAMETER DESCRIPTION
l2_penalty

L2 regularization parameter for OLS stability. Defaults to 1e-5.

TYPE: float DEFAULT: 1e-05

max_lags

Maximum number of historical lags to consider. Defaults to 1.

TYPE: int DEFAULT: 1

METHOD DESCRIPTION
fit

Fits the carryover decomposition model with multi-stage lags.

ATTRIBUTE DESCRIPTION
summary

Returns a summary dictionary of the fitted carryover effects.

TYPE: Dict[str, Union[float, Dict[str, float]]]

Source code in src\xpyrment\design\doe\carryover.py
def __init__(self, l2_penalty: float = 1e-5, max_lags: int = 1) -> None:
    """Initializes the carryover decomposer.

    Args:
        l2_penalty (float): L2 regularization parameter for OLS stability. Defaults to 1e-5.
        max_lags (int): Maximum number of historical lags to consider. Defaults to 1.
    """
    self.l2_penalty = l2_penalty
    self.max_lags = max_lags
    self.beta_direct_: float = 0.0
    self.beta_carryovers_: np.ndarray = np.array([])
    self.beta_baseline_: float = 0.0
    self.lambdas_: np.ndarray = np.array([])
    self.standard_errors_: Dict[str, float] = {}
    self.p_values_: Dict[str, float] = {}
    self.ll_profile_: Dict[float, float] = {}

summary property

summary: Dict[str, Union[float, Dict[str, float]]]

Returns a summary dictionary of the fitted carryover effects.

fit

fit(
    outcomes: ndarray, treatments: ndarray, times: ndarray
) -> CarryoverDecomposition

Fits the carryover decomposition model with multi-stage lags.

PARAMETER DESCRIPTION
outcomes

Target outcome series of shape (N,). Must be sorted chronologically.

TYPE: ndarray

treatments

Binary treatment series of shape (N,) (0 = control, 1 = treated).

TYPE: ndarray

times

Chronological timestamps of shape (N,).

TYPE: ndarray

RETURNS DESCRIPTION
CarryoverDecomposition

Fitted decomposer.

TYPE: CarryoverDecomposition

Source code in src\xpyrment\design\doe\carryover.py
def fit(self, outcomes: np.ndarray, treatments: np.ndarray, times: np.ndarray) -> "CarryoverDecomposition":
    """Fits the carryover decomposition model with multi-stage lags.

    Args:
        outcomes (np.ndarray): Target outcome series of shape (N,). Must be sorted chronologically.
        treatments (np.ndarray): Binary treatment series of shape (N,) (0 = control, 1 = treated).
        times (np.ndarray): Chronological timestamps of shape (N,).

    Returns:
        CarryoverDecomposition: Fitted decomposer.
    """
    N = len(outcomes)
    if N <= self.max_lags + 1:
        raise ValueError(f"Carryover estimation with {self.max_lags} lags requires at least {self.max_lags + 2} observations.")

    Y = outcomes[self.max_lags:].astype(float)
    T_curr = treatments[self.max_lags:].astype(float)
    dt = np.diff(times[self.max_lags-1:]).astype(float) # dt for the current period

    # For simplicity, we assume a single shared decay constant lambda across all lags
    # or we could search for each. Here we implement a shared lambda for the profile likelihood fallback.

    best_mse = float("inf")
    best_lambda = 0.0
    best_beta = np.zeros(2 + self.max_lags)
    best_cov = np.zeros((2 + self.max_lags, 2 + self.max_lags))

    # Search lambda in log-space
    lambdas = np.logspace(-2, 1, 100)

    for lmb in lambdas:
        # Construct carryover features for each lag k
        carryover_feats = []
        for k in range(1, self.max_lags + 1):
            T_prev_k = treatments[self.max_lags-k : -k].astype(float)
            # Decay factor scales with lag distance k
            feat = T_prev_k * np.exp(-lmb * k * dt)
            carryover_feats.append(feat)

        # Design matrix: [intercept, current_treatment, carryover_1, ..., carryover_K]
        X = np.vstack([np.ones(len(Y)), T_curr] + carryover_feats).T

        # Solve OLS
        XTX = np.dot(X.T, X)
        XTX_reg = XTX + self.l2_penalty * np.eye(X.shape[1])
        XTX_reg[0, 0] = XTX[0, 0]

        try:
            beta = np.linalg.solve(XTX_reg, np.dot(X.T, Y))
        except np.linalg.LinAlgError:
            continue

        residuals = Y - np.dot(X, beta)
        mse = np.mean(residuals ** 2)

        # Store log-likelihood for profile likelihood (assuming Gaussian noise)
        ll = -0.5 * len(Y) * (np.log(2 * np.pi * mse) + 1)
        self.ll_profile_[lmb] = ll

        if mse < best_mse:
            best_mse = mse
            best_lambda = lmb
            best_beta = beta

            df_resid = len(Y) - X.shape[1]
            if df_resid > 0:
                var_err = np.sum(residuals ** 2) / df_resid
                best_cov = var_err * np.linalg.inv(XTX_reg)

    self.beta_baseline_ = float(best_beta[0])
    self.beta_direct_ = float(best_beta[1])
    self.beta_carryovers_ = best_beta[2:].astype(float)
    self.lambdas_ = np.full(self.max_lags, best_lambda) # Shared lambda for now

    # Compute p-values and SEs
    from scipy.stats import t, chi2
    df_resid = len(Y) - (2 + self.max_lags)

    self.standard_errors_["baseline"] = float(np.sqrt(max(0.0, best_cov[0, 0])))
    self.standard_errors_["direct"] = float(np.sqrt(max(0.0, best_cov[1, 1])))

    for k in range(self.max_lags):
        self.standard_errors_[f"carryover_lag_{k+1}"] = float(np.sqrt(max(0.0, best_cov[2+k, 2+k])))

    for name, se in self.standard_errors_.items():
        if se > 0 and df_resid > 0:
            coeff = self.beta_baseline_ if name == "baseline" else (self.beta_direct_ if name == "direct" else self.beta_carryovers_[int(name.split("_")[-1])-1])
            t_stat = coeff / se
            self.p_values_[name] = float(2 * (1 - t.cdf(abs(t_stat), df_resid)))
        else:
            self.p_values_[name] = 1.0

    # Profile Likelihood CI for lambda
    if self.ll_profile_:
        max_ll = max(self.ll_profile_.values())
        # 95% CI boundary for chi2(1)
        cutoff = max_ll - chi2.ppf(0.95, 1) / 2.0
        ci_lambdas = [l for l, ll in self.ll_profile_.items() if ll >= cutoff]
        if ci_lambdas:
            self.lambda_ci_ = (min(ci_lambdas), max(ci_lambdas))
        else:
            self.lambda_ci_ = (best_lambda, best_lambda)

    return self