LHS

lhs

Latin Hypercube Sampling (LHS) for space-filling Design of Experiments (DoE) matrices.

This module provides the LatinHypercubeDesign class, which constructs Latin Hypercube Sampling (LHS) matrices. LHS is a space-filling design technique widely used in computer experiments, high-dimensional simulation studies (such as Monte Carlo models), and black-box software testing to explore large design spaces with few runs.

CLASS	DESCRIPTION
`LatinHypercubeDesign`	Generates Latin Hypercube Sampling designs for multidimensional factors.

LatinHypercubeDesign

LatinHypercubeDesign(
    factors: dict, num_samples: int, seed: int = 42
)

Bases: DesignMatrix

Generates Latin Hypercube Sampling designs for multidimensional factors.

LHS is a generalization of a Latin Square to an arbitrary number of dimensions. It stratifies the range of every factor so that each one-dimensional projection of the sample is evenly covered, which reduces the accidental clustering of points that can occur with simple random sampling.

Mathematical Definition and Properties

Let $N$ be the number of target samples (runs) and $k$ be the number of factors. A Latin Hypercube design is represented by an $N \times k$ matrix where:

1. Projecting the $N$ sample points onto any single factor dimension yields exactly one sample in each of $N$ equally probable intervals.
2. Specifically, the coded range $[0, 1)$ of each factor is divided into $N$ non-overlapping intervals of equal probability:
   $$ I_j = \left[ \frac{j-1}{N}, \frac{j}{N} \right) \quad \text{for } j \in \{1, 2, \dots, N\} $$
3. Within each interval $I_j$, a point is sampled:
   $$ x_j = \frac{j-1 + U_j}{N} $$
   where $U_j \sim \text{Uniform}(0, 1)$ is drawn independently for each cell. Fixing $U_j = 0.5$ instead gives the midpoint variant.
4. The sampled values for the $k$ dimensions are paired using independent, random permutations of the set $\{1, 2, \dots, N\}$ for each column.
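
The stratification property described above can be checked numerically. The following is a minimal standard-library sketch, not the `LatinHypercubeDesign` implementation: `lhs_unit` and `occupied_strata` are hypothetical helper names, and interval indices are 0-based here for convenience.

```python
import random


def lhs_unit(n_samples, n_factors, seed=0):
    """Return n_samples rows of n_factors coded LHS values in [0, 1)."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_factors):
        strata = list(range(n_samples))  # 0-based interval indices
        rng.shuffle(strata)              # independent permutation per factor
        # One uniform draw inside each assigned interval
        columns.append([(s + rng.random()) / n_samples for s in strata])
    # Transpose the columns into rows (one row per run)
    return [list(row) for row in zip(*columns)]


def occupied_strata(points, factor):
    """Sorted interval indices that the points fall into along one factor."""
    n = len(points)
    return sorted(int(p[factor] * n) for p in points)


points = lhs_unit(n_samples=8, n_factors=3, seed=1)
for j in range(3):
    # Every one of the 8 intervals is hit exactly once in each dimension
    assert occupied_strata(points, j) == list(range(8))
```

Projecting onto any single factor recovers a permutation of the interval indices, which is exactly the Latin Hypercube condition.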

Maximin LHS Space-Filling Variant

Standard random LHS can still yield sample points clustered close together in multidimensional space, because stratification is only guaranteed per factor. To reduce this, Maximin LHS optimizes the column permutations $\Pi$ to maximize the minimum Euclidean distance between any two sample points: $$ \max_{\Pi} \min_{a \neq b} \lVert x_a - x_b \rVert_2 $$ This pushes the points apart and fills the multidimensional space more uniformly. The current `generate` implementation does not yet apply this optimization; it is listed as a TODO in the source.
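
One simple way to approximate the maximin criterion is to draw several random Latin Hypercubes and keep the one with the largest minimum pairwise distance. The sketch below assumes this best-of-candidates strategy, which is a stand-in and not necessarily the approach the library will adopt; `random_lhs`, `min_pairwise_distance`, and `maximin_lhs` are hypothetical names.

```python
import itertools
import math
import random


def random_lhs(n_samples, n_factors, rng):
    """One random coded LHS (rows of values in [0, 1))."""
    columns = []
    for _ in range(n_factors):
        strata = list(range(n_samples))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_samples for s in strata])
    return [list(row) for row in zip(*columns)]


def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two rows."""
    return min(math.dist(a, b) for a, b in itertools.combinations(points, 2))


def maximin_lhs(n_samples, n_factors, n_candidates=200, seed=0):
    """Best of n_candidates random LHS designs under the maximin criterion."""
    rng = random.Random(seed)
    candidates = (
        random_lhs(n_samples, n_factors, rng) for _ in range(n_candidates)
    )
    return max(candidates, key=min_pairwise_distance)


design = maximin_lhs(n_samples=8, n_factors=2, n_candidates=100, seed=3)
```

Every candidate is itself a valid Latin Hypercube, so the selected design keeps the one-sample-per-interval property while typically spreading points further apart than a single random draw.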

Pseudocode for the Algorithm

function generate_lhs(factors, N):
    1. Determine k = number of factors.
    2. Initialize N x k matrix LHS_coded.
    3. For each column j from 1 to k:
         a. Create array of interval indices: [1, 2, ..., N].
         b. Shuffle the interval indices randomly (permutation).
         c. For each row i:
              Draw random uniform noise U ~ Uniform(0, 1).
              Compute coded value: cell = (shuffled_index[i] - 1 + U) / N.
              LHS_coded[i, j] = cell.
    4. Maximin optimization (planned; not yet implemented in generate):
         Iterate permutations to maximize min_distance(row_a, row_b).
    5. Map the coded values in [0, 1] to the physical bounds specified in factors.
    6. Return DataFrame.
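
Step 5 of the pseudocode is a linear rescaling of each coded column. The sketch below illustrates it with plain Python lists instead of a DataFrame; `scale_to_bounds` is a hypothetical helper, and it assumes each factor maps to a sequence whose first and last entries are the lower and upper bounds.

```python
def scale_to_bounds(coded_rows, factors):
    """Map coded values in [0, 1) to each factor's [low, high] range."""
    names = list(factors)
    scaled = {name: [] for name in names}
    for row in coded_rows:
        for name, value in zip(names, row):
            low, high = factors[name][0], factors[name][-1]
            scaled[name].append(low + value * (high - low))
    return scaled


factors = {"temp": [100, 500], "speed": [0, 100]}
coded = [[0.0, 0.5], [0.25, 0.75], [0.5, 0.0]]
physical = scale_to_bounds(coded, factors)
# physical["temp"]  == [100.0, 200.0, 300.0]
# physical["speed"] == [50.0, 75.0, 0.0]
```

Because the mapping is monotonic, the one-sample-per-interval structure of the coded design carries over to the physical ranges.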

ATTRIBUTE	DESCRIPTION
`num_samples`	The exact number of samples (runs) to draw. TYPE: `int`

Examples:

>>> # Draw 50 space-filling points to explore temperature and speed bounds
>>> factors = {"temp": [100, 500], "speed": [0, 100]}
>>> design = LatinHypercubeDesign(factors, num_samples=50)
>>> df = design.generate()
>>> # df contains 50 runs (rows) and one column per factor, covering the entire rectangular region.

PARAMETER	DESCRIPTION
`factors`	Mapping of factor labels to their lower and upper physical boundaries. TYPE: `dict`
`num_samples`	The target number of samples. TYPE: `int`
`seed`	Random seed for reproducibility. Defaults to 42. TYPE: `int` DEFAULT: `42`
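
The constructor shown on this page stores its arguments without checking them. Callers who want early failures could validate inputs first. The following is a minimal sketch of such a check, assuming bounds are given as `[low, high]` sequences; `validate_lhs_inputs` is a hypothetical helper and is not part of the documented API.

```python
def validate_lhs_inputs(factors, num_samples):
    """Raise ValueError if the inputs cannot define an LHS design."""
    if not isinstance(num_samples, int) or num_samples < 1:
        raise ValueError("num_samples must be a positive integer")
    if not factors:
        raise ValueError("factors must contain at least one entry")
    for name, bounds in factors.items():
        if len(bounds) < 2:
            raise ValueError(f"factor {name!r} needs a lower and an upper bound")
        low, high = bounds[0], bounds[-1]
        if not low < high:
            raise ValueError(f"factor {name!r} has low >= high ({low} >= {high})")


# Passes silently for well-formed inputs
validate_lhs_inputs({"temp": [100, 500], "speed": [0, 100]}, num_samples=50)
```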

METHOD	DESCRIPTION
`generate`	Generates the Latin Hypercube design matrix.

Source code in src\xpyrment\design\doe\lhs.py

def __init__(self, factors: dict, num_samples: int, seed: int = 42):
    """Initializes a LatinHypercubeDesign.

    Args:
        factors (dict): Mapping of factor labels to their lower and upper physical boundaries.
        num_samples (int): The target number of samples.
        seed (int): Random seed for reproducibility. Defaults to 42.
    """
    super().__init__(factors)
    self.num_samples = num_samples
    self.seed = seed

generate

generate() -> DataFrame

Generates the Latin Hypercube design matrix.

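Partitions each factor's coded range into `num_samples` equal intervals, independently permutes the interval order for each factor, draws one uniform point inside each interval, and maps the results to the physical bounds. Maximin spacing optimization is not yet implemented.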

RETURNS	DESCRIPTION
`DataFrame`	A pandas DataFrame containing the LHS matrix.

Source code in src\xpyrment\design\doe\lhs.py

def generate(self) -> pd.DataFrame:
    """Generates the Latin Hypercube design matrix.

    Partitions each factor's coded range into intervals, permutes the interval order per factor,
    draws one uniform point inside each interval, and maps results to physical bounds.

    Returns:
        pd.DataFrame: A pandas DataFrame containing the LHS matrix.
    """
    import numpy as np

    rng = np.random.default_rng(self.seed)

    k = len(self.factors)
    keys = list(self.factors.keys())
    N = self.num_samples

    coded_matrix = np.zeros((N, k))

    for j in range(k):
        # Create interval indices from 1 to N
        intervals = np.arange(1, N + 1)
        # Permute interval indices independently for each factor
        shuffled_intervals = rng.permutation(intervals)

        # For each interval, draw a uniform point
        for i in range(N):
            U = rng.uniform(0.0, 1.0)
            # Compute coded coordinate in [0.0, 1.0]
            coded_matrix[i, j] = (shuffled_intervals[i] - 1.0 + U) / N

    # Map coded levels from [0.0, 1.0] back to physical bounds
    physical_df = pd.DataFrame()
    for idx, col in enumerate(keys):
        bounds = self.factors[col]
        low, high = bounds[0], bounds[-1]
        physical_df[col] = low + coded_matrix[:, idx] * (high - low)

    # TODO: Implement Maximin Latin Hypercube optimization (shuffling columns to maximize minimum pairwise Euclidean distance).
    # TODO: Add correlation-minimization algorithms (such as Owen's randomized LHS) to reduce collinearity between factors.
    return physical_df