Metrics Module
The xpyrment.metrics package implements a unified, low-code statistical metrics taxonomy supporting automated CUPED variance reduction, Delta method ratio calculations, and frequentist/Bayesian evaluation pipelines.
📊 Supported Metric Models
The following metric formulas are dynamically registered and supported:
| Metric Model | Technical Class Path | Key Features & Analytical Properties |
|---|---|---|
| MeanMetric | xpyrment.metrics.taxonomy.MeanMetric | A metric representing a continuous or numeric value (e.g., average revenue, sessions). |
| ProportionMetric | xpyrment.metrics.taxonomy.ProportionMetric | A metric representing a binary/proportion rate (e.g., conversion rate, success rate). |
| RatioMetric | xpyrment.metrics.taxonomy.RatioMetric | A metric calculated as the ratio: sum(numerator) / sum(denominator) (e.g., Click-Through-Rate). |
📦 Package Reference
metrics
Metrics package for taxonomy, guardrails, and transformations.
This package houses the core definitions and statistical calculation routines for metrics
evaluated during an experiment:
- BaseMetric: The abstract base class establishing standard evaluation contracts.
- MeanMetric: For continuous measurements, supporting pre-period CUPED adjustments.
- ProportionMetric: For binary binomial event rates.
- RatioMetric: For compound aggregate metrics (e.g., Click-Through Rate) evaluated via Delta Method variance.
- GuardrailMetric: Special monitoring wrapper to prevent platform/business regressions.
- log_transform: Normalization for extremely skewed continuous metrics.
- delta_normalization: Taylor expansion adjustment for advanced aggregate metrics.
| MODULE | DESCRIPTION |
|---|---|
| guardrails | Guardrail metrics to protect core platform health and business stability. |
| taxonomy | Standardized metrics taxonomy, calculation engines, and variance reduction routines. |
| transformations | Mathematical transformations and normalization utilities for experimental telemetry. |

| CLASS | DESCRIPTION |
|---|---|
| GuardrailMetric | Defines a guardrail metric with specific breach thresholds. |
| BaseMetric | Abstract base class representing a statistical metric in the experiment taxonomy. |
| MeanMetric | A metric representing a continuous or numeric value (e.g., average revenue, sessions). |
| ProportionMetric | A metric representing a binary/proportion rate (e.g., conversion rate, success rate). |
| RatioMetric | A metric calculated as the ratio: sum(numerator) / sum(denominator) (e.g., Click-Through-Rate). |

| FUNCTION | DESCRIPTION |
|---|---|
| log_transform | Transforms continuous metrics using a shifted natural log transformation. |
| delta_normalization | Normalizes metrics using the Delta Method expansion (Stub/Scaffolding). |
GuardrailMetric
GuardrailMetric(
metric: BaseMetric, max_allowed_change: float = 0.01
)
Defines a guardrail metric with specific breach thresholds.
Guardrail metrics are designed to prevent treatment arms from causing catastrophic regressions on critical secondary metrics. Unlike primary metrics (where we search for significant positive changes), guardrail metrics are evaluated to ensure that they do not deteriorate beyond a pre-specified tolerance boundary, irrespective of statistical significance.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| metric | The underlying metric to monitor (e.g., MeanMetric, RatioMetric). TYPE: BaseMetric |
| max_allowed_change | The maximum tolerated relative change (positive or negative), expressed as a fraction (e.g., 0.01 for 1%). TYPE: float |
Examples:
Example
>>> from xpyrment.metrics.taxonomy import MeanMetric
>>> from xpyrment.metrics.guardrails import GuardrailMetric
>>> latency_metric = MeanMetric("Page Latency", value_col="load_time")
>>> guardrail = GuardrailMetric(latency_metric, max_allowed_change=0.02) # tolerate at most a 2% relative change in either direction
>>> calc_result = {"metric_name": "Page Latency", "relative_lift": 0.035} # 3.5% lift, a latency regression
>>> guardrail.check_breach(calc_result)
True
| PARAMETER | DESCRIPTION |
|---|---|
| metric | The concrete metric instance being monitored. TYPE: BaseMetric |
| max_allowed_change | The threshold for the maximum absolute relative lift allowed before triggering a breach. Defaults to 0.01 (1%). TYPE: float |

| METHOD | DESCRIPTION |
|---|---|
| check_breach | Determines if the calculated lift breaches the guardrail thresholds. |
Source code in src\xpyrment\metrics\guardrails.py
check_breach
Determines if the calculated lift breaches the guardrail thresholds.
Mathematical Representation
Let \(L\) be the relative lift calculated for the wrapped metric:
$$ L = \frac{\bar{Y}_T - \bar{Y}_C}{\bar{Y}_C} $$
A breach is detected if the magnitude of the relative lift exceeds the maximum allowed change:
$$ \text{Breach} \iff |L| > \text{max\_allowed\_change} $$
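The breach rule can be sketched in plain Python. This is a minimal illustration of the two formulas in this section, not the library's source: the helper names `relative_lift` and `is_breach` are hypothetical, and the latency means are made-up values.

```python
def relative_lift(control_mean: float, treatment_mean: float) -> float:
    """Relative lift L = (mean_T - mean_C) / mean_C."""
    if control_mean == 0:
        raise ValueError("relative lift is undefined when the control mean is zero")
    return (treatment_mean - control_mean) / control_mean


def is_breach(lift: float, max_allowed_change: float = 0.01) -> bool:
    """Breach when |L| strictly exceeds the tolerance, in either direction."""
    return abs(lift) > max_allowed_change


# Hypothetical latency means: control 200 ms, treatment 207 ms (a 3.5% increase).
lift = relative_lift(200.0, 207.0)
print(is_breach(lift, max_allowed_change=0.02))  # True
```

Because the rule uses the absolute value, a 3.5% decrease would trigger the same breach; the comparison is strict, so a lift exactly equal to the tolerance does not.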
| PARAMETER | DESCRIPTION |
|---|---|
| calculation_result | Output dictionary produced by calling metric.calculate() on experimental data. TYPE: Dict[str, Any] |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the relative lift is larger in magnitude than max_allowed_change, False otherwise. |
Source code in src\xpyrment\metrics\guardrails.py
BaseMetric
Bases: ABC
Abstract base class representing a statistical metric in the experiment taxonomy.
All custom metrics must inherit from BaseMetric and implement the abstract .calculate()
method to return a standardized MetricResult dictionary.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| name | The unique descriptive name of the metric. TYPE: str |

| PARAMETER | DESCRIPTION |
|---|---|
| name | Unique descriptive name of the metric. TYPE: str |

| METHOD | DESCRIPTION |
|---|---|
| calculate | Abstract method to compute statistics for control and treatment groups. |
Source code in src\xpyrment\metrics\taxonomy.py
calculate
abstractmethod
Abstract method to compute statistics for control and treatment groups.
| PARAMETER | DESCRIPTION |
|---|---|
| df | The experimental dataset. TYPE: DataFrame |
| treatment_col | Column name identifying experimental groups/arms. TYPE: str |
| control | The value in treatment_col that identifies the control arm. TYPE: str |
| treatment | The value in treatment_col that identifies the treatment arm. TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | A compliant MetricResult dictionary. |
Source code in src\xpyrment\metrics\taxonomy.py
MeanMetric
Bases: BaseMetric
A metric representing a continuous or numeric value (e.g., average revenue, sessions).
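Supports optional pre-period CUPED (Controlled-experiment Using Pre-Experiment Data) adjustment, which uses a pre-experiment covariate to remove variance that was already present before the experiment started. The more strongly the covariate correlates with the metric, the smaller the sample size or runtime needed to reach a given power.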
| ATTRIBUTE | DESCRIPTION |
|---|---|
| value_col | The column in the DataFrame containing active experiment period values. TYPE: str |
| pre_period_col | The column containing pre-experiment baseline values for CUPED. TYPE: Optional[str] |

| PARAMETER | DESCRIPTION |
|---|---|
| name | Unique descriptive name of the metric. TYPE: str |
| value_col | Column name containing experiment period values. TYPE: str |
| pre_period_col | Column name containing pre-experiment baseline values for CUPED. Defaults to None (no CUPED applied). TYPE: Optional[str] |

| METHOD | DESCRIPTION |
|---|---|
| calculate | Calculates descriptive and Welch's t-test statistics for the mean metric. |
Source code in src\xpyrment\metrics\taxonomy.py
calculate
calculate(
df: DataFrame,
treatment_col: str,
control: str,
treatment: str,
alpha: float = 0.05,
) -> Dict[str, Any]
Calculates descriptive and Welch's t-test statistics for the mean metric.
Drops missing values on the value column. If pre_period_col is provided,
performs joint missing drop and executes a standard linear CUPED adjustment:
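For orientation, the textbook form of a linear CUPED adjustment, which may differ in detail from what MeanMetric.calculate does internally, is $$ \theta = \frac{\text{Cov}(Y, X)}{\text{Var}(X)}, \qquad Y_i^{\text{cuped}} = Y_i - \theta \, (X_i - \bar{X}) $$ where \(Y\) is the experiment-period value and \(X\) the pre-period value. A minimal stdlib sketch follows; the function name `cuped_adjust`, the pooled estimation of \(\theta\), and the data are assumptions made for illustration.

```python
from statistics import fmean, variance


def cuped_adjust(y, x):
    """Return y adjusted by the pre-period covariate x (textbook linear CUPED)."""
    x_bar, y_bar = fmean(x), fmean(y)
    # Sample covariance, using the same (n - 1) divisor as statistics.variance.
    cov_yx = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(y, x)) / (len(y) - 1)
    theta = cov_yx / variance(x)
    return [yi - theta * (xi - x_bar) for yi, xi in zip(y, x)]


# Hypothetical data: pre-period values x and experiment-period values y.
x = [10.0, 12.0, 9.0, 15.0, 11.0, 13.0]
y = [11.0, 13.5, 9.5, 16.0, 11.5, 14.5]
y_adj = cuped_adjust(y, x)

print(fmean(y), fmean(y_adj))        # the mean is preserved
print(variance(y), variance(y_adj))  # the variance shrinks when x and y correlate
```

Centering on \(\bar{X}\) leaves the mean unchanged, so the treatment effect estimate keeps its meaning while its variance drops by roughly a factor of \(1 - \rho^2\), where \(\rho\) is the correlation between \(X\) and \(Y\).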
| PARAMETER | DESCRIPTION |
|---|---|
| df | The experimental dataset. TYPE: DataFrame |
| treatment_col | Column identifying treatment assignments. TYPE: str |
| control | Control arm identifier value in treatment_col. TYPE: str |
| treatment | Treatment arm identifier value in treatment_col. TYPE: str |
| alpha | Significance level for Welch's confidence intervals. Defaults to 0.05. TYPE: float |

| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | A completed MetricResult dictionary. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If either control or treatment group becomes empty after filtering. |
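The Welch's t-test statistics that calculate reports can be illustrated with a short stdlib sketch. It computes only the t statistic and the Welch-Satterthwaite degrees of freedom; the function name `welch_t` and the sample data are hypothetical, and the library's own result keys are not shown here.

```python
from math import sqrt
from statistics import fmean, variance


def welch_t(control, treatment):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_c, n_t = len(control), len(treatment)
    # Variance of each arm's sample mean; the arms may have unequal variances.
    v_c = variance(control) / n_c
    v_t = variance(treatment) / n_t
    t_stat = (fmean(treatment) - fmean(control)) / sqrt(v_c + v_t)
    dof = (v_c + v_t) ** 2 / (v_c ** 2 / (n_c - 1) + v_t ** 2 / (n_t - 1))
    return t_stat, dof


control = [1.0, 2.0, 3.0, 4.0, 5.0]
treatment = [2.0, 3.0, 4.0, 5.0, 6.0]
print(welch_t(control, treatment))  # (1.0, 8.0)
```

A p-value or confidence interval would additionally need the Student's t distribution with `dof` degrees of freedom, which the standard library does not provide.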
Source code in src\xpyrment\metrics\taxonomy.py
ProportionMetric
Bases: MeanMetric
A metric representing a binary/proportion rate (e.g., conversion rate, success rate).
Inherits continuous logic from MeanMetric, as proportions can be modelled asymptotically
using normal approximations (Z-test/t-test) under the Central Limit Theorem.
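The reason the continuous logic carries over is that the mean of a 0/1 column is the conversion rate, and its sample variance matches the Bernoulli variance up to a small-sample factor. The sketch below checks this with the standard library; the outcome data are hypothetical.

```python
from statistics import fmean, variance

conversions = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]  # hypothetical 0/1 outcomes
n = len(conversions)

p_hat = fmean(conversions)          # conversion rate = mean of the 0/1 column
s2 = variance(conversions)          # unbiased sample variance, divisor n - 1
bernoulli_var = p_hat * (1 - p_hat) # plug-in variance p(1 - p), divisor n

# The two estimates differ only by the factor n / (n - 1).
print(p_hat, s2, bernoulli_var * n / (n - 1))
```

For large samples the factor n / (n - 1) is negligible, which is why a t-test on the binary column and a two-proportion z-test with unpooled variance give nearly identical results.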
| PARAMETER | DESCRIPTION |
|---|---|
| name | Unique descriptive name of the metric. TYPE: str |
| value_col | Column name containing experiment period values. TYPE: str |
| pre_period_col | Column name containing pre-experiment baseline values for CUPED. Defaults to None (no CUPED applied). TYPE: Optional[str] |

| METHOD | DESCRIPTION |
|---|---|
| calculate | Calculates proportion conversion rates, differences, and statistical significance. |
Source code in src\xpyrment\metrics\taxonomy.py
calculate
calculate(
df: DataFrame,
treatment_col: str,
control: str,
treatment: str,
alpha: float = 0.05,
) -> Dict[str, Any]
Calculates proportion conversion rates, differences, and statistical significance.
Drops missing values, calculates means and variances of binary inputs, and delegates
to MeanMetric.calculate while overriding the metric type string to "Proportion".
| PARAMETER | DESCRIPTION |
|---|---|
| df | The experimental dataset. TYPE: DataFrame |
| treatment_col | Column identifying treatment assignments. TYPE: str |
| control | Control arm identifier value. TYPE: str |
| treatment | Treatment arm identifier value. TYPE: str |
| alpha | Significance level. Defaults to 0.05. TYPE: float |

| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Standardized results dict with "metric_type" set to "Proportion". |
Source code in src\xpyrment\metrics\taxonomy.py
RatioMetric
RatioMetric(
name: str,
numerator_col: str,
denominator_col: str,
pre_numerator_col: Optional[str] = None,
pre_denominator_col: Optional[str] = None,
)
Bases: BaseMetric
A metric calculated as the ratio: sum(numerator) / sum(denominator) (e.g., Click-Through-Rate).
Employs the Delta Method to approximate ratio-level variances and supports double-covariate ratio-level CUPED adjustments to independently reduce variance in numerator and denominator.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| numerator_col | The column containing active period numerator values. TYPE: str |
| denominator_col | The column containing active period denominator values (must be \(>0\)). TYPE: str |
| pre_numerator_col | Column containing pre-experiment baseline numerator values. TYPE: Optional[str] |
| pre_denominator_col | Column containing pre-experiment baseline denominator values. TYPE: Optional[str] |

| PARAMETER | DESCRIPTION |
|---|---|
| name | Unique descriptive name. TYPE: str |
| numerator_col | Active numerator column name. TYPE: str |
| denominator_col | Active denominator column name. TYPE: str |
| pre_numerator_col | Pre-experiment numerator column. Defaults to None. TYPE: Optional[str] |
| pre_denominator_col | Pre-experiment denominator column. Defaults to None. TYPE: Optional[str] |

| METHOD | DESCRIPTION |
|---|---|
| calculate | Calculates ratio values, Delta-method variances, and statistical significance. |
Source code in src\xpyrment\metrics\taxonomy.py
calculate
calculate(
df: DataFrame,
treatment_col: str,
control: str,
treatment: str,
alpha: float = 0.05,
) -> Dict[str, Any]
Calculates ratio values, Delta-method variances, and statistical significance.
Cleans missing values and non-positive denominators. If double-covariates are present, separately fits linear CUPED adjustments to the numerator and denominator series:
The ratio variance is then estimated using the Delta Method formulation:
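As a hedged illustration, the ratio variance formula given under delta_normalization can be applied to per-unit sample moments and divided by the number of units \(n\): $$ \text{Var}(\hat{R}) \approx \frac{1}{n} \left[ \frac{s_N^2}{\bar{N}^{\,0}\bar{D}^2} + \frac{\bar{N}^2 s_D^2}{\bar{D}^4} - 2 \frac{\bar{N} \, s_{ND}}{\bar{D}^3} \right] $$ where \(\bar{N}, \bar{D}\) are the per-unit means, \(s_N^2, s_D^2\) the sample variances, and \(s_{ND}\) the sample covariance. The sketch assumes independent units and omits the CUPED step; the function name `delta_ratio_variance` and the data are hypothetical, and the library's exact computation may differ.

```python
from statistics import fmean, variance


def delta_ratio_variance(num, den):
    """Delta-method variance of sum(num) / sum(den) for independent units."""
    n = len(num)
    mu_n, mu_d = fmean(num), fmean(den)
    var_n, var_d = variance(num), variance(den)
    cov_nd = sum((a - mu_n) * (b - mu_d) for a, b in zip(num, den)) / (n - 1)
    per_unit = (
        var_n / mu_d**2
        + mu_n**2 * var_d / mu_d**4
        - 2 * mu_n * cov_nd / mu_d**3
    )
    return per_unit / n  # variance of the ratio of means


# Hypothetical per-user clicks (numerator) and impressions (denominator).
clicks = [1.0, 0.0, 3.0, 2.0, 1.0, 4.0]
views = [10.0, 8.0, 20.0, 15.0, 9.0, 25.0]

ratio = sum(clicks) / sum(views)
print(ratio, delta_ratio_variance(clicks, views))
```

The covariance term matters: when the numerator is exactly proportional to the denominator, the three terms cancel and the estimated variance is zero, as expected for a ratio that never varies.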
| PARAMETER | DESCRIPTION |
|---|---|
| df | The experimental dataset. TYPE: DataFrame |
| treatment_col | Column identifying treatment assignments. TYPE: str |
| control | Control arm identifier value. TYPE: str |
| treatment | Treatment arm identifier value. TYPE: str |
| alpha | Significance level. Defaults to 0.05. TYPE: float |

| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Completed MetricResult dictionary. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If either control or treatment group becomes empty after filtering. |
Source code in src\xpyrment\metrics\taxonomy.py
log_transform
Transforms continuous metrics using a shifted natural log transformation.
Highly skewed distributions, such as revenue per user or session durations, often violate the normality assumptions of classical parametric tests (e.g., Student's or Welch's t-test), especially at small sample sizes. Applying a natural log transformation compresses the long right tail, which brings the distribution closer to symmetric and tends to stabilize the variance, although it does not guarantee normality. The addition of 1 keeps the transformation defined at zero and ensures that zero values remain mapped to zero.
Mathematical Representation
The transformation is defined as:
$$ y_{\text{transformed}} = \ln(y + 1) $$
This is mathematically equivalent to:
$$ \operatorname{log1p}(y) $$
which maintains numerical precision for extremely small values of \(y \approx 0\).
| PARAMETER | DESCRIPTION |
|---|---|
| df | The source DataFrame containing the column to transform. TYPE: DataFrame |
| col | The name of the target column in df. TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| Series | A new pandas Series containing the log-transformed values. |
Examples:
Example
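The sketch below illustrates the underlying transformation with the standard library's `math.log1p` rather than calling log_transform itself, which takes a DataFrame and a column name. The input values are hypothetical.

```python
import math

values = [0.0, 1e-12, 9.0, 9999.0]  # hypothetical skewed metric values
transformed = [math.log1p(v) for v in values]
print(transformed[0])  # 0.0, so zero stays mapped to zero

# For tiny y, the sum 1 + y is rounded before the log is taken, so
# math.log(1 + y) loses digits that math.log1p(y) keeps.
tiny = 1e-12
print(math.log(1 + tiny), math.log1p(tiny))
```

The second print shows the naive form drifting away from the true value (very close to 1e-12), which is the precision argument made in the mathematical representation.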
Source code in src\xpyrment\metrics\transformations.py
delta_normalization
Normalizes metrics using the Delta Method expansion (Stub/Scaffolding).
The Delta Method is a general technique for approximating the variance of a function of random variables. For non-linear transformations or aggregate metrics (e.g., Click-Through-Rate where the denominator is not fixed), direct variance calculations are biased. Delta normalization computes a Taylor series expansion of the target function around its expected value to derive an asymptotically normal approximation.
Mathematical Context
Let \(g(X)\) be a differentiable function of a random variable \(X\) with mean \(\mu\) and variance \(\sigma^2\). The first-order Taylor expansion of \(g(X)\) about \(\mu\) is:
$$ g(X) \approx g(\mu) + g'(\mu)(X - \mu) $$
Taking the variance of this linear approximation yields:
$$ \text{Var}(g(X)) \approx [g'(\mu)]^2 \sigma^2 $$
For multidimensional vectors, such as ratio estimates of the form \(g(X, Y) = X / Y\), the Taylor expansion incorporates the covariance between numerator and denominator:
$$ \text{Var}\left(\frac{X}{Y}\right) \approx \frac{1}{\mu_Y^2} \text{Var}(X) + \frac{\mu_X^2}{\mu_Y^4} \text{Var}(Y) - 2 \frac{\mu_X}{\mu_Y^3} \text{Cov}(X, Y) $$
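A quick numeric check of the first-order approximation: for a normally distributed \(X\) and \(g(X) = X^2\), the exact variance is \(4\mu^2\sigma^2 + 2\sigma^4\), so the delta-method value \(4\mu^2\sigma^2\) can be compared against it directly. The helper name `delta_variance` and the chosen \(\mu, \sigma\) are illustrative assumptions; since delta_normalization is documented as a stub, this is not its implementation.

```python
def delta_variance(g_prime_at_mu: float, sigma2: float) -> float:
    """First-order delta-method variance: [g'(mu)]^2 * sigma^2."""
    return g_prime_at_mu**2 * sigma2


mu, sigma = 10.0, 0.5

# g(x) = x^2, so g'(mu) = 2 * mu.
approx = delta_variance(2 * mu, sigma**2)
# Closed-form variance of X^2 when X is normal with mean mu and std sigma.
exact = 4 * mu**2 * sigma**2 + 2 * sigma**4

print(approx, exact)  # 100.0 100.125
```

The gap is the dropped higher-order term \(2\sigma^4\), which becomes negligible when \(\sigma\) is small relative to \(\mu\), as is typical for sample means at large \(n\).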
| PARAMETER | DESCRIPTION |
|---|---|
| df | The source DataFrame containing the metric columns. TYPE: pd.DataFrame |
| col | The name of the column representing the metric to normalize. TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| Series | A pandas Series of normalized values. |