Report Module

The xpyrment.report module contains the submodules and components used for experiment reporting.

report

Experimental reporting, lifecycle tracking, and scientific visualization.

This package provides logging, recording, and charting tools to summarize experimental phases. It ensures that setups, runtime quality checks, and analytical inferences are aggregated and presented in standard-compliant, publication-ready formats.

Submodules:

- card: Compiles standard, machine-readable Experiment Cards for metadata cataloging.
- audit: Logs and chains immutable lifecycle events for audit and governance compliance.
- export: Generates high-quality statistical plots (forest plots, power curves).

MODULE DESCRIPTION
audit

Immutable, compliance-ready experimental audit trails.

card

Unified, standard-compliant Experiment Cards and meta-ledgers.

export

Statistical visualization and plot generation for experimental reports.

generator
CLASS DESCRIPTION
ExperimentCard

Consumes metadata, planning state, and calculations to compile a unified report card.

AuditTrail

Maintains an immutable, compliance-ready audit trail of experiment phase transition events.

ExperimentReportGenerator

Generates premium standalone Markdown and HTML reports from experiment AnalysisResult instances.

FUNCTION DESCRIPTION
plot_forest

Generates a horizontal forest plot visualizing relative lift and confidence intervals.

plot_power_curve

Plots required sample size per variant across a range of Minimum Detectable Effects (MDE).
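To give a feel for what a power curve tabulates, the sketch below computes a per-variant sample size for a two-sided, two-sample z-test on a continuous metric across several absolute MDEs. The formula n = 2(z_{1-α/2} + z_{1-β})²σ²/δ² is the textbook approximation and is an assumption here; the helper name `required_n_per_variant` and the σ value are hypothetical, and plot_power_curve may compute sizes differently.

```python
import math
from statistics import NormalDist


def required_n_per_variant(mde_abs: float, sigma: float,
                           alpha: float = 0.05, power: float = 0.80) -> int:
    """Textbook two-sample z-test sample size per variant (hypothetical helper)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)               # quantile for the target power
    n = 2.0 * (z_alpha + z_beta) ** 2 * sigma ** 2 / mde_abs ** 2
    return math.ceil(n)


# Required n scales with 1 / MDE^2, so halving the MDE roughly quadruples n.
mdes = [0.5, 1.0, 2.0]
sizes = {mde: required_n_per_variant(mde, sigma=10.0) for mde in mdes}
for mde, n in sizes.items():
    print(f"MDE={mde:>4}: n per variant = {n:,}")
```

Plotting `sizes` against `mdes` would give the general shape of such a curve: steep for small effects and flat for large ones.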

ExperimentCard

ExperimentCard(
    experiment_id: str,
    plan_spec: dict,
    validation_spec: dict,
    analysis_summary: dict,
)

Consumes metadata, planning state, and calculations to compile a unified report card.

An Experiment Card (inspired by Model Cards, Mitchell et al. 2019) is the definitive, unified record and metadata registry of an experiment. It acts as a standardized document that records the design, execution, and results of an experiment in a machine-readable format. This makes it possible to search, catalog, and run large-scale meta-analyses across thousands of past experiments (e.g., tracking cumulative lift, estimating p-value distributions, or measuring historical power).

The Experiment Card schema unifies three core lifecycle stages:

1. Planning & Setup Specification (plan_spec):
   - mde: Minimum Detectable Effect (relative or absolute).
   - alpha: Nominal Type I error rate (e.g., \(0.05\)).
   - power: Target statistical power (\(1 - \beta\), e.g., \(0.80\)).
   - target_sample_size: Calculated sample size requirement.
   - metric_registry: Names and types of registered primary, secondary, and guardrail metrics.
2. Runtime Validation & Diagnostics (validation_spec):
   - srm_p_value: Pearson Chi-Square goodness-of-fit p-value for sample ratio mismatch (SRM).
   - covariate_balance: Standardized Mean Differences (SMDs) used to check that random assignment produced balanced groups.
3. Statistical Analysis Outcomes (analysis_summary):
   - treatment_effect: Relative and absolute lifts, standard errors, and confidence intervals.
   - p_values: Observed p-values (with any multiple-testing adjustments applied).
   - recommendation: Automated decision outcome (e.g., "SHIP", "NO-SHIP", "INCONCLUSIVE").
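Because every card shares the same three sections, simple meta-analyses reduce to iterating over dictionaries. The sketch below uses hand-written card payloads to tally recommendations and flag cards whose SRM p-value falls below 0.01. All values are invented for illustration, and the exact nesting inside each section is an assumption, since only the key names are documented here.

```python
from collections import Counter

# Hypothetical payloads shaped like the output of ExperimentCard.to_dict().
cards = [
    {"experiment_id": "exp-001",
     "plan_spec": {"mde": 0.02, "alpha": 0.05, "power": 0.80},
     "validation_spec": {"srm_p_value": 0.41},
     "analysis_summary": {"recommendation": "SHIP"}},
    {"experiment_id": "exp-002",
     "plan_spec": {"mde": 0.05, "alpha": 0.05, "power": 0.80},
     "validation_spec": {"srm_p_value": 0.73},
     "analysis_summary": {"recommendation": "NO-SHIP"}},
    {"experiment_id": "exp-003",
     "plan_spec": {"mde": 0.01, "alpha": 0.05, "power": 0.90},
     "validation_spec": {"srm_p_value": 0.0004},
     "analysis_summary": {"recommendation": "INCONCLUSIVE"}},
    {"experiment_id": "exp-004",
     "plan_spec": {"mde": 0.02, "alpha": 0.05, "power": 0.80},
     "validation_spec": {"srm_p_value": 0.12},
     "analysis_summary": {"recommendation": "SHIP"}},
]

# Tally decision outcomes across the catalog.
recommendations = Counter(c["analysis_summary"]["recommendation"] for c in cards)

# Flag experiments whose allocation looks suspicious (p < 0.01).
srm_flagged = [c["experiment_id"] for c in cards
               if c["validation_spec"]["srm_p_value"] < 0.01]

print(dict(recommendations))
print("SRM flagged:", srm_flagged)
```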
ATTRIBUTE DESCRIPTION
experiment_id

Unique tracking identifier for the experiment.

TYPE: str

plan_spec

Setup configurations, expected metrics, and calculated power parameters.

TYPE: dict

validation_spec

Summary of SRM and covariate balance diagnostics.

TYPE: dict

analysis_summary

Calculated point estimates, confidence intervals, and launch recommendations.

TYPE: dict

PARAMETER DESCRIPTION
experiment_id

The unique ID of the experiment.

TYPE: str

plan_spec

Setup and design characteristics dictionary.

TYPE: dict

validation_spec

Pre-analysis quality check outcomes.

TYPE: dict

analysis_summary

Post-analysis statistical summaries.

TYPE: dict

METHOD DESCRIPTION
to_dict

Serializes the experiment card metadata to a standard python dictionary.

to_json

Dumps the card as a formatted JSON document.

Source code in src\xpyrment\report\card.py
def __init__(self, experiment_id: str, plan_spec: dict, validation_spec: dict, analysis_summary: dict):
    """Initializes a new ExperimentCard.

    Args:
        experiment_id (str): The unique ID of the experiment.
        plan_spec (dict): Setup and design characteristics dictionary.
        validation_spec (dict): Pre-analysis quality check outcomes.
        analysis_summary (dict): Post-analysis statistical summaries.
    """
    self.experiment_id = experiment_id
    self.plan_spec = plan_spec
    self.validation_spec = validation_spec
    self.analysis_summary = analysis_summary

to_dict

to_dict() -> dict

Serializes the experiment card metadata to a standard python dictionary.

RETURNS DESCRIPTION
dict

The nested dictionary of card metadata.

TYPE: dict

Source code in src\xpyrment\report\card.py
def to_dict(self) -> dict:
    """Serializes the experiment card metadata to a standard python dictionary.

    Returns:
        dict: The nested dictionary of card metadata.
    """
    return {
        "experiment_id": self.experiment_id,
        "plan_spec": self.plan_spec,
        "validation_spec": self.validation_spec,
        "analysis_summary": self.analysis_summary
    }

to_json

to_json() -> str

Dumps the card as a formatted JSON document.

RETURNS DESCRIPTION
str

Indented, pretty-printed JSON string of the complete experiment card ledger.

TYPE: str

Source code in src\xpyrment\report\card.py
def to_json(self) -> str:
    """Dumps the card as a formatted JSON document.

    Returns:
        str: Indented, pretty-printed JSON string of the complete experiment card ledger.
    """
    return json.dumps(self.to_dict(), indent=2)
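A short usage sketch follows. So that it runs without the package installed, it defines a stand-in class that copies the constructor, to_dict, and to_json listed above; with xpyrment installed, one would presumably import ExperimentCard from the report package instead (that import path is inferred from this page, not verified). The spec values are invented.

```python
import json


class ExperimentCard:
    """Stand-in that copies the constructor and serializers listed above."""

    def __init__(self, experiment_id: str, plan_spec: dict,
                 validation_spec: dict, analysis_summary: dict):
        self.experiment_id = experiment_id
        self.plan_spec = plan_spec
        self.validation_spec = validation_spec
        self.analysis_summary = analysis_summary

    def to_dict(self) -> dict:
        return {
            "experiment_id": self.experiment_id,
            "plan_spec": self.plan_spec,
            "validation_spec": self.validation_spec,
            "analysis_summary": self.analysis_summary,
        }

    def to_json(self) -> str:
        return json.dumps(self.to_dict(), indent=2)


card = ExperimentCard(
    experiment_id="exp-checkout-007",  # invented identifier
    plan_spec={"mde": 0.02, "alpha": 0.05, "power": 0.80,
               "target_sample_size": 40000},
    validation_spec={"srm_p_value": 0.37, "covariate_balance": {"age": 0.03}},
    analysis_summary={"p_values": {"conversion": 0.21},
                      "recommendation": "INCONCLUSIVE"},
)

document = card.to_json()        # pretty-printed JSON string
restored = json.loads(document)  # parses back into plain dictionaries
print(document)
```

Since to_json delegates to json.dumps, the spec dictionaries need to contain only JSON-serializable values (strings, numbers, booleans, None, lists, and nested dictionaries).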

AuditTrail

AuditTrail(experiment_id: str, db_path: str = None)

Maintains an immutable, compliance-ready audit trail of experiment phase transition events.

In enterprise, financial, and clinical environments, maintaining a rigorous record of an experiment's history is critical for governance, auditing, and scientific reproducibility. An audit trail acts as a tamper-evident, chronological log tracking every key lifecycle change, modification to allocation parameters, and analytical peeking event.

Cryptographic Verification and State-Chaining

To satisfy strict regulatory compliance frameworks, the audit log entries are structured as a linear hash chain:

- Each log entry is represented as a state block \(B_k = (t_k, a_k, d_k, h_{k-1})\), where:
  - \(t_k\): The event timestamp in Coordinated Universal Time (ISO 8601 format).
  - \(a_k\): The action or state transition executed (e.g., "ALLOCATION_SHIFT").
  - \(d_k\): Detailed parameter changes (e.g., altering treatment allocation from \(10\%\) to \(50\%\)).
  - \(h_{k-1}\): The SHA-256 hash of the preceding block \(B_{k-1}\). The first block uses a string of 64 zeros in place of a predecessor hash.
- The hash of the current block \(h_k\) is computed as:

  $$ h_k = H(t_k \parallel a_k \parallel d_k \parallel h_{k-1}) $$

  where \(\parallel\) denotes string concatenation and \(H\) is the SHA-256 hash function. In the log_event implementation, the fields are joined with a literal "||" separator, and the optional signature and public-key strings are appended to the hashed string after \(h_{k-1}\).
- Because of this chaining, any retroactive modification of a historical entry changes its hash and breaks every later link, which makes tampering detectable whenever the chain is re-verified.
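The chaining rule can be illustrated with a minimal, standalone sketch. It is not the library's implementation: the helper names (`block_hash`, `append_block`, `chain_is_intact`) are hypothetical, the timestamps are fixed strings for reproducibility, and the signature fields are left out for brevity. The "||" separator and the all-zero genesis hash mirror the log_event source listed on this page.

```python
import hashlib

GENESIS_HASH = "0" * 64  # stands in for the predecessor of the first block


def block_hash(timestamp: str, action: str, details: str, prev_hash: str) -> str:
    """SHA-256 over the concatenated block fields."""
    payload = "||".join([timestamp, action, details, prev_hash])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: list, timestamp: str, action: str, details: str) -> None:
    """Link a new block to the hash of the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({
        "timestamp": timestamp,
        "action": action,
        "details": details,
        "prev_hash": prev_hash,
        "hash": block_hash(timestamp, action, details, prev_hash),
    })


def chain_is_intact(chain: list) -> bool:
    """Recompute every hash and check every back-link."""
    expected_prev = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(block["timestamp"], block["action"],
                                block["details"], block["prev_hash"])
        if block["hash"] != recomputed:
            return False
        expected_prev = block["hash"]
    return True


chain = []
append_block(chain, "2024-01-01T00:00:00+00:00", "PHASE_TRANSITION", "draft -> running")
append_block(chain, "2024-01-02T00:00:00+00:00", "ALLOCATION_SHIFT", "treatment 10% -> 50%")
append_block(chain, "2024-01-09T00:00:00+00:00", "PHASE_TRANSITION", "running -> stopped")

intact_before = chain_is_intact(chain)

# Rewrite history: edit the details of the middle block without rehashing.
chain[1]["details"] = "treatment 10% -> 90%"
intact_after = chain_is_intact(chain)

print(intact_before, intact_after)
```

A chain like this detects edits only when someone re-verifies it against hashes the editor could not also rewrite, which is why anchoring the latest hash outside the log (or signing entries) matters in practice.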

ATTRIBUTE DESCRIPTION
experiment_id

The unique identifier of the experiment under audit.

TYPE: str

logs

List of chronological, cryptographically linked log events.

TYPE: List[Dict[str, str]]

PARAMETER DESCRIPTION
experiment_id

The unique ID of the target experiment.

TYPE: str

db_path

Optional SQLite database path for tamper-proof persistence.

TYPE: str DEFAULT: None

METHOD DESCRIPTION
log_event

Appends a new event with an active timestamp to the audit trail log.

verify_integrity

Verifies the complete cryptographic chain of the audit trail ledger.

get_logs

Returns the full list of chronological logs in the audit ledger.

Source code in src\xpyrment\report\audit.py
def __init__(self, experiment_id: str, db_path: str = None):
    """Initializes an AuditTrail log.

    Args:
        experiment_id (str): The unique ID of the target experiment.
        db_path (str, optional): Optional SQLite database path for tamper-proof persistence.
    """
    self.experiment_id = experiment_id
    self.db_path = db_path
    self.logs: List[Dict[str, str]] = []

    if self.db_path:
        import sqlite3
        with sqlite3.connect(self.db_path) as conn:
            cursor = conn.cursor()
            cursor.execute(
                '''CREATE TABLE IF NOT EXISTS audit_logs (
                    experiment_id TEXT,
                    timestamp TEXT,
                    action TEXT,
                    details TEXT,
                    prev_hash TEXT,
                    hash TEXT,
                    signature TEXT,
                    public_key TEXT
                )'''
            )
            conn.commit()

log_event

log_event(
    action: str,
    details: str,
    signature: str = None,
    public_key: str = None,
)

Appends a new event with an active timestamp to the audit trail log.

Calculates timestamps in strict UTC, hashes the event details with the prior block's hash, and appends the entry to the ledger. Optionally attaches RSA/ECDSA digital signatures.

PARAMETER DESCRIPTION
action

The action category (e.g., "PHASE_TRANSITION", "ALLOCATION_MODIFIED").

TYPE: str

details

Detailed text or JSON payload describing the parameters or user that initiated the change.

TYPE: str

signature

Cryptographic signature of the event hash.

TYPE: str DEFAULT: None

public_key

Public key string to verify the signature.

TYPE: str DEFAULT: None

Source code in src\xpyrment\report\audit.py
def log_event(self, action: str, details: str, signature: str = None, public_key: str = None):
    """Appends a new event with an active timestamp to the audit trail log.

    Calculates timestamps in strict UTC, hashes the event details with the prior block's hash,
    and appends the entry to the ledger. Optionally attaches RSA/ECDSA digital signatures.

    Args:
        action (str): The action category (e.g., `"PHASE_TRANSITION"`, `"ALLOCATION_MODIFIED"`).
        details (str): Detailed text or JSON payload describing the parameters or user that initiated the change.
        signature (str, optional): Cryptographic signature of the event hash.
        public_key (str, optional): Public key string to verify the signature.
    """
    import hashlib

    timestamp = datetime.datetime.now(datetime.UTC).isoformat()
    prev_hash = "0" * 64 if len(self.logs) == 0 else self.logs[-1]["hash"]

    safe_signature = signature or ""
    safe_public_key = public_key or ""
    # Formulate canonical block string for SHA-256 hashing
    data_str = f"{timestamp}||{action}||{details}||{prev_hash}||{safe_signature}||{safe_public_key}"
    current_hash = hashlib.sha256(data_str.encode("utf-8")).hexdigest()

    log_entry = {
        "timestamp": timestamp,
        "action": action,
        "details": details,
        "prev_hash": prev_hash,
        "hash": current_hash,
        "signature": safe_signature,
        "public_key": safe_public_key
    }
    self.logs.append(log_entry)

    if self.db_path:
        import sqlite3
        with sqlite3.connect(self.db_path) as conn:
            cursor = conn.cursor()
            cursor.execute(
                '''INSERT INTO audit_logs (experiment_id, timestamp, action, details, prev_hash, hash, signature, public_key)
                   VALUES (?, ?, ?, ?, ?, ?, ?, ?)''',
                (self.experiment_id, timestamp, action, details, prev_hash, current_hash, log_entry["signature"], log_entry["public_key"])
            )
            conn.commit()

verify_integrity

verify_integrity() -> bool

Verifies the complete cryptographic chain of the audit trail ledger.

RETURNS DESCRIPTION
bool

True if the hash chain is fully intact and unmodified, False otherwise.

TYPE: bool

Source code in src\xpyrment\report\audit.py
def verify_integrity(self) -> bool:
    """Verifies the complete cryptographic chain of the audit trail ledger.

    Returns:
        bool: True if the hash chain is fully intact and unmodified, False otherwise.
    """
    import hashlib

    for i in range(len(self.logs)):
        block = self.logs[i]
        expected_prev = "0" * 64 if i == 0 else self.logs[i-1]["hash"]

        if block["prev_hash"] != expected_prev:
            return False

        # Recalculate block hash
        data_str = f"{block['timestamp']}||{block['action']}||{block['details']}||{block['prev_hash']}||{block.get('signature', '')}||{block.get('public_key', '')}"
        actual_hash = hashlib.sha256(data_str.encode("utf-8")).hexdigest()

        if block["hash"] != actual_hash:
            return False

    return True
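verify_integrity checks the in-memory logs list. No method for loading persisted rows appears on this page, so re-verifying what was written to SQLite is sketched below with hypothetical helpers (`row_hash`, `load_rows`, `verify_rows`). The table layout and hash string follow the source listings above; reading rows in `rowid` order is an assumption about insertion order, and the event data is invented.

```python
import hashlib
import sqlite3

GENESIS_HASH = "0" * 64


def row_hash(ts, action, details, prev_hash, signature, public_key) -> str:
    """Same field order and separator as the log_event listing."""
    data = f"{ts}||{action}||{details}||{prev_hash}||{signature}||{public_key}"
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def load_rows(conn, experiment_id: str) -> list:
    """Fetch one experiment's rows in insertion order (assumed to be rowid order)."""
    return conn.execute(
        "SELECT timestamp, action, details, prev_hash, hash, signature, public_key "
        "FROM audit_logs WHERE experiment_id = ? ORDER BY rowid",
        (experiment_id,),
    ).fetchall()


def verify_rows(rows) -> bool:
    """Re-check back-links and recompute each stored hash."""
    expected_prev = GENESIS_HASH
    for ts, action, details, prev_hash, stored_hash, signature, public_key in rows:
        if prev_hash != expected_prev:
            return False
        if row_hash(ts, action, details, prev_hash, signature, public_key) != stored_hash:
            return False
        expected_prev = stored_hash
    return True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_logs (experiment_id TEXT, timestamp TEXT, action TEXT, "
    "details TEXT, prev_hash TEXT, hash TEXT, signature TEXT, public_key TEXT)"
)

prev = GENESIS_HASH
events = [
    ("2024-01-01T00:00:00+00:00", "PHASE_TRANSITION", "draft -> running"),
    ("2024-01-02T00:00:00+00:00", "ALLOCATION_MODIFIED", "treatment 10% -> 50%"),
]
for ts, action, details in events:
    h = row_hash(ts, action, details, prev, "", "")
    conn.execute("INSERT INTO audit_logs VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                 ("exp-001", ts, action, details, prev, h, "", ""))
    prev = h

ok_before = verify_rows(load_rows(conn, "exp-001"))

# Simulate someone editing a persisted row directly in the database.
conn.execute("UPDATE audit_logs SET details = ? WHERE action = ?",
             ("treatment 10% -> 90%", "ALLOCATION_MODIFIED"))
ok_after = verify_rows(load_rows(conn, "exp-001"))
conn.close()

print(ok_before, ok_after)
```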

get_logs

get_logs() -> List[Dict[str, str]]

Returns the full list of chronological logs in the audit ledger.

RETURNS DESCRIPTION
List[Dict[str, str]]

A list of dictionary objects representing the serialized ledger blocks.

TYPE: List[Dict[str, str]]

Source code in src\xpyrment\report\audit.py
def get_logs(self) -> List[Dict[str, str]]:
    """Returns the full list of chronological logs in the audit ledger.

    Returns:
        List[Dict[str, str]]: A list of dictionary objects representing the serialized ledger blocks.
    """
    return self.logs

ExperimentReportGenerator

ExperimentReportGenerator(
    result: AnalysisResult,
    experiment_name: str = "A/B Experiment Report",
)

Generates premium standalone Markdown and HTML reports from experiment AnalysisResult instances.

PARAMETER DESCRIPTION
result

The completed analysis result object.

TYPE: AnalysisResult

experiment_name

The logical name of the experiment.

TYPE: str DEFAULT: 'A/B Experiment Report'

RAISES DESCRIPTION
ValueError

If the analysis results are empty or invalid.

METHOD DESCRIPTION
generate_markdown

Generates a complete, beautiful GitHub-compatible Markdown summary card.

generate_html

Generates a premium, self-contained interactive HTML dashboard of the results.

save_html

Saves the beautifully compiled HTML dashboard report to a local file.

save_markdown

Saves the GitHub-compatible Markdown summary report to a local file.

Source code in src\xpyrment\report\generator.py
def __init__(
    self, result: AnalysisResult, experiment_name: str = "A/B Experiment Report"
):
    """Initializes the report generator.

    Args:
        result (AnalysisResult): The completed analysis result object.
        experiment_name (str): The logical name of the experiment.

    Raises:
        ValueError: If the analysis results are empty or invalid.
    """
    if result is None or not hasattr(result, "df_raw") or result.df_raw is None:
        raise ValueError("Invalid AnalysisResult provided to report generator.")

    self.result = result
    self.df_raw = result.df_raw
    self.alpha = result.alpha
    self.balance_checker = result.balance_checker
    self.experiment_name = experiment_name

    # Compute retrospective Sample Ratio Mismatch (SRM) stats
    self.control_n = 0
    self.treatment_n = 0
    self.srm_p_value = 1.0
    self.srm_passed = True

    if len(self.df_raw) > 0:
        row = self.df_raw.iloc[0]
        self.control_n = int(row.get("control_n", 0))
        self.treatment_n = int(row.get("treatment_n", 0))
        total_n = self.control_n + self.treatment_n

        if total_n > 0:
            expected_n = total_n / 2.0
            chi_sq = ((self.control_n - expected_n) ** 2 / expected_n) + (
                (self.treatment_n - expected_n) ** 2 / expected_n
            )
            self.srm_p_value = float(1.0 - chi2.cdf(chi_sq, df=1))
            self.srm_passed = (
                self.srm_p_value >= 0.01
            )  # Standard 0.01 SRM critical alpha
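The SRM check in the constructor can be reproduced in isolation. The sketch below follows the same arithmetic, including the assumption of an intended 50/50 split, but replaces the SciPy call with `math.erfc`: for one degree of freedom the chi-square survival function equals erfc(sqrt(x / 2)). The function name `srm_p_value` and the sample counts are illustrative.

```python
import math


def srm_p_value(control_n: int, treatment_n: int) -> float:
    """Pearson chi-square goodness-of-fit p-value against an equal split."""
    total_n = control_n + treatment_n
    if total_n == 0:
        return 1.0  # nothing observed, nothing to flag
    expected_n = total_n / 2.0
    chi_sq = ((control_n - expected_n) ** 2
              + (treatment_n - expected_n) ** 2) / expected_n
    # Survival function of chi-square with df=1, without SciPy.
    return math.erfc(math.sqrt(chi_sq / 2.0))


for control_n, treatment_n in [(5000, 5000), (5000, 5050), (5000, 5300)]:
    p = srm_p_value(control_n, treatment_n)
    status = "PASSED" if p >= 0.01 else "FAILED"
    print(f"{control_n:,} vs {treatment_n:,}: p = {p:.6f} -> {status}")
```

Experiments designed with unequal allocation (for example 10/90) would need expected counts derived from the planned ratio rather than from `total_n / 2`.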

generate_markdown

generate_markdown() -> str

Generates a complete, beautiful GitHub-compatible Markdown summary card.

RETURNS DESCRIPTION
str

Markdown card representation of the experiment results.

TYPE: str

Source code in src\xpyrment\report\generator.py
def generate_markdown(self) -> str:
    """Generates a complete, beautiful GitHub-compatible Markdown summary card.

    Returns:
        str: Markdown card representation of the experiment results.
    """
    lines = []
    lines.append(f"# 📊 {self.experiment_name}")
    lines.append("")
    lines.append("## 📌 Executive Summary")
    lines.append(f"- **Nominal Significance Level (Alpha)**: `{self.alpha}`")
    lines.append(
        f"- **Total Samples**: `{self.control_n + self.treatment_n:,}` (Control: `{self.control_n:,}`, Treatment: `{self.treatment_n:,}`)"
    )

    # SRM Status
    srm_status = (
        "🟢 **PASSED**" if self.srm_passed else "🔴 **FAILED (Potential Bias!)**"
    )
    lines.append(
        f"- **Sample Ratio Mismatch (SRM)**: {srm_status} (p-value: `{self.srm_p_value:.6f}`)"
    )

    # Covariate balance
    if self.balance_checker is not None:
        imbalanced = [
            name
            for name, diag in self.balance_checker.diagnostics_.items()
            if abs(diag["smd"]) > 0.1
        ]
        if imbalanced:
            lines.append(
                f"- **Covariate Balance**: ⚠️ **IMBALANCE DETECTED** in: `{', '.join(imbalanced)}`"
            )
        else:
            lines.append(
                "- **Covariate Balance**: 🟢 **ALL COVARIATES BALANCED** (SMD <= 0.1)"
            )
    else:
        lines.append("- **Covariate Balance**: `Not Evaluated`")

    lines.append("")
    lines.append("## 📈 Metric Performance")
    lines.append(
        "| Metric | Type | Control Mean | Treatment Mean | Relative Lift | P-Value | Significance | CUPED |"
    )
    lines.append("| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: |")

    for metric in self._iter_metric_rows():
        sig = self._metric_significance(metric)
        lines.append(
            f"| **{metric.name}** | `{metric.type}` | "
            f"{metric.control_mean:.4f} | {metric.treatment_mean:.4f} | "
            f"**{metric.lift:+.2%}** | `{metric.p_value:.5f}` | "
            f"{sig['sig_badge_md']} | {'🟢 CUPED' if metric.cuped_applied else '⚪ Standard'} |"
        )

    if self.balance_checker is not None:
        lines.append("")
        lines.append("## ⚖️ Covariate Balance Love Plot")
        lines.append("```text")
        lines.append(self.result.love_plot())
        lines.append("```")

    return "\n".join(lines)

generate_html

generate_html() -> str

Generates a premium, self-contained interactive HTML dashboard of the results.

RETURNS DESCRIPTION
str

Portable HTML report page content with embedded modern CSS and layouts.

TYPE: str

Source code in src\xpyrment\report\generator.py
    def generate_html(self) -> str:
        """Generates a premium, self-contained interactive HTML dashboard of the results.

        Returns:
            str: Portable HTML report page content with embedded modern CSS and layouts.
        """
        # Load premium custom SVG logo which also serves as the favicon
        svg_logo_text = self._get_svg_logo()

        favicon_tags = []
        if svg_logo_text:
            import base64

            svg_b64 = base64.b64encode(svg_logo_text.encode("utf-8")).decode("utf-8")
            favicon_tags.append(
                f'<link rel="icon" href="data:image/svg+xml;base64,{svg_b64}" sizes="any" type="image/svg+xml">'
            )

        favicons_html = "\n    ".join(favicon_tags)

        logo_html = ""
        if svg_logo_text:
            cleaned_svg = svg_logo_text
            if cleaned_svg.startswith("<?xml"):
                end_xml_idx = cleaned_svg.find("?>")
                if end_xml_idx != -1:
                    cleaned_svg = cleaned_svg[end_xml_idx + 2 :].strip()
            if cleaned_svg.startswith("<!DOCTYPE"):
                end_doc_idx = cleaned_svg.find(">")
                if end_doc_idx != -1:
                    cleaned_svg = cleaned_svg[end_doc_idx + 1 :].strip()
            logo_html = f'<div class="header-logo-container">{cleaned_svg}</div>'

        # Formulate HTML metric table rows
        table_rows = []
        for metric in self._iter_metric_rows():
            sig = self._metric_significance(metric)
            lift_fmt = self._metric_lift_presentation(metric)
            cuped_badge = self._metric_cuped_badge_html(metric)

            table_rows.append(f"""
            <tr>
                <td><strong>{metric.name}</strong></td>
                <td><span class="badge-type">{metric.type}</span></td>
                <td>{metric.control_mean:.4f}</td>
                <td>{metric.treatment_mean:.4f}</td>
                <td><span class="{lift_fmt['lift_class_html']}">{lift_fmt['lift_str']}</span></td>
                <td><code>{metric.p_value:.5f}</code></td>
                <td><span class="{sig['sig_class_html']}">{sig['sig_text_html']}</span></td>
                <td>{cuped_badge}</td>
            </tr>
            """)

        # Covariate balance diagnostics section
        cov_rows = []
        if self.balance_checker is not None:
            for c_name, diag in self.balance_checker.diagnostics_.items():
                smd = diag["smd"]
                vr = diag["variance_ratio"]
                balanced = abs(smd) <= 0.1
                balance_class = "badge-success" if balanced else "badge-danger"
                balance_text = "BALANCED" if balanced else "IMBALANCED"

                cov_rows.append(f"""
                <tr>
                    <td>{c_name}</td>
                    <td>{diag['mean_control']:.4f}</td>
                    <td>{diag['mean_treatment']:.4f}</td>
                    <td><code class="{"smd-success" if balanced else "smd-fail"}">{smd:+.4f}</code></td>
                    <td><code>{vr:.4f}</code></td>
                    <td><span class="badge {balance_class}">{balance_text}</span></td>
                </tr>
                """)
        else:
            cov_rows.append(
                '<tr><td colspan="6" class="text-center">No baseline covariates were specified.</td></tr>'
            )

        # SRM card rendering
        srm_class = "card-success-border" if self.srm_passed else "card-danger-border"
        srm_badge_text = "PASSED" if self.srm_passed else "ALERT"

        love_plot_content = (
            self.result.love_plot()
            if self.balance_checker is not None
            else "No covariate balance available."
        )

        html_template = f"""<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>{self.experiment_name}</title>
    {favicons_html}
    <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;600;700&display=swap" rel="stylesheet">
    <style>
        :root {{
            --bg-color: #0f172a;
            --panel-bg: #1e293b;
            --border-color: #334155;
            --text-primary: #f8fafc;
            --text-secondary: #94a3b8;
            --teal: #0ea5e9;
            --success: #10b981;
            --danger: #ef4444;
            --warning: #f59e0b;
            --gray: #64748b;
        }}

        * {{
            box-sizing: border-box;
            margin: 0;
            padding: 0;
        }}

        body {{
            font-family: 'Inter', sans-serif;
            background-color: var(--bg-color);
            color: var(--text-primary);
            padding: 2rem;
            line-height: 1.6;
        }}

        .container {{
            max-width: 1200px;
            margin: 0 auto;
        }}

        /* Header block */
        header {{
            margin-bottom: 2rem;
            border-bottom: 2px solid var(--border-color);
            padding-bottom: 1.5rem;
            display: flex;
            justify-content: space-between;
            align-items: center;
        }}

        header h1 {{
            font-size: 2rem;
            font-weight: 700;
            background: linear-gradient(to right, #38bdf8, #0ea5e9);
            -webkit-background-clip: text;
            -webkit-text-fill-color: transparent;
        }}

        header .meta-tag {{
            font-size: 0.875rem;
            color: var(--text-secondary);
        }}

        .header-logo-container svg {{
            width: 80px;
            height: auto;
            display: block;
        }}

        /* KPI Cards Grid */
        .grid {{
            display: grid;
            grid-template-columns: repeat(auto-fit, minmax(240px, 1fr));
            gap: 1.5rem;
            margin-bottom: 2rem;
        }}

        .card {{
            background-color: var(--panel-bg);
            border: 1px solid var(--border-color);
            border-radius: 12px;
            padding: 1.5rem;
            box-shadow: 0 4px 6px -1px rgb(0 0 0 / 0.1), 0 2px 4px -2px rgb(0 0 0 / 0.1);
            transition: transform 0.2s ease, border-color 0.2s ease;
        }}

        .card:hover {{
            transform: translateY(-2px);
            border-color: var(--teal);
        }}

        .card h3 {{
            font-size: 0.875rem;
            font-weight: 600;
            color: var(--text-secondary);
            text-transform: uppercase;
            letter-spacing: 0.05em;
            margin-bottom: 0.5rem;
        }}

        .card .value {{
            font-size: 1.75rem;
            font-weight: 700;
            color: var(--text-primary);
        }}

        .card .sub {{
            font-size: 0.75rem;
            color: var(--text-secondary);
            margin-top: 0.25rem;
        }}

        .card-success-border {{
            border-left: 5px solid var(--success);
        }}

        .card-danger-border {{
            border-left: 5px solid var(--danger);
        }}

        /* Badges */
        .badge {{
            display: inline-block;
            padding: 0.25rem 0.5rem;
            border-radius: 6px;
            font-size: 0.75rem;
            font-weight: 600;
        }}

        .badge-success {{
            background-color: rgba(16, 185, 129, 0.2);
            color: #34d399;
        }}

        .badge-danger {{
            background-color: rgba(239, 68, 68, 0.2);
            color: #f87171;
        }}

        .badge-gray {{
            background-color: rgba(100, 116, 139, 0.2);
            color: #94a3b8;
        }}

        .badge-type {{
            background-color: rgba(56, 189, 248, 0.15);
            color: #38bdf8;
            padding: 0.2rem 0.4rem;
            border-radius: 4px;
            font-size: 0.75rem;
            font-family: monospace;
        }}

        /* Table */
        .section-title {{
            font-size: 1.25rem;
            font-weight: 600;
            margin-bottom: 1rem;
            color: var(--text-primary);
        }}

        .panel {{
            background-color: var(--panel-bg);
            border: 1px solid var(--border-color);
            border-radius: 12px;
            padding: 1.5rem;
            margin-bottom: 2rem;
            overflow: hidden;
        }}

        table {{
            width: 100%;
            border-collapse: collapse;
            text-align: left;
        }}

        th, td {{
            padding: 1rem;
            border-bottom: 1px solid var(--border-color);
        }}

        th {{
            color: var(--text-secondary);
            font-size: 0.875rem;
            text-transform: uppercase;
            font-weight: 600;
            letter-spacing: 0.05em;
        }}

        tr:last-child td {{
            border-bottom: none;
        }}

        tr:hover td {{
            background-color: rgba(255, 255, 255, 0.02);
        }}

        /* Lift Colors */
        .positive-lift {{
            color: #34d399;
            font-weight: 600;
        }}

        .negative-lift {{
            color: #f87171;
            font-weight: 600;
        }}

        /* Significance Badges */
        .sig-badge {{
            background-color: rgba(16, 185, 129, 0.25);
            color: #34d399;
            padding: 0.2rem 0.5rem;
            border-radius: 4px;
            font-size: 0.75rem;
            font-weight: bold;
        }}

        .neutral-badge {{
            background-color: rgba(100, 116, 139, 0.2);
            color: #94a3b8;
            padding: 0.2rem 0.5rem;
            border-radius: 4px;
            font-size: 0.75rem;
        }}

        .smd-success {{
            color: #34d399;
            font-weight: bold;
        }}

        .smd-fail {{
            color: #fbbf24;
            font-weight: bold;
        }}

        /* Preformatted containers */
        pre {{
            background-color: #0b0f19;
            border: 1px solid var(--border-color);
            border-radius: 8px;
            padding: 1rem;
            overflow-x: auto;
            color: #38bdf8;
            font-family: monospace;
            font-size: 0.875rem;
        }}

        .row {{
            display: flex;
            gap: 1.5rem;
            flex-wrap: wrap;
        }}

        .col {{
            flex: 1;
            min-width: 300px;
        }}

        .text-center {{
            text-align: center;
        }}
    </style>
</head>
<body>
    <div class="container">
        <header>
            <div style="display: flex; align-items: center; gap: 1.25rem;">
                {logo_html}
                <div>
                    <h1>{self.experiment_name}</h1>
                    <div class="meta-tag">Generated by xpyrment on standard execution pipeline</div>
                </div>
            </div>
            <div class="meta-tag">Nominal Alpha: <strong>{self.alpha}</strong></div>
        </header>

        <!-- KPI Summary row -->
        <div class="grid">
            <div class="card">
                <h3>Total Assigned Users</h3>
                <div class="value">{self.control_n + self.treatment_n:,}</div>
                <div class="sub">Control: {self.control_n:,} | Treatment: {self.treatment_n:,}</div>
            </div>

            <div class="card {srm_class}">
                <h3>SRM Safety Shield</h3>
                <div class="value" style="color: {'#34d399' if self.srm_passed else '#f87171'}">
                    {srm_badge_text}
                </div>
                <div class="sub">Pearson Chi-Square p-value: {self.srm_p_value:.6f}</div>
            </div>

            <div class="card">
                <h3>Baseline Covariates</h3>
                <div class="value">
                    {len(self.balance_checker.diagnostics_) if self.balance_checker is not None else 0}
                </div>
                <div class="sub">
                    { "Diagnostics fully evaluated" if self.balance_checker is not None else "No registered covariates" }
                </div>
            </div>
        </div>

        <!-- Main dashboard layouts -->
        <div class="panel">
            <div class="section-title">📊 Statistical Metric Performance Summary</div>
            <table>
                <thead>
                    <tr>
                        <th>Metric Name</th>
                        <th>Type</th>
                        <th>Control Mean</th>
                        <th>Treatment Mean</th>
                        <th>Relative Lift</th>
                        <th>P-Value</th>
                        <th>Significance</th>
                        <th>CUPED</th>
                    </tr>
                </thead>
                <tbody>
                    {"".join(table_rows)}
                </tbody>
            </table>
        </div>

        <div class="row">
            <div class="col">
                <div class="panel">
                    <div class="section-title">⚖️ Baseline Covariate Balance Detail</div>
                    <table>
                        <thead>
                            <tr>
                                <th>Covariate Name</th>
                                <th>Mean (Ctrl)</th>
                                <th>Mean (Trt)</th>
                                <th>SMD</th>
                                <th>Var Ratio</th>
                                <th>Status</th>
                            </tr>
                        </thead>
                        <tbody>
                            {"".join(cov_rows)}
                        </tbody>
                    </table>
                </div>
            </div>

            <div class="col">
                <div class="panel" style="height: 100%;">
                    <div class="section-title">📉 Covariate Balance ASCII Love Plot</div>
                    <pre>{love_plot_content}</pre>
                </div>
            </div>
        </div>
    </div>
</body>
</html>
"""
        return html_template
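The template above is a Python f-string, so literal CSS braces are doubled (`{{` and `}}`) while single braces interpolate values. The sketch below illustrates that rule on a toy fragment. The `render_badge` helper and its arguments are hypothetical, and the `html.escape` call is an added precaution for user-supplied names that the listing above does not show.

```python
import html


def render_badge(experiment_name: str, alpha: float) -> str:
    """Render a tiny HTML fragment with an f-string (hypothetical helper)."""
    # Escape user-supplied text so characters like < and & cannot break the markup.
    safe_name = html.escape(experiment_name)
    # Doubled braces emit literal { and } for CSS; single braces interpolate values.
    return f"""<style>
    .meta-tag {{
        font-size: 0.75rem;
    }}
</style>
<h1>{safe_name}</h1>
<div class="meta-tag">Nominal Alpha: <strong>{alpha}</strong></div>"""


fragment = render_badge("Checkout <v2> & pricing", 0.05)
print(fragment)
```

Escaping matters mainly when experiment names come from free-text input; for names drawn from a controlled registry it is a harmless safeguard.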

save_html

save_html(filepath: str)

Saves the compiled HTML dashboard report to a local file, creating any missing parent directories first.

PARAMETER DESCRIPTION
filepath

Full target file path.

TYPE: str

Source code in src\xpyrment\report\generator.py
def save_html(self, filepath: str):
    """Saves the compiled HTML dashboard report to a local file.

    Args:
        filepath (str): Full target file path.
    """
    # Ensure parent directories exist
    os.makedirs(os.path.dirname(os.path.abspath(filepath)), exist_ok=True)
    with open(filepath, "w", encoding="utf-8") as f:
        f.write(self.generate_html())

save_markdown

save_markdown(filepath: str)

Saves the GitHub-compatible Markdown summary report to a local file.

PARAMETER DESCRIPTION
filepath

Full target file path.

TYPE: str

Source code in src\xpyrment\report\generator.py
def save_markdown(self, filepath: str):
    """Saves the GitHub-compatible Markdown summary report to a local file.

    Args:
        filepath (str): Full target file path.
    """
    # Ensure parent directories exist
    os.makedirs(os.path.dirname(os.path.abspath(filepath)), exist_ok=True)
    with open(filepath, "w", encoding="utf-8") as f:
        f.write(self.generate_markdown())
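Both save methods follow the same pattern: resolve the target to an absolute path, create its parent directories, then write UTF-8 text. A minimal standalone sketch of that pattern follows, using `pathlib` instead of `os.path`. The `save_text` helper is hypothetical and not part of the module.

```python
import os
import tempfile
from pathlib import Path


def save_text(filepath: str, content: str) -> Path:
    """Write content to filepath, creating parent directories first (hypothetical helper)."""
    target = Path(filepath).resolve()
    # Roughly equivalent to os.makedirs(os.path.dirname(os.path.abspath(filepath)), exist_ok=True)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target


# A bare filename has an empty dirname, which os.makedirs would reject;
# converting to an absolute path first avoids that edge case.
assert os.path.dirname("report.html") == ""

with tempfile.TemporaryDirectory() as tmp:
    out = save_text(os.path.join(tmp, "reports", "2024", "summary.md"), "# Summary\n")
    saved = out.read_text(encoding="utf-8")
    print(out.name, len(saved))
```

Passing `exist_ok=True` keeps repeated saves into the same directory from raising an error.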

plot_forest

plot_forest(
    df_raw: DataFrame,
    alpha: float = 0.05,
    title: str = "A/B Test Results - Relative Lift & 95% CIs",
    figsize: tuple = (10, 5),
) -> tuple

Generates a horizontal forest plot visualizing relative lift and confidence intervals.

A forest plot is the industry-standard way to review multiple metrics at once. It displays each metric's estimated treatment lift together with its confidence bounds, which makes it quick to see which metrics shifted significantly, whether each shift is positive or negative, and how much uncertainty surrounds each estimate.

Visual Elements
  • Center Dots: Represent the point estimate of the relative lift (\(\hat{\theta}\)).
  • Horizontal Bars: Represent the confidence interval (\([\theta_{\text{lower}}, \ \theta_{\text{upper}}]\)), drawn from the supplied rel_ci_lower and rel_ci_upper columns. They are \(1 - \alpha\) intervals only if they were computed at the same \(\alpha\) that is passed to this function.
  • Vertical Reference Line: A dashed red line at \(x = 0\) denotes the null hypothesis (no effect). If a metric's horizontal bar does not cross this line, the effect is statistically significant at the level used to build the interval.
  • Color Coding: Significant shifts (\(p < \alpha\)) are colored in high-contrast teal, while non-significant shifts are shaded in neutral slate-grey.
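The color-coding and labeling rules above can be sketched without any plotting. The snippet below is an illustration only: the `classify` helper and the two metric rows are made up, while the hex colors and the label format are taken from the source listing of this function.

```python
SIG_COLOR = "#009688"     # teal, as in the source listing
NONSIG_COLOR = "#78909c"  # slate-grey, as in the source listing


def classify(row: dict, alpha: float = 0.05) -> dict:
    """Return the color, text label, and CI-based check for one metric row."""
    significant = row["p_value"] < alpha
    # The interval excludes zero when both bounds lie on the same side of it.
    ci_excludes_zero = row["rel_ci_lower"] > 0 or row["rel_ci_upper"] < 0
    return {
        "metric_name": row["metric_name"],
        "color": SIG_COLOR if significant else NONSIG_COLOR,
        "label": f"{row['relative_lift']:+.2%} (p={row['p_value']:.4f})",
        "ci_excludes_zero": ci_excludes_zero,
    }


# Made-up rows for illustration; not real experiment results.
rows = [
    {"metric_name": "conversion_rate", "relative_lift": 0.042,
     "rel_ci_lower": 0.011, "rel_ci_upper": 0.073, "p_value": 0.008},
    {"metric_name": "revenue_per_user", "relative_lift": -0.013,
     "rel_ci_lower": -0.041, "rel_ci_upper": 0.015, "p_value": 0.36},
]
results = [classify(r) for r in rows]
for res in results:
    print(res)
```

Color follows the p-value while the bar follows the supplied bounds, so the two can disagree if the interval and the p-value come from different tests or different significance levels.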
PARAMETER DESCRIPTION
df_raw

A DataFrame containing the statistical summary. Must include the columns:
  • "metric_name" (str): Name of the target metric.
  • "relative_lift" (float): The point estimate of relative lift.
  • "rel_ci_lower" (float): The lower bound of the relative confidence interval.
  • "rel_ci_upper" (float): The upper bound of the relative confidence interval.
  • "p_value" (float): The calculated p-value of the hypothesis test.

TYPE: DataFrame

alpha

Nominal significance level used to color-code significance. Defaults to 0.05.

TYPE: float DEFAULT: 0.05

title

Title of the rendered plot. Defaults to "A/B Test Results - Relative Lift & 95% CIs".

TYPE: str DEFAULT: 'A/B Test Results - Relative Lift & 95% CIs'

figsize

Dimensions of the figure canvas. Defaults to (10, 5).

TYPE: tuple DEFAULT: (10, 5)

RETURNS DESCRIPTION
tuple

A tuple (fig, ax) containing:
  • fig (matplotlib.figure.Figure): The active matplotlib Figure canvas.
  • ax (matplotlib.axes.Axes): The axes container housing the rendered forest plot.

TYPE: tuple
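Because the function reads five named columns from df_raw, it can help to check the input before plotting. The sketch below works on plain dictionaries so that it runs without pandas. The `missing_columns` helper and the sample records are hypothetical, and a list of records like this could then be wrapped with `pandas.DataFrame(records)`.

```python
REQUIRED_COLUMNS = ("metric_name", "relative_lift", "rel_ci_lower", "rel_ci_upper", "p_value")


def missing_columns(records: list) -> list:
    """Return the required column names that are absent from at least one record."""
    return [col for col in REQUIRED_COLUMNS if any(col not in rec for rec in records)]


# Made-up records for illustration; not real experiment results.
records = [
    {"metric_name": "revenue_per_user", "relative_lift": -0.013,
     "rel_ci_lower": -0.041, "rel_ci_upper": 0.015, "p_value": 0.36},
    {"metric_name": "conversion_rate", "relative_lift": 0.042,
     "rel_ci_lower": 0.011, "rel_ci_upper": 0.073, "p_value": 0.008},
]

missing = missing_columns(records)
# Mirrors sort_values(by="metric_name"): rows are ordered alphabetically by metric name.
ordered = [rec["metric_name"] for rec in sorted(records, key=lambda r: r["metric_name"])]
incomplete = missing_columns([{"metric_name": "latency_ms", "relative_lift": 0.01}])
print(missing, ordered, incomplete)
```

Relative quantities are expected as fractions (0.042 for 4.2%), since the x-axis is formatted with `PercentFormatter(1.0)` in the source listing.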

Source code in src\xpyrment\report\export.py
def plot_forest(
    df_raw: pd.DataFrame,
    alpha: float = 0.05,
    title: str = "A/B Test Results - Relative Lift & 95% CIs",
    figsize: tuple = (10, 5),
) -> tuple:
    r"""Generates a horizontal forest plot visualizing relative lift and confidence intervals.

    A Forest Plot is the industry standard for reviewing multiple metrics simultaneously. It displays
    each metric's estimated treatment lift along with its surrounding confidence bounds. This allows rapid,
    visual identification of which metrics experienced significant shifts, whether the shifts are positive
    or negative, and how much uncertainty surrounds each estimate.

    Visual Elements:
        - **Center Dots**: Represent the point estimate of the relative lift ($\hat{\theta}$).
        - **Horizontal Bars**: Represent the confidence interval ($[\theta_{\text{lower}}, \ \theta_{\text{upper}}]$),
          drawn from the supplied bounds. They are $1 - \alpha$ intervals only if computed at the same $\alpha$.
        - **Vertical Reference Line**: Placed at $x = 0$ (represented as a dashed red line) to denote the Null Hypothesis
          (no effect). If a metric's horizontal bar does not cross this dashed line, the effect is statistically significant.
        - **Color Coding**: Significant shifts ($p < \alpha$) are colored in high-contrast teal, while non-significant
          shifts are shaded in neutral slate-grey.

    Args:
        df_raw (pd.DataFrame): A DataFrame containing the statistical summary. Must include the columns:
            - `"metric_name"` (str): Name of the target metric.
            - `"relative_lift"` (float): The point estimate of relative lift.
            - `"rel_ci_lower"` (float): The lower bound of the relative confidence interval.
            - `"rel_ci_upper"` (float): The upper bound of the relative confidence interval.
            - `"p_value"` (float): The calculated p-value of the hypothesis test.
        alpha (float): Nominal significance level used to color-code significance. Defaults to 0.05.
        title (str): Title of the rendered plot. Defaults to `"A/B Test Results - Relative Lift & 95% CIs"`.
        figsize (tuple): Dimensions of the figure canvas. Defaults to `(10, 5)`.

    Returns:
        tuple: A tuple `(fig, ax)` containing:
            - `fig` (matplotlib.figure.Figure): The active matplotlib Figure canvas.
            - `ax` (matplotlib.axes.Axes): The axes container housing the rendered forest plot.
    """
    df = df_raw.copy().sort_values(by="metric_name")

    sns.set_theme(style="whitegrid")
    fig, ax = plt.subplots(figsize=figsize)

    ax.axvline(0, color="#d32f2f", linestyle="--", linewidth=1.5, label="No Effect")

    y_positions = np.arange(len(df))

    sig_color = "#009688"
    nonsig_color = "#78909c"

    for idx, row in enumerate(df.itertuples(index=False)):
        lift = row.relative_lift
        ci_lower = row.rel_ci_lower
        ci_upper = row.rel_ci_upper
        p_val = row.p_value

        is_significant = p_val < alpha
        color = sig_color if is_significant else nonsig_color

        ax.plot([ci_lower, ci_upper], [idx, idx], color=color, linewidth=2.5, zorder=2)

        ax.scatter(
            lift,
            idx,
            color=color,
            s=120,
            edgecolors="black",
            linewidths=1.2,
            zorder=3,
        )

        text_label = f" {lift:+.2%} (p={p_val:.4f})"
        ax.text(
            max(ci_upper, 0) + 0.005,
            idx,
            text_label,
            va="center",
            ha="left",
            fontsize=10,
            fontweight="bold" if is_significant else "normal",
            color=color,
        )

    ax.set_yticks(y_positions)
    ax.set_yticklabels(df["metric_name"], fontsize=12, fontweight="bold")
    ax.set_xlabel("Relative Lift (%)", fontsize=12, fontweight="bold")
    ax.set_title(title, fontsize=14, fontweight="bold", pad=20)

    import matplotlib.ticker as mtick

    ax.xaxis.set_major_formatter(mtick.PercentFormatter(1.0))

    sns.despine(left=True, bottom=True)
    ax.grid(True, axis="x", linestyle=":", alpha=0.6)
    ax.grid(False, axis="y")

    x_min, x_max = ax.get_xlim()
    ax.set_xlim(x_min - 0.01, x_max + 0.03)

    plt.tight_layout()
    return fig, ax

plot_power_curve

plot_power_curve(
    power_curve_data: Dict[str, ndarray],
    title: str = "A/B Test Design - Required Sample Size vs. MDE",
    figsize: tuple = (10, 6),
) -> tuple

Plots required sample size per variant across a range of Minimum Detectable Effects (MDE).

Mathematical Relationship and CUPED Savings

Because the required sample size scales with the inverse square of the MDE:

$$ N \propto \frac{1}{\delta^2} $$

small reductions in the MDE sharply increase the required sample size; halving the MDE, for example, quadruples it.

If a pre-period covariate is registered, the plot overlays a second curve showing the required sample size under CUPED variance reduction.
  • Let \(\rho\) be the correlation coefficient between the pre-period covariate and the post-period outcome.
  • The required sample size under CUPED (\(N_{\text{CUPED}}\)) is the standard requirement multiplied by \((1 - \rho^2)\):

$$ N_{\text{CUPED}} = N_{\text{standard}} \times (1 - \rho^2) $$

  • The shaded gap between the standard curve and the CUPED curve shows the sample size savings (and consequently, the timeline savings) gained from pre-period covariate adjustment.
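A short worked example makes both relationships concrete. The reference sample size, the MDE values, and the correlation below are made-up inputs, and the two helper functions are hypothetical; they only apply the proportionality and the \((1 - \rho^2)\) factor stated above.

```python
def scaled_sample_size(n_ref: float, mde_ref: float, mde_new: float) -> float:
    """Scale a reference sample size using N proportional to 1 / MDE^2."""
    return n_ref * (mde_ref / mde_new) ** 2


def cuped_sample_size(n_standard: float, rho: float) -> float:
    """Apply the (1 - rho^2) CUPED factor to a standard sample size."""
    return n_standard * (1.0 - rho ** 2)


n_at_4pct = 10_000.0  # hypothetical requirement at a 4% relative MDE
n_at_2pct = scaled_sample_size(n_at_4pct, mde_ref=0.04, mde_new=0.02)
n_cuped = cuped_sample_size(n_at_2pct, rho=0.6)
print(round(n_at_2pct), round(n_cuped))
```

With these inputs, halving the MDE from 4% to 2% quadruples the requirement, and a correlation of 0.6 then removes 36% of it.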

PARAMETER DESCRIPTION
power_curve_data

A dictionary containing:
  • "mde_relative" (np.ndarray): 1D array of hypothetical relative MDE values, expressed as fractions (e.g., 0.05 for 5%).
  • "sample_size_per_variant" (np.ndarray): Required sample size under standard Wald designs.
  • "cuped_sample_size_per_variant" (np.ndarray, optional): Required sample size under CUPED adjustments.

TYPE: Dict[str, ndarray]

title

Title of the rendered plot. Defaults to "A/B Test Design - Required Sample Size vs. MDE".

TYPE: str DEFAULT: 'A/B Test Design - Required Sample Size vs. MDE'

figsize

Dimensions of the figure canvas. Defaults to (10, 6).

TYPE: tuple DEFAULT: (10, 6)

RETURNS DESCRIPTION
tuple

A tuple (fig, ax) containing:
  • fig (matplotlib.figure.Figure): The active matplotlib Figure canvas.
  • ax (matplotlib.axes.Axes): The axes container housing the curves.

TYPE: tuple
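The function only plots precomputed values, so the caller has to supply the power_curve_data dictionary. The sketch below assembles one with the documented keys. It assumes the textbook two-sample z-test formula for a continuous metric, which may differ from how xpyrment computes these values; `build_power_curve_data` and its inputs are hypothetical, and plain lists stand in for NumPy arrays.

```python
from statistics import NormalDist


def build_power_curve_data(mean, std, mdes, rho=None, alpha=0.05, power=0.8):
    """Assemble a dict with the documented keys (textbook z-test formula, assumed)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    # n per variant = 2 * ((z_alpha + z_power) * sigma / delta)^2, with delta = mde * mean
    standard = [2 * ((z_alpha + z_power) * std / (mde * mean)) ** 2 for mde in mdes]
    data = {"mde_relative": list(mdes), "sample_size_per_variant": standard}
    if rho is not None:
        # CUPED scales the requirement by (1 - rho^2)
        data["cuped_sample_size_per_variant"] = [n * (1 - rho ** 2) for n in standard]
    return data


# Made-up baseline mean, standard deviation, and correlation for illustration.
curve = build_power_curve_data(mean=10.0, std=25.0, mdes=[0.01, 0.02, 0.05, 0.10], rho=0.5)
for mde, n, n_c in zip(curve["mde_relative"],
                       curve["sample_size_per_variant"],
                       curve["cuped_sample_size_per_variant"]):
    print(f"{mde:.0%}: standard={n:,.0f} cuped={n_c:,.0f}")
```

Before passing such a dictionary to the plotting function, each list would typically be converted with `numpy.asarray` to match the documented `np.ndarray` type.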

Source code in src\xpyrment\report\export.py
def plot_power_curve(
    power_curve_data: Dict[str, np.ndarray],
    title: str = "A/B Test Design - Required Sample Size vs. MDE",
    figsize: tuple = (10, 6),
) -> tuple:
    r"""Plots required sample size per variant across a range of Minimum Detectable Effects (MDE).

    ??? mathbox "Mathematical Relationship and CUPED Savings"

        Because the required sample size scales with the inverse square of the MDE:
        $$
        N \propto \frac{1}{\delta^2}
        $$
        small reductions in the MDE sharply increase the required sample size; halving the MDE quadruples it.

        If a pre-period covariate is registered, the plot overlays a second curve showing the required sample size
        under CUPED variance reduction.

        - Let $\rho$ be the correlation coefficient between the pre-period covariate and the post-period outcome.
        - The required sample size under CUPED ($N_{\text{CUPED}}$) is the standard requirement multiplied by $(1 - \rho^2)$:
          $$
          N_{\text{CUPED}} = N_{\text{standard}} \times (1 - \rho^2)
          $$
        - The shaded gap between the standard curve and the CUPED curve shows the **sample size savings**
          (and consequently, the timeline savings) gained from pre-period covariate adjustment.

    Args:
        power_curve_data (Dict[str, np.ndarray]): A dictionary containing:
            - `"mde_relative"` (np.ndarray): 1D array of hypothetical relative MDE values, as fractions (e.g., `0.05` for 5%).
            - `"sample_size_per_variant"` (np.ndarray): Required sample size under standard Wald designs.
            - `"cuped_sample_size_per_variant"` (np.ndarray, optional): Required sample size under CUPED adjustments.
        title (str): Title of the rendered plot. Defaults to `"A/B Test Design - Required Sample Size vs. MDE"`.
        figsize (tuple): Dimensions of the figure canvas. Defaults to `(10, 6)`.

    Returns:
        tuple: A tuple `(fig, ax)` containing:
            - `fig` (matplotlib.figure.Figure): The active matplotlib Figure canvas.
            - `ax` (matplotlib.axes.Axes): The axes container housing the curves.
    """
    sns.set_theme(style="darkgrid")
    fig, ax = plt.subplots(figsize=figsize)

    mde_pct = power_curve_data["mde_relative"]
    standard_n = power_curve_data["sample_size_per_variant"]

    ax.plot(
        mde_pct,
        standard_n,
        color="#e53935",
        linewidth=2.5,
        marker="o",
        markersize=5,
        label="Standard A/B Design",
    )

    if "cuped_sample_size_per_variant" in power_curve_data:
        cuped_n = power_curve_data["cuped_sample_size_per_variant"]
        ax.plot(
            mde_pct,
            cuped_n,
            color="#1e88e5",
            linewidth=2.5,
            marker="s",
            markersize=5,
            label="CUPED Design (Variance Reduced)",
        )

        ax.fill_between(
            mde_pct,
            cuped_n,
            standard_n,
            color="#bbdefb",
            alpha=0.3,
            label="Sample Size Savings via CUPED",
        )

    import matplotlib.ticker as mtick

    ax.xaxis.set_major_formatter(mtick.PercentFormatter(1.0))
    ax.get_yaxis().set_major_formatter(mtick.FuncFormatter(lambda x, p: f"{int(x):,}"))

    ax.set_xlabel(
        "Relative Minimum Detectable Effect (MDE)", fontsize=12, fontweight="bold"
    )
    ax.set_ylabel("Required Sample Size (Per Variant)", fontsize=12, fontweight="bold")
    ax.set_title(title, fontsize=14, fontweight="bold", pad=20)

    ax.legend(fontsize=11, frameon=True, facecolor="white")
    sns.despine()

    plt.tight_layout()
    return fig, ax