Guardrails: The HALT/WARN/PASS Framework

A decision-tree guide for using temporalcv’s validation gates to catch leakage before it corrupts your results.

Philosophy

“Run gates first, train models second.”

Validation gates are pre-flight checks for your ML pipeline. They catch leakage, insufficient train/test gaps, and implausible results before you invest time in model development that would fail in production.

The Gate Decision Tree

```mermaid
graph TD
    A[Start: New ML Pipeline] --> B{Running gates?}
    B -->|No| C[Run gate_signal_verification first]
    B -->|Yes| D{What did gates return?}

    D -->|HALT| E[STOP: Critical issue detected]
    D -->|WARN| F[Proceed with caution]
    D -->|PASS| G[Continue to model training]

    E --> H{Which gate HALTed?}
    H -->|shuffled_target| I[Features encode target position]
    H -->|temporal_boundary| J[Gap violation for h-step]
    H -->|suspicious_improvement| K[Unrealistic performance]

    I --> L[Fix: Check .shift on rolling features]
    J --> M[Fix: Set horizon parameter]
    K --> N[Fix: Investigate data pipeline]

    L --> C
    M --> C
    N --> C

    F --> O[Verify externally before deploying]
    G --> P[Proceed to WalkForwardCV]
```

Gate Reference

`gate_signal_verification` — The Definitive Leakage Detector

What it tests: Whether features encode information about target position (not just value).

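How it works:

1. Shuffle the target labels randomly, repeating this `n_shuffles` times.
2. Train your model on each shuffled copy to build a shuffled-target baseline.
3. If the model trained on the real target significantly beats that baseline (small p-value) → features may leak target position.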

```python
from temporalcv.gates import gate_signal_verification

# my_model, X, y: your estimator, feature matrix and target, defined earlier
result = gate_signal_verification(
    model=my_model,
    X=X,
    y=y,
    n_shuffles=100,  # Statistical power requires ≥100
    test_size=0.2,
)

print(f"Status: {result.status}")
print(f"p-value: {result.pvalue:.4f}")
```

Interpretation:

| Status | p-value | Meaning |
|---|---|---|
| HALT | < 0.05 | Model beats shuffled baseline → leakage detected |
| WARN | 0.05 to 0.10 | Borderline, investigate further |
| PASS | > 0.10 | No evidence of position encoding |
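The shuffled-target idea and the p-value bands above can be sketched as a plain permutation test. This is a minimal illustration under stated assumptions, not temporalcv's implementation. The helpers `abs_corr`, `shuffled_target_pvalue` and `status_from_pvalue` are hypothetical. The "model score" is replaced by the absolute correlation between one feature and the target. The p-value uses the common (hits + 1) / (n_shuffles + 1) estimator.

```python
import random
import statistics


def abs_corr(x, y):
    """Hypothetical stand-in for a model score: |Pearson correlation| (higher is better)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5


def shuffled_target_pvalue(score_fn, x, y, n_shuffles=100, seed=0):
    """Permutation p-value: how often a shuffled target scores at least as well as the real one."""
    rng = random.Random(seed)
    observed = score_fn(x, y)
    hits = 0
    for _ in range(n_shuffles):
        y_perm = list(y)
        rng.shuffle(y_perm)  # destroys any genuine link between x and y
        if score_fn(x, y_perm) >= observed:
            hits += 1
    # The +1 terms count the observed ordering as one permutation, so p is never 0.
    return (hits + 1) / (n_shuffles + 1)


def status_from_pvalue(p):
    """Assumed band edges: p < 0.05 -> HALT, 0.05 <= p <= 0.10 -> WARN, p > 0.10 -> PASS."""
    if p < 0.05:
        return "HALT"
    if p <= 0.10:
        return "WARN"
    return "PASS"


rng = random.Random(42)
y = [rng.gauss(0, 1) for _ in range(200)]
leaky = [v + rng.gauss(0, 0.1) for v in y]  # feature built from the target itself
p = shuffled_target_pvalue(abs_corr, leaky, y)
print(p, status_from_pvalue(p))  # about 0.0099 (1/101) HALT
```

The shuffle is seeded, so the result is reproducible. A real gate would score a trained model on held-out data rather than an in-sample correlation.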

Common causes of HALT:

- Rolling stats without `.shift()`
- Centered moving averages
- Lookahead in feature engineering
- Target leakage elsewhere in the pipeline
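The first cause is easy to see on a toy series. Without a shift, the rolling window at time t contains y[t] itself, so the feature already knows the value it is meant to predict. Here is a minimal pure-Python sketch. The helper `trailing_mean` is hypothetical, and its comments name the pandas idioms it imitates.

```python
def trailing_mean(values, window, shift=0):
    """Mean of `window` values ending `shift` steps before each index (None until enough history).

    shift=0 mirrors pandas' s.rolling(window).mean(): the window includes values[t] (leaky).
    shift=1 mirrors s.shift(1).rolling(window).mean(): only values[:t] are used.
    """
    out = []
    for t in range(len(values)):
        end = t - shift + 1  # exclusive end of the window
        start = end - window
        if start < 0:
            out.append(None)
        else:
            out.append(sum(values[start:end]) / window)
    return out


y = [10.0, 12.0, 11.0, 15.0, 14.0]
print(trailing_mean(y, 2))           # [None, 11.0, 11.5, 13.0, 14.5]  includes y[t]
print(trailing_mean(y, 2, shift=1))  # [None, None, 11.0, 11.5, 13.0]  past values only
```

A centered moving average is leaky for the same reason, only worse: its window also reaches past t into future values.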

`gate_temporal_boundary` — Gap Enforcement

What it tests: Whether train/test splits have sufficient separation for h-step forecasting.

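The rule: for h-step-ahead forecasting, you need gap >= h, where the gap is the number of observations excluded between the end of the training window and the start of the test window.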

```python
from temporalcv import WalkForwardCV
from temporalcv.gates import gate_temporal_boundary

# X: your feature matrix, defined earlier
result = gate_temporal_boundary(
    cv=WalkForwardCV(n_splits=5),
    horizon=5,  # 5-step ahead forecast
    X=X,
)

if result.status == "HALT":
    print(f"Gap too small: {result.actual_gap} < {result.required_gap}")
```
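To make the gap rule concrete, here is a sketch of expanding-window walk-forward splits that drop `gap` observations before each test block, followed by a gap >= horizon check. The helpers `walk_forward_splits` and `check_gap` are hypothetical. They do not show how `WalkForwardCV` or `gate_temporal_boundary` are implemented. The sketch assumes the gap is counted as the indices strictly between the last training index and the first test index.

```python
def walk_forward_splits(n_samples, n_splits, test_size, gap):
    """Yield (train_indices, test_indices); training always ends `gap` indices before the test block."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start - gap <= 0:
        raise ValueError("not enough samples for this n_splits, test_size and gap")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train_end = test_start - gap  # exclusive end of the expanding training window
        yield list(range(train_end)), list(range(test_start, test_start + test_size))


def check_gap(train_idx, test_idx, horizon):
    """Return (status, actual_gap) under the gap >= horizon rule."""
    actual_gap = test_idx[0] - train_idx[-1] - 1  # indices skipped between the two sets
    return ("PASS" if actual_gap >= horizon else "HALT"), actual_gap


horizon = 5
for train_idx, test_idx in walk_forward_splits(100, n_splits=5, test_size=10, gap=horizon):
    print(check_gap(train_idx, test_idx, horizon))  # ('PASS', 5) for every fold
```

With `gap=0`, the training window runs right up to the test block. Every fold then reports `('HALT', 0)` for a 5-step horizon.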

Interpretation:

| Status | Meaning | Action |
|---|---|---|
| HALT | Gap < horizon | Set the `horizon` parameter in the CV splitter |
| PASS | Gap >= horizon | Continue |

`gate_suspicious_improvement` — Reality Check

What it tests: Whether model improvement over baseline is unrealistically large.

The heuristic: an improvement of more than 20% over the baseline (relative reduction in error) on a first attempt is suspicious.

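```python
from temporalcv.gates import gate_suspicious_improvement

# model_mae, persistence_mae: MAE of your model and of a persistence forecast, computed earlier
result = gate_suspicious_improvement(
    model_metric=model_mae,
    baseline_metric=persistence_mae,
    threshold=0.20,  # 20% improvement threshold
)
```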

Interpretation:

| Improvement | Status | Meaning |
|---|---|---|
| < 5% | PASS | Realistic, typical for good models |
| 5-20% | PASS | Good improvement, likely valid |
| 20-50% | WARN | Investigate: possibly valid, often leakage |
| > 50% | HALT | Almost certainly leakage or a bug |
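The improvement figure in the table is a relative error reduction against the baseline. Below is a minimal sketch, assuming a one-step persistence (last value) baseline scored with MAE. The band edges follow the table, with the assumption that exactly 20% counts as PASS and exactly 50% as WARN. All helper names are hypothetical and not part of temporalcv.

```python
def mae(actual, predicted):
    """Mean absolute error of two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def persistence_baseline_mae(series):
    """MAE of the one-step persistence forecast: predict each value with the previous one."""
    return mae(series[1:], series[:-1])


def relative_improvement(model_metric, baseline_metric):
    """Fractional error reduction vs. the baseline: 0.25 means 25% lower error."""
    return (baseline_metric - model_metric) / baseline_metric


def improvement_status(improvement):
    """Bands from the table; treating exactly 0.20 as PASS and 0.50 as WARN is an assumption."""
    if improvement > 0.50:
        return "HALT"
    if improvement > 0.20:
        return "WARN"
    return "PASS"


y = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
baseline = persistence_baseline_mae(y)             # 1.6
improvement = relative_improvement(1.2, baseline)  # about 0.25 for a model MAE of 1.2
print(improvement_status(improvement))             # WARN
```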

Running Gates in Sequence

The recommended order:

```python
from temporalcv.gates import (
    gate_signal_verification,
    gate_temporal_boundary,
    gate_suspicious_improvement,
)
from temporalcv import run_gates

# Assumes model, X, y, cv, h, model_mae and baseline_mae are already defined.

# 1. Collect gate results
gate_results = [
    gate_signal_verification(model, X, y, n_shuffles=100),
    gate_temporal_boundary(cv, horizon=h),
    gate_suspicious_improvement(model_mae, baseline_mae),
]

# 2. Aggregate into a report
report = run_gates(gate_results)

# 3. Act on the overall status
if report.status == "HALT":
    print(f"BLOCKED: {report.summary()}")
    raise ValueError("Fix leakage before proceeding")
elif report.status == "WARN":
    print(f"CAUTION: {report.summary()}")
    # Proceed, but verify externally
else:
    print("PASS: Proceeding to model training")
```
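The if/elif chain above needs only one overall status. A plausible aggregation rule, sketched here, is that the most severe individual status wins (HALT over WARN over PASS). This rule is an assumption about `run_gates`, not its documented behavior. `GateOutcome` and `aggregate_status` are hypothetical names.

```python
from dataclasses import dataclass

# Assumed severity order for aggregation: a higher number is more severe.
SEVERITY = {"PASS": 0, "WARN": 1, "HALT": 2}


@dataclass
class GateOutcome:
    """Hypothetical stand-in for a gate result: gate name plus status string."""
    name: str
    status: str


def aggregate_status(outcomes):
    """Overall status is the most severe individual status; PASS when there are no outcomes."""
    return max((o.status for o in outcomes), key=SEVERITY.__getitem__, default="PASS")


outcomes = [
    GateOutcome("signal_verification", "PASS"),
    GateOutcome("temporal_boundary", "PASS"),
    GateOutcome("suspicious_improvement", "WARN"),
]
print(aggregate_status(outcomes))  # WARN
```

Under this rule, a single HALT blocks the pipeline no matter how many other gates pass. That matches the "Run gates first" philosophy.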

When to Run Gates

| Phase | Gates to run | Why |
|---|---|---|
| Initial exploration | `shuffled_target` | Catch pipeline bugs early |
| Before CV | `temporal_boundary` | Verify the gap is correct |
| After CV | `suspicious_improvement` | Reality-check results |
| Before deployment | All gates | Final validation |

Gate False Positives

Gates can occasionally raise false alarms. Here's how to diagnose them:

`shuffled_target` False HALT

Symptoms: The gate HALTs, but you've verified that the features are correct.

Possible causes:

- A very strong genuine signal (rare but possible)
- Deterministic features that naturally correlate with position (for example, a time index or trend term)

Resolution:

1. Increase `n_shuffles` to 500 or more.
2. Check the p-value distribution across multiple runs.
3. If the p-value is consistently < 0.01, it's likely real leakage.
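Why more shuffles help: B shuffles cannot produce a p-value smaller than 1/(B + 1). This assumes the gate estimates its p-value with the common permutation-test formula below; temporalcv's exact estimator may differ.

$$
\hat{p} = \frac{1 + \#\{\, b : s_b \ge s_{\mathrm{obs}} \,\}}{1 + B},
\qquad
\hat{p} \ge \frac{1}{1 + B}
$$

Here $s_{\mathrm{obs}}$ is the score on the real target and $s_1, \dots, s_B$ are the scores on the shuffled targets. With B = 100 the floor is 1/101 ≈ 0.0099, so a p-value below 0.01 appears only when no shuffle matches the real score. With B = 500 the floor drops to 1/501 ≈ 0.0020, which makes a consistent < 0.01 result far more informative.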

`suspicious_improvement` False WARN

Symptoms: The gate WARNs, but the improvement is legitimate.

Possible causes:

- A weak baseline (not the best available one)
- A domain where large improvements are common

Resolution:

1. Verify that the baseline is reasonable (persistence for time series).
2. Check the literature for typical improvements in your domain.
3. Document your justification if you proceed.
