Diagnostics¶
Tools for understanding what drives validation results and how sensitive they are to parameter choices.
Overview¶
The diagnostics module provides two key capabilities:
Influence diagnostics - Identify which observations disproportionately affect test statistics
Sensitivity analysis - Assess how results change with different parameter choices
Influence Diagnostics¶
When a DM test shows significant difference between models, you should ask: “Is this driven by a few outliers or consistent across the sample?”
from temporalcv.diagnostics import compute_dm_influence
# Get influence scores for each observation
influence = compute_dm_influence(errors1, errors2, h=1)
# Check for high-influence observations
print(f"High-influence observations: {influence.n_high_influence_obs}")
print(f"High-influence blocks: {influence.n_high_influence_blocks}")
# Investigate specific observations
high_obs = np.where(influence.observation_high_mask)[0]
print(f"Indices of high-influence observations: {high_obs}")
Interpreting Influence Results¶
The InfluenceDiagnostic provides two views:
observation_influence: HAC-adjusted per-observation scores (for exploration)block_influence: Block jackknife scores (recommended for decisions)
Use block influence for decisions because:
It respects temporal dependence structure
It’s theoretically justified for serially correlated data
Observation-level influence can be misleading with autocorrelation
Gap Sensitivity Analysis¶
How robust are your results to the gap parameter choice?
from temporalcv.diagnostics import gap_sensitivity_analysis
result = gap_sensitivity_analysis(
model=my_model,
X=X,
y=y,
gap_range=range(0, 15),
metric="mae",
degradation_threshold=0.10,
)
print(f"Break-even gap: {result.break_even_gap}")
print(f"Sensitivity score: {result.sensitivity_score:.3f}")
# Plot sensitivity curve
import matplotlib.pyplot as plt
plt.plot(result.gap_values, result.metrics)
plt.axhline(result.baseline_metric * 1.1, color='r', linestyle='--', label='10% degradation')
plt.xlabel("Gap")
plt.ylabel("MAE")
plt.title("Gap Sensitivity Analysis")
plt.legend()
Interpreting Sensitivity Results¶
break_even_gap: First gap where performance degrades beyond thresholdsensitivity_score: Coefficient of variation (higher = more sensitive)
A model with break_even_gap=1 is concerning—performance drops immediately with any gap, suggesting possible lookahead bias.
Best Practices¶
Always run influence diagnostics when publishing results
High-influence observations should be investigated
Consider leave-one-out robustness checks
Gap sensitivity is a leakage detector
Sharp degradation at extra_gap=1 suggests temporal leakage
Gradual degradation is expected as forecasting horizon increases
Combine with validation gates
Run
gate_signal_verification()firstUse diagnostics for deeper investigation after gates pass
API Reference¶
See the function signatures below for complete documentation.