Assumptions by Module¶
This document explicitly lists the assumptions required by each temporalcv module, what happens when they’re violated, and how to validate them.
Global Assumptions¶
These assumptions apply across all modules:
Assumption |
Description |
Validation |
|---|---|---|
Temporal ordering |
Data is sorted by time, oldest first |
|
No missing values |
Arrays contain no NaN/None |
|
Consistent dtype |
Arrays are float64 for precision |
|
statistical_tests.py¶
dm_test()¶
Assumption |
Required For |
Violation Consequence |
Validation |
|---|---|---|---|
n ≥ 30 |
Asymptotic normality |
Unreliable p-values |
Raises |
h ≥ 1 |
Valid horizon |
Invalid bandwidth |
Raises |
Same-length errors |
Paired comparison |
Cannot compute |
Raises |
Stationary errors |
Asymptotic theory |
Biased variance |
Check ADF test |
Critical: For h > 1, errors are MA(h-1). HAC variance handles this, but stationarity is still assumed.
pt_test()¶
Assumption |
Required For |
Violation Consequence |
Validation |
|---|---|---|---|
n ≥ 20 |
Asymptotic normality |
Unreliable p-values |
Raises |
Independence under H₀ |
Valid null distribution |
p-values too optimistic |
Check residual ACF |
Same-length arrays |
Paired comparison |
Cannot compute |
Raises |
Warning: For h > 1, HAC correction is NOT applied to PT test. P-values may be overly optimistic. Use DM test for rigorous multi-step comparison.
compute_hac_variance()¶
Assumption |
Required For |
Violation Consequence |
Validation |
|---|---|---|---|
Weak stationarity |
Consistent estimation |
Biased variance |
KPSS test |
Sufficient bandwidth |
Capture autocorrelation |
Underestimated variance |
bandwidth ≥ h - 1 |
conformal.py¶
SplitConformalPredictor¶
Assumption |
Required For |
Violation Consequence |
Validation |
|---|---|---|---|
Exchangeability |
Coverage guarantee |
Coverage < (1-α) |
Shuffle test |
Calibration ≥ 10 (enforced) |
Valid quantile |
Raises ValueError |
n_calibration check |
Calibration ≥ 30-50 (recommended) |
Stable quantile |
High variance |
Sample size check |
i.i.d. residuals |
Valid quantile |
Systematic under/over-coverage |
Residual diagnostics |
Critical: Time series data is NOT exchangeable. Split conformal provides approximate, not exact, coverage for temporal data.
Mitigation: Use AdaptiveConformalPredictor for distribution shift, or accept approximate coverage.
AdaptiveConformalPredictor¶
Assumption |
Required For |
Violation Consequence |
Validation |
|---|---|---|---|
Gradual shift |
Adaptation can track |
Lag in coverage |
Monitor coverage rate |
γ well-tuned |
Convergence |
Oscillation or slow adaptation |
Grid search |
cv.py¶
WalkForwardCV¶
Assumption |
Required For |
Violation Consequence |
Validation |
|---|---|---|---|
gap ≥ horizon |
Prevent lookahead |
Information leakage |
Asserted in split |
Sufficient data |
Minimum splits |
Too few splits for inference |
n > window_size + n_splits × test_size |
Temporal ordering |
Valid train/test split |
Training on future |
Check timestamps |
Critical: If computing CHANGE targets (y[t+h] - y[t]), ensure target is computed BEFORE train/test split to avoid leakage through target construction.
persistence.py¶
compute_move_threshold()¶
Assumption |
Required For |
Violation Consequence |
Validation |
|---|---|---|---|
Training data only |
Prevent regime leakage |
Optimistic threshold |
Strict data partitioning |
n ≥ 30 |
Stable percentile |
High variance |
Sample size check |
Critical [BUG-003]: Computing threshold from full data (train + test) leaks future regime information. Always use training data only.
compute_move_conditional_metrics()¶
Assumption |
Required For |
Violation Consequence |
Validation |
|---|---|---|---|
n_up ≥ 10, n_down ≥ 10 |
Reliable estimates |
High variance |
Check |
Threshold from training |
No leakage |
Optimistic MC-SS |
Pass explicit threshold |
Same-length arrays |
Paired comparison |
Cannot compute |
Raises |
regimes.py¶
classify_volatility_regime()¶
Assumption |
Required For |
Violation Consequence |
Validation |
|---|---|---|---|
basis=’changes’ |
Correct methodology |
Mislabeled regimes |
Use default |
window < n |
Valid rolling |
All MED labels |
Check data length |
Sufficient variance |
Meaningful thresholds |
Degenerate classification |
Check std > 1e-8 |
Critical [BUG-005]: Using basis='levels' mislabels steady drifts as “high volatility”. Always use basis='changes' (default).
classify_direction_regime()¶
Assumption |
Required For |
Violation Consequence |
Validation |
|---|---|---|---|
Threshold from training |
No leakage |
Biased regime labels |
Strict partitioning |
threshold ≥ 0 |
Valid classification |
Raises ValueError |
Asserted |
gates.py¶
gate_signal_verification()¶
Assumption |
Required For |
Violation Consequence |
Validation |
|---|---|---|---|
Temporal relationship |
Shuffling destroys signal |
False HALT |
Not applicable |
Model is fit per call |
Fresh fit on shuffled |
Stale model state |
Provided model |
n_shuffles ≥ 3 |
Stable average |
High variance |
Default is 5 |
Interpretation: If model beats shuffled target, the features contain temporal structure correlated with target ordering — likely leakage.
gate_synthetic_ar1()¶
Assumption |
Required For |
Violation Consequence |
Validation |
|---|---|---|---|
phi < 1 |
Stationarity |
Explosive process |
Validated range |
n_samples ≥ 200 |
Stable estimates |
High variance |
Default is 500 |
Validation Checklist¶
Before using temporalcv, verify:
Data Quality¶
Data is sorted by time (oldest first)
No missing values
Sufficient sample size (n ≥ 50 for most applications)
Train/Test Split¶
Gap ≥ forecast horizon
Thresholds computed from training only
No feature engineering on full data
Statistical Tests¶
n ≥ 30 for DM test
n ≥ 20 for PT test
Residuals approximately stationary
Conformal Prediction¶
Understand that coverage is approximate for time series
Calibration set is large enough (≥ 30)
Consider adaptive conformal for distribution shift
Regime Classification¶
Using basis=’changes’ for volatility (not levels)
Thresholds from training data only
Check sample counts per regime
When Assumptions Fail¶
Assumption |
Fallback |
|---|---|
n too small |
Use bootstrap or qualitative comparison |
Non-stationarity |
Apply differencing or detrending first |
Exchangeability violated |
Accept approximate conformal coverage |
Too few samples per regime |
Mask low-n regimes or aggregate |
Temporal order unknown |
Cannot use temporal validation — stop |
Knowledge Tiers for Assumptions¶
Assumption |
Tier |
Justification |
|---|---|---|
n ≥ 30 for DM |
T2 |
Standard practice, some debate |
Gap ≥ horizon |
T1 |
Theoretically required |
Training-only thresholds |
T2 |
BUG-003 fix in v2 |
basis=’changes’ |
T2 |
BUG-005 fix in v2 |
n_up, n_down ≥ 10 |
T3 |
Rule of thumb |
Exchangeability |
T1 |
Required for exact conformal |