Multi-Horizon Forecasting Tutorial¶
How to systematically evaluate predictability across multiple forecast horizons and find your model’s effective range.
The Multi-Horizon Question¶
“How far ahead can my model usefully predict?”
Predictability decays with horizon. A model that beats persistence at h=1 may be worthless at h=20. This tutorial shows how to:
Compare performance across horizons
Find the “predictability horizon” where skill vanishes
Choose the right horizon for your use case
The compare_horizons() Function¶
temporalcv provides compare_horizons() for systematic horizon analysis:
from temporalcv import compare_horizons, WalkForwardCV
from sklearn.linear_model import Ridge
# Compare horizons 1 through 20
results = compare_horizons(
model=Ridge(),
X=X,
y=y,
horizons=range(1, 21),
cv=WalkForwardCV(n_splits=5),
metric='mae'
)
# Analyze results
for h, metrics in results.items():
skill = 1 - metrics['model_mae'] / metrics['baseline_mae']
print(f"Horizon {h:2d}: MAE={metrics['model_mae']:.4f}, Skill={skill:+.2%}")
Skill Score Interpretation¶
The skill score measures improvement over baseline:
Skill Score = 1 - (Model Error / Baseline Error)
Skill Score |
Interpretation |
|---|---|
> 0.20 |
Strong skill |
0.10 - 0.20 |
Moderate skill |
0.05 - 0.10 |
Marginal skill |
0.00 - 0.05 |
No practical skill |
< 0.00 |
Worse than baseline |
Visualizing Skill Decay¶
import matplotlib.pyplot as plt
import numpy as np
horizons = list(results.keys())
skill_scores = [
1 - results[h]['model_mae'] / results[h]['baseline_mae']
for h in horizons
]
plt.figure(figsize=(10, 6))
plt.plot(horizons, skill_scores, 'bo-', markersize=8)
plt.axhline(y=0.05, color='r', linestyle='--', label='Practical skill threshold')
plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
plt.fill_between(horizons, 0, skill_scores, alpha=0.3, where=np.array(skill_scores) > 0.05)
plt.xlabel('Forecast Horizon')
plt.ylabel('Skill Score')
plt.title('Predictability Decay with Horizon')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
Finding the Predictability Horizon¶
The predictability horizon is where skill drops below a practical threshold:
def find_predictability_horizon(results, threshold=0.05):
"""Find the horizon where skill drops below threshold."""
for h in sorted(results.keys()):
skill = 1 - results[h]['model_mae'] / results[h]['baseline_mae']
if skill < threshold:
return h - 1 if h > 1 else 0
return max(results.keys()) # Skill never drops below threshold
pred_horizon = find_predictability_horizon(results, threshold=0.05)
print(f"Predictability horizon: {pred_horizon} steps")
Multi-Step Forecasting Strategies¶
Strategy 1: Direct (Independent Models)¶
Train a separate model for each horizon:
from temporalcv import WalkForwardCV
from sklearn.linear_model import Ridge
direct_models = {}
for h in range(1, pred_horizon + 1):
# Create features and targets for horizon h
X_h, y_h = create_features_for_horizon(data, horizon=h)
# Train with proper gap
cv = WalkForwardCV(n_splits=5, horizon=h)
model = Ridge()
scores = cross_val_score(model, X_h, y_h, cv=cv, scoring='neg_mae')
direct_models[h] = model.fit(X_h, y_h) # Final fit on all data
print(f"Horizon {h}: MAE = {-scores.mean():.4f}")
Pros: Each model optimized for its horizon Cons: More models to maintain, no information sharing
Strategy 2: Recursive (Iterative Prediction)¶
Use h=1 model iteratively:
def recursive_forecast(model, initial_features, horizon):
"""Generate multi-step forecast recursively."""
predictions = []
current_features = initial_features.copy()
for step in range(horizon):
# Predict next step
pred = model.predict(current_features.reshape(1, -1))[0]
predictions.append(pred)
# Update features with prediction
current_features = np.roll(current_features, -1)
current_features[-1] = pred
return np.array(predictions)
# Use h=1 model recursively
forecasts = recursive_forecast(model_h1, X_test[0], horizon=10)
Pros: Single model, captures dynamics Cons: Error accumulation, sensitive to model bias
Strategy 3: Multi-Output (Single Model)¶
Train one model to predict all horizons simultaneously:
from sklearn.multioutput import MultiOutputRegressor
# Create multi-horizon targets
def create_multi_horizon_data(y, max_horizon):
"""Create features and multi-horizon targets."""
n = len(y) - max_horizon
X = y[:n].reshape(-1, 1) # Simplified: just lag-1
Y = np.column_stack([
y[h:n+h] for h in range(1, max_horizon + 1)
])
return X, Y
X_multi, Y_multi = create_multi_horizon_data(y, max_horizon=pred_horizon)
# Train multi-output model
model = MultiOutputRegressor(Ridge())
model.fit(X_train, Y_train)
# Predict all horizons at once
all_forecasts = model.predict(X_test) # Shape: (n_samples, n_horizons)
Pros: Captures cross-horizon correlations Cons: May underperform at specific horizons
Gap Enforcement for Multi-Horizon¶
Critical: CV gap must match your forecast horizon.
from temporalcv import WalkForwardCV
# For h=5 forecasting
cv_h5 = WalkForwardCV(
n_splits=5,
horizon=5, # Gap = 5
)
# For h=20 forecasting
cv_h20 = WalkForwardCV(
n_splits=5,
horizon=20, # Gap = 20
)
# WRONG: Single CV for all horizons
# cv_shared = WalkForwardCV(n_splits=5) # No gap!
Statistical Significance Across Horizons¶
Use the Diebold-Mariano test to check if skill is significant at each horizon:
from temporalcv import dm_test
for h in range(1, pred_horizon + 1):
model_errors = results[h]['model_errors']
baseline_errors = results[h]['baseline_errors']
dm_result = dm_test(
errors1=baseline_errors,
errors2=model_errors,
h=h # Horizon for HAC variance
)
skill = 1 - results[h]['model_mae'] / results[h]['baseline_mae']
sig = "***" if dm_result.pvalue < 0.01 else "**" if dm_result.pvalue < 0.05 else ""
print(f"Horizon {h:2d}: Skill={skill:+.2%}, p={dm_result.pvalue:.3f} {sig}")
Complete Example¶
Based on Example 09: Multi-Horizon:
import numpy as np
from temporalcv import WalkForwardCV, compare_horizons, dm_test
from sklearn.linear_model import Ridge
# Generate synthetic data with decaying predictability
np.random.seed(42)
n = 500
y = np.cumsum(np.random.randn(n) * 0.1) # Random walk
X = np.column_stack([
np.roll(y, 1), # Lag 1
np.roll(y, 2), # Lag 2
np.roll(y, 7), # Seasonal lag
])
X = X[10:] # Remove edge effects
y = y[10:]
# Compare horizons
print("=" * 50)
print("MULTI-HORIZON ANALYSIS")
print("=" * 50)
for h in [1, 5, 10, 20]:
# Proper CV with gap
cv = WalkForwardCV(n_splits=5, horizon=h)
model_errors = []
baseline_errors = []
for train_idx, test_idx in cv.split(X, y):
model = Ridge().fit(X[train_idx], y[train_idx])
pred = model.predict(X[test_idx])
model_errors.extend(np.abs(pred - y[test_idx]))
baseline_errors.extend(np.abs(y[train_idx][-1] - y[test_idx])) # Persistence
model_mae = np.mean(model_errors)
baseline_mae = np.mean(baseline_errors)
skill = 1 - model_mae / baseline_mae
# Statistical test
dm = dm_test(np.array(baseline_errors), np.array(model_errors), h=h)
print(f"\nHorizon h={h}:")
print(f" Model MAE: {model_mae:.4f}")
print(f" Baseline MAE: {baseline_mae:.4f}")
print(f" Skill Score: {skill:+.2%}")
print(f" DM p-value: {dm.pvalue:.4f}")
print(f" Significant: {'Yes' if dm.pvalue < 0.05 else 'No'}")
Key Takeaways¶
Predictability decays: Always test multiple horizons
Match gap to horizon: CV gap >= forecast horizon
Statistical significance: Use DM test at each horizon
Choose strategy: Direct for accuracy, recursive for simplicity
Document the horizon: Report the practical limit of your model
See Also¶
Example 09: Multi-Horizon — Complete code example
Lag Selection Guide — Choosing feature lags
Statistical Tests — DM test for significance
Walk-Forward CV — Gap enforcement