Tutorial: Walk-Forward Cross-Validation
Walk-forward cross-validation is temporal CV that respects time ordering and prevents leakage.
Why Walk-Forward?
Standard k-fold CV ignores time ordering, so a model can be trained on observations that come after the ones it is tested on:
# WRONG: sklearn's KFold ignores time
from sklearn.model_selection import KFold
for train_idx, test_idx in KFold(5).split(X):
    # test_idx might contain observations BEFORE train_idx!
    pass
Walk-forward CV always trains on the past and tests on the future.
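To make the contrast concrete, here is a small sketch in plain Python. The helper `contiguous_kfold` is a hypothetical stand-in that mimics unshuffled k-fold splitting (it is not sklearn's implementation). It shows that every fold except the last trains on indices that come after the test block.

```python
def contiguous_kfold(n_samples, n_splits):
    """Hypothetical stand-in for unshuffled k-fold: each fold tests one contiguous block."""
    fold_size = n_samples // n_splits
    for k in range(n_splits):
        start = k * fold_size
        # The last fold absorbs any remainder
        stop = n_samples if k == n_splits - 1 else start + fold_size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx


leaky_folds = []
for fold, (train_idx, test_idx) in enumerate(contiguous_kfold(20, 5)):
    # A fold "looks ahead" if any training index lies after the first test index
    if max(train_idx) > min(test_idx):
        leaky_folds.append(fold)

print(leaky_folds)  # [0, 1, 2, 3]: only the last fold trains purely on the past
```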
Basic Usage
from temporalcv import WalkForwardCV
import numpy as np
# Sample data
X = np.random.randn(200, 5)
y = np.random.randn(200)
# Create splitter
cv = WalkForwardCV(
    n_splits=5,
    window_type="expanding",  # Training window grows
    extra_gap=0,  # No extra gap (default - adjust for multi-step forecasts)
    test_size=1  # 1 observation per test fold
)
# Use like sklearn
for train_idx, test_idx in cv.split(X):
    print(f"Train: {train_idx[0]}-{train_idx[-1]}, Test: {test_idx[0]}-{test_idx[-1]}")
Output:
Train: 0-159, Test: 160-160
Train: 0-167, Test: 168-168
Train: 0-175, Test: 176-176
Train: 0-183, Test: 184-184
Train: 0-191, Test: 192-192
Window Types
Expanding Window
Training window grows with each split. Good when more data is always better.
cv = WalkForwardCV(
    n_splits=5,
    window_type="expanding",
    test_size=10
)
Split 1: Train [0, 150), Test [150, 160)
Split 2: Train [0, 160), Test [160, 170)
Split 3: Train [0, 170), Test [170, 180)
...
Sliding Window
Fixed-size training window. Good when recent data is more relevant.
cv = WalkForwardCV(
    n_splits=5,
    window_type="sliding",
    window_size=100,  # Required for sliding
    test_size=10
)
Split 1: Train [50, 150), Test [150, 160)
Split 2: Train [60, 160), Test [160, 170)
Split 3: Train [70, 170), Test [170, 180)
...
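The boundaries in the two listings above can be reproduced with a few lines of index arithmetic. The function below is an illustrative sketch, not temporalcv's implementation: `window_splits` is a hypothetical name, and it assumes no gap and test folds packed against the end of a 200-observation series, which is what the listed boundaries suggest.

```python
def window_splits(n_samples, n_splits, test_size, window_size=None):
    """Yield half-open (train_start, train_end, test_start, test_end) tuples.

    window_size=None gives an expanding window; an integer gives a sliding one.
    Assumes no gap and test folds packed against the end of the series.
    """
    first_test_start = n_samples - n_splits * test_size
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        test_end = test_start + test_size
        if window_size is None:
            train_start = 0  # expanding: always start from the first observation
        else:
            train_start = max(0, test_start - window_size)  # sliding: fixed length
        yield train_start, test_start, test_start, test_end


expanding = list(window_splits(200, n_splits=5, test_size=10))
sliding = list(window_splits(200, n_splits=5, test_size=10, window_size=100))

print(expanding[0])  # (0, 150, 150, 160)
print(sliding[0])    # (50, 150, 150, 160)
print(sliding[2])    # (70, 170, 170, 180)
```

The only difference between the two window types in this sketch is where the training window starts; the test folds are identical.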
Gap Enforcement
Gap enforcement is critical for multi-step forecasting. For h-step forecasts, set horizon=h:
# For 2-step ahead forecasts
cv = WalkForwardCV(
    n_splits=5,
    window_type="sliding",
    window_size=100,
    horizon=2,  # Minimum separation for 2-step forecasts
    extra_gap=0,  # Optional: additional safety margin (default: 0)
    test_size=1
)
for train_idx, test_idx in cv.split(X):
    # Guaranteed: train_idx[-1] + total_gap < test_idx[0]
    # where total_gap = horizon + extra_gap
    total_gap = (cv.horizon or 0) + cv.extra_gap
    assert train_idx[-1] + total_gap < test_idx[0]
Why Gap Matters
Without proper separation, the last training observation can leak into test features:
h=2 forecast: y[t+2] = f(y[t], y[t-1], ...)
If train ends at t=99 and test starts at t=100 (no gap):
- Test prediction uses y[99] (last training observation)
- This is fine for h=1, but for h=2 it's LEAKAGE
With horizon=2, extra_gap=0:
- Train ends at t=99, test starts at t=102 (total separation = 2)
- Test prediction for y[102] uses y[101], y[100], ... (safe!)
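The separation rule can be checked by hand. The sketch below restates the guarantee from the Gap Enforcement section (`train_idx[-1] + total_gap < test_idx[0]`) as two small helpers. `earliest_test_start` and `respects_gap` are hypothetical names, not part of temporalcv.

```python
def earliest_test_start(train_end, horizon, extra_gap=0):
    """Smallest test index satisfying train_end + total_gap < test_start."""
    total_gap = horizon + extra_gap
    return train_end + total_gap + 1


def respects_gap(train_end, test_start, horizon, extra_gap=0):
    """True when the split keeps the required separation."""
    return train_end + horizon + extra_gap < test_start


print(earliest_test_start(train_end=99, horizon=2))           # 102
print(respects_gap(train_end=99, test_start=100, horizon=2))  # False
print(respects_gap(train_end=99, test_start=102, horizon=2))  # True
```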
sklearn Compatibility
WalkForwardCV works with cross_val_score and GridSearchCV:
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Ridge
cv = WalkForwardCV(n_splits=5, window_type="expanding")
model = Ridge(alpha=1.0)
scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")
print(f"MAE: {-scores.mean():.4f} (+/- {scores.std():.4f})")
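The same pattern extends to GridSearchCV. sklearn's `cv` argument also accepts a plain iterable of (train, test) index arrays, so the sketch below builds expanding splits by hand to stay self-contained; with temporalcv installed, passing `cv=WalkForwardCV(...)` as above should take their place. The synthetic data and the alpha grid are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -0.5, 0.0, 0.25, 2.0]) + rng.normal(scale=0.1, size=200)

# Hand-built expanding splits: five test blocks of 10 at the end of the series
splits = [
    (np.arange(0, start), np.arange(start, start + 10))
    for start in range(150, 200, 10)
]

search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},
    cv=splits,  # every candidate alpha is scored on the same time-ordered folds
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```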
Split Inspection
Get detailed information about each split:
cv = WalkForwardCV(n_splits=5, window_type="sliding", window_size=100, horizon=2, extra_gap=0)
for split_info in cv.get_split_info(X):
    print(f"Split {split_info.split_idx}:")
    print(f" Train: [{split_info.train_start}, {split_info.train_end})")
    print(f" Test: [{split_info.test_start}, {split_info.test_end})")
    print(f" Train size: {split_info.train_size}")
    print(f" Gap: {split_info.gap}")
Complete Example: Walk-Forward Evaluation
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from temporalcv import WalkForwardCV
# Generate AR(1) data
np.random.seed(42)
n = 300
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.9 * y[t-1] + np.random.randn() * 0.1
# Helper: Create lag features WITHIN a fold (prevents leakage)
def create_lag_features(data, n_lags=5):
    """Create lag features from data - use only within CV folds."""
    # Column i holds the series shifted by i steps, so row t contains data[t-i]
    X = np.column_stack([np.roll(data, i) for i in range(1, n_lags + 1)])
    # np.roll wraps around (no NaN), so the first n_lags rows are invalid; drop them
    return X[n_lags:], data[n_lags:]
# Walk-forward CV
cv = WalkForwardCV(
    n_splits=10,
    window_type="sliding",
    window_size=150,
    horizon=2,  # Minimum separation for 2-step forecasts
    extra_gap=0,  # No additional safety margin
    test_size=5
)
# Evaluate - compute features INSIDE each fold to prevent leakage
results = []
model = Ridge(alpha=1.0)
n_lags = 5
for fold, (train_idx, test_idx) in enumerate(cv.split(y)):
    # Extract data for this fold
    y_train = y[train_idx]
    y_test = y[test_idx]
    # Create features INSIDE the fold (correct approach)
    X_train, y_train_clean = create_lag_features(y_train, n_lags=n_lags)
    # For test: need context from training for first n_lags predictions
    # Use last n_lags values from training as context
    y_context = np.concatenate([y_train[-n_lags:], y_test])
    X_test, _ = create_lag_features(y_context, n_lags=n_lags)
    # X_test now has one row per test observation (the n_lags context rows are dropped)
    # Fit on training
    model.fit(X_train, y_train_clean)
    # Predict on test
    preds = model.predict(X_test)
    actuals = y_test
    # Compute metrics
    mae = mean_absolute_error(actuals, preds)
    results.append({
        'fold': fold,
        'train_size': len(X_train),
        'test_size': len(X_test),
        'mae': mae
    })
# Summary
import pandas as pd
df = pd.DataFrame(results)
print(df.to_string(index=False))
print(f"\nMean MAE: {df['mae'].mean():.4f}")
print(f"Std MAE: {df['mae'].std():.4f}")
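To see what the lag helper and the context trick produce, it helps to run them on a tiny series whose values equal their indices. This sketch repeats `create_lag_features` from the example above so that it runs on its own. The toy arrays are made up, and the split is contiguous (no gap) to keep the output easy to read.

```python
import numpy as np


def create_lag_features(data, n_lags=5):
    """Same helper as in the complete example: lag-1..lag-n_lags columns."""
    X = np.column_stack([np.roll(data, i) for i in range(1, n_lags + 1)])
    return X[n_lags:], data[n_lags:]  # drop rows where np.roll wrapped around


y_train = np.arange(6.0)      # 0, 1, 2, 3, 4, 5
y_test = np.array([6.0, 7.0])
n_lags = 2

X_train, y_train_clean = create_lag_features(y_train, n_lags=n_lags)
print(X_train.tolist())        # [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0], [4.0, 3.0]]
print(y_train_clean.tolist())  # [2.0, 3.0, 4.0, 5.0]

# Prepend the last n_lags training values so every test row has full lags
y_context = np.concatenate([y_train[-n_lags:], y_test])
X_test, y_test_aligned = create_lag_features(y_context, n_lags=n_lags)
print(X_test.tolist())          # [[5.0, 4.0], [6.0, 5.0]]
print(y_test_aligned.tolist())  # [6.0, 7.0]
```

Note that the second test row uses the first test value (6.0) as its lag-1 feature, so features built this way are one-step-ahead within the test block.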
Best Practices
1. Choose Window Type Based on Data
| Scenario | Window Type | Why |
|---|---|---|
| Stationary process | Sliding | Old data equally relevant |
| Trending/seasonal | Expanding | More data improves estimates |
| Regime changes | Sliding (short) | Recent data more relevant |
| Limited data | Expanding | Maximize training size |
2. Set Appropriate Gap
# For h-step forecasting: set horizon=h
horizon = 2 # 2-step forecast
extra_gap = 0 # Minimum safe (total separation = horizon)
# Conservative: add safety margin
extra_gap = 1 # Total separation = horizon + 1 = 3
3. Use Enough Splits
Minimum: 5 splits (statistical validity)
Typical: 10-20 splits
Maximum: Limited by min_train_size and test_size
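As a rough guide to the upper limit, the number of splits that fit can be estimated from the series length. The arithmetic below is a sketch under stated assumptions: non-overlapping test folds, and a first fold that needs `min_train_size` training observations plus the gap. `max_splits` is a hypothetical helper, and temporalcv's own validation may use different rules.

```python
def max_splits(n_samples, min_train_size, test_size, total_gap=0):
    """Upper bound on n_splits with non-overlapping test folds (assumed layout)."""
    # Observations left for testing once the smallest training window and gap are reserved
    available = n_samples - min_train_size - total_gap
    return max(0, available // test_size)


print(max_splits(300, min_train_size=150, test_size=5, total_gap=2))  # 29
print(max_splits(200, min_train_size=100, test_size=10))              # 10
```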
4. Validate Split Boundaries
from temporalcv.gates import gate_temporal_boundary
for train_idx, test_idx in cv.split(X):
    result = gate_temporal_boundary(
        train_end_idx=train_idx[-1],
        test_start_idx=test_idx[0],
        horizon=2,
        extra_gap=cv.extra_gap
    )
    assert result.status.name == "PASS"
Common Pitfalls
Pitfall 1: Features Computed Before Split
Note: Pure lag features (y[t-1], y[t-2], etc.) are backward-looking and safe. The real danger is rolling statistics, centered windows, or target-derived features.
# SAFE - lag features only look backward
X_lag, y_lag = create_lag_features(y)  # Helper returns (features, aligned targets)
for train_idx, test_idx in cv.split(X_lag):
    model.fit(X_lag[train_idx], y_lag[train_idx])  # This is fine!
# DANGEROUS - rolling stats, centered windows, target encoding
# (here X is a pandas DataFrame and y a pandas Series)
X['rolling_mean'] = y.rolling(10, center=True).mean()  # Uses future!
X['target_mean'] = y.groupby(category).transform('mean')  # Uses future!
# ISSUE - test features have no context for first n_lags
for train_idx, test_idx in cv.split(y):
    X_train, y_train_clean = create_lag_features(y[train_idx])
    X_test, _ = create_lag_features(y[test_idx])  # First n_lags test rows are dropped!
    model.fit(X_train, y_train_clean)
# RIGHT - use training context for test features
for train_idx, test_idx in cv.split(y):
    y_train, y_test = y[train_idx], y[test_idx]
    X_train, y_train_clean = create_lag_features(y_train, n_lags)
    # Test features need context from end of training
    y_context = np.concatenate([y_train[-n_lags:], y_test])
    X_test, _ = create_lag_features(y_context, n_lags)
    model.fit(X_train, y_train_clean)
    preds = model.predict(X_test)
Note: The “Complete Example” above shows this pattern in detail.
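A quick way to check whether a feature looks ahead is to perturb a future value and see whether the feature at an earlier time changes. The sketch below does this in plain Python for a trailing mean and a centered mean. The helper names and the toy series are made up for illustration.

```python
from statistics import fmean


def trailing_mean(series, t, window):
    """Mean of the `window` values ending at t (past and present only)."""
    return fmean(series[max(0, t - window + 1): t + 1])


def centered_mean(series, t, window):
    """Mean of a window centered on t (includes values after t)."""
    half = window // 2
    return fmean(series[max(0, t - half): t + half + 1])


series = [float(v) for v in range(20)]
shocked = list(series)
shocked[12] = 100.0  # change a value that lies AFTER t = 10

t, window = 10, 5
trailing_changed = trailing_mean(series, t, window) != trailing_mean(shocked, t, window)
centered_changed = centered_mean(series, t, window) != centered_mean(shocked, t, window)

print(trailing_changed)  # False: the trailing mean at t=10 ignores t=12
print(centered_changed)  # True: the centered mean at t=10 has seen the future
```

A feature that fails this kind of check at a training timestamp can carry test-period information into the model.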
Pitfall 2: Insufficient Gap
# WRONG for h=3 forecasts
cv = WalkForwardCV() # No horizon set, defaults to None (no gap enforcement)
# RIGHT
cv = WalkForwardCV(horizon=3, extra_gap=0) # Total separation = 3 (minimum safe)
Pitfall 3: Too Few Test Observations
# WRONG
cv = WalkForwardCV(n_splits=50, test_size=1) # 50 single-observation tests
# BETTER
cv = WalkForwardCV(n_splits=10, test_size=5) # 10 tests with 5 obs each