API Reference: Walk-Forward Cross-Validation¶
sklearn-compatible temporal cross-validation with gap enforcement for h-step forecasting.
When to Use¶
graph TD
A[Time Series CV?] --> B{Data characteristics}
B -->|Recent data more relevant| C[sliding window]
B -->|More data always helps| D[expanding window]
B -->|Financial with overlap| E[PurgedKFold]
C --> F{Multi-step forecast?}
D --> F
F -->|Yes, h > 1| G[Set horizon=h]
F -->|No, h=1| H[horizon=1 or None]
G --> I{Extra safety?}
I -->|Yes| J[extra_gap > 0]
I -->|No| K[extra_gap=0]
Common Mistakes¶
No gap for h-step forecasting
For h=5 forecast:
WalkForwardCV(horizon=5)requiredWithout gap, target leaks into training features
Using KFold on time series
Random splits destroy temporal order → 47%+ fake improvement
Always use
WalkForwardCVorTimeSeriesSplit
Sliding window too small
Window must be larger than model’s memory
Rule of thumb:
window_size >= 5 * n_features
See Also: Walk-Forward Tutorial, Example 07, Example 20
Data Classes¶
SplitInfo¶
Metadata for a single CV split.
@dataclass
class SplitInfo:
split_idx: int # Zero-based split index
train_start: int # First training index (inclusive)
train_end: int # Last training index (inclusive)
test_start: int # First test index (inclusive)
test_end: int # Last test index (inclusive)
Properties:
Property |
Type |
Description |
|---|---|---|
|
|
Number of training samples |
|
|
Number of test samples |
|
|
Actual gap between train end and test start |
Classes¶
WalkForwardCV¶
Walk-forward cross-validation with gap enforcement.
Inherits from sklearn.model_selection.BaseCrossValidator.
class WalkForwardCV(BaseCrossValidator):
def __init__(
self,
n_splits: int = 5,
window_type: Literal["expanding", "sliding"] = "expanding",
window_size: Optional[int] = None,
horizon: Optional[int] = None,
extra_gap: int = 0,
test_size: int = 1,
)
Parameters:
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
|
|
Number of CV folds |
|
|
|
|
|
|
|
Window size (required for sliding) |
|
|
|
Forecast horizon (minimum required separation for h-step forecasts) |
|
|
|
Additional separation beyond horizon (total = horizon + extra_gap) |
|
|
|
Samples in each test fold |
Window Types:
Type |
Description |
When to Use |
|---|---|---|
|
Training window grows from start |
More data always helps |
|
Fixed-size window slides forward |
Recent data more relevant |
Methods¶
split(X, y=None, groups=None)¶
Generate train/test indices.
def split(
self,
X: ArrayLike,
y: Optional[ArrayLike] = None,
groups: Optional[ArrayLike] = None,
) -> Generator[Tuple[np.ndarray, np.ndarray], None, None]
Yields: (train_indices, test_indices) tuples
Example:
cv = WalkForwardCV(n_splits=5, window_type="sliding", window_size=100, horizon=2, extra_gap=0)
for train_idx, test_idx in cv.split(X):
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
model.fit(X_train, y_train)
preds = model.predict(X_test)
get_n_splits(X=None, y=None, groups=None)¶
Return number of splits.
def get_n_splits(
self,
X: Optional[ArrayLike] = None,
y: Optional[ArrayLike] = None,
groups: Optional[ArrayLike] = None,
) -> int
Parameters:
If
Xprovided: returns actual number of valid splitsIf
Xis None: returns configuredn_splits
get_split_info(X)¶
Return detailed metadata for all splits.
def get_split_info(self, X: ArrayLike) -> List[SplitInfo]
Returns: List of SplitInfo objects with split boundaries
Example:
cv = WalkForwardCV(n_splits=3, horizon=2, extra_gap=0)
for info in cv.get_split_info(X):
print(f"Split {info.split_idx}:")
print(f" Train: [{info.train_start}, {info.train_end}]")
print(f" Test: [{info.test_start}, {info.test_end}]")
print(f" Gap: {info.gap}") # Total gap = horizon + extra_gap
sklearn Integration¶
Works with cross_val_score and GridSearchCV:
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.linear_model import Ridge
from temporalcv import WalkForwardCV
cv = WalkForwardCV(n_splits=5, window_type="expanding", horizon=2, extra_gap=0)
# cross_val_score
scores = cross_val_score(
Ridge(), X, y, cv=cv,
scoring="neg_mean_absolute_error"
)
# GridSearchCV
param_grid = {"alpha": [0.1, 1.0, 10.0]}
search = GridSearchCV(Ridge(), param_grid, cv=cv)
search.fit(X, y)
Gap Enforcement¶
For h-step ahead forecasts, set horizon=h:
# For 2-step forecasts: horizon=2
cv = WalkForwardCV(
n_splits=5,
window_type="sliding",
window_size=100,
horizon=2, # Minimum required separation for 2-step forecasts
extra_gap=0, # Optional: additional safety margin (default: 0)
test_size=1
)
for train_idx, test_idx in cv.split(X):
# Guaranteed: train_idx[-1] + total_gap < test_idx[0]
# where total_gap = horizon + extra_gap
total_gap = (cv.horizon or 0) + cv.extra_gap
assert train_idx[-1] + total_gap < test_idx[0]
Gap rule [T1]: For h-step ahead forecasting, total_separation = horizon + extra_gap must be at least h.
Per Bergmeir & Benitez (2012): temporal separation must equal or exceed forecast horizon.
Why gap matters:
h=2 forecast: y[t+2] = f(y[t], y[t-1], ...)
Without gap (horizon=None, extra_gap=0):
- Train ends at t=99, test starts at t=100 (gap=0)
- Test prediction y[100] uses features from y[99], y[98], ...
- For h=1: OK (predicting one step ahead)
- For h=2: LEAKAGE (y[100] target overlaps training features)
With horizon=2, extra_gap=0:
- Train ends at t=99, test starts at t=102 (total_gap=2)
- Test prediction y[102] uses features from y[101], y[100], ...
- No overlap with training targets (safe for h=2 forecasts)
Nested Cross-Validation¶
NestedWalkForwardCV¶
Nested walk-forward CV for hyperparameter tuning with temporal integrity.
class NestedWalkForwardCV:
def __init__(
self,
estimator,
param_grid: Dict[str, List] = None, # For grid search
param_distributions: Dict[str, Any] = None, # For random search
*,
n_iter: int = None, # Required for random search
n_outer_splits: int = 3,
n_inner_splits: int = 5,
horizon: int = 1,
extra_gap: int = None, # Defaults to 0
window_type: str = "expanding",
scoring: str = "neg_mean_squared_error",
refit: bool = True,
random_state: int = None,
)
Parameters:
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
sklearn estimator |
- |
Model with fit/predict methods |
|
|
|
Parameters for grid search |
|
|
|
Distributions for random search |
|
|
|
Iterations for random search |
|
|
|
Outer folds (performance estimation) |
|
|
|
Inner folds (hyperparameter selection) |
|
|
|
Forecast horizon (minimum required separation) |
|
|
|
Additional separation beyond horizon (total = horizon + extra_gap) |
|
|
|
Refit on all data with best params |
Attributes (after fit()):
Attribute |
Type |
Description |
|---|---|---|
|
|
Best hyperparameters found |
|
estimator |
Model refitted with best_params_ |
|
|
Unbiased scores per outer fold |
|
|
Mean of outer scores |
|
|
Std of outer scores |
|
|
Fraction agreeing with best_params_ |
Example: Grid Search¶
from temporalcv import NestedWalkForwardCV
from sklearn.linear_model import Ridge
nested_cv = NestedWalkForwardCV(
estimator=Ridge(),
param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},
n_outer_splits=3,
n_inner_splits=5,
horizon=4,
scoring="neg_mean_squared_error",
)
nested_cv.fit(X, y)
print(f"Best alpha: {nested_cv.best_params_['alpha']}")
print(f"Score: {nested_cv.mean_outer_score_:.4f} ± {nested_cv.std_outer_score_:.4f}")
print(f"Stability: {nested_cv.params_stability_:.1%}")
# Predict with refitted model
predictions = nested_cv.predict(X_new)
Example: Randomized Search¶
from scipy.stats import loguniform
nested_cv = NestedWalkForwardCV(
estimator=Ridge(),
param_distributions={"alpha": loguniform(1e-3, 1e2)},
n_iter=20,
n_outer_splits=3,
random_state=42,
)
nested_cv.fit(X, y)
When to Use Nested CV vs Single CV¶
Use Case |
Recommendation |
|---|---|
Fixed hyperparameters |
Single |
Few hyperparameters, low impact |
Single CV with default params |
Many hyperparameters, high impact |
Nested CV (avoids optimistic bias) |
Publication/rigorous validation |
Nested CV (unbiased estimates) |
NestedCVResult¶
Structured result from nested CV.
@dataclass
class NestedCVResult:
best_params: Dict[str, Any] # Optimal parameters
outer_scores: np.ndarray # Per-fold scores
mean_outer_score: float # Mean score
std_outer_score: float # Std of scores
n_outer_splits: int
n_inner_splits: int
scoring: str
best_params_per_fold: List[Dict] # Per-fold selections
params_stability: float # Consistency metric
Access via nested_cv.get_result().
References¶
[T1] Time Series Cross-Validation:
Bergmeir, C. & Benítez, J.M. (2012). “On the use of cross-validation for time series predictor evaluation.” Information Sciences, 191, 192-213. DOI: 10.1016/j.ins.2011.12.028
Tashman, L.J. (2000). “Out-of-sample tests of forecasting accuracy: An analysis and review.” International Journal of Forecasting, 16(4), 437-450. DOI: 10.1016/S0169-2070(00)00065-0
[T1] Nested Cross-Validation:
Varma, S. & Simon, R. (2006). “Bias in error estimation when using cross-validation for model selection.” BMC Bioinformatics, 7(1), 91. DOI: 10.1186/1471-2105-7-91