Note
Go to the end to download the full example code.
Example 14: Energy Load Forecasting¶
Real-World Case Study: Electrical Load Prediction¶
Energy load forecasting is a classic time series problem with:
Multiple Seasonalities: Daily (24h), weekly (168h), annual patterns - Peak demand during business hours - Lower on weekends - Weather-dependent seasonal shifts
Multi-Step Horizons: Grid operators need forecasts for: - Day-ahead (24 hours) - Week-ahead (168 hours) - Hour-ahead (real-time adjustments)
High Stakes: Errors cost money - Over-forecast: Pay for unused generation capacity - Under-forecast: Emergency power purchases at premium
This example demonstrates: - Handling multiple seasonalities in features - Multi-step ahead forecasting with proper gap enforcement - Calendar feature engineering (hour, day of week, holiday proxies)
Key Concepts¶
Hourly data with daily + weekly patterns
WalkForwardCV with horizon > 1 (gap enforcement critical)
Calendar features: hour_of_day, day_of_week, is_business_hour
Multi-step MAE evaluation
======================================================================
EXAMPLE 14: ENERGY LOAD FORECASTING
======================================================================
📊 Generated hourly energy load data: 1992 hours
Date range: 2023-01-08 00:00:00 to 2023-03-31 23:00:00
Mean load: 994 MW
Peak load: 1475 MW
Min load: 462 MW
📈 Average load by hour (daily pattern):
Peak hour: 12:00 (1300 MW)
Trough hour: 0:00 (687 MW)
======================================================================
PART 2: WHY MULTI-STEP FORECASTING NEEDS GAP ENFORCEMENT
======================================================================
For day-ahead forecasting (24-hour horizon):
At time t, you need to forecast load[t+1], load[t+2], ..., load[t+24]
If your training data goes up to time t, then:
- load_lag1 at time t+1 = load[t] ← AVAILABLE at time t
- load_lag24 at time t+1 = load[t-23] ← AVAILABLE at time t
But if you train on data up to t+23 for a 24-step forecast:
- Your features at t+24 use load[t+23] ← NOT AVAILABLE at time t!
WalkForwardCV enforces the gap: train data ends at t - horizon + 1.
======================================================================
PART 3: BASELINE — 24-HOUR LAG (SAME HOUR YESTERDAY)
======================================================================
📊 24-Hour Lag Baseline (same hour yesterday):
Test MAE: 65.49 MW
This is our reference for model improvement.
======================================================================
PART 4: TRAIN MODELS WITH CALENDAR FEATURES
======================================================================
✅ Model Performance (1-hour ahead features):
--------------------------------------------------
Model MAE (MW) vs Baseline
--------------------------------------------------
24-Hour Lag 65.49 (baseline)
GradientBoosting 47.92 -26.8%
Ridge 47.09 -28.1%
--------------------------------------------------
======================================================================
PART 5: WALK-FORWARD CV FOR DAY-AHEAD (24-HOUR) FORECASTING
======================================================================
For day-ahead forecasting, we set horizon=24:
- This enforces a 24-hour gap between train end and test start
- Ensures features at test time use only available information
- Simulates real-world operational forecasting
📊 Walk-Forward CV with horizon=24 (day-ahead):
----------------------------------------------------------------------
Fold Train Hours Test Period GB MAE Ridge MAE
----------------------------------------------------------------------
1 1464 2023-03-11 to 2023-03-17 48.44 49.41
2 1632 2023-03-18 to 2023-03-24 46.25 50.15
3 1800 2023-03-25 to 2023-03-31 49.28 44.12
----------------------------------------------------------------------
Mean 47.99 47.89
======================================================================
PART 6: FEATURE IMPORTANCE
======================================================================
📊 Feature Importance (GradientBoosting):
----------------------------------------
load_lag168 0.803 ████████████████████████████████████████
load_lag1 0.088 ████
load_lag24 0.063 ███
hour 0.025 █
is_business_hour 0.007
load_std24 0.005
load_ma24 0.004
day_of_week 0.003
is_weekend 0.002
======================================================================
PART 7: VALIDATION GATES
======================================================================
📊 Gate: Suspicious Improvement Check
Best model MAE: 47.89 MW
Baseline MAE: 68.56 MW
Improvement: 30.1%
Status: GateStatus.HALT
======================================================================
PART 8: KEY TAKEAWAYS
======================================================================
1. MULTI-STEP FORECASTING REQUIRES PROPER GAPS
- horizon=24 for day-ahead forecasting
- horizon=168 for week-ahead
- Gap ensures features use only available information
2. CALENDAR FEATURES ARE CRUCIAL
- hour_of_day captures daily pattern
- day_of_week captures weekly pattern
- is_business_hour helps with demand peaks
- Consider holidays as special features
3. LAGGED FEATURES AT SEASONAL PERIODS
- load_lag24: Same hour yesterday
- load_lag168: Same hour last week
- These are often the most important features
4. 24-HOUR LAG IS A STRONG BASELINE
- Simple: predict what happened same hour yesterday
- Often hard to beat significantly
- Use as sanity check for your model
5. ROLLING FEATURES NEED .shift()
- load_ma24 = load.shift(1).rolling(24).mean()
- Without shift, you include current value → leakage
- Even small leakage compounds over 24-hour horizon
6. OPERATIONAL CONSIDERATIONS
- Real-time corrections (hour-ahead) can use more recent data
- Day-ahead scheduling needs 24-hour gap
- Week-ahead planning needs 168-hour gap
- Different horizons may need different models
======================================================================
Example 14 complete.
======================================================================
from __future__ import annotations
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge
# temporalcv imports
from temporalcv import WalkForwardCV
from temporalcv.gates import gate_suspicious_improvement
# =============================================================================
# PART 1: Generate Synthetic Energy Load Data
# =============================================================================
def generate_energy_load_data(
n_days: int = 60,
base_load: float = 1000,
daily_amplitude: float = 300,
weekly_amplitude: float = 100,
noise_std: float = 50,
seed: int = 42,
) -> pd.DataFrame:
"""
Generate synthetic hourly energy load data.
Mimics real electrical load patterns:
- Daily cycle: Peak at 2-6 PM, trough at 3-5 AM
- Weekly cycle: Lower on weekends
- Random noise: Weather, demand variation
Parameters
----------
n_days : int
Number of days to simulate.
base_load : float
Baseline load in MW.
daily_amplitude : float
Daily cycle amplitude.
weekly_amplitude : float
Weekly cycle amplitude.
noise_std : float
Standard deviation of noise.
seed : int
Random seed.
Returns
-------
pd.DataFrame
Hourly load data with calendar features.
"""
rng = np.random.default_rng(seed)
n_hours = n_days * 24
timestamps = pd.date_range("2023-01-01", periods=n_hours, freq="h")
# Extract calendar features
hour_of_day = timestamps.hour
day_of_week = timestamps.dayofweek
# Daily pattern: peaks at hour 14-18 (2-6 PM), trough at 3-5 AM
# Using sine wave shifted to peak in afternoon
daily_pattern = daily_amplitude * np.sin(2 * np.pi * (hour_of_day - 6) / 24)
# Weekly pattern: weekends are ~10% lower
is_weekend = (day_of_week >= 5).astype(float)
weekly_pattern = -weekly_amplitude * is_weekend
# Add some trend (slight growth over time)
trend = 0.5 * np.arange(n_hours) / 24 # ~0.5 MW growth per day
# Combine components
noise = rng.normal(0, noise_std, n_hours)
load = base_load + daily_pattern + weekly_pattern + trend + noise
load = np.maximum(load, 0) # Load can't be negative
# Create DataFrame
df = pd.DataFrame(
{
"load": load,
"hour": hour_of_day,
"day_of_week": day_of_week,
"is_weekend": is_weekend.astype(int),
"is_business_hour": (
(hour_of_day >= 9) & (hour_of_day <= 17) & (day_of_week < 5)
).astype(int),
},
index=timestamps,
)
# Add lagged features (strictly causal)
df["load_lag1"] = df["load"].shift(1) # Previous hour
df["load_lag24"] = df["load"].shift(24) # Same hour yesterday
df["load_lag168"] = df["load"].shift(168) # Same hour last week
# Rolling features (with shift to prevent leakage)
df["load_ma24"] = df["load"].shift(1).rolling(24).mean() # Daily rolling avg
df["load_std24"] = df["load"].shift(1).rolling(24).std()
df = df.dropna()
return df
print("=" * 70)
print("EXAMPLE 14: ENERGY LOAD FORECASTING")
print("=" * 70)
# Generate data
df = generate_energy_load_data(n_days=90, seed=42)
print(f"\n📊 Generated hourly energy load data: {len(df)} hours")
print(f" Date range: {df.index[0]} to {df.index[-1]}")
print(f" Mean load: {df['load'].mean():.0f} MW")
print(f" Peak load: {df['load'].max():.0f} MW")
print(f" Min load: {df['load'].min():.0f} MW")
# Show daily pattern
daily_avg = df.groupby("hour")["load"].mean()
print("\n📈 Average load by hour (daily pattern):")
print(f" Peak hour: {daily_avg.idxmax()}:00 ({daily_avg.max():.0f} MW)")
print(f" Trough hour: {daily_avg.idxmin()}:00 ({daily_avg.min():.0f} MW)")
# =============================================================================
# PART 2: Why Multi-Step Forecasting Needs Gap Enforcement
# =============================================================================
print("\n" + "=" * 70)
print("PART 2: WHY MULTI-STEP FORECASTING NEEDS GAP ENFORCEMENT")
print("=" * 70)
print(
"""
For day-ahead forecasting (24-hour horizon):
At time t, you need to forecast load[t+1], load[t+2], ..., load[t+24]
If your training data goes up to time t, then:
- load_lag1 at time t+1 = load[t] ← AVAILABLE at time t
- load_lag24 at time t+1 = load[t-23] ← AVAILABLE at time t
But if you train on data up to t+23 for a 24-step forecast:
- Your features at t+24 use load[t+23] ← NOT AVAILABLE at time t!
WalkForwardCV enforces the gap: train data ends at t - horizon + 1.
"""
)
# =============================================================================
# PART 3: Baseline — 24-Hour Lag (Same Hour Yesterday)
# =============================================================================
print("\n" + "=" * 70)
print("PART 3: BASELINE — 24-HOUR LAG (SAME HOUR YESTERDAY)")
print("=" * 70)
# Split data (last 14 days for test)
test_days = 14
test_size = test_days * 24
train_size = len(df) - test_size
df_train = df.iloc[:train_size]
df_test = df.iloc[train_size:]
y_test = df_test["load"].values
# Baseline: predict same hour yesterday
baseline_pred = df_test["load_lag24"].values
baseline_mae = np.mean(np.abs(y_test - baseline_pred))
print("📊 24-Hour Lag Baseline (same hour yesterday):")
print(f" Test MAE: {baseline_mae:.2f} MW")
print(" This is our reference for model improvement.")
# =============================================================================
# PART 4: Train Models with Proper Features
# =============================================================================
print("\n" + "=" * 70)
print("PART 4: TRAIN MODELS WITH CALENDAR FEATURES")
print("=" * 70)
# Features
feature_cols = [
"hour",
"day_of_week",
"is_weekend",
"is_business_hour",
"load_lag1",
"load_lag24",
"load_lag168",
"load_ma24",
"load_std24",
]
X_train = df_train[feature_cols].values
y_train = df_train["load"].values
X_test = df_test[feature_cols].values
# Train models
gb_model = GradientBoostingRegressor(
n_estimators=100, max_depth=4, learning_rate=0.1, random_state=42
)
ridge_model = Ridge(alpha=1.0)
gb_model.fit(X_train, y_train)
ridge_model.fit(X_train, y_train)
# Predictions
gb_pred = gb_model.predict(X_test)
ridge_pred = ridge_model.predict(X_test)
# MAE
gb_mae = np.mean(np.abs(y_test - gb_pred))
ridge_mae = np.mean(np.abs(y_test - ridge_pred))
print("✅ Model Performance (1-hour ahead features):")
print("-" * 50)
print(f"{'Model':<20} {'MAE (MW)':<15} {'vs Baseline':<15}")
print("-" * 50)
print(f"{'24-Hour Lag':<20} {baseline_mae:<15.2f} {'(baseline)':<15}")
print(
f"{'GradientBoosting':<20} {gb_mae:<15.2f} {(gb_mae - baseline_mae) / baseline_mae * 100:+.1f}%"
)
print(f"{'Ridge':<20} {ridge_mae:<15.2f} {(ridge_mae - baseline_mae) / baseline_mae * 100:+.1f}%")
print("-" * 50)
# =============================================================================
# PART 5: Walk-Forward CV for Day-Ahead (24-Hour) Forecasting
# =============================================================================
print("\n" + "=" * 70)
print("PART 5: WALK-FORWARD CV FOR DAY-AHEAD (24-HOUR) FORECASTING")
print("=" * 70)
print(
"""
For day-ahead forecasting, we set horizon=24:
- This enforces a 24-hour gap between train end and test start
- Ensures features at test time use only available information
- Simulates real-world operational forecasting
"""
)
# WalkForwardCV with 24-hour horizon
wfcv = WalkForwardCV(
window_type="expanding",
window_size=30 * 24, # Start with 30 days of training
horizon=24, # 24-hour ahead forecast
test_size=7 * 24, # Test on 1 week at a time
n_splits=3,
)
print("\n📊 Walk-Forward CV with horizon=24 (day-ahead):")
print("-" * 70)
print(f"{'Fold':<8} {'Train Hours':<15} {'Test Period':<25} {'GB MAE':<12} {'Ridge MAE':<12}")
print("-" * 70)
gb_maes = []
ridge_maes = []
baseline_maes = []
for fold_idx, (train_idx, test_idx) in enumerate(wfcv.split(df[feature_cols].values)):
# Get data
X_tr = df.iloc[train_idx][feature_cols].values
y_tr = df.iloc[train_idx]["load"].values
X_te = df.iloc[test_idx][feature_cols].values
y_te = df.iloc[test_idx]["load"].values
# Baseline for this fold
bl_te = df.iloc[test_idx]["load_lag24"].values
bl_mae = np.mean(np.abs(y_te - bl_te))
baseline_maes.append(bl_mae)
# Train and predict
gb_fold = GradientBoostingRegressor(
n_estimators=100, max_depth=4, learning_rate=0.1, random_state=42
)
ridge_fold = Ridge(alpha=1.0)
gb_fold.fit(X_tr, y_tr)
ridge_fold.fit(X_tr, y_tr)
gb_pred_fold = gb_fold.predict(X_te)
ridge_pred_fold = ridge_fold.predict(X_te)
# MAE
gb_mae_fold = np.mean(np.abs(y_te - gb_pred_fold))
ridge_mae_fold = np.mean(np.abs(y_te - ridge_pred_fold))
gb_maes.append(gb_mae_fold)
ridge_maes.append(ridge_mae_fold)
# Print
test_start = df.index[test_idx[0]]
test_end = df.index[test_idx[-1]]
print(
f"{fold_idx + 1:<8} {len(train_idx):<15} {str(test_start.date()) + ' to ' + str(test_end.date()):<25} {gb_mae_fold:<12.2f} {ridge_mae_fold:<12.2f}"
)
print("-" * 70)
print(f"{'Mean':<8} {'':<15} {'':<25} {np.mean(gb_maes):<12.2f} {np.mean(ridge_maes):<12.2f}")
# =============================================================================
# PART 6: Feature Importance for Energy Forecasting
# =============================================================================
print("\n" + "=" * 70)
print("PART 6: FEATURE IMPORTANCE")
print("=" * 70)
# Get feature importance from the last GB model
importance = gb_fold.feature_importances_
importance_df = pd.DataFrame({"feature": feature_cols, "importance": importance}).sort_values(
"importance", ascending=False
)
print("\n📊 Feature Importance (GradientBoosting):")
print("-" * 40)
for _, row in importance_df.iterrows():
bar = "█" * int(row["importance"] * 50)
print(f"{row['feature']:<20} {row['importance']:.3f} {bar}")
# =============================================================================
# PART 7: Validation Gates
# =============================================================================
print("\n" + "=" * 70)
print("PART 7: VALIDATION GATES")
print("=" * 70)
best_model_mae = min(np.mean(gb_maes), np.mean(ridge_maes))
avg_baseline_mae = np.mean(baseline_maes)
gate_result = gate_suspicious_improvement(
model_metric=best_model_mae,
baseline_metric=avg_baseline_mae,
threshold=0.30,
warn_threshold=0.15,
)
print("\n📊 Gate: Suspicious Improvement Check")
print(f" Best model MAE: {best_model_mae:.2f} MW")
print(f" Baseline MAE: {avg_baseline_mae:.2f} MW")
print(f" Improvement: {(avg_baseline_mae - best_model_mae) / avg_baseline_mae * 100:.1f}%")
print(f" Status: {gate_result.status}")
# =============================================================================
# PART 8: Key Takeaways
# =============================================================================
print("\n" + "=" * 70)
print("PART 8: KEY TAKEAWAYS")
print("=" * 70)
print(
"""
1. MULTI-STEP FORECASTING REQUIRES PROPER GAPS
- horizon=24 for day-ahead forecasting
- horizon=168 for week-ahead
- Gap ensures features use only available information
2. CALENDAR FEATURES ARE CRUCIAL
- hour_of_day captures daily pattern
- day_of_week captures weekly pattern
- is_business_hour helps with demand peaks
- Consider holidays as special features
3. LAGGED FEATURES AT SEASONAL PERIODS
- load_lag24: Same hour yesterday
- load_lag168: Same hour last week
- These are often the most important features
4. 24-HOUR LAG IS A STRONG BASELINE
- Simple: predict what happened same hour yesterday
- Often hard to beat significantly
- Use as sanity check for your model
5. ROLLING FEATURES NEED .shift()
- load_ma24 = load.shift(1).rolling(24).mean()
- Without shift, you include current value → leakage
- Even small leakage compounds over 24-hour horizon
6. OPERATIONAL CONSIDERATIONS
- Real-time corrections (hour-ahead) can use more recent data
- Day-ahead scheduling needs 24-hour gap
- Week-ahead planning needs 168-hour gap
- Different horizons may need different models
"""
)
print("\n" + "=" * 70)
print("Example 14 complete.")
print("=" * 70)
Total running time of the script: (0 minutes 1.223 seconds)