Mathematical Foundations

This document provides the mathematical derivations underlying temporalcv’s statistical tests and metrics. Each section is tagged with its knowledge tier.


1. Diebold-Mariano Test [T1]

Purpose: Compare predictive accuracy of two forecasting models.

Reference: Diebold, F.X. & Mariano, R.S. (1995). Comparing predictive accuracy. Journal of Business & Economic Statistics, 13(3), 253-263.

1.1 Loss Differential

Given two forecast error series e₁,ₜ and e₂,ₜ, define the loss differential:

d_t = L(e₁,ₜ) - L(e₂,ₜ)

Where L(·) is a loss function:

  • Squared loss: L(e) = e²

  • Absolute loss: L(e) = |e|
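
As a minimal sketch, the loss differential can be computed from two error series in plain Python. The function name `loss_differential` and its `loss` argument are illustrative assumptions, not part of temporalcv's API.

```python
# Hypothetical helper: maps a loss name to L(e).
LOSSES = {
    "squared": lambda e: e * e,
    "absolute": abs,
}


def loss_differential(errors_1, errors_2, loss="squared"):
    """Return d_t = L(e1_t) - L(e2_t) for every time step t."""
    if len(errors_1) != len(errors_2):
        raise ValueError("error series must have equal length")
    if loss not in LOSSES:
        raise ValueError(f"unknown loss: {loss!r}")
    loss_fn = LOSSES[loss]
    return [loss_fn(e1) - loss_fn(e2) for e1, e2 in zip(errors_1, errors_2)]


d_squared = loss_differential([1.0, -2.0, 0.5], [0.5, -1.0, 1.0])
d_absolute = loss_differential([1.0, -2.0, 0.5], [0.5, -1.0, 1.0], loss="absolute")
```

A positive d_t means model 1 incurred the larger loss at time t.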

1.2 Null Hypothesis

H₀: E[d_t] = 0  (equal predictive accuracy)
H₁: E[d_t] ≠ 0  (different predictive accuracy)

1.3 Test Statistic

DM = d̄ / √(V̂(d̄))

Where:

  • d̄ = (1/n) Σ d_t is the sample mean of loss differentials

  • V̂(d̄) is the HAC variance estimator (see Section 2)

Under H₀, DM → N(0,1) asymptotically.
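
The statistic and a two-sided p-value under the N(0,1) limit can be sketched as follows. The function name `dm_test` is hypothetical, and the variance of the mean is assumed to be supplied by the caller (for example from a HAC estimator).

```python
import math


def dm_test(d, var_of_mean):
    """Return (DM, two-sided p-value) given loss differentials d and V̂(d̄)."""
    if var_of_mean <= 0:
        raise ValueError("variance estimate must be positive")
    d_bar = sum(d) / len(d)
    dm = d_bar / math.sqrt(var_of_mean)
    # Two-sided normal p-value: 2 * (1 - Phi(|DM|)) = erfc(|DM| / sqrt(2)).
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))
    return dm, p_value


dm, p_value = dm_test([0.75, 3.0, -0.75], var_of_mean=1.0)
```

The two-sided p-value matches the alternative H₁: E[d_t] ≠ 0 from Section 1.2.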

1.4 Harvey Adjustment [T1]

For small samples, Harvey et al. (1997) proposed:

DM_adj = DM × √((n + 1 - 2h + h(h-1)/n) / n)

This corrects for small-sample bias in the variance estimate.

Reference: Harvey, D., Leybourne, S., & Newbold, P. (1997). Testing the equality of prediction mean squared errors. International Journal of Forecasting, 13(2), 281-291.
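
A short sketch of the adjustment, assuming the unadjusted DM statistic is already available. The name `harvey_adjust` is illustrative only.

```python
import math


def harvey_adjust(dm, n, h=1):
    """Scale a DM statistic by the Harvey et al. (1997) small-sample factor."""
    factor = (n + 1 - 2 * h + h * (h - 1) / n) / n
    if factor <= 0:
        # Happens when the horizon h is too large relative to the sample size n.
        raise ValueError("adjustment factor is not positive for this n and h")
    return dm * math.sqrt(factor)


dm_adj = harvey_adjust(2.0, n=50, h=1)
```

For h = 1 the factor reduces to (n - 1)/n, so the adjusted statistic is slightly smaller in magnitude than the original.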


2. HAC Variance Estimation [T1]

Purpose: Correct for serial correlation in forecast errors for h > 1.

Reference: Newey, W.K. & West, K.D. (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55(3), 703-708.

2.1 Problem

For h-step forecasts, the errors of an optimal forecast follow at most an MA(h-1) process, so the loss differentials are autocorrelated. The naive estimator γ₀/n ignores these autocovariances and is therefore biased.

2.2 Bartlett Kernel

The Bartlett kernel weight for lag j:

w(j) = 1 - |j| / (bandwidth + 1)    if |j| ≤ bandwidth
w(j) = 0                             otherwise

2.3 HAC Variance Formula

V̂(d̄) = (1/n) × [γ₀ + 2 Σⱼ₌₁^bandwidth w(j) × γⱼ]

Where γⱼ is the sample autocovariance at lag j:

γⱼ = (1/n) Σₜ (d_t - d̄)(d_{t-j} - d̄)
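
The kernel weights, autocovariances, and variance formula of Sections 2.2 and 2.3 can be combined in a few lines. This is a sketch in plain Python; the function names are hypothetical, and the autocovariance sum is assumed to run over t = j+1, …, n.

```python
def bartlett_weight(j, bandwidth):
    """Bartlett kernel weight for lag j."""
    if abs(j) > bandwidth:
        return 0.0
    return 1.0 - abs(j) / (bandwidth + 1)


def autocovariance(d, j):
    """Sample autocovariance of d at lag j, normalised by n."""
    n = len(d)
    d_bar = sum(d) / n
    return sum((d[t] - d_bar) * (d[t - j] - d_bar) for t in range(j, n)) / n


def hac_variance_of_mean(d, bandwidth):
    """HAC estimate of the variance of the sample mean of d."""
    n = len(d)
    if not 0 <= bandwidth < n:
        raise ValueError("bandwidth must satisfy 0 <= bandwidth < n")
    total = autocovariance(d, 0)
    for j in range(1, bandwidth + 1):
        total += 2.0 * bartlett_weight(j, bandwidth) * autocovariance(d, j)
    return total / n


v_lag0 = hac_variance_of_mean([1.0, 2.0, 3.0, 4.0], bandwidth=0)
v_lag1 = hac_variance_of_mean([1.0, 2.0, 3.0, 4.0], bandwidth=1)
```

With bandwidth = 0 the estimate reduces to γ₀/n, the usual variance of the mean for uncorrelated data.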

2.4 Automatic Bandwidth Selection [T1]

Andrews (1991) rule:

bandwidth = floor(4 × (n/100)^(2/9))

For h-step forecasts, setting bandwidth = h - 1 is theoretically motivated (MA(h-1) structure).

Reference: Andrews, D.W.K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica, 59(3), 817-858.
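
Both bandwidth choices are simple to compute. The sketch below uses hypothetical function names; `rule_of_thumb_bandwidth` implements the sample-size rule above and `horizon_bandwidth` the h - 1 choice.

```python
import math


def rule_of_thumb_bandwidth(n):
    """Bandwidth from the sample-size rule floor(4 * (n / 100) ** (2 / 9))."""
    return math.floor(4 * (n / 100) ** (2 / 9))


def horizon_bandwidth(h):
    """Bandwidth h - 1 motivated by the MA(h-1) error structure."""
    if h < 1:
        raise ValueError("forecast horizon must be at least 1")
    return h - 1


bandwidths = [rule_of_thumb_bandwidth(n) for n in (50, 100, 500)]
```

The sample-size rule grows slowly with n, while the horizon rule ignores n and depends only on h.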


3. Pesaran-Timmermann Test [T1]

Purpose: Test whether directional forecasts are better than random.

Reference: Pesaran, M.H. & Timmermann, A. (1992). A simple nonparametric test of predictive performance. Journal of Business & Economic Statistics, 10(4), 461-465.

3.1 Observed Accuracy

p̂ = (number of correct directions) / n

3.2 Expected Accuracy Under Independence

Under the null hypothesis that predictions are independent of actuals:

p* = p_y × p_x + (1 - p_y) × (1 - p_x)

Where:

  • p_y = P(actual > 0) = fraction of positive actuals

  • p_x = P(prediction > 0) = fraction of positive predictions

3.3 Variance Components

V(p̂) = p* × (1 - p*) / n

V(p*) = [(2p_y - 1)² × p_x(1-p_x) + (2p_x - 1)² × p_y(1-p_y)
         + 4 × p_y × p_x × (1-p_y) × (1-p_x) / n] / n

3.4 Test Statistic

PT = (p̂ - p*) / √(V(p̂) - V(p*))

The denominator is the difference of the two variances, as in Pesaran & Timmermann (1992).

Under H₀, PT → N(0,1) asymptotically. One-sided test: reject if PT > z_α.
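
The steps of Sections 3.1 to 3.4 can be sketched in plain Python. The function name `pt_test` is hypothetical, zero values are assumed to count as non-positive, and the denominator uses the variance difference V(p̂) - V(p*) from Pesaran & Timmermann (1992).

```python
import math


def pt_test(actuals, predictions):
    """Return (PT, one-sided p-value) for 2-class directional accuracy."""
    n = len(actuals)
    if n == 0 or n != len(predictions):
        raise ValueError("inputs must be non-empty and of equal length")
    p_y = sum(a > 0 for a in actuals) / n
    p_x = sum(p > 0 for p in predictions) / n
    hits = sum((a > 0) == (p > 0) for a, p in zip(actuals, predictions))
    p_hat = hits / n
    p_star = p_y * p_x + (1 - p_y) * (1 - p_x)
    var_p_hat = p_star * (1 - p_star) / n
    var_p_star = (
        (2 * p_y - 1) ** 2 * p_x * (1 - p_x)
        + (2 * p_x - 1) ** 2 * p_y * (1 - p_y)
        + 4 * p_y * p_x * (1 - p_y) * (1 - p_x) / n
    ) / n
    denom = var_p_hat - var_p_star
    if denom <= 0:
        # Degenerate case, e.g. all actuals or all predictions share one sign.
        raise ValueError("variance difference is not positive")
    pt = (p_hat - p_star) / math.sqrt(denom)
    # One-sided upper-tail p-value: 1 - Phi(PT).
    p_value = 0.5 * math.erfc(pt / math.sqrt(2.0))
    return pt, p_value


actuals = [1, -1, 1, -1, 1, -1, 1, -1]
predictions = [1, -1, 1, -1, 1, -1, -1, 1]
pt, p_value = pt_test(actuals, predictions)
```

In this toy input, six of eight directions are correct while p* = 0.5, so the statistic is positive but the sample is too small for rejection at the 5% level.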

3.5 Three-Class Extension [T3]

Warning: The 3-class mode (UP/DOWN/FLAT) is an ad-hoc extension not published in the academic literature.

For 3 classes with marginal probabilities p_y^k and p_x^k for k ∈ {UP, DOWN, FLAT}:

p* = Σₖ p_y^k × p_x^k

The variance formulas are approximations. Use 2-class mode for rigorous testing.


4. Conformal Prediction [T1]

Purpose: Distribution-free prediction intervals with coverage guarantee.

Reference: Romano, Y., Patterson, E. & Candès, E.J. (2019). Conformalized quantile regression. NeurIPS.

4.1 Finite-Sample Coverage Guarantee

For a calibration set of size n and miscoverage rate α:

P(Y_{n+1} ∈ Ĉ(X_{n+1})) ≥ 1 - α

This holds for any data distribution (no parametric assumptions needed), provided the calibration and test points are exchangeable.

4.2 Quantile Formula

The critical step uses the ceiling function:

q = ceil((n + 1) × (1 - α)) / n

Why ceiling?

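Under exchangeability, the test score is equally likely to occupy any of the n + 1 rank positions among the calibration scores. Using the k-th smallest calibration score with k = ceil((n + 1)(1 - α)) as the threshold therefore covers the test point with probability at least k/(n + 1) ≥ 1 - α. Rounding down instead could leave coverage just below 1 - α. The level q = k/n is the corresponding empirical quantile level; if it exceeds 1, the interval is unbounded.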
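
A minimal sketch of the threshold selection, written in terms of the rank k = ceil((n + 1)(1 - α)) so that no interpolation convention is needed. The name `conformal_threshold` is hypothetical.

```python
import math


def conformal_threshold(scores, alpha):
    """Return the k-th smallest score with k = ceil((n + 1) * (1 - alpha))."""
    n = len(scores)
    if n == 0 or not 0 < alpha < 1:
        raise ValueError("need at least one score and 0 < alpha < 1")
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return math.inf
    return sorted(scores)[k - 1]


threshold_20 = conformal_threshold(list(range(20, 0, -1)), alpha=0.1)
threshold_5 = conformal_threshold([0.3, 0.1, 0.4, 0.2, 0.5], alpha=0.1)
```

With 20 scores and α = 0.1 the rank is k = 19; with only 5 scores the rank would be 6, so the sketch returns infinity.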

4.3 Nonconformity Score

For regression with residual-based scores:

s_i = |y_i - ŷ_i|

The prediction interval is:

Ĉ(x) = [ŷ(x) - q̂, ŷ(x) + q̂]

Where q̂ is the empirical quantile of the calibration scores at level q (Section 4.2).
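
The score and interval definitions translate directly into code. In this sketch the threshold `q_hat` is assumed to be computed elsewhere from the calibration scores, and both function names are illustrative.

```python
def residual_scores(y_true, y_pred):
    """Nonconformity scores s_i = |y_i - ŷ_i| on the calibration set."""
    return [abs(y - p) for y, p in zip(y_true, y_pred)]


def prediction_interval(y_hat, q_hat):
    """Symmetric interval [ŷ - q̂, ŷ + q̂] around a point forecast."""
    return (y_hat - q_hat, y_hat + q_hat)


scores = residual_scores([1.0, 2.0, 3.0], [1.5, 1.0, 3.0])
interval = prediction_interval(10.0, 2.0)
```

Because the same q̂ is used for every x, the interval width is constant across inputs.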

4.4 Adaptive Conformal Inference [T1]

For distribution shift, Gibbs & Candès (2021) proposed:

q_{t+1} = q_t - γα        if y_t ∈ Ĉ_t(x_t)  (covered)
q_{t+1} = q_t + γ(1-α)    if y_t ∉ Ĉ_t(x_t)  (not covered)

This adapts the quantile online to maintain target coverage.

Reference: Gibbs, I. & Candès, E.J. (2021). Adaptive conformal inference under distribution shift. NeurIPS.
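
The online update can be sketched as a single function plus a loop over coverage outcomes. The names `aci_update` and `run_aci`, the starting level 1 - α, and the default step size are assumptions made for illustration.

```python
def aci_update(q, covered, alpha, gamma):
    """Apply one adaptive step to the quantile level q."""
    if covered:
        return q - gamma * alpha
    return q + gamma * (1 - alpha)


def run_aci(covered_flags, alpha=0.1, gamma=0.05):
    """Return the path of quantile levels, starting from 1 - alpha."""
    q = 1 - alpha
    path = [q]
    for covered in covered_flags:
        q = aci_update(q, covered, alpha, gamma)
        path.append(q)
    return path


path = run_aci([True, True, False])
```

Each miss raises the level by γ(1 - α), which is larger than the γα decrease after a hit, so the long-run miss rate is pushed toward α. The level can drift outside [0, 1]; how to handle that case is left to the caller in this sketch.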


5. Move-Conditional Skill Score [T2]

Purpose: Measure forecasting skill on significant moves, excluding flat periods.

5.1 Motivation

For high-persistence series (ACF(1) > 0.9), the persistence baseline (predict no change) achieves trivially low overall MAE because most periods are “flat.”

Conditioning on moves isolates genuine forecasting skill.

5.2 Move Classification

Given threshold τ (typically 70th percentile of |actuals| from training):

UP:   actual > τ
DOWN: actual < -τ
FLAT: |actual| ≤ τ
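
A small sketch of the threshold and classification steps. The function names are hypothetical, and the percentile uses linear interpolation via `statistics.quantiles(..., method="inclusive")`, which is an assumption about the interpolation convention.

```python
import statistics


def move_threshold(training_actuals):
    """70th percentile of |actual| over the training set."""
    magnitudes = [abs(a) for a in training_actuals]
    # With n=10 the cut points are deciles; index 6 is the 70th percentile.
    return statistics.quantiles(magnitudes, n=10, method="inclusive")[6]


def classify_move(actual, threshold):
    """Label one observation as UP, DOWN, or FLAT."""
    if actual > threshold:
        return "UP"
    if actual < -threshold:
        return "DOWN"
    return "FLAT"


tau = move_threshold([-11, 10, -9, 8, -7, 6, -5, 4, -3, 2, -1])
labels = [classify_move(a, tau) for a in (9, -9, 8, 0)]
```

Values exactly equal to ±τ are classified as FLAT, matching the |actual| ≤ τ rule.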

5.3 MC-SS Formula

MC-SS = 1 - (model_MAE_moves / persistence_MAE_moves)

Where:

  • model_MAE_moves = MAE of model predictions on UP and DOWN periods only

  • persistence_MAE_moves = mean(|actual|) on moves (since persistence predicts 0)
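
The score can be computed as below. This is a sketch with a hypothetical function name; actuals and predictions are assumed to be expressed as changes, so the persistence forecast is 0.

```python
def mc_skill_score(actuals, predictions, threshold):
    """MC-SS = 1 - model MAE / persistence MAE, on UP and DOWN periods only."""
    moves = [(a, p) for a, p in zip(actuals, predictions) if abs(a) > threshold]
    if not moves:
        raise ValueError("no moves exceed the threshold")
    model_mae = sum(abs(a - p) for a, p in moves) / len(moves)
    # Persistence predicts 0, so its absolute error is |actual|.
    persistence_mae = sum(abs(a) for a, _ in moves) / len(moves)
    return 1.0 - model_mae / persistence_mae


score = mc_skill_score(
    actuals=[0.1, 2.0, -3.0, 0.0],
    predictions=[0.0, 1.0, -2.0, 0.5],
    threshold=1.0,
)
```

Only the second and third observations count as moves here, giving a model MAE of 1.0 against a persistence MAE of 2.5.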

5.4 Interpretation

MC-SS   Meaning
> 0     Model beats persistence on moves
= 0     Model equals persistence on moves
< 0     Model worse than persistence on moves

5.5 Threshold Selection [T2]

The 70th percentile was chosen empirically:

  • ~30% of periods are “moves” (UP or DOWN)

  • ~70% are “flat”

This provides meaningful signal while maintaining sufficient sample size.

Source: myga-forecasting-v2 Phase 11 analysis.


6. AR(1) Theoretical Bounds [T1]

Purpose: Establish the optimal forecast error for an AR(1) process.

6.1 AR(1) Process

y_t = φ × y_{t-1} + σ × ε_t

Where:

  • φ = persistence coefficient (typically 0.9 < φ < 1 for financial data)

  • σ = innovation standard deviation

  • ε_t ~ N(0, 1) i.i.d.

6.2 Optimal 1-Step Predictor

ŷ_t|t-1 = φ × y_{t-1}

The forecast error is:

e_t = y_t - ŷ_t|t-1 = σ × ε_t

6.3 Optimal MAE

Since ε_t ~ N(0, 1), the expected absolute error is:

E[|ε|] = √(2/π) ≈ 0.798

Therefore:

Optimal MAE = σ × √(2/π) ≈ 0.798 × σ
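
The bound can be checked numerically by simulating an AR(1) series and scoring the optimal predictor. The sketch below uses only the standard library; the function names, the seed, and the starting value y₀ = 0 are assumptions.

```python
import math
import random


def simulate_ar1(n, phi, sigma, seed=0):
    """Simulate y_t = phi * y_{t-1} + sigma * eps_t with y_0 = 0."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n):
        y.append(phi * y[-1] + sigma * rng.gauss(0.0, 1.0))
    return y


def optimal_one_step_mae(y, phi):
    """MAE of the predictor phi * y_{t-1} over the whole series."""
    errors = [abs(y[t] - phi * y[t - 1]) for t in range(1, len(y))]
    return sum(errors) / len(errors)


sigma = 2.0
series = simulate_ar1(n=20000, phi=0.95, sigma=sigma)
empirical_mae = optimal_one_step_mae(series, phi=0.95)
theoretical_mae = sigma * math.sqrt(2.0 / math.pi)
```

With a long series the empirical MAE should land close to σ × √(2/π), and it does not depend on φ because the optimal forecast error is σ × ε_t.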

6.4 Validation Gate Application [T2]

If a model achieves MAE significantly below σ × √(2/π) on synthetic AR(1) data, it indicates lookahead bias (the model is “seeing” future ε values).

The tolerance factor of 1.5 allows for finite-sample variation:

HALT if: model_MAE < (1/1.5) × theoretical_MAE
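
The gate condition can be sketched as a small check. The function names and the "PASS" label are hypothetical; only the HALT condition and the tolerance factor of 1.5 come from the text above.

```python
import math


def theoretical_ar1_mae(sigma):
    """Optimal 1-step MAE for an AR(1) process with Gaussian innovations."""
    return sigma * math.sqrt(2.0 / math.pi)


def lookahead_gate(model_mae, sigma, tolerance=1.5):
    """Return 'HALT' if the MAE is implausibly far below the theoretical bound."""
    threshold = theoretical_ar1_mae(sigma) / tolerance
    return "HALT" if model_mae < threshold else "PASS"


suspicious = lookahead_gate(model_mae=0.4, sigma=1.0)
plausible = lookahead_gate(model_mae=0.8, sigma=1.0)
```

For σ = 1 the threshold is about 0.53, so an MAE of 0.4 triggers the gate while 0.8 does not.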


Notation Reference

Symbol  Meaning
h       Forecast horizon (steps ahead)
n       Sample size
α       Significance level or miscoverage rate
φ       AR(1) persistence coefficient
σ       Innovation standard deviation
d_t     Loss differential at time t
L(·)    Loss function
γⱼ      Autocovariance at lag j
p̂       Observed accuracy
p*      Expected accuracy under null


Knowledge Tier Summary

Section            Tier  Confidence
DM Test            T1    Academically validated
HAC Variance       T1    Academically validated
PT Test (2-class)  T1    Academically validated
PT Test (3-class)  T3    Ad-hoc extension
Conformal          T1    Academically validated
MC-SS              T2    Empirical (v2)
AR(1) Bounds       T1    Standard statistics