Mathematical Foundations¶

This document provides the mathematical derivations underlying temporalcv’s statistical tests and metrics. Each section is tagged with its knowledge tier.

1. Diebold-Mariano Test [T1]¶

Purpose: Compare predictive accuracy of two forecasting models.

Reference: Diebold, F.X. & Mariano, R.S. (1995). Comparing predictive accuracy. Journal of Business & Economic Statistics, 13(3), 253-263.

1.1 Loss Differential¶

Given two forecast error series e₁,ₜ and e₂,ₜ, define the loss differential:

d_t = L(e₁,ₜ) - L(e₂,ₜ)

Where L(·) is a loss function:

Squared loss: L(e) = e²
Absolute loss: L(e) = |e|

1.2 Null Hypothesis¶

H₀: E[d_t] = 0  (equal predictive accuracy)
H₁: E[d_t] ≠ 0  (different predictive accuracy)

1.3 Test Statistic¶

DM = d̄ / √(V̂(d̄))

Where:

d̄ = (1/n) Σ d_t is the sample mean of loss differentials
V̂(d̄) is the HAC variance estimator (see Section 2)

Under H₀, DM → N(0,1) asymptotically.

1.4 Harvey Adjustment [T1]¶

For small samples, Harvey et al. (1997) proposed:

DM_adj = DM × √((n + 1 - 2h + h(h-1)/n) / n)

This corrects for small-sample bias in the variance estimate.

Reference: Harvey, D., Leybourne, S., & Newbold, P. (1997). Testing the equality of prediction mean squared errors. International Journal of Forecasting, 13(2), 281-291.

2. HAC Variance Estimation [T1]¶

Purpose: Correct for serial correlation in forecast errors for h > 1.

Reference: Newey, W.K. & West, K.D. (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55(3), 703-708.

2.1 Problem¶

For h-step forecasts, forecast errors follow an MA(h-1) process, inducing autocorrelation. Standard variance estimators are biased.

2.2 Bartlett Kernel¶

The Bartlett kernel weight for lag j:

w(j) = 1 - |j| / (bandwidth + 1)    if |j| ≤ bandwidth
w(j) = 0                             otherwise

2.3 HAC Variance Formula¶

V̂(d̄) = (1/n) × [γ₀ + 2 Σⱼ₌₁^bandwidth w(j) × γⱼ]

Where γⱼ is the sample autocovariance at lag j:

γⱼ = (1/n) Σₜ (d_t - d̄)(d_{t-j} - d̄)

2.4 Automatic Bandwidth Selection [T1]¶

Andrews (1991) rule:

bandwidth = floor(4 × (n/100)^(2/9))

For h-step forecasts, setting bandwidth = h - 1 is theoretically motivated (MA(h-1) structure).

Reference: Andrews, D.W.K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica, 59(3), 817-858.

3. Pesaran-Timmermann Test [T1]¶

Purpose: Test whether directional forecasts are better than random.

Reference: Pesaran, M.H. & Timmermann, A. (1992). A simple nonparametric test of predictive performance. Journal of Business & Economic Statistics, 10(4), 461-465.

3.1 Observed Accuracy¶

p̂ = (number of correct directions) / n

3.2 Expected Accuracy Under Independence¶

Under the null hypothesis that predictions are independent of actuals:

p* = p_y × p_x + (1 - p_y) × (1 - p_x)

Where:

p_y = P(actual > 0) = fraction of positive actuals
p_x = P(prediction > 0) = fraction of positive predictions

3.3 Variance Components¶

V(p̂) = p* × (1 - p*) / n

V(p*) = [(2p_y - 1)² × p_x(1-p_x) + (2p_x - 1)² × p_y(1-p_y)
         + 4 × p_y × p_x × (1-p_y) × (1-p_x) / n] / n

3.4 Test Statistic¶

PT = (p̂ - p*) / √(V(p̂) + V(p*))

Under H₀, PT → N(0,1) asymptotically. One-sided test: reject if PT > z_α.

3.5 Three-Class Extension [T3]¶

Warning: The 3-class mode (UP/DOWN/FLAT) is an ad-hoc extension not published in the academic literature.

For 3 classes with marginal probabilities p_y^k and p_x^k for k ∈ {UP, DOWN, FLAT}:

p* = Σₖ p_y^k × p_x^k

The variance formulas are approximations. Use 2-class mode for rigorous testing.

4. Conformal Prediction [T1]¶

Purpose: Distribution-free prediction intervals with coverage guarantee.

Reference: Romano, Y., Patterson, E. & Candès, E.J. (2019). Conformalized quantile regression. NeurIPS.

4.1 Finite-Sample Coverage Guarantee¶

For calibration set of size n and miscoverage rate α:

P(Y_{n+1} ∈ Ĉ(X_{n+1})) ≥ 1 - α

This holds for any distribution (no parametric assumptions needed).

4.2 Quantile Formula¶

The critical step uses the ceiling function:

q = ceil((n + 1) × (1 - α)) / n

Why ceiling?

The (n+1)(1-α) quantile of n nonconformity scores gives exact coverage. The ceiling ensures we round up, guaranteeing at least (1-α) coverage.

4.3 Nonconformity Score¶

For regression with residual-based scores:

s_i = |y_i - ŷ_i|

The prediction interval is:

Ĉ(x) = [ŷ(x) - q̂, ŷ(x) + q̂]

Where q̂ is the empirical quantile of calibration scores.

4.4 Adaptive Conformal Inference [T1]¶

For distribution shift, Gibbs & Candès (2021) proposed:

q_{t+1} = q_t - γα        if y_t ∈ Ĉ_t(x_t)  (covered)
q_{t+1} = q_t + γ(1-α)    if y_t ∉ Ĉ_t(x_t)  (not covered)

This adapts the quantile online to maintain target coverage.

Reference: Gibbs, I. & Candès, E.J. (2021). Adaptive conformal inference under distribution shift. NeurIPS.

5. Move-Conditional Skill Score [T2]¶

Purpose: Measure forecasting skill on significant moves, excluding flat periods.

5.1 Motivation¶

For high-persistence series (ACF(1) > 0.9), the persistence baseline (predict no change) achieves trivially low overall MAE because most periods are “flat.”

Conditioning on moves isolates genuine forecasting skill.

5.2 Move Classification¶

Given threshold τ (typically 70th percentile of |actuals| from training):

UP:   actual > τ
DOWN: actual < -τ
FLAT: |actual| ≤ τ

5.3 MC-SS Formula¶

MC-SS = 1 - (model_MAE_moves / persistence_MAE_moves)

Where:

model_MAE_moves = MAE of model predictions on UP and DOWN periods only
persistence_MAE_moves = mean(|actual|) on moves (since persistence predicts 0)

5.4 Interpretation¶

MC-SS	Meaning
> 0	Model beats persistence on moves
= 0	Model equals persistence on moves
< 0	Model worse than persistence on moves

5.5 Threshold Selection [T2]¶

The 70th percentile was chosen empirically:

~30% of periods are “moves” (UP or DOWN)
~70% are “flat”

This provides meaningful signal while maintaining sufficient sample size.

Source: myga-forecasting-v2 Phase 11 analysis.

6. AR(1) Theoretical Bounds [T1]¶

Purpose: Establish optimal forecast error for AR(1) process.

6.1 AR(1) Process¶

y_t = φ × y_{t-1} + σ × ε_t

Where:

φ = persistence coefficient (typically 0.9 < φ < 1 for financial data)
σ = innovation standard deviation
ε_t ~ N(0, 1) i.i.d.

6.2 Optimal 1-Step Predictor¶

ŷ_t|t-1 = φ × y_{t-1}

The forecast error is:

e_t = y_t - ŷ_t|t-1 = σ × ε_t

6.3 Optimal MAE¶

Since ε_t ~ N(0, 1), the expected absolute error is:

E[|ε|] = √(2/π) ≈ 0.798

Therefore:

Optimal MAE = σ × √(2/π) ≈ 0.798 × σ

6.4 Validation Gate Application [T2]¶

If a model achieves MAE significantly below σ × √(2/π) on synthetic AR(1) data, it indicates lookahead bias (the model is “seeing” future ε values).

The tolerance factor of 1.5 allows for finite-sample variation:

HALT if: model_MAE < (1/1.5) × theoretical_MAE

Notation Reference¶

Symbol	Meaning
h	Forecast horizon (steps ahead)
n	Sample size
α	Significance level or miscoverage rate
φ	AR(1) persistence coefficient
σ	Innovation standard deviation
d_t	Loss differential at time t
L(·)	Loss function
γⱼ	Autocovariance at lag j
p̂	Observed accuracy
p*	Expected accuracy under null

Knowledge Tier Summary¶

Section	Tier	Confidence
DM Test	T1	Academically validated
HAC Variance	T1	Academically validated
PT Test (2-class)	T1	Academically validated
PT Test (3-class)	T3	Ad-hoc extension
Conformal	T1	Academically validated
MC-SS	T2	Empirical (v2)
AR(1) Bounds	T1	Standard statistics