A Practitioner's Guide to Cross-Validation in Finance

Dr. Sarah Chen
February 5, 2026

Standard k-fold cross-validation can be dangerously misleading when applied to financial time-series data. Here's why, and what to do about it.

The Problem with Standard CV

Financial data has three properties that violate the assumptions of standard cross-validation:

  • Serial correlation -- Today's returns are correlated with yesterday's returns
  • Non-stationarity -- The data distribution changes over time (regime shifts)
  • Information leakage -- Features and labels computed over overlapping windows can carry information across the train/test boundary
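
To make the leakage concrete, the sketch below counts, for each fold, how many test samples have an immediate time-neighbor (t-1 or t+1) in the training set. It assumes scikit-learn's `KFold` with `shuffle=True` stands in for "standard" CV and `TimeSeriesSplit` for the ordered alternative, on 1,000 synthetic positions; the helper `boundary_neighbors` is hypothetical and exists only for this illustration. Under serial correlation, each such adjacent pair lets the model train on a near-copy of a test observation.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit


def boundary_neighbors(splitter, n_samples):
    """Per fold, count test samples whose immediate time-neighbor (t-1 or t+1)
    is in the training set. Each such pair is a point where serial correlation
    or overlapping windows can carry information across the split."""
    X = np.zeros((n_samples, 1))  # placeholder features; only the indices matter
    counts = []
    for train_idx, test_idx in splitter.split(X):
        in_train = np.zeros(n_samples, dtype=bool)
        in_train[train_idx] = True
        prev_idx = np.clip(test_idx - 1, 0, n_samples - 1)
        next_idx = np.clip(test_idx + 1, 0, n_samples - 1)
        # Mask out neighbors that fall outside the series
        prev_in_train = (test_idx > 0) & in_train[prev_idx]
        next_in_train = (test_idx < n_samples - 1) & in_train[next_idx]
        counts.append(int(np.sum(prev_in_train | next_in_train)))
    return counts


n = 1000
shuffled = boundary_neighbors(KFold(n_splits=5, shuffle=True, random_state=0), n)
walk_forward = boundary_neighbors(TimeSeriesSplit(n_splits=5), n)
print("shuffled k-fold:", shuffled)
print("walk-forward:   ", walk_forward)  # list shown: [1, 1, 1, 1, 1]
```

With shuffling, most of the roughly 200 test samples per fold sit right next to a training sample; the walk-forward split has exactly one such boundary per fold, which is the one purging targets.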

Better Alternatives

Purged Walk-Forward CV

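    from sklearn.model_selection import TimeSeriesSplit


    def purged_walk_forward(X, y, n_splits=5, embargo_pct=0.01):
        """
        Walk-forward CV with a purge/embargo buffer.

        The test window always follows the training window, so the only
        leakage boundary is the one just before the test set. Samples
        there are dropped from training so that features and labels built
        over overlapping windows cannot span the train/test boundary.
        `y` is unused; it is kept for a scikit-learn-style signature.
        """
        tscv = TimeSeriesSplit(n_splits=n_splits)
        embargo_size = int(len(X) * embargo_pct)

        for train_idx, test_idx in tscv.split(X):
            # Purge: drop the last `embargo_size` training samples.
            # Guard the zero case: train_idx[:-0] would return an empty array.
            if embargo_size > 0:
                train_idx = train_idx[:-embargo_size]
            yield train_idx, test_idx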
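
scikit-learn 0.24 and later expose the same buffer directly through the `gap` parameter of `TimeSeriesSplit`, which excludes that many samples from the end of each training set before the test set. The sketch below assumes synthetic data and a 1% buffer and checks that train and test windows are separated by exactly `embargo_size` samples; for `embargo_size > 0` it should yield the same index sets as `purged_walk_forward`.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(1000).reshape(-1, 1)  # 1,000 time-ordered synthetic samples
embargo_size = int(len(X) * 0.01)   # 10 samples, i.e. embargo_pct=0.01

# gap drops `embargo_size` samples from the end of each training window
tscv = TimeSeriesSplit(n_splits=5, gap=embargo_size)

buffers = []
for train_idx, test_idx in tscv.split(X):
    # Number of samples skipped between the last train and first test index
    buffers.append(int(test_idx[0] - train_idx[-1] - 1))
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")

print(buffers)  # → [10, 10, 10, 10, 10]
```

Using the built-in parameter avoids the off-by-one pitfalls of slicing index arrays by hand, and it raises an error when the buffer leaves too few samples for the requested number of splits.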

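Combinatorial Purged CV (CPCV)

Introduced by Marcos Lopez de Prado, CPCV splits the data into contiguous groups and tests on every combination of them, so the results can be recombined into multiple backtest paths. This yields a distribution of out-of-sample performance rather than the single path walk-forward produces, which gets more out of a limited history.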
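
A minimal sketch of how CPCV generates its splits, under simplifying assumptions: samples are cut into `n_groups` contiguous groups, each combination of `n_test_groups` groups serves once as the test set, and a fixed symmetric buffer of `embargo` samples around every test group stands in for proper purging (which uses each label's actual end time) and embargo. The functions `cpcv_splits` and `n_paths` are hypothetical names for illustration. With N groups and k test groups there are C(N, k) splits, and each group is tested C(N-1, k-1) times, which is the number of complete backtest paths the splits can be stitched into.

```python
from itertools import combinations
from math import comb

import numpy as np


def cpcv_splits(n_samples, n_groups=6, n_test_groups=2, embargo=0):
    """Yield (train_idx, test_idx, test_groups) for every combination of
    test groups. Training samples within `embargo` positions of any test
    group are dropped (a simplified stand-in for purge + embargo)."""
    groups = np.array_split(np.arange(n_samples), n_groups)
    for test_groups in combinations(range(n_groups), n_test_groups):
        test_idx = np.concatenate([groups[g] for g in test_groups])
        keep = np.ones(n_samples, dtype=bool)
        for g in test_groups:
            # Exclude the test group itself plus a buffer on both sides
            lo = max(groups[g][0] - embargo, 0)
            hi = min(groups[g][-1] + embargo + 1, n_samples)
            keep[lo:hi] = False
        yield np.flatnonzero(keep), test_idx, test_groups


def n_paths(n_groups, n_test_groups):
    # Each group appears in C(N-1, k-1) test sets, so the splits can be
    # reassembled into that many full backtest paths.
    return comb(n_groups - 1, n_test_groups - 1)


splits = list(cpcv_splits(1000, n_groups=6, n_test_groups=2, embargo=10))
print(len(splits), "splits,", n_paths(6, 2), "backtest paths")  # → 15 splits, 5 backtest paths
```

With six groups and two test groups, each model is fit 15 times, and the out-of-sample predictions assemble into five backtest paths instead of one, giving a spread of performance estimates rather than a single number.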

Key Takeaways

  • Never use random k-fold CV on time-series data
  • Always purge samples near train/test boundaries
  • Apply an embargo period proportional to your feature lookback
  • Use multiple paths through the data to reduce the variance of your performance estimate

Financial ML requires financial ML techniques. Don't trust generic tools blindly.