Feature Engineering for Financial ML: The Complete Playbook

Dr. Sarah Chen
March 10, 2026

In quantitative finance, features are everything. The model is secondary. Here's the complete playbook for engineering features that actually predict returns.

Price-Based Features

import pandas as pd

def price_features(df):
    """Standard price-based features."""
    features = pd.DataFrame(index=df.index)
    daily_ret = df["close"].pct_change()

    # Returns at multiple horizons
    for d in [1, 5, 21, 63, 252]:
        features[f"ret_{d}d"] = df["close"].pct_change(d)

    # Volatility: 21-day level, and its ratio to the 63-day level
    features["vol_21d"] = daily_ret.rolling(21).std()
    features["vol_ratio"] = features["vol_21d"] / daily_ret.rolling(63).std()

    # Moving average crossovers
    features["ma_cross"] = df["close"].rolling(50).mean() / df["close"].rolling(200).mean()

    # RSI (14-day, simple-average variant)
    delta = df["close"].diff()
    gain = delta.where(delta > 0, 0).rolling(14).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(14).mean()
    features["rsi"] = 100 - (100 / (1 + gain / loss))

    return features

Volume Features

  • Volume ratio — Today's volume / 20-day average volume
  • VWAP deviation — Distance from volume-weighted average price
  • OBV slope — On-balance volume trend (accumulation/distribution)
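The three volume features above could be computed along the following lines. This is a minimal sketch, not a reference implementation: the column names ("high", "low", "close", "volume"), the 20-day windows for VWAP and OBV slope, and the typical-price approximation of VWAP from daily bars are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def volume_features(df: pd.DataFrame) -> pd.DataFrame:
    """Volume-based features for a single ticker, indexed by date.

    Assumes columns "high", "low", "close", "volume" (hypothetical names).
    """
    features = pd.DataFrame(index=df.index)

    # Volume ratio: today's volume relative to its 20-day average
    features["volume_ratio"] = df["volume"] / df["volume"].rolling(20).mean()

    # VWAP deviation: close relative to a rolling 20-day VWAP, approximated
    # from daily bars with the typical price (high + low + close) / 3
    typical = (df["high"] + df["low"] + df["close"]) / 3
    rolling_volume = df["volume"].rolling(20).sum()
    vwap = (typical * df["volume"]).rolling(20).sum() / rolling_volume
    features["vwap_dev"] = df["close"] / vwap - 1

    # OBV: cumulative volume signed by the direction of the daily close change
    direction = np.sign(df["close"].diff()).fillna(0)
    obv = (direction * df["volume"]).cumsum()

    # OBV slope: 20-day change in OBV, scaled by the 20-day total volume
    features["obv_slope"] = obv.diff(20) / rolling_volume

    return features
```

Scaling the OBV change by total volume keeps the slope between -1 and 1, which makes it comparable across stocks with very different trading volumes.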

Fundamental Features (when available)

  • Earnings yield — E/P ratio (inverse P/E)
  • Book-to-market — Classic value factor
  • Asset growth — Companies growing assets too fast underperform
  • Gross profitability — Gross profit / total assets
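As a rough illustration, the four fundamental ratios above reduce to a few column operations. The sketch assumes one row per fiscal year for a single company, and every column name ("eps", "price", "book_equity", "market_cap", "total_assets", "gross_profit") is hypothetical; a real data vendor will use its own field names.

```python
import pandas as pd


def fundamental_features(fund: pd.DataFrame) -> pd.DataFrame:
    """Fundamental features for one company, one row per fiscal year.

    All input column names are hypothetical placeholders.
    """
    features = pd.DataFrame(index=fund.index)

    # Earnings yield: E/P, the inverse of the P/E ratio
    features["earnings_yield"] = fund["eps"] / fund["price"]

    # Book-to-market: book equity over market capitalization
    features["book_to_market"] = fund["book_equity"] / fund["market_cap"]

    # Asset growth: year-over-year change in total assets
    features["asset_growth"] = fund["total_assets"].pct_change()

    # Gross profitability: gross profit over total assets
    features["gross_profitability"] = fund["gross_profit"] / fund["total_assets"]

    return features
```

In practice these values would be aligned to the date they became public rather than the fiscal period end, in line with the rule on forward-looking information.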

Cross-Sectional Features

The most powerful features are relative, not absolute:

def cross_sectional_rank(feature_series):
    """Rank feature across all stocks (0 to 1)."""
    return feature_series.rank(pct=True)

Ranking caps the influence of outliers, puts features with different scales on a common footing, and produces a roughly uniform distribution between 0 and 1, which most models handle well.
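When features for many stocks and many dates sit in one table, the ranking has to be done separately for each date, otherwise values from different days get compared with each other. A minimal sketch, assuming a panel with a MultiIndex whose levels are named "date" and "ticker"; that layout, the helper name `rank_by_date`, the tickers, and the numbers are all made up for illustration.

```python
import pandas as pd


def rank_by_date(panel: pd.DataFrame, column: str) -> pd.Series:
    """Percentile-rank `column` across tickers separately for each date.

    Assumes a MultiIndex with levels named "date" and "ticker"
    (hypothetical layout).
    """
    return panel.groupby(level="date")[column].rank(pct=True)


# Toy panel: two dates, four made-up tickers
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2026-01-02", "2026-01-05"]), ["AAA", "BBB", "CCC", "DDD"]],
    names=["date", "ticker"],
)
panel = pd.DataFrame(
    {"ret_21d": [0.01, 0.50, -0.02, 0.03, 0.00, -0.10, 0.04, 0.02]},
    index=index,
)
panel["ret_21d_rank"] = rank_by_date(panel, "ret_21d")
```

On the first date the 50% return for BBB is far from the other values, yet its rank is simply 1.0, one step above the next stock at 0.75.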

Feature Interaction

Don't just stack features — combine them:

# Value + Momentum combo (historically one of the strongest factors)
features["value_momentum"] = (
    cross_sectional_rank(features["earnings_yield"])
    + cross_sectional_rank(features["ret_252d"])
) / 2

The Golden Rules

  • Always rank cross-sectionally — Models perform better on ranks than raw values
  • Use multiple lookback windows — 1d, 5d, 21d, 63d, 252d capture different time scales
  • Normalize within time periods — Z-score each day's cross-section independently
  • Remove forward-looking information — Shift all features by at least 1 day
  • Less is more — 20 well-chosen features beat 200 kitchen-sink features
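Two of these rules, per-day normalization and removing forward-looking information, can be sketched as follows. The sketch assumes the same kind of panel as elsewhere in this post, with a MultiIndex whose levels are named "date" and "ticker"; the helper names `zscore_by_date` and `lag_features` are hypothetical.

```python
import pandas as pd


def zscore_by_date(panel: pd.DataFrame, column: str) -> pd.Series:
    """Z-score `column` within each date's cross-section."""
    grouped = panel.groupby(level="date")[column]
    return (panel[column] - grouped.transform("mean")) / grouped.transform("std")


def lag_features(panel: pd.DataFrame, periods: int = 1) -> pd.DataFrame:
    """Shift every column by `periods` rows within each ticker, so the row
    for date t only holds values that were known `periods` rows earlier."""
    ordered = panel.sort_index(level="date")
    return ordered.groupby(level="ticker").shift(periods)


# Toy panel: three dates, three made-up tickers
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2026-01-02", "2026-01-05", "2026-01-06"]), ["AAA", "BBB", "CCC"]],
    names=["date", "ticker"],
)
panel = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0, 10.0, 10.0, 40.0]},
    index=index,
)
panel["x_z"] = zscore_by_date(panel, "x")
lagged = lag_features(panel[["x"]])
```

Shifting within each ticker matters: a plain `shift(1)` on the whole panel would move one stock's value into another stock's row.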