Feature Engineering for Financial ML: The Complete Playbook
In quantitative finance, features are everything. The model is secondary. Here's the complete playbook for engineering features that actually predict returns.
Price-Based Features
import pandas as pd


def price_features(df):
    """Standard price-based features from a DataFrame with a "close" column."""
    features = pd.DataFrame(index=df.index)
    close = df["close"]
    daily_ret = close.pct_change()

    # Returns at multiple horizons (1 day up to roughly 1 trading year)
    for d in [1, 5, 21, 63, 252]:
        features[f"ret_{d}d"] = close.pct_change(d)

    # Volatility: 21-day level, and its ratio to the 63-day level
    features["vol_21d"] = daily_ret.rolling(21).std()
    features["vol_ratio"] = features["vol_21d"] / daily_ret.rolling(63).std()

    # Moving average crossover: 50-day MA relative to 200-day MA
    features["ma_cross"] = close.rolling(50).mean() / close.rolling(200).mean()

    # RSI (14-day, simple-moving-average variant rather than Wilder smoothing)
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta).clip(lower=0).rolling(14).mean()
    # A window with no losses gives gain / loss = inf, so RSI = 100
    features["rsi"] = 100 - (100 / (1 + gain / loss))

    return features
Volume Features
- Volume ratio: today's volume divided by the 20-day average volume
- VWAP deviation: percentage distance of the price from the volume-weighted average price
- OBV slope: trend in on-balance volume, a proxy for accumulation versus distribution
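The three volume features above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes a daily DataFrame with `close` and `volume` columns, approximates VWAP from daily closes because no intraday data is assumed, and measures the OBV "slope" as net signed volume over the window. The function name `volume_features` and the `window` parameter are hypothetical.

```python
import numpy as np
import pandas as pd


def volume_features(df: pd.DataFrame, window: int = 20) -> pd.DataFrame:
    """Volume features from a DataFrame with "close" and "volume" columns."""
    features = pd.DataFrame(index=df.index)
    close, volume = df["close"], df["volume"]

    # Volume ratio: today's volume / trailing average (window includes today)
    features["volume_ratio"] = volume / volume.rolling(window).mean()

    # Rolling VWAP approximated from daily closes (assumption: no intraday data)
    vwap = (close * volume).rolling(window).sum() / volume.rolling(window).sum()
    features["vwap_dev"] = close / vwap - 1

    # OBV: cumulative volume, signed by the direction of the daily price change
    obv = (np.sign(close.diff()).fillna(0) * volume).cumsum()
    # Slope proxy: net signed volume over the window as a share of total volume,
    # so the value lies in [-1, 1]
    features["obv_slope"] = obv.diff(window) / volume.rolling(window).sum()

    return features
```

Scaling the OBV change by total volume keeps the feature comparable across stocks with very different trading activity; a regression slope over the window would be an equally reasonable choice.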
Fundamental Features (when available)
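- Earnings yield: E/P ratio (the inverse of P/E)
- Book-to-market: book value of equity divided by market capitalization, the classic value factor
- Asset growth: year-over-year change in total assets; companies that grow assets rapidly have historically tended to underperform
- Gross profitability: gross profit divided by total assets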
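As a rough sketch, the four ratios above reduce to simple column arithmetic. The input column names (`net_income`, `market_cap`, `book_equity`, `total_assets`, `total_assets_prev`, `gross_profit`) and the function name are hypothetical; real data vendors use their own field names.

```python
import pandas as pd


def fundamental_features(fund: pd.DataFrame) -> pd.DataFrame:
    """Fundamental ratios, one row per stock (hypothetical column names)."""
    features = pd.DataFrame(index=fund.index)

    # Earnings yield: E/P, computed here as net income over market cap
    features["earnings_yield"] = fund["net_income"] / fund["market_cap"]

    # Book-to-market: book equity over market cap
    features["book_to_market"] = fund["book_equity"] / fund["market_cap"]

    # Asset growth: year-over-year change in total assets
    features["asset_growth"] = fund["total_assets"] / fund["total_assets_prev"] - 1

    # Gross profitability: gross profit over total assets
    features["gross_profitability"] = fund["gross_profit"] / fund["total_assets"]

    return features
```

The inputs should be the figures that were publicly known on the feature date, since fundamentals are reported with a lag and using them earlier leaks future information into the features.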
Cross-Sectional Features
The most powerful features are relative, not absolute:
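def cross_sectional_rank(feature_series):
    """Percentile-rank one date's feature values across all stocks, in (0, 1]."""
    # feature_series must hold a single date's cross-section (one value per stock).
    # For a (date, ticker) panel, apply this per date, e.g. via groupby.
    return feature_series.rank(pct=True)
Ranking caps the influence of outliers, puts features with different scales on a common footing, and yields an approximately uniform distribution on each date, so no single feature dominates the model because of its units or its tails.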
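One detail worth making explicit: `Series.rank` ranks whatever it is given, so on a multi-date panel the ranking has to be done separately for each date, otherwise values from different dates get compared with one another. A minimal sketch, assuming a DataFrame indexed by a (`date`, `ticker`) MultiIndex; the helper name `rank_by_date` and the sample numbers are made up for illustration.

```python
import pandas as pd


def rank_by_date(panel: pd.DataFrame, column: str) -> pd.Series:
    """Percentile-rank `column` across tickers within each date."""
    return panel.groupby(level="date")[column].rank(pct=True)


# Toy panel: two dates, three tickers (illustrative values only)
panel = pd.DataFrame(
    {"ret_252d": [0.10, 0.30, -0.05, 0.02, 0.50, 0.20]},
    index=pd.MultiIndex.from_product(
        [pd.to_datetime(["2024-01-31", "2024-02-29"]), ["AAA", "BBB", "CCC"]],
        names=["date", "ticker"],
    ),
)
panel["ret_252d_rank"] = rank_by_date(panel, "ret_252d")
```

Within each date the best value gets rank 1.0 and the worst gets 1/N, regardless of how extreme the raw numbers are.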
Feature Interaction
Don't just stack features; combine them:
# Value + momentum combo (historically one of the stronger factor pairings).
# Assumes `features` holds one date's cross-section with "earnings_yield" merged in.
features["value_momentum"] = (
    cross_sectional_rank(features["earnings_yield"])
    + cross_sectional_rank(features["ret_252d"])
) / 2
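For data covering many dates, the same combination can be computed per date. The sketch below assumes a panel indexed by (`date`, `ticker`) that already contains `earnings_yield` and `ret_252d` columns; the function name `value_momentum_score` is hypothetical.

```python
import pandas as pd


def value_momentum_score(panel: pd.DataFrame) -> pd.Series:
    """Average of the per-date percentile ranks of value and momentum."""
    by_date = panel.groupby(level="date")
    value_rank = by_date["earnings_yield"].rank(pct=True)
    momentum_rank = by_date["ret_252d"].rank(pct=True)
    # Equal weights; both inputs are already on the same (0, 1] scale
    return (value_rank + momentum_rank) / 2
```

Because both inputs are ranks, the equal-weighted average is not distorted by the different units of a yield and a return.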