Machine Learning in Finance: What Actually Works (and What Doesn't)
Everyone wants to throw a neural network at stock prices. Most fail spectacularly. Here's an honest look at which ML techniques produce real alpha — and which are hype.
What Works
1. Gradient Boosted Trees (XGBoost, LightGBM)
The workhorse of quant finance. GBTs handle tabular data with mixed feature types, missing values, and nonlinear relationships. 80% of winning AlphaNova submissions use tree-based models.
```python
import lightgbm as lgb

model = lgb.LGBMRegressor(
    n_estimators=500,
    learning_rate=0.01,
    max_depth=6,
    num_leaves=31,
    min_child_samples=50,  # Critical for financial data: stops leaves from fitting noise
    subsample=0.8,
    colsample_bytree=0.8,
    reg_alpha=0.1,
    reg_lambda=1.0,
)
```
2. Linear Models with Good Features
Ridge regression on well-engineered features often beats complex deep learning. The secret is in feature engineering, not model complexity.
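A sketch of the idea on a synthetic price path. The momentum and volatility features below are illustrative assumptions, not a recommended factor set; the point is that two hand-built features plus heavy L2 regularization is a complete, honest baseline.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
prices = 100 + np.cumsum(rng.normal(size=600))   # synthetic price path
returns = np.diff(prices) / prices[:-1]

# Hand-engineered features: 20-day momentum and 20-day realized volatility.
w = 20
windows = np.array([returns[k:k + w] for k in range(len(returns) - w)])
momentum = windows.sum(axis=1)
volatility = windows.std(axis=1)
X = np.column_stack([momentum, volatility])
y = returns[w:]   # next-day return, aligned so features never see the target

# Strong regularization keeps coefficients small on a noisy target.
model = Ridge(alpha=10.0).fit(X[:-100], y[:-100])
preds = model.predict(X[-100:])
```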
3. Ensemble Methods
Stacking multiple weak models consistently outperforms single strong models. Think: random forest of strategies.
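One way to sketch this with scikit-learn's `StackingRegressor` (base learners and data here are illustrative). Diverse weak models are blended by a simple meta-model trained on out-of-fold predictions; for real financial data, swap the default `cv` for `TimeSeriesSplit` so the meta-model never trains on future folds.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 5))
y = X @ np.array([0.3, -0.2, 0.1, 0.0, 0.05]) + rng.normal(scale=0.5, size=1000)

# Several weak, diverse base learners...
base = [
    ("ridge", Ridge(alpha=1.0)),
    ("tree", DecisionTreeRegressor(max_depth=3)),
    ("forest", RandomForestRegressor(n_estimators=50, max_depth=4, random_state=0)),
]
# ...blended by a simple meta-model fit on out-of-fold base predictions.
stack = StackingRegressor(estimators=base, final_estimator=Ridge(alpha=1.0), cv=5)
stack.fit(X[:800], y[:800])
preds = stack.predict(X[800:])
```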
What Doesn't Work
1. LSTMs for Price Prediction
Despite hundreds of Medium articles, LSTMs predicting next-day returns do not produce tradeable alpha. The signal-to-noise ratio in daily returns is too low.
2. Reinforcement Learning (for most people)
RL for portfolio optimization sounds exciting but requires massive compute, careful reward shaping, and tends to overfit catastrophically.
3. Transformer Models on Raw Prices
GPT-style models on price sequences haven't shown consistent out-of-sample performance. The data is too noisy and non-stationary.
The Sweet Spot
The most successful ML approaches in finance combine the patterns above:
- Tree-based or regularized linear models on tabular data, not deep sequence models on raw prices
- Effort spent on feature engineering rather than architecture
- Ensembles of simple, diverse strategies instead of one complex model
The competition leaderboard doesn't lie: simplicity with good features beats complexity every time.