Machine Learning in Finance: What Actually Works (and What Doesn't)
Everyone wants to throw a neural network at stock prices. Most fail spectacularly. Here's an honest look at which ML techniques produce real alpha — and which are hype.
What Works
1. Gradient Boosted Trees (XGBoost, LightGBM)
The workhorse of quant finance. GBTs handle tabular data with mixed feature types, missing values, and nonlinear relationships. 80% of winning AlphaNova submissions use tree-based models.
```python
import lightgbm as lgb

model = lgb.LGBMRegressor(
    n_estimators=500,
    learning_rate=0.01,
    max_depth=6,
    num_leaves=31,
    min_child_samples=50,  # Critical for financial data: stops leaves from fitting noise
    subsample=0.8,
    colsample_bytree=0.8,
    reg_alpha=0.1,
    reg_lambda=1.0,
)
```
2. Linear Models with Good Features
Ridge regression on well-engineered features often beats complex deep learning. The secret is in feature engineering, not model complexity.
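A sketch of the idea on a synthetic price path. The momentum and volatility features below are illustrative assumptions, not a recommended factor set; the point is that two hand-built features plus heavy L2 regularization is a complete, honest baseline.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
prices = 100 + np.cumsum(rng.normal(size=600))   # synthetic price path
returns = np.diff(prices) / prices[:-1]

# Hand-engineered features: 20-day momentum and 20-day realized volatility.
w = 20
windows = np.array([returns[k:k + w] for k in range(len(returns) - w)])
momentum = windows.sum(axis=1)
volatility = windows.std(axis=1)
X = np.column_stack([momentum, volatility])
y = returns[w:]   # next-day return, aligned so features never see the target

# Strong regularization keeps coefficients small on a noisy target.
model = Ridge(alpha=10.0).fit(X[:-100], y[:-100])
preds = model.predict(X[-100:])
```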
3. Ensemble Methods
Stacking multiple weak models consistently outperforms single strong models. Think: random forest of strategies.
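One way to sketch this with scikit-learn's `StackingRegressor` (base learners and data here are illustrative). Diverse weak models are blended by a simple meta-model trained on out-of-fold predictions; for real financial data, swap the default `cv` for `TimeSeriesSplit` so the meta-model never trains on future folds.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 5))
y = X @ np.array([0.3, -0.2, 0.1, 0.0, 0.05]) + rng.normal(scale=0.5, size=1000)

# Several weak, diverse base learners...
base = [
    ("ridge", Ridge(alpha=1.0)),
    ("tree", DecisionTreeRegressor(max_depth=3)),
    ("forest", RandomForestRegressor(n_estimators=50, max_depth=4, random_state=0)),
]
# ...blended by a simple meta-model fit on out-of-fold base predictions.
stack = StackingRegressor(estimators=base, final_estimator=Ridge(alpha=1.0), cv=5)
stack.fit(X[:800], y[:800])
preds = stack.predict(X[800:])
```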
What Doesn't Work
1. LSTMs for Price Prediction
Despite hundreds of Medium articles, LSTMs predicting next-day returns do not produce tradeable alpha. The signal-to-noise ratio in daily returns is too low.
2. Reinforcement Learning (for most people)
RL for portfolio optimization sounds exciting but requires massive compute, careful reward shaping, and tends to overfit catastrophically.
3. Transformer Models on Raw Prices
GPT-style models on price sequences haven't shown consistent out-of-sample performance. The data is too noisy and non-stationary.
The Sweet Spot
The most successful ML approaches in finance combine the patterns above:
- Tree-based or regularized linear models on tabular data, not deep sequence models on raw prices
- Effort spent on feature engineering rather than architecture
- Ensembles of simple, diverse strategies instead of one complex model
The competition leaderboard doesn't lie: simplicity with good features beats complexity every time.