Alternative Data for Alpha Generation: Satellite Imagery, NLP, and Beyond
Traditional financial data (prices, volumes, fundamentals) is available to every market participant, so most of its information is already priced in. A newer source of alpha is alternative data: non-traditional datasets that can provide an informational edge.
The Alternative Data Landscape
Satellite Imagery
- Retail parking lots — Count cars at Walmart to predict revenue before earnings
- Oil storage tanks — Estimate crude inventory from shadow analysis
- Crop health — NDVI satellite data predicts agricultural commodity prices
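To make the crop health item concrete: NDVI (Normalized Difference Vegetation Index) is computed per pixel from the near-infrared and red reflectance bands as (NIR - Red) / (NIR + Red), with higher values indicating denser, healthier vegetation. The sketch below assumes the two bands are already available as flat lists of reflectance values; the function names `ndvi` and `mean_ndvi` are illustrative, not part of any particular imagery library.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index for a single pixel."""
    total = nir + red
    if total == 0:
        return 0.0  # avoid division by zero on empty or masked pixels
    return (nir - red) / total


def mean_ndvi(nir_band, red_band) -> float:
    """Average NDVI over a region, given matching NIR and red band values."""
    values = [ndvi(n, r) for n, r in zip(nir_band, red_band)]
    return sum(values) / len(values) if values else 0.0


# Hypothetical reflectance values for a small field
nir_band = [0.50, 0.45, 0.30]
red_band = [0.10, 0.15, 0.30]
print(mean_ndvi(nir_band, red_band))
```

A regional average like this, tracked over a growing season and compared with prior years, is one plausible input to a yield or price model; it is not a forecast on its own.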
Natural Language Processing
- Earnings call transcripts — Sentiment analysis predicts post-earnings drift
- SEC filings — Detect changes in risk language between quarterly filings
- Social media — Reddit/Twitter sentiment as a contrarian indicator
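As a rough illustration of the SEC filings idea, one simple way to quantify a change in risk language is the Jaccard distance between the word sets of two consecutive risk sections, together with the list of newly introduced terms. This is a minimal sketch under the assumption that the risk sections have already been extracted as plain text; `risk_language_change` is a hypothetical helper, and a production version would likely drop stop words and use phrases rather than single words.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def risk_language_change(previous: str, current: str):
    """Return (Jaccard distance, sorted new terms) between two filings.

    0.0 means identical vocabulary, 1.0 means no words in common.
    """
    prev_words, curr_words = tokenize(previous), tokenize(current)
    union = prev_words | curr_words
    if not union:
        return 0.0, []
    distance = 1.0 - len(prev_words & curr_words) / len(union)
    return distance, sorted(curr_words - prev_words)


# Hypothetical excerpts from two consecutive quarterly filings
q1 = "We face supply chain risk."
q2 = "We face supply chain risk and litigation risk."
change, new_terms = risk_language_change(q1, q2)
print(change, new_terms)
```

A sudden jump in the distance, or the appearance of terms such as "litigation", flags a filing for closer review.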
Web Data
- Job postings — Hiring surges predict revenue growth 2-3 quarters ahead
- App download rankings — Mobile app traction forecasts tech stock performance
- Price tracking — Web-scraped product prices indicate inflation trends
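Claims such as "hiring leads revenue by 2-3 quarters" can be checked with a lagged correlation: correlate the web-data series at quarter t with the fundamental at quarter t + lag and see which lag fits best. The sketch below uses only the standard library; the series are made-up numbers and the helper names `lagged_correlation` and `best_lag` are illustrative.

```python
from math import sqrt


def pearson(x, y) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no information
    return cov / sqrt(var_x * var_y)


def lagged_correlation(leading, target, lag: int) -> float:
    """Correlation between leading[t] and target[t + lag]."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    x = leading[: len(leading) - lag]
    y = target[lag:]
    n = min(len(x), len(y))
    if n < 2:
        raise ValueError("not enough overlapping observations")
    return pearson(x[:n], y[:n])


def best_lag(leading, target, max_lag: int) -> int:
    """Lag (in periods) with the highest correlation."""
    return max(range(max_lag + 1),
               key=lambda k: lagged_correlation(leading, target, k))


# Synthetic quarterly data: revenue growth echoes hiring growth two quarters later
hiring = [1, 3, 2, 5, 4, 6, 3, 7]
revenue = [9, 9, 1, 3, 2, 5, 4, 6]
print(best_lag(hiring, revenue, max_lag=4))
```

With real data the sample is short and noisy, so the best lag should be validated out of sample before it is trusted.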
Building an NLP Signal
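from transformers import pipeline

# Financial sentiment model (FinBERT labels: positive, negative, neutral)
sentiment = pipeline(
    "sentiment-analysis",
    model="ProsusAI/finbert"
)

def score_earnings_call(transcript: str) -> float:
    """Score earnings call transcript sentiment in [-1, 1]."""
    # Split into 500-character chunks. The model accepts at most 512 tokens,
    # so truncation=True guards against an unusually dense chunk.
    chunks = [transcript[i:i+500] for i in range(0, len(transcript), 500)]
    if not chunks:
        return 0.0  # empty transcript: no signal
    sign = {"positive": 1.0, "negative": -1.0, "neutral": 0.0}
    scores = []
    for chunk in chunks:
        result = sentiment(chunk, truncation=True)[0]
        # Neutral chunks contribute 0 rather than being counted as negative
        scores.append(sign.get(result["label"].lower(), 0.0) * result["score"])
    return sum(scores) / len(scores)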
The Challenge
Alternative data is expensive, noisy, and decays fast. A satellite imagery signal that worked in 2024 may be arbitraged away by 2026 as more funds adopt it.
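One way to watch for this kind of decay is a rolling information coefficient (IC): the correlation between the signal and subsequent returns, recomputed over a sliding window. A drift toward zero suggests the edge is being competed away. The sketch below is a minimal standard-library version with synthetic numbers; `rolling_ic` is an illustrative name, and practitioners often use rank (Spearman) correlation instead of the Pearson correlation used here.

```python
from math import sqrt


def pearson(x, y) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rolling_ic(signal, forward_returns, window: int):
    """Correlation of signal vs. forward returns over each sliding window."""
    if window < 2:
        raise ValueError("window must be at least 2")
    n = min(len(signal), len(forward_returns))
    return [
        pearson(signal[i - window:i], forward_returns[i - window:i])
        for i in range(window, n + 1)
    ]


# Synthetic example: the signal tracks returns early on, then stops working
signal = [1, 2, 3, 4, 1, 2, 3, 4]
forward_returns = [1, 2, 3, 4, 1, 2, 2, 1]
for ic in rolling_ic(signal, forward_returns, window=4):
    print(round(ic, 2))
```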
On AlphaNova
Our competitions use obfuscated data, which prevents participants from gaining an edge through external datasets. The feature engineering mindset from alternative data still carries over: extracting signal from noisy inputs is the same skill in both settings.