Alternative data sources for quant models
Beyond traditional price/volume data, what alternative data are people using in their research?
Popular sources:
- Satellite imagery (parking lot counts, crop health)
- NLP sentiment (news, earnings calls, social media)
- Web traffic (SimilarWeb, Google Trends)
- Credit card transactions (spending patterns)
- Patent filings (innovation signals)
43 Replies
For those wondering: yes, you can use external Python packages in submissions, but they must be in the approved list. No custom C extensions.
One more thing: the scoring engine uses a held-out test period that you never see. So your validation score is the best estimate of final performance you will get.
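import numpy as np
from scipy import optimize

def max_sharpe_portfolio(returns, rf=0.0):
    # returns: (T, n) array of per-period asset returns; rf: per-period risk-free rate
    n = returns.shape[1]
    init_w = np.ones(n) / n  # start from equal weights
    # Long-only with a 10% cap per asset; feasible only when n >= 10
    bounds = [(0.0, 0.1)] * n
    # Fully invested: weights must sum to 1
    constraints = {'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0}
    result = optimize.minimize(
        # Minimize the negative (non-annualized) Sharpe ratio
        lambda w: -(np.mean(returns @ w) - rf) / np.std(returns @ w),
        init_w, bounds=bounds, constraints=constraints
    )
    return result.x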
Here's a simple max-Sharpe optimizer for reference.
One thing to watch out for: survivorship bias in the training data. Make sure you include delisted securities.
Turnover control is crucial. My best-performing model has a daily turnover of only 8%. High-turnover strategies rarely survive transaction costs.
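For anyone who wants to measure this on their own weights, here is a minimal sketch. It assumes turnover is defined as half the sum of absolute weight changes per day (one-way turnover) and ignores weight drift from price moves. The scoring engine may define it differently, and the function name `daily_turnover` is just my own.

```python
import numpy as np

def daily_turnover(weights):
    """Average one-way daily turnover from a (T, n) array of portfolio weights.

    Assumption: turnover on day t is 0.5 * sum_i |w[t, i] - w[t-1, i]|.
    """
    weights = np.asarray(weights, dtype=float)
    # Absolute weight change per asset, summed across assets for each day
    changes = np.abs(np.diff(weights, axis=0)).sum(axis=1)
    return 0.5 * changes.mean()

# Toy example: two assets, three days of target weights
w = np.array([
    [0.50, 0.50],
    [0.46, 0.54],
    [0.50, 0.50],
])
turnover = daily_turnover(w)
print(f"average daily turnover: {turnover:.2%}")
```

Some platforms skip the 0.5 factor and report two-way turnover, so check which convention applies before comparing numbers.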
I disagree about the GARCH approach. In my experience, realized volatility estimators (like the Rogers-Satchell estimator) outperform parametric models.
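For reference, a minimal sketch of the Rogers-Satchell estimator, which is computed from open/high/low/close bars. The function name and the toy prices are my own, and the result is a per-bar (not annualized) volatility.

```python
import numpy as np

def rogers_satchell_vol(open_, high, low, close):
    """Per-bar Rogers-Satchell volatility from OHLC arrays of equal length."""
    o, h, l, c = (np.asarray(x, dtype=float) for x in (open_, high, low, close))
    # Per-bar variance term: ln(H/C) * ln(H/O) + ln(L/C) * ln(L/O)
    rs = np.log(h / c) * np.log(h / o) + np.log(l / c) * np.log(l / o)
    # Average the variance terms over the window, then take the square root
    return np.sqrt(rs.mean())

# Toy daily bars (made-up prices)
o = [100.0, 101.0, 102.0]
h = [102.0, 103.0, 103.0]
l = [99.0, 100.0, 100.0]
c = [101.0, 102.0, 101.0]
vol = rogers_satchell_vol(o, h, l, c)
print(f"daily RS volatility: {vol:.4f}")
```

Note that this estimator only uses intraday ranges, so it ignores overnight gaps between one close and the next open. To annualize daily bars you would typically multiply by the square root of the number of trading days per year (252 is a common assumption).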
Interesting thread! I've been exploring reinforcement learning for portfolio allocation. The challenge is defining the right reward function.
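To make the reward question concrete, here is one possible per-step reward, purely as a sketch: portfolio return minus a proportional trading cost, with an optional penalty on squared return as a crude risk proxy. The function name and the `cost_rate` and `risk_aversion` parameters are hypothetical, not something any particular RL library or this platform prescribes.

```python
import numpy as np

def step_reward(prev_w, new_w, asset_returns, cost_rate=0.001, risk_aversion=0.0):
    """Hypothetical per-step reward for a portfolio-allocation agent."""
    prev_w = np.asarray(prev_w, dtype=float)
    new_w = np.asarray(new_w, dtype=float)
    asset_returns = np.asarray(asset_returns, dtype=float)
    # Gross portfolio return over the step under the new weights
    gross = float(new_w @ asset_returns)
    # Proportional cost on the total absolute weight change
    cost = cost_rate * float(np.abs(new_w - prev_w).sum())
    # Optional penalty on squared return (assumption: crude variance proxy)
    return gross - cost - risk_aversion * gross ** 2

reward = step_reward([0.5, 0.5], [0.6, 0.4], [0.01, -0.005])
print(f"reward: {reward:.4f}")
```

The cost term ties back to the turnover point above: without it, an agent tends to learn high-turnover policies that look good before costs.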