Feature importance: Shapley values for financial models
SHAP values provide a principled way to understand feature importance in ML models.
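import shap

# Assumes `model` is a fitted tree-based model and `X_test` is the held-out feature matrix.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
# Summary plot: features are ordered by mean |SHAP| across the test set.
shap.summary_plot(shap_values, X_test)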
Key findings from my analysis:
- Momentum features (5d, 21d returns) have the highest mean absolute SHAP values
- Volatility features are most important during regime transitions
- Calendar features matter more than I expected (quarter-end effects)
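For anyone who wants the ranking as numbers rather than a plot, here is a minimal sketch of the usual aggregation: the mean absolute SHAP value per feature. It assumes a single-output model, so that the SHAP values form one samples-by-features matrix. The function name, the feature names, and the toy numbers are made up for illustration and are not results from the analysis above.

```python
def rank_by_mean_abs_shap(shap_values, feature_names):
    """Rank features by mean |SHAP| across samples.

    shap_values: 2D sequence (rows = samples, columns = features).
    Returns a list of (feature_name, importance), most important first.
    """
    n_samples = len(shap_values)
    # zip(*rows) walks the matrix column by column, i.e. feature by feature.
    importance = [
        sum(abs(v) for v in column) / n_samples
        for column in zip(*shap_values)
    ]
    return sorted(zip(feature_names, importance), key=lambda p: p[1], reverse=True)


# Hypothetical values, purely for illustration.
toy_shap = [
    [0.30, -0.10, 0.05],
    [-0.20, 0.40, 0.00],
    [0.25, -0.05, -0.10],
]
toy_names = ["ret_5d", "vol_21d", "quarter_end"]
ranking = rank_by_mean_abs_shap(toy_shap, toy_names)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

Taking the absolute value first matters: a feature that pushes predictions strongly up on some days and strongly down on others would otherwise average out to roughly zero.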
9 Replies
I've found that sector neutrality is a key factor in the scoring. Strategies that are long one sector and short another tend to underperform.
Thanks for sharing! This is exactly the kind of insight that helps the community grow. Bookmarking this thread.
Has anyone tried using attention mechanisms for this? The temporal attention weights could tell you which historical periods are most relevant.
Anyone else noticing that momentum factors have been working particularly well in the last month of competition data?
For those wondering: yes, you can use external Python packages in submissions, but they must be in the approved list. No custom C extensions.
One more thing: the scoring engine uses a held-out test period that you never see. So your validation score is the best estimate of performance you can get.
The biggest mistake I see newcomers make: optimizing for the wrong metric. The highest Sharpe ratio does not necessarily mean the best trading strategy. Also consider Calmar, Sortino, and max drawdown.
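To make the comparison concrete, here is a rough sketch of the four metrics computed from a series of periodic simple returns. It assumes a zero risk-free rate, 252 periods per year, and the common textbook definitions. The competition's scoring engine may define them differently, and all function names are my own.

```python
import math
import statistics


def sharpe(returns, periods_per_year=252):
    """Annualized mean return over sample standard deviation (risk-free rate assumed 0)."""
    return statistics.mean(returns) / statistics.stdev(returns) * math.sqrt(periods_per_year)


def sortino(returns, periods_per_year=252, target=0.0):
    """Like Sharpe, but only returns below `target` count as risk."""
    # Downside deviation: root mean square of shortfalls, averaged over all periods.
    downside = math.sqrt(sum(min(r - target, 0.0) ** 2 for r in returns) / len(returns))
    return (statistics.mean(returns) - target) / downside * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def calmar(returns, periods_per_year=252):
    """Annualized compound return divided by maximum drawdown."""
    growth = math.prod(1.0 + r for r in returns)
    annualized = growth ** (periods_per_year / len(returns)) - 1.0
    return annualized / max_drawdown(returns)


# Hypothetical daily returns, for illustration only.
toy_returns = [0.01, -0.02, 0.015, 0.005]
print(sharpe(toy_returns), sortino(toy_returns), max_drawdown(toy_returns), calmar(toy_returns))
```

Note that `sortino` and `calmar` divide by zero when the series has no losing period or no drawdown, so a real implementation would need to guard those cases.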
Great analysis! I've been using a similar approach with rolling z-scores and it's been working well for mean reversion signals.
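In case it helps anyone, this is roughly what a rolling z-score looks like. It is a minimal sketch, not necessarily the exact setup described above: each point is scored against the `window` observations before it, so the current value never leaks into its own mean and standard deviation. The function name and the sample prices are hypothetical.

```python
import statistics


def rolling_zscore(values, window):
    """z-score of each point against the `window` observations that precede it."""
    scores = []
    for i in range(len(values)):
        if i < window:
            scores.append(None)  # not enough history yet
            continue
        history = values[i - window:i]
        mu = statistics.mean(history)
        sigma = statistics.stdev(history)
        # A flat history has zero dispersion, so the z-score is undefined.
        scores.append((values[i] - mu) / sigma if sigma > 0 else None)
    return scores


# Hypothetical price series: the last point jumps well above its recent history.
toy_prices = [1.0, 2.0, 3.0, 4.0, 10.0]
print(rolling_zscore(toy_prices, window=4))
```

For a mean reversion signal, a large positive score would typically be read as "stretched above recent levels" and a large negative one as the opposite. The threshold is a tuning choice.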
For factor models, I'd strongly recommend the Fama-French 5-factor model as a starting point. It captures most systematic risk.
I disagree about the GARCH approach. In my experience, realized volatility estimators (like the Rogers-Satchell estimator) outperform parametric models.
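For reference, the Rogers-Satchell estimator uses each bar's open, high, low, and close: the per-bar variance term is ln(H/C)·ln(H/O) + ln(L/C)·ln(L/O), averaged over the window. Below is a small sketch under the assumption of daily bars and 252 trading days per year. The function name and the sample bars are illustrative only.

```python
import math


def rogers_satchell_vol(opens, highs, lows, closes, periods_per_year=252):
    """Annualized Rogers-Satchell volatility from OHLC bars."""
    terms = [
        math.log(h / c) * math.log(h / o) + math.log(l / c) * math.log(l / o)
        for o, h, l, c in zip(opens, highs, lows, closes)
    ]
    # Each term is non-negative whenever high >= max(open, close) and low <= min(open, close).
    variance = sum(terms) / len(terms)
    return math.sqrt(variance * periods_per_year)


# Hypothetical bars, for illustration only.
toy_open = [100.0, 101.0, 99.5]
toy_high = [102.0, 101.5, 101.0]
toy_low = [99.0, 98.5, 99.0]
toy_close = [101.0, 99.5, 100.5]
print(rogers_satchell_vol(toy_open, toy_high, toy_low, toy_close))
```

Unlike a close-to-close estimator, this one uses the intraday range and does not assume zero drift, but it ignores overnight gaps between one bar's close and the next bar's open.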