Handling missing data in financial datasets
The competition dataset has ~3% missing values scattered throughout. How are people handling this?
I've tried:
- Forward fill (most common for financial data)
- Interpolation
- Setting missing = 0 (bad idea for returns, since a filled 0 looks like a genuine flat return)
- Dropping rows with missing values
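For anyone who wants to compare these options side by side, here is a minimal sketch in plain Python. It assumes a single column arrives as a list of floats with gaps marked as None or NaN; the competition's actual data format isn't stated here, and the helper names (is_missing, forward_fill, linear_interpolate, drop_missing) are made up for illustration. If you work in pandas, `ffill()` and `interpolate()` cover the first two.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption about the data encoding)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def forward_fill(values):
    """Replace each missing value with the most recent observed value."""
    filled = []
    last_seen = None
    for value in values:
        if is_missing(value):
            # Stays None when nothing has been observed yet (leading gap).
            filled.append(last_seen)
        else:
            last_seen = value
            filled.append(value)
    return filled


def linear_interpolate(values):
    """Fill interior gaps on a straight line between the nearest observed neighbours.

    Leading and trailing gaps are left untouched because they have only one neighbour.
    """
    out = list(values)
    known = [i for i, value in enumerate(out) if not is_missing(value)]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            out[i] = out[left] + weight * (out[right] - out[left])
    return out


def drop_missing(values):
    """Discard missing observations entirely."""
    return [value for value in values if not is_missing(value)]


# Hypothetical price series with an interior gap and a trailing gap.
prices = [100.0, None, None, 103.0, None]
ffilled = forward_fill(prices)
interpolated = linear_interpolate(prices)
dropped = drop_missing(prices)
```

One thing to keep in mind when choosing: forward fill only uses past observations, while interpolation needs the next observed value, so it pulls future information into the filled points. That matters if the filled series feeds a backtest.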
4 Replies
Interesting thread! I've been exploring reinforcement learning for portfolio allocation. The challenge is defining the right reward function.
I'd recommend reading "Quantitative Portfolio Management" by Michael Isichenko. It's the best practical guide I've found.
Great discussion! This is why I love this community - knowledge sharing makes everyone better.
The data quality in this competition is actually quite good compared to real-world datasets. In practice, you'd spend 60%+ of your time cleaning data.
For those wondering: yes, you can use external Python packages in submissions, but they must be in the approved list. No custom C extensions.