Market microstructure signals -- anyone using order book data?
Curious if anyone has incorporated order book data into their models. I've been looking at bid-ask spread dynamics and order flow imbalance as potential alpha sources.
The challenge is the data frequency -- most competition datasets are daily, but microstructure signals are intraday.
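One way to bridge the frequency gap is to compute the signals on intraday quotes and then collapse them to one value per day. Here is a minimal sketch, assuming you have top-of-book snapshots as (day, bid, ask, bid_size, ask_size) tuples. The function names and the tuple layout are made up for illustration, and the size imbalance is only a simple stand-in for a full order flow imbalance measure.

```python
from collections import defaultdict
from statistics import mean


def quote_features(bid, ask, bid_size, ask_size):
    """Return (relative_spread, size_imbalance) for one top-of-book snapshot."""
    mid = (bid + ask) / 2.0
    # Spread as a fraction of the mid price, so it is comparable across symbols.
    relative_spread = (ask - bid) / mid
    # +1 means all resting size is on the bid, -1 means all on the ask.
    size_imbalance = (bid_size - ask_size) / (bid_size + ask_size)
    return relative_spread, size_imbalance


def daily_features(snapshots):
    """Average intraday snapshot features per day.

    snapshots: iterable of (day, bid, ask, bid_size, ask_size) tuples
    (a hypothetical layout, adapt to your data).
    Returns {day: (mean_relative_spread, mean_size_imbalance)}.
    """
    by_day = defaultdict(list)
    for day, bid, ask, bid_size, ask_size in snapshots:
        by_day[day].append(quote_features(bid, ask, bid_size, ask_size))
    return {
        day: (mean(f[0] for f in feats), mean(f[1] for f in feats))
        for day, feats in by_day.items()
    }
```

A plain average throws away the intraday pattern, so it may be worth trying time-weighted or end-of-day-only variants as well.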
32 Replies
Good point about overfitting. My rule of thumb: never trust a backtest with fewer than 500 observations in the out-of-sample period.
One thing to watch out for: survivorship bias in the training data. Make sure you include delisted securities.
The data quality in this competition is actually quite good compared to real-world datasets. In practice, you'd spend 60%+ of your time cleaning data.
For those new to the platform: start with the tutorial competition. It has a smaller dataset and more forgiving scoring.
The documentation for the API is at /docs -- it's OpenAPI/Swagger format. Very helpful for understanding submission formats.
I disagree about the GARCH approach. In my experience, range-based volatility estimators (like the Rogers-Satchell estimator) outperform parametric models.
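For anyone who hasn't used it, here is a minimal sketch of the Rogers-Satchell estimator on daily OHLC bars. The per-bar term is ln(H/C)*ln(H/O) + ln(L/C)*ln(L/O), averaged over the window. The function name, the (open, high, low, close) tuple layout, and the 252-day annualization factor are my own assumptions, not anything from the competition data.

```python
import math


def rogers_satchell_vol(bars, periods_per_year=252):
    """Annualized Rogers-Satchell volatility from OHLC bars.

    bars: non-empty sequence of (open, high, low, close) tuples with
    positive prices (a hypothetical layout, adapt to your data).
    """
    terms = [
        # Each product is non-negative when low <= open, close <= high.
        math.log(h / c) * math.log(h / o) + math.log(l / c) * math.log(l / o)
        for o, h, l, c in bars
    ]
    variance = sum(terms) / len(terms)  # mean per-bar variance
    return math.sqrt(variance * periods_per_year)
```

It only uses within-bar prices, so overnight gaps between one close and the next open are not captured.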