Posted by Danny Tarlow

The San Francisco weather has been pretty nice the last couple of weekends, so some friends and I decided to go check out the horse races at Golden Gate Fields over in Berkeley. We were there more to watch and have fun than to bet, but I couldn't help thinking about modeling the races. There was certainly no shortage of strategies that people swore by: look for a shiny coat and perked-up ears; pick horses with lighter coats in warmer weather; pick the favorite in the short races; and plenty of others.
Less anecdotally, there seems to be a bit of agreement on the factors that go into a horse's performance:
I wonder how much of this is relevant if you have some data about the horse's past performance. In the Netflix challenge, it is my informal understanding that information from IMDb about actors, genres, directors, etc. isn't terribly useful in improving recommendations, because most of a user's preferences towards these rough categories are already captured in their ratings profile. The argument could probably be made that horses race far less often than Netflix users rate movies, so the past-performance data carries less information and the extra factors might matter more, but it's hard to say whether this would make a difference in model performance without doing some analysis on real data.
I haven't had a chance to go through all of these links, but there does appear to be a lot of data out there. The problem is that none of the sources seem to be centralized, free, and in an easily accessible format:
I think the best bet would probably be to focus on an individual track to start. My first choice would be to find some historical data from Golden Gate Fields, but it looks like other tracks have better data available. For example, the Santa Anita data looks decent:
With a bit of help from Mechanical Turk, maybe it would be possible to put together a reasonable data set.
It might be fun to play around with this a bit more. It wouldn't be hard to build a model similar to my March Madness predictions, though who knows how well it would work in practice. Regardless, doing something more rigorous is probably better than blindly betting on the horse that is named after my grandfather.
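To make "a model" a little more concrete, here is a minimal sketch of one possible starting point. It is an assumption for illustration, not necessarily how the March Madness predictions work: each horse gets a single strength score, and finishing orders are explained by a Plackett-Luce ranking model fit with plain gradient ascent. The function names, the learning rate and regularization values, and the four toy races are all made up; no real race results are used.

```python
import math

def fit_strengths(races, n_horses, steps=500, lr=0.1, l2=0.01):
    """Fit one strength score per horse from past finishing orders.

    races: list of finishing orders, each a list of horse ids, winner first.
    Uses gradient ascent on the L2-regularized Plackett-Luce log-likelihood.
    """
    s = [0.0] * n_horses
    for _ in range(steps):
        # Gradient of the L2 penalty, which keeps scores from drifting off.
        grad = [-l2 * v for v in s]
        for order in races:
            # At stage k, the horse that finished k-th is "chosen" from
            # the horses that had not yet finished.
            for k in range(len(order) - 1):
                remaining = order[k:]
                weights = [math.exp(s[h]) for h in remaining]
                total = sum(weights)
                grad[order[k]] += 1.0
                for h, w in zip(remaining, weights):
                    grad[h] -= w / total
        s = [v + lr * g for v, g in zip(s, grad)]
    return s

def win_probabilities(s, field):
    """Probability that each horse in `field` wins, given strengths s."""
    weights = [math.exp(s[h]) for h in field]
    total = sum(weights)
    return {h: w / total for h, w in zip(field, weights)}

# Hypothetical toy data: three horses (ids 0, 1, 2) and four past races.
toy_races = [[0, 1, 2], [0, 2, 1], [1, 0, 2], [0, 1, 2]]
strengths = fit_strengths(toy_races, n_horses=3)
probs = win_probabilities(strengths, field=[0, 1, 2])
```

With so few races per horse, the regularization term does a lot of the work, which is exactly where the extra factors (jockey, track condition, and so on) might earn their keep as additional features.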