Posted by Danny Tarlow

This is a guest post by Dr. Scott Turner, the force behind team "The Pain Machine," which was the co-winner of last year's Sweet 16 contest. In this post, he'll describe what to expect from his entry to this year's algorithmic March Madness prediction contest. If you are planning to enter and would like to contribute a guest post, please email Danny or Lee.
Dr. Turner has a Ph.D. in Artificial Intelligence from UCLA. His dissertation subject was a program called MINSTREL that told stories about King Arthur and his knights, as a way to explore issues in creativity and storytelling. Since obtaining his Ph.D. in 1993, Dr. Turner has worked for the Aerospace Corporation, where he advises the nation's space programs on software and systems engineering issues.
As a lifelong college basketball fan and a student of AI, I was intrigued last year when I saw Danny's call for participants in a tournament prediction contest. I put together a program and managed to tie Danny in the Sweet Sixteen bracket.
The program I wrote last year used a genetic algorithm to evolve a scoring equation based upon features such as RPI, strength of schedule, wins and losses, etc., and selected the equation that best reproduced the outcomes of the games in the training set. I felt the key to winning a tournament picking contest was guessing the upsets, so I added some features to the prediction model intended to identify and pick likely upsets.
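For readers who haven't seen the technique, here is a minimal sketch of how a genetic algorithm can evolve a scoring equation. It is not the Pain Machine's actual code: it assumes a simple linear equation, and the feature names (`rpi`, `sos`, `win_pct`), the population size, and the crossover and mutation scheme are all illustrative choices of my own for this example.

```python
import random

# Hypothetical feature names; the real program's feature set is not shown here.
FEATURES = ["rpi", "sos", "win_pct"]

def score(weights, team):
    """Assumed linear scoring equation: a weighted sum of a team's features."""
    return sum(w * team[f] for w, f in zip(weights, FEATURES))

def fitness(weights, games):
    """Fraction of training games in which the higher-scored team actually won."""
    correct = sum(
        (score(weights, a) > score(weights, b)) == a_won
        for a, b, a_won in games
    )
    return correct / len(games)

def evolve(games, pop_size=30, generations=40, seed=0):
    """Evolve weight vectors; games is a list of (team_a, team_b, a_won) tuples."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in FEATURES] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half of the population unchanged (elitism).
        population.sort(key=lambda w: fitness(w, games), reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            mom, dad = rng.sample(survivors, 2)
            # Uniform crossover: each weight comes from one parent or the other.
            child = [rng.choice(pair) for pair in zip(mom, dad)]
            # Mutation: nudge one randomly chosen weight.
            i = rng.randrange(len(child))
            child[i] += rng.gauss(0, 0.1)
            children.append(child)
        population = survivors + children
    return max(population, key=lambda w: fitness(w, games))
```

Calling `evolve(games)` on a list of past games returns the weight vector that best reproduces the training outcomes. Upset-specific features would simply be additional entries in `FEATURES`.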
In retrospect, the competition pool for this contest is so small that picking upsets is probably not as important as it is in (say) the full Yahoo pool, where you need to gamble more if you hope to distinguish yourself from the mass of consensus picks.
Since the contest last year I have continued to work on the Pain Machine. The focus of my effort has shifted more towards predicting the margins of regular season games. My latest models pick the winner of regular season games about 80% of the time, with an error of about 8 points in the predicted margin. Since 2/5, the Pain Machine has predicted 63% correctly against the spread (from Bodog.com) for selected games where it has identified a strong advantage. However, the sample size for that is very small, so the result may not be meaningful. Performance of the model against the 2009 and 2010 tournament games is similar (or better), although again the sample size is very small.
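To make those three numbers concrete, here is a rough sketch of how they could be computed. Several things in it are assumptions rather than details of the Pain Machine: it treats the "error" as a mean absolute error, expresses every margin and betting line from the home team's point of view, and uses a hypothetical `min_edge` threshold to stand in for "a strong advantage."

```python
def evaluate(games, min_edge=4.0):
    """Summarize predictions.

    games: list of (predicted_margin, actual_margin, line) tuples, all from
    the home team's point of view (positive means the home team wins or is
    favored). min_edge is a hypothetical threshold for placing a bet.
    """
    n = len(games)
    # Straight-up accuracy: did the model pick the right winner?
    straight_up = sum((p > 0) == (a > 0) for p, a, _ in games) / n
    # Mean absolute error of the predicted margin, in points (an assumption;
    # the error could equally be reported as a root-mean-square value).
    mae = sum(abs(p - a) for p, a, _ in games) / n
    # Bet only where the model disagrees with the line by at least min_edge,
    # and ignore pushes (actual margin exactly equal to the line).
    bets = [(p, a, l) for p, a, l in games if abs(p - l) >= min_edge and a != l]
    ats = sum((p > l) == (a > l) for p, a, l in bets) / len(bets) if bets else None
    return straight_up, mae, ats
```

With only a handful of qualifying bets, the against-the-spread figure can swing by many percentage points on a single game, which is why the small sample size matters.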
In the course of the last year I have identified several interesting (i.e., non-obvious) keys to understanding and predicting college basketball games. But perhaps the most striking realization has been the variability of the outcomes of college games. In Sokol's original paper on his LRMC method for predicting college basketball games, he compares pairwise home-and-home matchups and determines that a team needs to win at home by more than 21 points to have an even chance of beating the same team on the road! While I don't agree with the magnitude of that result, it's clear that what I would have evaluated as a "convincing" win -- say by 10 or 12 points -- is actually surprisingly weak evidence of the superiority of Team A over Team B. A case in point is Virginia Tech's home win over Duke late in the season. Virginia Tech is at best a bubble team and still managed to beat one of the best two or three teams in the country. Does this mean that Virginia Tech is significantly better than we thought? Or Duke significantly worse? Probably not. So it's hard to imagine a predictor that could consistently predict those sorts of outcomes, which makes the problem all the more interesting!
-- Scott Turner