Posted by Danny Tarlow

Those of you who read this blog regularly know that I like to play around each year predicting March Madness basketball scores. This last year, Lee got involved, and we ran the first annual March Madness Predictive Analytics Challenge, which by all measures was a great success.
Well, it's fun to run my model, but it's a pretty basic model. It doesn't use any information other than the scores of each game, so important things like when the game was played, whether it was a home or away game for each team, and various other pieces of side information are ignored. It's not that I don't think there's useful additional information, it's just tricky to figure out a good way to get it into the model.
So I'm quite pleased to report that some of my buddies from the Toronto machine learning group -- Ryan, George, and Iain -- had some great ideas. They wrote a paper about the ideas, which will be appearing at the upcoming conference, Uncertainty in Artificial Intelligence (UAI 2010). They're also releasing their data and code. The rough idea of the model is to train a different model for each context (which is given by the approximate date, who is home/away, and other side information), but to constrain the models with similar contexts to have similar parameters using Gaussian Process priors. As they say in the abstract:
"We propose a framework for incorporating side information by coupling together multiple PMF problems via Gaussian process priors. We replace scalar latent features with functions that vary over the covariate space. The GP priors on these functions require them to vary smoothly and share information. We apply this new method to predict the scores of professional basketball games, where side information about the venue and date of the game are relevant for the outcome."

It's a cool model and a really nice idea. If you followed the previous action related to March Madness, I encourage you to take a look. And of course, it's never too early to be thinking about your entry for the 2011 March Madness Predictive Analytics Challenge!
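
For anyone who wants a feel for the "latent features as functions" idea from the abstract, here is a rough NumPy sketch. To be clear, this is not the authors' code and it does no fitting at all: it only draws team features from a Gaussian process prior over game dates, so that a team's features on nearby days come out similar. The squared-exponential kernel, the length scale, the helper names (`rbf_kernel`, `sample_latent_functions`, `predicted_score`), and the "offense dot defense" score are all assumptions I made up for illustration.

```python
import numpy as np


def rbf_kernel(t, length_scale=10.0, variance=1.0):
    """Squared-exponential covariance between all pairs of dates in t (assumed kernel)."""
    diff = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def sample_latent_functions(t, n_teams, n_features, rng, length_scale=10.0):
    """Draw one smooth function of time per (team, feature) from a GP prior.

    Returns an array of shape (n_teams, n_features, len(t)).
    """
    # Small jitter on the diagonal keeps the Cholesky factorization stable.
    K = rbf_kernel(t, length_scale) + 1e-6 * np.eye(len(t))
    L = np.linalg.cholesky(K)
    z = rng.standard_normal((n_teams, n_features, len(t)))
    # If z ~ N(0, I), then z @ L.T has covariance L @ L.T = K along the time axis.
    return z @ L.T


rng = np.random.default_rng(0)
days = np.arange(100, dtype=float)  # hypothetical 100-day season

# Each team gets time-varying "offense" and "defense" feature vectors.
offense = sample_latent_functions(days, n_teams=4, n_features=3, rng=rng)
defense = sample_latent_functions(days, n_teams=4, n_features=3, rng=rng)


def predicted_score(i, j, day_idx):
    """Hypothetical score for team i against team j on a given day:
    the inner product of i's offense and j's defense features that day."""
    return float(offense[i, :, day_idx] @ defense[j, :, day_idx])


# Features on adjacent days should differ much less than features 50 days apart.
adjacent_gap = np.abs(offense[:, :, 1:] - offense[:, :, :-1]).mean()
distant_gap = np.abs(offense[:, :, 50:] - offense[:, :, :-50]).mean()
```

Compare that with plain PMF, where each team would get a single fixed feature vector for the whole season. Here the prior lets the features drift, but only smoothly, which is how games played around the same time end up sharing information. Other side information, like who is home and who is away, would enter as extra covariates, which this sketch leaves out.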