Posted by Danny Tarlow

I went with a new approach to Machine March Madness predictions this year. I won't go into the details right now, but here's a neat visualization that comes out of the algorithm. What you need to know is that I'm sticking with the basic original idea of using latent real-valued descriptors for each team, but I'm dropping the requirement that each team have separate offensive and defensive descriptors. Instead, this year's model represents each team with a single set of numbers that is used to explain both its offensive and its defensive performance.
So I'll skip all of the details and jump straight to showing you what the model has learned from this year's regular season. Below is what happens when I ask the model to describe each team with two numbers and then plot the learned numbers as x and y coordinates on a standard plot.
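The post deliberately skips the model details, so what follows is only a hypothetical sketch of the general idea, not the model that produced the plot. It assumes each team t gets one latent vector u[t] (two numbers when K = 2) and that the points team i scores against team j are predicted as bias + u[i]ᵀ A u[j], where A is a K × K matrix shared by all teams. Because A need not be symmetric, the same vector can act "offensively" on the left of A and "defensively" on the right. The bilinear form, the squared-error loss, plain full-batch gradient descent, and the names `predict`, `objective`, and `fit` are all assumptions made for illustration.

```python
import random

# HYPOTHETICAL sketch, not the author's model. Assumption: one shared latent
# vector u[t] per team, and points scored by i against j are predicted as
#     bias + u[i]^T A u[j]
# with A a K x K matrix shared by all teams.


def predict(u, A, bias, i, j):
    """Predicted points team i scores against team j (assumed bilinear form)."""
    K = len(A)
    return bias + sum(u[i][a] * A[a][b] * u[j][b]
                      for a in range(K) for b in range(K))


def objective(games, u, A, bias, reg):
    """0.5 * mean squared error plus an L2 penalty on u and A."""
    mse = sum((predict(u, A, bias, i, j) - p) ** 2
              for i, j, p in games) / len(games)
    penalty = (sum(x * x for vec in u.values() for x in vec)
               + sum(x * x for row in A for x in row))
    return 0.5 * mse + 0.5 * reg * penalty


def fit(games, teams, K=2, lr=0.01, epochs=500, reg=1e-3, seed=0):
    """Full-batch gradient descent on `objective`.

    games: list of (i, j, points), meaning team i scored `points` against j.
    Returns (u, A, bias, history); history holds the objective before the
    first update and after every epoch.
    """
    rng = random.Random(seed)
    u = {t: [rng.gauss(0.0, 0.3) for _ in range(K)] for t in teams}
    A = [[rng.gauss(0.0, 0.3) for _ in range(K)] for _ in range(K)]
    bias = sum(p for _, _, p in games) / len(games)  # start at the mean score
    n = len(games)
    history = [objective(games, u, A, bias, reg)]
    for _ in range(epochs):
        # Start from the L2-penalty gradients, then add the data terms.
        gu = {t: [reg * x for x in u[t]] for t in teams}
        gA = [[reg * A[a][b] for b in range(K)] for a in range(K)]
        gb = 0.0
        for i, j, p in games:
            err = (predict(u, A, bias, i, j) - p) / n
            gb += err
            for a in range(K):
                for b in range(K):
                    gu[i][a] += err * A[a][b] * u[j][b]  # d pred / d u[i][a]
                    gu[j][b] += err * u[i][a] * A[a][b]  # d pred / d u[j][b]
                    gA[a][b] += err * u[i][a] * u[j][b]  # d pred / d A[a][b]
        # Apply all updates only after every gradient has been accumulated.
        for t in teams:
            u[t] = [x - lr * g for x, g in zip(u[t], gu[t])]
        A = [[A[a][b] - lr * gA[a][b] for b in range(K)] for a in range(K)]
        bias -= lr * gb
        history.append(objective(games, u, A, bias, reg))
    return u, A, bias, history


if __name__ == "__main__":
    # Toy, randomly generated scores purely for illustration (not real games).
    rng = random.Random(42)
    teams = ["Team A", "Team B", "Team C", "Team D"]
    games = [(i, j, 70 + rng.gauss(0, 8))
             for i in teams for j in teams if i != j]
    u, A, bias, history = fit(games, teams, K=2)
    for t in teams:
        x, y = u[t]
        print(f"{t}: x={x:.3f}, y={y:.3f}")
    print(f"objective: {history[0]:.3f} -> {history[-1]:.3f}")
```

With K = 2, each returned u[t] is an (x, y) point that could be passed to a scatter plot (for example matplotlib's `scatter`) and colored by seed. In this sketch the coordinates are only determined up to an invertible linear transformation of u that A can absorb (the L2 penalty narrows this only partly), which is one reason the individual axes need not mean anything on their own.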
These results lose the easy interpretation as offensive and defensive strengths, but the model is built so that teams in similar locations on the plot will typically be predicted to perform similarly. To make the results easier to eyeball, I've color-coded the 1 through 4 seeds: #1 seeds are blue, #2 seeds are green, #3 seeds are red, and #4 seeds are magenta.
I won't try too hard to explain what's going on, but it does seem to group the stronger teams in the lower and left parts of the plot, and the weaker teams in the upper and right parts. Anybody notice any other interesting patterns?