*Posted by Danny Tarlow*

So I'll skip all of the details and jump straight to showing you what the model has learned from this year's regular season. Below is a visualization of what happens when I ask the model to use two numbers to describe each team and then plot those two learned numbers as x and y coordinates on a standard plot.

These two numbers lose the easy interpretation as offensive and defensive strengths, but the model is built so that teams in similar locations on the plot will typically be predicted to perform similarly. To make the results easier to eyeball, I've color-coded the 1 through 4 seeds: #1 seeds are blue, #2s are green, #3s are red, and #4s are magenta.

I won't try too hard to explain what's going on, but it does seem to group the stronger teams in the lower and left parts of the plot, and the weaker teams in the upper and right parts. Anybody notice any other interesting patterns?
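One way to make the "lower-left is stronger" reading a little more concrete is to project every team onto the diagonal direction and rank teams by that single number, or to correlate the projection with a per-team statistic such as points per game. The sketch below does only that; `project`, `pearson`, the team names, and the coordinates are hypothetical placeholders, not the learned values behind the plot.

```python
import math

def project(coords, direction=(1.0, 1.0)):
    """Signed position of each team along a unit vector in the 2-D embedding."""
    norm = math.hypot(direction[0], direction[1])
    ux, uy = direction[0] / norm, direction[1] / norm
    return {team: x * ux + y * uy for team, (x, y) in coords.items()}

def pearson(a, b):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Hypothetical coordinates for illustration only, not results from the model.
coords = {"Team A": (-1.2, -0.8), "Team B": (0.3, 0.1), "Team C": (1.0, 1.4)}
along = project(coords)
# Smallest projection first, i.e. the lower-left ("stronger") end under the pattern above.
ranking = sorted(along, key=along.get)
```

Because the axes of a learned embedding like this carry no fixed meaning, the direction to test is itself a guess; trying a few directions, or correlating each axis separately, is a reasonable way to probe it.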

## 6 comments:

Interesting. Does this just use game scores to drive the ratings? Iona is the #2 scoring team in the nation and Georgetown is something like 250th, which suggests the diagonal is "scoring", but maybe I'm just seeing something that's not there.

Interesting! Based on the top and bottom teams in terms of points per game (which I'm looking at here: http://slice-publish.s3-website-us-east-1.amazonaws.com/rrqTvopyJGc/# ), that explanation does seem to fit.

And it does just use game scores as the supervision in a manner similar to my previous models, where the descriptors are modulated by the opposing team's descriptors.
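The post doesn't spell out the model, so here is only a hypothetical sketch of one way to get two numbers per team from game scores alone, with each team's descriptor modulated by its opponent's. The assumed form (mine, not necessarily the one used here) gives team i a 2-D vector x_i and predicts its points against team j as b + x_i^T W x_j, with a single 2x2 matrix W shared by all games, fit by stochastic gradient descent on squared error. The names `fit` and `predict` are illustrative.

```python
import random

def predict(model, i, j):
    """Predicted points for team i against team j: mu + sd * (b + x_i^T W x_j)."""
    x, W = model["x"], model["W"]
    bilinear = sum(x[i][a] * W[a][c] * x[j][c] for a in range(2) for c in range(2))
    return model["mu"] + model["sd"] * (model["b"] + bilinear)

def fit(games, n_teams, epochs=200, lr=0.05, reg=0.01, seed=0):
    """Fit one 2-D descriptor per team by SGD on squared error.

    games: list of (team_i, team_j, points_i, points_j), teams as ints 0..n_teams-1.
    """
    rng = random.Random(seed)
    # Each game yields two observations: i's points against j, and j's against i.
    obs = [(i, j, pi) for i, j, pi, _ in games] + [(j, i, pj) for i, j, _, pj in games]
    mu = sum(p for _, _, p in obs) / len(obs)
    sd = (sum((p - mu) ** 2 for _, _, p in obs) / len(obs)) ** 0.5 or 1.0
    obs = [(i, j, (p - mu) / sd) for i, j, p in obs]  # standardize targets for stable SGD
    model = {
        "x": [[rng.gauss(0, 0.1) for _ in range(2)] for _ in range(n_teams)],
        "W": [[rng.gauss(0, 0.1) for _ in range(2)] for _ in range(2)],
        "b": 0.0, "mu": mu, "sd": sd,
    }
    x, W = model["x"], model["W"]
    for _ in range(epochs):
        rng.shuffle(obs)
        for i, j, z in obs:
            err = model["b"] + sum(x[i][a] * W[a][c] * x[j][c]
                                   for a in range(2) for c in range(2)) - z
            xi, xj = x[i][:], x[j][:]  # copies so every gradient uses pre-update values
            Wxj = [sum(W[a][c] * xj[c] for c in range(2)) for a in range(2)]   # W x_j
            WTxi = [sum(W[a][c] * xi[a] for a in range(2)) for c in range(2)]  # W^T x_i
            # Gradient steps on 0.5 * err^2 plus an L2 penalty of strength reg.
            for a in range(2):
                x[i][a] -= lr * (err * Wxj[a] + reg * xi[a])
                x[j][a] -= lr * (err * WTxi[a] + reg * xj[a])
                for c in range(2):
                    W[a][c] -= lr * (err * xi[a] * xj[c] + reg * W[a][c])
            model["b"] -= lr * err
    return model
```

Scatter-plotting the rows of `model["x"]` gives a picture of the kind described above. Because the prediction is a smooth function of x_i, two teams close together on the plot get similar predicted scores against any opponent. The axes have no fixed meaning, though: rotating every x_i and counter-rotating W leaves every prediction (and the L2 penalty) unchanged, which is one reason the coordinates are harder to read than separate offense and defense ratings.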

What dataset did you use to create this?

The scores from all games this season, from the github repo:

https://github.com/dtarlow/Machine-March-Madness/blob/master/data/GameResults_201213.csv

Not sure if you can reveal your secrets here, but is this some form of k-SNE or t-SNE?

Nope, no SNE here.
