Wednesday, March 21, 2012

Round 2 Update + Upset Analysis

Posted by Danny Tarlow
Here's another great guest post from Scott Turner, our #1 Machine March Madness guest poster. Great analysis -- thanks Scott! If you want more where this came from, check out his blog.

On my blog here I took a closer look at how the Pain Machine predicts upsets in the tournament and how effective it was this year.  I thought it might be interesting to look at how the top competitors in the Machine Madness contest predicted upsets.  I put together the following table with the competitors across the top and an X in every cell where they predicted an upset.  Boxes are green for correct predictions and red for incorrect predictions.  The final row(s) in the table shows the scores & possible scores for each competitors.

Game Pain Machine Predict the Madness Sentinel Danny's Conservative Picks AJ's Madness Matrix Factorizer
Texas over Cincy X X X X X
Texas over FSU X X
WVU over Gonzaga X X X
Purdue over St. Mary's X X X X X
NC State over SDSU X
South Florida over Temple X X
New Mexico over Louisville X X
Virginia over Florida X
Colorado State over Murray State X
Vandy over Wisconsin X
Wichita State over Indiana X
Murray State over Marquette X X
Upset Prediction Rate 43% 25% 33% 0% 25% 29%
Current Score 42 43 42 41 41 39
Possible Points 166 155 166 161 137 163


(I'm not counting #9 over #8 as an upset. That's why Danny has only 41 points; he predicted a #9 over #8 upsets that did not happen.)

So what do you think?

One thing that jumps out immediately is that the competitors predicted many more upsets this year than in past years.  Historically we'd expect around 7-8 upsets in the first two rounds.  Last year the average number of upsets was about 2 (discounting the Pain Machine and LMRC).  The Pain Machine is forced to predict this many, but this year the Matrix Factorizer also predicts 7, and Predict the Madness and AJ's Madness predict 4.  From what I can glean from the model descriptions, none of these models (other than the Pain Machine) force a certain level of upsets. 

Monte's model ("Predict the Madness") seems to use only statistical inputs, and not any strength measures, or strength of competition measures.  This sort of model will value statistics over strength of schedule, and so you might see it making upset picks that would not agree with the team strengths (as proxied by seeds).

The Sentinel uses a Monte Carlo type method to predict games, so rather than always produce the most likely result, it only most likely to produce the most likely result.  (If that makes sense :-)  The model can be tweaked by choosing how long to run the Monte Carlo simulation.  With a setting of 50 it seems to produce about half the expected number of upsets.

Danny's Dangerous Picks are anything but; it is by far the most conservative of the competitors.  The pick of Murray State over Marquette suggests that Danny's asymmetric loss function component might have led to his model undervaluing strength of schedule.

AJ's Madness model seems to employ a number of hand-tuned weights for different components of the prediction formula.  That may account for the prediction upsets, including the somewhat surprising CSU over Murray State prediction.

The Matrix Factorizer has two features that might lead to a high upset rate.  First, there's an asymmetric reward for getting a correct pick, which might skew towards upsets.  Secondly, Jasper optimized his model parameters based upon the results of previous tournaments, so that presumably built in a bias towards making some upset picks.

What's interesting about the actual upsets?

First, Texas over Cincy and Purdue over St. Mary's were consensus picks (excepting Danny's Conservative Picks).   This suggests that these teams really were mis-seeded.  Purdue vs. St. Mary's is the classic trap seeding problem for humans -- St. Mary's has a much better record, but faced much weaker competition.  Texas came very close to beating Cincinnati -- they shot 16% in the first half and still tied the game up late -- which would have made the predictors 2-0 on consensus picks.

Second, the predictors agreed on few of the other picks.  Three predictors liked WVU over Gonzaga, and the Pain Machine and the Matrix Factorizer agreed on two other games.  Murray State over Marquette is an interesting pick -- another classic trap pick for a predictor that undervalues strength of schedule -- and both Danny's predictor and the Matrix Factorizer "fell" for this pick.

So how did the predictors do?

The Pain Machine was by far the best, getting 43% of its upset predictions correct.  Sentinel was next at 33%.  Perhaps not coincidentally, these two predictors have the most possible points remaining.

In terms of scoring, the Baseline is ahead of all the predictors, so none came out ahead (so far) due to their predictions.  The PM and Sentinel do have a slight edge in possible points remaining over the Baseline.

So who will win?

The contest winner will probably come down to predicting the final game correctly.  There's a more interesting spread of champion predictions than I expected -- particularly given the statistical dominance of Kentucky. 

If Kentucky wins, the likely winner will be the Baseline or Danny.  If Kansas wins, the Pain Machine will likely win unless Wisconsin makes it to the Final Four, in which case AJ should win.  If Michigan State wins, then the Sentinel will likely win.  And finally, if Ohio State wins, then Predict the Madness should win.

2 comments:

mmcnair said...

Just posted this on your blog, Scott, but figured I'd comment here as well...

Hey Scott, I actually do account for strength of schedule, but I do it by component. Sorry for being unclear but here is the line: "For the variables, I use the location of the game, metrics for the team's offense and defense, and metrics of the team's opponents' averages for both offense and defense."

The last part ("metrics of the team's opponents' averages") is the "strength of schedule" part.

mmcnair said...

Also, I'm not sure why people force upsets in their predictions. Even if you know there's a certain number of upsets (which we don't), you're still lowering your expected points. This is similar to people refusing to pick all four #1s to make the Final Four, even if they are all the most likely to win their region.

Is it simply to help differentiate your picks? If so, I'd think there are better ways to do that (but I'm not sure).