Monday, March 21, 2011

After Round 2: Points from Upsets

Posted by Danny Tarlow
You can see the current standings here. At the moment, 3 of the top 5 entries (including the top 2) used some sort of human judgement in their algorithm: picking based on Higher Seed (tied for 1st) relies on the judgement of the seeding committee; the Nate Silver baseline (tied for 1st) uses seed information along with several other power ratings, some human-based and some computer-based; and the Lee bracket (tied for 4th) was filled out by Lee with no computer assistance. The top two computer entries are the TrueSkill algorithm (3rd) that Scott Turner implemented and the Delete Kernel entry (tied for 4th) from Kevin, who built it on the simplest 1D probabilistic matrix factorization model that I wrote about previously (and released code for).

I think the take-away at this point is that the real winner right now is whoever decided on the seeding of teams. The Higher Seed bracket is tied for first place, and the strength of the other brackets mostly comes from how closely their picks matched the higher seed. Yes, there have been a lot of upsets, including some big ones, and the entrants did pick some upsets, but they generally didn't pick the right ones.

Here's an alternative way of looking at the results that reveals this. I took the point total for each contestant and split off the contribution that came from picking upsets. For each of the two rounds, I report "A/B", where B is the total number of points the entrant earned in the Yahoo bracket and A is the number of those points that came from predicting upsets. I define "upset points" to be points gained from a correct prediction where the winning team is not the best-ranked seed that could possibly have made it to that point. So even though Richmond (12) was the favorite over Morehead (13), predicting Richmond making it to the Sweet 16 would give a contestant 2 "upset points", because the best-ranked seed that could have made it to that point was Louisville (4). Here are the results:
Points from upsets

TEAM                   R1      R2     Total
Delete Kernel:        4/23    2/20    6/43
InItToWinIt:          2/22    2/20    4/42
Human (Lee):          1/25    2/18    3/43
Point Differential:   3/23    0/16    3/39
Silver:               3/25    0/20    3/45
LRMC:                 3/25    0/16    3/41
Dirknbr1:             0/23    2/8     2/31
Danny:                2/22    0/16    2/38
TrueSkill:            2/26    0/18    2/44
The Pain Machine:     0/19    0/18    0/37
Higher Seed:          0/25    0/20    0/45

Under this evaluation measure (which is admittedly not what anybody was trying to optimize), the completely computer-based models are doing better. Perhaps the real take-away at the moment, though, is that predicting upsets is hard!
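
For anyone who wants to reproduce the "upset points" split, here is a rough sketch of the definition in Python. It is not the script I used for the table. It assumes the standard seed layout within a 16-team region (the REGION_ORDER list) and a scoring rule of 1 point per correct first-round pick, doubling each round, which matches the 2 points for the Richmond example. The function names are made up for illustration, and the sketch only covers the four rounds played inside a region.

```python
# Seeds listed in bracket order within one 16-team region.
# ASSUMPTION: the standard layout (1 v 16, 8 v 9, 5 v 12, 4 v 13, ...).
REGION_ORDER = [1, 16, 8, 9, 5, 12, 4, 13, 6, 11, 3, 14, 7, 10, 2, 15]


def best_possible_seed(seed, round_number):
    """Best-ranked seed that could win the round `round_number` game
    in the part of the bracket that `seed` sits in."""
    if not 1 <= round_number <= 4:
        raise ValueError("this sketch only covers rounds 1-4 within a region")
    # A game in round r is fed by a block of 2**r adjacent bracket positions.
    block_size = 2 ** round_number
    position = REGION_ORDER.index(seed)
    start = (position // block_size) * block_size
    return min(REGION_ORDER[start:start + block_size])


def points_for_round(round_number):
    # ASSUMPTION: 1 point in round 1, doubling each round after that.
    return 2 ** (round_number - 1)


def upset_points(seed, round_number, pick_was_correct=True):
    """Points from a pick that count as 'upset points' under the definition:
    the pick was correct, and the winner is not the best-ranked seed that
    could have reached that point."""
    if not pick_was_correct:
        return 0
    if seed == best_possible_seed(seed, round_number):
        return 0
    return points_for_round(round_number)


# Richmond (12) winning its round 2 game to reach the Sweet 16:
# the best-ranked seed in that part of the bracket is Louisville (4).
print(best_possible_seed(12, 2))  # 4
print(upset_points(12, 2))        # 2
```

Summing upset_points over an entrant's correct picks in a round gives the A in "A/B"; summing points_for_round over the same picks gives B.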


Scott Turner said...

The Higher Seed entry only looks good compared to our small universe of picks. I can't seem to find where that entry stands in the overall contest (you can probably see that), but the overall leaders have 42 of 48 possible points, so at a guess Higher Seed is probably around 50%.

Danny Tarlow said...

Actually, I'm able to see the percentile ranking for the Nate Silver Baseline that I entered, which is tied with Higher Seed, and it's in the 96th percentile overall. TrueSkill is in the 91st percentile.