Friday, March 18, 2011

Early Analysis of Algorithmic March Madness Entries

Posted by Danny Tarlow
Scott Turner has been analyzing the entries in our 2011 March Madness Predictive Analytics Challenge, and he has some interesting insights and comments. I'm posting them verbatim here.

In the last 10 years, the upset rate in the first round has been 22%. (Ignoring 8-9 matchups.) Looking at the other 6 entries in the contest (the ones I think are entries, anyway), they predict 16 upsets -- about 8%. Of the baselines, only LMRC comes close with 5 upsets (15%).

For my entry, I forced 15% upset picks over the entire tournament (~9). (15% is about the overall upset rate in the tournament for the past 10 years.) So I have 4 in the first round and 5 in the later rounds.

Also, everyone has a #1 seed winning it all, but no one has Pittsburgh.

(I'm assuming that Point Diff, InitToWinIt, DukeRepeats, Danny's, dirknbr1 and mine are the competitors.)

Ignoring upset picks where there's only a 1 seed differential (e.g., 9 over an 8), these are the upset picks across all the competitors:

Qty     Upset   Delta   Details
[1]     3 > 1   2       (BYU > Pitt)
[2]     6 > 3   3       (Xav > Syr, SJU > BYU)
[3]     7 > 2   5       (Was > UNC, A&M > ND, UCLA > Fla)
[1]     9 > 1   8       (ODU > Pitt)
[4]     10 > 7  3       (FSU > A&M, UGA > Was, PSU > Temple, MSU > UCLA)
[1]     10 > 2  8       (UGA > UNC)
[3]     11 > 6  5       (Missou > Cinc, Gonz > SJU x 2)
[1]     12 > 9  3       (USU > ODU)
[4]     12 > 5  7       (USU > KSU x 4)
[2]     12 > 4  8       (USU > Wis x 2)
[2]     12 > 3  8       (USU > BYU x 2)
[2]     12 > 2  12      (USU > Pitt, USU > Kan)
[2]     13 > 4  11      (Bel > Wis x 2)
[1]     14 > 6  8       (St. Pete > Georgetown)
[1]     14 > 3  11      (St. Pete > Purdue)
which, if I add correctly, is 30 upset picks. The upset rate over the last ten years (again ignoring 1-seed differentials) is 119/519 = ~23%. By that metric these brackets ought to show 86 (!) upsets -- almost 3x as many. (The gap is even worse if you take out my bracket, where I tried to force the right upset rate, but even my bracket should show more upsets.)
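Scott's expected-upset arithmetic can be sanity-checked in a few lines. This is only a sketch: the 6-brackets-times-63-games pick count is my assumption, and the true eligible-game count (after dropping 1-seed-differential matchups) would be somewhat lower.

```python
# Rough check of the expected-upset count above.
# Assumption: 6 competitor brackets x 63 games = 378 total picks.
historical_rate = 119 / 519      # ~23% upset rate over the last 10 years
total_picks = 6 * 63             # assumed pick count across all brackets
expected_upsets = historical_rate * total_picks
print(int(expected_upsets))      # ~86, vs. the 30 upsets actually picked
```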

I think there's an interesting discussion to be had over whether a computer predictor ought to pick the most likely outcome on a game-by-game basis. Doing so leads to mostly "chalk" picks -- the only time you'll go against the seeding is when the seeding is wrong (according to your model). Imagine a tournament where in every game the better team has a 51% chance of winning. The hypothetical predictor will pick the favorite in every game -- and miss almost half of them. Is that the best strategy?
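The thought experiment above is easy to simulate. A minimal Monte Carlo sketch, where the 63-game bracket size and trial count are arbitrary choices of mine:

```python
import random

random.seed(0)
P_FAVORITE = 0.51        # the better team's win probability in every game
N_GAMES = 63 * 10_000    # 10,000 simulated 63-game brackets

# The predictor always picks the favorite, so it is correct exactly
# when the favorite wins.
correct = sum(random.random() < P_FAVORITE for _ in range(N_GAMES))
print(f"accuracy: {correct / N_GAMES:.3f}")  # ~0.510: misses almost half
```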

It's also interesting to note that USU accounts for a full third of the upset picks. If we assume that most of these models only pick upsets when they think there's been an actual seeding mistake, then USU was the only "consensus" mis-seeding. That's an interesting result in itself: it suggests that the committee does a good job seeding (at least in some sense) and/or that these predictors don't do a good job of discerning "hidden" information about the teams.

Of the 30 upset predictions, after the first day we know that 2 are right (the Gonz > SJU picks), 13 are wrong, and the rest are undecided. That doesn't look like such a good record for the predictors :-). The correct upset pick was a delta of 5, which is a pretty big upset.

I can't see right now whether the ESPN Tourney Challenge shows statistics like how many people picked particular upsets, but it would be interesting to compare the competitors' upset picks to how the general public picked.

-- Scott Turner


Danny Tarlow said...

> The hypothetical predictor will pick the favorites in every game -- and miss almost half the games. Is that the best strategy?

I think this is a great question. To answer it, we need to precisely define what "best" is, though, right? In your example where the favorite always has a 51% chance of winning, if the goal is to choose the bracket that is most likely to be 100% correct, then always choosing the favorite is clearly the best choice.

I wonder what the optimal strategy would be if your goal was, say, to optimize some sort of precision-recall score on the upsets that you correctly predict. Or maybe something like an intersection over union score: |correctly picked upsets| / |picked upset OR upset occurred|. If you correctly pick all upsets, you get a score of 1; if you pick no upsets, you get a score of 0.
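The intersection-over-union score could look something like the sketch below. The function name, the game identifiers, and the 1.0 convention for the no-upsets-anywhere case are my own choices, not part of the contest rules.

```python
def upset_iou(picked: set, occurred: set) -> float:
    """|correctly picked upsets| / |picked upsets OR upsets that occurred|."""
    union = picked | occurred
    if not union:
        return 1.0  # assumed convention: no upsets picked and none occurred
    return len(picked & occurred) / len(union)

# Two upsets occurred; we correctly picked one and raised no false alarms.
print(upset_iou({"game1"}, {"game1", "game2"}))  # 0.5
```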

Scott Turner said...

I've struggled a little bit with this question myself since last year.

If you're in a big contest (like Yahoo or ESPN) and your goal is to finish in (say) the top 5%, then clearly making chalk picks is not going to do it -- even if you feel that the higher seed is the better team in every matchup.

If you're in a small contest (like ours) with the goal of winning, then your best strategy might well be to make all or almost all chalk picks and hope that you're just slightly better enough to distinguish yourself.

In light of previous AI challenges like Deep Blue and Watson, maybe the right metric for success is to outperform an average pick. So the rules for this contest could be modified to say that the winner needs to outperform all the other contestants *and* finish in the top 50% of all picks on Yahoo. I think that might motivate more interesting competition.

Another possibility is to compete round by round: predict point spreads, and the winner is whoever has the minimum error. That has some advantages, but the contest would be much harder to run and to participate in.