*Posted by Danny Tarlow*

*Scott Turner has been analyzing entrants into our 2011 March Madness Predictive Analytics Challenge, and he has some interesting insights and comments. I'm posting them verbatim here.*

Over the last 10 years, the upset rate in the first round (ignoring 8-9 matchups) has been 22%. The other 6 entries in the contest (the ones I think are entries, anyway) predict 16 upsets between them -- about 8%. Of the baselines, only LMRC comes close, with 5 upsets (15%).

For my entry, I forced 15% upset picks over the entire tournament (~9 picks). (15% is about the overall upset rate in the tournament over the past 10 years.) So I have 4 upsets in the first round and 5 in the later rounds.

Also, everyone has a #1 seed winning it all, but no one has Pittsburgh.

(I'm assuming that Point Diff, InitToWinIt, DukeRepeats, Danny's, dirknbr1 and mine are the competitors.)

Ignoring upset picks where there's only a 1-seed differential (e.g., a 9 over an 8), these are the upset picks across all the competitors:

| Qty | Upset | Delta | Details |
|----:|-------|------:|---------|
| 1 | 3 > 1 | 2 | BYU > Pitt |
| 2 | 6 > 3 | 3 | Xav > Syr, SJU > BYU |
| 3 | 7 > 2 | 5 | Was > UNC, A&M > ND, UCLA > Fla |
| 1 | 9 > 1 | 8 | ODU > Pitt |
| 4 | 10 > 7 | 3 | FSU > A&M, UGA > Was, PSU > Temple, MSU > UCLA |
| 1 | 10 > 2 | 8 | UGA > UNC |
| 3 | 11 > 6 | 5 | Missou > Cinc, Gonz > SJU x 2 |
| 1 | 12 > 9 | 3 | USU > ODU |
| 4 | 12 > 5 | 7 | USU > KSU x 4 |
| 2 | 12 > 4 | 8 | USU > Wis x 2 |
| 2 | 12 > 3 | 8 | USU > BYU x 2 |
| 1 | 12 > 2 | 12 | USU > Pitt, USU > Kan |
| 2 | 13 > 4 | 11 | Bel > Wis x 2 |
| 1 | 14 > 6 | 8 | St. Pete > Georgetown |
| 1 | 14 > 3 | 11 | St. Pete > Purdue |

If I add correctly, that's 30 upset picks. The upset rate over the last ten years (again ignoring a 1-seed differential) is 119/519 = ~23%. By that metric these brackets ought to show 86 (!) upsets -- almost 3x more. (The number is even worse if you take out my bracket, where I tried to force the right upset rate, but even my bracket should show more upsets.)
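A quick sketch of that arithmetic, assuming each of the 6 competitor brackets picks all 63 games of a 64-team bracket (the 119/519 historical rate is Scott's figure):

```python
# Historical upset rate over the last ten years, ignoring games with
# only a 1-seed differential (Scott's figure: 119 upsets in 519 games).
historical_rate = 119 / 519        # ~23%

brackets = 6                       # competitor entries
games_per_bracket = 63             # games in a 64-team bracket
picked_upsets = 30                 # total upset picks tallied above

expected_upsets = brackets * games_per_bracket * historical_rate
print(int(expected_upsets))        # 86 -- nearly 3x the 30 actually picked
```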

I think there's an interesting discussion to be had over whether a computer predictor ought to pick the most likely outcome on a game-by-game basis or not. The former leads to mostly "chalk" picks -- the only time you're going to go against the seeding is where the seeding is wrong (according to your model). Imagine a tournament where in every game the better team has only a slight edge (say, a 51% chance of winning). The hypothetical predictor will pick the favorite in every game -- and still miss almost half the games. Is that the best strategy?
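A quick Monte Carlo sketch of that thought experiment (the 51% figure and game counts are the hypothetical's, not from any real bracket):

```python
import random

random.seed(0)

def chalk_accuracy(n_games=63, p_favorite=0.51, n_trials=20_000):
    """Fraction of games the always-pick-the-favorite ("chalk") strategy
    calls correctly when every favorite wins with probability p_favorite."""
    correct = sum(
        random.random() < p_favorite
        for _ in range(n_trials)
        for _ in range(n_games)
    )
    return correct / (n_trials * n_games)

print(round(chalk_accuracy(), 3))  # ~0.51 -- chalk still misses almost half
```

Per game, no strategy can do better than 51% here, yet that "optimal" bracket gets nearly half its picks wrong.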

It's also interesting to note that USU accounts for a full third of the upset picks. If we assume that most of these models only pick upsets when they think there's been an actual seeding mistake, then USU was the only "consensus" mis-seeding. That's an interesting result in itself. It suggests that the committee does a good job seeding (at least in some sense) and/or that these predictors don't do a good job of discerning "hidden" information about the teams.

Of the 30 upset predictions, after the first day we know that 2 are right (the picks of Gonz > SJU), 13 are wrong, and the remaining 13 are undecided. That doesn't look like such a good record for the predictors :-). The correct upset pick was a delta of 5, which is a pretty big upset.

I can't tell right now whether the ESPN Tourney Challenge shows statistics like how many entrants picked particular upsets, but it would be interesting to compare the competitors' upset picks to how the general public picked.

-- Scott Turner

## 2 comments:

> The hypothetical predictor will pick the favorites in every game -- and miss almost half the games. Is that the best strategy?

I think this is a great question. To answer it, we need to precisely define what "best" is, though, right? In your example where the favorite always has a 51% chance of winning, if the goal is to choose the bracket that is most likely to be 100% correct, then always choosing the favorite is clearly the best choice.

I wonder what the optimal strategy would be if your goal was, say, to optimize some sort of precision-recall score on the upsets that you correctly predict. Or maybe something like an intersection over union score: |correctly picked upsets| / |picked upset OR upset occurred|. If you correctly pick all upsets, you get a score of 1; if you pick no upsets, you get a score of 0.
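A minimal sketch of that intersection-over-union scoring rule (the game labels in the example are just for illustration):

```python
def upset_iou(picked, actual):
    """IoU score on upsets:
    |correctly picked upsets| / |games where an upset was picked OR occurred|."""
    picked, actual = set(picked), set(actual)
    union = picked | actual
    if not union:          # edge case: no upsets picked and none occurred;
        return 1.0         # we define that as a perfect score
    return len(picked & actual) / len(union)

# Illustrative example: two upset picks, one of which happened,
# plus one upset nobody picked -> 1 correct out of 3 in the union.
print(upset_iou({"Gonz > SJU", "USU > KSU"},
                {"Gonz > SJU", "Rich > Vandy"}))   # 1/3
```

As defined, the score also penalizes upsets you fail to pick, not just wrong upset picks, which is what distinguishes it from simple precision.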

I've struggled a little bit with this question myself since last year.

If you're in a big contest (like Yahoo or ESPN) and your goal is to finish in (say) the top 5%, then clearly making chalk picks is not going to do that -- even if you feel that the higher seed is the better team in every matchup.

If you're in a small contest (like ours) with the goal of winning the contest, then your best strategy might well be to make all or almost all chalk picks, and hope that you're just enough better to distinguish yourself.

In light of previous AI challenges like Deep Blue and Watson, maybe the right metric for success is to outperform an average pick. So the rules for this contest could be modified to say that the winner needs to outperform all the other contestants *and* finish in the top 50% of all picks on Yahoo. I think that might motivate more interesting competition.

Another possibility is to compete round-by-round, predicting point spreads, with the winner being whoever has the minimum error. That has some advantages, but the contest would be much harder to run and to participate in.
