Posted by Danny Tarlow
Scott Turner has been analyzing entrants into our 2011 March Madness Predictive Analytics Challenge, and he has some interesting insights and comments. I'm posting them verbatim here.
In the last 10 years, the upset rate in the first round has been 22%. (Ignoring 8-9 matchups.) Looking at the other 6 entries in the contest (the ones I think are entries, anyway), they predict 16 upsets -- about 8%. Of the baselines, only LMRC comes close with 5 upsets (15%).
For my entry, I forced 15% upset picks over the entire tournament (~9). (15% is about the overall upset rate in the tournament for the past 10 years.) So I have 4 in the first round and 5 in the later rounds.
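As a quick check on the percentages above, here is a minimal Python sketch. It assumes 32 first-round games and 63 total games per bracket (the post does not state these denominators), and the function and variable names are purely illustrative.

```python
# Assumptions (not stated in the post): each bracket has 32 first-round
# games and 63 games overall.
FIRST_ROUND_GAMES = 32
TOURNAMENT_GAMES = 63


def upset_pick_rate(upsets: int, brackets: int, games_per_bracket: int) -> float:
    """Fraction of predicted games in which the lower seed was picked."""
    return upsets / (brackets * games_per_bracket)


# 16 first-round upsets picked across the 6 entries.
entries_rate = upset_pick_rate(16, 6, FIRST_ROUND_GAMES)
# 5 first-round upsets picked by the LMRC baseline.
lmrc_rate = upset_pick_rate(5, 1, FIRST_ROUND_GAMES)
# Number of upset picks implied by forcing 15% over a whole bracket.
forced_upsets = 0.15 * TOURNAMENT_GAMES

print(f"entries: {entries_rate:.1%}, LMRC: {lmrc_rate:.1%}, forced: {forced_upsets:.2f}")
```

Under those assumptions the entries come out at roughly 8%, LMRC at roughly 15-16%, and a forced 15% rate at a bit over 9 picks, which matches the figures quoted above.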
Also, everyone has a #1 seed winning it all, but no one has Pittsburgh.
(I'm assuming that Point Diff, InitToWinIt, DukeRepeats, Danny's, dirknbr1 and mine are the competitors.)
Ignoring upset picks where there's only a 1 seed differential (e.g., 9 over an 8), these are the upset picks across all the competitors:
Qty  Upset    Delta  Details
1    3 > 1    2      (BYU > Pitt)
2    6 > 3    3      (Xav > Syr, SJU > BYU)
3    7 > 2    5      (Was > UNC, A&M > ND, UCLA > Fla)
1    9 > 1    8      (ODU > Pitt)
4    10 > 7   3      (FSU > A&M, UGA > Was, PSU > Temple, MSU > UCLA)
1    10 > 2   8      (UGA > UNC)
3    11 > 6   5      (Missou > Cinc, Gonz > SJU x 2)
1    12 > 9   3      (USU > ODU)
4    12 > 5   7      (USU > KSU x 4)
2    12 > 4   8      (USU > Wis x 2)
2    12 > 3   9      (USU > BYU x 2)
2    12 > 1   11     (USU > Pitt, USU > Kan)
2    13 > 4   9      (Bel > Wis x 2)
1    14 > 6   8      (St. Pete > Georgetown)
1    14 > 3   11     (St. Pete > Purdue)

which, if I add correctly, is 30 upset picks. The upset % over the last ten years (again ignoring 1 seed differential) is 119/519 = ~23%. By that metric these brackets ought to show 86 (!) upsets -- almost 3x as many. (The number is even worse if you take out my bracket, where I tried to force the right upset metric, but even my bracket should show more upsets.)
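The post does not say exactly how the figure of 86 was derived. One reading that reproduces it, sketched below, applies the historical rate to all 63 games in each of the 6 brackets; the denominator and the variable names are assumptions.

```python
# Historical figures quoted in the post (upsets ignoring 1-seed differentials).
HISTORICAL_UPSETS = 119
HISTORICAL_GAMES = 519

# Assumption: the expected count is taken over every game in every bracket.
BRACKETS = 6
GAMES_PER_BRACKET = 63

# Upset picks actually made across the brackets, per the list above.
PICKED_UPSETS = 30

historical_rate = HISTORICAL_UPSETS / HISTORICAL_GAMES
expected_upsets = historical_rate * BRACKETS * GAMES_PER_BRACKET
ratio = expected_upsets / PICKED_UPSETS

print(f"historical rate: {historical_rate:.1%}")
print(f"expected upset picks: {expected_upsets:.1f}")
print(f"expected / picked: {ratio:.2f}x")
```

Note that this denominator still includes games with a 1 seed differential, so it is only a rough yardstick rather than an exact like-for-like comparison.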
I think there's an interesting discussion to be had over whether a computer predictor ought to pick the most likely event on a game-by-game basis or not. Doing so leads to mostly "chalk" picks -- the only time you're going to go against the seeding is where the seeding is wrong (according to your model). Imagine a tournament where in every game the better team has a 1% advantage. The hypothetical predictor will pick the favorites in every game -- and miss almost half the games. Is that the best strategy?
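To make the thought experiment concrete, here is a small sketch. It reads "a 1% advantage" as the favorite winning each game with probability 0.51, treats every game as independent, and ignores bracket structure and any round-weighted scoring. All three are simplifying assumptions, and the function names are illustrative.

```python
import random


def expected_accuracy(p_favorite: float, upset_pick_rate: float) -> float:
    """Expected fraction of correct picks when each pick independently
    goes against the favorite with probability upset_pick_rate."""
    return p_favorite * (1 - upset_pick_rate) + (1 - p_favorite) * upset_pick_rate


def simulate_accuracy(p_favorite: float, upset_pick_rate: float,
                      games: int = 63, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity over many simulated brackets."""
    rng = random.Random(seed)
    total = games * trials
    correct = 0
    for _ in range(total):
        favorite_wins = rng.random() < p_favorite
        pick_favorite = rng.random() >= upset_pick_rate
        if favorite_wins == pick_favorite:
            correct += 1
    return correct / total


P_FAVORITE = 0.51  # assumption: our reading of "a 1% advantage"

chalk = expected_accuracy(P_FAVORITE, 0.0)    # always pick the favorite
mixed = expected_accuracy(P_FAVORITE, 0.49)   # pick upsets at the true upset rate

print(f"chalk: {chalk:.4f}, mixed: {mixed:.4f}")
print(f"simulated chalk: {simulate_accuracy(P_FAVORITE, 0.0):.4f}")
```

Under these assumptions the all-chalk predictor still has the highest expected number of correct picks, even though it misses about 49% of games. Picking upsets at the realistic rate makes the bracket look more like a real tournament but lowers expected accuracy slightly. Whether that trade-off is worthwhile depends on what the contest rewards.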
It's also interesting to note that USU accounts for more than a third of the upset picks (11 of 30). If we assume that most of these models are only picking upsets if they think there's been an actual seeding mistake, then USU was the only "consensus" mis-seeding. That's kind of an interesting result in itself. That suggests that the committee does a good job seeding (at least in some sense) and/or that these predictors don't do a good job of discerning "hidden" information about the teams.
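The USU share can be tallied directly from the upset list. The sketch below transcribes the picks by hand, reading "x N" as N brackets making that specific pick, which is an assumption about the notation.

```python
from collections import Counter

# (predicted winner, predicted loser, number of brackets making the pick),
# transcribed from the upset list in the post.
UPSET_PICKS = [
    ("BYU", "Pitt", 1),
    ("Xav", "Syr", 1), ("SJU", "BYU", 1),
    ("Was", "UNC", 1), ("A&M", "ND", 1), ("UCLA", "Fla", 1),
    ("ODU", "Pitt", 1),
    ("FSU", "A&M", 1), ("UGA", "Was", 1), ("PSU", "Temple", 1), ("MSU", "UCLA", 1),
    ("UGA", "UNC", 1),
    ("Missou", "Cinc", 1), ("Gonz", "SJU", 2),
    ("USU", "ODU", 1), ("USU", "KSU", 4), ("USU", "Wis", 2), ("USU", "BYU", 2),
    ("USU", "Pitt", 1), ("USU", "Kan", 1),
    ("Bel", "Wis", 2),
    ("St. Pete", "Georgetown", 1), ("St. Pete", "Purdue", 1),
]

# Count upset picks per predicted winner.
picks_by_winner = Counter()
for winner, _loser, count in UPSET_PICKS:
    picks_by_winner[winner] += count

total_picks = sum(picks_by_winner.values())
usu_share = picks_by_winner["USU"] / total_picks

print(f"total upset picks: {total_picks}")
print(f"USU picks: {picks_by_winner['USU']} ({usu_share:.0%})")
print(picks_by_winner.most_common(3))
```

No other team comes close to USU's count in this tally, which is what makes it the only plausible "consensus" pick.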
Of the 30 upset predictions, after the first day we know that 2 are right (the picks of Gonz > SJU), 13 are wrong, and the remaining 13 are undecided. That doesn't look like such a good record for the predictors :-). The correct upset pick was a delta of 5, which is a pretty big upset.
I can't see right now whether the ESPN Tourney Challenge shows statistics like how many people in the tourney picked particular upsets, but it would be interesting to compare the competitor upset picks to how people picked.
-- Scott Turner