Posted by Danny Tarlow

Scott Turner writes...
Doing well in a tournament picking contest probably comes down to picking the right upsets. Anyone can pick the higher seeds to win.

Define an upset as a lower seed beating a higher seed, and ignore upsets where there's only a one-step seed differential (e.g., a #9 beating a #8). If my math from last year is correct, the upset rate in the tournament is around 22%. Half of those upsets, about 7, happen in the first round.

Some recent thoughts about upsets:

I leave it to Danny / Lee to turn this into a blog posting :-)

My response...
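As a quick sanity check of Scott's numbers, here is the back-of-envelope arithmetic. The bracket sizes are my assumptions (a standard 64-team single-elimination tournament), not stated in the post:

```python
# Back-of-envelope check of the upset numbers quoted above.
# Bracket sizes are assumed: standard 64-team single elimination.
TOTAL_GAMES = 63        # a 64-team bracket has 63 games
FIRST_ROUND_GAMES = 32
UPSET_RATE = 0.22       # Scott's estimated upset rate

expected_total = round(TOTAL_GAMES * UPSET_RATE)              # ~14 upsets overall
expected_first_round = round(FIRST_ROUND_GAMES * UPSET_RATE)  # ~7 in round one

print(expected_total, expected_first_round)  # 14 7
```

Roughly 14 expected upsets, with half of them in the first round, matches the numbers Scott quotes.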
From a machine learning perspective, I think Scott raises an interesting issue here. Let me rephrase the problem a little more abstractly, to more clearly get at the crux of the issue. Suppose that some oracle were to come down and tell us that exactly 15 of the games in this year's March Madness tournament will be upsets. How should this affect our prediction strategy?
There are probably two natural answers:
- Don't change anything. I have a prediction for each game, and I expect those predictions to yield the largest number of correct picks.
- Make my base predictions, then go back, find the games I'm most uncertain about, and flip predictions until I'm predicting exactly 15 upsets.
So if the goal is to win the $5 million prize and you believe the oracle, then the right strategy is to pick the 15 upsets that the model thinks are most likely.
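A minimal sketch of that strategy, assuming we already have a per-game upset probability from some model (the function name, game ids, and probabilities below are all illustrative, not from the post):

```python
def pick_with_k_upsets(upset_probs, k):
    """Predict the favorite everywhere except in the k games where
    the model thinks an upset is most likely.

    upset_probs: dict mapping game id -> P(lower seed wins).
    Returns: dict mapping game id -> True if we predict an upset.
    """
    # Rank games by how likely the model thinks an upset is.
    ranked = sorted(upset_probs, key=upset_probs.get, reverse=True)
    flip = set(ranked[:k])
    return {game: (game in flip) for game in upset_probs}

# Toy example with four games:
probs = {"A": 0.55, "B": 0.40, "C": 0.30, "D": 0.10}
picks = pick_with_k_upsets(probs, 2)
print(picks)  # upsets predicted in games A and B only
```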
However, while both of these strategies make some sense, each seems too extreme. Perhaps the more natural objective is to ensure that we win this year's Machine March Madness prediction contest. If that's our goal, what's the best strategy? And what if we had all of the competitors' predictions from past years, and I told you that this year's field would be drawn from a similar pool of competitors?
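One way to make that objective concrete is simulation: rather than maximizing the expected number of correct picks, estimate the probability that a candidate bracket outscores the whole field under outcomes sampled from the model. The sketch below treats games as independent coin flips and scores one point per correct pick, purely for illustration; real brackets have dependent games and weighted rounds, and all names here are assumptions:

```python
import random

def contest_win_probability(my_picks, rival_picks, upset_probs,
                            n_samples=10000, seed=0):
    """Estimate P(my bracket strictly outscores every rival bracket).

    my_picks, and each dict in rival_picks: game id -> True if an
    upset is predicted. upset_probs: game id -> modeled P(upset).
    """
    rng = random.Random(seed)
    games = list(upset_probs)
    wins = 0
    for _ in range(n_samples):
        # Sample one tournament outcome (True = the upset happened).
        outcome = {g: rng.random() < upset_probs[g] for g in games}
        my_score = sum(my_picks[g] == outcome[g] for g in games)
        best_rival = max(sum(r[g] == outcome[g] for g in games)
                         for r in rival_picks)
        wins += my_score > best_rival
    return wins / n_samples
```

With past competitors' brackets plugged in as `rival_picks`, one could search over candidate brackets for the highest estimated win probability; against a large field, a bold contrarian bracket can win more often than a safer one even when its expected score is lower.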
See Scott's picks for most likely upsets over at his blog.