Posted by Lee
Welcome to the second annual March Madness Predictive Analytics Challenge! I'm very excited about this event and I hope you are, too! We're still trying to line up some prizes, but for sure, like last year, there will be a gift certificate to Amazon.com.
This year's format will be more or less the same as last year's.
Background
Most readers of this blog are probably familiar with the general idea of what this contest is about. In case you aren't a frequent reader or a fan of college basketball, this section will serve as a brief introduction. March 13th is "Selection Sunday," when the teams for the NCAA College Basketball tournament will be selected. In total, there will be 68 teams, with 8 of them playing four "play-in" games on March 15th and 16th to determine the field of 64. For the purposes of this contest, you do not need to worry about these initial play-in games. The remaining 64 teams are then pitted against each other in a bracket, with one national champion emerging as the winner. Every year, millions of people fill in their predictions of who will be the winners and losers of the games. People participate in leagues or pools with other people to see who has the best bracket. We would like YOU to participate in our algorithm-only pool. That is, your bracket must be completed by a computer algorithm based upon historical data without the use of human judgment.
Contest Format
The format is fairly simple. We will have two pools: a Tournament pool and a Sweet Sixteen pool. Entries in both pools will be evaluated on the typical exponential point scoring system. Correct picks get 1, 2, 4, 8, 16, or 32 points depending on the depth in the bracket (1 point in the first round, 2 points in the second round, etc.). An entry only needs to pick the winning team of a game. Thus, if the opponent you predicted is no longer in the tournament but the team you picked still wins, points are still awarded. Each person is limited to one entry per pool. The winner of each pool is the submission scoring the most points.
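As a rough illustration of the scoring rule described above, here is a minimal sketch in Python. The function name `score_bracket` and the data layout (one set of winning team names per round) are assumptions made for this example; they are not the format used by the contest or by Yahoo's bracket system.

```python
# Points per correct pick, by round: round of 64, round of 32, Sweet Sixteen,
# Elite Eight, Final Four, championship game.
ROUND_POINTS = [1, 2, 4, 8, 16, 32]


def score_bracket(picks, results):
    """Score a bracket under the exponential point system.

    picks and results are lists with one set per round (hypothetical layout):
    picks[r] holds the teams the entry picked to win a game in round r,
    results[r] holds the teams that actually won a game in round r.
    """
    total = 0
    for points, picked, actual in zip(ROUND_POINTS, picks, results):
        # Only the winning team matters, not its opponent, so a simple
        # set intersection counts the correct picks in this round.
        total += points * len(picked & actual)
    return total


# Tiny two-round example with made-up team names.
example_picks = [{"A", "C"}, {"A"}]
example_results = [{"A", "D"}, {"A"}]
example_score = score_bracket(example_picks, example_results)
```

In the example, the entry gets one first-round pick right (1 point) and the second-round pick right (2 points). Because only winners are compared, a pick still scores even when the predicted opponent was eliminated earlier.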
Deadlines
TOURNAMENT pool entries must be submitted no later than March 17, 2011 (the first day of play in the round of 64).
SWEET SIXTEEN pool entries must be submitted no later than March 24, 2011 (the beginning of the Sweet Sixteen round).
- Your bracket must be chosen completely by a computer algorithm.
- The computer algorithm must base the decision upon historical data.
- You may not hard-code selections into your algorithm (e.g., "Always pick Stanford over Cal").
- Your algorithm may only use the data set published for the tournament. The data will be released on Sunday, March 13.
- The above rule is fairly restrictive, but I believe it provides a more even playing field. The contest should be about your algorithm's predictive capabilities and not a data advantage one person has over another.
- You must be able to provide code that shows how your entry picks the winners. In other words, your bracket and the selection of winning teams in your bracket must be reproducible by me on a machine.
- In the event of a tie, the entry with the EARLIER submission time wins.
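To make the rules above concrete, here is a minimal sketch of an entry that would satisfy them: it uses only historical game results, contains no hard-coded picks, and is fully deterministic, so anyone can reproduce the bracket. The input format (a list of `(winner, loser)` pairs), the function names, and the choice of win rate as the rating are all assumptions for illustration; the official data set may be laid out differently, and a serious entry would want a stronger model.

```python
from collections import defaultdict


def win_rates(games):
    """Compute each team's historical win rate.

    games is a list of (winner, loser) pairs (hypothetical format).
    """
    wins = defaultdict(int)
    played = defaultdict(int)
    for winner, loser in games:
        wins[winner] += 1
        played[winner] += 1
        played[loser] += 1
    return {team: wins[team] / played[team] for team in played}


def pick(team_a, team_b, rates):
    """Pick the team with the higher win rate.

    Ties are broken alphabetically so the result is reproducible.
    Teams with no history are treated as having a win rate of 0.
    """
    return min(team_a, team_b, key=lambda t: (-rates.get(t, 0.0), t))


def fill_bracket(field, rates):
    """Fill in a bracket round by round.

    field lists the teams in bracket order, so adjacent teams meet first.
    Its length must be a power of two (64 for the full tournament).
    Returns one list of winners per round.
    """
    rounds = []
    current = list(field)
    while len(current) > 1:
        current = [
            pick(current[i], current[i + 1], rates)
            for i in range(0, len(current), 2)
        ]
        rounds.append(current)
    return rounds


# Made-up history and a four-team field.
history = [("A", "B"), ("A", "C"), ("C", "D"), ("B", "D")]
bracket = fill_bracket(["A", "D", "B", "C"], win_rates(history))
```

Here B and C have the same win rate, so the alphabetical tie-break sends B through. The explicit tie-break is what keeps the output identical from run to run.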
Submissions
We'll be using Yahoo's bracket system for the contest submissions. Please send an e-mail to leezen+MarchMadness at gmail for the group password to join. Please include your team name, team members, and a brief description.
Data
As described above, only the official contest data on this blog is acceptable for use in this contest. You can get a sample of the data, which has all games from the 2006 season through February 2011. Please see this post for details. I will also update this post on Sunday with a link to the full data set.
UPDATE: I have an updated post with details on the final data: Selection Sunday Data
Additional Information
Please be aware that algorithm computation time will be somewhat important in this task. You will be able to predict most of your games ahead of time, between March 13th and 17th. However, because of the four play-in games, four of the match-ups in the round of 64 will not be known until the play-in games are complete, so you will need to predict the outcome of those four games between March 15th and 17th.
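If your algorithm is slow, one possible way around the timing issue is to enumerate every combination of play-in winners ahead of time and precompute a bracket for each one. With four play-in games there are only 2^4 = 16 scenarios. The sketch below is an assumption about how you might organize this; the slot labels and team names are placeholders, not the actual play-in match-ups.

```python
from itertools import product


def play_in_scenarios(play_in_games):
    """List every possible combination of play-in winners.

    play_in_games maps a bracket slot label to the two teams playing for it
    (hypothetical layout). Each scenario maps slot label -> winning team.
    """
    slots = list(play_in_games)
    pairs = [play_in_games[slot] for slot in slots]
    return [dict(zip(slots, winners)) for winners in product(*pairs)]


# Placeholder slots and teams for the four play-in games.
games = {
    "slot_1": ("Team A", "Team B"),
    "slot_2": ("Team C", "Team D"),
    "slot_3": ("Team E", "Team F"),
    "slot_4": ("Team G", "Team H"),
}
scenarios = play_in_scenarios(games)
```

You could then run your bracket algorithm once per scenario before March 15th and submit the bracket that matches the actual play-in results once they are known.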
If you have other questions, concerns, etc. please comment on this post and I'll do my best to answer.