Tuesday, February 28, 2012

Preliminary Aggregate Data

Posted by Lee

For those of you who want to work with just the aggregate game result data, I've posted an updated version. The format is the same as described in a previous post: date, home team, away team, home score, away score, and whether or not the home team won.
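As a rough illustration of reading this six-field format, here is a minimal Python sketch. It assumes comma-separated rows with no header line, ISO-style dates, and a 1/0 flag for the home win; the actual file may differ on all of these, and the sample rows, `FIELDS` names, and `load_games` helper are made up for the example.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample rows; the real file's delimiter, date format,
# and team names may differ.
SAMPLE = """\
2012-02-25,Team A,Team B,70,65,1
2012-02-26,Team C,Team A,58,61,0
"""

# Assumed column names, in the order described in the post.
FIELDS = ["date", "home_team", "away_team",
          "home_score", "away_score", "home_won"]


def load_games(handle):
    """Parse rows of the six-field aggregate format into dictionaries."""
    games = []
    for row in csv.DictReader(handle, fieldnames=FIELDS):
        games.append({
            "date": datetime.strptime(row["date"], "%Y-%m-%d").date(),
            "home_team": row["home_team"],
            "away_team": row["away_team"],
            "home_score": int(row["home_score"]),
            "away_score": int(row["away_score"]),
            "home_won": row["home_won"] == "1",  # assumes a 1/0 flag
        })
    return games


games = load_games(io.StringIO(SAMPLE))

# Fraction of games won by the home team.
home_win_rate = sum(g["home_won"] for g in games) / len(games)
print(home_win_rate)
```

To read a downloaded copy instead, pass an open file handle to `load_games` in place of the `io.StringIO` wrapper.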

This data covers the 2006 season through 2/26/2012 and, as with the player-level data, will be updated on Selection Sunday to reflect the most up-to-date information.

Monday, February 27, 2012

Preliminary 2011 Season Data

Posted by Lee

In addition to the data from the 2006-2010 seasons shared publicly via Google Docs, we've published some preliminary data for the 2011 season. This uses the same format as past seasons' data and spans the beginning of the 2011 season through 2/26.

After Selection Sunday (March 11th), we will publish an updated set of data for the 2011 season. Please let us know if you find any problems with the preliminary data.

Machine March Madness 2012: Starter Code

Posted by Danny Tarlow
I've started a GitHub repository for the 2012 March Madness competition, to which I've committed some Python code that I worked on over the weekend:
https://github.com/dtarlow/Machine-March-Madness

Here, you can find code that parses data from previous seasons, constructs the past brackets, and learns a few different models based on past data. More details are in the README.

I will post in more detail about the models once I get them working a bit better, but I encourage you to take a look at the high-level structure in learn_synthetic.py and model.py.

I've brainstormed a bunch of TODOs at the bottom of the README, so if you'd like to jump in and work on some of those, please do. Or feel free to go off in your own direction.

For detailed discussions of the code, questions, or bug reports/fixes, head on over to the official Google group.

Saturday, February 25, 2012

Google group for March Madness competition...

Posted by Danny Tarlow
... here.

We'll use the Google group for discussion of issues related to rules, but other posts are fair game: maybe you're looking for somebody to team up with, or maybe you want to brainstorm modeling ideas, etc.

Thursday, February 23, 2012

Machine March Madness 2012

Posted by Danny Tarlow
Every year, the NCAA College Basketball season ends with a tournament of 64 teams. Humans around the US (but also elsewhere in the world) fill in brackets with predictions of the outcome, enter pools, and wait excitedly for the results.

College basketball is a streaky and fairly high variance game, so there are many chances for an underdog to make a run deep into the tournament. We see this often -- for example, last year's tournament featured a final four made up of 3, 4, 8, and 11 seeds -- leading to the colloquial tournament name, "March Madness".

So without further ado, it is my pleasure to announce that this year, this blog, in conjunction with commissioner Lee, will host another "Machine March Madness" contest. The big idea is simple: using data from this season and from past seasons (which we will provide -- e.g., past data here: full and simple), build a computer system that fills out a bracket, then pit yourself against the field of silicon competition. You can see posts from last season's tournament here, and some press coverage here.

We'll have more details soon, including details about prizes. For now, you can do a few things.
  1. Download the past data (full and simple), and start thinking about how you'd model the tournament. To get some starter ideas, I recommend this timeless post by George Dahl.
  2. Let us know in the comments if there is any other data that you would like to use. The rule we have is that all systems must be built using the same data, but we're open to suggestions about what this data is.
  3. Get started!
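If you're wondering what "filling out a bracket" looks like in code, here is one possible bare-bones sketch in Python. It is not the contest's starter code: the `fill_bracket` function, the `ratings` numbers, and the `higher_rated` rule are all made up for illustration. Any model that can pick a winner between two teams could be plugged in instead.

```python
def fill_bracket(teams, pick_winner):
    """Predict every round of a single-elimination bracket.

    teams: list of teams in bracket order (length must be a power of two).
    pick_winner: function (team_a, team_b) -> the predicted winner.
    Returns a list of rounds, each a list of predicted winners.
    """
    rounds = []
    current = list(teams)
    while len(current) > 1:
        # Adjacent entries play each other; winners advance in order.
        current = [pick_winner(current[i], current[i + 1])
                   for i in range(0, len(current), 2)]
        rounds.append(current)
    return rounds


# Hypothetical team strengths, standing in for a learned model.
ratings = {"A": 0.9, "B": 0.4, "C": 0.6, "D": 0.7}


def higher_rated(a, b):
    """Toy predictor: always pick the team with the higher rating."""
    return a if ratings[a] >= ratings[b] else b


rounds = fill_bracket(["A", "B", "C", "D"], higher_rated)
print(rounds)
```

Keeping the pairwise predictor separate from the bracket walk means you can swap in a different model without touching the tournament logic.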


Update: Here's a question about additional data to use, posted on Quora.