Monday, March 8, 2010

Data-driven March Madness 2010

Posted by Danny Tarlow
Some of you may remember from last year that I don't really follow college basketball, but I still like to enter the March Madness bracket pools with my friends. How do I keep from being woefully uninformed, you might ask? Simple: data-driven March Madness predictions.

I just quickly checked my scripts from last year, and the scraping and parsing still seem to work. So consider the data gathering officially started. If anybody else is interested in applying some machine learning and making data-driven predictions, I suggest you start thinking about your strategy now. I will provide the data shortly.

Update:
The data is being put in a MySQL database with two tables. First is the game result:
mysql> describe game_result;
+-------------+------------+------+-----+---------+----------------+
| Field       | Type       | Null | Key | Default | Extra          |
+-------------+------------+------+-----+---------+----------------+
| game_id     | int(11)    | NO   | PRI | NULL    | auto_increment | 
| date_played | date       | YES  |     | NULL    |                | 
| home_code   | varchar(3) | YES  | MUL | NULL    |                | 
| home_score  | int(11)    | YES  |     | NULL    |                | 
| away_code   | varchar(3) | YES  |     | NULL    |                | 
| away_score  | int(11)    | YES  |     | NULL    |                | 
+-------------+------------+------+-----+---------+----------------+
And second is the team codes, as used by rivals.yahoo.com:
mysql> describe team_code;
+-----------+-------------+------+-----+---------+----------------+
| Field     | Type        | Null | Key | Default | Extra          |
+-----------+-------------+------+-----+---------+----------------+
| team_id   | int(11)     | NO   | PRI | NULL    | auto_increment | 
| team_code | varchar(3)  | YES  | UNI | NULL    |                | 
| team_name | varchar(64) | YES  |     | NULL    |                | 
+-----------+-------------+------+-----+---------+----------------+
I haven't set the exact format of the output, but it will be some simple export of this.
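
To give a feel for how the two tables fit together, here is a minimal sketch in Python. It is not the actual scraping or export code. It assumes an in-memory SQLite database as a stand-in for MySQL, and the team codes, team names, and scores are made-up placeholders rather than real rivals.yahoo.com data. The function names build_demo_db and games_with_names are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory SQLite stand-in for the two MySQL tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE team_code (
            team_id   INTEGER PRIMARY KEY AUTOINCREMENT,
            team_code VARCHAR(3) UNIQUE,
            team_name VARCHAR(64)
        );
        CREATE TABLE game_result (
            game_id     INTEGER PRIMARY KEY AUTOINCREMENT,
            date_played DATE,
            home_code   VARCHAR(3),
            home_score  INTEGER,
            away_code   VARCHAR(3),
            away_score  INTEGER
        );
        -- Mirrors the MUL key shown on home_code above.
        CREATE INDEX idx_home_code ON game_result (home_code);
    """)
    # Placeholder rows: these codes, names, and scores are invented.
    conn.executemany(
        "INSERT INTO team_code (team_code, team_name) VALUES (?, ?)",
        [("aaa", "Team A"), ("bbb", "Team B")],
    )
    conn.executemany(
        "INSERT INTO game_result "
        "(date_played, home_code, home_score, away_code, away_score) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            ("2010-01-02", "aaa", 70, "bbb", 65),
            ("2010-01-09", "bbb", 80, "aaa", 72),
        ],
    )
    conn.commit()
    return conn


def games_with_names(conn):
    """Resolve home and away codes to team names by joining team_code twice."""
    return conn.execute("""
        SELECT g.date_played, h.team_name, g.home_score,
               a.team_name, g.away_score
        FROM game_result AS g
        JOIN team_code AS h ON h.team_code = g.home_code
        JOIN team_code AS a ON a.team_code = g.away_code
        ORDER BY g.date_played
    """).fetchall()


if __name__ == "__main__":
    for row in games_with_names(build_demo_db()):
        print(row)
```

The same join should work against the real MySQL tables, since home_code and away_code both refer to team_code.team_code.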

2 comments:

Joseph Turian said...

Can you tell us about what the data will look like?

Danny Tarlow said...

Sure. I've updated the post to show it. I need to look for the last few rounds of tournament results from last year, but I should be able to get them.

There will definitely be all regular season games from last year and this year (including those played by teams that didn't make the tournament).