Monday, March 14, 2011

March Madness Predictions: Code Description

Posted by Danny Tarlow
In the last post, I showed outputs of the 1D version of my matrix factorization model for predicting March Madness results. Here, I'm posting the code along with a brief description of how to get it running, so you can replicate my results, possibly as the basis for your entry into the 2011 March Madness Predictive Analytics Challenge.

To start, you need two Python files, learner.py and bracket.py.
You'll also need the pickle file with the data.
Put all the files in a directory, and make a directory named "models" to store the output.

Now there are two steps:
  1. Train a model by running "python learner.py".
  2. Print out the offensive and defensive scores by running "python bracket.py".
That's it!
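The post doesn't walk through learner.py, so what follows is only a minimal sketch of what training a model of this kind could look like. It assumes each team gets an offense vector and a defense vector, and that the points team A scores against team B are modeled as the dot product of A's offense vector with B's defense vector. The function name train, the (team_a, team_b, score_a, score_b) game format, and the default learning rate, epoch count, and regularization weight are all hypothetical. The vectors are fit with plain stochastic gradient descent and an L2 penalty, which may differ from what learner.py actually does.

```python
import random


def train(games, teams, dim=1, reg=0.1, lr=0.001, epochs=500, seed=0):
    """Fit offense/defense vectors by SGD on squared error plus an L2 penalty.

    games: list of (team_a, team_b, score_a, score_b) tuples (hypothetical format).
    Returns two dicts mapping team code -> list of `dim` floats.
    """
    rng = random.Random(seed)
    # Start every component near 1.0 so all predicted scores begin positive.
    offense = {t: [rng.gauss(1.0, 0.1) for _ in range(dim)] for t in teams}
    defense = {t: [rng.gauss(1.0, 0.1) for _ in range(dim)] for t in teams}

    # Each game yields two observations: A's points against B's defense,
    # and B's points against A's defense.
    observations = []
    for a, b, score_a, score_b in games:
        observations.append((a, b, score_a))
        observations.append((b, a, score_b))

    for _ in range(epochs):
        rng.shuffle(observations)
        for attacker, defender, points in observations:
            o, d = offense[attacker], defense[defender]
            err = sum(x * y for x, y in zip(o, d)) - points
            for k in range(dim):
                o_k, d_k = o[k], d[k]  # use pre-update values for both gradients
                o[k] -= lr * (err * d_k + reg * o_k)
                d[k] -= lr * (err * o_k + reg * d_k)
    return offense, defense
```

Raising dim gives a higher-dimensional model, and reg controls how strongly the vectors are shrunk toward zero; the values here are placeholders to tune, not recommendations.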

If you'd like to simulate particular games, there is a function in bracket.py called simulate_game(team_code_A, team_code_B). There is also old code to simulate a full bracket, but that hasn't been updated from previous years (yet).
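Assuming, as a sketch, that the points team A scores against team B are predicted by the dot product of A's offense vector and B's defense vector, simulating a single game reduces to two dot products. Unlike simulate_game in bracket.py, this hypothetical version takes the offense and defense dictionaries as explicit arguments, and the team code "xyz" in the example is made up.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def simulate_game(team_code_A, team_code_B, offense, defense):
    """Return the predicted (points for A, points for B).

    offense, defense: dicts mapping team code -> vector (hypothetical layout).
    """
    score_A = dot(offense[team_code_A], defense[team_code_B])
    score_B = dot(offense[team_code_B], defense[team_code_A])
    return score_A, score_B


# Hand-set 1D vectors for illustration only; "xyz" is a made-up team code.
offense = {"dau": [9.0], "xyz": [8.0]}
defense = {"dau": [7.0], "xyz": [8.0]}
print(simulate_game("dau", "xyz", offense, defense))  # (72.0, 56.0)
```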

If you'd like to train higher-dimensional models or play around with the regularization parameter, feel free to change the settings at the top of __init__() in learner.py. Higher-dimensional models are harder to visualize, so one idea would be to sort teams by how they are predicted to fare against a favorite like Duke ("dau" team code).
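One way to act on that suggestion, sketched under the assumption that predicted points are the dot product of the scoring team's offense vector and the opponent's defense vector (in any number of dimensions), is to rank every team by its predicted margin against "dau". The function name rank_against and the team codes "aaa" and "bbb" in the example are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors (any dimension)."""
    return sum(x * y for x, y in zip(u, v))


def rank_against(favorite, offense, defense):
    """Sort teams by predicted margin (their points minus the favorite's points)."""
    margins = {}
    for team in offense:
        if team == favorite:
            continue
        points_for = dot(offense[team], defense[favorite])
        points_against = dot(offense[favorite], defense[team])
        margins[team] = points_for - points_against
    # Largest margin first: the teams predicted to do best against the favorite.
    return sorted(margins.items(), key=lambda item: item[1], reverse=True)


# Toy 2D vectors; "aaa" and "bbb" are made-up team codes.
offense = {"dau": [5, 4], "aaa": [4, 4], "bbb": [3, 3]}
defense = {"dau": [4, 4], "aaa": [5, 5], "bbb": [6, 6]}
print(rank_against("dau", offense, defense))  # [('aaa', -13), ('bbb', -30)]
```

Because only the margin against one opponent is used, this collapses any number of dimensions into a single sortable number, at the cost of ignoring how the other teams would match up against each other.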

Happy predicting!

1 comment:

Denimboy said...

What's your take on Nate Silver's (538) bracket:

http://fivethirtyeight.blogs.nytimes.com/2011/03/14/how-we-made-our-n-c-a-a-picks/

He's building an ensemble of power ratings, team venue distance, and discounting for player suspensions. It seems that his bracket may have been overpowered by taking into account the seed values.