Thursday, March 21, 2013

The 2013 Machine March Madness Field

Posted by Danny Tarlow
Thanks to everybody who entered this year's Machine March Madness competition. Based on the descriptions of the approaches, it's clear that a lot of hard work and ingenuity has gone into the contest. I'm excited to see how all the different approaches do.

Below, you can see the competitors' descriptions of their approaches. We'll also have some longer posts diving into more details coming up in the near future. If there are any in particular that you're itching to hear more about, leave a note in the comments.

If you have entered but not sent me a description of your approach yet, please do. I'll update this post as more descriptions come in.

Without further ado, here is your 2013 Machine March Madness field!

-----------------------------------------------------------------------------

Marginal Madness
Kevin Swersky
http://tournament.fantasysports.yahoo.com/t1/2909174

I'm using variational Bayesian matrix factorization with normal priors on the latent factors, and Gaussian-inverse Wishart hyperpriors on the hyperparameters of the priors. Inference is performed using mean-field (no direct optimization of any model parameters is done). The entries of the matrix are R(i,j) = P(team i beats team j) using the empirical counts over the 2012-2013 season. I found that the brackets produced using this were much more stable with respect to the number of factors than any other representation. I used 20 factors, the number of which was chosen based on squared error on 25% randomly held-out entries of R. For my predictions, I just took the mean vectors and ignored any uncertainty learned by the model. Ideally, I should have selected the number of factors, or assessed the stability of the model by using the variational lower bound, but I was lazy. To predict the final score, I used gradient-boosted regression trees from scikit-learn on the feature vectors produced by the factorization.
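
As a rough illustration of the input matrix described above, the sketch below builds R(i,j) from empirical counts. The game list, the team labels, and the helper name `win_probability_matrix` are hypothetical, and leaving unplayed pairs out of the result (rather than filling them in) is an assumption. The factorization itself is not shown.

```python
from collections import defaultdict

def win_probability_matrix(games):
    """Return {(i, j): fraction of i-vs-j games won by i} from (winner, loser) pairs."""
    wins = defaultdict(int)  # wins[(i, j)] = number of times i beat j
    for winner, loser in games:
        wins[(winner, loser)] += 1
    R = {}
    # Iterate over a copy of the keys, since looking up wins[(j, i)] may add entries.
    for (i, j) in list(wins):
        total = wins[(i, j)] + wins[(j, i)]
        R[(i, j)] = wins[(i, j)] / total
        R[(j, i)] = wins[(j, i)] / total
    return R

# Made-up results: A and B met three times, A and C met once, B and C never met.
games = [("A", "B"), ("B", "A"), ("A", "B"), ("A", "C")]
R = win_probability_matrix(games)
```

Pairs that never played, such as B and C here, are simply absent from R; those are the entries a matrix factorization would have to fill in.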

-----------------------------------------------------------------------------

Larry's Upsetting Picks
Laurent
http://tournament.fantasysports.yahoo.com/t1/1398519

I'm using a PMF-based model and I'm also modelling several other aspects, such as teams' strength over time (both over a season and across seasons) as well as conferences' strength. These different aspects are combined linearly to form a prediction.

I also tried using a team's winning percentage (both over the season and over the last few games) but that didn't lead to an improvement.

On a technical note, I also noticed that in PMF instead of using the raw score, using the difference in scores gives slightly increased (winner determination) accuracy.

-----------------------------------------------------------------------------

K. V. Southwood's Fine Bracket
K.V. Southwood
http://tournament.fantasysports.yahoo.com/t1/3003299

I created an ensemble model based on 3 individual models:

1) multiple linear regression model based on predicting the points margin

2) multiple linear regression model based on predicting offensive points scored

3) logistic regression model based on predicting win vs. loss
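
The description does not say how the three models are combined. One simple possibility, shown here purely as an assumption, is a majority vote on the predicted winner. The function name and the stand-in model outputs are hypothetical.

```python
def ensemble_pick(margin_pred, points_a, points_b, win_prob_a):
    """Majority vote over three model outputs for a game between teams A and B.

    margin_pred: predicted points margin (A minus B) from model 1
    points_a, points_b: predicted points scored by each team from model 2
    win_prob_a: predicted probability that A wins from model 3
    """
    votes_for_a = [
        margin_pred > 0,       # model 1 favors A
        points_a > points_b,   # model 2 favors A
        win_prob_a > 0.5,      # model 3 favors A
    ]
    return "A" if sum(votes_for_a) >= 2 else "B"

# Two of the three stand-in outputs favor A, so the vote goes to A.
pick = ensemble_pick(margin_pred=3.5, points_a=70.0, points_b=72.0, win_prob_a=0.6)
```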

-----------------------------------------------------------------------------

Ryan's Rank 1 Approximation
Ryan B.
http://tournament.fantasysports.yahoo.com/t1/1636526

Brief description of approach (same as last year): For each season (e.g. 2006-2007) I have enumerated the teams and compiled the scores of the games into a matrix S. For example, if team 1 beat team 2 with a score of 82-72 then S12=82 and S21=72. Ideally, each team would play every other team at least once, but this is obviously not the case, so the matrix S is sparse. Using the method proposed by George Dahl, I define vectors o and d, which correspond to each team's offensive and defensive ability. The approximation to the matrix S is then just the outer product od' (for example, (od')_12=o1d2=S12est). This is a simple rank-one approximation of the matrix. If each team played every other team at least once, then the matrix S would be dense and the vectors o and d could be found by computing the SVD of S (see http://www.stanford.edu/~boyd/ee263/notes/low_rank_approx.pdf). Because this is not the case, we instead define a matrix P that represents which teams played each other that season. For example, P12=P21=1 if teams 1 and 2 played a game. Now the problem stated by George can be expressed compactly as "minimize ||P.*(o*d')-S||_F". Here, '.*' represents the Hadamard product and ||.||_F is the Frobenius norm. In this form, it is easy to see that, for constant vector o and variable vector d, this is a convex problem. Likewise, for constant vector d and variable vector o, this is a convex problem. Therefore, by solving a series of convex problems, alternating the vector variable between o and d, the problem converges rapidly in about 5 to 10 steps (see "Nonnegative Matrix Factorizations" code here http://cvxr.com/cvx/examples/). From this point the problem is easily expanded to handle higher rank approximations.
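
The alternating scheme above can be sketched in a few lines. This is not the CVX code referenced in the description: since each subproblem is an ordinary least-squares fit, the sketch swaps in the closed-form per-team update instead of a convex solver. The function names and the four-team synthetic season are hypothetical.

```python
import math

def masked_error(S, P, o, d):
    """Frobenius norm of P .* (o d') - S, assuming S is zero wherever P is zero."""
    n = len(S)
    return math.sqrt(sum(P[i][j] * (o[i] * d[j] - S[i][j]) ** 2
                         for i in range(n) for j in range(n)))

def rank_one_als(S, P, iterations=50):
    """Alternate between o and d so that o[i] * d[j] fits S[i][j] wherever P[i][j] == 1."""
    n = len(S)
    o = [1.0] * n
    d = [1.0] * n
    errors = []
    for _ in range(iterations):
        # With d held constant, each o[i] has a closed-form least-squares solution.
        for i in range(n):
            num = sum(P[i][j] * d[j] * S[i][j] for j in range(n))
            den = sum(P[i][j] * d[j] ** 2 for j in range(n))
            if den > 0:
                o[i] = num / den
        # With o held constant, each d[j] has a closed-form least-squares solution.
        for j in range(n):
            num = sum(P[i][j] * o[i] * S[i][j] for i in range(n))
            den = sum(P[i][j] * o[i] ** 2 for i in range(n))
            if den > 0:
                d[j] = num / den
        errors.append(masked_error(S, P, o, d))
    return o, d, errors

# Synthetic season: scores generated from known vectors, with only some pairs playing.
true_o = [70.0, 80.0, 60.0, 75.0]
true_d = [1.0, 0.9, 1.1, 1.05]
pairs = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
S = [[0.0] * 4 for _ in range(4)]
P = [[0] * 4 for _ in range(4)]
for a, b in pairs:
    P[a][b] = P[b][a] = 1
    S[a][b] = true_o[a] * true_d[b]  # points scored by a against b
    S[b][a] = true_o[b] * true_d[a]  # points scored by b against a

o, d, errors = rank_one_als(S, P)
```

Because each half-step exactly minimizes the objective over one vector, the recorded error never increases from one iteration to the next. Note that o and d are only determined up to a common scale factor, so the fitted vectors need not match the generating ones even when the products o[i] * d[j] do.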

-----------------------------------------------------------------------------

Scott Turner's Prediction Machine
Scott Turner
http://tournament.fantasysports.yahoo.com/t1/1760363

Linear regression on a number of statistics, including strength ratings, to predict MOV (Margin of Victory). The basic model is used to predict game outcomes throughout the year, but there are some modifications for the Tournament. Additions this year include a new metric for analyzing possible upsets, an algorithm for forcing upset selections based upon the (predicted) score required to win the pool, and some modifications for neutral-court and tournament games. More details at http://netprophetblog.blogspot.com/.

-----------------------------------------------------------------------------

noodlebot
Joe
http://tournament.fantasysports.yahoo.com/t1/2298853

See my blog post and project page.
http://joenoodles.com/2013/02/ncaa-d1-basketball-db/
https://github.com/jnu/ncaa

-----------------------------------------------------------------------------

Danny's Dad (Human Baseline)
Danny's Dad.
http://tournament.fantasysports.yahoo.com/t1/2664431

Literally, Danny's Dad's picks.

-----------------------------------------------------------------------------

Obama's Bracket (Human Baseline)
Barack Obama
http://tournament.fantasysports.yahoo.com/t1/1673628

The President's picks.

-----------------------------------------------------------------------------

MatrixFactorizer
Jasper Snoek
http://tournament.fantasysports.yahoo.com/t1/1597161

Probabilistic matrix factorization augmented with Gaussian Processes and Bayesian optimization. More details will be forthcoming in a longer blog post (Update: here).

-----------------------------------------------------------------------------

LA's Machine Mad Pick
LeAnthony M.
http://tournament.fantasysports.yahoo.com/t1/1647581

I used 2011 Final Four stats data rather than last year's, including RPI, offensive efficiency, turnovers, and defensive efficiency. A fitness function based on the final NCAA tournament standings feeds into an evolving genetic program, giving me a final equation. I feed this year's field of 64 teams into this equation to compute the final standings of the 2013 tournament.

-----------------------------------------------------------------------------

Predict the Madness
Monte McNair
http://tournament.fantasysports.yahoo.com/t1/2002207

???

-----------------------------------------------------------------------------

TheSentinel
Chuck
http://tournament.fantasysports.yahoo.com/t1/2997354

Similar strategy as last year. Used Ken Pomeroy's Pythag ratings with the log5 calculation to determine probability of winning the game.

Used a Monte Carlo simulation at 65 iterations, which provided a few interesting upsets, including Oregon over Oklahoma St. (I believe they were mis-seeded myself!).
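
For readers unfamiliar with log5, here is a minimal sketch of the calculation together with a toy Monte Carlo bracket simulation. The ratings, team labels, four-team bracket, and function names are all hypothetical, and how the simulation results were turned into picks is not stated above, so the title count used here is an assumption.

```python
import random

def log5(a, b):
    """Probability that a team with rating a beats a team with rating b (log5 formula)."""
    return (a - a * b) / (a + b - 2 * a * b)

def simulate_bracket(ratings, bracket, iterations, seed=0):
    """Count how often each team wins a single-elimination bracket.

    bracket lists teams in bracket order, so adjacent entries meet in the
    first round; its length must be a power of two.
    """
    rng = random.Random(seed)
    titles = {team: 0 for team in bracket}
    for _ in range(iterations):
        alive = list(bracket)
        while len(alive) > 1:
            winners = []
            for k in range(0, len(alive), 2):
                team_a, team_b = alive[k], alive[k + 1]
                p = log5(ratings[team_a], ratings[team_b])
                # Draw the game outcome from the log5 win probability.
                winners.append(team_a if rng.random() < p else team_b)
            alive = winners
        titles[alive[0]] += 1
    return titles

# Made-up Pythagorean-style ratings between 0 and 1.
ratings = {"A": 0.95, "B": 0.80, "C": 0.85, "D": 0.60}
titles = simulate_bracket(ratings, ["A", "D", "B", "C"], iterations=65)
```

A useful sanity check on log5: against a rating of 0.5 (an average team) it returns the team's own rating, and two equally rated teams each get probability 0.5.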

-----------------------------------------------------------------------------

Danny's Dangerous Picks
Danny
http://tournament.fantasysports.yahoo.com/t1/1421921

Developed a variant on probabilistic matrix factorization, where the scores of a game are modeled as the output of a neural network that takes as input a learned latent vector for each team as well as the elementwise product of the latent vectors for the two teams. A latent vector is learned for each team in each season, jointly with the neural net parameters, which are shared across all seasons from 2006-2007 through the present. I used 5D latent vectors and a one-hidden-layer neural net with 50 hidden units.
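
To make the architecture concrete, here is a sketch of the forward pass only. The dimensions (5D latent vectors, 50 hidden units) come from the description; the tanh activation, the two-score output, the concatenation order of the inputs, the random initialization, and all names are assumptions. Training of the latent vectors and weights is not shown.

```python
import math
import random

LATENT_DIM = 5
HIDDEN_UNITS = 50

def init_params(rng):
    """Randomly initialize weights for a 15 -> 50 -> 2 network (illustrative only)."""
    def matrix(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        "W1": matrix(HIDDEN_UNITS, 3 * LATENT_DIM),
        "b1": [0.0] * HIDDEN_UNITS,
        "W2": matrix(2, HIDDEN_UNITS),
        "b2": [0.0, 0.0],
    }

def build_input(u_a, u_b):
    """Concatenate both latent vectors with their elementwise product."""
    return list(u_a) + list(u_b) + [a * b for a, b in zip(u_a, u_b)]

def predict_scores(u_a, u_b, params):
    """Return [score for team A, score for team B] from one hidden layer."""
    x = build_input(u_a, u_b)
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(params["W1"], params["b1"])]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(params["W2"], params["b2"])]

rng = random.Random(0)
params = init_params(rng)
u_a = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]  # stand-in for a learned vector
u_b = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
scores = predict_scores(u_a, u_b, params)
```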

-----------------------------------------------------------------------------

Human Bracket
Lee
http://tournament.fantasysports.yahoo.com/t1/3297751

The Commissioner's human bracket.

-----------------------------------------------------------------------------

The Rosenthal Fit
Jeffrey Rosenthal
http://tournament.fantasysports.yahoo.com/t1/1666195

Details here: http://www.tsn.ca/story/?id=418503

-----------------------------------------------------------------------------

Last Year's Winner (Baseline)
Jasper Snoek
http://tournament.fantasysports.yahoo.com/t1/1644140

(The winning algorithm from last year, run on this year's data but otherwise unmodified. Entered as a baseline.) I modified Danny's starter code in two ways: First, I added an asymmetric component to the loss function, so the model is rewarded for getting the prediction correct even if the absolute predicted scores are wrong. Second, I changed the regularization so that latent vectors are penalized for deviating from the global average over latent vectors, rather than being penalized for being far from 0. This can be interpreted as imposing a basic hierarchical prior.

I then ran a search over model parameters (e.g., latent dimension, regularization strength, parameter that trades off the two parts of the loss function) to find the setting that did best on number of correct predictions made in the past 5 years' tournaments.
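
The change in regularization can be illustrated by comparing the two penalties side by side. This is only a sketch: the function names, the toy vectors, and the squared-distance form of the penalty are assumptions, and the asymmetric loss term is not shown because its exact form is not given here.

```python
def l2_penalty(vectors, strength):
    """Standard penalty: squared distance of each latent vector from zero."""
    return strength * sum(x * x for v in vectors for x in v)

def mean_centered_penalty(vectors, strength):
    """Penalty on the squared distance of each latent vector from the average vector."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(dim)]
    return strength * sum((v[k] - mean[k]) ** 2 for v in vectors for k in range(dim))

# Three identical latent vectors: far from zero, but no deviation from their mean.
vectors = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
from_zero = l2_penalty(vectors, strength=0.1)
from_mean = mean_centered_penalty(vectors, strength=0.1)
```

In this toy case the standard penalty is positive while the mean-centered one is zero, which is the sense in which teams are pulled toward a shared average rather than toward the origin.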

-----------------------------------------------------------------------------

Leon's Super Legendary Bracket
Leon
http://tournament.fantasysports.yahoo.com/t1/1712730

Defensive efficiency vs Offensive efficiency; tie-breakers favored defense over offense. Chose final score using season averages in wins/losses.

-----------------------------------------------------------------------------

Tim J's Nets for Nets
Tim J.
http://tournament.fantasysports.yahoo.com/t1/1546944

Based on full-season statistics for each team, I ran a discriminant analysis for correlation with wins, including seasons from 2000 to the present.

Then I trained a neural network only on neutral location games, measuring both performance in mean squared error and actual past year bracket scores from 2007-2012, and predicting the bracket for this year.

-----------------------------------------------------------------------------

natebrix's Neat Bracket
Nate
http://tournament.fantasysports.yahoo.com/t1/1931619

The method is a variation on Boyd Nation's Iterative Strength Rating that incorporates margin of victory and weights late-season games more strongly. This link has more:
https://nathanbrixius.wordpress.com/2013/03/20/ncaa-tournament-prediction-model-2013/

-----------------------------------------------------------------------------

Mark's LR bracket
Mark???
http://tournament.fantasysports.yahoo.com/t1/2504134

Logistic Regression???

-----------------------------------------------------------------------------

Ask me about my T-Rex
Zach Mayer
http://tournament.fantasysports.yahoo.com/t1/1827557

???

-----------------------------------------------------------------------------

ScottyJ's Grand Bracket
???
http://tournament.fantasysports.yahoo.com/t1/1876867

???

-----------------------------------------------------------------------------

Guess O'Bot 3000
???
http://tournament.fantasysports.yahoo.com/t1/1646914

???

-----------------------------------------------------------------------------

Andy's Astounding Bracket
???
http://tournament.fantasysports.yahoo.com/t1/1645698

???

-----------------------------------------------------------------------------

Dan Tran's Dazzling Bracket
???
http://tournament.fantasysports.yahoo.com/t1/1668480

???
