Posted by Danny Tarlow
Thanks to everybody who entered this year's Machine March Madness competition.
Based on the descriptions of the approaches, it's clear that a
lot of hard work and ingenuity has gone into the contest. I'm excited to
see how all the different approaches do.
Below, you can see the competitors' descriptions of their approaches.
We'll also have some longer posts diving into more details
coming up in the near future. If there are any in particular that
you're itching to hear more about, leave a note in the comments.
If you have entered but not sent me a description of your approach yet,
please do. I'll update this post as more descriptions come in.
Without further ado, here is your 2013 Machine March Madness field!
-----------------------------------------------------------------------------
Marginal Madness
Kevin Swersky
http://tournament.fantasysports.yahoo.com/t1/2909174
I'm using variational Bayesian matrix factorization with normal priors
on the latent factors, and Gaussian-inverse Wishart hyperpriors on the
hyperparameters of the priors. Inference is performed using mean-field
(no direct optimization of any model parameters is done). The entries
of the matrix are R(i,j) = P(team i beats team j) using the empirical
counts over the 2012-2013 season. I found that the brackets produced
using this were much more stable with respect to the number of factors
than any other representation. I used 20 factors, chosen based on squared
error on a randomly held-out 25% of the entries of R. For my predictions, I
just took the mean vectors and ignored any
uncertainty learned by the model. Ideally, I should have selected the
number of factors, or assessed the stability of the model by using the
variational lower bound, but I was lazy. To predict the final score, I
used gradient-boosted regression trees from scikit-learn on the
feature vectors produced by the factorization.
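For readers who want to see the last stage concretely, here is a minimal Python sketch of fitting gradient-boosted regression trees on latent factors to predict points, assuming the mean factor matrix U from the factorization is already available; the factor values and game list below are made-up placeholders, and the variational factorization itself is not shown.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    n_teams, n_factors = 64, 20
    rng = np.random.default_rng(0)
    U = rng.normal(size=(n_teams, n_factors))   # stand-in for the learned mean factors

    # Hypothetical historical games: (team_i, team_j, points scored by team_i).
    games = [(0, 1, 82), (2, 3, 71), (1, 2, 65)]

    # Feature vector for a game: the two teams' factor vectors concatenated.
    X = np.array([np.concatenate([U[i], U[j]]) for i, j, _ in games])
    y = np.array([pts for _, _, pts in games])

    model = GradientBoostingRegressor(n_estimators=200, max_depth=3)
    model.fit(X, y)

    # Predicted points for team 0 against team 3.
    print(model.predict(np.concatenate([U[0], U[3]])[None, :]))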
-----------------------------------------------------------------------------
Larry's Upsetting Picks
Laurent
http://tournament.fantasysports.yahoo.com/t1/1398519
I'm using a PMF-based model and I'm also modelling several other
aspects such as teams' strength over time (both over a season and
across seasons) as well as conferences' strength. These different
aspects are combined linearly to form a prediction.
I also tried using a team's winning percentage (both over the season
and over the last few games) but that didn't lead to an improvement.
On a technical note, I also noticed that in PMF, using the difference in
scores instead of the raw scores gives slightly better winner-determination
accuracy.
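As a concrete illustration of the score-difference trick, here is a minimal sketch of a plain PMF trained by SGD where the target is the point differential of a game rather than the raw score; the team indices, margins, and hyperparameters are made up, and the time and conference components of the actual model are not shown.

    import numpy as np

    rng = np.random.default_rng(0)
    n_teams, k, lr, reg = 64, 5, 0.01, 0.1
    U = 0.1 * rng.normal(size=(n_teams, k))   # per-team "offense" factors
    V = 0.1 * rng.normal(size=(n_teams, k))   # per-team "defense" factors

    # (team_i, team_j, margin) where margin = points_i - points_j.
    games = [(0, 1, 10), (2, 3, -4), (1, 2, 7)]

    for epoch in range(200):
        for i, j, margin in games:
            err = margin - U[i] @ V[j]
            grad_u = err * V[j] - reg * U[i]
            grad_v = err * U[i] - reg * V[j]
            U[i] += lr * grad_u
            V[j] += lr * grad_v

    # A positive prediction means team 0 is expected to beat team 3.
    print(U[0] @ V[3])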
-----------------------------------------------------------------------------
K. V. Southwood's Fine Bracket
K.V. Southwood
http://tournament.fantasysports.yahoo.com/t1/3003299
I created an ensemble model based on three individual models (a rough sketch
follows the list):
1) a multiple linear regression model predicting the points margin
2) a multiple linear regression model predicting offensive points scored
3) a logistic regression model predicting win vs. loss
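Here is a rough sketch of such a three-model ensemble in scikit-learn; the per-matchup features, the synthetic targets, and the way the three outputs are blended are my own placeholders, not the author's actual choices.

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6))                 # made-up per-matchup features
    margin = X @ rng.normal(size=6) + rng.normal(size=200)
    points = 70 + 5 * X[:, 0] + rng.normal(size=200)
    win = (margin > 0).astype(int)

    m1 = LinearRegression().fit(X, margin)        # 1) points-margin regression
    m2 = LinearRegression().fit(X, points)        # 2) offensive-points regression
    m3 = LogisticRegression().fit(X, win)         # 3) win/loss logistic regression

    x_new = rng.normal(size=(1, 6))
    # One simple way to blend: average the logistic win probability with the
    # indicator that the predicted margin is positive.
    p_win = 0.5 * (m3.predict_proba(x_new)[0, 1] + float(m1.predict(x_new)[0] > 0))
    print(p_win, m2.predict(x_new)[0])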
-----------------------------------------------------------------------------
Ryan's Rank 1 Approximation
Ryan B.
http://tournament.fantasysports.yahoo.com/t1/1636526
Brief description of approach (same as last year):
For each season (e.g. 2006-2007) I have enumerated the teams and
compiled the scores of the games into a matrix S. For example, if team
1 beat team 2 with a score of 82-72 then S12=82 and S21=72. Ideally,
each team would play every other team at least once, but this is
obviously not the case so the matrix S is sparse. Using the method
proposed by George Dahl, I define vectors o and d which correspond to
each team's offensive and defensive ability. The approximation to the
matrix S is then just the outer product od' (for example,
(od')_12 = o1*d2 ≈ S12). This is a simple rank-one approximation of the
matrix. If each team played each other at least once then the matrix S
would be dense and the vectors o and d could be found by finding the
SVD of S (see http://www.stanford.edu/~boyd/ee263/notes/low_rank_approx.pdf).
Because this is not the case, we instead define a matrix P that
represents which teams played that season. For example, P12=P21=1 if
teams 1 and 2 played a game. Now the problem stated by George can be
expressed compactly as "minimize ||P.*(o*d') - S||_F". Here, '.*'
represents the Hadamard product and ||.||_F is the Frobenius norm. In
this form, it is easy to see that, for a constant vector o and variable
vector d, this is a convex problem. Also, for constant vector d and
variable vector o this is a convex problem. Therefore, by solving a
series of convex problems, alternating the vector variable between o
and d, the problem converges rapidly in about 5 to 10 steps (see
"Nonnegative Matrix Factorizations" code here http://cvxr.com/cvx/examples/).
From this point the problem is easily expanded to handle higher rank
approximations.
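To make the alternating scheme concrete, here is a minimal numpy sketch that minimizes ||P.*(od') - S||_F by alternately solving for o (with d fixed) and d (with o fixed); each subproblem has a closed-form weighted-least-squares solution. The tiny score matrix is made up for illustration.

    import numpy as np

    # Made-up sparse score matrix: S[i, j] = points scored by team i against team j.
    S = np.array([[ 0., 82., 75.,  0.],
                  [72.,  0.,  0., 68.],
                  [70.,  0.,  0., 90.],
                  [ 0., 61., 80.,  0.]])
    P = (S > 0).astype(float)          # P[i, j] = 1 if teams i and j played

    n = S.shape[0]
    o = np.ones(n)                     # offensive ability
    d = np.ones(n)                     # defensive ability (scale of points allowed)

    for step in range(10):             # converges in roughly 5 to 10 alternations
        # Fix d, solve for o:  o_i = sum_j P_ij S_ij d_j / sum_j P_ij d_j^2
        o = (P * S) @ d / np.maximum(P @ d**2, 1e-12)
        # Fix o, solve for d:  d_j = sum_i P_ij S_ij o_i / sum_i P_ij o_i^2
        d = (P * S).T @ o / np.maximum(P.T @ o**2, 1e-12)

    print(np.outer(o, d))              # estimated score matrix, e.g. entry (0, 1) near 82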
-----------------------------------------------------------------------------
Scott Turner's Prediction Machine
Scott Turner
http://tournament.fantasysports.yahoo.com/t1/1760363
Linear regression on a number of statistics, including strength ratings, to
predict MOV (Margin of Victory). The basic model is used to predict game
outcomes throughout the year, but there are some modifications for the
Tournament. Additions this year include a new metric for analyzing possible
upsets, an algorithm for forcing upset selections based upon the (predicted)
score required to win the pool, and some modifications for neutral-court and
tournament games. More details at
http://netprophetblog.blogspot.com/.
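As a bare-bones illustration of the basic model (not the tournament adjustments or the upset logic), here is a sketch of a linear regression from per-matchup statistics to margin of victory; the strength and tempo features and the synthetic data are assumptions of mine.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    n_games = 300
    strength_diff = rng.normal(size=n_games)   # home strength rating minus away rating
    tempo = rng.normal(size=n_games)           # some pace-of-play feature
    X = np.column_stack([strength_diff, tempo])
    mov = 8.0 * strength_diff + rng.normal(scale=10, size=n_games)   # synthetic margins

    model = LinearRegression().fit(X, mov)
    print(model.predict([[1.2, 0.0]]))         # predicted margin for a new matchup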
-----------------------------------------------------------------------------
noodlebot
Joe
http://tournament.fantasysports.yahoo.com/t1/2298853
See my blog post and project page.
http://joenoodles.com/2013/02/ncaa-d1-basketball-db/
https://github.com/jnu/ncaa
-----------------------------------------------------------------------------
Danny's Dad (Human Baseline)
Danny's Dad.
http://tournament.fantasysports.yahoo.com/t1/2664431
Literally, Danny's Dad's picks.
-----------------------------------------------------------------------------
Obama's Bracket (Human Baseline)
Barack Obama
http://tournament.fantasysports.yahoo.com/t1/1673628
The President's picks.
-----------------------------------------------------------------------------
MatrixFactorizer
Jasper Snoek
http://tournament.fantasysports.yahoo.com/t1/1597161
Probabilistic matrix factorization augmented with Gaussian Processes
and Bayesian optimization. More details will be forthcoming
in a longer blog post (Update:
here).
-----------------------------------------------------------------------------
LA's Machine Mad Pick
LeAnthony M.
http://tournament.fantasysports.yahoo.com/t1/1647581
I used 2011 Final Four stats data rather than last year's, including RPI,
offensive efficiency, turnovers, and defensive efficiency. A fitness function
based on the final NCAA tournament standings feeds into an evolving genetic
program, giving me a final equation. I then feed this year's field of 64 into
that equation to compute the final standings of the 2013 tournament.
-----------------------------------------------------------------------------
Predict the Madness
Monte McNair
http://tournament.fantasysports.yahoo.com/t1/2002207
???
-----------------------------------------------------------------------------
TheSentinel
Chuck
http://tournament.fantasysports.yahoo.com/t1/2997354
Similar strategy to last year: I used Ken Pomeroy's Pythag ratings with the
log5 calculation to determine the probability of winning each game, then ran a
Monte Carlo simulation with 65 iterations, which produced a few interesting
upsets, e.g., Oregon over Oklahoma St. (I believe they were mis-seeded myself!).
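For reference, here is a small sketch of the two pieces mentioned: the log5 win probability computed from two Pythagorean ratings, and a 65-iteration Monte Carlo simulation of a single matchup. The ratings shown are illustrative numbers, not Pomeroy's actual values.

    import random

    def log5(a, b):
        """Probability that a team with Pythag rating a beats one with rating b."""
        return (a - a * b) / (a + b - 2 * a * b)

    def simulate(a, b, n_sims=65):
        """Fraction of simulated games won by the first team."""
        p = log5(a, b)
        return sum(random.random() < p for _ in range(n_sims)) / n_sims

    print(log5(0.9123, 0.8770))       # single-game win probability
    print(simulate(0.9123, 0.8770))   # Monte Carlo estimate over 65 runs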
-----------------------------------------------------------------------------
Danny's Dangerous Picks
Danny
http://tournament.fantasysports.yahoo.com/t1/1421921
Developed a variant on probabilistic matrix factorization, where the scores
of a game are modeled as the output of a neural network that takes as input
a learned latent vector for each team as well as the elementwise product of the latent vectors for the
two teams.
Latent vectors are learned for each team for each season, jointly with
the neural net parameters, which are shared across all
seasons from 2006-2007 through the present. I used 5D latent vectors and a one
hidden layer neural net with 50 hidden units.
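A minimal numpy sketch of the forward pass as described, where the net's input is [u_i, u_j, u_i * u_j] and the output is the two teams' predicted scores; all parameter values here are random placeholders, whereas in the actual model the latent vectors and net weights are learned jointly across seasons.

    import numpy as np

    rng = np.random.default_rng(0)
    n_teams, k, n_hidden = 64, 5, 50

    U = rng.normal(size=(n_teams, k))                 # per-team, per-season latent vectors
    W1 = rng.normal(scale=0.1, size=(3 * k, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, 2))    # outputs: (score_i, score_j)
    b2 = np.zeros(2)

    def predict_scores(i, j):
        x = np.concatenate([U[i], U[j], U[i] * U[j]])
        h = np.tanh(x @ W1 + b1)
        return h @ W2 + b2

    print(predict_scores(0, 1))                       # predicted (score_0, score_1)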
-----------------------------------------------------------------------------
Human Bracket
Lee
http://tournament.fantasysports.yahoo.com/t1/3297751
The Commissioner's human bracket.
-----------------------------------------------------------------------------
The Rosenthal Fit
Jeffrey Rosenthal
http://tournament.fantasysports.yahoo.com/t1/1666195
Details here:
http://www.tsn.ca/story/?id=418503
-----------------------------------------------------------------------------
Last Year's Winner (Baseline)
Jasper Snoek
http://tournament.fantasysports.yahoo.com/t1/1644140
(The winning algorithm from last year, run on this year's data but otherwise
unmodified. Entered as a baseline.) I modified Danny's starter code in two
ways: First, I added an asymmetric component to the loss function, so the model
is rewarded for getting the prediction correct even if the absolute predicted
scores are wrong. Second, I changed the regularization so that latent vectors
are penalized for deviating from the global average over latent vectors, rather
than being penalized for being far from 0. This can be interpreted as imposing a
basic hierarchical prior.
I then ran a search over model parameters (e.g., latent dimension,
regularization strength, parameter that trades off the two parts of the loss
function) to find the setting that did best on number of correct predictions
made in the past five years' tournaments.
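Here is a sketch, under my own guesses about the exact functional form, of the two modifications described: an extra penalty when the predicted winner is wrong (regardless of the exact scores), and regularization of latent vectors toward their global mean rather than toward zero.

    import numpy as np

    def loss(pred_i, pred_j, true_i, true_j, U, alpha=1.0, lam=0.1):
        # Squared error on the predicted scores.
        sq = (pred_i - true_i) ** 2 + (pred_j - true_j) ** 2
        # Asymmetric part (assumed form): pay extra when the predicted winner
        # differs from the true winner, even if the scores are close.
        wrong_winner = float(np.sign(pred_i - pred_j) != np.sign(true_i - true_j))
        # Penalize latent vectors for deviating from the global mean vector,
        # a simple stand-in for a hierarchical prior.
        reg = np.sum((U - U.mean(axis=0)) ** 2)
        return sq + alpha * wrong_winner + lam * reg

    U = np.random.default_rng(0).normal(size=(64, 5))
    print(loss(75.0, 70.0, 68.0, 71.0, U))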
-----------------------------------------------------------------------------
Leon's Super Legendary Bracket
Leon
http://tournament.fantasysports.yahoo.com/t1/1712730
Defensive efficiency vs. offensive efficiency; tie-breakers favored defense over
offense. Final scores were chosen using season averages in wins and losses.
-----------------------------------------------------------------------------
Tim J's Nets for Nets
Tim J.
http://tournament.fantasysports.yahoo.com/t1/1546944
Based on full-season statistics for each team, I ran a discriminant analysis
for correlation with wins, covering seasons from 2000 to the present.
Then I trained a neural network only on neutral-location games, measuring
performance both by mean squared error and by actual bracket scores from the
2007-2012 tournaments, and used it to predict this year's bracket.
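A rough scikit-learn sketch of a pipeline in that spirit, using stand-ins: a linear discriminant analysis screens season statistics against wins, and a small neural network is then trained on (synthetic) neutral-court matchups; the features, targets, and network size are my assumptions.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    stats = rng.normal(size=(500, 10))                 # full-season team statistics
    won = (stats[:, 0] + 0.5 * stats[:, 1] + rng.normal(size=500) > 0).astype(int)

    lda = LinearDiscriminantAnalysis().fit(stats, won)
    score = lda.transform(stats)[:, 0]                 # one discriminant score per team-season

    # Neutral-court games as pairs of discriminant scores -> point margin (synthetic).
    X = np.column_stack([score[:250], score[250:]])
    margin = 6 * (X[:, 0] - X[:, 1]) + rng.normal(scale=8, size=250)

    net = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
    net.fit(X, margin)
    print(net.predict(X[:1]))                          # predicted margin for one matchup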
-----------------------------------------------------------------------------
natebrix's Neat Bracket
Nate
http://tournament.fantasysports.yahoo.com/t1/1931619
The method is a variation on Boyd Nation's Iterative Strength Rating that
incorporates margin of victory and weights late-season games more strongly. This
link has more:
https://nathanbrixius.wordpress.com/2013/03/20/ncaa-tournament-prediction-model-2013/
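For flavor, here is a heavily simplified sketch of an iterative strength rating of this general kind: each team's rating is repeatedly reset to a weighted average of (opponent rating + capped margin of victory), with later-season games weighted more. This is my own toy version, not Boyd Nation's formula or the model in the linked post.

    import numpy as np

    # Made-up games: (team_a, team_b, margin for team_a, week of season).
    games = [(0, 1, 12, 1), (1, 2, 3, 5), (2, 0, -7, 9), (0, 1, 5, 10)]
    n_teams = 3
    ratings = np.zeros(n_teams)

    for _ in range(50):                              # simple fixed-point iteration
        totals = np.zeros(n_teams)
        weights = np.zeros(n_teams)
        for a, b, margin, week in games:
            w = week / 10.0                          # late-season games count more
            m = np.clip(margin, -20, 20)             # cap blowout margins
            totals[a] += w * (ratings[b] + m)
            weights[a] += w
            totals[b] += w * (ratings[a] - m)
            weights[b] += w
        ratings = totals / np.maximum(weights, 1e-12)
        ratings -= ratings.mean()                    # keep ratings centered at zero

    print(ratings)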
-----------------------------------------------------------------------------
Mark's LR bracket
Mark???
http://tournament.fantasysports.yahoo.com/t1/2504134
Logistic Regression???
-----------------------------------------------------------------------------
Ask me about my T-Rex
Zach Mayer
http://tournament.fantasysports.yahoo.com/t1/1827557
???
-----------------------------------------------------------------------------
ScottyJ's Grand Bracket
???
http://tournament.fantasysports.yahoo.com/t1/1876867
???
-----------------------------------------------------------------------------
Guess O'Bot 3000
???
http://tournament.fantasysports.yahoo.com/t1/1646914
???
-----------------------------------------------------------------------------
Andy's Astounding Bracket
???
http://tournament.fantasysports.yahoo.com/t1/1645698
???
-----------------------------------------------------------------------------
Dan Tran's Dazzling Bracket
???
http://tournament.fantasysports.yahoo.com/t1/1668480
???