Thursday, August 11, 2011

Testing Intuitions about Markov Chain Monte Carlo: Do I have a bug?

Posted by Danny Tarlow
For one project I've been working on recently, I'm using a Markov Chain Monte Carlo (MCMC) method known as slice sampling. There are some good tutorials, examples, and implementations out there (e.g., by Iain Murray or Radford Neal), but for various reasons, I wanted to implement my own version.

Now, debugging MCMC algorithms is somewhat troublesome, due to their random nature and the fact that chains just sometimes mix slowly, but there are some good ways to be pretty sure that you get things right. For example, the Geweke method is highly regarded as _the_ method to make sure you're getting it right. So this exercise is not actually really about debugging. It's more about testing intuitions about the behavior of a sampler.

With that out of the way, on to the question:
I implemented my sampler, initialized it with small random numbers for the parameters, and set it running on a simple test model (which I'm intentionally not describing in detail). One high level statistic that is relevant to look at is the (log) probability of samples versus iteration of the sampler, so I made that plot. It looks like this:
This plot looks a bit surprising. Upon initialization, the sampler moves directly to regions of space that have very low probability (remember, this is a _log_ probability*), and it appears to just keep going to exponentially less and less probable regions. The point of a sampler is that it should spend an amount of time in a state in proportion to the state's probability. And this sampler is making a beeline to a state that is e^-600 times less probable than where it started.

So here's the question: do I have a bug? In other words, if you were my supervisor and I came to you with this plot, would you dismiss this plot and send me back to debugging. If not, explain how this possibly could make sense.

I'll post my answer sometime in the next couple days.

* I'm leaving out constants, so the graph would be shifted down (but wouldn't change shape or scale) if I was including all the constants.

Sunday, April 10, 2011

Crawling Code

Posted by Lee

One of the contestants requested that I upload the code to crawl the boxscores. I have done so and it is available on github: https://github.com/leezen/boxscore-crawler

Note that Yahoo changed its format starting around March 10th and the code uses the flag to get the old boxscore format. It is unclear how long this option will remain available from Yahoo.

2011 Predictive Analytics Challenge Winner

Posted by Lee

We knew it would be a machine, but we didn't know which one until UConn's victory ensured The Pain Machine the title as winner of the 2011 March Madness Predictive Analytics Challenge! Congraulations to Scott Turner and his entry on the victory -- his second in a row! He will be receiving a $25 gift certificate to Amazon.com. It doesn't sound like he'll be resting on his laurels as he's started a blog that will detail further development of his system.

Thank you to all the participants and entrants this year. We would love to know how you thought the contest went, how we can improve for next year, and any other feedback you might have! We look forward to your participation again next year!

Sunday, March 27, 2011

2011 Algorithmic March Madness: Machines Lock in Victory over Humans

Posted by Danny Tarlow
Well that was an exciting and surprising weekend of basketball. (8)Butler beat (2)Florida in overtime, and (11)Virginia Commonwealth handily took care of (1)Kansas, eliminating the last remaining #1 seed. Rounding out the Final Four are (3)UConn and (4)Kentucky.

March Madness is usually good for some Cinderella stories, but this Final Four seems particularly improbable. There are no #1 seeds remaining (this has been the case 3 times in March Madness history, according to the TV announcers), and never before have teams seeded as low as #11 and #8 met in a Final Four game.

None of the entries in our second annual March Madness Algorithm Challenge saw this amount of madness coming. Two entrants correctly predicted that (3)UConn would make the final four (Team Delete Kernel and The Pain Machine), but no other algorithms or baselines got any Final Four teams correct. In fact, the only entry that has any more chance at points is The Pain Machine, which has UConn winning one more game.

So the question of which algorithm will win the contest is still not settled: a UConn victory on April 2, and The Pain Machine walks home with the prize; a UConn loss, and Team Delete Kernel is our winner.

What is settled at this point is that a machine will claim victory over the human-aided competition. The human baselines include our commissioner Lee's bracket; the Higher Seed bracket (where the human intervention came via the committee that chose seeds); and the Nate Silver baseline, which was a part-human, part-computer effort.

To give you an idea of the potentially winning methodologies, Scott Turner (The Pain Machine) describes his approach here, and Kevin Lee (Team Delete Kernel) based his model on the method described here.

So it's premature to congratulate a winner yet, but let me tritely say that I, for one, welcome our new March Madness algorithm overlords.
ENTRANT                         R1  R2  R3  R4  R5  Winner      Pts  Possible
Team Delete Kernel              23  20  16  8   -   Ohio St.    67   67
Human (Lee)*                    25  18  16  0   -   Kansas      59   59
Baseline (Higher Seed)*         25  20  12  0   -   Ohio St.    57   57
The Pain Machine                19  18  8   8   -   Kansas      53   69
Baseline (Nate Silver)*         25  20  8   0   -   Ohio St.    53   53
InItToWinIt                     22  20  8   0   -   Kansas      50   50
Baseline (TrueSkill)            26  18  4   0   -   Ohio St.    48   48
Danny's Dangerous Picks         22  16  8   0   -   Duke        46   46
Baseline (LRMC)                 25  16  4   0   -   Ohio St.    45   45
DukeRepeats                     23  16  4   0   -   Duke        43   43
Point Differential Centrality   23  16  4   0   -   Ohio St.    43   43
dirknbr1                        23  8   4   0   -   Ohio St.    35   35

* Denotes human-involvement.
You can see the full brackets here. Also, there is a second, Sweet 16 contest that we haven't mentioned lately. Stay tuned for an update on that front.

Friday, March 25, 2011

Algorithmic March Madness: Elite 8 Update

Posted by Danny Tarlow
The Sweet 16 saw some major upsets, with two of the remaining three number one seeds falling. Perhaps most impressive was Arizona's (5 seed) 93-77 thumping of Duke (1 seed). In the Elite 8, we have the following seeds: 4, 2, 5, 3, 1, 11, 8, 2.

No entrants predicted that Baylor Butler (8 seed) or VCU (11 seed) would have made it this far, but we're starting to see the first signs of weakness in the Higher Seed and Nate Silver baselines. At this point, our Commissioner's human-chosen baseline bracket is tied for first with Team Delete Kernel's algorithmically chosen bracket. Don't count out InItToWinIt or The Pain Machine quite yet, though. I haven't worked through all the scenarios, but if Kansas wins it all and Arizona beats UConn, InItToWinIt has a good shot at the title. If UConn makes it to the final game, The Pain Machine is looking strong. And if Kentucky wins it all, Team Delete Kernel looks to have at least a share of the title locked up. Interestingly, I haven't been able to find a potential outcome where any baseline other than Lee's human-chosen bracket has a chance at the title. So after a tough start, the algorithmically-chosen brackets are making their move!

Currently, here are the standings:
ENTRANT                         R1  R2  R3  R4  R5  Winner      Pts  Possible
Human (Lee)                     25  18  16  -   -   Kansas      59   131
Team Delete Kernel              23  20  16  -   -   Ohio St.    59   91
Baseline (Higher Seed)          25  20  12  -   -   Ohio St.    57   81
Baseline (Nate Silver)          25  20  8   -   -   Ohio St.    53   77
InItToWinIt                     22  20  8   -   -   Kansas      50   106
Baseline (TrueSkill)            26  18  4   -   -   Ohio St.    48   48
Danny's Dangerous Picks         22  16  8   -   -   Duke        46   70
The Pain Machine                19  18  8   -   -   Kansas      45   125
Baseline (LRMC)                 25  16  4   -   -   Ohio St.    45   69
DukeRepeats                     23  16  4   -   -   Duke        43   67
Point Differential Centrality   23  16  4   -   -   Ohio St.    43   51
dirknbr1                        23  8   4   -   -   Ohio St.    35   59

Tuesday, March 22, 2011

Sweet Sixteen Bracket

Posted by Lee

If you did not have a chance to get your entry in for the full 63-pick tournament, you still have a chance to make picks in the condensed 15-pick format for our Sweet Sixteen Bracket! The next set of games begin Thursday night (March 24) Eastern time, so you will need to submit your brackets by then. If you already entered in the original bracket, you have been invited to the new one. If you are new and would like to participate, please e-mail leezen+MarchMadness at gmail.

Currently, in the full bracket prediction contest, Team Delete Kernel and InItToWinIt are sitting in the top half of the pack. Don't count The Pain Machine out yet, though, it can still score a possible 153 points, which is more than TrueSkill has available even though it currently ranks third.

P.S. Sorry I did not get around to posting competitor profiles this weekend.

Monday, March 21, 2011

After Round 2: Points from Upsets

Posted by Danny Tarlow
You can see the current standings here. At the moment, 3 of the top 5 (including the top 2) entries used some sort of human judgement in their algorithm: picking based on Higher Seed (tied for 1st) is the judgement of the seeding committee; the Nate Silver (tied for 1st) baseline uses seed information along with several other power ratings, some of which are human based, and some of which are computer based; and the Lee bracket (tied for 4th) was filled out by Lee with no computer assistance. The top two computer entries are the TrueSkill algorithm (3rd) that Scott Turner implemented and the Delete Kernel (tied for 4th) entry from Kevin, who built his entry based on the simplest 1D probabilistic matrix factorization model that I wrote about previously (and released code for).

I think the take-away at this point is that the real winner right now is whoever decided on the seeding of teams. The Higher Seed bracket is tied in first place, and the strength of the other brackets mostly comes from how closely their picks matched the higher seed. Yes, there have been a lot of upsets, including some big ones, and the entrants did indeed pick some upsets, but the entrants didn't generally pick the right upsets.

Here's an alternative way of looking at results that reveals this. I took the point totals for each contestant and split off the contribution to the point total that came from picking an upset. For the two rounds, I report "A/B" where B is the total number of points the entrant earned in the Yahoo bracket, and A is the number of points that came from predicting upsets. I define "upset points" to be points gained from a correct prediction, where the winning team is not the best ranked seed that could possibly have made it to that point. So even though Richmond (12) was the favorite over Morehead (13), predicting Richmond making it to the Sweet 16 would give a contestant 2 "upset points", because the best ranked seed that could have made it to that point was Louisville (4). Here are the results:
Points from upsets

TEAM                   R1      R2     Total
Delete Kernel:        4/23    2/20    6/43
InItToWinIt:          2/22    2/20    4/42
Human (Lee):          1/25    2/18    3/43
Point Differential:   3/23    0/16    3/39
Silver:               3/25    0/20    3/45
LRMC:                 3/25    0/16    3/41
Dirknbr1:             0/23    2/8     2/31
Danny:                2/22    0/16    2/38
TrueSkill:            2/26    0/18    2/44
The Pain Machine:     0/19    0/18    0/37
Higher Seed:          0/25    0/20    0/45
Under this evaluation measure (which is admittedly not what anybody was trying to optimize), the completely computer-based models are doing better. Perhaps the real take-away at the moment, though, is that predicting upsets is hard!