Thursday, April 29, 2010

NFL Play-by-Play Data

Posted by Danny Tarlow
Brian over at Advanced NFL Stats, who runs an excellent site, has compiled seven years' worth of play-by-play data for the NFL. It looks like an amazing data set.

Thanks, Brian!

Thursday, April 22, 2010

Reddit Data Release

Posted by Danny Tarlow
Reddit has released some data related to how users vote on stories (links):
The format is username,link_id,vote where vote is -1 or 1 (downvote or upvote). The dump is 29MB gzip compressed and contains 7,405,561 votes from 31,927 users over 2,046,401 links. It contains votes only from users with the preference "make my votes public" turned on (which is not the default). This doesn't have the subreddit ID or anything in there, but I'd be willing to make another dump with more data if anything comes of this one.
It looks very interesting and could be a good data set for recommender-system-style algorithms.
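
Here is a minimal sketch of how one might read data in the username,link_id,vote format described above and get it into a shape that a recommender could use. The function names, the sample usernames, and the sample link IDs are all made up for illustration; I'm only assuming the three-column, comma-separated layout from the announcement.

```python
import csv
import io
from collections import Counter, defaultdict


def load_votes(lines):
    """Parse rows of username,link_id,vote into (user, link, vote) tuples.

    Rows that don't have exactly three fields, or whose vote is not
    -1 or 1, are skipped rather than treated as errors.
    """
    votes = []
    for row in csv.reader(lines):
        if len(row) != 3:
            continue
        user, link, raw_vote = row
        try:
            vote = int(raw_vote)
        except ValueError:
            continue
        if vote not in (-1, 1):
            continue
        votes.append((user, link, vote))
    return votes


def link_scores(votes):
    """Net score per link: upvotes minus downvotes."""
    scores = Counter()
    for _, link, vote in votes:
        scores[link] += vote
    return scores


def votes_by_user(votes):
    """Sparse user -> {link: vote} mapping, a common starting point
    for collaborative filtering."""
    by_user = defaultdict(dict)
    for user, link, vote in votes:
        by_user[user][link] = vote
    return by_user


# Made-up sample rows in the announced format (not real Reddit data).
sample = io.StringIO(
    "alice,t3_aaa,1\n"
    "bob,t3_aaa,-1\n"
    "alice,t3_bbb,1\n"
    "bob,t3_bbb,1\n"
)
votes = load_votes(sample)
scores = link_scores(votes)
by_user = votes_by_user(votes)
```

For the actual dump, Python's gzip.open(path, "rt", newline="") should give a text stream that can be passed straight to load_votes, where path is wherever you saved the file.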

Tuesday, April 6, 2010

Duke is not the only winner from last night

Posted by Lee
Duke isn't the only winner after last night's incredible matchup. Scott Turner (entry: The Pain Machine) tied with Danny for first place in our Sweet 16 bracket (there is no tiebreaker, as we did not ask contestants to predict the final score). Also, congratulations to Danny for having the top entry in both the full and the Sweet 16 brackets!

Thank you to all contestants for participating and for making this a really fun event!

I apologize for the lack of coverage of the results (swamped at work), but I'll be sure to post some results comparisons and more coverage of our inaugural March Madness Predictive Analytics Challenge. In the meantime, you can view the final standings here:
Complete Results for the Full Predictive Bracket
Complete Results for the Sweet 16 Predictive Bracket