Thursday, April 22, 2010

Reddit Data Release

Posted by Danny Tarlow
Reddit has released some data related to how users vote on stories (links):
The format is username,link_id,vote where vote is -1 or 1 (downvote or upvote). The dump is 29MB gzip compressed and contains 7,405,561 votes from 31,927 users over 2,046,401 links. It contains votes only from users with the preference "make my votes public" turned on (which is not the default). This doesn't have the subreddit ID or anything in there, but I'd be willing to make another dump with more data if anything comes of this one.
The data looks very interesting and could be a good test bed for recommender-system algorithms.
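
For anyone who wants to poke at the dump, here is a minimal Python sketch of reading the username,link_id,vote format described above and tallying some basic counts. The function names and the file path passed to load_dump are my own placeholders, not anything Reddit provides, and I'm assuming the file is plain comma-separated text inside the gzip with one vote per line. Lines that don't match the three-field format (a header row, say, if there is one) are simply skipped.

```python
import csv
import gzip
from collections import Counter


def read_votes(lines):
    """Yield (username, link_id, vote) tuples from 'username,link_id,vote' lines."""
    for row in csv.reader(lines):
        # Skip blank lines, headers, or anything that isn't a -1/1 vote record.
        if len(row) != 3 or row[2].strip() not in ("-1", "1"):
            continue
        username, link_id, vote = row
        yield username, link_id, int(vote)


def summarize(votes):
    """Return (distinct user count, distinct link count, Counter of vote values)."""
    users, links, counts = set(), set(), Counter()
    for username, link_id, vote in votes:
        users.add(username)
        links.add(link_id)
        counts[vote] += 1
    return len(users), len(links), counts


def load_dump(path):
    """Summarize a gzip-compressed dump; 'path' is wherever you saved the file."""
    with gzip.open(path, "rt", newline="") as f:
        return summarize(read_votes(f))
```

Calling load_dump on the downloaded file should give totals that can be checked against the user, link, and vote counts quoted above, which is a quick sanity check before trying anything fancier.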

