Monday, March 15, 2010

Updated Player Data

Posted by Lee
The original announcement post did not include 2010 player data. I have now added it to the player data.

Although the 2010 data broadly follows the format of the other player data and I tried as hard as possible to match players correctly, it's possible that some players are matched inaccurately. This can happen when two players on a team share a name, or when I assumed a player transferred schools but the record actually belongs to a different player with the same name on another team.

While running some checks on the 2010 data, I realized that I had made a mistake in the hashing of the original data set. Players were being mapped to each other when they had the same name, rather than when they had the same hash. I've fixed that and re-uploaded all the player data. Sorry about this; I hope it doesn't severely inconvenience anyone.
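
To illustrate the kind of mistake described above, here is a minimal sketch in Python. The record layout, the field names (`id`, `name`, `team`, `year`), and the sample values are all made up for illustration; the post does not say how the hash is actually computed, so `id` simply stands in for it.

```python
from collections import defaultdict

def group_records(records, key):
    """Group season records by the given field (hypothetical helper)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    return dict(groups)

# Invented sample records: two different players who share a name.
records = [
    {"id": "a1", "name": "J. Smith", "team": "Team A", "year": 2009},
    {"id": "a1", "name": "J. Smith", "team": "Team A", "year": 2010},
    {"id": "b2", "name": "J. Smith", "team": "Team B", "year": 2010},
]

# Keying on the name merges both players into a single group (the bug).
by_name = group_records(records, "name")

# Keying on the ID keeps the two players separate (the intended behavior).
by_id = group_records(records, "id")

print(len(by_name), len(by_id))
```

Grouping on the name collapses the two players into one entry, while grouping on the ID keeps them apart, which is the difference the fix addresses.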

The most up-to-date version of the data is available at

The 2010 player data has a slightly different schema (sorry!). It includes three sets of field goal figures: field goals made and attempted excluding three-pointers, field goals made and attempted including three-pointers, and three-pointers made and attempted. Also note that the last four columns are in a slightly different order. The columns are:
  • ID (GUID)
  • Name
  • Height
  • Position
  • Team
  • Year
  • Class (Freshman, Sophomore, Junior, Senior)
  • Games - the number of games the player participated in
  • Field goals (shots) made, excluding three point shots
  • Field goal attempts, excluding three point shots
  • Field goals (shots) made, including three point shots
  • Field goal attempts, including three point shots
  • Three point shots made
  • Three point shots attempted
  • Free throws made
  • Free throw attempts
  • Rebounds
  • Assists
  • Steals
  • Blocks
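
For anyone loading the 2010 file, the column list above could be turned into a small parser along these lines. This is only a sketch: it assumes the data is comma-separated with no header row, which the post does not confirm, and the column labels, the `parse_players` and `field_goals_consistent` helpers, and the sample row are all invented for illustration.

```python
import csv
import io

# Hypothetical labels for the 20 fields listed above, in the same order.
COLUMNS_2010 = [
    "id", "name", "height", "position", "team", "year", "class", "games",
    "fg2_made", "fg2_att",   # field goals excluding three point shots
    "fg_made", "fg_att",     # field goals including three point shots
    "fg3_made", "fg3_att",   # three point shots
    "ft_made", "ft_att", "rebounds", "assists", "steals", "blocks",
]

# Everything from "games" onward is treated as an integer count.
INT_FIELDS = COLUMNS_2010[7:]

def parse_players(text):
    """Parse comma-separated player rows (assumed format) into dicts."""
    rows = []
    for raw in csv.reader(io.StringIO(text)):
        if len(raw) != len(COLUMNS_2010):
            raise ValueError(f"expected {len(COLUMNS_2010)} fields, got {len(raw)}")
        row = dict(zip(COLUMNS_2010, raw))
        for field in INT_FIELDS:
            row[field] = int(row[field])
        rows.append(row)
    return rows

def field_goals_consistent(row):
    """Check that the totals including threes equal the two-point plus three-point figures."""
    return (row["fg_made"] == row["fg2_made"] + row["fg3_made"]
            and row["fg_att"] == row["fg2_att"] + row["fg3_att"])

# Invented sample row, not taken from the real data set.
SAMPLE = "abc-123,Sample Player,6-2,G,Sample U,2010,Junior,30,100,220,150,350,50,130,80,100,120,90,40,10\n"

players = parse_players(SAMPLE)
print(players[0]["name"], field_goals_consistent(players[0]))
```

The consistency check is a cheap way to confirm that the three sets of field goal columns were read in the right order, since the figures including three-pointers should equal the sum of the other two sets.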


Eric said...

Hey, thanks for making all of this available. I noticed that the 2010 season data (march_madness_2009_2010_3.tgz) is missing games after 3/7/10. Do you think you'll be able to add those in? Syracuse's loss at the end of the season is especially important!


Danny Tarlow said...

Hi Eric, I've updated this post. You should be able to get the data at the same place but replacing the 3 with a 4. Let me know if that is missing any games (it shouldn't be).

Hugues said...

Thanks for the data. The 2010 player data has 2 additional columns. What are they? Thanks

Eric said...

Thanks, looks good.

Lee said...

Yes, I just realized that it has two extra columns. Apologies -- I did not mean to leave those in. The extra columns are the FG and FGA with 3 point shots made and attempted included, respectively. I've updated the post to reflect this schema. Also, in checking my columns, the ordering is slightly different at the end. Please be aware of this. I will make notes on both the posts providing data.

David said...

Hey, has anyone read the book Mathletics? I just started reading it and am looking into changing the algo to match the concepts in the book.