Friday, October 30, 2009

New York Times data

Posted by Danny Tarlow
I haven't checked it out yet, but this looks pretty interesting:
For more than 150 years, The New York Times has meticulously indexed its archives. Through this process, we have developed an enormous collection of subject headings, ranging from “Absinthe”[1] to “Zoos”[2]. Unfortunately, our list of subject headings is an island. For example, even though we can show you every article written about “Colbert, Stephen [3],” our databases can’t tell you that he was born on May 13, 1964, or that he lost the 2008 Grammy for best spoken word album to Al Gore. To do this we would need to map our subject headings onto other Web databases such as Freebase and DBPedia. So that’s exactly what we did.
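
As a rough illustration of why such a mapping matters, here is a minimal Python sketch of the idea: once a subject heading is tied to an identifier in an outside database, a lookup can return both the articles and the outside facts. Everything here is an assumption for illustration, not the Times' actual data format or API: the dictionaries, the article IDs, and the `lookup` helper are hypothetical, and the DBpedia URI is just used as an example identifier. The birth date is the one mentioned in the quote.

```python
from datetime import date

# Hypothetical stand-in for the subject-heading index: heading -> article IDs.
articles_by_heading = {
    "Colbert, Stephen": ["article-001", "article-002"],
}

# Hypothetical mapping from a subject heading to an external identifier.
heading_to_external = {
    "Colbert, Stephen": "http://dbpedia.org/resource/Stephen_Colbert",
}

# Hypothetical stand-in for facts stored in the external database.
external_facts = {
    "http://dbpedia.org/resource/Stephen_Colbert": {
        "birth_date": date(1964, 5, 13),
    },
}


def lookup(heading):
    """Join the local index with external facts through the mapping."""
    uri = heading_to_external.get(heading)
    return {
        "heading": heading,
        "articles": articles_by_heading.get(heading, []),
        "external_id": uri,
        # An unmapped heading has no URI, so it gets no external facts.
        "facts": external_facts.get(uri, {}),
    }


result = lookup("Colbert, Stephen")
print(result["articles"])
print(result["facts"].get("birth_date"))
```

Without the middle dictionary, the index really is an island: the articles are there, but nothing connects the heading to what other databases know about the same person.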
