Friday, October 30, 2009

New York Times data

Posted by Danny Tarlow
I haven't checked it out yet, but this looks pretty interesting:
For more than 150 years, The New York Times has meticulously indexed its archives. Through this process, we have developed an enormous collection of subject headings, ranging from “Absinthe” to “Zoos”. Unfortunately, our list of subject headings is an island. For example, even though we can show you every article written about “Colbert, Stephen,” our databases can’t tell you that he was born on May 13, 1964, or that he lost the 2008 Grammy for best spoken word album to Al Gore. To do this we would need to map our subject headings onto other Web databases such as Freebase and DBPedia. So that’s exactly what we did.
http://open.blogs.nytimes.com/2009/10/29/first-5000-tags-released-to-the-linked-data-cloud/
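
I haven't looked at the released files yet, but mappings like this are usually published as RDF, with owl:sameAs triples pointing at the matching DBpedia and Freebase resources. Assuming that's the shape of the NYT data (the file name below is just a placeholder, not the real download), a few lines of Python with rdflib would be enough to poke around:

    # Sketch: print each subject heading and the external resources it maps to.
    # Assumes SKOS labels and owl:sameAs links -- adjust once the real schema is known.
    from rdflib import Graph
    from rdflib.namespace import OWL, SKOS

    g = Graph()
    g.parse("nyt_subject_headings.rdf")  # placeholder path

    for heading in g.subjects(SKOS.prefLabel, None):
        label = g.value(heading, SKOS.prefLabel)
        for target in g.objects(heading, OWL.sameAs):
            print(label, "->", target)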

Monday, October 19, 2009

Emacs for C++

Posted by Danny Tarlow
I really need to try this: http://nflath.com/2009/10/c-customizations-for-emacs/

Tuesday, October 13, 2009

NIPS 2009 Workshops

Posted by Danny Tarlow
The list of NIPS 2009 workshops is also up (and has been for a while, I think):
http://nips.cc/Conferences/2009/Program/schedule.php?Session=Workshops

As usual, there appear to be a number of interesting sessions. The ones that caught my eye are the following:

Twitter and Toronto

Posted by Danny Tarlow
In case data collection for the other option falls through, my "Government 2.0" group is also planning out a Toronto-based app that uses Twitter data as the primary data source. The ground rules are pretty loose: come up with an application that uses data about the city of Toronto and serves some subset of people in Toronto.

Being a group of four AI/machine learning-ish PhD students, we think it would be fun to take a stab at some fairly substantive machine learning problems -- and there seems to be no shortage of need for this on Twitter. Of course, the first step toward an interesting machine learning application is an interesting way of looking at the data. More to come...
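
Grabbing the raw material should be the easy part: as I understand it, Twitter's search API serves JSON without authentication, so a first pass at a Toronto-flavored tweet collector is only a few lines. Treat this as a sketch -- the endpoint and the from_user/text fields are my reading of the search API, not tested code:

    # Sketch: pull recent tweets mentioning Toronto from the JSON search API.
    import json
    import urllib.request

    url = "http://search.twitter.com/search.json?q=toronto&rpp=100"
    with urllib.request.urlopen(url) as resp:
        results = json.load(resp)["results"]

    for tweet in results:
        print(tweet["from_user"], ":", tweet["text"])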

Smart meters

Posted by Danny Tarlow
I had a meeting today with some U of T colleagues and a couple of people at Toronto Hydro for the "Government 2.0" course I'm currently enrolled in. Like most people, I had heard the buzz about "The Smart Grid" and "Smart Meters," but until recently I didn't realize that the smart meter project is not only real but also up and running, collecting data in hundreds of thousands of homes across the Toronto area. From the Toronto Hydro site:
As of August 31, 2009, we had 602,418 smart meters installed across our city and we continue with our installation schedule.
We went in today, got more details about the data, and saw a quick demo of what it looks like: for each house with a smart meter running, they get hourly electricity consumption readings, 24 hours a day. This is pretty cool -- it's to the point where you can see a spike in electricity use at 10pm and start speculating about whether somebody just turned on the dryer. Even from the raw readings alone, you can start to get an idea of what the causes of electricity consumption might be.
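
Even that kind of eyeballing suggests a natural first experiment: automatically flag the hours where a house's consumption jumps well above its own baseline. Here's a toy sketch of the idea -- the data format is made up (one kWh reading per hour for a single house), and a real detector would want to model daily and seasonal patterns rather than a single day's mean:

    import statistics

    def find_spikes(readings, threshold=2.0):
        """Return the hours whose usage exceeds the mean by `threshold` stdevs."""
        mean = statistics.mean(readings)
        stdev = statistics.stdev(readings)
        return [hour for hour, kwh in enumerate(readings)
                if kwh > mean + threshold * stdev]

    # Made-up day: flat usage with a jump at hour 22 (10pm) --
    # maybe somebody turned on the dryer.
    day = [0.6] * 22 + [3.1, 0.7]
    print(find_spikes(day))  # -> [22]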

So the question now is how to formulate a project that could make good use of this data. There are some simple ground rules: no privacy violations, and nothing too controversial (e.g., no law enforcement applications). Other than that, it's pretty wide open. What would you do with this data?