Thursday, February 25, 2010

AISTATS Papers

Posted by Danny Tarlow
Machine learning people in Toronto had a good showing at the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS) this year. We'll have 10 papers at the upcoming conference in Sardinia:
http://learning.cs.toronto.edu/index.shtml?section=research

I'll post about my paper and publish some code sometime in the next month or two. Congrats to everybody.

Wednesday, February 3, 2010

Hot Button Issues Part 2: ClimateGate

Posted by Danny Tarlow
Derek, a friend of mine from high school whom I respect a lot, and I have been having a conversation in the comments of another post, and the topic has turned to "ClimateGate". It's interesting because we have quite different viewpoints (I, for example, am pretty uninformed but will usually take the side of the scientists when all else is equal). The bigger issue, though, is how we, as non-experts in climate science, can make sense of so many conflicting stories, with bias and politically charged agendas looming at every turn.

There are a lot of issues at play here (which I find very interesting), and I don't think I can properly address them all today (or maybe ever). I think this is good fodder for several full posts, though. It also ties in with the post I wrote a while back about scientific controversies and hot-button issues.

On to ClimateGate. Derek points me to Conservapedia as a more reliable source than Wikipedia:
The reason I pointed you to Conservapedia is because they do allow primary sources and original work to be included in their articles.
This is surprising to me, but I'm happy to go with it--the more primary sources the better.

The first thing to note is that there is a lot going on here. The first subsection is about Data Manipulation, so it seems reasonable to start reading there. The main issue seems to be a piece of source code shown in the cited article titled "The Proof Behind the CRU ClimateGate Debacle". After some Googling, the directory the code was taken from looks to be here:
http://www.di2.nu/foia/osborn-tree6/

Specifically, a comment in the file briffa_sep98_d.pro says:
;
; Apply a VERY ARTIFICAL correction for decline!!
;
This seems to be one of a set of four files, "briffa_sep98_a.pro" through "briffa_sep98_d.pro". The other three files (a through c) don't seem to have this comment.

One thing I can say from personal experience is that it's not uncommon to make up data at some point in the research process. It can actually be good practice, because it lets you verify that your code is working as expected: for example, if you make up some crazy or random data and you're still getting good results, you should really start to question your methods, because you're doing something wrong. A good illustration is the study that "discovered" (haha) that a dead salmon can perceive human emotions:
http://www.wired.com/wiredscience/2009/09/fmrisalmon/
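
To make that concrete, here is a minimal sketch in Python (with numpy; it has nothing to do with the actual CRU code) of the kind of sanity check I mean: fit a model to pure noise and confirm that the held-out performance is as bad as it should be.

import numpy as np

# Sanity check: fit a linear model to pure noise. If held-out performance
# looks good here, something in the pipeline is broken (e.g., test data
# leaking into training).
rng = np.random.default_rng(0)
n_train, n_test, n_features = 200, 200, 10
X_train = rng.normal(size=(n_train, n_features))
y_train = rng.normal(size=n_train)   # random targets: no real signal
X_test = rng.normal(size=(n_test, n_features))
y_test = rng.normal(size=n_test)

# Ordinary least squares fit on the noise.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Held-out R^2 should be around zero (or negative) on pure noise.
pred = X_test @ w
ss_res = np.sum((y_test - pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
print("held-out R^2 on pure noise:", 1 - ss_res / ss_tot)

If that number ever came out looking impressive, I'd go hunting for a bug before believing any real result produced by the same code.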

Anyhow, I'm not sure what the point of applying this artificial adjustment would be in the climate case (I won't pretend to understand how all that code fits together), but then again, if you were trying to hide something devious, it doesn't seem like a good idea to make it stand out with a big three-line comment, in caps, emphasizing how artificial it is.

So in and of itself, one comment about artificial changes to an array in a huge directory of code doesn't seem like that big of a deal to me. This is a change in the final plotting of the results (not something buried deep in a model), so the important questions are which graphs this code produced, where they were published, and what claims they were used to support. If this code could be shown to have produced a figure in a published paper or influential presentation (and it was not explained as being artificial), it would be a very big deal in my eyes.
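
Just to illustrate the distinction I'm drawing (this is a made-up example in Python, not the CRU code, and every name and number in it is hypothetical), a correction applied at plot time leaves the underlying data and model untouched and only changes what the figure shows:

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical illustration only -- not the CRU code.
years = np.arange(1900, 2000)
series = np.sin((years - 1900) / 15.0)   # stand-in for some reconstructed series

# A made-up "correction" added to the most recent values right before plotting.
adjustment = np.zeros_like(series)
recent = years >= 1960
adjustment[recent] = np.linspace(0.0, 0.5, recent.sum())

plt.plot(years, series, label="raw series")
plt.plot(years, series + adjustment, label="adjusted series (plot only)")
plt.legend()
plt.show()

Whether something like this is innocuous scaffolding or misleading depends entirely on whether the adjusted curve ever made it into a published figure without being explained, which is exactly the question I'd want answered.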

I haven't read the other sections or main issues, so I won't comment on them now.

At a broader level, I absolutely agree with the criticisms of the scientists for failing to release data and code. Especially for controversial issues like this one, I think it's important to let anybody who wants to run your code and reproduce every figure and table in your results do so. (I try to do this on my blog, but I admit I could do better in my research. It's something I am working on, but it does take work.) Not releasing code and data doesn't mean their conclusions are wrong, but I don't think they're upholding the spirit of science.
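
For what it's worth, the kind of thing I have in mind is a single script that rebuilds every reported number and figure from the raw data with a fixed seed. Here's a toy, self-contained sketch (all the file names, functions, and numbers in it are made up):

# reproduce.py -- toy sketch of "one command regenerates every result."
import numpy as np
import matplotlib
matplotlib.use("Agg")   # write image files; no display needed
import matplotlib.pyplot as plt

def load_data():
    # Stand-in for loading the released raw data from disk.
    rng = np.random.default_rng(0)   # fixed seed: everyone gets the same numbers
    x = np.linspace(0, 10, 100)
    y = 2.0 * x + rng.normal(scale=1.0, size=x.size)
    return x, y

def run_experiment(x, y):
    # Stand-in for the actual analysis: a least-squares line fit.
    return np.polyfit(x, y, deg=1)

def make_figure(x, y, coeffs, path="figure1.png"):
    plt.figure()
    plt.scatter(x, y, s=5, label="data")
    plt.plot(x, np.polyval(coeffs, x), label="fit")
    plt.legend()
    plt.savefig(path)

if __name__ == "__main__":
    x, y = load_data()
    coeffs = run_experiment(x, y)
    make_figure(x, y, coeffs)
    print("slope reported in the paper:", round(float(coeffs[0]), 2))

The structure matters more than the details: raw data in, every figure and table out, with no manual steps in between.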

However, if somebody finds an error in the code (which is 100% plausible) and wants to dispute the results, I do think peer review is the proper venue--not blogs or popular media. You can't expect a scientist to defend himself or herself from every blog post or news article out there; it would be a full-time job and an extremely frustrating battle, which I wrote more about here. Most scientific journals that I know of will publish notes that point out errors in papers they've published (see, e.g., the discussion here). If you find an error, send it to the editorial board; they will verify it and ask the scientist to respond. If you come up with a better way of doing things, write a paper and publish it.

Now, you can further question the foundations of peer review or the bias of a scientific group, but that is a much bigger topic that will have to wait for another day.

Finally, this quote did resonate with me:
Climate researchers know their prescriptions don't carry the certainty laymen assume from that which is labeled "science," yet most shy from a straightforward account of this uncertainty.

"Methods certainly need to be continually refined and improved. I doubt that anyone in the paleoclimate community would disagree with that," says Rob Wilson of the University of St. Andrews's School of Geography and Geosciences. "However, can the nuances of methodological developments be communicated to the laymen—and would they want to know?"
Wilson goes on to say that he doesn't think people would want to know. I disagree, but I also don't know how to communicate the nuances effectively. Much of science takes very smart people tens of years to really learn, and the conclusions are often of the form, "we think this, but we're not totally sure". On top of that, scientists are often not the best communicators in the world. It takes a rare and special person to figure out how to distill these complex ideas, nuances, and uncertainties into explanations that people can understand. I think it's absolutely something scientists should continually be thinking about, and I do think scientists should be open to audit by the public, so long as that doesn't require them to spend all their time responding to unfounded criticisms.