Monday, July 20, 2009

Summer school wrap up

Posted by Danny Tarlow
I've been quiet recently on the blog front for two pretty significant reasons: (a) I've been busy attending summer school, talking with interesting people, hanging out on the beach, and traveling through Italy; and (b) I've had very little prolonged, reliable, free internet access. Most of the times I've gotten online have been to hastily answer a few emails and maybe book a train, flight, or hotel for the next city.

I'm now settled down a bit in Paris, so I thought I'd give a quick recap of summer school.

The over-arching theme of the school was how to use machine learning to attack hard problems in computer vision. There is plenty of overlap in the tools and problems that researchers in both fields think about, but there is a bit of prejudice on both sides. The oversimplified version is that machine learning people think computer vision people use ad hoc algorithms that are over-engineered to work well only on the specific data sets they're using; and computer vision people think that machine learning people are too high on their pedestals, coming up with methods that may be "pretty," but that would never work on large enough problems to be of any interest.

In reality, neither of these views is completely true, and we saw some great talks at the school showing how machine learning can yield well-justified approaches to computer vision problems that are both efficient and powerful.

All of the talks were very good, but two of the highlights for me were Pushmeet Kohli's talk on using graph cuts for energy minimization, and Rob Fergus's talk about internet-scale image retrieval.

Pushmeet's abstract:
Over the last few years energy minimization has emerged as an indispensable tool in computer vision. The reason for this rising popularity has been the successes of efficient Graph cut based minimization algorithms in solving many low level vision problems such as image segmentation, object recognition, and stereo reconstruction. This tutorial will explain how these algorithms work, and what classes of minimization problems they can solve.
The nice part of the talk was that he went far beyond the standard case, where you have only two classes (e.g., foreground and background) with submodular potentials (crudely: nearby pixels should prefer to take the same label). In the past 5 or so years, some serious advances have been made in making these methods applicable to a wide variety of interesting, realistic problems. He talked about the non-discrete case, the non-submodular case, the multi-label case, and the higher-order potentials case.
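To make the standard case concrete, here is a minimal sketch of binary MRF energy minimization by a single s-t min-cut, with made-up unary costs and a Potts pairwise term (which is submodular for two labels). It uses networkx's generic minimum_cut rather than a specialized max-flow solver, so it illustrates the graph construction, not the fast algorithms Pushmeet covered.

```python
import networkx as nx
import numpy as np

# Toy binary labeling on a small grid. Energy:
#   E(x) = sum_p theta_p(x_p) + lam * sum_{p~q} [x_p != x_q]
# For two labels with a Potts pairwise term this is submodular, so an s-t
# min-cut finds the exact minimum.

H, W, lam = 4, 4, 0.5
rng = np.random.default_rng(0)
unary = rng.random((H, W, 2))                 # made-up unary costs theta_p(0), theta_p(1)

G = nx.DiGraph()
for i in range(H):
    for j in range(W):
        p = (i, j)
        # s->p is cut (paid) when p lands on the sink side, i.e. x_p = 0
        G.add_edge('s', p, capacity=float(unary[i, j, 0]))
        # p->t is cut (paid) when p lands on the source side, i.e. x_p = 1
        G.add_edge(p, 't', capacity=float(unary[i, j, 1]))
        # Potts pairwise term: pay lam whenever neighbors take different labels
        for q in [(i + 1, j), (i, j + 1)]:
            if q[0] < H and q[1] < W:
                G.add_edge(p, q, capacity=lam)
                G.add_edge(q, p, capacity=lam)

cut_value, (source_side, sink_side) = nx.minimum_cut(G, 's', 't')
labels = np.zeros((H, W), dtype=int)
for node in source_side - {'s'}:
    labels[node] = 1                          # source-side pixels take label 1
print(labels)
print('minimum energy:', cut_value)
```

The extensions he covered (non-submodular, multi-label, higher-order) all start from this same construction but relax one of its assumptions.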

Rob Fergus's abstract:
Existing object recognition methods are not practical to use on huge image collections found on the Internet and elsewhere. Recently, a number of computer vision approaches have been proposed that scale to millions of images. At their core, many of them rely on machine learning techniques to learn compact representations that can be used in efficient indexing schemes. My talk will review this work, highlighting the common learning techniques used and discuss open problems in this area.
The nice part about this talk was that he chose approaches that were extremely practical. Rather than taking a method (say, machine learning) and applying it to a problem whether it fits or not, the approach here was gentler: start with efficient algorithms and data structures, then apply a small amount of machine learning only where it produces the largest gains -- in this case, learning hash functions.
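As a rough illustration of the indexing idea (not Rob's specific method), here is a sketch of the simplest baseline: random-hyperplane hashing, where image descriptors are mapped to short binary codes and retrieval is a Hamming-distance lookup. The learned hash functions he discussed keep this same pipeline but replace the random projections with ones trained to preserve neighborhood structure; all the sizes and data below are made up.

```python
import numpy as np

# Map descriptors to short binary codes, then retrieve by Hamming distance.
# The projections here are random rather than learned; learned hashing keeps
# the indexing scheme but chooses the projections from data.

rng = np.random.default_rng(0)
n, d, n_bits = 10000, 128, 32          # made-up database size, descriptor dim, code length
database = rng.normal(size=(n, d))     # stand-in for image descriptors

hyperplanes = rng.normal(size=(d, n_bits))

def encode(x):
    """Binary code: sign of each random projection, as a 0/1 array."""
    return (x @ hyperplanes > 0).astype(np.uint8)

codes = encode(database)               # (n, n_bits) binary codes, 4 bytes per image at 32 bits

def query(q, k=5):
    """Return indices of the k database items closest to q in Hamming distance."""
    dists = np.count_nonzero(codes != encode(q), axis=1)
    return np.argsort(dists)[:k]

print(query(database[42]))             # the item itself should come back first
```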

Also, if you're clicking around the ICVSS page, don't miss this one =P:
