Wednesday, October 20, 2010

An Algorithm to Generate Impossible Art?

Posted by Danny Tarlow
I posted last time about constraint satisfaction problems that are locally consistent (arc-consistent) everywhere but inconsistent globally. Part of the interesting bit was the connection to "impossible" Escher-style art, such as the impossible triangle.

This is really only the beginning, though. There should be many more examples where we have local consistency but global inconsistency. For example, it's straightforward to extend the 3-cycle that produces the impossible triangle to any odd-cycle. This produces impossible pentagons, impossible heptagons, etc. I've drawn the pentagon and heptagon examples below: Naturally, I want to push this further, finding more interesting figures to draw. Here's a proposed algorithm:
  • Generate a random constraint satisfaction problem that is arc-consistent but unsatisfiable.
  • Translate the kernel--the locally consistent problem that remains after running arc-consistency--into a drawing.
Both steps present some challenges. Unfortunately I don't (yet) have the answers. Maybe somebody out there can help:

For the first, how do we do it efficiently? I don't know, but we can think of doing guess and check for now. If somebody knows more about this, I'd be interested to hear about it. It seems related to generating hard instances to evaluate constraint satisfaction solvers.

For the second, perhaps somebody with some art background can help. How do we do this generally? I can extend the triangle example to odd cycles of this special form where each variable (corner) has two possible interpretations locally (UPDATE: I'm not sure this is the right interpretation): one where the figure is coming out of the page and one where it is going into the page. But how do we draw a visual analog of a variable with only one legal interpretation? How do we draw variables with more than 2 possible states? How do we draw examples with higher connectivity?

Does anybody know of references to any prior work on this topic?

Saturday, October 16, 2010

Limitations of Locality

Posted by Danny Tarlow
In this day and age, we're used to tackling problems where one processor just isn't enough.

The standard solution, of course, is to use more processors. Frameworks like MapReduce (and Hadoop) give us an abstraction where a master process spawns off a bunch of worker processes, each processor does some work on its own, then they all report their piece of results, which are combined together by the master process at the end to give the answer.

This works great when the pieces of the problem are independent-- that is, the workers don't need to communicate with each other in order to do their job. If your goal is to compute the average of a huge list of numbers, this works great: each processor reports the sum of numbers in its part of the list and the number of elements. If your goal is to compute the median, then it doesn't work so easily. (Think about it.)

So what do we do when work that the processors are doing isn't independent? Imagine that there are a bunch of interacting pieces to your model. For example, in computer vision, a common problem is to label each pixel in an image with the class of object at that pixel. We shouldn't expect to be able to label each pixel independently, because we know that objects in the world are "blobby", so knowing the value of one pixel will tell you something about the likely values of nearby pixels.

As another example, consider the problem of trying to label a webpage with a category representing what it's about. You could try to do it independently--just looking at the words and images on the page. However, a better approach would probably be to take into account dependencies between pages: if page A links to page B, that probably tells us something about the relationship between the category of A and the category of B. This could be especially useful for the cases where a page itself has few words, but it's linked to by many other pages.

In machine learning, perhaps our most powerful formalism to represent these types of interaction is the probabilistic graphical model. The power comes in the ability to precisely specify the structure of interactions in our problem via a graph (or hypergraph). We can then design message passing algorithms that take advantage of this structure: each region does some local computation, and then parts of the model that directly interact with each other exchange messages. In this way, information is propagated amongst the workers, but computation is still mostly done locally. If you've read about Google's Pregel infrastructure, it probably sounds pretty similar.

Unfortunately, there are certain fundamental limitations to doing things completely locally. The interesting thing is that the same fundamental issue comes up in a variety of settings.

In message passing algorithms (e.g.,, neighboring regions send messages to each other to try to enforce local consistency--that is, when two regions have beliefs about overlapping variables, they pass messages to each other to try to ensure that they agree on the value of the overlapping variables. You would hope that if you can find a solution that is locally consistent everywhere, that this would imply that there is a globally consistent solution.

Interestingly, this isn't the case. In MAP inference, this crops up in the looseness of the standard linear programming relaxation, which has constraints that enforce local but not global consistency. In constraint satisfaction problems, this comes up in the incompleteness of arc-consistency algorithms: there are certain unsatisfiable constraint satisfaction problems that arc-consistency, a local consistency checking algorithm, cannot recognize as inconsistent.

I was discussing these connections at my tea talk the other day, and Geoff Hinton pointed out that this same phenomenon also comes up in the impossible triangle optical illusion. I think that's kind of cool -- people also get confused when things look sufficiently consistent locally but actually are inconsistent globally. Here's a diagram showing my interpretation of his comment along with the constraint satisfaction example.

Note that when you look at each corner independently (top row), everything looks consistent in isolation. However, when you look at the figure as a whole (bottom left), there is no globally consistent interpretation. In the constraint satisfaction example (bottom right), bold black edges represent legal combinations of variables, while missing edges represent illegal combinations. In this case, the arc-consistency algorithm terminates without finding an inconsistency, but if you work through the possibilities, you'll see that there is no legal assignment.

Now this isn't to say things are hopeless. Even though our locally consistent solutions can be very bad in the worst case, they often are good enough to be useful. If we want globally consistent exact solutions, we have lift and project methods from theoretical computer science, which show how to gradually move from local to global consistency. The big three frameworks are covered here:
A Comparison of the Sherali-Adams, Lovász-Schrijver and Lasserre Relaxations for 0-1 Programming

In the worst case, these methods become exponentially expensive as you move to become more global (which is not surprising, since we can represent many NP-hard problems with graphical models). However, it turns out that in practice, the real-world problems that we are interested are not the hardest computational problems. In some recent work, David Sontag and company have shown that just making things a little more global can be enough to exactly solve hard protein folding and protein design problems from computational biology: D. Sontag, T. Meltzer, A. Globerson, Y. Weiss, T. Jaakkola. Tightening LP Relaxations for MAP using Message Passing. Uncertainty in Artificial Intelligence (UAI) 24, July 2008.
[pdf] [code]

I should also (shamelessly) point out that global interactions don't always have to amount to intractable models. Some of my own recent work focuses on how we can get the best of both worlds when we have parts of the model that interact locally along with some additional global interactions. It turns out that you can develop very efficient message passing algorithms in some of these cases as well: D. Tarlow, I. Givoni, and R. Zemel. HOP-MAP: Efficient Message Passing with High Order Potentials. The 13th International Conference on Artificial Intelligence and Statistics (AISTATS 2010)
[pdf] [code]

Wednesday, October 13, 2010

Max-Product Belief Propagation and Constraint Satisfaction Problems

Posted by Danny Tarlow
I'm giving the machine learning tea talk today on the Augmenting DAG (directed acyclic graph) algorithm, which is an algorithm with some interesting history. It seems that there was a line of work starting in the mid seventies in the USSR about relating the discrete maximum a posteriori (MAP) inference problem (in English: find the most likely explanation of variables under some probabilistic model) in pairwise graphical models to: (a) constraint satisfaction problems, and (b) linear program relaxations. The work went unnoticed for 30 years (it was published in Russian), but it was recently discovered and reviewed by Tomas Werner in the 2007 journal paper available here:
T. Werner. A Linear Programming Approach to Max-sum Problem: A Review. IEEE Trans. on Pattern Recognition and Machine Intelligence (PAMI) 29(7), July 2007.

It's interesting, because even today, the best known general algorithms for the MAP inference problem are based on linear program relaxations (though we solve the relaxation using algorithms that are more efficient than general purpose solvers). The most notable modern versions are variants on max-product belief propagation: the tree-reweighted belief propagation (TRW) of Martin Wainwright et al. and the sequential TRW (TRW-S) algorithm of Vladimir Kolmogorov (code here:

The other interesting bit is the connection between MAP and constraint satisfaction. It turns out that the inner loop of the Augmenting DAG algorithm is to run an arc-consistency algorithm. If you find an arc-consistency violation in an appropriately constructed graph, then you have also discovered a step that you can take in the dual of the MAP linear program relaxation that will improve your upper bound.

In practice, the prevailing wisdom is that you're better off using max-product or TRW-S to solve a general problem, but in my opinion, the Augmenting DAG algorithm is a very nice algorithm, and the paper makes a lot of interesting connections to work in other fields. It's definitely worth a read if you're interested in this sort of thing.