*Posted by Danny Tarlow*

Now, debugging MCMC algorithms is somewhat troublesome, due to their random nature and the fact that chains just sometimes mix slowly, but there are some good ways to be pretty sure that you get things right. For example, the Geweke method is highly regarded as _the_ method to make sure you're getting it right. So this exercise is not actually really about debugging. It's more about testing intuitions about the behavior of a sampler.

With that out of the way, on to the question:

I implemented my sampler, initialized it with small random numbers for the parameters, and set it running on a simple test model (which I'm intentionally not describing in detail). One high level statistic that is relevant to look at is the (log) probability of samples versus iteration of the sampler, so I made that plot. It looks like this:

This plot looks a bit surprising. Upon initialization, the sampler moves directly to regions of space that have very low probability (remember, this is a _log_ probability*), and it appears to just keep going to exponentially less and less probable regions. The point of a sampler is that it should spend an amount of time in a state in proportion to the state's probability. And this sampler is making a beeline to a state that is e^-600 times less probable than where it started.

So here's the question: do I have a bug? In other words, if you were my supervisor and I came to you with this plot, would you dismiss this plot and send me back to debugging. If not, explain how this possibly could make sense.

I'll post my answer sometime in the next couple days.

* I'm leaving out constants, so the graph would be shifted down (but wouldn't change shape or scale) if I was including all the constants.

**Update:**This is long overdue, but Iain Murray nailed it in the comments:

No there isn’t (necessarily) a bug. This type of plot is very easily produced with valid code: e.g. by slice sampling a unit spherical Gaussian distribution in D=5000 dimensions and initializing at an atypical point of high-probability (much closer to the origin than sqrt(D) away). Simple Metropolis and slice samplers can’t change the log-prob by more than ≈1 per iteration, so large log-prob changes are slow and painful. Intelligent proposals, reparameterizations, or auxiliary variable methods can improve matters. This is a nice illustration that initializing with an optimizer (without some form of early stopping) can be a bad idea!In fact, I was using Matlab's built-in slice sampler, along with a zero-mean, spherical, many thousand-dimensional Gaussian distribution for the likelihood, and initializing near the mode (0).