Metropolis Algorithm

by Weidong Liang

Beijing, 2014.04.09


Introduction

Similar to rejection sampling, the Metropolis algorithm draws samples from a proposal distribution; the difference is that it maintains a record of the current state \( x^{(\tau)} \), so that each new candidate is generated from \( q(x | x^{(\tau)}) \). The Metropolis algorithm also requires the proposal distribution to be symmetric, i.e. \( q(x_A | x_B) = q(x_B | x_A) \) for all values of \( x_A \) and \( x_B \). The candidate sample \( x^* \) is then accepted with probability: $$ A(x^*, x^{(\tau)}) = \min\left(1, \frac{\tilde{p}( x^* )}{\tilde{p}( x^{(\tau)} )} \right) $$ where \( p(x) = \tilde{p}(x) / Z \) is the desired distribution. If the candidate is accepted, then \( x^{(\tau + 1)} = x^* \); otherwise \( x^{(\tau + 1)} = x^{(\tau)} \), i.e. the current state is retained as the next sample. In either case a new candidate is then drawn from \( q( x | x^{(\tau + 1)} ) \) and the process is repeated.
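The accept/reject step can be sketched as a small helper; this is a minimal illustration, assuming the unnormalized density \( \tilde{p} \) is available as a callable (the names here are illustrative, not from the post):

```python
import math
import random

def metropolis_accept(p_tilde, x_new, x_cur):
    """Accept x_new with probability min(1, p~(x_new) / p~(x_cur))."""
    a = min(1.0, p_tilde(x_new) / p_tilde(x_cur))
    return random.random() < a

# example: with a standard Gaussian target, a move toward the mode
# has acceptance ratio > 1 and is therefore always accepted
print(metropolis_accept(lambda x: math.exp(-x * x), 0.0, 5.0))
```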


Simulation

In this simulation, we apply the Metropolis algorithm to sample from a bivariate Gaussian, using an isotropic Gaussian as the proposal distribution: \( q(X, Y) = q(X)q(Y) \), with \( q(X) \sim N(\mu, \sigma^2) \) and \( q(Y) \sim N(\mu, \sigma^2) \).

For a multivariate Gaussian distribution, the pdf is: $$ f_{\mathbf{x}}(x_1, \ldots, x_k) = \frac{1}{ \sqrt{ (2\pi)^k | \Sigma | } } \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right) $$ In the bivariate case, we have: $$ \mu = \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \sigma_x^2 & \rho \sigma_x \sigma_y \\ \rho \sigma_x \sigma_y & \sigma_y^2 \end{bmatrix} $$ therefore the pdf of the bivariate Gaussian distribution is: $$ f(x, y) = \frac{1}{2 \pi \sigma_x \sigma_y \sqrt{1 - \rho^2} } \exp\left( -\frac{1}{2 (1 - \rho^2) } \left[ \frac{ (x - \mu_x)^2 }{ \sigma_x^2 } + \frac{ (y - \mu_y)^2 }{ \sigma_y^2 } - \frac{2 \rho (x - \mu_x) (y - \mu_y) }{ \sigma_x \sigma_y } \right] \right) $$
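The bivariate pdf above translates directly into code; a minimal sketch (function and parameter names are my own):

```python
import math

def bivariate_gaussian_pdf(x, y, mu_x=0.0, mu_y=0.0,
                           sigma_x=1.0, sigma_y=1.0, rho=0.0):
    """Density of a bivariate Gaussian with correlation rho."""
    # quadratic form inside the exponent
    z = ((x - mu_x) ** 2 / sigma_x ** 2
         + (y - mu_y) ** 2 / sigma_y ** 2
         - 2 * rho * (x - mu_x) * (y - mu_y) / (sigma_x * sigma_y))
    # normalization constant 2*pi*sigma_x*sigma_y*sqrt(1 - rho^2)
    norm = 2 * math.pi * sigma_x * sigma_y * math.sqrt(1 - rho ** 2)
    return math.exp(-z / (2 * (1 - rho ** 2))) / norm

# at the mean of a standard bivariate Gaussian the density is 1/(2*pi)
print(bivariate_gaussian_pdf(0.0, 0.0))
```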

To sample from the bivariate Gaussian distribution, we draw candidates from a proposal isotropic Gaussian distribution \( q(x | x^{\tau}) = N(x^{\tau}, \sigma^2) \). The chain starts at the point colored orange, and the subsequent samples are drawn and connected with a line, which is colored green if the candidate was accepted and red if it was rejected. The blue points form an equi-probability contour, generated from the desired bivariate Gaussian distribution at one standard deviation. Note how the samples drawn eventually move from the lower right to the area enclosed by the contour line; thus the samples eventually move from an area of low probability to an area of high probability (as determined by the target probability distribution).
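The simulation loop just described can be sketched as follows (without the plotting; this is an illustrative reimplementation, not the post's actual simulation code):

```python
import math
import random

def run_metropolis(p_tilde, x0, n_steps, step_sigma=0.5, seed=0):
    """Metropolis chain over R^2 with an isotropic Gaussian proposal."""
    rng = random.Random(seed)
    x, y = x0
    chain = [(x, y)]
    accepted = 0
    for _ in range(n_steps):
        # isotropic Gaussian proposal q(x* | x) = N(x, step_sigma^2 I)
        xp = x + rng.gauss(0.0, step_sigma)
        yp = y + rng.gauss(0.0, step_sigma)
        a = min(1.0, p_tilde(xp, yp) / p_tilde(x, y))
        if rng.random() < a:
            x, y = xp, yp           # green segment: candidate accepted
            accepted += 1
        # red segment: candidate rejected, state repeated
        chain.append((x, y))
    return chain, accepted / n_steps

# start in the low-probability lower right, as in the plot
chain, rate = run_metropolis(
    lambda x, y: math.exp(-0.5 * (x * x + y * y)), (4.0, -4.0), 2000)
```

Coloring each segment green or red based on the accept flag reproduces the walk shown in the figure.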

TODO: add histogram plot for x(0), x(1)


Algorithm

Metropolis-Algorithm(tilde_p(x), QSampler(x) )

  1. if \( x^{(\tau)} \) not initialized
    1. \( x^{(\tau)} = \arg\max_x{\tilde{p}(x)} \)  // if not available, any x will do
  2. \( x^* \sim QSampler(x^{(\tau)}) \)  // sample from \( q(x|x^{(\tau)}) \)
  3. \( A(x^*, x^{(\tau)}) = \min(1, \frac{\tilde{p}(x^*)}{\tilde{p}(x^{(\tau)})}) \)  // calculate acceptance probability
  4. \( u \sim U(0, 1) \)  // draw a uniform random sample
  5. if \( u \lt A(x^*, x^{(\tau)}) \)
    1. \( x^{(\tau)} = x^* \)
  6. return \( x^{(\tau)} \)  // on rejection, the unchanged state is returned as the new sample

Note that a rejection must return the current state as a repeated sample rather than retrying until a candidate is accepted; retrying would change the distribution the chain converges to.


Implementation


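A straightforward Python rendering of the pseudocode above, iterated to produce a whole chain (names are illustrative; the target \( \tilde{p} \) and symmetric proposal sampler are passed in as callables):

```python
import math
import random

def metropolis_sampler(p_tilde, q_sampler, x_init, n_samples, seed=None):
    """Generate n_samples from p(x) = p~(x)/Z via the Metropolis algorithm.

    p_tilde   -- unnormalized target density p~(x)
    q_sampler -- draws a candidate x* from the symmetric q(x | x_cur)
    """
    rng = random.Random(seed)
    x = x_init
    samples = []
    for _ in range(n_samples):
        x_star = q_sampler(x, rng)                      # step 2
        a = min(1.0, p_tilde(x_star) / p_tilde(x))      # step 3
        if rng.random() < a:                            # steps 4-5
            x = x_star
        # on rejection, the current state is recorded again (step 6)
        samples.append(x)
    return samples

if __name__ == "__main__":
    # example: sample from an unnormalized standard Gaussian
    draws = metropolis_sampler(
        p_tilde=lambda x: math.exp(-0.5 * x * x),
        q_sampler=lambda x, rng: x + rng.gauss(0.0, 1.0),
        x_init=0.0, n_samples=10000, seed=0)
    print(sum(draws) / len(draws))  # should be near 0
```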
Note

From the formulation of the acceptance probability \( A(x^*, x^{(\tau)} ) \), we know that if the new sample is more likely than the current one, it will always be accepted; otherwise, the less likely it is compared to \( x^{(\tau)} \), the less likely it is to be accepted. Therefore the samples generated eventually fall within the area where \( p(x) \) is large, i.e. where samples are more likely to be generated.

The samples generated are highly correlated; the correlation between samples may be reduced by retaining only every n-th generated sample (thinning), or by increasing the step size between state transitions.
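The effect of thinning can be checked by measuring the lag-1 autocorrelation before and after; a small sketch (helper names are my own):

```python
import random

def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((v - mean) ** 2 for v in xs) / n
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1)) / n
    return cov / var

def thin(samples, k):
    """Keep only every k-th sample to reduce correlation."""
    return samples[::k]

if __name__ == "__main__":
    # an AR(1) process mimics a strongly correlated chain
    rng = random.Random(0)
    xs = [0.0]
    for _ in range(5000):
        xs.append(0.9 * xs[-1] + rng.gauss(0.0, 1.0))
    print(lag1_autocorr(xs), lag1_autocorr(thin(xs, 10)))
```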


Reference

