by Weidong Liang
Beijing, 2014.04.05
As mentioned in the chapter introduction, one possible use of pseudo-random number sampling is to obtain a numerical approximation to the expectation \( \mathbb{E}[f] = \int{f(x) \, p(x) \, dx} \approx \frac{1}{N} \sum_{i=1}^{N}{f(x_i)} \), where the \( x_i \) are drawn from \( P(X) \). Importance sampling provides a framework for evaluating this expectation, but it is not by itself able to draw samples from the distribution \( P(X) \).
Importance sampling assumes that we can evaluate the probability density function of the target distribution up to a normalization constant: if \( p(x) \) is the density of the distribution of interest, written as \( p(x) = \frac{\tilde{p}(x)}{Z_p} \), then importance sampling requires only that \( \tilde{p}(x) \) can be readily evaluated; the ability to evaluate the normalization constant \( Z_p \) is not required.
Assuming that we have a proposal distribution \( q(x) \) from which we can easily draw samples \( x^{(l)} \), the simple form of importance sampling is: $$ \mathbb{E}[f] = \int{f(x) p(x) dx} = \int{f(x) \frac{p(x)}{q(x)} q(x) dx} \approx \frac{1}{L} \sum_{l=1}^{L}{ \frac{p(x^{(l)})}{q(x^{(l)})} f(x^{(l)}) } $$
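As an illustrative sketch (not part of the original text), the simple form above can be written in Python. The target \( p = N(0, 1) \), test function \( f(x) = x^2 \), and proposal \( q = N(0, 2^2) \) below are arbitrary choices for which the true answer \( \mathbb{E}[x^2] = 1 \) is known:

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def importance_sampling(f, p, q_pdf, q_sampler, L):
    """Simple importance sampling: E_p[f] ~ (1/L) sum f(x_l) p(x_l)/q(x_l), x_l ~ q."""
    total = 0.0
    for _ in range(L):
        x = q_sampler()
        total += f(x) * p(x) / q_pdf(x)
    return total / L

random.seed(0)
# Target p = N(0, 1), proposal q = N(0, 2^2); true value E_p[x^2] = 1.
est = importance_sampling(
    f=lambda x: x * x,
    p=lambda x: normal_pdf(x, 0.0, 1.0),
    q_pdf=lambda x: normal_pdf(x, 0.0, 2.0),
    q_sampler=lambda: random.gauss(0.0, 2.0),
    L=50_000,
)
print(est)  # close to 1
```

Note that this form needs the fully normalized densities \( p \) and \( q \); the self-normalized variant below relaxes that requirement.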
For distributions that can only be evaluated up to a normalization constant, let \( q(x) = \frac{\tilde{q}(x)}{Z_q} \); then we have: $$ \mathbb{E}[f] = \int{f(x) p(x) dx} = \int{f(x) \frac{p(x)}{q(x)} q(x) dx} = \frac{Z_p}{Z_q} \int{ f(x) \frac{ \tilde{p}(x) }{ \tilde{q}(x) } dx } \approx \frac{Z_p}{Z_q} \frac{1}{L} \sum_{l=1}^{L}{ \tilde{r}_l f( x^{(l)} ) } $$ where \( \tilde{r}_l = \frac{ \tilde{p}( x^{(l)} ) }{ \tilde{q}( x^{(l)} ) } \). Similarly, the ratio \( \frac{Z_p}{Z_q} \) can be approximated using: $$ \frac{Z_p}{Z_q} = \frac{1}{Z_q} \int{ \tilde{p}(x) dx } = \int{ \frac{ \tilde{p}(x) }{ \tilde{q}(x) } q(x) dx } \approx \frac{1}{L} \sum_{l=1}^{L}{ \tilde{r}_l } $$ Combining the above we have: $$ \mathbb{E}[f] \approx \sum_{l=1}^{L}{ w_l f(x^{(l)}) } $$ where $$ w_l = \frac{ \tilde{r}_l }{ \sum_m{ \tilde{r}_m } } = \frac{ \tilde{p}( x^{(l)} ) / q( x^{(l)} ) }{ \sum_m{ \tilde{p}( x^{(m)} ) / q( x^{(m)} ) } } $$
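The self-normalized estimator \( \sum_l w_l f(x^{(l)}) \) can be sketched similarly. In this illustration (my own example, not from the text), \( \tilde{p}(x) = e^{-x^2/2} \) is an unnormalized \( N(0, 1) \) density, and the constant \( Z_p = \sqrt{2\pi} \) is deliberately never used:

```python
import math
import random

def self_normalized_is(f, p_tilde, q_pdf, q_sampler, L):
    """E_p[f] ~ sum_l w_l f(x_l), where w_l = r_l / sum_m r_m and r_l = p~(x_l)/q(x_l)."""
    xs = [q_sampler() for _ in range(L)]
    r = [p_tilde(x) / q_pdf(x) for x in xs]   # unnormalized ratios r~_l
    return sum(r_l * f(x) for r_l, x in zip(r, xs)) / sum(r)

random.seed(1)
# p~(x) = exp(-x^2/2): N(0, 1) up to Z_p = sqrt(2*pi); proposal q = N(0, 2^2).
est = self_normalized_is(
    f=lambda x: x * x,
    p_tilde=lambda x: math.exp(-0.5 * x * x),
    q_pdf=lambda x: math.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * math.sqrt(2 * math.pi)),
    q_sampler=lambda: random.gauss(0.0, 2.0),
    L=50_000,
)
print(est)  # close to E[x^2] = 1 under N(0, 1)
```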
Similar to rejection sampling, the efficiency of importance sampling depends greatly on how well the proposal distribution \( q(x) \) matches \( p(x) \).
In the following simulation, we construct a toy problem using the bimodal normal mixture \( P(x) = \frac{1}{2}[N(0.8, 0.1^2) + N(0.3, 0.2^2)] \) as the distribution under which the objective function \( F(x) = \sin(5x) + 2.2 \) is evaluated to find its mean. The proposal distribution is a unimodal normal distribution \( Q(x) = N(0.5, 0.4^2) \), whose samples can be easily generated.
ExpectationByImportanceSampling(tilde_P, F, Q, QSampler, num_samples) // Evaluate \( \mathbb{E}_{P(X)}[F] \) using Importance Sampling
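One possible Python realization of this pseudocode (the function and variable names below are my own, chosen to match the signature above) applies the self-normalized weights, treating the mixture density as \( \tilde{p} \):

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def expectation_by_importance_sampling(tilde_p, f, q_pdf, q_sampler, num_samples):
    """Evaluate E_{P(X)}[F] via self-normalized importance sampling."""
    xs = [q_sampler() for _ in range(num_samples)]
    r = [tilde_p(x) / q_pdf(x) for x in xs]   # importance ratios r~_l
    return sum(r_l * f(x) for r_l, x in zip(r, xs)) / sum(r)

random.seed(2)
# Bimodal target P(x) = 0.5 [N(0.8, 0.1^2) + N(0.3, 0.2^2)].
tilde_p = lambda x: 0.5 * (normal_pdf(x, 0.8, 0.1) + normal_pdf(x, 0.3, 0.2))
f = lambda x: math.sin(5 * x) + 2.2          # objective F(x)
q_pdf = lambda x: normal_pdf(x, 0.5, 0.4)    # unimodal proposal Q = N(0.5, 0.4^2)
q_sampler = lambda: random.gauss(0.5, 0.4)

est = expectation_by_importance_sampling(tilde_p, f, q_pdf, q_sampler, 100_000)
print(est)
```

For this toy problem the exact answer is available as a check: using \( \mathbb{E}[\sin(aX)] = \sin(a\mu)\,e^{-a^2\sigma^2/2} \) for \( X \sim N(\mu, \sigma^2) \), the true mean is \( 2.2 + \frac{1}{2}[\sin(4)e^{-0.125} + \sin(1.5)e^{-0.5}] \approx 2.169 \).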
It is often the case that \( p(x)f(x) \) varies greatly across the domain, with a significant portion of its mass concentrated in relatively small regions; the set of importance weights \( \{ \tilde{r}_l \} \) may therefore be dominated by only a few weights with large values. If the proposal distribution \( Q(x) \) is so far off that the importance weights are small in the regions where \( p(x)f(x) \) is large, then the resulting estimate may be a poor approximation to the actual mean.
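A common way to quantify this weight degeneracy (a standard diagnostic, added here as an illustration rather than taken from the text) is the effective sample size \( \mathrm{ESS} = 1 / \sum_l w_l^2 \) computed from the normalized weights: it equals \( L \) when all weights are equal, and collapses toward 1 when a few weights dominate. A mismatched proposal drives the ESS far below \( L \):

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def effective_sample_size(p_tilde, q_pdf, q_sampler, L):
    """ESS = 1 / sum_l w_l^2 for normalized weights w_l; equals L for uniform weights."""
    r = [p_tilde(x) / q_pdf(x) for x in (q_sampler() for _ in range(L))]
    s = sum(r)
    return 1.0 / sum((r_l / s) ** 2 for r_l in r)

random.seed(3)
L = 10_000
p = lambda x: normal_pdf(x, 0.0, 1.0)        # target N(0, 1)
# Well-matched proposal N(0, 2^2) vs. badly mismatched proposal N(4, 1^2).
ess_good = effective_sample_size(p, lambda x: normal_pdf(x, 0.0, 2.0),
                                 lambda: random.gauss(0.0, 2.0), L)
ess_bad = effective_sample_size(p, lambda x: normal_pdf(x, 4.0, 1.0),
                                lambda: random.gauss(4.0, 1.0), L)
print(ess_good, ess_bad)  # ess_bad is a small fraction of ess_good
```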