Importance Sampling

by Weidong Liang

Beijing, 2014.04.05


Introduction

As mentioned in the chapter introduction, one possible use of pseudo-random number sampling is to obtain a numerical approximation to the expectation \( \mathbb{E}[f(x)] = \int{f(x) \, p(x) \, dx} \approx \frac{1}{N} \, \sum_{i=1}^N{f(x_i)} \), where the \( x_i \) are samples drawn from \( P(X) \). Importance sampling provides a framework for evaluating this expectation, but it is not itself able to draw samples from the distribution \( P(X) \).

Importance sampling assumes that we are able to evaluate the given probability density function up to a normalization constant: let \( p(x) \) be the probability density function of the distribution of interest, with \( p(x) = \frac{\tilde{p}(x)}{Z_p} \). Importance sampling requires that we can readily evaluate \( \tilde{p}(x) \); the ability to evaluate the normalization constant \( Z_p \) is not required.

Assuming that we have a proposal distribution \( q(x) \) from which we can easily draw samples \( x^{(l)} \), the simple form of importance sampling is: $$ \mathbb{E}[f] = \int{f(x) p(x) dx} = \int{f(x) \frac{p(x)}{q(x)} q(x) dx} \approx \frac{1}{L} \sum_{l=1}^{L}{ \frac{p(x^{(l)})}{q(x^{(l)})} f(x^{(l)}) } $$
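The simple form above can be sketched directly in code. This is a minimal toy example (the setup is my own, not from the text): the target is \( p = N(0, 1) \), the proposal is \( q = N(0, 2^2) \), and \( f(x) = x^2 \), whose true expectation under \( p \) is 1.

```python
import math
import random

# Sketch of the simple (fully normalized) importance-sampling estimator.
# Assumed toy setup: target p = N(0, 1), proposal q = N(0, 2^2), f(x) = x^2.

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def simple_importance_sampling(f, p, q, q_sampler, num_samples):
    """Approximate E_p[f] as (1/L) * sum_l (p(x_l) / q(x_l)) * f(x_l)."""
    total = 0.0
    for _ in range(num_samples):
        x = q_sampler()                 # sample from the proposal q, not from p
        total += (p(x) / q(x)) * f(x)   # reweight by the density ratio
    return total / num_samples

random.seed(0)
estimate = simple_importance_sampling(
    f=lambda x: x * x,
    p=lambda x: normal_pdf(x, 0.0, 1.0),
    q=lambda x: normal_pdf(x, 0.0, 2.0),
    q_sampler=lambda: random.gauss(0.0, 2.0),
    num_samples=100_000,
)
# estimate should land near E_p[x^2] = 1
```

Note that the samples come only from \( q \); the target \( p \) is touched only through density evaluations.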

For distributions that can only be evaluated up to a normalization constant, let \( q(x) = \frac{\tilde{q}(x)}{Z_q} \); then we have: $$ \mathbb{E}[f] = \int{f(x) p(x) dx} = \int{f(x) \frac{p(x)}{q(x)} q(x) dx} = \frac{Z_q}{Z_p} \int{ f(x) \frac{ \tilde{p}(x) }{ \tilde{q}(x) } q(x) dx } \approx \frac{Z_q}{Z_p} \frac{1}{L} \sum_{l=1}^{L}{ \tilde{r}_l f( x^{(l)} ) } $$ where \( \tilde{r}_l = \frac{ \tilde{p}( x^{(l)} ) }{ \tilde{q}( x^{(l)} ) } \). Similarly, the ratio \( \frac{Z_p}{Z_q} \) can be approximated using: $$ \frac{Z_p}{Z_q} = \frac{1}{Z_q} \int{ \tilde{p}(x) dx } = \int{ \frac{ \tilde{p}(x) }{ \tilde{q}(x) } q(x) dx } \approx \frac{1}{L} \sum_{l=1}^{L}{ \tilde{r}_l } $$ Combining the above we have: $$ \mathbb{E}[f] \approx \sum_{l=1}^{L}{ w_l f(x^{(l)}) } $$ where $$ w_l = \frac{ \tilde{r}_l }{ \sum_m{ \tilde{r}_m } } = \frac{ \tilde{p}( x^{(l)} ) / \tilde{q}( x^{(l)} ) }{ \sum_m{ \tilde{p}( x^{(m)} ) / \tilde{q}( x^{(m)} ) } } $$
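The ratio approximation \( \frac{Z_p}{Z_q} \approx \frac{1}{L} \sum_l{\tilde{r}_l} \) can be checked on a toy case where \( Z_p \) is known. The setup below is my own (not from the text): \( \tilde{p}(x) = e^{-x^2/2} \) is an unnormalized \( N(0, 1) \), so \( Z_p = \sqrt{2\pi} \), and the proposal \( q = N(0, 2^2) \) is already normalized, so \( Z_q = 1 \).

```python
import math
import random

# Toy check of Z_p / Z_q ≈ (1/L) * sum_l r~_l (assumed setup, see lead-in):
# tilde_p(x) = exp(-x^2 / 2) is an unnormalized N(0, 1), hence Z_p = sqrt(2*pi);
# the proposal q = N(0, 2^2) is normalized, hence Z_q = 1.

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

tilde_p = lambda x: math.exp(-0.5 * x * x)
q = lambda x: normal_pdf(x, 0.0, 2.0)

random.seed(0)
L = 100_000
samples = [random.gauss(0.0, 2.0) for _ in range(L)]
ratios = [tilde_p(x) / q(x) for x in samples]   # the unnormalized ratios r~_l
z_ratio = sum(ratios) / L                       # estimates Z_p / Z_q = sqrt(2*pi) ≈ 2.507
```

This is the same mechanism that lets the normalization constants cancel out of the weights \( w_l \).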

Similar to rejection sampling, the efficiency of importance sampling depends greatly on how well the proposal distribution \( q(x) \) matches \( p(x) \).


Simulation

In the following simulation, we construct a toy problem using the bimodal normal mixture \( P(x) = \frac{1}{2}[N(0.8, 0.1^2) + N(0.3, 0.2^2)] \) as the distribution under which the mean of the objective function \( F(x) = \sin(5 x) + 2.2 \) is evaluated. The proposal distribution is a unimodal normal distribution \( Q(x) = N(0.5, 0.4^2) \), whose samples can be easily generated.


Algorithm

ExpectationByImportanceSampling(tilde_P, F, Q, QSampler, num_samples)  // Evaluate \( \mathbb{E}_{P(X)}[F] \) using Importance Sampling

  1. norm_factor = 0  //Normalization factor for \( w_l \), which is \( \sum_m{ \tilde{p}(x^{(m)}) / q(x^{(m)}) } \)
  2. mean_f = 0
  3. for i = 0; i < num_samples; ++i
    1. x_i = QSampler()  //Draw sample x_i from Q(x)
    2. r_i = tilde_P(x_i) / Q(x_i)
    3. mean_f += r_i * F(x_i)
    4. norm_factor += r_i
  4. return mean_f / norm_factor

Implementation
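A possible Python implementation of the algorithm above, applied to the toy problem from the Simulation section (the function and variable names are my own):

```python
import math
import random

# Sketch of ExpectationByImportanceSampling for the toy problem:
# P(x) = (1/2)[N(0.8, 0.1^2) + N(0.3, 0.2^2)], F(x) = sin(5x) + 2.2,
# proposal Q = N(0.5, 0.4^2).

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def expectation_by_importance_sampling(tilde_p, f, q, q_sampler, num_samples):
    """Evaluate E_P[F] with self-normalized importance sampling."""
    norm_factor = 0.0   # sum_m tilde_p(x_m) / q(x_m), the normalizer of w_l
    mean_f = 0.0
    for _ in range(num_samples):
        x = q_sampler()            # draw sample x from Q(x)
        r = tilde_p(x) / q(x)      # unnormalized importance ratio r~
        mean_f += r * f(x)
        norm_factor += r
    return mean_f / norm_factor

tilde_P = lambda x: 0.5 * (normal_pdf(x, 0.8, 0.1) + normal_pdf(x, 0.3, 0.2))
F = lambda x: math.sin(5.0 * x) + 2.2
Q = lambda x: normal_pdf(x, 0.5, 0.4)
Q_sampler = lambda: random.gauss(0.5, 0.4)

random.seed(0)
mean = expectation_by_importance_sampling(tilde_P, F, Q, Q_sampler, 100_000)
# mean should land near the closed-form value
# (1/2)[sin(4)e^{-12.5*0.01} + sin(1.5)e^{-12.5*0.04}] + 2.2 ≈ 2.17
```

Here \( P \) happens to be normalized, but the estimator only ever uses it through the self-normalized weights, so an unnormalized \( \tilde{P} \) would work identically.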


Note

Since \( p(x)f(x) \) often varies greatly across the domain, with a significant portion of its mass concentrated in relatively small regions, the set of importance ratios \( \{ \tilde{r}_l \} \) may be dominated by only a few weights with large values. If the proposal distribution \( Q(x) \) is so far off that the importance weights are small in regions where \( p(x)f(x) \) is large, then the resulting mean may not be a good approximation to the actual mean.
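A common diagnostic for this weight-degeneracy problem (not mentioned in the text above, but standard practice) is the effective sample size, \( \mathrm{ESS} = (\sum_l{\tilde{r}_l})^2 / \sum_l{\tilde{r}_l^2} \): it equals \( L \) when all weights are equal and approaches 1 when a single weight dominates.

```python
# Effective sample size of a set of (unnormalized) importance ratios:
# ESS = (sum_l r_l)^2 / sum_l r_l^2.

def effective_sample_size(ratios):
    s = sum(ratios)
    s2 = sum(r * r for r in ratios)
    return (s * s) / s2

# Equal weights: every sample counts fully.
ess_balanced = effective_sample_size([1.0, 1.0, 1.0, 1.0])    # 4.0
# One dominant weight: effectively about one useful sample.
ess_degenerate = effective_sample_size([100.0, 0.01, 0.01, 0.01])
```

A small ESS relative to \( L \) signals that the proposal matches the target poorly and the estimate should not be trusted.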

