by Weidong Liang
Beijing, 2014.04.05
As mentioned in the chapter introduction, one possible use of pseudo-random number sampling is to obtain a numerical approximation to the expectation \( \mathbb{E}[f] = \int{f(x) \, p(x) \, dx} \approx \frac{1}{N} \sum_{i=1}^{N}{f(x_i)} \), where the \( x_i \) are drawn from \( P(X) \). Importance sampling provides a framework for evaluating this expectation, but it is not by itself able to draw samples from the distribution \( P(X) \).
Importance sampling assumes that we can evaluate the probability density function of the target distribution up to a normalization constant: if \( p(x) \) is the density of the distribution of interest, written as \( p(x) = \frac{\tilde{p}(x)}{Z_p} \), then importance sampling requires only that \( \tilde{p}(x) \) can be readily evaluated; the ability to evaluate the normalization constant \( Z_p \) is not required.
Assuming that we have a proposal distribution \( q(x) \) from which we can easily draw samples \( x^{(l)} \), the simple form of importance sampling is: $$ \mathbb{E}[f] = \int{f(x) p(x) dx} = \int{f(x) \frac{p(x)}{q(x)} q(x) dx} \approx \frac{1}{L} \sum_{l=1}^{L}{ \frac{p(x^{(l)})}{q(x^{(l)})} f(x^{(l)}) } $$
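As an illustrative sketch (not part of the original text), the simple form above can be written in Python. The target \( p = N(0, 1) \), test function \( f(x) = x^2 \), and proposal \( q = N(0, 2^2) \) below are arbitrary choices for which the true answer \( \mathbb{E}[x^2] = 1 \) is known:

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def importance_sampling(f, p, q_pdf, q_sampler, L):
    """Simple importance sampling: E_p[f] ~ (1/L) sum f(x_l) p(x_l)/q(x_l), x_l ~ q."""
    total = 0.0
    for _ in range(L):
        x = q_sampler()
        total += f(x) * p(x) / q_pdf(x)
    return total / L

random.seed(0)
# Target p = N(0, 1), proposal q = N(0, 2^2); true value E_p[x^2] = 1.
est = importance_sampling(
    f=lambda x: x * x,
    p=lambda x: normal_pdf(x, 0.0, 1.0),
    q_pdf=lambda x: normal_pdf(x, 0.0, 2.0),
    q_sampler=lambda: random.gauss(0.0, 2.0),
    L=50_000,
)
print(est)  # close to 1
```

Note that this form needs the fully normalized densities \( p \) and \( q \); the self-normalized variant below relaxes that requirement.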
For distributions that can only be evaluated up to a normalization constant, let \( q(x) = \frac{\tilde{q}(x)}{Z_q} \); then we have: $$ \mathbb{E}[f] = \int{f(x) p(x) dx} = \int{f(x) \frac{p(x)}{q(x)} q(x) dx} = \frac{Z_p}{Z_q} \int{ f(x) \frac{ \tilde{p}(x) }{ \tilde{q}(x) } dx } \approx \frac{Z_p}{Z_q} \frac{1}{L} \sum_{l=1}^{L}{ \tilde{r}_l f( x^{(l)} ) } $$ where \( \tilde{r}_l = \frac{ \tilde{p}( x^{(l)} ) }{ \tilde{q}( x^{(l)} ) } \). Similarly, the ratio \( \frac{Z_p}{Z_q} \) can be approximated using: $$ \frac{Z_p}{Z_q} = \frac{1}{Z_q} \int{ \tilde{p}(x) dx } = \int{ \frac{ \tilde{p}(x) }{ \tilde{q}(x) } q(x) dx } \approx \frac{1}{L} \sum_{l=1}^{L}{ \tilde{r}_l } $$ Combining the above we have: $$ \mathbb{E}[f] \approx \sum_{l=1}^{L}{ w_l f(x^{(l)}) } $$ where $$ w_l = \frac{ \tilde{r}_l }{ \sum_m{ \tilde{r}_m } } = \frac{ \tilde{p}( x^{(l)} ) / q( x^{(l)} ) }{ \sum_m{ \tilde{p}( x^{(m)} ) / q( x^{(m)} ) } } $$
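The self-normalized estimator \( \sum_l w_l f(x^{(l)}) \) can be sketched similarly. In this illustration (my own example, not from the text), \( \tilde{p}(x) = e^{-x^2/2} \) is an unnormalized \( N(0, 1) \) density, and the constant \( Z_p = \sqrt{2\pi} \) is deliberately never used:

```python
import math
import random

def self_normalized_is(f, p_tilde, q_pdf, q_sampler, L):
    """E_p[f] ~ sum_l w_l f(x_l), where w_l = r_l / sum_m r_m and r_l = p~(x_l)/q(x_l)."""
    xs = [q_sampler() for _ in range(L)]
    r = [p_tilde(x) / q_pdf(x) for x in xs]   # unnormalized ratios r~_l
    return sum(r_l * f(x) for r_l, x in zip(r, xs)) / sum(r)

random.seed(1)
# p~(x) = exp(-x^2/2): N(0, 1) up to Z_p = sqrt(2*pi); proposal q = N(0, 2^2).
est = self_normalized_is(
    f=lambda x: x * x,
    p_tilde=lambda x: math.exp(-0.5 * x * x),
    q_pdf=lambda x: math.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * math.sqrt(2 * math.pi)),
    q_sampler=lambda: random.gauss(0.0, 2.0),
    L=50_000,
)
print(est)  # close to E[x^2] = 1 under N(0, 1)
```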
Similar to rejection sampling, the efficiency of importance sampling depends greatly on how well the proposal distribution \( q(x) \) matches \( p(x) \).
In the following simulation, we construct a toy problem using the bimodal normal mixture \( P(x) = \frac{1}{2}[N(0.8, 0.1^2) + N(0.3, 0.2^2)] \) as the distribution under which the objective function \( F(x) = \sin(5x) + 2.2 \) is evaluated to find its mean. The proposal distribution is a unimodal normal distribution \( Q(x) = N(0.5, 0.4^2) \), whose samples can be easily generated.
ExpectationByImportanceSampling(tilde_P, F, Q, QSampler, num_samples) // Evaluate \( \mathbb{E}_{P(X)}[F] \) using Importance Sampling
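One possible Python realization of this pseudocode (the function and variable names below are my own, chosen to match the signature above) applies the self-normalized weights, treating the mixture density as \( \tilde{p} \):

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def expectation_by_importance_sampling(tilde_p, f, q_pdf, q_sampler, num_samples):
    """Evaluate E_{P(X)}[F] via self-normalized importance sampling."""
    xs = [q_sampler() for _ in range(num_samples)]
    r = [tilde_p(x) / q_pdf(x) for x in xs]   # importance ratios r~_l
    return sum(r_l * f(x) for r_l, x in zip(r, xs)) / sum(r)

random.seed(2)
# Bimodal target P(x) = 0.5 [N(0.8, 0.1^2) + N(0.3, 0.2^2)].
tilde_p = lambda x: 0.5 * (normal_pdf(x, 0.8, 0.1) + normal_pdf(x, 0.3, 0.2))
f = lambda x: math.sin(5 * x) + 2.2          # objective F(x)
q_pdf = lambda x: normal_pdf(x, 0.5, 0.4)    # unimodal proposal Q = N(0.5, 0.4^2)
q_sampler = lambda: random.gauss(0.5, 0.4)

est = expectation_by_importance_sampling(tilde_p, f, q_pdf, q_sampler, 100_000)
print(est)
```

For this toy problem the exact answer is available as a check: using \( \mathbb{E}[\sin(aX)] = \sin(a\mu)\,e^{-a^2\sigma^2/2} \) for \( X \sim N(\mu, \sigma^2) \), the true mean is \( 2.2 + \frac{1}{2}[\sin(4)e^{-0.125} + \sin(1.5)e^{-0.5}] \approx 2.169 \).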
It is often the case that \( p(x)f(x) \) varies greatly across the domain, with a significant portion of its mass concentrated in relatively small regions; the set of importance weights \( \{ \tilde{r}_l \} \) may therefore be dominated by only a few weights with large values. If the proposal distribution \( Q(x) \) is so far off that the importance weights are small in the regions where \( p(x)f(x) \) is large, then the resulting estimate may be a poor approximation to the actual mean.
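A common way to quantify this weight degeneracy (a standard diagnostic, added here as an illustration rather than taken from the text) is the effective sample size \( \mathrm{ESS} = 1 / \sum_l w_l^2 \) computed from the normalized weights: it equals \( L \) when all weights are equal, and collapses toward 1 when a few weights dominate. A mismatched proposal drives the ESS far below \( L \):

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def effective_sample_size(p_tilde, q_pdf, q_sampler, L):
    """ESS = 1 / sum_l w_l^2 for normalized weights w_l; equals L for uniform weights."""
    r = [p_tilde(x) / q_pdf(x) for x in (q_sampler() for _ in range(L))]
    s = sum(r)
    return 1.0 / sum((r_l / s) ** 2 for r_l in r)

random.seed(3)
L = 10_000
p = lambda x: normal_pdf(x, 0.0, 1.0)        # target N(0, 1)
# Well-matched proposal N(0, 2^2) vs. badly mismatched proposal N(4, 1^2).
ess_good = effective_sample_size(p, lambda x: normal_pdf(x, 0.0, 2.0),
                                 lambda: random.gauss(0.0, 2.0), L)
ess_bad = effective_sample_size(p, lambda x: normal_pdf(x, 4.0, 1.0),
                                lambda: random.gauss(4.0, 1.0), L)
print(ess_good, ess_bad)  # ess_bad is a small fraction of ess_good
```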