Finite Discrete Distribution Sampling

by Weidong Liang

Beijing, 2014.04.04

Introduction

Probability of a finite discrete distribution can be specified by its probability mass function (pmf), \( P(x = X_i) = p_i \). For simple univariate discrete distribution, we can just use an array to store the pmf \( [p_0, p_1, ..., p_N ] \). To sample from such a distribution, we first construct the cummulative distribution function: \( F(i) = \sum_{j=1}^{i}{p_j} \), using the idea of probability integral transform, we draw a sample from the uniform distribution \( u \sim U(0, 1) \), locate \( i \) such that \( F(i-1) \le u \lt F(i) \), and return \( X_i \) as the desired sample.

Simulation

Algorithm

FiniteDiscreteDistributionSampling ( PMF(x) ) // Sample from a finite discrete distribution with the given probability mass function

Construct the cummulative distribution using \( F(i) = \sum_{j=1}^{i}{p_j} \)
\( u \sim U(0, 1) \)
Find \( i \) such that \( F(i-1) \le u \lt F(i) \)
Return \( i \)

Implementation

Note

In the above implementation at the step of finding \( i \) such that \( F(i-1) \le u \lt F(i) \), simple linear search is used to make the code easier to understand; which is fine when speed is not a problem. For efficient implementation, we can use binary search (complexity of O(log n)), indexed search, or using the Alias method to achieve constant time complexity.

Reference

Sampling From Finite Discrete Distribution