Parallel Rcpp for MCMC diagnostics
A common issue in Bayesian statistics and Markov Chain Monte Carlo is the concept of convergence. When approximating parameters by MCMC, we expect the chains to converge to the stationary distributions. Visual inspection of a MCMC traceplot can suggest convergence, but a more robust solution is to use multiple chains. If multiple have arrived at the same distribution, then we can be more certain of convergence.
One of the key challenges with multiple chains is that MCMC simulations are often computationally intensive. Even if the sampler is written in a compiled language, running two or three chains sequentially will double or triple what may already be a lengthy process. In this post, I show how to use Open-MP for parallelizing MCMC simulations. After doing so, the same code with 2-3 chains will likely only take slightly longer than one chain.
Convergence Criteria
There are several diagnostics for quantifying convergence, but the Gelman-Rubin diagnostic (Gelman & Rubin, 1992) is most commonly used. The Gelman-Rubin diagnostic requires that the parameter in question must be approximately normal.
Convergence is assessed by parameter (one parameter may have converged, but a second parameter may not have). Here i indicates the MCMC iteration, and j indicates the chain number. Note that the burn-in portion of a MCMC simulation is not included in the calculation of this diagnostic. The Gelman-Rubin diagnostic is calculated as follows.
First, the variance of each chain of a parameter is calculated
Next, the within chain variance is calculated by taking the mean of these variances
The between chain variance is calculated as
where
Finally, the variance of the stationary distribution is
And the Gelman-Rubin diagnostic is just the square root of this value divided by the within chain variance.
For a univariate parameter, it is appropriate to use a matrix to handle multiple MCMC chains of samples from the stationary distribution. Consider the columns to be individual chains and the rows to represent MCMC samples. The following is an R function to calculate the Gelman-Rubin diagnostic of a parameter stored in such a fashion.
A parameter is usually considered converged when the Gelman-Rubin statistic is . If your parameter has not converged, this can usually be fixed by increasing the number of burnin iterations.
Check out the Parallel Example package for implementation details.
Parallel Chains
Below is C++ code for approximating the mean and variance of a normal distribution. This approximation is performed by a Gibbs Sampler with the following full conditionals
where
This sampler is implemented in C++ via Rcpp (Eddelbuettel & François, 2011) and parallelized using open-MP.
To add the openmp flag to the compiler, add the following two lines to your src/Makevars
file
There are 4 key differences between this code and a single-threaded Gibbs sampler
1) The use of #include <omp.h>
2) The preprocessor statement #pragma omp parallel for num_threads(chains)
before the for
loop around the chains. This preprocessor command is what indicates parallelization. The number of threads can (and is in this case) be a variable. Additionally, all variables in the scope above the preprocessor command are shared amongst the threads unless otherwise specified.
3) All variables used for the Gibbs Sampler (the multi-threaded portion) must be from a thread-safe library, such as the C++ STL or armadillo. The standard Rcpp
types are not thread-safe, and may not be used.
4) The compiler must be made aware of open-MP by a flag. These instructions are passed to the compiler via the commands in src/Makevars
Some caveats: not every compiler supports openmp. For example, the default clang
compiler on OS X will not compile the above code. An alternative is to use gcc
via homebrew installed with the command brew install gcc --without-multilib
. On most linux distributions, gcc
should compile C++ with open-MP without additional configuration.
Results
To investigate performance of parallel chains using open-MP, I simulated data from a normal distribution with 1,000, 10,000, 100,000, and 1,000,000 observations and benchmarked the above code with 1,000 iterations and 1,000 burnins. Below is a density plot of the timings (in milliseconds) by chain number as well as number of observations in the simulated data.
While there is some overhead with adding an additional thread, going from 1 to 2 chains and 3 to 4 chains is mostly negligible. A more computationally intensive model would demonstrate the time savings of using open-MP for parallel chains even further.
For an additional example of open-MP for parallel MCMC chains see this Mixture Model package.
References
- Gelman, A., & Rubin, D. B. (1992). Inference from Iterative Simulation Using Multiple Sequences. Statistical Science, 7(4), 457–472. doi:10.1214/ss/1177011136
- Eddelbuettel, D., & François, R. (2011). Rcpp: Seamless R and C++ Integration. Journal Of Statistical Software, 40(8), 1–18. Retrieved from http://www.jstatsoft.org/v40/i08/