Discussion: [Rcpp-devel] Parallel random numbers using Rcpp and OpenMP
Matteo Fasiolo
2014-04-27 23:30:24 UTC
Dear Rcpp developers,

I am working on a small package (here <https://github.com/mfasiolo/mvn>)
that should provide efficient tools
for the multivariate normal distribution using Rcpp/RcppArmadillo and
OpenMP.

Creating functions that evaluate the multivariate normal density efficiently
was fairly straightforward, but I am struggling with parallel random number
generation with OpenMP.

As I understand, doing things such as:

NumericMatrix out(n, d);
#pragma omp for schedule(static)
for(int kk = 0; kk < d; kk++) out( _, kk) = rnorm(n);

is not going to work, because rnorm() is not thread safe
(in fact this code crashed R). On the other hand R level parallelism
using clusterApply, mclapply etc appears to be too slow to be of any
use for this purpose.

Is anybody aware of any package providing a parallel C++ RNG which my
package might link to? I have read this post
<http://www.lindonslog.com/programming/parallel-random-number-generation-trng/>
about TRNG (Tina's Random Number Generator), which seems to work with OpenMP
and Rcpp. How hard would it be to include such a library in my package (if at
all possible)?

Sorry for the possibly silly questions and thanks for any suggestion.

Matteo
Dirk Eddelbuettel
2014-04-28 00:10:23 UTC
Matteo,

On 28 April 2014 at 01:30, Matteo Fasiolo wrote:
| Dear Rcpp developers,
|
|  I am working on a small package (here) that should provide efficient tools 
| for the multivariate normal distribution using Rcpp/RcppArmadillo and OpenMP.
|  
| Creating functions that evaluate the multivariate normal density efficiently
| was fairly straightforward,

Yes, and there is also an Rcpp Gallery post on this topic:
http://gallery.rcpp.org/articles/dmvnorm_arma/

| but I am struggling with parallel random number
| generation with OpenMP.
|
| As I understand, doing things such as:
|
| NumericMatrix out(n, d);
| #pragma omp for schedule(static)
| for(int kk = 0; kk < d; kk++) out( _, kk) = rnorm(n);
|
| is not going to work, because rnorm() is not thread safe 
| (in fact this code crashed R). On the other hand R level parallelism
| using clusterApply, mclapply etc appears to be too slow to be of any
| use for this purpose.

You cannot 'touch R' at all if you are in an OpenMP context as R is nowhere
near thread safe. Use R as a 'shell' to launch a job, keep data in C++
containers and go crazy in OpenMP. Then collect results and report back to R.
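
A minimal sketch of that pattern (hypothetical function and argument names, using
C++11 <random> engines seeded per thread instead of R's RNG) might look like:

#include <Rcpp.h>
#include <cstdint>
#include <random>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// [[Rcpp::plugins(openmp)]]

// Hypothetical sketch: R only launches the job and collects the result;
// nothing from R's API is touched inside the parallel region.
// Assumes length(seeds) >= number of OpenMP threads.
// [[Rcpp::export]]
Rcpp::NumericVector runifOmp(int n, Rcpp::NumericVector seeds) {
  std::vector<double> seedv(seeds.begin(), seeds.end()); // copy out of the R object first
  std::vector<double> out(n);                            // plain C++ container

  #pragma omp parallel
  {
    int tid = 0;
    #ifdef _OPENMP
    tid = omp_get_thread_num();                          // one engine per thread
    #endif
    std::mt19937_64 engine(static_cast<std::uint64_t>(seedv[tid]));
    std::uniform_real_distribution<double> unif(0.0, 1.0);

    #pragma omp for schedule(static)
    for (int i = 0; i < n; i++) out[i] = unif(engine);   // each element written by exactly one thread
  }

  return Rcpp::wrap(out);                                // back to R only after the parallel region
}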
|
| Is anybody aware of any package providing a parallel C++ rng which my
| package might link to? I have read this post about Tina's rng, which
| seems to work with OpenMP and Rcpp. How hard would it be to have
| such a library included in my package (if at all possible)?

A number of people have worked on this. I used this to try to make the 'port'
of DEoptim to C++ / Rcpp work in parallel. I got it partially working but
didn't get to a point where I liked the reproducibility.

Around that time, Petr Savicky had already done some work, and I presume he
continued with it. See his page on this research at
http://www2.cs.cas.cz/~savicky/randomNumbers/ -- I am not sure if he has code
at R-Forge or GitHub.
Matt D.
2014-04-28 08:39:28 UTC
Post by Matteo Fasiolo
[...]
NumericMatrix out(n, d);
#pragma omp for schedule(static)
for(int kk = 0; kk < d; kk++) out( _, kk) = rnorm(n);
is not going to work, because rnorm() is not thread safe
(in fact this code crashed R). On the other hand R level parallelism
using clusterApply, mclapply etc appears to be too slow to be of any
use for this purpose.
Is anybody aware of any package providing a parallel C++ rng which my
package might link to?
Your first choice can be to go with the C++ Standard Library -- header
<random> -- (importantly) using one object per thread:
http://stackoverflow.com/questions/8813592/c11-thread-safety-of-random-number-generators

See:
http://en.cppreference.com/w/cpp/numeric/random
http://isocpp.org/blog/2013/03/n3551-random-number-generation

Since you're using OpenMP, you can get the distinct thread ID by using
the `omp_get_thread_num` function:
http://stackoverflow.com/questions/15918758/how-to-make-each-thread-use-its-own-rng-in-c11

Note that if you're doing large scale parallel statistics (say, Monte
Carlo) you may also want a PRNG with statistical properties that don't
deteriorate for very large samples (e.g., no spurious correlation, let
alone overlapping streams). Preferably, pick one that doesn't maintain any
kind of global state (so-called "stateless"): it is also easier to use,
since there is nothing to synchronize/lock across threads.
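
To make the "stateless" idea concrete, here is a toy sketch (splitmix64-style
bit mixing; not Random123 itself and not of comparable statistical quality):
each draw is a pure function of a seed and a counter, so threads share no
mutable state at all.

#include <cstdint>

// Toy counter-based uniform generator (illustration only, NOT Random123):
// every value is a pure function of (seed, counter), so any thread can
// compute draw i directly, with nothing to lock or synchronize.
inline std::uint64_t mix64(std::uint64_t x) {
  x += 0x9E3779B97F4A7C15ULL;
  x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
  x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
  return x ^ (x >> 31);
}

inline double counterUniform(std::uint64_t seed, std::uint64_t i) {
  // keep 53 bits so the result fits a double's mantissa, giving U[0, 1)
  return (mix64(seed ^ mix64(i)) >> 11) * (1.0 / 9007199254740992.0);
}

// e.g. inside "#pragma omp parallel for": out[i] = counterUniform(seed, i);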

A relatively recent work that's particularly relevant is Random123:
https://www.deshawresearch.com/resources_random123.html
http://www.thesalmons.org/john/random123/

MIT-licensed C++ version is available here:
http://www.sitmo.com/article/parallel-random-number-generator-in-c/

With a simple example:
http://www.sitmo.com/article/multi-threaded-random-number-generation-in-c11/

The author is currently working on getting it into Boost.Random; discussion:
http://www.wilmott.com/messageview.cfm?catid=44&threadid=95882&STARTPAGE=2#710955

HTH,

Best,

Matt
Matteo Fasiolo
2014-04-28 10:51:09 UTC
Hi all,

many thanks for your detailed replies, I see that there are many options
out there.

As a first step I think I will try out the C++11 option that Matt
suggested, as it seems
relatively simple. In particular I might do something like:

#pragma omp parallel
{
  std::mt19937_64 engine( static_cast<uint64_t>(seeds[omp_get_thread_num()]) );
  std::uniform_real_distribution<double> zeroToOne(0.0, 1.0);

  #pragma omp for
  for (int i = 0; i < 1000; i++) {
    double a = zeroToOne(engine);
  }
}

where seeds is a NumericVector coming from R. That's probably not ideal, but
it might give reasonable and reproducible results (reproducible as long as the
number of threads and the loop schedule stay fixed).

Thanks,

Matteo
Romain Francois
2014-04-28 11:00:01 UTC
Hi,

If you can assume C++11, you might not need OpenMP, as you can just use the standard support for threads, etc.

Although the compiler suite used on Windows (if you care about that) does not yet support C++11 threads. This might become available later.
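
For completeness, a minimal sketch of the pure std::thread variant (assuming a
toolchain with C++11 thread support; the function name and partitioning scheme
are made up for illustration, and seeds is assumed non-empty, one per worker):

#include <cstddef>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Hypothetical sketch: one engine per std::thread, no OpenMP involved.
std::vector<double> runifThreads(std::size_t n,
                                 const std::vector<std::uint64_t>& seeds) {
  const std::size_t nThreads = seeds.size();
  std::vector<double> out(n);
  std::vector<std::thread> workers;

  for (std::size_t t = 0; t < nThreads; ++t) {
    workers.emplace_back([&, t]() {
      std::mt19937_64 engine(seeds[t]);                  // thread-local engine
      std::uniform_real_distribution<double> unif(0.0, 1.0);
      // each thread fills its own contiguous block, so no two threads
      // ever write to the same element
      const std::size_t lo = t * n / nThreads;
      const std::size_t hi = (t + 1) * n / nThreads;
      for (std::size_t i = lo; i < hi; ++i) out[i] = unif(engine);
    });
  }
  for (auto& w : workers) w.join();
  return out;
}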

Romain
Mark Clements
2014-04-30 08:10:52 UTC
My two cents worth:

For the microsimulation package, we needed uniform random number streams
and sub-streams at the C++ level, while supporting R's non-uniform
random number distributions[*]. For this, we used the C++ RngStreams
library and provided "double *user_unif_rand ()" for user-defined RNGs.
Essentially, this reproduces the "L'Ecuyer-CMRG" RNG provided by R, with
seed manipulation at the C++ level (cf. using .Random.seed and R's C
machinery). We also allow for the C++ seeds to be set to and from R.

See:
https://github.com/mclements/microsimulation/blob/master/src/microsimulation.cc
https://github.com/mclements/microsimulation/blob/master/src/microsimulation.h
(circa lines 290-355)
https://github.com/mclements/microsimulation/blob/master/R/rcpp_hello_world.R
(circa lines 6-36)
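
For readers unfamiliar with R's user-supplied RNG hook, a bare-bones sketch of
that mechanism (not the microsimulation code itself; it assumes L'Ecuyer's
RngStream.h is on the include path and skips all seeding and stream management):

#include "RngStream.h"     // L'Ecuyer's RngStream C++ class (assumed available)

static RngStream stream;   // the stream currently feeding R
static double value;

// R calls this for every uniform draw once RNGkind("user-supplied") is set
extern "C" double *user_unif_rand(void) {
  value = stream.RandU01();
  return &value;
}

After compiling this into the package's shared library and loading it,
RNGkind("user-supplied") routes R's rnorm(), rpois(), etc. through the
RngStream uniforms.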

This is not implemented using OpenMP. For our purpose, where we are
doing 10^7 small simulations, we chunk the simulations at the R level
and use the parallel package to call the C++ code on each chunk. This
approach scales well with more processors.

We have also looked at using the C++ RngStream library with
Boost.Random's and C++11's non-uniform random number distributions.
Again, this has not been implemented using OpenMP (todo?). For a simple
wrapper for Boost, see:
https://github.com/mclements/microsimulation/blob/master/src/rngstream-boost.hpp

with an example:
https://github.com/mclements/microsimulation/blob/master/src/rngstream-example.cpp

Two brief notes: first, we have put the RngStream library in the ssim
namespace for use with the microsimulation package. Other uses of the
Boost wrapper would probably want to change the namespace.

Second, the implementation for C++11 random number generators and
distributions required a small change to the RngStream library -
specifically, to output the random uniform as an unsigned long.
Interestingly, the resulting C++11 random numbers drop every second
value compared with R and Boost.Random.

For the C++11 wrapper, see the code RngStream* and rngstream* in:
https://github.com/mclements/microsimulation/tree/master/test

Sincerely, Mark.

[*] The BH package was not available, C++11 compiler requirements were
not accepted on CRAN, reimplementing non-uniform random number
distributions using UNURAN would have taken too long, and the R
non-uniform random number distributions are extensive and well tested.
Matteo Fasiolo
2014-04-30 09:15:58 UTC
Hi Mark,

many thanks for all the info, I will certainly have a detailed look at what
you are doing.

I think it would be nice to have a package that uses C++ level parallelism
(OpenMP or not). For example, here is a quick benchmark of rmvn() from mvnfast:
library("microbenchmark")
library("mvtnorm")
library("mvnfast")
library("MASS")
# Generating 1e4 50-dimensional multivariate normal r.v.
N <- 10000
d <- 50
mu <- 1:d
tmp <- matrix(rnorm(d^2), d, d)
mcov <- tcrossprod(tmp, tmp)
myChol <- chol(mcov)
microbenchmark(rmvn(N, mu, mcov, ncores = 4),
               rmvn(N, mu, mcov),
               rmvnorm(N, mu, mcov, method = "chol"),
               mvrnorm(N, mu, mcov))

Unit: milliseconds
                                  expr       min       lq   median       uq      max neval
         rmvn(N, mu, mcov, ncores = 4)  8.339534 12.85869 13.23915 13.82640 38.04688   100
                     rmvn(N, mu, mcov) 31.645092 32.32220 33.70345 34.03561 57.78102   100
 rmvnorm(N, mu, mcov, method = "chol") 57.956183 58.99730 59.84668 82.11610 90.71133   100
                  mvrnorm(N, mu, mcov) 60.621391 62.06400 82.86829 85.42350 90.50662   100


Here I followed Matt's suggestion and used the C++11 mt19937_64 RNG, with
seeds generated by R::runif.

Obviously what you have done by using RngStreams with Boost's distributions
is much more rigorous, so that is probably the right approach for providing
faster versions of rnorm(), rpois(), etc.


Thanks,


Matteo