Home > Software engineering >  Setting GSL RNG seed correctly in Rcpp for model with repeat iterations
Setting GSL RNG seed correctly in Rcpp for model with repeat iterations

Time:02-03

I am writing a stochastic, process driven model of transmission of infection and diagnostic testing to detect infection. The model requires repeat random samples across multiple time steps and iterations. The faster my model can run, the better. For the random sampling in the model, parameters for the random samples can change at each time step in the model. I first wrote my model in R, and then in CPP (via the great Rcpp package). In Rcpp, using the R based random number generator, the model takes about 7% of the time to run as it took in R. I was advised that using GSL within CPP for random number generation is faster again. In the CPP model, with GSL based random sampling instead of R based random sampling, I get a marginal increase in speed. However, I am not sure that I am using the GSL based random sampler correctly.

My questions are:

  1. Is it correct to only do the seed setting procedure once for the GSL RNG based on the time of day and use this same construct for all of my random draws (as I have done in code below)? I confess I do not fully understand the seed setting procedure within CPP for GSL as I am new to both. I have compared the distributions produced using both R-based and GSL-based RNG and they are very similar, so hopefully this bit is OK.

I obtained the code for setting the GSL seed according to the time of day from this Stack Overflow post:

Comparison of run speeds of the random sampling function in R only, CPP using the R RNG and CPP using the GSL RNG based on 100 comparisons of 1000 iterations using the "microbenchmark" package.

CodePudding user response:

A package you may find useful is my RcppZiggurat (github). It revives the old but fast Ziggurat RNG for normal covariates and times it. It use several other Ziggurat implementations as benchmarks -- including one from the GSL.

First, we can use its code and infrastructure to set up a simple structure (see below). I first show that 'yes indeed' we can seed a GSL RNG:

> setseedGSL(42)
> rnormGSLZig(5)
[1] -0.811264  1.092556 -1.873074 -0.146400 -1.653703
> rnormGSLZig(5)    # different
[1] -1.281593  0.893496 -0.545510 -0.337940 -1.258800
> setseedGSL(42)
> rnormGSLZig(5)    # as before
[1] -0.811264  1.092556 -1.873074 -0.146400 -1.653703
>

Note that we need a global variable for an instance of a GSL RNG 'state'.

Second, we can show that Rcpp is actually faster that either the standard normal GSL generator or its Ziggurat implementation. Using Rcpp vectorised is even faster:

> library(microbenchmark)
> n <- 1e5
> res <- microbenchmark(rnormGSLZig(n), rnormGSLPlain(n), rcppLoop(n), rcppDirect(n))
> res
Unit: microseconds
             expr     min        lq     mean   median       uq      max neval cld
   rnormGSLZig(n) 996.580 1151.7065 1768.500 1355.053 1424.220 18597.82   100   b
 rnormGSLPlain(n) 996.316 1085.6820 1392.323 1358.696 1431.715  2929.05   100   b
      rcppLoop(n) 223.221  259.2395  641.715  518.706  573.899 13779.20   100  a 
    rcppDirect(n)  46.224   67.2075  384.004  293.499  320.919 14883.86   100  a 
> 

The code is below; it is a pretty quick adaptation from my RcppZiggurat package. You can sourceCpp() it (if you have RcppGSL installed which I used to 'easily' get the compile and link instructions to the GSL) and it will run the demo code shown above.

#include <Rcpp/Lighter>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>

// [[Rcpp::depends(RcppGSL)]]

class ZigguratGSL {
public:
    ZigguratGSL(uint32_t seed=12345678) {
        gsl_rng_env_setup() ;
        r = gsl_rng_alloc (gsl_rng_default);
        gsl_rng_set(r, seed);
    }
    ~ZigguratGSL() {
        gsl_rng_free(r);
    }
    double normZig() {
        const double sigma=1.0;
        return gsl_ran_gaussian_ziggurat(r, sigma);
    }
    double normPlain() {
        const double sigma=1.0;
        return gsl_ran_gaussian_ziggurat(r, sigma);
    }
    void setSeed(const uint32_t seed) {
        gsl_rng_set(r, seed);
    }
private:
    gsl_rng *r;
};

static ZigguratGSL gsl;

// [[Rcpp::export]]
void setseedGSL(const uint32_t s) {
    gsl.setSeed(s);
    return;
}

// [[Rcpp::export]]
Rcpp::NumericVector rnormGSLZig(int n) {
    Rcpp::NumericVector x(n);
    for (int i=0; i<n; i  ) {
        x[i] = gsl.normZig();
    }
    return x;
}

// [[Rcpp::export]]
Rcpp::NumericVector rnormGSLPlain(int n) {
    Rcpp::NumericVector x(n);
    for (int i=0; i<n; i  ) {
        x[i] = gsl.normPlain();
    }
    return x;
}

// [[Rcpp::export]]
Rcpp::NumericVector rcppLoop(int n) {
    Rcpp::NumericVector x(n);
    for (int i=0; i<n; i  ) {
        x[i] = R::rnorm(1.0,0.0);
    }
    return x;
}

// [[Rcpp::export]]
Rcpp::NumericVector rcppDirect(int n) {
    return Rcpp::rnorm(n, 1.0, 0.0);
}


/*** R
setseedGSL(42)
rnormGSLZig(5)
rnormGSLZig(5)    # different
setseedGSL(42)
rnormGSLZig(5)    # as before


library(microbenchmark)
n <- 1e5
res <- microbenchmark(rnormGSLZig(n), rnormGSLPlain(n), rcppLoop(n), rcppDirect(n))
res
*/

PS We write it as Rcpp. Capital R, lowercase cpp.

  • Related