How to run a function x times in different clusters in R?

I have this command:

replicate(10^4, binom_ttest(100, 0.5)) %>% {sum(.<0.05)}/20000

binom_ttest is a function I wrote that returns two p-values: one from the binomial test and one from the t-test.

As this is a very long calculation, how can I distribute it across 2 cluster workers? I know parLapply can do this, but the following doesn't work:

parLapply(makeCluster(2), 1:10000, binom_ttest(100, 0.5))
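The parLapply() attempt fails because its third argument must be a function: binom_ttest(100, 0.5) is evaluated once in the master session, and only the resulting numeric vector reaches the workers. A minimal sketch of the difference, using a hypothetical stand-in for binom_ttest() since the original definition was not shown:

```r
library(parallel)

# Hypothetical stand-in for binom_ttest(): returns two p-values
binom_ttest <- function(n, p) {
  x <- rbinom(n, 1, p)
  c(binom.test(sum(x), n, 0.5)$p.value, t.test(x, mu = 0.5)$p.value)
}

cl <- makeCluster(2)
clusterExport(cl, "binom_ttest")

# Wrong: FUN is a numeric vector (the result of one local call), not a function
bad <- tryCatch(parLapply(cl, 1:4, binom_ttest(100, 0.5)),
                error = function(e) e)

# Right: wrap the call in a function; the index i only counts replications
good <- parLapply(cl, 1:4, function(i) binom_ttest(100, 0.5))

stopCluster(cl)
```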

Answer:

The future.apply package provides future_replicate(), a parallel implementation of replicate():

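library(future.apply)
# Run the futures in 2 background R sessions
plan(multisession, workers = 2)

# binom_ttest() is detected as a global and exported to the workers automatically
y <- future_replicate(10^4, binom_ttest(100, 0.5))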

It ensures that proper parallel random number generation (RNG) is used, which is critical for permutation tests, bootstrapping, and similar simulations.
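A sketch of the same approach with a fixed seed, assuming a hypothetical stand-in for binom_ttest() (the asker's real function was not shown) and an arbitrary 1000 replications. Passing an integer to future_replicate()'s future.seed argument makes the parallel RNG streams reproducible, so two runs with the same seed should give identical results:

```r
library(future.apply)

# Hypothetical stand-in for the asker's binom_ttest(): two p-values per call
binom_ttest <- function(n, p) {
  x <- sample(0:1, n, replace = TRUE, prob = c(1 - p, p))
  c(binom.test(sum(x), n, 0.5)$p.value, t.test(x, mu = 0.5)$p.value)
}

plan(multisession, workers = 2)

# An integer future.seed gives reproducible parallel RNG streams
y1 <- future_replicate(1000, binom_ttest(100, 0.5), future.seed = 2021)
y2 <- future_replicate(1000, binom_ttest(100, 0.5), future.seed = 2021)

plan(sequential)  # switch back to sequential and shut down the workers

# y1 is a 2 x 1000 matrix: row 1 = binomial test, row 2 = t-test
rowMeans(y1 < 0.05)
```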

Answer:

Here's an example test run with the parallel package. I'd recommend trying a dry run with a simple function first and then extending it to your more complicated example.

As a side note, every object you use (data sets as well as functions) has to be available on every worker node, so it has to be exported, like simpl() in the example below.

library(parallel)

cl <- makeCluster(getOption("cl.cores", 2))

cl
# socket cluster with 2 nodes on host ‘localhost’

simpl <- function(x) diag(x)

# Copy simpl() from the master session to every worker
clusterExport(cl, varlist = "simpl")

parLapply(cl, 1:5, function(x) simpl(x))
#[[1]]
#     [,1]
#[1,]    1
#
#[[2]]
#     [,1] [,2]
#[1,]    1    0
#[2,]    0    1
#
#... etc
#
#[[5]]
#     [,1] [,2] [,3] [,4] [,5]
#[1,]    1    0    0    0    0
#[2,]    0    1    0    0    0
#[3,]    0    0    1    0    0
#[4,]    0    0    0    1    0
#[5,]    0    0    0    0    1
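To see the export requirement in action, here is a minimal sketch reusing the toy simpl() from above: the same parLapply() call fails before clusterExport() and works after it. It assumes the code is run at the top level of the session; a function created inside another function carries its enclosing environment to the workers, so the failure may not show up there.

```r
library(parallel)

cl <- makeCluster(2)
simpl <- function(x) diag(x)

# simpl() exists only in the master session, so the workers cannot find it
before <- tryCatch(parLapply(cl, 1:2, function(x) simpl(x)),
                   error = function(e) e)

# After exporting, every worker has its own copy of simpl()
clusterExport(cl, varlist = "simpl")
after <- parLapply(cl, 1:2, function(x) simpl(x))

stopCluster(cl)
```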

Answer:

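Here is one way. You forgot to export the function to the workers, and parLapply()'s FUN argument has to be a function, not the result of calling binom_ttest(100, 0.5). To make the results reproducible, it's also better to seed the parallel pseudo-RNG.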

library(parallel)

binom_ttest <- function(n, p) {
  # Draw n Bernoulli(p) outcomes coded as 0/1
  x <- sample(0:1, n, replace = TRUE, prob = c(1-p, p))
  xsum <- sum(x == 1)
  # Test H0: success probability = 0.5 with both tests
  p_binom <- binom.test(xsum, n, 0.5)[["p.value"]]
  p_ttest <- t.test(x, mu = 0.5)[["p.value"]]
  c(p_binom, p_ttest)
}

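cl <- makeCluster(2)
clusterExport(cl, "binom_ttest")     # copy the function to the workers
clusterSetRNGStream(cl = cl, 2021)   # reproducible L'Ecuyer-CMRG stream per worker
# The index i is ignored; 1:10000 only sets the number of replications
res <- parSapply(cl, 1:10000, FUN = function(i) binom_ttest(100, 0.5))
stopCluster(cl)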

# res is a 2 x 10000 matrix: row 1 = binomial test, row 2 = t-test
rowMeans(res < 0.05)
#[1] 0.0347 0.0571
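The question's sum(. < 0.05) / 20000 pools both tests: replicate() returns a 2 x 10^4 matrix, so the denominator 20000 is 2 p-values times 10^4 replications. A minimal serial sketch in base R (with a hypothetical stand-in for binom_ttest() and a smaller, arbitrary n_rep) shows that this pooled rate is simply the average of the two per-test rates that rowMeans() reports, and that dividing by length(res) avoids hard-coding the denominator:

```r
# Hypothetical stand-in with the same output shape as binom_ttest() above
binom_ttest <- function(n, p) {
  x <- sample(0:1, n, replace = TRUE, prob = c(1 - p, p))
  c(binom.test(sum(x), n, 0.5)$p.value, t.test(x, mu = 0.5)$p.value)
}

set.seed(2021)
n_rep <- 500
res <- replicate(n_rep, binom_ttest(100, 0.5))  # 2 x n_rep matrix

per_test <- rowMeans(res < 0.05)           # binomial test, t-test
pooled   <- sum(res < 0.05) / length(res)  # same as dividing by 2 * n_rep
pooled
```

Reporting per_test is usually more informative than the pooled figure, since the two tests can have quite different type I error rates.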