How to run a function x times in different clusters in R?


I have this command: replicate(10^4, binom_ttest(100, 0.5)) %>% {sum(.<0.05)}/20000

binom_ttest is a function I wrote that returns two p-values: one for the binomial test and one for the t-test.

As this is a very long calculation, I wanted to ask how I can distribute it across 2 clusters. I know it should be possible with parLapply, but this doesn't work: parLapply(makeCluster(2), 1:10000, binom_ttest(100, 0.5))

CodePudding user response:

The future.apply package provides future_replicate(), which is a parallel implementation of replicate():

library(future.apply)

plan(multisession, workers = 2)

y <- future_replicate(10^4, binom_ttest(100, 0.5))

It ensures that proper parallel random number generation (RNG) is used, which is critical for permutation tests, bootstrapping, and similar simulations.
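As a rough sketch of how this could be applied to the question's rejection-rate calculation: the body of binom_ttest below is a stand-in borrowed from another answer in this thread (the question does not show it), and the seed value 2021 and the reduced replicate count are arbitrary choices for illustration. The future.seed argument accepts either TRUE or an integer seed.

```r
library(future.apply)

# Stand-in for the asker's function (assumed body, not shown in the question)
binom_ttest <- function(n, p) {
  x <- sample(0:1, n, replace = TRUE, prob = c(1 - p, p))
  c(binom.test(sum(x), n, 0.5)[["p.value"]],
    t.test(x, mu = 0.5)[["p.value"]])
}

plan(multisession, workers = 2)

# Each replicate returns 2 p-values, so res is a 2 x 1000 matrix
res <- future_replicate(1000, binom_ttest(100, 0.5), future.seed = 2021)

plan(sequential)  # shut down the background workers

# Share of p-values below 0.05, one value per test
reject_rate <- rowMeans(res < 0.05)
reject_rate
```

Unlike the parLapply approach, there is no explicit export step here: the package identifies binom_ttest as a global used by the expression and ships it to the workers.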

CodePudding user response:

Here's an example test run with the parallel package. I'd recommend trying a dry run with a simple function first and then expanding it to your more complicated example.

As a side note, every object you use (data or function) has to be available on all workers, so it has to be exported, like simpl() in the example below.


library(parallel)

cl <- makeCluster(getOption("cl.cores", 2))

# socket cluster with 2 nodes on host ‘localhost’

simpl <- function(x)diag(x)

clusterExport(cl, varlist=("simpl"))

parLapply( cl, 1:5, function(x) simpl(x) )
#     [,1]
#[1,]    1
#     [,1] [,2]
#[1,]    1    0
#[2,]    0    1
#... etc
#     [,1] [,2] [,3] [,4] [,5]
#[1,]    1    0    0    0    0
#[2,]    0    1    0    0    0
#[3,]    0    0    1    0    0
#[4,]    0    0    0    1    0
#[5,]    0    0    0    0    1

CodePudding user response:

Here is a way.
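You forgot to export the function to the workers, and parLapply() needs a function as its third argument, not the result of calling binom_ttest(100, 0.5). Also, to make the results reproducible, it's better to set the seed of the pseudo-RNG.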


library(parallel)

binom_ttest <- function(n, p) {
  x <- sample(0:1, n, replace = TRUE, prob = c(1-p, p))
  xsum <- sum(x==1)
  p_binom <- binom.test(xsum, n, 0.5)[["p.value"]]
  p_ttest <- t.test(x, mu=0.5)[["p.value"]]
  c(p_binom, p_ttest)
}

cl <- makeCluster(2)
clusterExport(cl, "binom_ttest")
clusterSetRNGStream(cl = cl, 2021)
res <- parSapply(cl, 1:10000, FUN = function(i) binom_ttest(100, 0.5))
stopCluster(cl)

rowMeans(res < 0.05)
#[1] 0.0347 0.0571
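To illustrate what seeding with clusterSetRNGStream() buys you, here is a small sketch that runs the same simulation twice on one cluster, resetting the RNG streams to the same seed before each run. The helper name run_once and the reduced count of 1000 replicates are illustrative choices, not part of the answer above; the two result matrices should come out identical.

```r
library(parallel)

binom_ttest <- function(n, p) {
  x <- sample(0:1, n, replace = TRUE, prob = c(1 - p, p))
  c(binom.test(sum(x), n, 0.5)[["p.value"]],
    t.test(x, mu = 0.5)[["p.value"]])
}

# Hypothetical wrapper: one simulation per index, index itself unused
run_once <- function(i) binom_ttest(100, 0.5)

cl <- makeCluster(2)
# Export from the current environment so the workers can find binom_ttest
clusterExport(cl, "binom_ttest", envir = environment())

clusterSetRNGStream(cl, 2021)
res1 <- parSapply(cl, 1:1000, run_once)

clusterSetRNGStream(cl, 2021)  # reset every worker's stream to the same seed
res2 <- parSapply(cl, 1:1000, run_once)

stopCluster(cl)

same <- identical(res1, res2)
same
```

Without the second clusterSetRNGStream() call, the workers would continue from wherever their streams left off, and the second run would produce different p-values.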