I have a parallel process that I've had working in R for a while. It uses a FORK cluster rather than a PSOCK cluster (because of the speed and memory overhead of PSOCK), and I register the backend with registerDoParallel. I wanted to add a progress bar to see how far along my code is (sometimes it can run for hours or days, depending on the number of iterations) and to see whether any changes help speed it up. Sadly, I can't get a progress bar with doParallel, but I can with doSNOW. The problem is that if I use doSNOW with a FORK cluster, I get this error: "no applicable method for 'sendData' applied to an object of class "c('forknode', 'SOCK0node')"". It works flawlessly with a PSOCK cluster, but then I have the memory issues I'd like to avoid. How do I get around this? Am I unable to use a FORK cluster with doSNOW?
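For reference, here is a sketch of the usual doSNOW progress-bar pattern (not the asker's actual code), assuming a two-worker PSOCK cluster and that the doSNOW package is installed. It relies on the `progress` entry of `.options.snow`, a function that doSNOW calls with the number of completed tasks. Going by the error message, snow's `sendData()` generic has no method for the `forknode`/`SOCK0node` classes that parallel uses for FORK nodes, which is why the same code fails once the cluster is created with `type = "FORK"`.

```r
library(doSNOW)  # attaches foreach and snow as well

# PSOCK cluster from 'parallel'; per the question, this works with doSNOW
cl <- parallel::makeCluster(2, type = "PSOCK")
registerDoSNOW(cl)

xs <- 1:5
pb <- txtProgressBar(max = length(xs), style = 3)
# doSNOW calls this with the number of tasks completed so far
progress <- function(n) setTxtProgressBar(pb, n)

res <- foreach(x = xs, .options.snow = list(progress = progress)) %dopar% {
  sqrt(x)
}

close(pb)
parallel::stopCluster(cl)
```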
(author of the Futureverse here)
You can use foreach() with doFuture and the multicore future backend for forked parallelization. This will allow you to use progressr for progress updates.
Here's an example adapted from https://progressr.futureverse.org/#foreach-with-dofuture:
```r
library(doFuture)
registerDoFuture()  ## %dopar% parallelizes via future
plan(multicore)     ## forked parallel processing (via 'parallel')

library(progressr)
handlers(global = TRUE)
handlers("cli")     ## how progress is reported

my_fcn <- function(xs) {
  p <- progressor(along = xs)
  foreach(x = xs) %dopar% {
    Sys.sleep(6.0 - x)
    p(sprintf("x=%g", x))
    sqrt(x)
  }
}

my_fcn(1:5)
## ■■■■■■■■■■■■ 40% | x=4 ETA: 3s
```
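A variant of the same idea, sketched under a few assumptions: doFuture 1.0.0 or later, which provides the `%dofuture%` operator so that `registerDoFuture()` is not needed; progressr's default text progress bar via `with_progress()` instead of global handlers; and a fallback to `plan(multisession)` where forking is unavailable (MS Windows, or R running inside RStudio), detected with `parallelly::supportsMulticore()`. Keep in mind that multisession uses background R sessions, much like PSOCK, so the fallback brings back the memory overhead the question wants to avoid. The `Sys.sleep(0.5)` only stands in for real work.

```r
library(foreach)
library(doFuture)   # provides %dofuture% (doFuture >= 1.0.0); attaches future
library(progressr)

# Forked workers where supported; otherwise background R sessions
# (assumed fallback, not part of the original answer)
if (parallelly::supportsMulticore()) {
  plan(multicore)
} else {
  plan(multisession)
}

my_fcn <- function(xs) {
  p <- progressor(along = xs)
  foreach(x = xs) %dofuture% {
    Sys.sleep(0.5)          # placeholder for real work
    p(sprintf("x=%g", x))   # signal one step of progress
    sqrt(x)
  }
}

# with_progress() renders the progress updates relayed from the workers
with_progress({
  res <- my_fcn(1:5)
})

plan(sequential)  # shut down the workers
```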
PS. This will also allow you to move away from depending on the snow package, which is considered deprecated and is falling behind the improvements that have been made to the parallel package.