Hi, I am trying to call a Python function with reticulate in parallel using foreach, like so:
library(reticulate)
library(doParallel)
library(foreach)
library(parallel)
py_install("wandb")
wandb <- import("wandb")
cl <- makeCluster(detectCores(), type = 'PSOCK')
registerDoParallel(cl)
foreach(i = 1:5) %dopar% {
  wandb$init(project = "test")
}
This gives:
Error in {: task 1 failed - "attempt to apply non-function"
Traceback:
1. foreach(i = 1:5) %dopar% {
. wandb$init(project = "test")
. }
2. e$fun(obj, substitute(ex), parent.frame(), e$data)
Does the foreach package not work with reticulate?
Answer:
You cannot export reticulate python.builtin.module objects from one R process to another. They are designed to work only within the R process in which they were created. If you attempt it anyway, you'll get the error you're reporting.
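To see why the object does not survive the trip to a worker, here is a minimal sketch that round-trips a module through serialization, which is roughly what a PSOCK cluster does when it exports globals. It assumes a working Python installation for reticulate; the standard-library module math is used purely as a stand-in, and the variable names are illustrative. According to the reticulate documentation, a deserialized Python object comes back as a null external pointer, so the second check is expected to differ from the first.

```r
library(reticulate)

# Stand-in module for illustration; any Python module would do
math <- import("math")

# Serialize and unserialize within one session to mimic what happens
# when the object is shipped to another R process
copy <- unserialize(serialize(math, connection = NULL))

# The original still points at a live Python object
print(py_is_null_xptr(math))

# The copy has lost its reference to the Python object
print(py_is_null_xptr(copy))
```

Calling a method on such a copy cannot reach Python, which is consistent with the failure seen on the workers.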
If you use the future framework for your parallelization, you can have it check for this and give an informative error message immediately, e.g.
library(reticulate)
library(foreach)
library(doFuture)
registerDoFuture()
cl <- parallelly::makeClusterPSOCK(2L)
plan(cluster, workers = cl)
## Detect non-exportable objects and give an error asap
options(future.globals.onReference = "error")
# py_install("wandb")
wandb <- import("wandb")
res <- foreach(i = 1:5) %dopar% {
  wandb$init(project = "test")
  sqrt(i)
}
The call to foreach() will result in:
Error: Detected a non-exportable reference ('externalptr') in one of
the globals ('wandb' of class 'python.builtin.module') used in the
future expression
You can read more about this in https://future.futureverse.org/articles/future-4-non-exportable-objects.html#package-reticulate.
A workaround would be to create the wandb object within each iteration, so that it is created on the worker where it is used. Something like:
res <- foreach(i = 1:5) %dopar% {
  wandb <- import("wandb")
  wandb$init(project = "test", mode = "offline")
  sqrt(i)
}
Disclaimer: I know nothing about the 'wandb' Python module. Maybe the above doesn't make sense.
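The same idea should also carry over to the doParallel setup from the question. The following is a rough sketch under a few assumptions: Python is available to reticulate on every worker, the standard-library module math stands in for wandb, and two workers are used instead of detectCores() to keep it light. The .packages argument asks foreach to load reticulate on each worker so that import() is available there.

```r
library(doParallel)  # also attaches foreach and parallel

cl <- makeCluster(2L, type = "PSOCK")
registerDoParallel(cl)

res <- foreach(i = 1:5, .packages = "reticulate") %dopar% {
  # Import inside the loop body so the module object is created
  # in the worker process that uses it
  math <- import("math")  # stand-in for wandb
  math$sqrt(i)
}

# Shut the workers down once the loop is done
stopCluster(cl)
```

Because nothing Python-related is created in the main session, no module object has to be exported to the workers.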