Parallelized reticulate call with foreach failing

Time:10-24

Hi, I am trying to call a Python function with reticulate in parallel using foreach, like so:

library(reticulate)
library(doParallel)
library(foreach)
library(parallel)

py_install("wandb")
wandb <- import("wandb")


cl <- makeCluster(detectCores(), type = 'PSOCK')
registerDoParallel(cl)
foreach(i = 1:5) %dopar% {
    wandb$init(project = "test")
}

This gives:

Error in {: task 1 failed - "attempt to apply non-function"
Traceback:

1. foreach(i = 1:5) %dopar% {
 .     wandb$init(project = "test")
 . }
2. e$fun(obj, substitute(ex), parent.frame(), e$data)

Does the foreach package not work with reticulate?

CodePudding user response:

You cannot export reticulate python.builtin.module objects from one R process to another. They are designed to work only within the R process in which they were created. If you try, you'll get the error you're reporting.
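
As a rough illustration of why this fails, the sketch below round-trips a module object through R's serializer, which is approximately what happens when a global is shipped to a PSOCK worker. It assumes reticulate can find a working Python installation, and it uses the standard-library module 'os' purely as a stand-in for wandb; `py_is_null_xptr()` is reticulate's helper for checking whether the underlying external pointer is null.

```r
library(reticulate)

# 'os' is only a stand-in for any imported Python module (an assumption
# made to keep the example free of third-party Python packages).
os <- import("os")

# Round-trip through R's serializer, roughly what happens when a
# global variable is sent to a PSOCK worker.
os_copy <- unserialize(serialize(os, connection = NULL))

# The original should still reference a live Python object ...
py_is_null_xptr(os)

# ... whereas the copy is expected to carry a null external pointer,
# so calling anything on it cannot work.
py_is_null_xptr(os_copy)
```

If the second check comes back TRUE, the copy has lost its link to the Python object, which would explain why calling a method on it in the worker fails.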

If you use the future framework for your parallelization, you can have it check for this and give an informative error message immediately, e.g.

library(reticulate)
library(foreach)
library(doFuture)
registerDoFuture()
cl <- parallelly::makeClusterPSOCK(2L)
plan(cluster, workers = cl)

## Detect non-exportable objects and give an error asap
options(future.globals.onReference = "error")

# py_install("wandb")
wandb <- import("wandb")

res <- foreach(i = 1:5) %dopar% {
  wandb$init(project = "test")
  sqrt(i)
}

The call to foreach() will result in:

Error: Detected a non-exportable reference ('externalptr') in one of
the globals ('wandb' of class 'python.builtin.module') used in the
future expression

You can read more about this in https://future.futureverse.org/articles/future-4-non-exportable-objects.html#package-reticulate.

A workaround is to create the wandb object within each iteration, so that it is created on the worker where it is used. Something like:

res <- foreach(i = 1:5) %dopar% {
  wandb <- import("wandb")
  wandb$init(project = "test", mode = "offline")
  sqrt(i)
}

Disclaimer: I know nothing about the 'wandb' Python module. Maybe the above doesn't make sense.
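
If importing the module in every iteration feels wasteful, one possible variation is to import it once per worker with `parallel::clusterEvalQ()` and then refer to it by name inside the loop. This is only a sketch using the doParallel backend from the question; the variable name `py_mod` is hypothetical, and the standard-library module 'math' stands in for wandb so that the example does not depend on a third-party Python package.

```r
library(parallel)
library(foreach)
library(doParallel)

cl <- makeCluster(2L, type = "PSOCK")
registerDoParallel(cl)

# Import the Python module once per worker, inside the worker's own R process.
# 'math' is a stand-in; for the original use case this would be import("wandb").
# Returning NULL avoids sending the module object back to the main session.
clusterEvalQ(cl, {
  library(reticulate)
  py_mod <- import("math")
  NULL
})

# 'py_mod' is resolved in each worker's global environment. .noexport guards
# against a same-named object in the main session being exported by foreach.
res <- foreach(i = 1:5, .noexport = "py_mod") %dopar% {
  py_mod$sqrt(i)
}

stopCluster(cl)
```

The key point is the same as in the workaround above: the module object is created on the worker and never crosses a process boundary.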
