Home > Enterprise >  Nested parallelized (foreach) R loop with condition
Nested parallelized (foreach) R loop with condition

Time:11-09

I'm trying to implement some parallelization in R loops to deal with large raster files. I've used some very useful posts, but cannot make my code work.

Here's an example with three raster files:

library(raster)

#Simulating rasters:

n.size <- 10

env1 <- raster(nrows=n.size, ncols=n.size, xmn=0, xmx=1, ymn=0, ymx=1)
v1 <- runif(ncell(env1)/2, min=0.5, max=1)
v2 <- runif(ncell(env1)/2, min=0, max=0.5)
values(env1) <- c(v1,v2)
env1[c(71:100)] <- NA

env2 <- raster(nrows=n.size, ncols=n.size, xmn=0, xmx=1, ymn=0, ymx=1)
v2 <- runif(ncell(env1)/2, min=0.7, max=1)
v1 <- runif(ncell(env1)/2, min=0, max=0.3)
values(env2) <- c(v1,v2)

env3 <- raster(nrows=n.size, ncols=n.size, xmn=0, xmx=1, ymn=0, ymx=1)
v2 <- runif(ncell(env3)/2, min=0.9, max=1)
v1 <- runif(ncell(env3)/2, min=0, max=0.1)
values(env3) <- c(v1,v2)

myStack <- stack(env1,env2,env3)

plot(myStack)

The tree rasters have the same extent and dimensions, but the first one has some grid cells with missing data. I want to set the correspondent cells in the other two rasters to have missing data as well.

In a serial, traditional loop, I do that

myStack.mod <- myStack

start.time <- Sys.time()
for (j in 2:length(names(myStack))) {
    for (i in 1:ncell(myStack[[1]])) {
        if (is.na(myStack[[1]][i])) {
            myStack.mod[[j]][i] <- NA
        }
    }
}
end.time <-   Sys.time() - start.time
end.time
plot(myStack.mod)

To parallelize it, I tried the following:

cores=detectCores()
cl <- makeCluster(cores[1]-2) #not to overload your computer
registerDoParallel(cl)

myStack.mod <- myStack

start.time <- Sys.time()

foreach (j = 2:length(names(myStack))) %:%
    foreach(i = 1:ncell(myStack[[1]])) %dopar% {
        if (is.na(myStack[[1]][i])) {
            myStack.mod[[j]][i] <- NA
        }
    }
    
    

end.time <-   Sys.time() - start.time
end.time

stopCluster(cl)

plot(myStack.mod)

But it doesn't work. Does anyone have any idea where the problem is? Many thanks.

CodePudding user response:

You cannot assign values from inside a %dopar% { ... } expression. Instead, just as with functions, you need to return values, either by explicitly calling return(), or by putting the value you want to return last in the expression.

An example,

y <- foreach(i = 1:3) %dopar% {
  sqrt(i)
}

What you're trying to do is something like:

y <- double(3)
foreach(i = 1:3) %dopar% {
  y[i] <- sqrt(i)
}

but that does not work and was never meant to work. Basically, foreach() is not a for-loop, it's more like lapply().

  • Related