I'm trying to implement some parallelization in R loops to deal with large raster files. I've used some very useful posts, but cannot make my code work.
Here's an example with three raster files:
library(raster)
#Simulating rasters:
n.size <- 10
env1 <- raster(nrows=n.size, ncols=n.size, xmn=0, xmx=1, ymn=0, ymx=1)
v1 <- runif(ncell(env1)/2, min=0.5, max=1)
v2 <- runif(ncell(env1)/2, min=0, max=0.5)
values(env1) <- c(v1,v2)
env1[c(71:100)] <- NA
env2 <- raster(nrows=n.size, ncols=n.size, xmn=0, xmx=1, ymn=0, ymx=1)
v2 <- runif(ncell(env1)/2, min=0.7, max=1)
v1 <- runif(ncell(env1)/2, min=0, max=0.3)
values(env2) <- c(v1,v2)
env3 <- raster(nrows=n.size, ncols=n.size, xmn=0, xmx=1, ymn=0, ymx=1)
v2 <- runif(ncell(env3)/2, min=0.9, max=1)
v1 <- runif(ncell(env3)/2, min=0, max=0.1)
values(env3) <- c(v1,v2)
myStack <- stack(env1,env2,env3)
plot(myStack)
The tree rasters have the same extent and dimensions, but the first one has some grid cells with missing data. I want to set the correspondent cells in the other two rasters to have missing data as well.
In a serial, traditional loop, I do that
myStack.mod <- myStack
start.time <- Sys.time()
for (j in 2:length(names(myStack))) {
for (i in 1:ncell(myStack[[1]])) {
if (is.na(myStack[[1]][i])) {
myStack.mod[[j]][i] <- NA
}
}
}
end.time <- Sys.time() - start.time
end.time
plot(myStack.mod)
To parallelize it, I tried the following:
cores=detectCores()
cl <- makeCluster(cores[1]-2) #not to overload your computer
registerDoParallel(cl)
myStack.mod <- myStack
start.time <- Sys.time()
foreach (j = 2:length(names(myStack))) %:%
foreach(i = 1:ncell(myStack[[1]])) %dopar% {
if (is.na(myStack[[1]][i])) {
myStack.mod[[j]][i] <- NA
}
}
end.time <- Sys.time() - start.time
end.time
stopCluster(cl)
plot(myStack.mod)
But it doesn't work. Does anyone have any idea where the problem is? Many thanks.
CodePudding user response:
You cannot assign values from inside a %dopar% { ... }
expression. Instead, just as with functions, you need to return values, either by explicitly calling return()
, or by putting the value you want to return last in the expression.
An example,
y <- foreach(i = 1:3) %dopar% {
sqrt(i)
}
What you're trying to do is something like:
y <- double(3)
foreach(i = 1:3) %dopar% {
y[i] <- sqrt(i)
}
but that does not work and was never meant to work. Basically, foreach()
is not a for-loop, it's more like lapply()
.