I am trying to learn parallelism in R. I have written code that builds a 500*500 matrix containing the values 1 to 250000. For each element in the matrix, I look for the neighbor with the lowest value; the neighbors can be in diagonal positions too. Then I replace the element itself with that lowest neighbor. Running this code takes about 4.5 seconds on my computer. If it is possible, could anyone help me make the for loops parallel? Here is the code snippet:
start_time <- Sys.time()
myMatrix <- matrix(1:250000, nrow=500) # a 500 * 500 matrix from 1 to 250000
indexBound <- function(row, col) { # check whether the indices are out of bounds
  if (row < 1 || col < 1 || row > 500 || col > 500) {
    return(FALSE)
  } else {
    return(TRUE)
  }
}
for (row in 1:nrow(myMatrix)) {
  for (col in 1:ncol(myMatrix)) {
    li <- list() # collect the values of all in-bounds neighbors
    if (indexBound(row-1, col-1)) {
      li <- c(li, myMatrix[row-1, col-1])
    }
    if (indexBound(row-1, col)) {
      li <- c(li, myMatrix[row-1, col])
    }
    if (indexBound(row-1, col+1)) {
      li <- c(li, myMatrix[row-1, col+1])
    }
    if (indexBound(row, col-1)) {
      li <- c(li, myMatrix[row, col-1])
    }
    if (indexBound(row, col+1)) {
      li <- c(li, myMatrix[row, col+1])
    }
    if (indexBound(row+1, col-1)) {
      li <- c(li, myMatrix[row+1, col-1])
    }
    if (indexBound(row+1, col)) {
      li <- c(li, myMatrix[row+1, col])
    }
    if (indexBound(row+1, col+1)) {
      li <- c(li, myMatrix[row+1, col+1])
    }
    min <- Reduce(min, li) # find the lowest value from the list
    myMatrix[row, col] <- min
  }
}
end_time <- Sys.time()
end_time - start_time
Thank you for your response.
CodePudding user response:
Your script is going to result in a matrix with all elements equal to 2, because the loop overwrites cells in place, so later cells read neighbor values that have already been replaced and the value 2 propagates through the whole matrix. If that's not the intention, you should create a copy of myMatrix and read from that copy when building li (inside the if statements).
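For illustration, here is a minimal sketch of that fix. It keeps your loop but reads every neighbor from a frozen copy (originalMatrix is a name I introduced) and compresses the eight if statements into a small offset loop; the only substantive change is where the neighbor values are read from:
originalMatrix <- myMatrix # frozen copy; neighbor values are read from here
for (row in 1:nrow(myMatrix)) {
  for (col in 1:ncol(myMatrix)) {
    li <- list()
    for (dr in -1:1) {
      for (dc in -1:1) {
        if ((dr != 0 || dc != 0) && indexBound(row + dr, col + dc)) {
          li <- c(li, originalMatrix[row + dr, col + dc]) # read from the copy, not myMatrix
        }
      }
    }
    myMatrix[row, col] <- Reduce(min, li) # write into myMatrix as before
  }
}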
I realize that this is probably a contrived example to explore parallelization, but with R it's usually best to focus on vectorization first. When vectorized, this operation can be fast enough that parallelization may actually be slower due to overhead. For example, here is a vectorized solution using a padded matrix (this doesn't give all 2's, and it still doesn't include the current cell in the min calculation):
library(matrixStats)
system.time({
idxShift <- expand.grid(rep(list(-1:1), 2))[-5,] # don't include the current cell (0, 0)
myMatrix <- matrix(nrow = 502, ncol = 502)
myMatrix[2:501, 2:501] <- matrix(1:250000, nrow = 500)
myMatrix <- matrix(rowMins(mapply(function(i,j) c(myMatrix[2:501 + i, 2:501 + j]), idxShift$Var1, idxShift$Var2), na.rm = TRUE), nrow = 500)
})
user system elapsed
0.03 0.00 0.03
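As an aside, if you did want the current cell included in the min, one simple follow-up (a sketch, not benchmarked; neighborMins and originalValues are names I introduced) is an elementwise pmin against the starting values:
neighborMins <- myMatrix # result of the vectorized step above
originalValues <- matrix(1:250000, nrow = 500) # the unpadded starting values
myMatrix <- pmin(neighborMins, originalValues) # min over each cell itself and its 8 neighbors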
Compare it to a parallel version of the same vectorized code using future.apply:
library(future.apply)
plan(multisession)
system.time({
idxShift <- expand.grid(rep(list(-1:1), 2))[-5,]
myMatrix <- matrix(nrow = 502, ncol = 502)
myMatrix[2:501, 2:501] <- matrix(1:250000, nrow = 500)
myMatrix <- matrix(rowMins(future_mapply(function(i,j) c(myMatrix[2:501 + i, 2:501 + j]), idxShift$Var1, idxShift$Var2), na.rm = TRUE), nrow = 500)
})
future:::ClusterRegistry("stop")
user system elapsed
0.10 0.05 2.11
If I didn't mess up something, the parallel solution is slower, and that is even without including plan(multisession) in the timing.
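For completeness, here is how that setup cost could be folded into the measurement: the same code, just with plan(multisession) moved inside the timed block (a sketch, not benchmarked; padded is a name I introduced for the padded matrix, and plan(sequential) is the documented way to end the background workers, as an alternative to the internal future:::ClusterRegistry("stop") call above). Counting the startup can only make the parallel route look worse:
library(future.apply)
library(matrixStats)
system.time({
  plan(multisession) # worker startup is now part of the measurement
  idxShift <- expand.grid(rep(list(-1:1), 2))[-5,]
  padded <- matrix(nrow = 502, ncol = 502)
  padded[2:501, 2:501] <- matrix(1:250000, nrow = 500)
  myMatrix <- matrix(rowMins(future_mapply(function(i,j) c(padded[2:501 + i, 2:501 + j]), idxShift$Var1, idxShift$Var2), na.rm = TRUE), nrow = 500)
})
plan(sequential) # shut the background workers down when done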