What is the most efficient way to apply operations between arrays of different sizes to each row of

Time:01-18

I have a large three-dimensional array corresponding to a hyperspectral cube in SpatRaster format. I also have two smaller ones (covering a smaller area) that correspond to reference images (dark and white) used to correct the pixel values of the cube. The smaller ones have only one line of pixels each; they were originally SpatRasters but were converted to arrays to do some calculations. Lastly, I have a couple of constants.

For each line in the sample image, I need to subtract the line of pixels in the dark image from the line of pixels in the sample image, then divide by the line of pixels in the white image. Then I multiply by the constants and store the corrected line in a new array that eventually becomes a corrected SpatRaster.
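The per-line correction described above can be sketched on tiny made-up vectors (the names and values here are illustrative only; just the formula matches the question):

```r
# Toy one-line correction: all names and values below are made up
# for illustration; only the formula matches the question.
sample_line <- c(120, 80, 200, 150)   # raw counts for one line of pixels
dark_line   <- c(10, 10, 12, 11)      # dark reference for the same pixels
white_line  <- c(240, 230, 250, 245)  # white reference for the same pixels
R_ref    <- 0.99                      # reflectance of the reference material
int_time <- 13 / 13                   # t_w / t_s

reflectance_line <- R_ref * int_time * (sample_line - dark_line) / white_line
round(reflectance_line, 3)
# 0.454 0.301 0.744 0.562
```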

I wrote a script that does this as follows:

#hyperspectral cube
nlines_s <- 1412  #number of lines of pixels in the image
npixels_s <- 1024 #number of pixels within each line
nbands_s <- 448   #number of spectral bands within each pixel

sample_image <- terra::rast(array(runif(1), dim = c(nlines_s, npixels_s, nbands_s))) # note: the single runif(1) value is recycled across all cells

# SpatRasters already converted to arrays (one line of pixels each);
# again, the single runif(1) value is recycled
dark_current <- array(runif(1), dim = c(1, npixels_s, nbands_s))
white_reference <- array(runif(1), dim = c(1, npixels_s, nbands_s))

#integration time constants
t_s <- 13
t_w <- 13

int_time <- t_w/t_s

#reflectance of reference material
R_ref <- 0.99 



# Process that needs to be sped up
reflectance_image <- array(0, c(nlines_s, npixels_s, nbands_s))

for (fr in 1:nlines_s) {
  reflectance_image[fr, , ] <- as.matrix(R_ref * int_time * ((sample_image[fr, , ] - dark_current) / white_reference))
  print(paste0("Calculating reflectance: ", round((fr / nlines_s) * 100, 2), "%"))
}

ri_r <- terra::rast(reflectance_image)
names(ri_r) <- names(sample_image)

The code works as is, but it takes a long time because of the sequential for loop. I suspect it can be sped up by applying the calculation to all lines (rows) of the image at once. I have tried to do so with apply functions but have not succeeded (I may have done it wrong, though).

What would be the solution that would speed up the processing time the most?

CodePudding user response:

Here is your (adjusted) example data.

library(terra)
#terra 1.6.52
nr <- 1400
nc <- 1000
nl <- 40
set.seed(1)
image <- terra::rast(nrow=nr, ncol=nc, nlyr=nl, vals=runif(nr*nc*nl))

dark <- array(runif(nc*nl),dim = c(1, nc, nl)) 
white <- array(runif(nc*nl),dim = c(1, nc, nl))
int_time <- 1
R_ref <- 0.99 

Your approach, slightly improved

f1 <- function() {
    ref <- array(0, c(nr, nc, nl))
    for (r in 1:nr) {
        ref[r, , ] <- as.matrix((image[r, ] - dark) / white)
    }
    ref <- terra::rast(ref * (R_ref * int_time), ext=ext(image), crs=crs(image))
    names(ref) <- names(image)
    ref
}

A "terra" solution

f2 <- function() {
    d <- rast(dark, ext=ext(image), crs=crs(image))
    d <- disagg(d, c(nrow(image), 1))
    w <- rast(white, ext=ext(image), crs=crs(image))
    w <- disagg(w, c(nrow(image), 1))
    (R_ref * int_time) * (image - d) / w
}

The above is better but still clumsy. In "terra" version 1.6.52 I have added support for arithmetic between a SpatRaster and a matrix. The columns of the matrix represent layers, the rows represent cells. I use that in f3 below.

f3 <- function() {
    dcur <- t(apply(dark, 2, c))
    wref <- t(apply(white, 2, c))
    (R_ref * int_time) * (image - dcur) / wref
}
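The reshaping step in f3 can be checked on a tiny made-up array (the 1 × 3 × 2 dimensions are illustrative only): `t(apply(a, 2, c))` turns a 1 × cells × layers array into the cells × layers matrix that the SpatRaster-matrix arithmetic expects.

```r
# Tiny made-up example of the reshaping used in f3: a 1 x 3 x 2 array
# (1 line, 3 cells, 2 layers) becomes a 3 x 2 matrix (cells x layers).
a <- array(1:6, dim = c(1, 3, 2))
m <- t(apply(a, 2, c))
dim(m)   # 3 2
m[1, ]   # values of cell 1 across both layers: 1 4
```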

Comparison

system.time(x1 <- f1())
#   user  system elapsed 
#  15.36    2.50   17.92 
system.time(x2 <- f2())                                         
#   user  system elapsed 
#   9.34    1.36   10.71 
system.time(x3 <- f3())
#   user  system elapsed 
#   1.74    0.45    2.19 

Note that f3 depends on "terra" 1.6.52, which is currently the development version. You can install it with install.packages('terra', repos='https://rspatial.r-universe.dev')

Unfortunately, when I use nl=400, that is, use dimensions more similar to your data, the relative differences are much smaller.

system.time(x1 <- f1())
#   user  system elapsed 
# 145.01   16.11  161.53 
system.time(x2 <- f2())                                                                                  
#   user  system elapsed 
# 108.78   14.95  124.08 
system.time(x3 <- f3())
#   user  system elapsed 
# 100.62   10.85  111.91 

CodePudding user response:

What would be the solution that would speed up the processing time the most?

This may not be the absolute optimum, but a simple RcppArmadillo function that takes base R arrays instead of SpatRaster objects speeds things up by almost an order of magnitude. Using @RobertHijmans' larger dataset (with 448 bands) and f3:

Rcpp::cppFunction(
  "arma::cube f4(arma::cube& img, const arma::mat& drk, const arma::mat& wht) {
    // drk is the dark reference; wht is passed pre-divided
    // (i.e. R_ref * int_time / white), so the per-row correction
    // is (img - drk) element-wise-times wht
    for (unsigned int i = 0; i < img.n_rows; i++) {
      img.row(i) -= drk;  // subtract dark current
      img.row(i) %= wht;  // element-wise multiply by the scaled 1/white
    }
    return(img);
  }",
  depends = "RcppArmadillo",
  plugins = "cpp11"
)

ref_image1 <- as.array(image)
system.time(ref_image1 <- f4(ref_image1, dark[1,,], R_ref*int_time/white[1,,]))
#>  user  system elapsed 
#> 14.78    0.65   15.42 
system.time(ref_image2 <- f3())
#>   user  system elapsed                   
#> 139.89    8.63  149.09

Note that the two solutions will differ very slightly because the floating-point operations are applied in a different order.

all.equal(ref_image1, as.array(ref_image2))
#> [1] "Mean relative difference: 3.520606e-08"
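That difference is plain floating-point round-off: regrouping mathematically equivalent operations can change the last bits of the result, as a quick base-R check (unrelated to either solution's code) shows:

```r
# Floating-point arithmetic is not associative: mathematically equivalent
# groupings can differ in the last bits of the result.
a <- (0.1 + 0.2) - 0.3
b <- 0.1 + (0.2 - 0.3)
a == b            # FALSE
all.equal(a, b)   # TRUE (equal within numerical tolerance)
```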