Process sets of rasters in parallel using lapp function from terra package

I have groups of rasters that I want to run a function on, probably using the lapp function from the {terra} package. Here is a simple example with toy data showing the kind of thing I am hoping to accomplish.

library("terra")

rp10val = 106520
rp20val = 106520
rp50val = 154250
rp100val = 154250
rp200val = 154250
rp500val = 154250
rp1500val = 154250
sopval = 200

rp_10_vul = rast(nrow = 10, ncol = 10, vals = rep(rp10val, 10))
rp_20_vul = rast(nrow = 10, ncol = 10, vals = rep(rp20val, 10))
rp_50_vul = rast(nrow = 10, ncol = 10, vals = rep(rp50val, 10))
rp_100_vul = rast(nrow = 10, ncol = 10, vals = rep(rp100val, 10))
rp_200_vul = rast(nrow = 10, ncol = 10, vals = rep(rp200val, 10))
rp_500_vul = rast(nrow = 10, ncol = 10, vals = rep(rp500val, 10))
rp_1500_vul = rast(nrow = 10, ncol = 10, vals = rep(rp1500val, 10))
sop_tile = rast(nrow = 10, ncol = 10, vals = rep(sopval, 10))

input_raster_group <- c(rp_10_vul, rp_20_vul, rp_50_vul, rp_100_vul, 
                        rp_200_vul, rp_500_vul, rp_1500_vul, sop_tile)

## In the real world, each of these lists would hold rasters with different data

input_raster_lists <- list(list(input_raster_group), 
                           list(input_raster_group),
                           list(input_raster_group))

mcmapply(lapp,
         input_raster_lists,
         function(a,b,c,d,e,f,g,h){a + b + c + d + e + f + g + h},
         mc.cores = 2)

## If working on windows, this might be better to try and run as proof of concept
# mapply(lapp,
#         input_raster_lists,
#         function(a,b,c,d,e,f,g,h){(a + b - c) / (d + e + f + g + h)})

CodePudding user response：

Simplified data to make this easier to read

library("terra")
r10 = rast(nrow = 10, ncol = 10, vals = 10)
r20 = rast(nrow = 10, ncol = 10, vals = 20)
r50 = rast(nrow = 10, ncol = 10, vals = 50)
group <- c(r10, r20, r50)
input <- list(group, group, group)

You can use lapply to compute lists sequentially

x <- lapply(input, \(i) sum(i))
y <- lapply(input, \(i) app(i, sum))
z <- lapply(input, \(i) lapp(i, function(a,b,c){a + b + c}))

To use parallelization you could use e.g. parallel::parLapply or, as in your case, parallel::mcmapply.

SpatRaster objects hold a pointer (reference) to a C++ object that cannot be passed to a worker. You therefore need to use wrap and unwrap, as I show below. I use proxy=TRUE so that values are not forced into memory.

library(parallel)
inp <- lapply(input, \(x) wrap(x, proxy=TRUE))
f <- \(i) { unwrap(i) |> sum() |> wrap(proxy=TRUE)}
b <- mcmapply(f, inp)
out <- lapply(b, unwrap)
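
mcmapply relies on forking, which is not available on Windows. As a rough sketch of the same wrap/unwrap pattern with parallel::parLapply (mentioned above), something like the following should work on any platform. The cluster size of 2 and the variable names packed, cl and res are only illustrative assumptions.

```r
library(terra)
library(parallel)

# Same toy data as above
r10 <- rast(nrow = 10, ncol = 10, vals = 10)
r20 <- rast(nrow = 10, ncol = 10, vals = 20)
r50 <- rast(nrow = 10, ncol = 10, vals = 50)
group <- c(r10, r20, r50)
input <- list(group, group, group)

# Pack each SpatRaster so it can be serialized and sent to a worker
packed <- lapply(input, \(x) wrap(x, proxy = TRUE))

# Start a small socket cluster (2 workers is an arbitrary choice)
cl <- makeCluster(2)
invisible(clusterEvalQ(cl, library(terra)))

# On each worker: unpack, sum the layers, and pack the result again
res <- parLapply(cl, packed, function(p) {
  terra::wrap(sum(terra::unwrap(p)), proxy = TRUE)
})
stopCluster(cl)

# Back in the main session: turn the packed results into SpatRasters
out <- lapply(res, unwrap)
```

The results have to be wrapped again before they are returned, for the same reason the inputs are wrapped: the pointer inside a SpatRaster is not valid in another process.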

This approach may be useful in some cases, e.g. when you have to do many simulations on a relatively small raster that is in memory.

In most cases you would do parallelization because you are dealing with large rasters that are on disk. In that case, you could just send the filenames to the workers, and create the SpatRasters there (and write the output to disk).
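
A minimal sketch of that file-based approach, under some assumptions: the temporary GeoTIFF files stand in for your real rasters on disk, and the helper name sum_file and the variables in_files and out_files are made up for this illustration.

```r
library(terra)
library(parallel)

# Hypothetical setup: write three small 3-layer rasters to temporary files
# so that there is something on disk to read
in_files <- vapply(1:3, function(k) {
  r <- c(rast(nrow = 10, ncol = 10, vals = k),
         rast(nrow = 10, ncol = 10, vals = 2 * k),
         rast(nrow = 10, ncol = 10, vals = 3 * k))
  f <- tempfile(fileext = ".tif")
  writeRaster(r, f)
  f
}, character(1))
out_files <- vapply(1:3, function(k) tempfile(fileext = ".tif"), character(1))

# Worker function: only filenames cross the process boundary
sum_file <- function(infile, outfile) {
  r <- rast(infile)                          # SpatRaster is created in the worker
  writeRaster(sum(r), outfile, overwrite = TRUE)
  outfile                                    # return the filename, not the raster
}

# Forking is not available on Windows, so fall back to a single core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mcmapply(sum_file, in_files, out_files, mc.cores = cores)

# Open the results in the main session
out <- lapply(out_files, rast)
```

Because the workers receive and return plain character strings, no wrap or unwrap is needed here.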

There is more discussion here