Calculate stats on a group of raster stack layers using {terra}?-CodePudding

I have a raster stack of 4 layers. Two of the layers are from model 1, two of the layers are from model 2. I need to calculate the median, 5th percentile and 95th percentile of each model. Is there some way to do this in one step? i.e. without writing out two intermediate stacks of rasters and then joining them together again. My attempt is below but it doesn't do the function by group.

library("terra")   
# Create some toy data
a <- rast(ncol = 10, nrow = 10, vals=rep(5,100), names=1)
b <- rast(ncol = 10, nrow = 10, vals=rep(10,100), names=1)
c <- rast(ncol = 10, nrow = 10, vals=rep(5,100), names=2)
d <- rast(ncol = 10, nrow = 10, vals=rep(10,100), names=2)
z <- c(a, b, c, d)

# Try to write a function to do the work
app(z,
    function(x) {
      c(median(x), quantile(x, c(0.05, 0.95)))
      },
     filename = "grouped_stats.tif)

My desired result is a raster stack of 6 layers. Something like this.

class       : SpatRaster
dimensions  : 10, 10, 6  (nrow, ncol, nlyr)
resolution  : 36, 18  (x, y)
extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84
sources     : memory  (3 layers)
              memory  (3 layers)
names       : median_1, q5_1, q95_1, median_2, pc5_2, pc95_2
min values  :      7.5,  5.0,  10.0,      7.5,   5.0,   10.0
max values  :      7.5,  5.0,  10.0,      7.5,   5.0,   10.0

Any ideas please? Thanks.

EFFORT 1

Inspired by @spacedman I wrote this function but it doesn't quite get me there. Putting it here as possible inspiration for others.

grouped_stats <- function(x) {
  layers_names <- unique(names(x))
  cell_output <- NA
  for (each_layer in layers_names) {
     cell_output <- rbind(cell_output,
                    c(median(x[[each_layer]], na.rm = TRUE),
                      quantile(x[[each_layer]], 0.05, 0.95)))
     names(cell_output) <- glue("{each_layer}_{c('median','pc5','pc95')}")
  }
  cell_output
}

g <- app(z, fun = grouped_stats)

EFFORT 2

Getting nearer I think, but not quite there.

my_stats_function <- function(x) {c(median(x), quantile(0.05, 0.95))}

app(z, 
    function(x){
      unlist(tapply(x, layer_names, my_stats_function))
      })

class       : SpatRaster 
dimensions  : 10, 10, 4  (nrow, ncol, nlyr)
resolution  : 36, 18  (x, y)
extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84
source      : memory
names       :   11, 1.95%,   21, 2.95%
min values  : 7.50,  0.05, 7.50,  0.05
max values  : 7.50,  0.05, 7.50,  0.05

EFFORT 3

Think I'm about there. :-)

my_stats_function <- function(x) {c(median(x), quantile(x, c(0.05, 0.95)))}

app(z, 
    function(x){
      unlist(tapply(x, layer_names, my_stats_function))
      })

class       : SpatRaster
dimensions  : 10, 10, 6  (nrow, ncol, nlyr)
resolution  : 36, 18  (x, y)
extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84
source      : memory
names       : 11, 1.5%, 1.95%, 21, 2.5%, 2.95%
min values  :  5,    5,     5,  5,    5,     5
max values  :  5,    5,     5,  5,    5,     5

CodePudding user response：

If you work with the parts within the function you can do this:

First define which layers are which group:

> p1 = c(1,3)
> p2 = c(2,4)

Then subset x and work with that:

> app(z, function(x){c(median(x[p1]), quantile(x[p1], c(0.05, 0.95)), median(x[p2], ), quantile(x[p2], c(0.05, 0.95)))})
class       : SpatRaster                  
dimensions  : 10, 10, 6  (nrow, ncol, nlyr)
resolution  : 36, 18  (x, y)
extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 
source      : memory 
names       : lyr.1, lyr.2, lyr.3, lyr.4, lyr.5, lyr.6 
min values  :     5,     5,     5,    10,    10,    10 
max values  :     5,     5,     5,    10,    10,    10

I'm not sure how you get a median of 7.5 in your sample though, so maybe you used mean for that, and maybe your groupings are different...

CodePudding user response：

Your example data (with changed layer names for clarity)

library(terra)
a <- rast(ncol = 10, nrow = 10, vals=rep(5,100), names="A")
b <- rast(ncol = 10, nrow = 10, vals=rep(10,100), names="A")
c <- rast(ncol = 10, nrow = 10, vals=rep(5,100), names="B")
d <- rast(ncol = 10, nrow = 10, vals=rep(10,100), names="B")
z <- c(a, b, c, d)

What the CRAN version of "terra" you cannot use tapp because it does not handle functions that return multiple numbers. However, with the development version (>= 1.6.29) you can use it.

x <- tapp(z, names(z), f)
x
#class       : SpatRaster 
#dimensions  : 10, 10, 6  (nrow, ncol, nlyr)
#resolution  : 36, 18  (x, y)
#extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 
#source(s)   : memory
#names       : A_5%, A_50%, A_95%, B_5%, B_50%, B_95%
#min values  : 5.25,   7.5,  9.75, 5.25,   7.5,  9.75 
#max values  : 5.25,   7.5,  9.75, 5.25,   7.5,  9.75

That is the cleanest solution, I think. But in this case, because there is a quantile<SpatRaster> method it might be faster to do

probs <- c(0.05, 0.5, 0.95) 
ids <- names(z)
uids <- unique(ids)
x <- lapply(uids, function(i) quantile(z[[ids == i]], probs))
rast(x)
#class       : SpatRaster 
#dimensions  : 10, 10, 6  (nrow, ncol, nlyr)
#resolution  : 36, 18  (x, y)
#extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 
#source(s)   : memory
#names       : q0.05, q0.5, q0.95, q0.05, q0.5, q0.95 
#min values  :  5.25,  7.5,  9.75,  5.25,  7.5,  9.75 
#max values  :  5.25,  7.5,  9.75,  5.25,  7.5,  9.75

But I would only even look at that if the rasters are very large.

I assume that in all cases it is more efficient to use

function(x) quantile(x, c(0.5, 0.05, 0.95))

instead of

function(x) c(median(x), quantile(x, c(0.05, 0.95)))

And here is a relatively simple solution for the current CRAN version, with a loop and app

f <- function(x) quantile(x, c(0.05, 0.5, 0.95)) 
ids <- names(z)
uids <- unique(ids)
x <- lapply(uids, function(i) app(z[[ids == i]], f))
rast(x)
#class       : SpatRaster 
#dimensions  : 10, 10, 6  (nrow, ncol, nlyr)
#resolution  : 36, 18  (x, y)
#extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 
#sources     : memory  (3 layers) 
#              memory  (3 layers) 
#names       :   5%, 50%,  95%,   5%, 50%,  95% 
#min values  : 5.25, 7.5, 9.75, 5.25, 7.5, 9.75 
#max values  : 5.25, 7.5, 9.75, 5.25, 7.5, 9.75