Summarise multi-dimensional array by month

I have a multi-dimensional array where the 3rd dimension represents time. For purposes of this question, let's use the ozone dataset from the plyr package:

> str(ozone)
 num [1:24, 1:24, 1:72] 260 258 258 254 252 252 250 248 248 248 ...
 - attr(*, "dimnames")=List of 3
  ..$ lat : chr [1:24] "-21.2" "-18.7" "-16.2" "-13.7" ...
  ..$ long: chr [1:24] "-113.8" "-111.3" "-108.8" "-106.3" ...
  ..$ time: chr [1:72] "1" "2" "3" "4" ...

From the documentation:

The data are monthly ozone averages on a very coarse 24 by 24 grid covering Central America, from Jan 1995 to Dec 2000. The data is stored in a 3d area with the first two dimensions representing latitude and longitude, and the third representing time.

What I would like to do is to create the monthly average for each of the lat/long cells. I can do this for a single lat/long combination with tapply like so:

> tapply(ozone[1,1,], rep(1:12, 6), mean)
       1        2        3        4        5        6        7        8        9       10       11       12 
264.6667 257.6667 255.0000 251.3333 257.6667 265.0000 274.0000 275.3333 277.0000 285.0000 283.0000 273.3333

but I am stumped on doing this over the entire array at once. apply will let me select the dimensions to operate over (MARGIN), tapply will let me use a factor to select slices (INDEX), but I need both.

I am open to suggestions but prefer to work with arrays rather than data frames due to the size and complexity of the data.

Answer:

You can split the time dimension into month and year, then use apply to average over the year dimension.

x <- plyr::ozone
x <- array(x, c(dim(x)[1:2], 12, dim(x)[3]/12),
           c(dimnames(x)[1:2], list(month=1:12, year=1995:2000)))
#dim(x) <- c(dim(x)[1:2], 12, dim(x)[3]/12) #Alternative without names
. <- apply(x, 1:3, mean)
.[1,1,]
#       1        2        3        4        5        6        7        8 
#264.6667 257.6667 255.0000 251.3333 257.6667 265.0000 274.0000 275.3333 
#       9       10       11       12 
#277.0000 285.0000 283.0000 273.3333
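
As a possible variation on the reshaping idea above, base R's rowMeans accepts a dims argument, so the year dimension can be averaged away without calling mean once per cell. This is only a sketch: the array a is a synthetic stand-in with the same shape as ozone, and it assumes the time axis runs January to December within six complete years.

```r
# Synthetic stand-in for plyr::ozone (assumption: 24 x 24 x 72,
# time ordered Jan..Dec within each of 6 complete years).
set.seed(1)
a <- array(runif(24 * 24 * 72), c(24, 24, 72))

# R fills arrays in column-major order, so splitting the third dimension
# into 12 x 6 maps time index t to month ((t - 1) %% 12) + 1
# and year ((t - 1) %/% 12) + 1.
a4 <- array(a, c(24, 24, 12, 6))

# rowMeans(dims = 3) averages over every dimension after the third (here: year).
m_fast  <- rowMeans(a4, dims = 3)
m_apply <- apply(a4, 1:3, mean)

dim(m_fast)                   # lat x long x month
all.equal(m_fast, m_apply)    # both routes should agree
```

On large arrays the rowMeans route is usually quicker than apply, since the averaging happens in compiled code.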

Another option is to split the time labels by month and average each group of slices:

. <- simplify2array(lapply(split(dimnames(plyr::ozone)[[3]], 1:12), \(x)
                    apply(plyr::ozone[,,x], 1:2, mean)))
.[1,1,]
#       1        2        3        4        5        6        7        8 
#264.6667 257.6667 255.0000 251.3333 257.6667 265.0000 274.0000 275.3333 
#       9       10       11       12 
#277.0000 285.0000 283.0000 273.3333

Answer:

This uses the array defined reproducibly in the Note at the end. The question does not specify the shape of the output; apply returns month as the first dimension, so use aperm if you want a different order of dimensions.

# time runs Jan..Dec within each year, so the month index cycles 1:12 six times
month_mean <- function(x) c(tapply(x, rep(1:12, times = 6), mean))
aa <- apply(a, 1:2, month_mean)
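
To illustrate the aperm remark, here is a minimal sketch that moves month from the first dimension to the last. The names month, by_month and res are hypothetical, and the month index assumes the series starts in January and covers six complete years.

```r
set.seed(123)
a <- array(runif(24 * 24 * 72), c(24, 24, 72))

# Assumption: time runs Jan..Dec within each of 6 complete years.
month <- rep(1:12, times = 6)
by_month <- function(v) c(tapply(v, month, mean))  # hypothetical helper

aa  <- apply(a, 1:2, by_month)   # dim 12 x 24 x 24: month comes first
res <- aperm(aa, c(2, 3, 1))     # dim 24 x 24 x 12: lat, long, month
dim(res)
```

After the permutation, res[i, j, ] holds the twelve monthly means for cell (i, j), which matches the layout of the original array.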

Now check the result:

# check
all.equal(aa[, 4, 9], month_mean(a[4, 9, ]))
## [1] TRUE

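# another check over every cell
# all.equal returns a character vector (not FALSE) on mismatch, so wrap it in isTRUE
for(i in 1:24) for(j in 1:24) {
  check <- isTRUE(all.equal(aa[, i, j], month_mean(a[i, j, ])))
  if (!check) stop("not equal")
}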
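
The checks above compare the result against the same helper that produced it. An independent spot check can be sketched by picking out one month's time slices by hand. The names march_idx and manual are illustrative only, and the month layout is again assumed to be January to December within each year.

```r
set.seed(123)
a <- array(runif(24 * 24 * 72), c(24, 24, 72))
aa <- apply(a, 1:2, function(v) c(tapply(v, rep(1:12, times = 6), mean)))

# March is month 3, so its slices are 3, 15, 27, 39, 51, 63.
march_idx <- seq(3, 72, by = 12)
manual <- mean(a[4, 9, march_idx])

# aa is indexed as [month, lat, long]
isTRUE(all.equal(unname(aa[3, 4, 9]), manual))
```

A check like this would fail if the grouping index did not line up with the calendar months, which the self-referential comparison cannot detect.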

Note

set.seed(123)
a <- array(runif(24 * 24 * 72), c(24, 24, 72))