Summarise multi-dimensional array by month

Time:05-23

I have a multi-dimensional array where the 3rd dimension represents time. For purposes of this question, let's use the ozone dataset from the plyr package:

> str(ozone)
 num [1:24, 1:24, 1:72] 260 258 258 254 252 252 250 248 248 248 ...
 - attr(*, "dimnames")=List of 3
  ..$ lat : chr [1:24] "-21.2" "-18.7" "-16.2" "-13.7" ...
  ..$ long: chr [1:24] "-113.8" "-111.3" "-108.8" "-106.3" ...
  ..$ time: chr [1:72] "1" "2" "3" "4" ...

From the documentation:

The data are monthly ozone averages on a very coarse 24 by 24 grid covering Central America, from Jan 1995 to Dec 2000. The data is stored in a 3d area with the first two dimensions representing latitude and longitude, and the third representing time.

What I would like to do is to create the monthly average for each of the lat/long cells. I can do this for a single lat/long combination with tapply like so:

> tapply(ozone[1,1,], rep(1:12, 6), mean)
       1        2        3        4        5        6        7        8        9       10       11       12 
264.6667 257.6667 255.0000 251.3333 257.6667 265.0000 274.0000 275.3333 277.0000 285.0000 283.0000 273.3333

but I am stumped on doing this over the entire array at once. apply will let me select the dimensions to operate over (MARGIN), tapply will let me use a factor to select slices (INDEX), but I need both.

I am open to suggestions but prefer to work with arrays rather than data frames due to the size and complexity of the data.

CodePudding user response:

You can split the time dimension into month and year, then use apply to average over the years.

x <- plyr::ozone
x <- array(x, c(dim(x)[1:2], 12, dim(x)[3]/12),
           c(dimnames(x)[1:2], list(month=1:12, year=1995:2000)))
#dim(x) <- c(dim(x)[1:2], 12, dim(x)[3]/12) #Alternative without names
. <- apply(x, 1:3, mean)
.[1,1,]
#       1        2        3        4        5        6        7        8 
#264.6667 257.6667 255.0000 251.3333 257.6667 265.0000 274.0000 275.3333 
#       9       10       11       12 
#277.0000 285.0000 283.0000 273.3333 
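
For larger arrays, the same reshape idea can be combined with rowMeans, whose dims argument keeps the leading dimensions and averages over the rest. The following is a minimal sketch rather than part of the original answer; it uses a simulated array named oz (a hypothetical stand-in with the same 24 x 24 x 72 shape as plyr::ozone) and assumes the time axis starts in January and covers whole years.

```r
# Simulated stand-in for plyr::ozone (assumption: same shape,
# 72 monthly time steps in calendar order starting in January)
set.seed(1)
oz <- array(runif(24 * 24 * 72), c(24, 24, 72))

# Split the time axis into month (varies fastest) and year
oz4 <- oz
dim(oz4) <- c(24, 24, 12, 6)

# dims = 3 keeps the first three dimensions (lat, long, month)
# and averages over the remaining one (year)
monthly <- rowMeans(oz4, dims = 3)

dim(monthly)

# Compare one cell with the tapply approach from the question
all.equal(monthly[1, 1, ],
          unname(c(tapply(oz[1, 1, ], rep(1:12, 6), mean))))
```

rowMeans is implemented in compiled code, so this is likely to be faster than apply(x, 1:3, mean) on big arrays, though it drops nothing that the apply version keeps apart from the per-cell function call.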

Another option is to split the time labels by month and average each group of slices:

. <- simplify2array(lapply(split(dimnames(plyr::ozone)[[3]], 1:12), \(x)
                    apply(plyr::ozone[,,x], 1:2, mean)))
.[1,1,]
#       1        2        3        4        5        6        7        8 
#264.6667 257.6667 255.0000 251.3333 257.6667 265.0000 274.0000 275.3333 
#       9       10       11       12 
#277.0000 285.0000 283.0000 273.3333 

CodePudding user response:

This uses the array a defined reproducibly in the Note at the end. The question does not specify the exact output layout; apply puts the month in the first dimension of the result, so use aperm if you want a different order of dimensions.

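# rep(1:12, times = 6) cycles the month index within each year;
# each = 6 would instead group six consecutive time steps together
month_mean <- function(x) c(tapply(x, rep(1:12, times = 6), mean))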
aa <- apply(a, 1:2, month_mean)

Now check the result

# check
all.equal(aa[, 4, 9], month_mean(a[4, 9, ]))
## [1] TRUE

# another check
for(i in 1:24) for(j in 1:24) {
  # all.equal returns a character vector, not FALSE, on mismatch, so wrap it in isTRUE
  check <- isTRUE(all.equal(aa[, i, j], month_mean(a[i, j, ])))
  if (!check) stop("not equal")
}
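
To get the result back into lat x long x month order, aperm can permute the dimensions. This is a small self-contained sketch on the same kind of random array; the name bb is only illustrative, and the month index is assumed to cycle 1 to 12 starting in January.

```r
set.seed(123)
a <- array(runif(24 * 24 * 72), c(24, 24, 72))

# Month index cycles 1..12 along the time axis (assumes whole years from January)
month_mean <- function(x) c(tapply(x, rep(1:12, length.out = length(x)), mean))

aa <- apply(a, 1:2, month_mean)  # month x lat x long
bb <- aperm(aa, c(2, 3, 1))      # lat x long x month

dim(aa)
dim(bb)

# Same values, different layout
all.equal(bb[4, 9, ], aa[, 4, 9])
```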

Note

set.seed(123)
a <- array(runif(24 * 24 * 72), c(24, 24, 72))