I have a multi-dimensional array where the 3rd dimension represents time. For purposes of this question, let's use the ozone dataset from the plyr package:
> str(ozone)
num [1:24, 1:24, 1:72] 260 258 258 254 252 252 250 248 248 248 ...
- attr(*, "dimnames")=List of 3
..$ lat : chr [1:24] "-21.2" "-18.7" "-16.2" "-13.7" ...
..$ long: chr [1:24] "-113.8" "-111.3" "-108.8" "-106.3" ...
..$ time: chr [1:72] "1" "2" "3" "4" ...
From the documentation:
The data are monthly ozone averages on a very coarse 24 by 24 grid covering Central America, from Jan 1995 to Dec 2000. The data is stored in a 3d area with the first two dimensions representing latitude and longitude, and the third representing time.
What I would like to do is compute the monthly average (each calendar month averaged across the years) for each of the lat/long cells. I can do this for a single lat/long combination with tapply like so:
> tapply(ozone[1,1,], rep(1:12, 6), mean)
1 2 3 4 5 6 7 8 9 10 11 12
264.6667 257.6667 255.0000 251.3333 257.6667 265.0000 274.0000 275.3333 277.0000 285.0000 283.0000 273.3333
but I am stumped on doing this over the entire array at once. apply will let me select the dimensions to operate over (MARGIN), tapply will let me use a factor to select slices (INDEX), but I need both.
I am open to suggestions but prefer to work with arrays rather than data frames due to the size and complexity of the data.
CodePudding user response:
You can split the time dimension into separate month and year dimensions and then use apply over the first three dimensions, which averages across years.
x <- plyr::ozone
x <- array(x, c(dim(x)[1:2], 12, dim(x)[3]/12),
c(dimnames(x)[1:2], list(month=1:12, year=1995:2000)))
#dim(x) <- c(dim(x)[1:2], 12, dim(x)[3]/12) #Alternative without names
. <- apply(x, 1:3, mean)
.[1,1,]
# 1 2 3 4 5 6 7 8
#264.6667 257.6667 255.0000 251.3333 257.6667 265.0000 274.0000 275.3333
# 9 10 11 12
#277.0000 285.0000 283.0000 273.3333
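Because the reshaped array keeps month and year as separate dimensions, the mean over the last dimension can also be taken with base R's rowMeans() and its dims argument, which avoids calling mean once per cell. This is only a minimal sketch: it assumes complete years (the time length is a multiple of 12) and uses a synthetic array a as a stand-in for the ozone data, so the names a, m_fast and m_apply are illustrative.

```r
# Synthetic stand-in for plyr::ozone (assumption): 24 x 24 grid, 72 months
set.seed(1)
a <- array(runif(24 * 24 * 72), c(24, 24, 72))

# Split time into month (fastest-varying) and year
x <- array(a, c(dim(a)[1:2], 12, dim(a)[3] / 12))

# Average over the 4th dimension (year) in a single vectorised call
m_fast <- rowMeans(x, dims = 3)

# Compare with the apply() version
m_apply <- apply(x, 1:3, mean)
all.equal(m_fast, m_apply)

# Compare with tapply() on a single cell
all.equal(m_fast[1, 1, ], unname(c(tapply(a[1, 1, ], rep(1:12, 6), mean))))
```

With dims = 3, rowMeans() treats the first three dimensions as "rows" and averages over everything after them, so the result keeps the lat x long x month layout.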
Another option is to split the time names by month and average each group of slices:
. <- simplify2array(lapply(split(dimnames(plyr::ozone)[[3]], 1:12), \(x)
apply(plyr::ozone[,,x], 1:2, mean)))
.[1,1,]
# 1 2 3 4 5 6 7 8
#264.6667 257.6667 255.0000 251.3333 257.6667 265.0000 274.0000 275.3333
# 9 10 11 12
#277.0000 285.0000 283.0000 273.3333
CodePudding user response:
This uses the array a defined reproducibly in the Note at the end. The desired shape of the output was not specified; apply puts the months in the first dimension, so if you want a different order of dimensions, use aperm.
# months cycle 1..12 within each year, so repeat the whole sequence 6 times
month_mean <- function(x) c(tapply(x, rep(1:12, times = 6), mean))
aa <- apply(a, 1:2, month_mean)
Now check the result
# check
all.equal(aa[, 4, 9], month_mean(a[4, 9, ]))
## [1] TRUE
# another check
for (i in 1:24) for (j in 1:24) {
  # all.equal() returns a character vector on mismatch, so wrap it in isTRUE()
  check <- isTRUE(all.equal(aa[, i, j], month_mean(a[i, j, ])))
  if (!check) stop("not equal")
}
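apply() places the values returned by the function in the first dimension, so the result here is month x lat x long. If lat x long x month is preferred, aperm() can reorder it. The following is a minimal, self-contained sketch with its own synthetic array; it assumes a cyclic month index (rep(1:12, times = 6)) to match the monthly layout described in the question, and the name bb is illustrative.

```r
set.seed(123)
a <- array(runif(24 * 24 * 72), c(24, 24, 72))

# Monthly mean of one time series; months cycle 1..12 within each year
month_mean <- function(x) c(tapply(x, rep(1:12, times = 6), mean))

aa <- apply(a, 1:2, month_mean)   # dimensions: month x lat x long
dim(aa)

# Move month to the last position: lat x long x month
bb <- aperm(aa, c(2, 3, 1))
dim(bb)

# The same cell holds the same series in both layouts
all.equal(bb[4, 9, ], aa[, 4, 9])
```

The permutation vector c(2, 3, 1) says that the new first dimension is the old second, the new second is the old third, and the new third is the old first.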
Note
set.seed(123)
a <- array(runif(24 * 24 * 72), c(24, 24, 72))