Compute the average per 5 years in a data frame?

Here is an example of what my data looks like. I want to compute the average of each column in my data frame for each 5-year period.

     data <- data.frame(date = c('01-01-2011', '04-05-2011',
                                 '02-09-2021', '11-11-2011'),
                        va1 = c(34, 56, 78, 32), va2 = c(34, 56, 78, 32))

For example, if the dates run from 01-01-1990 to 01-01-2010, compute the average for each 5-year period:

01-01-1995 (previous 5 years)

01-01-2000 (previous 5 years)

01-01-2005 (previous 5 years)

01-01-2010 (previous 5 years)

CodePudding user response：

It is not clear what you mean by 5 years, so I have assumed that you want the 5 years ending at the current row's date. Note that the format of the example dates in the question is ambiguous (month-day-year or day-month-year); the code below assumes month-day-year.

Convert to a zoo series using read.zoo. This also converts the dates to Date class using the indicated format and sorts the data. Then compute the widths w (the number of rows to average over for each row) and run rollapplyr. If you need a data.frame, use fortify.zoo(zz).

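library(zoo)
# read as a zoo series; dates are parsed as month-day-year and sorted
z <- read.zoo(data, format = "%m-%d-%Y")
tt <- time(z)
# w[i] = number of rows dated within the 5 years ending at tt[i]
w <- seq_along(tt) - findInterval(tt - 5 * 365.25, tt)
# right-aligned rolling mean using the per-row widths
zz <- cbind(z, mean5yr = rollapplyr(z, w, mean)); zz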

giving:

           va1.z va2.z va1.mean5yr va2.mean5yr
2011-01-01    34    34    34.00000    34.00000
2011-04-05    56    56    45.00000    45.00000
2011-11-11    32    32    40.66667    40.66667
2021-02-09    78    78    78.00000    78.00000
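
To make the window logic concrete, here is a minimal base R sketch of the same trailing 5-year mean without zoo. It is an illustration under the same assumptions as above (month-day-year dates, a window of 5 * 365.25 days ending at each row's date), not a replacement for the rollapplyr approach; the names window_days, trailing_mean and result are made up for this example.

```r
# Example data from the question; dates are assumed to be month-day-year.
data <- data.frame(
  date = c("01-01-2011", "04-05-2011", "02-09-2021", "11-11-2011"),
  va1 = c(34, 56, 78, 32),
  va2 = c(34, 56, 78, 32)
)

# Parse and sort by date
d <- as.Date(data$date, format = "%m-%d-%Y")
ord <- order(d)
d <- d[ord]
vals <- data[ord, c("va1", "va2")]
rownames(vals) <- NULL

# For each row, average all rows dated within the 5 years ending at that row
window_days <- 5 * 365.25
trailing_mean <- t(sapply(seq_along(d), function(i) {
  in_window <- d > d[i] - window_days & d <= d[i]
  colMeans(vals[in_window, , drop = FALSE])
}))

result <- data.frame(
  date = d,
  vals,
  mean5yr_va1 = trailing_mean[, "va1"],
  mean5yr_va2 = trailing_mean[, "va2"]
)
print(result)
```

This loops over every row, so it is only meant to show which rows fall into each window; for long series the zoo version is likely the better choice.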

Added

Here are some variations depending on what you want. library(zoo) is needed for all of them.

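# index by year/month; yearmon is stored as year + fraction of a year
z <- read.zoo(data, FUN = as.yearmon, format = "%m-%d-%Y")
# tt - 5 therefore means "5 years earlier"
tt <- time(z); w <- seq_along(tt) - findInterval(tt - 5, tt)
zz <- cbind(z, mean5yr = rollapplyr(z, w, mean)); zz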
##          va1.z va2.z va1.mean5yr va2.mean5yr
## Jan 2011    34    34    34.00000    34.00000
## Apr 2011    56    56    45.00000    45.00000
## Nov 2011    32    32    40.66667    40.66667
## Feb 2021    78    78    78.00000    78.00000

Or, if you want to partition the time range into non-overlapping 5-year pieces, here are some variations.

z <- read.zoo(data, format = "%m-%d-%Y")
tt <- as.integer(as.yearmon(as.Date(cut(time(z), "5 years"))))
aggregate(z, tt, mean)
##           va1      va2
## 2011 40.66667 40.66667
## 2021 78.00000 78.00000

z <- read.zoo(data, format = "%m-%d-%Y")
bins <- c(-Inf, 2010, 2015, 2020, Inf)  # custom bins
tt <- cut(as.integer(as.yearmon(time(z))), bins, right = FALSE)
aggregate(z, tt, mean)
##                  va1      va2
## [2010,2015) 40.66667 40.66667
## [2020, Inf) 78.00000 78.00000

z <- read.zoo(data, format = "%m-%d-%Y")
tt <- 5 * (ceiling(as.yearmon(time(z))) %/% 5) # first yr
aggregate(z, tt, mean)
##           va1      va2
## 2010 40.66667 40.66667
## 2020 78.00000 78.00000

z <- read.zoo(data, format = "%m-%d-%Y")
tt <- as.integer(5 * ceiling(as.yearmon(time(z)) / 5)) # last yr
aggregate(z, tt, mean)
##           va1      va2
## 2015 40.66667 40.66667
## 2025 78.00000 78.00000
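
For comparison, the difference between labelling a bin by its first or last year can be sketched in base R using plain calendar years. This is a simplified illustration, not the yearmon arithmetic used above: it assumes bins aligned to multiples of 5 (2010-2014, 2015-2019, ...) and takes the "last" label to be the year in which the bin ends (exclusive), so months at bin boundaries may be assigned differently than in the zoo variations. The names first_yr, last_yr, by_first and by_last are made up for this example.

```r
# Example data from the question; dates are assumed to be month-day-year.
data <- data.frame(
  date = c("01-01-2011", "04-05-2011", "02-09-2021", "11-11-2011"),
  va1 = c(34, 56, 78, 32),
  va2 = c(34, 56, 78, 32)
)

d <- as.Date(data$date, format = "%m-%d-%Y")
yr <- as.integer(format(d, "%Y"))

# Label each row by the first year of its 5-year bin, e.g. 2011 -> 2010
first_yr <- 5L * (yr %/% 5L)
# Label each row by the year in which its bin ends, e.g. 2011 -> 2015
last_yr <- first_yr + 5L

by_first <- aggregate(data[, c("va1", "va2")],
                      by = list(first_yr = first_yr), FUN = mean)
by_last <- aggregate(data[, c("va1", "va2")],
                     by = list(last_yr = last_yr), FUN = mean)
print(by_first)
print(by_last)
```

Both calls produce the same means; only the label attached to each 5-year piece changes.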

CodePudding user response：

A solution could be:

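# 5-year break points covering the whole date range
periods <- seq(as.Date("1990-01-01"), as.Date("2030-01-01"), by = "5 years")
# assign each date to a period, labelled by the date on which the period ends
groups <- cut(as.Date(data$date, "%m-%d-%Y"), breaks = periods, labels = periods[-1])
# average each column within each period
average_by_5yr <- aggregate(data[, c("va1", "va2")], by = list(period = groups), mean)

average_by_5yr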

Of course, this assumes that you know the start and end of the overall period in advance, which is what I understand from the question.
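
If the start and end are not known in advance, one option is to derive the break points from the data. The following is a minimal sketch, assuming month-day-year dates and periods aligned to years that are multiples of 5; start_year and end_year are names made up for this example.

```r
# Example data from the question; dates are assumed to be month-day-year.
data <- data.frame(
  date = c("01-01-2011", "04-05-2011", "02-09-2021", "11-11-2011"),
  va1 = c(34, 56, 78, 32),
  va2 = c(34, 56, 78, 32)
)

d <- as.Date(data$date, format = "%m-%d-%Y")
yrs <- as.integer(format(d, "%Y"))

# Round the earliest year down, and move the latest year up,
# to the nearest multiples of 5 so that every date is covered
start_year <- 5L * (min(yrs) %/% 5L)
end_year <- 5L * (max(yrs) %/% 5L + 1L)

periods <- seq(as.Date(sprintf("%d-01-01", start_year)),
               as.Date(sprintf("%d-01-01", end_year)),
               by = "5 years")

# Label each period by the date on which it ends
groups <- cut(d, breaks = periods, labels = format(periods[-1]))
average_by_5yr <- aggregate(data[, c("va1", "va2")],
                            by = list(period = groups), FUN = mean)
print(average_by_5yr)
```

Periods that contain no rows are simply absent from the result.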

EDIT: Also note the fragment as.Date(data$date, "%m-%d-%Y"). I can't tell from your example whether the dates are in mdy or dmy format; I assumed the former.