Here is an example of what my data look like. I want to compute the average of each column in my data frame for each 5-year period.
data <- data.frame(date = c('01-01-2011', '04-05-2011', '02-09-2021', '11-11-2011'),
                   va1 = c(34, 56, 78, 32),
                   va2 = c(34, 56, 78, 32))
For example, if the dates run from 01-01-1990 ... to 01-01-2010, compute the average for each 5-year period:
01-01-1995 (previous 5 years)
01-01-2000 (previous 5 years)
01-01-2005 (previous 5 years)
01-01-2010 (previous 5 years)
CodePudding user response:
It is not clear to me what you mean by 5 years, so I have assumed that you want the average over the 5 years ending at each row's date. Note that the date format of the example data in the question is ambiguous; the code below assumes month-day-year.
Convert the data to a zoo series using read.zoo. This also converts the dates to Date class using the indicated format and sorts the data. Then compute the widths w (the number of rows to average over for each row) and run rollapplyr. If you need a data.frame, use fortify.zoo(zz).
library(zoo)
z <- read.zoo(data, format = "%m-%d-%Y")
tt <- time(z)
w <- seq_along(tt) - findInterval(tt - 5 * 365.25, tt)
zz <- cbind(z, mean5yr = rollapplyr(z, w, mean)); zz
giving:
           va1.z va2.z va1.mean5yr va2.mean5yr
2011-01-01    34    34    34.00000    34.00000
2011-04-05    56    56    45.00000    45.00000
2011-11-11    32    32    40.66667    40.66667
2021-02-09    78    78    78.00000    78.00000
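To make the width calculation above more concrete, here is a minimal base R sketch (no zoo) that reproduces w and the va1 means for the example data. It assumes the dates have already been parsed and sorted, as read.zoo would do, and it converts them to plain day counts before calling findInterval; the names dates, va1 and means are only illustrative.

```r
# Example data, already sorted by date (assumed month-day-year in the question)
dates <- as.Date(c("2011-01-01", "2011-04-05", "2011-11-11", "2021-02-09"))
va1   <- c(34, 56, 32, 78)

# Work with plain day counts so the arithmetic is explicit
tt <- as.numeric(dates)

# For each row, findInterval() gives the number of rows dated on or before
# the point 5 years earlier; subtracting that from the row index leaves
# the number of rows that fall inside the trailing 5-year window
w <- seq_along(tt) - findInterval(tt - 5 * 365.25, tt)
w

# Average the last w[i] values ending at row i
means <- sapply(seq_along(w), function(i) mean(va1[(i - w[i] + 1):i]))
means
```

The three 2011 rows get windows of 1, 2 and 3 rows, while the 2021 row has no other observation within the preceding 5 years and so is averaged on its own.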
Added
Here are some variations depending on what you want. library(zoo) is needed for all of them.
z <- read.zoo(data, FUN = as.yearmon, format = "%m-%d-%Y")
tt <- time(z); w <- seq_along(tt) - findInterval(tt - 5, tt)
zz <- cbind(z, mean5yr = rollapplyr(z, w, mean)); zz
##          va1.z va2.z va1.mean5yr va2.mean5yr
## Jan 2011    34    34    34.00000    34.00000
## Apr 2011    56    56    45.00000    45.00000
## Nov 2011    32    32    40.66667    40.66667
## Feb 2021    78    78    78.00000    78.00000
Or, based on partitioning the time range into non-overlapping 5-year pieces, here are some further variations.
z <- read.zoo(data, format = "%m-%d-%Y")
tt <- as.integer(as.yearmon(as.Date(cut(time(z), "5 years"))))
aggregate(z, tt, mean)
##           va1      va2
## 2011 40.66667 40.66667
## 2021 78.00000 78.00000
z <- read.zoo(data, format = "%m-%d-%Y")
bins <- c(-Inf, 2010, 2015, 2020, Inf) # custom bins
tt <- cut(as.integer(as.yearmon(time(z))), bins, right = FALSE)
aggregate(z, tt, mean)
##                  va1      va2
## [2010,2015) 40.66667 40.66667
## [2020, Inf) 78.00000 78.00000
z <- read.zoo(data, format = "%m-%d-%Y")
tt <- 5 * (as.integer(as.yearmon(time(z))) %/% 5) # first yr of each 5-year bin
aggregate(z, tt, mean)
##           va1      va2
## 2010 40.66667 40.66667
## 2020 78.00000 78.00000
z <- read.zoo(data, format = "%m-%d-%Y")
tt <- as.integer(5 * ceiling(as.yearmon(time(z)) / 5)) # last yr
aggregate(z, tt, mean)
##           va1      va2
## 2015 40.66667 40.66667
## 2025 78.00000 78.00000
CodePudding user response:
A solution could be:
periods <- seq(as.Date("1990-01-01"),as.Date("2030-01-01"), by= "5 years")
groups <- cut(as.Date(data$date, "%m-%d-%Y"), breaks = periods, labels = periods[-1])
average_by_5yr <- aggregate(data[,c("va1","va2")], by=list(period=groups), mean)
average_by_5yr
Of course, this assumes you know the start and end of the overall period in advance, which is what I deduce from the question, if I'm not mistaken.
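If the overall range is not known in advance, one possible variation is to derive the breaks from the data itself. The following is only a sketch under a few assumptions: dates are in mdy format, bins start in years divisible by 5, and each bin is labelled by its first year (unlike the code above, which labels by the end date). The names d, yrs, first_yr and last_yr are illustrative.

```r
data <- data.frame(date = c('01-01-2011', '04-05-2011', '02-09-2021', '11-11-2011'),
                   va1 = c(34, 56, 78, 32),
                   va2 = c(34, 56, 78, 32))

# Parse the dates (assuming month-day-year) and extract the years
d   <- as.Date(data$date, "%m-%d-%Y")
yrs <- as.integer(format(d, "%Y"))

# Round the range outwards to multiples of 5 years
first_yr <- 5 * (min(yrs) %/% 5)
last_yr  <- 5 * (max(yrs) %/% 5 + 1)

periods <- seq(as.Date(sprintf("%d-01-01", first_yr)),
               as.Date(sprintf("%d-01-01", last_yr)),
               by = "5 years")

# Label each bin by the year in which it starts; bins are [start, next start)
groups <- cut(d, breaks = periods,
              labels = format(periods[-length(periods)], "%Y"))

average_by_5yr <- aggregate(data[, c("va1", "va2")],
                            by = list(period = groups), mean)
average_by_5yr
```

Empty bins (here the one starting in 2015) are dropped by aggregate, so only periods that contain data appear in the result.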
EDIT
Also note the fragment as.Date(data$date, "%m-%d-%Y"). I can't tell from your example whether the dates are in mdy or dmy format. I assumed the former.
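As a quick check of why the format matters, this small sketch parses the example strings both ways. Every value in the sample has both fields at or below 12, so neither format fails and the data alone cannot settle the question; the names x, mdy and dmy are illustrative.

```r
x <- c("01-01-2011", "04-05-2011", "02-09-2021", "11-11-2011")

# Parse the same strings under both interpretations
mdy <- as.Date(x, format = "%m-%d-%Y")  # month-day-year
dmy <- as.Date(x, format = "%d-%m-%Y")  # day-month-year

# Side-by-side comparison; e.g. "04-05-2011" is April 5 under mdy
# but May 4 under dmy
data.frame(raw = x, mdy = mdy, dmy = dmy)
```

If the real data contain a value such as 25-12-2011, one of the two formats will return NA for it, which is an easy way to tell which format applies.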