How to include the last month in computing the mean of a year?

Time:12-31

In a simple data frame like this:

  dat=data.frame(a=c(5,6,7,8,9),date=as.Date(c("1971-01-01","1971-02-01","1971-12-01","1972-01-01","1972-02-01"), "%Y-%m-%d"),month=c(1,2,12,1,2))

I need to compute the mean of a for each year, where months 1 and 2 are taken from that year but month 12 is taken from the previous year:

The first mean will include "1971-01-01" and "1971-02-01".
The second mean will include "1971-12-01", "1972-01-01" and "1972-02-01", and so on.

CodePudding user response:

A possible solution:

library(tidyverse) 
library(lubridate)

dat=data.frame(a=c(5,6,7,8,9),date=c("1971-01-01","1971-02-01","1971-12-01","1972-01-01","1972-02-01"),month=c(1,2,12,1,2))

dat %>% 
  mutate(year = if_else(month(date) == 12, year(date) + 1, year(date))) %>% 
  group_by(year) %>% 
  summarise(avg = mean(a[month %in% c(1,2,12)]))

#> # A tibble: 2 × 2
#>    year   avg
#>   <dbl> <dbl>
#> 1  1971   5.5
#> 2  1972   8
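
A variation on the same idea, offered only as a sketch: instead of testing for December explicitly, shift every date forward by one month and take the year of the shifted date. The column name winter_year is made up for illustration, and the sketch assumes the data contain only December, January and February rows (other months would need to be filtered out first).

```r
library(lubridate)

dat <- data.frame(
  a    = c(5, 6, 7, 8, 9),
  date = as.Date(c("1971-01-01", "1971-02-01", "1971-12-01",
                   "1972-01-01", "1972-02-01"))
)

# Shift each date forward by one month so that December lands in the next
# calendar year, then read off the year. %m+% clamps to the last valid day
# of the month (Jan 31 becomes Feb 28/29 instead of rolling into March).
# winter_year is a hypothetical column name, not part of the original data.
dat$winter_year <- year(dat$date %m+% months(1))

# Mean of a per shifted year
res <- aggregate(a ~ winter_year, data = dat, FUN = mean)
res
```

On the sample data this groups the rows the same way as the if_else() version, so the two means should match the output above.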

CodePudding user response:

Another way would be:

# as.Date() makes this work whether date is stored as Date or as character
transform(dat, year = ifelse(month == 12,
                             as.integer(format(as.Date(date), '%Y')) + 1,
                             as.integer(format(as.Date(date), '%Y')))) -> dat


aggregate(a ~ year, dat, mean)

# year   a
# 1 1971 5.5
# 2 1972 8.0
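
If the real data also contain months outside December to February, the labelling and the filtering can be wrapped together. The following is a minimal base R sketch under that assumption; winter_means() and its keep argument are hypothetical names introduced here, not something from the answers above.

```r
# Hypothetical helper: mean of `value` per "winter year", where December is
# counted together with the January and February that follow it.
winter_means <- function(value, date, keep = c(12, 1, 2)) {
  lt    <- as.POSIXlt(as.Date(date))
  month <- lt$mon + 1L            # POSIXlt months are 0-based
  year  <- lt$year + 1900L        # POSIXlt years count from 1900
  label <- year + (month == 12)   # push December into the following year
  use   <- month %in% keep        # drop rows from any other month
  tapply(value[use], label[use], mean)
}

dat <- data.frame(
  a    = c(5, 6, 7, 8, 9),
  date = c("1971-01-01", "1971-02-01", "1971-12-01",
           "1972-01-01", "1972-02-01")
)

res <- winter_means(dat$a, dat$date)
res
```

The result is a named vector with one mean per label ("1971" and "1972" for the sample data). The month is derived from the date itself here, so the separate month column is not needed.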