Home > Software design >  Apply running average by year for all columns in R
Apply running average by year for all columns in R

Time:10-28

I need to calculate the running average by year for each column. My data example:

dat <- data.frame(yr = c(1980, 1980, 1980, 1980, 1980, 1981, 1981, 1981, 1981, 1981, 1982, 1982, 1982, 1982, 1982), data1 = c(-10.16,  -7.48,  -3.31,  -6.04, -11.68, -13.40, -10.41, -10.65,  -6.70, -17.05, -25.62, -29.14, -16.65,  -6.42,   0.28), data2 = c(2.30, -7.52, -13.26, -13.24, -14.74,  -9.38,  -8.93, -11.78, -14.07, -11.66,  -8.82, -10.30,  -7.99, -10.02, -15.36), data3 = c(-14.83, -15.08, -16.44, -18.95, -13.40,  -7.16, -4.35,  -1.61,  -0.01,  -0.35,  -2.09,  -3.12,   0.87,  -0.06,   2.29))

The running average is calculated as follows:

library(TTR)
library(dplyr)
runavg <- with(dat, ave(data1, yr, FUN=function(x) 
    TTR::runMean(x, n=3)) )

The question is simple, but I would like to ask you to explain in more detail how to apply this code to every column? I tried various solutions using lapply, function, mutate... I just realized that I do not have enough knowledge in this area :(. I will be very grateful for the help.

CodePudding user response:

We may use across in tidyverse to loop across all the 'data' columns and creating new columns by applying the runMean function

library(dplyr)
library(TTR)
dat <- dat %>% 
   group_by(yr) %>%
     mutate(across(starts_with('data'),
      ~ runMean(., n = 3), .names = 'runavg_{.col}'))

-output

dat
# A tibble: 15 × 7
# Groups:   yr [3]
      yr  data1  data2  data3 runavg_data1 runavg_data2 runavg_data3
   <dbl>  <dbl>  <dbl>  <dbl>        <dbl>        <dbl>        <dbl>
 1  1980 -10.2    2.3  -14.8         NA           NA          NA    
 2  1980  -7.48  -7.52 -15.1         NA           NA          NA    
 3  1980  -3.31 -13.3  -16.4         -6.98        -6.16      -15.4  
 4  1980  -6.04 -13.2  -19.0         -5.61       -11.3       -16.8  
 5  1980 -11.7  -14.7  -13.4         -7.01       -13.7       -16.3  
 6  1981 -13.4   -9.38  -7.16        NA           NA          NA    
 7  1981 -10.4   -8.93  -4.35        NA           NA          NA    
 8  1981 -10.6  -11.8   -1.61       -11.5        -10.0        -4.37 
 9  1981  -6.7  -14.1   -0.01        -9.25       -11.6        -1.99 
10  1981 -17.0  -11.7   -0.35       -11.5        -12.5        -0.657
11  1982 -25.6   -8.82  -2.09        NA           NA          NA    
12  1982 -29.1  -10.3   -3.12        NA           NA          NA    
13  1982 -16.6   -7.99   0.87       -23.8         -9.04       -1.45 
14  1982  -6.42 -10.0   -0.06       -17.4         -9.44       -0.77 
15  1982   0.28 -15.4    2.29        -7.60       -11.1         1.03 

The similar option in base R is lapply

dat[paste0('runavg_', names(dat)[-1])] <- lapply(dat[-1], function(x) ave(x, dat$yr, FUN = function(u) runMean(u,  n = 3)))

CodePudding user response:

1) This uses rollmeanr from zoo and also uses by from base R. rollmeanr can handle multiple columns at once providing a bit of simplification. This splits up dat[-1] by dat[[1]], i.e. by year, and then applies rollmeanr to each component. Then we rbind the components back together and cbind the original dat to it.

library(zoo)

avg <- do.call("rbind", by(dat[-1], dat[[1]], rollmeanr, 3, fill = NA))
cbind(dat, avg = avg)

giving:

     yr  data1  data2  data3  avg.data1  avg.data2   avg.data3
1  1980 -10.16   2.30 -14.83         NA         NA          NA
2  1980  -7.48  -7.52 -15.08         NA         NA          NA
3  1980  -3.31 -13.26 -16.44  -6.983333  -6.160000 -15.4500000
4  1980  -6.04 -13.24 -18.95  -5.610000 -11.340000 -16.8233333
5  1980 -11.68 -14.74 -13.40  -7.010000 -13.746667 -16.2633333
6  1981 -13.40  -9.38  -7.16         NA         NA          NA
7  1981 -10.41  -8.93  -4.35         NA         NA          NA
8  1981 -10.65 -11.78  -1.61 -11.486667 -10.030000  -4.3733333
9  1981  -6.70 -14.07  -0.01  -9.253333 -11.593333  -1.9900000
10 1981 -17.05 -11.66  -0.35 -11.466667 -12.503333  -0.6566667
11 1982 -25.62  -8.82  -2.09         NA         NA          NA
12 1982 -29.14 -10.30  -3.12         NA         NA          NA
13 1982 -16.65  -7.99   0.87 -23.803333  -9.036667  -1.4466667
14 1982  -6.42 -10.02  -0.06 -17.403333  -9.436667  -0.7700000
15 1982   0.28 -15.36   2.29  -7.596667 -11.123333   1.0333333

1a) Another possibility is to define your own Mean function which takes the mean if the years are all the same and returns NA otherwise. That can be used with rollapplyr.

library(zoo)

Mean <- function(x) if (var(x[, 1]) == 0) colMeans(x[, -1]) else NA
data.frame(dat, avg = rollapplyr(dat, 3, Mean, fill = NA, by.column = FALSE))

2) Using dplyr and zoo we can use group_by and group_modify like this. For each year the group_modify function runs rollmeanr.

library(dplyr, exclude = c("lag", "filter"))
library(zoo)

dat %>%
  group_by(yr) %>%
  group_modify(~ data.frame(., avg = rollmeanr(., 3, fill = NA)) ) %>%
  ungroup
  • Related