Calculate running average over multiple variables in R

Time:04-27

I am trying to calculate the running average of many variables in a data frame in R. Using the airquality data as an example, I can do this for the Wind variable with dplyr like so:

library(dplyr)

airquality <- airquality %>% 
  group_by(Month) %>% 
  mutate(rec = 1) %>% 
  mutate(rollavg = cumsum(Wind)/cumsum(rec)) %>% 
  select(-rec)

head(as.data.frame(airquality))
#  Ozone Solar.R Wind Temp Month Day   rollavg
#1    41     190  7.4   67     5   1  7.400000
#2    36     118  8.0   72     5   2  7.700000
#3    12     149 12.6   74     5   3  9.333333
#4    18     313 11.5   62     5   4  9.875000
#5    NA      NA 14.3   56     5   5 10.760000
#6    28      NA 14.9   66     5   6 11.450000

But my data set has over 100 variables. Is there a way to do this without writing the code out for each one? Say I also wanted the running average for Temp; I am looking for something like this:

library(dplyr)

vars <- c("Wind", "Temp")

airquality <- airquality %>% 
  group_by(Month) %>% 
  mutate(rec = 1) %>% 
  mutate(rollavg = cumsum(vars)/cumsum(rec)) %>% 
  select(-rec)

But this just returns NA throughout.

Answer:

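You could use across(), which applies the same function to every selected column and names each result through .names:

airquality <- airquality %>% 
  group_by(Month) %>% 
  # your_variables is a placeholder for a column selection, e.g. all_of(vars)
  mutate(across(your_variables, ~ cumsum(.x) / row_number(),
                .names = 'rollavg_{.col}'))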
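For a version that can be run as-is, here is a minimal sketch using the vars vector from the question. It assumes dplyr 1.0 or later (for across()). The name result and the use of seq_along(.x) as the within-group counter are my own choices for illustration, not part of the answer above.

```r
library(dplyr)

# Columns to average; taken from the question, swap in your own names.
vars <- c("Wind", "Temp")

result <- datasets::airquality %>%
  group_by(Month) %>%
  # For each selected column, divide the cumulative sum by the
  # position within the group (1, 2, 3, ...).
  mutate(across(all_of(vars),
                ~ cumsum(.x) / seq_along(.x),
                .names = "rollavg_{.col}")) %>%
  # Drop the grouping so later operations are not applied per month.
  ungroup()

head(as.data.frame(result))
```

This adds rollavg_Wind and rollavg_Temp next to the original columns, and the average restarts at the first row of each month. Note that cumsum() propagates missing values, so a column containing NA (such as Ozone) would give NA from the first missing entry onward within each group. Whether to skip or zero-fill those entries depends on what the average should mean for your data.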