I am trying to calculate the running average of many variables in a data frame in R. Using the airquality data as an example, I can do this for the Wind
variable with dplyr like so:
library(dplyr)
airquality <- airquality %>%
  group_by(Month) %>%
  mutate(rec = 1) %>%
  mutate(rollavg = cumsum(Wind) / cumsum(rec)) %>%
  select(-rec)
head(as.data.frame(airquality))
#  Ozone Solar.R Wind Temp Month Day   rollavg
#1    41     190  7.4   67     5   1  7.400000
#2    36     118  8.0   72     5   2  7.700000
#3    12     149 12.6   74     5   3  9.333333
#4    18     313 11.5   62     5   4  9.875000
#5    NA      NA 14.3   56     5   5 10.760000
#6    28      NA 14.9   66     5   6 11.450000
But my data set has over 100 variables. Is there a way to do this without writing this code for each one? Say I also wanted the running average for Temp;
I am looking for something like this:
library(dplyr)
vars <- c("Wind", "Temp")
airquality <- airquality %>%
  group_by(Month) %>%
  mutate(rec = 1) %>%
  mutate(rollavg = cumsum(vars) / cumsum(rec)) %>%
  select(-rec)
But this just returns NA throughout.
Answer:
You could use across():
# vars is the character vector from the question, e.g. c("Wind", "Temp")
airquality <- airquality %>%
  group_by(Month) %>%
  mutate(across(all_of(vars), ~ cumsum(.x) / row_number(),
                .names = "rollavg_{.col}"))
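For reference, here is a minimal self-contained sketch of the same idea, run on a fresh copy of the built-in data. The `vars` vector comes from the question, and the name `result` is only used here for illustration. It assumes a dplyr version that provides across() (1.0.0 or later).

```r
library(dplyr)

# Columns to average; taken from the question.
vars <- c("Wind", "Temp")

# Start from the untouched built-in data so the example is reproducible.
result <- datasets::airquality %>%
  group_by(Month) %>%
  # Within each month, divide the cumulative sum by the row position
  # to get the running average; one new column per selected variable.
  mutate(across(all_of(vars),
                ~ cumsum(.x) / row_number(),
                .names = "rollavg_{.col}")) %>%
  ungroup()

head(as.data.frame(result[, c("Month", "Day", "rollavg_Wind", "rollavg_Temp")]))
```

Note that cumsum() propagates NA, so for a column with missing values (Ozone, for example) the running average would be NA from the first missing value onward within each month. Wind and Temp have no missing values, so they are not affected.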