I'm working with panel data. I want to add 3 and 5-year moving averages of some of my variables into my dataset. I have the following code which works fine but is very long. Is there a neater and more efficient way of creating moving averaged variables?
dataset <- dataset %>%
group_by(iso3c) %>%
dplyr::mutate(manu_GDP_3=rollapply(manu_GDP,3,mean,align='right',fill=NA),
manu_GDP_5=rollapply(manu_GDP,5,mean,align='right',fill=NA),
age_dep_3=rollapply(age_dep,3,mean,align='right',fill=NA),
age_dep_5=rollapply(age_dep,5,mean,align='right',fill=NA),
agr_GDP_3=rollapply(agr_GDP,3,mean,align='right',fill=NA),
agr_GDP_5=rollapply(agr_GDP,5,mean,align='right',fill=NA),
services_GDP_3=rollapply(services_GDP,3,mean,align='right',fill=NA),
services_GDP_5=rollapply(services_GDP,5,mean,align='right',fill=NA),
debtGNI_3=rollapply(debtGNI,3,mean,align='right',fill=NA),
debtGNI_5=rollapply(debtGNI,5,mean,align='right',fill=NA),
Foreign_liab_3=rollapply(Foreign_liab,3,mean,align='right',fill=NA),
Foreign_liab_5=rollapply(Foreign_liab,5,mean,align='right',fill=NA),
intcapimp_X_3=rollapply(intcapimp_X,3,mean,align='right',fill=NA),
intcapimp_X_5=rollapply(intcapimp_X,5,mean,align='right',fill=NA),
regime_3=rollapply(regime,3,mean,align='right',fill=NA),
regime_5=rollapply(regime,5,mean,align='right',fill=NA),
CBI2_3=rollapply(CBI2,3,mean,align='right',fill=NA),
CBI2_5=rollapply(CBI2,5,mean,align='right',fill=NA),
resource_rent_3=rollapply(resource_rent,3,mean,align='right',fill=NA),
resource_rent_5=rollapply(resource_rent,5,mean,align='right',fill=NA),
oil_rents_3=rollapply(oil_rents,3,mean,align='right',fill=NA),
oil_rents_5=rollapply(oil_rents,5,mean,align='right',fill=NA),
coal_rents_3=rollapply(coal_rents,3,mean,align='right',fill=NA),
coal_rents_5=rollapply(coal_rents,5,mean,align='right',fill=NA)) %>%
ungroup()
CodePudding user response:
rollapply
does work on multiple columns. For example, using the built in mtcars we have the following.
library(zoo)
mtcars %>%
select(1:5) %>%
group_by(cyl) %>%
transform(avg3 = rollmeanr(cbind(hp, disp), 3, fill = NA),
avg5 = rollmeanr(cbind(hp, disp), 5, fill = NA)) %>%
ungroup
giving:
mpg cyl disp hp drat avg3.hp avg3.disp avg5.hp avg5.disp
1 21.0 6 160.0 110 3.90 NA NA NA NA
2 21.0 6 160.0 110 3.90 NA NA NA NA
3 22.8 4 108.0 93 3.85 104.33333 142.66667 NA NA
4 21.4 6 258.0 110 3.08 104.33333 175.33333 NA NA
5 18.7 8 360.0 175 3.15 126.00000 242.00000 119.6 209.20
6 18.1 6 225.0 105 2.76 130.00000 281.00000 118.6 222.20
7 14.3 8 360.0 245 3.21 175.00000 315.00000 145.6 262.20
8 24.4 4 146.7 62 3.69 137.33333 243.90000 139.4 269.94
...etc...
CodePudding user response:
A couple things:
zoo::rollmean
(and its*r
right variant) are more efficient thanrollapply(., n, mean)
, I suggest using it instead.across
allows you to iterate one or more functions over a selection of columns. I'm usingacross(Sepal.Length:Petal.Width
, ..)below, but it could easily be
across(c(Sepal.Length, Petal.Width, Sepal.Width), ..)` just as easily. When given a list of functions, the naming convention seems intuitive, see the results below.
iris %>%
group_by(Species) %>%
mutate(
across(Sepal.Length:Petal.Width,
list(mean3 = ~ zoo::rollmeanr(., 3, fill = NA),
mean5 = ~ zoo::rollmeanr(., 5, fill = NA)) )
) %>%
ungroup()
# # A tibble: 150 x 13
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_mean3 Sepal.Length_mean5 Sepal.Width_mean3 Sepal.Width_mean5 Petal.Length_mean3 Petal.Length_mean5 Petal.Width_mean3 Petal.Width_mean5
# <dbl> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 5.1 3.5 1.4 0.2 setosa NA NA NA NA NA NA NA NA
# 2 4.9 3 1.4 0.2 setosa NA NA NA NA NA NA NA NA
# 3 4.7 3.2 1.3 0.2 setosa 4.9 NA 3.23 NA 1.37 NA 0.2 NA
# 4 4.6 3.1 1.5 0.2 setosa 4.73 NA 3.1 NA 1.4 NA 0.2 NA
# 5 5 3.6 1.4 0.2 setosa 4.77 4.86 3.3 3.28 1.4 1.4 0.2 0.2
# 6 5.4 3.9 1.7 0.4 setosa 5 4.92 3.53 3.36 1.53 1.46 0.267 0.24
# 7 4.6 3.4 1.4 0.3 setosa 5 4.86 3.63 3.44 1.5 1.46 0.3 0.26
# 8 5 3.4 1.5 0.2 setosa 5 4.92 3.57 3.48 1.53 1.5 0.3 0.26
# 9 4.4 2.9 1.4 0.2 setosa 4.67 4.88 3.23 3.44 1.43 1.48 0.233 0.26
# 10 4.9 3.1 1.5 0.1 setosa 4.77 4.86 3.13 3.34 1.47 1.5 0.167 0.24
# # ... with 140 more rows
This means your code might be reduced to the following (untested):
dataset %>%
group_by(iso3c) %>%
mutate(
across(c(manu_GDP, age_dep, agr_GDP, services_GDP, debtGNI, Foreign_liab,
intcapimp_X, regime, CBI2, resource_rent, oil_rents, coal_rents),
list("3" = ~ zoo::rollmeanr(., 3, fill = NA),
"5" = ~ zoo::rollmeanr(., 5, fill = NA)) )
) %>%
ungroup()