Hope the community can help me since I am relatively new to R and to the StackOverflow community.
I am trying to replace a missing value of a group with the average of the 3 previous years and then use this newly generated mean to continue generating the next period missing value in R either using dplyr or data.table. My data looks something like this (desired output column rounded to 2 digits):
df <- data.frame(gvkey = c(10443, 10443, 10443, 10443, 10443, 10443, 10443, 29206, 29206, 29206, 29206, 29206), fyear = c(2005, 2006, 2007, 2008, 2009, 2010, 2011, 2017, 2018, 2019, 2020, 2021), receivables = c(543, 595, 757, NA, NA, NA, NA, 147.469, 161.422, 154.019, NA, NA), desired_output = c(543, 595, 757, 631.67, 661.22, 683.30, 658.73, 147.47, 161.42, 154.02, 154.30, 156.58))
I have attempted the following line of code, but it does not use the newly generated number:
df <- df %>% mutate(mean_rect=rollapply(rect,3,mean,align='right',fill=NA))
Any help would be greatly appreciated!
CodePudding user response:
Because your desired fill value depends on any previously created fill values, I think the only reasonable approach is a trusty for
loop:
df$out <- NA
for (i in 1:nrow(df)) {
if (!is.na(df$receivables[i])) {
df$out[i] <- df$receivables[i]
} else {
df$out[i] <- mean(df$out[(i-3):(i-1)], na.rm = T)
}
}
gvkey fyear receivables desired_output out
1 10443 2005 543.000 543.00 543.0000
2 10443 2006 595.000 595.00 595.0000
3 10443 2007 757.000 757.00 757.0000
4 10443 2008 NA 631.67 631.6667
5 10443 2009 NA 661.22 661.2222
6 10443 2010 NA 683.30 683.2963
7 10443 2011 NA 658.73 658.7284
8 29206 2017 147.469 147.47 147.4690
9 29206 2018 161.422 161.42 161.4220
10 29206 2019 154.019 154.02 154.0190
11 29206 2020 NA 154.30 154.3033
12 29206 2021 NA 156.58 156.5814